
MULTIPLE REGRESSION MODEL

The Data Generating Process (DGP), or the population, is described by the following linear model.
Y_j = β_0 + β_1 X_{j,1} + β_2 X_{j,2} + ε_j
where:
- Y_j is the j-th observation of the dependent variable Y (it is known)
- X_{j,1} and X_{j,2} are the j-th observations of the independent variables X_1 and X_2 (they are known)
- β_0 is the intercept term (it is unknown)
- β_1 and β_2 are parameters (they are unknown)
- ε_j is the j-th error, the j-th unobserved factor that, besides X_1 and X_2, affects Y (it is unknown)

If X_{j,1} changes by ∆X_{j,1}, then Y_j changes by

∆Y_j = β_1 ∆X_{j,1}

So β_1 is the sensitivity of Y_j to variations in X_{j,1}, assuming that the other factors remain constant:

β_1 = ∆Y_j / ∆X_{j,1}
If X_{j,2} changes by ∆X_{j,2}, then Y_j changes by

∆Y_j = β_2 ∆X_{j,2}

So β_2 is the sensitivity of Y_j to variations in X_{j,2}, assuming that the other factors remain constant:

β_2 = ∆Y_j / ∆X_{j,2}

In other words, β_j measures the effect of a unit variation of the j-th regressor on Y, keeping the other regressors (and the unknown factors) constant.

Suppose X_{j,2} can be either 0 or 1 (dummy variable).

Example: X_{j,2} = 0: no crisis, X_{j,2} = 1: crisis
Assume that
1) E[ε_j] = 0
2) E[ε_j | X_1, X_2] = E[ε_j] = 0

Then
E[Y_j | X_{j,1}, X_{j,2} = 0] = β_0 + β_1 X_{j,1} + β_2·0 = β_0 + β_1 X_{j,1}
E[Y_j | X_{j,1}, X_{j,2} = 1] = β_0 + β_1 X_{j,1} + β_2·1 = β_0 + β_1 X_{j,1} + β_2

Now, subtract the first from the second:

E[Y_j | X_{j,1}, X_{j,2} = 1] − E[Y_j | X_{j,1}, X_{j,2} = 0] = β_0 + β_1 X_{j,1} + β_2 − (β_0 + β_1 X_{j,1}) = β_2

β_2 measures the difference between the expected value of Y_j when X_{j,2} = 1 and the expected value of Y_j when X_{j,2} = 0, keeping the other regressors constant.
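
For instance, with purely hypothetical values β_0 = 2, β_1 = 0.5, β_2 = −1 and X_{j,1} = 10:

E[Y_j | X_{j,1} = 10, X_{j,2} = 0] = 2 + 0.5·10 = 7
E[Y_j | X_{j,1} = 10, X_{j,2} = 1] = 2 + 0.5·10 − 1 = 6

and the difference, 6 − 7 = −1, is exactly β_2.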
Multiple regression model: the matrix notation

Collect the observations {Y_1, …, Y_T}, {X_{1,1}, …, X_{T,1}} and {X_{1,2}, …, X_{T,2}} into vectors, where T is the sample size.

The model Y_j = β_0 + β_1 X_{j,1} + β_2 X_{j,2} + ε_j can then be written in stacked form: Y is the T×1 vector of the dependent variable, X is the T×3 matrix whose columns are a column of ones (associated with the intercept), the observations of X_1 and the observations of X_2, β = (β_0, β_1, β_2)' is the vector of parameters and ε is the T×1 vector of errors. The stacked system can be summarized as

Y = Xβ + ε

If the number of regressors increases, the matrix of the regressors and the vector of the betas increase their dimension.
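
To make the notation concrete, here is a minimal numpy sketch of how the stacked system could be built; the sample size, the parameter values and the distributions are purely illustrative assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(0)
T = 200                                    # sample size (illustrative)
beta = np.array([1.0, 0.5, -2.0])          # hypothetical (beta_0, beta_1, beta_2)

X1 = rng.normal(size=T)                    # observations of the first regressor
X2 = rng.integers(0, 2, size=T)            # dummy regressor (0/1), e.g. "crisis"
X = np.column_stack([np.ones(T), X1, X2])  # T x 3 matrix: [constant, X1, X2]
eps = rng.normal(scale=0.8, size=T)        # unobserved errors

Y = X @ beta + eps                         # the DGP in matrix form: Y = X beta + eps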

This is our new DGP. Let's derive the OLS estimators by minimizing the sum of the squared errors.

First of all, the objective function is

O(β_0, β_1, β_2) = Σ_{t=1}^T ε_t² = (Y − Xβ)'(Y − Xβ)

Considering that X'X is always symmetric, the FOCs are

∇_β O(β_0, β_1, β_2) = 0 ⟹ X'Xβ = X'Y

and, assuming that X'X is invertible, the OLS estimator for a generic number of regressors is

β̂ = (X'X)⁻¹(X'Y)

You always have to verify that det(X'X) ≠ 0, because if it is not, the matrix is not invertible.
Sometimes the determinant can be very close to 0 (e.g. when there are regressors that are collinear, that is, highly correlated), so even if the inverse can be computed and Matlab computes it, there will be huge numerical errors and very poor results (the closer the determinant is to 0, the worse the performance of the algorithm that Matlab uses to calculate the inverse). For this reason, when the determinant is very small, Matlab gives you a "Warning".
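
As a rough illustration of this point, the following sketch computes the OLS estimator in closed form and prints the determinant and the condition number of X'X as an invertibility diagnostic (the simulated data and the use of numpy instead of Matlab are assumptions for the example):

import numpy as np

rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.integers(0, 2, size=T)])
Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=T)

XtX = X.T @ X
# A determinant close to 0 (equivalently, a huge condition number) signals
# near-singularity and numerically fragile estimates.
print("det(X'X)  =", np.linalg.det(XtX))
print("cond(X'X) =", np.linalg.cond(XtX))

beta_hat = np.linalg.solve(XtX, X.T @ Y)   # same as (X'X)^(-1) X'Y, but numerically safer
print("OLS estimates:", beta_hat)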
Unbiasedness of the OLS estimators
Assume that
1) the DGP is Y = Xβ + ε
2) det(X'X) ≠ 0, so X'X is invertible
3) E[ε_t] = 0
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0

These 4 assumptions are enough to prove that the OLS estimator is unbiased.
β̂ can also be expressed as

β̂ = β + (X'X)⁻¹(X'ε)

From this, using assumption 4), we derive that

E[β̂ | X] = β + (X'X)⁻¹ X' E[ε | X] = β

and then, by the law of iterated expectations, β̂ is an unbiased estimator of β, in the sense that

E[β̂] = β
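
A small Monte Carlo sketch can illustrate unbiasedness under assumptions 1)-4): averaging β̂ over many simulated samples drawn from the same DGP should get close to the true β (sample size, number of replications and parameter values are arbitrary choices for the example).

import numpy as np

rng = np.random.default_rng(2)
T, R = 200, 5000
beta = np.array([1.0, 0.5, -2.0])
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.integers(0, 2, size=T)])

estimates = np.empty((R, 3))
for r in range(R):
    eps = rng.normal(size=T)                        # fresh errors with E[eps | X] = 0
    Y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ Y)

print("true beta          :", beta)
print("average of beta_hat:", estimates.mean(axis=0))   # close to the true beta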

The Conditional Variance-Covariance Matrix of the OLS estimators


The X-conditional variance-covariance matrix of the estimator β̂ is calculated conditionally on having observed the X. Consider the case K = 2: it is the 3×3 symmetric matrix that has on the main diagonal the conditional variances V[β̂_0|X], V[β̂_1|X], V[β̂_2|X] of the estimators and in all the other positions the conditional covariances Cov[β̂_i, β̂_k|X] (i ≠ k) between the estimators.

If we assume that
1) the DGP is Y = Xβ + ε
2) det(X'X) ≠ 0, so X'X is invertible
3) E[ε_t] = 0
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0
5) V[ε_j | X_1, …, X_K] = E[ε_j² | X_1, …, X_K] − (E[ε_j | X_1, …, X_K])² = E[ε_j² | X_1, …, X_K] = E[ε_j²] = σ_ε²
6) E[ε_t ε_s] = E[ε_t] E[ε_s] = 0 for all t ≠ s (the errors are uncorrelated)

then the X-conditional variance-covariance matrix of the estimator β̂ coincides with

Σ_X = E[(β̂ − β)(β̂ − β)' | X] = σ_ε² (X'X)⁻¹
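
A short derivation sketch, using β̂ − β = (X'X)⁻¹X'ε and the fact that assumptions 5) and 6) together give E[εε' | X] = σ_ε²·I:

Σ_X = E[(X'X)⁻¹ X' ε ε' X (X'X)⁻¹ | X]
    = (X'X)⁻¹ X' E[ε ε' | X] X (X'X)⁻¹
    = (X'X)⁻¹ X' (σ_ε² I) X (X'X)⁻¹
    = σ_ε² (X'X)⁻¹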

Estimator of the error’s variance


If we assume that
1) the DGP is Y = Xβ + ε
2) det(X'X) ≠ 0, so X'X is invertible
3) E[ε_t] = 0
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0
5) V[ε_j | X_1, …, X_K] = E[ε_j² | X_1, …, X_K] − (E[ε_j | X_1, …, X_K])² = E[ε_j² | X_1, …, X_K] = E[ε_j²] = σ_ε²
6) E[ε_t ε_s] = E[ε_t] E[ε_s] = 0 for all t ≠ s (the errors are uncorrelated)

and if we call ε̂ = Y − Xβ̂ = Y − Ŷ the residuals of the OLS regression, then the random variable

σ̂_ε² = Σ_{j=1}^T ε̂_j² / (T − (K + 1)) = SSR / (T − (K + 1))

(where K is the number of regressors; the +1 accounts for the known constant regressor associated with β_0) is an unbiased estimator of the error's variance σ_ε², in the sense that

E[σ̂_ε²] = σ_ε²

If we have the additional assumption that the errors are i.i.d. random variables with normal distribution, ε_j ~ N(0, σ_ε²), then from the fact that β̂ is a linear combination of the unknown errors ε,

β̂ = β + (X'X)⁻¹(X'ε)

we derive that

β̂ ~ N(β, σ_ε² (X'X)⁻¹)
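
The following sketch shows how σ̂_ε² and the implied standard errors could be computed from the residuals; the simulated data and all numerical values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)
T, K = 200, 2
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.integers(0, 2, size=T)])
Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.8, size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat                         # OLS residuals eps_hat
SSR = resid @ resid
sigma2_hat = SSR / (T - (K + 1))                 # unbiased estimator of sigma_eps^2

cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated var-cov matrix of beta_hat
std_errors = np.sqrt(np.diag(cov_hat))
print("sigma_eps^2 estimate:", sigma2_hat)
print("standard errors     :", std_errors)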

Perfect multicollinearity
M out of K regressors X_{t,j} (where j = 1, …, M) have perfect multicollinearity if you can find coefficients, not all equal to 0, such that the linear combination of the M regressors is equal to 0 for all t (namely, if one regressor can be expressed as a linear function of the others):

∃ λ_0, …, λ_M ∈ ℝ, not all 0, such that λ_0 + λ_1 X_{t,1} + … + λ_M X_{t,M} = 0,  t = 1, …, T

If among the regressors X_1, …, X_K two or more of them have perfect multicollinearity, then assumption number 2) is not respected, because det(X'X) = 0 and (X'X)⁻¹ does not exist.

Some observations:
- Multicollinearity is a data problem.
- Multicollinearity is more frequent in small samples (micronumerosity) -> Solution: increase the number of observations.
- The more regressors you include in your regression, the higher the probability of having multicollinearity (if a model with 10 regressors gives more or less the same results as a model with 100 regressors, it is better to choose the more parsimonious model, but you pay a price: the SSR increases).
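
A quick numerical sketch of what perfect multicollinearity does to X'X: with a regressor that is an exact linear function of another (the construction X2 = 3*X1 is purely illustrative), the determinant is zero up to rounding and the condition number explodes.

import numpy as np

rng = np.random.default_rng(4)
T = 200
X1 = rng.normal(size=T)
X2 = 3.0 * X1                              # exact linear function of X1: perfect collinearity
X = np.column_stack([np.ones(T), X1, X2])

XtX = X.T @ X
print("det(X'X)  =", np.linalg.det(XtX))   # ~0 up to floating-point noise
print("cond(X'X) =", np.linalg.cond(XtX))  # enormous: X'X is numerically singular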

Multicollinearity – detection rule: The Variance Inflation Factor


The variances of the OLS estimators of the regressors that are multicollinear are inflated. This implies very poor estimates of the coefficients and very poor statistical inference¹.

¹ Multicollinearity implies inflated variances, and inflated variances imply deflated t-statistics (therefore, you could conclude that a regressor is not significant while it actually is).

In order to find out if there is multicollinearity between the regressors, take the j-th regressor and regress it on the other explanatory variables:

X_{t,j} = β_0 + β_1 X_{t,1} + … + β_{j−1} X_{t,j−1} + β_{j+1} X_{t,j+1} + … + β_K X_{t,K} + ε_t

If the regressor that is regressed on the others is multicollinear with one (or more) of the other explanatory variables, the coefficient of determination of this regression will be very high, because a huge part of the variance of the multicollinear regressor is explained by the others.

A possibility is to remove X_{t,j} and re-check for multicollinearity.
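
A minimal sketch of the detection rule: regress the j-th column on the remaining regressors, compute the R² of that auxiliary regression, and report VIF_j = 1/(1 − R_j²). The simulated data, with X2 almost collinear with X1, is only for illustration.

import numpy as np

rng = np.random.default_rng(5)
T = 200
X1 = rng.normal(size=T)
X2 = 0.95 * X1 + 0.05 * rng.normal(size=T)      # almost collinear with X1
X = np.column_stack([np.ones(T), X1, X2])

def vif(X, j):
    # R^2 of the auxiliary regression of column j on the other columns, turned into a VIF
    y = X[:, j]
    Z = np.delete(X, j, axis=1)                 # the other regressors (incl. the constant)
    b = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ b
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

print("VIF of X1:", vif(X, 1))
print("VIF of X2:", vif(X, 2))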

Variance of the OLS estimators


If we assume that
1) the DGP is Y_t = β_0 + β_1 X_{t,1} + … + β_K X_{t,K} + ε_t
2) det(X'X) ≠ 0, so X'X is invertible
3) E[ε_t] = 0
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0
5) V[ε_j | X_1, …, X_K] = E[ε_j² | X_1, …, X_K] − (E[ε_j | X_1, …, X_K])² = E[ε_j² | X_1, …, X_K] = E[ε_j²] = σ_ε²
6) E[ε_t ε_s] = E[ε_t] E[ε_s] = 0 for all t ≠ s (the errors are uncorrelated)
7) the errors are i.i.d. random variables with normal distribution ε_j ~ N(0, σ_ε²)

Then the OLS estimator minus its expected value is a Gaussian-distributed random variable with variance equal to the j-th element on the diagonal of the variance-covariance matrix:

β̂_j − β_j ~ N(0, σ²_{β̂_j})

where

σ²_{β̂_j} = σ_ε² / (SST_j (1 − R_j²))

- SST_j = Σ_{t=1}^T (X_{t,j} − (1/T) Σ_{s=1}^T X_{s,j})² is the SST of the j-th regressor
- R_j² is the R² of the regression X_{t,j} = β_0 + β_1 X_{t,1} + … + β_{j−1} X_{t,j−1} + β_{j+1} X_{t,j+1} + … + β_K X_{t,K} + ε_t, i.e. the R² obtained regressing one regressor on the others

So the variance we would have without collinearity, σ_ε²/SST_j, is multiplied by the variance inflation factor VIF_j = 1/(1 − R_j²)!

If there is multicollinearity, R_j² → 1 and σ²_{β̂_j} → +∞, so the estimates of β_j are very noisy!

Remedy: remove the regressor or, whenever possible, increase T to have SST_j → +∞.
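
As a small numerical check (with simulated data and the true σ_ε² plugged in, both assumptions for the example), the formula σ²_{β̂_j} = σ_ε² / (SST_j (1 − R_j²)) gives the same number as the j-th diagonal element of σ_ε²(X'X)⁻¹:

import numpy as np

rng = np.random.default_rng(6)
T, sigma2 = 200, 0.64
X1 = rng.normal(size=T)
X2 = 0.8 * X1 + rng.normal(scale=0.5, size=T)   # correlated, but not perfectly collinear
X = np.column_stack([np.ones(T), X1, X2])

j = 1                                           # look at beta_1
x_j = X[:, j]
Z = np.delete(X, j, axis=1)                     # the other regressors (incl. the constant)
b = np.linalg.lstsq(Z, x_j, rcond=None)[0]
resid = x_j - Z @ b
SST_j = np.sum((x_j - x_j.mean()) ** 2)
R2_j = 1.0 - (resid @ resid) / SST_j

via_formula = sigma2 / (SST_j * (1.0 - R2_j))
via_matrix = (sigma2 * np.linalg.inv(X.T @ X))[j, j]
print(via_formula, via_matrix)                  # the two numbers coincide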

The Gauss-Markov Theorem


Under the assumptions
1) the DGP is Y_t = β_0 + β_1 X_{t,1} + … + β_K X_{t,K} + ε_t
2) det(X'X) ≠ 0, so X'X is invertible
3) E[ε_t] = 0 (the errors have 0 mean)
4) E[ε_t | X_1, …, X_K] = E[ε_t] = 0 (the errors are mean independent from the regressors)
5) V[ε_j | X_1, …, X_K] = E[ε_j² | X_1, …, X_K] − (E[ε_j | X_1, …, X_K])² = E[ε_j² | X_1, …, X_K] = E[ε_j²] = σ_ε² (the errors are variance independent from the regressors)
6) E[ε_t ε_s] = E[ε_t] E[ε_s] = 0 for all t ≠ s (the errors are uncorrelated)

the OLS estimator β̂ is BLUE (Best Linear Unbiased Estimator): among all the unbiased linear² estimators, it is the one with the least variance.

² β̂ is a linear combination of the data: β̂ = (X'X)⁻¹(X'Y)
