Multiple Linear Regression Model

Let $y$ denote the dependent (study) variable that is linearly related to $k$ independent (explanatory) variables $X_1, X_2, \ldots, X_k$ through the parameters $\beta_1, \beta_2, \ldots, \beta_k$, and we write
$$y = X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \varepsilon.$$
This is called the multiple linear regression model. The parameters $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients associated with $X_1, X_2, \ldots, X_k$ respectively, and $\varepsilon$ is the random error component reflecting the difference between the observed and fitted linear relationship. There can be various reasons for such a difference, e.g., the joint effect of those variables not included in the model, random factors which cannot be accounted for in the model, etc.
Note that the $j$th regression coefficient $\beta_j$ represents the expected change in $y$ per unit change in the $j$th independent variable $X_j$. Assuming $E(\varepsilon) = 0$,
$$\beta_j = \frac{\partial E(y)}{\partial X_j}.$$
Linear model:
A model is said to be linear when it is linear in the parameters. In such a case $\dfrac{\partial y}{\partial \beta_j}$ (or equivalently $\dfrac{\partial E(y)}{\partial \beta_j}$) should not depend on any $\beta$'s. For example,
i) $y = \beta_0 + \beta_1 X$ is a linear model as it is linear in the parameters.
ii) $y = \beta_0 X^{\beta_1}$ can be written as $\log y = \log\beta_0 + \beta_1 \log X$, i.e., $y^* = \beta_0^* + \beta_1 x^*$, which is linear in the parameters $\beta_0^*$ and $\beta_1$ but nonlinear in the variables $y^* = \log y$, $x^* = \log X$. So it is a linear model.
iii) $y = \beta_0 + \beta_1 X + \beta_2 X^2$ is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$ but nonlinear in the variable $X$. So it is a linear model.
iv) $y = \beta_0 + \dfrac{\beta_1}{X - \beta_2}$ is nonlinear in the parameters and variables both. So it is a nonlinear model.
v) $y = \beta_0 + \beta_1 X^{\beta_2}$ is nonlinear in the parameters and variables both. So it is a nonlinear model.
vi) $y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$ is a cubic polynomial model which can be written as
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3,$$
which is linear in the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ and linear in the variables $X_1 = X$, $X_2 = X^2$, $X_3 = X^3$.
So it is a linear model.
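As a numerical illustration of point (vi), the following is a minimal sketch (hypothetical data and coefficient values, assuming NumPy is available) that fits the cubic polynomial by ordinary least squares after treating $X$, $X^2$, $X^3$ as three separate regressors, which is exactly what makes the model linear in the parameters.

```python
import numpy as np

# Hypothetical data generated from the cubic model in (vi):
# y = beta0 + beta1*X + beta2*X^2 + beta3*X^3 + error
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(-2, 2, size=n)
beta_true = np.array([1.0, 0.5, -0.3, 0.2])            # [beta0, beta1, beta2, beta3]
y = beta_true[0] + beta_true[1]*X + beta_true[2]*X**2 + beta_true[3]*X**3 \
    + rng.normal(scale=0.1, size=n)

# The model is linear in the parameters: stack X1=X, X2=X^2, X3=X^3 as columns
Z = np.column_stack([np.ones(n), X, X**2, X**3])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)               # ordinary least squares
print(b)                                                 # close to beta_true
```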
Example:
The income and education of a person are related. It is expected that, on average, a higher level of education
provides higher income. So a simple linear regression model can be expressed as
$$\text{income} = \beta_0 + \beta_1\,\text{education} + \varepsilon.$$
Note that $\beta_1$ reflects the change in income per unit change in education, and $\beta_0$ reflects the income when education is zero, as it is expected that even an illiterate person can also have some income.
Further, this model neglects that most people have higher income when they are older than when they are young, regardless of education. So $\beta_1$ will over-state the marginal impact of education. If age and education are positively correlated, then the regression model will associate all the observed increase in income with an increase in education. So a better model is
$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \varepsilon.$$
Often it is observed that income tends to rise less rapidly in the later earning years than in the early years. To accommodate such a possibility, we might extend the model to
$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + \varepsilon.$$
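The following is a minimal sketch of how such a model could be fit by least squares; the education, age and income numbers and the coefficients used to generate them are simulated and purely illustrative, not estimates from any real data.

```python
import numpy as np

# Hypothetical education/age/income data; the generating coefficients are illustrative only.
rng = np.random.default_rng(1)
n = 500
education = rng.integers(8, 21, size=n).astype(float)    # years of schooling
age = rng.uniform(20, 65, size=n)
income = 5.0 + 2.5*education + 1.8*age - 0.015*age**2 + rng.normal(scale=4.0, size=n)

# Design matrix for: income = b0 + b1*education + b2*age + b3*age^2 + error
X = np.column_stack([np.ones(n), education, age, age**2])
b, *_ = np.linalg.lstsq(X, income, rcond=None)
print(dict(zip(["b0", "b1_education", "b2_age", "b3_age_sq"], b.round(3))))
```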
This is how we proceed for regression modeling in real-life situations. One needs to consider the experimental conditions and the phenomenon before deciding how many, why and how to choose the dependent and independent variables.
Model set up:
Let an experiment be conducted n times, and the data is obtained as follows:
Observation number    Response ($y$)    Explanatory variables ($X_1, X_2, \ldots, X_k$)
1                     $y_1$             $x_{11}, x_{12}, \ldots, x_{1k}$
2                     $y_2$             $x_{21}, x_{22}, \ldots, x_{2k}$
$\vdots$              $\vdots$          $\vdots$
$n$                   $y_n$             $x_{n1}, x_{n2}, \ldots, x_{nk}$

Each of the $n$ observations is assumed to satisfy the same relationship, and these $n$ equations can be written jointly as $y = X\beta + \varepsilon$.
In general, the model with $k$ explanatory variables can be expressed as
$$y = X\beta + \varepsilon$$
where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of $n$ observations on the study variable, $X$ is the $n \times k$ matrix of observations on the $k$ explanatory variables with $(i, j)$th element $x_{ij}$, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a $k \times 1$ vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random error components. The following assumptions are made:
(i) $E(\varepsilon) = 0$
(ii) $E(\varepsilon\varepsilon') = \sigma^2 I_n$
(iii) $\operatorname{Rank}(X) = k$
(iv) $X$ is a non-stochastic matrix
(v) $\varepsilon \sim N(0, \sigma^2 I_n)$.
These assumptions are used to study the statistical properties of the estimator of the regression coefficients. The following assumption is required to study, in particular, the large-sample properties of the estimators.
(vi) $\lim_{n \to \infty} \dfrac{X'X}{n} = \Delta$ exists and is a non-stochastic and nonsingular matrix (with finite elements).
The explanatory variables can also be stochastic in some cases. We assume that X is non-stochastic unless
stated separately.
We consider the problems of estimation and testing of hypotheses on the regression coefficient vector under the stated assumptions.
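As a concrete instance of this setup, the sketch below simulates one data set from $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_n)$ and an $X$ fixed in advance; the dimensions and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 25, 3
sigma = 0.8                                    # hypothetical error standard deviation
beta = np.array([2.0, -1.0, 0.5])              # hypothetical regression coefficients

# X is held fixed (non-stochastic); here it is generated once and then treated as given
X = rng.uniform(0, 10, size=(n, k))
assert np.linalg.matrix_rank(X) == k           # assumption (iii): full column rank

eps = rng.normal(0.0, sigma, size=n)           # assumption (v): eps ~ N(0, sigma^2 I_n)
y = X @ beta + eps                             # the model y = X beta + eps
print(y[:5])
```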
Estimation of parameters:
A general procedure for the estimation of the regression coefficient vector is to minimize
$$\sum_{i=1}^{n} M(\varepsilon_i) = \sum_{i=1}^{n} M\big(y_i - x_{i1}\beta_1 - x_{i2}\beta_2 - \cdots - x_{ik}\beta_k\big)$$
for a suitably chosen function $M$, e.g., $M(x) = |x|$, $M(x) = x^2$ or $M(x) = |x|^p$, in general.
We consider the principle of least squares, which is related to $M(x) = x^2$, and the method of maximum likelihood estimation for the estimation of the parameters.
Principle of ordinary least squares (OLS)
Let $B$ be the set of all possible vectors $\beta$. If there is no further information, then $B$ is the $k$-dimensional real Euclidean space. The objective is to find a vector $b' = (b_1, b_2, \ldots, b_k)$ from $B$ that minimizes the sum of squared deviations of the $\varepsilon_i$'s, i.e.,
$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta)$$
for given $y$ and $X$. A minimum will always exist, as $S(\beta)$ is a real-valued, convex and differentiable function. Write
$$S(\beta) = y'y + \beta'X'X\beta - 2\beta'X'y.$$
Differentiating $S(\beta)$ with respect to $\beta$,
$$\frac{\partial S(\beta)}{\partial \beta} = 2X'X\beta - 2X'y,$$
$$\frac{\partial^2 S(\beta)}{\partial \beta\, \partial \beta'} = 2X'X \quad \text{(at least non-negative definite)}.$$
The normal equation is
$$\frac{\partial S(\beta)}{\partial \beta} = 0 \;\Rightarrow\; X'Xb = X'y,$$
where the following result is used:
Result: If $f(z) = Z'AZ$ is a quadratic form, $Z$ is an $m \times 1$ vector and $A$ is any $m \times m$ symmetric matrix, then $\dfrac{\partial f(z)}{\partial z} = 2Az$.
Since it is assumed that $\operatorname{rank}(X) = k$ (full rank), $X'X$ is positive definite and the unique solution of the normal equation is
$$b = (X'X)^{-1}X'y,$$
which is termed the ordinary least squares estimator (OLSE) of $\beta$. Since $\dfrac{\partial^2 S(\beta)}{\partial \beta\, \partial \beta'} = 2X'X$ is at least non-negative definite, $b$ minimizes $S(\beta)$.
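A minimal computational sketch of the OLSE on simulated (hypothetical) data: the normal equations $X'Xb = X'y$ are solved directly, and the first-order condition is checked numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k))                  # treated as given once generated
beta = np.array([1.0, -2.0, 0.5])            # hypothetical true coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)

# Normal equations X'X b = X'y; solve rather than forming (X'X)^{-1} explicitly
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                     # OLSE, close to beta

# Sanity check: gradient of S(beta) at b is 2X'Xb - 2X'y = 0
print(np.allclose(2*X.T @ X @ b - 2*X.T @ y, 0.0))
```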
(i) Note that any solution of the normal equations $X'Xb = X'y$ can be written as
$$b = (X'X)^{-}X'y + \big(I - (X'X)^{-}X'X\big)\omega,$$
where $(X'X)^{-}$ is a generalized inverse of $X'X$ and $\omega$ is an arbitrary vector. The generalized inverse satisfies $X(X'X)^{-}X'X = X$, so the fitted value is
$$\hat{y} = Xb = X(X'X)^{-}X'y + X\big(I - (X'X)^{-}X'X\big)\omega = X(X'X)^{-}X'y,$$
which is independent of $\omega$. This implies that $\hat{y}$ has the same value for all solutions $b$ of $X'Xb = X'y$.
(ii) Note that for any $\beta$,
$$S(\beta) = \big[y - Xb + X(b - \beta)\big]'\big[y - Xb + X(b - \beta)\big]$$
$$= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) + 2(b - \beta)'X'(y - Xb)$$
$$= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) \quad \text{(using } X'Xb = X'y)$$
$$\geq (y - Xb)'(y - Xb) = S(b),$$
so the minimum of $S(\beta)$ is attained at $b$, and the minimum value is
$$S(b) = y'y - 2y'Xb + b'X'Xb = y'y - b'X'Xb = y'y - \hat{y}'\hat{y}.$$
In the case of $\hat{\beta} = b$,
$$\hat{y} = Xb = X(X'X)^{-1}X'y = Hy,$$
where $H = X(X'X)^{-1}X'$ is termed the hat matrix; it is symmetric and idempotent, with $\operatorname{tr} H = \operatorname{tr}\big[X(X'X)^{-1}X'\big] = \operatorname{tr}\big[(X'X)^{-1}X'X\big] = k$.
Residuals
The difference between the observed and the fitted values of the study variable is called the residual. It is denoted as
$$e = y - \hat{y} = y - Xb = y - Hy = (I - H)y = \bar{H}y,$$
where $\bar{H} = I - H$.
Note that
(i) $\bar{H}$ is a symmetric matrix,
(ii) $\bar{H}$ is an idempotent matrix, i.e., $\bar{H}\bar{H} = (I - H)(I - H) = (I - H) = \bar{H}$, and
(iii) $\operatorname{tr}\bar{H} = \operatorname{tr} I_n - \operatorname{tr} H = n - k$.
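A short sketch, on hypothetical simulated data, verifying these properties of $H$ and $\bar{H}$ numerically (symmetry, idempotency, $\operatorname{tr}\bar{H} = n - k$) together with the orthogonality $X'e = 0$ implied by the normal equations.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 4
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix X(X'X)^{-1}X'
H_bar = np.eye(n) - H
e = H_bar @ y                                # residuals e = (I - H)y

print(np.allclose(H_bar, H_bar.T))           # symmetric
print(np.allclose(H_bar @ H_bar, H_bar))     # idempotent
print(np.isclose(np.trace(H_bar), n - k))    # trace = n - k
print(np.allclose(X.T @ e, 0.0))             # residuals orthogonal to columns of X
```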
Properties of OLSE
(i) Estimation error
The estimation error of $b$ is
$$b - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon.$$
(ii) Bias
Since $X$ is assumed to be non-stochastic and $E(\varepsilon) = 0$,
$$E(b - \beta) = (X'X)^{-1}X'E(\varepsilon) = 0.$$
Thus the OLSE is an unbiased estimator of $\beta$.
(iii) Covariance matrix
The covariance matrix of $b$ is
$$V(b) = E\big[(b - \beta)(b - \beta)'\big] = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$
(iv) Variance
The variance of $b$ can be obtained as the sum of the variances of $b_1, b_2, \ldots, b_k$, which is the trace of the covariance matrix of $b$. Thus
$$\operatorname{Var}(b) = \operatorname{tr} V(b) = \sum_{i=1}^{k} E(b_i - \beta_i)^2 = \sum_{i=1}^{k} \operatorname{Var}(b_i).$$
(v) Estimation of $\sigma^2$
The residual sum of squares is
$$SS_{res} = e'e = (y - Xb)'(y - Xb) = y'(I - H)'(I - H)y = y'(I - H)y = y'\bar{H}y.$$
Also
$$SS_{res} = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb = y'y - b'X'y \quad \text{(using } X'Xb = X'y),$$
and
$$SS_{res} = y'\bar{H}y = (X\beta + \varepsilon)'\bar{H}(X\beta + \varepsilon) = \varepsilon'\bar{H}\varepsilon \quad \text{(using } \bar{H}X = 0).$$
Since $E(\varepsilon'\bar{H}\varepsilon) = \sigma^2 \operatorname{tr}\bar{H} = (n - k)\sigma^2$, thus
$$E[y'\bar{H}y] = (n - k)\sigma^2$$
or
$$E\left[\frac{y'\bar{H}y}{n - k}\right] = \sigma^2$$
or
$$E[MS_{res}] = \sigma^2,$$
where $MS_{res} = \dfrac{SS_{res}}{n - k}$ is the mean sum of squares due to residual.
Thus an unbiased estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = MS_{res} = s^2 \ \text{(say)},$$
which is a model-dependent estimator.
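A small Monte Carlo sketch (hypothetical design and error variance) illustrating the unbiasedness of $s^2 = SS_{res}/(n-k)$: across repeated samples its average is close to the true $\sigma^2$, whereas dividing $SS_{res}$ by $n$ understates it.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2 = 40, 3, 2.0                    # hypothetical design size and error variance
X = rng.normal(size=(n, k))
beta = np.array([1.0, 0.5, -1.5])

s2_vals, mle_vals = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    ss_res = e @ e
    s2_vals.append(ss_res / (n - k))         # unbiased: SS_res / (n - k)
    mle_vals.append(ss_res / n)              # SS_res / n, biased downward

print(np.mean(s2_vals))                      # approximately 2.0
print(np.mean(mle_vals))                     # approximately 2.0 * (n - k) / n
```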
Gauss-Markov Theorem:
The ordinary least squares estimator (OLSE) $b$ is the best linear unbiased estimator (BLUE) of $\beta$.
Proof: The OLSE of $\beta$ is
$$b = (X'X)^{-1}X'y,$$
which is a linear function of $y$. Consider an arbitrary linear estimator
$$b^* = a'y$$
of the linear parametric function $\ell'\beta$, where the elements of $a$ are arbitrary constants. Then for $b^*$,
$$E(b^*) = E(a'y) = a'X\beta,$$
so $b^*$ is an unbiased estimator of $\ell'\beta$ when $a'X = \ell'$. We therefore restrict attention to estimators satisfying this condition.
Further,
$$\operatorname{Var}(a'y) = a'\operatorname{Var}(y)a = \sigma^2 a'a,$$
$$\operatorname{Var}(\ell'b) = \ell'\operatorname{Var}(b)\ell = \sigma^2 \ell'(X'X)^{-1}\ell = \sigma^2 a'X(X'X)^{-1}X'a.$$
Consider
$$\operatorname{Var}(a'y) - \operatorname{Var}(\ell'b) = \sigma^2\big[a'a - a'X(X'X)^{-1}X'a\big] = \sigma^2 a'\big[I - X(X'X)^{-1}X'\big]a = \sigma^2 a'(I - H)a \geq 0,$$
since $(I - H)$ is symmetric and idempotent, hence non-negative definite.
This reveals that if $b^*$ is any linear unbiased estimator of $\ell'\beta$, then its variance must be no smaller than that of $\ell'b$. Consequently, $b$ is the best linear unbiased estimator, where 'best' refers to the fact that $b$ is efficient within the class of linear and unbiased estimators.
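The inequality $\operatorname{Var}(a'y) \geq \operatorname{Var}(\ell'b)$ can also be checked numerically. In the sketch below (hypothetical $X$ and $\ell$), competing linear unbiased estimators are generated via the construction $a = X(X'X)^{-1}\ell + (I - H)c$ for arbitrary $c$, which guarantees $a'X = \ell'$; their variances never fall below that of $\ell'b$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 3
X = rng.normal(size=(n, k))
ell = np.array([1.0, -1.0, 2.0])             # hypothetical linear parametric function l'beta

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T

var_olse = ell @ XtX_inv @ ell               # Var(l'b) / sigma^2

# Any a of the form X(X'X)^{-1} l + (I - H)c satisfies a'X = l' (unbiasedness)
for _ in range(5):
    c = rng.normal(size=n)
    a = X @ XtX_inv @ ell + (np.eye(n) - H) @ c
    assert np.allclose(a @ X, ell)           # unbiasedness constraint holds
    var_linear = a @ a                       # Var(a'y) / sigma^2
    print(var_linear >= var_olse - 1e-10)    # always True: OLSE has smallest variance
```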
Maximum likelihood estimation:
In the model $y = X\beta + \varepsilon$, it is assumed that the errors are normally and independently distributed with constant variance $\sigma^2$, i.e., $\varepsilon \sim N(0, \sigma^2 I_n)$. The likelihood function is the joint density of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, given as
$$L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2\right) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}\varepsilon'\varepsilon\right) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right).$$
Since the log transformation is monotonic, we maximize $\ln L(\beta, \sigma^2)$ instead of $L(\beta, \sigma^2)$:
$$\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$
The maximum likelihood estimators (m.l.e.) of $\beta$ and $\sigma^2$ are obtained by equating the first-order derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ to zero:
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \beta} = \frac{1}{2\sigma^2}\, 2X'(y - X\beta) = 0,$$
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\beta)'(y - X\beta) = 0.$$
The likelihood equations are given by
$$\tilde{\beta} = (X'X)^{-1}X'y,$$
$$\tilde{\sigma}^2 = \frac{1}{n}(y - X\tilde{\beta})'(y - X\tilde{\beta}).$$
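A brief sketch on simulated (hypothetical) data confirming that the closed-form m.l.e. $\tilde{\beta}$ coincides with the OLSE and that the resulting $(\tilde{\beta}, \tilde{\sigma}^2)$ attains a higher log-likelihood than nearby parameter values.

```python
import numpy as np

def log_lik(beta, sigma2, y, X):
    """Log-likelihood ln L(beta, sigma^2) of the normal linear model."""
    n = len(y)
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)

rng = np.random.default_rng(6)
n, k = 80, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(scale=0.7, size=n)

beta_mle = np.linalg.solve(X.T @ X, X.T @ y)            # same as the OLSE
sigma2_mle = (y - X @ beta_mle) @ (y - X @ beta_mle) / n

# The closed-form m.l.e. should beat nearby parameter values
ll_hat = log_lik(beta_mle, sigma2_mle, y, X)
for _ in range(5):
    ll_other = log_lik(beta_mle + 0.05 * rng.normal(size=k),
                       sigma2_mle * np.exp(0.1 * rng.normal()), y, X)
    print(ll_hat >= ll_other)                            # True
```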
Further, to verify that these values maximize the likelihood function, we find
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\, \partial \beta'} = -\frac{1}{\sigma^2} X'X,$$
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta),$$
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\, \partial \sigma^2} = -\frac{1}{\sigma^4} X'(y - X\beta).$$
Thus the Hessian matrix of second-order partial derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ is
$$\begin{pmatrix} \dfrac{\partial^2 \ln L(\beta,\sigma^2)}{\partial \beta\, \partial \beta'} & \dfrac{\partial^2 \ln L(\beta,\sigma^2)}{\partial \beta\, \partial \sigma^2} \\[2ex] \dfrac{\partial^2 \ln L(\beta,\sigma^2)}{\partial \sigma^2\, \partial \beta'} & \dfrac{\partial^2 \ln L(\beta,\sigma^2)}{\partial (\sigma^2)^2} \end{pmatrix},$$
which is negative definite at $\beta = \tilde{\beta}$ and $\sigma^2 = \tilde{\sigma}^2$. This ensures that the likelihood function is maximized at these values.
Consistency of estimators
(i) Consistency of $b$
Under assumption (vi), i.e., $\lim_{n\to\infty} \dfrac{X'X}{n} = \Delta$ exists and is a non-stochastic and nonsingular matrix,
$$\lim_{n\to\infty} V(b) = \lim_{n\to\infty} \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1} = 0 \cdot \Delta^{-1} = 0.$$
This implies that the OLSE converges to $\beta$ in quadratic mean. Thus the OLSE is a consistent estimator of $\beta$. This holds true for maximum likelihood estimators also.
The same conclusion can also be proved using the concept of convergence in probability.
An estimator $\hat{\theta}_n$ converges to $\theta$ in probability if $\lim_{n\to\infty} P\big[\,|\hat{\theta}_n - \theta| \geq \delta\,\big] = 0$ for any $\delta > 0$, and we write $\operatorname{plim}(\hat{\theta}_n) = \theta$.
The consistency of OLSE can be obtained under the weaker assumption that
$$\operatorname{plim}\left(\frac{X'X}{n}\right) = \Delta^*$$
exists and is a nonsingular and non-stochastic matrix, and that
$$\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = 0.$$
Since
$$b - \beta = (X'X)^{-1}X'\varepsilon = \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n},$$
we have
$$\operatorname{plim}(b - \beta) = \operatorname{plim}\left(\frac{X'X}{n}\right)^{-1} \operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = \Delta^{*-1} \cdot 0 = 0.$$
Thus $b$ is a consistent estimator of $\beta$. The same is true for the m.l.e. also.
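A simulation sketch (hypothetical coefficients, i.i.d. regressors and errors so that the plim conditions above hold) showing $\lVert b - \beta \rVert$ shrinking as $n$ grows, consistent with $\operatorname{plim}(b) = \beta$.

```python
import numpy as np

rng = np.random.default_rng(7)
beta = np.array([1.0, -0.5, 2.0])                      # hypothetical true coefficients

for n in [50, 500, 5000, 50000]:
    X = rng.normal(size=(n, 3))                        # plim(X'X/n) nonsingular
    eps = rng.normal(scale=1.0, size=n)                # plim(X'eps/n) = 0
    y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, np.linalg.norm(b - beta))                 # shrinks towards 0 as n grows
```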
(ii) Consistency of $s^2$
Now we look at the consistency of $s^2$ as an estimator of $\sigma^2$. We have
$$s^2 = \frac{1}{n - k}\, e'e = \frac{1}{n - k}\, \varepsilon'\bar{H}\varepsilon = \frac{1}{n - k}\Big[\varepsilon'\varepsilon - \varepsilon'X(X'X)^{-1}X'\varepsilon\Big] = \left(1 - \frac{k}{n}\right)^{-1}\left[\frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right].$$
Note that $\dfrac{\varepsilon'\varepsilon}{n} = \dfrac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2$ and $\{\varepsilon_i^2,\ i = 1, 2, \ldots, n\}$ is a sequence of independently and identically distributed random variables with mean $\sigma^2$. Using the law of large numbers,
$$\operatorname{plim}\left(\frac{\varepsilon'\varepsilon}{n}\right) = \sigma^2.$$
Also,
$$\operatorname{plim}\left[\frac{\varepsilon'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right] = \operatorname{plim}\left(\frac{\varepsilon'X}{n}\right)\left[\operatorname{plim}\left(\frac{X'X}{n}\right)\right]^{-1}\operatorname{plim}\left(\frac{X'\varepsilon}{n}\right) = 0 \cdot \Delta^{*-1} \cdot 0 = 0.$$
$$\Rightarrow \operatorname{plim}(s^2) = (1 - 0)^{-1}\,(\sigma^2 - 0) = \sigma^2.$$
Thus $s^2$ is a consistent estimator of $\sigma^2$. The same holds true for the m.l.e. also.
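A companion sketch (same hypothetical kind of setup as above) showing $s^2 = e'e/(n-k)$ settling near the true $\sigma^2$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(8)
beta, sigma = np.array([1.0, -0.5, 2.0]), 1.5          # hypothetical parameters
k = len(beta)

for n in [50, 500, 5000, 50000]:
    X = rng.normal(size=(n, k))
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = e @ e / (n - k)
    print(n, s2)                                       # approaches sigma^2 = 2.25
```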