Chapter 3

Multiple Linear Regression Model


We consider the problem of regression when the study variable depends on more than one explanatory (or independent) variable, called the multiple linear regression model. This model generalizes the simple linear regression in two ways: it allows the mean function $E(y)$ to depend on more than one explanatory variable, and it allows shapes other than a straight line, although it does not allow for arbitrary shapes.

The linear model:


Let $y$ denote the dependent (or study) variable that is linearly related to $k$ independent (or explanatory) variables $X_1, X_2, \ldots, X_k$ through the parameters $\beta_1, \beta_2, \ldots, \beta_k$, and we write

$$y = X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + \varepsilon.$$

This is called the multiple linear regression model. The parameters $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients associated with $X_1, X_2, \ldots, X_k$ respectively, and $\varepsilon$ is the random error component reflecting the difference between the observed and the fitted linear relationship. There can be various reasons for such a difference, e.g., the joint effect of variables not included in the model, random factors that cannot be accounted for in the model, etc.

Note that the $j$th regression coefficient $\beta_j$ represents the expected change in $y$ per unit change in the $j$th independent variable $X_j$. Assuming $E(\varepsilon) = 0$,

$$\beta_j = \frac{\partial E(y)}{\partial X_j}.$$

Linear model:
A model is said to be linear when it is linear in the parameters. In such a case, $\partial y / \partial \beta_j$ (or equivalently $\partial E(y) / \partial \beta_j$) should not depend on any $\beta$'s. For example,

i) $y = \beta_0 + \beta_1 X$ is a linear model as it is linear in the parameters.

ii) $y = \beta_0 X^{\beta_1}$ can be written as

$$\log y = \log \beta_0 + \beta_1 \log X$$
$$y^* = \beta_0^* + \beta_1 x^*,$$

which is linear in the parameters $\beta_0^*$ and $\beta_1$, but nonlinear in the variables $y^* = \log y$, $x^* = \log x$. So it is a linear model.
iii) $y = \beta_0 + \beta_1 X + \beta_2 X^2$
is linear in the parameters $\beta_0, \beta_1$ and $\beta_2$ but nonlinear in the variable $X$. So it is a linear model.

iv) $y = \beta_0 + \dfrac{\beta_1}{X - \beta_2}$
is nonlinear in the parameters and the variables both. So it is a nonlinear model.

v) $y = \beta_0 + \beta_1 X^{\beta_2}$
is nonlinear in the parameters and the variables both. So it is a nonlinear model.

vi) $y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$
is a cubic polynomial model which can be written as

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3,$$

which is linear in the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ and linear in the variables $X_1 = X$, $X_2 = X^2$, $X_3 = X^3$. So it is a linear model.
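As an aside (not part of the original notes), the point of example vi) is easy to check numerically: because the cubic model is linear in its parameters, it can be fitted by ordinary least squares once the powers of $X$ are treated as separate columns of a design matrix. The following minimal Python sketch uses synthetic, purely illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2 + 1.5*X - 0.5*X^2 + 0.1*X^3 + noise (illustrative only)
n = 50
X = rng.uniform(-3, 3, size=n)
y = 2 + 1.5 * X - 0.5 * X**2 + 0.1 * X**3 + rng.normal(scale=0.5, size=n)

# The model is linear in the parameters: build the design matrix [1, X, X^2, X^3]
D = np.column_stack([np.ones(n), X, X**2, X**3])

# OLS fit of the coefficients
b, *_ = np.linalg.lstsq(D, y, rcond=None)
print(b)  # estimates of (beta_0, beta_1, beta_2, beta_3)
```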

Example:
The income and education of a person are related. It is expected that, on average, a higher level of education provides a higher income. So a simple linear regression model can be expressed as

$$\text{income} = \beta_0 + \beta_1\,\text{education} + \varepsilon.$$

Note that $\beta_1$ reflects the change in income per unit change in education, and $\beta_0$ reflects the income when education is zero, as it is expected that even an illiterate person can have some income.

Further, this model neglects that most people have higher income when they are older than when they are young, regardless of education. So $\beta_1$ will overstate the marginal impact of education. If age and education are positively correlated, then the regression model will associate all the observed increase in income with an increase in education. So a better model is

$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \varepsilon.$$

Often it is observed that income tends to rise less rapidly in the later earning years than in the early years. To accommodate such a possibility, we might extend the model to

$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + \varepsilon.$$

This is how we proceed with regression modelling in real-life situations. One needs to consider the experimental conditions and the phenomenon before deciding how many, why, and how to choose the dependent and independent variables.
Model set up:
Let an experiment be conducted $n$ times, and the data be obtained as follows:

Observation number | Response $y$ | Explanatory variables $X_1, X_2, \ldots, X_k$
1 | $y_1$ | $x_{11}, x_{12}, \ldots, x_{1k}$
2 | $y_2$ | $x_{21}, x_{22}, \ldots, x_{2k}$
$\vdots$ | $\vdots$ | $\vdots$
$n$ | $y_n$ | $x_{n1}, x_{n2}, \ldots, x_{nk}$

Assuming that the model is

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,$$

the $n$-tuples of observations are also assumed to follow the same model. Thus they satisfy

$$\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n.
\end{aligned}$$

These $n$ equations can be written as

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
\begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} +
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

or $y = X\beta + \varepsilon$.
In general, the model with $k$ explanatory variables can be expressed as

$$y = X\beta + \varepsilon,$$

where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of $n$ observations on the study variable,

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}$$

is an $n \times k$ matrix of $n$ observations on each of the $k$ explanatory variables, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a $k \times 1$ vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random error components or disturbance terms.

If an intercept term is present, take the first column of $X$ to be $(1, 1, \ldots, 1)'$.
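As an illustration (a sketch, not from the original notes), here is how raw observations are arranged into this matrix form in Python; the variables x1 and x2 and their values are hypothetical.

```python
import numpy as np

# Hypothetical explanatory variables and responses (numbers are made up)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = np.array([3.1, 4.2, 7.9, 8.1, 11.0])

n = len(y)
# With an intercept, the first column of X is (1, 1, ..., 1)'
X = np.column_stack([np.ones(n), x1, x2])
print(X.shape)  # (n, k): here k = 3 columns (intercept, x1, x2)
```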


Assumptions in multiple linear regression model
Some assumptions are needed in the model $y = X\beta + \varepsilon$ for drawing statistical inferences. The following assumptions are made:

(i) $E(\varepsilon) = 0$

(ii) $E(\varepsilon\varepsilon') = \sigma^2 I_n$

(iii) $\mathrm{rank}(X) = k$

(iv) $X$ is a non-stochastic matrix

(v) $\varepsilon \sim N(0, \sigma^2 I_n)$.

These assumptions are used to study the statistical properties of the estimators of the regression coefficients. The following assumption is required to study, in particular, the large-sample properties of the estimators:

(vi) $\lim_{n \to \infty} \left( \dfrac{X'X}{n} \right) = \Delta$ exists and is a non-stochastic and nonsingular matrix (with finite elements).

The explanatory variables can also be stochastic in some cases. We assume that $X$ is non-stochastic unless stated otherwise.

We consider the problems of estimation and testing of hypotheses on the regression coefficient vector under the stated assumptions.

Estimation of parameters:
A general procedure for the estimation of the regression coefficient vector is to minimize

$$\sum_{i=1}^{n} M(\varepsilon_i) = \sum_{i=1}^{n} M(y_i - x_{i1}\beta_1 - x_{i2}\beta_2 - \cdots - x_{ik}\beta_k)$$

for a suitably chosen function $M$.

Some examples of the choice of $M$ are

$$M(x) = |x|,$$
$$M(x) = x^2,$$
$$M(x) = |x|^p \text{ in general.}$$

We consider the principle of least squares, which corresponds to $M(x) = x^2$, and the method of maximum likelihood estimation for the estimation of the parameters.
Principle of ordinary least squares (OLS)
Let $B$ be the set of all possible vectors $\beta$. If there is no further information, then $B$ is the $k$-dimensional real Euclidean space. The objective is to find a vector $b' = (b_1, b_2, \ldots, b_k)$ from $B$ that minimizes the sum of squared deviations of the $\varepsilon_i$'s, i.e.,

$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta),$$

for given $y$ and $X$. A minimum will always exist, as $S(\beta)$ is a real-valued, convex and differentiable function. Write

$$S(\beta) = y'y + \beta'X'X\beta - 2\beta'X'y.$$

Differentiating $S(\beta)$ with respect to $\beta$,

$$\frac{\partial S(\beta)}{\partial \beta} = 2X'X\beta - 2X'y,$$

$$\frac{\partial^2 S(\beta)}{\partial \beta^2} = 2X'X \quad \text{(at least non-negative definite)}.$$

The normal equation is

$$\frac{\partial S(\beta)}{\partial \beta} = 0 \;\Rightarrow\; X'Xb = X'y,$$

where the following result is used:

Result: If $f(z) = Z'AZ$ is a quadratic form, $Z$ is an $m \times 1$ vector and $A$ is any $m \times m$ symmetric matrix, then $\dfrac{\partial f(z)}{\partial z} = 2Az$.

Since it is assumed that $\mathrm{rank}(X) = k$ (full rank), $X'X$ is positive definite, and the unique solution of the normal equation is

$$b = (X'X)^{-1}X'y,$$

which is termed the ordinary least squares estimator (OLSE) of $\beta$.

Since $\dfrac{\partial^2 S(\beta)}{\partial \beta^2}$ is at least non-negative definite, $b$ minimizes $S(\beta)$.

In case $X$ is not of full rank, then

$$b = (X'X)^{-}X'y + \left[ I - (X'X)^{-}X'X \right]\omega,$$

where $(X'X)^{-}$ is the generalized inverse of $X'X$ and $\omega$ is an arbitrary vector. The generalized inverse $(X'X)^{-}$ of $X'X$ satisfies

$$\begin{aligned}
X'X(X'X)^{-}X'X &= X'X \\
X(X'X)^{-}X'X &= X \\
X'X(X'X)^{-}X' &= X'.
\end{aligned}$$
Theorem:
(i) Let $\hat{y} = Xb$ be the empirical predictor of $y$. Then $\hat{y}$ has the same value for all solutions $b$ of $X'Xb = X'y$.
(ii) $S(\beta)$ attains its minimum for any solution of $X'Xb = X'y$.

Proof:
(i) Let $b$ be any member in

$$b = (X'X)^{-}X'y + \left[ I - (X'X)^{-}X'X \right]\omega.$$

Since $X(X'X)^{-}X'X = X$, it follows that

$$Xb = X(X'X)^{-}X'y + X\left[ I - (X'X)^{-}X'X \right]\omega = X(X'X)^{-}X'y,$$

which is independent of $\omega$. This implies that $\hat{y}$ has the same value for all solutions $b$ of $X'Xb = X'y$.

(ii) Note that for any $\beta$,

$$\begin{aligned}
S(\beta) &= \left[ y - Xb + X(b - \beta) \right]'\left[ y - Xb + X(b - \beta) \right] \\
&= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) + 2(b - \beta)'X'(y - Xb) \\
&= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) \quad (\text{using } X'Xb = X'y) \\
&\geq (y - Xb)'(y - Xb) = S(b),
\end{aligned}$$

and the minimum value is

$$S(b) = y'y - 2b'X'y + b'X'Xb = y'y - b'X'Xb = y'y - \hat{y}'\hat{y}.$$

Fitted values:
If $\hat{\beta}$ is any estimator of $\beta$ for the model $y = X\beta + \varepsilon$, then the fitted values are defined as $\hat{y} = X\hat{\beta}$.

In the case of $\hat{\beta} = b$,

$$\hat{y} = Xb = X(X'X)^{-1}X'y = Hy,$$

where $H = X(X'X)^{-1}X'$ is termed the hat matrix, which is

(i) symmetric,
(ii) idempotent (i.e., $HH = H$), and
(iii) of trace $\mathrm{tr}\,H = \mathrm{tr}\,X(X'X)^{-1}X' = \mathrm{tr}\,X'X(X'X)^{-1} = \mathrm{tr}\,I_k = k$.
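These three properties are easy to verify numerically (an illustrative sketch; the design matrix here is arbitrary synthetic data):

```python
import numpy as np

rng = np.random.default_rng(3)

n, k = 20, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix H = X (X'X)^{-1} X'

print(np.allclose(H, H.T))          # symmetric
print(np.allclose(H @ H, H))        # idempotent: HH = H
print(np.isclose(np.trace(H), k))   # tr(H) = k
```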

Residuals
The difference between the observed and the fitted values of the study variable is called the residual. It is denoted as

$$e = y - \hat{y} = y - Xb = y - Hy = (I - H)y = \bar{H}y,$$

where $\bar{H} = I - H$.

Note that
(i) $\bar{H}$ is a symmetric matrix,
(ii) $\bar{H}$ is an idempotent matrix, i.e., $\bar{H}\bar{H} = (I - H)(I - H) = (I - H) = \bar{H}$, and
(iii) $\mathrm{tr}\,\bar{H} = \mathrm{tr}\,I_n - \mathrm{tr}\,H = n - k$.

Properties of OLSE
(i) Estimation error:
The estimation error of $b$ is

$$b - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon.$$

(ii) Bias:
Since $X$ is assumed to be non-stochastic and $E(\varepsilon) = 0$,

$$E(b - \beta) = (X'X)^{-1}X'E(\varepsilon) = 0.$$

Thus the OLSE is an unbiased estimator of $\beta$.

(iii) Covariance matrix:
The covariance matrix of $b$ is

$$\begin{aligned}
V(b) &= E\left[ (b - \beta)(b - \beta)' \right] \\
&= E\left[ (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \right] \\
&= (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'IX(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}.
\end{aligned}$$

(iv) Variance:
The variance of $b$ can be obtained as the sum of the variances of $b_1, b_2, \ldots, b_k$, which is the trace of the covariance matrix of $b$. Thus

$$\mathrm{Var}(b) = \mathrm{tr}\left[ V(b) \right] = \sum_{i=1}^{k} E(b_i - \beta_i)^2 = \sum_{i=1}^{k} \mathrm{Var}(b_i).$$

Estimation of $\sigma^2$
The least-squares criterion cannot be used to estimate $\sigma^2$ because $\sigma^2$ does not appear in $S(\beta)$. Since $E(\varepsilon_i^2) = \sigma^2$, we attempt to use the residuals $e_i$ to estimate $\sigma^2$ as follows:

$$e = y - \hat{y} = y - X(X'X)^{-1}X'y = [I - X(X'X)^{-1}X']y = \bar{H}y.$$

Consider the residual sum of squares

$$SS_{res} = \sum_{i=1}^{n} e_i^2 = e'e = (y - Xb)'(y - Xb) = y'(I - H)'(I - H)y = y'(I - H)y = y'\bar{H}y.$$

Also,

$$SS_{res} = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb = y'y - b'X'y \quad (\text{using } X'Xb = X'y),$$

and

$$SS_{res} = y'\bar{H}y = (X\beta + \varepsilon)'\bar{H}(X\beta + \varepsilon) = \varepsilon'\bar{H}\varepsilon \quad (\text{using } \bar{H}X = 0).$$

Since $\varepsilon \sim N(0, \sigma^2 I)$, we have $y \sim N(X\beta, \sigma^2 I)$. Hence $y'\bar{H}y \sim \sigma^2 \chi^2(n - k)$. Thus

$$E[y'\bar{H}y] = (n - k)\sigma^2,$$

or

$$E\left[ \frac{y'\bar{H}y}{n - k} \right] = \sigma^2, \quad \text{i.e.,} \quad E[MS_{res}] = \sigma^2,$$

where $MS_{res} = \dfrac{SS_{res}}{n - k}$ is the mean sum of squares due to residuals.

Thus an unbiased estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = MS_{res} = s^2 \ \text{(say)},$$

which is a model-dependent estimator.
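The following sketch (synthetic data, illustrative only) computes $s^2$ and checks that it is close to the true error variance used to generate the data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data, for illustration only
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma = 0.7
y = X @ beta + rng.normal(scale=sigma, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b              # residuals e = (I - H) y
s2 = (e @ e) / (n - k)     # unbiased estimator s^2 = SS_res / (n - k)
print(s2)                  # should be close to sigma**2 = 0.49
```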

Variance of $\hat{y}$
The variance (covariance matrix) of $\hat{y}$ is

$$V(\hat{y}) = V(Xb) = X V(b) X' = \sigma^2 X(X'X)^{-1}X' = \sigma^2 H.$$

Gauss-Markov Theorem:
The ordinary least squares estimator (OLSE) is the best linear unbiased estimator (BLUE) of $\beta$.

Proof: The OLSE of $\beta$ is

$$b = (X'X)^{-1}X'y,$$

which is a linear function of $y$. Consider an arbitrary linear estimator

$$b^* = a'y$$

of the linear parametric function $\ell'\beta$, where the elements of $a$ are arbitrary constants. Then for $b^*$,

$$E(b^*) = E(a'y) = a'X\beta,$$

and so $b^*$ is an unbiased estimator of $\ell'\beta$ when

$$E(b^*) = a'X\beta = \ell'\beta \;\Rightarrow\; a'X = \ell'.$$

Since we wish to consider only those estimators that are linear and unbiased, we restrict ourselves to estimators for which $a'X = \ell'$.

Further,

$$\mathrm{Var}(a'y) = a'\,\mathrm{Var}(y)\,a = \sigma^2 a'a,$$
$$\mathrm{Var}(\ell'b) = \ell'\,\mathrm{Var}(b)\,\ell = \sigma^2 \ell'(X'X)^{-1}\ell = \sigma^2 a'X(X'X)^{-1}X'a.$$

Consider

$$\begin{aligned}
\mathrm{Var}(a'y) - \mathrm{Var}(\ell'b) &= \sigma^2 \left[ a'a - a'X(X'X)^{-1}X'a \right] \\
&= \sigma^2 a'\left[ I - X(X'X)^{-1}X' \right]a \\
&= \sigma^2 a'(I - H)a.
\end{aligned}$$

Since $(I - H)$ is a positive semi-definite matrix,

$$\mathrm{Var}(a'y) - \mathrm{Var}(\ell'b) \geq 0.$$

This reveals that if $b^*$ is any linear unbiased estimator, then its variance must be no smaller than that of $b$. Consequently, $b$ is the best linear unbiased estimator, where 'best' refers to the fact that $b$ is efficient within the class of linear and unbiased estimators.
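A small simulation can illustrate the theorem (an illustrative sketch, not part of the original notes). It compares the OLS slope estimator with another linear unbiased estimator of the slope, the two-point estimator $(y_n - y_1)/(x_n - x_1)$; both are unbiased, but the OLS estimator has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(5)

# Fixed design, simple linear model y = b0 + b1*x + eps (illustrative)
n = 30
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])
beta = np.array([1.0, 2.0])

ols, twopoint = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    ols.append(b[1])
    # Another linear unbiased estimator of the slope: uses only the endpoints
    twopoint.append((y[-1] - y[0]) / (x[-1] - x[0]))

print(np.var(ols), np.var(twopoint))  # the OLS variance is the smaller one
```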

Maximum likelihood estimation:


In the model $y = X\beta + \varepsilon$, it is assumed that the errors are normally and independently distributed with constant variance $\sigma^2$, i.e., $\varepsilon \sim N(0, \sigma^2 I)$.

The normal density function for the errors is

$$f(\varepsilon_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2\sigma^2}\,\varepsilon_i^2 \right), \quad i = 1, 2, \ldots, n.$$

The likelihood function is the joint density of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, given as

$$\begin{aligned}
L(\beta, \sigma^2) &= \prod_{i=1}^{n} f(\varepsilon_i) \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \varepsilon_i^2 \right) \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2}\, \varepsilon'\varepsilon \right) \\
&= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta) \right).
\end{aligned}$$

Since the log transformation is monotonic, we maximize $\ln L(\beta, \sigma^2)$ instead of $L(\beta, \sigma^2)$:

$$\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$

The maximum likelihood estimators (m.l.e.) of $\beta$ and $\sigma^2$ are obtained by equating the first-order derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ to zero as follows:

$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \beta} = \frac{1}{2\sigma^2}\, 2X'(y - X\beta) = 0,$$

$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\beta)'(y - X\beta) = 0.$$

The likelihood equations are given by
$$X'X\beta = X'y,$$
$$\sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta).$$

Since $\mathrm{rank}(X) = k$, the unique m.l.e.'s of $\beta$ and $\sigma^2$ are obtained as

$$\tilde{\beta} = (X'X)^{-1}X'y,$$
$$\tilde{\sigma}^2 = \frac{1}{n}(y - X\tilde{\beta})'(y - X\tilde{\beta}).$$

Further, to verify that these values maximize the likelihood function, we find

$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta^2} = -\frac{1}{\sigma^2}X'X,$$

$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta),$$

$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\,\partial \sigma^2} = -\frac{1}{\sigma^4}X'(y - X\beta).$$

Thus the Hessian matrix of second-order partial derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ is

$$\begin{pmatrix}
\dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta^2} & \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\,\partial \sigma^2} \\[2ex]
\dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \sigma^2\,\partial \beta} & \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2}
\end{pmatrix},$$

which is negative definite at $\beta = \tilde{\beta}$ and $\sigma^2 = \tilde{\sigma}^2$. This ensures that the likelihood function is maximized at these values.

Comparing with the OLSEs, we find that

(i) the OLSE and the m.l.e. of $\beta$ are the same, so the m.l.e. of $\beta$ is also an unbiased estimator of $\beta$;

(ii) the OLSE of $\sigma^2$ is $s^2$, which is related to the m.l.e. of $\sigma^2$ as $\tilde{\sigma}^2 = \dfrac{n - k}{n}\,s^2$, so the m.l.e. of $\sigma^2$ is a biased estimator of $\sigma^2$.
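The relation in (ii) is an algebraic identity and can be checked numerically (a sketch with synthetic data):

```python
import numpy as np

rng = np.random.default_rng(6)

n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)  # OLSE, identical to the m.l.e. of beta
e = y - X @ b

s2 = (e @ e) / (n - k)     # unbiased OLS-based estimator
sigma2_mle = (e @ e) / n   # m.l.e., biased downward

print(np.isclose(sigma2_mle, (n - k) / n * s2))  # True
```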

Consistency of estimators
(i) Consistency of $b$:
Under the assumption that $\lim_{n \to \infty}\left( \dfrac{X'X}{n} \right) = \Delta$ exists as a nonstochastic and nonsingular matrix (with finite elements), we have

$$\lim_{n \to \infty} V(b) = \sigma^2 \lim_{n \to \infty} \frac{1}{n}\left( \frac{X'X}{n} \right)^{-1} = \sigma^2 \lim_{n \to \infty} \frac{1}{n}\,\Delta^{-1} = 0.$$

This implies that the OLSE converges to $\beta$ in quadratic mean. Thus the OLSE is a consistent estimator of $\beta$. This also holds true for the maximum likelihood estimator.

The same conclusion can also be reached using the concept of convergence in probability. An estimator $\hat{\theta}_n$ converges to $\theta$ in probability if

$$\lim_{n \to \infty} P\left[\, |\hat{\theta}_n - \theta| \geq \delta \,\right] = 0 \quad \text{for any } \delta > 0,$$

and this is denoted as $\mathrm{plim}(\hat{\theta}_n) = \theta$.

The consistency of the OLSE can be obtained under the weaker assumption that

$$\mathrm{plim}\left( \frac{X'X}{n} \right) = \Delta^*$$

exists and is a nonsingular and nonstochastic matrix, and that

$$\mathrm{plim}\left( \frac{X'\varepsilon}{n} \right) = 0.$$

Since

$$b - \beta = (X'X)^{-1}X'\varepsilon = \left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n},$$

we have

$$\mathrm{plim}(b - \beta) = \mathrm{plim}\left( \frac{X'X}{n} \right)^{-1} \mathrm{plim}\left( \frac{X'\varepsilon}{n} \right) = \Delta^{*-1} \cdot 0 = 0.$$

Thus $b$ is a consistent estimator of $\beta$. The same is true for the m.l.e. also.
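The shrinking variance behind this argument can be seen in a small simulation (a sketch with synthetic data; the design is assumed to satisfy the stated limit condition):

```python
import numpy as np

rng = np.random.default_rng(7)
beta = np.array([1.0, 2.0])

# The empirical variance of the OLS slope shrinks as the sample size grows
for n in (50, 500, 5000):
    x = np.linspace(0, 10, n)             # fixed (non-stochastic) design
    X = np.column_stack([np.ones(n), x])
    solve_map = np.linalg.solve(X.T @ X, X.T)  # maps y to b = (X'X)^{-1} X'y
    slopes = []
    for _ in range(500):
        y = X @ beta + rng.normal(size=n)
        slopes.append((solve_map @ y)[1])
    print(n, np.var(slopes))              # decreases roughly like 1/n
```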
(ii) Consistency of $s^2$
Now we look at the consistency of $s^2$ as an estimator of $\sigma^2$. We have

$$\begin{aligned}
s^2 &= \frac{1}{n - k}\, e'e \\
&= \frac{1}{n - k}\, \varepsilon'\bar{H}\varepsilon \\
&= \left( 1 - \frac{k}{n} \right)^{-1} \frac{1}{n} \left[ \varepsilon'\varepsilon - \varepsilon'X(X'X)^{-1}X'\varepsilon \right] \\
&= \left( 1 - \frac{k}{n} \right)^{-1} \left[ \frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon'X}{n}\left( \frac{X'X}{n} \right)^{-1}\frac{X'\varepsilon}{n} \right].
\end{aligned}$$

Note that $\dfrac{\varepsilon'\varepsilon}{n} = \dfrac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2$, and $\{\varepsilon_i^2,\ i = 1, 2, \ldots, n\}$ is a sequence of independently and identically distributed random variables with mean $\sigma^2$. Using the law of large numbers,

$$\mathrm{plim}\left( \frac{\varepsilon'\varepsilon}{n} \right) = \sigma^2,$$

$$\mathrm{plim}\left[ \frac{\varepsilon'X}{n}\left( \frac{X'X}{n} \right)^{-1}\frac{X'\varepsilon}{n} \right] = \mathrm{plim}\left( \frac{\varepsilon'X}{n} \right) \left[ \mathrm{plim}\left( \frac{X'X}{n} \right) \right]^{-1} \mathrm{plim}\left( \frac{X'\varepsilon}{n} \right) = 0 \cdot \Delta^{*-1} \cdot 0 = 0.$$

Hence

$$\mathrm{plim}(s^2) = (1 - 0)^{-1}\left( \sigma^2 - 0 \right) = \sigma^2.$$

Thus $s^2$ is a consistent estimator of $\sigma^2$. The same holds true for the m.l.e. also.

