Financial Econometrics - #4

Violations of the classical regression model assumptions can undermine the reliability of results. Three key assumptions discussed here are: (1) the error term has a zero mean; if an intercept is not included, violation can bias the slope estimates. (2) The error term follows a normal probability distribution; violation makes hypothesis testing and interval estimation difficult. (3) The error terms are uncorrelated; correlated error terms (autocorrelation) leave the estimates unbiased and consistent but no longer most efficient, and they bias the standard errors, undermining hypothesis tests.


VIOLATION OF ASSUMPTIONS OF THE CLASSICAL REGRESSION MODEL
The assumptions made for the Classical Regression Model…

BASIC ASSUMPTIONS:

• Zero Mean of the Disturbance: E[εᵢ] = 0 for all i;
• Homoscedasticity: Var[εᵢ] = σ², a constant, for all i;
• Nonautocorrelation: Cov[εᵢ, εⱼ] = 0 if i ≠ j;
• Uncorrelatedness of regressor and disturbance: Cov[Xᵢ, εⱼ] = 0 for all i and j;
• Normality: εᵢ ~ N[0, σ²]; and
• Non-Stochastic Regressor: the value of Xᵢ is a known constant in the probability distribution of Yᵢ. And, the Xᵢ's are not linear functions of other explanatory variables.
Violation #1:
The error term
does not have
a zero mean.
Error Term does not have a zero mean!!!

• If a constant term is included in the Regression Model, then this assumption will never be violated and hence, we should not worry about it.

• But, if the regression model does not include an intercept and the error term has a non-zero mean, then it may lead to severe biases in the slope coefficient estimates.
Violation #2:
The error term
does not follow
Normal Probability
Distribution.
Error Term does not follow Normal Probability Distribution!!!

• It is assumed that the Error Term follows the Normal Probability Distribution and, as a consequence, we are able to use the t-Test, F-Test and χ² distributions for Hypothesis Testing and Interval Estimation.

• If this assumption is not true, then our estimates will still be BLUE, but we would find difficulty in performing hypothesis testing and interval estimation.
Error Term does not follow Normal Probability Distribution!!!

How to test whether the Error Term follows the Normal Probability Distribution?

• One can see or judge whether error terms follow the Normal Probability Distribution by using the following tools (a code sketch follows this list):
  • Basic Statistics
  • Graph – Q-Q Plot or P-P Plot
  • Histogram
  • Kolmogorov-Smirnov Test of Goodness of Fit
  • Jarque-Bera Test
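A minimal sketch of how these checks might be run on OLS residuals in Python, assuming statsmodels and scipy are available; the data here are simulated purely for illustration:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

# Illustrative data; in practice use the residuals of your own regression.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

res = sm.OLS(y, sm.add_constant(x)).fit()
e = res.resid

# Jarque-Bera test: H0 is that the residuals are normally distributed.
jb_stat, jb_p, skew, kurt = jarque_bera(e)
print(f"Jarque-Bera: stat = {jb_stat:.3f}, p-value = {jb_p:.3f}")

# Kolmogorov-Smirnov test against a normal with the residuals' mean/std.
ks_stat, ks_p = stats.kstest(e, "norm", args=(e.mean(), e.std(ddof=1)))
print(f"Kolmogorov-Smirnov: stat = {ks_stat:.3f}, p-value = {ks_p:.3f}")

# Q-Q plot of the residuals against the normal distribution.
sm.qqplot(e, line="45", fit=True)
```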
Is the Distribution of the Error Term a serious concern for the researcher?

• No, if the sample size is large, as the CENTRAL LIMIT THEOREM will come to our rescue!!!!

• It will not be of any concern if we are interested only in estimates, especially the Point Estimates and future prediction; nothing beyond that.

• If the sample size is small, then one may eliminate any outlier in the data, if it exists. That may help ensure a Normal Probability Distribution.
Violation #3:
Error Terms are
correlated.
Error Terms in a regression model are not allowed to have any relation!!!

• It is assumed that the Error Terms have no correlation among themselves; that is, they are independent. It means that the covariance among them is zero!!!

It implies that Cov(εᵢ, εⱼ) = 0 for all values of i ≠ j.
The behaviour of the Error Term when there is no AUTOCORRELATION!!!

[Figure: êₜ₊₁ plotted against êₜ, and êₜ over time, showing no discernible pattern.]

• No pattern in residuals at all.
• And, this is what we would like to see.
If the Error Terms are correlated or have covariance among them which is not zero, then it is called a problem of SERIAL CORRELATION or AUTOCORRELATION!!!

It means that Cov(eᵢ, eⱼ) ≠ 0 for some values of i ≠ j.
Positive Autocorrelation

[Figure: scatter of ûₜ against ûₜ₋₁ and plot of ûₜ over time, illustrating positive autocorrelation.]

• Positive Autocorrelation is indicated by a cyclical residual plot over time.
Negative Autocorrelation

[Figure: scatter of ûₜ against ûₜ₋₁ and plot of ûₜ over time, illustrating negative autocorrelation.]

• Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.
Why should I
bother about
Autocorrelation?
CONCERNS OF AUTOCORRELATION!!!

What autocorrelation does not impact?

• If we have autocorrelation in our data, Ordinary Least Squares estimators will still be linear.

• Ordinary Least Squares estimators will still be unbiased and consistent even if we have autocorrelation.
CONCERNS OF AUTOCORRELATION!!!

What autocorrelation does impact?

• Ordinary Least Squares estimators will no longer have minimum variance – which means that they will not be efficient!!!

• As a consequence, they will no longer be BLUE.
CONSEQUENCES OF AUTOCORRELATION!!!

• The estimates of the standard errors become biased and unreliable when autocorrelation is present.

• This leads to problems in hypothesis testing about estimators and other statistics, and in confidence intervals.

• To be precise, we can have only UNBIASED, LINEAR and CONSISTENT estimators, but not BLUE estimators.
WHY Autocorrelation …?

There may be several reasons due to which error terms may be related:

1. Omitted Explanatory Variables
2. Mis-specification of the mathematical form of the model
3. Over-reactions and inefficiencies in financial markets
4. Inertia of the dependent variable
Autocorrelation…

The autocorrelation problem is observed more often in Time Series Data than in Cross-Sectional Data.
How to detect Autocorrelation …?

• One method is the Graphical Method:
  • Plot residuals against observations, or against time if the series is a Time Series.
  • Scatter plot of residuals against lagged residuals.

[Figure: example residual plots labelled "No Autocorrelation!" and "Positive Autocorrelation!".]
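A minimal sketch of such diagnostic plots in Python, assuming matplotlib is available; the residual series here is simulated purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative residual series; in practice use residuals from your fit.
rng = np.random.default_rng(1)
e = rng.normal(size=50)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Residuals over time: look for cycles (positive autocorrelation)
# or rapid sign alternation (negative autocorrelation).
ax1.plot(e, marker="o")
ax1.axhline(0.0, linestyle="--")
ax1.set(xlabel="time", ylabel="residual")

# Residuals against lagged residuals: clustering along an upward
# (downward) slope suggests positive (negative) autocorrelation.
ax2.scatter(e[:-1], e[1:])
ax2.set(xlabel="residual (t-1)", ylabel="residual (t)")
plt.show()
```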
How to detect Autocorrelation …?

• Other methods are:
  • Run Test
  • The Durbin-Watson test (d):
    • It is a test of first-order autocorrelation.
    • d is the ratio of the sum of squared differences in successive residuals to the RSS.
    • No autocorrelation is its null hypothesis.
    • But d has no unique critical value.
    • Sample size and the number of regressors are used to calculate upper (dU) and lower (dL) bounds that determine the rejection regime.
    • Rule of thumb: d ≈ 2(1 − ρ), so d = 0 if ρ = 1, d = 2 if ρ = 0, d = 4 if ρ = −1. (This relation provides a better approximation as the sample size increases.)
Durbin-Watson test for Autocorrelation

• Test: Durbin-Watson statistic, with critical bounds tabulated for n and K − 1 d.f.:

  d = Σₜ₌₂ⁿ (eₜ − eₜ₋₁)² / Σₜ₌₁ⁿ eₜ²

• Decision zones:

  Positive          Zone of      No                Zone of      Negative
  autocorrelation   indecision   autocorrelation   indecision   autocorrelation
  0 -------- d-lower ------ d-upper -------- 2 -------- 4 − d-upper ------ 4 − d-lower -------- 4

• Autocorrelation is clearly evident: d < d-lower or d > 4 − d-lower.
• Ambiguous – cannot rule out autocorrelation: d falls in a zone of indecision.
• Autocorrelation is not evident: d-upper < d < 4 − d-upper.
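A minimal sketch of computing d and applying the decision zones in Python, assuming statsmodels is available; the residuals are simulated, and the bounds are the ones quoted later in the text for n = 40, one regressor, 5% significance:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# Illustrative residuals; in practice take them from your fitted model.
rng = np.random.default_rng(2)
e = rng.normal(size=40)

# Direct computation: d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
assert np.isclose(d, durbin_watson(e))   # statsmodels agrees

dL, dU = 1.44, 1.54   # tabulated bounds for the stated n, K, and level
if d < dL:
    print(f"d = {d:.3f} < dL: positive autocorrelation indicated")
elif d > 4 - dL:
    print(f"d = {d:.3f} > 4 - dL: negative autocorrelation indicated")
elif dU < d < 4 - dU:
    print(f"d = {d:.3f}: no autocorrelation indicated")
else:
    print(f"d = {d:.3f}: zone of indecision")
```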
Let's consider a Problem…

• Is there any relation between the compensation per hour the workers get and their productivity?
Output obtained is …

What kind of Autocorrelation do we have?

[Figure: scatter of RESID against RESID(-1) for the compensation–productivity regression.]
From the Table, we obtain critical values of the Durbin-Watson Statistic for 1 explanatory variable, 40 observations and a 5% level of significance: dL = 1.44 and dU = 1.54.

What's the conclusion?
How to detect Autocorrelation …?

• Other methods are:
  • Breusch-Godfrey Serial Correlation LM Test
Breusch-Godfrey Serial Correlation LM Test

What's the conclusion?
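A minimal sketch of how the Breusch-Godfrey LM test might be run in Python, assuming statsmodels is available; the data are simulated with AR(1) errors purely for illustration:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Toy regression with AR(1) errors, for illustration only.
rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal()   # u_t = 0.5 u_{t-1} + v_t
y = 1.0 + 2.0 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# H0: no serial correlation up to the chosen lag order.
lm_stat, lm_p, f_stat, f_p = acorr_breusch_godfrey(res, nlags=1)
print(f"LM stat = {lm_stat:.3f}, p-value = {lm_p:.4f}")
```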


How to deal with Autocorrelation?

• Use GLS following the Cochrane-Orcutt Procedure, which is as follows (a sketch of the procedure follows this list):
  • First, estimate the general classical model using OLS, ignoring autocorrelation.
  • Obtain the residuals.
  • Run the regression êₜ = ρ êₜ₋₁ + uₜ.
  • Obtain the estimate of ρ and substitute the values in the general model.
  • Run the GLS.
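A minimal hand-rolled sketch of one Cochrane-Orcutt pass in Python, assuming statsmodels; the data are simulated, and the variable names are illustrative (statsmodels also ships GLSAR, which automates this):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data with AR(1) errors, purely for illustration.
rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Step 1: OLS ignoring autocorrelation; collect the residuals.
e = sm.OLS(y, sm.add_constant(x)).fit().resid

# Step 2: regress e_t on e_{t-1}; the slope estimates rho.
rho = sm.OLS(e[1:], sm.add_constant(e[:-1])).fit().params[1]

# Step 3: quasi-difference the data and re-run least squares.
y_star = y[1:] - rho * y[:-1]
x_star = x[1:] - rho * x[:-1]
const_star = np.full(n - 1, 1.0 - rho)   # transformed intercept column
gls = sm.OLS(y_star, np.column_stack([const_star, x_star])).fit()
print(f"rho = {rho:.3f}, GLS coefficients = {gls.params}")
```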
Let's take an example…

How to estimate the impact of Sugar Cane Prices on the Area Under its Cultivation?

Source: Undergraduate Econometrics – Hill, Griffiths, Judge
Data collected is … (only part of the data)

OBSERVATIONS   AREA UNDER CULTIVATION (in acres)   PRICE OF SUGAR CANE
     1                        29                        0.075258
     2                        71                        0.114894
     3                        42                        0.101075
     4                        90                        0.110309
     5                        72                        0.109562
     6                        57                        0.132486
     7                        44                        0.141783
     8                        61                        0.209559
     9                        60                        0.188259
    10                        70                        0.195946
The Model…

Ln(A) = α + β Ln(P) + ε

where
A = Area under Cultivation
P = Price of Sugar Cane
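A minimal sketch of how this model might be estimated in Python, assuming statsmodels and pandas; only the ten rows quoted above are used, so the coefficients will differ from the 34-observation EViews output below:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# First ten observations from the table above; the full sample has 34.
df = pd.DataFrame({
    "A": [29, 71, 42, 90, 72, 57, 44, 61, 60, 70],
    "P": [0.075258, 0.114894, 0.101075, 0.110309, 0.109562,
          0.132486, 0.141783, 0.209559, 0.188259, 0.195946],
})

# Ln(A) = alpha + beta * Ln(P) + e
fit = smf.ols("np.log(A) ~ np.log(P)", data=df).fit()
print(fit.summary())
```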
Dependent Variable: LOG(A)
Method: Least Squares
Date: 08/13/10   Time: 06:09
Sample: 1 34
Included observations: 34

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          6.111328      0.168570     36.25397      0.0000
LOG(P)     0.970582      0.110629     8.773336      0.0000

R-squared            0.706345    Mean dependent var      4.707273
Adjusted R-squared   0.697168    S.D. dependent var      0.561094
S.E. of regression   0.308771    Akaike info criterion   0.544589
Sum squared resid    3.050865    Schwarz criterion       0.634375
Log likelihood      -7.258010    Hannan-Quinn criter.    0.575208
F-statistic         76.97143     Durbin-Watson stat      1.291242
Prob(F-statistic)    0.000000
Residuals – Residual over observation

[Figure: RESID plotted over observations 1–34, ranging roughly from −0.8 to 0.8.]
Residuals – Residual(-1) and Residual

[Figure: scatter of RESID against RESID(-1), both on a −0.8 to 0.8 scale.]
From the Table, we obtain critical values of the Durbin-Watson Statistic for 1 explanatory variable, 34 observations and a 5% level of significance: dL = 1.393 and dU = 1.514.

What's the conclusion? The regression's Durbin-Watson statistic is 1.291242 < dL = 1.393, so the test indicates positive first-order autocorrelation.
To use GLS, we first estimate the equation eₜ = ρ eₜ₋₁ + uₜ, and we get the following EViews output:

Dependent Variable: R
Method: Least Squares
Date: 08/13/10   Time: 06:29
Sample (adjusted): 2 34
Included observations: 33 after adjustments

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          0.009751      0.050858     0.191722      0.8492
R(-1)      0.342854      0.169120     2.027277      0.0513

R-squared            0.117057    Mean dependent var      0.007070
Adjusted R-squared   0.088575    S.D. dependent var      0.305920
S.E. of regression   0.292058    Akaike info criterion   0.434960
Sum squared resid    2.644227    Schwarz criterion       0.525658
Log likelihood      -5.176847    Hannan-Quinn criter.    0.465477
F-statistic          4.109853    Durbin-Watson stat      1.901104
Prob(F-statistic)    0.051306
Using the above result, we need to define the transformed model and variables as follows:

• The transformed model can be depicted as –

  y*ₜ = β₁ x*ₜ₁ + β₂ x*ₜ₂ + vₜ

  where the transformed variables are defined by

  y*₁ = √(1 − ρ²) y₁;   x*₁₁ = √(1 − ρ²);   x*₁₂ = √(1 − ρ²) x₁

  for the first observation; and

  y*ₜ = yₜ − ρ yₜ₋₁;   x*ₜ₁ = 1 − ρ;   x*ₜ₂ = xₜ − ρ xₜ₋₁

  for the remaining observations t = 2, 3, …
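A minimal sketch of this transformation as a hypothetical Python helper (`transform` is not from the text; ρ is the estimate from the residual regression above):

```python
import numpy as np

def transform(y, x, rho):
    """Apply the transformation above: sqrt(1 - rho^2) scaling for the
    first observation, quasi-differencing for t = 2, 3, ..."""
    n = len(y)
    s = np.sqrt(1.0 - rho ** 2)
    y_star = np.concatenate(([s * y[0]], y[1:] - rho * y[:-1]))
    x1_star = np.concatenate(([s], np.full(n - 1, 1.0 - rho)))   # intercept column
    x2_star = np.concatenate(([s * x[0]], x[1:] - rho * x[:-1]))
    return y_star, x1_star, x2_star
```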
Once we get the transformed variables, we use the Least Squares Method without an intercept, using the variables y*ₜ, x*ₜ₁ and x*ₜ₂ (x*ₜ₁ plays the role of the transformed intercept).
Using GLS, we get the following:

Dependent Variable: YSTAR
Method: Least Squares
Date: 08/13/10   Time: 06:56
Sample: 1 34
Included observations: 34

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          6.164129      0.212808     28.96575      0.0000
XSTAR      1.006595      0.136930     7.351185      0.0000

R-squared            0.517155    Mean dependent var      3.146282
Adjusted R-squared   0.502066    S.D. dependent var      0.410850
S.E. of regression   0.289914    Akaike info criterion   0.418559
Sum squared resid    2.689608    Schwarz criterion       0.508345
Log likelihood      -5.115504    Hannan-Quinn criter.    0.449179
Durbin-Watson stat   1.966447
LET’S TAKE A
COMPLETE EXAMPLE…
DATA – IMPORTS AND GNP

Year   Imports   GNP
1950   3,748     21,777
1951   4,010     22,418
1952   3,711     22,308
1953   4,004     23,319
1954   4,151     24,180
1955   4,569     24,893
1956   4,582     25,310
1957   4,697     25,799
1958   4,753     25,886

Source: Theory of Econometrics – Koutsoyiannis

THE MODEL …
THE STEPS IN THE ESTIMATION PROCESS

• Run Simple OLS.
• Test for Autocorrelation; if it exists, then determine the nature of the correlation.
• Once we get the estimate of the probable correlation, go for GLS (see the sketch after this list).
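A minimal end-to-end sketch of these steps on the imports–GNP data quoted above, assuming statsmodels; GLSAR is used here as a stand-in for the manual Cochrane-Orcutt GLS step:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Imports and GNP, 1950-1958, from the table above.
imports = np.array([3748, 4010, 3711, 4004, 4151, 4569, 4582, 4697, 4753])
gnp = np.array([21777, 22418, 22308, 23319, 24180, 24893, 25310, 25799, 25886])
X = sm.add_constant(gnp)

# Step 1: run simple OLS.
ols = sm.OLS(imports, X).fit()

# Step 2: test for autocorrelation.
print("Durbin-Watson:", durbin_watson(ols.resid))

# Step 3: if autocorrelation is indicated, re-estimate assuming AR(1)
# errors; GLSAR iterates the estimation of rho and the GLS fit.
glsar = sm.GLSAR(imports, X, rho=1).iterative_fit(maxiter=10)
print("estimated rho:", glsar.model.rho)
print("GLS coefficients:", glsar.params)
```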
Violation #5:
Regressors are
correlated.
Explanatory Variables in a regression model are not allowed to have any relation!!!

• It is assumed that the Explanatory Variables have no correlation among themselves; that is, they are independent. It means that the covariance among them is zero!!!

It implies that Cov(xᵢ, xⱼ) = 0 for all values of i ≠ j.
If the Explanatory Variables in a regression model have any relation, then there will exist non-zero covariance among them!!! It is known as the problem of MULTICOLLINEARITY.

It means that Cov(xᵢ, xⱼ) ≠ 0 for some values of i ≠ j.
Why should I bother about Multicollinearity?
CONCERNS OF MULTICOLLINEARITY!!!

What does MULTICOLLINEARITY impact?

• If the inter-correlation among explanatory variables is PERFECT, then
  • the estimates of coefficients become indeterminate, and
  • the standard errors become infinitely large.

• Ordinary Least Squares estimators will still be unbiased even if we have multicollinearity.
CONCERNS OF MULTICOLLINEARITY!!!

What does MULTICOLLINEARITY impact?

• Estimates become unstable and unpredictable; that is to say, when multicollinearity is present, the estimated coefficients are unstable in their degree of statistical significance, magnitude and sign.

• R² becomes very high even though the coefficients are not significant!!!!

• The OLS estimators and their standard errors can be sensitive to small changes in the data.
Tools to identify Multicollinearity …

• Testing the Correlation Matrix among all X's for significance.
• High R² but few significant t-Ratios.
• Tolerance and Variance Inflation Factor (VIF):

  VIF = 1 / (1 − Rⱼ²)

  where Rⱼ² is the R² from regressing Xⱼ on the other explanatory variables; and Tolerance is the reciprocal of VIF.
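A minimal sketch of a VIF check in Python, assuming statsmodels and pandas; the collinear columns are simulated for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative design matrix with two deliberately collinear columns.
rng = np.random.default_rng(5)
x1 = rng.normal(size=100)
x2 = x1 + 0.05 * rng.normal(size=100)   # nearly a copy of x1
x3 = rng.normal(size=100)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF_j = 1 / (1 - R_j^2), from regressing column j on the others.
for j, name in enumerate(X.columns):
    print(name, round(variance_inflation_factor(X.values, j), 2))
```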


Identifying Multicollinearity …

Model Summary (b)
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .975a   .951       .946                9.10427                      1.825
a. Predictors: (Constant), RX, X, RY
b. Dependent Variable: Y

Coefficients (a)
Model           B       Std. Error   Beta    t        Sig.   Tolerance   VIF
1  (Constant)   6.377   5.629                1.133    .268
   X            .557    .075         .850    7.448    .000   .144        6.948
   RY           .154    .191         .154    .808     .426   .052        19.386
   RX           -.009   .153         -.013   -.058    .954   .034        29.070
a. Dependent Variable: Y

Note the classic signature: the overall R² is very high (.951), yet RY and RX are individually insignificant and carry large VIFs (19.386 and 29.070).
Solutions for Multicollinearity

• Ignore it.
• Drop the variables which are causing the correlation.
• Transform the variables.
• Identify underlying factors.
What next…?
