Chapter 4
Introduction
CLRM ASSUMPTIONS
Constant variance of the error term (Homoscedasticity)
No autocorrelation of the error term
No multicollinearity among the explanatory variables
But the classical assumptions do not always hold true.
4.1 Multicollinearity
4.2 Heteroscedasticity
4.3 Autocorrelation
Introduction
In both the simple and multiple regression models, we made a number of assumptions. This chapter examines what happens when some of these assumptions do not hold.
Heteroscedasticity
(The error variance is not constant)
The nature of Heteroscedasticity
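In symbols, the assumption and its violation are:
$\mathrm{Var}(U_i) = E(U_i^2) = \sigma^2$ for all $i$ (homoscedasticity: constant error variance)
$\mathrm{Var}(U_i) = E(U_i^2) = \sigma_i^2$ (heteroscedasticity: the error variance changes across observations)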
Figure: homoscedastic error variance versus heteroscedastic error variance.
Reasons why the variance of the error term $U_i$ may not be constant
Consequences of the heteroscedasticity problem
Detection of Heteroscedasticity
There are informal and formal methods of testing for, or detecting the presence of, the heteroscedasticity problem.
1. The informal methods are based on graphical presentations.
Cont'd
In figure (a), we see no systematic pattern between the two variables, suggesting that perhaps no heteroscedasticity is present in the data.
Figures (b) to (e), however, exhibit definite patterns: figure (c), for instance, suggests a linear relationship, whereas (d) and (e) indicate a quadratic relationship between the squared residuals $\hat{u}_i^2$ and the fitted values $\hat{Y}_i$.
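In Stata, one informal graphical check along these lines can be sketched as follows (a minimal sketch; y and x are placeholder variable names, not from the original slides):
regress y x                 // fit the model
predict yhat, xb            // fitted values
predict e, residuals        // residuals
generate e2 = e^2           // squared residuals
scatter e2 yhat             // look for a systematic pattern
rvfplot                     // built-in residual-versus-fitted plot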
2. The formal methods of detecting heteroscedasticity use the following tests.
• If the computed t-value exceeds the critical t-value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it.
• If the regression model involves more than one X variable, $r_s$ can be computed between $|\hat{u}_i|$ and each of the X variables separately, and each can be tested for statistical significance by the t-test given below.
Rank Correlation Test of Heteroscedasticity
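The test is based on the Spearman rank correlation coefficient
$r_s = 1 - 6\left[\dfrac{\sum d_i^2}{n(n^2-1)}\right]$
where $d_i$ is the difference between the ranks of $|\hat{u}_i|$ and of $X_i$, and $n$ is the number of observations. Its significance is assessed with
$t = \dfrac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}}$
which follows the t distribution with $n-2$ degrees of freedom. The steps are: regress Y on X and obtain the residuals $\hat{u}_i$; rank $|\hat{u}_i|$ and $X_i$; compute $r_s$; and apply the t-test above.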
• Note that for 8 (= 10 − 2) df this t-value is not significant even at the 10% level of significance (the 5% critical value is 2.306).
• Thus, there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which suggests that there is no heteroscedasticity.
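A minimal Stata sketch of this test (y and x are placeholder names):
regress y x
predict uhat, residuals
generate absu = abs(uhat)
spearman absu x          // reports the rank correlation and its p-value directly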
Solution for heteroscedasticity
• In the consequences of the heteroscedasticity problem we have seen that the problem does not destroy the unbiasedness and consistency properties of the OLS estimators.
• However, they are no longer efficient (in the small-sample case) nor asymptotically efficient (in the large-sample case).
• This lack of efficiency violates the BLUE property of the estimates and hence makes the usual hypothesis tests inappropriate.
• Therefore, a remedial measure should be taken.
• There are two approaches to the remedial measure, depending on whether the variance of the error term, $\sigma_i^2$, is known or unknown (see the sketch below).
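A minimal Stata sketch of the two approaches (y and x are placeholders, and the weight assumes $\sigma_i^2$ is proportional to $x_i^2$, which is only one possible specification):
* (1) error variance known (up to a proportionality factor): weighted least squares
regress y x [aweight = 1/(x^2)]
* (2) error variance unknown: heteroscedasticity-robust (White) standard errors
regress y x, vce(robust)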
Autocorrelation
(Error terms are correlated)
In the simple and multiple regression models, one of the assumptions of the classicalists is that
$\mathrm{cov}(u_i, u_j) = E(u_i u_j) = 0 \quad \text{for } i \neq j$
which implies that successive values of the disturbance term U are temporally independent, i.e. the disturbance occurring at one point of observation is not related to the disturbance at any other point.
If the value of U in any particular period is correlated with its own preceding value(s), we say there is autocorrelation of the random variables.
Hence, autocorrelation is defined as a 'correlation' between members of a series of observations ordered in time.
What is the difference between 'correlation' and autocorrelation?
Autocorrelation is a special case of correlation which refers to the relationship between successive values of the same variable, while correlation may also refer to the relationship between two or more different variables.
Note: autocorrelation between error terms is a more common problem in time series data than in cross-sectional data.
Graphical representation of Autocorrelation
Cont'd
Figures (a) to (d) above show discernible patterns among the U's, indicating autocorrelation: (a) shows a cyclical pattern, (b) and (c) suggest upward and downward linear trends, and (d) indicates a quadratic trend in the disturbance terms.
Figure (e) shows no systematic pattern, supporting the non-autocorrelation assumption of the classical linear regression model.
Reasons for / Causes of Autocorrelation
• There are several reasons why serial correlation or autocorrelation arises. Some of these are:
A. Cyclical fluctuations / inertia effect
Time series such as GNP, price indices, production, employment and unemployment exhibit business cycles.
Starting at the bottom of a recession, when economic recovery begins, most of these series move upward.
In this upswing, the value of a series at one point in time is greater than its previous value.
Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
B. Specification bias
This arises because of the following:
i. Exclusion of variables from the regression model
ii. Incorrect functional form of the model
iii. Neglecting lagged terms from the regression model
i) Exclusion of variables from the regression model
For example, suppose the correct demand model is given by:
$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + U_t \quad \ldots (1)$
where $y_t$ is the quantity of beef demanded, $x_{1t}$ the price of beef, $x_{2t}$ consumer income, $x_{3t}$ the price of pork, and t denotes time.
Now, suppose we run the following regression in lieu of (1):
$y_t = \beta_0 + \beta_1 x_{1t} + \beta_2 x_{2t} + V_t \quad \ldots (2)$
If equation (1) is the 'correct' model or true relation, running equation (2) amounts to letting $V_t = \beta_3 x_{3t} + U_t$, and to the extent that the price of pork affects the consumption of beef, the error or disturbance term $V_t$ will reflect a systematic pattern, thus creating autocorrelation.
ii. Incorrect functional form: this is also one source of autocorrelation of the error term. Suppose the 'true' or correct model in a cost-output study is:
$\text{Marginal cost}_i = \beta_1 + \beta_2\,\text{output}_i + \beta_3\,\text{output}_i^2 + U_i$
However, we incorrectly fit the following model:
$\text{Marginal cost}_i = \alpha_1 + \alpha_2\,\text{output}_i + V_i$
The marginal cost curve corresponding to the 'true' model is shown in the figure below along with the 'incorrect' linear cost curve.
This result is to be expected because the disturbance term is, in fact, equal to $\beta_3\,\text{output}_i^2 + U_i$, and hence $V_i$ will catch the systematic effect of the $\text{output}_i^2$ term on the marginal cost.
In this case, $V_i$ will reflect autocorrelation because of the use of an incorrect functional form.
iii. Neglecting lagged terms from the model: if the dependent variable of a regression model is affected by the lagged value of itself or of an explanatory variable, and this lagged term is not included in the model, the error term of the incorrect model will reflect a systematic pattern, which indicates autocorrelation in the model.
Suppose the correct model for consumption expenditure is:
$C_t = \beta_0 + \beta_1 y_t + \beta_2 y_{t-1} + U_t$
But again, for some reason, we incorrectly regress:
$C_t = \beta_0 + \beta_1 y_t + V_t$
As in the cases above, $V_t = \beta_2 y_{t-1} + U_t$; hence $V_t$ shows a systematic change, reflecting autocorrelation.
C. Non-stationarity of the time series: a time series is stationary if its mean, variance and covariances are time-invariant (constant). When the mean, variance and covariances of a time series variable are not constant over time, it is called non-stationary. A non-stationary time series can cause an autocorrelation problem.
D. Manipulation of data and data transformation may also induce an autocorrelation problem into the data, even if the original data do not contain one.
Matrix representation of autocorrelation
The variance-covariance matrix of the error terms
• We say that the U's follow a first-order autoregressive scheme, AR(1) (or first-order Markov scheme), i.e.
$U_t = \rho U_{t-1} + \varepsilon_t, \qquad |\rho| < 1$
where $\varepsilon_t$ is a well-behaved, non-autocorrelated error term.
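Under this AR(1) scheme, the variance-covariance matrix of the disturbances takes the standard form
$$E(UU') = \frac{\sigma_\varepsilon^2}{1-\rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}$$
so that $\mathrm{var}(U_t) = \sigma_\varepsilon^2/(1-\rho^2)$ and $\mathrm{cov}(U_t, U_{t-s}) = \rho^s\,\mathrm{var}(U_t)$.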
Consequences of Autocorrelation
The use of OLS in the presence of serial correlation between the error terms has the following consequences:
1. Although the coefficients are linear, unbiased and asymptotically normally distributed in the presence of the autocorrelation problem, the OLS estimators are no longer BLUE.
2. When there is autocorrelation between the error terms, the estimated variance of the residuals is biased and underestimates the true variance: $E(\hat{\sigma}^2) < \sigma^2$.
3. As a result of the second consequence, the OLS estimators' variances and standard errors are underestimated, and test statistics computed from these standard errors are invalid.
Consequences of Autocorrelation (cont'd)
4. Since the variance of the residuals is underestimated, $R^2$ and the F-statistic of the regression are overestimated.
5. The underestimation of the standard errors of the coefficients leads to overestimation of the t-values of individual coefficients; hence we declare a coefficient statistically significant (i.e. reject $H_0{:}\ \beta = 0$) too easily.
6. The confidence intervals for individual coefficients derived from OLS are likely to be wider than necessary.
7. Hypothesis testing using the usual t and F tests is therefore no longer valid.
Detecting autocorrelation
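The most common formal test is the Durbin-Watson d statistic
$$d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2(1-\hat{\rho})$$
so d is near 2 when there is no autocorrelation, approaches 0 under positive autocorrelation, and approaches 4 under negative autocorrelation. With the tabulated critical bounds $d_L$ and $d_U$:
• $0 < d < d_L$: reject $H_0$; evidence of positive autocorrelation
• $d_L \le d \le d_U$: inconclusive
• $d_U < d < 4 - d_U$: do not reject $H_0$; no autocorrelation
• $4 - d_U \le d \le 4 - d_L$: inconclusive
• $4 - d_L < d < 4$: reject $H_0$; evidence of negative autocorrelation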
Example: suppose that in a regression involving 50 observations and 4 regressors, the estimated d was 1.43. From the Durbin-Watson table we find that at the 5% level the critical d values are $d_L = 1.38$ and $d_U = 1.72$. Note that, on the basis of the d test, we cannot say whether there is positive autocorrelation or not, because the estimated d value lies in the indecisive range ($d_L \le d \le d_U$).
How to overcome autocorrelation?
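A minimal Stata sketch of common remedies (year, imports and gdp are placeholder names):
tsset year                  // declare the time variable
prais imports gdp           // Prais-Winsten feasible GLS for AR(1) errors
prais imports gdp, corc     // Cochrane-Orcutt variant
newey imports gdp, lag(1)   // OLS with Newey-West (HAC) standard errors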
1. Given a sample of 50 observations and four explanatory variables, what can you say about autocorrelation if
i) d = 1.05   ii) d = 1.40
2. Suppose that a researcher used 20 years of data on imports and GDP of Ethiopia. Applying OLS to the observations, she obtained an estimated import function together with its Durbin-Watson statistic. Use the Durbin-Watson test to examine the problem of autocorrelation.
Getting Data into Stata
I. Entering data via the command window
– Following the input command, we type the sequence of variable names, separated by blanks.
– Example: type input id age sex income
– When we have finished entering the data, we type end, as in the sketch below.
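For example (the data values are made up purely for illustration):
input id age sex income
1 34 1 2500
2 41 0 3100
3 29 1 1800
end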
II. Entering data via the data spreadsheet
• The user can type edit in the command window
• A data spreadsheet will appear
III. Copying data from a Spreadsheet
a) Select the whole data set
b) Copy the whole data set
c) Open STATA
d) Type edit
e) Hit the ENTER key
f) Click on EDIT in the header bar and a drop-down
menu will appear
g) Click on paste in the drop-down menu
h) The whole spreadsheet of data is transferred to
STATA
Cont'd
• Example:
ovtest
Ramsey RESET test using powers of the fitted values of cons
Ho: model has no omitted variables
F(3, 1441) = 4.47
Prob > F = 0.0039
– The ovtest rejects the hypothesis that there are no omitted variables, indicating that we need to improve the specification.
• Heteroskedasticity
– We can use the hettest command (the Breusch-Pagan/Cook-Weisberg test), which runs an auxiliary regression of the squared residuals on the fitted values.
hettest
Ho: Constant variance
Variables: fitted values of cons
chi2(1) = 81.50
Prob > chi2 = 0.0000
– The hettest indicates that there is heteroskedasticity which needs to be dealt with.
Some commands for diagnostic tests
• vif - multicollinearity test for continuous variables
• pwcorr - pairwise correlation (or contingency coefficient) test for discrete/dummy explanatory variables
• ovtest - Ramsey RESET test for omitted variables (misspecification)
• hettest - heteroskedasticity test
• predict resid, residuals - generate the residuals, then:
• histogram resid, normal - informal normality check
• histogram resid, kdensity normal
• ttest LANSZ, by(MCPART) - t-test for a continuous variable
• tab psi grade, chi2 - for discrete variables we use the chi2 test
• dwstat - Durbin-Watson test for autocorrelation
A combined session is sketched below.
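Putting these together, a typical diagnostic session might look like this (a minimal sketch; cons, x1, x2 and year are placeholder names):
regress cons x1 x2
vif                         // multicollinearity
ovtest                      // omitted variables (RESET)
hettest                     // heteroskedasticity
predict resid, residuals
histogram resid, normal     // informal normality check
tsset year                  // declare the time variable (time series data only)
dwstat                      // Durbin-Watson autocorrelation test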