
Chapter 04

Violations of Econometric Assumptions

4.1 Introduction
CLRM assumptions:
 Constant variance of the error term (homoscedasticity)
 No autocorrelation of the error term
 No multicollinearity among the explanatory variables
 But the classical assumptions do not always hold true.
4.1 Multicollinearity
4.2 Heteroscedasticity
4.3 Autocorrelation
Introduction
In both the simple and multiple regression models, we made assumptions.
Now, we are going to address the following questions:
 What if the error variance is not constant over all observations?
 What if the different errors are correlated?
 What if the explanatory variables are correlated?
 What are the consequences of such violations for the estimators?
 How do we detect their presence?
 What are the remedial measures?
Multicollinearity
(Exact or less-than-exact linear correlation between regressors)
 One of the assumptions of the classical linear regression model is that there is no multicollinearity among the explanatory variables, the X's.
• Perfect multicollinearity is not usually seen in practice; however, an approximate linear relationship between the regressors often exists.
• No collinearity and perfect collinearity are two extreme cases and rarely exist in practice. Of particular interest are the cases in between: a moderate to high degree of multicollinearity. This kind of multicollinearity is common in macroeconomic time series (such as GNP, money supply, income, etc.) since economic variables tend to move together over time.
 Broadly interpreted, multicollinearity refers to the situation where there is either an exact or a less-than-exact linear relationship among the explanatory (X) variables.
 Multicollinearity could be perfect or less than perfect.
 When there is perfect collinearity among the explanatory variables, the regression coefficients are indeterminate and their standard errors are infinite.
 There is perfect multicollinearity between two explanatory variables if one can be expressed as a constant multiple of the other.
 Suppose we have the model Y = β0 + β1X1 + β2X2 + U, where X1 and X2 are the explanatory variables.
 Then, if X2 can be expressed as a constant multiple of X1 (i.e. X2 = λX1), we say there is perfect multicollinearity between them, where λ is a constant number, λ ≠ 0.
 If X2 = λX1 + v (with v a random term), there is no perfect correlation; the collinearity is less than perfect.
 If multicollinearity is less than perfect, even if it is very
high, the OLS estimators still retain the property of
BLUE: linear, unbiased, best (minimum variance).
 Estimation of regression coefficients is possible;
however, their standard errors tend to be large.
 As a result, the population values of the coefficients
cannot be estimated precisely.
 Under multicollinearity, standard errors of estimators
become very sensitive to even the slightest change in
the data.
 Note that multicollinearity is only about linear relationships between two or more explanatory variables; it is not about nonlinear relationships between variables.
Con’t
• In general, the problem of multicollinearity
arises when individual effects of explanatory
variables cannot be isolated and the
corresponding parameter magnitudes
cannot be determined with the desired
degree of precision.
• Though it is quite frequent in cross-section data as well, it tends to be a more common and more serious problem in time series data.
Causes/Sources of Multicollinearity
1. Little variability in the values of the explanatory variables.
 This arises from sampling over a limited range of the values taken by the regressors in the population (little variability in the values of the regressors).
2. Constraints/restrictions imposed on the model or on the population being sampled.
3. Model specification error, for example adding polynomial terms to a regression model, especially when the range of the X variable is small.
4. An overdetermined model. This happens when the model has a large number of explanatory variables but only a few observations.
Consequences of Multicollinearity
1. If there is perfect collinearity among the explanatory variables, the coefficients are indeterminate and their standard errors are not defined.
 However, even if collinearity is high (but not perfect), estimation of the regression coefficients is possible.
 But their standard errors are still large; as a result, the population values of the coefficients cannot be estimated precisely.
2. The variances, covariances and standard errors of the OLS estimators become larger under multicollinearity. This has serious statistical consequences.
3. Underestimation of the t-statistics of individual coefficients.
 The computed t-values/ratios become low because of the high standard errors. This leads to accepting the null hypothesis more easily (i.e., a true population coefficient can be declared zero, or statistically insignificant, more easily and more frequently).
4. Wrong confidence interval estimation of coefficients. Because of the large standard errors, the confidence intervals for the relevant population parameters tend to be wider, and interval estimation leads to faulty inferences/conclusions.
5. The OLS estimators and their standard errors are highly sensitive to small changes in the data.
The Variance Inflation Factor (VIF)
 VIF = 1 / (1 − r23²), where r23 is the pair-wise correlation between the regressors X2 and X3.
 The VIF shows how the variances and covariances of the estimators are inflated by the presence of multicollinearity.
 If there is no collinearity between X2 and X3, r23 = 0 and VIF = 1.
 As r23² approaches 1, that is, as collinearity increases, the VIF becomes a very large number, the variances of the estimators increase, and in the limit when r23² = 1 (perfect collinearity) the VIF becomes infinite. Likewise, as r23² increases toward 1, the covariance of the two estimators also increases in absolute value.
 If the VIF exceeds 10, there is a problem of multicollinearity.
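As an illustrative sketch (not taken from the slides), the VIF can be inspected in Stata after an OLS regression; the built-in auto dataset and the variable names below are only placeholders:

* Illustrative VIF check using Stata's built-in auto data
sysuse auto, clear
* Regress price on regressors that tend to move together
regress price mpg weight length
* Post-estimation VIFs; values above about 10 signal serious multicollinearity
estat vif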
Detecting Multicollinearity
• Multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and the absence of multicollinearity, but between its various degrees. Multicollinearity is a feature of the sample and not of the population.
1. A high R² but insignificant coefficients.
 The clearest sign of multicollinearity is a very high R² combined with insignificant t-statistics for most of the regression coefficients.
 A high R² also implies a large F-statistic. Hence, the F-test will in most cases reject the null hypothesis that the partial slope coefficients are simultaneously equal to zero, yet the individual t-tests may show that none of the partial slope coefficients is significant.
2. High pair-wise correlations among the regressors.
 If the pair-wise correlation coefficient between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem (a short Stata sketch follows below).
3. A high R² but very low partial correlations between the dependent variable and the explanatory variables could also indicate the existence of multicollinearity.
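A minimal sketch of the pair-wise correlation check in Stata (the auto data stand in for whatever regressors are being examined):

* Pair-wise correlations among candidate regressors (illustrative data)
sysuse auto, clear
pwcorr mpg weight length displacement, sig
* Rule of thumb from the slides: a pair-wise correlation above about 0.8
* suggests that multicollinearity may be a serious problem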
Heteroscedasticity
(The error variance is not constant)
The nature of heteroscedasticity
[Figure: homoscedastic error variance vs. heteroscedastic error variance]
Reasons why the variances of Ui may vary
Consequences of the heteroscedasticity problem
Detection of Heteroscedasticity
There are informal and formal methods of testing for, or detecting the presence of, the heteroscedasticity problem.
1. The informal methods are based on graphical inspection of the residuals.
Con’t
In fig (a), we see there is no systematic
pattern between the two variables,
suggesting that perhaps no hetroscedasticity
is present in the data.
 Figures B to E, however, exhibit definite
patterns.
For instance, suggests a linear relationship
where as D and E indicate quadratic
relationship between and .

19
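A quick graphical check of this kind can be sketched in Stata (illustrative only; the auto data and variables stand in for the model being examined):

* Plot residuals and squared residuals against fitted values (illustrative)
sysuse auto, clear
regress price mpg weight
predict yhat, xb
predict uhat, residuals
gen uhat2 = uhat^2
* A fan or funnel shape in either plot hints at heteroscedasticity
rvfplot, yline(0)
twoway scatter uhat2 yhat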
2. The formal methods of detecting heteroscedasticity are based on statistical tests, such as the Spearman rank correlation test: compute the rank correlation r_s between |û_i| and X_i and test it with t = r_s·√(n − 2) / √(1 − r_s²), which has n − 2 degrees of freedom.
• If the computed t-value exceeds the critical t-value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it.
• If the regression model involves more than one X variable, r_s can be computed between |û_i| and each of the X variables separately, and each can be tested for statistical significance by the same t-test.
Rank Correlation Test of Heteroscedasticity
[Worked example from the slides: rank correlation between the explanatory variable and the absolute residuals for a sample of n = 10]
• Note that for 8 (= 10 − 2) df this t-value is not significant even at the 10% level of significance (for comparison, the two-tailed 5% critical t-value for 8 df is 2.306).
• Thus, there is no evidence of a systematic relationship between the explanatory variable and the absolute values of the residuals, which might suggest that there is no heteroscedasticity.
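The same test can be sketched in Stata (illustrative; the regression and variable names are placeholders, not the worked example from the slides):

* Spearman rank correlation between |residuals| and a regressor (illustrative)
sysuse auto, clear
regress price weight
predict uhat, residuals
gen absuhat = abs(uhat)
* spearman reports the rank correlation and its p-value;
* a significant correlation points to heteroscedasticity
spearman absuhat weight, stats(rho p)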
Solution for heteroscedasticity
• In the consequences of the heteroscedasticity problem we saw that the problem does not destroy the unbiasedness and consistency properties of the OLS estimators.
• However, they are no longer efficient (in the small-sample case) nor asymptotically efficient (in the large-sample case).
• This lack of efficiency violates the BLUE property of the estimates and hence makes hypothesis testing inappropriate.
• Therefore, a remedial measure should be taken.
• There are two approaches to the remedial measures:
 when the variance of the error term is known, and
 when it is unknown.
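A rough Stata sketch of these two approaches (the robust-standard-error regression illustrates the unknown-variance case; the weighted regression only illustrates the known-variance idea, with 1/weight as an assumed weighting variable):

* Illustrative remedies (auto data as a stand-in)
sysuse auto, clear
* Unknown error variance: keep OLS coefficients but use
* heteroscedasticity-robust (White) standard errors
regress price mpg weight, vce(robust)
* Known (or assumed) variance structure: weighted least squares,
* here weighting by 1/weight purely for illustration
regress price mpg weight [aweight = 1/weight]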
Autocorrelation
(Error terms are correlated)
 In the simple and multiple regression models, one of the assumptions of the classical model is that cov(u_i, u_j) = E(u_i u_j) = 0 for i ≠ j.
 This implies that successive values of the disturbance term U are temporally independent, i.e. the disturbance occurring at one point of observation is not related to the disturbance at any other point.
 If the value of u in any particular period is correlated with its own preceding value(s), we say there is autocorrelation of the random disturbances.
 Hence, autocorrelation is defined as a 'correlation' between members of a series of observations ordered in time.
 What is the difference between 'correlation' and autocorrelation?
 Autocorrelation is a special case of correlation which refers to the relationship between successive values of the same variable, while
 correlation may also refer to the relationship between two or more different variables.
Note: Autocorrelation between error terms is a more common problem in time series data than in cross-sectional data.
Graphical representation of Autocorrelation
[Figure: plots (a)–(e) of the disturbance terms over time]
• Autocorrelation is correlation between members of a series of observations ordered in time.
Con’t
The figures (a) –(d) above, show a cyclical
pattern among the U’s indicating
autocorrelation i.e.
 figures (b) and (c) suggest an upward and
downward linear trend and (d) indicates
quadratic trend in the disturbance terms.
 Figure (e) indicates no systematic pattern
supporting non-autocorrelation assumption
of the classical linear regression model.
35
Reasons for Autocorrelation/
Causes of Autocorrelation
• There are several reasons why serial correlation or autocorrelation arises. Some of these are:
A. Cyclical fluctuations / inertia effect
 Time series such as GNP, price indexes, production, employment and unemployment exhibit business cycles.
 Starting at the bottom of a recession, when economic recovery begins, most of these series move upward.
 In this upswing, the value of a series at one point in time is greater than its previous value.
 Therefore, in regressions involving time series data, successive observations are likely to be interdependent.
B. Specification bias
This arises because of the following:
i. Exclusion of variables from the regression model
ii. Incorrect functional form of the model
iii. Neglecting lagged terms from the regression model
i) Exclusion of variables from the regression model
For example, suppose the correct demand model is given by:
y_t = α + β_1x_1t + β_2x_2t + β_3x_3t + U_t .......(1)
where y_t = quantity of beef demanded, x_1t = price of beef, x_2t = consumer income, x_3t = price of pork, and t = time.
Now, suppose we run the following regression in lieu of (1):
y_t = α + β_1x_1t + β_2x_2t + V_t .......(2)
 If equation (1) is the 'correct' model or true relation, running equation (2) amounts to letting V_t = β_3x_3t + U_t, and to the extent that the price of pork affects the consumption of beef, the error or disturbance term V_t will reflect a systematic pattern, thus creating autocorrelation.
ii. Incorrect functional form: this is also a source of autocorrelation in the error term. Suppose the 'true' or correct model in a cost-output study is:
Marginal cost_i = β_1 + β_2·output_i + β_3·output_i² + U_i
However, we incorrectly fit the following model:
Marginal cost_i = α_1 + α_2·output_i + V_i
 The marginal cost curve corresponding to the 'true' model is shown in the figure below along with the 'incorrect' linear cost curve.
[Figure: quadratic 'true' marginal cost curve and the fitted linear cost curve]
 This result is to be expected because the disturbance term is, in fact, equal to β_3·output_i² + U_i, and hence will catch the systematic effect of the output² term on marginal cost.
 In this case, V_i will reflect autocorrelation because of the use of an incorrect functional form.
iii. Neglecting lagged terms from the model: if the dependent variable of a regression model is affected by its own lagged value or by the lagged value of an explanatory variable, and this lag is not included in the model, the error term of the incorrect model will reflect a systematic pattern, which indicates autocorrelation in the model.
 Suppose the correct model for consumption expenditure is: C_t = β_1y_t + β_2y_{t−1} + U_t
 But again, for some reason, we incorrectly regress: C_t = β_1y_t + V_t
 As in the case of (1) and (2), V_t = β_2y_{t−1} + U_t. Hence, V_t shows a systematic change, reflecting autocorrelation.
C. Non-stationarity of the time series: a time series is stationary if its mean, variance and covariance are time-invariant (constant). When the mean, variance and covariance of a time series variable are not constant over time, it is called non-stationary. A non-stationary time series can cause an autocorrelation problem.
D. Manipulation of data and data transformation may also induce an autocorrelation problem into the data, even if the original data do not contain one.
Matrix representation of autocorrelation
 The variance-covariance matrix of the error terms is E(uu′) (see the sketch below).
 The assumption of no autocorrelation is responsible for the appearance of zeros off the diagonal, whereas
 the assumption of homoscedasticity establishes the equality of the diagonal terms.
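In LaTeX form, a reconstruction of the matrix being described, based only on the two assumptions stated above (σ² denotes the constant error variance; this is not copied from the slide):

E(\mathbf{u}\mathbf{u}') = \sigma^2 I_n =
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}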
Example
• If u_t = ρu_{t−1} + v_t, we say that the U's follow a first-order autoregressive scheme, AR(1) (or a first-order Markov scheme).
• If u_t depends on the values of the two previous periods, then u_t = ρ_1u_{t−1} + ρ_2u_{t−2} + v_t, an AR(2) scheme.
• Generally, when autocorrelation is present, we assume the simplest, first-order, linear form of autocorrelation:
u_t = ρu_{t−1} + v_t,
where ρ is the coefficient of autocorrelation and v_t is a random variable satisfying all the basic assumptions of ordinary least squares.
• The above relationship states the simplest possible form of autocorrelation; if we apply OLS to this scheme, we can treat autocorrelation in the same way as ordinary correlation between successive values of the same variable.
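A small simulation sketch in Stata of the AR(1) scheme just described (purely illustrative; ρ = 0.8 and T = 100 are arbitrary choices):

* Simulate T = 100 observations with AR(1) disturbances u_t = 0.8*u_{t-1} + v_t
clear
set obs 100
set seed 12345
gen t = _n
tsset t
gen v = rnormal()
gen u = v in 1
replace u = 0.8*L.u + v in 2/100
* Positive autocorrelation shows up as long runs of u above or below zero
twoway line u t, yline(0)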
Consequences of Autocorrelation
The use of OLS in the presence of serial correlation between the error terms has the following consequences:
1. Although the coefficient estimators remain linear, unbiased, and asymptotically normally distributed in the presence of autocorrelation, the OLS estimators are no longer BLUE (they are not efficient).
2. When there is autocorrelation between the error terms, the estimated variance of the residuals is biased and underestimates the true variance: E(σ̂²) < σ².
3. As a consequence of (2), the variances and standard errors of the OLS estimators are underestimated, and test statistics computed with these standard errors are invalid.
Consequences of Autocorrelation
4. Since the variance of the residuals is underestimated, the R² and the F-statistic of the regression are overestimated.
5. The underestimation of the standard errors of the coefficients leads to overestimation of the t-values of individual coefficients. Hence, a coefficient is declared statistically significant (i.e., H0: β = 0 is rejected) more easily.
6. The confidence intervals for individual coefficients derived from OLS are likely to be wider than those based on the efficient GLS procedure.
7. Hypothesis testing using the usual t and F tests is therefore no longer reliable.
Detecting autocorrelation
[Slides: the Durbin–Watson d test, d = Σ(e_t − e_{t−1})² / Σe_t², together with its decision bounds d_L and d_U]
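A minimal sketch of running the d test in Stata (illustrative; the data here are simulated, not the example from the slides):

* Durbin–Watson test after an OLS regression on tsset time-series data
clear
set obs 50
set seed 2024
gen t = _n
tsset t
gen x = rnormal()
gen y = 1 + 2*x + rnormal()
regress y x
* estat dwatson reports the Durbin–Watson d statistic
estat dwatson
* Breusch–Godfrey test as an alternative check
estat bgodfrey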
Example: Suppose that in a regression involving 50 observations and 4 regressors the estimated d was 1.43. From the Durbin–Watson table we find that at the 5% level the critical d values are dL = 1.38 and dU = 1.72. Note that on the basis of the d test we cannot say whether there is positive autocorrelation or not, because the estimated d value lies in the indecisive range.
How to overcome autocorrelation?
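The slide's list of remedies is not reproduced here; as a hedged sketch, two standard approaches can be run in Stata as follows (simulated data, lag(1) chosen arbitrarily):

* Illustrative remedies for AR(1) autocorrelation (simulated data)
clear
set obs 50
set seed 99
gen t = _n
tsset t
gen x = rnormal()
gen y = 1 + 2*x + rnormal()
* Prais–Winsten / Cochrane–Orcutt feasible GLS transformation
prais y x, corc
* Keep OLS coefficients but use Newey–West (HAC) standard errors
newey y x, lag(1)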
1. Given a sample of 50 observations and four explanatory variables, what can you say about autocorrelation if
i) d = 1.05   ii) d = 1.40
2. Suppose that a researcher used 20 years of data on imports and GDP of Ethiopia. Applying OLS to the observations, she obtained the following import function: [regression output and d statistic not shown].
Use the Durbin–Watson test to examine the problem of autocorrelation.
Getting Data into Stata
I. Entering data via the command window
– Following the input command, we type the sequence of variable names (up to 32 characters each) separated by blanks.
– Example: type input id age sex income
– When we have finished entering the data, we type the word end.
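A sketch of what this looks like in the Command window (the values are made up):

* Type the data directly; finish with "end"
clear
input id age sex income
1 25 1 3500
2 31 0 4200
3 47 1 6100
end
list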
II. Entering data via the data spreadsheet
• The user can type edit in the command window.
• A data spreadsheet (the Data Editor) will appear.
• Treating each column as a separate variable, the user begins entering the data.
Cont.…
• Stata will recognize the data type and allocate the appropriate format itself.
III. Copying data from a Spreadsheet
a) Select the whole data set
b) Copy the whole data set
c) Open STATA
d) Type edit
e) Hit the ENTER key
f) Click on EDIT in the header bar and a drop-down
menu will appear
g) Click on paste in the drop-down menu
h) The whole spreadsheet of data is transferred to
STATA
Cont.…
• Example:
ovtest
Ramsey RESET test using powers of the fitted values of cons
Ho: model has no omitted variables
F(3, 1441) = 4.47
Prob > F = 0.0039
– The ovtest rejects the hypothesis that there are no omitted variables, indicating that we need to improve the specification.
• Heteroskedasticity
– We can use the hettest command, which runs an auxiliary regression of the squared residuals on the fitted values.
hettest
Ho: Constant variance
Variables: fitted values of cons
chi2(1) = 81.50
Prob > chi2 = 0.0000
– The hettest output indicates that there is heteroskedasticity, which needs to be dealt with.
• vif
• ovtest
• hettest
• predict resid, residual
• histogram resid, normal
• histogram resid, kdensity normal
• t-tests for a continuous variable:
• ttest LANSZ, by(MCPART)
• For discrete variables we use chi2
• Autocorrelation can be tested using the DW statistic:
– dwstat (estat dwatson in newer versions of Stata)
Some commands for diagnostic tests
• vif - multicollinearity test for continuous variables
• pwcorr - pair-wise correlation, or contingency-coefficient test, for discrete explanatory/dummy variables
• ovtest - Ramsey RESET test for omitted variables (misspecification)
• hettest - heteroskedasticity test
• predict resid, residual
• histogram resid, normal - normality check of the residuals
• histogram resid, kdensity normal
• t-tests for a continuous variable:
• ttest LANSZ, by(MCPART)
• For discrete variables we use chi2:
• tab psi grade, chi2
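A hedged end-to-end sketch of how these commands fit together in one session (the auto data and variable names are placeholders, not the course dataset):

* Illustrative diagnostic workflow
sysuse auto, clear
regress price mpg weight foreign
estat vif                 // multicollinearity
estat ovtest              // Ramsey RESET (omitted variables)
estat hettest             // heteroskedasticity
predict resid, residuals
histogram resid, normal   // informal normality check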
