UNIT 3 MODEL SELECTION CRITERIA
Structure
3.0 Objectives
3.1 Introduction
3.2 Tests for Specification Errors
3.2.1 Test for Presence of Irrelevant Variables
3.2.3 Test for Omitted Variables and Incorrect Functional Form: Ramsey’s Test
3.0 OBJECTIVES
After reading this unit, you will be able to:
* Rimpy Kaushal, PGDAV College, Delhi.
Empirical Issues in Econometric Research
3.1 INTRODUCTION
One of the assumptions of the ‘classical linear regression model’ (CLRM) is
that the regression model is correctly specified. If the regression model is
not correctly specified, we encounter the problem of model specification
errors. We studied the consequences of mis-specification of a model in the
previous unit (Unit 2). However, if we take a closer look at the assumption of
no specification errors, we find that specifying the true model for a given
dataset is nearly impossible. Therefore, several model selection criteria are
suggested in theory. Hendry and Richard (1983) suggest six criteria that
should be met while formulating a regression model. These are:
However, the criteria given above lay down only a theoretical framework. In
practice, very often, we commit errors in model specification.
In order to test whether a variable, say Xki, adds real explanation to the
model (i.e. it is relevant to the model specified), we test the significance
of the estimated βk by the usual t-test. Likewise, for testing the relevance
of a set of variables in the model (i.e. all independent variables taken
together), we apply the F-test. In other words, the presence of irrelevant
variables in the model is detected by the usual t and F-tests. But it is
important to note that these tests of significance are carried out under the
assumption of ‘true specification’ of the regression model. In view of this,
the approach needs to be adopted along with a process called ‘data mining’,
which is a set of diagnostic procedures for arriving at a good model.
Data mining is a term used to describe the process applied to extract useful
data (satisfying the assumptions made for the CLRM) from the raw data. It
basically means detecting and removing errors in the data set that arise from
violation of the assumptions of the CLRM. The primary objective of data mining
is therefore to conduct several diagnostic tests and finally arrive at a
regression model that fits the data well.
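where Y = total cost and X = output. Let us assume that a researcher fits a
quadratic model, ignoring the cubic term, as:
$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + u_i$ (3.3)
or a linear model, ignoring both the quadratic and the cubic terms, as:
$Y_i = \beta_1 + \beta_2 X_i + u_i$ (3.4)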
For illustration, we consider the results of regression estimates for the three
models, 3.2 to 3.4, drawn from a secondary source (Gujarati, Basic
Econometrics, Fourth Edition, p-519) as follows:
Given the true relationship between total cost and output (Model 3.2), Models
3.3 and 3.4 suffer from specification errors. The residual series from the
estimation of Models 3.3 and 3.4 also exhibit distinct patterns, indicating
the presence of specification errors (Fig. 3.1).
[Figure: residuals (vertical axis, -40 to 60) plotted against observations 1 to 10 for the Linear Model, Quadratic Model, and Cubic Model]
Fig. 3.1: Residual Series from Linear, Quadratic, and Cubic Models
$H_0: \beta_3 = \beta_4 = 0$
where m is the number of restrictions imposed (i.e. 2 in our case, for the
two extra regressors included in the unrestricted model), n is the number
of observations, and k is the number of parameters in the unrestricted
model. The F-statistic above has m and (n–k) degrees of freedom, i.e. 2 and 6.
e) If the computed value of F is statistically significant, we reject the null
hypothesis of correct specification of the restricted model and conclude
that the model is mis-specified.
Returning to our total cost model, specified in 3.4, we get the following
estimates for the restricted and unrestricted models:
Restricted Model: $\hat{Y}_i = 166.467 + 19.933 X_i$; $R^2 = 0.8409$.
Unrestricted Model: $\hat{Y}_i = 2140.722 + 476.655 X_i - 0.092 \hat{Y}_i^2 + 0.0001 \hat{Y}_i^3$; $R^2 = 0.9983$.
For such a high value of the test statistic, we can reject the null hypothesis
at all conventional levels of significance. Thus, we can conclude that model
3.4 is mis-specified. The intuition behind this test is that if adding powers
of the predicted values of the dependent variable increases the explanatory
power of the model, this may be evidence of specification error. Although easy
to apply, the RESET test has some drawbacks. First, the test does not suggest
any alternative specification. Second, the test does not provide any guidance
on which powers of the predicted variable to include in the unrestricted model.
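The test statistic for the total cost example can be checked from the two reported R² values. The following is a minimal sketch, assuming the R²-based form of the F-statistic, F = [(R²_new − R²_old)/m] / [(1 − R²_new)/(n − k)], and taking n = 10 from the stated degrees of freedom (n − k = 6 with k = 4). The function name and the tabulated 5% critical value are illustrative additions, not part of the original text.

```python
def reset_f_statistic(r2_restricted, r2_unrestricted, m, n, k):
    """F-statistic for the RESET test computed from the two R-squared values.

    m: number of added regressors (restrictions)
    n: number of observations
    k: number of parameters in the unrestricted model
    """
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1.0 - r2_unrestricted) / (n - k)
    return numerator / denominator


# Rounded R-squared values reported for the total cost example.
f_value = reset_f_statistic(0.8409, 0.9983, m=2, n=10, k=4)

# Tabulated 5% critical value of F with (2, 6) degrees of freedom.
F_CRITICAL_5PCT = 5.14

reject_null = f_value > F_CRITICAL_5PCT
print(f"F = {f_value:.2f}, reject H0 at 5%: {reject_null}")
```

With these rounded inputs the statistic comes out at roughly 278, far above the critical value. Because 1 − R² is very small for the unrestricted model, the result is sensitive to how many decimals of R² are used.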
Check Your Progress 1 [answer within the space given in about 50-100
words]
1) State the six model selection criteria suggested by Hendry and Richard.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
2) How are the t-test and the F-test helpful in identifying irrelevant
variables in a regression model?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
3) What does the term ‘data mining’ connote?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
4) Mention the six broad indicators of a regression result that indicate
the relevance of the model estimated.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
5) What feature observed in the graphical residual series in Fig. 3.1 suggests
that the cubic model is superior to the quadratic and linear models?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
6) What does the acronym RESET stand for? What specific purpose does
this test serve? Specify the steps involved in this test procedure.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
7) Mention the two limitations of the RESET test.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
8) Consider the estimated regression:
$\hat{Y}_i = -21.77 + 0.002 X_{1i} + 0.123 X_{2i} + 13.85 X_{3i}$
(29.475) (0.0006) (0.013) (9.010)
$n = 88$; $R^2 = 0.672$
For testing specification error, the RESET test was conducted and the
following subsidiary regression was obtained:
$\hat{Y}_i = 166.097 + 0.0001 X_{1i} + 0.0176 X_{2i} + 2.175 X_{3i} + 0.000353 \hat{Y}_i^2 + 0.00000154 \hat{Y}_i^3$
(317.433) (0.00520) (0.299251) (33.8881) (0.0071) (0.0000065)
$n = 88$; $R^2 = 0.70553$
Carry out the RESET test for specification error at the 5% level of
significance, stating clearly the null and alternative hypotheses.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
3.3.1 R²
The coefficient of determination (R²) is one of the measures of goodness of
fit of a regression model. Recall that it is defined as:
$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$ (3.7)
3.3.2 Adjusted R²
Henry Theil (1961) developed another measure of goodness of fit called the
adjusted R² (or $\bar{R}^2$). Recall that it introduces a penalty for each
additional variable included in the regression model by adjusting for the
degrees of freedom as follows:
$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k}$ (3.8)
Thus, for a k-variable model, $\bar{R}^2 \le R^2$. However, unlike R², adjusted
R² will increase only if the additional variable included in the model raises
the explanatory power of the model by enough to offset the loss of a degree of
freedom. Thus, as a criterion for model selection, adjusted R² is a better
indicator than R². However, here also, in order to compare the adjusted R²
from two models, the dependent variable must be the same.
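The penalty built into adjusted R² can be seen with a small numerical sketch. It assumes the form 1 − (1 − R²)(n − 1)/(n − k); the function name and the R² values, sample size, and parameter counts are hypothetical and chosen only for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k parameters (incl. intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Hypothetical example: a fourth parameter raises R-squared only marginally.
adj_before = adjusted_r_squared(0.800, n=20, k=3)
adj_after = adjusted_r_squared(0.802, n=20, k=4)

print(round(adj_before, 4), round(adj_after, 4))
```

Here R² rises from 0.800 to 0.802 when the extra variable is added, yet adjusted R² falls, because the small gain in fit does not compensate for the lost degree of freedom.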
where ln AIC is the ‘natural log of AIC’ and 2k/n is the ‘penalty factor’.
Most statistical software packages report the log-transformed AIC. AIC imposes
a stronger penalty than the adjusted R² for including more variables in the
regression model. As a model selection criterion, when choosing between many
models, the model with the lowest value of AIC is preferred. This criterion
has many advantages. One of them is that it can be used to compare both the
in-sample and the out-of-sample forecasting performance of a regression model.
It is also useful in choosing the lag length in an autoregressive model in
time series analysis.
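The trade-off between fit and the penalty factor can be sketched numerically. The example assumes the log-transformed form ln AIC = 2k/n + ln(RSS/n), which is consistent with the penalty factor 2k/n named above; the function name and the RSS, n, and k values are hypothetical.

```python
import math


def ln_aic(rss, n, k):
    """Log-transformed AIC: penalty factor 2k/n plus ln(RSS/n)."""
    return 2 * k / n + math.log(rss / n)


# Hypothetical models fitted to the same n = 20 observations.
ln_aic_small = ln_aic(rss=50.0, n=20, k=3)  # fewer regressors
ln_aic_large = ln_aic(rss=49.0, n=20, k=4)  # one extra regressor, slightly lower RSS

preferred = "small" if ln_aic_small < ln_aic_large else "large"
print(round(ln_aic_small, 4), round(ln_aic_large, 4), preferred)
```

In this illustration the extra regressor lowers the RSS only slightly, so the increase in the penalty factor outweighs the gain in fit and the smaller model has the lower ln AIC.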
3.3.4 Schwarz Information Criterion (SIC)
Also called the Bayesian information criterion (BIC), the SIC is quite similar
to the AIC. It is defined as:
$SIC = n^{k/n}\,\frac{\sum \hat{u}_i^2}{n} = n^{k/n}\,\frac{RSS}{n}$ (3.11)
In (3.12), the expression $[(k/n)\ln n]$ is the ‘penalty factor’. SIC imposes
the strongest penalty (stronger than that of AIC) for adding an additional
variable to the regression model. For model selection purposes, a model with a
lower value of SIC is considered better. It is also useful for comparing both
the in-sample and the out-of-sample forecasting performance of a regression
model.
where $2k\ln[\ln(n)]$ is the penalty factor for adding an extra variable to the
regression model. This penalty factor is also stronger than the penalty factor
in AIC. Given any two estimated models, the model with the lower value of
HQIC is preferred. Like AIC and BIC, HQIC is also useful for comparing the
in-sample and out-of-sample forecasting performance of the regression model.
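The relative strength of the three penalties can be compared directly. The sketch below assumes all three criteria are written on the common scale n·ln(RSS/n) + penalty, so that the penalties are 2k for AIC, 2k·ln[ln(n)] for HQIC, and k·ln(n) for SIC. This common scaling, the function name, and the sample sizes are assumptions made for illustration.

```python
import math


def penalty_terms(n, k):
    """Penalty added to n*ln(RSS/n) by each criterion (assumed common scale)."""
    return {
        "AIC": 2 * k,
        "HQIC": 2 * k * math.log(math.log(n)),
        "SIC": k * math.log(n),
    }


# Hypothetical sample sizes, each with k = 4 estimated parameters.
for n in (10, 88, 1000):
    terms = penalty_terms(n, k=4)
    ranking = sorted(terms, key=terms.get)  # weakest to strongest penalty
    print(n, {name: round(value, 3) for name, value in terms.items()}, ranking)

penalties_88 = penalty_terms(88, k=4)
```

On this scale the SIC penalty exceeds the AIC penalty once ln(n) > 2 (n of 8 or more), and the HQIC penalty exceeds the AIC penalty once ln[ln(n)] > 1 (n of 16 or more), so the ordering described in the text holds for all but very small samples.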
Table 3.2: Estimated Regression Results by OLS Estimation for Modified Model
Thus, AIC suggests that either of the two models can be chosen, whereas the
BIC suggests that Model 2 is better than Model 1. Sometimes serial
correlation is caused by specification errors. Thus, the Durbin-Watson
test statistic can also be helpful in detecting the presence of specification
errors. In our example, the Durbin-Watson statistic is close to 2 in both
models (1.8172 for Model 1 and 1.7971 for Model 2), suggesting that there is
no adequate evidence of autocorrelation or specification errors.
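The Durbin-Watson statistic itself is simple to compute from a residual series. The sketch below assumes the usual definition, d = Σ(e_t − e_{t−1})² / Σe_t²; the function name and the two residual series are hypothetical and serve only to show how d moves away from 2.

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (current - previous) ** 2
        for previous, current in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series, for illustration only.
persistent = [1.0, 1.0, -1.0, -1.0]    # neighbouring residuals tend to share a sign
alternating = [1.0, -1.0, 1.0, -1.0]   # residuals flip sign every period

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(d_persistent, d_alternating)
```

Values well below 2 point to positive serial correlation and values well above 2 to negative serial correlation, which is why the values of 1.8172 and 1.7971 reported in the text are read as showing no marked autocorrelation.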
Check Your Progress 2 [answer within the space given in about 50-100
words]
1) Define the term ‘model selection criteria’. What does it basically aim at?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
2) Distinguish between the terms ‘in-sample forecast’ and ‘out-of-sample
forecast’.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
3) How is R² useful in determining the ‘choice of a model’? What are its
limitations?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
4) How is adjusted R² superior to R²?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
5) Define AIC. How is AIC superior to adjusted R²? What are its advantages?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
6) What is the Schwarz Information Criterion? What are its advantages?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
3.6 KEYWORDS
Model Specification : Model specification refers to the description of the
process by which the dependent variable is estimated by a set of independent
variables considered.
Correct Specification : Correct specification of the model is one which
represents the true relationship between the regressors and the regressand.
Restricted Model : This is the model which imposes some restrictions on the
values of one or more of the coefficients of the model.
In-sample forecasting : An in-sample forecast employs a subset of the dataset
to forecast the values within the estimation period and compares them to the
actual outcomes. In other words, in-sample forecasting is done to assess how
well the chosen model fits the data in a given sample.
4) Gujarati, D.N. and Porter, D.C. (2009). Basic Econometrics (Fifth Edition), McGraw Hill.
5) Wooldridge, J.M. (2009). Econometrics, Cengage Learning.
3.8 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1