
UNIT 3 MODEL SELECTION CRITERIA*

Structure

3.0 Objectives
3.1 Introduction
3.2 Tests for Specification Errors
3.2.1 Test for Presence of Irrelevant Variables
3.2.2 Test for Omitted Variables and Incorrect Functional Form: Ramsey’s Test

3.3 Model Selection Criteria


3.3.1 R2
3.3.2 Adjusted R2
3.3.3 Akaike Information Criterion (AIC)
3.3.4 Schwarz Information Criterion (SIC)
3.3.5 Hannan-Quinn Information Criterion (HQIC)

3.4 Illustration for Model Selection Using the Various Criteria


3.5 Let Us Sum Up
3.6 Key Words
3.7 Suggested Books for Further Reading
3.8 Answers/Hints to Check Your Progress Exercises

3.0 OBJECTIVES
After reading this unit, you will be able to:

• specify the criteria for formulating a regression model;


• state the particulars of test for detecting the presence of ‘irrelevant
variables’;
• discuss the Ramsey’s Test (RESET) for identification of ‘omitted
variables’ and ‘incorrect functional form’;
• distinguish between the terms ‘in-sample forecast’ and ‘out-of-sample
forecast’;
• outline how R2 and adjusted R2 serve as indicators of ‘goodness of fit’ of
a regression model;
• differentiate between the ‘Akaike Information Criterion’ (AIC) and the
‘Schwarz Information Criterion’ (SIC) commenting on their usefulness
in forecasting the performance of a regression model; and
• illustrate the model selection procedure using the various criteria.

* Rimpy Kaushal, PGDAV College, Delhi.
3.1 INTRODUCTION
One of the assumptions of ‘classical linear regression model’ (CLRM) is that
the regression model is correctly specified. If the regression model is not
correctly specified, we encounter the problems of model specification errors.
We have studied the consequences of mis-specification of a model in the
previous unit (Unit 2). However, if we take a closer look at the assumption of
no specification errors, we find that specifying the true model for a given
dataset is nearly impossible. Therefore, several model selection criteria have
been suggested in the literature. Hendry and Richard (1983) suggest six criteria
that should be met while formulating a regression model. These are:

1) Data Admissibility: Model predictions are consistent with the data.


2) Theoretical Consistency: Model specified is consistent with the existing
theory.
3) Exogenous Regressors: Regressors are uncorrelated with the error term.
4) Parameter Constancy: Estimates of the parameters are stable.
5) Data Coherency: The residuals from the model are purely random (i.e.
white noise).
6) Encompassing: The model is able to explain results from other rival
models (in other words, no other model is better than the chosen model).

However, the criteria given above lay down only a theoretical framework. In
practice, very often, we commit errors in model specification.

3.2 TESTS FOR SPECIFICATION ERRORS


Let us recall the different types of model specification errors (and their
consequences) studied in Unit 2 on ‘Specification Issues’. No researcher
knowingly commits specification errors. Specification errors arise
inadvertently due to researcher’s inability to develop the regression model as
meticulously as required. There can be several reasons for such specification
errors like weak theoretical background, unavailability of adequate data, etc.
Therefore, in practice, researchers emphasise on the detection of the
specification errors instead of finding out the reasons for specification errors.
In this section, we discuss some tests that are helpful in detecting the
specification errors.

3.2.1 Test for Presence of Irrelevant Variables


Let us consider a k-variable regression model as follows:

$Y_i = \beta_1 + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i \qquad (3.1)$

In order to test whether a variable, say $X_{ki}$, adds real explanatory power to
the model (i.e. whether it is relevant to the model specified), we test the
significance of the estimated $\beta_k$ by the usual t-test. Likewise, for testing
the relevance of a set of variables in the model (i.e. all independent variables
taken together), we apply the F-test. In other words, the presence of irrelevant
variables in the model is detected by the usual t and F tests. It is important to
note, however, that these tests of significance are carried out under the
assumption of ‘true specification’ of the regression model. In view of this, the
approach needs to be adopted along with a process called ‘data mining’, which is
a set of diagnostic procedures for developing or arriving at a good model.
Data mining is a term used to describe the process applied to extract useful
data (satisfying the assumptions made for the CLRM) from the raw data. It
basically means detecting and removing errors in the data set arising from
violation of the assumptions of the CLRM. The primary objective of data mining is
therefore to develop a model after conducting several diagnostic tests, so as to
finally arrive at a regression model that fits the data well.
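As an illustration of the t and F tests described above, the following is a minimal Python sketch (not part of this unit; it assumes the NumPy and statsmodels libraries, and the simulated variables and coefficient values are purely hypothetical). It fits a model containing one relevant and one irrelevant regressor and reads off the individual t-statistics and the joint F-statistic.

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: Y truly depends on x2 only; x3 is irrelevant by construction.
rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 2.0 + 1.5 * x2 + rng.normal(scale=1.0, size=n)

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# t-test on each coefficient: a small |t| (large p-value) for x3 suggests irrelevance.
print(res.tvalues, res.pvalues)

# F-test for the joint significance of all slope coefficients taken together.
print(res.fvalue, res.f_pvalue)
```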

3.2.2 Test for Omitted Variables and Incorrect Functional Form: Ramsey’s Test
A researcher can never be sure that the regression model formulated is the ‘true
or best’ model for empirical investigation. It is only the theoretical
framework and prior empirical studies that help a researcher. These help in
designing a model that is assumed to truly reflect the population
characteristics sought to be revealed by the regression model estimated on the
basis of sample data. In practice, it is only after this design stage that a model is
subjected to empirical investigation. In other words, it is only after the
specification of the model that the model is tested for its adequacy against the
data collected. This is done by examining carefully the broad features of the
empirical results such as: (i) the value of coefficient of determination (R2),
(ii) the value of the adjusted R2, (iii) significance of t-ratios estimated, (iv)
results of F-test, (v) signs of the estimated coefficients, (vi) value of Durbin-
Watson test statistic to reveal the presence or absence of serial correlation
effect, etc. If the chosen model performs reasonably well in terms of broad
features specified above, then the model is considered to be a fair
representation of the true relationship in the population. However, if the
model fails to satisfy one or the other broad feature, the researcher would
have to suspect some specification error. This might be omitted variable bias,
wrong functional form, presence of serial correlation, etc. Thus, in order to
determine whether the model suffers from one or more specification errors,
one can adopt several methods (i.e. diagnostic procedures for detection and
treatment). These steps help in finally arriving at a ‘cleaned’ data set from
which the test results drawn would realistically reveal the population
characteristics.

Examination of the residual series is a good diagnostic tool for detecting the
presence of serial correlation and heteroscedasticity in the data. Residual
examination is also helpful in detecting specification errors in cross-sectional
data. This is because, in the presence of specification errors, the residual
series displays noticeable patterns. This can be illustrated with the ‘cubic
total cost function’ as follows. Let us begin by assuming the true ‘total
cost function’ as:

$Y_i = \beta_1 + \beta_2 X_i + \beta_3 X_i^2 + \beta_4 X_i^3 + u_i \qquad (3.2)$

where Y= total cost and X= output. Let us assume that a researcher fits a
quadratic model, ignoring the cubic term, as:

$Y_i = \alpha_1 + \alpha_2 X_i + \alpha_3 X_i^2 + v_i \qquad (3.3)$

And another researcher considers only a linear relationship between Y and X,


ignoring even the quadratic term, as:

$Y_i = \lambda_1 + \lambda_2 X_i + w_i \qquad (3.4)$

For illustration, we consider the results of regression estimates for the three
models, 3.2 to 3.4, drawn from a secondary source (Gujarati, Basic
Econometrics, Fourth Edition, p. 519), as follows:

For cubic cost function (Model 3.2)

$\hat{Y}_i = 141.767 + 63.478\,X_i - 12.962\,X_i^2 + 0.939\,X_i^3$

t-statistic: (22.2) (13.3) (–13.2) (15.9)

$R^2 = 0.9983$, $\bar{R}^2 = 0.9975$, $d = 2.70$

For quadratic cost function (Model 3.3)

$\hat{Y}_i = 222.383 - 8.0250\,X_i + 2.542\,X_i^2$

t-statistic: (9.5) (–0.82) (2.9)

$R^2 = 0.9284$, $\bar{R}^2 = 0.9079$, $d = 1.038$

For linear cost function (Model 3.4)

$\hat{Y}_i = 166.467 + 19.933\,X_i$

t-statistic: (8.752) (6.502)

$R^2 = 0.8409$, $\bar{R}^2 = 0.8210$, $d = 0.716$

Given the true relationship between total cost and output (Model 3.2), models
3.3 and 3.4 suffer from specification errors. The residual series from the
estimation of models 3.3 and 3.4 also exhibit distinct patterns, indicating the
presence of specification errors (Fig. 3.1).

[Figure: residual series from the linear, quadratic, and cubic cost models plotted against the ten observations.]

Fig. 3.1: Residual Series from Linear, Quadratic, and Cubic Models
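To see how such residual patterns arise, the following hypothetical Python sketch (not from the text; the coefficients and noise level are made up so as to roughly mimic the cubic cost example) generates data from a cubic cost function and fits the linear, quadratic and cubic specifications. The residuals of the two misspecified fits trace out a systematic curve over output, while those of the cubic fit look random.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data generated from a cubic total cost function,
# loosely mirroring Model 3.2 (coefficients chosen for illustration only).
rng = np.random.default_rng(1)
x = np.arange(1, 11, dtype=float)                      # output levels 1..10
y = 141.8 + 63.5 * x - 13.0 * x**2 + 0.94 * x**3 + rng.normal(scale=5, size=x.size)

def fit(powers):
    """Fit total cost on the given polynomial powers of output."""
    X = sm.add_constant(np.column_stack([x**p for p in powers]))
    return sm.OLS(y, X).fit()

linear, quadratic, cubic = fit([1]), fit([1, 2]), fit([1, 2, 3])

# Residuals from the misspecified (linear, quadratic) fits show a systematic
# pattern over output, while the cubic residuals are essentially random.
for name, res in [("linear", linear), ("quadratic", quadratic), ("cubic", cubic)]:
    print(name, np.round(res.resid, 1))
```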

A formal approach to detect the presence of specification errors was developed
by Ramsey (1969). The test is called RESET (regression specification error
test). To explain this test, let us revert to the total cost model discussed
above. After a visual examination of the residual series from the linear,
quadratic and cubic models, we saw that models 3.3 and 3.4 are mis-specified in
relation to model 3.2. Let us suppose that a researcher estimates model 3.4
($Y_i = \lambda_1 + \lambda_2 X_i + w_i$) and proceeds to test for specification
error. The steps involved in Ramsey’s RESET test are then as follows:

a) From the incorrectly estimated linear cost model, we first obtain the
estimated, or fitted, values of total cost, $\hat{Y}_i$.

b) Now estimate the model again after including higher powers of the
estimated total cost, viz. $\hat{Y}_i^2$ and $\hat{Y}_i^3$, as:

$Y_i = \beta_1 + \beta_2 X_i + \beta_3 \hat{Y}_i^2 + \beta_4 \hat{Y}_i^3 + u_i \qquad (3.5)$

c) Our initially estimated model (3.4) is now the restricted model and model
(3.5) is the unrestricted model. Consider the $R^2$ values of both these
models [i.e. $R^2_R$ and $R^2_{UR}$].

d) The null hypothesis of the test is that the restricted model is correctly
specified. That is:

$H_0: \beta_3 = \beta_4 = 0$

Now, we use the test statistic:

$F = \dfrac{(R^2_{UR} - R^2_R)/m}{(1 - R^2_{UR})/(n - k)} \qquad (3.6)$

where m is the number of restrictions imposed [i.e. 2 in our case, for the
two extra regressors included in the unrestricted model], n is the number
of observations and k is the number of parameters in the unrestricted model.
The F-statistic above has m and (n – k) degrees of freedom, i.e. 2 and 6 in our example.
e) If the computed value of F is statistically significant, we reject the null
hypothesis of correct specification of the restricted model and conclude
that the model is mis-specified.

Returning to our total cost model specified in 3.4, we get the following
estimates for the restricted and unrestricted models:

Restricted model: $\hat{Y}_i = 166.467 + 19.933\,X_i$; $R^2_R = 0.8409$.

Unrestricted model: $\hat{Y}_i = 2140.722 + 476.655\,X_i - 0.092\,\hat{Y}_i^2 + 0.0001\,\hat{Y}_i^3$; $R^2_{UR} = 0.9983$.

Thus, the computed F-statistic is:

$F = \dfrac{(0.9983 - 0.8409)/2}{(1 - 0.9983)/(10 - 4)} = 284.4035$

For such a high value of the test statistic, we can reject the null hypothesis at
all levels of significance. Thus, we conclude that model 3.4 is mis-specified.
The intuition behind this test is that if adding powers of the predicted
variable increases the explanatory power of the model, this may be evidence of
specification error. Although easy to apply, the RESET test has some
drawbacks. First, the test does not suggest any alternative specification.
Second, it does not provide any guidance on how many powers of the fitted
values should be included in the unrestricted model.
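The steps above can be translated almost directly into code. The following is a minimal Python sketch using statsmodels (assumed to be available; the function name `reset_test`, its arguments and the commented usage line are illustrative, not from the unit). It computes the F-statistic of Equation (3.6) from the two R² values and, for comparison, also returns statsmodels’ own nested-model F-test.

```python
import numpy as np
import statsmodels.api as sm

def reset_test(y, X_restricted, n_powers=2):
    """Ramsey RESET sketch: augment the restricted model with powers of its
    fitted values and F-test the added terms, as in Equation (3.6)."""
    res_r = sm.OLS(y, X_restricted).fit()
    yhat = res_r.fittedvalues
    # yhat^2, yhat^3, ... up to n_powers extra regressors
    extra = np.column_stack([yhat ** (p + 2) for p in range(n_powers)])
    res_u = sm.OLS(y, np.column_stack([X_restricted, extra])).fit()

    m = n_powers                        # number of restrictions (extra regressors)
    n = len(y)
    k = int(res_u.df_model) + 1         # parameters in the unrestricted model
    F = ((res_u.rsquared - res_r.rsquared) / m) / ((1 - res_u.rsquared) / (n - k))
    return F, res_u.compare_f_test(res_r)

# Hypothetical usage for the linear cost model, with `output` and `total_cost`
# as NumPy arrays of the ten observations:
# F, (F_sm, p_value, df_diff) = reset_test(total_cost, sm.add_constant(output))
```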

Check Your Progress 1 [answer within the space given in about 50-100
words]

1) State the six model selection criteria suggested by Hendry and Richard.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
2) How are the t-test and the F-test helpful in identifying irrelevant
variables in a regression model?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
3) What does the term ‘data mining’ connote?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
4) Mention the six broad indicators of a regression result which indicate
the relevance of the model estimated.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
5) What feature observed in the graphical residual series in Fig. 3.1 suggests
that the cubic model is superior to the quadratic and linear models?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
6) What does the acronym RESET stand for? What specific purpose does
this test serve? Specify the steps involved in this test procedure.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
7) Mention the two limitations of the RESET test.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
8) Consider the estimated regression:
$\hat{Y} = -21.77 + 0.002\,X_{2i} + 0.123\,X_{3i} + 13.85\,X_{4i}$

(29.475) (0.0006) (0.013) (9.010)

$n = 88$; $R^2 = 0.672$

For testing specification error, the RESET test was conducted and the following
subsidiary regression was obtained:

$\hat{Y} = 166.097 + 0.0001\,X_{2i} + 0.0176\,X_{3i} + 2.175\,X_{4i} + 0.000353\,\hat{Y}^2 + 0.00000154\,\hat{Y}^3$

(317.433) (0.00520) (0.299251) (33.8881) (0.0071) (0.0000065)

$n = 88$; $R^2 = 0.70553$

Carry out the RESET test for specification error at the 5% level of
significance, stating clearly the null and alternative hypotheses.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………

3.3 MODEL SELECTION CRITERIA


Model selection criteria are defined as the set of rules used to select a
regression model, from among a set of candidate models, on the basis of observed
data. The aim is to minimise the expected dissimilarity between the chosen model
and the true model. Several criteria have been developed for this purpose and are
discussed in this section. Most of them focus primarily on minimising the
‘residual sum of squares’ (RSS). It is important here to distinguish between the
terms ‘in-sample forecasting’ and ‘out-of-sample forecasting’. An in-sample
forecast employs a subset of the dataset to forecast values within the
estimation period and compare them with the actual outcomes. In other words,
in-sample forecasting is done to assess how well the chosen model fits the data
in a given sample. Out-of-sample forecasting uses all the values in the
available sample to predict future values of the regressand.

3.3.1 R2
The coefficient of determination (R2) is one of the measures of goodness of
fit of a regression model. Recall that it is defined as:
$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS} \qquad (3.7)$

R2 lies between 0 and 1. A value of R2 closer to 1 indicates a good fit.


However, R2 suffers from some drawbacks. First, it is an ‘estimate’
indicating the degree of closeness of the fitted values to the actual values.
It is thus an in-sample measure of goodness of fit and does not necessarily
provide accurate out-of-sample forecasts. Second, to compare the R2 of two
models, the dependent variable must be the same. Third, R2 is a non-decreasing
function of the number of explanatory variables in the model. This means it can
be increased simply by adding explanatory variables, even though adding more
variables may also increase the error variance. Therefore, one cannot rely
solely on the value of R2 for choosing the best model.

3.3.2 Adjusted R2
Henri Theil (1961) developed another measure of goodness of fit called the
adjusted R2 (or $\bar{R}^2$). Recall that it introduces a penalty for each additional
variable included in the regression model through a degrees-of-freedom
adjustment, as follows:

$\bar{R}^2 = 1 - \dfrac{RSS/(n - k)}{TSS/(n - 1)} = 1 - (1 - R^2)\,\dfrac{n - 1}{n - k} \qquad (3.8)$

Thus, for a k-variable model, $\bar{R}^2 \leq R^2$. However, unlike R2, the adjusted R2
increases only if the additional variable included in the model significantly
increases its explanatory power. Thus, as a criterion for model selection, the
adjusted R2 is a better indicator than R2. Here too, however, in order to
compare the adjusted R2 of two models, the dependent variable
must be the same.
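As a small illustration of Equation (3.8), the following Python sketch (the helper name and example figures are illustrative; it assumes k counts all parameters including the intercept) computes the adjusted R2 from R2, n and k. With the linear cost model’s values it reproduces the reported figure of roughly 0.821.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared per Equation (3.8): 1 - (1 - R^2)*(n - 1)/(n - k),
    where n is the sample size and k the number of parameters (incl. intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Example: R^2 = 0.8409 with n = 10 observations and k = 2 parameters
# (the linear cost model above) gives an adjusted R^2 of about 0.8210.
print(round(adjusted_r2(0.8409, 10, 2), 4))
```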

3.3.3 Akaike Information Criterion


Akaike information criterion (AIC) is an estimator of the relative quality of
each model, i.e. relative to other candidate models. The criterion was developed
by the Japanese statistician Hirotugu Akaike in the early 1970s. In the AIC
too, a penalty is imposed for each additional variable included in the
regression model. It is defined as:

$AIC = e^{2k/n}\,\dfrac{\sum \hat{u}_i^2}{n} = e^{2k/n}\,\dfrac{RSS}{n} \qquad (3.9)$

where k is the number of parameters in the model and n is the number of
observations in the sample. Taking logs on both sides of Equation (3.9), we
get:

$\ln AIC = \dfrac{2k}{n} + \ln\!\left(\dfrac{RSS}{n}\right) \qquad (3.10)$

where ln AIC is the natural log of AIC and $2k/n$ is the ‘penalty factor’. Most
statistical software packages report a log-transformed AIC. The AIC imposes
a stronger penalty than the adjusted R2 for including more variables in the
regression model. As a model selection criterion, when choosing among several
models, the model with the lowest value of AIC is preferred. This criterion has
several advantages. One is that it is useful for assessing both the in-sample and
the out-of-sample forecasting performance of a regression model. It is also
useful for choosing the lag length in an autoregressive model in time series
analysis.
3.3.4 Schwarz Information Criterion (SIC)

Also called the Bayesian information criterion (BIC), the SIC is quite similar to
the AIC. It is defined as:

$SIC = n^{k/n}\,\dfrac{\sum \hat{u}_i^2}{n} = n^{k/n}\,\dfrac{RSS}{n} \qquad (3.11)$

Log transformation of the above expression yields:

$\ln SIC = \dfrac{k}{n}\,\ln n + \ln\!\left(\dfrac{RSS}{n}\right) \qquad (3.12)$

In (3.12), the expression $(k/n)\ln n$ is the ‘penalty factor’. The SIC imposes the
strongest penalty (stronger than the AIC) for adding an additional variable to the
regression model. For model selection purposes, the model with the lower value of
SIC is considered better. It is also useful for comparing both the in-sample as
well as the out-of-sample forecasting performance of a regression model.

3.3.5 Hannan-Quinn Information Criterion (HQIC)


The Hannan-Quinn information criterion (HQIC) is another measure of the
goodness of fit of a regression model. It is often used as a model selection
criterion and as an alternative to the Schwarz Information Criterion (SIC). It is
defined as:

$\ln HQIC = \ln\!\left(\dfrac{RSS}{n}\right) + \dfrac{2k}{n}\,\ln[\ln(n)] \qquad (3.13)$

where $(2k/n)\,\ln[\ln(n)]$ is the penalty factor for adding an extra variable to the
regression model. This penalty factor is also stronger than the penalty factor
in the AIC. Given any two estimated models, the model with the lower value of
HQIC is preferred. Like the AIC and BIC, the HQIC is also useful for comparing the in-
sample and out-of-sample forecasting performance of the regression model.
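The three criteria can be computed directly from the RSS using Equations (3.10), (3.12) and (3.13), as in the following Python sketch (the function names are illustrative, not part of the unit). Note that these are the RSS-based textbook forms; most software packages, including the output reported in Tables 3.1 and 3.2 below, report log-likelihood-based versions of the AIC, SIC and HQIC, which differ from these by additive constants but rank models estimated on the same sample in the same way.

```python
import math

def ln_aic(rss: float, n: int, k: int) -> float:
    """ln AIC = 2k/n + ln(RSS/n), Equation (3.10)."""
    return 2 * k / n + math.log(rss / n)

def ln_sic(rss: float, n: int, k: int) -> float:
    """ln SIC = (k/n) ln(n) + ln(RSS/n), Equation (3.12)."""
    return (k / n) * math.log(n) + math.log(rss / n)

def ln_hqic(rss: float, n: int, k: int) -> float:
    """ln HQIC = ln(RSS/n) + (2k/n) ln(ln(n)), Equation (3.13)."""
    return math.log(rss / n) + (2 * k / n) * math.log(math.log(n))

# For any two candidate models estimated on the same data, the model with the
# lower criterion value is preferred under each of these rules.
```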

3.4 ILLUSTRATION FOR MODEL SELECTION USING THE VARIOUS CRITERIA

Let us consider a wage determination model as follows:

$wage_i = \beta_1 + \beta_2\,hours_i + \beta_3\,educ_i + \beta_4\,exp_i + \beta_5\,tenure_i + \beta_6\,age_i + \beta_7\,married_i + \beta_8\,race_i + \beta_9\,sibs_i + \beta_{10}\,south_i + u_i \qquad (3.14)$

where wage = hourly wages in $, hours = number of working hours, educ =


education in years, exp = work experience in years, tenure = tenure or period
in present occupation, age = age in years, married = 1 if married, 0
otherwise, race = 1 if non-white, 0 otherwise, sibs = number of siblings in the
family and south = 1 if region of residence is south, 0 otherwise. A priori,
education, work experience, age, tenure and marital status are expected to be
positively related to hourly wages, while race and region of residence are
expected to be negatively related to hourly wages. We consider the estimation
results based on a dataset of 935 observations drawn from a secondary source
[‘cps4_small’ dataset, available on the Online Resource Centre, Oxford
University Press, for Introduction to Econometrics by Christopher Dougherty,
5th Edition], as presented in Table 3.1. All the variables in the regression
results (Table 3.1) have the expected signs, but not all are statistically
significant. Since the estimation is based on cross-sectional data with a large
number of observations, a low R2 value (0.20) can be justified. The R2 value is
statistically significant since the computed F value (about 28) has a p-value
close to zero. Thus, the F-test for the joint significance of all the variables
in the model is satisfied.

For pedagogical purposes, another model is estimated after dropping some of the
variables that were statistically insignificant in the earlier estimation
(Table 3.1). The estimated results of this model are presented in Table 3.2.
The variables retained are by and large statistically significant at the 5% level
of significance. The values of R2 and adjusted R2 are slightly smaller as there
are fewer variables in Model 2. But the value of the SIC is smaller for this
model, whereas the value of the AIC is almost the same for both models.

Table 3.1: Estimated Regression Results by OLS Estimation Procedure

Dependent Variable: Wage

Independent Variables   Coefficient   Std. Error   t-Ratio   p-Value
Constant                   -323.1        200.6       -1.6      0.11
Hours                        -3.02         2.26      -1.34     0.18
Education                    64.9          6.8        9.6      0.00
Experience                    9.7          3.8        2.6      0.01
Tenure                        5.4          2.5        2.1      0.03
Age                           9.0          4.9        1.9      0.06
Married                     172.1         35.2        4.9      0.00
Black                      -132.7         31.3       -4.2      0.00
Siblings                     -6.7          5.0       -1.3      0.18
South                       -82.9         26.6       -3.1      0.00

Mean Dependent Variable: 957.95    S.D. of Dependent Variable: 404.4
SSR: 122414111.3                   S.E. of Regression: 363.8
R-squared: 0.20                    Adjusted R-squared: 0.19
F(9, 925): 27.9                    P-value of F: 0.00
Log-likelihood: -6834.97           Akaike Criterion: 13689.93
Schwarz Criterion: 13738.34        Hannan-Quinn: 13708.39
Table 3.2: Estimated Regression Results by OLS Estimation for Modified Model

Dependent Variable: Wage

Independent Variables   Coefficient   Std. Error   t-Ratio   p-Value
Constant                   -480.1        165.1       -2.9      0.10
Education                    65.9          6.6        9.9      0.00
Experience                   10.1          3.8        2.7      0.01
Tenure                        5.8          2.5        2.3      0.02
Age                           8.7          4.9        1.8      0.08
Married                     169.5         35.4        4.8      0.00
Black                      -139.0         30.3       -4.5      0.00
South                       -82.1         26.6       -3.1      0.00

Mean Dependent Variable: 957.95    S.D. of Dependent Variable: 404.4
SSR: 123038014.53                  S.E. of Regression: 364.3
R-squared: 0.19                    Adjusted R-squared: 0.19
F(7, 927): 34.4                    P-value of F: 0.00
Log-likelihood: -6837.34           Akaike Criterion: 13690.69
Schwarz Criterion: 13729.41        Hannan-Quinn: 13705.45

Thus, the AIC suggests that either of the two models can be chosen, whereas the
SIC suggests that Model 2 is better than Model 1. Sometimes serial correlation
is caused by specification errors, so the Durbin-Watson test statistic can also
be helpful in detecting their presence. In our example, the Durbin-Watson value
is close to 2 in both models (1.8172 for Model 1 and 1.7971 for Model 2),
suggesting that there is no adequate evidence of autocorrelation or
specification error.
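In practice, such a comparison can be carried out in a few lines. The sketch below is hypothetical (the function `compare` and the array names `wage`, `X_full` and `X_small` are assumed, and the data would have to be supplied by the user); it uses statsmodels, whose fitted regressions expose the log-likelihood-based AIC and BIC (the convention used in Tables 3.1 and 3.2) along with R2, and the Durbin-Watson statistic from the residuals.

```python
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

def compare(wage, X_full, X_small):
    """Fit two candidate wage models on the same data and print the
    selection criteria discussed in this unit for each of them."""
    m1 = sm.OLS(wage, X_full).fit()     # Model 1: all regressors plus constant
    m2 = sm.OLS(wage, X_small).fit()    # Model 2: insignificant regressors dropped
    for name, m in [("Model 1", m1), ("Model 2", m2)]:
        print(name,
              "R2:", round(m.rsquared, 3),
              "adj R2:", round(m.rsquared_adj, 3),
              "AIC:", round(m.aic, 1),           # log-likelihood based AIC
              "BIC/SIC:", round(m.bic, 1),       # log-likelihood based SIC
              "DW:", round(durbin_watson(m.resid), 3))
```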

Check Your Progress 2 [answer within the space given in about 50-100
words]

1) Define the term ‘model selection criteria’. What does it basically aim at?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
2) Distinguish between the terms ‘in-sample forecast’ and ‘out-of-sample
forecast’.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
3) How is R2 useful in determining the ‘choice of a model’? What are its
limitations?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
4) How is adjusted R2 superior to R2?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
5) Define AIC. How is AIC superior to adjusted-R2? What are its advantages?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
6) What is the Schwarz Information Criterion? What are its advantages?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................

3.5 LET US SUM UP


The CLRM assumes ‘correct specification’ of model [i.e. all the theoretically
relevant variables are included in the model, irrelevant variables are excluded
and there are no errors of measurement]. Therefore, it is crucial to test for the
presence of specification errors. Econometric theory proposes several tests to
detect such errors. One of the widely used tests is Ramsey’s regression
specification error test (RESET). Further, theory also proposes several
information criteria (such as R2, adjusted R2, the Akaike information criterion
and the Schwarz information criterion) that help in arriving at a good model.
All of these criteria, except R2, impose a penalty for including additional
variables in the regression model.

3.6 KEYWORDS
Model : Model specification refers to the description of the
Specification process by which the dependent variable is estimated
by a set of independent variables considered.
Correct : Correct specification of the model is one which
Specification represents the true relationship between the regressors
and the regressand.
Restricted : This is the model which imposes some restrictions on
Model the values of one or more of the coefficients of the
model.
In-sample : An in-sample forecast employs a subset of the dataset to
forecasting forecast the values within the estimation period and
compare them to the actual outcomes. In other words,
in-sample forecasting is done to assess how well the
chosen model fits the data in a given sample.

Out-of-sample : Out-of-sample forecasting uses all the values in the


forecasting available data in the sample to predict the future value
of the regressand.

Simple Linear : A model with only one independent variable is called
Regression a simple linear regression model. For such a model,
Model $R^2 \geq \bar{R}^2$. In simple linear regression, it is not necessary
to test for individual significance separately from joint
significance because, for such models, the square of the
‘t’ value is equal to F.

3.7 SUGGESTED BOOKS FOR FURTHER READING
1) Gujarati, D.N. and Porter, D.C. (2010). Essentials of Econometrics (Fourth Edition), McGraw Hill.
2) Gujarati, D.N. (2015). Econometrics by Example (Second Edition), Palgrave Macmillan.
3) Dougherty, C. (2011). Introduction to Econometrics, Oxford University Press, Oxford.
4) Gujarati, D.N. and Porter, D.C. (2009). Basic Econometrics (Fifth Edition), McGraw Hill.
5) Wooldridge, J.M. (2009). Econometrics, Cengage Learning.

3.8 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1

1) Data admissibility, theoretical consistency, exogenous regressors,
parameter constancy, data coherency and encompassing (i.e. the chosen model
is able to explain the results of rival models).
2) The t-test helps in identifying whether an individual independent variable
included in a regression model adds real explanatory power, i.e. whether its
estimated parameter is statistically significant. Likewise, the F-test
identifies whether the independent variables included in the model are jointly
significant, i.e. whether the set of variables taken together is relevant.
3) In spite of the many precautions taken, there could still be omissions in
the model specified and estimated. This happens because the data set used may
not satisfy the assumptions underlying the CLRM and the OLS estimates based on
it. In this context, the term ‘data mining’ refers to the various diagnostic
procedures applied to the data set so that it is as free as possible from the
problems arising from violations of these assumptions.
4) (i)the value of coefficient of determination (R2), (ii) the value of the
adjusted R2, (iii) significance of t-ratios estimated, (iv) results of F-test,
(v) signs of the estimated coefficients, (vi) value of Durbin-Watson test
statistic.
5) Notice that the residual series for the cubic model is nearly linear and
displays no systematic curvature (it need not be exactly horizontal to the
X-axis), whereas the residual series from the linear and quadratic models trace
out distinct patterns. It is this feature which makes the cubic model score over
the other two forms of the model estimated.
6) RESET stands for ‘regression specification error test’. It serves the purpose
of empirically testing for ‘mis-specification’ of the model. The steps involved
are as follows: (i) obtain the fitted values of the dependent variable from the
estimated model, (ii) estimate the model once again after including two higher
powers of the fitted values obtained in step (i) (the squared and cubic terms)
[this is the unrestricted model], (iii) consider the R-squared values of the
restricted and the unrestricted models, and (iv) test the hypothesis that the
parameters of the squared and cubic terms are zero using the F-test specified in
Equation (3.6). Acceptance of the null supports the hypothesis of no
mis-specification.
7) One, it suggests no alternative specification. Two, the test does not provide
any guidance on the number of powers of the predicted variable (i.e. 2 or 3 or
4, etc.) to be included in the unrestricted model.
8) Computed F is 4.67, which is greater than the critical value F(2, 82) = 3.11.
Therefore, we reject the null hypothesis of correct specification and conclude
that the model is mis-specified.
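A quick numerical check of this answer (an optional Python sketch using SciPy, not part of the unit; n = 88, k = 6 parameters in the unrestricted model, m = 2 restrictions):

```python
from scipy.stats import f

# F-statistic of Equation (3.6) using the R-squared values given in question 8.
F = ((0.70553 - 0.672) / 2) / ((1 - 0.70553) / (88 - 6))
print(round(F, 2))                      # ~4.67
print(round(f.ppf(0.95, 2, 82), 2))     # 5% critical value of F(2, 82), ~3.11
```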

Check Your Progress 2

1) It is defined as the set of rules used to select a regression model, from


among a set of candidate models, based on observed data. Its primary
focus is on minimising the ‘residual sum of squares’ (RSS).
2) The former employs a subset of the dataset to forecast the values within
the estimation period and compare them to the actual outcomes. The
latter uses all the values in the available data in the sample to predict the
future value of the regressand.
3) It is defined as the ratio of the ‘explained sum of squares’ to the ‘total
sum of squares’, i.e. ESS/TSS (equivalently, 1 − RSS/TSS). It suffers from three
limitations viz. (i) for comparing across models the regressand should be the
same, (ii) its value increases with the number of explanatory variables even
though adding more variables may also increase the error variance, and (iii)
being an estimate, it is an in-sample measure of goodness of fit which does not
necessarily provide accurate out-of-sample forecasts.
4) It is never greater than R2. Further, the adjusted R2 increases only when an
added variable improves the explanatory power of the model (unlike R2, which
never falls when a variable is added).
5) AIC is an estimator of the relative quality of each model, i.e. relative to
others. It is defined as $AIC = e^{2k/n}\sum \hat{u}_i^2 / n = e^{2k/n}\,RSS/n$,
where k is the number of parameters in the model and n is the number of
observations in the sample. By imposing a stronger penalty than the adjusted R2
for every additional variable included, it is superior to the adjusted R2. Its
advantages are: (i) it is useful both for in-sample and out-of-sample
forecasting of a regression model; and (ii) it is also useful in choosing the
lag length in an autoregressive model in time series analysis.
6) SIC is defined as $SIC = n^{k/n}\sum \hat{u}_i^2 / n = n^{k/n}\,RSS/n$. The SIC
imposes the strongest penalty (stronger than the AIC) for adding an additional
variable to the regression model. It is useful for comparing both the in-sample
as well as the out-of-sample forecasting performance of a regression model.
