Econometrics CRT M2: Regression Model Evaluation
Provide a thorough, rigorous analysis of which of the models is preferred. Your analysis should include features of each
coefficient, each model, and each of the diagnostic statistics. Do NOT analyze them one-by-one, but by theme as identified in
Module 2 of Econometrics. For the preferred model, give an analysis of the likely correlation among the explanatory variables.
ANSWERS
1. A Linear Regression Model is used to predict the value of a dependent variable Y, using a set
of independent variables and an intercept/constant.
When the model is run, it returns a set of coefficients for each of the input variables, their significance in
the predictive model and a set of consolidated test scores for the model, such as R-squared, Adjusted R-
squared, F-statistic, etc.
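2. For each input variable, we try to reject the following null hypothesis in favor of the alternative
hypothesis. Null Hypothesis: The coefficient of the input variable is
zero. Alternative Hypothesis: The coefficient is not zero. The F-statistic tests the joint version of this
null hypothesis, namely that the coefficients of all the input variables are zero.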
3. Explanation of some of the KPIs from the Regression output:
a. Coefficient of variables: If the coefficient is positive, the independent variable x and the
dependent variable Y move in the same direction, i.e. an increase in x is associated with an
increase in Y, holding the other variables constant. A negative coefficient means they move in
opposite directions.
b. P-value of variables: The p-value indicates the statistical significance of the independent variable.
It is the probability of observing an estimate at least as extreme as the one obtained if the
true coefficient were zero, i.e. by random chance alone. A smaller p-value signifies stronger
evidence in favor of the alternative hypothesis.
c. R-squared: It represents the proportion of the variance for a dependent variable that is explained
by the set of independent variables used in the model.
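As a rough illustration of where these KPIs appear in practice, the sketch below fits a regression on simulated data and pulls each quantity out of the summary object. The data, the variable names (x1, x2, y) and the true coefficients are assumptions made for the example; they are not the data behind Models 1 to 6.

```r
# Illustrative only: simulated data, not the data used in the question.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)  # assumed true relationship

fit <- lm(y ~ x1 + x2)
s   <- summary(fit)

# Coefficient estimates and their p-values (one row per term).
coefs   <- s$coefficients[, "Estimate"]
pvalues <- s$coefficients[, "Pr(>|t|)"]

# Consolidated model statistics.
r2     <- s$r.squared
adj_r2 <- s$adj.r.squared
f_stat <- unname(s$fstatistic["value"])

print(round(coefs, 3))
print(signif(pvalues, 3))
print(c(r2 = r2, adj_r2 = adj_r2, f_stat = f_stat))
```

The sign of each estimate gives the direction of the relationship, the p-value column is what Step 1 of the model selection screens on, and the last line holds the statistics compared in Step 2.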
4. Model selection:
Step 1: Reject any model in which a variable has a p-value > 0.05. For such a variable the null
hypothesis cannot be rejected at the 5% level, so it adds no statistically significant explanatory power to the model.
Model 3, Model 4 and Model 6 are rejected because x4 has a p-value > 0.05 in all three
models.
Step 2: Check how much of the variance in the dependent variable is explained by
the independent variables by looking at R-squared.
On R-squared alone, Model 5 looks best. However, it is too early to judge at this point, because
R-squared never falls when an independent variable is added, so a higher value can simply reflect a
larger number of variables.
Model 5 has 3 variables, more than Model 1 and Model 2, so its higher R-squared may be
due to the additional variable.
Between Model 1 and Model 2, however, we can conclude that Model 2 is better: both models have the
same number of variables, so their R-squared values are directly comparable, and Model 2's is higher.
The question does not give the number of observations, so we calculate it from the
F-statistic equation, where n is the number of observations and p is the number of independent variables.
F-statistic = [R^2 * (n - p - 1)] / [(1 - R^2) * p]
In Model 5, R2 = 0.4638, p = 3, F-statistics = 56.51. Thus, we obtain value of n by substituting in
the above formula.
56.51 = [0.4638 * (n - 3 - 1)] / [(1 - 0.4638) * 3]
Solving gives n - 4 = 195.99, so after rounding to a whole number of observations:
Thus, n = 200
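The rearrangement can be checked with a few lines of R. This is a minimal sketch that uses only the Model 5 values quoted above; the variable names are chosen for the example.

```r
# Values reported for Model 5 in the text.
r2     <- 0.4638
p      <- 3
f_stat <- 56.51

# F = [R^2 * (n - p - 1)] / [(1 - R^2) * p], rearranged for n.
n_exact <- f_stat * (1 - r2) * p / r2 + p + 1

# The number of observations must be a whole number.
n <- round(n_exact)

print(n_exact)
print(n)
```

The unrounded value falls just short of a whole number because the reported R-squared and F-statistic are themselves rounded.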
Using the value of n in the Adjusted R-squared formula for Model 2 and Model 5 we get:
Model 2:
Adjusted R-squared = 1 - [(1 - R^2) * (n - 1)] / (n - p - 1)
= 1 - [(1 - 0.4148) * (200 - 1)] / (200 - 2 - 1)
= 0.4089
Model 5:
Adjusted R-squared = 1 - [(1 - R^2) * (n - 1)] / (n - p - 1)
= 1 - [(1 - 0.4638) * (200 - 1)] / (200 - 3 - 1)
= 0.4556
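The two calculations above can be reproduced with a small helper. This is a sketch under the assumption that n = 200, as derived from the F-statistic; the function name adj_r2 is made up for the example.

```r
# Adjusted R-squared from R-squared, sample size n and number of regressors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 200  # derived earlier from the Model 5 F-statistic

adj_m2 <- adj_r2(0.4148, n, p = 2)  # Model 2
adj_m5 <- adj_r2(0.4638, n, p = 3)  # Model 5

print(round(c(Model2 = adj_m2, Model5 = adj_m5), 4))
print(adj_m5 > adj_m2)
```

The results agree with the hand calculation to within rounding in the fourth decimal place.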
Model 5 has the higher adjusted R-squared, so its advantage holds after penalizing for the extra
variable. Hence, it is the best model among the set of models in the question.
For the preferred model, give an analysis of the likely correlation among the explanatory variables.
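VIF = 1/(1 - R^2)
VIF = 1/(1 - 0.4638) = 1.865
Strictly, the VIF of an explanatory variable uses the R^2 from regressing that variable on the other explanatory
variables. That auxiliary R^2 is not reported, so the model R^2 is used here as a rough stand-in.
A VIF of about 1.87 is well below the common rule-of-thumb thresholds of 5 or 10, which suggests that the
correlation among the explanatory variables in Model 5 is low.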
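The sketch below reproduces the arithmetic above and, for comparison, shows how a VIF is usually computed per explanatory variable from an auxiliary regression. The second part runs on simulated data: x1, x2 and x3 are hypothetical stand-ins, since the data behind Model 5 are not available, and the helper names are made up for the example.

```r
# Arithmetic from the text: 1 / (1 - R^2) with the Model 5 R-squared.
vif_from_r2 <- function(r2) 1 / (1 - r2)
print(vif_from_r2(0.4638))

# Per-regressor VIF on simulated data (hypothetical regressors).
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)  # deliberately correlated with x1
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# Regress one explanatory variable on all the others and convert the
# auxiliary R-squared into a variance inflation factor.
vif_one <- function(name, data) {
  others <- setdiff(names(data), name)
  aux    <- lm(reformulate(others, response = name), data = data)
  1 / (1 - summary(aux)$r.squared)
}

vifs <- sapply(names(X), vif_one, data = X)
print(round(vifs, 3))
```

A VIF of 1 means a variable is uncorrelated with the others; values grow as the auxiliary R-squared approaches 1.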
(3) Write the Fama-French 3 factor model equation, specifying what each term means
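R_i - RF = alpha + beta_1 * (R_M - RF) + beta_2 * SMB + beta_3 * HML + e
R_i = Return on the stock or portfolio
RF = Risk-free rate
R_M - RF = Excess return on the market portfolio
SMB = Small Minus Big, the return of small-cap companies minus the return of large-cap companies
HML = High Minus Low, the return of value (high book-to-market) companies minus the return of growth (low book-to-market) companies
alpha = Intercept; beta_1, beta_2, beta_3 = Factor coefficients; e = Error term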
According to the Fama-French three factor model, small-cap companies tend to outperform large-cap companies and value
companies tend to outperform growth companies. The model extends the CAPM to adjust for these
outperformance tendencies.
(5) Formulate the Fama-French regression using your stock’s returns, all the Fama-French factors, and the
benchmark returns.
TR = col_number(), XF = col_number()))
View(ffdata)
MKTRF<-ffdata[,2]
SMB<-ffdata[,3]
HML<-ffdata[,4]
XF<-ffdata[,7]
print(summary(ffregression))
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.934 on 247 degrees of freedom
Alpha (the intercept) is the excess return over the benchmark that the factors do not explain. Beta refers to the
factor coefficients, seen in the FF equation above.
Call:
lm(formula = XF ~ MKTRF, data = ffdata)
Residuals:
Min 1Q Median 3Q Max
-15.6947 -1.2038 0.1935 1.4596 16.1200
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.8920 0.1877 -4.751 3.42e-06 ***
MKTRF 1.3153 0.2266 5.804 1.97e-08 ***
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Fama-French 3 factor model: F-statistic = 13.11, p-value = 5.653e-08
CAPM: F-statistic = 33.68, p-value = 1.965e-08
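As a consistency check, the adjusted R^2 values quoted in the comparison below can be backed out of these F-statistics. The sketch assumes that 13.11 belongs to the Fama-French regression (3 factors, 247 residual degrees of freedom, as in the output above) and 33.68 to the CAPM regression (1 factor). The CAPM residual degrees of freedom, 249, are inferred from the same sample size rather than read from the output, and the helper names are made up for the example.

```r
# R-squared implied by an F-statistic with p regressors and df_resid residual df.
r2_from_f <- function(f, p, df_resid) {
  p * f / (p * f + df_resid)
}

# Adjusted R-squared, with n recovered as df_resid + p + 1.
adj_r2 <- function(r2, p, df_resid) {
  n <- df_resid + p + 1
  1 - (1 - r2) * (n - 1) / df_resid
}

# Assumed mapping of the reported F-statistics to the two models.
ff_r2   <- r2_from_f(13.11, p = 3, df_resid = 247)
capm_r2 <- r2_from_f(33.68, p = 1, df_resid = 249)  # df inferred, not reported

ff_adj   <- adj_r2(ff_r2,   p = 3, df_resid = 247)
capm_adj <- adj_r2(capm_r2, p = 1, df_resid = 249)

print(round(100 * c(FF3 = ff_adj, CAPM = capm_adj), 2))
```

Under these assumptions the implied values match the reported adjusted R^2 figures to two decimal places in percent, which supports the mapping of each F-statistic to its model.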
The Fama-French 3 factor model gives the better approximation because, unlike the CAPM, it accounts for the
outperformance tendencies of value over growth companies and of small-cap over large-cap companies.
Choosing the model based on adjusted R^2:
The adjusted R^2 of the Fama-French model (12.69%) is higher than the adjusted R^2 of the CAPM
(11.56%), hence the Fama-French 3 factor model performed better than the CAPM.
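The comparison can be sketched end to end as follows. The data are simulated, so the numbers will not match the output above: the data frame ffdata_sim, the sample size of 251 (inferred from the 247 residual degrees of freedom) and the coefficients used to generate XF are all assumptions for illustration. Only the column names MKTRF, SMB, HML and XF are taken from the original code.

```r
# Illustrative only: simulated factor data, not the original ffdata.
set.seed(7)
n <- 251
ffdata_sim <- data.frame(
  MKTRF = rnorm(n),
  SMB   = rnorm(n),
  HML   = rnorm(n)
)

# Assumed data-generating process for the stock's excess return.
ffdata_sim$XF <- -0.9 + 1.3 * ffdata_sim$MKTRF + 0.6 * ffdata_sim$SMB -
  0.4 * ffdata_sim$HML + rnorm(n, sd = 3)

# CAPM uses the market factor only; Fama-French adds SMB and HML.
capm_fit <- lm(XF ~ MKTRF, data = ffdata_sim)
ff_fit   <- lm(XF ~ MKTRF + SMB + HML, data = ffdata_sim)

adj <- c(
  CAPM = summary(capm_fit)$adj.r.squared,
  FF3  = summary(ff_fit)$adj.r.squared
)
print(round(adj, 4))

# Model with the higher adjusted R-squared.
preferred <- names(which.max(adj))
print(preferred)

# The models are nested, so an F-test on the two added factors is also available.
print(anova(capm_fit, ff_fit))
```

Because the CAPM regression is nested in the Fama-French regression, the anova comparison complements the adjusted R^2 ranking by testing whether SMB and HML are jointly significant.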