Multiple Regression Analysis Project
ISSUE OF MULTICOLLINEARITY
• A high correlation between two or more independent variables is called multicollinearity, and it can lead to
large variances for the OLS estimators.
• The correlations between the independent variables are shown in Figure 1 above. Before examining
multicollinearity, we can infer that no pair of variables is perfectly correlated and that none of the
variables is a linear combination of the others. Hence Assumption 3 (no perfect collinearity between the
independent variables) is satisfied.
• The impact of multicollinearity is measured by the variance inflation factor (VIF). The results are given in
Figure 2.
Fig. 2
• None of the VIFs is large enough to be an issue, so we can proceed with our model.
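The VIF check described above can be sketched in a few lines. This is a minimal illustration, not the Stata routine used for Figure 2: it relies on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing regressor j on the remaining regressors. The helper names (`solve`, `r_squared`, `vif`) and all data values are made up for illustration.

```python
# Sketch: VIF_j = 1 / (1 - R_j^2), with R_j^2 from an auxiliary regression
# of regressor j on an intercept and all other regressors.
# Helper names and data are hypothetical, not taken from the project.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 from an OLS regression of y on an intercept and the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Variance inflation factor for each regressor (fails under perfect collinearity)."""
    result = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        result.append(1.0 / (1.0 - r_squared(target, others)))
    return result


# Made-up regressors: age in years, kilometres driven (thousands), diesel dummy.
age = [1, 2, 3, 4, 5, 6, 7, 8]
kms = [12, 18, 35, 38, 55, 57, 75, 77]
diesel = [0, 1, 0, 1, 1, 0, 0, 1]
print(vif([age, kms, diesel]))
```

In this made-up sample, age and kilometres move together, so their VIFs come out well above that of the diesel dummy. A VIF of 1 means a regressor is uncorrelated with the others.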
ISSUE OF HETEROSKEDASTICITY
• Homoskedasticity was initially checked by plotting the standardised residuals against the fitted values. The
plot was not random (as it should be under homoskedasticity), indicating heteroskedasticity. The issue was more
pronounced when the functional relationship between selling price and the explanatory variables was not
logarithmic.
• Some observations were also removed: those that were either outliers or belonged to a particular group of cars
with high variation in their residuals. To our surprise, the cars in this group were mostly manufactured by
Toyota. Models such as the Fortuner, Innova, Corolla and Corolla Altis showed larger residuals than expected,
mostly on the negative side (actual selling price much lower than predicted). This is because Toyota's
operations in India have performed poorly in recent years, forcing the company to stop the production and
servicing of many models, which lowered customer confidence. The unavailability of spare parts is also a common
reason for some second-hand cars to be priced lower than similar counterparts.
• To check conclusively for heteroskedasticity, a Breusch-Pagan test was carried out, first step by step and then
with the built-in Stata command. The results clearly indicated statistically significant heteroskedasticity.
The results are shown below.
• A regression with robust standard errors was then run and the significance of the parameters was checked again.
Results are shown below. Because the new standard errors are not much different, all the variables that were
statistically significant before remain significant.
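The step-by-step Breusch-Pagan procedure can be sketched as below. This is a minimal illustration under stated assumptions, not the project's Stata code: it uses a single regressor, made-up data, and the textbook LM form of the test (n times the R-squared from regressing the squared residuals on the regressor), which need not match the variant computed by Stata's built-in command. The function names are hypothetical.

```python
# Sketch of a step-by-step Breusch-Pagan test (LM form) with one regressor.
# Data and function names are hypothetical.

def simple_ols(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R^2 from an OLS regression of y on x (with intercept)."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """LM statistic: n * R^2 from regressing squared residuals on x."""
    a, b = simple_ols(x, y)                                  # step 1: fit the model
    u2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]  # step 2: squared residuals
    return len(x) * r_squared(x, u2)                         # steps 3-4: auxiliary R^2, then LM


# Made-up data whose error spread grows with x.
x = [float(i) for i in range(1, 11)]
y = [2.0 + xi + (-1) ** i * 0.3 * xi for i, xi in enumerate(x, start=1)]

lm = breusch_pagan_lm(x, y)
CHI2_CRIT_5PCT_1DF = 3.841  # 5% critical value of chi-square with 1 degree of freedom
print(lm, lm > CHI2_CRIT_5PCT_1DF)
```

Under the null of homoskedasticity the LM statistic is approximately chi-square distributed, with degrees of freedom equal to the number of regressors in the auxiliary regression. A value above the critical value rejects homoskedasticity.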
THE F-TEST
•Assumption 6: The population error is independent of the explanatory variables and is normally distributed
with zero mean and variance σ^2.
•The F-test is usually conducted to test the overall significance of a regression or the joint significance of a
group of variables.
•Here we only need the overall significance of the regression, so the null hypothesis is H0:
β1=β2=β3=β4=β5=0
•Ha: At least one of the βj is different from zero
•The F-statistic is reported by default in Stata.
•Here the p-value associated with the F-statistic is approximately zero, and hence the null hypothesis can be
rejected even at the 1% significance level in favour of the alternative.
•Hence our regression is significant; that is, the independent variables help explain the variation in the
dependent variable, with an R-squared of 93.3%.
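For the overall-significance test above, the F-statistic can be recovered from R-squared alone using the standard identity F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k regressors and n observations. The sketch below applies it to the reported R-squared of 0.933 with k = 5. The sample size is not stated in these slides, so `n_obs=200` is purely a placeholder, and the function name is hypothetical.

```python
# Sketch: overall F-statistic from R^2 for H0: all slope coefficients are zero.
# n_obs=200 is a placeholder; the actual sample size is not given in the slides.

def overall_f(r_squared, n_obs, n_regressors):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    numerator = r_squared / n_regressors
    denominator = (1.0 - r_squared) / (n_obs - n_regressors - 1)
    return numerator / denominator


f_stat = overall_f(0.933, n_obs=200, n_regressors=5)
print(f_stat)
```

With an R-squared this high, the statistic is very large for any realistic sample size, which is consistent with the near-zero p-value reported by Stata.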
SUMMARY
•We estimated the model subject to the first four Gauss-Markov assumptions to get unbiased estimators of the parameters of interest
(both size and direction).
•The presence of multicollinearity was checked as part of the post-estimation analysis by examining the variance inflation factors, and
no significant levels were found.
•We initially detected heteroskedasticity by looking at the plot of standardised residuals against fitted values. Some
observations were removed and the functional relationship between the variables was changed, which reduced the issue to some
extent. A Breusch-Pagan test was done to establish the presence of heteroskedasticity statistically. A regression with robust standard
errors was then run and new t-statistic values were reported.
•An F-test also showed the overall significance of the Regression.
•Hence our estimated regression equation is: Log(Selling_Price) = 0.81 + 0.86*Log(Purchase Price) - 0.038*Age
- 1.62e-06*Kms + 0.056*Automatic_Dummy + 0.084*Diesel_Dummy.
•To convert the log form of the dependent variable back to levels: Selling_Price = 1.003*Exp(Log[Selling_Price]).
•The model was estimated after excluding many of the cars manufactured by Toyota, and hence it may not be a good model for predicting
their prices.
•With the help of this model, if one finds the price of a particular car much higher than the predicted value, she may try to bargain it
down, or may be able to trace the difference to extra fittings or unnoticed features of the car.
•If one finds the price of a car much lower than the predicted value, it will mostly be because of the declining popularity of the brand or
the unavailability of spares as the model is no longer in production.
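The estimated equation and the back-transformation in the summary can be turned into a small prediction helper. This is a sketch under assumptions the slides do not state: Log is taken to be the natural logarithm (suggested by the use of Exp), the factor 1.003 is applied as given, and the units of price, age and kilometres are whatever the original dataset used. The function name and the example inputs are hypothetical.

```python
import math

# Sketch: predicted selling price from the reported coefficients.
# Assumes natural logs; input units follow the original (unstated) dataset.

def predict_selling_price(purchase_price, age, kms, automatic, diesel, adjustment=1.003):
    """Return adjustment * exp(fitted log selling price)."""
    log_price = (
        0.81
        + 0.86 * math.log(purchase_price)
        - 0.038 * age
        - 1.62e-06 * kms
        + 0.056 * automatic   # 1 if automatic transmission, else 0
        + 0.084 * diesel      # 1 if diesel, else 0
    )
    return adjustment * math.exp(log_price)


# Hypothetical car: purchase price 10 (dataset units), 5 years old,
# 60,000 km driven, manual transmission, diesel engine.
print(predict_selling_price(10.0, age=5, kms=60000, automatic=0, diesel=1))
```

Because the dependent variable is in logs, each dummy shifts the predicted price multiplicatively: holding everything else fixed, a diesel car is predicted to sell for exp(0.084) times the price of an otherwise identical non-diesel car.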
THANK YOU