Multiple Regression Analysis Project

1) The study uses data from CarDekho.com to predict the selling prices of used cars in India and to analyze the impact of factors such as original price, kilometers driven, age, transmission type, and fuel type. 2) A multiple linear regression model is estimated with the log of selling price as the dependent variable and the log of original price, age, kilometers driven, and dummy variables for diesel fuel and automatic transmission as independent variables. 3) The results show that original price, diesel fuel, and automatic transmission are positively associated with selling price, while age and kilometers driven are negatively associated with it, holding other factors constant. Checks of overall significance (F-test), multicollinearity (variance inflation factors), and heteroskedasticity (Breusch-Pagan test, robust standard errors) support the model.

Uploaded by

Abhinand C


MULTIPLE REGRESSION ANALYSIS

PREDICTING USED CAR PRICES IN INDIA
SUBMITTED BY

1. ABHINAND.C (540)
THE USED CAR MARKET
• The Covid-19 pandemic has affected industries of every scale. Automobile dealers recorded a zero-sale month (April) as a result of the stringent lockdown in India. The automobile sales market, especially for passenger cars, is expected to regain its momentum once the restrictions are completely lifted.
• Analysts predict that the second-hand (used) car market will boom. More people are moving away from crowded public transport because of social-distancing concerns, which are likely to remain part of normal life for the near future.
• The dataset we work with is from CarDekho.com. We intend to predict the second-hand selling price of a car and to estimate the causal effects of the variables we consider on that selling price.
• Since the year of manufacture is available, we convert it into the age of the car. Because of data limitations, age is measured in discrete units (whole years).

Variables used (VARIABLE: TYPE):
1. Selling Price (dependent): Continuous
2. Original Price / Purchase Price (INR): Continuous
3. Kilometers Driven (km): Continuous
4. Age of Car (years): Discrete
5. Transmission Type (Manual/Automatic): Categorical
6. Fuel Type (Petrol/Diesel): Categorical
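The conversion of year of make into age, and the coding of the two categorical variables as dummies, can be sketched in Python. This is a minimal illustration, not the authors' Stata code. The reference year (2020), the function and field names, and the category spellings are hypothetical assumptions, since the slides do not state them. Anything other than Diesel, and anything other than Automatic, is coded 0, so that Petrol and Manual act as the base categories for the Diesel and Automatic dummies in the model.

```python
import math

# Hypothetical reference year: the slides do not say which year age is measured from.
REFERENCE_YEAR = 2020


def build_regressors(year_of_make, purchase_price, kms_driven, fuel, transmission,
                     reference_year=REFERENCE_YEAR):
    """Turn one raw listing into the regressors of the log-price model.

    Field names and category spellings are illustrative assumptions.
    Petrol and Manual are the omitted (base) categories of the two dummies.
    """
    if purchase_price <= 0:
        raise ValueError("purchase price must be strictly positive to take its log")
    return {
        "log_purchase_price": math.log(purchase_price),
        "age": reference_year - year_of_make,  # whole years (discrete)
        "kms_driven": kms_driven,
        # Any fuel other than diesel is coded 0 (base category).
        "diesel_dummy": 1 if fuel.strip().lower() == "diesel" else 0,
        # Any transmission other than automatic is coded 0 (base category).
        "automatic_dummy": 1 if transmission.strip().lower() == "automatic" else 0,
    }


print(build_regressors(2014, 550_000, 42_000, "Diesel", "Manual"))
print(build_regressors(2017, 800_000, 15_000, "Petrol", "Automatic"))
```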
MODEL ESTIMATION
The population model is:
$$\log(\text{Selling Price}) = \beta_0 + \beta_1 \log(\text{Purchase Price}) + \beta_2\,\text{Kms\_Driven} + \beta_3\,\text{Age} + \beta_4\,\text{Automatic\_Dummy} + \beta_5\,\text{Diesel\_Dummy} + u$$
• The Gauss-Markov assumptions required for the OLS estimators to be unbiased are discussed below:
1. The population model is linear in parameters.
2. Random sampling: a random sample is obtained from CarDekho.com. Some outliers were, however, removed after an initial visual inspection.
3. There is no perfect collinearity among the independent variables. This assumption is also satisfied (discussed in detail in Slide 4).
4. The expected value of the error, conditional on the independent variables, should be 0.
   • Some omitted variables that remain in the error, such as 'trend factors' and 'vintage value', may affect the price. Intuitively, however, they are not correlated with the included variables.
   • Therefore, even if the partial effect of such an omitted variable is nonzero, its near-zero correlation with the included regressors means that the resulting omitted-variable bias should be negligible. Hence we take assumption 4 to hold in our estimation.
   • Choosing proper functional forms also helps strengthen the assumption.
   • The correlation between each independent variable and the residuals is found to be 0 (Figures 1 and 2). Note, however, that this is guaranteed by construction for any OLS fit with an intercept. It is therefore consistent with MLR.4 but cannot by itself confirm it, because MLR.4 concerns the unobserved errors u, not the residuals.
[Fig. 1 and Fig. 2: correlations between the independent variables and the residuals]
The estimated model is:
$$\widehat{\log(\text{Selling Price})} = 0.8084958 + 0.8611\,\log(\text{Purchase Price}) - 0.0377\,\text{Age} - 1.62\times 10^{-6}\,\text{Kms\_Driven} + 0.083874\,\text{Diesel\_Dummy} + 0.0562636\,\text{Automatic\_Dummy}$$
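The point about the residuals can be checked numerically. The sketch below is a minimal pure-Python OLS that solves the normal equations by Gauss-Jordan elimination. It stands in for Stata's `regress` only as an illustration, and the tiny dataset is made up. Whatever the data, the fitted residuals sum to zero and are orthogonal to every regressor, which is why a zero sample correlation between regressors and residuals cannot on its own establish MLR.4.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular X'X: perfect collinearity among regressors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """OLS of y on the regressor columns in xs plus an intercept.

    Returns (coefficients, residuals); coefficients[0] is the intercept.
    """
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    k = len(cols)
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = solve(xtx, xty)  # normal equations: (X'X) b = X'y
    resid = [y[t] - sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    return beta, resid


# Made-up data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0, 2.5, 2.0, 4.5, 4.0, 6.5]

beta, resid = ols(y, [x1, x2])
print("coefficients:", [round(b, 4) for b in beta])
# OLS first-order conditions: residuals sum to zero and are orthogonal to each regressor.
print("sum of residuals:", round(sum(resid), 12))
for name, x in (("x1", x1), ("x2", x2)):
    print(f"sum({name} * residual):", round(sum(a * e for a, e in zip(x, resid)), 12))
```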
COEFFICIENTS AND THEIR INTERPRETATIONS

A NOTE ON THE SELECTION OF FUNCTIONAL FORM
• Using log(Price) makes the CLM assumptions more plausible, especially Assumption 4 and the homoskedasticity assumption. Strictly positive variables often have conditional distributions that are skewed or heteroskedastic.
• Taking logs can mitigate heteroskedasticity. This was evident during our study when we analysed the plots of residuals against fitted values.
• Converting logarithmic (percentage) changes back into absolute terms takes a few extra steps. Apart from that cost, the logarithmic form for the relation between selling price and the other variables satisfies the assumptions needed for unbiased and more precise estimation better than the non-logarithmic alternatives we tried.

INTERPRETATION OF COEFFICIENTS
• Keeping all other variables constant, a 1% increase in purchase price increases the selling price by about 0.86%.
• Holding all other variables constant, each additional year of age lowers the price of the car by about 3.77%. The exact effect in a log model is 100 × (exp(-0.0377) - 1) ≈ -3.70%.
• Keeping everything else constant, each additional kilometre driven lowers the selling price by about 0.000162% (100 × 1.62e-06), that is, roughly 1.6% per 10,000 km.
• Keeping all other factors constant, a car with an automatic transmission costs about 5.6% more than one with a manual transmission. The exact figure is 100 × (exp(0.0563) - 1) ≈ 5.8%.
• Keeping all other factors the same, a car that runs on diesel costs about 8.4% more than one that runs on petrol. The exact figure is about 8.7%.
• There is no practical significance in interpreting the intercept coefficient, as no car has a purchase price of 0 rupees. Even so, the intercept should not be dropped. If we forced the regression through the origin (intercept set to 0) when the true intercept is not zero, the OLS slope estimators would be biased. The cost of estimating the intercept when it is actually zero is only that the variances of the OLS slope estimators are larger. That trade-off is worth making to keep the estimators unbiased and consistent.
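The gap between the quick 100·β reading and the exact percentage change implied by a log(Selling Price) model can be computed directly. Below is a minimal Python sketch using the coefficients from the estimated model. The exact formula 100·(exp(β·Δx) - 1) is the standard result for a log dependent variable. It is applied here as an illustration, not as the authors' own calculation.

```python
import math

# Log-level coefficients from the estimated model (per one-unit change).
COEFFICIENTS = {
    "Age (per year)": -0.0377,
    "Kms_Driven (per km)": -1.62e-06,
    "Diesel_Dummy": 0.083874,
    "Automatic_Dummy": 0.0562636,
}


def approx_pct(beta, change=1.0):
    """Approximate % change in selling price: 100 * beta * change."""
    return 100.0 * beta * change


def exact_pct(beta, change=1.0):
    """Exact % change implied by a log(y) model: 100 * (exp(beta * change) - 1)."""
    return 100.0 * (math.exp(beta * change) - 1.0)


for name, beta in COEFFICIENTS.items():
    print(f"{name:20s} approx {approx_pct(beta):+.6f}%   exact {exact_pct(beta):+.6f}%")

# Kilometres are easier to read per 10,000 km than per single km.
print(f"Kms_Driven (per 10,000 km) approx {approx_pct(-1.62e-06, 10_000):+.3f}%   "
      f"exact {exact_pct(-1.62e-06, 10_000):+.3f}%")
```

The approximation is close for small coefficients but drifts as β grows. That is why the dummy effects (about 5.8% and 8.7% exactly) differ noticeably from the 100·β readings.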
[Fig. 1: correlations among the independent variables]

ISSUE OF MULTICOLLINEARITY
• A high correlation between two or more independent variables is called multicollinearity, and it can lead to large variances of the OLS estimators.
• Figure 1 (above) shows the correlations among the independent variables. Before turning to multicollinearity, we can infer that no pair of variables is perfectly correlated and that none of them is an exact linear combination of the others. Hence Assumption 3 (no perfect collinearity among the independent variables) is satisfied.
• The impact of multicollinearity is measured by the variance inflation factor (VIF). The results are given in Figure 2.
[Fig. 2: variance inflation factors]
• None of the variables has a VIF large enough to be a concern, so we can proceed with our model.
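The VIF reported in Figure 2 is defined as VIF_j = 1 / (1 - R_j²). Here R_j² is the R-squared from regressing regressor j on all the other regressors. Stata reports these values after `regress` through `estat vif`. The sketch below recomputes them from the definition in pure Python, using made-up columns. A frequently quoted rule of thumb treats VIFs above 10 as a warning sign, but any cut-off is a convention rather than a formal test.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, xs):
    """R-squared of an OLS regression of y on the columns in xs plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(n)]
    ybar = sum(y) / n
    ssr = sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))
    sst = sum((yt - ybar) ** 2 for yt in y)
    return 1.0 - ssr / sst


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on all the others."""
    result = []
    for j, xj in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        result.append(1.0 / (1.0 - r_squared(xj, others)))
    return result


# Orthogonal columns give VIF = 1.
print(vif([[1, -1, 1, -1], [1, 1, -1, -1]]))

# Hypothetical values: age (years), kms (thousands), diesel dummy.
age = [2, 5, 7, 3, 10, 6, 8, 4]
kms = [20, 60, 90, 35, 120, 70, 85, 40]
diesel = [0, 1, 1, 0, 1, 0, 1, 0]
print([round(v, 2) for v in vif([age, kms, diesel])])
```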
ISSUE OF HETEROSKEDASTICITY
• Homoskedasticity was initially checked by plotting the standardised residuals against the fitted values. Under homoskedasticity this plot should look random. It did not, which indicated heteroskedasticity. The problem was more pronounced when the relationship between selling price and the explanatory variables was not logarithmic.
• Some observations were also removed, either because they were outliers or because they belonged to a particular group of cars with highly variable residuals. To our surprise, the cars in this group were mostly manufactured by Toyota.
  ◦ Models such as the Fortuner, Innova, Corolla and Corolla Altis showed larger-than-expected residuals, mostly negative (actual selling price much lower than predicted).
  ◦ We attribute this to the poor performance of Toyota's operations in India in recent years, which forced the company to stop the production and servicing of many models and lowered customer confidence.
  ◦ The unavailability of spare parts is also a common reason why some second-hand cars are priced lower than comparable cars.
• To check formally for heteroskedasticity, a Breusch-Pagan test was carried out, first step by step and then with the built-in Stata command. The results indicated statistically significant heteroskedasticity. The results are shown below.
• A regression with heteroskedasticity-robust standard errors was then run, and the significance of the parameters was checked again. Results are shown below. The robust standard errors are not much different from the usual ones, so all the variables that were statistically significant before remain significant.
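The step-by-step Breusch-Pagan procedure can be sketched in pure Python. First, run OLS and take the squared residuals. Next, regress those squared residuals on the same regressors. Then compare LM = n·R² of that auxiliary regression with a chi-squared distribution whose degrees of freedom equal the number of regressors.

This is the textbook n·R² form using all regressors. Stata's `estat hettest` uses the fitted values and a normal-errors version by default, so its numbers may differ unless options such as `rhs` and `iid` are used. The two simulated samples (one with error spread growing in x, one with constant spread) are made up for illustration. The chi-squared tail function uses the standard integer-degrees-of-freedom recurrence, because the standard library has no chi-squared distribution.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fitted(y, xs):
    """Fitted values from OLS of y on the columns in xs plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(x) for x in xs]
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = solve(xtx, xty)
    return [sum(b * c[t] for b, c in zip(beta, cols)) for t in range(n)]


def r_squared(y, fitted):
    ybar = sum(y) / len(y)
    ssr = sum((a - f) ** 2 for a, f in zip(y, fitted))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ssr / sst


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, j = math.exp(-half), 2  # Q(x; 2) = exp(-x/2)
    else:
        q, j = math.erfc(math.sqrt(half)), 1  # Q(x; 1) = erfc(sqrt(x/2))
    while j < df:
        # Q(x; j + 2) = Q(x; j) + (x/2)^(j/2) * exp(-x/2) / Gamma(j/2 + 1)
        q += math.exp((j / 2.0) * math.log(half) - half - math.lgamma(j / 2.0 + 1.0))
        j += 2
    return q


def breusch_pagan(y, xs):
    """Breusch-Pagan test in LM (n * R^2) form; returns (LM statistic, p-value)."""
    fitted = ols_fitted(y, xs)                      # step 1: main regression
    u2 = [(a - f) ** 2 for a, f in zip(y, fitted)]  # step 2: squared residuals
    r2_aux = r_squared(u2, ols_fitted(u2, xs))      # step 3: auxiliary regression
    lm = len(y) * r2_aux                            # step 4: LM = n * R^2
    return lm, chi2_sf(lm, len(xs))                 # step 5: chi-squared, df = k


n = 200
x = [i / n for i in range(1, n + 1)]
sign = [1 if i % 2 == 0 else -1 for i in range(1, n + 1)]
y_het = [1 + 2 * xi + 3 * xi * s for xi, s in zip(x, sign)]  # spread grows with x
y_hom = [1 + 2 * xi + 3 * s for xi, s in zip(x, sign)]       # constant spread

for label, yy in (("heteroskedastic", y_het), ("homoskedastic", y_hom)):
    lm, p = breusch_pagan(yy, [x])
    print(f"{label}: LM = {lm:.2f}, p-value = {p:.4g}")
```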
THE F-TEST
• Assumption 6 (normality): the population error u is independent of the explanatory variables and is normally distributed with zero mean and variance σ²:
$$u \sim \text{Normal}(0, \sigma^2)$$
• The F-test is usually conducted to test the overall significance of a regression or the joint significance of a group of variables.
• Here we only need the overall significance of the regression, so the null hypothesis is
$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$$
• H_a: at least one β_j is different from zero.
• Stata reports the F-statistic by default.
• The p-value associated with the F-statistic is approximately 0. The null hypothesis is therefore rejected in favour of the alternative even at the 1% significance level.
• Hence the regression is jointly significant: the independent variables help explain the variation in the dependent variable, with an R-squared of 93.3%.
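The overall F-statistic can be recovered from the R-squared as F = (R²/k) / ((1 - R²)/(n - k - 1)), with k = 5 slope coefficients here. Below is a minimal Python sketch.

The number of observations n is not reported in the slides, so n = 250 below is purely hypothetical. Also, this homoskedasticity-only formula matches the F that Stata prints after a plain `regress`. After `regress ..., vce(robust)`, Stata reports a heteroskedasticity-robust Wald F instead, which is not computed this way.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero (intercept included).

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with (k, n - k - 1) degrees of freedom.
    """
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    if n <= k + 1:
        raise ValueError("need more observations than estimated parameters")
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# R-squared (0.933) and k = 5 come from the slides; n = 250 is a hypothetical
# sample size, because the number of observations is not reported.
print(f"F = {overall_f(0.933, n=250, k=5):.1f}")
```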
SUMMARY
• We estimated the model under the first four Gauss-Markov assumptions, which give unbiased estimators of the parameters of interest (in both size and direction).
• Multicollinearity was checked as part of the post-estimation analysis by examining the variance inflation factors. No significant multicollinearity was found.
• We first detected heteroskedasticity by looking at the plot of standardised residuals against fitted values.
  ◦ Removing some observations and changing the functional relationship between the variables reduced the problem to some extent.
  ◦ A Breusch-Pagan test was then used to test formally for heteroskedasticity.
  ◦ Finally, a regression with robust standard errors was run, and the new t-statistics were reported.
• An F-test also showed the overall significance of the regression.
• Hence our estimated regression equation (rounded) is:
$$\widehat{\log(\text{Selling\_Price})} = 0.81 + 0.86\,\log(\text{Purchase Price}) - 0.038\,\text{Age} - 1.62\times 10^{-6}\,\text{Kms} + 0.056\,\text{Automatic\_Dummy} + 0.084\,\text{Diesel\_Dummy}$$
• To convert the prediction from the log form back to a price, we use
$$\widehat{\text{Selling\_Price}} = 1.003 \cdot \exp\left(\widehat{\log(\text{Selling\_Price})}\right)$$
where 1.003 is a scale factor that adjusts for retransforming a log prediction. Simply exponentiating the predicted log price would systematically underestimate the expected price.
• Because we built the model after excluding many of the cars manufactured by Toyota, it may not predict the prices of those cars well.
• With this model, a buyer who finds that a particular car is priced much higher than predicted may try to bargain the price down. Alternatively, the buyer may look for the reason, such as extra fittings or unnoticed features of the car.
• Likewise, if a car is priced much lower than the predicted value, the most likely reasons are the brand's declining popularity or the unavailability of spare parts because the model is no longer in production.
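The prediction-and-comparison use described in the summary can be sketched in Python. The sketch plugs a car's characteristics into the unrounded estimated equation and applies the 1.003 scale factor. It then measures how far an asking price lies above or below the prediction.

This is a minimal illustration. The function names and the example car are hypothetical. Prices are assumed to be in INR, as the variable list states; if the underlying data used another unit, the inputs and outputs scale accordingly.

```python
import math

# Unrounded estimates from the estimated model; 1.003 is the retransformation
# scale factor reported in the summary.
INTERCEPT = 0.8084958
B_LOG_PURCHASE = 0.8611
B_AGE = -0.0377
B_KMS = -1.62e-06
B_DIESEL = 0.083874
B_AUTOMATIC = 0.0562636
SCALE = 1.003


def predict_log_price(purchase_price, age, kms, diesel, automatic):
    """Predicted log(selling price) for one car; diesel and automatic are 0/1 dummies."""
    return (INTERCEPT
            + B_LOG_PURCHASE * math.log(purchase_price)
            + B_AGE * age
            + B_KMS * kms
            + B_DIESEL * diesel
            + B_AUTOMATIC * automatic)


def predict_price(purchase_price, age, kms, diesel, automatic):
    """Predicted selling price in levels: scale factor times exp(predicted log price)."""
    return SCALE * math.exp(predict_log_price(purchase_price, age, kms, diesel, automatic))


def price_gap_pct(asking_price, predicted_price):
    """% by which an asking price exceeds (+) or falls short of (-) the prediction."""
    return 100.0 * (asking_price - predicted_price) / predicted_price


# Hypothetical car: purchase price 1,000,000 INR, 5 years old, 50,000 km, diesel, manual.
predicted = predict_price(1_000_000, age=5, kms=50_000, diesel=1, automatic=0)
print(f"predicted selling price: {predicted:,.0f}")
# Hypothetical asking price, to show the bargaining comparison.
print(f"gap for an asking price of 300,000: {price_gap_pct(300_000, predicted):+.1f}%")
```

A large positive gap would suggest room to bargain or a reason to look for extra fittings. A large negative gap would point toward the brand-popularity or spare-parts explanations given in the summary.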
THANK YOU
