Prediction of Car Price Using Linear Regression
Volume 5 Issue 4, May-June 2021 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
INTRODUCTION
Given the demand for cars, the second-hand car market has been growing in popularity, providing opportunities for both buyers and sellers. In several countries, buying a used car is often the best option for customers because the price is fair and affordable, and after a few years of use it may even be possible to resell the car for a profit. However, many factors affect the price of a used car, including its age and current condition, and in most cases the market price fluctuates. As a result, a model for evaluating car prices is needed to assist in trading.
In this paper, we used multiple linear regression and random forest regression to build a price model for used cars. Each algorithm relied on information gathered from a website. The main goal of this paper is to find the best predictive model for car price prediction. Predicting a car's resale value is not an easy job, because the value of a used car is determined by a variety of variables. The most significant ones are typically the car's age, model, origin (the manufacturer's original country), mileage (the number of kilometres it has travelled), and horsepower.
Fuel economy is also important because of rising fuel prices. Unfortunately, most people may not realise how much fuel their car actually consumes per kilometre driven. Many other factors may also affect the price: the type of fuel the car uses, the interior style, the braking system, acceleration, the volume of its cylinders (measured in cc), safety index, the car's size, number of doors, paint colour, weight, consumer reviews, prestigious awards won by the car manufacturer, the car's physical condition, whether it is a sports car, whether it has cruise control, whether it has an automatic or manual transmission, whether it belonged to a person or a business, and other features such as air conditioning, sound system, power steering, cosmic wheels, and GPS navigator.
The remainder of this paper is organised as follows. Section II reviews prior studies that are closely related to this one. Section III describes our methodology. Section IV analyses and compares the results of our algorithms. Section V concludes the paper and outlines potential future work.
LITERATURE REVIEW
In a university study, Richardson [1] worked on the theory that car manufacturers are more likely to produce cars that do not depreciate rapidly. Using multiple regression analysis, he demonstrated that hybrid cars (cars that use two separate power sources to drive the vehicle, i.e. both an internal combustion engine and an electric motor) retain their value better than conventional vehicles. This is most likely due to increased environmental concerns regarding climate change and to the higher fuel efficiency of hybrids. Other variables such as age, mileage, make, and MPG (miles per gallon) were also taken into account in that study. He gathered all of his information from various websites.
To estimate the price of a vehicle, Noor and Jan [2] used multiple linear regression. They applied a variable selection method to find the variables that had the greatest influence and then eliminated the rest. Only a few variables remained in the data used to create the linear regression model. With an R-square of 98 per cent, the result was remarkable.
@ IJTSRD | Unique Paper ID – IJTSRD42421 | Volume – 5 | Issue – 4 | May-June 2021 Page 866
International Journal of Trend in Scientific Research and Development (IJTSRD) @ www.ijtsrd.com eISSN: 2456-6470
Peerun et al. [3] assessed how well neural networks predict used car prices. However, the predicted values were not very close to the actual prices, particularly for higher-priced vehicles. In predicting the price of a used car, they found that support vector machine regression outperformed neural networks and linear regression.
To forecast the residual value of privately used vehicles, Gonggi [4] suggested a new model based on artificial neural networks. The mileage, maker, and estimated useful life were the three key features used in this analysis. The model was adapted to accommodate nonlinear relationships, which are difficult to analyse using traditional linear regression approaches. This model was found to be fairly effective at estimating the residual value of used vehicles.
Sun et al. [5] suggested using an optimised BP neural network algorithm to develop an online used car price assessment model. To optimise the number of hidden neurons, they developed a new optimisation method called the Like Block-Monte Carlo Method (LB-MCM). Compared with the non-optimised model, the optimised model produced higher accuracy.
Among the related works we reviewed, we found none that had used a random forest regression model to estimate the price of a used vehicle. As a result, we chose to use a random forest regression model to build a model for evaluating used car prices.
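For illustration only, the idea behind random forest regression can be sketched in plain Python: each tree is grown on a bootstrap sample of the data, and the forest averages the predictions of its trees. The function names (`build_tree`, `fit_forest`, `predict_forest`), the depth limit, the number of trees, and the synthetic age/price data are all assumptions made for this sketch; it is not the implementation used in this study.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, targets, feature_ids):
    """Return the (feature, threshold) pair with the lowest squared error, or None."""
    best, best_err = None, float("inf")
    for f in feature_ids:
        # Dropping the largest value guarantees that both sides are non-empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            err = sse(left) + sse(right)
            if err < best_err:
                best, best_err = (f, thr), err
    return best


def build_tree(rows, targets, rng, depth=0, max_depth=3):
    """Grow a regression tree: a leaf is a mean target, a node is a tuple."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)
    n_features = len(rows[0])
    # Assumption: each split considers a random third of the features.
    k = max(1, n_features // 3)
    split = best_split(rows, targets, rng.sample(range(n_features), k))
    if split is None:
        return mean(targets)
    f, thr = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return (
        f,
        thr,
        build_tree([r for r, _ in left], [t for _, t in left], rng, depth + 1, max_depth),
        build_tree([r for r, _ in right], [t for _, t in right], rng, depth + 1, max_depth),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Fit n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic data (not from the paper): cars younger than 10 years cost 5.0, older ones 1.0.
toy_rows = [[age] for age in range(20)]
toy_prices = [5.0 if age < 10 else 1.0 for age in range(20)]
forest = fit_forest(toy_rows, toy_prices)
```

Calling `predict_forest(forest, [2])` should return a value close to 5.0 and `predict_forest(forest, [17])` a value close to 1.0, because most bootstrap samples recover the step in the synthetic prices.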
METHODOLOGY
This section presents the research methodology.
The car dataset for this study was obtained from www.quikr.com. For each vehicle, the following information was gathered:
make, model, seller type, kilometres driven, year of manufacture, fuel type, and price. A sample of the collected data is shown
below in Table I.
Table I. Sample data collection
Sl. no  Car Name       Year  Selling Price  Kms Driven  Fuel Type  Seller Type
1.      Ritz           2014  3.35           27000       Petrol     Dealer
2.      Sx4            2013  4.75           43000       Diesel     Dealer
3.      Ciaz           2017  7.25           6900        Petrol     Dealer
4.      Wagon r        2011  2.85           5200        Petrol     Dealer
5.      Swift          2014  4.60           42450       Diesel     Dealer
6.      Vitara brezza  2018  9.25           2071        Diesel     Dealer
7.      S cross        2015  6.50           33429       Diesel     Dealer
8.      Ciaz           2016  8.75           20273       Diesel     Dealer
9.      City           2016  9.50           33988       Diesel     Dealer
10.     Brio           2015  4.00           600000      Petrol     Dealer
# Selling Price: in lakhs
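As a rough illustration of how multiple linear regression could be applied to rows such as those in Table I, the sketch below fits an ordinary least-squares model by solving the normal equations in plain Python. The choice of car age and kilometres driven as predictors, the reference year 2021, the rescaling of kilometres, and the helper names (`solve`, `fit_linear`, `predict`, `r_squared`) are assumptions for this example; the sketch does not reproduce the model or the results of this study.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(features, targets):
    """Return [intercept, b1, b2, ...] minimising the squared error."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Predicted target for one feature row."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def r_squared(coefs, features, targets):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(targets) / len(targets)
    ss_res = sum((y - predict(coefs, row)) ** 2 for row, y in zip(features, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# The ten rows of Table I as (year, kms driven, selling price in lakhs).
sample = [
    (2014, 27000, 3.35), (2013, 43000, 4.75), (2017, 6900, 7.25),
    (2011, 5200, 2.85), (2014, 42450, 4.60), (2018, 2071, 9.25),
    (2015, 33429, 6.50), (2016, 20273, 8.75), (2016, 33988, 9.50),
    (2015, 600000, 4.00),
]
# Assumed feature engineering: age relative to 2021, kilometres in units of 10,000.
features = [[2021 - year, kms / 10_000] for year, kms, _ in sample]
prices = [price for _, _, price in sample]

coefs = fit_linear(features, prices)
print("intercept and slopes:", coefs)
print("R-square on the sample:", r_squared(coefs, features, prices))
```

With only ten rows, the fitted coefficients are not meaningful in themselves; the point is the mechanics of fitting and of computing R-square, the measure quoted in the literature review.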
Table II. Descriptive statistics of numerical variables
Attributes     Mean          Std           Min         Max
Selling Price  4.661296      5.082812      0.100000    35.000000
Present Price  7.628472      8.644115      0.320000    35.000000
Kms Driven     36947.205980  38886.883882  500.000000  500000.000000
Owner          0.043189      0.247915      0.000000    3.000000
The dataset contains a large number of used car records, so it will most likely need some cleaning and feature engineering.
For example, duplicated observations can affect the model output, so they must be removed beforehand.
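One simple way to drop duplicated observations, sketched here in plain Python, is to keep only the first occurrence of each record. The record layout (car name, year, selling price, kilometres driven, fuel type, seller type) follows Table I; the repeated listing and the function name `drop_duplicates` are made up for illustration.

```python
def drop_duplicates(records):
    """Return the records with exact repeats removed, keeping first occurrences in order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


listings = [
    ("Ritz", 2014, 3.35, 27000, "Petrol", "Dealer"),
    ("Sx4", 2013, 4.75, 43000, "Diesel", "Dealer"),
    ("Ritz", 2014, 3.35, 27000, "Petrol", "Dealer"),  # hypothetical repeated listing
    ("Ciaz", 2017, 7.25, 6900, "Petrol", "Dealer"),
]
cleaned = drop_duplicates(listings)
print(len(listings), "->", len(cleaned))  # prints "4 -> 3"
```

This only catches exact repeats; listings that differ by a typo or a rounded price would need a looser matching rule.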
According to the descriptive statistics in Table II, each attribute requires some adjustment. In particular, the mean selling price was
4.661296 lakhs, with a standard deviation of 5.082812 lakhs. Because the standard deviation exceeds the mean, the price values in the dataset are widely dispersed.
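The summary columns of Table II (mean, standard deviation, minimum, maximum) can be computed with Python's standard `statistics` module. The sketch below applies them only to the ten selling prices listed in Table I, so its output will differ from the full-dataset figures in Table II. The use of the sample standard deviation and the added coefficient of variation (standard deviation divided by mean) are assumptions of this example, since the paper does not state which estimator was used.

```python
from statistics import mean, stdev

# Selling prices (in lakhs) of the ten sample rows in Table I.
selling_prices = [3.35, 4.75, 7.25, 2.85, 4.60, 9.25, 6.50, 8.75, 9.50, 4.00]


def describe(values):
    """Return mean, sample standard deviation, min, max and coefficient of variation."""
    avg = mean(values)
    sd = stdev(values)
    return {"mean": avg, "std": sd, "min": min(values), "max": max(values), "cv": sd / avg}


summary = describe(selling_prices)
print(summary)

# Dispersion of the full dataset, using the Selling Price figures reported in Table II.
table2_cv = 5.082812 / 4.661296  # about 1.09: the standard deviation exceeds the mean
```

A coefficient of variation above 1 is one way to quantify the statement that the prices are widely dispersed.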
In predictive statistics and machine learning, attributes that are highly correlated with the target variable often have a greater effect on the
prediction, although this is not always the case. The correlation coefficient is a statistical measure of the strength and direction of the
linear relationship between two variables. It always lies in the range
from -1 (perfect negative relationship) to 1 (perfect positive relationship), while 0 indicates that there is no linear correlation at all.
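As a minimal sketch of this measure, the Pearson correlation coefficient can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. The function name `pearson` and the choice of year and selling price from the ten sample rows of Table I are assumptions for illustration; the paper does not say which correlation estimator was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Year and selling price (in lakhs) of the ten sample rows in Table I.
years = [2014, 2013, 2017, 2011, 2014, 2018, 2015, 2016, 2016, 2015]
prices = [3.35, 4.75, 7.25, 2.85, 4.60, 9.25, 6.50, 8.75, 9.50, 4.00]

r_year_price = pearson(years, prices)
print("correlation between year and selling price:", r_year_price)
```

On these ten rows the coefficient is positive, meaning that newer cars in the sample tend to have higher selling prices; a sample this small says little about the full dataset.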
References:
[1] Richardson, "Determinants of Used Car Resale Value," 2009.
[2] K. Noor and S. Jan, "Vehicle Price Prediction System using Machine Learning Techniques," International Journal of Computer Applications, pp. 27-31, 2017.
[3] S. Peerun, N. H. Chummun and S. Pudaruth, "Predicting the Price of Second-hand Cars using Artificial Neural Networks," The Second International Conference on Data Mining, Internet Computing, and Big Data, pp. 17-21, 2015.
[4] Gonggi, "New model for residual value prediction of used cars based on BP neural network and non-linear curve fit," International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pp. 682-685, 2011.
[5] N. Sun, H. Bai, Y. Geng and H. Shi, "Price evaluation model in second-hand car system based on BP neural network theory," International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 431-436, 2017.
[6] N. Monburinon, P. Chertchom, T. Kaewkiriya, S. Rungpheung, S. Buya and P. Boonpou, "Prediction of prices for used car by using regression models," International Conference on Business and Industrial Research, IEEE, pp. 115-119, 2018.
[7] E. Gegic, B. Isakovic, D. Keco, Z. Masetic and J. Kevric, "Car price prediction using machine learning techniques," TEM Journal, vol. 8, p. 113, 2019.
[8] S. Sinha, R. Azim and S. Das, "Linear Regression on Car Price Prediction," 2020.
[9] R. R. Yang, S. Chen and E. Chou, "Vehicle price prediction using visual features," 2018.
[10] S. Kiran, "Prediction of Resale Value of the Car Using Linear Regression Algorithm," International Journal of Innovative Science and Research Technology, vol. 5, no. 7, pp. 382-386, 2020.