
A Mini Project Report on

CAR PRICE PREDICTION USING LINEAR REGRESSION


Submitted to

Jawaharlal Nehru Technological University, Hyderabad


in partial fulfillment of requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
In

COMPUTER SCIENCE AND ENGINEERING


By

Department of Computer Science and Engineering


KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY
Approved by AICTE, Affiliated to JNTUH
3-5-1206, Narayanaguda, Hyderabad – 500029
2022-2023

Downloaded by Prashant Chaudhari



KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY


(Accredited by NBA & NAAC, Approved By A.I.C.T.E., Reg by Govt of Telangana
State & Affiliated to JNTU, Hyderabad)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the project entitled CAR PRICE PREDICTION USING LINEAR
REGRESSION being submitted by


CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTERS
1. INTRODUCTION
1.1. Machine Learning
1.2. What is Machine Learning
1.3. Types of Machine Learning
1.4. Linear Regression
1.5. Objective & Problem Statement
1.6. Purpose of Project
1.7. Architecture Diagram
1.8. Project Goal
2. SOFTWARE REQUIREMENTS SPECIFICATIONS
2.1. Requirements Specification Document
2.2. Functional Requirements
2.3. Non-Functional Requirements
2.4. Software Requirements
2.5. Hardware Requirements
2.6. Requirement Analysis
2.7. Test Construction and Verification
2.8. Test Execution and Bug Reporting
2.9. Final Testing and Implementation
2.10. Post Implementation
2.11. Technologies Used


CAR PRICE PREDICTION

3. LITERATURE SURVEY
3.1. Proposed Model
3.2. Paper Work
3.3. Related Work
4. SYSTEM DESIGN
4.1. Introduction to UML
4.2. UML Diagrams
4.2.1. Use Case Diagram
4.2.2. Sequence Diagram
4.2.3. Class Diagram
4.2.4. System Design
4.2.5. State Chart Diagram
5. IMPLEMENTATION
5.1. Pseudo Code
5.2. Data Cleaning using Google Colab
5.3. Code Snippets
6. TESTING
6.1. Introduction to Testing
6.2. Test Cases
7. SCREENSHOTS
7.1. Layout of Testing Platform
7.2. Log & Reference
7.3. UI of Web Application
8. FURTHER ENHANCEMENTS
9. CONCLUSION
10. REFERENCES


ABSTRACT

This study applies machine learning to the problem of predicting the price of an item, in this case a used car. With the motivation of helping everyone, we built a solution that gives an appropriate estimate of a car's value using machine learning techniques, saving both time and money. Car price prediction has been an area of high research interest, because a reliable estimate normally requires noticeable effort and the knowledge of a field expert, and a considerable number of distinct attributes must be examined for a reliable and accurate prediction. The production of cars has been steadily increasing over the past decade, with over 70 million passenger cars produced in 2016. This has given rise to the used car market, which has become a booming industry in its own right. The recent advent of online portals has increased the need for both customers and sellers to be better informed about the trends and patterns that determine the value of a used car in the market. To build a model for predicting the price of used cars, we applied one machine learning technique, namely linear regression. In linear regression there can be multiple independent variables but only one dependent variable, whose actual and predicted values are compared to measure the accuracy of the results. Our system treats price as the dependent variable and predicts it from factors such as kilometers driven, year of purchase, car company, car model, and fuel type.

Keywords: Car Price Prediction, Linear Regression, Machine Learning, Dependent Variable.


LIST OF FIGURES

1.1 Machine Learning
1.2 Machine Learning & Traditional Programming
1.3 Types of Machine Learning
1.3.1 Data Set of Supervised Learning
1.3.1.2 Types of Supervised Learning
1.3.2 Unsupervised Learning
1.3.2.1 Types of Unsupervised Learning
1.3.4 Reinforcement Learning
1.4 Linear Regression
1.7 Architecture of Linear Regression
3.8.1 Google Colab
4.2.1 Use Case Diagram - UML
4.2.2 Sequence Diagram - UML
4.2.3 Class Diagram - UML
4.2.4 System Design - UML
4.2.5 State Chart Diagram - UML
7.1 Selenium IDE Testing Platform
7.2 Log & Reference using Selenium IDE
7.3 Register page of Web Application - UI
7.4 Login page of Web Application - UI
7.5 Home page of Web Application - UI
7.6 Displaying available car companies - UI
7.7 Displaying suitable car models - UI
7.8 Displaying available years - UI
7.9 Displaying available fuel types - UI
7.10 Displaying predicted price - UI


LIST OF TABLES

6.2 Test Case for Web Application
6.2.1 Launching Web Application
6.2.2 Registration of User Details
6.2.3 Login Positive Test Case
6.2.4 Login Negative Test Case
6.2.5 Displaying Attributes
6.2.6 Selecting Attributes
6.2.7 Selecting Attributes for Correct Attributes
6.2.8 Selecting Attributes for Incorrect Attributes
6.2.9 Home Button Test Case
6.2.10 Logout Button Test Case


CHAPTER 1


1. INTRODUCTION

1.1 MACHINE LEARNING


Machine learning is the field of study that gives computers the capability to learn without being explicitly programmed. It is one of the most exciting technologies one is likely to come across. As the name suggests, it gives the computer the trait that makes it more similar to humans: the ability to learn. Machine learning is actively being used today, perhaps in many more places than one would expect.

Figure 1.1 Machine Learning

1.2 What is Machine Learning

Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term "Machine Learning". He defined machine learning as a "field of study that gives computers the capability to learn without being explicitly programmed". In layman's terms, machine learning (ML) automates and improves the learning process of computers based on their experience, without their being explicitly programmed, i.e. without human assistance. The process starts with feeding in good-quality data and then training the machines (computers) by building machine learning models using the data and different algorithms. The choice of algorithm depends on the type of data we have and the kind of task we are trying to automate. For example, consider how students prepare for exams. While preparing, students do not simply cram the subject but try to learn it with complete understanding. Before the examination, they feed their machine (brain) with a good amount of high-quality data (questions and answers from different books, teachers' notes, or online video lectures).


In effect, they are training their brain with both input and output, i.e. learning what approach or logic to use to solve different kinds of questions. Each time they solve a practice test paper, they measure their performance (accuracy/score) by comparing their answers with the answer key. Gradually their performance improves and they gain confidence in the approach they have adopted. That is how models are built: we train the machine with data in which both inputs and outputs are given, and later test it on data with inputs only, scoring the model by comparing its answers with the actual outputs, which were not shown to it during training. Researchers continue to work hard on improving algorithms and techniques so that these models perform even better.

Figure 1.2 Machine Learning & Traditional Programming

1.2.1 Basic Difference between ML and Traditional Programming

Traditional Programming: We feed in DATA (input) + PROGRAM (logic), run it on the machine, and get the output.
Machine Learning: We feed in DATA (input) + output, run it on the machine during training, and the machine creates its own program (logic), which can then be evaluated during testing.
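A minimal Python sketch of this contrast, assuming a toy task (Celsius to Fahrenheit) chosen only for illustration. The traditional version is a hand-written rule; the machine learning version is given input/output pairs and estimates its own "program" (a slope and an intercept) by least squares, and is then checked on inputs it did not see during training. The outputs are generated by the rule only so that the learned logic can be compared with it; the function names are hypothetical.

```python
# Traditional programming: a human writes the logic explicitly.
def to_fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32


# Machine learning: inputs AND outputs are supplied, and the "program"
# (here a slope and an intercept) is estimated from the examples.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Training data: inputs together with their known outputs.
train_x = [0, 10, 20, 30, 40]
train_y = [to_fahrenheit_rule(c) for c in train_x]
slope, intercept = fit_line(train_x, train_y)

# Testing: inputs not seen during training, compared with the true outputs.
for c in [-5, 25, 100]:
    predicted = slope * c + intercept
    print(c, round(predicted, 2), to_fahrenheit_rule(c))
```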


1.3 Types of Machine Learning

A machine is said to learn from past experience (data fed in) with respect to some class of tasks if its performance on a given task improves with experience. For example, suppose a machine has to predict whether a customer will buy a specific product, say an antivirus, this year. The machine does so by looking at previous knowledge (past experience), i.e. data on the products the customer has bought each year: if the customer has bought an antivirus every year, there is a high probability that they will buy one this year as well. This is how machine learning works at the basic conceptual level.

Figure 1.3 Types of Machine Learning

1.3.1 Supervised Learning

Supervised learning is when the model is trained on a labeled dataset. A labeled dataset is one that has both input and output parameters. In this type of learning, both the training and the validation datasets are labeled, as shown in the figures below.

Example

Both figures show labeled datasets:

Figure A: A dataset from a shopping store, useful for predicting whether a customer will purchase a particular product based on his/her gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased, i.e. 0 or 1; 1 means the customer will purchase the product and 0 means the customer will not.
Figure B: A meteorological dataset used to predict wind speed from several parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed

1.3.1 Types of Supervised Learning:

Figure 1.3.1 Types of Supervised Learning


A. Classification:
Classification is a supervised learning task where the output has defined labels (discrete values). For example, in Figure A above, the output Purchased has defined labels, 0 or 1: 1 means the customer will purchase and 0 means the customer will not. The goal is to predict discrete values belonging to a particular class, and the model is evaluated on the basis of accuracy.
Classification can be either binary or multi-class. In binary classification the model predicts one of two classes (0 or 1, yes or no), whereas in multi-class classification it predicts one of more than two classes. Example: Gmail classifies mail into several categories such as social, promotions, updates, and forums.
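A minimal classification sketch in the spirit of Figure A, using a 1-nearest-neighbor rule (Nearest Neighbor appears in the list of algorithms below) and evaluating it by accuracy. The records, the 0/1 gender encoding, and the scaling constants are hypothetical, invented purely for illustration.

```python
import math

# Hypothetical labeled records: (gender, age, salary) -> purchased (0 or 1).
# Gender is encoded as 0/1 purely for this illustration.
train = [
    ((0, 22, 20000), 0),
    ((1, 25, 25000), 0),
    ((0, 47, 90000), 1),
    ((1, 52, 110000), 1),
    ((0, 35, 60000), 1),
    ((1, 30, 30000), 0),
]


def scale(features):
    # Bring age and salary into comparable ranges so salary does not dominate.
    gender, age, salary = features
    return (gender, age / 60, salary / 100000)


def predict(features):
    """1-nearest-neighbor classifier: copy the label of the closest record."""
    point = scale(features)
    _, label = min(train, key=lambda rec: math.dist(point, scale(rec[0])))
    return label


def accuracy(records):
    # Fraction of records whose predicted class matches the true class.
    correct = sum(predict(x) == y for x, y in records)
    return correct / len(records)


test = [((1, 45, 85000), 1), ((0, 23, 22000), 0)]
print([predict(x) for x, _ in test], accuracy(test))
```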


B. Regression:
Regression is a supervised learning task where the output has a continuous value. For example, in Figure B above, the output Wind Speed does not take discrete values but is continuous within a particular range. The goal is to predict a value as close to the actual output as the model can, and evaluation is done by calculating the error. The smaller the error, the more accurate the regression model.

Examples of Supervised Learning Algorithms:

• Linear Regression
• Logistic Regression
• Nearest Neighbor
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest

1.3.2 Unsupervised Learning:

Unsupervised machine learning uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns in data without human intervention, i.e. we do not give the output to the model. The model is trained with input parameter values only and discovers the groups or patterns on its own. The dataset in Figure A is data from a mall that contains information about the clients who subscribe to it. Once subscribed, clients are given a membership card, and the mall has complete information about each customer and every purchase. Using this data and unsupervised learning techniques, the mall can easily group clients based on the parameters we feed in.

Figure 1.3.2 Unsupervised Learning


The input to unsupervised learning models is as follows:

Unstructured data: may contain noisy (meaningless) data, missing values, or unknown data.

1.3.2.1 Types of Unsupervised Learning

Figure 1.3.2.1 Types of Unsupervised Learning

Clustering: Broadly, this technique groups data according to the patterns (similarities or differences) that the model finds. These algorithms process raw, unclassified data objects into groups. For example, in the mall dataset above no output parameter values are given, so clustering would be used to group clients based on the input parameters in the data.
Association: This is a rule-based ML technique that finds useful relations between the parameters of a large dataset. It is mainly used for market basket analysis, which helps to better understand the relationships between different products. For example, shopping stores use algorithms based on this technique to find how the sales of one product relate to the sales of another, based on customer behavior: if a customer buys milk, they may also buy bread, eggs, or butter. Once trained well, such models can be used to increase sales by planning targeted offers.
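A small sketch of the association idea, assuming hypothetical shopping baskets. Support is the fraction of baskets that contain an itemset, and the confidence of the rule milk → bread estimates how often bread appears in baskets that contain milk. Real association-rule miners (for example Apriori) enumerate and score many such rules; this sketch only scores one.

```python
# Hypothetical shopping baskets (one set of items per customer visit).
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"milk", "bread"})
rule_confidence = confidence({"milk"}, {"bread"})
print(f"milk -> bread: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```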
Some clustering algorithms:
• K-Means Clustering
• DBSCAN – Density-Based Spatial Clustering of Applications with Noise
• BIRCH – Balanced Iterative Reducing and Clustering using Hierarchies
• Hierarchical Clustering

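A compact k-means sketch (the first algorithm in the list above) for grouping mall clients, assuming hypothetical (annual income, spending score) pairs. For reproducibility the first k points seed the centroids; library implementations usually use smarter seeding such as k-means++.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters


# Hypothetical mall clients: (annual income in thousands, spending score 1-100).
clients = [(15, 39), (16, 81), (70, 29), (14, 40), (17, 76), (71, 35), (72, 30), (18, 80)]
centroids, clusters = kmeans(clients, k=3)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```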
1.3.3 Semi-supervised Learning:
As the name suggests, semi-supervised learning lies between supervised and unsupervised techniques. We use it when only a small part of the data is labeled and the large remaining portion is unlabeled. It is mostly applicable to image datasets, where usually not all images are labeled.
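One common semi-supervised approach is self-training (pseudo-labeling). The sketch below, on hypothetical 1-D values, repeatedly takes the unlabeled point closest to the labeled set, gives it the label of its nearest labeled neighbor, and adds it to the labeled set. Using distance as the "confidence" measure is an assumption chosen for simplicity.

```python
# A few labeled examples and several unlabeled ones (hypothetical 1-D values).
labelled = [(1.0, "A"), (9.0, "B")]
unlabelled = [1.5, 2.2, 3.0, 7.2, 8.1, 8.6]


def nearest_label(x, data):
    """Return (distance, label) of the labeled point closest to x."""
    value, label = min(data, key=lambda pair: abs(pair[0] - x))
    return abs(value - x), label


# Self-training: pseudo-label the unlabeled point the current labeled set is
# most confident about (here: the closest one), add it, and repeat.
pool = list(unlabelled)
while pool:
    best = min(pool, key=lambda x: nearest_label(x, labelled)[0])
    _, label = nearest_label(best, labelled)
    labelled.append((best, label))
    pool.remove(best)

print(sorted(labelled))
```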
1.3.4 Reinforcement Learning:
In this technique, the model keeps improving its performance by using reward feedback to learn a behavior or pattern. These algorithms are specific to a particular problem, e.g. Google's self-driving car, or AlphaGo, where a bot competes with humans and even with itself to become a better Go player. Each time we feed in data, the model learns and adds the data to its knowledge, which serves as training data. So the more it learns, the better trained and more experienced it becomes.

Figure 1.3.4 Reinforcement Learning

• Agents observe input.
• An agent performs an action by making some decisions.
• After performing the action, the agent receives a reward, reinforces its behavior accordingly, and the model stores the information as state-action pairs.

Some algorithms:
• Temporal Difference (TD)
• Q-Learning
• Deep Q-Networks (DQN)
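A minimal tabular Q-learning sketch (Q-learning being one of the algorithms listed above), assuming a hypothetical five-state corridor in which the agent earns a reward of 1 on reaching the last state. It follows the loop in the bullets: observe the state, choose an action, receive a reward, and store the result as a state-action value. The corridor, rewards, and hyperparameters are illustrative assumptions.

```python
import random

N_STATES, GOAL = 5, 4      # hypothetical corridor: states 0..4, reward on reaching state 4
ACTIONS = (-1, +1)         # index 0 = move left, index 1 = move right
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount factor, exploration rate

# Q[s][a]: the agent's current estimate of the long-term reward of action a in state s.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
rng = random.Random(0)     # fixed seed so runs are reproducible

for episode in range(200):
    state = 0
    while state != GOAL:
        # Choose an action: explore at random sometimes (or on ties), otherwise exploit.
        if rng.random() < epsilon or Q[state][0] == Q[state][1]:
            action = rng.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0
        # Temporal-difference update towards reward + discounted best next value.
        best_next = 0.0 if next_state == GOAL else max(Q[next_state])
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        state = next_state

policy = ["right" if q[1] > q[0] else "left" for q in Q[:GOAL]]
print(policy)
```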


1.4 Linear Regression


In statistics, linear regression is a linear approach for modelling the relationship
between a scalar response and one or more explanatory variables (also known as
dependent and independent variables). The case of one explanatory variable is called
simple linear regression; for more than one, the process is called multiple linear
regression. This term is distinct from multivariate linear regression, where multiple
correlated dependent variables are predicted, rather than a single scalar variable.

In linear regression, the relationships are modeled using linear predictor functions whose
unknown model parameters are estimated from the data. Such models are called linear
models. Most commonly, the conditional mean of the response given the values of the
explanatory variables (or predictors) is assumed to be an affine function of those values;
less commonly, the conditional median or some other quantile is used. Like all forms of
regression analysis, linear regression focuses on the conditional probability distribution
of the response given the values of the predictors, rather than on the joint probability
distribution of all of these variables, which is the domain of multivariate analysis.

Linear regression was the first type of regression analysis to be studied rigorously, and to
be used extensively in practical applications. This is because models which depend
linearly on their unknown parameters are easier to fit than models which are non-linearly
related to their parameters and because the statistical properties of the resulting estimators
are easier to determine.

Figure 1.4 Linear Regression


Linear regression has many practical uses. Most applications fall into one of the
following two broad categories:

• If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed dataset of values of the response and explanatory variables. After such a model has been developed, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to predict the response.
• If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.

Linear regression models are often fitted using the least squares approach, but they may
also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm
(as with least absolute deviations regression), or by minimizing a penalized version of the
least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm
penalty). Conversely, the least squares approach can be used to fit models that are not
linear models. Thus, although the terms "least squares" and "linear model" are closely
linked, they are not synonymous.
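To make the difference between least squares and ridge regression concrete, here is a one-predictor sketch using the closed-form solutions: with l2_penalty = 0 the slope is the ordinary least-squares estimate, and a positive penalty shrinks the slope towards zero (the intercept is left unpenalized). The data points and the function name are hypothetical; lasso (L1 penalty) is not shown.

```python
def fit_simple(xs, ys, l2_penalty=0.0):
    """Fit y = slope * x + intercept.

    l2_penalty = 0 gives ordinary least squares; l2_penalty > 0 gives ridge
    regression, which minimizes sum((y - intercept - slope*x)^2) + l2_penalty * slope^2.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + l2_penalty)
    return slope, mean_y - slope * mean_x


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]              # hypothetical noisy observations
ols = fit_simple(xs, ys)                     # least squares
ridge = fit_simple(xs, ys, l2_penalty=10.0)  # ridge: slope shrunk towards 0
print("OLS:", ols, "ridge:", ridge)
```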

Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of n statistical units, a linear regression model assumes that the relationship between the dependent variable y and the p-vector of regressors x is linear. This relationship is modeled through a disturbance term or error variable ε, an unobserved random variable that adds "noise" to the linear relationship between the dependent variable and the regressors. Thus the model takes the form

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n$$


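where T denotes the transpose, so that $\mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta}$ is the inner product between the vectors $\mathbf{x}_i = (1, x_{i1}, \ldots, x_{ip})^{\mathsf{T}}$ and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^{\mathsf{T}}$; the leading 1 in $\mathbf{x}_i$ makes $\beta_0$ the intercept.

Often these n equations are stacked together and written in matrix notation as

$$\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

where the i-th row of the matrix X is $\mathbf{x}_i^{\mathsf{T}}$.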

The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression (not to be confused with multivariate linear regression).

Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i$$

for each observation i = 1, ..., n.

In the formula above we consider n observations of one dependent variable and p independent variables. Thus, $Y_i$ is the i-th observation of the dependent variable, $x_{ij}$ is the i-th observation of the j-th independent variable, j = 1, 2, ..., p. The values $\beta_j$ represent parameters to be estimated, and $\varepsilon_i$ is the i-th independent, identically distributed normal error.
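The least-squares estimates of β₀, ..., β_p in the model above solve the normal equations (XᵀX)β = Xᵀy. A dependency-free sketch, assuming noise-free hypothetical data generated from y = 3 + 2x₁ − x₂ so that the recovered coefficients can be checked by eye; the helper names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Estimate [beta_0, ..., beta_p] from the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # leading 1s give the intercept beta_0
    p1 = len(X[0])
    xtx = [[sum(X[i][j] * X[i][k] for i in range(len(X))) for k in range(p1)] for j in range(p1)]
    xty = [sum(X[i][j] * y[i] for i in range(len(X))) for j in range(p1)]
    return solve(xtx, xty)


# Noise-free data from y = 3 + 2*x1 - 1*x2, so OLS should recover (3, 2, -1).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [3 + 2 * x1 - x2 for x1, x2 in rows]
print([round(b, 6) for b in fit_ols(rows, y)])
```

In practice a numerically safer solver such as numpy.linalg.lstsq (or scikit-learn's LinearRegression) would normally be used instead of forming XᵀX explicitly.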

In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:

$$Y_{ij} = \beta_{0j} + \beta_{1j} x_{i1} + \beta_{2j} x_{i2} + \cdots + \beta_{pj} x_{ip} + \varepsilon_{ij}$$

for all observations indexed as i = 1, ..., n and for all dependent variables indexed as j = 1, ..., m.

Nearly all real-world regression models involve multiple predictors, and basic
descriptions of linear regression are often phrased in terms of the multiple regression
model. Note, however, that in these cases the response variable y is still a scalar. Another
term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as
general linear regression.


1.4.1 Types of loss in a linear model:

L1 loss: the absolute difference between the predicted and actual values. Averaging the L1 losses over all N observations gives the mean absolute error (MAE). The formula is shown below.

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value of y for the i-th observation.

L2 loss: the squared difference between the predicted and actual values. Averaging it over all N observations gives the mean squared error (MSE). The formula is shown below.

$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value of y.

RMSE: the root mean squared error is the square root of the L2 loss (MSE), which expresses the error in the same units as y. The formula is shown below.

$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value of y.

R-squared: the coefficient of determination measures how well the model's predicted line fits the actual data. For a least-squares fit with an intercept, evaluated on the data it was fitted to, it lies between 0 and 1, and a value close to 1 indicates a well-fitted line (on new data it can even be negative). The formula is shown below.

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean value of y.

Note: when the data contain outliers, L1 loss is preferable, because L2 loss squares each error and therefore gives outliers a disproportionately large loss. Alternatively, the outliers can be removed first and L2 loss used afterwards.
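The four metrics above written out in Python, with a hypothetical example that illustrates the note on outliers: a single badly wrong prediction triples the MAE but multiplies the MSE by 21.

```python
import math


def mae(y, y_hat):
    # Mean absolute error (average L1 loss).
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    # Mean squared error (average L2 loss).
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    # Root mean squared error, in the same units as y.
    return math.sqrt(mse(y, y_hat))


def r2(y, y_hat):
    # Coefficient of determination: 1 - SS_residual / SS_total.
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]      # every prediction off by 1
with_outlier = [11.0, 11.0, 15.0, 25.0]   # last prediction off by 9

for name, preds in [("small errors", predicted), ("one outlier", with_outlier)]:
    print(name, mae(actual, preds), mse(actual, preds), round(rmse(actual, preds), 3))
print("R^2:", r2(actual, predicted))
```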

Learning Rate:
In the gradient descent update rule below, α (alpha) is the learning rate. It controls the speed of gradient descent, i.e. how large a step is taken towards the minimum. The value of α should be chosen carefully: too large a value can overshoot and miss the minimum, while too small a value makes it take a long time to reach it.

$$\theta_{new} = \theta_{old} - \alpha \frac{\partial L}{\partial \theta_{old}}$$

1.4.2 Gradient Descent:

To update θ1 and θ2 (the intercept and slope of the fitted line y = θ1 + θ2·x) so as to reduce the cost function (e.g. minimize the RMSE) and obtain the best-fit line, the model uses gradient descent. The idea is to start with random θ1 and θ2 values and then update them iteratively with the learning-rate rule above until the cost reaches its minimum.
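A sketch of gradient descent for the line y = θ1 + θ2·x, minimizing the MSE with the update rule θ_new = θ_old − α ∂L/∂θ_old. It assumes zero starting values (instead of random ones, for reproducibility) and hypothetical data lying exactly on y = 1 + 2x. The second call uses a learning rate that is too large for this data, so the updates overshoot the minimum and diverge.

```python
def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Fit y = theta1 + theta2 * x by gradient descent on the mean squared error."""
    theta1, theta2 = 0.0, 0.0  # zeros instead of random values, for reproducibility
    n = len(xs)
    for _ in range(steps):
        errors = [theta1 + theta2 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) w.r.t. theta1 and theta2.
        grad1 = 2 / n * sum(errors)
        grad2 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        # Update rule: theta_new = theta_old - alpha * dL/dtheta_old
        theta1 -= alpha * grad1
        theta2 -= alpha * grad2
    return theta1, theta2


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
print(gradient_descent(xs, ys))                       # converges close to (1, 2)
print(gradient_descent(xs, ys, alpha=0.1, steps=50))  # learning rate too large: diverges
```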

1.4.3 One Hot Encoding:

Most machine learning algorithms cannot work with categorical data directly, so it must be converted into numerical data. Datasets sometimes contain columns with categorical features (string values); for example, the parameter Gender has the categories Male and Female. These labels have no specific order of preference, but once they are turned into numbers a machine learning model may misinterpret them as having some sort of hierarchy.

One approach is label encoding, where we assign a numerical value to each label, for example mapping Male and Female to 0 and 1. But this can add bias to the model, as it may start giving higher preference to Female because 1 > 0, whereas ideally both labels are equally important in the dataset. To deal with this issue we use the One Hot Encoding technique.

In this technique, the categorical parameter is split into separate columns for the Male and Female labels. Wherever the value is Male, the Male column contains 1 and the Female column contains 0, and vice versa. Let's understand with an example: consider data where fruits, their corresponding categorical values, and their prices are given.
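A hypothetical sketch that applies both encodings to the Male/Female example from this section. label_encode assigns integer codes (here in alphabetical order, so Female = 0 and Male = 1, which implies an order between the categories), while one_hot_encode creates one 0/1 column per category so that no category outranks another.

```python
def label_encode(values):
    """Map each category to an integer code; this implicitly orders the categories."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def one_hot_encode(values):
    """Create one 0/1 column per category; exactly one column is 1 in each row."""
    categories = sorted(set(values))
    return categories, [[1 if v == cat else 0 for cat in categories] for v in values]


gender = ["Male", "Female", "Female", "Male"]
print(label_encode(gender))
columns, rows = one_hot_encode(gender)
print(columns)
for value, row in zip(gender, rows):
    print(value, row)
```

Libraries provide the same transformation, for example pandas.get_dummies or scikit-learn's OneHotEncoder.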

1.5 Objective & Problem Statement

Objective of the Project - The goal of this project is to create an efficient and effective model that predicts the price of a used car with good accuracy using the linear regression algorithm, based on:
• Brand or type of car one prefers, such as Ford or Hyundai
• Model of the car, such as Ford Figo or Hyundai Creta
• Year of manufacture, such as 2020 or 2021
• Type of fuel, such as Petrol or Diesel
• Number of kilometers the car has travelled
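As an illustration of how the attributes listed above could be turned into model inputs, the sketch one-hot encodes the categorical attributes (company, model, fuel type) and passes year and kilometers through unchanged. The vocabularies contain only the example values named above and are assumptions; a real model would build them from the training data. A linear regression model would then predict the price as β₀ plus the sum of each coefficient times its feature.

```python
# Hypothetical category vocabularies, taken from the examples listed above.
COMPANIES = ["Ford", "Hyundai"]
MODELS = ["Ford Figo", "Hyundai Creta"]
FUELS = ["Diesel", "Petrol"]


def one_hot(value, vocabulary):
    # Unknown categories become an all-zero block instead of raising an error.
    return [1 if value == v else 0 for v in vocabulary]


def encode_car(company, model, year, fuel, kms_driven):
    """Turn one used-car record into the numeric vector a linear model expects:
    one-hot blocks for the categorical attributes, raw values for the numeric ones."""
    return (one_hot(company, COMPANIES) + one_hot(model, MODELS)
            + one_hot(fuel, FUELS) + [year, kms_driven])


print(encode_car("Hyundai", "Hyundai Creta", 2020, "Petrol", 35000))
```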

Problem Statement - It is easy for a company to price its new cars based on the manufacturing and marketing costs involved. For a used car, however, it is quite difficult to define a price, because the price is influenced by various parameters such as the car brand and the year of manufacture. The goal of our project is to predict the best price for a pre-owned car in the Indian market, based on previous data on sold cars, using linear regression.

1.6 Purpose of Project

The used car market is an ever-growing industry whose market value has almost doubled in the last few years. The emergence of online portals such as CarDekho, Quikr, CarWale, Cars24, and many others has increased the need for both customers and sellers to be better informed about the trends and patterns that determine the value of a used car in the market. Machine learning algorithms can be used to predict the retail value of a car based on a certain set of features. The purpose of this project is to provide car price prediction using machine learning without any human interference.
Cars are bought and sold every day, yet there are limited facilities and applications for getting an appropriate price for one's car. This application can be used to get an estimate of a car's value.

1.7 Architecture Diagram

Fig 1.7 – Architecture of Linear Regression (M.L)

1.8 Project Goal

We are required to model the price of cars with the available independent
variables. It will be used by the management to understand how exactly the prices vary
with the independent variables. They can accordingly manipulate the design of the cars,
the business strategy etc. to meet certain price levels. Further, the model will be a good
way for management to understand the pricing dynamics of a new market.


CHAPTER 2


2. SYSTEM REQUIREMENT SPECIFICATIONS

2.1 What is SRS?

Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended; hence the need for a requirements phase arose. A software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase).

The SRS phase consists of two basic activities:

Problem/Requirement Analysis:
This is the more nebulous (less well-defined) of the two activities; it deals with understanding the problem, the goals, and the constraints.
Requirement Specification:
Here the focus is on specifying what has been found during analysis. Issues such as representation, specification languages and tools, and checking the specifications are addressed during this activity.
The requirements phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.

2.1.1 Role of SRS:

The purpose of the Software Requirement Specification is to reduce the communication gap between the clients and the developers. The SRS is the medium through which the client's and users' needs are accurately specified, and it forms the basis of software development. A good SRS should satisfy all the parties involved in the system.

2.2 Requirements Specification Document

A Software Requirements Specification (SRS) is a document that describes the nature of a project, software, or application. In simple words, the SRS document is a manual for a project, prepared before the project or application is started. It is also known as an SRS report or software document, and it is prepared for a project, software, or any other kind of application.

There is a set of guidelines to follow when preparing the software requirement specification document. It includes the purpose, scope, functional and non-functional requirements, and software and hardware requirements of the project. In addition, it contains information about the required environmental conditions, safety and security requirements, software quality attributes of the project, etc.

The purpose of the SRS document is to describe the external behavior of the application or software being developed. It defines the operations, performance, interfaces, and quality assurance requirements of the application or software, and it captures the complete software requirements for the system. This section introduces the requirements specification document for Car Price Prediction using Linear Regression, which lists both functional and non-functional requirements.

2.2 Functional Requirements


For documenting the functional requirements, the set of functionalities supported by the system must be specified. A function can be specified by identifying the state at which data is to be input to the system, its input data domain, the output domain, and the type of processing to be carried out on the input data to obtain the output data. Functional requirements define specific behaviors or functions of the application. The functional requirements are:
FR1) After registration, the user's details should be stored in MySQL.
FR2) Entering login details should show the user's data.
FR3) The login page should redirect to the next page (home).
FR4) The attributes should be shown after redirecting to the home page.
FR5) After the attributes are entered, the predicted price should be shown.

2.3 Non-Functional Requirements


A non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. In particular, these are the constraints the system must work within. The non-functional requirements are as follows:

NFR 1) The system must work properly without bugs.

NFR 2) There should be no lag in showing the price.

NFR 3) The database should retrieve the correct user data.

NFR 4) Attributes must be displayed properly to the user.

2.3.1 Performance:
The performance of the developed application can be assessed by measuring it. Measuring enables you to identify how the performance of your application stands in relation to your defined performance goals and helps you to identify the bottlenecks that affect application performance. It also shows whether your application is moving toward or away from those goals. Defining what you will measure, that is, your metrics, and defining the objectives for each metric is a critical part of your testing plan.
Performance objectives include the following:
Response time or latency, throughput, and resource utilization.
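A minimal way to quantify response time and throughput in Python is to time repeated calls to the prediction function with time.perf_counter. In the sketch below, predict_price is a hypothetical stand-in for the trained model, and the 100 ms budget is an assumed target for NFR 2, not a figure from this project.

```python
import statistics
import time


def predict_price(features):
    """Hypothetical stand-in for the trained model's prediction call."""
    # A trivial linear formula so the timing loop has something to call.
    return 50_000 + 1_000 * features["age"]


def measure_latency(fn, arg, runs=200):
    """Call fn(arg) `runs` times and return per-call latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(arg)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


latencies = measure_latency(predict_price, {"age": 5})
mean_ms = statistics.mean(latencies)
# quantiles(n=20) returns 19 cut points; the last one (index 18) is the 95th percentile.
p95_ms = statistics.quantiles(latencies, n=20)[18]
# Sequential throughput: calls completed per second at the measured mean latency.
throughput = 1000.0 / mean_ms if mean_ms > 0 else float("inf")

BUDGET_MS = 100.0  # assumed response-time target for NFR 2, not a project figure
print(f"mean={mean_ms:.4f} ms  p95={p95_ms:.4f} ms  throughput={throughput:.0f} calls/s")
print("within budget:", p95_ms <= BUDGET_MS)
```

Reporting a high percentile alongside the mean matters because occasional slow responses are what users perceive as lag, even when the average is low.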

2.4 Software Requirements


Operating System : Windows 10/11 or macOS.

Platform : Google Colab, PyCharm IDE

Programming Language : Python, SQL

2.5 Hardware Requirements


Processor : Intel Core i3 or above.
Hard Disk : 1 TB or above.
RAM : 4 GB or above.
Internet : 1 Mbps or above (Wireless).

What is STLC?

The process of testing software in a well-planned and systematic way is known as the software testing life cycle (STLC). Different organizations have different phases in the STLC; however, a generic Software Test Life Cycle (STLC) for the waterfall development model consists of the following phases:

1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
5. Test Construction and Verification
6. Test Execution and Bug Reporting
7. Final Testing and Implementation
8. Post Implementation

2.6 Requirements Analysis


In this phase, testers analyse the customer requirements and work with developers during the design phase to see which requirements are testable and how they are going to test those requirements. It is very important to start testing activities from the requirements phase itself, because the cost of fixing a defect is much lower if it is found in the requirements phase rather than in later phases. In this phase, all the planning about testing is done: what needs to be tested, how the testing will be done, the test strategy to be followed, the test environment, the test methodologies to be followed, hardware and software availability, resources, risks, etc. A high-level test plan document is created, which includes all the planning inputs mentioned above, and is circulated to the stakeholders.

2.7 Test Construction and Verification


In this phase, testers prepare more test cases, keeping in mind positive and negative scenarios, end-user scenarios, etc. All the test cases and automation scripts need to be completed in this phase and reviewed by the stakeholders. The test plan document should also be finalized and verified by reviewers.
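Positive and negative test cases of this kind can be written with Python's unittest module. The clean_price helper below is hypothetical, assumed to convert listing prices such as '4,25,000' (the format of the Price column shown in Chapter 5) into integers; it is not code from this project.

```python
import unittest


def clean_price(raw):
    """Hypothetical helper: convert a listing price such as '4,25,000' to an int.

    Returns None for non-numeric placeholders such as 'Ask For Price'.
    """
    digits = raw.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


class CleanPriceTests(unittest.TestCase):
    # Positive scenarios: well-formed prices are converted correctly.
    def test_indian_digit_grouping(self):
        self.assertEqual(clean_price("4,25,000"), 425000)

    def test_plain_number(self):
        self.assertEqual(clean_price("80000"), 80000)

    # Negative scenarios: invalid input is rejected instead of crashing.
    def test_ask_for_price(self):
        self.assertIsNone(clean_price("Ask For Price"))

    def test_empty_string(self):
        self.assertIsNone(clean_price(""))
```

Saved as a module, the suite runs with `python -m unittest <module name>`; rerunning the same suite after every bug fix is the regression testing described in the next phase.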

2.8 Test Execution and Bug Reporting


Once the developers have completed unit testing and the test team receives the test build, the test cases are executed and defects are reported in a bug tracking tool. After test execution is complete and all the defects are reported, test execution reports are created and circulated to project stakeholders. After developers fix the bugs raised by testers, they give another build with the fixes to the testers, who do re-testing and regression testing to ensure that each defect has been fixed and has not affected any other area of the software. Testing is an iterative process, i.e. if a defect is found and fixed, testing needs to be repeated after every defect fix. Once the testers are assured that the defects have been fixed and no more critical defects remain in the software, the build is given for final testing.

2.9 Final Testing and Implementation


In this phase, the final testing of the software is done; non-functional testing such as stress, load and performance testing is performed. The software is also verified in a production-like environment. Final test execution reports and documents are prepared in this phase.

2.10 Post Implementation


In this phase, the test environment is cleaned up and restored to its default state, process review meetings are held and lessons learnt are documented. A document is prepared to cope with similar problems in future releases.

Phase | Activities | Outcome
Planning | Create high-level test plan | Test plan, refined specification
Analysis | Create detailed test plan, functional validation matrix, test cases | Revised test plan, functional validation matrix, test cases
Design | Test cases are revised; select which test cases to automate | Revised test cases, test data sets, risk assessment sheet
Construction | Scripting of test cases to automate | Test procedures/scripts, drivers, test results, bug reports
Testing cycles | Complete testing cycles | Test results, bug reports
Final testing | Execute remaining stress and performance tests, complete documentation | Test results and different metrics on test efforts
Post implementation | Evaluate testing processes | Plan for improvement of testing process

Table 3.7 – Activities and Outcomes of each phase in STLC

2.11 Technologies Used:


2.11.1 Google Colab
Colaboratory, or "Colab" for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter Notebook service that requires no setup to use, while providing free access to computing resources, including GPUs.

Is Google Colab like Jupyter?

Google Colab's major differentiator from Jupyter Notebook is that it is cloud-based and Jupyter is not. This means that if you work in Google Colab, you do not have to worry about downloading and installing anything on your hardware.

Fig 3.8.1 – Google Colab

2.11.2 PyCharm IDE


PyCharm is a dedicated Python Integrated Development Environment (IDE)
providing a wide range of essential tools for Python developers, tightly integrated to
create a convenient environment for productive Python, web, and data science
development.

PyCharm is developed by JetBrains s.r.o. (formerly IntelliJ Software s.r.o.), a Czech software development company which makes tools for software developers and project managers. The company offers integrated development environments (IDEs) for the programming languages Java, Groovy, Kotlin, Ruby, Python, PHP, C, Objective-C, C++, C#, F#, Go, JavaScript, and the domain-specific language SQL.

2.11.3 SQL
SQL (Structured Query Language) is a powerful and standard query language for relational database systems. We use SQL to perform CRUD (Create, Read, Update, Delete) operations on databases, along with various other operations. SQL has evolved a lot in the past decade.

Although SQL is an ANSI/ISO standard, there are different versions of the SQL
language. However, to be compliant with the ANSI standard, they all support at least the
major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a
similar manner.

MySQL, the most popular Open Source SQL database management system, is
developed, distributed, and supported by Oracle Corporation.

MySQL is a database management system.

A database is a structured collection of data. It may be anything from a simple shopping list to a picture gallery or the vast amounts of information in a corporate network. To add, access, and process data stored in a computer database, you need a database management system such as MySQL Server. Since computers are very good at handling large amounts of data, database management systems play a central role in computing, as standalone utilities, or as parts of other applications.

RDBMS

RDBMS stands for Relational Database Management System. RDBMS is the basis for
SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle,
MySQL, and Microsoft Access. The data in RDBMS is stored in database objects called
tables. A table is a collection of related data entries and it consists of columns and rows.
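The CRUD operations and the table structure described above can be tried with Python's built-in sqlite3 module. SQLite is used here only because it needs no server; the project itself uses MySQL, whose Python drivers follow the same DB-API pattern but typically use %s instead of ? as the parameter placeholder. The users table and its columns are hypothetical, loosely modelled on the registration requirement FR1.

```python
import sqlite3

# In-memory SQLite database: a serverless stand-in for the project's MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table is a collection of related entries arranged in columns and rows.
cur.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL, email TEXT)"
)

# Create: insert rows using placeholders rather than string formatting (avoids SQL injection).
cur.execute("INSERT INTO users (username, email) VALUES (?, ?)", ("ravi", "ravi@example.com"))
cur.execute("INSERT INTO users (username, email) VALUES (?, ?)", ("anita", "anita@example.com"))
conn.commit()

# Read: SELECT with a WHERE clause.
cur.execute("SELECT username, email FROM users WHERE username = ?", ("ravi",))
print(cur.fetchone())

# Update: change one column of one row.
cur.execute("UPDATE users SET email = ? WHERE username = ?", ("ravi@new.example.com", "ravi"))
conn.commit()

# Delete: remove a row.
cur.execute("DELETE FROM users WHERE username = ?", ("anita",))
conn.commit()

cur.execute("SELECT username, email FROM users ORDER BY id")
rows = cur.fetchall()
print(rows)
conn.close()
```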

Using SQL in Your Web Site


To build a web site that shows data from a database, you will need:
• An RDBMS database program (e.g. MS Access, SQL Server, MySQL)
• A server-side scripting language, such as PHP or Python
• SQL, to get the data you want
• HTML/CSS, to style the page

2.11.4 Flask
Flask is a micro web framework written in Python. It is classified as a micro
framework because it does not require particular tools or libraries. It has no database
abstraction layer, form validation, or any other components where pre-existing third-party
libraries provide common functions.
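A minimal sketch of how Flask could serve a price prediction over HTTP is shown below. The /predict route, the JSON field names and the predict_price formula are all hypothetical; the actual project reads LinearRegressionModel.pkl (Steps 16 and 17 of the pseudo code in Chapter 5) instead of using a placeholder formula.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(year, kms_driven):
    """Hypothetical placeholder; the project instead unpickles LinearRegressionModel.pkl."""
    return max(0.0, 500_000 - 20_000 * (2023 - year) - 1.5 * kms_driven)


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising when the body is not valid JSON.
    data = request.get_json(silent=True) or {}
    try:
        year = int(data["year"])
        kms = float(data["kms_driven"])
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric fields get a 400 Bad Request instead of a crash.
        return jsonify(error="year and kms_driven must be numbers"), 400
    return jsonify(price=round(predict_price(year, kms), 2))
```

Saved as app.py, it can be started with `flask --app app run` (Flask 2.2 or later) and queried by POSTing JSON such as {"year": 2018, "kms_driven": 40000}. Because Flask bundles no database or form layer, the MySQL access and the HTML form would be added with separate libraries, as the paragraph above notes.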

CHAPTER -3

3. LITERATURE SURVEY

3.1 Paper work

Overfitting and underfitting come into the picture when we create statistical models. A model may fit the training data too closely and then perform poorly on the test data set; this is called overfitting. Likewise, a model may fail to capture the variance present in the population and also perform poorly on the test data set; this is called underfitting. A balance needs to be achieved between these two, which leads to the concept of the bias-variance tradeoff. Pierre Geurts has introduced and explained how the bias-variance tradeoff is achieved in both regression and classification. The selection of variables/attributes plays a vital role in influencing both the bias and the variance of the statistical model. Robert Tibshirani proposed a method called Lasso, which minimizes the residual sum of squares subject to a bound on the sum of the absolute values of the coefficients. Because this constraint shrinks some coefficients to exactly zero, it returns a subset of attributes to be included in multiple regression to get the minimal error rate. Similarly, decision trees suffer from overfitting if they are not pruned/shrunk. Trevor Hastie and Daryl Pregibon have explained the concept of pruning in their research paper. Moreover, hypothesis testing using ANOVA is needed to verify whether the different groups of errors really differ from each other. This is explained by Tae Kyun Kim in his paper. A post-hoc test needs to be performed along with ANOVA if the number of groups exceeds two.
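The variable-selection effect of Lasso can be seen on synthetic data. scikit-learn's Lasso minimizes (1/(2n))·||y - Xw||² + α·||w||₁, the penalized form of Tibshirani's constrained problem. In the sketch below, only the first of five features influences the target (an assumption of the synthetic data), and α = 0.1 is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
X = rng.normal(size=(n_samples, n_features))
# Only the first feature actually drives the target; the other four are pure noise.
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n_samples)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))

# Lasso sets the irrelevant coefficients to exactly zero (variable selection),
# while OLS keeps small non-zero values for every feature.
selected = np.flatnonzero(lasso.coef_)
print("features kept by Lasso:", selected)
```

The price of this selection is a slightly shrunken coefficient on the relevant feature (about 2.9 instead of 3), which is the bias side of the bias-variance tradeoff described above.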

Tukey's test has been explored by Haynes W. in his research paper. Using these techniques, we will create, train and test our statistical models and compare their effectiveness.
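The ANOVA-plus-post-hoc workflow can be sketched with SciPy. The error values below are invented for illustration, not results from this project, and scipy.stats.tukey_hsd requires SciPy 1.8 or later.

```python
import numpy as np
from scipy import stats

# Hypothetical absolute prediction errors (in lakh rupees) for three models.
errors_multiple = np.array([1.00, 1.20, 0.90, 1.10, 1.00, 0.95])
errors_lasso = np.array([1.10, 1.00, 1.20, 0.90, 1.05, 1.00])
errors_tree = np.array([2.00, 2.10, 1.90, 2.20, 2.05, 1.95])

# One-way ANOVA: do the three groups share the same mean error?
f_stat, p_anova = stats.f_oneway(errors_multiple, errors_lasso, errors_tree)
print(f"ANOVA F={f_stat:.2f}, p={p_anova:.2e}")

# With more than two groups, Tukey's HSD post-hoc test shows which pairs differ.
tukey = stats.tukey_hsd(errors_multiple, errors_lasso, errors_tree)
names = ["multiple", "lasso", "tree"]
for i in range(3):
    for j in range(i + 1, 3):
        print(f"{names[i]} vs {names[j]}: p={tukey.pvalue[i, j]:.4f}")
```

With these made-up numbers, ANOVA only says that some group differs; Tukey's test then shows that the tree differs from both regressions while the two regressions do not differ from each other.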

One paper is "Predicting the Price of Used Cars using Machine Learning Techniques". In this paper, the authors investigate the application of supervised machine learning techniques to predict the price of used cars in Mauritius. The predictions are based on historical data collected from daily newspapers. Different techniques such as multiple linear regression analysis, k-nearest neighbors, naïve Bayes and decision trees have been used to make the predictions.

Another paper is "Car Price Prediction Using Machine Learning Techniques". A considerable number of distinct attributes are examined for reliable and accurate prediction. To build a model for predicting the price of used cars in Bosnia and Herzegovina, the authors applied three machine learning techniques (Artificial Neural Network, Support Vector Machine and Random Forest).

Another paper is "Price Evaluation Model in Second-Hand Car System Based on BP Neural Networks". It proposes a price evaluation model based on big data analysis, which takes advantage of widely circulated vehicle data and a large amount of vehicle transaction data to analyze the price data for each type of vehicle using an optimized BP (back-propagation) neural network algorithm. It aims to establish a second-hand car price evaluation model that gives the price best matching the car.

3.2 PROPOSED MODEL

Null Hypothesis
Even though the magnitude of overfitting has been reduced, regression trees still suffer from overfitting even after pruning. This leads to our following hypothesis.
Hypothesis: Multiple and Lasso regressions are better at predicting price than the regression tree.

Training and Testing Data


The data is split into training (70% - 563 records) and testing (30% - 241 records) data
sets through random sampling (seed was set to 2786).
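In Python, a 70/30 random split of this kind could be sketched with scikit-learn's train_test_split. The 804-row data frame below is synthetic. Note that scikit-learn rounds the test share up (test_size=0.3 on 804 rows gives 242 test rows), so the integer 241 is passed to match the counts quoted above; reusing 2786 as random_state is not guaranteed to select the same records as the original sampling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 804-record data set described above.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "age": rng.integers(0, 15, size=804),
    "price": rng.normal(500_000, 150_000, size=804),
})

# An integer test_size fixes the exact number of test rows (241 -> 563 training rows).
train, test = train_test_split(data, test_size=241, random_state=2786)
print(len(train), len(test))  # 563 241
```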

Linear Regression
In statistics, linear regression is a linear approach for modelling the relationship between
a scalar response and one or more explanatory variables (also known as dependent and
independent variables). The case of one explanatory variable is called simple linear
regression; for more than one, the process is called multiple linear regression. This term
is distinct from multivariate linear regression, where multiple correlated dependent
variables are predicted, rather than a single scalar variable.
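For p explanatory variables, multiple linear regression models the response as y = β0 + β1·x1 + ... + βp·xp + ε and estimates the coefficients by ordinary least squares. The sketch below does this with NumPy on synthetic data; the features (age, kms) and the "true" coefficients are assumptions chosen so that the recovered estimates can be checked.

```python
import numpy as np

# Synthetic data: price depends linearly on two explanatory variables.
rng = np.random.default_rng(42)
n = 500
age = rng.uniform(0, 15, size=n)         # car age in years (hypothetical feature)
kms = rng.uniform(0, 150_000, size=n)    # kilometres driven (hypothetical feature)
noise = rng.normal(0, 5_000, size=n)
price = 800_000 - 30_000 * age - 1.2 * kms + noise  # assumed "true" relationship

# Design matrix with a leading column of ones for the intercept beta_0.
X = np.column_stack([np.ones(n), age, kms])

# Ordinary least squares: choose beta to minimise ||price - X @ beta||^2.
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print("intercept, age coef, kms coef:", np.round(beta, 2))

# Predict the price of a hypothetical 5-year-old car with 40,000 km.
x_new = np.array([1.0, 5.0, 40_000.0])
print("predicted price:", round(float(x_new @ beta)))
```

With a single column besides the intercept, the same code performs simple linear regression.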

3.3 Related work


Researchers often predict the prices of products using previous data. Pudaruth, for example, predicted the prices of cars in Mauritius, and these cars were second hand rather than new. He used multiple linear regression, k-nearest neighbors, naïve Bayes and decision tree algorithms to predict the prices. A comparison of the prediction results from these techniques showed that the prices from these methods are closely comparable. However, it was found that the decision tree algorithm and the naïve Bayes method were unable to classify and predict numeric values. Pudaruth's research also concluded that a limited number of instances in the data set does not offer high prediction accuracy. A multivariate regression model helps in classifying and predicting values of numeric format. Kuiper used this model to predict the price of 2005 General Motors (GM) cars. Car price prediction does not require any special knowledge, so data available online, such as that on www.pakwheels.com, is enough to predict prices. Kuiper did the same, i.e. car price prediction, and introduced variable selection techniques which helped in finding which variables are more relevant for inclusion in the model. He encouraged students to use different models and to explore how checking model assumptions works. Similar research by Listiani uses Support Vector Machines (SVM) to predict the prices of leased cars. This research showed that SVM is far more accurate in predicting prices than multiple linear regression when a very large data set is available. SVM also handles high-dimensional data better and avoids both under-fitting and over-fitting issues. Listiani used a genetic algorithm to find important features for SVM. However, the technique does not show, in terms of variance and mean standard deviation, why SVM is better than simple multiple regression.

CHAPTER -4

4. SYSTEM DESIGN

4.1 Introduction to UML

The Unified Modeling Language allows the software engineer to express an analysis model using a modeling notation that is governed by a set of syntactic, semantic and pragmatic rules. A UML system is represented using five different views that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams, as follows:
1. User Model View
This view represents the system from the users' perspective. The analysis representation describes a usage scenario from the end-users' perspective.
2. Structural Model View
In this model, the data and functionality are viewed from inside the system. This model view models the static structures.
3. Behavioral Model View
It represents the dynamic or behavioral parts of the system, depicting the interactions among the various structural elements described in the user model and structural model views.
4. Implementation Model View
In this view, the structural and behavioral parts of the system are represented as they are to be built.
5. Environmental Model View
In this view, the structural and behavioral aspects of the environment in which the system is to be implemented are represented.

4.2 UML Diagrams

4.2.1 Use Case Diagram

To model a system, the most important aspect is to capture the dynamic behavior. Dynamic behavior means the behavior of the system when it is running/operating.

Static behavior alone is not sufficient to model a system; dynamic behavior is more important than static behavior. In UML there are five diagrams available to model dynamic nature, and the use case diagram is one of them. Since the use case diagram is dynamic in nature, there should be some internal or external factors for making the interaction.
These internal and external agents are known as actors. Use case diagrams therefore consist of actors, use cases and their relationships. The diagram is used to model the system/subsystem of an application. A single use case diagram captures a particular functionality of a system, so a number of use case diagrams are used to model the entire system.
Use case diagrams are used to gather the requirements of a system, including internal and external influences. These requirements are mostly design requirements. So when a system is analysed to gather its functionalities, use cases are prepared and actors are identified. In brief, the purposes of use case diagrams are as follows:
a. Used to gather requirements of a system.
b. Used to get an outside view of a system.
c. Identify external and internal factors influencing the system.
d. Show the interactions among the requirements and actors.

Fig 4.2.1 – Use Case Diagram

4.2.2 Sequence Diagram

Sequence diagrams describe interactions among classes in terms of an exchange of messages over time. They're also called event diagrams. A sequence diagram is a good way to visualize and validate various runtime scenarios. These can help to predict how a system will behave and to discover responsibilities a class may need to have in the process of modelling a new system.
The aim of a sequence diagram is to define event sequences, which would have a desired outcome. The focus is more on the order in which messages occur than on the message per se. However, the majority of sequence diagrams will communicate what messages are sent and the order in which they tend to occur.

Basic Sequence Diagram Notations

Class Roles or Participants


Class roles describe the way an object will behave in context. Use the UML object
symbol to illustrate class roles, but don't list object attributes.

Activation or Execution Occurrence


Activation boxes represent the time an object needs to complete a task. When an object
is busy executing a process or waiting for a reply message, use a thin grey rectangle
placed vertically on its lifeline.

Fig 4.2.2 – Sequence Diagram

4.2.3 Class Diagram

Class diagrams are the main building blocks of every object-oriented method. The class diagram can be used to show the classes, relationships, interfaces, associations, and collaborations. UML is standardized in class diagrams. Since classes are the building blocks of an application that is based on OOP, the class diagram has an appropriate structure to represent the classes, inheritance, relationships, and everything that OOP has in its context. It describes various kinds of objects and the static relationships between them.
The main purposes of using class diagrams are:
1. It is the only UML diagram that can appropriately depict various aspects of the OOP concept.
2. Proper design and analysis of the application can be faster and more efficient.
3. It is the base for deployment and component diagrams.
Each class is represented by a rectangle subdivided into three compartments: name, attributes and operations.

Figure 4.2.3 Class Diagram

4.2.4 System Design

A software module is the lowest level of design granularity in the system. Depending on the software development approach, there may be one or more modules per system. This section should provide enough detailed information about logic and data necessary to completely write source code for all modules in the system (and/or integrate COTS software programs).

If there are many modules or if the module documentation is extensive, place it in an appendix or reference a separate document. Add additional diagrams and information, if necessary, to describe each module, its functionality, and its hierarchy. Industry-standard module specification practices should be followed. Include the following information in the detailed module designs:

• A narrative description of each module, its function(s), the conditions under which it is used (called or scheduled for execution), its overall processing, logic, interfaces to other modules, interfaces to external systems, security requirements, etc.; explain any algorithms used by the module in detail
• For COTS packages, specify any call routines or bridging programs to integrate the package with the system and/or other COTS packages (for example, Dynamic Link Libraries)
• Data elements, record structures, and file structures associated with module input and output
• Graphical representation of the module processing, logic, flow of control, and algorithms, using an accepted diagramming approach (for example, structure charts, action diagrams, flowcharts, etc.)
• Data entry and data output graphics; define or reference associated data elements; if the project is large and complex or if the detailed module designs will be incorporated into a separate document, then it may be appropriate to repeat the screen information in this section
• Report layout

Figure 4.2.4 System Design

4.2.5 State Chart Diagram

The name of the diagram itself clarifies its purpose. It describes the different states of a component in a system. The states are specific to a component/object of a system.

A Statechart diagram describes a state machine. A state machine can be defined as a machine which defines the different states of an object, where these states are controlled by external or internal events.

The activity diagram is a special kind of Statechart diagram. As a Statechart diagram defines the states, it is used to model the lifetime of an object.

4.2.5.1 How to Draw a Statechart Diagram?

A Statechart diagram is used to describe the states of different objects in their life cycle. Emphasis is placed on the state changes upon some internal or external events. These states of objects are important to analyze and implement accurately.

Statechart diagrams are very important for describing the states. States can be identified as the condition of objects when a particular event occurs.

Before drawing a Statechart diagram, we should clarify the following points:

• Identify the important objects to be analyzed.
• Identify the states.
• Identify the events.

Following is an example of a Statechart diagram where the state of the Order object is analyzed.

The first state is an idle state from where the process starts. The next states are reached through events such as send request, confirm request, and dispatch order. These events are responsible for the state changes of the order object.

During its life cycle, an object (here the order object) goes through the following states, and there may be some abnormal exits. An abnormal exit may occur due to some problem in the system. When the entire life cycle is complete, it is considered a complete transaction, as shown in the following figure. The initial and final states of the object are also shown in the figure.

Figure 4.2.5 State Chart Diagram
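The Order example can also be expressed as a small state machine in code. This is a hypothetical sketch: the state names are derived from the events described above, and treating any unexpected event as the abnormal exit is an assumption made for illustration.

```python
from enum import Enum, auto


class OrderState(Enum):
    IDLE = auto()           # initial state
    REQUEST_SENT = auto()
    CONFIRMED = auto()
    DISPATCHED = auto()     # final state: complete transaction
    ABORTED = auto()        # abnormal exit (assumed handling)


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (OrderState.IDLE, "send request"): OrderState.REQUEST_SENT,
    (OrderState.REQUEST_SENT, "confirm request"): OrderState.CONFIRMED,
    (OrderState.CONFIRMED, "dispatch order"): OrderState.DISPATCHED,
}


class Order:
    def __init__(self):
        self.state = OrderState.IDLE

    def handle(self, event):
        """Apply an event; an event not allowed in the current state aborts the order."""
        if self.state in (OrderState.DISPATCHED, OrderState.ABORTED):
            raise RuntimeError(f"order already finished in state {self.state.name}")
        self.state = TRANSITIONS.get((self.state, event), OrderState.ABORTED)
        return self.state


order = Order()
for event in ["send request", "confirm request", "dispatch order"]:
    print(event, "->", order.handle(event).name)
```

Keeping the transitions in a table mirrors the diagram directly: each arrow in the Statechart becomes one dictionary entry.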

CHAPTER -5

5. IMPLEMENTATION

5.1 Pseudo Code

Step 1: Import the required packages.

Step 2: Download the dataset and link it to Google Colab.

Step 3: Read the dataset and perform operations on data.

Step 4: Data cleaning.

Step 5: Data Preprocessing.

Step 6: Saving the cleaned car data set after performing operations on data.

Step 7: Start training the Machine learning Model.

Step 8: Split features and target as x and y respectively.

Step 9: Split the new data into 80% of Training data and 20% of Testing data.

Step 10: Train the model with the training data and evaluate it with the testing data.

Step 11: Apply a one-hot encoder and column transformer to the categorical features.

Step 12: Applying Linear Regression to the model.

Step 13: Fit the Linear Regression Model.

Step 14: If the accuracy (R² score) is good, use the model for prediction; otherwise, fit the model again using other random states.

Step 15: Dump the Linear Regression model into a file using pickle.

Step 16: Open PyCharm and add the cleaned car.csv and LinearRegressionModel.pkl files to our project.

Step 17: Read the model and dataset, and make the prediction from the webpage using Python and Flask.
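Steps 8 to 15 can be sketched with scikit-learn as below. The data frame is a small synthetic stand-in (the car models, base prices and depreciation formula are invented) rather than the quikr_car data; only the column names follow Section 5.2. R² on the test split is used as the "accuracy" of Step 14, and pickle.dumps stands in for writing LinearRegressionModel.pkl to disk.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the cleaned car data (Step 6); all values are invented.
rng = np.random.default_rng(0)
n = 300
models = {"Swift": ("Maruti", 600_000), "i20": ("Hyundai", 700_000), "Innova": ("Toyota", 1_500_000)}
name = rng.choice(list(models), size=n)
year = rng.integers(2008, 2020, size=n)
kms = rng.integers(5_000, 150_000, size=n)
fuel = rng.choice(["Petrol", "Diesel"], size=n)
base = np.array([models[m][1] for m in name])
price = (base - 40_000 * (2020 - year) - 1.0 * kms
         + np.where(fuel == "Diesel", 50_000, 0) + rng.normal(0, 20_000, size=n))
cars = pd.DataFrame({"name": name, "company": [models[m][0] for m in name],
                     "year": year, "kms_driven": kms, "fuel_type": fuel, "Price": price})

X = cars.drop(columns="Price")   # Step 8: features
y = cars["Price"]                # Step 8: target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # Step 9

# Step 11: one-hot encode the text columns and pass the numeric columns through unchanged.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["name", "company", "fuel_type"]),
    remainder="passthrough",
)
pipe = make_pipeline(column_trans, LinearRegression())  # Step 12
pipe.fit(X_train, y_train)                              # Steps 10 and 13: fit on training data only
score = r2_score(y_test, pipe.predict(X_test))          # Step 14: evaluate on unseen test data
print(f"R^2 on test data: {score:.3f}")

# Step 15: serialise the whole pipeline (encoder + model) so the web app can reuse it.
blob = pickle.dumps(pipe)
restored = pickle.loads(blob)
new_car = pd.DataFrame([{"name": "Swift", "company": "Maruti", "year": 2019,
                         "kms_driven": 20_000, "fuel_type": "Petrol"}])
print("predicted price:", round(float(restored.predict(new_car)[0])))
```

Pickling the pipeline rather than the bare LinearRegression object keeps the encoder and the model together, so the Flask side can pass raw attribute values straight to predict().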

5.2 Google Colab Dataset Implementation:

import pandas as pd
car = pd.read_csv("https://raw.githubusercontent.com/rajtilakls2510/car_price_predictor/master/quikr_car.csv")

car.shape
(892, 6)

car.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 892 entries, 0 to 891
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   name        892 non-null    object
 1   company     892 non-null    object
 2   year        892 non-null    object
 3   Price       892 non-null    object
 4   kms_driven  840 non-null    object
 5   fuel_type   837 non-null    object
dtypes: object(6)
memory usage: 41.9+ KB
car['year'].unique()
array(['2007', '2006', '2018', '2014', '2015', '2012', '2013', '2016',
'2010', '2017', '2008', '2011', '2019', '2009', '2005', '2000',
'...', '150k', 'TOUR', '2003', 'r 15', '2004', 'Zest', '/-Rs',
'sale', '1995', 'ara)', '2002', 'SELL', '2001', 'tion', 'odel',
'2 bs', 'arry', 'Eon', 'o...', 'ture', 'emi', 'car', 'able', 'no.',
'd...', 'SALE', 'digo', 'sell', 'd Ex', 'n...', 'e...', 'D...',
', Ac', 'go .', 'k...', 'o c4', 'zire', 'cent', 'Sumo', 'cab',
't xe', 'EV2', 'r...', 'zest'], dtype=object)
car['Price'].unique()
array(['80,000', '4,25,000', 'Ask For Price', '3,25,000', '5,75,000',
'1,75,000', '1,90,000', '8,30,000', '2,50,000', '1,82,000',
'3,15,000', '4,15,000', '3,20,000', '10,00,000', '5,00,000',
'3,50,000', '1,60,000', '3,10,000', '75,000', '1,00,000',
'2,90,000', '95,000', '1,80,000', '3,85,000', '1,05,000',
'6,50,000', '6,89,999', '4,48,000', '5,49,000', '5,01,000',
'4,89,999', '2,80,000', '3,49,999', '2,84,999', '3,45,000',
'4,99,999', '2,35,000', '2,49,999', '14,75,000', '3,95,000',
'2,20,000', '1,70,000', '85,000', '2,00,000', '5,70,000',
'1,10,000', '4,48,999', '18,91,111', '1,59,500', '3,44,999',
'4,49,999', '8,65,000', '6,99,000', '3,75,000', '2,24,999',
'12,00,000', '1,95,000', '3,51,000', '2,40,000', '90,000',
'1,55,000', '6,00,000', '1,89,500', '2,10,000', '3,90,000',
'1,35,000', '16,00,000', '7,01,000', '2,65,000', '5,25,000',
'3,72,000', '6,35,000', '5,50,000', '4,85,000', '3,29,500',
'2,51,111', '5,69,999', '69,999', '2,99,999', '3,99,999',
'4,50,000', '2,70,000', '1,58,400', '1,79,000', '1,25,000',
'2,99,000', '1,50,000', '2,75,000', '2,85,000', '3,40,000',
'70,000', '2,89,999', '8,49,999', '7,49,999', '2,74,999',
'9,84,999', '5,99,999', '2,44,999', '4,74,999', '2,45,000',
'1,69,500', '3,70,000', '1,68,000', '1,45,000', '98,500',
'2,09,000', '1,85,000', '9,00,000', '6,99,999', '1,99,999',
'5,44,999', '1,99,000', '5,40,000', '49,000', '7,00,000', '55,000',
'8,95,000', '3,55,000', '5,65,000', '3,65,000', '40,000',
'4,00,000', '3,30,000', '5,80,000', '3,79,000', '2,19,000',
'5,19,000', '7,30,000', '20,00,000', '21,00,000', '14,00,000',
'3,11,000', '8,55,000', '5,35,000', '1,78,000', '3,00,000',
'2,55,000', '5,49,999', '3,80,000', '57,000', '4,10,000',
'2,25,000', '1,20,000', '59,000', '5,99,000', '6,75,000', '72,500',
'6,10,000', '2,30,000', '5,20,000', '5,24,999', '4,24,999',
'6,44,999', '5,84,999', '7,99,999', '4,44,999', '6,49,999',
'9,44,999', '5,74,999', '3,74,999', '1,30,000', '4,01,000',
'13,50,000', '1,74,999', '2,39,999', '99,999', '3,24,999',
'10,74,999', '11,30,000', '1,49,000', '7,70,000', '30,000',
'3,35,000', '3,99,000', '65,000', '1,69,999', '1,65,000',
'5,60,000', '9,50,000', '7,15,000', '45,000', '9,40,000',
'1,55,555', '15,00,000', '4,95,000', '8,00,000', '12,99,000',
'5,30,000', '14,99,000', '32,000', '4,05,000', '7,60,000',
'7,50,000', '4,19,000', '1,40,000', '15,40,000', '1,23,000',
'4,98,000', '4,80,000', '4,88,000', '15,25,000', '5,48,900',
'7,25,000', '99,000', '52,000', '28,00,000', '4,99,000',
'3,81,000', '2,78,000', '6,90,000', '2,60,000', '90,001',
'1,15,000', '15,99,000', '1,59,000', '51,999', '2,15,000',
'35,000', '11,50,000', '2,69,000', '60,000', '4,30,000',
'85,00,003', '4,01,919', '4,90,000', '4,24,000', '2,05,000',
'5,49,900', '3,71,500', '4,35,000', '1,89,700', '3,89,700',
'3,60,000', '2,95,000', '1,14,990', '10,65,000', '4,70,000',
'48,000', '1,88,000', '4,65,000', '1,79,999', '21,90,000',
'23,90,000', '10,75,000', '4,75,000', '10,25,000', '6,15,000',
'19,00,000', '14,90,000', '15,10,000', '18,50,000', '7,90,000',
'17,25,000', '12,25,000', '68,000', '9,70,000', '31,00,000',
'8,99,000', '88,000', '53,000', '5,68,500', '71,000', '5,90,000',
'7,95,000', '42,000', '1,89,000', '1,62,000', '35,999',
'29,00,000', '39,999', '50,500', '5,10,000', '8,60,000',
'5,00,001'], dtype=object)
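The year and Price listings above show why Step 4 (data cleaning) is needed: year contains fragments such as '...' and 'TOUR', and Price contains the placeholder 'Ask For Price' as well as comma-grouped strings. A minimal pandas sketch of those two fixes, run on a handful of values copied from the output (not the project's actual cleaning code), might look like this:

```python
import pandas as pd

# A few raw values copied from the year and Price listings above.
raw = pd.DataFrame({
    "year": ["2007", "2018", "...", "TOUR", "2014"],
    "Price": ["80,000", "4,25,000", "3,25,000", "Ask For Price", "10,00,000"],
})

car = raw.copy()
# Keep only rows whose year is purely numeric, then convert it to int.
car = car[car["year"].str.isnumeric()].copy()
car["year"] = car["year"].astype(int)
# Drop listings without a price, strip the Indian-style digit-group commas and convert.
car = car[car["Price"] != "Ask For Price"].copy()
car["Price"] = car["Price"].str.replace(",", "", regex=False).astype(int)
car = car.reset_index(drop=True)
print(car)
```

The same pattern (filter invalid rows, strip formatting, convert the type) applies to kms_driven, whose values carry a ' kms' suffix.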

car['kms_driven'].unique()
array(['45,000 kms', '40 kms', '22,000 kms', '28,000 kms', '36,000 kms',
'59,000 kms', '41,000 kms', '25,000 kms', '24,530 kms',
'60,000 kms', '30,000 kms', '32,000 kms', '48,660 kms',
'4,000 kms', '16,934 kms', '43,000 kms', '35,550 kms',
'39,522 kms', '39,000 kms', '55,000 kms', '72,000 kms',
'15,975 kms', '70,000 kms', '23,452 kms', '35,522 kms',
'48,508 kms', '15,487 kms', '82,000 kms', '20,000 kms',
'68,000 kms', '38,000 kms', '27,000 kms', '33,000 kms',
'46,000 kms', '16,000 kms', '47,000 kms', '35,000 kms',
'30,874 kms', '15,000 kms', '29,685 kms', '1,30,000 kms',
'19,000 kms', nan, '54,000 kms', '13,000 kms', '38,200 kms',
'50,000 kms', '13,500 kms', '3,600 kms', '45,863 kms',
'60,500 kms', '12,500 kms', '18,000 kms', '13,349 kms',
'29,000 kms', '44,000 kms', '42,000 kms', '14,000 kms',
'49,000 kms', '36,200 kms', '51,000 kms', '1,04,000 kms',
'33,333 kms', '33,600 kms', '5,600 kms', '7,500 kms', '26,000 kms',
'24,330 kms', '65,480 kms', '28,028 kms', '2,00,000 kms',
'99,000 kms', '2,800 kms', '21,000 kms', '11,000 kms',
'66,000 kms', '3,000 kms', '7,000 kms', '38,500 kms', '37,200 kms',
'43,200 kms', '24,800 kms', '45,872 kms', '40,000 kms',
'11,400 kms', '97,200 kms', '52,000 kms', '31,000 kms',
'1,75,430 kms', '37,000 kms', '65,000 kms', '3,350 kms',
'75,000 kms', '62,000 kms', '73,000 kms', '2,200 kms',
'54,870 kms', '34,580 kms', '97,000 kms', '60 kms', '80,200 kms',
'3,200 kms', '0,000 kms', '5,000 kms', '588 kms', '71,200 kms',
'1,75,400 kms', '9,300 kms', '56,758 kms', '10,000 kms',
'56,450 kms', '56,000 kms', '32,700 kms', '9,000 kms', '73 kms',
'1,60,000 kms', '84,000 kms', '58,559 kms', '57,000 kms',
'1,70,000 kms', '80,000 kms', '6,821 kms', '23,000 kms',
'34,000 kms', '1,800 kms', '4,00,000 kms', '48,000 kms',
'90,000 kms', '12,000 kms', '69,900 kms', '1,66,000 kms',
'122 kms', '0 kms', '24,000 kms', '36,469 kms', '7,800 kms',
'24,695 kms', '15,141 kms', '59,910 kms', '1,00,000 kms',
'4,500 kms', '1,29,000 kms', '300 kms', '1,31,000 kms',
'1,11,111 kms', '59,466 kms', '25,500 kms', '44,005 kms',
'2,110 kms', '43,222 kms', '1,00,200 kms', '65 kms',
'1,40,000 kms', '1,03,553 kms', '58,000 kms', '1,20,000 kms',
'49,800 kms', '100 kms', '81,876 kms', '6,020 kms', '55,700 kms',
'18,500 kms', '1,80,000 kms', '53,000 kms', '35,500 kms',
'22,134 kms', '1,000 kms', '8,500 kms', '87,000 kms', '6,000 kms',
'15,574 kms', '8,000 kms', '55,800 kms', '56,400 kms',
'72,160 kms', '11,500 kms', '1,33,000 kms', '2,000 kms',
'88,000 kms', '65,422 kms', '1,17,000 kms', '1,50,000 kms',
'10,750 kms', '6,800 kms', '5 kms', '9,800 kms', '57,923 kms',
'30,201 kms', '6,200 kms', '37,518 kms', '24,652 kms', '383 kms',
'95,000 kms', '3,528 kms', '52,500 kms', '47,900 kms',
'52,800 kms', '1,95,000 kms', '48,008 kms', '48,247 kms',
'9,400 kms', '64,000 kms', '2,137 kms', '10,544 kms', '49,500 kms',
'1,47,000 kms', '90,001 kms', '48,006 kms', '74,000 kms',
'85,000 kms', '29,500 kms', '39,700 kms', '67,000 kms',
'19,336 kms', '60,105 kms', '45,933 kms', '1,02,563 kms',
'28,600 kms', '41,800 kms', '1,16,000 kms', '42,590 kms',
'7,400 kms', '54,500 kms', '76,000 kms', '00 kms', '11,523 kms',
'38,600 kms', '95,500 kms', '37,458 kms', '85,960 kms',
'12,516 kms', '30,600 kms', '2,550 kms', '62,500 kms',
'69,000 kms', '28,400 kms', '68,485 kms', '3,500 kms',
'85,455 kms', '63,000 kms', '1,600 kms', '77,000 kms',
'26,500 kms', '2,875 kms', '13,900 kms', '1,500 kms', '2,450 kms',
'1,625 kms', '33,400 kms', '60,123 kms', '38,900 kms',
'1,37,495 kms', '91,200 kms', '1,46,000 kms', '1,00,800 kms',
'2,100 kms', '2,500 kms', '1,32,000 kms', 'Petrol'], dtype=object)

car['fuel_type'].unique()
array(['Petrol', 'Diesel', nan, 'LPG'], dtype=object)

backup=car.copy()

car=car[car['year'].str.isnumeric()]

car['year']=car['year'].astype(int)

car.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 842 entries, 0 to 891
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype
 0   name        842 non-null    object
 1   company     842 non-null    object
 2   year        842 non-null    int32
 3   Price       842 non-null    object
 4   kms_driven  840 non-null    object
 5   fuel_type   837 non-null    object
dtypes: int32(1), object(5)
memory usage: 42.8+ KB

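# Drop listings whose price is the text "Ask For Price"
car=car[car['Price'] != "Ask For Price"]

# Remove the digit-grouping commas (e.g. '1,74,999') and convert to int
car['Price']=car['Price'].str.replace(',','').astype(int)

# Keep only the number in values such as '45,000 kms', drop non-numeric rows, convert to int
car['kms_driven']=car['kms_driven'].str.split(' ').str.get(0).str.replace(',','')
car=car[car['kms_driven'].str.isnumeric()]
car['kms_driven']=car['kms_driven'].astype(int)

# Drop rows with a missing fuel type
car=car[~car['fuel_type'].isna()]

# Keep the first three words of each name, e.g. 'Hyundai Santro Xing'
car['name']=car['name'].str.split(' ').str.slice(0,3).str.join(' ')

car=car.reset_index(drop=True)

# Remove outliers priced at 60 lakh (6e6) or more
car=car[car['Price']<6e6].reset_index(drop=True)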

car.to_csv('cleaned car.csv')
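
The cleaning cells above rely on pandas string methods. The same parsing rules can be sketched in plain Python; `parse_price` and `parse_kms` are hypothetical helpers written only for illustration, not part of the project. A value in Indian digit grouping such as '1,30,000 kms' becomes an integer once the unit and the commas are removed, while text such as 'Ask For Price', or the stray 'Petrol' that appears in the kms_driven column, is rejected. The sketch uses `str.isdecimal` instead of pandas' `str.isnumeric` so that `int()` cannot fail on characters such as '½'.

```python
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[int]:
    """'1,74,999' -> 174999; None for missing values and text such as 'Ask For Price'."""
    if text is None:
        return None
    digits = text.replace(",", "")
    return int(digits) if digits.isdecimal() else None


def parse_kms(text: Optional[str]) -> Optional[int]:
    """'1,30,000 kms' -> 130000, keeping only the first space-separated token
    (like .str.split(' ').str.get(0)); None if that token is not a number."""
    if text is None:
        return None
    token = text.split(" ")[0].replace(",", "")
    return int(token) if token.isdecimal() else None


print([parse_price(p) for p in ["1,74,999", "Ask For Price", "85,00,003", None]])
# [174999, None, 8500003, None]
print([parse_kms(k) for k in ["45,000 kms", "1,30,000 kms", "Petrol", None]])
# [45000, 130000, None, None]
```

A `None` result corresponds to a row that the notebook drops.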
#Splitting the features and target
x=car.drop(columns='Price')
y=car['Price']

    name                    company   year  kms_driven  fuel_type
0   Hyundai Santro Xing     Hyundai   2007       45000  Petrol
1   Mahindra Jeep CL550     Mahindra  2006          40  Diesel
2   Hyundai Grand i10       Hyundai   2014       28000  Petrol
3   Ford EcoSport Titanium  Ford      2014       36000  Diesel
4   Ford Figo               Ford      2012       41000  Diesel
... ...                     ...       ...          ...  ...
811 Maruti Suzuki Ritz      Maruti    2011       50000  Petrol
812 Tata Indica V2          Tata      2009       30000  Diesel
813 Toyota Corolla Altis    Toyota    2009      132000  Petrol
814 Tata Zest XM            Tata      2018       27000  Diesel
815 Mahindra Quanto C8      Mahindra  2013       40000  Diesel
816 rows × 5 columns

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as a test set
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

# Learn the categories of the three text columns from the whole data set
ohe=OneHotEncoder()
ohe.fit(x[['name','company','fuel_type']])
OneHotEncoder()
ohe.categories_
[array(['Audi A3 Cabriolet', 'Audi A4 1.8', 'Audi A4 2.0', 'Audi A6 2.0',
'Audi A8', 'Audi Q3 2.0', 'Audi Q5 2.0', 'Audi Q7', 'BMW 3 Series',
'BMW 5 Series', 'BMW 7 Series', 'BMW X1', 'BMW X1 sDrive20d',
'BMW X1 xDrive20d', 'Chevrolet Beat', 'Chevrolet Beat Diesel',
'Chevrolet Beat LS', 'Chevrolet Beat LT', 'Chevrolet Beat PS',
'Chevrolet Cruze LTZ', 'Chevrolet Enjoy', 'Chevrolet Enjoy 1.4',
'Chevrolet Sail 1.2', 'Chevrolet Sail UVA', 'Chevrolet Spark',
'Chevrolet Spark 1.0', 'Chevrolet Spark LS', 'Chevrolet Spark LT',
'Chevrolet Tavera LS', 'Chevrolet Tavera Neo', 'Datsun GO T',
'Datsun Go Plus', 'Datsun Redi GO', 'Fiat Linea Emotion',
'Fiat Petra ELX', 'Fiat Punto Emotion', 'Force Motors Force',
'Force Motors One', 'Ford EcoSport', 'Ford EcoSport Ambiente',
'Ford EcoSport Titanium', 'Ford EcoSport Trend',
'Ford Endeavor 4x4', 'Ford Fiesta', 'Ford Fiesta SXi', 'Ford Figo',
'Ford Figo Diesel', 'Ford Figo Duratorq', 'Ford Figo Petrol',
'Ford Fusion 1.4', 'Ford Ikon 1.3', 'Ford Ikon 1.6',
'Hindustan Motors Ambassador', 'Honda Accord', 'Honda Amaze',
'Honda Amaze 1.2', 'Honda Amaze 1.5', 'Honda Brio', 'Honda Brio V',
'Honda Brio VX', 'Honda City', 'Honda City 1.5', 'Honda City SV',
'Honda City VX', 'Honda City ZX', 'Honda Jazz S', 'Honda Jazz VX',
'Honda Mobilio', 'Honda Mobilio S', 'Honda WR V', 'Hyundai Accent',
'Hyundai Accent Executive', 'Hyundai Accent GLE',
'Hyundai Accent GLX', 'Hyundai Creta', 'Hyundai Creta 1.6',
'Hyundai Elantra 1.8', 'Hyundai Elantra SX', 'Hyundai Elite i20',
'Hyundai Eon', 'Hyundai Eon D', 'Hyundai Eon Era',
'Hyundai Eon Magna', 'Hyundai Eon Sportz', 'Hyundai Fluidic Verna',
'Hyundai Getz', 'Hyundai Getz GLE', 'Hyundai Getz Prime',
'Hyundai Grand i10', 'Hyundai Santro', 'Hyundai Santro AE',
'Hyundai Santro Xing', 'Hyundai Sonata Transform', 'Hyundai Verna',
'Hyundai Verna 1.4', 'Hyundai Verna 1.6', 'Hyundai Verna Fluidic',
'Hyundai Verna Transform', 'Hyundai Verna VGT',
'Hyundai Xcent Base', 'Hyundai Xcent SX', 'Hyundai i10',
'Hyundai i10 Era', 'Hyundai i10 Magna', 'Hyundai i10 Sportz',
'Hyundai i20', 'Hyundai i20 Active', 'Hyundai i20 Asta',
'Hyundai i20 Magna', 'Hyundai i20 Select', 'Hyundai i20 Sportz',
'Jaguar XE XE', 'Jaguar XF 2.2', 'Jeep Wrangler Unlimited',
'Land Rover Freelander', 'Mahindra Bolero DI',
'Mahindra Bolero Power', 'Mahindra Bolero SLE',
'Mahindra Jeep CL550', 'Mahindra Jeep MM', 'Mahindra KUV100',
'Mahindra KUV100 K8', 'Mahindra Logan', 'Mahindra Logan Diesel',
'Mahindra Quanto C4', 'Mahindra Quanto C8', 'Mahindra Scorpio',
'Mahindra Scorpio 2.6', 'Mahindra Scorpio LX',
'Mahindra Scorpio S10', 'Mahindra Scorpio S4',
'Mahindra Scorpio SLE', 'Mahindra Scorpio SLX',
'Mahindra Scorpio VLX', 'Mahindra Scorpio Vlx',
'Mahindra Scorpio W', 'Mahindra TUV300 T4', 'Mahindra TUV300 T8',
'Mahindra Thar CRDe', 'Mahindra XUV500', 'Mahindra XUV500 W10',
'Mahindra XUV500 W6', 'Mahindra XUV500 W8', 'Mahindra Xylo D2',
'Mahindra Xylo E4', 'Mahindra Xylo E8', 'Maruti Suzuki 800',
'Maruti Suzuki A', 'Maruti Suzuki Alto', 'Maruti Suzuki Baleno',
'Maruti Suzuki Celerio', 'Maruti Suzuki Ciaz',
'Maruti Suzuki Dzire', 'Maruti Suzuki Eeco',
'Maruti Suzuki Ertiga', 'Maruti Suzuki Esteem',
'Maruti Suzuki Estilo', 'Maruti Suzuki Maruti',
'Maruti Suzuki Omni', 'Maruti Suzuki Ritz', 'Maruti Suzuki S',
'Maruti Suzuki SX4', 'Maruti Suzuki Stingray',
'Maruti Suzuki Swift', 'Maruti Suzuki Versa',
'Maruti Suzuki Vitara', 'Maruti Suzuki Wagon', 'Maruti Suzuki Zen',
'Mercedes Benz A', 'Mercedes Benz B', 'Mercedes Benz C',
'Mercedes Benz GLA', 'Mini Cooper S', 'Mitsubishi Lancer 1.8',
'Mitsubishi Pajero Sport', 'Nissan Micra XL', 'Nissan Micra XV',
'Nissan Sunny', 'Nissan Sunny XL', 'Nissan Terrano XL',
'Nissan X Trail', 'Renault Duster', 'Renault Duster 110',
'Renault Duster 110PS', 'Renault Duster 85', 'Renault Duster 85PS',
'Renault Duster RxL', 'Renault Kwid', 'Renault Kwid 1.0',
'Renault Kwid RXT', 'Renault Lodgy 85', 'Renault Scala RxL',
'Skoda Fabia', 'Skoda Fabia 1.2L', 'Skoda Fabia Classic',
'Skoda Laura', 'Skoda Octavia Classic', 'Skoda Rapid Elegance',
'Skoda Superb 1.8', 'Skoda Yeti Ambition', 'Tata Aria Pleasure',
'Tata Bolt XM', 'Tata Indica', 'Tata Indica V2', 'Tata Indica eV2',
'Tata Indigo CS', 'Tata Indigo LS', 'Tata Indigo LX',
'Tata Indigo Marina', 'Tata Indigo eCS', 'Tata Manza',
'Tata Manza Aqua', 'Tata Manza Aura', 'Tata Manza ELAN',
'Tata Nano', 'Tata Nano Cx', 'Tata Nano GenX', 'Tata Nano LX',
'Tata Nano Lx', 'Tata Sumo Gold', 'Tata Sumo Grande',
'Tata Sumo Victa', 'Tata Tiago Revotorq', 'Tata Tiago Revotron',
'Tata Tigor Revotron', 'Tata Venture EX', 'Tata Vista Quadrajet',
'Tata Zest Quadrajet', 'Tata Zest XE', 'Tata Zest XM',
'Toyota Corolla', 'Toyota Corolla Altis', 'Toyota Corolla H2',
'Toyota Etios', 'Toyota Etios G', 'Toyota Etios GD',
'Toyota Etios Liva', 'Toyota Fortuner', 'Toyota Fortuner 3.0',
'Toyota Innova 2.0', 'Toyota Innova 2.5', 'Toyota Qualis',
'Volkswagen Jetta Comfortline', 'Volkswagen Jetta Highline',
'Volkswagen Passat Diesel', 'Volkswagen Polo',
'Volkswagen Polo Comfortline', 'Volkswagen Polo Highline',
'Volkswagen Polo Highline1.2L', 'Volkswagen Polo Trendline',
'Volkswagen Vento Comfortline', 'Volkswagen Vento Highline',
'Volkswagen Vento Konekt', 'Volvo S80 Summum'], dtype=object),
array(['Audi', 'BMW', 'Chevrolet', 'Datsun', 'Fiat', 'Force', 'Ford',
'Hindustan', 'Honda', 'Hyundai', 'Jaguar', 'Jeep', 'Land',
'Mahindra', 'Maruti', 'Mercedes', 'Mini', 'Mitsubishi', 'Nissan',
'Renault', 'Skoda', 'Tata', 'Toyota', 'Volkswagen', 'Volvo'],
dtype=object),
array(['Diesel', 'LPG', 'Petrol'], dtype=object)]
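
Passing `categories=ohe.categories_` to the pipeline's encoder fixes the list of output columns in advance, which is why the encoder is first fitted on all of `x`: a model name that happens to occur only in the test split still has a column. A minimal plain-Python sketch of that idea follows; `one_hot` is a hypothetical function for illustration only (scikit-learn's `OneHotEncoder` returns a sparse matrix by default and, with its default `handle_unknown='error'`, also raises on unseen values).

```python
from typing import Dict, List, Sequence


def one_hot(row: Dict[str, str],
            categories: Dict[str, Sequence[str]],
            columns: Sequence[str]) -> List[int]:
    """Encode the text columns of one row as a flat 0/1 vector.

    `categories` plays the role of ohe.categories_: for each column it lists
    the allowed values in a fixed order, so every row gives a vector of the
    same length. An unseen value raises ValueError.
    """
    vector: List[int] = []
    for column in columns:
        allowed = list(categories[column])
        value = row[column]
        if value not in allowed:
            raise ValueError(f"unknown category {value!r} in column {column!r}")
        vector.extend(1 if value == option else 0 for option in allowed)
    return vector


# A small subset of the categories printed above
categories = {
    "company": ["Hyundai", "Maruti", "Tata"],
    "fuel_type": ["Diesel", "LPG", "Petrol"],
}
row = {"company": "Maruti", "fuel_type": "Petrol"}
print(one_hot(row, categories, ["company", "fuel_type"]))  # [0, 1, 0, 0, 0, 1]
```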
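# One-hot encode the text columns with the categories collected above;
# 'year' and 'kms_driven' are passed through unchanged
column_trans=make_column_transformer((OneHotEncoder(categories=ohe.categories_),
                                      ['name','company','fuel_type']),
                                     remainder='passthrough')
lr=LinearRegression()
# The pipeline encodes the raw DataFrame columns and then fits the regression
pipe=make_pipeline(column_trans,lr)
pipe.fit(x_train,y_train)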
Pipeline(steps=[('columntransformer',
ColumnTransformer(remainder='passthrough',
transformers=[('onehotencoder',
OneHotEncoder(categories=[array(['Audi A3 Cabriolet', 'Audi A4 1.8', 'Audi A4 2.0',
'Audi A6 2.0',
'Audi A8', 'Audi Q3 2.0', 'Audi Q5 2.0', 'Audi Q7', 'BMW 3 Series',
'BMW 5 Series', 'BMW 7 Series', 'BMW X1', 'BMW X1 sDrive20d',
'BMW X1 xDrive20d', 'Chevrolet Beat', 'Chevrolet Beat...
array(['Audi', 'BMW', 'Chevrolet', 'Datsun', 'Fiat', 'Force', 'Ford',
'Hindustan', 'Honda', 'Hyundai', 'Jaguar', 'Jeep', 'Land',
'Mahindra', 'Maruti', 'Mercedes', 'Mini', 'Mitsubishi', 'Nissan',
'Renault', 'Skoda', 'Tata', 'Toyota', 'Volkswagen', 'Volvo'],
dtype=object),
array(['Diesel', 'LPG', 'Petrol'], dtype=object)]),
['name', 'company','fuel_type'])])),
('linearregression', LinearRegression())])
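
`LinearRegression` fits ordinary least squares: it picks an intercept and one coefficient per input column (every one-hot column plus `year` and `kms_driven`) that minimise the sum of squared errors. As a dependency-free sketch of that objective, the normal equations (XᵀX)β = Xᵀy can be solved directly on a tiny dense problem. This is an illustration only; scikit-learn delegates to a SciPy least-squares routine instead of forming XᵀX, and `fit_ols` and `predict` are hypothetical names.

```python
from typing import List, Sequence


def fit_ols(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Return [intercept, b1, b2, ...] that minimise the sum of squared errors."""
    # Design matrix with a leading column of ones for the intercept
    X = [[1.0, *map(float, r)] for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: the columns are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def predict(beta: Sequence[float], row: Sequence[float]) -> float:
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Toy data generated from y = 5 + 2*x1 - 3*x2 (no noise)
rows = [[1, 0], [2, 1], [3, 5], [4, 2], [0, 3]]
targets = [5 + 2 * a - 3 * b for a, b in rows]
beta = fit_ols(rows, targets)
print([round(b, 6) for b in beta])  # [5.0, 2.0, -3.0]
```

With the project's encoding, the one-hot columns of each text feature always sum to one, exactly like the intercept column, so XᵀX is singular and this direct method would stop with its ValueError; least-squares solvers such as the one scikit-learn uses still return a solution in that rank-deficient case.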

y_pred=pipe.predict(x_test)
y_pred
y_test
322 210000
204 500000
42 284999
606 500000
513 159000
...
801 465000
711 200000
731 300000
757 150000
379 130000
Name: Price, Length: 164, dtype: int32

r2_score(y_test,y_pred)
0.6863234123258164
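
`r2_score` reports the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res = Σ(yᵢ − ŷᵢ)² and SS_tot = Σ(yᵢ − ȳ)² with ȳ the mean of the test prices. A score of 0.686 therefore means that the squared error of the model on this test split is about 31% of the squared error of always predicting the mean test price. A minimal re-implementation for a single target, for illustration only (the sample prices are the first rows of `y_test` shown above):

```python
from typing import Sequence


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination for one target (SS_tot must be non-zero)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


prices = [210000, 500000, 284999, 159000]
print(r2(prices, prices))                           # 1.0
print(r2(prices, [sum(prices) / len(prices)] * 4))  # 0.0
```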

# checking for maximum r2_score
# Refit the same pipeline on 1000 different random 80/20 splits and record each test R²
scores=[]
for i in range(1000):
    x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=i)
    lr=LinearRegression()
    pipe=make_pipeline(column_trans,lr)
    pipe.fit(x_train,y_train)
    y_pred=pipe.predict(x_test)
    scores.append(r2_score(y_test,y_pred))

import numpy as np
np.argmax(scores)
906
scores[np.argmax(scores)]
0.7768125045875028
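
The loop above keeps the `random_state` whose test split gives the highest R², so 0.7768 is the best of 1000 draws rather than a typical score; choosing the split by its test score makes the estimate optimistic. A common alternative is k-fold cross-validation, which uses every row exactly once as test data and reports the mean; scikit-learn provides it as `cross_val_score(pipe, x, y, cv=5, scoring='r2')`. The sketch below only shows the mechanics in plain Python on a one-feature toy problem; the helpers and the data are illustrative assumptions, not the project's code.

```python
import random
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b*x for a single feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_r2(xs: List[float], ys: List[float], k: int = 5, seed: int = 0) -> List[float]:
    """Shuffle the row indices once, split them into k folds, and return the
    test R^2 of each fold when the model is trained on the other k-1 folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(r2([ys[i] for i in test], preds))
    return scores


# Toy data: price falls with age, plus reproducible Gaussian noise
rng = random.Random(42)
ages = [float(a % 15) for a in range(100)]
prices = [900000 - 50000 * a + rng.gauss(0, 60000) for a in ages]
scores = k_fold_r2(ages, prices, k=5)
print(len(scores), round(sum(scores) / len(scores), 3))
```

The mean of the fold scores, ideally reported with their spread, describes how the model does on unseen rows without tuning the split to the test data.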

#Training the model using highest r2_score


x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=np.argmax(scores))
lr=LinearRegression()
pipe=make_pipeline(column_trans,lr)
pipe.fit(x_train,y_train)
y_pred=pipe.predict(x_test)
r2_score(y_test,y_pred)
0.8456515104452564

#predicting the price by taking input features

pipe.predict(pd.DataFrame([['Maruti Suzuki Swift','Maruti',2019,100,'Petrol']],
                          columns=['name','company','year','kms_driven','fuel_type']))

#prediction
array([459113.49353657])

# dumping the LinearRegressionModel.pkl file using pickle for further development process
import pickle
with open('LinearRegressionModel.pkl','wb') as f:
    pickle.dump(pipe,f)

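The saved pipeline is later read back by the Flask application with `pickle.load`. The round trip can be sketched with the standard library alone; here a `SimpleNamespace` with made-up numbers is a hypothetical stand-in for the fitted scikit-learn pipeline, and a temporary directory replaces the project folder. Because unpickling can execute arbitrary code, only files that the project produced itself should be loaded this way.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace

# Hypothetical stand-in for the fitted pipeline (illustrative numbers only)
model = SimpleNamespace(intercept=800000.0, per_year=-40000.0)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "LinearRegressionModel.pkl")
    with open(path, "wb") as f:      # save, as the notebook does
        pickle.dump(model, f)
    with open(path, "rb") as f:      # load, as the web application does at start-up
        restored = pickle.load(f)

print(restored == model)                           # True
print(restored.intercept + restored.per_year * 3)  # 680000.0
```

The `with` blocks close each file even if dumping or loading raises an exception.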
5.3 Code Snippets

1. home.html
<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<link rel="stylesheet" href="static/css/style.css">


<!-- Bootstrap CSS -->
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css"
integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">

<title>Car Price Predictor</title>


</head>
<body class="backgroundColor">

<div class="d-flex flex-column flex-md-row align-items-center p-3 px-md-4 mb-3 navbar-light" style="background-color: #e0f2f1;">
<h4 class="my-0 mr-md-auto font-weight-normal"><b>CAR PRICE PREDICTOR</b></h4>
<nav class="my-2 my-md-0 mr-md-3 ">
<a class="p-2 text-dark" href="{{url_for('home')}}"><b>Home</b></a>

</nav>
<a class="btn btn-outline-primary" href="/logout">Log out</a>
</div>
<div class="container">
<div class="row">
<div class="card mt-50" style="width:100%;height:100%">
<div class="card-header">
<div class="col-12" style="text-align:center">
<h1>Welcome to Car Price Predictor</h1>
</div>
</div>

<div class="card-body">
<form class="form" method="post" >

<div class="col-10 form-group" style="text-align: center">


<label> <b>Select company: </b></label>
<select class="selectpicker form-control" id="company" name="company"
required="1" onchange="load_car_models(this.id,'car_model')">

{% for company in companies %}


<option value="{{company}}">{{company}} </option>
{% endfor %}

</select>
</div>

<div class="col-10 form-group" style="text-align: center">


<label> <b>Select Model: </b></label>
<select class="selectpicker form-control" id="car_model" name="car_model"
required="1">
</select>
</div>

<div class="col-10 form-group" style="text-align: center">


<label> <b>Select Year of Purchase: </b></label>
<select class="selectpicker form-control" id="year" name="year" required="1">

{% for year in years %}


<option value="{{year}}">{{year}} </option>
{% endfor %}
</select>
</div>

<div class="col-10 form-group" style="text-align: center">


<label> <b>Select Fuel Type: </b></label>
<select class="selectpicker form-control" id="fuel_type" name="fuel_type"
required="1">
{% for fuel_type in fuel_types %}


<option value="{{fuel_type}}">{{fuel_type}} </option>
{% endfor %}
</select>
</div>

<div class="col-10 form-group" style="text-align: center">


<label> <b>Kilometers travelled: </b></label>
<input class="form-control" type="text" id="kms_driven" name="kms_driven"
placeholder="Enter no. of kms travelled">
</div>

<div class="col-10 form-group" style="text-align: center">


<button class="btn btn-primary btn-block btn-lg" onclick="send_data()"
value="Predict">Predict Price</button>
</div>

</form>
<br>
<div class="row">
<div class="col-12" style="text-align: center">
<h3><span id="prediction"></span> </h3>
</div>
</div>

</div>

</div>
</div>

</div>

<script>
function load_car_models(company_id,car_model_id)
{
var company= document.getElementById(company_id);
var car_model= document.getElementById(car_model_id);

car_model.value="";
car_model.innerHTML="";

{% for company in companies %}

if(company.value == "{{company}}" )
{
{% for model in car_models %}

{% if company in model %}

var newOption = document.createElement("option");


newOption.value="{{ model }}";
newOption.innerHTML="{{ model }}";
car_model.options.add(newOption);

{% endif %}
{% endfor %}
}
{% endfor %}
}

function form_handler(event)
{
event.preventDefault();
}

function send_data()
{
document.querySelector('form').addEventListener('submit', form_handler);
var fd= new FormData(document.querySelector('form'));

var xhr=new XMLHttpRequest();

xhr.open('POST', '/predict', true);


document.getElementById("prediction").innerHTML="wait! predicting price...";

xhr.onreadystatechange= function()
{
if(xhr.readyState == XMLHttpRequest.DONE)
{
document.getElementById("prediction").innerHTML="The Predicted Price is: "+
xhr.responseText + " Rs/-";

}
}

xhr.onload=function(){};
xhr.send(fd);
}

</script>
<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"
integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
crossorigin="anonymous"></script>

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/umd/popper.min.js"
integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49"
crossorigin="anonymous"></script>

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.min.js"
integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy"
crossorigin="anonymous"></script>
</body>
</html>
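
The `load_car_models` script above is generated by two nested Jinja loops, so the page contains one `if` block per company, and the check `company in model` is a substring test. A possible alternative, sketched below with a hypothetical helper `group_models`, is to build the company-to-models mapping once in Python and hand it to the page as JSON. The grouping key is the first word of each name, which appears to be how `company` relates to `name` in the cleaned data (for example 'Land' for 'Land Rover Freelander').

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_models(names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each company (the first word of a model name) to its sorted models."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.split(" ")[0]].append(name)
    return {company: sorted(models) for company, models in sorted(groups.items())}


names = ["Maruti Suzuki Swift", "Land Rover Freelander", "Maruti Suzuki Alto", "Tata Nano"]
mapping = group_models(names)
print(json.dumps(mapping))
# {"Land": ["Land Rover Freelander"], "Maruti": ["Maruti Suzuki Alto", "Maruti Suzuki Swift"], "Tata": ["Tata Nano"]}
```

In `home()`, such a mapping could be passed to `render_template` and written into the page with Jinja's `tojson` filter, so the script only needs one lookup per selection; this is a design alternative, not what the project currently does.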

2. App.py

import os
import pickle

import mysql.connector
import numpy as np
import pandas as pd
from flask import Flask, flash, redirect, render_template, request, session

# Load the trained pipeline and the cleaned data set produced by the notebook
with open("LinearRegressionModel.pkl", 'rb') as f:
    model=pickle.load(f)
car=pd.read_csv("cleaned car.csv")
app=Flask(__name__)

# Key used to sign the session cookie; a new random key is generated on every start
app.secret_key=os.urandom(24)

conn=mysql.connector.connect(
    host='localhost',
    user='root',
    password='Password123@',
    port=3306,
    database='database'
)

mycursor=conn.cursor()

@app.route('/')
def login():
    # Users who are already logged in go straight to the prediction page
    if 'user_id' in session:
        return redirect('/home')
    else:
        return render_template('login.html')

@app.route('/register')
def register():
    return render_template('register.html')

@app.route('/logout')
def logout():
    # A default value avoids a KeyError when no user is logged in
    session.pop('user_id', None)
    return redirect('/')

@app.route('/login_validation',methods=['POST'])
def login_validation():
    email=request.form.get('email')
    password=request.form.get('password')

    # Parameterised query: the values are sent separately from the SQL text,
    # and '=' (not LIKE) stops '%' or '_' in the input acting as wildcards
    mycursor.execute('''SELECT * FROM `database`.`uinfo`
                        WHERE `email` = %s AND `pwd` = %s''', (email, password))
    uinfo=mycursor.fetchall()

    if len(uinfo)>0:
        session['user_id']=uinfo[0][0]
        return redirect('/home')
    else:
        flash('Incorrect username/ password')
        return redirect('/')

@app.route('/add_user',methods=['POST'])
def add_user():
    name=request.form.get('uname')
    email=request.form.get('uemail')
    password=request.form.get('upassword')

    mycursor.execute('''INSERT INTO `database`.`uinfo` (`uid`,`name`, `email`, `pwd`)
                        VALUES (NULL, %s, %s, %s)''', (name, email, password))

    conn.commit()

    return render_template('login.html')

@app.route('/home')
def home():
    companies=sorted(car['company'].unique())
    car_models = sorted(car['name'].unique())
    year = sorted(car['year'].unique(),reverse=True)
    fuel_type = car['fuel_type'].unique()
    # Placeholder entries shown at the top of the drop-down lists
    companies.insert(0, "Select Company")
    year.insert(0,"Select Year of Purchase")

    if 'user_id' in session:
        return render_template('home.html',companies=companies,car_models=car_models,
                               years=year,fuel_types=fuel_type)
    else:
        return redirect('/')

@app.route('/predict',methods=['POST'])
def predict():
    company= request.form.get('company')
    car_model=request.form.get('car_model')
    year=request.form.get('year')
    fuel_type=request.form.get('fuel_type')
    kms_driven=request.form.get('kms_driven')

    # One-row DataFrame with the same column names and order as the training data
    prediction=model.predict(pd.DataFrame([[car_model, company, year, kms_driven, fuel_type]],
                                          columns=['name', 'company', 'year', 'kms_driven', 'fuel_type']))

    return str(np.round(prediction[0], 0))

if __name__ == "__main__":
    app.run(debug=True)

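In `login_validation` and `add_user`, text typed by the user ends up in SQL statements. If those values are formatted into the SQL text with `str.format`, an input such as `' OR '1'='1` changes the meaning of the query, and with `LIKE` a `%` typed as the password matches any stored password. Passing the values as parameters avoids both problems; `mysql.connector` uses `%s` placeholders, while Python's built-in `sqlite3`, used below so the sketch runs without a MySQL server, uses `?`. The in-memory table is a simplified, hypothetical stand-in for `uinfo`, filled with the account from the test tables.

```python
import sqlite3

# In-memory stand-in for the `uinfo` table (schema simplified for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uinfo (uid INTEGER PRIMARY KEY, name TEXT, email TEXT, pwd TEXT)")
conn.execute("INSERT INTO uinfo (name, email, pwd) VALUES (?, ?, ?)",
             ("Dinesh", "dinesh@gmail.com", "12345"))


def unsafe_login(email: str, password: str) -> bool:
    # Vulnerable: the input becomes part of the SQL text
    query = "SELECT * FROM uinfo WHERE email LIKE '{}' AND pwd LIKE '{}'".format(email, password)
    return len(conn.execute(query).fetchall()) > 0


def safe_login(email: str, password: str) -> bool:
    # Values are sent separately from the SQL and compared with '=', not LIKE
    rows = conn.execute("SELECT * FROM uinfo WHERE email = ? AND pwd = ?",
                        (email, password)).fetchall()
    return len(rows) > 0


print(unsafe_login("dinesh@gmail.com", "%"))       # True: '%' matches any password
print(unsafe_login("' OR '1'='1", "' OR '1'='1"))  # True: injected condition
print(safe_login("dinesh@gmail.com", "%"))         # False
print(safe_login("dinesh@gmail.com", "12345"))     # True
```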
CHAPTER -6

6. TESTING

6.1 Introduction to Testing

Testing is the process of evaluating a system or its components to determine whether it
satisfies the specified requirements. It involves executing the system to identify any
gaps, errors, or missing requirements relative to the actual requirements.

According to the ANSI/IEEE 1059 standard, testing can be defined as "a process of
analyzing a software item to detect the differences between existing and required
conditions (that is, defects/errors/bugs) and to evaluate the features of the software item."

Who does Testing?

It depends on the process and the stakeholders of the project. In the IT industry, large
companies have a dedicated team responsible for evaluating the developed software against
the given requirements. Developers also test their own code; this is called unit testing.
In most cases, the following professionals are involved in testing a system within their
respective capacities:

● Software Tester
● Software Developer
● Project Lead/Manager
● End User

Software testing methodologies are commonly grouped into two main types:

● Functional Testing
● Non-functional Testing

Functional Testing
This is a type of black-box testing based on the specifications of the software under test.
The application is given inputs, and its outputs are checked against the functionality it
was intended to provide. Functional testing is conducted on a complete, integrated system
to evaluate its compliance with the specified requirements.

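In black-box functional testing, a test supplies inputs and compares the outputs with the specification without inspecting the code. As a minimal sketch with Python's built-in `unittest` module, the login test cases listed in Section 6.2 can be written as data-driven checks; `check_login` and the `REGISTERED` dictionary are hypothetical stand-ins for the real MySQL-backed login, filled with the accounts used in the test tables.

```python
import unittest

# Hypothetical stand-in for the login check, using the accounts from Table 6.2.1
REGISTERED = {"dinesh@gmail.com": "12345", "Pranav@gmail.com": "Pranav@1"}


def check_login(email: str, password: str) -> bool:
    """Return True only when the email exists and the password matches."""
    return REGISTERED.get(email) == password


class LoginFunctionalTest(unittest.TestCase):
    # (email, password, expected result) from the positive and negative login cases
    CASES = [
        ("dinesh@gmail.com", "12345", True),
        ("Pranav@gmail.com", "Pranav@1", True),
        ("dinesh@gmail.com", "54321", False),
        ("Pranav@gmail.com", "Pranav@2", False),
    ]

    def test_login_cases(self) -> None:
        for email, password, expected in self.CASES:
            with self.subTest(email=email, password=password):
                self.assertEqual(check_login(email, password), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(LoginFunctionalTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```

Keeping the inputs and expected outputs in a table mirrors the layout of the manual test cases, so new rows can be added without writing new test methods.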
6.1.1 Software Testing Life Cycle


The process of testing software in a well-planned and systematic way is known as the
software testing life cycle (STLC).
Different organizations define different phases in the STLC; however, a generic STLC for
the waterfall development model consists of the following phases:

1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design

● Requirements Analysis
In this phase, testers analyze the customer requirements and work with developers during
the design phase to determine which requirements are testable and how they will be tested.
It is important to start testing activities in the requirements phase itself, because a
defect is much cheaper to fix when it is found there than in later phases.

● Test Planning
In this phase, all testing activities are planned: what needs to be tested, how the testing
will be done, the test strategy to be followed, the test environment, the test
methodologies, hardware and software availability, resources, risks, and so on. A
high-level test plan document covering all of these inputs is created and circulated to
the stakeholders.

● Test Analysis
Once test planning is complete, the test analysis phase begins. In this phase, the team
examines the project in more detail and determines what testing needs to be carried out in
each SDLC phase. Automation is also planned here: whether the product needs automated
tests, how the automation will be done, how long it will take, and which features need to
be automated. Non-functional testing areas (stress and performance testing) are also
analyzed and defined in this phase.

● Test Design
In this phase, black-box and white-box test design techniques are used to design the test
cases. Testers write the test cases by following those techniques, and if automated testing
is required, the automation scripts are also written in this phase.

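Equivalence partitioning and boundary-value analysis are two common black-box design techniques. A small sketch for the "Kilometers travelled" field follows; the valid range of 0 to 500,000 km is an assumption made only for this illustration (the report does not define one), and `is_valid_kms` and `boundary_values` are hypothetical helpers rather than part of the application.

```python
from typing import List

# Assumed valid range for the "Kilometers travelled" field (illustrative only)
KMS_MIN, KMS_MAX = 0, 500_000


def is_valid_kms(text: str) -> bool:
    """Accept whole numbers in [KMS_MIN, KMS_MAX], optionally with commas."""
    digits = text.strip().replace(",", "")
    return digits.isdecimal() and KMS_MIN <= int(digits) <= KMS_MAX


def boundary_values(low: int, high: int) -> List[int]:
    """Boundary-value inputs: just below, on, and just above each limit."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


# Equivalence classes: in-range numbers, out-of-range numbers, non-numeric text
cases = {str(v): KMS_MIN <= v <= KMS_MAX for v in boundary_values(KMS_MIN, KMS_MAX)}
cases.update({"45,000": True, "abc": False, "": False, "12.5": False})

for text, expected in cases.items():
    assert is_valid_kms(text) == expected, text
print(len(cases), "cases passed")  # 10 cases passed
```

One representative per equivalence class plus the values around each boundary keeps the number of test inputs small while still covering the places where validation errors are most likely.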
6.2 Test Cases

1. Launching Home Page
2. Registration of user details
3. Login Positive Test Case
4. Login Negative Test Case
5. Displaying attributes
6. Selecting attributes
7. Price Prediction Test case with selecting correct attributes
8. Price Prediction Test case without selecting one or more attributes
9. Home button Test case
10. Logout button Test case

Test Scenario ID: Launching Web application
Test Case ID: HomePage-1A
Test Case Description: Launching Register/Login Page
Test Priority: High
Pre-Requisite: NA
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | Localhost | Click on URL: http://127.0.0.1:5000/home | Launch Register/Login page | Register/Login page launched | Chrome | Pass | NA

Table 6.2.1 Registration of user details

Test Scenario ID: Registration
Test Case ID: Register-1A
Test Case Description: Register Form
Test Priority: High
Pre-Requisite: Email-id
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | http://127.0.0.1:5000/home | Dinesh, dinesh@gmail.com, 12345 | Registered successfully | Registered successfully | Chrome | Pass | NA
2 | http://127.0.0.1:5000/home | Pranav, Pranav@gmail.com, Pranav@1 | Registered successfully | Registered successfully | Chrome | Pass | NA

Table 6.2.2: Login Positive Test Case

Test Scenario ID: Login
Test Case ID: Login-1A
Test Case Description: Login Positive Test Case
Test Priority: High
Pre-Requisite: Valid User Account
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | http://127.0.0.1:5000 | Dinesh, dinesh@gmail.com, 12345 | Home page | Home page | Chrome | Pass | NA
2 | http://127.0.0.1:5000 | Pranav, Pranav@gmail.com, Pranav@1 | Home page | Home page | Chrome | Pass | NA

Table 6.2.3 Login Negative Test case

Test Scenario ID: Login
Test Case ID: Login-2A
Test Case Description: Login Negative Test Case
Test Priority: High
Pre-Requisite: Valid User Account
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | http://127.0.0.1:5000 | Dinesh, dinesh@gmail.com, 54321 | Login failed | Enter correct username/password. Login failed. | Chrome | Pass | NA
2 | http://127.0.0.1:5000 | Pranav1, Pranav@gmail.com, Pranav@2 | Login failed | Enter correct username/password. Login failed. | Chrome | Pass | NA

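The registration and login test cases work with passwords that the listing stores and compares as plain text in the `pwd` column. A common hardening step, sketched here with the standard library's `hashlib.pbkdf2_hmac`, is to store a salted hash at registration and verify it at login; the iteration count and the `salt$hash` storage format are illustrative assumptions, and `hash_password` and `verify_password` are hypothetical helpers. The negative login case still fails for the same reason: the password does not match.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def hash_password(password: str) -> str:
    """Return 'salt$hash' (both hex) for storage instead of the raw password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 bytes.fromhex(salt_hex), ITERATIONS)
    return hmac.compare_digest(digest.hex(), digest_hex)


stored = hash_password("12345")          # at registration
print(verify_password("12345", stored))  # True  (positive login case)
print(verify_password("54321", stored))  # False (negative login case)
```

Because each call to `hash_password` uses a fresh random salt, two users with the same password get different stored values.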
Table 6.2.4 Displaying attributes

Test Scenario ID: Login
Test Case ID: Login-2A
Test Case Description: Login Negative Test Case
Test Priority: High
Pre-Requisite: Valid User Account
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | http://127.0.0.1:5000 | Dinesh, dinesh@gmail.com, 54321 | Login failed | Enter correct username/password. Login failed. | Chrome | Pass | NA
2 | http://127.0.0.1:5000 | Pranav1, Pranav@gmail.com, Pranav@2 | Login failed | Enter correct username/password. Login failed. | Chrome | Pass | NA

Table 6.2.5 Selecting attributes

Test Scenario ID: Attribute Selection
Test Case ID: Login-2A
Test Case Description: Selecting
Test Priority: High
Pre-Requisite: Valid User Account
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | Scroll and select | Selecting company | Company selected by user | Company selected | Chrome | Pass | NA
2 | Scroll and select | Selecting car model | Car model selected by user | Car model selected | Chrome | Pass | NA
3 | Scroll and select | Selecting purchase year | Purchased year selected by user | Purchased year selected | Chrome | Pass | NA
4 | Scroll and select | Selecting fuel type | Fuel type is selected by user | Fuel type selected | Chrome | Pass | NA
5 | Entering the value | Enter kilometers driven | Kilometers driven is entered by user | Kilometers driven is entered | Chrome | Pass | NA

Table 6.2.5 Price Prediction Test case with selecting correct attributes

Test Scenario ID: Price Prediction
Test Case ID: Login-2A
Test Case Description: Predicting the price of the car
Test Priority: High
Pre-Requisite: Valid User Account
Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | Click on predict price button | Clicking on predict price button | Price is predicted at bottom | Price predicted successfully | Chrome | Pass | Worked

Table 6.2.6 Price Prediction Test case without selecting one or more attributes

Test Scenario ID: Price Prediction | Test Case ID: Login-2A
Test Case Description: Predicting the price of the car | Test Priority: High
Pre-Requisite: Valid User Account | Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | Click on predict price button without filling all attributes | Clicking on predict price button | Fill all attributes | Fill all the attributes, and price is not predicted | Chrome | Pass | Price is not predicted
2 | Click on predict price button after filling incorrect attributes | Clicking on predict price button | Incorrect attributes | Price is not predicted | Chrome | Pass | Price is not predicted
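The checks exercised by Tables 6.2.5 and 6.2.6 can be sketched as a small validation function that runs before the model is called. The field names, the non-negative-integer rule for kilometers driven and the exact messages are assumptions for illustration; the report does not show the application's validation code.

```python
# Hypothetical sketch of the input validation behind Tables 6.2.5 and 6.2.6.
# Field names and messages are assumptions, not the project's actual code.

REQUIRED_FIELDS = ("company", "car_model", "year", "fuel_type", "kms_driven")


def _is_blank(value):
    """A field counts as missing if it is absent, None, or only whitespace."""
    return value is None or str(value).strip() == ""


def validate_form(form):
    """Return (ok, message) for a dict of submitted form values."""
    # Any missing or blank attribute blocks the prediction (Table 6.2.6, step 1).
    if any(_is_blank(form.get(f)) for f in REQUIRED_FIELDS):
        return False, "Fill all the attributes"
    # Kilometers driven must be a non-negative whole number (Table 6.2.6, step 2).
    if not str(form["kms_driven"]).strip().isdigit():
        return False, "Incorrect attributes"
    return True, "OK"


if __name__ == "__main__":
    print(validate_form({"company": "Hyundai"}))
    # (False, 'Fill all the attributes')
```

Only when this check passes would the form values be handed to the model, which matches the "price is not predicted" outcomes recorded above.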


Table 6.2.7 Home button Test case

Test Scenario ID: Home button | Test Case ID: Login-2A
Test Case Description: Testing home button | Test Priority: High
Pre-Requisite: Valid User Account | Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | Click on home button | Click | Refreshing home page | Home page refreshed | Chrome | Pass | Successfully refreshed


Table 6.2.8 Logout button Test case

Test Scenario ID: Logout | Test Case ID: Login-2A
Test Case Description: Logging out from homepage | Test Priority: High
Pre-Requisite: Valid User Account | Post-Requisite: NA

Test Execution Steps:

S.No | Action | Inputs | Expected Output | Actual Output | Test Browser | Test Result | Test Comments
1 | Click on logout button | Click on button | Return to login page | Returned to login page | Chrome | Pass | Logged out successfully


CHAPTER -7


7. SCREENSHOTS

7.1 Layout of Testing Platform

Figure 7.1 Selenium Testing Platform

7.2 Log and Reference of Testing

Figure 7.2 Log and reference using Selenium


7.3 Register Page of Web Application

Figure 7.3 Register page of Web Application

7.4 Login Page of Web Application

Figure 7.4 Login page of Web Application


7.5 Home Page of Web Application

Figure 7.5 Home page of Web Application

7.6 Displaying available car companies

Figure 7.6 Displaying available car companies


7.7 Displaying suitable car models

Figure 7.7 Displaying suitable car models

7.8 Displaying available years

Figure 7.8 Displaying available years

7.9 Displaying available Fuel types

Figure 7.9 Displaying available Fuel types

7.10 Displaying Predicted Price

Figure 7.10 Displaying Predicted Price


CHAPTER -8


8. FUTURE ENHANCEMENTS

Car price prediction has been an area of high research interest, as it requires noticeable effort and the knowledge of a domain expert. A considerable number of distinct attributes must be examined to obtain reliable and accurate predictions. The major step in the prediction process is the collection and pre-processing of the data. In this project, the data was normalized and cleaned to remove unnecessary noise before it was used by the machine learning algorithms.

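The report does not say which cleaning or normalization steps were used. As one common possibility, the sketch below drops rows with missing values and applies min-max scaling, x' = (x - min) / (max - min). The column names and sample rows are hypothetical.

```python
# Minimal sketch of cleaning and min-max normalization. These are assumed
# steps: the report does not specify which techniques were used.


def clean(rows, required):
    """Keep only rows with a value (not None or blank) in every required column."""
    return [
        r for r in rows
        if all(r.get(c) is not None and str(r.get(c)).strip() != "" for c in required)
    ]


def min_max_normalize(values):
    """Scale numbers linearly into [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    # Hypothetical sample rows, not the project's data set.
    rows = [
        {"year": 2012, "kms_driven": 50000},
        {"year": 2018, "kms_driven": None},  # incomplete row, dropped by clean()
        {"year": 2015, "kms_driven": 20000},
        {"year": 2019, "kms_driven": 80000},
    ]
    cleaned = clean(rows, ["year", "kms_driven"])
    print(min_max_normalize([r["kms_driven"] for r in cleaned]))  # [0.5, 0.0, 1.0]
```

Scaling puts features such as kilometers driven and purchase year on a comparable range, which matters most for distance-based and gradient-trained models.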
When a single machine learning algorithm was applied to the data set, the accuracy was less than 70%. Therefore, an ensemble of multiple machine learning algorithms has been proposed, and this combination of ML methods achieves an accuracy of 93%, a significant improvement over the single-algorithm approach. The drawback of the proposed system, however, is that it consumes considerably more computational resources than a single machine learning algorithm. Although the system performs well on the car price prediction problem, it could also be implemented with more advanced machine learning models and deep learning techniques to improve its efficiency and accuracy. Moreover, with growing innovation in automobiles, electric vehicles have gained public attention and are increasingly preferred over conventional cars, so the system could be extended to predict electric vehicle prices as well.
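The report does not name the ensemble method. One simple way to combine several algorithms, shown here only as an assumed example, is to average the base models' predictions. The toy base models below are hypothetical stand-ins for trained regressors.

```python
# Minimal sketch of an averaging ensemble. The base models are toy,
# hypothetical stand-ins for trained regressors, not the project's models.


def ensemble_predict(models, x, weights=None):
    """Weighted mean of the predictions of several models for one input x."""
    if weights is None:
        weights = [1.0] * len(models)  # plain average by default
    if not models or len(weights) != len(models):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    return sum(w * model(x) for model, w in zip(models, weights)) / total


# Toy base models: each maps kilometers driven to a price.
def linear_model(kms):
    return 500000 - 2.0 * kms


def step_model(kms):
    return 450000 if kms < 50000 else 300000


def constant_model(kms):
    return 400000


if __name__ == "__main__":
    models = [linear_model, step_model, constant_model]
    print(ensemble_predict(models, 40000))
    print(ensemble_predict(models, 40000, weights=[2, 1, 1]))  # 422500.0
```

Every prediction runs all base models, which is where the extra computational cost mentioned in this chapter comes from; the optional weights let better-performing models count for more.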


CHAPTER -9


9. CONCLUSION

The prediction error rates of all the models were well under the accepted 5% error threshold. However, further analysis showed that the mean error of the regression tree model was higher than that of the linear regression model. Although the regression tree is more accurate for some random seeds, its error rates are higher for the rest. This was confirmed with an analysis of variance (ANOVA), and a post-hoc test revealed that the error rates of the multiple regression and lasso regression models are not significantly different from each other. To obtain even more accurate models, more advanced machine learning algorithms could be used, such as random forests, an ensemble learning algorithm that builds multiple decision/regression trees and substantially reduces overfitting, or boosting, which trains models sequentially, each concentrating on the errors of the previous ones, and (in methods such as AdaBoost) weights the final combination in favor of the better performers. More data from newer websites and different countries could also be scraped and used to retrain these models to check the reproducibility of the results.
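The ANOVA mentioned above compares the mean error rates of the models across random seeds. A minimal pure-Python sketch of the one-way ANOVA F statistic is given below; the error rates in the example are hypothetical, not the project's measurements. In practice, `scipy.stats.f_oneway` computes the same statistic together with its p-value.

```python
# Minimal sketch of the one-way ANOVA F statistic. The sample error rates
# below are hypothetical, not the project's measurements.


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("need >= 2 groups, n > k, and some within-group variation")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical error rates (%) of three models over three random seeds.
    linear_reg = [1.0, 2.0, 3.0]
    lasso_reg = [2.0, 3.0, 4.0]
    reg_tree = [3.0, 4.0, 5.0]
    print(one_way_anova_f([linear_reg, lasso_reg, reg_tree]))  # (3.0, 2, 6)
```

For these sample groups F = 3.0 with (2, 6) degrees of freedom, which is below the 5% critical value of about 5.14, so those hypothetical models would not be judged significantly different. A significant F only says that some means differ; a post-hoc test such as Tukey's HSD is then needed to find which pairs do.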


