Updated Used Cars Price Prediction Using Machine Learning


Project report submitted in partial fulfilment of the Summer Internship program

required for the degree of Bachelor of Technology

By
Saurav Dutta

Institute: Jorhat Institute of Science and Technology (JIST)


Semester: 5th

Date: 10/08/2024

UNDER THE SUPERVISION OF


Dr. Anjan Kumar Talukdar

to

Department of Electronics and Communication Engineering (ECE)

Guwahati University, Assam, India

CONTENTS
Abstract
Introduction
Project Overview
Data Science Workflow
Model Deployment
Web Interface
Output
Conclusion
Applications
References

ABSTRACT
This project focuses on developing a predictive model for estimating the selling price of
used cars based on various vehicle features. The model was built using the Extra Trees
Regressor, leveraging a dataset sourced from "Cardekho_Dataset." Key steps in the
model development process included data cleaning, outlier detection, feature
engineering, and dimensionality reduction. Hyperparameter tuning was used to optimize
the model's performance, and K-Fold cross-validation was used to check how well it
generalizes. The trained model
was then deployed through a Python Flask server, which serves predictions via HTTP
requests. A user-friendly web interface was created using HTML, CSS, and JavaScript,
allowing users to input vehicle details and receive price predictions in real time. This
integrated approach demonstrates the practical application of machine learning in the
automotive domain and provides a scalable solution for used car price estimation.

INTRODUCTION
In today’s automotive market, the demand for used cars has risen significantly due to
factors like economic constraints, the desire for affordable transportation, and the wide
variety of available options. Accurately predicting the price of a used car is crucial for
both sellers and buyers to ensure fair transactions. This project focuses on developing a
machine learning model using the Extra Trees Regressor that can predict the selling
price of used cars based on various features such as car name, odometer reading,
vehicle age, fuel type, transmission type, mileage, engine capacity, maximum power,
and number of seats. It also covers creating a Python Flask server to serve predictions
through HTTP requests and developing a user-friendly web interface for interacting
with the model.
Theoretical Background
Predicting the price of used cars is a regression problem in machine learning, where the
objective is to estimate a continuous target variable (price) based on several input
features. The project involves various data science and machine learning concepts,
which are crucial to building a reliable prediction model.
Regression Models
Regression models are a fundamental class of algorithms used to predict continuous
outcomes. In the context of used car price prediction, regression models learn from
historical data by mapping input features (e.g., mileage, age) to the target variable
(price). The model attempts to minimize the difference between the predicted and
actual prices by finding the best-fit function that represents this relationship.
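As a toy illustration of "finding the best-fit function", the sketch below fits a straight line, price ≈ w·age + c, by ordinary least squares with NumPy. The six data points are made up for the example and are not taken from the Cardekho dataset; the project itself uses a tree ensemble rather than a linear model.

```python
import numpy as np

# Made-up toy data (NOT from the Cardekho dataset): vehicle age in years
# and selling price in arbitrary units.
age = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
price = np.array([9.1, 8.2, 7.0, 6.1, 5.2, 3.9])

# Design matrix with a column of ones so the fitted line has an intercept.
X = np.column_stack([age, np.ones_like(age)])

# Least squares: choose (w, c) that minimise the sum of squared errors.
(w, c), *_ = np.linalg.lstsq(X, price, rcond=None)

predicted = X @ np.array([w, c])
mse = np.mean((price - predicted) ** 2)
print(f"slope={w:.3f}, intercept={c:.3f}, MSE={mse:.4f}")
```

The negative slope captures the intuition that older cars sell for less; a tree ensemble learns the same kind of mapping without assuming it is a straight line.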
Extra Trees Regressor
The Extra Trees Regressor, also known as Extremely Randomized Trees, is an ensemble
learning method that averages the predictions of many decision trees to improve
prediction accuracy. A traditional decision tree searches for the best split threshold on
each candidate feature; Extra Trees instead draw one threshold at random for each
candidate feature and keep the best of these random splits. By default each tree is also
trained on the whole training set rather than on a bootstrap sample. The extra
randomness makes the individual trees more diverse, which lowers the variance of the
averaged model and reduces the risk of overfitting. This makes it particularly useful for
datasets with high variance and complex interactions among features, as is often the
case in used car price prediction.
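The split rule described above can be illustrated for a single tree node. This is a simplified sketch, not scikit-learn's implementation: it assumes variance reduction as the split score (the quantity a squared-error regression tree maximises), and the helper names `variance_reduction` and `extra_trees_split` are hypothetical. A full Extra Trees model applies this rule recursively to grow many trees and averages their predictions.

```python
import numpy as np


def variance_reduction(y, left_mask):
    """Drop in (weighted) variance of y when it is split by left_mask."""
    left, right = y[left_mask], y[~left_mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted


def extra_trees_split(X, y, n_candidate_features, rng):
    """Choose one node split the Extremely Randomized Trees way.

    A random subset of features is drawn; for each one a single threshold is
    sampled uniformly between the feature's min and max in this node, and the
    best of these random candidate splits is returned. No exhaustive search
    over thresholds is performed.
    """
    features = rng.choice(X.shape[1], size=n_candidate_features, replace=False)
    best_feature, best_threshold, best_score = None, None, -np.inf
    for f in features:
        low, high = X[:, f].min(), X[:, f].max()
        if low == high:
            continue  # a constant feature cannot split the node
        threshold = rng.uniform(low, high)
        score = variance_reduction(y, X[:, f] <= threshold)
        if score > best_score:
            best_feature, best_threshold, best_score = f, threshold, score
    return best_feature, best_threshold, best_score


# Toy data: only feature 0 influences the target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 2 * X[:, 0] + rng.normal(0, 0.5, size=200)

feature, threshold, score = extra_trees_split(X, y, n_candidate_features=3, rng=rng)
print(f"split on feature {feature} at {threshold:.2f} (variance reduction {score:.2f})")
```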

PROJECT OVERVIEW
This project is divided into three main components:
1. Model Development: The first component involves building the price prediction
model using the Extra Trees Regressor. The process includes data cleaning,
feature engineering, dimensionality reduction, hyperparameter tuning, and
model validation.
2. API Development with Flask: The second component involves deploying the
trained model using a Python Flask server. The server handles HTTP requests,
taking in features related to the car and returning the predicted price. This
allows the model to be accessed and used programmatically through a simple
API.
3. Frontend Development: The third component is the development of a web-
based user interface using HTML, CSS, and JavaScript. This interface allows users
to input car details and receive price predictions, providing an accessible and
user-friendly way to interact with the model.
By integrating these components, the project not only demonstrates the practical
application of machine learning in the context of used car price prediction but also
provides a complete, end-to-end solution that can be deployed and used in real-world
scenarios.

DATA SCIENCE WORKFLOW
Data Loading and Cleaning
The dataset used for this project is the "Cardekho_Dataset." The first step was to load
the dataset into a Pandas DataFrame. The dataset was then cleaned to remove any
missing values, irrelevant columns, and duplicated records. The categorical variables
were encoded using label encoding or one-hot encoding where necessary.
Loading the dataset into a Pandas DataFrame

Checking for missing values and data types
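The loading and cleaning steps above can be sketched with pandas as follows. The file name `cardekho_dataset.csv` and the exported index column `Unnamed: 0` are assumptions about the CSV; `clean_dataframe` is a hypothetical helper, and the small demo frame is made up.

```python
import pandas as pd


def clean_dataframe(df: pd.DataFrame, irrelevant_columns: list[str]) -> pd.DataFrame:
    """Drop unused columns, rows with missing values and exact duplicates."""
    df = df.drop(columns=irrelevant_columns, errors="ignore")
    df = df.dropna()           # remove rows with any missing value
    df = df.drop_duplicates()  # remove repeated records
    return df.reset_index(drop=True)


# Typical use (file name is an assumption):
# df = pd.read_csv("cardekho_dataset.csv")
# print(df.isnull().sum())   # missing values per column
# print(df.dtypes)           # data type of each column
# df = clean_dataframe(df, irrelevant_columns=["Unnamed: 0"])

# Made-up demo: rows 1 and 2 become duplicates once the index column is dropped,
# and row 3 has a missing brand.
demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3],
    "brand": ["Maruti", "Hyundai", "Hyundai", None],
    "km_driven": [120000, 20000, 20000, 50000],
})
cleaned = clean_dataframe(demo, irrelevant_columns=["Unnamed: 0"])
print(cleaned)
```

Dropping the index-like column before `drop_duplicates` matters: otherwise every row is unique by construction and no duplicates are found.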

Outlier Detection and Removal
Outliers in the dataset can distort the model’s performance. Therefore, I conducted
outlier detection using techniques such as the IQR (Interquartile Range) method and
visual inspection through box plots. Outliers were either removed or capped to reduce
their impact on the model.
Removing the Outliers
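A minimal sketch of the IQR rule, showing both options mentioned above (removing or capping). It assumes the common 1.5 × IQR fences and a price column named `selling_price`; the helper names and demo values are hypothetical.

```python
import pandas as pd


def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


def remove_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose value lies inside the IQR fences."""
    low, high = iqr_bounds(df[column], k)
    return df[df[column].between(low, high)]


def cap_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Clip values outside the fences to the nearest fence instead of dropping them."""
    low, high = iqr_bounds(df[column], k)
    capped = df.copy()
    capped[column] = capped[column].clip(lower=low, upper=high)
    return capped


demo = pd.DataFrame({"selling_price": [3.0, 4.0, 5.0, 6.0, 7.0, 100.0]})
print(iqr_bounds(demo["selling_price"]))            # (0.5, 10.5)
print(len(remove_outliers(demo, "selling_price")))  # 5
print(cap_outliers(demo, "selling_price")["selling_price"].max())  # 10.5
```

Removing shrinks the dataset, while capping keeps every car but limits its influence; which is preferable depends on whether the extreme prices are errors or genuine luxury listings.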

Feature Engineering
Feature engineering was performed to create new relevant features and transform
existing ones to improve the model's predictive power. For example:
• Vehicle age was derived from the car's manufacturing year.
• Mileage per liter was calculated where necessary.
• Engine and power values were cleaned and converted to numerical types.
• Categorical features like fuel type, transmission type, and car brand were
encoded.

Applying Feature Engineering
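The feature-engineering steps listed above can be sketched as a single pandas function. This assumes raw columns named `year`, `brand`, `fuel_type`, `transmission_type`, `mileage`, `engine` and `max_power`, with string values such as "23.4 kmpl", "1248 CC" and "74 bhp"; the reference year 2024 (the report's date) and the function name are assumptions.

```python
import pandas as pd

REFERENCE_YEAR = 2024  # assumption: year used to compute vehicle age


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()

    # Vehicle age derived from the manufacturing year.
    out["vehicle_age"] = REFERENCE_YEAR - out["year"]

    # Keep only the leading number of strings such as "23.4 kmpl",
    # "1248 CC" or "74 bhp"; unparseable entries become NaN.
    for col in ["mileage", "engine", "max_power"]:
        number = out[col].astype(str).str.extract(r"(\d+(?:\.\d+)?)")[0]
        out[col] = pd.to_numeric(number, errors="coerce")

    # Drop the raw year and one-hot encode the categorical columns.
    out = out.drop(columns=["year"])
    return pd.get_dummies(out, columns=["fuel_type", "transmission_type", "brand"])


# Made-up example rows.
raw = pd.DataFrame({
    "year": [2015, 2019],
    "brand": ["Maruti", "Honda"],
    "fuel_type": ["Petrol", "Diesel"],
    "transmission_type": ["Manual", "Automatic"],
    "mileage": ["23.4 kmpl", "18.0 kmpl"],
    "engine": ["1248 CC", "1498 CC"],
    "max_power": ["74 bhp", "98.6 bhp"],
})
features = engineer_features(raw)
print(features.columns.tolist())
```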

Dimensionality Reduction
To reduce the complexity of the dataset and improve model performance,
dimensionality reduction techniques were applied. This included removing highly
correlated features based on a correlation matrix analysis, and using PCA (Principal
Component Analysis) to reduce feature dimensions where applicable.
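Both reduction steps can be sketched as below. The correlation cut-off of 0.9, the standardisation before PCA and the 95% explained-variance target are assumptions, not values stated in this report; `drop_correlated` is a hypothetical helper and the demo columns are made up.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from every pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Made-up numeric features: x2 is an exact linear function of x.
numeric = pd.DataFrame({
    "x": np.arange(10, dtype=float),
    "x2": 2 * np.arange(10, dtype=float) + 1,
    "z": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
})
reduced = drop_correlated(numeric)
print(reduced.columns.tolist())  # ['x', 'z']

# PCA on standardised features, keeping enough components for 95% of the variance.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
components = pca_pipeline.fit_transform(reduced)
pca = pca_pipeline.named_steps["pca"]
print(components.shape, pca.explained_variance_ratio_.sum())
```

Standardising first keeps large-valued columns such as odometer readings from dominating the principal components.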
Cleaned Dataset
After cleaning the data, removing outliers, and applying feature engineering, a sample
of the updated dataset is displayed below. I stored the feature values in the variable ‘a’
and the corresponding target values (selling prices) in the variable ‘b’.

Displaying the Updated Dataset

Model Building with Extra Trees Regressor


The Extra Trees Regressor was chosen because it works well with a mix of numerical
and (encoded) categorical features, captures non-linear interactions between them, and
is relatively robust to outliers in the input features. The model was trained on the
cleaned dataset with the selling price as the target variable.
Model Building
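A sketch of the training step, using the names ‘a’ (features) and ‘b’ (target) introduced above. The 80/20 train/test split, `n_estimators=100` and `random_state=42` are assumptions rather than the report's settings, and `make_regression` only generates synthetic stand-in data so the snippet runs on its own.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset: a = features, b = target prices.
a, b = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the rows for testing (assumed split ratio).
a_train, a_test, b_train, b_test = train_test_split(
    a, b, test_size=0.2, random_state=42
)

# An ensemble of 100 extremely randomized trees (assumed size).
model = ExtraTreesRegressor(n_estimators=100, random_state=42)
model.fit(a_train, b_train)

print("Test R²:", model.score(a_test, b_test))
```

Note the class name: `sklearn.ensemble.ExtraTreesRegressor` is the ensemble, whereas `sklearn.tree.ExtraTreeRegressor` is a single randomized tree.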

Hyperparameter Tuning with GridSearchCV


To optimize the model's performance, hyperparameter tuning was carried out using
GridSearchCV. Various combinations of hyperparameters were tested, and the best
parameters were selected based on the model's cross-validated score.

Applying GridSearchCV to get the best Model
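A sketch of the grid search. The report does not list the hyperparameters it tried, so the grid below is a hypothetical example, and the synthetic data again only stands in for the real training split.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical search space: 2 x 2 x 2 = 8 combinations.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
}

# Synthetic stand-in for the training split.
a_train, b_train = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)

search = GridSearchCV(
    ExtraTreesRegressor(random_state=42),
    param_grid=param_grid,
    scoring="r2",  # rank each combination by its mean cross-validated R²
    cv=3,
)
search.fit(a_train, b_train)

print("Best parameters:", search.best_params_)
print("Best CV R²:", round(search.best_score_, 3))
best_model = search.best_estimator_  # refit on all of a_train with the best parameters
```

Every combination is trained once per fold, so the cost grows multiplicatively with the size of the grid; `n_jobs` can be set to run the fits in parallel.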

K-Fold Cross-Validation
To ensure the model's generalizability, K-Fold cross-validation was employed. This
technique splits the dataset into k subsets, trains the model on k-1 subsets, and
validates it on the remaining subset. This process was repeated k times, and the model’s
performance was averaged over all the folds to provide a more accurate assessment
of its predictive ability.
Applying K-Fold Cross Validation
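The procedure just described can be written as an explicit loop, which makes the "train on k-1 folds, validate on the remaining one" step visible. This sketch assumes k = 5, shuffled folds and R² as the score; `kfold_r2` is a hypothetical helper, and scikit-learn's `cross_val_score` performs the same loop in one call.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score


def kfold_r2(model, a, b, k=5, seed=42):
    """Train on k-1 folds, score R² on the held-out fold, repeat k times."""
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in kf.split(a):
        fold_model = clone(model)  # fresh, unfitted copy for each fold
        fold_model.fit(a[train_idx], b[train_idx])
        scores.append(r2_score(b[val_idx], fold_model.predict(a[val_idx])))
    return np.array(scores)


# Synthetic stand-in data (NumPy arrays, so integer-index selection works).
a, b = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
model = ExtraTreesRegressor(n_estimators=50, random_state=42)

scores = kfold_r2(model, a, b, k=5)
print("R² per fold:", scores.round(3))
print("Mean R²:", scores.mean().round(3))

# The built-in helper with the same folds gives the same scores.
builtin = cross_val_score(
    model, a, b, cv=KFold(n_splits=5, shuffle=True, random_state=42), scoring="r2"
)
```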

R² (R-squared)
R² represents the proportion of the variance in the dependent variable (price) that is
predictable from the independent variables (features). It measures how well the model's
predictions match the actual data: a value of 1 means perfect predictions, a value of 0
means the model does no better than always predicting the mean price, and values closer
to 1 indicate better performance.
Applying R² (R-squared)
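Written out, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between actual and predicted prices and SS_tot is the sum of squared differences between the actual prices and their mean. The sketch below computes it directly from that definition; the function name and numbers are illustrative, and the result should agree with scikit-learn's `r2_score`.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination computed from its definition."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot


print(r_squared([3, 5, 7, 9], [2.5, 5.5, 7, 9.5]))  # approximately 0.9625
print(r_squared([1, 2, 3], [2, 2, 2]))              # 0.0: no better than the mean
```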

Custom Accuracy within a Tolerance


Custom accuracy is the percentage of predictions whose absolute error is within a
specified fraction of the actual value, i.e. |predicted − actual| ≤ tolerance × |actual|.
Here, I have used a tolerance of 10%.
Applying Custom Accuracy
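The tolerance-based accuracy defined above can be sketched as follows. The function name is hypothetical and the prices in the example are made up; 10% is passed as the fraction 0.10.

```python
import numpy as np


def accuracy_within_tolerance(y_true, y_pred, tolerance=0.10):
    """Percentage of predictions whose relative error is at most `tolerance`."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    within = np.abs(y_pred - y_true) <= tolerance * np.abs(y_true)
    return 100.0 * within.mean()


actual = [100000, 200000, 300000, 400000]
predicted = [105000, 230000, 290000, 400000]  # errors: 5%, 15%, 3.3%, 0%
print(accuracy_within_tolerance(actual, predicted))  # 75.0
```

Unlike R², this metric is easy to explain to non-specialists ("three out of four estimates were within 10%"), but it ignores how far off the remaining predictions are.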

Save the Model


After training, the model is saved to disk with Joblib, which serializes Python objects
using a pickle-based format. The file is stored with a .pkl extension
(car_price_prediction_model.pkl), which the Flask server later loads.

Saving the model using Joblib
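A self-contained round trip with `joblib.dump` and `joblib.load`, checking that the reloaded model makes identical predictions. The small model trained on random data is only a stand-in for the real one, and the file is written to a temporary directory so the sketch leaves nothing behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Small model trained on random data, standing in for the real trained model.
rng = np.random.default_rng(0)
a = rng.uniform(size=(50, 3))
b = a @ np.array([3.0, -2.0, 1.0])
model = ExtraTreesRegressor(n_estimators=20, random_state=0).fit(a, b)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "car_price_prediction_model.pkl")
    joblib.dump(model, path)      # serialize the fitted model to disk
    reloaded = joblib.load(path)  # what the Flask server does at start-up
    same = np.array_equal(model.predict(a), reloaded.predict(a))

print("Predictions identical after reload:", same)
```

Pickle-based files should only be loaded from trusted sources, and they are best loaded with the same scikit-learn version that created them.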

MODEL DEPLOYMENT
Python Flask Server
A Python Flask server was developed to serve the trained model. The server loads the
saved model and exposes an endpoint that accepts HTTP POST requests containing the
features of a car. The server then returns the predicted price in response.
API Endpoint
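• Endpoint: /predict
• Method: POST
• Input: JSON object containing the car's brand, vehicle age, odometer reading, seller
type, fuel type, transmission type, mileage, engine capacity, maximum power, and
number of seats (keys: brand, vehicle_age, km_driven, seller_type, fuel_type,
transmission_type, mileage, engine, max_power, seats).
• Output: JSON object containing the predicted price of the car under the key
predicted_price.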
from flask import Flask, request, jsonify
from flask_cors import CORS
import joblib
import pandas as pd

# Initialize the Flask app and allow cross-origin requests from the web page
app = Flask(__name__)
CORS(app)

# Load the trained model once, when the server starts
model = joblib.load('car_price_prediction_model.pkl')

# Define a route for predictions
@app.route('/predict', methods=['POST'])
def predict():
    # Parse the JSON body (force=True ignores the Content-Type header)
    data = request.get_json(force=True)

    # Wrap the single record in a one-row DataFrame; the column names must
    # match the features the saved model was trained on
    input_data = pd.DataFrame([data])

    # Make a prediction using the loaded model
    prediction = model.predict(input_data)

    # Convert the NumPy value to a plain float so it serializes cleanly to JSON
    return jsonify({'predicted_price': float(prediction[0])})

# Run the Flask development server (debug mode is for local testing only)
if __name__ == '__main__':
    app.run(debug=True)
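To call the /predict endpoint from Python rather than from the browser or Postman, a minimal client can be sketched with the standard library only. The URL is Flask's default development address (the same one the web page uses); the helper names and the example car's values are hypothetical, and the request only succeeds while the server is running.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"  # Flask development server default


def build_predict_request(car: dict, url: str = API_URL) -> urllib.request.Request:
    """Wrap one car's features in a JSON POST request for /predict."""
    body = json.dumps(car).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(car: dict, url: str = API_URL) -> float:
    """Send the request and return the 'predicted_price' field of the reply."""
    with urllib.request.urlopen(build_predict_request(car, url), timeout=10) as resp:
        return json.load(resp)["predicted_price"]


# Hypothetical example input using the same keys as the web form.
example_car = {
    "brand": "Maruti",
    "vehicle_age": 5,
    "km_driven": 40000,
    "seller_type": "Individual",
    "fuel_type": "Petrol",
    "transmission_type": "Manual",
    "mileage": 20.5,
    "engine": 1197,
    "max_power": 82.0,
    "seats": 5,
}

# Uncomment while the Flask server is running:
# print(predict_price(example_car))
```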

WEB INTERFACE
Frontend Development
A simple and intuitive web interface was developed using HTML, CSS, and JavaScript.
The interface allows users to input the required features of a car and submit them to
the Flask server.
Form Design
The form includes fields for:
• Brand
• Vehicle age
• Odometer reading (km)
• Seller type
• Fuel type
• Transmission type
• Mileage (km/l)
• Engine capacity (cc)
• Maximum power (bhp)
• Number of seats
Integration with Flask Server
When the user submits the form, the input data is sent as a POST request to the Flask
server. The server processes the input and returns the predicted price, which is then
displayed on the webpage.

The HTML code for the website is shown below:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Used Car Price Prediction</title>
    <link rel="stylesheet" href="styles.css">
    <link rel="icon" href="favicon.ico" type="image/x-icon">
</head>
<body>
    <div class="container">
        <h1>Used Car Price Prediction</h1>
        <form id="prediction-form">
            <label for="brand">Brand:</label>
            <input type="text" id="brand" name="brand" required>

            <label for="vehicle_age">Vehicle Age:</label>
            <input type="number" id="vehicle_age" name="vehicle_age" required>

            <label for="km_driven">Odometer Reading (km):</label>
            <input type="number" id="km_driven" name="km_driven" required>

            <label for="seller_type">Seller Type:</label>
            <select id="seller_type" name="seller_type">
                <option value="Individual">Individual</option>
                <option value="Dealer">Dealer</option>
            </select>

            <label for="fuel_type">Fuel Type:</label>
            <select id="fuel_type" name="fuel_type">
                <option value="Petrol">Petrol</option>
                <option value="Diesel">Diesel</option>
                <option value="CNG">CNG</option>
                <option value="LPG">LPG</option>
                <option value="Electric">Electric</option>
            </select>

            <label for="transmission_type">Transmission Type:</label>
            <select id="transmission_type" name="transmission_type">
                <option value="Manual">Manual</option>
                <option value="Automatic">Automatic</option>
            </select>

            <label for="mileage">Mileage (km/l):</label>
            <input type="number" step="0.01" id="mileage" name="mileage" required>

            <label for="engine">Engine Capacity (cc):</label>
            <input type="number" id="engine" name="engine" required>

            <label for="max_power">Max Power (bhp):</label>
            <input type="number" id="max_power" name="max_power" required>

            <label for="seats">Number of Seats:</label>
            <input type="number" id="seats" name="seats" required>

            <button type="button" onclick="predictPrice()">Predict Price</button>
        </form>
        <div id="result"></div>
    </div>
    <script src="script.js"></script>
</body>
</html>

The CSS code (styles.css) used for the page design is:
body {
    font-family: Arial, sans-serif;
    background: url('CAR.jpg') no-repeat center center fixed;
    background-size: cover;
    margin: 0;
    padding: 0;
}

.container {
    width: 50%;
    margin: 100px auto;
    background: white;
    padding: 20px;
    border-radius: 10px;
    box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
}

h1 {
    text-align: center;
}

form {
    display: flex;
    flex-direction: column;
    animation: fadeIn 1s ease-in-out;
}

label {
    margin-top: 10px;
}

input, select, button {
    padding: 10px;
    margin-top: 5px;
    border: 1px solid #ccc;
    border-radius: 5px;
    animation: slideIn 0.5s ease-in-out;
}

button {
    margin-top: 20px;
    background-color: #5cb85c;
    color: white;
    cursor: pointer;
    border: none;
    animation: slideIn 0.7s ease-in-out;
}

button:hover {
    background-color: #4cae4c;
}

#result {
    margin-top: 20px;
    font-size: 1.2em;
    text-align: center;
}

/* Add keyframes for the animations */
@keyframes fadeIn {
    from { opacity: 0; }
    to { opacity: 1; }
}

@keyframes slideIn {
    from { transform: translateY(20px); opacity: 0; }
    to { transform: translateY(0); opacity: 1; }
}

The JavaScript code (script.js) that sends the HTTP request to the Flask backend is:


function predictPrice() {
    // Read the form values
    const brand = document.getElementById('brand').value;
    const vehicle_age = document.getElementById('vehicle_age').value;
    const km_driven = document.getElementById('km_driven').value;
    const seller_type = document.getElementById('seller_type').value;
    const fuel_type = document.getElementById('fuel_type').value;
    const transmission_type = document.getElementById('transmission_type').value;
    const mileage = document.getElementById('mileage').value;
    const engine = document.getElementById('engine').value;
    const max_power = document.getElementById('max_power').value;
    const seats = document.getElementById('seats').value;

    // Build the JSON payload, converting numeric fields from strings
    const carData = {
        brand: brand,
        vehicle_age: parseInt(vehicle_age, 10),
        km_driven: parseInt(km_driven, 10),
        seller_type: seller_type,
        fuel_type: fuel_type,
        transmission_type: transmission_type,
        mileage: parseFloat(mileage),
        engine: parseInt(engine, 10),
        max_power: parseFloat(max_power),
        seats: parseInt(seats, 10)
    };

    // Send the data to the local Flask server using fetch
    fetch('http://127.0.0.1:5000/predict', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json'
        },
        body: JSON.stringify(carData)
    })
        .then(response => {
            // Treat HTTP error statuses (e.g. 500) as failures
            if (!response.ok) {
                throw new Error(`Server returned ${response.status}`);
            }
            return response.json();
        })
        .then(data => {
            // Display the result in the #result div
            document.getElementById('result').innerText =
                `Predicted Price: ₹${data.predicted_price.toFixed(2)}`;
        })
        .catch(error => {
            console.error('Error:', error);
            document.getElementById('result').innerText = 'Prediction failed. Please try again.';
        });
}

OUTPUT
Testing the Flask API with Postman:

Output of the frontend (website):

CONCLUSION
This project successfully demonstrates the application of machine learning to predict the
price of used cars based on their features. The combination of data science techniques,
model building with the Extra Trees Regressor, API development using Flask, and frontend
development provides a comprehensive solution for price prediction. The model's
performance was evaluated on a test set. The key metrics used to assess the model
were:
• R-squared (R²): The R² score of the model was 0.88, indicating that the model
explains approximately 88% of the variance in the car prices. This suggests that
the model has a good fit to the data and can reliably predict car prices based
on the given features.
• Mean Absolute Error (MAE): The model achieved an MAE of 61715.733, which
means that, on average, the predicted car prices differ from the actual prices by
approximately ₹61,716.
• Prediction Accuracy within a 10% Tolerance: Additionally, 51.47% of the
model's predictions were within 10% of the actual car prices, so roughly half of
the predictions fall reasonably close to the true price.
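For reference, MAE is the mean of the absolute differences between predicted and actual prices, MAE = (1/n) Σ |ŷᵢ − yᵢ|. The sketch below computes it from that definition; the function name `mae` and the prices are made up for illustration, and scikit-learn's `mean_absolute_error` should give the same result.

```python
import numpy as np


def mae(y_true, y_pred):
    """Average absolute gap between predicted and actual prices."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_pred - y_true))


actual = [450000, 300000, 820000]
predicted = [500000, 280000, 790000]  # absolute errors: 50000, 20000, 30000
print(mae(actual, predicted))  # about 33333.33
```

Because MAE is expressed in the same units as the price, it is easiest to interpret relative to typical prices in the dataset.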
Overall, the model shows good predictive capability, making it a useful tool for
estimating the selling price of used cars based on their features.
Future work may include incorporating additional features, improving the model’s
accuracy with more advanced algorithms, and deploying the application to a cloud
platform for wider accessibility.

APPLICATIONS
The predictive model for used car prices developed in this project has a wide range of
practical applications across various sectors of the automotive industry:
1. Dealerships and Car Sales Platforms
• Pricing Strategy: Dealerships can use the model to set competitive and fair
prices for used cars, taking into account current market conditions and the
specific features of each vehicle. This helps in maximizing sales while ensuring
profitability.
• Inventory Management: The model can assist in evaluating the potential resale
value of vehicles in inventory, helping dealerships make informed decisions about
purchasing and stocking certain models.
2. Online Car Marketplaces
• Price Recommendations: Online platforms that facilitate car sales can integrate
the model to provide sellers with price recommendations based on the attributes
of their car. This ensures that listings are accurately priced, leading to quicker
sales.
• Buyer Guidance: Buyers can use the model to estimate the fair value of cars
they are interested in, helping them make informed purchasing decisions and
avoid overpaying.
3. Insurance Companies
• Premium Calculation: Insurance companies can use the predicted price of a
used car to determine the appropriate insurance premium. Accurate pricing data
ensures that premiums are aligned with the car's market value, reducing risk for
both the insurer and the insured.
• Claim Settlements: In the event of a total loss, the model can be used to
estimate the fair market value of the vehicle, aiding in the settlement of
insurance claims.

4. Financial Institutions
• Loan Valuation: Banks and financial institutions can use the model to assess the
value of used cars as collateral for auto loans. This ensures that loan amounts are
in line with the car's actual worth, mitigating the risk of lending.
• Risk Assessment: The model helps in evaluating the depreciation rate of cars,
which is crucial for determining the risk associated with financing used vehicles.
5. Automotive Industry Research
• Market Analysis: Researchers and analysts can use the model to study trends in
the used car market, such as how different factors (e.g., fuel type, mileage)
impact vehicle prices over time.
• Demand Forecasting: The model can be part of larger predictive analytics
efforts to forecast demand for certain car models based on their predicted
resale values, helping manufacturers and sellers adjust their strategies
accordingly.
6. Consumer Decision-Making
• Personal Financial Planning: Individual car buyers and sellers can use the
model to estimate the value of their current or prospective vehicles, aiding in
budgeting and financial planning.
• Negotiation Tool: The predicted price can serve as a baseline in negotiations
between buyers and sellers, leading to more transparent and fair transactions.
By integrating this predictive model into various facets of the automotive ecosystem,
stakeholders can make more informed decisions, optimize pricing strategies, and
improve overall efficiency in the buying and selling process of used cars.

REFERENCES
1. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
https://doi.org/10.1023/A:1010933404324
2. Cardekho Dataset. Retrieved from
https://www.kaggle.com/datasets/nehalbirla/vehicle-dataset-from-cardekho
3. Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd ed.).
O'Reilly Media.
4. Zhang, H., & Ma, X. (2012). Ensemble machine learning: Methods and applications.
International Journal of Machine Learning and Cybernetics, 3(1), 1-21.
https://doi.org/10.1007/s13042-011-0010-x
5. Flask Documentation. (n.d.). Retrieved from
https://flask.palletsprojects.com/en/2.0.x/
