Car Price Prediction

By Mohamed Jamyl
http://linkedin.com/in/mohamed-jamyl

https://www.kaggle.com/mohamedjamyl

https://github.com/Mohamed-Jamyl

from IPython.display import Image

Image(filename='Used-Cars.jpg')

Project Overview
Cars now come with a wide variety of capabilities and features, such as model, production year, category,
brand, fuel type, engine volume, mileage, cylinders, colour, airbags and many more. This project takes up
the challenge of predicting a car's price from those features, since we all aspire to own a car within
budget with the best features available.

Import Libraries
from pandas import read_csv, to_numeric, DataFrame, concat
from matplotlib.pyplot import (figure, subplots, subplots_adjust, tight_layout, plot, legend, show,
                               title, suptitle, xlabel, ylabel, text, grid, xticks, style)
from numpy import nan, log, inf
from seaborn import kdeplot, heatmap, pairplot, boxplot
from datetime import datetime
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from xgboost import XGBRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from math import sqrt
from pickle import dump

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

Exploratory Data Analysis (EDA)

1. Initial Data Understanding
Data loading and Inspection
Data Types

Missing Values

Duplicates

df = read_csv('car_price.csv')

df.head()

   ID        Price  Levy  Manufacturer  Model    Prod_year  Category   Leather interior  Fuel type  Engine volume  Mileage    Cylinders  Gear box type  Drive wheels
0  45654403  13328  1399  LEXUS         RX 450   2010       Jeep       Yes               Hybrid     3.5            186005 km  6          Automatic      4x4
1  44731507  16621  1018  CHEVROLET     Equinox  2011       Jeep       No                Petrol     3              192000 km  6          Tiptronic      4x4
2  45774419  8467   -     HONDA         FIT      2006       Hatchback  No                Petrol     1.3            200000 km  4          Variator       Front
3  45769185  3607   862   FORD          Escape   2011       Jeep       Yes               Hybrid     2.5            168966 km  4          Automatic      4x4
4  45809263  11726  446   HONDA         FIT      2014       Hatchback  Yes               Petrol     1.3            91901 km   4          Automatic      Front
df.sample(10)

       ID        Price  Levy  Manufacturer  Model       Prod_year  Category   Leather interior  Fuel type  Engine volume  Mileage    Cylinders  Gear box type
6917   45772842  627    586   LEXUS         CT 200h     2012       Hatchback  Yes               Hybrid     1.8            142394 km  4          Automatic
2201   45800413  12858  843   TOYOTA        Prius       2008       Hatchback  No                Hybrid     1.5            217600 km  4          Automatic
15989  45789258  13799  2467  CHRYSLER      Crossfire   2005       Cabriolet  Yes               Petrol     3              100000 km  8          Automatic
10181  45730647  44003  880   SSANGYONG     Actyon      2018       Jeep       Yes               Petrol     1.6            24372 km   4          Automatic
6629   45809117  58391  722   FORD          Mustang     2014       Coupe      Yes               Petrol     2.3            59340 km   4          Automatic
1369   43009722  19444  697   TOYOTA        Corolla LE  2015       Sedan      No                Petrol     1.8            68221 km   4          Automatic
13132  45728545  45633  639   SSANGYONG     REXTON      2014       Jeep       Yes               Diesel     2              87681 km   4          Automatic
9655   45756001  6586   1481  FORD          Focus       2006       Universal  No                Diesel     2.0 Turbo      280000 km  4          Manual
18287  45756044  17249  441   KIA           Optima      2015       Sedan      No                Hybrid     2.4            0 km       4          Automatic
312    45769910  8154   781   SUBARU        Forester    2012       Jeep       Yes               Petrol     2.5            281659 km  4          Automatic

df.tail()

       ID        Price  Levy  Manufacturer   Model    Prod_year  Category  Leather interior  Fuel type  Engine volume  Mileage    Cylinders  Gear box type
19232  45798355  8467   -     MERCEDES-BENZ  CLK 200  1999       Coupe     Yes               CNG        2.0 Turbo      300000 km  4          Manual
19233  45778856  15681  831   HYUNDAI        Sonata   2011       Sedan     Yes               Petrol     2.4            161600 km  4          Tiptronic
19234  45804997  26108  836   HYUNDAI        Tucson   2010       Jeep      Yes               Diesel     2              116365 km  4          Automatic
19235  45793526  5331   1288  CHEVROLET      Captiva  2007       Jeep      Yes               Diesel     2              51258 km   4          Automatic
19236  45813273  470    753   HYUNDAI        Sonata   2012       Sedan     Yes               Hybrid     2.4            186923 km  4          Automatic

df.shape
(19237, 18)

df.columns

Index(['ID', 'Price', 'Levy', 'Manufacturer', 'Model', 'Prod_year', 'Category',

'Leather interior', 'Fuel type', 'Engine volume', 'Mileage',
'Cylinders', 'Gear box type', 'Drive wheels', 'Doors', 'Wheel', 'Color',
'Airbags'],
dtype='object')

df = df.rename(columns={'Engine volume':'Engine_volume',
'Fuel type':'Fuel_type',
'Leather interior':'Leather_interior',
'Gear box type':'Gear_box_type',
'Drive wheels':'Drive_wheels'})

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19237 entries, 0 to 19236
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 19237 non-null int64
1 Price 19237 non-null int64
2 Levy 19237 non-null object
3 Manufacturer 19237 non-null object
4 Model 19237 non-null object
5 Prod_year 19237 non-null int64
6 Category 19237 non-null object
7 Leather_interior 19237 non-null object
8 Fuel_type 19237 non-null object
9 Engine_volume 19237 non-null object
10 Mileage 19237 non-null object
11 Cylinders 19237 non-null int64
12 Gear_box_type 19237 non-null object
13 Drive_wheels 19237 non-null object
14 Doors 19237 non-null object
15 Wheel 19237 non-null object
16 Color 19237 non-null object
17 Airbags 19237 non-null int64
dtypes: int64(5), object(13)
memory usage: 2.6+ MB

df.isnull().sum()

ID 0
Price 0
Levy 0
Manufacturer 0
Model 0
Prod_year 0
Category 0
Leather_interior 0
Fuel_type 0
Engine_volume 0
Mileage 0
Cylinders 0
Gear_box_type 0
Drive_wheels 0
Doors 0
Wheel 0
Color 0
Airbags 0
dtype: int64
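
isnull() reports zero missing values, yet the head() output above shows "-" in the Levy column. A minimal sketch of how such placeholder strings could be counted is given here; the toy frame and its values are made up for illustration and are not the real dataset.

```python
import pandas as pd

# Toy frame standing in for the real data; '-' mimics the Levy placeholder.
toy = pd.DataFrame({
    "Levy": ["1399", "-", "862", "-"],
    "Mileage": ["186005 km", "0 km", "91901 km", "200000 km"],
})

# isnull() finds nothing because '-' is an ordinary string, not NaN.
nan_counts = toy.isnull().sum()

# Compare every cell with the placeholder and count the matches per column.
placeholder_counts = (toy == "-").sum()

print(nan_counts["Levy"], placeholder_counts["Levy"])
```

Running the same comparison on the real df before cleaning would show which columns hide missing values behind a placeholder.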

df.duplicated().sum()

313

2. Basic Statistical Overview

Summary Statistics: describe()

df.describe().T
             count          mean            std         min         25%         50%         75%         max
ID         19237.0  4.557654e+07  936591.422799  20746880.0  45698374.0  45772308.0  45802036.0  45816654.0
Price      19237.0  1.855593e+04  190581.269684         1.0      5331.0     13172.0     22075.0  26307500.0
Prod_year  19237.0  2.010913e+03       5.668673      1939.0      2009.0      2012.0      2015.0      2020.0
Cylinders  19237.0  4.582991e+00       1.199933         1.0         4.0         4.0         4.0        16.0
Airbags    19237.0  6.582627e+00       4.320168         0.0         4.0         6.0        12.0        16.0
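
The Price row spans from 1 to 26,307,500 while its quartiles sit between 5,331 and 22,075, which hints at extreme outliers. As a rough check, the sketch below applies the common 1.5 × IQR rule to the quartiles printed above. The rule is an assumption chosen for illustration, not a step taken in this notebook.

```python
# Quartiles of Price copied from the describe() output above.
q1, q3 = 5331.0, 22075.0

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # values below this would be flagged
upper_fence = q3 + 1.5 * iqr      # values above this would be flagged

print(iqr, lower_fence, upper_fence)   # 16744.0 -19785.0 47191.0
```

Under this rule any price above 47,191 would be flagged, so the maximum of 26,307,500 lies far outside the fence.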

df.select_dtypes(include='object').describe()

        Levy   Manufacturer  Model  Category  Leather_interior  Fuel_type  Engine_volume  Mileage  Gear_box_type  Drive_wheels  Doors
count   19237  19237         19237  19237     19237             19237      19237          19237    19237          19237         19237
unique  559    65            1590   11        2                 7          107            7687     4              3
top     -      HYUNDAI       Prius  Sedan     Yes               Petrol     2              0 km     Automatic      Front         4-May
freq    5819   3769          1083   8736      13954             10150      3916           721      13514          12874         18332

df.hist(bins=15, figsize=(20, 10), color='skyblue', edgecolor='black')

suptitle('Histograms of Columns', fontsize=16)
subplots_adjust(hspace=0.5)
show()

Summary Statistics: value_counts()

df['Manufacturer'].value_counts()

Manufacturer
HYUNDAI 3769
TOYOTA 3662
MERCEDES-BENZ 2076
FORD 1111
CHEVROLET 1069
...
TESLA 1
PONTIAC 1
SATURN 1
ASTON MARTIN 1
GREATWALL 1
Name: count, Length: 65, dtype: int64

# Extract top 10
top10=df['Manufacturer'].value_counts().sort_values(ascending=False)[:10]
top10
Manufacturer
HYUNDAI 3769
TOYOTA 3662
MERCEDES-BENZ 2076
FORD 1111
CHEVROLET 1069
BMW 1049
LEXUS 982
HONDA 977
NISSAN 660
VOLKSWAGEN 579
Name: count, dtype: int64

style.use('ggplot')
figure(figsize=(14, 5))

plot(top10, marker='o', color='red', linestyle='-', linewidth=2, markersize=8)

title('The graph for the best 10 values', fontsize=16)
ylabel('Value', fontsize=12)

for i, value in enumerate(top10):
    text(i, value + 0.5, str(value), ha='center', fontsize=10, color='black')

grid(True, linestyle='--', alpha=0.7)

show()

This line plot shows the number of listings ("Value") for the 10 most frequent manufacturers, with the manufacturer
names on the x-axis.
Top Item: HYUNDAI has the most listings, 3769.

Small Initial Drop, Then a Large One: The count falls only slightly from HYUNDAI to TOYOTA (3662), then drops sharply to
MERCEDES-BENZ (2076) and again to FORD (1111).
Relatively Stable Mid-Range: From FORD (1111) to HONDA (977), the counts are relatively stable, staying within about 135 of
each other.
Decline at the End: The count falls from HONDA (977) to NISSAN (660) and then to VOLKSWAGEN (579), which has the lowest
count among the top 10.
Specific Values: The exact count for each of the top 10 manufacturers is labeled directly above each data point, allowing for
precise comparison.

top10MeanPrices=[df[df['Manufacturer']==i]['Price'].mean() for i in list(top10.index)]

top10MeanPrices

[22338.447864154947,
14248.982250136538,
18609.38294797688,
15573.98199819982,
14926.368568755846,
20876.79218303146,
19191.27698574338,
14291.335721596724,
10032.327272727272,
11640.421416234887]
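
The list comprehension above filters the frame once per manufacturer and returns a plain list without labels. A possible alternative, sketched here on a made-up toy frame, uses groupby so that each mean stays attached to its manufacturer name. The toy values and the variable names are illustrative assumptions.

```python
import pandas as pd

# Toy data standing in for the Manufacturer and Price columns.
toy = pd.DataFrame({
    "Manufacturer": ["HYUNDAI", "HYUNDAI", "TOYOTA", "TOYOTA", "TOYOTA", "FORD"],
    "Price": [20000, 24000, 14000, 15000, 13000, 9000],
})

# The two most frequent manufacturers (the notebook takes the top 10).
top = toy["Manufacturer"].value_counts().head(2)

# Mean price per manufacturer, reordered to follow the frequency ranking.
mean_price = toy.groupby("Manufacturer")["Price"].mean()
top_mean_prices = mean_price.reindex(top.index)

print(top_mean_prices)
```

Because the result is a Series indexed by manufacturer, plotting it would put names on the x-axis instead of positions 0 to 9.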

style.use('ggplot')
figure(figsize=(14, 5))

plot(top10MeanPrices, marker='o', color='red', linestyle='-', linewidth=2, markersize=8)

title('The graph for the best 10 mean of price', fontsize=16)
ylabel('Value', fontsize=12)

for i, value in enumerate(top10MeanPrices):
    text(i, value + 0.5, str(value), ha='center', fontsize=10, color='black')

grid(True, linestyle='--', alpha=0.7)

show()

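Because top10MeanPrices is a plain list, the x-axis shows positions 0 to 9. They follow the order of top10, from
0 = HYUNDAI and 1 = TOYOTA through 9 = VOLKSWAGEN.

HYUNDAI (index 0) has the highest mean price (approximately 22338).

The mean price drops sharply from HYUNDAI to TOYOTA (index 1, approximately 14249).

The mean price fluctuates between indices 1 and 7, ranging from about 14249 (TOYOTA) to about 20877 (BMW, index 5).

NISSAN (index 8) has the lowest mean price among the top 10 (approximately 10032).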

# object_data = df.select_dtypes(include='object')
# for col in object_data:
# style.use('ggplot')
# figure(figsize=(15,5))
# Top10=df[col].value_counts()[:10]
# colors=['blue','red','green','orange']
# Top10.plot(kind='bar',color=colors)
# xticks(rotation='horizontal' )
# title('Top10'+' '+col)
# show()

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Levy'].value_counts()[:10]
colors=['blue','red','green','orange','cyan','yellow','magenta','black']
Top10.plot(kind='bar', color=colors)
xticks(rotation='horizontal' )
title('Top 10 Levy')
show()
The first bar, labeled "-", is a placeholder for a missing Levy rather than a real amount. Its count (approximately 5800) is far
higher than that of any actual Levy value.
The remaining Levy values are far less frequent, each occurring fewer than 1000 times.

The Levy value with the second-highest count is "1017".

The bars for "765", "891", "781", and "836" have relatively similar counts.

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Model'].value_counts()[:10]
colors=['blue','red','green','orange','cyan','yellow','magenta','black']
Top10.plot(kind='bar',color=colors)
xticks(rotation='horizontal' )
title('Top 10 Model')
show()

Prius is the most frequent model among those shown.

Sonata's count is very close to that of Prius.

Camry and Elantra have similar, moderately high counts.

The count drops noticeably for E 350 and Santa FE.

FIT, H1, and Tucson have similar, relatively low counts.

X5 has the lowest count in the top 10.

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Category'].value_counts()[:10]
colors=['blue','red','green','orange','cyan','yellow','magenta','black']
Top10.plot(kind='bar',color=colors)
xticks(rotation='horizontal' )
title('Top 10 Category')
show()
Sedan is the most frequent category among those shown.

Jeep has the second-highest count, but it is significantly lower than Sedan's.

Hatchback has the third-highest count, lower than Jeep's.

Minivan has a noticeably lower count than Hatchback.

Coupe, Universal, Microbus, Goods wagon, Pickup, and Cabriolet all have relatively low counts, with Cabriolet having the lowest
count among the top 10.

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Fuel_type'].value_counts()[:10]
colors=['blue','red','green','orange','cyan','yellow','magenta']
Top10.plot(kind='bar',color=colors)
xticks(rotation='horizontal' )
title('Top 10 Fuel type')
show()

Petrol is by far the most frequent fuel type.

Diesel has the second-highest count, followed by Hybrid.

The counts for LPG and CNG are considerably lower than those for Petrol, Diesel and Hybrid.

Plug-in Hybrid and Hydrogen have the lowest counts. The dataset contains only seven fuel types, so the chart shows all of them.

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Engine_volume'].value_counts()[:10]
colors=['blue','red','green','orange','cyan','yellow','magenta','black']
Top10.plot(kind='bar',color=colors)
xticks(rotation='horizontal' )
title('Top 10 Engine volume')
show()
The engine volume 2 is by far the most frequent value.

The engine volume 2.5 has the second-highest count.

The engine volumes 1.8 and 1.6 have similar counts, which are lower than that of 2.5.

The remaining engine volumes (1.5, 3.5, 2.4, 3, 1.3, and 2.0 Turbo) have progressively lower counts.

The engine volume 2.0 Turbo has the lowest count among the top 10.

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Mileage'].value_counts()[:10]
colors=['blue','red','green','orange','cyan','yellow','magenta','black']
Top10.plot(kind='bar',color=colors)
xticks(rotation='horizontal' )
title('Top 10 Mileage')
show()

The mileage 0 km is by far the most frequent value.

The mileage 200000 km has the second-highest count.

After that, the counts fall progressively from 150000 km down to 1000 km.

The mileage values 170000 km, 120000 km, and 130000 km have the lowest counts among the top 10.

style.use('ggplot')
figure(figsize=(15,5))
Top10=df['Drive_wheels'].value_counts()[:10]
colors=['blue','red','green']
Top10.plot(kind='bar',color=colors)
xticks(rotation='horizontal' )
title('Top 10 Drive wheels')
show()
Front-wheel drive (Front) is by far the most frequent of the three configurations.

4x4 has the second-highest count.

Rear-wheel drive (Rear) has the lowest count among the three.

Data Cleaning

# Remove all duplicate rows

df.drop_duplicates(inplace=True)

df.shape

(18924, 18)
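
The new shape agrees with the earlier count: 19237 rows minus 313 duplicates leaves 18924. The sketch below repeats that check on a made-up toy frame and also shows that drop_duplicates keeps the original index labels. Calling reset_index(drop=True) is an optional extra that the notebook does not use.

```python
import pandas as pd

# Toy frame in which row 2 repeats row 0.
toy = pd.DataFrame({"Price": [100, 200, 100, 300], "Model": ["A", "B", "A", "C"]})

n_dupes = int(toy.duplicated().sum())        # rows identical to an earlier row
deduped = toy.drop_duplicates()              # keeps the first occurrence
reindexed = deduped.reset_index(drop=True)   # optional: renumber rows from 0

print(n_dupes, list(deduped.index), list(reindexed.index))
```

Without the reset, positional and label-based indexing can disagree after rows have been removed.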

df['Levy'].unique()
array(['1399', '1018', '-', '862', '446', '891', '761', '751', '394',
'1053', '1055', '1079', '810', '2386', '1850', '531', '586',
'1249', '2455', '583', '1537', '1288', '915', '1750', '707',
'1077', '1486', '1091', '650', '382', '1436', '1194', '503',
'1017', '1104', '639', '629', '919', '781', '530', '640', '765',
'777', '779', '934', '769', '645', '1185', '1324', '830', '1187',
'1111', '760', '642', '1604', '1095', '966', '473', '1138', '1811',
'988', '917', '1156', '687', '11714', '836', '1347', '2866',
'1646', '259', '609', '697', '585', '475', '690', '308', '1823',
'1361', '1273', '924', '584', '2078', '831', '1172', '893', '1872',
'1885', '1266', '447', '2148', '1730', '730', '289', '502', '333',
'1325', '247', '879', '1342', '1327', '1598', '1514', '1058',
'738', '1935', '481', '1522', '1282', '456', '880', '900', '798',
'1277', '442', '1051', '790', '1292', '1047', '528', '1211',
'1493', '1793', '574', '930', '1998', '271', '706', '1481', '1677',
'1661', '1286', '1408', '1090', '595', '1451', '1267', '993',
'1714', '878', '641', '749', '1511', '603', '353', '877', '1236',
'1141', '397', '784', '1024', '1357', '1301', '770', '922', '1438',
'753', '607', '1363', '638', '490', '431', '565', '517', '833',
'489', '1760', '986', '1841', '1620', '1360', '474', '1099', '978',
'1624', '1946', '1268', '1307', '696', '649', '666', '2151', '551',
'800', '971', '1323', '2377', '1845', '1083', '694', '463', '419',
'345', '1515', '1505', '2056', '1203', '729', '460', '1356', '876',
'911', '1190', '780', '448', '2410', '1848', '1148', '834', '1275',
'1028', '1197', '724', '890', '1705', '505', '789', '2959', '518',
'461', '1719', '2858', '3156', '2225', '2177', '1968', '1888',
'1308', '2736', '1103', '557', '2195', '843', '1664', '723',
'4508', '562', '501', '2018', '1076', '1202', '3301', '691',
'1440', '1869', '1178', '418', '1820', '1413', '488', '1304',
'363', '2108', '521', '1659', '87', '1411', '1528', '3292', '7058',
'1578', '627', '874', '1996', '1488', '5679', '1234', '5603',
'400', '889', '3268', '875', '949', '2265', '441', '742', '425',
'2476', '2971', '614', '1816', '1375', '1405', '2297', '1062',
'1113', '420', '2469', '658', '1951', '2670', '2578', '1995',
'1032', '994', '1011', '2421', '1296', '155', '494', '426', '1086',
'961', '2236', '1829', '764', '1834', '1054', '617', '1529',
'2266', '637', '626', '1832', '1016', '2002', '1756', '746',
'1285', '2690', '1118', '5332', '980', '1807', '970', '1228',
'1195', '1132', '1768', '1384', '1080', '7063', '1817', '1452',
'1975', '1368', '702', '1974', '1781', '1036', '944', '663', '364',
'1539', '1345', '1680', '2209', '741', '1575', '695', '1317',
'294', '1525', '424', '997', '1473', '1552', '2819', '2188',
'1668', '3057', '799', '1502', '2606', '552', '1694', '1759',
'1110', '399', '1470', '1174', '5877', '1474', '1688', '526',
'686', '5908', '1107', '2070', '1468', '1246', '1685', '556',
'1533', '1917', '1346', '732', '692', '579', '421', '362', '3505',
'1855', '2711', '1586', '3739', '681', '1708', '2278', '1701',
'722', '1482', '928', '827', '832', '527', '604', '173', '1341',
'3329', '1553', '859', '167', '916', '828', '2082', '1176', '1108',
'975', '3008', '1516', '2269', '1699', '2073', '1031', '1503',
'2364', '1030', '1442', '5666', '2715', '1437', '2067', '1426',
'2908', '1279', '866', '4283', '279', '2658', '3015', '2004',
'1391', '4736', '748', '1466', '644', '683', '2705', '1297', '731',
'1252', '2216', '3141', '3273', '1518', '1723', '1588', '972',
'682', '1094', '668', '175', '967', '402', '3894', '1960', '1599',
'2000', '2084', '1621', '714', '1109', '3989', '873', '1572',
'1163', '1991', '1716', '1673', '2562', '2874', '965', '462',
'605', '1948', '1736', '3518', '2054', '2467', '1681', '1272',
'1205', '750', '2156', '2566', '115', '524', '3184', '676', '1678',
'612', '328', '955', '1441', '1675', '3965', '2909', '623', '822',
'867', '3025', '1993', '792', '636', '4057', '3743', '2337',
'2570', '2418', '2472', '3910', '1662', '2123', '2628', '3208',
'2080', '3699', '2913', '864', '2505', '870', '7536', '1924',
'1671', '1064', '1836', '1866', '4741', '841', '1369', '5681',
'3112', '1366', '2223', '1198', '1039', '3811', '3571', '1387',
'1171', '1365', '1531', '1590', '11706', '2308', '4860', '1641',
'1045', '1901'], dtype=object)

len([i for i in df['Levy'].tolist() if not i.isnumeric()])

5709

# replace (-) by (0) in Levy column
# (assigning the result back avoids the chained-assignment FutureWarning raised by inplace=True)

df['Levy'] = df['Levy'].replace({'-': 0})
df['Levy'] = df['Levy'].astype(float)

df['Levy'].unique()

array([ 1399., 1018., 0., 862., 446., 891., 761., 751.,

394., 1053., 1055., 1079., 810., 2386., 1850., 531.,
586., 1249., 2455., 583., 1537., 1288., 915., 1750.,
707., 1077., 1486., 1091., 650., 382., 1436., 1194.,
503., 1017., 1104., 639., 629., 919., 781., 530.,
640., 765., 777., 779., 934., 769., 645., 1185.,
1324., 830., 1187., 1111., 760., 642., 1604., 1095.,
966., 473., 1138., 1811., 988., 917., 1156., 687.,
11714., 836., 1347., 2866., 1646., 259., 609., 697.,
585., 475., 690., 308., 1823., 1361., 1273., 924.,
584., 2078., 831., 1172., 893., 1872., 1885., 1266.,
447., 2148., 1730., 730., 289., 502., 333., 1325.,
247., 879., 1342., 1327., 1598., 1514., 1058., 738.,
1935., 481., 1522., 1282., 456., 880., 900., 798.,
1277., 442., 1051., 790., 1292., 1047., 528., 1211.,
1493., 1793., 574., 930., 1998., 271., 706., 1481.,
1677., 1661., 1286., 1408., 1090., 595., 1451., 1267.,
993., 1714., 878., 641., 749., 1511., 603., 353.,
877., 1236., 1141., 397., 784., 1024., 1357., 1301.,
770., 922., 1438., 753., 607., 1363., 638., 490.,
431., 565., 517., 833., 489., 1760., 986., 1841.,
1620., 1360., 474., 1099., 978., 1624., 1946., 1268.,
1307., 696., 649., 666., 2151., 551., 800., 971.,
1323., 2377., 1845., 1083., 694., 463., 419., 345.,
1515., 1505., 2056., 1203., 729., 460., 1356., 876.,
911., 1190., 780., 448., 2410., 1848., 1148., 834.,
1275., 1028., 1197., 724., 890., 1705., 505., 789.,
2959., 518., 461., 1719., 2858., 3156., 2225., 2177.,
1968., 1888., 1308., 2736., 1103., 557., 2195., 843.,
1664., 723., 4508., 562., 501., 2018., 1076., 1202.,
3301., 691., 1440., 1869., 1178., 418., 1820., 1413.,
488., 1304., 363., 2108., 521., 1659., 87., 1411.,
1528., 3292., 7058., 1578., 627., 874., 1996., 1488.,
5679., 1234., 5603., 400., 889., 3268., 875., 949.,
2265., 441., 742., 425., 2476., 2971., 614., 1816.,
1375., 1405., 2297., 1062., 1113., 420., 2469., 658.,
1951., 2670., 2578., 1995., 1032., 994., 1011., 2421.,
1296., 155., 494., 426., 1086., 961., 2236., 1829.,
764., 1834., 1054., 617., 1529., 2266., 637., 626.,
1832., 1016., 2002., 1756., 746., 1285., 2690., 1118.,
5332., 980., 1807., 970., 1228., 1195., 1132., 1768.,
1384., 1080., 7063., 1817., 1452., 1975., 1368., 702.,
1974., 1781., 1036., 944., 663., 364., 1539., 1345.,
1680., 2209., 741., 1575., 695., 1317., 294., 1525.,
424., 997., 1473., 1552., 2819., 2188., 1668., 3057.,
799., 1502., 2606., 552., 1694., 1759., 1110., 399.,
1470., 1174., 5877., 1474., 1688., 526., 686., 5908.,
1107., 2070., 1468., 1246., 1685., 556., 1533., 1917.,
1346., 732., 692., 579., 421., 362., 3505., 1855.,
2711., 1586., 3739., 681., 1708., 2278., 1701., 722.,
1482., 928., 827., 832., 527., 604., 173., 1341.,
3329., 1553., 859., 167., 916., 828., 2082., 1176.,
1108., 975., 3008., 1516., 2269., 1699., 2073., 1031.,
1503., 2364., 1030., 1442., 5666., 2715., 1437., 2067.,
1426., 2908., 1279., 866., 4283., 279., 2658., 3015.,
2004., 1391., 4736., 748., 1466., 644., 683., 2705.,
1297., 731., 1252., 2216., 3141., 3273., 1518., 1723.,
1588., 972., 682., 1094., 668., 175., 967., 402.,
3894., 1960., 1599., 2000., 2084., 1621., 714., 1109.,
3989., 873., 1572., 1163., 1991., 1716., 1673., 2562.,
2874., 965., 462., 605., 1948., 1736., 3518., 2054.,
2467., 1681., 1272., 1205., 750., 2156., 2566., 115.,
524., 3184., 676., 1678., 612., 328., 955., 1441.,
1675., 3965., 2909., 623., 822., 867., 3025., 1993.,
792., 636., 4057., 3743., 2337., 2570., 2418., 2472.,
3910., 1662., 2123., 2628., 3208., 2080., 3699., 2913.,
864., 2505., 870., 7536., 1924., 1671., 1064., 1836.,
1866., 4741., 841., 1369., 5681., 3112., 1366., 2223.,
1198., 1039., 3811., 3571., 1387., 1171., 1365., 1531.,
1590., 11706., 2308., 4860., 1641., 1045., 1901.])
df['Levy'].mean()

632.8864933417882

# Replace 0 with 'nan'

df['Levy'] = df['Levy'].replace({0: nan})

# Replace 'nan' with mean

m = df['Levy'].mean()
df['Levy'] = df['Levy'].fillna(m)

df['Levy'].value_counts()

Levy
906.299205 5709
765.000000 482
891.000000 453
639.000000 403
640.000000 398
...
3156.000000 1
2908.000000 1
1279.000000 1
1719.000000 1
1901.000000 1
Name: count, Length: 559, dtype: int64
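
After mean imputation, 5709 rows share the single value 906.299205, which creates an artificial spike in the Levy distribution. One possible alternative, sketched below on made-up values, fills with the median and keeps a flag that records which rows were originally missing. Both choices are assumptions for illustration, not the method used in this notebook.

```python
import pandas as pd

# Toy Levy column; '-' marks a missing value as in the raw data.
levy = pd.Series(["1399", "-", "862", "-", "446"], name="Levy")

# Convert to numbers; anything unparseable (here '-') becomes NaN.
levy_num = pd.to_numeric(levy, errors="coerce")

# Remember which rows were missing before filling them.
levy_missing = levy_num.isna().astype(int)

# Fill the gaps with the median of the observed values.
levy_filled = levy_num.fillna(levy_num.median())

print(levy_filled.tolist(), levy_missing.tolist())
```

Unlike the two-step replacement above, this route does not turn genuine zero levies into missing values.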

df['Engine_volume'].unique()

array(['3.5', '3', '1.3', '2.5', '2', '1.8', '2.4', '4', '1.6', '3.3',
'2.0 Turbo', '2.2 Turbo', '4.7', '1.5', '4.4', '3.0 Turbo',
'1.4 Turbo', '3.6', '2.3', '1.5 Turbo', '1.6 Turbo', '2.2',
'2.3 Turbo', '1.4', '5.5', '2.8 Turbo', '3.2', '3.8', '4.6', '1.2',
'5', '1.7', '2.9', '0.5', '1.8 Turbo', '2.4 Turbo', '3.5 Turbo',
'1.9', '2.7', '4.8', '5.3', '0.4', '2.8', '3.2 Turbo', '1.1',
'2.1', '0.7', '5.4', '1.3 Turbo', '3.7', '1', '2.5 Turbo', '2.6',
'1.9 Turbo', '4.4 Turbo', '4.7 Turbo', '0.8', '0.2 Turbo', '5.7',
'4.8 Turbo', '4.6 Turbo', '6.7', '6.2', '1.2 Turbo', '3.4',
'1.7 Turbo', '6.3 Turbo', '2.7 Turbo', '4.3', '4.2', '2.9 Turbo',
'0', '4.0 Turbo', '20', '3.6 Turbo', '0.3', '3.7 Turbo', '5.9',
'5.5 Turbo', '0.2', '2.1 Turbo', '5.6', '6', '0.7 Turbo',
'0.6 Turbo', '6.8', '4.5', '0.6', '7.3', '0.1', '1.0 Turbo', '6.3',
'4.5 Turbo', '0.8 Turbo', '4.2 Turbo', '3.1', '5.0 Turbo', '6.4',
'3.9', '5.7 Turbo', '0.9', '0.4 Turbo', '5.4 Turbo', '0.3 Turbo',
'5.2', '5.8', '1.1 Turbo'], dtype=object)

len([i for i in df['Engine_volume'].tolist() if 'Turbo' in i])

1892

# replace (Turbo) by ('') in Engine volume column

df['Engine_volume']=df['Engine_volume'].str.replace('Turbo','')
df['Engine_volume'] = to_numeric(df['Engine_volume'])

df['Engine_volume'].unique()
array([ 3.5, 3. , 1.3, 2.5, 2. , 1.8, 2.4, 4. , 1.6, 3.3, 2.2,
4.7, 1.5, 4.4, 1.4, 3.6, 2.3, 5.5, 2.8, 3.2, 3.8, 4.6,
1.2, 5. , 1.7, 2.9, 0.5, 1.9, 2.7, 4.8, 5.3, 0.4, 1.1,
2.1, 0.7, 5.4, 3.7, 1. , 2.6, 0.8, 0.2, 5.7, 6.7, 6.2,
3.4, 6.3, 4.3, 4.2, 0. , 20. , 0.3, 5.9, 5.6, 6. , 0.6,
6.8, 4.5, 7.3, 0.1, 3.1, 6.4, 3.9, 0.9, 5.2, 5.8])
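
Stripping "Turbo" makes the column numeric but discards the information that 1892 rows had a turbo engine. A minimal sketch of how that information could be kept as a separate 0/1 feature follows. The toy values and the names is_turbo and volume are hypothetical.

```python
import pandas as pd

# Toy Engine_volume column in its raw text form.
engine = pd.Series(["3.5", "2.0 Turbo", "1.3", "2.2 Turbo"], name="Engine_volume")

# Flag turbo engines before the text is removed.
is_turbo = engine.str.contains("Turbo", regex=False).astype(int)

# Remove the text, trim the leftover space, and convert to float.
volume = pd.to_numeric(engine.str.replace("Turbo", "", regex=False).str.strip())

print(is_turbo.tolist(), volume.tolist())
```

The flag would have to be created before the cleaning step above, since the text is gone afterwards.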

df['Mileage'].unique()

array(['186005 km', '192000 km', '200000 km', ..., '140607 km',

'307325 km', '186923 km'], dtype=object)

len([i for i in df['Mileage'].tolist() if 'km' in i])

18924

# replace (km) by ('') in Mileage column

df['Mileage']=df['Mileage'].str.replace('km','')
df['Mileage']=to_numeric(df['Mileage'])

df['Mileage'].unique()

array([186005, 192000, 200000, ..., 140607, 307325, 186923], dtype=int64)

df=df.drop(['ID','Doors'],axis=1)

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 18924 entries, 0 to 19236
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Price 18924 non-null int64
1 Levy 18924 non-null float64
2 Manufacturer 18924 non-null object
3 Model 18924 non-null object
4 Prod_year 18924 non-null int64
5 Category 18924 non-null object
6 Leather_interior 18924 non-null object
7 Fuel_type 18924 non-null object
8 Engine_volume 18924 non-null float64
9 Mileage 18924 non-null int64
10 Cylinders 18924 non-null int64
11 Gear_box_type 18924 non-null object
12 Drive_wheels 18924 non-null object
13 Wheel 18924 non-null object
14 Color 18924 non-null object
15 Airbags 18924 non-null int64
dtypes: float64(2), int64(5), object(9)
memory usage: 2.5+ MB

df['Cylinders'].sort_values(ascending=False) #87 11714

12550 16
16487 16
456 16
1917 16
6863 16
..
13850 1
4599 1
8181 1
9980 1
13104 1
Name: Cylinders, Length: 18924, dtype: int64

Distribution of Variables

Numerical Features (KDE)

figure(figsize=(12, 4))
kdeplot(df['Price'], fill=True, color='blue', alpha=0.6)
title(f'Kernel Density Estimate (KDE) - Price', fontsize=16)
xlabel('Price', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()

Concentration of Data
High Density at the Left Edge: The tall peak at the beginning of the x-axis indicates that almost all prices sit in a narrow band at
the low end of the axis. The axis is stretched by the maximum price (26,307,500 in the describe() output), while the median there
is 13,172 and the 75th percentile is 22,075, so typical prices look close to zero only by comparison.
Very Few High-Priced Items: The long tail extending to the right shows that there are a few items with very high prices, but they are
relatively rare. The density (the curve's height) is very low across this wide range of higher prices.

Interpretation
In simpler terms, this plot suggests that:

Most cars are priced far below the maximum.

There are fewer and fewer cars as the price increases.

There are only a very small number of cars with extremely high prices.

This type of right-skewed distribution is common in many real-world scenarios, such as housing prices, income
distributions, or the value of used goods, where most items are inexpensive, and only a few are very expensive.
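
When a few extreme prices stretch the x-axis, the shape of the bulk of the distribution is hard to read. One option, sketched below, is to restrict the plotted values to those at or below a chosen percentile. The toy prices and the 0.90 cutoff are assumptions for illustration; a higher cutoff such as 0.99 may suit the full dataset better.

```python
import pandas as pd

# Toy prices: nine ordinary values and one extreme outlier.
prices = pd.Series([5331, 13172, 22075, 9000, 17000, 30000, 12000, 8000, 15000, 26307500])

# Percentile used as the plotting cutoff (assumed, not from the notebook).
cap = prices.quantile(0.90)

# Keep only values at or below the cutoff for plotting purposes.
trimmed = prices[prices <= cap]

print(len(prices), len(trimmed), trimmed.max())
```

Passing trimmed to kdeplot instead of the full column would show the main body of the distribution. The rows themselves stay in the frame, since only the plotted view is restricted.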

figure(figsize=(12, 4))
kdeplot(df['Levy'], fill=True, color='blue', alpha=0.6)
title(f'Kernel Density Estimate (KDE) - Levy', fontsize=16)
xlabel('Levy', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()

Concentration of Data
High Density at the Low End: The tall peak near the beginning of the x-axis indicates that most Levy values fall well below the
maximum (11714). The 5709 rows imputed with the mean (906.299205) all share one value, which sharpens this peak.
Very Few High Levy Values: The long tail extending to the right shows that there are a few items with very high Levy values, but
they are relatively rare. The density (the curve's height) is very low across this wide range of higher Levy values.

Interpretation
In simpler terms, this plot suggests that:

Most cars have Levy values far below the maximum.

There are fewer and fewer cars as the Levy value increases.

There are only a very small number of cars with extremely high Levy values.

figure(figsize=(12, 4))
kdeplot(df['Prod_year'], fill=True, color='blue', alpha=0.6)
title(f'Kernel Density Estimate (KDE) - Prod_year', fontsize=16)
xlabel('Prod_year', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()

Concentration of Data
High Density in Recent Years: The tall peak around 2010 indicates that a large proportion of the data points represent cars
produced in that period. There's a general increase in density from the 1980s onwards, showing a growing number of cars produced
in each subsequent year leading up to the peak.
Very Few Cars from Distant Past: The long tail extending to the left shows that there are very few cars from much earlier
production years (e.g., before 1980). The density is very low across this wide range of earlier years.

Interpretation
In simpler terms, this plot suggests that:

Most of the cars in the dataset were produced in recent years, with a large concentration around the 2010s.

The number of cars decreases significantly as we go further back in production year.

There are very few cars from the distant past within this dataset.

This type of distribution is common in car sales or inventory data, where there are typically more newer models
than older ones.

figure(figsize=(12, 4))
kdeplot(df['Engine_volume'], fill=True, color='blue', alpha=0.6)
title(f'Kernel Density Estimate (KDE) - Engine_volume', fontsize=16)
xlabel('Engine_volume', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()
Concentration of Data
High Density at Low Values: The tall peaks near the beginning of the x-axis indicate that most engine volumes are small relative
to the axis range, which extends to the largest value in the data (20). The highest peak is near 2, the most frequent engine volume.
Lower Density for Higher Values: The density drops off considerably after 2, indicating that larger engine volumes are less
common.

Interpretation
In simpler terms, this plot suggests that:

Engine volumes of around 2 are the most common.

Values just below 2 (such as 1.8, 1.6, 1.5, and 1.3) are also frequent, as the Top 10 Engine volume chart shows.

An engine volume of 0 does appear in the data, but it was not among the 10 most frequent values in that chart.

Larger engine volumes are less and less frequent.

figure(figsize=(12, 4))
kdeplot(df['Mileage'], fill=True, color='blue', alpha=0.6)
title(f'Kernel Density Estimate (KDE) - Mileage', fontsize=16)
xlabel('Mileage', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()

Concentration of Data
High Density Near Zero: The tall peak at the beginning of the x-axis indicates that a large proportion of the Mileage values are very
close to zero.
Very Few High Mileage Values: The long tail extending to the right shows that there are a few items with very high Mileage values,
but they are relatively rare. The density (the curve's height) is very low across this wide range of higher Mileage values.

Interpretation
In simpler terms, this plot suggests that:

Most of the items have very low Mileage values.

There are fewer and fewer items as the Mileage value increases.

There are only a very small number of items with extremely high Mileage values.
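When a few very large values stretch the x-axis, the bulk of the data looks compressed against zero, so it can help to back the visual reading with numbers. A minimal sketch, using synthetic stand-in values rather than the real `Mileage` column (the series below is hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df['Mileage'] (hypothetical values, not the real data):
# mostly ordinary odometer readings plus a few extreme entries.
rng = np.random.default_rng(0)
mileage = pd.Series(np.concatenate([
    rng.integers(0, 300_000, size=995),   # typical cars
    np.full(5, 2_000_000_000),            # a handful of extreme values
]))

# Quantiles show where the bulk of the data sits, independent of the extremes.
summary = mileage.quantile([0.25, 0.50, 0.75, 0.99])
skewness = mileage.skew()   # a positive value means a long right tail

print(summary)
print(f"skewness: {skewness:.2f}")
```

If the 99th percentile is orders of magnitude below the maximum, the "peak near zero" mostly reflects the axis scale rather than genuinely tiny values.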
Log transformation
# log transformation
figure(figsize=(12, 4))
kdeplot(log(df['Mileage']).replace(-inf,1e-6), fill=True, color='green', alpha=0.6)
title('Kernel Density Estimate (KDE) - log(Mileage)', fontsize=16)
xlabel('log(Mileage)', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()

C:\Users\RPC\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages
\Python311\site-packages\pandas\core\arraylike.py:399: RuntimeWarning: divide by zero encountered in log
result = getattr(ufunc, method)(*inputs, **kwargs)

Comparison with Original KDE Plot of Mileage

Skewness: The original KDE plot was extremely skewed to the right, with a very high peak near zero and a long tail. The log
transformation has significantly reduced this skewness.
Concentration: In the original plot, most of the data was concentrated at very low mileage values. After the transformation, the data
is more evenly distributed.

Interpretation:
Original Mileage KDE: Indicated that most items had very low mileage, with very few having high mileage.

Log-Transformed Mileage KDE: Shows a more "normal" distribution, making it easier to analyze the typical range of mileage (in log
terms) and how mileage values are distributed.
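The logarithm of 0 is undefined, which is where a "divide by zero encountered in log" warning comes from when a column contains zeros. One common alternative, shown here as a sketch on made-up values rather than as the notebook's method, is `numpy.log1p`, which computes log(1 + x) and is therefore defined at zero:

```python
import numpy as np
import pandas as pd

# Hypothetical mileage values, including a zero as for a brand-new car.
mileage = pd.Series([0, 1_000, 25_000, 120_000, 300_000, 2_000_000])

# log1p(x) = log(1 + x): defined at x = 0, so no -inf and no RuntimeWarning.
log_mileage = np.log1p(mileage)

print(log_mileage.round(3).tolist())
```

The inverse is `numpy.expm1`, which is useful if transformed values ever need to be mapped back to the original scale.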

# log transformation

figure(figsize=(12, 4))
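# log(0) yields -inf (not +inf), so -inf is the value to replace, as in the other log plots
kdeplot(log(df['Levy']).replace(-inf,1e-6), fill=True, color='green', alpha=0.6)
title('Kernel Density Estimate (KDE) - log(Levy)', fontsize=16)
xlabel('log(Levy)', fontsize=12)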
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()
Comparison with Original KDE Plot of Levy
Skewness: The original KDE plot was extremely skewed to the right, with a very high peak near zero and a long tail. The log
transformation has significantly reduced this skewness.
Concentration: In the original plot, most of the data was concentrated at very low Levy values. After the transformation, the data is
more evenly distributed.

Interpretation:
Original Levy KDE: Indicated that most items had very low Levy values, with very few having high Levy values.

Log-Transformed Levy KDE: Shows a more "normal" distribution, making it easier to analyze the typical range of Levy values (in
log terms) and how Levy values are distributed.

# log transformation

figure(figsize=(12, 4))
kdeplot(log(df['Engine_volume']).replace(-inf,1e-6), fill=True, color='green', alpha=0.6)
title('Kernel Density Estimate (KDE) - log(Engine_volume)', fontsize=16)
xlabel('log(Engine_volume)', fontsize=12)
ylabel('Density', fontsize=12)
grid(True, linestyle='--', alpha=0.7)
tight_layout()
show()

Comparison with Original KDE Plot of Engine_volume

Skewness: The original KDE plot was heavily skewed to the right, with a very high peak at 0. The log transformation has reduced
this skewness.
Concentration: In the original plot, a large portion of the data was concentrated at very low Engine_volume values. After the
transformation, the data is more evenly distributed.

Interpretation:
Original Engine_volume KDE: Indicated that many items had an engine volume of 0, and that larger engine volumes were less and
less frequent.
Log-Transformed Engine_volume KDE: Shows the distribution of the logarithm of engine volume, which may be more useful for
certain types of analysis.

Checking Correlation between the features

heatmap(df.select_dtypes(exclude='object').corr(),annot=True)
show()
Key Relationships from the Correlation Matrix
Engine_volume has a moderate positive correlation with Levy (0.54) and a strong positive correlation with Cylinders (0.78). This suggests that larger engine
volumes tend to be associated with higher Levy values and more cylinders.
Cylinders and Levy also have a moderate positive correlation (0.46).

Prod_year has a small positive correlation with Airbags (0.24), indicating a slight tendency for newer cars to have more airbags.

Price, Mileage, and Airbags show very weak correlations with other variables.

Overall
The KDE plots highlight the distributions of individual variables, showing that Price, Levy, and Mileage are heavily skewed, while
Engine_volume has multiple peaks. Log transformation helps to make these distributions more normal.
The correlation matrix shows some relationships between the variables, but most are weak. Engine_volume and Cylinders have the
strongest relationships with other variables.
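Reading the strongest pairs off a heatmap gets harder as the number of columns grows. A minimal sketch, on a small synthetic frame with hypothetical values, of ranking the pairs by absolute correlation:

```python
import numpy as np
import pandas as pd

# Small synthetic frame (hypothetical values) with one built-in dependency.
rng = np.random.default_rng(42)
engine = rng.uniform(1.0, 5.0, size=200)
demo = pd.DataFrame({
    "Engine_volume": engine,
    "Cylinders": np.round(engine * 1.5 + rng.normal(0, 0.5, size=200)),
    "Airbags": rng.integers(0, 13, size=200),
})

corr = demo.corr()

# Keep only the upper triangle (k=1 drops the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
        .stack()
        .dropna()
        .sort_values(key=np.abs, ascending=False)
)

print(pairs)
```

Sorting by absolute value keeps strong negative correlations near the top as well.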

Feature Interactions
pairplot(df[:100])
show()
Rows and Columns:
Each row and column represents a different variable: Price, Levy, Prod_year, Engine_volume, Mileage, Cylinders, and Airbags.

The diagonal plots (from top-left to bottom-right) show the distribution of each variable using a histogram or kernel density
estimation.
The off-diagonal plots are scatterplots, showing how one variable changes in relation to another. For example, the plot in the first row
and second column shows how Price changes with Levy.

General Observations
Most of the scatterplots appear to show a fair amount of scatter, indicating that the relationships between the variables are not
perfectly linear.
Some variables show more distinct patterns or trends than others.

Observations on Specific Variable Pairs

Based on the scatterplots, here are some observations about the relationships between specific pairs of variables:

Price vs. Levy: There seems to be a slight positive relationship. As Levy increases, Price tends to increase.

Price vs. Engine_volume: There is a positive relationship with a lot of scatter. As Engine_volume increases, Price tends to
increase.
Engine_volume vs. Cylinders: There appears to be a positive relationship. As Engine_volume increases, Cylinders tends to
increase.
Prod_year vs. Price: There is a positive relationship. As Prod_year increases, Price tends to increase.
Histograms on the Diagonal
The histograms on the diagonal provide information about the distribution of individual variables:

Price, Levy, and Mileage: These histograms are heavily skewed to the right, indicating that most of the data points have lower
values, with a few larger values.
Prod_year: This histogram is skewed to the left, indicating that most of the data points are from more recent years.

Engine_volume: This histogram shows that most of the data points are clustered at the lower end, with a few larger values.

Cylinders and Airbags: These histograms show that most cars have a specific number of cylinders and airbags, with fewer cars
having other values.

Detect Outliers

boxplot(df['Price'])
show()

Observations from the Plot

Median: The median price is relatively low, located towards the bottom of the box. This indicates that half of the prices are below this
value.
Spread: The box is relatively short, suggesting that the middle 50% of the data has a small range. This means that a majority of the
prices are clustered together.
Skewness: The plot is heavily skewed to the right.

The long whisker extending upwards shows that there's a wide range of higher prices.

Most of the data is compressed in the lower part of the distribution.

Outliers: There are several data points plotted as circles above the upper whisker. These are outliers, representing prices that are
significantly higher than the majority of the data.

Interpretation
Most prices are low.

The majority of the prices are clustered within a small range.

There are a few very high-priced items that are significantly different from the rest of the data.

The distribution of prices is not symmetrical but is skewed towards higher values.

df[df['Price']> 5e5]
       Price     Levy         Manufacturer   Model           Prod_year  Category     Leather_interior  Fuel_type  Engine_volume  Mileage  ...
1225   627220    906.299205   MERCEDES-BENZ  G 65 AMG 63AMG  2020       Jeep         Yes               Petrol     6.3            0        ...
8541   872946    2067.000000  LAMBORGHINI    Urus            2019       Universal    Yes               Petrol     4.0            2531     ...
16983  26307500  906.299205   OPEL           Combo           1999       Goods wagon  No                Diesel     1.7            99999    ...

boxplot(df['Levy'])
show()

boxplot(df['Mileage'])
show()

numerical_data=df[['Price','Levy','Engine_volume','Mileage','Cylinders','Airbags']]
for column in numerical_data.columns:
    Q1=numerical_data[column].quantile(0.25)
    Q3=numerical_data[column].quantile(0.75)
    IQR = Q3-Q1

    Lower_bound = Q1 - 1.5*IQR
    Upper_bound = Q3 + 1.5*IQR

    outliers = ((numerical_data[column]>Upper_bound)|(numerical_data[column]<Lower_bound)).sum()
    Total = numerical_data[column].shape[0]
    print(f'Total of outliers in {column} are : {outliers}--{round(100*(outliers)/Total,2)}%')

    if outliers > 0:
        df=df.loc[(df[column] <= Upper_bound) & (df[column] >= Lower_bound)]
Total of outliers in Price are : 1055--5.57%
Total of outliers in Levy are : 3103--16.4%
Total of outliers in Engine_volume are : 1358--7.18%
Total of outliers in Mileage are : 635--3.36%
Total of outliers in Cylinders are : 4765--25.18%
Total of outliers in Airbags are : 0--0.0%

Cylinders has the highest percentage of outliers (25.18%), with a total of 4765 outliers.

Levy also has a substantial number of outliers (3103), representing 16.4% of its data.

Engine_volume and Price have a moderate number of outliers, 1358 (7.18%) and 1055 (5.57%) respectively.

Mileage has a relatively small percentage of outliers (635, 3.36%).

Airbags has no outliers (0%).

In essence, "Cylinders" and "Levy" contain the most outliers, suggesting that these variables have values that
frequently deviate significantly from the norm.
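One plausible reason a count-like column such as Cylinders tops the list: when most rows share a single value, Q1 and Q3 coincide, the IQR is 0, and the 1.5 × IQR rule flags every other value. A minimal sketch with made-up values; `iqr_bounds` is a hypothetical helper, not part of the notebook:

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up column dominated by one value, as a cylinder count might be.
cylinders = pd.Series([4, 4, 4, 4, 4, 4, 4, 4, 6, 8])

lower, upper = iqr_bounds(cylinders)
flagged = cylinders[(cylinders < lower) | (cylinders > upper)]

print(lower, upper)        # 4.0 4.0 -> the IQR is 0
print(flagged.tolist())    # [6, 8]
```

In such a case the flagged rows are legitimate 6- and 8-cylinder cars rather than measurement errors, so it may be worth treating discrete columns separately from continuous ones.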

# def outliers(df, col):
#     Q1 = df[col].quantile(0.25)
#     Q3 = df[col].quantile(0.75)
#     IQR = Q3-Q1
#
#     lower_bound = Q1 - 1.5 * IQR
#     upper_bound = Q3 + 1.5 * IQR   # upper fence is based on Q3, not Q1
#
#     df_no_outliers = df[(df[col] >= lower_bound) & (df[col] <= upper_bound)]
#     return df_no_outliers

# df= outliers(df,'Price')
# df= outliers(df,'Mileage')
# df= outliers(df,'Levy')
# df= outliers(df,'Engine_volume')
# df= outliers(df,'Cylinders')
# df= outliers(df,'Airbags')

df.shape

(11520, 16)

# Creating dataset
data = df['Price']

# Creating figure and axes instance
fig = figure(figsize=(8, 5))
ax = fig.add_subplot(111)

# Horizontal, notched box plot with a filled box
bp = ax.boxplot(data, patch_artist=True, notch=True, vert=False)

colors = ['#0000FF']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)

# changing color and linewidth of whiskers
for whisker in bp['whiskers']:
    whisker.set(color='#8B008B', linewidth=1.5, linestyle=":")

# changing color and linewidth of caps
for cap in bp['caps']:
    cap.set(color='#8B008B', linewidth=2)

# changing color and linewidth of medians
for median in bp['medians']:
    median.set(color='red', linewidth=3)

# changing style of fliers
for flier in bp['fliers']:
    flier.set(marker='D', color='#e7298a', alpha=0.5)

# y-axis label (the box plot is horizontal)
ax.set_yticklabels(['Price'])

# Adding title
title("Price box plot")

# Keeping ticks on the bottom and left axes only
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()

# show plot
show()

C:\Users\RPC\AppData\Local\Temp\ipykernel_3728\4101698197.py:42: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
ax.set_yticklabels(['Price'])

The long whisker extending upwards shows that there's a wide range of higher prices.

Most of the data is compressed in the lower part of the distribution.

Outliers: There are several data points plotted as circles above the upper whisker. These are outliers, representing prices that are
significantly higher than the majority of the data.

Interpretation
Most prices are low.

The majority of the prices are clustered within a small range.

There are a few very high-priced items that are significantly different from the rest of the data.

The distribution of prices is not symmetrical but is skewed towards higher values.

Target Variable Analysis

Relationship with Predictors (scatter plots, box Plots against the target)

top_10_categs = df['Airbags'].value_counts().index[:10]
filtered_df = df[df['Airbags'].isin(top_10_categs)]

figure(figsize=(14,6))
boxplot(x=filtered_df['Airbags'], y=filtered_df['Price'])
title(f'Box plot of Airbags vs Price')
show()
Observations:
General Trend: There appears to be a general, albeit not perfectly linear, positive relationship between the number of airbags and
the median price. As the number of airbags increases, the median price tends to increase.
Variability in Price: For most airbag counts, there's a significant spread in price, as indicated by the height of the boxes and the
length of the whiskers. This means that for a given number of airbags, there's a wide range of prices.
Outliers: Many of the airbag categories show a considerable number of outliers, particularly on the higher end of the price spectrum.
This suggests that there are vehicles with a certain number of airbags that are priced significantly higher than the typical range for
that category.

Specific Airbag Counts:

0 Airbags: Shows a relatively low median price and a wide spread.

1, 2, 5 Airbags: These categories seem to have lower median prices compared to their neighbors.

4 Airbags: This category stands out with a notably higher median price and a larger interquartile range (taller box) compared to 0, 1,
2, or 5 airbags.
6, 7, 8, 10 Airbags: These categories generally show increasing median prices. The median for 10 airbags appears to be one of the
highest.
12 Airbags: This category has a relatively high median price, but also a particularly wide spread, with some very low prices and
many high outliers.

Relationship with Target (Price):

Positive Association: There is a discernible positive association between the number of airbags and the price. Vehicles with more
airbags tend to have higher median prices. This makes intuitive sense, as more safety features often correlate with higher vehicle
segments or newer models.
Not a Perfect Predictor: While there's a trend, the significant overlap in price ranges across different airbag counts, and the
presence of many outliers, indicate that the number of airbags alone is not a strong, singular predictor of price. Other factors clearly
play a substantial role.
Potential for Feature Engineering: This relationship suggests that 'Airbags' is a relevant feature for predicting 'Price'. Further
analysis, or perhaps combining 'Airbags' with other features, could yield stronger predictive power.
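The medians and box heights read from the plot can be checked numerically with a grouped summary. A minimal sketch on made-up rows standing in for the `Airbags` and `Price` columns (the values are hypothetical):

```python
import pandas as pd

# Made-up rows standing in for df[['Airbags', 'Price']].
demo = pd.DataFrame({
    "Airbags": [0, 0, 4, 4, 4, 12, 12, 12],
    "Price":   [5000, 7000, 15000, 18000, 21000, 9000, 20000, 40000],
})

# Row count, quartiles and median per airbag count: the numbers a box plot draws.
stats = demo.groupby("Airbags")["Price"].describe()[["count", "25%", "50%", "75%"]]

print(stats)
```

The `count` column also shows which groups are too small for their median to be trusted.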

top_10_categs = df['Engine_volume'].value_counts().index[:10]
filtered_df = df[df['Engine_volume'].isin(top_10_categs)]

figure(figsize=(14,6))
boxplot(x=filtered_df['Engine_volume'], y=filtered_df['Price'])
title(f'Box plot of Engine_volume vs Price')
show()
Observations:
General Trend: There appears to be a general positive relationship between engine volume and the median price. As the engine
volume increases, the median price tends to increase, although this trend is not perfectly linear and has some fluctuations.
Variability in Price: For most engine volumes, there's a significant spread in price, as indicated by the height of the boxes and the
length of the whiskers. This means that for a given engine volume, there's a wide range of prices.
Outliers: Many of the engine volume categories show a considerable number of outliers, particularly on the higher end of the price
spectrum. This suggests that there are vehicles with a certain engine volume that are priced significantly higher than the typical
range for that category.

Specific Engine Volumes:

1.3 and 1.4: These tend to have lower median prices and a narrower interquartile range.

1.6: Shows a higher median price and a wider spread compared to 1.3-1.5.

1.7: This category stands out with a very wide interquartile range (tallest box) and a relatively high median price, indicating a large
variability in price for this engine size.
1.8, 2.0, 2.2, 2.4, 2.5: Generally show increasing median prices as engine volume increases, with 2.2 and 2.5 having some of the
highest median prices.

Relationship with Target (Price):

Positive Association: There is a discernible positive association between engine volume and price. Vehicles with larger engine
volumes tend to have higher median prices. This is often expected, as larger engines are typically found in more expensive or
higher-performance vehicles.
Not a Perfect Predictor: While there's a trend, the significant overlap in price ranges across different engine volumes, and the
presence of many outliers, indicate that engine volume alone is not a strong, singular predictor of price. Other factors clearly play a
substantial role.
Potential for Feature Engineering: This relationship suggests that 'Engine_volume' is a relevant feature for predicting 'Price'.
Further analysis, or perhaps combining 'Engine_volume' with other features, could yield stronger predictive power.

Feature Extraction

# Date
dtime=datetime.now()
# calculate the age of each car
df['Age_of_Car']=dtime.year-df['Prod_year']

# df = df.drop(columns=['Prod_year'], axis=1)

df[['Age_of_Car','Prod_year']]
Age_of_Car Prod_year

2 19 2006

3 14 2011

5 9 2016

6 15 2010

7 12 2013

... ... ...

19230 14 2011

19232 26 1999

19233 14 2011

19234 15 2010

19236 13 2012

11520 rows × 2 columns
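Because `datetime.now()` is evaluated at run time, `Age_of_Car` shifts by one every calendar year, so a model trained in one year would see different ages when scored in the next. A minimal sketch that pins the reference year; `REFERENCE_YEAR = 2025` is an assumption inferred from the output above (2006 gives 19), not something the notebook states:

```python
import pandas as pd

REFERENCE_YEAR = 2025  # assumption: the year in which the notebook was run

demo = pd.DataFrame({"Prod_year": [2006, 2011, 2016, 2010, 2013]})
demo["Age_of_Car"] = REFERENCE_YEAR - demo["Prod_year"]

print(demo["Age_of_Car"].tolist())   # [19, 14, 9, 15, 12]
```

Since `Age_of_Car` is a linear function of `Prod_year`, keeping both columns adds no new information; the commented-out `drop` would remove that redundancy.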

Transform Data

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 11520 entries, 2 to 19236
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Price 11520 non-null int64
1 Levy 11520 non-null float64
2 Manufacturer 11520 non-null object
3 Model 11520 non-null object
4 Prod_year 11520 non-null int64
5 Category 11520 non-null object
6 Leather_interior 11520 non-null object
7 Fuel_type 11520 non-null object
8 Engine_volume 11520 non-null float64
9 Mileage 11520 non-null int64
10 Cylinders 11520 non-null int64
11 Gear_box_type 11520 non-null object
12 Drive_wheels 11520 non-null object
13 Wheel 11520 non-null object
14 Color 11520 non-null object
15 Airbags 11520 non-null int64
16 Age_of_Car 11520 non-null int64
dtypes: float64(2), int64(6), object(9)
memory usage: 1.6+ MB

# Splitting data into object and non-object (numeric) columns

df_object = df.select_dtypes('object')
df_non_object = df.select_dtypes('number')

def number_unique_columns(data):
    for i in data.columns:
        print(f'{i} : {data[i].nunique()}')

number_unique_columns(df_object)

Manufacturer : 55
Model : 953
Category : 11
Leather_interior : 2
Fuel_type : 6
Gear_box_type : 4
Drive_wheels : 3
Wheel : 2
Color : 16

# for label encoding

df_object_for_LB = df_object[['Manufacturer','Model','Category','Fuel_type','Color','Leather_interior','Wheel']]
LabelEncoders = {}
for col in df_object_for_LB:
    label = LabelEncoder()
    df_object_for_LB[col]=label.fit_transform(df_object_for_LB[col])
    LabelEncoders[col] = label

C:\Users\RPC\AppData\Local\Temp\ipykernel_3728\4043312042.py:8: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_object_for_LB[col]=label.fit_transform(df_object_for_LB[col])

LabelEncoders

{'Manufacturer': LabelEncoder(),
'Model': LabelEncoder(),
'Category': LabelEncoder(),
'Fuel_type': LabelEncoder(),
'Color': LabelEncoder(),
'Leather_interior': LabelEncoder(),
'Wheel': LabelEncoder()}

# mapping
mapping = {category : index for index, category in enumerate(LabelEncoders['Category'].classes_)}
print(mapping)

{'Cabriolet': 0, 'Coupe': 1, 'Goods wagon': 2, 'Hatchback': 3, 'Jeep': 4, 'Limousine': 5, 'Microbus': 6, 'Minivan': 7, 'Pickup': 8, 'Sedan': 9, 'Universal': 10}
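The mapping is alphabetical because `LabelEncoder` stores its classes in sorted order. A minimal sketch on a made-up column that also shows one way to avoid `SettingWithCopyWarning`, by encoding an explicit `.copy()` of the selected columns (the frame below is hypothetical):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

demo = pd.DataFrame({"Category": ["Sedan", "Jeep", "Sedan", "Coupe"]})

# .copy() gives an independent frame, so the assignment below does not
# trigger SettingWithCopyWarning.
encoded = demo[["Category"]].copy()

encoder = LabelEncoder()
encoded["Category"] = encoder.fit_transform(encoded["Category"])

# classes_ is sorted, so the integer codes follow alphabetical order.
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)                                      # {'Coupe': 0, 'Jeep': 1, 'Sedan': 2}
print(encoder.inverse_transform([2, 1]).tolist())   # ['Sedan', 'Jeep']
```

The integer codes carry no meaning beyond identity: 'Sedan' is not "greater than" 'Coupe', which matters more for linear models than for tree-based ones.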

# # Save Label encoder for using

# with open('label_encoders.pkl','wb') as f :
# dump(LabelEncoders, f)

# for one hot encoding

categorical_cols = df_object[['Gear_box_type', 'Drive_wheels']].columns

ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
one_hot_encoded = ohe.fit_transform(df_object[categorical_cols])
one_hot_columns = ohe.get_feature_names_out(categorical_cols)
df_ohe = DataFrame(one_hot_encoded, columns=one_hot_columns, index=df.index)
df_for_ohe = df_object.drop(columns=categorical_cols).join(df_ohe)

df_for_ohe = df_for_ohe.drop(['Manufacturer','Model','Category','Fuel_type','Color','Leather_interior', 'Wheel'],axis

df_for_ohe.head()

Gear_box_type_Automatic Gear_box_type_Manual Gear_box_type_Tiptronic Gear_box_type_Variator Drive_wheels_4x4 Drive_wheels_F

2 0.0 0.0 0.0 1.0 0.0

3 1.0 0.0 0.0 0.0 1.0

5 1.0 0.0 0.0 0.0 0.0

6 1.0 0.0 0.0 0.0 0.0

7 1.0 0.0 0.0 0.0 0.0

df_for_ohe.shape

(11520, 7)

# # save one hot encoder

# with open('One_Hot_Encoder.pkl', 'wb') as f:

# dump(ohe,f)

df = concat([df_non_object, df_object_for_LB, df_for_ohe],axis=1)

df.head()

Price Levy Prod_year Engine_volume Mileage Cylinders Airbags Age_of_Car Manufacturer Model ... Color Leather_inte

2 8467 906.299205 2006 1.3 200000 4 2 19 17 412 ... 1

3 3607 862.000000 2011 2.5 168966 4 0 14 13 397 ... 14

5 39493 891.000000 2016 2.0 160931 4 4 9 18 761 ... 14

6 1803 761.000000 2010 1.8 258909 4 12 15 48 694 ... 14

7 549 751.000000 2013 2.4 216118 4 12 12 18 782 ... 7

5 rows × 22 columns

df.shape

(11520, 22)

df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 11520 entries, 2 to 19236
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Price 11520 non-null int64
1 Levy 11520 non-null float64
2 Prod_year 11520 non-null int64
3 Engine_volume 11520 non-null float64
4 Mileage 11520 non-null int64
5 Cylinders 11520 non-null int64
6 Airbags 11520 non-null int64
7 Age_of_Car 11520 non-null int64
8 Manufacturer 11520 non-null int32
9 Model 11520 non-null int32
10 Category 11520 non-null int32
11 Fuel_type 11520 non-null int32
12 Color 11520 non-null int32
13 Leather_interior 11520 non-null int32
14 Wheel 11520 non-null int32
15 Gear_box_type_Automatic 11520 non-null float64
16 Gear_box_type_Manual 11520 non-null float64
17 Gear_box_type_Tiptronic 11520 non-null float64
18 Gear_box_type_Variator 11520 non-null float64
19 Drive_wheels_4x4 11520 non-null float64
20 Drive_wheels_Front 11520 non-null float64
21 Drive_wheels_Rear 11520 non-null float64
dtypes: float64(9), int32(7), int64(6)
memory usage: 1.7 MB

Model

Splitting Data into train data and test data

x = df.drop('Price',axis=1)
y = df['Price']

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.15,random_state=1234)

print(f'x_train : {x_train.shape}')
print(f'x_test : {x_test.shape}')
print('--------------------------')
print(f'y_train : {y_train.shape}')
print(f'y_test : {y_test.shape}')

x_train : (9792, 21)

x_test : (1728, 21)
--------------------------
y_train : (9792,)
y_test : (1728,)

x_train.columns

Index(['Levy', 'Prod_year', 'Engine_volume', 'Mileage', 'Cylinders', 'Airbags',

'Age_of_Car', 'Manufacturer', 'Model', 'Category', 'Fuel_type', 'Color',
'Leather_interior', 'Wheel', 'Gear_box_type_Automatic',
'Gear_box_type_Manual', 'Gear_box_type_Tiptronic',
'Gear_box_type_Variator', 'Drive_wheels_4x4', 'Drive_wheels_Front',
'Drive_wheels_Rear'],
dtype='object')

Standard Scaling

scaler = StandardScaler()

x_train[['Levy','Engine_volume','Mileage','Age_of_Car']] = scaler.fit_transform(x_train[['Levy','Engine_volume','Mile
# transform only: reuse the statistics learned from the training split
x_test[['Levy','Engine_volume','Mileage','Age_of_Car']] = scaler.transform(x_test[['Levy','Engine_volume','Mileag
# # saving scaling
# with open('scaler.pkl' , 'wb') as file :
# dump(scaler, file)
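A common convention is to fit the scaler on the training split only and apply the learned mean and standard deviation to the test split, so that no information from the test rows leaks into preprocessing. A minimal sketch with made-up numbers (the `_demo` arrays are hypothetical):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature splits.
x_train_demo = np.array([[10.0], [20.0], [30.0]])
x_test_demo = np.array([[20.0], [40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(x_train_demo)  # learns mean = 20, std about 8.165
test_scaled = scaler.transform(x_test_demo)        # reuses those statistics

print(scaler.mean_, scaler.scale_)
print(test_scaled.ravel())
```

The test value 20 maps to 0 because it equals the training mean; refitting on the test split would instead centre it on the test mean of 30.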

Creating Model

r_2=[]
rmse=[]
mae=[]

def reg(model):
    model.fit(x_train,y_train)
    pred = model.predict(x_test)

    R2 = r2_score(y_test,pred)
    RMSE = sqrt(mean_squared_error(y_test,pred))
    MAE = mean_absolute_error(y_test,pred)

    r_2.append(R2)
    rmse.append(RMSE)
    mae.append(MAE)

XGBRegressor_model = XGBRegressor()
RandomForestRegressor_model = RandomForestRegressor()
DecisionTreeRegressor_model = DecisionTreeRegressor()
GradientBoostingRegressor_model = GradientBoostingRegressor()

reg(XGBRegressor_model)
reg(RandomForestRegressor_model)
reg(DecisionTreeRegressor_model)
reg(GradientBoostingRegressor_model)

Algorithms = ['RandomForestRegressor','XGBRegressor','DecisionTreeRegressor','GradientBoostingRegressor']

result=DataFrame({'Algorithms':Algorithms,'R2':r_2,'rmse':rmse,'mae':mae})
result

Algorithms R2 rmse mae

0 RandomForestRegressor 0.780724 5288.389925 3628.887207

1 XGBRegressor 0.807756 4951.706688 3233.463064

2 DecisionTreeRegressor 0.633617 6835.904468 4086.539634

3 GradientBoostingRegressor 0.709942 6082.339520 4419.081908
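For reference, the three metrics in the table can be written out directly. A minimal sketch with made-up targets and predictions, using plain NumPy instead of the scikit-learn helpers:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))                 # mean absolute error
rmse = np.sqrt(np.mean(residuals ** 2))          # root mean squared error
ss_res = np.sum(residuals ** 2)                  # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
r2 = 1.0 - ss_res / ss_tot                       # share of variation explained

print(mae, round(rmse, 4), r2)   # 1.0 1.2247 0.7
```

rmse and mae are in the units of the target (price), while R2 is unitless; rmse penalises large errors more heavily than mae does, which is why two models can rank differently on the two.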

Comparison of Algorithms:
According to the table above, XGBRegressor shows the best performance across all three metrics:
It has the highest R2 value (0.8078), indicating that it explains the largest proportion of the variance in the target variable.

It has the lowest rmse (4951.71) and mae (3233.46), indicating the smallest average errors in its predictions.

RandomForestRegressor performs the second best, with a good R2 value (0.7807) and relatively low error
metrics.
GradientBoostingRegressor comes in third on R2 (0.7099) and rmse (6082.34), but it has the highest mae of the
four (4419.08).
DecisionTreeRegressor has the weakest performance among the four on R2 and rmse:
It has the lowest R2 value (0.6336), suggesting it explains the least amount of variance.

It has the highest rmse (6835.90), although its mae (4086.54) is lower than that of GradientBoostingRegressor.
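The row labels in the result table depend on `Algorithms` being in the same order as the `reg(...)` calls. In the cells above, the scores are appended in call order (`XGBRegressor_model` first), while `Algorithms` starts with 'RandomForestRegressor', so the first two labels may not line up with their scores and are worth re-checking. A minimal sketch that takes each label from the same dictionary that supplies the model, so the two cannot drift apart; `evaluate` and the stand-in "models" are hypothetical:

```python
import numpy as np

def evaluate(models, x, y):
    """Return one row per model, labelled from the same dict that supplies it."""
    rows = []
    for name, predict in models.items():
        pred = predict(x)
        rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
        rows.append({"Algorithm": name, "rmse": rmse})
    return rows

# Hypothetical stand-ins for fitted models: plain prediction functions.
models = {
    "always_mean": lambda x: np.full(len(x), 6.0),
    "identity": lambda x: x.astype(float),
}

x = np.array([3, 5, 7, 9])
y = np.array([3.0, 5.0, 7.0, 9.0])

rows = evaluate(models, x, y)
for row in rows:
    print(row)
```

With real estimators, the dictionary values would be the model objects and the loop body would call `fit` and `predict` as `reg` does.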

fig,sx = subplots(figsize=(14,4))
plot(result.Algorithms,result.R2,label='R2',c='b',marker='o')
legend()
show()
fig,sx = subplots(figsize=(14,4))
plot(result.Algorithms,result.rmse,label='rmse',c='r',marker='o')
legend()
show()

from matplotlib.pyplot import scatter

RandomForestRegressor_model.fit(x_train,y_train)
pred = RandomForestRegressor_model.predict(x_test)
scatter(y_test, pred, color='r')
show()

XGBRegressor_model.fit(x_train,y_train)
pred = XGBRegressor_model.predict(x_test)
scatter(y_test, pred, color='r')
show()
DecisionTreeRegressor_model.fit(x_train,y_train)
pred = DecisionTreeRegressor_model.predict(x_test)
scatter(y_test, pred, color='r')
show()

GradientBoostingRegressor_model.fit(x_train,y_train)
pred = GradientBoostingRegressor_model.predict(x_test)
scatter(y_test, pred, color='r')
show()
# # saving model
# with open('XGBRegressor_model.pkl' , 'wb') as file3 :
# dump(XGBRegressor_model, file3)

# # saving model
# with open('RandomForestRegressor_model.pkl' , 'wb') as file3 :
# dump(RandomForestRegressor_model, file3)
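The commented-out cells call `dump`, which does not appear among the imports shown at the top of the notebook. Assuming it refers to `pickle.dump` from the standard library, a minimal round-trip sketch looks like this; the `artifact` dictionary is a hypothetical stand-in for a fitted model, scaler, or encoder:

```python
import pickle
from pathlib import Path
from tempfile import TemporaryDirectory

# Stand-in for a fitted model or encoder dictionary (hypothetical content).
artifact = {"name": "demo_model", "params": {"n_estimators": 100}}

with TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_model.pkl"

    with open(path, "wb") as f:      # write in binary mode
        pickle.dump(artifact, f)

    with open(path, "rb") as f:      # read back in binary mode
        restored = pickle.load(f)

print(restored == artifact)   # True
```

Pickle files should only be loaded from trusted sources, and a pickled estimator is generally expected to be loaded with the same library version that saved it.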

Conclusion

Based on these metrics, the XGBRegressor is the best-performing algorithm for this specific regression
task. It provides the most accurate predictions with the least amount of error and explains the most
variance in the target variable. The DecisionTreeRegressor appears to be the least suitable model among
the four.

!jupyter nbconvert --to html "Car_Price_prediction.ipynb"

[NbConvertApp] Converting notebook Car_Price_prediction.ipynb to html

[NbConvertApp] WARNING | Alternative text is missing on 33 image(s).
[NbConvertApp] Writing 2187045 bytes to Car_Price_prediction.html

Hybrid Manual
100% (3)
Hybrid Manual
236 pages
Data Mining
No ratings yet
Data Mining
10 pages
Belarus Car Price Prediction
No ratings yet
Belarus Car Price Prediction
18 pages
elite-sports-cars-eda
No ratings yet
elite-sports-cars-eda
9 pages
DV ca-1
No ratings yet
DV ca-1
9 pages
car-price-prediction-1 (1)
No ratings yet
car-price-prediction-1 (1)
24 pages
9587_9638_9563_ADS_exp1.ipynb - Colab
No ratings yet
9587_9638_9563_ADS_exp1.ipynb - Colab
8 pages
Internship
No ratings yet
Internship
23 pages
22eg107a11 DWV
No ratings yet
22eg107a11 DWV
15 pages
Car Price Prediction Using ML
No ratings yet
Car Price Prediction Using ML
11 pages
SMDM Business+Report
No ratings yet
SMDM Business+Report
11 pages
SMDM-Business Report
No ratings yet
SMDM-Business Report
11 pages
Lab Assignment 6
No ratings yet
Lab Assignment 6
5 pages
Trilokesh Assignment
No ratings yet
Trilokesh Assignment
15 pages
vertopal.com_Numpy,,Pandas(24.4.25)
No ratings yet
vertopal.com_Numpy,,Pandas(24.4.25)
1 page
SMDM Business+Report
No ratings yet
SMDM Business+Report
11 pages
Cars Sales Dashboard
No ratings yet
Cars Sales Dashboard
19 pages
SMDM-Business Report
No ratings yet
SMDM-Business Report
11 pages
SMDM-Business Report
No ratings yet
SMDM-Business Report
11 pages
Pyt On Visualization
No ratings yet
Pyt On Visualization
50 pages
Quikr Car Price Prediction Using Linear Regression 1717999953
No ratings yet
Quikr Car Price Prediction Using Linear Regression 1717999953
12 pages
Mohy - Jupyter Notebook
No ratings yet
Mohy - Jupyter Notebook
3 pages
BDA-4 EDA Project