Electric Vehicle Range Prediction-Regression Analysis
1. Fermi estimation:
Fermi estimation, also known as order-of-magnitude estimation, is a
problem-solving technique used to make rough calculations and
approximate solutions based on reasonable assumptions. To perform a
Fermi estimation for an electric vehicle market project, we need to
break down the problem statement into smaller components. Here's an
example breakdown:
Vehicle lifespan:
Determine the average lifespan of vehicles, e.g., 10 years.
Replacement rate:
Assume that each vehicle is replaced with a new one after its lifespan
ends.
Estimate the annual growth rate of the electric vehicle market, e.g.,
20%.
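The breakdown above can be turned into a small calculation. The following is a minimal sketch, not part of the original notebook: the fleet size and the current EV share are hypothetical placeholders, while the 10-year lifespan and 20% growth rate are the example values quoted above.

```python
# Fermi estimate of yearly EV sales from replacement demand.
# total_vehicles and ev_share_now are hypothetical assumptions, not measured data.
total_vehicles = 1_000_000   # assumed size of the vehicle fleet
lifespan_years = 10          # example lifespan from the text
ev_share_now = 0.02          # assumed share of EVs among replacements today
ev_growth_rate = 0.20        # example annual growth rate from the text


def fermi_ev_sales(total_vehicles, lifespan_years, ev_share, growth_rate, years):
    """Return a rough per-year estimate of EV sales."""
    # Each vehicle is replaced once per lifespan, so yearly demand is fleet / lifespan.
    replacements_per_year = total_vehicles / lifespan_years
    estimates = []
    for year in range(years):
        # The EV share compounds yearly and cannot exceed 100%.
        share = min(1.0, ev_share * (1 + growth_rate) ** year)
        estimates.append(round(replacements_per_year * share))
    return estimates


estimates = fermi_ev_sales(total_vehicles, lifespan_years,
                           ev_share_now, ev_growth_rate, years=5)
print(estimates)
```

Only the order of magnitude matters in such an estimate, so the inputs can stay coarse.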
2. Data Collection:
https://www.tesla.com/ownersmanual/modely/en_kr/GUID-4AC32116-979A-4146-A935-F41F8551AFE6.html
https://iq.opengenus.org/advantages-and-disadvantages-of-linear-regression/
https://www.statology.org/linear-regression-assumptions/
Data Preprocessing
Libraries
0.1.1 Loading
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
from scipy.stats import norm
import warnings
warnings.filterwarnings('ignore')
df=data.copy()
df.head()
[3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 103 entries, 0 to 102
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype
 6   FastCharge_KmH  103 non-null    int64
 7   RapidCharge     103 non-null    object
 8   PowerTrain      103 non-null    object
 9   PlugType        103 non-null    object
 10  BodyStyle       103 non-null    object
 11  Segment         103 non-null    object
 12  Seats           103 non-null    int64
 13  Price           103 non-null    float64
dtypes: float64(2), int64(5), object(7)
memory usage: 11.4+ KB
[4]: df.describe()
Seats Price
count 103.000000 1.030000e+02
mean 4.883495 5.033056e+06
std 0.795834 3.077267e+06
min 2.000000 1.814931e+06
25% 5.000000 3.104336e+06
50% 5.000000 4.057425e+06
75% 5.000000 5.860725e+06
max 7.000000 1.938548e+07
[6]: df.head()
  RapidCharge PowerTrain    PlugType  BodyStyle Segment  Seats       Price
0         Yes        AWD  Type 2 CCS      Sedan       D      5  5002354.20
1          No        RWD  Type 2 CCS  Hatchback       C      5  2704950.00
2         Yes        AWD  Type 2 CCS   Liftback       D      5  5088912.60
3         Yes        RWD  Type 2 CCS        SUV       D      5  6134826.60
4         Yes        RWD  Type 2 CCS  Hatchback       B      4  2975174.51
Model_Brand
0 Tesla Model 3 Long Range Dual Motor
1 Volkswagen ID.3 Pure
2 Polestar 2
3 BMW iX3
4 Honda e
0.1.4 Exploration
[7]: cat_column=['RapidCharge','PowerTrain','PlugType','BodyStyle','Segment','Seats']
Interpretation
• Most of the vehicles support rapid charging.
• AWD is the most common power train.
• Type 2 CCS is the most common plug type.
• SUV is the most common body style.
• Segment C (medium) has the most vehicles.
• Most of the vehicles have 5 seats.
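The plotting cells behind these observations are not shown. As a minimal sketch, the same conclusions can be read from per-column value counts. The small `sample` frame below is hypothetical and only reuses column names from `df.info()`; on the real data `sample` would be replaced by `df`.

```python
import pandas as pd

# Hypothetical miniature of the dataset, for illustration only.
sample = pd.DataFrame({
    "RapidCharge": ["Yes", "Yes", "No", "Yes"],
    "PowerTrain": ["AWD", "RWD", "AWD", "FWD"],
    "BodyStyle": ["SUV", "Sedan", "SUV", "Hatchback"],
})
cat_column = ["RapidCharge", "PowerTrain", "BodyStyle"]

# For each categorical column, find the most frequent category.
most_common = {col: sample[col].value_counts().idxmax() for col in cat_column}

for col in cat_column:
    print(col)
    print(sample[col].value_counts())
```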
[9]: df['Seats']=df['Seats'].astype('str')
[10]: df['RapidCharge']=df['RapidCharge'].map({'Yes':1,'No':0})
num_df=df[num_column]
num_df.set_index(keys=df.Model_Brand,inplace=True)
Interpretation
• Tesla Roadster has the highest top speed of all the vehicles.
• Tesla Roadster also has the highest range (km), followed by Tesla Cybertruck and Lucid
Air.
• Tesla Model 3 Long Range, Tesla Model Y Long Range and Tesla Roadster have the highest
fast-charge speeds (FastCharge_KmH).
[14]: df.columns
'Segment', 'Seats', 'Price', 'Model_Brand'],
dtype='object')
Interpretation
• Tesla Roadster has the lowest acceleration time (AccelSec) of all the vehicles.
• Lightyear One has the lowest Efficiency_WhKm, which means it is the most efficient.
• SEAT Mii, Smart EQ, Volkswagen e-Up and Smart EQ forfour have the lowest prices.
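sns.set(style='whitegrid')
fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(10, 8))
# The loop header was lost at a page break; it is reconstructed from the loop
# body and assumed to iterate over num_attributes.
# `colors` is defined in a cell that is not shown.
for i, attribute in enumerate(num_attributes):
    row = i // 3  # row of the subplot in the 2x3 grid
    col = i % 3   # column of the subplot in the 2x3 grid
    sns.boxplot(y='value', data=pd.melt(df[[attribute]]), ax=axs[row, col],
                color=colors[i])
    axs[row, col].set_xlabel('')
    axs[row, col].set_ylabel(attribute)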
There is a strong positive correlation between:
• Price and FastCharge_KmH.
• Price and Range_Km.
• Price and TopSpeed_KmH.
• TopSpeed_KmH and FastCharge_KmH.
• TopSpeed_KmH and Range_Km.
• Range_Km and FastCharge_KmH.
AccelSec is negatively correlated with every other attribute.
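The cell that produced the correlation plot is not shown. As a minimal sketch, the pairs listed above can be read off a Pearson correlation matrix. The `sample` values below are hypothetical and only mimic the direction of the relationships; the 0.7 cutoff for "strong" is likewise an assumption.

```python
import pandas as pd

# Hypothetical values for illustration; column names follow the dataset.
sample = pd.DataFrame({
    "AccelSec": [4.6, 10.0, 4.7, 6.8, 9.5],
    "TopSpeed_KmH": [233, 160, 210, 180, 145],
    "Range_Km": [450, 270, 400, 360, 170],
    "FastCharge_KmH": [940, 250, 620, 560, 190],
})

# Pearson correlation between every pair of numeric columns.
corr = sample.corr()

# Collect each pair once and keep those whose absolute correlation exceeds 0.7.
strong = [
    (a, b, round(corr.loc[a, b], 2))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.7
]
print(strong)
```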
[20]: sns.pairplot(df[num_attributes],diag_kind='kde')
plt.show()
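[21]: def check_skweness(df, columnName):
          # Fit a normal distribution to the column; retry without missing
          # values if the fit fails (the exception type depends on the SciPy version).
          try:
              (mu, sigma) = norm.fit(df[columnName])
          except (RuntimeError, ValueError):
              (mu, sigma) = norm.fit(df[columnName].dropna())
          print("Mu {} : {}, Sigma {} : {}".format(
              columnName.upper(), mu, columnName.upper(), sigma))
          plt.figure(figsize=(20, 10))
          # Note: sns.distplot is deprecated in recent seaborn releases.
          sns.distplot(df[columnName], fit=norm, color="orange")
          plt.title(columnName.upper() +
                    " Distplot", color="black")
          plt.show()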
[22]: for columns in df[num_attributes].columns:
          check_skweness(df[num_attributes], columns)
Mu TOPSPEED_KMH : 179.19417475728156, Sigma TOPSPEED_KMH : 43.36099501160743
Mu FASTCHARGE_KMH : 444.2718446601942, Sigma FASTCHARGE_KMH : 202.95679363121414
0.2 Cluster Analysis
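from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

# `preprocessor` is defined in a cell that is not shown here.
elbow = {}  # inertia (within-cluster sum of squares) for each k
for i in range(2, 8):
    pipeline = Pipeline([
        ('preprocessor', preprocessor),
        ('estimator', KMeans(n_clusters=i))
    ])
    clusters = pipeline.fit_predict(df)
    kmeans_estimator = pipeline.named_steps['estimator']
    elbow[i] = kmeans_estimator.inertia_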
In this elbow plot, k=4 appears to be the optimal solution.
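The `preprocessor` used in the elbow loop is not shown in this report. The following self-contained sketch assumes it is a `ColumnTransformer` that scales numeric columns and one-hot encodes categorical ones; that choice, the `sample` data, and the fixed `random_state` are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical miniature dataset; column names follow the report.
sample = pd.DataFrame({
    "AccelSec": [4.6, 10.0, 4.7, 6.8, 9.5, 7.3, 12.3, 3.4, 8.0],
    "Range_Km": [450, 270, 400, 360, 170, 330, 190, 500, 290],
    "PowerTrain": ["AWD", "RWD", "AWD", "RWD", "RWD", "FWD", "FWD", "AWD", "FWD"],
})

# Assumed preprocessing: scale numeric columns, one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["AccelSec", "Range_Km"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["PowerTrain"]),
])

elbow = {}
for k in range(2, 6):
    pipeline = Pipeline([
        ("preprocessor", preprocessor),
        ("estimator", KMeans(n_clusters=k, n_init=10, random_state=0)),
    ])
    pipeline.fit(sample)
    # inertia_ is the within-cluster sum of squared distances for this k.
    elbow[k] = pipeline.named_steps["estimator"].inertia_

print(elbow)
```

Plotting `elbow` against k and looking for the point where the curve flattens gives the kind of reading reported above.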
df['Labels'] = pipeline.fit_predict(df)
[27]: df['RapidCharge']=df['RapidCharge'].map({1:'Yes',0:'No'})
[28]: df['Seats']=df['Seats'].astype('int')
df.groupby('Labels').mean()
        AccelSec  TopSpeed_KmH    Range_Km  Efficiency_WhKm  FastCharge_KmH
Labels
0       6.216667    185.416667  380.694444       211.194444      510.555556
1      12.681818    132.545455  154.090909       173.727273      189.090909
2       8.660526    153.447368  278.815789       170.894737      313.684211
3       3.855556    249.611111  494.444444       193.111111      743.333333
           Seats         Price
Labels
0       5.055556  5.361855e+06
1       4.000000  2.451013e+06
2       4.868421  3.184689e+06
3       5.111111  9.855480e+06
The attribute information above indicates that:
• Lower AccelSec values are generally preferred, as they indicate faster acceleration.
• Higher TopSpeed_KmH values are generally preferred, as they indicate greater speed.
• Higher Range_Km values are generally preferred, as they indicate greater range.
• Lower Efficiency_WhKm values are generally preferred, as they indicate better energy
efficiency.
• Higher FastCharge_KmH values are generally preferred, as they indicate faster charging.
• Lower Price values are generally preferred, as they indicate greater affordability.
• Higher Seats values are generally preferred, as they indicate more passenger capacity.
Renaming labels
[29]: df['Labels']=df['Labels'].map({1:'High1-Range',2:'High2-Range',3:'Mid-Range',0:'Low-Range'})
from sklearn.manifold import TSNE
model = TSNE(random_state=1)
transformed = model.fit_transform(preprocessor.fit_transform(data))
Visualise_cluster(df)
Interpretation:
The market is dominated by mid-range models and a mix of high- and mid-range models, whereas
low-end and high-end vehicles are fewer in number.
[31]: df1=data.copy()
df1 = df1.astype({'Brand': str, 'Model': str})
df1['Model_Brand']=df1['Brand']+df1['Model']
df1.drop(['Brand','Model'],axis=1,inplace=True)
df1['Seats']=df1['Seats'].astype('str')
df1['RapidCharge']=df1['RapidCharge'].map({'Yes':1,'No':0})
clusters_customers = ward.fit_predict(preprocessor.fit_transform(df1))
[34]: def plot_dendrogram(model, **kwargs):
          # create linkage matrix and then plot the dendrogram
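The body of `plot_dendrogram` is cut off in this report. The following is a minimal sketch of one common way to write it, following the pattern in the scikit-learn documentation example for hierarchical clustering dendrograms. It is an assumption that the notebook used the same approach. The synthetic `X`, the return value, and `no_plot=True` are added here so the sketch runs without a display.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram
from sklearn.cluster import AgglomerativeClustering


def plot_dendrogram(model, **kwargs):
    # Count the samples under each merge node of the fitted tree.
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node (an original sample)
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    # SciPy linkage format: [child_a, child_b, distance, sample_count].
    linkage_matrix = np.column_stack(
        [model.children_, model.distances_, counts]
    ).astype(float)
    return linkage_matrix, dendrogram(linkage_matrix, **kwargs)


# Synthetic two-blob data standing in for the preprocessed features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(8, 1, (10, 2))])

# distance_threshold=0 builds the full tree so that distances_ is available.
ward = AgglomerativeClustering(distance_threshold=0, n_clusters=None, linkage="ward")
ward.fit(X)
linkage_matrix, tree = plot_dendrogram(ward, no_plot=True)
```

Dropping `no_plot=True` draws the dendrogram on the current matplotlib axes.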
• According to the dendrogram, a 2-cluster solution is appropriate for Ward’s linkage.
])
df1['Labels']=pipeline1.fit_predict(df1)
[37]: df1['RapidCharge']=df1['RapidCharge'].map({1:'Yes',0:'No'})
[38]: df1['Seats']=df1['Seats'].astype('int')
df1.groupby('Labels').mean()
1 9.398182 149.618182 268.454545 174.509091 310.727273
Seats Price
Labels
0 5.083333 7.169506e+06
1 4.709091 3.168518e+06
• The cluster means above suggest that label 0 corresponds to high-end vehicles.
• Label 1 corresponds to mid-range vehicles.
Renaming Labels
[39]: df1['Labels']=df1['Labels'].map({1:'Mid-Range',0:'High-Range'})
[40]: Visualise_cluster(df1)
• For the two-cluster solution, the market is evenly distributed between the High-Range and
Mid-Range labels.
Segment Extraction:
1. Feature Engineering: If needed, perform feature engineering
techniques to preprocess the input features. This may include data
normalization, handling missing values, encoding categorical
variables, and transforming features for better representation.
2. Train-Test Split: Split the dataset into training and testing sets.
The training set is used to train the Gradient Boosting Regressor
model, while the testing set is used to evaluate its performance
and generalization ability.
3. Model Training: Fit the Gradient Boosting Regressor to the
training data. The model learns to iteratively improve the
predictions by minimizing the residuals (errors) of the previous
models in the ensemble. Gradient Boosting combines multiple
decision trees, where each subsequent tree corrects the errors
made by the previous trees.
4. Hyperparameter Tuning: Tune the hyperparameters of the
Gradient Boosting Regressor model to optimize its performance.
This may involve adjusting parameters such as the learning
rate, number of estimators (trees), maximum depth of the trees,
and minimum samples required for a leaf node.
5. Model Evaluation: Evaluate the trained model using the testing
set. Calculate relevant regression metrics such as mean squared
error (MSE), mean absolute error (MAE), or R-squared to assess
the model's accuracy and reliability in predicting the electric
vehicle range.
6. Segment Extraction: Once the Gradient Boosting Regressor
model is trained and validated, it can be used to extract
segments based on the importance of the input features. The
feature importances from the model provide insight into the
relative significance of different features in predicting the
range, thereby identifying the segments where specific features
play a more crucial role.
7. Interpretation and Analysis: Analyze the results and
interpretations from the model to understand the distinct
segments.
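The steps above can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic (the target formula is invented so that range depends on efficiency and top speed but not on seats), and the parameter grid is a small hypothetical choice.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the EV dataset (assumption, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Efficiency_WhKm": rng.uniform(150, 270, n),
    "TopSpeed_KmH": rng.uniform(120, 260, n),
    "Seats": rng.integers(2, 8, n),
})
y = 700 - 2.0 * X["Efficiency_WhKm"] + 0.8 * X["TopSpeed_KmH"] + rng.normal(0, 10, n)

# Train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Model training with hyperparameter tuning over a small grid.
param_grid = {
    "n_estimators": [100, 200],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)
model = search.best_estimator_

# Model evaluation on the held-out test set.
pred = model.predict(X_test)
metrics = {
    "mse": mean_squared_error(y_test, pred),
    "mae": mean_absolute_error(y_test, pred),
    "r2": r2_score(y_test, pred),
}

# Feature importances, sorted from most to least influential.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(metrics)
print(importances)
```

In this synthetic setup `Seats` carries no signal, so it ends up at the bottom of the importance ranking. On real data the ranking indicates which features drive range for the vehicles being modelled.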
ML Models:
In the context of electric vehicle range prediction using regression
analysis, several ML techniques can be utilized to extract segments.
Regression analysis aims to establish a relationship between input
variables (features) and the target variable (electric vehicle range) to
make predictions. Here are some ML techniques commonly used for
segment extraction in electric vehicle range prediction using regression
analysis:
Profiling:
Profiling is the process of creating user profiles or individualized
representations based on users' specific characteristics, preferences,
and historical data. These profiles help tailor the regression model to
provide personalized range predictions for different users. Here's how
profiling can be incorporated into an ML model for electric vehicle
range prediction:
• Data Collection: Collect data from individual electric vehicle
users, including their driving patterns, charging habits,
historical range information, and other relevant features. This
data serves as the basis for creating user profiles.
• User Segmentation: Analyze the collected data to identify
different user segments based on their driving behavior,
charging patterns, and other relevant factors. This
segmentation helps group users with similar characteristics
together, allowing for more targeted profiling.
• Profile Creation: For each user segment, create individual
profiles that capture their specific characteristics and
preferences. The profiles can include information such as
average daily distance traveled, charging frequency, preferred
charging locations, and other user-specific details.
• Feature Extraction: Extract relevant features from the user
profiles that are likely to impact range prediction. These
features can include driving distance, time of day, weather
conditions during the commute, charging duration, and any
other personalized attributes identified during profiling.
• Model Training: Use the extracted features from the user
profiles along with the corresponding range values to train the
regression model. The model learns the relationship between
the personalized features and the range, enabling it to provide
individualized range predictions.
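The profiling steps above can be sketched as follows. This is a minimal illustration, not code from the project: the `users` table, its column names, the choice of two segments, and the per-segment linear regression are all hypothetical assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Data collection: hypothetical per-user driving and charging records.
users = pd.DataFrame({
    "daily_km": [22, 25, 30, 18, 27, 115, 120, 130, 110, 125],
    "charges_per_week": [2, 2, 3, 2, 3, 6, 7, 7, 6, 6],
    "avg_speed_kmh": [38, 40, 45, 35, 42, 95, 92, 100, 90, 98],
    "observed_range_km": [395, 390, 380, 400, 385, 315, 310, 300, 320, 305],
})
features = ["daily_km", "charges_per_week", "avg_speed_kmh"]

# User segmentation: cluster users on standardized behaviour features.
scaled = StandardScaler().fit_transform(users[features])
users["segment"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

# Profile creation: average behaviour and range for each segment.
profiles = users.groupby("segment")[features + ["observed_range_km"]].mean()

# Model training: one regression model per segment.
models = {
    seg: LinearRegression().fit(grp[features], grp["observed_range_km"])
    for seg, grp in users.groupby("segment")
}

# Predict range for one user with the model of that user's segment.
first_user = users.iloc[[0]]
predicted = models[first_user["segment"].iloc[0]].predict(first_user[features])
print(profiles)
print(predicted)
```

With this few users per segment the models are badly overfit. The sketch only shows how segments, profiles, and models connect.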
Code Link:
https://github.com/AveshBhati7/Market-Segementation-EV-cars/tree/main