Kaggle House Prices Advanced Regression Techniques
Table of Contents
1. Introduction
2. Data Exploration
3. Data Preprocessing
4. Data Profiling
5. Feature Engineering
6. Model Building
7. Model Evaluation
8. Conclusion
Introduction
In the realm of real estate, accurately predicting house prices is of utmost importance for
both buyers and sellers. This project focuses on using advanced regression techniques to
develop a robust predictive model that can estimate house prices based on a multitude of
factors.
The dataset we will be working with contains a wide range of features, including information
about the properties, their location, and other relevant attributes. By leveraging various data
science tools and techniques, we aim to create a model that provides accurate price
estimates.
Data Exploration
Before diving into model building, it's essential to understand our data thoroughly. In this
section, we will explore the dataset, visualize its key patterns, and examine the
relationships between different variables.
1 of 87 9/29/2023, 3:06 PM
0.1 ver KAGGLE HOUSE PRICES ADVANCED REGRESSION TE... file:///C:/Users/prashantkumar.sundge/Downloads/0.1%20ver%20KA...
Data Fields
• SalePrice: the property's sale price in dollars. This is the target variable that you're
trying to predict.
• MSSubClass: The building class
• MSZoning: The general zoning classification
• LotFrontage: Linear feet of street connected to property
• LotArea: Lot size in square feet
• Street: Type of road access
• Alley: Type of alley access
• LotShape: General shape of property
• LandContour: Flatness of the property
• Utilities: Type of utilities available
• LotConfig: Lot configuration
• LandSlope: Slope of property
• Neighborhood: Physical locations within Ames city limits
• Condition1: Proximity to main road or railroad
• Condition2: Proximity to main road or railroad (if a second is present)
• BldgType: Type of dwelling
• HouseStyle: Style of dwelling
• OverallQual: Overall material and finish quality
• OverallCond: Overall condition rating
• YearBuilt: Original construction date
• YearRemodAdd: Remodel date
• RoofStyle: Type of roof
• RoofMatl: Roof material
• Exterior1st: Exterior covering on house
• Exterior2nd: Exterior covering on house (if more than one material)
• MasVnrType: Masonry veneer type
• MasVnrArea: Masonry veneer area in square feet
• ExterQual: Exterior material quality
• ExterCond: Present condition of the material on the exterior
• Foundation: Type of foundation
• BsmtQual: Height of the basement
• BsmtCond: General condition of the basement
• BsmtExposure: Walkout or garden level basement walls
• BsmtFinType1: Quality of basement finished area
• BsmtFinSF1: Type 1 finished square feet
• BsmtFinType2: Quality of second finished area (if present)
• BsmtFinSF2: Type 2 finished square feet
• BsmtUnfSF: Unfinished square feet of basement area
• TotalBsmtSF: Total square feet of basement area
Import Libraries
In [186… import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
%reload_ext autoreload
%autoreload 2
# will add libraries as and when needed
In [4]: df=pd.read_csv('train.csv')
test=pd.read_csv('test.csv')
print(f"Full DataSet Shape is : {df.shape}")
print(f"Full Test DataSet Shape is : {test.shape}")
Data Profiling
In [5]: import pandas_profiling as pp
Data Preprocessing
Data preprocessing is a critical step in any machine learning project. Here, we will clean the
data, handle missing values, and transform variables to ensure they are suitable for model
training.
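The notebook later works with a `filled_df` that has no missing values, but the cell that builds it is not shown. As a minimal sketch of one common approach, assuming median imputation for numeric columns and most-frequent-value imputation for the rest (the helper name `fill_missing` and the toy data are hypothetical, not the notebook's own code):

```python
import numpy as np
import pandas as pd


def fill_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with NaNs filled: median for numeric columns, mode otherwise."""
    filled = frame.copy()
    for col in filled.columns:
        if filled[col].isnull().sum() == 0:
            continue  # nothing to impute in this column
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    return filled


# Toy frame that borrows two column names from the dataset (values are made up).
toy = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0, 70.0],
    "GarageType": ["Attchd", "Detchd", None, "Attchd"],
})
toy_filled = fill_missing(toy)
print(toy_filled.isnull().sum())
```

For columns such as the garage and basement fields, a missing value in this dataset usually means the house has no garage or basement, so filling with an explicit "None" category may be more faithful than the mode.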
In [7]: df.head(10)
Out[7]: Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
In [9]: df.head(2)
Out[9]: MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
In [10]: df.describe().T
In [11]: df.drop_duplicates(inplace=True)
In [12]: missing=df.isnull().sum()
lendf=len(df)
perc=(missing/lendf)*100
#print(perc)
# collect every column in which 40% or more of the values are missing
col_nam=[]
for i, j in perc.items():
    if j >=40:
        col_nam.append(i)
print(f"List of columns with 40% or more missing values: \t {col_nam} ")
df.drop(columns=(col_nam), inplace=True)
df.shape
List of columns with 40% or more missing values: ['Alley', 'FireplaceQu', 'PoolQC', 'Fence', 'MiscFeature']
Out[12]: (1460, 75)
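The same filter can be written without an explicit loop. The following is a sketch on a small made-up frame (the names `toy`, `missing_pct`, `cols_to_drop` and `reduced` are illustrative only); it relies on `isnull().mean()` giving the fraction of missing values per column:

```python
import numpy as np
import pandas as pd

# Toy frame: "Alley" is 80% missing, "LotFrontage" 20%, "LotArea" complete.
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "Alley": [np.nan, np.nan, "Grvl", np.nan, np.nan],
    "LotFrontage": [65.0, np.nan, 68.0, 60.0, 84.0],
})

# Percentage of missing values per column.
missing_pct = toy.isnull().mean() * 100

# Boolean indexing keeps only the columns at or above the 40% cut-off.
cols_to_drop = missing_pct[missing_pct >= 40].index.tolist()

# drop() without inplace=True returns a new frame and leaves `toy` untouched.
reduced = toy.drop(columns=cols_to_drop)
print(cols_to_drop, reduced.shape)
```

Returning a new frame instead of dropping in place makes it easier to re-run the cell with a different cut-off.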
count 1460.000000
mean 180921.195890
std 79442.502883
min 34900.000000
25% 129975.000000
50% 163000.000000
75% 214000.000000
max 755000.000000
Name: SalePrice, dtype: float64
In [14]: df.dtypes.unique()
In [16]: df_num.head()
In [17]: df_cat=df.select_dtypes('O')
df_cat.head(2)
Out[17]: MSZoning Street LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1
In [18]: df.isnull().sum()
Out[18]: MSSubClass 0
MSZoning 0
LotFrontage 259
LotArea 0
Street 0
LotShape 0
LandContour 0
Utilities 0
LotConfig 0
LandSlope 0
Neighborhood 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
OverallQual 0
OverallCond 0
YearBuilt 0
YearRemodAdd 0
RoofStyle 0
RoofMatl 0
Exterior1st 0
Exterior2nd 0
MasVnrType 8
MasVnrArea 8
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 37
BsmtCond 37
BsmtExposure 38
BsmtFinType1 37
BsmtFinSF1 0
BsmtFinType2 38
BsmtFinSF2 0
BsmtUnfSF 0
TotalBsmtSF 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 1
1stFlrSF 0
2ndFlrSF 0
LowQualFinSF 0
GrLivArea 0
BsmtFullBath 0
BsmtHalfBath 0
FullBath 0
HalfBath 0
BedroomAbvGr 0
KitchenAbvGr 0
KitchenQual 0
TotRmsAbvGrd 0
Functional 0
Fireplaces 0
GarageType 81
GarageYrBlt 81
GarageFinish 81
GarageCars 0
GarageArea 0
GarageQual 81
GarageCond 81
PavedDrive 0
WoodDeckSF 0
OpenPorchSF 0
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
MiscVal 0
MoSold 0
YrSold 0
SaleType 0
SaleCondition 0
SalePrice 0
dtype: int64
In [19]: backup=df.copy()
In [21]: filled_df.isnull().sum()
Out[21]: MSSubClass 0
MSZoning 0
LotFrontage 0
LotArea 0
Street 0
LotShape 0
LandContour 0
Utilities 0
LotConfig 0
LandSlope 0
Neighborhood 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
OverallQual 0
OverallCond 0
YearBuilt 0
YearRemodAdd 0
RoofStyle 0
RoofMatl 0
Exterior1st 0
Exterior2nd 0
MasVnrType 0
MasVnrArea 0
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 0
BsmtCond 0
BsmtExposure 0
BsmtFinType1 0
BsmtFinSF1 0
BsmtFinType2 0
BsmtFinSF2 0
BsmtUnfSF 0
TotalBsmtSF 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 0
1stFlrSF 0
2ndFlrSF 0
LowQualFinSF 0
GrLivArea 0
BsmtFullBath 0
BsmtHalfBath 0
FullBath 0
HalfBath 0
BedroomAbvGr 0
KitchenAbvGr 0
KitchenQual 0
TotRmsAbvGrd 0
Functional 0
Fireplaces 0
GarageType 0
GarageYrBlt 0
GarageFinish 0
GarageCars 0
GarageArea 0
GarageQual 0
GarageCond 0
PavedDrive 0
WoodDeckSF 0
OpenPorchSF 0
EnclosedPorch 0
3SsnPorch 0
ScreenPorch 0
PoolArea 0
MiscVal 0
MoSold 0
YrSold 0
SaleType 0
SaleCondition 0
SalePrice 0
dtype: int64
In [22]: df_num.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 0 to 1459
Data columns (total 37 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 MSSubClass 1460 non-null int64
1 LotFrontage 1201 non-null float64
2 LotArea 1460 non-null int64
3 OverallQual 1460 non-null int64
4 OverallCond 1460 non-null int64
5 YearBuilt 1460 non-null int64
6 YearRemodAdd 1460 non-null int64
7 MasVnrArea 1452 non-null float64
8 BsmtFinSF1 1460 non-null int64
9 BsmtFinSF2 1460 non-null int64
10 BsmtUnfSF 1460 non-null int64
11 TotalBsmtSF 1460 non-null int64
12 1stFlrSF 1460 non-null int64
13 2ndFlrSF 1460 non-null int64
14 LowQualFinSF 1460 non-null int64
15 GrLivArea 1460 non-null int64
16 BsmtFullBath 1460 non-null int64
17 BsmtHalfBath 1460 non-null int64
18 FullBath 1460 non-null int64
19 HalfBath 1460 non-null int64
20 BedroomAbvGr 1460 non-null int64
21 KitchenAbvGr 1460 non-null int64
22 TotRmsAbvGrd 1460 non-null int64
23 Fireplaces 1460 non-null int64
24 GarageYrBlt 1379 non-null float64
25 GarageCars 1460 non-null int64
26 GarageArea 1460 non-null int64
27 WoodDeckSF 1460 non-null int64
28 OpenPorchSF 1460 non-null int64
29 EnclosedPorch 1460 non-null int64
30 3SsnPorch 1460 non-null int64
31 ScreenPorch 1460 non-null int64
32 PoolArea 1460 non-null int64
33 MiscVal 1460 non-null int64
34 MoSold 1460 non-null int64
35 YrSold 1460 non-null int64
36 SalePrice 1460 non-null int64
dtypes: float64(3), int64(34)
memory usage: 433.4 KB
In [23]: df_num.corr()
In [24]: cormat=df_num.corr()
paper=plt.figure(figsize=(7,8))
sns.set(font_scale=1.2)
sns.heatmap(cormat, cmap="coolwarm", cbar=True, linewidths=1, linecolor='black', vmax
plt.title("Correlation Heatmap")
plt.show()
In [26]: df_num_corr=df_num.corr()['SalePrice'][:-1]
best_num_features=df_num_corr[abs(df_num_corr)>0.4].sort_values(ascending=False)
OverallQual 0.790982
GrLivArea 0.708624
GarageCars 0.640409
GarageArea 0.623431
TotalBsmtSF 0.613581
1stFlrSF 0.605852
FullBath 0.560664
TotRmsAbvGrd 0.533723
YearBuilt 0.522897
YearRemodAdd 0.507101
GarageYrBlt 0.486362
MasVnrArea 0.477493
Fireplaces 0.466929
Name: SalePrice, dtype: float64
In [28]: correlation_matrix=df_num.corr()
salespricecorr=correlation_matrix["SalePrice"].sort_values(ascending=False)
salepricecorr_df=pd.DataFrame(salespricecorr)
plt.figure(figsize=(10, 8))
sns.heatmap(salepricecorr_df, annot=True, cmap="coolwarm", cbar=True)
plt.xlabel('Features')
plt.ylabel('Features')
plt.title('Correlation Heatmap with SalePrice')
plt.show()
In [29]: best_num_features.index
In [31]: df_best_num_feature
plt.show()
In [34]: df_best_num_feature.columns
In [35]: corr=df_best_num_feature.corr()
paper=plt.figure(figsize=(12,8))
sns.heatmap(corr[(corr>=0.5) |(corr<= -0.4)] ,
cmap='viridis', vmax=1.0, vmin=-1.0, linewidths=0.1,
annot=True, annot_kws={"size": 8}, square=True)
plt.show()
In [36]: sns.set(style="ticks")
sns.pairplot(df_best_num_feature)
plt.show()
In [37]: df_best_num_feature.isnull().sum()
Out[37]: OverallQual 0
GrLivArea 0
GarageCars 0
GarageArea 0
TotalBsmtSF 0
1stFlrSF 0
FullBath 0
TotRmsAbvGrd 0
YearBuilt 0
YearRemodAdd 0
SalePrice 0
dtype: int64
Out[38]: MSZoning Street LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1
In [39]: df_cat.nunique()
Out[39]: MSZoning 5
Street 2
LotShape 4
LandContour 4
Utilities 2
LotConfig 5
LandSlope 3
Neighborhood 25
Condition1 9
Condition2 8
BldgType 5
HouseStyle 8
RoofStyle 6
RoofMatl 8
Exterior1st 15
Exterior2nd 16
MasVnrType 4
ExterQual 4
ExterCond 5
Foundation 6
BsmtQual 4
BsmtCond 4
BsmtExposure 4
BsmtFinType1 6
BsmtFinType2 6
Heating 6
HeatingQC 5
CentralAir 2
Electrical 5
KitchenQual 4
Functional 7
GarageType 6
GarageFinish 3
GarageQual 5
GarageCond 5
PavedDrive 3
SaleType 9
SaleCondition 6
dtype: int64
In [40]: df_cat['Neighborhood'].value_counts()
fig.update_layout(
title='Sale Price Distribution by Neighborhood',
xaxis=dict(title='Neighborhood'),
yaxis=dict(title='Sale Price'),
xaxis_tickangle=-45,
width=800,
height=400,
)
fig.show()
[Figure: Sale Price Distribution by Neighborhood. x-axis: Neighborhood; y-axis: Sale Price.]
In [43]: df_cat.nunique()
Out[43]: MSZoning 5
Street 2
LotShape 4
LandContour 4
Utilities 2
LotConfig 5
LandSlope 3
Condition1 9
Condition2 8
BldgType 5
HouseStyle 8
RoofStyle 6
RoofMatl 8
Exterior1st 15
Exterior2nd 16
MasVnrType 4
ExterQual 4
ExterCond 5
Foundation 6
BsmtQual 4
BsmtCond 4
BsmtExposure 4
BsmtFinType1 6
BsmtFinType2 6
Heating 6
HeatingQC 5
CentralAir 2
Electrical 5
KitchenQual 4
Functional 7
GarageType 6
GarageFinish 3
GarageQual 5
GarageCond 5
PavedDrive 3
SaleType 9
SaleCondition 6
dtype: int64
In [44]: df_cat['Exterior1st'].value_counts()
In [45]: df_cat['Exterior2nd'].value_counts()
fig.update_layout(
title='Sale Price Distribution by Exterior2nd',
xaxis=dict(title='Exterior2nd'),
yaxis=dict(title='Sale Price'),
xaxis_tickangle=-45,
width=800,
height=400,
)
fig.show()
[Figure: Sale Price Distribution by Exterior2nd. x-axis: Exterior2nd; y-axis: Sale Price.]
fig.update_layout(
title='Sale Price Distribution by Exterior1st',
xaxis=dict(title='Exterior1st'),
yaxis=dict(title='Sale Price'),
xaxis_tickangle=-45,
width=800,
height=400,
)
fig.show()
[Figure: Sale Price Distribution by Exterior1st. x-axis: Exterior1st; y-axis: Sale Price.]
Pave 1454
Grvl 6
Name: Street, dtype: int64
AllPub 1459
NoSeWa 1
Name: Utilities, dtype: int64
Y 1365
N 95
Name: CentralAir, dtype: int64
False 943
True 517
Name: Has_VinylSd_Exterior, dtype: int64
False 1238
True 222
Name: Has_MetalSd_Exterior, dtype: int64
False 1234
True 226
Name: Has_Wd Sdng_Exterior, dtype: int64
False 1224
True 236
Name: Has_HdBoard_Exterior, dtype: int64
False 1409
True 51
Name: Has_BrkFace_Exterior, dtype: int64
False 1434
True 26
Name: Has_WdShing_Exterior, dtype: int64
False 1399
True 61
Name: Has_CemntBd_Exterior, dtype: int64
False 1306
True 154
Name: Has_Plywood_Exterior, dtype: int64
False 1437
True 23
Name: Has_AsbShng_Exterior, dtype: int64
False 1429
True 31
Name: Has_Stucco_Exterior, dtype: int64
False 1458
True 2
Name: Has_BrkComm_Exterior, dtype: int64
False 1457
True 3
Name: Has_AsphShn_Exterior, dtype: int64
False 1454
True 6
Name: Has_Stone_Exterior, dtype: int64
False 1450
True 10
Name: Has_ImStucc_Exterior, dtype: int64
False 1459
True 1
Name: Has_CBlock_Exterior, dtype: int64
# Check if the value count of the second most common category is less than 50
# (value_counts() sorts in descending order, so iloc[1] is the runner-up)
if value_counts.iloc[1] < 50:
    selected_columns.append(column)
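The fragment above is shown without its surrounding loop. A self-contained sketch of how such a check could be run over every column follows, assuming the goal is to flag columns whose runner-up category has fewer than 50 rows; the toy data and the extra `len(value_counts) > 1` guard are additions for illustration, not taken from the notebook:

```python
import pandas as pd

# Toy frame with 200 rows: "Street" is almost constant, "CentralAir" is not.
toy = pd.DataFrame({
    "Street": ["Pave"] * 195 + ["Grvl"] * 5,
    "CentralAir": ["Y"] * 140 + ["N"] * 60,
})

selected_columns = []
for column in toy.columns:
    # value_counts() is sorted in descending order:
    # iloc[0] is the most common category, iloc[1] the second most common.
    value_counts = toy[column].value_counts()
    # Guard against single-valued columns, where iloc[1] would raise IndexError.
    if len(value_counts) > 1 and value_counts.iloc[1] < 50:
        selected_columns.append(column)

print(selected_columns)
```

Here only `Street` is selected, because its second category has 5 rows while the second category of `CentralAir` has 60.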
Column: Street
Pave 1454
Grvl 6
Name: Street, dtype: int64
Column: Utilities
AllPub 1459
NoSeWa 1
Name: Utilities, dtype: int64
Column: CentralAir
Y 1365
N 95
Name: CentralAir, dtype: int64
Column: Has_VinylSd_Exterior
False 943
True 517
Name: Has_VinylSd_Exterior, dtype: int64
Column: Has_MetalSd_Exterior
False 1238
True 222
Name: Has_MetalSd_Exterior, dtype: int64
Column: Has_Wd Sdng_Exterior
False 1234
True 226
Name: Has_Wd Sdng_Exterior, dtype: int64
Column: Has_HdBoard_Exterior
False 1224
True 236
Name: Has_HdBoard_Exterior, dtype: int64
Column: Has_BrkFace_Exterior
False 1409
True 51
Name: Has_BrkFace_Exterior, dtype: int64
Column: Has_WdShing_Exterior
False 1434
True 26
Name: Has_WdShing_Exterior, dtype: int64
Column: Has_CemntBd_Exterior
False 1399
True 61
Name: Has_CemntBd_Exterior, dtype: int64
Column: Has_Plywood_Exterior
False 1306
True 154
Name: Has_Plywood_Exterior, dtype: int64
Column: Has_AsbShng_Exterior
False 1437
True 23
Name: Has_AsbShng_Exterior, dtype: int64
Column: Has_Stucco_Exterior
False 1429
True 31
Name: Has_Stucco_Exterior, dtype: int64
Column: Has_BrkComm_Exterior
False 1458
True 2
Name: Has_BrkComm_Exterior, dtype: int64
Column: Has_AsphShn_Exterior
False 1457
True 3
Pave 1454
Grvl 6
Name: Street, dtype: int64
AllPub 1459
NoSeWa 1
Name: Utilities, dtype: int64
False 1434
True 26
Name: Has_WdShing_Exterior, dtype: int64
False 1437
True 23
Name: Has_AsbShng_Exterior, dtype: int64
False 1429
True 31
Name: Has_Stucco_Exterior, dtype: int64
False 1458
True 2
Name: Has_BrkComm_Exterior, dtype: int64
False 1457
True 3
Name: Has_AsphShn_Exterior, dtype: int64
False 1454
True 6
Name: Has_Stone_Exterior, dtype: int64
False 1450
True 10
Name: Has_ImStucc_Exterior, dtype: int64
False 1459
True 1
Name: Has_CBlock_Exterior, dtype: int64
In [53]: df_cat.shape
# Create subplots
fig, axes = plt.subplots(num_rows, 3, figsize=(15, 5 * num_rows))
plt.subplots_adjust(hspace=0.5) # Adjust vertical spacing
plt.show()
In [55]: df_cat.isnull().sum()
Out[55]: MSZoning 0
LotShape 0
LandContour 0
LotConfig 0
LandSlope 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
RoofStyle 0
RoofMatl 0
MasVnrType 8
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 37
BsmtCond 37
BsmtExposure 38
BsmtFinType1 37
BsmtFinType2 38
Heating 0
HeatingQC 0
CentralAir 0
Electrical 1
KitchenQual 0
Functional 0
GarageType 81
GarageFinish 81
GarageQual 81
GarageCond 81
PavedDrive 0
SaleType 0
SaleCondition 0
Has_VinylSd_Exterior 0
Has_MetalSd_Exterior 0
Has_Wd Sdng_Exterior 0
Has_HdBoard_Exterior 0
Has_BrkFace_Exterior 0
Has_CemntBd_Exterior 0
Has_Plywood_Exterior 0
dtype: int64
In [57]: cat_filled_df.isnull().sum()
Out[57]: MSZoning 0
LotShape 0
LandContour 0
LotConfig 0
LandSlope 0
Condition1 0
Condition2 0
BldgType 0
HouseStyle 0
RoofStyle 0
RoofMatl 0
MasVnrType 0
ExterQual 0
ExterCond 0
Foundation 0
BsmtQual 0
BsmtCond 0
BsmtExposure 0
BsmtFinType1 0
BsmtFinType2 0
Heating 0
HeatingQC 0
CentralAir 0
Electrical 0
KitchenQual 0
Functional 0
GarageType 0
GarageFinish 0
GarageQual 0
GarageCond 0
PavedDrive 0
SaleType 0
SaleCondition 0
Has_VinylSd_Exterior 0
Has_MetalSd_Exterior 0
Has_Wd Sdng_Exterior 0
Has_HdBoard_Exterior 0
Has_BrkFace_Exterior 0
Has_CemntBd_Exterior 0
Has_Plywood_Exterior 0
dtype: int64
# Filter columns where the most frequent category exceeds the threshold
unrelevant_columns = category_counts[category_counts > threshold].index
relevant_columns = category_counts[category_counts < threshold].index
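The definitions of `category_counts` and `threshold` are not shown in this print-out. One way they might be built, offered as a sketch under the assumption that `category_counts` holds the row count of each column's most frequent category (the toy data and the threshold value are hypothetical):

```python
import pandas as pd

# Toy frame with 10 rows per column.
toy = pd.DataFrame({
    "Heating": ["GasA"] * 9 + ["GasW"],      # top category covers 9 of 10 rows
    "LotShape": ["Reg"] * 6 + ["IR1"] * 4,   # top category covers 6 of 10 rows
})

# Hypothetical cut-off: the most rows a single category may cover.
threshold = 8

# One number per column: the count of its most frequent category.
category_counts = toy.apply(lambda s: s.value_counts().iloc[0])

# Columns dominated by one category carry little signal; keep the rest.
unrelevant_columns = category_counts[category_counts > threshold].index
relevant_columns = category_counts[category_counts <= threshold].index

print(list(unrelevant_columns), list(relevant_columns))
```

This sketch uses `<=` for the second filter so that a column sitting exactly on the threshold lands in one of the two groups; with `>` and `<` such a column would appear in neither.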
In [59]: print(df_cat[unrelevant_columns].shape)
print(df_cat[relevant_columns].shape)
print(f"Numerical {df_best_num_feature.shape}")
(1460, 27)
(1460, 13)
Numerical (1460, 11)
In [61]: relevant_columns=df_cat[relevant_columns]
RL 1151
RM 218
FV 65
RH 16
C (all) 10
Name: MSZoning, dtype: int64
Lvl 1311
Bnk 63
HLS 50
Low 36
Name: LandContour, dtype: int64
Inside 1052
Corner 263
CulDSac 94
FR2 47
FR3 4
Name: LotConfig, dtype: int64
Gtl 1382
Mod 65
Sev 13
Name: LandSlope, dtype: int64
Norm 1260
Feedr 81
Artery 48
RRAn 26
PosN 19
RRAe 11
PosA 8
RRNn 5
RRNe 2
Name: Condition1, dtype: int64
Norm 1445
Feedr 6
Artery 2
RRNn 2
PosN 2
PosA 1
RRAn 1
RRAe 1
Name: Condition2, dtype: int64
1Fam 1220
TwnhsE 114
Duplex 52
Twnhs 43
2fmCon 31
Name: BldgType, dtype: int64
Gable 1141
Hip 286
Flat 13
Gambrel 11
Mansard 7
Shed 2
Name: RoofStyle, dtype: int64
CompShg 1434
Tar&Grv 11
WdShngl 6
WdShake 5
Metal 1
Membran 1
Roll 1
ClyTile 1
Name: RoofMatl, dtype: int64
TA 1282
Gd 146
Fa 28
Ex 3
Po 1
Name: ExterCond, dtype: int64
TA 1311
Gd 65
Fa 45
Po 2
Name: BsmtCond, dtype: int64
Unf 1256
Rec 54
LwQ 46
BLQ 33
ALQ 19
GLQ 14
Name: BsmtFinType2, dtype: int64
GasA 1428
GasW 18
Grav 7
Wall 4
OthW 2
Floor 1
Name: Heating, dtype: int64
Y 1365
N 95
Name: CentralAir, dtype: int64
SBrkr 1334
FuseA 94
FuseF 27
FuseP 3
Mix 1
Name: Electrical, dtype: int64
Typ 1360
Min2 34
Min1 31
Mod 15
Maj1 14
Maj2 5
Sev 1
Name: Functional, dtype: int64
TA 1311
Fa 48
Gd 14
Ex 3
Po 3
Name: GarageQual, dtype: int64
TA 1326
Fa 35
Gd 9
Po 7
Ex 2
Name: GarageCond, dtype: int64
Y 1340
N 90
P 30
Name: PavedDrive, dtype: int64
WD 1267
New 122
COD 43
ConLD 9
ConLI 5
ConLw 5
CWD 4
Oth 3
Con 2
Name: SaleType, dtype: int64
Normal 1198
Partial 125
Abnorml 101
Family 20
Alloca 12
AdjLand 4
Name: SaleCondition, dtype: int64
False 1238
True 222
Name: Has_MetalSd_Exterior, dtype: int64
False 1234
True 226
Name: Has_Wd Sdng_Exterior, dtype: int64
False 1224
True 236
Name: Has_HdBoard_Exterior, dtype: int64
False 1409
True 51
Name: Has_BrkFace_Exterior, dtype: int64
False 1399
True 61
Name: Has_CemntBd_Exterior, dtype: int64
False 1306
True 154
Name: Has_Plywood_Exterior, dtype: int64
In [63]: num_plots=len(unrelevant_col.columns)
num_plots_per_row=3
num_row=(num_plots+num_plots_per_row-1)//num_plots_per_row
print(num_plots, num_row)
#create Subplots
fig.tight_layout()
plt.show()
27 9
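The row count printed above comes from a ceiling division. A small sketch of that arithmetic, together with the mapping from a plot's index to its grid cell, is given below; the helper names `grid_rows` and `panel_position` are hypothetical and do not appear in the notebook:

```python
def grid_rows(num_plots: int, plots_per_row: int) -> int:
    """Rows needed to hold num_plots panels (integer ceiling division)."""
    return (num_plots + plots_per_row - 1) // plots_per_row


def panel_position(index: int, plots_per_row: int) -> tuple[int, int]:
    """(row, column) of the index-th panel when the grid is filled row by row."""
    return divmod(index, plots_per_row)


print(grid_rows(27, 3))       # 27 panels fill 9 rows of 3 exactly
print(grid_rows(13, 3))       # 13 panels need 5 rows, the last one partly empty
print(panel_position(12, 3))  # the 13th panel (index 12) sits at row 4, column 0
```

When the last row is only partly filled, the unused axes of the subplot grid are usually hidden or removed so that empty frames do not appear in the figure.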
Reg 925
IR1 484
IR2 41
IR3 10
Name: LotShape, dtype: int64
1Story 726
2Story 445
1.5Fin 154
SLvl 65
SFoyer 37
1.5Unf 14
2.5Unf 11
2.5Fin 8
Name: HouseStyle, dtype: int64
None 864
BrkFace 445
Stone 128
BrkCmn 15
Name: MasVnrType, dtype: int64
TA 906
Gd 488
Ex 52
Fa 14
Name: ExterQual, dtype: int64
PConc 647
CBlock 634
BrkTil 146
Slab 24
Stone 6
Wood 3
Name: Foundation, dtype: int64
TA 649
Gd 618
Ex 121
Fa 35
Name: BsmtQual, dtype: int64
No 953
Av 221
Gd 134
Mn 114
Name: BsmtExposure, dtype: int64
Unf 430
GLQ 418
ALQ 220
BLQ 148
Rec 133
LwQ 74
Name: BsmtFinType1, dtype: int64
Ex 741
TA 428
Gd 241
Fa 49
Po 1
Name: HeatingQC, dtype: int64
TA 735
Gd 586
Ex 100
Fa 39
Name: KitchenQual, dtype: int64
Attchd 870
Detchd 387
BuiltIn 88
Basment 19
CarPort 9
2Types 6
Name: GarageType, dtype: int64
Unf 605
RFn 422
Fin 352
Name: GarageFinish, dtype: int64
False 943
True 517
Name: Has_VinylSd_Exterior, dtype: int64
In [65]: num_plots=13
num_plots_per_row=3
num_row=(num_plots+num_plots_per_row-1)//num_plots_per_row
print(num_plots, num_row)
#create Subplots
fig.tight_layout()
plt.show()
13 5
Feature Engineering
Feature engineering is the art of creating new features from existing ones or selecting the
most relevant features for model training. In this section, we will engineer meaningful
features to improve our model's performance.
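As one concrete illustration, the `Has_<material>_Exterior` columns that appear in the outputs of this notebook are boolean features derived from the exterior covering columns. The cell that creates them is not shown, so the following is only a sketch, assuming a flag is set when the material appears in either `Exterior1st` or `Exterior2nd` (the toy rows and the `materials` list are made up):

```python
import pandas as pd

# Toy frame with the two exterior covering columns from the dataset.
toy = pd.DataFrame({
    "Exterior1st": ["VinylSd", "MetalSd", "Wd Sdng", "VinylSd"],
    "Exterior2nd": ["VinylSd", "VinylSd", "Wd Sdng", "Plywood"],
})

# Assumed subset of materials to turn into flags.
materials = ["VinylSd", "MetalSd"]

for material in materials:
    flag = f"Has_{material}_Exterior"
    # True when the material is used as the first or the second covering.
    toy[flag] = (toy["Exterior1st"] == material) | (toy["Exterior2nd"] == material)

# Summing a boolean column counts the True values.
print(toy[["Has_VinylSd_Exterior", "Has_MetalSd_Exterior"]].sum())
```

Collapsing two multi-level columns into a handful of flags keeps the information about common materials while avoiding many sparse one-hot columns.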
In [67]: df1.head()
[Figures: Sale Price (y-axis) plotted against each of LotShape, HouseStyle, MasVnrType, ExterQual, Foundation, BsmtQual, BsmtExposure, BsmtFinType1, HeatingQC, KitchenQual, GarageType, GarageFinish, Has_VinylSd_Exterior, and OverallQual.]
plt.tight_layout()
plt.show()
[Figures: Sale Price (y-axis) plotted against each of OverallQual, GrLivArea, GarageCars, GarageArea, TotalBsmtSF, 1stFlrSF, FullBath, TotRmsAbvGrd, YearBuilt, YearRemodAdd, and SalePrice.]
LotShape
Reg 925
IR1 484
IR2 41
IR3 10
Name: LotShape, dtype: int64
HouseStyle
1Story 726
2Story 445
1.5Fin 154
SLvl 65
SFoyer 37
1.5Unf 14
2.5Unf 11
2.5Fin 8
Name: HouseStyle, dtype: int64
MasVnrType
None 864
BrkFace 445
Stone 128
BrkCmn 15
Name: MasVnrType, dtype: int64
ExterQual
TA 906
Gd 488
Ex 52
Fa 14
Name: ExterQual, dtype: int64
Foundation
PConc 647
CBlock 634
BrkTil 146
Slab 24
Stone 6
Wood 3
Name: Foundation, dtype: int64
BsmtQual
TA 649
Gd 618
Ex 121
Fa 35
Name: BsmtQual, dtype: int64
BsmtExposure
No 953
Av 221
Gd 134
Mn 114
Name: BsmtExposure, dtype: int64
BsmtFinType1
Unf 430
GLQ 418
ALQ 220
BLQ 148
Rec 133
LwQ 74
Name: BsmtFinType1, dtype: int64
HeatingQC
Ex 741
TA 428
Gd 241
Fa 49
Po 1
Name: HeatingQC, dtype: int64
KitchenQual
TA 735
Gd 586
Ex 100
Fa 39
Name: KitchenQual, dtype: int64
GarageType
Attchd 870
Detchd 387
BuiltIn 88
Basment 19
CarPort 9
2Types 6
Name: GarageType, dtype: int64
GarageFinish
Unf 605
RFn 422
Fin 352
Name: GarageFinish, dtype: int64
Has_VinylSd_Exterior
False 943
True 517
Name: Has_VinylSd_Exterior, dtype: int64
OverallQual
5 397
6 374
7 319
8 168
4 116
9 43
3 20
10 18
2 3
1 2
Name: OverallQual, dtype: int64
GrLivArea
864 22
1040 14
894 11
1456 10
848 10
..
2296 1
1123 1
1199 1
1473 1
1256 1
Name: GrLivArea, Length: 861, dtype: int64
GarageCars
2 824
1 369
3 181
0 81
4 5
Name: GarageCars, dtype: int64
GarageArea
0 81
440 49
576 47
240 38
484 34
..
320 1
594 1
831 1
878 1
192 1
Name: GarageArea, Length: 441, dtype: int64
TotalBsmtSF
0 37
864 35
672 17
912 15
1040 14
..
1838 1
1581 1
707 1
611 1
1542 1
Name: TotalBsmtSF, Length: 721, dtype: int64
1stFlrSF
864 25
1040 16
912 14
894 12
848 12
..
1509 1
2515 1
605 1
3138 1
1256 1
Name: 1stFlrSF, Length: 753, dtype: int64
FullBath
2 768
1 650
3 33
0 9
Name: FullBath, dtype: int64
TotRmsAbvGrd
6 402
7 329
5 275
8 187
4 97
9 75
10 47
11 18
3 17
12 11
2 1
14 1
Name: TotRmsAbvGrd, dtype: int64
YearBuilt
2006 67
2005 64
2004 54
2007 49
2003 45
1976 33
1977 32
1920 30
1959 26
1998 25
1999 25
1965 24
2000 24
1970 24
1954 24
1958 24
2008 23
2002 23
1972 23
1971 22
1968 22
1950 20
1957 20
2001 20
1994 19
1962 19
1940 18
1966 18
2009 18
1995 18
1910 17
1993 17
1960 17
1963 16
1978 16
1925 16
1955 16
1967 16
1996 15
1941 15
1964 15
1961 14
1948 14
1956 14
1969 14
1997 14
1992 13
1953 12
1990 12
1949 12
1973 11
1988 11
1900 10
1974 10
1915 10
1980 10
1984 9
1926 9
1936 9
1979 9
1930 9
1922 8
1975 8
1939 8
1916 8
1928 7
1914 7
1923 7
1924 7
1918 7
1946 7
1935 6
1951 6
1921 6
1945 6
1982 6
1931 6
1986 5
1937 5
1981 5
1991 5
1947 5
1952 5
1985 5
1929 4
1938 4
1983 4
1932 4
1880 4
1919 3
1989 3
1912 3
1927 3
1987 3
1934 3
1942 2
1890 2
1885 2
1908 2
1892 2
1913 1
1893 1
1906 1
2010 1
1898 1
1904 1
1882 1
1875 1
1911 1
1917 1
1872 1
1905 1
Name: YearBuilt, dtype: int64
YearRemodAdd
1950 178
2006 97
2007 76
2005 73
2004 62
2000 55
2003 51
2002 48
2008 40
1996 36
1998 36
1995 31
1976 30
1999 30
1970 26
1977 25
1997 25
2009 23
1994 22
2001 21
1972 20
1965 19
1993 19
1959 18
1971 18
1992 17
1968 17
1978 16
1966 15
1958 15
1990 15
1969 14
1954 14
1991 14
1962 14
1963 13
1960 12
1980 12
1967 12
1973 11
1989 11
1964 11
1953 10
1979 10
1987 10
1956 10
1975 10
1955 9
1957 9
1985 9
1988 9
1981 8
1961 8
1984 7
1982 7
1974 7
2010 6
1986 5
1952 5
1983 5
1951 4
Name: YearRemodAdd, dtype: int64
SalePrice
140000 20
135000 17
155000 14
145000 14
190000 13
..
202665 1
164900 1
208300 1
181500 1
147500 1
Name: SalePrice, Length: 663, dtype: int64
In [76]: df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 0 to 1459
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 LotShape 1460 non-null object
1 HouseStyle 1460 non-null object
2 MasVnrType 1460 non-null object
3 ExterQual 1460 non-null object
4 Foundation 1460 non-null object
5 BsmtQual 1423 non-null object
6 BsmtExposure 1422 non-null object
7 BsmtFinType1 1423 non-null object
8 HeatingQC 1460 non-null object
9 KitchenQual 1460 non-null object
10 GarageType 1379 non-null object
11 GarageFinish 1379 non-null object
12 Has_VinylSd_Exterior 1460 non-null bool
13 OverallQual 1460 non-null int64
14 GrLivArea 1460 non-null int64
15 GarageCars 1460 non-null int64
16 GarageArea 1460 non-null int64
17 TotalBsmtSF 1460 non-null int64
18 1stFlrSF 1460 non-null int64
19 FullBath 1460 non-null int64
20 TotRmsAbvGrd 1460 non-null int64
21 YearBuilt 1460 non-null int64
22 YearRemodAdd 1460 non-null int64
23 SalePrice 1460 non-null int64
dtypes: bool(1), int64(11), object(12)
memory usage: 307.5+ KB
Ordinal
In [77]: from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OrdinalEncoder
df1["ExterQual"]=pd.DataFrame(qual_label.transform(df1[["ExterQual"]]))
df1["BsmtQual"]=pd.DataFrame(qual_label.transform(df1[["BsmtQual"]]))
df1["LotShape"]=pd.DataFrame(qual_label.transform(df1[["LotShape"]]))
df1["HouseStyle"]=pd.DataFrame(qual_label.transform(df1[["HouseStyle"]]))
df1["MasVnrType"]=pd.DataFrame(qual_label.transform(df1[["MasVnrType"]]))
df1["Foundation"]=pd.DataFrame(qual_label.transform(df1[["Foundation"]]))
df1["BsmtExposure"]=pd.DataFrame(qual_label.transform(df1[["BsmtExposure"]]))
df1["BsmtFinType1"]=pd.DataFrame(qual_label.transform(df1[["BsmtFinType1"]]))
df1["HeatingQC"]=pd.DataFrame(qual_label.transform(df1[["HeatingQC"]]))
df1["HeatingQC"].value_counts()
df1["KitchenQual"]=pd.DataFrame(qual_label.transform(df1[["KitchenQual"]]))
df1["GarageType"]=pd.DataFrame(qual_label.transform(df1[["GarageType"]]))
df1["GarageFinish"]=pd.DataFrame(qual_label.transform(df1[["GarageFinish"]]))
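The definition of `qual_label` is not shown in this export, so the following is only a minimal sketch of how a quality column might be ordinal-encoded with an explicit order. It assumes the usual Ames quality codes ranked Po < Fa < TA < Gd < Ex and uses a hypothetical "NA" placeholder for missing values; the names `demo`, `quality_order`, and `encoder` are illustrative and not taken from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frame standing in for df1; the values are illustrative only.
demo = pd.DataFrame({"ExterQual": ["TA", "Gd", "Ex", "Fa"],
                     "BsmtQual": ["Gd", None, "TA", "Ex"]})

# Assumed ordering, worst to best; "NA" is a hypothetical placeholder.
quality_order = ["NA", "Po", "Fa", "TA", "Gd", "Ex"]

# Fill missing values first so every value is in the declared category list.
demo = demo.fillna("NA")

# One category list per column, in the same order as the columns.
encoder = OrdinalEncoder(categories=[quality_order, quality_order])
encoded = pd.DataFrame(encoder.fit_transform(demo),
                       columns=demo.columns, index=demo.index)
print(encoded)
```

Passing `index=demo.index` keeps the encoded rows aligned with the source frame even when its index is not a plain 0..n-1 range. Declaring the order explicitly also means the integer codes reflect quality rank rather than alphabetical order.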
In [91]: df1.head()
Label Encoder
In [92]: label_encoder = LabelEncoder()
df1['Has_VinylSd_Exterior'] = label_encoder.fit_transform(df1['Has_VinylSd_Exterior'])
In [93]: df1.head()
In [94]: df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1460 entries, 0 to 1459
Data columns (total 24 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 LotShape 1460 non-null float64
1 HouseStyle 1460 non-null float64
2 MasVnrType 1460 non-null float64
3 ExterQual 1460 non-null float64
4 Foundation 1460 non-null float64
5 BsmtQual 1460 non-null float64
6 BsmtExposure 1460 non-null float64
7 BsmtFinType1 1460 non-null float64
8 HeatingQC 1460 non-null float64
9 KitchenQual 1460 non-null float64
10 GarageType 1460 non-null float64
11 GarageFinish 1460 non-null float64
12 Has_VinylSd_Exterior 1460 non-null int64
13 OverallQual 1460 non-null int64
14 GrLivArea 1460 non-null int64
15 GarageCars 1460 non-null int64
16 GarageArea 1460 non-null int64
17 TotalBsmtSF 1460 non-null int64
18 1stFlrSF 1460 non-null int64
19 FullBath 1460 non-null int64
20 TotRmsAbvGrd 1460 non-null int64
21 YearBuilt 1460 non-null int64
22 YearRemodAdd 1460 non-null int64
23 SalePrice 1460 non-null int64
dtypes: float64(12), int64(12)
memory usage: 317.4 KB
Normalization
In [95]: from sklearn.preprocessing import StandardScaler
In [96]: numeric_col=df_best_num_feature.columns[:-1]
numeric_col
In [99]: df1.head()
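The cells that apply the scaler are missing from this export. As a hedged sketch of the usual pattern, the scaler is fitted on the training rows only and the same fitted object is then applied to the test rows. The frames `train_demo` and `test_demo` and their values are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the training and test frames (values are illustrative).
train_demo = pd.DataFrame({"GrLivArea": [1000.0, 1500.0, 2000.0],
                           "GarageArea": [200.0, 400.0, 600.0]})
test_demo = pd.DataFrame({"GrLivArea": [1500.0, 2500.0],
                          "GarageArea": [400.0, 800.0]})
cols = ["GrLivArea", "GarageArea"]

scaler = StandardScaler()
# Learn mean and standard deviation from the training rows only.
train_demo[cols] = scaler.fit_transform(train_demo[cols])
# Apply the same statistics to the test rows; no refitting.
test_demo[cols] = scaler.transform(test_demo[cols])
print(test_demo)
```

A test row equal to the training mean maps to 0 here, which would not generally hold if a second scaler were fitted on the test data.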
Model Building
The heart of this project lies in building a predictive model. We will explore various
regression algorithms, tune hyperparameters, and train models to predict house prices
accurately.
In [223… y=df1['SalePrice']
Y shape(1460,)
X shape(1460, 22)
Linear Model
• Model Predict
R2 Score : 67.280 %
In [232… r2score=r2_score(y_test,y_pred)
print("R2 Score :\t{:.3F} %".format(r2score*100))
R2 Score : 67.280 %
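The cells that split the data and fit the linear model are not included in this export. The sketch below shows one common way to get from features and target to an R2 score. The synthetic data, the 80/20 split, and `random_state=42` are assumptions for illustration, not the notebook's actual settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for X and y; the notebook uses df1's features and SalePrice.
X_demo = rng.normal(size=(200, 3))
y_demo = (50_000 * X_demo[:, 0] + 20_000 * X_demo[:, 1] + 180_000
          + rng.normal(scale=5_000, size=200))

# Hold out 20% of the rows for evaluation (assumed split).
x_train, x_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

print("R2 Score :\t{:.3f} %".format(r2_score(y_test, y_pred) * 100))
```

The score is computed on held-out rows only, so it estimates how well the fit generalizes rather than how well it memorizes the training data.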
Test Data
In [150… test.head()
Out[150]: Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
In [151… Id=test['Id']
In [152… test.columns
In [153… num_col=df_best_num_feature.columns[:-1]
In [154… num_col
In [155… test_col=test.columns
In [156… test_col
In [157… common_num_col=test_col.intersection(num_col)
In [160… test_num_df=test[common_num_col].copy()
In [161… test_num_df.isnull().sum()
Out[161]: OverallQual 0
YearBuilt 0
YearRemodAdd 0
TotalBsmtSF 1
1stFlrSF 0
GrLivArea 0
FullBath 0
TotRmsAbvGrd 0
GarageCars 1
GarageArea 1
dtype: int64
In [163… test_num_df.isnull().sum()
Out[163]: OverallQual 0
YearBuilt 0
YearRemodAdd 0
TotalBsmtSF 0
1stFlrSF 0
GrLivArea 0
FullBath 0
TotRmsAbvGrd 0
GarageCars 0
GarageArea 0
dtype: int64
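The cell that filled the three missing numeric values is not shown in this export. One possible approach, sketched here with toy frames, is to fill test gaps with medians computed on the training data; `train_num_demo` and `test_num_demo` are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the numeric training and test frames.
train_num_demo = pd.DataFrame({"TotalBsmtSF": [800.0, 1000.0, 1200.0],
                               "GarageCars": [1.0, 2.0, 2.0]})
test_num_demo = pd.DataFrame({"TotalBsmtSF": [900.0, np.nan],
                              "GarageCars": [np.nan, 3.0]})

# Medians come from the training data so test rows do not influence the fill value.
train_medians = train_num_demo.median()

# fillna with a Series matches fill values to columns by label.
test_num_filled = test_num_demo.fillna(train_medians)
print(test_num_filled.isnull().sum())
```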
In [235… cat_col=relevant_columns.columns
In [168… common_cat_col=test_col.intersection(cat_col)
common_cat_col
In [236… test_cat_df=test[common_cat_col].copy()
In [237… test_cat_df.head()
In [238… test_cat_df.isnull().sum()
Out[238]: LotShape 0
HouseStyle 0
MasVnrType 16
ExterQual 0
Foundation 0
BsmtQual 44
BsmtExposure 44
BsmtFinType1 42
HeatingQC 0
KitchenQual 1
GarageType 76
GarageFinish 78
dtype: int64
In [240… test_cat_df.isnull().sum()
Out[240]: LotShape 0
HouseStyle 0
MasVnrType 0
ExterQual 0
Foundation 0
BsmtQual 0
BsmtExposure 0
BsmtFinType1 0
HeatingQC 0
KitchenQual 0
GarageType 0
GarageFinish 0
dtype: int64
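How the categorical gaps were filled is likewise not shown. In the Ames data description, a missing GarageType or BsmtQual means the house has no garage or basement, while a single missing KitchenQual looks like a sporadic gap. A hedged sketch that treats the two cases differently follows; the "NA" placeholder and the toy frames are assumptions.

```python
import pandas as pd

# Toy stand-ins for the categorical test frame and the training column.
test_cat_demo = pd.DataFrame({"GarageType": ["Attchd", None, "Detchd"],
                              "KitchenQual": ["TA", "Gd", None]})
train_cat_demo = pd.DataFrame({"KitchenQual": ["TA", "TA", "Gd"]})

# Structural absence: label it explicitly ("NA" is a hypothetical placeholder).
test_cat_demo["GarageType"] = test_cat_demo["GarageType"].fillna("NA")

# Sporadic gap: fall back to the most frequent training value.
kitchen_mode = train_cat_demo["KitchenQual"].mode()[0]
test_cat_demo["KitchenQual"] = test_cat_demo["KitchenQual"].fillna(kitchen_mode)
print(test_cat_demo)
```

Whatever placeholder is chosen must also have been present when the encoder was fitted on the training data; otherwise the transform on the test frame will fail or produce inconsistent codes.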
In [241… df1.head(1)
In [242… test_cat_df.head(1)
In [243… test_num_df.head(1)
In [245… test_df1.head(1)
In [189… test_df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1459 entries, 0 to 1458
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 LotShape 1459 non-null object
1 HouseStyle 1459 non-null object
2 MasVnrType 1459 non-null object
3 ExterQual 1459 non-null float64
4 Foundation 1459 non-null object
5 BsmtQual 1459 non-null float64
6 BsmtExposure 1459 non-null object
7 BsmtFinType1 1459 non-null object
8 HeatingQC 1459 non-null object
9 KitchenQual 1459 non-null object
10 GarageType 1459 non-null object
11 GarageFinish 1459 non-null object
12 OverallQual 1459 non-null int64
13 YearBuilt 1459 non-null int64
14 YearRemodAdd 1459 non-null int64
15 TotalBsmtSF 1459 non-null float64
16 1stFlrSF 1459 non-null int64
17 GrLivArea 1459 non-null int64
18 FullBath 1459 non-null int64
19 TotRmsAbvGrd 1459 non-null int64
20 GarageCars 1459 non-null float64
21 GarageArea 1459 non-null float64
dtypes: float64(5), int64(7), object(10)
memory usage: 250.9+ KB
df["Foundation"]=pd.DataFrame(qual_label.transform(df[["Foundation"]]))
df["BsmtExposure"]=pd.DataFrame(qual_label.transform(df[["BsmtExposure"]]))
df["BsmtFinType1"]=pd.DataFrame(qual_label.transform(df[["BsmtFinType1"]]))
df["HeatingQC"]=pd.DataFrame(qual_label.transform(df[["HeatingQC"]]))
df["KitchenQual"]=pd.DataFrame(qual_label.transform(df[["KitchenQual"]]))
df["GarageType"]=pd.DataFrame(qual_label.transform(df[["GarageType"]]))
df["GarageFinish"]=pd.DataFrame(qual_label.transform(df[["GarageFinish"]]))
return df
return df
test_df1 = transform_dataframe(test_df1)
In [247… test_df1.head()
test_df1["LotShape"]=pd.DataFrame(qual_label.transform(test_df1[["LotShape"]]))
test_df1["HouseStyle"]=pd.DataFrame(qual_label.transform(test_df1[["HouseStyle"]]))
test_df1["MasVnrType"]=pd.DataFrame(qual_label.transform(test_df1[["MasVnrType"]]))
In [249… test_df1.head()
In [250… df_best_num_feature.head()
In [251… backup=test_df1.copy()
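scaler = StandardScaler()
# Note: fit_transform learns a new mean and variance from whatever frame is passed in.
# For test data, reuse the scaler fitted on the training data and call transform() instead.
df[numeric_col] = scaler.fit_transform(df[numeric_col])
return df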
In [254… test_num_df
In [256… test_df2.head(2)
In [272… train_col=df1.columns[:-1]
In [273… train_col
In [275… test_df3.columns
22
23
In [274… test_df3=test_df2[train_col]
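Selecting `test_df2[train_col]` only works when every training column exists in the test frame. The `test_df1.info()` output earlier lists 22 columns and no `Has_VinylSd_Exterior`, while `df1` has 23 feature columns, so a check like the one sketched below may be useful before selecting. The toy frames and the names `train_cols_demo`, `missing`, and `aligned` are illustrative.

```python
import pandas as pd

# Toy stand-ins: the training column order and a test frame that lacks one feature.
train_cols_demo = pd.Index(["OverallQual", "GrLivArea", "Has_VinylSd_Exterior"])
test_demo = pd.DataFrame({"GrLivArea": [1500, 1800], "OverallQual": [6, 7]})

# Columns the model was trained on but the test frame lacks.
missing = train_cols_demo.difference(test_demo.columns)
print(list(missing))

# test_demo[train_cols_demo] would raise KeyError while anything is missing,
# so select only the columns that are present, in training order.
present = [c for c in train_cols_demo if c in test_demo.columns]
aligned = test_demo[present]
print(list(aligned.columns))
```

Any column reported in `missing` needs to be engineered on the test frame in the same way it was built for training before the model can predict.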
In [284… test_df3['BsmtQual'].value_counts()
Out[284]: TA 678
Gd 591
Ex 137
Fa 53
Name: BsmtQual, dtype: int64
In [286… test_df3.head(2)
test_df3["BsmtQual"]=pd.DataFrame(qual_label.transform(test_df3[["BsmtQual"]]))
# test_df3["ExterQual"]=pd.DataFrame(qual_label.transform(test_df3[["ExterQual"]]))
In [307… test_data_predict=model.predict(x_test)
# results_df = pd.DataFrame({'Id': Id, 'SalePrice': test_data_predict})
# results_df.to_csv('house_prices_LinearRegression.csv', index=False)
In [308… test_data_predict=test_data_predict.round(3)
In [310… results_df
Out[310]: Id SalePrice
0 1461 6764220.764
1 1462 3009809.795
2 1463 4554987.177
3 1464 4458130.080
4 1465 4823571.943
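The predictions shown are all in the millions, whereas the Sale Price axis in the plots earlier in the notebook tops out at 800k. This may indicate a preprocessing mismatch between the training and test data, for example in scaling or encoding. A small sanity check before writing a submission file could look like the sketch below; the 800,000 bound is an assumption read off the plot axis and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ids and predictions copied from the first three rows of the table above.
ids_demo = pd.Series([1461, 1462, 1463], name="Id")
preds_demo = np.array([6764220.764, 3009809.795, 4554987.177])

results_demo = pd.DataFrame({"Id": ids_demo, "SalePrice": preds_demo})

# Hypothetical plausibility bound taken from the Sale Price plot axis.
upper_bound = 800_000
suspicious = results_demo[results_demo["SalePrice"] > upper_bound]
print(f"{len(suspicious)} of {len(results_demo)} predictions exceed {upper_bound}")
```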
Conclusion
In the final section, we will summarize our findings, discuss the model's strengths and
weaknesses, and offer insights into potential areas for improvement.
• However, improvements are needed to reduce the Mean Square Error (MSE) of
approximately 2,259,584,718.02 and the
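To put the quoted MSE on the same scale as the sale prices, it helps to take its square root. The short calculation below is only an arithmetic illustration using the MSE value stated above.

```python
import math

mse = 2_259_584_718.02  # value quoted in the conclusion
rmse = math.sqrt(mse)   # root mean squared error, in dollars
print(f"RMSE is roughly {rmse:,.0f} dollars")
```

An RMSE near 47,500 dollars is easier to compare with typical sale prices than the raw MSE figure.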