
DSBDA Assignment 4 Jupyter Notebook

The document is a Jupyter Notebook for an assignment involving housing data analysis using Python libraries such as pandas, numpy, and sklearn. It includes data loading, preprocessing, model training with linear regression, and prediction of house prices based on user input. The model evaluation shows a Mean Squared Error of 25.00 and an R-squared value of 0.66.

Uploaded by sumeet

DSBDA-Assignment-4 - Jupyter Notebook http://localhost:8888/notebooks/DSBDA-Assignment-4...

In [21]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [ ]:

In [22]: df = pd.read_csv("HousingData.csv")

In [23]: df

Out[23]:
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B
0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90
1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90
2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83
3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63
4    0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90
...      ...   ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...
501  0.06263   0.0  11.93   0.0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99
502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90
503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90
504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45
505  0.04741   0.0  11.93   0.0  0.573  6.030   NaN  2.5050    1  273     21.0  396.90

506 rows × 14 columns

In [24]: df.head()

Out[24]:
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT
0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90
1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90
2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83
3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63
4    0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90

1 of 5 27/02/25, 11:30

In [25]: df.tail()

Out[25]:
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT
501  0.06263   0.0  11.93   0.0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99
502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90
503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90
504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45
505  0.04741   0.0  11.93   0.0  0.573  6.030   NaN  2.5050    1  273     21.0  396.90

In [26]: df.isnull().sum()

Out[26]: CRIM 20
ZN 20
INDUS 20
CHAS 20
NOX 0
RM 0
AGE 20
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 20
MEDV 0
dtype: int64
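
The counts above can be read as shares of the 506 rows shown in the frame display. The sketch below does that arithmetic with the numbers copied from the output; on the real frame, `df.isnull().mean()` would give the same fractions directly. The dictionary is typed in by hand for illustration and is not produced by the notebook.

```python
n_rows = 506  # row count reported in the DataFrame display above

# Missing-value counts copied from the df.isnull().sum() output above.
missing = {"CRIM": 20, "ZN": 20, "INDUS": 20, "CHAS": 20, "AGE": 20, "LSTAT": 20}

# Fraction of rows missing in each affected column.
share = {col: n / n_rows for col, n in missing.items()}

for col, frac in share.items():
    print(f"{col:<6} {frac:.2%} missing")

# Total number of empty cells across the six affected columns.
total_missing_cells = sum(missing.values())
print("Empty cells in total:", total_missing_cells)
```

Each affected column is missing just under 4% of its values, which is small enough that filling the gaps is a reasonable alternative to dropping rows.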

In [27]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CRIM 486 non-null float64
1 ZN 486 non-null float64
2 INDUS 486 non-null float64
3 CHAS 486 non-null float64
4 NOX 506 non-null float64
5 RM 506 non-null float64
6 AGE 486 non-null float64
7 DIS 506 non-null float64
8 RAD 506 non-null int64
9 TAX 506 non-null int64
10 PTRATIO 506 non-null float64
11 B 506 non-null float64
12 LSTAT 486 non-null float64
13 MEDV 506 non-null float64
dtypes: float64(12), int64(2)


In [28]: df.describe()

Out[28]:
             CRIM          ZN       INDUS        CHAS         NOX          RM         AGE
count  486.000000  486.000000  486.000000  486.000000  506.000000  506.000000  486.000000
mean     3.611874   11.211934   11.083992    0.069959    0.554695    6.284634   68.518519
std      8.720192   23.388876    6.835896    0.255340    0.115878    0.702617   27.999513
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000
25%      0.081900    0.000000    5.190000    0.000000    0.449000    5.885500   45.175000
50%      0.253715    0.000000    9.690000    0.000000    0.538000    6.208500   76.800000
75%      3.560263   12.500000   18.100000    0.000000    0.624000    6.623500   93.975000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000

In [29]: df.fillna(df.median(numeric_only=True), inplace=True)

In [36]: df.isnull().sum()

Out[36]: CRIM 0
ZN 0
INDUS 0
CHAS 0
NOX 0
RM 0
AGE 0
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 0
MEDV 0
dtype: int64
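
The `fillna` cell above takes its medians from the whole frame, before any train/test split, so rows that later end up in the test set contribute to the fill values. One possible alternative, sketched below, is to let scikit-learn's `SimpleImputer` learn the medians from the training rows only by placing it in a pipeline. This is an illustration, not the notebook's method: the toy frame, the names `toy`, `X_toy`, `y_toy`, `pipe`, and the `random_state` values are all assumptions standing in for `HousingData.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for HousingData.csv (not the real file).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "RM": rng.normal(6.0, 0.7, 200),
    "LSTAT": rng.uniform(2.0, 35.0, 200),
})
toy["MEDV"] = 5.0 * toy["RM"] - 0.5 * toy["LSTAT"] + rng.normal(0.0, 1.0, 200)

# Blank out 20 LSTAT values to mimic the gaps in the real data.
toy.loc[toy.sample(20, random_state=0).index, "LSTAT"] = np.nan

X_toy = toy.drop(columns=["MEDV"])
y_toy = toy["MEDV"]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0
)

# The imputer computes its medians during fit, i.e. from X_tr only,
# and reuses them when transforming X_te inside predict.
pipe = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
pipe.fit(X_tr, y_tr)
preds = pipe.predict(X_te)

print("Test rows predicted:", len(preds))
```

With only about 4% of values missing per column, the difference between the two approaches is likely to be small here; the pipeline mainly keeps the evaluation free of information from the test rows.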

In [30]: X = df.drop(columns=['MEDV'])
y = df['MEDV']

In [31]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # any arguments after test_size are cut off in the export

In [32]: model = LinearRegression()
model.fit(X_train, y_train)

Out[32]: LinearRegression()

In [33]: y_pred = model.predict(X_test)


In [34]: mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared (R2): {r2:.2f}")

Mean Squared Error: 25.00


R-squared (R2): 0.66
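
The MSE is in squared units of MEDV, which the final cell of the notebook treats as thousands of dollars (it multiplies predictions by 1000). Taking the square root gives an error in the same units as the prices. The sketch below does that for the printed value and also spells out the R-squared formula, 1 - SS_res / SS_tot, on a few made-up numbers; `y_true` and `y_hat` are hypothetical and not taken from the notebook. Since 25.00 is a rounded printout, the RMSE is approximate.

```python
import math

mse = 25.00                  # value printed by the notebook above (rounded)
rmse = math.sqrt(mse)        # back in MEDV units (thousands of dollars)
rmse_dollars = rmse * 1000   # same scaling the final cell applies to predictions

# R^2 = 1 - SS_res / SS_tot, shown on small hypothetical numbers.
y_true = [20.0, 25.0, 30.0, 35.0]
y_hat = [22.0, 24.0, 29.0, 37.0]
mean_true = sum(y_true) / len(y_true)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))   # residual sum of squares
ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
r2_toy = 1 - ss_res / ss_tot

print(f"RMSE: {rmse:.2f} (about ${rmse_dollars:,.0f})")
print(f"Toy R-squared: {r2_toy:.2f}")
```

Read this way, the model's typical error is roughly $5,000 per house, and an R-squared of 0.66 means it accounts for about two thirds of the variance in the test prices.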

In [35]: plt.figure(figsize=(8, 6))
sns.scatterplot(x=y_test, y=y_pred, color='blue', alpha=0.6)
# Reference line y = x; the color argument and anything after it are cut off in the export
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)])
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Prices")
plt.show()


In [41]: # Take user input for all features
print("Enter the house details to predict the price:")

CRIM = float(input("Crime rate per capita: "))
ZN = float(input("Proportion of residential land zoned for large lots: "))
INDUS = float(input("Proportion of non-retail business acres per town: "))
CHAS = float(input("Charles River (1 if bounds river, else 0): "))
NOX = float(input("Nitrogen oxide concentration (pollution level): "))
RM = float(input("Average number of rooms per dwelling: "))
AGE = float(input("Proportion of owner-occupied units built before 1940: "))
DIS = float(input("Weighted distance to employment centers: "))
RAD = int(input("Index of accessibility to highways: "))
TAX = int(input("Property tax rate per $10,000: "))
PTRATIO = float(input("Pupil-teacher ratio by town: "))
B = float(input("Proportion of Black residents: "))
LSTAT = float(input("Lower status population percentage: "))

# Store input values in a DataFrame, in the same column order as X
user_data = pd.DataFrame([[CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS,
                           RAD, TAX, PTRATIO, B, LSTAT]],
                         columns=X.columns)

# Predict the house price
predicted_price = model.predict(user_data)

# Display the result (MEDV is in thousands of dollars)
print(f"\nPredicted House Price: ${predicted_price[0] * 1000:.2f}")

Enter the house details to predict the price:


Crime rate per capita: 12
Proportion of residential land zoned for large lots: 42
Proportion of non-retail business acres per town: 52
Charles River (1 if bounds river, else 0): 45
Nitrogen oxide concentration (pollution level): 23
Average number of rooms per dwelling: 5
Proportion of owner-occupied units built before 1940: 10
Weighted distance to employment centers: 45
Index of accessibility to highways: 15
Property tax rate per $10,000: 5
Pupil-teacher ratio by town: 5
Proportion of Black residents: 200
Lower status population percentage: 20

Predicted House Price: $-245333.47
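
The negative price is most likely a result of the inputs rather than of the fit: several of the values typed in lie far outside anything in the data, and a linear model extrapolates without limit. A minimal sketch of a range check is shown below, using the minimum and maximum values from the `df.describe()` output above and the values entered in this run. Only the seven columns visible in that output are covered, and the helper `out_of_range` is a hypothetical name introduced for illustration, not part of the notebook.

```python
# Observed (min, max) per feature, copied from the df.describe() output above.
# Only the columns visible in that output are listed.
observed_range = {
    "CRIM": (0.00632, 88.9762),
    "ZN": (0.0, 100.0),
    "INDUS": (0.46, 27.74),
    "CHAS": (0.0, 1.0),
    "NOX": (0.385, 0.871),
    "RM": (3.561, 8.78),
    "AGE": (2.9, 100.0),
}

# Values typed in during the run shown above (same seven features).
entered = {"CRIM": 12, "ZN": 42, "INDUS": 52, "CHAS": 45, "NOX": 23, "RM": 5, "AGE": 10}


def out_of_range(values, ranges):
    """Return the feature names whose value lies outside the observed range."""
    return [
        name
        for name, value in values.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    ]


flagged = out_of_range(entered, observed_range)
print("Outside the observed range:", flagged)
```

For this run the check flags INDUS, CHAS and NOX. A check of this kind could be run before `model.predict` to warn the user or ask for the value again.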

In [ ]:
