DSBDA Assignment 4 Jupyter Notebook

The document is a Jupyter Notebook for an assignment involving housing data analysis using Python libraries such as pandas, numpy, and sklearn. It includes data loading, preprocessing, model training with linear regression, and prediction of house prices based on user input. The model evaluation shows a Mean Squared Error of 25.00 and an R-squared value of 0.66.

Uploaded by

sumeet


DSBDA-Assignment-4 - Jupyter Notebook http://localhost:8888/notebooks/DSBDA-Assignment-4...

In [21]: import numpy as np

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [22]: df = pd.read_csv("HousingData.csv")

In [23]: df

Out[23]:
        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B
0    0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90
1    0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90
2    0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83
3    0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63
4    0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90
...      ...   ...    ...   ...    ...    ...   ...     ...  ...  ...      ...     ...
501  0.06263   0.0  11.93   0.0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99
502  0.04527   0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90
503  0.06076   0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90
504  0.10959   0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45
505  0.04741   0.0  11.93   0.0  0.573  6.030   NaN  2.5050    1  273     21.0  396.90

506 rows × 14 columns

In [24]: df.head()

Out[24]:
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90

1 of 5 27/02/25, 11:30

In [25]: df.tail()

Out[25]:
        CRIM   ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT
501  0.06263  0.0  11.93   0.0  0.573  6.593  69.1  2.4786    1  273     21.0  391.99
502  0.04527  0.0  11.93   0.0  0.573  6.120  76.7  2.2875    1  273     21.0  396.90
503  0.06076  0.0  11.93   0.0  0.573  6.976  91.0  2.1675    1  273     21.0  396.90
504  0.10959  0.0  11.93   0.0  0.573  6.794  89.3  2.3889    1  273     21.0  393.45
505  0.04741  0.0  11.93   0.0  0.573  6.030   NaN  2.5050    1  273     21.0  396.90

In [26]: df.isnull().sum()

Out[26]: CRIM 20
ZN 20
INDUS 20
CHAS 20
NOX 0
RM 0
AGE 20
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 20
MEDV 0
dtype: int64

In [27]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CRIM 486 non-null float64
1 ZN 486 non-null float64
2 INDUS 486 non-null float64
3 CHAS 486 non-null float64
4 NOX 506 non-null float64
5 RM 506 non-null float64
6 AGE 486 non-null float64
7 DIS 506 non-null float64
8 RAD 506 non-null int64
9 TAX 506 non-null int64
10 PTRATIO 506 non-null float64
11 B 506 non-null float64
12 LSTAT 486 non-null float64
13 MEDV 506 non-null float64
dtypes: float64(12), int64(2)


In [28]: df.describe()

Out[28]:
             CRIM          ZN       INDUS        CHAS         NOX          RM         AGE
count  486.000000  486.000000  486.000000  486.000000  506.000000  506.000000  486.000000
mean     3.611874   11.211934   11.083992    0.069959    0.554695    6.284634   68.518519
std      8.720192   23.388876    6.835896    0.255340    0.115878    0.702617   27.999513
min      0.006320    0.000000    0.460000    0.000000    0.385000    3.561000    2.900000
25%      0.081900    0.000000    5.190000    0.000000    0.449000    5.885500   45.175000
50%      0.253715    0.000000    9.690000    0.000000    0.538000    6.208500   76.800000
75%      3.560263   12.500000   18.100000    0.000000    0.624000    6.623500   93.975000
max     88.976200  100.000000   27.740000    1.000000    0.871000    8.780000  100.000000

In [29]: df.fillna(df.median(numeric_only=True), inplace=True)

In [36]: df.isnull().sum()

Out[36]: CRIM 0
ZN 0
INDUS 0
CHAS 0
NOX 0
RM 0
AGE 0
DIS 0
RAD 0
TAX 0
PTRATIO 0
B 0
LSTAT 0
MEDV 0
dtype: int64
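
The median fill in cell In [29] computes each column's median over all 506 rows, including rows that later end up in the test split. The sketch below shows one possible variant that learns the medians from the training rows only, using scikit-learn's `SimpleImputer`. The `toy` frame and its 4/2 split are made-up illustrations, not values taken from HousingData.csv.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data standing in for two columns that contain gaps.
toy = pd.DataFrame({
    "AGE":   [65.2, 78.9, np.nan, 45.8, 54.2, np.nan],
    "LSTAT": [4.98, np.nan, 4.03, 2.94, 5.33, 9.14],
})

# Assumed split for illustration: first four rows train, last two test.
train, test = toy.iloc[:4], toy.iloc[4:]

# Learn the medians on the training rows, then reuse them on the test rows.
imputer = SimpleImputer(strategy="median")
train_filled = pd.DataFrame(imputer.fit_transform(train), columns=toy.columns)
test_filled = pd.DataFrame(imputer.transform(test), columns=toy.columns)

train_medians = dict(zip(toy.columns, imputer.statistics_))
print(train_medians)
print(test_filled)
```

With only 20 missing values per affected column out of 506 rows, the difference from the notebook's whole-frame median is likely small, but fitting on the training rows keeps test-set information out of preprocessing.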

In [30]: X = df.drop(columns=['MEDV'])
y = df['MEDV']

In [31]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2
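
The call in cell In [31] is cut off after `test_size=0.2` in this printout, so any further arguments are unknown. As a rough illustration of what a 20% hold-out means for a 506-row frame with 13 features, the sketch below splits placeholder arrays of the same shape. The `random_state=0` seed and the `_demo` names are assumptions added here for reproducibility, not taken from the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as X (506 x 13) and y (506,).
X_demo = np.zeros((506, 13))
y_demo = np.zeros(506)

# random_state=0 is an assumed seed so repeated runs give the same split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print("train rows:", X_tr.shape[0])
print("test rows:", X_te.shape[0])
```

Without a fixed seed the rows assigned to each split change between runs, so the reported metrics will vary slightly from run to run.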

In [32]: model = LinearRegression()

model.fit(X_train, y_train)

Out[32]: LinearRegression()
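
After fitting, a `LinearRegression` object exposes the learned weights as `coef_` and the bias as `intercept_`. The sketch below fits noiseless synthetic data with known weights to show how these attributes can be read back. The feature names `f0`, `f1`, `f2` and the weights are hypothetical and unrelated to the housing columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noiseless data: y = 2*f0 - 1*f1 + 0.5*f2 + 4
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_coef + 4.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each (hypothetical) feature name with its fitted coefficient.
for name, coef in zip(["f0", "f1", "f2"], demo_model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {demo_model.intercept_:.3f}")
```

On the notebook's own model, the same pattern with `X.columns` and `model.coef_` would list one weight per housing feature.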

In [33]: y_pred = model.predict(X_test)


In [34]: mse = mean_squared_error(y_test, y_pred)

r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")

print(f"R-squared (R2): {r2:.2f}")

Mean Squared Error: 25.00

R-squared (R2): 0.66
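
To make the two reported numbers concrete, the sketch below recomputes MSE and R² by hand on five made-up price pairs and compares them with the scikit-learn functions used above. The `y_true` and `y_hat` values are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical actual and predicted values (same units as MEDV).
y_true = np.array([24.0, 21.6, 34.7, 33.4, 36.2])
y_hat = np.array([25.0, 20.6, 32.7, 35.4, 36.2])

residuals = y_true - y_hat

# MSE: mean of squared residuals; RMSE puts it back in the target's units.
mse_manual = np.mean(residuals ** 2)
rmse = np.sqrt(mse_manual)

# R^2: 1 - (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE  manual={mse_manual:.4f} sklearn={mean_squared_error(y_true, y_hat):.4f}")
print(f"RMSE manual={rmse:.4f}")
print(f"R2   manual={r2_manual:.4f} sklearn={r2_score(y_true, y_hat):.4f}")
```

Applied to the notebook's result, an MSE of 25.00 corresponds to an RMSE of about 5.0. Since the prediction cell multiplies by 1000, MEDV appears to be in thousands of dollars, which would make the typical error roughly $5,000.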

In [35]: plt.figure(figsize=(8, 6))

sns.scatterplot(x=y_test, y=y_pred, color='blue', alpha=0.6)
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted Prices")
plt.show()


In [41]: # Take user input for all features

print("Enter the house details to predict the price:")

CRIM = float(input("Crime rate per capita: "))

ZN = float(input("Proportion of residential land zoned for large lots: "))
INDUS = float(input("Proportion of non-retail business acres per town: "))
CHAS = float(input("Charles River (1 if bounds river, else 0): "))
NOX = float(input("Nitrogen oxide concentration (pollution level): "))
RM = float(input("Average number of rooms per dwelling: "))
AGE = float(input("Proportion of owner-occupied units built before 1940: "))
DIS = float(input("Weighted distance to employment centers: "))
RAD = int(input("Index of accessibility to highways: "))
TAX = int(input("Property tax rate per $10,000: "))
PTRATIO = float(input("Pupil-teacher ratio by town: "))
B = float(input("Proportion of Black residents: "))
LSTAT = float(input("Lower status population percentage: "))

# Store input values in a DataFrame

user_data = pd.DataFrame([[CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS,
                           RAD, TAX, PTRATIO, B, LSTAT]],
                         columns=X.columns)

# Predict the house price

predicted_price = model.predict(user_data)

# Display the result

print(f"\nPredicted House Price: ${predicted_price[0] * 1000:.2f}")

Enter the house details to predict the price:

Crime rate per capita: 12
Proportion of residential land zoned for large lots: 42
Proportion of non-retail business acres per town: 52
Charles River (1 if bounds river, else 0): 45
Nitrogen oxide concentration (pollution level): 23
Average number of rooms per dwelling: 5
Proportion of owner-occupied units built before 1940: 10
Weighted distance to employment centers: 45
Index of accessibility to highways: 15
Property tax rate per $10,000: 5
Pupil-teacher ratio by town: 5
Proportion of Black residents: 200
Lower status population percentage: 20

Predicted House Price: $-245333.47
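
The negative price is consistent with a linear model being asked to extrapolate. Several of the entered values lie far outside what the `df.describe()` output shows for the data: CHAS was entered as 45 (observed max 1.0) and NOX as 23 (observed max 0.871). The sketch below shows one way to flag such inputs before predicting. The helper `out_of_range` and the `train_demo` frame are hypothetical; `train_demo` holds only the min and max of three columns as printed in the `describe()` table.

```python
import pandas as pd


def out_of_range(train_features: pd.DataFrame, row: dict) -> dict:
    """Return {feature: (value, min, max)} for values outside the observed range."""
    flagged = {}
    for col in train_features.columns:
        lo, hi = train_features[col].min(), train_features[col].max()
        if not lo <= row[col] <= hi:
            flagged[col] = (row[col], lo, hi)
    return flagged


# Min and max for three columns, copied from the describe() output above.
train_demo = pd.DataFrame({
    "CHAS": [0.0, 1.0],
    "NOX": [0.385, 0.871],
    "RM": [3.561, 8.780],
})

# The corresponding values typed in during the run shown above.
user_row = {"CHAS": 45, "NOX": 23, "RM": 5}

flagged = out_of_range(train_demo, user_row)
for col, (value, lo, hi) in flagged.items():
    print(f"{col}={value} is outside the observed range [{lo}, {hi}]")
```

In the notebook this check could be run against `X` (or `X_train`) before calling `model.predict`, warning the user or rejecting the input when any feature is flagged.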
