
Prac - 8 (1) - Jupyter Notebook

This Python program uses linear regression to predict house prices based on various housing features in a dataset. The program loads and explores the data, splits it into training and test sets, trains a linear regression model on the training set and uses it to make predictions on the test set, and evaluates the model performance.

Uploaded by

Tirth Shah
Copyright
© All Rights Reserved

Practical-8

Write a Python program for house price prediction using linear regression.

In [45]: import pandas as pd
         import numpy as np
         import matplotlib.pyplot as plt
         import seaborn as sns
         from sklearn import metrics
         from sklearn.linear_model import LinearRegression
         from sklearn.model_selection import train_test_split
         %matplotlib inline
         data=pd.read_csv("/content/USA_Housing.csv")
         data.head()



Out[45]:    Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price  Address
         0      79545.458574             5.682861                   7.009188                          4.09     23086.800503  1.059034e+06  208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
         1      79248.642455             6.002900                   6.730821                          3.09     40173.072174  1.505891e+06  188 Johnson Views Suite 079\nLake Kathleen, CA...
         2      61287.067179             5.865890                   8.512727                          5.13     36882.159400  1.058988e+06  9127 Elizabeth Stravenue\nDanieltown, WI 06482...
         3      63345.240046             7.188236                   5.586729                          3.26     34310.242831  1.260617e+06  USS Barnett\nFPO AP 44820
         4      59982.197226             5.040555                   7.839388                          4.23     26354.109472  6.309435e+05  USNS Raymond\nFPO AE 09386

In [16]: data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
 #   Column                        Non-Null Count  Dtype
---  ------                        --------------  -----
 0   Avg. Area Income              5000 non-null   float64
 1   Avg. Area House Age           5000 non-null   float64
 2   Avg. Area Number of Rooms     5000 non-null   float64
 3   Avg. Area Number of Bedrooms  5000 non-null   float64
 4   Area Population               5000 non-null   float64
 5   Price                         5000 non-null   float64
 6   Address                       5000 non-null   object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB

In [18]: data.describe()

Out[18]:        Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price
         count       5000.000000          5000.000000                5000.000000                   5000.000000      5000.000000  5.000000e+03
         mean       68583.108984             5.977222                   6.987792                      3.981330     36163.516039  1.232073e+06
         std        10657.991214             0.991456                   1.005833                      1.234137      9925.650114  3.531176e+05
         min        17796.631190             2.644304                   3.236194                      2.000000       172.610686  1.593866e+04
         25%        61480.562388             5.322283                   6.299250                      3.140000     29403.928702  9.975771e+05
         50%        68804.286404             5.970429                   7.002902                      4.050000     36199.406689  1.232669e+06
         75%        75783.338666             6.650808                   7.665871                      4.490000     42861.290769  1.471210e+06
         max       107701.748378             9.519088                  10.759588                      6.500000     69621.713378  2.469066e+06

In [20]: data.columns

Out[20]: Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
                'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
               dtype='object')

In [21]: sns.pairplot(data)

Out[21]: <seaborn.axisgrid.PairGrid at 0x7f43f2feccd0>


In [22]: sns.displot(data['Price'])

Out[22]: <seaborn.axisgrid.FacetGrid at 0x7f43f1f77810>

In [23]: # Correlate the numeric columns only; 'Address' is text
         sns.heatmap(data.select_dtypes(include='number').corr(), annot=True)

Out[23]: <matplotlib.axes._subplots.AxesSubplot at 0x7f43f1ec2ed0>
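The heatmap is built from a correlation matrix, and the text column `Address` cannot take part in one. Older pandas releases dropped such columns silently, while newer ones raise an error, so it can help to select the numeric columns explicitly. A minimal sketch on a made-up frame (the values and the name `toy` are hypothetical, not rows from USA_Housing.csv):

```python
import pandas as pd

# Hypothetical miniature of the housing frame: two numeric columns, one text column.
toy = pd.DataFrame({
    "Avg. Area Income": [50000.0, 60000.0, 70000.0, 80000.0],
    "Price": [500000.0, 650000.0, 700000.0, 900000.0],
    "Address": ["a", "b", "c", "d"],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```

The resulting `corr` frame is what `sns.heatmap(..., annot=True)` would draw, one cell per pair of numeric columns.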


In [25]: X = data[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population']]

y = data['Price']

In [30]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_s
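With `test_size=0.4`, 40% of the 5000 rows are held out for testing and the remaining 60% are used for training. The sketch below illustrates this on synthetic data of the same shape; the arrays `X_demo` and `y_demo` are stand-ins, and `random_state=0` is an arbitrary value chosen for the sketch, not the value used in the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the notebook's X and y
# (5000 rows, 5 feature columns). Not the real housing data.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5000, 5))
y_demo = rng.normal(size=5000)

# random_state fixes the shuffle so the split is reproducible.
# The value 0 is an assumption made for this sketch only.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=0
)

print(X_tr.shape, X_te.shape)
```

Fixing `random_state` to any integer makes repeated runs produce the same train and test rows, which keeps the reported error metrics comparable between runs.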


In [32]: ml=LinearRegression()

ml.fit(X_train,y_train)

Out[32]: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [33]: print(ml.intercept_)

-2640159.796853739
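The intercept is the value the model predicts when every feature is zero. No area in this dataset is close to that (the minimum average income alone is about 17796), so a large negative intercept is not a meaningful price by itself; it only becomes one after the weighted features are added. The sketch below shows that relationship with an ordinary least-squares fit in NumPy on synthetic, noise-free data. The names `X_demo`, `true_intercept` and `true_coefs` are hypothetical, and NumPy's solver is used here as a stand-in for `LinearRegression`.

```python
import numpy as np

# Synthetic, noise-free data generated from a known linear rule.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
true_intercept = -3.0
true_coefs = np.array([2.0, 0.5])
y_demo = true_intercept + X_demo @ true_coefs

# Prepend a column of ones so the intercept is estimated together
# with the feature coefficients in one least-squares solve.
A = np.column_stack([np.ones(len(X_demo)), X_demo])
params, *_ = np.linalg.lstsq(A, y_demo, rcond=None)
intercept, coefs = params[0], params[1:]

# A prediction is the intercept plus the weighted sum of the features.
manual_pred = intercept + X_demo @ coefs

print(intercept, coefs)
```

In the notebook, the fitted model's `ml.intercept_` and `ml.coef_` attributes play the roles of `intercept` and `coefs`.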

In [40]: predictions=ml.predict( X_test)

In [41]: plt.scatter(y_test,predictions)

Out[41]: <matplotlib.collections.PathCollection at 0x7f43f1a9e250>


In [42]: sns.distplot((y_test-predictions),bins=50);

/usr/local/lib/python3.7/dist-packages/seaborn/distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)

In [46]: print('MAE:', metrics.mean_absolute_error(y_test, predictions))
         print('MSE:', metrics.mean_squared_error(y_test, predictions))
         print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))

MAE: 82288.22251914928

MSE: 10460958907.208244

RMSE: 102278.82922290538
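The three metrics are all summaries of the residuals `y_test - predictions`: MAE averages their absolute values, MSE averages their squares, and RMSE is the square root of MSE, which brings the error back to price units. A minimal sketch computing them by hand; the helper name `regression_errors` and the four-value example are hypothetical, not taken from the notebook.

```python
import math
import numpy as np

def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(residuals)))   # mean absolute error
    mse = float(np.mean(residuals ** 2))      # mean squared error
    return mae, mse, math.sqrt(mse)           # RMSE is sqrt(MSE)

# Small made-up example, unrelated to the housing data.
mae, mse, rmse = regression_errors([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(mae, mse, rmse)

# Consistency check on the notebook's own figures: sqrt(MSE) should match RMSE.
print(math.sqrt(10460958907.208244))
```

For scale, the RMSE of roughly 102279 is about 8% of the mean price of 1.232073e+06 reported by `data.describe()`.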
