Nikitha

This document contains code for simple and multiple linear regression on a housing price dataset, predicting price primarily from square meters. It loads and explores the Paris housing dataset, selects the features most correlated with price, then fits linear regression by gradient descent with different learning rates and tracks the cost function over iterations. On the raw (unscaled) feature both runs diverge; on standardized features, the plotted cost histories show a learning rate of 0.1 driving the cost down faster than 0.01. Results are cross-checked with scikit-learn's LinearRegression.

import pandas as pd

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

dataset = pd.read_csv("/content/ParisHousing.csv")
dataset.head(20)

    squareMeters  numberOfRooms  hasYard  hasPool  floors  cityCode  \
0          75523              3        0        1      63      9373
1          80771             39        1        1      98     39381
2          55712             58        0        1      19     34457
3          32316             47        0        0       6     27939
4          70429             19        1        1      90     38045
5          39223             36        0        1      17     39489
6          58682             10        1        1      99      6450
7          86929            100        1        0      11     98155
8          51522              3        0        0      61      9047
9          39686             42        0        0      15     71019
10         23563             21        0        1      90     91058
11         96470             74        1        0      21     92029
12         19127             31        1        0       5      7475
13         13087             44        1        0      77     40475
14         79770              3        0        1      69     54812
15         75985             60        1        0      67      6517
16         64169             88        0        1       6     61711
17         99371             31        1        1      16     96297
18         25966             37        1        1      17     22818
19         41792             43        1        1      10     80768

    cityPartRange  numPrevOwners  made  isNewBuilt  hasStormProtector  \
0               3              8  2005           0                  1
1               8              6  2015           1                  0
2               6              8  2021           0                  0
3              10              4  2012           0                  1
4               3              7  1990           1                  0
5               8              6  2012           0                  1
6              10              9  1995           1                  1
7               3              4  2003           1                  0
8               8              3  2012           1                  1
9               5              8  2021           1                  1
10              6              8  1993           1                  0
11              4              2  2011           1                  1
12              2              9  2008           0                  0
13              8              4  2004           1                  0
14             10              5  2018           0                  1
15              6              9  2009           1                  1
16              3              9  2011           1                  1
17              7              8  2013           1                  1
18              3              1  2016           0                  0
19              9              5  2017           1                  1

    basement  attic  garage  hasStorageRoom  hasGuestRoom      price
0       4313   9005     956               0             7  7559081.5
1       3653   2436     128               1             2  8085989.5
2       2937   8852     135               1             9  5574642.1
3        659   7141     359               0             3  3232561.2
4       8435   2429     292               1             4  7055052.0
5       2009   4552     757               0             1  3926647.2
6       5930   9453     848               0             5  5876376.5
7       6326   4748     654               0            10  8696869.3
8        632   5792     807               1             5  5154055.2
9       5198   5342     591               1             3  3970892.1
10       703    852     684               1            10  2366397.3
11      5414   1172     716               1             9  9652258.1
12      5387   4430     374               0             4  1914688.8
13      1745    724     582               0             0  1320803.4
14      8871   7117     240               0             7  7986665.8
15      4878    281     384               1             5  7607322.9
16      3054    129     726               0             9  6420823.1
17      3258   6296     354               1             8  9944705.3
18      8257   2557     162               0             6  2604486.6
19      2950   9573     572               1             5  4187667.7

dataset.shape

(10000, 17)

dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 squareMeters 10000 non-null int64
1 numberOfRooms 10000 non-null int64
2 hasYard 10000 non-null int64
3 hasPool 10000 non-null int64
4 floors 10000 non-null int64
5 cityCode 10000 non-null int64
6 cityPartRange 10000 non-null int64
7 numPrevOwners 10000 non-null int64
8 made 10000 non-null int64
9 isNewBuilt 10000 non-null int64
10 hasStormProtector 10000 non-null int64
11 basement 10000 non-null int64
12 attic 10000 non-null int64
13 garage 10000 non-null int64
14 hasStorageRoom 10000 non-null int64
15 hasGuestRoom 10000 non-null int64
16 price 10000 non-null float64
dtypes: float64(1), int64(16)
memory usage: 1.3 MB

corr = dataset.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f",
linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()
correlation_with_price = dataset.corr()['price'].abs()

threshold = 0.01

highly_correlated_columns = correlation_with_price[correlation_with_price > threshold].index.tolist()

print("Columns highly correlated with 'Price':")
print(highly_correlated_columns)

Columns highly correlated with 'Price':
['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage', 'price']
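The 0.01 cutoff is deliberately permissive; judging by the near-perfect R² obtained later from squareMeters alone, that feature carries almost all of the correlation with price. An optional check of the actual correlation magnitudes (standard pandas, using the dataset already loaded):

# Optional: list absolute correlations with price, strongest first.
price_corr = dataset.corr()['price'].abs().sort_values(ascending=False)
print(price_corr)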

df = dataset[highly_correlated_columns]
df.head()

squareMeters numPrevOwners isNewBuilt garage price


0 75523 8 0 956 7559081.5
1 80771 6 1 128 8085989.5
2 55712 8 0 135 5574642.1
3 32316 4 0 359 3232561.2
4 70429 7 1 292 7055052.0

Gradient Descent, Learning Rate, Cost Function
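For reference, the code below fits y ≈ b0 + b1·x by minimizing the mean squared error with plain gradient descent. In the code's notation:

    J(b0, b1) = (1/m) · Σ (ŷᵢ − yᵢ)²,  with ŷᵢ = b0 + b1·xᵢ
    b0 ← b0 − α · mean(ŷ − y)
    b1 ← b1 − α · mean((ŷ − y) · x)

Note the conventional factor of 2 from differentiating the squared error is omitted; that only rescales the learning rate α.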

X = df['squareMeters'].values
y = df['price'].values

learning_rate = 0.01
iterations = 10

# Initialize coefficients (slope and intercept)
b0 = 0  # Intercept
b1 = 0  # Slope

# Lists to store the history of coefficients and cost
b0_history = []
b1_history = []
cost_history = []

# Gradient Descent
for iteration in range(iterations):
    # Calculate predictions
    y_pred = b0 + b1 * X

    # Calculate the cost (mean squared error)
    cost = np.mean((y_pred - y) ** 2)

    # Calculate gradients
    gradient_b0 = np.mean(y_pred - y)
    gradient_b1 = np.mean((y_pred - y) * X)

    # Update coefficients using gradients and learning rate
    b0 -= learning_rate * gradient_b0
    b1 -= learning_rate * gradient_b1

    # Append coefficients and cost to history lists for visualization
    b0_history.append(b0)
    b1_history.append(b1)
    cost_history.append(cost)

# Plot the cost history
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History')

plt.tight_layout()
plt.show()

# Print the final coefficients and cost
print("Final Intercept (b0):", b0)
print("Final Slope (b1):", b1)
print("Final Cost:", cost_history[-1])

Final Intercept (b0): -2.4127252943307258e+72
Final Slope (b1): -1.6037599090064153e+77
Final Cost: 7.759017749802741e+148

This run diverges: squareMeters is on the order of 10^4, so the gradient on b1 is enormous and every step overshoots the minimum.

X = df['squareMeters'].values
y = df['price'].values

learning_rate = 0.1
iterations = 10

# Initialize coefficients (slope and intercept)
b0 = 0  # Intercept
b1 = 0  # Slope

# Lists to store the history of coefficients and cost
b0_history = []
b1_history = []
cost_history = []

# Gradient Descent
for iteration in range(iterations):
    # Calculate predictions
    y_pred = b0 + b1 * X

    # Calculate the cost (mean squared error)
    cost = np.mean((y_pred - y) ** 2)

    # Calculate gradients
    gradient_b0 = np.mean(y_pred - y)
    gradient_b1 = np.mean((y_pred - y) * X)

    # Update coefficients using gradients and learning rate
    b0 -= learning_rate * gradient_b0
    b1 -= learning_rate * gradient_b1

    # Append coefficients and cost to history lists for visualization
    b0_history.append(b0)
    b1_history.append(b1)
    cost_history.append(cost)

# Plot the cost history
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(cost_history)
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Cost History')

plt.tight_layout()
plt.show()

# Print the final coefficients and cost
print("Final Intercept (b0):", b0)
print("Final Slope (b1):", b1)
print("Final Cost:", cost_history[-1])

Final Intercept (b0): -2.4127259493867817e+82
Final Slope (b1): -1.603760344427987e+87
Final Cost: 7.759021541641718e+166
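
Both raw-feature runs diverge; the larger learning rate merely diverges faster. A minimal sketch of the standard fix, standardizing the variables before gradient descent; X and y are the arrays loaded above, and the variable names below are mine:

# Sketch: z-score both variables so the gradient steps are well scaled.
X_scaled = (X - X.mean()) / X.std()
y_scaled = (y - y.mean()) / y.std()

b0, b1 = 0.0, 0.0
for _ in range(100):
    y_hat = b0 + b1 * X_scaled
    b0 -= 0.1 * np.mean(y_hat - y_scaled)
    b1 -= 0.1 * np.mean((y_hat - y_scaled) * X_scaled)

print("Slope in standardized space:", b1)
# For standardized data the optimal slope equals corr(X, y),
# which is very close to 1 for this dataset.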

from sklearn.model_selection import train_test_split

X = df['squareMeters'].values
y = df['price'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

from sklearn.linear_model import LinearRegression

model = LinearRegression()

X_train = X_train.reshape(-1, 1)
X_test = X_test.reshape(-1, 1)
# Fit the model to the data
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("R-squared:", r2)
print("Mean Squared Error:", mse)

# Plot each independent variable against the dependent variable
for i in range(1):
    plt.figure(figsize=(6, 4))
    plt.scatter(X_train[:, i], y_train, label='Data')
    plt.xlabel(df.columns[i])
    plt.ylabel('Price')
    plt.title(f'Scatter Plot of {df.columns[i]} vs. Price')

    # Plot the regression line
    sorted_indices = np.argsort(X_test[:, i])
    plt.plot(X_test[:, i][sorted_indices], y_pred[sorted_indices],
             color='red', label='Linear Regression')

    plt.legend()
    plt.show()

R-squared: 0.999998793097589
Mean Squared Error: 10440151.787275104
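
An R² this close to 1 suggests price is almost a deterministic linear function of squareMeters in this dataset. For reference, sklearn's fit can be checked against the closed-form least-squares solution; a small optional sketch (np.polyfit is standard NumPy, the printout is mine):

# Optional check: closed-form least squares on the training split.
slope, intercept = np.polyfit(X_train[:, 0], y_train, deg=1)
print("Closed-form slope:", slope)
print("Closed-form intercept:", intercept)
# These should match model.coef_[0] and model.intercept_.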

Gradient Descent, Cost Function, Learning Rate for Multiple Regression
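In vectorized form, with design matrix X (a leading column of ones plus the standardized features) and parameter vector θ, the code below computes ŷ = X·θ and updates

    θ ← θ − (α/m) · Xᵀ(ŷ − y)

As in the simple case, the factor of 2 from the MSE derivative is absorbed into α.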

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.01
num_iterations = 10

# Initialization
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

print("Parameters:", theta)

Parameters: [-8.27782287e-19 -9.70733553e-04  1.51884192e-03  9.56150111e-02
 -1.57511062e-03]

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.10
num_iterations = 10

# Initialization
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

print("Parameters:", theta)

Parameters: [ 2.11741735e-17 -4.05175641e-03  6.47160620e-03  6.51187869e-01
 -6.73095587e-03]
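
After 10 iterations the alpha = 0.10 run has pushed the squareMeters coefficient (theta[3]) to about 0.65, versus about 0.096 for alpha = 0.01; both are still short of the least-squares optimum. A hedged sketch of the closed-form target, assuming the standardized X and y built above (np.linalg.lstsq is standard NumPy):

# Optional reference: the exact least-squares parameters that
# gradient descent is converging toward.
theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Closed-form parameters:", theta_star)
# theta_star[3] (squareMeters) should be close to 1 after standardization.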

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.01
num_iterations = 20

# Initialization
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Cost history to store MSE values for each iteration
cost_history = []

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    cost = (1/m) * np.sum((y_pred - y) ** 2)
    cost_history.append(cost)

    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

plt.plot(cost_history)
plt.title('Cost Function Over Iterations (Multiple Regression)')
plt.xlabel('Iterations')
plt.ylabel('Cost (MSE)')
plt.show()

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']]
y = df['price']

def normalize(feature):
    """Standardize the feature using Z-score normalization."""
    return (feature - np.mean(feature)) / np.std(feature)

# Hyperparameters
alpha = 0.10
num_iterations = 20

# Initialization
m = len(X['squareMeters'])
X0 = np.ones(m)
X1 = normalize(np.array(X['isNewBuilt']))
X2 = normalize(np.array(X['numPrevOwners']))
X3 = normalize(np.array(X['squareMeters']))
X4 = normalize(np.array(X['garage']))
y = normalize(np.array(y))

X = np.array([X0, X1, X2, X3, X4]).T

theta = np.zeros(5)

# Cost history to store MSE values for each iteration
cost_history = []

# Gradient Descent
for _ in range(num_iterations):
    y_pred = np.dot(X, theta)
    cost = (1/m) * np.sum((y_pred - y) ** 2)
    cost_history.append(cost)

    gradient = (1/m) * np.dot(X.T, (y_pred - y))
    theta -= alpha * gradient

plt.plot(cost_history)
plt.title('Cost Function Over Iterations (Multiple Regression)')
plt.xlabel('Iterations')
plt.ylabel('Cost (MSE)')
plt.show()
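
The two cost plots are easier to compare overlaid on one axis. A minimal sketch, assuming the standardized design matrix X and target y from above are still in scope; run_gd is a hypothetical helper name of mine:

# Sketch: overlay the cost histories for both learning rates.
def run_gd(X, y, alpha, num_iterations):
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(num_iterations):
        y_pred = X.dot(theta)
        history.append(np.mean((y_pred - y) ** 2))
        theta -= alpha * (1 / len(y)) * X.T.dot(y_pred - y)
    return history

for alpha in (0.01, 0.10):
    plt.plot(run_gd(X, y, alpha, 20), label=f'alpha = {alpha}')
plt.xlabel('Iterations')
plt.ylabel('Cost (MSE)')
plt.title('Cost vs. Iterations for Two Learning Rates')
plt.legend()
plt.show()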
from sklearn.model_selection import train_test_split

X = df[['squareMeters', 'numPrevOwners', 'isNewBuilt', 'garage']].values
y = df['price'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

from sklearn.linear_model import LinearRegression

model = LinearRegression()

# Fit the model to the data


model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

from sklearn.metrics import r2_score, mean_squared_error

r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("R-squared:", r2)
print("Mean Squared Error:", mse)

# Plot the first feature (squareMeters) against the dependent variable
for i in range(1):
    plt.figure(figsize=(6, 4))
    plt.scatter(X_train[:, i], y_train, label='Data')
    plt.xlabel(df.columns[i])
    plt.ylabel('Price')
    plt.title(f'Scatter Plot of {df.columns[i]} vs. Price')

    # Plot the model's predictions against this feature
    sorted_indices = np.argsort(X_test[:, i])
    plt.plot(X_test[:, i][sorted_indices], y_pred[sorted_indices],
             color='red', label='Linear Regression')

    plt.legend()
    plt.show()

R-squared: 0.9999987942704442
Mean Squared Error: 10430006.155390566
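
Adding the three weakly correlated features barely changes R² or the MSE relative to the single-feature model, consistent with squareMeters carrying essentially all of the signal. An optional visual check of fit quality, assuming y_test and y_pred from above:

# Optional: predicted vs. actual prices on the test split.
plt.figure(figsize=(5, 5))
plt.scatter(y_test, y_pred, s=5, label='Predictions')
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()],
         color='red', label='Perfect prediction')
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.legend()
plt.show()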
