Linear - Regression - Ipynb - Colaboratory

The document builds a linear regression model to predict salary from years of experience. It loads the salary and experience data, splits it into training and test sets, fits a linear regression model to the training set, makes predictions on the test set, and calculates error metrics to evaluate the model's performance. Key steps include data exploration, training the linear regression model, making predictions, and evaluating the model with MAE, MSE, and RMSE.

Uploaded by avnimote121

3/5/24, 2:16 AM Linear_regression.ipynb - Colaboratory

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline

Data = pd.read_csv('Salary_Data - Salary_Data.csv')

Data.head(20)

    YearsExperience  Salary
0               1.1   39343
1               1.3   46205
2               1.5   37731
3               2.0   43525
4               2.2   39891
5               2.9   56642
6               3.0   60150
7               3.2   54445
8               3.2   64445
9               3.7   57189
10              3.9   63218
11              4.0   55794
12              4.0   56957
13              4.1   57081
14              4.5   61111
15              4.9   67938
16              5.1   66029
17              5.3   83088
18              5.9   81363
19              6.0   93940

Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   YearsExperience  30 non-null     float64
 1   Salary           30 non-null     int64
dtypes: float64(1), int64(1)
memory usage: 608.0 bytes

Data.describe()

       YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000

https://colab.research.google.com/drive/1iygS0lL6Sp60B6yLtgLDcZvDRP3b70sn#scrollTo=3bb9f1b6&printMode=true

Data.columns

Index(['YearsExperience', 'Salary'], dtype='object')

sns.pairplot(Data)

<seaborn.axisgrid.PairGrid at 0x7c131d8be9b0>

sns.heatmap(Data.corr(),annot=True)

<Axes: >
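The heatmap annotates the Pearson correlation between the two columns, but the plot itself does not survive in this printout. As a rough sketch, the same quantity can be computed with NumPy. The arrays below contain only the 20 rows printed by `Data.head(20)`, so the result is an assumption-laden approximation and will differ from the value on the full 30-row dataset.

```python
import numpy as np

# First 20 rows as printed by Data.head(20); the remaining 10 rows are not shown here.
years = np.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
                  3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0])
salary = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445,
                   64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938,
                   66029, 83088, 81363, 93940], dtype=float)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(years, salary)[0, 1]
print(f"Pearson r on the displayed rows: {r:.3f}")
```

A value close to 1 indicates a strong positive linear relationship, which is what makes a straight-line model a reasonable first choice here.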

sns.distplot(Data["Salary"])

<ipython-input-243-a62986691193>:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(Data["Salary"])
<Axes: xlabel='Salary', ylabel='Density'>

Data.columns

Index(['YearsExperience', 'Salary'], dtype='object')

# Feature matrix and target; the double brackets keep both as 2-D DataFrames.
X=Data[['YearsExperience']]
y=Data[['Salary']]

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.4,random_state=21)
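With 30 rows and `test_size=0.4`, the split above holds out 12 rows for testing and keeps 18 for training, and fixing `random_state` makes the shuffle reproducible. The sketch below checks both points on a hypothetical 30-row frame that only borrows the notebook's column names; the values are made up.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 30-row frame with the notebook's column names (values are invented).
demo = pd.DataFrame({
    "YearsExperience": np.linspace(1.0, 10.5, 30),
    "Salary": np.linspace(38000, 122000, 30),
})
X = demo[["YearsExperience"]]
y = demo[["Salary"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=21
)
print(len(X_train), len(X_test))

# Repeating the call with the same random_state selects the same rows.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.4, random_state=21)
same_split = X_train.index.equals(X_train2.index)
print(same_split)
```

Holding out 40% of such a small dataset leaves few rows for fitting, so the error metrics can vary noticeably with the chosen seed.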

from sklearn.linear_model import LinearRegression

lr=LinearRegression()

lr.fit(X_train,y_train)

LinearRegression()

coff=pd.DataFrame(lr.coef_,X.columns,columns=['coefficient'])
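The `coff` frame holds the fitted slope, that is, the predicted change in salary per additional year of experience; the intercept is stored separately in `lr.intercept_`. The sketch below shows how to read both, using hypothetical noise-free data generated from `salary = 25000 + 9500 * years` so that the recovered values can be checked by eye. The numbers are illustrative and are not the notebook's fitted coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: salary = 25000 + 9500 * years (not the real dataset).
years = np.arange(1, 11, dtype=float)
demo = pd.DataFrame({"YearsExperience": years, "Salary": 25000 + 9500 * years})

X = demo[["YearsExperience"]]
y = demo[["Salary"]]  # 2-D target, as in the notebook

lr = LinearRegression().fit(X, y)

# With a 2-D target, coef_ has shape (n_targets, n_features) = (1, 1)
# and intercept_ has shape (n_targets,) = (1,).
slope = float(lr.coef_[0, 0])
intercept = float(lr.intercept_[0])

coff = pd.DataFrame(lr.coef_, X.columns, columns=["coefficient"])
print(coff)
print(f"slope={slope:.1f}, intercept={intercept:.1f}")
```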

predictions=lr.predict(X_test)
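The same `predict` call works for experience values that are not in the dataset, as long as the input has the same column name and 2-D shape as the training features. The sketch below is only an illustration: it fits on the 20 rows printed by `Data.head(20)` rather than on the notebook's training split, and the query points 5.0 and 7.5 years are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The 20 rows printed by Data.head(20); the full dataset has 30 rows.
head = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
                        3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0],
    "Salary": [39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445,
               64445, 57189, 63218, 55794, 56957, 57081, 61111, 67938,
               66029, 83088, 81363, 93940],
})

lr = LinearRegression().fit(head[["YearsExperience"]], head[["Salary"]])

# Hypothetical query points; the column name must match the training feature.
new_X = pd.DataFrame({"YearsExperience": [5.0, 7.5]})
new_pred = lr.predict(new_X)

# Because the target was 2-D, predictions come back with shape (n_samples, 1).
print(new_pred.shape)
print(new_pred.ravel())
```

Note that 7.5 years lies outside the range of the rows used here, so that prediction is an extrapolation and should be treated with more caution.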

plt.scatter(y_test,predictions)

<matplotlib.collections.PathCollection at 0x7c131d223cd0>

from sklearn import metrics

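# Each metric is stored as a (label, value) tuple and printed.
a="MAE",metrics.mean_absolute_error(y_test,predictions)
print(a)
b="MSE",metrics.mean_squared_error(y_test,predictions)
print(b)
# RMSE is the square root of the MSE; rounded to two decimals for display.
c="RMSE",round(float(np.sqrt(metrics.mean_squared_error(y_test,predictions))),2)
print(c)

('MAE', 5483.199040982923)
('MSE', 47564840.41388013)
('RMSE', 6896.73)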
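For reference, the three metrics named in the summary follow directly from the prediction errors: MAE is the mean absolute error, MSE is the mean squared error, and RMSE is the square root of the MSE, which puts it back in the units of the target (salary). The sketch below computes them by hand with NumPy on a tiny hypothetical example and compares the results with scikit-learn's functions.

```python
import numpy as np
from sklearn import metrics

# Tiny hypothetical example, not taken from the notebook's data.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_pred - y_true          # [-1, 0, 2]
mae = np.mean(np.abs(errors))     # mean absolute error
mse = np.mean(errors ** 2)        # mean squared error
rmse = np.sqrt(mse)               # root mean squared error = sqrt(MSE)

# The hand-computed values agree with scikit-learn's implementations.
mae_matches = np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred))
mse_matches = np.isclose(mse, metrics.mean_squared_error(y_true, y_pred))
print(mae, mse, rmse)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE; a large gap between the two suggests a few predictions are far off.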
