LinearRegression_Iris

The document discusses linear regression as a statistical method for estimating relationships between dependent and independent variables, primarily for prediction and forecasting. It uses the Salary dataset to predict salary based on years of experience and the Iris dataset to demonstrate linear regression with features like sepal and petal lengths. The process includes data loading, model training, and performance evaluation using metrics such as R² score and RMSE.

Linear Regression

Regression analysis is a statistical process used to estimate the relationships between a dependent variable and one or more independent variables. Regression analysis is mostly used for prediction and forecasting, which overlaps with machine learning. In this task we will experiment with a few linear regression use cases.

The objective of LinearRegression is to fit a linear model to the dataset by adjusting a set of parameters in order to make the sum of the squared residuals of the model as small as possible.

A linear model is defined by y = b0 + b1*x, where y is the target variable, x is the data, and b0 and b1 are the coefficients (the intercept and the slope).
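To make the "sum of squared residuals" idea concrete, here is a minimal sketch that recovers the two coefficients of such a model with numpy alone; the toy x and y values are made up purely for illustration:

In [ ]:
import numpy as np

# toy data, invented for this sketch: y is roughly 2 + 3*x plus a little noise
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])

# np.polyfit with deg=1 solves the least-squares problem for a straight line,
# returning the slope and intercept that minimise the squared residuals
b1, b0 = np.polyfit(x, y, deg=1)
print(b0, b1)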

Let's try and predict something using linear regression.

The Salary dataset consists of two variables, [YearsExperience, Salary]. The goal is to predict the salary one is going to get from the years of experience.
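As a quick sketch of that workflow (assuming the file is saved locally as Salary_Data.csv with exactly those two columns; the filename here is an assumption, so adjust it to your copy):

In [ ]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# filename and column names are assumed; adjust them to your copy of the Salary dataset
salary_df = pd.read_csv('Salary_Data.csv')
X = salary_df[['YearsExperience']]   # 2D feature matrix with a single column
y = salary_df['Salary']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
salary_model = LinearRegression().fit(X_train, y_train)
print(salary_model.score(X_test, y_test))   # R² on the held-out data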

We will kick off with a very famous dataset loved by machine learning practitioners...

Let's get to know our data and have fun with it.

IRIS DATA SET LINEAR REGRESSION


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

In [2]:
iris_df = pd.read_csv('iris.csv')
iris_df.head()

Out[2]:

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


We also load the iris dataset directly from sklearn (in case you do not have
the csv file, you can import it this way instead)
In [3]:
data = load_iris()
data.feature_names  # "features" is the term used for the columns that hold the independent variables

Out[3]:
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']
In [4]:
data.target_names  # and over here we have the names of the species, i.e. our target, the dependent variable

Out[4]:
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
In [5]:
data.target  # calling target on the dataset gives the numeric (dummy) representation
             # of the values in the dependent column

Out[5]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
In [ ]:

In [102]:
X = data.data  # data refers to the values in the independent columns
X.shape  # check the shape; happy with that

Out[102]:
(150, 4)
In [103]:
y = data.target  # collecting the numeric representation of the dependent values
y.shape  # check the shape... not happy, let's reshape it to 2D

Out[103]:
(150,)
In [104]:
# some sklearn utilities prefer 2D arrays over 1D vectors, so we're going to reshape y
y = y.reshape(-1, 1)
y.shape # get it to 2D

Out[104]:
(150, 1)

We're going to plot the sepal length against the petal length to check if the data is linear
In [81]:
plt.figure(figsize=(18,8), dpi=100)  # set the canvas size for visibility

# use the .T ndarray attribute to transpose the data, then take the columns at index 0 and 2
plt.scatter(X.T[0], X.T[2])

# set the title of the plot and adjust the font size for readability
plt.title('IRIS Petal and sepal length', fontsize=20)

# then we set the labels (just to be obvious)
plt.ylabel('Petal Length')
plt.xlabel('sepal length')

Out[81]:
Text(0.5, 0, 'sepal length')

We can't really see how the irises are grouped, but we can clearly see that there is a linear relationship here.

Let's start the prediction

We're going to take these simple steps to predict the (numerically encoded) species of an iris:
• split the data using the train_test_split() method from scikit-learn
• then we're going to build the model and fit (train) it on our training data
• last but not least we're going to generate predictions

In [105]:
from sklearn.model_selection import train_test_split  # the tool for splitting the data
from sklearn.linear_model import LinearRegression  # and since we know we are going to use linear regression for the prediction, we import that class as well

# over here we split the data into training and test sets for X and y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

In [106]:
lr = LinearRegression() #create our linear model

# fit the model on the training data, then predict on X_test
iris_model = lr.fit(X_train, y_train)
predictions = iris_model.predict(X_test)

In [48]:
# plotting the error in our predictions
plt.errorbar(range(1, len(y_test)+1), y_test, yerr=(y_test-predictions), fmt='^k', ecolor='red')

Out[48]:
<ErrorbarContainer object of 3 artists>

In [50]:
from sklearn.metrics import r2_score  # this function will help us calculate and see the score of our predictions

r2_score(y_test, predictions)

Out[50]:
0.904901491129183
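For reference, r2_score computes the usual coefficient of determination, R² = 1 - SS_res / SS_tot. A tiny sketch of the same calculation with numpy, which should match the value above up to floating point error:

In [ ]:
# R² = 1 - (sum of squared residuals) / (total sum of squares)
ss_res = ((y_test - predictions) ** 2).sum()
ss_tot = ((y_test - y_test.mean()) ** 2).sum()
print(1 - ss_res / ss_tot)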
In [51]:
# to get the RMSE we first take the difference between y_test and the predictions,
# then raise it to the power of 2, average the squared errors, and finally apply the numpy square root function.
np.sqrt(((predictions - y_test)**2).mean())

Out[51]:
0.24520071494252943
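The same RMSE can also be obtained from sklearn's mean_squared_error helper, taking the square root ourselves:

In [ ]:
from sklearn.metrics import mean_squared_error

# RMSE as the square root of sklearn's mean squared error
print(np.sqrt(mean_squared_error(y_test, predictions)))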
