LinearRegression_Iris
A linear model is defined by: y = b0 + b1*x1 + ... + bn*xn, where y is the target variable, the x's are the feature columns of the data X, and the b's are the coefficients (b0 being the intercept).
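As a minimal sketch of that equation, here is the single-feature case with arbitrarily chosen coefficients (b0 and b1 are made-up illustration values, not fitted ones):

```python
import numpy as np

# Hypothetical intercept b0 and slope b1, chosen for illustration only
b0, b1 = 2.0, 0.5
x = np.array([0.0, 2.0, 4.0])

# y = b0 + b1 * x, applied element-wise across the feature vector
y = b0 + b1 * x
print(y)  # [2. 3. 4.]
```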
The Salary dataset consists of two variables [YearsExperience, Salary]; the goal there is to predict the salary one will earn from years of experience.
We will kick off with a very famous dataset loved by machine learning practitioners...
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

iris_df = pd.read_csv('iris.csv')
iris_df.head()
Out[2]:
Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
In [3]:
from sklearn.datasets import load_iris
data = load_iris()  # load the bundled iris dataset; the cells below use `data`
data.feature_names  # names of the feature (independent) columns
Out[3]:
['sepal length (cm)',
'sepal width (cm)',
'petal length (cm)',
'petal width (cm)']
In [4]:
data.target_names # and over here we have the names of the species, i.e. our target (dependent) values
Out[4]:
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
In [5]:
data.target # over here we see that by calling target on the dataset we get the integer
            # (dummy) representations of the values in the dependent column
Out[5]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
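To see how those integer codes map back to the species names, you can index `target_names` with the `target` array (a small check, assuming the standard scikit-learn iris loader):

```python
from sklearn.datasets import load_iris

data = load_iris()

# Indexing the names array with the integer codes decodes every sample's label
labels = data.target_names[data.target]
print(labels[0])   # 'setosa'  (code 0)
print(labels[-1])  # 'virginica' (code 2)
```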
In [102]:
X = data.data # data.data refers to the values in the independent columns
X.shape # check the shape... happy with that
Out[102]:
(150, 4)
In [103]:
y = data.target # collecting the integer representation of the dependent values
y.shape # check the shape... not happy; let's reshape to 2D
Out[103]:
(150,)
In [104]:
# because sklearn doesn't like 1D arrays or vectors we're going to reshape it
y = y.reshape(-1, 1)
y.shape # get it to 2D
Out[104]:
(150, 1)
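The effect of `reshape(-1, 1)` is easiest to see on a tiny array: it turns a 1D vector of length n into a 2D column matrix of shape (n, 1), with `-1` telling NumPy to infer the row count:

```python
import numpy as np

v = np.arange(5)        # shape (5,): a 1D vector
col = v.reshape(-1, 1)  # shape (5, 1): a 2D column vector; -1 infers the 5
print(v.shape, col.shape)  # (5,) (5, 1)
```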
[Scatter plot cell: features plotted against the target, x-axis labeled 'sepal length']
We can't really see how the irises are grouped, but we can clearly see that there is a linear relationship here.
In [105]:
from sklearn.model_selection import train_test_split # the tool for splitting the data
from sklearn.linear_model import LinearRegression # we know we're going to use linear regression for our prediction, so we import the class as well
# over here we split the data into the train (X, y) and test (X, y) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
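A quick sanity check on what the split produces, on a toy array (the `random_state` argument is an addition for reproducibility; the cell above leaves it unset, so its split differs between runs):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(10, 4)  # toy feature matrix: 10 samples, 4 features
y = np.arange(10).reshape(-1, 1)  # matching 2D column of targets

# test_size=0.20 holds out 2 of the 10 samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0)
print(X_train.shape, X_test.shape)  # (8, 4) (2, 4)
```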
In [106]:
lr = LinearRegression() #create our linear model
#fit the model on the training data, then predict on X_test
iris_model = lr.fit(X_train, y_train)
predictions = iris_model.predict(X_test)
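After fitting, the learned coefficients and intercept can be inspected directly on the model object. A sketch, fitting on the full dataset for illustration rather than the author's train split:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression

data = load_iris()
model = LinearRegression().fit(data.data, data.target)

# One coefficient per feature (the b1..b4 in the equation), plus one intercept (b0)
print(model.coef_.shape)  # (4,)
print(model.intercept_)
```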
In [48]:
#plotting the error in our predictions (yerr must be non-negative, so take the absolute value)
plt.errorbar(range(1, len(y_test)+1), y_test.ravel(),
             yerr=np.abs(y_test - predictions).ravel(), fmt='^k', ecolor='red')
Out[48]:
<ErrorbarContainer object of 3 artists>
In [50]:
from sklearn.metrics import r2_score # this function will help us calculate the score of our predictions
r2_score(y_test, predictions)
Out[50]:
0.904901491129183
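R² can also be computed by hand as 1 - SS_res / SS_tot, which is what `r2_score` does under the hood; a small check on made-up numbers (the y arrays here are illustration values, not the notebook's split):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

ss_res = ((y_true - y_pred) ** 2).sum()        # residual sum of squares
ss_tot = ((y_true - y_true.mean()) ** 2).sum() # total sum of squares
manual = 1 - ss_res / ss_tot

print(manual)  # 0.98
assert np.isclose(manual, r2_score(y_true, y_pred))
```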
In [51]:
#so to get the RMSE we first take the difference between y_test and the predictions,
#then raise it to the power of 2,
#then take the mean, and finally apply the numpy square root function
np.sqrt(((predictions - y_test)**2).mean())
Out[51]:
0.24520071494252943
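The hand-rolled RMSE above matches taking the square root of scikit-learn's `mean_squared_error`; a quick equivalence check on toy values (the y arrays are illustration data, not the notebook's predictions):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

# Manual RMSE, exactly as in the cell above
rmse_manual = np.sqrt(((y_pred - y_true) ** 2).mean())

# Same quantity via sklearn's MSE helper
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

assert np.isclose(rmse_manual, rmse_sklearn)
```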