Experiment Number: 3
Aim:- Study of Linear Regression in Machine Learning using the Boston Housing Dataset.
1) MACHINE LEARNING:- Machine Learning is the field of study that gives
computers the capability to learn without being explicitly programmed.
ML is one of the most exciting technologies one could come across. As is
evident from the name, it gives the computer the quality that makes it more
similar to humans: the ability to learn.
Machine learning is actively being used today, perhaps in many more places
than one would expect.
Fig.3.1
a) SUPERVISED LEARNING:- Supervised learning is where you have input
variables (x) and an output variable (Y) and you use an algorithm to learn the
mapping function from the input to the output, Y = f(X).
The goal is to approximate the mapping function so well that when you have new
input data (x) you can predict the output variable (Y) for that data.
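As a minimal sketch of this idea (the toy numbers and variable names below are illustrative only, not part of this experiment), a model can learn an approximate f from a few (x, Y) pairs and then predict Y for an unseen input:
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4]])      # input variable (x)
Y = np.array([2, 4, 6, 8])              # output variable (Y)
model = LinearRegression()
model.fit(X, Y)                         # learn the mapping Y = f(X)
print(model.predict(np.array([[5]])))   # predict Y for a new input x = 5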
Fig.3.2
Fig.3.3
Fig.3.4
Fig.3.5
Fig.3.6
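The steps that follow use pandas (pd), NumPy (np), Matplotlib (plt) and a DataFrame named df, so a setup along the lines sketched below is assumed; the file name boston_housing.csv is only a placeholder for wherever the dataset is stored (recent scikit-learn releases no longer ship a built-in Boston loader).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("boston_housing.csv")   # placeholder path; Boston Housing data with a MEDV column
df.head()                                # first five rows of the dataset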
iv. Printing the shape and the datatypes of the data in the dataset:-
CODE:-
df.shape
df.dtypes
OUTPUT:-
Fig.3.7
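df.shape returns a (rows, columns) tuple and df.dtypes lists the type of each column; for the standard Boston Housing data this is typically 506 rows and 14 columns (13 numeric features plus the MEDV target).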
v. Printing the information of the dataset:-
CODE:-
df.info()
OUTPUT:-
Fig.3.8
vi. Counting the missing values for each feature in the dataset:-
CODE:-
df.isna().sum()
OUTPUT:-
Fig.3.9
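The standard Boston Housing data normally reports zero missing values for every feature. If any count here were non-zero, one common follow-up (an assumption, not part of the original steps) is mean imputation:
# only needed if df.isna().sum() reports missing values (assumed step)
df = df.fillna(df.mean(numeric_only=True))   # replace NaNs with each column's mean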
vii. Creating the target feature and separating the target from the input
features:-
CODE:-
target_feature = 'MEDV'                  # median home value is the target
y = df[target_feature]                   # target vector
x = df.drop(target_feature, axis=1)      # input features (all remaining columns)
x.head()
y.head()
OUTPUT:-
Fig.3.10
Fig.3.11
viii. Splitting the dataset using train_test_split and training the Linear Regression model:-
CODE:-
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size= 0.2,random_state=7)
from sklearn.linear_model import LinearRegression
regression = LinearRegression()
regression.fit(x_train,y_train)
# train score
train_score= round(regression.score(x_train, y_train)*100,2)
print("train score of Linear Regression: ",train_score)
OUTPUT:-
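The code above reports only the training score; scoring the held-out split in the same way (an assumed follow-up that mirrors the train-score line) would look like:
# assumed follow-up: score the model on the unseen test split
test_score = round(regression.score(x_test, y_test)*100, 2)
print("test score of Linear Regression: ", test_score)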
ix. Printing the shape, size, and datatype of the predicted and test values:-
CODE:-
y_pred = regression.predict(x_test)   # predict on the test split so the shapes match y_test
print(y_pred.shape)
print(y_test.shape)
print(f"y_pred size: {y_pred.size}")
print(f"y_test size: {y_test.size}")
print(f"y_pred data type: {type(y_pred)}")
print(f"y_test data type: {type(y_test)}")
OUTPUT:-
Fig.3.12
x. Creating a table of the Actual and Predicted values and calculating the Variance:-
CODE:-
y_test = np.array(y_test)   # convert to a NumPy array; y_pred already has the same shape
# Create a DataFrame with the correct shape
df1 = pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
# Calculate the variance as a separate step
df1['Variance'] = df1['Actual'] - df1['Predicted']
df1.head()
OUTPUT:-
Fig.3.13
Fig.3.14
Fig.3.15
OUTPUT:-
Fig.3.17
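A common way to quantify the Actual vs Predicted differences tabulated above (an assumed sketch using scikit-learn's regression metrics, not the original code) is:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
print("MAE:", mean_absolute_error(y_test, y_pred))   # average absolute error
print("MSE:", mean_squared_error(y_test, y_pred))    # average squared error
print("R^2:", r2_score(y_test, y_pred))              # proportion of variance explained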
xiii. Creating a new DataFrame to get the linear regression coefficients:-
CODE:-
lr_coefficient = pd.DataFrame()
lr_coefficient["columns"]= x_train.columns
lr_coefficient["coefficient Estimate"]=pd.Series(regression.coef_)
print(lr_coefficient)
OUTPUT:-
Fig.3.18
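Each coefficient estimate is the model's predicted change in MEDV for a one-unit increase in that feature with the other features held fixed, so the largest positive and negative bars in the chart plotted next point to the most influential features.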
xiv. Plotting a bar chart of the coefficients using the matplotlib library:-
CODE:-
fig, ax = plt.subplots(figsize = (20,10))
ax.bar(lr_coefficient["columns"],lr_coefficient["coefficient Estimate"])
ax.spines["bottom"].set_position("zero")
plt.style.use("ggplot")
plt.grid()
plt.show()
fig, ax = plt.subplots(figsize =(20,10))
x_ax = range(len(x_test))
plt.scatter(x_ax, y_test, s=30, color='green', label='original')
plt.scatter(x_ax, y_pred, s=30, color='red', label='predicted')
plt.legend()
#plt.grid()
plt.show()
OUTPUT:-
Fig.3.19
Fig.3.20
xv. Plotting the original and predicted values using scatter and line plots:-
CODE:-
fig, ax = plt.subplots(figsize =(20,10))
x_ax = range(len(x_test))
plt.scatter(x_ax, y_test, s=30, color='green', label='original')
plt.plot(x_ax, y_pred, lw=0.8, color='red', label='predicted')
plt.legend()
#plt.grid()
plt.show()
OUTPUT:-
Fig.3.21
xvi. Using scatter plots to show how the features vary with MEDV:-
CODE:-
plt.figure(figsize=(20, 5))
features = ['LSTAT', 'RM']
target = df['MEDV']
for i, col in enumerate(features):
    plt.subplot(1, len(features), i+1)
    x = df[col]
    y = target
    plt.scatter(x, y, marker='o')
    plt.title(col)
    plt.xlabel(col)
    plt.ylabel('MEDV')
plt.show()
OUTPUT:-
Fig.3.22
Fig.3.23