Experiment Number: 3
Aim:- Study of Linear Regression in Machine Learning using the Boston Housing Dataset.
1) MACHINE LEARNING:- Machine Learning is the field of study that gives
computers the capability to learn without being explicitly programmed.
ML is one of the most exciting technologies one could come across. As is
evident from the name, it gives the computer the quality that makes it more
similar to humans: the ability to learn.
Machine learning is actively being used today, perhaps in many more places
than one would expect.
Fig.3.1
a) SUPERVISED LEARNING:- Supervised learning is where you have input
variables (x) and an output variable (Y) and you use an algorithm to learn the
mapping function from the input to the output, Y = f(X).
The goal is to approximate the mapping function so well that when you have new
input data (x) you can predict the output variable (Y) for that data.
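As a minimal sketch of this idea (the toy numbers and variable names below are illustrative only, not part of this experiment), a model can learn an approximate f from a few (x, Y) pairs and then predict Y for an unseen input:
from sklearn.linear_model import LinearRegression
import numpy as np
X = np.array([[1], [2], [3], [4]])      # input variable (x)
Y = np.array([2, 4, 6, 8])              # output variable (Y)
model = LinearRegression()
model.fit(X, Y)                         # learn the mapping Y = f(X)
print(model.predict(np.array([[5]])))   # predict Y for a new input x = 5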
Fig.3.2
Fig.3.3
Fig.3.4
Fig.3.5
Fig.3.6
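The steps that follow use pandas (pd), NumPy (np), Matplotlib (plt) and a DataFrame named df, so a setup along the lines sketched below is assumed; the file name boston_housing.csv is only a placeholder for wherever the dataset is stored (recent scikit-learn releases no longer ship a built-in Boston loader).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("boston_housing.csv")   # placeholder path; Boston Housing data with a MEDV column
df.head()                                # first five rows of the dataset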
iv. Printing the shape and the datatypes of the data in the dataset:-
CODE:-
df.shape
df.dtypes
OUTPUT:-
Fig.3.7
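df.shape returns a (rows, columns) tuple and df.dtypes lists the type of each column; for the standard Boston Housing data this is typically 506 rows and 14 columns (13 numeric features plus the MEDV target).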
v. Printing the information of the dataset:-
CODE:-
df.info()
OUTPUT:-
Fig.3.8
vi. Counting the missing values for each feature in the dataset:-
CODE:-
df.isna().sum()
OUTPUT:-
Fig.3.9
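The standard Boston Housing data normally reports zero missing values for every feature. If any count here were non-zero, one common follow-up (an assumption, not part of the original steps) is mean imputation:
# only needed if df.isna().sum() reports missing values (assumed step)
df = df.fillna(df.mean(numeric_only=True))   # replace NaNs with each column's mean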
vii. Creating the target feature and separating the target from the input
features:-
CODE:-
target_feature = 'MEDV'                  # median home value is the target
y = df[target_feature]                   # target vector
x = df.drop(target_feature, axis=1)      # input features (all remaining columns)
x.head()
y.head()
OUTPUT:-
Fig.3.10
Fig.3.11
viii. Splitting the dataset using train_test_split and training the Linear Regression model:-
CODE:-
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size= 0.2,random_state=7)
from sklearn.linear_model import LinearRegression
regression = LinearRegression()
regression.fit(x_train,y_train)
# train score
train_score= round(regression.score(x_train, y_train)*100,2)
print("train score of Linear Regression: ",train_score)
OUTPUT:-
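The code above reports only the training score; scoring the held-out split in the same way (an assumed follow-up that mirrors the train-score line) would look like:
# assumed follow-up: score the model on the unseen test split
test_score = round(regression.score(x_test, y_test)*100, 2)
print("test score of Linear Regression: ", test_score)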
ix. Printing the shape, size, and datatype of the predicted and test values:-
CODE:-
y_pred = regression.predict(x_test)   # predict on the test split so the shapes match y_test
print(y_pred.shape)
print(y_test.shape)
print(f"y_pred size: {y_pred.size}")
print(f"y_test size: {y_test.size}")
print(f"y_pred data type: {type(y_pred)}")
print(f"y_test data type: {type(y_test)}")
OUTPUT:-
Fig.3.12
x. Creating a table of the Actual and Predicted values and calculating the Variance:-
CODE:-
y_test = np.array(y_test)   # convert to a NumPy array; y_pred already has the same shape
# Create a DataFrame with the correct shape
df1 = pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
# Calculate the variance as a separate step
df1['Variance'] = df1['Actual'] - df1['Predicted']
df1.head()
OUTPUT:-
Fig.3.13
Fig.3.14
Fig.3.15
OUTPUT:-
Fig.3.17
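A common way to quantify the Actual vs Predicted differences tabulated above (an assumed sketch using scikit-learn's regression metrics, not the original code) is:
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
print("MAE:", mean_absolute_error(y_test, y_pred))   # average absolute error
print("MSE:", mean_squared_error(y_test, y_pred))    # average squared error
print("R^2:", r2_score(y_test, y_pred))              # proportion of variance explained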
xiii. Creating a new DataFrame to get the linear regression coefficients:-
CODE:-
lr_coefficient = pd.DataFrame()
lr_coefficient["columns"]= x_train.columns
lr_coefficient["coefficient Estimate"]=pd.Series(regression.coef_)
print(lr_coefficient)
OUTPUT:-
Fig.3.18
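Each coefficient estimate is the model's predicted change in MEDV for a one-unit increase in that feature with the other features held fixed, so the largest positive and negative bars in the chart plotted next point to the most influential features.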
xiv. Plotting a bar chart of the coefficients using the matplotlib library:-
CODE:-
fig, ax = plt.subplots(figsize = (20,10))
ax.bar(lr_coefficient["columns"],lr_coefficient["coefficient Estimate"])
ax.spines["bottom"].set_position("zero")
plt.style.use("ggplot")
plt.grid()
plt.show()
fig, ax = plt.subplots(figsize =(20,10))
x_ax = range(len(x_test))
plt.scatter(x_ax, y_test, s=30, color='green', label='original')
plt.scatter(x_ax, y_pred, s=30, color='red', label='predicted')
plt.legend()
#plt.grid()
plt.show()
OUTPUT:-
Fig.3.19
Fig.3.20
xv. Plotting the original and predicted values using scatter and line plots:-
CODE:-
fig, ax = plt.subplots(figsize =(20,10))
x_ax = range(len(x_test))
plt.scatter(x_ax, y_test, s=30, color='green', label='original')
plt.plot(x_ax, y_pred, lw=0.8, color='red', label='predicted')
plt.legend()
#plt.grid()
plt.show()
OUTPUT:-
Fig.3.21
xvi. Using scatter plots to show how the features vary with MEDV:-
CODE:-
plt.figure(figsize=(20, 5))
features = ['LSTAT', 'RM']
target = df['MEDV']
for i, col in enumerate(features):
    plt.subplot(1, len(features), i+1)
    x = df[col]
    y = target
    plt.scatter(x, y, marker='o')
    plt.title(col)
    plt.xlabel(col)
    plt.ylabel('MEDV')
plt.show()
OUTPUT:-
Fig.3.22
Fig.3.23