
ML LAB MANUAL R22 CSE(DS)

The Machine Learning Lab Manual for B.Tech III Year II Semester outlines the vision, mission, program outcomes, educational objectives, and specific outcomes related to the course. It includes a list of experiments designed to provide practical experience with various machine learning techniques using Python. The manual also details the course objectives, outcomes, and mapping of course outcomes to program outcomes.


DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING (DS)

MACHINE LEARNING LAB MANUAL

(Academic Year: 2024-25)

B.Tech III Year – II Semester (R22)

By

Mr. MOGILI SIVA


Assistant Professor

1 Machine Learning Lab | VIGNAN INSTITUTE OF TECHNOLOGY AND SCIENCE


CONTENTS
Sl.No. PARTICULARS Page No.
1 VISION & MISSION OF THE COLLEGE 3
2 VISION & MISSION OF THE DEPARTMENT 3
3 PROGRAM OUTCOMES 4
4 PROGRAM EDUCATIONAL OBJECTIVES (PEO) 6
5 PROGRAM SPECIFIC OUTCOMES (PSO) 6
6 COURSE OBJECTIVES AND COURSE OUTCOMES 6
7 CO – PO MAPPING 7
8 LIST OF EXPERIMENTS 8
9 EACH EXPERIMENT WITH i) Aim ii) Description iii) Algorithm iv) Program v) Output 9
10 MINI PROJECT 42



COLLEGE VISION

To evolve into a center of excellence in Science & Technology through
creative and innovative practices in teaching-learning, promoting academic
achievement & research excellence to produce internationally accepted,
competitive and world class professionals who are psychologically strong and
emotionally balanced, imbued with social consciousness and ethical values.

COLLEGE MISSION

To provide high quality academic programmes, training activities,
research facilities and opportunities supported by continuous industry-institute
interaction aimed at employability, entrepreneurship, leadership and research
aptitude among students and contribute to the economic and technological
development of the region, state and nation.

DEPARTMENT VISION

To develop Data Science professionals through creative and innovative
approaches to address the present and future challenges of the modern computing
world.

DEPARTMENT MISSION

Educate students by expanding their knowledge in cutting-edge
technologies to acquire professional ethics. Impart quality education to build
a research & entrepreneurial ecosystem using niche technologies.



PROGRAM OUTCOMES (POs)

PO1. Engineering knowledge: Apply the knowledge of mathematics, science,
engineering fundamentals, and an engineering specialization to the solution of
complex engineering problems.

PO2. Problem analysis: Identify, formulate, review research literature, and
analyze complex engineering problems reaching substantiated conclusions using
first principles of mathematics, natural sciences, and engineering sciences.

PO3. Design/development of solutions: Design solutions for complex
engineering problems and design system components or processes that meet the
specified needs with appropriate consideration for the public health and safety,
and the cultural, societal, and environmental considerations.

PO4. Conduct investigations of complex problems: Use research-based
knowledge and research methods including design of experiments, analysis and
interpretation of data, and synthesis of the information to provide valid
conclusions.

PO5. Modern tool usage: Create, select, and apply appropriate techniques,
resources, and modern engineering and IT tools including prediction and
modeling to complex engineering activities with an understanding of the
limitations.

PO6. The engineer and society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the
consequent responsibilities relevant to the professional engineering practice.

PO7. Environment and sustainability: Understand the impact of the
professional engineering solutions in societal and environmental contexts, and
demonstrate the knowledge of, and need for sustainable development.



PO8. Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

PO9. Individual and team work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.

PO10. Communication: Communicate effectively on complex engineering
activities with the engineering community and with society at large, such as,
being able to comprehend and write effective reports and design documentation,
make effective presentations, and give and receive clear instructions.

PO11. Project management and finance: Demonstrate knowledge and
understanding of the engineering and management principles and apply these to
one's own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments.

PO12. Life-long learning: Recognize the need for, and have the preparation and
ability to engage in independent and life-long learning in the broadest context
of technological change.



PROGRAM EDUCATIONAL OBJECTIVES (PEOs)

PEO1. Have knowledge and analytical skills including Mathematics,
Science & basic Engineering.

PEO2. Graduates will be able to work effectively in cross-functional
teams to develop Design and Analysis of Algorithms and Machine
Learning solutions that meet business objectives & societal needs.

PEO3. Have extensive knowledge in state of art frameworks in Design and Analysis of
Algorithms and design industry accepted AI solutions using modern tools.

PROGRAM SPECIFIC OUTCOMES (PSOs)

PSO1. Understanding of statistical concepts and their applications in Machine
Learning.

PSO2. Familiarity with natural language processing and its applications in
areas such as sentiment analysis and language translation.

PSO3. Adopt new and fast emerging technologies in Design and
Analysis of Algorithms and Machine Learning.



R22 B.Tech. CSE (DS) Syllabus JNTU Hyderabad
DS604PC: MACHINE LEARNING LAB
B.Tech. III Year II Sem.
L T P C
0 0 2 1
Course Objective :
• The objective of this lab is to get an overview of the various machine learning techniques and to
demonstrate them using Python.

Course Outcomes :
• Understand modern notions in predictive data analysis
• Select data, model selection, model complexity and identify the trends
• Understand a range of machine learning algorithms along with their strengths and
weaknesses
• Build predictive models from data and analyze their performance

CO-PO Mapping:

PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 3 3 3 3 3 1 2 1 2 2
CO2 3 3 3 3 3 1 2 1 1 1
CO3 3 3 3 3 3 1 2 1 1 1
CO4 3 3 3 3 3 1 2 1 1 1



LIST OF EXPERIMENTS

SL NO   NAME OF THE EXPERIMENT   DATE OF CONDUCTION
-   Introduction to Machine Learning Lab and basics of Python packages
1   Write a python program to compute Central Tendency Measures: Mean, Median, Mode; Measure of Dispersion: Variance, Standard Deviation
2   Write a python program to Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
3   Write a python program to Study of Python Libraries for ML application such as Pandas and Matplotlib
4   Write a Python program to implement Simple Linear Regression
5   Write a python program to Implementation of Multiple Linear Regression for House Price Prediction using sklearn
6   Write a python program to Implementation of Decision tree using sklearn and its parameter tuning
7   Implementation of KNN using sklearn
8   Write a python program to Implementation of Logistic Regression using sklearn
9   Write a python program to Implementation of K-Means Clustering
10  Performance analysis of Classification Algorithms on a specific dataset (Mini Project)

TEXT BOOK :
1. Machine Learning, Tom M. Mitchell, MGH.

REFERENCE BOOK :
1. Machine Learning: An Algorithmic Perspective, Stephen Marsland, Taylor & Francis.



Experiment 1 Date :
-------------------------------------------------------------------------------------------------------------------------------
Experiment 1 : Write a python program to compute Central Tendency Measures: Mean,
Median, Mode and Measure of Dispersion: Variance, Standard Deviation

Aim: To compute Central Tendency Measures: Mean, Median, Mode and
Measure of Dispersion: Variance, Standard Deviation of a given data set

Description :
Measures of Central Tendency : Measures of central tendency are statistical metrics that
describe the center or middle of a dataset and are used to summarize the entire dataset.
There are 3 measures of central tendency: (i) Mean (ii) Median (iii) Mode.
(i) Mean : The sample mean $\bar{X}$ is the sum of all n observed outcomes $X_i$
from the sample divided by the number of observations n.
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
(ii) Mode : The value with the highest frequency is called the mode.
Example data set : 5, 2, 5, 5, 5, 5, 4, 1, 8, 5
Here 5 is the mode, since it occurs 6 times
and each of the other outcomes occurs only once.
So Mode = 5
(iii) Median : The median is the middle value of a set of numbers. The median is the same as
the 50th percentile for the set of numbers.

Step 1 : Sort the data (in either ascending or descending order)
Step 2 : Evaluate the median of the data set:
Case 1 : If the number of observations (n) is odd
$$\text{Median} = \text{value of the observation at position } \left(\frac{n+1}{2}\right)^{\text{th}}$$
Case 2 : If the number of observations (n) is even
$$\text{Median} = \text{arithmetic mean of the values of the observations at positions } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2}+1\right)^{\text{th}}$$
Measure of Dispersion : A statistic that tells us how the data values are dispersed or spread
out is called a Measure of Dispersion. It is used to determine the spread of data in a set.
(i) Variance : If $\bar{X}$ is the mean of the n observed outcomes $X_i$, then the variance is
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
(ii) Standard Deviation : The standard deviation is the square root of the variance:
$$\sigma = \sqrt{\text{Variance}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}$$
• Use the numpy library to compute the Mean, Median, Variance and S.D. of the given data set
• Use stats from the scipy library to compute the Mode of the given data set
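
The formulas in the description can also be written out directly, without numpy or scipy. The following is a minimal sketch that uses only the standard library; the function names are illustrative and are not part of the lab program. It is applied to the example data set given for the mode.

```python
import math
from collections import Counter


def mean(values):
    # Sum of the observations divided by their count
    return sum(values) / len(values)


def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the value at position (n + 1) / 2 (1-based)
        return ordered[mid]
    # Even n: mean of the values at positions n/2 and n/2 + 1 (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    # Most frequent value; on a tie, the value seen first is returned
    return Counter(values).most_common(1)[0][0]


def variance(values):
    # Population variance: mean squared deviation from the mean
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


def std_dev(values):
    # Standard deviation is the square root of the variance
    return math.sqrt(variance(values))


data = [5, 2, 5, 5, 5, 5, 4, 1, 8, 5]
print("Mean:", mean(data))
print("Median:", median(data))
print("Mode:", mode(data))
print("Variance:", variance(data))
print("Standard Deviation:", std_dev(data))
```

Like np.var and np.std with their default arguments, this sketch divides by n (population variance) and not by n - 1.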

Install Required Libraries :
Command : py --version
py -m pip --version
To install a module using pip : py -m pip install "ModuleName"
py -m pip install numpy
py -m pip install scipy

Program:
# Python program for Mean, mode, median, variance and Standard Deviation

import numpy as np
from scipy import stats

# Example data
data = [10, 20, 30, 40, 40, 50, 50, 50, 60]

# Calculate mean
mean = np.mean(data)

# Calculate median
median = np.median(data)

# Calculate mode
mode = stats.mode(data)

# Calculate variance (population variance, the numpy default)
variance = np.var(data)

# Calculate standard deviation (population standard deviation, the numpy default)
sd = np.std(data)

# Printing results
print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {sd}")

Output :
Mean: 38.888888888888886
Median: 40.0
Mode: ModeResult(mode=50, count=3)
Variance: 232.09876543209873
Standard Deviation: 15.234788000891209



Experiment 2 Date :
-------------------------------------------------------------------------------------------------------------------------------
Experiment 2 : Write a python program to study of Python Basic Libraries such as Statistics,
Math, Numpy and Scipy

Aim: To implement Python Libraries such as Statistics, Math, Numpy and Scipy

Description :
1. Statistics library : The statistics module is part of the Python standard library. It provides
functions for calculating mathematical statistics of numeric data.

mean() : Arithmetic mean (“average”) of data.
geometric_mean() : Geometric mean of data.
harmonic_mean() : Harmonic mean of data.
median() : Median (middle value) of data.
mode() : Single mode (most common value) of discrete or nominal data.
quantiles() : Divide data into intervals with equal probability.

2. math library : Provides access to mathematical functions, including basic math operations,
trigonometry, and logarithms. The math library is used for mathematical operations that are
not covered by NumPy or SciPy.

math.sqrt() : Returns the square root of a number
math.ceil() : Rounds a number upwards to its nearest integer
math.floor() : Rounds a number downwards to its nearest integer
math.pi : The constant π (3.141592653589793..)
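
Two of the entries listed above, quantiles() and math.pi, are easy to try on their own. The following is a small sketch using only the standard library; the data list and the circle radius are illustrative values chosen for this example.

```python
import math
import statistics

data = [10, 20, 30, 40, 50]

# quantiles() with n=4 returns the three cut points that split the data into quartiles
quartiles = statistics.quantiles(data, n=4)
print(f"Quartile cut points: {quartiles}")

# math.pi is a constant, not a function call
radius = 2
area = math.pi * radius ** 2
print(f"Area of a circle with radius {radius}: {area}")
```

statistics.quantiles() is available from Python 3.8 onwards and interpolates between data points, so the cut points need not be values from the data set.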

3. Numpy library : NumPy is a Python library used for working with arrays.
It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.

1. prod() : Return the product of array elements over a given axis.
2. sum() : Sum of array elements over a given axis.
3. sin() : Trigonometric sine, element-wise.
4. cos() : Cosine, element-wise.
5. gradient() : Return the gradient of an N-dimensional array.



4. SciPy library : SciPy stands for Scientific Python. SciPy is a scientific computation library that
uses NumPy underneath. It provides more utility functions for optimization, stats and signal
processing. The functions below are in the scipy.special module.

1. comb(N, k) : Returns the number of combinations of N things taken k at a time.
2. perm(N, k) : Returns the number of permutations of N things taken k at a time.
3. exp10(x) : Returns 10 raised to the power x, element-wise.
4. gamma() : The Gamma function.
5. logsumexp() : Log Sum Exponential function. It returns the log of the sum
of the exponentials of the input elements.

Install Required Libraries :
Command : py --version
py -m pip --version
To install a module using pip : py -m pip install "ModuleName"
(statistics and math are part of the Python standard library and need no installation)
py -m pip install numpy
py -m pip install scipy

Program:
# Python program for study of libraries statistics, math, numpy and scipy

import statistics
import math
import numpy as np
from scipy.special import comb, perm, logsumexp

data = [10,20,30,40,50]

# using statistics library


mean = statistics.mean(data)
median= statistics.median(data)
mode = statistics.mode(data)
gm= statistics.geometric_mean(data)
hm= statistics.harmonic_mean(data)

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Geometric Mean : {gm}")
print(f"Harmonic Mean : {hm}")

# using math library
x = math.sqrt(49)
y = math.cbrt(64)  # math.cbrt requires Python 3.11 or later
z = math.ceil(1.4)
w = math.floor(1.4)

print(f" Square Root of 49 is : {x}")


print(f" Cube Root of 64 is : {y}")
print(f" ceil value : {z}")
print(f"Floor value : {w}")

# using numpy library


product = np.prod(data)
total= np.sum(data)
s = np.sin(data)
c= np.cos(data)

print(f" Product of array elements is {product}")


print(f" Sum of array elements is {total}")
print(f" Sine of array elements are {s}")
print(f" Cosine of array elements are {c}")

# using scipy library


cb = comb(9, 2)
pr = perm(5, 4)
lg = logsumexp(data)
print(f" Combinations : {cb}")
print(f" Permutations : {pr}")
print(f" Log Sum Exponential of given array is {lg}")

Output :
Mean: 30
Median: 30
Mode: 10
Geometric Mean : 26.051710846973528
Harmonic Mean : 21.8978102189781
Square Root of 49 is : 7.0
Cube Root of 64 is : 4.0
ceil value : 2
Floor value : 1
Product of array elements is 12000000
Sum of array elements is 150
Sine of array elements are [-0.54402111 0.91294525 -0.98803162 0.74511316 -0.26237485]
Cosine of array elements are [-0.83907153 0.40808206 0.15425145 -0.66693806
0.96496603]
Combinations : 36.0
Permutations : 120.0
Log Sum Exponential of given array is 50.00004540096037



Experiment 3 Date :
-------------------------------------------------------------------------------------------------------------------------------
Experiment 3 : Write a python program to study of Python Libraries for ML application such
as Pandas and Matplotlib

Aim: To Study of Python Libraries for ML application such as Pandas and Matplotlib

Description :
1. Pandas library : The Pandas library is used for data manipulation and analysis. The name
"Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by
Wes McKinney in 2008.
The data structures of pandas are
1. Series (1-dimensional)
2. DataFrame (2-dimensional)
3. Panel (3-dimensional; removed from current pandas releases)
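
The difference between a Series and a DataFrame is easiest to see side by side. The following is a minimal sketch, assuming pandas is installed; the column names and values are made up for illustration.

```python
import pandas as pd

days = ["Mon", "Tue", "Wed"]

# A Series is a 1-dimensional labelled array
study_hours = pd.Series([7, 8, 6], index=days, name="study_hours")

# A DataFrame is a 2-dimensional table; each column is a Series
table = pd.DataFrame(
    {"study_hours": [7, 8, 6], "sleep_hours": [4, 4, 9]},
    index=days,
)

print(study_hours)
print(table)

# Selecting one column of a DataFrame gives back a Series
sleep = table["sleep_hours"]
print(type(sleep).__name__)
```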

2. Matplotlib library : Matplotlib is a data visualization library in Python.
It is used for creating static, interactive, and animated visualizations in Python.
It generates plots, histograms, bar charts, scatter plots, etc.
It was created by John D. Hunter.
plot() : Line graphs
bar() : Bar graphs
hist() : Plot a simple histogram
pie() : Plot a simple pie chart
scatter() : Plot a simple scatter chart
stackplot() : Stack plot
boxplot() : Box plot
stem() : Stem plot

Line style and marker characters :

character   description       character   description
'-'         solid line        'o'         circle
'--'        dashed line       's'         square
'-.'        dash-dot line     'p'         pentagon
':'         dotted line       'D'         diamond
'v'         triangle_down     'd'         thin_diamond
'^'         triangle_up       'x'         x marker
'<'         triangle_left     '+'         plus marker
'>'         triangle_right    '*'         star marker

character   color
'b'         blue
'g'         green
'r'         red
'c'         cyan
'm'         magenta
'y'         yellow
'k'         black
'w'         white

Install Required Libraries :
Command : py --version
py -m pip --version
To install the modules using pip :
py -m pip install pandas
py -m pip install matplotlib

Command to update pip : pip install --upgrade pip wheel
or python.exe -m pip install --upgrade pip



Program:
# Python program for implement pandas and matplotlib Libraries

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
data = {
'A': [1, 2, 3, 4, 5,6,7],
'B': [7, 8, 6, 11, 7,10,2],
'C': [12, 10, 6, 7,6,5,10],
'D': [4, 4, 9, 9,3,6,5]
}

df = pd.DataFrame(data)
print("A indicates Day")
print("B indicates No of Study Hours")
print("C indicates No of Playing Hours")
print("D indicates No of Sleeping Hours")

print("DataFrame:")
print(df)

# Accessing DataFrame elements
print("\nAccessing Study Hours column 'B':")
print(df['B'])

print("\nAccessing element at row 2, column 'C':")
print(df.at[2, 'C'])

# Plot using Matplotlib
plt.figure(figsize=(8, 6))

# Plotting DataFrame columns


plt.plot(df['B'], '-dr', label='B', markersize=12)
plt.plot(df['C'], '--^m', label='C', markersize=12 )
plt.plot(df['D'], '-.ob', label='D', markersize=12 )

# Adding title and labels


plt.title('Study,Play and Sleeping Hours Comparison')
plt.xlabel('Days')
plt.ylabel('Hours')

# Adding a legend
plt.legend()

# Show the plot


plt.show()

Output :
A indicates Day
B indicates No of Study Hours
C indicates No of Playing Hours
D indicates No of Sleeping Hours
DataFrame:
A B C D
0 1 7 12 4
1 2 8 10 4
2 3 6 6 9
3 4 11 7 9
4 5 7 6 3
5 6 10 5 6
6 7 2 10 5

Accessing Study Hours column 'B':
0 7
1 8
2 6
3 11
4 7
5 10
6 2
Name: B, dtype: int64

Accessing element at row 2, column 'C':


6



Experiment 4 Date :
-------------------------------------------------------------------------------------------------------------------------------

EXPERIMENT 4: Write a Python program to implement Simple Linear Regression


Aim : Write a Python program to Implement Simple Linear Regression

Description :
Linear Regression : Linear regression is a type of supervised machine learning algorithm that
computes the linear relationship between the dependent variable and one or more
independent features by fitting a linear equation to observed data.

It predicts the continuous output variables based on the independent input variable.

Simple Linear Regression : If there is only one independent feature, then it is known as
Simple Linear Regression

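Simple Linear Regression equation is $y = \beta_0 + \beta_1 x$

where: y is the dependent variable
x is the independent variable
$\beta_1$ is the slope and $\beta_0$ is the intercept
Estimated coefficients are :
$$\beta_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$$
$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$
where $\bar{x}$ is the mean of x
$\bar{y}$ is the mean of y
$\overline{xy}$ is the mean of xy
$\overline{x^2}$ is the mean of $x^2$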
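
The coefficient formulas above can be checked against numpy's own least-squares fit. The following is a short sketch, assuming numpy is installed; it uses the sample observations from this experiment's program and compares the result with np.polyfit, which is an independent cross-check and not part of the lab program.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([5, 8, 9, 11, 20, 16, 17, 18, 21, 26], dtype=float)

# Slope: (mean(xy) - mean(x) * mean(y)) / (mean(x^2) - mean(x)^2)
b1 = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x ** 2) - np.mean(x) ** 2)

# Intercept: mean(y) - slope * mean(x)
b0 = np.mean(y) - b1 * np.mean(x)

# Cross-check with a degree-1 polynomial fit (returns slope first, then intercept)
slope, intercept = np.polyfit(x, y, deg=1)

print(f"From the formulas: b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"From np.polyfit:   b0 = {intercept:.4f}, b1 = {slope:.4f}")
```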

Program :



import numpy as np
import matplotlib.pyplot as plt

# entering the observation points or data


x = np.array([1,2,3,4,5,6,7,8,9,10])
y = np.array([5,8,9,11,20,16,17,18,21,26])

n1 = np.size(x)
meanx = np.mean(x)
meany = np.mean(y)

sumxy = np.sum(x*y) - (n1 * meanx * meany)


sumxx = np.sum(x*x) - (n1 * meanx * meanx)

# calculate the regression coefficients


b1 = sumxy / sumxx
b0 = meany - b1 * meanx
print("Estimated coefficients are : ")
print(f"b0 = {b0}")
print(f"b1={b1}")

# Now, we will plot the actual points or observation as scatter plot


plt.scatter(x,y, color = "b", label='Given Data', marker = "o", s = 100)

# Calculate the predicted response vector


ypred = b0 + b1 * x

# plot the regression line


plt.plot(x,ypred, '-dr', label='Regression Line', markersize=10)

plt.xlabel('x')
plt.ylabel('y')
plt.title("Simple Linear Regression", fontsize=30,
color="magenta")
plt.legend()
plt.show()

Output :
Estimated coefficients are :
b0 = 3.799999999999999
b1=2.0545454545454547

EXPERIMENT 5 Date :
---------------------------------------------------------------------------------------------------------------
EXPERIMENT 5 : Write a Python program to Implementation of Multiple Linear Regression
for House Price Prediction using sklearn

Aim : Python program to Implementation of Multiple Linear Regression for House Price
Prediction using sklearn

Description :
Linear Regression : Linear regression is a supervised machine learning algorithm that
computes the linear relationship between the dependent variable and one or more
independent features by fitting a linear equation to observed data.

It predicts the continuous output variables based on the independent input variable.

Multiple Linear Regression : If there is more than one independent feature in a linear
regression, then it is known as Multiple Linear Regression.
The equation for multiple linear regression is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_n x_n$$

where: y is the dependent variable
$x_1, x_2, x_3, \ldots, x_n$ are the independent variables
$\beta_0$ is the intercept
$\beta_1, \beta_2, \beta_3, \ldots, \beta_n$ are the slopes

The least-squares solution is $\hat{\beta} = (X^T X)^{-1} X^T y$
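
The closed-form solution can be tried on a small example before moving to sklearn. The following is a sketch assuming numpy is installed; the feature values are made up, and y is generated from known coefficients (intercept 1, slopes 2 and 3) so that the recovered values can be checked.

```python
import numpy as np

# Synthetic data: two features, five observations (illustrative values only)
features = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * features[:, 0] + 3.0 * features[:, 1]

# Prepend a column of ones so that the first coefficient is the intercept
X = np.column_stack([np.ones(len(features)), features])

# Solve (X^T X) beta = X^T y; this avoids forming the inverse explicitly
beta = np.linalg.solve(X.T @ X, X.T @ y)

print("Intercept:", beta[0])
print("Slopes:", beta[1:])
```

Solving the linear system gives the same result as computing the inverse of $X^T X$ and is usually the numerically safer choice.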

sklearn : Scikit-learn, also known as sklearn, is a machine learning and data modeling library for
Python.
pip install scikit-learn

Program :
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset from a CSV file
file_path = r"C:\Users\DELL\Downloads\DATASET\house.csv"  # Replace with your CSV file path
data = pd.read_csv(file_path)
print(data)

# Display the first few rows of the dataset
print(data.head())

# The dependent variable (target) is the 'price' column and the
# independent variables are the 'area', 'bedrooms' and 'bathrooms' columns.

# Define the independent variables (features) and the dependent variable (target)
X = data[['area', 'bedrooms', 'bathrooms']]
y= data['price']

# Split the data into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the linear regression model


model = LinearRegression()

# Train the model


model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = model.predict(X_test)

# Evaluate the model


mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Output the model evaluation metrics


print(f'Mean Squared Error: {mse}')
print(f'R-squared: {r2}')

# Display the model coefficients


print('Coefficients:', model.coef_)
print('Intercept:', model.intercept_)

# Plot Actual vs Predicted
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, color='blue', edgecolor='k', alpha=0.7)
# Diagonal line for reference
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', lw=2)
plt.title('Actual vs Predicted Values')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.grid(True)
plt.show()

Output :

price area bedrooms ... parking prefarea furnishingstatus


0 13300000 7420 4 ... 2 yes furnished
1 12250000 8960 4 ... 3 no furnished
2 12250000 9960 3 ... 2 yes semi-furnished
3 12215000 7500 4 ... 3 yes furnished
4 11410000 7420 4 ... 2 no furnished
.. ... ... ... ... ... ... ...
540 1820000 3000 2 ... 2 no unfurnished
541 1767150 2400 3 ... 0 no semi-furnished
542 1750000 3620 2 ... 0 no unfurnished
543 1750000 2910 3 ... 0 no furnished
544 1750000 3850 3 ... 0 no unfurnished

[5 rows x 13 columns]
Mean Squared Error: 2750040479309.052
R-squared: 0.45592991188724463
Coefficients: [3.45466570e+02 3.60197650e+05 1.42231966e+06]
Intercept: 59485.379208717495

Experiment 6 Date :
-------------------------------------------------------------------------------------------------------------------------------
EXPERIMENT 6: Write a Python program to Implementation of Decision tree using sklearn
and its parameter tuning

Aim : Write a Python program to Implementation of Decision tree using sklearn and its
parameter tuning
Description :
Decision Tree : A decision tree is a flowchart-like structure used to make decisions or
predictions. It consists of nodes representing decisions or tests on attributes, branches
representing the outcome of these decisions, and leaf nodes representing final outcomes or
predictions.

• Gini Impurity : Measures the likelihood of an incorrect classification of a new instance.

• Entropy : Measures the amount of uncertainty or impurity in the dataset.

Information Gain : Measures the reduction in entropy or Gini impurity after a dataset is split on
an attribute.

$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_{f_i}|}{|S|} \, \text{Entropy}(S_{f_i})$$

where $S_{f_i}$ is the subset of S for which attribute A takes its i-th value $f_i$. The entropy after
partitioning the data is calculated by weighting the entropy of each partition by its size
relative to the original set.
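
These measures can be computed by hand for a small split. The following is a minimal sketch using only the standard library; the label lists are made up (a parent set with 4 "yes" and 4 "no" labels, split into two subsets), and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    # Entropy(S) = -sum over classes of p * log2(p)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    # Gini(S) = 1 - sum over classes of p^2
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(parent, subsets):
    # Parent entropy minus the size-weighted entropy of the subsets
    total = len(parent)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent) - weighted


parent = ["yes"] * 4 + ["no"] * 4
left = ["yes"] * 3 + ["no"]      # subset where the attribute takes its first value
right = ["yes"] + ["no"] * 3     # subset where the attribute takes its second value

print(f"Entropy of parent: {entropy(parent):.4f}")
print(f"Gini of parent: {gini(parent):.4f}")
print(f"Information gain of the split: {information_gain(parent, [left, right]):.4f}")
```

A split that separated the two classes perfectly would give a gain equal to the parent entropy; this split only partly separates them, so the gain is smaller.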

Constructing Decision Trees :

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in
Step-3. Continue this process until a stage is reached where the nodes cannot be classified
further; such a final node is called a leaf node.
Program :

# Import necessary libraries


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
import matplotlib.pyplot as plt

# Load the Iris dataset


iris = load_iris()
X = iris.data # Features
y = iris.target # Labels

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Decision Tree classifier


clf = DecisionTreeClassifier(random_state=42)

# Train the classifier


clf.fit(X_train, y_train)

# Predict on the test set


y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Visualize the Decision Tree


plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=iris.feature_names, class_names=list(iris.target_names),
filled=True)
plt.title("Decision Tree for Iris Dataset", color='red',size=42)
plt.show()



Output :
sepal length (cm) sepal width (cm) ... petal width (cm) target

0 5.1 3.5 ... 0.2 0

1 4.9 3.0 ... 0.2 0

2 4.7 3.2 ... 0.2 0

3 4.6 3.1 ... 0.2 0

4 5.0 3.6 ... 0.2 0

[5 rows x 5 columns]

Accuracy: 100.00%
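
Parameter tuning : The program above trains the tree with default parameters. One common way to tune a decision tree in sklearn is a cross-validated grid search. The following is a sketch under that assumption; the parameter grid is an illustrative choice and not a recommended setting, and the same train/test split as the program above is reused.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset and split it as in the main program
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Candidate values for each parameter (illustrative choices)
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 5, 10],
}

# Try every combination with 5-fold cross-validation on the training set
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")

# The best model is refitted on the full training set and scored on the test set
test_accuracy = search.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Limiting max_depth or raising min_samples_split makes the tree simpler, which can reduce overfitting on larger and noisier datasets than Iris.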

EXPERIMENT 7 Date :
---------------------------------------------------------------------------------------------------------------
EXPERIMENT 7 : Write a Python program to Implementation of KNN using sklearn



Aim : Python program to Implementation of KNN using sklearn

Description :
KNN (K Nearest Neighbor) : KNN is one of the most essential classification algorithms in
machine learning. It belongs to the supervised learning domain. The K-nearest-neighbor method
finds a predefined number (K) of training samples closest in distance to the new point and
predicts the class of the new point from the classes of those neighbors.

Euclidean Distance : The Cartesian distance between two points x and y in the
plane/hyperplane

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

KNN (K Nearest Neighbor) Algorithm :

Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to each training point.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of
neighbors is maximum.
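
The steps above can be sketched in a few lines without sklearn. This is a minimal illustration using only the standard library (math.dist needs Python 3.8 or later); the training points, labels and function name are made up for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    # Step 2: Euclidean distance from the query to every training point
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Step 3: keep the labels of the k nearest neighbours
    nearest_labels = [label for _, label in distances[:k]]
    # Steps 4 and 5: count the labels and return the most common one
    return Counter(nearest_labels).most_common(1)[0][0]


# Two small, well separated groups of 2-D points (illustrative data)
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2.0, 2.0), k=3))
print(knn_predict(train_points, train_labels, (6.0, 6.5), k=3))
```

An odd K is a common choice for two-class problems because it avoids tied votes.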

Program :
# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset


data = load_iris()
X = data.data # Features
y = data.target # Labels

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the KNN classifier


k = 3 # Number of neighbors
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)
# Predict the labels for the test set
y_pred = knn.predict(X_test)

# Evaluate the classifier


accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Print classification report


print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Print confusion matrix


print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Output :

Accuracy: 100.00%

Classification Report:
precision recall f1-score support
setosa 1.00 1.00 1.00 10
versicolor 1.00 1.00 1.00 9
virginica 1.00 1.00 1.00 11
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30

Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
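
The score above comes from a single split with only 30 test samples, and Step-1 of the algorithm leaves the choice of K open. One common way to compare values of K is cross-validation. The sketch below is an illustration under assumptions: the list `candidate_ks` and the use of 5 folds are arbitrary choices made here, not part of the experiment.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values of K (an arbitrary illustrative range)
candidate_ks = [1, 3, 5, 7, 9, 11]

mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy on the full dataset
    scores = cross_val_score(knn, X, y, cv=5)
    mean_scores[k] = scores.mean()
    print(f"k={k:2d}  mean accuracy={mean_scores[k]:.3f}")

# K with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
print("Best k:", best_k)
```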

EXPERIMENT 8 Date:
---------------------------------------------------------------------------------------------------------------
EXPERIMENT 8 : Write a Python program to implement Logistic Regression using
sklearn
Aim : To write a Python program that implements Logistic Regression using sklearn

Description :
Logistic Regression : Logistic regression is a supervised machine learning algorithm used for
classification tasks. It predicts the probability that an instance belongs to a given class.
In binary classification it applies the sigmoid function to a linear combination of the
independent variables and produces a probability value between 0 and 1.
The equation for multiple linear regression is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n$$

where: $y$ is the dependent variable
$x_1, x_2, x_3, \dots, x_n$ are the independent variables
$\beta_0$ is the intercept
$\beta_1, \beta_2, \beta_3, \dots, \beta_n$ are the slopes

Then the logistic function is

$$f(x_i) = h(y) = \frac{1}{1 + e^{-y}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_n x_n)}}$$

The logistic model (or logit model) is a statistical model that models the log-odds of an event as
a linear combination of one or more independent variables.
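
The logistic function and its link to the log-odds can be checked with a few lines of plain Python. This is a small sketch; the function names `sigmoid` and `predict_probability` and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(y):
    """Logistic function h(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


def predict_probability(x, beta0, betas):
    """Probability obtained from y = beta0 + sum(beta_i * x_i)."""
    y = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(y)


# Hypothetical intercept and slopes, for illustration only
beta0 = -1.0
betas = [0.5, 2.0]

print(sigmoid(0.0))                                   # 0.5, since y = 0
print(predict_probability([2.0, 0.0], beta0, betas))  # y = 0 here as well, so 0.5

p = predict_probability([2.0, 1.0], beta0, betas)     # y = -1 + 1 + 2 = 2
print(p)                                              # a probability above 0.5
print(math.log(p / (1 - p)))                          # log-odds, approximately 2.0
```

The last line shows the statement in the text: taking the log-odds of the predicted probability recovers the linear combination y.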

The program below loads the Iris dataset, splits it into training and test sets, trains a
Logistic Regression model, and evaluates its performance.

Program :



# Import necessary libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset


data = load_iris()
X = data.data # Features
y = data.target # Labels

# Split the dataset into training and testing sets (80% training, 20% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Logistic Regression model


log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)

# Predict the labels for the test set


y_pred = log_reg.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

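# Print classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

# Print confusion matrix
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))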

Output :
Accuracy: 100.00%

Classification Report:
precision recall f1-score support

setosa 1.00 1.00 1.00 10


versicolor 1.00 1.00 1.00 9
virginica 1.00 1.00 1.00 11

accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30

Confusion Matrix:
[[10 0 0]
[ 0 9 0]
[ 0 0 11]]
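
The program above uses all three Iris classes, while the formula in the description is for the binary case. The following sketch keeps only two classes and checks that the probabilities returned by `predict_proba` agree with the logistic function applied by hand to the fitted coefficients. The variable names (`X_bin`, `linear_part`, `manual_prob`) are assumptions introduced for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Keep only two classes (versicolor = 1, virginica = 2) to get a binary problem
mask = y != 0
X_bin, y_bin = X[mask], y[mask]

model = LogisticRegression(max_iter=200)
model.fit(X_bin, y_bin)

# Linear combination beta_0 + beta_1*x_1 + ... + beta_n*x_n for each sample
linear_part = X_bin @ model.coef_.ravel() + model.intercept_[0]

# Apply the logistic function by hand
manual_prob = 1.0 / (1.0 + np.exp(-linear_part))

# Probability of the second class as reported by scikit-learn
sklearn_prob = model.predict_proba(X_bin)[:, 1]

print(np.allclose(manual_prob, sklearn_prob))
print(sklearn_prob[:3].round(3))
```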

EXPERIMENT 9 Date :
--------------------------------------------------------------------------------------------------------------
EXPERIMENT 9 : Write a Python program to implement K-Means Clustering

Aim : To write a Python program that implements K-Means Clustering


Description :
K-Means Clustering: K-Means Clustering is an unsupervised machine learning algorithm that
groups an unlabeled dataset into different clusters. Here K is the number of clusters, chosen
in advance. It is an iterative algorithm that divides the unlabeled dataset into K clusters
in such a way that each data point belongs to only one group of points with similar properties.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim
of this algorithm is to minimize the sum of squared distances between the data points and the
centroids of their corresponding clusters.

Algorithm :
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not be points from the
input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K
clusters.
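$$d_i = \min_{j} d(x_i, \mu_j)$$

where the mean of cluster j, taken over the $N_j$ points assigned to it, is

$$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} x_i$$

and the Euclidean distance between two points x and y with n features is

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

(in this last formula the subscript i indexes the n coordinates of the two points).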
Step-4: Recompute the centroid of each cluster as the mean of the points assigned to it.
Step-5: Repeat Step-3, that is, reassign each data point to its new closest centroid.
Step-6: If any reassignment occurred, go to Step-4; otherwise FINISH.
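
The loop described in these steps can be sketched in plain Python. This is a minimal illustration and not how scikit-learn implements `KMeans`; the function names, the `seed` and `max_iters` parameters, the choice of data points as initial centroids, and the toy data are all assumptions made for this example.

```python
import math
import random


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Steps 3 and 5: assign every point to its closest centroid
        new_assignments = [
            min(range(k), key=lambda j: euclidean_distance(p, centroids[j]))
            for p in points
        ]
        # Step 6: stop when no point changed cluster
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, assignments


# Hypothetical toy data: two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
print("Centroids:", centroids)
print("Assignments:", assignments)
```

Which cluster receives index 0 depends on the random initial centroids, so only the grouping of the points is meaningful, not the label values.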

Flowchart :



The program below applies K-Means Clustering to the Iris dataset as an unsupervised learning
approach. K-Means is commonly used to divide data into clusters based on feature similarity.

Program :
# Import necessary libraries
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the Iris dataset


data = load_iris()
X = data.data # Features

# Create and train the K-Means model


num_clusters = 3 # Set the number of clusters
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=42)
kmeans.fit(X)

# Predict the clusters


clusters = kmeans.predict(X)

# Plotting the clusters


# Convert data to DataFrame for easier visualization
df = pd.DataFrame(X, columns=data.feature_names)
df['Cluster'] = clusters

# Plot the clusters (using only the first two features for visualization)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='sepal length (cm)', y='sepal width (cm)', hue='Cluster',
palette='viridis')
plt.title("K-Means Clustering on Iris Dataset", color="red",size=40)
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.show()

# Print the cluster centers


print("Cluster Centers:\n", kmeans.cluster_centers_)

Output :

Cluster Centers:
[[5.9016129 2.7483871 4.39354839 1.43387097]
[5.006 3.428 1.462 0.246 ]
[6.85 3.07368421 5.74210526 2.07105263]]
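
Step-1 of the algorithm requires choosing K in advance. One common heuristic is to compare the within-cluster sum of squared distances, which scikit-learn exposes as `inertia_`, for several values of K and look for the point where the improvement levels off. The sketch below is only an illustration; the range of K values tried is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

inertias = {}
for k in range(1, 7):
    # n_init=10 is set explicitly because the default differs between scikit-learn versions
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    # inertia_ is the sum of squared distances of samples to their closest centroid
    inertias[k] = model.inertia_
    print(f"k={k}  inertia={inertias[k]:.2f}")
```

Inertia always tends to fall as K grows, so the value of K is usually picked where the decrease becomes small rather than where inertia is lowest.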

EXPERIMENT 10 Date :
( Mini Project )
--------------------------------------------------------------------------------------------------------------
EXPERIMENT 10 : Performance analysis of Classification Algorithms on a specific dataset
(Mini Project)



Mini Project Title : Performance Analysis of Classification Algorithms on the
"Iris Dataset"

Objective :
Analyze and compare the performance of multiple classification algorithms (e.g.,
Logistic Regression, Decision Tree, Random Forest, SVM, and KNN) on the
popular Iris dataset to predict the species of a flower based on its features.

Steps to Implement the Project :


1. Import Necessary Libraries :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (classification_report, accuracy_score, roc_auc_score,
                             confusion_matrix, roc_curve)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

2. Load the Dataset :


The Iris dataset can be loaded directly from `sklearn` or downloaded from external
sources like Kaggle.
from sklearn.datasets import load_iris

# Load dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)



data['species'] = iris.target

# Map target to class names for readability


data['species'] = data['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# Display basic info


print(data.head())
data.info()  # info() prints its summary directly and returns None

3. Exploratory Data Analysis (EDA) :


1. Check class distribution.
2. Visualize feature relationships.

# Check class distribution
print(data['species'].value_counts())

# Pairplot to visualize relationships
sns.pairplot(data, hue='species')
plt.show()

4. Data Preprocessing:
1. Split into training and test sets.
2. Scale the features.

# Split into features and target
X = data.drop('species', axis=1)
y = data['species']

# Encode target variable as integers (columns are ordered setosa, versicolor, virginica)
y = pd.get_dummies(y, drop_first=False).values.argmax(axis=1)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Scale features (fit on the training set only, then apply to the test set)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
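
The scaling step fits the scaler on the training set and reuses it on the test set. The same behavior can be bundled with a model using scikit-learn's `make_pipeline`, which keeps the two steps together. This is an optional sketch, not part of the project steps; it reloads the data so that it runs on its own, and the variable names `pipeline` and `test_accuracy` are introduced here only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training data only,
# then applies the same transformation before every prediction.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")

# Means learned by the scaler step (one value per feature, from X_train only)
print(pipeline.named_steps["standardscaler"].mean_)
```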
5. Train and Evaluate Models:

**Logistic Regression**

model_lr = LogisticRegression()
model_lr.fit(X_train, y_train)
y_pred_lr = model_lr.predict(X_test)
print("Logistic Regression:")
print(classification_report(y_test, y_pred_lr))

**Decision Tree**

model_dt = DecisionTreeClassifier()
model_dt.fit(X_train, y_train)
y_pred_dt = model_dt.predict(X_test)
print("Decision Tree:")
print(classification_report(y_test, y_pred_dt))

**Random Forest**

model_rf = RandomForestClassifier()
model_rf.fit(X_train, y_train)
y_pred_rf = model_rf.predict(X_test)
print("Random Forest:")
print(classification_report(y_test, y_pred_rf))

**K-Nearest Neighbors (KNN)**

model_knn = KNeighborsClassifier()
model_knn.fit(X_train, y_train)
y_pred_knn = model_knn.predict(X_test)
print("K-Nearest Neighbors:")
print(classification_report(y_test, y_pred_knn))

**Support Vector Machine (SVM)**

model_svm = SVC(probability=True)
model_svm.fit(X_train, y_train)
y_pred_svm = model_svm.predict(X_test)
print("Support Vector Machine:")
print(classification_report(y_test, y_pred_svm))

6. Compare Performance:
Create a table or plot comparing metrics such as accuracy, precision, recall, and
F1-score.
# Accuracy scores
accuracy_scores = {
    'Logistic Regression': accuracy_score(y_test, y_pred_lr),
    'Decision Tree': accuracy_score(y_test, y_pred_dt),
    'Random Forest': accuracy_score(y_test, y_pred_rf),
    'K-Nearest Neighbors': accuracy_score(y_test, y_pred_knn),
    'Support Vector Machine': accuracy_score(y_test, y_pred_svm)
}

# Bar plot of accuracy


plt.figure(figsize=(10, 6))
plt.bar(list(accuracy_scores.keys()), list(accuracy_scores.values()), color='skyblue')
plt.xlabel('Algorithm')
plt.ylabel('Accuracy')
plt.title('Comparison of Classification Algorithms')
plt.xticks(rotation=45)
plt.show()
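
Step 6 also mentions precision, recall, and F1-score, while the plot above compares accuracy only. The sketch below shows how macro-averaged versions of these metrics are computed from true and predicted labels, so the numbers in `classification_report` can be interpreted. The function name `macro_metrics` and the label lists are hypothetical and serve only as an illustration.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 for label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        # Per-class counts of true positives, false positives and false negatives
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average: unweighted mean over classes
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels and predictions, for illustration only
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

metrics = macro_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name:<10}{value:.3f}")
```

Calling such a function once per model and collecting the results in a table gives the side-by-side comparison that this step describes.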

7. Visualize Confusion Matrices:


from sklearn.metrics import ConfusionMatrixDisplay

models = [model_lr, model_dt, model_rf, model_knn, model_svm]


model_names = ['Logistic Regression', 'Decision Tree', 'Random Forest', 'KNN',
'SVM']

for model, name in zip(models, model_names):
    ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
    plt.title(f'Confusion Matrix: {name}')
    plt.show()

8. Deliverables:
1. **Python Notebook**: Full implementation of the project.
2. **Report**:
- Dataset summary and EDA findings.
- Algorithms and their performance (accuracy, F1-score, etc.).
- Recommendations for the best classi

Output :
