

PRESENTED BY: NISHI GOHIL

ENROLLMENT NO: 210170111039


A LABORATORY MANUAL FOR


Introduction to Machine Learning (3171114)

Submitted by

NAKUM RUTURAJ

210170111094

In partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

In

Electronics and Communication Department

Vishwakarma Government Engineering College, Chandkheda

Gujarat Technological University, Ahmedabad

2024-2025


A Laboratory Manual for

Introduction to Machine Learning


(3171114)
B.E. Semester 7
(Electronics and Communication)

Vishwakarma Government
Engineering College, Chandkheda

Directorate of Technical Education,


Gandhinagar, Gujarat
Year : 2024-25

Vishwakarma Government Engineering College, Chandkheda


Certificate

This is to certify that Mr./Ms.


Enrollment No. of B.E. Semester Electronics &
Communication Engineering of this Institute (GTU Code: ) has satisfactorily
completed the Practical / Tutorial work for the subject of Introduction to Machine
Learning (3171114) for the academic year 2024-25.

Place:
Date:

Name and Sign of Faculty member Head of the Department


Preface
The main motto of any laboratory/practical/field work is to enhance the required skills as well as to create the ability amongst students to solve real-time problems by developing relevant competencies in the psychomotor domain. Keeping this in view, GTU has designed a competency-focused, outcome-based curriculum for engineering degree programmes in which sufficient weightage is given to practical work. This underlines the importance of skill enhancement amongst students, and it pays attention to utilizing every second of the time allotted for practicals amongst students, instructors and faculty members to achieve relevant outcomes by performing the experiments, rather than having merely study-type experiments. For the effective implementation of a competency-focused, outcome-based curriculum, it is essential that every practical is keenly designed to serve as a tool to develop and enhance the relevant competencies required by industry in every student. These psychomotor skills are very difficult to develop through the traditional chalk-and-board content delivery method in the classroom. Accordingly, this lab manual is designed to focus on industry-defined relevant outcomes, rather than the old practice of conducting practicals merely to prove a concept or theory.

By using this lab manual, students can go through the relevant theory and procedure in advance before the actual performance, which creates interest and gives them a basic idea prior to the performance. This, in turn, enhances the pre-determined outcomes amongst students. Each experiment in this manual begins with the competency, course outcomes and practical outcomes (objectives). Students will also learn the safety measures and necessary precautions to be taken while performing the practicals.

This manual also provides guidelines to faculty members to facilitate student-centric lab activities through each experiment, by arranging and managing the necessary resources so that the students follow the procedures with the required safety and necessary precautions to achieve the outcomes. It also gives an idea of how students will be assessed, by providing rubrics.

This lab manual focuses on the development of computer programs for machine learning that can change when exposed to new data. In this manual we will cover the basics of machine learning and the implementation of simple machine-learning algorithms using Python.
Machine learning is a method of teaching computers to learn from data without being explicitly programmed. Python is a popular programming language for machine learning because it has a large number of powerful libraries and frameworks that make it easy to implement machine learning algorithms.
To get started with machine learning using Python, you will need a basic understanding of Python programming and some knowledge of mathematical concepts such as probability, statistics, and linear algebra.

Utmost care has been taken while preparing this lab manual; however, there is always a chance for improvement. Therefore, we welcome constructive suggestions for improvement and the removal of errors, if any.


Practical – Course Outcome matrix


Course Outcomes (COs):
CO1: Understand basic concepts of machine learning as well as challenges involved.
CO2: Learn and implement various basic machine learning algorithms.
CO3: Study dimensionality reduction concept and its role in machine learning techniques.
CO4: Realize concepts of advanced machine learning algorithms.
CO5: Comprehend basic concepts of neural networks and their use in machine learning.

Sr. No.  Aim / Objective(s) of Experiment                                              CO(s)
1.   Introduction to Libraries used for Machine Learning.                              CO1
2.   Create a dataset of 12 samples, each having 10 features.
     Split it into training and testing datasets.                                      CO1, CO2
3.   Exploratory Data Analysis on Iris Dataset.                                        CO2
4.   Write a program to implement simple linear regression
     (A) from a manual dataset, (B) using the scikit-learn Python library.             CO2
5.   Write a program to implement multiple linear regression.                          CO2
6.   Write a program to implement a decision tree on the IRIS dataset.                 CO2
7.   Write a program to implement logistic regression.                                 CO2
8.   Write a program to implement an SVM classifier using a linear kernel
     on the IRIS dataset.                                                              CO4
9.   Write a program to implement an SVM classifier using an RBF kernel
     on the IRIS dataset.                                                              CO4
10.  Write a program for dimensionality reduction using PCA.                           CO3
11.  Write a program to implement the Random Forest algorithm for feature
     extraction using the IRIS dataset.                                                CO3
12.  Write a program to implement K-Means clustering using the IRIS dataset.           CO4
13.  Write a program to implement the Naïve Bayes algorithm.                           CO2
14.  Write a program to implement and visualize the working of activation
     functions.                                                                        CO5
15.  To implement a Convolutional Neural Network and find the total number
     of parameters of the Convolutional Neural Network.                                CO5


Industry Relevant Skills

The following industry-relevant competencies are expected to be developed in the students by undertaking the practical work of this laboratory.
1.
2.

Guidelines for Faculty members


1. Teachers should provide guidelines, with a demonstration of the practical, to the students, covering all features.
2. Teachers shall explain the basic concepts/theory related to the experiment to the students before starting each practical.
3. Involve all the students in the performance of each experiment.
4. Teachers are expected to share the skills and competencies to be developed in the students and to ensure that the respective skills and competencies are developed in the students after the completion of the experimentation.
5. Teachers should give students the opportunity for hands-on experience after the demonstration.
6. Teachers may provide additional knowledge and skills to the students, even if not covered in the manual, when expected from the students by the concerned industry.
7. Give practical assignments and assess the performance of students based on the tasks assigned, to check whether they are completed as per the instructions.
8. Teachers are expected to refer to the complete curriculum of the course and follow the guidelines for implementation.

Instructions for Students


1. Students are expected to listen carefully to all the theory classes delivered by the faculty members and understand the COs, the content of the course, the teaching and examination scheme, the skill set to be developed, etc.
2. Students shall organize the work in groups and keep a record of all observations.
3. Students shall attempt to develop the related hands-on skills and build confidence.
4. Students shall develop the habit of evolving more ideas, innovations, skills, etc., apart from those included in the scope of the manual.
5. Students shall refer to technical magazines and data books.
6. Students should develop the habit of submitting the experimentation work as per the schedule, and they should be well prepared for the same.

Common Safety Instructions


1. Students are expected to properly shut down the computer after performing the experiment.


Index
(Progressive Assessment Sheet)

Sr. No. | Objective(s) of Experiment | Page No. | Date of performance | Date of submission | Assessment Marks | Sign. of Teacher with date | Remarks
1 Introduction to Libraries used for Machine
Learning.
2 Create dataset of 12 Samples each having 10
features. Split it into training and testing
dataset.
3 Exploratory Data Analysis on Iris Dataset

4 Write a program to implement simple linear


regression: (A) from manual dataset.

(B) from scikit-learn python library


5 Write a program to implement Multiple linear
regression.
6 Write a program to implement Decision tree
on IRIS datasets.
7 Write a program to implement logistic
regression.
8 Write a program to implement SVM classifier
using linear kernel using IRIS datasets.
9 Write a program to implement SVM classifier
using rbf kernel using IRIS datasets.
10 Write a program of dimensionality reduction
using PCA.
11 Write a program to implement Random Forest
algorithm feature extraction using IRIS
Dataset
12 Write a program to implement K-Means
Clustering using IRIS Dataset.
13 Write a program to implement the Naïve Bayes
algorithm
14 Write a program to implement and visualize
the working of activation function.


15 To implement Convolution Neural Network


and find out the total parameters of
Convolution Neural Network.
Total


EXPERIMENT NO: 1

Introduction to Libraries used for Machine Learning.


Date:

Competency and Practical Skills: Basic knowledge of Machine Learning


Relevant CO: Understand basic concepts of machine learning as well as challenges involved.
Objectives: Introduction to Libraries used for Machine Learning.

TensorFlow Library:

TensorFlow is an open-source, end-to-end platform for creating machine learning

applications. It was created and is maintained by Google. It is a symbolic math library that
uses dataflow and differentiable programming to perform various tasks focused on training
and inference of deep neural networks. It allows developers to create machine learning
applications using various tools, libraries, and community resources.
TensorFlow is at present among the most popular software libraries. There are several real-world
applications of deep learning that make TensorFlow popular. Being an open-source library
for deep learning and machine learning, TensorFlow finds a role to play in text-based
applications, image recognition, voice search, and many more. DeepFace, Facebook's image
recognition system, uses TensorFlow for image recognition. Every Google app that you use
has made good use of TensorFlow to make your experience better.
TensorFlow's name is directly derived from its core component: the tensor. A tensor is a
vector or matrix of n dimensions that can represent all types of data. The values in a tensor
all hold an identical data type with a known shape, and this shape is the dimensionality of
the matrix. A vector is a one-dimensional tensor; a matrix is a two-dimensional tensor.
A tensor is an object with three properties:
(1) a unique label (name),
(2) a dimension (shape), and
(3) a data type (dtype).

Here is a first example of TensorFlow. It shows how you can define constants and perform
computation with those constants.

# Import `tensorflow`


import tensorflow as tf

# Initialize two constants

x1 = tf.constant([1,2,3,4])

x2 = tf.constant([5,6,7,8])

# Multiply

result = tf.multiply(x1, x2)

# Print the result

print(result)

Output: tf.Tensor([5 12 21 32], shape=(4,), dtype=int32)

Scikit-learn Library:

Scikit-learn is an open-source machine learning library that supports supervised and
unsupervised learning. It also provides various tools for model fitting, data preprocessing,
model selection, model evaluation, and many other utilities. Scikit-learn provides dozens of
built-in machine learning algorithms and models, called estimators. Each estimator can be
fitted to some data using its fit method, as the short sketch below shows.
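A minimal sketch of this estimator pattern (the toy data here is illustrative, not from the manual):

from sklearn.linear_model import LogisticRegression

# Illustrative toy data: four samples with two features each, binary labels
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 0, 1, 1]

clf = LogisticRegression()        # an estimator
clf.fit(X, y)                     # fit the estimator to the data
print(clf.predict([[1.5, 1.5]]))  # predict the label of a new sample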

The library is built upon SciPy (Scientific Python), which must be installed before you can
use scikit-learn. The SciPy stack includes:

 NumPy: Base n-dimensional array package.


 SciPy: Fundamental library for scientific computing.
 Matplotlib: Comprehensive 2D/3D plotting.
 IPython: Enhanced interactive console.
 Sympy: Symbolic mathematics.
 Pandas: Data structures and analysis.

Extensions or modules for SciPy are conventionally named SciKits. As such, the module that
provides learning algorithms is named scikit-learn.


Scikit-learn provides the following groups of models to the user.

 Clustering − This model is used for grouping unlabeled data.

 Cross-Validation − It is used to check the accuracy of supervised models on unseen data
(by splitting the dataset into training and test sets).
 Dimensionality Reduction − It is used for reducing the number of attributes in data, which
can be further used for summarisation, visualisation and feature selection.
 Ensemble methods − As the name suggests, it is used for combining the predictions of
multiple supervised models.
 Feature extraction − It is used to extract features from data, to define the attributes in
image and text data.
 Feature selection − It is used to identify useful attributes for creating supervised models.
 Open Source − It is an open-source library and is also commercially usable under the BSD
(Berkeley Software Distribution) licence.

Pytorch library:
PyTorch is a Python-based scientific computing package serving two broad purposes:
 A replacement for NumPy to use the power of GPUs and other accelerators.
 An automatic differentiation library that is useful to implement neural networks.

PyTorch is closely related to the Lua-based Torch framework, which was actively used at
Facebook.

Features of PyTorch: The major features of PyTorch are mentioned below.

Easy interface: PyTorch offers an easy-to-use API; hence it is considered very simple to
operate, and it runs on Python. Code execution in this framework is quite easy.
Python usage: This library is considered Pythonic and integrates smoothly with the Python
data science stack. Thus, it can leverage all the services and functionalities offered by the
Python environment.
Computational graphs: PyTorch provides an excellent platform that offers dynamic
computational graphs, so a user can change them during runtime. This is highly useful when
a developer has no idea how much memory will be required for creating a neural network
model.
PyTorch tensors: Tensors are a specialized data structure very similar to arrays and
matrices. Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or
other specialized hardware to accelerate computing.

Interoperability With Numpy:


NumPy is a popular open-source library used for mathematical and scientific computing in
Python. Instead of reinventing the wheel, PyTorch interoperates really well with NumPy to
leverage its existing ecosystem of tools and libraries.
Here's how we create an array using NumPy:

import numpy as np

x = np.array([[1, 2], [3, 4.]])

>> x

array([[1., 2.],
       [3., 4.]])

We can convert a NumPy array into a tensor using torch.from_numpy:

import torch

# Convert the numpy array to a torch tensor
y = torch.from_numpy(x)

>> y

tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

Conclusion:


Question:

1. What is TensorFlow? List the properties of a tensor.

2. What are the functions of the scikit-learn library?
3. Define: (1) NumPy (2) SciPy (3) Matplotlib (4) IPython (5) SymPy (6) Pandas.
4. What is PyTorch?
5. What are the features of PyTorch?

Rubrics:

Terminology:
Poor (0-2 marks): Student cannot understand the library tools and is not able to implement basic functions.
Average (3-5 marks): Student can understand all library tools but is not able to implement basic functions.
Good (5-7 marks): Student can understand all library tools and is able to implement some basic functions.
Excellent (8-10 marks): Student can understand all libraries and tools and can implement all basic functions.


EXPERIMENT NO: 2

Create train and test dataset


Date:

Competency and Practical Skills: Python, Scikit-learn

Relevant CO: Understand basic concepts of machine learning as well as challenges involved; learn and implement various basic machine learning algorithms.

Objectives:

1) To learn train and test data generation


2) To create dataset of 12 samples each having 10 features.
3) Split it into training and testing dataset using Scikit-learn.

Assumption: Assume the labels of the 12 samples are [0,1,1,0,1,0,0,1,1,0,1,0].


Code:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the CSV file (make sure the file path is correct)
student_data = pd.read_csv('student_placement.csv')

# Extract the relevant features and the target variable


X = student_data[['CGPA', 'IQ', 'Resume_Score',
'Internship']].values
y = student_data['Placed'].values

# Split the dataset into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=5, random_state=8)


# Display the split data


print("\nTrain Samples (Features) are:\n", X_train)
print("\nOutput classification of train sample (Placed) is:\n",
y_train)
print("\nTest Samples (Features) are:\n", X_test)
print("\nOutput classification of Test sample (Placed) is:\n",
y_test)
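The code above reads the samples from a CSV file. A minimal sketch that instead generates the assumed 12 samples with 10 random features and the given labels, and then splits them, could look like this:

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.random((12, 10))  # 12 samples, each with 10 features
y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])  # assumed labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=4, random_state=8)
print(X_train.shape, X_test.shape)  # (8, 10) (4, 10)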

OUTPUT:


Conclusion:

Questions:


1. Define feature, training data and testing data.
2. What is the difference between training data and validation data?
3. How do you handle missing or corrupted data in a dataset?
4. Write a function to split a dataset into training and testing sets using scikit-learn.

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 3

Analysis of the Iris Dataset

Date:
Competency and Practical Skills: Python, Scikit-learn

Relevant CO: Learn and implement various basic machine learning algorithms.

Objectives: Exploratory Data Analysis on Iris Dataset.

Exploratory Data Analysis (EDA) is a technique to analyze data using visual techniques.
With this technique, we can get detailed information about the statistical summary of the data. We
will also be able to deal with duplicate values and outliers, and see trends or patterns
present in the dataset.

Student Placement dataset


The Placement Dataset contains five columns: CGPA, IQ, Resume Score, Internship, and
Placed. It helps analyze students' placements based on academic and experiential factors.
Use the read_csv() function from the Pandas library to load the dataset and convert it into a
DataFrame.
You can use this dataset for various data analysis or machine learning tasks, similar to how
the Iris dataset is used in data science.

Example:

import pandas as pd

# Load the Placement dataset
df = pd.read_csv('data.csv')

# Display the first 5 rows
print(df.head())


Getting Information about the Dataset


We will use the shape attribute to get the shape of the dataset. We can see that the data
frame contains 6 columns and 100 rows. Now, let's also look at the columns and their data types. For
this, we will use the info() method.
Example:
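A minimal sketch of both calls (assuming df is the DataFrame loaded above):

print(df.shape)  # (rows, columns), e.g. (100, 6)
df.info()        # prints column names, dtypes and non-null counts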

The describe() function applies basic statistical computations on the dataset, such as extreme
values, the count of data points, standard deviation, etc. Any missing or NaN values are
automatically skipped. The describe() function gives a good picture of the distribution of the data.
Example:
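A one-line sketch, assuming the same df:

print(df.describe())  # count, mean, std, min, quartiles and max for each numeric column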


Checking Missing Values


We will check whether our data contains any missing values. Missing values can occur when
no information is provided for one or more items or for a whole unit. We will use
the isnull() method.
Example:
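A minimal sketch, assuming the same df:

print(df.isnull().sum())  # number of missing values in each column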

Checking Duplicates
Let's see if our dataset contains any duplicates. The Pandas drop_duplicates() method helps
in removing duplicates from the data frame.

Example:
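A minimal sketch, assuming the same df:

df = df.drop_duplicates()     # remove exact duplicate rows
print(df.duplicated().sum())  # should now print 0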


We can see the unique values of the target column. Let's see whether the dataset is balanced,
i.e., whether all classes contain equal numbers of rows. We will use
the Series.value_counts() function. This function returns a Series containing counts of unique
values.

Example:
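A one-line sketch, assuming the target column is 'Placed' as elsewhere in this manual:

print(df['Placed'].value_counts())  # number of rows per class label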

Data Visualization

Visualizing the target column

We will use the Matplotlib and Seaborn libraries for data visualization.
Example:

import seaborn as sns

import matplotlib.pyplot as plt

# Assuming 'df' is your DataFrame containing the 'Placed' column

sns.countplot(x='Placed', data=df, palette=['#3498db', '#e74c3c'])


plt.show()

Relation between variables

We will see the relationship between CGPA and IQ, and also between Resume Score and
Internship.

Example 1: Comparing CGPA and IQ

import seaborn as sns

import matplotlib.pyplot as plt

# Scatter plot with 'CGPA' on x-axis, 'IQ' on y-axis, and 'Placed' as the hue

sns.scatterplot(x='CGPA',

y='IQ',

hue='Placed',

data=df)

# Placing the legend outside the figure

plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

From the above plot, we can infer that –

 Students with lower CGPA but higher IQ might still get placed.
 Students with moderate CGPA and IQ lie in the middle of those who are placed or not.
 Students with higher CGPA and lower IQ may or may not get placed depending on
other factors like Resume_Score or Internship.

Example 2: Comparing Resume Score and Internship

# importing packages

import seaborn as sns

import matplotlib.pyplot as plt


# Scatter plot with 'Resume_Score' on x-axis, 'Internship' on y-axis, and 'Placed' as the hue

sns.scatterplot(x='Resume_Score', y='Internship', hue='Placed', data=df)

# Placing Legend outside the Figure

plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

From the above plot, we can infer that –

 Students with lower Resume Scores and no Internship experience (0) are less likely to
get placed.
 Students with moderate Resume Scores and Internship experience (1) fall in the
middle in terms of placement likelihood.
 Students with higher Resume Scores and Internship experience are more likely to get
placed.

Let’s plot all the column’s relationships using a pairplot. It can be used for multivariate
analysis.

Example:

import seaborn as sns

import matplotlib.pyplot as plt

# Pairplot with relevant features from the placement dataset

sns.pairplot(df[['CGPA', 'Resume_Score', 'IQ', 'Internship', 'Placed']],

hue='Placed', height=2)

plt.show()


Histograms

Histograms allow us to see the distribution of data for various columns. They can be used for
univariate as well as bivariate analysis.

# importing packages

import seaborn as sns

import matplotlib.pyplot as plt

# Creating a 2x2 grid of subplots with histograms

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

# Adjusting titles and plotting histograms for the placement dataset

axes[0, 0].set_title("CGPA")


axes[0, 0].hist(df['CGPA'], bins=7)

axes[0, 1].set_title("Resume Score")

axes[0, 1].hist(df['Resume_Score'], bins=5)

axes[1, 0].set_title("IQ")

axes[1, 0].hist(df['IQ'], bins=6)

axes[1, 1].set_title("Internship")

axes[1, 1].hist(df['Internship'], bins=6)

# Display the plot

plt.tight_layout()

plt.show()

From the above plot, we can see that:


 CGPA: The highest frequency of CGPA is between 7.0 and 8.0, indicating that most
students have CGPAs in this range.
 Resume Score: The highest frequency of Resume Scores is around 80, suggesting
that a majority of students score within the range of 70 to 90.
 IQ: The highest frequency of IQ scores is around 110, indicating that many students
have IQs in the range of 105 to 115.
 Internship: The highest frequency of Internship experience is around 1.0, showing
that most students have completed internships.

Histograms with Distplot Plot

Distplot is basically used for a univariate set of observations and visualizes it through a
histogram, i.e., only one observation, and hence we choose one particular column of the dataset.

Example:

# Importing packages

import seaborn as sns

import matplotlib.pyplot as plt

# Distplot for 'CGPA' with 'Placed' as hue

plot = sns.FacetGrid(df, hue="Placed")

plot.map(sns.histplot, "CGPA").add_legend()

# Distplot for 'Resume_Score' with 'Placed' as hue

plot = sns.FacetGrid(df, hue="Placed")

plot.map(sns.histplot, "Resume_Score").add_legend()

# Distplot for 'IQ' with 'Placed' as hue

plot = sns.FacetGrid(df, hue="Placed")

plot.map(sns.histplot, "IQ").add_legend()

# Distplot for 'Internship' with 'Placed' as hue

plot = sns.FacetGrid(df, hue="Placed")

plot.map(sns.histplot, "Internship").add_legend()


# Display the plots

plt.show()

From the above plots, we can see that:


 CGPA: There is a significant amount of overlapping between the CGPA distributions
of placed and unplaced students, indicating that CGPA may not be a strong
differentiator for placement status.


 Resume Score: Similar to CGPA, the Resume Score distributions show considerable
overlap, suggesting that it is also not a decisive factor in placement outcomes.
 IQ: There is a noticeable amount of overlap, but the distributions for placed and
unplaced students begin to diverge slightly, indicating that IQ may play a role in
placement decisions.
 Internship: There is a distinct separation between the distributions for placed and
unplaced students, suggesting that having an internship significantly impacts
placement status.

Handling Correlation

Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the
dataframe. Any NA values are automatically excluded. Non-numeric columns in the
dataframe are ignored.
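A minimal sketch, assuming the same df (non-numeric columns are dropped first so corr() sees only numbers):

numeric_df = df.select_dtypes(include='number')  # keep numeric columns only
print(numeric_df.corr(method='pearson'))         # pairwise Pearson correlations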

Heatmaps

The heatmap is a data visualization technique that is used to analyze the dataset as colors in
two dimensions. Basically, it shows a correlation between all numerical variables in the
dataset. In simpler terms, we can plot the above-found correlation using the heatmaps.
Example:

# Importing packages

import seaborn as sns

import matplotlib.pyplot as plt


# Dropping non-numeric columns such as 'Name'

sns.heatmap(df.drop(['Name', 'Placed'], axis=1).corr(method='pearson'),

annot=True, cmap='coolwarm', linewidths=0.5)

# Display the heatmap

plt.title("Correlation Heatmap of Placement Dataset")

plt.show()

From the above graph, we can observe that:

 CGPA and Internship: There is a moderate negative correlation of -0.48, indicating


that higher CGPA is associated with lower Internship scores, or vice versa.
 IQ and Internship: A slight positive correlation of 0.18 suggests that higher IQ scores
may correlate with better Internship performance, although this relationship is not
strong.
 Resume Score and CGPA: There is a weak negative correlation of -0.17, suggesting a
negligible relationship between these two variables.
 Resume Score and Internship: The correlation is -0.24, indicating a mild negative
relationship, where better Internship performance might correspond with lower
Resume Scores.


Box Plots

We can use boxplots to see how the categorical values are distributed with respect to the other
numerical values.
Example:

# importing packages

import seaborn as sns

import matplotlib.pyplot as plt

def graph(y):
    sns.boxplot(x="Placed", y=y, data=df)

plt.figure(figsize=(10, 10))

# Adding the subplots at the specified grid positions

plt.subplot(221)
graph('CGPA')

plt.subplot(222)
graph('IQ')

plt.subplot(223)
graph('Resume_Score')

plt.subplot(224)
graph('Internship')

plt.show()


From the above graph, we can see that:


 CGPA: The 'Placed' candidates (1) exhibit a higher median CGPA compared to the
'Not Placed' candidates (0), indicating that higher CGPA is correlated with better
placement outcomes. The spread of CGPA values among placed candidates is
narrower, suggesting consistency in performance.
 IQ: The box plot shows that 'Placed' candidates have a higher median IQ than those
who are 'Not Placed', with less variability in IQ scores for those who secured
placements. This suggests that higher IQ may also play a role in the likelihood of
being placed.
 Resume Score: The median Resume Score for 'Placed' candidates is higher than for
'Not Placed' candidates, indicating that a better resume correlates with placement
success. The spread of scores shows a consistent performance for both groups, but
'Placed' candidates have higher scores overall.


 Internship: There is a significant difference in internship scores between the two


groups. However, the box plot indicates that the 'Not Placed' group has very few
internship scores above a certain threshold, suggesting that internship experience may
be a critical factor for securing placement.
Handling Outliers
An outlier is a data item/object that deviates significantly from the rest of the (so-called
normal) objects. Outliers can be caused by measurement or execution errors. The analysis for
outlier detection is referred to as outlier mining.
There are many ways to detect outliers, and the removal process is the same as
removing a data item from the pandas dataframe.

Let’s consider the placement dataset and let’s plot the boxplot for the CGPA column.
Example:

# importing packages

import pandas as pd

import seaborn as sns

import matplotlib.pyplot as plt

# Load the dataset

df = pd.read_csv('data.csv')

sns.boxplot(x='CGPA', data=df)

The above plot shows no outliers.


Removing Outliers
For removing an outlier, one must follow the same process as removing an entry from the
dataset, using its exact position in the dataset, because all the above methods of detecting
outliers end with a list of the data items that satisfy the outlier definition according to the
method used.
Example:
We will detect the outliers using the IQR and then remove them. We will also draw the
boxplot to see whether the outliers have been removed.

# Importing necessary packages

import numpy as np

import pandas as pd


import seaborn as sns

import matplotlib.pyplot as plt

# Load the dataset (assuming df is already your placement dataset)

# df = pd.read_csv('your_dataset.csv')

# IQR for 'CGPA'

Q1 = np.percentile(df['CGPA'], 25, interpolation='midpoint')

Q3 = np.percentile(df['CGPA'], 75, interpolation='midpoint')

IQR = Q3 - Q1

print("Old Shape: ", df.shape)

# Upper and lower bounds to identify outliers

upper = np.where(df['CGPA'] >= (Q3 + 1.5 * IQR))

lower = np.where(df['CGPA'] <= (Q1 - 1.5 * IQR))

# Removing the outliers

df.drop(upper[0], inplace=True)

df.drop(lower[0], inplace=True)

print("New Shape: ", df.shape)

# Plotting the boxplot for 'CGPA' column

sns.boxplot(x='CGPA', data=df)

plt.title("Boxplot for CGPA after Outlier Removal")

plt.show()


Conclusion:

Question :

1. What is EDA?
2. Write code to compare sepal length and sepal width.
3. Explain histograms.
4. Define the following: (i) Heatmaps (ii) Box plot (iii) describe() (iv) Checking duplicates.
5. Explain the libraries used for data visualization.

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 4(A)

Linear Regression Algorithm


Date:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Learn and implement various basic machine learning algorithms.

Objectives:

1. Implement simple linear regression algorithm


2. Plot Regression line.
3. Find coefficient of linear regression
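For reference, the closed-form least-squares estimates that the code below computes are

slope m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²   and   intercept b = ȳ − m·x̄

where x̄ and ȳ are the sample means of the independent and dependent variables.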
CODE:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('data.csv')
# Extracting CGPA as the independent variable (X) and Placed as the dependent variable (y)
X = data['CGPA'].values
y = data['Placed'].values
n = len(X)  # Number of data points
mean_x = np.mean(X)
mean_y = np.mean(y)
# Calculating the slope (m) and intercept (b)
numerator = np.sum((X - mean_x) * (y - mean_y))
denominator = np.sum((X - mean_x) ** 2)
slope = numerator / denominator
intercept = mean_y - slope * mean_x
y_pred = slope * X + intercept
plt.scatter(X, y, color='blue', label='Data points')  # Original data points
plt.plot(X, y_pred, color='red', label='Regression Line')  # Regression line
plt.xlabel('CGPA (Independent Variable)')

plt.ylabel('Placed (Dependent Variable)')
plt.title('Simple Linear Regression: CGPA vs Placement')
plt.legend()
plt.show()
print(f"Slope (m): {slope}")
print(f"Intercept (b): {intercept}")
OUTPUT:

Conclusion:


EXPERIMENT NO: 4(B)

Linear Regression Algorithm


Date:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Learn and implement various basic machine learning algorithms
Objective:
1. Implement simple linear regression algorithm.

2. Display scatter plot of linear regression.

3. Find coefficient, mean squared error and variance score of linear regression

Code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load your dataset
data = pd.read_csv('data.csv')
# Step 2: Extract CGPA (independent variable) and Placed (dependent variable)
X = data['CGPA'].values.reshape(-1, 1) # Reshape to 2D array for sklearn
y = data['Placed'].values
# Step 3: Create and train the linear regression model
model = LinearRegression()
model.fit(X, y)
# Step 4: Make predictions using the trained model
y_pred = model.predict(X)
# Step 5: Plot the scatter plot and regression line
plt.scatter(X, y, color='blue', label='Data points') # Original data points
plt.plot(X, y_pred, color='red', label='Regression Line') # Regression line
plt.xlabel('CGPA (Independent Variable)')


plt.ylabel('Placed (Dependent Variable)')


plt.title('Simple Linear Regression: CGPA vs Placement')
plt.legend()
plt.show()
# Step 6: Calculate and display the coefficient, intercept, mean squared error, and variance score
slope = model.coef_[0]
intercept = model.intercept_
# Mean Squared Error (MSE)
mse = mean_squared_error(y, y_pred)
# R-squared (variance score)
variance_score = r2_score(y, y_pred)
# Output the results
print(f"Coefficient (Slope): {slope}")
print(f"Intercept: {intercept}")
print(f"Mean Squared Error (MSE): {mse}")
print(f"Variance Score (R²): {variance_score}")
OUTPUT:


Conclusion:

Question:

1. What is linear regression?
2. What are the common types of error in linear regression?
3. Define the following terms: (1) mean squared error (2) variance.
Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 5

Multiple Linear Regression Algorithm


Date:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Learn and implement various basic machine learning algorithms
Objective:
1. Implement multiple linear regression algorithm.

2. Display scatter plot of multiple linear regression.

3. Find difference between actual and predicted value.


Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
data = pd.read_csv('data.csv')
X = data[['CGPA', 'IQ', 'Resume_Score', 'Internship']]  # Features (independent variables)
y = data['Placed']  # Dependent variable
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
# Plot actual vs predicted values against CGPA for visualization
plt.scatter(data['CGPA'], y, color='blue', label='Actual values')
plt.scatter(data['CGPA'], y_pred, color='red', label='Predicted values')
plt.xlabel('CGPA (Feature)')
plt.ylabel('Placed (Dependent Variable)')
plt.title('Multiple Linear Regression: Actual vs Predicted')
plt.legend()
plt.show()
residuals = y - y_pred


print("Differences (Residuals) between actual and predicted values:\n", residuals)


Output:

Conclusion:

Question:

1. Explain simple and multiple linear regression.
2. What is the difference between linear and non-linear regression?
3. Define actual and predicted values.

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 6

Decision Tree Algorithm


Date:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Learn and implement various basic machine learning algorithms
Objective:
1. Implement Decision tree algorithm on IRIS datasets.

2. Plot tree.

3. Find classification report, confusion matrix and accuracy.

Code:

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.tree import DecisionTreeClassifier, plot_tree

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ConfusionMatrixDisplay

from sklearn.model_selection import train_test_split

# Load the dataset

data = pd.read_csv('data.csv')

# Display the first few rows of the dataset

print("Dataset Preview:")

print(data.head())

# Load and split the data

X = data[['CGPA', 'IQ']]

y = data['Placed']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


# Initialize and train the model

decision_tree = DecisionTreeClassifier(random_state=42, max_depth=5)

decision_tree.fit(X_train, y_train)

# Make predictions

y_pred_tree = decision_tree.predict(X_test)

# Evaluate the model

accuracy_tree = accuracy_score(y_test, y_pred_tree)

print(f"Decision Tree Accuracy: {accuracy_tree * 100:.2f}%")

print("Classification Report:\n", classification_report(y_test, y_pred_tree))

# Display predicted values

print("Predicted Values (Decision Tree):", y_pred_tree)

# Plot Confusion Matrix

cm = confusion_matrix(y_test, y_pred_tree)

disp = ConfusionMatrixDisplay(confusion_matrix=cm)

disp.plot(cmap=plt.cm.Greens)

plt.title('Confusion Matrix - Decision Tree')

plt.show()

# Visualize the decision tree

plt.figure(figsize=(12, 8))

plot_tree(decision_tree, feature_names=['CGPA', 'IQ'],

class_names=['Not Placement', 'Placement'], filled=True)

plt.title('Decision Tree Structure')

plt.show()

# Predict on a new student record


# Example: [8.9, 129] => Replace with actual input values based on your dataset

new_student = [[8.9, 129]]

prediction = decision_tree.predict(new_student)

print(f"Prediction for new student {new_student}: {'Placed' if prediction[0] == 1 else 'Not Placed'}")

Output:


Conclusion:

Question:
1. Explain the decision tree algorithm.
2. Explain classification in machine learning.
3. Define the confusion matrix and its importance.
4. How can we plot a decision tree using Python?
Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 7

Logistic Regression Algorithm


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Learn and implement various basic machine learning algorithms.
Objective:
1. Implement Logistic regression algorithm.

2. Find classification report, confusion matrix and accuracy.

Code:

# Import necessary libraries

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, roc_curve, roc_auc_score, ConfusionMatrixDisplay

# Load the dataset from the uploaded CSV file

file_path = 'data.csv'  # Update the file path if necessary

data = pd.read_csv(file_path)

# Identify categorical columns

# Assuming 'Placed' is the target variable and all others are features

categorical_cols = data.select_dtypes(include=['object']).columns.tolist()

# One-hot encode categorical variables

data = pd.get_dummies(data, columns=categorical_cols, drop_first=True)


# Split dataset into features (X) and target (y)

X = data.drop('Placed', axis=1) # Drop the target variable from features

y = data['Placed'] # Target variable

# Step 1: Split the dataset into training and testing sets (80% train, 20% test)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 2: Create and train the Logistic Regression model

model = LogisticRegression()

model.fit(X_train, y_train)

# Step 3: Make predictions

y_pred = model.predict(X_test)

y_pred_proba = model.predict_proba(X_test)[:, 1] # Probability estimates for the positive class

# Step 4: Generate classification report, confusion matrix, and accuracy score

classification_rep = classification_report(y_test, y_pred)

conf_matrix = confusion_matrix(y_test, y_pred)

accuracy = accuracy_score(y_test, y_pred)

# Print the results

print("Classification Report:\n", classification_rep)

print("Confusion Matrix:\n", conf_matrix)

print("Accuracy:", accuracy)

# Step 5: Plotting the Confusion Matrix

plt.figure(figsize=(8, 6))

ConfusionMatrixDisplay(confusion_matrix=conf_matrix).plot(cmap=plt.cm.Blues)

plt.title('Confusion Matrix')


plt.show()

OUTPUT:

Conclusion:

Question:
1. What is logistic regression?
2. Compare linear and logistic regression.
3. What are the different types of logistic regression?
4. What are the advantages of logistic regression?

Rubrics:
Rubrics 1 2 3 4 5


EXPERIMENT NO: 8

SVM Classifier using Linear Kernel


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Realize concepts of advanced machine learning algorithms.
Objective:
1. Implement SVM classifier using linear kernel

2. Plot scatter plot for classification of IRIS data

Code:

# Import required libraries


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Load the dataset from the uploaded CSV file
file_path = 'data.csv' # Update the file path if necessary
data_df = pd.read_csv(file_path)
# Check for non-numeric columns and display the DataFrame info
print(data_df.info())
# Convert categorical columns to numeric using one-hot encoding
data_df = pd.get_dummies(data_df, drop_first=True)
# Split dataset into features (X) and target (y)
X = data_df.drop('Placed', axis=1) # Assuming 'Placed' is the target variable
y = data_df['Placed'] # Target variable
# Step 1: Split the dataset into training and testing sets (80% train, 20% test)


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Step 2: Create and train the SVM classifier with a linear kernel
svm_classifier = SVC(kernel='linear', random_state=42)
svm_classifier.fit(X_train, y_train)
# Step 3: Make predictions
y_pred = svm_classifier.predict(X_test)
# Step 4: Generate classification report, confusion matrix, and accuracy score
classification_rep = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
# Print the results
print("Classification Report:\n", classification_rep)
print("Confusion Matrix:\n", conf_matrix)
print("Accuracy:", accuracy)
# Step 5: Plotting the decision boundary and scatter plot
# For visualization, we will use only the first two features
X_vis = X.iloc[:, :2]  # Using the first two features for 2D visualization
# Train the SVM model on the reduced dataset
svm_classifier_vis = SVC(kernel='linear', random_state=42)
svm_classifier_vis.fit(X_vis, y)
# Create a mesh grid for plotting decision boundaries
h = .02  # step size in the mesh
x_min, x_max = X_vis.iloc[:, 0].min() - 1, X_vis.iloc[:, 0].max() + 1
y_min, y_max = X_vis.iloc[:, 1].min() - 1, X_vis.iloc[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Plot the decision boundary
Z = svm_classifier_vis.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(12, 6))
plt.contourf(xx, yy, Z, alpha=0.3)


plt.scatter(X_vis.iloc[:, 0], X_vis.iloc[:, 1], c=y, edgecolors='k', marker='o')
plt.title('SVM Classifier with Linear Kernel')
plt.xlabel(X.columns[0]) # Label for the first feature
plt.ylabel(X.columns[1]) # Label for the second feature
plt.show()
OUTPUT:

Conclusion:

Question:
1. Explain the Support Vector Machine algorithm.
2. Explain the different types of kernel functions.
3. What do you know about hard-margin SVM and soft-margin SVM?
4. What is hinge loss?
Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 9

SVM classifier using RBF Kernel


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Realize concepts of advanced machine learning algorithms.
Objective:
1. Implement SVM classifier using rbf kernel

2. Plot scatter plot for classification of IRIS data

Code:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
# Load the dataset from the uploaded CSV file
file_path = 'data.csv' # Update the file path if necessary
iris_df = pd.read_csv(file_path)
# Check for non-numeric columns and display the DataFrame info
print(iris_df.info())
# Convert categorical columns to numeric using one-hot encoding
iris_df = pd.get_dummies(iris_df, drop_first=True)
# Split dataset into features (X) and target (y)
X = iris_df.drop('Placed', axis=1) # Assuming 'Placed' is the target variable
y = iris_df['Placed'] # Target variable
# Step 1: Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Step 2: Create and train the SVM classifier with an RBF kernel
svm_classifier_rbf = SVC(kernel='rbf', random_state=42)
svm_classifier_rbf.fit(X_train, y_train)
# Step 3: Make predictions
y_pred = svm_classifier_rbf.predict(X_test)
# Step 4: Generate classification report, confusion matrix, and accuracy score
classification_rep = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
# Print the results
print("Classification Report:\n", classification_rep)
print("Confusion Matrix:\n", conf_matrix)
print("Accuracy:", accuracy)
# Step 5: Plotting the decision boundary and scatter plot
# For visualization, we will use only the first two features
X_vis = X.iloc[:, :2]  # Using the first two features for 2D visualization
# Train the SVM model on the reduced dataset
svm_classifier_vis = SVC(kernel='rbf', random_state=42)
svm_classifier_vis.fit(X_vis, y)
# Create a mesh grid for plotting decision boundaries
h = .02  # step size in the mesh
x_min, x_max = X_vis.iloc[:, 0].min() - 1, X_vis.iloc[:, 0].max() + 1
y_min, y_max = X_vis.iloc[:, 1].min() - 1, X_vis.iloc[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Plot the decision boundary
Z = svm_classifier_vis.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.figure(figsize=(12, 6))
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X_vis.iloc[:, 0], X_vis.iloc[:, 1], c=y, edgecolors='k', marker='o')
plt.title('SVM Classifier with RBF Kernel')

plt.xlabel(X.columns[0]) # Label for the first feature


plt.ylabel(X.columns[1]) # Label for the second feature
plt.show()
OUTPUT:

Conclusion:

Question:
1. What are some applications of SVM?
2. List some advantages of SVM.
3. What is a hyperplane in SVM?

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 10

Principal Component Analysis Algorithm


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Study dimensionality reduction concept and its role in machine learning
techniques.
Objective:
1. Implement dimensionality reduction using PCA.

2. Learn standard scalar and plot scaled output.

Code:

# Import necessary libraries

import pandas as pd # to load the dataframe

from sklearn.preprocessing import StandardScaler # to standardize the features

from sklearn.decomposition import PCA # to apply PCA

import seaborn as sns # to plot the heat maps

import matplotlib.pyplot as plt # to show the plots

# Load your dataset

file_path = 'data.csv' # Update with your actual file path

data_df = pd.read_csv(file_path)

# Display the head (first 5 rows) of the dataset

print(data_df.head())

# Select features for PCA (exclude non-numeric or target columns)

features = ['CGPA', 'IQ', 'Resume_Score', 'Internship'] # Features for PCA

X = data_df[features] # Feature matrix

# Standardize the features


scaler = StandardScaler()

scaled_data = pd.DataFrame(scaler.fit_transform(X), columns=features) # Scaling the data

# Check the correlation between features without PCA

plt.figure(figsize=(10, 6))

sns.heatmap(scaled_data.corr(), annot=True, cmap='coolwarm', fmt=".2f")

plt.title('Correlation Heatmap of Features Before PCA')

plt.show()

# Applying PCA

pca = PCA(n_components=3) # Taking number of Principal Components as 3

data_pca = pca.fit_transform(scaled_data)

data_pca = pd.DataFrame(data_pca, columns=['PC1', 'PC2', 'PC3'])

# Checking correlation between features after PCA

plt.figure(figsize=(10, 6))

sns.heatmap(data_pca.corr(), annot=True, cmap='coolwarm', fmt=".2f")

plt.title('Correlation Heatmap of Principal Components')

plt.show()

# Display the first few rows of the PCA-transformed data

print(data_pca.head())
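As a short optional check (using the pca object fitted above), the fraction of variance each principal component retains can be printed:

# Fraction of total variance captured by each principal component
print(pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())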

OUTPUT:


Conclusion:

Question:
1. What is dimensionality reduction?
2. What is PCA? What does PCA do?
3. What are the advantages of dimensionality reduction?
4. List the steps of a PCA algorithm.


EXPERIMENT NO: 11

Random Forest Algorithm


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Study dimensionality reduction concept and its role in machine learning
techniques.
Objective:
1. Implement random forest algorithm using python.

2. Visualize important feature on bar plot.

3. Find accuracy of model.

Code:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
# Load the dataset
file_path = 'data.csv' # Update with your actual file path
data = pd.read_csv(file_path)
# Display the first few rows of the dataset
print(data.head())
# Define features and target variable
X = data[['CGPA', 'IQ', 'Resume_Score', 'Internship']] # Features
y = data['Placed'] # Target variable
# Split dataset into training set and test set (70% training and 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


# Import Random Forest Model


# Create a Random Forest Classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model using the training sets
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
# Feature Importance
feature_imp = pd.Series(clf.feature_importances_,
                        index=X.columns).sort_values(ascending=False)
# Creating a bar plot to visualize feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x=feature_imp, y=feature_imp.index)
# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Visualizing Important
Features") plt.show()
OUTPUT:


Conclusion:

Question:
1. What do you mean by the random forest algorithm?
2. What do you mean by bagging?
3. What does 'random' refer to in 'Random Forest'?
4. List down the advantages and disadvantages of the Random Forest algorithm.

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 12

K-Means Clustering
DATE:

Competency and Practical Skills: Python, scikit-learn, machine learning


Relevant CO: Realize concepts of advanced machine learning algorithms.
Objective:
1. Implement K-Means clustering using Python.

2. Find the frequency distribution of the outcome classes.

3. Visualize correlation using a heat map.

Code:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler
# Reading the dataset
file_path = 'data.csv' # Update the file path to your dataset
data = pd.read_csv(file_path)
# Display information about the dataset
print(data.info())
print(data.head(10))
# Frequency distribution of 'Placed' (assuming 'Placed' is the categorical outcome)
outcome_distribution = pd.crosstab(index=data["Placed"], columns="count")
print(outcome_distribution)
# If you want to separate the data based on 'Placed', do it like this


placed = data[data["Placed"] == 1]
not_placed = data[data["Placed"] == 0]
# Drop non-numeric columns for correlation analysis
data_numeric = data.select_dtypes(include=[np.number]) # Select only numeric columns
# Check if the numeric data is empty
if data_numeric.empty:
    print("No numeric columns available for correlation analysis.")
else:
    # Visualize correlation using a heatmap
    plt.figure(figsize=(10, 6))
    sns.heatmap(data_numeric.corr(), cmap='Blues', annot=True)
    plt.title('Correlation Heatmap')
    plt.show()
# Prepare data for clustering (if desired)
# Selecting only the relevant features for clustering
x = data[['CGPA', 'IQ', 'Resume_Score', 'Internship']].values
# Normalize the features
scaler = MinMaxScaler()
x_scaled = scaler.fit_transform(x)
# KMeans clustering
kmeans = KMeans(n_clusters=2, random_state=42) # Set n_clusters based on your data
kmeans.fit(x_scaled)
data['Cluster'] = kmeans.labels_
# Calculate silhouette score
silhouette_avg = silhouette_score(x_scaled, kmeans.labels_)
print("Silhouette Score:", silhouette_avg)
# Plotting the clusters
plt.figure(figsize=(10, 6))
plt.scatter(x_scaled[:, 0], x_scaled[:, 1], c=data['Cluster'], cmap='viridis', marker='o')
plt.title('Clusters of Students')


plt.xlabel('Feature 1 (CGPA)')
plt.ylabel('Feature 2 (IQ)')
plt.colorbar(label='Cluster')
plt.grid()
plt.show()
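The code above fixes n_clusters=2. A common way to justify the choice of k is the elbow method: fit K-Means for a range of k values and plot the inertia (within-cluster sum of squared distances), looking for the point where the curve flattens. The sketch below is an optional addition that reuses x_scaled from above:

# Sketch: elbow method for choosing the number of clusters
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(x_scaled)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances
plt.figure(figsize=(8, 5))
plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k')
plt.grid()
plt.show()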
OUTPUT:


Conclusion:

Question:
1. Explain the K-means clustering algorithm.
2. Why do you prefer Euclidean distance over Manhattan distance in the K-means algorithm?
3. List out the advantages and disadvantages of K-means clustering.
4. What is the difference between K-means clustering and the KNN algorithm?

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 13

Naïve Bayes Algorithm


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Realize concepts of advanced machine learning algorithms.
Objective:
1. Implement the Naive Bayes algorithm.

2. Find the accuracy of the model.

3. Visualize the confusion matrix using a heat map.

Code:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn.metrics import confusion_matrix
# Load the dataset
file_path = 'placed.csv' # Update the file path to your dataset
data = pd.read_csv(file_path)
# Display the first few rows of the dataset
print(data.head())
# Assuming the target variable is 'Placed'
X = data.drop('Placed', axis=1) # Features (excluding the target variable)
y = data['Placed'] # Target variable
# Splitting X and y into training and testing sets (60% train, 40% test)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=1)


# Training the Gaussian Naive Bayes model on the training set


classifier = GaussianNB()
classifier.fit(x_train, y_train)
# Making predictions on the testing set
y_pred = classifier.predict(x_test)
# Comparing actual response values (y_test) with predicted response values (y_pred)
print("Gaussian Naive Bayes model accuracy (in %):", metrics.accuracy_score(y_test, y_pred)
* 100)
# Making the Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
# Plotting the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
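Because Naive Bayes is a probabilistic classifier, it can report posterior class probabilities rather than only hard labels. A minimal sketch (reusing the trained classifier and x_test from above):

# Sketch: GaussianNB exposes posterior class probabilities via predict_proba
proba = classifier.predict_proba(x_test)
print("Posterior probabilities for the first 5 test samples:")
print(proba[:5])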
OUTPUT:

Conclusion:

Question:
1. What is Naive Bayes? Why is it called the "Naive" Bayes algorithm?
2. List out the important characteristics of Naive Bayes.
3. What are the main types of Naive Bayes classifiers?
4. List out the advantages, disadvantages and limitations of the Naive Bayes algorithm.
Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 14

Activation Function
DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.
Objectives:

1. To implement activation functions.

2. Visualize the working of activation functions.


CODE:
import numpy as np
import matplotlib.pyplot as plt
# Define activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def relu(x):
    return np.maximum(0, x)
def tanh(x):
    return np.tanh(x)
# Generate input values
x = np.linspace(-10, 10, 400)
# Calculate activation function outputs
sigmoid_output = sigmoid(x)
relu_output = relu(x)
tanh_output = tanh(x)
# Plotting the activation functions
plt.figure(figsize=(15, 10))
# Sigmoid


plt.subplot(3, 1, 1)
plt.plot(x, sigmoid_output, label='Sigmoid', color='blue')
plt.title('Sigmoid Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid()
plt.axhline(0, color='black', lw=0.5, ls='--')
plt.axvline(0, color='black', lw=0.5, ls='--')
plt.legend()
# ReLU
plt.subplot(3, 1, 2)
plt.plot(x, relu_output, label='ReLU', color='orange')
plt.title('ReLU Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid()
plt.axhline(0, color='black', lw=0.5, ls='--')
plt.axvline(0, color='black', lw=0.5, ls='--')
plt.legend()
# Tanh
plt.subplot(3, 1, 3)
plt.plot(x, tanh_output, label='Tanh', color='green')
plt.title('Tanh Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid()
plt.axhline(0, color='black', lw=0.5, ls='--')
plt.axvline(0, color='black', lw=0.5, ls='--')
plt.legend()
plt.tight_layout()


plt.show()
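Backpropagation uses the derivatives of these activation functions, so plotting the derivatives alongside the functions is instructive. The sketch below is an optional addition that reuses x and sigmoid from above; the derivative formulas are the standard closed forms:

# Sketch: derivatives of the three activation functions (what backpropagation uses)
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)
def tanh_derivative(x):
    return 1 - np.tanh(x) ** 2
def relu_derivative(x):
    return np.where(x > 0, 1.0, 0.0)  # undefined at exactly 0; 0 is used by convention
plt.figure(figsize=(10, 5))
plt.plot(x, sigmoid_derivative(x), label="Sigmoid'", color='blue')
plt.plot(x, tanh_derivative(x), label="Tanh'", color='green')
plt.plot(x, relu_derivative(x), label="ReLU'", color='orange')
plt.title('Derivatives of the Activation Functions')
plt.xlabel('Input')
plt.ylabel('Derivative')
plt.grid()
plt.legend()
plt.show()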
OUTPUT:

Conclusion:

Question:
1. What is the role of the activation functions in Neural Networks?
2. List down the names of some popular activation function in neural networks.
3. What is the difference between forward propagation and backward propagation in neural
networks?
4. Why is ReLU the most commonly used activation function?

Rubrics:

Rubrics 1 2 3 4 5


EXPERIMENT NO: 15

Convolutional Neural Network (CNN)


DATE:
Competency and Practical Skills: Python, scikit-learn, machine learning
Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.
Objectives:

1. To implement a Convolutional Neural Network.

2. Find the total number of parameters of the Convolutional Neural Network.


CODE:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = np.expand_dims(x_train, axis=-1) / 255.0
x_test = np.expand_dims(x_test, axis=-1) / 255.0
# Define the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model


model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, batch_size=64)
# Evaluate the model and print total parameters
test_loss, test_acc = model.evaluate(x_test, y_test)
total_params = model.count_params()
print(f"Test accuracy: {test_acc:.4f}")
print(f"Total parameters in the CNN: {total_params}")
OUTPUT:


Conclusion:

Question:
1. What do you mean by a Convolutional Neural Network?
2. List out the different layers in a CNN.
3. Briefly explain the two major steps of a CNN: (1) feature learning and (2) classification.
4. Explain the significance of "parameter sharing" and "sparsity of connections" in CNN.
Rubrics:

Rubrics 1 2 3 4 5
