CSIT III-II (R22A6681) Machine Learning Lab Manual (2024-25)

The document is a lab manual for a Machine Learning course at Malla Reddy College of Engineering and Technology for B.Tech students. It outlines the vision, mission, program educational objectives, specific outcomes, and general laboratory instructions, along with a detailed week-by-week schedule of practical exercises involving Python libraries and machine learning algorithms. The manual aims to equip students with essential skills in machine learning and ethical practices in technology.
DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

MACHINE LEARNING

LAB MANUAL
(R22A6681)

B.TECH
(III YEAR – II SEM)

(2024-25)

Prepared By:
P.HARI KRISHNA

DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY

(Autonomous Institution – UGC, Govt. of India)
Recognized under 2(f) and 12(B) of UGC ACT 1956
(Affiliated to JNTUH, Hyderabad, Approved by AICTE - Accredited by NBA & NAAC – ‘A’ Grade - ISO 9001:2008 Certified)
Maisammaguda, Dhulapally (Post Via. Hakimpet), Secunderabad – 500100, Telangana State


Vision
* To achieve high quality in technical education that provides the skills and attitude to adapt to the
global needs of the Information Technology sector, through academic and research excellence.

Mission
* To equip the students with the cognizance for problem solving and to improve the
teaching-learning pedagogy by using innovative techniques.

* To strengthen the knowledge base of the faculty and students with motivation towards
possession of effective academic skills and relevant research experience.

* To promote the necessary moral and ethical values among the engineers, for the
betterment of the society.

PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

PEO1 – ANALYTICAL SKILLS

* To facilitate the graduates with the ability to visualize, gather information, articulate, analyze,
solve complex problems, and make decisions. These abilities are essential to address the challenges
of complex and computation-intensive problems, increasing their productivity.

PEO2 – TECHNICAL SKILLS

* To facilitate the graduates with the technical skills that prepare them for immediate employment
and for pursuing certification, providing a deeper understanding of the technology in advanced areas
of computer science and related fields, thus encouraging them to pursue higher education and research
based on their interest.

PEO3 – SOFT SKILLS

* To facilitate the graduates with the soft skills that include fulfilling the mission, setting goals,
showing self-confidence by communicating effectively, having a positive attitude, getting involved in
teamwork, being a leader, and managing their career and their life.

PEO4 – PROFESSIONAL ETHICS

* To facilitate the graduates with the knowledge of professional and ethical responsibilities by
paying attention to grooming, being conservative with style, following dress codes and safety codes,
and adapting themselves to technological advancements.


PROGRAM SPECIFIC OUTCOMES (PSOs)

After the completion of the course, B.Tech Computer Science & Information Technology, the graduates will
have the following Program Specific Outcomes:

1. Fundamentals and critical knowledge of the Computer System: Able to understand
the working principles of the computer system and its components, and apply that knowledge
to build, assess, and analyze its software and hardware aspects.

2. The comprehensive and applicative knowledge of Software Development:
Comprehensive skills in programming languages, software process models, and
methodologies, and the ability to plan, develop, test, analyze, and manage software- and
hardware-intensive systems on heterogeneous platforms, individually or working in teams.

3. Applications of Computing Domain & Research: Able to use the professional,
managerial, and interdisciplinary skill set and domain-specific tools in development
processes, identify the research gaps, and provide innovative solutions to them.


PROGRAM OUTCOMES (POs)

Engineering Graduates should possess the following:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
3. Design / development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

MALLA REDDY COLLEGE OF ENGINEERING & TECHNOLOGY


Maisammaguda, Dhulapally Post, Via Hakimpet, Secunderabad – 500100

DEPARTMENT OF COMPUTER SCIENCE & INFORMATION TECHNOLOGY

GENERAL LABORATORY INSTRUCTIONS

1. Students are advised to come to the laboratory at least 5 minutes before the starting
time; those who arrive more than 5 minutes after the starting time will not be allowed into the lab.
2. Plan your task properly well before the commencement, and come prepared to the lab
with the synopsis / program / experiment details.
3. Students should enter the laboratory with:
a. Laboratory observation notes with all the details (Problem
statement, Aim, Algorithm, Procedure, Program, Expected Output, etc.) filled in
for the lab session.
b. Laboratory Record updated up to the last session's experiments, and other utensils (if any)
needed in the lab.
c. Proper dress code and identity card.
4. Sign in the laboratory login register, write the TIME-IN, and occupy the computer system
allotted to you by the faculty.
5. Execute your task in the laboratory, record the results / output in the lab observation
notebook, and get it certified by the concerned faculty.
6. All students should be polite and cooperative with the laboratory staff, and must maintain
discipline and decency in the laboratory.
7. Computer labs are established with sophisticated and high-end branded systems, which should
be utilized properly.
8. Students / Faculty must keep their mobile phones in SWITCHED OFF mode during the lab
sessions. Misuse of the equipment, misbehavior with the staff and systems, etc., will attract
severe punishment.
9. Students must take the permission of the faculty in case of any urgency to go out; anybody
found loitering outside the lab / class without permission during working hours will be treated
seriously and punished appropriately.

10. Students should LOG OFF / SHUT DOWN the computer system before leaving the lab
after completing the task (experiment) in all aspects, and must ensure that the system / seat is
kept properly.

HEAD OF THE DEPARTMENT PRINCIPAL

MALLA REDDY COLLEGE OF ENGINEERING AND TECHNOLOGY


III Year B.TECH-II-SEM L/T/P/C
-/0/2/1
(R22A6681) MACHINE LEARNING LAB
Lab Objectives:

* To introduce the basic concepts and techniques of Machine Learning and the
need for Machine Learning techniques in real-world problems.
* To provide an understanding of various Machine Learning algorithms and the way
to evaluate their performance.
* To apply Machine Learning to learn, predict and classify real-world problems
in the Supervised Learning paradigm, as well as to discover the Unsupervised
Learning paradigms of Machine Learning.
* To inculcate in students a professional and ethical attitude, a multidisciplinary
approach, and an ability to relate to real-world issues and provide cost-effective
solutions to them by developing ML applications.

Week-1: Implementation of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
a) Usage of methods such as floor(), ceil(), sqrt(), isqrt(), gcd() etc.
b) Usage of attributes of array such as ndim, shape, size, methods such as sum(), mean(), sort(),
sin() etc.
c) Usage of methods such as det(), eig() etc.
d) Consider a list datatype (1D), then reshape it into 2D and 3D matrices using numpy
e) Generate random matrices using numpy
f) Find the determinant of a matrix using scipy
g) Find eigen value and eigen vector of a matrix using scipy

Week 2: Implementation of Python Libraries for ML application such as Pandas and Matplotlib.

a) Create a Series using pandas and display
b) Access the index and the values of our Series
c) Compare an array using Numpy with a series using pandas
d) Define Series objects with individual indices
e) Access single value of a series
f) Load datasets in a Dataframe variable using pandas
g) Usage of different methods in Matplotlib.

Week 3: a) Creation and Loading different types of datasets in Python using the required libraries.
i.Creation using pandas
ii. Loading CSV dataset files using Pandas
iii. Loading datasets using sklearn
b) Write a python program to compute Mean, Median, Mode, Variance, Standard Deviation
using Datasets
c) Demonstrate various data pre-processing techniques for a given dataset.
Write a python program to compute
i. Reshaping the data
ii. Filtering the data
iii. Merging the data
iv. Handling the missing values in datasets
v. Feature Normalization: Min-max normalization

Week 4: Implement Dimensionality reduction using the Principal Component Analysis (PCA) method
on a dataset (for example, Iris).

Week 5: Write a program to demonstrate the working of the decision tree based ID3 algorithm by
considering a dataset.

Week 6: Consider a dataset, use Random Forest to predict the output class.
Vary the number of trees as follows and compare the results: i. 20 ii. 50 iii. 100 iv. 200 v. 500

Week 7: Write a Python program to implement Simple Linear Regression and plot the graph.

Week 8: Write a Python program to implement Logistic Regression for iris using sklearn and
plot confusion matrix

Week 9: Build KNN Classification model for a given dataset. Vary the number of k values as follows
and compare the results: i. 1 ii. 3 iii. 5 iv. 7 v. 11

Week 10: Implement Support Vector Machine for a dataset and compare the accuracy by applying
the following kernel functions: i. Linear ii. Polynomial iii. RBF

Week 11: Write a python program to implement K-Means clustering Algorithm.
Vary the number of k values as follows and compare the results: i. 1 ii. 3 iii. 5

Machine Learning Lab

Table of Contents
S. No. Name of the Program

1. Implementation of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
a) Usage of methods such as floor(), ceil(), sqrt(), isqrt(), gcd() etc.
b) Usage of attributes of array such as ndim, shape, size, methods such as sum(), mean(), sort(), sin() etc.
c) Usage of methods such as det(), eig() etc.
d) Consider a list datatype (1D), then reshape it into 2D and 3D matrices using numpy
e) Generate random matrices using numpy
f) Find the determinant of a matrix using scipy
g) Find eigen value and eigen vector of a matrix using scipy

2. Implementation of Python Libraries for ML application such as Pandas and Matplotlib.
a) Create a Series using pandas and display
b) Access the index and the values of our Series
c) Compare an array using Numpy with a series using pandas
d) Define Series objects with individual indices
e) Access single value of a series
f) Load datasets in a Dataframe variable using pandas
g) Usage of different methods in Matplotlib.

3. a) Creation and Loading different types of datasets in Python using the required libraries.
i. Creation using pandas
ii. Loading CSV dataset files using Pandas
iii. Loading datasets using sklearn
b) Write a python program to compute Mean, Median, Mode, Variance, Standard Deviation using Datasets
c) Demonstrate various data pre-processing techniques for a given dataset.
Write a python program to compute
i. Reshaping the data
ii. Filtering the data
iii. Merging the data
iv. Handling the missing values in datasets
v. Feature Normalization: Min-max normalization

4. Implement Dimensionality reduction using Principal Component Analysis method on a dataset (Iris)

5. Write a program to demonstrate the working of the decision tree based ID3 algorithm by considering a dataset.

6. Consider a dataset, use Random Forest to predict the output class. Vary the number of trees as follows and compare the results:
i. 20 ii. 50 iii. 100 iv. 200 v. 500

7. Write a Python program to implement Simple Linear Regression and plot the graph.

8. Write a Python program to implement Logistic Regression for iris using sklearn and plot the confusion matrix.

9. Build KNN Classification model for a given dataset. Vary the number of k values as follows and compare the results:
i. 1 ii. 3 iii. 5 iv. 7 v. 11

10. Implement Support Vector Machine for a dataset and compare the accuracy by applying the following kernel functions:
i. Linear ii. Polynomial iii. RBF

11. Write a python program to implement K-Means clustering Algorithm. Vary the number of k values as follows and compare the results:
i. 1 ii. 3 iii. 5

MACHINE LEARNING LAB MANUAL 2024-2025

Week 1:
a) Implementation of Python Basic Libraries such as Math, Numpy and Scipy
Theory/Description:

* Python Libraries
There are a lot of reasons why Python is popular among developers, and one of them is that it
has an amazingly large collection of libraries that users can work with. In this section,
we will discuss the Python Standard Library and different libraries offered by the Python
programming language: scipy, numpy, etc.
We know that a module is a file with some Python code, and a package is a directory for sub
packages and modules. A Python library is a reusable chunk of code that you may want to
include in your programs / projects. Here, a 'library' loosely describes a collection of core
modules. Essentially, then, a library is a collection of modules. A package is a library that can
be installed using a package manager like pip.

* Python Standard Library

The Python Standard Library is a collection of modules accessible to a Python program that
simplify the programming process and remove the need to rewrite commonly used
commands. They can be used by importing them at the beginning of a script. Some of
the most important Standard Library modules are:
◻ time
◻ sys
◻ csv
◻ math
◻ random
◻ pip (the package installer, distributed with Python rather than a Standard Library module)
◻ os
◻ statistics
◻ tkinter
◻ socket

To display a list of all available modules, use the following command in the Python console:
>>> help('modules')
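
As a minimal sketch of how Standard Library modules are imported at the top of a script and then used, the example below pulls in three of the modules listed above. The variable names and sample values are illustrative assumptions, not part of the lab programs.

```python
# Standard Library modules are imported at the beginning of a script.
import math
import random
import statistics

values = [2, 4, 4, 5, 10]  # illustrative sample data (assumption)

root = math.sqrt(16)                # square root from the math module
average = statistics.mean(values)   # arithmetic mean from the statistics module

random.seed(0)                      # fix the seed so the run is repeatable
pick = random.choice(values)        # pick one element at random

print("sqrt(16):", root)
print("mean:", average)
print("random pick:", pick)
```

No installation step is needed for these modules because they ship with Python itself.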
* List of important Python Libraries
  o Python Libraries for Data Collection
    * Beautiful Soup
    * Scrapy
    * Selenium
  o Python Libraries for Data Cleaning and Manipulation
    * Pandas
    * PyOD
    * NumPy
    * Scipy
    * Spacy
  o Python Libraries for Data Visualization
    * Matplotlib
    * Seaborn
    * Bokeh
  o Python Libraries for Modeling
    * Scikit-learn
    * TensorFlow
    * PyTorch

Implementation of Python Basic Libraries such as Math, Numpy and Scipy

* Python Math Library

The math module is a standard module in Python and is always available. To use the
mathematical functions in this module, you have to import it using import
math. It gives access to the underlying C library functions. This module does not support
complex data types; the cmath module is the complex counterpart.

List of Functions in Python Math Module

Function          Description
ceil(x)           Returns the smallest integer greater than or equal to x
copysign(x, y)    Returns x with the sign of y
fabs(x)           Returns the absolute value of x
factorial(x)      Returns the factorial of x
floor(x)          Returns the largest integer less than or equal to x
fmod(x, y)        Returns the remainder when x is divided by y
frexp(x)          Returns the mantissa and exponent of x as the pair (m, e)
fsum(iterable)    Returns an accurate floating point sum of values in the iterable
isfinite(x)       Returns True if x is neither an infinity nor a NaN (Not a Number)
isinf(x)          Returns True if x is a positive or negative infinity
isnan(x)          Returns True if x is a NaN
ldexp(x, i)       Returns x * (2**i)
modf(x)           Returns the fractional and integer parts of x
trunc(x)          Returns the truncated integer value of x
exp(x)            Returns e**x
expm1(x)          Returns e**x - 1
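
A short sketch of several math module functions, including the floor(), ceil(), sqrt(), isqrt() and gcd() methods named in the Week 1 task list. The input values are illustrative assumptions; isqrt() requires Python 3.8 or later.

```python
import math

x = 7.25  # illustrative value (assumption)

floor_x = math.floor(x)        # largest integer <= x
ceil_x = math.ceil(x)          # smallest integer >= x
root = math.sqrt(49)           # floating-point square root
int_root = math.isqrt(50)      # integer square root, rounded down
divisor = math.gcd(48, 18)     # greatest common divisor
frac, whole = math.modf(x)     # fractional and integer parts, both as floats
total = math.fsum([0.1] * 10)  # accurate floating-point sum

print(floor_x, ceil_x, root, int_root, divisor)
print(frac, whole, total)
```

Note that fsum() avoids the rounding error that the built-in sum() accumulates on the same list of 0.1 values.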

Program-1

Program-2

Program-3

Program-4

Program-5

Program-6

* Python Numpy Library

NumPy is an open source library available in Python that aids in mathematical, scientific,
engineering, and data science programming. It is well suited to mathematical and
statistical operations, and works well for multi-dimensional arrays and matrix
multiplication.

For any scientific project, NumPy is the tool to know. It has been built to work with
N-dimensional arrays, linear algebra, random numbers, Fourier transforms, etc. It can be
integrated with C/C++ and Fortran.

NumPy is a library (not a programming language) that deals with multi-dimensional arrays
and matrices. On top of the arrays and matrices, NumPy supports a large number of
mathematical operations.

NumPy is memory efficient, meaning it can handle large amounts of data more easily
than plain Python lists. Besides, NumPy is very convenient to work with, especially for matrix
multiplication and reshaping. On top of that, NumPy is fast. In fact, Scikit-learn is built on
NumPy arrays, and TensorFlow accepts them as input.

* Arrays in NumPy: NumPy's main object is the homogeneous multidimensional array.

◻ It is a table of elements (usually numbers), all of the same type, indexed by a tuple of
non-negative integers.
◻ In NumPy, dimensions are called axes. The number of axes is the rank.
◻ NumPy's array class is called ndarray. It is also known by the alias array.

We use a NumPy array instead of a Python list for the following three reasons:
1. Less Memory
2. Fast
3. Convenient

* Numpy Functions

Numpy arrays carry attributes around with them. The most important ones are:
ndim: The number of axes or rank of the array
shape: A tuple containing the length in each dimension
size: The total number of elements
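
The three attributes can be inspected on a small array as in the sketch below. The array contents are illustrative assumptions.

```python
import numpy as np

# A 2 x 3 array built from a nested list (illustrative values).
x = np.array([[1, 2, 3],
              [4, 5, 6]])

print("ndim:", x.ndim)    # number of axes
print("shape:", x.shape)  # length along each axis
print("size:", x.size)    # total number of elements
```

These are attributes, not methods, so they are read without parentheses.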

Program-1

NumPy arrays can be indexed just like Python lists:
x[1] will access the second element
x[-1] will access the last element

Program-2

Arithmetic operations apply element-wise.

* Built-in Methods

Many standard numerical functions are available as methods out of the box:
Program-3
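
A minimal sketch of the three points above: list-style indexing, element-wise arithmetic, and built-in numerical methods. The array values and variable names are illustrative assumptions.

```python
import numpy as np

x = np.array([4.0, 1.0, 3.0, 2.0])  # illustrative values (assumption)

second = x[1]     # second element, as with a Python list
last = x[-1]      # last element

doubled = x * 2   # arithmetic is applied element by element
shifted = x + 10

total = x.sum()       # sum of all elements
average = x.mean()    # arithmetic mean
ordered = np.sort(x)  # sorted copy; x.sort() would sort in place instead
sines = np.sin(x)     # sin() is a NumPy function applied element-wise

print(second, last)
print(doubled, shifted)
print(total, average, ordered, sines)
```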

* Python Scipy Library

SciPy is an open source Python-based library, which is used in mathematics,
scientific computing, engineering, and technical computing. SciPy is pronounced
"Sigh Pie".

◻ SciPy contains a variety of sub-packages which help to solve the most common issues
related to scientific computation.
◻ SciPy is among the most widely used scientific libraries, alongside the GNU Scientific
Library for C/C++ and Matlab's libraries.
◻ It is easy to use and understand, and offers fast computation.
◻ It can operate on arrays from the NumPy library.

Numpy vs SciPy

Numpy:
1. Numpy is written in C and is used for mathematical or numeric calculation.
2. It is faster than equivalent code written with plain Python lists and loops.
3. Numpy is the most useful library for Data Science to perform basic calculations.
4. Numpy centres on the array data type, which supports basic operations like
sorting, shaping, indexing, etc.

SciPy:
1. SciPy is built on top of NumPy.
2. SciPy provides a fuller-featured linear algebra module, while Numpy contains only a few linear algebra features.
3. Most new Data Science features are available in Scipy rather than Numpy.

Linear Algebra with SciPy

1. The linear algebra module of SciPy is built on the BLAS and LAPACK libraries (for example, their ATLAS implementation).
2. Its linear algebra routines are fast because they call these optimized BLAS and LAPACK implementations.

Linear algebra routines accept two-dimensional array objects, and matrix-valued results are also
returned as two-dimensional arrays.
Now let's do some tests with scipy.linalg,
calculating the determinant of a two-dimensional matrix:
Program-1
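
A minimal sketch of a determinant calculation with scipy.linalg.det(). The matrix values are illustrative assumptions chosen so the result is easy to verify by hand.

```python
import numpy as np
from scipy import linalg

# A 2 x 2 square matrix (illustrative values).
A = np.array([[4, 5],
              [3, 2]])

# For a 2 x 2 matrix the determinant is ad - bc, here 4*2 - 5*3 = -7.
det_A = linalg.det(A)
print("Determinant:", det_A)
```

The result is a floating-point number, so it may differ from the hand-computed value by a tiny rounding error.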

Eigenvalues and Eigenvectors – scipy.linalg.eig()

◻ One of the most common problems in linear algebra is finding eigenvalues and eigenvectors,
which can be easily solved using the eig() function.
◻ Now let us find the eigenvalues of a two-dimensional square matrix (X) and the
corresponding eigenvectors.

Program-2
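
A minimal sketch of scipy.linalg.eig() on a small symmetric matrix. The matrix is an illustrative assumption; its eigenvalues, 1 and 3, follow from solving (2 - λ)² - 1 = 0.

```python
import numpy as np
from scipy import linalg

# A 2 x 2 symmetric matrix whose eigenvalues are 1 and 3 (illustrative values).
X = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig() returns the eigenvalues (as complex numbers) and a matrix whose
# columns are the corresponding eigenvectors.
eigenvalues, eigenvectors = linalg.eig(X)

print("Eigenvalues:", eigenvalues)
print("Eigenvectors (one per column):")
print(eigenvectors)

# Check the defining property X v = lambda v for each eigenpair.
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    print(np.allclose(X @ v, eigenvalues[k] * v))
```

The order in which the eigenvalues are returned is not guaranteed, so sort them before comparing against expected values.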

Exercise programs:
1. Consider a list datatype, then reshape it into 2D and 3D matrices using numpy
2. Generate random matrices using numpy
3. Find the determinant of a matrix using scipy
4. Find eigenvalue and eigenvector of a matrix using scipy
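
One possible starting point for exercises 1 and 2 is sketched below. The list length, target shapes, seed and value ranges are all illustrative assumptions; any shapes whose sizes multiply to the number of elements would work.

```python
import numpy as np

data = list(range(1, 13))   # a 1D Python list with 12 elements
arr = np.array(data)

matrix_2d = arr.reshape(3, 4)      # 3 rows x 4 columns
matrix_3d = arr.reshape(2, 3, 2)   # 2 blocks, each 3 rows x 2 columns

rng = np.random.default_rng(seed=42)          # seeded generator for repeatable output
uniform = rng.random((2, 3))                  # floats drawn from [0, 1)
integers = rng.integers(0, 10, size=(3, 3))   # integers from 0 to 9

print(matrix_2d)
print(matrix_3d)
print(uniform)
print(integers)
```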

Week 2:
Implementation of Python Libraries for ML application such as Pandas and Matplotlib.

* Pandas Library

The two primary components of pandas are the Series and the DataFrame.
A Series is essentially a column, and a DataFrame is a two-dimensional table made
up of a collection of Series.
DataFrames and Series are quite similar in that many operations that you can do
with one you can do with the other, such as filling in null values and calculating
the mean.

◻ Reading data from CSVs

With CSV files all you need is a single line to load in the data:
df = pd.read_csv('purchases.csv')
df

Let's load in the IMDB movies dataset to begin:
movies_df = pd.read_csv("IMDB-Movie-Data.csv", index_col="Title")
We're loading this dataset from a CSV and designating the movie titles to be our index.

◻ Viewing your data


The first thing to do when opening a new dataset is print out a few rows to keep as
a visual reference. We accomplish this with .head():
movies_df.head()

Another fast and useful attribute is .shape, which outputs just a tuple of (rows, columns):
movies_df.shape
Note that .shape has no parentheses and is a simple tuple of format (rows, columns). So
we have 1000 rows and 11 columns in our movies DataFrame.
You'll be going to .shape a lot when cleaning and transforming data. For example, you
might filter some rows based on some criteria and then want to know quickly how many
rows were removed.
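
The loading and inspection steps can be tried without the IMDB file, as in the sketch below. The three-row CSV held in a string is made-up stand-in data (an assumption), so the shapes differ from the 1000 x 11 dataset described above.

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a file on disk (made-up rows).
csv_text = """Title,Year,Rating
Movie A,2014,8.1
Movie B,2012,7.0
Movie C,2016,7.3
"""

# read_csv accepts a path or a file-like object; index_col picks the index column.
movies_df = pd.read_csv(io.StringIO(csv_text), index_col="Title")

print(movies_df.head(2))   # first two rows
print(movies_df.shape)     # (rows, columns)

# Filtering changes the row count, which .shape makes easy to check.
recent = movies_df[movies_df["Year"] >= 2014]
print(recent.shape)
```

Comparing movies_df.shape with recent.shape shows how many rows the filter removed.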


Program-1

We haven't defined an index in our example, but we see two columns in our output: The right column
contains our data, whereas the left column contains the index. Pandas created a default index starting with 0
going to 5, which is the length of the data minus 1.

Program-2

We can directly access the index and the values of our Series S:
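
A minimal sketch of a Series with a default index, together with its index and values attributes. The six data values are illustrative assumptions, chosen only so that the default index runs from 0 to 5.

```python
import pandas as pd

S = pd.Series([11, 28, 72, 3, 5, 8])  # illustrative values (assumption)

print(S)          # left column: index, right column: data
print(S.index)    # the default index, running from 0 to 5
print(S.values)   # the data as a NumPy array
```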

Program-3

If we compare this to creating an array in numpy, we will find lots of similarities:

So far our Series have not been very different to ndarrays of Numpy. This changes, as soon as we start
defining Series objects with individual indices:
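
The contrast between a NumPy array and a Series with individual indices can be sketched as follows. The values and fruit labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

data = [20, 33, 52, 10]
fruits = ['apples', 'oranges', 'cherries', 'pears']

X = np.array(data)                  # positions 0..3 are the only way in
S = pd.Series(data, index=fruits)   # labels chosen by us

print(X[1])           # NumPy: access by position
print(S['oranges'])   # pandas: access by label
print(S.index)
```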


Program-4

Program-5

A big advantage over NumPy arrays is obvious from the previous example: We can use arbitrary indices.
If we add two series with the same indices, we get a new series with the same index and the corresponding
values will be added:

import pandas as pd

fruits = ['apples', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits)
print(S + S2)
print("sum of S: ", sum(S))

OUTPUT:
apples 37
oranges 46
cherries 83
pears 42
dtype: int64
sum of S: 115

Program-6

The indices do not have to be the same for the Series addition. The index will be the "union" of both indices.
If an index doesn't occur in both Series, the value for that index will be NaN:

fruits = ['peaches', 'oranges', 'cherries', 'pears']
fruits2 = ['raspberries', 'oranges', 'cherries', 'pears']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits2)
print(S + S2)

OUTPUT:
cherries 83.0
oranges 46.0
peaches NaN
pears 42.0
raspberries NaN
dtype: float64

Program-7

In principle, the indices can be completely different, as in the following example. We have two indices. One is
the Turkish translation of the English fruit names:
fruits = ['apples', 'oranges', 'cherries', 'pears']
fruits_tr = ['elma', 'portakal', 'kiraz', 'armut']
S = pd.Series([20, 33, 52, 10], index=fruits)
S2 = pd.Series([17, 13, 31, 32], index=fruits_tr)
print(S + S2)

OUTPUT:
apples NaN
armut NaN
cherries NaN
elma NaN
kiraz NaN
oranges NaN
pears NaN
portakal NaN
dtype: float64

Program-8

Indexing
It's possible to access single values of a Series.

print(S['apples'])

OUTPUT:
20

* Matplotlib Library

Pyplot is a module of Matplotlib which provides simple functions to add plot
elements like lines, images, text, etc. to the current axes in the current figure.

◻ Make a simple plot

import matplotlib.pyplot as plt
import numpy as np

List of all the methods as they appear:

◻ plot(x-axis values, y-axis values): plots a simple line graph with x-axis values
against y-axis values
◻ show(): displays the graph
◻ title("string"): sets the title of the plot as specified by the string
◻ xlabel("string"): sets the label for the x-axis as specified by the string
◻ ylabel("string"): sets the label for the y-axis as specified by the string
◻ figure(): used to create a figure and control figure-level attributes
◻ subplot(nrows, ncols, index): adds a subplot to the current figure
◻ suptitle("string"): adds a common title to the figure, as specified by the string
◻ subplots(nrows, ncols, figsize): a convenient way to create subplots in a single call.
It returns a tuple of a figure and an array of axes.
◻ set_title("string"): an axes-level method used to set the title of subplots in a figure
◻ bar(categorical variables, values, color): used to create vertical bar graphs
◻ barh(categorical variables, values, color): used to create horizontal bar graphs
◻ legend(loc): used to add a legend to the graph
◻ xticks(index, categorical variables): gets or sets the current tick locations and labels
of the x-axis
◻ pie(value, categorical variables): used to create a pie chart
◻ hist(values, number of bins): used to create a histogram
◻ xlim(start value, end value): used to set the limits of the x-axis
◻ ylim(start value, end value): used to set the limits of the y-axis
◻ scatter(x-axis values, y-axis values): plots a scatter plot with x-axis values against
y-axis values
◻ axes(): adds an axes to the current figure
◻ set_xlabel("string"): axes-level method used to set the x-label of the plot, specified
as a string
◻ set_ylabel("string"): axes-level method used to set the y-label of the plot, specified
as a string
◻ scatter3D(x-axis values, y-axis values, z-axis values): plots a three-dimensional scatter
plot of the given x, y and z values
◻ plot3D(x-axis values, y-axis values, z-axis values): plots a three-dimensional line graph
of the given x, y and z values

Here we import Matplotlib's Pyplot module and the Numpy library, as most of the data that we
will be working with will be in the form of arrays.
Program-1

Program-2
We pass two arrays as our input arguments to Pyplot's plot() method and use the show() method to
display the plot. Note that the first array appears on the x-axis and the second array appears
on the y-axis of the plot. Now that our first plot is ready, let us add the title and name the x-axis and
y-axis using the methods title(), xlabel() and ylabel() respectively.
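
A minimal sketch of a first plot with a title and axis labels. The data, the label text and the output file name first_plot.png are illustrative assumptions. The Agg backend is selected only so the script also runs on a machine without a display; in an interactive session that line can be dropped and plt.show() used in place of savefig().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 2, 3, 4])   # x-axis values (illustrative)
y = x ** 2                   # y-axis values

plt.plot(x, y)               # first array on the x-axis, second on the y-axis
plt.title("Squares")
plt.xlabel("x")
plt.ylabel("x squared")
plt.savefig("first_plot.png")  # hypothetical file name; plt.show() displays it instead
```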

Program-3
We can also specify the size of the figure using the figure() method, passing the values as a tuple of
(width, height) in inches to the argument figsize.

Program-4

With every X and Y argument, you can also pass an optional third argument in the form of a string that
indicates the colour and line type of the plot. The default format is b-, which means a solid blue line. Here
we use go, which means green circles. Likewise, we can make many such combinations to format
our plot.
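
A short sketch of the figsize argument and the format string, under assumed data. The values of x and y, the figure size and the second "r--" (red dashed) line are illustrative choices, not taken from the manual's listing.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)   # hypothetical x values 1..5
y = x ** 2            # hypothetical y values

plt.figure(figsize=(8, 4))   # figure 8 inches wide and 4 inches high
plt.plot(x, y, "go")         # "go": green circles, no connecting line
plt.plot(x, 2 * x, "r--")    # "r--": red dashed line
plt.title("Format strings")
plt.show()
```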


Week 3: Creating and loading different datasets in Python

Program-1

Method-I

Program-2

Method-II:

Program-3 Uploading csv file:

Method-III:

b) Write a Python program to compute the mean, median, mode, variance and standard deviation using
datasets

Python statistics library

This module provides functions for calculating mathematical statistics of numeric (real-valued)
data. The statistics module comes with very useful functions such as mean, median,
mode, standard deviation, and variance.
The four functions we'll use in this section are common in statistics:
1. mean - average value
2. median - middle value
3. mode - most frequent value
4. standard deviation - spread of values

Averages and measures of central location

These functions calculate an average or typical value from a population or sample.
mean() Arithmetic mean ("average") of data.
harmonic_mean() Harmonic mean of data.
median() Median (middle value) of data.
median_low() Low median of data.
median_high() High median of data.
median_grouped() Median, or 50th percentile, of grouped data.
mode() Mode (most common value) of discrete data.

Measures of spread
These functions calculate a measure of how much the population or sample tends to
deviate from the typical or average values.
pstdev() Population standard deviation of data.
pvariance() Population variance of data.
stdev() Sample standard deviation of data.
variance() Sample variance of data.
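
The functions listed above can be tried on a small list of numbers. The sketch below uses only the standard-library statistics module; the data list is a hypothetical sample chosen so that the results are easy to check by hand.

```python
import statistics

# Hypothetical sample data
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)           # arithmetic mean
median_value = statistics.median(data)       # middle value
mode_value = statistics.mode(data)           # most frequent value
pop_variance = statistics.pvariance(data)    # population variance
pop_stdev = statistics.pstdev(data)          # population standard deviation
sample_variance = statistics.variance(data)  # sample variance (divides by n - 1)
sample_stdev = statistics.stdev(data)        # sample standard deviation

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Population variance:", pop_variance)
print("Population standard deviation:", pop_stdev)
print("Sample variance:", sample_variance)
print("Sample standard deviation:", sample_stdev)
```

The population functions divide by the number of values, while the sample functions divide by one less than that, so the sample variance is always slightly larger for the same data.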


Program-1

Program-2

Program-3


Program-4

Program-5


c) Write a Python program to reshape the data, filter the data, merge the data and
handle the missing values in datasets.

Assigning the data:


Filtering the data


Suppose we need only the name, gender and marks of the top-scoring students. Here
we need to remove the unwanted data.
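
A minimal sketch of this kind of filtering with pandas. The student records, the column names and the cut-off of 85 marks are hypothetical and serve only to illustrate the idea.

```python
import pandas as pd

# Hypothetical student records
students = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Kiran", "Meena", "Arjun"],
    "Gender": ["F", "M", "M", "F", "M"],
    "Age": [20, 21, 20, 22, 21],
    "Marks": [91, 67, 85, 78, 95],
})

# Keep only the columns that are required
selected = students.filter(items=["Name", "Gender", "Marks"])

# Keep only the rows of the top-scoring students (assumed cut-off: 85 marks)
top = selected[selected["Marks"] >= 85].sort_values("Marks", ascending=False)
print(top)
```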

Program-1

Program-2

Program-3


Merge data:
The merge operation combines raw data from two data frames into the desired format.
Syntax:
pd.merge(data_frame1, data_frame2, on="field")
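
The syntax above can be illustrated with two small data frames that share an ID column. The frame contents and column names are hypothetical. By default pd.merge performs an inner join, and the how argument selects other join types.

```python
import pandas as pd

# Hypothetical first data frame: student details
details = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "NAME": ["Asha", "Ravi", "Kiran", "Meena"],
})

# Hypothetical second data frame: fee details
fees = pd.DataFrame({
    "ID": [101, 102, 103, 105],
    "FEE": [5000, 6000, 5500, 7000],
})

# Inner join (default): only IDs present in both frames are kept
merged = pd.merge(details, fees, on="ID")
print(merged)

# Outer join: all IDs are kept, missing fields become NaN
outer = pd.merge(details, fees, on="ID", how="outer")
print(outer)
```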

Program-4

First type of data:


Program-5

Second type of data:

Program-6


Handling the missing values:


Program-1

Program-2

To check for null values in a pandas DataFrame, we use the isnull() function. It returns a DataFrame of
Boolean values that are True for NaN values.


Program-3

To check for non-null values in a pandas DataFrame, we use the notnull() function. It returns a DataFrame of
Boolean values that are False for NaN values.
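
A small sketch of isnull() and notnull() on a data frame with a few missing entries. The column names and scores are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with some missing values
scores = pd.DataFrame({
    "First Score": [100, 90, np.nan, 95],
    "Second Score": [30, 45, 56, np.nan],
    "Third Score": [np.nan, 40, 80, 98],
})

null_mask = scores.isnull()         # True where a value is NaN
not_null_mask = scores.notnull()    # False where a value is NaN
missing_per_column = null_mask.sum()  # number of NaN values in each column

print(null_mask)
print(not_null_mask)
print(missing_per_column)
```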

Program-4


Program-5

Program-6

Program-7

Method-I
Drop Columns with Missing Values


Program-8

Method-II
fillna() lets the user replace NaN values with a value of their own choice
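
Both methods can be sketched on one small data frame. The frame contents and the fill value 0 are assumptions for illustration: dropna(axis=1) removes every column that contains a missing value, while fillna() keeps all columns and replaces the NaN entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with missing values in columns A and C
df = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, 6],
    "C": [np.nan, np.nan, 9],
})

# Method-I: drop the columns that contain missing values
without_missing_columns = df.dropna(axis=1)
print(without_missing_columns)

# Method-II: replace every NaN with a chosen value (here 0)
filled = df.fillna(0)
print(filled)
```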

Program-9


Program-10

Filling missing values with mean
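
One possible way to do this with pandas is to pass the column means to fillna(). The data frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with one missing value in each column
df = pd.DataFrame({
    "Age": [20, np.nan, 22, 24],
    "Marks": [80.0, 70.0, np.nan, 90.0],
})

# df.mean() ignores NaN values; each NaN is replaced by the mean of its own column
column_means = df.mean(numeric_only=True)
filled = df.fillna(column_means)
print(filled)
```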

Program-11

Filling missing values in csv files:


df = pd.read_csv(r'E:\mldatasets\Machine_Learning_Data_Preprocessing_Python-master\Sample_real_estate_data.csv', na_values='NAN')


Program-12

Program-13

Code:
missing_value = ["n/a", "na", "--"]
data1 = pd.read_csv(r'E:\mldatasets\Machine_Learning_Data_Preprocessing_Python-master\Sample_real_estate_data.csv', na_values=missing_value)
df = data1
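
The effect of na_values can be checked without the file on disk. The sketch below reads a small in-memory CSV instead; its rows and column names are hypothetical stand-ins, not the contents of Sample_real_estate_data.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text that marks missing values in several different ways
csv_text = """PID,ST_NUM,ST_NAME,NUM_BEDROOMS
100001,104,PUTNAM,3
100002,197,LEXINGTON,n/a
100003,,LEXINGTON,na
100004,201,BERKELEY,--
"""

missing_value = ["n/a", "na", "--"]
df = pd.read_csv(io.StringIO(csv_text), na_values=missing_value)

print(df)
print(df.isnull().sum())   # number of missing values in each column
```

Empty fields are treated as missing by default; the strings in missing_value are treated as missing in addition to the defaults.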


Reshaping the data:
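
One common way to reshape data in pandas is to convert between wide and long layouts. The sketch below is an assumption about what reshaping means here; it uses melt() and pivot() on a hypothetical marks table.

```python
import pandas as pd

# Hypothetical wide table: one column per subject
wide = pd.DataFrame({
    "Name": ["Asha", "Ravi"],
    "Maths": [90, 70],
    "Science": [85, 75],
})

# Wide to long: one row per (Name, Subject) pair
long = wide.melt(id_vars="Name", var_name="Subject", value_name="Marks")
print(long)

# Long back to wide: one row per Name, one column per Subject
back = long.pivot(index="Name", columns="Subject", values="Marks")
print(back)
```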


Program:
Write a Python program to load CSV dataset files using pandas library functions.
a. Importing data (CSV)


b. Importing data (Excel)


Exercise:
Demonstrate various data pre-processing techniques for a given dataset.
Program:


Week 4:
Implement dimensionality reduction using the Principal Component Analysis (PCA) method.
Program:
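
A minimal sketch of PCA with scikit-learn, assuming the Iris dataset and two retained components. It is not the manual's original listing, and the dataset and the number of components are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumed dataset: Iris (150 samples, 4 features)
X = load_iris().data

# PCA is sensitive to feature scale, so standardise the features first
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 features onto the first 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("Original shape:", X.shape)
print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)
```

The explained variance ratio shows how much of the total variance each retained component captures, which helps in deciding how many components to keep.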


Observations:
- x1 and x2 do not seem correlated
- x1 seems very correlated with both x3 and x4
- x2 seems somewhat correlated with both x3 and x4
- x3 and x4 seem very correlated


Week 5:
Develop Decision Tree Classification model for a given dataset and use it to classify a new sample.
Program:
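
A minimal sketch using scikit-learn's DecisionTreeClassifier, which is not necessarily the method used in the manual's original listing. It assumes the classic 14-row play-tennis data (the temperature column and all row values are assumptions) and one-hot encodes the categorical attributes. scikit-learn builds binary splits, so the fitted tree can look different from a multiway ID3 tree over the same attributes.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Assumed training data: the classic play-tennis examples
rows = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
columns = ["outlook", "temperature", "humidity", "wind", "play"]
data = pd.DataFrame(rows, columns=columns)

# One-hot encode the categorical attributes
X = pd.get_dummies(data.drop(columns="play"), dtype=int)
y = data["play"]

model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X, y)

# Classify a new sample, encoded with the same columns as the training data
new_sample = pd.DataFrame([{"outlook": "sunny", "temperature": "cool",
                            "humidity": "high", "wind": "strong"}])
new_X = pd.get_dummies(new_sample, dtype=int).reindex(columns=X.columns, fill_value=0)
prediction = model.predict(new_X)[0]
print("Predicted class for the new sample:", prediction)
```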


outlook
    overcast
        b'yes'
    rain
        wind
            b'strong'
                b'no'
            b'weak'
                b'yes'
    sunny
        humidity
            b'high'
                b'no'
            b'normal'
                b'yes'


Week 6:
Consider a dataset and use Random Forest to predict the output class. Vary the number of trees as follows
and compare the results: i) 20 ii) 50 iii) 100 iv) 200 v) 500

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

data = load_iris()
X = data.data      # Feature data
y = data.target    # Target labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
tree_counts = [20, 50, 100, 200, 500]
accuracies = []
for n_trees in tree_counts:
    # Initialize RandomForestClassifier with a different number of trees
    model = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    model.fit(X_train, y_train)                # Train the model
    y_pred = model.predict(X_test)             # Make predictions
    accuracy = accuracy_score(y_test, y_pred)  # Evaluate accuracy
    accuracies.append(accuracy)                # Append the accuracy to the list
plt.figure(figsize=(8, 6))
plt.plot(tree_counts, accuracies, marker='o', linestyle='-', color='b')
plt.title('Accuracy vs. Number of Trees in Random Forest')
plt.xlabel('Number of Trees')
plt.ylabel('Accuracy')
plt.grid(True)
plt.xticks(tree_counts)  # Set the x-axis ticks to the number of trees
plt.show()


Output:

The exact accuracies depend on the dataset and on the train/test split. Illustrative values:

For 20 Trees: Accuracy might be around 0.90 (90%).
For 50 Trees: Accuracy might be around 0.95 (95%).
For 100 Trees: Accuracy might be around 0.96 (96%).
For 200 Trees: Accuracy might be around 0.96 (96%).
For 500 Trees: Accuracy might be around 0.97 (97%).


Week 7:
Write a python program to implement Simple Linear Regression Models and plot the graph.
Program:

a) To implement Simple Linear Regression.
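
A minimal sketch of simple linear regression with scikit-learn, including the plot. The data points are hypothetical and the listing is not the manual's original program.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one input feature x and one target y
x = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)  # 2-D, as scikit-learn expects
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

model = LinearRegression()
model.fit(x, y)

slope = model.coef_[0]
intercept = model.intercept_
print("Slope:", slope)
print("Intercept:", intercept)

# Plot the data points and the fitted regression line
plt.scatter(x, y, color="blue", label="Data points")
plt.plot(x, model.predict(x), color="red", label="Regression line")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Simple Linear Regression")
plt.legend()
plt.show()
```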


b) To implement Multiple Linear Regression.
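
A minimal sketch of multiple linear regression with two input features. The data is synthetic and generated from an assumed relation y = 3*x1 - 2*x2 + 5, so the fitted coefficients can be compared with the known ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from an assumed linear relation with two features
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

model = LinearRegression()
model.fit(X, y)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Predict the target for a new sample with x1 = 4 and x2 = 1
new_prediction = model.predict(np.array([[4.0, 1.0]]))[0]
print("Prediction for [4, 1]:", new_prediction)
```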


Week 8:
Write a python program to implement Logistic Regression Model for a given dataset.
Program:
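
A minimal sketch of logistic regression with scikit-learn. The breast cancer dataset that ships with scikit-learn, the 70/30 split and the feature scaling step are assumptions; the manual's own dataset is not shown here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed dataset: binary classification (malignant vs. benign)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scale the features, then fit the logistic regression model
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)
```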


Exercise:
Implement Naive Bayes classification in python.
Program:
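
A minimal sketch of Naive Bayes classification, assuming the Gaussian variant and the Iris dataset. Both are illustrative choices rather than the manual's original program.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumed dataset: Iris, with continuous features suited to Gaussian Naive Bayes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```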


Week 9:
Build KNN Classification model for a given dataset.
Program:
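
A minimal sketch of a KNN classifier with scikit-learn. The Iris dataset, k = 5 and the scaling step are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed dataset: Iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# KNN is distance based, so the features are scaled before classification
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)

# Classify one new sample (hypothetical measurements in cm)
new_sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = model.predict(new_sample)[0]
print("Predicted class index:", predicted_class)
```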


Week 10:
Implement Support Vector Machine for a dataset.
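
A minimal sketch of a Support Vector Machine classifier with scikit-learn. The Iris dataset and the RBF kernel are assumptions; other kernels such as "linear" can be tried by changing the kernel argument.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed dataset: Iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale the features, then train an SVM with an RBF kernel
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```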


Week 11:
Write a Python program to implement the K-Means clustering algorithm.

Program:
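
A minimal sketch of K-Means clustering with scikit-learn. The data is synthetic, generated with make_blobs, and the choice of three clusters is an assumption that matches how the data is generated.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three groups
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.8, random_state=42)

# Fit K-Means with k = 3 and assign each point to a cluster
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

print("Cluster centres:")
print(centers)
print("First ten labels:", labels[:10])
print("Inertia (within-cluster sum of squares):", kmeans.inertia_)
```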
