
UNIT 16

MACHINE LEARNING – PROGRAMMING USING PYTHON


Structure
16.0 Introduction
16.1 Objectives
16.2 Classification Algorithms
16.2.1 Naïve Bayes

16.2.2 K-Nearest Neighbour (K-NN)

16.2.3 Decision Trees

16.2.4 Logistic Regression

16.2.5 Support Vector Machines

16.3 Regression Algorithms


16.3.1 Linear Regression

16.3.2 Polynomial Regression

16.4 Feature Selection and Extraction


16.4.1 Principal Component Analysis

16.5 Association Rules


16.5.1 Apriori Algorithm

16.6 Clustering Algorithms


16.6.1 K-Means

16.7 Summary
16.8 Solutions/Answers
16.9 Further Readings

16.0 INTRODUCTION
In this unit we will see the implementation of various machine learning algorithms
learned in this course. To understand the code, you need an understanding of the
respective machine learning algorithms along with a working knowledge of Python
programming. The code uses various libraries of the Python programming language,
viz. Scikit-Learn, Matplotlib, NumPy etc., and you can execute it with any of the
Python programming tools. Most of the machine learning algorithms you learned in
this course are implemented here; try to execute them and analyse the results.

16.1 OBJECTIVES
After going through this unit, you should be able to:
● Understand the implementation aspects of various machine learning algorithms.
16.2 CLASSIFICATION ALGORITHMS
The starting units of this course primarily focused on various classification
algorithms, viz. the Naïve Bayes classifier, K-Nearest Neighbour (K-NN), Decision
Trees, Logistic Regression and Support Vector Machines. The theoretical aspects of
these have already been discussed in the respective units; now we will see the
implementation of the mentioned classifiers in the Python programming language.
16.2.1 Naive Bayes
It is a classification method based on Bayes' Theorem, with the assumption that the
predictors are independent of one another. In layman's words, a Naive Bayes
classifier assumes that the presence of one particular feature in a class is
unrelated to the presence of any other feature.
We have already discussed this classifier in detail in Block 3 Unit 10 of this
course; you may refer to that unit to revise the concept.
The following procedure needs to be carried out in order to classify data using
the Naive Bayes method:
• Step 1: Import the dataset and the necessary dependencies.
• Step 2: Compute the prior probability P(y) of each class.
• Step 3: Determine the likelihood of each feature for each class from a
frequency (likelihood) table.
• Step 4: Calculate the posterior probability for each class by applying the
Naive Bayes equation.
Implementation code in Python
The screenshot of the executed code is given below
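A minimal runnable sketch along these lines is given here; the Iris dataset, the
60/40 train/test split and the Gaussian variant of Naive Bayes are assumptions made
for illustration, and the accuracy printed will depend on the data and split used.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: import the dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=1)   # 60/40 split (assumed)

# Steps 2-4: fitting GaussianNB estimates the class priors P(y) and the
# per-feature Gaussian likelihoods; predict() applies Bayes' rule to obtain
# the posterior probability of each class
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

print("Gaussian Naive Bayes model accuracy(in %):",
      accuracy_score(y_test, y_pred) * 100)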

OUTPUT:

Gaussian Naive Bayes model accuracy(in %): 95.0

16.2.2 K-Nearest Neighbour (K-NN)


This classifier has already been discussed in detail in Block 3 Unit 10 of this
course; you may refer to that unit to revise the concept.
We learned that, supposing the value of K is 3, the KNN algorithm starts by
calculating the distance of a query point X from all the points. It then finds the
3 points nearest to X (those with the least distance) and assigns X the class that
is most common among them.
In the example shown below, the following steps are performed:
• Step 1: Import the k-nearest neighbour algorithm from the scikit-learn package.
• Step 2: Create the feature variables and the target variable.
• Step 3: Separate the data into training data and test data.
• Step 4: Generate a k-NN model using the chosen number of neighbours.
• Step 5: Train the model, i.e. fit the model to the training data.
• Step 6: Make a prediction.
Now, in this section, we will see how Python's Scikit-Learn library can be used
to implement the KNN algorithm.
Implementation code in Python
The screenshot of the executed code is given below
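A minimal runnable sketch of these steps is given here; the Iris dataset, the 80/20
train/test split and K = 3 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Steps 1-3: import the algorithm, create feature/target variables and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Step 4: generate a k-NN model with K = 3 neighbours
knn = KNeighborsClassifier(n_neighbors=3)

# Step 5: train (fit) the model on the training data
knn.fit(X_train, y_train)

# Step 6: make predictions and report the accuracy on the test data
print("Predicted classes:", knn.predict(X_test[:5]))
print("Test accuracy:", knn.score(X_test, y_test))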

16.2.3 Decision Trees
A decision tree is a type of supervised machine learning algorithm that may be
used for both regression and classification tasks. It is one of the most popular
and widely used machine learning techniques.
The decision tree method creates a node for each attribute present in the dataset,
with the attribute considered the most significant placed at the top of the tree.
To begin with, the entire training set is treated as the root. The feature values
need to be categorical; if they are continuous, they are discretized before the
model is built. Records are then distributed recursively according to their
attribute values, and a statistical measure (such as information gain) is used to
decide which attributes are placed at the root and which at the internal nodes.
This classifier has already been discussed in detail in Block 3 Unit 10 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python
The screenshot of the executed code is given below
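A minimal runnable sketch is given here; the Iris dataset, the entropy criterion
and the depth limit of 3 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# 'entropy' chooses the attribute with the highest information gain at each
# node; max_depth=3 is only to keep the printed tree small
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
# The attribute at the root is the one judged most significant
print(export_text(clf, feature_names=iris.feature_names))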

16.2.4 Logistic Regression

Logistic Regression (LR) is a classification algorithm that is used in Machine
Learning to predict the likelihood of a categorical dependent variable. The
dependent variable in logistic regression is a binary variable, which means that
it contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
Note that the Naive Bayes model is a generative model, whereas the LR model is a
discriminative model. LR performs better than Naive Bayes when it comes to
collinearity, because Naive Bayes expects all of the features to be independent,
while LR does not. Naive Bayes works well with small datasets.
This classifier has already been discussed in detail in Block 3 Unit 10 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python
The screenshot of the executed code is given below
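A minimal runnable sketch is given here; the breast cancer dataset (which has a
binary target), the 75/25 split and the feature-scaling step are illustrative
assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# The target here is binary (0/1), as logistic regression requires
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Standardising the features helps the solver converge
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))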


16.2.5 Support Vector Machine


Support Vector Machine, more usually referred to as SVM, is a supervised, linear
machine learning technique that is most frequently utilised for addressing
classification problems; Support Vector Classification is another name for it. In
addition, there is a variant of SVM known as SVR, which stands for Support Vector
Regression and applies similar concepts to regression problems. SVM also offers
the kernel method, also known as kernel SVM, which enables us to deal with
non-linearity.
The following are the steps involved in the implementation:
• Import the libraries.
• Load the dataset.
• Divide the dataset into X and y.
• Create a training set and a test set from X and y.
• Scale the features.
• Fit the SVM to the training set.
• Predict the results for the test set.
• Construct the confusion matrix.
This classifier has already been discussed in detail in Block 3 Unit 10 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python
The screenshot of the executed code is given below
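A minimal runnable sketch following the steps listed above is given here; the
breast cancer dataset and the RBF kernel are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Load the dataset and divide it into X and y
X, y = load_breast_cancer(return_X_y=True)

# Create training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Fit the SVM to the training set (the RBF kernel handles non-linearity)
svm = SVC(kernel="rbf", C=1.0, random_state=0)
svm.fit(X_train, y_train)

# Predict the test set and construct the confusion matrix
y_pred = svm.predict(X_test)
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))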



Check Your Progress - 1


1. Make suitable assumptions and modify the Python code of the following
classification algorithms:
a. K-NN
b. Decision Tree
c. Logistic Regression
d. Support Vector Machines

16.3 REGRESSION ALGORITHMS


We learned about the basic concepts of regression in the respective unit of this
course; in this unit we will implement linear regression and polynomial regression
in the Python language. Let's start with linear regression.

16.3.1 Linear Regression


The purpose of a linear regression model is to determine whether or not there is
a relationship between one or more features (also known as independent variables)
and a continuous target variable (the dependent variable). Linear regression is
referred to as uni-variate (simple) linear regression when there is only one
feature, and as multiple linear regression when there are multiple features.

Following are the stages involved in the implementation of a linear regression
model:
• Firstly, initialise the parameters.
• Given the value of the independent variable, predict the value of the
dependent variable.
• Determine the error of each prediction for each data point.
• Calculate the partial derivatives of the cost with respect to a0 and a1, and
update the parameters accordingly.
• Add up the individual costs determined for each of the data points.
This technique has already been discussed in detail in Block 3 Unit 11 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python
The screenshot of the executed code is given below
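A minimal runnable sketch of these stages, implemented with gradient descent on
synthetic data, is given here; the generated data, learning rate and number of
iterations are illustrative assumptions.

import numpy as np

# Synthetic data: y is roughly 4 + 3x plus noise (assumed for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, 100)
y = 4 + 3 * x + rng.normal(0, 0.5, 100)

# Step 1: initialise the parameters a0 (intercept) and a1 (slope)
a0, a1 = 0.0, 0.0
lr = 0.05          # learning rate (assumed)
n = len(x)

for epoch in range(1000):
    # Step 2: predict the dependent variable
    y_pred = a0 + a1 * x
    # Step 3: error of each prediction
    error = y_pred - y
    # Step 4: partial derivatives of the mean squared error w.r.t. a0 and a1
    d_a0 = (2 / n) * np.sum(error)
    d_a1 = (2 / n) * np.sum(error * x)
    a0 -= lr * d_a0
    a1 -= lr * d_a1

# Step 5: total (mean squared) cost over all data points
cost = np.mean((a0 + a1 * x - y) ** 2)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, MSE = {cost:.3f}")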



16.3.2 Polynomial Regression


Polynomial Regression is a form of linear regression in which the relationship
between the independent variable x and the dependent variable y is modelled as an
nth degree polynomial; it can be viewed as an extension of linear regression.
Polynomial regression is used to model a nonlinear relationship between the value
of the independent variable x and the conditional mean of the dependent variable
y, represented by the notation E(y|x). Because of this non-linear relationship
between the dependent and independent variables, we simply add polynomial terms
to transform linear regression into polynomial regression.
This technique has already been discussed in detail in Block 3 Unit 11 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python
The screenshot of the executed code is given below
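A minimal runnable sketch is given here; the synthetic quadratic data and the
degree-2 polynomial are illustrative assumptions.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data following y = 0.5x^2 - x + 2 plus noise (assumed)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + 2 + rng.normal(0, 0.3, 80)

# Add polynomial terms (x, x^2) and fit an ordinary linear regression on them
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)

print("Coefficients:", model.coef_)      # expected near [-1, 0.5]
print("Intercept:", model.intercept_)    # expected near 2
print("R^2 score:", model.score(x_poly, y))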


Check Your Progress - 2


2. Make suitable assumptions and modify the Python code of the following
regression algorithms:
a. Linear regression
b. Polynomial regression

16.4 FEATURE SELECTION AND EXTRACTION


Feature selection and extraction are among the most important steps that must be
performed for machine learning to be successful. While we covered the theoretical
aspects of this process in the earlier units of this course, it is now time to
understand the implementation of the mechanisms we have learned for feature
selection and extraction. Let's begin with dimensionality reduction, which is the
process of lowering the number of random variables under consideration by
generating a set of principal variables. Dimensionality reduction may be seen as
a way to streamline the analysis process.

16.4.1 Principal Component Analysis (PCA)


This topic has already been discussed in detail in Block 4 Unit 13 of this course;
you may refer to that unit to revise the concept.
Among the various techniques, Principal Component Analysis (PCA) is the most
frequently used, and its implementation is given below:
Implementation code in Python
The screenshot of the executed code is given below
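A minimal runnable sketch is given here; the Iris dataset and the choice of two
principal components are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# PCA is sensitive to scale, so standardise the features first
X_std = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print("Original shape:", X.shape)
print("Reduced shape:", X_pca.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_)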



Check Your Progress - 3


3. Make suitable assumptions and modify the Python code of Principal Component
Analysis for dimensionality reduction.

16.5 ASSOCIATION RULES


We discussed the Apriori algorithm and the FP-Growth algorithm while studying the
topic of association rules. These algorithms are frequently used in frequent
pattern mining. Since FP-Growth builds on the Apriori algorithm, we are discussing
the implementation of the Apriori algorithm only.

16.5.1 Apriori Algorithm


The Apriori algorithm is a data mining technique used for mining frequent item
sets and the corresponding association rules. We focused on the definitions of
association rule mining and the Apriori algorithm, as well as its applications,
in the relevant unit of this course. In this section, we will construct an Apriori
model using the Python programming language and a hypothetical scenario involving
a small firm. The algorithm does have some limitations, the effects of which can
be mitigated using a variety of approaches, and it is widely used in data mining
and pattern recognition.
The model described below produces the candidate set by merging the set of
frequent items from the previous step. Subsets are then tested, and candidate item
sets that are infrequent are removed. The final frequent itemset is then obtained
by keeping the items that meet the minimum support requirement.
This algorithm has already been discussed in detail in Block 4 Unit 14 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python
The screenshot of the executed code is given below
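A minimal runnable sketch of frequent-itemset mining with Apriori is given here;
the use of the third-party mlxtend library, the small hypothetical transaction
list and the minimum support value are illustrative assumptions.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Hypothetical transactions of a small firm (assumed data)
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'eggs'],
    ['bread', 'eggs'],
]

# One-hot encode the transactions into a boolean DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)

# Candidate itemsets are grown level by level; itemsets below the minimum
# support threshold are pruned at each level
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
print(frequent_itemsets)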


16.6 CLUSTERING ALGORITHMS


We learned about the theoretical aspects of various clustering algorithms such as
K-Means, DBSCAN etc. in the respective unit of this course. The K-Means algorithm
is quite simple, and hence its implementation is given below:

16.6.1 K-Means - Implementation code in Python


This algorithm has already been discussed in detail in Block 4 Unit 15 of this
course; you may refer to that unit to revise the concept.
Implementation code in Python

The screenshot of the executed code is given below
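A minimal runnable sketch is given here; the synthetic blob data and k = 3 are
illustrative assumptions.

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate 300 points around 3 centres (assumed synthetic data)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-Means with k = 3; n_init=10 restarts the algorithm from 10 random seeds
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster centres:\n", kmeans.cluster_centers_)
print("Inertia (within-cluster sum of squares):", kmeans.inertia_)
print("First 10 labels:", labels[:10])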



Check Your Progress - 4


4. Make suitable assumptions and modify the Python code of the K-Means algorithm.
16.7 SUMMARY
In this unit we understood the implementation of various machine learning
algorithms for classification, regression, dimensionality reduction and
clustering. The theoretical aspects of each algorithm were already discussed in
the respective units of this course.

16.8 SOLUTIONS/ANSWERS
Check Your Progress - 1
1. Make suitable assumptions and modify the Python code of the following
classification algorithms:
a. K-NN
b. Decision Tree
c. Logistic Regression
d. Support Vector Machines
Solution: Refer to Section 16.2.
Check Your Progress - 2
2. Make suitable assumptions and modify the Python code of the following
regression algorithms:
a. Linear regression
b. Polynomial regression
Solution: Refer to Section 16.3.
Check Your Progress - 3
3. Make suitable assumptions and modify the Python code of Principal Component
Analysis for dimensionality reduction.
Solution: Refer to Section 16.4.
Check Your Progress - 4
4. Make suitable assumptions and modify the Python code of the K-Means algorithm.
Solution: Refer to Section 16.6.

16.9 FURTHER READINGS


● https://www.kaggle.com/
● https://www.github.com/
● https://towardsdatascience.com
● https://machinelearningmastery.com
