Probabilistic Classifiers:
1. Bayes Classifiers (Naïve Bayesian Classifier)
2. Logistic Regression

What is logistic regression?
Logistic regression is an example of supervised learning. It is used to calculate or predict the probability of a binary (yes/no) event occurring. An example of logistic regression could be applying machine learning to determine whether a person is likely to be infected with COVID-19 or not. Logistic regression is a statistical method used to predict the outcome of a dependent variable based on previous observations. It is a type of regression analysis and a commonly used algorithm for solving binary classification problems.

Advantages of logistic regression:
1. It performs well when the data is linearly separable.
2. It does not require many computational resources, and the resulting model is highly interpretable.
3. There is no problem scaling the input features, and it requires little tuning.
4. It is easy to implement and train.
5. It gives a measure of how relevant each predictor is (coefficient size) and its direction of association (positive or negative).
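As an illustration, here is a minimal scikit-learn sketch of logistic regression on a synthetic binary problem (the dataset and variable names below are made up purely for demonstration):

# Minimal logistic regression sketch using scikit-learn
# (synthetic toy data; names and values are illustrative only)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# generate a toy binary classification dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# predict_proba returns the probability of each class for every sample
print(model.predict_proba(X_test[:5]))
# coefficient sizes and signs show each predictor's relevance and direction
print(model.coef_)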
1. K-Nearest Neighbor (KNN)
2. Support Vector Machine (SVM)

K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning technique. The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories. The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a suitable category using the K-NN algorithm.
The K-NN algorithm can be used for regression as well as for classification, but it is mostly used for classification problems.
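To illustrate, scikit-learn exposes both variants through parallel classes. A minimal sketch (the tiny datasets below are made up purely for illustration):

# K-NN for classification and for regression in scikit-learn
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[1], [2], [3], [10], [11], [12]]

# classification: predict a discrete label by majority vote of the neighbors
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, [0, 0, 0, 1, 1, 1])
print(clf.predict([[2.5]]))   # expected: label 0 (nearest neighbors are 1, 2, 3)

# regression: predict a continuous value as the average of the neighbors' targets
reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, [1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
print(reg.predict([[2.5]]))   # expected: mean of 1.0, 2.0, 3.0 = 2.0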
K-NN is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification. K-NN is a non-parametric algorithm, which means it makes no assumptions about the underlying data.
At the training phase, the KNN algorithm simply stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.

Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm: with K-NN, we can easily identify the category or class of a particular data point. The algorithm proceeds as follows:

Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new point to the available data points.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.

Firstly, we choose the number of neighbors, say k = 5. Next, we calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2), it can be calculated as:

dist = √((x2 − x1)² + (y2 − y1)²)

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

# importing the dataset
data_set = pd.read_csv('user_data.csv')

# extracting independent and dependent variables
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# splitting the dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# feature scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

# fitting the K-NN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # minkowski with p=2 is Euclidean distance
classifier.fit(x_train, y_train)
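Once the classifier is fitted, we can predict the test set results and inspect the errors. A minimal continuation sketch of the example above:

# predicting the test set results
y_pred = classifier.predict(x_test)

# evaluating with a confusion matrix (rows: actual classes, columns: predicted classes)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)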
Support Vector Machine (SVM) is one of the most popular supervised learning algorithms, used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point into the correct category in the future. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine.

Types of SVM
SVM can be of two types:
Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, such data is termed linearly separable, and the classifier used is called a linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, such data is termed non-linear, and the classifier used is called a non-linear SVM classifier.

Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM. The dimensions of the hyperplane depend on the number of features in the dataset: if there are 2 features, the hyperplane is a straight line; if there are 3 features, the hyperplane is a 2-dimensional plane. We always create the hyperplane with the maximum margin, which means the maximum distance between the hyperplane and the nearest data points.
Support Vectors: The data points or vectors that are closest to the hyperplane and that affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
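For illustration, here is a minimal scikit-learn sketch of a linear SVM (the synthetic data below is made up for demonstration; a non-linear SVM would simply swap in a different kernel, e.g. kernel='rbf'):

# Minimal SVM sketch using scikit-learn (illustrative synthetic data)
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two well-separated blobs: a linearly separable toy problem
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# linear SVM: finds the maximum-margin hyperplane
clf = SVC(kernel='linear')
clf.fit(X, y)

# the support vectors are the training points closest to the hyperplane
print(clf.support_vectors_)

# for non-linearly separable data, use a non-linear kernel instead:
# clf = SVC(kernel='rbf')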
1. Accuracy: 0/1 Loss, Sensitivity, Specificity
2. Clustering: The General Problem, K-Means Clustering

Sensitivity tells us what proportion of the positive class got correctly classified:

Sensitivity = TP / (TP + FN)

Misclassification rate: Also termed the error rate, it defines how often the model gives wrong predictions. The error rate is calculated as the number of incorrect predictions divided by the total number of predictions made by the classifier:

Error Rate = (FP + FN) / (TP + TN + FP + FN)

When to use Accuracy / Precision / Recall / F1-Score?
a. Accuracy is used when the True Positives and True Negatives are more important. Accuracy is a better metric for balanced data.
b. Whenever False Positives are much more important, use Precision.
c. Whenever False Negatives are much more important, use Recall.
d. F1-Score is used when both False Negatives and False Positives are important. F1-Score is a better metric for imbalanced data.
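These metrics are available directly in scikit-learn. A minimal sketch, assuming y_test and y_pred from the K-NN example above (or any other classifier):

# computing the evaluation metrics with scikit-learn
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))   # recall = sensitivity
print("F1-Score: ", f1_score(y_test, y_pred))

# error rate = 1 - accuracy = (FP + FN) / (TP + TN + FP + FN)
print("Error rate:", 1 - accuracy_score(y_test, y_pred))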
K-means clustering is the most common partitioning algorithm. K-means reassigns each record in the dataset to exactly one of the new clusters formed. A record or data point is assigned to the nearest cluster using a measure of distance or similarity.

K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if k=2, there will be two clusters; for k=3, there will be three clusters; and so on.

The K-means algorithm uses the following steps:
(a) Select K initial cluster centroids c1, c2, c3, ..., ck.
(b) Assign each instance x in the dataset to the cluster whose centroid is nearest to x.
(c) For each cluster, recompute its centroid based on the elements contained in that cluster.
(d) Go to (b) until convergence is reached.

The algorithm thus separates the objects (data points) into K clusters, where each cluster center (centroid) is the average of all the data points in the cluster, and each point is assigned to the cluster whose centroid is nearest (using a distance function).

In its simplest one-dimensional form:
Step 1: Take the mean value of each cluster.
Step 2: Find the nearest mean for each number and put the number in that cluster.
Step 3: Repeat Step 1 and Step 2 until the means no longer change.

Example: Data points = {2, 4, 6, 9, 12, 16, 20, 24, 26}, number of clusters = 2.

Euclidean Distance Method
Suppose a data set, D, contains n objects in Euclidean space. Partitioning methods distribute the objects in D into k clusters, C1, ..., Ck, such that Ci ⊂ D and Ci ∩ Cj = ∅ for 1 ≤ i, j ≤ k, i ≠ j. An objective function is used to assess the partitioning quality so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. That is, the objective function aims for high intra-cluster similarity and low inter-cluster similarity.

A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster. Conceptually, the centroid of a cluster is its center point. The centroid can be defined in various ways, such as by the mean or medoid of the objects (or points) assigned to the cluster. The difference between an object p ∈ Ci and ci, the representative of the cluster, is measured by dist(p, ci), where dist(x, y) is the Euclidean distance between two points x and y. The quality of cluster Ci can be measured by the within-cluster variation, which is the sum of squared errors between all objects in Ci and the centroid ci, defined as:

E = Σ (i=1 to k) Σ (p ∈ Ci) dist(p, ci)²
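As a worked sketch of the example above, K-means can be run on the nine data points with scikit-learn. Cluster assignments can vary with initialization, but with these well-separated points the algorithm typically converges to the grouping shown in the comments:

# K-means on the one-dimensional example: {2, 4, 6, 9, 12, 16, 20, 24, 26}, k=2
import numpy as np
from sklearn.cluster import KMeans

points = np.array([2, 4, 6, 9, 12, 16, 20, 24, 26]).reshape(-1, 1)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)                    # cluster label assigned to each point
print(kmeans.cluster_centers_)   # the final centroids

# typically converges to {2, 4, 6, 9, 12} (centroid 6.6)
# and {16, 20, 24, 26} (centroid 21.5)

# the within-cluster variation E (sum of squared errors) is exposed as inertia_
print(kmeans.inertia_)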