Unit-5
K-means clustering is an unsupervised machine learning algorithm that creates groups made up of similar
data points. It has various applications, including customer segmentation, anomaly detection, and sentiment
analysis.
K-means clustering assigns data points to one of K clusters depending on their distance from the centres of those clusters. It starts by placing the cluster centroids randomly in the space. Each data point is then assigned to the cluster whose centroid is nearest. After every point has been assigned, new cluster centroids are computed as the means of the assigned points. This process repeats iteratively until the clusters stop changing. In this analysis we assume that the number of clusters K is given in advance and that every point must be placed in one of the groups.
K-means performs best when the data are well separated; when data points overlap, this clustering is not suitable. K-means is fast compared with many other clustering techniques and produces tightly coupled clusters of data points. However, K-means does not provide clear information about the quality of the clusters, different initial placements of the cluster centroids may lead to different clusterings, the algorithm is sensitive to noise and outliers, and it may get stuck in a local minimum.
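The assign/update loop described above can be sketched in a few lines of NumPy. This is only a minimal illustration (the random data, K = 3, and the stopping rule are arbitrary choices, and empty clusters are not handled), not a production implementation.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means (Lloyd's algorithm): assign each point to the nearest
    centroid, then recompute each centroid as the mean of its points."""
    rng = np.random.default_rng(seed)
    # Initialise centroids by picking k distinct points from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: distance from every point to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (empty clusters are not handled in this sketch).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids have stabilised
        centroids = new_centroids
    return centroids, labels

# Toy example: 300 random 2-D points grouped into K = 3 clusters.
X = np.random.default_rng(1).normal(size=(300, 2))
centroids, labels = kmeans(X, k=3)
print(centroids)
```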
K-means vs K-medoids
• K-means takes the mean of the data points in a cluster to create new points called centroids, whereas K-medoids uses existing points from the data to serve as cluster representatives, called medoids.
• Centroids are new points not previously found in the data; medoids are existing points from the data.
• K-means can only be used for numerical data, whereas K-medoids can be used for both numerical and categorical data.
• K-means focuses on reducing the sum of squared distances, also known as the sum of squared error (SSE), whereas K-medoids focuses on reducing the dissimilarities between the points of a cluster and their medoid.
• K-means typically uses Euclidean distance, whereas K-medoids typically uses Manhattan distance.
• K-means is sensitive to outliers in the data, whereas K-medoids is outlier resistant and can reduce the effect of outliers; it is more useful when the clusters have irregular shapes or different sizes.
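A tiny numerical sketch of the outlier point from the comparison above: with one extreme value in the data, the mean (centroid) is dragged toward the outlier, while the medoid, being an actual data point that minimises the total distance to the others, barely moves. The data values below are made up purely for illustration.

```python
import numpy as np

# 1-D toy data with one extreme outlier (values chosen only for illustration).
x = np.array([1.0, 2.0, 2.5, 3.0, 100.0])

# Centroid: the mean, which K-means uses as the cluster representative.
centroid = x.mean()

# Medoid: the existing data point with the smallest total distance to all others.
pairwise = np.abs(x[:, None] - x[None, :])   # Manhattan distances in 1-D
medoid = x[pairwise.sum(axis=1).argmin()]

print(f"centroid = {centroid}")   # 21.7, pulled toward the outlier
print(f"medoid   = {medoid}")     # 2.5, robust to the outlier
```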
5.3 Gaussian Mixture
A Gaussian mixture is a function that is composed of several Gaussians, each identified by k ∈ {1,…, K}, where K is the number of clusters of our data set. Each Gaussian k in the mixture is comprised of the following parameters: a mean μ_k that defines its centre, a covariance Σ_k that defines its width, and a mixing coefficient π_k that defines how big or small the Gaussian function will be.
For example, if there are three Gaussian functions, then K = 3, and each Gaussian explains the data contained in one of the three clusters. The mixing coefficients are themselves probabilities and must meet this condition:
π_1 + π_2 + … + π_K = 1
In addition, we must ensure that each Gaussian fits the data points belonging to each cluster.
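As a rough illustration, scikit-learn's GaussianMixture can fit such a mixture and expose exactly these parameters: the means, the covariances, and the mixing coefficients, which sum to 1. The synthetic blobs and the choice of n_components=3 below are assumptions made only for demonstration.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data with three clusters (for illustration only).
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Fit a Gaussian mixture with K = 3 components.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

print(gmm.means_)          # mu_k: one mean vector per Gaussian
print(gmm.covariances_)    # Sigma_k: one covariance matrix per Gaussian
print(gmm.weights_)        # pi_k: the mixing coefficients
print(gmm.weights_.sum())  # the mixing coefficients sum to 1
```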
5.5.1 Supervised Learning is a type of machine learning where a model is trained on labelled data—
meaning each input is paired with the correct output. The model learns by comparing its predictions with the
actual answers provided in the training data. Over time, it adjusts itself to minimize errors and improve
accuracy. The goal of supervised learning is to make accurate predictions when given new, unseen data. For
example, if a model is trained to recognize handwritten digits, it will use what it learned to correctly identify
new numbers it hasn’t seen before.
Even though classification and regression are both from the category of supervised learning, they are not the
same.
• The prediction task is a classification when the target variable is discrete. An application is the
identification of the underlying sentiment of a piece of text.
• The prediction task is a regression when the target variable is continuous. An example can be the
prediction of the salary of a person given their education degree, previous work experience,
geographical location, and level of seniority.
1. Decision tree: A popular algorithm that builds classification or regression models in the form of a tree
structure. It can handle both categorical and continuous dependent variables.
2. Logistic regression: A statistical method used for binary classification problems, where the outcome is
dichotomous, meaning it can take on one of two possible values.
3. Random forest: A versatile technique that improves accuracy by combining multiple independent decision
trees.
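A brief sketch of these three supervised classifiers in scikit-learn, trained on the bundled breast-cancer dataset purely as an illustration; the train/test split and default hyperparameters are arbitrary choices, not a recommendation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Binary labelled data: each input (tumour measurements) is paired with a 0/1 label.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    # Logistic regression benefits from feature scaling, hence the pipeline.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "random forest": RandomForestClassifier(random_state=0),
}

# Train each model on the labelled examples, then score it on unseen test data.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```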
5.5.2 Semi-supervised learning is a type of machine learning that falls in between supervised and
unsupervised learning. It is a method that uses a small amount of labelled data and a large amount of
unlabeled data to train a model. The goal of semi-supervised learning is to learn a function that can
accurately predict the output variable based on the input variables, similar to supervised learning. However,
unlike supervised learning, the algorithm is trained on a dataset that contains both labelled and unlabelled
data. Semi-supervised learning is particularly useful when there is a large amount of unlabeled data
available, but it’s too expensive or difficult to label all of it.
• Hierarchical clustering: An unsupervised learning method that builds clusters by measuring the
dissimilarities between data points.
• ISODATA: An unsupervised classification technique that calculates class means and iteratively
clusters the remaining pixels.
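A small sketch of the semi-supervised setting from 5.5.2 using scikit-learn, where unlabelled points are marked with -1: a LabelSpreading model learns from a handful of labelled points plus many unlabelled ones. The Iris data and the fraction of hidden labels are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Pretend most labels are unknown: keep only ~10% and mark the rest as -1.
rng = np.random.default_rng(0)
y_partial = np.where(rng.random(len(y)) < 0.1, y, -1)
print((y_partial == -1).sum(), "unlabelled points out of", len(y))

# Fit on the mix of labelled and unlabelled data, then check against the true labels.
model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print("accuracy on all points:", (model.transduction_ == y).mean())
```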
5.5.3 Bayes’ Theorem
For two events A and B, suppose an experiment has N equally likely outcomes, of which N(A ∩ B) are favourable to both A and B and N(B) are favourable to B. The probability of occurrence of A when B has already occurred is then P(A|B) = N(A ∩ B)/N(B). Dividing the numerator and denominator by N, N(A ∩ B)/N can be written as P(A ∩ B) and N(B)/N as P(B) ⇒ P(A|B) = P(A ∩ B)/P(B)
Similarly, the probability of occurrence of B when A has already occurred is given by,
P(B|A) = P(B ∩ A)/P(A)
Bayes’ theorem defines the probability of occurrence of an event associated with any condition; it is considered for the case of conditional probability. Since P(A ∩ B) = P(B|A) P(A), substituting into the expression above gives
P(A|B) = P(B|A) P(A) / P(B)
This is also known as the formula for the likelihood of “causes”.
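The relation can be checked numerically. The probability values below are invented purely to illustrate the arithmetic; only the formula itself comes from the text.

```python
# Hypothetical probabilities, chosen only to illustrate Bayes' theorem.
p_a = 0.01              # P(A): prior probability of the "cause" A
p_b_given_a = 0.95      # P(B|A): probability of observing B when A holds
p_b_given_not_a = 0.05  # P(B|not A)

# Total probability of B over both cases.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)  # ~0.161: the posterior probability of A given B
```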
5.6 KNN v/s ANN
5.7 LDA
Linear Discriminant Analysis (LDA) is used as a dimensionality reduction technique in machine learning, with which we can easily project 2-D or 3-D data onto a single 1-dimensional axis.
Let's consider an example where we have two classes in a 2-D plane with an X–Y axis, and we need to
classify them efficiently. LDA enables us to draw a straight line that can completely separate the two classes
of data points. Here, LDA uses the X–Y axes to create a new axis, separating the classes with a straight line
and projecting the data onto this new axis.
To create a new axis, Linear Discriminant Analysis uses the following criteria:
• Maximize the distance between the means of the two classes.
• Minimize the variation (scatter) within each class.
Using the above two conditions, LDA generates a new axis in such a way that it maximizes the distance
between the means of the two classes while minimizing the variation within each class.
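A small sketch of this projection using scikit-learn's LinearDiscriminantAnalysis: two synthetic 2-D classes (the locations and sizes are made up for illustration) are projected onto the single new axis that best separates them.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two synthetic 2-D classes (made-up data, purely for illustration).
rng = np.random.default_rng(0)
class_0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
class_1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))
X = np.vstack([class_0, class_1])
y = np.array([0] * 100 + [1] * 100)

# Project the 2-D data onto a single discriminant axis.
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)

print(X_1d.shape)                                # (200, 1): one coordinate per point
print(X_1d[y == 0].mean(), X_1d[y == 1].mean())  # well-separated class means
```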
5.8 PCA
1. Standardize the range of the initial variables so that each one contributes equally to the analysis.
Mathematically, this can be done by subtracting the mean and dividing by the standard deviation for
each value of each variable.
Once the standardization is done, all the variables will be transformed to the same scale.
2. Compute the covariance matrix to identify correlations between the variables.
The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions) that has as entries
the covariances associated with all possible pairs of the initial variables. For example, for a 3-dimensional
data set with 3 variables x, y, and z, the covariance matrix is a 3×3 matrix of this form:
Cov(x,x)  Cov(x,y)  Cov(x,z)
Cov(y,x)  Cov(y,y)  Cov(y,z)
Cov(z,x)  Cov(z,y)  Cov(z,z)
What do the covariances that we have as entries of the matrix tell us about the correlations between
the variables? It is the sign that matters: if a covariance is positive, the two variables increase or decrease
together (they are correlated); if it is negative, one increases when the other decreases (they are inversely
correlated).
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the principal
components
4. Create a feature vector to decide which principal components to keep
5. Recast the data along the principal components axes
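The five steps above can be followed almost literally with NumPy. This is a bare-bones sketch on random data; the variable names and the choice to keep two components are illustrative assumptions, and in practice a library routine such as scikit-learn's PCA does the same job.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # illustrative data: 200 samples, p = 5 variables

# 1. Standardize: subtract the mean and divide by the standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the p x p covariance matrix.
C = np.cov(Z, rowvar=False)

# 3. Compute the eigenvectors and eigenvalues of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]     # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Create a feature vector: keep, say, the eigenvectors of the top 2 components.
feature_vector = eigvecs[:, :2]

# 5. Recast the data along the principal component axes.
X_pca = Z @ feature_vector
print(X_pca.shape)                    # (200, 2)
```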
5.9 PCA vs LDA
Here are some key differences between PCA and LDA:
Objective: PCA is an unsupervised technique that aims to maximize the variance of the data along the
principal components. The goal is to identify the directions that capture the most variation in the data. LDA,
on the other hand, is a supervised technique that aims to maximize the separation between different classes
in the data. The goal is to identify the directions that capture the most separation between the classes.
Supervision: PCA does not require any knowledge of the class labels of the data, while LDA requires
labeled data in order to learn the separation between the classes.
Dimensionality Reduction: PCA reduces the dimensionality of the data by projecting it onto a lower-
dimensional space, while LDA reduces the dimensionality of the data by creating a linear combination of the
features that maximizes the separation between the classes.
Output: PCA outputs principal components, which are linear combinations of the original features. These
principal components are orthogonal to each other and capture the most variation in the data. LDA outputs
discriminant functions, which are linear combinations of the original features that maximize the separation
between the classes.
Interpretation: PCA is often used for exploratory data analysis, as the principal components can be used to
visualize the data and identify patterns. LDA is often used for classification tasks, as the discriminant
functions can be used to separate the classes.
Performance: PCA is generally faster and more computationally efficient than LDA, as it does not require
labeled data. However, LDA may be more effective at capturing the most important information in the data
when class labels are available.
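The difference in supervision can be seen directly in the fitting calls: PCA is fit on the features alone, while LDA also needs the class labels. A brief sketch on the bundled Iris data, used here only as an example:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised, fit on X only; components capture the most variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, needs the labels y; directions maximize class separation.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both project the 4 features down to 2
```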