ARTIFICIAL INTELLIGENCE LEC 5
ARTIFICIAL INTELLIGENCE (ADVANCED)
A course under the Centre of Excellence, an initiative of the Department of Science and Technology, Government of Bihar
GOVERNMENT POLYTECHNIC SAHARSA
Presenter: Prof. Shubham, HoD (Computer Science and Engineering)

Today's Class
➢ Introduction to Unsupervised Learning
➢ Introduction to Clustering
➢ Classification vs Clustering
➢ Types of Clustering

Unsupervised Machine Learning
Unsupervised learning is a type of machine learning in which the algorithm learns to recognize patterns in data without being explicitly trained on labeled examples. The goal of unsupervised learning is to discover the underlying structure or distribution of the data. There are two main types of unsupervised learning:
• Clustering: Clustering algorithms group similar data points together based on their characteristics. The goal is to identify groups, or clusters, of data points that are similar to each other while being distinct from other groups. Popular clustering algorithms include K-Means, hierarchical clustering, and DBSCAN.
• Dimensionality reduction: Dimensionality reduction algorithms reduce the number of input variables in a dataset while preserving as much of the original information as possible. This is useful for reducing the complexity of a dataset and making it easier to visualize and analyze. Popular dimensionality reduction algorithms include Principal Component Analysis (PCA), t-SNE, and autoencoders.

Clustering
Clustering is the task of dividing a set of data points into groups such that data points in the same group are more similar to each other than to data points in other groups. It is essentially a grouping of objects on the basis of the similarity and dissimilarity between them. "A way of grouping the data points into different clusters, each consisting of similar data points. Objects with possible similarities remain in a group that has little or no similarity with another group."
Clustering organizes data into classes and clusters such that the objects inside a cluster have high similarity to one another, while objects from two different clusters are dissimilar; the two clusters can be considered disjoint. The main goal of clustering is to divide the whole dataset into multiple clusters. Unlike in classification, the class labels of the objects are not known beforehand, so clustering belongs to unsupervised learning.

Classification vs Clustering
1. Classification is the process of assigning data to predefined class labels. Clustering is similar, but there are no predefined class labels.
2. Classification belongs to supervised learning, whereas clustering is known as unsupervised learning.
3. A training sample is provided in classification, while no training data is provided in clustering.
4. Classification is more complex than clustering, as there are many stages in the classification process, whereas clustering only performs grouping.
5. The output of classification is known in advance; the output of clustering is not.
6. Examples of classification methods are logistic regression, the Naive Bayes classifier, support vector machines, etc., whereas examples of clustering methods are the k-means algorithm, the fuzzy c-means algorithm, Gaussian mixture (EM) clustering, etc. (a small code sketch contrasting the two settings follows this list).
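The key practical difference (labels available versus not) can be seen in a minimal sketch. scikit-learn, the Iris dataset, LogisticRegression, and KMeans are illustrative choices here, not something the slides prescribe:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)  # X: features, y: class labels

# Classification (supervised): the model is trained on labeled examples.
clf = LogisticRegression(max_iter=200).fit(X, y)
print("Predicted labels:", clf.predict(X[:5]))

# Clustering (unsupervised): the algorithm sees only X, never y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", km.labels_[:5])
```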
Applications of Clustering in Different Fields
1. Marketing: It can be used to characterize and discover customer segments for marketing purposes.
2. Biology: It can be used for classification among different species of plants and animals.
3. Libraries: It is used to cluster books on the basis of topics and information.
4. Insurance: It is used to understand customers and their policies and to identify fraud.
5. City planning: It is used to group houses and to study their values based on their geographical locations and other factors.
6. Earthquake studies: By learning which areas are affected by earthquakes, we can determine the dangerous zones.
7. Image processing: Clustering can be used to group similar images together, classify images based on content, and identify patterns in image data.
8. Genetics: Clustering is used to group genes that have similar expression patterns and to identify gene networks that work together in biological processes.
9. Finance: Clustering is used to identify market segments based on customer behavior, identify patterns in stock market data, and analyze risk in investment portfolios.
10. Customer service: Clustering is used to group customer inquiries and complaints into categories, identify common issues, and develop targeted solutions.

Types of Clustering

Partitioning Clustering
This clustering method classifies the information into multiple groups based on the characteristics and similarity of the data. It divides the data into non-hierarchical groups and is also known as the centroid-based method. The most common example of partitioning clustering is the K-Means clustering algorithm. In this type, the dataset is divided into a set of K groups, where K defines the number of pre-defined groups. The cluster centers are placed in such a way that each data point is closer to its own cluster centroid than to any other cluster centroid.
Input:
K: the number of clusters into which the dataset is to be divided
D: a dataset containing N objects
Output: a set of K clusters
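As a rough illustration of this Input/Output contract, the sketch below partitions a small synthetic dataset into K clusters. scikit-learn's KMeans and the make_blobs helper are assumed for illustration only; the slides do not prescribe a library:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Input: D, a dataset of N = 300 objects, and K, the number of clusters.
D, _ = make_blobs(n_samples=300, centers=3, random_state=42)
K = 3

km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(D)

# Output: K clusters, given here as one centroid per cluster plus a
# cluster index for each of the N objects.
print("Centroids:\n", km.cluster_centers_)
print("First 10 assignments:", km.labels_[:10])
```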
K-Means Clustering
K-Means clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of points with similar properties. It allows us to cluster the data into different groups and offers a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, in which each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids. The k-means clustering algorithm mainly performs two tasks:
• Determine the best K center points (centroids) through an iterative process.
• Assign each data point to its closest center; the data points near a particular center form a cluster.

Steps for K-Means Clustering
The working of the K-Means algorithm is explained in the following steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as initial centroids (they need not come from the input dataset).
Step-3: Assign each data point to its closest centroid, which forms the K predefined clusters.
Step-4: Compute the variance and place a new centroid for each cluster.
Step-5: Repeat Step 3, i.e., reassign each data point to the new closest centroid.
Step-6: If any reassignment occurred, go to Step 4; otherwise go to FINISH.
Step-7: The model is ready.

How to choose the value of "K number of clusters" in K-Means clustering?
The performance of the K-Means clustering algorithm depends on the highly efficient clusters it forms, but choosing the optimal number of clusters is a big task. There are several ways to find the optimal number of clusters; here we discuss the most appropriate method to find the number of clusters, i.e., the value of K.
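The slides do not name the method at this point; a standard choice is the elbow method, which runs K-Means for a range of K values, plots the within-cluster sum of squares (WCSS), and picks the K at the bend of the curve. A minimal sketch, assuming scikit-learn, matplotlib, and synthetic data:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# WCSS (inertia) for K = 1..10: it always decreases as K grows, but the
# rate of improvement drops sharply at the "elbow", which suggests a good K.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS (inertia)")
plt.show()
```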
Hierarchical Clustering
• The clusters formed in this method have a tree-type structure based on the hierarchy, and new clusters are formed using previously formed ones. It is divided into two categories:
• Agglomerative (bottom-up approach)
• Divisive (top-down approach)
Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created. In this technique, the dataset is divided into clusters to create a tree-like structure, also called a dendrogram. The observations, or any number of clusters, can be selected by cutting the tree at the appropriate level. The hierarchical clustering technique has two approaches:
1. Agglomerative: a bottom-up approach, in which the algorithm starts by taking all data points as single clusters and merges them until one cluster is left.
2. Divisive: the reverse of the agglomerative algorithm, i.e., a top-down approach.

Agglomerative Clustering
• It follows the bottom-up approach: the algorithm considers each data point as a single cluster at the beginning and then starts combining the closest pairs of clusters. It does this until all clusters are merged into a single cluster that contains the entire dataset.
• Step-1: Treat each data point as a single cluster. If there are N data points, the number of clusters is also N.
• Step-2: Take the two closest data points or clusters and merge them into one cluster, leaving N-1 clusters.
• Step-3: Again take the two closest clusters and merge them into one cluster, leaving N-2 clusters.
• Step-4: Repeat Step 3 until only one cluster is left.

Divisive Clustering
• It is also known as the top-down approach. This algorithm likewise does not require the number of clusters to be pre-specified.
• Top-down clustering requires a method for splitting a cluster. It starts from a cluster containing the whole dataset and proceeds by splitting clusters recursively until each individual data point has been placed in its own singleton cluster.
• Divisive clustering is more complex than agglomerative clustering: it needs a flat clustering method as a "subroutine" to split each cluster until every data point is in its own singleton cluster.

Hierarchical Agglomerative vs Divisive Clustering
• Divisive clustering is more efficient if we do not generate a complete hierarchy all the way down to individual data leaves. The time complexity of naive agglomerative clustering is O(n³), because we exhaustively scan the N x N distance matrix dist_mat for the lowest distance in each of the N-1 iterations. Using a priority queue data structure, this complexity can be reduced to O(n² log n); with some further optimizations it can be brought down to O(n²). For divisive clustering, given a fixed number of top levels and an efficient flat algorithm such as K-Means as the subroutine, the running time is linear in the number of patterns and clusters.
• A divisive algorithm can also be more accurate. Agglomerative clustering makes decisions by considering local patterns or neighboring points without initially taking the global distribution of the data into account, and these early decisions cannot be undone, whereas divisive clustering takes the global distribution of the data into consideration when making top-level partitioning decisions.
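To make the bottom-up procedure concrete, here is a minimal sketch that builds the merge hierarchy and cuts it into a chosen number of clusters. SciPy's linkage/fcluster/dendrogram functions and Ward linkage are illustrative choices; the slides do not specify an implementation:

```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=42)

# Bottom-up: start with 30 singleton clusters and repeatedly merge the
# two closest clusters (Ward linkage) until one cluster remains.
Z = linkage(X, method="ward")

# Cutting the tree where it has 3 branches yields 3 cluster labels.
labels = fcluster(Z, t=3, criterion="maxclust")
print("Cluster labels:", labels)

# The dendrogram visualizes the full merge hierarchy.
dendrogram(Z)
plt.show()
```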