
Cluster Analysis

• Clustering: the process of grouping a set of physical or abstract objects into classes of similar objects.
• In other words, it is the task of grouping a set of objects in such
a way that objects in the same group (called a cluster) are more
similar (in some sense) to each other than to those in other
groups (clusters).
• A cluster is a collection of data objects that are similar to one
another within the same cluster and are dissimilar to the objects
in other clusters.
• In cluster analysis, we first partition the set of data into groups based on data similarity, and then assign labels to the relatively small number of resulting groups.
• Unlike classification, clustering and unsupervised learning do not
rely on predefined classes and class-labeled training examples.
• Clustering is a form of learning by observation, rather than learning by examples.
Fig: A sample of 3 clusters in n-dimensional space
• Applications: market research, pattern recognition, data analysis, image processing, machine learning, information retrieval, bioinformatics, data compression, and computer graphics.
• In business: helps marketers discover distinct groups in their customer bases and characterize customer groups based on purchasing patterns.
• In biology: used to derive plant and animal taxonomies, categorize
genes with similar functionality, and gain insight into structures
inherent in populations
• Helps to classify documents on the Web for information discovery.
• Helps in the identification of areas of similar land use in an earth observation database, etc.
• Outlier detection: can detect values that are far away from any
cluster
• Typical requirements of clustering algorithms in data mining:
• Scalability:
Algorithms must scale to large databases that may contain millions of data objects.
• Ability to deal with different types of attributes: Should support not only
the interval-based (numerical) data, but also other types of data, such as
binary, categorical (nominal), and ordinal data, or mixtures of these data
types.
• Discovery of clusters with arbitrary shape:
Algorithms based on distance measures such as Euclidean or Manhattan distance tend to find spherical clusters with similar size and density.
• It is important to develop algorithms that can detect clusters of arbitrary
shape.
• Minimal requirements for domain knowledge to determine input parameters:
Many clustering algorithms require users to input certain parameters (such as the number of desired clusters), which are often difficult to determine. Algorithms should place minimal such demands on users.
• Ability to deal with noisy data:
Some clustering algorithms are sensitive to outliers or to missing, unknown, or erroneous data, which may lead to clusters of poor quality. Algorithms should therefore have a mechanism for dealing with such data in order to produce better-quality clusters.
• Incremental clustering and insensitivity to the order of input records:
Algorithms should be able to incorporate newly inserted data (i.e., database updates) into existing clustering structures, and should be insensitive to the order of the input.
• High dimensionality:
A database or a data warehouse can contain several dimensions or attributes. Algorithms should support data objects in high-dimensional space.
• Constraint-based clustering:
Real-world applications may need to perform clustering under various kinds of constraints. Algorithms must be capable of satisfying user-specified constraints.
• Interpretability and usability:
Clustering results should be interpretable, comprehensible, and usable.
Categorization of Clustering Methods

• Partitioning methods:
A partitioning method constructs k partitions from a given database of n objects or data tuples, where each partition represents a cluster and k <= n.
• i.e., it classifies the data into k groups, which together satisfy the following requirements:
• (1) each group must contain at least one object
• (2) each object must belong to exactly one group
• Given k, the number of partitions to construct, a partitioning method
creates an initial partitioning
• Then uses an iterative relocation technique that attempts to improve the
partitioning by moving objects from one group to another.
• General criterion of a good partitioning: objects in the same cluster are “close” or related to each other, whereas objects of different clusters are “far apart” or very different.
• There are various kinds of other criteria for judging the quality of
partitions.
• There are two types of clustering methods based on partitioning
• 1. k-means algorithm: where each cluster is represented by the mean
value of the objects in the cluster
• 2. k-medoids algorithm: where each cluster is represented by one of the
objects located near the center of the cluster.
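As an illustration of the iterative relocation idea behind k-means, here is a minimal sketch in Python with NumPy; the data, seed, and iteration cap are illustrative assumptions, not part of the note.

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    # Minimal k-means sketch: alternate assignment and mean-update steps.
    rng = np.random.default_rng(seed)
    # Pick k distinct objects as the initial cluster means.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Assignment: attach each object to its nearest mean (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Relocation: recompute each cluster's mean from its current members
        # (assumes no cluster becomes empty, fine for this small example).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged: means stopped moving
            break
        centers = new_centers
    return labels, centers

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels, centers = k_means(X, k=2)
print(labels)    # cluster index per object
print(centers)   # each cluster represented by the mean of its objects
```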
• Hierarchical methods:
• A hierarchical method creates a hierarchical decomposition of the given
set of data objects
• A hierarchical method can be classified as being either agglomerative or
divisive, based on how the hierarchical decomposition is formed.
• Agglomerative Methods:
• Also called the bottom-up approach: starts with each object forming a separate group.
• It successively merges the objects or groups that are close to one another,
until all of the groups are merged into one (the topmost level of the
hierarchy), or until a termination condition holds.
• Divisive Methods:
• also called the top-down approach, starts with all of the objects in the same
cluster.
• In each successive iteration, a cluster is split up into smaller clusters, until
eventually each object is in one cluster, or until a termination condition holds.
• Hierarchical methods suffer from the fact that once a step (merge or split) is done,
it can never be undone.
• Hierarchical clustering methods: BIRCH
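A minimal bottom-up sketch, assuming single-linkage distance and toy data (both my choices, not the note's): start with singleton clusters and repeatedly merge the closest pair; note that a merge, once made, is never undone.

```python
import numpy as np

def agglomerative(X, k):
    # Minimal single-linkage agglomerative sketch: merge until k clusters remain.
    clusters = [[i] for i in range(len(X))]   # each object starts as its own group
    while len(clusters) > k:
        best = (0, 1, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters.pop(b)   # merge the closest pair; never undone
    return clusters

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]])
print(agglomerative(X, k=3))   # e.g. [[0, 1], [2, 3], [4]]
```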
• Density-based methods:
• These methods are based on the notion of density (the number of objects or data points in a region).
• The general idea is to continue growing a given cluster as long as the density in the “neighborhood” exceeds some threshold.
• i.e., for each data point within a given cluster, the neighborhood of a given radius has to contain at least a minimum number of points.
• Such a method can be used to filter out noise (outliers) and discover clusters of arbitrary shape.
• Density-based clustering methods: DBSCAN and its extension, OPTICS
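As a sketch of how this looks in practice, here is DBSCAN via scikit-learn, where eps is the neighborhood radius and min_samples the minimum number of points required in it; the data and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],   # a dense region
              [5.0, 5.0], [5.1, 5.1], [4.9, 5.0],   # another dense region
              [9.0, 0.0]])                           # an isolated point
# Grow clusters while each point's radius-eps neighborhood holds >= min_samples points.
model = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(model.labels_)   # label -1 marks noise (the isolated point)
```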
• Grid-based methods
• Quantize the object space into a finite number of cells that form a
grid structure.
• All of the clustering operations are performed on the grid structure
(i.e., on the quantized space).
• The main advantage of this approach is its fast processing time,
which is typically independent of the number of data objects and
dependent only on the number of cells in each dimension in the
quantized space.
• Example: STING
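A minimal sketch of the quantization step, assuming 2-D data already scaled to [0, 1) (my assumption, for brevity): each point maps to a cell index, and subsequent clustering work operates on cell counts rather than on individual objects.

```python
import numpy as np
from collections import Counter

X = np.array([[0.10, 0.20], [0.15, 0.25], [0.90, 0.90],
              [0.85, 0.95], [0.50, 0.10]])
n_cells = 4                                  # cells per dimension
cells = np.floor(X * n_cells).astype(int)    # quantize each point to a grid cell
counts = Counter(map(tuple, cells))          # objects per cell
print(counts)   # dense cells are candidate cluster regions
```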
• Model-based methods
• Hypothesize a model for each of the clusters and
find the best fit of the data to the given model
• A model-based algorithm may locate clusters by
constructing a density function that reflects the
spatial distribution of the data points
• It also leads to a way of automatically
determining the number of clusters based on
standard statistics
• Takes “noise” or outliers into account, thus yielding robust clustering methods.
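As one concrete instance of the model-based idea, a mixture of Gaussians can be fitted and cluster membership read from the fitted model; the sketch below uses scikit-learn's GaussianMixture with illustrative data of my choosing.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [6.0, 6.0], [6.2, 5.9], [5.8, 6.1]])
# Hypothesize a 2-component Gaussian model and find the best fit to the data.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.predict(X))   # cluster assignment per object
print(gm.means_)       # fitted component centers (the cluster models)
```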
K-Medoid Clustering

• In k-medoid clustering, each cluster is represented by one of the objects located near the center of the cluster.
• Steps:
1. Choose k, the number of clusters.
2. Select k objects in D at random as the initial representative objects or seeds.
3. Assign each data point (object) to the closest representative object, which forms k clusters.
4. Randomly select a nonrepresentative object, O'.
5. Compute the cost change, S, of swapping a representative object Oj with O'.
6. If S < 0, swap Oj with O' to form the new set of k representative objects.
7. Repeat steps 3 to 6 until no change occurs.
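A minimal Python sketch of these steps, assuming Manhattan distance (as in the worked example below) and illustrative data; this shows the PAM-style swap test, not a tuned implementation.

```python
import numpy as np

def manhattan(p, q):
    return np.abs(p - q).sum()   # |a - c| + |b - d|

def total_cost(X, medoids):
    # Step 3's cost: each object's distance to its nearest medoid, summed.
    return sum(min(manhattan(x, X[m]) for m in medoids) for x in X)

def k_medoids(X, k, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))   # step 2
    cost = total_cost(X, medoids)
    improved = True
    while improved:                                   # steps 3-7
        improved = False
        for m in range(k):                            # each representative Oj
            for o in range(len(X)):                   # each nonrepresentative O'
                if o in medoids:
                    continue
                candidate = medoids[:m] + [o] + medoids[m + 1:]
                new_cost = total_cost(X, candidate)
                if new_cost - cost < 0:               # step 6: S < 0, accept swap
                    medoids, cost, improved = candidate, new_cost, True
    return medoids, cost

X = np.array([[1, 2], [2, 2], [2, 3], [8, 8], [8, 9], [9, 8]])
print(k_medoids(X, k=2))   # medoid indices and the final total cost
```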
• Use the k-medoid clustering algorithm to divide the given data into two clusters, and compute the representative data points for the clusters.
(1) We have k = 2, the number of clusters.
(2) Initialize random medoids c1 = (3, 4) and c2 = (7, 4).
(3) Calculate the distance (cost) to associate each data object with its nearest medoid.
• The distance (cost) between a medoid ci = (a, b) and an object Xi = (c, d) is calculated as cost = |a − c| + |b − d|, which is the Manhattan distance.
• Calculate the total cost S1 as the sum of each data object's cost to the medoid of its cluster, which gives S1 = 20.
• This divides the data into two clusters: Cluster 1: {X1, X2, X3, X4} and Cluster 2: {X5, X6, X7, X8, X9, X10}.
(4) Select another random medoid O' = (7, 3). The medoids are now c1 = (3, 4) and O' = (7, 3); calculate the new total cost S'.
• We have S' = 22, so the cost of swapping the medoid from the old (c2) to the new (O') is S = S' − S1 = 22 − 20 = 2 > 0.
• The positive value indicates a higher cost if we swap to the new medoid, so moving to O' would be a bad idea: the previous choice was good, and the algorithm terminates here.
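The data table for this example is not reproduced in the note; the ten points below are an assumption (the data commonly paired with these medoids in textbook treatments), chosen because they reproduce the quoted costs exactly. The snippet checks the swap decision.

```python
import numpy as np

# Assumed data points X1..X10 (the note's data table is not shown);
# they are consistent with the quoted costs S1 = 20 and S' = 22.
X = np.array([[2, 6], [3, 4], [3, 8], [4, 7], [6, 2],
              [6, 4], [7, 3], [7, 4], [8, 5], [7, 6]])

def total_cost(X, medoids):
    # Manhattan distance |a - c| + |b - d| to the nearest medoid, summed.
    return sum(min(np.abs(x - np.array(m)).sum() for m in medoids) for x in X)

S1 = total_cost(X, [(3, 4), (7, 4)])   # initial medoids c1, c2
S2 = total_cost(X, [(3, 4), (7, 3)])   # candidate swap: c2 -> O' = (7, 3)
print(S1, S2, S2 - S1)                 # 20 22 2 -> S > 0, keep the old medoids
```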
Assignment
For the given data, partition it into two clusters using the k-medoid algorithm.
