
Unit – VI

Clustering
 1. What is Clustering: Clustering is unsupervised learning, i.e.,
there are no predefined classes; it forms groups of similar objects
that differ significantly from objects in other groups.
 The process of grouping a set of physical or abstract objects into
classes of similar objects is called clustering.
 Clustering is “the process of organizing objects into groups whose
members are similar in some way”.
 The cluster property is that intra-cluster distances are minimized
and inter-cluster distances are maximized.
 A cluster is a collection of data objects that are
 similar to one another within the same cluster and
 dissimilar to the objects in other clusters.
 Group points into clusters based on how “near” they are to one
another.
 Outlier detection and cluster analysis are very useful for fraud
detection, etc., and can be performed by statistical, distance-based,
or deviation-based approaches.

 What is Good Clustering: A good clustering method will produce
high-quality clusters with
 high intra-class similarity and
 low inter-class similarity.

 Requirements of Clustering in DM
 Scalability
 Ability to deal with different types of attributes
 High dimensionality
 Ability to deal with noise and outliers
 Interpretability
 Discovery of clusters with arbitrary shape
 Why Clustering
 Scalability
 Ability to deal with different types of attributes
 Discovery of clusters with arbitrary shape
 Minimal requirements for domain knowledge to determine input
parameters
 Ability to deal with noisy data
 Incremental clustering and insensitivity to the order of input records
 High dimensionality
 Constraint-based clustering
 Interpretability and usability
Types of data in Clustering Analysis
1. Nominal variables
2. Ordinal variables
3. Categorical Data
4. Labeled Variables
5. Unlabeled Variables
6. Numerical Values
7. Interval-scaled variables
8. Binary variables
9. Ratio variables
10. Variables of mixed types
1. Nominal Variables allow for only qualitative classification. A
nominal variable is a generalization of the binary variable in that
it can take more than two states, e.g., red, yellow, blue, green.
Ex: {male, female}, {yes, no}, {true, false}
2. Ordinal Data are categorical data where there is a logical
ordering to the categories.
Ex: 1 = Strongly disagree; 2 = Disagree; 3 = Neutral; 4 = Agree
3. Categorical Data represent types of data which may be divided
into groups.
Ex: race, sex, age group, and educational level.
4. Labeled Data share the class labels or the generative
distribution of the data.
5. Unlabeled Data do not share the class labels or the
generative distribution of the labeled data.
6. Numerical Values: The data values consist of numbers and
only numbers. Ex: 1, 2, 3, 4, …
7. Interval-Scaled Variables: These are variables whose values fall
into ranges of numbers.
 Ex: 10-20, 20-30, 30-40, …
8. Binary Variables: These are variables that take combinations of
0 and 1.
 Ex: 1, 0, 001, 010, …
9. Ratio-Scaled Variables: A positive measurement on a nonlinear
scale, approximately at an exponential scale, such as Ae^(Bt) or
Ae^(-Bt).
 Ex: 1/2, 2/4, 4/8, …
10. Variables of Mixed Types: A database may contain all six
types of variables: symmetric binary, asymmetric binary,
nominal, ordinal, interval, and ratio.
 Ex: 11121A1201
Similarity Measure
 Distances are normally used to measure the similarity or
dissimilarity between two data objects.
 Euclidean distance: Euclidean distance is the straight-line
distance between two points in Euclidean space. For two
p-dimensional objects i and j it is computed as
d(i, j) = sqrt(|x_i1 - x_j1|^2 + |x_i2 - x_j2|^2 + … + |x_ip - x_jp|^2)
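As an illustration, here is a minimal Python sketch of this distance
computation (the function name euclidean_distance and the example
points are ours, not from the slides):

import math

def euclidean_distance(x, y):
    # Straight-line distance between two equal-length numeric vectors
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Example: distance between two 2-dimensional objects
print(euclidean_distance((1, 2), (4, 6)))  # 5.0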
Major Clustering Approaches

1. Partitioning Methods
2. Hierarchical Methods
3. Density-Based Methods
4. Grid-Based Methods
5. Model-Based Clustering Methods
6. Clustering High-Dimensional Data
7. Constraint-Based Cluster Analysis
8. Outlier Analysis
Fig: Major Clustering Approaches (a diagram arranging the eight
approaches listed above: partitioning, hierarchical, density-based,
grid-based, model-based, high-dimensional, constraint-based, and
outlier analysis).
1. Partitioning approach: Construct various partitions and
then evaluate them by some criterion, e.g., minimizing the
sum of squared errors.
 A partitioning method first creates an initial set of k
partitions, where parameter k is the number of partitions to
construct.
 It then uses an iterative relocation technique that attempts
to improve the partitioning by moving objects from one
group to another.
 Typical partitioning methods include
1. k-means,
2. k-medoids,
3. CLARANS (Clustering Large Applications based upon
RANdomized Search)
2. Hierarchical approach: A hierarchical method creates a
hierarchical decomposition of the given set of data
objects.
 The method can be classified as being either
agglomerative (bottom-up) or divisive (top-down), based
on how the hierarchical decomposition is formed.
 To compensate for the rigidity of merge or split, the
quality of hierarchical agglomeration can be improved by
analyzing object linkages at each hierarchical
partitioning, or by first performing microclustering and
then operating on the microclusters with other clustering
techniques, such as iterative relocation.
 Hierarchical methods create a hierarchical decomposition of
the set of data (or objects) using some criterion and are
classified into the following (a small usage sketch follows
this list):
1. DIANA (DIvisive ANAlysis)
2. AGNES (AGglomerative NESting)
3. BIRCH (Balanced Iterative Reducing and Clustering using
Hierarchies)
4. ROCK (RObust Clustering using linKs)
5. CAMELEON (Hierarchical clustering algorithm that uses
dynamic modeling)
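As a hedged illustration of the agglomerative (bottom-up) style, a
minimal sketch using SciPy's hierarchical-clustering routines
(assuming SciPy is installed; the sample points are made up):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six 2-D points forming two visually separate groups (made-up data)
X = np.array([[1, 1], [1.5, 1], [1, 1.5],
              [8, 8], [8.5, 8], [8, 8.5]])

# Agglomerative (bottom-up) merging with average linkage
Z = linkage(X, method="average")

# Cut the resulting dendrogram into 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g., [1 1 1 2 2 2]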
3. Density-based approach: A density-based method
clusters objects based on the notion of density.
 It either grows clusters according to the density of
neighborhood objects or according to some density
function.
 OPTICS is a density-based method that generates an
augmented ordering of the clustering structure of the
data.
 Methods based on connectivity and density functions are
classified into the following (a brief usage sketch follows
this list):
1. DBSCAN (Density-Based Spatial Clustering of Applications
with Noise)
2. OPTICS (Ordering Points To Identify the Clustering
Structure)
3. DENCLUE (DENsity-based CLUstEring)
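A brief DBSCAN usage sketch with scikit-learn (assuming scikit-learn
is available; the eps and min_samples values and the data are
illustrative, not from the slides):

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one far-away outlier (made-up data)
X = np.array([[1, 1], [1.1, 1], [1, 1.2],
              [5, 5], [5.1, 5], [5, 5.2],
              [20, 20]])

db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)  # noise points are labeled -1, e.g., [0 0 0 1 1 1 -1]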
4. Grid-based approach: A grid-based method first
quantizes the object space into a finite number of cells
that form a grid structure, and then performs clustering
on the grid structure.
 STING is a typical example of a grid-based method
based on statistical information stored in grid cells.
 WaveCluster and CLIQUE are two clustering algorithms
that are both grid-based and density-based.
 Methods based on a multiple-level granularity structure are
1. STING (STatistical INformation Grid)
2. WAVECLUSTER (Clustering Using Wavelet
Transformation)
5. Model-based approach: A model-based method
hypothesizes a model for each of the clusters and
finds the best fit of the data to that model.
 Examples of model-based clustering include the EM
algorithm, conceptual clustering, and neural network
approaches.
 A model is hypothesized for each of the clusters, and the
method tries to find the best fit of the data to that model.
Typical methods are
1. EM (Expectation-Maximization)
2. SOM (Self-Organizing feature Maps)
3. COBWEB (Conceptual Clustering)
6. Clustering High-Dimensional Data: Clustering high-
dimensional data is of crucial importance, because in many
advanced applications, data objects such as text documents and
microarray data are high-dimensional in nature.
 There are three typical methods to handle high-dimensional data
sets: dimension-growth subspace clustering, represented by
CLIQUE; dimension-reduction projected clustering, represented by
PROCLUS; and frequent pattern-based clustering, represented by
pCluster.
 Typical methods are
1. CLIQUE (CLustering In QUEst)
2. PROCLUS (PROjected CLUStering)
3. pCluster (frequent pattern-based clustering)
7. Constraint-Based Cluster Analysis: A constraint-based
clustering method groups objects based on application-dependent or
user-specified constraints.
 Ex: clustering with the existence of obstacle objects and clustering
under user-specified constraints are typical methods of constraint-
based clustering, as is semi-supervised clustering based on “weak”
supervision.
 Typical methods are
1. Clustering with Obstacle Objects
2. User-Constrained Cluster Analysis
3. Semi-Supervised Cluster Analysis
8. Outlier Analysis: Outlier analysis methods are very useful for
fraud detection, customized marketing, medical analysis, and many
other tasks.
 Computer-based outlier analysis methods typically follow either a
statistical distribution-based approach, a distance-based approach, a
density-based local outlier detection approach, or a deviation-based
approach.
 Typical methods are listed below (a small statistical sketch
follows the list):
1. Statistical Distribution-Based Outlier Detection
2. Distance-Based Outlier Detection
3. Density-Based Local Outlier Detection
4. Deviation-Based Outlier Detection
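As a hedged illustration of the statistical distribution-based
approach, a minimal z-score sketch in Python (the threshold value is
a common convention, not something the slides specify):

import statistics

def zscore_outliers(values, threshold=2.0):
    # Flag values more than `threshold` standard deviations from the mean
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

data = [10, 11, 9, 10, 12, 11, 10, 95]  # 95 is an obvious outlier
print(zscore_outliers(data))  # [95]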
Examples of Clustering Applications
1. Marketing
2. Land use
3. Insurance
4. City-planning
5. Earth-quake studies

Issues of Clustering
1. Accuracy,
2. Training time,
3. Robustness,
4. Interpretability, and
5. Scalability
6. Find top ‘n’ outlier points
Applications
 Pattern Recognition
 Spatial Data Analysis
 GIS (Geographical Information System)
 Image Processing
 WWW (World Wide Web)
 Cluster Weblog data to discover groups
 Credit approval
 Target marketing
 Medical diagnosis
 Fraud detection
 Weather forecasting
 Stock Marketing
2. Classification Vs Clustering

Classification:
1. Classification is the process of organizing objects into
predefined groups (classes).
2. It is Supervised Learning.
3. Predefined classes.
4. Has labels for some points.
5. Requires a “rule” that will accurately assign labels to new
points.
6. (Figure in the original slides.)
7. Classification approaches are of two types:
   1. Predictive Classification
   2. Descriptive Classification
8. Issues of Classification:
   1. Accuracy
   2. Training time
   3. Robustness
   4. Interpretability
   5. Scalability
9. Examples:
   1. Marketing
   2. Land use
   3. Insurance
   4. City-planning
   5. Earth-quake studies
10. Techniques:
   1. Decision Tree
   2. Bayesian classification
   3. Rule-based classification
   4. Prediction and accuracy and error measures
11. Applications:
   1. Credit approval
   2. Target marketing
   3. Medical diagnosis
   4. Fraud detection
   5. Weather forecasting
   6. Stock Marketing

Clustering:
1. Clustering is “the process of organizing objects into groups
whose members are similar in some way”.
2. It is Unsupervised Learning.
3. No predefined classes.
4. No labels in clustering.
5. Groups points into clusters based on how “near” they are to one
another.
6. (Figure in the original slides.)
7. Clustering approaches are eight:
   1. Partitioning Methods
   2. Hierarchical Methods
   3. Density-Based Methods
   4. Grid-Based Methods
   5. Model-Based Clustering Methods
   6. Clustering High-Dimensional Data
   7. Constraint-Based Cluster Analysis
   8. Outlier Analysis
8. Issues of Clustering:
   1. Accuracy
   2. Training time
   3. Robustness
   4. Interpretability
   5. Scalability
   6. Find top ‘n’ outlier points
9. Examples:
   1. Marketing
   2. Land use
   3. Insurance
   4. City-planning
   5. Earth-quake studies
10. Techniques:
   1. k-Means Clustering
   2. DIANA (DIvisive ANAlysis)
   3. AGNES (AGglomerative NESting)
   4. BIRCH (Balanced Iterative Reducing and Clustering using
      Hierarchies)
   5. DBSCAN (Density-Based Spatial Clustering of Applications
      with Noise)
11. Applications:
   1. Pattern Recognition
   2. Spatial Data Analysis
   3. WWW (World Wide Web)
   4. Weblog data to discover groups
   5. Credit approval
   6. Target marketing
   7. Medical diagnosis
   8. Fraud detection
   9. Weather forecasting
   10. Stock Marketing
3. k-Means Clustering
 It is a partitioning cluster technique.
 It is a centroid-based cluster technique.
 Clustering is unsupervised learning, i.e., there are no predefined
classes; it forms groups of similar objects that differ significantly
from other objects.
 Distances between objects are measured with the Euclidean
distance:
d(i, j) = sqrt(|x_i1 - x_j1|^2 + |x_i2 - x_j2|^2 + … + |x_ip - x_jp|^2)
 It creates the first k initial clusters (k = number of
clusters needed) from the dataset by choosing k rows of
data randomly from the dataset.
 The k-means algorithm then calculates the arithmetic mean
of each cluster formed in the dataset.
 Square-error criterion:
E = Σ_{i=1..k} Σ_{p ∈ C_i} |p - m_i|^2
 Where
– E is the sum of the square error for all objects in the data set;
– p is the point in space representing a given object; and
– m_i is the mean of cluster C_i (both p and m_i are
multidimensional).
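A minimal sketch of this criterion in Python (the function name sse
and the clusters-as-lists-of-points layout are our illustration
choices):

def sse(clusters, means):
    # Sum of squared Euclidean errors over all clusters.
    # clusters: list of clusters, each a list of points (tuples)
    # means:    list of cluster means m_i, aligned with clusters
    total = 0.0
    for cluster, m in zip(clusters, means):
        for p in cluster:
            total += sum((pj - mj) ** 2 for pj, mj in zip(p, m))
    return total

# Example: two 1-D clusters with means 2 and 8
print(sse([[(1,), (3,)], [(7,), (9,)]], [(2,), (8,)]))  # 4.0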
 Algorithm: The k-means algorithm for partitioning,
where each cluster’s center is represented by the mean
value of the objects in the cluster.
 Input:
– k: the number of clusters,
– D: a data set containing n objects.
 Output: A set of k clusters.
k-Means Clustering Method

Example
(k = 2: arbitrarily choose k objects as the initial cluster centers;
assign each object to the most similar center; update the cluster
means; reassign the objects and update the means again, repeating
until the assignments no longer change.)

Fig: Clustering of a set of objects based on the k-means
method. (The mean of each cluster is marked by a “+”.)
Steps
The k-means algorithm is implemented in four steps:
1. Partition objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters
of the current partition (the centroid is the center, i.e.,
mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed
point.
4. Go back to Step 2; stop when no more new
assignments occur.
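Putting the four steps together, a minimal from-scratch k-means
sketch in Python (random initial centers as described above; the
seed and the sample data are illustrative):

import random

def kmeans(points, k, max_iters=100):
    # Plain k-means: random initial centers, relocate until stable
    centers = random.sample(points, k)  # Step 1: choose k rows at random
    for _ in range(max_iters):
        # Step 3: assign each object to the cluster with the nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Step 2: recompute each centroid as the arithmetic mean
        new_centers = [tuple(sum(d) / len(c) for d in zip(*c)) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # Step 4: stop when no assignment changes
            break
        centers = new_centers
    return clusters, centers

random.seed(0)
data = [(1, 1), (1.5, 2), (3, 4), (8, 8), (8.5, 9), (9, 8)]
clusters, centers = kmeans(data, k=2)
print(centers)  # for a typical run, one center per natural group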
Comments on the k-Means Method
 Strength: Relatively efficient: O(tkn), where n is # objects, k is #
clusters, and t is # iterations. Normally, k, t << n.
– Comparing: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k))

 Comment: Often terminates at a local optimum. The global
optimum may be found using techniques such as deterministic
annealing and genetic algorithms.

 Weakness
– Applicable only when mean is defined, then what about
categorical data?
– Need to specify k, the number of clusters, in advance
– Unable to handle noisy data and outliers
– Not suitable to discover clusters with non-convex shapes
Variations of the k-Means Method
 A few variants of the k-means method differ in
– selection of the initial k means,
– dissimilarity calculations, and
– strategies to calculate cluster means.
 Handling categorical data: k-modes
– Replacing means of clusters with modes
– Using new dissimilarity measures to deal with categorical
objects
– Using a frequency-based method to update modes of clusters
– A mixture of categorical and numerical data: the k-prototype
method
What is the Problem of the k-Means Method?
 The k-means algorithm is sensitive to outliers!
– An object with an extremely large value may substantially
distort the distribution of the data.
 k-Medoids: Instead of taking the mean value of the objects in a
cluster as a reference point, medoids can be used; a medoid is the
most centrally located object in a cluster.
(The original slide illustrates this with two scatter plots
contrasting the cluster mean with the medoid on the same data.)
The k-Medoids Clustering Method
 Find representative objects, called medoids, in clusters.
 PAM (Partitioning Around Medoids)
– starts from an initial set of medoids and iteratively
replaces one of the medoids by one of the non-medoids
if it improves the total distance of the resulting
clustering;
– PAM works effectively for small data sets, but does not
scale well for large data sets.
 CLARA (Kaufmann & Rousseeuw): a sampling-based extension
of PAM.
 CLARANS: randomized sampling, with focusing and spatial
data structures.
A Typical k-Medoids Algorithm (PAM)

(Total cost = 20 initially. k = 2: arbitrarily choose k objects as
the initial medoids; assign each remaining object to the nearest
medoid. Then loop: randomly select a nonmedoid object O_random;
compute the total cost of swapping a medoid with O_random (26 in
the example); if the quality is improved, perform the swap; repeat
until no change.)

Fig: A typical k-medoids algorithm (PAM).
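A hedged sketch of PAM's central step, the swap-cost test, in Python
(the function names and the 1-D data are ours; a full PAM repeats
this over all medoid/non-medoid pairs until no swap helps):

def total_cost(points, medoids):
    # Total distance of each point to its nearest medoid (1-D for brevity)
    return sum(min(abs(p - m) for m in medoids) for p in points)

def try_swap(points, medoids, out, candidate):
    # Swap medoid `out` for `candidate` only if it lowers the total cost
    trial = [candidate if m == out else m for m in medoids]
    return trial if total_cost(points, trial) < total_cost(points, medoids) else medoids

data = [1, 2, 3, 8, 9, 10]
medoids = [1, 10]
medoids = try_swap(data, medoids, out=1, candidate=2)
print(medoids)  # [2, 10] (2 is more central for the left group)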