AI-unit-5

K-means Clustering

Ke Chen

COMP24111 Machine Learning


Outline
• Introduction

• K-means Algorithm

• Example

• K-means Demo

• Relevant Issues

• Conclusion

Introduction
• Partitioning Clustering Approach
– a typical clustering analysis approach that partitions the data set iteratively
– constructs a partition of the data set to produce several non-empty clusters
(usually, the number of clusters is given in advance)
– in principle, partitions are obtained by minimising the sum of squared distances within each cluster:

E = Σ_{i=1..K} Σ_{x ∈ C_i} ‖x − m_i‖²

• Given K, find a partition into K clusters that optimises the chosen partitioning criterion
– global optimal: exhaustively enumerate all partitions
– heuristic method: K-means algorithm
K-means algorithm (MacQueen’67): each cluster is represented by the centre
of the cluster and the algorithm converges to stable centres of clusters.

K-means Algorithm
• Given the number of clusters K, the K-means algorithm is carried out in three steps after initialisation:

Initialisation: set K seed points
1) Assign each object to the cluster with the nearest seed point
2) Compute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e., the mean point, of the cluster)
3) Go back to Step 1); stop when no new assignments are made
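
A minimal NumPy sketch of these three steps (the function name kmeans and the seeded random initialisation are illustrative choices, not from the slides; it assumes no cluster becomes empty during the iterations):

    import numpy as np

    def kmeans(X, K, max_iter=100, seed=0):
        """Plain K-means on an (n, d) array X; returns (labels, centres)."""
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), size=K, replace=False)]   # initialisation: seed points
        for _ in range(max_iter):
            # Step 1: assign each object to the cluster with the nearest centre
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 2: recompute each centre as the centroid (mean) of its cluster
            # (assumes no cluster becomes empty)
            new_centres = np.array([X[labels == k].mean(axis=0) for k in range(K)])
            # Step 3: stop when the centres no longer move, i.e. no new assignments
            if np.allclose(new_centres, centres):
                break
            centres = new_centres
        return labels, centres

On the four-medicine example that follows, kmeans(X, 2) converges to the clusters {A, B} and {C, D}.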

Example
• Problem
Suppose we have 4 types of medicines, each with two attributes (weight index and pH). Our goal is to group these objects into K = 2 groups of medicines.

Medicine   Weight index   pH
A          1              1
B          2              1
C          4              3
D          5              4

(figure: scatter plot of the four medicines; A and B lie close together in the lower left, C and D in the upper right)

Example
• Step 1: Use initial seed points for partitioning
c1 = A, c2 = B

Euclidean distance:

d(D, c1) = √((5 − 1)² + (4 − 1)²) = 5
d(D, c2) = √((5 − 2)² + (4 − 1)²) = 4.24

Assign each object to the cluster with the nearest seed point.
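
A short sketch of this assignment step on the example data (the dictionary points is illustrative):

    import numpy as np

    points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
    c1, c2 = np.array(points["A"]), np.array(points["B"])        # initial seeds

    for name, p in points.items():
        d1 = np.linalg.norm(np.array(p) - c1)
        d2 = np.linalg.norm(np.array(p) - c2)
        print(name, "-> cluster", 1 if d1 <= d2 else 2, f"(d1={d1:.2f}, d2={d2:.2f})")
    # D: d1 = 5.00, d2 = 4.24, so D joins cluster 2, as on the slide.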

Example
• Step 2: Compute new centroids of the current partition

Knowing the members of each cluster, we compute the new centroid of each group based on these memberships:

c1 = (1, 1)
c2 = ((2 + 4 + 5)/3, (1 + 3 + 4)/3) = (11/3, 8/3) ≈ (3.67, 2.67)
Example
• Step 2: Renew membership based on new centroids

Compute the distance of all objects to the new centroids, then assign each object to the cluster with the nearest new centroid.

Example
• Step 3: Repeat the first two steps until convergence

Knowing the members of each cluster, we compute the new centroid of each group based on these memberships:

c1 = ((1 + 2)/2, (1 + 1)/2) = (1.5, 1)
c2 = ((4 + 5)/2, (3 + 4)/2) = (4.5, 3.5)

Example
• Step 3: Repeat the first two steps until convergence

Compute the distance of all objects to the new centroids.

Stop: no assignments change, so the algorithm has converged.

K-means Demo
1. The user sets the number of clusters they'd like (e.g., K = 5)

K-means Demo
1. The user sets the number of clusters they'd like (e.g., K = 5)
2. Randomly guess K cluster centre locations

K-means Demo
1. The user sets the number of clusters they'd like (e.g., K = 5)
2. Randomly guess K cluster centre locations
3. Each data point finds out which centre it's closest to (thus each centre “owns” a set of data points)

K-means Demo
1. The user sets the number of clusters they'd like (e.g., K = 5)
2. Randomly guess K cluster centre locations
3. Each data point finds out which centre it's closest to (thus each centre “owns” a set of data points)
4. Each centre finds the centroid of the points it owns

K-means Demo
1. The user sets the number of clusters they'd like (e.g., K = 5)
2. Randomly guess K cluster centre locations
3. Each data point finds out which centre it's closest to (thus each centre “owns” a set of data points)
4. Each centre finds the centroid of the points it owns
5. …and jumps there

K-means Demo
1. The user sets the number of clusters they'd like (e.g., K = 5)
2. Randomly guess K cluster centre locations
3. Each data point finds out which centre it's closest to (thus each centre “owns” a set of data points)
4. Each centre finds the centroid of the points it owns
5. …and jumps there
6. …Repeat until terminated!

K-means Demo

Relevant Issues
• Efficient in computation
– O(tKn), where n is number of objects, K is number of clusters,
and t is number of iterations. Normally, K, t << n.
• Local optimum
– sensitive to initial seed points
– may converge to a local optimum that is an unwanted solution
• Other problems
– Need to specify K, the number of clusters, in advance
– Unable to handle noisy data and outliers (K-Medoids algorithm)
– Not suitable for discovering clusters with non-convex shapes
– Applicable only when a mean is defined; what about categorical data? (K-modes algorithm)

Relevant Issues
• Cluster Validity
– With different initial conditions, the K-means algorithm may result
in different partitions for a given data set.
– Which partition is the “best” one for the given data set?
– In theory, no answer to this question as there is no ground-truth
available in unsupervised learning
– Nevertheless, there are several cluster validity criteria to assess the
quality of clustering analysis from different perspectives
– A common cluster validity criterion is the ratio of the total
between-cluster to the total within-cluster distances
• Between-cluster distance (BCD): the distance between means of two clusters
• Within-cluster distance (WCD): sum of all distances between data points and the mean within a specific cluster
• A large ratio of BCD:WCD suggests good compactness inside clusters and
good separability among different clusters!
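
A rough sketch of this BCD:WCD criterion for two clusters (the function name validity_ratio is illustrative; with more clusters the between-cluster term is usually summed over all pairs of cluster means):

    import numpy as np

    def validity_ratio(X, labels):
        """Ratio of between-cluster distance to total within-cluster distance (two clusters)."""
        m0 = X[labels == 0].mean(axis=0)
        m1 = X[labels == 1].mean(axis=0)
        bcd = np.linalg.norm(m0 - m1)
        wcd = (np.linalg.norm(X[labels == 0] - m0, axis=1).sum()
               + np.linalg.norm(X[labels == 1] - m1, axis=1).sum())
        return bcd / wcd                            # larger is better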
Conclusion
• The K-means algorithm is a simple yet popular method for clustering analysis
• Its performance is determined by initialisation and
appropriate distance measure
• There are several variants of K-means to overcome its
weaknesses
– K-Medoids: resistance to noise and/or outliers
– K-Modes: extension to categorical data clustering analysis
– CLARA: dealing with large data sets
– Mixture models (EM algorithm): handling uncertainty of clusters

Introduction to Pattern Recognition

Machine Perception
• Build a machine that can recognize
patterns:
– Speech recognition

– Fingerprint identification

– OCR (Optical Character Recognition)

– DNA sequence identification



An Example
• “Sorting incoming Fish on a conveyor
according to species using optical
sensing”
(figure: the species to be separated, sea bass and salmon)

• Problem Analysis

– Set up a camera and take some sample images to extract features

• Length
• Lightness
• Width
• Number and shape of fins
• Position of the mouth, etc…

• This is the set of all suggested features to explore for use in our classifier!

• Preprocessing

– Use a segmentation operation to isolate fish from one another and from the background

• Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features

• The features are passed to a classifier



• Classification

– Select the length of the fish as a possible feature for discrimination

The length is a poor feature alone!

Select the lightness as a possible feature.

• Threshold decision boundary and cost relationship
• Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)

Task of decision theory



• Adopt the lightness and add the width of the fish
Fish: x^T = [x1, x2]   (x1 = lightness, x2 = width)

• We might add other features that are not correlated with the ones we already have. A precaution should be taken not to reduce the performance by adding such “noisy features”

• Ideally, the best decision boundary should be the one which provides optimal performance, such as in the following figure:

• However, our satisfaction is premature because the central aim of designing a classifier is to correctly classify novel input
Issue of generalization!

Pattern Recognition Systems


• Sensing

– Use of a transducer (camera or microphone)
– The PR system depends on the bandwidth, resolution, sensitivity and distortion of the transducer

• Segmentation and grouping

– Patterns should be well separated and should not overlap

• Feature extraction
– Discriminative features
– Invariant features with respect to translation, rotation and
scale.

• Classification
– Use a feature vector provided by a feature extractor to
assign the object to a category

• Post Processing
– Exploit context (input-dependent information other than the target pattern itself) to improve performance (e.g., reading T/-\E as THE and C/-\T as CAT)

The Design Cycle


• Data collection
• Feature Choice
• Model Choice
• Training
• Evaluation
• Computational Complexity

• Data Collection

– How do we know when we have collected an adequately large and representative set of examples for training and testing the system?

• Feature Choice

– Depends on the characteristics of the problem domain.
– Simple to extract, invariant to irrelevant
transformation, insensitive to noise.

• Model Choice

– Unsatisfied with the performance of our fish classifier, we may want to jump to another class of model

• Training

– Use data to determine the classifier.
– Many different procedures for training classifiers and choosing models exist.

• Evaluation

– Measure the error rate (or performance) and switch from one set of features to another
Supervised vs. Unsupervised Learning

 Supervised learning (classification)


 Supervision: The training data (observations,
measurements, etc.) are accompanied by labels
indicating the class of the observations
 New data is classified based on the training set
 Unsupervised learning (clustering)
 The class labels of the training data are unknown
 Given a set of measurements, observations, etc. with
the aim of establishing the existence of classes or
clusters in the data
Bayesian Classification: Why?
 A statistical classifier: performs probabilistic prediction,
i.e., predicts class membership probabilities
 Foundation: Based on Bayes’ Theorem.
 Performance: A simple Bayesian classifier, naïve Bayesian
classifier, has comparable performance with decision tree
and selected neural network classifiers
 Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is
correct — prior knowledge can be combined with observed
data
 Standard: Even when Bayesian methods are
computationally intractable, they can provide a standard
of optimal decision making against which other methods
can be measured
Bayesian Theorem: Basics

 Let X be a data sample (“evidence”): class label is unknown


 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X), the probability that
the hypothesis holds given the observed data sample X
 P(H) (prior probability), the initial probability
 E.g., X will buy computer, regardless of age, income, …
 P(X): probability that sample data is observed
 P(X|H), the probability of observing the sample X, given
that the hypothesis holds
 E.g., Given that X will buy computer, the prob. that X is
31..40, medium income
Bayesian Theorem

 Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

P(H | X) = P(X | H) P(H) / P(X)
 Informally, this can be written as
posterior = likelihood × prior / evidence
 Predicts X belongs to Ci iff the probability P(Ci|X) is the
highest among all the P(Ck|X) for all the k classes
 Practical difficulty: require initial knowledge of many
probabilities, significant computational cost
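
A tiny numeric illustration of the theorem; the prior and likelihood values below are made up purely for illustration:

    # Hypothetical inputs: prior P(H), likelihood P(X|H), and P(X|not H) for the evidence term.
    p_h, p_x_given_h, p_x_given_not_h = 0.3, 0.6, 0.2

    p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)   # total probability: P(X)
    p_h_given_x = p_x_given_h * p_h / p_x                   # Bayes' theorem
    print(round(p_h_given_x, 3))                            # 0.562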
Towards Naïve Bayesian Classifier
 Let D be a training set of tuples and their associated class
labels, and each tuple is represented by an n-D attribute
vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm.
 Classification is to derive the maximum posterior, i.e., the maximal P(Ci|X)
 This can be derived from Bayes’ theorem
P(Ci | X) = P(X | Ci) P(Ci) / P(X)
 Since P(X) is constant for all classes, only
P(Ci | X) ∝ P(X | Ci) P(Ci)
needs to be maximized

Derivation of Naïve Bayes Classifier
 A simplified assumption: attributes are conditionally
independent (i.e., no dependence relation between
attributes):

P(X | Ci) = ∏_{k=1}^{n} P(x_k | Ci) = P(x1 | Ci) × P(x2 | Ci) × … × P(xn | Ci)
 This greatly reduces the computation cost: Only counts
the class distribution

Naive Bayes: Example

 Consider PlayTennis, and a new instance
<Outlk = sun, Temp = cool, Humid = high, Wind = strong>
 Want to compute:

P(y) P(sun|y) P(cool|y) P(high|y) P(strong|y) = .005


P(n) P(sun|n) P(cool|n) P(high|n) P(strong|n) = .021
 vNB = n   (i.e., predict PlayTennis = no, since .021 > .005)
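
A sketch of how the two products above are formed; the conditional probabilities below are the usual relative-frequency estimates from the standard 14-example PlayTennis table (9 yes / 5 no) and should be treated as assumed inputs here:

    p_yes, p_no = 9/14, 5/14
    cond_yes = {"sun": 2/9, "cool": 3/9, "high": 3/9, "strong": 3/9}
    cond_no  = {"sun": 3/5, "cool": 1/5, "high": 4/5, "strong": 3/5}

    x = ["sun", "cool", "high", "strong"]
    score_yes, score_no = p_yes, p_no
    for value in x:
        score_yes *= cond_yes[value]
        score_no *= cond_no[value]

    print(round(score_yes, 3), round(score_no, 3))               # ~0.005 and ~0.021
    print("predict:", "yes" if score_yes > score_no else "no")   # -> no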

Nearest Neighbor

 Among the various methods of supervised statistical pattern recognition, the Nearest Neighbor rule
achieves consistently high performance, without a
priori assumptions about the distributions from which
the training examples are drawn.
 Training set involves both positive and negative cases.
 A new sample is classified by calculating the distance
to the nearest training case; the sign of that point
then determines the classification of the sample.

Contd.

 The k-NN classifier extends this idea by taking the k nearest points and assigning the sign of the
majority.
 It is common to select k small and odd to break
ties (typically 1, 3 or 5).
 Larger k values help reduce the effects of noisy
points within the training data set, and the choice
of k is often performed through cross-validation.

The k-Nearest Neighbor Algorithm

 All instances correspond to points in the n-D space


 The nearest neighbors are defined in terms of
Euclidean distance, dist(X1, X2)
 Target function could be discrete- or real- valued
 For discrete-valued, k-NN returns the most common
value among the k training examples nearest to xq

(figure: query point xq among training points labelled + and −; its k nearest neighbours vote on the label)
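
A minimal sketch of the discrete-valued k-NN rule described above (the function name knn_predict is illustrative):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, xq, k=3):
        """Return the majority label among the k training points nearest to xq."""
        dists = np.linalg.norm(X_train - xq, axis=1)       # Euclidean distances
        nearest = np.argsort(dists)[:k]                    # indices of the k nearest
        return Counter(y_train[i] for i in nearest).most_common(1)[0][0]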
Discussion on the k-NN Algorithm

 k-NN for real-valued prediction for a given unknown tuple


 Returns the mean values of the k nearest neighbors
 Distance-weighted nearest neighbor algorithm
 Weight the contribution of each of the k neighbors according to their distance to the query xq:
w = 1 / d(xq, xi)²
 Give greater weight to closer neighbors
 Robust to noisy data by averaging k-nearest neighbors
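
A sketch of the distance-weighted variant for real-valued prediction; the small eps term is an added assumption to avoid division by zero when xq coincides with a training point:

    import numpy as np

    def weighted_knn_regress(X_train, y_train, xq, k=3, eps=1e-9):
        """Weighted mean of the k nearest targets, with weights w = 1 / d(xq, xi)^2."""
        dists = np.linalg.norm(X_train - xq, axis=1)
        nearest = np.argsort(dists)[:k]
        w = 1.0 / (dists[nearest] ** 2 + eps)
        return np.sum(w * y_train[nearest]) / np.sum(w)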

Example
(worked k-NN example shown as a figure in the original slides)
SVM—Support Vector Machines
 A new classification method for both linear and nonlinear
data
 It uses a nonlinear mapping to transform the original
training data into a higher dimension
 With the new dimension, it searches for the linear optimal
separating hyperplane (i.e., “decision boundary”)
 With an appropriate nonlinear mapping to a sufficiently
high dimension, data from two classes can always be
separated by a hyperplane
 SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined by the
support vectors)
SVM—History and Applications
 Vapnik and colleagues (1995)—groundwork from Vapnik
& Chervonenkis’ statistical learning theory in 1960s
 Features: training can be slow but accuracy is high owing
to their ability to model complex nonlinear decision
boundaries (margin maximization)
 Used both for classification and prediction
 Applications:
 handwritten digit recognition, object recognition,
speaker identification, benchmarking time-series
prediction tests
SVM—General Philosophy

(figure: two separating hyperplanes, one with a small margin and one with a large margin; the support vectors lie on the margin boundaries)
SVM—Margins and Support Vectors



SVM—When Data Is Linearly Separable

Let data D be (X1, y1), …, (X|D|, y|D|), where Xi is the set of training tuples
associated with the class labels yi
There are infinite lines (hyperplanes) separating the two classes but we want to
find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., maximum
marginal hyperplane (MMH)

Why Is SVM Effective on High Dimensional Data?

 The complexity of a trained classifier is characterized by the number of support vectors rather than the dimensionality of the data
 The support vectors are the essential or critical training examples —
they lie closest to the decision boundary (MMH)
 If all other training examples are removed and the training is
repeated, the same separating hyperplane would be found
 The number of support vectors found can be used to compute an
(upper) bound on the expected error rate of the SVM classifier, which
is independent of the data dimensionality
 Thus, an SVM with a small number of support vectors can have good
generalization, even when the dimensionality of the data is high

Dimensionality Reduction Using
PCA/LDA
Dimensionality Reduction
• One approach to deal with high dimensional data is by reducing
their dimensionality.
• Project high dimensional data onto a lower dimensional sub-space
using linear or non-linear transformations.

Dimensionality Reduction
• Linear transformations are simple to compute and tractable.

Y U X (bi  u a )t
i i

kx1 kxd dx1 (k<<d)

• Classical (linear) approaches:


– Principal Component Analysis (PCA)
– Fisher Discriminant Analysis (FDA)

Principal Component Analysis (PCA)

• Each dimensionality reduction technique finds an appropriate transformation by satisfying certain criteria
(e.g., information loss, data discrimination, etc.)

• The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the
variation present in the dataset.

Principal Component Analysis (PCA)
• Find a basis in a low dimensional sub-space:
− Approximate vectors by projecting them in a low dimensional
sub-space:
(1) Original space representation:

x = a1 v1 + a2 v2 + … + aN vN

where v1, v2, …, vN is a basis in the original N-dimensional space

(2) Lower-dimensional sub-space representation:

x̂ = b1 u1 + b2 u2 + … + bK uK

where u1, u2, …, uK is a basis in the K-dimensional sub-space (K < N)

• Note: if K = N, then x̂ = x


Principal Component Analysis (PCA)
• Example (K=N):

Principal Component Analysis (PCA)
• Information loss
− Dimensionality reduction implies information loss !!
− PCA preserves as much information as possible:

min ‖x − x̂‖   (reconstruction error)


• What is the “best” lower dimensional sub-space?
The “best” low-dimensional space is centered at the sample mean
and has directions determined by the “best” eigenvectors of the
covariance matrix of the data x.

− By “best” eigenvectors we mean those corresponding to the largest eigenvalues (i.e., “principal components”).
− Since the covariance matrix is real and symmetric, these
eigenvectors are orthogonal and form a set of basis vectors.
Principal Component Analysis (PCA)
• Methodology
− Suppose x1, x2, ..., xM are N x 1 vectors

Principal Component Analysis (PCA)
• Methodology – cont.

b_i = u_i^T (x − x̄)
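
A compact sketch of the PCA methodology implied by these formulas: centre the data at the sample mean, take the eigenvectors of the covariance matrix with the largest eigenvalues, and project (the function name pca_fit_transform is illustrative):

    import numpy as np

    def pca_fit_transform(X, K):
        """X is (M, N); returns the (M, K) coefficients b_i = u_i^T (x - x_bar)."""
        x_bar = X.mean(axis=0)
        Xc = X - x_bar                               # centre at the sample mean
        cov = np.cov(Xc, rowvar=False)               # N x N covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)       # real symmetric -> orthogonal eigenvectors
        order = np.argsort(eigvals)[::-1][:K]        # "best" = largest eigenvalues
        U = eigvecs[:, order]                        # N x K matrix of principal components
        return Xc @ U                                # projections onto the sub-space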

Principal Component Analysis (PCA)
• Linear transformation implied by PCA
− The linear transformation R^N → R^K that performs the dimensionality reduction is

y = U^T (x − x̄),   where U = [u1 u2 … uK]
Principal Component Analysis (PCA)
• Geometric interpretation
− PCA projects the data along the directions where the data varies the
most.
− These directions are determined by the eigenvectors of the
covariance matrix corresponding to the largest eigenvalues.
− The magnitude of the eigenvalues corresponds to the variance of
the data along the eigenvector directions.

Principal Component Analysis (PCA)
• PCA and classification
− PCA is not always an optimal dimensionality-reduction procedure
for classification purposes.
• Multiple classes and PCA
− Suppose there are C classes in the training data.
− PCA is based on the sample covariance which characterizes the
scatter of the entire data set, irrespective of class-membership.
− The projection axes chosen by PCA might not provide good
discrimination power.

Linear Discriminant Analysis (LDA)

• What is the goal of LDA?


− Perform dimensionality reduction “while preserving as much of the
class discriminatory information as possible”.
− Seeks to find directions along which the classes are best separated.
− Takes into consideration the scatter within-classes but also the
scatter between-classes.
− More capable of distinguishing image variation due to identity from
variation due to other sources such as illumination and expression.

LDA

Linear Discriminant Analysis (LDA)
• Notation

S_w = Σ_{i=1}^{C} Σ_{j=1}^{M_i} (x_j − μ_i)(x_j − μ_i)^T

(each sub-matrix has rank 1 or less, i.e., it is an outer product of two vectors)
(S_b has at most rank C − 1)

Linear Discriminant Analysis (LDA)
• Methodology
projection matrix U:

y = U^T x
− LDA computes a transformation that maximizes the between-class
scatter while minimizing the within-class scatter:

max_U |U^T S_b U| / |U^T S_w U| = max |S̃_b| / |S̃_w|    (products of eigenvalues!)

S̃_b, S̃_w: scatter matrices of the projected data y
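
A sketch of this criterion in code: build S_w and S_b from labelled data and take the leading eigenvectors of S_w^-1 S_b (it assumes S_w is invertible; the function name lda_directions is illustrative):

    import numpy as np

    def lda_directions(X, y, K):
        """Return the K most discriminative projection directions (columns of U)."""
        overall_mean = X.mean(axis=0)
        Sw = np.zeros((X.shape[1], X.shape[1]))
        Sb = np.zeros_like(Sw)
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            Sw += (Xc - mu).T @ (Xc - mu)            # within-class scatter
            diff = (mu - overall_mean)[:, None]
            Sb += len(Xc) * (diff @ diff.T)          # between-class scatter
        eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
        order = np.argsort(eigvals.real)[::-1][:K]   # largest generalized eigenvalues
        return eigvecs[:, order].real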

Linear Discriminant Analysis (LDA)
• Is LDA always better than PCA?

− There has been a tendency in the computer vision community to prefer LDA over PCA.
− This is mainly because LDA deals directly with discrimination
between classes while PCA does not pay attention to the underlying
class structure.
− Main results of comparative studies:
1. When the training set is small, PCA can outperform LDA.
2. When the number of samples is large and representative for
each class, LDA outperforms PCA.

Support Vector Machine
Classification
• Every day, all the time, we classify things.
• E.g., crossing the street:
– Is there a car coming?
– At what speed?
– How far is it to the other side?
– Classification: Safe to walk or not!!!
Classification Problem?
• The goal of classification is to organize and
categorize data into distinct classes
– A model is first created based on the previous
data (training samples)
– This model is then used to classify new data
(unseen samples)
• A sample is characterized by a set of features
• Classification is essentially finding the best
boundary between classes
Classification Formulation
• Given
– an input space 
– a set of classes  ={ 1 , 2 ,..., c }
• the Classification Problem is
– to define a mapping f: g  where each x in
 is assigned to one class
• This mapping function is called a Decision Function
SVM
• An SVM model is a representation of the
examples as points in space, mapped so
that the examples of the separate
categories are divided by a clear gap that
is as wide as possible. New examples are
then mapped into that same space and
predicted to belong to a category based on
which side of the gap they fall on.
Linear Classifiers
f(x, w, b) = sign(w · x + b)
(figure: data points of the two classes, +1 and −1, with w · x + b > 0 on one side of a candidate line and w · x + b < 0 on the other)

How would you classify this data?
Linear Classifiers
f(x, w, b) = sign(w · x + b)
(figure: several candidate separating lines)

Any of these would be fine.. but which is best?
Linear Classifiers
f(x, w, b) = sign(w · x + b)
(figure: a poorly chosen line leaves a point misclassified into the +1 class)

How would you classify this data?
Classifier Margin
f(x, w, b) = sign(w · x + b)

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Maximum Margin
f(x, w, b) = sign(w · x + b)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM): the Linear SVM.

Support vectors are those datapoints that the margin pushes up against.
Support Vector Machine (SVM)
(figure: separating hyperplane with the support vectors on the margin boundaries; the margin is maximized)
• SVMs were introduced by Vapnik (1995).
• SVMs maximize the margin around the separating hyperplane.
• The decision function is fully specified by a subset of the training samples, the support vectors.
• Solving SVMs is a quadratic programming problem.
Types of SVM
• Linear SVM
Used when the datasets are linearly separable
• Non-Linear SVM
Used when the datasets are not linearly separable
(figures: a 1-D number line of linearly separable points, and one of points that are not linearly separable)
Linear SVM Mathematically
(figure: the margin hyperplanes pass through support vectors x⁺ and x⁻; M = margin width)

What we know:
• w · x⁺ + b = +1
• w · x⁻ + b = −1
• w · (x⁺ − x⁻) = 2

Therefore M = (x⁺ − x⁻) · w / ‖w‖ = 2 / ‖w‖
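
A hedged sketch using scikit-learn's linear SVC to recover w and b and the resulting margin 2/‖w‖; the toy data is made up for illustration:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 3.0], [5.0, 4.0]])   # toy, linearly separable
    y = np.array([-1, -1, +1, +1])

    clf = SVC(kernel="linear", C=1e6).fit(X, y)      # very large C ~ hard margin
    w, b = clf.coef_[0], clf.intercept_[0]
    print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))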
Non-linear SVMs
• Solution: mapping data to a higher-dimensional
space:

(figure: 1-D data that is not linearly separable on the x axis becomes separable after mapping each point x to (x, x²))
Non-linear SVMs: Feature spaces
• General idea: the original input space can always
be mapped to some higher-dimensional feature
space where the training set is separable:

Φ: x → φ(x)
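
A small sketch of this idea using the explicit map φ(x) = (x, x²) from the earlier figure; in practice the same effect is obtained with a non-linear kernel (e.g. SVC(kernel='rbf')):

    import numpy as np
    from sklearn.svm import SVC

    # 1-D points: the inner points form one class, the outer points the other.
    x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
    y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1])

    phi_x = np.column_stack([x, x ** 2])             # explicit feature map x -> (x, x^2)
    clf = SVC(kernel="linear").fit(phi_x, y)         # linearly separable in feature space

    new_x = np.array([-2.5, 0.5])
    print(clf.predict(np.column_stack([new_x, new_x ** 2])))   # -> [ 1 -1]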
Properties of SVM
• Ability to handle large feature spaces
• Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution. So it is a deterministic algorithm.
SVM Applications
• SVM has been used successfully in
many real-world problems
- text (and hypertext) categorization
- image classification
- bioinformatics (Protein classification,
Cancer classification)
- hand-written character recognition
Weakness of SVM
• It is sensitive to noise
- A relatively small number of mislabeled examples can
dramatically decrease the performance

• It only considers two classes


- how to do multi-class classification with SVM?
- Answer:
1) For m output classes, learn m SVMs
– SVM 1 learns “Output==1” vs “Output != 1”
– SVM 2 learns “Output==2” vs “Output != 2”
– :
– SVM m learns “Output==m” vs “Output != m”
2) To predict the output for a new input, run each SVM and choose the class whose SVM gives the most confident positive score (see the sketch below).
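
A sketch of this one-vs-rest scheme; scikit-learn also offers it directly via OneVsRestClassifier, so the manual loop here is purely illustrative:

    import numpy as np
    from sklearn.svm import SVC

    def one_vs_rest_fit(X, y):
        """Train one binary SVM per class: 'Output == m' vs 'Output != m'."""
        return {m: SVC(kernel="linear").fit(X, (y == m).astype(int)) for m in np.unique(y)}

    def one_vs_rest_predict(models, X_new):
        """Predict with each SVM and pick the class with the highest decision value."""
        classes = list(models)
        scores = np.stack([models[m].decision_function(X_new) for m in classes])
        return np.array(classes)[np.argmax(scores, axis=0)]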
