Clustering
k-means clustering is an algorithm that groups objects into K clusters based on their attributes (features), where K is a positive integer. The grouping is done by minimizing the sum of squared distances between each data point and the centroid of its cluster. The purpose of k-means clustering is thus to classify the data.
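The quantity being minimized can be written out explicitly. The notation here is an assumption and does not come from the slides: x_i is a data point, S_k is the set of points assigned to cluster k, and c_k is the centroid of that cluster.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in S_k} \lVert x_i - c_k \rVert^2
```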
k-means clustering
Determine the number of clusters K.
Take any K random objects as the initial centroids.
Iterate until stable (= no object moves to a different group):
    Determine the centroid coordinates.
    Determine the distance of each object to the centroids.
    Group the objects based on minimum distance.
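The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `kmeans`, its `seed` and `max_iter` parameters, and the rule of keeping the old centroid when a cluster becomes empty are all choices made here. The loop assigns objects first and updates centroids second, which is the same cycle as in the steps, entered at a different point.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster index
    of points[i].
    """
    rng = random.Random(seed)
    # Take k random objects as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Group each object with its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Stable: no object moved to a different group.
        if new_labels == labels:
            break
        labels = new_labels
        # Determine the centroid coordinates as the mean of each group.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the two centroids and one cluster index per point. Different seeds can give different results, because the outcome depends on the initial centroids.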
Example: Suppose we have 4 objects as our training data points, and each object has 2 attributes.
K = 2. Each object is a medicine, represented as one point with two attributes (X, Y).
1. Initial value of centroids: Suppose we use medicine A and medicine B as the first centroids. Let C1 and C2 denote the coordinates of the centroids; then C1 = (1,2) and C2 = (2,1).
2. Object-centroid distances: We calculate the distance from each cluster centroid to each object. Using Euclidean distance, we obtain the distance matrix at iteration 0.
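As a rough illustration of this step, the sketch below builds such a distance matrix with one row per centroid and one column per object. The coordinates of A and B follow from the initial centroids given in step 1. The coordinates of C and D are not given in the text, so the values used here are hypothetical placeholders.

```python
import math

# A and B match the initial centroids stated in the text.
# C and D are HYPOTHETICAL values, chosen only for illustration.
objects = {"A": (1, 2), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [(1, 2), (2, 1)]  # C1 and C2 at iteration 0

# Row i holds the Euclidean distances from centroid i to every object.
distance_matrix = [
    [math.dist(c, p) for p in objects.values()] for c in centroids
]

for name, row in zip(("C1", "C2"), distance_matrix):
    print(name, [round(d, 2) for d in row])
```

Each centroid has distance 0 to the object it was initialized from, so the first row starts with 0 and the second row has 0 in the column for B.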
3. Object clustering: Assign each object to the cluster whose centroid is at minimum distance.
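The assignment rule can be sketched as picking the nearest centroid for each object. As before, A and B follow from the initial centroids in step 1. The coordinates of C and D are hypothetical, so the resulting grouping is illustrative only. Cluster indices 0 and 1 stand for C1 and C2.

```python
import math

# A and B match the initial centroids stated in the text.
# C and D are HYPOTHETICAL values, chosen only for illustration.
objects = {"A": (1, 2), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [(1, 2), (2, 1)]  # C1 and C2

# For each object, pick the index of the centroid at minimum distance.
groups = {
    name: min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
    for name, p in objects.items()
}
print(groups)
```

With these placeholder coordinates, A stays with C1 and the other three objects fall to C2, because both C and D lie slightly closer to (2,1) than to (1,2).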
4. Determine Centroids
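In standard k-means, the new centroid of a group is the coordinate-wise mean of its members. The sketch below shows that computation. The helper name `centroid` and the two example groups are assumptions made for illustration. Only the coordinates (1,2) and (2,1) come from the text; the others are hypothetical.

```python
def centroid(members):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


# HYPOTHETICAL grouping: one object alone, three objects together.
group1 = [(1, 2)]
group2 = [(2, 1), (4, 3), (5, 4)]

new_c1 = centroid(group1)  # a single-member group keeps that member's position
new_c2 = centroid(group2)  # mean of the three members
print(new_c1, new_c2)
```

The updated centroids then feed back into the distance step, and the cycle repeats until no object changes group.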