AI and ML Assessment Week 11

K-means clustering is an unsupervised machine learning algorithm that partitions data into K clusters by minimizing the squared distances between data points and their assigned cluster centers. It works by iteratively assigning each data point to its nearest cluster center and recalculating each cluster center as the mean of the points in that cluster until convergence. There are several variants of k-means clustering, including hard k-means, fuzzy k-means, and k-medoids. It is widely used for applications such as image segmentation, customer segmentation, anomaly detection, and document clustering.

Assessment 11

Artificial Intelligence and Machine Learning

Question 1:
K-Means Clustering:

Introduction
K-Means clustering is a popular unsupervised machine learning algorithm used for
partitioning a dataset into K distinct, non-overlapping subsets or clusters. The goal is
to group similar data points together, which makes it a useful technique for
exploratory data analysis and pattern discovery.

Basic Idea:
The algorithm works iteratively to assign each data point to one of K clusters based on
feature similarity. The mean (centroid) of the points in each cluster becomes the new
cluster center. This process is repeated until convergence, that is, until the
assignment of data points to clusters no longer changes.
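Written as a formula, the quantity these iterations reduce is the within-cluster sum of squared Euclidean distances. The symbols below (x_i for a data point, C_k for the set of points assigned to cluster k, mu_k for its centroid) are notation introduced here for illustration:

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{\lvert C_k \rvert} \sum_{x_i \in C_k} x_i
```

The assignment step cannot increase J (each point moves to its nearest centroid), and the update step cannot increase it either (the mean is the point that minimizes the sum of squared distances to a set of points), so J never increases from one iteration to the next. Because there are only finitely many possible assignments, the iterations eventually stop, possibly at a local rather than global minimum.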

Algorithm Steps:
1. Initialization: Randomly select K data points as initial cluster centroids.
2. Assignment: Assign each data point to the cluster whose centroid is closest (typically
using Euclidean distance).
3. Update: Recalculate each centroid as the mean of the points assigned to its cluster.
4. Repeat: Repeat steps 2 and 3 until convergence, i.e. until the assignments no longer
change (or the centroids move less than a small tolerance).
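The four steps can be sketched in plain Python as follows. This is a minimal illustration of the standard (Lloyd-style) iteration, not a reference implementation; the function names, the `seed` and `max_iter` parameters, the toy data, and the rule of keeping a centroid unchanged when its cluster becomes empty are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data, k, max_iter=100, seed=0):
    """Cluster `data` (a list of tuples) into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1 (initialization): k data points sampled without replacement.
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2 (assignment): label each point with the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        # Step 4 (convergence test): stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3 (update): move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(data, k=2)
print(centroids)
print(labels)
```

For this toy data the first three points end up sharing one label and the last three the other; which group is called cluster 0 depends on the random initialization.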

Types of K-Means Clustering:

1. Hard/Traditional K-Means:
- Each data point is assigned exclusively to one cluster.
- The assignment of points to clusters is based on the nearest centroid.

2. Fuzzy K-Means:
- Allows data points to belong to multiple clusters with different degrees of
membership.
- Assigns each point a membership value indicating its degree of belonging to each
cluster.
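The membership values can be made concrete with the standard fuzzy c-means update rules, the usual formulation of fuzzy K-Means. In the sketch below, the fuzzifier `m = 2`, the function names, and the toy 1-D data are assumptions for illustration: each point's memberships sum to 1, and each center is a membership-weighted mean of all points.

```python
import math


def memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster; the values sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs entirely to that cluster.
    for k, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == k else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists]


def update_centers(data, u, m=2.0):
    """Recompute each center as the mean of all points weighted by membership**m."""
    dim = len(data[0])
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append(
            tuple(sum(w * p[d] for w, p in zip(weights, data)) / total for d in range(dim))
        )
    return centers


# Toy 1-D data: two groups plus one point (5.0) halfway between them.
data = [(0.0,), (1.0,), (9.0,), (10.0,), (5.0,)]
centers = [(0.0,), (10.0,)]
for _ in range(20):  # alternate the two updates a fixed number of times
    u = [memberships(p, centers) for p in data]
    centers = update_centers(data, u)

for p, row in zip(data, u):
    print(p, [round(x, 3) for x in row])
```

The midpoint 5.0 ends up with roughly equal membership in both clusters, whereas hard K-Means would have to assign it entirely to one of them.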

3. K-Medoids:
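- Uses the medoid (the most centrally located data point in a cluster) instead of the mean
as the cluster center.
- Less sensitive to outliers than traditional K-Means, because the center must be an
actual data point rather than an average that extreme values can pull away.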
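The effect of an outlier on the two kinds of center can be seen on a small hypothetical cluster. The sketch below only illustrates how a medoid is chosen (the member with the smallest total distance to the other members, using Euclidean distance here as an assumption); a full K-Medoids algorithm such as PAM also alternates assignment and medoid-update steps, which are omitted.

```python
import math


def mean(points):
    """Coordinate-wise mean: the cluster center used by traditional K-Means."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def medoid(points):
    """The member point with the smallest total distance to all other members."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))


# Four nearby points plus one far-away outlier (hypothetical data).
cluster = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (50.0, 50.0)]
print("mean:  ", mean(cluster))    # dragged far toward the outlier
print("medoid:", medoid(cluster))  # stays among the four nearby points
```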

4. Kernel K-Means:
- Applies the kernel trick to implicitly map data into a higher-dimensional feature space,
computing distances there through a kernel function.
- Enables the clustering of data that is not linearly separable.
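The kernel trick works here because the squared distance from a mapped point φ(x_i) to the mean of a mapped cluster C can be expanded entirely in kernel values K(x, y) = φ(x)·φ(y), so φ itself is never computed. The sketch below applies that expansion in a kernel K-Means loop. The Gaussian (RBF) kernel with gamma = 1.0, the hypothetical 1-D data, and the deliberately imperfect starting labels are assumptions chosen for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_space_distance(i, members, K):
    """Squared distance from phi(x_i) to the mean of phi(x_j) over `members`:
    K[i][i] - 2/|C| * sum_j K[i][j] + 1/|C|^2 * sum_{j,l} K[j][l]."""
    n = len(members)
    cross = sum(K[i][j] for j in members)
    within = sum(K[j][l] for j in members for l in members)
    return K[i][i] - 2.0 * cross / n + within / (n * n)


def kernel_kmeans(data, labels, k, kernel=rbf_kernel, max_iter=50):
    """Refine an initial labelling with kernel K-Means; returns the final labels."""
    K = [[kernel(a, b) for b in data] for a in data]  # precomputed kernel matrix
    labels = list(labels)
    for _ in range(max_iter):
        # The update step is implicit: each cluster "mean" is represented by its members.
        clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
        non_empty = [c for c in range(k) if clusters[c]]
        # Assignment step, carried out in the implicit feature space.
        new_labels = [
            min(non_empty, key=lambda c: feature_space_distance(i, clusters[c], K))
            for i in range(len(data))
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical 1-D data: an inner group near 0 and an outer group near -5 and +5.
# No single threshold on the line separates "inner" from "outer".
data = [(-5.0,), (-4.8,), (-0.2,), (0.0,), (0.2,), (4.8,), (5.0,)]
start = [0, 0, 1, 1, 1, 1, 0]  # the point 4.8 starts in the wrong group
labels = kernel_kmeans(data, start, k=2)
print(labels)
```

With these settings the point 4.8 moves to the outer cluster, giving a partition (inner vs outer) that ordinary K-Means on the raw line could not represent, since the outer group is not a single interval.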

Advantages of K-Means:
- Simplicity and ease of implementation.
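- Scalable to large datasets: each iteration takes time proportional to the number of
points times K times the number of features.
- Applicable to a wide range of numeric data and application domains.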

Disadvantages of K-Means:
- Sensitive to the initial placement of centroids.
- Assumes spherical clusters of similar sizes.
- May converge to local optima.
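A common response to the sensitivity to initialization and to local optima is to restart from several random initializations and keep the run with the lowest within-cluster sum of squared distances (often called inertia). The plain-Python sketch below assumes the same simple Lloyd-style iteration; the function names, `n_runs`, the use of seeds 0 to n_runs - 1, and the toy data are hypothetical choices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_run(data, k, seed, max_iter=100):
    """One K-Means run from a random start; returns (inertia, centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Inertia: the within-cluster sum of squared distances that K-Means tries to minimize.
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))
    return inertia, centroids, labels


def best_of_n(data, k, n_runs=10):
    """Repeat K-Means with different seeds and keep the lowest-inertia result."""
    runs = [kmeans_run(data, k, seed) for seed in range(n_runs)]
    return min(runs, key=lambda run: run[0])


# Hypothetical data: three well-separated groups of three points each.
data = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
        (10.0, 0.0), (10.5, 0.0), (10.0, 0.5),
        (0.0, 10.0), (0.5, 10.0), (0.0, 10.5)]
inertias = [kmeans_run(data, 3, seed)[0] for seed in range(10)]
best_inertia, best_centroids, best_labels = best_of_n(data, 3)
print("inertia per run:", [round(x, 3) for x in inertias])
print("best inertia:   ", round(best_inertia, 3))
```

Fixing the seeds makes the runs reproducible; comparing the per-run inertias shows how much the outcome can depend on the starting centroids.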

Use Cases:
- Image segmentation.
- Customer segmentation in marketing.
- Anomaly detection in cybersecurity.
- Document clustering in natural language processing.

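Tips for Practical Use:

- Preprocess data to handle outliers, since a few extreme points can pull centroids away
from the bulk of a cluster.
- Scale features (for example, standardize each to zero mean and unit variance) so that
features with large numeric ranges do not dominate the Euclidean distance.
- Run the algorithm multiple times with different initializations and keep the result
with the lowest within-cluster sum of squared distances.
- Choose the number of clusters (K) carefully, using techniques such as the elbow method:
plot the within-cluster sum of squared distances against K and pick the value beyond
which adding clusters gives little further improvement.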
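Two of these tips can be sketched directly: standardizing each feature (z-scores) before clustering, and computing an elbow curve of inertia against K. The plain-Python sketch below is illustrative only; the helper names, the best-of-5 restarts per K, and the toy data (whose second feature is on a much larger numeric scale than the first) are assumptions.

```python
import random
import statistics


def standardize(data):
    """Rescale each feature to mean 0 and population standard deviation 1."""
    columns = list(zip(*data))
    means = [statistics.fmean(col) for col in columns]
    # A constant feature has standard deviation 0; divide it by 1 instead.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [tuple((x - mu) / sd for x, mu, sd in zip(row, means, stds)) for row in data]


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(data, k, seed, max_iter=100):
    """Within-cluster sum of squared distances after one K-Means run."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(data, labels))


def elbow_curve(data, max_k, n_runs=5):
    """(K, best inertia over n_runs random starts) for K = 1 .. max_k."""
    return [
        (k, min(kmeans_inertia(data, k, seed) for seed in range(n_runs)))
        for k in range(1, max_k + 1)
    ]


# Hypothetical data: the second feature is on a much larger numeric scale.
raw = [(1.0, 200.0), (1.2, 210.0), (0.9, 190.0),
       (5.0, 400.0), (5.1, 410.0), (4.9, 390.0),
       (9.0, 600.0), (9.2, 610.0), (8.9, 590.0)]
scaled = standardize(raw)
for k, value in elbow_curve(scaled, max_k=5):
    print(k, round(value, 3))
```

Because every cluster center ends up as the mean of its points, each value on the curve is at most the K = 1 value. The "elbow" is read off by eye from where the curve stops dropping steeply, so the method is a heuristic rather than a definitive rule for K.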

K-Means clustering is a versatile algorithm with various extensions, and its
effectiveness depends on the nature of the data and the problem at hand.
