Unit IV covers ensemble techniques and unsupervised learning, focusing on model combination schemes like bagging, boosting, and stacking, as well as K-means clustering. The K-means algorithm is an iterative method that groups unlabeled data into predefined clusters based on similarity, with the goal of minimizing distances between data points and their centroids. The Elbow method is introduced as a technique to determine the optimal number of clusters by analyzing the Within Cluster Sum of Squares (WCSS).


UNIT IV: ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING
Combining multiple learners: Model combination schemes, Voting; Ensemble Learning: bagging, boosting, stacking; Unsupervised learning: K-means; Instance-Based Learning: KNN; Gaussian mixture models and Expectation Maximization.
Course Objective: Study ensemble and unsupervised learning algorithms.
Course Outcome CO4: Build ensemble and unsupervised models.
K-Means clustering algorithm
• K-Means Clustering is an Unsupervised Learning algorithm that groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters to be created in the process: if K=2 there will be two clusters, for K=3 there will be three clusters, and so on.
• It is an iterative, centroid-based algorithm that divides the unlabeled dataset into K clusters in such a way that each data point belongs to only one group of points with similar properties.
• Each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of distances between the data points and their corresponding centroids.
• Each cluster contains data points with some commonalities and is well separated from the other clusters.

Source: https://www.javatpoint.com/k-means-clustering-algorithm-in-machine-learning
• The algorithm determines the best positions for the K center points (centroids) through an iterative process.
• It assigns each data point to its closest centroid; the data points nearest a particular centroid form a cluster.
How does the K-Means Algorithm Work?

• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select K random points as the initial centroids (they need not be points from the input dataset).
• Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
• Step-4: Recompute the centroid of each cluster as the mean of the points assigned to it.
• Step-5: Repeat Step-3, i.e. reassign each data point to the new closest centroid.
• Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
• Step-7: The model is ready. (A code sketch of these steps follows below.)
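
To make the steps concrete, here is a minimal from-scratch sketch in NumPy; the dataset X, K=3, and the random initialization below are illustrative assumptions, not part of the original notes.

import numpy as np

def k_means(X, k, max_iters=100, seed=0):
    """Group the rows of X into k clusters, following Steps 1-7 above."""
    rng = np.random.default_rng(seed)
    # Step-2: pick k random points from the dataset as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step-3/5: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-4: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step-6: stop when the centroids no longer move (no reassignments change)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on synthetic 2-D data (assumed for illustration)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0, 5, 10)])
labels, centroids = k_means(X, k=3)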
Elbow method
• The Elbow method is one of the most popular ways to find the optimal number of clusters. This method uses the concept of the WCSS value.
• WCSS (Within Cluster Sum of Squares) defines the total variation within the clusters. For K = 3 clusters it can be written as

  WCSS = ∑_{Pi in Cluster1} distance(Pi, C1)² + ∑_{Pi in Cluster2} distance(Pi, C2)² + ∑_{Pi in Cluster3} distance(Pi, C3)²

  where ∑_{Pi in Cluster1} distance(Pi, C1)² is the sum of the squared distances between each data point in Cluster 1 and its centroid C1, and likewise for the other terms.
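
As a quick check of the formula, here is a small helper (the names wcss, X, labels, and centroids are hypothetical) that computes the WCSS for a given clustering:

import numpy as np

def wcss(X, labels, centroids):
    """Sum of squared distances between each data point and its cluster centroid."""
    total = 0.0
    for j, c in enumerate(centroids):
        diffs = X[labels == j] - c      # points assigned to cluster j, minus centroid Cj
        total += np.sum(diffs ** 2)     # squared distances summed over the cluster
    return total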
Elbow method - Steps
• Execute K-means clustering on the given dataset for different K values (ranging from 1 to 10).
• For each value of K, calculate the WCSS value.
• Plot a curve of the calculated WCSS values against the number of clusters K.
• The sharp point of bend, where the plot looks like an arm (the elbow), is taken as the best value of K.
• WCSS becomes zero at the endpoint of the plot, where every data point forms its own cluster. (A code sketch of these steps follows below.)
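
Below is a sketch of these steps using scikit-learn's KMeans, whose inertia_ attribute is exactly the WCSS; the dataset X is a placeholder assumption.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in (0, 5, 10)])  # placeholder data

wcss_values = []
for k in range(1, 11):                          # K values from 1 to 10, as in the steps above
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42)
    km.fit(X)
    wcss_values.append(km.inertia_)             # inertia_ = WCSS for this K

plt.plot(range(1, 11), wcss_values, marker='o')
plt.xlabel('Number of clusters K')
plt.ylabel('WCSS')
plt.title('Elbow method')
plt.show()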
Python Implementation
• Data Pre-processing
• Finding the optimal number of clusters using the elbow method
• Training the K-means algorithm on the training dataset
• Visualizing the clusters (a sketch covering these steps follows the imports below)

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
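
Continuing with the aliases above, here is a hedged sketch of the remaining steps; the synthetic DataFrame and its column names are assumptions standing in for a real dataset.

import numpy as nm                      # repeated from above for completeness
import matplotlib.pyplot as mtp
import pandas as pd
from sklearn.cluster import KMeans

# Data pre-processing: a synthetic stand-in for a real dataset loaded with pandas
rng = nm.random.default_rng(0)
df = pd.DataFrame(nm.vstack([rng.normal(loc=c, size=(50, 2)) for c in (0, 5, 10)]),
                  columns=['feature_1', 'feature_2'])
x = df.values

# Training the K-means algorithm on the dataset (K = 3 assumed chosen via the elbow method)
kmeans = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=42)
y_pred = kmeans.fit_predict(x)

# Visualizing the clusters and their centroids
for j in range(3):
    mtp.scatter(x[y_pred == j, 0], x[y_pred == j, 1], label=f'Cluster {j + 1}')
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=200, c='black', marker='x', label='Centroids')
mtp.xlabel('feature_1')
mtp.ylabel('feature_2')
mtp.legend()
mtp.show()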
[Figure: example output of the K-means algorithm, showing the data points, cluster centroids, and cluster assignments at the 3rd iteration.]
Assignment Question
