ML Module IV
In agglomerative hierarchical clustering (AHC), we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but they differ in how they work: unlike the K-Means algorithm, there is no requirement to predetermine the number of clusters.
The working of the AHC algorithm can be explained using the below steps:
Step-1: Treat each data point as a single cluster. Let's say there are N data points, so the
number of clusters will also be N.
Step-2: Take two closest data points or clusters and merge them to form one cluster. So, there
will now be N-1 clusters.
Step-3: Again, take the two closest clusters and merge them together to form one cluster.
There will be N-2 clusters.
Step-4: Repeat Step 3 until only one cluster is left.
Step-5: Once all the clusters are combined into one big cluster, develop the dendrogram to
divide the clusters as per the problem.
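For illustration, the following minimal Python sketch performs agglomerative clustering with SciPy on a small made-up dataset; the data values, the Ward linkage choice, and the cut into two clusters are assumptions for demonstration only.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Illustrative 2-D data: N points, so we start with N singleton clusters
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# Repeatedly merge the two closest clusters (Ward linkage) until one cluster remains
Z = linkage(X, method="ward")

# The linkage matrix records every merge; cutting the tree yields the desired clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# The dendrogram structure can be computed (and drawn, if a plotting backend is available)
tree = dendrogram(Z, no_plot=True)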
Divisive Hierarchical Clustering
Divisive hierarchical clustering is a method used in machine learning and data analysis to
create a hierarchical decomposition of a dataset. Unlike agglomerative clustering, which
starts with each sample as its own cluster and merges them, divisive clustering begins with a
single cluster containing all the samples and recursively divides it into smaller clusters.
Step-1: Start with a single cluster: All the data points begin in one cluster that contains the entire dataset.
Step-2: Divide the cluster: The algorithm identifies subgroups within the cluster. This division can be based on various distance metrics or similarity measures.
Step-3: Recursive division: The clusters are divided further into smaller clusters, iteratively
creating a hierarchical structure.
Step-4: Stop condition: The algorithm continues dividing until a stopping condition is met.
This could be a predefined number of clusters, reaching a certain threshold of similarity, or
any other criterion specific to the problem.
One challenge with divisive clustering is determining the optimal number of clusters or
deciding when to stop the recursive division. This aspect often involves domain knowledge or
using techniques like examining dendrograms or evaluating cluster quality metrics to make
an informed decision.
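Divisive clustering is less commonly available as a ready-made library routine, so the sketch below only illustrates the top-down idea by recursively splitting a cluster with 2-means; the splitting rule, the depth-based stop condition, and the data are illustrative assumptions rather than a standard implementation.

import numpy as np
from sklearn.cluster import KMeans

def divisive_cluster(X, indices=None, depth=0, max_depth=2):
    """Recursively split a cluster in two until max_depth (the stop condition)."""
    if indices is None:
        indices = np.arange(len(X))           # Step 1: one cluster holding all points
    if depth == max_depth or len(indices) < 2:
        return [indices]                      # Step 4: stop condition reached
    # Step 2: divide the cluster into two subgroups (here: 2-means)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[indices])
    left, right = indices[labels == 0], indices[labels == 1]
    # Step 3: recursive division of each subgroup
    return (divisive_cluster(X, left, depth + 1, max_depth) +
            divisive_cluster(X, right, depth + 1, max_depth))

X = np.random.RandomState(0).rand(40, 2)      # illustrative data
for cluster in divisive_cluster(X):
    print(cluster)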
BIRCH
Basic clustering algorithms like K-means and agglomerative clustering are the most commonly used clustering algorithms. But when clustering very large datasets, advanced algorithms such as BIRCH and DBSCAN are useful for performing precise clustering efficiently. Moreover, BIRCH is attractive because of its easy implementation. BIRCH first condenses the dataset into small summaries and then clusters those summaries; it does not cluster the dataset directly. That is why BIRCH is often used with other clustering algorithms: after the summary is built, it can be clustered by another clustering algorithm.
Stages of BIRCH
BIRCH is often used to complement other clustering algorithms by creating a summary of the
dataset that the other clustering algorithm can now use. However, BIRCH has one major drawback: it can only process metric attributes. A metric attribute is an attribute whose values
can be represented in Euclidean space, i.e., no categorical attributes should be present. The
BIRCH clustering algorithm consists of two stages:
1. Building the CF Tree: BIRCH summarizes large datasets into smaller, dense regions
called Clustering Feature (CF) entries. Formally, a Clustering Feature entry is defined
as an ordered triple (N, LS, SS) where 'N' is the number of data points in the cluster,
'LS' is the linear sum of the data points, and 'SS' is the squared sum of the data points
in the cluster. A CF entry can be composed of other CF entries. Optionally, we can condense this initial CF tree into a smaller CF tree.
2. Global Clustering: Applies an existing clustering algorithm on the leaves of the CF
tree. A CF tree is a tree where each leaf node contains a sub-cluster. Every entry in a
CF tree contains a pointer to a child node, and a CF entry made up of the sum of CF
entries in the child nodes. Optionally, we can refine these clusters.
Algorithm
The BIRCH algorithm builds a tree structure over the given data called the Clustering Feature tree (CF tree). The algorithm is based on this CF (clustering feature) tree: it uses a tree-structured summary of the data to create clusters.
In the CF tree, the algorithm compresses the data into sets of CF nodes. Nodes that contain several sub-clusters are called CF subclusters; these CF subclusters are situated in the non-terminal (non-leaf) CF nodes.
The CF tree is a height-balanced tree that gathers and manages clustering features and holds the necessary information about the given data for further hierarchical clustering. This avoids the need to work with the whole input data. In the tree, each cluster of data points is represented by a CF, a triple of three numbers (N, LS, SS).
The BIRCH algorithm proceeds in four main phases:
o Scanning data (building the CF tree).
o Condensing data (resizing the CF tree, optional).
o Global clustering.
o Refining clusters (optional).
Two of these four phases (condensing the data and refining the clusters) are optional; they come into play when more accuracy is required. Scanning data is essentially loading the data into the model: the algorithm scans the whole dataset and fits it into the CF tree.
In condensing, it resets and resizes the data for better fitting into the CF tree. In global
clustering, it sends CF trees for clustering using existing clustering algorithms. Finally,
refining fixes the problem of CF trees where the same valued points are assigned to different
leaf nodes.
BIRCH clustering achieves its high efficiency by clever use of a small set of summary
statistics to represent a larger set of data points. These summary statistics constitute a CF and
represent a sufficient substitute for the actual data for clustering purposes.
A CF is a set of three summary statistics representing a set of data points in a single cluster.
These statistics are as follows:
o Count [The number of data points in the cluster]
o Linear Sum [The sum of the individual coordinates. This is a measure of the location of the cluster]
o Squared Sum [The sum of the squared coordinates. This is a measure of the spread of the cluster]
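The following short sketch shows how a CF triple can be built, merged, and used to recover a sub-cluster's centroid and radius without revisiting the raw points; the function names and sample points are illustrative.

import numpy as np

def make_cf(points):
    """Build a clustering feature (N, LS, SS) from an array of points."""
    points = np.asarray(points, dtype=float)
    return len(points), points.sum(axis=0), (points ** 2).sum()

def merge_cf(cf_a, cf_b):
    """CFs are additive: merging two sub-clusters just adds their statistics."""
    return cf_a[0] + cf_b[0], cf_a[1] + cf_b[1], cf_a[2] + cf_b[2]

def centroid_and_radius(cf):
    """Location and spread of a sub-cluster, recovered from (N, LS, SS) alone."""
    n, ls, ss = cf
    centroid = ls / n
    radius = np.sqrt(max(ss / n - np.dot(centroid, centroid), 0.0))
    return centroid, radius

cf1 = make_cf([[1.0, 1.0], [2.0, 1.0]])
cf2 = make_cf([[1.5, 2.0]])
print(centroid_and_radius(merge_cf(cf1, cf2)))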
CF Tree
The building process of the CF Tree can be summarized in the following steps, such as:
Step 1: For each given record, BIRCH compares the location of that record with the location
of each CF in the root node, using either the linear sum or the mean of the CF. BIRCH passes
the incoming record to the root node CF closest to the incoming record.
Step 2: The record then descends down to the non-leaf child nodes of the root node CF
selected in step 1. BIRCH compares the location of the record with the location of each non-
leaf CF. BIRCH passes the incoming record to the non-leaf node CF closest to the incoming
record.
Step 3: The record then descends down to the leaf child nodes of the non-leaf node CF
selected in step 2. BIRCH compares the location of the record with the location of each leaf.
BIRCH tentatively passes the incoming record to the leaf closest to the incoming record.
Step 4: One of two cases then applies:
(i) If the radius of the chosen leaf, including the new record, does not exceed the threshold T, then the incoming record is assigned to that leaf. The leaf and its parent CFs are updated to account for the new data point.
(ii) If the radius of the chosen leaf, including the new record, exceeds the threshold T, then a new leaf is formed, consisting of the incoming record only. The parent CFs are updated to account for the new data point.
If step 4(ii) is executed and the maximum number of leaf entries, L, is already present in the leaf node, the leaf node is split into two leaf nodes. If the parent node is full, split the parent node, and so on. The most distant leaf node CFs are used as leaf node seeds, with the remaining CFs being assigned to whichever leaf node is closer. Note that the radius of a cluster may be calculated even without knowing the data points, as long as we have the count N, the linear sum LS, and the squared sum SS. This allows BIRCH to evaluate whether a given data point belongs to a particular sub-cluster without scanning the original data set.
Once the CF tree is built, any existing clustering algorithm may be applied to the sub-clusters
(the CF leaf nodes) to combine these sub-clusters into clusters. The task of clustering
becomes much easier as the number of sub-clusters is much less than the number of data
points. When a new data value is added, these statistics may be easily updated, thus making
the computation more efficient.
Parameters of BIRCH
There are three parameters in this algorithm that need to be tuned. Unlike K-means, the optimal number of clusters (k) need not be supplied by the user, as the algorithm can determine it.
o Threshold: The maximum radius that a sub-cluster in a leaf node of the CF tree may have; a new point is absorbed into a leaf sub-cluster only if the radius stays below this threshold.
o Branching factor: The maximum number of CF entries (sub-clusters) that each node of the CF tree can hold; exceeding it causes the node to be split.
o Number of clusters: The number of clusters produced in the final global clustering step applied to the leaf sub-clusters.
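As a usage sketch, scikit-learn's Birch estimator exposes these three parameters directly; the data and parameter values below are chosen only for illustration.

import numpy as np
from sklearn.cluster import Birch

X = np.random.RandomState(0).rand(500, 2)   # illustrative data

# threshold: maximum radius of a leaf sub-cluster
# branching_factor: maximum number of CF entries per node
# n_clusters: global clustering applied to the leaf sub-clusters
model = Birch(threshold=0.1, branching_factor=50, n_clusters=3)
labels = model.fit_predict(X)

print(len(model.subcluster_centers_), "leaf sub-clusters ->", labels[:10])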
Advantages of BIRCH
It is local in that each clustering decision is made without scanning all data points and
existing clusters. It exploits the observation that the data space is not usually uniformly
occupied, and not every data point is equally important.
It uses available memory to derive the finest possible sub-clusters while minimizing I/O
costs. It is also an incremental method that does not require the whole data set in advance.
Density-based Clustering
Density-based clustering refers to methods that build clusters from a local cluster criterion, such as density-connected points. In this tutorial, we will discuss density-based clustering with examples.
Eps: Eps (ε) is the radius that defines the neighborhood of a point.
MinPts: MinPts refers to the minimum number of points required in the Eps neighborhood of a point for it to be a core point.
Directly density reachable:
A point i is considered directly density reachable from a point k with respect to Eps and MinPts if i belongs to NEps(k) and k is a core point, i.e., its Eps neighborhood contains at least MinPts points.
Density reachable:
A point i is density reachable from a point j with respect to Eps and MinPts if there is a chain of points i1, ..., in with i1 = j and in = i such that each point i(k+1) is directly density reachable from i(k).
Density connected:
A point i is said to be density connected to a point j with respect to Eps and MinPts if there is a point o such that both i and j are density reachable from o with respect to Eps and MinPts.
Suppose a set of objects is denoted by D'. We can say that an object i is directly density reachable from the object j only if it is located within the ε neighborhood of j, and j is a core object.
An object i is density reachable from the object j with respect to ε and MinPts in a given set of objects D' only if there is a chain of objects i1, ..., in with i1 = j and in = i such that each i(k+1) is directly density reachable from i(k) with respect to ε and MinPts.
An object i is density connected to an object j with respect to ε and MinPts in a given set of objects D' only if there is an object o belonging to D' such that both i and j are density reachable from o with respect to ε and MinPts.
o It is a one-scan method: the database needs to be examined only once.
DBSCAN
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise. It relies on a density-based notion of a cluster and can identify clusters of arbitrary shape in a spatial database that contains noise and outliers.
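A brief usage sketch with scikit-learn's DBSCAN, where eps plays the role of Eps and min_samples the role of MinPts; the two-moons data and parameter values are illustrative, and points labelled -1 are treated as noise.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: clusters of arbitrary shape with some noise
X, _ = make_moons(n_samples=300, noise=0.06, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("noise points:", int(np.sum(labels == -1)))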
OPTICS
OPTICS stands for Ordering Points To Identify the Clustering Structure. It produces an ordering of the database with respect to its density-based clustering structure. This cluster ordering contains information equivalent to the density-based clusterings obtained over a wide range of parameter settings. OPTICS is beneficial for both automatic and interactive cluster analysis, including determining an intrinsic clustering structure.
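As an illustrative sketch, scikit-learn's OPTICS estimator exposes the cluster ordering and reachability distances described above; the data and parameter values are assumptions for demonstration.

from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Blobs of different densities: a wide range of eps values would be needed by DBSCAN
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=[0.5, 1.0, 2.0], random_state=0)

optics = OPTICS(min_samples=10).fit(X)

# The ordering and reachability distances encode density structure over many eps values
print(optics.ordering_[:10])
print(optics.reachability_[optics.ordering_][:10])
print(set(optics.labels_))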
DENCLUE
DENCLUE stands for DENsity-based CLUstEring. It models the overall density of the data as the sum of influence (kernel) functions of the individual points and defines clusters through the local maxima (density attractors) of this density function.
Undirected Graphical Models
Nodes (Vertices): Each node in the graph represents a random variable, and the entire set of nodes represents the set of random variables in the model.
Edges (Links): The edges between nodes encode relationships or dependencies. Absence of
an edge between nodes implies conditional independence given the rest of the variables.
Global Normalization: Unlike directed models, undirected models often require global
normalization due to the lack of a clear flow of probability as in directed models.
Applications:
Computer Vision: Object recognition, scene understanding, and spatial modeling of visual
data.
Undirected graphical models offer a flexible framework for representing complex
dependencies among variables, enabling various applications in different domains while
capturing probabilistic relationships among them.
Variable Elimination
Key Concepts:
1. Initialization: Express the joint distribution as a product of factors (one per node of the graph, or per clique in an undirected model), restricted by any observed evidence.
2. Variable Ordering: Choose an elimination order for the variables. This order significantly impacts the efficiency of the algorithm.
3. Elimination of Variables: For each variable in the chosen order, multiply all factors that mention it and sum the variable out, producing a new, smaller factor.
4. Final Computation: Once all variables are eliminated, the remaining factors represent the desired probabilities (up to normalization).
In a worked example, each variable (for instance S, then D) is eliminated in turn following step 3.
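To make the multiply-and-sum-out step concrete, here is a compact sketch of variable elimination over discrete factors; the tiny two-variable model (A and B) and its probability tables are hypothetical and used purely for illustration.

import itertools
from collections import defaultdict

# A factor is (variables, table), where table maps assignments (tuples) to values.
def multiply(f1, f2):
    v1, t1 = f1
    v2, t2 = f2
    variables = v1 + [v for v in v2 if v not in v1]
    table = {}
    for assign in itertools.product([0, 1], repeat=len(variables)):  # binary variables assumed
        a = dict(zip(variables, assign))
        table[assign] = t1[tuple(a[v] for v in v1)] * t2[tuple(a[v] for v in v2)]
    return variables, table

def sum_out(factor, var):
    variables, table = factor
    idx = variables.index(var)
    new_vars = [v for v in variables if v != var]
    new_table = defaultdict(float)
    for assign, value in table.items():
        new_table[tuple(x for i, x in enumerate(assign) if i != idx)] += value
    return new_vars, dict(new_table)

# Hypothetical binary model: P(A) and P(B | A); eliminating A yields P(B).
p_a = (["A"], {(0,): 0.6, (1,): 0.4})
p_b_given_a = (["A", "B"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7})

p_b = sum_out(multiply(p_a, p_b_given_a), "A")
print(p_b)   # approximately P(B=0)=0.66, P(B=1)=0.34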
In the realm of data analysis and machine learning, accurate grouping of similar entities is crucial for efficient decision-making processes. While traditional clustering algorithms have certain limitations, CURE (Clustering Using Representatives) offers a unique approach with a creative methodology. In this article, we will dive into a detailed exploration of the CURE algorithm, providing a clear understanding along with an illustrative example. As technology advances and big data proliferates across industries, harnessing the power of algorithms like CURE is essential for extracting valuable knowledge from complex datasets, improving decision-making, and discovering hidden patterns within vast, information-rich environments.
CURE Algorithm
The CURE algorithm provides an effective means for discovering hidden structures and
patterns in large datasets by adopting a systematic approach to clustering. Employing random
sampling, hierarchical clustering, distance measures, merging representative points along
with subsequent refinement and splitting stages all culminate in accurate final membership
assignments. Armed with its efficient execution time and utilization of partial aggregations,
CURE plays a crucial role in diverse applications where dataset exploration is paramount.
The CURE algorithm utilizes both single-level and hierarchical methods to overcome common challenges faced by other clustering algorithms. Its core principle centers around defining cluster representatives (points within a given cluster that best represent its overall characteristics) rather than merely relying on centroids or medoids.
To initiate the CURE algorithm, an initial subset of data points needs to be chosen from the
dataset being analyzed. These randomly selected points will act as potential representatives
for producing robust clusters.
Hierarchical Clustering
Next, these representative points are clustered hierarchically using either agglomerative or
divisive techniques. Agglomerative clustering gradually merges similar representatives until
reaching one central representative per cluster while divisive clustering splits them based on
dissimilarities.
Cluster Shrinkage
Once all clusters are obtained through hierarchical clustering, each cluster's size is reduced by
reducing the outlier’s weights in relation to their distance from their respective representative
points. This process helps eliminate irrelevant noise and focuses on more relevant patterns in
each individual cluster.
After shrinking the initial clusters down to their core components, all remaining
nonrepresentative points are assigned to their nearest existing representative based on
Euclidean distance or other suitable measures consistent with specific applications.
A detailed explanation of the basic steps involved in the CURE algorithm is listed below,
Step 1: Random Sampling
The first step in the CURE algorithm entails randomly selecting a subset of data points from the given dataset. This random sampling ensures that representative samples are obtained across different regions of the data space rather than being biased toward particular areas or clusters.
Step 2: Hierarchical Clustering of the Sample
Next comes hierarchical clustering on the sampled points. Employing techniques such as Single Linkage or Complete Linkage hierarchical clustering helps create initial compact clusters based on their proximity to each other within this smaller dataset.
Step 3: Distance Measures
CURE leverages distance measures to compute distances between clusters during merging operations while maintaining an efficient runtime. Euclidean distance is commonly used due to its simplicity; however, other distance metrics like Manhattan distance can be employed depending on domain-specific requirements.
Step 4: Merging Representative Points
With cluster centroids determined through hierarchical clustering, CURE focuses on merging representative points from various sub-clusters into a unified set by employing partial aggregations and pruning appropriately. This consolidation facilitates a significant reduction in computation time by making subsequent operations more concise.
Step 5: Cluster Refinement and Splitting
After merging representatives, refinement takes place through exchanging outliers among
aggregated sets for better alignment with true target structures within each merged group.
Subsequently, splitting occurs, when necessary, by forming new individual agglomerative
groups representing modified substructures unaccounted for during earlier hierarchies.
Step 6: Final Cluster Assignment
Lastly, the remaining objects outside the formed aggregates, specifically those not captured effectively via either mergers or refinements, are assigned. These yet-to-be-clustered points are linked with the cluster identifiers of their nearest representative points, finalizing the overall clustering process.
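The sketch below illustrates only the representative-selection and shrinkage step at the heart of CURE, which a full implementation would embed inside the hierarchical merging loop; the number of representatives and the shrink factor are illustrative choices.

import numpy as np

def cure_representatives(cluster_points, n_rep=4, shrink=0.3):
    """Pick well-scattered points of a cluster and shrink them toward its centroid."""
    pts = np.asarray(cluster_points, dtype=float)
    centroid = pts.mean(axis=0)
    # First representative: the point farthest from the centroid
    reps = [pts[np.argmax(np.linalg.norm(pts - centroid, axis=1))]]
    while len(reps) < min(n_rep, len(pts)):
        # Next representative: the point farthest from those already chosen
        d = np.min([np.linalg.norm(pts - r, axis=1) for r in reps], axis=0)
        reps.append(pts[np.argmax(d)])
    reps = np.array(reps)
    # Shrinking toward the centroid damps the influence of outliers
    return reps + shrink * (centroid - reps)

cluster = np.random.RandomState(0).randn(50, 2)   # illustrative cluster
print(cure_representatives(cluster))

Remaining points would then be assigned to the cluster whose shrunken representative is nearest, as described in the final step above.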
1. Fundamentals of Clustering
Clustering Objective: Identify inherent structures or patterns within the data without
labeled information.
Grouping Similar Data: Clustering groups data points with similar characteristics
into clusters while keeping dissimilar data points in different clusters.
2. K-Means Clustering
Objective Function: Minimize the sum of squared distances between data points and their respective cluster centroids.
Steps (a short sketch follows after this outline):
Initialization: Select initial cluster centroids.
Assignment: Assign data points to the nearest centroid.
Update Centroids: Recalculate centroids based on the new assignments.
Convergence: Iterate until convergence or a stopping criterion is met.
Euclidean Distance: Commonly used metric for measuring distances between data
points in Euclidean space.
Other Distance Metrics: Manhattan distance, cosine similarity, Mahalanobis
distance, etc., can be used based on data characteristics.
Comparison with Ground Truth: If available, external metrics like purity, F-score,
or adjusted Rand index measure clustering performance against known labels.
Elbow Method, Silhouette Score: Techniques to find the optimal number of clusters.
Domain Knowledge: Utilizing domain knowledge to validate clustering results.
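The K-means procedure outlined under Steps above can be written down directly; this bare-bones sketch (random initialization, simple convergence check) is illustrative rather than a production implementation.

import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]        # Initialization
    for _ in range(n_iter):
        # Assignment: each point goes to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update centroids: mean of the points assigned to each cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):                    # Convergence
            break
        centroids = new_centroids
    return labels, centroids

X = np.random.default_rng(1).random((200, 2))   # illustrative data
labels, centroids = kmeans(X, k=3)
print(centroids)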
Partitional clustering techniques play a crucial role in data analysis and pattern recognition
across various domains. Understanding these algorithms, their strengths, limitations, and
practical applications contributes significantly to effective data exploration and knowledge
extraction from datasets.
Hidden Markov Models (HMMs) are a type of probabilistic model that are commonly used in
machine learning for tasks such as speech recognition, natural language processing, and
bioinformatics. They are a popular choice for modelling sequences of data because they can
effectively capture the underlying structure of the data, even when the data is noisy or
incomplete. In this article, we will give a comprehensive overview of Hidden Markov
Models, including their mathematical foundations, applications, and limitations.
The basic idea behind an HMM is that the hidden states generate the observations, and the observed data is used to infer the hidden state sequence. The posterior probabilities of the hidden states given the observations are computed with the forward-backward algorithm, while the single most likely state sequence is found with the Viterbi algorithm.
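To make this concrete, the following sketch implements the forward pass for a small discrete HMM; the two hidden states, two observation symbols, and all probability values are invented for illustration.

import numpy as np

# Hypothetical 2-state HMM with 2 possible observation symbols
start = np.array([0.6, 0.4])                 # P(initial hidden state)
trans = np.array([[0.7, 0.3],                # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],                 # P(observation | hidden state)
                 [0.2, 0.8]])

def forward(observations):
    """Return P(observation sequence) by summing over all hidden state paths."""
    alpha = start * emit[:, observations[0]]          # initialization
    for obs in observations[1:]:
        alpha = (alpha @ trans) * emit[:, obs]        # recursion: propagate, then emit
    return alpha.sum()                                # termination

print(forward([0, 1, 0]))   # likelihood of the observed sequence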
Now, we will explore some of the key applications of HMMs, including speech recognition,
natural language processing, bioinformatics, and finance.
o Speech Recognition
One of the most widely known applications of HMMs is speech recognition, where the hidden states correspond to phonemes or words, while the observations are the acoustic signals. The goal is to estimate the most likely sequence of words given the observed audio.
o Natural Language Processing
HMMs are also used in natural language processing, for example in part-of-speech tagging, where the hidden states are the grammatical tags and the observations are the words of a sentence. The goal is to recover the most likely tag sequence for an observed sentence.
o Bioinformatics
HMMs are also widely used in bioinformatics, where they are used to model
sequences of DNA, RNA, and proteins. The hidden states, in this case, correspond to
the different types of residues, while the observations are the sequences of residues.
The goal is to estimate the hidden state sequence, which corresponds to the underlying
structure of the molecule, based on the observed sequences of residues. HMMs are
useful in bioinformatics because they can effectively capture the underlying structure
of the molecule, even when the data is noisy or incomplete. In bioinformatics systems,
the HMMs are usually trained on large datasets of molecular sequences, and the
estimated parameters of the HMMs are used to predict the structure or function of
new molecular sequences.
o Finance
Finally, HMMs have also been used in finance, where they are used to model stock
prices, interest rates, and currency exchange rates. In these applications, the hidden
states correspond to different economic states, such as bull and bear markets, while
the observations are the stock prices, interest rates, or exchange rates. The goal is to
estimate the hidden state sequence, which corresponds to the underlying economic
state, based on the observed prices, rates, or exchange rates. HMMs are useful in
finance because they can effectively capture the underlying economic state, even
when the data is noisy or incomplete. In finance systems, the HMMs are usually
trained on large datasets of financial data, and the estimated parameters of the HMMs
are used to make predictions about future market trends or to develop investment
strategies.
Now, we will explore some of the key limitations of HMMs and discuss how they can impact
the accuracy and performance of HMM-based systems.
o Limited Modelling Capabilities
One of the key limitations of HMMs is that they are relatively limited in their
modelling capabilities. HMMs are designed to model sequences of data, where the
underlying structure of the data is represented by a set of hidden states. However, the
structure of the data can be quite complex, and the simple structure of HMMs may not
be enough to accurately capture all the details. For example, in speech recognition, the
complex relationship between the speech sounds and the corresponding acoustic
signals may not be fully captured by the simple structure of an HMM.
o Overfitting
Another limitation of HMMs is that they can be prone to overfitting, especially when
the number of hidden states is large or the amount of training data is limited.
Overfitting occurs when the model fits the training data too well and is unable to
generalize to new data. This can lead to poor performance when the model is applied
to real-world data and can result in high error rates. To avoid overfitting, it is
important to carefully choose the number of hidden states and to use appropriate
regularization techniques.
o Lack of Robustness
HMMs are also limited in their robustness to noise and variability in the data. For
example, in speech recognition, the acoustic signals generated by speech can be
subjected to a variety of distortions and noise, which can make it difficult for the
HMM to accurately estimate the underlying structure of the data. In some cases, these
distortions and noise can cause the HMM to make incorrect decisions, which can
result in poor performance. To address these limitations, it is often necessary to use
additional processing and filtering techniques, such as noise reduction and
normalization, to pre-process the data before it is fed into the HMM.
o Computational Complexity
Finally, HMMs can be computationally demanding. The forward-backward and Viterbi algorithms scale with the square of the number of hidden states times the sequence length, and training with the Baum-Welch algorithm requires many such passes over the data, which can become expensive for large state spaces or long sequences.