Module 5

UNIT V

UNSUPERVISED LEARNING AND REINFORCEMENT LEARNING
Clustering
• Clustering or cluster analysis is a machine learning technique that groups an unlabelled
dataset.

• It can be defined as "a way of grouping the data points into different clusters consisting of
similar data points. The objects with possible similarities remain in a group that has little or
no similarity with another group."

• It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color,
behavior, etc., and divides the data as per the presence and absence of those patterns.

• It is an unsupervised learning method, hence no supervision is provided to the algorithm, and it
deals with the unlabelled dataset.

• After applying this clustering technique, each cluster or group is given a cluster-ID. The ML
system can use this ID to simplify the processing of large and complex datasets.
Realtime Example:
Let's understand the clustering technique with the real-world example of a shopping mall:
When we visit any shopping mall, we can observe that things with similar
usage are grouped together: t-shirts are grouped in one section,
trousers are in another section, and in the vegetable section, apples,
bananas, mangoes, etc., are kept in separate groups, so that we can easily
find what we need.

The clustering technique works in the same way. Another example of
clustering is grouping documents according to their topic.
The clustering technique can be widely used in various tasks.
Some most common uses of this technique are:
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Note:

Amazon uses clustering in its recommendation system to provide recommendations based on a
user's past product searches.

Netflix also uses this technique to recommend movies and web series to its users based on their
watch history.
Diagram explains working of the clustering algorithm
Types of Clustering Methods
• The clustering methods are broadly divided into hard clustering (a data point
belongs to only one group) and soft clustering (a data point can also belong to
more than one group). The main clustering methods used in machine
learning are:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering

• It is a type of clustering that divides the data into non-hierarchical groups. It is also known as
the centroid-based method. The most common example of partitioning clustering is the
K-Means Clustering algorithm.

• In this type, the dataset is divided into a set of K groups, where K defines the number of
pre-defined groups.

• The cluster centers are created in such a way that the distance of the data points from their
own cluster centroid is minimum compared to the distance from other cluster centroids.
Density-Based Clustering
• The density-based clustering method connects the highly
dense areas into clusters, and arbitrarily shaped
distributions are formed as long as the dense regions can be
connected.

• The algorithm does this by identifying different clusters in
the dataset and connecting the areas of high density into
clusters. The dense areas in data space are separated from
each other by sparser areas.

• These algorithms can face difficulty in clustering the data
points if the dataset has varying densities and high
dimensionality.
Distribution Model-Based Clustering

• In the distribution model-based clustering method, the data is divided based on the
probability of how likely a data point belongs to a particular distribution.

• The grouping is done by assuming some distributions, most commonly the Gaussian
distribution.

• An example of this type is the Expectation-Maximization clustering algorithm, which uses
Gaussian Mixture Models (GMM).
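As a minimal sketch of this idea, assuming scikit-learn is available, a Gaussian Mixture Model can be fitted as below; the toy data and the choice of two components are illustrative assumptions.

# Minimal sketch: fitting a Gaussian Mixture Model with scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative 2-D data drawn from two Gaussian blobs (assumed for demo purposes).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
               rng.normal(loc=5.0, scale=1.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)          # hard cluster assignment per point
probs = gmm.predict_proba(X)     # soft membership probability under each distribution
print(labels[:5])
print(probs[:5].round(3))

The predict_proba output makes the "probability of belonging to a distribution" explicit, which is what distinguishes this family of methods from centroid-based clustering.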
Hierarchical Clustering

• Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no
requirement to pre-specify the number of clusters to be created.

• In this technique, the dataset is divided into clusters to create a tree-like structure, which is
also called a dendrogram.

• The observations, or any number of clusters, can be selected by cutting the tree at the correct
level. The most common example of this method is the Agglomerative Hierarchical algorithm.
Fuzzy Clustering
• Fuzzy clustering is a type of soft method in which a data object may belong to
more than one group or cluster.

• Each data point has a set of membership coefficients, which depend on its degree of
membership in each cluster.

• The Fuzzy C-means algorithm is the example of this type of clustering; it is sometimes
also known as the Fuzzy k-means algorithm.

• The word 'fuzzy' refers to things that are not clear or are vague. Sometimes, in real life, we
cannot decide whether a given problem or statement is true or false.

• In such cases, this concept provides many values between true and false and
gives the flexibility to find the best solution to the problem.
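A minimal from-scratch sketch of the Fuzzy C-means update rules is shown below; the toy data, the fuzzifier m = 2, and the fixed iteration count are illustrative assumptions rather than a production implementation.

# Minimal Fuzzy C-means sketch (illustrative assumptions: m = 2, fixed iteration count).
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers: weighted mean of the points, weights = memberships^m.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Distance of every point to every center (small epsilon avoids divide-by-zero).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        inv = d ** (-2.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
centers, U = fuzzy_c_means(X, n_clusters=2)
print(U.round(2))  # each point has a membership coefficient for every cluster

Unlike hard clustering, every row of U contains a coefficient for each cluster, so a point near the boundary would receive intermediate memberships rather than a single label.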
Clustering Algorithms
• K-means clustering
• The k-means algorithm is one of the most popular clustering algorithms. It classifies the dataset by dividing the
samples into different clusters of roughly equal variance. The number of clusters must be specified for this algorithm. It is
fast, requiring fewer computations, with linear complexity O(n).
• K-Medoids
• K-Medoids is a clustering algorithm that is an extension of the K-Means algorithm. Instead of using the mean
(average) point as the center of a cluster, K-Medoids employs the actual data point that minimizes the sum of
dissimilarities to other points within the cluster. This approach makes K-Medoids more robust to outliers
compared to K-Means, as the medoid is less sensitive to extreme values. The algorithm is particularly useful in
situations where the mean may not accurately represent the cluster center, and the selection of medoids can lead to
more accurate clustering results.
• Density-based methods
• Density-based clustering methods are a class of clustering algorithms that identify clusters based on the density of
data points in the feature space. These methods aim to discover clusters of arbitrary shapes and can effectively
handle noise and outliers. One well-known density-based clustering algorithm is Density-Based Spatial Clustering of
Applications with Noise (DBSCAN).
• DBSCAN
• It stands for Density-Based Spatial Clustering of Applications with Noise. It is an example of a density-based
model similar to mean-shift, but with some remarkable advantages. In this algorithm, the areas of high density are
connected into clusters, while points lying alone in low-density regions are marked as noise or outliers.
• Hierarchical clustering:
• Hierarchical clustering is a clustering algorithm that organizes data into a tree-like structure, known
as a dendrogram. The algorithm builds clusters in a hierarchical manner by iteratively merging or
splitting data points or existing clusters based on their similarities. There are two main types of
hierarchical clustering: agglomerative and divisive.

1. Agglomerative Hierarchical Clustering:


1. Begins with each data point as a separate cluster.

2. Iteratively merges the closest clusters until a single cluster containing all data points is formed.

3. The process is visualized in a dendrogram, where the height of the branches represents the dissimilarity between
merged clusters.

2. Divisive Hierarchical Clustering:


1. Starts with all data points in a single cluster.

2. Iteratively divides clusters into smaller ones until each cluster contains only one data point.

3. The process is also visualized in a dendrogram, showing the sequence of cluster divisions.
Applications of Clustering - commonly known applications of clustering technique

• In Identification of Cancer Cells: The clustering algorithms are widely used for the identification of cancerous
cells. It divides the cancerous and non-cancerous data sets into different groups.

• In Search Engines: Search engines also work on the clustering technique. The search result appears based on the

closest object to the search query. It does it by grouping similar data objects in one group that is far from the
other dissimilar objects. The accurate result of a query depends on the quality of the clustering algorithm used.

• Customer Segmentation: It is used in market research to segment the customers based on their choice and
preferences.

• In Biology: It is used in the biology stream to classify different species of plants and animals using the image
recognition technique.

• In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be
very useful for determining the purpose for which a particular piece of land is most suitable.
• K-means Clustering:
• K-Means Clustering is an Unsupervised Learning algorithm, which groups an unlabeled dataset into
different clusters.
Here K defines the number of pre-defined clusters that need to be created in the process; if K=2, there will
be two clusters, for K=3 there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that
each data point belongs to only one group, made up of points with similar properties.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.

• The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats
the process until it finds the best clusters. The value of K should be predetermined in this algorithm.
• The k-means clustering algorithm mainly performs two tasks:
• Determines the best value for K center points or centroids by an iterative
process.
• Assigns each data point to its closest k-center. Those data points which are
near to the particular k-center, create a cluster.
• Hence each cluster has datapoints with some commonalities, and it is away
from other clusters.
• The below diagram explains the working of the K-means Clustering Algorithm:
How does the K-Means Algorithm Work? - Steps
• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select K random points or centroids. (They may be points other than those from the
input dataset.)
• Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
• Step-4: Calculate the variance and place a new centroid for each cluster. For
each cluster, calculate the mean (centroid) of all the data points assigned to
that cluster. This mean becomes the new centroid for that cluster.
• Step-5: Repeat the third step, i.e., reassign each data point to the
new closest centroid of each cluster.
• Step-6: If any reassignment occurred, go to Step-4; otherwise go to FINISH.
• Step-7: The model is ready.
• Let's understand the above steps by considering the visual plots:
• Suppose we have two variables M1 and M2. The x-y axis scatter plot of
these two variables is given below:

• Let's take number k of clusters, i.e., K=2, to identify the dataset and to put
them into different clusters. It means here we will try to group these datasets
into two different clusters.
• We need to choose some random k points or centroid to form the cluster.
These points can be either the points from the dataset or any other point.
So, here we are selecting the below two points as k points, which are not the
part of our dataset. Consider the below image:
Now we will assign each data point of the scatter plot to its closest K-point or centroid. We
will compute this using the mathematics we have studied for calculating the distance
between two points. To do so, we will draw a median line between the two centroids. Consider
the below image:
• From the above image, it is clear that the points on the left side of the line are nearer to the
K1 or blue centroid, and the points to the right of the line are closer to the yellow
centroid. Let's color them blue and yellow for clear visualization.

• As we need to find the closest cluster, we will repeat the process by
choosing new centroids. To choose the new centroids, we will compute the
center of gravity of the points in each cluster and place the new centroids there, as below:
• Next, we will reassign each data point to its new closest centroid. For this, we will
repeat the process of finding a median line. The new median line will look like the
below image:

• From the above image, we can see that one yellow point is on the left side of the
line, and two blue points are to the right of the line. So, these three points will be
assigned to new centroids.
• As reassignment has taken place, we will again go to Step-4, which is
finding new centroids or K-points.
• We will repeat the process by finding the center of gravity of each cluster, so the
new centroids will be as shown in the below image:

• As we have the new centroids, we will again draw the median line and reassign
the data points. So, the image will be:
• We can see in the above image that there are no points left on the wrong side
of the line, which means no further reassignment is needed and our model is
formed. Consider the below image:

• As our model is ready, we can now remove the assumed centroids, and
the two final clusters will be as shown in the below image:
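In practice, the same procedure can be run with a library call. Below is a minimal sketch using scikit-learn's KMeans; the toy values for the two variables M1 and M2 and the choice K=2 are illustrative assumptions, not part of the original example.

# Minimal K-means sketch with scikit-learn (toy data and K=2 are illustrative).
import numpy as np
from sklearn.cluster import KMeans

# Two variables M1 and M2 as columns of a small toy dataset (assumed values).
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
              [8.0, 8.0], [8.5, 7.5], [7.8, 8.3]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment of each data point
print(kmeans.cluster_centers_)  # final centroids after convergence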
How to choose the value of "K number of clusters" in K-means
Clustering?
• The performance of the K-means clustering algorithm depends upon highly
efficient clusters that it forms. But choosing the optimal number of clusters
is a big task. There are some different ways to find the optimal number of
clusters, but here we are discussing the most appropriate method to find the
number of clusters or value of K. The method is given below:
• Elbow Method
• The Elbow method is one of the most popular ways to find the optimal
number of clusters. This method uses the concept of WCSS
value. WCSS stands for Within Cluster Sum of Squares, which defines the
total variations within a cluster. The formula to calculate the value of WCSS
(for 3 clusters) is given below:
• WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
• In the above formula of WCSS,
• ∑(Pi in Cluster1) distance(Pi, C1)² is the sum of the squared distances between
each data point Pi and its centroid C1 within Cluster1, and the same holds for the other two
terms.
• To measure the distance between data points and centroid, we can use any
method such as Euclidean distance or Manhattan distance.
• To find the optimal number of clusters, the elbow method follows the below
steps:
• It executes K-means clustering on a given dataset for different K values
(typically ranging from 1 to 10).
• For each value of K, it calculates the WCSS value.
• It plots a curve between the calculated WCSS values and the number of clusters K.
• The sharp point of bend, where the plot looks like an arm, is
considered the best value of K.
Since the graph shows the sharp bend, which looks like an elbow, hence
it is known as the elbow method. The graph for the elbow method looks
like the below image:

Note: We can choose the number of clusters equal to the given data points. If we
choose the number of clusters equal to the data points, then the value of WCSS
becomes zero, and that will be the endpoint of the plot.
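A minimal sketch of the elbow method follows, assuming scikit-learn's KMeans (whose inertia_ attribute is the WCSS) and an illustrative toy dataset of three loose blobs.

# Minimal elbow-method sketch: WCSS (inertia_) vs. K, using scikit-learn.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Illustrative data: three loose blobs in 2-D.
X = np.vstack([rng.normal(c, 0.7, size=(50, 2)) for c in (0.0, 5.0, 10.0)])

wcss = []
for k in range(1, 11):                      # K ranges from 1 to 10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                # inertia_ is the within-cluster sum of squares

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()

For this toy data the curve should bend sharply near K = 3, which is the value the elbow method would pick.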
Python Implementation of K-means Clustering Algorithm

• 1.
https://www.javatpoint.com/k-means-clustering-algorithm-in-machine
-learning
• 2. https://www.w3schools.com/python/python_ml_k-means.asp
Why hierarchical clustering?
• We have seen that there are some challenges with the K-means clustering
algorithm: it needs a predetermined number of clusters, and it always
tries to create clusters of the same size.

• To solve these two challenges, we can opt for the hierarchical clustering
algorithm because, in this algorithm, we don't need to have knowledge about
the predefined number of clusters.
Agglomerative Hierarchical clustering

• The agglomerative hierarchical clustering algorithm is a popular example of
HCA. To group the data points into clusters, it follows the bottom-up approach.

• This means the algorithm considers each data point as a single cluster at the
beginning, and then starts combining the closest pair of clusters together.

• It does this until all the clusters are merged into a single cluster that contains
all the data points.

• This hierarchy of clusters is represented in the form of a dendrogram.


How does Agglomerative Hierarchical Clustering Work?

• The working of the AHC algorithm can be explained using the below steps:

• Step-1: Create each data point as a single cluster. Let's say there are N data points,
so the number of clusters will also be N.

• Step-2: Take the two closest data points or clusters and merge them to form one cluster.
There will now be N-1 clusters.
• Step-3: Again, take the two closest clusters and merge them together to form
one cluster. There will be N-2 clusters.

• Step-4: Repeat Step 3 until only one cluster is left. We will get the following
clusters. Consider the below images:
• Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram and cut it to divide the clusters as per the problem.

Measure for the distance between two clusters

• As we have seen, the closest distance between the two clusters is crucial for
the hierarchical clustering. There are various ways to calculate the distance
between two clusters, and these ways decide the rule for clustering. These
measures are called Linkage methods. Some of the popular linkage methods
are given below:
• Single Linkage: It is the shortest distance between the closest points of the two
clusters. Consider the below image:

• Complete Linkage: It is the farthest distance between two points of two
different clusters. It is one of the popular linkage methods as it forms tighter
clusters than single linkage.
• Average Linkage: It is the linkage method in which the distance between each
pair of points is added up and then divided by the total number of pairs to
calculate the average distance between two clusters. It is also one of the most
popular linkage methods.
• Centroid Linkage: It is the linkage method in which the distance between the
centroids of the clusters is calculated. Consider the below image:

From the above-given approaches, we can apply any of them according to the
type of problem or business requirement.
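A minimal sketch of how the linkage choice is passed to scikit-learn's AgglomerativeClustering is shown below; the toy data and parameters are illustrative assumptions (note that scikit-learn offers 'ward' linkage rather than centroid linkage).

# Minimal sketch: choosing a linkage method with scikit-learn's AgglomerativeClustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0], [9.2, 1.1]])

for linkage in ("single", "complete", "average", "ward"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    print(linkage, labels)   # cluster label assigned to each point under this linkage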
• Working of the Dendrogram in Hierarchical clustering
• The dendrogram is a tree-like structure that records each merging step
the HC algorithm performs. In the dendrogram plot, the Y-axis
shows the Euclidean distances between the data points, and the X-axis
shows all the data points of the given dataset.
• The working of the dendrogram can be explained using the below diagram:
• In the above diagram, the left part shows how clusters are created in
agglomerative clustering, and the right part shows the corresponding
dendrogram.
• As discussed above, first the data points P2 and P3 combine
and form a cluster; correspondingly a dendrogram is created, which connects
P2 and P3 with a rectangular shape. The height is decided according to the
Euclidean distance between the data points.
• In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is
created. It is higher than the previous one, as the Euclidean distance between P5 and
P6 is a little greater than that between P2 and P3.
• Again, two new dendrograms are created that combine P1, P2, and P3 in one
dendrogram, and P4, P5, and P6, in another dendrogram.
• At last, the final dendrogram is created that combines all the data points
together.
• We can cut the dendrogram tree structure at any level as per our requirement.
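A minimal sketch of building and cutting a dendrogram with SciPy is given below; the points P1-P6 and the cut level are illustrative assumptions.

# Minimal dendrogram sketch with SciPy; P1-P6 are illustrative points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1.0, 1.0],   # P1
              [1.5, 1.2],   # P2
              [1.6, 1.1],   # P3
              [6.0, 6.0],   # P4
              [6.4, 6.2],   # P5
              [6.5, 6.1]])  # P6

Z = linkage(X, method="single")          # agglomerative merges, single linkage
dendrogram(Z, labels=["P1", "P2", "P3", "P4", "P5", "P6"])
plt.ylabel("Euclidean distance")
plt.show()

# Cutting the tree at a chosen distance level gives the cluster labels (two clusters here).
print(fcluster(Z, t=2.0, criterion="distance"))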
Python Implementation of Agglomerative Hierarchical Clustering

1.
https://www.javatpoint.com/hierarchical-clustering-in-machine-learning
2.
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.Agglom
erativeClustering.html
3. https://www.kaggle.com/code/khotijahs1/hierarchical-agglomerative-
clustering
What is Density-based clustering?
• Density-Based Clustering refers to one of the most popular unsupervised
learning methodologies used in model building and machine learning
algorithms. Data points lying in the low-density regions that separate
clusters are considered noise. The surroundings within a radius ε of
a given object are known as the ε-neighborhood of the object. If the ε-
neighborhood of the object contains at least a minimum number of objects,
MinPts, then it is called a core object.
Density-Based Clustering - Background
• There are two parameters used in density-based clustering:
• Eps: It is the maximum radius of the neighborhood.
• MinPts: It refers to the minimum number of points in the Eps-
neighborhood of a point.
• NEps(i) = { k belongs to D and dist(i, k) <= Eps }
• Directly density reachable:
• A point i is directly density reachable from a point k with
respect to Eps and MinPts if
• i belongs to NEps(k), and
• k satisfies the core point condition:
• |NEps(k)| >= MinPts
• Density reachable:
• A point i is density reachable from a point j with respect to Eps and
MinPts if there is a chain of points p1, ..., pn with p1 = j and pn = i such that
p(m+1) is directly density reachable from p(m) for each m.
• Density connected:
• A point i is density connected to a point j with respect to Eps and MinPts
if there is a point o such that both i and j are density
reachable from o with respect to Eps and MinPts.
• Working of Density-Based Clustering
• Suppose a set of objects is denoted by D'. We say that an object i is directly
density reachable from the object j only if it is located within the ε-
neighborhood of j, and j is a core object.
• An object i is density reachable from the object j with respect to ε and MinPts
in a given set of objects D' only if there is a chain of objects p1, ..., pn with
p1 = j and pn = i such that p(m+1) is directly density reachable from p(m) with
respect to ε and MinPts.
• An object i is density connected to an object j with respect to ε and MinPts in a
given set of objects D' only if there is an object o belonging to D' such that both
i and j are density reachable from o with respect to ε and MinPts.
• Major Features of Density-Based Clustering
• The primary features of density-based clustering are given below:
• It needs only a single scan of the data.
• It requires density parameters as a termination condition.
• It is used to manage noise in data clusters.
• Density-based clustering is used to identify clusters of arbitrary shape.
• Density-Based Clustering Methods: DBSCAN, OPTICS, DENCLUE

• DBSCAN
• DBSCAN stands for Density-Based Spatial Clustering of Applications with
Noise. It relies on a density-based notion of a cluster and can identify
clusters of arbitrary shape in a spatial database containing noise and outliers.
DBSCAN groups together closely packed points based on their density in a given space.
Here's a step-by-step explanation of how DBSCAN works:

1. Initialization:
   1. Choose two parameters:
      1. ϵ (epsilon): The maximum distance between two points for them to be considered as
         in the same neighborhood.
      2. MinPts: The minimum number of points required to form a dense region.
2. Core Points Identification:
   1. For each point in the dataset, compute the number of points within its ϵ-neighborhood
      (including itself).
   2. If a point has at least MinPts neighboring points within ϵ, it is considered a core point.
3. Density-Reachable Points Identification:
   1. For each core point, recursively find all points that are density-reachable from it.
   2. A point P is density-reachable from another point Q if there exists a chain of core
      points leading from Q to P, with each successive point in the chain being directly
      reachable from the previous one within ϵ.
4. Cluster Formation:
   1. Assign each core point and its density-reachable neighbors to the same cluster.
   2. If a core point has neighbors that belong to different clusters, merge those clusters.
5. Border Points Assignment:
   1. Assign any point that is not a core point but falls within the ϵ-neighborhood of a core
      point to the same cluster as that core point.
   2. These points are called border points.
6. Noise Points Identification:
   1. Any point that is neither a core point nor a border point is considered a noise point and is
      not assigned to any cluster.
7. Output:
   1. The output of the DBSCAN algorithm is a set of clusters, each containing a group of
      points that are closely packed together and separated by areas of lower density or noise
      points.
DBSCAN is robust to noise and capable of identifying clusters of arbitrary shapes. However, it
requires careful selection of the parameters ϵ and MinPts to achieve optimal results for a given
dataset.
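A minimal sketch using scikit-learn's DBSCAN is shown below; the values of eps and min_samples and the toy data are illustrative assumptions.

# Minimal DBSCAN sketch with scikit-learn; eps, min_samples, and data are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1.0, 1.0], [1.1, 1.2], [0.9, 1.1],     # dense group 1
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.1],     # dense group 2
              [4.0, 15.0]])                           # isolated point

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(db.labels_)  # cluster index per point; -1 marks noise (here the isolated point)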
Association Rule Learning

• Association rule learning is a type of unsupervised learning technique that
checks for the dependency of one data item on another data item and maps them
accordingly so that the relationship can be exploited profitably. It tries to find
interesting relations or associations among the variables of a dataset, using
different rules to discover the interesting relations between variables in the
database.
• Association rule learning is one of the very important concepts of
machine learning, and it is employed in market basket analysis, web
usage mining, continuous production, etc. Here market basket analysis is a
technique used by various big retailers to discover the associations
between items. We can understand it by taking the example of a supermarket,
where all products that are frequently purchased together are placed together.
For example, if a customer buys bread, he most likely can also buy butter, eggs, or
milk, so these products are stored within a shelf or mostly nearby. Consider the
below diagram:
Association rule learning can be divided into
three types of algorithms:
• Apriori Algorithm
• This algorithm uses frequent itemsets to generate association rules. It is designed to work on
databases that contain transactions. This algorithm uses a breadth-first search and a Hash Tree to
calculate the itemsets efficiently.
• It is mainly used for market basket analysis and helps to understand the products that can be bought
together. It can also be used in the healthcare field to find drug reactions for patients.
• Eclat Algorithm
• Eclat stands for Equivalence Class Transformation. This algorithm uses a depth-first
search technique to find frequent itemsets in a transaction database. It executes faster
than the Apriori algorithm.
• F-P Growth Algorithm
• F-P growth stands for Frequent Pattern growth, and it is an improved version of the
Apriori algorithm. It represents the database in the form of a tree structure known as a
frequent pattern tree (FP-tree). The purpose of this tree is to extract the most frequent patterns.
How does Association Rule Learning work?

• Association rule learning works on the concept of an If-Then statement,
such as "if A then B".

• Here the If element is called the antecedent, and the Then statement is called
the consequent. These types of relationships, where we can find an
association or relation between two items, are known as single cardinality.
It is all about creating rules, and if the number of items increases, then the
cardinality also increases accordingly. So, to measure the associations
between thousands of data items, there are several metrics.
These metrics are given below:
• Support
• Confidence
• Lift
• Let's understand each of them:
• Support
• Support is the frequency of A, i.e., how frequently an item appears in the dataset.
It is defined as the fraction of the transactions T that contain the itemset X. For an
itemset X and a total of T transactions, it can be written as:
• Support(X) = Freq(X) / T
• Confidence
• Confidence indicates how often the rule has been found to be true, i.e., how often the
items X and Y occur together in the dataset given that the occurrence of X is already
known. It is the ratio of the transactions that contain X and Y to the number of
transactions that contain X:
• Confidence(X → Y) = Freq(X ∪ Y) / Freq(X)

• Lift
• Lift measures the strength of a rule and is defined by the below formula:
• Lift(X → Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
• It is the ratio of the observed support to the expected support if
X and Y were independent of each other. It has three possible ranges of values:
• Lift = 1: The occurrence of the antecedent and the consequent are
independent of each other.
• Lift > 1: It determines the degree to which the two itemsets are
dependent on each other.
• Lift < 1: It tells us that one item is a substitute for the other, which
means one item has a negative effect on the other.
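A minimal sketch that computes these three metrics for a single rule on a toy transaction list is given below; the transactions and the rule bread → butter are illustrative assumptions.

# Minimal sketch: support, confidence, and lift for one rule on toy transactions.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

X, Y = {"bread"}, {"butter"}
supp_xy = support(X | Y)
conf = supp_xy / support(X)
lift = supp_xy / (support(X) * support(Y))

print(f"support={supp_xy:.2f}, confidence={conf:.2f}, lift={lift:.2f}")

For these toy transactions the rule bread → butter has support 0.5, confidence about 0.67, and lift about 1.33, i.e., buying bread makes buying butter more likely than chance.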
• Applications of Association Rule Learning
• It has various applications in machine learning and data mining. Below are some
popular applications of association rule learning:
• Market Basket Analysis: It is one of the popular examples and applications of
association rule mining. This technique is commonly used by big retailers to
determine the association between items.
• Medical Diagnosis: With the help of association rules, patients can be diagnosed and treated more
easily, as the rules help in identifying the probability of illness for a particular disease.
• Protein Sequence: Association rules help in determining the synthesis of
artificial proteins.
• It is also used for the Catalog Design and Loss-leader Analysis and many more
other applications.
• Apriori Algorithm in Machine Learning
• The Apriori algorithm uses frequent itemsets to generate association rules, and
it is designed to work on databases that contain transactions. With the help
of these association rules, it determines how strongly or how weakly two
objects are connected. The algorithm uses a breadth-first search and a Hash
Tree to calculate the itemset associations efficiently. It is an iterative process
for finding the frequent itemsets in a large dataset.
• This algorithm was proposed by R. Agrawal and R. Srikant in 1994. It
is mainly used for market basket analysis and helps to find products that
can be bought together. It can also be used in the healthcare field to find drug
reactions for patients.
• What is Frequent Itemset?
• Frequent itemsets are those items whose support is greater than the
threshold value or user-specified minimum support. It means if A & B
are the frequent itemsets together, then individually A and B should
also be the frequent itemset.
• Suppose there are the two transactions: A= {1,2,3,4,5}, and B=
{2,3,7}, in these two transactions, 2 and 3 are the frequent itemsets.
• Note: To better understand the apriori algorithm, and related
term such as support and confidence, it is recommended to
understand the association rule learning.
• Steps for Apriori Algorithm
• Below are the steps for the apriori algorithm:
• Step-1: Determine the support of the itemsets in the transactional
database, and select the minimum support and confidence.
• Step-2: Take all the itemsets in the transactions with a support value higher
than the minimum (selected) support value.
• Step-3: Find all the rules of these subsets that have a confidence value higher
than the threshold (minimum) confidence.
• Step-4: Sort the rules in decreasing order of lift.
• Apriori Algorithm Working
• We will understand the apriori algorithm using an example and mathematical
calculation:
• Example: Suppose we have the following dataset that has various
transactions, and from this dataset, we need to find the frequent itemsets and
generate the association rules using the Apriori algorithm:
• Solution:
• Step-1: Calculating C1 and L1:
• In the first step, we will create a table that contains support count (The
frequency of each itemset individually in the dataset) of each itemset in the
given dataset. This table is called the Candidate set or C1.

• Now, we will take out all the itemsets that have a support count greater than
the Minimum Support (2). This gives us the table for the frequent itemset L1.
Since all the itemsets except E have a support count greater than or equal to the
minimum support, the E itemset will be removed.
Step-2: Candidate Generation C2, and L2:
•In this step, we will generate C2 with the help of L1. In C2, we will create the pair of the itemsets of L1 in the
form of subsets.
•After creating the subsets, we will again find the support count from the main transaction table of datasets,
i.e., how many times these pairs have occurred together in the given dataset. So, we will get the below table for
C2:
• Again, we need to compare the C2 Support count with the minimum support
count, and after comparing, the itemset with less support count will be
eliminated from the table C2. It will give us the below table for L2

• Step-3: Candidate generation C3, and L3:


• For C3, we will repeat the same two processes, but now we will form the C3
table with subsets of three itemsets together, and will calculate the support
count from the dataset. It will give the below table:
• Now we will create the L3 table. As we can see from the above C3 table, there
is only one combination of itemset that has support count equal to the
minimum support count. So, the L3 will have only one combination, i.e., {A,
B, C}.
• Step-4: Finding the association rules for the subsets:
• To generate the association rules, first, we will create a new table with the
possible rules from the discovered combination {A, B, C}. For all the rules, we
will calculate the confidence using the formula sup(A ∧ B) / sup(A). After calculating
the confidence value for all rules, we will exclude the rules that have less
confidence than the minimum threshold (50%).
Consider the below table:

As the given threshold or minimum confidence is 50%, the first three
rules, A ∧ B → C, B ∧ C → A, and A ∧ C → B, can be considered strong
association rules for the given problem.
• Advantages of Apriori Algorithm
• This is an easy-to-understand algorithm.
• The join and prune steps of the algorithm can be easily implemented on large
datasets.
• Disadvantages of Apriori Algorithm
• The Apriori algorithm works slowly compared to other algorithms.
• The overall performance can be reduced because it scans the database multiple
times.
• The time and space complexity of the Apriori algorithm is O(2^D),
which is very high. Here D represents the horizontal width (number of distinct
items) present in the database.
Python Implementation of Apriori Algorithm

• 1. https://www.javatpoint.com/apriori-algorithm-in-machine-learning
• 2.
https://www.geeksforgeeks.org/implementing-apriori-algorithm-in-pyt
hon/
• 3. https://www.kaggle.com/code/nandinibagga/apriori-algorithm
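As a further minimal sketch, assuming the third-party mlxtend library is installed, frequent itemsets and rules can be generated as below; the transactions and thresholds are illustrative assumptions.

# Minimal Apriori sketch with mlxtend (assumes: pip install mlxtend pandas).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C", "D"]]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

frequent = apriori(df, min_support=0.4, use_colnames=True)          # frequent itemsets (L1, L2, ...)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])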
Reinforcement Learning

• Reinforcement Learning is a feedback-based machine learning technique in
which an agent learns to behave in an environment by performing actions
and seeing the results of those actions. For each good action, the agent gets positive
feedback, and for each bad action, the agent gets negative feedback or a penalty.
• In Reinforcement Learning, the agent learns automatically using feedback,
without any labeled data, unlike supervised learning.
• Since there is no labeled data, the agent is bound to learn from its experience
only.
• RL solves a specific type of problem where decision making is sequential, and
the goal is long-term, such as game-playing, robotics, etc.
• The agent interacts with the environment and explores it by itself. The primary
goal of an agent in reinforcement learning is to improve the performance by
getting the maximum positive rewards.
• The agent learns through a process of trial and error, and based on its experience, it
learns to perform the task in a better way. Hence, we can say
that "Reinforcement learning is a type of machine learning method where an
intelligent agent (computer program) interacts with the environment and
learns to act within it." How a robotic dog learns the movement of its limbs
is an example of reinforcement learning.
• It is a core part of Artificial Intelligence, and many AI agents work on the concept
of reinforcement learning. Here we do not need to pre-program the agent, as it
learns from its own experience without any human intervention.
• Example: Suppose there is an AI agent present
within a maze environment, and its goal is to find
the diamond. The agent interacts with the
environment by performing some actions, and
based on those actions, the state of the agent
changes, and it also receives a reward or penalty
as feedback.
• The agent continues doing these three things (take an
action, change state or remain in the same state,
and get feedback), and by doing these actions, it
learns and explores the environment.
• The agent learns which actions lead to positive
feedback or rewards and which actions lead to
negative feedback or penalties. For a positive reward,
the agent gets a positive point, and as a penalty, it
gets a negative point.
• Terms used in Reinforcement Learning
• Agent(): An entity that can perceive/explore the environment and act upon it.
• Environment(): A situation in which an agent is present or surrounded by. In
RL, we assume the stochastic environment, which means it is random in nature.
• Action(): Actions are the moves taken by an agent within the environment.
• State(): State is a situation returned by the environment after each action taken
by the agent.
• Reward(): A feedback returned to the agent from the environment to evaluate
the action of the agent.
• Policy(): Policy is a strategy applied by the agent for the next action based on
the current state.
• Value(): It is the expected long-term return with the discount factor, as opposed
to the short-term reward.
• Q-value(): It is mostly similar to Value, but it takes one additional
parameter, the current action (a).
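A minimal tabular Q-learning sketch on a tiny corridor environment is given below to illustrate the terms above; the environment, the reward of +1 at the goal state, and the learning-rate/discount values are illustrative assumptions.

# Minimal tabular Q-learning sketch; the corridor environment and hyper-parameters are illustrative.
import numpy as np

n_states, n_actions = 5, 2          # states 0..4; actions: 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = np.zeros((n_states, n_actions)) # Q-value table
rng = np.random.default_rng(0)

def step(state, action):
    # Environment: move left/right; reward +1 only when the goal (state 4) is reached.
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

for _ in range(500):                               # episodes
    state = 0
    done = False
    while not done:
        # Epsilon-greedy policy: explore sometimes, otherwise act greedily.
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update rule.
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state

print(Q.round(2))   # learned Q-values; moving right toward the goal earns the higher values

Here the agent, environment, state, action, reward, policy, and Q-value from the list above all appear explicitly in the code.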
Introduction to Thompson Sampling method
• Reinforcement Learning is a branch of Machine Learning, also called Online
Learning. It is used to decide what action to take at t+1 based on data up to
time t. This concept is used in Artificial Intelligence applications such as
walking. A popular example of reinforcement learning is a chess engine. Here,
the agent decides upon a series of moves depending on the state of the board
(the environment), and the reward can be defined as a win or lose at the end of
the game.
• Thompson Sampling (also called Posterior Sampling or Probability Matching) is an
algorithm for choosing actions that addresses the exploration-exploitation
dilemma in the multi-armed bandit problem. Performing actions several
times to gather information is called exploration. The algorithm uses training
information that evaluates the actions taken rather than instructing with the
correct actions. This is what creates the need for active exploration, for an
explicit trial-and-error search for good behavior. Based on the results of those
actions, a reward (1) or penalty (0) is given to the machine for that action.
Further actions are performed in order to maximize the reward, which may
improve future performance. Suppose a robot has to pick up several cans and
put them in a container. Each time it puts a can into the container, it memorizes
the steps followed and trains itself to perform the task with better speed and
precision (reward). If the robot is not able to put the can in the container, it does
not memorize that procedure (hence speed and performance will not improve),
which is considered a penalty.
• Thompson Sampling has the advantage of the tendency to decrease the
search as we get more and more information, which mimics the desirable
trade-off in the problem, where we want as much information as possible in
fewer searches. Hence, this Algorithm has a tendency to be more “search-
oriented” when we have fewer data and less “search-oriented” when we have
a lot of data.
• Multi-Armed Bandit Problem
Multi-armed Bandit is synonymous with a slot machine with many arms. Each action
selection is like a play of one of the slot machine’s levers, and the rewards are the
payoffs for hitting the jackpot. Through repeated action selections you are to
maximize your winnings by concentrating your actions on the best levers. Each
machine provides a different reward from a probability distribution over the mean
reward specific to the machine. Without knowing these probabilities, the gambler has
to maximize the sum of reward earned through a sequence of arms pull. If you
maintain estimates of the action values, then at any time step there is at least one
action whose estimated value is greatest. We call this a greedy action. The analogy to
this problem can be advertisements displayed whenever the user visits a webpage.
Arms are the ads displayed to users each time they connect to a web page. Each time
a user connects to the page makes a round. At each round, we choose one ad to
display to the user. At each round n, ad i gives reward ri(n) ∈ {0, 1}: ri(n) = 1 if the
user clicked on the ad i, 0 if the user didn't. The goal of the algorithm will be to
maximize the reward. Another analogy is that of a doctor choosing between
experimental treatments for a series of seriously ill patients. Each action selection is a
treatment selection, and each reward is the survival or well-being of the patient.
Algorithm
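For the Bernoulli-reward setting described above, the algorithm can be sketched as below; the Beta(1, 1) priors, the assumed ad click-through rates, and the number of rounds are illustrative assumptions.

# Minimal Thompson Sampling sketch for Bernoulli rewards (illustrative arm probabilities).
import numpy as np

rng = np.random.default_rng(0)
true_ctr = [0.05, 0.12, 0.20]          # unknown click-through rate of each ad (arm)
n_arms = len(true_ctr)
successes = np.ones(n_arms)            # Beta prior parameter alpha = 1 for every arm
failures = np.ones(n_arms)             # Beta prior parameter beta = 1 for every arm

for n in range(10_000):                # rounds
    # Sample an estimated reward probability for every arm from its Beta posterior.
    theta = rng.beta(successes, failures)
    arm = int(np.argmax(theta))        # show the ad with the highest sampled value
    reward = rng.random() < true_ctr[arm]   # 1 if the user clicked, 0 otherwise
    successes[arm] += reward
    failures[arm] += 1 - reward

print(successes / (successes + failures))  # posterior mean estimate of each arm's CTR

As the posteriors sharpen with more data, the sampling concentrates on the best arm, which is exactly the decreasing-exploration behavior described above.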
• Some Practical Applications:
• Netflix Item based recommender systems: Images related to movies/shows are shown to users in such
a way that they are more likely to watch it.
• Bidding and Stock Exchange: Predicting Stocks based on Current data of stock prices.
• Traffic Light Control: Predicting the delay in the signal.
• Automation in Industries: Bots and Machines for transporting and Delivering items without human
intervention.
• Robotics: Reinforcement learning is used in robotics for motion planning, grasping objects, and
controlling the robot’s movement. It enables robots to learn from experience and make decisions based
on their environment.
• Game AI: Reinforcement learning has been used to train AI agents to play games like Chess, Go, and
Poker. It has been used to develop game bots that can compete against human players.
• Natural Language Processing (NLP): Reinforcement learning is used in NLP to train chatbots and
virtual assistants to provide personalized responses to users. It enables chatbots to learn from user
interactions and improve their responses over time.
• Advertising: Reinforcement learning is used in advertising to optimize ad placements and target
audiences. It enables advertisers to learn which ads perform best and adjust their campaigns accordingly.
• Finance: Reinforcement learning is used in finance for portfolio management, fraud detection, and risk
assessment. It enables financial institutions to learn and adapt their decision strategies over time.
Thank You…
