
Clustering & Association Algorithms 4

This document discusses clustering, an unsupervised machine learning technique used to group similar data points together. It describes different types of clustering including exclusive, overlapping, and hierarchical clustering. K-means clustering is discussed as an algorithm that groups data points into a predefined number of clusters based on similarity. The steps of the K-means algorithm are outlined. Association rule mining is also summarized as discovering relationships between data through if-then statements and the Apriori algorithm is presented as a way to generate frequent itemsets and association rules from transactional data.

Uploaded by sanyengere
Copyright © All Rights Reserved

HIT 2203 - BIG DATA & DATA ANALYTICS
Facilitators:
L.Amos
S.Chaputsira
T.Butsa
ADVANCED ANALYTICS THEORY AND METHODS
CLUSTERING
• Clustering is the process of dividing a dataset into groups of similar data points
• Points in the same group are as similar as possible
• Points in different groups are as dissimilar as possible, e.g. groups of diners in a restaurant or
items arranged in a mall
• It is used in recommendation systems such as Amazon product recommendations and Netflix movie recommendations
• In retail it is used for market segmentation and analysis of customer shopping behavior
• In banking it is used for customer segmentation and customer credit scoring
TYPES OF CLUSTERING
• Exclusive clustering (hard clustering): data points/items belong exclusively to one cluster,
e.g. k-means
• Overlapping clustering (soft clustering): data points/items belong to multiple clusters,
e.g. Fuzzy C-means clustering
• Hierarchical clustering: data points are combined based on similarity in the form of a hierarchy
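The difference between hard and soft clustering can be sketched in a few lines of Python. This is only an illustration and is not taken from the slides: the two cluster centres, the sample point and the function names are hypothetical, and the soft memberships assume the standard fuzzy c-means membership formula with fuzzifier m = 2 and distinct centres.

```python
import math

def hard_assignment(point, centres):
    """Exclusive (hard) clustering: index of the single nearest centre."""
    distances = [math.dist(point, c) for c in centres]
    return distances.index(min(distances))

def soft_memberships(point, centres, m=2.0):
    """Overlapping (soft) clustering: fuzzy c-means style membership degrees.

    Assumes the centres are distinct; each degree lies between 0 and 1.
    """
    distances = [math.dist(point, c) for c in centres]
    # A point sitting exactly on a centre belongs fully to that centre.
    if 0.0 in distances:
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

# Hypothetical example: two centres and one point lying closer to the first.
centres = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)

print(hard_assignment(point, centres))   # index of the nearest centre
print(soft_memberships(point, centres))  # one membership degree per centre
```

With hard assignment the point gets exactly one cluster label, while the soft version gives it a degree of membership in every cluster, weighted towards the nearer centre.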
K-MEANS CLUSTERING ALGORITHM
• K-means is a clustering algorithm whose main goal is to group similar elements or data
points into clusters

• K represents the number of clusters


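K-MEANS ALGORITHM
Step 1: Select the number of clusters to be identified, i.e. select a value for k

Step 2: Randomly select k distinct data points as the initial clusters (3 points in this example, so k = 3)

Step 3: Measure the distance between the 1st point and each of the initial clusters

Step 4: Assign the 1st point to the nearest cluster

Step 5: Calculate the mean value of that cluster including the new point (the red cluster in the example)

Steps 3 to 5 are repeated for the remaining points, and the assignment is redone using the updated means until the clusters stop changing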
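The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the facilitators' implementation: it uses the common batch form (Lloyd's algorithm), in which all points are assigned before the means are recalculated, rather than updating a mean after each single point. The function name `k_means`, the `seed` parameter and the sample data are hypothetical.

```python
import math
import random

def k_means(points, k, max_iterations=100, seed=0):
    """Group points into k clusters; returns (centres, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial cluster centres.
    centres = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iterations):
        # Steps 3-4: assign every point to its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Step 5: recompute each centre as the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centres, labels

# Hypothetical 2-D data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centres, labels = k_means(points, k=2)
print(centres)
print(labels)
```

On this toy data the first three points end up sharing one label and the last three the other. Which group is numbered 0 depends on the random initial selection.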
ASSOCIATION RULE MINING
As the name suggests, association rules are simple If/Then statements that help discover
relationships between seemingly independent data in relational databases or other data
repositories.

Most machine learning algorithms work with numeric datasets and hence tend to be
mathematical. However, association rule mining is suitable for non-numeric, categorical data
and requires just a little bit more than simple counting.
Association rule mining is a procedure which aims to observe frequently occurring patterns,
correlations, or associations from datasets found in various kinds of databases such as
relational databases, transactional databases, and other forms of repositories.
• An association rule has 2 parts:
• an antecedent (if) and
• a consequent (then)
An antecedent is an item found in the data, and a consequent is an item found in
combination with the antecedent, e.g.

“If a customer buys bread, they are 70% likely to buy milk.”

Here bread is the antecedent and milk is the consequent.
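A figure such as "70% likely" is usually called the confidence of the rule: of all transactions containing the antecedent, the fraction that also contain the consequent. The sketch below shows one way to compute it in Python. The basket data is made up so that the bread/milk rule comes out at about 70%, and the function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of antecedent transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up baskets: 7 of the 10 baskets with bread also contain milk.
transactions = (
    [{"bread", "milk"}] * 7
    + [{"bread"}] * 3
    + [{"milk"}] * 2
)

print(confidence(transactions, {"bread"}, {"milk"}))  # roughly 0.7
print(support(transactions, {"bread", "milk"}))       # 7 of 12 baskets
```

Both measures need nothing more than counting, which is why association rules suit categorical data.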


Simply put, a retail store can use such an association rule to target its customers
better.
If the above rule is the result of a thorough analysis of some data sets, it can be used not only
to improve customer service but also to improve the company’s revenue.

Let's look at an example of an association algorithm in the next slide.


APRIORI ALGORITHM
• The Apriori algorithm uses frequent itemsets to generate association rules

• It is based on the concept that every subset of a frequent itemset must also be a frequent itemset

• A frequent itemset is an itemset whose support value is at least the minimum support threshold
STEPS IN APRIORI
The steps followed in the Apriori algorithm of data mining are:
• Join Step: This step generates candidate (K+1)-itemsets from the frequent K-itemsets by
joining the set of K-itemsets with itself.
• Prune Step: This step scans the database to count each candidate itemset. If a candidate
does not meet the minimum support, it is regarded as infrequent and is removed.
This step is performed to reduce the number of candidate itemsets.
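The two steps can be sketched as a pair of small Python functions. This is an illustrative sketch only: the function names, the items A, B, C and the four transactions are hypothetical, and the minimum support is treated as "at least" the threshold.

```python
from itertools import combinations

def join_step(frequent_k_itemsets):
    """Join the frequent k-itemsets with themselves into (k+1)-item candidates."""
    candidates = set()
    for a, b in combinations(frequent_k_itemsets, 2):
        union = a | b
        if len(union) == len(a) + 1:  # a and b differ in exactly one item
            candidates.add(union)
    return candidates

def prune_step(candidates, transactions, min_support):
    """Keep only the candidates whose support meets the minimum support."""
    n = len(transactions)
    return {
        c for c in candidates
        if sum(c <= t for t in transactions) / n >= min_support
    }

# Hypothetical data: four transactions over the items A, B and C.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "B"}, {"C"}]
frequent_1 = {frozenset({"A"}), frozenset({"B"}), frozenset({"C"})}

candidates_2 = join_step(frequent_1)
frequent_2 = prune_step(candidates_2, transactions, min_support=0.5)
print(sorted(sorted(c) for c in candidates_2))
print(sorted(sorted(c) for c in frequent_2))
```

Here the join produces three candidate pairs, and the prune keeps only the pair A, B, because the pairs containing C appear in just one of the four transactions.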
Transaction   List of Items
T1            Cabbage, Carrots, Spinach
T2            Carrots, Spinach, Peas
T3            Peas, Broccoli
T4            Cabbage, Carrots, Peas
T5            Cabbage, Carrots, Spinach, Broccoli
T6            Cabbage, Carrots, Spinach, Peas
• Find all the frequent itemsets using the Apriori algorithm, where min support = 50%
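One way to work through this exercise is the short Python sketch below. It is a hedged illustration rather than a reference solution: the `apriori` function is hypothetical, "min support = 50%" is assumed to mean an itemset must appear in at least 3 of the 6 transactions, and the prune also drops candidates that have an infrequent subset.

```python
from itertools import combinations

transactions = [
    {"Cabbage", "Carrots", "Spinach"},              # T1
    {"Carrots", "Spinach", "Peas"},                 # T2
    {"Peas", "Broccoli"},                           # T3
    {"Cabbage", "Carrots", "Peas"},                 # T4
    {"Cabbage", "Carrots", "Spinach", "Broccoli"},  # T5
    {"Cabbage", "Carrots", "Spinach", "Peas"},      # T6
]

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    min_count = min_support * len(transactions)

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = {item for t in transactions for item in t}
    current = {frozenset({i}) for i in items if count(frozenset({i})) >= min_count}
    frequent = {}
    while current:
        frequent.update({c: count(c) for c in current})
        # Join: merge pairs of frequent k-itemsets that differ in one item.
        candidates = {
            a | b for a, b in combinations(current, 2)
            if len(a | b) == len(a) + 1
        }
        # Prune: drop candidates with an infrequent subset or too little support.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, len(c) - 1))
            and count(c) >= min_count
        }
    return frequent

result = apriori(transactions, min_support=0.5)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Under these assumptions the run reports nine frequent itemsets. Four are single items (Cabbage 4, Carrots 5, Peas 4, Spinach 4). Four are pairs (Cabbage and Carrots 4, Cabbage and Spinach 3, Carrots and Peas 3, Carrots and Spinach 4). One is a triple (Cabbage, Carrots and Spinach, 3). Broccoli appears in only 2 transactions and is dropped in the first pass.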
