Clustering & Association Algorithms 4
ANALYTICS
Facilitators:
L.Amos
S.Chaputsira
T.Butsa
ADVANCED ANALYTICS THEORY AND METHODS
CLUSTERING
• Clustering is the process of dividing a dataset into groups of similar data points
• Points in the same group are as similar as possible
• Points in different groups are as dissimilar as possible, e.g. groups of diners in a restaurant or
items arranged in a mall
• It is used in recommendation systems such as Amazon product recommendations and Netflix movie recommendations
• In retail it is used for market segmentation and analysis of customer shopping behavior
• In banking it is used for customer segmentation and customer credit scoring
TYPES OF CLUSTERING
Step 3: Measure the distance from the 1st point to the nearest cluster
Step 5: Calculate the mean value, including the new point, for the red cluster
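The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2-D points, the cluster names "red" and "blue", and the helper functions `nearest_cluster` and `mean_point` are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def mean_point(points):
    """Return the coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Hypothetical data: two clusters and the points currently assigned to them.
clusters = {
    "red": [(1.0, 1.0), (2.0, 1.0)],
    "blue": [(8.0, 8.0), (9.0, 9.0)],
}
names = list(clusters)
centroids = [mean_point(clusters[name]) for name in names]

# Measure the distance from the new point to each cluster and pick the nearest.
new_point = (3.0, 2.0)
assigned = names[nearest_cluster(new_point, centroids)]

# Recalculate the mean of that cluster, including the new point.
clusters[assigned].append(new_point)
updated_centroid = mean_point(clusters[assigned])

print(assigned, updated_centroid)
```

Repeating the assign-then-recompute pair for every point, until the means stop moving, is the usual shape of a mean-based clustering loop.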
Association Rule Mining: as the name suggests, association rules are simple If/Then
statements that help discover relationships between seemingly independent items in relational
databases or other data repositories.
Most machine learning algorithms work with numeric datasets and hence tend to be
mathematical. However, association rule mining is suitable for non-numeric, categorical data
and requires just a little bit more than simple counting.
Association rule mining is a procedure which aims to observe frequently occurring patterns,
correlations, or associations from datasets found in various kinds of databases such as
relational databases, transactional databases, and other forms of repositories.
• An association rule has 2 parts:
• an antecedent (if) and
• a consequent (then)
An antecedent is an item found in the data, and a consequent is an item found in
combination with the antecedent, e.g. in the rule "if Cabbage, then Carrots", Cabbage is the
antecedent and Carrots is the consequent.
• The Apriori algorithm is based on the concept that every subset of a frequent itemset must also be a frequent itemset
• A frequent itemset is an itemset whose support value meets or exceeds the minimum support threshold
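Support can be made concrete with a short sketch. It uses the six vegetable transactions and the 50% minimum support from the worked example in this section; the function name `support` and the convention that an itemset is frequent when its support is greater than or equal to the threshold are assumptions made for illustration.

```python
# The six transactions from the worked example, one set of items per transaction.
transactions = [
    {"Cabbage", "Carrots", "Spinach"},
    {"Carrots", "Spinach", "Peas"},
    {"Peas", "Broccoli"},
    {"Cabbage", "Carrots", "Peas"},
    {"Cabbage", "Carrots", "Spinach", "Broccoli"},
    {"Cabbage", "Carrots", "Spinach", "Peas"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

MIN_SUPPORT = 0.5  # 50%, i.e. at least 3 of the 6 transactions

carrots_spinach = support({"Carrots", "Spinach"}, transactions)  # in T1, T2, T5, T6
broccoli = support({"Broccoli"}, transactions)                   # in T3, T5

print(carrots_spinach >= MIN_SUPPORT)  # frequent
print(broccoli >= MIN_SUPPORT)         # not frequent

# Subset property: each subset of a frequent itemset is at least as frequent.
assert support({"Carrots"}, transactions) >= carrots_spinach
assert support({"Spinach"}, transactions) >= carrots_spinach
```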
STEPS IN APRIORI
The steps followed in the Apriori Algorithm of data mining are:
• Join Step: This step generates candidate (K+1)-itemsets from the frequent K-itemsets by joining
the set of K-itemsets with itself.
• Prune Step: This step scans the database to count each candidate itemset. If a candidate
does not meet the minimum support, it is regarded as infrequent and is removed.
This step is performed to reduce the size of the candidate itemsets.
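A minimal sketch of the two steps follows. The function names `join_step` and `prune_step`, the items A, B and C, and the four toy transactions are hypothetical and chosen only to keep the example small; the join here is a plain pairwise union, which is one simple way to realise it.

```python
from itertools import combinations

def join_step(frequent_k):
    """Join the frequent K-itemsets with themselves to get (K+1)-item candidates."""
    frequent_k = list(frequent_k)
    if not frequent_k:
        return set()
    k = len(frequent_k[0])
    return {a | b for a, b in combinations(frequent_k, 2) if len(a | b) == k + 1}

def prune_step(candidates, transactions, min_support):
    """Scan the transactions and keep only candidates that meet minimum support."""
    n = len(transactions)
    kept = set()
    for candidate in candidates:
        count = sum(1 for t in transactions if candidate <= t)
        if count / n >= min_support:
            kept.add(candidate)
    return kept

# Hypothetical toy data.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "B"}, {"C"}]
frequent_1 = {frozenset({"A"}), frozenset({"B"}), frozenset({"C"})}

candidates_2 = join_step(frequent_1)                       # all pairs of frequent items
frequent_2 = prune_step(candidates_2, transactions, 0.5)   # only pairs seen often enough

print(sorted(sorted(s) for s in candidates_2))
print(sorted(sorted(s) for s in frequent_2))
```

Only the pair A, B survives the prune in this toy data, because the pairs containing C each appear in a single transaction.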
Transaction   List of Items
T1            Cabbage, Carrots, Spinach
T2            Carrots, Spinach, Peas
T3            Peas, Broccoli
T4            Cabbage, Carrots, Peas
T5            Cabbage, Carrots, Spinach, Broccoli
T6            Cabbage, Carrots, Spinach, Peas
• Find all the frequent itemsets using the Apriori algorithm, where:
min support = 50%
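One way to check an answer to this exercise is the sketch below, which alternates the join and prune steps level by level. It is an illustrative implementation, not a reference one: the function name `apriori` is hypothetical, the join is a plain pairwise union, and an itemset is treated as frequent when its support is greater than or equal to 50%.

```python
from itertools import combinations

# Transactions T1 to T6 from the table above.
transactions = [
    {"Cabbage", "Carrots", "Spinach"},
    {"Carrots", "Spinach", "Peas"},
    {"Peas", "Broccoli"},
    {"Cabbage", "Carrots", "Peas"},
    {"Cabbage", "Carrots", "Spinach", "Broccoli"},
    {"Cabbage", "Carrots", "Spinach", "Peas"},
]

def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: every single item that meets minimum support.
    items = {item for t in transactions for item in t}
    current = {frozenset({i}) for i in items}
    current = {c for c in current if support(c) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        # Join: combine frequent K-itemsets into (K+1)-item candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k + 1}
        # Prune: drop candidates that do not meet minimum support.
        current = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent

frequent = apriori(transactions, min_support=0.5)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), round(frequent[itemset], 2))
```

On these six transactions the sketch yields nine frequent itemsets: four single items (Broccoli is dropped), four pairs, and one triple, {Cabbage, Carrots, Spinach}, which appears in T1, T5 and T6.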