Associations are defined by specific, measurable constraints on interestingness in association rule
learning. Regardless of the rules being employed to classify new data, the associations need to be
defined by constraints that determine what is both interesting and relevant. While many measures
could be selected as an association constraint, the most commonly used are:
Support – How frequently the pattern/items occur in the dataset.
Confidence – How often the rule has been true (the conditional probability of the consequent
given the antecedent).
Lift – Actual success rate of the rule over the success rate expected from random chance.
Conviction – Incorrect prediction rate expected from random chance over the actual incorrect
prediction rate of the rule.
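The four measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the toy `baskets` data are assumptions made for the example, and conviction is computed with the commonly used definition (1 - support(Y)) / (1 - confidence(X → Y)).

```python
from typing import FrozenSet, Iterable, List

def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent) -> float:
    """Estimate of P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent) -> float:
    """Confidence of the rule divided by the baseline support of the consequent."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

def conviction(transactions, antecedent, consequent) -> float:
    """Expected error rate under independence divided by the observed error rate."""
    conf = confidence(transactions, antecedent, consequent)
    if conf == 1.0:
        return float("inf")  # the rule is never wrong in this data
    return (1 - support(transactions, consequent)) / (1 - conf)

# Hypothetical toy data: four shopping baskets.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support(baskets, {"bread", "milk"}))
print(confidence(baskets, {"bread"}, {"milk"}))
print(lift(baskets, {"bread"}, {"milk"}))
print(conviction(baskets, {"bread"}, {"milk"}))
```

In this toy data the rule bread → milk has a lift below 1, so buying bread makes milk slightly less likely than its baseline rate.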
What is Association Rule Learning in Machine Learning?
Association rule learning is an unsupervised learning technique that examines how one data
item depends on another and maps those dependencies so that they can be used profitably.
It tries to discover interesting relationships or links between the variables of a dataset,
using a set of rules to do so.
Association rule learning is an important topic in machine learning and is used in market
basket analysis, web usage mining, continuous production, and other applications. Market
basket analysis is a method used by many large retailers to find the relationships between
items. A supermarket is the usual example: all items purchased in the same transaction
are grouped together.
How Does Association Learning Work?
Along with cluster analysis and anomaly detection, association rules are among the most widely
used unsupervised learning techniques. Association learning is a machine learning and data mining
technique that creates rules for finding interesting relations between variables. Unlike
conventional association algorithms that measure degrees of similarity, association rule learning
identifies hidden correlations in databases by applying some measure of interestingness to
generate an association rule for new searches.
Association rule algorithms count the frequency of complementary occurrences, or associations,
across a large collection of items or actions. The goal is to find associations that take place
together far more often than you would find in a random sampling of possibilities. This
rule-based approach is a fast and powerful tool for mining categorized, non-numeric databases.
A classic example of this system in practice is analyzing retail sales to find the best way to place
items in a store. In a store with a million transactions a year, 10,000 sales might include newborn
baby diapers and 100,000 might include razor blades. At first glance, newborn diapers and razors
seem unrelated, with no apparent correlation. But rule mining digs deeper into the transaction
frequencies and finds that 5,000 sales include both items.
So instead of simply learning that 1% of shoppers buy diapers and 10% buy razor blades, the
association system generates a new rule that 50% of all shoppers purchasing newborn diapers
will also buy razor blades, which is a much more useful piece of information for retailers.
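The arithmetic behind this example can be written out as follows, using only the counts given above (1,000,000 transactions, 10,000 with diapers, 100,000 with razor blades, 5,000 with both). The lift value in the last line is not stated in the text; it follows from dividing the rule's confidence by the baseline share of razor-blade purchases.

```latex
\begin{align*}
\text{support}(\text{diapers}) &= \frac{10{,}000}{1{,}000{,}000} = 0.01 \\
\text{support}(\text{razors}) &= \frac{100{,}000}{1{,}000{,}000} = 0.10 \\
\text{support}(\text{diapers} \cup \text{razors}) &= \frac{5{,}000}{1{,}000{,}000} = 0.005 \\
\text{confidence}(\text{diapers} \Rightarrow \text{razors}) &= \frac{0.005}{0.01} = 0.5 \\
\text{lift}(\text{diapers} \Rightarrow \text{razors}) &= \frac{0.5}{0.10} = 5
\end{align*}
```

A lift of 5 means diaper buyers purchase razor blades five times as often as shoppers in general.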
Just as important, the rule-based approach improves and generates new rules as it analyzes
more data. With a large enough dataset, this allows the machine to approximate the human
brain's ability to extract features and form abstract associations from raw data. The same
basic technique has many other applications as well.
Association rule learning is a kind of unsupervised learning technique that tests how strongly
one data element depends on another and maps those dependencies so that they can be used
cost-effectively. It tries to discover interesting relations or associations between the variables
of the dataset, and it relies on various rules to do so.
Association rule learning is an important approach in machine learning, and it is employed in
market basket analysis, web usage mining, continuous production, and so on. Market basket
analysis is an approach used by several big retailers to find the relations between items.
Web mining can be viewed as the application of adapted data mining methods to the internet,
whereas data mining in general is defined as the application of algorithms to discover patterns in
mostly structured data as part of a knowledge discovery process.
Web mining is distinctive in that it must support multiple data types. The web has several
aspects that lead to different approaches to mining: web pages contain text, web pages are
connected via hyperlinks, and user activity can be monitored via web server logs.
In market basket analysis, customer buying habits are analyzed by finding associations between
the different items that customers place in their shopping baskets. By discovering which items
are frequently purchased together, retailers can develop marketing strategies. These
associations can lead to increased sales by helping retailers do selective marketing and plan
their shelf space.
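As a rough illustration of the first step in market basket analysis, the sketch below counts how often each pair of items appears in the same basket. The `pair_counts` helper and the sample baskets are hypothetical, chosen only for the example.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "eggs"],
]

for pair, count in pair_counts(baskets).most_common(3):
    print(pair, count)
```

Pairs with high counts are candidates for joint promotions or adjacent shelf placement; the interestingness measures described elsewhere in this document help decide whether a frequent pair is actually a meaningful association.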
Types of Association Rule Learning
The main types of association rule learning algorithms are as follows −
Apriori Algorithm − This algorithm uses frequent itemsets to produce association rules. It is
designed to work on databases that contain transactions. It uses a breadth-first search and a
hash tree to count itemsets efficiently.
It is generally used for market basket analysis, where it helps identify products that are likely
to be purchased together. It can also be used in healthcare to discover drug reactions in patients.
Eclat Algorithm − Eclat stands for Equivalence Class Transformation. This algorithm uses a
depth-first search to discover frequent itemsets in a transaction database. It typically runs
faster than the Apriori algorithm.
F-P Growth Algorithm − F-P stands for Frequent Pattern. The algorithm is an improved
alternative to Apriori. It represents the database as a tree structure known as a frequent
pattern tree (FP-tree), which is then used to extract the most frequent patterns.
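The level-wise, breadth-first idea behind Apriori can be sketched as follows. This is a simplified illustration under stated assumptions: it counts candidates by scanning the transactions directly rather than using a hash tree, and the function name, parameters, and sample data are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, s in sorted(apriori(transactions, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most larger candidates are discarded without being counted.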
An association rule takes the form "If X, then Y". The If part is referred to as the antecedent,
and the Then part is referred to as the consequent. Single cardinality refers to relationships in
which we can discover an association between two items. As the number of items in a rule
grows, so does its cardinality. There are several metrics for measuring the relationships
between thousands of data items. These metrics are as follows:
Support
Confidence
Lift
Let's understand each of them:
Support
Support is how frequently an itemset appears in the dataset. It is the fraction of all
transactions that contain the itemset X. For an itemset X and a total of T transactions, it
can be written as:
$$\text{Support}(X) = \frac{\text{Freq}(X)}{T}$$
where Freq(X) is the number of transactions that contain X.
Confidence
Confidence represents how often the rule has been found to be correct. In other words, given
that X occurs, it is how frequently X and Y appear together in the dataset. It is the ratio of
the number of transactions that contain both X and Y to the number of transactions that
contain X:
$$\text{Confidence}(X \Rightarrow Y) = \frac{\text{Freq}(X, Y)}{\text{Freq}(X)}$$
Lift
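Lift measures the strength of a rule and can be defined by the following formula:
$$\text{Lift}(X \Rightarrow Y) = \frac{\text{Support}(X, Y)}{\text{Support}(X) \times \text{Support}(Y)}$$
It is the ratio of the observed support to the support that would be expected if X and Y were
independent of one another. Its value falls into one of three cases: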
Lift = 1 : The antecedent and consequent occur independently of one another.
Lift > 1 : The two itemsets are positively associated; the larger the value, the more
strongly they depend on each other.
Lift < 1 : One item tends to be bought in place of the other, implying that the presence
of one item has a negative effect on the other.
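The three cases can be expressed as a small helper. This is an illustrative sketch: the function names and the tolerance `tol`, used to treat values very close to 1 as independent, are assumptions for the example.

```python
def lift_from_supports(support_xy: float, support_x: float, support_y: float) -> float:
    """Observed joint support divided by the support expected under independence."""
    return support_xy / (support_x * support_y)

def interpret_lift(lift: float, tol: float = 1e-9) -> str:
    """Map a lift value to one of the three cases described above."""
    if abs(lift - 1.0) <= tol:
        return "independent"
    if lift > 1.0:
        return "positive association"
    return "negative association (possible substitutes)"

# Hypothetical supports: X in 1% of transactions, Y in 10%, both in 0.5%.
value = lift_from_supports(0.005, 0.01, 0.10)
print(round(value, 6), interpret_lift(value))
```

In practice, lift values computed from finite samples are rarely exactly 1, so a wider tolerance or a statistical test may be more appropriate than the tiny default used here.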
Types of Association Rule Learning
Association rule learning can be divided into three algorithms:
Apriori Algorithm
To build association rules, this technique relies on frequent itemsets. It is designed to work
with databases that contain transactions. To count itemsets efficiently, this algorithm
uses a breadth-first search and a hash tree.
It is mostly used for market basket analysis, where it helps determine which products are
likely to be purchased together. It can also be used in healthcare to discover drug reactions
in patients.
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. This algorithm finds frequent itemsets in
a transaction database by using a depth-first search technique. It typically executes
faster than the Apriori algorithm.
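A minimal sketch of the depth-first search is shown below. It assumes the vertical data layout usually associated with Eclat, in which each item maps to the set of transaction ids that contain it and the support of a larger itemset is found by intersecting those sets. The function name, the `min_count` parameter, and the sample data are hypothetical.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least `min_count` transactions."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # `candidates` holds (item, tidset) pairs that may extend `prefix`.
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Go deeper first: intersect with the tidsets of the remaining items.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items(), key=lambda kv: kv[0]))
    return frequent

# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, count in eclat(transactions, 2).items():
    print(sorted(itemset), count)
```

Because support is obtained from set intersections, the database is scanned only once to build the vertical layout, which is one reason the approach can be faster than repeated level-wise scans.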
F-P Growth Algorithm
F-P stands for Frequent Pattern. The F-P growth algorithm is an improved alternative to the
Apriori algorithm. It represents the database as a tree structure known as a frequent pattern
tree (FP-tree). The goal of this tree is to extract the most common patterns.
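The tree structure can be illustrated with a short construction sketch. It covers only the building of the FP-tree, not the pattern-mining step that follows, and the `FPNode` class, the `build_fp_tree` function, and the sample data are hypothetical names introduced for the example.

```python
from collections import Counter

class FPNode:
    """One node of the FP-tree: an item, a count, and child nodes keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_count):
    """Build an FP-tree from the items that occur in at least `min_count` transactions."""
    # Pass 1: count items and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction with items ordered by descending frequency,
    # so that transactions sharing frequent items also share a path prefix.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent

# Hypothetical transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
root, frequent = build_fp_tree(transactions, 2)
print({item: node.count for item, node in root.children.items()})
```

Shared prefixes are what make the tree compact: the three transactions containing bread all pass through a single bread node whose count records how many of them there were.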
Applications of Association Rule Learning
Association rule learning can be used in a variety of machine learning and data mining
applications. The following are some of its most common uses:
Market Basket Analysis: One of the most well-known applications of association rule
mining is market basket analysis. Large retailers frequently employ this technique to
determine the relationships between items.
Medical Diagnosis: Association rules can support diagnosis, as they help determine the
likelihood of an illness given a patient's symptoms.
Protein Sequence: Association rules aid in the synthesis of artificial proteins.
Association rule learning is also used for catalog design, loss-leader analysis, and a variety
of other tasks.