Unit-II Association Rules
Unit-II Association Rules
and can be interpreted as the ratio of the expected frequency that X occurs without Y
(that is to say, the frequency that the rule makes an incorrect prediction) if X and Y were
independent divided by the observed frequency of incorrect predictions.
This processanalyzes customer buying habits by finding associations between the different items
thatcustomers place in their shopping baskets. The discovery of such associationscan help
retailers develop marketing strategies by gaining insight into which itemsare frequently
purchased together by customers. For instance, if customers are buyingmilk, how likely are they
to also buy bread (and what kind of bread) on the same trip to the supermarket. Such information
can lead to increased sales by helping retailers doselective marketing and plan their shelf space.
Example:
If customers who purchase computers also tend to buy antivirussoftware at the same time, then
placing the hardware display close to the software displaymay help increase the sales of both
items. In an alternative strategy, placing hardware andsoftware at opposite ends of the store may
entice customers who purchase such items topick up other items along the way. For instance,
after deciding on an expensive computer,a customer may observe security systems for sale while
heading toward the software displayto purchase antivirus software and may decide to purchase a
home security systemas well. Market basket analysis can also help retailers plan which items to
put on saleat reduced prices. If customers tend to purchase computers and printers together,
thenhaving a sale on printers may encourage the sale of printers as well as computers.
We can mine the complete set of frequent itemsets, the closed frequent itemsets, and
the maximal frequent itemsets, given a minimum support threshold.
We can also mine constrained frequent itemsets, approximate frequent
itemsets,near- match frequent itemsets, top-k frequent itemsets and so on.
Some methods for associationrule mining can find rules at differing levels of abstraction.
For example, supposethat a set of association rules mined includes the following rules
where X is a variablerepresenting a customer:
In rule (1) and (2), the items bought are referenced at different levels ofabstraction (e.g.,
―computer‖ is a higher-level abstraction of ―laptop computer‖).
3. Based on the number of data dimensions involved in the rule:
If the items or attributes in an association rule reference only one dimension, then it is
asingle-dimensional association rule.
buys(X, ―computer‖))=>buys(X, ―antivirus software‖)
If a rule references two or more dimensions, such as the dimensions age, income, and
buys, then it is amultidimensional association rule. The following rule is an exampleof a
multidimensional rule:
age(X, ―30,31…39‖) ^ income(X, ―42K,…48K‖))=>buys(X, ―high resolution TV‖)
4. Based on the types of values handled in the rule:
If a rule involves associations between the presence or absence of items, it is a
Booleanassociation rule.
If a rule describes associations between quantitative items or attributes, then it is
aquantitative association rule.
7. The transactions in D are scanned in order to determine L3, consisting of those candidate
3-itemsets in C3 having minimum support.
8. The algorithm uses L3x L3 to generate a candidate set of 4-itemsets, C4.
Generating Association Rules from Frequent Itemsets:
Once the frequent itemsets from transactions in a database D have been found, it is
straightforward to generate strong association rules from them.
Example:
For many applications, it is difficult to find strong associations among data items at
lowor primitive levels of abstraction due to the sparsity of data at those levels.
Strong associations discovered at high levels of abstraction may represent
commonsenseknowledge.
Therefore, data mining systems should provide capabilities for mining association
rulesat multiple levels of abstraction, with sufficient flexibility for easy traversal
amongdifferentabstraction spaces.
Association rules generated from mining data at multiple levels of abstraction arecalled
multiple-level or multilevel association rules.
Multilevel association rules can be mined efficiently using concept hierarchies under a
support-confidence framework.
In general, a top-down strategy is employed, where counts are accumulated for the
calculation of frequent itemsets at each concept level, starting at the concept level 1 and
working downward in the hierarchy toward the more specific concept levels,until no
more frequent itemsets can be found.
Association rules that involve two or more dimensions or predicates can be referred
toas multidimensional association rules.
age(X, “20…29”)^occupation(X, “student”)=>buys(X, “laptop”)
Above Rule contains three predicates (age, occupation,and buys), each of which occurs
only once in the rule. Hence, we say that it has norepeated predicates.
Multidimensional association rules with no repeated predicates arecalled
interdimensional association rules.
We can also mine multidimensional associationrules with repeated predicates, which
contain multiple occurrences of some predicates.These rules are called hybrid-
dimensional association rules. An example of sucha rule is the following, where the
predicate buys is repeated:
age(X, ―20…29‖)^buys(X, ―laptop‖)=>buys(X, ―HP printer‖)