CH 5
Outline
• What is an association rule?
• Advanced Frequent Pattern Mining
• Mining Multi-Level Association
• Mining Multi-Dimensional Association
• Mining Quantitative Association Rules
• Mining Rare Patterns and Negative Patterns
Computational Complexity
• Given d unique items:
• Total number of itemsets = 2^d
• Total number of possible association rules:
$$R = \sum_{k=1}^{d-1} \left[ \binom{d}{k} \times \sum_{j=1}^{d-k} \binom{d-k}{j} \right] = 3^d - 2^{d+1} + 1$$
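As a sanity check on the rule count, the short sketch below compares three ways of counting rules X → Y with X and Y non-empty and disjoint: brute-force enumeration, the double sum over left-hand-side size k and right-hand-side size j, and the closed form 3^d − 2^(d+1) + 1. The function names are illustrative only.

```python
from itertools import combinations
from math import comb

def count_rules_brute_force(d: int) -> int:
    """Enumerate every rule X -> Y with X, Y non-empty and disjoint over d items."""
    items = range(d)
    total = 0
    for k in range(1, d):                      # size of the left-hand side
        for lhs in combinations(items, k):
            rest = [i for i in items if i not in lhs]
            for j in range(1, len(rest) + 1):  # size of the right-hand side
                total += sum(1 for _ in combinations(rest, j))
    return total

def count_rules_double_sum(d: int) -> int:
    """Evaluate the double sum over k (LHS size) and j (RHS size)."""
    return sum(
        comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
        for k in range(1, d)
    )

def count_rules_closed_form(d: int) -> int:
    """Closed form 3^d - 2^(d+1) + 1."""
    return 3**d - 2**(d + 1) + 1

for d in range(2, 8):
    assert count_rules_brute_force(d) == count_rules_double_sum(d) == count_rules_closed_form(d)

print(count_rules_closed_form(6))  # 602
```

Even for d = 6 there are already 602 possible rules, which is why rule generation is restricted to frequent itemsets.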
Rule Generation
• Given a frequent itemset L, find all non-empty
subsets f ⊂ L such that f → L − f satisfies the
minimum confidence requirement
• If {A,B,C,D} is a frequent itemset, candidate rules:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB
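A minimal sketch of this enumeration: every non-empty proper subset f of the itemset becomes an antecedent and the remaining items become the consequent. The helper name `candidate_rules` is hypothetical, and no confidence check is applied here.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield (antecedent, consequent) pairs f -> L - f for each non-empty proper subset f."""
    items = sorted(itemset)
    for size in range(len(items) - 1, 0, -1):   # largest antecedents first
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            yield lhs, rhs

rules = list(candidate_rules({"A", "B", "C", "D"}))
for lhs, rhs in rules:
    print("".join(lhs), "->", "".join(rhs))
print(len(rules))  # 14
```

A frequent itemset with k items yields 2^k − 2 candidate rules, which gives the 14 rules for {A,B,C,D}.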
Rule Generation
• How to efficiently generate rules from frequent
itemsets?
• In general, confidence does not have an anti-monotone
property
c(ABC → D) can be larger or smaller than c(AB → D)
• But confidence of rules generated from the same
itemset has an anti-monotone property
• e.g., L = {A,B,C,D}:
c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
• Confidence is anti-monotone w.r.t. number of items on
the RHS of the rule
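The property can be used to prune rule generation level by level: once a rule with consequent Y fails the confidence threshold, no rule from the same itemset whose consequent contains Y needs to be checked. The sketch below is one possible level-wise implementation in the spirit of Apriori-style rule generation. The toy transactions, the threshold 0.6, and all function names are assumptions made for illustration, not data from these slides.

```python
from itertools import combinations

# Hypothetical toy transactions, invented purely for illustration.
TRANSACTIONS = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "D"},
    {"B", "C", "D"},
]

def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)

def confidence(lhs, rhs):
    """c(lhs -> rhs) = support(lhs with rhs) / support(lhs)."""
    return support_count(lhs | rhs) / support_count(lhs)

def rules_from_itemset(itemset, min_conf):
    """Grow consequents one item at a time, keeping only those whose rule passed."""
    itemset = frozenset(itemset)
    accepted = []
    consequents = [frozenset([i]) for i in sorted(itemset)]
    while consequents:
        kept = []
        for rhs in consequents:
            lhs = itemset - rhs
            if not lhs:          # the consequent swallowed the whole itemset
                continue
            c = confidence(lhs, rhs)
            if c >= min_conf:
                accepted.append((lhs, rhs, c))
                kept.append(rhs)
        # Build the next level only from surviving consequents: a larger
        # consequent is a candidate only if all its one-smaller subsets survived.
        kept_set = set(kept)
        next_level = set()
        for a, b in combinations(kept, 2):
            merged = a | b
            if len(merged) == len(a) + 1 and all(merged - {x} in kept_set for x in merged):
                next_level.add(merged)
        consequents = sorted(next_level, key=sorted)
    return accepted

# Confidence shrinks as items move from the LHS to the RHS.
chain = [
    confidence(frozenset("ABC"), frozenset("D")),
    confidence(frozenset("AB"), frozenset("CD")),
    confidence(frozenset("A"), frozenset("BCD")),
]
print(chain)

for lhs, rhs, c in rules_from_itemset({"A", "B", "C", "D"}, min_conf=0.6):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), round(c, 3))
```

The inequality holds because all rules from the same itemset share the numerator support(L), while the denominator support(LHS) can only grow as the left-hand side shrinks.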
Beyond Itemsets
• Sequence Mining
• Finding frequent subsequences from a collection of
sequences
• Graph Mining
• Finding frequent (connected) subgraphs from a
collection of graphs
• Tree Mining
• Finding frequent (embedded) subtrees from a set of
trees/graphs
• Geometric Structure Mining
• Finding frequent substructures from 3-D or 2-D
geometric graphs
• Others…
Research on Pattern Mining: A Road Map
s(A ∪ B) = 1/10^5, s(A) × s(B) = 1/10^3 × 1/10^3, s(A ∪ B) > s(A) × s(B)
• Where is the problem? Null transactions: the support-based
definition is not null-invariant!
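The effect of null transactions can be reproduced numerically. The sketch below assumes A and B each occur in 100 transactions and co-occur in exactly one, which is consistent with the supports quoted above for 10^5 transactions. The 200-transaction case is an added hypothetical for contrast. Kulczynski is used as one example of a null-invariant measure, and it is not named in this section.

```python
def lift(n_total, n_a=100, n_b=100, n_ab=1):
    """s(A ∪ B) / (s(A) * s(B)), computed from raw counts; depends on n_total."""
    return n_ab * n_total / (n_a * n_b)

def kulczynski(n_a=100, n_b=100, n_ab=1):
    """0.5 * (P(A|B) + P(B|A)); the total number of transactions never appears."""
    return 0.5 * (n_ab / n_b + n_ab / n_a)

for n_total in (200, 10**5):
    verdict = "positive" if lift(n_total) > 1 else "negative"
    print(n_total, lift(n_total), verdict, kulczynski())
```

Under the support-based test the same pair of items switches from negatively to positively correlated only because transactions containing neither A nor B were added, whereas the null-invariant measure stays at 0.01 in both cases.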
• Push constraints deeply into the mining process when possible (see
the remaining discussions on constraint-push techniques)
• Use confidence, correlation, and other measures when possible
c either
20 b, c, d, f, g, h
30 b, c, d, f, g
40 c, e, f, g
Item Profit
a 40
b 0
c -20
d -15
• The key for data anti-monotone is recursive
data reduction
• Ex. 1. sum(S.Price) ≥ v is data anti-monotone
• Ex. 2. min(S.Price) ≤ v is data anti-monotone
• Ex. 3. C: range(S.profit) ≥ 25 is data anti-monotone
• Itemset {b, c}’s projected DB: