Association Rule Learning

Uploaded by

Manoj Kanti Debnath

Association rule learning

Association rule learning is a rule-based machine learning method for discovering interesting relations
between variables in large databases. It is intended to identify strong rules discovered in databases using
some measures of interestingness.[1]

Based on the concept of strong rules, Rakesh Agrawal, Tomasz Imieliński and Arun Swami[2] introduced
association rules for discovering regularities between products in large-scale transaction data recorded by
point-of-sale (POS) systems in supermarkets. For example, the rule
$\{\mathrm{onions}, \mathrm{potatoes}\} \Rightarrow \{\mathrm{burger}\}$
found in the sales data of a supermarket would indicate that if a customer buys onions and potatoes
together, they are likely to also buy hamburger meat. Such information can be used as the basis for
decisions about marketing activities such as promotional pricing or product placements.

In addition to the above example from market basket analysis, association rules are employed today in
many application areas including Web usage mining, intrusion detection, continuous production, and
bioinformatics. In contrast with sequence mining, association rule learning typically does not consider
the order of items either within a transaction or across transactions.

Contents
Definition
Useful Concepts
Support
Confidence
Lift
Conviction
Alternative measures of interestingness
Process
History
Statistically sound associations
Algorithms
Apriori algorithm
Eclat algorithm
FP-growth algorithm
Others
ASSOC
OPUS search
Lore
Other types of association rule mining
See also
References
Bibliographies
Definition
Following the original definition by Agrawal, Imieliński, Swami[2] the problem of association rule
mining is defined as:

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of $n$ binary attributes called items.

Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database.

Each transaction in $D$ has a unique transaction ID and contains a subset of the items in $I$.

A rule is defined as an implication of the form:

$X \Rightarrow Y$, where $X, Y \subseteq I$.

In Agrawal, Imieliński, Swami[2] a rule is defined only between a set and a single item,
$X \Rightarrow i_j$ for $i_j \in I$.

Every rule is composed of two different sets of items, also known as itemsets, $X$ and $Y$, where $X$ is
called the antecedent or left-hand side (LHS) and $Y$ the consequent or right-hand side (RHS).

Example database with 5 transactions and 5 items

transaction ID   milk   bread   butter   beer   diapers
1                1      1       0        0      0
2                0      0       1        0      0
3                0      0       0        1      1
4                1      1       1        0      0
5                0      1       0        0      0

To illustrate the concepts, we use a small example from the supermarket domain. The set of items is
$I = \{\mathrm{milk}, \mathrm{bread}, \mathrm{butter}, \mathrm{beer}, \mathrm{diapers}\}$ and the table
shows a small database containing the items, where, in each entry, the value 1 means the presence of the
item in the corresponding transaction, and the value 0 represents the absence of an item in that
transaction.

An example rule for the supermarket could be
$\{\mathrm{butter}, \mathrm{bread}\} \Rightarrow \{\mathrm{milk}\}$, meaning that if butter and
bread are bought, customers also buy milk.

Note: this example is extremely small. In practical applications, a rule needs a support of several hundred
transactions before it can be considered statistically significant, and datasets often contain thousands or
millions of transactions.

Useful Concepts
In order to select interesting rules from the set of all possible rules, constraints on various measures of
significance and interest are used. The best-known constraints are minimum thresholds on support and
confidence.

Let $X, Y$ be itemsets, $X \Rightarrow Y$ an association rule and $T$ a set of transactions of a given database.

Support
Support is an indication of how frequently the itemset appears in the dataset.

The support of $X$ with respect to $T$ is defined as the proportion of transactions $t$ in the dataset
which contain the itemset $X$:

$\mathrm{supp}(X) = \frac{|\{t \in T ; X \subseteq t\}|}{|T|}$

In the example dataset, the itemset $X = \{\mathrm{beer}, \mathrm{diapers}\}$ has a support of
$1/5 = 0.2$ since it occurs in 20% of all transactions (1 out of 5 transactions). The argument of
$\mathrm{supp}()$ is a set of preconditions, and thus becomes more restrictive as it grows (instead of
more inclusive).[3]
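
The support definition can be checked directly on the five-transaction example. The following is a minimal Python sketch, not a reference implementation; the names `TRANSACTIONS` and `support` are illustrative choices, and exact fractions are used only to avoid floating-point rounding in the printed values.

```python
from fractions import Fraction

# Example database from the table: one set of items per transaction.
TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

print(support({"beer", "diapers"}, TRANSACTIONS))  # 1/5
print(support({"bread"}, TRANSACTIONS))            # 3/5
```

Adding items to the argument can only keep the support equal or lower it, which is the sense in which a larger itemset is more restrictive.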

Confidence
Confidence is an indication of how often the rule has been found to be true.

The confidence value of a rule, $X \Rightarrow Y$, with respect to a set of transactions $T$, is the
proportion of the transactions that contain $X$ which also contain $Y$.

Confidence is defined as:

$\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$

For example, the rule $\{\mathrm{butter}, \mathrm{bread}\} \Rightarrow \{\mathrm{milk}\}$ has a
confidence of $0.2 / 0.2 = 1.0$ in the database, which means that for 100% of the transactions
containing butter and bread the rule is correct (100% of the times a customer buys butter and bread,
milk is bought as well).

Note that $\mathrm{supp}(X \cup Y)$ means the support of the union of the items in X and Y. This is
somewhat confusing since we normally think in terms of probabilities of events and not sets of items.
We can rewrite $\mathrm{supp}(X \cup Y)$ as the probability $P(E_X \cap E_Y)$, where $E_X$ and $E_Y$
are the events that a transaction contains itemset $X$ and $Y$, respectively.[4]

Thus confidence can be interpreted as an estimate of the conditional probability $P(E_Y | E_X)$, the
probability of finding the RHS of the rule in transactions under the condition that these transactions
also contain the LHS.[3][5]
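
As an illustration, confidence can be computed from two support values. This is a small Python sketch under the same assumptions as the example database; `support` and `confidence` are hypothetical helper names, and the function assumes the antecedent occurs in at least one transaction.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def confidence(lhs, rhs, transactions):
    """supp(X u Y) / supp(X); raises ZeroDivisionError if X never occurs."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

print(confidence({"butter", "bread"}, {"milk"}, TRANSACTIONS))  # 1
print(confidence({"milk", "bread"}, {"butter"}, TRANSACTIONS))  # 1/2
```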

Lift
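The lift of a rule is defined as:

$\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X) \times \mathrm{supp}(Y)}$

or the ratio of the observed support to that expected if X and Y were independent.

For example, the rule $\{\mathrm{milk}, \mathrm{bread}\} \Rightarrow \{\mathrm{butter}\}$ has a lift of
$\frac{0.2}{0.4 \times 0.4} = 1.25$.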

If the rule had a lift of 1, it would imply that the probability of occurrence of the antecedent and that of
the consequent are independent of each other. When two events are independent of each other, no rule
can be drawn involving those two events.
If the lift is > 1, that lets us know the degree to which those two occurrences are dependent on one
another, and makes those rules potentially useful for predicting the consequent in future data sets.

If the lift is < 1, that lets us know the items are substitutes for each other. This means that the
presence of one item has a negative effect on the presence of the other item and vice versa.

The value of lift is that it considers both the support of the rule and the overall data set.[3]
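
The lift of the example rule can be reproduced with a few lines of Python. This is only a sketch; `support` and `lift` are illustrative names, and exact fractions are used because the same computation in binary floating point does not come out at exactly 1.25.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def lift(lhs, rhs, transactions):
    """Observed joint support divided by the support expected under independence."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint / (support(lhs, transactions) * support(rhs, transactions))

value = lift({"milk", "bread"}, {"butter"}, TRANSACTIONS)
print(value, float(value))  # 5/4 1.25
```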

Conviction

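The conviction of a rule is defined as
$\mathrm{conv}(X \Rightarrow Y) = \frac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}$.[6]

For example, the rule $\{\mathrm{milk}, \mathrm{bread}\} \Rightarrow \{\mathrm{butter}\}$ has a
conviction of $\frac{1 - 0.4}{1 - 0.5} = 1.2$, and can be interpreted as the ratio of the expected
frequency that X occurs without Y (that is to say, the frequency that the rule makes an incorrect
prediction) if X and Y were independent divided by the observed frequency of incorrect predictions. In
this example, the conviction value of 1.2 shows that the rule
$\{\mathrm{milk}, \mathrm{bread}\} \Rightarrow \{\mathrm{butter}\}$ would be incorrect 20% more often
(1.2 times as often) if the association between X and Y was purely random chance.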
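
The conviction value for the example rule can be verified as follows. This Python sketch uses hypothetical helper names; returning infinity for a rule with confidence 1 is a convention chosen here for illustration, since the ratio is undefined in that case.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def confidence(lhs, rhs, transactions):
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def conviction(lhs, rhs, transactions):
    """(1 - supp(Y)) / (1 - conf(X => Y)); infinite when the rule never fails."""
    conf = confidence(lhs, rhs, transactions)
    if conf == 1:
        return float("inf")  # assumption: treat a never-wrong rule as infinite conviction
    return (1 - support(rhs, transactions)) / (1 - conf)

print(conviction({"milk", "bread"}, {"butter"}, TRANSACTIONS))  # 6/5
```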

Alternative measures of interestingness

In addition to confidence, other measures of interestingness for rules have been proposed. Some popular
measures are:

All-confidence[7]
Collective strength[8]
Leverage[9]
Several more measures are presented and compared by Tan et al.[10] and by Hahsler.[4] Looking for
techniques that can model what the user has known (and using these models as interestingness measures)
is currently an active research trend under the name of "Subjective Interestingness."
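
Two of the listed measures are simple enough to sketch in Python. The formulas below are not given in this text; they are the definitions as commonly stated (leverage as supp(X ∪ Y) − supp(X)·supp(Y), all-confidence as the itemset's support divided by the largest single-item support within it) and should be read as assumptions. Collective strength is omitted.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def leverage(lhs, rhs, transactions):
    """Difference between observed joint support and the independence baseline."""
    joint = support(set(lhs) | set(rhs), transactions)
    return joint - support(lhs, transactions) * support(rhs, transactions)

def all_confidence(itemset, transactions):
    """Support of the itemset relative to its most frequent single item."""
    top = max(support({item}, transactions) for item in itemset)
    return support(itemset, transactions) / top

print(leverage({"milk", "bread"}, {"butter"}, TRANSACTIONS))  # 1/25
print(all_confidence({"milk", "bread"}, TRANSACTIONS))        # 2/3
```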

Process
Association rules are usually required to satisfy a user-specified minimum support and a user-specified
minimum confidence at the same time. Association rule generation is usually split up into two separate
steps:

1. A minimum support threshold is applied to find all frequent itemsets in a database.

2. A minimum confidence constraint is applied to these frequent itemsets in order to form
rules.
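
The two steps can be sketched in Python as follows. This is a deliberately naive illustration, not an efficient miner: step 1 tests every non-empty itemset, and the function names and thresholds are assumptions made for the example.

```python
from fractions import Fraction
from itertools import combinations

TRANSACTIONS = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset, transactions):
    itemset = frozenset(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))

def frequent_itemsets(transactions, min_support):
    """Step 1 (brute force): keep every non-empty itemset meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result

def rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into LHS => RHS and keep confident rules."""
    found = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the table.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    found.append((lhs, itemset - lhs, conf))
    return found

freq = frequent_itemsets(TRANSACTIONS, Fraction(2, 5))
for lhs, rhs, conf in rules(freq, Fraction(4, 5)):
    print(sorted(lhs), "=>", sorted(rhs), conf)  # ['milk'] => ['bread'] 1
```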
While the second step is straightforward, the first step needs more attention.

Finding all frequent itemsets in a database is difficult since it involves searching all possible
itemsets (item combinations). The set of possible itemsets is the power set over $I$ and has size
$2^n - 1$ (excluding the empty set which is not a valid itemset). Although the size of the power-set
grows exponentially in the number of items $n$ in $I$, efficient search is possible using the
downward-closure property of support[2][11] (also called anti-monotonicity[12]) which guarantees that
for a frequent itemset, all its subsets are also frequent and thus no infrequent itemset can be a
subset of a frequent itemset. Exploiting this property, efficient algorithms (e.g., Apriori[13] and
Eclat[14]) can find all frequent itemsets.

Figure: Frequent itemset lattice, where the color of the box indicates how many transactions contain
the combination of items. Note that lower levels of the lattice can contain at most the minimum number
of their parents' items; e.g. {ac} can have only at most $\min(a, c)$ items. This is called the
downward-closure property.[2]

History
The concept of association rules was popularised particularly due
to the 1993 article of Agrawal et al.,[2] which has acquired more
than 18,000 citations according to Google Scholar, as of August
2015, and is thus one of the most cited papers in the Data Mining field. However, what is now called
"association rules" was already introduced in the 1966 paper[15] on GUHA, a general data mining method
developed by Petr Hájek et al.[16]

An early (circa 1989) use of minimum support and confidence to find all association rules is the Feature
Based Modeling framework, which found all rules with $\mathrm{supp}(X)$ and
$\mathrm{conf}(X \Rightarrow Y)$ greater than user defined constraints.[17]

Statistically sound associations

One limitation of the standard approach to discovering associations is that by searching massive numbers
of possible associations to look for collections of items that appear to be associated, there is a large risk
of finding many spurious associations. These are collections of items that co-occur with unexpected
frequency in the data, but only do so by chance. For example, suppose we are considering a collection of
10,000 items and looking for rules containing two items in the left-hand-side and 1 item in the right-
hand-side. There are approximately 1,000,000,000,000 such rules. If we apply a statistical test for
independence with a significance level of 0.05 it means there is only a 5% chance of accepting a rule if
there is no association. If we assume there are no associations, we should nonetheless expect to find
50,000,000,000 rules. Statistically sound association discovery[18][19] controls this risk, in most cases
reducing the risk of finding any spurious associations to a user-specified significance level.
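
The orders of magnitude in this example can be reproduced with a short calculation. The sketch below is an illustration under stated assumptions: the figure of roughly 10^12 rules corresponds to counting ordered choices of two left-hand-side items and one right-hand-side item; if the left-hand side is treated as an unordered pair, the count is about half of that.

```python
from math import comb

n_items = 10_000
alpha = 0.05  # significance level of the independence test

# Ordered choice of two LHS items and one RHS item, all distinct.
ordered = n_items * (n_items - 1) * (n_items - 2)
# Treating the LHS as an unordered pair halves the count.
unordered = comb(n_items, 2) * (n_items - 2)

print(f"{ordered:.3e}")          # 9.997e+11
print(f"{unordered:.3e}")        # 4.999e+11
print(f"{alpha * ordered:.3e}")  # 4.999e+10 rules expected to pass by chance
```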

Algorithms
Many algorithms for generating association rules have been proposed.

Some well-known algorithms are Apriori, Eclat and FP-Growth, but they only do half the job, since they
are algorithms for mining frequent itemsets. Another step needs to be done after to generate rules from
frequent itemsets found in a database.
Apriori algorithm
Apriori[13] uses a breadth-first search strategy to count the support of itemsets and uses a candidate
generation function which exploits the downward closure property of support.
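
A level-wise search of this kind might look like the following Python sketch. It is a simplified illustration rather than the published algorithm: candidates are formed by taking unions of frequent (k-1)-itemsets, pruned with the downward-closure property, and then counted in one pass per level. The function name and the use of an absolute count threshold are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                # Kept only if every (k-1)-subset is frequent (downward closure).
                candidates.add(union)
        # One counting pass over the database for this level (breadth-first).
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent

db = [{"milk", "bread"}, {"butter"}, {"beer", "diapers"},
      {"milk", "bread", "butter"}, {"bread"}]
result = apriori(db, min_count=2)
print(sorted((sorted(s), c) for s, c in result.items()))
```

On the example database with a minimum count of 2, the frequent itemsets are {bread}, {butter}, {milk} and {milk, bread}.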

Eclat algorithm
Eclat[14] (alt. ECLAT, stands for Equivalence Class Transformation) is a depth-first search algorithm
based on set intersection. It is suitable for both sequential as well as parallel execution with locality-
enhancing properties.[20][21]
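
The set-intersection idea can be sketched in Python as below. This is a minimal sequential illustration and not the published implementation: each item is mapped to the set of transaction IDs containing it, and itemsets are extended depth-first by intersecting those ID sets. All names are assumptions.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} using a vertical (item -> transaction IDs) layout."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersect with the remaining items to form the next equivalence class.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_count:
                    suffix.append((other, shared))
            extend(itemset, suffix)  # depth-first descent

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), initial)
    return frequent

db = [{"milk", "bread"}, {"butter"}, {"beer", "diapers"},
      {"milk", "bread", "butter"}, {"bread"}]
result = eclat(db, min_count=2)
print(sorted((sorted(s), c) for s, c in result.items()))
```

The support of an itemset is simply the size of its intersected ID set, so no further pass over the transactions is needed once the vertical layout is built.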

FP-growth algorithm
FP stands for frequent pattern.[22]

In the first pass, the algorithm counts the occurrences of items (attribute-value pairs) in the dataset of
transactions, and stores these counts in a 'header table'. In the second pass, it builds the FP-tree structure
by inserting transactions into a trie.

Items in each transaction have to be sorted by descending order of their frequency in the dataset before
being inserted so that the tree can be processed quickly. Items in each transaction that do not meet the
minimum support requirement are discarded. If many transactions share most frequent items, the FP-tree
provides high compression close to tree root.
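
The two passes that build the FP-tree can be sketched in Python as follows. This covers tree construction only, not the recursive mining step; the `Node` class, the `node_links` dictionary and the tie-breaking by item name are assumptions made for the illustration.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First pass: count item occurrences and keep frequent items (header table).
    counts = Counter(item for t in transactions for item in set(t))
    header = {item: c for item, c in counts.items() if c >= min_count}

    root = Node(None, None)
    node_links = {item: [] for item in header}  # item -> all tree nodes holding it

    # Second pass: insert each filtered, frequency-sorted transaction into the trie.
    for t in transactions:
        items = [i for i in set(t) if i in header]
        items.sort(key=lambda i: (-header[i], i))  # descending frequency, ties by name
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                node_links[item].append(child)
            child.count += 1  # shared prefixes only increase a counter
            node = child
    return root, header, node_links

db = [{"milk", "bread"}, {"butter"}, {"beer", "diapers"},
      {"milk", "bread", "butter"}, {"bread"}]
root, header, node_links = build_fp_tree(db, min_count=2)
print(header["bread"], root.children["bread"].count)  # 3 3
```

Because the three transactions containing bread all start with that item after sorting, they share a single node directly below the root.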

Recursive processing of this compressed version of the main dataset grows frequent item sets directly,
instead of generating candidate items and testing them against the entire database (as in the apriori
algorithm).

Growth begins from the bottom of the header table, i.e. the item with the smallest support, by finding
all sorted transactions that end in that item. Call this item $I$.

A new conditional tree is created which is the original FP-tree projected onto $I$. The supports of all
nodes in the projected tree are re-counted with each node getting the sum of its children counts. Nodes
(and hence subtrees) that do not meet the minimum support are pruned. Recursive growth ends when no
individual items conditional on $I$ meet the minimum support threshold. The resulting paths from root
to $I$ will be frequent itemsets. After this step, processing continues with the next least-supported
header item of the original FP-tree.

Once the recursive process has completed, all frequent item sets will have been found, and association
rule creation begins.[23]

Others

ASSOC
The ASSOC procedure[24] is a GUHA method which mines for generalized association rules using fast
bitstrings operations. The association rules mined by this method are more general than those output by
apriori, for example "items" can be connected both with conjunction and disjunctions and the relation
between antecedent and consequent of the rule is not restricted to setting minimum support and
confidence as in apriori: an arbitrary combination of supported interest measures can be used.

OPUS search
OPUS is an efficient algorithm for rule discovery that, in contrast to most alternatives, does not require
either monotone or anti-monotone constraints such as minimum support.[25] Initially used to find rules
for a fixed consequent[25][26] it has subsequently been extended to find rules with any item as a
consequent.[27] OPUS search is the core technology in the popular Magnum Opus association discovery
system.

Lore
A famous story about association rule mining is the "beer and diaper" story. A purported survey of
behavior of supermarket shoppers discovered that customers (presumably young men) who buy diapers
tend also to buy beer. This anecdote became popular as an example of how unexpected association rules
might be found from everyday data. There are varying opinions as to how much of the story is true.[28]
Daniel Powers says:[28]

In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff
prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores.
Database queries were developed to identify affinities. The analysis "did discover that
between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did
NOT exploit the beer and diapers relationship by moving the products closer together on the
shelves.

Other types of association rule mining

Multi-Relation Association Rules: Multi-Relation Association Rules (MRAR) are association rules
where each item may have several relations. These relations indicate indirect relationship between the
entities. Consider the following MRAR where the first item consists of three relations live in, nearby and
humid: “Those who live in a place which is nearby a city with humid climate type and also are younger
than 20 -> their health condition is good”. Such association rules are extractable from RDBMS data or
semantic web data.[29]

Contrast set learning is a form of associative learning. Contrast set learners use rules that differ
meaningfully in their distribution across subsets.[30][31]

Weighted class learning is another form of associative learning in which weight may be assigned to
classes to give focus to a particular issue of concern for the consumer of the data mining results.

High-order pattern discovery facilitates the capture of high-order (polythetic) patterns or event
associations that are intrinsic to complex real-world data.[32]

K-optimal pattern discovery provides an alternative to the standard approach to association rule
learning that requires that each pattern appear frequently in the data.

Approximate Frequent Itemset mining is a relaxed version of Frequent Itemset mining that allows
some of the items in some of the rows to be 0.[33]

Generalized Association Rules: hierarchical taxonomy (concept hierarchy)

Quantitative Association Rules: categorical and quantitative data

Interval Data Association Rules: e.g. partition the age into 5-year-increment ranges

Sequential pattern mining discovers subsequences that are common to more than minsup sequences in
a sequence database, where minsup is set by the user. A sequence is an ordered list of transactions.[34]

Subspace Clustering, a specific type of Clustering high-dimensional data, is in many variants also based
on the downward-closure property for specific clustering models.[35]

Warmr is shipped as part of the ACE data mining suite. It allows association rule learning for first order
relational rules.[36]

See also
Sequence mining
Production system (computer science)
Learning classifier system
Rule-based machine learning

References
1. Piatetsky-Shapiro, Gregory (1991), Discovery, analysis, and presentation of strong rules, in
Piatetsky-Shapiro, Gregory; and Frawley, William J.; eds., Knowledge Discovery in
Databases, AAAI/MIT Press, Cambridge, MA.
2. Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of
items in large databases". Proceedings of the 1993 ACM SIGMOD international conference
on Management of data - SIGMOD '93. p. 207. CiteSeerX 10.1.1.40.6984
(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.6984). doi:10.1145/170035.170072
(https://doi.org/10.1145%2F170035.170072). ISBN 978-0897915922.
3. Hahsler, Michael (2005). "Introduction to arules – A computational environment for mining
association rules and frequent item sets"
(https://mran.revolutionanalytics.com/web/packages/arules/vignettes/arules.pdf) (PDF).
Journal of Statistical Software.
4. Michael Hahsler (2015). A Probabilistic Comparison of Commonly Used Interest Measures
for Association Rules. http://michael.hahsler.net/research/association_rules/measures.html
5. Hipp, J.; Güntzer, U.; Nakhaeizadeh, G. (2000). "Algorithms for association rule mining --- a
general survey and comparison". ACM SIGKDD Explorations Newsletter. 2: 58–64.
CiteSeerX 10.1.1.38.5305 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.5305).
doi:10.1145/360402.360421 (https://doi.org/10.1145%2F360402.360421).
6. Brin, Sergey; Motwani, Rajeev; Ullman, Jeffrey D.; Tsur, Shalom (1997). "Dynamic itemset
counting and implication rules for market basket data". Proceedings of the 1997 ACM
SIGMOD international conference on Management of data - SIGMOD '97. pp. 255–264.
CiteSeerX 10.1.1.41.6476 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.6476).
doi:10.1145/253260.253325 (https://doi.org/10.1145%2F253260.253325). ISBN 978-0897919111.
7. Omiecinski, E.R. (2003). "Alternative interest measures for mining associations in
databases". IEEE Transactions on Knowledge and Data Engineering. 15: 57–69.
CiteSeerX 10.1.1.329.5344 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.5344).
doi:10.1109/TKDE.2003.1161582 (https://doi.org/10.1109%2FTKDE.2003.1161582).
8. Aggarwal, Charu C.; Yu, Philip S. (1998). "A new framework for itemset generation".
Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles
of database systems - PODS '98. pp. 18–24. CiteSeerX 10.1.1.24.714
(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.714). doi:10.1145/275487.275490
(https://doi.org/10.1145%2F275487.275490). ISBN 978-0897919968.
9. Piatetsky-Shapiro, Gregory; Discovery, analysis, and presentation of strong rules,
Knowledge Discovery in Databases, 1991, pp. 229–248
10. Tan, Pang-Ning; Kumar, Vipin; Srivastava, Jaideep (2004). "Selecting the right objective
measure for association analysis". Information Systems. 29 (4): 293–313.
CiteSeerX 10.1.1.331.4740 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.331.4740).
doi:10.1016/S0306-4379(03)00072-3 (https://doi.org/10.1016%2FS0306-4379%2803%2900072-3).
11. Tan, Pang-Ning; Michael, Steinbach; Kumar, Vipin (2005). "Chapter 6. Association Analysis:
Basic Concepts and Algorithms" (http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf)
(PDF). Introduction to Data Mining. Addison-Wesley. ISBN 978-0-321-32136-7.
12. Jian Pei; Jiawei Han; Lakshmanan, L.V.S. (2001). "Mining frequent itemsets with convertible
constraints". Proceedings 17th International Conference on Data Engineering. pp. 433–442.
CiteSeerX 10.1.1.205.2150 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.205.2150).
doi:10.1109/ICDE.2001.914856 (https://doi.org/10.1109%2FICDE.2001.914856).
ISBN 978-0-7695-1001-9.
13. Agrawal, Rakesh; and Srikant, Ramakrishnan; Fast algorithms for mining association rules
in large databases (http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf) Archived
(https://web.archive.org/web/20150225213708/http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf)
2015-02-25 at the Wayback Machine, in Bocca, Jorge B.; Jarke, Matthias; and
Zaniolo, Carlo; editors, Proceedings of the 20th International Conference on Very Large
Data Bases (VLDB), Santiago, Chile, September 1994, pages 487–499
14. Zaki, M. J. (2000). "Scalable algorithms for association mining". IEEE Transactions on
Knowledge and Data Engineering. 12 (3): 372–390. CiteSeerX 10.1.1.79.9448
(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.79.9448). doi:10.1109/69.846291
(https://doi.org/10.1109%2F69.846291).
15. Hájek, P.; Havel, I.; Chytil, M. (1966). "The GUHA method of automatic hypotheses
determination". Computing. 1 (4): 293–308. doi:10.1007/BF02345483
(https://doi.org/10.1007%2FBF02345483).
16. Hájek, Petr; Rauch, Jan; Coufal, David; Feglar, Tomáš (2004). "The GUHA Method, Data
Preprocessing and Mining". Database Support for Data Mining Applications. Lecture Notes
in Computer Science. 2682. pp. 135–153. doi:10.1007/978-3-540-44497-8_7
(https://doi.org/10.1007%2F978-3-540-44497-8_7). ISBN 978-3-540-22479-2.
17. Webb, Geoffrey (1989). "A Machine Learning Approach to Student Modelling". Proceedings
of the Third Australian Joint Conference on Artificial Intelligence (AI 89): 195–205.
18. Webb, Geoffrey I. (2007). "Discovering Significant Patterns". Machine Learning. 68: 1–33.
doi:10.1007/s10994-007-5006-x (https://doi.org/10.1007%2Fs10994-007-5006-x).
19. Gionis, Aristides; Mannila, Heikki; Mielikäinen, Taneli; Tsaparas, Panayiotis (2007).
"Assessing data mining results via swap randomization". ACM Transactions on Knowledge
Discovery from Data. 1 (3): 14–es. CiteSeerX 10.1.1.141.2607
(https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.141.2607). doi:10.1145/1297332.1297338
(https://doi.org/10.1145%2F1297332.1297338).
20. Zaki, Mohammed Javeed; Parthasarathy, Srinivasan; Ogihara, Mitsunori; Li, Wei (1997).
"New Algorithms for Fast Discovery of Association Rules": 283–286.
CiteSeerX 10.1.1.42.3283 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.42.3283).
hdl:1802/501 (https://hdl.handle.net/1802%2F501).
21. Zaki, Mohammed J.; Parthasarathy, Srinivasan; Ogihara, Mitsunori; Li, Wei (1997). "Parallel
Algorithms for Discovery of Association Rules". Data Mining and Knowledge Discovery. 1
(4): 343–373. doi:10.1023/A:1009773317876 (https://doi.org/10.1023%2FA%3A1009773317876).
22. Han (2000). Mining Frequent Patterns Without Candidate Generation. Proceedings of the
2000 ACM SIGMOD International Conference on Management of Data. SIGMOD '00.
pp. 1–12. CiteSeerX 10.1.1.40.4436 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.4436).
doi:10.1145/342009.335372 (https://doi.org/10.1145%2F342009.335372).
ISBN 978-1581132175.
23. Witten, Frank, Hall: Data mining practical machine learning tools and techniques, 3rd edition
24. Hájek, Petr; Havránek, Tomáš (1978). Mechanizing Hypothesis Formation: Mathematical
Foundations for a General Theory (http://www.cs.cas.cz/hajek/guhabook/). Springer-Verlag.
ISBN 978-3-540-08738-0.
25. Webb, Geoffrey I. (1995); OPUS: An Efficient Admissible Algorithm for Unordered Search,
Journal of Artificial Intelligence Research 3, Menlo Park, CA: AAAI Press, pp. 431–465
online access
(http://webarchive.loc.gov/all/20011118141304/http://www.cs.washington.edu/research/jair/abstracts/webb95a.html)
26. Bayardo, Roberto J., Jr.; Agrawal, Rakesh; Gunopulos, Dimitrios (2000). "Constraint-based
rule mining in large, dense databases". Data Mining and Knowledge Discovery. 4 (2): 217–240.
doi:10.1023/A:1009895914772 (https://doi.org/10.1023%2FA%3A1009895914772).
27. Webb, Geoffrey I. (2000). "Efficient search for association rules". Proceedings of the sixth
ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '00.
pp. 99–107. CiteSeerX 10.1.1.33.1309 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.1309).
doi:10.1145/347090.347112 (https://doi.org/10.1145%2F347090.347112).
ISBN 978-1581132335.
28. "DSS News: Vol. 3, No. 23" (http://www.dssresources.com/newsletters/66.php).
29. Ramezani, Reza, Mohamad Sunni ee, and Mohammad Ali Nematbakhsh; MRAR: Mining
Multi-Relation Association Rules, Journal of Computing and Security, 1, no. 2 (2014)
30. GI Webb and S. Butler and D. Newlands (2003). On Detecting Differences Between Groups
(http://portal.acm.org/citation.cfm?id=956781). KDD'03 Proceedings of the Ninth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining.
31. Menzies, T.; Ying Hu (2003). "Computing practices - Data mining for very busy people".
Computer. 36 (11): 22–29. doi:10.1109/MC.2003.1244531
(https://doi.org/10.1109%2FMC.2003.1244531).
32. Wong, A.K.C.; Yang Wang (1997). "High-order pattern discovery from discrete-valued data".
IEEE Transactions on Knowledge and Data Engineering. 9 (6): 877–893.
CiteSeerX 10.1.1.189.1704 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.189.1704).
doi:10.1109/69.649314 (https://doi.org/10.1109%2F69.649314).
33. Liu, Jinze; Paulsen, Susan; Sun, Xing; Wang, Wei; Nobel, Andrew; Prins, Jan (2006).
"Mining Approximate Frequent Itemsets in the Presence of Noise: Algorithm and Analysis".
Proceedings of the 2006 SIAM International Conference on Data Mining. pp. 407–418.
CiteSeerX 10.1.1.215.3599 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.215.3599).
doi:10.1137/1.9781611972764.36 (https://doi.org/10.1137%2F1.9781611972764.36).
ISBN 978-0-89871-611-5.
34. Zaki, Mohammed J. (2001); SPADE: An Efficient Algorithm for Mining Frequent Sequences,
Machine Learning Journal, 42, pp. 31–60
35. Zimek, Arthur; Assent, Ira; Vreeken, Jilles (2014). Frequent Pattern Mining. pp. 403–423.
doi:10.1007/978-3-319-07821-2_16 (https://doi.org/10.1007%2F978-3-319-07821-2_16).
ISBN 978-3-319-07820-5.
36. King, R. D.; Srinivasan, A.; Dehaspe, L. (Feb 2001). "Warmr: a data mining tool for chemical
data". J Comput Aided Mol Des. 15 (2): 173–81. Bibcode:2001JCAMD..15..173K
(https://ui.adsabs.harvard.edu/abs/2001JCAMD..15..173K). doi:10.1023/A:1008171016861
(https://doi.org/10.1023%2FA%3A1008171016861). PMID 11272703
(https://pubmed.ncbi.nlm.nih.gov/11272703).

Bibliographies
Annotated Bibliography on Association Rules
(http://michael.hahsler.net/research/bib/association_rules/) by M. Hahsler

Retrieved from "https://en.wikipedia.org/w/index.php?title=Association_rule_learning&oldid=931361299"

This page was last edited on 18 December 2019, at 13:11 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using
this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia
Foundation, Inc., a non-profit organization.

37 pages
DWDM-UNIT-4
No ratings yet
DWDM-UNIT-4
12 pages
Model question paper and solution_DWDM.docx
No ratings yet
Model question paper and solution_DWDM.docx
57 pages
Mining: Association Rules
No ratings yet
Mining: Association Rules
54 pages
Bread, Milk Bread, Diapers, Beer, Eggs Bread, Diapers, Beer, Cola Bread, Milk, Diapers, Beer Bread, Milk, Diapers, Cola
No ratings yet
Bread, Milk Bread, Diapers, Beer, Eggs Bread, Diapers, Beer, Cola Bread, Milk, Diapers, Beer Bread, Milk, Diapers, Cola
4 pages
Association Analysis: Unit-V
No ratings yet
Association Analysis: Unit-V
12 pages
Module5 DMW
No ratings yet
Module5 DMW
13 pages
Mining Frequent Patterns, Association and Correlations - Basic Concepts and Methods
No ratings yet
Mining Frequent Patterns, Association and Correlations - Basic Concepts and Methods
55 pages
Belief Revision: Fundamentals and Applications
From Everand
Belief Revision: Fundamentals and Applications
Fouad Sabry
No ratings yet
The Systems Thinker - Dynamic Systems: The Systems Thinker Series, #5
From Everand
The Systems Thinker - Dynamic Systems: The Systems Thinker Series, #5
Albert Rutherford
2/5 (1)
MCMC
No ratings yet
MCMC
81 pages
Application of Stochastic Frontier To Agriculture in Ethiopia
No ratings yet
Application of Stochastic Frontier To Agriculture in Ethiopia
20 pages
Estimating Technology Adoption and Technical Efficiency in Smallholder Maize Production A Double Bootstrap DEA Approach
No ratings yet
Estimating Technology Adoption and Technical Efficiency in Smallholder Maize Production A Double Bootstrap DEA Approach
17 pages
A Stochastic Frontier Analysis of Technical Efficiency in Smallholder Maize Production in Zimbabwe
No ratings yet
A Stochastic Frontier Analysis of Technical Efficiency in Smallholder Maize Production in Zimbabwe
15 pages
P P P P P P P P P P P P P P P P P P P P P P P P
No ratings yet
P P P P P P P P P P P P P P P P P P P P P P P P
4 pages
SSAS Interview Questions
No ratings yet
SSAS Interview Questions
34 pages
PRM Backup Restore
No ratings yet
PRM Backup Restore
10 pages
Efficient SQL Server License Key For All Version TechAid24
No ratings yet
Efficient SQL Server License Key For All Version TechAid24
1 page
The Politics of Data Warehousing
No ratings yet
The Politics of Data Warehousing
9 pages
Nibha Dubey
No ratings yet
Nibha Dubey
5 pages
31 MySQL Questions
No ratings yet
31 MySQL Questions
13 pages
Oracle Startup and Shutdown Phases
100% (1)
Oracle Startup and Shutdown Phases
9 pages
PSP Student Workbook.20061007.Release Notes
No ratings yet
PSP Student Workbook.20061007.Release Notes
1 page
Unit Unit - 22: JDBC Programming JDBC Programming JDBC Programming JDBC Programming
No ratings yet
Unit Unit - 22: JDBC Programming JDBC Programming JDBC Programming JDBC Programming
78 pages
Case Study DW
No ratings yet
Case Study DW
18 pages
SQL Interview Questions
100% (1)
SQL Interview Questions
4 pages
Chapter 3
No ratings yet
Chapter 3
16 pages
Stock Management
No ratings yet
Stock Management
20 pages
2022-08-08
No ratings yet
2022-08-08
4 pages
Ora 10 GUtilities
No ratings yet
Ora 10 GUtilities
62 pages
Implementation: 4.1 Component Modules
No ratings yet
Implementation: 4.1 Component Modules
10 pages
SQL Tutorial
No ratings yet
SQL Tutorial
6 pages
Transactions Ch15 Korth
No ratings yet
Transactions Ch15 Korth
32 pages
Lecture 3 - Doubly Linked List
No ratings yet
Lecture 3 - Doubly Linked List
11 pages
Normalization in DBMS11
No ratings yet
Normalization in DBMS11
17 pages
Algorithms Complexity and Data Structures Efficiency
No ratings yet
Algorithms Complexity and Data Structures Efficiency
17 pages
SQL Server Documentation
No ratings yet
SQL Server Documentation
477 pages
Database Fundamentals Distributed Databases
No ratings yet
Database Fundamentals Distributed Databases
18 pages
Intranet PDF
No ratings yet
Intranet PDF
17 pages
Ma'Lumotlar Bazasi
No ratings yet
Ma'Lumotlar Bazasi
13 pages
Ntfs Vs Fat: Criteria Ntfs5 Ntfs Exfat/Fat6 4 Fat32 Fat16 Fat12
No ratings yet
Ntfs Vs Fat: Criteria Ntfs5 Ntfs Exfat/Fat6 4 Fat32 Fat16 Fat12
8 pages
Deploying The Tivoli Storage Manager Client in A Windows 2000 Environment Sg246141
No ratings yet
Deploying The Tivoli Storage Manager Client in A Windows 2000 Environment Sg246141
190 pages
Software Requirements Specification Document
No ratings yet
Software Requirements Specification Document
26 pages


Association rule learning

Each transaction in has a unique

A rule is defined as an implication of the form: $X \Rightarrow Y$, where $X$ and $Y$ are itemsets.

Let $X, Y$ be itemsets, $X \Rightarrow Y$ an association rule and $T$ a set of transactions of a given database.

In the example dataset, the itemset has a support of since it occurs in

Confidence is defined as: $\mathrm{conf}(X \Rightarrow Y) = \mathrm{supp}(X \cup Y) / \mathrm{supp}(X)$.

For example, the rule has a confidence of in the database,

Thus confidence can be interpreted as an estimate of the conditional probability $P(Y \mid X)$, the probability that a transaction contains $Y$ given that it contains $X$.

For example, the rule has a lift of .

The conviction of a rule is defined as $\mathrm{conv}(X \Rightarrow Y) = \dfrac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \Rightarrow Y)}$.[6]

For example, the rule has a conviction of , and can be
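
The measures in this section can be computed directly from a list of transactions. The following is a minimal Python sketch, assuming the standard definitions: support is the fraction of transactions containing an itemset, confidence is supp(X ∪ Y) / supp(X), lift is supp(X ∪ Y) / (supp(X) · supp(Y)), and conviction is (1 − supp(Y)) / (1 − conf(X ⇒ Y)). The toy transactions and all function names are illustrative assumptions; they are not the example dataset referred to in the text.

```python
from fractions import Fraction

# Hypothetical toy data for illustration only (not the article's dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def supp(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def conf(x, y, transactions):
    """Confidence of X => Y; assumes supp(X) > 0."""
    return supp(set(x) | set(y), transactions) / supp(x, transactions)

def lift(x, y, transactions):
    """Lift of X => Y; assumes supp(X) > 0 and supp(Y) > 0."""
    joint = supp(set(x) | set(y), transactions)
    return joint / (supp(x, transactions) * supp(y, transactions))

def conviction(x, y, transactions):
    """Conviction of X => Y; returns None when confidence is 1 (undefined)."""
    c = conf(x, y, transactions)
    if c == 1:
        return None
    return (1 - supp(y, transactions)) / (1 - c)

x, y = {"bread"}, {"milk"}
rule_conf = conf(x, y, transactions)
rule_lift = lift(x, y, transactions)
rule_conv = conviction(x, y, transactions)
```

On this toy data the rule {bread} ⇒ {milk} has confidence 3/4, lift 15/16 and conviction 4/5. Exact fractions are used here only to avoid floating-point rounding in the comparison; plain floats would work as well.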

Alternative measures of interestingness

1. A minimum support threshold is applied to find all frequent itemsets in a database.
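
The frequent-itemset step can be sketched as a brute-force search. The rule-forming step shown afterwards (keeping rules that meet a minimum confidence) is the usual second stage of this process and is included here as an assumption, since it is not spelled out in this excerpt. The data, thresholds and function names are hypothetical, and the enumeration is deliberately naive rather than an implementation of Apriori, Eclat or FP-growth.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy data for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            cset = frozenset(candidate)
            s = Fraction(sum(1 for t in transactions if cset <= t), n)
            if s >= min_support:
                frequent[cset] = s
                found_any = True
        if not found_any:
            # No frequent itemset of size k, so none of size k + 1 exists.
            break
    return frequent

def form_rules(frequent, min_confidence):
    """Split each frequent itemset into X => Y and keep confident rules."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent.
                c = s / frequent[lhs]
                if c >= min_confidence:
                    rules.append((lhs, itemset - lhs, c))
    return rules

frequent = frequent_itemsets(transactions, Fraction(3, 5))
found_rules = form_rules(frequent, Fraction(7, 10))
```

With a minimum support of 3/5 this toy data yields three frequent itemsets ({bread}, {milk} and {bread, milk}), and a minimum confidence of 7/10 keeps both {bread} ⇒ {milk} and {milk} ⇒ {bread}. The cost of this enumeration grows exponentially with the number of items, which is why dedicated algorithms are used in practice.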

Statistically sound associations

Other types of association rule mining

Generalized Association Rules: hierarchical taxonomy (concept hierarchy)

Quantitative Association Rules: categorical and quantitative data

Retrieved from "https://en.wikipedia.org/w/index.php?title=Association_rule_learning&oldid=931361299"

This page was last edited on 18 December 2019, at 13:11 (UTC).
