UNIT-5 DWDM (Data Warehousing and Data Mining) Association Analysis
Association:
Association mining aims to extract interesting correlations, frequent patterns,
associations, or causal structures among sets of items or objects in transaction databases,
relational databases, or other data repositories. Association rules are widely used in various
areas such as telecommunication networks, market and risk management, inventory control,
cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Examples:
Rule Form: Body → Head [support, confidence]
Buys (X, “Computer”) → Buys (X, “Software”) [40%, 50%]
Association Rule:
An association rule is an implication expression of the form X → Y, where X and Y
are disjoint itemsets, i.e., X ∩ Y = ∅. The strength of an association rule can be
measured in terms of its support and confidence. Support determines how often a rule is
applicable to a given data set, while confidence determines how frequently items in Y
appear in transactions that contain X. The formal definitions of these metrics are
Support, s(X → Y) = σ(X ∪ Y) / N
Confidence, c(X → Y) = σ(X ∪ Y) / σ(X)
where σ(·) denotes the support count (the number of transactions containing an itemset) and N is the total number of transactions.
Why Use Support and Confidence? Support is an important measure because a rule
that has very low support may occur simply by chance. A low support rule is also likely to
be uninteresting from a business perspective because it may not be profitable to promote
items that customers seldom buy together. For these reasons, support is often used to
eliminate uninteresting rules.
Confidence, on the other hand, measures the reliability of the inference made by a
rule. For a given rule X → Y, the higher the confidence, the more likely it is for Y to be present
in transactions that contain X. Confidence also provides an estimate of the conditional
probability of Y given X.
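As a concrete illustration, the following Python sketch computes support and confidence directly from these definitions. The toy transactions and the rule {bread} → {milk} are invented for this example only, not taken from the text above.

```python
# Hypothetical toy transaction database (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

X, Y = {"bread"}, {"milk"}          # rule X -> Y
N = len(transactions)

count_XY = sum(1 for t in transactions if X | Y <= t)   # sigma(X U Y)
count_X = sum(1 for t in transactions if X <= t)        # sigma(X)

support = count_XY / N              # 3/5 = 0.60
confidence = count_XY / count_X     # 3/4 = 0.75
print(support, confidence)
```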
a) Apriori Algorithm:
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for
mining frequent itemsets for Boolean association rules. The name of the algorithm is based
on the fact that the algorithm uses prior knowledge of frequent itemset properties, as we shall
see later. Apriori employs an iterative approach known as a level-wise search, where k-
itemsets are used to explore (k+1)-itemsets.
First, the set of frequent 1-itemsets is found by scanning the database to accumulate
the count for each item, and collecting those items that satisfy minimum support. The
resulting set is denoted by L1. Next, L1 is used to find L2, the set of frequent 2-itemsets,
which is used to find L3, and so on, until no more frequent k-itemsets can be found. The
finding of each Lk requires one full scan of the database.
To improve the efficiency of the level-wise generation of frequent itemsets, an
important property called the Apriori property is used to reduce the search space.
Apriori property: All nonempty subsets of a frequent itemset must also be frequent.
The Apriori property is based on the following observation. By definition, if an
itemset I does not satisfy the minimum support threshold, min sup, then I is not frequent, that
is, P(I) < min sup. If an item A is added to the itemset I, then the resulting itemset (i.e., I ∪ A)
cannot occur more frequently than I. Therefore, I ∪ A is not frequent either, that is, P(I ∪ A) <
min sup.
This property belongs to a special category of properties called antimonotonicity in
the sense that if a set cannot pass a test, all of its supersets will fail the same test as well. It is
called antimonotonicity because the property is monotonic in the context of failing a test.
A two-step process is followed, consisting of join and prune actions.
1. The join step: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with
itself. This set of candidates is denoted Ck.
2. The prune step: Ck is a superset of Lk, that is, its members may or may not be frequent,
but all of the frequent k-itemsets are included in Ck. A database scan to determine the count
of each candidate in Ck would result in the determination of Lk.
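The join and prune steps can be captured in a few lines of code. The following Python sketch is a minimal, illustrative implementation (the function and variable names are our own, not part of any standard library); it returns every frequent itemset together with its support count.

```python
from itertools import combinations

def apriori(transactions, min_support_count):
    """Minimal level-wise (Apriori-style) frequent-itemset miner."""
    transactions = [frozenset(t) for t in transactions]

    # First scan: count individual items and keep those meeting minimum support (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {iset for iset, c in counts.items() if c >= min_support_count}
    frequent = {iset: counts[iset] for iset in Lk}

    k = 2
    while Lk:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step (Apriori property): drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # One full scan of the database to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        Lk = {c for c, n in counts.items() if n >= min_support_count}
        frequent.update({c: counts[c] for c in Lk})
        k += 1
    return frequent
```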
Example:
1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-
itemsets, C1. The algorithm simply scans all of the transactions to count the number of
occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min sup = 2. (Here, we
are referring to absolute support because we are using a support count. The
corresponding relative support is 2/9 = 22%.) The set of frequent 1-itemsets, L1, can
then be determined. It consists of the candidate 1-itemsets satisfying minimum
support. In our example, all of the candidates in C1 satisfy minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to
generate a candidate set of 2-itemsets, C2. C2 consists of (|L1| choose 2) 2-itemsets. Note that no
candidates are removed from C2 during the prune step because each subset of the
candidates is also frequent.
4. Next, the transactions in D are scanned and the support count of each candidate
itemset in C2 is accumulated, as shown in the middle table of the second row of the figure.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-
itemsets in C2 having minimum support.
6. The generation of the set of candidate 3-itemsets, C3, is detailed in the figure. From
the join step, we first get C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2,
I3, I4}, {I2, I3, I5}, {I2, I4, I5}}. Based on the Apriori property that all subsets of a
frequent itemset must also be frequent, we can determine that the latter four
candidates cannot possibly be frequent: each of them contains a 2-itemset (e.g., {I3, I5} or
{I3, I4}) that is not in L2. We therefore remove them from C3, leaving C3 = {{I1, I2, I3},
{I1, I2, I5}} and saving the effort of unnecessarily obtaining their counts during the
subsequent scan of D to determine L3.
7. The transactions in D are scanned to determine L3, consisting of those candidate 3-
itemsets in C3 having minimum support.
8. The algorithm uses L3 ⋈ L3 to generate a candidate set of 4-itemsets, C4. Although
the join results in {I1, I2, I3, I5}, itemset {I1, I2, I3, I5} is pruned because its subset
{I2, I3, I5} is not frequent. Thus, C4 = ∅, and the algorithm terminates, having found
all of the frequent itemsets.
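The transaction table itself appears only in the figure, which is not reproduced here. Assuming the standard nine-transaction data set from Han and Kamber's example (whose item frequencies match the support counts quoted above), the Apriori sketch given earlier can be exercised as follows:

```python
# Assumed transaction database D (from Han & Kamber's example; not shown in the text above).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

frequent = apriori(D, min_support_count=2)
# The frequent 3-itemsets are {I1, I2, I3} and {I1, I2, I5}, matching the walkthrough.
print(sorted((tuple(sorted(s)), n) for s, n in frequent.items() if len(s) == 3))
```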
b) FP-Growth:
The first scan of the database is the same as Apriori, which derives the set of frequent
items (1-itemsets) and their support counts (frequencies). Let the minimum support count be
2. The set of frequent items is sorted in the order of descending support count. This resulting
set or list is denoted by L. Thus, we have L = {{I2: 7}, {I1: 6}, {I3: 6}, {I4: 2}, {I5: 2}}.
An FP-tree is then constructed as follows. First, create the root of the tree, labeled
with “null.” Scan database D a second time. The items in each transaction are processed in L
order (i.e., sorted according to descending support count), and a branch is created for each
transaction.
The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an
initial suffix pattern), construct its conditional pattern base (a “sub-database,” which
consists of the set of prefix paths in the FP-tree co-occurring with the suffix pattern), then
construct its (conditional) FP-tree, and perform mining recursively on the tree. The pattern
growth is achieved by the concatenation of the suffix pattern with the frequent patterns
generated from a conditional FP-tree.
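To make this concrete, here is a minimal Python sketch (illustrative only, with names of our own choosing) that builds the FP-tree in two scans and extracts the conditional pattern base of a suffix item; the full recursive mining step is omitted for brevity. It assumes the same nine-transaction data set D used in the Apriori usage sketch above.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}              # item -> FPNode

def build_fp_tree(transactions, min_support_count):
    """First scan: count items. Second scan: insert each transaction in L order."""
    item_counts = defaultdict(int)
    for t in transactions:
        for item in t:
            item_counts[item] += 1
    # L: frequent items in descending support-count order (ties broken alphabetically).
    L = [i for i, c in sorted(item_counts.items(), key=lambda x: (-x[1], x[0]))
         if c >= min_support_count]

    root = FPNode(None, None)           # root labelled "null"
    node_links = defaultdict(list)      # item -> all tree nodes labelled with it
    for t in transactions:
        node = root
        for item in [i for i in L if i in t]:   # process items in L order
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                node_links[item].append(child)
            child.count += 1
            node = child
    return root, node_links, L

def conditional_pattern_base(item, node_links):
    """Prefix paths (with counts) that co-occur with the given suffix item."""
    base = []
    for node in node_links[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

root, node_links, L = build_fp_tree(D, min_support_count=2)
print(L)                                        # ['I2', 'I1', 'I3', 'I4', 'I5']
print(conditional_pattern_base("I5", node_links))
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)]  -> mining this base yields {I2, I1, I5}: 2
```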
Finally, we can conclude that the frequent 3-itemsets are {I2, I1, I5} and {I2, I1, I3}.