09 Association Analysis (Itemset Representation)

The document discusses two compact representations of frequent itemsets: maximal frequent itemsets and closed frequent itemsets. Maximal frequent itemsets are frequent itemsets whose immediate supersets are all infrequent, while closed frequent itemsets are frequent itemsets none of whose immediate supersets has the same support count. Both representations provide a compact set of itemsets from which all other frequent itemsets can be derived.

Uploaded by

Oscar Wong

Compact representation of frequent itemsets
 The number of frequent itemsets produced
from a transaction data set can be very large.
 It is useful to identify a small representative
set of frequent itemsets from which all other
frequent itemsets can be derived.
 Two representations are
 Maximal frequent itemsets
 Closed frequent itemsets

1
Maximal frequent itemsets
 A maximal frequent itemset is defined as a
frequent itemset for which none of its
immediate supersets are frequent.
 We consider the itemset lattice shown in the
following figure.
 The itemsets in the lattice are divided into two
groups
 Those that are frequent
 Those that are infrequent

2
Maximal frequent itemsets
[Figure: itemset lattice showing the frequent itemset border]
3
Maximal frequent itemsets
 A frequent itemset border is also illustrated in
the figure.
 Every itemset located above the border is
frequent.
 On the other hand, those located below the
border are infrequent.

4
Maximal frequent itemsets
 {a,d}, {a,c,e} and {b,c,d,e} are considered to
be maximal frequent itemsets.
 This is because their immediate supersets
are infrequent.
 In contrast, {a,c} is non-maximal because one
of its immediate supersets, {a,c,e}, is frequent.
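The maximality test can be sketched in Python. The figure is not reproduced here, so this sketch assumes that the frequent itemsets are exactly the non-empty subsets of the three itemsets named above, over the items a to e. The helper names are hypothetical.

```python
from itertools import combinations

ITEMS = frozenset("abcde")  # assumed item universe


def nonempty_subsets(itemset):
    """Return every non-empty subset of itemset as a frozenset."""
    items = sorted(itemset)
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)}


# Assumption: the frequent itemsets are the subsets of {a,d}, {a,c,e}, {b,c,d,e}.
FREQUENT = set()
for m in ("ad", "ace", "bcde"):
    FREQUENT |= nonempty_subsets(m)


def immediate_supersets(itemset):
    """Itemsets obtained by adding exactly one item to itemset."""
    return [itemset | {i} for i in ITEMS - itemset]


def is_maximal(itemset):
    """Frequent, and none of its immediate supersets is frequent."""
    return itemset in FREQUENT and not any(
        s in FREQUENT for s in immediate_supersets(itemset))


maximal = {x for x in FREQUENT if is_maximal(x)}
print(sorted("".join(sorted(x)) for x in maximal))
print(is_maximal(frozenset("ac")))  # {a,c} has the frequent superset {a,c,e}
```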

5
Maximal frequent itemsets
 Maximal frequent itemsets effectively provide
a compact representation of frequent
itemsets.
 They form the smallest set of itemsets from
which all frequent itemsets can be derived,
since every frequent itemset is a subset of at
least one maximal frequent itemset.
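As an illustration of the derivation, the following Python sketch enumerates every non-empty subset of the maximal frequent itemsets from the earlier example. The function name is hypothetical. The result lists the frequent itemsets only, without their support counts.

```python
from itertools import combinations

# Maximal frequent itemsets from the example in these slides.
MAXIMAL = [frozenset("ad"), frozenset("ace"), frozenset("bcde")]


def frequent_from_maximal(maximal):
    """Every non-empty subset of a maximal frequent itemset is frequent."""
    frequent = set()
    for m in maximal:
        items = sorted(m)
        for r in range(1, len(items) + 1):
            frequent.update(frozenset(c) for c in combinations(items, r))
    return frequent


frequent = frequent_from_maximal(MAXIMAL)
print(frozenset("ac") in frequent)  # subset of {a,c,e}
print(frozenset("ab") in frequent)  # not a subset of any maximal itemset
```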

6
Maximal frequent itemsets
 Maximal frequent itemsets do not contain the
support information of their subsets.
 An additional pass over the data set is
required to determine the support counts of
the non-maximal frequent itemsets.
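The additional pass might look like the sketch below. The five transactions and the candidate itemsets are hypothetical, since the slides do not list a data set at this point. Each transaction is read once and every candidate it contains is counted.

```python
# Assumed toy transactions (hypothetical; not taken from the slides).
TRANSACTIONS = [set("abc"), set("abcd"), set("bce"), set("acde"), set("de")]

# Hypothetical non-maximal frequent itemsets whose supports are unknown.
CANDIDATES = [frozenset("a"), frozenset("c"), frozenset("ac"), frozenset("ad")]


def count_supports(transactions, candidates):
    """Single pass over the data: count the transactions containing each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


counts = count_supports(TRANSACTIONS, CANDIDATES)
for itemset, n in counts.items():
    print("".join(sorted(itemset)), n)
```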

7
Closed frequent itemsets
 An itemset X is closed if none of its
immediate supersets has exactly the same
support count as X.
 Put another way, X is not closed if at least
one of its immediate supersets has the same
support count as X.

8
Closed frequent itemsets
 We consider the itemsets shown in the
following figure.
 Each node (itemset) in the lattice is
associated with a list of its corresponding
TIDs.

9
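Closed frequent itemsets
[Figure: itemset lattice in which each node is annotated with its list of TIDs; closed frequent itemsets are shaded]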
10
Closed frequent itemsets
 We notice that every transaction that contains
b also contains c.
 Consequently, the support for {b} is identical
to that of {b,c}.
 Therefore, {b} is not a closed itemset.

11
Closed frequent itemsets
 Similarly, the itemset {a,d} is not closed, since
c occurs in every transaction that contains
both a and d.
 On the other hand, {b,c} is a closed itemset.
 This is because it does not have the same
support count as any of its immediate
supersets.
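The closedness test can be sketched with TID lists. The figure is not reproduced here, so the five transactions below are an assumption, chosen to agree with the statements above (every transaction containing b also contains c, and c occurs in every transaction containing both a and d). The function names are hypothetical.

```python
# Assumed transactions keyed by TID (hypothetical data).
TRANSACTIONS = {1: "abc", 2: "abcd", 3: "bce", 4: "acde", 5: "de"}

# Vertical layout: for each item, the set of TIDs that contain it.
TIDS = {}
for tid, items in TRANSACTIONS.items():
    for item in items:
        TIDS.setdefault(item, set()).add(tid)


def tidset(itemset):
    """TIDs containing every item of a non-empty itemset."""
    return set.intersection(*(TIDS[i] for i in itemset))


def is_closed(itemset):
    """Closed: every immediate superset has a strictly smaller support count."""
    own = len(tidset(itemset))
    others = set(TIDS) - set(itemset)
    return all(len(tidset(set(itemset) | {i})) < own for i in others)


print(is_closed("b"))   # same TIDs as {b,c}
print(is_closed("ad"))  # same TIDs as {a,c,d}
print(is_closed("bc"))
```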

12
Closed frequent itemsets
 An itemset is a closed frequent itemset if
 It is closed and
 Its support is greater than or equal to minsup.
 In the previous example, assume that the
support threshold is 40%.
 Then {b,c} is a closed frequent itemset because
its support is 60%.
 The rest of the closed frequent itemsets are
indicated by the shaded nodes.
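A brute-force sketch of this definition in Python follows. The transactions are assumed, because the figure is not reproduced here. They are chosen so that {b,c} has 60% support, as stated above, but the full output holds only for this assumed data. The function names are hypothetical.

```python
from itertools import combinations

# Assumed transactions (hypothetical data).
TRANSACTIONS = [set("abc"), set("abcd"), set("bce"), set("acde"), set("de")]
ITEMS = sorted(set().union(*TRANSACTIONS))
MINSUP = 0.4  # support threshold of 40%


def support(itemset):
    """Fraction of transactions that contain itemset."""
    return sum(itemset <= t for t in TRANSACTIONS) / len(TRANSACTIONS)


def is_closed(itemset):
    """No immediate superset has the same support."""
    return all(support(itemset | {i}) < support(itemset)
               for i in ITEMS if i not in itemset)


def closed_frequent(minsup=MINSUP):
    """Enumerate all itemsets and keep those that are closed and frequent."""
    result = []
    for r in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, r):
            x = frozenset(combo)
            if support(x) >= minsup and is_closed(x):
                result.append(x)
    return result


for x in closed_frequent():
    print("".join(sorted(x)), support(x))
```

Enumerating the whole lattice is only practical for a handful of items. It is used here to keep the sketch close to the definition.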
13
Closed frequent itemsets
 We can use the closed frequent itemsets to
determine the support counts for the non-closed
frequent itemsets.
 For example, we consider the frequent itemset {a,d}
shown in the figure on slide 10.
 Because the itemset is not closed, its support count
must be identical to that of one of its immediate
supersets.
 The key is to determine which superset (among
{a,b,d}, {a,c,d} or {a,d,e}) has exactly the same
support count as {a,d}.
14
Closed frequent itemsets
 A transaction that contains one of the
immediate supersets of {a,d} must also
contain {a,d}.
 However, a transaction that contains {a,d}
does not have to contain that particular
immediate superset of {a,d}.
 For this reason, the support for {a,d} must be
equal to the largest support among those of
its immediate supersets.

15
Closed frequent itemsets
 The support for {a,c,d} is the largest among
those of the three supersets.
 As a result, the support for {a,d} must be
identical to the support for {a,c,d}.
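The same idea can be sketched in Python. Instead of walking the lattice level by level, this sketch uses an equivalent shortcut: the support of a frequent itemset equals the largest support among the closed frequent itemsets that contain it. The support counts below are hypothetical, computed for an assumed five-transaction data set because the figure is not reproduced here.

```python
# Assumed closed frequent itemsets with support counts (hypothetical data).
CLOSED_SUPPORT = {frozenset(k): v for k, v in {
    "c": 4, "d": 3, "e": 3, "ac": 3, "bc": 3,
    "ce": 2, "de": 2, "abc": 2, "acd": 2}.items()}


def support_from_closed(itemset, closed_support=CLOSED_SUPPORT):
    """Largest support among closed frequent supersets, or None if there is none."""
    itemset = frozenset(itemset)
    found = [s for c, s in closed_support.items() if itemset <= c]
    return max(found) if found else None


print(support_from_closed("ad"))  # taken from {a,c,d}
print(support_from_closed("b"))   # taken from {b,c}
print(support_from_closed("ae"))  # None: not frequent in the assumed data
```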

16
Closed frequent itemsets
 All maximal frequent itemsets are closed.
 This is because every immediate superset of a
maximal frequent itemset is infrequent, so its
support count is below minsup and cannot equal
that of the maximal frequent itemset.
 The relationships among frequent, maximal
frequent, and closed frequent itemsets are
shown in the following figure.
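The claim that all maximal frequent itemsets are closed can be checked on a small example. The support counts below are hypothetical, listing the frequent itemsets of an assumed five-transaction data set with a minimum support count of 2. Only frequent supersets need to be compared, because an infrequent superset always has a smaller support count than a frequent itemset.

```python
# Assumed frequent itemsets with support counts (hypothetical data, minsup count 2).
SUPPORT = {frozenset(k): v for k, v in {
    "a": 3, "b": 3, "c": 4, "d": 3, "e": 3,
    "ab": 2, "ac": 3, "ad": 2, "bc": 3, "cd": 2, "ce": 2, "de": 2,
    "abc": 2, "acd": 2}.items()}


def frequent_immediate_supersets(x):
    """Frequent itemsets that extend x by exactly one item."""
    return [y for y in SUPPORT if len(y) == len(x) + 1 and x < y]


# Maximal: no frequent immediate superset at all.
maximal = {x for x in SUPPORT if not frequent_immediate_supersets(x)}

# Closed: every frequent immediate superset has a strictly smaller support.
closed = {x for x in SUPPORT
          if all(SUPPORT[y] < SUPPORT[x]
                 for y in frequent_immediate_supersets(x))}

print(maximal <= closed)         # maximal itemsets form a subset of closed ones
print(closed <= set(SUPPORT))    # closed frequent itemsets are frequent
```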

17
Closed frequent itemsets
[Figure: relationships among frequent, maximal frequent, and closed frequent itemsets]
18
