
4.1) FP Growth Algorithm

The FP-Growth algorithm uses an FP-tree to efficiently mine frequent itemsets from transactional databases. It involves two steps: (1) building an FP-tree from the database by scanning it twice, and (2) mining the FP-tree by identifying conditional patterns and constructing conditional FP-trees.

Uploaded by Sanjana Sairama
Copyright © All Rights Reserved

Association Analysis (3)

FP-Tree/FP-Growth Algorithm
• Uses a compressed representation of the database: the FP-tree.
• Once the FP-tree has been constructed, the algorithm uses a recursive divide-and-conquer approach to mine the frequent itemsets.

Building the FP-Tree


1. Scan the data to determine the support count of each item. Infrequent items are discarded, while the frequent items are sorted in decreasing order of support count.
2. Make a second pass over the data to construct the FP-tree. As each transaction is read, and before it is processed, its items are sorted according to the above order.
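The two passes can be sketched in Python as below. This is a minimal illustration, not the slides' own code: the function names `first_scan` and `sort_transaction` are hypothetical, the minimum support count of 2 is an assumption (the slide does not state one), and ties in support are broken alphabetically, which is one way to reproduce the B, A, C, D, E order used in the example.

```python
from collections import Counter

# The ten example transactions (TID 1..10).
TRANSACTIONS = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

def first_scan(transactions, min_support):
    """Step 1: count every item, drop the infrequent ones, and order the
    rest by decreasing support (ties broken alphabetically, an assumption)."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    return order, frequent

def sort_transaction(transaction, order):
    """Step 2 preprocessing: keep only frequent items, listed in header order."""
    return [item for item in order if item in transaction]

order, support = first_scan(TRANSACTIONS, min_support=2)
print(order)                        # ['B', 'A', 'C', 'D', 'E']
print(support["B"], support["E"])   # 8 3
print([sort_transaction(t, order) for t in TRANSACTIONS[:3]])
# [['B', 'A'], ['B', 'C', 'D'], ['A', 'C', 'D', 'E']]
```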
First scan: determine the frequent 1-itemsets, then build the header.

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

Header (item, support count):
B  8
A  7
C  7
D  5
E  3
FP-tree construction
After reading TID=1 ({A,B}, sorted as B, A):
null
└─ B:1
   └─ A:1

After reading TID=2 ({B,C,D}): the transaction shares only the prefix B with the existing path, so B's count rises to 2 and a new branch C → D is added under B.
null
└─ B:2
   ├─ A:1
   └─ C:1
      └─ D:1
FP-Tree Construction
FP-tree after reading all ten transactions:
null
├─ B:8
│  ├─ A:5
│  │  ├─ C:3
│  │  │  └─ D:1
│  │  └─ D:1
│  └─ C:3
│     ├─ D:1
│     └─ E:1
└─ A:2
   ├─ C:1
   │  └─ D:1
   │     └─ E:1
   └─ D:1
      └─ E:1

Header table
Item  Support count
B     8
A     7
C     7
D     5
E     3
Each header entry also points to the first node of its item, and chain pointers link all the nodes that hold the same item. These chain pointers help in quickly finding all the paths of the tree containing a given item.
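A minimal Python sketch of the construction, including the header table and the chain pointers (node-links). The class and function names (`FPNode`, `build_fp_tree`, `chain`) are hypothetical, the minimum support count of 2 is an assumption, and ties are broken alphabetically. Summing the counts along an item's chain recovers that item's support, which is why the chains are useful.

```python
from collections import Counter

class FPNode:
    """An FP-tree node: item, count, parent, children, and a node-link
    (`link`) to the next node that holds the same item."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.link = None

def build_fp_tree(transactions, min_support):
    # First scan: support counts; header order = decreasing support,
    # ties broken alphabetically (an assumption matching B, A, C, D, E).
    counts = Counter(item for t in transactions for item in t)
    order = sorted((i for i, c in counts.items() if c >= min_support),
                   key=lambda i: (-counts[i], i))
    root = FPNode(None, None)
    header = {item: None for item in order}  # item -> first node of its chain
    tail = {}                                # item -> last node of its chain
    # Second scan: insert each sorted transaction as a path from the root.
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            child = node.children.get(item)
            if child is None:                # prefix not shared: new node
                child = FPNode(item, node)
                node.children[item] = child
                if header[item] is None:
                    header[item] = child
                else:
                    tail[item].link = child  # extend the chain pointers
                tail[item] = child
            child.count += 1                 # shared prefix: bump the count
            node = child
    return root, header, order

def chain(header, item):
    """Follow the chain pointers, yielding every node that holds `item`."""
    node = header[item]
    while node is not None:
        yield node
        node = node.link

TRANSACTIONS = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
root, header, order = build_fp_tree(TRANSACTIONS, min_support=2)
print(order)                                           # ['B', 'A', 'C', 'D', 'E']
print({i: n.count for i, n in root.children.items()})  # {'B': 8, 'A': 2}
print([n.count for n in chain(header, "E")])           # [1, 1, 1]
print(sum(n.count for n in chain(header, "D")))        # 5, the support of D
```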
FP-Tree size
• The size of an FP-tree is typically smaller than the size of the uncompressed data because many transactions share items, so their common prefixes are stored only once.
• Best-case scenario:
– All transactions have the same set of items, and the FP-tree contains only a single branch of nodes.
• Worst-case scenario:
– Every transaction has a unique set of items.
– As none of the transactions have any items in common, the size of the FP-tree is effectively the same as the size of the original data (its physical storage is actually larger, since every node also stores a counter and pointers).
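The two extremes can be checked by counting nodes. The sketch below relies on the fact that each FP-tree node (other than the null root) corresponds to exactly one distinct prefix of a sorted transaction, so counting distinct prefixes gives the node count without building the tree. The helper name `fp_tree_size` and the toy data sets are hypothetical.

```python
def fp_tree_size(transactions, order):
    """Number of FP-tree nodes (null root excluded) = number of distinct
    prefixes of the transactions once each is sorted by `order`."""
    prefixes = set()
    for t in transactions:
        items = [i for i in order if i in t]
        for k in range(1, len(items) + 1):
            prefixes.add(tuple(items[:k]))
    return len(prefixes)

order = list("ABCDEFGH")
best = [set("ABCD")] * 4                              # identical transactions
worst = [set("AB"), set("CD"), set("EF"), set("GH")]  # no items in common
for name, data in [("best", best), ("worst", worst)]:
    print(name, fp_tree_size(data, order), sum(len(t) for t in data))
# best 4 16   (a single branch of 4 nodes for 16 item occurrences)
# worst 8 8   (one node per item occurrence)
```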

• The size of an FP-tree also depends on how the items are ordered.
– If the ordering scheme in the preceding example is reversed, i.e., from lowest- to highest-support item, the resulting FP-tree is probably bushier: more branches and more nodes, so less compression.
• Not always, though: the ordering is just a heuristic.
An FP-tree representation for the data set with a different item ordering scheme.
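For the ten example transactions, the effect of the ordering can be measured directly. This sketch counts nodes as distinct sorted prefixes (each node corresponds to one) and counts the root's branches as distinct first items. The helper names are hypothetical, and "reversed" is taken literally as the reverse of B, A, C, D, E.

```python
TRANSACTIONS = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]

def fp_tree_size(transactions, order):
    """FP-tree node count (root excluded) = distinct sorted prefixes."""
    prefixes = set()
    for t in transactions:
        items = [i for i in order if i in t]
        for k in range(1, len(items) + 1):
            prefixes.add(tuple(items[:k]))
    return len(prefixes)

def root_branches(transactions, order):
    """Number of children of the null root = distinct first items."""
    return len({next(i for i in order if i in t) for t in transactions})

decreasing = ["B", "A", "C", "D", "E"]  # header order used on the slides
increasing = decreasing[::-1]           # the reversed scheme
print(fp_tree_size(TRANSACTIONS, decreasing), root_branches(TRANSACTIONS, decreasing))  # 14 2
print(fp_tree_size(TRANSACTIONS, increasing), root_branches(TRANSACTIONS, increasing))  # 20 4
```

Here the reversed order produces a larger tree, matching the "probably bushier" claim for this data; other data sets may behave differently, since the ordering is only a heuristic.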
FP-Growth (I)
• FP-growth generates frequent itemsets from an FP-tree by exploring the tree in a bottom-up fashion.
• Given the example tree, the algorithm looks for frequent itemsets ending in E first, followed by D, C, A, and finally B.
• Since every transaction is mapped onto a path in the FP-tree, we can derive the frequent itemsets ending with a particular item, say E, by examining only the paths containing node E.
• These paths can be accessed rapidly using the pointers associated with node E.
Paths containing node E (the full FP-tree restricted to the branches that reach an E node):
null
├─ B:8
│  └─ C:3
│     └─ E:1
└─ A:2
   ├─ C:1
   │  └─ D:1
   │     └─ E:1
   └─ D:1
      └─ E:1
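These paths can be collected by walking from each E node up to the root. The sketch below does that. For brevity it keeps each item's chain as a plain Python list of nodes rather than linked pointers, which is a simplification. The names `Node`, `build` and `prefix_paths` are hypothetical. Each prefix path is paired with the count of the E node it leads to, not with its ancestors' larger counts.

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build(transactions, order):
    """Build an FP-tree; header[item] lists the item's nodes in creation order."""
    root, header = Node(None, None), {i: [] for i in order}
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

def prefix_paths(header, item):
    """Walk from each node holding `item` up to the null root; pair each
    prefix path (root-to-parent order) with that node's count."""
    paths = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent.item is not None:      # stop at the null root
            path.append(parent.item)
            parent = parent.parent
        paths.append((path[::-1], node.count))
    return paths

TRANSACTIONS = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
root, header = build(TRANSACTIONS, ["B", "A", "C", "D", "E"])
print(prefix_paths(header, "E"))
# [(['A', 'C', 'D'], 1), (['A', 'D'], 1), (['B', 'C'], 1)]
```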
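Conditional FP-Tree for E
• We now need to build a conditional FP-tree for E, which is the tree of the itemsets ending in E.
• It is not the tree in the previous slide, obtained by deleting nodes from the original tree.
• Why? Because the counts and the order of the items change.
– The counts along each prefix path must be set to the count of its E node, since the other transactions passing through those nodes do not contain E.
– In this example, C then has a higher count than B, although B precedes C in the original order.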
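The recount can be sketched as follows, starting from the three prefix paths of E, each weighted by its E node's count (1 each). The variable names are hypothetical, and breaking ties by the original header order is an assumption. The result shows B dropping behind C.

```python
from collections import Counter

# Prefix paths of E, each weighted by the count of the E node it leads to.
pattern_base_E = [(["A", "C", "D"], 1), (["A", "D"], 1), (["B", "C"], 1)]
original_order = ["B", "A", "C", "D", "E"]

counts = Counter()
for path, count in pattern_base_E:
    for item in path:
        counts[item] += count

# New order: decreasing conditional count, ties kept in the original header order.
new_order = sorted(counts, key=lambda i: (-counts[i], original_order.index(i)))
print(dict(counts))  # {'A': 2, 'C': 2, 'D': 2, 'B': 1}
print(new_order)     # ['A', 'C', 'D', 'B']
```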
Conditional FP-Tree for E
The set of paths containing E, with E truncated and each prefix path weighted by the count of its E node:
{B,C}: 1    {A,C,D}: 1    {A,D}: 1

The new header (decreasing new count; ties kept in the original header order):
Item  Count
A     2
C     2
D     2
B     1

Insert each path (after truncating E) into a new tree, sorted by the new header order:
null
├─ A:2
│  ├─ C:1
│  │  └─ D:1
│  └─ D:1
└─ C:1
   └─ B:1

Adding up the counts for D we get 2, so {E,D} is a frequent itemset.
We continue recursively.
Base of recursion: when the tree has a single path only.
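The whole recursion can be sketched compactly. It builds a tree, visits items bottom-up, gathers each item's conditional pattern base, and mines that base with the same code. This is a minimal sketch under stated assumptions, not the slides' implementation. The names are hypothetical, the minimum support count of 2 is assumed (consistent with {E,D} being frequent at count 2), and each item's chain is a plain list. Unlike the slides, it does not special-case a single-path tree: it simply recurses until a pattern base is empty, which yields the same itemsets with a little more work.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def fp_growth(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    results = {}

    def mine(weighted, suffix):
        # weighted: list of (set_of_items, count). The original database uses
        # count 1 per transaction; a conditional pattern base uses node counts.
        counts = Counter()
        for items, n in weighted:
            for item in items:
                counts[item] += n
        order = sorted((i for i in counts if counts[i] >= min_support),
                       key=lambda i: (-counts[i], i))
        # Build the (conditional) FP-tree; header[item] lists the item's nodes.
        root, header = Node(None, None), {i: [] for i in order}
        for items, n in weighted:
            node = root
            for item in (i for i in order if i in items):
                if item not in node.children:
                    node.children[item] = Node(item, node)
                    header[item].append(node.children[item])
                node = node.children[item]
                node.count += n
        # Bottom-up: least frequent item first (E, D, C, A, B at the top level).
        for item in reversed(order):
            itemset = suffix | {item}
            results[itemset] = counts[item]
            # Conditional pattern base: prefix paths of `item`, each weighted
            # by the count of the `item` node it leads to.
            base = []
            for node in header[item]:
                path, parent = set(), node.parent
                while parent.item is not None:
                    path.add(parent.item)
                    parent = parent.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(set(t), 1) for t in transactions], frozenset())
    return results

TRANSACTIONS = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
frequent = fp_growth(TRANSACTIONS, min_support=2)
print(len(frequent))                      # 19
print(frequent[frozenset({"D", "E"})])    # 2
print(frozenset({"B", "E"}) in frequent)  # False (support 1)
```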
FP-Tree Another Example
Minimum support count: 2

Transactions  Freq. 1-itemsets  Transactions with items sorted by frequency,
                                ignoring the infrequent items
ABCEFO        A:8               ACEBF
ACG           C:8               ACG
EI            E:8               E
ACDEG         G:5               ACEGD
ACEGL         B:2               ACEG
EJ            D:2               E
ABCEFP        F:2               ACEBF
ACD                             ACD
ACEGM                           ACEG
ACEGN                           ACEG
FP-tree construction, one transaction at a time (header table: A 8, C 8, E 8, G 5, B 2, D 2, F 2)
After reading the 1st transaction (ACEBF): null → A:1 → C:1 → E:1 → B:1 → F:1
After reading the 2nd transaction (ACG): A:2 → C:2, plus a new child G:1 under C
After reading the 3rd transaction (E): a new branch E:1 directly under null
After reading the 4th transaction (ACEGD): A:3 → C:3 → E:2, plus a new branch G:1 → D:1 under E
After reading the 5th transaction (ACEG): A:4 → C:4 → E:3 → G:2
After reading the 6th transaction (E): the root-level E node becomes E:2
After reading the 7th transaction (ACEBF): A:5 → C:5 → E:4 → B:2 → F:2
After reading the 8th transaction (ACD): A:6 → C:6, plus a new child D:1 under C
After reading the 9th transaction (ACEG): A:7 → C:7 → E:5 → G:3
After reading the 10th transaction (ACEG): A:8 → C:8 → E:6 → G:4

FP-tree after reading the 10th transaction:
null
├─ A:8
│  └─ C:8
│     ├─ E:6
│     │  ├─ B:2
│     │  │  └─ F:2
│     │  └─ G:4
│     │     └─ D:1
│     ├─ G:1
│     └─ D:1
└─ E:2
Conditional FP-Trees
Build the conditional FP-tree for each of the items. For this:

1. Find the paths containing the focus item. With those paths we build the conditional FP-tree for the item.

2. Read the tree again to determine the new counts of the items along those paths (each path counts as many times as its focus-item node). Build a new header.

3. Insert the paths into the conditional FP-tree according to the new order.
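The three steps can be sketched on the second example. As a shortcut this sketch works on the sorted transactions instead of walking the tree: every sorted transaction maps onto one tree path, so the path above the focus item is simply the transaction's prefix before that item, and identical prefixes share a path. The function name `conditional_fp_paths` is hypothetical, itemsets are written as strings because every item is a single letter, and ties are kept in the original header order (an assumption).

```python
from collections import Counter

ORDER = "ACEGBDF"  # header order of the second example
SORTED = ["ACEBF", "ACG", "E", "ACEGD", "ACEG", "E", "ACEBF", "ACD", "ACEG", "ACEG"]

def conditional_fp_paths(sorted_transactions, item, min_support):
    # Step 1: paths containing `item`; the part above `item` is the prefix,
    # and identical prefixes share one path, so merge them with a counter.
    base = Counter(t[:t.index(item)] for t in sorted_transactions if item in t)
    # Step 2: new counts of the items along those paths -> new header.
    counts = Counter()
    for path, n in base.items():
        for i in path:
            counts[i] += n
    header = sorted((i for i in counts if counts[i] >= min_support),
                    key=lambda i: (-counts[i], ORDER.index(i)))
    # Step 3: re-sort each path by the new header, dropping infrequent items;
    # these weighted paths are what gets inserted into the conditional tree.
    paths = Counter()
    for path, n in base.items():
        kept = "".join(i for i in header if i in path)
        if kept:
            paths[kept] += n
    return header, dict(paths)

print(conditional_fp_paths(SORTED, "F", 2))  # (['A', 'C', 'E', 'B'], {'ACEB': 2})
print(conditional_fp_paths(SORTED, "D", 2))  # (['A', 'C'], {'AC': 2})
```

In both cases a single weighted path remains, which is the base case discussed next.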
Conditional FP-Tree for F
There is only a single path containing F: null → A:8 → C:8 → E:6 → B:2 → F:2
After updating the counts to F's count: null → A:2 → C:2 → E:2 → B:2 → F:2
New header: A 2, C 2, E 2, B 2
Conditional FP-tree for F: null → A:2 → C:2 → E:2 → B:2


Recursion
• We continue recursively on the conditional FP-tree for F (null → A:2 → C:2 → E:2 → B:2).
• However, when the tree is just a single path, it is the base case for the recursion.
• So, we just produce all the subsets of the items on this path merged with F:
{F} {A,F} {C,F} {E,F} {B,F}
{A,C,F}, …,
{A,C,E,F}
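The base case can be sketched as below. On a single path, the node counts never increase going down, so the support of any chosen subset is the smallest count among its chosen nodes, or the suffix's own support when the subset is empty. The function name `single_path_itemsets` is hypothetical.

```python
from itertools import combinations

def single_path_itemsets(path, suffix, suffix_support):
    """path: list of (item, count) from the root down. Returns every subset of
    the path's items merged with `suffix`, mapped to its support."""
    results = {}
    for k in range(len(path) + 1):
        for combo in combinations(path, k):
            itemset = frozenset(i for i, _ in combo) | suffix
            results[itemset] = min([c for _, c in combo], default=suffix_support)
    return results

f_sets = single_path_itemsets([("A", 2), ("C", 2), ("E", 2), ("B", 2)], frozenset("F"), 2)
print(len(f_sets))                   # 16, i.e. 2**4 subsets including {F}
print(f_sets[frozenset("ACEF")])     # 2
```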
Conditional FP-Tree for D
Paths containing D in the FP-tree: null → A:8 → C:8 → E:6 → G:4 → D:1 and null → A:8 → C:8 → D:1
Paths containing D after updating the counts:
null
└─ A:2
   └─ C:2
      ├─ E:1
      │  └─ G:1
      │     └─ D:1
      └─ D:1
New header: A 2, C 2. The other items (E and G) are removed as infrequent.
Conditional FP-tree for D: null → A:2 → C:2
The tree is just a single path; it is the base case for the recursion. So, we just produce all the subsets of the items on this path merged with D:
{D} {A,D} {C,D} {A,C,D}
Exercise: Complete the example.
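One way to check answers like these is brute-force support counting, which is a different technique from FP-growth and is used here only as a cross-check. Only the frequent single items need to be combined, because any itemset containing an infrequent item is itself infrequent. The helper names are hypothetical, and the minimum support count of 2 is taken from the example.

```python
from itertools import combinations

TRANSACTIONS = ["ABCEFO", "ACG", "EI", "ACDEG", "ACEGL",
                "EJ", "ABCEFP", "ACD", "ACEGM", "ACEGN"]
FREQUENT_ITEMS = "ACEGBDF"  # frequent 1-itemsets at minimum support count 2

def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= set(t))

def frequent_with(item, min_support=2):
    """All frequent itemsets containing `item`, by exhaustive enumeration."""
    others = [i for i in FREQUENT_ITEMS if i != item]
    found = []
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            itemset = "".join(sorted(combo + (item,)))
            if support(itemset) >= min_support:
                found.append(itemset)
    return sorted(found)

print(frequent_with("D"))       # ['ACD', 'AD', 'CD', 'D']
print(len(frequent_with("F")))  # 16
```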
