FP-Growth Algorithm
1 Introduction
The FP-Growth algorithm is an efficient method for mining frequent patterns in
large datasets. Unlike algorithms such as Apriori, it avoids the candidate
generation step: it builds a compact data structure called the FP-tree (Frequent
Pattern tree) that represents the database in compressed form, and it mines
frequent itemsets directly from this tree.
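As a preview of the end result, the sketch below runs an off-the-shelf FP-Growth implementation on the small transaction database used throughout this note. It assumes the mlxtend library is installed; TransactionEncoder and fpgrowth come from that library, and the 0.6 support threshold (a count of 3 out of 5 transactions) is an assumption chosen to match the results derived later.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# The five transactions discussed in this note.
transactions = [
    ["B", "D", "E", "A"],        # T1
    ["B", "D", "E", "C"],        # T2
    ["A", "B", "D", "E", "F"],   # T3
    ["B", "C", "D", "E"],        # T4
    ["A", "B", "D", "E", "F"],   # T5
]

# One-hot encode the transactions into a boolean DataFrame.
encoder = TransactionEncoder()
onehot = encoder.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=encoder.columns_)

# Mine all itemsets appearing in at least 60% of transactions (support count >= 3).
print(fpgrowth(df, min_support=0.6, use_colnames=True))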
• E: Appears in all transactions (5 times)
• F: Appears in T3 and T5 (2 times)
Sorting the items by descending support count gives the order used to build the FP-tree:
B : 5, D : 5, E : 5, A : 3, C : 2, F : 2
Each transaction is then rewritten with its items in this order:
• T1: B, D, E, A → Sorted: B, D, E, A
• T2: B, D, E, C → Sorted: B, D, E, C
• T3: A, B, D, E, F → Sorted: B, D, E, A, F
• T4: B, C, D, E → Sorted: B, D, E, C
• T5: A, B, D, E, F → Sorted: B, D, E, A, F
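This first pass over the database, counting item supports and rewriting each transaction in descending-support order, can be sketched in plain Python as follows; the variable names are illustrative, and the tie-breaking rule (alphabetical among equal counts) is an assumption that happens to reproduce the ordering above.

from collections import Counter

# The five transactions of the running example.
transactions = [
    ["B", "D", "E", "A"],        # T1
    ["B", "D", "E", "C"],        # T2
    ["A", "B", "D", "E", "F"],   # T3
    ["B", "C", "D", "E"],        # T4
    ["A", "B", "D", "E", "F"],   # T5
]

# Pass 1: count the support of every item.
counts = Counter(item for t in transactions for item in t)

def reorder(transaction):
    # Sort items by descending support, breaking ties alphabetically.
    return sorted(transaction, key=lambda item: (-counts[item], item))

sorted_transactions = [reorder(t) for t in transactions]
for label, t in zip(["T1", "T2", "T3", "T4", "T5"], sorted_transactions):
    print(label, t)   # e.g. T3 ['B', 'D', 'E', 'A', 'F']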
[ROOT]
  |
(B:5)
  |
(D:5)
  |
(E:5)
  |
  +---------+
  |         |
(A:3)     (C:2)
  |
(F:2)
Here:
• ROOT is the root node of the FP-tree.
• Each node is labeled item:count, where the count is the number of transactions whose sorted prefix passes through that node.
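A minimal FP-tree construction can be sketched in Python as follows. The FPNode class and insert_transaction function are illustrative names rather than any standard API, and the header table with node-links is omitted to keep the sketch short.

class FPNode:
    """One FP-tree node: an item label, a count, a parent link, and children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> FPNode

def insert_transaction(root, sorted_items):
    """Insert one support-ordered transaction, sharing any existing prefix."""
    node = root
    for item in sorted_items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
        child.count += 1
        node = child

def print_tree(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"({child.item}:{child.count})")
        print_tree(child, depth + 1)

root = FPNode(None)
for t in [["B", "D", "E", "A"], ["B", "D", "E", "C"], ["B", "D", "E", "A", "F"],
          ["B", "D", "E", "C"], ["B", "D", "E", "A", "F"]]:
    insert_transaction(root, t)
print_tree(root)   # reproduces the structure drawn above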
5.1 Conditional Pattern Base for Item B
To mine the patterns related to item B, we look at all paths that contain B and
trace back to the root. The conditional pattern base for B consists of all items
appearing with B in the transactions.
Transactions containing B:
T1: B, D, E, A
T2: B, D, E, C
T3: B, D, E, A, F
T4: B, C, D, E
T5: A, B, D, E, F
The conditional pattern base for B is:
{D, E, A} : 1, {D, E, A, F} : 2, {D, E, C} : 2
We then recursively build a conditional FP-tree for this pattern base and
continue mining for further frequent itemsets.
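To make this recursive step concrete, the sketch below (reusing the illustrative FPNode tree built earlier) gathers, for every occurrence of an item in the tree, the prefix path above it together with its count; these (path, count) pairs are what the conditional FP-tree for that item is built from. Because B sits at the top of every path in this particular tree, less frequent items such as A and F give the more instructive prefix paths.

def find_nodes(node, item):
    """Collect every node labeled `item` (a header table with node-links
    would normally make this lookup direct; a full traversal keeps the
    sketch self-contained)."""
    found = []
    for child in node.children.values():
        if child.item == item:
            found.append(child)
        found.extend(find_nodes(child, item))
    return found

def conditional_pattern_base(root, item):
    """Return (prefix path, count) pairs for every occurrence of `item`."""
    base = []
    for node in find_nodes(root, item):
        path = []
        parent = node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

print(conditional_pattern_base(root, "A"))   # [(['B', 'D', 'E'], 3)]
print(conditional_pattern_base(root, "F"))   # [(['B', 'D', 'E', 'A'], 2)]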
Applying this procedure with a minimum support count of 3 (so C and F, each appearing only twice, are not frequent) yields the following frequent itemsets:
{B} : Support = 5
{D} : Support = 5
{E} : Support = 5
{A} : Support = 3
{B, D} : Support = 5
{B, E} : Support = 5
{D, E} : Support = 5
{B, D, E} : Support = 5
{A, B} : Support = 3
{A, D} : Support = 3
{A, E} : Support = 3
{A, B, D} : Support = 3
{A, B, E} : Support = 3
{A, D, E} : Support = 3
{A, B, D, E} : Support = 3
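These supports can be double-checked with a brute-force pass over the five transactions; the snippet below is only a verification aid, not part of FP-Growth itself, and the minimum support count of 3 is the assumption stated above.

from itertools import combinations

transactions = [
    {"B", "D", "E", "A"}, {"B", "D", "E", "C"}, {"A", "B", "D", "E", "F"},
    {"B", "C", "D", "E"}, {"A", "B", "D", "E", "F"},
]
min_support = 3
items = sorted(set().union(*transactions))

# Enumerate every candidate itemset and keep those meeting the threshold.
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        support = sum(1 for t in transactions if set(candidate) <= t)
        if support >= min_support:
            print(set(candidate), ":", support)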
7 Advantages of FP-Growth
• No Candidate Generation: Unlike Apriori, FP-Growth does not generate candidate itemsets, which reduces computation time.
• Efficient Memory Use: The FP-tree is a compressed representation of
the dataset, saving memory.
• Scalability: FP-Growth scales well with large datasets due to its efficient
use of memory and reduced I/O operations.
8 Conclusion
The FP-Growth algorithm is a powerful tool for frequent pattern mining. By using the FP-tree and header table, FP-Growth efficiently mines frequent itemsets
without the need for candidate generation, making it faster and more scalable
than other algorithms like Apriori.