
FP-Growth Algorithm and FP-Tree Construction

November 28, 2024

1 Introduction
The FP-Growth algorithm is an efficient method for frequent pattern mining in
large datasets, which avoids the candidate generation step used in algorithms
like Apriori. It uses a compact data structure called the FP-tree (Frequent
Pattern tree) to represent the database in a compressed form. This allows for
efficient mining of frequent itemsets.

2 Step 1: Calculate Item Frequencies


Before building the FP-tree, we need to calculate the frequency (support) of
each item in the dataset. Support refers to the number of transactions that
contain the item.

2.1 Example Dataset


Consider the following transactions:

Transaction ID Items Purchased


T1 A, B, D, E
T2 B, C, D, E
T3 A, B, D, E, F
T4 B, C, D, E
T5 A, B, D, E, F

2.2 Item Frequency Calculation


The frequency (support) of each item is calculated as follows:

• A: Appears in T1, T3, T5 (3 times)


• B: Appears in T1, T2, T3, T4, T5 (5 times)
• C: Appears in T2, T4 (2 times)
• D: Appears in all transactions (5 times)

• E: Appears in all transactions (5 times)
• F: Appears in T3, T5 (2 times)
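The frequency counts above can be reproduced with a short sketch (a minimal example; the transaction lists match the table in Section 2.1):

```python
from collections import Counter

# Transactions T1-T5 from the example dataset
transactions = [
    ["A", "B", "D", "E"],
    ["B", "C", "D", "E"],
    ["A", "B", "D", "E", "F"],
    ["B", "C", "D", "E"],
    ["A", "B", "D", "E", "F"],
]

# Support of each item: the number of transactions containing it
support = Counter(item for t in transactions for item in t)
print(support)  # Counter({'B': 5, 'D': 5, 'E': 5, 'A': 3, 'C': 2, 'F': 2})
```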

3 Step 2: Build the FP-Tree


Once the item frequencies are calculated, items are sorted in descending order of
their frequencies. Items whose support falls below a chosen minimum support
threshold are removed from the dataset before constructing the tree; in this
example all six items are kept (assuming a minimum support of 2).

3.1 Sort Items by Frequency


The items are sorted in descending order of frequency:

B : 5, D : 5, E : 5, A : 3, C : 2, F : 2

3.2 Insert Transactions into the FP-Tree


Now, we insert each transaction into the FP-tree, sorting the items according
to the frequency list and building the tree. We either increment the count of an
existing node or create a new node.

3.2.1 Inserting Transactions


• T1: A, B, D, E → Sorted: B, D, E, A
• T2: B, C, D, E → Sorted: B, D, E, C

• T3: A, B, D, E, F → Sorted: B, D, E, A, F
• T4: B, C, D, E → Sorted: B, D, E, C
• T5: A, B, D, E, F → Sorted: B, D, E, A, F
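The reordering step can be sketched as follows (the rank table is an assumption derived from the frequency list in Section 3.1, with ties kept in the order shown there):

```python
# Global frequency order from Step 1: B, D, E, A, C, F
rank = {"B": 0, "D": 1, "E": 2, "A": 3, "C": 4, "F": 5}

def sort_transaction(items):
    """Reorder a transaction by descending global item frequency."""
    return sorted(items, key=lambda item: rank[item])

print(sort_transaction(["A", "B", "D", "E", "F"]))  # ['B', 'D', 'E', 'A', 'F']
```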

3.3 FP-Tree Structure


After inserting all transactions, the FP-tree looks like this:

[ROOT]
  |
(B:5)
  |
(D:5)
  |
(E:5)
  |
+-----+-----+
|           |
(A:3)     (C:2)
  |
(F:2)

Here:
• ROOT is the root node.
• Every sorted transaction begins with the prefix B, D, E, so the nodes B, D and E each have a count of 5.
• Below E the tree branches: A (count 3) covers T1, T3 and T5, while C (count 2) covers T2 and T4.
• F (count 2) hangs below A, since F appears only in T3 and T5.
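The construction can be sketched with a minimal node class (an illustrative implementation, not the only possible one; the header table built alongside is used in Step 3):

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item        # item label, None for the root
        self.count = 0          # number of transactions passing through this node
        self.parent = parent
        self.children = {}      # item -> FPNode

def build_fp_tree(sorted_transactions):
    """Insert each frequency-sorted transaction, incrementing counts
    on shared prefixes and branching where transactions diverge."""
    root = FPNode(None)
    header = {}                 # item -> list of nodes holding that item
    for transaction in sorted_transactions:
        node = root
        for item in transaction:
            if item not in node.children:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1
    return root, header

sorted_txns = [
    ["B", "D", "E", "A"],
    ["B", "D", "E", "C"],
    ["B", "D", "E", "A", "F"],
    ["B", "D", "E", "C"],
    ["B", "D", "E", "A", "F"],
]
root, header = build_fp_tree(sorted_txns)
print(root.children["B"].count)                  # 5
print(root.children["B"].children["D"].count)    # 5
```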

4 Step 3: Header Table


The header table is an essential part of the FP-tree. It stores the items and
their frequencies and provides links to the nodes in the tree. The header table
facilitates efficient mining by allowing easy traversal of the FP-tree.

4.1 Header Table Example


For the FP-tree above, the header table looks like this:

Item Frequency Linked Nodes

B 5 (B : 5)
D 5 (D : 5)
E 5 (E : 5)
A 3 (A : 3)
C 2 (C : 2)
F 2 (F : 2)

Because every transaction shares the prefix B, D, E, each item occurs at
exactly one node in this tree, so each header entry links to a single node.
In general, when an item appears at several nodes, the header entry chains
those nodes together with node links.

5 Step 4: Mining the FP-Tree


The mining process involves recursively extracting frequent itemsets from the
FP-tree by examining the header table and conditional pattern bases.

5.1 Conditional Pattern Base
Mining proceeds bottom-up, starting from the least frequent item in the
header table. The conditional pattern base of an item is the set of prefix
paths in the FP-tree that lead to that item's nodes, each carrying the count
of the corresponding node.

Take item F as an example. F occurs in the frequency-sorted transactions:

T3 : B, D, E, A, F
T5 : B, D, E, A, F

In both cases the path from the root to F is B, D, E, A, so the conditional
pattern base for F is:

{B, D, E, A} : 2

(Item B, sitting directly below the root, has an empty conditional pattern
base.) We then build a conditional FP-tree from this pattern base and
recursively mine it for frequent itemsets ending in F, such as {B, F} and
{B, D, E, A, F}, each with support 2.
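Because each sorted transaction corresponds to a path in the FP-tree, the conditional pattern base can also be read off directly from the sorted transactions: it is the prefix that precedes the item in each transaction containing it. A sketch, using F as the example item:

```python
def conditional_pattern_base(item, sorted_transactions):
    """For each frequency-sorted transaction containing the item,
    collect the prefix that precedes it (equivalent to tracing the
    item's FP-tree node back toward the root)."""
    base = []
    for t in sorted_transactions:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:              # an empty prefix contributes nothing
                base.append(prefix)
    return base

sorted_txns = [
    ["B", "D", "E", "A"],
    ["B", "D", "E", "C"],
    ["B", "D", "E", "A", "F"],
    ["B", "D", "E", "C"],
    ["B", "D", "E", "A", "F"],
]
print(conditional_pattern_base("F", sorted_txns))
# [['B', 'D', 'E', 'A'], ['B', 'D', 'E', 'A']]
```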

6 Step 5: Recursive Mining and Frequent Itemsets
The mining process continues recursively for each item in the header table, and
we extract frequent itemsets. The frequent itemsets for the given dataset could
include:

{B} : Support = 5
{D} : Support = 5
{E} : Support = 5
{A} : Support = 3
{B, D} : Support = 5
{B, E} : Support = 5
{B, D, E} : Support = 5
{A, B} : Support = 3
{A, D} : Support = 3
{A, B, D} : Support = 3
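These supports can be verified against the original transactions by brute force (a check for this small example only; FP-Growth exists precisely to avoid such exhaustive counting on large datasets):

```python
# Transactions T1-T5 as sets, for fast membership tests
transactions = [
    {"A", "B", "D", "E"},
    {"B", "C", "D", "E"},
    {"A", "B", "D", "E", "F"},
    {"B", "C", "D", "E"},
    {"A", "B", "D", "E", "F"},
]

def itemset_support(itemset):
    """Number of transactions containing every item in the set."""
    return sum(1 for t in transactions if set(itemset) <= t)

print(itemset_support({"B", "D", "E"}))  # 5
print(itemset_support({"A", "B", "D"}))  # 3
```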

7 Advantages of FP-Growth
• No Candidate Generation: Unlike Apriori, FP-Growth does not generate candidate itemsets, which reduces computation time.
• Efficient Memory Use: The FP-tree is a compressed representation of the dataset, saving memory.
• Scalability: FP-Growth scales well with large datasets due to its efficient use of memory and reduced I/O operations.

8 Conclusion
The FP-Growth algorithm is a powerful tool for frequent pattern mining. By using the FP-tree and header table, FP-Growth efficiently mines frequent itemsets without the need for candidate generation, making it faster and more scalable than other algorithms like Apriori.
