
FREQUENT PATTERN MINING FROM DATA STREAM
PRESENTED BY:
Rajeev M. – AP24122040003
Tejashri A. – AP24122040009
Ridhamkumar T. – AP24122040014
Suleiman Ismail – AP24122040020
P. Sandeep – AP24122060005
G. Sri Harsha Vardhan – AP24122060007
K.V. Krishna Sampath – AP24122060017
WHAT IS A DATA STREAM?
● A continuous, high-speed, and time-varying flow of data
generated in real-time.
● Unlike traditional datasets, data streams are unbounded,
evolving, and cannot be stored entirely.
● Challenges: Memory constraints, concept drift, and scalability.
● Examples:
Sensor networks (e.g., IoT devices)
Online transactions (e.g., stock market, e-commerce)
Social media feeds (e.g., Twitter, Facebook)
KEY CHARACTERISTICS OF DATA STREAMS
● Unbounded
The data continuously flows and never ends.
● High-speed
Data arrives rapidly, requiring real-time processing.
● Concept Drift
The patterns and distributions in data can change over
time.
● One-pass Processing
Data is processed only once due to memory constraints.
CHALLENGES IN MINING DATA STREAMS
● Memory Constraints
Traditional algorithms assume data fits in memory, but
streaming data is too large.

● Scalability
Must handle large volumes of fast-arriving data efficiently.

● No Reprocessing
Once data passes, it cannot be re-evaluated.
WHAT IS FREQUENT PATTERN MINING?
● Frequent Pattern Mining (FPM) is a data mining technique used to discover recurring patterns, associations, or correlations in large datasets.

● Initially used in market basket analysis, where it identifies items frequently bought together.

● Also applied in fraud detection, network security, bioinformatics, and recommendation systems.
KEY CONCEPTS IN FPM
● Frequent Itemsets
Sets of items that appear frequently together in
transactions.
● Support
The proportion of transactions containing an itemset.
● Confidence
The likelihood that one item appears given another (used in
association rules).
● Lift
Measures how much more likely two items occur together than would be expected if they were independent (a minimal code sketch follows below).
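To make these definitions concrete, here is a minimal Python sketch. The dataset and function names are our own, matching the grocery example used later in this deck:

transactions = [
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
]

def support(itemset):
    # fraction of transactions containing every item in `itemset`
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y):
    # P(Y | X) = support(X ∪ Y) / support(X)
    return support(x | y) / support(x)

def lift(x, y):
    # confidence(X → Y) / support(Y); 1.0 means X and Y are independent
    return confidence(x, y) / support(y)

print(support({"Milk", "Bread"}))        # 0.5
print(confidence({"Milk"}, {"Bread"}))   # 0.666...
print(lift({"Bread"}, {"Milk"}))         # 0.888...
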
CHALLENGES IN FPM
● High Computational Cost
Exponential number of possible itemsets.
● Handling Dynamic Data
Frequent patterns change over time in data streams.
● Memory Limitations
Processing large-scale datasets requires efficient storage structures.
APPLICATIONS OF FPM
● E-commerce
Personalized recommendations based on past purchases.

● Cybersecurity
Detecting abnormal behavior in network traffic.

● Healthcare
Identifying disease correlations from patient records.
POPULAR ALGORITHMS FOR FPM

1. Apriori Algorithm

2. ECLAT (Equivalence Class Clustering and Bottom-Up Lattice Traversal) Algorithm

3. FP-Growth (Frequent Pattern Growth) Algorithm

4. Lossy Counting Algorithm (a one-pass approach designed for data streams; a brief sketch follows below)
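Lossy Counting is not expanded on in the later slides, so here is a minimal, illustrative Python sketch of its single-item version (Manku & Motwani, 2002; see References). Function and variable names are our own:

import math

# Minimal sketch of Lossy Counting for single items (Manku & Motwani, 2002).
# Guarantee: every item with true frequency >= support * n is reported,
# and recorded counts undercount the truth by at most eps * n.
def lossy_counting(stream, support=0.1, eps=0.01):
    width = math.ceil(1 / eps)       # bucket width: items per bucket
    counts = {}                      # item -> (count, max possible undercount)
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            count, delta = counts[item]
            counts[item] = (count + 1, delta)
        else:
            # a new item may have been seen (and pruned) in earlier buckets
            counts[item] = (1, bucket - 1)
        if n % width == 0:           # prune at each bucket boundary
            counts = {k: (c, d) for k, (c, d) in counts.items()
                      if c + d > bucket}
    # report items whose count clears the adjusted threshold
    return {k: c for k, (c, d) in counts.items()
            if c >= (support - eps) * n}

# Example: items occurring in at least 30% of a small stream.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
print(lossy_counting(stream, support=0.3, eps=0.1))   # {'a': 5, 'b': 3}

Because entries are pruned at every bucket boundary, memory stays bounded even on an unbounded stream, which is what makes this family of algorithms suitable for the one-pass setting described earlier.
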


1. APRIORI ALGORITHM
INTRODUCTION

● A data mining algorithm used to find frequent itemsets and generate association rules.

● Helps identify relationships between items in large datasets, widely used in market basket analysis.

● Example: If customers buy bread, they often buy butter → helps in product placement & marketing.
1. APRIORI ALGORITHM
HOW IT WORKS?
● Identifying Frequent Itemsets
Scans the dataset, finds single items & applies a minimum
support threshold.
● Creating Candidate Item Groups
Combines frequent items into larger itemsets iteratively.
● Pruning Infrequent Item Groups
Uses the Apriori Property (if an itemset is infrequent, all
its supersets are infrequent).
● Generating Association Rules
Extracts relationships using support, confidence, and lift (a minimal code sketch follows below).
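The following is a minimal, illustrative Python sketch of these steps. It is not an optimized implementation, and the function names and toy dataset are our own:

from itertools import combinations

# Minimal sketch of Apriori: returns every itemset whose
# support count >= min_support.
def apriori(transactions, min_support=2):
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # count support of the current candidates in one scan
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate with an infrequent k-subset (the Apriori property)
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            u = a | b
            if len(u) == k + 1 and all(
                    frozenset(s) in survivors for s in combinations(u, k)):
                candidates.add(u)
        current = candidates
        k += 1
    return frequent

txns = [{"Milk", "Bread"}, {"Milk", "Butter"},
        {"Bread", "Butter"}, {"Milk", "Bread"}]
print(apriori(txns, min_support=2))
# {Milk}: 3, {Bread}: 3, {Butter}: 2, {Milk, Bread}: 2

Running it on the grocery dataset used in the next slides reproduces the frequent itemsets derived by hand there.
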
1. APRIORI ALGORITHM
KEY METRICS
● Support
Frequency of an itemset in all transactions ("Bread
appears in 20% of purchases").
● Confidence
Probability of buying Y when X is bought ("If bread is
bought, butter follows 75% of the time").
● Lift
Measures how strongly two items are associated ("Bread &
butter are bought together more than by chance").
1. APRIORI ALGORITHM
NUMERICAL EXAMPLE

Let's consider a small dataset with 4 transactions in a grocery store:

Dataset (Transactions)

TRANSACTION ID    ITEMS PURCHASED
T1                Milk, Bread
T2                Milk, Butter
T3                Bread, Butter
T4                Milk, Bread
1. APRIORI ALGORITHM
Step 1 : Identifying Frequent Itemsets (Support Calculation)
● Let’s set minimum support = 50% (≥2 transactions out of 4).
● 1-itemsets (Support Count)
● Milk = 3/4 = 75% ✅
● Bread = 3/4 = 75% ✅
● Butter = 2/4 = 50% ✅

● All 1-itemsets meet the minimum support, so we move to 2-itemsets.


● 2-itemsets (Support Count)
● (Milk, Bread) = 2/4 = 50% ✅
● (Milk, Butter) = 1/4 = 25% ❌ (Below 50%, discard)
● (Bread, Butter) = 1/4 = 25% ❌ (Below 50%, discard)
● Final frequent itemsets:
✅ {Milk}, {Bread}, {Butter}, {Milk, Bread}
1. APRIORI ALGORITHM
Step 2 : Generating Association Rules (Confidence Calculation)
● Using minimum confidence = 60%, we generate rules:

● Rule 1: {Milk} → {Bread}
Confidence = support(Milk, Bread) / support(Milk) = 2/3 = 66.7% ✅
● Rule 2: {Bread} → {Milk}
Confidence = support(Milk, Bread) / support(Bread) = 2/3 = 66.7% ✅
● Rule 3: {Bread} → {Butter}
Confidence = support(Bread, Butter) / support(Bread) = 1/3 = 33.3% ❌
● Rule 4: {Milk} → {Butter}
Confidence = support(Milk, Butter) / support(Milk) = 1/3 = 33.3% ❌
1. APRIORI ALGORITHM
Step 3 : Lift Calculation (for {Bread} → {Milk})
● Lift = Confidence({Bread} → {Milk}) / Support(Milk)
● Lift = 66.7% / 75% ≈ 0.89
● A lift below 1 means Bread and Milk appear together slightly less often than expected if they were independent.

Final Association Rules:
✅ {Milk} → {Bread} (66.7% confidence)
✅ {Bread} → {Milk} (66.7% confidence)
1. APRIORI ALGORITHM
Advantages:
● Simple and easy to implement.
● Works well for small datasets.

Limitations:
● Computationally expensive (requires multiple database scans).
● Generates many candidate itemsets, increasing complexity.

Applications:
● Market Basket Analysis (e.g., finding products frequently bought
together).
● Web usage mining (e.g., analyzing browsing behavior).
2. EQUIVALENCE CLASS CLUSTERING AND BOTTOM-UP LATTICE TRAVERSAL (ECLAT)
INTRODUCTION

● It is a more efficient and scalable version of the Apriori algorithm.

● It uses a vertical data format (item → list of transaction IDs) to identify frequent itemsets.

● ECLAT is efficient and fast, especially for dense datasets.
2. ECLAT ALGORITHM
HOW IT WORKS?
● Convert transactions into a vertical database — for each
item, store a list of transactions (TID-list) where the item
appears.
● Find frequent 1-itemsets using support threshold.
● Generate larger frequent itemsets by intersecting TID-lists of
smaller itemsets.
● If the intersection count (support) ≥ minimum support threshold, keep the itemset; otherwise, discard it (a minimal code sketch follows below).
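The following is a minimal, illustrative Python sketch of this TID-list approach. Names are our own and the recursion is unoptimized:

# Minimal sketch of ECLAT using TID-list intersections.
def eclat(transactions, min_support=2):
    # Step 1: vertical format — itemset -> set of transaction IDs
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    # keep only frequent 1-itemsets
    tidlists = {i: s for i, s in tidlists.items() if len(s) >= min_support}
    frequent = dict(tidlists)

    # Steps 2-4: grow itemsets depth-first by intersecting TID-lists
    def extend(klass):
        items = list(klass)
        for i in range(len(items)):
            suffix = {}
            for j in range(i + 1, len(items)):
                tids = klass[items[i]] & klass[items[j]]
                if len(tids) >= min_support:
                    suffix[items[i] | items[j]] = tids
            if suffix:
                frequent.update(suffix)
                extend(suffix)

    extend(tidlists)
    return {itemset: len(tids) for itemset, tids in frequent.items()}

txns = [{"Milk", "Bread"}, {"Milk", "Butter"},
        {"Bread", "Butter"}, {"Milk", "Bread"}]
print(eclat(txns, min_support=2))
# {Milk}: 3, {Bread}: 3, {Butter}: 2, {Milk, Bread}: 2

Note that support comes directly from the size of each intersection, so no extra pass over the database is needed once the TID-lists exist.
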
2. ECLAT ALGORITHM
KEY METRICS
● Support
Number of transactions containing the itemset.

● TID-list
List of transaction IDs where an item occurs.

● Minimum Support
Threshold to decide if an itemset is frequent.
2. ECLAT ALGORITHM
NUMERICAL EXAMPLE

Let's consider a small dataset with 4 transactions in a grocery store:

Dataset (Transactions)

TRANSACTION ID    ITEMS PURCHASED
T1                Milk, Bread
T2                Milk, Butter
T3                Bread, Butter
T4                Milk, Bread
2. ECLAT ALGORITHM
Step 1 : Create Vertical Data Format (TID-lists)

ITEM      TID-LIST
Milk      {T1, T2, T4}
Bread     {T1, T3, T4}
Butter    {T2, T3}

Step 2 : Find Frequent 1-itemsets

● Assuming Minimum Support = 2.
● All items (Milk, Bread, Butter) are frequent, as each appears in ≥ 2 transactions.
2. ECLAT ALGORITHM
Step 3 : Find Frequent 2-itemsets via TID-list Intersections

ITEMSET            TID-LIST INTERSECTION    SUPPORT
{Milk, Bread}      {T1, T4}                 2
{Milk, Butter}     {T2}                     1 (discard)
{Bread, Butter}    {T3}                     1 (discard)

Step 4 : Find Frequent 3-itemsets

● {Milk, Bread, Butter}: {T1, T4} ∩ {T2, T3} = ∅ (support = 0, discard).
2. ECLAT ALGORITHM
Step 5 : Final Frequent Itemsets

ITEMSET          SUPPORT
{Milk}           3
{Bread}          3
{Butter}         2
{Milk, Bread}    2
2. ECLAT ALGORITHM
Advantages:
● Fast for dense datasets.
● Uses vertical format, making intersections efficient.
● Requires fewer scans of the database.
● Can handle long frequent patterns effectively.

Limitations:
● Not suitable for sparse datasets (many empty intersections).
● Memory intensive when TID-lists are very large.
● Performs poorly when itemsets are very long in high-dimensional data.

Applications:
● Market Basket Analysis — to find products frequently bought together.
● Web usage mining — to discover frequently visited pages.
● Bioinformatics — to detect frequent patterns in gene sequences.
● Recommendation Systems — to suggest frequently grouped items.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
INTRODUCTION

● Frequent Pattern Growth (FP-Growth) is an efficient algorithm for frequent pattern mining.

● It overcomes the drawbacks of the Apriori algorithm by avoiding repeated database scans and eliminating candidate-set generation.

● Instead, it uses a compact tree-based structure (the FP-Tree) to mine frequent patterns efficiently.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
HOW IT WORKS?
● Build the FP-Tree
Scan the dataset once to count item frequencies.
Construct a compact FP-Tree by grouping transactions with common items.
● Extract Frequent Patterns
Traverse the FP-Tree bottom-up.
Generate conditional FP-Trees to find frequent itemsets.
● Generate Association Rules
Identify relationships between frequent itemsets.
Use support and confidence to validate the rules (a minimal code sketch follows below).
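The following is a minimal, illustrative Python sketch of the first two steps (tree construction and recursive mining via conditional pattern bases). Class and function names are our own, and the code is unoptimized:

from collections import Counter, defaultdict

# Minimal sketch of FP-Growth.
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_tree(weighted_txns, min_support):
    """weighted_txns: list of (items, count). Returns (item freqs, header table)."""
    freq = Counter()
    for items, count in weighted_txns:
        for i in items:
            freq[i] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_txns:
        # keep frequent items, sorted by descending frequency (ties: by name)
        path = sorted((i for i in items if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += count
    return freq, header

def fp_growth(weighted_txns, min_support, suffix=frozenset(), out=None):
    out = {} if out is None else out
    freq, header = build_tree(weighted_txns, min_support)
    for item in freq:                    # each item is mined independently
        itemset = suffix | {item}
        out[itemset] = freq[item]
        # conditional pattern base: the prefix path above every `item` node
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        if base:
            fp_growth(base, min_support, itemset, out)
    return out

txns = [(["Milk", "Bread"], 1), (["Milk", "Butter"], 1),
        (["Bread", "Butter"], 1), (["Milk", "Bread"], 1)]
print(fp_growth(txns, min_support=2))
# {Bread}: 3, {Milk}: 3, {Butter}: 2, {Milk, Bread}: 2

On the grocery dataset from the next slides, this reproduces both the FP-Tree and the final frequent itemsets derived by hand.
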
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
KEY METRICS
● Support
Frequency of an itemset in all transactions ("Bread
appears in 20% of purchases").
● Confidence
Probability of buying Y when X is bought ("If bread is
bought, butter follows 75% of the time").
● Lift
Measures how strongly two items are associated ("Bread &
butter are bought together more than by chance").
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
NUMERICAL EXAMPLE

Let's consider the same small dataset with 4 transactions in a grocery store:

Dataset (Transactions)

TRANSACTION ID    ITEMS PURCHASED
T1                Milk, Bread
T2                Milk, Butter
T3                Bread, Butter
T4                Milk, Bread
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 1 : Count Frequency of Items
● Count how many times each item appears:

ITEM      FREQUENCY (SUPPORT COUNT)
Milk      3
Bread     3
Butter    2

● Assume Minimum Support = 2, so all items are frequent.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 2 : Sort Items in Each Transaction (by descending frequency)

TRANSACTION ID    SORTED ITEMS
T1                Bread, Milk
T2                Milk, Butter
T3                Bread, Butter
T4                Bread, Milk
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 3 : Build FP-Tree

Root
├── Bread (3)
│   ├── Milk (2)
│   └── Butter (1)
└── Milk (1)
    └── Butter (1)
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 4 : Final Frequent Itemsets

FREQUENT ITEMSET    SUPPORT COUNT
{Bread}             3
{Milk}              3
{Butter}            2
{Bread, Milk}       2
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 5 : Generate Association Rules
From frequent itemsets:
1. Bread → Milk
● Support = 2/4 = 50%
● Confidence = 2/3 ≈ 66.7%
2. Milk → Bread
● Support = 2/4 = 50%
● Confidence = 2/3 ≈ 66.7%

Final Output:
Frequent Itemsets:
● {Bread}, {Milk}, {Butter}, {Bread, Milk}
Association Rules (derived programmatically in the sketch below):
● Bread → Milk (Confidence: 66.7%)
● Milk → Bread (Confidence: 66.7%)
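To tie Steps 4 and 5 together, here is a minimal, illustrative sketch (our own) that derives the rules and their confidence directly from the frequent-itemset support counts above:

from itertools import combinations

# Derive association rules from frequent-itemset support counts (Step 4).
supports = {
    frozenset({"Bread"}): 3,
    frozenset({"Milk"}): 3,
    frozenset({"Butter"}): 2,
    frozenset({"Bread", "Milk"}): 2,
}
n_txns, min_conf = 4, 0.6

for itemset, count in supports.items():
    if len(itemset) < 2:
        continue                          # rules need at least two items
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            rhs = itemset - lhs
            conf = count / supports[lhs]  # support(X ∪ Y) / support(X)
            if conf >= min_conf:
                print(f"{set(lhs)} -> {set(rhs)}: "
                      f"support={count / n_txns:.0%}, confidence={conf:.1%}")
# {'Bread'} -> {'Milk'}: support=50%, confidence=66.7%
# {'Milk'} -> {'Bread'}: support=50%, confidence=66.7%
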
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Advantages:
● Faster than Apriori (No need to generate candidate sets).
● Efficient for large datasets (Uses a compact FP-Tree).
● Reduces database scans (Processes data in a compressed way).

Limitations:
● Tree Construction Overhead (Consumes memory for large datasets).
● Not scalable for extremely large databases.

Applications:
● Market Basket Analysis → Recommending products based on purchase
history.
● Fraud Detection → Identifying unusual transaction patterns.
● Web Usage Mining → Analyzing user behavior for recommendations.
● Bioinformatics → Identifying common gene sequences.
REFERENCES
● Han, J., Pei, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques (3rd
ed.). Morgan Kaufmann.

● Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules.
Proceedings of the 20th International Conference on Very Large Data Bases
(VLDB), 487–499.

● Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate
generation. ACM SIGMOD Record, 29(2), 1–12.

● Zaki, M. J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for
fast discovery of association rules. KDD, 283–296.

● Manku, G. S., & Motwani, R. (2002). Approximate frequency counts over data
streams. Proceedings of the 28th International Conference on Very Large Data
Bases (VLDB), 346–357.

● https://www.geeksforgeeks.org/
Thank You!
