
FREQUENT PATTERN MINING FROM DATA STREAM
PRESENTED BY:
Rajeev M. – AP24122040003
Tejashri A. – AP24122040009
Ridhamkumar T. – AP24122040014
Suleiman Ismail – AP24122040020
P. Sandeep – AP24122060005
G. Sri Harsha Vardhan – AP24122060007
K.V. Krishna Sampath – AP24122060017
WHAT IS A DATA STREAM?
● A continuous, high-speed, and time-varying flow of data
generated in real-time.
● Unlike traditional datasets, data streams are unbounded,
evolving, and cannot be stored entirely.
● Challenges: Memory constraints, concept drift, and scalability.
● Examples:
Sensor networks (e.g., IoT devices)
Online transactions (e.g., stock market, e-commerce)
Social media feeds (e.g., Twitter, Facebook)
KEY CHARACTERISTICS OF DATA STREAMS
● Unbounded
The data continuously flows and never ends.
● High-speed
Data arrives rapidly, requiring real-time processing.
● Concept Drift
The patterns and distributions in data can change over
time.
● One-pass Processing
Data is processed only once due to memory constraints.
CHALLENGES IN MINING DATA STREAMS
● Memory Constraints
Traditional algorithms assume data fits in memory, but
streaming data is too large.

● Scalability
Must handle large volumes of fast-arriving data efficiently.

● No Reprocessing
Once data passes, it cannot be re-evaluated.
WHAT IS FREQUENT PATTERN MINING?
● Frequent Pattern Mining (FPM) is a data mining technique used to discover recurring patterns, associations, or correlations in large datasets.

● Initially used in market basket analysis, where it identifies items frequently bought together.

● Also applied in fraud detection, network security, bioinformatics, and recommendation systems.
KEY CONCEPTS IN FPM
● Frequent Itemsets
Sets of items that appear frequently together in
transactions.
● Support
The proportion of transactions containing an itemset.
● Confidence
The likelihood that one item appears given another (used in
association rules).
● Lift
Measures how much more likely two items occur together than would be expected if they were independent (a minimal code sketch follows below).
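To make these definitions concrete, here is a minimal Python sketch. The dataset and function names are our own, matching the grocery example used later in this deck:

transactions = [
    {"Milk", "Bread"},
    {"Milk", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread"},
]

def support(itemset):
    # fraction of transactions containing every item in `itemset`
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y):
    # P(Y | X) = support(X ∪ Y) / support(X)
    return support(x | y) / support(x)

def lift(x, y):
    # confidence(X → Y) / support(Y); 1.0 means X and Y are independent
    return confidence(x, y) / support(y)

print(support({"Milk", "Bread"}))        # 0.5
print(confidence({"Milk"}, {"Bread"}))   # 0.666...
print(lift({"Bread"}, {"Milk"}))         # 0.888...
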
CHALLENGES IN FPM
● High Computational Cost
Exponential number of possible itemsets.
● Handling Dynamic Data
Frequent patterns change over time in data streams.
● Memory Limitations
Processing large-scale datasets requires efficient storage structures.
APPLICATIONS OF FPM
● E-commerce
Personalized recommendations based on past purchases.

● Cybersecurity
Detecting abnormal behavior in network traffic.

● Healthcare
Identifying disease correlations from patient records.
POPULAR ALGORITHMS FOR FPM

1. Apriori Algorithm

2. ECLAT (Equivalence Class Clustering and Bottom-Up Lattice Traversal) Algorithm

3. FP-Growth (Frequent Pattern Growth) Algorithm

4. Lossy Counting Algorithm (a one-pass approach designed for data streams; a brief sketch follows below)
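Lossy Counting is not expanded on in the later slides, so here is a minimal, illustrative Python sketch of its single-item version (Manku & Motwani, 2002; see References). Function and variable names are our own:

import math

# Minimal sketch of Lossy Counting for single items (Manku & Motwani, 2002).
# Guarantee: every item with true frequency >= support * n is reported,
# and recorded counts undercount the truth by at most eps * n.
def lossy_counting(stream, support=0.1, eps=0.01):
    width = math.ceil(1 / eps)       # bucket width: items per bucket
    counts = {}                      # item -> (count, max possible undercount)
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)
        if item in counts:
            count, delta = counts[item]
            counts[item] = (count + 1, delta)
        else:
            # a new item may have been seen (and pruned) in earlier buckets
            counts[item] = (1, bucket - 1)
        if n % width == 0:           # prune at each bucket boundary
            counts = {k: (c, d) for k, (c, d) in counts.items()
                      if c + d > bucket}
    # report items whose count clears the adjusted threshold
    return {k: c for k, (c, d) in counts.items()
            if c >= (support - eps) * n}

# Example: items occurring in at least 30% of a small stream.
stream = ["a", "b", "a", "c", "a", "b", "a", "d", "a", "b"]
print(lossy_counting(stream, support=0.3, eps=0.1))   # {'a': 5, 'b': 3}

Because entries are pruned at every bucket boundary, memory stays bounded even on an unbounded stream, which is what makes this family of algorithms suitable for the one-pass setting described earlier.
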


1. APRIORI ALGORITHM
INTRODUCTION

● A data mining algorithm used to find frequent itemsets and generate association rules.

● Helps identify relationships between items in large datasets, widely used in market basket analysis.

● Example: If customers buy bread, they often buy butter → helps in product placement & marketing.
1. APRIORI ALGORITHM
HOW IT WORKS?
● Identifying Frequent Itemsets
Scans the dataset, finds single items & applies a minimum
support threshold.
● Creating Candidate Item Groups
Combines frequent items into larger itemsets iteratively.
● Pruning Infrequent Item Groups
Uses the Apriori Property (if an itemset is infrequent, all
its supersets are infrequent).
● Generating Association Rules
Extracts relationships using support, confidence, and lift (a minimal code sketch follows below).
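The following is a minimal, illustrative Python sketch of these steps. It is not an optimized implementation, and the function names and toy dataset are our own:

from itertools import combinations

# Minimal sketch of Apriori: returns every itemset whose
# support count >= min_support.
def apriori(transactions, min_support=2):
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # count support of the current candidates in one scan
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate with an infrequent k-subset (the Apriori property)
        candidates = set()
        for a, b in combinations(list(survivors), 2):
            u = a | b
            if len(u) == k + 1 and all(
                    frozenset(s) in survivors for s in combinations(u, k)):
                candidates.add(u)
        current = candidates
        k += 1
    return frequent

txns = [{"Milk", "Bread"}, {"Milk", "Butter"},
        {"Bread", "Butter"}, {"Milk", "Bread"}]
print(apriori(txns, min_support=2))
# {Milk}: 3, {Bread}: 3, {Butter}: 2, {Milk, Bread}: 2

Running it on the grocery dataset used in the next slides reproduces the frequent itemsets derived by hand there.
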
1. APRIORI ALGORITHM
KEY METRICS
● Support
Frequency of an itemset in all transactions ("Bread
appears in 20% of purchases").
● Confidence
Probability of buying Y when X is bought ("If bread is
bought, butter follows 75% of the time").
● Lift
Measures how strongly two items are associated ("Bread &
butter are bought together more than by chance").
1. APRIORI ALGORITHM
NUMERICAL EXAMPLE

Let's consider a small dataset with 4 transactions in a grocery store:

Dataset (Transactions)

TRANSACTION ID    ITEMS PURCHASED
T1                Milk, Bread
T2                Milk, Butter
T3                Bread, Butter
T4                Milk, Bread
1. APRIORI ALGORITHM
Step 1 : Identifying Frequent Itemsets (Support Calculation)
● Let’s set minimum support = 50% (≥2 transactions out of 4).
● 1-itemsets (Support Count)
● Milk = 3/4 = 75% ✅
● Bread = 3/4 = 75% ✅
● Butter = 2/4 = 50% ✅

● All 1-itemsets meet the minimum support, so we move to 2-itemsets.


● 2-itemsets (Support Count)
● (Milk, Bread) = 2/4 = 50% ✅
● (Milk, Butter) = 1/4 = 25% ❌ (Below 50%, discard)
● (Bread, Butter) = 1/4 = 25% ❌ (Below 50%, discard)
● Final frequent itemsets:
✅ {Milk}, {Bread}, {Butter}, {Milk, Bread}
1. APRIORI ALGORITHM
Step 2 : Generating Association Rules (Confidence Calculation)
● Using minimum confidence = 60%, we generate rules:

● Rule 1: {Milk} → {Bread}
Confidence = support(Milk, Bread) / support(Milk) = 2/3 = 66.7% ✅
● Rule 2: {Bread} → {Milk}
Confidence = support(Milk, Bread) / support(Bread) = 2/3 = 66.7% ✅
● Rule 3: {Bread} → {Butter}
Confidence = support(Bread, Butter) / support(Bread) = 1/3 = 33.3% ❌
● Rule 4: {Milk} → {Butter}
Confidence = support(Milk, Butter) / support(Milk) = 1/3 = 33.3% ❌
1. APRIORI ALGORITHM
Step 3 : Lift Calculation (for {Bread} → {Milk})
● Lift = Confidence({Bread} → {Milk}) / Support(Milk)
● Lift = 66.7% / 75% ≈ 0.89
● A lift below 1 means Bread and Milk appear together slightly less often than expected if they were independent.

Final Association Rules:
✅ {Milk} → {Bread} (66.7% confidence)
✅ {Bread} → {Milk} (66.7% confidence)
1. APRIORI ALGORITHM
Advantages:
● Simple and easy to implement.
● Works well for small datasets.

Limitations:
● Computationally expensive (requires multiple database scans).
● Generates many candidate itemsets, increasing complexity.

Applications:
● Market Basket Analysis (e.g., finding products frequently bought
together).
● Web usage mining (e.g., analyzing browsing behavior).
2. EQUIVALENCE CLASS CLUSTERING AND BOTTOM-UP LATTICE TRAVERSAL (ECLAT)
INTRODUCTION

● It is a more efficient and scalable version of the Apriori algorithm.

● It uses a vertical data format (item → list of transaction IDs) to identify frequent itemsets.

● ECLAT is efficient and fast, especially for dense datasets.
2. ECLAT ALGORITHM
HOW IT WORKS?
● Convert transactions into a vertical database — for each
item, store a list of transactions (TID-list) where the item
appears.
● Find frequent 1-itemsets using support threshold.
● Generate larger frequent itemsets by intersecting TID-lists of
smaller itemsets.
● If the intersection count (support) ≥ minimum support threshold, keep the itemset; otherwise, discard it (a minimal code sketch follows below).
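The following is a minimal, illustrative Python sketch of this TID-list approach. Names are our own and the recursion is unoptimized:

# Minimal sketch of ECLAT using TID-list intersections.
def eclat(transactions, min_support=2):
    # Step 1: vertical format — itemset -> set of transaction IDs
    tidlists = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    # keep only frequent 1-itemsets
    tidlists = {i: s for i, s in tidlists.items() if len(s) >= min_support}
    frequent = dict(tidlists)

    # Steps 2-4: grow itemsets depth-first by intersecting TID-lists
    def extend(klass):
        items = list(klass)
        for i in range(len(items)):
            suffix = {}
            for j in range(i + 1, len(items)):
                tids = klass[items[i]] & klass[items[j]]
                if len(tids) >= min_support:
                    suffix[items[i] | items[j]] = tids
            if suffix:
                frequent.update(suffix)
                extend(suffix)

    extend(tidlists)
    return {itemset: len(tids) for itemset, tids in frequent.items()}

txns = [{"Milk", "Bread"}, {"Milk", "Butter"},
        {"Bread", "Butter"}, {"Milk", "Bread"}]
print(eclat(txns, min_support=2))
# {Milk}: 3, {Bread}: 3, {Butter}: 2, {Milk, Bread}: 2

Note that support comes directly from the size of each intersection, so no extra pass over the database is needed once the TID-lists exist.
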
2. ECLAT ALGORITHM
KEY METRICS
● Support
Number of transactions containing the itemset.

● TID-list
List of transaction IDs where an item occurs.

● Minimum Support
Threshold to decide if an itemset is frequent.
2. ECLAT ALGORITHM
NUMERICAL EXAMPLE

Let's consider a small dataset with 4 transactions in a grocery store:

Dataset (Transactions)

TRANSACTION ID    ITEMS PURCHASED
T1                Milk, Bread
T2                Milk, Butter
T3                Bread, Butter
T4                Milk, Bread
2. ECLAT ALGORITHM
Step 1 : Create Vertical Data Format (TID-lists)

ITEM      TID-LIST
Milk      {T1, T2, T4}
Bread     {T1, T3, T4}
Butter    {T2, T3}

Step 2 : Find Frequent 1-itemsets

● Assuming Minimum Support = 2.
● All items (Milk, Bread, Butter) are frequent, as each appears in ≥ 2 transactions.
2. ECLAT ALGORITHM
Step 3 : Find Frequent 2-itemsets via TID-list Intersections

ITEMSET            TID-LIST INTERSECTION    SUPPORT
{Milk, Bread}      {T1, T4}                 2
{Milk, Butter}     {T2}                     1 (discard)
{Bread, Butter}    {T3}                     1 (discard)

Step 4 : Find Frequent 3-itemsets

● {Milk, Bread, Butter}: {T1, T4} ∩ {T2, T3} = ∅ (support = 0, discard).
2. ECLAT ALGORITHM
Step 5 : Final Frequent Itemsets

ITEMSET          SUPPORT
{Milk}           3
{Bread}          3
{Butter}         2
{Milk, Bread}    2
2. ECLAT ALGORITHM
Advantages:
● Fast for dense datasets.
● Uses vertical format, making intersections efficient.
● Requires fewer scans of the database.
● Can handle long frequent patterns effectively.

Limitations:
● Not suitable for sparse datasets (many empty intersections).
● Memory intensive when TID-lists are very large.
● Performs poorly when itemsets are very long in high-dimensional data.

Applications:
● Market Basket Analysis — to find products frequently bought together.
● Web usage mining — to discover frequently visited pages.
● Bioinformatics — to detect frequent patterns in gene sequences.
● Recommendation Systems — to suggest frequently grouped items.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
INTRODUCTION

● Frequent Pattern Growth (FP-Growth) is an efficient algorithm for frequent pattern mining.

● It overcomes the drawbacks of the Apriori algorithm by avoiding repeated database scans and eliminating candidate-set generation.

● Instead, it uses a compact tree-based structure (the FP-Tree) to mine frequent patterns efficiently.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
HOW IT WORKS?
● Build the FP-Tree
Scan the dataset once to count item frequencies.
Construct a compact FP-Tree by grouping transactions with common items.
● Extract Frequent Patterns
Traverse the FP-Tree bottom-up.
Generate conditional FP-Trees to find frequent itemsets.
● Generate Association Rules
Identify relationships between frequent itemsets.
Use support and confidence to validate the rules (a minimal code sketch follows below).
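The following is a minimal, illustrative Python sketch of the first two steps (tree construction and recursive mining via conditional pattern bases). Class and function names are our own, and the code is unoptimized:

from collections import Counter, defaultdict

# Minimal sketch of FP-Growth.
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_tree(weighted_txns, min_support):
    """weighted_txns: list of (items, count). Returns (item freqs, header table)."""
    freq = Counter()
    for items, count in weighted_txns:
        for i in items:
            freq[i] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_txns:
        # keep frequent items, sorted by descending frequency (ties: by name)
        path = sorted((i for i in items if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for i in path:
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += count
    return freq, header

def fp_growth(weighted_txns, min_support, suffix=frozenset(), out=None):
    out = {} if out is None else out
    freq, header = build_tree(weighted_txns, min_support)
    for item in freq:                    # each item is mined independently
        itemset = suffix | {item}
        out[itemset] = freq[item]
        # conditional pattern base: the prefix path above every `item` node
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        if base:
            fp_growth(base, min_support, itemset, out)
    return out

txns = [(["Milk", "Bread"], 1), (["Milk", "Butter"], 1),
        (["Bread", "Butter"], 1), (["Milk", "Bread"], 1)]
print(fp_growth(txns, min_support=2))
# {Bread}: 3, {Milk}: 3, {Butter}: 2, {Milk, Bread}: 2

On the grocery dataset from the next slides, this reproduces both the FP-Tree and the final frequent itemsets derived by hand.
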
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
KEY METRICS
● Support
Frequency of an itemset in all transactions ("Bread
appears in 20% of purchases").
● Confidence
Probability of buying Y when X is bought ("If bread is
bought, butter follows 75% of the time").
● Lift
Measures how strongly two items are associated ("Bread &
butter are bought together more than by chance").
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
NUMERICAL EXAMPLE

Let's consider the same small dataset with 4 transactions in a grocery store:

Dataset (Transactions)

TRANSACTION ID    ITEMS PURCHASED
T1                Milk, Bread
T2                Milk, Butter
T3                Bread, Butter
T4                Milk, Bread
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 1 : Count Frequency of Items
● Count how many times each item appears:

ITEM      FREQUENCY (SUPPORT COUNT)
Milk      3
Bread     3
Butter    2

● Assume Minimum Support = 2, so all items are frequent.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 2 : Sort Items in Each Transaction (by descending frequency)

TRANSACTION ID    SORTED ITEMS
T1                Bread, Milk
T2                Milk, Butter
T3                Bread, Butter
T4                Bread, Milk
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 3 : Build FP-Tree

Root
├── Bread (3)
│   ├── Milk (2)
│   └── Butter (1)
└── Milk (1)
    └── Butter (1)
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 4 : Final Frequent Itemsets

FREQUENT ITEMSET    SUPPORT COUNT
{Bread}             3
{Milk}              3
{Butter}            2
{Bread, Milk}       2
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 5 : Generate Association Rules
From frequent itemsets:
1. Bread → Milk
● Support = 2/4 = 50%
● Confidence = 2/3 ≈ 66.7%
2. Milk → Bread
● Support = 2/4 = 50%
● Confidence = 2/3 ≈ 66.7%

Final Output:
Frequent Itemsets:
● {Bread}, {Milk}, {Butter}, {Bread, Milk}
Association Rules (derived programmatically in the sketch below):
● Bread → Milk (Confidence: 66.7%)
● Milk → Bread (Confidence: 66.7%)
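To tie Steps 4 and 5 together, here is a minimal, illustrative sketch (our own) that derives the rules and their confidence directly from the frequent-itemset support counts above:

from itertools import combinations

# Derive association rules from frequent-itemset support counts (Step 4).
supports = {
    frozenset({"Bread"}): 3,
    frozenset({"Milk"}): 3,
    frozenset({"Butter"}): 2,
    frozenset({"Bread", "Milk"}): 2,
}
n_txns, min_conf = 4, 0.6

for itemset, count in supports.items():
    if len(itemset) < 2:
        continue                          # rules need at least two items
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            rhs = itemset - lhs
            conf = count / supports[lhs]  # support(X ∪ Y) / support(X)
            if conf >= min_conf:
                print(f"{set(lhs)} -> {set(rhs)}: "
                      f"support={count / n_txns:.0%}, confidence={conf:.1%}")
# {'Bread'} -> {'Milk'}: support=50%, confidence=66.7%
# {'Milk'} -> {'Bread'}: support=50%, confidence=66.7%
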
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Advantages:
● Faster than Apriori (No need to generate candidate sets).
● Efficient for large datasets (Uses a compact FP-Tree).
● Reduces database scans (Processes data in a compressed way).

Limitations:
● Tree Construction Overhead (Consumes memory for large datasets).
● Not scalable for extremely large databases.

Applications:
● Market Basket Analysis → Recommending products based on purchase
history.
● Fraud Detection → Identifying unusual transaction patterns.
● Web Usage Mining → Analyzing user behavior for recommendations.
● Bioinformatics → Identifying common gene sequences.
REFERENCES
● Han, J., Pei, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques (3rd
ed.). Morgan Kaufmann.

● Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules.
Proceedings of the 20th International Conference on Very Large Data Bases
(VLDB), 487–499.

● Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate
generation. ACM SIGMOD Record, 29(2), 1–12.

● Zaki, M. J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for
fast discovery of association rules. KDD, 283–296.

● Manku, G. S., & Motwani, R. (2002). Approximate frequency counts over data
streams. Proceedings of the 28th International Conference on Very Large Data
Bases (VLDB), 346–357.

● https://www.geeksforgeeks.org/
Thank You!
