Modified Frequent Pattern Mining From Data Stream
PRESENTED BY:
Rajeev M. – AP24122040003
Tejashri A. – AP24122040009
Ridhamkumar T. – AP24122040014
Suleiman Ismail – AP24122040020
P. Sandeep – AP24122060005
G. Sri Harsha Vardhan – AP24122060007
K.V. Krishna Sampath – AP24122060017
WHAT IS A DATA STREAM?
● A continuous, high-speed, and time-varying flow of data
generated in real-time.
● Unlike traditional datasets, data streams are unbounded,
evolving, and cannot be stored entirely.
● Challenges: Memory constraints, concept drift, and scalability.
● Examples:
Sensor networks (e.g., IoT devices)
Online transactions (e.g., stock market, e-commerce)
Social media feeds (e.g., Twitter, Facebook)
KEY CHARACTERISTICS OF DATA STREAMS
● Unbounded
The data continuously flows and never ends.
● High-speed
Data arrives rapidly, requiring real-time processing.
● Concept Drift
The patterns and distributions in data can change over
time.
● One-pass Processing
Data is processed only once due to memory constraints.
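The one-pass constraint can be sketched with a running counter that sees each transaction exactly once and never revisits it (a minimal Python sketch; the toy stream below is our own assumption):

```python
from collections import Counter

def one_pass_counts(stream):
    """Count item frequencies in a single pass; each transaction is seen once."""
    counts = Counter()
    for transaction in stream:
        counts.update(transaction)  # no buffering, no second scan
    return counts

# Hypothetical stream of market-basket transactions
stream = [["Milk", "Bread"], ["Milk", "Butter"],
          ["Bread", "Butter"], ["Milk", "Bread"]]
counts = one_pass_counts(stream)
print(counts["Milk"], counts["Bread"], counts["Butter"])  # 3 3 2
```

Real stream miners (e.g., lossy counting, cited in the references) add bounded-memory approximation on top of this single-scan pattern.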
CHALLENGES IN MINING DATA STREAMS
● Memory Constraints
Traditional algorithms assume data fits in memory, but
streaming data is too large.
● Scalability
Must handle large volumes of fast-arriving data efficiently.
● No Reprocessing
Once data passes, it cannot be re-evaluated.
WHAT IS FREQUENT PATTERN MINING?
● Frequent Pattern Mining (FPM) is a data mining technique used to
discover recurring patterns, associations, or correlations in large
datasets.
● Memory Limitations
Processing large-scale datasets requires efficient storage
structures.
APPLICATIONS OF FPM
● E-commerce
Personalized recommendations based on past purchases.
● Cybersecurity
Detecting abnormal behavior in network traffic.
● Healthcare
Identifying disease correlations from patient records.
POPULAR ALGORITHMS FOR FPM
1. Apriori Algorithm
Dataset (Transactions)
TRANSACTION ID    ITEMS PURCHASED
T1 Milk, Bread
T2 Milk, Butter
T3 Bread, Butter
T4 Milk, Bread
1. APRIORI ALGORITHM
Step 1 : Identifying Frequent Itemsets (Support Calculation)
● Let’s set minimum support = 50% (≥2 transactions out of 4).
● 1-itemsets (support count):
● Milk = 3/4 = 75% ✅
● Bread = 3/4 = 75% ✅
● Butter = 2/4 = 50% ✅
● 2-itemsets (candidates built only from frequent 1-itemsets):
● {Milk, Bread} = 2/4 = 50% ✅
● {Milk, Butter} = 1/4 = 25% ❌
● {Bread, Butter} = 1/4 = 25% ❌
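The level-wise support counting above can be sketched in Python (a minimal illustration on the toy dataset; the helper names are our own):

```python
from itertools import combinations

transactions = [{"Milk", "Bread"}, {"Milk", "Butter"},
                {"Bread", "Butter"}, {"Milk", "Bread"}]
min_support = 0.5  # 50% => at least 2 of 4 transactions

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level 1: frequent single items
items = sorted({i for t in transactions for i in t})
frequent_1 = [frozenset([i]) for i in items
              if support(frozenset([i])) >= min_support]

# Level 2: candidate pairs built only from frequent items (Apriori property)
f1_items = sorted(i for s in frequent_1 for i in s)
frequent_2 = [frozenset(p) for p in combinations(f1_items, 2)
              if support(frozenset(p)) >= min_support]

for s in frequent_1 + frequent_2:
    print(sorted(s), f"{support(s):.0%}")
```

Pruning candidates to combinations of already-frequent items is exactly the Apriori property: no superset of an infrequent itemset can be frequent.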
Limitations:
● Computationally expensive: requires multiple database scans, so it
works well only for small datasets.
● Generates many candidate itemsets, increasing complexity.
Applications:
● Market Basket Analysis (e.g., finding products frequently bought
together).
● Web usage mining (e.g., analyzing browsing behavior).
2. EQUIVALENCE CLASS CLUSTERING AND BOTTOM-UP LATTICE TRAVERSAL (ECLAT)
INTRODUCTION
● TID-list
List of transaction IDs where an item occurs.
● Minimum Support
Threshold to decide if an itemset is frequent.
2. ECLAT ALGORITHM
NUMERICAL EXAMPLE
Dataset (Transactions)
TRANSACTION ID    ITEMS PURCHASED
T1 Milk, Bread
T2 Milk, Butter
T3 Bread, Butter
T4 Milk, Bread
2. ECLAT ALGORITHM
Step 1: Create Vertical Data Format (TID-lists)
ITEM      TID-LIST
Milk      {T1, T2, T4}
Bread     {T1, T3, T4}
Butter    {T2, T3}
Step 2: Intersect TID-lists to Count Support
ITEMSET         SUPPORT
{Milk}          3
{Bread}         3
{Butter}        2
{Milk, Bread}   2
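The vertical format and the intersection step can be sketched as follows (a minimal Python illustration on the toy dataset; variable names are our own):

```python
# Horizontal data: transaction ID -> items purchased
transactions = {"T1": {"Milk", "Bread"}, "T2": {"Milk", "Butter"},
                "T3": {"Bread", "Butter"}, "T4": {"Milk", "Bread"}}

# Build the vertical layout: item -> set of transaction IDs containing it
tid_lists = {}
for tid, items in transactions.items():
    for item in items:
        tid_lists.setdefault(item, set()).add(tid)

print(sorted(tid_lists["Milk"]))   # ['T1', 'T2', 'T4']
print(sorted(tid_lists["Bread"]))  # ['T1', 'T3', 'T4']

# Support of {Milk, Bread} = size of the TID-list intersection
both = tid_lists["Milk"] & tid_lists["Bread"]
print(len(both))  # 2
```

Each deeper itemset only needs one set intersection, which is why ECLAT avoids repeated full database scans.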
2. ECLAT ALGORITHM
Advantages:
● Fast for dense datasets.
● Uses vertical format, making intersections efficient.
● Requires fewer scans of the database.
● Can handle long frequent patterns effectively.
Limitations:
● Not suitable for sparse datasets (many empty intersections).
● Memory intensive when TID-lists are very large.
● Performs poorly when itemsets are very long in high-dimensional data.
Applications:
● Market Basket Analysis — to find products frequently bought together.
● Web usage mining — to discover frequently visited pages.
● Bioinformatics — to detect frequent patterns in gene sequences.
● Recommendation Systems — to suggest frequently grouped items.
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
INTRODUCTION
Dataset (Transactions)
TRANSACTION ID    ITEMS PURCHASED
T1 Milk, Bread
T2 Milk, Butter
T3 Bread, Butter
T4 Milk, Bread
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 1: Count Frequency of Items
● Count how many times each item appears:
ITEM      COUNT
Milk      3
Bread     3
Butter    2
Step 2: Reorder Items in Each Transaction by Descending Frequency
T1 Bread, Milk
T2 Milk, Butter
T3 Bread, Butter
T4 Bread, Milk
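The counting and reordering steps can be sketched in a few lines of Python (a minimal illustration on the toy dataset; ties are broken alphabetically, which is our own assumption):

```python
from collections import Counter

transactions = [["Milk", "Bread"], ["Milk", "Butter"],
                ["Bread", "Butter"], ["Milk", "Bread"]]

# Step 1: count item frequencies across all transactions
counts = Counter(item for t in transactions for item in t)

# Step 2: reorder each transaction by descending frequency (ties alphabetical)
ordered = [sorted(t, key=lambda i: (-counts[i], i)) for t in transactions]
for row in ordered:
    print(row)
```

The consistent ordering is what lets common prefixes (like Bread, Milk in T1 and T4) share a single path in the FP-tree.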
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 3: Build FP-Tree
[FP-Tree diagram: Root → Bread (3) and Root → Milk (1); inserting the ordered transactions gives Bread (3) → Milk (2), Bread (3) → Butter (1), and Milk (1) → Butter (1)]
Step 4: Mine Frequent Itemsets (minimum support = 2)
ITEMSET         SUPPORT
{Bread}         3
{Milk}          3
{Butter}        2
{Bread, Milk}   2
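The tree construction can be sketched with a small node class (a minimal illustration, not a full FP-Growth implementation; the class and function names are our own):

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}  # item -> FPNode

def build_fp_tree(ordered_transactions):
    """Insert each frequency-ordered transaction as a path from the root,
    incrementing counts along shared prefixes."""
    root = FPNode(None)
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1
            node = child
    return root

# Transactions already reordered by descending item frequency (Step 2)
ordered = [["Bread", "Milk"], ["Milk", "Butter"],
           ["Bread", "Butter"], ["Bread", "Milk"]]
root = build_fp_tree(ordered)
print({item: n.count for item, n in root.children.items()})  # {'Bread': 3, 'Milk': 1}
```

A full implementation would also keep header-table links per item and mine the tree recursively via conditional pattern bases; this sketch shows only the compression step.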
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Step 5 : Generate Association Rules
From frequent itemsets:
1. Bread → Milk
● Support = 2/4 = 50%
● Confidence = 2/3 ≈ 66.7%
2. Milk → Bread
● Support = 2/4 = 50%
● Confidence = 2/3 ≈ 66.7%
Final Output:
Frequent Itemsets:
● {Bread}, {Milk}, {Butter}, {Bread, Milk}
Association Rules:
● Bread → Milk (Confidence: 66.7%)
● Milk → Bread (Confidence: 66.7%)
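The support and confidence figures above can be reproduced with a short sketch (minimal Python on the toy dataset; helper names are our own):

```python
transactions = [{"Milk", "Bread"}, {"Milk", "Butter"},
                {"Bread", "Butter"}, {"Milk", "Bread"}]

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

print(f"Bread -> Milk: support={support({'Bread', 'Milk'}):.0%}, "
      f"confidence={confidence({'Bread'}, {'Milk'}):.1%}")
# Bread -> Milk: support=50%, confidence=66.7%
```

Both rules here share the same confidence only because Milk and Bread happen to have the same support (3/4); in general the two directions differ.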
3. FREQUENT PATTERN (FP) GROWTH ALGORITHM
Advantages:
● Faster than Apriori (No need to generate candidate sets).
● Efficient for large datasets (Uses a compact FP-Tree).
● Reduces database scans (Processes data in a compressed way).
Limitations:
● Tree Construction Overhead (Consumes memory for large datasets).
● Not scalable for extremely large databases.
Applications:
● Market Basket Analysis → Recommending products based on purchase
history.
● Fraud Detection → Identifying unusual transaction patterns.
● Web Usage Mining → Analyzing user behavior for recommendations.
● Bioinformatics → Identifying common gene sequences.
REFERENCES
● Han, J., Pei, J., & Kamber, M. (2011). Data Mining: Concepts and Techniques (3rd
ed.). Morgan Kaufmann.
● Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules.
Proceedings of the 20th International Conference on Very Large Data Bases
(VLDB), 487–499.
● Han, J., Pei, J., & Yin, Y. (2000). Mining frequent patterns without candidate
generation. ACM SIGMOD Record, 29(2), 1–12.
● Zaki, M. J., Parthasarathy, S., Ogihara, M., & Li, W. (1997). New algorithms for
fast discovery of association rules. KDD, 283–296.
● Manku, G. S., & Motwani, R. (2002). Approximate frequency counts over data
streams. Proceedings of the 28th International Conference on Very Large Data
Bases (VLDB), 346–357.
● https://www.geeksforgeeks.org/
Thank You!