A Fuzzy Associative Rule-based Approach for Pattern Mining and Pattern-based Classification

Ashish Mangalampalli
Advisor: Dr. Vikram Pudi
Centre for Data Engineering
International Institute of Information Technology (IIIT), Hyderabad
1
Outline
       Introduction
       Crisp and Fuzzy Associative Classification

       Pre-Processing and Mining
           Fuzzy Pre-Processing – FPrep
           Fuzzy ARM – FAR-Miner and FAR-HD

       Associative Classification – Our Approach
            FACISME – Fuzzy Adaptation of ACME (Maximum Entropy Associative Classifier)
           Simple and Effective Associative Classifier (SEAC)
           Fuzzy Simple and Effective Associative Classifier (FSEAC)

       Associative Classification – Applications
           Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
           Associative Classifier for Ad-targeting

       Conclusions
    2
Introduction
       Associative classification
           Mines huge amounts of data
           Integrates Association Rule Mining (ARM) with Classification

                         A = a, B = b, C = c → X = x

       Associative classifiers have several advantages
           Frequent itemsets capture dominant relationships between
            items/features
           Statistically significant associations make classification
            framework robust
           Low-frequency patterns (noise) are eliminated during ARM
           Rules are very transparent and easily understood
                Unlike the black-box-like approach used in popular classifiers such as
                 SVMs and Artificial Neural Networks

    3
Outline
       Introduction

       Crisp and Fuzzy Associative Classification
       Pre-Processing and Mining
           Fuzzy Pre-Processing – FPrep
           Fuzzy ARM – FAR-Miner and FAR-HD

       Associative Classification – Our Approach
           Simple and Effective Associative Classifier (SEAC)
           Fuzzy Simple and Effective Associative Classifier (FSEAC)

       Associative Classification – Applications
           Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
           Associative Classifier for Ad-targeting

       Conclusions

    4
Crisp Associative Classification
       Most associative classifiers are crisp
           Most real-life datasets contain binary and numerical attributes
           Use sharp partitioning
           Transform numerical attributes to binary ones, e.g. Income =
            [100K and above]

       Drawbacks of sharp partitioning
           Introduces uncertainty, especially at partition boundaries
           Small changes in intervals lead to misleading results
           Gives rise to polysemy and synonymy
           Intervals do not generally have clear semantics associated

       For example, sharp partitions for the attribute Income
           Up to 20K, 20K-100K, 100K and above
           Income = 50K would fit in the second partition
           But, so would Income = 99K
    5
Fuzzy Associative Classification
       Fuzzy logic
           Used to convert numerical attributes to fuzzy attributes
            (e.g. Income = High)
           Maintains integrity of information conveyed by numerical
            attributes
            Attribute values belong to partitions with some membership
             degree in the interval [0, 1] (see the membership-function sketch below)
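To make the membership idea concrete, here is a minimal sketch (not from the slides) of shoulder/triangular membership functions for an Income attribute; the Low/Medium/High breakpoints (20K, 60K, 100K) are illustrative assumptions loosely based on the sharp-partition example on the previous slide.

```python
def left_shoulder(x, a, b):
    """1 below a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def triangle(x, a, b, c):
    """0 outside [a, c], peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def income_memberships(income):
    # Breakpoints are assumed for illustration only.
    return {
        "Low":    left_shoulder(income, 20_000, 60_000),
        "Medium": triangle(income, 20_000, 60_000, 100_000),
        "High":   1.0 - left_shoulder(income, 60_000, 100_000),
    }

# Income = 99K is almost fully "High" and only marginally "Medium",
# unlike the sharp 20K-100K partition, where 50K and 99K are treated identically.
print(income_memberships(99_000))  # {'Low': 0.0, 'Medium': 0.025, 'High': 0.975}
```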




    6
Outline
       Introduction

       Crisp and Fuzzy Associative Classification

       Pre-Processing and Mining
           Fuzzy Pre-Processing – FPrep
           Fuzzy ARM – FAR-Miner and FAR-HD

       Associative Classification – Our Approach
           Simple and Effective Associative Classifier (SEAC)
           Fuzzy Simple and Effective Associative Classifier (FSEAC)

       Associative Classification – Applications
           Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
           Associative Classifier for Ad-targeting

       Conclusions

    7
Pre-Processing and Mining
       Fuzzy pre-processing
           Convert crisp dataset (binary and numerical attributes)
            into fuzzy dataset (binary and fuzzy attributes)
            FPrep algorithm (fuzzy-clustering-driven) used – see the sketch below

       Efficient and robust Fuzzy ARM algorithms
           Web-scale datasets mandate such algorithms
            Fuzzy Apriori is the most popular
           Many efficient crisp ARM algorithms exist like ARMOR
            and FP-Growth
           Algorithms used
               FAR-Miner for normal transactional datasets
               FAR-HD for high dimensional datasets
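FPrep itself is fuzzy-clustering-driven; the toy one-dimensional fuzzy c-means below (plain NumPy, not the actual FPrep algorithm) sketches how membership degrees to k partitions can be derived automatically from a numerical column.

```python
import numpy as np

def fuzzy_cmeans_1d(values, k=3, m=2.0, iters=100, seed=0):
    """Toy 1-D fuzzy c-means: returns k cluster centres and a membership
    matrix U of shape (n, k) whose rows sum to 1."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).reshape(-1, 1)       # (n, 1)
    U = rng.random((len(x), k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ x) / W.sum(axis=0).reshape(-1, 1)   # (k, 1) weighted means
        d = np.abs(x - centres.T) + 1e-12                    # (n, k) distances
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)             # membership update
    return centres.ravel(), U

# Each numerical attribute becomes k fuzzy attributes; a record's membership
# row gives the degrees with which it belongs to each partition.
incomes = [12_000, 18_000, 45_000, 52_000, 95_000, 120_000]
centres, U = fuzzy_cmeans_1d(incomes)
```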

    8
Outline
    Introduction

    Crisp and Fuzzy Associative Classification

    Pre-Processing and Mining
        Fuzzy Pre-Processing – FPrep
        Fuzzy ARM – FAR-Miner and FAR-HD


    Associative Classification – Our Approach
        Simple and Effective Associative Classifier (SEAC)
        Fuzzy Simple and Effective Associative Classifier (FSEAC)

    Associative Classification – Applications
        Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
        Associative Classifier for Ad-targeting


    Conclusions



    13
Associative Classification – Our
Approach
    AC algorithms like CPAR and CMAR only mine frequent
     itemsets
        These itemsets are then processed using additional (greedy) algorithms like FOIL and PRM
        This adds running-time overhead and makes the overall process more complex

    Association rules directly used for training and scoring
        Exhaustive approach
            Controlled by appropriate support
            Not a time-intensive process
        Rule pruning and ranking take care of huge volume and
         redundancy

    Classifier built in a two-phased manner
        Global rule-mining and training
        Local rule-mining and training
        Provides better accuracy and representation/coverage
    14
Associative Classification – Our
Approach (cont’d)
    Pre-processing to generate fuzzy dataset (for fuzzy
     associative classifiers) using FPrep

    Classification Association Rules (CARs) mining using
     FAR-Miner or FAR-HD

    CARs pruning and classifier training using SEAC or
     FSEAC

    Rule ranking and application (scoring) techniques (a pipeline sketch follows below)
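Read end to end, the steps above form a four-stage training pipeline. The outline below is only a hedged sketch: fprep, mine_cars and prune_and_rank are hypothetical stand-ins for FPrep, FAR-Miner/FAR-HD and the SEAC/FSEAC pruning step, not actual APIs from the thesis.

```python
def train_associative_classifier(crisp_records, labels,
                                 min_support=0.05, top_k=50):
    """Hedged outline of the training pipeline on this slide; the helper
    names are hypothetical stand-ins for the thesis algorithms."""
    fuzzy_records = fprep(crisp_records)                   # 1. binary + fuzzy attributes
    cars = mine_cars(fuzzy_records, labels, min_support)   # 2. Classification Association Rules
    ranked = prune_and_rank(cars, fuzzy_records, labels)   # 3. pruning + rule ranking
    return ranked[:top_k]                                  # 4. ruleset used for scoring
```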

    15
Simple and Effective Associative
Classifier (SEAC)
   Direct mining of CARs –
    faster and simpler training

   CARs used directly through
    effective pruning and sorting

   Pruning and rule-ranking
    based on
        Information gain
         Rule length (see the ranking sketch below)

   Two-phased manner
        Global rule-mining and training
        Local rule-mining and training
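A minimal sketch of information-gain-based pruning and ranking of CARs, assuming each CAR is an (antecedent, class) pair and records are sets of attribute=value items; this is one common IG formulation and one tie-break direction, and the exact SEAC definitions may differ.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class CAR:                      # hypothetical container for a Classification Association Rule
    antecedent: frozenset       # e.g. frozenset({("B", 2), ("C", 2)})
    cls: object                 # predicted class label

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def info_gain(matches, labels):
    """IG of splitting the training set on 'antecedent matches / does not match'."""
    covered = [y for hit, y in zip(matches, labels) if hit]
    rest = [y for hit, y in zip(matches, labels) if not hit]
    n = len(labels)
    remainder = (len(covered) / n) * entropy(covered) + (len(rest) / n) * entropy(rest)
    return entropy(labels) - remainder

def prune_and_rank(cars, records, labels, min_gain=0.0):
    """Drop rules with no gain, then rank by IG; shorter antecedents first is an assumed tie-break."""
    def key(rule):
        matches = [rule.antecedent <= rec for rec in records]
        return (info_gain(matches, labels), -len(rule.antecedent))
    kept = [r for r in cars if key(r)[0] > min_gain]
    return sorted(kept, key=key, reverse=True)
```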


    16
SEAC - Example

[Slide shows an example dataset and the mined ruleset.]

Scoring example – unlabeled record: B=2, C=2
    X=1 → 16, 17, 19 (IG = 0.534)
    X=2 → 13, 14, 20 (IG = 0.657)

A minimal scoring sketch follows below.
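This sketch scores the unlabeled record using the per-class IG figures from the slide as stand-in rule weights; summing the weights of matched rules per class is an assumption, since the slide only shows the resulting scores.

```python
# (antecedent items, predicted class, weight); the weights reuse the slide's
# per-class IG figures as stand-ins for the individual rules.
ruleset = [
    (frozenset({("B", 2), ("C", 2)}), ("X", 1), 0.534),
    (frozenset({("B", 2), ("C", 2)}), ("X", 2), 0.657),
]

def classify(record, ruleset):
    scores = {}
    for antecedent, cls, weight in ruleset:
        if antecedent <= record:                       # rule antecedent satisfied
            scores[cls] = scores.get(cls, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

print(classify({("B", 2), ("C", 2)}, ruleset))         # ('X', 2), the higher-scoring class
```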
   17
Fuzzy Simple and Effective Associative
Classifier (FSEAC)
    Amalgamates Fuzzy Logic with Associative Classification

    Pre-processed using FPrep

    CARs mined using FAR-Miner / FAR-HD

    CARs pruned based on Fuzzy Information Gain (FIG)
     and rule length - no sorting required

    Scoring – rules applied taking µ into account (see the sketch below)
        Sorting is done at this stage
        Final score computed
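A hedged sketch of µ-aware scoring: a fuzzy record maps each fuzzy item to its membership, a rule fires to the degree given by the minimum membership of its antecedent items (the choice of t-norm is an assumption), and that degree weights the rule's FIG before the per-class scores are compared. The record and rules below are toy values for illustration.

```python
def match_degree(antecedent, fuzzy_record):
    """Degree (µ) to which a rule antecedent holds; min t-norm assumed."""
    return min(fuzzy_record.get(item, 0.0) for item in antecedent)

def fseac_score(fuzzy_record, rules):
    """rules: iterable of (antecedent items, class, FIG). Summing µ-weighted
    FIG per class is an illustrative aggregation, not the exact FSEAC one."""
    scores = {}
    for antecedent, cls, fig in rules:
        mu = match_degree(antecedent, fuzzy_record)
        if mu > 0.0:
            scores[cls] = scores.get(cls, 0.0) + mu * fig
    return max(scores, key=scores.get) if scores else None

# A fuzzy record maps each fuzzy item to its membership in [0, 1].
record = {("Income", "High"): 0.975, ("Income", "Medium"): 0.025}
rules = [((("Income", "High"),), ("Buys", "Yes"), 0.6),
         ((("Income", "Medium"),), ("Buys", "No"), 0.7)]
print(fseac_score(record, rules))   # ('Buys', 'Yes'): 0.975*0.6 > 0.025*0.7
```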



    18
FSEAC - Example

[Slide shows the example dataset, the format for its fuzzy version, and the fuzzy version of the example dataset.]
19
FSEAC – Example (cont’d)

[Slide shows the mined ruleset.]
20
SEAC and FSEAC Experimental Setup
    SEAC
        12 classifiers (Associative and non-associative)
        14 UCI ML datasets
        100-5000 records per dataset
        2-10 classes per dataset
        Up to 20 features per dataset
        10-fold Cross Validation

    FSEAC
        17 classifiers (Associative and non-associative; fuzzy and crisp)
        23 UCI ML datasets
        100-5000 records per dataset
        2-10 classes per dataset
        Up to 60 features per dataset
        10-fold Cross Validation

    21
SEAC – Results (10-fold CV)

[Accuracy comparison table shown on this slide; continued on the next slide.]
22
SEAC - Results (10-fold CV, cont’d)

[Accuracy comparison table (continued).]
23
FSEAC - Results (10-fold CV)

[Accuracy comparison table shown on this slide; continued on the next slide.]
24
FSEAC - Results (10-fold CV, cont’d)

[Accuracy comparison table (continued).]
25
Outline
    Introduction

    Crisp and Fuzzy Associative Classification

    Pre-Processing and Mining
        Fuzzy Pre-Processing – FPrep
        Fuzzy ARM – FAR-Miner and FAR-HD

    Associative Classification – Our Approach
        Simple and Effective Associative Classifier (SEAC)
        Fuzzy Simple and Effective Associative Classifier (FSEAC)


    Associative Classification – Applications
        Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
        Associative Classifier for Ad-targeting

    Conclusions

    26
Efficient Fuzzy Associative Classifier for
Object Classes in Images (I-FAC)
    Adapts fuzzy associative classification for Object Class
     Detection in images
         Speeded-Up Robust Features (SURF) - interest point detector
          and descriptor for images
         Fuzzy clusters used as opposed to hard clustering used in Bag-
          of-words

    Only positive class (CP) examples used for mining
          Negative class (CN) in object class detection is very vague
              CN = U – CP, where U is the universe of all images


        Rules are pruned and ranked based on Information Gain
         Other AC algorithms use third-party algorithms for rule-
          generation from frequent itemsets
         Top k rules are used for scoring and classification

    27                                         ICPR 2010
I-FAC
    SURF points extracted from positive class images
        FCM applied to derive clusters
        Clusters (with µs) used to generate dataset for mining
            100 fuzzy clusters, as opposed to the 1000-2000 crisp clusters used by hard-clustering-based approaches (see the sketch at the end of this slide)




    ARM generates Classification Association Rules (CARs)
     associated with positive class

    CARs are pruned and sorted using
        Fuzzy Information Gain (FIG) of each rule
        Length of each rule, i.e. the number of attributes in the rule

    Scoring based on rule-match and FIG
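A sketch of the dataset-generation step, assuming the SURF descriptors have already been extracted and the roughly 100 fuzzy cluster centres have been learnt with FCM on positive-class images; the soft cluster assignment and the max-pooling over interest points are illustrative choices, not necessarily the exact I-FAC formulation.

```python
import numpy as np

def cluster_memberships(descriptor, centres, m=2.0):
    """FCM-style membership of one SURF descriptor to each fuzzy visual-word
    cluster (soft assignment, instead of the single nearest word used by BoW)."""
    d = np.linalg.norm(centres - descriptor, axis=1) + 1e-12    # (k,)
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def image_to_fuzzy_transaction(descriptors, centres):
    """One image -> one fuzzy transaction over the ~100 clusters; max-pooling
    over the image's interest points is an assumed aggregation."""
    M = np.array([cluster_memberships(d, centres) for d in descriptors])  # (points, k)
    return M.max(axis=0)

# descriptors: (n_points, 64) SURF array per image; centres: (100, 64) from FCM.
# The resulting fuzzy transactions feed the fuzzy ARM step that mines positive-class CARs.
```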
    28                                                ICPR 2010
I-FAC - Performance Study
   Performs well when compared to BoW and SVM baselines
        Especially well at low FPRs (≤ 0.3)


   Fuzzy nature helps avoid
    polysemy and synonymy

   Uses only positive class
    for training



    30                                 ICPR 2010
Visual Concept Detection on MIR Flickr
    Revamped version of I-FAC

    Multi-class detection
        38 visual concepts
        e.g. car, sky, clouds, water, building, sea, face

    Experimental evaluation
        First 10K images of the MIR Flickr dataset
        AUC values for each concept




    31
Experimental Results (3-fold CV)

[Per-concept AUC table shown on this slide; continued on the next slide.]
32
Experimental Results (3-fold CV, cont’d)

[Per-concept AUC table (continued).]
33
Look-alike Modeling using Feature-Pair-
based Associative Classification
    Display-ad targeting is currently done using methods that rely
     on publisher-defined segments, such as Behavioral Targeting (BT)

    Look-alike model trained to identify similar users
        Similarity is based on historical user behavior
        Model iteratively rebuilt as more users are added
        Advertiser supplies seed list of users

    Approach for building advertiser specific audience segments
        Complements publisher defined segments such as BT
        Provides advertisers control over the audience definition

    Given a list of target users (e.g., people who clicked or
     converted on a particular category or ad campaign), find other
     similar users.

    34                                         WWW 2011
Look-alike Modeling using Feature-Pair-
based Associative Classification – cont’d
    Enumerate all feature-pairs in training set occurring in at
     least 5 positive-class records
        Feature-pairs modelled as AC rules
        Only rules for positive class used
        Works well in Tail Campaigns


    Affinity measured by Frequency-weighted LLR (F-LLR)
        F-LLR(f) = P(f) · log( P(f | conv) / P(f | non-conv) )
        Rules sorted in descending order of F-LLR


    Scoring – top k rules are applied (see the sketch below)
        Cumulative score from all applied rules used for classification
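A minimal sketch of the feature-pair rule construction and scoring, following the F-LLR formula above; the Laplace-style smoothing term and the top-k default are assumptions added only to keep the toy code well-defined.

```python
import math
from collections import Counter
from itertools import combinations

def feature_pair_fllr(records, converted, min_pos=5, smooth=1e-6):
    """F-LLR(f) = P(f) * log(P(f | conv) / P(f | non-conv)) for every feature
    pair seen in at least min_pos positive-class (converted) records."""
    n, n_pos = len(records), sum(converted)
    n_neg = n - n_pos
    total, pos = Counter(), Counter()
    for rec, is_conv in zip(records, converted):
        for pair in combinations(sorted(set(rec)), 2):
            total[pair] += 1
            if is_conv:
                pos[pair] += 1
    fllr = {}
    for pair, count in total.items():
        if pos[pair] < min_pos:
            continue                                   # keep positive-class rules only
        p_f = count / n
        p_conv = pos[pair] / n_pos
        p_nonconv = (count - pos[pair] + smooth) / (n_neg + smooth)
        fllr[pair] = p_f * math.log(p_conv / p_nonconv)
    return sorted(fllr.items(), key=lambda kv: kv[1], reverse=True)

def score_user(user_features, ranked_rules, top_k=100):
    """Cumulative F-LLR of the top-k rules whose feature pair the user has."""
    feats = set(user_features)
    return sum(w for (a, b), w in ranked_rules[:top_k] if a in feats and b in feats)
```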

    35                                        WWW 2011
Performance Study
    Two pilot campaigns
        300K records each
        One record per user
        Training window - 14 days
        Scoring window - 7 days

    Results on a Tail Campaign
        Baseline             Lift (Conversion Rate)    Lift (AUC)
        Random Targeting     82%                       –
        Linear SVM           301%                      11%
        GBDT                 100%                      2%

    Results on a Head Campaign
        Baseline             Lift (Conversion Rate)    Lift (AUC)
        Random Targeting     48%                       –
        Linear SVM           -12%                      -6%
        GBDT                 -40%                      -14%

    Works very well for Tail Campaigns
        Can find meaningful associations in extremely sparse and skewed data
    SVM and GBDT work well for Head Campaigns
    36                                             WWW 2011
Outline
    Introduction

    Crisp and Fuzzy Associative Classification

    Pre-Processing and Mining
        Fuzzy Pre-Processing – FPrep
        Fuzzy ARM – FAR-Miner and FAR-HD

    Associative Classification – Our Approach
        Simple and Effective Associative Classifier (SEAC)
        Fuzzy Simple and Effective Associative Classifier (FSEAC)
    Associative Classification – Applications
        Efficient Fuzzy Associative Classifier for Object Classes in Images (I-FAC)
        Associative Classifier for Ad-targeting


    Conclusions

    37
Conclusions
    Fuzzy pre-processing for dataset transformation

    Fuzzy ARM for various types of datasets

    Fuzzy and Crisp Associative Classifiers for various
     domains
        Customizations required for different domains
            Pre-processing
            Pruning
            Rule ranking techniques
            Rule application (scoring) techniques


    38
References
    Ashish Mangalampalli, Adwait Ratnaparkhi, Andrew O. Hatch, Abraham Bagherjeiran,
     Rajesh Parekh, and Vikram Pudi. A Feature-Pair-based Associative Classification
     Approach to Look-alike Modeling for Conversion-Oriented User-Targeting in Tail
     Campaigns. In International World Wide Web Conference (WWW), 2011.

    Ashish Mangalampalli, Vineet Chaoji, and Subhajit Sanyal. I-FAC: Efficient fuzzy
     associative classifier for object classes in images. In International Conference on
     Pattern Recognition (ICPR), 2010.

    Ashish Mangalampalli and Vikram Pudi. FPrep: Fuzzy clustering driven efficient
     automated pre-processing for fuzzy association rule mining. In IEEE International
     Conference on Fuzzy Systems (FUZZ-IEEE), 2010.

    Ashish Mangalampalli and Vikram Pudi. FACISME: Fuzzy associative classification
     using iterative scaling and maximum entropy. In IEEE International Conference on
     Fuzzy Systems (FUZZ-IEEE), 2010.

    Ashish Mangalampalli and Vikram Pudi. Fuzzy Association Rule Mining Algorithm for
     Fast and Efficient Performance on Very Large Datasets. In IEEE International
     Conference on Fuzzy Systems (FUZZ-IEEE), 2009.



    39
Thank You, and Questions




40
