
Basic Concepts and Algorithms

Preliminaries (Association Analysis – Module II)

🔷 1. What is Association Rule Mining?


Association Rule Mining is a data mining technique used to identify relationships
between variables in large datasets. These relationships are presented in the form of
"if-then" rules.

Example:

🛒 In a supermarket, you may find:


●​ If a customer buys milk, then they also buy bread.​

This is expressed as:

Milk ⇒ Bread

🔷 2. Key Terminologies in Association Mining


Let’s break down the fundamental terms used in association rule mining, with practical examples.

🔹 2.1 Item
An item is a single object, product, or attribute being analyzed.

●​ Example: "milk", "bread", "butter"​

🔹 2.2 Itemset
A collection of one or more items.

●​ 1-itemset: {milk}​
●​ 2-itemset: {milk, bread}​

●​ k-itemset: itemset of k items​

🔹 2.3 Transaction
A set of items bought together at the same time. Stored in a transactional database.

●​ Example: A shopping cart with {milk, bread, eggs}​

🔹 2.4 Transaction ID (TID)


A unique identifier for each transaction.

TID Items

1 {milk, bread}

2 {milk, bread, butter}

3 {bread, butter}

🔹 2.5 Support Count (σ)


The number of transactions containing an itemset.

Example:

●​ {milk, bread} appears in 2 transactions → support count = 2​

🔹 2.6 Support
The fraction or percentage of transactions that contain the itemset.

Support(A) = (Number of transactions containing A) / (Total number of transactions)

● Support({milk, bread}) = 2/3 ≈ 66.7%

🔹 2.7 Confidence
The conditional probability that a transaction containing itemset A also contains itemset B.

Confidence(A ⇒ B) = Support(A ∪ B) / Support(A) = P(B | A)

● If:

○ Support({milk, bread}) = 2/3

○ Support({milk}) = 2/3

● Then:

○ Confidence(milk ⇒ bread) = (2/3) ÷ (2/3) = 1 (100%)
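As a quick check of these two definitions, here is a minimal Python sketch (not part of the original notes; the function and variable names are illustrative) that computes support and confidence over the three transactions in the TID table above:

```python
# The three transactions from the TID table above
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)  # subset test
    return count / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(A ⇒ B) = Support(A ∪ B) / Support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"milk", "bread"}, transactions))       # 0.666... (2 of 3 transactions)
print(confidence({"milk"}, {"bread"}, transactions))  # 1.0 (100%)
```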

🔹 2.8 Frequent Itemset


An itemset whose support is greater than or equal to a user-defined minimum support
threshold.

Example:

●​ If minimum support = 2/3​

●​ {milk, bread} is frequent​

🔹 2.9 Association Rule


An implication expression of the form:

A ⇒ B

Where:

●​ A and B are itemsets​


●​ A ∩ B = ∅​

Represents a relationship such that when A is bought, B is also likely to be bought.

🔹 2.10 Interestingness Measures


These measures determine which rules are “interesting” or useful.

Measure Use

Support Indicates how common the rule is

Confidence Indicates how strong the rule is

Lift Indicates the correlation between items
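The notes above do not give a formula for lift; the standard definition, written in the same notation as the support and confidence formulas, is:

Lift(A ⇒ B) = Support(A ∪ B) / (Support(A) × Support(B)) = Confidence(A ⇒ B) / Support(B)

A lift greater than 1 suggests A and B occur together more often than expected under independence, a lift of 1 suggests independence, and a lift below 1 suggests a negative correlation.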

🔷 3. Problem Definition
Let’s formally define the task:

●​ I = set of items, e.g., {milk, bread, eggs, butter}​

●​ D = database of transactions, where each transaction T is a subset of I​

●​ A rule is an implication A ⇒ B, where A, B ⊆ I and A ∩ B = ∅​
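To make this notation concrete, here is a minimal Python sketch of one possible representation (the names are illustrative, not a prescribed format):

```python
# I: the set of all items
items = {"milk", "bread", "eggs", "butter"}

# D: the transaction database; each transaction T is a subset of I
database = [
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]

# A candidate rule A ⇒ B with A, B ⊆ I and A ∩ B = ∅
A, B = {"milk"}, {"bread"}
assert A <= items and B <= items and not (A & B)
```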

🔷 4. The Two-Step Association Rule Mining Process


Association rule mining can be broken into two major tasks:

🔸 Step 1: Find all Frequent Itemsets


●​ Find itemsets that occur frequently in D​

●​ Must satisfy minimum support​

🔸 Step 2: Generate Strong Association Rules


●​ Use the frequent itemsets from Step 1​
●​ Generate rules A ⇒ B​

●​ Each rule must satisfy:​

○​ Minimum Support​

○​ Minimum Confidence​
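A minimal Python sketch of Step 2, assuming the frequent itemsets and their supports from Step 1 are already available (the itemsets and values below are illustrative):

```python
from itertools import combinations

# Illustrative output of Step 1: frequent itemsets mapped to their supports
frequent = {
    frozenset({"milk"}): 0.6,
    frozenset({"bread"}): 0.8,
    frozenset({"milk", "bread"}): 0.4,
}

def generate_rules(frequent, min_confidence=0.6):
    """Yield strong rules A ⇒ B from each frequent itemset of size >= 2."""
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is frequent, so its support is known
                conf = supp / frequent[antecedent]  # Support(A ∪ B) / Support(A)
                if conf >= min_confidence:          # support is already >= min support
                    yield antecedent, consequent, supp, conf

for A, B, supp, conf in generate_rules(frequent):
    print(set(A), "⇒", set(B), f"(support={supp:.2f}, confidence={conf:.2f})")
```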

🔷 5. The Apriori Principle – Foundation for Algorithms


🔸 Definition:
If an itemset is not frequent, then none of its supersets can be frequent either.

This principle allows the algorithm to prune the search space and avoid computing all
combinations.

🔸 Example:
If {milk, butter} is not frequent, then:

●​ {milk, butter, bread}​

●​ {milk, butter, eggs} … and so on, are also not frequent​
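This pruning test can be written directly in code; a small Python sketch (the names are illustrative):

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_k_minus_1):
    """True if some (k-1)-subset of `candidate` is not frequent.

    By the Apriori principle such a candidate cannot be frequent,
    so it can be pruned without counting its support."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_k_minus_1
               for s in combinations(candidate, k - 1))

# {milk, butter} is not frequent, so {milk, butter, bread} is pruned
frequent_2 = {frozenset({"milk", "bread"}), frozenset({"bread", "butter"})}
print(has_infrequent_subset({"milk", "butter", "bread"}, frequent_2))  # True
```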

🔷 6. Algorithms for Frequent Itemset Generation


🔹 6.1 Naive Algorithm (Inefficient)
●​ Generate all item combinations​

●​ Count their support​

●​ Retain those above minimum support​

🛑 Problem: Computationally expensive and slow for large datasets, because d distinct items give 2^d - 1 possible itemsets to enumerate and count.

🔹 6.2 Apriori Algorithm (Efficient)


●​ Developed by Agrawal and Srikant​

●​ Uses level-wise search and pruning with the Apriori principle​

Steps:

1.​ Generate frequent 1-itemsets (L1)​

2.​ Generate candidate 2-itemsets (C2) from L1​

3.​ Prune C2 using Apriori principle​

4.​ Calculate support and select frequent 2-itemsets (L2)​

5.​ Repeat for L3, L4,… until no more frequent itemsets​

🔁 Each iteration performs a Join step (combining frequent (k-1)-itemsets to form candidate k-itemsets) and a Prune step (discarding candidates that have an infrequent subset).
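A compact Python sketch of this level-wise search, assuming transactions are sets of items and min_support is a fraction (an illustration of the steps above, not Agrawal and Srikant's original pseudocode):

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Return a dict mapping each frequent itemset to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: frequent 1-itemsets (L1)
    items = {item for t in transactions for item in t}
    current = {frozenset({i}) for i in items if support(frozenset({i})) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets (Ck)
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset (Apriori principle)
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count support and keep the frequent k-itemsets (Lk)
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Example with the three transactions from Section 2.4
print(apriori([{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"}],
              min_support=2 / 3))
```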

🔷 7. Applications of Association Rule Mining


Domain Application Example

Retail Basket analysis: {diaper} ⇒ {beer}

E-commerce Recommender systems (e.g., Amazon)

Banking Fraud detection based on transaction patterns

Education Predicting dropout or failure

Medicine Diagnosis patterns: {fever, cough} ⇒ {flu}

Social Media Content suggestion, trend analysis

🔷 8. Real Example – Market Basket Data


Assume the following 5 transactions:

TID Items

1 {milk, bread, butter}


2 {milk, bread}

3 {bread, butter}

4 {milk, butter}

5 {bread}

Let min support = 0.4 (2 transactions)

●​ Frequent 1-itemsets: {milk}, {bread}, {butter}​

●​ Frequent 2-itemsets: {milk, bread}, {milk, butter}, {bread, butter}​

●​ Rule: milk ⇒ bread​

○​ Support = 2/5 = 0.4​

○​ Confidence = 2/3 ≈ 66.7%​
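These figures can be checked with a short brute-force count (essentially the naive enumeration from Section 6.1 applied to the five transactions above):

```python
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread"},
]
min_support = 0.4  # at least 2 of the 5 transactions

items = sorted({item for t in transactions for item in t})
for k in (1, 2):
    for itemset in map(set, combinations(items, k)):
        supp = sum(1 for t in transactions if itemset <= t) / len(transactions)
        if supp >= min_support:
            print(itemset, f"support = {supp:.1f}")
```

This reproduces the frequent 1-itemsets and 2-itemsets listed above, including {milk, butter}.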

🔷 9. Types of Association Rules


Type Description Example

Single-Dimensional Items from the same dimension buys(computer) ⇒ buys(antivirus)

Multi-Dimensional Items from different dimensions age(30-39) ∧ income(42k-48k) ⇒ buys(TV)

Boolean True/False presence of items buys(laptop) ⇒ buys(printer)
