
Basic Concepts and Algorithms

Preliminaries (Association Analysis – Module II)

🔷 1. What is Association Rule Mining?


Association Rule Mining is a data mining technique used to identify relationships
between variables in large datasets. These relationships are presented in the form of
"if-then" rules.

Example:

🛒 In a supermarket, you may find:


●​ If a customer buys milk, then they also buy bread.​

This is expressed as:

Milk ⇒ Bread

🔷 2. Key Terminologies in Association Mining


Let’s break down the fundamental terms used in association rule mining, with practical examples.

🔹 2.1 Item
An item is a single object, product, or attribute being analyzed.

●​ Example: "milk", "bread", "butter"​

🔹 2.2 Itemset
A collection of one or more items.

●​ 1-itemset: {milk}​
●​ 2-itemset: {milk, bread}​

●​ k-itemset: itemset of k items​

🔹 2.3 Transaction
A set of items bought together at the same time. Stored in a transactional database.

●​ Example: A shopping cart with {milk, bread, eggs}​

🔹 2.4 Transaction ID (TID)


A unique identifier for each transaction.

TID Items

1 {milk, bread}

2 {milk, bread, butter}

3 {bread, butter}

🔹 2.5 Support Count (σ)


The number of transactions containing an itemset.

Example:

●​ {milk, bread} appears in 2 transactions → support count = 2​

🔹 2.6 Support
The fraction or percentage of transactions that contain the itemset.

Support(A) = (Number of transactions containing A) / (Total number of transactions)

● Support({milk, bread}) = 2/3 ≈ 66.7%

🔹 2.7 Confidence
The conditional probability that a transaction containing itemset A also contains itemset B.

Confidence(A ⇒ B) = Support(A ∪ B) / Support(A) = P(B | A)

● If:

○ Support({milk, bread}) = 2/3

○ Support({milk}) = 2/3

● Then:

○ Confidence(milk ⇒ bread) = (2/3) ÷ (2/3) = 1 (100%)
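As a quick check of these two definitions, here is a minimal Python sketch (not part of the original notes; the function and variable names are illustrative) that computes support and confidence over the three transactions in the TID table above:

```python
# The three transactions from the TID table above
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)  # subset test
    return count / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(A ⇒ B) = Support(A ∪ B) / Support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"milk", "bread"}, transactions))       # 0.666... (2 of 3 transactions)
print(confidence({"milk"}, {"bread"}, transactions))  # 1.0 (100%)
```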

🔹 2.8 Frequent Itemset


An itemset whose support is greater than or equal to a user-defined minimum support
threshold.

Example:

●​ If minimum support = 2/3​

●​ {milk, bread} is frequent​

🔹 2.9 Association Rule


An implication expression of the form:

A ⇒ B

Where:

●​ A and B are itemsets​


●​ A ∩ B = ∅​

Represents a relationship such that when A is bought, B is also likely to be bought.

🔹 2.10 Interestingness Measures


These measures determine which rules are “interesting” or useful.

Measure Use

Support Indicates how common the rule is

Confidence Indicates how strong the rule is

Lift Indicates the correlation between items
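The notes above do not give a formula for lift; the standard definition, written in the same notation as the support and confidence formulas, is:

Lift(A ⇒ B) = Support(A ∪ B) / (Support(A) × Support(B)) = Confidence(A ⇒ B) / Support(B)

A lift greater than 1 suggests A and B occur together more often than expected under independence, a lift of 1 suggests independence, and a lift below 1 suggests a negative correlation.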

🔷 3. Problem Definition
Let’s formally define the task:

●​ I = set of items, e.g., {milk, bread, eggs, butter}​

●​ D = database of transactions, where each transaction T is a subset of I​

●​ A rule is an implication A ⇒ B, where A, B ⊆ I and A ∩ B = ∅​
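To make this notation concrete, here is a minimal Python sketch of one possible representation (the names are illustrative, not a prescribed format):

```python
# I: the set of all items
items = {"milk", "bread", "eggs", "butter"}

# D: the transaction database; each transaction T is a subset of I
database = [
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]

# A candidate rule A ⇒ B with A, B ⊆ I and A ∩ B = ∅
A, B = {"milk"}, {"bread"}
assert A <= items and B <= items and not (A & B)
```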

🔷 4. The Two-Step Association Rule Mining Process


Association rule mining can be broken into two major tasks:

🔸 Step 1: Find all Frequent Itemsets


●​ Find itemsets that occur frequently in D​

●​ Must satisfy minimum support​

🔸 Step 2: Generate Strong Association Rules


●​ Use the frequent itemsets from Step 1​
●​ Generate rules A ⇒ B​

●​ Each rule must satisfy:​

○​ Minimum Support​

○​ Minimum Confidence​
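A minimal Python sketch of Step 2, assuming the frequent itemsets and their supports from Step 1 are already available (the itemsets and values below are illustrative):

```python
from itertools import combinations

# Illustrative output of Step 1: frequent itemsets mapped to their supports
frequent = {
    frozenset({"milk"}): 0.6,
    frozenset({"bread"}): 0.8,
    frozenset({"milk", "bread"}): 0.4,
}

def generate_rules(frequent, min_confidence=0.6):
    """Yield strong rules A ⇒ B from each frequent itemset of size >= 2."""
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is frequent, so its support is known
                conf = supp / frequent[antecedent]  # Support(A ∪ B) / Support(A)
                if conf >= min_confidence:          # support is already >= min support
                    yield antecedent, consequent, supp, conf

for A, B, supp, conf in generate_rules(frequent):
    print(set(A), "⇒", set(B), f"(support={supp:.2f}, confidence={conf:.2f})")
```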

🔷 5. The Apriori Principle – Foundation for Algorithms


🔸 Definition:
If an itemset is not frequent, then none of its supersets can be frequent either.

This principle allows the algorithm to prune the search space and avoid computing all
combinations.

🔸 Example:
If {milk, butter} is not frequent, then:

●​ {milk, butter, bread}​

●​ {milk, butter, eggs} … and so on, are also not frequent​
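This pruning test can be written directly in code; a small Python sketch (the names are illustrative):

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_k_minus_1):
    """True if some (k-1)-subset of `candidate` is not frequent.

    By the Apriori principle such a candidate cannot be frequent,
    so it can be pruned without counting its support."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_k_minus_1
               for s in combinations(candidate, k - 1))

# {milk, butter} is not frequent, so {milk, butter, bread} is pruned
frequent_2 = {frozenset({"milk", "bread"}), frozenset({"bread", "butter"})}
print(has_infrequent_subset({"milk", "butter", "bread"}, frequent_2))  # True
```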

🔷 6. Algorithms for Frequent Itemset Generation


🔹 6.1 Naive Algorithm (Inefficient)
●​ Generate all item combinations​

●​ Count their support​

●​ Retain those above minimum support​

🛑 Problem: Computationally expensive and slow for large datasets, because d distinct items give 2^d - 1 possible itemsets to enumerate and count.

🔹 6.2 Apriori Algorithm (Efficient)


●​ Developed by Agrawal and Srikant​

●​ Uses level-wise search and pruning with the Apriori principle​

Steps:

1.​ Generate frequent 1-itemsets (L1)​

2.​ Generate candidate 2-itemsets (C2) from L1​

3.​ Prune C2 using Apriori principle​

4.​ Calculate support and select frequent 2-itemsets (L2)​

5.​ Repeat for L3, L4,… until no more frequent itemsets​

🔁 Each iteration performs a Join step (combining frequent (k-1)-itemsets to form candidate k-itemsets) and a Prune step (discarding candidates that have an infrequent subset).
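A compact Python sketch of this level-wise search, assuming transactions are sets of items and min_support is a fraction (an illustration of the steps above, not Agrawal and Srikant's original pseudocode):

```python
from itertools import combinations

def apriori(transactions, min_support=0.5):
    """Return a dict mapping each frequent itemset to its support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Step 1: frequent 1-itemsets (L1)
    items = {item for t in transactions for item in t}
    current = {frozenset({i}) for i in items if support(frozenset({i})) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets (Ck)
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset (Apriori principle)
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count support and keep the frequent k-itemsets (Lk)
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Example with the three transactions from Section 2.4
print(apriori([{"milk", "bread"}, {"milk", "bread", "butter"}, {"bread", "butter"}],
              min_support=2 / 3))
```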

🔷 7. Applications of Association Rule Mining


Domain Application Example

Retail Basket analysis: {diaper} ⇒ {beer}

E-commerce Recommender systems (e.g., Amazon)

Banking Fraud detection based on transaction patterns

Education Predicting dropout or failure

Medicine Diagnosis patterns: {fever, cough} ⇒ {flu}

Social Media Content suggestion, trend analysis

🔷 8. Real Example – Market Basket Data


Assume the following 5 transactions:

TID Items

1 {milk, bread, butter}


2 {milk, bread}

3 {bread, butter}

4 {milk, butter}

5 {bread}

Let min support = 0.4 (2 transactions)

●​ Frequent 1-itemsets: {milk}, {bread}, {butter}​

●​ Frequent 2-itemsets: {milk, bread}, {milk, butter}, {bread, butter}​

●​ Rule: milk ⇒ bread​

○​ Support = 2/5 = 0.4​

○​ Confidence = 2/3 ≈ 66.7%​
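These figures can be checked with a short brute-force count (essentially the naive enumeration from Section 6.1 applied to the five transactions above):

```python
from itertools import combinations

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread"},
]
min_support = 0.4  # at least 2 of the 5 transactions

items = sorted({item for t in transactions for item in t})
for k in (1, 2):
    for itemset in map(set, combinations(items, k)):
        supp = sum(1 for t in transactions if itemset <= t) / len(transactions)
        if supp >= min_support:
            print(itemset, f"support = {supp:.1f}")
```

This reproduces the frequent 1-itemsets and 2-itemsets listed above, including {milk, butter}.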

🔷 9. Types of Association Rules


Type Description Example

Single-Dimensional Items from the same dimension buys(computer) ⇒ buys(antivirus)

Multi-Dimensional Items from different dimensions age(30-39) ∧ income(42k-48k) ⇒ buys(TV)

Boolean True/False presence of items buys(laptop) ⇒ buys(printer)
