ICS 2408 - Lecture 5 - Association

Mining Frequent Patterns, Association and

Correlations

 Basic concepts
 Efficient and scalable frequent itemset mining methods
 Constraint-based association mining



What Is Association Mining?
 Association rule mining:
 Finding frequent patterns, associations, correlations, or causal
structures among sets of items or objects in transaction databases,
relational databases, and other information repositories.
 Frequent pattern: pattern (set of items, sequence, etc.) that occurs
frequently in a database

 Motivation: finding regularities in data


 What products were often purchased together? — Beer and
diapers?!
 What are the subsequent purchases after buying a PC?
 What kinds of DNA are sensitive to this new drug?
 Can we automatically classify web documents?



Why Is Association Mining Important?
 Foundation for many essential data mining tasks
 Association, correlation, causality
 Sequential patterns, temporal or cyclic association, partial
periodicity, spatial and multimedia association
 Associative classification, cluster analysis, iceberg cube, fascicles
(semantic data compression)
 Discloses an intrinsic and important property of data sets
 Broad applications
 Basket data analysis, cross-marketing, catalog design, sale
campaign analysis
 Web log (click stream) analysis, DNA sequence analysis, etc.



Market Basket Analysis (MBA)
 Retail – each customer purchases a different set of products, in
different quantities, at different times
 MBA uses this information to:
 Identify who customers are (not by name)

 Understand why they make certain purchases

 Gain insight about its merchandise (products):

 Fast and slow movers

 Products which are purchased together

 Products which might benefit from promotion

 Take action:

 Store layouts

 Which products to put on specials, promote, coupons…

 Combining all of this with a customer loyalty card makes it even
more valuable
Transactional Data

Market basket example:


Basket1: {bread, cheese, milk}
Basket2: {apple, eggs, salt, yogurt}
…
Basketn: {biscuit, eggs, milk}
Definitions:
 An item: an article in a basket, or an attribute-value pair
 I: the set of all items sold in the store
 A transaction: items purchased in a basket; it may have TID
(transaction ID)
 A transactional dataset: A set of transactions
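
In code, such a transactional dataset is often just a collection of item sets keyed by TID; a minimal Python sketch (the items are illustrative):

# A toy transactional dataset: each TID maps to the set of items in that basket.
transactions = {
    "T1": {"bread", "cheese", "milk"},
    "T2": {"apple", "eggs", "salt", "yogurt"},
    "T3": {"biscuit", "eggs", "milk"},
}
# I: the set of all items appearing in the data.
I = set().union(*transactions.values())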
Itemsets and Association Rules
 An itemset is a set of items.
 E.g., {milk, bread, cereal} is an itemset.
 A k-itemset is an itemset with k items.
 Given a dataset D, an itemset X has a (frequency) count in
D
 An association rule is about a relationship between two
disjoint itemsets X and Y, written X → Y
 It presents the pattern: when X occurs, Y also occurs



Use of Association Rules
 Association rules do not represent any sort of causality or
correlation between the two itemsets.
 X → Y does not mean X causes Y, so no causality

 X → Y can be different from Y → X, unlike correlation

 Association rules assist in:


 Marketing

 Targeted advertising

 Floor planning

 Inventory control

 Churn management, etc.



Other Applications
 Market Basket Analysis: given a database of customer
transactions, where each transaction is a set of items the goal is to
find groups of items which are frequently purchased together.
 Telecommunication (each customer is a transaction containing the
set of phone calls)
 Credit Cards/ Banking Services (each card/account is a
transaction containing the set of customer’s payments)
 Medical Treatments (each patient is represented as a transaction
containing the ordered set of diseases)
 Fraud detection: Unusual combinations of insurance claims can be
a warning of fraud



Association Rule: Basic Concepts
 Given: (1) database of transactions,
(2) each transaction is a list of items (purchased by a
customer in a visit)
 Find: all rules that correlate the presence of one set of items with
that of another set of items
 E.g. 98% of people who purchase tires and auto accessories also
get automotive services done


 Applications
 Maintenance Agreement (What the store should do to boost
Maintenance Agreement sales)
 Home Electronics (What other products should the store stock
up?)
 Attached mailing in direct marketing



Rule Measures: Support and Confidence
 Itemset X = {x1, …, xk}
 Find all the rules X → Y with minimum support and confidence
 support, s: probability that a transaction contains X ∪ Y
 confidence, c: conditional probability that a transaction
having X also contains Y

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

(Figure: Venn diagram of customers who buy beer, customers who buy diapers,
and customers who buy both.)

Let supmin = 50%, confmin = 50%
Frequent Pattern: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
A → D (60%, 100%)
D → A (60%, 75%)



Support and Confidence
 Support count: The support count of an itemset X, denoted
by X.count, in a data set T is the number of transactions in T
that contain X. Assume T has n transactions.
 Then,
support = (X ∪ Y).count / n

confidence = (X ∪ Y).count / X.count
 Interesting association rules are (for now) those whose S and
C are greater than minSup and minConf
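
A minimal Python sketch of these two formulas (the helper names and the toy transactions are illustrative, not from the lecture; the data matches the A → C example that appears a few slides later):

def count(itemset, transactions):
    # Number of transactions in T that contain every item of itemset (X.count).
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    # support(X -> Y) = (X ∪ Y).count / n
    return count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    # confidence(X -> Y) = (X ∪ Y).count / X.count
    return count(X | Y, transactions) / count(X, transactions)

# Illustrative data (four transactions).
T = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
print(support({"A"}, {"C"}, T))     # 0.5
print(confidence({"A"}, {"C"}, T))  # 0.666...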
Support (utility)
 Usefulness of a rule can be measured with a minimum
support threshold
 This parameter lets us measure how many events have such
itemsets that match both sides of the implication in the
association rule
 Rules for events whose itemsets do not match both sides
sufficiently often (defined by a threshold value) can be
excluded



Confidence (certainty)
 Certainty of a rule can be measured with a threshold for
confidence
 This parameter lets us measure how often an event’s itemset
that matches the left side of the implication in the association
rule also matches the right side
 Rules for events whose itemsets match the left side but do not
match the right side sufficiently often (defined by a threshold
value) can be excluded



Example

Data set D:
TID    Itemset
T100   1, 3, 4
T200   2, 3, 5
T300   1, 2, 3, 5
T400   2, 5

Count, Support, Confidence (|D| = 4):
Count(1 → 3) = 2
Support(1 → 3) = 2/4 = 0.5
Support(3 → 2) = 2/4 = 0.5
Confidence(3 → 2) = 2/3 ≈ 0.67



Mining Association Rules: Example

Min. support 50%, Min. confidence 50%

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

Frequent pattern   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
support = support({A} ∪ {C}) = (2/4) × 100% = 50%
confidence = support({A} ∪ {C}) / support({A}) = ((2/4) / (3/4)) × 100% = 66.6%



Mining Association Rules: What We Need to Know
 Goal: Rules with high support/confidence
 How to compute?
 Support: Find sets of items that occur frequently

 Confidence: Find frequency of subsets of supported

itemsets
 If we have all frequently occurring sets of items (frequent
itemsets), we can compute support and confidence!



Mining Frequent Itemsets: the Key Step

 Find the frequent itemsets: the sets of items that have minimum support
 A subset of a frequent itemset must also be a frequent itemset
 i.e., if {A, B} is a frequent itemset, both {A} and {B} should
be frequent itemsets
 Iteratively find frequent itemsets with cardinality from 1 to k (k-
itemset)
 Use the frequent itemsets to generate association rules.



Scalable Methods for Mining Frequent Patterns

 The downward closure property of frequent patterns
 The Apriori principle: any subset of a frequent itemset must be frequent
 If {beer, diaper, nuts} is frequent, so is {beer, diaper}, i.e., every
transaction having {beer, diaper, nuts} also contains {beer, diaper}
 Scalable mining methods: Three major approaches
 Apriori algorithm

 Frequent pattern growth

 Vertical data format approach



Apriori: A Candidate Generation-and-test Approach

 Apriori pruning principle: If there is any itemset which is infrequent,
its superset should not be generated/tested!
 Method:
 Initially, scan DB once to get frequent 1-itemset
 Generate length (k+1) candidate itemsets from length k frequent
itemsets
 Test the candidates against DB
 Terminate when no frequent or candidate set can be generated



Apriori: A Candidate Generation-and-test Approach
 Join Step: Ck is generated by joining Lk-1with itself
 Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of
a frequent k-itemset
 Pseudo-code:
Ck: Candidate itemset of size k
Lk : frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
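
A runnable Python sketch of this loop (a simplified rendering: the names apriori and min_count are mine, and candidates come from a pairwise join followed by the prune step):

from itertools import combinations

def apriori(transactions, min_count):
    # Returns {frozenset(itemset): support count} for every frequent itemset.
    items = {i for t in transactions for i in t}
    Lk = {frozenset([i]): c for i in items
          if (c := sum(1 for t in transactions if i in t)) >= min_count}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Join step: candidates of size k+1 from pairs of frequent k-itemsets.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k))}
        # One scan of the database to count the surviving candidates.
        Lk = {c: n for c in candidates
              if (n := sum(1 for t in transactions if c <= t)) >= min_count}
        frequent.update(Lk)
        k += 1
    return frequent

TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(apriori(TDB, 2))   # includes frozenset({'B', 'C', 'E'}): 2

With the TDB and minimum support count of 2 used on the next slide, this reproduces the L1, L2 and L3 shown there.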
The Apriori Algorithm—An Example
Supmin = 2

Database TDB:
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1 (itemset : sup): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → C2 counts: {A, B}:1, {A, C}:2, {A, E}:1, {B, C}:2, {B, E}:3, {C, E}:2
L2: {A, C}:2, {B, C}:2, {B, E}:3, {C, E}:2

C3 (generated from L2): {B, C, E}
3rd scan → L3: {B, C, E}:2
Important Details of Apriori
 How to generate candidates?
 Step 1: self-joining Lk
 Step 2: pruning
 How to count supports of candidates?
 Example of candidate generation:
 L3={abc, abd, acd, ace, bcd}
 Self-joining: L3*L3
 abcd from abc and abd
 acde from acd and ace
 Pruning:
 acde is removed because ade is not in L3
 C4={abcd}
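
A small sketch of these join and prune steps on exactly this L3, assuming each itemset is stored as a sorted tuple so that the join condition (first k-1 items agree) is easy to state:

from itertools import combinations

def self_join(Lk):
    # Join step: merge two k-itemsets whose first k-1 items agree.
    out = set()
    for a in Lk:
        for b in Lk:
            if a < b and a[:-1] == b[:-1]:         # e.g. abc + abd -> abcd
                out.add(a[:-1] + (a[-1], b[-1]))
    return out

def prune(candidates, Lk):
    # Prune step: drop a candidate if any of its k-subsets is not frequent.
    Lk = set(Lk)
    return {c for c in candidates
            if all(s in Lk for s in combinations(c, len(c) - 1))}

L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
C4 = self_join(L3)     # {("a","b","c","d"), ("a","c","d","e")}
print(prune(C4, L3))   # {("a","b","c","d")}  (acde pruned: ade is not in L3)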
Step 2: Generating rules from frequent itemsets

 Frequent itemsets → association rules


 One more step is needed to generate association rules
 For each frequent itemset X,
For each proper nonempty subset A of X,
 Let B = X - A

 A → B is an association rule if

 Confidence(A → B) ≥ minconf,
support(A → B) = support(A ∪ B) = support(X)
confidence(A → B) = support(A ∪ B) / support(A)
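
A minimal sketch of this rule-generation step, assuming the frequent itemsets and their support counts (for instance, the output of the Apriori sketch earlier) are held in a dict keyed by frozenset:

from itertools import combinations

def gen_rules(freq_counts, n, minconf):
    # freq_counts: {frozenset(itemset): support count}; n: number of transactions.
    rules = []
    for X, x_count in freq_counts.items():
        if len(X) < 2:
            continue
        for r in range(1, len(X)):                  # proper non-empty subsets A
            for A in map(frozenset, combinations(X, r)):
                B = X - A
                conf = x_count / freq_counts[A]     # support(X) / support(A)
                if conf >= minconf:
                    rules.append((set(A), set(B), x_count / n, conf))
    return rules

# Frequent itemsets (with counts) of the 4-transaction TDB used earlier.
freq = {frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
        frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
        frozenset("BCE"): 2}   # frozenset("BCE") == frozenset({"B", "C", "E"})
for A, B, sup, conf in gen_rules(freq, 4, 0.5):
    print(A, "->", B, f"support={sup:.0%} confidence={conf:.0%}")

For the itemset {B, C, E} this prints the six rules listed on the next slide, with the same confidences.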



Generating rules: an example
 Given {B,C,E} is frequent, with sup=50% and minconf = 50% then
 Proper nonempty subsets: {B,C}, {B,E}, {C,E}, {B}, {C}, {E}, with sup=50%,
75%, 50%, 75%, 75%, 75% respectively
 These generate these association rules:
 B,C  E, confidence=100%
 B,E  C, confidence=67%
 C,E  B, confidence=100%
 B  C,E, confidence=67%
 C  B,E, confidence=67%
 E  B,C, confidence=67%
 All rules have support = 50%



Apriori Advantages and Disadvantages

 Advantages:
 Uses large itemset property.

 Easily parallelized

 Easy to implement.

 Disadvantages:
 Assumes transaction database is memory resident.

 Requires up to m database scans.



Methods to Improve Apriori’s Efficiency
 Hash-based itemset counting: A k-itemset whose corresponding
hashing bucket count is below the threshold cannot be frequent.
 Transaction reduction: A transaction that does not contain any
frequent k-itemset is useless in subsequent scans.
 Partitioning: Any itemset that is potentially frequent in DB must be
frequent in at least one of the partitions of DB.
 Sampling: mining on a subset of given data, lower support
threshold + a method to determine the completeness.
 Dynamic itemset counting: add new candidate itemsets only
when all of their subsets are estimated to be frequent.
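
As one concrete illustration, transaction reduction is just a filter applied between level-wise passes; a sketch (the function name is mine, Lk is the set of frequent k-itemsets from the pass just completed):

def reduce_transactions(transactions, Lk):
    # A transaction with no frequent k-itemset cannot contain a frequent
    # (k+1)-itemset either, so later scans can safely skip it.
    return [t for t in transactions if any(set(X) <= t for X in Lk)]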



Domains where Apriori is used

Adverse drug reaction detection


It is used to perform association analysis on the characteristics of patients,
the drugs they are taking, their primary diagnosis, co-morbid conditions, and
the ADRs or adverse events (AE) they experience.
Association rules are produced that indicate what combinations of
medications and patient characteristics lead to ADRs.
Oracle Bone Inscription Explication
One of the oldest writing systems in the world, but of the roughly 6000 words
found so far only about 1500 can be explicated explicitly, so it remains an
open problem in this field.
The OBI data extracted from the OBI corpus are preprocessed and used as input
to the Apriori algorithm to get the frequent itemsets. Combined with an
interestingness measure, strong association rules between OBI words are
produced.
Challenges of Frequent Pattern Mining
 If we use candidate generation:
 Need to generate a huge number of candidate sets.
 Need to repeatedly scan the database and check a large set of
candidates by pattern matching.
 Tedious workload of support counting for candidates.
 Can we avoid that?
 FP-Trees (Frequent Pattern Trees)
 FP-Growth: allows frequent itemset discovery without candidate itemset
generation. Two-step approach:
 Step 1: Build a compact data structure called the FP-tree, using 2
passes over the data-set.
 Step 2: Extract frequent itemsets directly from the FP-tree.



Step 1: FP-Tree Construction
 FP-Tree is constructed using 2 passes over the data-set:

Pass 1:
– Scan data and find support for each item.
– Discard infrequent items.
– Sort frequent items in decreasing order based on their support.
Use this order when building the FP-Tree, so common prefixes can
be shared.
Step 1: FP-Tree Construction
Pass 2:
Nodes correspond to items and have a counter.
1. FP-Growth reads 1 transaction at a time and maps it to a path.
2. A fixed order is used, so paths can overlap when transactions share items
(when they have the same prefix).
– In this case, counters are incremented.
3. Pointers are maintained between nodes containing the same item, creating
singly linked lists (dotted lines).
– The more paths that overlap, the higher the compression. The FP-tree may
fit in memory.
4. Frequent itemsets are extracted from the FP-Tree.
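
A minimal Python sketch of these two passes (class and field names are mine; it builds the tree and the per-item node-links, but not the mining step):

from collections import defaultdict

class FPNode:
    # One FP-tree node: an item, a counter, its children, and a node-link.
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}    # item -> FPNode
        self.link = None      # next node in the tree holding the same item

def build_fp_tree(transactions, min_count):
    # Pass 1: count supports, discard infrequent items, fix a global order.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    freq = {i: c for i, c in counts.items() if c >= min_count}

    root, header = FPNode(None, None), {}   # header table: item -> first node
    # Pass 2: insert each transaction as a path, sharing common prefixes.
    for t in transactions:
        path = sorted((i for i in t if i in freq),
                      key=lambda i: (-freq[i], i))   # decreasing support order
        node = root
        for item in path:
            if item in node.children:
                node.children[item].count += 1       # shared prefix: bump counter
            else:
                child = FPNode(item, node)
                node.children[item] = child
                # Thread the new node onto this item's node-link list.
                child.link, header[item] = header.get(item), child
            node = node.children[item]
    return root, header

TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
root, header = build_fp_tree(TDB, 2)

Sorting each transaction by the globally fixed, decreasing-support order before insertion is what lets transactions with a common prefix share a path.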
Step 1: FP-Tree Construction (Example)
FP-Tree size
 The FP-Tree usually has a smaller size than the uncompressed data -
typically many transactions share items (and hence prefixes).
– Best case scenario: all transactions contain the same set of items.

• 1 path in the FP-tree


– Worst case scenario: every transaction has a unique set of items
(no items in common)
• Size of the FP-tree is at least as large as the original data.
• Storage requirements for the FP-tree are higher - need to store the pointers
between the nodes and the counters.

 The size of the FP-tree depends on how the items are ordered
 Ordering by decreasing support is typically used but it does not
always lead to the smallest tree (it's a heuristic).
Step 2: Frequent Itemset Generation
 FP-Growth extracts frequent itemsets from the FP-tree.
 Bottom-up algorithm - from the leaves towards the root
 Divide and conquer: first look for frequent itemsets ending in e, then
de, etc., then d, then cd, etc.
 First, extract prefix path sub-trees ending in an item(set). (hint: use
the linked lists)
Prefix path sub-trees (Example)
Step 2: Frequent Itemset Generation
 Each prefix path sub-tree is processed
recursively to extract the frequent itemsets.
Solutions are then merged.
 E.g. the prefix path sub-tree for e will be used to extract frequent
itemsets ending in e, then in de, ce, be and ae, then in cde, bde, ade, etc.
 Divide and conquer approach
Conditional FP-Tree
 The FP-Tree that would be built if we only consider transactions containing a
particular itemset (and then removing that itemset from all transactions).
 Example: FP-Tree conditional on e.
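
Reading that definition directly, the data behind a conditional FP-tree can be obtained by filtering and trimming the transactions and rebuilding the tree; a small sketch, using the illustrative TDB from earlier rather than the dataset in the lecture's figure:

def conditional_transactions(transactions, item):
    # Transactions that contain the item, with the item removed: the data the
    # FP-tree conditional on that item would be built from.
    return [t - {item} for t in transactions if item in t]

TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
cond_E = conditional_transactions(TDB, "E")   # [{B, C}, {A, B, C}, {B}]
# With min count 2, only B (count 3) and C (count 2) survive in cond_E, so
# {B, E}, {C, E} and {B, C, E} come out frequent, matching the Apriori result
# for this dataset.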
Example
Let minSup = 2 and extract all frequent itemsets containing e.
1. Obtain the prefix path sub-tree for e:
Example
2. Check if e is a frequent item by adding the counts along the linked
list (dotted line). If so, extract it.
Yes, count = 3, so {e} is extracted as a frequent itemset.
3. As e is frequent, find frequent itemsets ending in e (i.e. de, ce, be
and ae).
4. Use the conditional FP-tree for e to find frequent itemsets ending in
de, ce and ae
Note that be is not considered as b is not in the conditional FP-tree
for e.
For each of them (e.g. de), find the prefix paths from the conditional
tree for e, extract frequent itemsets, generate the conditional FP-tree, etc.
(recursive)
Example
 Example: e -> de -> ade ({d,e}, {a,d,e} are found to be frequent)

• Example: e -> ce ({c,e} is found to be frequent)


Result
Frequent itemsets found (ordered by suffix and the order in which they are
found).
Advantages of FP-Tree
 Completeness:
 Never breaks a long pattern of any transaction

 Preserves complete information for frequent pattern mining

 Compactness
 Reduce irrelevant information—infrequent items are gone

 Frequency descending ordering: more frequent items are more
likely to be shared
 Never larger than the original database (not counting node-links
and counters)
Advantages/Disadvantages of FP growth

 Advantages of FP-Growth
 Only 2 passes over data-set
 “Compresses” data-set
 No candidate generation
 Much faster than Apriori

 Disadvantages of FP-Growth
 FP-Tree may not fit in memory!!
 FP-Tree is expensive to build
Constraint-based (Query-Directed) Mining

 Finding all the patterns in a database autonomously? —


unrealistic!
 The patterns could be too many but not focused!
 Data mining should be an interactive process
 User directs what to be mined using a data mining query
language (or a graphical user interface)
 Constraint-based mining
 User flexibility: provides constraints on what to be mined
 System optimization: explores such constraints for efficient
mining—constraint-based mining



Constraints in Data Mining

 Knowledge type constraint: Specify the type of knowledge to be mined


 classification, association, etc.

 Data constraint: the set of task-relevant data, specified using SQL-like queries


 find product pairs sold together in stores in Nyeri in Dec.’14

 Dimension/level constraint: Specify the desired dimensions (or attributes)


of the data, or levels of the concept hierarchies, to be used in mining.
 in relevance to region, price, brand, customer category

 Rule (or pattern) constraint : The form of rules to be mined


 small sales (price < Ksh.10) triggers big sales (sum > Ksh.200)

 Interestingness constraint: Specify thresholds on statistical measures of


rule interestingness
 strong rules: min_support ≥ 3%, min_confidence ≥ 60%
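
As a rough illustration, rule and interestingness constraints such as the two above can at the very least be applied as a post-filter over discovered rules (the rule representation, field names and prices here are illustrative):

def passes_constraints(rule, min_support=0.03, min_confidence=0.60,
                       max_lhs_price=10, min_rhs_sum=200):
    # rule: dict with support, confidence and the item prices on each side.
    interesting = (rule["support"] >= min_support
                   and rule["confidence"] >= min_confidence)
    pattern_ok = (all(p < max_lhs_price for p in rule["lhs_prices"])   # small sales...
                  and sum(rule["rhs_prices"]) > min_rhs_sum)           # ...trigger big sales
    return interesting and pattern_ok

rule = {"support": 0.04, "confidence": 0.72,
        "lhs_prices": [8, 5], "rhs_prices": [120, 150]}
print(passes_constraints(rule))   # True

A real constraint-based miner exploits such constraints during mining rather than only as a post-filter, which is the system-optimization point made on the previous slide.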

