
BIDA311: Data Mining

Ch. 2: Data, Measurements, and Preprocessing (Lecture 3)
Ch. 4+5: Pattern Mining

by Dr. Jamal Al Qundus


Data Cleaning (Alternative): Attribute Creation
(Feature Generation)
• Create new attributes (features) that can capture the important information
in a data set more effectively than the original ones
• Three general methodologies
• Attribute extraction
• Domain-specific
• Mapping data to new space (see: data reduction)
• E.g., Fourier transformation, wavelet transformation, manifold approaches (not
covered)
• Attribute construction
• Combining features (see: discriminative frequent patterns in Chapter 7)
• Data discretization

Attribute extraction: Clustering
• Partition data set into clusters based on similarity, and
store cluster representation (e.g., centroid) only
• Can be very effective if data is clustered but not if data is
“smeared”
• Can have hierarchical clustering and be stored in multi-
dimensional index tree structures
• There are many choices of clustering definitions and
clustering algorithms
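The centroid-as-representation idea above can be sketched in a few lines of Python. This is an illustrative toy (a naive 1-D k-means with invented data), not an algorithm from the course:

```python
# Illustrative toy, not the course's algorithm: cluster the data, then store
# each point as its cluster centroid (a naive 1-D k-means).

def kmeans_1d(points, k=2, iters=20):
    # Start from the k smallest distinct values (a simple, deterministic init).
    centroids = sorted(set(points))[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.9, 10.0, 10.3, 9.8]
centroids = kmeans_1d(points)
# Store only each point's cluster centroid instead of the raw value.
compressed = [min(centroids, key=lambda c: abs(p - c)) for p in points]
```

The six raw values collapse to two stored representatives, which works well here precisely because the data is clustered rather than "smeared".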

Attribute extraction: Sampling
• Sampling: obtaining a small sample s to represent the whole
data set N
• Key principle: Choose a representative subset of the data
• Simple random sampling may have very poor
performance
• Develop adaptive sampling methods, e.g., stratified
sampling

Types of Sampling
• Simple random sampling
• There is an equal probability of selecting any particular item
• Sampling without replacement
• Once an object is selected, it is removed from the population
• Sampling with replacement
• A selected object is not removed from the population
• Stratified sampling:
• Partition the data set, and draw samples from each partition
(proportionally, i.e., approximately the same percentage of the
data)
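The three schemes can be sketched with Python's standard library. The data set (80 objects in stratum A, 20 in B) and the sample size of 10 are invented for the example:

```python
import random
from collections import defaultdict

random.seed(42)  # illustrative population: 80 objects in stratum A, 20 in B
data = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]

# Simple random sampling without replacement: each object picked at most once.
srswor = random.sample(data, 10)

# Simple random sampling with replacement: an object may be picked again.
srswr = [random.choice(data) for _ in range(10)]

# Stratified sampling: partition by label, draw proportionally per stratum.
strata = defaultdict(list)
for item in data:
    strata[item[0]].append(item)
stratified = []
for label, items in strata.items():
    share = round(len(items) / len(data) * 10)  # keep each stratum's share
    stratified += random.sample(items, share)
```

Stratification guarantees 8 A-objects and 2 B-objects in every sample, whereas a simple random sample of 10 may, by chance, contain no B-objects at all.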

Sampling: With or without Replacement

W O R
SRS le random
im p h o ut
( s e wit
p l
sam ment)
p la c e
re

SRSW
R

Raw Data
21
Outline
• Mining Frequent Patterns
• Association and Correlations
• Basic Concepts and Methods
• Frequent Itemset Mining Methods
• Which Patterns Are Interesting?—Pattern Evaluation Methods

• Goal: Understanding the concept of mining frequent patterns
What Is Frequent Pattern Analysis?
• Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of
frequent itemsets and association rule mining
• Motivation: Finding inherent regularities in data
• What products were often purchased together?— milk and chocolate?!
• What are the subsequent purchases after buying a PC?
• What kinds of DNA are sensitive to this new drug?
• Can we automatically classify web documents?
• Applications
• Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why Is Freq. Pattern Mining Important?

• Foundation for many essential data mining tasks
• Association, correlation, and causality analysis
• Sequential, structural (e.g., sub-graph) patterns
• Pattern analysis in multimedia, time-series, and stream data
• Classification: discriminative frequent pattern analysis
• Cluster analysis: frequent pattern-based clustering
• Data warehousing: iceberg cube
• Semantic data compression
• Broad applications

An iceberg cube keeps only the cells whose aggregate values are above a given threshold.
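The iceberg idea can be illustrated with a toy aggregate query; the sales facts and the threshold of 100 are invented for the example:

```python
from collections import Counter

# Invented toy data: (region, sales amount) facts.
sales = [("east", 60), ("east", 70), ("west", 30), ("north", 120)]

totals = Counter()
for region, amount in sales:
    totals[region] += amount

# "Iceberg" condition: keep only cells whose aggregate clears the threshold.
iceberg = {region: t for region, t in totals.items() if t >= 100}
# → {'east': 130, 'north': 120}
```

Only the "tip of the iceberg" (east and north) is materialized; the west cell, with total 30, is dropped.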
Basic Concepts: Frequent Patterns

Tid  Items bought
10   Soda, Nuts, Chocolate
20   Soda, Coffee, Chocolate
30   Soda, Chocolate, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Chocolate, Eggs, Milk

• itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
• (absolute) support, or support count, of X: frequency or number of occurrences of the itemset X
• (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
• An itemset X is frequent if X's support is no less than a minsup threshold

[Figure: Venn diagram of customers who buy soda, buy chocolate, or buy both]
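A minimal sketch of the support definitions, using the five-transaction table above:

```python
# The five transactions from the table above, as sets.
transactions = [
    {"soda", "nuts", "chocolate"},
    {"soda", "coffee", "chocolate"},
    {"soda", "chocolate", "eggs"},
    {"nuts", "eggs", "milk"},
    {"nuts", "coffee", "chocolate", "eggs", "milk"},
]

def support_count(itemset):
    # Absolute support: number of transactions that contain the itemset.
    return sum(1 for t in transactions if itemset <= t)

assert support_count({"chocolate"}) == 4
assert support_count({"soda", "chocolate"}) == 3
# Relative support of {soda, chocolate}: 3/5 = 0.6, frequent at minsup = 50%.
```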
Basic Concepts: Association Rules

Tid  Items bought
10   Soda, Nuts, Chocolate
20   Soda, Coffee, Chocolate
30   Soda, Chocolate, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Chocolate, Eggs, Milk

• Find all the rules X → Y with minimum support and confidence
• support, s: probability that a transaction contains X ∪ Y
• confidence, c: conditional probability that a transaction containing X also contains Y

Let minsup = 50%, minconf = 50%
Frequent patterns: Soda:3, Nuts:3, Chocolate:4, Eggs:3, {Soda, Chocolate}:3
Association rules (many more!):
• Soda → Chocolate (support 60%, confidence 100%)
• Chocolate → Soda (support 60%, confidence 75%)

Note that both rules share the support 3/5 = 60%, since the support of a rule X → Y is the support of X ∪ Y; they differ only in confidence (3/3 vs. 3/4).
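The quoted metrics can be checked directly from the definitions; a sketch using the same five transactions (the helper name is illustrative):

```python
# Same five transactions as in the table above.
transactions = [
    {"soda", "nuts", "chocolate"},
    {"soda", "coffee", "chocolate"},
    {"soda", "chocolate", "eggs"},
    {"nuts", "eggs", "milk"},
    {"nuts", "coffee", "chocolate", "eggs", "milk"},
]

def rule_metrics(x, y):
    # support(X → Y) = P(X ∪ Y); confidence = support(X ∪ Y) / support(X).
    both = sum(1 for t in transactions if (x | y) <= t)
    x_count = sum(1 for t in transactions if x <= t)
    return both / len(transactions), both / x_count

assert rule_metrics({"soda"}, {"chocolate"}) == (0.6, 1.0)   # 60%, 100%
assert rule_metrics({"chocolate"}, {"soda"}) == (0.6, 0.75)  # 60%, 75%
```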
Closed Patterns and Max-Patterns
• A long pattern contains a combinatorial number of sub-patterns; e.g., {a1, …, a100} contains 2^100 - 1 sub-patterns!
• Solution: mine closed patterns and max-patterns instead
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X (proposed by Pasquier et al. @ ICDT'99)
• An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X (proposed by Bayardo @ SIGMOD'98)
• A closed pattern is a lossless compression of the frequent patterns
• Reduces the number of patterns and rules
Max pattern? Closed pattern?

[Figure: two decision flowcharts. Max-pattern test: is the pattern frequent? Is it part of a super-pattern? Is that super-pattern frequent? Closed-pattern test: is the pattern frequent? Is it part of a super-pattern? Is that super-pattern less frequent?]

Left example: pattern xy is a frequent pattern and there is no super-pattern xyz.
Right example: pattern xy is a frequent pattern, and the only super-pattern xyz is less frequent than xy.
Exercise: given the support counts below with minsup = 50% (= 3 transactions), which itemsets are closed patterns, and which are max-patterns?

{a}=4  {b}=2  {c}=5  {d}=4  {e}=6
{a,b}=1  {a,c}=3  {a,d}=3  {a,e}=4  {b,c}=2  {b,d}=0  {b,e}=2  {c,d}=3  {c,e}=5  {d,e}=4
{a,b,c}=1  {a,b,d}=0  {a,b,e}=1  {a,c,d}=2  {a,c,e}=3  {a,d,e}=3  {b,c,d}=0  {b,c,e}=2  {c,d,e}=3
{a,b,c,d}=0  {a,b,c,e}=1  {b,c,d,e}=0
Solution (same support table, minsup = 3):
Closed patterns: {e}=6, {a,e}=4, {c,e}=5, {d,e}=4, {a,c,e}=3, {a,d,e}=3, {c,d,e}=3
Max-patterns: {a,c,e}=3, {a,d,e}=3, {c,d,e}=3
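A sketch that derives the closed and max patterns mechanically from the support table and the two definitions (helper names are illustrative):

```python
# Support table from the exercise, with minsup = 3.
support = {
    frozenset("a"): 4, frozenset("b"): 2, frozenset("c"): 5,
    frozenset("d"): 4, frozenset("e"): 6,
    frozenset("ab"): 1, frozenset("ac"): 3, frozenset("ad"): 3,
    frozenset("ae"): 4, frozenset("bc"): 2, frozenset("bd"): 0,
    frozenset("be"): 2, frozenset("cd"): 3, frozenset("ce"): 5,
    frozenset("de"): 4,
    frozenset("abc"): 1, frozenset("abd"): 0, frozenset("abe"): 1,
    frozenset("acd"): 2, frozenset("ace"): 3, frozenset("ade"): 3,
    frozenset("bcd"): 0, frozenset("bce"): 2, frozenset("cde"): 3,
    frozenset("abcd"): 0, frozenset("abce"): 1, frozenset("bcde"): 0,
}
minsup = 3
frequent = {x for x, s in support.items() if s >= minsup}

def supersets(x):
    # Proper super-patterns of x that appear in the table.
    return [y for y in support if x < y]

# Closed: frequent, and no super-pattern has the same support.
closed = {x for x in frequent
          if all(support[y] < support[x] for y in supersets(x))}
# Max: frequent, and no super-pattern is frequent.
maximal = {x for x in frequent
           if not any(y in frequent for y in supersets(x))}

assert closed == {frozenset(s) for s in ["e", "ae", "ce", "de", "ace", "ade", "cde"]}
assert maximal == {frozenset(s) for s in ["ace", "ade", "cde"]}
```

The check only consults itemsets listed in the table; combinations the slide omits (e.g., {a,c,d,e}) have support at most that of a listed subset such as {a,c,d} = 2, so they cannot change either answer.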
