Httpsmygju.gju.Edu.jofacescourse Portfoliocourse Syllabuscourse Syllabus.xhtml 2
Ch. 2: Data, measurements, and preprocessing (Lecture 3)
Ch. 4+5: Pattern mining
Attribute extraction: Clustering
• Partition data set into clusters based on similarity, and
store cluster representation (e.g., centroid) only
• Can be very effective if data is clustered but not if data is
“smeared”
• Clusters can be hierarchical and can be stored in multi-dimensional
index tree structures
• There are many choices of clustering definitions and
clustering algorithms
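
As a rough illustration of keeping only the cluster representation, the sketch below runs a tiny k-means and returns the centroids and cluster sizes in place of the raw points. The function name `kmeans`, the toy points, and the choice of the first k points as starting centroids are assumptions made for this example; k-means is just one of the many possible clustering algorithms.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Tiny k-means sketch: returns (centroids, cluster sizes)."""
    # Assumption: the first k points serve as initial centroids
    # (deterministic, but not a robust initialisation).
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, [len(c) for c in clusters]

# Toy data with two well-separated groups (made up for illustration).
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, sizes = kmeans(points, k=2)
print(centroids, sizes)  # six points reduced to two centroids plus counts
```

On clustered data like this, two centroids summarise the six points well; on "smeared" data the centroids would describe the points poorly.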
Attribute extraction: Sampling
• Sampling: obtaining a small sample s to represent the whole
data set N
• Key principle: Choose a representative subset of the data
• Simple random sampling may perform very poorly in the
presence of skew
• Develop adaptive sampling methods, e.g., stratified
sampling
Types of Sampling
• Simple random sampling
• There is an equal probability of selecting any particular item
• Sampling without replacement
• Once an object is selected, it is removed from the population
• Sampling with replacement
• A selected object is not removed from the population
• Stratified sampling:
• Partition the data set, and draw samples from each partition
(proportionally, i.e., approximately the same percentage of the
data)
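
A minimal sketch of proportional stratified sampling, using only Python's standard library. The function `stratified_sample`, the record layout, and the group labels are hypothetical. Taking at least one item per partition is an extra assumption of this sketch, added so that rare groups are not lost.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly the same fraction from every partition (stratum)."""
    rng = random.Random(seed)
    # Partition the data set by the stratification key.
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for members in strata.values():
        # Assumption: take at least one item so small strata stay represented.
        n = max(1, round(fraction * len(members)))
        # Within a stratum, sample without replacement.
        sample.extend(rng.sample(members, n))
    return sample

# Skewed toy data: 90 "young" records and 10 "senior" records (made up).
records = [("young", i) for i in range(90)] + [("senior", i) for i in range(10)]
sample = stratified_sample(records, key=lambda r: r[0], fraction=0.1)
print(len(sample), sum(1 for group, _ in sample if group == "senior"))
```

With a 10% fraction, the sample keeps the 9:1 ratio of the two groups, which a simple random sample of the same size does not guarantee.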
Sampling: With or without Replacement
[Figure: raw data sampled two ways: SRSWOR (simple random sample without replacement) and SRSWR (simple random sample with replacement)]
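
The difference between the two schemes can be sketched with Python's standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The population of ten integers and the seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 11))   # toy "raw data": the integers 1..10

# SRSWOR: each selected object is removed from the population,
# so no value can appear twice.
srswor = rng.sample(population, k=5)

# SRSWR: a selected object stays in the population,
# so the same value may be drawn again.
srswr = rng.choices(population, k=5)

print("without replacement:", srswor)
print("with replacement:   ", srswr)
```

Only the without-replacement draw is guaranteed to contain five distinct values.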
Outline
• Mining Frequent Patterns
• Association and Correlations
• Basic Concepts and Methods
• Frequent Itemset Mining Methods
• Which Patterns Are Interesting?—Pattern
Evaluation Methods
[Figure: Yes/No diagram for the two cases below]
• Pattern xy is a frequent pattern and there is no super-pattern xyz.
• Pattern xy is a frequent pattern, and its only super-pattern xyz is less frequent than xy.
Minsupp = 50% = 3
Closed-pattern? Max-pattern?

{a} = 4      {a,b,c} = 1
{b} = 2      {a,b,d} = 0
{c} = 5      {a,b,e} = 1
{d} = 4      {a,c,d} = 2
{e} = 6      {a,c,e} = 3
{a,b} = 1    {a,d,e} = 3
{a,c} = 3    {b,c,d} = 0
{a,d} = 3    {b,c,e} = 2
{a,e} = 4    {c,d,e} = 3
{b,c} = 2    {a,b,c,d} = 0
{b,d} = 0    {a,b,c,e} = 1
{b,e} = 2    {b,c,d,e} = 0
{c,d} = 3
{c,e} = 5
{d,e} = 4
With Minsupp = 50% = 3 and the support counts above:
Closed-pattern:
• {e} = 6
• {a,e} = 4
• {c,e} = 5
• {d,e} = 4
• {a,c,e} = 3
• {a,d,e} = 3
• {c,d,e} = 3
Max-pattern:
• {a,c,e} = 3
• {a,d,e} = 3
• {c,d,e} = 3
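
The answers can be checked mechanically. The sketch below assumes the usual definitions: a frequent pattern is closed if no proper super-pattern has the same support, and it is a max-pattern if no proper super-pattern is frequent. The support counts are copied from the table. Itemsets the table does not list are assumed to be infrequent, which is safe here because each of them contains a listed infrequent subset. All variable and function names are made up for this example.

```python
MIN_SUPPORT = 3  # Minsupp = 50% = 3

# Support counts copied from the exercise table (keys are item letters).
raw = {
    "a": 4, "b": 2, "c": 5, "d": 4, "e": 6,
    "ab": 1, "ac": 3, "ad": 3, "ae": 4, "bc": 2,
    "bd": 0, "be": 2, "cd": 3, "ce": 5, "de": 4,
    "abc": 1, "abd": 0, "abe": 1, "acd": 2, "ace": 3, "ade": 3,
    "bcd": 0, "bce": 2, "cde": 3,
    "abcd": 0, "abce": 1, "bcde": 0,
}
support = {frozenset(items): count for items, count in raw.items()}
frequent = {s: n for s, n in support.items() if n >= MIN_SUPPORT}

def proper_supersets(itemset):
    """All listed itemsets that strictly contain `itemset`."""
    return [s for s in support if itemset < s]

# Closed: every proper super-pattern has strictly lower support.
closed = [s for s, n in frequent.items()
          if all(support[t] < n for t in proper_supersets(s))]

# Max: no proper super-pattern is frequent.
maximal = [s for s in frequent
           if not any(t in frequent for t in proper_supersets(s))]

def fmt(itemsets):
    """Render itemsets as sorted strings such as 'ace'."""
    return sorted("".join(sorted(s)) for s in itemsets)

print("closed:", fmt(closed))   # ['ace', 'ade', 'ae', 'cde', 'ce', 'de', 'e']
print("max:   ", fmt(maximal))  # ['ace', 'ade', 'cde']
```

Every max-pattern also appears in the closed list, as expected: a frequent pattern with no frequent super-pattern cannot have a super-pattern with the same support.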