1
Discretization and Concept
Hierarchy Generation
2
Discretization
 Types of attributes:
 Nominal — values from an unordered set, e.g., color, profession
 Ordinal — values from an ordered set, e.g., military or academic
rank
 Continuous — numeric values, e.g., integer or real numbers
 Discretization:
 Divide the range of a continuous attribute into intervals
 Reduce data size by discretization
3
Discretization and Concept Hierarchy
 Discretization
 Reduce the number of values for a given continuous attribute
by dividing the range of the attribute into intervals
 Interval labels can then be used to replace actual data values
 Supervised vs. unsupervised
 Split (top-down) vs. merge (bottom-up)
 Discretization can be performed recursively on an attribute
4
Concept hierarchy
 Concept hierarchy formation
 Recursively reduce the data by collecting and replacing low level
concepts (such as numeric values for age) by higher level
concepts (such as young, middle-aged, or senior)
 Detail lost
 More meaningful
 Easier to interpret
 Mining becomes easier
 Several concept hierarchies can be defined for the same
attribute
 Manual / Implicit
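As a minimal sketch of replacing low-level values with higher-level concepts, the snippet below maps numeric ages to the labels named on this slide. The cut points 30 and 60 and the name `age_concept` are hypothetical; the slides name the concepts but give no age ranges.

```python
from bisect import bisect_right

# Hypothetical cut points: the slides do not state where each concept begins.
AGE_CUTS = [30, 60]
AGE_LABELS = ["young", "middle-aged", "senior"]

def age_concept(age):
    """Map a numeric age to its higher-level concept label."""
    # bisect_right counts how many cut points are <= age.
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]

ages = [23, 35, 47, 61, 29, 72]
print([age_concept(a) for a in ages])
```

A second hierarchy for the same attribute could be defined simply by supplying a different pair of cut-point and label lists.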
5
Discretization and Concept Hierarchy
Generation for Numeric Data
 Typical methods:
 Binning
 Histogram analysis
 Clustering analysis
 Entropy-based discretization
 χ² merging
 Segmentation by natural partitioning
All the methods can be applied recursively
6
Techniques
 Binning
 Distribute values into bins
 Replace by bin mean / median
 Recursive application – leads to concept hierarchies
 Unsupervised technique
 Histogram Analysis
 Data Distribution – Partition
 Equiwidth – (0-100], (100-200], …
 Equidepth
 Recursive
 Minimum Interval size
 Unsupervised
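The binning and histogram ideas above can be sketched in a few lines of Python. This is an illustration only: the function names and the sample `prices` list are assumptions, and the equal-width version assumes the values are not all identical.

```python
from statistics import mean

def equal_width_bins(values, k):
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    bins = [[] for _ in range(k)]
    for v in values:
        # The maximum value would land in bin k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins

def equal_depth_bins(values, k):
    """Split the sorted values into k bins holding (nearly) the same count."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

def smooth_by_mean(bins):
    """Replace every value by the mean of its bin."""
    return [[mean(b)] * len(b) for b in bins if b]

# Illustrative data, not taken from the slides.
prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(equal_width_bins(prices, 3))
print(equal_depth_bins(prices, 3))
print(smooth_by_mean(equal_depth_bins(prices, 3)))
```

Applying either function again to the contents of a single bin gives the next level of a concept hierarchy.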
7
Techniques
 Cluster Analysis
 Clusters form nodes of concept hierarchy
 Can decompose / combine
 Lower level / higher level of hierarchy
8
Entropy-Based Discretization
 Given a set of samples S, if S is partitioned into two intervals S1 and S2 using boundary T, the expected information requirement after partitioning is
$$I(S, T) = \frac{|S_1|}{|S|}\,\mathrm{Entropy}(S_1) + \frac{|S_2|}{|S|}\,\mathrm{Entropy}(S_2)$$
 Entropy is calculated based on class distribution of the samples in the set. Given m classes, the entropy of S1 is
$$\mathrm{Entropy}(S_1) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
where p_i is the probability of class i in S1
 The boundary that minimizes the expected information requirement over all possible boundaries is selected as a binary discretization
 The process is recursively applied to partitions obtained until some stopping criterion is met
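A single binary split chosen by this criterion can be sketched as follows. The function names and sample data are hypothetical, and taking candidate boundaries at midpoints between adjacent distinct values is an assumption; the slide does not say how candidates are generated.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a sample set, computed from its class distribution."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def best_boundary(samples):
    """Return (T, I(S, T)) for the boundary T that minimizes I(S, T).

    samples: list of (value, class_label) pairs.
    Returns None when all values are equal, since no split is possible.
    """
    ordered = sorted(samples)
    values = [v for v, _ in ordered]
    labels = [c for _, c in ordered]
    n = len(ordered)
    best = None
    for i in range(1, n):
        if values[i] == values[i - 1]:
            continue  # a boundary cannot separate equal values
        t = (values[i - 1] + values[i]) / 2  # assumed candidate: midpoint
        s1, s2 = labels[:i], labels[i:]
        info = len(s1) / n * entropy(s1) + len(s2) / n * entropy(s2)
        if best is None or info < best[1]:
            best = (t, info)
    return best

# Illustrative data, not taken from the slides.
data = [(1, "a"), (2, "a"), (3, "a"), (10, "b"), (11, "b"), (12, "b")]
print(best_boundary(data))
```

Calling `best_boundary` again on the samples on each side of T gives the recursive form; a stopping criterion such as a minimum information gain would have to be added by the caller.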
9
Entropy-Based Discretization
 Reduces data size
 Class information is considered
 May improve classification accuracy
10
Interval Merging by χ² Analysis
 ChiMerge
 Bottom-up approach
 Find the best neighbouring intervals and merge them to form larger intervals
 Supervised
 If two adjacent intervals have similar distribution of classes – they can be
merged
 Initially each value is in a separate interval
 χ² tests are performed for each pair of adjacent intervals; the pair with the lowest χ² value is merged
 Can be repeated
 Stopping condition (Threshold, Number of intervals)
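The merging loop described above can be sketched as below. This is a simplified illustration rather than a reference ChiMerge implementation: it stops only on a target number of intervals (a χ² threshold would be the other stopping condition), and all names and data are hypothetical.

```python
from collections import Counter

def chi2(a, b, classes):
    """Chi-square statistic for the class counts of two adjacent intervals."""
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * (a[c] + b[c]) / total
            if expected > 0:  # skip classes absent from both intervals
                stat += (row[c] - expected) ** 2 / expected
    return stat

def chimerge(samples, max_intervals):
    """Merge adjacent intervals bottom-up until max_intervals remain.

    samples: list of (value, class_label) pairs.
    Returns the lower bound of each final interval.
    """
    classes = sorted({c for _, c in samples})
    # Initially each distinct value is in a separate interval.
    counts = {}
    for v, c in samples:
        counts.setdefault(v, Counter())[c] += 1
    intervals = [(v, counts[v]) for v in sorted(counts)]
    while len(intervals) > max_intervals:
        scores = [chi2(intervals[i][1], intervals[i + 1][1], classes)
                  for i in range(len(intervals) - 1)]
        i = scores.index(min(scores))  # most similar adjacent pair
        merged = (intervals[i][0], intervals[i][1] + intervals[i + 1][1])
        intervals[i:i + 2] = [merged]
    return [lower for lower, _ in intervals]

# Illustrative data, not taken from the slides.
data = [(1, "a"), (2, "a"), (3, "a"), (10, "b"), (11, "b"), (12, "b")]
print(chimerge(data, 2))
```

A low χ² value means the two intervals have similar class distributions, which is why the pair with the smallest score is merged first.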
11
Segmentation by Natural Partitioning
 A simple 3-4-5 rule can be used to segment numeric data into relatively uniform, “natural” intervals.
 If an interval covers 3, 6, 7 or 9 distinct values at the most significant digit, partition the range into 3 intervals (equi-width for 3, 6, and 9; grouped 2-3-2 for 7)
 If it covers 2, 4, or 8 distinct values at the most significant digit, partition the range into 4 equi-width intervals
 If it covers 1, 5, or 10 distinct values at the most significant digit, partition the range into 5 equi-width intervals
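One level of the 3-4-5 rule can be sketched as a small function. The name `partition_345` and its signature are hypothetical, and it assumes `low` and `high` have already been rounded to multiples of `msd`, the unit of the most significant digit. The 2-3-2 grouping used for 7 distinct values follows the usual textbook statement of the rule.

```python
def partition_345(low, high, msd):
    """Apply one level of the 3-4-5 rule to the range from low to high.

    low and high are assumed to be multiples of msd already.
    Returns a list of (start, end) intervals.
    """
    distinct = round((high - low) / msd)
    if distinct in (3, 6, 9):
        parts = [1, 1, 1]      # 3 equal-width intervals
    elif distinct == 7:
        parts = [2, 3, 2]      # 3 intervals grouped 2-3-2
    elif distinct in (2, 4, 8):
        parts = [1] * 4        # 4 equal-width intervals
    elif distinct in (1, 5, 10):
        parts = [1] * 5        # 5 equal-width intervals
    else:
        raise ValueError(f"rule does not cover {distinct} distinct values")
    total, cum, start, intervals = sum(parts), 0, low, []
    for p in parts:
        cum += p
        end = low + (high - low) * cum / total
        intervals.append((start, end))
        start = end
    return intervals

print(partition_345(-1000, 2000, 1000))
print(partition_345(0, 1000, 1000))
```

Calling the function again on each returned interval, with the `msd` of that sub-range, yields the next level of the hierarchy.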
12
Segmentation by Natural Partitioning
 Outliers could be present
 Consider only the majority values
 5th percentile – 95th percentile
13
Example of 3-4-5 Rule
Step 1: profit (histogram of count vs. profit)
    Min = -$351
    Low (i.e., 5%-tile) = -$159
    High (i.e., 95%-tile) = $1,838
    Max = $4,700
Step 2: msd = 1,000, Low = -$1,000, High = $2,000
Step 3: (-$1,000 - $2,000)
    (-$1,000 - 0)
    (0 - $1,000)
    ($1,000 - $2,000)
Step 4: (-$400 - $5,000)
    (-$400 - 0)
        (-$400 - -$300)
        (-$300 - -$200)
        (-$200 - -$100)
        (-$100 - 0)
    (0 - $1,000)
        (0 - $200)
        ($200 - $400)
        ($400 - $600)
        ($600 - $800)
        ($800 - $1,000)
    ($1,000 - $2,000)
        ($1,000 - $1,200)
        ($1,200 - $1,400)
        ($1,400 - $1,600)
        ($1,600 - $1,800)
        ($1,800 - $2,000)
    ($2,000 - $5,000)
        ($2,000 - $3,000)
        ($3,000 - $4,000)
        ($4,000 - $5,000)
14
Concept Hierarchy Generation for
Categorical Data
 Specification of a partial ordering of attributes explicitly at
the schema level by users or experts
 User / Expert defines hierarchy
 Street < city < state < country
 Specification of a portion of a hierarchy by explicit data
grouping
 Manual
 Intermediate level information specified
 Industrial, Agricultural..
15
Concept Hierarchy Generation for
Categorical Data
 Specification of a set of attributes but not their partial
ordering
 Automatically inferring the hierarchy
 Heuristic rule
 High level concepts contain a smaller number of values
 Specification of only a partial set of attributes
 Embedding data semantics
 Attributes with tight semantic connections are pinned together
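The heuristic for inferring a hierarchy when only the set of attributes is given can be sketched as follows: count the distinct values of each attribute and place the attribute with the fewest values at the top. The function name and the sample records are hypothetical.

```python
def infer_hierarchy(records, attributes):
    """Order attributes from lowest to highest concept level.

    Heuristic: an attribute with fewer distinct values is placed
    higher in the hierarchy.
    """
    distinct = {a: len({r[a] for r in records}) for a in attributes}
    # Most distinct values first (lowest level), fewest last (highest level).
    return sorted(attributes, key=lambda a: distinct[a], reverse=True)

# Illustrative records, not taken from the slides.
records = [
    {"street": "1 Main St", "city": "Chicago", "state": "Illinois", "country": "USA"},
    {"street": "2 Oak Ave", "city": "Chicago", "state": "Illinois", "country": "USA"},
    {"street": "3 Elm Rd", "city": "Urbana", "state": "Illinois", "country": "USA"},
    {"street": "4 Pine St", "city": "Seattle", "state": "Washington", "country": "USA"},
    {"street": "5 King St", "city": "Vancouver", "state": "British Columbia", "country": "Canada"},
]
order = infer_hierarchy(records, ["country", "street", "state", "city"])
print(" < ".join(order))
```

The heuristic is not reliable for every attribute set. A time dimension, for instance, may have fewer distinct weekdays than months even though weekday is not the higher-level concept, so the inferred order is best reviewed by a user or expert.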
