Tanagra Clustering Tree
Subject
We show how to induce clustering trees with TANAGRA.
The aim of clustering is to build groups of individuals so that the examples in the same
group are similar and the examples in different groups are dissimilar.
Top-down induction of clustering trees adapts the supervised decision/regression tree
framework to clustering. The groups are built by recursive partitioning of the dataset; the
internal nodes of the tree are split on the input attributes, as usual. The resulting model,
the clustering tree, describes the groups; the learning algorithm automatically selects the
relevant attributes.
The clustering tree approach is not widely known; in this tutorial we show the interesting
properties of this method. Our main references are the papers of Chavent (1998)1 and
Blockeel et al. (1998)2.
Dataset
We use the ZOO dataset (UCI). We want to group the animals using their characteristics, such
as the number of legs, whether they produce milk, …
The domain expert proposes 7 clusters. We want to know (1) whether our algorithm can find
these clusters, and (2) whether we find the same clusters as the well-known K-MEANS algorithm.
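Outside TANAGRA, the same dataset can be loaded with a few lines of Python. This is only a hedged sketch: the file name zoo.xls mirrors the tutorial, but the column names (we assume here a column named "type" holding the expert grouping) depend on the local export.

# Hedged sketch (outside TANAGRA): load the ZOO data with pandas.
# The file name "zoo.xls" mirrors the tutorial; the column name "type"
# (the expert grouping, 7 classes) is an assumption about the local export.
import pandas as pd

zoo = pd.read_excel("zoo.xls")
descriptors = zoo.drop(columns=["type"])   # keep only the descriptive attributes
print(zoo.shape)
print(zoo["type"].value_counts())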
1 M. Chavent (1998), « A monothetic clustering method », Pattern Recognition Letters, 19, 989-996.
2 H. Blockeel, L. De Raedt, J. Ramon (1998), « Top-Down Induction of Clustering Trees », ICML, 55-63.
Clustering trees
Because we have discrete attributes, we use multiple correspondence analysis (MCA). This
data transformation combines several advantages: we can now use the classical Euclidean
distance, all the more so as the factorial axes (the latent variables) are uncorrelated; and by
selecting only the first 10 axes, we keep the "useful" information and leave aside the "noisy"
information specific to the file (the artifacts in the dataset).
We add an MCA component in the diagram and set the number of produced axes to 10
(approximately half of the total number of axes).
Note: In the case of continuous attributes, we follow the same principle and use instead a principal
component analysis (PCA). We observe the same advantages.
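For readers who want to replay this preprocessing step outside TANAGRA, here is a rough Python stand-in, continuing the loading sketch above. It uses complete disjunctive (0/1) coding followed by a PCA, which is not exactly an MCA but plays the same role here: orthogonal factorial axes on which the Euclidean distance makes sense.

import pandas as pd
from sklearn.decomposition import PCA

# Complete disjunctive coding of the discrete attributes, then 10 orthogonal axes.
dummies = pd.get_dummies(descriptors.astype(str))
pca = PCA(n_components=10)
axes = pca.fit_transform(dummies)          # (n_examples, 10) "factorial" axes
print("share of variance kept:", pca.explained_variance_ratio_.sum())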
We click on the VIEW contextual menu. The 10 axes summarize 90% of the available
information, which is fully suitable.
Note: In this tutorial, we use the same attributes for the homogeneity computation and the
construction of the tree. But in fact we can use two separate sets of attributes. We then obtain
a generalization of decision/regression trees; some authors call this approach “multi-objective
regression/decision trees” or “predictive clustering trees” (a small sketch of this idea is given below).
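As an illustration of this “predictive clustering tree” idea, and only as an illustration (this is not TANAGRA's CTP component), a multi-output regression tree can be grown with scikit-learn: the splits are searched on the coded input attributes while the homogeneity is measured on the factorial axes. The variable names continue the sketches above.

from sklearn.tree import DecisionTreeRegressor

# Multi-output regression tree: splits on the coded attributes (dummies),
# homogeneity measured on the factorial axes (axes), i.e. a within-node
# variance that mimics the within-cluster inertia. 4 leaves, as in the tutorial.
pct = DecisionTreeRegressor(max_leaf_nodes=4, random_state=1)
pct.fit(dummies, axes)
leaves = pct.apply(dummies)                # the leaf id plays the role of the cluster
print(pd.Series(leaves).value_counts())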
Clustering trees
We add the clustering tree component in the diagram (CTP -- CLUSTERING TREE WITH
PRUNING).
Roughly speaking, it is a generalization of the CART algorithm (Breiman et al., 1984) with two
specificities:
1. We compute inertia instead of variance to evaluate homogeneity of groups.
2. Our goal is not to produce an accurate prediction but to find “natural” groups. So we try to
detect the “angle” (elbow) of the within-inertia curve computed on the pruning set. At present,
we use a regression on 3 successive points and select the cut point for which the slope of the
line is close to zero (a sketch of this heuristic is given after this list).
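Here is a small sketch of that “angle” heuristic as we read it (the exact pruning rule implemented in TANAGRA may differ): fit a line on each run of 3 successive within-inertia values and keep the first number of leaves for which the local slope becomes close to zero.

import numpy as np

def detect_angle(within_inertia, tol=0.01):
    """within_inertia[k]: within-cluster inertia with k + 1 leaves (pruning set)."""
    x = np.arange(3, dtype=float)
    for k in range(len(within_inertia) - 2):
        slope = np.polyfit(x, within_inertia[k:k + 3], 1)[0]
        if abs(slope) <= tol:              # the curve becomes flat here
            return k + 1                   # number of leaves at the "angle"
    return len(within_inertia)

# illustrative curve that flattens after 4 leaves
curve = [1.00, 0.55, 0.30, 0.12, 0.11, 0.10, 0.10]
print(detect_angle(curve, tol=0.02))       # -> 4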
In this tutorial, we use 20% of the dataset as the pruning set; 80% of the examples are used for
the growing phase. We obtain the following clustering tree (VIEW menu).
We obtain 4 groups (the leaves of the tree); each cluster is described by the rule read along the path from the root to its leaf.
We can also see the decrease of the within-class inertia according to the number of leaves
(groups), on both the growing and the pruning sets.
The 14-group clustering minimizes the within-class inertia on the pruning set (green mark), but
we see an “angle” at 4 groups (red mark). The following chart shows the variation of the
within-class inertia.
[Chart: within-class inertia (between 0 and 0.4) as a function of the number of leaves, from 1 to 15]
We add a DEFINE STATUS component in the diagram. We set TYPE as TARGET and our
clustering suggestion (CLUSTER_CTP_1) as INPUT. Then we add a CONTINGENCY CHI-
SQUARE (NON PARAMETRIC STATISTICS tab) in order to compare the groups.
The diagram is now as follows:
Dataset (zoo.xls)
  Define status 1
    Multiple Correspondence Analysis 1
      Define status 2
        CTP 1
          Define status 3
            Contingency Chi-Square 1
Each expert group falls into a single cluster, and each cluster is either a pure group (Cluster 1
and Cluster 4) or a mix of similar species (Cluster 2 and Cluster 3)3.
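To reproduce this kind of cross-tabulation outside TANAGRA (continuing the sketches above, so the clusters come from the scikit-learn tree rather than from CTP), one can cross the expert groups with the leaves and run a chi-square test of independence:

from scipy.stats import chi2_contingency

table = pd.crosstab(zoo["type"], leaves)   # expert groups x tree clusters
chi2, pvalue, dof, _ = chi2_contingency(table)
print(table)
print("chi2 =", chi2, "p-value =", pvalue)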
We insert again a DEFINE STATUS component under the CTP (Clustering Tree) component.
We set the factorial axes as INPUT. We add the K-MEANS component, configured so that the
results of the two approaches (tree and K-MEANS) are comparable: we ask for 4 groups and we
do not normalize the factorial axes in the inertia computation.
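A hedged equivalent of this K-MEANS step with scikit-learn, on the unnormalized factorial axes and with 4 clusters; the seeding differs from TANAGRA, so the partition may not be strictly identical.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, n_init=10, random_state=1).fit(axes)
km_clusters = kmeans.labels_               # K-MEANS partition on the factorial axes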
3 I am not an expert!
We want to compare these groups with the groups obtained with CTP.
We insert another DEFINE STATUS in the diagram; we set as TARGET the clusters of the
tree (CLUSTER_CTP_1), as INPUT the clusters of the K-MEANS (CLUSTER_KMEANS_1).
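The comparison itself can be sketched with a cross-tabulation of the two partitions; the adjusted Rand index, which is not used in the tutorial, is added here only as a convenient single agreement score.

from sklearn.metrics import adjusted_rand_score

print(pd.crosstab(leaves, km_clusters))    # tree clusters x K-MEANS clusters
print("adjusted Rand index:", adjusted_rand_score(leaves, km_clusters))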
The two methods give equivalent results; the gain in interpretability of the tree is not
counterbalanced by a degradation of the quality of the partition. Another advantage of the
tree in this case is that it selects the relevant variables automatically.
Visualization of groups
Factorial analysis allows us to visualize the dataset in a reduced dimension space. We want
to see if we can perceive the expert groups in the first two “latent” variables.
[Scatter plot: the individuals on the first two factorial axes, colored by expert group]
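Such a plot can be reproduced with matplotlib, continuing the sketches above (the column name "type" is still an assumption about the local file):

import matplotlib.pyplot as plt

groups = zoo["type"].to_numpy()
for g in pd.unique(groups):
    mask = groups == g
    plt.scatter(axes[mask, 0], axes[mask, 1], label=str(g), s=20)
plt.xlabel("axis 1")
plt.ylabel("axis 2")
plt.legend(title="expert group")
plt.show()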
This result is striking. The groups proposed by the experts are clearly distinct even on the first
two axes, which summarize only about 50% of the available information (see the MCA results,
44.89%).
While this example shows that visual tools are often very powerful, the main difficulty is then
to come back to the initial description space and obtain results that are interpretable in terms
of the original descriptors. Reading the results of the MCA remains obscure for people who
are not used to it.
The clustering tree approach is a simple method to build clusters automatically and obtain
interpretable results.