Chapter 9 Multivariate Regression Tree - Workshop 10 - Advanced Multivariate Analyses in R
The MRT splits the data into clusters of samples similar in their species composition based on
environmental value thresholds. It involves two procedures running at the same time: 1) the
computation of the constrained partitioning of the data, and 2) the calculation of the relative
error of the successive partitioning levels by multiple cross-validations. The cross-validation is,
in essence, aimed at identifying the best predictive tree. The “best” tree varies depending on your
study goals: usually you want a tree that is parsimonious, but that still has an informative number
of groups. This is, of course, a subjective decision to make according to the question you are
trying to answer.
First, the method computes all possible partitions of the sites into two groups. For each
quantitative explanatory variable, the sites are sorted in ascending order of the variable's
values; for categorical variables, the sites are aggregated by levels so that all combinations
of levels can be tested. The method splits the data after the first object, the second object,
and so on, and computes the sum of the within-group sums of squared distances to the group mean
(within-group SS) for the response data. It retains the partition into two groups that minimizes
the within-group SS, along with the threshold value (or level) of the explanatory variable that
produces it. These steps are repeated within the two subgroups formed previously, until all
objects form their own group. In other words, this process ends when each leaf of the tree
contains one object. A toy sketch of this split search is shown below.
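To make the split search concrete, here is a small, self-contained sketch (our own illustration,
not mvpart's actual internals) that finds the threshold of one quantitative variable minimizing
the within-group SS of a multivariate response:

# Toy illustration of the MRT split search for one quantitative variable.
# x: a quantitative explanatory variable; Y: the (sites x species) response matrix.
best_split <- function(x, Y) {
    ord <- order(x)
    x <- x[ord]
    Y <- as.matrix(Y)[ord, , drop = FALSE]
    # Within-group SS: sum of squared distances to the column (species) means
    ss <- function(M) sum(scale(M, scale = FALSE)^2)
    n <- nrow(Y)
    # Try splitting after the 1st, 2nd, ..., (n - 1)th sorted object
    wss <- sapply(seq_len(n - 1), function(i)
        ss(Y[1:i, , drop = FALSE]) + ss(Y[(i + 1):n, , drop = FALSE]))
    i <- which.min(wss)
    list(threshold = mean(x[i:(i + 1)]), within_SS = wss[i])
}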
The next step is to perform a cross-validation to identify the best predictive tree. The cross-
validation procedure consists of using a subset of the objects to construct the tree, and then
allocating the remaining objects to the groups. In a good predictive tree, objects are assigned to
the appropriate groups. The cross-validated relative error (CVRE) is the measure of this
predictive error. Without cross-validation, one would retain the number of partitions minimizing
the variance not explained by the tree (i.e. the relative error: the sum of the within-group SS
over all leaves divided by the overall SS of the data). This is the solution that maximizes the
R², so to speak.
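In symbols (our notation; the CVRE follows De'ath's (2002) definition, with y_is the abundance of
species s at site i, ŷ_is the tree's prediction for a held-out site, and ȳ_s the species mean):

RE = \frac{\sum_{j=1}^{k} SS_j}{SS_{\mathrm{total}}}

CVRE = \frac{\sum_{i=1}^{n} \sum_{s=1}^{p} (y_{is} - \hat{y}_{is})^2}{\sum_{i=1}^{n} \sum_{s=1}^{p} (y_{is} - \bar{y}_s)^2}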
9.2 MRT in R
The function mvpart() from the package mvpart computes both the partition and the cross-
validation steps required to build a multivariate regression tree.
We will demonstrate the process of building a multivariate regression tree on the Doubs River
data.
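The call that produced the output below is not shown in the extracted text; here is a minimal
sketch, assuming spe.hel (the Hellinger-transformed Doubs species matrix) and env (the
environmental variables) are already in the workspace. The argument values are assumptions:
xvmult = 100 is chosen to match the 100 cross-validation runs tallied in the output.

library(mvpart)

# Build the tree and pick the tree size interactively (xv = "pick")
doubs.mrt <- mvpart(as.matrix(spe.hel) ~ ., data = env,
    xv = "pick", # pick the tree size interactively
    xval = nrow(spe.hel), # leave-one-out cross-validation (assumption)
    xvmult = 100, # number of multiple cross-validations
    which = 4, legend = FALSE, margin = 0.01)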
## X-Val rep : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## Minimum tree sizes
## tabmins
## 2 3 4 5 6 7 8 9 10
## 7 1 2 4 11 9 23 4 39
At this point, you will need to select the tree with an appropriate number of groups, depending
on the aim of your study. In other words, you must prune the tree by picking the best-fit tree. A
fully resolved tree is not the desirable outcome; instead, one is usually interested in a tree
including only informative partitions/groups. In such cases, it is possible to have an a priori
idea of the number of potential groups to be retained. You can make this choice interactively,
with the argument xv = "pick".
The resulting figure shows the relative error RE (in green) and the cross-validated relative
error CVRE (in blue) of trees of increasing size. The red dot indicates the solution with the
smallest CVRE, and the orange dot shows the smallest tree within one standard error of the
minimum CVRE. It has been suggested that, instead of choosing the solution minimizing the CVRE,
it would be more parsimonious to opt for the smallest tree for which the CVRE is within one
standard error of the tree with the lowest CVRE (Breiman et al. 1984). The green bars at the top
indicate the number of times each size was chosen during the cross-validation process. This
graph is interactive, which means you will have to click on the blue point corresponding to your
choice of tree size.
We don’t have an a priori expectation about how to partition these data, so we’ll select the
smallest tree within 1 standard error of the overall best-fit tree (i.e. the orange dot). We can
select this tree directly with the xv = "1se" argument, as sketched below.
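A sketch of the non-interactive version, under the same assumptions as the call above:

# Select the smallest tree within 1 SE of the minimum CVRE automatically
doubs.mrt <- mvpart(as.matrix(spe.hel) ~ ., data = env,
    xv = "1se", # smallest tree within 1 SE of the best tree
    xval = nrow(spe.hel), xvmult = 100,
    which = 4, legend = FALSE, margin = 0.01)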
## X-Val rep : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## Minimum tree sizes
## tabmins
## 2 3 5 6 7 8 9 10
## 10 2 3 13 6 26 4 36
The statistics at the bottom of the figure are: the residual error, the cross-validated error, and
the standard error. This tree has only two leaves separated by one node. Each leaf is
characterized by a small barplot showing the abundances of the species included in the group,
the number of sites in the group, and the group’s relative error. From this figure, we can report
the following statistics:
* The species matrix is partitioned according to an altitude threshold (361.5 m)
* Residual error = 0.563, which means the model’s R² is 0.437 (1 − 0.563 = 0.437), i.e. the tree explains 43.7% of the variance
We can also compare solutions, to help us choose the best tree. For example, let’s take a look
at a 10-group solution!
# Trying 10 groups
mvpart(as.matrix(spe.hel) ~ ., data = env,
    xv = "none", # no cross-validation
    size = 10, # set tree size
    which = 4) # label all nodes
This tree is much harder to interpret, because there are so many groups! Although this version
of the tree offers higher explanatory power, its predictive power (CV Error = 0.671) is basically
the same as the previous two-group solution (CV Error = 0.673). This suggests that we may
want to try a tree with a few more groupings than the two-group solution, while staying lower
than 10 groups.
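The tree discussed next is such an intermediate solution. The exact size used is not shown in the
extracted text; a four-group tree is one plausible choice (hypothetical):

# Trying an intermediate solution (size = 4 is a hypothetical example)
mvpart(as.matrix(spe.hel) ~ ., data = env,
    xv = "none",
    size = 4, # e.g. a 4-group solution
    which = 4)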
This tree is much easier to interpret! It also offers higher explanatory power (lower residual
error) than our original solution, and higher predictive power (lower CV error) than both
previous solutions. We have a winner!
To find out how much variance is explained by each node in the tree, we need to look at the
complexity parameter (CP). The CP at nsplit = 0 is the R² of the entire tree.
# Checking the complexity parameter
doubs.mrt$cptable
The summary then outlines, for each node, the best threshold values to split the data.
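This is standard rpart-style output; assuming doubs.mrt is the tree fitted above, the call below
prints it:

# Full summary: the cptable plus, for each node, the best split thresholds
summary(doubs.mrt)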
##
## CP nsplit rel error xerror xstd
## 1 0.4369561 0 1.0000000 1.0758122 0.07493568
## 2 0.1044982 1 0.5630439 0.6755865 0.09492709
##
## Node number 1: 29 observations, complexity param=0.4369561
## Means=0.07299,0.2472,0.2581,0.2721,0.07133,0.06813,0.06897,0.07664,0.1488,0.2331,0.
## left son=2 (15 obs) right son=3 (14 obs)
## Primary splits:
## alt < 361.5 to the right, improve=0.4369561, (0 missing)
## deb < 23.65 to the left, improve=0.4369561, (0 missing)
##
## Node number 2: 15 observations
## Means=0.1208,0.4463,0.4194,0.4035,0.1104,0.09023,0,0.02108,0.1256,0.2164,0.04392,0.
##
## Node number 3: 14 observations
## Means=0.02179,0.03391,0.08514,0.1313,0.02945,0.04444,0.1429,0.1362,0.1736,0.2509,0.
You might also be interested in finding out which species are significant indicator species for
each grouping of sites.
# Calculate indicator values (indval) for each species, using the labdsv package
library(labdsv)
doubs.mrt.indval <- indval(spe.hel, doubs.mrt$where)
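The output below lists, for each significant indicator species, the leaf it indicates. A sketch of
how to extract this, using the standard components of a labdsv indval object (pval, maxcls,
indcls); the first command produces the output shown, and the second retrieves the indicator
values discussed afterwards:

# Leaf (cluster) indicated by each significant species (p <= 0.05)
doubs.mrt.indval$maxcls[which(doubs.mrt.indval$pval <= 0.05)]

# Indicator value of each significant species in its leaf
doubs.mrt.indval$indcls[which(doubs.mrt.indval$pval <= 0.05)]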
## TRU VAI LOC HOT TOX BAR SPI GOU BRO PER BOU PSO ROT CAR BCO PCH GRE GAR BBO ABL
## 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## ANG
## 2
TRU has the highest indicator value (0.867) overall, and is an indicator species for the first (alt
>= 361.5) leaf of the tree.
9.3 Challenge 4
Create a multivariate regression tree for the mite data.
* Select the smallest tree within 1 SE of the CVRE.
* What is the proportion of variance (R²) explained by this tree?
* How many leaves does it have?
* What are the top 3 discriminant species?
A sketch of one possible approach is shown below.
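# A sketch of one possible solution, assuming the mite data shipped with
# vegan and the same workflow as for the Doubs data:
library(vegan)
library(mvpart)
data(mite) # mite species abundances
data(mite.env) # environmental variables
mite.hel <- decostand(mite, method = "hellinger") # Hellinger transformation

mite.mrt <- mvpart(as.matrix(mite.hel) ~ ., data = mite.env,
    xv = "1se", # smallest tree within 1 SE of the minimum CVRE
    xval = nrow(mite.hel), xvmult = 100,
    which = 4, legend = FALSE, margin = 0.01)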
## X-Val rep : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Which species are significant indicator species for each grouping of sites?
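A sketch following the same steps as before, assuming mite.mrt from the code above:

# Indicator species per leaf (p <= 0.05), as for the Doubs data
mite.mrt.indval <- indval(mite.hel, mite.mrt$where)
mite.mrt.indval$maxcls[which(mite.mrt.indval$pval <= 0.05)]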
## 2 2 2 1 1 2 2 2
## Miniglmn LRUG Ceratoz3 Trimalc2
## 2 1 1 1
References
Breiman, Leo, Jerome Friedman, Charles J. Stone, and Richard A. Olshen. 1984. Classification
and Regression Trees. CRC Press.
De’ath, Glenn. 2002. “Multivariate Regression Trees: A New Technique for Modeling
Species–Environment Relationships.” Ecology 83 (4): 1105–17.
All the content of the workshop series is under a Creative Commons Attribution-NonCommercial-
ShareAlike 4.0 International License.