
How to compute the complexity parameter α?: Study Notes of CART

The document discusses how to compute the complexity parameter (α) for CART decision tree models. It presents a formula to calculate α as the ratio of the difference between the risk of a node (R(t)) and its subtree (R(Tt)) to the number of terminal nodes in the subtree minus one. It proves this formula works by showing that increasing α increases the risk of subtrees faster than individual nodes, until their risks are equal. An example calculation on a sample dataset with 5 terminal nodes demonstrates applying the formula to compute α for each node.

Study Notes of CART (2):

How to compute the complexity parameter α?


by Yin Zhao
School of Mathematical Sciences
USM, Penang, Malaysia
December 2013

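Proposition: The complexity parameter α (cp in the R “rpart” package is this quantity rescaled by the risk of the root node) is

$$\alpha = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}$$

where $R(t)$ is the risk of node $t$, $R(T_t)$ is the risk of the branch $T_t$ rooted at $t$, and $|\tilde{T}_t|$ is the number of terminal nodes of $T_t$.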

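Proof:
Recall the definition $R_\alpha(T) = R(T) + \alpha|\tilde{T}|$, where $|\tilde{T}|$ is the number of terminal nodes of $T$, and let $T_t$ be the branch rooted at node $t$. For any single node $t \in T$, we have

$$R_\alpha(t) = R(t) + \alpha \qquad (1)$$

since a single node counts as exactly one terminal node.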


Similarly, for any branch $T_t \subseteq T$, we have

$$R_\alpha(T_t) = R(T_t) + \alpha|\tilde{T}_t| \qquad (2)$$

When $\alpha = 0$, $R_0(t) = R(t) > R(T_t) = R_0(T_t)$. This inequality is guaranteed because the first step of pruning is to prune off the terminal nodes $t_L$ and $t_R$ of every node $t$ that satisfies $R(t) = R(t_L) + R(t_R)$. That is, every remaining internal node satisfies $R(t) > R(t_L) + R(t_R)$ (the details can be found in the previous notes). Furthermore, the inequality $R_\alpha(t) > R_\alpha(T_t)$ continues to hold for sufficiently small $\alpha$.
If we then gradually increase $\alpha$, $R_\alpha(T_t)$ increases faster than $R_\alpha(t)$ because its coefficient $|\tilde{T}_t|$ is greater than 1. Hence, at a certain value of $\alpha$ we will have $R_\alpha(T_t) = R_\alpha(t)$. Equating (1) and (2), we have

$$R(t) + \alpha = R(T_t) + \alpha|\tilde{T}_t|$$

$$\Rightarrow (|\tilde{T}_t| - 1) \cdot \alpha = R(t) - R(T_t)$$

$$\Rightarrow \alpha = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}$$

as desired.
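The crossover argument can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `penalized_risk` and `critical_alpha` and the sample risks (3/10 for the node, 1/10 for a branch with 3 terminal nodes) are hypothetical values chosen only for illustration. It evaluates equations (1) and (2) below, at, and above the α given by the proposition.

```python
from fractions import Fraction


def penalized_risk(risk, n_leaves, alpha):
    """Cost-complexity measure R_alpha = R + alpha * (number of terminal nodes)."""
    return risk + alpha * n_leaves


def critical_alpha(node_risk, branch_risk, n_leaves):
    """Alpha at which node t and its branch T_t have equal penalized risk."""
    if n_leaves < 2:
        raise ValueError("a branch must have at least two terminal nodes")
    return (node_risk - branch_risk) / (n_leaves - 1)


# Hypothetical values for illustration only (not taken from the notes).
node_risk = Fraction(3, 10)    # R(t)
branch_risk = Fraction(1, 10)  # R(T_t)
n_leaves = 3                   # number of terminal nodes of T_t

alpha_star = critical_alpha(node_risk, branch_risk, n_leaves)

for alpha in (alpha_star / 2, alpha_star, 2 * alpha_star):
    r_node = penalized_risk(node_risk, 1, alpha)            # equation (1)
    r_branch = penalized_risk(branch_risk, n_leaves, alpha)  # equation (2)
    print(f"alpha={alpha}: R_alpha(t)={r_node}, R_alpha(T_t)={r_branch}")
```

Exact fractions are used so that the equality at the critical α is not blurred by floating-point rounding. Below that value the branch has the smaller penalized risk, and above it the single node does.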
Example:
This example shows how to calculate the complexity parameter α (see Figure 1 below). The data set has 2 classes, say A and B, and 200 samples in all. $T_1$ is a subtree of the whole tree $T$ with 5 terminal nodes, say $t_5$, $t_6$, $t_7$, $t_8$, and $t_9$.

Figure 1: Subtree $T_1$ with 5 leaves

According to the formula, we have

$$\alpha(T_1(t_1)) = \frac{R(t_1) - R(T_{t_1})}{5 - 1} = \frac{100/200 - 0}{4} = 1/8$$

$$\alpha(T_1(t_2)) = \frac{R(t_2) - R(T_{t_2})}{3 - 1} = \frac{10/200 - 0}{2} = 1/40$$

$$\alpha(T_1(t_3)) = \frac{R(t_3) - R(T_{t_3})}{2 - 1} = \frac{60/200 - 0}{1} = 3/10$$

$$\alpha(T_1(t_4)) = \frac{R(t_4) - R(T_{t_4})}{2 - 1} = \frac{2/200 - 0}{1} = 1/100$$
$\alpha(T_1(t_4))$ is the first value of α because it is the smallest of the four. That is, we prune the tree below node $t_4$, which turns $t_4$ into a terminal node. The same calculation is then repeated on the pruned tree, which is pruned again in the next iteration.
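The arithmetic of this example can be reproduced with a short Python sketch. It uses only the numbers quoted in the example: the risk numerators 100, 10, 60 and 2 over 200 samples, a branch risk of 0 for every node, and the terminal-node counts 5, 3, 2 and 2. The dictionary layout and the names `nodes`, `critical_alpha` and `weakest` are assumptions made for illustration.

```python
from fractions import Fraction

N_SAMPLES = 200

# node name -> (numerator of R(t), number of terminal nodes of T_t),
# copied from the worked example; R(T_t) is 0 for every node there.
nodes = {
    "t1": (100, 5),
    "t2": (10, 3),
    "t3": (60, 2),
    "t4": (2, 2),
}


def critical_alpha(node_num, branch_num, n_leaves, n_samples=N_SAMPLES):
    """alpha = (R(t) - R(T_t)) / (|T_t leaves| - 1), computed with exact fractions."""
    return Fraction(node_num - branch_num, n_samples) / (n_leaves - 1)


alphas = {
    name: critical_alpha(num, 0, n_leaves)
    for name, (num, n_leaves) in nodes.items()
}

# The node with the smallest alpha is the one pruned first.
weakest = min(alphas, key=alphas.get)

for name, alpha in alphas.items():
    print(name, alpha)
print("prune below:", weakest)
```

Running it prints 1/8, 1/40, 3/10 and 1/100 for $t_1$ to $t_4$ and selects $t_4$, matching the hand calculation. Repeating the procedure on the pruned tree would require the node risks of Figure 1, which are not reproduced here.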
