Tree-Based Methods: CHAID: Categorical Response Variable, Categorical Explanatory Variables, Create a Decision Tree
CHAID
Credit Rating:
[Figure: tree diagram]
Algorithm:
[Figure: response categories Bad, Poor, Good, V.Good, split by NT=0 and NT 1; tables of bad, poor, good, v.good for explanatory variables X1 and X2]
Repeat step 2 if the new table has more than two columns.
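The repeated merging can be sketched in Python as below. The description of step 2 itself did not survive on this slide, so the sketch assumes the usual CHAID rule: compare pairs of predictor categories with a chi-squared test, merge the least significantly different pair, and repeat while more than two columns remain. The function names and the threshold `alpha_merge` are hypothetical, and for an ordinal predictor only adjacent categories are treated as mergeable.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for k in range(df // 2):
        total += math.exp(-y) * term
        term *= y / (k + 1.5)
    return total


def pair_p_value(col_a, col_b):
    """P-value of the chi-squared test comparing two predictor categories.

    Each column holds the response counts (e.g. bad, poor, good, v.good).
    """
    rows = [(a, b) for a, b in zip(col_a, col_b) if a + b > 0]
    tot_a = sum(a for a, _ in rows)
    tot_b = sum(b for _, b in rows)
    if len(rows) < 2 or tot_a == 0 or tot_b == 0:
        return 1.0  # nothing to distinguish the two categories
    n = tot_a + tot_b
    stat = 0.0
    for a, b in rows:
        for observed, col_total in ((a, tot_a), (b, tot_b)):
            expected = (a + b) * col_total / n
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, len(rows) - 1)


def merge_categories(columns, labels, ordinal, alpha_merge=0.05):
    """Merge the most similar pair of categories until none qualifies.

    alpha_merge is a hypothetical threshold, not taken from the slides.
    """
    columns = [list(c) for c in columns]
    labels = list(labels)
    while len(columns) > 2:
        if ordinal:
            pairs = [(i, i + 1) for i in range(len(columns) - 1)]
        else:
            pairs = list(combinations(range(len(columns)), 2))
        p, (i, j) = max((pair_p_value(columns[i], columns[j]), (i, j))
                        for i, j in pairs)
        if p <= alpha_merge:
            break  # every remaining pair differs significantly
        columns[i] = [a + b for a, b in zip(columns[i], columns[j])]
        labels[i] = labels[i] + "+" + labels[j]
        del columns[j], labels[j]
    return columns, labels
```

For example, `merge_categories([[10, 90], [12, 88], [40, 60], [42, 58]], ["C1", "C2", "C3", "C4"], ordinal=True)` first merges C3 with C4, then C1 with C2, and stops once two columns remain.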
[Figure: response (bad, poor, good, v.good) by merged categories C1+C2, C3, C4+C5+C6]
Compute a "Bonferroni" adjusted chi-squared test of independence for the reduced table for each explanatory variable.
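A minimal Python sketch of a Bonferroni-adjusted chi-squared test on a reduced table follows. The slides do not give the form of the adjustment; the multipliers used here are the ones commonly attributed to Kass's original CHAID paper (the number of ways to reduce c categories to r groups) and should be read as an assumption. All function names are hypothetical, rows are response categories, columns are the merged predictor categories, and every row and column total is assumed to be positive.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for k in range(df // 2):
        total += math.exp(-y) * term
        term *= y / (k + 1.5)
    return total


def chi2_independence(table):
    """Pearson chi-squared statistic, degrees of freedom, and p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def bonferroni_multiplier(c, r, ordinal):
    """Number of ways to reduce c original categories to r groups (assumed form)."""
    if ordinal:
        # only adjacent categories can be merged
        return math.comb(c - 1, r - 1)
    # nominal: any categories can be merged (Stirling number of the second kind)
    signed = sum((-1) ** i * math.comb(r, i) * (r - i) ** c for i in range(r + 1))
    return signed // math.factorial(r)


def adjusted_p(table, c, ordinal):
    """Bonferroni-adjusted p-value for a reduced table built from c categories."""
    _, _, p = chi2_independence(table)
    r = len(table[0])
    return min(1.0, p * bonferroni_multiplier(c, r, ordinal))
```

Under these assumptions, reducing six categories to the three groups C1+C2, C3, C4+C5+C6 gives a multiplier of `bonferroni_multiplier(6, 3, ordinal=True)` = 10 for an ordinal predictor and 90 for a nominal one.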
- Repeat steps 1-5 for each of the offspring nodes.
Stop if
- no variable is significant in step 4.
- the number of cases reaching a node is below a specified limit.
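The two stopping rules can be sketched as a small Python helper that either names the variable to split on or signals that the node should stay a leaf. The function name, the significance level `alpha`, and the case limit `min_cases` are hypothetical; the slides give no specific values.

```python
def choose_split(adjusted_p_values, n_cases, alpha=0.05, min_cases=50):
    """Return the predictor to split on, or None if the node should not be split.

    adjusted_p_values maps each explanatory variable to its Bonferroni-adjusted
    p-value; alpha and min_cases are illustrative defaults (assumptions).
    """
    if n_cases < min_cases or not adjusted_p_values:
        return None  # too few cases reach this node
    best = min(adjusted_p_values, key=adjusted_p_values.get)
    if adjusted_p_values[best] >= alpha:
        return None  # no variable is significant
    return best
```

A recursive tree builder would call such a helper at every node and recurse into the offspring nodes only when it returns a variable name.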
Summary:
CHAID is an algorithm
Must categorize every variable
- ordinal variables
- nominal variables
At each node it tries to find
- best explanatory variable
- best merger of categories
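Because every variable must be categorized, a continuous measurement such as age has to be binned before CHAID can use it. The Python sketch below is one way to do that. The labels '36-55' and 'over 55' match the age formats in the SAS program in these notes; the label for the first group and the cut points themselves are assumptions, and the names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the first two age groups (assumed cut points).
AGE_UPPER_BOUNDS = [35, 55]

# Labels for codes 2 and 3 follow the SAS formats; the label for code 1 is assumed.
AGE_LABELS = {1: "35 or under", 2: "36-55", 3: "over 55"}


def categorize_age(age):
    """Map an age in years to an ordinal category code 1, 2, or 3."""
    return bisect_left(AGE_UPPER_BOUNDS, age) + 1
```

For instance, `AGE_LABELS[categorize_age(40)]` gives `'36-55'`. The resulting codes are ordinal, so only adjacent groups would be candidates for merging.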
/* chaidwis.sas */

/* ... device=WIN target=ps rotate=landscape */
/* ... targetdevice=ps300 rotate=landscape */

DATA SET1;
  INFILE 'c:\courses\st557\sas\drivall.dat';
  INPUT AGE ...;
  LABEL SEX = ...
        D = DRIVER GROUP
        V = VIOLATION STATUS
        R = RESIDENTIAL AREA
        X = COUNT;
run;

proc format;
  value ...
        2 = '36-55'
        3 = 'over 55';
  value d
        1 = 'Disease'
        2 = 'Control';
  value v
        1 = 'Some'
        2 = 'None';
  value r
        1 = '> 150000'
        2 = '39-150000'
        3 = '10-39000'
        4 = '< 10000'
        5 = 'rural';
run;

%inc 'c:\courses\st557\sas\xmacro.sas';
/* %treedisc(intree=trd, ... */
%treedisc(intree=trd, draw=graphics);
run;
TREEDISC Analysis
Values of
AGE :
Values of
R :
Values of
D :
SEX :
AGE
4 5
2
1
Ordinal 57.39 0.0001
Nominal 36.80 0.0001
Type
Nominal 4.40 0.0359
Predictor
SEX
Values of
Ordinal 2.53 0.4458
2
Best split:
V
New node: 3
DV values:
Chi-Square Adjusted p
AGE = 2
DV count:
New node: 2
1864
133
656
AGE = 1
DV count:
147
20
Predictor
Type
Type
Predictor
Chi-Square Adjusted p
Ordinal 1.41 0.7031
Nominal 0.06 0.8101
Chi-Square Adjusted p
SEX
Nominal 41.59 0.0001
Nominal 0.01 0.9193
Ordinal 0.15 0.9975
Best split:
Best split:
New node: 5
SEX = 1
DV count:
New node: 4
102
302
31
354
SEX = 2
DV count:
AGE value(s): 2
Variable (DV) V
DV counts: 147
3
1864
2
2520
SEX value(s): 2
DV counts: 20
563
656
AGE value(s): 2
DV counts: 14
284
354
AGE value(s): 3
DV counts: 6
279
302
D value(s): 2
DV counts: 0
127
D value(s): 1
DV counts: 6
D value(s): 2
152
DV counts: 18
217
D value(s): 1
22
DV counts: 40
R value(s): 2
DV counts: 1
3
111
AGE value(s): 3
DV counts: 69
839
R value(s): 5
DV counts: 2
245
19
R value(s): 1
DV counts: 20
SEX value(s): 1
DV counts: 127
139
1301
DV counts: 49
AGE value(s): 2
700
462