02 DecisionTrees Done
A decision tree is a supervised learning algorithm used for
classification and regression modeling.
Sample Dataset
• Columns denote features Xi
• Rows denote labeled instances
• Class label denotes whether a tennis game was played
Decision Tree
• A possible decision tree for the data:
[Figure: decision tree for the tennis data]
Decision Tree
• If features are continuous, internal nodes can
test the value of a feature against a threshold
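As a minimal sketch of a threshold test, the hypothetical tree below compares a numeric feature against a cutoff at each internal node. The feature names (`temp`, `humidity`) and the cutoffs (25.0 and 70.0) are illustrative assumptions, not values taken from the data set.

```python
# Internal node: (feature, threshold, left_subtree, right_subtree).
# Leaf: a plain class label string.
# All feature names and cutoffs here are made up for illustration.
TREE = ("temp", 25.0,
        "Yes",                             # taken when temp <= 25.0
        ("humidity", 70.0, "Yes", "No"))   # taken when temp > 25.0

def predict(node, x):
    """Walk from the root to a leaf, going left when the test value <= threshold."""
    while not isinstance(node, str):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node
```

For instance, `predict(TREE, {"temp": 30.0, "humidity": 80.0})` follows the right branch at the root and then the right branch at the humidity test.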
Decision Tree Induced Partition
Decision Tree – Decision Boundary
• Decision trees divide the feature space into axis-parallel
(hyper-)rectangles
• Each rectangular region is labeled with one label
– or a probability distribution over labels
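The partition into rectangles can be made concrete by tracking the bounds implied by each test on the path from the root to a leaf. The sketch below does this for a hypothetical two-feature tree; the feature names `x1` and `x2` and the thresholds are assumptions chosen only for illustration.

```python
import math

# Internal node: (feature, threshold, left, right); leaf: label string.
# Hypothetical tree over two continuous features.
TREE = ("x1", 2.0,
        ("x2", 1.0, "A", "B"),   # taken when x1 <= 2.0
        "C")                     # taken when x1 > 2.0

def leaf_region(node, x):
    """Return (label, bounds) where bounds[f] is the (low, high] interval for feature f."""
    bounds = {"x1": (-math.inf, math.inf), "x2": (-math.inf, math.inf)}
    while not isinstance(node, str):
        feature, threshold, left, right = node
        low, high = bounds[feature]
        if x[feature] <= threshold:
            bounds[feature] = (low, min(high, threshold))   # tighten upper bound
            node = left
        else:
            bounds[feature] = (max(low, threshold), high)   # tighten lower bound
            node = right
    return node, bounds
```

Each leaf's region is a product of one interval per feature, which is why the resulting decision boundaries are parallel to the axes.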
[Figure: decision boundary]
Another Example: Restaurant Domain (Russell & Norvig)
Model a patron’s decision of whether to wait for a table at a restaurant
Day  Outlook  Temp  Humidity  Wind    Play Tennis
1    Sunny    Hot   High      Weak    No
2    Sunny    Hot   High      Strong  No
Example
[Figure: four candidate questions (Question 1 to Question 4), each with Yes/No branches; Question 1 and Question 2 are annotated E=1]
Information Gain
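For reference, the standard textbook definitions of entropy and information gain can be written in the E and G notation used in the worked numbers that follow. The symbols k (number of classes), p_i (fraction of examples in S belonging to class i), and S_v (subset of S for which attribute A takes value v) are the usual conventions and are assumed here rather than taken from the slides.

```latex
E(S) = -\sum_{i=1}^{k} p_i \log_2 p_i
\qquad
G(S, A) = E(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, E(S_v)
```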
Wind
• Entropy before the split: E=0.954
• Entropy of the two branches: E=0.811, E=1
• G(S, Wind) = 0.048
Humidity
• Entropy of the two branches: E=0.985, E=0.592
• G(S, Humidity) = 0.151
Temp
• Entropy of the three branches: E=1, E=0.92, E=0.81
• G(S, Temp) = 0.042
Outlook
• Entropy of the three branches: E=0.971, E=0 (Overcast), E=0.971
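The branch entropies above can be reproduced from class counts. The sketch below is a plain-Python version of the standard entropy and information gain computations; the function names and the example counts are assumptions, since the slides show only the resulting E and G values.

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    # Empty classes contribute nothing (0 * log 0 is treated as 0).
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child)
                    for child in children_counts)
    return entropy(parent_counts) - remainder
```

With assumed counts, `entropy([1, 1])` is 1 (an even split), `entropy([4, 0])` is 0 (a pure node), and `entropy([2, 3])` rounds to 0.971. A split that separates the classes perfectly, such as `information_gain([4, 4], [[4, 0], [0, 4]])`, yields a gain of 1, while a split that leaves each child as mixed as the parent yields 0.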
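[Figure: resulting tree. Root: Outlook. The Overcast branch is a Yes leaf; the other two branches lead to Humidity and Wind tests, each ending in No and Yes leaves]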
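One way to represent the tree above in code is a nested dictionary keyed by attribute and then by attribute value. In this sketch, the Outlook root, the Overcast leaf labeled Yes, and the Humidity and Wind subtrees come from the figure; the placement of those subtrees under Sunny and Rain, the value names Rain and Normal, and the mapping of values to leaves are assumptions based on the usual form of this data set.

```python
# Internal node: {attribute: {value: subtree}}; leaf: "Yes" or "No".
TREE = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},   # placement assumed
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},        # placement assumed
}}

def classify(tree, example):
    """Follow the branch matching the example's value until a leaf is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))   # the single attribute tested at this node
        tree = tree[attribute][example[attribute]]
    return tree

# The two rows that appear in the sample table.
day1 = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak"}
day2 = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Strong"}
```

Under these assumptions, both `classify(TREE, day1)` and `classify(TREE, day2)` return "No", which agrees with the Play Tennis labels for days 1 and 2.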
Example
This dataset is originally from the National Institute of Diabetes and Digestive
and Kidney Diseases. The objective of the dataset is to diagnostically predict
whether or not a patient has diabetes, based on certain diagnostic
measurements included in the dataset. Several constraints were placed on the
selection of these instances from a larger database. In particular, all patients
here are females at least 21 years old of Pima Indian heritage.
Content
The dataset consists of several medical predictor variables and one
target variable, Outcome. Predictor variables include the number of
pregnancies the patient has had, their BMI, insulin level, age, and so on.