CS467 - Module 4: Machine Learning
Decision Trees - Entropy, Information Gain, Tree Construction, ID3, Issues in Decision Tree Learning - Avoiding Overfitting, Reduced Error Pruning, The Problem of Missing Attributes, Gain Ratio, Classification and Regression Trees (CART)
Neural Networks - The Perceptron, Activation Functions, Training Feedforward Networks by Backpropagation.
Decision Trees
A decision tree is a classifier in the form of a tree structure with two types of nodes:
Decision node: specifies a test on an attribute, with one branch for each possible outcome. Each internal node corresponds to an attribute and has one child node for each value of that attribute.
Leaf node: assigns a class label (a value of the target attribute) to every example that reaches it.
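This structure can be written down as a small Python sketch; the class and function names below are illustrative, not part of the original notes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One node of a decision tree: a decision node tests an attribute and has one
    child per attribute value; a leaf node carries only a class label."""
    attribute: Optional[str] = None                 # attribute tested at a decision node
    children: dict = field(default_factory=dict)    # attribute value -> child Node
    label: Optional[str] = None                     # class label stored at a leaf node

    def is_leaf(self) -> bool:
        return self.label is not None

def classify(node: Node, example: dict) -> str:
    """Walk from the root, following the branch that matches the example's value
    for the attribute tested at each decision node, until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[example[node.attribute]]
    return node.label
```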
ID3 Algorithm
1) Select A, the "best" decision attribute for the next node (the attribute with the highest information gain).
2) Assign A as the decision attribute for the node.
3) For each value of A, create a new descendant (child) node.
4) Sort the training examples to the descendant nodes according to their attribute values.
5) If all training examples are perfectly classified, stop; else iterate over the new leaf nodes.
When to stop: when all the examples at a node belong to the same class, or when there are no remaining attributes to split on.
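A minimal recursive sketch of these steps, reusing the Node class from the sketch above; entropy and information_gain implement the formulas given in the next subsection, and the names are illustrative rather than from the original notes.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels: -sum of p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy obtained by splitting the examples on `attribute`."""
    gain = entropy([e[target] for e in examples])
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

def id3(examples, attributes, target):
    """Steps 1-5 above: pick the best attribute, branch on its values, recurse."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:                  # perfectly classified -> leaf node
        return Node(label=labels[0])
    if not attributes:                         # no attributes left -> majority-class leaf
        return Node(label=Counter(labels).most_common(1)[0][0])
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    node = Node(attribute=best)
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        node.children[value] = id3(subset, [a for a in attributes if a != best], target)
    return node
```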
Entropy
Entropy is a measure of the uncertainty (impurity) in a collection of training examples.
This measure is used to select among the candidate attributes at each step while growing the tree.
Information gain measures how well a given attribute separates the training examples according to their target classification.
Gain is a measure of how much splitting on an attribute reduces this uncertainty; for a two-class problem its value lies between 0 and 1.
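Written out, the standard definitions for a collection of examples S containing c classes and a candidate attribute A are:

```latex
Entropy(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
\quad\text{where } p_i \text{ is the proportion of examples in } S \text{ that belong to class } i

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v)
\quad\text{where } S_v = \{\, s \in S : A(s) = v \,\}
```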
The decision on whether tennis can be played or not is based on the following features:
Outlook ∈ {Sunny, Overcast, Rain}, Temperature ∈ {Hot, Mild, Cool}, Humidity ∈ {High, Normal} and Wind ∈ {Weak, Strong}. The training data is the well-known 14-example PlayTennis data set.
For selecting the best splitting attribute, find the information gain for each attribute.
Gain(S, Outlook) ≈ 0.246
Gain(S, Humidity) ≈ 0.151
Gain(S, Wind) ≈ 0.048
Gain(S, Temperature) ≈ 0.029
where S denotes the collection of training examples. Outlook has the highest information gain, so it is selected as the root attribute.
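These values can be reproduced with the entropy and information_gain functions from the ID3 sketch above, applied to the standard 14-example PlayTennis table (assumed here to be the training data the notes refer to).

```python
# The 14 PlayTennis training examples (Outlook, Temperature, Humidity, Wind, Play).
rows = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]
examples = [dict(zip(attributes + ["Play"], row)) for row in rows]

for a in attributes:
    print(a, information_gain(examples, a, "Play"))
# Outlook has the highest gain (about 0.25) and Temperature the lowest (about 0.03),
# in line with the values quoted above, so Outlook becomes the root attribute.
```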
Traverse the tree: each path from the root to a leaf is a rule (two example rules are listed below). A decision tree can also be used for feature extraction, since attributes that never appear in the tree are not needed for classification.
IF Outlook = Sunny and Humidity = Normal THEN Play = Yes
IF Outlook = Sunny and Humidity = High THEN Play = No
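Rule extraction is just a traversal that records the attribute tests along each root-to-leaf path. A minimal sketch, again assuming the Node class, id3 function and PlayTennis examples from the sketches above:

```python
def extract_rules(node, conditions=()):
    """Return one IF ... THEN ... rule string per root-to-leaf path."""
    if node.is_leaf():
        body = " and ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN Play = {node.label}"]
    rules = []
    for value, child in node.children.items():
        rules.extend(extract_rules(child, conditions + ((node.attribute, value),)))
    return rules

# Build the tree on the PlayTennis data and list its rules.
tree = id3(examples, attributes, target="Play")
for rule in extract_rules(tree):
    print(rule)
```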
Overfitting
• Learning a tree that classifies the training data perfectly may not lead to the tree with the best generalization performance.
• A hypothesis h is said to overfit the training data if there is another hypothesis h′ such that h has smaller error than h′ on the training data, but h has larger error than h′ on the test data.
• Overfitting corresponds to an overly complex hypothesis: it results in decision trees that are more complex than necessary.
Pre-pruning (early stopping)
• Evaluate splits before installing them and avoid splits that do not look worthwhile.
• Stop if all instances belong to the same class.
• Stop if all the attribute values are the same.
• Stop if the number of instances is less than some user-specified threshold.
Post-pruning
Grow the full tree first, then remove subtrees that do not help on data not used for training. In reduced-error pruning, each decision node is considered for pruning: the subtree rooted at it is replaced by a leaf labelled with the most common class of its training examples, and the change is kept only if the pruned tree performs no worse on a separate validation set. Pruning continues until further pruning becomes harmful.
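A minimal sketch of reduced-error pruning under the same Node representation; the accuracy helper, the choice of leaf label (majority over the subtree's leaves rather than over the node's training examples) and the train/validation split are illustrative assumptions, not from the original notes.

```python
from collections import Counter

def accuracy(tree, examples, target="Play"):
    """Fraction of examples the tree classifies correctly."""
    return sum(classify(tree, e) == e[target] for e in examples) / len(examples)

def leaf_label_counts(node):
    """Counts of class labels over all leaves in the subtree rooted at `node`."""
    if node.is_leaf():
        return Counter([node.label])
    counts = Counter()
    for child in node.children.values():
        counts.update(leaf_label_counts(child))
    return counts

def reduced_error_prune(tree, node, validation):
    """Bottom-up: turn a decision node into a leaf if validation accuracy does not drop."""
    if node.is_leaf():
        return
    for child in node.children.values():
        reduced_error_prune(tree, child, validation)
    majority = leaf_label_counts(node).most_common(1)[0][0]
    before = accuracy(tree, validation)
    saved_attribute, saved_children = node.attribute, node.children
    node.attribute, node.children, node.label = None, {}, majority   # prune: make a leaf
    if accuracy(tree, validation) < before:                          # pruning hurt: undo it
        node.attribute, node.children, node.label = saved_attribute, saved_children, None

# Illustrative use: hold out the last four PlayTennis examples as a validation set.
train, validation = examples[:10], examples[10:]
tree = id3(train, attributes, target="Play")
reduced_error_prune(tree, tree, validation)
```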
Two types of decision trees
Classification trees
Decision trees where the target variable can take a discrete set of values are called
classification trees.
In these tree structures, leaves represent class labels and branches represent
conjunctions of features that lead to those class labels
Regression trees
Decision trees where the target variable can take continuous values like the price of a
house are called regression trees.
Gain ratio
The gain ratio is another feature selection measure used in the construction of classification trees (it is the measure used in C4.5). Information gain is biased towards attributes with many distinct values; the gain ratio compensates by dividing the gain by the split information, i.e. the entropy of the distribution of examples across the branches produced by the split.
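The usual definitions, for an attribute A that splits S into subsets S_1, ..., S_k (one per attribute value), are:

```latex
SplitInfo(S, A) = -\sum_{j=1}^{k} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}
\qquad
GainRatio(S, A) = \frac{Gain(S, A)}{SplitInfo(S, A)}
```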
CART (Classification and Regression Trees) splits each input node into exactly two child nodes, so a CART decision tree is a binary decision tree.
CART Algorithm
Building the decision tree involves a few simple steps:
1) Take labelled input data, with a target variable and a list of features.
2) Best split: for each feature, find the split of its values that best separates the target variable.
3) Best variable: select the feature whose best split separates the target variable best overall.
4) Split the input data into left and right child nodes according to the chosen feature and split point.
5) Continue steps 2-4 on each of the nodes until a stopping criterion is met.
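A minimal sketch of how a single CART split can be chosen for classification, using the Gini impurity, the criterion CART typically uses; the data and feature names below are made up for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_binary_split(rows, target):
    """Find the (feature, value) threshold split that minimises the weighted Gini
    impurity of the two child nodes; rows are dicts with numeric feature values."""
    best = None   # (weighted_gini, feature, value, left_rows, right_rows)
    features = [k for k in rows[0] if k != target]
    for feature in features:
        for value in sorted({r[feature] for r in rows}):
            left = [r for r in rows if r[feature] <= value]
            right = [r for r in rows if r[feature] > value]
            if not left or not right:
                continue
            score = (len(left) * gini([r[target] for r in left])
                     + len(right) * gini([r[target] for r in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, value, left, right)
    return best

# Illustrative usage on made-up numeric data:
rows = [{"x1": 2.0, "x2": 1.5, "y": "A"}, {"x1": 3.1, "x2": 0.4, "y": "A"},
        {"x1": 7.2, "x2": 2.2, "y": "B"}, {"x1": 6.5, "x2": 3.0, "y": "B"}]
print(best_binary_split(rows, target="y")[1:3])   # prints the chosen (feature, value)
```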