
Module 4

Decision Trees- Entropy, Information Gain, Tree construction, ID3, Issues in Decision Tree
learning- Avoiding Over-fitting, Reduced Error Pruning, The problem of Missing Attributes,
Gain Ratio, Classification by Regression (CART)
Neural Networks- The Perceptron, Activation Functions, Training Feed Forward Network by
Back Propagation.

Decision Trees

A decision tree is a classifier in the form of a tree structure with two types of nodes:

Decision node: tests an attribute, with one branch for each possible outcome.

Each internal node corresponds to an attribute, and a child node is created for each
value of that attribute.

Branches in the tree are labelled with the values of the attribute / feature.

Leaf node: indicates a class label.


ID3 Algorithm

1) Select the "best" decision attribute, A, for the current node.

2) Assign A as the decision attribute for the node.

3) For each value of A, create a new child node.



4) Sort the training examples to the child nodes according to the attribute value of the branch.

5) If all training examples are perfectly classified, stop; else repeat the process on the new child nodes. A minimal sketch of this recursion is given below.
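A minimal recursive sketch of these steps, assuming a dataset represented as a list of (attribute-value dict, label) pairs and the entropy / information_gain helpers sketched in the following sections; the names and data layout are illustrative, not taken from the original notes.

from collections import Counter

def id3(examples, attributes):
    """examples: list of (features_dict, label); attributes: list of attribute names."""
    labels = [y for _, y in examples]
    # Stop if all examples share one class or no attributes remain: return a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the highest information gain for this node.
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    # Steps 3-4: create a child for each value of the attribute and sort examples to it.
    for v in {f[best] for f, _ in examples}:
        subset = [(f, y) for f, y in examples if f[best] == v]
        remaining = [a for a in attributes if a != best]
        # Step 5: recurse on the new child node.
        tree[best][v] = id3(subset, remaining)
    return tree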

Issues in Decision tree construction

Which attribute to test at each node

When to stop splitting

How to handle data with missing attribute values

How deep to grow the tree

How to handle continuous attributes

How to handle overfitting

Measures to select attributes

Entropy

Entropy is a measure of the uncertainty (impurity) in a collection of examples. For a collection S containing positive and negative examples,

Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋

where p₊ and p₋ are the proportions of positive and negative examples in S.
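A small sketch of this computation for a collection of class labels; the function name and label representation are only illustrative.

import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

# Example: a set with 9 positive and 5 negative examples.
print(entropy(["+"] * 9 + ["-"] * 5))   # about 0.940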



Information gain

This measure is used to select among the candidate attributes at each step while growing the
tree

Information gain measures how well a given attribute separates the training examples
according to their target classification

Gain is a measure of how much we can reduce uncertainty (its value lies between 0 and 1).

Gain(S, A): expected reduction in entropy due to partitioning S on attribute A

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of possible values of attribute A and S_v is the subset of S for which A has value v.

Which attribute is "best"? Decide between A1 and A2 for a collection S containing 29 positive and 35 negative examples, [29+, 35−].

Entropy([29+, 35−]) = −(29/64) log₂(29/64) − (35/64) log₂(35/64) = 0.99

We want to find Gain(S, A1) and Gain(S, A2) and choose the attribute with the larger gain.

In this example the gain for A1 is higher, so splitting on A1 is best.



Construct a decision tree to predict whether we could play tennis given the atmospheric
conditions.

The decision on whether tennis can be played is based on the following features:
Outlook ∈ {Sunny, Overcast, Rain}, Temperature ∈ {Hot, Mild, Cool}, Humidity ∈ {High,
Normal} and Wind ∈ {Weak, Strong}. The training data is a table of labelled examples.

To select the best splitting attribute, find the information gain of each attribute.



The information gain values for the 4 attributes are:

Gain(S, Outlook) = 0.247
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

where S denotes the collection of training examples. Outlook has the highest gain, so it is selected as the root attribute.



Rule extraction from decision tree

Traverse the tree: each path from the root to a leaf gives one rule. A decision tree can
therefore be used for rule and feature extraction.

IF Outlook = Sunny and Humidity = Normal THEN Play = Yes
IF Outlook = Sunny and Humidity = High THEN Play = No
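A sketch that walks the nested-dict tree built by the ID3 sketch earlier and prints one IF-THEN rule per root-to-leaf path; the tree layout and the target name Play follow the tennis example and are otherwise assumptions.

def extract_rules(tree, conditions=()):
    """Print one rule for each path from the root of `tree` to a leaf."""
    if not isinstance(tree, dict):                # a leaf holds the class label
        body = " and ".join(f"{a} = {v}" for a, v in conditions)
        print(f"IF {body} THEN Play = {tree}")
        return
    (attribute, branches), = tree.items()         # one attribute per decision node
    for value, subtree in branches.items():
        extract_rules(subtree, conditions + ((attribute, value),))

# Example with a small hand-written tree:
extract_rules({"Outlook": {"Sunny": {"Humidity": {"Normal": "Yes", "High": "No"}},
                           "Overcast": "Yes"}})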

Overfitting

• Generalization: How well a model performs on new data

• Learning a tree that classifies the training data perfectly may not lead to the tree with
the best generalization performance

• A hypothesis h is said to overfit the training data if there is another hypothesis, h’,
such that h has smaller error than h’ on the training data but h has larger error on the
test data than h’.

• Overfitting: an overly complex hypothesis. Overfitting results in decision trees that are
more complex than necessary.

• Underfitting: an overly simple hypothesis. Underfitting results in decision trees that are
too simple to capture the structure of the data.



How can we avoid overfitting a decision tree?

• Pre-pruning: stop growing the tree when a data split is not statistically significant.


• Post-pruning: grow the full tree, then remove (prune) nodes.

Pre-Pruning (Early Stopping)

• Evaluate splits before installing them; avoid splits that do not look worthwhile
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
• Stop if the number of instances is less than some user-specified threshold (these criteria correspond to the pre-pruning parameters sketched below)
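In scikit-learn these stopping criteria correspond to constructor parameters of DecisionTreeClassifier; a minimal sketch, with the dataset and parameter values chosen only for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: limit depth and require a minimum number of instances per split / leaf.
clf = DecisionTreeClassifier(
    criterion="entropy",      # information-gain style splitting
    max_depth=3,              # how deep to grow
    min_samples_split=10,     # stop if fewer instances than this threshold
    min_samples_leaf=5,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())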

Post-pruning

• Reduced-error Pruning is a post-pruning, cross validation approach


• Partition training data into “train” set and “validation” set.
• Build a complete tree for the “train” data and then prune subtrees
• Cross validation is used to identify which node to remove
• Let T be the decision tree and S be the subtree which we consider removing
• Find Error(T) using the validation set. Also find Error(T − S)
• If Error(T − S) is smaller, then S is a candidate for removal (a rough sketch follows)
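A rough, simplified bottom-up sketch of this idea on the nested-dict tree representation used earlier; predict, the default label for unseen values, and the use of the routed validation examples for the replacement leaf are all assumptions made for illustration.

from collections import Counter

def predict(tree, features, default="No"):
    """Classify one example with a nested-dict tree; leaves are class labels."""
    while isinstance(tree, dict):
        (attribute, branches), = tree.items()
        tree = branches.get(features[attribute], default)
    return tree

def error(tree, validation):
    """Fraction of validation examples the tree misclassifies."""
    return sum(predict(tree, f) != y for f, y in validation) / len(validation)

def reduced_error_prune(tree, validation):
    """Replace a subtree by a leaf whenever that does not increase validation error."""
    if not isinstance(tree, dict) or not validation:
        return tree
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        routed = [(f, y) for f, y in validation if f[attribute] == value]
        branches[value] = reduced_error_prune(subtree, routed)
    leaf = Counter(y for _, y in validation).most_common(1)[0][0]
    return leaf if error(leaf, validation) <= error(tree, validation) else tree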

Two types of decision trees

Classification trees

Decision trees where the target variable can take a discrete set of values are called
classification trees.

In these tree structures, leaves represent class labels and branches represent
conjunctions of features that lead to those class labels

Splitting criteria: Entropy, Information gain, Gini

Goodness of fit: Misclassification rate

Regression trees

Decision trees where the target variable can take continuous values like the price of a
house are called regression trees.

Splitting criteria: Sum of squared errors

Goodness of fit: Sum of squared errors
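Both tree types are available in scikit-learn; a brief sketch on bundled datasets, with parameters chosen only for illustration.

from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: discrete target, entropy / Gini splitting criteria.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print("classification accuracy on training data:", clf.score(X, y))

# Regression tree: continuous target; the default splitting criterion is the squared error.
Xr, yr = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=3).fit(Xr, yr)
print("regression R^2 on training data:", reg.score(Xr, yr))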



GINI Index

The Gini index measures the impurity of a set S: Gini(S) = 1 − Σ_i p_i², where p_i is the proportion of examples in S belonging to class i. A pure node has a Gini index of 0.

Gini split index

The Gini split index of an attribute A is the weighted average of the Gini indices of the subsets produced by the split: Gini_split(S, A) = Σ_{v ∈ Values(A)} (|S_v| / |S|) Gini(S_v). The attribute with the smallest Gini split index is preferred.
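A sketch of both quantities in code, using the same (feature-dict, label) representation as the earlier helpers; names are illustrative.

from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(examples, attribute):
    """Weighted Gini impurity of the subsets produced by splitting on `attribute`."""
    total = len(examples)
    split = 0.0
    for v in {f[attribute] for f, _ in examples}:
        subset = [y for f, y in examples if f[attribute] == v]
        split += (len(subset) / total) * gini(subset)
    return split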

Gain ratio

The gain ratio is another feature selection measure used in the construction of classification trees. It normalises the information gain by the entropy of the split itself (the split information), which penalises attributes with many values:

Gain Ratio(S, A) = Gain(S, A) / SplitInformation(S, A)

where SplitInformation(S, A) = −Σ_{v ∈ Values(A)} (|S_v| / |S|) log₂(|S_v| / |S|), i.e. the entropy of S with respect to the values of attribute A.
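A sketch of the gain ratio, reusing the information_gain helper from the earlier section; the guard against a zero split information is an assumption added for robustness.

import math

def split_information(examples, attribute):
    """Entropy of S with respect to the values of `attribute`."""
    total = len(examples)
    info = 0.0
    for v in {f[attribute] for f, _ in examples}:
        p = sum(1 for f, _ in examples if f[attribute] == v) / total
        info -= p * math.log2(p)
    return info

def gain_ratio(examples, attribute):
    si = split_information(examples, attribute)
    return 0.0 if si == 0 else information_gain(examples, attribute) / si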



CART (Classification And Regression Trees)

CART splits each internal node into exactly two child nodes, so a CART decision tree is a
binary decision tree.

CART Algorithm

The decision tree building algorithm involves a few simple steps:

1. Take labelled input data, with a target variable and a list of features.

2. Best split: find the best feature to split on.

3. Best value: select the best feature value (threshold) for the split.

4. Split the input data into left and right child nodes.

5. Repeat steps 2-4 on each of the nodes until a stopping criterion is met (a sketch of the split search follows).
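A minimal sketch of steps 2-4 for a single numeric feature: scan candidate thresholds and keep the binary split with the lowest weighted Gini impurity, using the gini helper from the previous section; the threshold strategy and names are illustrative.

def best_binary_split(examples, attribute):
    """Return (threshold, weighted_gini) for the best binary split on `attribute` <= t."""
    values = sorted({f[attribute] for f, _ in examples})
    best_t, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                        # candidate threshold between two values
        left = [y for f, y in examples if f[attribute] <= t]
        right = [y for f, y in examples if f[attribute] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(examples)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score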

Decision Tree Pruning: the steps to prune a CART tree follow the post-pruning procedure described earlier: grow the full tree, then remove subtrees that do not improve validation error.
