
Module 4

Decision Trees- Entropy, Information Gain, Tree construction, ID3, Issues in Decision Tree
learning- Avoiding Over-fitting, Reduced Error Pruning, The problem of Missing Attributes,
Gain Ratio, Classification by Regression (CART)
Neural Networks- The Perceptron, Activation Functions, Training Feed Forward Network by
Back Propagation.

Decision Trees

A decision tree is a classifier in the form of a tree structure with two types of nodes:

Decision node: tests an attribute, with one branch for each possible outcome.

Each internal node corresponds to an attribute, and a child node is created for each
value of that attribute.

Branches in the tree are labelled with the values of the attribute / feature.

Leaf node: indicates a class label.


ID3 Algorithm

1) Select the "best" decision attribute, A, for the current node.

2) Assign A as the decision attribute for the node.

3) For each value of A, create a new child node.



4) Sort the training examples to the child nodes according to the attribute value of the branch.

5) If all training examples are perfectly classified, stop; else repeat the process on the new child nodes. A minimal sketch of this recursion is given below.
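A minimal recursive sketch of these steps, assuming a dataset represented as a list of (attribute-value dict, label) pairs and the entropy / information_gain helpers sketched in the following sections; the names and data layout are illustrative, not taken from the original notes.

from collections import Counter

def id3(examples, attributes):
    """examples: list of (features_dict, label); attributes: list of attribute names."""
    labels = [y for _, y in examples]
    # Stop if all examples share one class or no attributes remain: return a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with the highest information gain for this node.
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    # Steps 3-4: create a child for each value of the attribute and sort examples to it.
    for v in {f[best] for f, _ in examples}:
        subset = [(f, y) for f, y in examples if f[best] == v]
        remaining = [a for a in attributes if a != best]
        # Step 5: recurse on the new child node.
        tree[best][v] = id3(subset, remaining)
    return tree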

Issues in Decision tree construction

Which attribute to test at each node

When to stop splitting

How to handle data with missing attribute values

How deep to grow the tree

How to handle continuous attributes

How to handle overfitting

Measures to select attributes

Entropy

Entropy is a measure of the uncertainty (impurity) in a collection of examples. For a collection S containing positive and negative examples,

Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋

where p₊ and p₋ are the proportions of positive and negative examples in S.
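A small sketch of this computation for a collection of class labels; the function name and label representation are only illustrative.

import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

# Example: a set with 9 positive and 5 negative examples.
print(entropy(["+"] * 9 + ["-"] * 5))   # about 0.940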



Information gain

This measure is used to select among the candidate attributes at each step while growing the
tree

Information gain measures how well a given attribute separates the training examples
according to their target classification

Gain is a measure of how much we can reduce uncertainty (its value lies between 0 and 1).

Gain(S, A): expected reduction in entropy due to partitioning S on attribute A

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of possible values of attribute A and S_v is the subset of S for which A has value v.

Which attribute is "best"? Decide between A1 and A2 for a collection S containing 29 positive and 35 negative examples, [29+, 35−].

Entropy([29+, 35−]) = −(29/64) log₂(29/64) − (35/64) log₂(35/64) = 0.99

We want to find Gain(S, A1) and Gain(S, A2) and choose the attribute with the larger gain.

In this example the gain for A1 is higher, so splitting on A1 is best.



Construct a decision tree to predict whether we could play tennis given the atmospheric
conditions.

The decision on whether tennis can be played is based on the following features:
Outlook ∈ {Sunny, Overcast, Rain}, Temperature ∈ {Hot, Mild, Cool}, Humidity ∈ {High,
Normal} and Wind ∈ {Weak, Strong}. The training data is a table of labelled examples.

To select the best splitting attribute, find the information gain of each attribute.



The information gain values for the 4 attributes are:

Gain(S, Outlook) = 0.247
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

where S denotes the collection of training examples. Outlook has the highest gain, so it is selected as the root attribute.



Rule extraction from decision tree

Traverse the tree: each path from the root to a leaf gives one rule. A decision tree can
therefore be used for rule and feature extraction.

IF Outlook = Sunny and Humidity = Normal THEN Play = Yes
IF Outlook = Sunny and Humidity = High THEN Play = No
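A sketch that walks the nested-dict tree built by the ID3 sketch earlier and prints one IF-THEN rule per root-to-leaf path; the tree layout and the target name Play follow the tennis example and are otherwise assumptions.

def extract_rules(tree, conditions=()):
    """Print one rule for each path from the root of `tree` to a leaf."""
    if not isinstance(tree, dict):                # a leaf holds the class label
        body = " and ".join(f"{a} = {v}" for a, v in conditions)
        print(f"IF {body} THEN Play = {tree}")
        return
    (attribute, branches), = tree.items()         # one attribute per decision node
    for value, subtree in branches.items():
        extract_rules(subtree, conditions + ((attribute, value),))

# Example with a small hand-written tree:
extract_rules({"Outlook": {"Sunny": {"Humidity": {"Normal": "Yes", "High": "No"}},
                           "Overcast": "Yes"}})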

Overfitting

• Generalization: How well a model performs on new data

• Learning a tree that classifies the training data perfectly may not lead to the tree with
the best generalization performance

• A hypothesis h is said to overfit the training data if there is another hypothesis, h’,
such that h has smaller error than h’ on the training data but h has larger error on the
test data than h’.

• Overfitting: an overly complex hypothesis. Overfitting results in decision trees that are
more complex than necessary.

• Underfitting: an overly simple hypothesis. Underfitting results in decision trees that are
too simple to capture the structure of the data.



How can we avoid overfitting a decision tree?

• Pre-pruning: stop growing the tree when a data split is not statistically significant.


• Post-pruning: grow the full tree, then remove (prune) nodes.

Pre-Pruning (Early Stopping)

• Evaluate splits before installing them; avoid splits that do not look worthwhile
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
• Stop if the number of instances is less than some user-specified threshold (these criteria correspond to the pre-pruning parameters sketched below)
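In scikit-learn these stopping criteria correspond to constructor parameters of DecisionTreeClassifier; a minimal sketch, with the dataset and parameter values chosen only for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: limit depth and require a minimum number of instances per split / leaf.
clf = DecisionTreeClassifier(
    criterion="entropy",      # information-gain style splitting
    max_depth=3,              # how deep to grow
    min_samples_split=10,     # stop if fewer instances than this threshold
    min_samples_leaf=5,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())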

Post-pruning

• Reduced-error Pruning is a post-pruning, cross validation approach


• Partition training data into “train” set and “validation” set.
• Build a complete tree for the “train” data and then prune subtrees
• Cross validation is used to identify which node to remove
• Let T be the decision tree and S be the subtree which we consider removing
• Find Error(T) using the validation set. Also find Error(T − S)
• If Error(T − S) is smaller, then S is a candidate for removal (a rough sketch follows)
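A rough, simplified bottom-up sketch of this idea on the nested-dict tree representation used earlier; predict, the default label for unseen values, and the use of the routed validation examples for the replacement leaf are all assumptions made for illustration.

from collections import Counter

def predict(tree, features, default="No"):
    """Classify one example with a nested-dict tree; leaves are class labels."""
    while isinstance(tree, dict):
        (attribute, branches), = tree.items()
        tree = branches.get(features[attribute], default)
    return tree

def error(tree, validation):
    """Fraction of validation examples the tree misclassifies."""
    return sum(predict(tree, f) != y for f, y in validation) / len(validation)

def reduced_error_prune(tree, validation):
    """Replace a subtree by a leaf whenever that does not increase validation error."""
    if not isinstance(tree, dict) or not validation:
        return tree
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        routed = [(f, y) for f, y in validation if f[attribute] == value]
        branches[value] = reduced_error_prune(subtree, routed)
    leaf = Counter(y for _, y in validation).most_common(1)[0][0]
    return leaf if error(leaf, validation) <= error(tree, validation) else tree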

Two types of decision trees

Classification trees

Decision trees where the target variable can take a discrete set of values are called
classification trees.

In these tree structures, leaves represent class labels and branches represent
conjunctions of features that lead to those class labels

Splitting criteria: Entropy, Information gain, Gini

Goodness of fit: Misclassification rate

Regression trees

Decision trees where the target variable can take continuous values like the price of a
house are called regression trees.

Splitting criteria: Sum of squared errors

Goodness of fit: Sum of squared errors
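Both tree types are available in scikit-learn; a brief sketch on bundled datasets, with parameters chosen only for illustration.

from sklearn.datasets import load_iris, load_diabetes
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: discrete target, entropy / Gini splitting criteria.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print("classification accuracy on training data:", clf.score(X, y))

# Regression tree: continuous target; the default splitting criterion is the squared error.
Xr, yr = load_diabetes(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=3).fit(Xr, yr)
print("regression R^2 on training data:", reg.score(Xr, yr))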



GINI Index

The Gini index measures the impurity of a set S: Gini(S) = 1 − Σ_i p_i², where p_i is the proportion of examples in S belonging to class i. A pure node has a Gini index of 0.

Gini split index

The Gini split index of an attribute A is the weighted average of the Gini indices of the subsets produced by the split: Gini_split(S, A) = Σ_{v ∈ Values(A)} (|S_v| / |S|) Gini(S_v). The attribute with the smallest Gini split index is preferred.
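A sketch of both quantities in code, using the same (feature-dict, label) representation as the earlier helpers; names are illustrative.

from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(examples, attribute):
    """Weighted Gini impurity of the subsets produced by splitting on `attribute`."""
    total = len(examples)
    split = 0.0
    for v in {f[attribute] for f, _ in examples}:
        subset = [y for f, y in examples if f[attribute] == v]
        split += (len(subset) / total) * gini(subset)
    return split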

Gain ratio

The gain ratio is another feature selection measure used in the construction of classification trees. It normalises the information gain by the entropy of the split itself (the split information), which penalises attributes with many values:

Gain Ratio(S, A) = Gain(S, A) / SplitInformation(S, A)

where SplitInformation(S, A) = −Σ_{v ∈ Values(A)} (|S_v| / |S|) log₂(|S_v| / |S|), i.e. the entropy of S with respect to the values of attribute A.
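A sketch of the gain ratio, reusing the information_gain helper from the earlier section; the guard against a zero split information is an assumption added for robustness.

import math

def split_information(examples, attribute):
    """Entropy of S with respect to the values of `attribute`."""
    total = len(examples)
    info = 0.0
    for v in {f[attribute] for f, _ in examples}:
        p = sum(1 for f, _ in examples if f[attribute] == v) / total
        info -= p * math.log2(p)
    return info

def gain_ratio(examples, attribute):
    si = split_information(examples, attribute)
    return 0.0 if si == 0 else information_gain(examples, attribute) / si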



CART (Classification And Regression Trees)

CART splits each internal node into exactly two child nodes, so a CART decision tree is a
binary decision tree.

CART Algorithm

The decision tree building algorithm involves a few simple steps:

1. Take labelled input data, with a target variable and a list of features.

2. Best split: find the best feature to split on.

3. Best value: select the best feature value (threshold) for the split.

4. Split the input data into left and right child nodes.

5. Repeat steps 2-4 on each of the nodes until a stopping criterion is met (a sketch of the split search follows).
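A minimal sketch of steps 2-4 for a single numeric feature: scan candidate thresholds and keep the binary split with the lowest weighted Gini impurity, using the gini helper from the previous section; the threshold strategy and names are illustrative.

def best_binary_split(examples, attribute):
    """Return (threshold, weighted_gini) for the best binary split on `attribute` <= t."""
    values = sorted({f[attribute] for f, _ in examples})
    best_t, best_score = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                        # candidate threshold between two values
        left = [y for f, y in examples if f[attribute] <= t]
        right = [y for f, y in examples if f[attribute] > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(examples)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score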

Decision Tree Pruning: the steps to prune a CART tree follow the post-pruning procedure described earlier: grow the full tree, then remove subtrees that do not improve validation error.
