Machine Learning Using Python
Copyright © 2018, edureka and/or its affiliates. All rights reserved.
Agenda for Today’s Session
▪ What is Classification?
▪ Types of Classification
▪ Classification Use case
▪ What is Decision Tree?
▪ Terminologies associated with a Decision Tree
▪ Visualizing a Decision Tree
▪ Writing a Decision Tree Classifier from Scratch in Python using the CART Algorithm
What is Classification?
“Classification is the process of dividing a dataset into different categories or groups by adding labels.”

Note: It assigns each data point to a particular labelled group on the basis of some condition.
Types of Classification
▪ Decision Tree
▪ Random Forest
▪ Naïve Bayes
▪ KNN
Decision Tree
▪ Graphical representation of all the possible solutions to a decision
▪ Decisions are based on some conditions
▪ Decisions made can be easily explained
Random Forest
▪ Builds multiple decision trees and merges them together
▪ More accurate and stable prediction
▪ Random decision forests correct for decision trees' habit
of overfitting to their training set
▪ Trained with the “bagging” method
Naïve Bayes
▪ Classification technique based on Bayes' Theorem
▪ Assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature
K-Nearest Neighbors
▪ Stores all the available cases and classifies new cases
based on a similarity measure
▪ The “K” in the KNN algorithm is the number of nearest neighbors we wish to take a vote from.
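All four of these classifier families are available off the shelf in scikit-learn. The following minimal sketch (our own illustration, not part of the original deck) shows how each one is instantiated and fit; the iris dataset and the hyperparameters are placeholders.

```python
# A minimal sketch: the four classifier families above, via scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # any labelled dataset will do

classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),  # bagging of many trees
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),  # K = 5 neighbours take the vote
}

for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, "training accuracy:", clf.score(X, y))
```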
What is Decision Tree?
“A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions.”
Understanding a Decision Tree
Dataset
This is what our dataset looks like:

Colour   Diameter   Label
Green    3          Mango
Yellow   3          Mango
Red      1          Grape
Red      1          Grape
Yellow   3          Lemon
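Since the deck later writes a CART classifier from scratch in Python, it helps to see the same toy dataset as plain Python data. A minimal sketch (the variable names are our own):

```python
from collections import Counter

# Each row is [colour, diameter, label], matching the table above.
training_data = [
    ["Green",  3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red",    1, "Grape"],
    ["Red",    1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

# Count the class labels: 2 Mango, 2 Grape, 1 Lemon.
print(Counter(row[-1] for row in training_data))
```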
Decision Tree (building it with CART on the toy dataset)

Root question: is diameter >= 3?
▪ False branch: Red 1 Grape, Red 1 Grape → 100% Grape, Gini Impurity = 0
▪ True branch: Green 3 Mango, Yellow 3 Mango, Yellow 3 Lemon → Gini Impurity = 0.44
Information Gain from this split = 0.37

Next question, on the true branch: is colour == Yellow?
▪ False branch: Green 3 Mango → 100% Mango
▪ True branch: Yellow 3 Mango, Yellow 3 Lemon → 50% Mango, 50% Lemon
Information Gain from this split = 0.11
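The Gini impurity and information gain figures above can be reproduced with a few lines of Python. This is a simplified sketch of the CART bookkeeping (the helper names are ours), not the full classifier written later in the session:

```python
from collections import Counter

training_data = [
    ["Green",  3, "Mango"],
    ["Yellow", 3, "Mango"],
    ["Red",    1, "Grape"],
    ["Red",    1, "Grape"],
    ["Yellow", 3, "Lemon"],
]

def gini(rows):
    """Gini impurity: 1 - sum of squared label probabilities in `rows`."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return 1 - sum((n / total) ** 2 for n in counts.values())

def info_gain(parent, left, right):
    """Impurity of the parent minus the weighted impurity of the two children."""
    p = len(left) / len(parent)
    return gini(parent) - p * gini(left) - (1 - p) * gini(right)

# Split 1: is diameter >= 3?
true_rows  = [r for r in training_data if r[1] >= 3]
false_rows = [r for r in training_data if r[1] < 3]
print(round(gini(true_rows), 2))                                  # 0.44
print(round(gini(false_rows), 2))                                 # 0.0 (100% Grape)
print(round(info_gain(training_data, true_rows, false_rows), 2))  # 0.37

# Split 2, on the true branch: is colour == Yellow?
yellow     = [r for r in true_rows if r[0] == "Yellow"]
not_yellow = [r for r in true_rows if r[0] != "Yellow"]
print(round(info_gain(true_rows, yellow, not_yellow), 2))         # 0.11
```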
At each node the tree considers candidate questions such as "Is the diameter >= 3?", "Is the colour green?" and "Is the colour yellow?", and partitions the remaining rows (Green 3 Mango, Yellow 3 Lemon, Yellow 3 Mango) into TRUE and FALSE branches.
Decision Tree Terminologies
Root Node
Represents the entire population or sample; it gets divided further into two or more homogeneous sets.

Splitting
Dividing the root node or a sub-node into different parts on the basis of some condition.

Parent/Child Node
The root node is the parent node, and all the nodes branched from it are known as child nodes.

Branch/Sub-Tree
A section of the tree formed by splitting a node.

Leaf Node
A node that cannot be segregated into further nodes.

Pruning
The opposite of splitting; removing unwanted branches from the tree.
(The tree built from the toy dataset is shown again to illustrate these terms: "is diameter >= 3?" is the root node, the splits on diameter and colour form the branches, and the 100% Grape and 100% Mango boxes are leaf nodes.)
CART Algorithm
Let’s First Visualize the Decision Tree
Which Question to ask and When?
Let’s First Visualize the Decision Tree

The target tree: Outlook at the root, with one branch testing Humidity (High → No, Normal → Yes) and another testing Windy (Strong → No, Weak → Yes).
Learn about Decision Tree
Which attribute among them should you pick first?
Answer: Determine the
attribute that best
classifies the training data
But how do we choose the best attribute? Or, how does a tree decide where to split?
How Does A Tree Decide Where To Split?
Information Gain
The information gain is the decrease in entropy after a dataset is split on the basis of an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain.

Gini Index
The measure of impurity (or purity) used when building a decision tree with CART is the Gini index.

Reduction in Variance
Reduction in variance is the criterion used for continuous target variables (regression problems). The split with the lower variance is selected to split the population.

Chi-Square
An algorithm that finds the statistical significance of the differences between sub-nodes and the parent node.
Let’s First Understand What Impurity Is

Impurity = 0 when every item in a node belongs to the same class.
Impurity ≠ 0 when a node contains a mix of classes.
What is Entropy?
▪ Defines the randomness in the data
▪ Entropy is a metric that measures the impurity or randomness of the data
▪ Calculating entropy is the first step in building a decision tree
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)

Where:
▪ S is the total sample space
▪ P(yes) is the probability of yes and P(no) is the probability of no

▪ If the number of yes equals the number of no, i.e. P(yes) = P(no) = 0.5, then Entropy(S) = 1
▪ If the set contains all yes or all no, i.e. P(yes) = 1 or P(no) = 1, then Entropy(S) = 0
When P(Yes) = P(No) = 0.5, i.e. yes and no each make up half of the total sample S:
E(S) = -0.5 log2 0.5 - 0.5 log2 0.5
E(S) = -0.5(-1) - 0.5(-1)
E(S) = 1
When P(Yes) = 1, i.e. the whole sample S is yes:
E(S) = -P(Yes) log2 P(Yes) = -1 log2 1 = 0

Similarly, when P(No) = 1, i.e. the whole sample S is no:
E(S) = -P(No) log2 P(No) = -1 log2 1 = 0
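A small Python helper makes these two boundary cases easy to check. This is a sketch for the two-class case only, with names of our own choosing:

```python
import math

def entropy(p_yes):
    """Two-class entropy: -P(yes) log2 P(yes) - P(no) log2 P(no), with 0*log2(0) taken as 0."""
    result = 0.0
    for p in (p_yes, 1 - p_yes):
        if p > 0:
            result -= p * math.log2(p)
    return result

print(entropy(0.5))  # 1.0 -> equal numbers of yes and no, maximum impurity
print(entropy(1.0))  # 0.0 -> all yes, a pure node
print(entropy(0.0))  # 0.0 -> all no, also pure
```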
What is Information Gain?
▪ Measures the reduction in entropy
▪ Decides which attribute should be selected as the
decision node
If S is our total collection,
Information Gain = Entropy(S) – [(Weighted Avg) x Entropy(each feature)]
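In the same sketch style, information gain can be computed from the class labels of the parent set and of the subsets produced by a split (the helper names are ours, not from the deck):

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    result = 0.0
    for count in Counter(labels).values():
        p = count / total
        result -= p * math.log2(p)
    return result

def information_gain(parent_labels, subsets):
    """Entropy(S) minus the weighted average entropy of each subset of S."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy_of(s) for s in subsets)
    return entropy_of(parent_labels) - weighted

# A perfect split removes all entropy, so the gain equals the parent entropy (here 1.0).
print(information_gain(["Yes", "Yes", "No", "No"], [["Yes", "Yes"], ["No", "No"]]))
```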
Let’s Build Our Decision Tree
Step 1: Compute the entropy for the dataset

The training set has 14 instances (D1 to D14) of the weather data, with attributes Outlook, Temperature, Humidity and Windy and a yes/no "play" decision. Out of the 14 instances we have 9 YES and 5 NO.

Using the formula:
E(S) = -P(Yes) log2 P(Yes) - P(No) log2 P(No)
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14)
E(S) = 0.41 + 0.53 = 0.94
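The 0.94 figure is easy to verify directly (a one-off check, not deck code):

```python
import math

# 9 "yes" and 5 "no" out of 14 instances
e_s = -(9 / 14) * math.log2(9 / 14) - (5 / 14) * math.log2(5 / 14)
print(round(e_s, 2))  # 0.94
```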
Which Node To Select As Root Node?
Outlook? Temperature?
Humidity? Windy?
Which Node To Select As Root Node: Outlook

Splitting on Outlook:
▪ Sunny: 2 Yes, 3 No
▪ Overcast: 4 Yes, 0 No
▪ Rainy: 3 Yes, 2 No
E(Outlook = Sunny) = -2/5 log2 2/5 - 3/5 log2 3/5 = 0.971
E(Outlook = Overcast) = -1 log2 1 - 0 log2 0 = 0
E(Outlook = Rainy) = -3/5 log2 3/5 - 2/5 log2 2/5 = 0.971

Information from Outlook:
I(Outlook) = 5/14 x 0.971 + 4/14 x 0 + 5/14 x 0.971 = 0.693

Information gained from Outlook:
Gain(Outlook) = E(S) - I(Outlook) = 0.94 - 0.693 = 0.247
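These Outlook numbers can be verified from the per-branch yes/no counts on the previous slide; a small sketch of our own:

```python
import math

def entropy(yes, no):
    """Entropy of a branch that holds `yes` positive and `no` negative instances."""
    result = 0.0
    for count in (yes, no):
        if count > 0:
            p = count / (yes + no)
            result -= p * math.log2(p)
    return result

branches = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}
total = 14

i_outlook = sum((y + n) / total * entropy(y, n) for y, n in branches.values())
gain_outlook = entropy(9, 5) - i_outlook

print(round(i_outlook, 3))     # 0.694 (the slide's 0.693 truncates the same value)
print(round(gain_outlook, 3))  # 0.247
```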
Which Node To Select As Root Node: Windy

Splitting on Windy:
▪ False: 6 Yes, 2 No
▪ True: 3 Yes, 3 No
Which Node To Select As Root Node: Windy
E(Windy = True) = 1
E(Windy = False) = 0.811
I(Windy) = 8/14 x 0.811+ 6/14 x 1 = 0.892
Information from windy,
Information gained from outlook,
Gain(Windy) = E(S) – I(Windy)
0.94 – 0.892 = 0.048
Similarly, we calculate the information and gain for the remaining two attributes, Temperature and Humidity.
Which Node To Select As Root Node

Attribute      Info    Gain
Outlook        0.693   0.940 - 0.693 = 0.247
Temperature    0.911   0.940 - 0.911 = 0.029
Windy          0.892   0.940 - 0.892 = 0.048
Humidity       0.788   0.940 - 0.788 = 0.152

Since the maximum gain is 0.247, Outlook is our ROOT node.
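Once the four gains are known, selecting the root is just an argmax over them; a trivial sketch:

```python
gains = {"Outlook": 0.247, "Temperature": 0.029, "Windy": 0.048, "Humidity": 0.152}
root = max(gains, key=gains.get)
print(root)  # Outlook
```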
Which Node To Select Further?

Outlook becomes the root node. The Overcast branch contains only Yes, so it becomes a pure leaf. For the other branches (marked "??") you need to recalculate the gains over the remaining attributes and repeat the procedure.
This Is How Your Complete Tree Will Look

▪ Outlook is the root node
▪ Sunny → Humidity: High → No, Normal → Yes
▪ Overcast → Yes
▪ Rainy → Windy: Strong → No, Weak → Yes
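The finished tree is small enough to transcribe directly as nested conditionals. A sketch of the resulting prediction rule (the function and argument names are our own):

```python
def predict_play(outlook, humidity, windy):
    """Mirror of the final decision tree above: returns "Yes" if we should play, else "No"."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"   # High humidity -> No
    if outlook == "Rainy":
        return "Yes" if windy == "Weak" else "No"        # Strong wind -> No
    raise ValueError("unknown outlook: " + outlook)

print(predict_play("Sunny", "High", "Weak"))       # No
print(predict_play("Overcast", "High", "Strong"))  # Yes
print(predict_play("Rainy", "Normal", "Weak"))     # Yes
```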
What Should I Do To Play - Pruning
What is Pruning?

“Pruning is the process of cutting down unwanted branches from the tree in order to reduce its complexity.”
Pruning: Reducing The Complexity

After pruning, only the branches that lead to playing (Yes) remain: Outlook with Humidity = Normal → Yes, Overcast → Yes, and Windy = Weak → Yes.
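If the tree is built with scikit-learn rather than from scratch, pruning is usually done through cost-complexity pruning via the ccp_alpha parameter (available in recent scikit-learn releases). A hedged sketch, with the iris data standing in for a numerically encoded version of the weather table:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # placeholder for an encoded training set

full_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# A larger ccp_alpha removes more branches, i.e. prunes more aggressively.
pruned_tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.02,
                                     random_state=0).fit(X, y)

print("leaves before pruning:", full_tree.get_n_leaves())
print("leaves after pruning: ", pruned_tree.get_n_leaves())
```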
Are tree-based models better than linear models?