Decision Trees
[Figure: introductory decision-tree example with attribute tests such as outlook (sunny/rainy) and size (small/med/big) leading to yes/no leaves.]
Decision (chance) nodes
• Each internal node of a DT is a decision point where some condition is tested
• The result of this test determines which branch of the tree is taken next
• Such nodes are therefore called decision nodes, chance nodes, or non-terminal nodes
• A chance node partitions the data reaching it so as to maximize the differences in the dependent variable between the resulting partitions
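The split-selection idea in the last bullet can be sketched with entropy-based information gain, one common criterion (used by ID3/C4.5; CART uses the Gini index instead). A minimal Python sketch; `rows` and `labels` are hypothetical parallel lists of attribute tuples and class labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Reduction in entropy after partitioning rows on one attribute
    before = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    after = sum(len(part) / len(labels) * entropy(part)
                for part in partitions.values())
    return before - after
```

The attribute with the largest gain produces the "most different" partitions and is chosen as the test at that node.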
Terminal nodes
• The leaf nodes of a DT are called terminal nodes
• They indicate the class into which a data instance is classified
• Each has exactly one incoming edge
• They have no child nodes (no outgoing edges)
• No condition is tested at a terminal node
• Traversing the tree from the root to a leaf yields the production rule for that leaf's class
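Rule extraction is just an enumeration of all root-to-leaf paths. A sketch, assuming a hypothetical nested-dict encoding of a tree, `{attribute: {value: subtree_or_class}}`:

```python
def extract_rules(tree, conditions=()):
    """Return one production rule per root-to-leaf path."""
    if not isinstance(tree, dict):          # terminal node: emit a rule
        return [f"IF {' AND '.join(conditions)} THEN class = {tree}"]
    (attribute, branches), = tree.items()   # one test per decision node
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + (f"{attribute} = {value}",))
    return rules

# The play-tennis tree from Quinlan's example, in the assumed encoding
play_tree = {"Outlook": {
    "sunny":    {"Humidity": {"high": "N", "normal": "P"}},
    "overcast": "P",
    "rainy":    {"Windy": {"yes": "N", "no": "P"}},
}}
for rule in extract_rules(play_tree):
    print(rule)
```

Each of the five leaves yields one rule, e.g. "IF Outlook = sunny AND Humidity = high THEN class = N".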
Advantages of DT
• Easy to understand and interpret
• Works for both categorical and quantitative data
• A DT can grow to any depth
• Attributes can be tested in any desired order
• Pruning a DT is easy
• Can handle missing or null values
Advantages contd.
• Can be used to identify outliers
• Production rules can be read directly from the built DT
• Relatively fast to build and apply compared with many other classification models
• A DT can be built even when domain experts are unavailable
Disadvantages
• A DT imposes a fixed sequence of decisions: a poor split near the root cannot be undone lower down
• Class-overlap problem: overlapping classes force large, inaccurate trees
• Correlated attributes can lead to redundant splits
• Deep trees produce long, complex production rules
• Greedy, one-attribute-at-a-time induction means the resulting DT can be sub-optimal
Quinlan’s classical example
#  Outlook  Temperature  Humidity  Windy  Play (Class)
1 sunny hot high no N
2 sunny hot high yes N
3 overcast hot high no P
4 rainy moderate high no P
5 rainy cold normal no P
6 rainy cold normal yes N
7 overcast cold normal yes P
8 sunny moderate high no N
9 sunny cold normal no P
10 rainy moderate normal no P
11 sunny moderate normal yes P
12 overcast moderate high yes P
13 overcast hot normal no P
14 rainy moderate high yes N
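To connect this table to the trees that follow: ranking the four attributes by information gain shows why a greedy inducer such as ID3 puts Outlook at the root. A self-contained sketch (the helper names are mine; the data is the table above):

```python
from collections import Counter
from math import log2

# Quinlan's 14 instances: (Outlook, Temperature, Humidity, Windy, Play)
data = [
    ("sunny", "hot", "high", "no", "N"),      ("sunny", "hot", "high", "yes", "N"),
    ("overcast", "hot", "high", "no", "P"),   ("rainy", "moderate", "high", "no", "P"),
    ("rainy", "cold", "normal", "no", "P"),   ("rainy", "cold", "normal", "yes", "N"),
    ("overcast", "cold", "normal", "yes", "P"), ("sunny", "moderate", "high", "no", "N"),
    ("sunny", "cold", "normal", "no", "P"),   ("rainy", "moderate", "normal", "no", "P"),
    ("sunny", "moderate", "normal", "yes", "P"), ("overcast", "moderate", "high", "yes", "P"),
    ("overcast", "hot", "normal", "no", "P"), ("rainy", "moderate", "high", "yes", "N"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, i):
    # Information gain of splitting the rows on attribute i
    labels = [r[-1] for r in rows]
    parts = {}
    for r in rows:
        parts.setdefault(r[i], []).append(r[-1])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

gains = {ATTRS[i]: round(gain(data, i), 3) for i in range(4)}
print(gains)  # Outlook has the largest gain, so it is tested first
```

Outlook's gain (about 0.247 bits) dominates the other three attributes, which is exactly the "Simple Tree" below; rooting the tree at a low-gain attribute such as Temperature gives the "Complicated Tree".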
Simple Tree
Outlook
  sunny    -> Humidity
                high   -> N
                normal -> P
  overcast -> P
  rainy    -> Windy
                yes -> N
                no  -> P
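This tree translates directly into nested conditionals. A sketch, assuming a data instance is a dict keyed by attribute name:

```python
def classify(instance):
    # Walk the simple tree from the root (Outlook) to a terminal node
    if instance["Outlook"] == "sunny":
        # The sunny branch tests Humidity
        return "N" if instance["Humidity"] == "high" else "P"
    if instance["Outlook"] == "overcast":
        return "P"  # overcast is a pure partition: always Play
    # The rainy branch tests Windy
    return "N" if instance["Windy"] == "yes" else "P"
```

For example, `classify({"Outlook": "rainy", "Windy": "no"})` follows the rainy branch to the no leaf and returns "P".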
Complicated Tree
[Figure: a larger tree for the same data, rooted at Temperature (branches hot, moderate, cold) and re-testing Windy and Outlook in its subtrees.]