Decision Trees
Presentation no.1
Definitions
Why should we use Decision Trees
The basic algorithm of Decision Trees (overview)
Common steps for using Decision Trees
Disadvantages
Application of Decision Trees in NLP
3 Decision Trees
Definition:
The decision tree method is a powerful statistical tool for
classification, prediction, interpretation, and data
manipulation that has several potential applications.
Non-parametric approach without distributional assumptions.
A decision tree can also be re-represented as if-then rules to improve human readability.
4 Why should we use Decision Trees?
Information gain
A quantitative measure of the worth of an attribute: it measures the expected reduction in entropy given the value of some attribute A (a short code sketch follows this slide).
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
Variable selection.
To select the most relevant input variables that should be used to
form decision tree models.
Assessing the relative importance of variables.
Generally, variable importance is computed based on the reduction of model accuracy when the variable is removed. In most circumstances, the more records a variable has an effect on, the greater the importance of the variable.
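A minimal sketch of the entropy and information-gain computation above, assuming a tiny labelled dataset represented as Python dictionaries (the attribute names, records, and labels are invented for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes c of p_c * log2(p_c)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    remainder = 0.0
    for value in set(r[attribute] for r in records):
        subset = [lab for rec, lab in zip(records, labels) if rec[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Toy data (invented): which attribute reduces entropy of the class labels more?
records = [{"Outlook": "sunny", "Windy": "yes"}, {"Outlook": "sunny", "Windy": "no"},
           {"Outlook": "rain",  "Windy": "yes"}, {"Outlook": "rain",  "Windy": "no"}]
labels = ["no", "yes", "no", "yes"]
print(information_gain(records, labels, "Outlook"))  # 0.0  (no information)
print(information_gain(records, labels, "Windy"))    # 1.0  (perfect split)
```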
10 Common Steps for using decision trees (2)
Stopping
All decision trees need stopping criteria; otherwise it would be possible, but undesirable, to grow a tree in which each case occupies its own node. The resulting tree would be computationally expensive, difficult to interpret, and would probably not work well on new data. Common stopping criteria (sketched in code after this list):
Number of cases in the node is less than some pre-specified limit.
Purity of the node is more than some pre-specified limit.
Depth of the node is more than some pre-specified limit.
Predictor values for all records are identical, in which case no rule could be generated to split them.
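These stopping rules are commonly exposed as hyperparameters; a minimal sketch using scikit-learn's DecisionTreeClassifier, where the specific threshold values are illustrative assumptions rather than recommendations from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each hyperparameter corresponds to one of the stopping rules listed above;
# nodes whose predictor values are all identical are never split in any case.
clf = DecisionTreeClassifier(
    min_samples_split=10,        # too few cases in the node: stop
    min_impurity_decrease=0.01,  # node already (nearly) pure enough: stop
    max_depth=5,                 # node deeper than the limit: stop
    random_state=0,
)
clf.fit(X, y)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```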
13 Common Steps for using decision trees (5)
Pruning.
In some situations, stopping rules do not work well. An alternative way to build a
decision tree model is to grow a large tree first, and then prune it to optimal size by
removing nodes that provide less additional information.
Two types:
Pre-pruning (forward pruning) uses chi-square tests or multiple-comparison adjustment methods to prevent the generation of non-significant branches.
Post-pruning is applied after generating a full decision tree to remove branches in a manner that improves the accuracy of the overall classification on a validation dataset.
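Post-pruning of this kind can be sketched with scikit-learn's cost-complexity pruning: grow a full tree, then keep the pruning strength (ccp_alpha) that scores best on held-out data. The dataset and split are arbitrary choices for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow a full (unpruned) tree, then list the candidate pruning strengths.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
alphas = full_tree.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Post-pruning: keep the pruned tree that classifies the validation set best.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train) for a in alphas),
    key=lambda tree: tree.score(X_valid, y_valid),
)
print("pruned depth:", best.get_depth(), "validation accuracy:", best.score(X_valid, y_valid))
```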
14 Common Steps for using decision trees (6)
Prediction.
This is one of the most important uses of decision tree models: predicting the outcome for future, unseen records.
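A short sketch of using a fitted tree to predict outcomes for new, unseen records (the two new feature vectors are made up for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# "Future records": feature vectors the model has never seen before.
new_records = [[5.0, 3.4, 1.5, 0.2],
               [6.7, 3.0, 5.2, 2.3]]
print(clf.predict(new_records))        # predicted class for each record
print(clf.predict_proba(new_records))  # class probability estimates per record
```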
16 Disadvantages
It can be subject to overfitting and underfitting, particularly when using a small data set (illustrated in the sketch after this slide). This can limit the generalizability and robustness of the resulting models.
Strong correlation between different potential input variables may result in the selection of
variables that improve the model statistics but are not causally related to the outcome of
interest.
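The overfitting risk can be made concrete with a quick check: fit an unrestricted tree on a deliberately small training sample and compare training accuracy with held-out accuracy. The dataset and sample size below are arbitrary choices for illustration:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Keep only a small training sample to make overfitting likely.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=40, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # no depth limit, no pruning
print("train accuracy:", clf.score(X_train, y_train))  # typically 1.0 (memorised)
print("test accuracy: ", clf.score(X_test, y_test))    # noticeably lower on unseen data
```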
17 Application of Decision Trees in NLP
Part of speech (POS) tagging
Text Classification
18 Application of Decision Trees in NLP
The baseline heuristic is to choose, for each word, its most probable tag according to the lexical probability.
Choosing the proper syntactic tag for a word in a particular context
can be stated as a problem of classification.
A learning algorithm is applied over the set of possible tags; classes are identified with tags.
It is possible to group all the words appearing in the corpus according to the set of their possible tags; these groups are called ambiguity classes (see the sketch below).
A taxonomy is extracted from the WSJ corpus. The general POS tagging problem is split into one classification problem for each ambiguity class.
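A minimal sketch of building ambiguity classes from a tagged corpus: each word is keyed by the set of tags it can take, and words sharing the same tag set fall into the same class. The tiny (word, tag) corpus below is invented for illustration:

```python
from collections import defaultdict

# Hypothetical (word, tag) pairs standing in for a tagged corpus such as the WSJ.
corpus = [("the", "DT"), ("can", "MD"), ("can", "NN"), ("run", "VB"),
          ("run", "NN"), ("runs", "VBZ"), ("a", "DT")]

# Collect the set of possible tags observed for each word (its lexical ambiguity).
possible_tags = defaultdict(set)
for word, tag in corpus:
    possible_tags[word].add(tag)

# Group words by their tag set: each distinct tag set is one ambiguity class,
# and one classification problem (one tree) is learned per class.
ambiguity_classes = defaultdict(list)
for word, tags in possible_tags.items():
    ambiguity_classes[frozenset(tags)].append(word)

for tags, words in ambiguity_classes.items():
    print(sorted(tags), "->", words)
```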
22 Treetagger
Classify the word using the corresponding decision tree. The ambiguity of
the context (either left or right) during classification may generate
multiple answers for the questions of the nodes. In this case, all the paths
are followed and the result is taken as a weighted average of the results of
all possible paths. The weight for each path is actually its probability.
Use the resulting probability distribution to update the probability
distribution of the word.
Discard the tags with almost zero probability, that is, those with
probabilities lower than a certain discard boundary parameter.
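A minimal sketch of this weighted-path classification, assuming a hand-built tree whose nodes ask yes/no questions about the (possibly still ambiguous) tag of a neighbouring word; all structures, tags, and probabilities are invented for illustration:

```python
def classify(node, context):
    """Return a tag distribution for the focus word, given ambiguous context tags."""
    if "leaf" in node:                       # leaf: a probability distribution over tags
        return node["leaf"]
    pos, tag = node["question"]              # e.g. (-1, "DT"): is the previous word's tag DT?
    p_yes = context[pos].get(tag, 0.0)       # probability that the answer is "yes"
    yes_dist = classify(node["yes"], context)
    no_dist = classify(node["no"], context)
    tags = set(yes_dist) | set(no_dist)
    # Weighted average of the results of both paths, weighted by path probability.
    return {t: p_yes * yes_dist.get(t, 0.0) + (1 - p_yes) * no_dist.get(t, 0.0) for t in tags}

def discard(dist, boundary=0.05):
    """Drop tags with almost zero probability and renormalise the rest."""
    kept = {t: p for t, p in dist.items() if p >= boundary}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

# Tiny hand-built tree for the ambiguity class {NN, VB}.
tree = {"question": (-1, "DT"),
        "yes": {"leaf": {"NN": 0.9, "VB": 0.1}},
        "no":  {"leaf": {"NN": 0.3, "VB": 0.7}}}

# The left-context word is itself still ambiguous: 60% DT, 40% PRP.
context = {-1: {"DT": 0.6, "PRP": 0.4}}
print(discard(classify(tree, context)))      # weighted average over both paths
```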
23 Treetagger
After the stopping criterion is satisfied, some words could still remain
ambiguous. Then there are two possibilities:
1) Choose the most probable tag for each still-ambiguous word to
completely disambiguate the text.
2) Accept the residual ambiguity.
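Both options can be sketched in a line each, assuming per-word tag distributions like those produced above (the example word and probabilities are hypothetical):

```python
remaining = {"can": {"MD": 0.55, "NN": 0.45}}   # a hypothetical still-ambiguous word

# 1) Complete disambiguation: pick the most probable tag for each ambiguous word.
forced = {word: max(dist, key=dist.get) for word, dist in remaining.items()}
print(forced)      # {'can': 'MD'}

# 2) Accept the residual ambiguity: keep the full distribution as the output.
print(remaining)
```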
Bayesian classifier
Decision Tree
K-nearest neighbor (KNN)
Support Vector Machines (SVMs)
Neural Networks
Rocchio's algorithm
29 How does a decision tree work for text classification?
When a decision tree is used for text classification, its internal nodes are labelled by terms, the branches departing from them are labelled by tests on the term weight, and the leaf nodes are labelled by the corresponding classes.
The tree classifies a document by running it through this structure from the root until it reaches a leaf, whose label is assigned as the class of the document.
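A minimal sketch of such a tree, assuming bag-of-words documents with term weights: internal nodes test whether a term's weight exceeds a threshold, leaves carry class labels, and classification walks from the root to a leaf. The terms, thresholds, and classes are invented for illustration:

```python
# Internal node: (term, weight_threshold, subtree_if_greater, subtree_otherwise).
# Leaf node: a class label (a plain string).
tree = ("ball", 0.5,
        ("score", 0.2, "sports", "sports"),
        ("election", 0.3, "politics", "other"))

def classify(node, doc_weights):
    """Walk from the root to a leaf; the leaf's label classifies the document."""
    if isinstance(node, str):                # reached a leaf: return its class label
        return node
    term, threshold, high, low = node
    branch = high if doc_weights.get(term, 0.0) > threshold else low
    return classify(branch, doc_weights)

doc = {"ball": 0.8, "score": 0.6}            # toy document as term -> weight mapping
print(classify(tree, doc))                   # -> 'sports'
```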
30 Advantages
Simplicity in understanding and interpreting, even for non-expert
users.
Handling multi-label documents reduces the cost of induction.
A decision-tree-based symbolic rule induction system for text categorization also improves text classification.
Disadvantage
When most of the training data does not fit in memory, decision tree construction becomes inefficient because training tuples must be swapped in and out of memory. This issue has been handled by approaches that work with numeric and categorical data.
31 Which classifier to use?