ML - 4

Uploaded by

CST A69 Trisha Nandy


UNIVERSITY OF ENGINEERING & MANAGEMENT, KOLKATA

Course Name : AI & ML

Definition

Concept of Decision Tree

A decision tree is an important data structure used to solve many computational problems.
• It may be binary (if each test has a single yes-or-no outcome)
• It may be m-ary

July 12, 2024 2


Binary Decision Tree

A B C f
0 0 0 m0
0 0 1 m1
0 1 0 m2
0 1 1 m3
1 0 0 m4
1 0 1 m5
1 1 0 m6
1 1 1 m7
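
The truth table above can be read as a binary decision tree of depth three: test A at the root, then B, then C, with one minterm at each leaf. A minimal sketch of that walk follows; the function name `minterm_leaf` is an assumption made for illustration only.

```python
def minterm_leaf(a: int, b: int, c: int) -> str:
    """Follow the 0/1 branch for A, then B, then C, and return the leaf label."""
    index = 0
    for bit in (a, b, c):          # one binary test per tree level
        index = 2 * index + bit    # descend into the 0-branch or the 1-branch
    return f"m{index}"


print(minterm_leaf(1, 0, 1))  # the row A=1, B=0, C=1 of the table
```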

Basic Concept

• In the last slide we considered a decision tree in which every attribute takes binary values only. Decision trees are also possible where attributes are of a continuous data type
Decision Tree with numeric data


• Training data set: buys_computer
• The data set follows an example of Quinlan’s ID3 (Playing Tennis)

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

• Resulting tree:

age?
  <=30:   student?
            no:  no
            yes: yes
  31…40:  yes
  >40:    credit rating?
            excellent: no
            fair:      yes
Some Characteristics
• Decision tree may be n-ary, n ≥ 2.
• There is a special node called root node.
• All nodes drawn with circle (ellipse) are called internal nodes.
• All nodes drawn with rectangle boxes are called terminal nodes or leaf
nodes.
• Edges of a node represent the outcome for a value of the node.
• In a path, a node with same label is never repeated.
• Decision tree is not unique, as different ordering of internal nodes can
give different decision tree.

A decision tree helps us to classify data.
– Internal nodes are attributes
– Edges are the values of attributes
– External nodes are the outcomes of classification

• Such a classification is, in fact, made by posing questions starting from the root node and continuing to a terminal node.


How are decision trees used for classification?

Given a tuple, X, for which the associated class
label is unknown, the attribute values of the
tuple are tested against the decision tree. A
path is traced from the root to a leaf node,
which holds the class prediction for that
tuple.
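
The path-tracing step can be sketched in a few lines. The nested-tuple encoding of the buys_computer tree and the names `TREE` and `classify` are assumptions made for this illustration, not part of the original slides.

```python
# Internal nodes are (attribute, branches); leaves are plain class labels.
TREE = ("age", {
    "<=30": ("student", {"no": "no", "yes": "yes"}),
    "31…40": "yes",
    ">40": ("credit_rating", {"excellent": "no", "fair": "yes"}),
})


def classify(tree, x):
    """Trace a path from the root to a leaf and return the leaf's class label."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[x[attribute]]   # follow the edge matching the tuple's value
    return tree


X = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(TREE, X))
```

Only the attributes that lie on the traced path are inspected; `income` never appears in this tree, so its value does not affect the prediction.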


Why are decision tree classifiers so popular?

• Construction of a decision tree does not require any domain knowledge.
• It can deal with high-dimensional data.
• Decision tree classifiers generally have good accuracy, although this depends on the data at hand.

Building Decision Tree
In principle, there are exponentially many decision trees that can be
constructed from a given database (also called training data).
• Two approaches are known

– Greedy strategy
• A top-down recursive divide-and-conquer

– Modification of greedy strategy

• ID3
• C4.5
• CART, etc.

Basic algorithm (a greedy algorithm)
• Tree is constructed in a top-down recursive divide-and-conquer manner
• At start, all the training examples are at the root
• Attributes are categorical (if continuous-valued, they are discretized in advance)
• Examples are partitioned recursively based on selected attributes
• Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
Conditions for stopping partitioning
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
• There are no samples left
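
The greedy procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact algorithm from the slides: it uses information gain as the selection measure, handles categorical attributes only, and the function names and the four-row toy data set are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    n = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes remain, so the leaf is labeled by majority voting.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Branches are grown only for observed values, so the "no samples left"
    # case cannot arise in this simplified version.
    for value in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, branches)


# Hypothetical toy data: attribute "a" separates the classes, "b" does not.
rows = [{"a": "x", "b": "p"}, {"a": "x", "b": "q"},
        {"a": "y", "b": "p"}, {"a": "y", "b": "q"}]
labels = ["no", "no", "yes", "yes"]
tree = build_tree(rows, labels, ["a", "b"])
print(tree)
```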


Discussions on Algorithm:
The algorithm is called with three parameters: D, attribute_list, and attribute_selection_method
• D: data partition
• attribute_list: list of attributes describing the tuples
• attribute_selection_method: heuristic method for selecting the best attribute

• The tree starts as a single node, N, representing the training tuples in D
• If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with that class (steps 2 and 3). Note that steps 4 and 5 are terminating conditions.
• Otherwise, the algorithm calls attribute_selection_method to determine the splitting criterion.

The splitting criterion indicates the splitting attribute and may also indicate either a split-point or a splitting subset.
The splitting criterion is determined so that, ideally, the
resulting partitions at each branch are as “pure” as
possible. A partition is pure if all of the tuples in it
belong to the same class. In other words, if we were
to split up the tuples in D according to the mutually
exclusive outcomes of the splitting criterion, we hope
for the resulting partitions to be as pure as possible.

• Let A be the splitting attribute. A has v distinct values, a_1, a_2, ..., a_v, based on the training data.
• A is discrete-valued: In this case, the outcomes of the test at node N correspond directly to the known values of A.
• A is continuous-valued: In this case, the test at node N has two possible outcomes, corresponding to the conditions A <= split_point and A > split_point, respectively.
• A is discrete-valued and a binary tree must be produced: In this case, the test at node N checks whether the value of A belongs to a splitting subset of the known values of A.

• If the splitting attribute is continuous-valued or if we are restricted to binary trees then, respectively, either a split point or a splitting subset must also be determined as part of the splitting criterion.
• The tree node created for partition D is labeled with the
splitting criterion, branches are grown for each
outcome of the criterion, and the tuples are partitioned
accordingly.
• This section describes three popular attribute selection
measures—information gain, gain ratio, and gini index.

• The notation used herein is as follows. Let D, the data partition, be a training set of class-labeled tuples. Suppose the class label attribute has m distinct values defining m distinct classes, C_i (for i = 1, ..., m). Let C_{i,D} be the set of tuples of class C_i in D. Let |D| and |C_{i,D}| denote the number of tuples in D and C_{i,D}, respectively.

Attribute Selection Measure: Information Gain (ID3/C4.5)
• Select the attribute with the highest information gain
• Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}|/|D|
• Expected information (entropy) needed to classify a tuple in D:
$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
• Information needed (after using A to split D into v partitions) to classify D:
$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$
• Information gained by branching on attribute A:
$$Gain(A) = Info(D) - Info_A(D)$$

• The expected information needed to classify a tuple in D is given by Info(D)
• Info(D) is just the average amount of
information needed to identify the class label
of a tuple in D. Note that, at this point, the
information we have is based solely on the
proportions of tuples of each class. Info(D) is
also known as the entropy of D.

• In other words, Gain(A) tells us how much would be
gained by branching on A. It is the expected reduction in
the information requirement caused by knowing the
value of A.
• The attribute A with the highest information gain,
(Gain(A)), is chosen as the splitting attribute at node N.
This is equivalent to saying that we want to partition on
the attribute A that would do the “best classification,” so
that the amount of information still required to finish
classifying the tuples is minimal (i.e., minimum InfoA(D)).

Discussions

Figure: Tree (not yet final) after selecting age as the splitting attribute

Figure: Final tree

Attribute Selection: Information Gain
• Class P: buys_computer = “yes” (9 tuples)
• Class N: buys_computer = “no” (5 tuples)
$$Info(D) = I(9,5) = -\frac{9}{14}\log_2\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\left(\frac{5}{14}\right) = 0.940$$

age     p_i  n_i  I(p_i, n_i)
<=30    2    3    0.971
31…40   4    0    0
>40     3    2    0.971

$$Info_{age}(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694$$
Here (5/14) I(2,3) means “age <=30” has 5 out of 14 samples, with 2 yes’es and 3 no’s. Hence
$$Gain(age) = Info(D) - Info_{age}(D) = 0.246$$
Similarly,
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
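
The four gains can be checked with a short script over the 14 training tuples. This is a sketch: the names `DATA`, `info`, and `gain` are chosen here for illustration and do not come from the slides.

```python
from collections import Counter
from math import log2

COLUMNS = ("age", "income", "student", "credit_rating", "buys_computer")
DATA = [row.split() for row in """
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
""".strip().splitlines()]


def info(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())


def gain(attribute):
    """Info(D) minus the weighted entropy after splitting on the attribute."""
    col = COLUMNS.index(attribute)
    labels = [row[-1] for row in DATA]
    info_a = 0.0
    for value in {row[col] for row in DATA}:
        part = [row[-1] for row in DATA if row[col] == value]
        info_a += len(part) / len(DATA) * info(part)
    return info(labels) - info_a


for name in COLUMNS[:-1]:
    print(name, round(gain(name), 3))
```

The printed values agree with 0.246, 0.029, 0.151, and 0.048 up to the last digit; the small differences arise because the slide rounds intermediate results before subtracting.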
Computing Information-Gain for Continuous-
Valued Attributes
• Let attribute A be a continuous-valued attribute
• Must determine the best split point for A
– Sort the values of A in increasing order
– Typically, the midpoint between each pair of adjacent values
is considered as a possible split point
• (a_i + a_{i+1})/2 is the midpoint between the values a_i and a_{i+1}
– The point with the minimum expected information
requirement for A is selected as the split-point for A
• Split:
– D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is
the set of tuples in D satisfying A > split-point
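
A minimal sketch of the split-point search follows. The function name `best_split_point` and the numeric ages are hypothetical; the slides give no numeric data.

```python
from collections import Counter
from math import log2


def info(labels):
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())


def best_split_point(values, labels):
    """Try every midpoint of adjacent distinct values; keep the one with
    the minimum expected information requirement."""
    pairs = sorted(zip(values, labels))
    distinct = sorted({v for v, _ in pairs})
    best_point, best_info = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        point = (lo + hi) / 2
        d1 = [l for v, l in pairs if v <= point]   # A <= split-point
        d2 = [l for v, l in pairs if v > point]    # A > split-point
        expected = (len(d1) * info(d1) + len(d2) * info(d2)) / len(pairs)
        if expected < best_info:
            best_point, best_info = point, expected
    return best_point, best_info


ages = [22, 25, 31, 38, 45, 52]                    # hypothetical values
buys = ["no", "no", "yes", "yes", "yes", "yes"]
point, expected_info = best_split_point(ages, buys)
print(point, expected_info)
```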
Gain Ratio for Attribute Selection (C4.5)
• Information gain measure is biased towards attributes with a
large number of values
• C4.5 (a successor of ID3) uses gain ratio to overcome the
problem (normalization to information gain)
$$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)$$
– GainRatio(A) = Gain(A)/SplitInfo_A(D)
• Ex.

– gain_ratio(income) = 0.029/1.557 = 0.019

• The attribute with the maximum gain ratio is selected as the
splitting attribute
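
The income example can be reproduced from the partition sizes alone. In the training table, income takes the value high in 4 tuples, medium in 6, and low in 4. The function names below are assumptions for illustration.

```python
from math import log2


def split_info(partition_sizes):
    """SplitInfo_A(D) computed from the sizes |D_j| of the partitions."""
    total = sum(partition_sizes)
    return sum(s / total * log2(total / s) for s in partition_sizes if s)


def gain_ratio(gain, partition_sizes):
    return gain / split_info(partition_sizes)


income_sizes = [4, 6, 4]          # high, medium, low
print(round(split_info(income_sizes), 3))          # 1.557
print(round(gain_ratio(0.029, income_sizes), 3))   # 0.019
```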
Gini Index (CART, IBM IntelligentMiner)
• If a data set D contains examples from n classes, the gini index, gini(D), is defined as
$$gini(D) = 1 - \sum_{j=1}^{n} p_j^2$$
where p_j is the relative frequency of class j in D
• If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as
$$gini_A(D) = \frac{|D_1|}{|D|} gini(D_1) + \frac{|D_2|}{|D|} gini(D_2)$$
• Reduction in impurity:
$$\Delta gini(A) = gini(D) - gini_A(D)$$
• The attribute that provides the smallest gini_A(D) (or the largest reduction in impurity) is chosen to split the node (need to enumerate all the possible splitting points for each attribute)
Computation of Gini Index
• Ex. D has 9 tuples in buys_computer = “yes” and 5 in “no”
$$gini(D) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 0.459$$
• Suppose the attribute income partitions D into 10 in D1: {low, medium} and 4 in D2: {high}
$$gini_{income \in \{low,medium\}}(D) = \frac{10}{14} Gini(D_1) + \frac{4}{14} Gini(D_2) = 0.443$$
• Gini{low,high} is 0.458; Gini{medium,high} is 0.450.
• Thus, split on {low, medium} (and {high}), since it has the lowest Gini index.
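
The three binary splits on income can be compared with a short script. The (yes, no) counts per income value are read off the buys_computer training table; the names `INCOME`, `merged`, and `gini_split` are assumptions for illustration.

```python
def gini(counts):
    """Gini index from per-class tuple counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_split(d1_counts, d2_counts):
    """Weighted Gini index of a binary split into D1 and D2."""
    n1, n2 = sum(d1_counts), sum(d2_counts)
    return (n1 * gini(d1_counts) + n2 * gini(d2_counts)) / (n1 + n2)


# (yes, no) counts of buys_computer for each income value
INCOME = {"low": (3, 1), "medium": (4, 2), "high": (2, 2)}


def merged(values):
    """Class counts of the subset formed by the given income values."""
    return tuple(sum(INCOME[v][k] for v in values) for k in range(2))


results = {}
for d1 in (("low", "medium"), ("low", "high"), ("medium", "high")):
    d2 = tuple(v for v in INCOME if v not in d1)
    results[d1] = gini_split(merged(d1), merged(d2))
    print(d1, d2, round(results[d1], 3))
```

The smallest value belongs to the {low, medium} versus {high} split, matching the choice made above.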

• The three measures, in general, return good results but
– Information gain:
• biased towards multivalued attributes
– Gain ratio:
• tends to prefer unbalanced splits in which one partition is
much smaller than the others
– Gini index:
• biased to multivalued attributes
• has difficulty when # of classes is large
• tends to favor tests that result in equal-sized partitions and
purity in both partitions
• CHAID: a popular decision tree algorithm, measure based on χ2 test for
independence
• C-SEP: performs better than info. gain and gini index in certain cases
• G-statistic: has a close approximation to χ2 distribution
• MDL (Minimal Description Length) principle (i.e., the simplest solution is
preferred):
– The best tree is the one that requires the fewest # of bits to both (1)
encode the tree, and (2) encode the exceptions to the tree
• Multivariate splits (partition based on multiple variable combinations)
– CART: finds multivariate splits based on a linear combination of attributes
• Which attribute selection measure is the best?

Using IF-THEN Rules for Classification

• Represent the knowledge in the form of IF-THEN rules
R: IF age = youth AND student = yes THEN buys_computer = yes
– Rule antecedent/precondition vs. rule consequent
• Assessment of a rule: coverage and accuracy
– n_covers = # of tuples covered by R
– n_correct = # of tuples correctly classified by R
coverage(R) = n_covers / |D| /* D: training data set */
accuracy(R) = n_correct / n_covers
• If more than one rule is triggered, conflict resolution is needed
– Size ordering: assign the highest priority to the triggering rule that has the
“toughest” requirement (i.e., with the most attribute tests)
– Class-based ordering: decreasing order of prevalence or misclassification cost per
class
– Rule-based ordering (decision list): rules are organized into one long priority list,
according to some measure of rule quality or by experts
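
Coverage and accuracy of rule R can be checked against the buys_computer training tuples. This sketch assumes that “youth” corresponds to age <=30 in the training table and keeps only the columns the rule needs; the names `D` and `evaluate` are chosen for illustration.

```python
# (age, student, buys_computer) for the 14 training tuples, in table order
D = [
    ("youth", "no", "no"), ("youth", "no", "no"), ("middle_aged", "no", "yes"),
    ("senior", "no", "yes"), ("senior", "yes", "yes"), ("senior", "yes", "no"),
    ("middle_aged", "yes", "yes"), ("youth", "no", "no"), ("youth", "yes", "yes"),
    ("senior", "yes", "yes"), ("youth", "yes", "yes"), ("middle_aged", "no", "yes"),
    ("middle_aged", "yes", "yes"), ("senior", "no", "no"),
]


def evaluate(antecedent, predicted_class, data):
    """Return (coverage, accuracy) of the rule IF antecedent THEN predicted_class."""
    covered = [t for t in data if antecedent(t)]
    n_covers = len(covered)
    n_correct = sum(t[-1] == predicted_class for t in covered)
    coverage = n_covers / len(data)
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


# R: IF age = youth AND student = yes THEN buys_computer = yes
coverage, accuracy = evaluate(lambda t: t[0] == "youth" and t[1] == "yes", "yes", D)
print(coverage, accuracy)
```

R covers 2 of the 14 tuples and classifies both correctly, so its accuracy is 100% even though its coverage is low.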
Rule Extraction from a Decision Tree
• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction: the leaf holds the class prediction
• Rules are mutually exclusive and exhaustive

age?
  <=30:   student?  (no: no, yes: yes)
  31..40: yes
  >40:    credit rating?  (excellent: no, fair: yes)

• Example: Rule extraction from our buys_computer decision-tree
IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = no
IF age = old AND credit_rating = fair THEN buys_computer = yes
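
Rule extraction amounts to enumerating root-to-leaf paths. The sketch below encodes the tree as drawn on the slide (under credit rating, excellent leads to “no” and fair leads to “yes”); the nested-tuple encoding and the name `extract_rules` are assumptions for illustration.

```python
TREE = ("age", {
    "young": ("student", {"no": "no", "yes": "yes"}),
    "mid-age": "yes",
    "old": ("credit_rating", {"excellent": "no", "fair": "yes"}),
})


def extract_rules(tree, target="buys_computer", conditions=()):
    """Create one IF-THEN rule for each path from the root to a leaf."""
    if isinstance(tree, str):                      # leaf: emit the finished rule
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN {target} = {tree}"]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, target, conditions + ((attribute, value),))
    return rules


rules = extract_rules(TREE)
for rule in rules:
    print(rule)
```

Because each tuple follows exactly one path, the extracted rules are mutually exclusive and exhaustive by construction.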
Rule Extraction from the Training Data

• Sequential covering algorithm: Extracts rules directly from training data

• Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
• Rules are learned sequentially; each rule for a given class Ci will cover many tuples
of Ci but none (or few) of the tuples of other classes
• Steps:
– Rules are learned one at a time
– Each time a rule is learned, the tuples covered by the rule are removed
– The process repeats on the remaining tuples until a termination condition is met,
e.g., when there are no more training examples or when the quality of a rule
returned is below a user-specified threshold
• Compare with decision-tree induction, which learns a set of rules simultaneously


Sequential Covering Algorithm
while (enough target tuples left)
generate a rule
remove positive target tuples satisfying this rule

Figure: Positive examples covered by Rule 1, Rule 2, and Rule 3
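
The covering loop can be sketched as below. This is a deliberately simplified stand-in, not FOIL, AQ, CN2, or RIPPER: each rule is a single attribute = value test chosen by accuracy (ties broken by the number of positives covered), and the `min_accuracy` threshold, the function names, and the five toy examples are all hypothetical.

```python
def learn_one_rule(examples, target):
    """Pick the attribute = value test with the highest accuracy for target."""
    best, best_key = None, (-1.0, 0)
    for attr in examples[0][0]:
        for value in sorted({x[attr] for x, _ in examples}):
            covered = [y for x, y in examples if x[attr] == value]
            pos = sum(y == target for y in covered)
            key = (pos / len(covered), pos)        # accuracy, then positives covered
            if key > best_key:
                best, best_key = (attr, value), key
    return best, best_key[0]


def sequential_covering(examples, target, min_accuracy=0.6):
    rules, remaining = [], list(examples)
    while any(y == target for _, y in remaining):  # target tuples are left
        (attr, value), acc = learn_one_rule(remaining, target)
        if acc < min_accuracy:                     # rule quality below threshold
            break
        rules.append((attr, value))
        # remove the positive target tuples satisfying this rule
        remaining = [(x, y) for x, y in remaining
                     if not (x[attr] == value and y == target)]
    return rules


examples = [
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "youth", "student": "no"}, "no"),
    ({"age": "middle_aged", "student": "no"}, "yes"),
    ({"age": "senior", "student": "no"}, "no"),
    ({"age": "senior", "student": "yes"}, "yes"),
]
rules = sequential_covering(examples, "yes")
print(rules)
```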


How to Learn-One-Rule?
• Start with the most general rule possible: condition = empty
• Add new attribute tests by adopting a greedy depth-first strategy
– Pick the one that most improves the rule quality
• Rule-quality measures: consider both coverage and accuracy
– FOIL gain (in FOIL & RIPPER): assesses info_gain by extending the condition
$$FOIL\_Gain = pos' \times \left(\log_2 \frac{pos'}{pos' + neg'} - \log_2 \frac{pos}{pos + neg}\right)$$
Here pos/neg are the numbers of positive/negative tuples covered by the current rule, and pos'/neg' are those covered by the extended rule.
It favors rules that have high accuracy and cover many positive tuples
• Rule pruning based on an independent set of test tuples
$$FOIL\_Prune(R) = \frac{pos - neg}{pos + neg}$$
pos/neg are the numbers of positive/negative tuples covered by R.
If FOIL_Prune is higher for the pruned version of R, prune R
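
Both measures are direct to compute from tuple counts. The sketch below uses hypothetical counts (a rule covering 9 positive and 5 negative tuples that, once extended, covers 6 positive and 1 negative); the function names are chosen for illustration.

```python
from math import log2


def foil_gain(pos, neg, pos_new, neg_new):
    """FOIL_Gain for extending a rule: (pos, neg) are the counts covered
    before the extension, (pos_new, neg_new) the counts covered after."""
    if pos_new == 0:
        return 0.0                     # nothing positive is covered any more
    before = log2(pos / (pos + neg))
    after = log2(pos_new / (pos_new + neg_new))
    return pos_new * (after - before)


def foil_prune(pos, neg):
    """FOIL_Prune(R), evaluated on an independent pruning set."""
    return (pos - neg) / (pos + neg)


print(round(foil_gain(9, 5, 6, 1), 3))
print(round(foil_prune(6, 1), 3))
```

A positive gain means the extended rule is more accurate on the positives it still covers; the factor pos' rewards extensions that keep many positive tuples.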

Rule Generation
• To generate a rule
while(true)
find the best predicate p
if foil-gain(p) > threshold then add p to current rule
else break

Figure: Regions covered by A3=1, by A3=1&&A1=2, and by A3=1&&A1=2&&A8=5 among the positive and negative examples


Overfitting and Tree Pruning

• Overfitting: An induced tree may overfit the training data
– Too many branches, some may reflect anomalies due to noise or
outliers
– Poor accuracy for unseen samples
• Two approaches to avoid overfitting
– Prepruning: Halt tree construction early: do not split a node if this
would result in the goodness measure falling below a threshold
• Difficult to choose an appropriate threshold
– Postpruning: Remove branches from a “fully grown” tree—get a
sequence of progressively pruned trees
• Use a set of data different from the training data to decide which
is the “best pruned tree”

Enhancements to Basic Decision Tree Induction

• Allow for continuous-valued attributes
– Dynamically define new discrete-valued attributes that partition
the continuous attribute value into a discrete set of intervals
• Handle missing attribute values
– Assign the most common value of the attribute
– Assign probability to each of the possible values
• Attribute construction
– Create new attributes based on existing ones that are sparsely
represented
– This reduces fragmentation, repetition, and replication

• When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers. Tree pruning methods address this problem of overfitting the data.
• Pruned trees tend to be smaller and less complex and, thus, easier to comprehend. They are usually faster and better at correctly classifying independent test data (i.e., of previously unseen tuples) than unpruned trees.

“How does tree pruning work?” There are two common approaches to tree pruning: prepruning and postpruning.
• In the prepruning approach, a tree is “pruned” by halting its construction early (e.g., by deciding not to further split or partition the subset of training tuples at a given node). Upon halting, the node becomes a leaf. The leaf may hold the most frequent class among the subset tuples or the probability distribution of those tuples.
• When constructing a tree, measures such as
statistical significance, information gain, Gini
index, and so on can be used to assess the
goodness of a split. If partitioning the tuples at a
node would result in a split that falls below a
pre-specified threshold, then further
partitioning of the given subset is halted.
• There are difficulties, however, in choosing an
appropriate threshold. High thresholds could
result in oversimplified trees, whereas low
thresholds could result in very little
simplification.

• The second and more common approach is postpruning, which removes subtrees from a “fully grown” tree. A subtree at a given node is pruned by removing its branches and replacing it with a leaf. The leaf is labeled with the most frequent class among the subtree being replaced. For example, notice the subtree at node “A3?” in the unpruned tree of the given Figure. Suppose that the most common class within this subtree is “class B.” In the pruned version of the tree, the subtree in question is pruned by replacing it with the leaf “class B.”
• The cost complexity pruning algorithm used in CART is an
example of the postpruning approach.
• This approach considers the cost complexity of a tree to be a
function of the number of leaves in the tree and the error rate
of the tree (where the error rate is the percentage of tuples
misclassified by the tree).
• It starts from the bottom of the tree. For each internal node, N,
it computes the cost complexity of the sub-tree at N, and the cost
complexity of the sub-tree at N if it were to be pruned (i.e.,
replaced by a leaf node). The two values are compared. If
pruning the sub-tree at node N would result in a smaller cost
complexity, then the sub-tree is pruned. Otherwise, it is kept.
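
The bottom-up comparison can be sketched as follows. This is an illustration under stated assumptions rather than CART's actual procedure: cost complexity is taken to be the error rate plus `alpha` times the number of leaves, where `alpha` is an assumed trade-off parameter, and the `Node` class and the small example tree are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    leaf_errors: int                       # tuples misclassified if this node were a leaf
    children: list = field(default_factory=list)


def leaves(node):
    return 1 if not node.children else sum(leaves(c) for c in node.children)


def errors(node):
    if not node.children:
        return node.leaf_errors
    return sum(errors(c) for c in node.children)


def cost(node, n_tuples, alpha):
    """Cost complexity of the subtree rooted at node."""
    return errors(node) / n_tuples + alpha * leaves(node)


def prune(node, n_tuples, alpha):
    for child in node.children:            # start from the bottom of the tree
        prune(child, n_tuples, alpha)
    if node.children:
        cost_as_leaf = node.leaf_errors / n_tuples + alpha
        if cost_as_leaf < cost(node, n_tuples, alpha):
            node.children = []             # replace the subtree by a leaf
    return node


a = Node(leaf_errors=1, children=[Node(0), Node(1)])
root = Node(leaf_errors=5, children=[a, Node(2)])
prune(root, n_tuples=14, alpha=0.05)
print(leaves(root), errors(root))
```

In this example the subtree at `a` is pruned because splitting it does not reduce the error, while the root split is kept because it lowers the error by more than the penalty for the extra leaf.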

• C4.5 uses a method called pessimistic pruning, which is similar to the cost complexity method in that it also uses error rate estimates to make decisions regarding subtree pruning.
• Rather than pruning trees based on estimated error rates, we can prune trees based on the number of bits required to encode them. The “best” pruned tree is the one that minimizes the number of encoding bits. This method adopts the Minimum Description Length (MDL) principle.

• Alternatively, prepruning and postpruning may be interleaved for a combined
approach. Postpruning requires more
computation than prepruning, yet generally
leads to a more reliable tree. No single
pruning method has been found to be
superior over all others.

• Although pruned trees tend to be more compact than their unpruned
counterparts, they may still be rather large and complex.
• Decision trees can suffer from repetition and replication, making them
overwhelming to interpret.
• Repetition occurs when an attribute is repeatedly tested along a given
branch of the tree (such as “age < 60?”, followed by “age < 45?”, and so
on). In replication, duplicate subtrees exist within the tree.
• These situations can impede the accuracy and comprehensibility of a
decision tree.
• The use of multivariate splits (splits based on a combination of
attributes) can prevent these problems.
• Another approach is to use a different form of knowledge
representation, such as rules, instead of decision trees.

• Scalability: The efficiency of existing decision tree algorithms,
such as ID3, C4.5, and CART, has been well established for
relatively small data sets. Efficiency becomes an issue of
concern when these algorithms are applied to the mining of
very large real-world databases. The pioneering decision tree
algorithms that we have discussed so far have the restriction
that the training tuples should reside in memory.
• In data mining applications, very large training sets of millions
of tuples are common. Most often, the training data will not fit
in memory. Decision tree construction therefore becomes
inefficient due to swapping of the training tuples in and out of
main and cache memories.

• More recent decision tree algorithms that address the scalability issue have been
proposed. Algorithms for the induction of
decision trees from very large training sets
include SLIQ and SPRINT, both of which can
handle categorical and continuous valued
attributes. Both algorithms propose
presorting techniques on disk-resident data
sets that are too large to fit in memory.

Thank You

Information Sources: NPTEL; Han and Kamber.

689 pages

ML - 4

Uploaded by

ML - 4

Uploaded by

UNIVERSITY OF ENGINEERING & MANAGEMENT, KOLKATA

Course Name : AI & ML

Click to edit Master subtitle style

Concept of Decision Tree

Binary Decision Tree

age income student credit_rating buys_computer

Resulting tree:

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes

• Such a classification is, in fact, made by posing questions starting from

How are decision trees used for classification?

Why are decision tree classifiers so popular?

– Modification of greedy strategy

• The tree starts as a single node, N, representing

The splitting criterion indicates the splitting attribute

• Let A be the splitting attribute. A has v distinct values,

• If the splitting attribute is continuous-valued or if we

• The notation used herein is as follows. Let D,

The expected information needed to classify a

The expected information needed to classify a

– gain_ratio(income) = 0.029/1.557 = 0.019
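
The gain ratio figure above can be checked with a short Python sketch. It assumes the 14-tuple buys_computer training set shown earlier in the deck, reduced to (income, buys_computer) pairs; the names DATA, entropy, and gain_ratio are illustrative and do not come from the slides.

```python
from collections import Counter
from math import log2

# (income, buys_computer) pairs, in the row order of the training table.
DATA = [
    ("high", "no"), ("high", "no"), ("high", "yes"), ("medium", "yes"),
    ("low", "yes"), ("low", "no"), ("low", "yes"), ("medium", "no"),
    ("low", "yes"), ("medium", "yes"), ("medium", "yes"), ("medium", "yes"),
    ("high", "yes"), ("medium", "no"),
]


def entropy(values):
    """Info(D) = -sum(p_i * log2(p_i)) over the distinct values in the list."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(pairs):
    """Return (Gain(A), SplitInfo_A(D), GainRatio(A)) for one attribute A."""
    n = len(pairs)
    labels = [label for _, label in pairs]
    # Group the class labels by attribute value.
    groups = {}
    for value, label in pairs:
        groups.setdefault(value, []).append(label)
    # Info_A(D): weighted entropy of the partitions induced by A.
    info_a = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - info_a
    # SplitInfo_A(D): entropy of the attribute's own value distribution.
    split_info = entropy([value for value, _ in pairs])
    return gain, split_info, gain / split_info


gain, split_info, ratio = gain_ratio(DATA)
print(f"Gain = {gain:.3f}, SplitInfo = {split_info:.3f}, GainRatio = {ratio:.3f}")
```

Dividing by SplitInfo penalizes attributes that split the data into many small partitions, which plain information gain tends to favor.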

• The attribute that provides the smallest gini_split(D) (or the largest

Gini{low,high} is 0.458; Gini{medium,high} is 0.450.
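
The two Gini values above can be reproduced with a small Python sketch. The class counts per income value are tallied from the buys_computer training table earlier in the deck; COUNTS, gini, and gini_split are illustrative names, not part of the slides.

```python
from collections import Counter

# buys_computer class counts for each income value (14 tuples in total).
COUNTS = {
    "low": {"yes": 3, "no": 1},
    "medium": {"yes": 4, "no": 2},
    "high": {"yes": 2, "no": 2},
}


def gini(counts):
    """Gini(D) = 1 - sum(p_i ** 2) over the classes."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_split(subset):
    """Weighted Gini index of the binary split {subset} vs. the remaining values."""
    inside, outside = Counter(), Counter()
    for value, counts in COUNTS.items():
        (inside if value in subset else outside).update(counts)
    n_in, n_out = sum(inside.values()), sum(outside.values())
    n = n_in + n_out
    return n_in / n * gini(inside) + n_out / n * gini(outside)


for subset in ({"low", "high"}, {"medium", "high"}, {"low", "medium"}):
    print(sorted(subset), round(gini_split(subset), 3))
```

The candidate split with the smallest weighted Gini index is the one selected for the node.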

Using IF-THEN Rules for Classification

<=30 31..40 >40
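
As a rough illustration of rule extraction, each root-to-leaf path of the buys_computer tree (age at the root, then student or credit rating) can be written as one IF-THEN rule. The sketch below is a minimal Python rendering under that assumption; RULES and classify are hypothetical names chosen for the example.

```python
# One rule per root-to-leaf path: (conditions, predicted buys_computer).
RULES = [
    ({"age": "<=30", "student": "no"}, "no"),
    ({"age": "<=30", "student": "yes"}, "yes"),
    ({"age": "31..40"}, "yes"),
    ({"age": ">40", "credit_rating": "excellent"}, "no"),
    ({"age": ">40", "credit_rating": "fair"}, "yes"),
]


def classify(record, rules=RULES, default=None):
    """Return the label of the first rule whose conditions all hold for the record."""
    for conditions, label in rules:
        if all(record.get(attr) == value for attr, value in conditions.items()):
            return label
    return default  # no rule fired


example = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
print(classify(example))
```

Because the rules come from a single tree, they are mutually exclusive and exhaustive over the tree's attribute values, so the order in which they are tried does not change the result.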

• Sequential covering algorithm: Extracts rules directly from training data

Overfitting and Tree Pruning

Enhancements to Basic Decision Tree Induction

When a decision tree is built, many of the

“How does tree pruning work?” There are two common

The second and more common approach is postpruning,

• C4.5 uses a method called pessimistic pruning,

• Alternatively, prepruning and postpruning

• More recent decision tree algorithms that

Information Sources: NPTEL, Han & Kamber.
