Table 4.1. Data Set for Exercise 2
2. Consider the training examples shown in Table 4.1 for a binary classification
problem.
(a) Compute the Gini index for the overall collection of training examples.
Answer:
Gini = 1 − 2 × 0.5² = 0.5.
(b) Compute the Gini index for the Customer ID attribute.
Answer:
The gini for each Customer ID value is 0. Therefore, the overall gini
for Customer ID is 0.
(c) Compute the Gini index for the Gender attribute.
Answer:
The gini for Male is 1 − 2 × 0.5² = 0.5. The gini for Female is also 0.5.
Therefore, the overall gini for Gender is 0.5 × 0.5 + 0.5 × 0.5 = 0.5.
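A minimal Python sketch, using only the class proportions quoted in parts (a) and (c) (the gini helper below is illustrative, not from the text), that reproduces these values:

```python
# Gini index from class proportions, checked against parts (a) and (c).

def gini(proportions):
    """Gini index = 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions)

# (a) Overall collection: each class has proportion 0.5.
print(gini([0.5, 0.5]))                                  # 0.5

# (c) Gender: each child has gini 0.5 and holds half of the records.
print(0.5 * gini([0.5, 0.5]) + 0.5 * gini([0.5, 0.5]))   # 0.5
```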
(d) Compute the Gini index for the Car Type attribute using multiway
split.
Answer:
The gini for Family car is 0.375, Sports car is 0, and Luxury car is
0.2188. The overall gini is 0.1625.
(e) Compute the Gini index for the Shirt Size attribute using multiway
split.
Answer:
The gini for Small shirt size is 0.48, Medium shirt size is 0.4898, Large
shirt size is 0.5, and Extra Large shirt size is 0.5. The overall gini for
Shirt Size attribute is 0.4914.
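A minimal Python sketch of the multiway weighted Gini computation. Since Table 4.1 is not reproduced here, the per-value class counts below are assumptions chosen to be consistent with the per-value gini values quoted in part (d); the Shirt Size computation is analogous:

```python
# Weighted Gini of a multiway split from per-value class counts.
# The Car Type counts below are assumed (Table 4.1 is not shown here); they
# were chosen to match the per-value gini values quoted in part (d).

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_gini(children):
    """children: per-attribute-value class counts, e.g. (C0, C1)."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

car_type = {"Family": (1, 3), "Sports": (8, 0), "Luxury": (1, 7)}  # assumed counts
for value, counts in car_type.items():
    print(value, round(gini(counts), 4))        # 0.375, 0.0, 0.2188
print(round(split_gini(car_type.values()), 4))  # 0.1625
```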
(f) Which attribute is better, Gender, Car Type, or Shirt Size?
Answer:
Car Type because it has the lowest gini among the three attributes.
(g) Explain why Customer ID should not be used as the attribute test
condition even though it has the lowest Gini.
Answer:
The attribute has no predictive power since new customers are assigned
to new Customer IDs.
3. Consider the training examples shown in Table 4.2 for a binary classification
problem.
(a) What is the entropy of this collection of training examples with respect
to the positive class?
Answer:
There are four positive examples and five negative examples. Thus,
P (+) = 4/9 and P (−) = 5/9. The entropy of the training examples is
−4/9 log2 (4/9) − 5/9 log2 (5/9) = 0.9911.
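A minimal Python sketch, using only the class counts stated above, that reproduces this entropy:

```python
# Entropy of a collection with 4 positive and 5 negative examples.
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([4, 5]), 4))   # 0.9911
```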
(b) What are the information gains of a1 and a2 relative to these training
examples?
Answer:
For attribute a1, the corresponding counts are:

a1    +    −
T     3    1
F     1    4

The entropy of the a1 = T child is −(3/4) log2(3/4) − (1/4) log2(1/4) = 0.8113, and the entropy of the a1 = F child is −(1/5) log2(1/5) − (4/5) log2(4/5) = 0.7219, so the entropy after splitting on a1 is (4/9) × 0.8113 + (5/9) × 0.7219 = 0.7616. The information gain of a1 is therefore 0.9911 − 0.7616 = 0.2294.
For attribute a2, the corresponding counts are:

a2    +    −
T     2    3
F     2    2

The entropy after splitting on a2 is (5/9) × [−(2/5) log2(2/5) − (3/5) log2(3/5)] + (4/9) × [−(2/4) log2(2/4) − (2/4) log2(2/4)] = (5/9) × 0.9710 + (4/9) × 1 = 0.9839, so the information gain of a2 is 0.9911 − 0.9839 = 0.0072.
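A minimal Python sketch that recomputes both information gains from the count tables above (the helper names are illustrative):

```python
# Information gain of a split from (positive, negative) counts per attribute value.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(children):
    """children: list of (positive, negative) counts, one per attribute value."""
    n = sum(sum(child) for child in children)
    parent = entropy([sum(c[0] for c in children), sum(c[1] for c in children)])
    conditional = sum(sum(child) / n * entropy(child) for child in children)
    return parent - conditional

print(round(info_gain([(3, 1), (1, 4)]), 4))   # a1: 0.2294
print(round(info_gain([(2, 3), (2, 2)]), 4))   # a2: 0.0072
```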
(d) What is the best split (among a1 , a2 , and a3 ) according to the infor-
mation gain?
Answer:
According to information gain, a1 produces the best split.
(e) What is the best split (between a1 and a2 ) according to the classification
error rate?
Answer:
For attribute a1 : error rate = 2/9.
For attribute a2 : error rate = 4/9.
Therefore, according to error rate, a1 produces the best split.
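A minimal Python sketch that recomputes both error rates from the count tables in part (b), where each child node predicts its majority class:

```python
# Classification error rate of a split from (positive, negative) counts per value.

def error_rate(children):
    """Misclassified fraction when each child predicts its majority class."""
    n = sum(sum(child) for child in children)
    return sum(min(child) for child in children) / n

print(error_rate([(3, 1), (1, 4)]))   # a1: 2/9 ≈ 0.2222
print(error_rate([(2, 3), (2, 2)]))   # a2: 4/9 ≈ 0.4444
```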
(f) What is the best split (between a1 and a2 ) according to the Gini index?
Answer:
For attribute a1, the gini index is
(4/9) × [1 − (3/4)² − (1/4)²] + (5/9) × [1 − (1/5)² − (4/5)²] = 0.3444.
For attribute a2, the gini index is
(5/9) × [1 − (2/5)² − (3/5)²] + (4/9) × [1 − (2/4)² − (2/4)²] = 0.4889.
Since the gini index for a1 is smaller, it produces the better split.
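A minimal Python sketch that recomputes both Gini indices from the same count tables:

```python
# Weighted Gini of a split from (positive, negative) counts per attribute value.

def split_gini(children):
    n = sum(sum(child) for child in children)
    return sum(sum(c) / n * (1 - sum((x / sum(c)) ** 2 for x in c)) for c in children)

print(round(split_gini([(3, 1), (1, 4)]), 4))   # a1: 0.3444
print(round(split_gini([(2, 3), (2, 2)]), 4))   # a2: 0.4889
```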
4. Show that the entropy of a node never increases after splitting it into smaller
successor nodes.
Answer:
Let Y = {y1 , y2 , · · · , yc } denote the c classes and X = {x1 , x2 , · · · , xk } denote
the k attribute values of an attribute X. Before a node is split on X, the
entropy is:
E(Y) = − Σ_{j=1}^{c} P(yj) log2 P(yj) = − Σ_{j=1}^{c} Σ_{i=1}^{k} P(xi, yj) log2 P(yj),   (4.1)

where we have used the fact that P(yj) = Σ_{i=1}^{k} P(xi, yj) from the law of total probability.
After splitting on X, the entropy for each child node X = xi is:

E(Y|xi) = − Σ_{j=1}^{c} P(yj|xi) log2 P(yj|xi),   (4.2)
where P(yj|xi) is the fraction of examples with X = xi that belong to class yj. The entropy after splitting on X is given by the weighted entropy of the child nodes:

E(Y|X) = Σ_{i=1}^{k} P(xi) E(Y|xi)
       = − Σ_{i=1}^{k} Σ_{j=1}^{c} P(xi) P(yj|xi) log2 P(yj|xi)
       = − Σ_{i=1}^{k} Σ_{j=1}^{c} P(xi, yj) log2 P(yj|xi),   (4.3)
where we have used a known fact from probability theory that P (xi , yj ) =
P (yj |xi )×P (xi ). Note that E(Y |X) is also known as the conditional entropy
of Y given X.
To answer this question, we need to show that E(Y |X) ≤ E(Y ). Let us com-
pute the difference between the entropies after splitting and before splitting,
i.e., E(Y |X) − E(Y ), using Equations 4.1 and 4.3:
E(Y|X) − E(Y) = Σ_{i=1}^{k} Σ_{j=1}^{c} P(xi, yj) log2 [ P(yj) / P(yj|xi) ]
              = Σ_{i=1}^{k} Σ_{j=1}^{c} P(xi, yj) log2 [ P(xi) P(yj) / P(xi, yj) ].   (4.4)
To show that Equation 4.4 is never positive, we use the following property of the logarithmic function:

Σ_{k=1}^{d} ak log(zk) ≤ log( Σ_{k=1}^{d} ak zk ),   (4.5)

subject to the condition that Σ_{k=1}^{d} ak = 1. This property is a special case of a more general theorem for concave functions (such as the logarithm), known as Jensen's inequality.
Applying Equation 4.5 to Equation 4.4 with ak = P(xi, yj) and zk = P(xi)P(yj)/P(xi, yj) (the ak sum to 1, and the inequality also holds for base-2 logarithms), we obtain

E(Y|X) − E(Y) ≤ log2 [ Σ_{i=1}^{k} Σ_{j=1}^{c} P(xi) P(yj) ] = log2 [ ( Σ_{i=1}^{k} P(xi) ) ( Σ_{j=1}^{c} P(yj) ) ] = log2(1) = 0.

Therefore E(Y|X) ≤ E(Y), i.e., the entropy of a node never increases after it is split.
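As a numerical illustration of this result, a minimal Python sketch that checks E(Y|X) ≤ E(Y) for the a1 and a2 splits of Exercise 3:

```python
# Numerical check of E(Y|X) <= E(Y) using the (positive, negative) counts
# per attribute value from Exercise 3.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

parent = entropy([4, 5])                       # E(Y) = 0.9911
for name, children in [("a1", [(3, 1), (1, 4)]), ("a2", [(2, 3), (2, 2)])]:
    n = sum(sum(c) for c in children)
    cond = sum(sum(c) / n * entropy(c) for c in children)   # E(Y|X)
    print(name, round(cond, 4), cond <= parent)             # both print True
```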
A B Class Label
T F +
T T +
T T +
T F −
T T +
F F −
F F −
F F −
T T −
T F −