
Data Mining Quiz 2 - CART

Type: Graded Quiz | Questions: 10 | Time: 45m


Q No: 1

Correct Answer
Marks: 0.50/0.50
In CART, the splitting criterion is decided in such a way that the net Gini Index across the nodes

Reduced by at least 'x', x being predefined

Depends on the context of the problem

Reduces by maximum possible amount in comparison to the root node


You Selected
Increases by maximum possible amount in comparison to the root node
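For reference, a quick sketch of how the Gini decrease for a candidate split is computed (the class counts below are made-up numbers, for illustration only):

# Gini impurity of a node: 1 - sum(p_i^2) over the class proportions p_i
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Hypothetical parent node with a 50/50 class mix, split into two children
parent = gini([50, 50])                       # 0.5
left, right = gini([40, 10]), gini([10, 40])  # 0.32 each
weighted_children = (50 / 100) * left + (50 / 100) * right
print(parent - weighted_children)             # 0.18, the Gini decrease achieved by this split

CART evaluates every candidate split this way and picks the one with the largest decrease.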
Q No: 2

Correct Answer
Marks: 0.50/0.50

Overfitting happens when the model:

a) has high complexity and captures both information & noise

b) has good performance on training dataset but relatively poor on testing dataset

Only a

Only b

Both a and b
You Selected
None of these
Q No: 3

Correct Answer
Marks: 0.50/0.50

Match the following:

i. Accuracy         A. TP/(TP + FN)
ii. Sensitivity     B. (TP + TN)/(TP + TN + FP + FN)
iii. Specificity    C. TP/(TP + FP)
iv. Precision       D. TN/(TN + FP)

Where,

TP = True Positive

FP = False Positive

TN = True Negative

FN = False Negative

i-B, ii-C, iii-D, iv-B

i-A, ii-B, iii-C, iv-D

i-A, ii-D, iii-C, iv-B

i-B, ii-A, iii-D, iv-C


You Selected

Specificity is TN/(TN + FP) and Sensitivity is TP/(TP + FN). 

Accuracy is (TP + TN)/(TP + TN + FP + FN) and Precision is TP/(TP + FP)
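A quick way to verify these formulas in Python (a minimal sketch; the counts below are made-up numbers for illustration):

# Hypothetical confusion-matrix counts, for illustration only
TP, TN, FP, FN = 40, 45, 5, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # i-B
sensitivity = TP / (TP + FN)                   # ii-A (also called recall)
specificity = TN / (TN + FP)                   # iii-D
precision   = TP / (TP + FP)                   # iv-C

print(accuracy, sensitivity, specificity, precision)  # 0.85 0.8 0.9 0.888...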

Q No: 4

Incorrect Answer
Marks: 0/0.50

Consider the following Decision Tree, which is extracted from a bigger tree representing Churners (1) and Non-Churners (0) in a telecom company, based on factors like ContractRenewal_1, DayMins, DataUsage (in GB), etc.

HINT: Assume this to be an independent decision tree.


What insights can we draw from the excerpt of this Decision Tree? 

The 1.3% proportion of users who use <= 319 minutes (DayMins<=319) shall not stick with the
existing Network Provider (Churner).

The proportion of users who use more than 319 minutes (DayMins >319) are Non-Churners.
You Selected
The proportion of users who use less than 276.3 DayMins are Non-Churners.

None of the mentioned

The 0.1% proportion of users whose DataUsage is <= 3.62 GB, and use > 276.3 DayMins shall stick
with the existing Network Provider (Non-Churner).
Correct Option
Please refer to the node circled in red. One level above this node (encircled in yellow), the split occurs at DayMins <= 276.3. The encircled node (red circle) represents the customers who use more than 276.3 minutes (DayMins <= 276.3 = False). At this node, another split occurs at DataUsage <= 3.62. For DataUsage <= 3.62 = True (encircled in green), these customers are non-churners, i.e., this segment (DayMins > 276.3 and DataUsage <= 3.62 GB) represents loyal customers who will stick with the operator and will not churn (Class 0).

Q No: 5

Correct Answer
Marks: 0.50/0.50

For the decision tree given below, identify the root node.

HINT: Assume this to be an independent Decision tree.


None of the mentioned

You Selected
The root node is the topmost node in a Decision Tree. From the root node, the population is split into various subsets.
Q No: 6

Correct Answer
Marks: 0.50/0.50
Load the dataset heart.csv and attempt the following questions based on it.
You are required to do the following: perform basic sanity checks on the data, drop the dependent variable (target) and store it in one object, store the remaining variables in a second object, and split them to create Train and Test datasets (70% train, 30% test).

The test dataset has 91 observations for each variable.

True
You Selected
False
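A minimal sketch of the workflow described above, assuming heart.csv is in the working directory, the dependent variable column is named 'target', and random_state=0 is used for the split (the quiz only specifies random_state=0 for model building, so the split seed is an assumption here):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('heart.csv')

# Basic sanity checks
print(df.shape)
print(df.isnull().sum())
print(df.dtypes)

# Separate the dependent variable from the predictors
y = df['target']
X = df.drop('target', axis=1)

# 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print(X_test.shape)  # the commonly used heart.csv has 303 rows, giving 91 test observations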
Q No: 7

Correct Answer
Marks: 0.50/0.50

You are required to build a Decision Tree Classifier model (with the parameters max_depth=7, criterion='gini', random_state=0) and check the Decision Tree model metrics.

What is the value of Accuracy Score in the basic Decision Tree Model of the train and the test data
respectively?

(random_state=0 is to be used for model building)

Note: Choose the nearest given option.

0.9858, 0.7582
You Selected
0.9052, 0.7011

0.9588, 0.4713

0.9995, 0.8951
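A sketch of the model and accuracy check described above, continuing from the split in Q6 (note that scikit-learn expects the criterion in lowercase, 'gini'):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dt = DecisionTreeClassifier(max_depth=7, criterion='gini', random_state=0)
dt.fit(X_train, y_train)

print(accuracy_score(y_train, dt.predict(X_train)))  # train accuracy
print(accuracy_score(y_test, dt.predict(X_test)))    # test accuracy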
Q No: 8

Correct Answer
Marks: 0.50/0.50

The three most important features according to the basic Decision Tree Model are:

(random_state=0 is to be used for building models)

cp, thal, thalach

cp, oldpeak, ca
You Selected
slope, age, sex

restecg, trestbps, fbs
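Feature importance can be read off the fitted model's feature_importances_ attribute; a sketch continuing from the model dt above:

import pandas as pd

# Pair each importance with its column name and take the top three
importances = pd.Series(dt.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(3))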


Q No: 9

Correct Answer
Marks: 0.50/0.50

What is the AUC score (test set) obtained using the basic Decision Tree Model?

(random_state=0 is to be used for model building)

Note: Choose the nearest given option.

0.9054

0.8132

0.7580
You Selected
0.7541
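A sketch of the AUC calculation, continuing from the model dt above (AUC is computed from predicted probabilities for the positive class, not from hard labels):

from sklearn.metrics import roc_auc_score

y_prob = dt.predict_proba(X_test)[:, 1]  # probability of class 1 for each test row
print(roc_auc_score(y_test, y_prob))     # AUC on the test set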
Q No: 10

Correct Answer
Marks: 0.50/0.50

What are the True Negative counts of train vs test in this Decision Tree Model?

(random_state=0 is to be used for model building)

30, 90

33, 91

90, 30

91, 33
You Selected
In scikit-learn, the confusion matrix is laid out differently from the way it is often taught (many textbooks place TP in the top-left cell):
Index (0,0) is TN
Index (0,1) is FP
Index (1,0) is FN
Index (1,1) is TP
Refer to the official documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
In the binary case, we can extract true negatives, false positives, false negatives, and true positives as follows:
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
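Applied to the models in this quiz, the train and test True Negatives can be read off the same way (a sketch continuing from the fitted model dt above):

from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP
tn_train, fp_train, fn_train, tp_train = confusion_matrix(y_train, dt.predict(X_train)).ravel()
tn_test, fp_test, fn_test, tp_test = confusion_matrix(y_test, dt.predict(X_test)).ravel()
print(tn_train, tn_test)  # True Negative counts for train vs test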
