Data Mining Quiz 2
Q No: 1
Correct Answer
Marks: 0.50/0.50
In CART, the splitting criterion is decided in such a way that the net (weighted) Gini Index across the child nodes is minimised.
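CART evaluates every candidate split by the weighted Gini impurity of the resulting child nodes and keeps the split that minimises it. A minimal sketch of that computation, using made-up class counts rather than any particular dataset:

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def net_gini(children):
    """Net (weighted) Gini index across the child nodes of a split."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini(c) for c in children)

# Hypothetical parent node with class counts (9, 11)
parent = gini([9, 11])              # 0.495
# Candidate split producing children (8, 2) and (1, 9)
after = net_gini([[8, 2], [1, 9]])  # 0.25
# CART keeps the candidate split with the smallest `after` value
print(parent, after)
```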
Q No: 2
Correct Answer
Marks: 0.50/0.50
b) has good performance on training dataset but relatively poor on testing dataset
Only a
Only b
both a and b
You Selected
None of these
Q No: 3
Correct Answer
Marks: 0.50/0.50
Where,
TP = True Positive
FP = False Positive
TN = True Negative
FN = False Negative
Q No: 4
Incorrect Answer
Marks: 0/0.50
Consider the following Decision Tree, which is extracted from a bigger tree representing Churners (1)
and Non-Churners (0) in a telecom company, based on factors like ContractRenewal_1, DayMins,
DataUsage in GB, etc.
The 1.3% proportion of users who use <= 319 minutes (DayMins<=319) shall not stick with the
existing Network Provider (Churner).
The proportion of users who use more than 319 minutes (DayMins >319) are Non-Churners.
You Selected
The proportion of users who use less than 276.3 DayMins are Non-Churners.
The 0.1% proportion of users whose DataUsage is <= 3.62 GB, and use > 276.3 DayMins shall stick
with the existing Network Provider (Non-Churner).
Correct Option
Please refer to the node circled in red. One level above this node (encircled in yellow),
the split occurs at DayMins <= 276.3. The red-circled node represents the customers who
use more than 276.3 minutes (DayMins <= 276.3 = False). At this node, another split occurs at
DataUsage <= 3.62. For DataUsage <= 3.62 = True (encircled in green), these customers are non-
churners, i.e., this segment (DayMins > 276.3 and DataUsage <= 3.62 GB) represents loyal
customers who will stick with the Operator and will not churn (Class 0).
Q No: 5
Correct Answer
Marks: 0.50/0.50
For the decision tree given below, identify the root node.
You Selected
The root node is the topmost node in a Decision Tree. From the root node, the population is split into
various subsets.
Q No: 6
Correct Answer
Marks: 0.50/0.50
Load the dataset heart.csv and attempt the following questions based on that.
You are required to execute the following – perform the basic sanitary checks on the data,
drop the dependent variable (target) and store it into an object and store the rest into a
second object, split them to create Train and Test Dataset (70% Train and 30% Test).
True
You Selected
False
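The steps described in the question can be sketched as follows; since heart.csv is not reproduced here, a tiny synthetic DataFrame (hypothetical columns, with `target` as the dependent variable per the question) stands in for the loaded file:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for pd.read_csv("heart.csv"); columns are illustrative only
df = pd.DataFrame({
    "age":    [52, 43, 61, 39, 55, 48, 67, 50, 58, 45],
    "chol":   [212, 203, 234, 321, 256, 275, 223, 230, 240, 210],
    "target": [0, 1, 0, 1, 1, 0, 1, 0, 1, 0],
})

# Basic sanitary checks
print(df.isnull().sum())   # missing values per column
print(df.dtypes)           # column data types

# Drop the dependent variable into one object, keep the rest in another
y = df["target"]
X = df.drop("target", axis=1)

# 70% Train / 30% Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)
print(X_train.shape, X_test.shape)
```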
Q No: 7
Correct Answer
Marks: 0.50/0.50
You are required to build a Decision Tree Classifier model (with the parameters – max_depth=7,
criterion = ‘Gini’, random_state=0), check the Decision Tree model metrics.
What is the value of Accuracy Score in the basic Decision Tree Model of the train and the test data
respectively?
0.9858, 0.7582
You Selected
0.9052, 0.7011
0.9588, 0.4713
0.9995, 0.8951
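A sketch of the model build described above; note that scikit-learn expects the lowercase string "gini" for the criterion parameter. Synthetic data from make_classification stands in for the heart.csv features used in the quiz, so the printed scores will not match the quiz options:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the heart.csv predictors and target
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

# scikit-learn requires criterion="gini" in lowercase
model = DecisionTreeClassifier(max_depth=7, criterion="gini", random_state=0)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(train_acc, test_acc)  # train accuracy is typically the higher of the two
```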
Q No: 8
Correct Answer
Marks: 0.50/0.50
The three most important features according to the basic Decision Tree Model are:
cp, oldpeak, ca
You Selected
slope, age, sex
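Feature importances come straight off the fitted tree. A sketch on synthetic data, with hypothetical column names mirroring the usual heart.csv features; on the real dataset the top three would answer the question:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column names mirroring the heart.csv features
names = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
         "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

model = DecisionTreeClassifier(max_depth=7, criterion="gini", random_state=0)
model.fit(X, y)

# Impurity-based importances, ranked; they sum to 1 across all features
importances = pd.Series(model.feature_importances_, index=names)
print(importances.sort_values(ascending=False).head(3))
```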
Q No: 9
Correct Answer
Marks: 0.50/0.50
What is the AUC score (test set) obtained using the basic Decision Tree Model?
0.9054
0.8132
0.7580
You Selected
0.7541
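The AUC score is computed from the predicted probability of the positive class, not from the hard class labels. A sketch, again on synthetic stand-in data rather than heart.csv:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for heart.csv
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)

model = DecisionTreeClassifier(max_depth=7, criterion="gini", random_state=0)
model.fit(X_train, y_train)

# Probability of class 1 on the test set feeds the ROC/AUC computation
probs = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, probs)
print(auc)
```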
Q No: 10
Correct Answer
Marks: 0.50/0.50
30, 90
33, 91
90, 30
91, 33
You Selected
In Python, scikit-learn lays out the confusion matrix differently from the way it is usually taught in theory: rows are actual classes and columns are predicted classes, with class 0 first.
Index (0,0) is TN
Index (0,1) is FP
Index (1,0) is FN
Index (1,1) is TP
Refer to the official documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
In the binary case, we can extract true positives, etc. as follows:
>>> from sklearn.metrics import confusion_matrix
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)