
Data Mining Quiz 2 - CART

Type: Graded Quiz | Questions: 10 | Time: 45m


Q No: 1

Correct Answer
Marks: 0.50/0.50
In CART, the splitting criterion is decided in such a way that the net Gini Index across the nodes

Reduced by at least 'x', x being predefined

Depends on the context of the problem

Reduces by maximum possible amount in comparison to the root node


You Selected
Increases by maximum possible amount in comparison to the root node
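For reference, a quick sketch of how the Gini decrease for a candidate split is computed (the class counts below are made-up numbers, for illustration only):

# Gini impurity of a node: 1 - sum(p_i^2) over the class proportions p_i
def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Hypothetical parent node with a 50/50 class mix, split into two children
parent = gini([50, 50])                       # 0.5
left, right = gini([40, 10]), gini([10, 40])  # 0.32 each
weighted_children = (50 / 100) * left + (50 / 100) * right
print(parent - weighted_children)             # 0.18, the Gini decrease achieved by this split

CART evaluates every candidate split this way and picks the one with the largest decrease.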
Q No: 2

Correct Answer
Marks: 0.50/0.50

Overfitting happens when the model:

a) has high complexity and captures both information & noise

b) has good performance on training dataset but relatively poor on testing dataset

Only a

Only b

Both a and b
You Selected
None of these
Q No: 3

Correct Answer
Marks: 0.50/0.50

Match the following:

i. Accuracy         A. TP/(TP + FN)
ii. Sensitivity     B. (TP + TN)/(TP + TN + FP + FN)
iii. Specificity    C. TP/(TP + FP)
iv. Precision       D. TN/(TN + FP)

Where,

TP = True Positive

FP = False Positive

TN = True Negative

FN = False Negative

i-B, ii-C, iii-D, iv-B

i-A, ii-B, iii-C, iv-D

i-A, ii-D, iii-C, iv-B

i-B, ii-A, iii-D, iv-C


You Selected

Specificity is TN/(TN + FP) and Sensitivity is TP/(TP + FN). 

Accuracy is (TP + TN)/(TP + TN + FP + FN) and Precision is TP/(TP + FP)
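A quick way to verify these formulas in Python (a minimal sketch; the counts below are made-up numbers for illustration):

# Hypothetical confusion-matrix counts, for illustration only
TP, TN, FP, FN = 40, 45, 5, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # i-B
sensitivity = TP / (TP + FN)                   # ii-A (also called recall)
specificity = TN / (TN + FP)                   # iii-D
precision   = TP / (TP + FP)                   # iv-C

print(accuracy, sensitivity, specificity, precision)  # 0.85 0.8 0.9 0.888...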

Q No: 4

Incorrect Answer
Marks: 0/0.50

Consider the following Decision Tree, which is extracted from a bigger tree representing Churners (1) and Non-Churners (0) in a telecom company, based on factors like ContractRenewal_1, DayMins, DataUsage (in GB), etc.

HINT: Assume this to be an independent decision tree.


What insights can we draw from the excerpt of this Decision Tree? 

The 1.3% proportion of users who use <= 319 minutes (DayMins<=319) shall not stick with the
existing Network Provider (Churner).

The proportion of users who use more than 319 minutes (DayMins >319) are Non-Churners.
You Selected
The proportion of users who use less than 276.3 DayMins are Non-Churners.

None of the mentioned

The 0.1% proportion of users whose DataUsage is <= 3.62 GB, and use > 276.3 DayMins shall stick
with the existing Network Provider (Non-Churner).
Correct Option
Please refer to the node circled in red. One level above this node (encircled in yellow), the split occurs at DayMins <= 276.3. The encircled node (red circle) represents the customers who use more than 276.3 minutes (DayMins <= 276.3 = False). At this node, another split occurs at DataUsage <= 3.62. For DataUsage <= 3.62 = True (encircled in green), these customers are non-churners, i.e., this segment (DayMins > 276.3 and DataUsage <= 3.62 GB) represents loyal customers who will stick with the operator and will not churn (Class 0).

Q No: 5

Correct Answer
Marks: 0.50/0.50

For the decision tree given below, identify the root node.

HINT: Assume this to be an independent Decision tree.


None of the mentioned

You Selected
The root node is the topmost node in a Decision Tree. From the root node, the population is split into various subsets.
Q No: 6

Correct Answer
Marks: 0.50/0.50
Load the dataset heart.csv and attempt the following questions based on it.
You are required to do the following: perform basic sanity checks on the data, drop the dependent variable (target) and store it in one object, store the remaining variables in a second object, and split them to create Train and Test datasets (70% train, 30% test).

The test dataset has 91 observations for each variable.

True
You Selected
False
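A minimal sketch of the workflow described above, assuming heart.csv is in the working directory, the dependent variable column is named 'target', and random_state=0 is used for the split (the quiz only specifies random_state=0 for model building, so the split seed is an assumption here):

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('heart.csv')

# Basic sanity checks
print(df.shape)
print(df.isnull().sum())
print(df.dtypes)

# Separate the dependent variable from the predictors
y = df['target']
X = df.drop('target', axis=1)

# 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)
print(X_test.shape)  # the commonly used heart.csv has 303 rows, giving 91 test observations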
Q No: 7

Correct Answer
Marks: 0.50/0.50

You are required to build a Decision Tree Classifier model (with the parameters max_depth=7, criterion='gini', random_state=0) and check the Decision Tree model metrics.

What is the value of Accuracy Score in the basic Decision Tree Model of the train and the test data
respectively?

(random_state=0 is to be used for model building)

Note: Choose the nearest given option.

0.9858, 0.7582
You Selected
0.9052, 0.7011

0.9588, 0.4713

0.9995, 0.8951
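A sketch of the model and accuracy check described above, continuing from the split in Q6 (note that scikit-learn expects the criterion in lowercase, 'gini'):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

dt = DecisionTreeClassifier(max_depth=7, criterion='gini', random_state=0)
dt.fit(X_train, y_train)

print(accuracy_score(y_train, dt.predict(X_train)))  # train accuracy
print(accuracy_score(y_test, dt.predict(X_test)))    # test accuracy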
Q No: 8

Correct Answer
Marks: 0.50/0.50

The three most important features according to the basic Decision Tree Model are:

(random_state=0 is to be used for building models)

cp, thal, thalach

cp, oldpeak, ca
You Selected
slope, age, sex

restecg, trestbps, fbs
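Feature importance can be read off the fitted model's feature_importances_ attribute; a sketch continuing from the model dt above:

import pandas as pd

# Pair each importance with its column name and take the top three
importances = pd.Series(dt.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(3))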


Q No: 9

Correct Answer
Marks: 0.50/0.50

What is the AUC score (test set) obtained using the basic Decision Tree Model?

(random_state=0 is to be used for model building)

Note: Choose the nearest given option.

0.9054

0.8132

0.7580
You Selected
0.7541
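A sketch of the AUC calculation, continuing from the model dt above (AUC is computed from predicted probabilities for the positive class, not from hard labels):

from sklearn.metrics import roc_auc_score

y_prob = dt.predict_proba(X_test)[:, 1]  # probability of class 1 for each test row
print(roc_auc_score(y_test, y_prob))     # AUC on the test set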
Q No: 10

Correct Answer
Marks: 0.50/0.50

What are the True Negative counts of train vs test in this Decision Tree Model?

(random_state=0 is to be used for model building)

30, 90

33, 91

90, 30

91, 33
You Selected
In scikit-learn, the confusion matrix is laid out differently from the way it is often taught (many textbooks place TP in the top-left cell):
Index (0,0) is TN
Index (0,1) is FP
Index (1,0) is FN
Index (1,1) is TP
Refer to the official documentation: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
In the binary case, we can extract true negatives, false positives, false negatives, and true positives as follows:
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0]).ravel()
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
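Applied to the models in this quiz, the train and test True Negatives can be read off the same way (a sketch continuing from the fitted model dt above):

from sklearn.metrics import confusion_matrix

# ravel() flattens the 2x2 matrix in the order TN, FP, FN, TP
tn_train, fp_train, fn_train, tp_train = confusion_matrix(y_train, dt.predict(X_train)).ravel()
tn_test, fp_test, fn_test, tp_test = confusion_matrix(y_test, dt.predict(X_test)).ravel()
print(tn_train, tn_test)  # True Negative counts for train vs test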
