
Term IV 2023: End-Term Exam Question Paper - Part B

Subject: Data Science and Machine Learning
Program: MBA / MBA (Analytics)
Faculty: Prof. K. Venkataraghavan
Date of Exam: 22.09.2023

Part B
Use of Internet: Yes
Use of Laptop: Allowed
Open book: Yes
Calculator: Yes
Duration: 75 Minutes
Part B Marks: 20 Marks

Instructions

Submit

1. One single .ipynb file containing code, output and interpretation
for all questions.
2. Do not submit a .py file. No marks will be given for it.
3. Datasets can be downloaded from the End-Term Exam Placeholder in
the Google Classroom of the DSML course.
4. The single .ipynb file must be uploaded to the End-Term Exam
Placeholder in Google Classroom.
5. It is your responsibility to ensure that you upload the .ipynb
file before the deadline in Google Classroom. Late submissions will
be summarily disregarded; late submissions are clearly visible in
Google Classroom.
6. No files will be accepted later. Emailing of files is strictly
prohibited.
7. Emailing of Part B will attract penalty marks in Part A.
8. No excuses (laptop crashed, could not connect to the Internet)
will be entertained.
9. If you copy, you will be awarded zero for the End-Term.

Note

1. If your .ipynb does not run, you may not get any marks.
2. It is your responsibility to ensure that the files you submit
are complete in all respects.
3. Do not forget to mention your name and roll number in the file
name.
4. You should use your own computer to run the code. Sharing of code
files is strictly prohibited.

Datasets Supplied
1. German Credit Data.csv

Q1: Use the dataset German Credit Data.csv. Your task is as follows.

1. Divide the data into train and test using the last 3 digits of
your roll number as random state. [2 Marks]
2. Build three classification models using the train dataset by
applying (a) logistic regression, (b) SVM with RBF kernel and (c)
decision trees. [2 Marks]
3. Get the predicted probabilities for the test dataset for each of
the above three models. Store the three predicted probabilities in a
dataframe named "result". [2 Marks]
4. Obtain the predicted classes in each case - (a) logistic
regression, (b) Naive-Bayes and (c) decision trees - using
appropriate thresholds. Store the three predicted classes in the
dataframe "result". [2 Marks]
5. Approach I - Take a majority vote of the three predicted classes
to get the predicted class. Call this predicted class
"pred_class_voted". Store it in the dataframe "result". [2 Marks]
6. Combine the predicted probabilities of (a) logistic regression,
(b) Naive-Bayes and (c) decision trees, weighted by their AUC values.
Call it "pred_prob_all" and store it in the dataframe "result".
[2 Marks]
7. Approach II - Obtain the predicted class from "pred_prob_all".
Call the class "pred_class_by_prob" and store it in the dataframe
"result". [2 Marks]
8. Obtain two confusion matrices for the predicted classes - from
Step 5 and Step 7 - and display them. [4 Marks]
9. Which approach is better based on the results? [2 Marks]
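The steps above can be sketched end to end. This is a minimal sketch, not a model answer: it uses a synthetic dataset in place of German Credit Data.csv so it runs as-is, a placeholder random state of 123 where the question requires your roll-number digits, and a 0.5 threshold throughout. Note the question names SVM-RBF in step 2 but Naive-Bayes in steps 4 and 6; the sketch simply reuses the three models built in step 2.

```python
# Sketch of the Q1 ensemble workflow (synthetic data stands in for
# German Credit Data.csv; random_state=123 stands in for roll digits).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix

X, y = make_classification(n_samples=500, random_state=0)
# Step 1: split, with the roll-number digits as random state.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=123)

# Step 2: the three classifiers.
models = {
    "logit": LogisticRegression(max_iter=1000),
    "svm_rbf": SVC(kernel="rbf", probability=True),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}
result = pd.DataFrame(index=range(len(X_te)))
for name, m in models.items():
    m.fit(X_tr, y_tr)
    result[f"prob_{name}"] = m.predict_proba(X_te)[:, 1]              # Step 3
    result[f"class_{name}"] = (result[f"prob_{name}"] >= 0.5).astype(int)  # Step 4

# Step 5 (Approach I): majority vote of the three predicted classes.
result["pred_class_voted"] = (
    result[[f"class_{n}" for n in models]].sum(axis=1) >= 2
).astype(int)

# Step 6: probabilities combined with AUC weights (normalized to sum to 1).
aucs = {n: roc_auc_score(y_te, result[f"prob_{n}"]) for n in models}
total = sum(aucs.values())
result["pred_prob_all"] = sum(
    result[f"prob_{n}"] * (aucs[n] / total) for n in models
)
# Step 7 (Approach II): class from the combined probability.
result["pred_class_by_prob"] = (result["pred_prob_all"] >= 0.5).astype(int)

# Step 8: confusion matrices for the two approaches.
cm_vote = confusion_matrix(y_te, result["pred_class_voted"])
cm_prob = confusion_matrix(y_te, result["pred_class_by_prob"])
print(cm_vote)
print(cm_prob)
```

For step 9, compare the two matrices on whichever metric matters for the credit setting (e.g. recall of the bad-credit class), not accuracy alone.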


Term IV 2023: End-Term Exam Question Paper - Part A

Subject: Data Science and Machine Learning
Program: MBA / MBA (Analytics)
Faculty: Prof. K. Venkataraghavan
Date of Exam: 22.09.2023

Mode: Pen and Paper Exam

Use of Internet: No
Open book: No
Calculator: No
Duration: 50 Minutes
Part A Marks: 30 Marks

PART A
Q1. State True or False [Total Marks 09]

1. Gini Index = 0 indicates high impurity.
2. The higher the entropy, the higher the impurity.
3. Accuracy is a good indicator of classifier performance with
class-imbalanced data.
4. F1-Score is the harmonic mean of accuracy and recall.
5. In SVM, increasing the cost of misclassification is a good idea.
6. Gamma is a hyperparameter of the RBF kernel.
7. Random forest is a type of feature bagging.
8. SGB helps avoid overfitting.
9. Feature importance can be determined for tree-based models.

PART B
Q1. Find the Gini Index of the following nodes. Show the steps in
deriving the answer. [3 Marks]

Node A: N=120, Class A = 70, Class B = 50
Node B: N=70, Class A = 50, Class B = 20
Node C: N=50, Class A = 20, Class B = 30
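The Gini values for these nodes follow directly from G = 1 - sum(p_i^2); a short check:

```python
# Gini index of a node from its class counts: G = 1 - sum(p_i^2).
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

# Node A ~ 0.4861, Node B ~ 0.4082, Node C = 0.4800
for name, counts in {"A": (70, 50), "B": (50, 20), "C": (20, 30)}.items():
    print(f"Node {name}: Gini = {gini(counts):.4f}")
```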

Q2. Find the Entropy of the following nodes. Show the steps in
deriving the answer. [3 Marks]

Node A: N=120, Class A = 70, Class B = 50
Node B: N=70, Class A = 50, Class B = 20
Node C: N=50, Class A = 20, Class B = 30
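Entropy uses the same class proportions as Q1, H = -sum(p_i * log2(p_i)); a quick check:

```python
import math

# Entropy (base 2) of a node from its class counts.
def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

# Node A ~ 0.9799, Node B ~ 0.8631, Node C ~ 0.9710
for name, counts in {"A": (70, 50), "B": (50, 20), "C": (20, 30)}.items():
    print(f"Node {name}: Entropy = {entropy(counts):.4f}")
```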
Q3. Calculate the Log-Loss values in each case. [3 Marks]

1. Actual Class = 1, Predicted Probability = 0.9
2. Actual Class = 1, Predicted Probability = 0.5
3. Actual Class = 0, Predicted Probability = 0.4
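Per-observation log-loss is -[y*log(p) + (1-y)*log(1-p)], using the natural log (the scikit-learn convention); the three cases compute as:

```python
import math

# Log-loss of a single observation with actual class y and predicted
# probability p of class 1 (natural logarithm).
def log_loss_single(y, p):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Case 1 ~ 0.1054, case 2 ~ 0.6931, case 3 ~ 0.5108
for y, p in [(1, 0.9), (1, 0.5), (0, 0.4)]:
    print(f"y={y}, p={p}: log-loss = {log_loss_single(y, p):.4f}")
```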
Q4. You are a marketing manager in ABC Inc. You have a customer base
of 1 million customers and the average response rate for a campaign
is 10%. The cost of the campaign is Rs 10 per customer. So, what will
be the cost of the campaign if you need 30000 responses? Show the
steps in deriving the answer. [3 Marks]
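The arithmetic is: customers to contact = responses needed / response rate, then multiply by the per-customer cost.

```python
# Campaign cost without a model: contact responses_needed / rate customers.
rate = 0.10               # flat response rate
cost_per_customer = 10    # Rs
responses_needed = 30_000

customers_to_contact = responses_needed / rate      # 300,000 customers
cost = customers_to_contact * cost_per_customer     # Rs 3,000,000
print(customers_to_contact, cost)
```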

Q5. Following on from the previous question, assume that a classifier
tells you that the response rates in the first, second and third
deciles of predicted probabilities (in descending order) are 20%, 10%
and 10%. So, what will be the cost of the campaign if you need 30000
responses? Show the steps in deriving the answer. Is there any
financial benefit of using the classifier? [3 Marks]
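With the classifier, customers are contacted decile by decile (each decile of the 1-million base is 100,000 customers) until the response target is met; a sketch of that walk:

```python
# Contact top deciles in order until responses_needed is reached.
decile_size = 100_000                # 10% of a 1,000,000 customer base
decile_rates = [0.20, 0.10, 0.10]    # top three deciles, descending
cost_per_customer = 10               # Rs
responses_needed = 30_000

responses = 0.0
contacted = 0.0
for rate in decile_rates:
    if responses >= responses_needed:
        break
    still_needed = responses_needed - responses
    # customers needed from this decile, capped at the decile size
    take = min(decile_size, still_needed / rate)
    contacted += take
    responses += take * rate

cost = contacted * cost_per_customer
print(contacted, cost)   # 200,000 customers, Rs 2,000,000
```

Against the Rs 3,000,000 cost in Q4, the classifier saves Rs 1,000,000, which is the financial benefit asked for.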

Q6. Answer the following questions from the following figure. [3 Marks]

[Figure: decision tree, reconstructed as text]

Root: checkin_acc_A14 <= 0.5
  gini = 0.419, samples = 700, value = [491, 209], class = Good Credit

True branch: duration <= 33.0
  gini = 0.484, samples = 425, value = [251, 174], class = Good Credit
  True leaf:  gini = 0.458, samples = 343, value = [221, 122], class = Good Credit
  False leaf: gini = 0.464, samples = 82,  value = [30, 52],  class = Bad Credit

False branch: inst_plans_A143 <= 0.5
  gini = 0.222, samples = 275, value = [240, 35], class = Good Credit
  True leaf:  gini = 0.423, samples = 46,  value = [32, 14],  class = Good Credit
  False leaf: gini = 0.167, samples = 229, value = [208, 21], class = Good Credit
1. What is the probability of being good credit if Checkin_acc_A14 = 0 and duration is 20?
2. What is the probability of being good credit if Checkin_acc_A14 = 0 and duration is 35?
3. What is the probability of being good credit if Checkin_acc_A14 = 1 and Inst_plans_A143 = 0?
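At a leaf, P(good) is the good-class count divided by the leaf's sample count. A sketch, assuming (as in scikit-learn trees) that a feature value of 0 satisfies the `<= 0.5` split and 1 does not, and that value = [good, bad]:

```python
# Leaf class counts read off the Q6 tree: value = [good, bad].
leaves = {
    "A14=0, duration=20 (duration<=33 True)":  (221, 122),
    "A14=0, duration=35 (duration<=33 False)": (30, 52),
    "A14=1, plans=0 (plans<=0.5 True)":        (32, 14),
}
probs = {path: good / (good + bad) for path, (good, bad) in leaves.items()}
for path, p in probs.items():
    print(f"{path}: P(good) = {p:.3f}")
```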

Q7. Answer the following questions from the confusion matrix below. [3 Marks]

In a given confusion matrix,

TN = 188
TP = 30
FP = 21
FN = 61

I increase the threshold, which makes TN = 208 and FN = 85.

Find the accuracy, sensitivity and specificity for the new confusion
matrix.
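The new TP and FP follow because the actual class totals are fixed by the data: negatives = TN + FP = 209 and positives = TP + FN = 91 do not change when the threshold moves. A sketch of the derivation:

```python
# Actual class totals are fixed: raising the threshold only moves
# predictions across the boundary, so TN + FP and TP + FN are constant.
negatives = 188 + 21   # TN + FP = 209
positives = 30 + 61    # TP + FN = 91

TN, FN = 208, 85       # new values after raising the threshold
FP = negatives - TN    # = 1
TP = positives - FN    # = 6

accuracy = (TP + TN) / (positives + negatives)
sensitivity = TP / positives   # true positive rate (recall)
specificity = TN / negatives   # true negative rate
print(f"accuracy={accuracy:.4f}, sensitivity={sensitivity:.4f}, "
      f"specificity={specificity:.4f}")
```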
