MLFA Spring 2024

The document outlines the structure and content of a class test for the Machine Learning Foundations and Applications course at IIT Kharagpur, scheduled for February 1, 2024. It includes various topics such as supervised and unsupervised learning, K-Nearest Neighbors, linear models, and Naive Bayes, along with specific questions and instructions for the test. Additionally, it provides details on a mid-semester examination and a subsequent class test, including guidelines on allowed materials and the format of questions.


Indian Institute of Technology Kharagpur

Machine Learning Foundations and Applications


(AI42001)
Class Test-1, Date: Feb 1, 2024

Timing: 2:10 to 3:40 PM (# Qns: 4) Spring 2023-24 Max marks: 35

Attempt all questions


1. Intro to Machine Learning
(a) Explain the differences and similarities between (a) Supervised Learning; (b)
Unsupervised Learning, and (c) Reinforcement Learning. (3)
(b) In the context of Supervised learning, explain the concepts of (a) Labelled Data,
(b) Model, (c) Loss, and (d) Parameter Optimization. (5)
(c) In the context of Unsupervised learning, explain the difference(s) between Clustering
and Association using a practical example. (2)

2. K-Nearest Neighbor
(a) In certain situations K in a KNN cannot be too small or too large. Explain
what those situations are. (2)
(b) Explain the difference between KNN and Weighted KNN. State whether
weighted KNN solves the problem of too small K or too large K. Justify. (2)
(c) In the context of KNN, explain (a) why we need to normalize the input features
before KNN classification, and (b) what the curse of dimensionality is. (2)
(d) A KNN classifier assigns a test instance the majority class associated with its
K nearest training instances. Distance between instances is measured using
Euclidean distance. Suppose we have the following training set of positive (+)
and negative (-) instances and a single test instance (o). All instances are
projected onto a vector space of two real-valued features (X and Y). Answer
the following questions. Assume "unweighted" KNN (every nearest neighbor
contributes equally to the final vote).
Figure 1: Input distribution (training instances labelled + and -, and a single
test instance o, plotted on the two features X and Y).


(i) What would be the class assigned to this test instance for K=1?
(ii) What would be the class assigned to this test instance for K=3?
(iii) What would be the class assigned to this test instance for K=5?
(iv) Setting K to a large value seems like a good idea. We get more votes!
Given this particular training set, would you recommend setting K = 11?
Why or why not?
(4)
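The vote-counting in parts (i)-(iv) can be sketched in a few lines. Since the figure's actual coordinates are not recoverable, the training points below are assumptions chosen only to show how the assigned class can flip as K grows:

```python
import math
from collections import Counter

def knn_predict(train, test_point, k):
    """Unweighted KNN: majority class among the k nearest training instances
    under Euclidean distance."""
    by_dist = sorted(train, key=lambda item: math.dist(item[0], test_point))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: one isolated + near the test instance,
# a cluster of - a bit farther away, more + far off.
train = [((0, 0), '+'), ((2, 2), '-'), ((2, 3), '-'),
         ((3, 2), '-'), ((5, 5), '+'), ((6, 5), '+')]
test_point = (0.5, 0.5)

for k in (1, 3, 5):
    print(k, knn_predict(train, test_point, k))  # k=1: '+', k=3: '-', k=5: '-'
```

On this toy layout the nearest single neighbor is a +, but enlarging K pulls in the - cluster, which is exactly the trade-off part (iv) asks about.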
3. Linear Models
(a) For a binary classification problem, explain (a) the pre-inner product and post-
inner product interpretations. What are the advantages of one over the other? (b)
What is the importance of bias under the post-inner product interpretation? (3)
(b) Can a linear regression classifier achieve zero training error on any of the
datasets in Fig. 2? Provide justification for your answer.

Figure 2: Six 2-dimensional labelled training sets, (A)-(F), each with two classes.

(2)
(c) A random sample of eight drivers insured with a company and having similar
auto insurance policies was selected. The following table lists their driving
experiences (in years) and monthly auto insurance premiums.

Driving Experience (years) | Monthly Auto Insurance Premium (USD)
 5 | 64
 2 | 87
12 | 50
 9 | 71
15 | 44
 6 | 56
25 | 42
16 | 60

(i) Does the insurance premium depend on the driving experience? Do you
expect a positive or a negative relationship between these two variables?
(ii) Find the least squares regression line by choosing appropriate dependent
and independent variables based on your answer in part (i).
(iii) Interpret the meaning of the values of a and b calculated in part (ii).
(iv) Predict the monthly auto insurance premium for a driver with 10 years of
driving experience.
(5)
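Parts (ii) and (iv) can be checked numerically. A minimal sketch of the closed-form least squares fit, premium = a + b * experience, using the table above:

```python
# Closed-form simple linear regression:
# b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2),  a = mean(y) - b * mean(x).
x = [5, 2, 12, 9, 15, 6, 25, 16]       # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]   # monthly premium (USD)
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
a = sy / n - b * sx / n                        # intercept

print(round(a, 2), round(b, 4))   # 76.66 -1.5476
print(round(a + b * 10, 2))       # predicted premium at 10 years: 61.18
```

The negative slope confirms the expected relationship in part (i): premiums fall as experience grows.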

4. Naive Bayes
(a) Here's a naive Bayes model with the following conditional probability table and
the following prior probabilities over classes.

Word type    |  a   |  b   |  c
P(w | y = 1) | 5/10 | 3/10 | 2/10
P(w | y = 0) | 2/10 | 2/10 | 6/10

P(y = 1) = 8/10, P(y = 0) = 2/10

Consider a binary classification problem for whether a document is about
Chandrayaan-3 (class y = 1) or is not about Chandrayaan-3 (y = 0).
Consider a document consisting of 2 a's and 1 c.
What is the probability that it is about the Chandrayaan?
(b) What is the probability that it is not about the Chandrayaan?
(5)

Best wishes

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR
Mid Spring Semester Examination 2023-24
Date of Examination: 20-02-2024  Session: FN  Duration: 2 Hrs  Full Marks: 40
Subject No.: AI42001  Subject: MACHINE LEARNING FOUNDATIONS AND APPLICATIONS
Department/Center/School: Artificial Intelligence
Specific charts, graph paper, log book etc., required: No (Ensure question paper has 10 questions)
Special Instructions (if any): Calculators are allowed. Rough work must be present in the answer script itself.

Short Answer Questions (Answer this as a Separate Section)


1. Explain the principle of the gradient descent algorithm. Accompany your explanation with a diagram.
Explain the use of all the terms and constants. [2]
2. Derive the gradient descent training rule assuming that the target function representation is
o_d = w_0 + w_1*x_1d + ... + w_n*x_nd.
Define explicitly the cost/error function E, assuming that a set of training examples D is provided, where
each training example d in D is associated with the target output t_d. [2]
3. Which of the following statements are true for k-NN classifiers (provide all answers that are correct)? [1]
a) The classification accuracy is better with larger values of k.

b) The decision boundary is smoother with smaller values of k.


c) k-NN is a type of instance-based learning.

d) k-NN does not require an explicit training step.


e) The decision boundary is linear.

4. Give a one-sentence reason why: [1+1+1+1+1]

A. Though both Supervised and Reinforcement learning use supervision, the two supervisions are different.
B. The two unsupervised learning problems, Association and Clustering, are different.
C. We might prefer Decision Tree learning over a K-NN classifier.
D. We choose parameters that minimize the sum of squared training errors in Linear Regression.
E. LASSO Regression enforces more sparsity in weights as compared to Ridge Regression.


5. Suppose that, among the attributes used to represent the instances, a subset may be irrelevant to the
classification problem being solved. Given this situation, which one among K Nearest Neighbor and Decision Tree
would be the better modelling choice? Provide justification. [1]
Long Answer Questions (Answer this as a Separate Section)

6. In the context of Linear Regression, answer the following questions: [4+2+2]
A. We are given a set of two-dimensional inputs and their corresponding output pairs: {x_i1, x_i2, y_i}. We
would like to use the following regression model to predict y:
y = w_1*x_1 + w_2*x_2.
Derive the optimal value for w_1 when using least squares as the target minimization function (w_2 may
appear in your resulting equation). Note that there may be more than one possible value for w_1.

B. Now assume we only observe a single input for each output (that is, a set of {x, y} pairs). We would like
to compare the following two models on our input dataset (for each one we split into training and testing
sets to evaluate the learned model). Assume we have an unlimited amount of data:
Model A: y = w*x,
Model B: y = w*x^2.
Which of the following is correct (choose the answer that best describes the outcome)? Justify.
a. There are datasets for which A would perform better than B
b. There are datasets for which B would perform better than A
c. Both a and b are correct.
d. They would perform equally well on all datasets

C. For the data above we are now comparing the following two models:
Model A: y = w_1*x + w_2*x,
Model B: y = w*x.
Note that model A now uses two parameters (though both multiply the same input value, x). Again, we
assume unlimited data. Which of the following is correct (choose the answer that best describes the
outcome)? Justify your answer.
a. There are datasets for which A would perform better than B
b. There are datasets for which B would perform better than A
c. Both a and b are correct.
d. They would perform equally well on all datasets.
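One way to reason about part C: since w_1*x + w_2*x = (w_1 + w_2)*x, the two models describe exactly the same set of functions. A tiny numerical check of that identity on assumed values:

```python
import math

# Any prediction model A (y = w1*x + w2*x) can make, model B (y = w*x)
# can make with the single parameter w = w1 + w2, and vice versa.
xs = [0.5, 1.0, 2.0, 3.5]

w1, w2 = 1.3, 0.7   # arbitrary parameters for model A
w = w1 + w2         # equivalent single parameter for model B

pred_a = [w1 * x + w2 * x for x in xs]
pred_b = [w * x for x in xs]
print(all(math.isclose(a, b) for a, b in zip(pred_a, pred_b)))  # True
```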

7. Suppose you are given a Linear Classification problem with the dataset in Fig. 1. [3]

Fig. 1

We would like to use the following classification model: y = w_0 + w_1*x_1 + w_2*x_2. Illustrate, using suitable
decision boundaries, the impact of regularization on weights: (a) No regularization, (b) L1
regularization, and (c) L2 regularization. We must aim for the least loss value and assume that we can
neglect at most 1 misclassified datapoint, if needed.
8. Suppose we have the following table, which has attributes of different fruits. Apply Naïve Bayes and predict:
if a fruit has the following properties {Yellow, Sweet, Long}, then which type of fruit it is. [3]

Frequency Table:
Fruit  | Yellow | Sweet | Long | Total
Mango  |   350  |  450  |    0 |  650
Banana |   400  |  300  |  350 |  400
Others |    50  |  100  |   50 |  150
Total  |   800  |  850  |  400 | 1200

9. ISRO intends to include a module in Pragyan, the lunar probe of Chandrayaan-3, that will discriminate
between igneous rocks found on the Moon (M) and igneous rocks found on Earth (E) based on the following
characteristics (attributes): Water content ∈ {N, Y}, Number of distinct textures ∈ {> 10, < 10}, Size ∈
{S, L}, Smelly ∈ {N, Y}. Available training data is as follows. [5+3+2]

Index | Type | Water | No. of Textures | Size | Smelly
 (1)  |      |   Y   |      > 10       |      |   Y
 (2)  |  M   |   N   |      < 10       |  L   |   N
 (3)  |      |   N   |      > 10       |      |   N
 (4)  |  M   |       |      < 10       |  S   |
 (5)  |      |   Y   |      < 10       |      |
 (6)  |  E   |   Y   |      < 10       |  S   |
 (7)  |      |   Y   |      < 10       |      |
 (8)  |  E   |   N   |      < 10       |  S   |   N

(a) Train a decision tree using the above data and draw the tree (provide all your calculations).
(b) Write the learned concept for an igneous rock found on the Moon as a set of conjunctive rules (using
AND and OR operators).
(c) Figure 2 shows a decision tree with depth two. Show that this decision tree perfectly classifies the
given data. Though this decision tree gives a simpler hypothesis with zero error, why does the
approach employed in question (a) fail to output this kind of simpler decision tree?

Fig. 2: A depth-two decision tree with Size at the root and Smelly and Water at the second level.
10. Explain the concept of Bias-variance trade-off in Machine Learning. You should mathematically derive
the bias-variance relation. Also explain how we use bagging and boosting methods to counteract the bias-
variance trade-off, and which part of the bias-variance trade-off they will individually help. [5]
Good luck!
AI42001 - Machine Learning Foundations and
Applications
Class Test 2

Instructions: Please answer all questions. The maximum points of this test is 15,
and you are allowed 30 minutes to complete the test. This is a closed book test, and
the use of electronic devices other than non-programmable calculators is not
permitted during the duration of the test. Good luck!

Question 1

Pick the correct option in each of the following questions. Some questions may
have more than one correct option, and you need to identify all of them to
receive full credit.

1. A single image of size 27 x 27 x 3 is passed through a convolutional layer
having 16 convolutional filters of spatial dimension 3 x 3, each having 3
channels corresponding to the 3 channels of the input image. The padding size
is 2 and the stride of the convolutional filters is also 2. What would be the size of
the feature map produced by this convolutional layer? [1 point]
A. 9 x 9 x 16
B. 15 x 15 x 16
C. 14 x 14 x 3
D. None of the above

2. Suppose we use a polynomial kernel of degree 4 and a RBF (Gaussian) kernel to
implicitly map a 7-dimensional feature vector to feature spaces of dimensions
k_p and k_r for the polynomial and RBF kernels, respectively. What are the values of
k_p and k_r? [1 point]
A. k_p = 4, k_r = ∞
B. k_p = 28, k_r = 7

C. k_p = 2401, k_r = 49
D. k_p = 330, k_r = ∞

3. The figure shows two decision boundaries obtained using soft-margin SVM
classifiers A and B, obtained using soft-margin SVM as discussed in class:

min_{w,b,ξ}  (1/2)||w||^2 + C Σ_{i=1}^{M} ξ_i
s.t.  y_i(w^T x_i + b) ≥ 1 - ξ_i,  ∀i
      ξ_i ≥ 0,  ∀i

(Figure: the two learned decision boundaries, A and B)

The values of the hyperparameter C are C_A and C_B for the learned classifiers A
and B. What is the relationship between C_A and C_B? [1 point]
A. C_A < C_B
B. C_A > C_B
C. C_A = C_B
D. Cannot be determined from available information

4. Suppose you have a deep CNN model having several layers that performs an
image classification task by learning on a dataset of 256 x 256 images, and a
logistic regression model operating on 10 features extracted from the same
dataset of images to perform the same classification task. Which ensemble
learning technique would you apply to improve the bias-variance tradeoff of
each learner? [1 point]
A. Boosting for deep CNN, bagging for logistic regression
B. Boosting for both learners
C. Bagging for both learners
D. Bagging for deep CNN, boosting for logistic regression

5. Which solutions could be effective in mitigating the vanishing/exploding gradient
problems in RNNs?
A. Gradient clipping
B. Making the RNN bidirectional by adding backward layers.
C. Using gated RNNs such as LSTMs
D. All of the above.
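Of the options in question 5, gradient clipping is the simplest to sketch. A minimal global-norm clipping function (one common formulation; framework implementations differ in details):

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their global L2 norm
    is at most max_norm; leave them unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]

g = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # original norm was 5.0
print(g)  # direction preserved, global norm now 1.0
```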

Question 2

Design a 2-input XOR gate using 3 units (artificial neurons). You may assume that
the inputs x_1 and x_2, as well as the output y, take binary values, i.e.
x_1, x_2, y ∈ {-1, 1}, while the weights and biases can take integer values (positive or
negative). Clearly show the structure of the network and specify the weights and
biases of each unit. [5 points]

Question 3

The receptive field of a layer with respect to the input is defined as the number of
pixels of the input that influence each element of the feature map produced by the
corresponding layer. Assume that a 64 x 64 x 3 image I is passed through two
convolutional layers followed by a max pooling layer to produce a feature map F as
shown in the figure below.

C1 (5 x 5 x 8) -> C2 (3 x 3 x 16) -> Max pool (4 x 4) -> F
Stride = 2        Stride = 1         Stride = 4

(i) What is the receptive field of the first convolutional layer C1? [1 point]
(ii) What is the receptive field of the second convolutional layer C2? [2 points]
(iii) Calculate the receptive field of the max pooling layer. [2 points]

