CS725 2020 Midsem

Problem 1: Short Answers (4 points)

(a) Consider a Bernoulli random variable X with parameter b (i.e. P (X = 1) = b). Say you observe the
following values of X: (0, 0, 1, 0, 1).
(i) Write an expression for the likelihood as a function of b. (For example, b(1 − b)^2 can be written as b(1-b)**2.) [1 pts]

(ii) What is the maximum a posteriori (MAP) estimate of b, if P(b = 2/5) = 3/5 and P(b = 3/5) = 2/5?
Write your answer as a fraction. [1 pts]
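
For reference, a minimal Python sketch of how part (a) could be checked numerically, assuming the prior values as reconstructed above; not required for the exam answer:

from fractions import Fraction

# Observations of X from part (a): two 1s and three 0s.
obs = [0, 0, 1, 0, 1]

def likelihood(b):
    # Likelihood as a function of b: b**(#ones) * (1 - b)**(#zeros).
    ones = sum(obs)
    return b**ones * (1 - b)**(len(obs) - ones)

# The posterior is proportional to likelihood times prior; the MAP estimate is the
# candidate value of b with the largest posterior score.
prior = {Fraction(2, 5): Fraction(3, 5), Fraction(3, 5): Fraction(2, 5)}
posterior = {b: likelihood(b) * p for b, p in prior.items()}
print(max(posterior, key=posterior.get), posterior)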

(b) Suppose we build a least-squares linear regression model, where we impose a Gaussian-distributed
prior probability on the weights. Then, we are doing: [1 pts]
(i) Logistic regression (ii) Ridge regression (iii) Lasso regression (iv) L1 regularization.
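
For reference, taking the MAP solution of least-squares regression under a zero-mean Gaussian prior on the weights gives the objective ||y − Xw||^2 + λ||w||^2. A minimal sketch of its closed-form solution, with lam standing in for the regularization strength (an illustrative parameter name, not from the exam):

import numpy as np

def ridge_fit(X, y, lam):
    # MAP / ridge solution: w = (X^T X + lam * I)^(-1) X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)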

(c) Consider two decision trees T1 and T2 of depths 2 and 4, respectively, that are trained on the same
dataset. Which of the following are likely correct? [1 pts]
(i) Bias(T1) > Bias(T2)
(ii) Bias(T1) < Bias(T2)
(iii) Variance(T1) > Variance(T2)
(iv) Variance(T1) < Variance(T2)
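
A rough empirical illustration of part (c), assuming scikit-learn is available and using purely synthetic data: the shallower tree is more constrained, so it typically fits the training set less closely, while the deeper tree shows a larger gap between training and held-out accuracy, a symptom of higher variance.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] * X[:, 1] + 0.3 * rng.normal(size=600) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 4):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    # Training accuracy vs. held-out accuracy for each depth.
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))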

Problem 2: True or False (3 points)


Mark the following statements as “True” or “False”.

(a) Increasing the depth of a decision tree cannot increase its training error. True [1 pts]

(b) When a linear separator f(x) = w^T x + w_0 is trained on data that is linearly separable, a perceptron
classifier is guaranteed to achieve zero error on the training data. True [1 pts]

(c) Given a matrix X, (XX^T + λI)^(-1) for λ > 0 always exists. True [1 pts]
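
A quick numerical check of statement (c): XX^T is positive semi-definite, so adding λI with λ > 0 raises every eigenvalue to at least λ, which makes the matrix invertible. A minimal sketch:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))             # tall X makes X X^T (7x7) rank-deficient on its own
lam = 0.1
A = X @ X.T + lam * np.eye(X.shape[0])
print(np.linalg.eigvalsh(A).min())      # smallest eigenvalue is ≈ lam > 0
print(np.linalg.inv(A).shape)           # so the inverse exists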

Problem 3: Decision Tree Classifiers (6 points)


You are given the following dataset with attributes X1 , X2 , X3 , X4 and X5 . You need to design a decision
tree classifier to predict Y given these attributes.

X1 X2 X3 X4 X5 Y
1 0 1 1 0 0
1 0 0 1 0 0
1 0 1 0 1 1
1 0 1 0 1 1
0 0 1 0 1 1
0 0 0 1 0 0
1 0 0 1 1 0
1 1 0 1 0 1

You can use the following approximate values: log2(3) ≈ 3/2, log2(5) ≈ 11/5 and log2(7) ≈ 14/5.

(a) Let IG(Xi , Y ) denote the information gain of attribute Xi at the root node. What is IG(X3 , Y )?
[1 pts]
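
For reference, a minimal Python sketch of how the root-node information gains for the table above could be computed (using exact entropies rather than the approximations given); the same function applies to every attribute, which is what parts (b) and (c) ask to compare:

from collections import Counter
from math import log2

# Rows of the table above as (X1, X2, X3, X4, X5, Y).
data = [
    (1, 0, 1, 1, 0, 0), (1, 0, 0, 1, 0, 0), (1, 0, 1, 0, 1, 1), (1, 0, 1, 0, 1, 1),
    (0, 0, 1, 0, 1, 1), (0, 0, 0, 1, 0, 0), (1, 0, 0, 1, 1, 0), (1, 1, 0, 1, 0, 1),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(i):
    # Information gain of attribute X_{i+1} at the root node.
    y = [row[-1] for row in data]
    gain = entropy(y)
    for v in (0, 1):
        subset = [row[-1] for row in data if row[i] == v]
        if subset:
            gain -= len(subset) / len(data) * entropy(subset)
    return gain

for i in range(5):
    print(f"IG(X{i + 1}, Y) = {info_gain(i):.3f}")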

(b) Choose the correct option below.


(i) IG(X3 , Y ) = IG(X5 , Y ) (ii) IG(X3 , Y ) > IG(X5 , Y ) (iii) IG(X3 , Y ) < IG(X5 , Y ) [1 pts]

(c) Using information gain as the splitting criterion, which attribute would you choose to use at the root
of the tree?

(i) X1 (ii) X2 (iii) X3 (iv) X4 (v) X5 [2 pts]

(d) With information gain as the splitting criterion, suppose you construct the smallest decision tree
that will yield zero training error on the dataset above. This tree can be constructed using only
two attributes, one of which you found in the previous question. Which are the two attributes?
(Note that you do not need any more information gain computations to answer this question.)

(i) X4 , X5 (ii) X3 , X5 (iii) X3 , X4 (iv) X2 , X3 (v) X4 , X2 [2 pts]

Problem 4: Naive Bayes (3 points)


The table shows data for two Boolean input variables, x1 and x2 , and a Boolean output y.

x1 x2 y
0 0 1
0 1 1
1 1 0
1 1 1
0 0 0
0 1 0
0 0 1
0 1 0

(a) Say we want to use a Naive Bayes classifier to predict y using x1 and x2. What is P(y = 0 | x1 = 0, x2 = 0)
according to this Naive Bayes classifier? (You can leave the final answer as a fraction.) [1 pts]
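
For reference, a minimal Python sketch of the Naive Bayes computation in part (a): estimate P(y), P(x1 | y) and P(x2 | y) by counting rows of the table, then combine them under the conditional-independence assumption and normalise.

from fractions import Fraction

# Rows of the table above as (x1, x2, y).
data = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 1, 1),
        (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0)]

def nb_posterior(x1, x2):
    # Unnormalised Naive Bayes scores P(y) * P(x1 | y) * P(x2 | y) for y in {0, 1}.
    scores = {}
    for y in (0, 1):
        rows = [r for r in data if r[2] == y]
        p_y = Fraction(len(rows), len(data))
        p_x1 = Fraction(sum(r[0] == x1 for r in rows), len(rows))
        p_x2 = Fraction(sum(r[1] == x2 for r in rows), len(rows))
        scores[y] = p_y * p_x1 * p_x2
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

print(nb_posterior(0, 0))   # P(y | x1 = 0, x2 = 0) under the Naive Bayes model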

(b) What is the expected error rate of the Naive Bayes classifier on test samples generated according to
probabilities estimated using the dataset in the table above? Pick the correct answer.
(i) 1/2 (ii) 3/4 (iii) 5/8 (iv) 3/8 [2 pts]

Problem 5: Perceptron Algorithm (4 points)


Recall the following update rule for the perceptron algorithm, given a training instance (x, y) ∈ D:

if y w^T x ≤ 0 then
    w ← w + η y x

Here, η is a learning rate. Say we want to modify the weight update rule by appropriately setting η
such that if the algorithm sees the same example twice in a row, it will never incorrectly label the second
occurrence of the example. Which of the following constraints on η would help satisfy this property?
[1 pts]
(a) η ≤ −y w^T x / ||x||^2

(b) η > −y w^T x / ||x||^2

(c) η ≤ (1 − y w^T x) / ||x||^2

(d) η > (1 − y w^T x) / ||x||^2

[Pen and paper] Show how you derived the constraint you picked above. [3 pts]
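
As a numerical aid for the derivation (not a substitute for it): after one update on (x, y), the score on the same example becomes y(w + ηyx)^T x = y w^T x + η ||x||^2, using y^2 = 1 for labels in {−1, +1}, so a candidate constraint on η can be judged by whether it forces this quantity to be positive. A minimal sketch:

import numpy as np

def score_after_update(w, x, y, eta):
    # Score y * w_new^T x after the perceptron update w_new = w + eta * y * x.
    w_new = w + eta * y * x
    return float(y * (w_new @ x))

rng = np.random.default_rng(0)
w, x = rng.normal(size=3), rng.normal(size=3)
y = -1.0 if w @ x > 0 else 1.0                # choose the label so that (x, y) is misclassified
print(y * (w @ x))                            # <= 0, so the update rule fires
print(score_after_update(w, x, y, eta=0.1))   # trial value of eta; compare against the options above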
