ML Assignment 3
1. Let δk(x) denote the discriminant function for class k. Which of the following statement(s) about decision boundaries is/are true? (One or more choices may be correct.)
(a) In a binary classification problem, all points x on the decision boundary satisfy δ1(x) = δ2(x).
(b) In a three-class classification problem, all points on the decision boundary satisfy δ1(x) = δ2(x) = δ3(x).
(c) In a three-class classification problem, all points on the decision boundary satisfy at least one of δ1(x) = δ2(x), δ2(x) = δ3(x) or δ3(x) = δ1(x).
(d) Let the input space be R^n. If x does not lie on the decision boundary, there exists an ϵ > 0 such that all inputs y satisfying ||y − x|| < ϵ belong to the same class.
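One way to see the difference between options (b) and (c): at a boundary point between classes 1 and 2, the two largest discriminants tie, e.g. δ1(x) = δ2(x) = 0.6 while δ3(x) = 0.1, so at least one pairwise equality must hold, but all three values need not coincide.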
2. A logistic regression model assigns the probabilities p(xi) shown below to four points with binary labels Y. What is the likelihood of the data under the model?
Y       1     1     0     1
p(xi)   0.8   0.4   0.2   0.9
(a) 0.346
(b) 0.230
(c) 0.058
(d) 0.086
Sol. (b)
Apply the equation L = ∏_{i=1}^{N} p(xi)^{yi} (1 − p(xi))^{(1 − yi)}, which here gives L = 0.8 × 0.4 × (1 − 0.2) × 0.9 = 0.2304 ≈ 0.230.
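A minimal Python sketch of the same computation (variable names are mine):

```python
# Labels and model probabilities from the table above.
y = [1, 1, 0, 1]
p = [0.8, 0.4, 0.2, 0.9]

L = 1.0
for yi, pi in zip(y, p):
    # Each point contributes p(xi) when yi = 1 and 1 - p(xi) when yi = 0.
    L *= pi**yi * (1 - pi)**(1 - yi)

print(L)  # 0.2304, i.e. option (b)
```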
3. Which of the following statement(s) about logistic regression is/are true?
(a) It learns a model for the probability distribution of the data points in each class.
(b) The output of a linear model is transformed to the range (0, 1) by a sigmoid function.
(c) The parameters are learned by optimizing the mean-squared loss.
(d) The loss function is optimized by using an iterative numerical algorithm.
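The mechanics referenced by the options can also be seen in code. Below is a minimal NumPy sketch, not part of the original solution: a linear score squashed into (0, 1) by a sigmoid, fit by an iterative numerical method (plain gradient descent on the cross-entropy loss, not the mean-squared loss). The toy data is borrowed from question 6.

```python
import numpy as np

X = np.array([[0.4, -0.2], [0.6, -0.5], [-0.3, 0.8], [-0.7, 0.5]])
y = np.array([1, 1, 0, 0])

def sigmoid(z):
    # Maps any real score into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(1000):
    p = sigmoid(X @ w + b)      # predicted probabilities
    w -= lr * (X.T @ (p - y))   # gradient of the cross-entropy loss w.r.t. w
    b -= lr * np.sum(p - y)     # ... and w.r.t. b

print(w, b)
```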
4. Consider a modified form of logistic regression given below where k is a positive constant and
β0 and β1 are parameters.
log( (1 − p(x)) / (k p(x)) ) = β0 − β1 x
Sol. (c)
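The answer choices were not recovered in this copy, but solving the given equation for p(x) is presumably the required step: exponentiating both sides gives (1 − p(x)) / (k p(x)) = e^(β0 − β1 x), and rearranging yields p(x) = 1 / (1 + k e^(β0 − β1 x)).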
5. Consider a Bayesian classifier for a 3-class classification problem. The following tables give the
class-conditioned density fk (x) for three classes k = 1, 2, 3 at some point x in the input space.
k       1      2      3
fk(x)   0.15   0.20   0.05
Note that πk denotes the prior probability of class k. Which of the following statement(s)
about the predicted label at x is/are true? (One or more choices may be correct.)
(a) If the three classes have equal priors, the prediction must be class 2.
(b) If π3 < π2 and π1 < π2 , the prediction may not necessarily be class 2.
(c) If π1 > 2π2 , the prediction could be class 1 or class 3.
(d) If π1 > π2 > π3 , the prediction must be class 1.
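These options can be probed numerically: the Bayes classifier predicts argmax_k πk fk(x). A short sketch follows; the specific prior values are my own test cases, not part of the assignment.

```python
import numpy as np

f = np.array([0.15, 0.20, 0.05])  # class-conditioned densities at x

test_priors = [
    [1/3, 1/3, 1/3],    # (a) equal priors -> class 2
    [0.5, 0.2, 0.3],    # (c) pi1 > 2*pi2 -> class 1 wins here
    [0.2, 0.05, 0.75],  # (c) pi1 > 2*pi2 -> class 3 wins here
    [0.4, 0.35, 0.25],  # (d) pi1 > pi2 > pi3, yet class 2 wins
]
for priors in test_priors:
    posterior = np.array(priors) * f  # unnormalized posteriors pi_k * f_k(x)
    print(priors, "-> predicted class", np.argmax(posterior) + 1)
```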
6. The following table gives the binary labels y^(i) for four points (x1^(i), x2^(i)), where i = 1, 2, 3, 4. Among the given options, which set of parameter values β0, β1, β2 of a standard logistic regression model
p(x) = 1 / (1 + e^−(β0 + β1 x1 + β2 x2))
results in the highest likelihood for this data?
x1 x2 y
0.4 -0.2 1
0.6 -0.5 1
-0.3 0.8 0
-0.7 0.5 0
(a) β0 = 0.5, β1 = 1.0, β2 = 2.0
(b) β0 = −0.5, β1 = −1.0, β2 = 2.0
(c) β0 = 0.5, β1 = 1.0, β2 = −2.0
(d) β0 = −0.5, β1 = 1.0, β2 = 2.0
Sol. (c)
For each option, first compute the probabilities and then compute the log-likelihood using the
following equation. You can either do this manually or programmatically.
l(β0, β1, β2) = Σ_{i=1}^{N} log(1 − p(xi)) + Σ_{i=1}^{N} yi log( p(xi) / (1 − p(xi)) )
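A sketch of the programmatic route the solution mentions (NumPy; the dictionary of options is my own framing):

```python
import numpy as np

X = np.array([[0.4, -0.2], [0.6, -0.5], [-0.3, 0.8], [-0.7, 0.5]])
y = np.array([1, 1, 0, 0])

options = {
    "(a)": (0.5, 1.0, 2.0),
    "(b)": (-0.5, -1.0, 2.0),
    "(c)": (0.5, 1.0, -2.0),
    "(d)": (-0.5, 1.0, 2.0),
}

for name, (b0, b1, b2) in options.items():
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * X[:, 0] + b2 * X[:, 1])))
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # log-likelihood
    print(name, round(ll, 3))  # option (c) should come out highest
```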
7. Which of the following statement(s) about a two-class LDA model is/are true? (One or more
choices may be correct)
(a) It is assumed that the class-conditioned probability density of each class is a Gaussian.
(b) A different covariance matrix is estimated for each class.
(c) At a given point on the decision boundary, the class-conditioned probability densities
corresponding to both classes must be equal.
(d) At a given point on the decision boundary, the class-conditioned probability densities
corresponding to both classes may or may not be equal.
Sol. (a), (d)
On the decision boundary between classes k and l, the discriminants are equal, which gives
log( fk(x) / fl(x) ) + log( πk / πl ) = 0
If πk = πl (i.e. an equal number of samples from both classes), we obtain fk(x) = fl(x). Similarly, πk ≠ πl ⇒ fk(x) ≠ fl(x). Since the question does not mention whether there is an equal number of samples from both classes, option (c) is incorrect, and option (d) is correct.
8. Consider the following two datasets and two LDA models trained respectively on these datasets.
Dataset A: 100 samples of class 0; 50 samples of class 1
Dataset B: 100 samples of class 0 (same as Dataset A); 100 samples of class 1 created by
repeating twice the class 1 samples from Dataset A
The classifier is defined as follows in terms of the decision boundary wT x + b = 0. Here, w is
called the slope and b is called the intercept.
y = 0 if wT x + b < 0;  y = 1 if wT x + b ≥ 0
(a) The learned decision boundary will be the same for both models.
(b) The two models will have the same slope but different intercepts.
(c) The two models will have different slopes but the same intercept.
(d) The two models may have different slopes and different intercepts.
Sol. (b)
For two-class LDA with shared covariance Σ and class means µ0, µ1, the decision boundary is
xT Σ^−1 (µ1 − µ0) − (1/2)(µ1 + µ0)T Σ^−1 (µ1 − µ0) + log( π1 / π0 ) = 0
The first term corresponds to wT x, while the second and third terms constitute the intercept. From the construction of the two datasets, it is clear that the estimates of µ0, µ1 and Σ would be the same for both datasets. The two decision boundaries would differ only in the term log( π1 / π0 ), which depends on the sample sizes of the two classes.
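A quick numerical check, assuming scikit-learn; the synthetic Gaussian data is my own, and only the dataset construction mirrors the question.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X0 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))  # class 0
X1 = rng.normal([2.0, 2.0], 1.0, size=(50, 2))   # class 1

# Dataset A: 100 samples of class 0, 50 of class 1.
XA, yA = np.vstack([X0, X1]), np.array([0] * 100 + [1] * 50)
# Dataset B: the class 1 samples repeated twice.
XB, yB = np.vstack([X0, X1, X1]), np.array([0] * 100 + [1] * 100)

lda_A = LinearDiscriminantAnalysis().fit(XA, yA)
lda_B = LinearDiscriminantAnalysis().fit(XB, yB)

print("slope A:", lda_A.coef_, "intercept A:", lda_A.intercept_)
print("slope B:", lda_B.coef_, "intercept B:", lda_B.intercept_)
# Expected: nearly identical slopes, intercepts differing by about log(2).
```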
9. Which of the following statement(s) about LDA and logistic regression is/are true? (One or more choices may be correct)
(a) For any classification dataset, both algorithms learn the same decision boundary.
(b) Adding a few outliers to the dataset is likely to cause a larger change in the decision
boundary of LDA compared to that of logistic regression.
(c) Adding a few outliers to the dataset is likely to cause a similar change in the decision
boundaries of both classifiers.
(d) If the within-class distributions deviate significantly from the Gaussian distribution, logistic regression is likely to perform better than LDA.
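For options (b) and (c), a sketch of an experiment one could run (the data and outlier placement are my own choices): fit both classifiers with and without a few extreme outliers that sit deep inside one class, and compare how much the boundary normal rotates.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X0 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))
X1 = rng.normal([3.0, 3.0], 1.0, size=(100, 2))
X, y = np.vstack([X0, X1]), np.array([0] * 100 + [1] * 100)

# A few class-1 outliers far from the boundary, on the correct side.
X_out = np.vstack([X, [[12.0, 0.0], [13.0, 1.0], [11.0, -1.0]]])
y_out = np.concatenate([y, [1, 1, 1]])

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("LogReg", LogisticRegression())]:
    w1 = clf.fit(X, y).coef_.ravel()
    w2 = clf.fit(X_out, y_out).coef_.ravel()
    cos = w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))
    print(name, "cosine between boundary normals:", round(cos, 4))
# LDA's mean and covariance estimates are pulled by the outliers, while the
# logistic loss barely changes for points already classified with high
# confidence, so LDA's boundary typically moves more.
```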