
Assignment 3

Introduction to Machine Learning


Prof. B. Ravindran
1. Which of the following statement(s) about decision boundaries and discriminant functions of
classifiers is/are true? (One or more choices may be correct)

(a) In a binary classification problem, all points x on the decision boundary satisfy δ1(x) = δ2(x).
(b) In a three-class classification problem, all points on the decision boundary satisfy δ1(x) = δ2(x) = δ3(x).
(c) In a three-class classification problem, all points on the decision boundary satisfy at least one of δ1(x) = δ2(x), δ2(x) = δ3(x) or δ3(x) = δ1(x).
(d) Let the input space be Rⁿ. If x does not lie on the decision boundary, there exists an ϵ > 0 such that all inputs y satisfying ||y − x|| < ϵ belong to the same class.

Sol. (a), (c), (d)


At any point x on the decision boundary, the highest value of the discriminant function is attained by two or more classes, i.e. argmax_i δi(x) is not unique. Only the classes whose regions meet at that point need to tie, so (c) holds but (b) does not. If a point does not lie on a decision boundary, it lies in the interior of a region corresponding to one of the classes, so a sufficiently small neighbourhood around it belongs entirely to that class, which is statement (d).
2. The following table gives the binary ground truth labels yi for four input points xi (not given).
We have a logistic regression model with some parameter values that computes the probability
p(xi ) that the label is 1. Compute the likelihood of observing the data given these model
parameters.

yi      1    1    0    1
p(xi)   0.8  0.4  0.2  0.9

(a) 0.346
(b) 0.230
(c) 0.058
(d) 0.086

Sol. (b)
Apply the equation L = ∏_{i=1}^{N} p(x_i)^{y_i} (1 − p(x_i))^{1 − y_i}, which here gives L = 0.8 × 0.4 × (1 − 0.2) × 0.9 ≈ 0.230.
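As a quick check, the product can be computed in a few lines of Python (a minimal sketch; the labels and probabilities are taken from the table above):

    # Likelihood of the observed labels, assuming independent samples.
    y = [1, 1, 0, 1]
    p = [0.8, 0.4, 0.2, 0.9]

    likelihood = 1.0
    for yi, pi in zip(y, p):
        likelihood *= pi if yi == 1 else (1 - pi)

    print(likelihood)  # 0.8 * 0.4 * 0.8 * 0.9 = 0.2304, matching option (b)
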
3. Which of the following statement(s) about logistic regression is/are true?

(a) It learns a model for the probability distribution of the data points in each class.
(b) The output of a linear model is transformed to the range (0, 1) by a sigmoid function.
(c) The parameters are learned by optimizing the mean-squared loss.
(d) The loss function is optimized by using an iterative numerical algorithm.

Sol. (b), (d)


Unlike LDA, logistic regression does not learn a probability distribution for the data. The
optimal parameters are computed by maximizing the log-likelihood.

4. Consider a modified form of logistic regression given below where k is a positive constant and
β0 and β1 are parameters.

log((1 − p(x)) / (k p(x))) = β0 − β1 x

Then find p(x).


(a) e^{β0} / (k e^{β0} + e^{β1 x})
(b) e^{β1 x} / (e^{β0} + k e^{β1 x})
(c) e^{β1 x} / (k e^{β0} + e^{β1 x})
(d) e^{β1 x} / (k e^{β0} + e^{−β1 x})

Sol. (c)

Exponentiating the given equation,


(1 − p(x)) / (k p(x)) = e^{β0 − β1 x}
1 − p(x) = k p(x) e^{β0 − β1 x}
p(x) (1 + k e^{β0 − β1 x}) = 1
p(x) = 1 / (1 + k e^{β0 − β1 x})
p(x) = e^{β1 x} / (k e^{β0} + e^{β1 x})
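The algebra can also be verified symbolically, for instance with sympy (a quick sanity check, not part of the original solution):

    import sympy as sp

    # Symbols for the parameters; positivity lets sympy simplify the logs.
    x, b0, b1, k = sp.symbols('x beta0 beta1 k', positive=True)
    p = sp.exp(b1 * x) / (k * sp.exp(b0) + sp.exp(b1 * x))

    # log((1 - p) / (k*p)) should reduce to beta0 - beta1*x.
    ratio = sp.simplify((1 - p) / (k * p))
    print(sp.expand_log(sp.log(ratio), force=True))  # beta0 - beta1*x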

5. Consider a Bayesian classifier for a 3-class classification problem. The following table gives the class-conditioned density fk(x) for the three classes k = 1, 2, 3 at some point x in the input space.

k       1     2     3
fk(x)   0.15  0.20  0.05

Note that πk denotes the prior probability of class k. Which of the following statement(s)
about the predicted label at x is/are true? (One or more choices may be correct.)

(a) If the three classes have equal priors, the prediction must be class 2.
(b) If π3 < π2 and π1 < π2 , the prediction may not necessarily be class 2.
(c) If π1 > 2π2 , the prediction could be class 1 or class 3.
(d) If π1 > π2 > π3 , the prediction must be class 1.

Sol. (a), (c)


For a Bayesian classifier, the prediction is given by argmax_k fk(x)πk.
In options (a) and (b), fk(x)πk is highest for class 2, so the prediction must be class 2; hence (a) is true and (b) is false.
In option (c), π1 > 2π2 ⇒ f1(x)π1 = 0.15π1 > 0.30π2 > 0.20π2 = f2(x)π2, so class 2 is ruled out; the prediction is class 1 if f1(x)π1 > f3(x)π3 and class 3 otherwise.
Similarly, in option (d), the prediction could be either class 1 or class 2 depending on the exact priors, so the prediction need not be class 1.
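These cases are easy to check numerically; the priors below are hypothetical values chosen only to illustrate each option:

    import numpy as np

    f = np.array([0.15, 0.20, 0.05])  # class-conditioned densities f_k(x)

    def predict(priors):
        # Bayes prediction at x: argmax_k f_k(x) * pi_k (classes numbered 1-3).
        return int(np.argmax(f * np.array(priors))) + 1

    print(predict([1/3, 1/3, 1/3]))     # 2: equal priors, option (a)
    print(predict([0.60, 0.25, 0.15]))  # 1: pi_1 > 2*pi_2, option (c)
    print(predict([0.20, 0.05, 0.75]))  # 3: pi_1 > 2*pi_2 again, option (c)
    print(predict([0.40, 0.35, 0.25]))  # 2: pi_1 > pi_2 > pi_3, so (d) fails
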

6. The following table gives the binary labels y^(i) for four points (x1^(i), x2^(i)), where i = 1, 2, 3, 4. Among the given options, which set of parameter values β0, β1, β2 of a standard logistic regression model

p(x) = 1 / (1 + e^{−(β0 + β1 x1 + β2 x2)})

results in the highest likelihood for this data?

x1     x2     y
0.4   -0.2    1
0.6   -0.5    1
-0.3   0.8    0
-0.7   0.5    0
(a) β0 = 0.5, β1 = 1.0, β2 = 2.0
(b) β0 = −0.5, β1 = −1.0, β2 = 2.0
(c) β0 = 0.5, β1 = 1.0, β2 = −2.0
(d) β0 = −0.5, β1 = 1.0, β2 = 2.0

Sol. (c)
For each option, first compute the probabilities p(xi) and then the log-likelihood using the following equation. You can do this either manually or programmatically (see the sketch below).

l(β0, β1, β2) = Σ_{i=1}^{N} log(1 − p(x_i)) + Σ_{i=1}^{N} y_i log(p(x_i) / (1 − p(x_i)))
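A possible programmatic check (a minimal sketch; the data is taken from the table above):

    import numpy as np

    X = np.array([[0.4, -0.2], [0.6, -0.5], [-0.3, 0.8], [-0.7, 0.5]])
    y = np.array([1, 1, 0, 0])

    options = {'a': (0.5, 1.0, 2.0), 'b': (-0.5, -1.0, 2.0),
               'c': (0.5, 1.0, -2.0), 'd': (-0.5, 1.0, 2.0)}

    for name, (b0, b1, b2) in options.items():
        p = 1 / (1 + np.exp(-(b0 + b1 * X[:, 0] + b2 * X[:, 1])))
        ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
        print(name, round(ll, 3))  # option (c) gives the highest log-likelihood
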

7. Which of the following statement(s) about a two-class LDA model is/are true? (One or more
choices may be correct)

(a) It is assumed that the class-conditioned probability density of each class is a Gaussian.
(b) A different covariance matrix is estimated for each class.
(c) At a given point on the decision boundary, the class-conditioned probability densities
corresponding to both classes must be equal.
(d) At a given point on the decision boundary, the class-conditioned probability densities
corresponding to both classes may or may not be equal.

Sol. (a), (d)


Option (a) is true and (b) is false according to the assumptions of LDA.
The log ratio of the posterior probabilities of any two classes is

log(Pr(G = k|X = x) / Pr(G = l|X = x)) = log(fk(x) / fl(x)) + log(πk / πl)
The decision boundary is defined as the set of points at which the posterior probabilities of
both classes are equal, i.e.

log(fk(x) / fl(x)) + log(πk / πl) = 0
If πk = πl (i.e. an equal number of samples from both classes), we obtain fk(x) = fl(x). Similarly, πk ≠ πl ⇒ fk(x) ≠ fl(x). Since the question does not specify whether the two classes have an equal number of samples, option (c) is incorrect and option (d) is correct.

8. Consider the following two datasets and two LDA models trained respectively on these datasets.
Dataset A: 100 samples of class 0; 50 samples of class 1
Dataset B: 100 samples of class 0 (same as Dataset A); 100 samples of class 1, created by repeating the class 1 samples from Dataset A twice
The classifier is defined as follows in terms of the decision boundary w^T x + b = 0, where w is called the slope and b is called the intercept:

y = 0 if w^T x + b < 0
y = 1 if w^T x + b ≥ 0

Which of the given statements is true?

(a) The learned decision boundary will be the same for both models.
(b) The two models will have the same slope but different intercepts.
(c) The two models will have different slopes but the same intercept.
(d) The two models may have different slopes and different intercepts.

Sol. (b)

Consider the LDA decision boundary given by

x^T Σ^{-1}(µk − µl) + log(πk / πl) − (1/2)(µk + µl)^T Σ^{-1}(µk − µl) = 0

The first term corresponds to w^T x, while the second and third terms constitute the intercept b. From the construction of the two datasets, it is clear that the estimates of µ0, µ1 and Σ would be the same for both datasets. The two decision boundaries would only differ in the term log(π1 / π0), which depends on the sample sizes of the two classes: log(50/100) for Dataset A versus log(100/100) = 0 for Dataset B.
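The argument can be illustrated on simulated data (hypothetical Gaussians, since the actual datasets are not given). Matching the idealization above, the shared covariance is estimated here as the simple average of the per-class estimates, so duplicating the class 1 samples leaves w unchanged and shifts only the intercept, by log 2:

    import numpy as np

    def lda_boundary(X0, X1):
        # Plug-in LDA boundary w^T x + b = 0, with the shared covariance
        # estimated as the average of the per-class MLE covariances.
        mu0, mu1 = X0.mean(0), X1.mean(0)
        Sigma = (np.cov(X0, rowvar=False, bias=True) +
                 np.cov(X1, rowvar=False, bias=True)) / 2
        w = np.linalg.solve(Sigma, mu1 - mu0)
        b = np.log(len(X1) / len(X0)) - 0.5 * (mu1 + mu0) @ w
        return w, b

    rng = np.random.default_rng(0)
    X0 = rng.normal([0.0, 0.0], 1.0, size=(100, 2))  # class 0
    X1 = rng.normal([2.0, 2.0], 1.0, size=(50, 2))   # class 1

    wA, bA = lda_boundary(X0, X1)                   # Dataset A
    wB, bB = lda_boundary(X0, np.vstack([X1, X1]))  # Dataset B (class 1 repeated)
    print(wA, bA)
    print(wB, bB)  # identical slope w; intercepts differ by log(2)
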
9. Which of the following statement(s) about LDA is/are true? (One or more choices may be
correct)

(a) It minimizes the between-class variance relative to the within-class variance.
(b) It maximizes the between-class variance relative to the within-class variance.
(c) Maximizing the Fisher information results in the same direction of the separating hyper-
plane as the one obtained by equating the posterior probabilities of classes.
(d) Maximizing the Fisher information results in a different direction of the separating hy-
perplane from the one obtained by equating the posterior probabilities of classes.

Sol. (b), (c)


LDA maximizes the between-class variance relative to the within-class variance (Fisher's criterion), so (b) is true and (a) is false. Under the LDA assumptions (Gaussian classes sharing a covariance matrix Σ), maximizing this criterion yields the direction Σ^{-1}(µ1 − µ2), which is the same direction as the one obtained by equating the class posteriors, so (c) is true and (d) is false.
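A quick empirical illustration of option (c), on hypothetical Gaussian data with a shared covariance: the Fisher direction and the direction Σ^{-1}(µ1 − µ0) obtained from equating posteriors come out parallel.

    import numpy as np

    rng = np.random.default_rng(1)
    cov = [[1.0, 0.5], [0.5, 1.0]]
    X0 = rng.multivariate_normal([0, 0], cov, size=200)
    X1 = rng.multivariate_normal([2, 1], cov, size=200)

    # Fisher criterion: maximize between-class over within-class scatter.
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w_fisher = np.linalg.solve(Sw, X1.mean(0) - X0.mean(0))

    # Direction from equating posteriors: Sigma^{-1} (mu1 - mu0).
    w_lda = np.linalg.solve(Sw / 2, X1.mean(0) - X0.mean(0))

    cos = w_fisher @ w_lda / (np.linalg.norm(w_fisher) * np.linalg.norm(w_lda))
    print(cos)  # ~1.0: the two directions coincide
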
10. Which of the following statement(s) regarding logistic regression and LDA is/are true for a
binary classification problem? (One or more choices may be correct)

(a) For any classification dataset, both algorithms learn the same decision boundary.

(b) Adding a few outliers to the dataset is likely to cause a larger change in the decision
boundary of LDA compared to that of logistic regression.
(c) Adding a few outliers to the dataset is likely to cause a similar change in the decision
boundaries of both classifiers.
(d) If the within-class distributions deviate significantly from the Gaussian distribution, lo-
gistic regression is likely to perform better than LDA.

Sol. (b), (d)


The decision boundaries learned by the two techniques are different because logistic regression
uses maximum likelihood estimation (MLE) while LDA performs Bayesian classification by
assuming the distribution of each class is a Gaussian.
Since LDA assumes that the underlying intra-class distributions are Gaussian, outliers have a
greater effect on LDA compared to logistic regression.
LDA will not perform well if the data does not satisfy the assumptions of the model. Logistic regression, on the other hand, makes no assumptions about the distribution of the inputs within each class.
