Exam in DD2421 Machine Learning

2023-10-26, kl 08.00 – 12.00


Aids allowed: calculator, language dictionary.

To take this exam you must be registered to this specific exam as well as to the course.
To pass this exam, your score x must first be 20 or more (out of a maximum of 42).
In addition, given your points y from the Programming Challenge (out of a maximum of 18), the
requirements on the total points, p = x + y, are preliminarily set for the different grades as:

54 < p ≤ 60 → A
48 < p ≤ 54 → B
42 < p ≤ 48 → C
36 < p ≤ 42 → D
29 < p ≤ 36 → E (A pass is guaranteed with the required points for ’E’.)
0 ≤ p ≤ 29 → F

This exam consists of sections A, B, and C. NB. Use different papers (answer sheets) for
different sections.

Page 1 (of 8)

DD2421 Machine Learning • HT 2023


Sturm, Conradt, and Maki
A Graded problems
Potential inquiries to be addressed to Atsuto Maki.

A-1 Terminology (4p)

For each term (a–d) in the left list, find the explanation from the right list which best describes
how the term is used in machine learning.

Terms:

a) The LASSO
b) RANSAC
c) Bagging
d) Cross validation

Explanations:

1) A variation of Branch-and-Bound search
2) The latent structure optimization
3) Random strategy for area compression
4) A way to exploit training data for assessing a model
5) An example of ensemble learning
6) A method for evaluating a mixture of models
7) A regularization method that results in feature selection
8) Robust method to fit a model to data involving outliers

A-2 Entropy (6p)

You have booked a flight for tomorrow, but now there are some risks that it might be cancelled
due to two factors: typhoon and strike. Your estimate on flight cancellation due to the weather, i.e.
typhoon, is 40%. Independently, the probability of cancellation due to strike is 50%.

a) What is the probability that there will be a flight cancellation due to one or both of the
factors? (2p)

b) How unpredictable is it that the flight is cancelled, either due to the weather or the strike (or
both)? Answer in terms of Entropy, measured in bits. (2p)

c) You realized that you can find out whether there will be a strike or not (which we can assume
reliable) on the airport website tonight.
What is the expected information gain from checking it on the website? (2p)
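
The quantities asked for in A-2 can be checked numerically with a short sketch like the one below. It is not the official solution. It assumes that a strike, if it happens, always cancels the flight, so that learning about the strike leaves only the typhoon as a source of uncertainty. The helper name `binary_entropy` is introduced here for illustration.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no event that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p_typhoon = 0.4  # cancellation due to weather (from the problem text)
p_strike = 0.5   # cancellation due to strike (from the problem text)

# a) The flight survives only if neither independent factor cancels it.
p_cancel = 1 - (1 - p_typhoon) * (1 - p_strike)

# b) Entropy of the cancelled / not cancelled outcome.
h_before = binary_entropy(p_cancel)

# c) Assumption: strike -> cancelled for sure; no strike -> only the typhoon matters.
h_after = p_strike * binary_entropy(1.0) + (1 - p_strike) * binary_entropy(p_typhoon)
expected_gain = h_before - h_after

print(f"P(cancel) = {p_cancel:.2f}")
print(f"H(cancel) = {h_before:.4f} bits")
print(f"Expected information gain = {expected_gain:.4f} bits")
```

Under a different reading of the strike probability (for example, a strike that cancels the flight only sometimes), the conditional entropy in part c) would have to be recomputed.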

[Figure: four panels (i)-(iv). Each plots Prediction Error (Low to High) against Model Complexity (Low to High), with one curve labelled Training Sample and one labelled Test Sample. Annotations at low / high model complexity: panels (i) and (ii) read "High Bias, Low Variance" / "Low Bias, High Variance"; panels (iii) and (iv) read "Low Bias, High Variance" / "High Bias, Low Variance". Within each pair, the panels differ in the placement of the Training Sample and Test Sample curve labels.]

Figure 1. Typical behavior of prediction error plotted against model complexity.

A-3 Bias and Variance (5p)

a) One of the four subfigures (i)-(iv) in Figure 1 displays the typical trend of prediction error of
a model for training and testing data with comments on its bias and variance, {high, low}.
Which one of the four figures best represents the general situation? (1p)

b) Now, let us consider a model, function $f(x)$ of input vector $x$, and the following concepts:

$\hat{f}(x)$ : prediction function (= model) estimated with a set of data samples, $D$

$E_D[\hat{f}(x)]$ : the average of models over different sample sets

Show the bias and variance of the classifier in formulae referring to these terms. (2p)

c) Derive that the mean square error (MSE) for estimating $f(x)$ can be decomposed into a
sum of two terms, one for the bias and one for the variance. (2p)
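
For reference, one common way to write the quantities in b) and c) is sketched below. It is not the official solution. It writes $\hat{f}(x)$ for the model estimated from $D$, introduces the shorthand $\bar{f}(x)$ for $E_D[\hat{f}(x)]$, and assumes that the target $f(x)$ is deterministic (no observation noise term).

```latex
\begin{align*}
\bar{f}(x) &= E_D[\hat{f}(x)] \\
\mathrm{Bias}(x) &= \bar{f}(x) - f(x) \\
\mathrm{Var}(x) &= E_D\big[(\hat{f}(x) - \bar{f}(x))^2\big] \\[4pt]
\mathrm{MSE}(x) &= E_D\big[(\hat{f}(x) - f(x))^2\big] \\
 &= E_D\big[(\hat{f}(x) - \bar{f}(x) + \bar{f}(x) - f(x))^2\big] \\
 &= E_D\big[(\hat{f}(x) - \bar{f}(x))^2\big]
    + 2\,(\bar{f}(x) - f(x))\,E_D\big[\hat{f}(x) - \bar{f}(x)\big]
    + (\bar{f}(x) - f(x))^2 \\
 &= \mathrm{Var}(x) + \mathrm{Bias}(x)^2
\end{align*}
```

The cross term vanishes because $E_D[\hat{f}(x) - \bar{f}(x)] = 0$ by the definition of $\bar{f}(x)$.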

A-4 Ensemble Methods (4p)

Briefly answer the following questions regarding ensemble methods of classification.

a) What are the two kinds of randomness involved in the design of Random Forests?

b) In the AdaBoost algorithm, each training sample is given a weight, which is updated according
to some factors over the iterations of training weak classifiers. What are the two most
dominant factors in updating the weights? How are they used?

A-5 Principal Component Analysis (PCA), Subspace Method (3p)

We consider solving a K-class classification problem with the Subspace Method, and for that
we compute a subspace L(j) (j = 1, ..., K) using the training data of each class. That
is, given a set of feature vectors (as training data) which belong to a specific class C (i.e. with an
identical class label), we perform PCA on them and generate an orthonormal basis {u1 , ..., up }
which spans a p-dimensional subspace, L, as the outcome.

Provide an answer to the following questions.

a) Which one among the three is least relevant to PCA?


i. Viewing the input data from a different coordinate system.
ii. Finding errors in data labels.
iii. Exploring the possibility of dimensionality reduction.
Simply indicate your choice. (1p)

b) Given that we compute {u1 , ..., up } as eigenvectors of the auto-correlation matrix Q based
on the training data, how should we choose the eigenvectors in relation to the corresponding
eigenvalues of Q? (1p)

c) Given x, we computed its projection length on each subspace as S (j) (j = 1, ..., K),
respectively. For classes with labels {l,m,n} among those, we had the following observations:
S (l) was the minimum of all S (j) ’s,
S (m) was 0.5, and
S (n) was the maximum of all S (j) ’s.
Which class should x belong to? Simply choose a class label. (1p)
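
The decision rule of the Subspace Method can be illustrated with a minimal sketch. It is not part of the exam and not an official solution. The three one-dimensional bases and the test vector are hypothetical values chosen for illustration, and the function names are introduced here. The sketch assumes each basis is orthonormal, so the projection length is the square root of the summed squared inner products.

```python
import math

def projection_length(x, basis):
    """Length of the projection of x onto the span of an orthonormal basis."""
    return math.sqrt(sum(sum(ui * xi for ui, xi in zip(u, x)) ** 2 for u in basis))

def classify(x, subspaces):
    """Assign x to the class whose subspace gives the largest projection length."""
    return max(subspaces, key=lambda label: projection_length(x, subspaces[label]))

# Hypothetical class subspaces, each spanned by a single unit vector.
subspaces = {
    "l": [(1.0, 0.0, 0.0)],
    "m": [(0.0, 1.0, 0.0)],
    "n": [(0.0, 0.0, 1.0)],
}
x = (0.1, 0.5, 0.8)  # hypothetical input vector

for label, basis in subspaces.items():
    print(label, projection_length(x, basis))
print("predicted class:", classify(x, subspaces))
```

With these made-up values the projection length is smallest for class l, equal to 0.5 for class m, and largest for class n.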

        Number of poopoos
Walk    Shoogee Nulnul    Max the Tax
1       0                 1
2       1                 2
3       1                 1
4       0                 1
5       1                 2

Table 1. The number of poopoos emitted by two doggos on five walks together.

B Graded problems
Potential inquiries to be addressed to zoom link (B).

B-1 Warm up (4p)

Our dog Shoogee (pictured) loves to hide from us and surprise us when we find her – followed
by high pitched squeals of love and tickles. One day she hid from us. We were 99% sure she was
in one of the nine hiding spots we know of in the apartment. She chooses any of the spots with
equal frequency, so we just started searching at random. We checked all but one of the spots, and
still no Shoogee. What’s the probability she has found a new hiding spot?
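
The Bayesian update behind this question can be sketched in a few lines. It is not the official solution. It assumes that the remaining 1% of prior belief corresponds to a new hiding spot, and that a search of a spot never misses her when she is there. The variable names are introduced here for illustration.

```python
from fractions import Fraction

prior_new = Fraction(1, 100)   # prior belief that she is in a new spot
prior_known = 1 - prior_new    # prior belief that she is in one of the known spots
n_spots = 9                    # known hiding spots, equally likely
n_checked = 8                  # spots searched without finding her

# Likelihood of finding nothing in the checked spots under each hypothesis.
lik_known = Fraction(n_spots - n_checked, n_spots)  # she must be in the one unchecked spot
lik_new = Fraction(1)                               # a new spot is never among those checked

posterior_new = (prior_new * lik_new) / (prior_new * lik_new + prior_known * lik_known)
print(posterior_new, float(posterior_new))
```

Exact fractions are used so that the result is not blurred by floating-point rounding.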

Figure 2. Shoogee (dog) in one of her nine known hiding spots.

B-2 Maximum likelihood estimation (4p)

Consider the data in Table 1. Assume all observations (walks) are independent. Assume the
number of poopoos emitted by Shoogee Nulnul is distributed Poisson with parameter λS . Assume
the number of poopoos emitted by Max the Tax conditioned on the number of poopoos emitted by
Shoogee Nulnul s is distributed Poisson with parameter λM + s. Find the maximum likelihood
estimates of λS and λM . Show your work. (You may use a calculator.)
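
A numerical cross-check of the two estimates can be sketched as follows. It is not the official solution. It reads the model as: Shoogee's count on each walk is Poisson(λS), and Max's count on walk i is Poisson(λM + s_i), where s_i is Shoogee's count on that walk. The data are copied from Table 1. The bisection search and the names `score`, `lam_s` and `lam_m` are choices made here for illustration.

```python
import math

# Table 1: counts per walk.
shoogee = [0, 1, 1, 0, 1]
max_tax = [1, 2, 1, 1, 2]
n = len(shoogee)

# For a plain Poisson model the MLE is the sample mean.
lam_s = sum(shoogee) / n

def score(lam_m):
    """Derivative of Max's log-likelihood with respect to lam_m."""
    return sum(m / (lam_m + s) for s, m in zip(shoogee, max_tax)) - n

# The score is strictly decreasing for lam_m > 0, so bisection finds its root.
lo, hi = 1e-6, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
lam_m = (lo + hi) / 2

# With these data the score equation reduces to 5*lam^2 - 2*lam - 2 = 0.
lam_m_closed_form = (2 + math.sqrt(44)) / 10

print(f"lambda_S = {lam_s:.3f}")
print(f"lambda_M = {lam_m:.3f} (closed form {lam_m_closed_form:.3f})")
```

The closed form follows from setting 2/λM + 5/(λM + 1) = 5, which is the score equation after grouping the walks by Shoogee's count.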

B-3 Maximum a posteriori classification (4p)

Consider the data in Table 1. Assume all observations (walks) are independent. Assume the
number of poopoos emitted by Shoogee Nulnul is distributed Poisson with parameter λS = 0.60.
Assume the number of poopoos emitted by Max the Tax conditioned on the number of poopoos
emitted by Shoogee Nulnul s is distributed Poisson with parameter λM = 0.74. On a sixth walk
together, one of the doggos emitted 1 poopoos. Which dog is the most likely to have done this
according to a maximum a posteriori classifier? Show your work. (You may use a calculator.)

C Graded problems
Potential inquiries to be addressed to Jörg Conradt.

C-1 Multiple-Choice: Support Vector Machine (1p)

Do not justify your answer. Instead, select exactly one option of (1.), (2.), or (3.).
Complete the following sentence: Out of all hyperplanes which solve a classification problem,
the one with widest margin will probably ...

1. ... generalize best.

2. ... compute fastest.

3. ... have the smallest number of parameters.

C-2 Support Vector Classification (3p)

The following diagram shows a small data set consisting of four RED samples (A, B, C, D)
and four BLUE samples (E, F, G, H). This data set can be linearly separated.

a) We use a linear support vector machine (SVM) without kernel function to correctly separate
the RED (A-D) and the BLUE (E-H) class. Which of the data points (A-H) will the support
vector machine use to separate the two classes? Name the point(s) (no explanation needed).
(1p)

b) Assume someone suggests using a non-linear kernel for the SVM classification of the above
data set (A-H). Give one argument in favor of and one argument against using non-linear
SVM classification for such a data set. USE KEYWORDS! (2p)

C-3 Multiple-Choice: Artificial Neural Networks (1p)

Do not justify your answer. Instead, select exactly one option of (1.), (2.), or (3.).
Error backpropagation training for neural networks requires the following:

1. a single input and a single output to the network.

2. labeled training data and differentiable neuron activation functions.

3. Gaussian distributed training data and a fast computer.

C-4 Artificial Neural Networks (3p)


Consider the training data in the table, where + means a positive
sample and − a negative.

x1    x2    Class
 8     8    −
 8    -4    −
 4     0    +
 0     4    +
-6    -6    −
-6     8    −

a) What is the minimum number of layers needed for an artificial
neural network to correctly classify all these points? Motivate
your answer IN KEYWORDS. (2p)

b) How many input nodes and how many output nodes does your
neural network need to address this problem? (1p)
