ml-20231026-1
To take this exam you must be registered for this specific exam as well as for the course.
To pass this exam, your score x must first be 20 or more (out of a maximum of 42 points).
In addition, given your points y from the Programming Challenge (out of a maximum of 18 points), the
requirements on the total points, p = x + y, are preliminarily set for the different grades as:
54 < p ≤ 60 → A
48 < p ≤ 54 → B
42 < p ≤ 48 → C
36 < p ≤ 42 → D
29 < p ≤ 36 → E (A pass is guaranteed with the required points for ’E’.)
0 ≤ p ≤ 29 → F
This exam consists of sections A, B, and C. NB. Use different papers (answer sheets) for
different sections.
Page 1 (of 8)
For each term (a–d) in the left list, find the explanation from the right list which best describes
how the term is used in machine learning.
You have booked a flight for tomorrow, but there is now a risk that it will be cancelled
due to two factors: a typhoon and a strike. Your estimate of the probability of cancellation due to the
weather, i.e. the typhoon, is 40%. Independently, the probability of cancellation due to the strike is 50%.
a) What is the probability that there will be a flight cancellation due to one or both of the
factors? (2p)
b) How unpredictable is it that the flight is cancelled, either due to the weather or the strike (or
both)? Answer in terms of Entropy, measured in bits. (2p)
c) You realized that you can find out whether there will be a strike or not (which we can assume
reliable) on the airport website tonight.
What is the expected information gain from checking it on the website? (2p)
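The three quantities asked for in (a)-(c) can be checked numerically. The following is only a sketch under one reading of the problem: it assumes that a typhoon or a strike, if it occurs, always cancels the flight, so that "strike" and "cancellation due to strike" are the same event. The variable names are illustrative and not part of the exam.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a binary variable that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Probabilities given in the problem (assumed to be independent events).
p_typhoon = 0.4
p_strike = 0.5

# (a) Cancelled unless neither factor occurs.
p_cancel = 1 - (1 - p_typhoon) * (1 - p_strike)

# (b) Entropy of the binary outcome "cancelled / not cancelled".
h_cancel = binary_entropy(p_cancel)

# (c) Expected entropy after learning whether there is a strike:
#     strike    -> cancellation is certain (entropy 0),
#     no strike -> only the typhoon matters.
h_after = p_strike * binary_entropy(1.0) + (1 - p_strike) * binary_entropy(p_typhoon)
info_gain = h_cancel - h_after

print(f"P(cancel) = {p_cancel:.2f}")
print(f"H(cancel) = {h_cancel:.4f} bits")
print(f"Expected information gain = {info_gain:.4f} bits")
```

Under these assumptions the script reports a cancellation probability of 0.70, an entropy of about 0.88 bits, and an expected information gain of about 0.40 bits.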
[Figure 1: four subfigures (i)-(iv). Each plots Prediction Error (vertical axis, Low to High) for the Training Sample and the Testing Sample; the ends of the horizontal axis carry bias/variance labels (the surviving labels read "Low Bias, High Variance" and "High Bias, Low Variance").]
a) One of the four subfigures (i)-(iv) in Figure 1 displays the typical trend of the prediction error of
a model for training and testing data, with comments on its bias and variance, {high, low}.
Which one of the four subfigures best represents the general situation? (1p)
b) Now, let us consider a model, function f (x) of input vector x, and the following concepts:
Show the bias and variance of the classifier in formulae referring to these terms. (2p)
c) Derive that the mean square error (MSE) for estimating f (x) can be decomposed into a
two-fold representation consisting of the terms of bias and variance. (2p)
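One possible outline of the decomposition asked for in (c) is given below. The notation is an assumption, since the list of concepts from (b) is not reproduced here: \hat{f}(x) denotes the learned estimate of f(x), and E[.] is the expectation over training sets.

```latex
% Assumed notation: \hat{f} = \hat{f}(x) is the estimate, f = f(x) the target,
% and E[\cdot] averages over training sets.
\begin{align*}
\mathrm{Bias} &= E[\hat{f}] - f, \qquad
\mathrm{Var} = E\big[(\hat{f} - E[\hat{f}])^2\big] \\
\mathrm{MSE} &= E\big[(\hat{f} - f)^2\big] \\
  &= E\big[\big((\hat{f} - E[\hat{f}]) + (E[\hat{f}] - f)\big)^2\big] \\
  &= E\big[(\hat{f} - E[\hat{f}])^2\big]
     + 2\,(E[\hat{f}] - f)\,E\big[\hat{f} - E[\hat{f}]\big]
     + (E[\hat{f}] - f)^2 \\
  &= \mathrm{Var} + \mathrm{Bias}^2
\end{align*}
```

The cross term vanishes because E[\hat{f} - E[\hat{f}]] = 0, while E[\hat{f}] - f is a constant with respect to the expectation.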
a) What are the two kinds of randomness involved in the design of Random Forests?
b) In the AdaBoost algorithm, each training sample is given a weight, which is updated according
to certain factors over the iterations of training weak classifiers. What are the two most
dominant factors in updating the weights? How are they used?
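For orientation, one round of the weight update in the standard discrete AdaBoost formulation might look like the sketch below. It assumes labels and weak-classifier outputs in {-1, +1}, weights that sum to 1, and a weighted error strictly between 0 and 1. The function name `adaboost_reweight` and the toy data are hypothetical and not taken from the course material.

```python
import math


def adaboost_reweight(weights, labels, predictions):
    """One round of the discrete AdaBoost sample-weight update.

    Assumes labels and predictions are in {-1, +1}, weights sum to 1,
    and the weighted error lies strictly between 0 and 1.
    """
    # Weighted error of the current weak classifier.
    eps = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    # Reliability of the weak classifier: larger when the error is small.
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Misclassified samples (y * h = -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * y * h)
               for w, y, h in zip(weights, labels, predictions)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy example (made-up data): four samples, the last one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]
alpha, new_weights = adaboost_reweight(weights, labels, predictions)
print(alpha, new_weights)
```

In this toy run the single misclassified sample ends up carrying half of the total weight, which shows how the update depends both on whether a sample was classified correctly and on the reliability of the weak classifier.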
We consider solving a K-class classification problem with the Subspace Method, and for that
we compute a subspace L(j) (j = 1, ..., K) using the training data for each class, respectively. That
is, given a set of feature vectors (as training data) which belong to a specific class C (i.e. with an
identical class label), we perform PCA on them and generate an orthonormal basis {u1 , ..., up }
which spans a p-dimensional subspace, L, as the outcome.
b) Given that we compute {u1 , ..., up } as eigenvectors of the auto-correlation matrix Q based
on the training data, how should we choose the eigenvectors in relation to the corresponding
eigenvalues of Q? (1p)
c) Given x, we computed its projection length on each subspace as S (j) (j = 1, ..., K), respectively.
For classes with labels {l,m,n} among those, we had the following observations:
S (l) was the minimum of all S (j) ’s,
S (m) was 0.5, and
S (n) was the maximum of all S (j) ’s.
Which class should x belong to? Simply choose a class label. (1p)
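The decision rule behind (c) can be sketched as follows, assuming the usual Subspace Method convention that x is assigned to the class whose subspace gives the largest projection length. The helper names and the toy bases are hypothetical; in practice each basis would come from PCA on that class's training data.

```python
import math


def projection_length(x, basis):
    """Length of the projection of x onto the span of an orthonormal basis."""
    return math.sqrt(sum(sum(xi * ui for xi, ui in zip(x, u)) ** 2
                         for u in basis))


def classify(x, subspaces):
    """Return the label whose subspace yields the longest projection of x."""
    return max(subspaces, key=lambda label: projection_length(x, subspaces[label]))


# Toy one-dimensional subspaces in R^3 (made-up, one per class label).
subspaces = {
    "l": [[1.0, 0.0, 0.0]],
    "m": [[0.0, 1.0, 0.0]],
    "n": [[0.0, 0.0, 1.0]],
}
x = [0.1, 0.5, 0.8]

for label, basis in subspaces.items():
    print(label, projection_length(x, basis))
print("assigned class:", classify(x, subspaces))
```

The toy vector is chosen so that S(l) is the smallest, S(m) is 0.5, and S(n) is the largest, mirroring the observations listed in the question.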
Table 1. The number of poopoos emitted by two doggos on five walks together.
B Graded problems
Potential inquiries to be addressed to zoom link (B).
Our dog Shoogee (pictured) loves to hide from us and surprise us when we find her – followed
by high-pitched squeals of love and tickles. One day she hid from us. We were 99% sure she was
in one of the nine hiding spots we know of in the apartment. She chooses any of the spots with
equal frequency, so we just started searching at random. We checked all but one of the spots, and
still no Shoogee. What’s the probability she has found a new hiding spot?
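The Bayesian update in this problem can be checked with exact fractions. The sketch below assumes that each search is perfect (she would have been found if she were in a checked spot) and that "not in any of the nine known spots" means the same as "has found a new hiding spot". Variable names are illustrative.

```python
from fractions import Fraction

# Prior beliefs stated in the problem.
prior_known = Fraction(99, 100)      # she is in one of the nine known spots
n_spots = 9
prior_each = prior_known / n_spots   # equal share per known spot
prior_new = 1 - prior_known          # she is somewhere new

# Evidence: eight known spots were checked and she was not there.
checked = 8
p_remaining_spot = prior_each * (n_spots - checked)

# Bayes' rule: only "remaining known spot" and "new spot" are still possible.
posterior_new = prior_new / (prior_new + p_remaining_spot)
print(posterior_new, float(posterior_new))
```

Under these assumptions the posterior probability of a new hiding spot is 1/12, roughly 8.3%.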
Consider the data in Table 1. Assume all observations (walks) are independent. Assume the
number of poopoos emitted by Shoogee Nulnul is distributed Poisson with parameter λS . Assume
the number of poopoos emitted by Max the Tax, conditioned on the number s of poopoos emitted by
Shoogee Nulnul, is distributed Poisson with parameter λM + s. Find the maximum likelihood
estimates of λS and λM . Show your work. (You may use a calculator.)
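A possible outline of the likelihood equations is given below. It uses only the model stated in the problem; the symbols s_i and m_i (the counts for Shoogee Nulnul and Max the Tax on walk i) and n = 5 (the number of walks) are notation introduced here for illustration, and the values from Table 1 are not filled in.

```latex
% s_i, m_i: counts on walk i for Shoogee Nulnul and Max the Tax; n = 5 walks.
% The joint likelihood factorises as P(s_i)\,P(m_i \mid s_i), so the two
% parameters can be estimated separately.
\begin{align*}
\ell(\lambda_S) &= \sum_{i=1}^{n} \big( s_i \log \lambda_S - \lambda_S - \log s_i! \big),
& \frac{d\ell}{d\lambda_S} &= \frac{1}{\lambda_S}\sum_{i=1}^{n} s_i - n = 0
\;\Rightarrow\; \hat{\lambda}_S = \frac{1}{n}\sum_{i=1}^{n} s_i \\
\ell(\lambda_M) &= \sum_{i=1}^{n} \big( m_i \log(\lambda_M + s_i) - (\lambda_M + s_i) - \log m_i! \big),
& \frac{d\ell}{d\lambda_M} &= \sum_{i=1}^{n} \frac{m_i}{\lambda_M + s_i} - n = 0
\end{align*}
```

The second equation generally has no closed-form solution and is solved numerically for λM (hence the calculator). The log-likelihood is concave in λM, since its second derivative is a sum of non-positive terms, so a stationary point is the maximum.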
Consider the data in Table 1. Assume all observations (walks) are independent. Assume the
number of poopoos emitted by Shoogee Nulnul is distributed Poisson with parameter λS = 0.60.
Assume the number of poopoos emitted by Max the Tax, conditioned on the number s of poopoos
emitted by Shoogee Nulnul, is distributed Poisson with parameter λM = 0.74. On a sixth walk
together, one of the doggos emitted 1 poopoo. Which dog is most likely to have done this
according to a maximum a posteriori classifier? Show your work. (You may use a calculator.)
Do not justify your answer. Instead, select exactly one option of (1.), (2.), or (3.).
Complete the following sentence: Out of all hyperplanes which solve a classification problem,
the one with the widest margin will probably ...
The following diagram shows a small data set consisting of four RED samples (A, B, C, D)
and four BLUE samples (E, F, G, H). This data set can be linearly separated.
a) We use a linear support vector machine (SVM) without a kernel function to correctly separate
the RED (A-D) and the BLUE (E-H) classes. Which of the data points (A-H) will the support
vector machine use to separate the two classes? Name the point(s) (no explanation needed).
(1p)
b) Assume someone suggests using a non-linear kernel for the SVM classification of the above
data set (A-H). Give one argument in favor of and one argument against using non-linear
SVM classification for such a data set. USE KEYWORDS! (2p)
Do not justify your answer. Instead, select exactly one option of (1.), (2.), or (3.).
Error backpropagation training for neural networks requires the following: