Mock Exam 2015 (with answers)
Preliminaries
Problem 1 [2 points] Fill in your immatriculation number on every sheet you hand in. Make sure
it is easily readable. Make sure you do not write your name on any sheet you hand in.
imat:
1 Probability Theory
Problem 2 [2 points] Your friend Mark Z. has a hepatic carcinoma checkup using a new medical technology. The result shows that Mark has the disease. The test is 98% accurate: it detects hepatic carcinoma in 98 out of 100 cases when the disease is present and misses it in 2 out of 100, and it returns a false positive in 2 out of 100 cases when the disease is absent and a correct negative result in 98 out of 100. Previous research on hepatic carcinoma suggests that one in 2,000 people has this disease. What is the probability that Mark actually has hepatic carcinoma? Put down everything up to the point where you would use your calculator and stop there.
Let T = 1 represent a positive test outcome, T = 0 a negative test outcome, D = 1 that Mark has the disease, and D = 0 that he does not. We have P(T = 1 | D = 1) = 0.98, P(T = 0 | D = 0) = 0.98, and P(D = 1) = 1/2000 = 0.0005. The task reduces to computing P(D = 1 | T = 1). With Bayes' rule, we can derive:
$$P(D = 1 \mid T = 1) = \frac{P(T = 1 \mid D = 1)\, P(D = 1)}{P(T = 1 \mid D = 1)\, P(D = 1) + P(T = 1 \mid D = 0)\, P(D = 0)} = \frac{0.98 \times 0.0005}{0.98 \times 0.0005 + 0.02 \times 0.9995}$$
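A quick numeric check of this expression (a minimal Python sketch; the variable names are mine, the numbers come from the problem statement):

```python
# Bayes' rule for P(D=1 | T=1), using the numbers from the problem statement.
p_d = 1 / 2000          # prior: one in 2,000 has the disease
p_t_given_d = 0.98      # sensitivity: P(T=1 | D=1)
p_t_given_not_d = 0.02  # false-positive rate: P(T=1 | D=0)

numerator = p_t_given_d * p_d
evidence = numerator + p_t_given_not_d * (1 - p_d)
posterior = numerator / evidence
print(f"P(D=1 | T=1) = {posterior:.4f}")  # approx. 0.0239
```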
2 Linear Regression
Problem 3 [2 points] We have 1D input points X = [0, 1, 2] and corresponding 2D outputs Y = [(−1, 1), (1, −1), (2, −1)]. We embed each $x_i$ into 2D with the basis function:
$$\Phi(0) = (1, 0)^T, \quad \Phi(1) = (1, 1)^T, \quad \Phi(2) = (2, 2)^T$$

$$\Phi^T = \begin{pmatrix} 1 & 1 & 2 \\ 0 & 1 & 2 \end{pmatrix}$$

$$\hat{W} = (\Phi^T \Phi)^{-1} \Phi^T Y = \begin{pmatrix} -1 & 1 \\ 2 & -1.6 \end{pmatrix}$$
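A minimal numpy check of the normal-equations solution above (a sketch; variable names are mine):

```python
import numpy as np

# Design matrix: rows are the embedded inputs Phi(0), Phi(1), Phi(2).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [2.0, 2.0]])
# 2D targets, one row per input point.
Y = np.array([[-1.0,  1.0],
              [ 1.0, -1.0],
              [ 2.0, -1.0]])

# Least-squares solution via the normal equations.
W_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
print(W_hat)  # [[-1.   1. ]
              #  [ 2.  -1.6]]
```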
3 Logistic Regression
Problem 5 [2 points] Consider the data in the following figure, where we fit the model p(y = 1|x, w) = σ(w0 + w1 x1 + w2 x2). Suppose we fit the model by maximum likelihood, but heavily regularize only the w0 parameter, i.e., we minimize

$$J(w) = -\ell(w, \mathcal{D}) + \lambda w_0^2,$$

and λ is a very large number. Sketch a possible decision boundary. Show your work.
Since λ is very large, the penalty forces w0 ≈ 0, so the line must go through the origin. There are several possible lines with different slopes.
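To spell out the one reasoning step (notation as in the problem):

$$\sigma(w_0 + w_1 x_1 + w_2 x_2) = \tfrac{1}{2} \;\Longleftrightarrow\; w_0 + w_1 x_1 + w_2 x_2 = 0 \;\overset{w_0 = 0}{\Longrightarrow}\; x_2 = -\frac{w_1}{w_2}\, x_1,$$

a line through the origin whose slope $-w_1/w_2$ is otherwise unconstrained.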
Problem 6 [2 points] Now suppose we heavily regularize only the w1 parameter, i.e., we minimize

$$J(w) = -\ell(w, \mathcal{D}) + \lambda w_1^2.$$

Sketch a possible decision boundary. Show your work.
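By the same argument as above, the heavy penalty drives w1 ≈ 0, so the boundary becomes w0 + w2 x2 = 0, i.e. a horizontal line at x2 = −w0/w2.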
4 Multivariate Gaussian
Problem 7 [2 points] The plot below shows a joint Gaussian distribution p(x1 , x2 ). Qualitatively
(!) draw the conditionals p(x1 |x2 = 0) and p(x1 |x2 = 2) in the given coordinate systems. Note that
the scaling a is arbitrary but fixed.
[Figure: contour plot of the joint Gaussian p(x1, x2) over x1, x2 ∈ [−3, 3], followed by two empty coordinate systems for sketching p(x1|x2 = 0.0) and p(x1|x2 = 2.0); their vertical axes are marked a, 2a, 3a and their horizontal axes show x1 from −3 to 3.]
For a bivariate Gaussian distribution $p(x_1, x_2) = \mathcal{N}(x_1, x_2 \mid \mu, \Sigma)$ with $\mu = (\mu_1, \mu_2)^T$ and

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix},$$
applying the formula for conditioning a Gaussian (e.g. Murphy, section 4.3.1) yields
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid \mu_{1|2}, \sigma_{1|2}^2) = \mathcal{N}\!\left(x_1 \,\middle|\, \mu_1 + \frac{\sigma_{12}}{\sigma_2^2}(x_2 - \mu_2),\; \sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}\right)$$
We see that while $\mu_{1|2}$ depends on the value of $x_2$, $\sigma_{1|2}^2$ does not. That implies for the drawing:
• both conditional Gaussians have the same shape, since their variances are equal;
• $\mu_{1|2}$ can be roughly inferred graphically as the midpoint between the intersections of the horizontal $x_2$ lines with the iso-curves.
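A minimal numeric sketch of the conditioning formula (the particular µ and Σ below are made-up values, since the figure's parameters are not given):

```python
import numpy as np

# Hypothetical parameters of the joint Gaussian (the figure's exact values are unknown).
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])

def condition_on_x2(x2):
    """Parameters of p(x1 | x2) for a bivariate Gaussian (Murphy, Sec. 4.3.1)."""
    mean = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
    var = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
    return mean, var

for x2 in (0.0, 2.0):
    mean, var = condition_on_x2(x2)
    print(f"p(x1 | x2={x2}): mean={mean:.2f}, var={var:.2f}")
# The variance is identical for both conditionals; only the mean shifts with x2.
```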
5 Kernels
$$\phi(x) = \left[\,1,\; x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2\,\right]^T$$
Problem 8 [2 points] Determine the kernel K(x, y). Simplify your answer.
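Expanding $\phi(x)^T \phi(y)$ term by term gives $K(x, y) = \phi(x)^T \phi(y) = (1 + x^T y)^2$. A quick numeric sanity check of this identity (a minimal sketch; the random test vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Feature map from the problem: [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]^T."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
lhs = phi(x) @ phi(y)        # explicit inner product in feature space
rhs = (1 + x @ y) ** 2       # candidate closed form: (1 + x^T y)^2
print(np.isclose(lhs, rhs))  # True
```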
6 Constrained Optimization
Find the box of maximum volume whose surface area is at most S ∈ R+.
Problem 9 [2 points] Derive the Lagrangian of the problem and the corresponding Lagrange dual function. Hint: denote the length, width and height by l, w and h, respectively.
Problem 10 [3 points] Solve the dual problem and give the solution to the original problem.
The problem is to minimize

$$f(l, w, h) = -lwh$$

subject to

$$lw + lh + hw - \frac{S}{2} \le 0$$

(the surface area of the box is $2(lw + lh + hw) \le S$). With multiplier $\alpha \ge 0$, the Lagrangian is

$$L(l, w, h, \alpha) = -lwh + \alpha\left(lw + lh + hw - \frac{S}{2}\right)$$

Setting the partial derivatives to zero,

$$\frac{\partial L}{\partial l} = -wh + \alpha(w + h) = 0$$
$$\frac{\partial L}{\partial w} = -lh + \alpha(l + h) = 0$$
$$\frac{\partial L}{\partial h} = -lw + \alpha(w + l) = 0 \tag{1}$$

whose symmetric solution is $l = w = h = 2\alpha$. Substituting back gives the dual function

$$g(\alpha) = \min_{l,w,h} L(l, w, h, \alpha) = 4\alpha^3 - \frac{S}{2}\alpha.$$
Solving

$$0 = \frac{dg}{d\alpha} = 12\alpha^2 - \frac{S}{2},$$

subject to dual feasibility $\alpha \ge 0$, yields

$$\alpha = \left(\frac{S}{24}\right)^{1/2}$$
and thus

$$l = w = h = \left(\frac{S}{6}\right)^{1/2}, \qquad \max(lwh) = -\min(f) = \left(\frac{S}{6}\right)^{3/2}.$$
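A numeric sanity check of the closed form (a sketch; S = 6 is an arbitrary choice, for which the formula predicts l = w = h = 1 and volume 1):

```python
import numpy as np
from scipy.optimize import minimize

S = 6.0  # arbitrary surface-area budget

# Maximize the volume lwh = minimize -lwh, subject to lw + lh + hw <= S/2.
res = minimize(
    lambda v: -v[0] * v[1] * v[2],
    x0=[0.5, 0.5, 0.5],
    constraints=[{"type": "ineq",
                  "fun": lambda v: S / 2 - (v[0] * v[1] + v[0] * v[2] + v[1] * v[2])}],
    bounds=[(1e-6, None)] * 3,
)
print(res.x, -res.fun)                 # approx. [1. 1. 1.] and 1.0
print((S / 6) ** 0.5, (S / 6) ** 1.5)  # closed form: 1.0 and 1.0
```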
7 Neural Networks
This is an unfair mock question about material you have not yet seen.