Mock End Term Solution
Final Exam
Weeks 1 to 12
Week 1
1. A social networking service company needs to develop a system that detects violent images. The technical team proposed a machine learning model to achieve this task. They then created a data set containing millions of images, about 55% of which are flagged as violent by users. Assuming that all users genuinely flagged violent images as violent, which of the following machine learning algorithms is (are) suitable to address this problem?
A. Supervised Linear/non-linear classification algorithms
B. Supervised Linear/non-linear regression algorithms
C. Unsupervised Density estimation algorithms
D. None of the above
Answer: A
Solution:
Since the system needs to detect whether an image is violent or not, the set of possible outputs is finite (two in this case). Therefore this is a supervised classification problem. Hence, any linear/non-linear classification algorithm is suitable.
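For illustration only, a minimal sketch of option A in Python, assuming scikit-learn is available and using randomly generated stand-in features and labels in place of the real flagged-image data set:

```python
# Sketch: a supervised linear classifier (option A) on stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))        # stand-in image feature vectors
y = rng.integers(0, 2, size=1000)      # 1 = flagged violent, 0 = not violent

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)   # linear classifier
print("test accuracy:", clf.score(X_test, y_test))
```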
Week 2
1. Find the equation of the tangent plane to the surface f(x, y, z) = x^2 + y^2 + z^2 at the point (2, 2, −1).
Answer: 4x + 4y − 2z − 9
Solution:
f(x, y, z) = x^2 + y^2 + z^2
∂f/∂x = 2x,  ∂f/∂y = 2y,  ∂f/∂z = 2z
Equation of the tangent at (2, 2, −1):
f(2, 2, −1) + 2(2)(x − 2) + 2(2)(y − 2) + 2(−1)(z + 1)
= 9 + 4(x − 2) + 4(y − 2) − 2(z + 1)
= 4x + 4y − 2z − 9
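A short verification sketch, assuming sympy is available, that reproduces the linearization above:

```python
# Sketch: first-order (tangent) approximation of f at (2, 2, -1) with sympy.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2
p = {x: 2, y: 2, z: -1}

grad = [sp.diff(f, v) for v in (x, y, z)]   # (2x, 2y, 2z)
tangent = f.subs(p) + sum(g.subs(p) * (v - p[v]) for g, v in zip(grad, (x, y, z)))
print(sp.expand(tangent))                   # 4*x + 4*y - 2*z - 9
```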
Week 3
(Common data Q1, Q2) Consider two lines as follows:
Line A: a line with slope 0.5 and y-intercept 0.5.
Line B: a line with slope 0.5 and y-intercept 1.
These two lines are used to fit the set of data points shown in the figure (not reproduced here); the (x, y) values appear in the solution table below.
1. (2 points) Which of these lines gives the minimal mean sum of squared residuals to fit
the data points?
A. Line A
B. Line B
Answer: A
2. (1 point) Enter the minimum value of the mean sum of squared residuals.
Answer: 4.29, Range: 3.75, 4.75
Solution:
x    y    ŷA     (y − ŷA)^2    ŷB    (y − ŷB)^2
2    2    3.5    2.25          3     1
3    3    4      1             4     1
4    2    4.5    6.25          5     9
5    4    5      1             6     4
6    3    5.5    6.25          7     16
7    3    6      9             8     25
Mean sum of squared residuals for Line A = (1/6)(2.25 + 1 + 6.25 + 1 + 6.25 + 9) = 4.2917
Mean sum of squared residuals for Line B = (1/6)(1 + 1 + 9 + 4 + 16 + 25) = 9.333
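A quick numerical check of both values, assuming numpy and taking the observed y and fitted ŷ values directly from the table above:

```python
# Sketch: mean sum of squared residuals for the two lines,
# using the y and fitted values from the solution table.
import numpy as np

y      = np.array([2, 3, 2, 4, 3, 3])
yhat_A = np.array([3.5, 4, 4.5, 5, 5.5, 6])
yhat_B = np.array([3, 4, 5, 6, 7, 8])

print("Line A:", np.mean((y - yhat_A) ** 2))   # ~4.2917
print("Line B:", np.mean((y - yhat_B) ** 2))   # ~9.3333
```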
Week 4
1. (3 points) The number of linearly independent eigenvectors of the matrix
A =
[ 2  1  0 ]
[ 0  2  0 ]
[ 0  0  3 ]
is:
Answer: 2
Solution:
The eigenvalue 2 is repeated (algebraic multiplicity 2), but A − 2I has rank 2, so it contributes only one linearly independent eigenvector. Together with the eigenvector for the eigenvalue 3, A therefore has only 2 linearly independent eigenvectors.
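A sketch of the same count, assuming sympy is available (eigenvects() returns each eigenvalue together with a basis of its eigenvectors):

```python
# Sketch: count the linearly independent eigenvectors of A with sympy.
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])

# eigenvects() yields (eigenvalue, algebraic multiplicity, eigenvector basis)
total = sum(len(basis) for _, _, basis in A.eigenvects())
print(total)   # 2
```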
Week 5
1. (4 points) Consider the complex matrix
A =
[ 1        1 + i    2i ]
[ 1 − i    5        −3 ]
[ −2i      −3       0  ]
Then the eigenvalues of AA* are?
A. λ1 = 4.9, λ2 = 7.8, λ3 = 43.8
B. λ1 = −2.8 + i, λ2 = 2.2 − 2i, λ3 = 6.5 + 0i
C. λ1 = i, λ2 = −2i, λ3 = 0
D. λ1 = −2.8, λ2 = 2.8 − 2i, λ3 = −2.8i
Answer: A
Solution:
The given matrix is Hermitian, and the product AA* is also Hermitian (in fact positive semidefinite). All eigenvalues of a Hermitian matrix are real, so option A, the only option in which every eigenvalue is real, is correct.
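A numerical cross-check, assuming numpy: form AA* and confirm that its eigenvalues are real and non-negative:

```python
# Sketch: eigenvalues of A A* for the given Hermitian matrix.
import numpy as np

A = np.array([[1,      1 + 1j,  2j],
              [1 - 1j, 5,      -3],
              [-2j,   -3,       0]])

M = A @ A.conj().T                 # A A* (Hermitian, positive semidefinite)
print(np.linalg.eigvalsh(M))       # real eigenvalues, roughly [4.9, 7.8, 43.8]
```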
Week 6
Answer: B
Solution:
The Hessian matrix is
H =
[ fxx  fxy ]
[ fxy  fyy ]
and at the given point
H =
[ 4  2 ]
[ 2  2 ]
Since D = fxx fyy − fxy^2 = 8 − 4 = 4 > 0 and fxx = 4 > 0, the given point is a minimum.
2. (1 point) (Multiple select) Which of the following statements is/are true about the matrix A given below?
A =
[ 5  0  0 ]
[ 0  7  0 ]
[ 0  0  9 ]
A. A is positive definite
B. A is positive semidefinite
C. A is negative definite
D. A is negative semidefinite
E. A is indefinite
Answer: A, B
Since A is a symmetric matrix and all of its eigenvalues (5, 7, 9) are positive, it is positive definite; every positive definite matrix is also positive semidefinite.
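A sketch of the eigenvalue test for definiteness, assuming numpy; the same check applies to the 2x2 Hessian in the previous question:

```python
# Sketch: classify the definiteness of A from the signs of its eigenvalues.
import numpy as np

A = np.diag([5.0, 7.0, 9.0])
eig = np.linalg.eigvalsh(A)          # A is symmetric, so eigenvalues are real

print("positive definite:    ", bool(np.all(eig > 0)))    # True
print("positive semidefinite:", bool(np.all(eig >= 0)))   # True
```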
Week 7
1. (5 points) Consider f(x) = (x^2 cos(x) − x)/10. Assume that the initial point x0 = 6 and the step size η = 0.25. What would be the values of the next two points x1 and x2 according to the gradient descent algorithm?
x1 : 5.4855 (approximately)
x2 : 4.7805 (approximately)
Solution:
f'(x) = (2x cos(x) − x^2 sin(x) − 1)/10
x1 = x0 − η f'(x0) = 6 − 0.25 × 2.0581 = 5.4855
x2 = x1 − η f'(x1) = 5.4855 − 0.25 × 2.8199 = 4.7805
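A minimal numerical sketch of these two updates, assuming numpy and writing the derivative out by hand:

```python
# Sketch: two gradient-descent steps on f(x) = (x^2 cos(x) - x) / 10.
import numpy as np

def grad_f(x):
    # f'(x) = (2x cos x - x^2 sin x - 1) / 10
    return (2 * x * np.cos(x) - x**2 * np.sin(x) - 1) / 10

eta, x = 0.25, 6.0
for k in range(1, 3):
    x = x - eta * grad_f(x)
    print(f"x{k} = {x:.4f}")   # x1 = 5.4855, x2 = 4.7805
```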
Week 8
1. (1 point) What is the boundary value of y so that the function f(x, y) = (x − 7)^2 + (y + 9)^2 remains convex?
A. y ≥ 1
B. y ≥ 2
C. For any value of y ∈ R the function will remain convex.
D. None of these
Answer: C
f(x, y) = (x − 7)^2 + (y + 9)^2
First-order partial derivatives: fx = 2(x − 7), fy = 2(y + 9)
Second-order partial derivatives: fxx = 2, fxy = 0, fyy = 2
For any value of x or y, fxx = 2 ≥ 0.
The Hessian matrix is
H =
[ 2  0 ]
[ 0  2 ]
with determinant D = fxx fyy − fxy^2 = 4.
D is non-negative (and fxx > 0) for any x or y, hence the function remains convex for all values of x, y ∈ R.
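A sketch of the Hessian check, assuming sympy is available:

```python
# Sketch: Hessian of f(x, y) = (x - 7)^2 + (y + 9)^2 and its definiteness.
import sympy as sp

x, y = sp.symbols('x y')
f = (x - 7)**2 + (y + 9)**2

H = sp.hessian(f, (x, y))
print(H)                         # Matrix([[2, 0], [0, 2]]), independent of x and y
print(H.is_positive_definite)    # True, so f is convex for every (x, y)
```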
Week 9
(Common data Q1, Q2, Q3) Solve the following optimization problem using KKT conditions:
minimize f(x1, x2) = (x1 − 4)^2 + (x2 − 4)^2
subject to
2x1 + 3x2 ≥ 6
12 − 3x1 − 2x2 ≥ 0
x1, x2 ≥ 0.
1. (1 point) What is the minimum value of the function obtained at the optimal solution?
Answer: 4.926, Range: 4.5,5.4
Solution:
Rewriting the constraints in the form gj ≤ 0: g1 = 6 − 2x1 − 3x2 ≤ 0 and g2 = −12 + 3x1 + 2x2 ≤ 0.
The KKT conditions for the problem with multiple variables are given as follows:
(i) Stationarity condition
∂f/∂xi + Σ_{j=1,2} uj ∂gj/∂xi = 0,  i = 1, 2
Therefore, we get
2x1 − 2u1 + 3u2 − 8 = 0 (1)
2x2 − 3u1 + 2u2 − 8 = 0 (2)
(ii) Complementary slackness condition
ui gi = 0, i = 1, 2
Therefore, we get
u1 (6 − 2x1 − 3x2 ) = 0 (3)
u2 (−12 + 3x1 + 2x2 ) = 0 (4)
(iii) Primal feasibility condition
gi ≤ 0, i = 1, 2:
6 − 2x1 − 3x2 ≤ 0 (5)
−12 + 3x1 + 2x2 ≤ 0 (6)
(iv) Dual feasibility condition: u1, u2 ≥ 0.
Taking u1 = 0 and the second constraint active (3x1 + 2x2 = 12), equations (1), (2) and (4) give u2 = 16/13, x1 = 28/13 ≈ 2.154, x2 = 36/13 ≈ 2.769, which satisfies all of the conditions above. The minimum value is therefore f(28/13, 36/13) = (24/13)^2 + (16/13)^2 = 832/169 ≈ 4.923.
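A numerical cross-check of this solution, assuming scipy is available; SLSQP handles the inequality constraints and bounds directly:

```python
# Sketch: solve the constrained problem numerically to cross-check the KKT answer.
from scipy.optimize import minimize

f = lambda v: (v[0] - 4)**2 + (v[1] - 4)**2
cons = [{'type': 'ineq', 'fun': lambda v: 2*v[0] + 3*v[1] - 6},    # 2x1 + 3x2 >= 6
        {'type': 'ineq', 'fun': lambda v: 12 - 3*v[0] - 2*v[1]}]   # 12 - 3x1 - 2x2 >= 0

res = minimize(f, x0=[1.0, 1.0], method='SLSQP',
               constraints=cons, bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # roughly x = (2.154, 2.769), f = 4.923
```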
Week 10
1. A discrete random variable X has the probability function
P(X = x) = k × (1 − x)^2 for x = 1, 2, 3, and P(X = x) = 0 otherwise.
Evaluate E(X)
Answer: 2.8
Solution:
Σ P(X = x) = 1
k(1 − 1)^2 + k(1 − 2)^2 + k(1 − 3)^2 = 0 + k + 4k = 1
k = 0.2
E(X) = Σ xi P(X = xi) = 1 × 0 + 2 × 0.2 + 3 × 0.8 = 2.8
A discrete random variable X has the following probability distribution:
x     :  1    2     3     4     5      6       7
P(X)  :  k    2k    2k    3k    k^2    2k^2    7k^2 + k
Find P(X ≥ 6).
Answer: 0.19
Solution:
Σ P(X = x) = 9k + 10k^2 = 1, which gives k = 0.1.
P(X ≥ 6) = P(X = 6) + P(X = 7) = 2k^2 + 7k^2 + k = 9k^2 + k = 0.09 + 0.1 = 0.19
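A sketch verifying both parts symbolically, assuming sympy is available:

```python
# Sketch: solve for k, then compute E(X) for part 1 and P(X >= 6) for part 2.
import sympy as sp

k = sp.Symbol('k', positive=True)

# Part 1: P(X = x) = k (1 - x)^2 for x = 1, 2, 3
p1 = {x: k * (1 - x)**2 for x in (1, 2, 3)}
k1 = sp.solve(sp.Eq(sum(p1.values()), 1), k)[0]
print(k1, sum(x * p for x, p in p1.items()).subs(k, k1))   # 1/5, E(X) = 14/5

# Part 2: P(X) = k, 2k, 2k, 3k, k^2, 2k^2, 7k^2 + k for x = 1..7
p2 = [k, 2*k, 2*k, 3*k, k**2, 2*k**2, 7*k**2 + k]
k2 = sp.solve(sp.Eq(sum(p2), 1), k)[0]
print(k2, sum(p2[5:]).subs(k, k2))                         # 1/10, P(X >= 6) = 19/100
```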
Week 11
Answer: 0.9375
Solution:
Using the definition of a pdf: ∫_0^3 f(x) dx = 1.
Now, P(X ≤ 2.5) = ∫_0^2.5 f(x) dx.
(Multiple select) Which of the following is/are a valid probability mass function?
A. P(X = x) = (4/5) × (1/5^x), x ∈ {0, 1, 2, 3, · · · }
B. P(X = x) = 2x / (n(n + 1)), x ∈ {1, 2, 3, · · · n}
C. P(X = x) = 6x^2 / (n(n + 1)(2n + 1)), x ∈ {1, 2, 3, · · · n}
D. P(X = x) = x / (2n(n + 1)), x ∈ {1, 2, 3, · · · n}
Answer: A, B, C
Solution:
For options A, B and C the probabilities are non-negative and sum to 1 over the support, i.e. Σ P(X = x) = 1. For option D the probabilities sum to 1/4, so it is not a valid pmf.
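A sketch that sums each candidate pmf over its support, assuming sympy is available:

```python
# Sketch: check which candidate pmfs sum to 1 over their support.
import sympy as sp

x = sp.Symbol('x', integer=True)
n = sp.Symbol('n', positive=True, integer=True)

candidates = {
    'A': sp.Rational(4, 5) * sp.Rational(1, 5)**x,   # support {0, 1, 2, ...}
    'B': 2*x / (n*(n + 1)),                          # support {1, ..., n}
    'C': 6*x**2 / (n*(n + 1)*(2*n + 1)),             # support {1, ..., n}
    'D': x / (2*n*(n + 1)),                          # support {1, ..., n}
}
limits = {'A': (x, 0, sp.oo), 'B': (x, 1, n), 'C': (x, 1, n), 'D': (x, 1, n)}

for name, pmf in candidates.items():
    print(name, sp.simplify(sp.summation(pmf, limits[name])))   # A, B, C -> 1; D -> 1/4
```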
Week 12
1. Let X be a random variable with a binomial(n, p) distribution. Using Chebyshev's inequality, find an upper bound on P(X ≥ αn), where p < α < 1. Evaluate the upper bound for p = 1/2, α = 3/4 and n = 8.
Answer: 0.5
Solution:
P(X ≥ αn) = P(X − np ≥ αn − np)
≤ P(|X − np| ≥ nα − np)
≤ Var(X) / (nα − np)^2
= np(1 − p) / (n(α − p))^2
With p = 1/2 and α = 3/4, the upper bound is 4/n. For n = 8, this gives 4/8 = 0.5.
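A sketch comparing the Chebyshev bound with the exact binomial tail, assuming scipy is available:

```python
# Sketch: Chebyshev bound 4/n vs. the exact tail probability P(X >= alpha*n).
from scipy.stats import binom

n, p, alpha = 8, 0.5, 0.75
bound = p * (1 - p) / (n * (alpha - p)**2)   # = 4/n = 0.5
exact = binom.sf(alpha * n - 1, n, p)        # P(X >= 6) = P(X > 5)
print(bound, exact)                          # 0.5 vs roughly 0.1445
```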
2. (1 point) Find the maximum likelihood estimate of the parameter θ of a population having density function
f(x; θ) = (2/θ^2) × (θ − x), 0 < x < θ,
for a sample of unit size, x being the sample value.
A. θ = 2x
B. θ = 4x
C. θ = 3x
D. θ = x
Answer: A
Solution:
L(θ) = (2/θ^2) × (θ − x)
Setting the derivative of log L(θ) with respect to θ to zero:
−2/θ + 1/(θ − x) = 0
⇒ θ = 2(θ − x)
⇒ θ = 2x
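A symbolic check of the stationary point of the log-likelihood, assuming sympy is available:

```python
# Sketch: maximize the single-sample likelihood L(theta) = (2/theta^2)(theta - x).
import sympy as sp

theta, x = sp.symbols('theta x', positive=True)
logL = sp.log(2) - 2*sp.log(theta) + sp.log(theta - x)

print(sp.solve(sp.Eq(sp.diff(logL, theta), 0), theta))   # [2*x]
```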