mock end term solution

The document outlines a final exam for a Machine Learning course, covering various topics over 12 weeks, including supervised learning, optimization problems, eigenvalues, and probability functions. Each week presents specific questions with answers and solutions, illustrating concepts such as classification algorithms, tangent planes, and KKT conditions. The exam assesses understanding of both theoretical and practical aspects of machine learning and related mathematical principles.


Course: Machine Learning - Foundations

Final Exam
Weeks 1 to 12

Week 1

1. A social networking service company needs to develop a system that detects violent
images. The technical team proposed a machine learning model to achieve the task. Then
they created a data set containing millions of images, about 55% of which were flagged
as violent by users. Assuming that all users genuinely flagged violent images as violent,
which of the following machine learning algorithms is (are) suitable to address this
problem?
A. Supervised Linear/non-linear classification algorithms
B. Supervised Linear/non-linear regression algorithms
C. Unsupervised Density estimation algorithms
D. None of the above

Answer: A
Solution:
Since the system needs to detect whether an image is violent or not, the set of possible
outputs is finite (2 in this case). Therefore this is a supervised classification problem,
and any linear/non-linear classification algorithm is suitable.

Week 2

1. Find the equation of the tangent plane to the surface f(x, y, z) = x² + y² + z² at the point
(2, 2, −1).
Answer: 4x + 4y − 2z − 9 = 0
Solution:
f(x, y, z) = x² + y² + z²
∂f/∂x = 2x, ∂f/∂y = 2y, ∂f/∂z = 2z
Equation of the tangent at (2, 2, −1):
f(x, y, z) ≈ f(2, 2, −1) + 2(2)(x − 2) + 2(2)(y − 2) + 2(−1)(z + 1)
= 9 + 4(x − 2) + 4(y − 2) − 2(z + 1)
= 4x + 4y − 2z − 9
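The linearization above can be checked symbolically; a short sketch, assuming SymPy is available:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2

point = {x: 2, y: 2, z: -1}
grad = [sp.diff(f, v) for v in (x, y, z)]  # [2x, 2y, 2z]
# First-order (tangent-plane) approximation of f at the point.
L = f.subs(point) + sum(g.subs(point) * (v - point[v])
                        for g, v in zip(grad, (x, y, z)))
print(sp.expand(L))  # 4*x + 4*y - 2*z - 9
```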

Week 3
(Common data Q1, Q2) Consider two lines as follows:
Line A: A line with slope of 0.5 and y-intercept of 0.5.
Line B: A line with slope of 0.5 and y-intercept of 1
These two lines are used to fit a set of data points given in the below figure.

1. (2 points) Which of these lines gives the minimal mean sum of squared residuals to fit
the data points?
A. Line A
B. Line B
Answer: A
2. (1 point) Enter the minimum value of the mean sum of squared residuals.
Answer: 4.29, Range: 3.75, 4.75
Solution:
x y ŷA = 0.5x + 0.5 d2A = (y − ŷA )2 ŷB = 0.5x + 1 d2B = (y − ŷB )2
2 2 3.5 2.25 3 1
3 3 4 1 4 1
4 2 4.5 6.25 3 9
5 4 5 1 6 4
6 3 5.5 6.25 7 16
7 3 6 9 8 25

Mean sum of squared residuals for Line A = (1/6)(2.25 + 1 + 6.25 + 1 + 6.25 + 9) = 4.2917
Mean sum of squared residuals for Line B = (1/6)(1 + 1 + 9 + 4 + 16 + 25) = 9.333

Therefore, Line A gives the minimal mean sum of squared residuals.
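The two means can be recomputed directly from the squared residuals tabulated above:

```python
# Squared residuals for each data point, as tabulated above.
d2_A = [2.25, 1, 6.25, 1, 6.25, 9]
d2_B = [1, 1, 9, 4, 16, 25]

mse_A = sum(d2_A) / len(d2_A)
mse_B = sum(d2_B) / len(d2_B)
print(round(mse_A, 4), round(mse_B, 3))  # 4.2917 9.333
```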



Week 4
 
1. (3 points) The number of linearly independent eigenvectors of the matrix

A = [ 2 1 0
      0 2 0
      0 0 3 ]

is ______.
Answer: 2

Solution:
The eigenvalues are 2, 2, 3. The eigenvalue 2 is repeated, but A − 2I has rank 2, so the
eigenvalue 2 contributes only one independent eigenvector. Together with the eigenvector
for the eigenvalue 3, there are 2 linearly independent eigenvectors.
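As a quick numerical sanity check, assuming NumPy is available, the count equals the rank of the matrix whose columns are the computed eigenvectors:

```python
import numpy as np

# The matrix from the question.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)
# Columns of `eigvecs` are the computed eigenvectors; the number of
# linearly independent ones is the rank of this matrix.
n_independent = np.linalg.matrix_rank(eigvecs)
print(n_independent)  # 2
```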

Week 5
 
1. (4 points) Consider the complex matrix

A = [ 1       1 + i   2i
      1 − i   5       −3
      −2i     −3      0 ]

Then the eigenvalues of AA∗ are?
A. λ1 = 4.9, λ2 = 7.8, λ3 = 43.8
B. λ1 = −2.8 + i, λ2 = 2.2 − 2i, λ3 = 6.5 + 0i
C. λ1 = i, λ2 = −2i, λ3 = 0
D. λ1 = −2.8, λ2 = 2.8 − 2i, λ3 = −2.8i

Answer: A
Solution:
The given matrix is Hermitian, and the product AA∗ is always Hermitian (and positive
semidefinite). All eigenvalues of a Hermitian matrix are real, so option A, the only
option whose eigenvalues are all real, is correct.
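A numerical check that the eigenvalues of AA∗ are indeed real and non-negative (assuming NumPy is available; the printed values can be compared against option A):

```python
import numpy as np

# The matrix from the question.
A = np.array([[1, 1 + 1j, 2j],
              [1 - 1j, 5, -3],
              [-2j, -3, 0]])

AAH = A @ A.conj().T             # A A*: Hermitian and positive semidefinite
evals = np.linalg.eigvalsh(AAH)  # guaranteed-real eigenvalues, ascending
print(np.round(evals, 1))
print(evals.sum())               # equals trace(A A*) = sum of |a_ij|^2 = 56
```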

Week 6

1. (1 point) Given f(x, y) = 2x² + 2xy + y², the point (0, 0) is a ______.


A. maxima
B. minima
C. saddle point
D. None of these

Answer: B
 
The Hessian matrix is H = [fxx fxy; fxy fyy]. Here

H = [ 4 2
      2 2 ]

Since D = fxx fyy − fxy² = 8 − 4 = 4 > 0
and fxx = 4 > 0,
the given point is a minimum.
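Equivalently, both eigenvalues of the Hessian are positive; a quick check assuming NumPy is available:

```python
import numpy as np

# Hessian of f(x, y) = 2x^2 + 2xy + y^2: fxx = 4, fxy = 2, fyy = 2 (constant everywhere).
H = np.array([[4.0, 2.0],
              [2.0, 2.0]])

evals = np.linalg.eigvalsh(H)
print(np.round(evals, 3))  # both positive, so H is positive definite and (0, 0) is a minimum
```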

2. (1 point) (Multiple
 select) Which of the following statements is/are true about the ma-
5 0 0
trix A = 0 7 0?
0 0 9
A. A is positive definite
B. A is positive semidefinite
C. A is negative definite
D. A is negative semidefinite
E. A is indefinite

Answer: A, B
A is a symmetric matrix and all its eigenvalues (5, 7, 9) are positive, so it is positive
definite; every positive definite matrix is also positive semidefinite.

Week 7
1. (5 points) Consider f(x) = (x² cos(x) − x)/10. Assume that the initial point x0 = 6 and
the step size η = 0.25. What would be the values of the next two points x1 and x2
according to the gradient descent algorithm?
x1 :
x2 :

Answer: 5.4859, 4.775


Solution: f′(x) = (1/10)[−x² sin(x) + 2x cos(x) − 1]

x1 = x0 − ηf′(x0) = 5.4859
x2 = x1 − ηf′(x1) = 4.775
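The two updates can be reproduced with a few lines of Python (standard library only):

```python
import math

def grad_f(x):
    # f(x) = (x^2 cos x - x) / 10  =>  f'(x) = (-x^2 sin x + 2x cos x - 1) / 10
    return (-x**2 * math.sin(x) + 2 * x * math.cos(x) - 1) / 10

eta = 0.25
x0 = 6.0
x1 = x0 - eta * grad_f(x0)  # gradient descent update x_{k+1} = x_k - eta * f'(x_k)
x2 = x1 - eta * grad_f(x1)
print(round(x1, 4), round(x2, 4))
```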

Week 8

1. (1 point) What is the boundary value of y so that the function f(x, y) = (x − 7)² + (y + 9)²
remains convex?
A. y ≥ 1
B. y ≥ 2
C. for any value of y ∈ R function will remain convex.
D. None of these

Answer: C
f(x, y) = (x − 7)² + (y + 9)²
First-order partial derivatives: fx = 2(x − 7), fy = 2(y + 9)
Second-order partial derivatives: fxx = 2, fxy = 0, fyy = 2
For any x or y value, fxx = 2 > 0. The Hessian matrix is

H = [ 2 0
      0 2 ]

with determinant D = fxx fyy − fxy² = 4. D is non-negative for any x or y value, hence the
function remains convex for all values of x or y ∈ R.
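The constant Hessian can be confirmed symbolically; a minimal sketch, assuming SymPy is available:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = (x - 7)**2 + (y + 9)**2

# The Hessian is constant, so positive definiteness holds at every (x, y).
H = sp.hessian(f, (x, y))
print(H)                       # Matrix([[2, 0], [0, 2]])
print(H.is_positive_definite)  # True
```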

Week 9
(Common data Q1, Q2, Q3) Solve the following optimization problem using KKT con-
ditions:
minimize f(x1, x2) = (x1 − 4)² + (x2 − 4)²
subject to
2x1 + 3x2 ≥ 6
12 − 3x1 − 2x2 ≥ 0
x1, x2 ≥ 0.
1. (1 point) What is the minimum value of the function obtained at the optimal solution?
Answer: 4.926, Range: 4.5,5.4

2. (2 points) Enter the value for x∗1 .


Answer: 2.153, Range:1.9, 2.5

3. (2 points) Enter the value for x∗2 .


Answer: 2.769, Range: 2.4, 3.2
Solution:
First let us convert the constraints to standard form:

−2x1 − 3x2 + 6 ≤ 0

−12 + 3x1 + 2x2 ≤ 0

The KKT conditions for the problem with multiple variables are given as follows:
(i) Stationarity condition

∂f/∂xi + Σⱼ uⱼ ∂gⱼ/∂xi = 0, i = 1, 2 (sum over j = 1, 2)

Therefore, we get
2x1 − 2u1 + 3u2 − 8 = 0 (1)
2x2 − 3u1 + 2u2 − 8 = 0 (2)
(ii) Complementary slackness condition

ui gi = 0, i = 1, 2

Therefore, we get
u1 (6 − 2x1 − 3x2 ) = 0 (3)
u2 (−12 + 3x1 + 2x2 ) = 0 (4)
(iii) Primal feasibility condition
gi ≤ 0
6 − 2x1 − 3x2 ≤ 0 (5)

−12 + 3x1 + 2x2 ≤ 0 (6)


(iv) Dual feasibility condition
ui ≥ 0

Case (i): From (3), Let u1 = 0


Substitute in (1):
2x1 + 3u2 − 8 = 0
x1 = (−3u2 + 8)/2 (7)
Substitute in (2):
2x2 + 2u2 − 8 = 0
x2 = −u2 + 4 (8)

Substitute (7) and (8) in (4):

u2(−12 + 3 × (1/2)(−3u2 + 8) + 2(−u2 + 4)) = 0

u2(−12 − (9/2)u2 + 12 − 2u2 + 8) = 0

u2(−(13/2)u2 + 8) = 0 (9)
Case (i)a: Let u2 = 0 from (9)
Substituting in (7) and (8), we get x1 = 4 and x2 = 4. This violates the primal feasibility
condition (6), since −12 + 3(4) + 2(4) = 8 > 0.
Case (i)b: Let −(13/2)u2 + 8 = 0 from (9)
We get
u2 = 16/13 (10)
Substitute (10) in (7) and (8):
x1 = (1/2)(−3 × 16/13 + 8) = 28/13 = 2.153 (11)
x2 = −16/13 + 4 = 36/13 = 2.769 (12)
This solution [x∗1 , x∗2 ] = [2.153, 2.769] satisfies all the KKT conditions and hence optimal.

Minimum value of the function obtained at the optimal solution


= (2.153 − 4)² + (2.769 − 4)² = 4.926
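Since the KKT analysis finds that only the second constraint (3x1 + 2x2 ≤ 12) is active, the optimum is the Euclidean projection of the unconstrained minimizer (4, 4) onto that line; a short Python sketch to confirm the numbers:

```python
# Project the unconstrained minimizer (4, 4) of f onto the active constraint
# 3*x1 + 2*x2 = 12 (the KKT analysis shows it is the only active one).
cx, cy = 4.0, 4.0          # unconstrained minimizer of f
a, b, c = 3.0, 2.0, 12.0   # active constraint a*x1 + b*x2 = c
t = (a * cx + b * cy - c) / (a * a + b * b)
x1, x2 = cx - t * a, cy - t * b
f_min = (x1 - 4) ** 2 + (x2 - 4) ** 2
print(round(x1, 3), round(x2, 3), round(f_min, 3))  # 2.154 2.769 4.923
```

The exact optimum is (28/13, 36/13) with value 832/169 ≈ 4.923, inside the accepted range.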

Week 10
1. A discrete random variable X has the probability function as follows.

P(X = x) = { k × (1 − x)², for x = 1, 2, 3
           { 0,            otherwise

Evaluate E(X)

Answer: 2.8
Solution:

Σ P(X = x) = 1

k(1 − 1)² + k(1 − 2)² + k(1 − 3)² = 0 + k + 4k = 5k = 1

k = 0.2

E(X) = Σ xi P(X = xi)

= 1 × 0 + 2 × 0.2 + 3 × 0.8

= 0.4 + 2.4 = 2.8
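The normalization and expectation can be reproduced in a few lines of Python:

```python
# pmf: P(X = x) = k * (1 - x)^2 for x in {1, 2, 3}
support = [1, 2, 3]
weights = [(1 - x) ** 2 for x in support]  # [0, 1, 4]
k = 1 / sum(weights)                       # normalization: 5k = 1 => k = 0.2

E = sum(x * k * w for x, w in zip(support, weights))
print(k, round(E, 4))  # 0.2 2.8
```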


2. A discrete random variable X has the probability function as given in the table

x      1   2    3    4    5    6     7
P(X)   k   2k   2k   3k   k²   2k²   7k² + k

Table 1: Probability distribution

Find P (X ≥ 6)

Answer: 0.19
P(X = 1) + P(X = 2) + · · · + P(X = 7) = 1

Solve the above equation to get the value of k:
9k + 10k² = 1 ⟹ (10k − 1)(k + 1) = 0 ⟹ k = 0.1 (taking the positive root)

P(X ≥ 6) = P(X = 6) + P(X = 7) = 2k² + (7k² + k) = 9k² + k = 9(0.1)² + 0.1 = 0.19
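With exact rational arithmetic the whole table can be checked at once:

```python
from fractions import Fraction as F

# Solve 9k + 10k^2 = 1 for the positive root:
# 10k^2 + 9k - 1 = (10k - 1)(k + 1), so k = 1/10.
k = F(1, 10)
pmf = [k, 2*k, 2*k, 3*k, k**2, 2*k**2, 7*k**2 + k]  # x = 1..7

assert sum(pmf) == 1          # sanity check: a valid pmf
p_ge_6 = pmf[5] + pmf[6]      # P(X = 6) + P(X = 7)
print(p_ge_6)  # 19/100
```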



Week 11

1. The probability density function of a random variable X is given by

f(x) = { ax,         if 0 ≤ x ≤ 1
       { a,          if 1 ≤ x ≤ 2
       { −ax + 3a,   if 2 ≤ x ≤ 3
       { 0,          otherwise

Find P(X ≤ 2.5).

Answer: 0.9375
Solution:
Using the definition of a pdf, the integral of f(x) over [0, 3] equals 1:
∫ ax dx over [0, 1] + ∫ a dx over [1, 2] + ∫ (−ax + 3a) dx over [2, 3] = a/2 + a + a/2 = 2a = 1,
which gives a = 0.5.

Now, P(X ≤ 2.5) = ∫ f(x) dx over [0, 2.5] = a/2 + a + ∫ (−ax + 3a) dx over [2, 2.5]
= 0.25 + 0.5 + 0.1875 = 0.9375
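Both integrals can be verified numerically with a simple midpoint rule (standard library only):

```python
def f(x, a=0.5):
    # piecewise density with the value a = 0.5 found above
    if 0 <= x <= 1:
        return a * x
    if 1 < x <= 2:
        return a
    if 2 < x <= 3:
        return -a * x + 3 * a
    return 0.0

def integrate(lo, hi, n=100_000):
    # simple midpoint rule
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

total = integrate(0, 3)    # should be ~1.0 (f is a valid pdf)
p = integrate(0, 2.5)      # should be ~0.9375
print(round(total, 4), round(p, 4))
```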

2. Which of the following is/are probability mass functions?

A. P(X = x) = (4/5) × (1/5)ˣ, x ∈ {0, 1, 2, 3, · · · }
B. P(X = x) = 2x / (n(n + 1)), x ∈ {1, 2, 3, · · · , n}
C. P(X = x) = 6x² / (n(n + 1)(2n + 1)), x ∈ {1, 2, 3, · · · , n}
D. P(X = x) = x / (2n(n + 1)), x ∈ {1, 2, 3, · · · , n}

Answer: A, B, C
For A, B and C the probabilities are non-negative and sum to 1: in A the geometric series
gives (4/5)/(1 − 1/5) = 1, while B and C use Σx = n(n + 1)/2 and Σx² = n(n + 1)(2n + 1)/6.
For D the probabilities sum to only 1/4, so it is not a pmf.
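The four sums can be checked with exact rational arithmetic (here for n = 10; any n works the same way for B, C, D):

```python
from fractions import Fraction as F

n = 10  # any positive integer works for options B, C, D

sum_B = sum(F(2 * x, n * (n + 1)) for x in range(1, n + 1))
sum_C = sum(F(6 * x * x, n * (n + 1) * (2 * n + 1)) for x in range(1, n + 1))
sum_D = sum(F(x, 2 * n * (n + 1)) for x in range(1, n + 1))
# Option A is an infinite geometric series: sum = (4/5) / (1 - 1/5) = 1 exactly.
sum_A = F(4, 5) / (1 - F(1, 5))

print(sum_A, sum_B, sum_C, sum_D)  # 1 1 1 1/4
```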

Week 12

1. Let X be a random variable of binomial(n, p). Using Chebyshev's inequality, find an upper
bound on P(X ≥ αn), where p < α < 1. Evaluate the upper bound for p = 1/2, α = 3/4,
and n = 8.

Answer: 0.5

Solution:

P(X ≥ αn) = P(X − np ≥ n(α − p))
≤ P(|X − np| ≥ n(α − p))
≤ Var(X) / (n(α − p))²
= np(1 − p) / (n(α − p))²
= p(1 − p) / (n(α − p)²)

Putting in the values of α and p gives the upper bound (1/4)/(n(1/4)²) = 4/n; for n = 8
this is 4/8 = 0.5.
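For comparison, the Chebyshev bound can be set against the exact binomial tail probability (standard library only):

```python
from math import comb

n, p, alpha = 8, 0.5, 0.75

# Chebyshev upper bound: p(1-p) / (n * (alpha - p)^2)
bound = p * (1 - p) / (n * (alpha - p) ** 2)

# Exact tail probability P(X >= alpha*n) = P(X >= 6) for X ~ Binomial(8, 1/2)
k0 = int(alpha * n)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

print(bound, round(exact, 4))  # 0.5 0.1445
```

As expected, the exact probability 37/256 ≈ 0.1445 lies well below the Chebyshev bound of 0.5.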
2. (1 point) Find the maximum likelihood estimate of the parameter θ of a population
having density function f(x) = (2/θ²) × (θ − x), 0 < x < θ, for a sample of unit size, x
being the sample value.
A. θ = 2x
B. θ = 4x
C. θ = 3x
D. θ = x

Answer: A
Solution:

L = (2/θ²) × (θ − x)

Taking log and then differentiating w.r.t. θ:

log L = log 2 − 2 log θ + log(θ − x)

d(log L)/dθ = −2/θ + 1/(θ − x) = 0

⟹ 2(θ − x) − θ = 0

⟹ θ = 2x
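The analytic result can be sanity-checked by a brute-force grid search over θ for a hypothetical sample value x = 1 (any 0 < x < θ behaves the same way):

```python
# Hypothetical single observation; the MLE should come out as theta = 2 * x_obs.
x_obs = 1.0

def likelihood(theta, x=x_obs):
    # density (2 / theta^2) * (theta - x), valid for theta > x
    return 2.0 / theta**2 * (theta - x)

# Grid search over theta > x_obs.
thetas = [x_obs + 0.001 * i for i in range(1, 10_000)]
theta_hat = max(thetas, key=likelihood)
print(round(theta_hat, 2))  # 2.0
```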
