
Assignment 5 (Sol.)
Introduction to Machine Learning
Prof. B. Ravindran
1. You are given the following neural network, which takes two binary-valued inputs x1, x2 ∈ {0, 1}; the activation function is the threshold function (h(x) = 1 if x > 0; 0 otherwise). Which of the following logical functions does it compute?

Figure 1: Q1

(a) OR
(b) AND
(c) NAND
(d) None of the above.
Solution: B
You can construct the truth table, see the values, and decide which gate the network mimics:

x1   x2   output
0    0    0
0    1    0
1    0    0
1    1    1
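
The network's weights are only given in Figure 1, which is not reproduced in this text; the sketch below uses a hypothetical single threshold unit with weights (1, 1) and bias −1.5, one standard choice that reproduces the AND truth table above.

def h(z):
    # Threshold activation: h(z) = 1 if z > 0, 0 otherwise.
    return 1 if z > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        # Hypothetical weights (1, 1) and bias -1.5; only the input (1, 1) fires.
        print(x1, x2, h(1 * x1 + 1 * x2 - 1.5))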

2. We have a function which takes a two-dimensional input x = (x1, x2) and has two parameters w = (w1, w2), given by f(x, w) = σ(σ(x1 w1)w2 + x2), where σ(x) = 1/(1 + e^(−x)). We use backpropagation to estimate the right parameter values. We start by setting both the parameters to 0. Assume that we are given a training point x1 = 1, x2 = 0, y = 5. Given this information, answer the next two questions. What is the value of ∂f/∂w2?
(a) 0.5
(b) -0.25
(c) 0.125
(d) -0.5
Solution: C
Write σ(x1 w1)w2 + x2 as o2 and x1 w1 as o1. Then

∂f/∂w2 = (∂f/∂o2) × (∂o2/∂w2)

∂f/∂w2 = σ(o2)(1 − σ(o2)) × σ(o1)

∂f/∂w2 = 0.5 × 0.5 × 0.5 = 0.125
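
As a sanity check, the analytic gradient above can be compared against a central finite difference; this is a minimal sketch, with the step size 1e-6 an arbitrary choice.

import math

def sigma(z):
    # Logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-z))

def f(w1, w2, x1=1.0, x2=0.0):
    # f(x, w) = sigma(sigma(x1*w1)*w2 + x2) at the given training point.
    return sigma(sigma(x1 * w1) * w2 + x2)

w1, w2 = 0.0, 0.0
o1 = 1.0 * w1                      # x1 * w1
o2 = sigma(o1) * w2 + 0.0          # sigma(o1) * w2 + x2
analytic = sigma(o2) * (1 - sigma(o2)) * sigma(o1)      # 0.5 * 0.5 * 0.5
numeric = (f(w1, w2 + 1e-6) - f(w1, w2 - 1e-6)) / 2e-6  # central difference
print(analytic, numeric)           # both come out to about 0.125, option (c)
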
3. If the learning rate is 0.5, what will be the value of w2 after one update using the backpropagation algorithm?
(a) 0.0625
(b) -0.0625
(c) 0.5625
(d) -0.5625
Solution: C
The update equation would be

w2 = w2 − λ (∂L/∂w2)

where L is the loss function, here L = (y − f)².

w2 = w2 − λ × 2(y − f) × (−1) × (∂f/∂w2)

Putting in the given values (f = 0.5 and ∂f/∂w2 = 0.125 from Question 2):

w2 = 0 − 0.5 × 2 × (5 − 0.5) × (−1) × 0.125 = 0.5625
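
The same update can be spelled out programmatically; this is a minimal sketch under the setup of Question 2 (the variable names below are illustrative, not from the original).

import math

def sigma(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2, y, lr = 1.0, 0.0, 5.0, 0.5
w1, w2 = 0.0, 0.0

o1 = x1 * w1
o2 = sigma(o1) * w2 + x2
f = sigma(o2)                                      # 0.5
df_dw2 = sigma(o2) * (1 - sigma(o2)) * sigma(o1)   # 0.125, from Question 2
dL_dw2 = 2 * (y - f) * (-1) * df_dw2               # chain rule through L = (y - f)^2
w2 = w2 - lr * dL_dw2
print(w2)                                          # 0.5625, option (c)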


4. Given N samples x1, x2, . . . , xN drawn independently from a Gaussian distribution with variance σ² and unknown mean µ, find the MLE of the mean.

(a) µ_MLE = (Σ_{i=1}^{N} x_i) / σ²
(b) µ_MLE = (Σ_{i=1}^{N} x_i) / (2σ²N)
(c) µ_MLE = (Σ_{i=1}^{N} x_i) / N
(d) µ_MLE = (Σ_{i=1}^{N} x_i) / (N − 1)
Solution: C
We write the log likelihood as

L = Σ_i log( (1/(σ√2π)) exp(−(x_i − µ)²/(2σ²)) )

L = K − Σ_i (x_i − µ)²/(2σ²)

Now we maximize L by setting ∂L/∂µ to 0, which gives us option C as the solution.
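
A quick numerical illustration of the result: with many i.i.d. Gaussian samples, the sample mean recovers the true mean. The distribution parameters and sample size below are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma = 3.0, 2.0
x = rng.normal(mu_true, sigma, size=10_000)   # N i.i.d. Gaussian samples

mu_mle = x.sum() / len(x)    # option (c): (sum of x_i) / N
print(mu_mle)                # close to mu_true = 3.0
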
5. Continuing with the above question, assume that the prior distribution of the mean is also a Gaussian distribution, but with parameters mean µ_p and variance σ_p². Find the MAP estimate of the mean.

(a) µ_MAP = (σ²µ_p + σ_p² Σ_{i=1}^{N} x_i) / (σ² + N σ_p²)
(b) µ_MAP = (σ² + σ_p² Σ_{i=1}^{N} x_i) / (σ² + σ_p²)
(c) µ_MAP = (σ² + σ_p² Σ_{i=1}^{N} x_i) / (σ² + N σ_p²)
(d) µ_MAP = (σ²µ_p + σ_p² Σ_{i=1}^{N} x_i) / (N(σ² + σ_p²))

Solution: C
For the MAP estimate, we try to maximize f(µ)f(X|µ):

f(µ)f(X|µ) = (1/(σ_p√2π)) exp(−(µ − µ_p)²/(2σ_p²)) × Π_i (1/(σ√2π)) exp(−(x_i − µ)²/(2σ²))

We maximize this with respect to µ after taking the logarithm. This yields the equation

(Σ_i x_i)/σ² + µ_p/σ_p² − µ(N/σ² + 1/σ_p²) = 0

Thus the solution will be C.
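
A small numerical sanity check of this derivation: solve the stationarity equation for µ and compare against a brute-force maximization of the log posterior. The parameter values below are arbitrary illustrative choices, not part of the question.

import numpy as np

rng = np.random.default_rng(1)
sigma, sigma_p, mu_p, N = 2.0, 1.5, 1.0, 50
x = rng.normal(3.0, sigma, size=N)

# mu obtained by solving the stationarity equation above.
mu_hat = (x.sum() / sigma**2 + mu_p / sigma_p**2) / (N / sigma**2 + 1 / sigma_p**2)

# Brute-force check: the log posterior log f(mu) + log f(X|mu) peaks at the same point.
grid = np.linspace(-5.0, 10.0, 20_001)
log_post = (-(grid - mu_p)**2 / (2 * sigma_p**2)
            - ((x[:, None] - grid[None, :])**2).sum(axis=0) / (2 * sigma**2))
print(mu_hat, grid[log_post.argmax()])   # the two values agree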

6. Which among the following statements is (are) true?

(a) MAP estimates suffer more from overfitting than maximum likelihood estimates.
(b) MAP estimates are equivalent to the ML estimates when the prior used in the MAP is a
uniform prior over the parameter space.
(c) One drawback of maximum likelihood estimation is that in some scenarios (hint: multinomial distribution), it may return probability estimates of zero.
(d) The parameter that minimizes the expected Bayesian L1 loss is the median of the posterior distribution.

Solution: B, C, D
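
For statement (d), a quick empirical illustration: over samples from a (here arbitrarily chosen, skewed) posterior, the point estimate that minimizes the mean absolute error is essentially the sample median.

import numpy as np

rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.5, size=100_000)   # stand-in posterior samples

# Expected L1 loss E|theta - a| as a function of the point estimate a.
grid = np.linspace(samples.min(), samples.max(), 501)
l1_loss = np.array([np.abs(samples - a).mean() for a in grid])

print(grid[l1_loss.argmin()], np.median(samples))   # the two values are close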

7. Using the notation used in class and the tutorial document, evaluate the output of the neural network with a 3-3-1 architecture (2-dimensional input with 1 node for the bias term in both layers). The parameters are as follows:

α = [ 1    0.2   0.4
     −1    0.3   0.5 ]

β = [ 0.3   0.4   0.5 ]

Using the sigmoid function as the activation function at both layers, the output of the network for an input of (0.8, 0.7) will be
(a) 0.6710
(b) 0.6617
(c) 0.6948
(d) 0.3369
Solution: C
This is a straightforward computation. First pad x with 1 and make it the vector X:

X = (1, 0.8, 0.7)ᵀ

The output of the first layer can be written as

o1 = αX

Next apply the sigmoid function and compute

a1(i) = 1/(1 + exp(−o1(i)))

Then pad the a1 vector with 1 for the bias and compute the output of the second layer:

o2 = βa1
a2 = 1/(1 + exp(−o2))
a2 = 0.6948
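
The same computation in a few lines of code (the variable names below are chosen for readability):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = np.array([[ 1.0, 0.2, 0.4],
                  [-1.0, 0.3, 0.5]])
beta = np.array([0.3, 0.4, 0.5])

X = np.array([1.0, 0.8, 0.7])        # input padded with 1 for the bias term
a1 = sigmoid(alpha @ X)              # hidden-layer activations
a1 = np.concatenate(([1.0], a1))     # pad the hidden layer with 1 for the bias term
a2 = sigmoid(beta @ a1)              # network output
print(a2)                            # approximately 0.6948, option (c)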

8. Which of the following statements is/are true about Neural Networks?

(a) Neural Networks can model arbitrarily complex decision boundaries.
(b) Neural Networks can be used to emulate a Gaussian kernel SVM.
(c) Training of a neural network is very sensitive to the initial weights.
(d) Ideal initialization for weights would be setting all of them to zero.

Solution: A, B, C
A - Neural networks are also called universal approximators because of their ability to learn complex functions by varying the number of layers and nodes.
B - The decision of any SVM is given by ŷ = sgn(Σ_i α_i K(x_i, x) + b), where the x_i represent the support vectors and K is the Gaussian kernel. This can be implemented using an RBF neural network. The first layer would be the input layer. The second layer would be the radial basis nodes, with as many nodes as support vectors in the SVM, and there would be a single node in the final layer. The centers of the Gaussian basis functions would be the support vectors of the SVM, and their widths would be the same as that of the kernel. The weights connecting the hidden layer to the last layer would be given by the α_i and a bias b. The activation function for the last layer would be the sgn function; a minimal sketch of this construction follows the solution.
C - This is true because bad initializations might hinder the learning of the neural network; for example, if all weights are set to zero, the network might not be able to learn anything because of zero gradients.
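
A minimal sketch of the RBF-network construction described in B. The arguments (support_vectors, alphas, b, gamma) are placeholders for quantities read off a trained Gaussian-kernel SVM; they are not given in the question.

import numpy as np

def rbf_network_decision(x, support_vectors, alphas, b, gamma):
    # Hidden layer: one Gaussian basis node per support vector, centered on it,
    # with the same width (gamma) as the SVM kernel.
    k = np.exp(-gamma * np.sum((support_vectors - x) ** 2, axis=1))
    # Output node: weights alpha_i, bias b, sgn activation.
    return np.sign(alphas @ k + b)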
