Deep Learning
Assignment- Week 9
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10; Total marks: 10 × 1 = 10
______________________________________________________________________________
QUESTION 1:
What can be a possible consequence of choosing a very small learning rate?
a. Slow convergence
b. Overshooting minima
c. Oscillations around the minima
d. All of the above
Correct Answer: a
Detailed Solution:
A very small learning rate leads to slow convergence; overshooting and oscillations around the minima are consequences of a learning rate that is too large. Thus option (a) is correct.
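A minimal sketch in plain Python (the function $f(x) = x^2$, the learning rates, and the step count are illustrative assumptions, not from the question) showing how a very small learning rate leaves the iterate far from the minimum:

    # Gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
    def gradient_descent(lr, steps=100, x=5.0):
        for _ in range(steps):
            x = x - lr * 2 * x  # update: x <- x - lr * f'(x)
        return x

    print(gradient_descent(lr=0.001))  # very small lr: ~4.09, still far from 0
    print(gradient_descent(lr=0.1))    # moderate lr: ~1e-9, essentially converged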
______________________________________________________________________________
QUESTION 2:
The following is the equation of the update vector for the momentum optimizer:
$v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta)$
Which of the following is true for $\gamma$?
Correct Answer: a
Detailed Solution:
A fraction of the update vector of the past time step is added to the current update vector. $\gamma$ is that fraction; it indicates how much acceleration you want, and its value lies between 0 and 1.
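A minimal sketch of this update in plain Python (the quadratic objective and the values of $\gamma$ and $\eta$ are illustrative assumptions):

    # Momentum: v_t = gamma * v_{t-1} + eta * grad, then theta <- theta - v_t.
    gamma, eta = 0.9, 0.1   # gamma is the fraction in [0, 1)
    theta, v = 5.0, 0.0
    for _ in range(200):
        grad = 2 * theta            # gradient of f(theta) = theta^2
        v = gamma * v + eta * grad  # add a fraction of the past update vector
        theta = theta - v
    print(theta)                    # close to the minimum at 0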
______________________________________________________________________________
QUESTION 3:
Which of the following is true about momentum optimizer?
Correct Answer: d
Detailed Solution:
Options (a), (b), and (c) are all true for the momentum optimizer. Thus, option (d) is correct.
______________________________________________________________________________
QUESTION 4:
Let $J(\theta)$ be the cost function, and let the gradient descent update rule for $\theta$ be $\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$. Which of the following is true for $\eta$?
a.
b.
c.
d.
Correct Answer: a
Detailed Solution:
The gradient descent update rule for $\theta$ is
$\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$, where $\eta$ is the learning rate.
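As a one-step worked example with illustrative numbers: take $J(\theta) = \theta^2$, $\theta = 3$, and $\eta = 0.1$. Then $\nabla_\theta J(\theta) = 2\theta = 6$, and the update gives $\theta \leftarrow 3 - 0.1 \times 6 = 2.4$, a step downhill toward the minimum at $\theta = 0$.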
______________________________________________________________________________
QUESTION 5:
What is the parameter update rule for gradient descent optimization at step $t+1$? Consider $\eta$ to be the learning rate.
a.
b.
c.
d.
Correct Answer: a
Detailed Solution:
______________________________________________________________________________
QUESTION 6:
If the first few iterations of gradient descent cause the function $f(\theta_0, \theta_1)$ to increase rather than decrease, then what could be the most likely cause?
Correct Answer: a
Detailed Solution:
If the learning rate were small enough, gradient descent would successfully take a tiny downhill step and decrease $f(\theta_0, \theta_1)$ at least a little. If gradient descent instead increases the objective value, the learning rate is too high.
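A minimal sketch of this failure mode in plain Python (the function and the oversized learning rate are illustrative assumptions):

    # Gradient descent on f(x) = x^2 with too large a learning rate.
    x, lr = 1.0, 1.5
    for step in range(5):
        x = x - lr * 2 * x     # x is multiplied by (1 - 2*lr) = -2 each step
        print(step, x, x * x)  # f(x) increases: 4, 16, 64, 256, 1024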
______________________________________________________________________________
QUESTION 7:
For a function $f(\theta_0, \theta_1)$ in which $\theta_0$ and $\theta_1$ are initialized at a global minimum, what should $\theta_0$ and $\theta_1$ be after a single iteration of gradient descent?
Correct Answer: b
Detailed Solution:
At a minimum (global or local), the derivative (gradient) is zero, so a gradient descent step leaves the parameters unchanged.
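A minimal sketch in plain Python (the function and learning rate are illustrative assumptions) showing that a step taken from the minimum changes nothing:

    # At the minimum of f(x) = x^2 the gradient 2x is 0, so the update is a no-op.
    x, lr = 0.0, 0.1
    x_new = x - lr * 2 * x
    print(x_new == x)  # True: the parameters stay at the minimum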
______________________________________________________________________________
QUESTION 8:
What can be one of the practical problems of exploding gradient?
a. Too large update of weight values leading to unstable network
b. Too small update of weight values inhibiting the network to learn
c. Too large update of weight values leading to faster convergence
d. Too small update of weight values leading to slower convergence
Correct Answer: a
Detailed Solution:
Exploding gradients are a problem where large error gradients accumulate and result in very large updates to the neural network's weights during training. This makes the model unstable and unable to learn from the training data.
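A minimal sketch of how such gradients accumulate, in plain Python (the scalar "network" and the per-layer scale factor are illustrative assumptions):

    # If back-propagation multiplies the gradient by a factor > 1 at each of
    # many layers, the gradient grows exponentially.
    grad = 1.0
    for layer in range(50):
        grad *= 1.5  # each layer scales the gradient by 1.5
    print(grad)      # ~6.4e8: the next weight update would be enormous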
______________________________________________________________________________
QUESTION 9:
What are the steps for using a gradient descent algorithm?
1. Calculate error between the actual value and the predicted value
2. Update the weights and biases using gradient descent formula
3. Pass an input through the network and get values from output layer
4. Initialize weights and biases of the network with random values
5. Calculate gradient value corresponding to each weight and bias
a. 1, 2, 3, 4, 5
b. 5, 4, 3, 2, 1
c. 3, 2, 1, 5, 4
d. 4, 3, 1, 5, 2
Correct Answer: d
Detailed Solution:
Initialize the weights and biases randomly, pass input instances through the network, compute the error at the output layer, and back-propagate the error through each preceding layer. Then update the weights and biases using the learning rate and the gradient of the error. Please refer to the lectures of week 4.
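A minimal sketch of the loop in that order, for a single linear neuron with squared-error loss (the tiny dataset, learning rate, and epoch count are illustrative assumptions):

    import random

    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data for y = 2x
    lr = 0.05

    # Step 4: initialize weight and bias with random values.
    w, b = random.uniform(-1, 1), random.uniform(-1, 1)

    for epoch in range(200):
        for x, y in data:
            y_hat = w * x + b   # Step 3: pass input through, get output
            error = y_hat - y   # Step 1: error vs. the actual value
            grad_w = error * x  # Step 5: gradient w.r.t. the weight
            grad_b = error      # Step 5: gradient w.r.t. the bias
            w -= lr * grad_w    # Step 2: gradient descent update
            b -= lr * grad_b

    print(w, b)  # w approaches 2 and b approaches 0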
______________________________________________________________________________
QUESTION 10:
You run gradient descent for 15 iterations with learning rate $\eta$ and compute the error after each iteration. You find that the value of the error decreases very slowly. Based on this, which of the following conclusions seems most plausible?
Correct Answer: a
Detailed Solution:
The error is decreasing very slowly; therefore, increasing the learning rate is the most plausible remedy.
______________________________________________________________________________
************END*******