Assignment_4_2022
Deep Learning
Assignment- Week 4
TYPE OF QUESTION: MCQ/MSQ
Number of questions: 10    Total marks: 10 × 1 = 10
______________________________________________________________________________
QUESTION 1:
A given cost function is of the form J(θ) = θ² − θ + 2. What is the weight update rule for gradient
descent optimization at step t+1? Consider α = 0.01 to be the learning rate.
a. θ_{t+1} = θ_t − 0.01(2θ_t − 1)
b. θ_{t+1} = θ_t + 0.01(2θ_t)
c. θ_{t+1} = θ_t − (2θ_t − 1)
d. θ_{t+1} = θ_t − 0.01(θ_t − 1)
Correct Answer: a
Detailed Solution:
∂J(θ)/∂θ = 2θ − 1
So, the weight update will be
θ_{t+1} = θ_t − 0.01(2θ_t − 1)
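The update rule can be checked numerically. A minimal sketch (not part of the original solution; the starting point is arbitrary) that iterates θ_{t+1} = θ_t − 0.01(2θ_t − 1) and approaches the minimizer θ = 0.5:

```python
# Gradient descent on J(theta) = theta^2 - theta + 2.
# dJ/dtheta = 2*theta - 1, learning rate alpha = 0.01.

def step(theta, alpha=0.01):
    """One update: theta_{t+1} = theta_t - alpha * (2*theta_t - 1)."""
    return theta - alpha * (2 * theta - 1)

theta = 5.0                      # arbitrary starting point
for _ in range(2000):
    theta = step(theta)

print(theta)  # converges toward the minimizer theta = 0.5
```

Note that the minimizer θ = 0.5 is exactly where the gradient 2θ − 1 vanishes, so it is a fixed point of the update.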
______________________________________________________________________________
QUESTION 2:
Can you identify in which of the following graphs gradient descent will not work correctly?
a. First figure
b. Second figure
c. First and second figure
NPTEL Online Certification Courses
Indian Institute of Technology Kharagpur
d. Fourth figure
Correct Answer: b
Detailed Solution:
This is a classic example of the saddle-point problem of gradient descent. In the second graph,
gradient descent may get stuck at the saddle point.
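The saddle-point issue can be illustrated with f(x, y) = x² − y² (an illustrative choice, not the function from the figure): at the origin the gradient vanishes, so plain gradient descent makes no progress, yet the point is not a minimum.

```python
# f(x, y) = x**2 - y**2 has a saddle at the origin:
# the gradient is zero there, but moving along y still decreases f.

def grad(x, y):
    """Gradient of f(x, y) = x^2 - y^2."""
    return (2 * x, -2 * y)

gx, gy = grad(0.0, 0.0)
print(gx, gy)              # zero gradient -> gradient descent is stuck
print(0.0**2 - 0.1**2)     # f(0, 0.1) < f(0, 0) = 0, so the origin is not a minimum
```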
______________________________________________________________________________
QUESTION 3:
From the following two figures, can you identify which one corresponds to batch gradient
descent and which one to stochastic gradient descent?
Correct Answer: a
Detailed Solution:
The cost-vs-epochs curve is quite smooth for batch gradient descent because each step
averages the gradients over all the training data. The average cost over the epochs in
stochastic gradient descent fluctuates because each step uses a single example at a
time.
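The smoothness difference can be seen in the gradients themselves. A small sketch (the 1-D least-squares data below is hypothetical, chosen only for illustration) compares the full-batch gradient, which is one fixed average, with the per-example gradients used by SGD, which scatter around it:

```python
# For J(theta) = mean over i of (theta - x_i)^2, the batch gradient is
# the average of the per-example gradients 2*(theta - x_i).

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # toy training data (assumed for illustration)
theta = 10.0

batch_grad = sum(2 * (theta - x) for x in xs) / len(xs)
sgd_grads = [2 * (theta - x) for x in xs]

print(batch_grad)   # one deterministic averaged direction
print(sgd_grads)    # varies from example to example -> noisy cost curve
```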
______________________________________________________________________________
QUESTION 4:
Suppose for a cost function J(θ) = 0.25θ² as shown in the graph below, at which point will the
magnitude of the weight update be larger? θ is plotted along the horizontal axis.
Correct Answer: a
Detailed Solution:
The weight update is directly proportional to the magnitude of the gradient of the cost
function. In our case, ∂J(θ)/∂θ = 0.5θ. So, the weight update will be larger for higher values of θ.
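To make the comparison concrete, the update magnitude |α · 0.5θ| can be evaluated at two points (the learning rate below is an arbitrary assumption, not from the question):

```python
# For J(theta) = 0.25 * theta**2, dJ/dtheta = 0.5 * theta,
# so the update magnitude grows linearly with |theta|.

def update_magnitude(theta, alpha=0.1):
    """Size of one gradient-descent step; alpha is an assumed learning rate."""
    return abs(alpha * 0.5 * theta)

print(update_magnitude(1.0))  # small step near the minimum
print(update_magnitude(8.0))  # larger step far from the minimum
```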
______________________________________________________________________________
QUESTION 5:
Which logic function can be performed using a 2-layered Neural Network?
a. AND
b. OR
c. XOR
d. All
Correct Answer: d
Detailed Solution:
A two-layer neural network can implement any type of logic gate (linear or non-linear).
____________________________________________________________________________
QUESTION 6:
Let X and Y be two features to discriminate between two classes. The values and class labels of
the features are given hereunder. The minimum number of neuron layers required to design
the neural network classifier is:
X    Y    Class
0    2    Class-II
1    2    Class-I
2    2    Class-I
1    3    Class-I
1   -3    Class-II
a. 1
b. 2
c. 4
d. 5
Correct Answer: a
Detailed Solution:
Plot the feature points. They are linearly separable. Hence a single layer is able to do the
classification task.
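The separability claim can be verified directly; for instance the line X + Y = 2.5 (one separator of many, chosen here by inspection) puts all Class-I points on one side:

```python
# Feature points from the table, with their class labels.
points = [
    ((0, 2), "II"),
    ((1, 2), "I"),
    ((2, 2), "I"),
    ((1, 3), "I"),
    ((1, -3), "II"),
]

def classify(x, y):
    """Single linear unit: Class I if x + y - 2.5 >= 0 (weights chosen by inspection)."""
    return "I" if x + y - 2.5 >= 0 else "II"

print(all(classify(x, y) == label for (x, y), label in points))  # True
```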
____________________________________________________________________________
QUESTION 7:
Which among the following options give the range for a logistic function?
a. -1 to 1
b. -1 to 0
c. 0 to 1
d. 0 to infinity
Correct Answer: c
Detailed Solution:
The logistic function σ(x) = 1/(1 + e^(−x)) tends to 0 as x → −∞ and to 1 as x → +∞, so its
range is (0, 1).
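The bounds of the logistic function can be checked numerically with a quick sketch:

```python
import math

def logistic(x):
    """Logistic (sigmoid) function: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Values stay strictly between 0 and 1 and approach the bounds at the extremes.
print(logistic(-20))  # close to 0
print(logistic(0))    # 0.5
print(logistic(20))   # close to 1
```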
______________________________________________________________________________
QUESTION 8:
The number of weights (including bias) to be learned by the neural network having 3 inputs and
2 classes and a hidden layer with 5 neurons is: (Assume we use 2 output nodes for 2 classes)
a. 12
b. 15
c. 25
d. 32
Correct Answer: d
Detailed Solution:
Each hidden neuron has 3 input weights plus a bias, and each output neuron has 5 input
weights plus a bias: (3 + 1) × 5 + (5 + 1) × 2 = 20 + 12 = 32.
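The count generalizes to fully connected layers of any sizes; a small sketch:

```python
def num_parameters(layer_sizes):
    """Weights plus biases of a fully connected net:
    sum over consecutive layers of (fan_in + 1) * fan_out."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

print(num_parameters([3, 5, 2]))  # (3+1)*5 + (5+1)*2 = 32
```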
______________________________________________________________________________
QUESTION 9:
For an XNOR function as given in the figure below, the activation function of each node is
given by f(x) = 1 if x ≥ 0, and 0 otherwise. Consider X1 = 1 and X2 = 0; what will be the output
of the above neural network?
a. 1.5
b. 2
c. 0
d. 1
Correct Answer: c
Detailed Solution:
XNOR outputs 1 only when both inputs are equal; since X1 = 1 and X2 = 0, the network
outputs 0.
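Since the figure's weights are not reproduced here, the sketch below uses one standard choice of weights that realizes XNOR with the given threshold activation (an assumption about the figure, though the truth table is the same either way):

```python
def f(x):
    """Threshold activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

def xnor(x1, x2):
    # Hidden layer: an AND unit and a NOR unit (weights chosen to realize XNOR).
    h_and = f(x1 + x2 - 1.5)      # fires only when both inputs are 1
    h_nor = f(-x1 - x2 + 0.5)     # fires only when both inputs are 0
    # Output layer: OR of the two hidden units.
    return f(h_and + h_nor - 0.5)

print(xnor(1, 0))  # 0
```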
____________________________________________________________________________
QUESTION 10:
Which activation function is more prone to vanishing gradient problem?
a. ReLU
b. Tanh
c. sigmoid
d. Threshold
Correct Answer: b
Detailed Solution:
************END*******