Deep Learning - week 4


NPTEL » Deep Learning - IIT Ropar (course)

Week 4 : Assignment 4

The due date for submitting this assignment has passed.
Due on 2024-08-21, 23:59 IST.
Assignment submitted on 2024-08-21, 16:46 IST

1) A team has a data set that contains 1000 samples for training a feed-forward neural network. Suppose they decided to use the stochastic gradient descent algorithm to update the weights. How many times do the weights get updated after training the network for 5 epochs? (1 point)

1000
5000
100
5

Yes, the answer is correct.
Score: 1
Accepted Answers:
5000
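
A quick check of the counting: with stochastic gradient descent (batch size 1), every sample triggers one weight update, so 1000 samples × 5 epochs gives 5000 updates. A minimal sketch:

```python
# Minimal sketch: with stochastic gradient descent (batch size 1), the weights
# are updated once per sample, so updates = samples_per_epoch * epochs.
samples_per_epoch = 1000
epochs = 5
updates = samples_per_epoch * epochs   # one update per sample per epoch
print(updates)                         # 5000
```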

2) What are the benefits of using stochastic gradient descent compared to vanilla gradient descent? (1 point)

SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.

Yes, the answer is correct.
Score: 1
Accepted Answers:
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.


3) A team has a data set that contains 100 samples for training a feed-forward neural network. Suppose they decided to use the gradient descent algorithm to update the weights. Suppose further that they use a line search algorithm for the learning rate as follows: η = [0.01, 0.1, 1, 2, 10]. How many times do the weights get updated after training the network for 10 epochs? (Note: for each weight update the loss has to decrease.) (1 point)

100
5
500
10
50

No, the answer is incorrect.
Score: 0
Accepted Answers:
10
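
A minimal sketch of the reasoning (the toy 1-D least-squares data is an assumption): line search evaluates the loss for every candidate η but commits to a single weight update per step, and full-batch gradient descent takes one step per epoch, so 10 epochs yield 10 updates no matter how many η values are tried.

```python
# Minimal sketch (toy 1-D least-squares problem, assumed data) of line search
# over a fixed set of learning rates: each step tries every candidate η, keeps
# the one that decreases the loss the most, and performs a single weight
# update. With full-batch gradient descent that is one update per epoch.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = 2.0 * X + rng.normal(scale=0.1, size=100)
etas = [0.01, 0.1, 1, 2, 10]

def loss(w):
    return np.mean((w * X - Y) ** 2)

w, updates = 0.0, 0
for epoch in range(10):
    g = np.mean(2 * (w * X - Y) * X)             # full-batch gradient
    candidates = [w - eta * g for eta in etas]   # try every learning rate
    best = min(candidates, key=loss)
    if loss(best) < loss(w):                     # update only if loss decreases
        w = best
        updates += 1

print(updates, round(w, 3))   # 10 updates; w moves toward the least-squares slope ≈ 2
```
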
4) The figure below shows the change in loss value over iterations. (1 point)

[Figure: loss value versus iterations, showing an oscillating loss curve - not reproduced]

The oscillation in the loss value might be due to

Mini-batch gradient descent algorithm used for parameter updates
Batch gradient descent with constant learning rate algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
Batch gradient descent with line search algorithm used for parameter updates

No, the answer is incorrect.
Score: 0
Accepted Answers:
Mini-batch gradient descent algorithm used for parameter updates
Stochastic gradient descent algorithm used for parameter updates
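
The original figure is not reproduced, but the behaviour it describes is easy to recreate. The following sketch (toy 1-D least-squares data, an assumption) compares the loss trace of batch gradient descent, which decreases smoothly, with that of per-sample SGD, whose noisy gradients make the loss oscillate.

```python
# Minimal sketch (toy 1-D least-squares problem, assumed data) contrasting the
# loss trace of batch gradient descent (smooth) with stochastic gradient
# descent (oscillating), the behaviour the question alludes to.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)
Y = 3.0 * X + rng.normal(scale=0.5, size=200)   # true slope = 3

def loss(w):
    return np.mean((w * X - Y) ** 2)

eta = 0.05
w_batch, w_sgd = 0.0, 0.0
batch_trace, sgd_trace = [], []

for epoch in range(5):
    # Batch GD: one update per epoch using the gradient over the full dataset
    g = np.mean(2 * (w_batch * X - Y) * X)
    w_batch -= eta * g
    batch_trace.append(loss(w_batch))

    # SGD: one update per sample; each noisy gradient can increase the loss,
    # which shows up as oscillation in the loss curve
    for x, y in zip(X, Y):
        w_sgd -= eta * 2 * (w_sgd * x - y) * x
        sgd_trace.append(loss(w_sgd))

print("batch loss trace:", [round(v, 3) for v in batch_trace])
print("first few SGD losses:", [round(v, 3) for v in sgd_trace[:10]])
```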

5) What is the advantage of using mini-batch gradient descent over batch gradient descent? (1 point)

Mini-batch gradient descent is more computationally efficient than batch gradient descent.
Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
Mini-batch gradient descent gives us a better solution.
Mini-batch gradient descent can converge faster than batch gradient descent.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Mini-batch gradient descent is more computationally efficient than batch gradient descent.
Mini-batch gradient descent can converge faster than batch gradient descent.
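
As an illustration of the accepted answers, here is a minimal mini-batch gradient descent loop (the toy data and batch size of 32 are assumptions): each update uses only a small batch, so it is cheap relative to a full pass over the data, and the parameters start improving long before an epoch finishes.

```python
# Minimal sketch (assumed toy setup): mini-batch gradient descent updates the
# weights after every small batch, so updates are cheap and frequent compared
# with batch gradient descent's one update per full pass.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))
Y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=1000)

w = 0.0
eta = 0.1
batch_size = 32

for epoch in range(3):
    perm = rng.permutation(len(X))               # shuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx, 0], Y[idx]
        grad = np.mean(2 * (w * xb - yb) * xb)   # gradient over one mini-batch
        w -= eta * grad                          # one cheap update per batch

print("estimated slope:", round(w, 3))           # approaches the true slope 2.0
```
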
6) Which of the following represents the contour plot of the function f(x, y) = x² − y²? (1 point)

[The answer options are contour-plot images - not reproduced]

Yes, the answer is correct.
Score: 1
Accepted Answers:
[Image option - not reproduced]
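
The option images are not reproduced, but the correct contour plot is easy to generate. This sketch (using matplotlib) draws the level sets of f(x, y) = x² − y²: hyperbolas opening along the x-axis for positive levels and along the y-axis for negative levels, with the straight lines y = ±x at level 0, reflecting the saddle point at the origin.

```python
# Minimal sketch: contour plot of f(x, y) = x^2 - y^2. The level sets
# x^2 - y^2 = c are hyperbolas, and the zero level set is the pair of lines
# y = x and y = -x, the signature of a saddle point at the origin.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 200)
y = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(x, y)
Z = X**2 - Y**2

cs = plt.contour(X, Y, Z, levels=15)
plt.clabel(cs, inline=True, fontsize=8)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Contours of f(x, y) = x^2 - y^2")
plt.gca().set_aspect("equal")
plt.show()
```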

7) Consider a gradient profile ∇W = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]. Assume v₋₁ = 0, ϵ = 0, β = 0.9, and the learning rate η = 0.1. Suppose that we use the Adagrad algorithm; what is the value of η₆ = η/√(v₆ + ϵ)? (1 point)

0.03
0.06
0.08
0.006

Yes, the answer is correct.


Score: 1
Accepted Answers:
0.06
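
A quick numerical check of the accepted answer, under the usual Adagrad accumulation rule v_t = v_{t-1} + (∇w_t)² with 0-based indexing of the gradient profile (β is not used by Adagrad, so it plays no role here):

```python
# Minimal sketch verifying the accepted answer: Adagrad accumulates squared
# gradients, v_t = v_{t-1} + g_t**2, and scales the step as η_t = η/sqrt(v_t + ϵ).
# With 0-based indexing, v_6 sums the squares of the first seven gradients.
import math

grads = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]
eta, eps = 0.1, 0.0

v = 0.0
for t, g in enumerate(grads):
    v += g ** 2                                       # accumulate squared gradient
    if t == 6:
        print(round(eta / math.sqrt(v + eps), 4))     # ≈ 0.0636 → 0.06
```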

8) Which of the following can help avoid getting stuck in a poor local minimum while training a deep neural network? (1 point)

Using a smaller learning rate.


Using a smaller batch size.
Using a shallow neural network instead.
None of the above.

No, the answer is incorrect.


Score: 0
Accepted Answers:
None of the above.

9) What are the two main components of the ADAM optimizer? (1 point)

Momentum and learning rate.


Gradient magnitude and previous gradient.
Exponential weighted moving average and gradient variance.
Learning rate and a regularization term.

Yes, the answer is correct.


Score: 1
Accepted Answers:
Exponential weighted moving average and gradient variance.
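
As a reference for the accepted answer, here is a minimal sketch of an Adam update for a single scalar parameter (the scalar setup and default hyperparameters are assumptions): it maintains an exponentially weighted moving average of the gradient (first moment) and of the squared gradient (second moment, the "gradient variance" in the option), both bias-corrected as discussed in the Bias Correction in Adam lecture.

```python
# Minimal sketch of Adam's two running statistics per parameter: an EWMA of
# the gradient (m) and an EWMA of the squared gradient (v), bias-corrected
# before each update.
import math

def adam_step(w, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad           # EWMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2      # EWMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage on f(w) = w^2, whose gradient is 2w
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * w, m, v, t, eta=0.05)
print(round(w, 4))   # approaches the minimum at 0
```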

10) What is the role of activation functions in deep learning? (1 point)

Activation functions transform the output of a neuron into a non-linear function, allowing
the network to learn complex patterns.
Activation functions make the network faster by reducing the number of iterations needed
for training.
Activation functions are used to normalize the input data.
Activation functions are used to compute the loss function.

Yes, the answer is correct.


Score: 1
Accepted Answers:
Activation functions transform the output of a neuron into a non-linear function, allowing the
network to learn complex patterns.
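
A small sketch of why the accepted answer centres on non-linearity: without an activation function, stacking linear layers collapses into a single linear map, whereas inserting a non-linearity such as ReLU (an assumed choice here) breaks that collapse and lets the network represent non-linear patterns.

```python
# Minimal sketch: two linear layers without an activation are equivalent to
# one linear layer; adding a non-linearity (ReLU) changes the function class.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 features
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 2))

linear_stack = x @ W1 @ W2           # same as a single linear layer x @ (W1 @ W2)

relu = lambda z: np.maximum(z, 0.0)  # non-linear activation
nonlinear_net = relu(x @ W1) @ W2    # no single weight matrix reproduces this map in general

print(np.allclose(linear_stack, x @ (W1 @ W2)))   # True: two linear layers collapse to one
```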

