CST414-SCHEME
PART A
Answer all questions, each carries 3 marks. Marks
1 (3)
Train with more data, Data augmentation, Addition of noise to the input data, Feature
selection, Cross-validation, Regularization, Ensembling, Early stopping, Adding dropout
layers
Any 3 methods- 3 marks
2 X = X1*W1 + X2*W2 + X3*W3 + b (3)
Given b = 0, assume initial weights.
Output, Y = σ(X)
i.e. σ(x) = 1/(1 + e^(−x))
Equation- 1 mark
Solution-2 marks
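A minimal sketch of the expected computation; the question's specific inputs and weights are not reproduced here, so the values below are hypothetical:
import math

x = [0.8, 0.6, 0.4]    # hypothetical inputs X1, X2, X3
w = [0.1, 0.3, -0.2]   # hypothetical initial weights W1, W2, W3
b = 0.0                # bias given as 0
net = sum(xi * wi for xi, wi in zip(x, w)) + b   # X = X1*W1 + X2*W2 + X3*W3 + b
y = 1.0 / (1.0 + math.exp(-net))                 # Y = sigmoid(X)
print(net, y)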
3 a) Dataset augmentation – 3 marks (3)
4 L2 Regularization is a commonly used technique in ML systems and is also sometimes (3)
referred to as "Weight Decay". It works by adding a quadratic term to the Cross
Entropy Loss Function L, called the Regularization Term, which results in a new
Loss Function LR given by Eqn 1 below.
The Regularization Term consists of the sum of the squares of all the link weights
in the DLN, multiplied by a parameter λ called the Regularization Parameter. This
is another Hyperparameter whose appropriate value needs to be chosen as part of
the training process by using the validation data set. By choosing a value for this
parameter, we decide on the relative importance of the Regularization Term vs the
Loss Function term. Note that the Regularization Term does not include the biases,
since in practice it has been found that their inclusion does not make much of a
difference to the final result. The value of λ governs the relative importance of
the Cross Entropy term (L) vs the Regularization Term, and as λ increases, the
system tends to favour smaller and smaller weight values.
LR = L + λ Σ w²   (sum over all link weights w)   ……………Eqn 1
Explanation-3 marks
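A minimal NumPy sketch of Eqn 1; the weight matrices, λ value and helper name are hypothetical:
import numpy as np

def l2_regularized_loss(cross_entropy_loss, weight_matrices, lam):
    # LR = L + lambda * (sum of squares of all link weights); biases are excluded
    penalty = sum(np.sum(W ** 2) for W in weight_matrices)
    return cross_entropy_loss + lam * penalty

# hypothetical usage: two weight matrices of a small network
W1 = np.random.randn(4, 3)
W2 = np.random.randn(3, 2)
print(l2_regularized_loss(0.42, [W1, W2], lam=0.01))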
5 Because consecutive layers are only partially connected and because it heavily (3)
reuses its weights, a CNN has many fewer parameters than a fully connected DNN,
which makes it much faster to train, reduces the risk of overfitting, and requires
much less training data.
Any two advantages- 3 marks
6 Output: [(n + 2p − f)/s + 1] × [(n + 2p − f)/s + 1] × K (3)
i.e. 62 × 62 × 2.
Equation-1 mark
Solution-2 marks
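A small sketch of the output-size formula; the question's n, f, p, s are not reproduced here, so the values below are hypothetical ones consistent with the stated answer 62 × 62 × 2 (e.g. n = 64, f = 3, p = 0, s = 1, K = 2):
def conv_output_side(n, f, p, s):
    # spatial output dimension: (n + 2p - f)/s + 1
    return (n + 2 * p - f) // s + 1

n, f, p, s, K = 64, 3, 0, 1, 2      # hypothetical values
side = conv_output_side(n, f, p, s)
print(side, side, K)                # 62 62 2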
7 (3)
The basic formula of an RNN is shown below:
h(t) = f(h(t−1), x(t); θ)
It basically says the current hidden state h(t) is a function f of the previous hidden
state h(t−1) and the current input x(t). θ denotes the parameters of the function f.
The network typically learns to use h(t) as a kind of lossy summary of the task-relevant
aspects of the past sequence of inputs up to t. Unfolding maps the left-hand graph to the
right-hand one in the figure below, where the black square indicates that an interaction
takes place with a delay of 1 time step, from the state at time t to the state at time
t + 1. Unfolding/parameter sharing is better than using different parameters per
position: fewer parameters to estimate, and generalization to sequences of various lengths.
Diagram- 2 marks
Explanation -1 mark
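A minimal NumPy sketch of the recurrence h(t) = f(h(t−1), x(t); θ), taking f to be a tanh of an affine map and using hypothetical sizes:
import numpy as np

def rnn_step(h_prev, x_t, W_hh, W_xh, b):
    # h(t) = f(h(t-1), x(t); theta): the same parameters theta = (W_hh, W_xh, b)
    # are reused at every time step (parameter sharing)
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

rng = np.random.default_rng(0)
W_hh, W_xh, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
h = np.zeros(4)                          # initial hidden state
for x_t in rng.normal(size=(5, 3)):      # a length-5 input sequence
    h = rnn_step(h, x_t, W_hh, W_xh, b)  # h is a lossy summary of the inputs so far
print(h)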
8 A recursive network has a computational graph that generalizes that of the recurrent (3)
network from a chain to a tree.
Diagram- 2 marks
Explanation -1 mark
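A minimal NumPy sketch of the idea (hypothetical tree and sizes): the same composition function, with shared parameters, is applied at every internal node of a tree instead of along a chain:
import numpy as np

def compose(left, right, W, b):
    # shared parameters (W, b) applied at every internal node of the tree
    return np.tanh(W @ np.concatenate([left, right]) + b)

rng = np.random.default_rng(1)
d = 4
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
x1, x2, x3 = rng.normal(size=(3, d))              # hypothetical leaf representations
root = compose(compose(x1, x2, W, b), x3, W, b)   # tree ((x1, x2), x3)
print(root)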
9 Representation learning- 3 marks (3)
10 Importance of deep learning in natural language processing- 3 marks (3)
PART B
Answer one full question from each module, each carries 14 marks.
Module I
11 a) (10)
b) (4)
(2 marks)
Bipolar sigmoid, Y = (1 − e^(−x))/(1 + e^(−x)) = 0.697/1.303 = 0.5349 (2 marks)
b) Importance of step size (learning rate) in neural networks (6)
• It determines the subset of the local optima to which the algorithm can converge
(see the sketch below).
• Large step sizes can cause the algorithm to overstep local minima.
6 marks
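An illustrative sketch (hypothetical one-dimensional objective f(x) = x²) of how the step size affects convergence:
def gradient_descent(lr, steps=20, x=5.0):
    # minimize f(x) = x^2, whose gradient is 2x
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(gradient_descent(lr=0.1))   # small step size: converges towards the minimum at 0
print(gradient_descent(lr=1.1))   # too large: overshoots and diverges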
Module II
13 a) Stochastic gradient descent (SGD), in contrast, performs a parameter update (8)
for each training example x(i) and label y(i):
θ = θ − η ∇θ J(θ; x(i), y(i))
Momentum is a method that helps accelerate SGD in the relevant direction and
dampens oscillation. It does this by adding a fraction γ of the update vector of
the past time step to the current update vector:
v(t) = γ v(t−1) + η ∇θ J(θ)
θ = θ − v(t)
Essentially, when using momentum, we push a ball down a hill. The ball
accumulates momentum as it rolls downhill, becoming faster and faster on the
way (until it reaches its terminal velocity if there is air resistance, i.e. γ<1). The
same thing happens to our parameter updates: The momentum term increases
for dimensions whose gradients point in the same directions and reduces
updates for dimensions whose gradients change directions. As a result, we gain
faster convergence and reduced oscillation.
Equations- 4 marks
Explanation- 4 marks
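A minimal NumPy sketch of the two update rules above, on a hypothetical quadratic objective J(θ) = ½‖θ‖² whose gradient is θ:
import numpy as np

def sgd_momentum(grad, theta, steps=50, lr=0.1, gamma=0.9):
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = gamma * v + lr * grad(theta)   # v(t) = gamma*v(t-1) + eta*grad J(theta)
        theta = theta - v                  # theta = theta - v(t)
    return theta

theta0 = np.array([4.0, -3.0])
print(sgd_momentum(lambda t: t, theta0))   # moves towards the minimum at the origin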
b) Early Stopping is one of the most popular, and also effective, techniques to prevent (6)
overfitting. Use the validation data set to compute the loss function at the end of
each training epoch, and once the loss stops decreasing, stop the training and use
the test data to compute the final classification accuracy. In practice it is more
robust to wait until the validation loss has stopped decreasing for four or five
successive epochs before stopping. The point at which the validation loss starts to
increase is when the model starts to overfit the training data, since from this point
onwards its generalization ability starts to decrease. Early Stopping can be used by
itself or in combination with other Regularization techniques.
Explanation- 6 marks
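A schematic sketch of the loop described above; train_one_epoch and validation_loss are hypothetical callables standing in for a real training setup:
def train_with_early_stopping(train_one_epoch, validation_loss, patience=5, max_epochs=100):
    best_loss, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()                    # one pass over the training data
        loss = validation_loss()             # loss on the validation set
        if loss < best_loss:
            best_loss, epochs_without_improvement = loss, 0
        else:
            epochs_without_improvement += 1  # validation loss did not improve
        if epochs_without_improvement >= patience:
            break                            # stop: the model is starting to overfit
    return best_loss

# dummy usage: validation losses that improve and then worsen
losses = iter([1.0, 0.8, 0.7, 0.72, 0.74, 0.75, 0.9, 1.0, 1.1, 1.2])
print(train_with_early_stopping(lambda: None, lambda: next(losses), patience=3, max_epochs=10))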
OR
14 a) Explanation- 7 marks (7)
b) • Can handle sparse gradients on noisy datasets. (7)
• Default hyperparameter values do well on most problems.
• Computationally efficient.
• Requires little memory, thus memory efficient.
• Works well on large datasets.
Advantages- 7 marks
Module III
15 a) (10)
Diagram-5 marks
Explanation- 5 marks
b) Convolutional networks can be used to output a high-dimensional structured object, (4)
rather than just predicting a class label for a classification task or a real value for
regression tasks. E.g., the model might emit a tensor S where S(i, j, k) is the probability
that pixel (j, k) of the input belongs to class i.
Explanation- 4 marks
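An illustrative NumPy sketch (hypothetical shapes) of such a structured output tensor S, where S[i, j, k] is the probability that pixel (j, k) belongs to class i:
import numpy as np

rng = np.random.default_rng(0)
num_classes, H, W = 3, 4, 4
scores = rng.normal(size=(num_classes, H, W))                    # raw per-pixel class scores
S = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)   # softmax over the class axis
print(S.sum(axis=0))                                             # each pixel's probabilities sum to 1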
OR
16 a) Dilated convolution (8)
Transposed Convolution
Separable convolution
Variants- 8 marks
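A brief PyTorch sketch of the three variants named above (assuming torch is installed; channel sizes are hypothetical, and depthwise separable convolution is shown as a common form of separable convolution):
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)                                       # hypothetical input

dilated = nn.Conv2d(8, 16, kernel_size=3, dilation=2, padding=2)    # dilated (atrous) convolution
transposed = nn.ConvTranspose2d(8, 16, kernel_size=2, stride=2)     # transposed convolution (learned upsampling)
separable = nn.Sequential(
    nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8),            # depthwise: one filter per input channel
    nn.Conv2d(8, 16, kernel_size=1),                                 # pointwise: 1x1 convolution mixes channels
)

print(dilated(x).shape, transposed(x).shape, separable(x).shape)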
b) Sparse representation (6)
Equivariance to translation
Parameter sharing
Explanation- 2 marks each
Module IV
17 a) (8)
Diagram- 4 marks
Explanation- 4 marks
b) (6)
Diagram- 4 marks
Explanation- 2 marks
OR
18 a) (9)
Diagram- 5 marks
Explanation- 4 marks
b) Explanation with necessary equations-5 marks (5)
Module V
19 a) Any 2 methods (Word2Vec, GloVe)- 5 marks each (10)
b) Application of deep learning in Speech Recognition- 4 marks (4)
OR
20 a) Merits- 3.5 marks (7)
Demerits- 3.5 marks
b) Boltzmann Machine – 3.5 marks (7)
Deep Belief Network – 3.5 marks
****