DL Question Paper Solved
A Feed Forward Neural Network (FFNN) is a type of artificial neural network in which information moves in one direction, from input to output, without any loops. It consists of layers of nodes, also called neurons.
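For illustration (not part of the original answer), a minimal NumPy sketch of a forward pass through such a network is given below; the layer sizes and the ReLU activation are arbitrary choices for the example:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def feed_forward(x, weights, biases):
    # Information flows strictly forward: each layer feeds the next, no loops.
    a = x
    for W, b in zip(weights, biases):
        a = relu(a @ W + b)   # linear transform followed by a non-linearity
    return a

# Example: 4 inputs -> 8 hidden neurons -> 3 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 3))]
biases = [np.zeros(8), np.zeros(3)]
print(feed_forward(rng.normal(size=(1, 4)), weights, biases))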
Advantages of Pooling:
Reduces Computational Load: By reducing the size of the feature map, pooling reduces
the number of parameters and computations required in subsequent layers.
Prevents Overfitting: Pooling helps the model generalize better by summarizing the
features in each region, making it less likely to memorize the data.
Increases Invariance: Pooling helps the network become invariant to small translations,
rotations, and distortions in the input image.
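As a quick illustration (an illustrative sketch, not part of the original answer), the NumPy code below shows 2×2 max pooling, the most common pooling operation; the window size and stride are typical defaults:

import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    # Downsample a 2-D feature map by taking the maximum of each window.
    h, w = feature_map.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    pooled = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            pooled[i, j] = window.max()
    return pooled

fmap = np.arange(16).reshape(4, 4)
print(max_pool2d(fmap))   # 4x4 feature map reduced to 2x2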
8) What are the different types of Gradient Descent methods, explain any three of
them.
There are several types of Gradient Descent methods, each with different ways of
updating model weights based on data size and efficiency. Here are three commonly used
types:
1. Batch Gradient Descent:
o In this method, the gradient is computed using the entire training dataset for each
update of the weights.
o It provides a stable and accurate estimate of the gradient, but it's slow and
memory-intensive for large datasets since it processes all data at once.
o Use Case: Suitable for smaller datasets where computational resources are less
constrained.
2. Stochastic Gradient Descent (SGD):
o In SGD, the gradient is computed and weights are updated for each data point (or
sample) individually, rather than the entire dataset.
o This makes SGD faster and allows it to handle large datasets more easily. However,
it can lead to more noisy updates, potentially making convergence less smooth.
o Use Case: Often used for large datasets, as it requires less memory and can
converge faster.
3. Mini-Batch Gradient Descent:
o Mini-batch gradient descent combines the benefits of both batch and stochastic
gradient descent. Here, the dataset is divided into small batches, and the gradient
is computed for each batch.
o This approach balances the speed of SGD with the stability of batch gradient
descent, making it widely used in practice.
o Use Case: Commonly used in deep learning, as it is computationally efficient and
can leverage GPU acceleration effectively.
Each method has its own strengths, and the choice depends on the dataset size,
hardware, and desired trade-off between accuracy and speed.
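A minimal NumPy sketch of mini-batch gradient descent on a linear model with squared-error loss is shown below (the model, learning rate, and batch size are illustrative assumptions). Setting batch_size = len(X) gives batch gradient descent, and batch_size = 1 gives SGD:

import numpy as np

def minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=100):
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(X))                      # shuffle once per epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)   # gradient of the MSE loss
            w -= lr * grad                                 # weight update
    return w

X = np.random.default_rng(1).normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5])
print(minibatch_gd(X, y))   # should approach [1, -2, 0.5]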
How LSTM Works: The LSTM model introduces memory cells with three key gates to
manage the flow of information effectively:
1. Forget Gate: Decides which information from the previous time step should be discarded.
2. Input Gate: Determines what new information should be stored in the memory cell.
3. Output Gate: Controls what information from the cell state should be used as output at
the current time step.
These gates help the LSTM retain essential information over longer periods and forget
irrelevant information, allowing it to learn dependencies that are further back in the
sequence.
Advantages of LSTM:
Solves the Vanishing Gradient Problem: LSTM’s architecture helps gradients flow
better through the network, allowing it to learn from long-term dependencies.
Effective for Long Sequences: LSTM can handle long sequences of data without losing
important context, unlike traditional RNNs.
Applications: LSTMs are widely used in natural language processing (NLP) tasks, time
series prediction, and any domain where understanding context over time is essential.
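To make the three gates concrete, here is a rough NumPy sketch of a single LSTM time step (the weight layout, with the four gates stacked into one matrix, is an illustrative assumption):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W has shape (input_dim + hidden_dim, 4 * hidden_dim); b has shape (4 * hidden_dim,)
    n = h_prev.shape[0]
    z = W.T @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0*n:1*n])        # forget gate: what to discard from the old cell state
    i = sigmoid(z[1*n:2*n])        # input gate: what new information to store
    g = np.tanh(z[2*n:3*n])        # candidate values for the cell state
    o = sigmoid(z[3*n:4*n])        # output gate: what part of the cell state to expose
    c = f * c_prev + i * g         # updated cell state
    h = o * np.tanh(c)             # new hidden state / output at this time step
    return h, c

# Run the cell over a short random sequence
rng = np.random.default_rng(0)
n_in, n_h = 3, 5
W = rng.normal(scale=0.1, size=(n_in + n_h, 4 * n_h))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for x in rng.normal(size=(10, n_in)):
    h, c = lstm_step(x, h, c, W, b)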
1. Undercomplete Autoencoder:
o An undercomplete autoencoder has a bottleneck layer (latent space) with fewer dimensions than the input data.
o This forced compression means the model must learn the most important features,
ignoring noise or redundant information.
o Purpose: Primarily used for data compression and feature extraction.
2. Denoising Autoencoder:
o A denoising autoencoder is trained to reconstruct the clean input from a deliberately corrupted (noisy) version of it, so it cannot simply copy the input and must learn robust features.
o Purpose: Primarily used for noise removal and for learning representations that are robust to small perturbations.
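For reference (an illustrative sketch, not part of the original answer), an undercomplete autoencoder can be written in a few lines of PyTorch; the 64-dimensional input, 8-dimensional bottleneck, and layer sizes here are arbitrary assumptions. A denoising variant would differ only in feeding a corrupted copy of x to the model while keeping the clean x as the reconstruction target:

import torch
from torch import nn

input_dim, latent_dim = 64, 8                 # bottleneck is much smaller than the input
encoder = nn.Sequential(nn.Linear(input_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, input_dim))
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(128, input_dim)                # stand-in for real data
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(x), x)         # target is the input itself: no labels needed
    loss.backward()
    optimizer.step()

codes = encoder(x)                            # compressed 8-dimensional representation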
1. Sigmoid Function:
o Formula: f(x) = 1 / (1 + e^(-x))
o Range: 0 to 1.
o Use Case: Often used in the output layer for binary classification tasks since it outputs values between 0 and 1.
o Pros and Cons: Provides smooth gradients but can suffer from the "vanishing gradient" problem, making it harder to train deep networks.
2. Tanh (Hyperbolic Tangent) Function:
o Formula: f(x) = tanh(x) = 2 / (1 + e^(-2x)) − 1
o Range: -1 to 1.
o Use Case: Common in hidden layers; like the sigmoid function, but outputs are
centered around zero, which helps with faster convergence.
o Pros and Cons: Reduces the vanishing gradient problem to some extent but can
still saturate and slow down training.
3. ReLU (Rectified Linear Unit) Function:
o Formula: f(x) = max(0, x)
o Range: 0 to ∞.
o Use Case: Widely used in hidden layers because it’s computationally efficient and
helps networks converge faster.
o Pros and Cons: Reduces vanishing gradient issues; however, it can lead to "dead
neurons" (neurons stuck at zero).
4. Leaky ReLU:
o Formula: f(x) = x if x > 0, else αx (where α is a small constant, e.g. 0.01)
o Range: -∞ to ∞.
o Use Case: Used to address the "dying ReLU" problem by allowing a small gradient
when x is negative.
o Pros and Cons: Prevents dead neurons by allowing a small gradient for negative
inputs.
5. Softmax Function:
o Formula: f(x_i) = e^(x_i) / Σ_j e^(x_j)
o Range: 0 to 1 for each class, with the sum of outputs equal to 1.
o Use Case: Primarily used in the output layer for multi-class classification to
interpret outputs as probabilities across classes.
o Pros and Cons: Useful for probabilistic interpretation but computationally more
expensive.
Each activation function has its strengths and weaknesses, and the choice depends on
the specific layer and type of problem. Non-linear activation functions like ReLU and
Tanh allow neural networks to learn intricate patterns, making them essential for
modern deep learning.
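The functions above can be summarised in a short NumPy sketch (the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))             # range (0, 1)

def tanh(x):
    return np.tanh(x)                       # range (-1, 1), zero-centred

def relu(x):
    return np.maximum(0, x)                 # range [0, inf), can produce dead neurons

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)    # small slope for negative inputs

def softmax(x):
    e = np.exp(x - x.max())                 # subtract the max for numerical stability
    return e / e.sum()                      # outputs are positive and sum to 1

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), softmax(z), sep="\n")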