DL Question Paper Solved

Question paper of DL solved for exam

1) What is a Feed Forward Neural Network?

A Feed Forward Neural Network (FFNN) is a type of artificial neural network in which
information moves in one direction, from input to output, without any loops. It consists of
layers of nodes, also called neurons.

These layers are:


1. Input Layer: The first layer where the data enters the network.
2. Hidden Layers: Intermediate layers where the network processes the data.
3. Output Layer: The final layer where the result or prediction is produced.
Each node in a layer is connected to every node in the next layer, and these connections
have weights that are adjusted during training. The purpose of a feedforward network is to
map input data to the correct output. It is the simplest type of neural network and is used in
many applications like classification and regression.
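
To make this concrete, here is a minimal sketch of a feedforward network, assuming PyTorch is available; the layer sizes (784, 128, 64, 10) are purely illustrative:

```python
import torch
import torch.nn as nn

# A minimal feedforward network: input layer -> two hidden layers -> output layer.
# Layer sizes are illustrative, not tied to any particular dataset.
model = nn.Sequential(
    nn.Linear(784, 128),   # input layer -> first hidden layer
    nn.ReLU(),
    nn.Linear(128, 64),    # first hidden layer -> second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),     # second hidden layer -> output layer (e.g., 10 classes)
)

x = torch.randn(32, 784)   # a batch of 32 flattened inputs
logits = model(x)          # information flows strictly forward, no loops
print(logits.shape)        # torch.Size([32, 10])
```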

2) Explain Gradient Descent in Deep Learning.


Gradient Descent is an optimization technique used in deep learning to minimize the loss
function, which measures how well the model is performing. The goal is to adjust the
model’s weights to reduce the error or loss.
Here's how it works:
1. Initialize Weights: Start with random values for the weights of the model.
2. Calculate Loss: Use the model to make predictions and calculate the loss (error) based
on the difference between predicted and actual values.
3. Compute Gradient: Find the gradient (derivative) of the loss function with respect to
each weight. This tells us how much each weight needs to be adjusted.
4. Update Weights: Adjust the weights in the opposite direction of the gradient to reduce
the loss. The learning rate controls how big each adjustment is.
5. Repeat: This process is repeated over multiple iterations (or epochs) until the loss is
minimized.
Gradient Descent helps the model learn by gradually reducing error and improving
predictions. There are different types of Gradient Descent like Batch, Stochastic, and Mini-
Batch, depending on how the data is processed during training.
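
As an illustration of these steps, the NumPy sketch below fits a toy linear model y = w·x + b with plain gradient descent; the synthetic data, learning rate, and iteration count are made up for demonstration:

```python
import numpy as np

# Toy example: fit y = w*x + b by minimizing mean squared error.
x = np.linspace(0, 1, 50)
y = 3.0 * x + 1.0 + 0.1 * np.random.randn(50)   # noisy synthetic data

w, b = 0.0, 0.0      # 1. initialize weights (here simply zeros)
lr = 0.1             # learning rate controls the step size

for epoch in range(1000):                       # 5. repeat over many iterations
    y_pred = w * x + b
    loss = np.mean((y_pred - y) ** 2)           # 2. calculate loss
    grad_w = np.mean(2 * (y_pred - y) * x)      # 3. compute gradient w.r.t. w
    grad_b = np.mean(2 * (y_pred - y))          #    and w.r.t. b
    w -= lr * grad_w                            # 4. update weights opposite to the gradient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))   # should approach 3.0 and 1.0
```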

3) Explain the dropout method and its advantages.


Dropout is a regularization technique used in deep learning to prevent overfitting. During
training, it randomly "drops out" (sets to zero) a certain percentage of neurons in the
network for each iteration. This prevents the model from relying too much on any single
neuron, forcing it to learn more general features.
Here’s how it works:
1. Randomly Drop Neurons: During each training step, a fraction of neurons in each layer
are randomly turned off (dropped out).
2. Training: The network trains with a reduced number of neurons, preventing overfitting by
not allowing the model to memorize the data.
3. Testing: During testing, no neurons are dropped; instead, the weights (or activations) are
scaled by the keep probability (1 − dropout rate) so that activations match their expected values from training.
Advantages of Dropout:
1. Prevents Overfitting: It reduces the chance of the model memorizing the training data,
leading to better generalization.
2. Improves Performance: By forcing the model to rely on a variety of neurons, it often
leads to a more robust and accurate model.
3. Efficient Regularization: It’s simple and computationally efficient to implement in most
models.
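
A brief sketch of how dropout is typically used, assuming PyTorch; the drop probability of 0.5 is a common default rather than a requirement:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes ~50% of activations during training
    nn.Linear(64, 10),
)

x = torch.randn(8, 100)

model.train()            # dropout active: some activations are zeroed and the rest
out_train = model(x)     # are scaled up ("inverted dropout")

model.eval()             # dropout disabled: all neurons are used, no further scaling
with torch.no_grad():
    out_test = model(x)
```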

4) What are Undercomplete Autoencoders?


An Undercomplete Autoencoder is a type of neural network used for unsupervised
learning, primarily for dimensionality reduction and feature learning.

It consists of two parts:


1. Encoder: This part compresses the input data into a smaller, more compact
representation (called the latent space or bottleneck). The size of this representation is
smaller than the original input, hence the term "undercomplete."
2. Decoder: This part reconstructs the original data from the compressed representation.
The key feature of an undercomplete autoencoder is that the network is forced to learn a
more efficient, lower-dimensional representation of the data, which captures its most
important features.
Advantages:
 It helps in learning useful features from the data without the need for labeled examples.
 The compressed representation can be used for tasks like data compression, denoising, or
anomaly detection.
An undercomplete autoencoder is contrasted with an overcomplete autoencoder, where
the latent space size is larger than the input, which can lead to overfitting.
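
For illustration, a minimal undercomplete autoencoder might look like the following sketch (assuming PyTorch; the 784-dimensional input and 32-dimensional bottleneck are arbitrary choices):

```python
import torch.nn as nn

# The 32-dimensional bottleneck is much smaller than the 784-dimensional input,
# which is what makes this autoencoder "undercomplete".
encoder = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 32),                 # latent space / bottleneck
)
decoder = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),  # reconstruct the original input
)
autoencoder = nn.Sequential(encoder, decoder)

# Training would minimize a reconstruction loss, e.g. nn.MSELoss(),
# between autoencoder(x) and x itself.
```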

5) Explain the Pooling operation in CNN.


Pooling is a crucial operation in Convolutional Neural Networks (CNNs) that reduces the
spatial dimensions (height and width) of the input data, helping to reduce computation
and control overfitting. It also helps the model become invariant to small translations or
distortions in the input.
There are two main types of pooling:
1. Max Pooling: Max pooling is a pooling operation that selects the maximum element from
the region of the feature map covered by the filter. Thus, the output of the max-pooling
layer is a feature map containing the most prominent features of the previous
feature map.
2. Average Pooling: Average pooling computes the average of the elements present in the
region of feature map covered by the filter. Thus, while max pooling gives the most
prominent feature in a particular patch of the feature map, average pooling gives the
average of features present in a patch.

Advantages of Pooling:
 Reduces Computational Load: By reducing the size of the feature map, pooling reduces
the number of parameters and computations required in subsequent layers.
 Prevents Overfitting: Pooling helps the model generalize better by summarizing the
features in each region, making it less likely to memorize the data.
 Increases Invariance: Pooling helps the network become invariant to small translations,
rotations, and distortions in the input image.
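
A small sketch of both pooling types, assuming PyTorch; a 2x2 window with stride 2 halves the height and width of the feature map:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)    # (batch, channels, height, width)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)   # keeps the largest value per patch
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)   # keeps the average value per patch

print(max_pool(x).shape)   # torch.Size([1, 8, 16, 16])
print(avg_pool(x).shape)   # torch.Size([1, 8, 16, 16])
```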

6) What are the Three Classes of Deep Learning? Explain each.


The three main classes of deep learning are:
1. Supervised Learning:
o In supervised learning, the model is trained using labeled data, meaning the input
comes with corresponding output labels.
o The goal is for the model to learn a mapping from input to output based on these
labels, so it can predict the output for unseen data.
o Examples: Classification tasks (e.g., recognizing cats in images), regression tasks
(e.g., predicting house prices).
o Example Algorithms: Convolutional Neural Networks (CNNs), Recurrent Neural
Networks (RNNs), Multilayer Perceptrons (MLPs).
2. Unsupervised Learning:
o In unsupervised learning, the model is trained on data without labels. The goal is to
find patterns, structures, or relationships in the data.
o The model tries to group similar data points or reduce the dimensionality of the
data.
o Examples: Clustering (e.g., grouping similar customers), anomaly detection (e.g.,
fraud detection).
o Example Algorithms: K-means, Principal Component Analysis (PCA),
Autoencoders.
3. Reinforcement Learning:
o In reinforcement learning, an agent learns by interacting with an environment. The
agent takes actions and receives rewards or penalties based on those actions.
o The goal is for the agent to learn the best strategy (policy) to maximize cumulative
rewards over time.
o Examples: Game-playing AI (e.g., AlphaGo), robotics, self-driving cars.
o Example Algorithms: Q-learning, Deep Q Networks (DQNs), Policy Gradient
Methods.
Each class has different applications and is suited to different types of problems in
deep learning.

7) Explain the architecture of CNN with the help of a diagram.


A Convolutional Neural Network (CNN) is a specialized neural network architecture
primarily used for image recognition and processing. CNNs are designed to automatically
capture spatial features like edges, textures, shapes, and objects in images.

Below is an explanation of the main layers in a CNN architecture:


1. Input Layer: This is where the input image is fed into the network, usually represented as
a matrix of pixel values (e.g., 28x28x3 for a 28x28 RGB image).
2. Convolutional Layer: The key layer in CNNs, this layer applies filters (kernels) to the
input to extract features. Each filter detects specific patterns, such as edges or textures,
creating feature maps.
3. Pooling Layer: This layer reduces the spatial size of the feature maps, typically through
Max Pooling or Average Pooling, to decrease computation and control overfitting. Pooling
also provides translation invariance.
4. Fully Connected (Dense) Layer: After multiple convolutional and pooling layers, the
feature maps are flattened into a single vector and passed through fully connected layers,
which help combine the extracted features and make the final classification or regression.
5. Output Layer: The final layer, which produces the output (e.g., class probabilities for
image classification).
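
In code, a CNN following this layer order might be sketched as below (assuming PyTorch; the channel counts and the 28x28x3 input are illustrative):

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer: extracts feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                              # pooling layer: 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),                                 # flatten feature maps into one vector
    nn.Linear(32 * 7 * 7, 128),                   # fully connected (dense) layer
    nn.ReLU(),
    nn.Linear(128, 10),                           # output layer (e.g., 10 class scores)
)
```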

8) What are the different types of Gradient Descent methods? Explain any three of
them.
There are several types of Gradient Descent methods, each with different ways of
updating model weights based on data size and efficiency. Here are three commonly used
types:
1. Batch Gradient Descent:
o In this method, the gradient is computed using the entire training dataset for each
update of the weights.
o It provides a stable and accurate estimate of the gradient, but it's slow and
memory-intensive for large datasets since it processes all data at once.
o Use Case: Suitable for smaller datasets where computational resources are less
constrained.
2. Stochastic Gradient Descent (SGD):
o In SGD, the gradient is computed and weights are updated for each data point (or
sample) individually, rather than the entire dataset.
o This makes SGD faster and allows it to handle large datasets more easily. However,
it can lead to more noisy updates, potentially making convergence less smooth.
o Use Case: Often used for large datasets, as it requires less memory and can
converge faster.
3. Mini-Batch Gradient Descent:
o Mini-batch gradient descent combines the benefits of both batch and stochastic
gradient descent. Here, the dataset is divided into small batches, and the gradient
is computed for each batch.
o This approach balances the speed of SGD with the stability of batch gradient
descent, making it widely used in practice.
o Use Case: Commonly used in deep learning, as it is computationally efficient and
can leverage GPU acceleration effectively.
Each method has its own strengths, and the choice depends on the dataset size,
hardware, and desired trade-off between accuracy and speed.
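
The three variants differ only in how many samples contribute to each weight update, as the sketch below shows; X, y, grad_fn, and update_fn are placeholders for illustration, not part of any specific library:

```python
import numpy as np

def train(X, y, grad_fn, update_fn, batch_size, epochs=10):
    """Generic training loop; the batch size selects the gradient descent variant."""
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)            # shuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grads = grad_fn(X[idx], y[idx])         # gradient on this batch only
            update_fn(grads)                        # weight update step

# batch_size = n       -> Batch Gradient Descent (whole dataset per update)
# batch_size = 1       -> Stochastic Gradient Descent (one sample per update)
# batch_size = 32/64   -> Mini-Batch Gradient Descent (common in practice)
```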

9) Explain the main components of an Autoencoder and its architecture.


An Autoencoder is a type of neural network used for unsupervised learning, primarily to
compress and reconstruct data. The architecture consists of two main components:
1. Encoder:
o The encoder compresses the input data into a smaller representation called the
"latent space" or "bottleneck." This part captures the most important features of the
data.
o It consists of one or more layers that reduce the dimensionality of the input by
learning the key features while ignoring noise or irrelevant details.
2. Decoder:
o The decoder reconstructs the original data from the compressed representation
created by the encoder.
o This part aims to recreate the input as closely as possible, based on the compressed
information from the encoder.
o The decoder structure usually mirrors the encoder structure but in reverse, to
gradually expand the data back to its original form.
Architecture: The typical autoencoder structure is: Input → Encoder → Bottleneck (latent space) → Decoder → Reconstructed output.
The bottleneck layer is critical because it forces the model to learn an efficient, compact
representation of the data. By comparing the reconstructed output with the original input,
the autoencoder can minimize reconstruction errors during training.
Applications: Autoencoders are used in tasks like data compression, noise reduction,
feature extraction, and anomaly detection.

10) Explain the LSTM model and how it overcomes the limitations of RNNs.


The LSTM (Long Short-Term Memory) model is a type of Recurrent Neural Network
(RNN) specifically designed to address the limitations of traditional RNNs, especially with
long sequences. RNNs are useful for sequence data, like time series or text, but they
struggle with remembering long-term dependencies because of the "vanishing gradient"
problem, where gradients become too small to learn patterns over long time steps.

How LSTM Works: The LSTM model introduces memory cells with three key gates to
manage the flow of information effectively:
1. Forget Gate: Decides which information from the previous time step should be discarded.
2. Input Gate: Determines what new information should be stored in the memory cell.
3. Output Gate: Controls what information from the cell state should be used as output at
the current time step.
These gates help the LSTM retain essential information over longer periods and forget
irrelevant information, allowing it to learn dependencies that are further back in the
sequence.
Advantages of LSTM:
 Solves the Vanishing Gradient Problem: LSTM’s architecture helps gradients flow
better through the network, allowing it to learn from long-term dependencies.
 Effective for Long Sequences: LSTM can handle long sequences of data without losing
important context, unlike traditional RNNs.
Applications: LSTMs are widely used in natural language processing (NLP) tasks, time
series prediction, and any domain where understanding context over time is essential.
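
As a usage sketch (assuming PyTorch; the feature, hidden, and sequence sizes are arbitrary), an LSTM layer processes a batch of sequences and exposes its hidden and cell states:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

x = torch.randn(4, 20, 16)          # (batch, sequence length, features per step)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([4, 20, 32]) - hidden state at every time step
print(h_n.shape)     # torch.Size([1, 4, 32])  - final hidden state
print(c_n.shape)     # torch.Size([1, 4, 32])  - final cell state (the long-term memory)
```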

11) What are the issues faced by Vanilla GAN models?


Vanilla GANs (Generative Adversarial Networks) face several challenges that can affect
their training stability and output quality. Here are some key issues:
1. Mode Collapse:
o In mode collapse, the generator produces a limited variety of outputs, essentially
"collapsing" to a narrow range of outputs rather than capturing the diversity in the
data.
o This happens when the generator finds a few outputs that can consistently fool the
discriminator, and it keeps producing those, ignoring other potential outputs.
2. Training Instability:
o GANs are inherently difficult to train because they involve two networks (generator
and discriminator) competing against each other. If one network becomes
significantly stronger than the other, it can lead to unstable training.
o For instance, if the discriminator becomes too strong, it can easily reject the
generator's outputs, preventing the generator from learning effectively.
3. Vanishing Gradients:
o When the discriminator performs too well, it may provide very small gradients to the
generator during backpropagation, slowing down or even stopping the generator's
learning process.
o This issue often arises in the initial stages of training when the generator is still
weak and the discriminator can easily distinguish real data from fake data.
4. Sensitivity to Hyperparameters:
o GANs are highly sensitive to hyperparameters like learning rates, batch sizes, and
network architecture. Even slight changes in these settings can lead to suboptimal
performance or instability.
Addressing These Issues: To mitigate these problems, researchers have developed
improved GAN architectures, such as Wasserstein GAN (WGAN) to reduce training
instability and Conditional GANs to improve diversity by conditioning outputs on
specific labels.

12) What are L1 and L2 regularization methods?


L1 and L2 regularization are techniques used in machine learning to prevent models
from overfitting by adding a penalty term to the loss function. This encourages the model
to have smaller or more simplified weights, improving its generalization to new data.
1. L1 Regularization (also known as Lasso Regularization):
o In L1 regularization, a penalty equal to the absolute value of the weights is added to
the loss function.
o This encourages weights to be zero, effectively removing some features from the
model, making it sparse. Sparsity can be helpful when dealing with high-
dimensional data.
o Formula: Loss = Original Loss + λ * Σ|weights|
o Use Case: L1 is often used in feature selection, as it can reduce the model
complexity by setting irrelevant feature weights to zero.
2. L2 Regularization (also known as Ridge Regularization):
o In L2 regularization, a penalty equal to the square of the weights is added to the
loss function.
o This encourages weights to be small but not necessarily zero, which helps the
model to generalize better without making it sparse.
o Formula: Loss = Original Loss + λ * Σ(weights²)
o Use Case: L2 is commonly used to reduce model complexity while keeping all
features, useful when you want all features to contribute but with lower magnitude.
Key Difference:
 L1 results in sparse models (some weights are exactly zero), while L2 reduces the
magnitude of weights without necessarily making them zero. Both regularizations reduce
overfitting by constraining the weights but in slightly different ways.
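
A minimal sketch of adding these penalties to an existing loss, assuming PyTorch; regularized_loss, its arguments, and the λ value are hypothetical names chosen for illustration:

```python
import torch

def regularized_loss(original_loss, weights, lam=1e-4, kind="l2"):
    """Add an L1 or L2 penalty term to an already-computed loss (illustrative helper)."""
    if kind == "l1":
        penalty = sum(w.abs().sum() for w in weights)     # λ * Σ|w|
    else:
        penalty = sum((w ** 2).sum() for w in weights)    # λ * Σ w²
    return original_loss + lam * penalty

# Note: in PyTorch, L2 regularization is also available through the optimizer's
# weight_decay argument, e.g. torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4).
```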

13) Explain any three types of Autoencoders.


Here are three common types of Autoencoders and their unique purposes:
1. Undercomplete Autoencoder:
o An undercomplete autoencoder has a bottleneck layer (latent space) with fewer
dimensions than the input data.
o This forced compression means the model must learn the most important features,
ignoring noise or redundant information.
o Purpose: Primarily used for data compression and feature extraction.
2. Denoising Autoencoder:
o A denoising autoencoder is trained to reconstruct the original input from a noisy
version of it. During training, random noise is added to the input, but the model
learns to output a clean, noise-free version.
o This teaches the autoencoder to ignore noise and focus on the core features of the
data.
o Purpose: Useful for noise reduction in images, text, or signals.
3. Sparse Autoencoder:
o In a sparse autoencoder, a regularization term is added to make most neurons in
the hidden layer inactive, creating a sparse representation. This sparsity constraint
allows the model to learn only the most essential features.
o The model can have more hidden units than the input, but only a few are activated,
encouraging efficient feature learning.
o Purpose: Commonly used for feature selection, anomaly detection, and pattern
discovery.
Each type of autoencoder is tailored to specific tasks, from noise reduction to
discovering underlying data features.
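
To illustrate the denoising variant specifically, a single training step might look like the sketch below (assuming PyTorch; autoencoder is any encoder/decoder pair such as the one sketched earlier, and the noise level is arbitrary):

```python
import torch
import torch.nn.functional as F

def denoising_step(autoencoder, x_clean, optimizer, noise_std=0.2):
    """One training step of a denoising autoencoder."""
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)  # corrupt the input
    x_recon = autoencoder(x_noisy)
    loss = F.mse_loss(x_recon, x_clean)   # target is the CLEAN input, not the noisy one
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```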

14) What is the significance of Activation Functions in Neural Networks? Explain the
different types of activation functions used in NN.
Activation functions play a crucial role in neural networks by introducing non-linearity
into the model, enabling it to learn complex patterns in data. Without activation functions,
a neural network would essentially behave like a linear model, limiting its ability to solve
complex problems. Here are some commonly used activation functions and their purposes:
1. Sigmoid Function:
o Formula: f(x) = 1 / (1 + e^(-x))
o Range: 0 to 1.

o Use Case: Often used in the output layer for binary classification tasks since it
outputs values between 0 and 1.
o Pros and Cons: Provides smooth gradients but can suffer from the "vanishing
gradient" problem, making it harder to train deep networks.
2. Tanh (Hyperbolic Tangent) Function:
o Formula: f(x) = tanh(x) = 2 / (1 + e^(-2x)) − 1
o Range: -1 to 1.

o Use Case: Common in hidden layers; like the sigmoid function, but outputs are
centered around zero, which helps with faster convergence.
o Pros and Cons: Reduces the vanishing gradient problem to some extent but can
still saturate and slow down training.
3. ReLU (Rectified Linear Unit) Function:
o Formula: f(x) = max(0, x)
o Range: 0 to ∞.

o Use Case: Widely used in hidden layers because it’s computationally efficient and
helps networks converge faster.
o Pros and Cons: Reduces vanishing gradient issues; however, it can lead to "dead
neurons" (neurons stuck at zero).
4. Leaky ReLU:
o Formula: f(x) = x if x > 0, αx if x ≤ 0 (where α is a small constant such as 0.01)
o Range: -∞ to ∞.
o Use Case: Used to address the "dying ReLU" problem by allowing a small gradient
when x is negative.
o Pros and Cons: Prevents dead neurons by allowing a small gradient for negative
inputs.
5. Softmax Function:
o Formula: f(x_i) = e^(x_i) / Σ_j e^(x_j)
o Range: 0 to 1 for each class, with the sum of outputs equal to 1.

o Use Case: Primarily used in the output layer for multi-class classification to
interpret outputs as probabilities across classes.
o Pros and Cons: Useful for probabilistic interpretation but computationally more
expensive.
Each activation function has its strengths and weaknesses, and the choice depends on
the specific layer and type of problem. Non-linear activation functions like ReLU and
Tanh allow neural networks to learn intricate patterns, making them essential for
modern deep learning.
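
For reference, the functions above can be written directly in NumPy as a small, self-contained sketch:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))            # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0, x)                # zero for negatives, identity for positives

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope for negatives avoids dead neurons

def softmax(x):
    e = np.exp(x - np.max(x))              # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities that sum to 1
```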

15) What are Generative Adversarial Networks? Comment on their applications.


Generative Adversarial Networks (GANs) are a type of neural network architecture
designed to generate new, synthetic data that resembles a given dataset. They consist of
two networks, a generator and a discriminator, that compete in a game-like setup:
1. Generator: This network creates fake data from random noise. Its goal is to produce data
that is indistinguishable from real data, trying to "fool" the discriminator.
2. Discriminator: This network evaluates both real and generated data and attempts to
classify them correctly as "real" or "fake."
The two networks train together in a back-and-forth process. As the generator improves its
fake data, the discriminator becomes better at detecting it. This "adversarial" process
pushes the generator to produce increasingly realistic outputs.
Applications of GANs
1. Image Generation:
o GANs can create realistic images from scratch, widely used in art, game
development, and fashion. For instance, they can generate human faces,
landscapes, or textures that don’t exist in reality.
2. Image-to-Image Translation:
o GANs can transform images from one domain to another, such as converting
sketches to photos, daytime photos to nighttime, or black-and-white images to
color. This is used in applications like photo enhancement, style transfer, and
medical imaging.
3. Text-to-Image Generation:
o GANs can create images based on text descriptions. This is helpful in generating
images for specific text prompts in areas like advertising, art, and content creation.
4. Super-Resolution:
o GANs can improve the resolution of images, often used in enhancing old or low-
quality photos, surveillance footage, or satellite imagery.
5. Data Augmentation:
o In fields like medical imaging or autonomous driving, GANs can generate additional,
realistic data samples to train machine learning models when labeled data is scarce.
Overall, GANs are powerful tools for creating realistic data across various fields,
pushing advancements in visual arts, entertainment, and scientific research. However,
they also require careful training to avoid instability and mode collapse, which can limit
the diversity of generated samples.
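
A minimal sketch of the two competing networks, assuming PyTorch and flattened 28x28 images; the layer sizes and the 100-dimensional noise vector are illustrative choices:

```python
import torch.nn as nn

latent_dim = 100   # size of the random noise fed to the generator

generator = nn.Sequential(           # noise -> fake image
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)

discriminator = nn.Sequential(       # image -> probability that it is real
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

# Training alternates: update the discriminator on real vs. generated batches
# (e.g. with nn.BCELoss()), then update the generator so its outputs fool the
# discriminator. This is the adversarial game described above.
```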
