AD8701 Deep Learning question bank

deep learning (Anna University)


PANIMALAR ENGINEERING COLLEGE


Bangalore Trunk Road, Varadharajapuram, Nazarathpet,
Poonamalle, Chennai 600123.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AD8701 – DEEP LEARNING


QUESTION BANK
VII SEMESTER
REGULATION 2017


UNIT I

DEEP NETWORKS BASICS

Linear Algebra: Scalars -- Vectors -- Matrices and tensors; Probability Distributions --


Gradient-based Optimization – Machine Learning Basics: Capacity -- Overfitting and
underfitting -- Hyperparameters and validation sets -- Estimators -- Bias and variance --
Stochastic gradient descent -- Challenges motivating deep learning; Deep Networks: Deep
feedforward networks; Regularization -- Optimization.

Two Marks – Part A


1. What is Deep Learning?
Deep learning is a part of machine learning with algorithms inspired by the structure
and function of the brain, called artificial neural networks. In the mid-1960s,
Alexey Grigorevich Ivakhnenko published the first general, working learning algorithm
for deep networks. Deep learning is suited to a range of fields such as computer
vision, speech recognition, natural language processing, etc.

2. What are the main differences between AI, Machine Learning, and Deep Learning?
AI stands for Artificial Intelligence. It is a technique which enables machines to mimic
human behavior.
Machine Learning is a subset of AI which uses statistical methods to enable machines
to improve with experiences.

Deep learning is a part of Machine learning, which makes the computation of multi-
layer neural networks feasible. It takes advantage of neural networks to simulate
human-like decision making.
3. Differentiate supervised and unsupervised deep learning procedures.
Supervised learning is a system in which both input and desired output data are
provided. Input and output data are labeled to provide a learning basis for future data
processing.
An unsupervised procedure does not need explicitly labeled data; the operations can be
carried out without it. The most common unsupervised learning method is cluster
analysis. It is used for exploratory data analysis to find hidden patterns or groupings
in data.


4. What are the applications of deep learning?


There are various applications of deep learning:
Computer vision
Natural language processing and pattern recognition
Image recognition and processing
Machine translation
Sentiment analysis
Question answering system
Object Classification and Detection
Automatic Handwriting Generation
Automatic Text Generation.

5. What is scalar and vector?


A scalar is just a single number, in contrast to most of the other objects studied in
linear algebra. A vector is an array of numbers arranged in order, in which each element
is identified by a single index (subscript).

6. What are matrices and tensors?


Matrices: A matrix is a 2D array of numbers, so each element is identified by two
subscripts instead of just one. We usually give matrices uppercase variable names with
bold characters, such as A.

We usually identify the elements of a matrix by using its name in italics but not in bold,
and the subscripts are listed with separating commas.

Tensors: In some cases, we’ll need an array with more than two axes. In the general
case, an array of numbers arranged on a regular grid with a varying number of axes is
called a tensor. We note a tensor named “A” with this font: A.

7. Why probability is important in deep learning?


Probability is the science of quantifying uncertain things. Most machine learning and
deep learning systems utilize a lot of data to learn about patterns in the data. Whenever
data is utilized in a system rather than sole logic, uncertainty grows, and whenever
uncertainty grows, probability becomes relevant.

By introducing probability to a deep learning system, we introduce common sense to
the system; otherwise the system would be very brittle and not useful. In deep
learning, several models such as Bayesian models, probabilistic graphical models and
hidden Markov models are used, and they depend entirely on probability concepts.

8. Define Random Variable.


A random variable is a variable that can take on different values randomly. We typically
denote the random variable itself with a lower case letter in plain typeface, and the
values it can take on with lower case script letters. For example, x1 and x2 are both
possible values that the random variable x can take on.

9. Are random variables discrete or continuous?


Random variables may be discrete or continuous. A discrete random variable is one that
has a finite or countably infinite number of states. Note that these states are not
necessarily the integers; they can also just be named states that are not considered to
have any numerical value. A continuous random variable is associated with a real value.

10. What are probability distributions?


A probability distribution is a description of how likely a random variable or set of
random variables is to take on each of its possible states. The way we describe
probability distributions depends on whether the variables are discrete or continuous.

11. Define Probability mass function?


A probability distribution over discrete variables may be described using a probability
mass function (PMF). We typically denote probability mass functions with a capital P.
Often we associate each random variable with a different probability mass function and
the reader must infer which probability mass function to use based on the identity of the
random variable, rather than the name of the function; P(x) is usually not the same as
P(y).

12. List the properties that probability mass function satisfies?


• The domain of P must be the set of all possible states of x.

• ∀x ∈ x,0 ≤ P(x) ≤ 1. An impossible event has probability 0 and no state can be less
probable than that. Likewise, an event that is guaranteed to happen has probability 1,
and no state can have a greater chance of occurring.

• ∑x∈x P(x) = 1. We refer to this property as being normalized. Without this property, we
could obtain probabilities greater than one by computing the probability of one of many
events occurring.

13. List the properties that probability density function satisfies?


When working with continuous random variables, we describe probability distributions
using a probability density function (PDF) rather than a probability mass function.

To be a probability density function, a function p must satisfy the following properties:


• The domain of p must be the set of all possible states of x.
• ∀x ∈ x, p(x) ≥ 0. Note that we do not require p(x) ≤ 1.
• ∫ p(x) dx = 1.

14. What is Gradient based optimizer?


Gradient descent is an optimization algorithm that’s used when training deep learning
models. It’s based on a convex function and updates its parameters iteratively to
minimize a given function to its local minimum.


The gradient descent update rule (the formula referred to above) is:

ϴj := ϴj − α · ∂J(ϴ)/∂ϴj

In the above formula,
 α is the learning rate,
 J is the cost function, and
 ϴ is the parameter to be updated.
As you can see, the gradient term is the partial derivative of the cost function J(ϴ) with
respect to ϴj.
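As an illustrative sketch (not part of the original question bank), the update rule can be written in Python; grad_J is a hypothetical function that returns ∂J/∂ϴ, and the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

def gradient_descent(theta, grad_J, alpha=0.1, n_iters=100):
    """Minimal (batch) gradient descent sketch: theta <- theta - alpha * dJ/dtheta."""
    for _ in range(n_iters):
        theta = theta - alpha * grad_J(theta)   # move against the gradient
    return theta

# Example: minimize J(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3)
theta_opt = gradient_descent(np.array([0.0]), lambda t: 2 * (t - 3))
print(theta_opt)   # approaches 3.0
```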

15. Why overfitting and underfitting in ML?


Factors determining how well an ML algorithm will perform are its ability to:
1. Make the training error small
2. Make gap between training and test errors small

• They correspond to two ML challenges


Underfitting - Inability to obtain low enough error rate on the training set
Overfitting - Gap between training error and testing error is too large

We can control whether a model is more likely to overfit or underfit by altering its
capacity

16. What is capacity of a model?


Model capacity is the ability to fit a wide variety of functions.
– A model with low capacity struggles to fit the training set.
– A high capacity model can overfit by memorizing properties of the training set that are
not useful on the test set.
• When a model has higher capacity, it tends to overfit. One way to control the capacity of a
learning algorithm is by choosing the hypothesis space,
• i.e., the set of functions that the learning algorithm is allowed to select as being the
solution.

17. How to control the capacity of learning algorithm?


One way to control the capacity of a learning algorithm is by choosing its hypothesis
space, the set of functions that the learning algorithm is allowed to select as being the
solution. For example, the linear regression algorithm has the set of all linear functions
of its input as its hypothesis space. We can generalize linear regression to include
polynomials, rather than just linear functions, in its hypothesis space. Doing so increases
the model’s capacity

18. Define Bayes error.


An ideal model is an oracle that knows the true probability distributions that generate the
data. Even such a model incurs some error due to noise/overlap in the distributions.
The error incurred by an oracle making predictions from the true distribution p(x, y) is
called the Bayes error.

19. Why hyperparameters in ML?


Most ML algorithms have hyperparameters
– We can use them to control algorithm behavior.
– Values of hyperparameters are not adapted by the learning algorithm itself.
• Although, we can design nested learning, where one learning algorithm learns the best
hyperparameters for another learning algorithm.


20. How to solve the overfitting problem caused by learning hyperparameters on the training
dataset?
• To solve the problem, we use a validation set
– Examples that the training algorithm does not observe.
• Test examples should not be used to make choices about the model hyperparameters.
• Training data is split into two disjoint parts
– The first is used to learn the parameters.
– The other is the validation set, used to estimate the generalization error during or after
training, allowing the hyperparameters to be updated.
– Typically, 80% of the training data is used for training and 20% for validation.

21. What are point estimators?


Point estimators are functions that are used to find an approximate value of a population
parameter from random samples of the population. They use the sample data of a population
to calculate a point estimate or a statistic that serves as the best estimate of an
unknown parameter of a population.

Most often, the existing methods of finding the parameters of large populations are
unrealistic. For example, when finding the average age of kids attending kindergarten, it will
be impossible to collect the exact age of every kindergarten kid in the world. Instead, a
statistician can use the point estimator to make an estimate of the population parameter.

22. List the characteristics or Properties of Point Estimators?


The following are the main characteristics of point estimators:
1. Bias
The bias of a point estimator is defined as the difference between the expected value of the
estimator and the value of the parameter being estimated. When the estimated value of the
parameter and the value of the parameter being estimated are equal, the estimator is
considered unbiased.
Also, the closer the expected value of a parameter is to the value of the parameter being
measured, the lesser the bias is.
2. Consistency


Consistency tells us how close the point estimator stays to the value of the parameter as the
sample size increases. The point estimator requires a large sample size for it to be more
consistent and accurate.
You can also check if a point estimator is consistent by looking at its corresponding
expected value and variance. For the point estimator to be consistent, the expected value
should move toward the true value of the parameter.
3. Most efficient or unbiased
The most efficient point estimator is the one with the smallest variance of all the unbiased
and consistent estimators. The variance measures the level of dispersion from the estimate,
and the smallest variance should vary the least from one sample to the other.

23. Define Stochastic Gradient Descent with merits and demerits.


Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm used for
optimizing machine learning models. In this variant, only one random training example is
used to calculate the gradient and update the parameters at each iteration. Here are some of
the advantages and disadvantages of using SGD:
Advantages of Stochastic Gradient Descent
Speed: SGD is faster than other variants of Gradient Descent such as Batch Gradient
Descent and Mini-Batch Gradient Descent since it uses only one example to update the
parameters.
Memory Efficiency: Since SGD updates the parameters for each training example one
at a time, it is memory-efficient and can handle large datasets that cannot fit into
memory.
Avoidance of Local Minima: Due to the noisy updates in SGD, it has the ability to
escape from local minima and converges to a global minimum.

Disadvantages of Stochastic Gradient Descent


Noisy updates: The updates in SGD are noisy and have a high variance, which can
make the optimization process less stable and lead to oscillations around the minimum.

Slow Convergence: SGD may require more iterations to converge to the minimum
since it updates the parameters for each training example one at a time.

Sensitivity to Learning Rate: The choice of learning rate can be critical in SGD since
using a high learning rate can cause the algorithm to overshoot the minimum, while a
low learning rate can make the algorithm converge slowly.


Less Accurate: Due to the noisy updates, SGD may not converge to the exact global
minimum and can result in a suboptimal solution. This can be mitigated by using
techniques such as learning rate scheduling and momentum-based updates
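A minimal sketch of how SGD differs from batch gradient descent, assuming a simple linear regression with squared error; the toy data, learning rate and epoch count are illustrative only, not part of the original answer.

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=50):
    """Stochastic gradient descent: update weights using ONE random example per step."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):     # shuffle each epoch
            error = X[i] @ w - y[i]                 # prediction error on a single example
            w -= lr * error * X[i]                  # gradient of 0.5*error^2 w.r.t. w
    return w

# Toy usage: y = 1 + 2*x (bias folded in as a column of ones)
X = np.c_[np.ones(100), np.random.rand(100)]
y = 1 + 2 * X[:, 1]
print(sgd_linear_regression(X, y))   # roughly [1.0, 2.0]
```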

24. What is a deep feedforward network?


In a feedforward network, the information moves only in the forward direction, from the
input layer, through the hidden layers (if they exist), and to the output layer. There are no
cycles or loops in this network. Feedforward neural networks are sometimes
ambiguously called multilayer perceptrons.

25. What is the working principle of a feed forward neural network?

When the feed forward neural network gets simplified, it can appear as a single layer
perceptron.
This model multiplies inputs with weights as they enter the layer. Afterward, the
weighted input values get added together to get the sum. As long as the sum of the
values rises above a certain threshold, set at zero, the output value is usually 1, while if it
falls below the threshold, it is usually -1.
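A minimal sketch of this working principle, assuming NumPy; the weights, inputs and zero threshold are made up for illustration.

```python
import numpy as np

def perceptron_output(x, w, threshold=0.0):
    """Single-layer perceptron step: weighted sum of inputs, then a hard threshold."""
    s = np.dot(w, x)                    # multiply inputs with weights and sum
    return 1 if s > threshold else -1   # above threshold -> 1, below -> -1

print(perceptron_output(np.array([0.5, -0.2]), np.array([0.8, 0.4])))   # -> 1
```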

26. What are the Layers of feed forward neural network?
A feed forward neural network consists of an input layer, one or more hidden layers, and an
output layer.


27. Brief on classification of activation function.


An activation function can be classified into three major categories: sigmoid, Tanh, and
Rectified Linear Unit (ReLU).
 Sigmoid:

Input values are mapped to output values between 0 and 1.

 Tanh:

Input values are mapped to output values between -1 and 1.


 Rectified linear Unit:

Only positive values are allowed to flow through this function. Negative values get
mapped to 0.
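The three activation functions can be sketched as follows (illustrative NumPy implementations, not from the original text; the sample inputs are arbitrary).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes inputs into (0, 1)

def tanh(x):
    return np.tanh(x)                 # squashes inputs into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)         # negative inputs are mapped to 0

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```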
28. What is Regularization?
Regularization is a technique used in machine learning and deep learning to prevent
overfitting and improve the generalization performance of a model. It involves adding a
penalty term to the loss function during training. This penalty discourages the model
from becoming too complex or having large parameter values, which helps in
controlling the model’s ability to fit noise in the training data. Regularization methods
include L1 and L2 regularization, dropout, early stopping, and more.

29. What is dropout in neural network?


Dropout is a regularization technique used in neural networks to prevent overfitting.
During training, a random subset of neurons is “dropped out” by setting their outputs to
zero with a certain probability. This forces the network to learn more robust and
independent features, as it cannot rely on specific neurons. Dropout improves
generalization and reduces the risk of overfitting.
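A minimal sketch of (inverted) dropout applied to a layer's activations, assuming NumPy; the drop probability and the all-ones activations are illustrative.

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout sketch: randomly zero units and rescale the survivors."""
    if not training:
        return activations                        # no dropout at inference time
    mask = (np.random.rand(*activations.shape) >= p_drop)
    return activations * mask / (1.0 - p_drop)    # rescale to preserve the expected value

h = np.ones((2, 4))
print(dropout(h, p_drop=0.5))
```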

30. Difference between regularization and optimization.


The main conceptual difference is that optimization is about finding the set of
parameters/weights that maximizes/minimizes some objective function (which can also
include a regularization term), while regularization is about limiting the values that the
parameters can take during optimization/learning/training. Optimization with
regularization (especially with L1 and L2 regularization) can therefore be thought of as
constrained optimization, but in some cases, such as dropout, regularization can also be
thought of as a way of introducing noise into the training process.


31. How does splitting a dataset into train, dev and test sets help identify overfitting?
• Overfitting: the model fits the training set so much that it does not generalize well.
• Low training error and high dev error can be used to identify this
• Must ensure that the distribution of train and dev is the same/similar!

PART B
1. Develop short notes on following with respect to deep learning with
Examples.
i) Scalar and Vectors. (6)
ii) Matrices. (7)
2. Explicate Probability Mass function and Probability Density function (13)
3. Describe Gradient-based optimization in deep learning.
4. Explain in detail the linear regression machine learning algorithm. (13)
5. Describe Stochastic Gradient Descent in detail. (13)
6. Explain in detail on different regularization technique in Deep learning? (13)
7. Brief how does regularization help reduce overfitting? (13)
8. Analyse and write short notes on Dataset Augmentation. (13)
9. Point out and explain different set of layers in Feed forward networks.
10. Describe Deep feed forward networks with neat diagram. (13)

PART C
1. Assess the following with respect to deep learning examples.
i) Random Variables. (6)
ii) Probability. (7)
2. Explain briefly on Estimators, Bias and Variance that are useful for generalization,
underfitting and overfitting.
3. Briefly explain an example of a fully functioning feed forward network on a simple
task.
4. Assess the difference between linear models and neural networks. (15)


UNIT II
CONVOLUTIONAL NEURAL NETWORKS
Convolution Operation -- Sparse Interactions -- Parameter Sharing -- Equivariance -- Pooling --

Convolution Variants: Strided -- Tiled -- Transposed and dilated convolutions; CNN Learning:

Nonlinearity Functions -- Loss Functions -- Regularization -- Optimizers -- Gradient

Computation.

Part A

1. What is convolutional neural network?


A Convolutional Neural Network (CNN) is a type of Deep Learning neural network
architecture commonly used in Computer Vision. Computer vision is a field of Artificial
Intelligence that enables a computer to understand and interpret images or visual data.

2. What are the three types of layers in neural network?


In a regular Neural Network there are three types of layers:
Input Layers: It’s the layer in which we give input to our model. The number of
neurons in this layer is equal to the total number of features in our data (number of
pixels in the case of an image).
Hidden Layer: The input from the Input layer is then fed into the hidden layer. There
can be many hidden layers depending upon our model and data size. Each hidden layer
can have different numbers of neurons which are generally greater than the number of
features. The output from each layer is computed by matrix multiplication of output of
the previous layer with learnable weights of that layer and then by the addition of
learnable biases followed by activation function which makes the network nonlinear.
Output Layer: The output from the hidden layer is then fed into a logistic function like
sigmoid or softmax which converts the output of each class into the probability score of
each class.

3. Define feedforward and Backpropagation.


The process in which data is fed into the model and the output of each layer is obtained
is called feedforward. We then calculate the error using an error function; some
common error functions are cross-entropy, squared loss error, etc. The error function
measures how well the network is performing. After that, we backpropagate into the

model by calculating the derivatives. This step is called Backpropagation which


basically is used to minimize the loss.

4. What is convolution operation with the representation of equation?


Applying a weighted average operation at every moment with respect to time, a new
estimated function s is obtained:
s(t) = ∫ x(a) w(t − a) da
This operation is called convolution. The convolution operation is denoted with an asterisk:
s(t) = (x ∗ w)(t)
where w is a valid probability density function, and w needs to be 0 for all negative arguments.
In general, convolution is defined for any functions for which the above integral is defined, and
may be used for other purposes besides taking weighted averages.
In convolutional network terminology, the first argument (in this example, the function x) to
the convolution is often referred to as the input and the second argument (in this example, the
function w) as the kernel. The output is sometimes referred to as the feature map.
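For CNN layers, the discrete 2D analogue of this operation can be sketched as below (assuming NumPy; note that most deep learning libraries actually implement cross-correlation, i.e. the kernel is not flipped). The 5x5 input and 2x2 filter mirror the sparse-connectivity example later in this unit.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution as used in CNN layers (implemented as cross-correlation)."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum of the input window under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
print(conv2d(image, kernel).shape)   # (4, 4): a 5x5 input with a 2x2 filter gives a 4x4 feature map
```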
5. What are the motivations for convolution?
1. Sparse interactions
2. Parameter Sharing
3. Equivariant representations.
6. Define Sparse Connectivity or Sparse interactions.
A convolution layer defines a window (filter or kernel) through which it examines a subset of the
data, and subsequently scans the data looking through this window. This is what we call sparse
connectivity, sparse interactions or sparse weights. In effect, it limits the activated
connections at each layer. In the example below, a 5x5 input with a 2x2 filter produces a
reduced 4x4 output. The first element of the feature map is calculated by the convolution of the
input area with the filter, i.e.

Apply 2x2 filter to the input and get the first convolutional layer (a feature map)

First element of the feature map


7. How many filters use at each layer?


1) The number of filters is a hyperparameter called the depth of the output volume.

2) Another hyperparameter is the stride, which defines how much we slide the
filter over the data. For example, if the stride is 1 then we move the window
by 1 pixel at a time over the image, when our input is an image. When
we use larger stride values of 2 or 3, we jump 2 or 3 pixels at a
time. This reduces the output size significantly.

3) The last hyperparameter is the size of zero-padding, since it is sometimes
convenient to pad the input volume with zeros around the border.

8. Write the formula to find how many neurons fit for a network?
We compute the spatial size of the output volume as a function of the input volume size (W),
the receptive field size of the Conv Layer neurons (F), the stride with which they are applied
(S), and the amount of zero padding used (P) on the border. The formula for calculating how
many neurons "fit" is given by
(W − F + 2P)/S + 1
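A small illustrative helper for this formula (the example values of W, F, S and P are arbitrary):

```python
def conv_output_size(W, F, S, P):
    """Number of neurons that 'fit' along one spatial dimension of the output volume."""
    return (W - F + 2 * P) // S + 1

# e.g. a 7x7 input, 3x3 filter, stride 1, padding 0 -> 5 neurons per row/column
print(conv_output_size(W=7, F=3, S=1, P=0))   # 5
```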

9. Why parameter sharing is used in CNN?


Parameter sharing is used in the convolutional layers to reduce the number of parameters
in the network. For example in the first convolutional layer let’s say we have an output
of 15x15x4 where 15 is the size of the output and 4 the number of filters used in this
layer. For each output node in that layer we have the same filter, thus reducing
dramatically the storage requirements of the model to the size of the filter.

The same filter (weights) (1, 0, -1) is used across the whole layer.
10. What is equivariant representations?


Equivariant means varying in a similar or equivalent proportion. Equivariance to
translation means that a translation of input features results in an equivalent translation
of outputs. It helps the CNN handle shifts (and, loosely, rotation or proportion changes).
Equivariance allows the network to generalize edge, texture and shape detection to different
locations.

11. Why pooling layer in CNN?


A pooling layer is another building block of a CNN. Its function is to
progressively reduce the spatial size of the representation in order to reduce the network
complexity and computational cost.

12. What are the two types of pooling widely used?


There are two types of widely used pooling in CNN layer:

Max Pooling
Average Pooling
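A minimal sketch of both pooling types on a 2D feature map, assuming NumPy; the window size, stride and sample values are illustrative only.

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Max or average pooling over strided windows of a 2D feature map."""
    H, W = feature_map.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(fm, mode="max"))      # 2x2 map of window maxima
print(pool2d(fm, mode="average"))  # 2x2 map of window means
```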

13. Outline the problem arise due to convolution.


1. Every time a convolution operation is applied, the original image size shrinks.
2. The second issue is that, when the kernel moves over the original image, it touches
the edge of the image fewer times and the middle of the image more times (with
overlaps in the middle). So the features on the corners or edges of an image are not
used much in the output.

14. Define Padding and Stride.


 Padding preserves the size of the original image.
 Stride is the number of pixels the filter shifts over the input matrix. For padding p, filter
size f ∗ f, input image size n ∗ n and stride s, the output image dimension will be
[(n + 2p − f)/s + 1] ∗ [(n + 2p − f)/s + 1].

15. What is the difference between normal convolution and transposed convolution?
Traditional convolution determines each output value as the dot product between the filter
and the input; by moving the filter kernel two pixels at every step, the input is
downsampled by a factor of two. For transposed convolution, the input value determines the
filter values that will be written to the output.

16. What is a transposed convolution?


Transposed convolutions are standard convolutions but with a modified input feature
map. The stride and padding do not correspond to the number of zeros added around the


image and the amount of shift in the kernel when sliding it across the input, as they
would in a standard convolution operation.

17. What does non-linearity mean?


It means that the neural network can successfully approximate functions that do not
follow linearity or it can successfully predict the class of a function that is divided by a
decision boundary which is not linear.

18. What is linear and non-linear in deep learning?


Linearity refers to the property of a system or model where the output is directly
proportional to the input, while nonlinearity implies that the relationship between input
and output is more complex and cannot be expressed as a simple linear function.

19. What is the loss function in CNN machine learning?


A loss function is a function that compares the target and predicted output values; measures
how well the neural network models the training data. When training, we aim to minimize this
loss between the predicted and target outputs.
20. List any two loss function.
1. Regression

MSE(Mean Squared Error)


MAE(Mean Absolute Error)
Huber loss

2. Classification

Binary cross-entropy
Categorical cross-entropy

21. Differentiate loss function and cost function.


Loss Function:

A loss function/error function is for a single training example/input.

Cost Function:

A cost function, on the other hand, is the average loss over the entire training dataset.

22. Give details on some of regularization techniques used in CNN?


L1 and L2 Regularization (Weight Decay), Dropout, Batch Normalization, Data
Augmentation and Early Stopping.


23. Define Mean Absolute Error (MAE)


Mean absolute error (MAE) also called L1 Loss is a loss function used for regression
problems. It represents the difference between the original and predicted values
extracted by averaging the absolute difference over the data set.
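Illustrative NumPy versions of the two regression losses mentioned above (the sample targets and predictions are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error (L2 loss) for regression."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean Absolute Error (L1 loss): average absolute difference over the dataset."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])
print(mse(y_true, y_pred), mae(y_true, y_pred))
```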

24. What is regularization in CNN?


Regularization is a technique that helps prevent overfitting, which occurs when a neural
network learns too much from the training data and fails to generalize well to new data.

25. What are the commonly used non linearity function using CNN?
1.Rectified Linear Unit (ReLU)
2. Leaky ReLU
3. Sigmoid
4. Hyperbolic Tangent (Tanh)
5. Softmax

26. Why is it important to place non-linearities between the layers of neural networks?
Non-linearity introduces more degrees of freedom to the model. It lets it capture more
complex representations which can be used towards the task at hand. A deep neural
network without non-linearities is essentially a linear regression.

27. Following the last FC-3 layer of your network, what activation must be applied?
Given a vector a = [0.3, 0.3, 0.3], what is the result of using your activation on this
vector?
Softmax is the one that is used as it can output class probabilities. Output is [0.33, 0.33,
0.33]
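A small sketch verifying this result with a numerically stable softmax implementation (assuming NumPy):

```python
import numpy as np

def softmax(a):
    """Softmax: exponentiate (shifted for stability) and normalize to class probabilities."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

print(softmax(np.array([0.3, 0.3, 0.3])))   # [0.333..., 0.333..., 0.333...]
```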

PART B

1. Elucidate how convolutional layers work with schematic representation. (13)


2. Illustrate the convolution operation with example. (13)
3. Explain in detail on computation of 1D and 2D convolution operation. (13)
4. Illustrate with example, why CNN are great performers? (13)
5. Briefly enumerate the pooling function to modify the output layer. (13)
6. Enumerate briefly on ReLU and Sigmoid activation function in detail. (13)
7. Discuss on Hyperbolic tangent and Softmax activation function using CNN. (13)
8. Explain different types of loss functions in deep learning. (13)
9. Explain in detail on different regularization technique using CNN? (13)
10. Give an overview of how gradient computation works in CNNs


PART C
1. Explain how to build a CNN model from scratch for any real time application.
2. Analyse, why to use Adam optimizer for CNN model for training purpose than to use
Gradient Descent or Stochastic Gradient Descent?
3. With an example explain the layers of CNN by running a convnet on an image of
dimension 32x32x3.
4. Consider an image and apply the convolution layer, activation layer, and pooling
layer operations to extract the features inside it.

UNIT III
DEEP LEARNING ALGORITHMS FOR AI
Artificial Neural Networks – Linear Associative Networks – Perceptrons - The
Backpropagation Algorithm - Hopfield Nets - Boltzmann Machines - Deep RBMs - Variational
Autoencoders - Deep Backprop Networks - Autoencoders

PART A
1. Draw a simplified taxonomy of artificial neural network.

2. What is an Artificial Neural Network?


An artificial neural network is a set of interconnected model neurons or units. These
units are arranged in a series of layers that together constitute the whole Artificial Neural
Network in a system. A layer can have only a dozen units or millions of units, depending
on how complex the neural network needs to be to learn the hidden patterns in the
dataset.

3. Define the layers of ANN.


Artificial Neural Network has an input layer, an output layer as well as hidden layers.
The input layer receives data from the outside world which the neural network needs to
analyze or learn about. Then this data passes through one or multiple hidden layers that
transform the input into data that is valuable for the output layer. Finally, the output
layer provides an output in the form of a response of the Artificial Neural Networks to
input data provided.


4. How the units are interconnected in ANN?


In the majority of neural networks, units are interconnected from one layer to another.
Each of these connections has weights that determine the influence of one unit on
another unit. As the data transfers from one unit to another, the neural network learns
more and more about the data which eventually results in an output from the output
layer.

5. Schematic representation of ANN.

6. What are the types of Artificial neuron network?


Feed forward neural network, Convolutional neural network, Modular neural
network, Radial basis function Neural Network, Recurrent neural network
7. List the application of ANN.
Social Media, Marketing and sales, Health Care, Personal Assistants,

8. Differentiate feed forward neural network and Recurrent neural network.


9. Define Linear Associative network (LAN).


LANs are a type of artificial neural network proposed by Donald Hebb. They consist of
input nodes, output nodes, and connection weights. LANs use the Hebbian learning rule
to associate specific patterns of input activity with specific output responses. However,
LANs have limited representational power compared to more complex neural network
architectures.

10. Define Perceptron’s.


Perceptrons are a specific type of artificial neural network, which can be considered as a
building block for more complex networks. A perceptron consists of a single layer of
artificial neurons (also called perceptrons) connected to the input nodes. Each
connection between the input nodes and the perceptrons has an associated weight. The
perceptrons apply a nonlinear activation function (typically a step function) to the
weighted sum of their inputs, producing an output. Perceptrons can be used for binary
classification tasks, where they learn to separate input patterns into two categories.

11. What is backpropagation algorithm in ANN?


The backpropagation algorithm is a learning algorithm used to train multilayer neural
networks, including perceptrons. It allows the network to learn from labeled training
data and adjust the connection weights to minimize the difference between the network's
predicted output and the desired output. The algorithm works by propagating the error
backward through the network, calculating the gradient of the error with respect to the
weights, and then updating the weights accordingly using gradient descent.
Backpropagation enables the network to learn complex mappings between inputs and
outputs by iteratively adjusting the weights based on the error signal.
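A minimal sketch of backpropagation on a tiny 2-4-1 sigmoid network trained on XOR, assuming NumPy; the architecture, loss, learning rate and epoch count are illustrative choices, not from the question bank.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_xor(epochs=10000, lr=0.5, seed=0):
    """Backpropagation sketch: forward pass, error, backward pass, weight update."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 1))
    b1, b2 = np.zeros((1, 4)), np.zeros((1, 1))
    for _ in range(epochs):
        # forward pass through hidden and output layers
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass: propagate the error and compute gradients
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
        W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)
    return out

print(train_xor().round(2))   # typically approaches [[0], [1], [1], [0]]
```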

12. Brief about the relationship between LAN, Perceptron’s and backpropagation
algorithm.
The relationship between these concepts are, LANs are an early type of neural network
model that inspired the development of more complex architectures like perceptrons.
Perceptrons, in turn, laid the foundation for multilayer neural networks, which are
trained using the backpropagation algorithm to learn complex mappings between inputs


and outputs. The backpropagation algorithm revolutionized the field of neural networks
and played a crucial role in their widespread adoption and success in various
applications.

13. What is perceptron and its types?


A perceptron is a single-layer neural network, while a multi-layer perceptron is called a
neural network. The perceptron is a linear (binary) classifier. Also, it is used in supervised
learning. It helps to classify the given input data.

14. What are the components of perceptron?


Input layer, weights, Bias, Activation function, output, training algorithm.

15. Name the types of perceptron.


Single layer: Single layer perceptron can learn only linearly separable patterns.
Multilayer: Multilayer perceptrons have two or more layers and therefore greater
processing power.
The Perceptron algorithm learns the weights for the input signals in order to draw a
linear decision boundary.

16. What is a backpropagation algorithm in a neural network?


Artificial neural networks use backpropagation as a learning algorithm to compute a
gradient descent with respect to weight values for the various inputs. By comparing
desired outputs to achieved system outputs, the systems are tuned by adjusting
connection weights to narrow the difference between the two as much as possible.

17. Determine the time complexity of a backpropagation algorithm?


The time complexity of each iteration -- how long it takes to execute each statement in
an algorithm -- depends on the network's structure. For a multilayer perceptron, matrix
multiplications dominate the running time.

18. What is backpropagation algorithm for multilayer artificial neural networks?

The backpropagation algorithm performs learning on a multilayer feed-forward neural


network. It iteratively learns a set of weights for prediction of the class label of tuples. A
multilayer feed-forward neural network consists of an input layer, one or more hidden
layers, and an output layer.


19. What is Hopfield Networks?


The Hopfield Neural Network, invented by Dr John J. Hopfield, consists of one layer of
'n' fully connected recurrent neurons. It is generally used for performing auto-association
and optimization tasks. It is computed using a converging interactive process and it
generates a different response than our normal neural nets.

20. What is Boltzmann machine in deep learning?

A deep Boltzmann machine is a model with more hidden layers with directionless
connections between the nodes. A DBM learns features hierarchically from the raw data,
and the features extracted in one layer are applied as hidden variables as input to the
subsequent layer.

21. What are the types of Boltzmann machines?


 Restricted Boltzmann Machines (RBMs)
 Deep Belief Networks (DBNs)
 Deep Boltzmann Machines (DBMs)
22. What is the two-step procedure for deep RBM in training process?
The training process for deep RBMs involves a two-step procedure known as pretraining
and fine-tuning.

23. What is the difference between autoencoders and deep autoencoders?


An autoencoder is basically a technique to find the fundamental features representing the
input images. A simple autoencoder will have 1 hidden layer between the input and output,
whereas a deep autoencoder will have multiple hidden layers (the number of hidden layers
depends on your configuration).

24. What is an autoencoder?


An autoencoder is a type of artificial neural network used to learn data encodings in an
unsupervised manner.

The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for


a higher-dimensional data, typically for dimensionality reduction, by training the
network to capture the most important parts of the input image.
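A minimal sketch of such an autoencoder, assuming TensorFlow/Keras is available; the input dimension (e.g. flattened 28x28 images), code size, layer widths and placeholder data are illustrative choices, not from the original text.

```python
import numpy as np
from tensorflow import keras

input_dim, code_dim = 784, 32                      # e.g. flattened 28x28 images -> 32-d code

autoencoder = keras.Sequential([
    keras.Input(shape=(input_dim,)),
    keras.layers.Dense(128, activation="relu"),            # encoder
    keras.layers.Dense(code_dim, activation="relu"),       # low-dimensional representation
    keras.layers.Dense(128, activation="relu"),            # decoder
    keras.layers.Dense(input_dim, activation="sigmoid"),   # reconstruction of the input
])
autoencoder.compile(optimizer="adam", loss="mse")

X = np.random.rand(256, input_dim).astype("float32")       # placeholder training data
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)  # the input is also the target
```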


25. What are the types of autoencoders?


Here are five popular autoencoders:
Undercomplete autoencoders
Sparse autoencoders
Contractive autoencoders
Denoising autoencoders
Variational Autoencoders (for generative modelling)

PART B
1. Differentiate Artificial neurons with Biological neurons (13)
2. How do the artificial neurons work? Explain with neat diagram (13)
3. Explain with diagram the architecture of linear associative network (13)
4. Describe how does perceptron works in artificial neural network. (13)
5. Explain in detail on Perceptron function, inputs, activation function and outputs of
perceptron (13)
6. How backpropagation algorithm work for neural network in detail? (13)
7. Why backpropagation algorithm for neural network in detail? (13)
8. Elucidate briefly on structure and architecture of Hopfield network. (13)
9. Describe in detail on Deep Restricted Boltzmann Machines (RBMs). (13)
10. Explain in detail on the architecture of autoencoders and how to train autoencoders?
(13)

PART C

1. Develop an ANN model for perceptron to solves a linear classification problem.(15)


2. Consider the following neural network, analyse how back propagation algorithm work?
(15)


3. Consider the following problem. We are required to create Discrete Hopfield Network
with bipolar representation of input vector as [1 1 1 -1] or [1 1 1 0] (in case of binary
representation) is stored in the network. Test the hopfield network with missing entries
in the first and second component of the stored vector (i.e. [0 0 1 0]). (15)
4. Assess the working principle of how restricted Boltzmann machines work with suitable
example. (15)

UNIT IV
DATA SCIENCE AND DEEP LEARNING

Data science fundamentals and responsibilities of a data scientist - life cycle of data science –
Data science tools - Data modeling, and featurization - How to work with data variables and
data science tools - How to visualize the data - How to work with machine learning algorithms
and Artificial Neural Networks

PART A

1. What is Data Science?


Data science is the science of analyzing raw data using statistics and machine learning
techniques with the purpose of drawing conclusions about that information.

2. Key Pillars of Data Science


 Domain Knowledge
 Math Skills
 Computer Science
 Communication Skill

3. What are Data Science Processes?

 Setting the Research Goal


 Retrieving Data
 Data Preparation
 Data Exploration
 Data Modeling
 Presentation and Automation

4. Who is data scientist?

A data scientist is someone who integrates the skills of a software programmer, statistician
and storyteller/artist to extract the nuggets of gold hidden under mountains of data.

5. Give the Roles & Responsibilities of a Data Scientist

 Management
 Analytics
 Strategy/Design
 Collaboration
 Knowledge

6. Difference Between Data Scientist, Data Analyst, and Data Engineer

Data Scientist:
 Focus is on the futuristic display of data.
 Presents both supervised and unsupervised learning of data, e.g. regression and
classification of data, neural networks, etc.
 Skills required: Python, R, SQL, Pig, SAS, Apache Hadoop, Java, Perl, Spark.

Data Analyst:
 Main focus is on optimization of scenarios, for example how an employee can enhance
the company's product growth.
 Performs data formation and cleaning of raw data, and interprets and visualizes data to
perform the analysis and produce a technical summary of the data.
 Skills required: Python, R, SQL, SAS.

Data Engineer:
 Focuses on optimization techniques and the construction of data in a conventional
manner; the purpose of a data engineer is to continuously advance data consumption.
 Frequently operates at the back end; optimized machine learning algorithms are used
for keeping data prepared as accurately as possible.
 Skills required: MapReduce, Hive, Pig, Hadoop.

7. Primary motives for the use of Data science technology.

 It helps to convert large quantities of raw and unstructured records into meaningful
insights.
 It can assist in making predictions, for example in surveys, elections, etc.
 It also helps in automating transportation, such as developing self-driving cars, which
we can say is the future of transportation.
 Companies are shifting towards Data Science and opting for this technology. Amazon,
Netflix, etc., which cope with huge quantities of data, use data science algorithms for a
better customer experience.

8. Draw the life cycle of Data Science


9. Tools used in data science?

 Apache Hadoop

 SAS (Statistical Analysis System)

 Apache Spark

 Data Robot

 Tableau

 BigML

 TensorFlow

 Jupyter

10. What is Data modelling?

Modeling techniques are among the most rewarding processes in data science and have
become a center of attention for data learners. Data modeling is not just about applying a
function from one package class to the available data; there is more to it than that.

11. What is Data Featurization?


It is a process that converts a nested JSON object (or other raw data) into a vector of
scalar values, which is the basic requirement for the analysis process.

12. What is JavaScript Object Notation (JSON)

JSON is a lightweight data format through which machines can easily write and read. The
main reason for using JSON is that it can interact easily and robustly with different languages
(platforms) such as JavaScript, R, Python, etc. Software that interacts with stored data
commonly uses JSON as the storage and interchange format.

13. What is data visualization?

Data visualization is the graphical or visual representation of data. It helps to highlight


the most useful insights from a dataset, making it easier to spot trends, patterns, outliers, and
correlations.
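A minimal visualization sketch using matplotlib; the monthly sales figures are made up purely for illustration.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 150, 145, 170]

plt.plot(months, sales, marker="o")   # a line chart makes the trend easy to spot
plt.title("Monthly sales")
plt.xlabel("Month")
plt.ylabel("Units sold")
plt.show()
```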

14. What are the two main types of data visualization?

a) Exploration:

Investigate the dataset and identify some of its main features, laying the
foundation for more thorough analysis. At this stage, visualizations can make it
easier to get a sense of what’s in your dataset and to spot any noteworthy trends or
anomalies.

b) Explanation

Once you’ve conducted your analysis and have figured out what the data is
telling you, you’ll want to share these insights with others—key business
stakeholders who can take action based on the data.

15. Write the advantages and benefits of effective data visualization?

 Get an initial understanding of your data by making trends, patterns, and outliers
easily visible to the naked eye

 Comprehend large volumes of data quickly and efficiently


 Communicate insights and findings to non-data experts, making your data accessible
and actionable

 Tell a meaningful and impactful story, highlighting only the most relevant
information for a given context

16. What are the five data visualization categories?

 Temporal data visualizations

 Hierarchical visualizations

 Network visualizations

 Multidimensional or 3D visualizations

 Geospatial visualizations

17. Write the common general types of data visualization?

 Charts

 Tables

 Graphs

 Maps

 Infographics

 Dashboards

18. What are the rules of Data Visualization?

 Keep it simple

 Add white space

 Use purposeful design principles

 Focus on these three elements

 Make it easy to compare data

 Blend your data sources

19. What are the similarities between data science and AI?

 Both rely on large amounts of data to be effective.


 Both use statistical techniques to analyze data and extract insights.

 Both are interdisciplinary fields that draw from computer science, mathematics, and
statistics.

20. Advantages of AI?

Artificial Intelligence can automate repetitive and time-consuming tasks, improve


efficiency, and reduce human error. It can also analyze large amounts of data quickly and
accurately, and provide personalized recommendations and insights. Artificial Intelligence
has the potential to transform many industries, including healthcare, transportation, and
finance.

21. What is Featurization in data science?

Featurization is the process to convert varied forms of data to numerical data which can
be used for basic ML algorithms. Data can be text data, images, videos, graphs, various
database tables, time-series, categorical features, etc.
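A small featurization sketch, assuming pandas; the categorical column and values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Chennai", "Mumbai", "Chennai"], "age": [25, 32, 41]})
features = pd.get_dummies(df, columns=["city"])   # one-hot encode the categorical 'city' column
print(features)
# columns: age, city_Chennai, city_Mumbai (dummy indicators ready for basic ML algorithms)
```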

22. How do artificial neural networks work in machine learning?

A neural network is a method in artificial intelligence that teaches computers to process


data in a way that is inspired by the human brain. It is a type of machine learning
process, called deep learning, that uses interconnected nodes or neurons in a layered
structure that resembles the human brain.

23. How do AI and machine learning work together?

Machine learning is a subset of artificial intelligence that automatically enables a


machine or system to learn and improve from experience. Instead of explicit
programming, machine learning uses algorithms to analyze large amounts of data, learn
from the insights, and then make informed decisions.

24. What is the application of AI in machine learning?


Technologies that come under the umbrella of AI include machine learning and deep
learning. Machine learning enables software applications to become more accurate at
predicting outcomes without being explicitly programmed to do so. Machine learning
algorithms use historical data as input to predict new output values.

25. What are the four types of machine learning algorithms?


There are four types of machine learning algorithms: supervised, semi-supervised,
unsupervised and reinforcement.

PART – B

1. Explicate in detail about data science life cycle with neat diagram? (13)

2. Briefly explain the most commonly used data science tools with example? (13)

3. Explain the roles and responsibilities of data scientist? (13)

4. How to visualize the data and explain the visualization techniques with suitable example.
(13)

5. Compare and Contrast on Machine learning and Artificial Neural Network in detail. (13)

6. How to work with data variables and data science tools with example? (13)

7. How do the artificial neurons work? Explain with neat diagram (13)

8. Discuss step by step working of the Artificial Neural Network in detail. (13)

PART C

1. Analyze why data visualization is so important in Data Science? (15)

2. Implement a model using featurization to add information to a dataset to enrich its
performance and accuracy. (15)


3. Explicate the steps involved for machine learning using algorithms that automatically
help the system to gather and use data to learn more with real time example. (15)

UNIT V
APPLICATIONS OF DEEP LEARNING

Detection in chest X-ray images - object detection and classification - RGB and depth image
fusion - NLP tasks - dimensionality estimation - time series forecasting - building electric


power grid for controllable energy resources - guiding charities in maximizing donations and
robotic control in industrial environments.

1. What is object detection?

Object detection is a profound computer vision technique that focuses on identifying


and labeling objects within images, videos, and even live footage. Object detection models
are trained with a surplus of annotated visuals in order to carry out this process with new
data. It becomes as simple as feeding input visuals and receiving a fully marked-up output
visual.

2. What are the key components of object detection?

 Bounding box which identifies the edges of the object tagged with a clear-cut
quadrilateral — typically either a square or rectangle.

 Label of the object, whether it be a person, a car, or a dog to describe the target
object. Bounding boxes can overlap to showcase multiple objects in a given shot as
long as the model has prior knowledge of items it is tagging.

3. What is image classification?

This is the prediction of the class of an item in an image. Image classification can show
that a particular object exists in the image, but it involves one primary object and does not
provide the location of the object within the visual.

4. What is image segmentation?

It is the task of grouping pixels with comparable properties together instead of


bounding boxes to identify objects.

5. What is Object localization?

Object localization seeks to identify the location of one or more objects in an image,
whereas object detection identifies all objects and their borders without much focus on
placement.

6. Types of object detection algorithms and methods used in Deep Learning?


 R-CNN

 Fast R-CNN

 Faster R-CNN

 YOLO (You Only Look Once)

7. What is NLP?

Natural language processing (NLP) deals with building computational algorithms to


automatically analyze and represent human language. NLP is also useful to teach machines
the ability to perform complex natural language related tasks such as machine translation
and dialogue generation.

8. Write the Deep Learning applications of NLP.

 Tokenization and Text Classification

 Generating Captions for Images

 Speech Recognition

 Machine Translation

 Question Answering (QA)

 Document Summarization

9. What is time series forecasting?

Time series forecasting is a technique for the prediction of events through a sequence of
time. It predicts future events by analyzing the trends of the past, on the assumption that
future trends will hold similar to historical trends. It is used across many fields of study in
various applications including: Astronomy.
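A minimal sketch of how a univariate series is typically framed for forecasting with a neural network (sliding windows of past values as inputs, the next value as the target), assuming NumPy; the sample series and window length are illustrative.

```python
import numpy as np

def make_windows(series, n_lags=3):
    """Frame a univariate series as supervised pairs: past n_lags values -> next value."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])   # the recent history (inputs)
        y.append(series[t])              # the value to forecast (target)
    return np.array(X), np.array(y)

series = np.array([10, 12, 13, 15, 18, 21, 25], dtype=float)
X, y = make_windows(series, n_lags=3)
print(X.shape, y.shape)   # (4, 3) (4,)
```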

10. Mention the deep learning architectures specialized in time series forecasting?

 N-BEATS (ElementAI)

 DeepAR (Amazon)

 Spacetimeformer


 Temporal Fusion Transformer or TFT

11. Draw the architecture diagram for N-BEATS?

12. What are the advantages of N-BEATS model?

 Expressive and easy to use

 Multiple time-series

 Doubly Residual Stacking

 Interpretability

13. What is the core idea of Spacetimeformer model?

The model would consider both temporal and spatial relationships. This is the core idea
of Spacetimeformer.

14. What is the difference between depth image and RGB image?

An RGB-D image provides a per-pixel depth information aligned with corresponding


image pixels. An image formed through depth information is an image channel in which
each pixel relates to a distance between the image plane and the corresponding object in the
RGB image.

15. What is the depth of RGB image?


The RGB image represents a 24-bit integer value for each pixel with a fixed resolution of
1/256 of a millimeter. Each bit of blue represents 1/256 millimeter. Each bit of green
represents 1 millimeter. Each bit of red represents 256 millimeters

16. What are RGB images used for?

The main purpose of the RGB color model is for the sensing, representation, and display of
images in electronic systems, such as televisions and computers, though it has also been used
in conventional photography.

17. What is dimensionality estimation?

Dimensionality estimation in deep learning refers to the process of determining the


effective number of dimensions or features required to represent the data accurately. It aims
to identify the intrinsic complexity or structure of the data that can be effectively captured
by a deep learning model.

18. What are the approaches used for dimensionality estimation in deep learning?

Principal Component Analysis (PCA) (see the sketch below)
Variational autoencoders
Random projections
Information-theoretic measures
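A minimal PCA-based sketch of estimating the effective dimensionality, assuming scikit-learn; the random placeholder data and the 95% variance threshold are illustrative choices.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)                       # placeholder dataset: 200 samples, 10 features
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_dims = int(np.argmax(cumulative >= 0.95)) + 1   # components needed for 95% of the variance
print("Estimated effective dimensionality:", n_dims)
```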

19. List the advantages of DEEP AR model.

Multiple time Series


Rich sets of inputs
Automatic Scaling


Probabilistic forecasting

20. Can deep learning be used in robotics?

The main drive behind the use of deep learning in robotics is that it is more general than
any other learning algorithm. It has been shown that deep networks are capable of
representation and abstraction at a high level.

21. How are industrial robots controlled?

The arm has a controller which is the “brain” of the system. The controller holds the
programming code and receives signals from the system (input), processes the signals, and
then sends signals out to the system (output) to control the robot.

22. Which algorithm is used in robotics and industry?

The PRM (Probabilistic Roadmap) algorithm.

PRMs are used in complex planning systems and also to find low-cost paths around
obstacles. They use a random sample of points on the map where a robot can
possibly move, and then the shortest path is calculated.

PART B

1. Explain in detail on how the object is detected and classified using deep learning
concepts.(13)

2. Detailed overview on building electric power grid for controllable energy resources in
deep learning concept. (13)

3. Explain in detail on the process involved for prediction on historical time dependent data
using neural network. (13)


4. Give brief note on (13)

N-BEATS
DeepAR
Spacetimeformer
Temporal Fusion Transformer

5. Explain the role of deep learning in Robotics. (13)

PART C

1. Build a deep learning model for guiding charities in maximizing donations (15)

2. Build a deep learning model for detecting chest X-ray images using TensorFlow (15)

3. Implement a model for object detection of traffic images using python. (15)
