DEEP LEARNING TECHNIQUES

UNIT-1
1. Is AI a science or is it engineering? Or neither or both? Explain.
A. Artificial Intelligence is both a science and an engineering discipline.
Artificial Intelligence (AI) is a field that combines scientific research and
engineering to construct intelligent machines. It draws on work from philosophy,
psychology, computer science, neuroscience, and linguistics.
As a science and an engineering discipline, AI is about the study and building of
"intelligent agents": any system that perceives its environment and takes actions
that maximize its chance of achieving its goals.
It aims to create machines that mimic "cognitive" functions that humans associate
with the human mind, such as "learning" and "problem solving".

2. Write down present and future scope of AI.
A. The scope of AI is vast, encompassing everything from job automation to
personalized learning to cybersecurity.
Voice assistants, image recognition for face unlock on cell phones, and ML-based
financial fraud detection are all examples of AI software that is now in use.
Artificial Intelligence (AI) is evolving rapidly and should have a significant impact on
society in the near future.
The future of AI is likely to be shaped by a combination of technological
advancements, increased investment, and changing societal attitudes towards the
technology.
AI also has wide scope in the scientific sector, where it can revolutionize research and
development. It is well suited to scientific research that involves high data volumes.

3. What are kernel methods in Deep learning? Explain.
A. The kernel method is a mathematical technique used in machine learning for
analyzing data. It uses a kernel function, which maps data from one space to
another (usually higher-dimensional) space.
It is generally used in Support Vector Machines (SVMs), where the algorithm classifies
data by finding the hyperplane that separates the data points of different classes.
The most important benefit of the kernel method is that it can work with data that is
not linearly separable, and it can be used with many different kernel functions.

1. Linear Kernel: It is used when the data is linearly separable.

K(x1, x2) = x1 . x2

2. Polynomial Kernel: It is used when the data is not linearly separable.

K(x1, x2) = (x1 . x2 + 1)^d


3. Gaussian Kernel: The Gaussian kernel is an example of a radial basis function kernel.

k(xi, xj) = exp(-γ ||xi - xj||^2)

4. Exponential Kernel: Similar to the RBF kernel, but it decays much more quickly.

k(x, y) = exp(-||x - y|| / (2σ^2))

5. Laplacian Kernel: Similar to RBF Kernel, it has a sharper peak and faster decay.

k(x, y) = exp(- ||x - y||)

6. Hyperbolic or the Sigmoid Kernel: It is used for non-linear classification problems. It transforms
the input data into a higher-dimensional space using the Sigmoid kernel.

k(x, y) = tanh(x^T y + c)

7. Anova radial basis kernel: It is a multiple-input kernel function that can be used for feature
selection.

k(x, y) = Σ_{k=1..n} exp(-(xk - yk)^2)^d

8. Radial-basis function kernel: It maps the input data to an infinite-dimensional space.

K(x, y) = exp(-γ ||x - y||^2)
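A few of the kernels listed above can be written out directly as a rough sketch. The function names and the example vectors are illustrative assumptions, not part of the original notes; the formulas follow the definitions given in the list.

```python
import numpy as np

def linear_kernel(x, y):
    # K(x, y) = x . y
    return float(np.dot(x, y))

def polynomial_kernel(x, y, d=2):
    # K(x, y) = (x . y + 1)^d
    return float((np.dot(x, y) + 1) ** d)

def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def laplacian_kernel(x, y):
    # K(x, y) = exp(-||x - y||)
    return float(np.exp(-np.linalg.norm(x - y)))

def sigmoid_kernel(x, y, c=0.0):
    # K(x, y) = tanh(x^T y + c)
    return float(np.tanh(np.dot(x, y) + c))

# Hypothetical example vectors
x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0])

for kernel in (linear_kernel, polynomial_kernel, rbf_kernel,
               laplacian_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```

An SVM never needs the mapped features themselves, only these pairwise kernel values, which is why swapping the kernel function changes the decision boundary without changing the training algorithm.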

4. Write down brief history and evolution of AI.
A. Early Beginnings (1940s-1960s): First mathematical model of a neural network
(McCulloch and Pitts, 1943), Dartmouth Summer Research Project (1956)
Rule-Based Expert Systems (1970s-1980s): MYCIN (1976), PROLOG (1972),
Knowledge Engineering (1980s)
Machine Learning (1980s-1990s): Backpropagation (1986), Decision Trees (1980s),
Support Vector Machines (1995)
AI Winter (1984-2000s): Funding decline (1984), Expert systems limitations (1980s)
Resurgence (2000s-present): Deep Learning (2006), Big Data (2000s), Computational
Power (2000s).
Modern AI (2010s-present): Convolutional Neural Networks (2012), Recurrent Neural
Networks (RNNs)
AI's evolution has been shaped by advances in computing, mathematics, and our
understanding of human intelligence.
5. Define in your own words the terms: state, state space, search tree, Search
node.
A. State: A state represents a configuration of the problem you are trying to solve. It
is a snapshot of all the relevant information at a particular point in time. For example,
in a chess game, a state would be the positions of all the pieces on the board.
State Space: The state space is the set of all possible states that can be reached from
the initial state by any sequence of actions. In the chess example, the state space
would be all possible board configurations.
Search Tree: A search tree is a tree structure that represents the search process. Each
node in the tree represents a state. The root of the tree is the initial state, and the
branches of the tree are actions leading to other states (nodes).
Search Node: A search node is a node in the search tree. It represents a state and
also contains information about the action that led to this state and the total cost of
the path from the initial state.
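As a minimal sketch of these four terms, a search node can be modelled as a small record holding the state, the parent node, the action, and the path cost. The toy problem below (integer states with "+1" and "*2" actions, each costing 1) is a hypothetical example chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class SearchNode:
    state: Any                              # the configuration this node represents
    parent: Optional["SearchNode"] = None   # node this one was generated from
    action: Optional[str] = None            # action that led to this state
    path_cost: float = 0.0                  # total cost from the initial state

def expand(node, successors):
    """Generate child nodes; successors(state) yields (action, next_state, step_cost)."""
    return [SearchNode(next_state, node, action, node.path_cost + cost)
            for action, next_state, cost in successors(node.state)]

def path_actions(node):
    """Walk back to the root and return the actions in order."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))

# Hypothetical toy problem: states are integers
def toy_successors(state):
    return [("+1", state + 1, 1.0), ("*2", state * 2, 1.0)]

root = SearchNode(state=1)                          # root of the search tree
children = expand(root, toy_successors)             # states 2 and 2
grandchildren = expand(children[1], toy_successors) # states 3 and 4
goal = grandchildren[1]
print(goal.state, path_actions(goal), goal.path_cost)
```

The state space here is every integer reachable from 1; the search tree is the part of it that has actually been expanded from the root.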
6. Explain the terms Over fitting and Under fitting in ML. (or) Interpret the
concept of Underfitting with a suitable example.
A. Bias and Variance in Machine Learning
• Bias: Bias refers to the error due to overly simplistic assumptions in the
learning algorithm. When a model performs poorly on both the training and the
testing data, it has high bias because the model is too simple, indicating
underfitting.
• Variance: Variance is the error due to the model’s sensitivity to fluctuations in
the training data. The model performs well on the training data but poorly on
the testing data, indicating overfitting.

Underfitting in Machine Learning: A statistical model or a machine learning
algorithm is said to have underfitting when a model is too simple to capture data
complexities.
Note: The underfitting model has High bias and low variance.
Reasons for Underfitting
1. The model is too simple to represent the data.
2. The size of the training dataset used is not enough.
3. Features are not scaled.
Techniques to Reduce Underfitting
1. Increase model complexity.
2. Increase the number of features, performing feature engineering.
3. Remove noise from the data.
Overfitting in Machine Learning: A statistical model is said to be overfitted when it
fits the training data so closely, noise included, that it does not make accurate
predictions on testing data.
Note: The overfitting model has low bias and high variance.
Overfitting is a problem where the performance of a machine learning algorithm on
training data is much better than on unseen data.
Reasons for Overfitting:
1. High variance and low bias.
2. The model is too complex.
3. The training dataset is too small.
Techniques to Reduce Overfitting
1. Improving the quality of training data.
2. Increase the amount of training data to improve the model's ability to generalize.
3. Reduce model complexity.

Good Fit in a Statistical Model
Ideally, a model that makes predictions with 0 error is said to have a good fit on
the data. In practice, this situation is approached at a spot between overfitting and
underfitting, where the error on both training and testing data is low.
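One way to see underfitting and overfitting side by side is to fit polynomials of increasing degree to the same noisy data. This is a sketch under assumed settings: the sine-shaped target, the noise level, the sample sizes, and the degrees 1, 3, and 12 are all illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Assumed ground truth: a sine curve plus Gaussian noise
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)   # least-squares fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")
```

The degree-1 model is too simple for the curve, so its error is high on both sets (high bias). Raising the degree always lowers the training error; when the testing error stops following it down, the extra flexibility is fitting noise (high variance).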

7. How are random forests related to decision trees? (or) What is a Decision
tree algorithm? Explain.
A. The Random Forest Algorithm combines the output of multiple Decision Trees to
generate the final output. This process of combining the output of multiple individual
models is called Ensemble Learning.
Relationship between Decision Trees and Random Forests:
1. Random Forests are an extension of Decision Trees
2. Random Forests address Decision Trees' limitations: Overfitting, High variance.
3. Random Forests improve Decision Trees' performance:
- Increased accuracy
- Improved robustness
- Better handling of high-dimensional data
Overview of Random Forest vs Decision Tree

Aspect | Random Forest | Decision Tree
Nature | Ensemble of multiple decision trees | Single decision tree
Bias-Variance Trade-off | Lower variance, reduced overfitting | Higher variance, prone to overfitting
Predictive Accuracy | Generally higher due to ensemble | Generally lower for a single tree
Robustness | More robust to outliers and noise | Sensitive to outliers and noise
Training Time | Slower due to multiple tree construction | Faster as it builds a single tree
Interpretability | Less interpretable due to the ensemble | More interpretable as a single tree
Feature Importance | Provides feature importance scores | Provides feature importance but is less reliable

Decision Tree: A decision tree is a flowchart-like structure used to make decisions or
predictions. It consists of nodes representing tests on attributes, branches
representing the outcomes of these tests, and leaf nodes representing final outcomes
or predictions.
Structure of a Decision Tree
1. Root Node: Represents the entire dataset and the initial decision to be made.
2. Internal Nodes: Each internal node represents a test on an attribute and has one
or more branches.
3. Branches: The outcome of a decision or test, leading to another node.
4. Leaf Nodes: Represent the final decision; no further splits occur at these nodes.
How Decision Trees Work?
1. Selecting the Best Attribute: Using a metric like Gini impurity, entropy, or
information gain, the best attribute to split the data is selected.
2. Splitting the Dataset: The dataset is split into subsets based on the selected
attribute.
3. Repeating the Process: The process is repeated recursively for each subset,
creating a new internal node or leaf node until a stopping criterion is met.
Metrics for Splitting
• Gini Impurity: Measures the likelihood of an incorrect classification of a
new instance if it were labeled at random according to the class distribution
in the node.
• Entropy: Measures the amount of uncertainty or impurity in the dataset.
• Information Gain: Measures the reduction in entropy or Gini impurity after
a dataset is split on an attribute.
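The three splitting metrics can be computed in a few lines. This is a minimal sketch; the function names and the small "yes"/"no" label lists are made up for illustration.

```python
from collections import Counter
from math import log2

def gini(labels):
    # 1 - sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    # -sum of p * log2(p) over the classes present
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    # Impurity of the parent minus the size-weighted impurity of the children
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" labels, and one candidate split
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print("Gini impurity:", gini(parent))
print("Entropy:", entropy(parent))
print("Information gain of the split:", information_gain(parent, split))
```

A decision tree evaluates every candidate split this way and keeps the one with the highest gain (or, equivalently, the lowest weighted impurity).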
Advantages: Simplicity and Interpretability, Versatility, No Need for Feature Scaling.
Disadvantages: Overfitting, Instability, Need for Pruning
Applications: Business Decision Making, Healthcare, Finance, Marketing
Random Forest:
Random Forest is a tree-based machine-learning algorithm that leverages the power
of multiple decision trees to make decisions.

This process of combining the output of multiple individual models (also known as
weak learners) is called Ensemble Learning.
8. What are the assumptions of Gradient boosting Algorithm?
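A. Gradient boosting is a non-parametric ensemble method, so it does not depend on the
classical linear-regression assumptions (linearity, normally distributed residuals,
constant residual variance, no multicollinearity). It relies on the following
assumptions:
Assumptions:
1. Differentiable loss: The loss function is differentiable, because each new model is
fitted to the negative gradient of the loss.
2. Weak learners: The base models (usually shallow decision trees) perform at least
slightly better than random guessing.
3. Additive model: The target can be approximated by adding weak learners one stage
at a time.
4. Independence: Observations are independent and identically distributed.
5. No significant outliers: With squared-error loss the algorithm is sensitive to
influential outliers, since later models keep trying to correct them.
6. Sufficient data: Large enough sample size for reliable estimates.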
Gradient Boosting
Gradient Boosting is a powerful boosting algorithm that combines several weak
learners into strong learners, in which each new model is trained to minimize the loss
function such as mean squared error or cross-entropy of the previous model using
gradient descent.
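The idea can be sketched for squared-error loss, where the negative gradient is simply the residual. Everything below is an illustrative assumption rather than a reference implementation: a single 1-D feature with at least two distinct values, one-split regression stumps as weak learners, and a sine-shaped toy target.

```python
import numpy as np

def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in np.unique(x)[:-1]:        # keep both sides non-empty
        left = residuals[x <= threshold]
        right = residuals[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]                            # (threshold, left_value, right_value)

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    base = float(y.mean())                     # initial constant model
    prediction = np.full(x.shape, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - prediction             # negative gradient of 0.5 * (y - f)^2
        stump = fit_stump(x, residuals)
        prediction += learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps

def boosted_predict(base, stumps, x, learning_rate=0.1):
    prediction = np.full(x.shape, base, dtype=float)
    for stump in stumps:
        prediction += learning_rate * predict_stump(stump, x)
    return prediction

# Hypothetical toy data
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)
base, stumps = gradient_boost(x, y)
mse_baseline = float(np.mean((y - base) ** 2))
mse_boosted = float(np.mean((y - boosted_predict(base, stumps, x)) ** 2))
print(mse_baseline, mse_boosted)
```

Each stump on its own is a weak model, but because every round targets what the current ensemble still gets wrong, the summed prediction improves steadily.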

9. How can you improve the performance of the Gradient Boosting Algorithm?
A. Improving Gradient Boosting Algorithm performance involves:
Hyperparameter Tuning: Tune the learning rate, the number of trees, and the tree
depth.
Regularization Techniques: L1 and L2 regularization reduce overfitting.
Feature Engineering: Selecting, transforming, and creating features to improve
model performance.
Data Preprocessing: Ensuring data quality and readiness for analysis.
Ensemble Methods: Stacking or bagging, which combine GBM with other models.
Advanced Techniques: Optimized implementations such as H2O, XGBoost, and LightGBM.
Monitoring and Evaluation: Use cross-validation and monitor overfitting.
Real-World Considerations: Data quality, model deployment.

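10. How is it possible to perform unsupervised learning with Random Forest?
A. Unsupervised learning with random forest is done by constructing a joint
distribution based on your independent variables that roughly describes your
data. Then simulate a certain number of observations using this distribution. For
example, if you have 1000 observations you could simulate 1000 more. Label the
original observations as one class and the simulated ones as another, and train a
random forest to separate the two classes. How often two original observations land
in the same leaf gives a proximity measure that can then be used for clustering.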

12. Explain how random forests give output for classification and regression
problems.
A. Decision Making and Voting: When it comes to making predictions, each decision
tree in the Random Forest casts its vote. For classification tasks, the final prediction is
determined by the mode across all the trees. In regression tasks, the average of the
individual tree predictions is taken.
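The two aggregation rules are short enough to show directly. In this sketch the per-tree predictions are hypothetical hard-coded values standing in for the outputs of already trained trees.

```python
from collections import Counter
from statistics import mean

def forest_classify(tree_predictions):
    # Majority vote: the most common class label (the mode) wins
    return Counter(tree_predictions).most_common(1)[0][0]

def forest_regress(tree_predictions):
    # Average of the individual tree outputs
    return mean(tree_predictions)

# Hypothetical outputs of five trees for one input
class_votes = ["cat", "dog", "cat", "cat", "dog"]
value_votes = [2.0, 3.0, 4.0]

print(forest_classify(class_votes))
print(forest_regress(value_votes))
```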
13. Discuss about Probabilistic modelling in detail.
A. Probabilistic models are statistical models that include one or more probability
distributions, which allows us to account for error or randomness in our statistical
models of data. They are also used when fitting complex models with many
parameters, such as neural networks.
Categories Of Probabilistic Models
These models can be classified into the following categories:
• Generative models
• Discriminative models.
• Graphical models
Generative models: Generative models aim to model the joint distribution of the
input and output variables. They can be used for tasks such as image and speech
synthesis
Discriminative models: The discriminative model aims to model the conditional
distribution of the output variable given the input variable. They can be used for
tasks such as image recognition, speech recognition.
Graphical models: These models use graphical representations to show the
conditional dependence between variables. They are commonly used for tasks such
as image recognition, natural language processing.
Advantages Of Probabilistic Models
• Probabilistic models are an increasingly popular method in many fields,
including artificial intelligence, finance, and healthcare.
• The main advantage of these models is their ability to take into account
uncertainty and variability in data.
Disadvantages Of Probabilistic Models
There are also some disadvantages to using probabilistic models.
• One of the disadvantages is the potential for overfitting, where the model is
too specific to the training data.
• Not all data fits well into a probabilistic framework.

14. Describe the role of Artificial Intelligence in Natural Language Processing.


A. Natural language processing is a branch of artificial intelligence that focuses on
giving computers the ability to understand human language.
Elements: Computational linguistics, Machine learning, Deep learning
Some of the key benefits of NLP for businesses include:
• Detecting and processing large volumes of data.
• Providing valuable insights into brand performance
• Detecting issues and addressing them to improve performance
NLP can be structured in many different ways using various machine learning models:
• Syntax analysis, which examines the grammatical structure of text.
• Sentiment analysis, which measures tone and is useful in analyzing social media posts.
• Semantic analysis, which helps the model with the less literal meanings of words.
• Lexicons, which are lists of words and the emotions they evoke.
• Word sense disambiguation, which is used in computational linguistics to determine
which sense of a word is intended.
• Lemmatization, which reduces a word to its dictionary form (lemma) using its context.
• Stemming, which strips the end or beginning of a word to obtain its stem.
• Summarization, which is commonly used to summarize news reports.
Natural Language Processing Tools and Techniques
Natural language processing tools and techniques provide the foundation for
implementing this technology in real-world applications. Two of the most popular
NLP tools are Python and the Natural Language Toolkit (NLTK).
NLP Examples
• Voice assistants: Siri and Alexa.
• Chatbots: learn to recognize context over time to provide responses.
• Language translation: Google Translate uses NLP to accurately capture the
input language and translate it to the output language.
• Sentiment analysis: This NLP application is commonly used in social media
analyses to discover consumer insights.
• Text extraction: NLP enables users to obtain pre-defined information from
text, making it helpful in extracting keywords.
The Use of AI in Natural Language Processing
As a crucial element of artificial intelligence, NLP provides solutions to real-world
problems, making it a fascinating and important field to pursue. Understanding
human language is key to the justification of AI’s claim to intelligence.
The progress and advancements in the field of NLP will play a significant role in the
overall development and growth of AI.
UNIT - II
1. Discuss about types of Optimizers.
A. An optimizer is an algorithm or function that adapts the neural network's attributes,
like learning rate and weights, in order to reduce the loss.
1.Gradient Descent (GD): This is the most basic optimizer. It uses the derivative of
the loss function and the learning rate to reduce the loss and reach a minimum. The
gradients it needs are computed by backpropagation in neural networks.
a_new = a - alpha * f'(a)

2.Stochastic Gradient Descent: This is a modified version of the GD method, where
the model parameters are updated on every iteration. It means that after every
training sample, the loss function is evaluated and the model is updated.
3.Mini-Batch Gradient Descent: Another variant of the GD approach is mini-batch,
where the model parameters are updated after each small batch of training samples.
It needs fewer updates per epoch than stochastic gradient descent and less
computation per update than batch gradient descent, so it is usually the fastest of
the three in practice.
4.Adagrad: One of the key benefits of the Adagrad optimizer is that it does not need
manual tuning of the learning rate: each parameter gets its own learning rate, scaled
by the accumulated sum of its squared gradients. This often speeds up convergence
on sparse data, but the accumulated sum keeps growing, so the learning rate can
shrink too aggressively.
5.RMSProp (root-mean-square prop): It is an improvement to the Adagrad optimizer.
It reduces the aggressive decay of the learning rate by taking an exponential moving
average of the squared gradients instead of their cumulative sum.
6.Adam: Adaptive Moment Estimation combines the power of RMSProp and
momentum-based GD. In Adam optimizers, the power of momentum GD to hold the
history of updates and the adaptive learning rate provided by RMSProp makes Adam
optimizer a powerful method.
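The first three optimizers in this list differ only in how many samples feed each update, which the following sketch makes explicit with a single batch_size argument. The linear model, the noise-free toy data, and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np

def train_linear(x, y, batch_size, learning_rate=0.3, epochs=500, seed=0):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)                   # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            error = w * x[idx] + b - y[idx]
            grad_w = 2.0 * np.mean(error * x[idx])   # d(MSE)/dw on this batch
            grad_b = 2.0 * np.mean(error)            # d(MSE)/db on this batch
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return float(w), float(b)

# Hypothetical noise-free data generated from y = 3x + 1
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x + 1.0

fits = {
    "stochastic (batch_size=1)": train_linear(x, y, batch_size=1),
    "mini-batch (batch_size=5)": train_linear(x, y, batch_size=5),
    "batch (batch_size=20)": train_linear(x, y, batch_size=20),
}
for name, (w, b) in fits.items():
    print(f"{name}: w={w:.4f}, b={b:.4f}")
```

With batch_size equal to the dataset size there is one exact update per epoch; with batch_size=1 there are twenty noisier ones. Mini-batch sits between the two.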

2. Explain the difference between AI, ML and DL.
A. Below is a table of differences between AI, ML and DL:
Artificial Intelligence | Machine Learning | Deep Learning
AI stands for Artificial Intelligence, and is basically the study which enables machines to mimic human behaviour through particular algorithms. | ML stands for Machine Learning, and is the study that uses statistical methods enabling machines to improve with experience. | DL stands for Deep Learning, and is the study that makes use of Neural Networks to imitate functionality just like a human brain.
AI is the broader family consisting of ML and DL as its components. | ML is the subset of AI. | DL is the subset of ML.
AI is a computer algorithm which exhibits intelligence through decision making. | ML is an AI algorithm which allows a system to learn from data. | DL is an ML algorithm that uses deep neural networks to analyze data and provide output accordingly.
Search trees and much complex math are involved in AI. | Working with methods such as K-Means and Support Vector Machines defines the ML aspect. | Breaking complex functionalities into linear/lower-dimension features by adding more layers defines the DL aspect.
The aim is to increase chances of success and not accuracy. | The aim is to increase accuracy, not success ratio. | It attains the highest accuracy when it is trained with a large amount of data.
Three categories of AI: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI) | Three categories of ML: Supervised Learning, Unsupervised Learning and Reinforcement Learning | DL has four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks
The efficiency of AI is the efficiency provided by ML and DL respectively. | Less efficient than DL | More powerful than ML.
Examples of AI: Google's AI-Powered Predictions, Ridesharing Apps like Uber and Lyft, etc. | Examples of ML: Virtual Personal Assistants: Siri, Alexa, Google, etc. | Examples of DL: Image analysis and caption generation, etc.

3. How to improve Deep learning using weight initialization.
A. Weight initialization is an important design choice when developing deep learning
neural network models.
• Weight initialization is used to define the initial values for the parameters in
neural network models before training starts.
• The Xavier and normalized Xavier weight initialization heuristics are used for
nodes that use the Sigmoid or Tanh activation functions.
• The He weight initialization heuristic is used for nodes that use the ReLU
activation function.

Weight Initialization Techniques:


1. Xavier Initialization (2010): For Sigmoid/Tanh activation functions
- Variance: 1/n (n = number of inputs)
2. Normalized Xavier Initialization (2010): For Sigmoid/Tanh activation functions
3. He Initialization (2015): For ReLU activation functions
4. Orthogonal Initialization: For recurrent neural networks (RNNs)
- Weights initialized as orthogonal matrices
Implementation:
Python (TensorFlow/Keras):
import tensorflow as tf
from tensorflow.keras.layers import Dense

# Xavier (Glorot) initialization for the sigmoid layer, He initialization for the ReLU layer
xavier_initializer = tf.keras.initializers.GlorotUniform()
he_initializer = tf.keras.initializers.HeUniform()

model = tf.keras.models.Sequential([
    Dense(64, activation='relu', kernel_initializer=he_initializer),
    Dense(32, activation='sigmoid', kernel_initializer=xavier_initializer),
])
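The heuristics listed above can also be written out with NumPy to make the scaling explicit. This is a sketch using commonly quoted formulations, which are assumptions here: variance 1/n for Xavier, a uniform range of ±sqrt(6 / (n_in + n_out)) for normalized Xavier, and variance 2/n for He. The function names and layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_normal(n_in, n_out):
    # Variance 1/n, where n is the number of inputs to the node
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out))

def normalized_xavier_uniform(n_in, n_out):
    # Uniform in [-limit, limit] with limit = sqrt(6 / (n_in + n_out))
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def he_normal(n_in, n_out):
    # Variance 2/n, intended for ReLU activations
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

def orthogonal(n):
    # Orthogonal matrix from the QR decomposition of a random matrix
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

w_xavier = xavier_normal(256, 256)
w_norm_xavier = normalized_xavier_uniform(256, 128)
w_he = he_normal(256, 256)
w_orth = orthogonal(64)

print("Xavier std:", w_xavier.std(), "expected about", np.sqrt(1.0 / 256))
print("He std:", w_he.std(), "expected about", np.sqrt(2.0 / 256))
```

The point of all four schemes is the same: keep the scale of activations and gradients roughly constant from layer to layer at the start of training.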

4. Explain the Google duplex project.


A. Google Duplex is an artificial intelligence (AI) technology that mimics a human
voice and makes phone calls on a person's behalf.
Conducting Natural Conversations
There are several challenges in conducting natural conversations: natural language is
hard to understand, natural behavior is tricky to model, latency expectations require
fast processing, and generating natural sounding speech, with the appropriate
intonations, is difficult.
Enter Duplex
Google Duplex’s conversations sound natural thanks to advances
in understanding, interacting, timing, and speaking. At the core of Duplex is
a recurrent neural network (RNN) designed to cope with these challenges, built
using TensorFlow Extended (TFX).

Sounding Natural
Duplex uses a combination of a concatenative text-to-speech (TTS) engine and a
synthesis TTS engine to control intonation depending on the circumstance.
System Operation
The Google Duplex system has a self-monitoring capability, which allows it to
recognize the tasks it cannot complete autonomously.
Benefits for Businesses and Users
Duplex can call the business to inquire about open hours and make the information
available online with Google, making the information more accessible to everyone.
For users, Google Duplex is making supported tasks easier. Instead of making a phone
call, the user simply interacts with the Google Assistant, and the call happens
completely in the background without any user involvement.
5. What is optimization? What are the measures used to minimise cost.
A. Optimization is the process of finding the best solution among a set of possible
solutions to maximize or minimize a specific objective function, subject to
constraints.
Types of Optimization:
1. Maximization: Increase revenue, profit, or efficiency.
2. Minimization: Reduce cost, risk, or time.
Measures to Minimize Cost:
1. Linear Programming (LP)
2. Integer Programming (IP)
3. Dynamic Programming (DP)
4. Greedy Algorithms
5. Branch and Bound

6. Explain the deep learning network architecture.


A. Neural Network Architectures
Neural network architectures are the building blocks of deep learning models. They
consist of interconnected nodes, called neurons, which are organized in layers. Each
neuron receives inputs, computes mathematical operations, and produces outputs.

Main Components of Neural Network Architecture


1. Input Layer: The input layer is the initial layer of the neural network and is
responsible for receiving the input data.
2. Hidden Layers: Hidden layers are the intermediate layers between the input
and output layers.
3. Neurons (Nodes): Neurons, also known as nodes, are the individual computing
units within a neural network.
4. Weights and Biases: Weights and biases are parameters associated with the
connections between neurons.
5. Activation Functions: Activation functions add non-linear behavior to the
network and allow it to learn complex patterns.
6. Output Layer: The output layer is the final layer of the neural network that
produces the outputs after processing the input data.
7. Loss Function: The loss function measures the discrepancy between the network's
predicted output and the true output.
Types of neural network architectures:
1. Feedforward Neural Networks (FNNs): An FNN is the most fundamental type
of neural network, where information flows in one direction, from the input
layer to the output layer.
2. Convolutional Neural Networks (CNNs): CNNs are particularly effective for
processing grid-like data, such as images and videos.
3. Recurrent Neural Networks (RNNs): RNNs are designed to process sequential
data, where the order of inputs matters.
4. Long Short-Term Memory (LSTM) Networks: LSTMs are a type of RNN that
addresses the vanishing gradient problem.

7.Compare traditional machine learning approaches with current deep


learning approaches.
A. Traditional programming is rule-based and deterministic, relying on human-crafted
logic, whereas machine learning is data-driven and probabilistic, relying on patterns
learned from data.
Traditional machine learning is a method where algorithms learn from a given set of
data, drawing patterns and making decisions based on predefined features.
Deep learning is a subset of machine learning that mimics the functioning of the
human brain through neural networks. Unlike its traditional counterpart, deep
learning doesn’t rely on handcrafted features; instead, it autonomously learns from
raw data.

8. Explain about biological vision and machine vision.


A. Biological vision runs on organic matter and cortical cells. Computer vision runs on
transistors and electronic circuits.

Computer vision and human vision are two distinct yet interconnected domains that
shape our understanding of visual perception.
Computer vision refers to the ability of machines to understand and interpret visual
data. It is a field of artificial intelligence that utilizes algorithms and computational
models to analyze and make sense of images and videos.
Human vision is a remarkable biological process that allows us to perceive and
interpret the visual world. It begins with the eyes, which capture light and send
signals to the brain for processing.
Key Differences Between Computer Vision and Human Vision
1. Processing Mechanisms: Computer vision relies on algorithms and
computational models to process visual data, whereas human vision involves
complex neural networks and biological processes.
2. Adaptability and Efficiency: Human vision is highly adaptable and efficient in
recognizing patterns, even under poor lighting, partial occlusion, or unfamiliar
viewpoints. Computer vision algorithms can struggle in such situations.
3. Handling Complex Scenes and Varied Conditions: Human vision integrates
information from multiple sensory channels. Computer vision algorithms often
focus on specific visual features.

9. Explain about Adaptive gradient algorithm.


A. Adaptive Gradient Algorithm (Adagrad) is an algorithm for gradient-based
optimization. The idea behind this particular method is that it enables the learning
rate to adapt to the geometry of the loss function.
Adaptive Gradient Algorithms are a family of optimization algorithms that adjust the
learning rate for each parameter based on the gradient's magnitude.
Key Characteristics:
1. Adaptive Learning Rate: Adjusts learning rate for each parameter.
2. Parameter-Specific Learning Rate: Different learning rates for each parameter.
3. Gradient-Based Adaptation: Adjusts learning rate based on gradient magnitude.
How Adaptive Gradient Algorithms Work:
1. Initialize parameters and learning rate.
2. Compute gradient for each parameter.
3. Update learning rate for each parameter based on gradient magnitude.
4. Update parameters using updated learning rate.
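The four steps above can be sketched for Adagrad as follows. The quadratic objective, the starting point, and the hyperparameter values are hypothetical choices made only so the example is self-contained.

```python
import numpy as np

def adagrad(grad_fn, params, learning_rate=0.5, steps=500, eps=1e-8):
    # Step 1: initialize parameters and the per-parameter accumulator
    params = np.asarray(params, dtype=float).copy()
    accumulated = np.zeros_like(params)       # running sum of squared gradients
    for _ in range(steps):
        grad = grad_fn(params)                # Step 2: gradient for each parameter
        accumulated += grad ** 2              # Step 3: adapt the effective learning rate
        params -= learning_rate * grad / (np.sqrt(accumulated) + eps)  # Step 4: update
    return params

# Hypothetical objective f(w) = w0^2 + 10 * w1^2, with very different curvature per axis
def grad_fn(w):
    return np.array([2.0 * w[0], 20.0 * w[1]])

result = adagrad(grad_fn, [3.0, -2.0])
print(result)
```

Dividing by the square root of the accumulated squared gradients means a parameter with consistently large gradients takes smaller steps, while one with small gradients keeps a relatively larger step.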

10. Explain the terms forward and backward propagation in ML.


A. Forward Propagation: Forward propagation is the initial step in training a neural
network, where the input data is fed through the network to generate a prediction.
Input Data: The input layer consists of the data that you feed into the network.
Weights and Bias: Weights and biases are the learnable parameters of a neural
network.
Neurons and Layers
• Input Layer: This is where the network receives input from your dataset.
• Hidden Layer(s): These are layers in between the input and output layers.
• Output Layer: This is the final layer, which provides the predictions.
Linear Transformation: The first step in forward propagation involves calculating a
weighted sum of the inputs and the associated weights, and then adding the bias.
Activation Functions: Each node in a layer receives input from multiple nodes in the
previous layer. The node applies an activation function to its weighted sum to
produce its output, which is passed on to the next layer.
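Forward propagation through a small network can be sketched as a loop over layers, each applying the linear transformation and then an activation. The layer sizes, the random weights, and the choice of sigmoid are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Pass a batch of inputs through each (weights, bias) pair in turn."""
    activation = x
    for weights, bias in layers:
        z = activation @ weights + bias    # linear transformation: weighted sum plus bias
        activation = sigmoid(z)            # non-linear activation
    return activation

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(3, 4)), np.zeros(4)),   # input (3 features) -> hidden (4 nodes)
    (rng.normal(size=(4, 1)), np.zeros(1)),   # hidden (4 nodes) -> output (1 node)
]
x = rng.normal(size=(5, 3))                   # hypothetical batch of 5 samples
output = forward(x, layers)
print(output.shape)
```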

Backward propagation: In machine learning, backpropagation is an effective
algorithm used to train artificial neural networks, especially feed-forward neural
networks.
• Backpropagation is an iterative algorithm that helps to minimize the cost
function by determining which weights and biases should be adjusted.
1. Supply data to the network and pass it forward through the model to produce an
output.
2. Compare the output to the expected (target) output and calculate the loss function.
3. Run the error through the network from output to input.
4. Update the weights and repeat until the system minimizes the error and makes
accurate predictions with new data.
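The four steps can be sketched as a training loop for a tiny two-layer network. The XOR data, the hidden layer of 4 sigmoid nodes, the mean squared error loss, and the learning rate are all assumed here for illustration; no particular final accuracy is claimed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
learning_rate = 0.5
losses = []

for _ in range(5000):
    # 1. Forward pass
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # 2. Compare the output with the targets (mean squared error)
    losses.append(float(np.mean((output - y) ** 2)))

    # 3. Backward pass: chain rule from the output layer back to the input layer
    d_output = 2.0 * (output - y) / len(X) * output * (1.0 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1.0 - hidden)

    # 4. Update weights and biases
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0)

print("first loss:", losses[0], "last loss:", losses[-1])
```

Note that d_hidden is computed before W2 is updated, so every gradient in one iteration refers to the same set of weights.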
11. Illustrate on computation representation of language in Human and
Machine language.
A. Here's an illustration of computational representation of language in human and
machine languages:
Human Language Representation:
1. Phonological: Sound waves (speech) or written symbols (text)
2. Lexical: Words and phrases
3. Syntactic: Grammar and sentence structure
4. Semantic: Meaning and context
5. Pragmatic: Communication goals and intentions

Machine Language Representation:


1. Binary: 0s and 1s (machine code)
2. Assembly Language: Symbolic representation (e.g., x86)
3. Programming Languages: High-level languages (e.g., Python, Java)
4. Data Structures: Arrays, linked lists, trees, graphs
5. Algorithms: Instructions for processing language data
Computational Models of Human Language:
1. Finite State Machines (FSMs)
2. Context-Free Grammars (CFGs)
Language Representation Formats:
1. Text: ASCII, Unicode
2. Speech: WAV, MP3

12. Enumerate the concept of L1 and L2 regularization in detail.


A. Regularization is a technique used in machine learning and statistical modelling to
prevent overfitting and improve the generalization ability of models.
1. L1 regularization
L1 regularization, also known as Lasso regularization, adds the sum of the absolute
values of the model’s coefficients to the loss function. L1 regularization is particularly
useful for high-dimensional datasets where feature selection is desired, because it can
drive some coefficients to exactly zero.
Mathematically, the L1 regularization term can be written as:
L1 regularization = λ * Σ|wi|
Here, λ is the regularization parameter, wi represents the individual model
coefficients and the sum is taken over all coefficients.
2. L2 regularization
L2 regularization, also known as Ridge regularization, adds the sum of the squared
values of the model’s coefficients to the loss function. L2 regularization helps prevent
overfitting by shrinking large coefficients, which spreads influence across many
features rather than concentrating it in a few.
Mathematically, the L2 regularization term can be written as:
L2 regularization = λ * Σ(wi^2)
Here, λ and wi have the same meaning as in the L1 term.
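The two penalty terms can be computed directly from the formulas above. In this sketch the weight vector, the value of λ (`lam`), and the unregularized `data_loss` are made-up numbers used only to show the arithmetic.

```python
import numpy as np

def l1_penalty(weights, lam):
    """lambda * sum of absolute values of the coefficients (Lasso term)."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    """lambda * sum of squared coefficients (Ridge term)."""
    return lam * np.sum(weights ** 2)

weights = np.array([0.5, -1.0, 0.0, 2.0])  # hypothetical model coefficients
lam = 0.1                                  # hypothetical regularization strength
data_loss = 0.8                            # hypothetical unregularized loss

total_l1 = data_loss + l1_penalty(weights, lam)  # 0.8 + 0.1 * 3.5
total_l2 = data_loss + l2_penalty(weights, lam)  # 0.8 + 0.1 * 5.25
print(total_l1, total_l2)
```

The regularized loss is simply the data loss plus the penalty, so a larger λ pushes the optimizer toward smaller weights.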

13. How to know if our model is suffering from the Exploding/Vanishing
gradient problem.
A. Signs of exploding gradients: the model weights quickly become very large during
training, the model weights go to NaN values during training, or the error gradient
values are consistently above 1.0 for each node and layer during training.
How to recognise the Vanishing Gradient Problem
1. Track the loss (for example with Keras); if it stays nearly constant across epochs,
that can indicate the vanishing gradient problem.
2. Plot the weights against epochs; if the curves are flat, the weights are not
changing, which points to the vanishing gradient problem.
Here are practical solutions to prevent the occurrence of exploding gradients:
1. Gradient Clipping.
2. Weight Initialization.
3. Architectural Modifications.
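A simple way to apply the checks above is to monitor the overall gradient norm during training and to clip it when it grows too large. The sketch below uses NumPy arrays as stand-ins for per-layer gradients; the function names and the `low`/`high` thresholds are assumptions for illustration, not standard values.

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm computed over all gradient arrays together."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

def diagnose(grads, low=1e-6, high=1e3):
    """Classify the gradient norm; the thresholds are illustrative assumptions."""
    norm = global_grad_norm(grads)
    if not np.isfinite(norm) or norm > high:
        return "exploding"
    if norm < low:
        return "vanishing"
    return "ok"

def clip_by_global_norm(grads, max_norm):
    """Gradient clipping: rescale all gradients so their global norm is at most max_norm."""
    norm = global_grad_norm(grads)
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

grads = [np.array([3.0, 4.0])]              # global norm is 5.0
clipped = clip_by_global_norm(grads, 1.0)   # rescaled to norm 1.0
print(diagnose(grads), clipped[0])
```

Deep learning frameworks ship their own clipping utilities; this version only shows the arithmetic behind them.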

14. Elaborate on various cost functions used in training deep networks.


A. Cost functions, also known as loss functions or objective functions, are used to
evaluate the performance of a deep learning model during training.
Types of Cost Functions:
The choice depends on the task: regression, binary classification, multi-class
classification, imbalanced data, or a custom objective. Common cost functions are:
1. Mean Squared Error (MSE): measures the average squared difference between
predicted and actual values.
2. Cross-Entropy Loss: measures the difference between predicted probabilities and
actual labels.
3. Binary Cross-Entropy Loss: Used for binary classification problems.
4. Categorical Cross-Entropy Loss: Used for multi-class classification problems.
5. Mean Absolute Error (MAE): Measures the average absolute difference between
predicted and actual values.
6. Mean Absolute Percentage Error (MAPE): Measures the average absolute
percentage difference between predicted and actual values.
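The cost functions listed above can each be written in a line or two of NumPy. This is a sketch for small arrays; the `eps` clipping value is an assumption added to avoid taking the logarithm of zero.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    # Undefined when y_true contains zeros
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def binary_cross_entropy(y_true, p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))

y_true = np.array([2.0, 4.0])
y_pred = np.array([1.0, 5.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred))
print(binary_cross_entropy(np.array([1.0, 0.0]), np.array([0.5, 0.5])))
print(categorical_cross_entropy(np.array([[1.0, 0.0, 0.0]]),
                                np.array([[0.5, 0.25, 0.25]])))
```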

15. Discuss about the Softmax layer of a Fast Food – classifying Network.
A. The Softmax layer is a crucial component in a Fast Food-classifying Network,
enabling the model to predict probabilities for each food class.
Softmax Layer:
1. Normalizes output to ensure probabilities sum to 1.
2. Computes probability distribution over all classes.
3. Used for multi-class classification problems.
Fast Food-classifying Network:
1. Convolutional Neural Network (CNN) architecture.
2. Trained on dataset of images of various fast foods.
3. Softmax layer outputs probabilities for each food class.
By incorporating a Softmax layer, the Fast Food-classifying Network accurately
predicts probabilities for each food class, enabling efficient classification and
decision-making.
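As a rough sketch of what the Softmax layer computes, the function below turns raw class scores (logits) into probabilities. The class names and the logit values are hypothetical; in a real network the logits would come from the last dense layer of the CNN.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

classes = ["burger", "pizza", "fries"]   # hypothetical food classes
logits = np.array([2.0, 1.0, 0.1])       # hypothetical scores from the last layer
probs = softmax(logits)
predicted = classes[int(np.argmax(probs))]
print(dict(zip(classes, probs.round(3))), "->", predicted)
```

The class with the largest logit always receives the largest probability, so the predicted label is the arg-max of either vector.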
UNIT - III
1.Explain different types of neural networks.
A. Neural Networks are a fundamental component of Deep Learning, inspired by the
human brain's structure and function.

1. Convolutional neural networks: Convolutional neural networks are beneficial for
AI-powered image recognition applications. They are commonly used for image
classification and are also applied in natural language processing (NLP).
2. Deconvolutional neural networks: Deconvolutional neural networks work on the
same principles as convolutional networks, except in reverse. Deconvolution neural
networks are helpful for various applications, including image analysis and synthesis.
3. Recurrent neural networks: This neural network model works by saving the output
of a step and feeding it back into the network as input for the next step. Recurrent
neural networks are commonly used in text-to-speech applications.
4. Feed-forward neural networks: Feed-forward neural networks are designed to
process large volumes of ‘noisy’ data and create ‘clean’ outputs. Information flows in
one direction, from input to output. This type of neural network is also known as the
multi-layer perceptron (MLP).
5. Modular neural networks: Modular neural networks feature a series of
independent neural networks whose operations are overseen by an intermediary.
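6. Generative adversarial networks: Generative modeling uses unsupervised learning
to generate plausible new samples that resemble an original dataset. A GAN trains
two networks against each other: a generator that produces samples and a
discriminator that tries to tell them apart from real data.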

2. Explain the terms loss function and optimizers with respect to DL.
A. A loss function is a method of evaluating how well your algorithm is modeling your
dataset.
An optimizer's primary role is to minimize the model's error or loss function,
enhancing performance.
Deep Learning Model Optimization Methods (these shrink a trained model and are
distinct from training optimizers such as SGD or Adam)
1. Pruning reduces model size by removing less important neurons.
2. Quantization decreases memory usage and computation time by storing weights at
lower numerical precision.
The loss function is the quantity that will be minimized during training. The
optimizer determines how the network will be updated based on the loss
function.
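The relationship between the two can be sketched with a one-parameter model y = w * x. The loss function measures the error, and a plain gradient-descent optimizer uses the gradient of that loss to update w. The data, learning rate, and step count below are assumptions chosen for the example.

```python
def loss_fn(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_fn(w, xs, ys):
    """Derivative of the loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sgd_step(w, grad, lr):
    """The optimizer: move w a small step against the gradient."""
    return w - lr * grad

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]        # generated with w = 2
w = 0.0
history = []
for _ in range(50):
    history.append(loss_fn(w, xs, ys))
    w = sgd_step(w, grad_fn(w, xs, ys), lr=0.05)

print("learned w:", w, "final loss:", loss_fn(w, xs, ys))
```

Optimizers such as Adam or RMSprop follow the same pattern but adapt the step size per parameter.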

3. Explain the steps in setting up the deep learning workstation.


A. The main steps to set up a deep learning workstation:
Step 1: Install Operating System
Choose an OS (Ubuntu, Windows, or macOS) and install it on an SSD.
Step 2: Install GPU Drivers
Download and install the NVIDIA GPU driver and CUDA toolkit, then verify that the GPU is detected.
Step 3: Install Python and Libraries
Install a Python 3 version supported by your chosen framework and the necessary libraries (NumPy, SciPy).
Step 4: Install Deep Learning Frameworks
Install TensorFlow, PyTorch, or Keras using pip and verify the framework installation.
Step 5: Set up IDE
Install Jupyter Notebook or Visual Studio Code and configure the IDE settings.
Step 6: Install Additional Tools
Install cuDNN and OpenCV.
Step 7: Verify Setup
Run sample deep learning code and verify GPU utilization.
Step 8: Optimize Performance
Adjust GPU settings and optimize code for parallel processing.
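Part of the verification step can be automated with a small script. The sketch below uses only the Python standard library to report which packages are importable and whether the `nvidia-smi` tool is on the PATH; the package list is an assumption and should be adjusted to the frameworks actually installed.

```python
import importlib.util
import platform
import shutil

def check_environment(packages=("numpy", "scipy", "tensorflow", "torch", "cv2")):
    """Report Python/OS details and which packages can be imported."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        # nvidia-smi ships with the NVIDIA driver; finding it suggests the driver is installed
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report

report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

A package reported as `False` is not installed in the current Python environment; a missing `nvidia-smi` usually means the GPU driver step needs to be revisited.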

4. What is the best GPU for deep learning? Explain in detail.


A. The GIGABYTE GeForce RTX 3080 is often recommended as a strong GPU for deep
learning because it handles the demands of modern techniques such as deep neural
networks and generative adversarial networks. The best choice still depends on model
size, memory needs, and budget. The RTX 3080 lets you train models much faster
than a CPU or an older GPU.
Technical Features
• CUDA cores: 8,704
• Boost clock: about 1,800 MHz (factory-overclocked GIGABYTE models)
• GPU memory: 10 GB of GDDR6X

5.Plot the graph between training (i.e no. of epochs) and validation loss.
Explain.
A. Training loss is the calculated error when the model makes predictions on the
training data. It is updated after every forward and backward pass of the model
during the training process. A loss function quantifies the difference between the
predicted and actual labels.
Validation loss evaluates the model’s performance on a separate dataset that the
model has never seen during training. Validation loss is computed at the end of each
epoch during training but is not used to update the model weights.
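Before plotting, it can help to read the two curves numerically. In the sketch below, the loss values are made up to show the typical pattern: training loss keeps falling while validation loss reaches a minimum and then rises, which signals overfitting. The helper names are hypothetical.

```python
def best_epoch(val_losses):
    """1-based epoch at which the validation loss is lowest."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def describe_fit(train_losses, val_losses):
    """Flag overfitting when training loss keeps falling after validation loss bottoms out."""
    best = best_epoch(val_losses)
    if best < len(val_losses) and train_losses[-1] < train_losses[best - 1]:
        return f"overfitting after epoch {best}"
    return "no overfitting detected"

# Hypothetical per-epoch losses, for illustration only
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22]
val_losses = [0.95, 0.70, 0.58, 0.55, 0.60, 0.68]

print(describe_fit(train_losses, val_losses))
```

On a plot with epochs on the x-axis and loss on the y-axis, this corresponds to the point where the two curves start to diverge; training is usually stopped near that epoch.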
6. Explain the anatomy of a neural network.
A. The Anatomy of a Neural Network
Neural Network: Neural networks are made up of multiple layers of interconnected
nodes or neurons that work together to process and analyze data.
1. Input Layer: The input layer is the first layer of the neural network and is
responsible for receiving data from the outside world.
2. Hidden Layer: The hidden layer is where the magic happens. This layer is
responsible for processing and analyzing the data received from the input layer.
3. Output Layer: The output layer is the final layer of the neural network and is
responsible for producing the output or prediction.
4. Activation Function: The activation function is a mathematical function that is
applied to the output of each node in the neural network.
5. Backpropagation: Backpropagation is a technique used to train neural networks.
6. Dropout: Dropout is a regularization technique that is used to prevent overfitting
in neural networks.
7. Convolutional Neural Networks: Convolutional neural networks (CNNs) are a type
of neural network that is particularly well-suited for image recognition tasks.
7. Elaborate on Reuters dataset in detail.
A. The Reuters-21578 dataset is one of the most widely used data collections for text
categorization research. It was collected from the Reuters financial newswire service in
1987.
Dataset Features
- Text Classification: each newswire is labelled with one or more topics.
- Multi-Label Classification: each newswire can belong to multiple topics.
- Word Indexing: Words are indexed by frequency for efficient filtering operations.
- Sequence Length: The maximum sequence length can be specified when loading the data.
Dataset Structure
The dataset is divided into training and testing sets, with 80% used for training and
20% for testing. Each sequence is represented as a list of word indexes, with
unknown words replaced by a special index.

8. Inspect the implementation of binary classification.


A. Binary classification is one of the most commonly used techniques in machine
learning (ML), used to classify data into two distinct classes. This technique is used in
many real-world applications, such as image classification, email spam detection, and
medical diagnosis.
Key Steps
1. Define the Problem: State the two classes and what the model should predict.
2. Prepare the Data: Clean the data and convert it into features suitable for a model.
3. Split the Data: Once the data is prepared, split it into training and testing sets.
4. Choose a Model: Consider the complexity of the model and its ability to handle the
data.
5. Train the Model: Fit the model using the training data.
6. Evaluate the Model: Measure its performance on the testing data.
7. Improve the Model: If the model's performance is not satisfactory, it can be
improved with techniques such as hyperparameter tuning, regularization, or
collecting more data.
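The key steps can be sketched end to end with logistic regression in NumPy. This is only an illustration: the two-cluster synthetic data, the 80/20 split, the learning rate, and the iteration count are assumptions, and a real project would normally use a library model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Prepare the data: two synthetic clusters, one per class
n = 200
X = np.vstack([rng.normal(loc=-2.0, size=(n, 2)), rng.normal(loc=2.0, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Split the data: 80% training, 20% testing
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]

# Choose and train the model: logistic regression with gradient descent
w, b = np.zeros(2), 0.0
for _ in range(300):
    p = sigmoid(X[train_idx] @ w + b)
    grad = p - y[train_idx]                 # gradient of binary cross-entropy w.r.t. the logits
    w -= 0.1 * X[train_idx].T @ grad / len(train_idx)
    b -= 0.1 * grad.mean()

# Evaluate the model on the held-out test set
pred = (sigmoid(X[test_idx] @ w + b) >= 0.5).astype(float)
accuracy = float((pred == y[test_idx]).mean())
print("test accuracy:", accuracy)
```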

9. Discuss about keras workflow.


A. Keras is a model-level library, providing high-level building blocks for developing
deep-learning models. It doesn’t handle low-level operations such as tensor
manipulation and differentiation.
There are 4 steps in the portion of your overall neural network ML workflow where
Keras comes into play. These steps are as follows:
1. Define the training data: define your input and target tensors.
2. Define a neural network model: The Sequential model class and the Functional
API both share the goal of defining a neural network, but take different approaches.
3. Configure the learning process: With both the training data and the model
defined, it's time to configure the learning process (optimizer, loss, and metrics).
4. Train the model: At this point we have training data and a fully configured neural
network to train with said data.

# Define the training data (random placeholders, for illustration only)
import numpy as np
NUM_CLASSES = 5
X_train = np.random.random((5000, 32))
# Integer class labels converted to one-hot rows, as categorical_crossentropy expects
labels = np.random.randint(0, NUM_CLASSES, size=5000)
y_train = np.eye(NUM_CLASSES)[labels]
# Define the neural network model
from keras import models
from keras import layers
INPUT_DIM = X_train.shape[1]
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_dim=INPUT_DIM))
model.add(layers.Dense(NUM_CLASSES, activation='softmax'))
# Configure the learning process
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# Train the model (mini-batches of 32 train far faster than batch_size=1)
model.fit(X_train, y_train, batch_size=32, epochs=10)

10. Explain about the architecture of Keras.


A. Keras is a high-level neural networks API, written in Python. Its architecture is
designed for easy usage and flexibility.

Keras Architecture:
1. Frontend: User interface for building and training models.
2. Backend: Engine for computing tensors and gradients.
Key Components:
1. Models: Sequential, Functional, or Subclassing API.
2. Layers: Building blocks of models (Dense, Conv2D, LSTM, etc.).
3. Activations: Functions applied to layer outputs (ReLU, Sigmoid, etc.).
4. Optimizers: Algorithms for weight updates (Adam, SGD, etc.).
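5. Loss Functions: Objectives minimized during training (categorical cross-entropy, MSE, etc.).
Backend Engines (Keras 2.x; Keras 3 instead supports TensorFlow, JAX, and PyTorch):
1. TensorFlow: Default backend.
2. Theano: Optional backend.
3. CNTK: Optional backend.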

11. Explain the concept “Deep learning with Cloud”.


A. Deep Learning with Cloud refers to the integration of deep learning technologies
with cloud computing resources to enable scalable, efficient, and cost-effective AI
model training and deployment.
Deep learning uses hierarchical artificial neural networks for machine learning
processes. These networks mimic the human brain, with interconnected neuron
nodes. Unlike traditional methods, deep learning enables nonlinear data processing.
Benefits:
1. Scalability: Train large models on vast datasets using cloud-based infrastructure.
2. Flexibility: Choose from various cloud providers (AWS, Google Cloud, Azure).
3. Cost-effectiveness: Pay-as-you-go pricing models reduce infrastructure costs.
4. Accelerated Training: Utilize specialized hardware (GPUs, TPUs) for faster training.
5. Collaboration: Share resources and expertise across teams and organizations.
Cloud-Based Deep Learning Platforms:
1. Google Cloud AI Platform
2. Amazon SageMaker
3. Microsoft Azure Machine Learning

12. Discuss the classification of newswires and explain with the dataset.
A. Classifying newswires: a multi-class classification example
This example follows Chapter 3, Section 5 of Deep Learning with Python.
Classification of newswires involves categorizing news articles into predefined topics
or categories.
Classification Categories:
1. Business
2. Technology
3. Education
4. International News
Dataset:
The Reuters dataset (as packaged with Keras) is commonly used for newswire classification.
Reuters Dataset:
- 11,228 newswires from Reuters
- 46 topics (categories)
- 8,982 training samples
- 2,246 testing samples
- Text data with corresponding topic labels
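Because each newswire is stored as a list of word indexes, it has to be turned into a fixed-size vector before it can be fed to a dense network. A common approach is multi-hot encoding of the words and one-hot encoding of the 46 topic labels. The sketch below uses two made-up sequences instead of the real dataset, and the helper names are hypothetical.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    """Multi-hot encode lists of word indexes into a (samples, dimension) matrix."""
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0   # set every index that occurs in the newswire
    return results

def to_one_hot(labels, num_classes=46):
    """One-hot encode integer topic labels."""
    results = np.zeros((len(labels), num_classes))
    results[np.arange(len(labels)), labels] = 1.0
    return results

sample_sequences = [[1, 3, 3], [0, 2]]   # made-up word-index lists
x = vectorize_sequences(sample_sequences, dimension=5)
y = to_one_hot([3, 45])
print(x)
print(y.shape)
```

With the real data, `dimension` would match the vocabulary size kept when loading the dataset, and the network would end in a 46-unit softmax layer.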

13. With a neat sketch, enumerate the concept of the deep-learning software
and hardware stack.
A. The deep-learning software and hardware stack consists of several layers that
work together to facilitate the development and deployment of deep learning
models.
1. Hardware Layer: Provides the computational power for training deep learning
models.
Examples: GPUs, TPUs, CPUs, FPGAs, and ASICs.
2. System Software Layer: provides an environment for running deep learning
frameworks.
Examples: Operating systems, device drivers, and resource managers.
3. Deep Learning Frameworks: handle complex mathematical operations and
optimize performance.
Examples: TensorFlow, PyTorch, Keras.
4. Libraries and Tools: Offer additional functionalities for data manipulation.
Examples: NumPy, OpenCV, and specialized libraries.
5. Model Development: The phase where deep learning models are created.
Examples: Data preprocessing, model design, training, and evaluation.

14. Explain the high-level building blocks required for developing deep-
learning models.
A. The building blocks of deep learning:
The fundamental data structure in neural networks is the layer.
A layer is a data-processing module that takes as input one or more tensors and that
outputs one or more tensors. Some layers are stateless, but more frequently layers
have a state.
Different layers are appropriate for different tensor formats and different types of
data processing.
Building deep-learning models in Keras is done by clipping together compatible layers
to form useful data-transformation pipelines.
Consider the following example
When using Keras, you don’t have to worry about compatibility, because the layers
you add to your models are dynamically built to match the shape of the incoming
layer. For instance, suppose you write the following

The second layer didn’t receive an input shape argument; instead, it automatically
inferred its input shape as being the output shape of the layer that came before.
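The idea of automatic shape inference can be illustrated without Keras. The `LazyDense` class below is a hypothetical toy, not the Keras `Dense` layer: it creates its weight matrix only when it first sees data, taking the input dimension from the incoming tensor.

```python
import numpy as np

class LazyDense:
    """Toy dense layer that builds its weights on first use (not the Keras class)."""

    def __init__(self, units):
        self.units = units
        self.kernel = None   # created once the input shape is known

    def __call__(self, inputs):
        if self.kernel is None:
            input_dim = inputs.shape[-1]              # inferred from the incoming data
            self.kernel = np.zeros((input_dim, self.units))
        return inputs @ self.kernel

# Only the data carries the input size; no layer is told its input shape
layers = [LazyDense(32), LazyDense(10)]
x = np.ones((4, 784))
for layer in layers:
    x = layer(x)

print(layers[0].kernel.shape, layers[1].kernel.shape, x.shape)
```

Each layer's weight matrix ends up with as many rows as the previous layer has outputs, which is the compatibility rule the text describes.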
UNIT-4
1. Explain about filters in CNN’s.
A. A filter, or kernel, in a CNN is a small matrix of weights that slides over the input
data, performs element-wise multiplication with the part of the input it is currently
on, and then sums up all the results into a single output pixel. This process is known
as convolution.
Filters are at the heart of what makes CNNs work. They are the primary component
that helps the model extract useful features from the input data.
Here’s how they work:
1. Size: Filters are typically small, square matrices. Common dimensions include
3x3, 5x5, and 7x7.
2. Sliding the Filter: The filter slides across the input data, moving by a certain
number of pixels each time, defined by the “stride”.
3. Convolution Operation: This operation involves element-wise multiplication
of the filter’s weights and the pixel values in the image, followed by summing
these results.
4. Learning: During training, the CNN learns the best values for the filter weights
to achieve the task at hand.
5. Feature Extraction: Filters are responsible for feature extraction in CNNs.
By applying multiple filters to the input data, a CNN can learn to detect a wide
variety of features. This makes them powerful tools for image processing tasks.
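The sliding, multiplying, and summing described above can be written as a short NumPy function. This sketch assumes a square filter and no padding, and it uses a hand-written vertical-edge filter; in a trained CNN the filter values would be learned.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image: multiply element-wise, then sum."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])       # hand-written filter
features = conv2d(image, vertical_edge)
print(features)
```

Pixel values in this toy image grow by 1 per column, so the filter responds with the same value at every position. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is cross-correlation.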

2. Explain about BPTT algorithm.


A. Backpropagation through time (BPTT) is a method used in recurrent neural
networks (RNNs) to train the network by backpropagating errors through time.
In RNNs, there are connections between nodes in different time steps,
which means that the output of the network at one time step depends on the input
at that time step as well as the previous time steps.
BPTT works by unfolding the RNN over time, creating a series of interconnected
feedforward networks. Each time step corresponds to one layer in this unfolded
network, and the weights between layers are shared across time steps. The unfolded
network can be thought of as a very deep feedforward network, where the weights
are shared across layers.
During training, the error is backpropagated through the unfolded network, and the
gradients of each shared weight are summed over all time steps.
Uses of BPTT:
Speech recognition, language modeling, and time series prediction.
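The unfolding idea can be shown with the smallest possible RNN: a single hidden unit with one recurrent weight `w` and one input weight `u`. The inputs, target, and weight values below are made up, and the loss is taken only on the final hidden state to keep the sketch short.

```python
import math

def forward(w, u, xs):
    """Unroll the RNN h_t = tanh(w * h_{t-1} + u * x_t), starting from h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def bptt(w, u, xs, target):
    """Loss on the last hidden state, with gradients summed over all time steps."""
    hs = forward(w, u, xs)
    loss = 0.5 * (hs[-1] - target) ** 2
    dw = du = 0.0
    dh = hs[-1] - target                    # dLoss/dh at the final step
    for t in range(len(xs), 0, -1):         # walk backward through time
        da = dh * (1.0 - hs[t] ** 2)        # back through tanh
        dw += da * hs[t - 1]                # same w at every step, so gradients add up
        du += da * xs[t - 1]
        dh = da * w                         # pass the error to the previous time step
    return loss, dw, du

xs = [0.5, -0.1, 0.3]
loss, dw, du = bptt(0.8, 0.5, xs, target=0.2)
print(loss, dw, du)
```

The repeated multiplication by `w` in the backward loop is also why gradients can vanish or explode over long sequences.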

3. Discuss about neural networks and representation learning.


A. Neural networks are biologically inspired algorithms that attempt to mimic the
functions of neurons in the brain.
Deep Learning for Representation Learning
Deep Neural Networks are representation learning models. They encode the input
information into hierarchical representations and project it into various subspaces.
Deep Learning tasks can be divided into two categories: Supervised and
Unsupervised Learning.

Supervised Representation Learning:


The learning process is tailored towards a specific task, such as image classification
or sentiment analysis. The learned representations are optimized to perform well on
that particular task.
Examples: Convolutional Neural Network (CNN)
Unsupervised Representation Learning:
Works with unlabeled data. The algorithm identifies patterns and relationships within
the data itself. The goal is to learn informative representations that capture the
underlying structure and essential features of the data.
Examples: autoencoder

4. Explain the differences between ANN and CNN.


A. Artificial Neural Networks (ANN) and Convolutional Neural Networks (CNN) are
both deep learning models, but they differ in architecture, functionality, and
applications:
Artificial Neural Networks (ANN):
1. Feedforward network
2. Fully connected layers
3. Each neuron connected to every neuron in next layer
4. Good for:
- Simple classification tasks
- Regression tasks
5. Limitations:
- Not efficient for images or other data with spatial hierarchy
- Prone to overfitting
Convolutional Neural Networks (CNN):
1. Specialized for image and video processing
2. Convolutional and pooling layers
3. Locally connected neurons (receptive fields)
4. Good for:
- Image classification
- Object detection
5. Advantages:
- Efficient feature extraction
- Robust to translation (shifts) of features in the input
- Reduced parameters

5. Explain about the features of PyTorch library.


A. PyTorch supports dynamic computational graphs, enabling network behavior to be
changed at runtime.
Key Features
PyTorch’s popularity is driven by several key features:
• Dynamic Computation Graphs: The capability to define computation graphs
on the fly supports an interactive development environment.
• Ease of Integration: It integrates smoothly with Python and various scientific
computing packages, such as NumPy and SciPy.
• Rich Ecosystem: PyTorch boasts a vast ecosystem with libraries tailored
for computer vision (TorchVision) and natural language processing (TorchText).
• Community and Support: The PyTorch community is active and welcoming,
offering abundant online resources, tutorials, and forums.

6. Explain the term Gated recurrent units in RNN’s.


A. A Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) unit that
is faster to compute than an LSTM because it simplifies the structure to only two
gates: the update gate and the reset gate. It is used when speed is crucial in
processing large amounts of data.
The different gates of a GRU are as described below:-
1. Update Gate(z): It determines how much of the past knowledge needs to be
passed along into the future.
2. Reset Gate(r): It is analogous to the combination of the Input Gate and the
Forget Gate in an LSTM recurrent unit.
3. Current Memory Gate(h1): It is incorporated into the Reset Gate just like the
Input Modulation Gate is a sub-part of the Input Gate.
The basic work-flow of a Gated Recurrent Unit Network is similar to that of a
basic Recurrent Neural Network.

Working of a Gated Recurrent Unit:


• Take as input the current input and the previous hidden state as vectors.
• Calculate the values of the gates by following these steps:
1. For each gate, calculate the parameterized current input and previous
hidden state vectors by multiplying each vector by the respective
weight matrix for that gate.
2. Apply the respective activation function for each gate element-wise on
the parameterized vectors. The gates and their activation functions
are listed below.
• Update Gate: Sigmoid Function
• Reset Gate: Sigmoid Function
• Current Memory Gate: Tanh Function
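One GRU time step can be sketched in NumPy as follows. The parameter names (`Wz`, `Uz`, and so on), the sizes, and the random initial values are assumptions for illustration. Conventions for the final mixing step differ between references; here the update gate z weights the previous state, matching the description that it decides how much past knowledge is passed along.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU time step for input x and previous hidden state h_prev."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])             # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])             # reset gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])  # current memory
    return z * h_prev + (1.0 - z) * h_cand                            # mix old and new state

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
p = {}
for gate in "zrh":
    p["W" + gate] = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
    p["U" + gate] = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
    p["b" + gate] = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):   # run over a sequence of 5 inputs
    h = gru_step(x, h, p)
print(h)
```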

7. Explain in detail about LSTM in RNN.


A. Long Short-Term Memory (LSTM) is a type of RNN that can retain information from
earlier steps over a longer period, which helps it produce the output for later steps
more reliably. It is an improved version of the recurrent neural network
designed by Hochreiter & Schmidhuber.
A traditional RNN has a single hidden state that is passed through time, which can
make it difficult for the network to learn long-term dependencies. LSTMs
address this problem by introducing a memory cell, which is a container that
can hold information for an extended period.
LSTMs can also be used in combination with other neural network architectures, such
as Convolutional Neural Networks (CNNs) for image and video analysis.
LSTM Working
LSTM architecture has a chain structure that contains four neural networks and
different memory blocks called cells.
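The four internal networks correspond to the input gate, forget gate, output gate, and candidate cell update. A single LSTM step can be sketched in NumPy as below; stacking the four sets of weights into one matrix and the gate ordering are layout choices assumed for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,), gates stacked as i, f, o, g."""
    H = h_prev.shape[0]
    gates = W @ x + U @ h_prev + b
    i = sigmoid(gates[0:H])          # input gate: how much new information to store
    f = sigmoid(gates[H:2 * H])      # forget gate: how much of the old cell to keep
    o = sigmoid(gates[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(gates[3 * H:])       # candidate values for the memory cell
    c = f * c_prev + i * g           # updated memory cell
    h = o * np.tanh(c)               # updated hidden state
    return h, c

D, H = 3, 2
W, U, b = np.zeros((4 * H, D)), np.zeros((4 * H, H)), np.zeros(4 * H)
h, c = lstm_step(np.ones(D), np.zeros(H), np.ones(H), W, U, b)
print(h, c)
```

With all-zero weights every gate outputs 0.5, so the cell simply keeps half of its previous value; trained weights let the gates open and close depending on the input.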

8. Explain why to use Recurrence neural networks other than CNN’s.


A. The main differences between CNNs and RNNs include the following:
• CNNs are commonly used to solve problems involving spatial data, such as
images. RNNs are better suited to analyzing temporal and sequential data,
such as text or videos.
• CNNs and RNNs have different architectures. CNNs are feedforward neural
networks that use filters and pooling layers, whereas RNNs feed results back
into the network.
• In CNNs, the size of the input and the resulting output are fixed. A CNN
receives images of fixed size and outputs a predicted class label for each image
along with a confidence level. In RNNs, the size of the input and the resulting
output can vary.
• Common use cases for CNNs include facial recognition, medical analysis and
image classification. Common use cases for RNNs include machine
translation, natural language processing, sentiment analysis and speech
analysis.

9. Discuss about PyTorch Vs TensorFlow.


A. Difference between PyTorch and TensorFlow
There are various deep learning libraries, but the two most famous are
PyTorch and TensorFlow.

S.No | PyTorch | TensorFlow
1 | It was developed by Facebook. | It was developed by Google.
2 | It was made using the Torch library. | It was built by the Google Brain team as the successor to DistBelief.
3 | It works on a dynamic graph concept. | It is built around a static graph concept (TensorFlow 1.x; TensorFlow 2.x runs eagerly by default).
4 | PyTorch has fewer features compared to TensorFlow. | It has higher-level functionality and provides a broad spectrum of choices to work with.
5 | PyTorch uses a simple API that saves all the weights of the model. | It has the major benefit that the whole graph can be saved as a protocol buffer.
6 | It is comparatively less supportive in deployments. | It is more supportive of embedded and mobile deployments than PyTorch.
7 | It has a smaller community. | It has a larger community.
8 | It is easy to learn and understand. | It is comparatively hard to learn.
9 | It requires the user to place everything on the device explicitly. | Default settings are well defined in TensorFlow.
10 | It has a dynamic computational process. | It requires the use of a debugger tool.

10.Explain about multi-channel convolutional operation in neural networks.


A. As you probably already know, a single-channel convolution works by sliding a 2D
filter, usually smaller than the input matrix, across the height and width dimensions.
Single-channel convolution: There is only one filter (K x K), and each time it slides
across the input matrix, we compute the weighted sum and compress the
information into a single cell of the output matrix. Repeating this over every position
gives a G x G final output.
Multichannel convolution with M input channels: The filter (now M x K x K) will
always have the same number of channels (M) as the input matrix (M x F x F). This
way, when we’re calculating the weighted sum, each input channel will have its
corresponding kernel.

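Note: A “filter” is a collection of “kernels”, one kernel per input channel (for an RGB
image, one each for R, G, and B).
Multi-channel convolution with M input channels and N output channels: N such
filters are applied, and each filter produces one output channel.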
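The multi-channel case can be sketched in NumPy with an input of shape (M, F, F) and N filters of shape (M, K, K). The sketch assumes stride 1 and no padding, so G = F - K + 1; the all-ones input and filters are toy values that make the result easy to check by hand.

```python
import numpy as np

def conv2d_multichannel(x, filters):
    """x: (M, F, F) input; filters: (N, M, K, K). Returns (N, G, G) with G = F - K + 1."""
    n_out, m, k, _ = filters.shape
    g = x.shape[1] - k + 1
    out = np.zeros((n_out, g, g))
    for n in range(n_out):               # one output channel per filter
        for i in range(g):
            for j in range(g):
                # multiply all M kernels with their channels, then sum everything
                out[n, i, j] = np.sum(x[:, i:i + k, j:j + k] * filters[n])
    return out

x = np.ones((3, 5, 5))                   # M = 3 input channels, F = 5
filters = np.ones((2, 3, 3, 3))          # N = 2 filters, each 3 kernels of size 3x3
out = conv2d_multichannel(x, filters)
print(out.shape, out[0, 0, 0])
```

Each output value sums 3 x 3 x 3 = 27 products, and the two filters give two 3 x 3 output channels.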

11. Explain about convolutional operation in neural networks.


A. The convolution operation, the main part of the CNN, applies specific filters or
kernel functions to a selected region of the image to detect local features .
The name “Convolutional neural network” indicates that the network employs a
mathematical operation called Convolution. Convolution is a specialized kind of linear
operation.

Convolution operations:
Convolution Kernels: A kernel is a 2D matrix that maps on the input image by simple
matrix multiplication and addition, the output obtained is of lower dimensions and
therefore easier to work with.

12. What do you mean by weight sharing? Explain weight sharing in CNNs.
A. Weight sharing means that the same filter weights are applied at every spatial
position of the input instead of learning separate weights for each position. It
reduces the number of weights that must be learned, which reduces model training
time and cost.
The weight-sharing property of convolutional neural networks (CNNs) has been a
revolutionary concept in the field of deep learning and computer vision. Its
advantages and disadvantages are listed below, followed by a Python example.
Advantages
1. Reduced Complexity: CNNs drastically reduce the number of parameters,
making the network less complex and easier to train.
2. Translation Invariance: helps in detecting features regardless of their position
in the input space.
3. Efficiency: With fewer parameters, CNNs are more computationally efficient
and require less memory.
Disadvantages
1. Limited Perception: Due to their local receptive field, CNN might have a limited
understanding of the overall context.
2. Spatial Invariance Limitation: CNNs are less effective with other
transformations like rotation and scaling without additional augmentation.

Python Example
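from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Sequential
model = Sequential()
# 32 filters of size 3x3; each filter's weights are reused at every image position
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu',
                 input_shape=(28, 28, 1)))
# The layer has only 3*3*1*32 weights + 32 biases = 320 parameters
model.summary()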

13. What are the advantages of using Convolutional neural networks over
other neural networks.
A. Advantages of convolutional neural networks
• No human supervision is required to identify important features.
• Automatic feature extraction.
• Highly accurate at image recognition & classification.
• Weight sharing.
• Minimizes computation.
• Uses same knowledge across all image locations.
• Ability to handle large datasets.
• Hierarchical learning.

14. How CNN’s and RNN’s works with PyTorch.


A. Here's an overview of how Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs) work with PyTorch:
->Convolutional Neural Networks (CNNs)
Architecture:
1. Conv2d (Convolutional Layer)
2. MaxPool2d (Pooling Layer)
3. Flatten ()
4. Linear (Fully Connected Layer)
5. Activation Functions (ReLU, Sigmoid, etc.)
->Recurrent Neural Networks (RNNs)
Architecture:
1. RNN Cell (Basic RNN, LSTM, GRU)
2. Embedding Layer
3. Linear (Fully Connected Layer)
4. Activation Functions (Tanh, Sigmoid, etc.)
Training
1. Prepare dataset and data loader
2. Set training loop
3. Forward pass
4. Calculate loss
5. Backward pass
6. Update model parameters

15. Implement stride and padding with a practical example.


A. Here's a practical example of implementing stride and padding in a Convolutional
Neural Network (CNN) using PyTorch:
Example:
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvLayer(nn.Module):
    def __init__(self):
        super().__init__()
        # 1 input channel, 1 output channel, 3x3 kernel, stride 2, padding 1
        self.conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

model = ConvLayer()
input_tensor = torch.randn(1, 1, 10, 10)
print("Input shape:", input_tensor.shape)
output = model(input_tensor)
print("Output shape:", output.shape)
In this example:
Stride:
* Stride controls the step size of the convolutional filter.
* A stride of 2 means the filter moves 2 pixels at a time.
* This reduces the spatial dimensions of the output.
Padding:
* Padding adds zeros to the input borders.
* Padding of 1 adds 1 zero pixel border around the input.
* This helps maintain the spatial dimensions of the output.
Calculating Output Shape:
* Output size = floor((input + 2 * padding - kernel) / stride) + 1
* Output height = floor((10 + 2 * 1 - 3) / 2) + 1 = 4 + 1 = 5
* Output width = floor((10 + 2 * 1 - 3) / 2) + 1 = 4 + 1 = 5
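The output-shape arithmetic can be wrapped in a small helper to check other combinations of stride and padding. The function name is hypothetical; it assumes a square kernel and the same padding on every side.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one dimension: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_output_size(10, kernel=3, stride=2, padding=1))  # the example above
print(conv_output_size(10, kernel=3, stride=1, padding=1))  # padding 1 keeps a 3x3 conv "same" size
print(conv_output_size(28, kernel=3))                       # no padding shrinks the output
```

Integer division implements the floor, which matters whenever the stride does not divide the padded size evenly, as in the 10 x 10 example.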

16. Enumerate the concept of sequence learning problems.


A. Sequence learning problems are tasks in which the input, the output, or both are
ordered sequences, so the order of the elements matters. There are four basic
sequence learning problems: sequence prediction, sequence generation, sequence
recognition, and sequential decision making.
UNIT-5
1.What are the regular algorithms used in NLP.
A. Natural Language Processing (NLP) is a branch of AI that focuses on developing
computer algorithms to understand and process natural language.
It allows computers to understand human written and spoken language to analyze
text, extract meaning, recognize patterns, and generate new text content.
Types of NLP algorithms
Sentiment analysis: Sentiment analysis is the process of classifying text into
categories of positive, negative, or neutral sentiment.
It works through the use of several techniques: Tokenization, Stop words removal,
Text normalization, Feature extraction, Classification
Keyword extraction: Keyword extraction is a process of extracting important
keywords or phrases from text. This algorithm extracts meaningful keywords from
text to help identify the topics or trends.
Knowledge graph: This algorithm creates a graph network of important entities, such
as people, places, and things. This graph can then be used to understand how
different concepts are related.
Word cloud: This one most of us have come across at one point or another! A word
cloud is a graphical representation of the frequency of words used in the text. It can
be used to identify trends and topics in customer feedback.
Text summarization: This algorithm creates summaries of long texts to make it easier
for humans to understand their contents quickly. Businesses can use it to summarize
customer feedback or large documents into shorter versions for better analysis.
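The sentiment analysis steps listed above (tokenization, stop word removal, text normalization, feature extraction, classification) can be sketched in a few lines. This is a toy illustration only: the word lists and the function names preprocess, extract_features and classify are assumptions made for the example, and a real system would use a trained classifier rather than counting words from a hand-made lexicon.

```python
import re

# Tiny hand-made resources (assumed for illustration, not a real lexicon)
STOP_WORDS = {"the", "is", "a", "an", "and", "was", "it", "this", "to", "of"}
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def preprocess(text):
    """Normalize case, tokenize into words, and remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def extract_features(tokens):
    """Feature extraction: counts of positive and negative words."""
    positive = sum(t in POSITIVE for t in tokens)
    negative = sum(t in NEGATIVE for t in tokens)
    return positive, negative

def classify(text):
    """Classification: compare the two counts."""
    positive, negative = extract_features(preprocess(text))
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"

print(classify("The delivery was fast and the support is great"))
print(classify("Terrible quality, I hate it"))
print(classify("The parcel arrived on Monday"))
```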
3 common use cases for NLP algorithms
1. Customer support: Businesses can use sentiment analysis to monitor
customer feedback and identify areas of improvement.
2. Market analysis: Keyword extraction can help businesses identify topics and
trends in customer conversations to inform their marketing strategies.
3. Text summarization: Businesses can use text summarization to quickly analyze
long documents or customer feedback.

2. Discuss training on the dataset in GANs.


A. Steps to Train Generative Adversarial Networks
Step 1: Define the problem. ...
Step 2: Define the architecture of the GAN. ...
Step 3: Train the discriminator on real data for n epochs. ...
Step 4: Generate fake inputs with the generator and train the discriminator on the
fake data.
Step 6: Manually check whether the fake data looks legitimate.
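The training steps above can be sketched with a deliberately tiny one-dimensional GAN. Everything here is an assumption made for illustration: the real data is drawn from a Gaussian with mean 4, the generator is the linear map G(z) = a*z + b, the discriminator is D(x) = sigmoid(w*x + c), and the gradients are written out by hand instead of using a deep learning library. The loop also includes the generator update driven by the discriminator's output, which any GAN needs even though the list above does not spell it out.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: G(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        # Train the discriminator on one real and one fake sample
        x_real = rng.gauss(4.0, 0.5)
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        # Gradients of -log D(x_real) - log(1 - D(x_fake)) w.r.t. w and c
        w -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Train the generator using the discriminator's output
        z = rng.gauss(0.0, 1.0)
        d_fake = sigmoid(w * (a * z + b) + c)
        # Gradients of -log D(G(z)) w.r.t. a and b
        a -= lr * (d_fake - 1.0) * w * z
        b -= lr * (d_fake - 1.0) * w
    return a, b, w, c

a, b, w, c = train_toy_gan()
print("generator:", a, b, "discriminator:", w, c)
```

A linear discriminator can only separate the two distributions by their location, so this sketch shows the alternating update pattern rather than a model that is guaranteed to converge.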

3. What are some advantages of using machine vision over regular human inspection?
A. Machine vision has several advantages over regular human inspection:
Advantages:
1. Increased Accuracy: Machine vision systems can inspect products with high
precision and consistency, reducing error rates.
2. Improved Speed: Machine vision systems can inspect products at much faster
rates than humans.
3. 24/7 Operation: Machine vision systems can operate continuously without fatigue
or breaks.
4. Consistency: Machine vision systems provide consistent inspection results.
5. Enhanced Defect Detection: Machine vision systems can detect subtle defects that
may be missed by humans.
6. Reduced Labor Costs: Automating inspection tasks reduces labor costs.
7. Improved Safety: Machine vision systems can inspect hazardous or hard-to-reach
areas.
8. Data Analysis: Machine vision systems provide valuable data for process
improvement.

4. Explain the types of GANs.


A. The goal of generative modeling is to autonomously identify patterns in input
data, enabling the model to produce new examples that feasibly resemble the
original dataset.
Types of GANs
1. Vanilla GAN: Here, the Generator and the Discriminator are simple
multi-layer perceptrons. The algorithm is simple: it tries to
optimize its objective function using stochastic gradient descent.
2. Conditional GAN (CGAN): CGAN is a deep learning method in which an extra
conditional parameter, such as a class label, is supplied to both the
Generator and the Discriminator so that the output can be controlled.
3. Deep Convolutional GAN (DCGAN): DCGAN is one of the most popular GAN
variants. It is composed of ConvNets in place of multi-layer perceptrons.
4. Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a linear invertible
image representation consisting of a set of band-pass images. LAPGAN
generates images from coarse to fine, with a generator and discriminator at
each level of the pyramid.
5. Super Resolution GAN (SRGAN): As the name suggests, SRGAN is a GAN design
in which a deep neural network is used along with an adversarial network
in order to produce higher-resolution images.

5. Can you explain the role of an image sensor in the context of machine vision?
A. Image sensors: At the center of any vision system, an image sensor converts light
energy into electrical signals that can be analyzed by software. Image sensors are
solid-state semiconductor chips comprising millions of photodetectors called pixels,
which convert light into electrical signals.
Machine Vision Image Sensor Context:
1. Lighting: Proper lighting illuminates the object or scene, ensuring optimal image
quality.
2. Optics: Lenses focus light onto the image sensor, controlling field of view, depth of
field, and resolution.
3. Image Sensor: Converts light into electrical signals, capturing visual data (e.g.,
intensity, color, texture).
4. Image Processing: Software processes and enhances the captured image,
correcting distortions, noise, and artifacts.
5. Analysis: Algorithms analyze the processed image, extracting relevant information
(e.g., object detection, classification, measurement).

6. Explain about Natural Language Processing.


A. Natural language processing (NLP) is a field of computer science and a subfield of
artificial intelligence that aims to make computers understand human language.
NLP Techniques
1. Text Processing and Preprocessing in NLP: Standardizing text, including case
normalization, removing punctuation, and correcting spelling errors.
2. Syntax and Parsing in NLP: Analyzing the grammatical structure of a sentence to
identify relationships between words.
3. Semantic Analysis: For example, named entity recognition, which identifies and
classifies entities in text, such as names of people, organizations, locations,
dates, etc.
4. Information Extraction: For example, relationship extraction, which identifies
and categorizes the relationships between entities in a text.
5. Text Classification in NLP: For example, sentiment analysis, which determines the
sentiment or emotional tone expressed in a text (e.g., positive, negative, neutral).
6. Language Generation: Translating text from one language to another and
automatically generating coherent and contextually relevant text.
7. Speech Processing: Converting spoken language into text (speech recognition) and
converting written text into spoken language (text-to-speech).
8. Question Answering: Finding and returning the most relevant text passage in
response to a query.
9. Dialogue Systems: Enabling systems to engage in conversations with users,
providing responses and performing tasks based on user input.
10. Sentiment and Emotion Analysis in NLP: Identifying and categorizing emotions
expressed in text. Analyzing reviews to understand public sentiment toward products,
services, or topics.

7. How do GANs work?


A. The steps involved in how a GAN works:
1. Initialization: Two neural networks are created: a Generator (G) and a
Discriminator (D).
• G is tasked with creating new data, like images or text, that closely
resembles real data.
• D acts as a critic, trying to distinguish between real data (from a training
dataset) and the data generated by G.
2. Generator’s First Move: G takes a random noise vector as input. This noise
vector contains random values and acts as the starting point for G’s creation
process.
3. Discriminator’s Turn: D receives two kinds of inputs:
• Real data samples from the training dataset.
• Data samples generated by G.
D’s job is to analyze each input and determine whether it’s real data or
something G cooked up.
4. The Learning Process: If D correctly identifies real data as real and generated
data as fake, both networks are doing their jobs well, and training continues
so that each keeps improving.
5. Generator’s Improvement: When D mistakenly labels G’s creation as real, it’s
a sign that G is on the right track, and D is penalized for being fooled.
6. Discriminator’s Adaptation: Conversely, if D correctly identifies G’s fake data,
G receives no reward and D is further strengthened in its discrimination
abilities.
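The "reward" and "penalty" in the steps above are usually expressed as loss values. The sketch below uses the common binary cross-entropy form of the GAN losses; the function names and the example scores are assumptions for illustration, where d_real and d_fake stand for the probabilities D assigns to a real sample and to a generated sample being real.

```python
import math

def discriminator_loss(d_real, d_fake):
    """D wants d_real close to 1 and d_fake close to 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating form: G wants D to score its samples close to 1."""
    return -math.log(d_fake)

# D is correct about both inputs: small loss for D, large loss for G
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))

# D is fooled by G's sample: large loss for D, small loss for G
print(discriminator_loss(0.9, 0.9), generator_loss(0.9))
```

Each network is then updated by gradient descent on its own loss, which is what drives the back-and-forth described in steps 4 to 6.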

8. Explain about various NLP tools.


A. Various NLP tools:
1. MonkeyLearn: MonkeyLearn is a text-analysis solution that helps extract data
from sources such as Gmail messages and tweets.
2. spaCy: One of the best-known tools for NLP, spaCy is an open-source
library for natural language processing in Python.
3. Stanford CoreNLP: Stanford CoreNLP is a set of language analysis tools
written in Java.
4. MindMeld: MindMeld is a conversational AI platform that provides algorithms
for conversational understanding of a domain.
5. Amazon Comprehend: Amazon Comprehend is an AI service that offers
natural language processing.
6. OpenAI: OpenAI provides advanced AI tools for NLP, with work spanning
machine learning, NLP, robotics, and deep learning.
7. Microsoft Azure: A leading cloud platform whose AI services support NLP
across diverse applications.
8. Google Cloud: Google Cloud runs on the same infrastructure as Google's own
applications and offers a platform for custom cloud computing services.
9. IBM Watson: One of the common AI tools for NLP, IBM Watson is a service
developed by IBM for comprehension of texts in various languages.
10. Gensim: Gensim is an open-source library used by data scientists, with a
variety of algorithms including random projections.
11. PyTorch: PyTorch is an open-source deep learning framework with dynamic
computation graphs, often used to build NLP models.
12. FireEye Helix: FireEye Helix is software that offers a pipeline with features
such as a tokenizer and a summarizer.

9. What are Boltzmann machines and Restricted Boltzmann machines?


A. In a general Boltzmann machine, every node can be connected to every other node,
including nodes of the same type (visible to visible and hidden to hidden). RBMs do
not have these same-layer connections; they keep only the connections between
visible and hidden nodes. This is how RBMs differ from Boltzmann machines; in every
other respect the two are identical. An RBM is a neural network that belongs to the
family of energy-based models.
Restricted Boltzmann Machines (RBMs)
The term "restricted" means that nodes of the same layer are not allowed to connect
to each other. In other words, two neurons of the input (visible) layer, or two
neurons of the hidden layer, cannot connect to each other, although the hidden layer
and the visible layer are connected to each other.
How do Restricted Boltzmann Machines work?
In RBM there are two phases through which the entire RBM works:
1st Phase: In this phase, we take the input layer and, using the weights and
biases, activate the hidden layer. This process is called the Feed
Forward Pass.
Feed Forward Equation:
• Positive Association — When the association between the visible unit and the
hidden unit is positive.
• Negative Association — When the association between the visible unit and
the hidden unit is negative.
2nd Phase: As there is no output layer, instead of calculating an output we
reconstruct the input layer from the activated hidden state. This process
is called the Feed Backward Pass.
Feed Backward Equation:
• Error = Reconstructed Input Layer - Actual Input Layer
• Adjust Weight = Input * Error * Learning Rate (e.g., 0.1)

10. Explain about Machine vision libraries.


A. Machine vision libraries are software libraries that provide pre-built functions and
tools for developing machine vision applications.
These libraries enable developers to:
1. Capture and process images
2. Detect and recognize objects
3. Classify and segment images
4. Track objects and motion
5. Perform optical character recognition (OCR)
Example:
OpenCV is a popular machine vision library:
1. Most widely used machine vision library
2. Cross-platform (Windows, Linux, macOS, Android, iOS)
3. Over 2,500 algorithms and functions
4. Supports various programming languages (C++, Python, Java, MATLAB)

11. What are the differences between GAN and Auto-encoders?


A. GANs (Generative Adversarial Networks) and autoencoders are both powerful
tools in the realm of machine learning, but they have distinct purposes and
applications. Here's a breakdown of their key differences:
Goal
• GANs: Focused on generating entirely new, realistic data that closely
resembles existing data. This data can be images, videos, text, or even music.
• Autoencoders: Aim to learn efficient representations of input data. They
achieve this by compressing the data into a latent space and then
reconstructing the original data from that code.
Here's a table summarizing the key differences:

12. What are the steps involved in a typical deep reinforcement learning
algorithm?
A. Step 0: Defining Your Problem Space
Step 1: Start with a Pre-trained Model
The first step in developing AI applications involves starting with a pre-trained model,
which can be obtained from open-source providers.
Step 2: Supervised Fine-Tuning
Supervised fine-tuning is a crucial step in the development of generative AI
applications for large language models, allowing them to become adaptable to
specific use cases.
Step 3: Reward Model Training
Reward model training is an advanced technique that involves training a model to
recognize desirable outputs created by another model.
Step 4: Reinforcement learning via proximal policy optimization (PPO)
Reinforcement learning via proximal policy optimization (PPO) is a type of algorithm
that trains large language models to produce outputs that maximize a reward signal
through trial and error.
Step 5: Red teaming
Red teaming allows human evaluators to probe the generative AI models and provide
real-world feedback on their performance.
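Step 4 above relies on PPO's clipped objective, which limits how far a single update can move the model away from its previous behaviour. The sketch below computes that objective for one sample. The names are assumptions for illustration: ratio is the probability of the chosen output under the new policy divided by its probability under the old policy, advantage says how much better than expected the output was according to the reward signal, and epsilon = 0.2 is a commonly used clipping range.

```python
def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample (a value to maximize)."""
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the more pessimistic of the unclipped and clipped estimates
    return min(ratio * advantage, clipped_ratio * advantage)

# Good output whose probability already rose by 50%: the gain is capped at 1.2 * 2
print(ppo_clipped_objective(ratio=1.5, advantage=2.0))

# Bad output whose probability was halved: the objective is held at 0.8 * -1,
# so there is no incentive to push this probability down further in this update
print(ppo_clipped_objective(ratio=0.5, advantage=-1.0))

# Small change inside the clipping range: the objective is just ratio * advantage
print(ppo_clipped_objective(ratio=1.1, advantage=1.0))
```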

13. What is Deep-Net in Deep learning?


A. Deep-Net, also known as Deep Neural Network, is a type of artificial neural
network with multiple layers, typically more than two hidden layers.
Deep-Nets are designed to learn complex patterns in data, making them a
fundamental component of deep learning.

Characteristics of Deep-Nets:
1. Multiple hidden layers
2. Large number of parameters
3. Hierarchical representation learning
4. Distributed representations
Deep-Net Applications:
1. Image classification
2. Natural Language Processing (NLP)
3. Object detection
4. Segmentation
Deep-Net Advantages:
1. Ability to learn complex patterns
2. Improved accuracy
3. Robustness to noise and variability
4. Ability to handle large datasets
5. Flexibility in architecture design
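To make "multiple hidden layers" and "large number of parameters" concrete, here is a minimal forward pass through a small deep network written in plain Python. The layer sizes, the ReLU activation, the random initial weights and all function names are assumptions chosen for the example; the network is untrained, so only the structure is meaningful.

```python
import random

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:  # hidden layers use ReLU, the output stays linear
            x = relu(x)
    return x

rng = random.Random(0)
sizes = [4, 8, 8, 8, 2]  # input, three hidden layers, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

print("outputs:", forward([1.0, 0.5, -0.5, 2.0], layers))
print("parameters:", n_params)
```

Even this tiny network has 202 parameters (40 + 72 + 72 + 18), and the count grows quickly as layers are widened or added.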

14. How do Restricted Boltzmann Machines work?


A. Restricted Boltzmann Machine (RBM) is a type of artificial neural network that is
used for unsupervised learning. It is a type of generative model that is capable of
learning a probability distribution over a set of input data.
How do Restricted Boltzmann Machines work?
In RBM there are two phases through which the entire RBM works:
1st Phase: In this phase, we take the input layer and, using the weights and
biases, activate the hidden layer. This process is called the Feed
Forward Pass.
Feed Forward Equation:
• Positive Association — When the association between the visible unit and the
hidden unit is positive.
• Negative Association — When the association between the visible unit and
the hidden unit is negative.
2nd Phase: As there is no output layer, instead of calculating an output we
reconstruct the input layer from the activated hidden state. This process
is called the Feed Backward Pass.
Feed Backward Equation:
• Error = Reconstructed Input Layer - Actual Input Layer
• Adjust Weight = Input * Error * Learning Rate (e.g., 0.1)
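The two phases can be sketched in plain Python. Note that this example swaps in the standard one-step contrastive divergence (CD-1) update rather than the simplified weight rule above, and it uses activation probabilities instead of sampled binary states so that the result is deterministic. The layer sizes, the single training vector and the function names are assumptions for illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def hidden_probs(v, W, hb):
    """Feed forward pass: activation probability of each hidden unit."""
    return [sigmoid(hb[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(hb))]

def visible_probs(h, W, vb):
    """Feed backward pass: reconstruct each visible unit from the hidden layer."""
    return [sigmoid(vb[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(vb))]

def cd1_step(v0, W, vb, hb, lr=0.1):
    """One CD-1 update in place; returns the squared reconstruction error."""
    h0 = hidden_probs(v0, W, hb)   # phase 1
    v1 = visible_probs(h0, W, vb)  # phase 2
    h1 = hidden_probs(v1, W, hb)
    for i in range(len(v0)):
        for j in range(len(h0)):
            # Positive association minus negative association
            W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        vb[i] += lr * (v0[i] - v1[i])
    for j in range(len(h0)):
        hb[j] += lr * (h0[j] - h1[j])
    return sum((a - b) ** 2 for a, b in zip(v0, v1))

# 4 visible units, 2 hidden units, all parameters start at zero
W = [[0.0, 0.0] for _ in range(4)]
vb, hb = [0.0] * 4, [0.0] * 2
data = [1.0, 1.0, 0.0, 0.0]
errors = [cd1_step(data, W, vb, hb) for _ in range(50)]
print("first error:", errors[0], "last error:", errors[-1])
```

The reconstruction error starts at 1.0 (every visible unit is reconstructed as 0.5) and shrinks as the weights and biases adapt to the training vector.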

15. What are the advantages of Deep belief networks?


A. Deep belief networks use probabilistic modeling and unsupervised, layer-wise
pre-training (optionally followed by supervised fine-tuning) to offer certain
benefits over conventional neural networks.

Advantages:
1. Unsupervised Learning: DBNs can learn complex patterns from unlabeled data.
2. Dimensionality Reduction: DBNs can reduce the dimensionality of high-dimensional data.
3. Feature Learning: DBNs can learn relevant features from data.
4. Improved Classification: DBNs can improve classification accuracy.
5. Robustness to Noise: DBNs can handle noisy or missing data.
6. Efficient Training: DBNs can be trained efficiently using greedy layer-wise training.
7. Scalability: DBNs can handle large datasets.
8. Flexibility: DBNs can be used for various tasks.

16. Discuss in detail Denoising Autoencoders with a suitable example.


A. Denoising autoencoders are a specific type of neural network that enables
unsupervised learning of data representations or encodings. Their primary objective
is to reconstruct the original version of the input signal corrupted by noise.
DAE can overcome the noise in the input samples and extract a more robust implicit
representation.
An autoencoder consists of two main components:
• Encoder: maps the input data into a low-dimensional representation.
• Decoder: returns the encoding to the original data space.
Architecture of DAE
The denoising autoencoder (DAE) architecture is similar to a standard autoencoder.

Examples
• Image Denoising: DAEs are effective in removing noise from images, such as
Gaussian noise.
• Fraud Detection: DAEs learn to reconstruct common transactions from their noisy
counterparts, so transactions that reconstruct poorly can be flagged as unusual.
• Data Imputation: DAEs can fill in missing values in datasets with incomplete
information.
Image Denoising with Denoising Autoencoder
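1. Importing Libraries and Dataset
!pip install tensorflow numpy matplotlib
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
(x_train, _), (x_test, _) = mnist.load_data()
2. Preprocess Data
# Scale pixels to [0, 1] and add a channel axis: (N, 28, 28, 1)
x_train = x_train.astype('float32')[..., np.newaxis] / 255.
x_test = x_test.astype('float32')[..., np.newaxis] / 255.
noise_factor = 0.5
# Corrupt the inputs with Gaussian noise, then clip back into [0, 1]
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(size=x_test.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
3. Define the model
# The layers below are an assumed small convolutional encoder/decoder;
# the original notes do not show the layer definitions.
input_layer = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_layer)
encoded = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
4. Train the model
# Noisy images are the inputs; the clean images are the targets
autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=128,
                validation_data=(x_test_noisy, x_test))
denoised_images = autoencoder.predict(x_test_noisy)
5. Visualize the original, noisy, and denoised images.
for image in (x_test[0], x_test_noisy[0], denoised_images[0]):
    plt.imshow(image.squeeze(), cmap='gray')
    plt.show()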