
UNIT-1 (Deep Learning)

What is Deep Learning?


Deep learning is the branch of machine learning based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from input data.
In a fully connected deep neural network, there is an input layer followed by one or more hidden layers connected in sequence. Each neuron receives input from the neurons of the previous layer (or from the input layer itself). The output of one neuron becomes the input to the neurons in the next layer, and this process continues until the final layer produces the output of the network. The layers transform the input data through a series of nonlinear transformations, allowing the network to learn complex representations of the input data.

Today, deep learning has become one of the most popular and visible areas of machine learning, owing to its success in applications such as computer vision, natural language processing, and reinforcement learning.
Deep learning can be applied in supervised, unsupervised, and reinforcement learning settings.

Difference between Machine Learning and Deep Learning

Machine learning and deep learning are both subsets of artificial intelligence, and there are many similarities as well as differences between them.
Machine Learning: Applies statistical algorithms to learn the hidden patterns and relationships in the dataset.
Deep Learning: Uses artificial neural network architectures to learn the hidden patterns and relationships in the dataset.

Machine Learning: Can work with a smaller amount of data.
Deep Learning: Requires a larger volume of data compared to machine learning.

Machine Learning: Better for simpler, low-complexity tasks.
Deep Learning: Better for complex tasks such as image processing, natural language processing, etc.

Machine Learning: Takes less time to train the model.
Deep Learning: Takes more time to train the model.

Machine Learning: A model is created from relevant features that are manually extracted from images to detect an object in the image.
Deep Learning: Relevant features are automatically extracted from images; it is an end-to-end learning process.

Machine Learning: Less complex; results are easy to interpret.
Deep Learning: More complex; works like a black box, so interpreting results is not easy.

Machine Learning: Can work on a CPU; requires less computing power than deep learning.
Deep Learning: Requires a high-performance computer with a GPU.

Various paradigms of learning problems


Machine learning is a subset of AI that enables a machine to learn automatically from data, improve its performance with experience, and make predictions. Machine learning comprises a set of algorithms that work on large amounts of data. Data is fed to these algorithms to train them and, based on this training, they build a model and perform a specific task.

Based on the methods and way of learning, machine learning is divided into mainly four
types, which are:

1. Supervised Machine Learning


2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

In this topic, we will provide a detailed description of the types of Machine Learning along
with their respective algorithms:
1. Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision: we train the machine using a "labelled" dataset, and based on this training, the machine predicts the output. Here, labelled data means that some of the inputs are already mapped to outputs. More precisely, we first train the machine with inputs and their corresponding outputs, and then we ask the machine to predict outputs for a test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to recognize the images using features such as the shape and size of the tail, the shape of the eyes, colour, and height (dogs are taller, cats are smaller). After training, we input a picture of a cat and ask the machine to identify the object and predict the output. Since the machine is now well trained, it will check all the features of the object, such as height, shape, colour, eyes, ears, tail, etc., conclude that it is a cat, and place it in the Cat category. This is how a machine identifies objects in supervised learning.

The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given
below:

o Classification
o Regression

a) Classification

Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. Classification algorithms predict the categories present in the dataset. Some real-world examples of classification algorithms are spam detection, email filtering, etc.

Some popular classification algorithms are given below; a minimal code sketch follows the list:

o Random Forest Algorithm


o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm
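To make the classification workflow concrete, here is a minimal sketch in Python using scikit-learn. The synthetic dataset and the choice of logistic regression are illustrative assumptions, not part of the material above.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic labelled dataset standing in for, e.g., spam/not-spam examples
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LogisticRegression()        # train on the labelled examples
model.fit(X_train, y_train)
print(model.predict(X_test[:5]))    # predicted categories for unseen inputs
print(model.score(X_test, y_test))  # accuracy on held-out data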

b) Regression

Regression algorithms are used to solve regression problems, in which the output variable is continuous; often a linear relationship between the input and output variables is assumed. These algorithms are used to predict continuous outputs, such as market trends, weather, etc.

Some popular Regression algorithms are given below:


o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression

Advantages and Disadvantages of Supervised Learning

Advantages:

o Since supervised learning works with labelled datasets, we have an exact idea of the classes of objects.
o These algorithms are helpful in predicting outputs on the basis of prior experience.
Disadvantages:

o These algorithms are not able to solve complex tasks.


o It may predict the wrong output if the test data is different from the training data.
o It requires lots of computational time to train the algorithm.

Applications of Supervised Learning

Some common applications of Supervised Learning are given below:

o Image Segmentation: Supervised learning algorithms are used in image segmentation, where image classification is performed on different image data with pre-defined labels.
o Medical Diagnosis: Supervised algorithms are also used in the medical field for diagnosis, using medical images and historical data labelled with disease conditions. With such a process, the machine can identify diseases in new patients.
o Fraud Detection: Supervised classification algorithms are used for identifying fraudulent transactions, fraudulent customers, etc., by using historical data to identify patterns that can indicate possible fraud.
o Spam Detection: In spam detection and filtering, classification algorithms are used. These algorithms classify an email as spam or not spam, and spam emails are sent to the spam folder.
o Speech Recognition: Supervised learning algorithms are also used in speech recognition. The algorithm is trained with voice data, which enables applications such as voice-activated passwords and voice commands.

2. Unsupervised Machine Learning

Unsupervised learning differs from the supervised learning technique; as its name suggests, there is no need for supervision. In unsupervised machine learning, the machine is trained using an unlabelled dataset and predicts the output without any supervision.

In unsupervised learning, the models are trained with data that is neither classified nor labelled, and the model acts on that data without any supervision.

The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.

Let's take an example to understand this more precisely: suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and the task of the machine is to find the patterns and categories of the objects.

So the machine will discover its own patterns and differences, such as differences in colour and shape, and predict the output when tested with the test dataset.

Categories of Unsupervised Machine Learning

Unsupervised Learning can be further classified into two types, which are given below:

o Clustering
o Association

1) Clustering

The clustering technique is used when we want to find the inherent groups from the data. It is
a way to group the objects into a cluster such that the objects with the most similarities
remain in one group and have fewer or no similarities with the objects of other groups. An
example of the clustering algorithm is grouping the customers by their purchasing behaviour.

Some of the popular clustering algorithms are given below (note that the last two are, strictly speaking, dimensionality-reduction techniques that are often grouped with unsupervised methods); a minimal clustering sketch follows the list:

o K-Means Clustering algorithm


o Mean-shift algorithm
o DBSCAN Algorithm
o Principal Component Analysis
o Independent Component Analysis
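As a minimal illustration of clustering with scikit-learn's KMeans (a sketch only; the synthetic "blob" data is an assumption for demonstration):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled data: 300 points drawn from 3 natural groups
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # group similar points into clusters
print(labels[:10])                  # cluster assignment of the first 10 points
print(kmeans.cluster_centers_)      # the learned group centres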

2) Association

Association rule learning is an unsupervised learning technique that finds interesting relations among variables in a large dataset. The main aim of this learning algorithm is to find the dependency of one data item on another and to map the variables accordingly so that maximum profit can be generated. This algorithm is mainly applied in market basket analysis, web usage mining, continuous production, etc.

Some popular association rule learning algorithms are the Apriori algorithm, Eclat, and the FP-growth algorithm.

Advantages and Disadvantages of Unsupervised Learning Algorithm

Advantages:

o These algorithms can be used for more complicated tasks than supervised ones, because they work on unlabelled datasets.
o Unsupervised algorithms are preferable for many tasks, as obtaining an unlabelled dataset is easier than obtaining a labelled one.
Disadvantages:

o The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithm is not trained with the exact output beforehand.
o Working with unsupervised learning is more difficult, as it works with unlabelled data that does not map to a known output.

Applications of Unsupervised Learning

o Network Analysis: Unsupervised learning is used in document network analysis of text data, for example to identify plagiarism and copyright issues in scholarly articles.
o Recommendation Systems: Recommendation systems widely use unsupervised
learning techniques for building recommendation applications for different web
applications and e-commerce websites.
o Anomaly Detection: Anomaly detection is a popular application of unsupervised
learning, which can identify unusual data points within the dataset. It is used to
discover fraudulent transactions.
o Singular Value Decomposition: Singular Value Decomposition (SVD) is used to extract particular information from a database, for example information about each user located in a particular region.

3. Semi-Supervised Learning

Semi-supervised learning is a type of machine learning that lies between supervised and unsupervised machine learning. It represents the intermediate ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data) and uses a combination of labelled and unlabelled datasets during training.

Although semi-supervised learning operates on data that contains a few labels, the data mostly consists of unlabelled examples. Labels are costly to obtain, so in practice only a few labelled examples may be available. Semi-supervised learning is distinct from both supervised and unsupervised learning, which are defined by the presence or absence of labels.

The concept of semi-supervised learning was introduced to overcome the drawbacks of supervised and unsupervised learning. The main aim of semi-supervised learning is to make effective use of all the available data, rather than only the labelled data as in supervised learning. Typically, similar data points are first clustered with an unsupervised learning algorithm, and the clusters then help to assign labels to the unlabelled data. This matters because labelled data is considerably more expensive to acquire than unlabelled data.

We can picture these paradigms with an example. Supervised learning is like a student learning under the supervision of an instructor at home and at college. If the student instead analyses the same concept on their own, without any help from an instructor, that is unsupervised learning. Under semi-supervised learning, the student first studies the concept under the guidance of an instructor at college and then revises it independently.


Advantages and disadvantages of Semi-Supervised Learning

Advantages:

o The algorithm is simple and easy to understand.
o It is highly efficient.
o It addresses drawbacks of both supervised and unsupervised learning algorithms.
Disadvantages:

o Iteration results may not be stable.
o These algorithms cannot be applied to network-level data.
o Accuracy is low.

4. Reinforcement Learning

Reinforcement learning works on a feedback-based process, in which an AI agent (a software component) automatically explores its environment by trial and error: taking actions, learning from experience, and improving its performance. The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximize the rewards.

In reinforcement learning there is no labelled data, unlike supervised learning; agents learn from their experience only.

The reinforcement learning process is similar to how a human learns; for example, a child learns various things through day-to-day experience. Playing a game is an example of reinforcement learning: the game is the environment, the agent's moves at each step define states, and the agent's goal is to achieve a high score. The agent receives feedback in the form of rewards and punishments.

Owing to the way it works, reinforcement learning is employed in fields such as game theory, operations research, information theory, and multi-agent systems.

A reinforcement learning problem can be formalized as a Markov Decision Process (MDP). In an MDP, the agent continually interacts with the environment and performs actions; at each action, the environment responds and generates a new state.
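A rough sketch of this agent-environment loop in Python; the toy environment (env_step) is a hypothetical stand-in, not a real library:

import random

def env_step(state, action):
    # Hypothetical environment: responds to the action with a new state and a reward
    next_state = (state + action) % 10
    reward = 1.0 if next_state == 0 else -0.1   # reward good outcomes, punish others
    return next_state, reward

state, total_reward = 5, 0.0
for t in range(100):
    action = random.choice([-1, 1])             # the agent takes an action
    state, reward = env_step(state, action)     # the environment generates a new state
    total_reward += reward                      # the agent's goal: maximize this
print("total reward:", total_reward)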

Categories of Reinforcement Learning

Reinforcement learning is categorized mainly into two types of methods/algorithms:

o Positive Reinforcement Learning: Positive reinforcement increases the tendency of a desired behaviour to occur again by adding something rewarding. It strengthens the agent's behaviour and affects it positively.
o Negative Reinforcement Learning: Negative reinforcement works in the opposite direction: it increases the tendency of a specific behaviour to occur again by removing or avoiding a negative condition.


Real-world Use cases of Reinforcement Learning

o Video Games: RL algorithms are very popular in gaming applications, where they are used to achieve super-human performance. Popular systems that use RL algorithms include AlphaGo and AlphaGo Zero.
o Resource Management: The paper "Resource Management with Deep Reinforcement Learning" showed how to use RL to automatically learn to schedule computing resources across waiting jobs in order to minimize average job slowdown.
o Robotics: RL is widely used in robotics applications. Robots are used in industrial and manufacturing settings, and reinforcement learning makes these robots more capable. Various industries envision building intelligent robots using AI and machine learning technology.
o Text Mining: Text mining, one of the important applications of NLP, is being implemented with the help of reinforcement learning, for example by Salesforce.

Advantages and Disadvantages of Reinforcement Learning

Advantages

o It helps solve complex real-world problems that are difficult to solve with conventional techniques.
o The learning model of RL is similar to human learning; hence highly accurate results can be obtained.
o It helps in achieving long-term results.
Disadvantages

o RL algorithms are not preferred for simple problems.
o RL algorithms require huge amounts of data and computation.
o Too much reinforcement can lead to an overload of states, which can weaken the results.

Perspectives and Issues in Deep Learning Frameworks

Key Perspectives:

1. Biological Perspective:
o Inspiration from the Brain: Deep learning models, particularly neural
networks, are inspired by the structure and function of the human brain.
o Artificial Neurons: The fundamental units of neural networks, artificial
neurons, mimic biological neurons in processing and transmitting information.
o Learning Through Experience: Deep learning models learn from large
amounts of data, similar to how humans learn through experience.
2. Mathematical Perspective:

o Optimization Techniques: Deep learning heavily relies on optimization
algorithms like gradient descent to minimize the loss function and improve
model performance.
o Calculus and Linear Algebra: These mathematical concepts are essential for
understanding and implementing deep learning algorithms.
o Statistical Learning Theory: This theory provides a theoretical framework
for analyzing the generalization ability of deep learning models.
3. Computational Perspective:
o Hardware Acceleration: GPUs and TPUs significantly accelerate the training
and inference processes of deep learning models.
o Distributed Computing: Large-scale deep learning models often require
distributed computing frameworks to efficiently train on massive datasets.
o Efficient Algorithms: Optimization techniques and efficient data processing
pipelines are crucial for training deep learning models in a reasonable time.
4. Engineering Perspective:
o Framework Design: Deep learning frameworks provide high-level APIs and
tools to simplify the development and deployment of deep learning models.
o Model Architecture: Engineers design and implement neural network
architectures, considering factors like model complexity, computational cost,
and performance.
o Hyperparameter Tuning: Fine-tuning hyperparameters like learning rate,
batch size, and optimizer can significantly impact model performance.

Deep Learning Challenges (Issues)

Deep learning, a branch of artificial intelligence, uses neural networks to analyze and learn from large datasets. It powers advances in image recognition, natural language processing, and autonomous systems. Despite its impressive capabilities, deep learning is not without challenges: issues such as data quality, computational demands, and model interpretability are common obstacles. This section surveys these challenges and strategies to address them effectively; understanding them and finding ways to overcome them is crucial for successful implementation.


1. Overfitting and Underfitting


Balancing model complexity to ensure it generalizes well to new data is
challenging. Overfitting occurs when a model is too complex and captures noise in the
training data. Underfitting happens when a model is too simple and fails to capture the
underlying patterns.

2. Data Quality and Quantity


Deep learning models require large, high-quality datasets for training. Insufficient or poor-
quality data can lead to inaccurate predictions and model failures. Acquiring and annotating
large datasets is often time-consuming and expensive.

3. Computational Resources
Training deep learning models demands significant computational power and resources.
This can be expensive and inaccessible for many organizations. High-performance hardware such as GPUs and TPUs is often necessary to handle the intensive computations.

4. Interpretability
Deep learning models often function as "black boxes," making it difficult to understand
how they make decisions. This lack of transparency can be problematic, especially in
critical applications. Understanding the decision-making process is crucial for trust and
accountability.

5. Hyperparameter Tuning
Finding the optimal settings for a model’s hyperparameters requires expertise. This process
can be time-consuming and computationally intensive. Hyperparameters significantly
impact the model’s performance, and tuning them effectively is essential for achieving high
accuracy.

6. Scalability
Scaling deep learning models to handle large datasets and complex tasks efficiently is a
major challenge. Ensuring models perform well in real-world applications often requires
significant adjustments. This involves optimizing both algorithms and infrastructure to
manage increased loads.

7. Ethical and Bias Issues

Deep learning models can inadvertently learn and perpetuate biases present in the training
data. This can lead to unfair outcomes and ethical concerns. Addressing bias and ensuring
fairness in models is critical for their acceptance and trustworthiness.

8. Hardware Limitations
Training deep learning models requires substantial computational resources, including
high-performance GPUs or TPUs. Access to such hardware can be a bottleneck for
researchers and practitioners.

9. Adversarial Attacks


Deep learning models are susceptible to adversarial attacks, where subtle perturbations to
input data can cause misclassification. Robustness against such attacks remains a
significant concern in safety-critical applications.

Strategies to Overcome Deep Learning Challenges

Addressing the challenges in deep learning is crucial for developing effective and reliable
models. By implementing the right strategies, we can mitigate these issues and enhance the
performance of our deep learning systems. Here are the key strategies:

1. Enhancing Data Quality and Quantity


 Pre-processing: Invest in data pre-processing techniques to clean and organize data.
 Data Augmentation: Use data augmentation methods to artificially increase the size of your dataset (a minimal sketch follows this list).
 Data Collection: Gathering more labeled data improves model accuracy and
robustness.
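A minimal sketch of data augmentation on a batch of images (NumPy only; the flip-and-noise transformations and the dummy data are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

def augment(images):
    # Horizontally flip each image, and add small Gaussian noise to a second copy
    flipped = images[:, :, ::-1]
    noisy = np.clip(images + rng.normal(0.0, 0.05, images.shape), 0.0, 1.0)
    return np.concatenate([images, flipped, noisy])

batch = rng.random((8, 28, 28))     # 8 dummy 28x28 grayscale images
print(augment(batch).shape)         # (24, 28, 28): dataset size tripled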

2. Leveraging Cloud Computing


 Cloud Platforms: Utilize cloud-based platforms like AWS, Google Cloud, or Azure for
accessing computational resources.
 Scalable Computing: These platforms offer scalable computing power without the
need for significant upfront investment.
 Tools and Frameworks: Cloud services also provide tools and frameworks that
simplify the deployment and management of deep learning models.

3. Implementing Regularization Techniques


 Dropout: Use techniques like dropout to prevent overfitting.
 L2 Regularization: Regularization helps the model generalize better by adding
constraints or noise during training.
 Data Augmentation: This ensures that the model performs well on new, unseen data.

4. Improving Model Interpretability


 Interpretability Tools: Employ tools like LIME (Local Interpretable Model-agnostic
Explanations) or SHAP (SHapley Additive explanations) to understand model
decisions.
 Transparency: Enhancing interpretability helps build trust in the model, especially in
critical applications.

5. Automating Hyperparameter Tuning
 Automated Tuning: Use automated tools such as grid search, random search, or Bayesian optimization for hyperparameter tuning (a minimal sketch follows this list).
 Efficiency: Automated tuning saves time and computational resources by
systematically exploring the hyperparameter space.
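A minimal sketch of automated tuning with scikit-learn's GridSearchCV (the model, grid, and synthetic data are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Systematically explore a small hyperparameter grid with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)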

6. Optimizing Algorithms and Hardware


 Efficient Algorithms: Implement efficient algorithms and leverage specialized
hardware like GPUs and TPUs.
 Advanced Hardware: These optimizations significantly reduce training time and
improve model performance.

7. Addressing Bias and Ethical Concerns


 Fairness Practices: Implement fairness-aware machine learning practices to identify
and mitigate biases.
 Regular Audits: Regularly audit models to ensure they do not perpetuate harmful
biases present in the training data.

Review of fundamental learning techniques

Learning is a multifaceted process that involves acquiring, processing, and retaining


information. Over the years, numerous techniques have been developed to enhance learning
efficiency and effectiveness. Here's a review of some fundamental learning techniques:

Active Learning Techniques

 Spaced Repetition: This technique involves reviewing information at increasing intervals


over time. By spacing out review sessions, you reinforce learning and improve long-term
retention.
 Active Recall: Actively retrieving information from memory, rather than passively rereading,
strengthens neural connections. Techniques like flashcards and quizzing can be effective for
active recall.
 Elaborative Rehearsal: Connecting new information to existing knowledge helps deepen
understanding. Creating mental images, analogies, or personal stories can enhance this
process.
 Interleaving: Mixing different topics or problem types during study sessions can improve
problem-solving skills and reduce the risk of rote memorization.

Effective Study Strategies

 Pomodoro Technique: This time management method involves working in focused 25-
minute intervals, followed by short breaks. It helps maintain concentration and prevents
mental fatigue.
 Mind Mapping: Visualizing information in a hierarchical structure can aid in understanding
complex concepts and relationships.
 Note-Taking: Taking effective notes helps organize information and identify key points.
Techniques like the Cornell method and the outline method can be useful.
 Practice Testing: Regularly testing yourself can solidify understanding and identify areas
that need further review.

Metacognitive Strategies

 Self-Assessment: Regularly evaluating your own learning progress helps identify strengths
and weaknesses.
 Self-Regulation: Setting specific learning goals, monitoring your progress, and adjusting
your strategies as needed are essential for effective learning.
 Self-Questioning: Asking yourself questions about the material can deepen understanding
and critical thinking skills.

Additional Tips for Effective Learning

 Create a conducive learning environment: Find a quiet, well-lit space where you can focus.
 Prioritize sleep: Adequate sleep is crucial for memory consolidation and cognitive function.
 Take breaks: Short breaks can help prevent mental fatigue and improve focus.
 Stay organized: Use calendars, planners, or digital tools to manage your time effectively.
 Seek help when needed: Don't hesitate to ask teachers, tutors, or classmates for assistance.

Activation functions in Neural Networks


An activation function in the context of neural networks is a mathematical function applied to
the output of a neuron. The purpose of an activation function is to introduce non-linearity into
the model, allowing the network to learn and represent complex patterns in the data. Without
non-linearity, a neural network would essentially behave like a linear regression model,
regardless of the number of layers it has.
The activation function decides whether a neuron should be activated or not by calculating
the weighted sum and further adding bias to it. The purpose of the activation function is to
introduce non-linearity into the output of a neuron.
Variants of Activation Function

1. Linear Function
 Equation: A linear function has the equation of a straight line, i.e. y = x.
 No matter how many layers we have, if all of them are linear in nature, the final activation of the last layer is nothing but a linear function of the input to the first layer.
 Range: -∞ to +∞
 Uses: The linear activation function is used in just one place: the output layer.
 Issues: Differentiating a linear function gives a constant; the gradient no longer depends on the input x, so it introduces no useful non-linear behaviour into the learning algorithm.

For example, calculating the price of a house is a regression problem. A house price may take any large or small value, so we can apply a linear activation at the output layer. Even in this case, the neural network must have non-linear activation functions in its hidden layers.

2. Sigmoid Function


 It is a function that is plotted as an 'S'-shaped graph.
 Equation: A = 1/(1 + e^(-x))
 Nature: Non-linear. Notice that for x values between -2 and 2, the curve is very steep; small changes in x bring about large changes in the value of y.
 Value Range: 0 to 1
 Uses: Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the sigmoid's value lies between 0 and 1, the result can be predicted as 1 if the value is greater than 0.5 and 0 otherwise.

3. Tanh Function

 The activation that almost always works better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is in fact a mathematically shifted version of the sigmoid function; the two are similar and can be derived from each other.
 Equation: f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1
OR
tanh(x) = 2 * sigmoid(2x) - 1
 Value Range: -1 to +1
 Nature: non-linear
 Uses: Usually used in hidden layers of a neural network. Because its values lie between -1 and 1, the mean of a hidden layer's outputs comes out to be 0 or very close to it. This helps centre the data by bringing the mean close to 0, which makes learning for the next layer much easier.

4. ReLU Function
 It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network.
 Equation: A(x) = max(0, x). It gives an output of x if x is positive and 0 otherwise.
 Value Range: [0, ∞)
 Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons activated by the ReLU function.
 Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any time only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and tanh functions.

5. Softmax Function

The softmax function is also a type of sigmoid function, but it is handy when we are trying to handle multi-class classification problems.
 Nature: non-linear
 Uses: Usually used when handling multiple classes; the softmax function is commonly found in the output layer of image classification problems. It squeezes the output for each class to between 0 and 1 and divides by the sum of the outputs.
 Output: The softmax function is ideally used in the output layer of a classifier, where we are trying to obtain probabilities that define the class of each input.

Choosing an activation function (a code sketch of these functions follows):
 The basic rule of thumb is: if you really don't know which activation function to use, simply use ReLU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.
 If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
 If your output is for multi-class classification, softmax is very useful for predicting the probability of each class.
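The activation functions above can be written in a few lines of NumPy (a sketch; the test values are arbitrary):

import numpy as np

def linear(x):  return x                         # y = x, range (-inf, +inf)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))  # range (0, 1)
def tanh(x):    return np.tanh(x)                # range (-1, 1), zero-centred
def relu(x):    return np.maximum(0.0, x)        # max(0, x), range [0, inf)

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()          # outputs are positive and sum to 1

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), softmax(x), sep="\n")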

Feedforward neural network


A Feedforward Neural Network (FNN) is a type of artificial neural network in which connections between the nodes do not form cycles. This characteristic differentiates it from recurrent neural networks (RNNs). The network consists of an input layer, one or more hidden layers, and an output layer. Information flows in one direction, from input to output, hence the name "feedforward."
Structure of a Feedforward Neural Network
1. Input Layer: The input layer consists of neurons that receive the input data. Each
neuron in the input layer represents a feature of the input data.
2. Hidden Layers: One or more hidden layers are placed between the input and output
layers. These layers are responsible for learning the complex patterns in the data. Each
neuron in a hidden layer applies a weighted sum of inputs followed by a non-linear
activation function.
3. Output Layer: The output layer provides the final output of the network. The number
of neurons in this layer corresponds to the number of classes in a classification problem
or the number of outputs in a regression problem.

Each connection between neurons in these layers has an associated weight that is adjusted
during the training process to minimize the error in predictions.

Training a Feedforward Neural Network


Training a Feedforward Neural Network involves adjusting the weights of the neurons to
minimize the error between the predicted output and the actual output. This process is
typically performed using backpropagation and gradient descent.
Forward Propagation: During forward propagation, the input data passes through the
network, and the output is calculated.
Loss Calculation: The loss (or error) is calculated using a loss function such as Mean
Squared Error (MSE) for regression tasks or Cross-Entropy Loss for classification tasks.
Backpropagation: In backpropagation, the error is propagated back through the network to
update the weights. The gradient of the loss function with respect to each weight is
calculated, and the weights are adjusted using gradient descent.

Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function by iteratively updating the weights in the direction of the negative gradient. Common variants of gradient descent include the following (a minimal sketch follows the list):
 Batch Gradient Descent: Updates weights after computing the gradient over the entire
dataset.
 Stochastic Gradient Descent (SGD): Updates weights for each training example
individually.
 Mini-batch Gradient Descent: Updates weights after computing the gradient over a
small batch of training examples.
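A minimal sketch of mini-batch gradient descent on a linear-regression loss (the synthetic data, learning rate, and batch size are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)  # true weights + noise

w, lr, batch_size = np.zeros(3), 0.1, 16
for epoch in range(50):
    idx = rng.permutation(len(X))                # reshuffle the examples each epoch
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]        # one mini-batch of examples
        grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)  # gradient of MSE on the batch
        w -= lr * grad                           # step in the negative-gradient direction
print(w)   # should approach the true weights [2, -1, 0.5]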

Evaluation of a Feedforward Neural Network

Evaluating the performance of a trained model involves several metrics (a code sketch follows the list):
 Accuracy: The proportion of correctly classified instances out of the total instances.
 Precision: The ratio of true positive predictions to the total predicted positives.
 Recall: The ratio of true positive predictions to the actual positives.
 F1 Score: The harmonic mean of precision and recall, providing a balance between the
two.
 Confusion Matrix: A table used to describe the performance of a classification model,
showing the true positives, true negatives, false positives, and false negatives.
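These metrics can be computed directly with scikit-learn (a sketch; the small label vectors are made up for illustration):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1 score :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # [[TN, FP], [FN, TP]]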

Multi-layer neural network
Multilayer neural networks, also known as artificial neural networks (ANNs) or deep neural networks (DNNs), are a powerful class of machine learning models that can learn complex patterns and representations from input data. A multilayer network is a generalization of the single-layer perceptron studied earlier. In this section, we cover the basics of multilayer neural networks and how they are constructed.

Multilayer Neural Network Architecture

A multilayer neural network consists of multiple layers of interconnected nodes or neurons.


Each neuron computes a weighted sum of its input values and passes it through an activation
function to produce an output value. The layers can be categorized into three types:

1. Input Layer: The first layer that receives input data.


2. Hidden Layers: The layers between the input and output layers. There can be
multiple hidden layers in a neural network.
3. Output Layer: The final layer that produces the output of the network.

Figure: Typical structure of a multilayer neural network. The illustration depicts a standard feed-forward network with two hidden layers. The network begins with an input layer receiving variables x1, x2, ..., xn. These inputs traverse the hidden layers, where the network's learning and adjustments occur, and finally reach the output layer, yielding the results y1, y2, ..., ym. Each layer consists of interconnected neurons, each computing a distinct operation, enabling complex patterns and relationships to be captured in the data.

Neurons and Activation Functions


A neuron in a neural network computes a weighted sum of its input values and applies an
activation function to produce an output value. The activation function introduces non-
linearity into the network, allowing it to learn complex relationships in the data. Common
activation functions include:


Figure: Varieties of activation functions. The illustration features three fundamental activation functions used in neural networks. (a) The sigmoid function, ranging between 0 and 1, typically used for binary classification problems. (b) The tanh function, with an output range from -1 to 1, which centres the data (zero-centred outputs often accelerate convergence). (c) The ReLU (Rectified Linear Unit) function, which lets only positive values pass through.

Forward Propagation

Forward propagation is the process of computing the output of a neural network given an
input. The input values are passed through the network layer by layer, with each neuron
computing its output value based on the weighted sum of its input values and applying the
activation function. The output of the final layer is the output of the network. The forward propagation algorithm can be summarized as follows (a minimal sketch follows the list):

1. For each neuron in the input layer, set the output equal to the input feature value.
2. For each neuron in the hidden and output layers, calculate the weighted sum of the
outputs from the previous layer and apply the activation function to obtain the output.
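A minimal sketch of this forward pass in NumPy (the layer sizes and sigmoid activation are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # Each layer: weighted sum of the previous layer's outputs, then activation
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a   # the last layer's output is the network's output

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # a 3 -> 4 -> 2 network
biases = [np.zeros(4), np.zeros(2)]
print(forward(np.array([0.5, -1.0, 2.0]), weights, biases))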

Loss Functions

In order to train a neural network, we need to define a loss function that quantifies the
difference between the predicted output and the true output. The objective of training is to
minimize this loss function. Some common loss functions include:

1. Mean Squared Error (MSE) Loss: Used for regression tasks, this loss function calculates the squared difference between the predicted and true output values and averages it over all the examples. Mathematically, the MSE loss is defined as:

MSE = (1/N) * Σ (yi - y^i)^2

where N is the number of examples, yi is the true output, and y^i is the predicted output.

2. Cross-Entropy Loss: Used for classification tasks, this loss function calculates the negative log-likelihood of the true class for each example and averages it over all the examples. Mathematically, the cross-entropy loss for a single example is defined as:

L = - Σ (from i = 1 to C) yi * log(y^i)

where C is the number of classes, yi is the true class label (one-hot encoded), and y^i is the predicted probability for class i.
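Both losses are straightforward to implement in NumPy (a sketch; the example values are made up):

import numpy as np

def mse_loss(y_true, y_pred):
    # MSE = (1/N) * sum((yi - y^i)^2)
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Per-example: -sum over classes of yi * log(y^i); averaged over examples
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

print(mse_loss(np.array([1.0, 2.0]), np.array([1.1, 1.8])))
y_true = np.array([[0, 1, 0], [1, 0, 0]])               # one-hot labels
y_pred = np.array([[0.2, 0.7, 0.1], [0.8, 0.1, 0.1]])   # predicted probabilities
print(cross_entropy_loss(y_true, y_pred))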

Backward Propagation

Backward propagation, also known as backpropagation, is an algorithm used to train


multilayer neural networks. It involves computing the gradient of the loss function with
respect to each weight by applying the chain rule. The gradient is then used to update the
weights in the network to minimize the loss function. The backpropagation algorithm can be
summarized as follows:

1. For each neuron in the output layer, compute the gradient of the loss with respect to
its output.
2. For each neuron in the hidden layers, compute the gradient of the loss with respect to
its output using the chain rule and the gradients computed for the neurons in the next
layer.
3. Update the weights using the computed gradients and a learning rate.

Training a Multilayer Neural Network

Training a multilayer neural network involves the following steps (a minimal end-to-end sketch follows the list):

1. Initialize the weights and biases of the network.


2. Perform forward propagation to compute the output of the network.
3. Calculate the loss function based on the predicted output and true target values.
4. Perform backward propagation to compute the gradient of the loss function with
respect to the weights.
5. Update the weights using the computed gradient and a learning rate.
6. Repeat steps 2-5 for multiple epochs or until a stopping criterion is met.
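A minimal end-to-end sketch of steps 1-6 for a two-layer network trained on the XOR problem (the initialization, learning rate, and epoch count are illustrative assumptions and may need adjusting):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # step 1: initialize weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0

for epoch in range(5000):
    h = sigmoid(X @ W1 + b1)            # step 2: forward propagation
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)      # step 3: loss (MSE)
    # step 4: backward propagation (chain rule, output layer then hidden layer)
    d_out = 2.0 * (out - y) / len(X) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # step 5: gradient-descent weight updates
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(3))   # step 6: after many epochs, outputs approach [0, 1, 1, 0]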

Overfitting and Regularization

Overfitting occurs when a neural network learns the noise in the training data instead of the
underlying patterns. This can lead to poor generalization to new, unseen data. Regularization
techniques can be used to prevent overfitting. Common regularization methods
include ℓ1 and ℓ2 regularization, dropout, and early stopping.

1. ℓ1 and ℓ2 Regularization: These techniques add a penalty term to the loss function based on the magnitude of the weights. ℓ1 regularization adds the sum of the absolute values of the weights, while ℓ2 regularization adds the sum of the squared values of the weights. This encourages the model to have smaller weights, making it less likely to overfit. Mathematically, the regularized loss is defined as:

L_total = L + λ * Σ |wi|   (ℓ1)
L_total = L + λ * Σ wi^2   (ℓ2)

where L is the original loss, wi are the weights, and λ controls the strength of the penalty.

2. Dropout: Dropout is a regularization technique that involves randomly "dropping out" (setting to zero) a fraction of the neurons in a layer during training. This prevents the model from relying too heavily on any single neuron and encourages it to learn more robust representations. Dropout is applied only during training; at inference time all neurons are used, with their outputs scaled by the keep probability (equivalently, "inverted dropout" rescales during training so that no change is needed at inference).

3. Early Stopping: Early stopping involves monitoring the performance of the model on a validation set during training and stopping the training process when the validation performance starts to degrade, indicating that the model is beginning to overfit. This technique helps find the point in training where the model has the best generalization performance. (A combined code sketch of these techniques follows.)
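A combined sketch of ℓ2 regularization and (inverted) dropout in NumPy; the penalty strength and dropout rate are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, lam=1e-3):
    # Added to the loss: lam * sum of squared weights discourages large weights
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(activations, rate=0.5, training=True):
    # Inverted dropout: zero out a fraction of units during training and rescale
    # the survivors, so that nothing needs to change at inference time
    if not training:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = rng.normal(size=(4, 8))             # activations of a hidden layer
print(dropout(h).round(2))              # roughly half the units silenced
print(l2_penalty([rng.normal(size=(2, 4))]))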

By applying one or a combination of these regularization techniques, overfitting can be


mitigated, leading to better generalization of the neural network to new, unseen data.

1. What are the main challenges in deep learning?


Answer:
The main challenges in deep learning include:
 Data Quality and Quantity: Ensuring large, diverse, and high-quality datasets.
 Computational Resources: High demands for powerful hardware like GPUs.
 Model Interpretability: Difficulty in understanding and explaining model
decisions.
 Overfitting: Models performing well on training data but poorly on unseen data.
 Hyperparameter Tuning: The complex and time-consuming process of
optimizing model parameters.
 Scalability: Issues with scaling models to handle massive datasets and high-
dimensional data.
 Ethical and Bias Concerns: Ensuring models are fair and unbiased.

2. How do big data and deep learning intersect, and what challenges arise?
Answer:
Big data and deep learning intersect in that deep learning models often require
vast amounts of data to train effectively. Challenges include:
 Data Integration: Combining data from various sources in different formats.
 Storage and Management: Efficiently storing and managing massive datasets.
 Processing Speed: Quickly processing large datasets to train models in a
reasonable time frame.
 Quality Control: Ensuring data cleanliness and consistency across large
volumes.
 Security and Privacy: Protecting sensitive data during processing and storage.

3. What are the perspectives on addressing big data and deep learning
challenges?
Answer:
Perspectives on addressing these challenges include:
 Advanced Algorithms: Developing more efficient and scalable algorithms.
 Distributed Computing: Utilizing distributed systems and cloud computing to
handle large-scale data and computations.
 Automated Machine Learning (AutoML): Tools to automate hyperparameter
tuning and model selection.
 Data Augmentation: Techniques to artificially increase the size of training
datasets.
 Federated Learning: Training models across decentralized data sources while
maintaining data privacy.
4. What opportunities arise when databases meet deep learning?
Answer:
When databases meet deep learning, several opportunities emerge:
 Enhanced Data Analysis: Leveraging deep learning for more sophisticated
data analysis and insights.
 Predictive Analytics: Using deep learning models to predict trends and
behaviors from large datasets.
 Automated Decision-Making: Implementing deep learning models to
automate decision-making processes.
 Personalization: Tailoring services and products based on deep learning
insights from user data.
 Real-Time Processing: Combining databases with deep learning to process
and analyze data in real-time.
