Unit 1
Today, deep learning has become one of the most popular and visible areas of machine learning, owing to its success in a variety of applications such as computer vision, natural language processing, and reinforcement learning.
Deep learning can be used for supervised, unsupervised, and reinforcement machine learning.
Compared with traditional machine learning, which takes less time to train a model, deep learning takes more time to train.
Based on the methods and the way of learning, machine learning is mainly divided into four types, which are:
In this topic, we will provide a detailed description of the types of Machine Learning along
with their respective algorithms:
1. Supervised Machine Learning
As its name suggests, supervised machine learning is based on supervision: we train the machine using a "labelled" dataset, and based on that training, the machine predicts the output. Labelled data means that some of the inputs are already mapped to outputs. More precisely, we first train the machine with inputs and their corresponding outputs, and then ask it to predict outputs for a test dataset.
Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to recognize features of the images, such as the shape and size of the tail, the shape of the eyes, colour, and height (dogs are taller, cats are smaller). After training, we input the picture of a cat and ask the machine to identify the object and predict the output. Since the machine is now well trained, it checks all the features of the object, such as height, shape, colour, eyes, ears, and tail, concludes that it is a cat, and places it in the Cat category. This is how the machine identifies objects in supervised learning.
The main goal of the supervised learning technique is to map the input variable (x) to the output variable (y). Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.
Supervised machine learning can be classified into two types of problems, which are given
below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems, in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc. The classification algorithms predict the categories present in the dataset. Some real-world examples of classification are spam detection and email filtering.
b) Regression
Regression algorithms are used to solve regression problems, in which the output variable is continuous and depends on the input variables. They are used to predict continuous outputs, such as market trends and weather.
Advantages:
o Since supervised learning works with a labelled dataset, we can have an exact idea about the classes of objects.
o These algorithms are helpful in predicting the output on the basis of prior experience.
2. Unsupervised Machine Learning
Unsupervised learning is different from the supervised learning technique; as its name suggests, it needs no supervision. In unsupervised machine learning, the machine is trained on an unlabelled dataset and predicts the output without any supervision.
In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.
The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset according to similarities, patterns, and differences. Machines are instructed to find the hidden patterns in the input dataset.
Let's take an example to understand this more precisely. Suppose there is a basket of fruit images, and we input it into the machine learning model. The images are totally unknown to the model, and its task is to find the patterns and categories of the objects. The machine will discover patterns and differences on its own, such as differences in colour and shape, and predict the output when tested with the test dataset.
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups in the data. It groups objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with objects in other groups. An example of a clustering application is grouping customers by their purchasing behaviour.
2) Association
Association rule learning is an unsupervised learning technique that finds interesting relationships among variables in a large dataset. Some popular algorithms of association rule learning are the Apriori algorithm, Eclat, and the FP-growth algorithm.
Advantages:
o These algorithms can be used for more complicated tasks than supervised algorithms, because they work on unlabelled datasets.
o Unsupervised algorithms are preferable for various tasks because obtaining an unlabelled dataset is easier than obtaining a labelled one.
Disadvantages:
o The output of an unsupervised algorithm can be less accurate, as the dataset is not labelled and the algorithm has not been trained on the exact outputs beforehand.
o Working with unsupervised learning is more difficult, as it works with unlabelled data that does not map to an output.
3. Semi-Supervised Learning
Semi-supervised learning lies between supervised and unsupervised learning: the model is trained using a small amount of labelled data together with a large amount of unlabelled data.
We can picture these approaches with an example. Supervised learning is a student under the supervision of an instructor at home and at college. If that student then analyses the same concept on their own without any help from the instructor, it comes under unsupervised learning. Under semi-supervised learning, the student revises the concept on their own after studying it under the guidance of an instructor at college.
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process: an AI agent explores its environment, takes actions, learns from experience, and improves its performance. There is no labelled data as in supervised learning; agents learn from their experiences only.
The reinforcement learning process is similar to how a human learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the agent's moves at each step define states, and the agent's goal is to get a high score. The agent receives feedback in terms of rewards and punishments.
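To make the agent-environment feedback loop concrete, here is a minimal sketch in Python. The env object and its reset/step methods are hypothetical placeholders (modeled loosely on common reinforcement learning interfaces), not part of the original material.

```python
def play_episode(env, choose_action):
    """Run one episode of the agent-environment feedback loop.

    `env` is a hypothetical environment with reset()/step() methods;
    `choose_action` is the agent's policy: a function from state to action.
    """
    state = env.reset()                           # the game starts in an initial state
    total_reward = 0.0
    done = False
    while not done:
        action = choose_action(state)             # the agent's move at this step
        state, reward, done = env.step(action)    # feedback: reward or punishment
        total_reward += reward                    # the agent's goal: a high score
    return total_reward
```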
Due to its way of working, reinforcement learning is employed in different fields such as game theory, operations research, information theory, and multi-agent systems.
Key Perspectives:
1. Biological Perspective:
o Inspiration from the Brain: Deep learning models, particularly neural
networks, are inspired by the structure and function of the human brain.
o Artificial Neurons: The fundamental units of neural networks, artificial
neurons, mimic biological neurons in processing and transmitting information.
o Learning Through Experience: Deep learning models learn from large
amounts of data, similar to how humans learn through experience.
2. Mathematical Perspective:
o Optimization Techniques: Deep learning heavily relies on optimization algorithms like gradient descent to minimize the loss function and improve model performance.
o Calculus and Linear Algebra: These mathematical concepts are essential for
understanding and implementing deep learning algorithms.
o Statistical Learning Theory: This theory provides a theoretical framework
for analyzing the generalization ability of deep learning models.
3. Computational Perspective:
o Hardware Acceleration: GPUs and TPUs significantly accelerate the training
and inference processes of deep learning models.
o Distributed Computing: Large-scale deep learning models often require
distributed computing frameworks to efficiently train on massive datasets.
o Efficient Algorithms: Optimization techniques and efficient data processing
pipelines are crucial for training deep learning models in a reasonable time.
4. Engineering Perspective:
o Framework Design: Deep learning frameworks provide high-level APIs and
tools to simplify the development and deployment of deep learning models.
o Model Architecture: Engineers design and implement neural network
architectures, considering factors like model complexity, computational cost,
and performance.
o Hyperparameter Tuning: Fine-tuning hyperparameters like learning rate,
batch size, and optimizer can significantly impact model performance.
Deep learning, a branch of artificial intelligence, uses neural networks to analyze and learn from large datasets. It powers advancements in image recognition, natural language processing, and autonomous systems. Despite its impressive capabilities, deep learning is not without its challenges: issues such as data quality, computational demands, and model interpretability are common obstacles.
Understanding these challenges and finding ways to overcome them is crucial for successful implementation. The sections below outline the main challenges and strategies to address them effectively.
3. Computational Resources
Training deep learning models demands significant computational power and resources.
This can be expensive and inaccessible for many organizations. High-performance
hardware like GPUs and TPUs are often necessary to handle the intensive computations.
4. Interpretability
Deep learning models often function as "black boxes," making it difficult to understand
how they make decisions. This lack of transparency can be problematic, especially in
critical applications. Understanding the decision-making process is crucial for trust and
accountability.
5. Hyperparameter Tuning
Finding the optimal settings for a model’s hyperparameters requires expertise. This process
can be time-consuming and computationally intensive. Hyperparameters significantly
impact the model’s performance, and tuning them effectively is essential for achieving high
accuracy.
6. Scalability
Scaling deep learning models to handle large datasets and complex tasks efficiently is a
major challenge. Ensuring models perform well in real-world applications often requires
significant adjustments. This involves optimizing both algorithms and infrastructure to
manage increased loads.
7. Bias and Fairness
Deep learning models can inadvertently learn and perpetuate biases present in the training data. This can lead to unfair outcomes and ethical concerns. Addressing bias and ensuring fairness in models is critical for their acceptance and trustworthiness.
8. Hardware Limitations
Training deep learning models requires substantial computational resources, including
high-performance GPUs or TPUs. Access to such hardware can be a bottleneck for
researchers and practitioners.
Addressing the challenges in deep learning is crucial for developing effective and reliable
models. By implementing the right strategies, we can mitigate these issues and enhance the
performance of our deep learning systems. Here are the key strategies:
5. Automating Hyperparameter Tuning
Automated Tuning: Use automated tools like grid search, random search, or Bayesian
optimization for hyperparameter tuning.
Efficiency: Automated tuning saves time and computational resources by
systematically exploring the hyperparameter space.
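As a concrete illustration, here is a minimal random-search sketch in Python. The search-space values and the train_and_evaluate callback are hypothetical placeholders, not a prescribed API.

```python
import random

# Hypothetical search space over two common hyperparameters.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
}

def random_search(train_and_evaluate, n_trials=20):
    """Sample random hyperparameter combinations and keep the best.

    `train_and_evaluate` is a user-supplied function that trains a model
    with the given settings and returns a validation score.
    """
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = {name: random.choice(values)
                  for name, values in SEARCH_SPACE.items()}
        score = train_and_evaluate(**params)      # higher is better
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```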
Pomodoro Technique: This time management method involves working in focused 25-
minute intervals, followed by short breaks. It helps maintain concentration and prevents
mental fatigue.
Mind Mapping: Visualizing information in a hierarchical structure can aid in understanding
complex concepts and relationships.
Note-Taking: Taking effective notes helps organize information and identify key points.
Techniques like the Cornell method and the outline method can be useful.
Practice Testing: Regularly testing yourself can solidify understanding and identify areas
that need further review.
Metacognitive Strategies
Self-Assessment: Regularly evaluating your own learning progress helps identify strengths
and weaknesses.
Self-Regulation: Setting specific learning goals, monitoring your progress, and adjusting
your strategies as needed are essential for effective learning.
Self-Questioning: Asking yourself questions about the material can deepen understanding
and critical thinking skills.
Create a conducive learning environment: Find a quiet, well-lit space where you can focus.
Prioritize sleep: Adequate sleep is crucial for memory consolidation and cognitive function.
Take breaks: Short breaks can help prevent mental fatigue and improve focus.
Stay organized: Use calendars, planners, or digital tools to manage your time effectively.
Seek help when needed: Don't hesitate to ask teachers, tutors, or classmates for assistance.
Activation Functions
1. Linear Function
Equation: A linear function has the equation of a straight line, i.e. y = x.
No matter how many layers we have, if all of them are linear in nature, the final activation of the last layer is nothing but a linear function of the input to the first layer.
Range: -∞ to +∞
Uses: The linear activation function is used in just one place: the output layer.
Issues: The derivative of a linear function is a constant that no longer depends on the input x, so it introduces no non-linearity and no ground-breaking behaviour to our algorithm.
For example, calculating the price of a house is a regression problem. A house price may take any large or small value, so we can apply a linear activation at the output layer. Even in this case, the neural network must have a non-linear function in its hidden layers.
2. Sigmoid Function
The sigmoid is an S-shaped curve.
Equation: A = 1/(1 + e^(-x))
Value Range: 0 to 1
Nature: non-linear
Uses: Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the sigmoid's value lies between 0 and 1, the result is easily predicted as 1 if the value is greater than 0.5 and as 0 otherwise.
3. Tanh Function
The activation that almost always works better than the sigmoid function is the Tanh function, also known as the hyperbolic tangent function. It is a mathematically shifted version of the sigmoid; the two are similar and can be derived from each other.
Equation: f(x) = tanh(x) = 2/(1 + e^(-2x)) - 1
OR
tanh(x) = 2 * sigmoid(2x) - 1
Value Range: -1 to +1
Nature: non-linear
Uses: Usually used in the hidden layers of a neural network. Because its values lie between -1 and 1, the mean of a hidden layer's outputs comes out to be 0 or very close to it, which helps centre the data. This makes learning for the next layer much easier.
4. ReLU Function
ReLU stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of neural networks.
Equation: A(x) = max(0, x). It outputs x if x is positive and 0 otherwise.
Value Range: [0, ∞)
Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons activated by the ReLU function.
Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. Only a few neurons are activated at a time, making the network sparse and therefore efficient and easy to compute.
In simple words, ReLU learns much faster than the sigmoid and tanh functions.
5. Softmax Function
The softmax function is also a type of sigmoid function, but it is handy when we are trying to handle multi-class classification problems.
Nature: non-linear
Uses: Usually used when handling multiple classes. The softmax function is commonly found in the output layer of image classification problems; it squeezes the output for each class to between 0 and 1 and divides by the sum of the outputs.
Output: The softmax function is ideally used in the output layer of a classifier, where we are actually trying to obtain the probability of each class for a given input.
The basic rule of thumb: if you really don't know which activation function to use, simply use ReLU; it is a general-purpose activation function for hidden layers and is used in most cases these days.
If your output is a binary classification, the sigmoid function is a very natural choice for the output layer.
If your output is a multi-class classification, softmax is very useful for predicting the probability of each class.
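The activation functions above can each be written in a few lines of NumPy. This is a minimal sketch for illustration, not a library implementation.

```python
import numpy as np

def linear(x):
    """Linear: output equals input; range (-inf, +inf)."""
    return x

def sigmoid(x):
    """Sigmoid: squashes input into (0, 1); natural for binary outputs."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Tanh: shifted sigmoid with range (-1, 1); tanh(x) = 2*sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0   # equivalent to np.tanh(x)

def relu(x):
    """ReLU: max(0, x); cheap to compute and yields sparse activations."""
    return np.maximum(0.0, x)

def softmax(x):
    """Softmax: squeezes a vector into probabilities that sum to 1."""
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([1.0, -2.0, 0.5])
print(relu(z))       # [1.  0.  0.5]
print(softmax(z))    # class probabilities summing to 1
```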
Each connection between neurons in a network's layers has an associated weight, which is adjusted during the training process to minimize the error in predictions.
Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the loss function by
iteratively updating the weights in the direction of the negative gradient. Common variants
of gradient descent include:
Batch Gradient Descent: Updates weights after computing the gradient over the entire
dataset.
Stochastic Gradient Descent (SGD): Updates weights for each training example
individually.
Mini-batch Gradient Descent: Updates weights after computing the gradient over a
small batch of training examples.
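For illustration, here is a sketch of mini-batch gradient descent on a linear model with MSE loss; setting batch_size to the dataset size gives batch gradient descent, and setting it to 1 gives SGD. The function and variable names are illustrative assumptions, not a fixed API.

```python
import numpy as np

def mini_batch_gradient_descent(X, y, lr=0.01, batch_size=32, epochs=100):
    """Fit linear weights w by stepping along the negative MSE gradient."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = np.random.permutation(n)              # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2.0 / len(batch) * Xb.T @ (Xb @ w - yb)  # gradient of MSE
            w -= lr * grad                          # step opposite the gradient
    return w

# Toy usage: recover w ≈ [2, -3] from noiseless synthetic data.
X = np.random.randn(200, 2)
y = X @ np.array([2.0, -3.0])
print(mini_batch_gradient_descent(X, y, lr=0.1, epochs=200))
```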
Multi-layer neural network
Multilayer Neural Networks, also known as Artificial Neural Networks (ANN) or Deep Neural Networks (DNN), are a powerful class of machine learning algorithms that can learn complex patterns and representations from input data. They are a generalization of the single-layer perceptron studied earlier. In this exploration, we will cover the basics of multilayer neural networks and how they are constructed.
Typical structure of a multilayer neural network: a standard feed-forward network consists of an input layer, hidden layers, and an output layer. The input layer receives the variables x1, x2, …, xn. These inputs traverse the hidden layers, where the network's learning and adjustments occur, and finally reach the output layer, yielding the results y1, y2, …, ym. Each layer consists of interconnected neurons, each computing a distinct operation, thereby enabling complex patterns and relationships within the data to be captured.
Forward Propagation
Forward propagation is the process of computing the output of a neural network given an
input. The input values are passed through the network layer by layer, with each neuron
computing its output value based on the weighted sum of its input values and applying the
activation function. The output of the final layer is the output of the network. The forward
propagation algorithm can be summarized as follows:
1. For each neuron in the input layer, set the output equal to the input feature value.
2. For each neuron in the hidden and output layers, calculate the weighted sum of the
outputs from the previous layer and apply the activation function to obtain the output.
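The two steps above translate directly into code. Here is a minimal sketch; the layer sizes and random weights are arbitrary examples.

```python
import numpy as np

def forward(x, layers):
    """Propagate input x through a list of (W, b, activation) layers."""
    a = x                          # step 1: the input layer outputs the inputs
    for W, b, activation in layers:
        z = W @ a + b              # weighted sum of the previous layer's outputs
        a = activation(z)          # step 2: apply the activation function
    return a                       # the final layer's output is the network output

relu = lambda z: np.maximum(0.0, z)
identity = lambda z: z

# A tiny 2-3-1 network: 2 inputs, 3 hidden ReLU neurons, 1 linear output.
layers = [
    (np.random.randn(3, 2), np.zeros(3), relu),
    (np.random.randn(1, 3), np.zeros(1), identity),
]
print(forward(np.array([0.5, -1.0]), layers))
```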
Loss Functions
In order to train a neural network, we need to define a loss function that quantifies the
difference between the predicted output and the true output. The objective of training is to
minimize this loss function. Some common loss functions include:
1. Mean Squared Error (MSE) Loss: Used for regression tasks, this loss function calculates the squared difference between the predicted and true output values and averages it over all the examples. Mathematically, the MSE loss is defined as:
MSE = (1/N) Σ (yi − ŷi)²
where N is the number of examples, yi is the true output, and ŷi is the predicted output.
2. Cross-Entropy Loss: Used for classification tasks, this loss function calculates the negative log-likelihood of the true class for each example and averages it over all the examples. Mathematically, the cross-entropy loss for a single example is defined as:
L = − Σ (i = 1 to C) yi log(ŷi)
where C is the number of classes, yi is the true class label (one-hot encoded), and ŷi is the predicted probability for class i.
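Both loss functions are a few lines of NumPy. A minimal sketch:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error averaged over N examples."""
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Cross-entropy for a one-hot label and predicted class probabilities."""
    return -np.sum(y_true * np.log(y_pred + eps))   # eps guards against log(0)

print(mse_loss(np.array([3.0, 1.0]), np.array([2.5, 1.5])))          # 0.25
print(cross_entropy_loss(np.array([0, 1, 0]),
                         np.array([0.2, 0.7, 0.1])))                  # ~0.357
```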
Backward Propagation
Backward propagation (backpropagation) computes the gradients of the loss function with respect to the network's weights by applying the chain rule, layer by layer, from the output back to the input:
1. For each neuron in the output layer, compute the gradient of the loss with respect to
its output.
2. For each neuron in the hidden layers, compute the gradient of the loss with respect to
its output using the chain rule and the gradients computed for the neurons in the next
layer.
3. Update the weights using the computed gradients and a learning rate.
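The three steps can be seen in a tiny one-hidden-layer network with sigmoid activations and MSE loss. This is a minimal sketch for illustration; bias terms are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, W2, lr=0.1):
    """One backpropagation step for a one-hidden-layer network (no biases)."""
    # Forward pass
    h = sigmoid(W1 @ x)                         # hidden layer outputs
    y_hat = sigmoid(W2 @ h)                     # network output

    # Step 1: gradient of the MSE loss w.r.t. each output neuron's output,
    # folded with the sigmoid derivative y_hat * (1 - y_hat).
    delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)

    # Step 2: the chain rule pushes the gradient back through W2 to the hidden layer.
    delta_hidden = (W2.T @ delta_out) * h * (1.0 - h)

    # Step 3: update the weights using the gradients and the learning rate.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hidden, x)
    return W1, W2
```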
Regularization
Overfitting occurs when a neural network learns the noise in the training data instead of the underlying patterns. This can lead to poor generalization to new, unseen data. Regularization techniques can be used to prevent overfitting. Common regularization methods include ℓ1 and ℓ2 regularization, dropout, and early stopping.
1. ℓ1 and ℓ2 Regularization: These techniques add a penalty term to the loss function based on the magnitude of the weights. ℓ1 regularization adds the sum of the absolute values of the weights, while ℓ2 regularization adds the sum of the squared values of the weights. This encourages the model to have smaller weights, making it less likely to overfit (see the sketch after this list). Mathematically, the loss function with ℓ1 or ℓ2 regularization is defined as:
Lℓ1 = L + λ Σ |wi|
Lℓ2 = L + λ Σ wi²
where L is the unregularized loss, wi are the weights, and λ controls the strength of the penalty.
2. Dropout: Dropout randomly deactivates a fraction of the neurons during each training step, so the network cannot rely too heavily on any single neuron; at test time all neurons are used. This discourages co-adaptation of neurons and reduces overfitting.
3. Early Stopping: Early stopping involves monitoring the performance of the model on
a validation set during training and stopping the training process when the
performance on the validation set starts to degrade, indicating that the model is
starting to overfit. This technique helps to find the optimal point in training where the
model has the best generalization performance.
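As a sketch of items 1 and 3 above: the penalties can be added to any base loss, and early stopping is a loop with a patience counter. The helper names, and the train_epoch and validate callbacks, are hypothetical placeholders.

```python
import numpy as np

def l1_penalty(weights, lam=0.01):
    """lambda times the sum of absolute weight values (l1 regularization)."""
    return lam * sum(np.sum(np.abs(W)) for W in weights)

def l2_penalty(weights, lam=0.01):
    """lambda times the sum of squared weight values (l2 regularization)."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def train_with_early_stopping(train_epoch, validate,
                              max_epochs=100, patience=5):
    """Stop when validation loss fails to improve for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for _ in range(max_epochs):
        train_epoch()                       # one pass over the training data
        val_loss = validate()               # performance on the validation set
        if val_loss < best_loss:
            best_loss = val_loss            # still improving
            epochs_without_improvement = 0  # (checkpoint weights here in practice)
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                       # validation degrading: stop training
    return best_loss
```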
2. How do big data and deep learning intersect, and what challenges arise?
Answer:
Big data and deep learning intersect in that deep learning models often require
vast amounts of data to train effectively. Challenges include:
Data Integration: Combining data from various sources in different formats.
Storage and Management: Efficiently storing and managing massive datasets.
Processing Speed: Quickly processing large datasets to train models in a
reasonable time frame.
Quality Control: Ensuring data cleanliness and consistency across large
volumes.
Security and Privacy: Protecting sensitive data during processing and storage.
3. What are the perspectives on addressing big data and deep learning
challenges?
Answer:
Perspectives on addressing these challenges include:
Advanced Algorithms: Developing more efficient and scalable algorithms.
Distributed Computing: Utilizing distributed systems and cloud computing to
handle large-scale data and computations.
Automated Machine Learning (AutoML): Tools to automate hyperparameter
tuning and model selection.
Data Augmentation: Techniques to artificially increase the size of training
datasets.
Federated Learning: Training models across decentralized data sources while
maintaining data privacy.
4. What opportunities arise when databases meet deep learning?
Answer:
When databases meet deep learning, several opportunities emerge:
Enhanced Data Analysis: Leveraging deep learning for more sophisticated
data analysis and insights.
Predictive Analytics: Using deep learning models to predict trends and
behaviors from large datasets.
Automated Decision-Making: Implementing deep learning models to
automate decision-making processes.
Personalization: Tailoring services and products based on deep learning
insights from user data.
Real-Time Processing: Combining databases with deep learning to process
and analyze data in real-time.