
Maharana Pratap Group of Institutions, Mandhana, Kanpur

(Approved by AICTE, New Delhi and Affiliated to AKTU, Lucknow)

Digital Notes
[Department of Computer Science Engineering]

Course : B.TECH
Branch : CSE 3rd Yr
Subject Name : Machine Learning Techniques
(BCS055)
Prepared by: Mr. Abhishek Singh Sengar
Unit 4
• ARTIFICIAL NEURAL NETWORKS – Perceptrons,
Multilayer perceptron, Gradient descent and the Delta
rule, Multilayer networks, Derivation of Backpropagation
Algorithm, Generalization, Unsupervised Learning – SOM
Algorithm and its variants
• DEEP LEARNING - Introduction, concept of convolutional
neural network, Types of layers – (Convolutional Layers,
Activation function, pooling, fully connected), Concept of
Convolution (1D and 2D) layers, Training of network, Case
study of CNN, e.g., on Diabetic Retinopathy, Building a
smart speaker, Self-driving car etc.
ARTIFICIAL NEURAL NETWORKS
• Artificial Neural Networks (ANNs), or neural
networks, are computational models inspired by
the human brain, designed to learn from data
and make predictions. They consist of
interconnected nodes (neurons) arranged in
layers that process and transmit information,
mimicking how biological neurons communicate.
• ANNs are a type of machine learning algorithm
and a key component of deep learning.
Key aspects of ANNs:
• Structure:
• ANNs typically have an input layer, one or more hidden layers, and an output
layer.
• Nodes:
• Each node (neuron) receives input, processes it, and produces an output, which
is then passed to other nodes.
• Interconnections:
• The connections between nodes have weights, which determine the strength of
each connection.
• Learning:
• ANNs learn by adjusting the weights during a process called "training," where
they compare their predictions with actual outcomes and refine their internal
parameters.
• Applications:
• ANNs are used in a wide range of applications, including image recognition,
natural language processing, and predictive modeling.
Perceptron
• A perceptron is a fundamental unit in artificial neural networks, essentially a single
neuron. It takes inputs, applies weights and biases, and then uses an activation function to
produce an output. A multilayer perceptron (MLP), on the other hand, is a more complex
neural network architecture that builds upon the perceptron by stacking multiple layers of
perceptrons together. This allows MLPs to learn more intricate patterns and relationships in
data compared to a single-layer perceptron.
• Here's a more detailed breakdown:
• Perceptron:
• Definition:
• A perceptron is a basic computational unit in neural networks, often considered the building
block for more complex networks.
• Structure:
• It has an input layer, where data is fed in, and a single output neuron that produces a binary
output (usually 0 or 1) based on a threshold.
• Function:
• The perceptron calculates a weighted sum of its inputs and then applies an activation
function to this sum. This function determines the final output.
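• As a minimal sketch in Python, a single perceptron with a step activation can be written in a few lines; the weights and bias below are hand-picked, purely for illustration, so that the unit computes a logical AND:

import numpy as np

def perceptron(x, w, b):
    # weighted sum of inputs plus bias
    z = np.dot(w, x) + b
    # step activation: output 1 only if the sum crosses the threshold (0)
    return 1 if z > 0 else 0

# hand-picked weights and bias realizing logical AND (illustrative only)
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, '->', perceptron(np.array(x), w, b))   # prints 1 only for (1, 1)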
Limitations:

• A single-layer perceptron can only learn linear decision boundaries, making it
limited in its ability to solve non-linear problems. For example, it struggles to
solve the XOR problem.
• Multilayer Perceptron (MLP):
• Definition:
• An MLP is a feedforward neural network with multiple layers, including an
input layer, one or more hidden layers, and an output layer.
• Structure:
• The hidden layers contain numerous perceptrons, allowing for more complex
computations and feature extraction.
• Function:
• Each layer in an MLP performs a series of calculations and transformations on
the data, ultimately leading to a prediction or classification.
• Advantages:
• Non-linear decision boundaries: MLPs can learn non-linear relationships in
data, allowing them to solve complex problems that a single-layer perceptron
cannot.
• Feature extraction: The hidden layers in an MLP can extract relevant features
from the input data, making it easier to identify patterns and make predictions.
• Generalization: MLPs can generalize well to new, unseen data, meaning they
can make accurate predictions on data that they were not explicitly trained on.
• Training:
• MLPs are typically trained using backpropagation, an algorithm that adjusts the
weights and biases of the network based on the error between the predicted
output and the actual output.
• In essence, a perceptron is a simple unit, while an MLP is a more powerful and
versatile architecture built upon the foundation of perceptrons. MLPs are widely
used in various machine learning applications, including image recognition,
natural language processing, and more.
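• As a minimal sketch, such an MLP can be defined in a few lines with PyTorch (the layer sizes here are arbitrary illustrative choices):

import torch.nn as nn

# A small MLP: 2 inputs -> 8 hidden units with ReLU -> 1 output
mlp = nn.Sequential(
    nn.Linear(2, 8),   # input layer to hidden layer (weights + biases)
    nn.ReLU(),         # non-linearity lets the network learn non-linear boundaries
    nn.Linear(8, 1),   # hidden layer to output layer
)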
Gradient descent
• Gradient descent is the backbone of the learning
process for various algorithms, including linear
regression, logistic regression, support vector machines,
and neural networks. It serves as a fundamental
optimization technique that minimizes the cost function of
a model by iteratively adjusting the model parameters
to reduce the difference between predicted and actual
values, improving the model's performance.
• Let's see its role in machine learning:
1. Training Machine Learning Models
• Neural networks are trained using Gradient Descent (or its
variants) in combination with backpropagation.
Backpropagation computes the gradients of the loss function with
respect to each parameter (weights and biases) in the
network by applying the chain rule. The process involves:
• Forward Propagation: Computes the output for a given input
by passing data through the layers.
• Backward Propagation: Uses the chain rule to calculate
gradients of the loss with respect to each parameter (weights
and biases) across all layers.
• Gradients are then used by Gradient Descent to update the
parameters layer-by-layer, moving toward minimizing the
loss function.
2. Minimizing the Cost Function

• The algorithm minimizes a cost function, which quantifies the error or loss of the
model’s predictions compared to the true labels for:
• 1. Linear Regression
• Gradient descent minimizes the Mean Squared Error (MSE) which serves as the loss
function to find the best-fit line. Gradient Descent is used to iteratively update the
weights (coefficients) and bias by computing the gradient of the MSE with respect
to these parameters.
• Since MSE is a convex function, gradient descent guarantees convergence to the
global minimum if the learning rate is appropriately chosen. For each iteration:
• The algorithm computes the gradient of the MSE with respect to the weights and
biases.
• It updates the weights (w) and bias (b) using the update rule:
• w = w − α · (∂J/∂w)
• b = b − α · (∂J/∂b)
• This is the parameter update rule for gradient descent, which adjusts the
weights w and biases b to minimize the cost function. The process
iteratively adjusts the line's slope and intercept to minimize the error.
2. Logistic Regression
• In logistic regression, gradient descent minimizes
the Log Loss (Cross-Entropy Loss) to optimize the
decision boundary for binary classification. Since
the output is probabilistic (between 0 and 1), the
sigmoid function is applied. The process involves:
• Calculating the gradient of the log-loss with respect
to the weights.
• Updating weights and biases iteratively, using the same
update rule as above, to maximize the likelihood of the
correct classification.
3. Support Vector Machines (SVMs)
• For SVMs, gradient descent optimizes the hinge loss,
which ensures a maximum-margin hyperplane. The
algorithm:
• Calculates gradients for the hinge loss and the
regularization term (if used, such as L2 regularization).
• Updates the weights to maximize the margin between
classes while minimizing misclassification penalties,
using the same update formula provided above.
• Gradient descent ensures the optimal placement of
the hyperplane to separate classes with the largest
possible margin.
• Gradient Descent Python Implementation
• Diving further into the concept, let's understand it
in depth with a practical implementation.
• Import the necessary libraries:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# set random seed for reproducibility
torch.manual_seed(42)

# set number of samples
num_samples = 1000

# create random features with 2 dimensions
x = torch.randn(num_samples, 2)

# create true weights and bias for the linear regression model
true_weights = torch.tensor([1.3, -1.0])
true_bias = torch.tensor([-3.5])

# Target variable (x @ true_weights gives the weighted sum for each sample)
y = x @ true_weights + true_bias

# Plot the dataset: target against each feature
fig, ax = plt.subplots(1, 2, sharey=True)
ax[0].scatter(x[:, 0], y)
ax[1].scatter(x[:, 1], y)

ax[0].set_xlabel('X1')
ax[0].set_ylabel('Y')
ax[1].set_xlabel('X2')
ax[1].set_ylabel('Y')
plt.show()
Define the loss function

• Loss function: J = (1/n) Σ (actual − predicted)²

• Here we are calculating the Mean Squared
Error by taking the square of the difference
between the actual and the predicted value
and then dividing by n (the total
number of output or target values), which
gives the mean of the squared errors.
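• Continuing the PyTorch example above, a minimal sketch of this loss and a plain gradient descent loop (the learning rate and epoch count are illustrative choices):

# Mean Squared Error loss
def Mean_Squared_Error(prediction, actual):
    return ((actual - prediction) ** 2).mean()

# learnable parameters, tracked by autograd
weights = torch.randn(2, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

learning_rate = 0.01
num_epochs = 1000

for epoch in range(num_epochs):
    y_pred = x @ weights + bias              # forward pass
    loss = Mean_Squared_Error(y_pred, y)     # compute the MSE

    loss.backward()                          # compute gradients of loss w.r.t. parameters

    with torch.no_grad():                    # gradient descent update step
        weights -= learning_rate * weights.grad
        bias -= learning_rate * bias.grad
        weights.grad.zero_()                 # reset gradients for the next iteration
        bias.grad.zero_()

print(weights, bias)   # should approach true_weights and true_bias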
Gradient Descent Learning Rate

• The learning rate is a critical hyperparameter in the context of gradient
descent, influencing the size of steps taken during the optimization process
to update the model parameters. Choosing an appropriate learning rate is
crucial for efficient and effective model training.
• When the learning rate is too small, the optimization process progresses
very slowly. The model makes tiny updates to its parameters in each
iteration, leading to sluggish convergence and potentially getting stuck in
local minima.
• On the other hand, an excessively large learning rate can cause the
optimization algorithm to overshoot the optimal parameter values, leading
to divergence or oscillations that hinder convergence.
• Achieving the right balance is essential: too small a learning rate
means very slow convergence, while too large a learning rate
may lead to overshooting and instability.
Vanishing and Exploding Gradients

• Vanishing and exploding gradients are common problems that can occur
during the training of deep neural networks. These problems can significantly
slow down the training process or even prevent the network from learning
altogether.
• The vanishing gradient problem occurs when gradients become too small
during backpropagation. The weights of the network are not considerably
changed as a result, and the network is unable to discover the underlying
patterns in the data. Many-layered deep neural networks are especially prone
to this issue. The gradient values fall exponentially as they move backward
through the layers, making it challenging to efficiently update the weights in
the earlier layers.
• The exploding gradient problem, on the other hand, occurs when gradients
become too large during backpropagation. When this happens, the weights
are updated by a large amount, which can cause the network to diverge or
oscillate, making it difficult to converge to a good solution.
• To address these problems the following technique can be used:
• Weight initialization: The initialization of weights can be adjusted
to ensure that they are in an appropriate range. Using a different
activation function, such as the Rectified Linear Unit (ReLU), can also
help to mitigate the vanishing gradient problem.
• Gradient clipping: It involves limiting the maximum and minimum
values of the gradient during backpropagation. This can prevent the
gradients from becoming too large or too small and can help to
stabilize the training process (see the sketch after this list).
• Batch normalization: It can also help to address these problems by
normalizing the input to each layer, which can prevent the activation
function from saturating and help to reduce the vanishing and
exploding gradient problems.
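• As a concrete sketch of gradient clipping, PyTorch exposes it as a single call placed between the backward pass and the parameter update (model, optimizer, and loss are assumed to be defined as in a usual training loop):

import torch.nn as nn

loss.backward()                                              # backpropagate to compute gradients
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # rescale gradients so their total norm is at most 1.0
optimizer.step()                                             # apply the (clipped) update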
Different Variants of Gradient Descent

• There are several variants of gradient descent that differ in the way the
step size or learning rate is chosen and the way the updates are made.
Here are some popular variants:
• Batch Gradient Descent
• In batch gradient descent, the entire training dataset is used to
compute the gradient and update the model parameters (weights and
bias) at each iteration. This can be slow
for large datasets but may lead to a more accurate model.
• It is effective for convex or relatively smooth error manifolds because it
moves directly toward an optimal solution by taking a large step in the
direction of the negative gradient of the cost function. However, it can be
slow for large datasets because it computes the gradient and updates the
parameters using the entire training dataset at each iteration. This can
result in longer training times and higher computational costs.
• Stochastic Gradient Descent (SGD)
• In SGD, only one training example is used to compute the
gradient and update the parameters at each iteration. This can
be faster than batch gradient descent but may lead to more
noise in the updates.
• Mini-batch Gradient Descent
• In mini-batch gradient descent, a small batch of training
examples is used to compute the gradient and update the
parameters at each iteration. This can be a good compromise
between batch gradient descent and Stochastic Gradient
Descent, as it can be faster than batch gradient descent and
less noisy than Stochastic Gradient Descent.
• Momentum-based Gradient Descent
• Momentum is a variant of gradient descent that incorporates
information from the previous weight updates to help the algorithm
converge more quickly to the optimal solution. It adds a term to the
weight update that is proportional to the running average of the past
gradients, allowing the algorithm to move more quickly in the
direction of the optimal solution.
• The updates to the parameters are based on the current gradient
and the previous updates. This can help prevent the optimization
process from getting stuck in local minima and reach the global
minimum faster.
Nesterov Accelerated Gradient (NAG)

• Nesterov Accelerated Gradient (NAG) is an
extension of Momentum Gradient Descent. It
evaluates the gradient at a hypothetical
position ahead of the current position based
on the current momentum vector, instead of
evaluating the gradient at the current
position.
• This can result in faster convergence and
better performance.
• Adagrad
• In Adagrad, the learning rate is adaptively adjusted for
each parameter based on the historical gradient
information. This allows for larger updates for infrequent
parameters and smaller updates for frequent
parameters.
• RMSprop
• In RMSprop the learning rate is adaptively adjusted for
each parameter based on the moving average of the
squared gradient. This helps the algorithm to converge
faster in the presence of noisy gradients.
Adam
• Adam stands for adaptive moment estimation. It combines the
benefits of Momentum-based Gradient Descent, Adagrad, and
RMSprop: the learning rate is adaptively adjusted for each
parameter based on the moving average of the gradient and the
squared gradient, which allows for faster convergence and better
performance on non-convex optimization problems.
• It keeps track of two exponentially decaying averages: the first-
moment estimate, which is the exponentially decaying average of
past gradients, and the second-moment estimate, which is the
exponentially decaying average of past squared gradients. The first-
moment estimate is used to calculate the momentum, and the
second-moment estimate is used to scale the learning rate for each
parameter. This is one of the most popular optimization algorithms
for deep learning.
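• In practice these variants are rarely coded by hand. As a sketch, the optimizers described above map directly onto ready-made constructors in torch.optim (model is assumed to be defined; the learning rates are common but arbitrary defaults):

import torch.optim as optim

params = list(model.parameters())   # model is assumed to be defined elsewhere

sgd      = optim.SGD(params, lr=0.01)                               # plain (mini-batch) stochastic gradient descent
momentum = optim.SGD(params, lr=0.01, momentum=0.9)                 # momentum-based gradient descent
nag      = optim.SGD(params, lr=0.01, momentum=0.9, nesterov=True)  # Nesterov Accelerated Gradient
adagrad  = optim.Adagrad(params, lr=0.01)                           # per-parameter adaptive learning rates
rmsprop  = optim.RMSprop(params, lr=0.001)                          # moving average of squared gradients
adam     = optim.Adam(params, lr=0.001)                             # combines momentum and RMSprop ideas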
Advantages of Gradient Descent
• Widely used: Gradient descent and its variants are widely
used in machine learning and optimization problems because
they are effective and easy to implement.
• Convergence: Gradient descent and its variants can converge
to a global minimum or a good local minimum of the cost
function, depending on the problem and the variant used.
• Scalability: Many variants of gradient descent can be
parallelized and are scalable to large datasets and high-
dimensional models.
• Flexibility: Different variants of gradient descent offer a
range of trade-offs between accuracy and speed, and can be
adjusted to optimize the performance of a specific problem.
Disadvantages of gradient descent:
• Choice of learning rate: The choice of learning rate is crucial for the convergence
of gradient descent and its variants. Choosing a learning rate that is too large
can lead to oscillations or overshooting, while choosing a learning rate that is too
small can lead to slow convergence or getting stuck in local minima.
• Sensitivity to initialization: Gradient descent and its variants can be sensitive to
the initialization of the model’s parameters, which can affect the convergence
and the quality of the solution.
• Time-consuming: Gradient descent and its variants can be time-consuming,
especially when dealing with large datasets and high-dimensional models. The
convergence speed can also vary depending on the variant used and the specific
problem.
• Local optima: Gradient descent and its variants can converge to a local minimum
instead of the global minimum of the cost function, especially in non-convex
problems. This can affect the quality of the solution, and techniques like random
initialization and multiple restarts may be used to mitigate this issue.
Gradient descent and the Delta rule in
machine learning
• The development of the perceptron was a big step towards the goal
of creating useful connectionist networks capable of learning
complex relations between inputs and outputs.
• In the late 1950’s, the connectionist community understood that
what was needed for further development of connectionist models
was a mathematically-derived (and thus potentially more flexible
and powerful) rule for learning.
• By the early 1960's, the Delta Rule [also known as the Widrow & Hoff
Learning rule or the Least Mean Square (LMS) rule] was invented
by Widrow and Hoff. This rule is similar to the perceptron learning
rule (McClelland & Rumelhart, 1988), but is also characterized by
a mathematical utility and elegance missing in the perceptron and
other early learning rules.
• The Delta Rule uses the difference between target activation (i.e., target
output values) and obtained activation to drive learning. For reasons
discussed below, the use of a threshold activation function (as used in
both the McCulloch-Pitts network and the perceptron) is dropped &
instead a linear sum of products is used to calculate the activation of the
output neuron (alternative activation functions can also be applied).
• Thus, the activation function is called a Linear Activation function, in
which the output node’s activation is simply equal to the sum of the
network's respective input/weight products. The strength of the network
connections (i.e., the values of the weights) is adjusted to reduce the
difference between target and actual output activation (i.e., error).
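• In symbols, the Delta Rule adjusts each weight in proportion to the error and the corresponding input:
• wi(new) = wi(old) + η (t − o) xi
• where η is the learning rate, t is the target activation, o is the obtained output activation, and xi is the ith input.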
• (Figure: a simple two-layer network capable of deploying
the Delta Rule. Such a network is not limited
to having only one output node.)
Unsupervised learning
• Unsupervised learning is a machine learning technique where algorithms analyze unlabeled
data to discover hidden patterns and relationships without explicit human guidance. It
contrasts with supervised learning, which relies on labeled data for training. Unsupervised
learning aims to uncover structures, groupings, or anomalies within the data, often for tasks
like customer segmentation, anomaly detection, or dimensionality reduction.
• Here's a more detailed explanation:
• Key Characteristics:
• Unlabeled Data:
• Unsupervised learning algorithms work with data that is not pre-categorized or labeled.
• Pattern Discovery:
• The primary goal is to identify patterns, clusters, or relationships within the data.
• No Supervision:
• Unlike supervised learning, there is no direct feedback or guidance during the learning
process.
• Exploratory Analysis:
• Unsupervised learning is often used for exploratory data analysis to uncover hidden insights
or understand the underlying structure of the data.
Examples of Applications:

• Clustering: Grouping similar data points together based on their features or
characteristics.
• Association Rule Mining: Discovering relationships between variables in a
dataset, like finding which items are frequently purchased together.
• Dimensionality Reduction: Reducing the number of variables while retaining
important information, making it easier to visualize or analyze complex data.
• Anomaly Detection: Identifying data points that deviate significantly from the
norm.
• Customer Segmentation: Grouping customers with similar purchasing behaviors
or demographics.
• Common Algorithms:
• Clustering Algorithms: K-means, DBSCAN, hierarchical clustering.
• Dimensionality Reduction Algorithms: Principal Component Analysis (PCA), t-
distributed Stochastic Neighbor Embedding (t-SNE).
• Association Rule Mining Algorithms: Apriori, Eclat.
Why Use Unsupervised Learning?

• Discovering Hidden Insights:


• Unsupervised learning can reveal patterns and
relationships that might not be obvious or readily
apparent without analyzing the data.
• Exploratory Data Analysis:
• It allows for a deeper understanding of the data and can
help identify areas for further investigation.
• Feature Engineering:
• Unsupervised methods can help identify relevant features
or create new ones for use in supervised learning models.
Self Organizing Map
• Self Organizing Map (or Kohonen Map or SOM) is a type of
Artificial Neural Network which is also inspired by biological
models of neural systems from the 1970s. It follows an
unsupervised learning approach and trains its network
through a competitive learning algorithm.
• SOM is used for clustering and mapping (or dimensionality
reduction): it maps multidimensional data onto a
lower-dimensional space, which reduces complex
problems to a form that is easy to interpret. SOM has two layers, one is
the Input layer and the other one is the Output layer.
• (Figure: the architecture of a Self Organizing Map with two clusters
and n input features per sample.)
• How does SOM work?
• Suppose the input data has size (m, n), where m is the number of training
examples and n is the number of features in each example. First, the algorithm
initializes C weight vectors of dimension n, one per cluster.
Then, iterating over the input data, for each training example it updates
the winning vector (the weight vector with the shortest distance, e.g.
Euclidean distance, from the training example). The weight updation rule is
given by:
• wij = wij(old) + alpha(t) * (xik - wij(old))
• where alpha(t) is the learning rate at time t, j denotes the winning vector, i
denotes the ith feature of the training example and k denotes the kth training
example from the input data. After training the SOM network, the trained
weights are used for clustering new examples. A new example falls in the
cluster of winning vectors.
Algorithm

• Training:
• Step 1: Initialize the weights wij random value may be assumed. Initialize the
learning rate α.
• Step 2: For each cluster j (j = 1 to C), calculate the squared Euclidean distance:
• D(j) = Σ (wij – xi)^2, where i = 1 to n
• Step 3: Find the index J for which D(J) is minimum; J is considered the winning
index.
• Step 4: For each unit j within a specific neighborhood of J, and for all i, calculate the
new weight:
• wij(new) = wij(old) + α[xi – wij(old)]
• Step 5: Update the learning rate using:
• α(t+1) = 0.5 * α(t)
• Step 6: Test the Stopping Condition.
• Below is the implementation of the above approach:
import math

class SOM:

    # Function here computes the winning vector
    # by Euclidean distance
    def winner(self, weights, sample):

        D0 = 0
        D1 = 0

        for i in range(len(sample)):
            D0 = D0 + math.pow((sample[i] - weights[0][i]), 2)
            D1 = D1 + math.pow((sample[i] - weights[1][i]), 2)

        # Selecting the cluster with smallest distance as winning cluster
        if D0 < D1:
            return 0
        else:
            return 1

    # Function here updates the winning vector
    def update(self, weights, sample, J, alpha):
        # Here iterating over the weights of winning cluster and modifying them
        for i in range(len(weights[0])):
            weights[J][i] = weights[J][i] + alpha * (sample[i] - weights[J][i])

        return weights

# Driver code
def main():

    # Training Examples ( m, n )
    T = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1]]

    m, n = len(T), len(T[0])

    # weight initialization: one weight vector of dimension n per cluster (C x n)
    weights = [[0.2, 0.6, 0.5, 0.9], [0.8, 0.4, 0.7, 0.3]]

    # training
    ob = SOM()

    epochs = 3
    alpha = 0.5

    for i in range(epochs):
        for j in range(m):

            # training sample
            sample = T[j]

            # Compute winner vector
            J = ob.winner(weights, sample)

            # Update winning vector
            weights = ob.update(weights, sample, J, alpha)

    # classify test sample
    s = [0, 0, 0, 1]
    J = ob.winner(weights, s)

    print("Test Sample s belongs to Cluster : ", J)
    print("Trained weights : ", weights)

if __name__ == "__main__":
    main()
Generalization
• In machine learning, generalization refers to a model's ability to make accurate predictions on new,
unseen data that it wasn't explicitly trained on. A model that generalizes well learns the underlying
patterns in the training data and can apply those patterns to new, similar examples. Good generalization
is crucial for building effective and reliable machine learning models.

• Elaboration:

• Why Generalization Matters:


• The ultimate goal of most machine learning models is to perform well in real-world scenarios, where the
data they encounter will likely be different from the training data. A model that only performs well on the
training data is not useful in practice.
• Good vs. Poor Generalization:
• Good Generalization: A model that generalizes well can accurately predict outcomes on new data based
on the patterns it learned from the training data.
• Poor Generalization: A model that doesn't generalize well might either:
• Overfit: Memorize the training
data, leading to high accuracy on the training set but poor performance on unseen data.
• Underfit: Fail to learn meaningful patterns from the training data, leading to poor performance on both
training and unseen data.
Real-world Examples
• Autonomous Vehicles: Self-driving car models
need to generalize to different road conditions
and object types.
• Healthcare AI: Medical diagnostic models
need to generalize to new patient data and
different disease presentations.
• Spam Filters: Spam filters need to generalize
to new types of spam emails, even if they
haven't seen those exact emails before.
Factors Affecting Generalization:

• Model Complexity: Overly complex models can
overfit the training data, while overly simple models
may underfit.
• Data Size and Quality: A larger and more
representative training dataset generally leads to
better generalization.
• Regularization: Techniques like regularization can
help prevent overfitting and improve generalization.
• Hyperparameter Tuning: Optimizing the model's
hyperparameters can also improve generalization.
Deep Learning

• Deep Learning is transforming the way
machines understand, learn, and interact with
complex data. Deep learning mimics the neural
networks of the human brain, enabling
computers to autonomously uncover patterns
and make informed decisions from vast
amounts of unstructured data.
How Deep Learning Works

• A neural network consists of layers of interconnected
nodes, or neurons, that collaborate to process input
data. In a fully connected deep neural network, data
flows through multiple layers, where each neuron
performs nonlinear transformations, allowing the
model to learn intricate representations of the data.
• In a deep neural network, the input layer receives
data, which passes through hidden layers that
transform the data using nonlinear functions. The
final output layer generates the model’s prediction.
• Deep Learning in Machine Learning Paradigms
• Supervised Learning: Neural networks learn from labeled data
to predict or classify, using algorithms like CNNs and RNNs for
tasks such as image recognition and language translation.
• Unsupervised Learning: Neural networks identify patterns in
unlabeled data, using techniques like Autoencoders and
Generative Models for tasks like clustering and anomaly
detection.
• Reinforcement Learning: An agent learns to make decisions
by maximizing rewards, with algorithms like DQN and DDPG
applied in areas like robotics and game playing.
Advantages of Deep Learning

• High accuracy: Deep Learning algorithms can achieve state-of-the-art
performance in various tasks, such as image recognition and natural
language processing.
• Automated feature engineering: Deep Learning algorithms can
automatically discover and learn relevant features from data without
the need for manual feature engineering.
• Scalability: Deep Learning models can scale to handle large and
complex datasets, and can learn from massive amounts of data.
• Flexibility: Deep Learning models can be applied to a wide range of
tasks and can handle various types of data, such as images, text, and
speech.
• Continual improvement: Deep Learning models can continually
improve their performance as more data becomes available.
Disadvantages of Deep Learning

• High computational requirements: Deep Learning AI models require
large amounts of data and computational resources to train and
optimize.
• Requires large amounts of labeled data: Deep Learning models often
require a large amount of labeled data for training, which can be
expensive and time-consuming to acquire.
• Interpretability: Deep Learning models can be challenging to
interpret, making it difficult to understand how they make decisions.
• Overfitting: Deep Learning models can sometimes overfit to the
training data, resulting in poor performance on new and unseen data.
• Black-box nature: Deep Learning models are often treated as black
boxes, making it difficult to understand how they work and how they
arrived at their predictions.
Challenges in Deep Learning

• Deep learning has made significant advancements in various fields, but there
are still some challenges that need to be addressed. Here are some of the main
challenges in deep learning:
• Data availability: Deep learning requires large amounts of data to learn
from, and gathering enough data for training is a major concern.
• Computational Resources: For training the deep learning model, it is
computationally expensive because it requires specialized hardware like GPUs
and TPUs.
• Time-consuming: Training, especially on sequential data, can take a very
long time (days or even months) depending on the computational resources.
• Interpretability: Deep learning models are complex and work like a black box,
making it very difficult to interpret the results.
• Overfitting: When the model is trained repeatedly on the same data, it can
become too specialized for the training data, leading to overfitting and poor
performance on new data.
• Deep Learning Applications
• 1. Computer vision
• In computer vision, deep learning models enable machines to identify and
understand visual data. Some of the main applications of deep learning in
computer vision include:
• Object detection and recognition: Deep learning models are used to identify
and locate objects within images and videos, making it possible for machines
to perform tasks such as self-driving cars, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images
into categories such as animals, plants, and buildings. This is used in
applications such as medical imaging, quality control, and image retrieval.
• Image segmentation: Deep learning models can be used to segment images
into different regions, making it possible to identify specific
features within images.
• 2. Natural language processing (NLP)
• In NLP, deep learning models enable machines to understand and generate human
language. Some of the main applications of deep learning in NLP include:
• Automatic Text Generation: Deep learning models can learn from a corpus of text, and
new text such as summaries and essays can be automatically generated using these
trained models.
• Language translation: Deep learning models can translate text from one language
to another, making it possible to communicate with people from different linguistic
backgrounds.
• Sentiment analysis: Deep learning models can analyze the sentiment of a piece of
text, making it possible to determine whether the text is positive, negative, or
neutral.
• Speech recognition: Deep learning models can recognize and transcribe spoken
words, making it possible to perform tasks such as speech-to-text conversion, voice
search, and voice-controlled devices.
3. Reinforcement learning

• In reinforcement learning, deep learning is used to train agents
to take actions in an environment so as to maximize a reward. Some of
the main applications of deep learning in reinforcement learning
include:
• Game playing: Deep reinforcement learning models have been
able to beat human experts at games such as Go, Chess, and Atari.
• Robotics: Deep reinforcement learning models can be used to train
robots to perform complex tasks such as grasping objects,
navigation, and manipulation.
• Control systems: Deep reinforcement learning models can be used
to control complex systems such as power grids, traffic
management, and supply chain optimization.
CNN
• A Convolutional Neural Network (CNN) is a type of neural network,
particularly well-suited for processing visual data, that uses a "convolutional"
layer to extract features from input data. CNNs are widely used for image
recognition, object detection, and other computer vision tasks.
• Convolutional Neural Network (CNN) is an advanced version of
artificial neural networks (ANNs), primarily designed to extract features
from grid-like matrix datasets. This is particularly useful for visual datasets
such as images or videos, where data patterns play a crucial role. CNNs are
widely used in computer vision applications due to their effectiveness in
processing visual data.
• CNNs consist of multiple layers like the input layer, Convolutional layer,
pooling layer, and fully connected layers. Let’s learn more about CNNs in
detail.
How Convolutional Layers Work

• Convolutional Neural Networks are neural
networks that share their parameters.
• Imagine you have an image. It can be
represented as a cuboid having a length and
width (the dimensions of the image) and a height
(i.e. the channels, as images generally have red,
green, and blue channels).
• Now imagine taking a small patch of this image and running
a small neural network, called a filter or kernel on it, with
say, K outputs and representing them vertically.
• Now slide that neural network across the whole image; as a
result, we will get another image with different width,
height, and depth. Instead of just the R, G, and B channels,
we now have more channels but smaller width and height.
This operation is called Convolution. If the patch size were the
same as that of the image, it would be a regular neural
network. Because of this small patch, we have fewer
weights.
• Flattening: The resulting feature maps are flattened into
a one-dimensional vector after the convolution and
pooling layers so they can be passed into a fully
connected layer for classification or regression.
• Fully Connected Layers: It takes the input from the
previous layer and computes the final classification or
regression task.
• Output Layer: The output from the fully connected
layers is then fed into a logistic function for classification
tasks like sigmoid or softmax which converts the output
of each class into the probability score of each class.
Layers Used to Build ConvNets

• A complete Convolutional Neural Network architecture is also known as a ConvNet. A ConvNet is a sequence of
layers, and every layer transforms one volume to another through a differentiable function.
• Let's take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
• Input Layers: This is the layer in which we give input to our model. In a CNN, the input will generally be an
image or a sequence of images. This layer holds the raw input of the image with width 32, height 32, and
depth 3.
• Convolutional Layers: This is the layer which is used to extract features from the input dataset. It applies
a set of learnable filters known as kernels to the input images. The filters/kernels are smaller matrices,
usually of 2×2, 3×3, or 5×5 shape. Each kernel slides over the input image data and computes the dot product
between the kernel weights and the corresponding input image patch. The outputs of this layer are referred to
as feature maps. Suppose we use a total of 12 filters for this layer; we'll get an output volume of dimension 32 x 32 x 12.
• Activation Layer: By adding an activation function to the output of the preceding layer, activation layers add
nonlinearity to the network. An element-wise activation function is applied to the output of the
convolution layer. Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU, etc. The
volume remains unchanged, hence the output volume will have dimensions 32 x 32 x 12.
• Pooling layer: This layer is periodically inserted in the ConvNet and its main function is to reduce the size of
the volume, which makes the computation fast, reduces memory use, and also helps prevent overfitting. Two common
types of pooling layers are max pooling and average pooling. If we use a max pool with 2 x 2 filters and
stride 2, the resultant volume will be of dimension 16 x 16 x 12.
# import the necessary libraries
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from itertools import product

# set the plotting params
plt.rc('figure', autolayout=True)
plt.rc('image', cmap='magma')

# define the kernel (a 3x3 edge-detection filter)
kernel = tf.constant([[-1, -1, -1],
                      [-1,  8, -1],
                      [-1, -1, -1],
                     ])

# load the image
image = tf.io.read_file('Ganesh.jpg')
image = tf.io.decode_jpeg(image, channels=1)
image = tf.image.resize(image, size=[300, 300])

# plot the image
img = tf.squeeze(image).numpy()
plt.figure(figsize=(5, 5))
plt.imshow(img, cmap='gray')
plt.axis('off')
plt.title('Original Gray Scale image')
plt.show()
Key Concepts:

• Convolutional Layer:
• This layer uses filters (kernels) to scan the input image, extracting
features like edges, textures, and shapes.
• Feature Extraction:
• CNNs learn to automatically extract relevant features from the
input data, reducing the need for manual feature engineering.
• Pooling Layers:
• These layers reduce the spatial dimensions of the feature maps,
making the network more robust to small variations in the input.
• Fully Connected Layers:
• These layers integrate the extracted features to make a final
prediction or classification.
# Reformat the image and kernel for tf.nn.conv2d:
# image -> (batch, height, width, channels), kernel -> (h, w, in_ch, out_ch)
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)

# convolution layer
conv_fn = tf.nn.conv2d

image_filter = conv_fn(
    input=image,
    filters=kernel,
    strides=1,  # or (1, 1)
    padding='SAME',
)

plt.figure(figsize=(15, 5))

# Plot the convolved image
plt.subplot(1, 3, 1)
plt.imshow(
    tf.squeeze(image_filter)
)
plt.axis('off')
plt.title('Convolution')

# activation layer
relu_fn = tf.nn.relu
# Image detection
image_detect = relu_fn(image_filter)

plt.subplot(1, 3, 2)
plt.imshow(
    # Reformat for plotting
    tf.squeeze(image_detect)
)
plt.axis('off')
plt.title('Activation')

# Pooling layer
pool = tf.nn.pool
image_condense = pool(input=image_detect,
                      window_shape=(2, 2),
                      pooling_type='MAX',
                      strides=(2, 2),
                      padding='SAME',
                     )

plt.subplot(1, 3, 3)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Pooling')
plt.show()
How CNNs Work:

• 1. Input:
• A CNN takes an image as input (e.g., a 3D matrix representing the image's pixels).
• 2. Convolution:
• Convolutional layers apply filters to the input, creating feature maps that highlight
specific patterns.
• 3. Activation:
• Non-linear activation functions (like ReLU) are applied to introduce non-linearity into
the model.
• 4. Pooling:
• Pooling layers reduce the spatial dimensions of the feature maps.
• 5. Fully Connected Layers:
• The extracted features are passed to fully connected layers for classification or
prediction.
Applications:

• Image Recognition: Identifying objects, scenes, and actions in
images.
• Object Detection: Locating and classifying objects within an
image.
• Image Segmentation: Dividing an image into different regions
based on content.
• Medical Image Analysis: Analyzing medical images for diagnosis
and treatment.
• Natural Language Processing: Processing and understanding text.
• Speech Recognition: Converting speech into text.
• Time Series Analysis: Analyzing data that changes over time.
• Robotics: Developing intelligent robots.
Advantages of CNNs:

• Automatic Feature Extraction:


• CNNs learn to extract relevant features directly from the data,
reducing the need for manual feature engineering.
• Hierarchical Feature Learning:
• CNNs can learn hierarchical representations of features, capturing
both simple and complex patterns.
• Translation Invariance:
• Pooling layers make the network more robust to small variations
in the input, such as small shifts in object position.
• Efficiency:
• CNNs are computationally efficient for processing large amounts
of data.
Limitations:

• Data Requirements: CNNs often require large
amounts of labeled data for training.
• Hyperparameter Tuning: Finding the optimal
hyperparameters for a CNN can be
challenging.
• Overfitting: CNNs can be prone to overfitting
if they are not properly regularized.
• Advantages of CNNs
• Good at detecting patterns and features in images, videos, and audio
signals.
• Robust to translation, rotation, and scaling invariance.
• End-to-end training, no need for manual feature extraction.
• Can handle large amounts of data and achieve high accuracy.
• Disadvantages of CNNs
• Computationally expensive to train and require a lot of memory.
• Can be prone to overfitting if not enough data or proper regularization is
used.
• Requires large amounts of labeled data.
• Interpretability is limited; it's hard to understand what the network has
learned.
1D & 2D Convolution:

• 2D convolutions operate on two axes (width and height) and are typically used for image processing, while
1D convolutions are commonly used for sequential data like time series or text.
• 1D Convolution:
• Input: 1D data (e.g., time series, text sequences).
• Kernel: A 1D vector (filter) that slides along the input sequence.
• Output: A 1D feature map.
• Applications: Time series analysis, NLP (Natural Language Processing), audio processing.
• 2D Convolution:
• Input: 2D data (e.g., images).
• Kernel: A 2D matrix (filter) that slides over the input image in two directions (width and height).
• Output: A 2D feature map.
• Applications: Image processing, computer vision, video analysis.
• Key Differences:
• Input Data: 1D convolutions work on 1D data, while 2D convolutions work on 2D data.
• Kernel: 1D convolutions use a 1D kernel, while 2D convolutions use a 2D kernel.
• Output: 1D convolutions produce a 1D output, while 2D convolutions produce a 2D output.
• Direction of Sliding: The kernel in 1D convolution slides in one direction, while the kernel in 2D convolution
slides in two directions (width and height).
• Applications: 1D convolutions are suitable for sequential data, while 2D convolutions are suitable for images.
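• As a small sketch of this shape difference (using PyTorch for illustration; the channel counts and kernel sizes are arbitrary):

import torch
import torch.nn as nn

# 1D convolution: input is (batch, channels, length); the kernel slides along one axis
conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)
x1 = torch.randn(1, 1, 100)      # e.g. a univariate time series of length 100
print(conv1d(x1).shape)          # torch.Size([1, 4, 98]) -- a 1D feature map per filter

# 2D convolution: input is (batch, channels, height, width); the kernel slides along two axes
conv2d = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)
x2 = torch.randn(1, 3, 32, 32)   # e.g. a 32x32 RGB image
print(conv2d(x2).shape)          # torch.Size([1, 4, 30, 30]) -- a 2D feature map per filter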
• In machine learning, network training refers to the process of teaching a neural network or a similar model to perform a specific task by exposing it to
data and adjusting its internal parameters (weights and biases) to minimize prediction errors. This is typically achieved through iterative algorithms like
backpropagation, which adjusts the network's internal structure to better match the input data and desired output.
• Here's a more detailed explanation:
• 1. The Goal: The goal of network training is to find the optimal set of parameters for a neural network (or other model) so that it can accurately predict
the desired output for new, unseen input data.
• 2. The Process:
• Data:
• Training involves providing the network with a large dataset of labeled examples, where each example consists of an input and the corresponding
correct output.
• Forward Propagation:
• The network processes the input data, passing it through its layers and neurons until it produces an output.
• Error Calculation:
• The difference between the network's output and the correct output (target value) is calculated, often using a "loss function".
• Backward Propagation (Backpropagation):
• This algorithm calculates the error gradient (the rate of change of the loss function with respect to the network's parameters).
• Parameter Adjustment:
• The network's weights and biases are adjusted based on the error gradient, typically using an optimization algorithm like gradient descent, to minimize
the loss function.
• Iteration:
• This process (forward propagation, error calculation, backward propagation, and parameter adjustment) is repeated iteratively for many epochs (passes
through the entire training dataset) until the network's performance improves to a desired level.
• 3. Key Concepts:
• Weights and Biases:
• These are the parameters of the network that control how it processes data. They are adjusted during training to minimize errors.
• Loss Function:
• A mathematical function that quantifies the difference between the network's predictions and the actual target values.
• Optimization Algorithm:
• An algorithm used to adjust the network's parameters in a way that minimizes the loss function (e.g., gradient descent).
• Backpropagation:
• The most common algorithm for training neural networks, it calculates the error gradient and allows for efficient adjustment of the network's
parameters.
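• The whole cycle described above fits in a few lines. A minimal PyTorch sketch for illustration (model, dataloader, and num_epochs are assumed to be defined; the loss and optimizer are example choices):

import torch.nn as nn
import torch.optim as optim

criterion = nn.MSELoss()                             # loss function
optimizer = optim.SGD(model.parameters(), lr=0.01)   # optimization algorithm

for epoch in range(num_epochs):                      # many passes over the training data
    for inputs, targets in dataloader:
        outputs = model(inputs)                      # forward propagation
        loss = criterion(outputs, targets)           # error calculation

        optimizer.zero_grad()                        # clear gradients from the previous step
        loss.backward()                              # backward propagation (backpropagation)
        optimizer.step()                             # parameter adjustment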
Types of Network Training:

• Supervised Learning:
• The network learns from labeled data, where each input
has a corresponding correct output.
• Unsupervised Learning:
• The network learns from unlabeled data, trying to
discover patterns and relationships within the data.
• Reinforcement Learning:
• The network learns by interacting with an environment
and receiving rewards or penalties for its actions.
Here is how to build a smart speaker using a
Convolutional Neural Network (CNN) in Python

• 1. Data Collection and Preprocessing
• Gather a large dataset of audio samples with
corresponding commands or intents.
• Preprocess the audio data by converting it into
a suitable format for the CNN, such as
spectrograms or Mel-Frequency Cepstral
Coefficients (MFCCs). Libraries like Librosa can
be used for feature extraction.
import librosa
import numpy as np

def extract_features(audio_path):
    # load the audio file and its sampling rate
    audio, sr = librosa.load(audio_path)
    # compute 40 MFCC coefficients per frame
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)
    # average over time to get a fixed-length feature vector
    return np.mean(mfccs.T, axis=0)
2. Model Architecture

• Design a CNN architecture suitable for audio
classification. This typically involves
convolutional layers, pooling layers, and fully
connected layers.
• Consider using techniques like dropout to
prevent overfitting.
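• A minimal sketch of such an architecture in Keras (the layer sizes, the assumed input shape of 40 MFCC coefficients by 100 frames, and num_commands are illustrative assumptions):

import tensorflow as tf

num_commands = 10   # hypothetical number of voice commands to recognize

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(40, 100, 1)),         # MFCC "image": coefficients x frames x 1 channel
    tf.keras.layers.Conv2D(16, 3, activation='relu'),  # convolutional layer extracts local patterns
    tf.keras.layers.MaxPooling2D(2),                   # pooling layer reduces spatial size
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),                         # flatten feature maps for the dense layers
    tf.keras.layers.Dropout(0.5),                      # dropout to reduce overfitting
    tf.keras.layers.Dense(num_commands, activation='softmax'),  # one probability per command
])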
3. Model Training

• Compile the model with an appropriate loss
function (e.g., categorical cross-entropy) and
optimizer (e.g., Adam).
• Train the model on the preprocessed audio
data, splitting it into training and validation
sets.
• Monitor the model's performance on the
validation set to fine-tune hyperparameters
and prevent overfitting.
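• Continuing the sketch above (X_train, y_train, X_val, y_val are assumed to be preprocessed features and one-hot encoded labels):

model.compile(optimizer='adam',                    # Adam optimizer
              loss='categorical_crossentropy',     # suits one-hot multi-class labels
              metrics=['accuracy'])

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),   # monitor performance on held-out data
                    epochs=20, batch_size=32)         # illustrative hyperparameters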
4. Integration with Smart Speaker Hardware

• Integrate the trained model with the hardware
components of the smart speaker, such as the
microphone and speaker.
• Implement real-time audio processing to
capture and preprocess user commands.
• Use the trained model to predict the intent of
the command and execute the corresponding
action.
5. Testing and Refinement

• Thoroughly test the smart speaker in various
environments and with different users.
• Refine the model and system based on user
feedback and performance evaluations.
THANK YOU
