MODULE 5
Working principle:
Convolutional Layer
Applies filters (kernels) to the input image.
Extracts features such as edges, textures, and patterns.
Uses activation functions (like ReLU) to introduce non-linearity.
Pooling Layer
Reduces the spatial dimensions of feature maps (a "zoom out").
Common types: Max Pooling (keeps the strongest feature in each
region) and Average Pooling (keeps the mean of each region).
Fully Connected (FC) Layer
Flattens the pooled feature maps into a 1D vector.
Passes through dense layers with activation functions (like
Softmax for classification).
Produces final predictions.
Dropout & Batch Normalization
Dropout prevents overfitting by randomly deactivating neurons.
Batch Normalization speeds up training and stabilizes learning.
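The two techniques above can be sketched in plain Python. This is a minimal illustration, not how a framework implements them: the function names, the 0.5 dropout rate, and the fixed gamma/beta values are assumptions chosen for the example, and the dropout shown is the common "inverted" variant that rescales the surviving activations.

```python
import math
import random

def dropout(activations, rate, rng):
    """Inverted dropout (training time): zero each value with probability
    `rate` (must be < 1) and rescale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch to roughly zero mean and
    unit variance, then scale by gamma and shift by beta (learned in practice)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(activations, rate=0.5, rng=rng)
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
```

Each entry of `dropped` is either 0.0 or twice the original activation, and `normalized` has a mean close to 0 and a variance close to 1.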
Where are CNNs used?
Image classification, facial recognition, image segmentation,
generative models, NLP, robotics, etc.
Advantages of CNNs
LeNet
AlexNet
GoogLeNet
VGG
ResNet
There are many popular tools and frameworks for developing CNNs, including:
Keras: A high-level deep learning API for Python that can run on top of
backends such as TensorFlow, JAX, or PyTorch.
Convolutional Layer:
This layer is the core building block of a CNN. The layer’s parameters consist
of learnable kernels, or filters, which extend through the full depth of the
input. Each unit of this layer receives inputs from a set of units located in a
small neighbourhood in the previous layer. Such a neighbourhood is called the
neuron’s receptive field in the previous layer. During the forward pass, each
filter is convolved with the input, which produces a feature map. When the
feature maps generated by multiple filters are stacked, they form the output
of the convolution layer.
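The forward pass described above can be sketched as follows. This is a simplified illustration under several assumptions: a single-channel 2D input, no padding, stride 1, and hand-picked (not learned) edge filters. Like most deep learning libraries, it slides the kernel without flipping it (cross-correlation). The function names are made up for the example.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    resulting feature map. Each output value sees only a small patch of
    the input: its receptive field."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + m][j + n] * kernel[m][n]
                for m in range(kh) for n in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Apply the ReLU non-linearity element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# A 4x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
dark_to_bright = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
bright_to_dark = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

# One feature map per filter; stacked, they form the layer output.
output = [relu(conv2d(image, k)) for k in (dark_to_bright, bright_to_dark)]
# output[0] is [[3, 3], [3, 3]]; output[1] is all zeros after ReLU.
```

The first filter responds strongly because the image contains the edge it looks for; the second filter's negative responses are clipped to zero by ReLU.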
Non-linearity Layer:
Pooling Layer:
The convolution layer may be followed by a pooling layer, which takes small
rectangular blocks from the convolution layer's output and subsamples each
block to produce a single output (in max pooling, the maximum of the block).
The pooling layer progressively reduces the spatial size of the representation,
thus reducing the computation and the number of parameters in later layers.
It also helps control overfitting.
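Max pooling as described above can be sketched in a few lines. The 2x2 block size and stride of 2 are assumptions (they are common defaults, not values given in these notes), and the function name is made up for the example.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size block of the feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(feature_map[i * stride + m][j * stride + n]
                for m in range(size) for n in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool2d(feature_map)  # [[6, 2], [2, 9]]
```

The 4x4 map shrinks to 2x2, so any layer that follows has a quarter as many inputs to process.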
Fully-Connected Layer:
There may be one or more fully-connected layers that perform high-level
reasoning by taking all neurons in the previous layer and connecting them to
every single neuron in the current layer to generate global semantic
information.
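As a rough sketch of this stage, the code below flattens pooled feature maps into a 1D vector, applies one dense layer, and converts the result to class probabilities with softmax. The weights, biases, and the two-class setup are hypothetical values chosen for illustration; in a real network they are learned during training.

```python
import math

def flatten(feature_maps):
    """Turn a stack of 2D feature maps into a single 1D vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def dense(inputs, weights, biases):
    """Fully-connected layer: every output is connected to every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

pooled_maps = [[[6, 2], [2, 9]]]          # one 2x2 pooled feature map
features = flatten(pooled_maps)           # [6, 2, 2, 9]
weights = [[0.1, 0.0, 0.0, 0.1],          # hypothetical weights, class 0
           [0.0, 0.2, 0.2, 0.0]]          # hypothetical weights, class 1
biases = [0.0, 0.0]
probs = softmax(dense(features, weights, biases))
predicted = probs.index(max(probs))       # index of the most likely class
```

Here class 0 receives the larger score, so it is the predicted class.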
Feature extraction:
It is the process of identifying and learning important patterns
from input data, particularly images, to help in classification,
detection, and other tasks. In Convolutional Neural Networks
(CNNs), this is done automatically through multiple layers.
1. LeNet:
One of the first successful CNNs, designed for handwritten digit recognition.
It laid the foundation for modern CNNs and achieved high accuracy
on the MNIST (Modified National Institute of Standards and
Technology) dataset, which contains 70,000 images of handwritten
digits (0-9).
2. AlexNet:
A major breakthrough in image recognition; it helped to establish CNNs
as a powerful tool for the task.
3. ResNet:
It is designed for image recognition and processing tasks. ResNets are
distinguished by their ability to train very deep networks: skip
(residual) connections counter the vanishing-gradient and degradation
problems, making them highly effective for complex tasks.
4. GoogLeNet:
Also known as InceptionNet, it is distinguished for achieving high
accuracy in image classification while using relatively few parameters.
In the case of underfitting, the model doesn’t work well on either the
training or the testing data.
Convolutional Autoencoder:
A Convolutional Autoencoder (CAE) is a type of autoencoder that
leverages convolutional layers to learn spatial hierarchies of features
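from images. It consists of two main parts: an encoder, which compresses
the input image into a latent representation, and a decoder, which
reconstructs the image from that representation.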
Uses:
To understand what features the encoder extracts.
To inspect the reconstructed images and latent representations.
To verify whether the model is learning meaningful
representations.
4. Filter/Kernels
APPLICATIONS OF CNN:
3. Autonomous Vehicles
CNN-based deep learning is crucial to the development of autonomous
vehicles. Road signs and obstacles are only two examples of the
dynamic environmental stimuli that these neural networks allow
vehicles to recognize and react to.
4. Healthcare Imaging
CNNs are transforming medical imaging in the healthcare industry by
providing better diagnostic capabilities. Healthcare providers can use
CNN models to analyze medical images more accurately, which can lead
to earlier detection of conditions such as cancer.
5. Financial Services
7. Industrial Automation
In content-based image retrieval (CBIR), a user specifies a query image
(the image provided to search for visually similar images in a
database) and gets back the database images most similar to it.
To find the most similar images, CBIR compares the content of the
query image to the database images.
CBIR compares visual features such as shapes, colours, texture and
spatial information, and measures the similarity between the query
image and the images in the database with respect to those features.
1. Feature Extraction:
o A pre-trained or custom CNN (e.g., ResNet, VGG)
processes the query image.
o Intermediate layers extract feature representations (e.g.,
edges, textures, and high-level patterns).
2. Feature Vector Representation:
o The extracted features are converted into a feature vector, a
numerical representation of an image (or any data) that captures
its important characteristics in a compact form.
o Fully connected layers or pooling layers often generate
this vector.
“Compact form” means representing an image by a feature vector instead
of storing or processing the entire image.
3. Similarity Matching:
o When a user submits an image, its feature vector is
computed.
o The system compares it with stored feature vectors using
distance metrics like:
Euclidean Distance
Cosine Similarity
Manhattan Distance
o The closest matches (i.e., visually similar images) are
retrieved.
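The three distance metrics listed above can be written out directly. This is a minimal sketch on two short, made-up feature vectors; real CNN feature vectors typically have hundreds or thousands of dimensions.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute differences; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 2.0, 2.0]    # hypothetical query feature vector
stored = [2.0, 4.0, 4.0]   # hypothetical stored feature vector

d_euclid = euclidean_distance(query, stored)    # 3.0
d_manhattan = manhattan_distance(query, stored) # 5.0
s_cosine = cosine_similarity(query, stored)     # 1.0 (same direction)
```

Note the difference in behaviour: the two vectors point in the same direction, so cosine similarity treats them as identical, while the Euclidean and Manhattan distances still register the difference in magnitude.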
1. User Input:
o The user uploads or selects an image as the query image.
2. Feature Extraction:
o A CNN processes the query image to extract a feature
vector (a numerical representation).
3. Feature Matching:
o The feature vector of the query image is compared with
feature vectors of database images.
4. Similarity Search:
o The system retrieves the most similar images based on a
similarity metric (e.g., Euclidean Distance or Cosine
Similarity).
5. Output:
o The retrieved images, ranked by similarity, are displayed
as search results.
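The five steps above can be sketched end to end. To keep the example self-contained, the CNN is left out: the database holds hypothetical, precomputed feature vectors, and the query vector stands in for what a CNN would produce for the uploaded image. The file names and the choice of cosine similarity are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vector, database, top_k=2):
    """Compare the query vector with every stored vector and return the
    top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query_vector, vec))
              for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical feature vectors computed in advance for the database images.
database = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.2],
    "sunset.jpg": [0.8, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]  # stand-in for the CNN output on the query image

results = retrieve(query_vector, database)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

With these values the ranking is `beach.jpg` first and `sunset.jpg` second. Storing the database vectors ahead of time means only the query image has to be processed at search time.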
Dataset:
ImageNet uses the hierarchical structure of WordNet. Each
meaningful concept in WordNet is called a “synonym set”
or “synset”.
1. Feature Extraction
o A convolutional neural network (CNN) extracts features
from the input image.
o These features help distinguish different objects based on
shape, texture, and colour.
2. Object Localization & Classification
o The model predicts bounding boxes around detected
objects.
o It assigns a class label to each detected object.
Natural Language Processing (NLP) in Deep Learning:
Text Preprocessing
Preprocessing is crucial to clean and prepare the raw text data for
analysis. Common preprocessing steps include:
Text Representation
4. Feature Extraction
Extracting meaningful features from the text data that can be used for
various NLP tasks.
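As a rough illustration of preprocessing followed by feature extraction, the sketch below cleans two sentences and turns them into bag-of-words count vectors. The specific steps (lowercasing, stripping punctuation, stop-word removal), the tiny stop-word list, and the bag-of-words representation are assumptions picked for the example; they are common choices, not the only ones.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {"the", "a", "is", "on", "and"}

def preprocess(text):
    """Lowercase, drop punctuation, split into tokens, remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

docs = ["The cat sat on the mat.", "The dog sat on the cat!"]
tokenized = [preprocess(d) for d in docs]
vocabulary = sorted({t for tokens in tokenized for t in tokens})
vectors = [bag_of_words(tokens, vocabulary) for tokens in tokenized]
# vocabulary: ['cat', 'dog', 'mat', 'sat']
# vectors:    [[1, 0, 1, 1], [1, 1, 0, 1]]
```

Each document becomes a fixed-length numeric vector that a model can consume, at the cost of discarding word order.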
Sequence Training:
Sequence training in deep learning refers to training models on sequential data,
where the order of the data matters.
It is widely used in tasks such as:
Speech Recognition
Video Analysis
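To see why the order of the data matters, consider a minimal recurrent cell. This sketch shows only the forward pass with fixed, hypothetical weights (`w_in`, `w_rec`); an actual training loop would also adjust those weights from data, which is omitted here.

```python
import math

def rnn_final_state(sequence, w_in=0.5, w_rec=0.8, h0=0.0):
    """Run a one-unit recurrent cell over the sequence and return the
    final hidden state. Each step mixes the new input with the state
    carried over from all previous steps."""
    h = h0
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h)
    return h

forward = rnn_final_state([1.0, 2.0, 3.0])
backward = rnn_final_state([3.0, 2.0, 1.0])
print(forward, backward)
```

Both sequences contain the same values, yet the final hidden states differ, because the state at each step depends on everything that came before it.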