MODULE 5

Convolutional Neural Networks (CNNs) are specialized deep learning algorithms designed for image analysis, widely used in tasks such as image classification and object detection. They consist of layers including convolutional, pooling, and fully connected layers, which automatically extract features from images while reducing dimensionality. CNNs have applications in various fields, including healthcare, autonomous vehicles, and natural language processing, and are supported by frameworks like TensorFlow and PyTorch.


Refer textbook

CONVOLUTIONAL NEURAL NETWORK


A Convolutional Neural Network (CNN) is a type of artificial neural network
that is particularly good at analyzing visual images.
In simpler terms, it is a deep learning algorithm specifically
designed to process and understand images.
It is widely used in computer vision tasks like image
classification, object detection, and segmentation.
Diagram for CNN: (figure omitted)

Working principle:
Convolutional Layer
 Applies filters (kernels) to the input image.
 Extracts features such as edges, textures, and patterns.
 Uses activation functions (like ReLU) to introduce non-linearity.
Pooling Layer
 Reduces the spatial dimensions (downsamples) of the feature maps.
 Common types: Max Pooling (keeps the most important features).
Fully Connected (FC) Layer
 Flattens the pooled feature maps into a 1D vector.
 Passes it through dense layers with activation functions (like
Softmax for classification).
 Produces the final predictions.
Dropout & Batch Normalization
 Dropout prevents overfitting by randomly deactivating neurons.
 Batch Normalization speeds up training and stabilizes learning.
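
To make the stack concrete, here is a minimal sketch of such a network in
Keras (the input shape, layer sizes, and 10-class output are illustrative
assumptions, not a prescribed architecture):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),                 # e.g. 28x28 grayscale images
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution + ReLU non-linearity
    layers.BatchNormalization(),                    # stabilizes and speeds up training
    layers.MaxPooling2D((2, 2)),                    # downsample the feature maps
    layers.Flatten(),                               # pooled maps -> 1D vector
    layers.Dropout(0.5),                            # randomly deactivate neurons
    layers.Dense(10, activation="softmax"),         # final class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()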
Where is it used?
Image classification, facial recognition, image segmentation,
generative models, NLP, robotics, etc.

Advantages of CNNs

 Automatic Feature Extraction – No need for manual feature
engineering.
 Parameter Sharing – Reduces the number of parameters compared to
fully connected networks.
 Translation Invariance – Detects objects regardless of their position in
an image.

Different types of CNN:

 LeNet
 AlexNet
 GoogLeNet
 VGG
 ResNet

There are many popular tools and frameworks for developing CNNs, including:

 TensorFlow: An open-source software library for deep learning
developed by Google.

 PyTorch: An open-source deep learning framework developed by
Facebook (Meta).

 MXNet: An open-source deep learning framework maintained by the
Apache Software Foundation.

 Keras: A high-level deep learning API for Python that runs on top of
backends such as TensorFlow.

Convolutional Layer:

This layer is the core building block of a CNN. The layer's parameters consist
of learnable kernels (filters) which extend through the full depth of the input.
Each unit of this layer receives inputs from a set of units located in a small
neighbourhood in the previous layer. Such a neighbourhood is called the
neuron's receptive field. During the forward pass, each filter is convolved
with the input, producing a feature map. When the feature maps generated by
multiple filters are stacked, they form the output of the convolutional layer.
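
As a toy illustration of the convolution operation (the 5×5 input and the
vertical-edge kernel are made up; "valid" convolution with stride 1 is assumed):

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input image
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                 # a vertical-edge filter

h, w = kernel.shape
out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # each output unit sees only a small 3x3 receptive field of the input
        out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
print(out)   # one feature map; stacking maps from many filters gives the layer output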

Non-linearity Layer:

This is a layer of neurons which apply an activation function. These
functions introduce non-linearities, which are essential for multi-layer
networks. Typical activation functions are sigmoid, tanh, and ReLU. Compared
to the other functions, Rectified Linear Units (ReLU) are preferable because
networks that use them train several times faster.
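
For reference, ReLU is defined as ReLU(x) = max(0, x); a one-line NumPy sketch:

import numpy as np

relu = lambda x: np.maximum(0, x)        # ReLU(x) = max(0, x)
print(relu(np.array([-2.0, 0.0, 3.0])))  # -> [0. 0. 3.]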

Pooling Layer:

The convolutional layer may be followed by a pooling layer, which takes small
rectangular blocks from the convolutional layer's output and subsamples each
block to produce a single output, e.g. the maximum value in the block. The
pooling layer progressively reduces the spatial size of the representation,
thus reducing the amount of computation. It also helps control overfitting.
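
A small worked example of 2×2 max pooling (the feature-map values are made up;
the reshape trick assumes the input divides evenly into 2×2 blocks):

import numpy as np

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 0],
                 [7, 2, 9, 8],
                 [3, 1, 4, 6]])
# split the 4x4 map into 2x2 blocks and keep the maximum of each block
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 4]
                #  [7 9]]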

Fully Connected Layer:

There may be one or more fully connected layers that perform high-level
reasoning by taking all neurons in the previous layer and connecting them to
every single neuron in the current layer to generate global semantic
information.

Feature extraction:
It is the process of identifying and learning important patterns
from input data, particularly images, to help with classification,
detection, and other tasks. In Convolutional Neural Networks
(CNNs), this is done automatically through multiple layers.

In the convolution layer (feature extraction), a feature is
considered an important part of an image and is used as a
starting point for computer vision algorithms.
Input layer in CNN: (figure omitted)

Different Types of CNN Models:

1. LeNet:
One of the first successful CNNs, designed for handwritten digit recognition.
It laid the foundation for modern CNNs and achieved high accuracy
on the MNIST (Modified National Institute of Standards and
Technology) dataset, which contains 70,000 images of handwritten
digits (0-9).

2. AlexNet:
A major breakthrough in image recognition; it helped establish CNNs as a
powerful tool for image recognition.

3. ResNet:
It is designed for image recognition and processing tasks. ResNets are
distinguished by their residual (skip) connections, which make it possible
to train very deep networks effectively, making them well suited to
complex tasks.

4. GoogLeNet:
Also known as InceptionNet, it is distinguished for achieving high
accuracy in image classification while using relatively few parameters.

The core component of GoogLeNet, the Inception module, allows the
network to learn features at different scales simultaneously,
improving performance.

5. VGG (Visual Geometry Group):
It uses small 3×3 convolutional filters stacked in multiple layers,
creating a deep and uniform structure.
Overfitting vs. Underfitting:

Overfitting happens when a model learns too much from the training
data, including details that don't matter (like noise or outliers).

 For example, imagine fitting a very complicated curve to a set of
points. The curve will go through every point, but it won't
represent the actual pattern.
 As a result, the model works great on training data but fails
when tested on new data.

Underfitting is the opposite of overfitting. It happens when a model is
too simple to capture what's going on in the data.

 For example, imagine drawing a straight line to fit points that
actually follow a curve. The line misses most of the pattern.
 In this case, the model doesn't work well on either the training
or testing data.
Convolutional Autoencoder:
A Convolutional Autoencoder (CAE) is a type of autoencoder that
leverages convolutional layers to learn spatial hierarchies of features
from images. It consists of two main parts:

1. Encoder: Compresses the input image into a lower-dimensional
latent space.
2. Decoder: Reconstructs the original image from the latent-space
representation.
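
A minimal sketch of this encoder/decoder structure in Keras (the 28×28
grayscale input shape and the layer sizes are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(28, 28, 1))
# Encoder: compress the image into a smaller latent representation
x = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)                        # 28x28 -> 14x14
latent = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
# Decoder: reconstruct the image from the latent representation
x = layers.UpSampling2D(2)(latent)                   # 14x14 -> 28x28
outputs = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, latent)                # kept for inspecting the latent space
autoencoder.compile(optimizer="adam", loss="mse")    # trained to reproduce its own input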

Uses:
 To understand what features the encoder extracts.
 To inspect the reconstructed images and latent representations.
 To verify whether the model is learning meaningful
representations.

Methods to Visualize a Trained Convolutional Autoencoder

1. Visualizing Reconstructed Images
Compare original vs. reconstructed images.

2. Visualizing Latent Space
Using PCA (Principal Component Analysis) or t-SNE (t-Distributed
Stochastic Neighbour Embedding) helps reduce the complexity of
high-dimensional encoded features by mapping them into a
lower-dimensional space (2D or 3D).

3. Feature Maps (Activation Maps)
Visualize activations from the encoder to understand learned
features.

4. Filters/Kernels
Visualize the learned convolution filters.
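
A sketch of methods 1 and 2 using Matplotlib and scikit-learn (it assumes the
`autoencoder` and `encoder` models and a test-image array `x_test` of shape
(N, 28, 28, 1) from the sketch above; all of these names are illustrative):

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

recon = autoencoder.predict(x_test)
codes = encoder.predict(x_test).reshape(len(x_test), -1)   # flatten latent maps

# Method 1: original vs. reconstructed images
fig, axes = plt.subplots(2, 5)
for k in range(5):
    axes[0, k].imshow(x_test[k].squeeze(), cmap="gray")    # original
    axes[1, k].imshow(recon[k].squeeze(), cmap="gray")     # reconstruction

# Method 2: project latent codes to 2D (sklearn.manifold.TSNE works the same way)
coords = PCA(n_components=2).fit_transform(codes)
plt.figure()
plt.scatter(coords[:, 0], coords[:, 1], s=4)
plt.show()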

APPLICATION OF CNN:

1. Image and Video Recognition

Image and video recognition is the most well-known application of
convolutional neural networks. Enterprises use CNNs to improve their
security systems with real-time facial recognition.

2. Natural Language Processing

CNNs are essential to the development of Natural Language
Processing (NLP), which allows machines to understand human
language with a high degree of precision. With advanced chatbots
and virtual assistants that can offer individualized interactions, this
application is transforming consumer engagement.

3. Autonomous Vehicles
CNN deep learning technologies are crucial to the development of
autonomous vehicles. Road signals and obstacle recognition are only
two examples of the dynamic environmental stimuli that these neural
networks allow vehicles to process and react to.

4. Healthcare Imaging
CNNs are transforming medical imaging in the healthcare industry by
providing better diagnostic capabilities. CNN neural network models
can be used to analyze medical images more accurately by healthcare
providers, and this can lead to earlier detection of conditions such as
cancer.

This improves patient outcomes while optimizing resource allocation
by reducing misdiagnosis and unnecessary treatments.

5. Financial Services

In financial services, convolutional neural networks are
redefining the analysis of complex data sets. By identifying hidden
patterns and anomalies, CNNs strengthen fraud detection mechanisms
and risk management strategies.

6. Retail and E-commerce

Retail and e-commerce sites are deploying neural networks to
improve customer experiences through advanced recommendation
systems. CNNs analyze buyer behavior and generate personalized
product recommendations, improving sales and customer engagement.

7. Industrial Automation

Convolutional neural networks are driving change in industrial
environments by making improvements in areas such as quality
control and predictive maintenance.

Content-Based Image Retrieval:

Content-Based Image Retrieval (CBIR) is a way of retrieving images
from a database.

In CBIR, a user specifies a query image (the image a user provides
to search for visually similar images) and gets back the images in
the database that are similar to it.

To find the most similar images, CBIR compares the content of the
query image to the database images.
CBIR compares visual features such as shape, colour, texture and
spatial information, and measures the similarity between the query
image and the images in the database with respect to those features.

How CBIR works with CNNs:

1. Feature Extraction:
o A pre-trained or custom CNN (e.g., ResNet, VGG)
processes the query image.
o Intermediate layers extract feature representations (e.g.,
edges, textures, and high-level patterns).
2. Feature Vector Representation:
o The extracted features are converted into a feature vector:
a numerical representation of an image (or any data) that
captures its important characteristics in a compact form.
o Fully connected layers or pooling layers often generate
this vector.
o Representing an image by its feature vector is far more
compact than storing or processing the entire image.
3. Similarity Matching (see the sketch after this list):
o When a user submits an image, its feature vector is
computed.
o The system compares it with stored feature vectors using
distance metrics like:
 Euclidean Distance
 Cosine Similarity
 Manhattan Distance
o The closest matches (i.e., visually similar images) are
retrieved.
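
A sketch of the similarity-matching step with cosine similarity (the feature
vectors here are random stand-ins; in a real system they would come from a
CNN such as VGG or ResNet):

import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 512))   # stand-in feature vectors for 1000 images
query = rng.normal(size=512)              # stand-in feature vector of the query image

scores = np.array([cosine_similarity(query, v) for v in database])
top5 = np.argsort(scores)[::-1][:5]       # indices of the 5 most similar images
print(top5, scores[top5])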

How a Query Image Works in CBIR:

1. User Input:
o The user uploads or selects an image as the query image.
2. Feature Extraction:
o A CNN processes the query image to extract a feature
vector (a numerical representation).
3. Feature Matching:
o The feature vector of the query image is compared with
feature vectors of database images.
4. Similarity Search:
o The system retrieves the most similar images based on a
similarity metric (e.g., Euclidean Distance or Cosine
Similarity).
5. Output:
o The retrieved images, ranked by similarity, are displayed
as search results.

Applications of CBIR Using CNNs


 Medical Imaging: Retrieve similar X-rays or MRI scans for
diagnosis.
 E-Commerce: Find visually similar products in online stores.
 Surveillance & Security: Identify similar faces or objects in
video footage.
 Digital Art & Photography: Search for similar artworks or
design references.

Dataset:
ImageNet uses the hierarchical structure of WordNet. Each
meaningful concept in WordNet can be described as a "synonym set"
or "synset".

Object Detection in Deep Learning:

Object detection is a computer vision technique that identifies and
localizes objects in images or videos. Unlike image classification,
which assigns a label to an entire image, object detection provides
both the class label and the bounding-box coordinates for each
detected object.

Object detection in deep learning typically involves two main steps:

1. Feature Extraction
o A convolutional neural network (CNN) extracts features
from the input image.
o These features help distinguish different objects based on
shape, texture, and colour.
2. Object Localization & Classification
o The model predicts bounding boxes around detected
objects.
o It assigns a class label to each detected object.

Object Detection Models

1. R-CNN (Region-based CNN) Family
o R-CNN → Uses selective search to find regions of interest
(slow).
o Fast R-CNN → Uses a CNN to extract features, making it
faster.
o Faster R-CNN → Introduces a Region Proposal Network
(RPN) for better efficiency.
2. YOLO (You Only Look Once) Series
o YOLO processes the entire image in one pass, making it
real-time and efficient.
o It divides the image into a grid and predicts bounding
boxes and class probabilities directly.
3. SSD (Single Shot MultiBox Detector)
o Uses multiple feature maps at different scales for accurate
detection.
o Faster than R-CNN but slightly less accurate than Faster
R-CNN.
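
A sketch of running a pre-trained detector with torchvision (the weights
argument assumes torchvision 0.13+; the input tensor is random noise, used
only to show the input/output format):

import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")   # pre-trained on COCO
model.eval()

image = torch.rand(3, 480, 640)            # stand-in RGB image, values in [0, 1]
with torch.no_grad():
    predictions = model([image])           # the model takes a list of images

# Each prediction is a dict of bounding boxes, class labels, and confidence scores
print(predictions[0]["boxes"].shape)       # (num_detections, 4) box coordinates
print(predictions[0]["labels"])            # class label per detected object
print(predictions[0]["scores"])            # confidence per detected object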

Map of Object Detection Process

Below is a simple conceptual diagram illustrating object detection:

1. Input Image → 2. Feature Extraction (CNN) → 3. Bounding Box & Class Prediction

+-----------------+
| Input Image     | ---> CNN Feature Extraction ---> Bounding Box & Class Label
| (With Objects)  |
+-----------------+
Natural Language Processing (NLP) in Deep Learning:

 Natural Language Processing (NLP) is a branch of artificial
intelligence (AI) that enables machines to understand, interpret,
generate, and respond to human language.
 Deep learning techniques, particularly neural networks, have
significantly improved NLP tasks, making AI systems more
capable of handling complex linguistic patterns.
 Deep learning models process text data using neural networks to
capture the meaning and context of words, sentences, and
paragraphs. These models learn from large datasets and can
handle tasks like translation, sentiment analysis, and chatbot
responses.

Work in natural language processing (NLP) typically involves
using computational techniques to analyze and understand human
language. This can include tasks such as language understanding,
language generation, and language interaction.

1. Text Input and Data Collection

 Data Collection: Gathering text data from various sources such
as websites, books, social media, or proprietary databases.
 Data Storage: Storing the collected text data in a structured
format, such as a database or a collection of documents.

2. Text Preprocessing

Preprocessing is crucial to clean and prepare the raw text data for
analysis. Common preprocessing steps include:

 Tokenization: Splitting text into smaller units like words or
sentences.
 Lowercasing: Converting all text to lowercase to ensure
uniformity.
 Stop-word Removal: Removing common words that do not
contribute significant meaning, such as "and," "the," "is."
 Punctuation Removal: Removing punctuation marks.
 Text Normalization: Standardizing text format, including
correcting spelling errors, expanding contractions, and handling
special characters.
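
A plain-Python sketch of these preprocessing steps (the tiny stop-word list is
an illustrative subset; real pipelines use fuller lists, e.g. from NLTK):

import string

text = "The movie was GREAT, and the acting was not bad!"
stop_words = {"the", "and", "was", "is", "a"}          # illustrative subset only

lowered = text.lower()                                 # lowercasing
no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
tokens = no_punct.split()                              # simple word tokenization
filtered = [t for t in tokens if t not in stop_words]  # stop-word removal
print(filtered)   # ['movie', 'great', 'acting', 'not', 'bad']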

3. Text Representation

 Bag of Words (BoW): Representing text as a collection of
words, ignoring grammar and word order but keeping track of
word frequency.
 Term Frequency-Inverse Document Frequency (TF-IDF): A
statistic that reflects the importance of a word in a document
relative to a collection of documents.
 Word Embeddings: Using dense vector representations of
words where semantically similar words are closer together in
the vector space (e.g., Word2Vec, GloVe).
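
A sketch of BoW and TF-IDF with scikit-learn (a toy two-sentence corpus; the
ngram_range option also previews the N-grams idea in the next section):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["the cat sat on the mat",
          "the dog sat on the log"]

bow = CountVectorizer()                        # Bag of Words: raw word counts
print(bow.fit_transform(corpus).toarray())
print(bow.get_feature_names_out())

tfidf = TfidfVectorizer()                      # TF-IDF: counts reweighted by rarity
print(tfidf.fit_transform(corpus).toarray())

bigrams = CountVectorizer(ngram_range=(2, 2))  # N-grams preserve local word order
bigrams.fit(corpus)
print(bigrams.get_feature_names_out())         # e.g. 'the cat', 'cat sat', ...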

4. Feature Extraction

Extracting meaningful features from the text data that can be used for
various NLP tasks.

 N-grams: Capturing sequences of N words to preserve some
context and word order.
 Syntactic Features: Using parts-of-speech tags, syntactic
dependencies, and parse trees.
 Semantic Features: Leveraging word embeddings and other
representations to capture word meaning and context.

5. Model Selection and Training

Selecting and training a machine learning or deep learning model to
perform specific NLP tasks.

 Supervised Learning: Using labeled data to train models like
Support Vector Machines (SVM), Random Forests, or deep
learning models like Convolutional Neural Networks (CNNs)
and Recurrent Neural Networks (RNNs).
 Unsupervised Learning: Applying techniques like clustering or
topic modeling (e.g., Latent Dirichlet Allocation) on unlabeled
data.
 Pre-trained Models: Utilizing pre-trained language models
such as BERT, GPT, or other transformer-based models that
have been trained on large corpora.

6. Model Deployment and Inference


Deploying the trained model and using it to make predictions or
extract insights from new text data.

 Text Classification: Categorizing text into predefined classes
(e.g., spam detection, sentiment analysis).
 Named Entity Recognition (NER): Identifying and classifying
entities in the text.
 Machine Translation: Translating text from one language to
another.
 Question Answering: Providing answers to questions based on
the context provided by text data.
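
A sketch of such inference tasks using the Hugging Face transformers library
(each pipeline downloads a default pre-trained model on first use):

from transformers import pipeline

classifier = pipeline("sentiment-analysis")          # text classification
print(classifier("I loved this movie!"))             # e.g. label POSITIVE with a score

ner = pipeline("ner", aggregation_strategy="simple") # named entity recognition
print(ner("Barack Obama was born in Hawaii."))       # person and location entities

translator = pipeline("translation_en_to_fr")        # machine translation
print(translator("How are you?"))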
7. Evaluation and Optimization

Evaluating the performance of the NLP algorithm using metrics such
as accuracy, precision, recall, F1-score, and others.

 Hyperparameter Tuning: Adjusting model parameters to
improve performance.
 Error Analysis: Analyzing errors to understand model
weaknesses and improve robustness.

8. Iteration and Improvement

Continuously improving the algorithm by incorporating new data,
refining preprocessing techniques, experimenting with different
models, and optimizing features.

Key Deep Learning Models for NLP

1. Word Embeddings (Feature Representation)
o Traditional models represented words as one-hot vectors,
but deep learning uses word embeddings (like
Word2Vec, GloVe, and FastText) to represent words in a
continuous vector space.
o Example: "king" - "man" + "woman" ≈ "queen" (semantic
relationships; see the sketch after this list).
2. Recurrent Neural Networks (RNNs)
o RNNs process sequential data and are widely used for
NLP.
o They have a memory mechanism to retain previous
information but struggle with long sequences due to the
vanishing gradient problem.
3. Long Short-Term Memory (LSTM) & Gated Recurrent
Units (GRU)
o LSTMs and GRUs are advanced versions of RNNs that
handle long-term dependencies better by using gating
mechanisms.
o Example: Used in speech recognition and text
generation.
4. Transformer Models (Revolution in NLP)
o The Transformer architecture (introduced in 2017)
replaced RNNs and CNNs in NLP due to its self-attention
mechanism.
o Transformer-based models process entire sentences in
parallel, leading to faster and more accurate NLP
applications.
5. BERT (Bidirectional Encoder Representations from
Transformers)
o Developed by Google, BERT understands the context of
words from both left and right (bidirectional).
o Example: "He went to the bank to withdraw money" vs.
"He sat near the bank of the river."
o BERT helps in question answering, sentiment analysis,
and search engines.
6. GPT (Generative Pre-trained Transformer)
o GPT models (GPT-3, GPT-4) are transformer-based
models designed for text generation.
o Example: AI chatbots, text summarization, and code
generation.
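
A sketch of the word-embedding arithmetic with gensim (the toy corpus is far
too small for the analogy to actually come out as "queen"; it is only meant
to show the API):

from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "land"],
             ["the", "queen", "rules", "the", "land"],
             ["a", "man", "and", "a", "woman"]]
model = Word2Vec(sentences, vector_size=50, min_count=1, epochs=50)

# vector arithmetic: king - man + woman -> nearest neighbours in embedding space
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))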
Applications of Deep Learning in NLP
 Machine Translation (Google Translate)

 Chatbots & Virtual Assistants (Siri, Alexa, ChatGPT)

 Sentiment Analysis (Detecting positive/negative reviews)

 Text Summarization (Extracting key points from articles)

 Speech Recognition (Voice assistants, automatic subtitles)

 Named Entity Recognition (NER) (Extracting names, places,
dates)

Sequence Training:
Sequence training in deep learning refers to training models on sequential data,
where the order of the data matters.
It is widely used in tasks such as:

Natural Language Processing (NLP)

Speech Recognition

Time Series Forecasting

Video Analysis
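
A minimal Keras sketch of a model for sequence data (an LSTM sentiment
classifier over integer token ids; the vocabulary size and sequence length
are illustrative assumptions):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(100,), dtype="int32"),            # sequences of 100 token ids
    layers.Embedding(input_dim=10000, output_dim=64),    # 10k-word vocabulary
    layers.LSTM(64),                                     # order-aware sequence encoding
    layers.Dense(1, activation="sigmoid"),               # e.g. positive/negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()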
