2nd Unit NN Final Class Notes (1)

Deep learning notes Jntuh

Uploaded by

niteeshs7e
Copyright
© © All Rights Reserved
Subject: Neural Networks and Deep Learning

(UNIT-2 Class Notes)-Unsupervised Learning Networks


Example: Suppose an unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never given labels for these images, so it has no prior knowledge of which features distinguish a cat from a dog.
The task of the algorithm is to discover the image features on its own. It does this by clustering the image dataset into groups according to the similarities between images.
Why use Unsupervised Learning?
Below are some main reasons which describe the importance of Unsupervised Learning:
• Unsupervised learning helps find useful insights in the data.
• Unsupervised learning resembles the way a human learns from experience, which is often described as bringing it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes it widely applicable.
• In the real world, we do not always have input data with corresponding outputs; unsupervised learning is needed for such cases.
Working of Unsupervised Learning
The working of unsupervised learning can be understood as follows.

We start with unlabeled input data, which means it is not categorized and the corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it.

First, the model interprets the raw data to find hidden patterns, and then a suitable algorithm is applied, such as k-means clustering or hierarchical clustering.

The chosen algorithm then divides the data objects into groups according to the similarities and differences between the objects.
• Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data objects and categorizes them according to the presence or absence of those commonalities.

• Association: Association rule learning is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset. Association rules can make a marketing strategy more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association rule learning is Market Basket Analysis.
Unsupervised Learning algorithms:
Below is the list of some popular unsupervised learning algorithms:
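• K-means clustering
• KNN (k-nearest neighbors); note that KNN is normally treated as a supervised classifier, although nearest-neighbor distances are also used in unsupervised settings
• Hierarchical clustering
• Anomaly detection
• Neural networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition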

Advantages of Unsupervised Learning


• Unsupervised learning is used for more complex tasks as compared to supervised learning because, in unsupervised learning, we
don't have labeled input data.
• Unsupervised learning is preferable as it is easy to get unlabeled data in comparison to labeled data.

Disadvantages of Unsupervised Learning


• Unsupervised learning is intrinsically more difficult than supervised learning as it does not have corresponding output.
• The result of the unsupervised learning algorithm might be less accurate as input data is not labeled, and algorithms
do not know the exact output in advance.
Types of Unsupervised Learning Problems

Unsupervised learning problems are further grouped into clustering and association problems.

Clustering

Clustering is an important concept in unsupervised learning. It mainly deals with finding a structure or pattern in a collection of uncategorized data.

Clustering algorithms process your data and find natural clusters (groups) if they exist in the data. You can also choose how many clusters the algorithm should identify, which lets you adjust the granularity of these groups.

There are different types of clustering you can utilize:

Exclusive (partitioning)
In this clustering method, data are grouped in such a way that each data point can belong to one cluster only.

Example: K-means

Agglomerative
In this clustering technique, every data point starts as its own cluster. Iterative unions between the two nearest clusters reduce the number of clusters.

Example: Hierarchical clustering

Overlapping
In this technique, fuzzy sets are used to cluster data. Each point may belong to two or more clusters with separate degrees of membership.

Here, each data point is associated with an appropriate membership value for every cluster.

Example: Fuzzy C-Means
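The idea of a membership value can be made concrete with a small sketch. It computes the standard Fuzzy C-Means membership of one point in each cluster, assuming the cluster centers are already known; the function name `fuzzy_memberships` and the fuzzifier default `m=2.0` are choices made for this illustration, not something given in these notes.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Degree of membership of `point` in each cluster, given the cluster centers."""
    dists = [math.dist(point, c) for c in centers]
    on_center = [d == 0.0 for d in dists]
    if any(on_center):
        # The point lies exactly on one or more centers: share membership among them.
        return [1.0 / sum(on_center) if z else 0.0 for z in on_center]
    power = 2.0 / (m - 1.0)
    # Closer centers receive a larger share; the shares always add up to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# A point at distance 1 from the first center and distance 3 from the second.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

Unlike exclusive clustering, the point is not forced into a single cluster: here it belongs mostly to the first cluster and slightly to the second.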

Probabilistic
This technique uses probability distributions to create the clusters.

Example: The following keywords

• “man’s shoe”
• “women’s shoe”
• “women’s glove”
• “man’s glove”

can be clustered into two categories, “shoe” and “glove”, or “man” and “women”.

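Clustering Types
The following techniques are commonly listed alongside clustering in machine learning. Strictly speaking, only the first two are clustering algorithms: K-NN is a supervised classifier, and PCA, SVD and ICA are dimensionality-reduction or decomposition methods.

• Hierarchical clustering
• K-means clustering
• K-NN (k-nearest neighbors)
• Principal Component Analysis
• Singular Value Decomposition
• Independent Component Analysis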

Hierarchical Clustering
Hierarchical clustering is an algorithm that builds a hierarchy of clusters. It begins with every data point assigned to a cluster of its own. At each step, the two closest clusters are merged into one. The algorithm ends when there is only one cluster left.

K-means Clustering
K-means is an iterative clustering algorithm that refines the cluster assignments at every iteration until they stop changing. Initially, the desired number of clusters k is selected, and the data points are then clustered into k groups. A larger k means smaller groups with more granularity; a lower k means larger groups with less granularity.

The output of the algorithm is a set of “labels”: each data point is assigned to one of the k groups. In k-means clustering, each group is defined by a centroid. The centroids are like the heart of the cluster: each one captures the points closest to it and adds them to its cluster.
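The assign-then-update loop described above can be sketched in a few lines. This is a minimal illustration of the usual K-means iteration (often called Lloyd's algorithm), assuming points are tuples of numbers; the function name `kmeans`, the random initialisation from the data and the toy dataset are assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of numeric tuples into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # start from k randomly chosen data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point takes the label of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:           # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
```

On this toy data the three points near the origin end up with one label and the three points near (10, 10) with the other. Which group gets label 0 depends on the random initialisation.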

Hierarchical clustering is usually described in terms of two ideas:

• Agglomerative clustering
• Dendrogram

Agglomerative clustering
Unlike K-means, this clustering method does not require the number of clusters K as an input. The agglomeration process starts by treating each data point as a single cluster.

Using some distance measure, the method reduces the number of clusters (by one in each iteration) through merging. In the end, we have one big cluster that contains all the objects.

Dendrogram
In a dendrogram, each level represents a possible clustering. The height at which two clusters are joined shows their level of similarity: the closer to the bottom they are joined, the more similar the clusters are. Choosing the final groups from a dendrogram is not automatic and is mostly subjective.
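The merging process can be sketched as follows. This is a minimal illustration that assumes single linkage (the distance between two clusters is the distance between their closest members); other linkage rules exist, and the names `single_linkage` and `agglomerative` are hypothetical. The recorded merge distances are the heights at which a dendrogram would join the clusters.

```python
import math

def single_linkage(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points, target_clusters=1):
    clusters = [[p] for p in points]       # every point starts as its own cluster
    merge_heights = []                     # dendrogram heights, in merge order
    while len(clusters) > target_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        # Pick the two nearest clusters and merge them (one merge per iteration).
        i, j = min(pairs, key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merge_heights.append(single_linkage(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merge_heights

clusters, heights = agglomerative([(0,), (1,), (5,), (6,), (20,)], target_clusters=2)
```

Stopping at two clusters leaves the outlier at 20 on its own, and the low merge heights at the start show that the earliest merges join the most similar points.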

K-Nearest Neighbors
K-nearest neighbors is one of the simplest machine learning classifiers. It differs from other machine learning techniques in that it does not build an explicit model. It simply stores all available cases and classifies new instances based on a similarity measure.

It works well when a meaningful distance can be defined between examples. Prediction becomes slow when the training set is large and the distance calculation is nontrivial, because each query is compared against the stored examples.
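A minimal sketch of this store-and-compare idea, assuming Euclidean distance as the similarity measure and a majority vote among the k nearest stored cases; the function name `knn_predict` and the toy training set are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label) pairs; returns the majority label of the k nearest."""
    # No model is built: every prediction scans the stored examples.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
prediction_near_a = knn_predict(train, (2, 2))
prediction_near_b = knn_predict(train, (7, 9))
```

Because the labels in `train` are needed to vote, this is a supervised method even though it is often listed next to clustering algorithms.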

Principal Component Analysis

Suppose your data lives in a high-dimensional space. You select a basis for that space and keep only the most important scores of that basis (for example, the 200 most important). The basis vectors are known as principal components. The subset you select spans a new space that is small compared to the original space, while maintaining as much of the complexity (variance) of the data as possible.
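As a rough sketch of what "selecting the most important basis direction" means, the code below finds only the first principal component using power iteration on the covariance matrix. This is one simple way to do it, not the only one; the function name, the iteration count and the toy data are assumptions for illustration, and real projects would normally use a linear algebra library instead.

```python
def first_principal_component(rows, iterations=200):
    """Return (direction, scores): the top principal direction and each row's score on it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d                          # assumed starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]          # converges to the direction of largest variance
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Points lying on the line y = 2x: one direction captures all of the variation.
direction, scores = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
```

Each two-dimensional point is now summarised by a single score, which is the dimensionality reduction that PCA provides.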

Association
Association rules allow you to establish associations amongst data objects inside large databases. This unsupervised technique is about discovering interesting relationships between variables in large databases. For example, people who buy a new home are likely to also buy new furniture.
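Two standard measures used to judge an association rule are support (how often the items occur together) and confidence (how often the rule holds when its left-hand side is present). The sketch below computes both on a made-up list of shopping baskets; the basket contents and function names are assumptions for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
    {"bread", "butter", "milk"},
]
bread_support = support(baskets, {"bread"})                 # 4 of 5 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # about 0.75
```

Here three of the four baskets containing bread also contain butter, so the rule "bread implies butter" holds in roughly 75% of the relevant baskets.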

Other Examples:

• A subgroup of cancer patients grouped by their gene expression measurements
• Groups of shoppers based on their browsing and purchasing histories
• Movies grouped by the ratings given by viewers
Supervised vs. Unsupervised Machine Learning

Parameter | Supervised machine learning technique | Unsupervised machine learning technique
Input data | Algorithms are trained using labeled data. | Algorithms are used against data which is not labeled.
Computational complexity | Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Accuracy | Highly accurate and trustworthy method. | Less accurate and trustworthy method.

Applications of Unsupervised Machine Learning


Some applications of unsupervised learning techniques are:

• Clustering automatically splits the dataset into groups based on their similarities.
• Anomaly detection can discover unusual data points in your dataset. It is useful for finding fraudulent transactions.
• Association mining identifies sets of items which often occur together in your dataset.
• Latent variable models are widely used for data preprocessing, such as reducing the number of features in a dataset or decomposing the dataset into multiple components.

Disadvantages of Unsupervised Learning

• You cannot get precise information regarding data sorting or the output, because the data used in unsupervised learning is unlabeled and the output is not known.
• The results are less accurate because the input data is not known and not labeled by people in advance. This means that the machine has to do this itself.
• In image classification, the spectral classes do not always correspond to informational classes.
• The user needs to spend time interpreting and labeling the classes produced by the classification.
• Spectral properties of classes can also change over time, so you cannot rely on the same class information when moving from one image to another.

Summary

• Unsupervised learning is a machine learning technique where you do not need to supervise the model.
• Unsupervised machine learning helps you find all kinds of unknown patterns in data.
• Clustering and association are two types of unsupervised learning.
• Four types of clustering methods are 1) Exclusive 2) Agglomerative 3) Overlapping 4) Probabilistic.
• Important clustering types are: 1) Hierarchical clustering 2) K-means clustering 3) K-NN 4) Principal Component Analysis 5) Singular Value Decomposition 6) Independent Component Analysis.
• Association rules allow you to establish associations amongst data objects inside large databases.
• In supervised learning, algorithms are trained using labeled data, while in unsupervised learning, algorithms are used against data which is not labeled.
• Anomaly detection can discover unusual data points in your dataset, which is useful for finding fraudulent transactions.
• The biggest drawback of unsupervised learning is that you cannot get precise information regarding data sorting.
Adaptive Resonance Theory (ART)

• Adaptive resonance theory is a type of neural network technique developed by Stephen Grossberg and Gail Carpenter in 1987.
• The basic ART uses an unsupervised learning technique. The terms “adaptive” and “resonance” suggest that these networks are open to new learning (adaptive) without discarding previous or old information (resonance).
• ART networks are known for addressing the stability-plasticity dilemma: stability refers to their ability to retain what has been learned, and plasticity refers to their flexibility in gaining new information. Because of this, ART networks are able to learn new input patterns without forgetting past ones.
• ART networks implement a clustering algorithm. An input is presented to the network and the algorithm checks whether it fits into one of the already stored clusters. If it fits, the input is added to the cluster that matches best; otherwise a new cluster is formed.

Types of Adaptive Resonance Theory (ART)
Carpenter and Grossberg developed different ART architectures as a result of 20 years of research. The ARTs can be classified as follows:
• ART1: It is the simplest and most basic ART architecture. It is capable of clustering binary input values.
• ART2: It is an extension of ART1 that is capable of clustering continuous-valued input data.
• Fuzzy ART: It combines fuzzy logic with ART.
• ARTMAP: It is a supervised form of ART learning where one ART module learns based on another ART module. It is also known as predictive ART.
• FARTMAP: This is a supervised ART architecture with fuzzy logic included.
Basics of Adaptive Resonance Theory (ART) Architecture
An adaptive resonance theory network is a type of neural network that is self-organizing and competitive. It can be of both types: unsupervised (ART1, ART2, ART3, etc.) or supervised (ARTMAP). Generally, the supervised algorithms are named with the suffix “MAP”. The basic ART model is unsupervised in nature and consists of:
• F1 layer or the comparison field (where the inputs are processed)
• F2 layer or the recognition field (which consists of the clustering units)
• The Reset Module (that acts as a control mechanism)

The F1 layer accepts the inputs, performs some processing, and transfers the result to the F2 unit that best matches it. There are two sets of weighted interconnections (bottom-up and top-down) between the units in the F1 and the F2 layers, which control the degree of similarity. The F2 layer is a competitive layer.

The cluster unit with the largest net input becomes the candidate to learn the input pattern first, and the remaining F2 units are ignored. The reset unit decides whether or not the cluster unit is allowed to learn the input pattern, depending on how similar its top-down weight vector is to the input vector. This is called the vigilance test.

Thus the vigilance parameter helps to incorporate new memories or new information. Higher vigilance produces more detailed memories; lower vigilance produces more general memories.
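The competition and vigilance test can be illustrated with a heavily simplified, ART1-style sketch for binary inputs. It is not a full ART1 implementation: each cluster keeps a single binary prototype in place of separate bottom-up and top-down weights, learning is "fast" (the prototype becomes the overlap with the input), and the choice score with its `beta` term is an assumption made for this example.

```python
def art1(patterns, vigilance=0.7, beta=0.5):
    """Cluster binary vectors; returns (cluster index per pattern, list of prototypes)."""
    prototypes, assignments = [], []
    for x in patterns:
        size_x = sum(x)
        overlaps = [sum(a & b for a, b in zip(x, w)) for w in prototypes]
        # Competition: try cluster units in order of decreasing choice score.
        order = sorted(range(len(prototypes)),
                       key=lambda j: overlaps[j] / (beta + sum(prototypes[j])),
                       reverse=True)
        for j in order:
            # Vigilance test: is enough of the input explained by this prototype?
            if size_x and overlaps[j] / size_x >= vigilance:
                prototypes[j] = [a & b for a, b in zip(x, prototypes[j])]
                assignments.append(j)
                break
            # Otherwise the unit is reset and the next candidate is tried.
        else:
            prototypes.append(list(x))     # no unit passed: create a new cluster
            assignments.append(len(prototypes) - 1)
    return assignments, prototypes

patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
coarse, coarse_protos = art1(patterns, vigilance=0.6)
fine, fine_protos = art1(patterns, vigilance=0.9)
```

With the lower vigilance the four patterns fall into two clusters, while the higher vigilance produces three, which matches the statement that higher vigilance gives more detailed memories.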
Special Networks: Introduction to Various Networks in Neural Networks

• Neural networks are artificial networks used in machine learning that work in a fashion similar to the human nervous system.

• Many units are connected in various ways so that a neural network can mimic the working of the human brain. Neural networks are used as computational models.

Types of Neural Networks and Definition of Neural Network

The nine types of neural networks are:

1. Perceptron
2. Feed Forward Neural Network
3. Multilayer Perceptron
4. Convolutional Neural Network
5. Radial Basis Functional Neural Network
6. Recurrent Neural Network
7. LSTM – Long Short-Term Memory
8. Sequence to Sequence Models
9. Modular Neural Network
An Introduction to Artificial Neural Networks

• Neural networks are the basis of deep learning, a branch of artificial intelligence. Certain application scenarios are too heavy or out of scope for traditional machine learning algorithms to handle, and neural networks fill the gap in such scenarios.

• Artificial neural networks are inspired by the biological neurons within the human body, which activate under certain circumstances, resulting in a related action performed by the body in response.

• Artificial neural nets consist of various layers of interconnected artificial neurons powered by activation functions that help in switching them ON/OFF. As with traditional machine learning algorithms, there are certain values that neural nets learn in the training phase.

• Briefly, each neuron multiplies its inputs by weights (initialized randomly), sums the results, and adds a bias value; the sum is then passed to an appropriate activation function, which decides the final value to be given out of the neuron. Various activation functions are available, depending on the nature of the input values.

• Once the output is generated from the final neural net layer, a loss function (predicted output vs. target output) is calculated, and backpropagation is performed, in which the weights are adjusted to minimize the loss. Finding optimal values of the weights is what the overall operation focuses on.

Weights are numeric values that are multiplied by inputs. In backpropagation, they are modified to reduce the loss. In simple words, weights are the values learned by the neural network. They self-adjust depending on the difference between the predicted outputs and the target outputs.

Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.
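The weighted sum, bias, activation and weight update described above can be sketched for a single neuron. This is a minimal illustration assuming a sigmoid activation and a squared-error loss; the function names, learning rate and numbers are made up for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus bias, passed through the activation function.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def train_step(inputs, target, weights, bias, lr=0.5):
    """One gradient-descent update for the loss 0.5 * (output - target) ** 2."""
    y = neuron_output(inputs, weights, bias)
    delta = (y - target) * y * (1.0 - y)   # derivative of the loss w.r.t. the weighted sum
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

inputs, target = [1.0, 2.0], 1.0
weights, bias = [0.1, -0.2], 0.0
before = neuron_output(inputs, weights, bias)
weights, bias = train_step(inputs, target, weights, bias)
after = neuron_output(inputs, weights, bias)
```

After one update the output moves closer to the target, which is the sense in which the weights "self-adjust" to reduce the loss.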

• Input layer represents the dimensions of the input vector.
• Hidden layer represents the intermediary nodes that divide the input space into regions with (soft) boundaries. It takes in a set of weighted inputs and produces output through an activation function.
• Output layer represents the output of the neural network.
Types of Neural Networks

There are many types of neural networks available or in the development stage. They can be classified depending on their structure, data flow, the neurons used and their density, the layers and their depth, activation filters, etc.

We are going to discuss the following neural networks:

A. Perceptron
The perceptron model, proposed by Frank Rosenblatt and later analyzed by Minsky and Papert, is one of the simplest and oldest models of a neuron. It is the smallest unit of a neural network that performs computations to detect features in the input data. It accepts weighted inputs and applies an activation function to obtain the output as the final result. The perceptron is also known as a TLU (threshold logic unit).

The perceptron is a supervised learning algorithm that classifies data into two categories; thus it is a binary classifier. A perceptron separates the input space into two categories by a hyperplane, conventionally written as w · x + b = 0, where w is the weight vector, x the input vector and b the bias.

Advantages of Perceptron

Perceptrons can implement logic gates such as AND, OR, and NAND.

Disadvantages of Perceptron

Perceptrons can only learn linearly separable problems, such as the boolean AND problem. For problems that are not linearly separable, such as the boolean XOR problem, a single perceptron does not work.
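The difference between AND and XOR can be checked with a short sketch of the classic perceptron learning rule. The function names, the zero initialisation and the epoch limit are choices made for this illustration.

```python
def predict(w, b, x):
    # Step activation: output 1 if the weighted sum is above the threshold 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """samples is a list of (inputs, target) pairs with targets 0 or 1."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(w, b, x)
            if error:
                # Move the separating hyperplane towards the misclassified sample.
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:                  # every sample classified correctly
            break
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w_and, b_and = train_perceptron(AND)
w_xor, b_xor = train_perceptron(XOR)
and_correct = sum(predict(w_and, b_and, x) == t for x, t in AND)
xor_correct = sum(predict(w_xor, b_xor, x) == t for x, t in XOR)
```

Training on AND ends with all four cases correct, while on XOR at least one case is always wrong, because no single straight line separates the two XOR classes.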
B. Feed Forward Neural Networks

Applications of Feed Forward Neural Networks:

• Simple classification (where traditional machine-learning-based classification algorithms have limitations)
• Face recognition [simple, straightforward image processing]
• Computer vision [where target classes are difficult to classify]
• Speech recognition

This is the simplest form of neural network, in which input data travels in one direction only, passing through artificial neural nodes and exiting through output nodes. Input and output layers are always present, while hidden layers may or may not be. Based on this, feed-forward networks are further classified as single-layered or multi-layered.

The number of layers depends on the complexity of the function. In the simple form described here, the network has uni-directional forward propagation and no backward propagation, and the weights are static. The inputs are multiplied by weights and fed to an activation function, typically a classifying (step) activation function. For example, the neuron is activated if the weighted sum is above a threshold (usually 0), and it produces 1 as output; if the sum is below the threshold, the neuron is not activated and the output is taken as -1. Such networks are fairly simple to maintain and are equipped to deal with data that contains a lot of noise.

Advantages of Feed Forward Neural Networks

1. Less complex, easy to design and maintain
2. Fast [one-way propagation]
3. Tolerant of noisy data

Disadvantages of Feed Forward Neural Networks:

1. Cannot be used for deep learning in this simple form [due to the absence of dense layers and backpropagation]
C. Multilayer Perceptron

Applications of Multi-Layer Perceptron

• Speech recognition
• Machine translation
• Complex classification

The multilayer perceptron is an entry point towards complex neural nets, where input data travels through various layers of artificial neurons. Every node is connected to all neurons in the next layer, which makes it a fully connected neural network. Input and output layers are present along with one or more hidden layers, i.e. at least three layers in total. It has bi-directional propagation, i.e. forward propagation and backward propagation.

Inputs are multiplied by weights and fed to the activation function, and in backpropagation the weights are modified to reduce the loss. In simple words, weights are the values learned by the neural network. They self-adjust depending on the difference between the predicted outputs and the target outputs. Nonlinear activation functions are used in the hidden layers, followed by softmax as the output layer activation function.
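The forward pass of such a network can be sketched as follows, assuming ReLU as the nonlinear hidden activation and softmax on the output layer. The weights are hand-picked for illustration and not learned, and the function names are hypothetical; backpropagation is not shown.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    top = max(values)                      # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    # Fully connected layer: every output neuron sees every input.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """layers is a list of (weights, biases); the last entry is the output layer."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = relu(dense(x, w, b))
    return softmax(dense(x, w_out, b_out))

layers = [
    ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),   # hidden layer, 3 neurons
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),            # output layer, 2 classes
]
probabilities = mlp_forward([1.0, 2.0], layers)
```

The softmax output can be read as class probabilities: the values are positive and sum to 1, and the largest one marks the predicted class.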
Advantages of Multi-Layer Perceptron

1. Used for deep learning [due to the presence of dense fully connected layers and backpropagation]

Disadvantages of Multi-Layer Perceptron:

1. Comparatively complex to design and maintain
2. Comparatively slow (depends on the number of hidden layers)

D. Convolutional Neural Network

Applications of Convolutional Neural Network

• Image processing
• Computer vision
• Speech recognition
• Machine translation

A convolutional neural network contains a three-dimensional arrangement of neurons instead of the standard two-dimensional array. The first layer is called a convolutional layer. Each neuron in the convolutional layer only processes the information from a small part of the visual field. Input features are taken in batches, like a filter.

The network understands the images in parts and can compute these operations multiple times to complete the full image processing. Processing may involve conversion of the image from RGB or HSI scale to grey-scale. Detecting changes in pixel value then helps to find the edges, and images can be classified into different categories.

Propagation is uni-directional through the one or more convolutional layers followed by pooling, and bi-directional in the fully connected neural network that receives the convolutional output and classifies the images.

Filters are used to extract certain parts of the image. In an MLP, the inputs are multiplied by weights and fed to the activation function. The convolutional layers typically use ReLU, while the MLP part uses a nonlinear activation function followed by softmax. Convolutional neural networks show very effective results in image and video recognition, semantic parsing and paraphrase detection.
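How a filter picks out edges from changes in pixel value can be shown with a tiny sketch. It slides a 2x2 filter over a grey-scale image without padding; as in most deep learning libraries, the operation is technically cross-correlation (the filter is not flipped). The function name, image and filter values are made up for the example.

```python
def conv2d(image, kernel):
    """Slide kernel over image (lists of rows) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

# Grey-scale image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1] for _ in range(4)]
# Filter that responds to a dark-to-bright change from left to right.
vertical_edge = [[-1, 1],
                 [-1, 1]]
feature_map = conv2d(image, vertical_edge)
```

The feature map is non-zero only in the middle column, exactly where the pixel values change, and each output value depends on just a small patch of the input.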
Advantages of Convolutional Neural Network:

1. Used for deep learning with few parameters
2. Fewer parameters to learn as compared to a fully connected layer

Disadvantages of Convolutional Neural Network:

1. Comparatively complex to design and maintain
2. Comparatively slow [depends on the number of hidden layers]
E. Radial Basis Function Neural Networks

A Radial Basis Function Network consists of an input vector followed by a layer of RBF neurons and an output layer with one node per category. Classification is performed by measuring the input’s similarity to data points from the training set: each neuron stores a prototype, which is one of the examples from the training set.

• When a new input vector [the n-dimensional vector that you are trying to classify] needs to be classified, each neuron calculates the Euclidean distance between the input and its prototype. For example, if we have two classes, class A and class B, and the new input is closer to the class A prototypes than to the class B prototypes, it is tagged or classified as class A.

• Each RBF neuron compares the input vector to its prototype and outputs a value between 0 and 1, which is a measure of similarity. If the input equals the prototype, the output of that RBF neuron is 1, and as the distance between the input and the prototype grows, the response falls off exponentially towards 0. The curve of the neuron’s response resembles a typical bell curve. The output layer consists of a set of neurons [one per category].
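A minimal sketch of the RBF neuron response and the per-category output, assuming a Gaussian response exp(-beta * distance^2) and an output layer that simply sums the responses of each class's neurons (real RBF networks usually learn output weights). The names, `beta` value and prototypes are assumptions for illustration.

```python
import math

def rbf_activation(x, prototype, beta=1.0):
    """1.0 when x equals the prototype, falling towards 0 as the distance grows."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-beta * squared_distance)

def rbf_classify(x, prototypes, beta=1.0):
    """prototypes is a list of (vector, label); returns (best label, score per label)."""
    scores = {}
    for vector, label in prototypes:
        scores[label] = scores.get(label, 0.0) + rbf_activation(x, vector, beta)
    return max(scores, key=scores.get), scores

prototypes = [((0.0, 0.0), "A"), ((1.0, 0.0), "A"),
              ((5.0, 5.0), "B"), ((6.0, 5.0), "B")]
label, scores = rbf_classify((0.5, 0.5), prototypes)
```

The input is much closer to the class A prototypes, so the class A score dominates and the input is tagged as class A.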
Application: Power Restoration

a. Powercut P1 needs to be restored first
b. Powercut P3 needs to be restored next, as it impacts more houses
c. Powercut P2 should be fixed last as it impacts only one house
F. Recurrent Neural Networks

Applications of Recurrent Neural Networks

• Text processing such as auto-suggest, grammar checks, etc.
• Text-to-speech processing
• Image tagging
• Sentiment analysis
• Translation

A recurrent neural network is designed to save the output of a layer and feed it back to the input to help in predicting the outcome of the layer. The first layer is typically a feed-forward layer, followed by a recurrent layer in which some information from the previous time-step is remembered by a memory function.

Forward propagation is implemented in this case, and the network stores the information required for its future use. If the prediction is wrong, the learning rate is used to make small changes to the weights during backpropagation, so that the network gradually moves towards making the right prediction.
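The feedback of the previous time-step can be sketched with a one-unit recurrent layer. It shows only the forward pass, assuming a tanh activation and scalar weights chosen by hand; the function name and numbers are made up for the example.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Return the hidden state after each time-step of a one-unit recurrent layer."""
    h = 0.0                                # memory starts empty
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

with_history = rnn_forward([1.0, 0.0, 0.0])
without_history = rnn_forward([0.0, 0.0, 0.0])
```

Even though the second and third inputs are zero in both sequences, the states differ, because the first sequence still remembers the earlier input. The memory fades at each step, which hints at why plain RNNs struggle with long sequences.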
Advantages of Recurrent Neural Networks

1. They model sequential data, where each sample can be assumed to be dependent on historical ones.
2. They can be used with convolution layers to extend the effective pixel neighbourhood.

Disadvantages of Recurrent Neural Networks

1. Gradient vanishing and exploding problems
2. Training recurrent neural nets can be a difficult task
3. Difficult to process long sequential data using ReLU as an activation function.
Improvement over RNN: LSTM (Long Short-Term Memory) Networks

LSTM networks are a type of RNN that uses special units in addition to standard units. LSTM units include a ‘memory cell’ that can maintain information in memory for long periods of time. A set of gates is used to control when information enters the memory, when it is output, and when it is forgotten.

There are three types of gates: the input gate, the output gate and the forget gate. The input gate decides how much information from the last sample will be kept in memory; the output gate regulates the amount of data passed to the next layer; and the forget gate controls the rate at which stored memory is discarded. This architecture lets LSTMs learn longer-term dependencies.

[Figure: one implementation of an LSTM cell; many other architectures exist. Source: ResearchGate]
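The role of the three gates can be sketched with a single scalar LSTM cell. It follows the commonly used formulation (forget, input and output gates plus a tanh candidate value); the parameter names in the dictionary are hypothetical and the values are set by hand to make the gate behaviour visible instead of being learned.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time-step of a scalar LSTM cell; p holds the weights and biases."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g                 # keep part of the old memory, add new content
    h = o * math.tanh(c)                   # expose part of the memory as output
    return h, c

zeros = dict.fromkeys(
    ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"], 0.0)
keep = {**zeros, "bf": 10.0, "bi": -10.0}     # forget gate open, input gate closed
erase = {**zeros, "bf": -10.0, "bi": -10.0}   # forget gate closed, input gate closed

_, c_kept = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, p=keep)
_, c_erased = lstm_step(x=1.0, h_prev=0.0, c_prev=1.0, p=erase)
```

With the forget gate near 1 the stored value survives the step almost unchanged, and with the forget gate near 0 it is wiped. This is how the cell can hold information over long periods.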


G. Sequence to sequence models

A sequence-to-sequence model consists of two recurrent neural networks: an encoder that processes the input and a decoder that produces the output.

The encoder and decoder work together, using either the same parameters or different ones. In contrast to a plain RNN, this model is particularly applicable in cases where the length of the input data is not equal to the length of the output data. While they possess benefits and limitations similar to those of the RNN, these models are mainly applied in chatbots, machine translation, and question answering systems.

H. Modular Neural Network

Applications of Modular Neural Network

1. Stock market prediction systems
2. Adaptive MNN for character recognition
3. Compression of high-level input data

A modular neural network has a number of different networks that function independently and perform sub-tasks. The different networks do not really interact with or signal each other during the computation process. They work independently towards achieving the output.

As a result, a large and complex computational process can be done significantly faster by breaking it down into independent components. The computation speed increases because the networks are not interacting with or even connected to each other.

Advantages of Modular Neural Network

1. Efficient
2. Independent training
3. Robustness
Disadvantages of Modular Neural Network

1. Moving target Problems


1. What are the 9 major types of neural networks?

The nine types of neural networks are:

 Perceptron
 Feed Forward Neural Network
 Multilayer Perceptron
 Convolutional Neural Network
 Radial Basis Functional Neural Network
 Recurrent Neural Network
 LSTM – Long Short-Term Memory
 Sequence to Sequence Models
 Modular Neural Network
2. What is neural network and its types?

Neural Networks are artificial networks used in Machine Learning that work in a similar
fashion to the human nervous system. Many things are connected in various ways for a
neural network to mimic and work like the human brain. Neural networks are basically
used in computational models.
3. What is CNN and DNN?

A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers
between the input and output layers. They can model complex non-linear relationships.
Convolutional Neural Networks (CNN) are an alternative type of DNN that allow
modelling both time and space correlations in multivariate signals.

4. How does CNN differ from ANN?

CNN is a specific kind of ANN that has one or more layers of convolutional units. The class of ANN covers several architectures including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), e.g. LSTM and GRU, Autoencoders, and Deep Belief Networks.

5. Why is CNN better than MLP?

A Multilayer Perceptron (MLP) works well for MNIST, as it is a simpler and more straightforward dataset, but it lags behind CNN in real-world computer vision applications, specifically image classification.

In the context of neural networks, when we talk about "special networks," we are usually
referring to specialized architectures or types of neural networks that are designed for
specific tasks or have unique characteristics. Here's an introduction to various special
networks commonly used in the field of deep learning:

Extra Information:

Convolutional Neural Networks (CNNs):

a) Purpose: CNNs are primarily used for image and video analysis tasks. They excel
at feature extraction from grid-like data, making them suitable for tasks like image
classification, object detection, and image segmentation.
b) Structure: CNNs consist of convolutional layers that apply filters to input data to
extract hierarchical features and pooling layers to down sample the feature maps.
They are known for their ability to capture spatial hierarchies in data.

Recurrent Neural Networks (RNNs):

a) Purpose: RNNs are designed for sequential data processing, making them suitable
for tasks like natural language processing (NLP), speech recognition, and time-
series analysis.
b) Structure: RNNs have recurrent connections that allow information to persist
across time steps. They can model sequences and dependencies in data, but they
may suffer from vanishing gradient problems.
Long Short-Term Memory Networks (LSTMs):

a) Purpose: LSTMs are a specialized type of RNN that addresses the vanishing
gradient problem. They are widely used in NLP, speech recognition, and time-series
forecasting tasks.
b) Structure: LSTMs have memory cells that can capture long-range dependencies in
data. They are effective at modeling sequential data and are less prone to gradient-
related issues compared to vanilla RNNs.

Gated Recurrent Unit Networks (GRUs):

a) Purpose: GRUs are another type of RNN designed to address the vanishing
gradient problem, similar to LSTMs. They are used in tasks similar to those of
LSTMs.
b) Structure: GRUs have a simplified structure compared to LSTMs, with fewer
gates. They strike a balance between performance and computational complexity.

Autoencoders:

a) Purpose: Autoencoders are used for unsupervised learning and dimensionality
reduction. They can be employed for tasks like image denoising, feature learning,
and anomaly detection.
b) Structure: Autoencoders consist of an encoder network that maps input data to a
lower-dimensional representation (latent space) and a decoder network that
reconstructs the original data from the latent space.

Generative Adversarial Networks (GANs):

a) Purpose: GANs are used for generative tasks, such as image generation, style
transfer, and data augmentation. They consist of two networks: a generator and a
discriminator, engaged in a competitive training process.
b) Structure: The generator aims to create data that is indistinguishable from real data,
while the discriminator tries to differentiate between real and generated data. This
competition leads to the generation of realistic data samples.

Transformers:

a) Purpose: Transformers are a versatile architecture used in various tasks, especially
in NLP. They excel at capturing long-range dependencies and have been applied to
machine translation, text generation, and question-answering tasks.
b) Structure: Transformers use self-attention mechanisms to process input data in
parallel, making them highly scalable and efficient for sequential tasks.
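The self-attention idea can be sketched in plain Python. This is a bare-bones version of scaled dot-product attention that, as a simplifying assumption, uses the token vectors themselves as queries, keys and values; a real Transformer applies learned projections first and runs several attention heads. The function names and the two toy token vectors are made up for the example.

```python
import math

def softmax(values):
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """tokens is a list of equal-length vectors; returns one output vector per token."""
    d = len(tokens[0])
    outputs = []
    for query in tokens:                   # each token can be processed independently
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        weights = softmax(scores)
        # Output is a weighted mix of all token vectors.
        outputs.append([sum(w * value[j] for w, value in zip(weights, tokens))
                        for j in range(d)])
    return outputs

attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Each output mixes information from every position in one step, with no recurrence, which is why distant tokens can influence each other directly and why the positions can be computed in parallel.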

These are some of the special neural network architectures commonly used in deep
learning, each tailored to specific types of data and tasks. Researchers continue to develop
new network architectures and variations to improve performance on various applications.
