2nd Unit NN Final Class Notes
Here, we take unlabeled input data, meaning it is not categorized and the corresponding outputs are not given either.
This unlabeled input data is fed to the machine learning model in order to train it.
First, the model interprets the raw data to find the hidden patterns in the data and then applies a suitable algorithm,
such as k-means clustering, decision trees, etc.
Once a suitable algorithm is applied, it divides the data objects into groups according to the similarities and
differences between the objects.
• Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one
group and have few or no similarities with the objects of another group. Cluster analysis finds the commonalities between the data
objects and categorizes them according to the presence or absence of those commonalities.
• Association: An association rule is an unsupervised learning method used for finding relationships between
variables in large databases. It determines the sets of items that occur together in the dataset. Association rules make marketing
strategy more effective: for example, people who buy item X (say, bread) also tend to
purchase item Y (butter or jam). A typical example of association rules is Market Basket Analysis.
Unsupervised Learning algorithms:
Below is the list of some popular unsupervised learning algorithms:
• K-means clustering
• KNN (k-nearest neighbors)
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
Unsupervised learning problems are further grouped into clustering and association problems.
Clustering
Unsupervised learning clustering algorithms process your data and find natural
clusters (groups) if they exist in the data. You can also specify how many clusters your
algorithm should identify.
Exclusive (partitioning)
In this clustering method, data are grouped in such a way that each data point can belong to only one
cluster.
Example: K-means
Agglomerative
In this clustering technique, every data point starts as a cluster of its own. Iterative unions between the two nearest
clusters reduce the number of clusters.
Overlapping
In this clustering technique, a data point may belong to two or more clusters with separate degrees of membership;
here, each data point is associated with an appropriate membership value. Example: Fuzzy C-Means
Probabilistic
This technique uses probability distributions to create the clusters. For example, the keywords
“man’s shoe”
“woman’s shoe”
“woman’s glove”
“man’s glove”
can be clustered into two categories, “shoe” and “glove”, or “man” and “woman”.
Clustering Types
Following are the clustering types of Machine Learning:
Hierarchical clustering
K-means clustering
K-NN (k nearest neighbors)
Principal Component Analysis
Singular Value Decomposition
Independent Component Analysis
Hierarchical Clustering
Hierarchical clustering is an algorithm which builds a hierarchy of clusters. It begins with every
data point assigned to a cluster of its own. At each step, the two closest clusters are merged into
one cluster. This algorithm ends when there is only one cluster left.
K-means Clustering
K-means is an iterative clustering algorithm which refines the cluster assignments and centroids with
every iteration. Initially, the desired number of clusters is selected. In this clustering method,
you need to cluster the data points into k groups. A larger k means smaller groups with more
granularity; a lower k means larger groups with less granularity.
The output of the algorithm is a set of “labels”: it assigns each data point to one of the k groups.
In k-means clustering, each group is defined by creating a centroid for each group. The
centroids are like the heart of the cluster: each captures the points closest to it and adds
them to its cluster.
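As a rough illustration of this process, here is a minimal sketch using scikit-learn's KMeans (the toy 2-D points are invented for the example):

    import numpy as np
    from sklearn.cluster import KMeans

    # Six made-up 2-D data points forming two loose groups.
    X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
                  [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # the "label" (group) assigned to each point
    print(kmeans.cluster_centers_)  # the centroid at the heart of each cluster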
Agglomerative clustering
Unlike K-means, this clustering method does not require the number of clusters K as an input.
The agglomeration process starts by treating each data point as a single cluster.
Using some distance measure, the method reduces the number of clusters (by one in each iteration)
through a merging process. Lastly, we have one big cluster that contains all the objects.
Dendrogram
In the dendrogram clustering method, each level represents a possible cluster. The height
of the dendrogram shows the distance at which two clusters are joined: the closer to the bottom
of the dendrogram two clusters join, the more similar they are. Choosing the final grouping from
a dendrogram is not always natural and is mostly subjective.
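A short sketch of agglomerative clustering and its dendrogram using SciPy (the data points are made up, and Ward linkage is just one possible distance measure):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, dendrogram
    import matplotlib.pyplot as plt

    X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6], [5.0, 8.0], [8.0, 8.0]])

    Z = linkage(X, method='ward')  # repeatedly merges the two nearest clusters
    dendrogram(Z)                  # join height reflects the distance between clusters
    plt.show()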
K- Nearest neighbors
K-nearest neighbour is the simplest of all machine learning classifiers. It differs from other
machine learning techniques in that it doesn't produce a model. It is a simple algorithm which
stores all available cases and classifies new instances based on a similarity measure.
It works very well when the examples can be separated by a distance measure. The learning speed is slow when
the training set is large, and the distance calculation is nontrivial.
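A minimal sketch with scikit-learn's KNeighborsClassifier (toy data invented for illustration); note that the "training" step only stores the cases:

    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[1, 1], [1, 2], [8, 8], [9, 8]]  # the stored cases
    y_train = [0, 0, 1, 1]                      # their class labels

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X_train, y_train)             # no model is built; the cases are stored
    print(knn.predict([[2, 1], [8, 9]]))  # classified by similarity to stored cases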
Association
Association rules allow you to establish associations amongst data objects inside large
databases. This unsupervised technique is about discovering interesting relationships between
variables in large databases. For example, people who buy a new home are most likely to buy new
furniture.
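As a simplified sketch of the idea (plain co-occurrence counting rather than a full Apriori implementation; the baskets are hypothetical):

    from itertools import combinations
    from collections import Counter

    # Hypothetical market-basket transactions.
    baskets = [
        {"bread", "butter", "jam"},
        {"bread", "butter"},
        {"bread", "jam"},
        {"milk", "butter"},
    ]

    # Count how often each pair of items occurs together.
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # Report pairs appearing in at least half of the baskets (support >= 0.5).
    for pair, count in pair_counts.items():
        if count / len(baskets) >= 0.5:
            print(pair, count / len(baskets))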
Other Examples:
• Clustering automatically splits the dataset into groups based on similarities.
• Anomaly detection can discover unusual data points in your dataset. It is useful for finding fraudulent transactions.
• Association mining identifies sets of items which often occur together in your dataset.
• Latent variable models are widely used for data preprocessing, such as reducing the number of features in a dataset or decomposing the dataset into multiple components.
Disadvantages of Unsupervised Learning:
• You cannot get precise information regarding data sorting, and the exact output is not known because the data used in unsupervised learning is not labeled.
• The results are less accurate because the input data is not labeled by people in advance, which means the machine has to do this itself.
• The spectral classes do not always correspond to informational classes.
• The user needs to spend time interpreting and labeling the classes which follow the classification.
• Spectral properties of classes can also change over time, so you can't have the same class information while moving from one image to another.
Summary
The F1 layer accepts the inputs, performs some processing, and transfers them to the
F2 layer that best matches the classification factor. There exist two sets of
weighted interconnections for controlling the degree of similarity between the units
in the F1 and the F2 layer. The F2 layer is a competitive layer.
The cluster unit with the largest net input becomes the candidate to learn the input
pattern first, and the rest of the F2 units are ignored. The reset unit decides
whether or not the cluster unit is allowed to learn the input pattern, depending on how
similar its top-down weight vector is to the input vector. This is
called the vigilance test.
Thus we can say that the vigilance parameter helps to incorporate new memories or
new information. Higher vigilance produces more detailed memories; lower vigilance
produces more general memories.
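A simplified sketch of the vigilance test for binary input patterns (an assumption in the style of ART1; the vectors and thresholds here are made up):

    import numpy as np

    def vigilance_test(x, w, rho):
        # Match ratio |x AND w| / |x| must reach the vigilance parameter rho
        # for the winning cluster unit to be allowed to learn the pattern.
        return np.sum(np.logical_and(x, w)) / np.sum(x) >= rho

    x = np.array([1, 1, 0, 1])  # binary input pattern
    w = np.array([1, 0, 0, 1])  # top-down weights of the winning F2 unit

    print(vigilance_test(x, w, rho=0.5))  # True: match 2/3 passes low vigilance
    print(vigilance_test(x, w, rho=0.8))  # False: the reset unit inhibits learning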
Special Networks: An Introduction to Various Networks in Neural Networks
Neural Networks are artificial networks used in Machine Learning that work in a similar
fashion to the human nervous system.
Many things are connected in various ways for a neural network to mimic and work
like the human brain. Neural networks are basically used in computational models.
The following sections cover different types of commonly used neural networks, how they
work, and their industry applications, beginning with a brief introduction to how neural
networks operate.
1. Perceptron
2. Feed Forward Neural Network
3. Multilayer Perceptron
4. Convolutional Neural Network
5. Radial Basis Functional Neural Network
6. Recurrent Neural Network
7. LSTM – Long Short-Term Memory
8. Sequence to Sequence Models
9. Modular Neural Network
An Introduction to Artificial Neural Network
Neural networks are the foundation of deep learning, a branch of artificial intelligence.
Certain application scenarios are too heavy or out of scope for traditional machine
learning algorithms to handle, and neural networks fill that gap.
Artificial neural networks are inspired by the biological neurons within the human body
which activate under certain circumstances resulting in a related action performed by the
body in response.
Artificial neural nets consist of various layers of interconnected artificial neurons powered
by activation functions that help switch them ON/OFF. Like traditional machine learning
algorithms, there are certain values that neural nets learn in the training phase.
Briefly, each neuron receives a multiplied version of inputs and weights (initially random), which is
then added to a static bias value (unique to each neuron layer); this is then passed to an
appropriate activation function which decides the final value to be given out of the neuron.
There are various activation functions available, depending on the nature of the input values.
Once the output is generated from the final neural net layer, the loss function (predicted vs actual output) is
calculated, and backpropagation is performed, adjusting the weights to minimise the
loss. Finding optimal values of the weights is what the overall operation
focuses on. Refer to the following for a better understanding:
Weights are numeric values that are multiplied by inputs. In backpropagation, they are
modified to reduce the loss. In simple words, weights are machine-learned values from
neural networks. They self-adjust depending on the difference between predicted outputs
and training outputs.
Activation Function is a mathematical formula that helps the neuron to switch ON/OFF.
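A minimal sketch of a single neuron's computation (the sigmoid is chosen here just as one example of an activation function; the inputs and weights are made up):

    import numpy as np

    def sigmoid(z):
        # Activation function: smoothly switches the neuron between 0 (OFF) and 1 (ON).
        return 1.0 / (1.0 + np.exp(-z))

    inputs = np.array([0.5, -1.2, 3.0])
    weights = np.random.randn(3)  # weights start as random values and are learned later
    bias = 0.1                    # static bias value added to the weighted sum

    z = np.dot(inputs, weights) + bias  # multiplied inputs plus bias
    print(sigmoid(z))                   # the final value given out of the neuron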
There are many types of neural networks available, and more are in development. They can
be classified depending on their structure, data flow, neurons used and their density,
layers and their depth, activation filters, etc.
Types of Neural Networks
A. Perceptron
The perceptron model, originally proposed by Frank Rosenblatt and later analyzed by Minsky
and Papert, is one of the simplest and oldest models of a neuron. It is the smallest unit of a
neural network that does certain computations to detect features or business intelligence in
the input data. It accepts weighted inputs and applies the activation function to obtain the
output as the final result. The perceptron is also known as a TLU (threshold logic unit).
The perceptron is a supervised learning algorithm that classifies the data into two categories;
thus it is a binary classifier. A perceptron separates the input space into two categories by
a hyperplane represented by the following equation, where w is the weight vector, x the input
vector, and b the bias:
w · x + b = 0
Advantages of Perceptron
The number of layers depends on the complexity of the function. It has uni-directional forward
propagation but no backward propagation. Weights are static here. The activation function
is fed inputs which are multiplied by weights. To do so, a classifying (step) activation
function is used. For example: the neuron is activated if the weighted sum is above the
threshold (usually 0), and the neuron produces 1 as an output; the neuron is not activated if
it is below the threshold (usually 0), which is considered as -1. Perceptrons are fairly simple to
maintain and are equipped to deal with data which contains a lot of noise.
Disadvantages of Perceptron:
1. Cannot be used for deep learning [due to absence of dense layers and back
propagation]
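A from-scratch sketch of the classic perceptron rule with a step activation (the toy data and learning rate are invented; labels are +1/-1 as described above):

    import numpy as np

    def step(z):
        # Step activation: 1 if at or above the threshold (0), otherwise -1.
        return 1 if z >= 0 else -1

    def train_perceptron(X, y, epochs=10, lr=0.1):
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            for xi, target in zip(X, y):
                error = target - step(np.dot(w, xi) + b)
                w += lr * error * xi  # weights update only on mistakes
                b += lr * error
        return w, b

    # Linearly separable toy points: the perceptron finds a separating hyperplane.
    X = np.array([[2.0, 1.0], [3.0, 4.0], [-1.0, -2.0], [-3.0, -1.0]])
    y = np.array([1, 1, -1, -1])
    w, b = train_perceptron(X, y)
    print(step(np.dot(w, [2.5, 2.0]) + b))  # classifies a new point as +1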
C. Multilayer Perceptron
Applications:
Speech Recognition
Machine Translation
Complex Classification
This is an entry point towards complex neural nets, where input data travels through various layers
of artificial neurons. Every single node is connected to all neurons in the next layer, which
makes it a fully connected neural network. Input and output layers are present, along with
multiple hidden layers, i.e. at least three or more layers in total. It has bi-directional
propagation, i.e. forward propagation and backward propagation.
Inputs are multiplied with weights and fed to the activation function, and in
backpropagation the weights are modified to reduce the loss. In simple words, weights are
machine-learned values from neural networks. They self-adjust depending on the difference
between predicted outputs and training outputs. Nonlinear activation functions are used,
followed by softmax as the output layer activation function.
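A brief sketch using scikit-learn's MLPClassifier on XOR, a problem a single perceptron cannot solve (the hidden-layer size and solver are arbitrary choices for this toy case):

    from sklearn.neural_network import MLPClassifier

    # XOR: not linearly separable, so it needs at least one hidden layer.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 1, 1, 0]

    mlp = MLPClassifier(hidden_layer_sizes=(8,), solver='lbfgs', random_state=0)
    mlp.fit(X, y)          # forward propagation + backpropagation adjust the weights
    print(mlp.predict(X))  # ideally [0, 1, 1, 0]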
Advantages of Multi-Layer Perceptron:
1. Used for deep learning [due to the presence of dense fully connected layers
and back propagation]
Disadvantages of Multi-Layer Perceptron:
D. Convolutional Neural Network
Applications:
Image processing
Computer Vision
Speech Recognition
Machine translation
A convolutional neural network contains a three-dimensional arrangement of neurons instead
of the standard two-dimensional array. The first layer is called a convolutional layer. Each
neuron in the convolutional layer only processes the information from a small part of the
visual field. Input features are taken in batches, like a filter.
The network understands the images in parts and can compute these operations multiple
times to complete the full image processing. Processing involves conversion of the image
from RGB or HSI scale to grey-scale. Further changes in the pixel values then help to
detect edges, and images can be classified into different categories.
Filters are used to extract certain parts of the image. In an MLP the inputs are multiplied with
weights and fed to the activation function. Convolutional layers use ReLU, while an MLP uses a
nonlinear activation function followed by softmax. Convolutional neural networks show very
effective results in image and video recognition, semantic parsing and paraphrase detection.
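A minimal convolutional network sketch in PyTorch (the 1-channel 28x28 input size, filter count and 10 output categories are assumptions for illustration):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer of local filters
        nn.ReLU(),                                  # convolution layers use ReLU
        nn.MaxPool2d(2),                            # pooling shrinks the feature maps
        nn.Flatten(),
        nn.Linear(8 * 14 * 14, 10),                 # classify into 10 categories
    )

    x = torch.randn(4, 1, 28, 28)  # a batch of 4 fake grey-scale images
    print(model(x).shape)          # torch.Size([4, 10]): one score per category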
Advantages of Convolution Neural Network:
E. Radial Basis Function Neural Network
A Radial Basis Function Network consists of an input vector followed by a layer of RBF
neurons and an output layer with one node per category. Classification is performed by
measuring the input's similarity to data points from the training set, where each neuron
stores a prototype, which is one of the examples from the training set.
When a new input vector [the n-dimensional vector that you are trying to classify] needs
to be classified, each neuron calculates the Euclidean distance between the input and
its prototype. For example, if we have two classes, class A and class B, and the new
input to be classified is closer to the class A prototypes than to the class B prototypes,
then it is tagged or classified as class A.
Each RBF neuron compares the input vector to its prototype and outputs a value from 0 to 1
which is a measure of similarity. When the input equals the prototype, the
output of that RBF neuron is 1, and as the distance between the input and the
prototype grows, the response falls off exponentially towards 0. The curve of a
neuron's response tends towards a typical bell curve. The output layer consists of a set
of neurons [one per category].
Application: Power Restoration
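A small sketch of a single RBF neuron's response (the prototypes, input and beta width parameter are made up):

    import numpy as np

    def rbf_neuron(x, prototype, beta=1.0):
        # Response is 1 when the input equals the prototype and falls off
        # exponentially towards 0 as the Euclidean distance grows (bell curve).
        return np.exp(-beta * np.linalg.norm(x - prototype) ** 2)

    proto_A = np.array([0.0, 0.0])  # a stored training example for class A
    proto_B = np.array([5.0, 5.0])  # a stored training example for class B

    x_new = np.array([0.5, 0.2])
    print(rbf_neuron(x_new, proto_A))  # near 1: the input resembles class A
    print(rbf_neuron(x_new, proto_B))  # near 0: far from class B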
F. Recurrent Neural Network
A Recurrent Neural Network is designed to save the output of a layer and feed it back
to the input, which helps in predicting the outcome of the layer. The first layer is
typically a feed forward neural network, followed by a recurrent neural
network layer where some information it had in the previous time-step is
remembered by a memory function.
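A bare-bones sketch of one recurrent step in NumPy (the sizes and random inputs are arbitrary), showing how the hidden state is fed back at every time-step:

    import numpy as np

    def rnn_step(x, h_prev, Wx, Wh, b):
        # The new hidden state mixes the current input with the remembered state.
        return np.tanh(Wx @ x + Wh @ h_prev + b)

    rng = np.random.default_rng(0)
    Wx = rng.standard_normal((4, 3))  # input-to-hidden weights
    Wh = rng.standard_normal((4, 4))  # hidden-to-hidden (recurrent) weights
    b = np.zeros(4)

    h = np.zeros(4)                        # initial memory
    for x in rng.standard_normal((5, 3)):  # a sequence of 5 inputs
        h = rnn_step(x, h, Wx, Wh, b)      # the state is fed back at each step
    print(h)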
G. LSTM – Long Short-Term Memory
LSTM networks are a type of RNN that uses special units in addition to standard units.
LSTM units include a ‘memory cell’ that can maintain information in memory for long
periods of time. A set of gates is used to control when information enters the memory, when
it's output, and when it's forgotten.
There are three types of gates, viz. the input gate, the output gate and the forget gate. The input gate decides
how much information from the last sample will be kept in memory; the output gate
regulates the amount of data passed to the next layer; and the forget gate controls the rate
at which the stored memory is forgotten. This architecture lets them learn longer-term dependencies.
This is one implementation of the LSTM cell; many other architectures exist.
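A minimal PyTorch sketch (the input size, memory size and random batch are assumptions); the cell state c is the ‘memory cell’ that the gates control:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=3, hidden_size=8, batch_first=True)

    x = torch.randn(2, 5, 3)  # batch of 2 sequences, 5 time-steps, 3 features each
    out, (h, c) = lstm(x)     # h: final hidden state, c: final memory cell state
    print(out.shape)          # torch.Size([2, 5, 8]): an output at every time-step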
H. Sequence to Sequence Models
A sequence to sequence model consists of two recurrent networks, an encoder and a decoder,
which work simultaneously, either using the same parameters or different ones. In contrast to
the basic RNN, this model is particularly applicable in those cases where the length of the
input data is not equal to the length of the output data. While they possess benefits and
limitations similar to the RNN, these models are applied mainly in chatbots, machine
translation, and question answering systems.
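A rough encoder-decoder sketch in PyTorch (the sizes and sequences are invented), showing that the input and output lengths need not match:

    import torch
    import torch.nn as nn

    encoder = nn.LSTM(input_size=4, hidden_size=16, batch_first=True)
    decoder = nn.LSTM(input_size=4, hidden_size=16, batch_first=True)

    src = torch.randn(1, 7, 4)  # input sequence of length 7
    tgt = torch.randn(1, 3, 4)  # output sequence of length 3

    _, state = encoder(src)       # final (hidden, cell) state summarises the input
    out, _ = decoder(tgt, state)  # the decoder starts from the encoder's state
    print(out.shape)              # torch.Size([1, 3, 16])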
I. Modular Neural Network
Advantages of Modular Neural Network:
1. Efficient
2. Independent training
3. Robustness
Disadvantages of Modular Neural Network
1. What are the types of neural networks?
Perceptron
Feed Forward Neural Network
Multilayer Perceptron
Convolutional Neural Network
Radial Basis Functional Neural Network
Recurrent Neural Network
LSTM – Long Short-Term Memory
Sequence to Sequence Models
Modular Neural Network
2. What is a neural network and what are its types?
Neural Networks are artificial networks used in Machine Learning that work in a similar
fashion to the human nervous system. Many things are connected in various ways for a
neural network to mimic and work like the human brain. Neural networks are basically
used in computational models.
3. What is CNN and DNN?
A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers
between the input and output layers. They can model complex non-linear relationships.
Convolutional Neural Networks (CNN) are an alternative type of DNN that allow
modelling both time and space correlations in multivariate signals.
CNN is a specific kind of ANN that has one or more layers of convolutional units. The
class of ANNs covers several architectures, including Convolutional Neural Networks
(CNN), Recurrent Neural Networks (RNN) such as LSTM and GRU, Autoencoders, and Deep
Belief Networks.
A Multilayer Perceptron (MLP) works well for MNIST, as it is a simpler and more
straightforward dataset, but it lags behind CNNs when it comes to real-world applications in
computer vision, specifically image classification.
In the context of neural networks, when we talk about "special networks," we are usually
referring to specialized architectures or types of neural networks that are designed for
specific tasks or have unique characteristics. Here's an introduction to various special
networks commonly used in the field of deep learning:
Extra Information:
Convolutional Neural Networks (CNNs):
a) Purpose: CNNs are primarily used for image and video analysis tasks. They excel
at feature extraction from grid-like data, making them suitable for tasks like image
classification, object detection, and image segmentation.
b) Structure: CNNs consist of convolutional layers that apply filters to input data to
extract hierarchical features, and pooling layers to downsample the feature maps.
They are known for their ability to capture spatial hierarchies in data.
Recurrent Neural Networks (RNNs):
a) Purpose: RNNs are designed for sequential data processing, making them suitable
for tasks like natural language processing (NLP), speech recognition, and time-
series analysis.
b) Structure: RNNs have recurrent connections that allow information to persist
across time steps. They can model sequences and dependencies in data, but they
may suffer from vanishing gradient problems.
Long Short-Term Memory Networks (LSTMs):
a) Purpose: LSTMs are a specialized type of RNN that addresses the vanishing
gradient problem. They are widely used in NLP, speech recognition, and time-series
forecasting tasks.
b) Structure: LSTMs have memory cells that can capture long-range dependencies in
data. They are effective at modeling sequential data and are less prone to gradient-
related issues compared to vanilla RNNs.
Gated Recurrent Units (GRUs):
a) Purpose: GRUs are another type of RNN designed to address the vanishing
gradient problem, similar to LSTMs. They are used in tasks similar to those of
LSTMs.
b) Structure: GRUs have a simplified structure compared to LSTMs, with fewer
gates. They strike a balance between performance and computational complexity.
Autoencoders:
Generative Adversarial Networks (GANs):
a) Purpose: GANs are used for generative tasks, such as image generation, style
transfer, and data augmentation. They consist of two networks: a generator and a
discriminator, engaged in a competitive training process.
b) Structure: The generator aims to create data that is indistinguishable from real data,
while the discriminator tries to differentiate between real and generated data. This
competition leads to the generation of realistic data samples.
Transformers:
These are some of the special neural network architectures commonly used in deep
learning, each tailored to specific types of data and tasks. Researchers continue to develop
new network architectures and variations to improve performance on various applications.