Soft Computing Unit 2 Notes
Supervised Learning-
In supervised learning, the training data provided to the machine acts as a supervisor that teaches the machine to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.
Supervised learning is the process of providing input data as well as the correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.
The working of supervised learning can be understood with a simple example: the machine is first trained on all types of shapes, and when it encounters a new shape, it classifies the shape on the basis of its number of sides and predicts the output.
Types of Supervised Learning Algorithms:
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the output
variable. It is used for the prediction of continuous variables, such as Weather forecasting, Market
Trends, etc. Below are some popular Regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means there are two or more classes such as Yes-No, Male-Female, True-False, etc. Spam filtering is a common example. Below are some popular classification algorithms which come under supervised learning:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines
Advantages of Supervised Learning:
o With the help of supervised learning, the model can predict the output on the basis of prior experience.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us solve various real-world problems such as fraud detection, spam filtering, etc.
Disadvantages of Supervised Learning:
o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.
Perceptron Learning-
In Machine Learning and Artificial Intelligence, the perceptron is one of the most commonly encountered terms. It is a primary step in learning Machine Learning and Deep Learning technologies, and it consists of a set of weights, input values (or scores), and a threshold. The perceptron is a building block of an Artificial Neural Network. It was invented by Frank Rosenblatt in the late 1950s for performing certain calculations to detect patterns in input data. The perceptron is a linear Machine Learning algorithm used for the supervised learning of binary classifiers. The algorithm enables a neuron to learn from the training elements and process them one by one during training.
The perceptron model is regarded as one of the simplest types of Artificial Neural Networks; it is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Activation Function:
These are the final and important components that help to determine whether the neuron will fire or
not. Activation Function can be considered primarily as a step function.
o Sign function
o Step function, and
o Sigmoid function
The data scientist chooses an activation function based on the problem statement and the desired form of the output. The activation function used in a perceptron model (e.g., sign, step, or sigmoid) may differ depending on, for example, whether the learning process is slow or suffers from vanishing or exploding gradients.
This step function or Activation function plays a vital role in ensuring that output is mapped between
required values (0,1) or (-1,1). It is important to note that the weight of input is indicative of the strength of
a node. Similarly, an input's bias value gives the ability to shift the activation function curve up or down.
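The three activation functions named above can be sketched in Python as follows (a minimal illustration, assuming a threshold of 0 in each case):

import numpy as np

def sign_fn(x):
    # Sign function: maps the weighted sum to -1 or +1
    return np.where(x >= 0, 1, -1)

def step_fn(x):
    # Step (threshold) function: maps the weighted sum to 0 or 1
    return np.where(x >= 0, 1, 0)

def sigmoid_fn(x):
    # Sigmoid function: squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))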
Step-1
In the first step, multiply all input values by the corresponding weight values and then add them to determine the weighted sum. Mathematically, the weighted sum is ∑wi*xi. A special term called the bias 'b' is added to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function is applied to the above weighted sum, which gives us an output either in binary form or as a continuous value, as follows:
Y = f(∑wi*xi + b)
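Steps 1 and 2 can be combined into a short Python sketch of the perceptron's forward pass; the input values, weights, and bias below are illustrative assumptions:

import numpy as np

def perceptron_output(x, w, b):
    # Step 1: weighted sum of the inputs plus the bias
    weighted_sum = np.dot(w, x) + b
    # Step 2: apply a step activation function to the weighted sum
    return 1 if weighted_sum > 0 else 0

# Example with assumed (illustrative) values
x = np.array([1.0, 0.5])
w = np.array([0.4, -0.2])
b = 0.1
print(perceptron_output(x, w, b))  # -> 1, since 0.4*1.0 - 0.2*0.5 + 0.1 = 0.4 > 0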
Types of Perceptron Models
Based on the layers, perceptron models are divided into two types: the single layer perceptron and the multi-layer perceptron. These are as follows:
1. Single Layer Perceptron Model
In a single layer perceptron model, the algorithm does not contain any recorded data, so it begins with randomly allocated values for the weight parameters. It then sums up all the weighted inputs. If the total sum is more than a pre-determined value (the threshold), the model gets activated and shows the output value as +1.
If the outcome matches the pre-determined (threshold) value, the performance of the model is considered satisfactory, and the weights are not changed. However, this model exhibits some discrepancies when multiple weighted input values are fed into it. Hence, to obtain the desired output and minimize errors, some changes to the weights may be necessary.
2. Multi-Layer Perceptron Model
The multi-layer perceptron model is also known as the backpropagation network because it is trained with the backpropagation algorithm, which executes in two stages as follows:
Forward Stage: In the forward stage, activation starts from the input layer and terminates at the output layer.
Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. The error between the actual and desired output is propagated backward, starting at the output layer and ending at the input layer.
Hence, a multi-layer perceptron model can be considered as an artificial neural network with multiple layers in which the activation function does not have to remain linear, unlike in a single layer perceptron model. Instead of a linear function, the activation function can be a sigmoid, TanH, ReLU, etc.
A multi-layer perceptron model has greater processing power and can process linear and non-linear
patterns. Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR,
NOR.
Perceptron Function-
The perceptron function f(x) is obtained by multiplying the input x by the learned weight coefficient w, adding the bias b, and applying a threshold:
f(x) = 1 if w.x + b > 0
f(x) = 0 otherwise
Characteristics of Perceptron-
The perceptron model has the following characteristics.
o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit transfer
function.
o The perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, they cannot be classified correctly by a single perceptron (a Python sketch of perceptron learning on a linearly separable problem follows).
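Since the perceptron handles only linearly separable data, the AND problem makes a simple illustration. The following sketch applies the perceptron learning rule w(new) = w(old) + α(t - y)x with an assumed learning rate and epoch count:

import numpy as np

# Perceptron learning on the (linearly separable) AND problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])           # AND targets

w = np.zeros(2)
b = 0.0
alpha = 0.1                          # learning rate (illustrative)

for epoch in range(20):
    for xi, target in zip(X, t):
        y = 1 if np.dot(w, xi) + b > 0 else 0   # f(x) = 1 if w.x + b > 0, else 0
        w = w + alpha * (target - y) * xi        # weights change only on a wrong prediction
        b = b + alpha * (target - y)

print(w, b)   # learned weights and bias separating the AND classes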
ADALINE, MADALINE
Architecture of ADALINE (Adaptive Linear Neuron)
ADALINE is one of the earliest neural networks developed by Bernard Widrow and Ted Hoff in 1960.
It stands for Adaptive Linear Neuron or Adaptive Linear Element. The ADALINE architecture is
similar to a simple perceptron but with a key difference in the learning rule and activation function. It is
used for classification and prediction tasks.
Key Characteristics:
Linear Decision Boundary: Since it uses a linear activation function, it can only solve linearly
separable problems.
Real-Valued Output: The output is not binary; instead, it's a real value determined by the
weighted sum of inputs.
Algorithm:
Step 1: Initialize the weights (not zero but small random values) and the bias. Set the learning rate α.
Step 2: While the stopping condition is false, do steps 3 to 7.
Step 3: For each training pair, perform steps 4 to 6.
Step 4: Set the activation of the input units: xi = si for i = 1 to n.
Step 5: Compute the net input to the output unit:
yin = b + ∑ xi*wi
Step 6: Update the weights and bias using the delta (LMS) rule:
wi(new) = wi(old) + α(t - yin)xi
b(new) = b(old) + α(t - yin)
When the predicted output equals the true value, the weights will not change.
Step 7: Test the stopping condition. The stopping condition may be that the weights change at a very low rate or show no change. (A Python sketch of this procedure follows.)
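A minimal Python sketch of this ADALINE procedure, assuming bipolar inputs and targets and an illustrative learning rate and stopping tolerance:

import numpy as np

# ADALINE trained with the delta (LMS) rule, following Steps 1-7 above
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, -1, -1, -1])            # e.g. a bipolar AND target

w = np.random.uniform(-0.1, 0.1, 2)      # Step 1: small random weights
b = np.random.uniform(-0.1, 0.1)
alpha = 0.1

for epoch in range(50):                  # Step 2: repeat until the stopping condition
    max_change = 0.0
    for xi, target in zip(X, t):         # Steps 3-4: present each training pair
        y_in = b + np.dot(w, xi)         # Step 5: net input to the output unit
        delta = alpha * (target - y_in)
        w = w + delta * xi               # Step 6: delta-rule weight update
        b = b + delta
        max_change = max(max_change, abs(delta))
    if max_change < 1e-4:                # Step 7: stop when weight changes are tiny
        break

print(w, b)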
MADALINE (Multiple ADAptive LINear Elements) is a more complex neural network architecture than ADALINE, designed to handle non-linear decision boundaries. It was also developed by Bernard Widrow and his students in the early 1960s. MADALINE is essentially a network of multiple ADALINE units.
The Madaline (supervised learning) model consists of many Adalines in parallel with a single output unit. The Adaline layer is present between the input layer and the Madaline (output) layer; hence, the Adaline layer is a hidden layer. The weights between the input layer and the hidden layer are adjusted, while the weights between the hidden layer and the output layer are fixed.
The output unit may use the majority-vote rule, so the output is either true or false. The Adaline and Madaline layer neurons have a bias of '1' connected to them. The use of multiple Adalines helps counter the problem of non-linear separability.
Architecture Components:
1. Input Layer:
o Similar to ADALINE, the input layer consists of multiple input neurons that receive the input features.
2. Hidden (ADALINE) Layer:
o The hidden layer consists of several ADALINE units in parallel. Each unit computes a weighted sum of the inputs and applies a threshold, and its weights are adjustable.
3. Output Layer:
o The output layer consists of one or more neurons, and each neuron aggregates the outputs from the hidden layer neurons with fixed weights.
Training involves:
o Identifying which hidden layer unit's weight update will produce the largest reduction in error.
o Updating the weights based on which neuron's update leads to the largest reduction in error.
Key Characteristics:
Non-linear Classification: MADALINE can solve non-linear problems due to its multiple
ADALINE units.
Threshold Logic: Uses a step function (sign function) for final output classification.
Multilayered: Unlike ADALINE, MADALINE consists of multiple layers, making it more
versatile for complex pattern recognition tasks.
Algorithm:
Step 1: Initialize the weights and set the learning rate α. The weights and bias of the output unit are fixed:
v1 = v2 = 0.5, b3 = 0.5
The other weights may be small random values.
Step 2: While the stopping condition is False do steps 3 to 9.
Step 3: for each training set perform steps 4 to 8.
Step 4: Set activation of input unit xi = si for (i=1 to n).
Step 5: compute net input of Adaline unit
zin1 = b1 + x1w11 + x2w21
zin2 = b2 + x1w12 + x2w22
Step 6: Compute the output of each Adaline unit using the activation function given below:
f(z) = 1 if z ≥ 0; f(z) = -1 if z < 0
z1 = f(zin1)
z2 = f(zin2)
Step 7: Calculate the net input to output.
yin = b3 + z1v1 + z2v2
Apply activation to get the output of the net
y=f(yin)
Step 8: Find the error and update the weights:
If t ≠ y and t = 1, update the weights on the zj unit whose net input is closest to 0:
wij(new) = wij(old) + α(t - zinj)xi
bj(new) = bj(old) + α(t - zinj)
If t ≠ y and t = -1, update the weights (with the same rule) on all units zk that have a positive net input.
If t = y, no weights are updated.
Step 9: Test the stopping condition: the weights stop changing, or the specified number of epochs has been completed. (A Python sketch of this procedure follows.)
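The steps above can be sketched in Python for the classic XOR problem, using two Adaline units with the fixed output weights v1 = v2 = 0.5 and bias b3 = 0.5 from Step 1; the learning rate, weight initialization, and epoch count are illustrative assumptions:

import numpy as np

def f(z):
    # Bipolar step activation used by the Adaline and Madaline units
    return 1 if z >= 0 else -1

# MADALINE sketch for XOR: hidden (Adaline) weights are trainable, output weights fixed
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([-1, 1, 1, -1])                 # bipolar XOR targets

W = np.random.uniform(-0.1, 0.1, (2, 2))     # W[i, j]: weight from input i to Adaline j
b = np.random.uniform(-0.1, 0.1, 2)          # Adaline biases
v1 = v2 = 0.5                                # fixed output weights (Step 1)
b3 = 0.5
alpha = 0.5

for epoch in range(100):
    for xi, target in zip(X, t):
        z_in = b + xi @ W                        # Step 5: net inputs of both Adalines
        z = np.array([f(z_in[0]), f(z_in[1])])   # Step 6
        y = f(b3 + v1 * z[0] + v2 * z[1])        # Step 7: output of the net
        if target != y:                          # Step 8: weight updates
            if target == 1:
                j = int(np.argmin(np.abs(z_in))) # unit whose net input is closest to 0
                W[:, j] += alpha * (1 - z_in[j]) * xi
                b[j] += alpha * (1 - z_in[j])
            else:                                # target = -1: push positive units toward -1
                for j in range(2):
                    if z_in[j] > 0:
                        W[:, j] += alpha * (-1 - z_in[j]) * xi
                        b[j] += alpha * (-1 - z_in[j])

# After training, the net should reproduce XOR on the bipolar inputs (for most initializations)
print([f(b3 + v1 * f(b[0] + xi @ W[:, 0]) + v2 * f(b[1] + xi @ W[:, 1])) for xi in X])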
ADALINE: Used for linear classification problems, signal processing, and adaptive filtering.
MADALINE: Suitable for more complex tasks such as speech recognition, pattern
classification, and multi-class problems where non-linear decision boundaries are required.
Both architectures are historically significant in the evolution of neural networks and contributed
foundational concepts to more advanced neural network architectures used today.
Back Propagation Network (BPN)-
Architecture:
1. Input Layer:
o The input layer consists of neurons that receive the input data and pass it forward to the
next layer. Each neuron in this layer represents one feature from the input dataset.
2. Hidden Layer(s):
o The hidden layer(s) consist of neurons that transform the input features into a more
abstract representation. A backpropagation network can have one or more hidden layers
depending on the complexity of the problem.
o Each neuron in the hidden layer applies an activation function (such as the sigmoid,
tanh, or ReLU) to its weighted sum of inputs.
3. Output Layer:
o The output layer generates the final prediction of the network. The number of neurons in
this layer corresponds to the number of output variables (for example, one for
regression, or multiple for classification).
Working of Backpropagation:
1. Forward Propagation:
o The input data is passed through the input layer, hidden layer(s), and output layer.
o Each neuron computes a weighted sum of its inputs, applies the activation function, and
passes the result to the next layer.
o The output of the network is compared to the actual output (the target), and an error is
calculated (commonly using a loss function like mean squared error or cross-entropy).
2. Backward Propagation:
o The error is propagated backward from the output layer toward the input layer, and the gradient of the error with respect to each weight is computed using the chain rule.
3. Weight Updates:
o The weights are updated in the direction that minimizes the error. The weight update is usually performed using an optimization algorithm like gradient descent, where each weight is adjusted based on the learning rate η and the gradient of the error: w(new) = w(old) - η·∂E/∂w (see the sketch after this list).
4. Repeat:
o The forward and backward passes are repeated for multiple iterations (epochs) until the
network’s error converges to a minimum, or it reaches a stopping criterion (such as a
fixed number of epochs or a desired accuracy).
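Putting the forward pass, backward pass, and weight updates together, a compact NumPy sketch of a backpropagation network trained on the XOR problem (layer sizes, learning rate, and epoch count are illustrative assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (2, 4)), np.zeros(4)     # input -> hidden
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)     # hidden -> output
eta = 0.5                                            # learning rate

for epoch in range(5000):
    # 1. Forward propagation
    H = sigmoid(X @ W1 + b1)          # hidden layer activations
    Y = sigmoid(H @ W2 + b2)          # network output
    # 2. Backward propagation of the error (squared-error loss)
    dY = (Y - T) * Y * (1 - Y)        # delta at the output layer
    dH = (dY @ W2.T) * H * (1 - H)    # delta at the hidden layer
    # 3. Weight updates: w(new) = w(old) - eta * dE/dw
    W2 -= eta * (H.T @ dY)
    b2 -= eta * dY.sum(axis=0)
    W1 -= eta * (X.T @ dH)
    b1 -= eta * dH.sum(axis=0)

print(Y.round(3).ravel())             # should approach the XOR targets 0, 1, 1, 0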
Key Characteristics:
Supervised Learning: Backpropagation networks are trained with labeled data, where the
correct output (target) is provided for each input.
Error Minimization: The goal of backpropagation is to reduce the difference (error) between
the network’s predicted output and the actual output by adjusting the network's weights.
Multiple Layers: A backpropagation network typically contains at least one hidden layer,
which allows it to capture more complex relationships in the data (multilayer perceptron, MLP).
Non-linear Activation Functions: Neurons use activation functions such as sigmoid, tanh, or
ReLU to introduce non-linearity into the model, allowing the network to learn complex, non-
linear mappings.
Applications of Backpropagation Networks:
1. Classification:
o Backpropagation is widely used for classification tasks such as image classification,
text classification, and speech recognition. The network learns to assign input data to
different categories based on its features.
2. Regression:
o BPNs are also applied in regression problems where the goal is to predict a continuous
output variable, such as predicting house prices, stock market trends, or demand
forecasting.
3. Pattern Recognition:
o Backpropagation networks are used in applications like handwriting recognition,
fingerprint recognition, and facial recognition due to their ability to learn patterns from
the input data.
4. Medical Diagnosis:
o In medical applications, backpropagation networks can help in disease diagnosis by
analyzing patient data and learning patterns that indicate the presence of certain
conditions.
Advantages:
Ability to Learn Complex Patterns: BPNs can learn highly non-linear relationships between
input and output data, making them suitable for a wide range of tasks.
Generalization: After training, BPNs can generalize and make accurate predictions on new,
unseen data.
Versatility: The architecture can be applied to classification, regression, and other predictive
tasks.
Disadvantages:
Slow Convergence: Backpropagation can be slow to converge, especially with deep networks,
as it may require many iterations to reach an optimal solution.
Local Minima: The network can get stuck in local minima of the loss function, leading to
suboptimal solutions.
Sensitive to Hyperparameters: The performance of a BPN is sensitive to hyperparameters
such as the learning rate, number of hidden layers, and number of neurons. These need to be
carefully tuned.
Overfitting: If the network is too large (too many neurons or layers), it may overfit the training
data and fail to generalize well to unseen data.
Example Architecture:
For a backpropagation network used for binary classification with two input features, one hidden layer of sigmoid neurons, and one output neuron: the network takes the input features, transforms them through the hidden layer, and produces an output between 0 and 1 that is interpreted as the predicted class.
Radial Basis Function Network(RBFN)- A Radial Basis Function Network (RBFN) is a type of
artificial neural network that uses radial basis functions as activation functions. It is commonly used for
function approximation, time series prediction, classification, and system control due to its simple
structure and fast learning capabilities. The radial basis function (RBF) is a classification and functional
approximation neural network developed by M.J.D. Powell.
The network uses common nonlinearities such as sigmoidal and Gaussian kernel functions. Gaussian functions are also used in regularization networks. The response of such a function is positive for all values of y, and the response decreases to 0 as |y| → ∞:
f(y) = e^(-y²)
The derivative of this function is given by:
f'(y) = -2y·e^(-y²) = -2y·f(y)
ARCHITECTURE OF RBFN
1. Input Layer:
o The input layer consists of input neurons that pass the input features directly to the
hidden layer without any transformation.
2. Hidden Layer:
o The hidden layer uses radial basis functions (usually Gaussian functions) as the
activation functions.
o Each neuron in the hidden layer has a center and a spread (radius). The radial basis
function computes the distance between the input vector and the center of the neuron.
3. Output Layer:
o The output layer is typically a linear combination of the hidden layer outputs. It can have
one or more neurons depending on the task (regression or classification).
o The weights between the hidden layer and output layer are typically learned through
linear regression or another optimization technique.
Working of RBFN:
1. Training Phase:
o In the training process, the centers c of the radial basis functions are determined. These can be chosen randomly, through k-means clustering, or by other methods.
o The spread parameter σ for each neuron is set based on the distance between the centers, or manually.
o Once the centers and spreads are fixed, the weights connecting the hidden layer to the output layer are learned using methods like least squares or gradient descent (see the sketch after this list).
2. Prediction Phase:
o For a given input, the network calculates the distance between the input and the centers
of each neuron in the hidden layer.
o The radial basis function (such as Gaussian) is applied to the distances to compute the
hidden layer outputs.
o Finally, the weighted sum of these hidden layer outputs is used to generate the output.
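The training and prediction phases described above can be sketched with NumPy, using Gaussian basis functions, centers taken from the training inputs, and output weights fitted by least squares; the toy sine-curve data is purely illustrative:

import numpy as np

def rbf_layer(X, centers, sigma):
    # Gaussian radial basis response of every hidden unit to every input vector
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy regression data: approximate y = sin(x)
X = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
y = np.sin(X).ravel()

# Training phase: choose centers (here a subset of the inputs) and the spread,
# then solve for the output-layer weights with linear least squares
centers = X[::5]
sigma = 0.8
Phi = np.hstack([rbf_layer(X, centers, sigma), np.ones((len(X), 1))])   # + bias column
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Prediction phase: apply the same RBF layer and take the weighted sum
X_new = np.array([[1.0], [4.0]])
Phi_new = np.hstack([rbf_layer(X_new, centers, sigma), np.ones((len(X_new), 1))])
print(Phi_new @ w)   # approximately sin(1.0) and sin(4.0)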
Key Characteristics:
Localized learning: The output of a hidden neuron only depends on the inputs near its center,
making the RBFN focus on local regions of the input space.
Fast training: Since the weights between the hidden and output layers are often learned
linearly, training can be much faster compared to traditional feedforward networks.
Universal approximator: RBFNs are proven to be capable of approximating any continuous
function given enough hidden neurons.
Applications:
Function approximation: RBFNs can approximate complex, non-linear functions and are often
used in curve fitting and interpolation problems.
Time-series prediction: RBFNs can predict future values in time-series data by learning
patterns from historical data.
Classification: RBFNs can be used in pattern recognition and classification problems by
assigning classes based on the outputs of the network.
Control systems: RBFNs are applied in robotics and control systems where function
approximation and fast learning are required.
Advantages:
Simplicity: RBFNs have a simpler structure compared to other networks like multilayer
perceptrons (MLPs), making them easier to design and train.
Fast learning: Since the weights between the hidden and output layers are often determined
using linear regression, training is relatively quick.
Localized learning: RBFNs focus on local regions of the input space, making them effective in
tasks where local data patterns matter.
Disadvantages:
Scalability: RBFNs can require a large number of neurons for high-dimensional input data,
which may lead to higher computational costs.
Center and spread selection: The performance of the RBFN is highly dependent on how the
centers and spreads of the radial basis functions are chosen, which can sometimes be difficult to
tune.
Sensitive to outliers: The performance may degrade if there are outliers in the training data, as
these affect the selection of centers and the spread.
Example Architecture:
For an RBFN used for classification with two input features, three radial basis neurons in the hidden
layer, and one output neuron:
The network takes the input features, applies the radial basis functions in the hidden layer, and
generates a final output, which is the predicted class.
Training Algorithm:
Step 5: Select the centers for the radial basis function. The centers are selected from the set of input
vectors. It should be noted that a sufficient number of centers have to be selected to ensure adequate
sampling of the input vector space.
Step 6: Calculate the output of each hidden layer (RBF) unit:
vi(xi) = exp[ -∑j (xji - x̂ji)² / σi² ]
where x̂ji is the center of the ith RBF unit for the jth input variable; σi the width of the ith RBF unit; and xji the jth variable of the input pattern.
Step 7: Calculate the output of the network:
ynet = ∑(i=1..k) wim·vi(xi) + w0
where k is the number of hidden layer nodes (RBF functions); ynet the output value of the mth node in the output layer for the nth incoming pattern; wim the weight between the ith RBF unit and the mth output node; and w0 the biasing term at the mth output node.
Step 8: Calculate the error and test for the stopping condition. The stopping condition may be number
of epochs or to a certain extent weight change.
Forecasting-
Neural networks are widely used for predictive tasks, particularly in time series forecasting. The ability
of neural networks to learn complex, non-linear patterns makes them suitable for various types of
forecasting.
Applications:
Financial Forecasting: Neural networks, especially Recurrent Neural Networks (RNNs) and
Long Short-Term Memory (LSTM) networks, are used to predict stock prices, market trends,
and other financial metrics based on historical data.
Weather Forecasting: Neural networks can model weather patterns by analyzing historical
weather data, predicting temperature, rainfall, and other meteorological variables.
Demand Forecasting: Businesses use neural networks for inventory and sales forecasting to
predict future demand and optimize supply chain operations.
Energy Load Forecasting: In the energy sector, neural networks help predict future energy
consumption based on historical load data, helping manage grid demand efficiently.
Data Compression-
Neural networks can be applied to compress data by learning efficient data representations that require
fewer bits while preserving the original information. This is crucial in scenarios where data storage and
transmission costs need to be minimized.
Applications:
Autoencoders for Data Compression: An autoencoder is a type of neural network used for unsupervised learning that can compress data into a lower-dimensional representation (encoding) and then reconstruct it (decoding). It consists of an encoder that compresses the input and a decoder that reconstructs it (see the sketch after this list).
o Compact Latent Representations: Autoencoders compress high-dimensional data into a latent-space representation while retaining the most important information (the reconstruction is approximate, so the compression is generally lossy), which is useful for compressing text, sensor data, etc.
o Dimensionality Reduction: Neural networks reduce data size by learning a smaller
representation of the original dataset, making data more manageable for tasks like
visualization, storage, or faster processing.
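A minimal autoencoder sketch (assuming TensorFlow/Keras is available; the input dimension, encoding dimension, and placeholder data are illustrative) shows the encoder and decoder described above:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784      # e.g. a flattened 28x28 image or a sensor vector (assumed)
encoding_dim = 32    # size of the compressed (latent) representation

# Encoder: compresses the input into the low-dimensional code
inputs = keras.Input(shape=(input_dim,))
encoded = layers.Dense(encoding_dim, activation="relu")(inputs)
# Decoder: reconstructs the original input from the code
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)          # used on its own for compression
autoencoder.compile(optimizer="adam", loss="mse")

# Training is unsupervised: the input also serves as the target
X = np.random.rand(1000, input_dim).astype("float32")   # placeholder data
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)
codes = encoder.predict(X, verbose=0)           # compressed representation
print(codes.shape)                              # (1000, 32)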
Benefits:
Efficient Compression: Neural networks can learn the most important features, leading to
highly efficient data compression.
Improved Performance: Compression models built on neural networks often outperform
traditional algorithms in terms of compression ratios and reconstruction quality.
Image Compression-
Neural networks, particularly deep learning methods, have revolutionized image compression
techniques by offering more efficient and intelligent ways to compress images with minimal loss in
quality.
Applications:
Deep Autoencoders for Image Compression: Similar to data compression, autoencoders are
used to compress image data. They learn to represent the most essential features of an image in
a compact, low-dimensional format and reconstruct the original image during decompression.
Convolutional Neural Networks (CNNs) for Image Compression: CNNs can be used to
detect patterns and features in images, enabling them to generate efficient image
representations. Deep learning-based compression methods often outperform traditional
algorithms like JPEG in preserving image quality at higher compression ratios.
Generative Adversarial Networks (GANs) for Super-resolution: GANs can be used in
conjunction with compression techniques to generate high-quality reconstructions from
compressed data, which is especially useful in tasks like image super-resolution and restoration.
Variational Autoencoders (VAEs): VAEs are a type of autoencoder that can generate new
data from compressed representations, making them suitable for image compression tasks where
the balance between reconstruction quality and compression is crucial.
Benefits: