PERCEPTRON IMPLEMENTATION
The perceptron is a simple model of an artificial neuron. It links artificial neurons to
simple logic gates with binary outputs. An artificial neuron evaluates a mathematical
function and has a node, inputs, weights, and an output, which correspond respectively to
the cell nucleus, dendrites, synapses, and axon of a biological neuron.
A binary classifier in machine learning is a type of model that is trained to classify data into
one of two possible categories, typically represented as binary labels such as 0 or 1, true or
false, or positive or negative. For example, a binary classifier may be trained to distinguish
between spam and non-spam emails, or to predict whether a credit card transaction is
fraudulent or legitimate.
Binary classifiers are a fundamental building block of many machine learning applications,
and there are numerous algorithms that can be used to build them, including logistic
regression, support vector machines (SVMs), decision trees, random forests, and neural
networks. These models are typically trained using labeled data, where the correct label or
category for each example in the training set is known, and then used to predict the category
of new, unseen examples.
The performance of a binary classifier is typically evaluated using metrics such as accuracy,
precision, recall, and F1 score, which measure how well the model is able to correctly
identify positive and negative examples in the data. High-quality binary classifiers are
essential for a wide range of applications, including natural language processing, computer
vision, fraud detection, and medical diagnosis, among many others.
Biological Neuron
A human brain has billions of neurons. Neurons are interconnected nerve cells in the human
brain that are involved in processing and transmitting chemical and electrical signals.
Dendrites are branches that receive information from other neurons.
Cell nucleus or Soma processes the information received from dendrites. Axon is a cable that
is used by neurons to send information. Synapse is the connection between an axon and other
neuron dendrites.
Researchers Warren McCulloch and Walter Pitts published their first concept of a simplified
brain cell in 1943. This was called the McCulloch-Pitts (MCP) neuron. They described such a
nerve cell as a simple logic gate with binary outputs.
Multiple signals arrive at the dendrites and are then integrated into the cell body, and, if the
accumulated signal exceeds a certain threshold, an output signal is generated that will be
passed on by the axon.
In the next section, let us compare the biological neuron with the artificial neuron.
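Biological neuron -> Artificial neuron
Cell nucleus (soma) -> Node
Dendrites -> Input
Synapse -> Weights
Axon -> Output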
Inputs are summed and passed through a nonlinear function to produce output
Perceptron
1. Input Layer: The input layer consists of one or more input neurons, which receive input
signals from the external world or from other layers of the neural network.
2. Weights: Each input neuron is associated with a weight, which represents the strength of
the connection between the input neuron and the output neuron.
3. Bias: A bias term is added to the weighted sum of the inputs, which lets the perceptron
shift its decision boundary away from the origin.
4. Activation Function: The activation function determines the output of the perceptron
based on the weighted sum of the inputs and the bias term. Common activation functions
used in perceptrons include the step function, sigmoid function, and ReLU function.
5. Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates
the class or category to which the input data belongs.
Overall, the perceptron is a simple yet powerful algorithm that can be used to perform
binary classification tasks and has paved the way for more complex neural networks used
in deep learning today.
Perceptron in Machine Learning
Perceptron is one of the most commonly used terms in Artificial Intelligence and Machine
Learning (AI/ML). It is a common first step in learning Deep Learning technologies, and it
consists of input values, scores, thresholds, and weights, which together can implement logic
gates. The perceptron is the basic building block of an artificial neural network. Frank
Rosenblatt invented the perceptron in the mid-20th century to detect patterns in input data.
However, it is now used for various other purposes.
History of Perceptron
The perceptron was introduced by Frank Rosenblatt in 1958, as a type of artificial neural
network capable of learning and performing binary classification tasks. Rosenblatt was a
psychologist and computer scientist who was interested in developing a machine that could
learn and recognize patterns in data, inspired by the workings of the human brain.
The perceptron was based on the concept of a simple computational unit, which takes one or
more inputs and produces a single output, modeled after the structure and function of a
neuron in the brain. The perceptron was designed to be able to learn from examples and
adjust its parameters to improve its accuracy in classifying new examples.
The perceptron algorithm was initially used to solve simple problems, such as recognizing
handwritten characters, but it soon faced criticism due to its limited capacity to learn complex
patterns and its inability to handle non-linearly separable data. These limitations led to the
decline of research on perceptrons in the 1960s and 1970s.
The perceptron is an algorithm for supervised learning of binary classification tasks. It
also serves as an artificial neuron, the basic unit of a neural network, and is used to detect
patterns in input data, for example in business intelligence. The perceptron model is one of
the simplest types of artificial neural network. As a supervised learning algorithm for
binary classifiers, it can be considered a single-layer neural network with four main
parameters: input values, weights and bias, net sum, and an activation function.
As discussed earlier, the perceptron is considered a single-layer neural network with four
main parameters. The perceptron model begins by multiplying all input values by their
weights, then adds these products to create the weighted sum. This weighted sum is then
passed to the activation function ‘f’ to obtain the desired output. In the basic perceptron,
the activation function is the step function.
Source: javapoint
This step function or activation function is vital in ensuring that the output is mapped to a
fixed range such as (0,1) or (-1,1). Take note that the weight of an input indicates the
strength of its node. Similarly, the bias value gives the ability to shift the activation
function curve up or down.
Step 1: Multiply all input values with corresponding weight values and then add to calculate
the weighted sum. The following is the mathematical expression of it:
∑wi*xi = w1*x1 + w2*x2 + … + wn*xn
Add a term called bias ‘b’ to this weighted sum to improve the model’s performance:
∑wi*xi + b
Step 2: An activation function is applied with the above-mentioned weighted sum giving us
an output either in binary form or a continuous value as follows:
Y=f(∑wi*xi + b)
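The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the step function as f; the helper names and the input, weight, and bias values are hypothetical and chosen only for the example.

```python
def step(z):
    """Step activation: 1 if the net input is positive, otherwise 0."""
    return 1 if z > 0 else 0


def perceptron_output(inputs, weights, bias):
    """Compute Y = f(sum(wi * xi) + b) with the step function as f."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))  # Step 1
    return step(weighted_sum + bias)                            # Step 2


# Illustrative values only (assumed, not taken from the text).
x = [1.0, 0.0, 1.0]
w = [0.5, -0.6, 0.2]
b = -0.4

# Weighted sum is 0.5 + 0.0 + 0.2 = 0.7; adding the bias leaves a positive value.
y = perceptron_output(x, w, b)
print(y)
```

With all inputs set to zero, only the negative bias remains, so the same unit would output 0.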
Types of Perceptron models
We have already discussed the types of Perceptron models in the Introduction. Here, we take
a closer look at them:
1. Single Layer Perceptron model: One of the simplest ANN (Artificial Neural Network) types. It
consists of a feed-forward network and includes a threshold transfer function inside the model.
The main objective of the single-layer perceptron model is to classify linearly separable objects
with binary outcomes. A single-layer perceptron can learn only linearly separable patterns.
2. Multi-Layered Perceptron model: It is similar to a single-layer perceptron model but has one or
more hidden layers. It is trained in two stages:
Forward Stage: Activations start at the input layer, pass through the hidden layers, and
terminate at the output layer.
Backward Stage: Weight and bias values are modified per the model’s requirement. The error
between the actual output and the desired output is propagated backward, starting at the
output layer.
A multilayer perceptron model has greater processing power and can process both linear and
non-linear patterns. It can also implement logic gates such as AND, OR, XOR, XNOR, and NOR.
Advantages:
It can reach comparable accuracy on both large and small datasets.
Disadvantages:
It is hard to tell how much each independent variable affects the dependent variable.
4. Training: Adjust weights using the Perceptron Learning Rule based on errors.
Binary classification involves predicting one of two classes (e.g., 0 or 1). For a dataset to be
linearly separable, a straight line (or hyperplane in higher dimensions) can perfectly separate
the two classes.
Example: Classifying red and blue dots on a 2D plane with a straight line.
A perceptron can handle such data because it is inherently linear. Training the perceptron adjusts
the weights to find the optimal line that separates the two classes.
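As a rough sketch of how training finds such a line, the snippet below applies the standard perceptron update, w <- w + learning_rate * (target - prediction) * x, to a small set of 2D points. The red and blue coordinates, the learning rate, and the function names are assumptions made up for this example.

```python
def predict(weights, bias, point):
    """Step rule: class 1 if the weighted sum plus bias is positive, else 0."""
    z = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if z > 0 else 0


def train_perceptron(points, labels, learning_rate=0.1, max_epochs=1000):
    """Adjust weights only on misclassified points until an epoch has no errors."""
    weights = [0.0] * len(points[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for point, target in zip(points, labels):
            error = target - predict(weights, bias, point)  # -1, 0, or +1
            if error != 0:
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, point)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side of the line
            break
    return weights, bias


# Hypothetical, linearly separable data: red dots are class 0, blue dots class 1.
red = [(1.0, 1.0), (2.0, 1.5), (1.5, 0.5)]
blue = [(4.0, 4.5), (5.0, 4.0), (4.5, 5.5)]
points = red + blue
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_perceptron(points, labels)
predictions = [predict(weights, bias, p) for p in points]
print(weights, bias, predictions)
```

The learned line is one of many that separate the two groups; the rule stops at the first line with no training errors rather than searching for the widest margin.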
With PyTorch, you can build and train a perceptron model to perform binary classification.
Here are the steps to follow:
1. Import Libraries: Use PyTorch for model creation, training, and testing.
3. Model Definition: Define a single-layer perceptron using PyTorch's torch.nn.Linear for the linear
part and torch.sigmoid or another function for activation.
4. Loss and Optimizer: Use Binary Cross Entropy Loss (BCELoss) and an optimizer like Stochastic
Gradient Descent (SGD).
5. Train the Model: Pass inputs through the model, compute the loss, backpropagate the error, and
update weights iteratively.
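To show what the listed steps compute, here is a dependency-free sketch in plain Python rather than PyTorch. It stands in for the same ingredients: a linear part (the role of torch.nn.Linear), a sigmoid activation, binary cross-entropy loss, and stochastic gradient descent. The AND-gate dataset, learning rate, epoch count, and all function names are assumptions for illustration, not the PyTorch API.

```python
import math


def sigmoid(z):
    """Logistic activation, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, bias, x):
    """Linear part followed by the sigmoid activation."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def bce_loss(p, y, eps=1e-12):
    """Binary cross-entropy for one example (p is clipped to avoid log(0))."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train(data, labels, lr=0.5, epochs=2000):
    """Per-example gradient descent on the binary cross-entropy loss."""
    weights = [0.0] * len(data[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            p = forward(weights, bias, x)
            grad = p - y  # derivative of BCE w.r.t. the pre-activation sum
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


# Hypothetical training set: the AND gate, which is linearly separable.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [0, 0, 0, 1]

weights, bias = train(data, labels)
probabilities = [forward(weights, bias, x) for x in data]
predictions = [1 if p > 0.5 else 0 for p in probabilities]
mean_loss = sum(bce_loss(p, y) for p, y in zip(probabilities, labels)) / len(data)
print(predictions, round(mean_loss, 4))
```

In PyTorch the gradient line would be handled by automatic differentiation and the update by the optimizer; writing it out by hand here only makes the arithmetic visible.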
3. Initially, weights are multiplied with input features, and then the decision is made whether the
neuron is fired or not.
4. The activation function applies a step rule to check whether the weighted sum is greater than
zero.
5. The linear decision boundary is drawn, enabling the distinction between the two linearly
separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it must have an output
signal; otherwise, no output will be shown.
1. The output of a perceptron can only be a binary number (0 or 1) due to the hard-edge transfer
function.
2. It can only be used to classify linearly separable sets of input vectors. If the input vectors are
not linearly separable, it cannot classify them all correctly.
Perceptron Learning Rule states that the algorithm would automatically learn the optimal
weight coefficients. The input features are then multiplied with these weights to determine if
a neuron fires or not.
The Perceptron receives multiple input signals. If the sum of the input signals exceeds a
certain threshold, it outputs a signal; otherwise, it does not return an output. In the context
of supervised learning and classification, this can then be used to predict the class of a sample.
Perceptron Function
A perceptron is a function that maps its input “x,” multiplied by the learned weight
coefficients, to an output value “f(x)”:
f(x) = 1 if w·x + b > 0, otherwise 0
“w” = vector of real-valued weights
“b” = bias (an element that adjusts the boundary away from the origin without any
dependence on the input value)
“x” = vector of input values
Inputs of a Perceptron
A Perceptron accepts inputs, moderates them with certain weight values, then applies the
transformation function to output the final result. The image below shows a Perceptron with a
Boolean output.
A Boolean output is based on inputs such as salaried, married, age, past credit profile, etc. It
has only two values: Yes and No or True and False. The summation function “∑” multiplies
all inputs “x” by weights “w” and then adds them up as follows:
w1*x1 + w2*x2 + … + wn*xn
The activation function applies a step rule (convert the numerical output into +1 or -1) to
check if the output of the weighting function is greater than zero or not.
For example:
Step function gets triggered above a certain value of the neuron output; else it outputs zero.
Sign Function outputs +1 or -1 depending on whether neuron output is greater than zero or
not. Sigmoid is the S-curve and outputs a value between 0 and 1.
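The three activation functions just described can be compared side by side. The sketch below is a minimal illustration; the function names, the default threshold of 0, and the sample inputs are assumptions for the example.

```python
import math


def step(z, threshold=0.0):
    """Outputs 1 when the neuron output exceeds the threshold, else 0."""
    return 1 if z > threshold else 0


def sign(z):
    """Outputs +1 when the neuron output is greater than zero, else -1."""
    return 1 if z > 0 else -1


def sigmoid(z):
    """S-shaped curve with output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Evaluate each function on a negative, a zero, and a positive input.
for z in (-2.0, 0.0, 2.0):
    print(z, step(z), sign(z), round(sigmoid(z), 3))
```

Step and sign make a hard decision, while the sigmoid gives a graded value that can be read as a probability.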
Output of Perceptron
Inputs: x1, …, xn
Output: o(x1, …, xn)
An output of +1 specifies that the neuron is triggered. An output of -1 specifies that the
neuron did not get triggered.
Error in Perceptron
In the Perceptron Learning Rule, the predicted output is compared with the known output. If
it does not match, the error is propagated backward to allow weight adjustment to happen.
Bias Unit
For simplicity, the threshold θ can be brought to the left and represented as w0x0, where
w0 = -θ and x0 = 1.
The value w0 is called the bias unit.
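A quick check that the two formulations agree: the sketch below compares "weighted sum > θ" with "w0x0 + weighted sum > 0" using w0 = -θ and x0 = 1. The weights, threshold value, and function names are assumptions picked for the example.

```python
def fires_with_threshold(inputs, weights, theta):
    """Original form: fire when the weighted sum exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return weighted_sum > theta


def fires_with_bias_unit(inputs, weights, theta):
    """Rewritten form: prepend x0 = 1 and w0 = -theta, then compare with 0."""
    augmented_inputs = [1.0] + list(inputs)
    augmented_weights = [-theta] + list(weights)
    net_input = sum(w * x for w, x in zip(augmented_weights, augmented_inputs))
    return net_input > 0


# Illustrative values only (assumed).
weights = [0.5, 0.5]
theta = 0.75
cases = [(0, 0), (0, 1), (1, 0), (1, 1)]

agreement = [
    fires_with_threshold(c, weights, theta) == fires_with_bias_unit(c, weights, theta)
    for c in cases
]
print(agreement)
```

Folding the threshold into the weight vector means the learning rule can update it like any other weight.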
Output:
The figure shows how the decision function squashes wTx to either +1 or -1 and how it can
be used to discriminate between two linearly separable classes.
Perceptron at a Glance
Perceptron is an algorithm for Supervised Learning of single layer binary linear classifiers.
Weights are multiplied with the input features and decision is made if the neuron is fired or not.
Activation function applies a step rule to check if the output of the weighting function is greater
than zero.
Linear decision boundary is drawn enabling the distinction between the two linearly separable
classes +1 and -1.
If the sum of the input signals exceeds a certain threshold, it outputs a signal; otherwise, there is
no output.
Types of activation functions include the sign, step, and sigmoid functions.
The Perceptron learning rule converges if the two classes can be separated by the linear
hyperplane. However, if the classes cannot be separated perfectly by a linear classifier, it
could give rise to errors.
As discussed in the previous topic, the classifier boundary for a binary output in a Perceptron
is represented by the equation given below:
w·x + b = 0
The diagram above shows the decision surface represented by a two-input Perceptron.
Observation:
In Fig(a) above, examples can be clearly separated into positive and negative values; hence, they
are linearly separable. This can include logic gates like AND, OR, NOR, NAND.
Fig (b) shows examples that are not linearly separable (as in an XOR gate).
Diagram (a) is a set of training examples and the decision surface of a Perceptron that classifies
them correctly.
Diagram (b) is a set of training examples that are not linearly separable, that is, they cannot be
correctly classified by any straight line.
Logic gates are the building blocks of digital systems, and they are a common first exercise
for neural networks. In short, they are electronic circuits that perform operations such as
addition, choice, negation, and combination, and that are joined to form complex circuits. A
neural network can learn the behavior of a logic gate from examples, without you having to
manually code the logic. Most logic gates have two inputs and one output. Each terminal is in
one of two binary conditions, low (0) or high (1), represented by different voltage levels.
The logic state of a terminal changes based on how the circuit processes data.
Based on this logic, logic gates can be categorized into seven types:
AND
NAND
OR
NOR
NOT
XOR
XNOR
1. AND
If the two inputs are TRUE (+1), the output of Perceptron is positive, which amounts to
TRUE.
2. OR
If either of the two inputs is TRUE (+1), the output of Perceptron is positive, which
amounts to TRUE.
x1 = 1 (TRUE), x2 = 0 (FALSE)
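Both gates can be written as a single perceptron with hand-picked parameters. In the sketch below the weights (1, 1) and the biases -1.5 and -0.5 are one possible choice, assumed for illustration; inputs and outputs are coded as 0 (FALSE) and 1 (TRUE).

```python
def perceptron(x1, x2, w1, w2, bias):
    """Two-input perceptron with a step rule at zero."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def and_gate(x1, x2):
    # The sum must exceed 1.5, which only happens when both inputs are 1.
    return perceptron(x1, x2, 1.0, 1.0, -1.5)


def or_gate(x1, x2):
    # The sum must exceed 0.5, which happens when at least one input is 1.
    return perceptron(x1, x2, 1.0, 1.0, -0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", and_gate(x1, x2), "OR:", or_gate(x1, x2))
```

For x1 = 1 and x2 = 0, the OR unit sees a sum of 1 - 0.5 > 0 and fires, while the AND unit sees 1 - 1.5 < 0 and stays off.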
3. XOR
An XOR gate, also called an Exclusive OR gate, has two inputs and one output.
The gate returns TRUE as the output if and ONLY if exactly one of the input states is true.
A B Output
0 0 0
0 1 1
1 0 1
1 1 0
Unlike the AND and OR gate, an XOR gate requires an intermediate hidden layer for
preliminary transformation in order to achieve the logic of an XOR gate.
In such a network, the weights are assigned so that the XOR conditions are met. XOR cannot be
implemented with a single-layer Perceptron and requires a Multi-layer Perceptron or MLP.
t3= threshold for H3; t4= threshold for H4; t5= threshold for O5
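One way such a network could be wired is sketched below, using the unit names H3, H4, and O5 and the thresholds t3, t4, and t5 from the line above. The specific weights and threshold values are assumptions chosen so that H3 behaves like OR, H4 like AND, and O5 fires when H3 is on and H4 is off; other assignments also work.

```python
def threshold_unit(inputs, weights, threshold):
    """Fires (1) when the weighted sum of the inputs exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Assumed threshold values for this sketch.
t3, t4, t5 = 0.5, 1.5, 0.5


def xor_gate(x1, x2):
    h3 = threshold_unit((x1, x2), (1.0, 1.0), t3)    # hidden unit H3: acts as OR
    h4 = threshold_unit((x1, x2), (1.0, 1.0), t4)    # hidden unit H4: acts as AND
    return threshold_unit((h3, h4), (1.0, -1.0), t5)  # output O5: H3 and not H4


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_gate(a, b))
```

The hidden layer remaps the four input points so that the output unit can separate them with a single threshold, which is not possible on the raw inputs.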
Next up, let us learn more about the Sigmoid activation function!
Sigmoid Curve
The sigmoid function, σ(t) = 1 / (1 + e^(-t)), is useful as an activation function when one is
interested in probability mapping rather than precise values of the input parameter t.
The sigmoid output is close to zero for highly negative input. There the gradient is also
close to zero, which can be a problem in neural network training: it can lead to slow
learning and the model getting trapped in local minima during training. Hence, the hyperbolic
tangent is often preferred as an activation function in hidden layers of a neural network.
The Perceptron output is 0.888, which indicates the probability of output y being a 1.
If the sigmoid outputs a value greater than 0.5, the output is marked as TRUE. Since the
output here is 0.888, the final output is marked as TRUE.
In the next section, let us focus on the rectifier and softplus functions.
Apart from the Sigmoid and Sign activation functions seen earlier, other common activation
functions are ReLU and Softplus. ReLU eliminates negative units, since max(0, x) outputs 0
for all inputs of 0 or less. Softplus, ln(1 + e^x), is a smooth approximation of ReLU.
A rectifier or ReLU (Rectified Linear Unit) is one of the most popular activation functions
used in deep neural networks. It allows one to eliminate negative units in an ANN.
Advantages:
Allows faster and more effective training of deep neural architectures on large and complex
datasets
Sparse activation of only about 50% of units in a neural network (as negative units are
eliminated)
Scales well
Disadvantages:
Non-differentiable at zero - Values close to zero may give inconsistent or intractable results.
Non-zero centered - Being non-zero centered creates asymmetry around data (only positive
values handled), leading to the uneven handling of data.
Unbounded - The output value has no limit and can lead to computational issues with large
values being passed through.
Dying ReLU problem - When the learning rate is too high, ReLU neurons can become inactive and
“die.”
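For reference, both functions fit in a few lines of Python. This is a minimal sketch using the usual definitions, ReLU as max(0, x) and Softplus as ln(1 + e^x); the sample inputs are arbitrary.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive inputs, outputs 0 otherwise."""
    return max(0.0, x)


def softplus(x):
    """Smooth approximation of ReLU: ln(1 + e^x).

    Note: math.exp overflows for very large x (roughly above 709); a
    production version would guard against that.
    """
    return math.log1p(math.exp(x))


for x in (-3.0, 0.0, 2.5, 10.0):
    print(x, relu(x), round(softplus(x), 4))
```

Softplus stays slightly above zero for negative inputs and approaches x for large positive inputs, so it is differentiable everywhere, unlike ReLU at zero.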
Softmax Function
Another very popular activation function is the Softmax function. The Softmax outputs
probability of the result belonging to a certain set of classes. It is akin to a categorization
logic at the end of a neural network. For example, it may be used at the end of a neural
network that is trying to determine if the image of a moving object contains an animal, a car,
or an airplane.
In probability theory, the output of the Softmax function represents a probability distribution
over M different outcomes.
In Softmax, the probability of a particular sample with net input z belonging to the ith class
can be computed with a normalization term in the denominator, that is, the sum of the
exponentials of all M linear functions:
P(y = i | z) = e^(z_i) / ∑_{j=1..M} e^(z_j)
The Softmax function is used in ANNs and Naïve Bayes classifiers.
For example, if we take an input of [1,2,3,4,1,2,3], the Softmax of that is [0.024, 0.064,
0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where the original input
is 4, which shows how the function highlights the largest value and suppresses values well
below the maximum.
This code implements the softmax formula and prints the probability of belonging to one of
the three classes. The sum of probabilities across all classes is 1.
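A minimal sketch of such code is given here, assuming three hypothetical net input values for the three classes. Subtracting the maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import math


def softmax(z):
    """Exponentiate each net input and normalize so the outputs sum to 1."""
    shift = max(z)  # stability shift; cancels out in the ratio
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical net inputs for three classes (assumed values).
three_class = softmax([2.0, 1.0, 0.1])
print("probabilities:", [round(p, 3) for p in three_class])
print("sum:", round(sum(three_class), 6))

# The seven-element example from the text, rounded to three decimals.
example = [round(p, 3) for p in softmax([1, 2, 3, 4, 1, 2, 3])]
print(example)  # [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]
```

The class with the largest net input always receives the largest probability, so taking the index of the maximum gives the predicted class.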
Hyperbolic Functions
1. Hyperbolic Tangent
The advantage of the hyperbolic tangent over the logistic function is that it has a broader
output spectrum and ranges in the open interval (-1, 1), which can improve the convergence
of the backpropagation algorithm.
3. Hyperbolic Tangent
This code implements the tanh formula. Then it calls both logistic and tanh functions on the z
value. The tanh function has two times larger output space than the logistic function.
With larger output space and symmetry around zero, the tanh function leads to more even
handling of data, and it is easier to arrive at the global minimum of the loss function.
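A minimal sketch of such a comparison is shown below, with tanh written out from its definition (e^z - e^(-z)) / (e^z + e^(-z)). The sample z values and function names are assumptions for the example.

```python
import math


def logistic(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Hyperbolic tangent from its definition, output in (-1, 1)."""
    e_pos = math.exp(z)
    e_neg = math.exp(-z)
    return (e_pos - e_neg) / (e_pos + e_neg)


# Call both functions on the same z values (arbitrary sample points).
for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, round(logistic(z), 4), round(tanh(z), 4))

# tanh is a rescaled logistic: tanh(z) = 2 * logistic(2z) - 1.
z = 0.5
rescaled = 2.0 * logistic(2.0 * z) - 1.0
print(abs(tanh(z) - rescaled) < 1e-12)
```

The rescaling identity in the last lines is where the doubled output range comes from: the logistic output interval (0, 1) is stretched to (-1, 1) and centered on zero.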
Various activation functions that can be used with Perceptron are shown below:
The activation function to be used is a subjective decision taken by the data scientist, based
on the problem statement and the form of the desired results. If the learning process is slow or
has vanishing or exploding gradients, the data scientist may try to change the activation
function to see if these problems can be resolved.
Future of Perceptron
With the increasing popularity and usage of Machine Learning, the future of the Perceptron
looks significant and promising. It helps to interpret data by learning underlying patterns
and applying them to new inputs. Programming continues to evolve, and perceptron technology
will continue to support and facilitate analytical behavior in machines, adding further
efficiency to modern computers.