Unit 1 Notes
Review of Transistor as a Switch, Logic Gates and Truth Tables. Characteristics of Neural Networks,
Historical Development of Neural Networks, Biological Neuron and its Artificial Model, McCulloch-
Pitts Neuron Model, Thresholding Logic Functions, Neural Network Learning Rules, Perceptron
Learning Algorithm, Perceptron Model, Simulation of Logic Gates, Limitations of Perceptron Learning.
A synapse is able to increase or decrease the strength of the connection; this is where information is stored. In the artificial model, the signals are changed by weights, in a manner similar to the physical changes that occur in the synapses.
The brain contains approximately 10^11 neurons, whereas artificial networks reach roughly 10^2 to 10^4 neurons with current technology.
Difference between the human brain and computers in terms of how information is processed.
Our brain changes its connectivity over time to represent new information and requirements, whereas the connectivity between the electronic components in a computer never changes unless we replace its components.
McCulloch-Pitts Model
Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose, the
connection weights and the threshold value of the activation function need to be chosen correctly.
For a better understanding, consider an example:
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need to
decide when John will carry the umbrella. The situations are as follows:
First scenario: It is neither raining nor sunny.
Second scenario: It is not raining, but it is sunny.
Third scenario: It is raining, but it is not sunny.
Fourth scenario: It is both raining and sunny.
To analyse the situations using the McCulloch-Pitts neural model, I can consider the input signals
as follows:
X1: Is it raining?
X2 : Is it sunny?
So, the value of each input can be either 0 or 1. We can set both weights w1 and w2 to 1 and the
threshold value to 1. The resulting neural network model and its truth table are as follows:
Scenario  x1  x2  ysum  yout
1         0   0   0     0
2         0   1   1     1
3         1   0   1     1
4         1   1   2     1
The truth table built with respect to the problem is depicted above. From the truth table, I can conclude
that in the situations where the value of yout is 1, John needs to carry an umbrella. Hence, he will need to
carry an umbrella in scenarios 2, 3 and 4.
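As a quick check on this table, here is a minimal Python sketch of the umbrella neuron, using the values assumed in the example above (both weights equal to 1, threshold equal to 1):

```python
# McCulloch-Pitts neuron for the umbrella example (assumed values: w1 = w2 = 1, threshold = 1).
def mp_umbrella(x1, x2, w1=1, w2=1, theta=1):
    y_sum = w1 * x1 + w2 * x2           # weighted sum of the two binary inputs
    return 1 if y_sum >= theta else 0   # fire (carry the umbrella) only if the sum reaches the threshold

# Evaluate the four scenarios: (is it raining?, is it sunny?)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, mp_umbrella(x1, x2))  # prints 0 for (0, 0) and 1 otherwise, matching the table
```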
2. Rosenblatt’s Perceptron
Rosenblatt’s perceptron is built around the McCulloch-Pitts neural model. The diagrammatic
representation is as follows:
Rosenblatt’s Perceptron
The perceptron receives a set of inputs x1, x2, …, xn. The linear combiner or adder node
computes the linear combination of the inputs applied to the synapses, with synaptic weights
w1, w2, …, wn. Then, the hard limiter checks whether the resulting sum is positive or negative. If
the input of the hard limiter node is positive, the output is +1, and if the input is negative, the
output is -1. Mathematically, the hard limiter input is:
v = w1*x1 + w2*x2 + … + wn*xn = ∑ wi*xi (i = 1, …, n)
However, the perceptron includes an adjustable value or bias as an additional weight w0. This
additional weight is attached to a dummy input x0, which is assigned a value of 1. This
consideration modifies the above equation to:
v = ∑ wi*xi (i = 0, …, n), with x0 = 1
The objective of the perceptron is to classify a set of inputs into two classes, c1 and c2. This can be
done using a very simple decision rule: assign the inputs to c1 if the output of the perceptron, i.e.
yout, is +1, and to c2 if yout is -1. So, for an n-dimensional signal space, i.e. a space for 'n' input signals, the
simplest form of perceptron will have two decision regions, corresponding to the two classes, separated by a
hyperplane defined by:
∑ wi*xi = 0 (i = 0, …, n)
Therefore, for two input signals denoted by the variables x1 and x2, the decision boundary is a
straight line of the form:
w0 + w1*x1 + w2*x2 = 0
or
x2 = -(w1/w2)*x1 - w0/w2
So, for a perceptron having the values of the synaptic weights w0, w1 and w2 as -2, 1/2 and 1/4,
respectively, the linear decision boundary will be of the form:
(1/2)*x1 + (1/4)*x2 - 2 = 0, i.e. 2*x1 + x2 - 8 = 0
So, any point (x1, x2) which lies above the decision boundary, as depicted by the graph, will be assigned
to class c1 and the points which lie below the boundary are assigned to class c2.
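A minimal Python sketch, assuming the weights quoted above (w0 = -2, w1 = 1/2, w2 = 1/4), that applies this decision rule to two sample points:

```python
# Perceptron decision rule for the worked example (assumed weights w0 = -2, w1 = 0.5, w2 = 0.25).
w0, w1, w2 = -2.0, 0.5, 0.25

def classify(x1, x2):
    v = w0 + w1 * x1 + w2 * x2       # hard limiter input, bias included via w0
    return "c1" if v > 0 else "c2"   # positive side of the boundary -> c1, otherwise c2

print(classify(4, 4))  # 0.5*4 + 0.25*4 - 2 = 1.0  -> above the boundary, class c1
print(classify(1, 1))  # 0.5 + 0.25 - 2 = -1.25    -> below the boundary, class c2
```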
Thus, we see that for a data set with linearly separable classes, perceptrons can always be
employed to solve classification problems using decision lines (for 2-dimensional space),
decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional space).
Appropriate values of the synaptic weights can be obtained by training a perceptron. However,
one assumption for the perceptron to work properly is that the two classes should be linearly
separable, i.e. it must be possible to separate the classes with a straight line (or hyperplane).
Otherwise, if the classes are non-linearly separable, the classification problem cannot be solved
by a basic perceptron.
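The training procedure is not spelled out at this point in the notes; the following is only a sketch of the classic perceptron learning rule (weights are nudged by eta * target * input whenever an example is misclassified), with an illustrative toy data set and learning rate:

```python
# Classic perceptron learning rule on a small, linearly separable toy data set.
def train_perceptron(data, eta=0.1, epochs=50):
    # data: list of ((x1, x2), target) pairs, targets are +1 or -1
    w = [0.0, 0.0, 0.0]                                   # [w0 (bias weight), w1, w2]
    for _ in range(epochs):
        for (x1, x2), target in data:
            x = [1.0, x1, x2]                             # dummy input x0 = 1 carries the bias
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if y != target:                               # update only on a misclassification
                w = [wi + eta * target * xi for wi, xi in zip(w, x)]
    return w

# Toy example: points with x1 + x2 >= 2 belong to class +1, the rest to class -1.
toy = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1), ((2, 1), 1), ((1, 2), 1)]
print(train_perceptron(toy))   # the learned weights define a separating line
```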
Multi-layer perceptron: A basic perceptron works very successfully for data sets which possess
linearly separable patterns. However, in practical situations, that is rarely the case. This
was exactly the point made by Minsky and Papert in their work in 1969. They showed that a basic
perceptron is not able to learn to compute even a simple 2-bit XOR. So, let us understand the
reason.
Consider a truth table highlighting the output of a 2-bit XOR function:
x1 x2 x1 XOR x2 Class
1 1 0 c2
1 0 1 c1
0 1 1 c1
0 0 0 c2
The data is not linearly separable. Only a curved decision boundary can separate the classes
properly. To address this issue, the other option is to use two decision boundary lines in place of
one.
This is the philosophy used to design the multi-layer perceptron model. The major highlights of
this model are as follows:
The neural network contains one or more intermediate layers between the input and output
nodes, which are hidden from both input and output nodes
Each neuron in the network includes a non-linear activation function that is differentiable.
The neurons in each layer are connected with some or all the neurons in the previous layer.
For the McCulloch-Pitts neuron, the connection weights from x1, x2, …, xn are excitatory, denoted by 'w', and the connection
weights from xn+1, xn+2, …, xn+m are inhibitory, denoted by '-p'.
-> The McCulloch-Pitts neuron Y has the activation function
f(yin) = 1 if yin >= Θ
f(yin) = 0 if yin < Θ
where Θ is the threshold value and yin = Σ xi*wi is the total net input signal received by neuron Y.
-> The McCulloch-Pitts neuron will fire if it receives k or more excitatory inputs and no
inhibitory inputs, i.e. the threshold must satisfy
k*w >= Θ > (k-1)*w
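A minimal sketch of this firing rule, using an assumed example: two excitatory inputs with weight w = 1, one inhibitory input, and threshold Θ = 2 (so k = 2 excitatory inputs must be active, and any inhibitory input blocks firing):

```python
# McCulloch-Pitts neuron with excitatory and inhibitory inputs (assumed w = 1, theta = 2, i.e. k = 2).
def mp_neuron(excitatory, inhibitory, w=1, theta=2):
    if any(inhibitory):                 # absolute inhibition: any active inhibitory input prevents firing
        return 0
    y_in = w * sum(excitatory)          # net input contributed by the excitatory connections
    return 1 if y_in >= theta else 0

print(mp_neuron([1, 1], [0]))   # 1: k = 2 excitatory inputs active, no inhibition
print(mp_neuron([1, 0], [0]))   # 0: below the threshold
print(mp_neuron([1, 1], [1]))   # 0: inhibited
```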
The perceptron model is treated as one of the simplest types of artificial neural networks. It is a
supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with
four main parameters: input values, weights and bias, net sum, and an activation function.
Binary classifiers of this kind can be considered linear classifiers. In simple words, such a classifier is a classification
algorithm whose prediction is based on a linear predictor function combining a weight vector with the feature vector.
o Input Nodes or Input Layer:
This is the primary component of the perceptron, which accepts the initial data into the system for further processing.
Each input node holds a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units and is another important component
of the perceptron. A weight is directly proportional to how strongly the associated input neuron influences the
output. The bias can be considered as the intercept term in a linear equation.
o Activation Function:
These are the final and important components that help to determine whether the neuron will fire or not. The
activation function can primarily be considered a step function. Common types of activation function include:
o Sign function
o Step function, and
o Sigmoid function
The data scientist chooses the activation function based on the problem statement and the desired outputs. The
activation function used (e.g., sign, step, or sigmoid) may differ between perceptron models depending on whether
the learning process is slow or suffers from vanishing or exploding gradients.
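For concreteness, a minimal Python sketch of the three functions just mentioned (the conventions at the boundary are assumptions; textbooks differ on the value returned at exactly zero):

```python
import math

def sign(x):                       # sign function: -1 for negative input, +1 otherwise
    return 1 if x >= 0 else -1

def step(x, threshold=0.0):        # step (hard-limit) function: 0 below the threshold, 1 at or above it
    return 1 if x >= threshold else 0

def sigmoid(x):                    # sigmoid: smooth, differentiable squashing into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))
```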
Step-1
In the first step, multiply all input values by their corresponding weight values and then add the products to determine
the weighted sum. Mathematically, the weighted sum is ∑wi*xi.
Add a special term called the bias 'b' to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function is applied to the above weighted sum, which gives us an output
either in binary form or as a continuous value, as follows:
Y = f(∑wi*xi + b)
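Putting Step-1 and Step-2 together, a minimal sketch of a single perceptron's forward computation (the weights and bias below are placeholder values, not learned ones; they happen to realise an AND gate):

```python
# Forward pass of a single perceptron: weighted sum plus bias, then a step activation.
def perceptron_output(inputs, weights, bias):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias   # Step-1
    return 1 if weighted_sum > 0 else 0                                 # Step-2 (step activation)

print(perceptron_output([1, 1], [1, 1], -1))   # 1
print(perceptron_output([1, 0], [1, 1], -1))   # 0
```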
In a single-layer perceptron model, the algorithm does not rely on previously recorded data, so it begins with
randomly allocated values for the weight parameters. It then sums up all the weighted inputs; if the total
sum exceeds a pre-determined threshold value, the model is activated and shows the output value as +1.
The multi-layer perceptron model is typically trained with the backpropagation algorithm, which executes in two
stages as follows (a minimal single-neuron illustration is sketched after the list):
o Forward Stage: Activation functions are evaluated starting from the input layer and terminating at the
output layer.
o Backward Stage: In the backward stage, the weight and bias values are modified as required by the model.
The error between the actual output and the desired output is propagated backward, starting at the output
layer and ending at the input layer.
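The following is only a single-neuron illustration of the two stages (a full multi-layer network repeats the backward computation layer by layer via the chain rule); the weight, bias, learning rate and training example are assumed values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w, b, lr = 0.5, 0.0, 0.1            # assumed initial weight, bias and learning rate
x, target = 1.0, 1.0                # one assumed training example

y = sigmoid(w * x + b)              # forward stage: compute the actual output
error = y - target                  # difference between the actual and desired output
grad = error * y * (1 - y)          # gradient of the squared error w.r.t. the pre-activation
w -= lr * grad * x                  # backward stage: adjust the weight against the gradient
b -= lr * grad                      # ... and the bias
print(w, b)
```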
Hence, a multi-layer perceptron model can be considered as an artificial neural network with multiple layers,
in which the activation functions are non-linear and differentiable, unlike the hard-limit function of a single-layer
perceptron. Activation functions such as sigmoid, TanH, or ReLU can be used in deployment.
A multi-layer perceptron model has greater processing power and can process both linear and non-linear patterns.
Further, it can also implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR and NOR.
However, the multi-layer perceptron also has limitations:
o In a multi-layer perceptron, it is difficult to determine how much each independent variable affects the
dependent variable.
o The functioning of the model depends on the quality of the training.
Perceptron Function
The perceptron function 'f(x)' is obtained by multiplying the input vector 'x' by the learned weight vector 'w',
adding the bias 'b', and thresholding the result:
f(x) = 1 if w·x + b > 0
f(x) = 0 otherwise
Characteristics of Perceptron
The perceptron model has the following characteristics.
o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit transfer function.
o A perceptron can only be used to classify linearly separable sets of input vectors. If the input vectors are not
linearly separable, a single perceptron cannot classify them correctly.
Future of Perceptron
The future of the perceptron model is bright and significant, as it helps to interpret data by building intuitive
patterns and applying them in the future. Machine learning is a rapidly growing branch of artificial intelligence
that is continuously evolving; hence, perceptron technology will continue to support and facilitate analytical
behavior in machines, which will, in turn, add to the efficiency of computers.
The perceptron model is continuously becoming more advanced and working efficiently on complex problems with
the help of artificial neurons.
For the above neuron architecture, the net input is calculated as
I = x*A + y*B
where x and y are the activations of the input neurons X and Y, and A and B are the corresponding connection
weights. The output of the output neuron Z can be obtained by applying an activation function to the net input:
O = f(I)
Output = Function (net input calculated)
The function applied to the net input is called the activation function. There are
various possible activation functions for this purpose.
1. Every new technology needs assistance from the previous one, i.e. data from earlier systems. These data are
analyzed so that the pros and cons can be studied correctly; neural networks can help with this kind of analysis.
2. Neural networks are suitable for research on animal behavior, predator/prey
relationships and population cycles.
3. It would be easier to carry out a proper valuation of property, buildings, automobiles, machinery,
etc. with the help of neural networks.
4. Neural networks can be used in betting on horse races, sporting events, and, most
importantly, in the stock market.
5. They can be used to predict an appropriate judgment for a crime by using a large dataset of crime
details as input and the resulting sentences as output.
6. Data mining, cleaning and validation can be achieved through neural networks by analyzing data and
determining which records are faulty (files diverging from their peers).
7. Neural networks can be used to predict targets with the help of echo patterns obtained from
sonar, radar, seismic and magnetic instruments.
8. They can be used efficiently in employee hiring, so that a company can hire the right
employee based on the skills the employee has and their likely productivity in
the future.
9. They have wide application in medical research.
10. They can be used for fraud detection regarding credit cards, insurance or taxes by
analyzing past records.
Neural Representation of AND, OR, NOT, XOR and XNOR Logic Gates (Perceptron
Algorithm)
While taking the Udacity PyTorch course by Facebook, I found it difficult to understand
how the perceptron works with logic gates (AND, OR, NOT, and so on). I decided to
check online resources, but as of the time of writing this, there was really no clear explanation
of how to go about it. So, after personal reading, I finally understood how to do it,
which is the reason for this post.
Note: The purpose of this article is NOT to mathematically explain how the neural
network updates the weights, but to explain the logic behind how the values are being
changed in simple terms.
Also, the steps in this method are very similar to how neural networks learn, which is as
follows: forward propagate, compare the output with the expected value, and adjust the weights whenever a row
is classified incorrectly.
AND Gate
From our knowledge of logic gates, we know that an AND logic table is given by the
diagram below
AND Gate
The question is, what are the weights and bias for the AND perceptron?
First, we need to understand that the output of an AND gate is 1 only if both inputs (in this
case, x1 and x2) are 1. So, following the steps listed above;
Row 1
From w1*x1+w2*x2+b, initializing w1, w2, as 1 and b as –1, we get;
x1(1)+x2(1)–1
Passing the first row of the AND logic table (x1=0, x2=0), we get;
0+0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. Therefore, this row is correct, and there is no
need for backpropagation.
Row 2
Passing (x1=0 and x2=1), we get;
0+1–1 = 0
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is correct, as the output is 0
for the AND gate.
Row 3 (x1=1, x2=0) gives 1+0–1 = 0 in the same way, so from the Perceptron rule, this works for rows 1, 2 and 3.
Row 4
Passing (x1=1 and x2=1), we get;
1+1–1 = 1
Again, from the perceptron rule, this is still valid.
Therefore, we can conclude that the model to achieve an AND gate, using the Perceptron
algorithm is;
x1+x2–1
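A small sketch (the helper names are illustrative, not from the course) that checks a candidate model against a gate's truth table; the same check can be reused for the OR, NOT, NOR and NAND models derived below:

```python
# Verify a perceptron model (weights, bias) against a gate's truth table.
def predict(weights, bias, inputs):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 0 if s <= 0 else 1                 # Perceptron rule: Wx + b <= 0 -> y` = 0, otherwise y` = 1

AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

for inputs, expected in AND_TABLE.items():    # check the model x1 + x2 - 1 derived above
    assert predict([1, 1], -1, inputs) == expected
print("x1 + x2 - 1 implements AND")
```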
OR Gate
Row 1
x1(1)+x2(1)–1
Passing the first row of the OR logic table (x1=0, x2=0), we get;
0+0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. Therefore, this row is correct.
Row 2
Passing (x1=0 and x2=1), we get;
0+1–1 = 0
From the Perceptron rule, if Wx+b <= 0, then y`=0. Therefore, this row is incorrect.
So we want values that will make inputs x1=0 and x2=1 give y` a value of 1. If we
change w2 to 2, we have;
0+2–1 = 1
From the Perceptron rule, this is correct for both rows 1 and 2.
Row 3
Passing (x1=1 and x2=0), we get;
1+0–1 = 0
From the Perceptron rule, if Wx+b <= 0, then y`=0. Therefore, this row is incorrect.
Since it is similar to row 2, we can just change w1 to 2 as well; we have:
2+0–1 = 1
From the Perceptron rule, this is correct for rows 1, 2 and 3.
Row 4
Passing (x1=1 and x2=1), we get;
2+2–1 = 3
Again, from the perceptron rule, this is still valid. Quite Easy!
Therefore, we can conclude that the model to achieve an OR gate, using the Perceptron
algorithm is;
2x1+2x2–1
NOT Gate
From the diagram, the output of a NOT gate is the inverse of a single input. So,
following the steps listed above;
Row 1
From w1x1+b, initializing w1 as 1 (since single input), and b as –1, we get;
x1(1)–1
Passing the first row of the NOT logic table (x1=0), we get;
0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NOT gate.
So we want values that will make input x1=0 to give y` a value of 1. If we change b to 1,
we have;
0+1 = 1
From the Perceptron rule, this works.
Row 2
Passing (x1=1), we get;
1+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output
is 0 for the NOT gate.
So we want values that will make input x1=1 to give y` a value of 0. If we change w1 to
–1, we have;
–1+1 = 0
From the Perceptron rule, if Wx+b ≤ 0, then y`=0. Therefore, this works (for both row 1
and row 2).
Therefore, we can conclude that the model to achieve a NOT gate, using the Perceptron
algorithm is;
–x1+1
NOR Gate
From the diagram, the NOR gate is 1 only if both inputs are 0.
Row 1
From w1x1+w2x2+b, initializing w1 and w2 as 1, and b as –1, we get;
x1(1)+x2(1)–1
Passing the first row of the NOR logic table (x1=0, x2=0), we get;
0+0–1 = –1
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NOR gate.
So we want values that will make input x1=0 and x2 = 0 to give y` a value of 1. If we
change b to 1, we have;
0+0+1 = 1
From the Perceptron rule, this works.
Row 2
Passing (x1=0, x2=1), we get;
0+1+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOR gate.
So we want values that will make input x1=0 and x2 = 1 to give y` a value of 0. If we
change w2 to –1, we have;
0–1+1 = 0
From the Perceptron rule, this is valid for both row 1 and row 2.
Row 3
Passing (x1=1, x2=0), we get;
1+0+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is incorrect, as the output is 0
for the NOR gate.
So we want values that will make input x1=1 and x2 = 0 give y` a value of 0. If we
change w1 to –1, we have;
–1+0+1 = 0
From the Perceptron rule, this is valid for both row 1, 2 and 3.
Row 4
Passing (x1=1, x2=1), we get;
-1-1+1 = -1
From the Perceptron rule, if Wx+b ≤ 0, then y`=0, which is the expected output for the NOR gate, so this is valid for all rows.
Therefore, we can conclude that the model to achieve a NOR gate, using the Perceptron
algorithm is;
-x1-x2+1
NAND Gate
From the diagram, the NAND gate is 0 only if both inputs are 1.
Row 1
From w1x1+w2x2+b, initializing w1 and w2 as 1, and b as -1, we get;
x1(1)+x2(1)-1
Passing the first row of the NAND logic table (x1=0, x2=0), we get;
0+0-1 = -1
From the Perceptron rule, if Wx+b≤0, then y`=0. This row is incorrect, as the output is 1
for the NAND gate.
So we want values that will make input x1=0 and x2 = 0 to give y` a value of 1. If we
change b to 1, we have;
0+0+1 = 1
From the Perceptron rule, this works.
Row 2
Passing (x1=0, x2=1), we get;
0+1+1 = 2
From the Perceptron rule, if Wx+b > 0, then y`=1. This row is also correct (for both row
2 and row 3).
Row 4
Passing (x1=1, x2=1), we get;
1+1+1 = 3
This is not the expected output, as the output is 0 for a NAND combination of x1=1 and
x2=1.
Changing values of w1 and w2 to -1, and value of b to 2, we get;
-1-1+2 = 0
It works for all rows.
Therefore, we can conclude that the model to achieve a NAND gate, using the Perceptron
algorithm is;
-x1-x2+2
XNOR Gate
Now that we are done with the necessary basic logic gates, we can combine them to give an
XNOR gate.
The boolean representation of an XNOR gate is;
x1x2 + x1`x2`
Where ‘`' means inverse.
From the expression, we can say that the XNOR gate consists of an AND gate (x1x2), a
NOR gate (x1`x2`), and an OR gate.
AND (x1+x2–1)
NOR (-x1-x2+1)
OR (2x1+2x2–1)
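A minimal sketch (the helper names are illustrative) that wires these three models together: the AND and NOR outputs are computed first, and the OR model combines them to give XNOR:

```python
# Compose XNOR from the perceptron models listed above.
def unit(weights, bias, inputs):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 0 if s <= 0 else 1

def xnor(x1, x2):
    a = unit([1, 1], -1, [x1, x2])     # AND:  x1 + x2 - 1
    n = unit([-1, -1], 1, [x1, x2])    # NOR: -x1 - x2 + 1
    return unit([2, 2], -1, [a, n])    # OR:  2a + 2n - 1 over the two intermediate outputs

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xnor(x1, x2))        # expected: 1, 0, 0, 1
```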
XOR Gate
The boolean representation of an XOR gate is;
x1x2` + x1`x2
We first simplify the boolean expression:
x1`x2 + x1x2` + x1`x1 + x2`x2
x1(x1` + x2`) + x2(x1` + x2`)
(x1 + x2)(x1x2)`
From the simplified expression, we can say that the XOR gate consists of an OR gate (x1 + x2), a
NAND gate ((x1x2)`) and an AND gate combining their outputs. Using the perceptron models derived above:
OR (2x1+2x2–1)
NAND (-x1-x2+2)
AND (x1+x2–1)
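As with XNOR, a short sketch (the helper names are illustrative) composing these three models: the OR and NAND outputs feed the AND model, which yields XOR:

```python
# Compose XOR from the perceptron models listed above.
def unit(weights, bias, inputs):
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 0 if s <= 0 else 1

def xor(x1, x2):
    o = unit([2, 2], -1, [x1, x2])     # OR:   2x1 + 2x2 - 1
    nd = unit([-1, -1], 2, [x1, x2])   # NAND: -x1 - x2 + 2
    return unit([1, 1], -1, [o, nd])   # AND:  o + nd - 1 over the two intermediate outputs

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor(x1, x2))         # expected: 0, 1, 1, 0
```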