Perceptron
Neural Network

A neural network has a set of inputs and a set of outputs; the number of inputs and outputs is variable. The network itself is composed of an arbitrary number of nodes or units, connected by links, with an arbitrary topology. A link from unit j to unit i serves to propagate the activation a_j to i, and it has a weight W_{j,i}.

[Figure: network diagram with output units Output 0, Output 1, ..., Output m]
What can neural networks do?
- Compute a known function / approximate an unknown function
- Pattern recognition / signal processing
- Learn to do any of the above
Different types of nodes
An Artificial Neuron
Node or Unit: A Mathematical Abstraction

An artificial neuron (node, unit, or processing unit i) is a processing element producing an output based on a function of its inputs:

- Input edges, each with a weight (positive or negative; weights change over time through learning).
- Input function: the weighted sum of the inputs, including a fixed input a_0:

  in_i = \sum_{j=0}^{n} W_{j,i} a_j

- Activation function g (typically non-linear), applied to the input function:

  a_i = g(in_i) = g\left(\sum_{j=0}^{n} W_{j,i} a_j\right)

- Output edges, each with weights (positive or negative; change over time through learning).

Note: the fixed input and bias weight are conventional; some authors use, e.g., a_0 = 1 and weight -W_{0,i}.
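A minimal sketch of such a unit in Python (the function names, the a_0 = -1 convention, and the sigmoid choice are illustrative assumptions, not from the slides):

```python
import math

def unit_output(weights, inputs, activation):
    """Compute a_i = g(sum_j W_ji * a_j) for a single unit.

    weights[0] is the bias weight W_0i, paired with the fixed input a_0 = -1.
    """
    a = [-1.0] + list(inputs)                      # prepend fixed input a_0
    in_i = sum(w * x for w, x in zip(weights, a))  # weighted sum in_i
    return activation(in_i)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def threshold(z):
    return 1 if z >= 0 else 0

print(unit_output([1.5, 1.0, 1.0], [1, 1], threshold))  # AND-like unit -> 1
print(unit_output([1.5, 1.0, 1.0], [1, 0], sigmoid))    # soft version -> ~0.38
```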
Activation Functions

For threshold units, the activation condition can be rewritten to expose the threshold:

in_i = \sum_{j=0}^{n} W_{j,i} a_j \ge 0 \iff \sum_{j=1}^{n} W_{j,i} a_j + W_{0,i} a_0 \ge 0

Defining a_0 = -1, we get

\sum_{j=1}^{n} W_{j,i} a_j \ge W_{0,i}, \quad \text{i.e., } \theta_i = W_{0,i},

where \theta_i is the threshold value associated with unit i.

[Plots: step activation functions with threshold \theta_i = 0 and with \theta_i = t]
Implementing Boolean Functions

Activation of threshold units when:

\sum_{j=1}^{n} W_{j,i} a_j \ge W_{0,i}
Boolean AND

x1 x2 | output
0  0  |   0
0  1  |   0
1  0  |   0
1  1  |   1

Unit: bias input a_0 = -1 with weight W_0 = 1.5; inputs x_1, x_2 with weights w_1 = 1, w_2 = 1.

Activation of threshold units when: \sum_{j=1}^{n} W_{j,i} a_j \ge W_{0,i}
Boolean OR

x1 x2 | output
0  0  |   0
0  1  |   1
1  0  |   1
1  1  |   1

Unit: bias input a_0 = -1 with weight w_0 = 0.5; inputs x_1, x_2 with weights w_1 = 1, w_2 = 1.

Activation of threshold units when: \sum_{j=1}^{n} W_{j,i} a_j \ge W_{0,i}
Inverter

input x1 | output
    0    |   1
    1    |   0

Unit: bias input a_0 = -1 with weight w_0 = -0.5; input x_1 with weight w_1 = -1.

Activation of threshold units when: \sum_{j=1}^{n} W_{j,i} a_j \ge W_{0,i}

So, units with a threshold activation function can act as logic gates, given the appropriate input and bias weights.
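These three gates can be checked directly; a small Python sketch (the helper name `threshold_unit` is my own):

```python
def threshold_unit(w0, ws, xs):
    """Fire (output 1) iff sum_j w_j x_j >= w0: the bias weight acts as the threshold."""
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= w0 else 0

def AND(x1, x2): return threshold_unit(1.5, [1, 1], [x1, x2])
def OR(x1, x2):  return threshold_unit(0.5, [1, 1], [x1, x2])
def NOT(x1):     return threshold_unit(-0.5, [-1], [x1])

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
print("NOT:", NOT(0), NOT(1))  # 1 0
```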
Network Structures

Feed-forward networks:
- implement functions; have no internal state (only weights)

Recurrent networks:
- feed the outputs back into their own inputs
- the network is a dynamical system (stable states, oscillations, chaotic behavior)
- the response of the network depends on the initial state
- can support short-term memory
- more difficult to understand
Feed-forward Network:
Represents a Function of Its Input

Given an input vector x = (x_1, x_2), the activations of the input units are set to the values of the input vector, i.e., (a_1, a_2) = (x_1, x_2), and the network computes its outputs from them. The weights are the parameters of the function.

ROSENBLATT, Frank (Cornell Aeronautical Laboratory, Cornell University). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65(6): 386-408, 1958.
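To make "the network computes a function of its input" concrete, here is a minimal two-input, one-hidden-layer computation in Python; the topology and the specific weight values are invented for illustration:

```python
import math

def g(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(x1, x2):
    # Input activations: (a1, a2) = (x1, x2).
    a3 = g(0.5 * x1 - 1.0 * x2)     # hidden unit: a3 = g(W13*a1 + W23*a2)
    a4 = g(1.0 * x1 + 2.0 * x2)     # hidden unit: a4 = g(W14*a1 + W24*a2)
    a5 = g(2.0 * a3 - 1.5 * a4)     # output unit: a5 = g(W35*a3 + W45*a4)
    return a5

print(feed_forward(1.0, 0.0))  # changing any weight changes the function computed
```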
Single Layer Feed-forward Neural Networks: Perceptrons
Perceptron to Learn to Identify Digits
(from Pat Winston, MIT)

Seven line segments are enough to produce all 10 digits.

[Figure: seven-segment display with segments numbered 0 through 6]

Digit | x0 x1 x2 x3 x4 x5 x6
  0   | 0  1  1  1  1  1  1
  9   | 1  1  1  1  1  1  0
  8   | 1  1  1  1  1  1  1
  7   | 0  0  1  1  1  0  0
  6   | 1  1  1  0  1  1  1
  5   | 1  1  1  0  1  1  0
  4   | 1  1  0  1  1  0  0
  3   | 1  0  1  1  1  1  0
  2   | 1  0  1  1  0  1  1
  1   | 0  0  0  1  1  0  0
Perceptron to Learn to Identify Digits
(from Pat Winston, MIT)

A vision system reports which of the seven segments are on. When the input digit is 0, the inputs are (x0, ..., x6) = (0, 1, 1, 1, 1, 1, 1); what's the value of the sum?

[Figure: perceptron over the seven segment inputs feeding a threshold unit]

If Sum > 0 then output = 1, else output = 0.
Perceptron Learning: Intuition

Weight update:
- Inputs I_j (j = 1, 2, ..., n); a single output O; target output T.
- Consider some initial weights and define the example error: Err = T - O.
- Now just move the weights in the right direction!
  - Err > 0: need to increase O; Err < 0: need to decrease O.
- Each input unit j contributes W_j I_j to the total input:
  - if I_j is positive, increasing W_j tends to increase O;
  - if I_j is negative, decreasing W_j tends to increase O.
- So, use: W_j <- W_j + I_j \times Err

This is the Perceptron Learning Rule (Rosenblatt, 1960), here with learning rate \alpha = 1.

Threshold function: S = \sum_{k=0}^{n} w_k x_k; if S > 0 then O = 1 else O = 0.
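One application of the rule in Python (a sketch; names are illustrative):

```python
def perceptron_output(w, x):
    """Threshold unit: O = 1 if S = sum_k w_k x_k > 0, else 0."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) > 0 else 0

def update(w, x, target, alpha=1.0):
    """Perceptron learning rule: W_j <- W_j + alpha * I_j * Err, with Err = T - O."""
    err = target - perceptron_output(w, x)
    return [wk + alpha * xk * err for wk, xk in zip(w, x)]

w = [0.0, 0.0, 0.0]
w = update(w, [1, 0, 1], 1)  # output 0 but target 1, so weights grow by the inputs
print(w)                     # [1.0, 0.0, 1.0]
```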
Perceptron Learning: Simple Example

Err = T - O
W_j <- W_j + I_j \times Err

Set of examples, each a pair (x_i, y_i), i.e., an input vector and a label y (0 or 1). The learning procedure is called the "error correcting method". It provably converges (in a polynomial number of steps) if the function is representable by a perceptron (i.e., linearly separable).

Sample | x0 x1 x2 | label
   1   | 1  0  0  |   0
   2   | 1  0  1  |   1
   3   | 1  1  0  |   1
   4   | 1  1  1  |   1

Activation function: S = \sum_{k=0}^{n} w_k x_k; if S > 0 then O = 1 else O = 0.
Perceptron Learning: Error Correcting Method

- If the perceptron outputs 0 while it should output 1, add the input vector to the weight vector.
- If the perceptron outputs 1 while it should output 0, subtract the input vector from the weight vector.
- Otherwise do nothing.

We'll use a single perceptron with three inputs I_0, I_1, I_2 and start with all weights 0: W = <0,0,0>.

Epoch 1, Example 1: I = <1,0,0>, label = 0, W = <0,0,0>.
Perceptron: S = 1*0 + 0*0 + 0*0 = 0, output 0. It classifies it as 0, so correct; do nothing.

Example 2: I = <1,0,1>, label = 1, W = <0,0,0>.
Perceptron: S = 0, output 0, while it should be 1, so add the input to the weights: W = <0,0,0> + <1,0,1> = <1,0,1>.
Example 3: I = <1,1,0>, label = 1, W = <1,0,1>.
Perceptron: S = 1*1 + 1*0 + 0*1 = 1 > 0, output = 1. It classifies it as 1, correct; do nothing. W = <1,0,1>.

(Example 4: I = <1,1,1>, label = 1: S = 2 > 0, output 1, correct.)
Epoch 2, through the examples again, starting with W = <1,0,1>.

Example 1: I = <1,0,0>, label = 0, W = <1,0,1>.
Perceptron: S = 1*1 + 0*0 + 0*1 = 1 > 0, output = 1, while it should be 0, so subtract the input from the weights: W = <1,0,1> - <1,0,0> = <0,0,1>.

(Example 2: I = <1,0,1>, label = 1: S = 1 > 0, output 1, correct.)
Example 3: I = <1,1,0>, label = 1, W = <0,0,1>.
Perceptron: S = 1*0 + 1*0 + 0*1 = 0, output = 0, while it should be 1, so add the input to the weights: W = <0,0,1> + <1,1,0> = <1,1,1>.

(Example 4: I = <1,1,1>, label = 1: S = 3 > 0, output 1, correct.)
Epoch 3, through the examples again, starting with W = <1,1,1>.

Example 1: I = <1,0,0>, label = 0, W = <1,1,1>.
Perceptron: S = 1 > 0, output = 1, while it should be 0, so subtract the input: W = <1,1,1> - <1,0,0> = <0,1,1>.

(Example 2: I = <1,0,1>, label = 1: S = 1 > 0, output 1, correct.)

Example 3: I = <1,1,0>, label = 1, W = <0,1,1>.
Perceptron: S = 1*0 + 1*1 + 0*1 = 1 > 0, output = 1, correct; do nothing.

(Example 4 is also correct, and in epoch 4 example 1 gives S = 0, output 0, correct: the weights have converged to W = <0,1,1>.)
Perceptron Learning: Simple Example (full trace)

Epoch | Example   | x0 x1 x2 | Target | w0 w1 w2 | Output | Error | New w0 w1 w2
  1   | example 1 | 1  0  0  |   0    | 0  0  0  |   0    |   0   |  0  0  0
  1   | example 2 | 1  0  1  |   1    | 0  0  0  |   0    |   1   |  1  0  1
  1   | example 3 | 1  1  0  |   1    | 1  0  1  |   1    |   0   |  1  0  1
  1   | example 4 | 1  1  1  |   1    | 1  0  1  |   1    |   0   |  1  0  1
  2   | example 1 | 1  0  0  |   0    | 1  0  1  |   1    |  -1   |  0  0  1
  2   | example 2 | 1  0  1  |   1    | 0  0  1  |   1    |   0   |  0  0  1
  2   | example 3 | 1  1  0  |   1    | 0  0  1  |   0    |   1   |  1  1  1
  2   | example 4 | 1  1  1  |   1    | 1  1  1  |   1    |   0   |  1  1  1
  3   | example 1 | 1  0  0  |   0    | 1  1  1  |   1    |  -1   |  0  1  1
  3   | example 2 | 1  0  1  |   1    | 0  1  1  |   1    |   0   |  0  1  1
  3   | example 3 | 1  1  0  |   1    | 0  1  1  |   1    |   0   |  0  1  1
  3   | example 4 | 1  1  1  |   1    | 0  1  1  |   1    |   0   |  0  1  1
  4   | example 1 | 1  0  0  |   0    | 0  1  1  |   0    |   0   |  0  1  1
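The whole trace can be reproduced with a few lines of Python (a sketch; variable names are mine):

```python
def output(w, x):
    """Threshold unit: 1 if sum_k w_k x_k > 0, else 0."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) > 0 else 0

# The four samples from the slides: (x0, x1, x2) with x0 the fixed input.
examples = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]

w = [0, 0, 0]
for epoch in range(1, 5):
    for x, label in examples:
        o = output(w, x)
        err = label - o                              # Err = T - O
        w = [wk + err * xk for wk, xk in zip(w, x)]  # error correcting step
        print(epoch, x, label, o, err, w)
# After epoch 3 the weights stay at [0, 1, 1], matching the table.
```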
Derivation of a Learning Rule for Perceptrons: Minimizing Squared Errors

We'll use the sum of squared errors (as used, e.g., in linear regression), a classical error measure. Definition:

E(x) = \text{SquaredError}(x) = \frac{1}{2}\,(y - h_w(x))^2
The squared error for a single training example with input x and true output y is E = \frac{1}{2}(y - h_w(x))^2, where h_w(x) is the output of the perceptron on the example and y is the true output value.

We can use gradient descent to reduce the squared error by calculating the partial derivative of E with respect to each weight:

\frac{\partial E}{\partial w_j} = Err \cdot \frac{\partial Err}{\partial w_j} = -(y - h_w(x)) \cdot g'(in) \cdot x_j

Note: g'(in) is the derivative of the activation function. For the sigmoid, g' = g(1 - g). For threshold perceptrons, where g'(in) is undefined, the original perceptron learning rule simply omits it.
Gradient descent algorithm: we want to reduce E, so for each weight w_i we change the weight in the direction of steepest descent:

w_i <- w_i - \alpha \frac{\partial E}{\partial w_i} = w_i + \alpha\,(y - h_w(x))\,g'(in)\,x_i

where \alpha is the learning rate.

Intuitively, this matches W_j <- W_j + I_j \times Err with Err = y - h_W(x): if Err is positive, the output is too small, so weights are increased for positive inputs and decreased for negative inputs (and vice versa when Err is negative).
Gradient descent in weight space: for each weight, apply

w_i <- w_i + \Delta w_i, \quad \text{where } \Delta w_i = \alpha\,(y - g(in))\,g'(in)\,x_i,

and iterate through the training examples until the weights converge.
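A minimal sketch of this update for a sigmoid unit in Python (the task, learning rate, and iteration count are illustrative assumptions):

```python
import math

def g(z):
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid, so g'(in) = g(in) * (1 - g(in))

# Learn OR with a sigmoid unit; x[0] = -1 is the fixed bias input.
data = [((-1, 0, 0), 0), ((-1, 0, 1), 1), ((-1, 1, 0), 1), ((-1, 1, 1), 1)]
w = [0.0, 0.0, 0.0]
alpha = 0.5

for _ in range(5000):
    for x, y in data:
        out = g(sum(wi * xi for wi, xi in zip(w, x)))
        delta = alpha * (y - out) * out * (1 - out)    # alpha * (y - g(in)) * g'(in)
        w = [wi + delta * xi for wi, xi in zip(w, x)]  # w_i <- w_i + delta * x_i

print([round(wi, 2) for wi in w])  # a weight vector that closely fits OR
```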
Expressiveness of Perceptrons

A threshold perceptron used for classification outputs 1 exactly when w_1 x_1 + w_2 x_2 \ge w_0, so its decision boundary is the line

x_2 = -\frac{w_1}{w_2} x_1 + \frac{w_0}{w_2}
Linear Separability

[Plots: OR and AND in (x_1, x_2) space, each separable by a line; XOR is not]

XOR is not linearly separable. Consider a threshold perceptron for the logical XOR function (two inputs): it would need

w_1 x_1 + w_2 x_2 \ge T

to hold exactly for the inputs (1,0) and (0,1):
- (0,0) -> 0 requires 0 < T
- (1,0) -> 1 requires w_1 \ge T
- (0,1) -> 1 requires w_2 \ge T
- (1,1) -> 0 requires w_1 + w_2 < T

Adding the middle two gives w_1 + w_2 \ge 2T, which together with T > 0 contradicts w_1 + w_2 < T; so no such weights exist.
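An empirical check in Python (a sketch with an arbitrary 100-epoch cap): the error-correcting rule applied to XOR keeps cycling and never reaches zero errors.

```python
def out(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# XOR samples, with a fixed input x0 = 1 playing the role of the bias.
xor = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 0)]
w = [0, 0, 0]
for epoch in range(100):
    errors = 0
    for x, y in xor:
        err = y - out(w, x)
        errors += abs(err)
        w = [wi + err * xi for wi, xi in zip(w, x)]
    if errors == 0:
        break                  # never happens: XOR is not linearly separable
print(epoch + 1, errors, w)    # 100 epochs later, errors are still nonzero
```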
A perceptron learns the majority function easily; decision tree learning (DTL) is hopeless on it.
DTL learns the restaurant function easily; a perceptron cannot represent it.
Good news: adding a hidden layer allows more target functions to be represented.
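For example, XOR becomes representable with one hidden layer of threshold units; a minimal sketch of the standard construction (this particular wiring is not from the slides):

```python
def unit(w0, ws, xs):
    """Threshold unit: 1 iff sum_j w_j x_j >= w0."""
    return 1 if sum(w * x for w, x in zip(ws, xs)) >= w0 else 0

def xor(x1, x2):
    h_or  = unit(0.5, [1, 1], [x1, x2])       # hidden unit computing x1 OR x2
    h_and = unit(1.5, [1, 1], [x1, x2])       # hidden unit computing x1 AND x2
    return unit(0.5, [1, -1], [h_or, h_and])  # output: OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # prints 0, 1, 1, 0: the XOR truth table
```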