SS2016 Modern Neural Computation
Lecture 5: Neural Networks and Neuroscience
Hirokazu Tanaka
School of Information Science
Japan Advanced Institute of Science and Technology
Supervised learning as functional approximation.
In this lecture we will learn:
• Single-layer neural networks
  Perceptron and the perceptron theorem.
  Cerebellum as a perceptron.
• Multi-layer feedforward neural networks
  Universal functional approximation, back-propagation algorithm
• Recurrent neural networks
  Back-propagation-through-time (BPTT) algorithm
• Tempotron
  Spike-based perceptron
Gradient-descent learning for optimization.
• Classification problem: to output discrete labels.
For a binary classification (i.e., 0 or 1), a cross-entropy is
often used.
• Regression problem: to output continuous values.
Sum of squared errors is often used.
Cost function: classification and regression.
$\hat{y}_i$: output of network, $y_i$: desired output
• Classification problem: to output discrete labels.
  For a binary classification (i.e., 0 or 1), a cross-entropy is often used:
  $$-\log \prod_{i:\,\text{samples}} \hat{y}_i^{\,y_i}\,(1-\hat{y}_i)^{1-y_i} = -\sum_{i:\,\text{samples}} \left[\, y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i) \,\right]$$
• Regression problem: to output continuous values.
  Sum of squared errors is often used:
  $$\sum_{i:\,\text{samples}} (y_i - \hat{y}_i)^2$$
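The two cost functions above can be evaluated directly. The following is a minimal sketch in plain Python; the function names and the clipping constant `eps` are assumptions introduced here for illustration and do not come from the lecture.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy: -sum_i [y_i log(yhat_i) + (1 - y_i) log(1 - yhat_i)]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is an assumption
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def sum_squared_error(y_true, y_pred):
    """Sum of squared errors: sum_i (y_i - yhat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))

labels = [1, 0, 1]            # desired outputs y_i
outputs = [0.9, 0.2, 0.6]     # network outputs yhat_i
print("cross-entropy:", cross_entropy(labels, outputs))
print("sum of squared errors:", sum_squared_error(labels, outputs))
```

Both costs are zero only when every output matches its target, and both grow as the outputs move away from the targets.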
Perceptron: single-layer neural network.
• Assume a single-layer neural network with an input layer composed of N units and an output layer composed of one unit.
• The input units are specified by
  $$\mathbf{x} = (x_1 \;\cdots\; x_N)^{\mathsf T}$$
  and the output unit is determined by
  $$y = f\Big(\sum_{i=1}^{N} w_i x_i + w_0\Big) = f(\mathbf{w}^{\mathsf T}\mathbf{x} + w_0), \qquad f(u) = \begin{cases} 1 & \text{if } u \ge 0 \\ 0 & \text{if } u < 0 \end{cases}$$
Perceptron: single-layer neural network.
[Figure: input space with axes "feature 1" and "feature 2".]
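Perceptron: single-layer neural network.
• [Remark] Instead of using
  $$\mathbf{x} = (x_1 \;\cdots\; x_N)^{\mathsf T},$$
  an augmented input vector and the corresponding weight vector
  $$\mathbf{x} = (1 \;\; x_1 \;\cdots\; x_N)^{\mathsf T}, \qquad \mathbf{w} = (w_0 \;\; w_1 \;\cdots\; w_N)^{\mathsf T}$$
  are often used. Then the bias $w_0$ is absorbed into the weight vector:
  $$y = f(\mathbf{w}^{\mathsf T}\mathbf{x} + w_0) = f(\mathbf{w}^{\mathsf T}\mathbf{x}),$$
  where the right-hand side uses the augmented vectors.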
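Perceptron Learning Algorithm.
• Given a training set:
  $$\{(\mathbf{x}_1, d_1), (\mathbf{x}_2, d_2), \ldots, (\mathbf{x}_P, d_P)\}$$
• Perceptron learning rule:
  $$\Delta\mathbf{w} = \eta\,(d_i - y_i)\,\mathbf{x}_i$$

    % X: inputs, one column per sample; d: P-by-1 desired outputs; w: initial weights.
    % sign() returns -1/0/+1, so the labels d are assumed here to be -1 or +1.
    err = Inf; count = 0;
    while err > 1e-4 && count < 10
        y = sign(w'*X)';              % current outputs for all P samples
        wnew = w + X*(d-y)/P;         % batch update averaged over the samples
        wnew = wnew/norm(wnew);       % keep the weight vector at unit length
        count = count + 1;
        err = norm(w-wnew)/norm(w)    % relative weight change (displayed each pass)
        w = wnew;
    end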
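For comparison, the learning rule can also be written as an online (sample-by-sample) loop. The following is a sketch in plain Python, using the 0/1 threshold output defined earlier and the augmented-vector convention; the function names, the learning rate, the epoch limit, and the toy data set (logical OR) are assumptions made for this illustration.

```python
def step(u):
    return 1 if u >= 0 else 0

def train_perceptron(samples, eta=0.5, max_epochs=100):
    """Online perceptron rule: w <- w + eta * (d - y) * x for each misclassified sample."""
    n_inputs = len(samples[0][0])
    w = [0.0] * (n_inputs + 1)                 # augmented weights (w_0, w_1, ..., w_N)
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:
            x_aug = [1.0] + list(x)
            y = step(sum(wi * xi for wi, xi in zip(w, x_aug)))
            if y != d:
                mistakes += 1
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x_aug)]
        if mistakes == 0:                      # every sample classified correctly
            return w, True
    return w, False                            # no convergence within max_epochs

def predict(w, x):
    return step(sum(wi * xi for wi, xi in zip(w, [1.0] + list(x))))

# Toy linearly separable problem: logical OR.
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, converged = train_perceptron(OR_DATA)
print("converged:", converged, "weights:", weights)
```

For a linearly separable training set the loop stops after a finite number of updates; for a non-separable set it runs until `max_epochs` and reports failure.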
Perceptron Learning Algorithm.
Case 1: Linearly separable case
Perceptron Learning Algorithm.
Case 2: Linearly non-separable case
Perceptron’s capacity: Cover’s Counting Theorem.
• Question: Suppose that there are P vectors in N-dimensional Euclidean space,
  $$\{\mathbf{x}_1, \ldots, \mathbf{x}_P\}, \qquad \mathbf{x}_i \in \mathbb{R}^N.$$
  There are $2^P$ possible patterns of two classes. How many of them are linearly separable?
  [Remark] They are assumed to be in general position.
• Answer: Cover’s Counting Theorem.
  $$C(P, N) = 2 \sum_{k=0}^{N-1} \binom{P-1}{k}$$
Perceptron’s capacity: Cover’s Counting Theorem.
• Cover’s Counting Theorem:
  $$C(P, N) = 2 \sum_{k=0}^{N-1} \binom{P-1}{k}$$
• Case $P \le N$: $C(P, N) = 2^P$
• Case $P = 2N$: $C(P, N) = 2^{P-1}$
• Case $P \gg N$: $C(P, N) \approx A P^N$
Cover (1965) IEEE Information; Sompolinsky (2013) MIT lecture note
Perceptron’s capacity: Cover’s Counting Theorem.
• Case for large P:
  $$\frac{C(P, N)}{2^P} \approx \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\sqrt{2P}\left(\frac{N}{P} - \frac{1}{2}\right)\right)\right]$$
Orhan (2014) “Cover’s Function Counting Theorem”
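The counting formula and its special cases are easy to verify numerically. Below is a sketch in plain Python; the function names are assumptions, and the large-P expression is the Gaussian approximation of the binomial sum, written as $(2N-P)/\sqrt{2P}$ inside the error function.

```python
import math

def cover_count(P, N):
    """C(P, N) = 2 * sum_{k=0}^{N-1} binom(P-1, k): number of linearly separable dichotomies."""
    return 2 * sum(math.comb(P - 1, k) for k in range(N))

def separable_fraction(P, N):
    """Exact fraction C(P, N) / 2^P of dichotomies that are linearly separable."""
    return cover_count(P, N) / 2 ** P

def separable_fraction_gaussian(P, N):
    """Large-P approximation: 1/2 * [1 + erf((2N - P) / sqrt(2P))]."""
    return 0.5 * (1.0 + math.erf((2 * N - P) / math.sqrt(2 * P)))

N = 50
for P in (25, 50, 75, 100, 125, 150):
    print(P, round(separable_fraction(P, N), 4), round(separable_fraction_gaussian(P, N), 4))
```

The printed fractions stay at 1 up to P = N, pass through 1/2 at P = 2N, and fall quickly beyond that, which is why P = 2N is usually quoted as the capacity of the perceptron.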
Cerebellum as a Perceptron.
Llinas (1974) Scientific American
Cerebellum as a Perceptron.
• Cerebellar cortex has a feedforward structure:
mossy fibers -> granule cells -> parallel fibers -> Purkinje
cells
Ito (1984) “Cerebellum and Neural Control”
Cerebellum as a Perceptron (or its extensions)
• Perceptron model
Marr (1969): Long-term potentiation (LTP) learning.
Albus (1971): Long-term depression (LTD) learning.
• Adaptive filter theory
Fujita (1982): Reverberation among granule and Golgi
cells for generating temporal templates.
• Liquid-state machine model
Yamazaki and Tanaka (2007):
Perceptron: a new perspective.
• Evaluation of memory capacity of a Purkinje cell using
perceptron methods (the Gardner limit).
Brunel, N., Hakim, V., Isope, P., Nadal, J. P., & Barbour, B. (2004). Optimal
information storage and the distribution of synaptic weights: perceptron versus
Purkinje cell. Neuron, 43(5), 745-757.
• Estimation of dimensions of neural representations
during visual memory task in the prefrontal cortex using
perceptron methods (Cover’s counting theorem).
Rigotti, M., Barak, O., Warden, M. R., Wang, X. J., Daw, N. D., Miller, E. K., & Fusi,
S. (2013). The importance of mixed selectivity in complex cognitive tasks.
Nature, 497(7451), 585-590.
Limitation of Perceptron.
• Only linearly separable input-output sets can be learned.
• Non-linear sets, even a simple one like XOR, CANNOT be
learned.
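The XOR limitation can be illustrated by a brute-force search over threshold units. This is a sketch in plain Python; the weight grid is an arbitrary assumption, so an empty result on the grid is an illustration of the limitation rather than a proof (the proof follows from the fact that the XOR classes cannot be split by a single line).

```python
from itertools import product

def step(u):
    return 1 if u >= 0 else 0

def count_solutions(truth_table, grid):
    """Count weight settings (w0, w1, w2) on the grid that reproduce the truth table."""
    count = 0
    for w0, w1, w2 in product(grid, repeat=3):
        if all(step(w0 + w1 * x1 + w2 * x2) == d for (x1, x2), d in truth_table):
            count += 1
    return count

GRID = [k / 2 for k in range(-10, 11)]          # -5.0, -4.5, ..., 5.0 (arbitrary choice)
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("weight settings solving AND:", count_solutions(AND, GRID))
print("weight settings solving XOR:", count_solutions(XOR, GRID))
```

Many grid points solve AND, while none solves XOR, which motivates adding a hidden layer.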
Multilayer neural network: feedforward design
[Figure: layers 1, …, n-1, n, …, N. Unit $x_j^{(n-1)}$ in layer n-1 connects to unit $x_i^{(n)}$ in layer n through the weight $w_{ij}^{(n-1)}$.]
• Feedforward network: a unit in layer n receives inputs from layer n-1 and projects to layer n+1.
Multilayer neural network: forward propagation.
A feedforward multilayer neural network propagates its activities from one layer to another in one direction:
$$x_i^{(n)} = f\big(u_i^{(n)}\big) = f\Big(\sum_{j} w_{ij}^{(n-1)} x_j^{(n-1)}\Big)$$
Inputs to neurons in layer n are a weighted summation of activities of neurons in layer n-1:
$$u_i^{(n)} = \sum_{j} w_{ij}^{(n-1)} x_j^{(n-1)}$$
The function f is called an activation function, and its derivative is easy to compute:
$$f(u) = \frac{1}{1+e^{-u}}, \qquad f'(u) = \frac{e^{-u}}{(1+e^{-u})^2} = \frac{1}{1+e^{-u}}\left(1 - \frac{1}{1+e^{-u}}\right) = f(u)\,\big(1 - f(u)\big)$$
[Figure: layer n-1 with units $x_j^{(n-1)}$ projecting to layer n with units $x_i^{(n)}$ through the weights $w_{ij}^{(n-1)}$.]
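The forward pass and the derivative identity can be sketched in a few lines of plain Python. The nested-list weight layout (`weights[n][i][j]` connects unit j of one layer to unit i of the next) and the example numbers are assumptions made for this illustration.

```python
import math

def sigmoid(u):
    """Activation function f(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))

def sigmoid_prime(u):
    """Derivative written through the function itself: f'(u) = f(u) * (1 - f(u))."""
    s = sigmoid(u)
    return s * (1.0 - s)

def forward(weights, x):
    """Propagate activities layer by layer and return the activations of every layer."""
    activations = [list(x)]
    for W in weights:
        previous = activations[-1]
        activations.append([sigmoid(sum(w * a for w, a in zip(row, previous))) for row in W])
    return activations

# Hypothetical 2-3-1 network.
weights = [
    [[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]],   # layer 1 -> layer 2
    [[1.0, -1.0, 0.5]],                       # layer 2 -> layer 3
]
for layer, activity in enumerate(forward(weights, [1.0, 0.0]), start=1):
    print("layer", layer, activity)
```

Because the derivative is expressed through the activity itself, the backward pass can reuse the activations stored during the forward pass.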
Multilayer neural network: error backpropagation
• Define a cost function $E$ as the sum of squared errors in the output units:
$$E = \frac{1}{2}\sum_{i}\big(x_i^{(N)} - z_i\big)^2 = \frac{1}{2}\sum_{i}\big(\Delta_i^{(N)}\big)^2$$
The neurons in the output layer have explicit supervised errors (the difference between the network outputs and the desired outputs). How, then, to compute the supervising signals for neurons in intermediate layers?
Gradients of the cost function with respect to weights are obtained by propagating the errors from layer n back to layer n-1:
$$\Delta_i^{(n-1)} = \sum_{j} \Delta_j^{(n)}\, x_j^{(n)}\big(1 - x_j^{(n)}\big)\, w_{ji}^{(n-1)}$$
[Figure: the error $\Delta_i^{(n)}$ in layer n is passed back to give $\Delta_j^{(n-1)}$ in layer n-1.]
Multilayer neural network: error backpropagation
1. Compute activations of units in all layers, $\{x_i^{(1)}\}, \ldots, \{x_i^{(n)}\}, \ldots, \{x_i^{(N)}\}$.
2. Compute errors in the output units, $\{\Delta_i^{(N)}\}$.
3. “Back-propagate” the errors to lower layers using
   $$\Delta_i^{(n-1)} = \sum_{j} \Delta_j^{(n)}\, x_j^{(n)}\big(1 - x_j^{(n)}\big)\, w_{ji}^{(n-1)}$$
4. Update the weights
   $$\Delta w_{ij}^{(n)} = -\eta\, \Delta_i^{(n+1)}\, x_i^{(n+1)}\big(1 - x_i^{(n+1)}\big)\, x_j^{(n)}$$
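The four steps can be sketched for a small sigmoid network in plain Python. The network size, the random initial weights, the training pair, and the learning rate below are all assumptions made for illustration; the network has no bias terms, following the equations above.

```python
import math
import random

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def forward(weights, x):
    """Step 1: activations of every layer; weights[n][i][j] feeds unit j into unit i."""
    acts = [list(x)]
    for W in weights:
        acts.append([sigmoid(sum(w * a for w, a in zip(row, acts[-1]))) for row in W])
    return acts

def cost(weights, x, z):
    out = forward(weights, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, z))

def backprop(weights, x, z):
    """Gradient of the cost with respect to every weight, via the Delta recursion."""
    acts = forward(weights, x)
    delta = [o - t for o, t in zip(acts[-1], z)]          # step 2: output errors
    grads = [None] * len(weights)
    for n in range(len(weights) - 1, -1, -1):
        upper, lower = acts[n + 1], acts[n]
        local = [d * a * (1.0 - a) for d, a in zip(delta, upper)]   # Delta_i x_i (1 - x_i)
        grads[n] = [[l * xj for xj in lower] for l in local]
        delta = [sum(local[i] * weights[n][i][j] for i in range(len(upper)))
                 for j in range(len(lower))]              # step 3: errors one layer down
    return grads

def gradient_step(weights, grads, eta):
    """Step 4: w <- w - eta * dE/dw."""
    return [[[w - eta * g for w, g in zip(wrow, grow)] for wrow, grow in zip(W, G)]
            for W, G in zip(weights, grads)]

rng = random.Random(0)
sizes = [2, 3, 1]                                         # hypothetical layer sizes
weights = [[[rng.uniform(-1, 1) for _ in range(sizes[n])] for _ in range(sizes[n + 1])]
           for n in range(len(sizes) - 1)]
x, z = [0.3, -0.7], [0.9]                                 # one hypothetical training pair

trained = weights
for _ in range(200):
    trained = gradient_step(trained, backprop(trained, x, z), eta=0.5)
print("cost before:", cost(weights, x, z), "cost after:", cost(trained, x, z))
```

A useful sanity check for any backpropagation code is to compare one analytic gradient entry with a finite-difference estimate of the cost.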
Multilayer neural network as universal machine for functional approximation.
A multilayer neural network is in principle able to approximate any continuous functional relationship between inputs and outputs to any desired accuracy (Funahashi, 1989).
Intuition: A sum or a difference of two sigmoid functions is a “bump-like” function, and a sufficiently large number of bump functions can approximate any function.
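The bump intuition can be tried out numerically. In this sketch (plain Python), a bump is the difference of two shifted sigmoids, and a target function on [0, 1] is approximated by a weighted sum of adjacent bumps; the number of bumps, the steepness, and the target function are assumptions chosen for illustration, and the construction is not the one used in Funahashi's proof.

```python
import math

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def bump(x, left, right, steepness):
    """Difference of two sigmoids: close to 1 inside [left, right], close to 0 outside."""
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

def approximate(func, x, n_bumps=50, lo=0.0, hi=1.0):
    """Weighted sum of adjacent bumps, each weighted by func at the bump's center."""
    width = (hi - lo) / n_bumps
    steepness = 20.0 / width                      # sharp edges relative to the bump width
    total = 0.0
    for i in range(n_bumps):
        left = lo + i * width
        total += func(left + 0.5 * width) * bump(x, left, left + width, steepness)
    return total

def target(x):
    return math.sin(2.0 * math.pi * x)

for x in (0.1, 0.25, 0.5, 0.8):
    print(x, round(target(x), 3), round(approximate(target, x), 3))
```

Increasing `n_bumps` narrows each bump and reduces the approximation error, at the cost of more hidden units (two sigmoids per bump).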
NETtalk: A parallel network that learns to read aloud.
Sejnowski & Rosenberg (1987) Complex Systems; https://www.youtube.com/watch?v=gakJlr3GecE
A feedforward three-layer neural network with delay lines.
[Figure: activations of hidden units for the same sound but different inputs.]
Hinton diagrams: characterizing and visualizing connections to and from hidden units.
Hinton (1992) Sci Am
Autonomous driving learning by backpropagation.
Pomerleau (1991) Neural Comput; https://www.youtube.com/watch?v=ilP4aPDTBPE
Gradient vanishing problem: why is training a multi-layer neural network so difficult?
Hochreiter et al. (1991)
• In practice, the back-propagation algorithm works well only for neural networks of three or four layers.
• Training neural networks with many hidden layers, called “deep neural networks”, is notoriously difficult.
$$\Delta_j^{(N-1)} = \sum_{i} \Delta_i^{(N)}\, x_i^{(N)}\big(1 - x_i^{(N)}\big)\, w_{ij}^{(N-1)}$$
$$\Delta_k^{(N-2)} = \sum_{j} \Delta_j^{(N-1)}\, x_j^{(N-1)}\big(1 - x_j^{(N-1)}\big)\, w_{jk}^{(N-2)} = \sum_{j}\Big(\sum_{i} \Delta_i^{(N)}\, x_i^{(N)}\big(1 - x_i^{(N)}\big)\, w_{ij}^{(N-1)}\Big)\, x_j^{(N-1)}\big(1 - x_j^{(N-1)}\big)\, w_{jk}^{(N-2)}$$
$$\Delta^{(n)} \sim x^{(n+1)}\big(1 - x^{(n+1)}\big) \times \cdots \times x^{(N-1)}\big(1 - x^{(N-1)}\big) \times x^{(N)}\big(1 - x^{(N)}\big) \times \Delta^{(N)}$$
Each factor $x(1-x)$ is at most 1/4 for a sigmoid unit, so, unless the weights are large, the back-propagated error shrinks roughly geometrically with the number of layers it passes through.
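The size of the shrinking factor can be made concrete. This sketch in plain Python looks only at the activation-derivative factors $x(1-x)$ and deliberately ignores the weights, which is a simplifying assumption; the function names are introduced here for illustration.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def derivative_factor(u):
    """The factor x(1 - x) contributed by one sigmoid layer, with x = f(u)."""
    x = sigmoid(u)
    return x * (1.0 - x)

def product_bound(depth):
    """Largest possible product of `depth` such factors, reached when every x = 1/2."""
    return derivative_factor(0.0) ** depth

for depth in (1, 2, 5, 10, 20):
    print("layers:", depth, "upper bound on the product:", product_bound(depth))
```

Even in the best case the product falls below one millionth after ten layers, so the lower layers receive almost no error signal unless the weights compensate.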
Multilayer neural network: recurrent connections
• A feedforward neural network can represent an instantaneous relationship between inputs and outputs. It is memoryless: its output depends on the current inputs x(t) but not on previous inputs x(t-1), x(t-2), ...
• In order to describe a history, a neural network should have its own dynamics.
• One way to incorporate dynamics into a neural network is to introduce recurrent connections between units.
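Multilayer neural network: recurrent connections
Recurrent dynamics of neural network:
$$x_i(t+1) = f\big(u_i(t+1)\big) = f\big((\mathbf{W}\mathbf{x}(t) + \mathbf{U}\mathbf{a}(t))_i\big)$$
Output readout:
$$z_i(t) = g\big((\mathbf{V}\mathbf{x}(t))_i\big)$$
[Figure: input a projects through U to the recurrent units x (recurrent weights W), which project through V to the output z.]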
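Temporal unfolding: backpropagation through time (BPTT)
[Figure: one time step of the network: $\mathbf{a}_{t-1}$ and $\mathbf{x}_{t-1}$ determine $\mathbf{x}_t$, which determines $\mathbf{z}_t$, through the weights U, W, V.]
Training set for a recurrent network:
Input series: $\{\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_t, \ldots, \mathbf{a}_{T-1}\}$
Output series: $\{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_t, \ldots, \mathbf{z}_T\}$
Optimize the weight matrices U, W, V so as to approximate the training set.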
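Temporal unfolding: backpropagation through time (BPTT)
[Figure: the recurrent network unfolded in time. The outputs $\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots$ are computed from the inputs $\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \ldots$ through the states $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots$, with the same weights U, W, V shared across all time steps.]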
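Temporal unfolding turns the recurrent network into a deep feedforward network with shared weights, so the gradient can be accumulated backwards over time steps. Below is a sketch for the simplest possible case, a single recurrent unit with scalar weights w, u, v, in plain Python; the choices f = tanh and g = identity, the squared-error cost, and the example sequences are assumptions made for this illustration.

```python
import math

def forward(params, inputs, x0=0.0):
    """x_t = tanh(w * x_{t-1} + u * a_{t-1}),  z_t = v * x_t,  for t = 1..T."""
    w, u, v = params
    xs, zs = [x0], []
    for a in inputs:
        xs.append(math.tanh(w * xs[-1] + u * a))
        zs.append(v * xs[-1])
    return xs, zs

def cost(params, inputs, targets):
    _, zs = forward(params, inputs)
    return 0.5 * sum((z - y) ** 2 for z, y in zip(zs, targets))

def bptt(params, inputs, targets):
    """Gradients of the cost with respect to (w, u, v), accumulated backwards in time."""
    w, u, v = params
    xs, zs = forward(params, inputs)
    grad_w = grad_u = grad_v = 0.0
    carry = 0.0                                   # dE/dx_t arriving from later time steps
    for t in range(len(inputs), 0, -1):
        err = zs[t - 1] - targets[t - 1]          # output error at time t
        grad_v += err * xs[t]
        dx = err * v + carry                      # total dE/dx_t
        dpre = dx * (1.0 - xs[t] ** 2)            # tanh'(u) = 1 - tanh(u)^2
        grad_w += dpre * xs[t - 1]
        grad_u += dpre * inputs[t - 1]
        carry = w * dpre                          # passed on to time step t-1
    return grad_w, grad_u, grad_v

params = (0.5, 0.8, 1.2)                          # hypothetical (w, u, v)
inputs = [1.0, -0.5, 0.3, 0.0]                    # a_0 ... a_{T-1}
targets = [0.2, 0.1, -0.3, 0.4]                   # desired z_1 ... z_T
print("cost:", cost(params, inputs, targets))
print("gradients (w, u, v):", bptt(params, inputs, targets))
```

The `carry` term is what distinguishes BPTT from ordinary backpropagation: the same weight w appears at every time step, so its gradient collects a contribution from each step.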
Working-memory related activity in parietal cortex.
Gnadt & Andersen (1988) Exp Brain Res
Temporal unfolding: backpropagation through time (BPTT)
Zipser (1991) Neural Comput
[Figures: comparison panels labeled "Model" and "Experiment".]
Spike pattern discrimination in humans.
Johansson & Birznieks (2004); Johansson & Flanagan (2009)
Spike pattern discrimination in dendrites.
Branco et al. (2010) Science
Tempotron: Spike-based perceptron.
Consider five neurons, each emitting one spike but at a different time:
Rate coding: Information is coded in the number of spikes in a given period.
$$(r_1, r_2, r_3, r_4, r_5) = (1, 1, 1, 1, 1)$$
Temporal coding: Information is coded in the temporal pattern of spiking.
Basic idea: Expand the spike pattern into time:
[Figure: a spike pattern of N neurons over T time bins, expanded into an N×T array.]
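Tempotron: Spike-based perceptron.
Consider a classification problem of two spike patterns:
[Figure: two spike patterns of neurons 1 and 2. In the first pattern the neurons contribute $w_1 e^{-3\Delta t} + w_1 e^{-\Delta t}$ and $w_2 e^{-2\Delta t} + w_2$; in the second pattern they contribute $w_1 e^{-2\Delta t} + w_1$ and $w_2 e^{-3\Delta t} + w_2 e^{-\Delta t}$.]
$$w_1\big(e^{-3\Delta t} + e^{-\Delta t}\big) + w_2\big(e^{-2\Delta t} + 1\big) > \theta, \qquad w_1\big(e^{-2\Delta t} + 1\big) + w_2\big(e^{-3\Delta t} + e^{-\Delta t}\big) < \theta$$
If a vector notation is introduced:
$$\mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}, \qquad \mathbf{x}^{(1)} = \begin{pmatrix} e^{-3\Delta t} + e^{-\Delta t} \\ e^{-2\Delta t} + 1 \end{pmatrix}, \qquad \mathbf{x}^{(2)} = \begin{pmatrix} e^{-2\Delta t} + 1 \\ e^{-3\Delta t} + e^{-\Delta t} \end{pmatrix}$$
This classification problem is reduced to a perceptron problem:
$$\mathbf{w}^{\mathsf T}\mathbf{x}^{(1)} > \theta, \qquad \mathbf{w}^{\mathsf T}\mathbf{x}^{(2)} < \theta$$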
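The reduction to a perceptron problem can be checked with numbers. In this sketch (plain Python), each spike is described by its lag before the readout time in units of Δt, so a spike with lag k contributes $e^{-k\Delta t}$; the value Δt = 1, the weights, and the threshold are hypothetical choices for illustration.

```python
import math

# Spike lags (in units of dt) for neuron 1 and neuron 2 in each pattern.
PATTERN_1 = [[3, 1], [2, 0]]    # x^(1) = (e^-3dt + e^-dt, e^-2dt + 1)
PATTERN_2 = [[2, 0], [3, 1]]    # x^(2) = (e^-2dt + 1, e^-3dt + e^-dt)

def features(pattern, dt=1.0):
    """Feature vector x: summed exponential kernel values of each neuron's spikes."""
    return [sum(math.exp(-lag * dt) for lag in lags) for lags in pattern]

def potential(w, pattern, dt=1.0):
    """w^T x, to be compared with the threshold theta."""
    return sum(wi * xi for wi, xi in zip(w, features(pattern, dt)))

w, theta = [-1.0, 1.0], 0.0     # hypothetical weights and threshold
print("x1 =", features(PATTERN_1), "w.x1 =", potential(w, PATTERN_1))
print("x2 =", features(PATTERN_2), "w.x2 =", potential(w, PATTERN_2))
```

Both patterns contain the same number of spikes per neuron, so a rate code cannot tell them apart, while the temporal features $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$ are linearly separable.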
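Learning a tempotron: intuition.
[Figure: the two spike patterns, with contributions $w_1 e^{-3\Delta t} + w_1 e^{-\Delta t}$ and $w_2 e^{-2\Delta t} + w_2$ (first pattern), and $w_1 e^{-2\Delta t} + w_1$ and $w_2 e^{-3\Delta t} + w_2 e^{-\Delta t}$ (second pattern).]
$$w_1\big(e^{-3\Delta t} + e^{-\Delta t}\big) + w_2\big(e^{-2\Delta t} + 1\big) > \theta, \qquad w_1\big(e^{-2\Delta t} + 1\big) + w_2\big(e^{-3\Delta t} + e^{-\Delta t}\big) > \theta$$
What was wrong if the second pattern was misclassified?
The last spike of neuron #1 (red one) is most responsible for the error, so the synaptic strength of this neuron should be reduced.
$$\Delta w_1 = -\lambda$$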
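Learning a tempotron: intuition.
[Figure: the two spike patterns, with contributions $w_1 e^{-3\Delta t} + w_1 e^{-\Delta t}$ and $w_2 e^{-2\Delta t} + w_2$ (first pattern), and $w_1 e^{-2\Delta t} + w_1$ and $w_2 e^{-3\Delta t} + w_2 e^{-\Delta t}$ (second pattern).]
$$w_1\big(e^{-3\Delta t} + e^{-\Delta t}\big) + w_2\big(e^{-2\Delta t} + 1\big) < \theta, \qquad w_1\big(e^{-2\Delta t} + 1\big) + w_2\big(e^{-3\Delta t} + e^{-\Delta t}\big) < \theta$$
What was wrong if the first pattern was misclassified?
The last spike of neuron #2 (red one) is most responsible for the error, so the synaptic strength of this neuron should be potentiated.
$$\Delta w_2 = +\lambda$$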
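The intuition on these two slides can be turned into a small training loop. The sketch below (plain Python) is a literal rendering of the slides' rule, changing by ±λ the weight of the neuron whose spike is closest to the readout time; it is not the full tempotron learning rule of Gütig & Sompolinsky (2006). The values of Δt, θ, λ, and the initial weights are hypothetical.

```python
import math

# Spike lags (in units of dt) per neuron, and the desired response of each pattern.
PATTERNS = {
    "first": ([[3, 1], [2, 0]], +1),     # should drive the potential above theta
    "second": ([[2, 0], [3, 1]], -1),    # should stay below theta
}

def potential(w, lags_per_neuron, dt=1.0):
    return sum(wi * sum(math.exp(-lag * dt) for lag in lags)
               for wi, lags in zip(w, lags_per_neuron))

def train(w, theta=1.0, lam=0.1, max_updates=100):
    """Repeat +/- lambda updates until both patterns are classified correctly."""
    w = list(w)
    for _ in range(max_updates):
        updated = False
        for lags_per_neuron, desired in PATTERNS.values():
            above = potential(w, lags_per_neuron) > theta
            if above == (desired > 0):
                continue                                   # classified correctly
            # Neuron whose spike is closest to the readout time (smallest lag).
            culprit = min(range(len(w)), key=lambda i: min(lags_per_neuron[i]))
            w[culprit] += lam * desired                    # -lambda or +lambda
            updated = True
        if not updated:
            break
    return w

THETA = 1.0
for start in ([1.0, 1.0], [0.2, 0.2]):
    learned = train(start, theta=THETA)
    print(start, "->", [round(v, 2) for v in learned])
```

Starting from weights that are too large, the loop depresses $w_1$ (the second pattern fires wrongly); starting from weights that are too small, it potentiates $w_2$ (the first pattern fails to fire).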
Exercise: Capacity of perceptron.
• Generate a set of random vectors.
• Write code for the perceptron learning algorithm.
• By randomly relabeling the vectors, count how many of the labelings are linearly separable.
Rigotti, M., Barak, O., Warden, M. R., Wang, X. J., Daw, N. D., Miller, E. K., & Fusi, S. (2013). The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451), 585-590.
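One possible starting point for this exercise is sketched below in plain Python. It enumerates all labelings of a small set of vectors, uses the perceptron rule (without a bias, i.e., separating hyperplanes through the origin) as a separability test, and compares the count with Cover's formula. The epoch limit makes the test a heuristic: a separable labeling with a very small margin could be missed. All names, the example vectors, and the parameters are assumptions.

```python
import math
import random
from itertools import product

def is_separable(points, labels, max_epochs=1000):
    """Perceptron rule with labels in {-1, +1}; True if some pass makes no mistakes."""
    w = [0.0] * len(points[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in zip(points, labels):
            if d * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + d * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            return True
    return False                       # treated as non-separable (heuristic)

def count_separable(points, max_epochs=1000):
    """Number of the 2^P labelings that the perceptron manages to separate."""
    return sum(is_separable(points, labels, max_epochs)
               for labels in product((-1, 1), repeat=len(points)))

def cover_count(P, N):
    return 2 * sum(math.comb(P - 1, k) for k in range(N))

def random_points(P, N, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(P)]

# Three vectors in general position in the plane (P = 3, N = 2).
POINTS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print("separable labelings:", count_separable(POINTS), "Cover:", cover_count(3, 2))
```

For larger problems, `random_points(P, N)` can replace the fixed vectors, and sampling random labelings instead of enumerating all $2^P$ keeps the run time manageable.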
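Exercise: Training of recurrent neural networks.
Goal: Investigate the effects of chaos and feedback in a recurrent network.
Recurrent dynamics without feedback:
$$\mathbf{x}_{n+1} = \mathbf{x}_n + \left(-\mathbf{x}_n + \mathbf{M}\mathbf{r}_n\right)\Delta t, \qquad \mathbf{r}_n = \tanh \mathbf{x}_n, \qquad z_n = \mathbf{w}_n^{\mathsf T} \tanh \mathbf{x}_n$$
Update of covariance matrix:
$$\mathbf{P}_0 = \frac{\mathbf{I}}{\alpha}, \qquad \mathbf{P}_{n+1} = \mathbf{P}_n - \frac{\mathbf{P}_n \mathbf{r}_n \mathbf{r}_n^{\mathsf T} \mathbf{P}_n}{1 + \mathbf{r}_n^{\mathsf T} \mathbf{P}_n \mathbf{r}_n}$$
Update of weight matrix:
$$\mathbf{w}_{n+1} = \mathbf{w}_n - e_n \mathbf{P}_n \mathbf{r}_n, \qquad e_n = z_n - f_n$$
force_internal_all2all.m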
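Recurrent dynamics with feedback:
$$\mathbf{x}_{n+1} = \mathbf{x}_n + \left(-\mathbf{x}_n + \mathbf{M}\mathbf{r}_n + \mathbf{w}^{f} z_n\right)\Delta t, \qquad \mathbf{r}_n = \tanh \mathbf{x}_n, \qquad z_n = \mathbf{w}_n^{\mathsf T} \tanh \mathbf{x}_n$$
Update of covariance matrix:
$$\mathbf{P}_0 = \frac{\mathbf{I}}{\alpha}, \qquad \mathbf{P}_{n+1} = \mathbf{P}_n - \frac{\mathbf{P}_n \mathbf{r}_n \mathbf{r}_n^{\mathsf T} \mathbf{P}_n}{1 + \mathbf{r}_n^{\mathsf T} \mathbf{P}_n \mathbf{r}_n}$$
Update of weight matrix:
$$\mathbf{w}_{n+1} = \mathbf{w}_n - e_n \mathbf{P}_n \mathbf{r}_n, \qquad e_n = z_n - f_n$$
force_external_feedback_loop.m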
• Investigate the effect of output feedback. Is there any difference in the activities of the recurrent units?
• Investigate the effect of the gain parameter g. What happens if the gain parameter is smaller than 1?
• Try to approximate some other time series, such as chaotic ones. Use the Lorenz model, for example.
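The two update rules of this exercise form a recursive least-squares (RLS) step, which can be studied in isolation before running the full network. The sketch below (plain Python) applies it to a static toy problem, recovering a fixed readout vector from random activity vectors. It is not the recurrent network of the exercise scripts, and it uses the standard RLS ordering in which the weight update uses the freshly updated P; the index bookkeeping on the slides may differ. All names and parameter values are assumptions.

```python
import random

def rls_step(P, w, r, target):
    """One RLS step: update P, then correct w by the error measured before the update."""
    n = len(r)
    Pr = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(r[i] * Pr[i] for i in range(n))
    # P <- P - (P r r^T P) / (1 + r^T P r); P is symmetric, so r^T P = (P r)^T.
    P = [[P[i][j] - Pr[i] * Pr[j] / denom for j in range(n)] for i in range(n)]
    error = sum(wi * ri for wi, ri in zip(w, r)) - target      # e = z - f
    Pr_new = [sum(P[i][j] * r[j] for j in range(n)) for i in range(n)]
    w = [wi - error * pi for wi, pi in zip(w, Pr_new)]         # w <- w - e P r
    return P, w, error

def demo(n_steps=300, alpha=1.0, seed=0):
    """Learn a hypothetical readout vector w_true from random activity vectors r."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0, 0.5]
    n = len(w_true)
    P = [[(1.0 / alpha if i == j else 0.0) for j in range(n)] for i in range(n)]  # P_0 = I / alpha
    w = [0.0] * n
    for _ in range(n_steps):
        r = [rng.gauss(0.0, 1.0) for _ in range(n)]
        target = sum(wt * ri for wt, ri in zip(w_true, r))
        P, w, _ = rls_step(P, w, r, target)
    return w, w_true

learned, true = demo()
print("learned:", [round(v, 3) for v in learned], "target:", true)
```

In the full exercise the vector r is the network activity tanh(x) and the target is the desired output f, so the same step runs inside the simulation loop at every update.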
References
• Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive modeling, 5(3), 1.
• Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex systems, 1(1), 145-168.
• Funahashi, K. I. (1989). On the approximate realization of continuous mappings by neural networks. Neural networks, 2(3), 183-192.
• Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.
• Zipser, D. (1991). Recurrent network model of the neural mechanism of short-term active memory. Neural Computation, 3(2), 179-193.
• Johansson, R. S., & Birznieks, I. (2004). First spikes in ensembles of human tactile afferents code complex spatial fingertip events. Nature neuroscience, 7(2), 170-177.
• Branco, T., Clark, B. A., & Häusser, M. (2010). Dendritic discrimination of temporal input sequences in cortical neurons. Science, 329(5999), 1671-1675.
• Gütig, R., & Sompolinsky, H. (2006). The tempotron: a neuron that learns spike timing–based decisions. Nature neuroscience, 9(3), 420-428.
• Sussillo, D., & Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4), 544-557.

JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience

  • 1. SS2016 Modern Neural Computation Lecture 5: Neural Networks and Neuroscience Hirokazu Tanaka School of Information Science Japan Institute of Science and Technology
  • 2. Supervised learning as functional approximation. In this lecture we will learn: • Single-layer neural networks Perceptron and the perceptron theorem. Cerebellum as a perceptron. • Multi-layer feedforward neural networks Universal functional approximations, Back-propagation algorithms • Recurrent neural networks Back-propagation-through-time (BPTT) algorithms • Tempotron Spike-based perceptron
  • 3. Gradient-descent learning for optimization. • Classification problem: to output discrete labels. For a binary classification (i.e., 0 or 1), a cross-entropy is often used. • Regression problem: to output continuous values. Sum of squared errors is often used.
  • 4. Cost function: classification and regression. • Classification problem: to output discrete labels. For a binary classification (i.e., 0 or 1), a cross-entropy is often used. • Regression problem: to output continuous values. Sum of squared errors is often used. ˆ:output of network, :desired outputi iy y ( ) ( ) ( ) ˆ1ˆ : samples: samples ˆ ˆlog 1 log 1 log 1ii i i i i i yy i ii y y y y y y − − − =− + − −  ∑∏ ( ) : sa p e 2 m l s ˆi i iy y−∑
  • 5. Perceptron: single-layer neural network. • Assume a single-layer neural network with an input layer composed of N units and an output layer composed of one unit. • Input units are specified by and an output unit are determined by ( )1 T Nx x=x  ( )T 0 1 0 n i i iy f w x fw w =   = + = +    ∑ w x ( ) 1 if 0 0 if 0 u f u u ≥ =  <
  • 6. Perceptron: single-layer neural network. feature 1 feature 2
  • 7. Perceptron: single-layer neural network. • [Remark] Instead of using often, an augmented input vector are used. Then, ( )1 T Nx x=x  ( ) ( )T T 0y f w f= + =w x w x ( )11 T Nx x=x  ( )10 T Nw w w=w 
  • 8. Perceptron Learning Algorithm. ( ) ( ) ( ){ }21 1 2, , ,, , ,P Pd d dx x x • Given a training set: • Perceptron learning rule: ( )i i iydη −∆ =w x while err>1e-4 && count<10 y = sign(w'*X)'; wnew = w + X*(d-y)/P; wnew = wnew/norm(wnew); count = count+1; err = norm(w-wnew)/norm(w) w = wnew; end
  • 9. Perceptron Learning Algorithm. Case 1: Linearly separable case
  • 10. Perceptron Learning Algorithm. Case 2: Linearly non-separable case
  • 11. Perceptron’s capacity: Cover’s Counting Theorem. • Question: Suppose that there are P vectors in N- dimensional Euclidean space. There are 2P possible patterns of two classes. How many of them are linearly separable? [Remark] They are assumed to be in general position. • Answer: Cover’s Counting Theorem. { }1, ,, N P i ∈x x x  ( ) 1 0 1 , 2 N k P C P N k − = −  =     ∑
  • 12. Perceptron’s capacity: Cover’s Counting Theorem. • Cover’s Counting Theorem. • Case 𝑃𝑃 ≤ 𝑁𝑁: • Case 𝑃𝑃 = 2𝑁𝑁: • Case 𝑃𝑃 ≫ 𝑁𝑁: ( ) 1 0 1 , 2 N k P C P N k − = −  =     ∑ ( ), 2P C P N = ( ) 1 , 2P C P N − = ( ), N C P N AP≈ Cover (1965) IEEE Information; Sompolinsky (2013) MIT lecture note
  • 13. Perceptron’s capacity: Cover’s Counting Theorem. • Case for large P: Orhan (2014) “Cover’s Function Counting Theorem” ( ) 1 2 1 e 2 rf , 2 2P pC P N N p    + −        ≈
  • 14. Cerebellum as a Perceptron. Llinas (1974) Scientific American
  • 15. Cerebellum as a Perceptron. • Cerebellar cortex has a feedforward structure: mossy fibers -> granule cells -> parallel fibers -> Purkinje cells Ito (1984) “Cerebellum and Neural Control”
  • 16. Cerebellum as a Perceptron (or its extensions) • Perceptron model Marr (1969): Long-term potentiation (LTP) learning. Albus (1971): Long-term depression (LTD) learning. • Adaptive filter theory Fujita (1982): Reverberation among granule and Golgi cells for generating temporal templates. • Liquid-state machine model Yamazaki and Tanaka (2007):
  • 17. Perceptron: a new perspective. • Evaluation of memory capacity of a Purkinje cell using perceptron methods (the Gardner limit). Brunel, N., Hakim, V., Isope, P., Nadal, J. P., & Barbour, B. (2004). Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron, 43(5), 745-757. • Estimation of dimensions of neural representations during visual memory task in the prefrontal cortex using perceptron methods (Cover’s counting theorem). Rigotti, M., Barak, O., Warden, M. R., Wang, X. J., Daw, N. D., Miller, E. K., & Fusi, S. (2013). The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451), 585-590.
  • 18. Limitation of Perceptron. • Only linearly separable input-output sets can be learned. • Non-linear sets, even a simple one like XOR, CANNOT be learned.
  • 19. Multilayer neural network: feedforward design ( )n ix ( )1n jx − Layer 1 Layer n-1 Layer n Layer N ( )1n ijw − • Feedforward network: a unit in layer n receives inputs from layer n-1 and projects to layer n+1.
  • 21. Multilayer neural network: forward propagation.
    A feedforward multilayer neural network propagates its activities from one layer to the next in one direction:
    $x_i^{(n)} = f\left(u_i^{(n)}\right) = f\left(\sum_j w_{ij}^{(n-1)} x_j^{(n-1)}\right)$
    Inputs to neurons in layer n are a weighted sum of the activities of neurons in layer n-1:
    $u_i^{(n)} = \sum_j w_{ij}^{(n-1)} x_j^{(n-1)}$
    The function f is called an activation function. For the sigmoid, its derivative is easy to compute:
    $f(u) = \frac{1}{1 + e^{-u}}, \qquad f'(u) = \frac{e^{-u}}{\left(1 + e^{-u}\right)^2} = \frac{1}{1 + e^{-u}}\left(1 - \frac{1}{1 + e^{-u}}\right) = f(u)\left(1 - f(u)\right)$
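Forward propagation amounts to one matrix-vector product and one sigmoid per layer. The sketch below assumes bias-free layers, as in the formulas on this slide; the layer sizes and random weights are arbitrary illustrations.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_prime(u):
    """Derivative written as f(u) * (1 - f(u))."""
    s = sigmoid(u)
    return s * (1.0 - s)

def forward(weights, x_input):
    """Return the activities of every layer; weights[k] maps layer k to layer k+1."""
    activities = [np.asarray(x_input, dtype=float)]
    for w in weights:
        u = w @ activities[-1]          # u_i = sum_j w_ij x_j
        activities.append(sigmoid(u))   # x_i = f(u_i)
    return activities

rng = np.random.default_rng(0)
layer_sizes = [3, 5, 2]                 # hypothetical network: 3 inputs, 5 hidden units, 2 outputs
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
activities = forward(weights, [0.2, -1.0, 0.5])
print([a.shape for a in activities])
```

Keeping the activities of all layers, not only the output, is deliberate: backpropagation needs them to form the factors x(1 - x).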
  • 22. Multilayer neural network: error backpropagation
    • Define a cost function as half the sum of squared errors in the output units, where $z_i$ is the desired output:
      $E = \frac{1}{2} \sum_i \left(x_i^{(N)} - z_i\right)^2 = \frac{1}{2} \sum_i \left(\Delta_i^{(N)}\right)^2$
    • The neurons in the output layer have explicit supervised errors (the difference between the network outputs and the desired outputs). How, then, to compute the supervising signals for neurons in intermediate layers?
    • The errors are passed from layer n back to layer n-1, which yields the gradients of the cost function with respect to the weights:
      $\Delta_i^{(n-1)} = \sum_j \Delta_j^{(n)} x_j^{(n)} \left(1 - x_j^{(n)}\right) w_{ji}^{(n-1)}$
    [Figure: errors $\Delta_i^{(n)}$ in layer n are passed back to $\Delta_j^{(n-1)}$ in layer n-1.]
  • 23. Multilayer neural network: error backpropagation
    1. Compute activations of units in all layers, $\{x_i^{(1)}\}, \ldots, \{x_i^{(n)}\}, \ldots, \{x_i^{(N)}\}$.
    2. Compute errors in the output units, $\{\Delta_i^{(N)}\}$.
    3. “Back-propagate” the errors to lower layers using $\Delta_i^{(n-1)} = \sum_j \Delta_j^{(n)} x_j^{(n)} \left(1 - x_j^{(n)}\right) w_{ji}^{(n-1)}$.
    4. Update the weights: $\Delta w_{ij}^{(n)} = -\eta\, \Delta_i^{(n+1)} x_i^{(n+1)} \left(1 - x_i^{(n+1)}\right) x_j^{(n)}$.
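The four steps can be sketched in a few lines and checked against finite differences. This is a minimal illustration assuming sigmoid units without biases and the squared-error cost; the network size, the random data, and the names `backprop` and `cost` are hypothetical.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(weights, x_in):
    xs = [x_in]
    for w in weights:
        xs.append(sigmoid(w @ xs[-1]))
    return xs

def cost(weights, x_in, z):
    return 0.5 * np.sum((forward(weights, x_in)[-1] - z) ** 2)

def backprop(weights, x_in, z):
    """Return dE/dw for every layer (the weight update is -eta times this)."""
    xs = forward(weights, x_in)                  # step 1: activations of all layers
    delta = xs[-1] - z                           # step 2: output errors
    grads = [None] * len(weights)
    for n in reversed(range(len(weights))):
        local = delta * xs[n + 1] * (1.0 - xs[n + 1])
        grads[n] = np.outer(local, xs[n])        # step 4: Delta_i x_i (1 - x_i) x_j
        delta = weights[n].T @ local             # step 3: errors of the layer below
    return grads

rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]
x_in, z = rng.normal(size=3), np.array([0.0, 1.0])
grads = backprop(weights, x_in, z)

# Compare with central finite differences of the cost.
eps, max_err = 1e-6, 0.0
for n, w in enumerate(weights):
    for idx in np.ndindex(w.shape):
        old = w[idx]
        w[idx] = old + eps
        c_plus = cost(weights, x_in, z)
        w[idx] = old - eps
        c_minus = cost(weights, x_in, z)
        w[idx] = old
        max_err = max(max_err, abs((c_plus - c_minus) / (2 * eps) - grads[n][idx]))
print("largest gradient mismatch:", max_err)
```

A gradient check of this kind is a common way to catch index or sign mistakes before running gradient descent with a learning rate η.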
  • 24. Multilayer neural network as universal machine for functional approximation. A multilayer neural network is in principle able to approximate any continuous functional relationship between inputs and outputs to any desired accuracy (Funahashi, 1989). Intuition: A sum or a difference of two sigmoid functions is a “bump-like” function, and a sufficiently large number of bump functions can approximate any such function.
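The bump intuition can be tried directly. The sketch below is an illustration rather than Funahashi's construction: it assumes each bump is the difference of two shifted, steep sigmoids and weights each bump by the target value at its center. The steepness factor of 20 per bump width is an arbitrary choice.

```python
import numpy as np

def sigmoid(u):
    # Same function as 1 / (1 + exp(-u)), written with tanh to avoid overflow for large |u|.
    return 0.5 * (1.0 + np.tanh(0.5 * u))

def bump(x, left, right, steepness):
    """Difference of two shifted sigmoids: close to 1 on [left, right], close to 0 elsewhere."""
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))

def bump_approximation(target, x, n_bumps):
    """Tile the range of x with n_bumps bumps, each scaled by the target at its center."""
    edges = np.linspace(x.min(), x.max(), n_bumps + 1)
    width = edges[1] - edges[0]
    y = np.zeros_like(x)
    for left, right in zip(edges[:-1], edges[1:]):
        center = 0.5 * (left + right)
        y += target(center) * bump(x, left, right, steepness=20.0 / width)
    return y

x = np.linspace(0.0, 2.0 * np.pi, 1000)
err_coarse = np.max(np.abs(bump_approximation(np.sin, x, 10) - np.sin(x)))
err_fine = np.max(np.abs(bump_approximation(np.sin, x, 100) - np.sin(x)))
print("max error with 10 bumps:", err_coarse, "| with 100 bumps:", err_fine)
```

Each bump uses two hidden sigmoid units, so the approximation with 100 bumps corresponds to a three-layer network with 200 hidden units and a linear output.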
  • 25. NETtalk: A parallel network that learns to read aloud. Sejnowski & Rosenberg (1987) Complex Systems A feedforward three-layer neural network with delay lines.
  • 26. NETtalk: A parallel network that learns to read aloud. Sejnowski & Rosenberg (1987) Complex Systems; https://www.youtube.com/watch?v=gakJlr3GecE A feedforward three-layer neural network with delay lines.
  • 27. NETtalk: A parallel network that learns to read aloud. Sejnowski & Rosenberg (1987) Complex Systems Activations of hidden units for the same sound but different inputs
  • 28. Hinton diagrams: characterizing and visualizing connection to and from hidden units. Hinton (1992) Sci Am
  • 29. Autonomous driving learning by backpropagation. Pomerleau (1991) Neural Comput
  • 30. Autonomous driving learning by backpropagation. Pomerleau (1991) Neural Comput; https://www.youtube.com/watch?v=ilP4aPDTBPE
  • 31. Gradient vanishing problem: why is training a multi-layer neural network so difficult? Hochreiter et al. (1991)
    • The back-propagation algorithm works only for neural networks of three or four layers.
    • Training neural networks with many hidden layers, called “deep neural networks”, is notoriously difficult.
    $\Delta_j^{(N-1)} = \sum_i \Delta_i^{(N)} x_i^{(N)} \left(1 - x_i^{(N)}\right) w_{ij}^{(N-1)}$
    $\Delta_k^{(N-2)} = \sum_j \Delta_j^{(N-1)} x_j^{(N-1)} \left(1 - x_j^{(N-1)}\right) w_{jk}^{(N-2)} = \sum_j \left[\sum_i \Delta_i^{(N)} x_i^{(N)} \left(1 - x_i^{(N)}\right) w_{ij}^{(N-1)}\right] x_j^{(N-1)} \left(1 - x_j^{(N-1)}\right) w_{jk}^{(N-2)}$
    $\Delta^{(n)} \sim x^{(n+1)} \left(1 - x^{(n+1)}\right) \times \cdots \times x^{(N-1)} \left(1 - x^{(N-1)}\right) \times x^{(N)} \left(1 - x^{(N)}\right) \times \Delta^{(N)}$
    Each factor $x(1 - x)$ is at most 1/4, so the back-propagated error tends to shrink geometrically with the number of layers it passes through.
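The shrinking error can be observed in a randomly initialized network. The sketch below assumes sigmoid units and Gaussian weights with standard deviation 1/sqrt(width), both illustrative choices. It propagates a unit error vector from the output back through ten layers and records its norm at each layer.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(0)
width, depth = 50, 10                   # hypothetical network: 10 weight layers of 50 units
weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
           for _ in range(depth)]

# Forward pass, keeping the activities of every layer.
xs = [rng.normal(size=width)]
for w in weights:
    xs.append(sigmoid(w @ xs[-1]))

# Backward pass, starting from an arbitrary output error.
delta = np.ones(width)
norms = [np.linalg.norm(delta)]
for n in reversed(range(depth)):
    local = delta * xs[n + 1] * (1.0 - xs[n + 1])   # each factor x(1 - x) is at most 1/4
    delta = weights[n].T @ local
    norms.append(np.linalg.norm(delta))

for layers_back, value in enumerate(norms):
    print(layers_back, value)
```

With this weight scale the error norm drops by a roughly constant factor per layer, so weights in the lowest layers receive updates that are orders of magnitude smaller than those near the output.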
  • 32. Multilayer neural network: recurrent connections • A feedforward neural network can represent an instantaneous relationship between inputs and outputs - memoryless: it depends on current inputs but not on previous inputs. • In order to describe a history, a neural network should have its own dynamics. • One way to incorporate dynamics into a neural network is to introduce recurrent connections between units.
  • 33. Working memory in the parietal cortex. • A feedforward neural network can represent an instantaneous relationship between inputs and outputs - memoryless: it depends on current inputs x(t) but not on previous inputs x(t-1), x(t-2), ... • In order to describe a history, a neural network should have its own dynamics. • One way to incorporate dynamics into a neural network is to introduce recurrent connections between units.
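  • 34. Multilayer neural network: recurrent connections
    Recurrent dynamics of neural network: $x_i(t+1) = f\left(u_i(t+1)\right) = f\left(\left(\mathbf{W}\mathbf{x}(t) + \mathbf{U}\mathbf{a}(t)\right)_i\right)$
    Output readout: $z_i(t) = g\left(\left(\mathbf{V}\mathbf{x}(t)\right)_i\right)$
    [Figure: the input a projects to the recurrent units x through U, the units x are recurrently connected through W, and the output z is read out through V.]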
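A short simulation shows how the recurrent state carries a memory of earlier inputs. The sketch assumes f = tanh and g = identity, since the slide leaves both functions unspecified; the sizes and random matrices are arbitrary.

```python
import numpy as np

def run_rnn(U, W, V, inputs, f=np.tanh, g=lambda u: u):
    """Iterate x(t+1) = f(W x(t) + U a(t)) from x(0) = 0 and read out z(t) = g(V x(t))."""
    x = np.zeros(W.shape[0])
    outputs = []
    for a in inputs:
        x = f(W @ x + U @ a)
        outputs.append(g(V @ x))
    return np.array(outputs)

rng = np.random.default_rng(0)
n_in, n_units, n_out, n_steps = 2, 20, 1, 5
U = rng.normal(size=(n_units, n_in))
W = rng.normal(size=(n_units, n_units)) / np.sqrt(n_units)
V = rng.normal(size=(n_out, n_units))

inputs_a = rng.normal(size=(n_steps, n_in))
inputs_b = inputs_a.copy()
inputs_b[0] += 1.0                      # the two input series differ only at the first step

z_a = run_rnn(U, W, V, inputs_a)
z_b = run_rnn(U, W, V, inputs_b)
print("difference in the last output:", np.abs(z_a[-1] - z_b[-1]))
```

The last inputs of the two series are identical, yet the last outputs differ, which a memoryless feedforward map from a(t) to z(t) could not produce.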
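  • 35. Temporal unfolding: backpropagation through time (BPTT)
    Training set for a recurrent network:
    Input series: $\{\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_t, \ldots, \mathbf{a}_{T-1}\}$
    Output series: $\{\mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \ldots, \mathbf{z}_t, \ldots, \mathbf{z}_T\}$
    Optimize the weight matrices $\{\mathbf{U}, \mathbf{W}, \mathbf{V}\}$ so as to approximate the training set.
    [Figure: one time step of the network, $\mathbf{a}_{t-1}, \mathbf{x}_{t-1} \to \mathbf{x}_t \to \mathbf{z}_t$.]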
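  • 36. Temporal unfolding: backpropagation through time (BPTT)
    [Figure: the recurrent network unfolded in time. One step: $\mathbf{a}_0 \to \mathbf{x}_1 \to \mathbf{z}_1$. Two steps: $\mathbf{a}_0 \to \mathbf{x}_1$, then $\mathbf{a}_1, \mathbf{x}_1 \to \mathbf{x}_2 \to \mathbf{z}_2$. Three steps: the same chain extended through $\mathbf{a}_2$ and $\mathbf{x}_3$ to $\mathbf{z}_3$. In general: $\mathbf{a}_{t-1}, \mathbf{x}_{t-1} \to \mathbf{x}_t \to \mathbf{z}_t$, with the same U, W, V at every step.]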
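Once the network is unfolded, BPTT is ordinary backpropagation with the gradients of the shared matrices summed over time steps. The sketch below assumes sigmoid recurrent units, a linear readout, and a squared-error cost summed over time, none of which the slides fix; it checks the result against finite differences.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def run(U, W, V, inputs):
    """x_{t+1} = f(W x_t + U a_t) with x_0 = 0, and z_{t+1} = V x_{t+1}."""
    xs = [np.zeros(W.shape[0])]
    for a in inputs:
        xs.append(sigmoid(W @ xs[-1] + U @ a))
    zs = [V @ x for x in xs[1:]]
    return xs, zs

def cost(U, W, V, inputs, targets):
    _, zs = run(U, W, V, inputs)
    return 0.5 * sum(np.sum((z - d) ** 2) for z, d in zip(zs, targets))

def bptt(U, W, V, inputs, targets):
    """Gradients of the summed cost with respect to U, W and V."""
    xs, zs = run(U, W, V, inputs)
    grad_U, grad_W, grad_V = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    carry = np.zeros(W.shape[0])                 # error arriving from later time steps
    for t in reversed(range(len(inputs))):
        err = zs[t] - targets[t]                 # output error of z_{t+1}
        grad_V += np.outer(err, xs[t + 1])
        delta = V.T @ err + carry                # dE/dx_{t+1}
        local = delta * xs[t + 1] * (1.0 - xs[t + 1])
        grad_W += np.outer(local, xs[t])         # the same W is used at every step
        grad_U += np.outer(local, inputs[t])
        carry = W.T @ local
    return grad_U, grad_W, grad_V

rng = np.random.default_rng(0)
n_in, n_units, n_out, n_steps = 2, 4, 1, 6       # hypothetical sizes
U = rng.normal(size=(n_units, n_in))
W = rng.normal(size=(n_units, n_units))
V = rng.normal(size=(n_out, n_units))
inputs = rng.normal(size=(n_steps, n_in))
targets = rng.normal(size=(n_steps, n_out))
grad_U, grad_W, grad_V = bptt(U, W, V, inputs, targets)

def numerical_grad(matrix, eps=1e-6):
    """Central differences; perturbs the matrix in place and restores it."""
    grad = np.zeros_like(matrix)
    for idx in np.ndindex(matrix.shape):
        old = matrix[idx]
        matrix[idx] = old + eps
        c_plus = cost(U, W, V, inputs, targets)
        matrix[idx] = old - eps
        c_minus = cost(U, W, V, inputs, targets)
        matrix[idx] = old
        grad[idx] = (c_plus - c_minus) / (2 * eps)
    return grad

max_err = max(np.max(np.abs(numerical_grad(m) - g))
              for m, g in [(U, grad_U), (W, grad_W), (V, grad_V)])
print("largest gradient mismatch:", max_err)
```

The variable `carry` plays the role of the error coming from the layer above in the feedforward case, with time steps taking the place of layers.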
  • 37. Working-memory related activity in parietal cortex. Gnadt & Andersen (1988) Exp Brain Res
  • 38. Temporal unfolding: backpropagation through time (BPTT) Zipser (1991) Neural Comput
  • 39. Temporal unfolding: backpropagation through time (BPTT) Zipser (1991) Neural Comput [Figure panels: Model, Experiment]
  • 40. Spike pattern discrimination in humans. Johansson & Birznieks (2004); Johansson & Flanagan (2009)
  • 41. Spike pattern discrimination in dendrites. Branco et al. (2010) Science
  • 42. Tempotron: Spike-based perceptron. Consider five neurons, each emitting one spike but at a different time. Rate coding: information is coded in the number of spikes in a given period, so here $(r_1, r_2, r_3, r_4, r_5) = (1, 1, 1, 1, 1)$. Temporal coding: information is coded in the temporal pattern of spiking.
  • 43. Tempotron: Spike-based perceptron. Consider five neurons, each emitting one spike but at a different time.
  • 44. Tempotron: Spike-based perceptron. Basic idea: Expand the spike pattern into time. [Figure: a pattern over N inputs and T time steps expanded into an N×T array; the time axis is labeled “Now”.]
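  • 45. Tempotron: Spike-based perceptron.
    Consider a classification problem of two spike patterns:
    Pattern 1: neuron 1 contributes $w_1 e^{-3\Delta t} + w_1 e^{-\Delta t}$ and neuron 2 contributes $w_2 e^{-2\Delta t} + w_2$, with $w_1\left(e^{-3\Delta t} + e^{-\Delta t}\right) + w_2\left(e^{-2\Delta t} + 1\right) > \theta$
    Pattern 2: neuron 1 contributes $w_1 e^{-2\Delta t} + w_1$ and neuron 2 contributes $w_2 e^{-3\Delta t} + w_2 e^{-\Delta t}$, with $w_1\left(e^{-2\Delta t} + 1\right) + w_2\left(e^{-3\Delta t} + e^{-\Delta t}\right) < \theta$
    If a vector notation is introduced:
    $\mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}, \quad \mathbf{x}^{(1)} = \begin{pmatrix} e^{-3\Delta t} + e^{-\Delta t} \\ e^{-2\Delta t} + 1 \end{pmatrix}, \quad \mathbf{x}^{(2)} = \begin{pmatrix} e^{-2\Delta t} + 1 \\ e^{-3\Delta t} + e^{-\Delta t} \end{pmatrix}$
    This classification problem is reduced to a perceptron problem:
    $\mathbf{w}^{\mathsf T}\mathbf{x}^{(1)} > \theta, \quad \mathbf{w}^{\mathsf T}\mathbf{x}^{(2)} < \theta$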
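The reduction can be followed numerically. The sketch below takes the slide's simplification at face value: each spike contributes an exponentially decaying trace evaluated at the present time. The unit time step, the fixed threshold θ = 1, the learning rate, and the perceptron-style updates are illustrative assumptions; the tempotron of Gütig & Sompolinsky (2006) uses a different kernel and learning rule.

```python
import numpy as np

def psp_features(spike_times, now, tau=1.0):
    """One feature per neuron: the sum of exp(-(now - t) / tau) over that neuron's spikes."""
    return np.array([np.sum(np.exp(-(now - np.asarray(times)) / tau))
                     for times in spike_times])

dt = 1.0
now = 3.0 * dt
# Spike times of (neuron 1, neuron 2) in each pattern, chosen to match the sums on the slide.
pattern_1 = [[0.0, 2.0 * dt], [1.0 * dt, 3.0 * dt]]
pattern_2 = [[1.0 * dt, 3.0 * dt], [0.0, 2.0 * dt]]
x1 = psp_features(pattern_1, now)       # (e^-3 + e^-1, e^-2 + 1)
x2 = psp_features(pattern_2, now)       # (e^-2 + 1, e^-3 + e^-1)

theta, lr = 1.0, 0.1
w = np.zeros(2)
for _ in range(100):
    n_errors = 0
    for x, label in [(x1, +1), (x2, -1)]:
        fires = w @ x > theta
        if label == +1 and not fires:   # should have crossed the threshold: potentiate
            w += lr * x
            n_errors += 1
        elif label == -1 and fires:     # should have stayed below the threshold: depress
            w -= lr * x
            n_errors += 1
    if n_errors == 0:
        break

print("weights:", w, "| pattern 1:", w @ x1, "| pattern 2:", w @ x2)
```

The learned weights favor neuron 2, whose most recent spike in pattern 1 falls at the present time and therefore contributes the largest trace.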
  • 47. Learning a tempotron: intuition.
    Pattern 1 (correctly classified): $w_1\left(e^{-3\Delta t} + e^{-\Delta t}\right) + w_2\left(e^{-2\Delta t} + 1\right) > \theta$
    Pattern 2 (misclassified): $w_1\left(e^{-2\Delta t} + 1\right) + w_2\left(e^{-3\Delta t} + e^{-\Delta t}\right) > \theta$
    What was wrong if the second pattern was misclassified? The last spike of neuron #1 (red one) is most responsible for the error, so the synaptic strength of this neuron should be reduced: $\Delta w_1 = -\lambda$
  • 48. Learning a tempotron: intuition.
    Pattern 1 (misclassified): $w_1\left(e^{-3\Delta t} + e^{-\Delta t}\right) + w_2\left(e^{-2\Delta t} + 1\right) < \theta$
    Pattern 2 (correctly classified): $w_1\left(e^{-2\Delta t} + 1\right) + w_2\left(e^{-3\Delta t} + e^{-\Delta t}\right) < \theta$
    What was wrong if the first pattern was misclassified? The last spike of neuron #2 (red one) is most responsible for the error, so the synaptic strength of this neuron should be potentiated: $\Delta w_2 = +\lambda$
  • 49. Exercise: Capacity of perceptron. • Generate a set of random vectors. • Write code for the perceptron learning algorithm. • By randomly relabeling the vectors, count how many of the labelings are linearly separable. Rigotti, M., Barak, O., Warden, M. R., Wang, X. J., Daw, N. D., Miller, E. K., & Fusi, S. (2013). The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451), 585-590.
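  • 50. Exercise: Training of recurrent neural networks.
    Goal: Investigate the effects of chaos and feedback in a recurrent network.
    Recurrent dynamics without feedback: $\mathbf{x}_{n+1} = \mathbf{x}_n + \left(-\mathbf{x}_n + \mathbf{M}\mathbf{r}_n\right)\Delta t$, with $\mathbf{r}_n = \tanh \mathbf{x}_n$ and $z_n = \mathbf{w}_n^{\mathsf T} \tanh \mathbf{x}_n$
    Update of covariance matrix: $\mathbf{P}_0 = \mathbf{I}/\alpha$, $\mathbf{P}_{n+1} = \mathbf{P}_n - \frac{\mathbf{P}_n \mathbf{r}_n \mathbf{r}_n^{\mathsf T} \mathbf{P}_n}{1 + \mathbf{r}_n^{\mathsf T} \mathbf{P}_n \mathbf{r}_n}$
    Update of weight matrix: $\mathbf{w}_{n+1} = \mathbf{w}_n - e_n \mathbf{P}_n \mathbf{r}_n$, with $e_n = z_n - f_n$
    force_internal_all2all.m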
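  • 51. Exercise: Training of recurrent neural networks.
    Goal: Investigate the effects of chaos and feedback in a recurrent network.
    Recurrent dynamics with feedback: $\mathbf{x}_{n+1} = \mathbf{x}_n + \left(-\mathbf{x}_n + \mathbf{M}\mathbf{r}_n + \mathbf{w}^{f} z_n\right)\Delta t$, with $\mathbf{r}_n = \tanh \mathbf{x}_n$ and $z_n = \mathbf{w}_n^{\mathsf T} \tanh \mathbf{x}_n$
    Update of covariance matrix: $\mathbf{P}_0 = \mathbf{I}/\alpha$, $\mathbf{P}_{n+1} = \mathbf{P}_n - \frac{\mathbf{P}_n \mathbf{r}_n \mathbf{r}_n^{\mathsf T} \mathbf{P}_n}{1 + \mathbf{r}_n^{\mathsf T} \mathbf{P}_n \mathbf{r}_n}$
    Update of weight matrix: $\mathbf{w}_{n+1} = \mathbf{w}_n - e_n \mathbf{P}_n \mathbf{r}_n$, with $e_n = z_n - f_n$
    force_external_feedback_loop.m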
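The update equations with output feedback can be turned into a short simulation. This is a rough Python sketch, not a port of force_external_feedback_loop.m: the network size, gain g = 1.5, step size, α, and the sine target are all hypothetical. The weight update uses the already-updated P, the standard recursive least-squares ordering, which is an assumption about how the indices on the slide are meant.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, g, dt, alpha = 100, 1.5, 0.1, 1.0
M = g * rng.normal(size=(n_units, n_units)) / np.sqrt(n_units)   # recurrent weights
w_f = rng.uniform(-1.0, 1.0, size=n_units)                        # feedback weights
w = np.zeros(n_units)                                             # readout weights (learned)
P = np.eye(n_units) / alpha                                       # P_0 = I / alpha

n_steps = 1000
target = np.sin(2.0 * np.pi * np.arange(n_steps) * dt / 20.0)     # hypothetical target f_n

x = 0.5 * rng.normal(size=n_units)
r = np.tanh(x)
z = w @ r
errors_before, errors_after = [], []
for n in range(n_steps):
    x = x + (-x + M @ r + w_f * z) * dt      # recurrent dynamics with output feedback
    r = np.tanh(x)
    z = w @ r
    e = z - target[n]                        # error before the weight update
    Pr = P @ r
    P = P - np.outer(Pr, Pr) / (1.0 + r @ Pr)
    w = w - e * (P @ r)                      # recursive least-squares step
    z = w @ r                                # output after the update, fed back next step
    errors_before.append(abs(e))
    errors_after.append(abs(z - target[n]))

print("mean |error| before update:", np.mean(errors_before))
print("mean |error| after update:", np.mean(errors_after))
```

Each update scales the instantaneous error by 1 - rᵀPr, which lies between 0 and 1, so the output is pulled toward the target at every step while the feedback term keeps the network activity tied to that output.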
  • 52. Exercise: Training of recurrent neural networks. Goal: Investigate the effects of chaos and feedback in a recurrent network. • Investigate the effect of output feedback. Is there any difference in the activities of recurrent units? • Investigate the effect of the gain parameter g. What happens if the gain parameter is smaller than 1? • Try to approximate some other time series, such as chaotic ones. Use the Lorenz model, for example.
  • 53. References
    • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive modeling, 5(3), 1.
    • Sejnowski, T. J., & Rosenberg, C. R. (1987). Parallel networks that learn to pronounce English text. Complex systems, 1(1), 145-168.
    • Funahashi, K. I. (1989). On the approximate realization of continuous mappings by neural networks. Neural networks, 2(3), 183-192.
    • Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.
    • Zipser, D. (1991). Recurrent network model of the neural mechanism of short-term active memory. Neural Computation, 3(2), 179-193.
    • Johansson, R. S., & Birznieks, I. (2004). First spikes in ensembles of human tactile afferents code complex spatial fingertip events. Nature neuroscience, 7(2), 170-177.
    • Branco, T., Clark, B. A., & Häusser, M. (2010). Dendritic discrimination of temporal input sequences in cortical neurons. Science, 329(5999), 1671-1675.
    • Gütig, R., & Sompolinsky, H. (2006). The tempotron: a neuron that learns spike timing–based decisions. Nature neuroscience, 9(3), 420-428.
    • Sussillo, D., & Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4), 544-557.