Supervised Learning in Multilayer Spiking Neural Networks
Ioana Sporea
[email protected]
André Grüning
[email protected]
Department of Computing, University of Surrey, Guildford, GU2 7XH, U.K.
1 Introduction
2 Background
The ReSuMe algorithm has also been applied to neural networks with
a hidden layer, where weights of downstream neurons are subject to mul-
tiplicative scaling (Grüning & Sporea, 2012). The simulations show that
networks with one hidden layer can perform linearly nonseparable logi-
cal operations, while networks without hidden layers cannot. The ReSuMe
algorithm has also been used to train the output layer in a feedforward net-
work in Glackin, Maguire, McDaid, and Sayers (2011) and Wade, McDaid,
Santos, and Sayers (2010), where the hidden layer acted as a frequency filter.
However, input and target outputs here consisted of fixed-rate spike trains.
Approaches from a different angle include Rostro-Gonzalez, Vasquez-
Betancour, Cessac, and Viéville (2010), who use a linear programming ap-
proach to estimate weights (and delays) in recurrent spiking networks based
on LIF neurons and successfully attempt to reconstruct weights from a spike
raster and initial conditions such as a (fixed) input spike train and initial
states of membrane potentials.
Finally, despite some positive evidence (Knudsen, 1994, 2002), there is
still a debate about whether supervised learning takes place in nervous systems at all or whether reinforcement-style learning is more plausible.
Urbanczik and Senn (2009) use a clever approach based on stochastic gra-
dients to derive a reinforcement learning rule for populations of spiking
neurons. Their network consists of ensembles of noise-escape neurons in a
single layer and with an external critic. It is applied to classification tasks
where inputs are spike-train-encoded; however, outputs are spike-rate or
latency encoded, not making use of full spike-train patterns. No attempt is
made to extend this behavior to networks with a hidden layer or to true
spatiotemporal spike patterns as outputs. However, the difference between
fully supervised and reinforcement learning schemes might be only a no-
tional one, as Roelfsema and van Ooyen (2005) and Grüning (2007) demon-
strate. Their work focuses on relating back propagation weight changes to
reinforcement learning for multilayer networks of rate neurons in classification and time series prediction tasks; similar techniques can perhaps also be applied to spiking neurons.
This letter introduces a new supervised learning algorithm that combines the main strength of SpikeProp, its applicability to multilayer networks (Bohte et al., 2002), with the flexibility of ReSuMe, which can be used with multiple spikes and different neuron models (Ponulak & Kasiński, 2010).
3 Learning Algorithm
In this section, we describe the new learning algorithm for feedforward mul-
tilayer spiking neural networks. The learning rule is derived for networks with only one hidden layer; the algorithm extends to networks with more hidden layers in the same way. First, we give an alternative motivation
of the ReSuMe learning rule. Ponulak and Kasiński (2010), in their original
motivation of ReSuMe, simply invoke a spiking analog of the delta rule
as its starting point; this rule is model free and could be shown to converge for single input, output, and target spikes (Ponulak, 2006). Our approach
instead gives a more explicit relation between gradient descent and weight
changes under ReSuMe. In our alternative formulation, the algorithm is
then extended to networks with a hidden layer.
3.1 Neuron Model. The input and output signals of spiking neurons are
represented by the timing of spikes. A spike train is defined as a sequence of
impulses fired by a particular neuron at times $t^f$. Spike trains are formalized by a sum of Dirac $\delta$ functions (Gerstner & Kistler, 2002):

\[ S(t) = \sum_f \delta(t - t^f). \qquad (3.1) \]
In order to establish a relation between the input and output spike trains
for a single neuron, we start from a linear stochastic neuron model in
continuous time. The instantaneous firing rate Ro (t) of a neuron o is the
probability density of firing at time t and is determined by the instantaneous
firing rates of its presynaptic neurons h,
\[ R_o(t) = \frac{1}{n_h} \sum_{h \in H} w_{oh}\, R_h(t), \qquad (3.2) \]

where $n_h$ is the number of presynaptic neurons. The instantaneous firing rate of a neuron is defined as the trial average of its spike train,

\[ R(t) = \langle S(t) \rangle = \lim_{M \to \infty} \frac{1}{M} \sum_{j=1}^{M} S_j(t), \qquad (3.3) \]
where M is the number of trials and S j (t) is the concrete spike train for each
trial.
The instantaneous firing rate R(t) will be used for deriving the learning
algorithm due to its smoothness. However, it will subsequently be replaced
at an appropriate point by an estimate for a single run, namely, the (discon-
tinuous) spike train S(t). This is a more elaborate and explicit procedure
than in Ponulak and Kasiński (2010) but is based on the same underlying
ideas.
The network error at time $t$ is defined in terms of the instantaneous firing rates of the output neurons $o \in O$, comparing the actual rate $R_o^a(t)$ with the desired rate $R_o^d(t)$:

\[ E(t) = E(R_o^a(t)) = \frac{1}{2} \sum_{o \in O} \left[ R_o^a(t) - R_o^d(t) \right]^2. \qquad (3.4) \]
In order to minimize the network error, the weights are modified using a
process of gradient descent,
\[ \Delta w_{oh}(t) = -\eta\, \frac{\partial E(R_o^a(t))}{\partial w_{oh}}, \qquad (3.5) \]

where $\eta$ is the learning rate and $w_{oh}$ represents the weight between the output neuron $o$ and hidden neuron $h$. $\Delta w_{oh}(t)$ is the weight change contribution due to the error $E(t)$ at time $t$, and the total weight change is $\Delta w_{oh} = \int \Delta w_{oh}(t)\, dt$ over the duration of the spike train. This is analogous to
the starting point of standard backpropagation for rate neurons in discrete
time. For simplicity, the learning rate will be considered η = 1 and will be
suppressed in the following equations, as the step length of each learning
iteration will be given by other learning parameters to be defined later.
Also, in the following, derivatives are understood in a functional sense.
3.2.1 Weight Modifications for the Output Neurons. In this section we derive the weight update of the ReSuMe learning algorithm in an alternative way and connect it with gradient-descent learning for spiking
neurons. We will need this derivation as a first step to derive our extension
of ReSuMe to subsequent layers in section 3.2.2. However, this derivation
is also instructive in its own right as it works out more clearly than in the
original derivation (Ponulak & Kasiński, 2010) how ReSuMe and gradient
descent are connected. It also qualifies Ponulak's statement that ReSuMe can be applied to any neuron model: here, this is the case if the neuron model
can, on an appropriate timescale, be approximated well enough with a linear
neuron model (first-order approximation in a Taylor or Volterra series).
The derivative in equation 3.5 is expanded using the chain rule,

\[ \frac{\partial E(R_o^a(t))}{\partial w_{oh}} = \frac{\partial E(R_o^a(t))}{\partial R_o^a(t)}\, \frac{\partial R_o^a(t)}{\partial w_{oh}}. \qquad (3.6) \]

The first term of the right-hand side of equation 3.6 can be calculated as

\[ \frac{\partial E(R_o^a(t))}{\partial R_o^a(t)} = R_o^a(t) - R_o^d(t). \qquad (3.7) \]

The second term follows from the linear neuron model of equation 3.2,

\[ \frac{\partial R_o^a(t)}{\partial w_{oh}} = \frac{1}{n_h}\, R_h(t), \qquad (3.8) \]

so that the weight change becomes

\[ \Delta w_{oh}(t) = -\frac{1}{n_h}\, \left[ R_o^a(t) - R_o^d(t) \right] R_h(t). \qquad (3.9) \]
For convenience, we define the backpropagated error δo (t) for the output
neuron o:
\[ \delta_o(t) := \frac{1}{n_h}\, \left[ R_o^d(t) - R_o^a(t) \right]; \qquad (3.10) \]

hence

\[ \Delta w_{oh}(t) = \delta_o(t)\, R_h(t). \qquad (3.11) \]

Replacing the instantaneous firing rates with their single-run estimates, the spike trains, yields
\[ \Delta w_{oh}(t) = \frac{1}{n_h}\, \left[ S_o^d(t) - S_o^a(t) \right] S_h(t). \qquad (3.12) \]
The products of two spike trains in equation 3.12 involve products of $\delta$ distributions; following Ponulak and Kasiński (2010), they are replaced by STDP-like processes with learning windows $a^{pre}$ and $a^{post}$ and a non-Hebbian term $a$:

\[ S_o^d(t)\, S_h(t) \;\rightarrow\; S_h(t) \left[ a + \int_0^\infty a^{pre}(s)\, S_o^d(t - s)\, ds \right] + S_o^d(t) \left[ a + \int_0^\infty a^{post}(s)\, S_h(t - s)\, ds \right], \qquad (3.13) \]

\[ S_o^a(t)\, S_h(t) \;\rightarrow\; S_h(t) \left[ a + \int_0^\infty a^{pre}(s)\, S_o^a(t - s)\, ds \right] + S_o^a(t) \left[ a + \int_0^\infty a^{post}(s)\, S_h(t - s)\, ds \right], \qquad (3.14) \]
with the learning window

\[ W(s) = \begin{cases} a^{pre}(-s) = -A_-\, \exp\!\left(\dfrac{s}{\tau_-}\right), & \text{if } s \le 0, \\[1.5ex] a^{post}(s) = +A_+\, \exp\!\left(\dfrac{-s}{\tau_+}\right), & \text{if } s > 0, \end{cases} \qquad (3.15) \]
where A+ , A− > 0 are the amplitudes and τ+ , τ− > 0 are the time constants
of the learning window. Other forms of learning windows can be considered
(Ponulak, 2008); however, in this letter, we use only this form. Thus, the final
learning formula for the weight modifications becomes
\[ \Delta w_{oh}(t) = \frac{1}{n_h}\, S_h(t) \int_0^\infty a^{pre}(s)\, \left[ S_o^d(t - s) - S_o^a(t - s) \right] ds \;+\; \frac{1}{n_h}\, \left[ S_o^d(t) - S_o^a(t) \right] \left[ a + \int_0^\infty a^{post}(s)\, S_h(t - s)\, ds \right]. \qquad (3.16) \]
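To make the update concrete, the following sketch (our own code, not part of the original work) evaluates equations 3.15 and 3.16 for spike trains given as lists of spike times, with the integrals over the learning windows reduced to sums over spike pairs; the parameter values are those used later in section 5.2, and all function names are ours.

```python
import numpy as np

# Learning-window parameters (illustrative; values as in section 5.2).
A_PLUS, A_MINUS = 1.2, 0.5        # amplitudes A+ and A- of the STDP window
TAU_PLUS, TAU_MINUS = 5.0, 5.0    # time constants tau+ and tau- (ms)
A_CONST = 0.05                    # non-Hebbian term a

def a_pre(s):
    # a^pre(s) = -A- * exp(-s / tau-) for s >= 0 (cf. equation 3.15).
    return -A_MINUS * np.exp(-s / TAU_MINUS)

def a_post(s):
    # a^post(s) = +A+ * exp(-s / tau+) for s > 0 (cf. equation 3.15).
    return A_PLUS * np.exp(-s / TAU_PLUS)

def delta_w_oh(S_h, S_o_d, S_o_a, n_h):
    """Total change of one hidden->output weight over a trial (equation 3.16,
    integrated over time). S_h, S_o_d, S_o_a are iterables of spike times (ms)
    of the hidden, desired output, and actual output spike trains."""
    dw = 0.0
    # First term: hidden spikes paired with earlier desired/actual output spikes.
    for t_h in S_h:
        dw += sum(a_pre(t_h - t_d) for t_d in S_o_d if t_d < t_h)
        dw -= sum(a_pre(t_h - t_a) for t_a in S_o_a if t_a < t_h)
    # Second term: desired/actual output spikes paired with earlier hidden spikes,
    # plus the non-Hebbian contribution a per output spike.
    for t_d in S_o_d:
        dw += A_CONST + sum(a_post(t_d - t_h) for t_h in S_h if t_h < t_d)
    for t_a in S_o_a:
        dw -= A_CONST + sum(a_post(t_a - t_h) for t_h in S_h if t_h < t_a)
    return dw / n_h
```

For example, with hypothetical spike times `delta_w_oh([3.0], [10.0], [14.0], n_h=5)` returns a small positive value (about 0.03): the actual output spike is later than the desired one, so the weight is strengthened and the output is pulled earlier.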
3.2.2 Weight Modifications for the Hidden Neurons. In this section we ex-
tend the argument above to weight changes between the input and the
hidden layers. The weight modifications for the hidden neurons are calcu-
lated in a similar manner in the negative gradient direction:
\[ \Delta w_{hi}(t) = -\frac{\partial E(R_o^a(t))}{\partial w_{hi}}. \qquad (3.17) \]

This derivative is expanded with the chain rule,

\[ \frac{\partial E(R_o^a(t))}{\partial w_{hi}} = \frac{\partial E(R_o^a(t))}{\partial R_h(t)}\, \frac{\partial R_h(t)}{\partial w_{hi}}. \qquad (3.18) \]

The first factor on the right-hand side of the above equation is expanded for each output neuron using the chain rule:

\[ \frac{\partial E(R_o^a(t))}{\partial R_h(t)} = \sum_{o \in O} \frac{\partial E(R_o^a(t))}{\partial R_o^a(t)}\, \frac{\partial R_o^a(t)}{\partial R_h(t)}. \qquad (3.19) \]

The second factor on the right-hand side of the above equation is calculated from equation 3.2:

\[ \frac{\partial R_o^a(t)}{\partial R_h(t)} = \frac{1}{n_h}\, w_{oh}. \qquad (3.20) \]
The derivatives of the error with respect to the output spike train have al-
ready been calculated for the weights to the output neurons in equation 3.7.
By combining these results,
\[ \frac{\partial E(R_o^a(t))}{\partial R_h(t)} = \frac{1}{n_h} \sum_{o \in O} \left[ R_o^a(t) - R_o^d(t) \right] w_{oh}. \qquad (3.21) \]

The second factor of equation 3.18 again follows from the linear neuron model, with $n_i$ the number of input neurons,

\[ \frac{\partial R_h(t)}{\partial w_{hi}} = \frac{1}{n_i}\, R_i(t), \qquad (3.22) \]

so that the weight change becomes

\[ \Delta w_{hi}(t) = -\frac{1}{n_h n_i} \sum_{o \in O} \left[ R_o^a(t) - R_o^d(t) \right] R_i(t)\, w_{oh}. \qquad (3.23) \]
We define the backpropagated error δh (t) for layers other than the output
layer:
\[ \delta_h(t) := \frac{1}{n_i} \sum_{o \in O} \delta_o(t)\, w_{oh}. \qquad (3.24) \]

Replacing the instantaneous firing rates with the spike trains as before, the weight change reads

\[ \Delta w_{hi}(t) = \frac{1}{n_h n_i} \sum_{o \in O} \left[ S_o^d(t) - S_o^a(t) \right] S_i(t)\, w_{oh}. \qquad (3.25) \]
We now repeat the procedure of replacing the product of two spike trains
(involving δ-distributions) with an STDP process. We note first that equa-
tion 3.25 no longer depends on any spikes fired or not fired in the hidden
layer. Although there are neurobiological plasticity processes that can con-
vey information about a transmitted spike from the affected synapses to
lateral or downstream synapses (for an overview, see Harris, 2008), no di-
rect neurobiological basis is known for an STDP process between a synapse
and the outgoing spikes of an upstream neuron. Therefore, this substitution
is to be seen as a computational analogy and the weights will be modified
according to

\[ \Delta w_{hi}(t) = \frac{1}{n_i n_h} \sum_{o \in O} S_i(t) \int_0^\infty a^{pre}(s)\, \left[ S_o^d(t - s) - S_o^a(t - s) \right] ds \; w_{oh} \;+\; \frac{1}{n_i n_h} \sum_{o \in O} \left[ S_o^d(t) - S_o^a(t) \right] \left[ a + \int_0^\infty a^{post}(s)\, S_i(t - s)\, ds \right] w_{oh}. \qquad (3.26) \]
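A matching sketch for the input-to-hidden weights of equation 3.26, reusing `a_pre`, `a_post`, and `A_CONST` from the sketch in section 3.2.1 (again our own code; the sum over output neurons is written out explicitly):

```python
def delta_w_hi(S_i, S_o_d_all, S_o_a_all, w_oh_all, n_i, n_h):
    """Total change of one input->hidden weight over a trial (equation 3.26).
    S_i: spike times of input neuron i; S_o_d_all / S_o_a_all: lists of
    desired / actual spike-time lists, one per output neuron; w_oh_all:
    the corresponding hidden->output weights."""
    dw = 0.0
    for S_o_d, S_o_a, w_oh in zip(S_o_d_all, S_o_a_all, w_oh_all):
        term = 0.0
        # Input spikes paired with earlier desired/actual output spikes.
        for t_i in S_i:
            term += sum(a_pre(t_i - t_d) for t_d in S_o_d if t_d < t_i)
            term -= sum(a_pre(t_i - t_a) for t_a in S_o_a if t_a < t_i)
        # Desired/actual output spikes paired with earlier input spikes.
        for t_d in S_o_d:
            term += A_CONST + sum(a_post(t_d - t_i) for t_i in S_i if t_i < t_d)
        for t_a in S_o_a:
            term -= A_CONST + sum(a_post(t_a - t_i) for t_i in S_i if t_i < t_a)
        dw += term * w_oh   # or abs(w_oh), cf. the discussion of equation 3.27
    return dw / (n_i * n_h)
```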
Since the weight modifications depend only on the input, output, and target spike trains and not on the specific dynamics of the neuron model, this is an indication that this algorithm will function with similar neuron models, as we demonstrate in section 5.
In the hidden-layer update, the modulus $|w_{oh}|$ of the hidden-to-output weight is used in place of $w_{oh}$ (the motivation for the modulus is given by the case analysis in section 4):

\[ \Delta w_{hi}(t) = \frac{1}{n_i n_h} \sum_{o \in O} \left( S_i(t) \int_0^\infty a^{pre}(s)\, \left[ S_o^d(t - s) - S_o^a(t - s) \right] ds + \left[ S_o^d(t) - S_o^a(t) \right] \left[ a + \int_0^\infty a^{post}(s)\, S_i(t - s)\, ds \right] \right) |w_{oh}|. \qquad (3.27) \]

When each connection consists of multiple delayed subconnections, the weight change for subconnection $k$ between output neuron $o$ and hidden neuron $h$ becomes

\[ \Delta w_{oh}^k = \delta_o(t)\, R_h(t - d_{oh}^k), \qquad (3.28) \]

where $w_{oh}^k$ is the weight between output neuron $o$ and hidden neuron $h$ delayed by $d_{oh}^k$. The backpropagated error for the output is then

\[ \delta_o(t) = \frac{1}{m\, n_h}\, \left[ R_o^d(t) - R_o^a(t) \right], \qquad (3.29) \]

where $m$ is the number of subconnections. The learning rule for the weight modifications for any hidden layer is derived similarly as

\[ \Delta w_{hi}^k = \delta_h(t)\, R_i(t - d_{hi}^k), \qquad (3.30) \]
where δh (t) is the backpropagated error calculated over all possible back-
ward paths (from all output neurons through all delayed subconnections):
\[ \delta_h(t) = \frac{1}{m\, n_i} \sum_{l,\, o \in O} \delta_o(t)\, w_{oh}^l. \qquad (3.31) \]
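Read on a discrete time grid, equations 3.29 and 3.31 can be sketched as follows; the spike trains are represented as 0/1 indicator arrays over time bins (our own representation, following the paper's substitution of rates by single-run spike trains), and the array shapes and names are ours.

```python
import numpy as np

def backprop_errors(S_o_d, S_o_a, w_oh, m, n_h, n_i):
    """Backpropagated errors: delta_o per output neuron (equation 3.29) and
    delta_h per hidden neuron (equation 3.31).
    S_o_d, S_o_a: arrays of shape (n_out, n_bins) with 0/1 spike indicators;
    w_oh: hidden->output weights of shape (n_out, n_h, m) over m subconnections."""
    delta_o = (S_o_d - S_o_a) / (m * n_h)                    # shape (n_out, n_bins)
    # Sum over all output neurons o and all delayed subconnections l.
    delta_h = np.einsum('ot,ohl->ht', delta_o, w_oh) / (m * n_i)
    return delta_o, delta_h
```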
3.2.7 Synaptic Scaling. There is extensive evidence suggesting that spike-timing-dependent plasticity is not the only form of plasticity (Watt & Desai, 2010). Another plasticity mechanism used to stabilize the
neurons’ activity is synaptic scaling (Shepard et al., 2006). Synaptic scaling
regulates the strength of synapses in order to keep the neuron’s firing rate
within a particular range. The synaptic weights are scaled multiplicatively, thereby maintaining the relative differences in strength between inputs
(Watt & Desai, 2010).
In our network, in addition to the learning rule described above, the
weights are modified according to synaptic scaling in order to keep the
postsynaptic neuron firing rate within an optimal range [rmin , rmax ]. If a
weight wji from neuron i to neuron j causes the postsynaptic neuron to fire
with a rate outside the optimal range, the weights are scaled according to
the following formula (Grüning & Sporea, 2012):
\[ w_{ji} \rightarrow \begin{cases} (1 + f)\, w_{ji}, & w_{ji} > 0, \\[1.5ex] \dfrac{1}{1 + f}\, w_{ji}, & w_{ji} < 0, \end{cases} \qquad (3.32) \]
where the scaling factor f > 0 for r j < rmin , and f < 0 for r j > rmax .
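A minimal sketch of the scaling step of equation 3.32, assuming the incoming weights are held in a NumPy array and the firing-rate estimate is supplied by the caller (the step size is the value used in section 5.2; the code and names are otherwise ours):

```python
import numpy as np

def scale_weights(w_in, rate, r_min, r_max, f_step=0.005):
    """Multiplicative synaptic scaling (equation 3.32) of the incoming weights
    w_in of one neuron whose estimated firing rate is `rate`."""
    if r_min <= rate <= r_max:
        return w_in                        # rate already in the optimal range
    f = f_step if rate < r_min else -f_step
    scaled = w_in.copy()
    scaled[w_in > 0] *= (1.0 + f)          # excitatory weights
    scaled[w_in < 0] /= (1.0 + f)          # inhibitory weights, scaled oppositely
    return scaled

# Example with a hypothetical optimal range: a silent neuron (rate below r_min)
# has its excitatory inputs strengthened and its inhibitory inputs weakened.
w = scale_weights(np.array([0.3, -0.1, 0.5]), rate=0.0, r_min=0.5, r_max=1.5)
```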
Synaptic scaling solves the problem of optimal weight initialization. It
was observed that the initial values of the weights have a significant in-
fluence on the learning process, as values that are too large or too low
may result in failure to learn (Bohte et al., 2002). Preliminary experiments
showed that a feedforward network can still learn reliably simple spike
trains without synaptic scaling as long as the weights are initialized within
an optimal range. However, as the target patterns contain more spikes, find-
ing the optimal initial values for the weights becomes difficult. Moreover, as
the firing rate of the target neurons increases, it becomes harder to maintain
the output neurons’ firing rate within the target range without using small
learning rates. The introduction of synaptic scaling solves the problem of
weight initialization as well as speeds up the learning process.
4 Analysis of the Weight Changes

In this section, we illustrate the weight changes for a simple three-layer network. The output layer consists of a single neuron. The neurons are connected through a single subconnection with no delay. For simplicity, in this section spike trains will comprise only a single spike. Let $t_d$ and $t_o$ denote the desired and actual spike times of output neuron $o$, and let $t_h$ and $t_i$ denote the spike times of the hidden neuron $h$ and input neuron $i$, respectively. Also, for simplicity, synaptic scaling will not be considered here.
For a start, we assume $t_o, t_d > t_h > t_i$, that is, where relevant postsynaptic spikes occur after the presynaptic spikes. With these assumptions, equations 3.16 and 3.27 read after integrating out:

\[ \Delta w_{oh} = \frac{1}{n_h} \left( A_+\, \exp\frac{t_h - t_d}{\tau_+} - A_+\, \exp\frac{t_h - t_o}{\tau_+} \right), \qquad (4.1) \]

\[ \Delta w_{hi} = \frac{1}{n_h n_i}\, |w_{oh}| \left( A_+\, \exp\frac{t_i - t_d}{\tau_+} - A_+\, \exp\frac{t_i - t_o}{\tau_+} \right). \qquad (4.2) \]

We discuss only this case in the following and note that the case $t_o, t_d < t_h, t_i$ (i.e., post-before-pre) can be discussed along the same lines with $A_+$ above replaced by $A_-$. We discuss now the following subcases:
1. The output neuron fires a spike at time $t_o$ before the target firing time $t_d$ ($t_o < t_d$).
a. Weight modifications for the synapses between the output and hidden neurons. The weights are modified according to $\Delta w_{oh} = \frac{1}{n_h}\big(A_+ \exp\frac{t_h - t_d}{\tau_+} - A_+ \exp\frac{t_h - t_o}{\tau_+}\big)$. Since $t_o < t_d$, then $\exp\frac{t_h - t_o}{\tau_+} > \exp\frac{t_h - t_d}{\tau_+}$ in equation 4.1. This results in $\Delta w_{oh} < 0$, and thus in
a decrease of this weight. If the connection is an excitatory one,
the connection becomes less excitatory, increasing the likeli-
hood that the output neuron fires later during the next iteration,
hence minimizing the difference between the actual output and
the target firing time. If the connection is inhibitory, the connec-
tion will become stronger inhibitory, resulting in a later firing
of the output neuron o as well (see also Ponulak, 2006).
b. Weight modifications for the synapses between the hidden and input neurons. The weights to the hidden neurons are modified according to $\Delta w_{hi} = \frac{1}{n_h n_i}\big(A_+ \exp\frac{t_i - t_d}{\tau_+} - A_+ \exp\frac{t_i - t_o}{\tau_+}\big)\,|w_{oh}|$.
i. $w_{oh} \ge 0$. By analogous reasoning to the case above, $\Delta w_{hi} \le 0$; hence, the connection will become less excitatory or more inhibitory, again making the hidden neuron fire slightly later or suppress a hidden-layer spike, and hence making it more likely that the output neuron fires later because the connection from hidden to output layer is excitatory.
ii. $w_{oh} < 0$. For the weight $w_{hi}$, the direction of the weight change stays the same; hence, neuron $h$ will fire later. As it is now more likely to fire later, its inhibitory effect will come to bear on the output neuron also slightly later. Alternatively, if we do not take the absolute value of $w_{oh}$ in equation 3.27, but stick to equation 3.26, then the direction of change to $w_{hi}$ is reversed, that is, $\Delta w_{hi} > 0$. This brings forward the firing of neuron $h$; hence, $h$ has a less suppressive effect at $t_o$ and contributes to making the output fire even earlier. This is why it makes sense to use the modulus $|w_{oh}|$.
2. The output neuron fires a spike at time $t_o$ after the target firing time $t_d$ ($t_o > t_d$). As equations 4.1 and 4.2 change their sign when $t_o$ and $t_d$ are swapped, this case reduces to the above, but with the opposite sign of the weight change (i.e., an overall weight change such that $t_o$ moves forward in time, closer to $t_d$).
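As a numerical illustration of case 1 (the spike times below are hypothetical; only the parameters $A_+ = 1.2$ and $\tau_+ = 5$ ms are taken from section 5.2), suppose $t_h = 3$ ms, $t_o = 8$ ms, and $t_d = 10$ ms. Equation 4.1 then gives

\[ \Delta w_{oh} = \frac{1}{n_h} \left( 1.2\, e^{(3 - 10)/5} - 1.2\, e^{(3 - 8)/5} \right) = \frac{1}{n_h} \left( 1.2\, e^{-1.4} - 1.2\, e^{-1.0} \right) \approx \frac{0.296 - 0.441}{n_h} \approx -\frac{0.15}{n_h} < 0, \]

so the hidden-to-output weight decreases and the output spike is pushed later, toward $t_d$, exactly as described in case 1a.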
5 Simulations
Spike trains are compared using the van Rossum (2001) metric. Each spike train is first convolved with an exponential kernel,

\[ f(t) = \sum_i H(t - t_i)\, \exp\!\left( -\frac{t - t_i}{\tau_c} \right), \qquad (5.1) \]

where $t_i$ are the times of the spikes and $H(t)$ is the Heaviside function. $\tau_c$, the time constant of the exponential function, is chosen to be appropriate to the interspike interval of the output neurons (van Rossum, 2001). In the following simulations, the output neurons are required to fire approximately one spike in 10 ms; thus, $\tau_c = 10$ ms. The distance between two spike trains is the squared Euclidean distance between these two functions:

\[ D^2(f, g) = \frac{1}{\tau_c} \int_0^T \left[ f(t) - g(t) \right]^2 dt, \qquad (5.2) \]
where the distance is calculated over a time domain [0, T] that covers all
the spikes in the system. The van Rossum distance is also used to classify
the output pattern during learning and testing. The output pattern is in-
terpreted as the closest of the target patterns in terms of the van Rossum
distance. To give the reader an intuitive sense of the magnitude of the van
Rossum distance as used, a van Rossum distance of 0.1 corresponds, for
example, to a pair of spike trains that agree on all spike times, but one spike
pair is about 1 ms apart.
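A possible discrete-time computation of the van Rossum measure of equation 5.2 (our own code; the filtering is done on a fixed time grid, which is an implementation choice, with the defaults taken from the XOR simulations below):

```python
import numpy as np

def van_rossum_distance(spikes_f, spikes_g, tau_c=10.0, T=30.0, dt=0.1):
    """Squared van Rossum distance (equation 5.2) between two spike trains
    given as arrays of spike times in ms."""
    t = np.arange(0.0, T, dt)

    def filtered(spikes):
        # Convolve the spike train with a one-sided exponential kernel (equation 5.1).
        x = np.zeros_like(t)
        for ts in spikes:
            x += np.where(t >= ts, np.exp(-(t - ts) / tau_c), 0.0)
        return x

    f, g = filtered(spikes_f), filtered(spikes_g)
    return np.sum((f - g) ** 2) * dt / tau_c
```

With these defaults, two single-spike trains at 10 ms and 11 ms give a value of roughly 0.095, consistent with the 0.1 quoted above.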
The results are averaged over a large number of trials (50 trials unless
stated otherwise), with the network being initialized with a new set of
random weights every trial. On each trial, the learning algorithm is applied for a maximum of 2000 iterations, or until the network error has reached the minimum value.
Unless stated otherwise, the network parameters used in these simu-
lations are the threshold ϑ = 0.7, the time constant of the spike response
function τ = 7 ms, and the time constant of after-potential kernel τr = 12 ms.
Table 1: Input and target spike-timing patterns for the XOR task (times in ms).

Input 1   Input 2   Input 3 (reference)   Target output
0         0         0                     16
0         6         0                     10
6         0         0                     10
6         6         0                     16
The scaling factor is set to f = ±0.005. The learning parameters are initial-
ized as follows: A+ = 1.2, A− = 0.5, τ+ = τ− = 5 ms, a = 0.05.
The weights are initialized with random values uniformly distributed between −0.2 and 0.8. The weights are then normalized by dividing them by the total number of subconnections.
5.2.1 Technical Details. The input and output patterns are encoded using
spike-time patterns as in Bohte et al. (2002). The signals are associated with
single spikes as follows: a binary symbol 0 is associated with a late firing
(a spike at 6 ms for the input pattern), and a 1 is associated with an early
firing (a spike at 0 ms for the input pattern). We also used a third input neuron that designates the reference start time, as this encoding needs an absolute reference start time to determine the latency of the firing (Sporea & Grüning, 2012). Without it, two of the input patterns (0-0 and 6-6) become identical up to a time shift, so the network cannot distinguish them and would always respond with a correspondingly delayed output. Table 1 shows the input and target spike
timing patterns presented to the network. The values represent the times of
the spikes for each input and target neuron in ms of simulated time.
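The latency encoding of Table 1 can be written down directly; a minimal sketch (the data layout is ours):

```python
# Spike times in ms: logical 1 -> early spike (0 ms), logical 0 -> late spike (6 ms);
# the third input neuron always fires at 0 ms and serves as the reference start time.
XOR_PATTERNS = {
    (1, 1): {"input": [0, 0, 0], "target": [16]},
    (1, 0): {"input": [0, 6, 0], "target": [10]},
    (0, 1): {"input": [6, 0, 0], "target": [10]},
    (0, 0): {"input": [6, 6, 0], "target": [16]},
}
```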
The learning algorithm was applied to a feedforward network as de-
scribed above. The input layer is composed of three neurons, the hidden
layer contains five spiking neurons, and the output layer contains only
one neuron. Multiple subconnections with different delays were used for
each connection in the spiking neural network. Preliminary experiments
showed that 12 subconnections with delays from 0 ms to 11 ms are suffi-
cient to learn the XOR problem. The results are averaged over 100 trials.
The network error is summed over all pattern pairs, with a minimum value
for convergence of 0.2. The minimum value is chosen to ensure that the
network has learned to classify all patterns correctly by matching the exact
number of spikes of the target spike train as well as the timing of the spikes
with 1 ms precision. Each spiking neuron in the network was simulated for
a time window of 30 ms, with a time step of 0.1 ms. In the following, we
systematically vary the parameters of the learning algorithm and examine
their effects.
5.2.2 The Learning and Network Parameters. Here, we vary the learning
parameters A+ and A− in equation 3.15 in order to determine the most ap-
propriate values. A+ is varied between 0.5 and 2.0 while keeping A− = A+/2.
The parameters A+ and A− play the role of a learning rate. Just like the
classic backpropagation algorithm for rate neurons, when the learning pa-
rameters have higher values, the number of iterations needed for conver-
gence is lower. The detailed results of the simulations are summarized in
appendix B in Table 3 (left). Although the algorithm converges with a high
rate for all values of A+ , for the lower values (A+ < 0.8), the learning pro-
cess is slower. When A+ has higher values, the network requires around
200 iterations to learn all four patterns. If A+ is too high, the convergence
rate starts to drop.
In order to determine the best ratio between the two learning parameters,
various values are chosen for A−, while keeping A+ = 1.2 fixed (the results are summarized in Table 3, right). The learning algorithm is able to converge
for the values of A− lower than A+ . As A− becomes equal to or higher than
A+ , the convergence rate slowly decreases and the number of iterations
needed for convergence significantly rises. The lowest average number of
iterations with a high convergence rate is 137 averaged over 98% successful
trials (where A+ = 1.2 and A− = 0.5).
The algorithm also converges when the spiking neural network has a
smaller number of subconnections. However, a lower number of delayed
subconnections (between 4 and 10) results in a significantly lower conver-
gence rate without necessarily a lower average of learning iterations for
the successful trials. Although more subconnections can produce a more
stable learning process, due to the larger number of weights that need to
be coordinated, the learning process is slower in this case (more than 300
iterations). Table 4 in appendix B shows the summarized results, where
A+ = 1.2 and A− = 0.6.
Figure 1: The XOR task. Analysis of the learning process with parameters A+ =
1.2 and A− = 0.5 for a sample trial. (a) The network error during learning.
(b) The Euclidean distance between the weight vector solution and the weight
vectors during the learning process. (c) The output signals for each of the four
patterns during learning. An x represents the target spike times. (d) The hidden
signals during learning for each hidden neuron for one input pattern ([0 0]).
Figure 1a shows the network error during learning for a sample trial; although the error first reaches its minimum after 63 iterations, due to the nature of the STDP processes, the solution is lost, only to converge again later. Similar findings were reported in Grüning and Sporea (2012). Figure 1b shows the Euclidean distance between the
and Sporea (2012). Figure 1b shows the Euclidean distance between the
weight vector solution found on that particular trial and the weight vectors
during each learning iteration that led to this weight vector. The weight
vectors are tested against the solution found during this trial because in
principle, there can be multiple weight vector solutions (e.g., a different initial weight set results in a different weight solution for the same set of
pattern pairs). While the error graph is irregular, the weight vector graph
shows that the weight vector moves steadily toward the solution. The irreg-
ularity of the network error during the learning process can be explained
by the fact that small changes to the weights can produce an additional
or missing output spike, which causes significant changes in the network
error. The highest error value corresponds to the network’s not firing any
spike for any of the four input patterns. The error graph also shows the
learning rule’s ability to modify the weights in order to produce the cor-
rect number of output spikes. Figure 1c shows the output signals during
learning for all input patterns. For two of the input patterns, the output
signals are stable after only 50 learning iterations, while for the other two
patterns, the output neurons fire around the target spike times. As such, during learning, the network either responds incorrectly or responds correctly but with too large a time difference between the target and output signals.
for each of the hidden neurons for one of the patterns.
5.3.1 Technical Details. The three species are described by four measure-
ments of the plants: the lengths and widths of the petal and sepal. Each of
the four features is represented by the timing of a single spike of a corre-
sponding input neuron. The measurements of the Iris flower range from 0
to 8 (see Table 2) and are fed into the spiking neural network as spike-timing patterns to the input neurons. The output of the network is represented by the spike time of the output neuron; the target pattern for each of the three species contains a single spike whose timing (in ms) differs between the species (see Table 2). The hidden layer contains
10 spiking neurons, and each connection has between 8 and 12 delayed
subconnections depending on the experiment. The network is simulated in
a 30 ms time window with 0.1 ms time step.
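A sketch of the straightforward encoding described above, under two assumptions of ours: each measurement (roughly in the 0 to 8 range) is used directly as the spike latency in ms of its input neuron, and the class of an output spike is read off as the nearest class target time. The target times below are placeholders, since the actual values are given in Table 2.

```python
# Hypothetical class target spike times (ms); the actual values are in Table 2.
CLASS_TARGET_TIMES = {"setosa": 12.0, "versicolor": 15.0, "virginica": 18.0}

def encode_iris_sample(features):
    """Map the four Iris measurements (sepal/petal length and width) directly
    to spike latencies in ms of the four input neurons (assumed encoding)."""
    return [float(x) for x in features]

def classify_output(t_out):
    """Assign the class whose single-spike target is nearest in time to the
    actual output spike (a single-spike reading of the van Rossum criterion)."""
    return min(CLASS_TARGET_TIMES, key=lambda c: abs(CLASS_TARGET_TIMES[c] - t_out))
```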
Figure 2: The Iris data set. Output signals during learning for each of three
species for a sample trial. The x markers represent the target spike times.
During each trial, the input patterns are randomly divided into a training
set (75% of samples) and a testing set (25% of samples) for cross-validation.
During each iteration, the training set is used for the learning process to
calculate the weight modifications and test if the network has learned the
patterns. The learning is considered successful if the network error has
reached a minimum average value of 0.2 for each pattern pair and 95% of
the patterns in the training set are correctly classified. As in the previous
experiment, this minimum value is chosen to ensure that the network has
learned to classify all patterns correctly by matching the exact number of
spikes of the target spike train as well as timing of the spikes with 1 ms
precision. Figure 2 shows the output signals during learning for all three
classes of species during a sample trial. While the first pattern is learned
after only a few iterations, it takes more than 100 iterations to distinguish
the other two classes. Table 5 in appendix B shows the summarized results
on the Iris data set for different network architectures with different num-
bers of delayed subconnections. Again, a too low or too high number of
subconnections results in lower performance—a convergence rate of less
than 80%. A network with 10 subconnections achieves a convergence rate
of 80% within 114 iterations on average.
Multilayer ReSuMe permits the spiking neural network to learn the Iris
data set using a straightforward encoding of the patterns and results in
much faster learning than SpikeProp: the average number of iterations is always lower than 200, compared with the 1000 learning iterations required by SpikeProp with a population coding based on arrays of receptive fields (Bohte et al., 2002).
5.4 The XOR Task with Spike Train Patterns. In this experiment, the learning algorithm is tested on a linearly nonseparable problem combined with a mapping between corresponding spike sequences. Again, the XOR problem is
applied to a network of spiking neurons, but the logic patterns are encoded
5.4.1 Technical Details. Each input logical value is associated with the
spike trains of a group of 20 spiking neurons. In order to ensure some
dissimilarity among the patterns, for each input neuron, a spike train is
generated by a pseudo-Poisson process with a constant firing rate of r =
0.06/ms within a 30 ms time window. The minimum interspike interval is
set to 3 ms. This spike train is then split into two new spike trains by randomly distributing all the spikes (Grüning & Sporea, 2012). The newly created
spike trains represent the patterns for the logical symbols 0 and 1. The input
spike trains are required to consist of at least one spike.
The output patterns are created similarly and will be produced by one
output neuron. The spike train to be split is generated by a pseudo-Poisson
process with a constant firing rate of r = 0.2/ms within a 30 ms period
of time. The resulting output patterns are chosen so that the spike trains
contain exactly three spikes.
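One way to implement the pattern generation just described (our own code; the handling of the minimum interspike interval and the random split follow our reading of the text):

```python
import numpy as np

rng = np.random.default_rng()

def poisson_spike_train(rate, T, min_isi=3.0):
    """Pseudo-Poisson spike train with constant rate (spikes/ms) over [0, T) ms,
    discarding candidate spikes that violate the minimum interspike interval."""
    spikes, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)
        if t >= T:
            break
        if not spikes or t - spikes[-1] >= min_isi:
            spikes.append(t)
    return np.array(spikes)

def split_train(spikes):
    """Randomly distribute the spikes of one train over two new trains,
    which then encode the logical symbols 0 and 1 (Grüning & Sporea, 2012)."""
    mask = rng.random(len(spikes)) < 0.5
    return spikes[mask], spikes[~mask]
```

For the input groups, `poisson_spike_train(0.06, 30.0)` would be called once per neuron and the result split into the patterns for logical 0 and 1; for the output neuron, the rate is 0.2/ms.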
Apart from the minimal network error as before, an additional stopping
criterion for the learning process is introduced. The network must correctly
classify all four patterns. An input pattern is considered correctly classified
Figure 3: The XOR task with spike train patterns. (a) Network structure for the
XOR problem. A feedforward network with three layers, where the input layers
consist of two groups of 20 neurons for each logical signal. (b) Output spikes
for all four input patterns [0 0], [0 1], [1 0], [1 1] during learning for a sample
trial. The x markers represent the target spike times. (c) Sample input, hidden,
and output signals for the logical input ([1 0]), after the learning process has
converged. The gray signals in the output graph represent the target pattern.
if the output spike train is closest to the target pattern in terms of the van
Rossum distance. The network error consists of the sum of van Rossum
distances between the target and actual output over the four patterns as
before; a minimum value of 3 ensures that the output spikes are reproduced
with acceptable precision.
In addition to the previous experiments, an absolute refractory period of 3 ms is set for all neurons. The learning is simulated over a period of 50 ms, with a time step of 0.5 ms.
In order to determine the optimal size of the hidden layer for a higher
convergence rate, different network topologies have been considered. In
appendix B, Table 6 shows the convergence rate for each network topology,
with a new set of spike-timing patterns being generated every trial.
The learning rule is able to converge with a higher rate as the number of neurons in the hidden layer increases. A larger hidden layer means that the patterns are mapped to a richer spiking activity; hence, it is easier for the output neuron to produce the required spike patterns. A hidden layer with fewer neurons than the input layer does not result in a high convergence rate because the input patterns are not sufficiently distributed in the hidden activity. Also, more than 100 units in the hidden layer do not result in higher convergence rates, but as the number of weights increases,
the learning process is slower. Previous simulations (Grüning & Sporea,
2012) show that a neural network without a hidden layer cannot learn
linearly nonseparable logical operations.
Figure 3b shows the output signals during learning for a sample trial.
Figure 3c shows the input, hidden, and output signals for one of the patterns.
The first 20 input spike trains represent the pattern for the logical symbol 1,
while the other 20 spike trains represent the pattern for the logical symbol
0. Although the network is not responding with the exact target spike train,
the output spike train is closer to the pattern representing logical 1 than to the pattern representing logical 0 in terms of the van Rossum distance.
[Figure 4: (a, b) classification accuracy (%) as a function of the spike-time jitter (ms) of the test patterns; (c, d) output spike times versus learning iterations for sample trials. See sections 5.5 and 5.6 for details.]
The minimum network error allows the output spike train to miss or add an
extra spike as long as the pattern is still closest to the target in terms of the
van Rossum distance. The network is simulated for 120 ms with a 1 ms time
step.
5.5.2 Size of the Hidden Layer. In order to determine how the structure of
the neural network influences the number of patterns that can be learned,
different architectures have been tested. In these simulations, 100 input
neurons are considered in order to have a distributed firing activity for
the simulated time period. The output layer contains a single neuron as in
the previous simulations. The size of the hidden layer is varied from 200
to 300 neurons to determine the optimal size for storing 10 input-output
pattern pairs. The network is able to perform better as the number of hidden
neurons increases. However, a hidden layer with more than 260 neurons
does not result in a higher convergence rate. The detailed results of the
simulations are summarized in appendix B in Table 7 (left).
5.5.4 Noise. After the learning has converged, the networks are also
tested against noisy patterns. The noisy patterns are generated by moving each spike by a time interval within a gaussian distribution with mean 0 and standard deviation between 1 ms and 10 ms. After the network has learned all patterns,
the network is tested with a random set of 500 noisy patterns. Figure 4a
shows the accuracy rate (the percentage of input patterns that are correctly
classified) for the network with 260 spiking neurons in the hidden layer
trained with 10 pattern pairs. The accuracy rate is defined as the percentage
of correctly classified patterns calculated over the successful trials. The ac-
curacy rates are similar for all the networks described above. The network
is able to recognize more than 20% (above the random performance level
of 10%) of the patterns when these are distorted with 10 ms.
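A sketch of the noise test described above, reusing `van_rossum_distance` from the sketch in section 5.1; `network` stands for any trained forward pass returning an output spike train and is an assumption of ours, as are all names:

```python
import numpy as np

rng = np.random.default_rng()

def jitter(spikes, sigma):
    """Move every spike by a gaussian offset with mean 0 and s.d. sigma (ms)."""
    return np.sort(spikes + rng.normal(0.0, sigma, size=len(spikes)))

def accuracy(network, patterns, targets, sigma, n_tests=500):
    """Fraction of jittered input patterns that are still classified correctly,
    i.e., whose output is closest (van Rossum) to the correct target train."""
    correct = 0
    for _ in range(n_tests):
        k = rng.integers(len(patterns))
        noisy_input = [jitter(s, sigma) for s in patterns[k]]
        out = network(noisy_input)                      # actual output spike train
        dists = [van_rossum_distance(out, tgt) for tgt in targets]
        correct += int(np.argmin(dists) == k)
    return correct / n_tests
```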
5.6.1 Technical Details. Three random patterns are fed into the network
through 100 input spiking neurons. The hidden layer contains 210 neurons,
and the patterns are classified by a single output neuron. The input patterns
are generated by a pseudo-Poisson process with a constant firing rate of
r = 0.1/ms within a 500 ms time period, where the spike trains are chosen so
that they contain between 15 and 20 spikes. For the spike train generation,
the minimum interspike interval is set to 5 ms. As in the previous experiment, in order to ensure that a solution exists, the target patterns are generated as the output of a spiking neural network initialized with a random set of
weights. The target spike trains are chosen so that they contain at least three
spikes and no more than seven spikes. The input and target patterns are
distributed over such large periods of time in order to simulate complex
forms of temporal processing, such as speech recognition, that spans over
hundreds of milliseconds (Mauk & Buonomano, 2004).
During learning, for each iteration, noisy versions of the input patterns
are generated by moving each spike by a time interval within a gaussian
distribution with mean 0 and standard deviation varying in the range of
1 ms to 4 ms. Figure 5 shows a sample input pattern where the spikes are
moved by a time interval within a gaussian distribution with mean 0 and
standard deviation 4 ms. The spikes in the target patterns are also shifted
by a time interval within a gaussian distribution with mean 0 and standard
deviation 1 ms independent of the noise level in the input patterns. The
network is simulated for 520 ms with 1 ms time step.
A minimum average error of 0.6 for each pattern pair is required for the
learning to be considered successful. During each iteration, the network is
tested against a new set of 30 random noisy patterns; in order for the learn-
ing to be considered converged, the network must also correctly classify
Figure 5: Example of an input pattern where the spikes are moved by a time
interval within a gaussian distribution with mean 0 and standard deviation
4 ms. The gray markers represent the original signals, and the black markers
represent the noisy signals.
at least 80% of noisy patterns. The spike times of the testing patterns are
shifted with the same distribution as the training patterns. Figure 4d shows
the output signals during learning for a sample trial. Again, the minimum
network error allows the output spike train to miss or add an extra spike
as long as the input patterns are correctly classified.
The detailed results of the simulations are shown in appendix B in Table 8,
where the average number of iterations is calculated over the successful
trials. The table also shows the number of successful trials when the network
is trained on noise-free patterns. When the network is trained with a low
amount of noise in the input patterns, the learning algorithm performs
slightly better than the network trained with patterns without noise. The
network is able to learn even when the spike train patterns are distorted
with 3 ms or 4 ms; however, the speed of learning as well as the convergence
rate drop as more noise is added to the input patterns.
Figure 4b shows the accuracy rates on a trained network against a ran-
dom set of 150 different noisy patterns, generated from the three original
input patterns. The network is trained on input patterns where the spikes
are moved within a gaussian distribution with mean 0 and standard devi-
ation 4 ms. The graph shows the accuracy rates on patterns with the spikes
moved within a gaussian distribution with mean 0 and standard deviation
between 1 ms and 10 ms. The graph also shows the network response on
the noise-free patterns. The accuracy rates are similar for all input pattern
jitter. The network is able to recognize more than 50% (again above the
random performance level of 33%) of the input patterns even when these
are distorted with up to 10 ms.
6 Discussion
This letter introduces a new algorithm for feedforward spiking neural net-
works. The first supervised learning algorithm for feedforward spiking
neural networks with multiple layers, SpikeProp, considers only the first
spike of each neuron, ignoring all subsequent spikes (Bohte et al., 2002).
Extensions of SpikeProp allow multiple spikes in the input and hidden
layer but not in the output layer (Booij & Nguyen, 2005; Ghosh-Dastidar
& Adeli, 2009). Our learning rule is, to the best of our knowledge, the first
fully supervised algorithm that considers multiple spikes in all layers of
the network. Although ReSuMe allows multiple spikes, the algorithm in its
original form can be applied only to single layers or to train readout neu-
rons in liquid state machines (Ponulak & Kasiński, 2010). In our approach,
multilayer ReSuMe, the hidden layer permits the networks to learn linearly
nonseparable problems as well as complex mapping and classification tasks
without using a large number of spiking neurons, as liquid state machines do, and without the need for a large number of input neurons in single-layer networks. Because the learning rule presented here extends the ReSuMe
algorithm to multiple layers, it can, like the original ReSuMe, in principle
be applied to any neuron model, as the weight modification rules depend
on only the input, output, and target spike trains and do not depend on the
specific dynamics of the neuron model. We discuss a few important aspects
in order.
On the one hand, the ReSuMe learning rule applied to a single layer
(Ponulak & Kasiński, 2010) with 12 to 16 delayed subconnections for each
connection is not able to learn the XOR problem with the early and late
timing patterns (simulations not presented in this letter). Although the al-
gorithm is able to change the weights in the correct direction, the network
never responds with the correct output for all four input patterns. The ad-
ditional hidden layer permits the network to learn the XOR problem (see
section 5.2). On the other hand, a spiking neural network with the same
number of units in each layer, but with 16 subconnections trained with
SpikeProp on the XOR patterns, needs 250 iterations to converge (Bohte
et al., 2002), while multilayer ReSuMe converged in 137 iterations on aver-
age. Furthermore, SpikeProp uses 16 delayed subconnections instead of just
12; hence, more weight changes need to be computed. Finally, SpikeProp
matches the time of only the first target spike, ignoring any subsequent
spikes. Although our algorithm also matches the exact number of spikes
and the precise timing of the target patterns, the network learns all the
patterns faster.
Studies on SpikeProp show that the algorithm is unstable, affecting the
performance of the learning process (Takase et al., 2009; Fujita, Takase, Kita,
& Hayashi, 2008). For our learning algorithm, the weight vector moves
steadily toward a solution during the learning process, as seen in Figures 1a
and 1b. This can be seen in a direct comparison with SpikeProp on the XOR
benchmark. Finally, the learning algorithm presented here permits using
different encoding methods with spiking patterns. In section 5.3, the Iris
data set is encoded using 4 input neurons instead of the 50 neurons required
by a population encoding (Bohte et al., 2002). The simpler encoding of the
iris flower dimensions allows the network to learn the patterns in five times
fewer iterations than with a population encoding used with SpikeProp
(Bohte et al., 2002).
When we move from rate-coded neurons to spiking neurons, the ques-
tion about the encoding of patterns arises. One encoding was proposed
by Bohte et al. (2002), where logical 0 and 1 are associated with the tim-
ing of early and late spikes, respectively (latency encoding). As the input
neurons’ activity is very sparse, the spikes must be multiplied over the
simulated time period to generate enough activity in the hidden layer to
support firing of the output neuron at defined times. This is achieved by
having multiple subconnections for each input neuron that replicate the ac-
tion potential with different delays. These additional subconnections, each
with a different synaptic strength, require additional training. This encod-
ing also requires an additional input neuron to set the reference start time
(Sporea & Grüning, 2011). The alternative to this latency encoding is to
use spike trains over a group of neurons as patterns. Thus, a pattern is
represented by the (multiple) firing times of a group of input (and output)
neurons. In order to guarantee that a set of weights exists for an arbitrary
target mapping without replicating the input signals as above, a relatively
large number of input neurons must be considered. As the input pattern
is distributed over several spike trains, some of the information might be
redundant and would not have a major contribution to the output, but then
only some of the delayed subconnections in the latency encoding scheme
have a major contribution. Finally, such an encoding does not require an ad-
ditional input neuron to designate the reference start time, as the patterns
are encoded in the relative timing of the spikes. The experiment in sec-
tion 5.4 shows that this encoding can be successfully used for an originally
linearly nonseparable problem.
In sections 5.5 and 5.6, the target patterns are generated as the output
signals of networks with random weights. Again, encodings are sparse, and
the corresponding pattern pairs are often locally linearly separable. The
network is able to learn these transformations very fast—most of them in
fewer than 10 learning iterations. During the learning process, the weights
are modified in order to correctly map all input into output patterns so that
they can be correctly classified, as seen in Figures 4c and 4d. In the task in
section 5.5, where the network is trained on 10 spike-timing pattern pairs,
the learning algorithm converges with a higher rate as the hidden layer
increases in size.
The simulations where noise was added to the spike-timing patterns
show that the learning is robust to the variability of spike timing. A spiking
neural network trained on 10 noise-free patterns can recognize more than
20% of noisy patterns if the timing of spikes is shifted following a gaus-
sian distribution with standard deviation up to 10 ms (see section 5.5 and
Figure 4a). And when the network is trained on 3 noisy patterns, it can
recognize more than 50% of noisy patterns where the timing of spikes is
Appendix A: Neuron Model

The membrane potential $u_j(t)$ of neuron $j$ is given by

\[ u_j(t) = \eta\!\left( t - t_j^f \right) + \sum_{i \in I} \sum_k w_{ji}^k\, y_i^k(t), \qquad (A.2) \]

with the after-potential kernel

\[ \eta(t) = -\vartheta\, \exp\!\left( -\frac{t}{\tau_r} \right), \qquad (A.3) \]
and the (delayed) postsynaptic potentials

\[ y_i^k(t) = \varepsilon\!\left( t - t_i^f - d^k \right), \qquad (A.4) \]

where $\varepsilon(t)$ is the spike response function, with $\varepsilon(t) = 0$ for $t \le 0$, and the times $t_i^f$ represent the firing times of neuron $i$. In our case, the spike response function $\varepsilon(t)$ describes a standard postsynaptic potential,

\[ \varepsilon(t) = \frac{t}{\tau}\, \exp\!\left( 1 - \frac{t}{\tau} \right), \qquad (A.5) \]

where $\tau > 0$ models the membrane potential time constant and determines the rise and decay of the function.
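For illustration, a direct discrete-time simulation of the spike response model of equations A.2 to A.5 might look as follows (our own code; the handling of delays, the threshold test, and the use of only the most recent output spike in the refractory term are simplifying assumptions):

```python
import numpy as np

THETA, TAU, TAU_R = 0.7, 7.0, 12.0   # threshold and time constants from section 5.1

def eps(t):
    # Spike response function, equation A.5 (zero for t <= 0).
    return np.where(t > 0.0, (t / TAU) * np.exp(1.0 - t / TAU), 0.0)

def eta(t):
    # After-potential kernel, equation A.3 (zero before the spike).
    return np.where(t > 0.0, -THETA * np.exp(-t / TAU_R), 0.0)

def simulate_neuron(input_spikes, weights, delays, T=30.0, dt=0.1):
    """Membrane potential of one SRM neuron (equation A.2) with delayed
    subconnections; a spike is emitted whenever u crosses the threshold.
    input_spikes: list of spike-time arrays, one per presynaptic neuron;
    weights, delays: numpy arrays of shape (n_pre, n_sub)."""
    times, out_spikes, last_spike, u_trace = np.arange(0.0, T, dt), [], -np.inf, []
    for t in times:
        u = eta(t - last_spike)                       # refractory term
        for i, spikes in enumerate(input_spikes):
            for k in range(weights.shape[1]):
                for tf in spikes:
                    u += weights[i, k] * eps(t - tf - delays[i, k])
        u_trace.append(float(u))
        if u >= THETA:                                # threshold crossing -> spike
            out_spikes.append(t)
            last_spike = t
    return np.array(out_spikes), np.array(u_trace)
```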
B.1 XOR Benchmark (Section 5.2). Table 3 shows the summarized re-
sults for the XOR benchmark, where the learning algorithm is tested with different values of the learning parameters A+ and A−. Table 4 shows the
convergence rate and the average number of iterations for the XOR bench-
mark when the network is tested with a different number of delayed
subconnections.
B.2 Iris Data Set (Section 5.3). Table 5 shows the convergence rate and
the average number of iterations for the Iris data set when the network is
tested with 8 to 12 subconnections.
B.3 The XOR Task with Spike Train Patterns (Section 5.4). Table 6 shows the convergence rate and the average number of iterations for this task for different hidden layer sizes.
Notes to Table 3: (Left) The parameters A+ and A− are varied in order to determine the best val-
ues for faster convergence. The ratio between these parameters is constant A+ = 2A− .
(Right) While keeping A+ = 1.2 fixed, A− is varied in order to determine the best ratio
between these parameters.
Table 4: Convergence rate and average number of iterations for the XOR benchmark with different numbers of delayed subconnections (A+ = 1.2, A− = 0.6).

Subconnections   Convergence rate (%)   Iterations
4                11                     63 ± 20
6                24                     169 ± 37
8                73                     192 ± 27
10               81                     154 ± 17
12               96                     207 ± 31
14               96                     309 ± 52
16               73                     472 ± 56
Table 7 (right) shows the results when the network is trained with a different number of pattern pairs, keeping the size of the hidden layer fixed to 260 spiking neurons.
Table 6: Convergence rate and average number of iterations for the XOR task with spike train patterns, for different hidden layer sizes.

Hidden neurons   Convergence rate (%)   Iterations
50               70                     293 ± 59
60               54                     301 ± 66
70               56                     327 ± 91
80               60                     469 ± 87
90               76                     247 ± 42
100              76                     439 ± 73
Notes to Table 7: (Left) The network is trained with 10 pattern pairs, where the size of the hidden
layer is varied in order to determine the best network architecture. (Right) A neural
network with a hidden layer containing 260 neurons is trained with different numbers of
pattern pairs.
Table 8: Convergence rate and average number of iterations (over successful trials) when the network is trained on input patterns jittered with gaussian noise of the given standard deviation.

Input jitter s.d. (ms)   Convergence rate (%)   Iterations
0                        96                     10 ± 1.2
1                        98                     12 ± 1.1
2                        95                     19 ± 2.3
3                        66                     26 ± 5.6
4                        64                     115 ± 51
Acknowledgments
References
Gerstner, W., & Kistler, W. M. (2002). Spiking neuron models: Single neurons, populations,
plasticity. Cambridge: Cambridge University Press.
Ghosh-Dastidar, S., & Adeli, H. (2009). A new supervised learning algorithm for
multiple spiking neural networks with application in epilepsy and seizure detec-
tion. Neural Networks, 22, 1419–1431.
Glackin, C., Maguire, L., McDaid, L., & Sayers, H. (2011). Receptive field optimisation
and supervision of a fuzzy spiking neural network. Neural Networks, 24, 247–256.
Grüning, A. (2007). Elman backpropagation as reinforcement for simple recurrent
networks. Neural Computation, 19, 3108–3131.
Grüning, A., & Sporea, I. (2012). Supervised learning of logical operations in layered
spiking neural networks with spike train encoding. Neural Processing Letters, 36,
117–134. doi:10.1007/s11063-012-9225-1.
Gütig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike
timing-based decisions. Nature Neuroscience, 9, 420–428.
Harris, K. D. (2008). Stability of the fittest: Organizing learning through retroaxonal
signals. Trends in Neuroscience, 31, 130–136.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Johansson, R. S., & Birznieks, I. (2004). First spikes in ensembles of human tactile
afferents code complex spatial fingertip events. Nature Neuroscience, 7, 170–177.
Knudsen, E. I. (1994). Supervised learning in the brain. Journal of Neuroscience, 14, 3985–3997.
Knudsen, E. I. (2002). Instructed learning in the auditory localization pathway of the barn owl. Nature, 417(6886), 322–328.
Legenstein, R., Naeger, C., & Maass, W. (2005). What can a neuron learn with spike-timing-dependent plasticity? Neural Computation, 17, 2337–2382.
Maass, W. (1997a). Networks of spiking neurons: The third generation of neural
network models. Transactions of the Society for Computer Simulation International,
14, 1659–1671.
Maass, W. (1997b). Fast sigmoidal networks via spiking neurons. Neural Computation,
9, 279–304.
Mauk, M. D., & Buonomano, D. V. (2004). The neural basis of temporal processing.
Annual Rev. Neuroscience, 27, 304–340.
McKennoch, S., Voegtlin, T., & Bushnell, L. (2009). Spike-timing error backpropaga-
tion in theta neuron networks. Neural Computation, 21, 9–45.
Neuenschwander, S., & Singer, W. (1996). Long-range synchronization of oscillatory
light responses in the cat retina and lateral geniculate nucleus. Nature, 379, 728–
733.
Ponulak, F. (2006). ReSuMe–Proof of convergence. http://d1.cie.put.poznan.pl/dav/
fp/FP_ConvergenceProof_TechRep.pdf
Ponulak, F. (2008). Analysis of the ReSuMe learning process for spiking neural
networks. International Journal of Applied Mathematics and Computer Science, 18,
117–127.
Ponulak, F., & Kasiński, A. (2010). Supervised learning in spiking neural networks
with ReSuMe: Sequence learning, classification, and spike shifting. Neural Com-
putation, 22, 467–510.
Roelfsema, P. R., & van Ooyen, A. (2005). Attention-gated reinforcement learning of
internal representations for classification. Neural Computation, 17, 1–39.
Rojas, R. (1996). Neural networks: A systematic introduction. Berlin: Springer-Verlag.