
ORIGINAL RESEARCH

published: 07 December 2017


doi: 10.3389/fnins.2017.00682

Conversion of Continuous-Valued
Deep Networks to Efficient
Event-Driven Networks for Image
Classification
Bodo Rueckauer 1*, Iulia-Alexandra Lungu 1, Yuhuang Hu 1, Michael Pfeiffer 1,2 and Shih-Chii Liu 1

1 Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland; 2 Bosch Center for Artificial Intelligence, Renningen, Germany

Edited by: Gert Cauwenberghs, University of California, San Diego, United States
Reviewed by: Sadique Sheik, University of California, San Diego, United States; John V. Arthur, IBM, United States. Bruno Umbria Pedroni contributed to the review of John V. Arthur.
*Correspondence: Bodo Rueckauer, [email protected]

Spiking neural networks (SNNs) can potentially offer an efficient way of doing inference because the neurons in the networks are sparsely activated and computations are event-driven. Previous work showed that simple continuous-valued deep Convolutional Neural Networks (CNNs) can be converted into accurate spiking equivalents. These networks did not include certain common operations such as max-pooling, softmax, batch-normalization and Inception-modules. This paper presents spiking equivalents of these operations, therefore allowing conversion of nearly arbitrary CNN architectures. We show conversion of popular CNN architectures, including VGG-16 and Inception-v3, into SNNs that produce the best results reported to date on MNIST, CIFAR-10 and the challenging ImageNet dataset. SNNs can trade off classification error rate against the number of available operations, whereas deep continuous-valued neural networks require a fixed number of operations to achieve their classification error rate. From the examples of LeNet for MNIST and BinaryNet for CIFAR-10, we show that with an increase in error rate of a few percentage points, the SNNs can achieve more than 2x reductions in operations compared to the original CNNs. This highlights the potential of SNNs in particular when deployed on power-efficient neuromorphic spiking neuron chips, for use in embedded applications.

Keywords: artificial neural network, spiking neural network, deep learning, object classification, deep networks, spiking network conversion

Specialty section: This article was submitted to Neuromorphic Engineering, a section of the journal Frontiers in Neuroscience.
Received: 25 July 2017; Accepted: 22 November 2017; Published: 07 December 2017
Citation: Rueckauer B, Lungu I-A, Hu Y, Pfeiffer M and Liu S-C (2017) Conversion of Continuous-Valued Deep Networks to Efficient Event-Driven Networks for Image Classification. Front. Neurosci. 11:682. doi: 10.3389/fnins.2017.00682

1. INTRODUCTION

Deep Artificial Neural Network (ANN) architectures such as GoogLeNet (Szegedy et al., 2015) and VGG-16 (Simonyan and Zisserman, 2014) have successfully pushed the state-of-the-art classification error rates to new levels on challenging computer vision benchmarks like ImageNet (Russakovsky et al., 2015). Inference in such very large networks, i.e., classification of an ImageNet frame, requires substantial computational and energy costs, thus limiting their use in mobile and embedded applications.

Recent work has shown that the event-based mode of operation in SNNs is particularly attractive for reducing the latency and computational load of deep neural networks (Farabet et al., 2012; O'Connor et al., 2013; Neil et al., 2016; Zambrano and Bohte, 2016). Deep SNNs can be queried for results already after the first output spike is produced, unlike ANNs where the result is available only after all layers have been completely processed (Diehl et al., 2015). SNNs are


also naturally suited to process input from event-based sensors (Posch et al., 2014; Liu et al., 2015), but even in classical frame-based machine vision applications such as object recognition or detection, they have been shown to be accurate, fast, and efficient, in particular when implemented on neuromorphic hardware platforms (Neil and Liu, 2014; Stromatias et al., 2015; Esser et al., 2016). SNNs could thus play an important role in supporting, or in some cases replacing, deep ANNs in tasks where fast and efficient classification in real-time is crucial, such as detection of objects in larger and moving scenes, tracking tasks, or activity recognition (Hu et al., 2016).

Multi-layered spiking networks have been implemented on digital commodity platforms such as FPGAs (Neil and Liu, 2014; Gokhale et al., 2014), but spiking networks with more than tens of thousands of neurons can be implemented on large-scale neuromorphic spiking platforms such as TrueNorth (Benjamin et al., 2014; Merolla et al., 2014) and SpiNNaker (Furber et al., 2014). Recent demonstrations with TrueNorth (Esser et al., 2016) show that CNNs of over a million neurons can be implemented on a set of chips with a power dissipation of only a few hundred mW. Given the recent successes of deep networks, it would be advantageous if spiking forms of deep ANN architectures such as VGG-16 can be implemented on these power-efficient platforms while still producing good error rates. This would allow the deployment of deep spiking networks in combination with an event-based sensor for real-world applications (Orchard et al., 2015a; Serrano-Gotarredona et al., 2015; Kiselev et al., 2016).

In order to bridge the gap between Deep Learning continuous-valued networks and neuromorphic spiking networks, it is necessary to develop methods that yield deep Spiking Neural Networks (SNNs) with equivalent error rates as their continuous-valued counterparts. Successful approaches include direct training of SNNs using backpropagation (Lee et al., 2016), training the SNN classifier layers using stochastic gradient descent (Stromatias et al., 2017), or modifying the transfer function of the ANNs during training so that the network parameters can be mapped better to the SNN (O'Connor et al., 2013; Esser et al., 2015; Hunsberger and Eliasmith, 2016). The largest architecture trained by Hunsberger and Eliasmith (2016) in this way is based on AlexNet (Krizhevsky et al., 2012). While the results are promising, these novel methods have yet to mature to the state where training spiking architectures of the size of VGG-16 becomes possible, and the same state-of-the-art error rate as the equivalent ANN is achieved.

A more straightforward approach is to take the parameters of a pre-trained ANN and to map them to an equivalent-accurate SNN. Early studies on ANN-to-SNN conversion began with the work of Perez-Carrasco et al. (2013), where CNN units were translated into biologically inspired spiking units with leaks and refractory periods, aiming for processing inputs from event-based sensors. Cao et al. (2015) suggested a close link between the transfer function of a spiking neuron, i.e., the relation between input current and output firing frequency, and the activation of a rectified linear unit (ReLU), which is nowadays the standard model for the neurons in ANNs. They report good performance error rates on conventional computer vision benchmarks, converting a class of CNNs that was restricted to having zero bias and only average-pooling layers. Their method was improved by Diehl et al. (2015), who achieved nearly loss-less conversion of ANNs for the MNIST (LeCun et al., 1998) classification task by using a weight normalization scheme. This technique rescales the weights to avoid approximation errors in SNNs due to either excessive or too little firing of the neurons. Hunsberger and Eliasmith (2016) introduced a conversion method where noise injection during training improves the robustness to approximation errors of the SNN with more realistic biological neuron models. Esser et al. (2016) demonstrated an approach that optimized CNNs for the TrueNorth platform, which has binary weights and restricted connectivity. Zambrano and Bohte (2016) have developed a conversion method using spiking neurons that adapt their firing threshold to reduce the number of spikes needed to encode information.

These approaches achieve very good results on MNIST, but the SNN results are below state-of-the-art ANN results when scaling up to networks that can solve CIFAR-10 (Krizhevsky, 2009). One reason is that SNN implementations of many operators that are crucial for improved ANN error rate, such as max-pooling layers, softmax activation functions, and batch-normalization, are non-existent, and thus SNNs can only approximately match the inference of an ANN. As a consequence, none of the previously proposed conversion approaches are general enough for full automatic conversion of arbitrary pre-trained ANNs taken from a Deep-Learning model zoo available, for example, in Caffe (https://github.com/BVLC/caffe/wiki/Model-Zoo).

In this work, we address some important shortcomings of existing ANN-to-SNN conversion methods. Through mathematical analysis of the approximation of the output firing rate of a spiking neuron to the equivalent analog activation value, we were able to derive a theoretical measure of the error introduced in the previous conversion process. On the basis of this novel theory, we propose modifications to the spiking neuron model that significantly improve the performance of deep SNNs. By developing spiking implementations of max-pooling layers, softmax activation, neuron biases, and batch normalization (Ioffe and Szegedy, 2015), we extend the suite of CNNs that can be converted. In particular, we demonstrate for the first time that GoogLeNet Inception-V3 can be converted to an equivalent-accurate SNN. Further, we show that the conversion to spiking networks is synergistic with ANN network compression techniques such as parameter quantization and the use of low-precision activations.

To automate the process of transforming a pre-trained ANN into an SNN, we developed an SNN-conversion toolbox that is able to transform models written in Keras (Chollet, 2015), Lasagne and Caffe, and offers built-in simulation tools for evaluation of the spiking model. Alternatively, the converted SNN can be exported for use in spiking simulators like pyNN or Brian2. The documentation and source code are publicly available online (http://snntoolbox.readthedocs.io/).

The remainder of the paper is organized as follows: section 2.1 outlines the conversion theory and section 2.2 presents the


methods for implementing the different features of a CNN. The work in these two sections is extended from earlier work in Rueckauer et al. (2016). Section 3 presents the conversion results of networks tested on the MNIST, CIFAR-10, and ImageNet datasets.

2. METHODS

2.1. Theory for Conversion of ANNs into SNNs

The basic principle of converting ANNs into SNNs is that firing rates of spiking neurons should match the graded activations of analog neurons. Cao et al. (2015) first suggested a mechanism for converting (ReLU) activations, but a theoretical groundwork for this principle was lacking. Here we present an analytical explanation for the approximation, and on its basis we are able to derive a simple modification of the reset mechanism following a spike, which turns each SNN neuron into an unbiased approximator of the target function (Rueckauer et al., 2016).

We assume here a one-to-one correspondence between an ANN unit and an SNN neuron, even though it is also possible to represent each ANN unit by a population of spiking neurons. For a network with $L$ layers let $W^l$, $l \in \{1, \dots, L\}$ denote the weight matrix connecting units in layer $l-1$ to layer $l$, with biases $b^l$. The number of units in each layer is $M^l$. The ReLU activation of the continuous-valued neuron $i$ in layer $l$ is computed as:

$$a_i^l := \max\left(0, \sum_{j=1}^{M^{l-1}} W_{ij}^l a_j^{l-1} + b_i^l\right), \qquad (1)$$

starting with $a^0 = x$, where $x$ is the input, normalized so that each $x_i \in [0, 1]$ [3]. Each SNN neuron has a membrane potential $V_i^l(t)$, which integrates its input current at every time step:

$$z_i^l(t) := V_{thr} \left(\sum_{j=1}^{M^{l-1}} W_{ij}^l \Theta_{t,j}^{l-1} + b_i^l\right), \qquad (2)$$

where $V_{thr}$ is the threshold and $\Theta_{t,i}^l$ is a step function indicating the occurrence of a spike at time $t$:

$$\Theta_{t,i}^l := \Theta\left(V_i^l(t-1) + z_i^l(t) - V_{thr}\right), \quad \text{with} \quad \Theta(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{else.} \end{cases} \qquad (3)$$

Every input pattern is presented for $T$ time steps, with time step size $\Delta t \in \mathbb{R}^+$. The highest firing rate supported by a time-stepped simulator is given by the inverse time resolution $r_{max} := 1/\Delta t$. Input rates to the first layer are proportional to the constant pixel intensities or RGB image values. We can compute the firing rate of each SNN neuron $i$ as $r_i^l(t) := N_i^l(t)/t$, where $N_i^l(t) := \sum_{t'=1}^{t} \Theta_{t',i}^l$ is the number of spikes generated.

[3] This analysis focuses on applications with image data sets, which are generally transformed in this way. The argument could be extended to the case of zero-centered data by interpreting negative input to the first hidden layer of the SNN as coming from a class of inhibitory neurons, and inverting the sign of the charge deposited in the post-synaptic neuron.

The principle of the ANN-to-SNN conversion method as introduced in Cao et al. (2015), Diehl et al. (2015) postulates that the firing rate of a neuron $r_i^l$ correlates with its original ANN activation $a_i^l$ in (1). In the following, we introduce a membrane equation for the spiking neurons to formalize a concrete relationship $r_i^l(t) \propto a_i^l$.

2.1.1. Membrane Equation

The spiking neuron integrates inputs $z_i^l(t)$ until the membrane potential $V_i^l(t)$ exceeds a threshold $V_{thr} \in \mathbb{R}^+$ and a spike is generated. Once the spike is generated, the membrane potential is reset. We discuss next two types of reset: reset to zero, used e.g., in Diehl et al. (2015), always sets the membrane potential back to a baseline, typically zero. Reset by subtraction, or "linear reset mode" in Diehl et al. (2016); Cassidy et al. (2013), subtracts the threshold $V_{thr}$ from the membrane potential at the time when it exceeds the threshold:

$$V_i^l(t) = \begin{cases} \left(V_i^l(t-1) + z_i^l(t)\right)\left(1 - \Theta_{t,i}^l\right) & \text{reset to zero} & (4a) \\ V_i^l(t-1) + z_i^l(t) - V_{thr}\,\Theta_{t,i}^l & \text{reset by subtraction.} & (4b) \end{cases}$$

From these membrane equations, we can derive slightly different approximation properties for the two reset mechanisms. In this section we analyze the first hidden layer and expand the argument in section 2.1.2 to higher layers. We assume that the input currents $z_i^1 > 0$ remain constant over time, and justify this assumption in section 2.2.4. The input to first-layer neurons (2) is then related to the ANN activations (1) via $z_i^1 = V_{thr} a_i^1$. In order to relate these ANN activations to the SNN spike rates, we merely have to average the membrane Equations (4a) and (4b) over the simulation time. The detailed calculations are given in the Supplementary Material; the resulting rates are obtained as

$$r_i^1(t) = \begin{cases} a_i^1 r_{max} \cdot \dfrac{V_{thr}}{V_{thr} + \epsilon_i^1} - \dfrac{V_i^1(t)}{t \cdot (V_{thr} + \epsilon_i^1)} & \text{reset to zero} & (5a) \\[2ex] a_i^1 r_{max} - \dfrac{V_i^1(t)}{t \cdot V_{thr}} & \text{reset by subtraction.} & (5b) \end{cases}$$

As expected, the spike rates are proportional to the ANN activations $a_i^1$, but reduced by an additive approximation error term, and in case of reset to zero an additional multiplicative error term. In the reset-to-zero case, with constant input, there is always a constant number of time steps $n_i^1$ between spikes of the same neuron $i$, and the threshold will always be exceeded by the same constant amount $\epsilon_i^1 = V_i^1(n_i^1) - V_{thr} = n_i^1 \cdot z_i^1 - V_{thr} \geq 0$. This residual charge $\epsilon_i^1$ is discarded at reset, which results in a reduced firing rate and thereby loss of information. For shallow networks and small datasets such as MNIST, this error seems to be a minor problem, but we have found that an accumulation of approximation errors in deeper layers degrades the classification error rate. We also see from Equation (5a) that a larger $V_{thr}$ and smaller inputs improve the approximation at the expense of longer integration times. Using the definition $(n_i^1 - 1)\, z_i^1 < V_{thr} \leq n_i^1 z_i^1$ for $n_i^1$ and $\epsilon_i^1 = n_i^1 z_i^1 - V_{thr}$ for $\epsilon_i^1$, we find that the approximation error is limited from above by the magnitude of the input $z_i^1$. This insight further explains why the weight normalization scheme of Diehl et al. (2015) improves performance in the reset-to-zero case: By guaranteeing that the ANN activations $a_i^1$ are too low to drive a neuron in the SNN above $V_{thr}$ within a single time step, we can keep $z_i^1 = V_{thr} a_i^1$ and thereby $\epsilon_i^l$ low. Another obvious way of improving the approximation is to reduce the simulation time step, but this comes at the cost of increased computational effort.

A simple switch to the reset-by-subtraction mechanism improves the approximation, and makes the conversion scheme suitable also for deeper networks. The excess charge $\epsilon$ is not discarded at reset and can be used for the next spike generation. Accordingly, the error term due to $\epsilon$ does not appear in Equation (5b). Instead, the firing rate estimate in the first hidden layer converges to its target value $a_i^1 \cdot r_{max}$; the only approximation error due to the discrete sampling vanishes over time. We validate by simulations in section 3.1 that this mechanism indeed leads to more accurate approximations of the underlying ANN than the methods proposed in Cao et al. (2015), Diehl et al. (2015), in particular for larger networks.
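To make the two reset rules concrete, the following minimal NumPy sketch (not the authors' toolbox code) simulates one layer of integrate-and-fire neurons driven by a constant input current and returns the empirical firing rates; names such as v_thr and t_steps are illustrative.

```python
import numpy as np

def simulate_if_layer(z, v_thr=1.0, t_steps=200, reset="subtract"):
    """Integrate-and-fire layer driven by a constant input current z.

    Returns the firing rate of each neuron in spikes per time step,
    which should approximate the ANN activation z / v_thr (Eq. 5b).
    """
    v = np.zeros_like(z)                     # membrane potentials V_i(t)
    n_spikes = np.zeros_like(z)
    for _ in range(t_steps):
        v = v + z                            # integrate the input charge, Eq. (2)
        spiking = v >= v_thr                 # threshold crossing, Eq. (3)
        n_spikes += spiking
        if reset == "zero":
            v = np.where(spiking, 0.0, v)         # Eq. (4a): residual charge discarded
        else:
            v = np.where(spiking, v - v_thr, v)   # Eq. (4b): residual charge carried over
    return n_spikes / t_steps                # r_i(t) = N_i(t) / t

# A first-layer ReLU activation a drives the SNN with constant current z = V_thr * a.
a = np.array([0.30, 0.55, 0.90])
for mode in ("zero", "subtract"):
    print(mode, simulate_if_layer(1.0 * a, v_thr=1.0, reset=mode))
```

With reset by subtraction the printed rates converge to the activations a, whereas reset to zero underestimates every activation whose input charge does not divide V_thr evenly, which is exactly the residual-charge error discussed above.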


2.1.2. Firing Rates in Higher Layers

The previous results were based on the assumption that the neuron receives a constant input $z$ over the simulation time. When neurons in the hidden layers are spiking, this condition only holds for the first hidden layer and for inputs in the form of analog currents instead of irregular spike trains. In the reset-by-subtraction case, we can derive analytically how the approximation error propagates through the deeper layers of the network. For this, we insert the expression for SNN input $z_i^l$ from Equation (2) into the membrane Equation (4b) for $l > 1$, average $V_i^l(t)$ over the simulation time, and solve for the firing rate $r_i^l(t)$. This yields:

$$r_i^l(t) = \sum_{j=1}^{M^{l-1}} W_{ij}^l r_j^{l-1}(t) + r_{max} b_i^l - \frac{V_i^l(t)}{t \cdot V_{thr}}. \qquad (6)$$

This equation states that the firing rate of a neuron in layer $l$ is given by the weighted sum of the firing rates of the previous layer, minus the time-decaying approximation error described in Equation (5b). This relationship implies that each layer computes a weighted sum of the approximation errors of earlier layers, and adds its own approximation error. The recursive expression Equation (6) can be solved iteratively by inserting the expression for the previous layer rates, starting with the known rates of the first layer Equation (5b):

$$r_i^l = a_i^l r_{max} - \Delta V_{i_l}^l - \sum_{i_{l-1}=1}^{M^{l-1}} W_{i_l i_{l-1}}^l \Delta V_{i_{l-1}}^{l-1} - \cdots - \sum_{i_{l-1}=1}^{M^{l-1}} W_{i_l i_{l-1}}^l \cdots \sum_{i_1=1}^{M^1} W_{i_2 i_1}^2 \Delta V_{i_1}^1, \qquad (7)$$

with $\Delta V_i^l := V_i^l(t)/(t \cdot V_{thr})$. Thus, a neuron $i$ in layer $l$ receives an input spike train with a slightly lower spike rate, reduced according to the quantization error $\Delta V$ of previous-layer neurons. These errors accumulate for higher layers, which explains why it takes longer to achieve high correlations of ANN activations, and why SNN firing rates deteriorate in higher layers.

2.2. Spiking Implementations of ANN Operators

In this section we introduce new methods that improve the classification error rate of deep SNNs (Rueckauer et al., 2016). These methods either allow the conversion of a wider range of ANNs, or reduce the approximation errors in the SNN.

2.2.1. Converting Biases

Biases are standard in ANNs, but were explicitly excluded by previous conversion methods for SNNs. In a spiking network, a bias can simply be implemented with a constant input current of equal sign as the bias. Alternatively, one could present the bias with an external spike input of constant rate proportional to the ANN bias, as proposed in Neftci et al. (2014), though then one may have to invert the sign of spikes to account for negative biases. The theory in section 2.1 can be applied to the case of neurons with biases, and the following section 2.2.2 shows how parameter normalization can be applied to biases as well.

2.2.2. Parameter Normalization

One source of approximation errors is that in time-stepped simulations of SNNs, the neurons are restricted to a firing rate range of $[0, r_{max}]$, whereas ANNs typically do not have such constraints. Weight normalization was introduced by Diehl et al. (2015) as a means to avoid approximation errors due to too low or too high firing. This work showed significant improvement of the performance of converted SNNs by using a data-based weight normalization mechanism. We extend this method to the case of neurons with biases and suggest a method that makes the normalization process more robust to outliers.

2.2.2.1. Normalization with biases

The data-based weight normalization mechanism is based on the linearity of the ReLU unit used for ANNs. It can simply be extended to biases by linearly rescaling all weights and biases such that the ANN activation $a$ [as computed in Equation (1)] is smaller than 1 for all training examples. In order to preserve the information encoded within a layer, the parameters of a layer need to be scaled jointly. Denoting the maximum ReLU activation in layer $l$ as $\lambda^l = \max[a^l]$, weights $W^l$ and biases $b^l$ are normalized to $W^l \rightarrow W^l \frac{\lambda^{l-1}}{\lambda^l}$ and $b^l \rightarrow b^l / \lambda^l$.
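The joint rescaling of weights and biases can be sketched in a few lines. The helper below is an illustrative NumPy version rather than the toolbox implementation; `activations` is assumed to hold, for each layer, ANN activations collected on a large subset of the training set, and the percentile argument anticipates the robust variant of the next subsection (p = 100 recovers max-norm).

```python
import numpy as np

def normalize_parameters(weights, biases, activations, p=100.0):
    """Data-based parameter normalization for ANN-to-SNN conversion.

    weights, biases : per-layer lists of arrays W^l and b^l
    activations     : per-layer list of ANN activation samples a^l, computed
                      on a large subset of the training set
    p               : percentile used as normalization scale (100 = max-norm)
    """
    norm_w, norm_b = [], []
    lambda_prev = 1.0                        # inputs x are assumed normalized to [0, 1]
    for W, b, a in zip(weights, biases, activations):
        nonzero = a[a > 0]                   # distribution of non-zero ReLU activations
        lambda_l = np.percentile(nonzero, p) if nonzero.size else 1.0
        norm_w.append(W * lambda_prev / lambda_l)   # W^l -> W^l * lambda^{l-1} / lambda^l
        norm_b.append(b / lambda_l)                 # b^l -> b^l / lambda^l
        lambda_prev = lambda_l
    return norm_w, norm_b
```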


2.2.2.2. Robust normalization

Although weight normalization avoids firing rate saturation in SNNs, it might result in very low firing rates, thereby increasing the latency until information reaches the higher layers. We refer to the algorithm described in the previous paragraph as "max-norm," because the normalization factor $\lambda^l$ was set to the maximum ANN activation within a layer, where the activations are computed using a large subset of the training data. This is a very conservative approach, which ensures that the SNN firing rates will most likely not exceed the maximum firing rate. The drawback is that this procedure is prone to be influenced by singular outlier samples that lead to very high activations, while for the majority of the remaining samples, the firing rates will remain considerably below the maximum rate.

Such outliers are not uncommon, as shown in Figure 1A, which plots the log-scale distribution of all non-zero activations in the first convolution layer for 16,666 CIFAR10 samples. The maximum observed activation is more than three times higher than the 99.9th percentile. Figure 1B shows the distribution of the highest activations across the 16,666 samples for all ANN units in the same layer, revealing a large variance across the dataset, and a peak that is far away from the absolute maximum. This distribution explains why normalizing by the maximum can result in a potentially poor classification performance of the SNN. For the vast majority of input samples, even the maximum activation of units within a layer will lie far below the chosen normalization scale, leading to insufficient firing within the layer to drive higher layers and subsequently worse classification results.

We propose a more robust alternative where we set $\lambda^l$ to the p-th percentile of the total activity distribution of layer $l$ [4]. This choice discards extreme outliers, and increases SNN firing rates for a larger fraction of samples. The potential drawback is that a small percentage of neurons will saturate, so choosing the normalization scale involves a trade-off between saturation and insufficient firing. In the following, we refer to the percentile $p$ as the "normalization scale," and note that the "max-norm" method is recovered as the special case $p = 100$. Typical values for $p$ that perform well are in the range $[99.0, 99.999]$. In general, saturation of a small fraction of neurons leads to a lower degradation of the network classification error rate compared to the case of having spike rates that are too low. This method can be combined with batch-normalization (BN) used during ANN training (Ioffe and Szegedy, 2015), which normalizes the activations in each layer and therefore produces fewer extreme outliers.

[4] This distribution is obtained by computing the ANN activations on a large fraction of the training set. From this, the scaling factor can be determined and applied to the layer parameters. This has to be done only once for a given network; during inference the parameters do not change.

2.2.3. Conversion of Batch-Normalization Layers

Batch-normalization reduces internal covariate shift in ANNs and thereby speeds up the training process. BN introduces additional layers where affine transformations of inputs are performed in order to achieve zero-mean and unit variance. An input $x$ is transformed into $\mathrm{BN}[x] = \frac{\gamma}{\sigma}(x - \mu) + \beta$, where mean $\mu$, variance $\sigma$, and the two learned parameters $\beta$ and $\gamma$ are all obtained during training as described in Ioffe and Szegedy (2015). After training, these transformations can be integrated into the weight vectors, thereby preserving the effect of BN, but eliminating the need to compute the normalization repeatedly for each sample during inference. Specifically, we set $\tilde{W}_{ij}^l = \frac{\gamma_i^l}{\sigma_i^l} W_{ij}^l$ and $\tilde{b}_i^l = \frac{\gamma_i^l}{\sigma_i^l}\left(b_i^l - \mu_i^l\right) + \beta_i^l$. This makes it simple to convert BN layers into SNNs, because after transforming the weights of the preceding layer, no additional conversion for BN layers is necessary. Empirically we found loss-less conversion when the BN parameters are integrated into other weights. The advantage lies purely in obtaining better performing ANNs if BN is used during training.
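Folding the BN parameters into the preceding layer follows directly from these two expressions. The snippet below is a schematic NumPy version for a fully connected layer with per-output-channel BN parameters; `eps` is the usual small constant added to the variance, and the self-check at the end only verifies the algebra rather than reproducing any particular framework's BN layer.

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mu, var, eps=1e-5):
    """Fold batch-normalization into the preceding layer's weights and biases.

    Returns (W_tilde, b_tilde) such that BN[W @ x + b] == W_tilde @ x + b_tilde.
    """
    scale = gamma / np.sqrt(var + eps)       # gamma_i / sigma_i
    W_tilde = W * scale[:, np.newaxis]       # W~_ij = (gamma_i / sigma_i) W_ij
    b_tilde = scale * (b - mu) + beta        # b~_i  = (gamma_i / sigma_i)(b_i - mu_i) + beta_i
    return W_tilde, b_tilde

# Self-check of the algebra on random parameters:
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mu, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=3)
bn_out = gamma / np.sqrt(var + 1e-5) * (W @ x + b - mu) + beta
W2, b2 = fold_batchnorm(W, b, gamma, beta, mu, var)
assert np.allclose(bn_out, W2 @ x + b2)
```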


FIGURE 1 | (A) Distribution of all non-zero activations in the first convolution layer of a CNN, for 16,666 CIFAR10 samples, plotted in log-scale. The dashed line in both plots indicates the 99.9th percentile of all ReLU activations across the dataset, corresponding to a normalization scale λ = 6.83. This is more than three times less than the overall maximum of λmax = 23.16. (B) Distribution of maximum ReLU activations for the same 16,666 CIFAR10 samples. For most samples their maximum activation is far from λmax. Panels: (A) ANN activations; (B) Maximum ANN activation.

2.2.4. Analog Input to First Hidden Layer

Because event-based benchmark datasets are rare (Hu et al., 2016; Rueckauer and Delbruck, 2016), conventional frame-based image databases such as MNIST (LeCun et al., 1998) or CIFAR (Krizhevsky, 2009) have been used to evaluate the classification error rate of the converted SNN. Previous methods (Cao et al., 2015; Diehl et al., 2015) usually transform the analog input activations, e.g., gray levels or RGB values, into Poisson firing rates. But this transformation introduces variability into the firing of the network and impairs its performance.

Here, we interpret the analog input activations as constant currents. Following Equation (2), the input to the neurons in the first hidden layer is obtained by multiplying the corresponding kernels with the analog input image $x$:

$$z_i^1 := V_{thr} \left(\sum_{j=1}^{M^0} W_{ij}^1 x_j + b_i^1\right). \qquad (8)$$

This results in one constant charge value $z_i^1$ per neuron $i$, which is added to the membrane potential at every time step. The spiking output then begins with the first hidden layer. Empirically we found this to be particularly effective in the low-activation regime of ANN units, where usually undersampling in spiking neurons poses a challenge for successful conversion.

2.2.5. Spiking Softmax

Softmax is commonly used on the outputs of a deep ANN, because it results in normalized and strictly positive class likelihoods. Previous approaches for ANN-to-SNN conversion did not convert softmax layers, but simply predicted the output class corresponding to the neuron that spiked most during the presentation of the stimulus. However, this approach fails when all neurons in the final layer receive negative inputs, and thus never spike.

Here we implement two versions of a spiking softmax layer. The first is based on the mechanism proposed in Nessler et al. (2009), where output spikes are triggered by an external Poisson generator with fixed firing rate. The spiking neurons do not fire on their own but simply accumulate their inputs. When the external generator determines that a spike should be produced, a softmax competition according to the accumulated membrane potentials is performed. The second variant of our spiking softmax function is similar, but does not rely on an external clock. To determine if a neuron should spike, we compute the softmax on the membrane potentials, and use the resulting values in the range of [0, 1] as rate parameters in a Poisson process for each neuron. In both variants, the final classification result over the course of stimulus presentation is then given by the index of the neuron with the highest firing rate, as before. We prefer the second variant because it does not depend on an additional hyperparameter. A third variant has been suggested by one of the reviewers: Since the softmax is applied at the last layer of the network, one could simply infer the classification output from the softmax computed on the membrane potentials, without another spike generation mechanism. This simplification could speed up inference time and possibly improve the accuracy by reducing stochasticity. This method is appealing where one does not insist upon a purely spiking network.

2.2.6. Spiking Max-Pooling Layers

Most successful ANNs use max-pooling to spatially down-sample feature maps. However, this has not been used in SNNs because computing maxima with spiking neurons is non-trivial. Instead, the simple average pooling used in Cao et al. (2015), Diehl et al. (2015) results in weaker ANNs being trained before conversion. Lateral inhibition, as suggested in Cao et al. (2015), does not fulfill the job properly, because it only selects the winner, but not the actual maximum firing rate. Another suggestion is to use a temporal Winner-Take-All based on time-to-first-spike encoding, in which the first neuron to fire is considered the maximally firing one (Masquelier and Thorpe, 2007; Orchard et al., 2015b). Here we propose a simple mechanism for spiking max-pooling, in which output units contain gating functions that only let spikes from the maximally firing neuron pass, while discarding spikes from other neurons. The gating function is controlled by computing estimates of the pre-synaptic firing rates, e.g., by computing an online or exponentially weighted average of these rates. In practice we found several methods to work well, but demonstrate only results using a finite impulse response filter to control the gating function.
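A minimal sketch of such a gating unit is given below. It uses an exponentially weighted average as the rate estimator (one of the estimators mentioned above; the reported results use a finite impulse response filter instead), and the class and method names are illustrative rather than the toolbox API.

```python
import numpy as np

class SpikingMaxPoolGate:
    """Gate for one max-pooling region of spiking inputs.

    Keeps a running estimate of each pre-synaptic firing rate and, at every
    time step, forwards only the spike (if any) of the unit that is currently
    estimated to fire maximally.
    """

    def __init__(self, n_inputs, decay=0.9):
        self.rate_estimate = np.zeros(n_inputs)
        self.decay = decay                    # decay of the exponentially weighted average

    def step(self, input_spikes):
        """input_spikes: binary vector of the pooling region at this time step."""
        self.rate_estimate = (self.decay * self.rate_estimate
                              + (1.0 - self.decay) * input_spikes)
        winner = np.argmax(self.rate_estimate)
        return int(input_spikes[winner])      # pass only the estimated-maximal unit's spike

# Example: unit 2 fires at the highest rate, so its spikes dominate the output.
rng = np.random.default_rng(1)
gate = SpikingMaxPoolGate(n_inputs=4)
rates = np.array([0.1, 0.3, 0.8, 0.2])
out_rate = sum(gate.step(rng.random(4) < rates) for _ in range(200)) / 200
print(out_rate)   # close to 0.8
```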


2.3. Counting Operations

To obtain the number of operations in the networks during classification, we define as fan-in $f_{in}$ the number of incoming connections to a neuron, and similarly fan-out $f_{out}$ as the number of outgoing projections to neurons in the subsequent layer. To give some examples: In a convolutional layer, the fan-in is given by the size of the 2-dimensional convolution kernel multiplied by the number of channels in the previous layer. In a fully-connected layer, the fan-in simply equals the number of neurons in the preceding layer. The fan-out of a neuron in a convolutional layer $l$ that is followed by another convolution layer $l+1$ generally depends on the stride of layer $l+1$. If the stride is 1, the fan-out is simply given by the size of the 2-dimensional convolution kernel of layer $l+1$, multiplied by the number of channels in layer $l+1$. Note that the fan-out may be reduced in corners and along edges of the feature map depending on how much padding is applied.

In case of the ANN, the total number of floating-point operations for classification of one frame is given by:

$$\sum_{l=1}^{L} \left(2 f_{in,l} + 1\right) n_l \quad \text{Ops/frame}, \qquad (9)$$

with $n_l$ the number of neurons in layer $l$. The factor 2 comes from the fact that each fan-in operation consists of a multiplication and addition. With +1, we count the operations needed to add the bias. The pooling operation is not considered here.

In the case of an SNN, only additions are needed when the neuron states are updated. We adopt the notation from Merolla et al. (2014) and report the Synaptic Operations, i.e., the updates in the neurons of a layer caused by a spike in the previous layer [5]. The total number of synaptic operations in the SNN across the simulation duration $T$ is

$$\sum_{t=1}^{T} \sum_{l=1}^{L} f_{out,l}\, s_l(t) \quad \text{Ops/frame}, \qquad (10)$$

where $s_l(t)$ denotes the number of spikes fired in layer $l$ at time $t$.

[5] This synaptic operation count does not include updates of the state variables due to a bias or dynamics of the post-synaptic potential (which is instantaneous in our case). We validated in our simulations that the operations caused by the bias are about two orders of magnitude fewer in number than synaptic operations, in addition to being less costly in terms of memory fetches.

In the ANN, the number of operations needed to classify one image, consisting of the cost of a full forward-pass, is a constant. In the SNN, the image is presented to the network for a certain simulation duration, and the network outputs a classification guess at every time step. By measuring both the classification error rate and the operation count at each step during simulation, we are able to display how the classification error rate of the SNN gradually decreases with increasing number of operations (cf. Figure 4).

The two different modes of operation, single forward pass in the ANN vs. continuous simulation in the SNN, have significant implications when aiming for an efficient hardware implementation. One well-known fact is that additions required in SNNs are cheaper than the multiply-accumulates needed in ANNs. For instance, our simulations in a Global Foundry 28 nm process show that the cost of performing a 32-bit floating-point addition is about 14X lower than that of a MAC operation, and the corresponding chip area is reduced by 21X. It has also been shown that memory transfer outweighs the energy cost of computations by two orders of magnitude (Horowitz, 2014). In the ANN, reading weight kernels and neuron states from memory, and writing states back to memory, is only done once during the forward pass of one sample. In contrast, memory access in the SNN is less predictable and has to be repeated for individual neurons in proportion to their spike rates. If the number of operations needed by the SNN to achieve a similar classification error as that of the ANN is lower, then equivalently the SNN would also have a reduction in the number of memory accesses. The direct implementation of SNNs on dedicated spiking hardware platforms like SpiNNaker or TrueNorth is left to future work, and will be necessary for estimating the real energy cost in comparison to the cost of implementing the original ANNs on custom ANN hardware accelerators like Eyeriss (Chen et al., 2017).
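Equations (9) and (10) translate into a few lines of code. The helpers below are illustrative; `fan_in`, `fan_out`, and `n_neurons` are per-layer lists assumed to be derived from the architecture, and `spike_counts[t][l]` is assumed to hold the number of spikes fired by layer l at time step t.

```python
def ann_ops_per_frame(fan_in, n_neurons):
    """Eq. (9): each fan-in connection costs a multiply and an add (factor 2),
    plus one addition per neuron for the bias."""
    return sum((2 * f + 1) * n for f, n in zip(fan_in, n_neurons))

def snn_ops_per_frame(fan_out, spike_counts):
    """Eq. (10): every spike in layer l triggers fan_out[l] synaptic updates."""
    return sum(f * s_l
               for spikes_t in spike_counts           # loop over time steps
               for f, s_l in zip(fan_out, spikes_t))  # loop over layers

# Toy example: a 3-layer network observed for 2 time steps.
fan_in, fan_out, n_neurons = [9, 25, 100], [25, 100, 0], [64, 32, 10]
print(ann_ops_per_frame(fan_in, n_neurons))
print(snn_ops_per_frame(fan_out, [[5, 3, 1], [4, 2, 1]]))
```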



TABLE 1 | Classification error rate on MNIST, CIFAR-10 and ImageNet for our converted spiking models, compared to the original ANNs, and compared to spiking
networks from other groups.

Data set [architecture] ANN err. SNN err. Neur. Synap.

MNIST [ours] 0.56 0.56 8k 1.2 M


MNIST [Zambrano and Bohte, 2016] 0.86 0.86 27 k 6.6 M
CIFAR-10 [ours, BinaryNet sign] 11.03 11.75 0.5 M 164 M
CIFAR-10 [ours, BinaryNet Heav] 11.58 12.55 0.5 M 164 M
CIFAR-10 [ours, BinaryConnect, binarized at infer.] 16.81 16.65 0.5 M 164 M
CIFAR-10 [ours, BinaryConnect, full prec. at infer.] 8.09 9.15 0.5 M 164 M
CIFAR-10 [ours] 11.13 11.18 0.1 M 23 M
CIFAR-10 [Esser et al., 2016], 8 chips NA 12.50 8M NA
CIFAR-10 [Esser et al., 2016], single chip NA 17.50 1M NA
CIFAR-10 [Hunsberger and Eliasmith, 2016]* 14.03 16.46 50 k NA
CIFAR-10 [Cao et al., 2015]** 20.88 22.57 35 k 7.4 M
ImageNet [ours, VGG-16]† 36.11 (15.14) 50.39 (18.37) 15 M 3.5 B
ImageNet [ours, Inception-V3]†† 23.88 (7.01) 25.40 (7.96) 11.7 M 0.5 B
ImageNet [Hunsberger and Eliasmith, 2016]‡ NA 48.20 (23.80) 0.5 M NA

The reported error rate is top-1, with top-5 in brackets for ImageNet. *Cropped to 24x24. **Cropped to 24x24. †On a subset of 2570 samples, using single-scale images of size 224x224. ††On a subset of 1382 samples, using single-scale images of size 299x299. ‡On a subset of 3072 samples. The values in bold highlight the best SNN result for a particular data set.

FIGURE 2 | Influence of novel mechanisms for ANN-to-SNN conversion on the SNN error rate for CIFAR-10.

FIGURE 3 | Accuracy-latency trade-off. Robust parameter normalization (red) enables our spiking network to correctly classify CIFAR-10 samples much faster than using our previous max-normalization (green). Not normalizing leads to classification at chance level (blue).

3. RESULTS

There are two ways of improving the classification error rate of an SNN obtained via conversion: (1) training a better ANN before conversion, and (2) improving the conversion by eliminating approximation errors of the SNN. We proposed several techniques for these two approaches in section 2; in sections 3.1 and 3.2 we evaluate their effect using the CIFAR-10 data set. Section 3.3 extends the SNN conversion methods to the ImageNet data set. In section 3.4 we show that SNNs feature an accuracy-vs.-operations trade-off that allows tuning the performance of a network to a given computational budget. The networks were implemented in Keras (Chollet, 2015). Some of the CIFAR-10 results were previously reported in Rueckauer et al. (2016).

3.1. Contribution of Improved ANN Architectures

The methods introduced in section 2 allow conversion of CNNs that use biases, softmax, batch-normalization, and max-pooling layers, which all improve the classification error rate of the ANN. The performance of a converted network was quantified on the CIFAR-10 benchmark (Krizhevsky, 2009), using a CNN with 4 convolution layers (32 3×3 - 32 3×3 - 64 3×3 - 64 3×3), ReLU activations, batch-normalization, 2×2 max-pooling layers after the 2nd and 4th convolutions, followed by 2 fully connected layers (512 and 10 neurons respectively) and a softmax output. This ANN achieved 12.14% error rate (Table 1). Constraining the biases to zero increased the error rate to 12.27%. Replacing max-pooling by average-pooling further decreased the performance to 12.31%. Eliminating the softmax and using only ReLUs in the output led to a big drop to 30.56%. With our new methods we can therefore start the conversion already with much better ANNs than was previously possible.

3.2. Contribution of Improved SNN Conversion Methods

Figure 2 shows that in the case of CIFAR-10, the conversion of the best ANN into an SNN using the default approach (i.e., no normalization, Poisson spike train input, reset-to-zero) fails, yielding an error rate of 83.50%, barely above chance level. Adding the data-based weight normalization (Diehl et al., 2015) (green bar) lowers the error rate to 40.18%, but this is still a big drop from the ANN result of 12.14% (dashed black line). Changing to the reset-by-subtraction mechanism from section 2.1 leads to another 20% improvement (brown bar), and switching to analog inputs to the first hidden layer instead of Poisson spike trains results in an error rate of 16.40% (orange bar). Finally, using the 99.9th percentile of activations for robust weight normalization yields 12.18% error rate, which is on par with the ANN performance and gives our best result for CIFAR-10. We therefore conclude that our proposed mechanisms for ANN training and ANN-to-SNN conversion contribute positively to the success of the method. The conversion into an SNN is nearly loss-less, and the results are very competitive for classification benchmarks using SNNs (Table 1). These results were confirmed also on MNIST, where a 7-layer network with max-pooling achieved an error rate of 0.56%, thereby improving previous state-of-the-art results for SNNs reported by Diehl et al. (2015) and Zambrano and Bohte (2016).

SNNs are known to exhibit a so-called accuracy-latency trade-off (Diehl et al., 2015; Neil et al., 2016), which means that the error rate drops the longer the network is simulated, i.e., the more operations we invest. The latency in which the final error rate is achieved is dependent on the type of parameter normalization, as illustrated by the three curves in Figure 3. Parameter normalization is necessary to improve upon chance-level classification (blue, no normalization). However, our previous max-norm method (green) converges very slowly to the ANN error rate because the weight scale is overly reduced and spike-activity is low. With a robust normalization using the 99.9th percentile of the activity distribution, the weights are larger and convergence is much faster. Empirically, the best results were obtained with normalization factors in the range between the 99th and 99.9th percentile of activations, which allows the network to converge quickly to error rates similar to those of the underlying ANN.

This accuracy-latency trade-off is very prominent in case of the classic LeNet architecture on MNIST (Figure 5). While the ANN achieves an error rate of 1.04% using a fixed amount of 2.35 MOps per frame, the spiking model reaches within 1 percentage point of the ANN using 2x less operations (2.07% error rate at 1.07 MOps/frame). At 1.47 MOps, the SNN error rate is 1.13%. The SNN then continues to improve until it reaches 1.07% error rate at the end of the simulation.
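Curves such as those in Figures 3-5 are obtained by logging, at every simulation step, both the operation count and the current classification guess. The helper below is a schematic illustration under the assumption that the simulator records per-step predictions and per-step synaptic operation counts; it is not part of the published toolbox interface.

```python
import numpy as np

def error_vs_ops(predictions, ops, labels):
    """Accuracy-latency trade-off curve of a converted SNN.

    predictions : array [t_steps, n_samples], class guess after each time step
    ops         : array [t_steps, n_samples], operations consumed at each step
    labels      : array [n_samples], ground-truth classes
    Returns (cumulative mean operations, error rate) per time step.
    """
    predictions, ops = np.asarray(predictions), np.asarray(ops)
    error_rate = (predictions != np.asarray(labels)).mean(axis=1)  # error after each step
    mean_ops = ops.mean(axis=1).cumsum()                           # operations invested so far
    return mean_ops, error_rate
```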


3.3. ImageNet

VGG (Simonyan and Zisserman, 2014) and GoogLeNet (Szegedy et al., 2015) are two deep network architectures that won first places in the localization and classification competitions of the ImageNet ILSVRC-2014, respectively. By introducing inception modules and bottlenecks, GoogLeNet requires 12X fewer parameters and significantly less computes than VGG-16, even though the total layer count is much higher. Since their initial introduction in 2014, both architectures have been improved. The third version of GoogLeNet, released in 2015 as Inception-V3 (Szegedy et al., 2016), improved the ImageNet results to a state-of-the-art 5.6% top-5 error rate, and uses 2.5X more computes than the original GoogLeNet. This was in part done by further reducing the kernel size and dimensions inside the network, applying regularization via batch-normalized auxiliary classifiers, and label smoothing.

3.3.1. Transient Dynamics and Voltage Clamp

While the conversion pipeline outlined in section 2 can deliver converted SNNs that produce equivalent error rates as the original ANNs on the MNIST and CIFAR-10 data sets, the error rate of the converted Inception-V3 was initially far from the error rate of the ANN. One main reason is that neurons undergo a transient phase at the beginning of the simulation because a few neurons have large biases or large input weights. During the first few time steps, the membrane potential of each neuron needs to accumulate input spikes before it can produce any output. The firing rates of neurons in the first layer need several time steps to converge to a steady rate, and this convergence time is increased in higher layers that receive transiently varying input. The convergence time is decreased in neurons that integrate high-frequency input, but increased in neurons integrating spikes at low frequency [6]. Another factor contributing to a large transient response are 1×1 convolution layers. In these layers, the synaptic input to a single neuron consists only of a single column through the channel-dimension of the previous layer, so that the neuron's bias or a single strongly deviating synaptic input may determine the output dynamics. With larger kernels, more spikes are gathered that can outweigh the influence of e.g., a large bias [7].

[6] An ANN neuron responds precisely the same whether (A) receiving input from a neuron with activation 0.1 and connecting weight 0.8, or (B) activation 0.8 and weight 0.1. In contrast, the rate of an SNN neuron will take longer to converge in case (A) than in (B). This phenomenon forms the basis of the accuracy-latency trade-off mentioned above: One would like to keep firing rates as low as possible to reduce the operational cost of the network, but has to sacrifice approximation accuracy for it.

[7] Even though the parameters in each layer were normalized such that the input to each neuron is below threshold, this does not guarantee that all biases are sub-threshold: their effect could be reduced by inhibitory input spikes. While such inhibitory synaptic input is still missing at the onset of the simulation, the output dynamics of a neuron will be dominated by a large bias.

In order to overcome the negative effects of transients in neuron dynamics, we tried a number of possible solutions, including the initializations of the neuron states, different reset mechanisms, and bias relaxation schemes. The most successful approach we found was to clamp the membrane potential to zero for the first N time steps, where N increases linearly with the layer depth l: N(l) = d · l. The slope d represents the temporal delay between lifting the clamp from consecutive layers. The longer the delay d, the more time is given to a previous layer to converge to steady-state before the next layer starts integrating its output.

This simple modification of the SNN state variables removes the transient response completely (see Figure S1), because by the time the clamp is lifted from post-synaptic neurons, the pre-synaptic neurons have settled at their steady-state firing rate. We found a clamping delay of d = 10 in Inception-V3 to be sufficient. Clamping the membrane potential in VGG-16 did not have a notable impact on the error rate. Each input image was presented to the converted VGG-16 spiking network for 400 time steps, and to the converted Inception-V3 for 550 time steps. The average firing rate of neurons is 0.016 Hz [8] in VGG-16, and 0.053 Hz in Inception-V3.

[8] As our neuron model does not contain any time constant, this unit should be read as "spikes per simulation time step" and is not related to spikes per wall-clock time.

We expect that the transient of the network could be reduced by training the network with constraints on the biases or the β parameter of the batch-normalization layers. Table 1 summarizes the error rates achieved by our SNNs using the methods presented above, and compares them to previous work by other groups.
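The clamping schedule N(l) = d · l amounts to a simple per-layer comparison against the elapsed simulation time. A minimal sketch is given below; the layer objects and their clamp()/step() methods are hypothetical placeholders for whatever state-update routine the simulator uses.

```python
def run_with_clamp(layers, t_steps, d=10):
    """Clamp the membrane potentials of layer l to zero for the first d*l steps.

    `layers` is a list of layer objects assumed to expose clamp() (hold the
    membrane potential at zero) and step() (one integrate-and-fire update);
    both names are illustrative, not the toolbox API.
    """
    for t in range(t_steps):
        for l, layer in enumerate(layers, start=1):
            if t < d * l:
                layer.clamp()    # discard transient input from lower layers
            else:
                layer.step()     # normal integrate-and-fire dynamics
```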


3.4. Combination with Low-Precision Models

The neurons in our spiking network emit events at a rate proportional to the activation of the corresponding unit in the ANN. Target activations with reduced precision can be approximated more quickly and accurately with a small number of spike events. For instance, if the activations are quantized into values of {0, 0.1, 0.2, ..., 0.9, 1.0}, the spiking neuron can perfectly represent each value within at most 10 time steps. On the other hand, to approximate a floating-point number at 16-bit precision, the neuron in the worst case would have to be active for 2^16 = 65536 time steps.

To demonstrate the potential benefit of using low-precision activations when transforming a given model into a spiking network, we apply the methods from section 2.2 to BinaryNet (Courbariaux et al., 2016), a CNN where both weights and activations are constrained to either {0, +1}, or {−1, +1}. To obtain the binarized ANNs with these two sets of activations, we train BinaryNet using the publicly available source code (Courbariaux et al., 2016) with two different activation functions: First with a Heaviside activation function, and second, with a signed activation function. The two binarized models are then converted into spiking networks. Instead of interpreting the negative activations of BinaryNet "sign" as negative firing rates, we invert the sign of the spikes emitted by neurons with a negative activation. To achieve this, we add a second threshold at −1, where neurons can emit spikes of size −1 if the threshold is reached from above.

By virtue of the quantized activations, these two SNNs are able to approximate the ANN activations with very few operations (see Figure 4). The BinaryNet SNNs already show an error rate which is close to the ANN target error rates early in the simulation, in fact as soon as the first output spikes are produced. In contrast, in full-precision models (cf. Figures 3, 5), the classification error rate starts at chance level and drops over the course of the simulation, as more operations are invested.

The lowest error rate for our converted spiking CIFAR-10 models is achieved using BinaryConnect (Courbariaux et al., 2015). This network is trained using full-precision weights in combination with binarized weights. Either set of weights can be used during inference. We test the resulting model with both the binarized weights and the full-precision copy kept during training (cf. Table 1). These results illustrate how spiking networks benefit from and at the same time complement the strengths of low-precision models.

FIGURE 4 | Classification error rate vs number of operations for the BinaryNet ANN and SNN implementation on the complete CIFAR-10 dataset.

FIGURE 5 | Classification error rate vs number of operations for the LeNet ANN and SNN implementation on the MNIST dataset.

4. DISCUSSION

This work presents two new developments. The first is a novel theory that describes the approximation of SNN firing rates to their equivalent ANN activations. The second is the techniques to convert almost arbitrary continuous-valued CNNs into spiking equivalents. By implementing SNN-compatible versions of common ANN CNN features such as max pooling, softmax, batch normalization, biases and Inception modules, we allow a larger class of CNNs including VGG-16 and GoogLeNet Inception-V3 to be converted into SNNs. Table 1 shows that our SNN results compare favorably to previous SNN results on all tested data sets: Cao et al. (2015) achieved 22.57% error rate on CIFAR-10, albeit with a smaller network and after cropping images to 24×24. With a similarly small network and cropped images, Hunsberger and Eliasmith (2016) achieve 16.46% error rate. Better SNN error rates to date have only been reported by Esser et al. (2016), where an error rate of 12.50% was reported for a very large network optimized for 8 TrueNorth chips, and making use of ternary weights and multiple 1×1 network-in-network layers. A smaller network fitting on a single chip is reported to achieve 17.50%. In our own experiments with similar low-precision training schemes for SNNs, we converted the BinaryConnect model by Courbariaux et al. (2016) to 8.65% error rate on CIFAR10, which is by far the best SNN result reported to date.

In addition to the improved SNN results on MNIST and CIFAR-10, this work presents for the first time a spiking network implementation of VGG-16 and Inception-V3 models, utilizing simple non-leaky integrate-and-fire neurons. The top-5 error rates of the SNNs during inference lie close to the original ANNs. Future investigations will be carried out to identify additional conversion methods that will allow the VGG-16 SNN to reach the error rate of the ANN. For instance, we expect a reduction in the observed initial transients of higher-up layers within large networks, by training the networks with constraints on the biases.

With BinaryNet (an 8-layer CNN with binary weights and activations tested on CIFAR-10) (Courbariaux et al., 2016), we demonstrated that low-precision models are well suited for conversion to spiking networks. While the original network requires a fixed amount of 1.23 GOps to classify a single frame with average error rate of 11.57%, the SNN can be queried for a classification result at a variable number of operations. For instance, the average error rate of the SNN is 15.13% at 0.46 GOps (2.7x reduction), and improves further when investing more operations. This reduction in operation count is due to the fact that, first, activation values at lower precision can more easily be approximated by discrete spikes, and second, zero activations are natively skipped in the activity-driven operation of spiking networks. In light of this, our work builds upon and complements the recent advances in low-precision models and network compression.
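The dual-threshold rule used in section 3.4 for the BinaryNet "sign" model can be written as a small extension of the reset-by-subtraction update. The sketch below is a simplified single-neuron illustration, not the toolbox code.

```python
def step_signed_if(v, z, v_thr=1.0):
    """One update of an integrate-and-fire neuron that can emit signed spikes.

    Returns (new membrane potential, emitted spike in {-1, 0, +1}). Positive
    spikes are emitted at +v_thr; a second threshold at -v_thr emits spikes of
    size -1 for units whose ANN activation is negative. Both thresholds use
    reset by subtraction.
    """
    v = v + z
    if v >= v_thr:
        return v - v_thr, +1     # ordinary spike
    if v <= -v_thr:
        return v + v_thr, -1     # sign-inverted spike for negative activations
    return v, 0
```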


The converted networks highlight a remarkable feature of spiking networks: While ANNs require a fixed amount of computations to achieve a classification result, the final error rate in a spiking network drops off rapidly during inference when an increasing number of operations is used to classify a sample. The network classification error rate can be tailored to the number of operations that are available during inference, allowing for accurate classification at low latency and on hardware systems with limited computational resources. In some cases, the number of operations needed for correct classification can be reduced significantly compared to the original ANN. We found a savings in computes of 2x for smaller full-precision networks (e.g., LeNet has 8 k neurons and 1.2 M connections), and larger low-precision models (e.g., BinaryNet has 0.5 M neurons and 164 M connections). These savings did not scale up to the very large networks such as VGG-16 and Inception-V3 with more than 11 M neurons and over 500 M connections. One reason is that each additional layer in the SNN introduces another stage where high-precision activations need to be approximated by discrete spikes. We show in Equation (5b) that this error vanishes over time. But since higher layers are driven by inputs that contain approximation errors from lower layers (cf. Equation 6), networks of increasing depth need to be simulated longer for an accurate approximation. We are currently investigating spike encoding schemes that make more efficient use of temporal structure than the present rate-based encoding. Mostafa et al. (2017) present such an approach where the precise spike time is used to train a network to classify MNIST digits with a single spike per neuron. Such a sparse temporal code clearly reduces the cost of repeated weight fetches, which dominates in rate-encoded SNNs.

Finally, this conversion framework allows the deployment of state-of-the-art pre-trained high-performing ANN models onto energy-efficient real-time neuromorphic spiking hardware such as TrueNorth (Benjamin et al., 2014; Merolla et al., 2014; Pedroni et al., 2016).

AUTHOR CONTRIBUTIONS

BR developed the theory, implemented the methods, conducted the experiments and drafted the manuscript. YH implemented and tested the spiking max-pool layer. I-AL contributed to some of the experiments. MP and S-CL contributed to the design of the experiments, the analysis of the data, and to the writing of the manuscript.

FUNDING

This work has been supported by the Samsung Advanced Institute of Technology, University of Zurich and ETH Zurich.

ACKNOWLEDGMENTS

We thank Jun Haeng Lee for helpful comments and discussions, and the reviewers for their valuable contributions.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnins.2017.00682/full#supplementary-material
the experiments and drafted the manuscript. YH implemented 2017.00682/full#supplementary-material

REFERENCES

Benjamin, B. V., Gao, P., McQuinn, E., Choudhary, S., Chandrasekaran, A. R., Bussat, J.-M., et al. (2014). Neurogrid: a mixed-analog-digital multichip system for large-scale neural simulations. Proc. IEEE 102, 699–716. doi: 10.1109/JPROC.2014.2313565
Cao, Y., Chen, Y., and Khosla, D. (2015). Spiking deep convolutional neural networks for energy-efficient object recognition. Int. J. Comput. Vis. 113, 54–66. doi: 10.1007/s11263-014-0788-3
Cassidy, A. S., Merolla, P., Arthur, J. V., Esser, S. K., Jackson, B., Alvarez-Icaza, R., et al. (2013). “Cognitive computing building block: a versatile and efficient digital neuron model for neurosynaptic cores,” in Proceedings of the International Joint Conference on Neural Networks (Dallas, TX). doi: 10.1109/IJCNN.2013.6707077
Chen, Y.-H., Krishna, T., Emer, J. S., and Sze, V. (2017). Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J. Solid State Circ. 52, 127–138. doi: 10.1109/JSSC.2016.2616357
Chollet, F. (2015). Keras (Version 2.0) [Computer software]. Available online at: https://github.com/fchollet/keras
Courbariaux, M., Bengio, Y., and David, J.-P. (2015). “BinaryConnect: training deep neural networks with binary weights during propagations,” in Advances in Neural Information Processing Systems 28 (NIPS 2015) (Montréal, QC), 1–9.
Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., and Bengio, Y. (2016). Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or -1. arXiv:1602.02830.
Diehl, P. U., Neil, D., Binas, J., Cook, M., Liu, S.-C., and Pfeiffer, M. (2015). “Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing,” in Proceedings of the International Joint Conference on Neural Networks (Killarney). doi: 10.1109/IJCNN.2015.7280696
Diehl, P. U., Pedroni, B. U., Cassidy, A., Merolla, P., Neftci, E., and Zarrella, G. (2016). “TrueHappiness: neuromorphic emotion recognition on TrueNorth,” in Proceedings of the International Joint Conference on Neural Networks (Vancouver, BC), 4278–4285. doi: 10.1109/IJCNN.2016.7727758
Esser, S. K., Arthur, J. V., Merolla, P. A., Modha, D. S., and Appuswamy, R. (2015). “Backpropagation for energy-efficient neuromorphic computing,” in Advances in Neural Information Processing Systems 28 (NIPS 2015) (Montréal, QC), 1–9. Available online at: http://papers.nips.cc/paper/5862-backpropagation-for-energy-efficient-neuromorphic-computing
Esser, S. K., Merolla, P. A., Arthur, J. V., Cassidy, A. S., Appuswamy, R., Andreopoulos, A., et al. (2016). Convolutional networks for fast, energy-efficient neuromorphic computing. Proc. Natl. Acad. Sci. U.S.A. 113, 11441–11446. doi: 10.1073/pnas.1604850113
Farabet, C., Paz, R., Pérez-Carrasco, J., Zamarreño-Ramos, C., Linares-Barranco, A., LeCun, Y., et al. (2012). Comparison between frame-constrained fix-pixel-value and frame-free spiking-dynamic-pixel convNets for visual processing. Front. Neurosci. 6:32. doi: 10.3389/fnins.2012.00032
Furber, S. B., Galluppi, F., Temple, S., and Plana, L. A. (2014). The SpiNNaker project. Proc. IEEE 102, 652–665. doi: 10.1109/JPROC.2014.2304638
Gokhale, V., Jin, J., Dundar, A., Martini, B., and Culurciello, E. (2014). “A 240 G-ops/s mobile coprocessor for deep neural networks,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (Columbus, OH), 696–701. doi: 10.1109/CVPRW.2014.106
Horowitz, M. (2014). “Computing’s energy problem (and what we can do about it),” in Digest of Technical Papers IEEE International Solid-State Circuits Conference, Vol. 57 (San Francisco, CA), 10–14. doi: 10.1109/ISSCC.2014.6757323
Hu, Y., Liu, H., Pfeiffer, M., and Delbruck, T. (2016). DVS benchmark datasets for object tracking, action recognition, and object recognition. Front. Neurosci. 10:405. doi: 10.3389/fnins.2016.00405
Hunsberger, E., and Eliasmith, C. (2016). Training spiking deep networks for neuromorphic hardware. arXiv:1611.05141.
Ioffe, S., and Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167.
Kiselev, I., Neil, D., and Liu, S. C. (2016). “Event-driven deep neural network hardware system for sensor fusion,” in Proceedings - IEEE International Symposium on Circuits and Systems (Montréal, QC), 2495–2498. doi: 10.1109/ISCAS.2016.7539099
Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Technical Report, University of Toronto.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (Lake Tahoe, NV), 1–9.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). “Gradient-based learning applied to document recognition,” in Proceedings of the IEEE (IEEE), Vol. 86, 2278–2323.
Lee, J. H., Delbruck, T., and Pfeiffer, M. (2016). Training deep spiking neural networks using backpropagation. Front. Neurosci. 10:508. doi: 10.3389/fnins.2016.00508


Liu, S.-C., Delbruck, T., Indiveri, G., Douglas, R., and Whatley, A. (2015). Event-Based Neuromorphic Systems. Chichester, UK: John Wiley & Sons, 440.
Masquelier, T., and Thorpe, S. J. (2007). Unsupervised learning of visual features through spike timing dependent plasticity. PLoS Comput. Biol. 3:e31. doi: 10.1371/journal.pcbi.0030031
Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., et al. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673. doi: 10.1126/science.1254642
Mostafa, H., Pedroni, B. U., Sheik, S., and Cauwenberghs, G. (2017). “Fast classification using sparsely active spiking networks,” in ISCAS.
Neftci, E., Das, S., Pedroni, B., Kreutz-Delgado, K., and Cauwenberghs, G. (2014). Event-driven contrastive divergence for spiking neuromorphic systems. Front. Neurosci. 7:272. doi: 10.3389/fnins.2013.00272
Neil, D., and Liu, S.-C. (2014). Minitaur, an event-driven FPGA-based spiking network accelerator. IEEE Trans. Very Large Scale Integr. Syst. 22, 2621–2628. doi: 10.1109/TVLSI.2013.2294916
Neil, D., Pfeiffer, M., and Liu, S.-C. (2016). “Learning to be efficient: algorithms for training low-latency, low-compute deep spiking neural networks,” in Proceedings of the 31st Annual ACM Symposium on Applied Computing (Pisa), 293–298. doi: 10.1145/2851613.2851724
Nessler, B., Maass, W., and Pfeiffer, M. (2009). “STDP enables spiking neurons to detect hidden causes of their inputs,” in Advances in Neural Information Processing Systems, Vol. 22 (Vancouver, BC), 1357–1365.
O’Connor, P., Neil, D., Liu, S. C., Delbruck, T., and Pfeiffer, M. (2013). Real-time classification and sensor fusion with a spiking deep belief network. Front. Neurosci. 7:178. doi: 10.3389/fnins.2013.00178
Orchard, G., Lagorce, X., Posch, C., Furber, S. B., Benosman, R., and Galluppi, F. (2015a). “Real-time event-driven spiking neural network object recognition on the SpiNNaker platform,” in Proceedings - IEEE International Symposium on Circuits and Systems (Lisbon), 2413–2416. doi: 10.1109/ISCAS.2015.7169171
Orchard, G., Meyer, C., Etienne-Cummings, R., Posch, C., Thakor, N., and Benosman, R. (2015b). HFirst: a temporal approach to object recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 2028–2040. doi: 10.1109/TPAMI.2015.2392947
Pedroni, B. U., Das, S., Arthur, J. V., Merolla, P. A., Jackson, B. L., Modha, D. S., et al. (2016). Mapping generative models onto a network of digital spiking neurons. IEEE Trans. Biomed. Circ. Syst. 10, 837–854. doi: 10.1109/TBCAS.2016.2539352
Perez-Carrasco, J. A., Zhao, B., Serrano, C., Acha, B., Serrano-Gotarredona, T., Chen, S., et al. (2013). Mapping from frame-driven to frame-free event-driven vision systems by low-rate rate coding and coincidence processing - application to feedforward convNets. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2706–2719. doi: 10.1109/TPAMI.2013.71
Posch, C., Serrano-Gotarredona, T., Linares-Barranco, B., and Delbruck, T. (2014). Retinomorphic event-based vision sensors: bioinspired cameras with spiking output. Proc. IEEE 102, 1470–1484. doi: 10.1109/JPROC.2014.2346153
Rueckauer, B., and Delbruck, T. (2016). Evaluation of event-based algorithms for optical flow with ground-truth from inertial measurement sensor. Front. Neurosci. 10:176. doi: 10.3389/fnins.2016.00176
Rueckauer, B., Lungu, I.-A., Hu, Y., and Pfeiffer, M. (2016). Theory and tools for the conversion of analog to spiking convolutional neural networks. arXiv:1612.04052.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252. doi: 10.1007/s11263-015-0816-y
Serrano-Gotarredona, T., Linares-Barranco, B., Galluppi, F., Plana, L., and Furber, S. (2015). “ConvNets experiments on SpiNNaker,” in Proceedings - IEEE International Symposium on Circuits and Systems (Lisbon), 2405–2408. doi: 10.1109/ISCAS.2015.7169169
Simonyan, K., and Zisserman, A. (2014). “Very deep convolutional networks for large-scale image recognition,” in ICLR (Banff, AB), 1–14.
Stromatias, E., Neil, D., Galluppi, F., Pfeiffer, M., Liu, S. C., and Furber, S. (2015). “Scalable energy-efficient, low-latency implementations of trained spiking Deep Belief Networks on SpiNNaker,” in Proceedings of the International Joint Conference on Neural Networks (Killarney), 1–8. doi: 10.1109/IJCNN.2015.7280625
Stromatias, E., Soto, M., Serrano-Gotarredona, T., and Linares-Barranco, B. (2017). An event-driven classifier for spiking neural networks fed with synthetic or dynamic vision sensor data. Front. Neurosci. 11:350. doi: 10.3389/fnins.2017.00350
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). “Going deeper with convolutions,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Boston, MA: IEEE), 1–9. doi: 10.1109/CVPR.2015.7298594
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2016). “Rethinking the inception architecture for computer vision,” in IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV). doi: 10.1109/CVPR.2016.308
Zambrano, D., and Bohte, S. M. (2016). Fast and efficient asynchronous neural computation with adapting spiking neural networks. arXiv:1609.02053.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer, SS, and handling Editor declared their shared affiliation.

Copyright © 2017 Rueckauer, Lungu, Hu, Pfeiffer and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
