
Received 8 August 2022, accepted 22 September 2022, date of publication 26 September 2022, date of current version 30 September 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3209671

DTS-SNN: Spiking Neural Networks With Dynamic Time-Surfaces
DONGHYUNG YOO AND DOO SEOK JEONG, (Member, IEEE)
Division of Materials Science and Engineering, Hanyang University, Seoul 04763, South Korea
Corresponding author: Doo Seok Jeong ([email protected])
This work was supported by the National Research and Development Program through the National Research Foundation of Korea (NRF)
funded by the Ministry of Science and ICT under Grant NRF-2021M3F3A2A01037632 and Grant NRF-2019R1C1C1009810.

ABSTRACT Convolution helps spiking neural networks (SNNs) capture the spatio-temporal structures of neuromorphic (event) data, as evident in the convolution-based SNNs (C-SNNs) that hold state-of-the-art classification accuracies on various datasets. However, the efficacy aside, the efficiency of C-SNNs is questionable. In this regard, we propose SNNs with novel trainable dynamic time-surfaces (DTS-SNNs) as efficient alternatives to convolution. The dynamic time-surface proposed in this work features high responsiveness to moving objects given the use of a zero-sum temporal kernel, which is motivated by the simple cells' receptive fields in the early-stage visual pathway. We evaluated the performance and computational complexity of our DTS-SNNs on three real-world event-based datasets (DVS128 Gesture, the Spiking Heidelberg dataset, and N-Cars). The results highlight high classification accuracies and significant improvements in computational efficiency, e.g., merely 1.51% behind the state-of-the-art result on DVS128 Gesture but a ×18 improvement in efficiency. The code is available online (https://github.com/dooseokjeong/DTS-SNN).

INDEX TERMS Lightweight spiking neural network, spiking neural network, dynamic time-surfaces, event-based data.

The associate editor coordinating the review of this manuscript and approving it for publication was Fu-Kwun Wang.

I. INTRODUCTION
Convolution-based methods are pervasive in a variety of deep learning application domains given their high efficacy across different domains when implemented in convolutional neural networks (CNNs). The same holds for spiking neural networks (SNNs) in that convolution-based SNNs (C-SNNs) hold the state-of-the-art classification accuracies on a variety of datasets [1], [2], [3]. Convolution is an operation-intensive method that involves a large number of multiply-accumulate operations over 3D feature maps. Therefore, convolution generally results in high computational complexity and high power consumption, which is a daunting challenge, particularly for C-SNNs, because power efficiency is supposed to be one of the key advantages of SNNs over deep neural networks (DNNs).

SNNs are time-dependent hypotheses consisting of spiking units and unidirectional synapses [4]. One of the advantages of SNNs is their operation based on asynchronous spikes, unlike the layer-wise sequential operations in DNNs, which impose forward-locking constraints [5], [6]. To leverage this advantage, it is required to implement SNNs in dedicated hardware, which is referred to as neuromorphic hardware [7], [8], [9], [10], [11]. Generally, a neuromorphic processor consists of multiple cores supporting asynchronous event-based operations across them. The consequent power efficiency is the key feature of neuromorphic hardware.

Time-surface (TS) analyses are effective methods for processing asynchronous events (spikes) in various tasks [12], [13], [14]. A TS for a given event is a 2D map of the event timestamps prior to the event in the spatial vicinity of the event. Therefore, a TS can capture the spatio-temporal local structure of the events responding to the object. Nevertheless, the previous TSs are not tailored to SNNs and hardly support end-to-end learning.


In this regard, we attempt to use TSs, in place of convolution, to extract the features of event data in a highly operation-efficient manner and thereby leverage the key advantage, i.e., power efficiency, of SNNs. To the best of our knowledge, this work is the first attempt to integrate TSs into SNNs to process event-based data efficiently. Moreover, we significantly modified the conventional TSs to better capture the dynamic features of event data by using a zero-sum temporal kernel motivated by the temporal kernels of simple cells in the early-stage visual pathways [15]. Additionally, our TSs are dynamic insomuch as they are calculated at every timestep to encode event dynamics, unlike the previous TSs; they are therefore referred to as dynamic time-surfaces (DTSs). The primary contributions of our work include the following:
• We propose DTS-SNN to replace C-SNN, which is remarkably lightweight but exhibits high classification accuracy on event-based data.
• We propose trainable DTSs that are susceptible to moving objects and fully support end-to-end learning.
• We evaluate the classification accuracy and computational efficiency of DTS-SNNs on various event-based datasets, including DVS128 Gesture [16], the Spiking Heidelberg dataset (SHD) [17], and N-Cars [13].

II. RELATED WORK
In the early stage of visual processing, simple cells (linear neurons) exhibit linear responses to visual inputs in their receptive fields of particular spatial and temporal structures [15]. The temporal structure (kernel) of a receptive field features alternating positive and negative contributions of the input to the simple cell's response in time, such that an input at a particular point t0 causes a positive contribution within a time window (tc in width, i.e., if t − t0 ≤ tc), which turns into a negative contribution when the time exceeds the window. A schematic of the temporal kernel is illustrated in Figure 1. This type of temporal kernel supports the responsiveness of simple cells to moving objects over their receptive fields. A comparison between a simple cell's responses to static and moving objects is depicted in Figure 1.

FIGURE 1. Schematic of the temporal kernel (k) of a simple cell, rendering the simple cell responsive to time-varying stimuli (s). The response is denoted by r.

As the early research on TS analysis, HOTS considers only the last timestamps of the pixels in a given TS. Although HOTS successfully introduced the concept of the TS to process event data, its TS was prone to noise, i.e., events irrelevant to the objects under consideration [12]. As a workaround, Sironi et al. proposed HATS, which is based on averaged TSs [13]. For a compact representation, they partitioned the input field into grid-cells. For each grid-cell, the TSs for several recent events in the grid-cell are averaged over the entire timesteps and grid-cell to acquire a single smooth TS that is representative of the grid-cell. Unlike HOTS, in HATS, the TS of each event considers several recent timestamps by convolving an event stream with an exponentially decaying temporal kernel.

HOTS uses a set of TSs as a dictionary to compare with the input TS and to consequently build a feature map (matching frequency). This comparison is repeated through multiple stages. The feature histogram from the last stage is used to categorize input data based on the histogram similarity between the input and the instances in each category. HATS uses grid-cell-wise averaged TSs as a dictionary. Similar to HOTS, the feature map is built based on matching frequency, but a support vector machine is used as a classifier. Notably, both HOTS and HATS use time-invariant (static) TSs as inputs to their classifiers.

With the development of training algorithms for SNNs, there have been attempts to process event data by exploiting the spatio-temporal processing ability of SNNs [1], [2], [18], [19], [20]. Yet, to achieve high classification accuracies, most of them used large C-SNNs with multiple hidden layers, which cause significant computational complexity.

This lets us revisit the initial motivation of SNNs, energy efficiency, and consequently rethink efficient methods to extract the spatio-temporal features of event data using TSs as alternatives to convolution. To this end, the prerequisites include (i) the modification of the conventional time-invariant TSs to time-dependent (dynamic) forms with a noise-robust temporal kernel and (ii) the development of a DTS builder supporting end-to-end batch learning.

III. SPIKING NEURAL NETWORKS WITH DYNAMIC TIME-SURFACES
A DTS-SNN consists of a DTS builder and an SNN classifier. The builder constructs DTSs for the events at every timestep, which are subsequently fed into the SNN as inputs. To validate the feature extraction ability of the proposed DTS builder and the importance of well-defined features for SNNs, we used a simple dense SNN with a single hidden layer, which was trained using a surrogate-gradient-based backpropagation algorithm [21]. This section elucidates the DTS in comparison with the previous TSs and a method to build DTSs in parallel for the samples in a single batch.

A. DYNAMIC TIME-SURFACES WITH ZERO-SUM TEMPORAL KERNELS
For an event stream from an event camera, the ith event ei is encoded as ei = (pi, ti, Xi), where pi, ti, and Xi denote its polarity pi ∈ {−1, 1}, timestamp, and location on a 2D pixel array Xi = (xi, yi), respectively.
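Before defining the DTS, the short sketch below may help fix ideas; it is our own illustration (not code from HOTS, HATS, or this paper's repository) of the conventional, time-invariant time-surface those methods build around a single event: the most recent same-polarity timestamp of each pixel in a (2r + 1) × (2r + 1) neighbourhood, encoded with a single-exponential kernel. The radius r and time-constant tau are placeholder values.

```python
import numpy as np

def conventional_time_surface(last_ts, event_xy, t_now, r=2, tau=50.0):
    """HOTS/HATS-style (simplified) single-exponential time-surface around one event.

    last_ts : (H, W) array of the most recent event timestamp per pixel (np.nan if none)
    event_xy: (x, y) location of the current event
    t_now   : timestamp of the current event (same unit as tau, e.g. ms)
    """
    x, y = event_xy
    H, W = last_ts.shape
    ts = np.zeros((2 * r + 1, 2 * r + 1))
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            xx, yy = x + dx, y + dy
            if 0 <= xx < H and 0 <= yy < W and not np.isnan(last_ts[xx, yy]):
                # Older timestamps decay exponentially; the current event maps to 1.
                ts[dx + r, dy + r] = np.exp(-(t_now - last_ts[xx, yy]) / tau)
    return ts

# Example: two earlier events near pixel (10, 10), queried at t = 30 ms.
last = np.full((32, 32), np.nan)
last[10, 10], last[10, 11], last[9, 10] = 30.0, 20.0, 5.0
print(conventional_time_surface(last, (10, 10), 30.0, r=2, tau=50.0).round(2))
```

Such a time-surface is static in the sense criticized above: it is attached to an individual event and uses a purely decaying kernel, which is what the dynamic, zero-sum variant defined next departs from.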


The DTS for the ith event, Tei, only considers the previous or simultaneous events ej (j ≤ i) of the same polarity (pj = pi), which are located at Xj = (xj, yj) ∈ Dei. The spatial domain Dei is defined as Dei = {(xj, yj) | |xj − xi| ≤ rx, |yj − yi| ≤ ry}. Thus, Tei(pi, Δxij, Δyij) ∈ R^(2×Rx×Ry), where Δxij = xj − xi, Δyij = yj − yi, Rx = 2rx + 1, and Ry = 2ry + 1. An example of building DTSs for three consecutive events is illustrated in Figure 2.

FIGURE 2. TSs for three consecutive events of the same polarity at t1, t2, and t3. We set rx and ry to 2 so that the spatial domain Dei for each TS is a 5 × 5 grid. The function f denotes a timestamp-encoding function.

The DTS at timestep ti is encoded with the novel zero-sum temporal kernel ktzs as

Tei(pi, Δxij, Δyij) = (ktzs ∗ ρ)(ti).   (1)

The event stream ρ for each location Xj is described by

ρ(t; Xj) = Σ_{tk ∈ t_k} δ(t − tk),   (2)

where δ is the Dirac delta function, and t_k is the set of all previous timestamps, t_k = {tk | k ≤ j, Xk = Xj}. The zero-sum temporal kernel ktzs is given by

ktzs = e^(−τ/τ1) − (τ1/τ2) e^(−τ/τ2), where τ1 < τ2,   (3)

which is distinguished from the single-exponential temporal kernel kt used in HOTS and HATS,

kt = e^(−τ/τ0).   (4)

Eq. (3) is termed the zero-sum temporal kernel because the convolution with this kernel over an event stream at a constant firing rate yields zero due to the balance between the positive and negative sub-kernels in Eq. (3).

Lemma 1: Consider the convolution of a train of Poisson spikes ρ at a constant firing rate r using the zero-sum temporal kernel ktzs, y(t) = (ktzs ∗ ρ)(t). The result converges to zero as time t increases, i.e., y(∞) = 0.

The derivation of Lemma 1 is shown in Appendix A. According to Lemma 1, the kernel yields high responsiveness to spike trains with time-dependent firing rates by filtering out spikes at constant firing rates. Consequently, the zero-sum temporal kernel endows the timestamp encoding with high responsiveness to moving objects compared with the single-exponential kernel.

FIGURE 3. (a) Zero-sum temporal kernel ktzs with τ1 = 40 ms and τ2 = 80 ms compared with the single-exponential kernel kt with τ0 = 40 ms. (b) Encoding the timestamps (gray bars) using ktzs and kt. The events were generated at a constant rate of 100 Hz.

Figure 3 shows an example of timestamps encoded using the zero-sum temporal kernel ktzs compared with encoding using the single-exponential kernel kt (Figure 3(a)). Figure 3(b) highlights the high responsiveness of the encoding function to events of varying rate: it outputs high responses in the initial encoding phase, while the following responses merely fluctuate around zero due to the constant event rate. This clearly differs from the timestamp encoding using the single-exponential kernel kt, as compared in Figure 3(b).

Similar to HATS, the input field is partitioned into grid-cells, and a single grid-cell-wise representative DTS is built for each grid-cell. However, unlike HATS, the representative DTS T^c(t) for a given grid-cell c and timestep t is the weighted sum of the DTSs of simultaneous events Tei,

T^c(t) = Σ_{ei ∈ e_{t,c}} ai Tei,   (5)

where e_{t,c} = {ei | ti = t, Xi ∈ c}. The weight of each element time-surface Tei is denoted by ai, which is a trainable parameter. This set of weights is shared among all grid-cells. Note that, in HATS, the representative TS is the simple average of the TSs of simultaneous events:

T^c = (1/|e_{t,c}|) Σ_{ei ∈ e_{t,c}} Tei.

B. BUILDING DYNAMIC TIME-SURFACES IN PARALLEL
The key to training SNNs using DTSs on a given dataset is the parallel calculation of DTSs for all samples in a batch. Additionally, the compatibility of the parallel calculation with readily available deep learning frameworks enhances efficiency. To this end, we propose pixel-wise timestamp-encoding banks E(t) that are updated once per timestep. The timestamp encodings in the bank can readily be retrieved when events at particular pixels occur. The bank is subsequently unfolded to endow each pixel with an element time-surface. At a given timestep, the element time-surfaces for the simultaneous events are retrieved and summed with their weights to calculate the DTS for a given grid-cell.

We consider periodically distributed grid-cells over an H × W input field for a given polarity; each grid-cell is hc × wc in size so that there exist H/hc × W/wc grid-cells on the input field.

FIGURE 4. Procedure for building DTSs for a given sample at a given timestep t . Rx and Ry define the size of a DTS such that
Rx = 2rx + 1 and Ry = 2ry + 1.


As such, the spatial domain of each DTS Dei is Rx × Ry in size, where Rx = 2rx + 1 and Ry = 2ry + 1. The procedure is detailed in the following subsections. The pseudocode is shown in Appendix D.

1) BUILDING TIMESTAMP-ENCODING BANKS
A timestamp-encoding bank E(t) is a bank of timestamp encodings for all pixels, so its dimension is identical to the input field: E(t) ∈ R^(P×H×W) for a P × H × W input field. Each element Epxy(t) is calculated by convolving the event stream of polarity p at location (x, y), ρ(t; p, x, y), with the zero-sum temporal kernel ktzs,

Epxy(t) = (ktzs ∗ ρ(t; p, x, y))(t).   (6)

For efficient computation, we transform this convolution into a recursive form as follows:

E(t + 1) = E1(t + 1) − E2(t + 1),
E1(t + 1) = E1(t) e^(−1/τ1) + 1A(p, X),
E2(t + 1) = E2(t) e^(−1/τ2) + (τ1/τ2) 1A(p, X).   (7)

The indicator function 1A(p, X) is an event-map at timestep t, where A = {(pi, Xi) | ti = t}.

2) UNFOLDING TIMESTAMP-ENCODING BANKS
The timestamp-encoding bank E(t) ∈ R^(P×H×W) is subsequently unfolded to build a preliminary time-surface map T(t) ∈ R^(P×H×W×(Rx×Ry)) in which each location in the P × H × W input field is given an Rx × Ry preliminary time-surface centered at that location, such that

Tpxy(t) ← E_{p, (x−rx):(x+rx), (y−ry):(y+ry)}(t).   (8)

This process is indicated by "Unfolding" in Figure 4.

3) RESHAPING UNFOLDED BANKS
Each preliminary time-surface in the map T(t) is flattened, leading to a P × H × W × RxRy map. This is subsequently reshaped into a P × RxRy × H × W map. This reshaping enables the grid-cell-wise calculation of the weighted sum of the DTSs using 3D convolution. This process is depicted in Figure 4, indicated by "Flattening/reshaping".

4) MULTIPLICATION BY EVENT-MAPS
The reshaped preliminary time-surface map T(t) is read out to acquire the DTSs of the events occurring at timestep t. To this end, we reuse the event-map 1A(p, X) in Eq. (7). This event-map is expanded by repeating it along the flattened time-surface axis. The expanded event-map is element-wise multiplied by the map T(t), resulting in a map T(t) that includes nonzero element time-surfaces Tei(t) for the events at timestep t only. This process is indicated by "Multiplied by expanded event-map" in Figure 4. The elements in each flattened Tei(t) are L2-normalized.

5) GRID-CELL-WISE WEIGHTED SUM OF TIME-SURFACES
The input field is an H/hc × W/wc grid, and each grid-cell is hc × wc in size. For a given grid-cell, the DTS T^c(t) in Eq. (5) is calculated by convolving the map T(t) along the time-surface axis using a kernel a ∈ R^(P×1×hc×wc) with a stride of one. Note that the kernel elements indicate the contributions of the element time-surfaces T(t) to the grid-cell-wise representative DTS T^c(t), as in Eq. (5). The same process is repeated for the next grid-cell: moving the kernel to the next grid-cell (with a stride of hc or wc) and convolving T(t) with a stride of one. Thus, the calculation of T^c(t) is equivalent to a 3D convolution of the element time-surface map T(t) using the rank-4 kernel a. This allows us to readily use the convolution methods in deep learning frameworks. The resulting RxRy × H/hc × W/wc map is a map of flattened DTSs for all grid-cells. The map is reshaped into H/hc × W/wc × Rx × Ry. The number of operations involved (#OPDTS) is given by

#OPDTS = 2 P H W Rx Ry.   (9)
102662 VOLUME 10, 2022


D. Yoo, D. S. Jeong: DTS-SNN: Spiking Neural Networks With Dynamic Time-Surfaces

C. SPIKING NEURAL NETWORK CLASSIFIER
The SNN classifier used is a simple fully-connected network (FCN) with a single hidden layer. The H/hc × W/wc × Rx × Ry DTS is flattened and fed to the SNN classifier as synaptic currents at a given timestep. Each dense layer consists of spiking neurons conforming to the spike-response model (SRM) but without a refractory mechanism. The sub-threshold membrane potential of the ith neuron in the lth layer is denoted by u_i^(l)(t). Hereafter, the subscript and superscript of a variable denote a neuron index and a layer index, respectively. The potential u_i^(l)(t) is given by

u_i^(l)(t) = Σ_j w_ij^(l) (ε ∗ ρ_j^(l−1))(t),
ε = (τm/(τm − τs)) (e^(−τ/τm) − e^(−τ/τs)) Θ(τ),   (10)

where w_ij^(l), τm, and τs denote the weight from the jth neuron in the (l − 1)th layer, the time-constant for potential decay, and the time-constant for synaptic current decay, respectively. A spike train from the jth neuron in the (l − 1)th layer is denoted by ρ_j^(l−1). We use a spike function Sϑ(u_i^(l)),

Sϑ(u_i^(l)) = 1 if u_i^(l) ≥ ϑ, and 0 otherwise.   (11)

When the potential in Eq. (10) crosses the threshold for spiking ϑ, a spike is emitted. Subsequently, the potential is reset to zero.

We trained the SNN classifier using the spatio-temporal backpropagation (STBP) algorithm based on surrogate gradients [21]. However, for simplicity, we modified STBP such that the gradient ∂Sϑ/∂u_i^(l) is replaced by a boxcar function B as follows:

∂Sϑ/∂u_i^(l) ← B(u_i^(l)) = 1 if |u_i^(l) − ϑ| < a, and 0 otherwise,   (12)

where a is a positive constant.

IV. EXPERIMENTS
We evaluated the performance of DTS-SNNs on three real-world datasets: DVS128 Gesture, SHD, and N-Cars. For all datasets, we reduced the input event sampling rate, which is equivalent to the reciprocal timestep Δt^(−1), to reduce the computational complexity. The hyper-parameters used for each dataset are listed in Appendix E; they were found using manual searches. We used the raw event datasets without any pre-processing.

To identify the effect of the zero-sum temporal kernel ktzs on performance, we compared the classification accuracy for the zero-sum temporal kernel ktzs with that for the single-exponential kernel kt. We evaluated the computational efficiency of DTS-SNN in terms of the number of OPs per timestep. Note that the number of OPs includes #OPDTS in Eq. (9). All experiments were conducted on a GPU workstation (GPU: RTX 2080 Ti). DTS-SNNs were implemented in Python using PyTorch's Autograd framework [22]. We trained the networks using Adam [23] without weight decay or learning-rate scheduling.

FIGURE 5. Validation accuracy change with training epoch.

A. DVS128 GESTURE
DVS128 Gesture is an event-based hand-gesture dataset, which comprises 1,342 samples labeled as 11 classes. We set the input sampling time Δt and the number of timesteps to 5 ms and 300, respectively. Each original sample (H = W = 128) was mapped onto an 8 × 8 grid, i.e., hc = wc = 16. The size of each time-surface was set to 7 × 7, i.e., Rx = Ry = 7. The flattened DTS was thus 3136 in length and was fed into a 3136-400-11 SNN classifier.

Table 1 shows the performance and efficiency of DTS-SNN on DVS128 Gesture in comparison with previous methods using CNN-based SNNs. It highlights (i) high classification accuracy (albeit 1.51% lower than the state-of-the-art result) and (ii) extremely high computational efficiency (×18 that of the state-of-the-art result). The high computational efficiency arises from the use of a small FCN instead of a CNN and the high efficiency of the DTS builder. The evolution of test accuracy with training epoch is plotted in Figure 5.

Additionally, we achieved a 3.12% accuracy improvement by using the zero-sum temporal kernel ktzs instead of the single-exponential kernel kt, which indicates the higher temporal responsiveness of the zero-sum temporal kernel compared with the conventional single-exponential temporal kernel. We visualize the DTSs T^c of the two temporal kernels at five timesteps (200 – 240) in Figure 6. A detailed comparison between the two time-surfaces is addressed in Appendix B.
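A minimal sketch of the classifier's neuron dynamics and surrogate gradient is shown below; it assumes placeholder time-constants and threshold (the text does not fix them at this point), realises the double-exponential PSP of Eq. (10) with two leaky traces, and mirrors the boxcar surrogate of Eq. (12) in a custom backward pass. It is an illustration, not the paper's implementation.

```python
import torch

THRESHOLD = 1.0       # spiking threshold, vartheta in Eq. (11) (assumed value)
BOX_HALF_WIDTH = 0.5  # boxcar half-width, a in Eq. (12) (assumed value)

class BoxcarSpike(torch.autograd.Function):
    """Heaviside spike function (Eq. (11)) with the boxcar surrogate gradient of Eq. (12)."""
    @staticmethod
    def forward(ctx, u):
        ctx.save_for_backward(u)
        return (u >= THRESHOLD).float()

    @staticmethod
    def backward(ctx, grad_out):
        (u,) = ctx.saved_tensors
        return grad_out * ((u - THRESHOLD).abs() < BOX_HALF_WIDTH).float()

def srm_layer(inputs, weight, tau_m=20.0, tau_s=5.0):
    """Minimal SRM-like dense layer without refractoriness. Both traces are zeroed on
    spiking as a simple stand-in for the reset-to-zero described in the text."""
    decay_m = torch.exp(torch.tensor(-1.0 / tau_m))
    decay_s = torch.exp(torch.tensor(-1.0 / tau_s))
    scale = tau_m / (tau_m - tau_s)
    B, T, _ = inputs.shape
    m = torch.zeros(B, weight.shape[0])
    s = torch.zeros(B, weight.shape[0])
    spikes = []
    for t in range(T):
        drive = inputs[:, t] @ weight.t()   # synaptic currents from the flattened DTS
        m = m * decay_m + drive
        s = s * decay_s + drive
        u = scale * (m - s)                 # membrane potential u_i^(l)(t), Eq. (10)
        out = BoxcarSpike.apply(u)
        m, s = m * (1.0 - out), s * (1.0 - out)
        spikes.append(out)
    return torch.stack(spikes, dim=1)       # (B, T, N_out)

# Example: a DTS-like input stream, 3136 -> 400 hidden spiking units over 100 timesteps.
x = torch.rand(2, 100, 3136) * 0.05
w = (torch.randn(400, 3136) * 0.02).requires_grad_()
srm_layer(x, w).mean().backward()           # gradients flow through the boxcar surrogate
print(w.grad.shape)                         # torch.Size([400, 3136])
```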


TABLE 1. Performance comparisons of DTS-SNN on DVS128 Gesture, SHD, and N-Cars.

FIGURE 6. DTSs for (upper panel) the zero-sum temporal kernel and (lower panel) the single-exponential kernel. Each grid-cell in an 8 × 8 grid is a 7 × 7 DTS. (Right-hand clockwise sample from DVS128 Gesture.)

B. SHD
SHD is an audio classification dataset. It consists of 10,420 samples of spoken digits (0 – 9) in English and German, thus labeled as 20 classes. The recorded samples were analyzed using 700 channels as bases, which determine the input data dimension. Each sample varies in length (0.24 – 1.17 s). We set the input sampling time Δt to 1 ms. The other hyper-parameters are listed in Table 4. We considered the original 700-long 1D sample at a given timestep as a 2D sample (H = 1, W = 700) and mapped it onto a 1 × 35 grid, i.e., hc = 1 and wc = 20.


FIGURE 7. Distribution of timestamp-encoding values for the zero-sum temporal kernel ktzs and the single-exponential kernel kt (right-hand clockwise sample from DVS128 Gesture).

The size of each time-surface was set to 1 × 3, i.e., Rx = 1 and Ry = 3. The 105-long flattened DTS was fed into a 105-128-20 SNN classifier.

Table 1 shows the performance and efficiency on SHD compared with previous works. Note that all models except [26] on SHD in Table 1 have 128 neurons in each hidden layer. The use of the zero-sum temporal kernel realized a significant improvement in classification accuracy (by 15.96%) compared with the single-exponential kernel. The learning kinetics for both cases are plotted in Figure 5.

C. N-CARS
N-Cars is an event-based dataset that was directly recorded using an event camera for a car detection task. This dataset targets a binary classification task (car or background) with event data of static objects (rather than dynamic objects as for DVS128 Gesture and SHD). Each sample is 100 ms long, and we set the input sampling time Δt to 1 ms so that the number of timesteps was 100.

We mapped each sample at a given timestep (H = 100, W = 120) onto a 10 × 12 grid, i.e., hc = wc = 10. The size of each time-surface was set to 5 × 5, i.e., Rx = Ry = 5. The 3,000-long flattened DTS was fed into a 3000-400-2 SNN classifier. The results are shown in Table 1 and compared with several state-of-the-art methods. The results indicate an accuracy improvement of 0.81% by using the zero-sum temporal kernel instead of the single-exponential kernel. The evolution of test accuracy with training epoch is plotted in Figure 5.

V. DISCUSSION
The zero-sum temporal kernel ktzs avoids large timestamp-encoding values caused by persistent events, thereby enhancing the responsiveness to time-varying events. To show this, we address the distribution of timestamp-encoding values on an H/hc × W/wc × Rx × Ry DTS map T^c(t) at given timesteps. We plot the distribution for the zero-sum temporal kernel ktzs and the single-exponential kernel kt at given timesteps in Figure 7, using a sample from DVS128 Gesture. The comparison evidently indicates a larger dispersion of encoding values for the single-exponential kernel, and thus a larger standard deviation, than for the zero-sum temporal kernel. The larger encoding values for the single-exponential kernel are likely attributed to persistent events. The zero-sum temporal kernel filters out such large encoding values and consequently allows the SNN classifier to pay attention to time-varying events.

TABLE 2. Classification (best) accuracy for different time-surface sizes (R × R for DVS128 Gesture and N-Cars, and R for SHD) and grid-cell sizes (C × C for DVS128 Gesture and N-Cars, and C for SHD).

The dimensions of each time-surface and grid-cell are important hyper-parameters, given by (R × R and C × C) for 2D data and (R and C) for 1D data. The dependency of classification accuracy on these hyper-parameters is shown in Table 2.
VOLUME 10, 2022 102665


D. Yoo, D. S. Jeong: DTS-SNN: Spiking Neural Networks With Dynamic Time-Surfaces

The larger the value of R, the farther events are considered to build time-surfaces, capturing the spatio-temporal correlation of likely incoherent long-range events. The larger the value of C, the farther time-surfaces are considered to build a single representative time-surface per grid-cell. Table 2 highlights the presence of optimal dimensions of time-surfaces and grid-cells, which may optimally take coherent events into account by filtering out incoherent long-range events. We chose the optimal values of R and C for each dataset with reference to the data in Table 2.

VI. CONCLUSION
We proposed DTS-SNN, which merges time-surface analysis and event-based classification to realize lightweight inference models with high classification capability. The DTS builder with the conventional single-exponential temporal kernel successfully extracted spatio-temporal features of event data, allowing the following simple SNN classifier (one-hidden-layer FCN) to classify input data at high precision. To enhance the feature extraction, we introduced the zero-sum temporal kernel, which was motivated by the temporal structure of biological receptive fields. With the zero-sum kernel, we improved the classification accuracy further. The accuracy improvement is deemed to be due to the zero-sum temporal kernel endowing the DTS builder with high responsiveness to dynamic objects. Nevertheless, systematic studies on the role of the zero-sum temporal kernel in accuracy improvements need to be conducted, which are left for future work.

APPENDIX
A. PROOFS OF LEMMAS
Lemma 2: Consider the convolution of a train of Poisson spikes ρ at a firing rate r,

y(t) = (k ∗ ρ(t))(t),

where k is a single-exponential kernel k = e^(−t/τ). The expected value of y is approximated by the convolution of the firing rate r with the same kernel,

y(t) = (k ∗ r(t))(t).

Proof: The probability of a particular pattern of probabilistic spikes in a period is calculated using the probability of spiking Ps(t) and of not spiking 1 − Ps(t). Consider a pattern of spikes at timesteps Ts and no spikes at timesteps Tns. The probability of the pattern is given by

P = Π_{t ∈ Ts} Ps(t) Π_{t ∈ Tns} (1 − Ps(t)).   (13)

The probability of spiking Ps is the product of the spiking rate r and the timestep size Δt, i.e., Ps = rΔt. Generally, the spiking rate is below 50 Hz, and setting Δt to 1 ms is commonplace. Even for r = 50 Hz and Δt = 1 ms, the spiking probability Ps is 0.05. Therefore, ignoring nonlinear terms in Eq. (13) is a reasonable approximation. Considering this approximation, the nontrivial cases include only one spike in the entire period T. Thus, there are T nontrivial cases, whose probabilities are given by

∀k ∈ T, Pk = Ps(k) Π_{t ≠ k} (1 − Ps(t)) ≈ Ps(k).

The expected value of y at timestep t can be calculated considering the nontrivial cases only:

y(t) = Σ_{k=1}^{t} e^(−(t−k)Δt/τ) Ps(k).

Because Ps(k) = r(k)Δt, we have

y(t) = Σ_{k=1}^{t} e^(−(t−k)Δt/τ) r(k)Δt.   (14)

Eq. (14) is the discrete form of the convolution

y(t) = (k ∗ r(t))(t).   □

Lemma 3: Consider the convolution of a train of Poisson spikes ρ at a constant firing rate r using the zero-sum temporal kernel ktzs, y(t) = (ktzs ∗ ρ)(t). The result converges to zero as time t increases, i.e., y(∞) = 0.

Proof: We first divide the zero-sum temporal kernel ktzs into two sub-kernels ktzs^(1) and ktzs^(2):

ktzs = ktzs^(1) − (τ1/τ2) ktzs^(2),
ktzs^(1) = e^(−τ/τ1),
ktzs^(2) = e^(−τ/τ2).

Using Lemma 2, we have

y(t) = y^(1)(t) − (τ1/τ2) y^(2)(t),
y^(i)(t) = (ktzs^(i) ∗ r(t))(t), i ∈ {1, 2}.   (15)

We consider a Poisson-spike train whose firing rate r is given by a boxcar function with a constant nonzero firing rate r0 in the range t0 < t < t1,

r(t) = r0 (H(t − t0) − H(t − t1)),

where H is the Heaviside step function. Consequently, we have the result of the convolution in Eq. (15) as follows:

y(t) = 0 if 0 ≤ t < t0,
y(t) = r0 τ1 [e^(−(t−t0)/τ2) − e^(−(t−t0)/τ1)] if t0 ≤ t < t1,
y(t) = r0 τ1 [e^(−(t−t0)/τ2) − e^(−(t−t0)/τ1)] − r0 τ1 [e^(−(t−t1)/τ2) − e^(−(t−t1)/τ1)] if t ≥ t1.   (16)

Eq. (16) indicates that y(t) converges to zero if t0 ≪ t < t1.   □

FIGURE 8. Effect of the nonzero-sum temporal kernel on classification accuracy on DVS128 Gesture. The time-constants τ1 and τ2 were set to 50 ms and 100 ms, respectively.

B. NONZERO-SUM TEMPORAL KERNEL
The use of the zero-sum temporal kernel instead of the single-exponential kernel improves classification accuracy on all three datasets considered in this study. We evaluate the effect of a nonzero-sum temporal kernel on classification accuracy in depth by introducing a temporal kernel kt0,

kt0 = e^(−τ/τ1) − b (τ1/τ2) e^(−τ/τ2),   (17)

where b is a non-negative constant that determines the contribution of a single event to the timestamp encoding, such that

kt0 = kt (single-exponential kernel) if b = 0,
kt0 = positive-sum temporal kernel if 0 < b < 1,
kt0 = ktzs (zero-sum kernel) if b = 1,
kt0 = negative-sum temporal kernel if b > 1.   (18)

We evaluated the classification accuracy on DVS128 Gesture with respect to b (Figure 8). The results indicate that the highest accuracy is achieved at b = 1, i.e., kt0 = ktzs.

C. ZERO-SUM TEMPORAL KERNELS WITH DIFFERENT TIME-CONSTANTS
We manually explored the time-constant space in search of the optimal set of time-constants τ1 and τ2 for the zero-sum temporal kernel (τ2 = 2τ1). Here, we report the classification accuracy on DVS128 Gesture, SHD, and N-Cars for three sets of time-constants (τ1/τ2 = 10/20, 20/40, and 50/100, in ms) in Table 3.

TABLE 3. Classification accuracy for three different sets of time-constants τ1 and τ2. This is the best validation accuracy from a single trial for each case.

D. PSEUDOCODE
The pseudocode for constructing dynamic time-surfaces in parallel is shown in Algorithm 1.

Algorithm 1: Building dynamic time-surfaces. Functions encTstamp and unfold are given by Eqs. (7) and (8). Function mul element-wise multiplies T by the event-map built using e = {ei | ti = t}. Function conv3d executes the 3D convolution of T using the rank-4 kernel a.
  Input: set of events e = {ei}, i = 1..N, for a sample
  Output: dynamic time-surfaces T^c(t) at every timestep t for all grid-cells C = {ci}, i = 1..M
  Initialization: E ← 0
  for t = 1 to T do
      /* Update of timestamp-encoding bank E */
      E ← encTstamp(E, e = {ei | ti = t})
      /* Building preliminary time-surface map T */
      T ← unfold(E)
      /* Flattening/reshaping T to P × [Rx Ry] × H × W */
      T ← flatten/reshape(T)
      /* Building T using the event-map */
      T ← mul(T, e = {ei | ti = t})
      T ← normalize(T)
      /* 3D convolution using rank-4 kernel a */
      T^c ← conv3d(T, a)
      /* Reshaping T^c to H/hc × W/wc × Rx × Ry */
      T^c ← reshape(T^c)
      Output T^c
  end

E. HYPER-PARAMETERS
The hyper-parameters used for evaluation are shown in Table 4.

TABLE 4. Hyper-parameters used for evaluation.
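A small sketch of the kernel family in Eq. (17) is given below (our own illustration, using the time-constants of Figure 8); numerically integrating each kernel over τ makes the naming in Eq. (18) explicit, since the (untruncated) integral equals τ1(1 − b).

```python
import numpy as np

def k_b(tau, b, tau1=50.0, tau2=100.0):
    """Generalised temporal kernel of Eq. (17); b = 1 recovers the zero-sum kernel."""
    return np.exp(-tau / tau1) - b * (tau1 / tau2) * np.exp(-tau / tau2)

taus = np.arange(0.0, 500.0, 0.1)          # ms; 500 ms truncation leaves a small tail error
for b in (0.0, 0.5, 1.0, 1.5):
    area = k_b(taus, b).sum() * 0.1        # Riemann-sum approximation of the integral
    print(f"b = {b:3.1f}: kernel integral over [0, 500 ms] ~ {area:7.2f}")
# b = 0 -> single-exponential (area ~ tau1), 0 < b < 1 -> positive-sum,
# b = 1 -> near-zero area (exactly zero without truncation), b > 1 -> negative-sum.
```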


REFERENCES
[1] H. Zheng, Y. Wu, L. Deng, Y. Hu, and G. Li, "Going deeper with directly-trained larger spiking neural networks," in Proc. AAAI Conf. Artif. Intell., 2021, pp. 11062–11070.
[2] A. Kugele, T. Pfeil, M. Pfeiffer, and E. Chicca, "Efficient processing of spatio-temporal data streams with spiking neural networks," Frontiers Neurosci., vol. 14, p. 439, May 2020.
[3] B. Yin, F. Corradi, and S. M. Bohté, "Accurate and efficient time-domain classification with adaptive spiking recurrent neural networks," Nature Mach. Intell., vol. 3, no. 10, pp. 905–913, Oct. 2021.
[4] D. S. Jeong, "Tutorial: Neuromorphic spiking neural networks for temporal learning," J. Appl. Phys., vol. 124, no. 15, Oct. 2018, Art. no. 152002.
[5] M. Pfeiffer and T. Pfeil, "Deep learning with spiking neurons: Opportunities and challenges," Frontiers Neurosci., vol. 12, p. 774, Oct. 2018.
[6] M. Jaderberg, W. M. Czarnecki, S. Osindero, O. Vinyals, A. Graves, D. Silver, and K. Kavukcuoglu, "Decoupled neural interfaces using synthetic gradients," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1627–1635.
[7] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, "A million spiking-neuron integrated circuit with a scalable communication network and interface," Science, vol. 345, no. 6197, pp. 668–673, Aug. 2014.
[8] M. Davies, N. Srinivasa, T. H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, and Y. Liao, "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro, vol. 38, no. 1, pp. 82–99, Jan. 2018.
[9] A. Neckar, S. Fok, B. V. Benjamin, T. C. Stewart, N. N. Oza, A. R. Voelker, C. Eliasmith, R. Manohar, and K. Boahen, "Braindrop: A mixed-signal neuromorphic architecture with a dynamical systems-based programming model," Proc. IEEE, vol. 107, no. 1, pp. 144–164, Jan. 2019.
[10] S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri, "A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs)," IEEE Trans. Biomed. Circuits Syst., vol. 12, no. 1, pp. 106–122, Feb. 2018.
[11] V. Kornijcuk and D. S. Jeong, "Recent progress in real-time adaptable digital neuromorphic hardware," Adv. Intell. Syst., vol. 1, no. 6, Oct. 2019, Art. no. 1900030.
[12] X. Lagorce, G. Orchard, F. Gallupi, B. E. Shi, and R. Benosman, "HOTS: A hierarchy of event-based time-surfaces for pattern recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 7, pp. 1346–1359, Jan. 2017.
[13] A. Sironi, M. Brambilla, N. Bourdis, X. Lagorce, and R. Benosman, "HATS: Histograms of averaged time surfaces for robust event-based object classification," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1731–1740.
[14] J. Manderscheid, A. Sironi, N. Bourdis, D. Migliore, and V. Lepetit, "Speed invariant time surface for learning to detect corner points with event-based cameras," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 10237–10246.
[15] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, MA, USA: MIT Press, 2005.
[16] A. Amir, B. Taba, D. Berg, T. Melano, J. McKinstry, C. Di Nolfo, T. Nayak, A. Andreopoulos, G. Garreau, M. Mendoza, J. Kusnitz, M. Debole, S. Esser, T. Delbruck, M. Flickner, and D. Modha, "A low power, fully event-based gesture recognition system," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 7243–7252.
[17] B. Cramer, Y. Stradmann, J. Schemmel, and F. Zenke, "The Heidelberg spiking data sets for the systematic evaluation of spiking neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 7, pp. 2744–2757, Jul. 2022.
[18] Y. Xing, G. D. Caterina, and J. Soraghan, "A new spiking convolutional recurrent neural network (SCRNN) with applications to event-based hand gesture recognition," Frontiers Neurosci., vol. 14, p. 1143, Nov. 2020.
[19] J. Kaiser, H. Mostafa, and E. Neftci, "Synaptic plasticity dynamics for deep continuous local learning (DECOLLE)," Frontiers Neurosci., vol. 14, p. 424, May 2020.
[20] W. He, Y. Wu, L. Deng, G. Li, H. Wang, Y. Tian, W. Ding, W. Wang, and Y. Xie, "Comparing SNNs and RNNs on neuromorphic vision datasets: Similarities and differences," Neural Netw., vol. 132, pp. 108–120, Dec. 2020.
[21] Y. Wu, L. Deng, G. Li, J. Zhu, and L. Shi, "Spatio-temporal backpropagation for training high-performance spiking neural networks," Frontiers Neurosci., vol. 12, p. 331, May 2018.
[22] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, and A. Desmaison, "PyTorch: An imperative style, high-performance deep learning library," in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 8026–8037.
[23] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[24] S. B. Shrestha and G. Orchard, "SLAYER: Spike layer error reassignment in time," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 1412–1421.
[25] W. Fang, Z. Yu, Y. Chen, T. Masquelier, T. Huang, and Y. Tian, "Incorporating learnable membrane time constant to enhance learning of spiking neural networks," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 2661–2671.
[26] F. Zenke and T. P. Vogels, "The remarkable robustness of surrogate gradient learning for instilling complex function in spiking neural networks," Neural Comput., vol. 33, no. 4, pp. 899–925, 2021.
[27] N. Perez-Nieves, V. C. H. Leung, P. L. Dragotti, and D. F. M. Goodman, "Neural heterogeneity promotes robust learning," Nature Commun., vol. 12, no. 1, pp. 1–9, Dec. 2021.
[28] A. Viale, A. Marchisio, M. Martina, G. Masera, and M. Shafique, "CarSNN: An efficient spiking neural network for event-based autonomous cars on the Loihi neuromorphic research processor," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2021, pp. 1–10.

DONGHYUNG YOO received the B.S. degree in materials science and engineering from Hanyang University, Seoul, South Korea, in 2017, where he is currently pursuing the Ph.D. degree in materials science and engineering. His current research interests include learning algorithms for spiking neural networks and neuromorphic vision sensors.

DOO SEOK JEONG (Member, IEEE) received the B.E. and M.E. degrees in materials science from Seoul National University, in 2002 and 2005, respectively, and the Ph.D. degree in materials science from RWTH Aachen, Germany, in 2008. He was with the Korea Institute of Science and Technology from 2008 to 2018. He is an Associate Professor with Hanyang University, South Korea. His research interests include spiking neural networks for sequence learning, future prediction, learning algorithms, spiking neural network design, and digital neuromorphic processor design.