
Journal of Neural Engineering

PAPER

A data-driven spike sorting feature map for resolving spike overlap in the feature space

To cite this article: J Wouters et al 2021 J. Neural Eng. 18 0460a7

View the article online for updates and enhancements.


J. Neural Eng. 18 (2021) 0460a7 https://doi.org/10.1088/1741-2552/ac0f4a

Journal of Neural Engineering

PAPER

A data-driven spike sorting feature map for resolving spike overlap in the feature space

J Wouters1,∗, F Kloosterman2,3,4 and A Bertrand1

1 KU Leuven, Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Leuven, Belgium
2 Neuro-Electronics Research Flanders (NERF), Leuven, Belgium
3 KU Leuven, Brain & Cognition Research Unit, Leuven, Belgium
4 VIB, Leuven, Belgium
∗ Author to whom any correspondence should be addressed.

E-mail: [email protected]

RECEIVED 26 January 2021
REVISED 30 May 2021
ACCEPTED FOR PUBLICATION 28 June 2021
PUBLISHED 19 July 2021
Keywords: spike sorting, overlapping spikes, feature extraction, neural network

Abstract
Objective. Spike sorting is the process of extracting neuronal action potentials, or spikes, from an
extracellular brain recording, and assigning each spike to its putative source neuron. Spike sorting
is usually treated as a clustering problem. However, this clustering process is known to be affected
by overlapping spikes. Existing methods for resolving spike overlap typically require an expensive
post-processing of the clustering results. In this paper, we propose the design of a domain-specific
feature map, which enables the resolution of spike overlap directly in the feature space. Approach.
The proposed domain-specific feature map is based on a neural network architecture that is trained
to simultaneously perform spike sorting and spike overlap resolution. Overlapping spike clusters
can be identified in the feature space through a linear relation with the single-neuron clusters of
the neurons that contribute to the overlapping spikes. To aid the feature map training, a data
augmentation procedure is presented that is based on biophysical simulations. Main results. We
demonstrate the potential of our method on independent and realistic test data. We show that our
novel approach for resolving spike overlap generalizes to unseen and realistic test data.
Furthermore, the sorting performance of our method is shown to be similar to the state-of-the-art,
but our method does not assume the availability of spike templates for resolving spike overlap.
Significance. Resolving spike overlap directly in the feature space results in an overall simplified
spike sorting pipeline compared to the state-of-the-art. For the state-of-the-art, the overlapping
spike snippets exhibit a large spread in the feature space and do not appear as concentrated
clusters. This can lead to biased spike template estimates which affect the sorting performance of
the state-of-the-art. In our proposed approach, overlapping spikes form concentrated clusters and
spike overlap resolution does not depend on the availability of spike templates.

1. Introduction

Neurons communicate with each other through electro-chemical signalling mechanisms. Such neuronal signalling is believed to underlie cognition. In order to better understand the brain and how it gives rise to cognition, neuroscientists perform experiments that are designed to measure neural activity and relate this activity to behavioural data. When performing electrophysiological experiments, the electrical component in neural signalling is studied. Neuronal action potentials, or 'spikes', are a frequently studied component of the electrical brain signals because they reflect the output signals of neurons.

When performing extracellular electrophysiological recordings, electrodes are lowered into the brain near the neuron population that is under study. Modern neural probes contain a dense grid of hundreds of recording electrodes [2, 3] and are becoming increasingly popular for performing large-scale neural recordings. Neural probes can record a mixture of spikes from hundreds of neurons simultaneously. The spiking activity at each electrode
© 2021 IOP Publishing Ltd



channel is referred to as multi-unit activity, as each electrode typically captures spikes from multiple neurons. However, when an experimenter is interested in the neural activity of individual neurons, i.e. single-unit activity, a neural source separation problem has to be solved. The transformation of the activity of multiple neurons into the individual activity of the different recorded neurons is known as spike sorting [4]. While multi-unit activity is often sufficient for coarse brain-computer interfacing tasks, having access to the neural activity of individual neurons is essential to neuroscientists that try to understand how individual neurons contribute to brain function [5–7].

A common approach for solving the spike sorting problem is through the use of an unsupervised learning framework [8]. The framework is characterized by four consecutive processing steps: (1) spike detection, (2) spike alignment, (3) feature extraction, and (4) clustering. The spike detection results in the extraction of spike snippets from the neural recording. These spike snippets are then mutually aligned according to some alignment criterion to support the feature extraction. Next, feature values are computed for every spike snippet, which ideally results in the discriminability of spikes from different neurons, while promoting the grouping of spikes from the same neuron. Finally, the single-neuron clusters are identified through an automated clustering analysis. Several implementations and variations on this common framework have been proposed [9–15].

When two or more neurons fire near-simultaneously in time, a compound spike waveform, also referred to as an overlapping spike snippet, is generated. If the vanilla clustering framework from the previous paragraph is used for spike sorting, the overlapping spike snippets are likely going to be misinterpreted, leading to wrongly clustered spikes and an overall bias in the cluster identification itself. It has been shown that overlapping spikes affect spike sorting accuracy in practice [16]. Therefore, strategies have been designed to cope with overlapping spikes. To the best of our knowledge these strategies all depend on the use of spike templates [17]. The strategies consist of either applying matched filters that are derived from the spike templates following the biased clustering analysis [18–23], or rely on an iterative clustering/template fitting procedure to resolve spike overlap [16, 24, 25]. Spike template estimates will also be biased by overlapping spikes when they are based on biased clustering results. In case of the iterative template estimation, the template estimation bias can be overcome, but only at a considerable increase in computational cost.

Recently, modern supervised machine learning concepts such as (deep) neural networks have become increasingly popular for use in spike sorting [26–30]. These supervised learning building blocks are typically intended for replacing classical spike detection and feature extraction methods, resulting in domain-specific detectors and feature maps. However, these recent feature maps, as well as preceding classical techniques, do not take into account spike overlap and overlap resolution mechanisms. In this work we propose the use of a neural network spike sorting feature map that is capable of resolving spike overlap directly in the feature space. In this way, the clustering bias due to spike overlap is overcome. Through the use of this specialized feature map, there is no need for matched filter post-processors, nor for iterative fitting techniques to resolve spike overlap, such that a minimal processing pipeline is delivered that is capable of resolving spike overlap.

In section 2 we formalize the classical spike sorting process and show how overlapping spikes hamper the sorting performance. Then, the domain-specific feature map that is designed for resolving spike overlap is presented in section 3. In section 4 we explore the design space of the proposed feature map and quantify the feature map spike sorting performance on realistic ground truth data. Finally, we conclude the presented work in section 5.

2. Spike sorting and overlapping spikes

Consider x[k] ∈ R^N to be the high-pass filtered extracellular recording at time sample k as measured simultaneously on N recording electrodes. A single neuronal spike contributes to several consecutive time samples of the extracellular recording when using typical sampling frequencies (25–30 kHz). To facilitate the study of the neuronal spikes, we define the stacked vector x̄[k] = [x[k − L + 1]^T … x[k − 1]^T x[k]^T]^T ∈ R^{NL} to represent a spatio-temporal window on the extracellular recording of N channels by L samples. The bar notation ( ̄ ) is also used in combination with other symbols to indicate a similar delay-line extension on the data that is represented by that symbol.

A spike sorting algorithm takes such an extracellular recording x[k] at its input to output the sample times at which the individual neurons embedded in the recording generate spikes. The output thus consists of a set collection {Ŝ_n ∀n}, where an individual set Ŝ_n contains the spike time estimates (i.e. the spike train) of the sortable neuron n. The different spike waveforms as generated by a specific neuron are very similar throughout time, which is why clustering techniques are commonly used in spike sorting. A typical clustering approach for transforming the input recording into the spike sorting output consists of the following sequential processing steps [8]:
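The four-step framework above can be sketched in a few lines of Python; the threshold rule, window length and function names below are illustrative stand-ins, not the implementation evaluated in this paper.

```python
import numpy as np

def detect(x, thresh):
    """Step (1) -- amplitude-threshold detection of negative-going spikes."""
    idx = np.flatnonzero(x < -thresh)
    # keep only the first sample of each threshold crossing
    return idx[np.insert(np.diff(idx) > 1, 0, True)]

def align(x, times, L=32):
    """Step (2) -- re-centre each spike time on its local minimum
    (assumes every t lies at least L//2 samples from the signal edges)."""
    half = L // 2
    return np.array([t - half + int(np.argmin(x[t - half:t + half]))
                     for t in times])

def extract(x, times, L=32, M=2):
    """Step (3) -- linear PCA feature map f: R^L -> R^M on aligned snippets."""
    S = np.stack([x[t - L // 2:t + L // 2] for t in times])
    S = S - S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S, full_matrices=False)
    return S @ Vt[:M].T  # spike embeddings

# Step (4), clustering, would operate on the returned embeddings.
```

On a multi-channel probe recording, the snippets would be spatio-temporal (NL-dimensional) rather than the single-channel windows used here for brevity.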


(a) detection: Extract a set of unsorted spike times U from the extracellular recording. Although several detection approaches are in use, usually they involve an amplitude thresholding operation that can be optionally preceded by a preemphasis filtering.

(b) alignment: Given the set of unsorted spike times U, every spike time is adjusted such that the corresponding spike waveforms are mutually aligned in their L samples wide temporal window according to some alignment criterion. A popular alignment criterion is to place the local spike minimum in the middle of the temporal window. This processing step results in a set of aligned, yet unsorted spike times U_∥.

(c) feature extraction: Given the aligned spike times, a set of features F is derived from the spike waveforms as follows:

F = { f(x̄[u]) ∀u ∈ U_∥ },

with f : R^{NL} → R^M the feature map to the M-dimensional feature space. A popular choice of feature map for spike sorting is based on a principal component analysis (PCA), which results in a linear feature map [8].

(d) cluster analysis: Finally, a cluster analysis is performed on F, where the spike feature projections, which are also referred to as spike embeddings, of the same neuron are supposed to cluster together, and clusters from different neurons are separable in that feature space. Based on the clustering in F, the original space U_∥ can be partitioned into {Ŝ_n ∀n ∈ N̂, S_unsorted}, where N̂ is the set of sortable neurons which typically results from a manual curation of the clustering results and S_unsorted is a 'garbage' set which contains all multi-neuron and noise clusters that could not be assigned to a single neuron.

Ideally, S_unsorted is empty. However, in practice, S_unsorted usually contains a significant fraction of the detected spike times. A well-known problem that hampers the spike sorting process is the occurrence of spike overlap. To illustrate this problem consider the following model for the extracellular recording:

x[k] = Σ_{n∈N} Σ_{t∈S_n} α_{n,t} s_n[k − t] + n[k],    (1)

where s_n is the finite-support prototypical spike waveform of neuron n, also referred to as the spike template of neuron n. The spike template s_n is assumed to have W_n non-zero elements centered around k = 0. Furthermore, α_{n,t} is a template scaling factor to model amplitude variations and n is defined to be an additive noise component that accounts for both the neural and electrical noise. We say spike overlap occurs between a neuron i and j if ∃ t_i, t_j ∈ S_i × S_j : |t_i − t_j| < (W_i + W_j)/2, i.e. their respective spike templates overlap and are superimposed in x.

Figure 1. Three clusters are distinguishable in the feature space. Both the red and blue clusters consist of single-neuron spike projections. The yellow cluster consists of projected overlapping spikes that are composed of aligned single-neuron spikes of both the red and blue cluster. For the case of perfectly aligned overlapping spikes and a linear feature map, spike overlap can be resolved by establishing relationships between clusters through the vector addition of cluster centroids that potentially map on overlapping spikes clusters.

Consider now a recording sample in which the spiking activity of two neurons i and j is present. For the sake of intelligibility, let us first consider a model of spike overlap where |t_i − t_j| = 0 for the spike times at which overlap occurs, i.e. all overlapping spikes consist of the sum of perfectly aligned single-neuron templates. For this scenario, a linear feature map f (e.g. the commonly used PCA feature map) has the following interesting linearity property:

f(s̄_i + s̄_j) = f(s̄_i) + f(s̄_j),    (2)

where s̄_i and s̄_j ∈ R^{NL} are the mutually aligned spatio-temporal spike templates of neuron i and j, respectively. This property enables the resolution of these overlapping spikes in the feature space: if the sum of two cluster centroids maps onto the centroid of a third cluster, this third cluster is believed to contain the overlapping spikes that are composed of the sum of the aligned templates associated with the two former clusters. The overlap resolution property is illustrated in figure 1.

Unfortunately, the assumption of perfect spike alignment is of little practical use. Although neurons might exist for which the alignment assumption holds, e.g. when such neurons are coupled through gap junctions, general spike overlap is believed to occur by chance because two or more neurons happen to fire near-simultaneously in time. When such random overlap happens for two neurons i and j, t_i − t_j results from a uniform random distribution.
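The linearity property (2) and the centroid-addition rule illustrated in figure 1 can be checked numerically; in this sketch a random projection stands in for the linear feature map f, and the templates and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
NL, M = 64, 2
W = rng.standard_normal((NL, M))
f = lambda x: x @ W  # any linear feature map, e.g. a PCA projection

s_i = rng.standard_normal(NL)  # aligned spatio-temporal template, neuron i
s_j = rng.standard_normal(NL)  # aligned spatio-temporal template, neuron j

# property (2): f(s_i + s_j) equals f(s_i) + f(s_j) up to rounding
lin_gap = np.abs(f(s_i + s_j) - (f(s_i) + f(s_j))).max()

# figure 1: with noisy but perfectly aligned spikes, the sum of the two
# single-neuron centroids lands on the overlapping-spikes centroid
noise = lambda n: 0.01 * rng.standard_normal((n, NL))
C_i = f(s_i + noise(200)).mean(axis=0)
C_j = f(s_j + noise(200)).mean(axis=0)
C_ij = f(s_i + s_j + noise(200)).mean(axis=0)
centroid_gap = np.linalg.norm(C_i + C_j - C_ij)
```

The centroid gap shrinks with the noise level and the cluster size, which is what makes the vector-addition test of figure 1 usable in practice for linear maps.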


To concisely represent such randomly aligned spike overlap we introduce the following notation:

s̄_i[k − t_i] + s̄_j[k − t_j] = s̄_i ⊕ s̄_j,    (3)

where ⊕ denotes the random-shift-and-sum operator. The random time shift between spikes introduces a non-linearity that leads to the smearing of the overlapping spikes cluster when using a linear feature map, i.e. f(s̄_i ⊕ s̄_j) ≠ f(s̄_i) + f(s̄_j). Such cluster smearing is shown in figure 2. This smearing prevents a reliable recovery of the clusters when compared to figure 1. Without well-defined clusters, the resolution of overlapping spikes in the feature space becomes infeasible. Therefore, in the remainder of this work, we will investigate the use of a non-linear feature map g that is intended to behave as if it were a linear function applied to a linear combination:

g(s̄_i ⊕ s̄_j) = g(s̄_i) + g(s̄_j),    (4)

such that the feature embedding has similar properties as the embedding presented in figure 1, but now also in case of overlapping spikes with random alignment.

Figure 2. The red and blue clusters consist of single-neuron spike projections. The projections of overlapping spikes with random alignment are shown in yellow and are smeared out in feature space, which prevents the overlapping spikes cluster from being recovered. Note that the colours in this figure are assigned based on ground-truth information and not based on an actual clustering analysis.

3. Overlap resolving feature map design

The design of the feature map is approached as a supervised machine learning problem. Because spike sorting is inherently unsupervised, the feature map is envisioned to be applied in a train once, apply 'anywhere' fashion. However, the meaning of anywhere is limited to the usage of a fixed probe geometry and spatio-temporal window, as will become apparent from the details in this section.

Given its supervised nature, the design of the feature map can be broken down into the following components:

• neural network architecture selection,
• overlap resolving cost function design,
• training data generation and augmentation process, and
• training procedure.

In the following sections we will go into more detail on each of these design components.

3.1. Neural network architecture selection
In this work we focus on the use of a multi-layer perceptron (MLP) neural network architecture for the non-linear feature map g : R^{NL} → R^M. Although other function families exist, the MLP with a single hidden layer is known to be a universal function approximator [31], meaning that it can fit any continuous function up to an error that can be made arbitrarily small by increasing the dimensionality of the hidden layer. By incorporating additional hidden layers in the network, the total number of artificial neurons that are needed to achieve a certain error bound can be drastically reduced [32].

The deep neural network architecture that is used in this work is shown in figure 3. The neural network takes an NL-dimensional vector as input. The network contains three fully-connected hidden layers with 2NL, NL and NL/5 hidden artificial neurons, respectively. Because of the fixed network input dimensionality, we assume that the sampling frequency of the neural recordings is constant throughout both the training and actual spike sorting phases. The hidden neurons use rectified linear unit (ReLU) activation functions [33]. The output consists of M artificial neurons with linear activation, where M is the desired dimension of the feature space. For training regularization purposes we have included a dropout layer [34] before every neuron layer. To complement the dropout layer, the weights incident to each artificial neuron are constrained to not exceed a specified maximum norm, as also proposed in [34]. The output of the hidden layers is subjected to an optional batch normalization [35]. The middle hidden layer is marked optional as well. Here, by optional we mean that whether or not the optional feature is used is considered a hyperparameter. The hyperparameter exploration will be discussed in section 4.

3.2. Cost function
The model selection, as presented in the previous section, only results in an architectural skeleton. In order to apply the neural network g to recording data, the appropriate function coefficients have to be chosen, leading to a concrete realization of g.
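An inference-time sketch of the MLP feature map of figure 3; the hidden widths follow the 2NL, NL, NL/5 description, while the output dimension M used here is an arbitrary example value. Dropout, the max-norm weight constraint and batch normalization only act during training and are therefore omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_mlp(rng, dims):
    """dims = [NL, 2*NL, NL, NL//5, M]; returns a list of (W, b) layers."""
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def g(params, x):
    """Feature map g: R^NL -> R^M with ReLU hidden layers, linear output."""
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return x @ W + b  # linear output layer

N, L, M = 10, 32, 6  # M is an illustrative feature-space dimension
NL = N * L
rng = np.random.default_rng(0)
params = init_mlp(rng, [NL, 2 * NL, NL, NL // 5, M])
```

The second (NL-wide) hidden layer can simply be dropped from `dims` to mimic the "optional middle layer" hyperparameter.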

4
J. Neural Eng. 18 (2021) 0460a7 J Wouters et al

Figure 3. Schematic representation of the fully connected MLP neural network architecture as used in this work. The network
accepts an NL-dimensional vector at its input (red). This input vector is transformed to an M-dimensional output vector (blue)
via a chained projection onto three consecutive hidden layers, prior to a projection onto the output neurons. The dimensionality
of the hidden layers ordered from input to output is 2NL, NL and NL/5, respectively. Both the hidden layers and the output layer
contain a dropout layer (yellow) for training regularization purposes. The hidden neurons make use of a non-linear rectifier
activation function (grey). The use of batch normalization (green) is explored in this work. The second hidden layer is marked as
optional because we will explore networks both with and without this layer.

In a typical supervised learning task, the coefficients of the network are chosen to enable classification or regression. To obtain such behaviour, the class labels or dependent variable values y_k are required during training for all k ∈ T, with T the set of training samples. A realization of ĝ is then obtained by tuning the coefficients until the (local) minimum of a certain cost function is obtained:

ĝ = arg min_g Σ_{k∈T} L(g(x̄_k), y_k),    (5)

where L is a training sample loss function, e.g. a least squares criterion.

Irrespective of the choice of loss, the classical approach to supervised learning with explicit training labels y_k is not suitable for use in spike sorting because of its inherent unsupervised nature. Therefore, we propose the use of a cost function where the training labels are used implicitly, i.e. the network is not forced to produce a specific output, rather it is instructed to learn an output representation which behaves according to that cost function. The cost function that is proposed in this work consists of four distinct terms that each have a specific intended purpose. In what follows, the different terms will be introduced one-by-one.

The loss function that is associated with the first term of the cost acts on three spatio-temporal spike waveforms and is given by:

L1(x̄_i, x̄_j, x̄_{i⊕j}) = ‖g(x̄_i) + g(x̄_j) − g(x̄_{i⊕j})‖₂²,    (6)

where, with a slight abuse of notation, x̄_i = s̄_i + n̄ and x̄_j = s̄_j + n̄ are training sample spikes of neurons i and j, respectively, each with their own random noise component, and x̄_{i⊕j} = s̄_i ⊕ s̄_j + n̄ is a training sample representing two overlapping spikes involving neurons i and j. Minimizing L1 over g has the potential to result in a feature map g that behaves linearly for non-aligned overlapping spikes as described by (4).

Minimizing only L1 will result in the trivial solution g(x̄) = 0 ∀x̄. To obtain a non-trivial feature map g, which is suitable for clustering, L1 is combined with a second loss term L2:

L2(x̄_i, x̄_j) = exp(−‖g(x̄_i) − g(x̄_j)‖₂²).    (7)

When minimizing L2 over g, it will favour realizations of g that result in a large distance in the feature space between a pair of spikes from different neurons. As such, this term is intended to maximize the between-cluster distance.
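Under the definitions above, the first two loss terms translate directly into code; `g` is any feature map callable, and the function names and test vectors are illustrative rather than the authors' implementation.

```python
import numpy as np

def loss1(g, x_i, x_j, x_ij):
    """Eq. (6): squared deviation from additive behaviour on an overlap."""
    r = g(x_i) + g(x_j) - g(x_ij)
    return float(r @ r)

def loss2(g, x_i, x_j):
    """Eq. (7): small when spikes of different neurons map far apart."""
    d = g(x_i) - g(x_j)
    return float(np.exp(-(d @ d)))
```

For a perfectly linear g and an exactly additive overlap, loss1 vanishes, while loss2 decays towards zero as the between-cluster distance grows.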


To promote the clustering of spikes from the same neuron, L3 is defined as:

L3(x̄_i, x̄_i′) = ‖g(x̄_i) − g(x̄_i′)‖₂²,    (8)

where x̄_i and x̄_i′ are distinct spike samples from the same neuron. As such, this term serves to minimize the within-cluster distance for the given neuron.

The compound cost function used in this work is composed from the above three loss functions as follows:

Σ_T [ L1(x̄_i, x̄_j, x̄_{i⊕j}) + κ₁ L2(x̄_i, x̄_j) + κ₂ (L3(x̄_i, x̄_i^σ) + L3(x̄_j, x̄_j^σ)) + κ₃ (L3(x̄_i, x̄_i^δ) + L3(x̄_j, x̄_j^δ)) ],    (9)

where the sum operator Σ acts on the training set T. A single entry of the training set is a collection of training example spikes {x̄_i, x̄_j, x̄_{i⊕j}, x̄_i^σ, x̄_j^σ, x̄_i^δ, x̄_j^δ} from a pair of training neurons (i, j) with i ≠ j. Note that for a given pair of neurons, multiple distinct entries can be present in the training set. In this work the neuron identities, e.g. i and j, serve as implicit labels, i.e. the cost function describes how the feature map should behave for spikes of different neurons, but it does not push the output to a predefined value. x̄^σ represents an amplitude scaled version of x̄. As such, the terms that involve x̄^σ (i.e. the terms weighted with κ₂) serve to penalize spike amplitude scaling variance in the desired feature space. x̄^δ represents a spike snippet that is superimposed with a spike waveform of a different neuron that reaches its maximum energy on another channel. As discussed in the following section, neurons that reach their maximum spike energy on a different channel are to be treated in separate clustering problems (see divide-and-conquer clustering). Therefore, the superimposed spike waveform is to be treated as physiological noise, and feature robustness against such noise is desirable. More details on the specifics for x̄^σ and x̄^δ are given in the next section. κ₁, κ₂ and κ₃ are hyperparameters that allow for an explicit trade-off between (a) the linear behaviour within the feature space⁵, (b) the between-cluster distance and (c) the within-cluster distance. Note that L1 does not have an explicit hyperparameter because its importance can be controlled relative to the other terms through κ₁, κ₂ and κ₃.

Figure 4. Flowchart that contains the different steps that are involved in the generation of training data.

3.3. Training data
Now that we have defined the structure of g, as well as a criterion to optimally choose the coefficients of g, we need to construct the actual training set T to perform the training. Figure 4 depicts the different steps that are involved in the training data generation process. These steps will be discussed in detail in the remainder of this section.

In this work we make use of a data augmentation approach for the generation of the training data. First, 634 spike templates s̄_n ∈ R^{N2L} from 634 distinct biophysically detailed neuron models [36] are simulated using MEArec [37]. Note that for these templates a temporal window of 2L is used rather than L. This is done to make sure that the minimum of an overlapping spikes segment with |s_i − s_j| ≠ 0 can be temporally aligned within an L-window prior to computing the feature map projection. In this work, the probe geometry is equivalent to a single column of a neuropixel probe [2]. A single-column probe design is chosen here to aid the training data generation process in terms of its spatio-temporal alignment procedure (see below)⁶. The simulated sampling frequency is chosen to be 32 kHz. The temporal window length is chosen to be L = 32, which corresponds to 1 ms. The minimum of every s̄_n is positioned at the same sample index through a spatio-temporal alignment procedure. The spatial alignment results in all templates reaching their maximum on the same channel. This

⁵ Linearity within the feature space does not imply a linear behaviour between the input space and the feature space. Indeed, an amplitude scaling of a spike will not result in an equal scaling of its feature vector, which is even discouraged due to the κ₃ term minimizing the within-cluster distance.

⁶ Although the proposed feature map design is deemed to be probe-dependent, the feature map design approach is not limited to the probe geometry used in this work, i.e. the generation of training data can be re-done when a different probe geometry is used.


might sound limiting, but we envision the feature map to be used in a divide-and-conquer type of spike sorting [38]. In a divide-and-conquer scheme, the clustering analysis is performed for every electrode separately. During this clustering, only spatio-temporal snippets are considered that reach their minimum on the same 'dominant' electrode. In such a case N is often taken to be the number of electrodes that make up a local neighbourhood surrounding the dominant electrode. In this work N = 10 is used. A trained feature map can be re-used for all local neighbourhoods that have a similar spatial configuration. Such a divide-and-conquer approach has the advantage that it scales linearly with an increasing number of electrodes and is an easy subject for parallelism. One third of the simulated spike templates is randomly selected and excluded from the further steps in the augmentation process. This set of excluded templates is reserved for testing in section 4.2. We will refer to this set as S_test. The other two thirds, i.e. a total of 423 templates, are further processed to create the augmented training data. We will refer to this set of templates as S_training. Example simulated templates are shown in figure 5. Note that these templates will not act as training samples themselves, but instead will form the basis for a synthetic model to generate training data.

A PCA is performed on the set of training templates. This requires the computation of the sample covariance matrix over the training templates:

R_ss = (1/|S_training|) Σ_{s̄_n ∈ S_training} (s̄_n − ⟨S_training⟩)(s̄_n − ⟨S_training⟩)^T,    (10)

where |S_training| indicates the cardinality of S_training and ⟨S_training⟩ is the mean training template. The principal components are then computed through the eigendecomposition of R_ss = UΣU^T, where the columns of U contain the normalized eigenvectors and Σ is a diagonal matrix with the eigenvalues on the main diagonal. Without loss of generality, we assume that the elements of Σ are sorted in descending order. We then select the first 30 columns of U to construct a principal components subspace U_{1:30} ∈ R^{N2L×30} that accounts for approximately 99% of the training templates' energy.

Figure 5. Sixteen example simulated spatio-temporal spike templates are shown. The waveform as impinging on a specific channel is given a unique colour for improved readability. The plotting distance between the channels is constant.

Consider v_n = U_{1:30}^T (s̄_n − ⟨S_training⟩) to be the principal component projections of a mean-corrected training template. Let v_n^k be the projection of s̄_n onto the kth principal component. For each k = 1 : 30, we estimate the probability density function (PDF) P_k(v^k) across all samples v_n^k for n = 1, …, |S_training|. We use a non-parametric kernel-density PDF estimation method with a Gaussian kernel [39] and a bandwidth factor of 0.1. The bandwidth factor was chosen based on visual inspection. Figure 6 shows a visual representation of the estimated PDFs for the first four principal components, which are shown on top of the normalized projection histograms. A similar quality of fit was seen for the other principal components as well. Note that we model the PDF for each k separately instead of a joint PDF, to avoid the curse of dimensionality, which is supported by the uncorrelatedness of the entries in v due to the PCA procedure. Sampling each of the thirty distributions consecutively leads to an observed random mean-corrected training template in the principal component subspace. By projecting this observation back to the original space and re-adding the mean template, a new training template can be generated:

ŝ̄ = U_{1:30} v + ⟨S_training⟩,    (11)

where the kth entry of v is sampled from P_k. Example augmented spike templates are shown in figure 7. This figure also contains a non-spatially aligned example. Such non-aligned examples are removed from the set of augmented training templates. In this work a set of 25 000 random augmented test templates is generated.

J. Neural Eng. 18 (2021) 0460a7 J Wouters et al

Figure 6. Normalized projection histograms are shown in blue for the first four principal components of the set of ‘training
templates’ Straining . The projection histograms are constructed by projecting all entries of Straining onto the respective principal
component. For every histogram, the respective continuous probability density estimate is shown in red.
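The augmentation pipeline of equations (10) and (11) (PCA of the training templates, per-component kernel-density estimates with a bandwidth factor of 0.1, sampling, and back-projection) can be sketched as follows. This is a minimal sketch: the toy template matrix and its dimensions are assumptions for illustration, not the paper's actual data.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical stand-in for S_training: 200 spatio-temporal templates,
# each flattened to a single vector (e.g. N channels x window length).
S = rng.standard_normal((200, 120))

mean_template = S.mean(axis=0)
Sc = S - mean_template                      # mean-corrected templates

# Principal components via SVD of the centred data matrix; this is
# equivalent to the eigendecomposition of the sample covariance R_ss,
# with components ordered by decreasing energy.
U, _, _ = np.linalg.svd(Sc.T, full_matrices=False)
U_k = U[:, :30]                             # subspace U_{1:30}

V = Sc @ U_k                                # projections v_n, one row per template

# One Gaussian KDE per principal component (bandwidth factor 0.1).
kdes = [gaussian_kde(V[:, k], bw_method=0.1) for k in range(30)]

def augment_template():
    """Draw each entry of v from its own KDE, then apply equation (11)."""
    v = np.array([kde.resample(1)[0, 0] for kde in kdes])
    return U_k @ v + mean_template

new_template = augment_template()           # one augmented training template
```

Modelling each component with its own one-dimensional KDE (rather than a joint 30-dimensional density) mirrors the text's argument: the PCA projections are uncorrelated, so per-component sampling avoids the curse of dimensionality.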

Given two distinct augmented training templates $\hat{\bar{s}}_i$ and $\hat{\bar{s}}_j$, a training set entry can be generated from the following equations:

$$\bar{x}_i = r_L\{\hat{\bar{s}}_i\} + \bar{n} \qquad (12)$$
$$\bar{x}_j = r_L\{\hat{\bar{s}}_j\} + \bar{n} \qquad (13)$$
$$\bar{x}_{i \oplus j} = r_L\{\hat{\bar{s}}_i \oplus \hat{\bar{s}}_j\} + \bar{n} \qquad (14)$$
$$\bar{x}_i^{\alpha} = r_L\{\alpha_i \hat{\bar{s}}_i\} + \bar{n} \qquad (15)$$
$$\bar{x}_j^{\alpha} = r_L\{\alpha_j \hat{\bar{s}}_j\} + \bar{n} \qquad (16)$$
$$\bar{x}_i^{\delta} = r_L\{\hat{\bar{s}}_i \oplus \hat{\bar{s}}_k^{\delta}\} + \bar{n} \qquad (17)$$
$$\bar{x}_j^{\delta} = r_L\{\hat{\bar{s}}_j \oplus \hat{\bar{s}}_k^{\delta}\} + \bar{n}, \qquad (18)$$

with $r_L$ the temporal realignment-and-truncate operator that results in a temporally aligned spike waveform with a temporal window size of $L$ samples. The alignment procedure used in this work consists of placing the minimum value of the (compound) spike waveform at the centre of the temporal window. In this work the random relative shift between two spike waveforms is sampled from a uniform distribution that is bounded between −10 and 10. With a slight abuse of notation, let all $\bar{n}$ be random Gaussian spatio-temporal white noise with an average SNR of 30 dB (see section 4.2). $\alpha_i$ is a random scaling factor associated with neuron $i$ that is sampled from a uniform distribution that is bounded between 0.8 and 1.2. $\hat{\bar{s}}_k^{\delta}$ is a spike waveform that reaches its maximum energy on a different channel than $\hat{\bar{s}}_i$ and $\hat{\bar{s}}_j$. As such, $\hat{\bar{s}}_k^{\delta}$ is included to model overlap from neurons that reach their minimum spike trough on a different channel. Note that such non-spatially aligned overlapping spikes have to be treated as a source of noise in a divide-and-conquer clustering setting. By randomly sampling (with replacement) $\hat{\bar{s}}_i$ and $\hat{\bar{s}}_j$ with $i \neq j$ from the 25 000 augmented templates, a total of 1 000 000 training set entries {(12)–(18)} are generated to form the training set $\mathcal{T}$.

3.4. Training procedure
Given the model structure, cost function and training data, we can optimize the model to minimize the cost over the training data. To this end,


we make use of Adam [40], which is a stochastic gradient descent algorithm that makes use of parameter-specific adaptive learning. Adam is operated with its default parameter settings as implemented in TensorFlow [41]. We perform the training in random batches of 1000 training set entries per iteration.

To prevent overfitting during training we apply dropout regularization, as already mentioned in section 3.1. The dropout fraction, i.e. the fraction of random connections that are removed from the network during every training step, is treated as a hyperparameter and will be discussed in the next section. To further improve the training immunity with respect to overfitting, we make use of early stopping [42] to terminate the training iteration. If the cost evaluated on a validation set does not decrease for five consecutive optimization passes over the entire training set, the training is terminated. The validation set $\mathcal{V}$ consists of a random 30% sample of the training data set $\mathcal{T}$. The final model that is retained is the model with the lowest validation cost.

4. Experiments

4.1. Hyper parameter exploration
The model as presented in section 3.1 depends on four continuous hyper parameters: $\kappa_1$, $\kappa_2$, $\kappa_3$, and the dropout fraction. For each of the four continuous hyper parameters we explore three predefined values: $\kappa_1 \in \{1, 10, 100\}$, $\kappa_2 \in \{0.1, 1, 10\}$, $\kappa_3 \in \{0.1, 1, 10\}$ and dropout fraction $\in \{0.2, 0.4, 0.6\}$. Note that the values of the different $\kappa$ are relative to each other and to the first term of (9). Only three values are considered for every parameter to limit the dimensionality of the exploration space. Furthermore, we consider the usage of two optional features as additional binary hyper parameters: the use of batch normalization and the use of a third hidden layer. Training a feature map for all possible combinations of the described hyper parameter settings results in a total of 324 models.

Because the cost function is directly modulated by the choice of $\kappa$, the cost function value cannot be used to assess the spike sorting performance of the different models. Furthermore, note that the training is example-based, i.e. the training is performed over training set entries that each consist of only seven spike waveform examples at a time. In this section we use a higher number of spike waveforms for each pair of neurons $(i, j)$ to enable a proper clustering analysis. For every pair of neurons, a set $U_{(i,j)}$ of 300 spike snippets is generated:

• 100 samples of $\bar{x}_i^{\alpha}$,
• 100 samples of $\bar{x}_j^{\alpha}$, and,
• 100 samples of $\bar{x}_{i \oplus j}^{\alpha}$, which is defined as an overlapping spikes snippet where each of the individual spike templates is randomly scaled prior to the application of the $\oplus$-operator.

For each spike snippet there is a 5% chance that a random non-spatially aligned spike is superimposed. The neuronal spike templates from which the snippets are derived are templates from the validation set $\mathcal{V}$ that is used for the early stopping regularization⁷. Given these 300 spikes, a K-means clustering is performed with K = 3.

To quantify the clustering performance, we compute two additional metrics: the adjusted Rand index [43] and the centre prediction error (CPE). The adjusted Rand index is a widely used metric to assess the clustering performance. The CPE $e_{(i,j)}$ between neurons $i$ and $j$ is defined here to be a measure of

Figure 7. Sixteen example augmented spike templates generated from (11). The waveform as impinging on a specific channel is given a unique colour for improved readability. The plotting distance between the channels is constant. For illustrative purposes, an augmented spike template that reaches its maximum energy on a different channel than intended (and which is excluded from the further analysis) is shown in grey.

⁷ The fact that these templates have been used for training purposes is not an issue for the model selection, because the model is not directly optimized for the model selection metrics that are used here.


Figure 8. The six plots in this figure show the average Rand index (x-axis) and centre prediction error (y-axis) over the validation pairs for all feature maps in the exploration experiment. The lowest, right-most point corresponds to the best performance. Each plot contains the same values, but has a unique colour coding that is informed by one of the hyper parameters as indicated in the respective legends.

the targeted ‘linear behaviour’ of the feature map as shown in (4):

$$e_{(i,j)} = \frac{\left\| c_{i \oplus j} - (c_i + c_j) \right\|}{\sqrt{\frac{1}{|U_{(i,j)}|} \sum_{\bar{x}_k \in U_{(i,j)}} \left\| g(\bar{x}_k) - c_{\bar{x}_k} \right\|^2}}, \qquad (19)$$

where the different $c$ indicate the cluster centres. $c_{\bar{x}_k}$ denotes the cluster centre from the cluster to which a certain spike waveform $\bar{x}_k$ from $U_{(i,j)}$ is assigned. The denominator of $e_{(i,j)}$ is a measure of the overall within-cluster spread that is related to the root mean square error.

Given the clustering results for a pair of neurons $(i, j)$, both the Rand index and CPE can be computed for that pair. This procedure is repeated for a total of 100 validation pairs, resulting in an average Rand index and CPE for every feature map. The average metrics for every feature map resulting from this hyper parameter exploration experiment are shown in figure 8.

As a first feature map selection criterion, we choose to minimize the CPE because we highly value the resolution of overlapping spikes in this work. Fortunately, from our experimental data (see figure 8) there seems to be a strong negative correlation between the CPE and Rand index, such that feature map selection based on the CPE also leads to excellent clustering behaviour.

From figure 8 we can also obtain an insight into the effects of the different parameters on the spike sorting performance. For $\kappa_1$ we see that its actual value does not have a major impact on the spike sorting performance, both in terms of Rand index and


CPE. For $\kappa_2$ we see that this parameter affects the CPE. This dependency is likely due to the normalization with respect to the within-cluster spread in the computation of the CPE. The effect of $\kappa_3$ is most pronounced for the clustering performance, i.e. focusing too much on robustness against non-spatially aligned overlap will deteriorate the clustering performance. For the use of batch normalization, the optional layer and the value of the dropout fraction, we could not see any clear effect on the spike sorting performance. The minimum CPE model that we finally selected for further analysis and comparison has an average CPE of 0.300 and an average Rand index of 0.971 for the validation data. The hyper parameters associated with this model are: $\kappa_1 = 1$, $\kappa_2 = 0.1$, $\kappa_3 = 0.1$, batch normalization is not used, the optional layer is used (resulting in a so-called deep neural network [32]) and a dropout fraction of 0.2 is used during training. It is noted that the model with the maximum Rand index only reached a slightly higher Rand index of 0.975 but had a more than double CPE of 0.660.

4.2. Sorting overlapping spikes benchmark
Given the final feature map from the previous section, we can benchmark our method against the state-of-the-art for overlap resolution. The method that we have presented here is unique, in that it is the only method to our knowledge that has the capability of resolving spike overlap directly in the feature space without the need for first extracting single-unit templates from the recording. Given that there is no feature space benchmark available that we could compare against, we resort to a state-of-the-art matched filtering technique for resolving spike overlap. The matched filter post-processors that we compare against are signal-to-peak-interference ratio (SPIR) optimal linear filters [23]. The desired SPIR was set to 20 dB and a subspace regularization [44] was applied, which accounts for 90% of the signal power. Finally, a fixed detection threshold of 25% of the template response output power (which is known by design) was used on the SPIR-optimal filter output to perform the final spike detection/classification. In this section, we also report the spike sorting accuracy for a clustering-based approach (also using K-means with K = 3 for every pair of test neurons) which uses PCA features. The features are computed by projecting the data on three principal components. By comparing our method against a PCA clustering-based approach, we can assess the loss in accuracy when not accounting for overlapping spikes.

It is important to realize that the presented feature map approach and the SPIR-optimal benchmark are very different, and this difference should be taken into consideration when comparing the sorting accuracy of the different methods. Use of the presented feature map approach is ultimately an unsupervised clustering approach, whereas the use of template matching post-processors is a binary PU-learning classification problem. This means that the feature map approach is uninformed about the spike waveforms of the neurons under study during the feature map training, whereas the template matching approach depends on the availability of the spike waveform for the matched filter design. Therefore, we do not expect the feature space approach to perform better than template matching approaches due to the fundamentally different prior knowledge assumptions. At best, our presented feature map approach would result in a similar performance, but using a simplified processing pipeline. Furthermore, in this section, we will make use of other accuracy metrics, as compared to the previous section, that are applicable to both approaches and that are also more commonly used for spike sorting validation, i.e. precision and recall. Precision is equal to the fraction of true positive detections over the total number of detections, whereas recall is the fraction of true positive detections over the total number of ground truth positives.

The test data that is used in this section consists again of 100 pairs of neurons, but here we make use of the biophysically simulated test templates in $S_{\text{test}}$. Note that these templates were obtained from neuronal models that were kept out of the data augmentation process for the generation of random training and validation templates. Furthermore, rather than using spatio-temporal white noise as during the training and hyper parameter exploration, here we embed test spikes in real neural recordings [45], which were acquired from a recording device with the same probe geometry as was used for the simulations. The injection of the biophysically simulated spikes into this recording allows us to quantify the spike sorting performance since the ground-truth spikes/clusters are known. For every pair we generate 300 spike snippets as in the previous section, but we do not add artificial non-spatially aligned spikes, because the recording already contains such interfering spikes.

In figure 9, the spike embeddings for a pair of test neurons injected in a real recording are shown. From this figure, it is clear that the different clusters are largely separable. Note that the colour coding is based on ground-truth labels and not on the actual clustering results. Through the vector addition of the cluster centres from the single-neuron clusters, an estimate of the centre of the overlapping spikes cluster can be obtained. The estimated centre is indicated in figure 9 by a black cross and it is shown to be close to the actual cluster centre of the overlapping spikes cluster. The cluster centres that are used here are obtained directly from the clustering analysis (i.e. they are not based on the ground truth labels). From this example, it is clear that the feature map generalizes in terms of spike sorting and overlap resolving capabilities for this test pair.

Figure 10 summarizes the spike sorting performance for the proposed feature map approach, the SPIR-optimal filtering approach and the PCA-based approach for all test pairs. The feature map approach has


Figure 9. Three orthogonal projections of the three dimensional feature space are shown. The spike projections are colour coded
based on the ground truth labels: red for spikes of neuron one, blue for spikes of neuron two and yellow for overlapping spikes of
neuron one and two. The green dot represents the cluster centre of the overlapping spikes cluster. The vector addition of the
cluster centres of the red and blue clusters, which is indicated by a black cross, is shown to map closely to the cluster centre of the
overlapping spikes cluster.
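The centre prediction error of equation (19) can be computed directly from the K-means output. This sketch uses synthetic three-dimensional embeddings in place of the actual feature-map outputs $g(\bar{x})$ (i.e. the rows of `emb` are assumed to already live in the feature space); the cluster geometry is a hypothetical example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Synthetic feature-space points for neuron i, neuron j and their overlap.
c_i, c_j = np.array([3.0, 0.0, 0.0]), np.array([0.0, 3.0, 0.0])
emb = np.vstack([c + 0.2 * rng.standard_normal((100, 3))
                 for c in (c_i, c_j, c_i + c_j)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
centres = km.cluster_centers_

# Pick the centre best explained as the sum of the other two: this is the
# candidate overlap cluster, and its residual is the numerator of (19).
k_overlap, numerator = min(
    ((k, np.linalg.norm(centres[k] - (centres[a] + centres[b])))
     for k, a, b in [(0, 1, 2), (1, 0, 2), (2, 0, 1)]),
    key=lambda t: t[1])

# Denominator of (19): RMS within-cluster distance over all snippets.
spread = np.sqrt(np.mean(np.sum((emb - centres[km.labels_]) ** 2, axis=1)))
cpe = numerator / spread
print(f"CPE e(i,j) = {cpe:.3f}")
```

A small CPE indicates that the overlap-cluster centre is well predicted by the vector sum of the single-neuron centres, relative to the within-cluster spread.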

Figure 10. The precision and recall boxplots are shown for the proposed MLP feature map approach (red), the SPIR-optimal filter approach (blue) and a principal component feature map approach (yellow). The median values for the different metrics are depicted in orange. Outliers are shown as grey dots. For the MLP and principal component feature map approaches, we could only express the sorting performance in terms of precision and recall for 94% and 90% of the test pairs, respectively (see text).
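Precision and recall, as defined in section 4.2, can be sketched as follows. The detection lists are hypothetical, and matching is simplified to exact sample indices (real pipelines typically allow a small temporal tolerance when matching detections to ground-truth spike times).

```python
def precision_recall(detected, ground_truth):
    """Precision: TP / all detections; recall: TP / all ground-truth positives."""
    tp = len(set(detected) & set(ground_truth))
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical spike times (sample indices) for one unit.
detected = [100, 250, 400, 575, 900]        # 5 detections, one false positive
truth    = [100, 250, 400, 575, 700, 810]   # 6 true spikes, two missed
p, r = precision_recall(detected, truth)
print(p, r)  # 0.8 0.666...
```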

a median precision of 99% and a median recall of 98.5%, whereas for the SPIR-optimal filter this is 100% for both. For the PCA-based clustering approach a median precision of 93.3% and a median recall of only 50% is noted. From this comparison it is clear that both the proposed approach and SPIR-optimal filter achieve a very good spike sorting performance, although the SPIR-optimal filter approach slightly outperforms the proposed feature map approach. However, our proposed feature map approach is superior to the PCA-based approach in the context of overlapping spikes. As already mentioned, the SPIR-optimal filter approach requires the spike templates to be known. In this work we assume that all spike templates are known, which is very unlikely to be the case in practice. Furthermore, because the feature map approach is unsupervised, it can not benefit from this assumption. Although


the proposed feature map performs worse than the SPIR-optimal filter benchmark, its performance is still at an acceptable level. Note that we are only able to express the clustering performance of 94% and 90% of the test pairs in terms of precision and recall for the feature map approach and the PCA-based approach, respectively. This can be understood from the fact that the K-means clustering used here sometimes groups the single-neuron clusters together into a single cluster, such that this cluster can only be arbitrarily assigned to one of the two neurons. This is a common problem in spike sorting, which implies that no single-unit activity can be extracted for these neurons. This issue can potentially be resolved by using more advanced clustering techniques [46], which is beyond the scope of this paper. As the SPIR-optimal filtering is here computed from the ground-truth templates (instead of the cluster centroids), it does not suffer from this issue. However, in reality these templates should also be extracted from a prior unsupervised clustering [23].

The overall high spike sorting accuracy for both the proposed approach and the SPIR-optimal filtering approach is driven by the high SNR spike trains that we have used for both training and testing. The signal power that is used here to compute the SNR is the peak power of the spike waveforms. The SNRs of the test templates that are used in this section are situated between 25 and 38 dB, with an average of 30 dB. This SNR range is similar to the SNR range used during training. For such high SNR neurons our proposed feature space approach can thus be used to reliably sort and resolve overlapping spikes. Peak SNRs of 20–30 dB do occur in practice and these spike trains are usually easy to detect and can be reliably sorted, as was shown in the previous analysis.

In figure 11 we investigate whether the resulting feature map generalizes to other SNR scenarios. The SNR measures that are reported in the graph are the mean SNRs that correspond to a specific noise level. The 30 dB case corresponds to the case that was presented in figure 10. When further increasing the SNR to an average of 40 dB the spike sorting performance further increases, with a median precision and recall of both 100%. We were able to express the sorting accuracy in terms of precision and recall for 96% of the test pairs. When decreasing the SNR to an average of 20 dB, the median precision and recall decrease to 85.1% and 84.2% respectively. For the 20 dB case we were able to express the spike sorting performance in terms of precision and recall for 88% of the test pairs. Finally, for the 10 dB case the median precision and recall further decrease to both 66.2%. For this final case, we were only able to express the sorting performance for 17% of the test pairs (which also explains the low spread compared to the 20 dB case). We can conclude from this experiment that the resulting feature map does not generalize to the lower SNR scenarios that were unseen during training. A potential retraining with lower SNR spike waveforms could improve the sorting performance for these scenarios.

Figure 11. The precision (red) and recall (blue) are shown for four different mean SNRs: 10, 20, 30 and 40 dB. The median metrics for each scenario are indicated in orange. The depicted measure of spread corresponds to the non-outlier boxplot interval, i.e. $Q_1 - 1.5\,\mathrm{IQR}$ and $Q_3 + 1.5\,\mathrm{IQR}$, where IQR is the inter-quartile range. For each mean SNR the proportion of test pairs for which the sorting accuracy can be expressed in terms of precision and recall is shown between brackets.

5. Discussion and conclusion

We have presented a feature map that is capable of resolving spike overlap in the feature space. Overlapping spikes clusters are identifiable by means of a simple vector addition of pairs of cluster centres. If the sum of the cluster centres from a pair of clusters is sufficiently close to the centre of a third cluster, this third cluster is believed to contain spike snippets that are a superposition of spikes from the pair of neurons corresponding to the pair of clusters. This approach is particularly interesting, because the feature map has to be trained only once. After this initial training, the feature map is intended to be used without any retraining on other recordings that have a similar recording setting. We have validated this usage scenario by showing that the proposed feature map and its overlap resolution capabilities generalize to data that was unseen during training.

In section 3 we have broken down the design of the feature map into its different ingredients: model selection, cost function, training data and training procedure. Although we have shown that our feature map is capable of sorting overlapping spikes to a large extent, our method performed slightly worse compared to the state-of-the-art. Although we do not expect a feature map approach to outperform a state-of-the-art (SPIR-optimal) matched filtering approach due to the fundamental difference in learning paradigm and available prior knowledge, a further and deeper exploration of the design space could further close the gap between the two paradigms.
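The overlap-identification rule described above (a third cluster whose centre is sufficiently close to the vector sum of two other centres) can be sketched as follows. The relative tolerance and the example centre values are assumptions for illustration; the paper itself does not specify a particular threshold.

```python
import numpy as np

def find_overlap_clusters(centres, rel_tol=0.2):
    """Flag cluster k as an overlap cluster when some pair (a, b) satisfies
    ||c_a + c_b - c_k|| <= rel_tol * ||c_k|| (hypothetical threshold rule)."""
    hits = []
    n = len(centres)
    for k in range(n):
        for a in range(n):
            for b in range(a + 1, n):
                if k in (a, b):
                    continue
                if (np.linalg.norm(centres[a] + centres[b] - centres[k])
                        <= rel_tol * np.linalg.norm(centres[k])):
                    hits.append((k, a, b))
    return hits

centres = np.array([[3.0, 0.0], [0.0, 3.0], [3.1, 2.9]])  # third ~ sum of first two
print(find_overlap_clusters(centres))  # [(2, 0, 1)]
```

Because the check only uses cluster centres, it runs after any clustering step and needs no access to the raw waveforms, which is what makes the feature-space approach attractive.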


Unfortunately, the design space is infinitely large, which forces researchers to narrow down the free parameters.

However, in case of future research in this field, the proposed MLP architecture could be further improved, or other neural network families could be considered, e.g. convolutional neural networks. The proposed cost function can also be further tweaked, e.g. to account for overlapping spikes snippets from the activity of three or more neurons. However, support for multiple overlap (as well as other additional cost terms) is likely to trade off with other feature map characteristics, such as cluster separability, which could hamper the sorting performance. Therefore, it is important to assess the need for additional cost term modelling based on how these terms affect the sorting performance, i.e. both with and without the additional term involved. Other cost-function-related improvements could be integrated to account for neuronal bursting and probe drift [47]. A design component in this work that remains largely unexplored is the use of different optimization algorithms and regularization approaches, which might also lead to alternative feature map approaches with improved sorting capabilities.

We believe that the general success of domain-specific learning-based spike detection and feature map building blocks for spike sorting will depend on the availability of data augmentation approaches that generalize to real recordings. In this work we have proposed a simple approach based on simulated templates that was capable of generalizing to realistic recordings that contained simulated test templates embedded in real recording noise. Note that the realistic recording noise was not available during training. Apparently, the feature map did not work well for low-SNR regimes. In future research, the training could be repeated, taking into account realistic noise conditions and a wider range of signal-to-noise ratio conditions. Such a training might even lead to a feature map with noise-reduction capabilities. Finally, we do not think learning-based building blocks will generalize between distinct recording devices, e.g. different probe geometries, and this is a limiting practical factor when compared to non-domain-specific approaches that are widely used today. We recognize that the proposed data augmentation method heavily depends on the availability of realistic simulation data, which could hamper its adoption when a wide variety of recording devices are of interest to a specific user. Therefore, we encourage future research on data augmentation procedures that take real recordings at their input. Nonetheless, the use of domain-specific building blocks has the potential to overcome long standing spike sorting challenges. For example, in our work, the template estimation bias due to overlapping spikes is circumvented through an integrated feature space approach.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant Agreement No. 802895) and the Research Foundation Flanders (FWO) project FWO G0D7516N. This research received funding from the Flemish Government (AI Research Program). The scientific responsibility is assumed by its authors. A conference precursor of this paper has been published in [1]. This work was carried out at the ESAT Laboratory of KU Leuven.

ORCID iDs

J Wouters https://orcid.org/0000-0002-5190-8698
A Bertrand https://orcid.org/0000-0002-4827-8568

References

[1] Wouters J, Kloosterman F and Bertrand A 2020 A neural network-based spike sorting feature map that resolves spike overlap in the feature space Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP) pp 1175–9
[2] Jun J J et al 2017 Fully integrated silicon probes for high-density recording of neural activity Nature 551 232–6
[3] Chung J E et al 2019 High-density, long-lasting and multi-region electrophysiological recordings using polymer electrode arrays Neuron 101 21–31
[4] Lewicki M S 1998 A review of methods for spike sorting: the detection and classification of neural action potentials Netw., Comput. Neural Syst. 9 R53–78
[5] Moser E I, Kropff E and Moser M-B 2008 Place cells, grid cells and the brain's spatial representation system Annu. Rev. Neurosci. 31 69–89
[6] Khatoun A, Asamoah B and Mc Laughlin M 2017 Simultaneously excitatory and inhibitory effects of transcranial alternating current stimulation revealed using selective pulse-train stimulation in the rat motor cortex J. Neurosci. 37 9389–402
[7] Aydin Ç, Couto J, Giugliano M, Farrow K and Bonin V 2018 Locomotion modulates specific functional cell types in the mouse visual thalamus Nat. Commun. 9 1–12
[8] Gibson S, Judy J W and Marković D 2011 Spike sorting: the first step in decoding the brain IEEE Signal Process. Mag. 29 124–43
[9] Shoham S, Fellows M R and Normann R A 2003 Robust, automatic spike sorting using mixtures of multivariate t-distributions J. Neurosci. Methods 127 111–22
[10] Aksenova T I, Chibirova O K, Dryga O A, Tetko I V, Benabid A-L and Villa A E 2003 An unsupervised automatic method for sorting neuronal spike waveforms in awake and freely moving animals Methods 30 178–87
[11] Quiroga R Q, Nadasdy Z and Ben-Shaul Y 2004 Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering Neural Comput. 16 1661–87
[12] Rossant C et al 2016 Spike sorting for large, dense electrode arrays Nat. Neurosci. 19 634–41


[13] Pachitariu M, Steinmetz N A, Kadir S N, Carandini M and Harris K D 2016 Fast and accurate spike sorting of high-channel count probes with kilosort Adv. Neural Inf. Process. Syst. 29 (NIPS 2016) 4448–56
[14] Chung J E et al 2017 A fully automated approach to spike sorting Neuron 95 1381–94
[15] Yger P et al 2018 A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo Elife 7 e34518
[16] Pillow J W, Shlens J, Chichilnisky E and Simoncelli E P 2013 A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings PLoS One 8 e62123
[17] Abeles M and Goldstein M H 1977 Multispike train analysis Proc. IEEE 65 762–73
[18] Adamos D A, Laskaris N A, Kosmidis E K and Theophilidis G 2010 Nass: an empirical approach to spike sorting with overlap resolution based on a hybrid noise-assisted methodology J. Neurosci. Methods 190 129–42
[19] Marre O, Amodei D, Deshmukh N, Sadeghi K, Soo F, Holy T E and Berry M J 2012 Mapping a complete neural population in the retina J. Neurosci. 32 14859–73
[20] Franke F, Quiroga R Q, Hierlemann A and Obermayer K 2015 Bayes optimal template matching for spike sorting - combining fisher discriminant analysis with optimal filtering J. Comput. Neurosci. 38 439–59
[21] Mokri Y, Salazar R F, Goodell B, Baker J, Gray C M and Yen S-C 2017 Sorting overlapping spike waveforms from electrode and tetrode recordings Front. Neuroinform. 11 53
[22] Wouters J, Kloosterman F and Bertrand A 2018 Towards online spike sorting for high-density neural probes using discriminative template matching with suppression of interfering spikes J. Neural Eng. 15 056005
[23] Wouters J, Patrinos P, Kloosterman F and Bertrand A 2020 Multi-pattern recognition through maximization of signal-to-peak-interference ratio with application to neural spike sorting IEEE Trans. Signal Process. 68 6240–54
[24] Prentice J S, Homann J, Simmons K D, Tkačik G, Balasubramanian V and Nelson P C 2011 Fast, scalable, Bayesian spike identification for multi-electrode arrays PLoS One 6 e19884
[25] Ekanadham C, Tranchina D and Simoncelli E P 2014 A unified framework and method for automatic neural spike identification J. Neurosci. Methods 222 47–55
[26] Lee J H et al 2017 YASS: Yet another spike sorter Adv. Neural Inf. Process. Syst. 30 4002–12
[27] Hurwitz C, Xu K, Srivastava A, Buccino A and Hennig M 2019 Scalable spike source localization in extracellular recordings using amortized variational inference Adv. Neural Inf. Process. Syst. 32 (NeurIPS 2019) 4724–36
[28] Saif-ur Rehman M et al 2021 Spikedeep-classifier: a deep-learning based fully automatic offline spike sorting algorithm J. Neural Eng. 18 016009
[29] Li Z, Wang Y, Zhang N and Li X 2020 An accurate and robust method for spike sorting based on convolutional neural networks Brain Sci. 10 835
[30] Eom J, Park I Y, Kim S, Jang H, Park S, Huh Y and Hwang D 2021 Deep-learned spike representations and sorting via an ensemble of auto-encoders Neural Netw. 134 131–42
[31] Haykin S 2007 Neural Networks: A Comprehensive Foundation (Englewood Cliffs, NJ: Prentice-Hall)
[32] Theodoridis S 2015 Machine Learning: A Bayesian and Optimization Perspective (New York: Academic)
[33] Glorot X, Bordes A and Bengio Y 2011 Deep sparse rectifier neural networks Proc. Fourteenth Int. Conf. Artificial Intelligence and Statistics pp 315–23
[34] Srivastava N, Hinton G, Krizhevsky A, Sutskever I and Salakhutdinov R 2014 Dropout: a simple way to prevent neural networks from overfitting J. Mach. Learn. Res. 15 1929–58
[35] Ioffe S and Szegedy C 2015 Batch normalization: accelerating deep network training by reducing internal covariate shift (arXiv:1502.03167)
[36] Ramaswamy S et al 2015 The neocortical microcircuit collaboration portal: a resource for rat somatosensory cortex Front. Neural Circuits 9 44
[37] Buccino A P and Einevoll G T 2021 Mearec: a fast and customizable testbench simulator for ground-truth extracellular spiking activity Neuroinformatics 19 185–204
[38] Swindale N V and Spacek M A 2014 Spike sorting for polytrodes: a divide and conquer approach Front. Syst. Neurosci. 8 6
[39] Scott D W 2015 Multivariate Density Estimation: Theory, Practice and Visualization (New York: Wiley)
[40] Kingma D P and Ba J 2014 Adam: a method for stochastic optimization (arXiv:1412.6980)
[41] Abadi M et al 2016 Tensorflow: a system for large-scale machine learning 12th USENIX Symp. on Operating Systems Design and Implementation (OSDI 16) pp 265–83
[42] Prechelt L 1998 Early stopping - but when? Neural Networks: Tricks of the Trade (Berlin: Springer) pp 55–69
[43] Rand W M 1971 Objective criteria for the evaluation of clustering methods J. Am. Stat. Assoc. 66 846–50
[44] Wouters J, Kloosterman F and Bertrand A 2019 A data-driven regularization approach for template matching in spike sorting with high-density neural probes 2019 41st Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBC) (IEEE) pp 4376–9
[45] Steinmetz N, Carandini M and Harris K D 2019 'Single Phase3' and 'Dual Phase3' Neuropixels datasets
[46] Rodriguez A and Laio A 2014 Clustering by fast search and find of density peaks Science 344 1492–6
[47] Rey H G, Pedreira C and Quiroga R Q 2015 Past, present and future of spike sorting techniques Brain Res. Bull. 119 106–17

