A data-driven spike sorting feature map for
Abstract
Objective. Spike sorting is the process of extracting neuronal action potentials, or spikes, from an
extracellular brain recording, and assigning each spike to its putative source neuron. Spike sorting
is usually treated as a clustering problem. However, this clustering process is known to be affected
by overlapping spikes. Existing methods for resolving spike overlap typically require an expensive
post-processing of the clustering results. In this paper, we propose the design of a domain-specific
feature map, which enables the resolution of spike overlap directly in the feature space. Approach.
The proposed domain-specific feature map is based on a neural network architecture that is trained
to simultaneously perform spike sorting and spike overlap resolution. Overlapping-spike clusters
can be identified in the feature space through a linear relation with the single-neuron clusters of
the neurons that contribute to the overlapping spikes. To aid the feature map training, a data
augmentation procedure is presented that is based on biophysical simulations. Main results. We
demonstrate the potential of our method on independent and realistic test data, and show that our
novel approach for resolving spike overlap generalizes to such unseen data.
Furthermore, the sorting performance of our method is shown to be similar to the state-of-the-art,
but our method does not assume the availability of spike templates for resolving spike overlap.
Significance. Resolving spike overlap directly in the feature space results in an overall simplified
spike sorting pipeline compared to the state-of-the-art. For the state-of-the-art, the overlapping
spike snippets exhibit a large spread in the feature space and do not appear as concentrated
clusters. This can lead to biased spike template estimates which affect the sorting performance of
the state-of-the-art. In our proposed approach, overlapping spikes form concentrated clusters and
spike overlap resolution does not depend on the availability of spike templates.
channel is referred to as multi-unit activity, as each electrode typically captures spikes from multiple neurons. However, when an experimenter is interested in the neural activity of individual neurons, i.e. single-unit activity, a neural source separation problem has to be solved. The transformation of the activity of multiple neurons into the individual activity of the different recorded neurons is known as spike sorting [4]. While multi-unit activity is often sufficient for coarse brain-computer interfacing tasks, having access to the neural activity of individual neurons is essential to neuroscientists that try to understand how individual neurons contribute to brain function [5–7].

A common approach for solving the spike sorting problem is through the use of an unsupervised learning framework [8]. The framework is characterized by four consecutive processing steps: (1) spike detection, (2) spike alignment, (3) feature extraction, and (4) clustering. The spike detection results in the extraction of spike snippets from the neural recording. These spike snippets are then mutually aligned according to some alignment criterion to support the feature extraction. Next, feature values are computed for every spike snippet, which ideally results in the discriminability of spikes from different neurons, while promoting the grouping of spikes from the same neuron. Finally, the single-neuron clusters are identified through an automated clustering analysis. Several implementations and variations on this common framework have been proposed [9–15].

When two or more neurons fire near-simultaneously in time, a compound spike waveform, also referred to as an overlapping spike snippet, is generated. If the vanilla clustering framework from the previous paragraph is used for spike sorting, the overlapping spike snippets are likely to be misinterpreted, leading to wrongly clustered spikes and an overall bias in the cluster identification itself. It has been shown that overlapping spikes affect spike sorting accuracy in practice [16]. Therefore, strategies have been designed to cope with overlapping spikes. To the best of our knowledge, these strategies all depend on the use of spike templates [17]. The strategies consist of either applying matched filters that are derived from the spike templates following the biased clustering analysis [18–23], or relying on an iterative clustering/template fitting procedure to resolve spike overlap [16, 24, 25]. Spike template estimates will also be biased by overlapping spikes when they are based on biased clustering results. In case of the iterative template estimation, the template estimation bias can be overcome, but only at a considerable increase in computational cost.

Recently, modern supervised machine learning concepts such as (deep) neural networks have become increasingly popular for use in spike sorting [26–30]. These supervised learning building blocks are typically intended for replacing classical spike detection and feature extraction methods, resulting in domain-specific detectors and feature maps. However, these recent feature maps, as well as preceding classical techniques, do not take into account spike overlap and overlap resolution mechanisms. In this work we propose the use of a neural network spike sorting feature map that is capable of resolving spike overlap directly in the feature space. In this way, the clustering bias due to spike overlap is overcome. Through the use of this specialized feature map, there is no need for matched filter post-processors, nor for iterative fitting techniques to resolve spike overlap, such that a minimal processing pipeline is delivered that is capable of resolving spike overlap.

In section 2 we formalize the classical spike sorting process and show how overlapping spikes hamper the sorting performance. Then, the domain-specific feature map that is designed for resolving spike overlap is presented in section 3. In section 4 we explore the design space of the proposed feature map and quantify the feature map spike sorting performance on realistic ground truth data. Finally, we conclude the presented work in section 5.

2. Spike sorting and overlapping spikes

Consider x[k] ∈ R^N to be the high-pass filtered extracellular recording at time sample k, as measured simultaneously on N recording electrodes. A single neuronal spike contributes to several consecutive time samples of the extracellular recording when using typical sampling frequencies (25–30 kHz). To facilitate the study of the neuronal spikes, we define the stacked vector x̄[k] = [x[k − L + 1]^T … x[k − 1]^T x[k]^T]^T ∈ R^{NL} to represent a spatio-temporal window on the extracellular recording of N channels by L samples. The bar notation ( ̄· ) is also used in combination with other symbols to indicate a similar delay-line extension on the data that is represented by that symbol.

A spike sorting algorithm takes such an extracellular recording x[k] at its input to output the sample times at which the individual neurons embedded in the recording generate spikes. The output thus consists of a collection {Ŝn | ∀n}, where an individual set Ŝn contains the spike time estimates (i.e. the spike train) of the sortable neuron n. The different spike waveforms as generated by a specific neuron are very similar throughout time, which is why clustering techniques are commonly used in spike sorting. A typical clustering approach for transforming the input recording into the spike sorting output consists of the following sequential processing steps [8]:
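The stacked spatio-temporal window x̄[k] defined above can be sketched in a few lines of NumPy. The recording array, its (time × channels) shape convention and all sizes below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def stacked_window(x, k, L):
    """Build the stacked vector x̄[k] ∈ R^{NL} from a recording x of
    shape (K, N): the L most recent samples x[k-L+1] ... x[k],
    concatenated sample by sample (oldest first)."""
    return x[k - L + 1 : k + 1].reshape(-1)  # shape (N*L,)

# toy recording: K = 100 samples on N = 4 channels
x = np.random.default_rng(0).standard_normal((100, 4))
xbar = stacked_window(x, k=50, L=32)
print(xbar.shape)  # (128,)
```

The first N entries of the result correspond to x[k − L + 1] and the last N entries to x[k], matching the delay-line ordering in the definition.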
J. Neural Eng. 18 (2021) 0460a7 J Wouters et al
Ideally, Sunsorted is empty. However, in practice, Sunsorted usually contains a significant fraction of the detected spike times. A well-known problem that hampers the spike sorting process is the occurrence of spike overlap. To illustrate this problem consider the following model for the extracellular recording:

    x[k] = Σ_{n∈N} Σ_{t∈Sn} αn,t sn[k − t] + n[k],    (1)

where sn is the finite-support prototypical spike waveform of neuron n, also referred to as the spike template of neuron n. The spike template sn is assumed to have Wn non-zero elements centered around k = 0. Furthermore, αn,t is a template scaling factor to model amplitude variations and n is defined to be an additive noise component that accounts for both the neural and electrical noise. We say spike overlap occurs between a neuron i and j if ∃ (ti, tj) ∈ Si × Sj : |ti −

where s̄i and s̄j ∈ R^{NL} are the mutually aligned spatio-temporal spike templates of neuron i and j, respectively. This property enables the resolution of these overlapping spikes in the feature space: if the sum of two cluster centroids maps onto the centroid of a third cluster, this third cluster is believed to contain the overlapping spikes that are composed of the sum of the aligned templates associated with the two former clusters. The overlap resolution property is illustrated in figure 1.

Unfortunately, the assumption of perfect spike alignment is of little practical use. Although neurons might exist for which the alignment assumption holds, e.g. when such neurons are coupled through gap junctions, general spike overlap is believed to occur by chance because two or more neurons happen to fire near-simultaneously in time. When such random overlap happens for two neurons i and j, ti − tj results from a uniform random distribution.
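The recording model of equation (1) can be simulated directly. In this NumPy sketch, the template shapes, spike counts, channel weights and noise level are all invented for illustration; only the structure (shifted, scaled templates plus additive noise) comes from the model:

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, W = 2000, 4, 31                    # samples, channels, template support

# two invented spike templates s_n with W non-zero samples centred on 0
t = np.arange(-(W // 2), W // 2 + 1)
shape = -np.exp(-(t / 4.0) ** 2) * np.sin(t / 3.0)
templates = [np.outer(shape, [1.0, 0.5, 0.2, 0.1]),   # (W, N) each
             np.outer(shape, [0.1, 0.3, 1.0, 0.4])]
spike_trains = [rng.choice(np.arange(50, K - 50), size=20, replace=False)
                for _ in templates]

# equation (1): x[k] = sum_n sum_{t in S_n} alpha_{n,t} s_n[k - t] + n[k]
x = 0.01 * rng.standard_normal((K, N))   # additive noise n[k]
for s_n, S_n in zip(templates, spike_trains):
    for t_spike in S_n:
        alpha = rng.uniform(0.8, 1.2)    # amplitude variation alpha_{n,t}
        x[t_spike - W // 2 : t_spike + W // 2 + 1] += alpha * s_n
```

Whenever two sampled spike times land closer together than the template support, the resulting snippet is an overlapping spike in the sense defined above.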
Figure 3. Schematic representation of the fully connected MLP neural network architecture as used in this work. The network accepts an NL-dimensional vector at its input (red). This input vector is transformed to an M-dimensional output vector (blue) via a chained projection onto three consecutive hidden layers, prior to a projection onto the output neurons. The dimensionality of the hidden layers ordered from input to output is 2NL, NL and NL/5, respectively. Both the hidden layers and the output layer contain a dropout layer (yellow) for training regularization purposes. The hidden neurons make use of a non-linear rectifier activation function (grey). The use of batch normalization (green) is explored in this work. The second hidden layer is marked as optional because we will explore networks both with and without this layer.
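The architecture in figure 3 can be sketched as a plain forward pass. The weights below are random placeholders (the real coefficients come from training), N, L and M are arbitrary example sizes, and dropout is omitted since it is only active during training:

```python
import numpy as np

rng = np.random.default_rng(2)
N, L, M = 4, 32, 3
dims = [N * L, 2 * N * L, N * L, (N * L) // 5, M]  # layer sizes per figure 3

# random coefficients stand in for trained ones
weights = [rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in)
           for d_in, d_out in zip(dims[:-1], dims[1:])]
biases = [np.zeros(d) for d in dims[1:]]

def g(xbar):
    """Feature map g: R^{NL} -> R^M. Rectifier activation on the hidden
    layers, linear output layer."""
    h = xbar
    for W_, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W_ + b, 0.0)  # ReLU
    return h @ weights[-1] + biases[-1]

features = g(rng.standard_normal(N * L))
print(features.shape)  # (3,)
```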
In a typical supervised learning task, the coefficients of the network are chosen to enable classification or regression. To obtain such behaviour, the class labels or dependent variable values yk are required during training for all k ∈ T, with T the set of training samples. A realization of ĝ is then obtained by tuning the coefficients until a (local) minimum of a certain cost function is obtained:

    ĝ = arg min_g Σ_{k∈T} L(g(x̄k), yk),    (5)

where L is a training sample loss function, e.g. a least squares criterion.

Irrespective of the choice of loss, the classical approach to supervised learning with explicit training labels yk is not suitable for use in spike sorting because of its inherent unsupervised nature. Therefore, we propose the use of a cost function where the training labels are used implicitly, i.e. the network is not forced to produce a specific output, rather it is instructed to learn an output representation which behaves according to that cost function. The cost function that is proposed in this work consists of four distinct terms that each have a specific intended purpose. In what follows, the different terms will be introduced one-by-one.

The loss function that is associated with the first term of the cost acts on three spatio-temporal spike waveforms and is given by:

    L1(x̄i, x̄j, x̄i⊕j) = ‖g(x̄i) + g(x̄j) − g(x̄i⊕j)‖₂²,    (6)

with a slight abuse of notation, let x̄i = s̄i + n̄ and x̄j = s̄j + n̄ be a training sample spike of neuron i and j, respectively, each with their own random noise component. x̄i⊕j = s̄i ⊕ s̄j + n̄ is a training sample representing two overlapping spikes involving neuron i and j. Minimizing L1 over g has the potential to result in a feature map g that behaves linearly for non-aligned overlapping spikes as described by (4).

Minimizing only L1 will result in the trivial solution g(x̄) = 0 ∀x̄. To obtain a non-trivial feature map g, which is suitable for clustering, L1 is combined with a second loss term L2:

    L2(x̄i, x̄j) = exp(−‖g(x̄i) − g(x̄j)‖₂²).    (7)

When minimizing L2 over g, it will favour realizations of g that result in a large distance in the feature space between a pair of spikes from different neurons. As such, this term is intended to maximize the between-cluster distance.
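The two loss terms translate directly from (6) and (7). The toy linear map below is only a sanity check, illustrating that L1 vanishes whenever the feature map is exactly additive over an overlap:

```python
import numpy as np

def loss_l1(g, x_i, x_j, x_ij):
    """Equation (6): linearity loss ||g(x_i) + g(x_j) - g(x_{i+j})||_2^2."""
    return float(np.sum((g(x_i) + g(x_j) - g(x_ij)) ** 2))

def loss_l2(g, x_i, x_j):
    """Equation (7): between-cluster term exp(-||g(x_i) - g(x_j)||_2^2)."""
    return float(np.exp(-np.sum((g(x_i) - g(x_j)) ** 2)))

# for a linear map g, L1 is zero whenever x_{i+j} = x_i + x_j,
# i.e. a perfectly additive overlap incurs no loss
A = np.arange(6.0).reshape(2, 3)
g_lin = lambda x: A @ x
x_i, x_j = np.ones(3), np.full(3, 2.0)
assert np.isclose(loss_l1(g_lin, x_i, x_j, x_i + x_j), 0.0)
```

Note that L2 is largest (equal to one) when two spikes coincide in the feature space, so minimizing it pushes spikes of different neurons apart.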
where x̄i and x̄i′ are distinct spike samples from the same neuron. As such, this term serves to minimize the within-cluster distance for the given neuron.

The compound cost function used in this work is composed from the above three loss functions as follows:

    Σ_T [ L1(x̄i, x̄j, x̄i⊕j) + κ1 L2(x̄i, x̄j) + κ2 (L3(x̄i, x̄iσ) + L3(x̄j, x̄jσ)) + κ3 (L3(x̄i, x̄iδ) + L3(x̄j, x̄jδ)) ],    (9)

where the sum operator Σ acts on the training set T. A single entry of the training set is a collection of training example spikes {x̄i, x̄j, x̄i⊕j, x̄iσ, x̄jσ, x̄iδ, x̄jδ} from a pair of training neurons (i, j) with i ≠ j. Note that for a given pair of neurons, multiple distinct entries can be present in the training set. In this work the neuron identities, e.g. i and j, serve as implicit labels, i.e. the cost function describes how the feature map should behave for spikes of different neurons, but it does not push the output to a predefined value. x̄σ represents an amplitude scaled version of x̄. As such, the terms that involve x̄σ (i.e. the terms weighted with κ2) serve to penalize spike amplitude scaling variance in the desired feature space. x̄δ represents a spike snippet that is superimposed with a spike waveform of a different neuron that reaches its maximum energy on another channel. As discussed in the following section, neurons that reach their maximum spike energy on a different channel are to be treated in separate clustering problems (see divide-and-conquer clustering). Therefore, the superimposed spike waveform is to be treated as physiological noise, and feature robustness against such noise is desirable. More details on the specifics of x̄σ and x̄δ are given in the next section. κ1, κ2 and κ3 are hyper parameters that allow for an explicit trade-off between (a) the linear behaviour within the feature space⁵, (b) the between-cluster distance and (c) the within-cluster distance. Note that L1 does not have an explicit hyper parameter because its importance can be controlled relative to the other terms through κ1, κ2 and κ3.

Figure 4. Flowchart that contains the different steps that are involved in the generation of training data.

3.3. Training data
Now that we have defined the structure of g, as well as a criterion to optimally choose the coefficients of g, we need to construct the actual training set T to perform the training. Figure 4 depicts the different steps that are involved in the training data generation process. These steps will be discussed in detail in the remainder of this section.

In this work we make use of a data augmentation approach for the generation of the training data. First, 634 spike templates s̄n ∈ R^{N2L} from 634 distinct biophysically detailed neuron models [36] are simulated using MEArec [37]. Note that for these templates a temporal window of 2L is used rather than L. This is done to make sure that the minimum of an overlapping spikes segment with |si − sj| ≠ 0 can be temporally aligned within an L-window prior to computing the feature map projection. In this work, the probe geometry is equivalent to a single column of a Neuropixels probe [2]. A single-column probe design is chosen here to aid the training data generation process in terms of its spatio-temporal alignment procedure (see below)⁶. The simulated sampling frequency is chosen to be 32 kHz. The temporal window length is chosen to be L = 32, which corresponds to 1 ms. The minimum of every s̄n is positioned at the same sample index through a spatio-temporal alignment procedure. The spatial alignment results in all templates reaching their maximum on the same channel. This

⁵ Linearity within the feature space does not imply a linear behaviour between the input space and the feature space. Indeed, an amplitude scaling of a spike will not result in an equal scaling of its feature vector, which is even discouraged due to the κ3 term minimizing the within-cluster distance.

⁶ Although the proposed feature map design is deemed to be probe-dependent, the feature map design approach is not limited to the probe geometry used in this work, i.e. the generation of training data can be re-done when a different probe geometry is used.
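The compound cost (9) can be sketched for a single training-set entry. Equation (8) for L3 falls outside this excerpt, so the squared within-cluster feature distance used below is an assumption consistent with its stated purpose; the hyper-parameter values are the ones selected later in section 4.1:

```python
import numpy as np

# loss terms (6) and (7); L3 from (8) is assumed here to be the squared
# within-cluster feature distance (the equation itself is not in this excerpt)
l1 = lambda g, xi, xj, xij: np.sum((g(xi) + g(xj) - g(xij)) ** 2)
l2 = lambda g, xi, xj: np.exp(-np.sum((g(xi) - g(xj)) ** 2))
l3 = lambda g, xa, xb: np.sum((g(xa) - g(xb)) ** 2)

def compound_cost(g, entry, k1, k2, k3):
    """Equation (9), evaluated for one training-set entry
    {x_i, x_j, x_i+j, x_i^sigma, x_j^sigma, x_i^delta, x_j^delta};
    summing over the training set T gives the full cost."""
    xi, xj, xij, xis, xjs, xid, xjd = entry
    return (l1(g, xi, xj, xij) + k1 * l2(g, xi, xj)
            + k2 * (l3(g, xi, xis) + l3(g, xj, xjs))
            + k3 * (l3(g, xi, xid) + l3(g, xj, xjd)))

# toy check with the hyper parameters selected in section 4.1
rng = np.random.default_rng(3)
entry = tuple(rng.standard_normal(8) for _ in range(7))
g_id = lambda x: x  # placeholder feature map
cost = compound_cost(g_id, entry, k1=1.0, k2=0.1, k3=0.1)
assert cost > 0.0
```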
    Rss = (1/|Straining|) Σ_{s̄n ∈ Straining} (s̄n − ⟨Straining⟩)(s̄n − ⟨Straining⟩)^T,    (10)

Figure 5. Sixteen example simulated spatio-temporal spike templates are shown. The waveform as impinging on a specific channel is given a unique colour for improved readability. The plotting distance between the channels is constant.
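Equation (10) is the sample covariance of the training templates around their mean ⟨Straining⟩. A matrix-form NumPy sketch, with invented toy dimensions in place of the N·2L-dimensional templates:

```python
import numpy as np

def template_covariance(S):
    """Equation (10): covariance R_ss of the training templates, with S
    holding one template s̄_n per row; the outer-product sum is written
    as a matrix product."""
    centred = S - S.mean(axis=0)          # subtract <S_training>
    return centred.T @ centred / len(S)

S = np.random.default_rng(4).standard_normal((634, 10))  # toy dimensions
R = template_covariance(S)
```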
7
J. Neural Eng. 18 (2021) 0460a7 J Wouters et al
Figure 6. Normalized projection histograms are shown in blue for the first four principal components of the set of ‘training
templates’ Straining . The projection histograms are constructed by projecting all entries of Straining onto the respective principal
component. For every histogram, the respective continuous probability density estimate is shown in red.
Given two distinct augmented training templates ˆs̄i and ˆs̄j, a training set entry can be generated from the following equations:

    x̄i = rL(ˆs̄i) + n̄,    (12)
    x̄j = rL(ˆs̄j) + n̄,    (13)
    x̄i⊕j = rL(ˆs̄i ⊕ ˆs̄j) + n̄,    (14)
    x̄iσ = rL(αi ˆs̄i) + n̄,    (15)
    x̄jσ = rL(αj ˆs̄j) + n̄,    (16)
    x̄iδ = rL(ˆs̄i ⊕ ˆs̄δk) + n̄,    (17)
    x̄jδ = rL(ˆs̄j ⊕ ˆs̄δk) + n̄,    (18)

of placing the minimum value of the (compound) spike waveform at the centre of the temporal window. In this work the random relative shift between two spike waveforms is sampled from a uniform distribution that is bounded between −10 and 10. With a slight abuse of notation, let all n̄ be random Gaussian spatio-temporal white noise with an average SNR of 30 dB (see section 4.2). αi is a random scaling factor associated with the neuron i that is sampled from a uniform distribution that is bounded between 0.8 and 1.2. ˆs̄δk is a spike waveform that reaches its maximum energy on a different channel than ˆs̄i and ˆs̄j. As such, ˆs̄δk is included to model overlap from neurons that reach their minimum spike trough on a different channel. Note that such non-spatially aligned overlapping spikes have to be treated as a source of noise in a divide-and-conquer clustering setting. By randomly sampling (with replacement) ˆs̄i and ˆs̄j with i ≠ j from the 25 000 augmented templates, a total of 1 000 000 training set entries {(12)–(18)} are generated to form the training set T.
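A single-channel sketch of one training-set entry per equations (12)–(18). The shift range (±10 samples) and amplitude range (0.8–1.2) come from the text above, but the waveforms are invented, and both rL and ⊕ are hypothetical simplified stand-ins for the paper's alignment and superposition operators:

```python
import numpy as np

rng = np.random.default_rng(5)
L = 32                        # window length; templates span 2L samples

def r_l(s):
    """Stand-in for r_L: crop a 2L-sample waveform to the L-sample
    window centred on its minimum (single-channel simplification)."""
    c = int(np.argmin(s))
    lo = int(np.clip(c - L // 2, 0, len(s) - L))
    return s[lo : lo + L]

def overlap(s_a, s_b):
    """The ⊕ operator: superimpose s_b on s_a at a uniformly random
    relative shift between -10 and 10 samples."""
    return s_a + np.roll(s_b, int(rng.integers(-10, 11)))

def training_entry(s_i, s_j, s_k, noise=0.03):
    """One entry per equations (12)-(18); s_k plays the role of the
    waveform whose energy peaks on another channel."""
    n = lambda: noise * rng.standard_normal(L)   # fresh noise per spike
    a_i, a_j = rng.uniform(0.8, 1.2, size=2)     # amplitude scalings
    return (r_l(s_i) + n(),                      # (12) x_i
            r_l(s_j) + n(),                      # (13) x_j
            r_l(overlap(s_i, s_j)) + n(),        # (14) x_{i+j}
            r_l(a_i * s_i) + n(),                # (15) x_i^sigma
            r_l(a_j * s_j) + n(),                # (16) x_j^sigma
            r_l(overlap(s_i, s_k)) + n(),        # (17) x_i^delta
            r_l(overlap(s_j, s_k)) + n())        # (18) x_j^delta

# toy 2L-sample templates
t = np.arange(2 * L, dtype=float)
s_i = -np.exp(-((t - 32.0) / 3.0) ** 2)
entry = training_entry(s_i, np.roll(s_i, 2), np.roll(s_i, -3))
```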
4. Experiments
Figure 8. The six plots in this figure show the average Rand index (x-axis) and centre prediction error (y-axis) over the validation pairs for all feature maps in the exploration experiment. The lowest, right-most point corresponds to the best performance. Each plot contains the same values, but has a unique colour coding that is informed by one of the hyper parameters as indicated in the respective legends.
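The centre prediction error (CPE) used in this section can be computed as below. The function and argument names are hypothetical and the toy numbers invented; the structure follows the definition: the distance between the overlap-cluster centre and the vector sum of the single-neuron centres, normalized by the RMS within-cluster spread:

```python
import numpy as np

def centre_prediction_error(c_i, c_j, c_ij, feats, assigned):
    """CPE per equation (19): ||c_{i+j} - (c_i + c_j)|| divided by the
    RMS within-cluster spread over all snippets in U(i, j). `feats`
    holds one feature vector g(x_k) per row, `assigned` the centre of
    the cluster each snippet was assigned to."""
    spread = np.sqrt(np.mean(np.sum((feats - assigned) ** 2, axis=1)))
    return np.linalg.norm(c_ij - (c_i + c_j)) / spread

# a perfectly linear feature space gives a CPE of zero
c_i, c_j = np.array([1.0, 0.0]), np.array([0.0, 1.0])
c_ij = c_i + c_j
feats = np.array([[1.1, 0.0], [0.0, 0.9], [1.0, 1.2]])
assigned = np.stack([c_i, c_j, c_ij])
assert np.isclose(centre_prediction_error(c_i, c_j, c_ij,
                                          feats, assigned), 0.0)
```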
the targeted ‘linear behaviour’ of the feature map as shown in (4):

    e(i,j) = ‖ci⊕j − (ci + cj)‖ / √( (1/|U(i,j)|) Σ_{x̄k ∈ U(i,j)} ‖g(x̄k) − cx̄k‖² ),    (19)

where the different c indicate the cluster centres. cx̄k denotes the cluster centre of the cluster to which a certain spike waveform x̄k from U(i,j) is assigned. The denominator of e(i,j) is a measure of the overall within-cluster spread that is related to the root mean square error.

Given the clustering results for a pair of neurons (i, j), both the Rand index and CPE can be computed for that pair. This procedure is repeated for a total of 100 validation pairs, resulting in an average Rand index and CPE for every feature map. The average metrics for every feature map resulting from this hyper parameter exploration experiment are shown in figure 8.

As a first feature map selection criterion, we choose to minimize the CPE because we highly value the resolution of overlapping spikes in this work. Fortunately, from our experimental data (see figure 8) there seems to be a strong negative correlation between the CPE and Rand index, such that feature map selection based on the CPE also leads to excellent clustering behaviour.

From figure 8 we can also obtain an insight into the effects of the different parameters on the spike sorting performance. For κ1 we see that its actual value does not have a major impact on the spike sorting performance, both in terms of Rand index and
CPE. For κ2 we see that this parameter affects the CPE. This dependency is likely due to the normalization with respect to the within-cluster spread in the computation of the CPE. The effect of κ3 is most pronounced for the clustering performance, i.e. focusing too much on robustness against non-spatially aligned overlap will deteriorate the clustering performance. For the use of batch normalization, the optional layer and the value of the dropout fraction, we could not see any clear effect on the spike sorting performance. The minimum CPE model that we finally selected for further analysis and comparison has an average CPE of 0.300 and an average Rand index of 0.971 for the validation data. The hyper parameters associated with this model are: κ1 = 1, κ2 = 0.1, κ3 = 0.1, batch normalization is not used, the optional layer is used (resulting in a so-called deep neural network [32]) and a dropout fraction of 0.2 is used during training. It is noted that the model with the maximum Rand index only reached a slightly higher Rand index of 0.975 but had a more than double CPE of 0.660.

4.2. Sorting overlapping spikes benchmark
Given the final feature map from the previous section, we can benchmark our method against the state-of-the-art for overlap resolution. The method that we have presented here is unique, in that it is the only method, to our knowledge, that has the capability of resolving spike overlap directly in the feature space without the need for first extracting single-unit templates from the recording. Given that there is no feature space benchmark available that we could compare against, we resort to a state-of-the-art matched filtering technique for resolving spike overlap. The matched filter post-processors that we compare against are signal-to-peak-interference ratio (SPIR) optimal linear filters [23]. The desired SPIR was set to 20 dB and a subspace regularization [44] was applied, which accounts for 90% of the signal power. Finally, a fixed detection threshold of 25% of the template response output power (which is known by design) was used on the SPIR-optimal filter output to perform the final spike detection/classification. In this section, we also report the spike sorting accuracy for a clustering-based approach (also using K-means with K = 3 for every pair of test neurons) which uses PCA features. The features are computed by projecting the data on three principal components. By comparing our method against a PCA clustering-based approach, we can assess the loss in accuracy when not accounting for overlapping spikes.

It is important to realize that the presented feature map approach and the SPIR-optimal benchmark are very different, and this difference should be taken into consideration when comparing the sorting accuracy of the different methods. Use of the presented feature map approach is ultimately an unsupervised clustering approach, whereas the use of template matching post-processors is a binary PU-learning classification problem. This means that the feature map approach is uninformed about the spike waveforms of the neurons under study during the feature map training, whereas the template matching approach depends on the availability of the spike waveform for the matched filter design. Therefore, we do not expect the feature space approach to perform better than template matching approaches due to the fundamentally different prior knowledge assumptions. At best, our presented feature map approach would result in a similar performance, but using a simplified processing pipeline. Furthermore, in this section, we will make use of other accuracy metrics, as compared to the previous section, that are applicable to both approaches and that are also more commonly used for spike sorting validation, i.e. precision and recall. Precision is equal to the fraction of true positive detections over the total number of detections, whereas recall is the fraction of true positive detections over the total number of ground truth positives.

The test data that is used in this section consists again of 100 pairs of neurons, but here we make use of the biophysically simulated test templates in Stest. Note that these templates were obtained from neuronal models that are kept out of the data augmentation process for the generation of random training and validation templates. Furthermore, rather than using spatio-temporal white noise as during the training and hyper parameter exploration, here we embed test spikes in real neural recordings [45], which are acquired from a recording device with the same probe geometry as was used for the simulations. The injection of the biophysically simulated spikes into this recording allows us to quantify the spike sorting performance since the ground-truth spikes/clusters are known. For every pair we generate 300 spike snippets as in the previous section, but we do not add artificial non-spatially aligned spikes, because the recording already contains such interfering spikes.

In figure 9, the spike embeddings for a pair of test neurons injected in a real recording are shown. From this figure, it is clear that the different clusters are largely separable. Note that the colour coding is based on ground-truth labels and not on the actual clustering results. Through the vector addition of the cluster centres from the single-neuron clusters, an estimate of the centre of the overlapping spikes cluster can be obtained. The estimated centre is indicated in figure 9 by a black cross and it is shown to be close to the actual cluster centre of the overlapping spikes cluster. The cluster centres that are used here are obtained directly from the clustering analysis (i.e. they are not based on the ground truth labels). From this example, it is clear that the feature map generalizes in terms of spike sorting and overlap resolving capabilities for this test pair.

Figure 10 summarizes the spike sorting performance for the proposed feature map approach, the SPIR-optimal filtering approach and the PCA-based approach for all test pairs. The feature map approach has
Figure 9. Three orthogonal projections of the three dimensional feature space are shown. The spike projections are colour coded
based on the ground truth labels: red for spikes of neuron one, blue for spikes of neuron two and yellow for overlapping spikes of
neuron one and two. The green dot represents the cluster centre of the overlapping spikes cluster. The vector addition of the
cluster centres of the red and blue clusters, which is indicated by a black cross, is shown to map closely to the cluster centre of the
overlapping spikes cluster.
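The precision and recall definitions used in this section translate directly into code. The counts below are made-up numbers for illustration only, not results from the paper:

```python
def precision_recall(true_pos, detections, ground_truth):
    """Precision: true positives over all detections.
    Recall: true positives over all ground-truth spikes."""
    return true_pos / detections, true_pos / ground_truth

# made-up counts for illustration only
p, r = precision_recall(true_pos=297, detections=300, ground_truth=310)
print(round(p, 3), round(r, 3))  # 0.99 0.958
```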
Figure 10. The precision and recall boxplots are shown for the proposed MLP feature map approach (red), the SPIR-optimal filter approach (blue) and the principal component feature map approach (yellow). The median values for the different metrics are depicted in orange. Outliers are shown as grey dots. For the MLP and principal component feature map approaches, we could only express the sorting performance in terms of precision and recall for 94% and 90% of the test pairs, respectively (see text).
a median precision of 99% and a median recall of 98.5%, whereas for the SPIR-optimal filter this is 100% for both. For the PCA-based clustering approach a median precision of 93.3% and a median recall of only 50% is noted. From this comparison it is clear that both the proposed approach and the SPIR-optimal filter achieve a very good spike sorting performance, although the SPIR-optimal filter approach slightly outperforms the proposed feature map approach. However, our proposed feature map approach is superior to the PCA-based approach in the context of overlapping spikes. As already mentioned, the SPIR-optimal filter approach requires the spike templates to be known. In this work we assume that all spike templates are known, which is very unlikely to be the case in practice. Furthermore, because the feature map approach is unsupervised, it cannot benefit from this assumption. Although
[13] Pachitariu M, Steinmetz N A, Kadir S N, Carandini M and Harris K D 2016 Fast and accurate spike sorting of high-channel count probes with Kilosort Adv. Neural Inf. Process. Syst. 29 (NIPS 2016) 4448–56
[14] Chung J E et al 2017 A fully automated approach to spike sorting Neuron 95 1381–94
[15] Yger P et al 2018 A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo Elife 7 e34518
[16] Pillow J W, Shlens J, Chichilnisky E and Simoncelli E P 2013 A model-based spike sorting algorithm for removing correlation artifacts in multi-neuron recordings PLoS One 8 e62123
[17] Abeles M and Goldstein M H 1977 Multispike train analysis Proc. IEEE 65 762–73
[18] Adamos D A, Laskaris N A, Kosmidis E K and Theophilidis G 2010 NASS: an empirical approach to spike sorting with overlap resolution based on a hybrid noise-assisted methodology J. Neurosci. Methods 190 129–42
[19] Marre O, Amodei D, Deshmukh N, Sadeghi K, Soo F, Holy T E and Berry M J 2012 Mapping a complete neural population in the retina J. Neurosci. 32 14859–73
[20] Franke F, Quiroga R Q, Hierlemann A and Obermayer K 2015 Bayes optimal template matching for spike sorting: combining Fisher discriminant analysis with optimal filtering J. Comput. Neurosci. 38 439–59
[21] Mokri Y, Salazar R F, Goodell B, Baker J, Gray C M and Yen S-C 2017 Sorting overlapping spike waveforms from electrode and tetrode recordings Front. Neuroinform. 11 53
[22] Wouters J, Kloosterman F and Bertrand A 2018 Towards online spike sorting for high-density neural probes using discriminative template matching with suppression of interfering spikes J. Neural Eng. 15 056005
[23] Wouters J, Patrinos P, Kloosterman F and Bertrand A 2020 Multi-pattern recognition through maximization of signal-to-peak-interference ratio with application to neural spike sorting IEEE Trans. Signal Process. 68 6240–54
[24] Prentice J S, Homann J, Simmons K D, Tkačik G, Balasubramanian V and Nelson P C 2011 Fast, scalable, Bayesian spike identification for multi-electrode arrays PLoS One 6 e19884
[25] Ekanadham C, Tranchina D and Simoncelli E P 2014 A unified framework and method for automatic neural spike identification J. Neurosci. Methods 222 47–55
[26] Lee J H et al 2017 YASS: Yet another spike sorter Adv. Neural Inf. Process. Syst. 30 4002–12
[27] Hurwitz C, Xu K, Srivastava A, Buccino A and Hennig M 2019 Scalable spike source localization in extracellular recordings using amortized variational inference Adv. Neural Inf. Process. Syst. 32 (NeurIPS 2019) 4724–36
[28] Saif-ur Rehman M et al 2021 SpikeDeep-classifier: a deep-learning based fully automatic offline spike sorting algorithm J. Neural Eng. 18 016009
[29] Li Z, Wang Y, Zhang N and Li X 2020 An accurate and robust method for spike sorting based on convolutional neural networks Brain Sci. 10 835
[30] Eom J, Park I Y, Kim S, Jang H, Park S, Huh Y and Hwang D 2021 Deep-learned spike representations and sorting via an ensemble of auto-encoders Neural Netw. 134 131–42
[31] Haykin S 2007 Neural Networks: A Comprehensive Foundation (Englewood Cliffs, NJ: Prentice-Hall)
[32] Theodoridis S 2015 Machine Learning: A Bayesian and Optimization Perspective (New York: Academic)
[33] Glorot X, Bordes A and Bengio Y 2011 Deep sparse rectifier neural networks Proc. Fourteenth Int. Conf. on Artificial Intelligence and Statistics pp 315–23
[34] Srivastava N, Hinton G, Krizhevsky A, Sutskever I and Salakhutdinov R 2014 Dropout: a simple way to prevent neural networks from overfitting J. Mach. Learn. Res. 15 1929–58
[35] Ioffe S and Szegedy C 2015 Batch normalization: accelerating deep network training by reducing internal covariate shift (arXiv:1502.03167)
[36] Ramaswamy S et al 2015 The neocortical microcircuit collaboration portal: a resource for rat somatosensory cortex Front. Neural Circuits 9 44
[37] Buccino A P and Einevoll G T 2021 MEArec: a fast and customizable testbench simulator for ground-truth extracellular spiking activity Neuroinformatics 19 185–204
[38] Swindale N V and Spacek M A 2014 Spike sorting for polytrodes: a divide and conquer approach Front. Syst. Neurosci. 8 6
[39] Scott D W 2015 Multivariate Density Estimation: Theory, Practice and Visualization (New York: Wiley)
[40] Kingma D P and Ba J 2014 Adam: a method for stochastic optimization (arXiv:1412.6980)
[41] Abadi M et al 2016 TensorFlow: a system for large-scale machine learning 12th USENIX Symp. on Operating Systems Design and Implementation (OSDI 16) pp 265–83
[42] Prechelt L 1998 Early stopping-but when? Neural Networks: Tricks of the Trade (Berlin: Springer) pp 55–69
[43] Rand W M 1971 Objective criteria for the evaluation of clustering methods J. Am. Stat. Assoc. 66 846–50
[44] Wouters J, Kloosterman F and Bertrand A 2019 A data-driven regularization approach for template matching in spike sorting with high-density neural probes 2019 41st Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBC) (IEEE) pp 4376–9
[45] Steinmetz N, Carandini M and Harris K D 2019 ‘Single Phase3’ and ‘Dual Phase3’ Neuropixels datasets
[46] Rodriguez A and Laio A 2014 Clustering by fast search and find of density peaks Science 344 1492–6
[47] Rey H G, Pedreira C and Quiroga R Q 2015 Past, present and future of spike sorting techniques Brain Res. Bull. 119 106–17