
Learning to Sort: Few-shot Spike Sorting with Adversarial Representation Learning

Tong Wu1∗, Anikó Rátkai2, Katalin Schlett2, László Grand3, and Zhi Yang1

Abstract— Spike sorting has long been used to obtain the activities of single neurons from multi-unit recordings by extracting spikes from continuous data and assigning them to putative neurons. A large body of spike sorting algorithms has been developed that typically project spikes into a low-dimensional feature space and cluster them through iterative computations. However, no consensus has been reached on the optimal feature space or the best way of segmenting spikes into clusters, which often leads to the requirement of human intervention. It is hence desirable to utilize human knowledge in spike sorting effectively and efficiently while keeping manual intervention to a minimum. Furthermore, the iterative computations commonly involved in clustering are inherently slow and hinder real-time processing of large-scale recordings. In this paper, we propose a novel few-shot spike sorting paradigm that employs a deep adversarial representation neural network to learn from a handful of annotated spikes and robustly classify unseen spikes sharing similar properties with the labeled ones. Once trained, the deep neural network implements a parametric function that analytically encodes the categorical distribution of spike clusters; it can be significantly accelerated by GPUs and can support processing hundreds of thousands of recording channels in real time. The paradigm also includes a clustering routine termed DidacticSort to aid users in labeling the spikes that will be used to train the deep neural network. We have validated the performance of the proposed paradigm with both synthetic and in vitro datasets.

I. INTRODUCTION

Understanding the coordinated activity underlying brain computations requires large-scale, simultaneous electrophysiological recordings from distributed neuronal structures at cellular-level resolution. A key step in interpreting multi-unit neural activity is spike sorting, the process of detecting spiking events from continuously sampled extracellular voltages and assigning the events to their originating neurons.

Recent spike sorting pipelines are mainly based on three techniques (often a mixture of them): template matching [1], [2], [3], density-based clustering [2], [4], [5], and model-based clustering [6], [3]. Template matching assumes that extracellularly recorded signals can be decomposed into a weighted sum of spike templates plus noise; identifying and clustering spikes usually requires solving a customized optimization problem through iterative computations to infer spike times and waveforms. In density-based clustering, spikes are grouped into regions of higher density than the rest of the feature space, requiring little or no prior knowledge about the data. In practice, the number of clusters is often sensitive to the parameter that determines the neighborhood radius used to estimate densities. Furthermore, calculating the distance between every pair of data points in each iteration can drastically slow down the processing. Model-based clustering classifies spikes by fitting assumed model structures (e.g., a Gaussian mixture model) to the empirical data distribution. It also allows incorporating prior information or assumptions into the modeling. However, the learning and inference of such probabilistic models often incur high computational costs.

In addition to the increasing demand for computational power, one important commonality of recent spike sorting pipelines is the leveraging of human knowledge to improve sorting performance. Human supervision is mainly necessitated by the lack of ground-truth information and by the fact that even well-designed spike sorting pipelines cannot exhaustively cover all possible situations. For example, in [3], an interactive clustering is designed to allow users to merge or split clusters for model re-fitting. In general, human supervision is used as part of the post-processing in spike sorting pipelines to correct erroneous or suboptimal decisions made by the automated heuristic routines. This "cluster-then-refine" arrangement has limitations: 1) it requires a suitable feature space for visualizing and manipulating spikes, which is difficult to design; 2) the human supervision can only happen after the collection and automated clustering of a large number of spikes, making it less suitable for online decoding experiments that require minimal processing delay.

In this work, we propose a few-shot spike sorting (FSSS) model that can learn to sort from a small number of labeled spikes, thereby allowing the model to imitate the way human operators sort spikes and to avoid problematic clustering decisions that might be made by automated routines. This few-shot learning capability is achieved primarily through an adversarial representation learning process inspired by semi-supervised learning theory from the machine learning community [7], [8]. The proposed model is built upon a deep neural network with no recurrent structures and can therefore be significantly accelerated by dedicated hardware such as GPUs. In addition to FSSS, we design a lightweight clustering routine termed DidacticSort to aid users in sorting spikes semi-manually.

1 T. Wu (corresponding author) and Z. Yang are with the Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA. Emails: {wuxx1521, yang5029}@umn.edu
2 A. Rátkai and K. Schlett are with the Neuronal Cell Biology Research Group, Department of Physiology and Neurobiology, Eötvös University, Budapest, Hungary.
3 L. Grand is with the Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary; the Neurology and Neurosurgery Department, Johns Hopkins University, Baltimore, MD 21205, USA; and APPERCELL Biotech Ltd., Budapest, Hungary. Email: [email protected]

978-1-5386-1311-5/19/$31.00 ©2019 IEEE 713


The rest of the paper is organized as follows. Section 2 describes the algorithmic details of both DidacticSort and FSSS. Section 3 presents and discusses the experimental results. Section 4 concludes the paper.

II. ALGORITHMS

A. DidacticSort

Spikes are aligned to their absolute peaks. All spikes (to be manually labeled) are filtered by a Hanning window to attenuate the spike waveforms on the two sides of the peak, with stronger attenuation farther away from the peak. This is motivated by the observation that spikes belonging to different clusters can be mainly differentiated by the shapes of depolarization and repolarization; the rest of the waveform is more random and susceptible to noise, and should be given less importance.

Algorithm 1: DidacticSort
    Input: Spike events X_{N×p}
    Output: Number of clusters k, spike IDs I_{1:N}, spike templates S_{k×p}
    Constants: k_min, k_max, distance threshold dist_thr
    begin
        X̃ ← HanningFilter(X)
        repeat
            k ← 1, I_{1:N} ← 1, S ← X̃_{(1,:)}
            Set dist_thr manually
            for i ← 2 to N do
                x ← X̃_{(i,:)}
                d_{1:k} ← L2_dist(x, S)
                if min(d_{1:k}) > dist_thr then
                    k ← k + 1, I(i) ← k
                    S ← concatenate(S, x)
                else
                    min_loc ← argmin(d_{1:k})
                    I(i) ← min_loc
                    S_{(min_loc,:)} ← mean(X̃_{(I==min_loc,:)})
                end
            end
        until k_min ≤ k ≤ k_max
    end

In DidacticSort, a spike is first compared with all spike templates (initialized as the first spike) to find the shortest Euclidean distance. If the distance is greater than a threshold dist_thr, the spike is considered the first member of a new cluster and also initializes the corresponding template; otherwise, the spike is assigned to the closest cluster, and that spike template is updated by taking the average of all assigned spikes. After processing all spikes, it is up to the user to evaluate the quality of the clustering and decide which clusters to keep, or to start over with a different dist_thr. The details of the algorithm are given in Algorithm 1.

B. Few-shot spike sorting (FSSS)

At the core of the proposed model is an autoencoder [9]. The operation of the autoencoder can be described as x′ = g_s(g_a(x; φ); θ), where x and x′ are the input and output data, and g_a and g_s denote the encoder and decoder, respectively, parameterized by φ and θ.

In this work, the encoder consists of an input layer with 64 1×1 convolutional (conv) filters, followed by 2 residual blocks with bottlenecks, each having 256 1×3 conv filters, and two dense layers with 512 neurons that output 1) a one-hot vector y through a softmax layer that predicts cluster labels, and 2) a hidden code z representing style information of spikes that is supposedly irrespective of labels. The decoder takes both y and z as inputs and is practically a replication of the encoder in reverse order to reconstruct the original spikes x. Our objective is to disentangle y from z such that we can use the limited label information to regularize and derive the posterior distribution q(y|x).

To do that, we resort to generative adversarial networks (GAN), a framework that establishes a min-max adversarial game between a generative model G and a discriminative model D [10]. G generates data from random samples z subject to a prior p(z), and D estimates the probability that a sample comes from the actual data distribution x ∼ p_data instead of from G. The purpose is to train G to maximize the probability of D making mistakes:

    min_G max_D  E_{x∼p_data}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))].

In our context, the encoder q(y, z|x) is the generator G that maps spikes x to the label and style information. Two discriminative networks, D_y and D_z, are designed to differentiate y and z from true samples drawn from a categorical distribution and a d-dimensional Gaussian distribution, respectively. Both D_y and D_z are implemented as two dense layers, each having 512 neurons.

Algorithm 2: FSSS
    Input: Labeled spikes X̂_{N1×p} and their IDs Î_{1:N1}, unlabeled spikes X_{N2×p}
    Output: IDs of unlabeled spikes I_{1:N2}
    begin
        repeat
            // Reconstruction
            Sample X_{n2×p} randomly from X_{N2×p}
            X′_{n2×p} ← g_s(g_a(X_{n2×p}; φ); θ)
            L_recon ← Smooth_L1_dist(X′_{n2×p}, X_{n2×p})
            Update g_a and g_s with ∇_φ L_recon and ∇_θ L_recon
            // Regularization
            Draw Z_{n2×d} ∼ N(0, I) and Y_{1:n2} ∼ Cat(y)
            Z′_{n2×d}, Y′_{1:n2} ← g_a(X_{n2×p}; φ)
            L_gauss ← −mean(log(D_z(Z_{n2×d})) + log(1 − D_z(Z′_{n2×d})))
            L_cat ← −mean(log(D_y(Y_{1:n2})) + log(1 − D_y(Y′_{1:n2})))
            Update D_z and D_y with ∇L_gauss and ∇L_cat
            L_reg ← −mean(log(D_z(Z′_{n2×d}))) − mean(log(D_y(Y′_{1:n2})))
            Update g_a with ∇_φ L_reg
            // Semi-supervision
            Sample X̂_{n1×p} randomly from X̂_{N1×p}
            Ẑ′_{n1×d}, Ŷ′_{1:n1} ← g_a(X̂_{n1×p}; φ)
            Update g_a with ∇_φ Cross_Entropy(Ŷ′_{1:n1}, Î_{1:n1})
        until Ŷ′_{1:N1} ≈ Î_{1:N1}
        Z_{N2×d}, I_{1:N2} ← g_a(X_{N2×p}; φ)
    end

A detailed description of FSSS is given in Algorithm 2 and explained as follows. First, the autoencoder is trained

with an unlabeled mini-batch to update the encoder q(y, z|x) and the decoder by minimizing the reconstruction error measured in smoothed L1 distance. Second, D_y and D_z are updated by differentiating true samples generated by the categorical and Gaussian distributions from the fake samples generated by the encoder (the generator G). The event probabilities of Cat(y) are determined by the proportions of the spike counts of the clusters selected in DidacticSort. This is followed by updating the encoder to confuse the discriminative networks. Finally, the posterior q(y|x) is updated by minimizing the cross-entropy cost on the mini-batch of spikes labeled by DidacticSort.

TABLE I
SPIKE SORTING PERFORMANCE ON C_Difficult1.

                     50 spikes   100 spikes   200 spikes
Noise_005 (3383)     97.9%       97.9%        98.2%
Noise_01  (3448)     94.7%       94.9%        95.7%
Noise_015 (3472)     83.5%       89.7%        93.0%
Noise_02  (3414)     66.4%       75.9%        85.5%

[Figure: 18 sub-panels of overlaid spike waveforms, Cluster 1 (155) through Cluster 18 (1); vertical axes in µV.]
Fig. 1. DidacticSort on 300 spikes from MEA data (dist_thr = 4). The numbers in parentheses are the spike count of each cluster. Each spike contains 48 samples. The vertical axis of each sub-figure is in µV. Ten clusters (1, 2, 3, 4, 5, 6, 8, 9, 11, 12) are kept.
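To make the procedure behind Fig. 1 concrete, the single-pass template loop of Algorithm 1 can be sketched in Python as below. This is an illustrative sketch, not the authors' implementation: the outer repeat-until-k-is-acceptable restart with a manually retuned dist_thr is omitted, and the function name and data layout are our own assumptions.

```python
import numpy as np

def didactic_sort(spikes, dist_thr):
    """Single pass of Algorithm 1 (DidacticSort) over peak-aligned spikes.

    spikes: (N, p) array of peak-aligned spike waveforms.
    Returns 0-based cluster IDs and the running templates (k, p).
    """
    # Hanning window de-emphasizes samples far from the central peak.
    filtered = spikes * np.hanning(spikes.shape[1])

    templates = [filtered[0].copy()]          # first spike seeds cluster 0
    ids = np.zeros(len(spikes), dtype=int)
    members = {0: [0]}                        # cluster -> member indices

    for i in range(1, len(spikes)):
        x = filtered[i]
        dists = [np.linalg.norm(x - t) for t in templates]
        nearest = int(np.argmin(dists))
        if dists[nearest] > dist_thr:
            # Too far from every template: start a new cluster.
            templates.append(x.copy())
            ids[i] = len(templates) - 1
            members[ids[i]] = [i]
        else:
            # Assign to the nearest cluster and refresh its template
            # as the mean of all assigned (filtered) spikes.
            ids[i] = nearest
            members[nearest].append(i)
            templates[nearest] = filtered[members[nearest]].mean(axis=0)
    return ids, np.stack(templates)
```

Lowering dist_thr splits clusters more aggressively; in Algorithm 1 this pass is wrapped in a repeat loop until the cluster count falls within [k_min, k_max].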
III. RESULTS

A. Sorting synthetic neural data

The synthetic dataset we used is Wave_Clus from the University of Leicester [11]. The dataset is generated by inserting multiple instances of several spike templates into continuous background noise of various levels, thus realizing different signal-to-noise ratios (SNRs). We picked four sequences, C_Difficult1 with noise levels from 0.005 to 0.02. Spikes were extracted using the ground-truth timestamps included in the datasets, with 64 samples per spike. As the spike labels are known a priori, we skipped DidacticSort on the selected Wave_Clus data. For each data sequence, we selected an increasing number of labeled spikes (50, 100, and 200) randomly from all extracted spikes, and sorted the remaining spikes using FSSS.

Performance measured in classification accuracy is given in Table I. The numbers following the noise levels indicate the total number of spikes in each sequence. The results illustrate that the proposed model can successfully leverage the limited label information (no more than 5% of the spikes) to improve sorting performance, especially on low-SNR data. For example, on Noise_02, an additional 150 labeled spikes lead to an almost 20% increase in classification accuracy, facilitated by the few-shot adversarial representation learning.

B. Sorting neural data recorded with a multielectrode array

We also tested the proposed model on in vitro data recorded with a multielectrode array (MEA). Mouse hippocampal neurons were cultivated on transparent commercial MEA arrays. Neurophysiological recordings were carried out on DIV11 and DIV12 with a 20 kHz sampling rate. The MEA arrays were mounted on recording hardware located outside of the incubator. Recordings were limited to 10 minutes to prevent pH changes.

We selected one channel with active spiking activity from a 10-minute recording for demonstration. 3064 spikes were detected from this data sequence using the median-based spike detection method defined in [12], with 48 samples per spike. Figure 1 shows the 18 clusters found by DidacticSort from 300 randomly chosen spikes with dist_thr set to 4. We chose the 10 clusters (292 spikes in total) that contain 5 spikes or more, and kept these labeled spikes for the next processing step. The criterion for choosing clusters is at the user's discretion. The proportions of the spike counts of these clusters are used to initialize the event probabilities of the categorical distribution in FSSS, which can in practice accelerate convergence of model training compared with a uniform initialization.

Next, we trained the deep learning model of FSSS. We found the resulting model quite robust to the latent dimension of the autoencoder, which can be any value from 3 to 10. With ∼300 labeled spikes, training typically requires around 2 minutes on a server with one Intel i7-6800K, one GeForce Titan Xp 12 GB, and 32 GB of memory. Deployed in inference mode, the model can sort spikes at a speed of over 200,000 spikes/second on one GPU, which could support hundreds of thousands of channels at the same time and facilitate large-scale neural signal processing.

Figure 2(a) shows the ten clusters found by FSSS from all the unlabeled spikes, which correspond to the sample clusters in Figure 1 identified by DidacticSort. Some of the sample clusters contain very few spikes, yet FSSS can reliably recognize and classify similar unseen spikes by exploiting the limited supervised information. It should be noted that this is fundamentally different from template matching, in which a new spike is exhaustively compared with every template to find the nearest cluster; instead, FSSS learns a parametric function that characterizes the statistical distributions of the clusters, thereby encoding the label information of spikes analytically. It should also be noted that there are a few outliers in some of the clusters, which mostly came from the spikes left out by DidacticSort. Given the rare occurrence of the outliers, their impact on downstream tasks such as decoding is minimal. Figure 2(b) shows the spike templates of the 10 clusters.

[Figure: (a) ten sub-panels of sorted spike waveforms; (b) overlaid spike templates. Vertical axes in µV, horizontal axes in samples.]
Fig. 2. (a) Ten sorted clusters on all the unlabeled spikes processed by FSSS. The vertical axis of each sub-figure is in µV. A few clusters contain several outliers that are not labeled by DidacticSort. (b) Spike templates of the ten clusters.

IV. CONCLUSION

Spike sorting is an important step in deciphering multi-unit spike trains and understanding the mechanisms by which neurons communicate with each other. Existing spike sorting methods have several drawbacks. First, designing a suitable feature space proves difficult, resulting in suboptimal or even erroneous classification. Second, clustering based on iterative optimization is inherently slow and by nature offline. Third, human intervention is often required to curate the automated sorting results.

In this paper, we propose a new spike sorting paradigm that consists of two algorithms, DidacticSort and FSSS. DidacticSort is a simple classification routine with only one tunable parameter that can quickly generate candidate clusters from a small number of spike samples. It allows incorporating human knowledge without imposing excessive burdens on users. FSSS can learn from a small number of labeled samples and generalize the learned knowledge to many unseen events for unsupervised clustering. Combined, FSSS can imitate the way DidacticSort clusters spikes, which encompasses the human operator's decision making. The proposed paradigm brings several useful features to the development of spike sorting: 1) human knowledge can be better utilized as guidance (prior instructive information) instead of mere intervention (post-processing) to achieve more reasonable spike sorting results; 2) FSSS can learn a parametric function that encodes the categorical distribution of spike clusters analytically, and can thus avoid iterative computations and be easily accelerated by GPUs to facilitate online, large-scale neural signal processing in real time; 3) the paradigm only requires a small number of spikes for labeling and model training, and can perform robustly on large amounts of unseen data.

ACKNOWLEDGMENT

Research at Eötvös University was supported by the National Brain Research Programs (KTIA NAP 13-2-2014-0018 and 2017-1.2.1-NKP-2017-00002) provided by NRDIO, Hungary.

REFERENCES

[1] M. Pachitariu, N. A. Steinmetz, S. N. Kadir, M. Carandini, and K. D. Harris, "Fast and accurate spike sorting of high-channel count probes with Kilosort," in Advances in Neural Information Processing Systems, 2016, pp. 4448–4456.
[2] P. Yger, G. L. Spampinato, E. Esposito, B. Lefebvre, S. Deny, C. Gardella, M. Stimberg, F. Jetter, G. Zeck, S. Picaud, et al., "A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo," eLife, vol. 7, p. e34518, 2018.
[3] K. Q. Shan, E. V. Lubenov, and A. G. Siapas, "Model-based spike sorting with a mixture of drifting t-distributions," Journal of Neuroscience Methods, vol. 288, pp. 82–98, 2017.
[4] G. Hilgen, M. Sorbaro, S. Pirmoradian, J.-O. Muthmann, I. E. Kepiro, S. Ullo, C. J. Ramirez, A. P. Encinas, A. Maccione, L. Berdondini, et al., "Unsupervised spike sorting for large-scale, high-density multielectrode arrays," Cell Reports, vol. 18, no. 10, pp. 2521–2532, 2017.
[5] J. E. Chung, J. F. Magland, A. H. Barnett, V. M. Tolosa, A. C. Tooker, K. Y. Lee, K. G. Shah, S. H. Felix, L. M. Frank, and L. F. Greengard, "A fully automated approach to spike sorting," Neuron, vol. 95, no. 6, pp. 1381–1394, 2017.
[6] J. H. Lee, D. E. Carlson, H. S. Razaghi, W. Yao, G. A. Goetz, E. Hagen, E. Batty, E. Chichilnisky, G. T. Einevoll, and L. Paninski, "YASS: Yet another spike sorter," in Advances in Neural Information Processing Systems, 2017, pp. 4002–4012.
[7] A. Makhzani, J. Shlens, N. Jaitly, and I. Goodfellow, "Adversarial autoencoders," in International Conference on Learning Representations, 2016. [Online]. Available: http://arxiv.org/abs/1511.05644
[8] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, "Learning to learn by gradient descent by gradient descent," in Advances in Neural Information Processing Systems, 2016, pp. 3981–3989.
[9] Y. Bengio et al., "Learning deep architectures for AI," Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, 2009.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[11] R. Q. Quiroga, Z. Nadasdy, and Y. Ben-Shaul, "Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering," Neural Computation, vol. 16, pp. 1661–1687, 2004.
[12] J. T. Springenberg, "Unsupervised and semi-supervised learning with categorical generative adversarial networks," arXiv preprint arXiv:1511.06390, 2015.
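To connect Algorithm 2 to code, its loss terms can be written out in NumPy as below. This is an illustrative sketch under our own naming, not the authors' implementation: the discriminator and encoder outputs are stand-in probability arrays, and smooth_l1 uses the common beta = 1 form of the smoothed L1 distance.

```python
import numpy as np

def smooth_l1(x_rec, x):
    """L_recon: smoothed L1 distance between input and reconstructed spikes."""
    d = np.abs(x_rec - x)
    return np.mean(np.where(d < 1.0, 0.5 * d**2, d - 0.5))

def discriminator_loss(d_real, d_fake):
    """L_gauss / L_cat: train D_z or D_y to tell true draws (Gaussian or
    categorical samples) from the encoder's codes. Inputs are the
    discriminator's probability outputs on real and fake batches."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_reg_loss(dz_fake, dy_fake):
    """L_reg: update the encoder to confuse both discriminators."""
    return -np.mean(np.log(dz_fake)) - np.mean(np.log(dy_fake))

def cross_entropy(y_pred, y_true_ids):
    """Semi-supervised loss on the few labeled spikes; y_pred is the
    (n, k) softmax output, y_true_ids the integer cluster IDs."""
    n = len(y_true_ids)
    return -np.mean(np.log(y_pred[np.arange(n), y_true_ids]))
```

In Algorithm 2, discriminator_loss corresponds to L_gauss and L_cat (true draws as d_real, encoder outputs as d_fake), generator_reg_loss to L_reg, and cross_entropy to the semi-supervised update on the labeled mini-batch.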
