brief communications

SC3: consensus clustering of single-cell RNA-seq data

Vladimir Yu Kiselev1, Kristina Kirschner2, Michael T Schaub3,4, Tallulah Andrews1, Andrew Yiu1, Tamir Chandra1,5, Kedar N Natarajan1,6, Wolf Reik1,5,7, Mauricio Barahona8, Anthony R Green2 & Martin Hemberg1

1Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. 2Cambridge Institute for Medical Research, Wellcome Trust/MRC Stem Cell Institute and Department of Haematology, University of Cambridge, Hills Road, Cambridge, UK. 3Department of Mathematics and naXys, University of Namur, Namur, Belgium. 4ICTEAM, Université Catholique de Louvain, Louvain-la-Neuve, Belgium. 5Epigenetics Programme, The Babraham Institute, Babraham, Cambridge, UK. 6EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK. 7Centre for Trophoblast Research, University of Cambridge, Cambridge, UK. 8Department of Mathematics, Imperial College London, London, UK. Correspondence should be addressed to M.H. ([email protected]).
Received 28 November 2016; accepted 1 March 2017; published online 27 March 2017; doi:10.1038/nmeth.4236

nature methods | VOL.14 NO.5 | MAY 2017 | 483

© 2017 Nature America, Inc., part of Springer Nature. All rights reserved.

Single-cell RNA-seq enables the quantitative characterization of cell types based on global transcriptome profiles. We present single-cell consensus clustering (SC3), a user-friendly tool for unsupervised clustering, which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach (http://bioconductor.org/packages/SC3). We demonstrate that SC3 is capable of identifying subclones from the transcriptomes of neoplastic cells collected from patients.

A key advantage of single-cell RNA sequencing (scRNA-seq) is that it can be used to determine cell types in an unbiased way by submitting transcriptomes to unsupervised clustering1–3. A full characterization of the transcriptional landscape of individual cells holds enormous potential for both basic biology and clinical applications. However, de novo identification and characterization of cell types requires robust and accurate computational methods. We have developed SC3, an interactive and user-friendly R package for clustering (Supplementary Software 1 and see http://bioconductor.org/packages/SC3 for the latest version). Its integration with Bioconductor4 and scater5 makes it easy to incorporate into existing workflows.

Each step of the SC3 pipeline (Fig. 1a and Online Methods) requires the user to specify a number of parameters, which can be difficult and time-consuming to optimize. To avoid this problem, SC3 utilizes a parallelization approach whereby a significant subset of the parameter space is evaluated simultaneously to obtain a set of clusterings. SC3 then combines all the different clustering outcomes into a consensus matrix that summarizes how often each pair of cells is located in the same cluster. The final result is determined by complete-linkage hierarchical clustering of the consensus matrix into k groups.

To constrain parameter values in the SC3 pipeline, we first considered six publicly available scRNA-seq datasets6–11 featuring high-confidence cell labels (since they include cells from different stages, conditions or lines) that can be considered gold standards (Fig. 1b and Supplementary Results 1). To quantify the similarity between the reference labels and the clusters obtained by SC3, we used the adjusted Rand index (Online Methods), which ranges from 0 for a level of similarity expected by chance to 1 for identical clusterings. For the gold-standard datasets, we found that the quality of the outcome as measured by the adjusted Rand index was sensitive to the number of eigenvectors, d, retained after spectral transformation (Supplementary Figs. 1 and 2). For all six datasets, we found that the best clusterings were achieved when d was between 4% and 7% of the number of cells, N (Fig. 1c, Supplementary Fig. 3a and Online Methods). The robustness of the 4–7% range was supported by a simulation experiment in which the reads from the six gold-standard datasets were downsampled by a factor of ten (Supplementary Fig. 3a). We further tested the SC3 pipeline on six other published datasets12–17, in which the cell labels can only be considered ‘silver standard’ since they were assigned using computational methods and the authors’ knowledge of the underlying biology. Again, we found that SC3 performed well when using d in the 4–7% of N interval (Supplementary Fig. 3b). The final step, consensus clustering, improved both the accuracy and the stability of the solution. k-means-based methods typically provide different outcomes depending on the initial conditions. We found that this variability was significantly reduced with the consensus approach (Fig. 1d).

To benchmark SC3, we considered five other methods: tSNE18 followed by k-means clustering (t-SNE + k-means; similar to the method used by Grün et al.1), pcaReduce19, SNN-Cliq20, SINCERA21 and SEURAT22. SC3 performed better than the five tested methods across all benchmark datasets (Wilcoxon signed-rank test, P < 0.01), with only a few exceptions (Fig. 2a). In addition to considering accuracy, we also compared the stability of SC3 with other stochastic methods (pcaReduce and tSNE + k-means but not SEURAT) by running them 100 times (Fig. 2a,b and Online Methods). In contrast to the other methods that rely on different initializations, SC3 was highly stable.

Although SC3’s consensus strategy provided high accuracy, it came at a moderate computational cost: the run time for 2,000 cells was ~20 min (Supplementary Fig. 4a). The main bottleneck was the k-means clustering. By reducing how many runs were considered, it was possible to cluster 5,000 cells in ~20 min with only a slight reduction in accuracy (Supplementary Fig. 4b). To apply SC3 to even larger datasets, we implemented a hybrid approach that combines

[Figure 1: image not reproduced. Panel a: schematic of the SC3 pipeline with the stages Input, Gene Filter, Distances (Euclidean, Pearson, Spearman), Transformations (PCA, Laplacian), d range, k-means and Consensus. Panel c: number of solutions with ARI > 95% of max. plotted against d (percent of N). Panel d: ARI of individual and consensus clusterings for each dataset.]

b

Dataset                    k     N       Units
Gold standard
Biase (ref. 6)             3     49      FPKM
Yan (ref. 7)               7     90      RPKM
Goolam (ref. 8)            5     124     CPM
Deng (ref. 9)              10    268     RPKM
Pollen (ref. 10)           11    301     TPM
Kolodziejczyk (ref. 11)    3     704     CPM
Silver standard
Treutlein (ref. 12)        5     80      FPKM
Ting (ref. 13)             7     149     CPM
Patel (ref. 14)            5     430     TPM
Usoskin (ref. 15)          11    622     RPM
Klein (ref. 16)            4     2,717   UMI
Zeisel (ref. 17)           9     3,005   UMI

Figure 1 | The SC3 framework for consensus clustering of scRNA-seq data. (a) Overview of clustering with SC3. Results of the consensus step are
shown for the Treutlein12 data. (b) Published datasets used to set SC3 parameters. N, number of cells; k, number of clusters originally identified by
the authors; RPKM, reads per kilobase of transcript per million mapped reads; RPM, reads per million mapped reads; FPKM, fragments per kilobase of
transcript per million mapped reads; TPM, transcripts per million mapped reads; UMI, unique molecular identifiers; CPM, counts per million mapped
reads. (c) Eigenvector (d) values that achieve adjusted Rand index (ARI) > 0.95 on gold-standard datasets. Black vertical lines indicate the interval
d = 4–7% of N, showing high accuracy in the classification. (d) 100 realizations of the SC3 clustering of the datasets in b. Dots represent individual
clustering runs and bars represent the median. Red and gray correspond to clustering with and without consensus step, respectively. The solid black
line corresponds to ARI = 0.8. The dashed black line separates gold- and silver-standard datasets.
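
The adjusted Rand index (ARI) reported in panels c and d can be illustrated with a short sketch. The code below is a minimal pure-Python version that follows the contingency-table definition given in the Online Methods; the function name `adjusted_rand_index` and the handling of degenerate inputs are assumptions made for illustration and are not taken from the SC3 package.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are needed")

    # Contingency table n_ij, stored sparsely, plus its row sums a_i
    # and column sums b_j.
    cell_counts = Counter(zip(labels_a, labels_b))
    row_sums = Counter(labels_a)
    col_sums = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:
        # Degenerate case, e.g. both labelings form a single cluster
        # (assumed convention: treat as perfect agreement).
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Identical partitions with different cluster names give an ARI of 1.
same = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
# A partition that cuts across the reference gives a negative ARI (about -0.5 here).
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index depends only on which items are grouped together, cluster names do not need to match between the reference labels and the inferred clustering.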

unsupervised and supervised methodologies. SC3 selects a subset of 5,000 cells uniformly at random and obtains clusters from this subset as described above. Subsequently, the inferred labels are used to train a support vector machine, which then assigns labels to the remaining cells (Online Methods). The hybrid approach worked well to predict cell labels (Fig. 2c and Supplementary Fig. 4c). We were able to analyze a Drop-seq dataset with 44,808 cells and 39 clusters22, generating results in good agreement with the original study (Supplementary Results, Supplementary Fig. 5 and Supplementary Table 1). The main drawback of the sampling strategy is that rare cell-types may not be identified, and when the number of cells greatly exceeds 5,000, there is a substantial risk that the sampled distribution will differ significantly from the full distribution (Online Methods). For identifying rare subpopulations (for example, cancer stem cells), methods specifically designed for this purpose, such as RaceID1 or GiniClust23, may be more appropriate.

To help users choose an optimal number of clusters, we have implemented a method based on random matrix theory (RMT)24,25 (Online Methods). Overall, we found good agreement between RMT estimates and cluster numbers suggested by the original authors (Fig. 2b). SC3 is also interactive, allowing users to explore different choices of k in real time by assessing the consensus matrix (Fig. 2d), the silhouette index26 (a measure of how tightly grouped the cells in the clusters are) or the expression matrix.

SC3 can help to interpret the results of clustering by identifying differentially expressed genes, marker genes and outlier cells (Supplementary Fig. 6, Supplementary Table 2 and Online Methods). Marker genes are particularly useful since they can be used to uniquely identify a cluster. To illustrate these features, we analyzed the Deng9 dataset tracing embryonic developmental stages. The most stable result for k = 10 generated clusters that largely agreed with known sampling timepoints (Fig. 2d). We identified ~3,000 marker genes (Supplementary Table 3), many of which had been previously reported as developmental stage-specific27,28 and several of which were stage-specific but had not been previously reported (Supplementary Table 3). Notably, when using published reference labels9, we identified nine cells with high outlier scores (Supplementary Fig. 6c), which turned


[Figure 2: image not reproduced. Panel a: ARI of SC3, tSNE + k-means, pcaReduce, SNN-Cliq, SINCERA and SEURAT for each dataset (Biase, Yan, Goolam, Deng, Pollen1, Pollen2, Kolodziejczyk, Treutlein, Ting, Patel, Usoskin1, Usoskin2, Usoskin3, Klein, Zeisel). Panel c: ARI of the hybrid approach against the percent of the total number of cells in the training set (1, 2, 3, 10, 20, 30, 40, 50) for the Deng, Pollen2, Kolodziejczyk, Patel, Usoskin3, Klein, Zeisel and Macosko datasets. Panel d: consensus matrix (color scale 0–1) for clusters 1–10, annotated by stage: Zygote, Early two-cell, Mid-two-cell, Late two-cell, Four-cell, Eight-cell, 16-cell, Early blastocyst, Mid-blastocyst, Late blastocyst.]

b

                   Estimation of k                   Solution stability
Dataset            Ref   SC3   SINCERA   SNN-Cliq    SC3    pcaReduce   tSNE + k-means
Gold standard
Biase              3     3     5         6           1      0.12        0.82
Yan                7     6     6         11          1      0.53        0.89
Goolam             5     6     4         21          1      0.8         0.57
Deng               10    9     3         20          0.99   0.81        0.54
Pollen             11    11    9         14          0.79   0.76        0.89
                                                     1      0.95        0.88
Kolodziejczyk      3     10    18        2           1      0.87        0.85
Silver standard
Treutlein          5     3     19        3           1      0.68        0.35
Ting               7     10    10        13          1      0.89        0.62
Patel              5     17    10        25          1      0.93        0.44
Usoskin            11    11    11        20          0.93   0.84        0.37
                                                     0.95   0.77        0.39
                                                     0.95   0.74        0.42
Klein              4     18    7         305         1      0.59        0.8
Zeisel             9     30    8         330         0.95   0.89        0.43

Additional stability rows without a dataset name are listed in the order in which they were extracted; they appear to correspond to the dataset variants shown in panel a (Pollen1–2 and Usoskin1–3).

Figure 2 | Benchmarking of SC3 against existing methods. (a) SC3, tSNE + k-means and pcaReduce were applied 100 times to each dataset. SNN-Cliq
and SINCERA are deterministic and were run only once. SEURAT was also run once but was optimized over values of the density parameter G (Online
Methods). Dots represent ARI between inferred clusterings and reference labels; bars correspond to median ARI. The solid black line indicates ARI = 0.8.
The dashed black line separates gold- and silver-standard datasets. (b) The number of clusters k̂ predicted by SC3, SINCERA and SNN-Cliq for all datasets.
Ref, reference clustering reported by the authors. Stability is defined as Nc/100, where Nc is the number of times the most frequent solution was found
from 100 runs. (c) Performance of the SC3 hybrid approach. Dots represent outliers higher (or lower) than the highest (or lowest) value within 1.5× the
interquartile range (IQR). The solid black line indicates ARI = 0.8. The dashed black line in the legend separates gold- and silver-standard datasets.
(d) Consensus matrix as generated by SC3 for the Deng9 dataset, indicating how often each pair of cells was assigned to the same cluster by the
different parameter combinations (1, always; 0, never). Colors at the top represent reference labels corresponding to stages of development.
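
The two quantities defined in this caption, the consensus matrix of panel d and the stability measure Nc/100 of panel b, can be sketched in a few lines. The code below is an illustrative pure-Python version, not the SC3 implementation: the function names are hypothetical, and treating two runs as the same solution when they differ only in cluster names is an assumption, since the text does not say how solutions were compared.

```python
from collections import Counter


def consensus_matrix(clusterings):
    """Fraction of clusterings in which each pair of cells shares a cluster."""
    n_cells = len(clusterings[0])
    totals = [[0] * n_cells for _ in range(n_cells)]
    for labels in clusterings:
        # Binary similarity matrix of one clustering, accumulated in place.
        for i in range(n_cells):
            for j in range(n_cells):
                if labels[i] == labels[j]:
                    totals[i][j] += 1
    # Averaging gives 1 for "always together" and 0 for "never together".
    return [[count / len(clusterings) for count in row] for row in totals]


def canonical(labels):
    """Rename clusters by order of first appearance so that partitions
    differing only in cluster names compare equal (assumed convention)."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def stability(clusterings):
    """Share of runs that produced the most frequent partition (Nc / runs)."""
    counts = Counter(canonical(labels) for labels in clusterings)
    most_frequent_count = counts.most_common(1)[0][1]
    return most_frequent_count / len(clusterings)


# Four toy runs on four cells; the second run only renames the clusters.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
matrix = consensus_matrix(runs)   # matrix[0][1] is 0.75, matrix[2][3] is 1.0
score = stability(runs)           # 0.75: three of the four runs agree
```

With 100 runs, `stability` reduces to the Nc/100 measure of panel b. The nested loops are quadratic in the number of cells, which is adequate for a sketch but not for large datasets.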

out to have been prepared using the Smart-seq2 protocol instead of the Smart-seq protocol9,20.

Finally, we investigated the ability of SC3 to identify subclones based on transcriptomes. Myeloproliferative neoplasms, a group of diseases characterized by the overproduction of terminally differentiated myeloid cells, reflect an early stage of tumorigenesis in which multiple subclones are known to coexist in the same patient29. Myeloproliferative neoplasms are thought to originate from hematopoietic stem cells (HSCs). To gain further insight into the transcriptional landscape of patient-derived HSCs, we obtained scRNA-seq data from two patients (Supplementary Figs. 7 and 8, Supplementary Table 4 and Online Methods). For patient 1 (N = 51), the silhouette index and the RMT method suggested that three clusters were optimal, and SC3 produced three clusters of similar size (Supplementary Fig. 9). For patient 2 (N = 89), SC3 generated a single cluster (Supplementary Fig. 10), in agreement with the RMT algorithm.

Since TET2 and JAK2V617F30,31 are the only loci with known driver mutations in these two patients, we hypothesized that clusters corresponded to clones with different combinations of mutations. The genotype composition of each HSC clone was determined by growing individual HSCs into granulocyte and macrophage colonies, followed by Sanger sequencing of the TET2 and JAK2V617F loci (Supplementary Fig. 7b,c). In agreement with SC3 clustering, patient 1 was found to harbor three different subclones: (i) cells with mutations in both loci, (ii) cells with a TET2 mutation and (iii) wild-type cells. Strikingly, the SC3 clusters contained 22%, 29% and 49% of the cells, respectively, in excellent agreement with the 20%, 30% and 50% found in the patient (Supplementary Fig. 7c). The HSC compartment of patient 2 was 100% mutant for TET2 and JAK2V617F (Supplementary Fig. 7c), again consistent with SC3 clustering (Supplementary Fig. 10).

We then analyzed the pooled cells from patients 1 and 2. SC3 clustering again suggested k = 3 (Fig. 3 and Supplementary Fig. 11), in agreement with the RMT algorithm. Notably, all of the putative double-mutant cells from patient 1 were grouped with the double-mutant cells from patient 2. SC3 reported 33 marker genes for the putative TET2 mutant and 202 marker genes for the putative double mutant clone (Fig. 3 and Supplementary Table 5). Together with additional evidence (Supplementary Results and Supplementary Fig. 12), we conclude that SC3 is able to identify subclones across patients.
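
The silhouette index used above as one guide for choosing k can be illustrated with a small sketch. It follows the general definition of silhouette widths (ref. 26) for a precomputed matrix of pairwise distances; which distances SC3 itself feeds into this calculation is not stated in this section, so the input, the function names and the zero width assigned to single-cell clusters are assumptions for illustration only.

```python
def silhouette_widths(dist, labels):
    """Silhouette width of every item, given pairwise distances and labels."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # assumed convention for single-item clusters
            continue
        # a: mean distance to the other members of the item's own cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        largest = max(a, b)
        widths.append((b - a) / largest if largest > 0 else 0.0)
    return widths


def mean_silhouette(dist, labels):
    """Average silhouette width; values near 1 indicate tight, separated clusters."""
    widths = silhouette_widths(dist, labels)
    return sum(widths) / len(widths)


# Toy example: four points on a line forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
good = mean_silhouette(dist, [0, 0, 1, 1])  # close to 0.9
bad = mean_silhouette(dist, [0, 1, 0, 1])   # negative
```

Comparing the mean width across candidate values of k gives a simple, if rough, indication of which partition groups the cells most tightly.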


[Figure 3: image not reproduced. Heat map of marker-gene expression (color scale 0–14) with cells grouped by cluster. Rows, in the order extracted: SOX4, ABHD8, NEU1, TALDO1, CR1L, CEP135, MLLT3, PRSS57, DGKZ, SYTL1, PACS2, MS4A8B, POLR3H, ACO2, EIF5B, PAX5, CD83, WDFY3, EIF2AK3, MED25, CLDN6, PSME1, EFCAB11, MLL, CYBASC3, DUSP14, SYNRG, PHC1, IRF9, KLF3. Cell annotation legend: Patient1−TET2 + JAK2V617F; Patient1−TET2 + WT JAK2V617F; Patient1−WT TET2 + WT JAK2V617F; Patient2−TET2 + JAK2V617F.]

Figure 3 | SC3 defines subclones from two patients with myeloproliferative neoplasm. Marker-gene expression matrix (after gene filter and log-transformation; see Online Methods) of a combined dataset of patient 1 and patient 2. Clusters (separated by white vertical lines) correspond to k = 3. Only the top 10 marker genes are shown for each cluster. WT, wild type (i.e., no mutation).

Methods
Methods, including statements of data availability and any associated accession codes and references, are available in the online version of the paper.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We thank B. Vangelov, J.-C. Delvenne and R. Lambiotte for fruitful discussions and for their help with computational methods. We also thank D. Flores Santa Cruz, D. Dimitropolou and J. Grinfeld for technical assistance with experiments. We thank I. Vasquez-Garcia, D. Harmin, M. Kosicki, D. Ramsköld and M. Huch for comments on the manuscript. V.Y.K., T.A., A.Y. and M.H. are supported by Wellcome Trust Grants. K.N.N. is supported by the Wellcome Trust Strategic Award ‘Single cell genomics of mouse gastrulation’. M.T.S. acknowledges support from FRS-FNRS; the Belgian Network DYSCO (Dynamical Systems, Control and Optimisation), funded by the Interuniversity Attraction Poles Programme initiated by the Belgian State Science Policy Office; and the ARC (Action de Recherche Concerte) on Mining and Optimization of Big Data Models, funded by the Wallonia-Brussels Federation. M.B. acknowledges support from EPSRC (grant EP/N014529/1). T.C. was funded through a core funded fellowship by the Sanger Institute and a Chancellor's fellowship from the University of Edinburgh. K.K. and A.R.G. are supported by Bloodwise (grant ref. 13003), the Wellcome Trust (grant ref. 104710/Z/14/Z), the Medical Research Council, the Kay Kendall Leukaemia Fund, the Cambridge NIHR Biomedical Research Center, the Cambridge Experimental Cancer Medicine Centre, the Leukemia and Lymphoma Society of America (grant ref. 07037) and a core support grant from the Wellcome Trust and MRC to the Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute. W.R. was supported by BBSRC (grant ref. BB/K010867/1), the Wellcome Trust (grant ref. 095645/Z/11/Z), EU BLUEPRINT and EpiGeneSys.

AUTHOR CONTRIBUTIONS
M.H. conceived the study; V.Y.K., M.H., M.T.S., M.B., T.A. and A.Y. contributed to the computational framework; K.K. and T.C. performed the experiments for the patient data; K.N.N. helped with the analysis of embryonic mouse data; M.B., W.R., A.R.G. and M.H. supervised the research; and V.Y.K. and M.H. led the writing of the manuscript with input from the other authors.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Grün, D. et al. Nature 525, 251–255 (2015).
2. Jaitin, D.A. et al. Science 343, 776–779 (2014).
3. Mahata, B. et al. Cell Rep. 7, 1130–1142 (2014).
4. Gentleman, R.C. et al. Genome Biol. 5, R80 (2004).
5. McCarthy, D.J., Campbell, K.R., Lun, A.T.L. & Wills, Q.F. Bioinformatics https://doi.org/10.1093/bioinformatics/btw777 (2017).
6. Biase, F.H., Cao, X. & Zhong, S. Genome Res. 24, 1787–1796 (2014).
7. Yan, L. et al. Nat. Struct. Mol. Biol. 20, 1131–1139 (2013).
8. Goolam, M. et al. Cell 165, 61–74 (2016).
9. Deng, Q., Ramsköld, D., Reinius, B. & Sandberg, R. Science 343, 193–196 (2014).
10. Pollen, A.A. et al. Nat. Biotechnol. 32, 1053–1058 (2014).
11. Kolodziejczyk, A.A. et al. Cell Stem Cell 17, 471–485 (2015).
12. Treutlein, B. et al. Nature 509, 371–375 (2014).
13. Ting, D.T. et al. Cell Rep. 8, 1905–1918 (2014).
14. Patel, A.P. et al. Science 344, 1396–1401 (2014).
15. Usoskin, D. et al. Nat. Neurosci. 18, 145–153 (2015).
16. Klein, A.M. et al. Cell 161, 1187–1201 (2015).
17. Zeisel, A. et al. Science 347, 1138–1142 (2015).
18. van der Maaten, L. & Hinton, G. J. Mach. Learn. Res. 9, 2579–2605 (2008).
19. Zurauskiene, J. & Yau, C. BMC Bioinformatics http://doi.org/10.1186/s12859-016-0984-y (2016).
20. Xu, C. & Su, Z. Bioinformatics https://doi.org/10.1093/bioinformatics/btv088 (2015).
21. Guo, M., Wang, H., Potter, S.S., Whitsett, J.A. & Xu, Y. PLoS Comput. Biol. 11, e1004575 (2015).
22. Macosko, E.Z. et al. Cell 161, 1202–1214 (2015).
23. Jiang, L., Chen, H., Pinello, L. & Yuan, G.-C. Genome Biol. 17, 144 (2016).
24. Patterson, N., Price, A.L. & Reich, D. PLoS Genet. 2, e190 (2006).
25. Tracy, C.A. & Widom, H. Commun. Math. Phys. 159, 151–174 (1994).
26. Rousseeuw, P.J. J. Comput. Appl. Math. 20, 53–65 (1987).
27. Guo, G. et al. Dev. Cell 18, 675–685 (2010).
28. Boroviak, T. et al. Dev. Cell 35, 366–382 (2015).
29. Chen, E., Staudt, L.M. & Green, A.R. Immunity 36, 529–541 (2012).
30. Ortmann, C.A. et al. N. Engl. J. Med. 372, 601–612 (2015).
31. Nangalia, J. et al. N. Engl. J. Med. 369, 2391–2405 (2013).


ONLINE METHODS

SC3 clustering. SC3 takes as input an expression matrix, M, in which columns correspond to cells and rows correspond to genes/transcripts. Each element of M corresponds to the expression of a gene/transcript in a given cell. By default, SC3 does not carry out any form of normalization or correction for batch effects. SC3 is based on five elementary steps. The parameters in each of these steps can be easily adjusted by the user but are set to sensible default values, determined via the gold-standard datasets (see main text).

1. Gene filter. The gene filter removes genes/transcripts that are either expressed (expression value > 2) in less than X% of cells (rare genes/transcripts) or expressed (expression value > 0) in at least (100 – X)% of cells (ubiquitous genes/transcripts). By default, X is set at 6. The motivation for the gene filter is that ubiquitous and rare genes are most often not informative for clustering.

We also explored all three parameters defined in the gene filter (expression thresholds of rare and ubiquitous genes/transcripts and the percentage X) and found that in general the gene filter did not affect the accuracy of clustering (Supplementary Fig. 3c). However, the gene filter significantly reduced the dimensionality of the data, thereby speeding up the method.

For further analysis, the filtered expression matrix M is log-transformed after adding a pseudocount of 1: $M' = \log_2(M + 1)$.

2. Distance calculations. Distances between the cells (i.e., columns) in M′ are calculated using the Euclidean, Pearson and Spearman metrics to construct distance matrices.

We investigated the impact of dropouts on distance calculations by considering a modified distance metric that ignores dropouts. This was done by excluding genes that were not expressed in at least one cell from the distance calculation. We found that this did not improve the performance (Supplementary Fig. 3d).

3. Transformations. All distance matrices are then transformed using either principal component analysis (PCA) or by calculating the eigenvectors of the associated graph Laplacian ($L = I - D^{-1/2} A D^{-1/2}$, where I is the identity matrix, A is a similarity matrix ($A = e^{-A'/\max(A')}$), where A′ is a distance matrix) and D is the degree matrix of A, a diagonal matrix that contains the row-sums of A on the diagonal ($D_{ii} = \sum_j A_{ij}$). The columns of the resulting matrices are then sorted in ascending order by their corresponding eigenvalues.

4. k-means. k-means clustering is performed on the first d eigenvectors of the transformed distance matrices (Fig. 1a) by using the default kmeans() R function with the Hartigan and Wong algorithm32. By default, the maximum number of iterations is set to $10^9$ and the number of starts is set to 1,000.

5. Consensus clustering. SC3 computes a consensus matrix using the cluster-based similarity partitioning algorithm (CSPA)33. For each individual clustering result, a binary similarity matrix is constructed from the corresponding cell labels: if two cells belong to the same cluster, their similarity is 1; otherwise the similarity is 0 (Fig. 1a). A consensus matrix is calculated by averaging all similarity matrices of individual clusterings. To reduce computational time, if the length of the d range (D in Fig. 1a) is more than 15, a random subset of 15 values selected uniformly from the d range is used.

The resulting consensus matrix is clustered using hierarchical clustering with complete agglomeration, and the clusters are inferred at the k level of hierarchy, where k is defined by the user (Fig. 1a). In principle, the k used for the hierarchical clustering need not be the same as the k used in step 5. However, for simplicity in SC3 the two parameters are constrained to have the same value. Figure 1d shows how the quality and the stability of clustering improves after consensus clustering.

Adjusted Rand index. If cell-labels are available (for example, from a published dataset) the adjusted Rand index (ARI)34 can be used to calculate similarity between the SC3 clustering and the published clustering. ARI is defined as follows. Given a set of n elements and two clusterings of these elements, the overlap between the two clusterings can be summarized in a contingency table, in which each entry denotes the number of objects in common between the two clusterings. The ARI can then be calculated as

$$
\mathrm{ARI} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}
$$

where $n_{ij}$ are values from the contingency table, $a_i$ is the sum of the ith row of the contingency table, $b_j$ is the sum of the jth column of the contingency table and $\binom{\cdot}{\cdot}$ denotes a binomial coefficient. Since the reference labels are known for all published datasets, ARI is used for all comparisons throughout the paper.

Downsampling of the gold-standard datasets. For each gene i and each cell j, the downsampled expression value was generated by drawing from a binomial distribution with parameters P = 0.1 and n = round($M_{ij}$).

Additional validation of SC3 pipeline. Additionally, we investigated the impact of dropouts by considering a modified distance metric that ignores dropouts, but we found that this did not improve the performance (Supplementary Fig. 3d and Online Methods).

Identification of a suitable number of groups k̂. Matrix Z is obtained from M′ by subtracting the mean and dividing by the s.d. for each column (z-score). Next, the eigenvalues of $X = Z^{T}Z$ are calculated. The number of clusters, k̂, is determined by the number of eigenvalues that are significantly different with P < 0.001 from the Tracy–Widom distribution24,25 with mean $(\sqrt{n-1} + \sqrt{p})^2$ and s.d.

$$
(\sqrt{n-1} + \sqrt{p}) \left( \frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}} \right)^{1/3},
$$

where n is the number of genes/transcripts and p is the number of cells.

Benchmarking. For each dataset we used the expression units provided by the authors of that set (Fig. 1b). The gene filter was applied to all the datasets. For tSNE + k-means, SNN-Cliq and pcaReduce, the same log-transformation as in SC3 ($M' = \log_2(M + 1)$) was applied. For SINCERA, we used the original z-score normalization21 instead of the log-transformation. For tSNE, the Rtsne R package was used with the default parameters. For SEURAT, we used the original Seurat R package (version 1.3): we performed tSNE embedding with the default parameters once (following the authors’ tutorial at http://satijalab.org/seurat/seurat_clustering_tutorial_part1.html) and then clustered the data using the DBSCAN algorithm multiple times, during which we varied the density parameter G in the range $10^{-3}$–$10^{3}$ to find a maximal ARI (this ARI is presented in Fig. 2a). SEURAT was not able to find more than one cluster for the smallest datasets (Biase, Yan, Goolam, Treutlein and Ting) leading to very small ARI scores. For all methods we supplied the k used by the original authors.

Cluster stability. We calculated stability of clustering solutions by running each method 100 times and finding the most frequent solution and the number of times (Nc) it appeared. The stability measure shown in Figure 2b is then calculated as Nc/100.

Support vector machines (SVM). When using SVM, a specific fraction of the cells is selected at random with uniform probability. Next, an SVM35 model with a linear kernel is constructed based on the obtained clustering. We used the svm function of the e1071 R package with default parameters. The cluster IDs for the remaining cells are then predicted by the SVM model.

Identification of rare cell-types. To specifically evaluate the sensitivity of SC3 for identifying rare cell-types, we carried out a synthetic experiment in which cells from one cell-type were removed iteratively from the Kolodziejczyk and Pollen datasets. For the Pollen dataset, all but 1–7 of the cells in one of the 11 clusters were removed. The limit of 7 cells corresponds to the size of the smallest cluster in the original data. Subsequently, SC3 was run using k = 11, and we asked whether or not the cells of the rare cell-type were located in a separate cluster. This was repeated 100 times for each cell-type; Supplementary Figure 4d reports the percentage of runs in which the rare cells were found together in a cluster with no other cells. Note that the ARI is a poor indicator of the ability to identify rare cells, since this measure is relatively insensitive to the behavior of a small fraction of the cells. For the Kolodziejczyk dataset, we used a similar strategy, but we allowed for 1–101 cells in the rare group. For the Pollen dataset, SC3 can detect clusters containing ~1% of the cells, whereas for the Kolodziejczyk dataset ~10% of the cells are required (Supplementary Fig. 4d). We hypothesize that the ability to identify rare cells reflects the origins of the two datasets;

Biological insights. SC3 can identify differentially expressed genes as genes that vary between two or more clusters. Accordingly, marker genes are identified as genes that are highly expressed in only one of the clusters and are able to distinguish one cluster from all the remaining ones (Supplementary Fig. 6a). Cell outliers are identified through the calculation of a score for each cell using the minimum covariance determinant36. Cells that fit well into their clusters receive an outlier score of 0, whereas high values indicate that the cell should be considered an outlier.

Identification of differential expression. Differential expression is calculated using the nonparametric Kruskal–Wallis test, an extension of the Mann–Whitney test for tests of more than two groups. The Kruskal–Wallis test has the advantage of being nonparametric, but as a consequence, it is not well suited for situations in which many genes have the same expression value. A significant P-value indicates that gene expression in at least one cluster stochastically dominates one other cluster. SC3 provides a list of all differentially expressed genes with P < 0.01, corrected for multiple testing (using the default ‘holm’ method of the p.adjust() R function), and plots gene expression profiles of the 50 most significant differentially expressed genes. Note that calculating differential expression after clustering can introduce a bias in the distribution of P-values, and thus we advise using the P-values for ranking the genes only.

Identification of marker genes. For each gene, a binary classifier is constructed based on the mean cluster-expression values. The area under the receiver operating characteristic (ROC) curve is used to quantify the accuracy of the prediction. A P-value is assigned to each gene using the Wilcoxon signed-rank test, comparing gene ranks in the cluster with the highest mean expression with all others (P-values are adjusted using the default ‘holm’ method of the p.adjust() R function). Genes with areas under the ROC curve (AUROC) > 0.85 and with P < 0.01 are defined as marker genes. The AUROC threshold corresponds to the 99th percentile of the AUROC distributions obtained from 100 random permutations of cluster labels for all datasets (Supplementary Table 2 and Supplementary Fig. 6b). SC3 provides a visualization of the gene expression profiles for the top 10 marker genes of each obtained cluster.

Cell outlier detection. Outlier cells are detected by first taking an expression matrix of each individual cluster (all cells with the same labels) and reducing its dimensionality using the robust method for PCA (ROBPCA)37. This method outputs a matrix with N rows (number of cells in the cluster) and P columns (retained number of principal components after running ROBPCA). SC3 then uses the P = min(P, 3) first principal components for further analysis. If
Pollen data is more diverse, as it represents 11 different cell lines, ROBPCA fails to perform or P = 0, SC3 shows a warning message.
while the Kolodziejczyk data comes from one cell-type grown in We found (results not shown) that this usually happened when the
three different conditions. distribution of gene expression in cells was too skewed toward 0.
For the hybrid SC3 approach, with 30% of cells used to train Second, robust distances (Mahalanobis) between the cells in each
the SVM, we were able to calculate the probability of including the cluster are calculated from the reduced expression matrix using
rare cell-types in the training set analytically by multiplying the the minimum covariance determinant (MCD)36. We then used a
data from Supplementary Figure 4d by the probability of all rare threshold based on the Q% quantile of the chi-squared distribution
cells to be included in the drawn sample (30% of all cells). This (with p degrees of freedom) to define outliers. By default Q = 99.99,
probability was calculated using the hypergeometric distribution but it can be manually adjusted by a user. Finally, we define an
R function: phyper(n.rare.cells – 1, n.rare.cells, n.other.cells, 0.3 × outlier score as the difference between the square root of the
(n.other.cells + n.rare.cells), lower.tail = F), where n.rare.cells is the robust distance and the square root of the Q% quantile of the
number of rare cells and n.other.cells is the number of other cells chi-squared distribution (with p degrees of freedom). The outlier
in the dataset (Supplementary Fig. 4e). score is plotted as a bar plot (Supplementary Fig. 6c).
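The outlier score defined in the cell outlier detection step can be sketched in base R. This is an illustrative reading of the text, not the SC3 implementation: the function name outlier_score and its arguments are hypothetical, and the sketch assumes that the robust distances are squared Mahalanobis distances (the quantity normally compared against a chi-squared quantile) that have already been computed for the cells of one cluster.

```r
# Illustrative sketch only; names are hypothetical and not part of SC3.
# robust_dist: squared robust Mahalanobis distances for the cells of one
#   cluster (assumed to come from an MCD estimator on the retained PCs).
# n_pcs: number of retained principal components (degrees of freedom p).
# q: the quantile Q written as a probability (Q = 99.99% -> 0.9999).
outlier_score <- function(robust_dist, n_pcs, q = 0.9999) {
  threshold <- qchisq(q, df = n_pcs)
  sqrt(robust_dist) - sqrt(threshold)
}

# Toy distances for five cells; only the last one lies far from the rest.
scores <- outlier_score(c(0.5, 1.2, 2.0, 3.1, 60), n_pcs = 3)
is_outlier <- scores > 0
```

Under this reading, a cell is flagged when its score is positive, that is, when its robust distance exceeds the chosen chi-squared quantile.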

nature methods doi:10.1038/nmeth.4236
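For the hybrid SC3 approach described in the rare cell-type analysis, the hypergeometric calculation can be checked with a small base R sketch. The wrapper prob_all_rare_sampled and the toy cell counts are hypothetical; only the phyper() call mirrors the expression given in the text. Rounding the training-set size to a whole number of cells is an assumption made here for clarity.

```r
# Illustrative sketch; the function name and toy numbers are hypothetical.
prob_all_rare_sampled <- function(n_rare, n_other, train_fraction = 0.3) {
  # Number of cells drawn for SVM training (rounded; an assumption).
  n_drawn <- round(train_fraction * (n_rare + n_other))
  # P(X > n_rare - 1) = P(X = n_rare): every rare cell is in the sample.
  phyper(n_rare - 1, n_rare, n_other, n_drawn, lower.tail = FALSE)
}

# Cross-check against direct counting: 2 rare cells among 10, 3 drawn.
p_phyper <- prob_all_rare_sampled(n_rare = 2, n_other = 8)
p_counting <- choose(8, 1) / choose(10, 3)
```

Multiplying this probability by the detection rate for the corresponding number of rare cells gives the kind of hybrid estimate the text describes.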

Gene and pathway enrichment analysis. We used the g:Profiler web tool³⁸ to perform gene and pathway enrichment analysis in all obtained sets of genes.

Analysis of the Macosko dataset. To analyze the Drop-seq dataset we followed the procedure used by Macosko et al.²² and selected the 11,040 cells in which more than 900 genes were expressed. Moreover, due to the low read depth, the gene filter was removed. We then sampled 5,000 cells and clustered using SC3, including the SVM step, 100 times. All 100 solutions were consistent with each other, resulting in an average ARI of 0.58, and they were sufficiently accurate compared to the reference authors’ clustering, yielding an average ARI of 0.54 (Supplementary Fig. 5a). Since each of the 100 solutions was different, we added a consensus clustering step using the ‘best of k’ consensus algorithm³⁹. This approach provided a single solution based on the 100 different solutions and was as accurate as the individual solutions, with an ARI of 0.52 (the actual labels are presented in Supplementary Table 1). The SC3 consensus solution splits the large original cluster (cluster 24 with 29,400 cells) hierarchically into two clusters of smaller sizes (18,105 + 10,558 = 28,663 cells; clusters 4 and 8 in Supplementary Fig. 5b). Additional gene and pathway enrichment analysis for the differentially expressed genes between the two clusters is presented in Supplementary Table 1. If more than 75% of the cells from the reference cluster were shared with the SC3 cluster, we defined these two clusters as matched. In total, 31 reference clusters were matched to the SC3 clusters.

Patients. Both patients provided written informed consent. Diagnoses were made in accordance with the guidelines of the British Committee for Standards in Haematology.

Isolation of hematopoietic stem and progenitor cells. Cell populations were derived from peripheral blood enriched for hematopoietic stem and progenitor cells (CD34+, CD38–, CD45RA–, CD90+), hereafter referred to as HSCs. For single cell cultures, individual HSCs were sorted into 96-well plates (Supplementary Fig. 7a,b) and grown in a cytokine cocktail designed to promote progenitor expansion as previously described⁴⁰. For scRNA-seq studies, single HSCs were directly sorted into lysis buffer, as described in Picelli et al.⁴¹.

Determination of mutation load. Colonies of granulocyte/macrophage composition were chosen, and DNA was isolated for Sanger sequencing for JAK2V617F and TET2 mutations as previously described by Ortmann et al.³⁰.

Single cell RNA-sequencing. Single HSCs were sorted into 96-well plates and cDNA was generated as described previously⁴¹. The Nextera XT library-making kit was used for library generation as described by Picelli et al.⁴¹.

Processing of scRNA-seq data from HSCs. We sequenced 96 single cell samples per patient with two sequencing lanes per sample, yielding a variable number of reads (mean = 2,180,357; s.d. = 1,342,541). FastQC⁴² was used to assess the sequence quality. Foreign sequences from the Nextera Transposase agent were discovered and subsequently removed with Trimmomatic⁴³, using the parameters HEADCROP:19 ILLUMINACLIP:NexteraPE-PE.fa:2:30:10 TRAILING:28 CROP:90 MINLEN:60 to trim the reads to 90 bases, before mapping with TopHat⁴⁴ to the Ensembl reference genome version GRCh38.77, augmented with the spike-in controls downloaded from the ERCC consortium. Counts of uniquely mapped reads in each protein coding gene and each ERCC spike-in were calculated using SeqMonk (http://www.bioinformatics.bbsrc.ac.uk/projects/seqmonk) and were used for further downstream analysis. Quality control of the cells comprised two steps: (i) filtering cells based on the number of expressed genes and (ii) filtering cells based on the ratio of the total number of ERCC spike-in reads to the total number of reads in protein-encoding genes. Filtering thresholds were manually chosen by visual exploration of the quality control features (Supplementary Fig. 8). After filtering, 51 and 89 cells were retained from patient 1 and patient 2, respectively. The expression values in each dataset were then normalized by first using a size-factor normalization (from DESeq2 package⁴⁵), to account for sequencing depth variability, and then by using a normalization based on ERCC spike-ins, performed using the RUVSeq package⁴⁶ (RUVg() function with parameter k = 1), to account for technical variability. For combined patient data, normalization steps were performed after pooling the cells. The resulting filtered and normalized datasets were clustered by SC3. Potential biases in cell filtering on the proportions of cells in the clusters of patient 1 are considered in Supplementary Results 2. The cluster of lower cell quality was separated from the other biologically meaningful clusters of patient 1 and did not change the total proportion of the biologically meaningful clusters. Supplementary Results 3 shows that SC3 clustering results of patient 1 did not depend on the normalization procedure.

Clustering of patient scRNA-seq data by SC3. We clustered scRNA-seq data from patient 1 and patient 2 separately, as well as a combined dataset containing data from patient 1 + patient 2. For patient 1, in agreement with the RMT algorithm, the best clustering was achieved for k = 3 (Supplementary Fig. 9). Data from patient 2 was homogeneous, and SC3 was unable to identify more than one meaningful cluster (Supplementary Fig. 10), again in agreement with the RMT algorithm. For the combined dataset for patient 1 + patient 2, the best values of the silhouette index were obtained when k was 2 or 3 (Supplementary Fig. 11). In both cases, all of the cells from cluster 1 in patient 1 were grouped with the cells from patient 2. For k = 3, clusters 2 and 3 of patient 1 were also resolved (Fig. 3). The RMT algorithm also provided k = 3 for the merged patient 1 + patient 2 dataset.

Comparison of clustering of patient 1 scRNA-seq data. Results of the clustering of patient 1 data by other methods and their comparisons with SC3 are presented in Supplementary Results 4 and 5.

Identification of differentially expressed genes from microarray data. The microarray data of patient 1 was obtained from ArrayExpress, under accession number E-MTAB-3086 (ref. 30). One replicate (2B) was identified as an outlier and removed. The ‘limma’ R package⁴⁷ was used to identify 932 differentially expressed genes between WT and TET2/JAK2V617F double-mutants using an adjusted (by false discovery rate) P-value threshold of 0.1.

Marker genes analysis for patients. For both patients, to increase the number of marker genes, the AUROC threshold was set to 0.7 instead of the default value of 0.85 and the P-value threshold was set at 0.1.

Data availability. All datasets (in Fig. 1b and the Macosko dataset) were acquired from the accession numbers provided in the original publications. According to their respective authors, the


Pollen dataset contains two distinct hierarchies and the cells can be grouped either into 4 or 11 clusters, and the Usoskin dataset contains three hierarchies and the cells can be grouped either into 4, 8 or 11 clusters. scRNA-seq data for patient 1 and 2 is available from GEO under accession code GSE79102. Source data files for Figures 1–3, and Supplementary Figures 1–7 and 12 are available online.

Software availability. SC3 is available as an R package at http://bioconductor.org/packages/SC3/. Scripts for figure generation are available at http://github.com/hemberg-lab/SC3-paper-figures. At the time of writing the manuscript, the following old versions of some of the tools were used (these tools have been updated/upgraded since then):
• SC3 (1.1.2 ≤ Version < 1.1.5). These versions of SC3 can be installed:
  • from source/binary files from Bioconductor http://bioconductor.org/packages/3.3/bioc/html/SC3.html
  • from GitHub using commands:
    ■ install.packages("devtools")
    ■ devtools::install_github("hemberg-lab/SC3", ref = "8a86b60463")
  º SC3 v.1.1.2 source and DESCRIPTION files can be found in Supplementary Software 1.
  º In the newer versions, the main SC3 pipeline has not been changed.
• SEURAT (version 1.3), which can be installed from GitHub:
  º install.packages("devtools")
  º devtools::install_github("satijalab/Seurat", ref = "da6cd08")
  º In the newer versions of SEURAT, a different algorithm is used for clustering.
• Source files used for generating Supplementary Results 2–5 can be found in Supplementary Software 2.

32. Hartigan, J.A. & Wong, M.A. J. R. Stat. Soc. Ser. C Appl. Stat. 28, 100–108 (1979).
33. Strehl, A. & Ghosh, J. J. Mach. Learn. Res. 3, 583–617 (2003).
34. Hubert, L. & Arabie, P. J. Classif. 2, 193–218 (1985).
35. Ben-Hur, A., Horn, D., Siegelmann, H.T. & Vapnik, V. J. Mach. Learn. Res. 2, 125–137 (2001).
36. Hubert, M. & Debruyne, M. WIREs Comp Stat 2, 36–43 (2010).
37. Hubert, M., Rousseeuw, P.J. & Branden, K.V. Technometrics 47, 64–79 (2005).
38. Reimand, J. et al. Nucleic Acids Res. 44, W83–W89 (2016).
39. Goder, A. & Filkov, V. Consensus clustering algorithms: comparison and refinement. in Proceedings of the Meeting on Algorithm Engineering & Experiments 109–117 (Society for Industrial and Applied Mathematics, 2008).
40. Petzer, A.L., Zandstra, P.W., Piret, J.M. & Eaves, C.J. J. Exp. Med. 183, 2551–2558 (1996).
41. Picelli, S. et al. Nat. Protoc. 9, 171–181 (2014).
42. Andrews, S. FastQC: A quality control tool for high throughput sequence data. Reference Source (2010).
43. Bolger, A.M., Lohse, M. & Usadel, B. Bioinformatics 30, 2114–2120 (2014).
44. Trapnell, C., Pachter, L. & Salzberg, S.L. Bioinformatics 25, 1105–1111 (2009).
45. Love, M.I., Huber, W. & Anders, S. Genome Biol. 15, 550 (2014).
46. Risso, D., Ngai, J., Speed, T.P. & Dudoit, S. Nat. Biotechnol. 32, 896–902 (2014).
47. Ritchie, M.E. et al. Nucleic Acids Res. 43, e47 (2015).

nature methods | VOL.14 NO.5 | MAY 2017 | 483
