scRNAseq_clustering_Asa_Bjorklund_2021
Åsa Björklund
Cell identity
• Hypotheses:
– What is a cell type? What cell types are in my tissue?
– What is the number of clusters k?
• Choices:
– Gene set selection
– Similarity measure / Space to calculate similarity
– Algorithm and hyperparameters of that algorithm.
• Different choices lead to different results. Validate,
interpret and repeat steps.
What is clustering?
• Group samples such that there is structure, i.e. when:
1) Samples within a cluster resemble each other
(small within variance, σW(i))
2) Clusters deviate from each other
(large between variance, σB)
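To make the two quantities concrete, here is a minimal sketch on a toy one-dimensional data set. The data values and the function names (`within_variance`, `between_variance`) are illustrative assumptions, not part of the slides; the between variance is computed here as the size-weighted spread of cluster means around the grand mean, which is one common definition.

```python
def mean(values):
    return sum(values) / len(values)

def within_variance(cluster):
    """Mean squared deviation of the samples from their own cluster mean."""
    m = mean(cluster)
    return sum((x - m) ** 2 for x in cluster) / len(cluster)

def between_variance(clusters):
    """Size-weighted mean squared deviation of cluster means from the grand mean."""
    all_values = [x for c in clusters for x in c]
    grand = mean(all_values)
    return sum(len(c) * (mean(c) - grand) ** 2 for c in clusters) / len(all_values)

# Hypothetical toy data: two tight, well-separated groups.
clusters = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]
sigma_w = [within_variance(c) for c in clusters]  # one value per cluster i
sigma_b = between_variance(clusters)

print("within variance per cluster:", sigma_w)
print("between variance:", sigma_b)
```

In this toy case the between variance is much larger than either within variance, which is the situation a good clustering aims for.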
Hierarchical clustering
http://www.slideshare.net/uzairjavedsiddiqui/malhotra20
• Ward (minimum variance method). Similarity of two clusters is
based on the increase in squared error when two clusters are
merged.
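The Ward criterion can be illustrated with a short sketch that computes the increase in squared error for a candidate merge, both directly and with the usual closed form (cluster sizes times the squared centroid distance, divided by the combined size). The points and helper names below are hypothetical; this shows only the merge cost, not a full hierarchical clustering.

```python
def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = centroid(points)
    return sum(sum((p[d] - c[d]) ** 2 for d in range(len(c))) for p in points)

def ward_cost(a, b):
    """Increase in total squared error if clusters a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cost_closed_form(a, b):
    """Same quantity: n_a * n_b / (n_a + n_b) * squared centroid distance."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

# Hypothetical clusters: a and b are close, c is far from both.
a = [[0.0, 0.0], [0.0, 2.0]]
b = [[4.0, 0.0], [4.0, 2.0]]
c = [[0.0, 10.0], [2.0, 10.0]]

print("cost of merging a and b:", ward_cost(a, b))
print("cost of merging a and c:", ward_cost(a, c))
```

At each step, Ward linkage would merge the pair of clusters with the smallest such cost, here a and b.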
K-means clustering
1. Start with a random selection of cluster centers
(centroids).
2. Assign each data point to the nearest centroid.
3. Recalculate the centroids for the new cluster
definitions.
4. Repeat steps 2-3 until no more changes occur.
Can use the same distance measures as in hclust, although
standard k-means assumes Euclidean distance.
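The four steps can be sketched in plain Python as follows. This is a minimal illustration using squared Euclidean distance, not the implementation of any particular package; the function name `kmeans`, the `seed` and `max_iter` parameters and the toy points are assumptions made for the example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop when the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical toy data: two well-separated groups of three points.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because the starting centroids are random, different seeds can give different label numbering, and on less clearly separated data also different partitions.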
https://en.wikipedia.org/wiki/K-means_clustering
Network/graph clustering
• Node/Vertex
• Community
• Edge (can be weighted & directed)
• Hubs
• Connectivity: # of edges
(http://www.lyonwj.com/2016/06/26/graph-of-thrones-neo4j-social-network-analysis/)
Types of graphs
• FindNeighbors:
– First construct a KNN (k-nearest neighbor) graph – default is based on
the Euclidean distance in PCA space
– Then construct an SNN (shared nearest neighbor) graph: the edge weight
between any two cells is based on the shared overlap in their local
neighborhoods (Jaccard index), and distant edges are pruned.
• Important parameters:
– reduction: default is “pca”
– dims: number of PCs
– k.param: Number of neighbors in KNN graph
– prune.SNN: Jaccard index cutoff; edges at or below it are pruned
(http://satijalab.org/seurat/)
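The KNN-then-SNN construction described above can be sketched in plain Python. This is an illustration of the idea only, not Seurat's code: the function names, the toy coordinates standing in for cells in PCA space, the choice to count each cell among its own neighbors, and the 1/15 default cutoff are all assumptions made for the example.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest points.
    The point itself (distance 0) is counted as a neighbor here."""
    neighbors = []
    for p in points:
        order = sorted(range(len(points)),
                       key=lambda j: squared_distance(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors

def snn_graph(points, k, prune=1 / 15):
    """Edge weight = Jaccard index of the two neighbor sets;
    edges with weight <= prune are dropped."""
    nn = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nn[i] & nn[j])
            if shared == 0:
                continue
            weight = shared / len(nn[i] | nn[j])
            if weight > prune:
                edges[(i, j)] = weight
    return edges

# Hypothetical cells in a 2-D "PCA space": two groups of three.
cells = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
         [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
edges = snn_graph(cells, k=3)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

A community detection algorithm would then be run on the weighted edges; a larger `k` links cells across wider neighborhoods and tends to give fewer, larger communities.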
Seurat clustering
(http://satijalab.org/seurat/)
Scran clustering
https://cran.r-project.org/web/packages/clustree/vignettes/clustree.html
Subclustering
Conclusions
• Hypotheses:
– What is a cell type? What cell types are in my tissue?
– What is the number of clusters k?
• Choices:
– Gene set selection
– Similarity measure / Space to calculate similarity
– Algorithm and hyperparameters of that algorithm.
• Different choices lead to different results. Validate,
interpret and repeat steps.