The document outlines a series of data processing steps and analyses using various bioinformatics techniques, including reciprocal PCA, SCTransform, CCA, and harmony for single-cell RNA sequencing data. It details the loading of datasets, quality control measures, and the identification of cell type markers across different immune cell populations. Additionally, it emphasizes the importance of version control and the use of specific libraries to minimize bugs during analysis.


---

title: "00.data"
output: html_notebook
editor_options:
  chunk_output_type: console
---

2019-9-30
Use reciprocal PCA + reference-based + SCTransform + CCA + harmony.
Load as few libraries as possible and write more explicit code to reduce bugs.
Find markers in each replicate separately.

2020-04-29
Revised version

2020-09-17
BMC Biology revision

2020-10-10
1. PLOS Biology mouse spleen immune cell dataset,
2. Nature Immunology mouse spleen NK cell dataset,
3. our DNT dataset

2020-11-19
Make sure to save all versions

2020-1-7 version
**Use reciprocal PCA + reference-based + SCTransform + CCA + harmony to improve speed.**
**Load as few libraries as possible and write more explicit code to reduce bugs.**
**Find markers in each replicate separately.**
**Zemmour_code for removing contaminated cells: not a good choice, it cannot classify
the cell types well in our datasets.**
**DoubletDecon: about half of the cells were marked as doublets, yet Cd74 expression shows
no difference, so cluster 4 is not caused by doublets.**
**2020-02-07: Show the results before and after QC; after DoubletDecon, expression in
cluster 4 shows no difference.**
**Scrublet: the Python version is too hard to use.**
**2020-02-11: DoubletFinder is good at removing heterotypic doublets. Cluster 4 still
expresses Cd74.**
**2020-02-12: Take the union or intersection of the DoubletDecon and DoubletFinder results.**

# library
```{r}
Sys.setenv(LANGUAGE = "en") # English messages
options(warn = -1) # suppress all warnings
memory.limit(size = 64670) # Windows only; a stub with no effect in R >= 4.2
library(Seurat)
library(harmony)
library(MAST)
library(dplyr)
library(tidyselect)
library(RColorBrewer)
library(future)
library(ggplot2)
library(org.Mm.eg.db)
library(EnsDb.Mmusculus.v79)
library(cowplot)
library(data.table)
library(clusterProfiler)
library(DoubletDecon)
library(DoubletFinder)
```
# parallelization set
```{r}
# plan("multisession", workers = 4)
# options(future.globals.maxSize = 1024^4)
# plan()
# plan("sequential")
```
# theme set
```{r}
theme_set(theme_cowplot(font_size = 8))
theme.text = theme_cowplot(font_size = 8)
cols = c(brewer.pal(9, "Set1"),
brewer.pal(8, "Set2")[-c(2,4,8)],
brewer.pal(12, "Set3")[-9],
brewer.pal(12, "Paired"),
brewer.pal(8, "Dark2"),
brewer.pal(11, "Spectral")[-6],
brewer.pal(11, "BrBG")[-6]
)
```
# load data
```{r}
load("01.sct.dnt.Rdata")
load("01.sct.nk.Rdata")
load("01.sct.sp.Rdata")
load("01.sct.kept.Rdata")

load("01.harmony.dnt.nk.Rdata")
load("01.harmony.dnt.nk.meta.Rdata")

# load("01.harmony.dnt.Rdata")
# load("01.harmony.dnt.cc.marker.Rdata")

load("01.FindMarkers.between.DNT.and.CD4.CD8.NK.Rdata")
load("01.FindMarkers.between.DNT.and.CD4.CD8.NK.join.Rdata")

load("01.harmony.all.Rdata")
load("01.harmony.all.meta.all.Rdata")

load("01.cca.kept.Rdata")
load("01.cca.kept.cc.marker.Rdata")
load("01.cca.kept.cc.marker.48nk.Rdata")

load("01.cca.unfilter.Rdata")
load("01.cca.unfilter.meta.all.Rdata")

# load("01.cc.marker.DNT.Rdata")
# load("01.cca.kept.cc.marker.DNT.48nk.Rdata")

load("01.nDNT.decon.Rdata")
load("01.DoubletDecon.DoubletFinder.Rdata")

load("01.trans.marker.Rdata")
```
# prepare data (sct)
```{r}
##### run sct #####
sct = mapply(function(sample, data.dir){
print(paste("Preparing", sample))
##### read the 10x files of different formats #####
if (sample %in% c("CD4", "CD8", "TCRab", "nDNT_rep1", "nDNT_rep2", "aDNT_rep1",
"aDNT_rep2")) {
sct = Read10X(data.dir = data.dir)
} else if (sample %in% c("NK_sp1", "NK_sp2", "NK_sp3")) {
sct = data.table::fread(file = data.dir, header = "auto", sep = ",") %>%
tibble::column_to_rownames("V1")
} else if (sample %in% c("sp3_rep1", "sp3_rep2", "sp4_rep1", "sp4_rep2")){
sct = data.table::fread(file = data.dir, header = "auto", sep = ",") %>%
dplyr::mutate(SYMBOL = AnnotationDbi::mapIds(x = EnsDb.Mmusculus.v79, keys = V1,
column = "SYMBOL", keytype = "GENEID")) %>%
na.omit() %>%
dplyr::filter(!SYMBOL == "") %>%
dplyr::select(-V1)
dup = sct$SYMBOL[duplicated(sct$SYMBOL)]
z1 = sct %>%
dplyr::filter(SYMBOL %in% dup) %>%
dplyr::group_by(SYMBOL) %>%
dplyr::summarise_all(sum)
# long time, but should work
z2 = sct %>%
dplyr::filter(!SYMBOL %in% dup)
sct = rbind(z1, z2) %>%
tibble::column_to_rownames("SYMBOL")
}
print(head(sct[,1:5]))
print(head(sct[,(ncol(sct) - 5):ncol(sct)]))
##### create the seurat objects #####
sct = sct %>%
CreateSeuratObject(project = sample) %>%
PercentageFeatureSet(pattern = "^mt-",
col.name = "percent.mt") %>%
RenameCells(add.cell.id = sample)
print(dim(sct)) # print the cell number before qc
##### subset the seurat objects #####
sct = subset(sct, subset = percent.mt < 10 & nFeature_RNA > 500)
print(dim(sct)) # print the cell number after qc
print(median(sct$nCount_RNA)) # print the median UMIs after qc
print(max(sct$nCount_RNA)) # print the max UMIs after qc
##### normalize each sample with SCTransform #####
sct = SCTransform(object = sct,
vars.to.regress = "percent.mt",
verbose = F,
conserve.memory = F )
##### return the objects #####
return(sct)
},
sample = c(list.files(path = "../01.processed_data/processed_data/") %>%
sub("_Tcell", "", x = .),
"NK_sp1", "NK_sp2", "NK_sp3",
"sp3_rep1", "sp3_rep2", "sp4_rep1", "sp4_rep2"),
data.dir = c(list.files(path = "../01.processed_data/processed_data/",
full.names = T),
list.files(path = "../01.processed_data/NK/NK_mouse_spleen/",
full.names = T),
list.files(path = "../01.processed_data/mouse_spleen_2019_plos.biology/",
pattern = "mouse_[3-4]", full.names = T)),
SIMPLIFY = F)
##### split and save the sct files #####
sct.dnt = sct[1:7]
sct.nk = sct[8:10]
sct.sp = sct[11:14]
save(sct.dnt, file = "01.sct.dnt.Rdata")
save(sct.nk, file = "01.sct.nk.Rdata")
save(sct.sp, file = "01.sct.sp.Rdata")
rm(sct)
gc()
##### sct.kept #####
sct.kept = mapply(function(sct, sample){
DefaultAssay(sct) = "RNA"
print(sample)
if (sample %in% c("nDNT_rep1", "nDNT_rep2", "aDNT_rep1", "aDNT_rep2")) {
sct.kept = subset(x = sct, subset = (Cd3d>0|Cd3e>0|Cd3g>0)&Cd4==0&Cd8b1==0&Klrb1c==0)
} else if (sample %in% "CD4") {
sct.kept = subset(x = sct, subset = (Cd3d>0|Cd3e>0|
Cd3g>0)&Cd8a==0&Cd8b1==0&Klrb1c==0 )
} else if (sample %in% "CD8") {
sct.kept = subset(x = sct, subset = (Cd3d>0|Cd3e>0|Cd3g>0)&Cd4==0&Klrb1c==0 )
} else if (sample %in% "TCRab") {
sct.kept = subset(x = sct, subset = (Cd3d>0|Cd3e>0|Cd3g>0) )
} else if (sample %in% c("NK_sp1", "NK_sp2", "NK_sp3")) {
sct.kept = subset(x = sct, subset = Cd3d == 0 & Cd3g == 0)
}
sct.kept = SCTransform(object = sct.kept,
vars.to.regress = "percent.mt",
verbose = F,
conserve.memory = F )
return(sct.kept)
}, sct = c(sct.dnt, sct.nk), sample = c(names(sct.dnt), names(sct.nk)), SIMPLIFY = F)
save(sct.kept, file = "01.sct.kept.Rdata")
```
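The duplicate-symbol step in the chunk above sums the counts of Ensembl IDs that map to the same gene symbol, and its comment notes that it takes a long time. A minimal sketch of the same idea with base R's `rowsum()` follows. The IDs, symbols, and counts are invented for illustration, and whether this is faster on the real tables has not been tested here.

```r
# Toy count matrix: rows are Ensembl-like IDs, columns are cells (all values invented).
counts = matrix(c(1, 0, 2,
                  3, 1, 0,
                  0, 4, 1,
                  2, 2, 2),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("ens1", "ens2", "ens3", "ens4"),
                                c("cell1", "cell2", "cell3")))

# Hypothetical ID -> symbol mapping: ens1 and ens3 share a symbol, ens4 has none.
symbol = c("GeneA", "GeneB", "GeneA", NA)

# Drop unmapped rows, then sum the rows that share a symbol.
keep = !is.na(symbol) & symbol != ""
collapsed = rowsum(counts[keep, , drop = FALSE], group = symbol[keep])
print(collapsed)
```

The result has one row per symbol, so it can be passed on in the same way as the table with `SYMBOL` as row names.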
# harmony CD4 CD8 NK DNT kept cells
```{r run CD4 CD8 NK DNT harmony}
##### merge the sct files #####
harmony.dnt.nk = merge(x = sct.kept$aDNT_rep1, y = sct.kept[2:10])
rm(sct.kept)
gc()
plan("sequential")
harmony.dnt.nk = harmony.dnt.nk %>%
SCTransform(vars.to.regress = "percent.mt", conserve.memory = F) %>%
# takes a lot of memory, about 40 min.
RunPCA(verbose = F)
##### switch to parallel workers #####
plan("multisession", workers = 4) # "multiprocess" is deprecated in recent releases of future
options(future.globals.maxSize = 1024^4)
##### set the batch #####
table(harmony.dnt.nk$orig.ident)
harmony.dnt.nk$batch = factor(
harmony.dnt.nk$orig.ident,
labels = c("b1", "b2", rep("b1", 3), "b2", rep("b3", 3), "b1"))
table(harmony.dnt.nk$batch, harmony.dnt.nk$orig.ident)
##### run the harmony #####
harmony.dnt.nk = harmony.dnt.nk %>%
RunHarmony(group.by.vars = "batch", assay.use = "SCT") %>%
RunUMAP(dims = 1:30, reduction = "harmony") %>%
FindNeighbors(dims = 1:30, reduction = "harmony") %>%
FindClusters(resolution = .1)
##### set idents to be celltype #####
harmony.dnt.nk$celltype = sub("(_rep[1-2])|(_sp[1-3])", "", harmony.dnt.nk$orig.ident)
harmony.dnt.nk$celltypesplit = sub("_sp[1-3]", "", harmony.dnt.nk$orig.ident)
table(harmony.dnt.nk$celltype)
table(harmony.dnt.nk$celltypesplit)
Idents(harmony.dnt.nk) = "celltype"
##### ScaleData for findmarkers #####
plan("sequential")
DefaultAssay(harmony.dnt.nk) = "RNA"
harmony.dnt.nk = harmony.dnt.nk %>%
NormalizeData(., normalization.method = "LogNormalize", scale.factor = 10000) %>%
FindVariableFeatures(., selection.method = "vst", nfeatures = 2000) %>%
ScaleData(., vars.to.regress = "percent.mt")
##### save the results #####
save(harmony.dnt.nk, file = "01.harmony.dnt.nk.Rdata")
```
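The batch assignment in the chunk above relies on `factor(..., labels = ...)` following the sorted order of the `orig.ident` levels, which depends on the locale. A minimal sketch of a more explicit alternative uses a named lookup vector. The sample-to-batch mapping below is read off the label order used in this notebook and should be treated as an assumption to verify with `table(batch, orig.ident)`. The `orig.ident` vector is a toy stand-in, not the real metadata.

```r
# Explicit sample -> batch lookup (assumed mapping, read off the factor labels in this notebook).
batch.map = c(aDNT_rep1 = "b1", aDNT_rep2 = "b2",
              CD4 = "b1", CD8 = "b1",
              nDNT_rep1 = "b1", nDNT_rep2 = "b2",
              NK_sp1 = "b3", NK_sp2 = "b3", NK_sp3 = "b3",
              TCRab = "b1")

# Toy stand-in for harmony.dnt.nk$orig.ident
orig.ident = c("CD4", "NK_sp2", "nDNT_rep2", "aDNT_rep1", "TCRab")

# Fail early if a sample has no batch assigned.
stopifnot(all(orig.ident %in% names(batch.map)))

# Look up the batch of each cell by its sample name.
batch = unname(batch.map[orig.ident])
print(table(batch, orig.ident))
```

With a lookup like this, a sample that is added or renamed triggers an error instead of silently shifting the labels.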
## markers
```{r}
object = harmony.dnt.nk
##### FindMarkers between DNT and CD4, CD8, NK respectively #####
markers = mapply(function(ident.1, ident.2){
FindMarkers(object = object, ident.1 = ident.1, ident.2 = ident.2, group.by = "celltypesplit",
assay = "RNA", slot = "data", only.pos = T) %>%
tibble::rownames_to_column("gene") %>%
dplyr::mutate(type = paste(ident.1, "vs", ident.2, sep = ".")) %>%
dplyr::arrange(desc(avg_logFC))
},
ident.1 = rep(c("nDNT_rep1", "nDNT_rep2", "aDNT_rep1", "aDNT_rep2"), 3),
ident.2 = rep(c("CD4", "CD8", "NK"), c(4, 4, 4)),
SIMPLIFY = F, USE.NAMES = F)
names(markers) = paste(rep(c("nDNT_rep1", "nDNT_rep2", "aDNT_rep1", "aDNT_rep2"), 3), "vs",
rep(c("CD4", "CD8","NK"), c(4,4,4)), sep = ".")
##### save results #####
save(markers, file = "01.FindMarkers.between.DNT.and.CD4.CD8.NK.Rdata")
##### join the replicate results #####
genelist = list(
nDNT.vs.CD4 = intersect(markers$nDNT_rep1.vs.CD4$gene, markers$nDNT_rep2.vs.CD4$gene),
nDNT.vs.CD8 = intersect(markers$nDNT_rep1.vs.CD8$gene, markers$nDNT_rep2.vs.CD8$gene),
nDNT.vs.NK = intersect(markers$nDNT_rep1.vs.NK$gene, markers$nDNT_rep2.vs.NK$gene ),
aDNT.vs.CD4 = intersect(markers$aDNT_rep1.vs.CD4$gene, markers$aDNT_rep2.vs.CD4$gene),
aDNT.vs.CD8 = intersect(markers$aDNT_rep1.vs.CD8$gene, markers$aDNT_rep2.vs.CD8$gene),
aDNT.vs.NK = intersect(markers$aDNT_rep1.vs.NK$gene, markers$aDNT_rep2.vs.NK$gene )
)
markers.both = mapply(function(ident.1, ident.2){
FindMarkers(object = object, ident.1 = ident.1, ident.2 = ident.2, group.by = "celltype",
assay = "RNA", slot = "data", only.pos = T) %>%
tibble::rownames_to_column("gene") %>%
dplyr::mutate(type = paste(ident.1, "vs", ident.2, sep = ".")) %>%
dplyr::arrange(desc(avg_logFC))
},
ident.1 = rep(c("nDNT", "aDNT"), 3),
ident.2 = rep(c("CD4", "CD8","NK"), c(2,2,2)),
SIMPLIFY = F, USE.NAMES = F)
names(markers.both) = paste(rep(c("nDNT", "aDNT"), 3), "vs", rep(c("CD4", "CD8","NK"),c(2,2,2)),
sep = ".")
markers.join = lapply(names(markers.both), function(i){
markers.both[[i]] %>%
dplyr::filter(gene %in% genelist[[i]])
})
names(markers.join) = names(markers.both)
##### save results #####
save(genelist, markers.both, markers.join,
file = "01.FindMarkers.between.DNT.and.CD4.CD8.NK.join.Rdata")
##### write the csv #####
rbind(markers.join$nDNT.vs.CD4, markers.join$nDNT.vs.CD8, markers.join$nDNT.vs.NK) %>%
write.csv(file = "TableS2.nDNT.vs.CD4.CD8.NK.csv", x = ., row.names = F)
rbind(markers.join$aDNT.vs.CD4, markers.join$aDNT.vs.CD8, markers.join$aDNT.vs.NK) %>%
write.csv(file = "TableS3.aDNT.vs.CD4.CD8.NK.csv", x = ., row.names = F)
#####
```
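The join step above keeps a gene only if it was found as a marker in both replicates, and then reports the statistics from the pooled comparison. A minimal sketch of that filtering logic on toy tables follows. The gene values and fold changes are invented, and the data frames only mimic the `gene` and `avg_logFC` columns of the real results.

```r
# Toy marker tables standing in for two per-replicate FindMarkers results.
rep1 = data.frame(gene = c("Cd74", "Gzmb", "Il17a"),
                  avg_logFC = c(1.2, 0.8, 0.5),
                  stringsAsFactors = FALSE)
rep2 = data.frame(gene = c("Gzmb", "Cd74", "Eomes"),
                  avg_logFC = c(0.9, 1.0, 0.4),
                  stringsAsFactors = FALSE)

# Toy pooled result (both replicates tested together).
pooled = data.frame(gene = c("Cd74", "Gzmb", "Il17a", "Eomes"),
                    avg_logFC = c(1.1, 0.85, 0.3, 0.2),
                    stringsAsFactors = FALSE)

# Genes supported by both replicates, in the order of the first replicate.
shared = intersect(rep1$gene, rep2$gene)

# Keep the pooled statistics only for the shared genes.
joined = pooled[pooled$gene %in% shared, ]
print(joined)
```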
# harmony DNT kept cells
```{r run nDNT, aDNT harmony [NOT USED]}
names(sct.kept)
plan("sequential")
harmony.dnt = mapply(function(i, j, lambda, theta, resolution){
##### run the harmony #####
object = merge(i, j) %>%
SCTransform(vars.to.regress = "percent.mt", conserve.memory = F) %>%
RunPCA(verbose = F) %>%
RunHarmony(group.by.vars = "orig.ident", assay.use = "SCT",
lambda = lambda, theta = theta) %>%
RunUMAP(dims = 1:30, reduction = "harmony") %>%
FindNeighbors(dims = 1:30, reduction = "harmony") %>%
FindClusters(resolution = resolution)
##### ScaleData for findmarkers #####
DefaultAssay(object) = "RNA"
object = object %>%
NormalizeData(., normalization.method = "LogNormalize", scale.factor = 10000) %>%
FindVariableFeatures(., selection.method = "vst", nfeatures = 2000) %>%
ScaleData(., vars.to.regress = "percent.mt")
##### return object #####
return(object)
}, SIMPLIFY = F,
i = list("nDNT" = sct.kept$nDNT_rep1, "aDNT" = sct.kept$aDNT_rep1),
j = list("nDNT" = sct.kept$nDNT_rep2, "aDNT" = sct.kept$aDNT_rep2),
lambda = c(1, .05),
theta = c(2, 3),
resolution = c(.05, .01))
##### run the harmony of naDNT #####
# harmony.naDNT = merge(sct.kept$aDNT_rep1, sct.kept[c(2,5:6)]) %>%
# SCTransform(vars.to.regress = "percent.mt", conserve.memory = F) %>%
# RunPCA(verbose = F)
# harmony.naDNT$batch = sub("[n|a]DNT_", "", harmony.naDNT$orig.ident)
# table(harmony.naDNT$batch, harmony.naDNT$orig.ident)
# harmony.naDNT = harmony.naDNT %>%
# RunHarmony(group.by.vars = "batch", assay.use = "SCT") %>%
# RunUMAP(dims = 1:30, reduction = "harmony") %>%
# FindNeighbors(dims = 1:30, reduction = "harmony") %>%
# FindClusters(resolution = .1)
# DimPlot(harmony.naDNT, split.by = "orig.ident", ncol = 2, label = T, cols = "Paired")
# FeaturePlot(harmony.naDNT, c("Il17a","Cxcr6", "Eomes", "Gzmb", "Cd74"), order = T,label = T)
##### save the results #####
save(harmony.dnt, file = "01.harmony.dnt.Rdata")
```
## markers
```{r [NOT USED]}
cc.marker = mapply(function(object){
lapply(sort(unique(object$seurat_clusters)), function(i){
if (min(table(object$orig.ident[object$seurat_clusters == i])) > 30) {
FindConservedMarkers(object, ident.1 = i, grouping.var = "orig.ident",
assay = "RNA", slot = "data", only.pos = T) %>%
tibble::rownames_to_column("gene") %>%
dplyr::filter(max_pval < .05 & minimump_p_val < .05) %>%
dplyr::mutate(cluster = i)
}}) %>%
data.table::rbindlist() %>%
dplyr::left_join(y = FindAllMarkers(object, assay = "RNA", slot = "data", only.pos = T), by =
c("cluster", "gene")) %>%
dplyr::arrange(desc(avg_logFC))
}, object = harmony.dnt, SIMPLIFY = F)
names(cc.marker)
save(cc.marker, file = "01.harmony.dnt.cc.marker.Rdata")
```
### 48nk.marker
```{r [NOT USED]}
object = harmony.dnt.nk
head(object@meta.data)
# NOTE: the object names before @meta.data in df1 and df2 are inferred from the surrounding chunks
df1 = rbind(harmony.dnt$nDNT@meta.data %>% tibble::rownames_to_column("cells") %>%
dplyr::select(cells, seurat_clusters_before = seurat_clusters),
harmony.dnt$aDNT@meta.data %>% tibble::rownames_to_column("cells") %>%
dplyr::select(cells, seurat_clusters_before = seurat_clusters))
table(df1$seurat_clusters_before)
head(df1)
df2 = cca.kept$DNT@meta.data %>% tibble::rownames_to_column("cells") %>%
dplyr::select(cells, integrated_snn_res.0.01)
table(df2$integrated_snn_res.0.01)
meta = object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::left_join(y = df1, by = "cells") %>%
dplyr::left_join(y = df2, by = "cells") %>%
tibble::column_to_rownames("cells")
head(meta)
meta$cluster = paste0(meta$celltype, meta$seurat_clusters_before) %>% sub("NA", "", .) %>%
factor(., levels = c(paste0("nDNT", 0:4), paste0("aDNT", 0:1), "CD4", "CD8", "NK", "TCRab"))
table(meta$cluster)
meta$recluster = paste0(sub("^[n|a]", "", meta$celltype), meta$integrated_snn_res.0.01) %>%
sub("NA", "", .) %>%
factor(., levels = c(paste0("DNT", 0:1), "CD4", "CD8", "NK", "TCRab"))
table(meta$recluster)
save(meta, file = "01.harmony.dnt.nk.meta1.Rdata")

# object@meta.data = meta
# cc.marker.48nk = mapply(function(ident.1, group.by, cc.marker){
#
# x = mapply(function(ident.2){
# FindMarkers(object, ident.1, ident.2, group.by = group.by, only.pos = T, assay = "RNA", slot = "data",
# features = cc.marker) %>%
# tibble::rownames_to_column("gene") %>%
# dplyr::filter(p_val_adj < .05)
# }, SIMPLIFY = F,
# ident.2 = c("CD4", "CD8", "NK"))
# dplyr::inner_join(x$CD4, x$CD8, by = "gene", suffix = c(".CD4", ".CD8")) %>%
# dplyr::inner_join(., x$NK, by = "gene", suffix = c("", ".NK")) %>%
# dplyr::mutate(type = ident.1)
#
# }, SIMPLIFY = F,
# ident.1 = c(paste0("nDNT",0:4), paste0("aDNT",0:1), paste0("DNT",0:1)),
# group.by = c(rep("cluster",7), rep("recluster",2)),
# cc.marker = sapply(cc.marker, function(i){
# sapply(sort(unique(i$cluster)), function(j){
# (i %>% dplyr::filter(cluster == j))$gene
# })
# }) %>%
# unlist(., recursive = F)) %>%
# data.table::rbindlist()
# head(cc.marker.48nk)
# save(cc.marker.48nk, file = "01.cca.kept.cc.marker.48nk.Rdata")
```
# harmony all
```{r run harmony all}
##### merge sct files #####
harmony.all = merge(x = sct.dnt$aDNT_rep1, y = c(sct.dnt[2:7], sct.nk, sct.sp))
rm(sct.dnt, sct.nk, sct.sp)
gc()
plan("sequential")
harmony.all = harmony.all %>%
SCTransform(vars.to.regress = "percent.mt", conserve.memory = F) %>%
# takes a lot of memory, about 40 min.
RunPCA(verbose = F)
##### switch to parallel workers #####
plan("multisession", workers = 4) # "multiprocess" is deprecated in recent releases of future
options(future.globals.maxSize = 1024^4)
##### set the batch #####
table(harmony.all$orig.ident)
harmony.all$batch = factor(
harmony.all$orig.ident,
labels = c("b1", "b2", rep("b1", 3), "b2", rep("b3", 3), rep("b4", 4), "b1"))
table(harmony.all$batch, harmony.all$orig.ident)
##### run the harmony #####
harmony.all@meta.data = meta.all
harmony.all = RunHarmony(object = harmony.all, group.by.vars = "batch", assay.use = "SCT",
lambda = .1, theta = 3) %>%
RunUMAP(dims = 1:30, reduction = "harmony") %>%
FindNeighbors(dims = 1:30, reduction = "harmony") %>%
FindClusters(resolution = .1)
DimPlot(harmony.all, group.by = "celltype.converge", label = T, cols = cols)
##### set idents to be celltype #####
harmony.all$celltype = factor(
harmony.all$orig.ident,
labels = c("aDNT", "aDNT", "CD4", "CD8", "nDNT", "nDNT",
rep("NK", 3),
rep("sp", 4),
"TCRab"))
harmony.all$celltypesplit = factor(
harmony.all$orig.ident,
labels = c("aDNT_rep1", "aDNT_rep2", "CD4", "CD8", "nDNT_rep1", "nDNT_rep2",
rep("NK", 3),
rep("sp", 4),
"TCRab"))
table(harmony.all$celltype)
table(harmony.all$celltypesplit)
Idents(harmony.all) = "celltype"
##### ScaleData for findmarkers #####
plan("sequential")
DefaultAssay(harmony.all) = "RNA"
harmony.all = harmony.all %>%
NormalizeData(., normalization.method = "LogNormalize", scale.factor = 10000) %>%
FindVariableFeatures(., selection.method = "vst", nfeatures = 2000) %>%
ScaleData(., vars.to.regress = "percent.mt")
##### save the results #####
save(harmony.all, file = "01.harmony.all.Rdata")
```
```{r modify harmony all metadata}
##### save the pre metadata #####
metapre = harmony.all@meta.data
save(metapre, file = "01.harmony.all.metapre.Rdata")
##### import the markers from plos dataset #####
plos.meta1 = fread(input = "../01.processed_data/mouse_spleen_2019_plos.biology/other information/pbio.3000528.s013.csv") # Mouse spleen scRNA-seq clustering data.
##### modify the plos metadata #####
meta.plos = mapply(function(i, j){
plos.meta1[grep(i, plos.meta1$cell), ] %>%
dplyr::mutate(cell.rename = paste(j, cell, sep = "_"))
},
i = c("mouse_3.1", "mouse_3.2", "mouse_4.1", "mouse_4.2"),
j = c("sp3_rep1", "sp3_rep2", "sp4_rep1", "sp4_rep2"),
SIMPLIFY = F) %>%
rbindlist() %>%
dplyr::mutate(converged.cell.type.simple = sub(".[1-4]$", "", converged.cell.type)) %>%
dplyr::select(cell.rename, converged.cell.type.simple)
head(meta.plos)
colnames(meta.plos)
table(meta.plos$converged.cell.type.simple)
##### gene expression #####
DefaultAssay(harmony.all) = "RNA"
meta.gene = FetchData(object = harmony.all, c("Cd3d", "Cd3e", "Cd3g", "Cd4", "Cd8a", "Cd8b1",
"Klrb1c"), slot = "data") %>%
tibble::rownames_to_column("cell.rename")
head(meta.gene)
##### merge the pre metadata with DNT and plos metadata #####
meta.all = metapre %>%
tibble::rownames_to_column("cell.rename") %>%
# dplyr::left_join(y = meta.DNT, by = c("cell.rename")) %>%
dplyr::left_join(y = meta.plos, by = c("cell.rename")) %>%
dplyr::left_join(y = meta.gene, by = c("cell.rename")) %>%
tibble::column_to_rownames("cell.rename")
head(meta.all)
##### put celltype inside converged.cell.type.simple #####
meta.all$converged.cell.type.simple[is.na(meta.all$converged.cell.type.simple)] =
as.character(meta.all$celltypesplit)[is.na(meta.all$converged.cell.type.simple)]
table(meta.all$converged.cell.type.simple)
##### celltype.converge #####
meta.all$celltypesplit.converge = sub("_rep[1|2]", "", meta.all$converged.cell.type.simple)
meta.all$celltype.converge = sub("_rep[1|2].*", "", meta.all$converged.cell.type.simple)
table(meta.all$celltypesplit.converge)
table(meta.all$celltype.converge)
##### T identity #####
meta.all$Tid = (meta.all$Cd3d > 0 |meta.all$Cd3e > 0| meta.all$Cd3g > 0) &
meta.all$celltype.converge %in% c("aDNT", "CD4", "CD8", "nDNT", "TCRab",
"Memory-CD4-T-cell", "Memory-CD8-T-cell",
"Naive-CD4-T-cell", "Naive-CD8-T-cell",
"NK-T-cell")
table(meta.all$Tid, meta.all$celltype.converge)

meta.all$CD4id = (meta.all$Cd3d > 0 | meta.all$Cd3e > 0 | meta.all$Cd3g > 0) &
meta.all$Cd8a == 0 & meta.all$Cd8b1 == 0 & meta.all$Klrb1c == 0 &
meta.all$celltype.converge %in% c("CD4",
"Memory-CD4-T-cell",
"Naive-CD4-T-cell")
table(meta.all$CD4id, meta.all$celltype.converge)

meta.all$CD8id = (meta.all$Cd3d > 0 | meta.all$Cd3e > 0 | meta.all$Cd3g > 0) &
meta.all$Cd4 == 0 & meta.all$Klrb1c == 0 &
meta.all$celltype.converge %in% c("CD8",
"Memory-CD8-T-cell",
"Naive-CD8-T-cell")
table(meta.all$CD8id, meta.all$celltype.converge)

meta.all$DNTid = (meta.all$Cd3d > 0 | meta.all$Cd3e > 0 | meta.all$Cd3g > 0) &
meta.all$Cd4 == 0 & meta.all$Klrb1c == 0 & meta.all$Cd8b1 == 0 &
meta.all$celltype.converge %in% c("aDNT", "nDNT")
table(meta.all$DNTid, meta.all$celltype.converge)

meta.all$NKid = (meta.all$Cd3d == 0 & meta.all$Cd3g == 0) &
meta.all$celltype.converge %in% c("NK", "NK-cell")
table(meta.all$NKid, meta.all$celltype.converge)

meta.all$contaminate = (meta.all$Tid == FALSE &
meta.all$celltype.converge %in% c("aDNT", "CD4", "CD8", "nDNT", "TCRab",
"Memory-CD4-T-cell", "Memory-CD8-T-cell",
"Naive-CD4-T-cell", "Naive-CD8-T-cell",
"NK-T-cell")) |
(meta.all$CD4id == FALSE &
meta.all$celltype.converge %in% c("CD4",
"Memory-CD4-T-cell",
"Naive-CD4-T-cell")) |
(meta.all$CD8id == FALSE &
meta.all$celltype.converge %in% c("CD8",
"Memory-CD8-T-cell",
"Naive-CD8-T-cell")) |
(meta.all$DNTid == FALSE &
meta.all$celltype.converge %in% c("aDNT", "nDNT")) |
(meta.all$NKid == FALSE &
meta.all$celltype.converge %in% c("NK", "NK-cell"))
table(meta.all$contaminate, meta.all$celltype.converge)
##### save modified metadata #####
save(meta.all, file = "01.harmony.all.meta.all.Rdata")

```
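The contamination flag above marks a cell as contaminated when its sorted or annotated cell type disagrees with its marker expression. A minimal sketch of that rule for the DNT case follows, on a toy table. The expression values are invented, and only `Cd3e` stands in for the `Cd3d`/`Cd3e`/`Cd3g` check, which is a simplification.

```r
# Toy per-cell expression (0 means not detected) and cell-type labels; all values invented.
meta = data.frame(
  Cd3e     = c(1.2, 0.0, 0.9, 1.1),
  Cd4      = c(0.0, 0.0, 0.7, 0.0),
  Cd8b1    = c(0.0, 0.0, 0.0, 0.0),
  Klrb1c   = c(0.0, 1.5, 0.0, 0.0),
  celltype = c("nDNT", "nDNT", "aDNT", "CD4"),
  stringsAsFactors = FALSE
)

# Cells labelled as DNT by the sort.
is.dnt.label = meta$celltype %in% c("nDNT", "aDNT")

# Expression gate for a DNT cell: CD3 detected, no Cd4, Cd8b1, or Klrb1c.
passes.gate = meta$Cd3e > 0 & meta$Cd4 == 0 & meta$Cd8b1 == 0 & meta$Klrb1c == 0

# A DNT-labelled cell that fails the gate counts as contaminated.
meta$contaminate = is.dnt.label & !passes.gate
print(meta)
```

In this toy table the second cell fails because CD3 is not detected and the third because `Cd4` is detected. The fourth cell is not labelled DNT, so this rule does not apply to it.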
# cca kept
```{r}
plan("sequential")
cca.kept = mapply(function(object.list){
anchor.features = SelectIntegrationFeatures(object.list = object.list,
nfeatures = 3000)
object.integrated = PrepSCTIntegration(object.list = object.list,
anchor.features = anchor.features) %>%
lapply(FUN = RunPCA, features = anchor.features, verbose = F) %>%
FindIntegrationAnchors(object.list = .,
reduction = "rpca",
anchor.features = anchor.features,
normalization.method = "SCT",
reference = 1) %>% # long time
IntegrateData(anchorset = .,
normalization.method = "SCT") %>%
RunPCA(object = .) %>%
RunUMAP(object = .,
dims = 1:30) %>%
FindNeighbors(dims = 1:30) %>%
FindClusters(resolution = .05)
##### scale for finding markers #####
DefaultAssay(object.integrated) = "RNA"
object.integrated = NormalizeData(object.integrated,
normalization.method = "LogNormalize",
scale.factor = 10000)
object.integrated = FindVariableFeatures(object.integrated,
selection.method = "vst",
nfeatures = 2000)
object.integrated = ScaleData(object.integrated,
vars.to.regress = "percent.mt")
return(object.integrated)
},
object.list = list("nDNT" = sct.kept[5:6], "aDNT" = sct.kept[1:2], "DNT" = sct.kept[c(5:6, 1:2)]),
SIMPLIFY = F)
DefaultAssay(cca.kept$DNT) = "integrated"
cca.kept$DNT = FindClusters(cca.kept$DNT, resolution = .01)
save(cca.kept, file = "01.cca.kept.Rdata")
```
## markers
```{r}
cc.marker = mapply(function(object){
print(table(object$seurat_clusters, object$orig.ident))
lapply(sort(unique(object$seurat_clusters)), function(i){
if (min(table(object$orig.ident[object$seurat_clusters == i])) > 30){
FindConservedMarkers(object, ident.1 = i, grouping.var = "orig.ident",
assay = "RNA", slot = "data", only.pos = T) %>%
tibble::rownames_to_column("gene") %>%
dplyr::filter(max_pval < .05 & minimump_p_val < .05) %>%
dplyr::mutate(cluster = i)
}}) %>%
data.table::rbindlist() %>%
dplyr::left_join(y = FindAllMarkers(object, assay = "RNA", slot = "data", only.pos = T), by =
c("cluster", "gene")) %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T)
}, object = cca.kept, SIMPLIFY = F)
names(cc.marker)
save(cc.marker, file = "01.cca.kept.cc.marker.Rdata")
```
### 48nk.marker
```{r}
object = harmony.dnt.nk
# NOTE: the object names before @meta.data in df1 and df2 are inferred from the surrounding chunks
# df1 = rbind(cca.kept$nDNT@meta.data %>% tibble::rownames_to_column("cells") %>%
# dplyr::select(cells, integrated_snn_res.0.05),
# cca.kept$aDNT@meta.data %>% tibble::rownames_to_column("cells") %>%
# dplyr::select(cells, integrated_snn_res.0.05))
# table(df1$integrated_snn_res.0.05)
# head(df1)
# df2 = cca.kept$DNT@meta.data %>% tibble::rownames_to_column("cells") %>%
# dplyr::select(cells, integrated_snn_res.0.01)
# table(df2$integrated_snn_res.0.01)
# meta = object@meta.data %>%
# tibble::rownames_to_column("cells") %>%
# dplyr::left_join(y = df1, by = "cells") %>%
# dplyr::left_join(y = df2, by = "cells") %>%
# tibble::column_to_rownames("cells")
# head(meta)
# meta$cluster = paste0(meta$celltype, meta$integrated_snn_res.0.05) %>% sub("NA", "", .) %>%
# factor(., levels = c(paste0("nDNT", 0:4), paste0("aDNT", 0:1), "CD4", "CD8", "NK", "TCRab"))
# table(meta$cluster)
# meta$recluster = paste0(sub("^[n|a]", "", meta$celltype), meta$integrated_snn_res.0.01) %>%
# sub("NA", "", .) %>%
# factor(., levels = c(paste0("DNT", 0:1), "CD4", "CD8", "NK", "TCRab"))
# table(meta$recluster)
# save(meta, file = "01.harmony.dnt.nk.meta.Rdata")

object@meta.data = meta
cc.marker.48nk = mapply(function(ident.1, group.by, cc.marker){

x = mapply(function(ident.2){
FindMarkers(object, ident.1, ident.2, group.by = group.by, only.pos = T, assay = "RNA", slot =
"data",
features = cc.marker) %>%
tibble::rownames_to_column("gene") %>%
dplyr::filter(p_val_adj < .05)
}, SIMPLIFY = F,
ident.2 = c("CD4", "CD8", "NK"))
dplyr::inner_join(x$CD4, x$CD8, by = "gene", suffix = c(".CD4", ".CD8")) %>%
dplyr::inner_join(., x$NK, by = "gene", suffix = c("", ".NK")) %>%
dplyr::mutate(type = ident.1)

}, SIMPLIFY = F,
ident.1 = c(paste0("nDNT",0:4), paste0("aDNT",0:1), paste0("DNT",0:1)),
group.by = c(rep("cluster",7), rep("recluster",2)),
cc.marker = sapply(cc.marker, function(i){
sapply(sort(unique(i$cluster)), function(j){
(i %>% dplyr::filter(cluster == j))$gene
})
}) %>%
unlist(., recursive = F)) %>%
data.table::rbindlist()
head(cc.marker.48nk)
save(cc.marker.48nk, file = "01.cca.kept.cc.marker.48nk.Rdata")
```
### up- and downregulated genes
```{r}
object = cca.kept$DNT
object$celltype = sub("_rep[1|2]", "", object$orig.ident) %>% factor(., levels = c("nDNT",
"aDNT"))
table(object$integrated_snn_res.0.01, object$orig.ident)
DefaultAssay(object) = "RNA"
df = mapply(function(ident.1, ident.2, subset.ident){
x = FindMarkers(object, ident.1 = ident.1, ident.2 = ident.2, subset.ident = subset.ident,
group.by = "orig.ident", assay = "RNA", slot = "data") %>%
tibble::rownames_to_column("gene") %>%
dplyr::filter(p_val_adj < .05) %>%
dplyr::mutate(type = paste0(ident.1,".vs.",ident.2), trans = avg_logFC > 0) %>%
dplyr::arrange(desc(avg_logFC))
x$trans = factor(x$trans, levels = c(FALSE, TRUE), labels = c("down", "up")) # explicit levels: works even if only one direction is present
return(x)
},
ident.1 = rep(paste0("aDNT_rep",1:2),2),
ident.2 = rep(paste0("nDNT_rep",1:2),2),
subset.ident = c(0,0,1,1),
SIMPLIFY = F)
head(df[[1]])
trans.marker = mapply(function(df){
x = dplyr::inner_join(df[[1]], df[[2]], by = c("gene", "trans"), suffix = c(".rep1", ".rep2"))
}, SIMPLIFY = F, df = list("DNT0" = df[1:2], "DNT1" = df[3:4]))
head(trans.marker[[1]])
save(trans.marker, file = "01.trans.marker.Rdata")
```
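The replicate join above keeps a gene only when it changes in the same direction in both replicates. A minimal sketch on toy data follows, with the direction factor built from explicit levels so that both labels exist even when a table contains only one direction. The helper `to.direction` and all gene names and values are hypothetical.

```r
# Hypothetical helper: label a log fold change as "up" or "down" with fixed levels.
to.direction = function(logfc) {
  factor(logfc > 0, levels = c(FALSE, TRUE), labels = c("down", "up"))
}

# Toy per-replicate results (values invented).
rep1 = data.frame(gene = c("g1", "g2", "g3"), avg_logFC = c(1.0, 0.5, 0.7),
                  stringsAsFactors = FALSE)
rep2 = data.frame(gene = c("g1", "g2", "g4"), avg_logFC = c(0.8, -0.4, 0.3),
                  stringsAsFactors = FALSE)

rep1$trans = to.direction(rep1$avg_logFC) # all "up", but both levels are kept
rep2$trans = to.direction(rep2$avg_logFC)

# Keep genes present in both replicates with the same direction.
both = merge(rep1, rep2, by = c("gene", "trans"), suffixes = c(".rep1", ".rep2"))
print(both)
```

Here `g2` is dropped because it is up in one replicate and down in the other, and `g3` and `g4` are dropped because each appears in only one replicate.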
# cca unfilter
```{r}
plan("sequential")
names(sct.dnt)
cca.unfilter = mapply(function(object.list){
anchor.features = SelectIntegrationFeatures(object.list = object.list,
nfeatures = 3000)
object.integrated = PrepSCTIntegration(object.list = object.list,
anchor.features = anchor.features) %>%
lapply(FUN = RunPCA, features = anchor.features, verbose = F) %>%
FindIntegrationAnchors(object.list = .,
reduction = "rpca",
anchor.features = anchor.features,
normalization.method = "SCT",
reference = 1) %>% # long time
IntegrateData(anchorset = .,
normalization.method = "SCT") %>%
RunPCA(object = .) %>%
RunUMAP(object = .,
dims = 1:30) %>%
FindNeighbors(dims = 1:30) %>%
FindClusters(resolution = .05)
##### scale for finding markers #####
DefaultAssay(object.integrated) = "RNA"
object.integrated = NormalizeData(object.integrated,
normalization.method = "LogNormalize",
scale.factor = 10000)
object.integrated = FindVariableFeatures(object.integrated,
selection.method = "vst",
nfeatures = 2000)
object.integrated = ScaleData(object.integrated,
vars.to.regress = "percent.mt")
return(object.integrated)
},
object.list = list("nDNT" = sct.dnt[5:6], "aDNT" = sct.dnt[1:2], "DNT" = sct.dnt[c(5:6, 1:2)]),
SIMPLIFY = F)
# object = cca.unfilter$DNT
# DefaultAssay(object) = "integrated"
# object = FindClusters(object, resolution = .02)
# DimPlot(object)
save(cca.unfilter, file = "01.cca.unfilter.Rdata")
```
```{r modify the metadata}
meta.all = mapply(function(object){
##### gene expression #####
DefaultAssay(object) = "RNA"
meta.gene = FetchData(object = object,
vars = c("Cd3d", "Cd3e", "Cd3g", "Cd4", "Cd8a", "Cd8b1", "Klrb1c"),
slot = "data") %>%
tibble::rownames_to_column("cell.rename")
head(meta.gene)
##### save the pre metadata #####
metapre = object@meta.data
##### merge pre and modified metadata #####
meta.all = metapre %>%
tibble::rownames_to_column("cell.rename") %>%
dplyr::left_join(x = ., y = meta.gene, by = "cell.rename") %>%
tibble::column_to_rownames("cell.rename")
head(meta.all)
##### T identity #####
if (("Cd4" %in% colnames(meta.all)) == F){
meta.all$DNTid = (meta.all$Cd3d > 0 |meta.all$Cd3e > 0| meta.all$Cd3g > 0) &
meta.all$Klrb1c == 0 & meta.all$Cd8b1 == 0
} else {
meta.all$DNTid = (meta.all$Cd3d > 0 |meta.all$Cd3e > 0| meta.all$Cd3g > 0) &
meta.all$Cd4 == 0 & meta.all$Klrb1c == 0 & meta.all$Cd8b1 == 0
}
table(meta.all$DNTid, meta.all$integrated_snn_res.0.05)
meta.all$contaminate = meta.all$DNTid == FALSE
table(meta.all$contaminate, meta.all$integrated_snn_res.0.05)
meta.all$contaminate = factor(meta.all$contaminate, levels = c(FALSE, TRUE), labels = c("kept", "removed"))
return(meta.all)
}, object = cca.unfilter, SIMPLIFY = F)
save(meta.all, file = "01.cca.unfilter.meta.all.Rdata")
```
## markers
```{r}
cc.marker = mapply(function(object){
print(table(object$seurat_clusters, object$orig.ident))
lapply(sort(unique(object$seurat_clusters)), function(i){
if (min(table(object$orig.ident[object$seurat_clusters == i])) > 30){
FindConservedMarkers(object, ident.1 = i, grouping.var = "orig.ident",
assay = "RNA", slot = "data", only.pos = T) %>%
tibble::rownames_to_column("gene") %>%
dplyr::filter(max_pval < .05 & minimump_p_val < .05) %>%
dplyr::mutate(cluster = i)
}}) %>%
data.table::rbindlist() %>%
dplyr::left_join(y = FindAllMarkers(object, assay = "RNA", slot = "data", only.pos = T), by =
c("cluster", "gene")) %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T)
}, object = cca.unfilter, SIMPLIFY = F)
names(cc.marker)
save(cc.marker, file = "01.cca.unfilter.cc.marker.Rdata")
```
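
The chunk above only tests a cluster when every replicate contributes more than 30 cells to it. A minimal sketch of that check in base R, with hypothetical labels (`clusters`, `replicate` and `testable` are illustrative names):

```r
# Hypothetical cluster and replicate labels for 200 cells.
clusters = factor(c(rep(0, 100), rep(1, 80), rep(2, 20)))
replicate = c(rep(c("rep1", "rep2"), each = 50),   # cluster 0: 50 + 50
              rep(c("rep1", "rep2"), each = 40),   # cluster 1: 40 + 40
              rep("rep1", 15), rep("rep2", 5))     # cluster 2: 15 + 5
counts = table(clusters, replicate)
# Keep clusters whose smallest replicate contributes more than 30 cells.
testable = rownames(counts)[apply(counts, 1, min) > 30]
print(testable)
```

Cluster 2 is skipped in this toy case because one replicate contributes only 5 cells.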

# DoubletFinder and DoubletDecon


## DoubletDecon using SCT results
```{r}
source("../26.MAIT/DoubletDecon.f.r")
############################################################
# Remove doublets using DoubletDecon. The dataset needs to be clustered first and must have
# more than two clusters; the plot window should be maximized.
############################################################
dev.off()
object = lapply(sct.kept[5:6], function(i){
i = i %>%
RunPCA() %>%
RunUMAP(dims = 1:30) %>%
FindNeighbors() %>%
FindClusters(resolution = .1)})
nDNT.decon = mapply(DoubletDecon.f,
object = object,
filename = c("nDNT_rep1.sct", "nDNT_rep2.sct"),
rhop = 1,
SIMPLIFY = F)
save(nDNT.decon, file = "01.nDNT.decon.Rdata")
```
## Union of DoubletFinder and DoubletDecon, using SCT results
```{r}
source("../26.MAIT/DoubletFinder.f.r")
doublet.finder.res = mapply(DoubletFinder.f, object = nDNT.decon, SIMPLIFY = F)
save(doublet.finder.res, file = "01.DoubletDecon.DoubletFinder.Rdata")
```

---
title: "00.plot2"
output: html_notebook
editor_options:
  chunk_output_type: console
---

2020-11-20 edit

# library
```{r}
library(Seurat)
library(harmony)
library(MAST)
library(dplyr)
library(tidyselect)
library(RColorBrewer)
library(future)
library(ggplot2)
library(org.Mm.eg.db)
library(EnsDb.Mmusculus.v79)
library(cowplot)
library(data.table)
library(clusterProfiler)
library(DoubletDecon)
library(DoubletFinder)
Sys.setenv(LANGUAGE = "en")
options(warn = -1)
memory.limit(size = 64670)
```
# theme set
```{r}
theme_set(theme_cowplot(font_size = 8))
theme.text = theme_cowplot(font_size = 8)
cols = c(brewer.pal(9, "Set1"),
brewer.pal(8, "Set2")[-c(2,4,8)],
brewer.pal(12, "Set3")[-9],
brewer.pal(12, "Paired"),
brewer.pal(8, "Dark2"),
brewer.pal(11, "Spectral")[-6],
brewer.pal(11, "BrBG")[-6]
)
```
# load data
```{r}
load("01.cca.kept.Rdata")
load("01.cca.kept.cc.marker.Rdata")
load("01.cca.kept.cc.marker.48nk.Rdata")
load("01.harmony.dnt.nk.Rdata")
load("01.harmony.dnt.nk.meta.Rdata")
load("01.DoubletDecon.DoubletFinder.Rdata")
```
# load function
```{r}
source("../24.publish_least_library/function/heatmap.f.r")
source("../24.publish_least_library/function/dotplot.f.r")

```
# fig 2a dimplot
```{r}
object = cca.kept$nDNT
##### save the png #####
(DimPlot(object, cols = "Paired", pt.size = .001, order = T) +
theme_nothing() +
theme(aspect.ratio = 1)) %>%
ggsave(filename = "tmp.png", plot = ., units = "cm", width = 6, height = 6, dpi = 1200)
##### plots #####
((DimPlot(object, pt.size = NA, cols = "Paired", combine = F))[[1]] +
theme.text +
labs(color = paste0("nDNT\ncells\n(", prettyNum(ncol(object), big.mark = ","), ")"), tag = "a") +
annotation_raster(png::readPNG("tmp.png"), -Inf, Inf, -Inf, Inf) +
annotation_custom(grob = ggplotGrob((DimPlot(object, label = T, label.size = 2.5, pt.size = NA,
combine = F))[[1]] +
theme_nothing() +
theme(aspect.ratio = 1))) +
theme(aspect.ratio = 1)) %>%
ggsave(filename = "Fig2a.pdf", plot = ., units = "cm", width = 6, height = 5)

```
# fig 2b heatmap
```{r}
colnames(cc.marker$nDNT)
object = cca.kept$nDNT
df = cc.marker$nDNT %>%
dplyr::filter((pct.1 > .5 & cluster == 4 & avg_logFC > .5) | !(cluster == 4) & p_val_adj < .05) %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T)
tail(df)
table(df$cluster)
write.csv(df, file = "TableS4.nDNT.subgroup.markers.new.csv", row.names = F)
heatmap.f(object = object,
features = df$gene,
type = df$cluster,
xlab = paste0("nDNT cells (", prettyNum(ncol(object), big.mark = ","),")"),
ylab = paste0("Subgroup markers of nDNT (", prettyNum(nrow(df), big.mark = ",") ,")"),
tag = "b",
pal = c(brewer.pal(5, "Paired"),
brewer.pal(5, "Paired")),
axis.text.y = element_blank()
) %>%
ggsave(filename = "Fig2b.pdf", plot = ., units = "cm", width = 8, height = 7)
```
# fig 2c dotplot
```{r}
##### set the parameters #####
df = cc.marker$nDNT %>%
dplyr::filter((pct.1 > .5 & cluster == 4 & avg_logFC > 1) | !(cluster == 4) & p_val_adj < .05) %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T) %>%
dplyr::mutate(type = paste0("nDNT", cluster)) %>%
dplyr::left_join(y = cc.marker.48nk %>%
dplyr::filter(type %in% paste0("nDNT", 0:4)),
by = c("gene", "type"),
suffix = c("", ".48nk")) %>%
dplyr::mutate(type = "Higher than CD4 CD8 NK")
colnames(df)
head(df)
df$type[is.na(df$avg_logFC.48nk)] = "Not higher than CD4 CD8 NK"
head(df[,c("gene", "type", "avg_logFC", "cluster")])
##### annotate #####
ensembl = biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mm_go_anno = biomaRt::getBM(
attributes = c("external_gene_name", "description", "go_id", "name_1006",
"namespace_1003", "definition_1006"),
filters = c("external_gene_name"),
values = unique(as.character(df$gene)),
mart = ensembl) %>%
dplyr::filter(namespace_1003 %in% c("cellular_component", "molecular_function"))
cs = sort(unique(mm_go_anno$external_gene_name[grep("component of membrane",
mm_go_anno$name_1006)]))
cs.neg = sort(unique(mm_go_anno$external_gene_name[grep(
  "mitochondrial|integral component of Golgi membrane|intracellular membrane-bounded organelle|nuclear membrane",
  mm_go_anno$name_1006)]))
cs = cs[!(cs %in% cs.neg)]
secreted = sort(unique(mm_go_anno$external_gene_name[
c(grep("extracellular space", mm_go_anno$name_1006),
grep("granzyme", mm_go_anno$description))]))
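# build the pattern with paste() so that no line break ends up inside the regular expression
tf = sort(unique(mm_go_anno$external_gene_name[
  grep(paste("transcription factor activity", "transcription activator activity",
             "transcription repressor activity", "transcription coactivator activity",
             "DNA binding", "DNA-binding", sep = "|"),
       mm_go_anno$name_1006)]))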
intersect(cs, secreted)
intersect(cs, tf)
intersect(secreted, tf)
cs = cs[!(cs %in% tf)]
secreted = secreted[(!(secreted %in% cs)) & (!(secreted %in% tf))]
ng = sort(unique(df$gene[!(df$gene %in% c(cs, secreted, tf))]))
anno.df = data.frame(
gene = c(cs, secreted, tf, ng),
anno = c(rep("cell surface", length(cs)),
rep("secreted", length(secreted)),
rep("transcription factor", length(tf)),
rep("", length(ng)))) %>%
dplyr::filter(!(gene %in% c("Actn1", "Actn2")))
df.plot =
dplyr::left_join(df, anno.df, by = c("gene")) %>%
dplyr::select(gene, cluster, type, anno, avg_logFC) %>%
dplyr::mutate(cluster = paste0("nDNT", cluster))
df.plot$anno = factor(
df.plot$anno,
levels = c("cell surface", "secreted", "transcription factor", ""))
head(df.plot)
##### object #####
object = harmony.dnt.nk
object@meta.data = meta
object = subset(object, cells = colnames(object)[object$celltype %in% c("CD4", "CD8", "NK",
"nDNT")])
table(object$cluster)
object$cluster = factor(object$cluster, levels = c(paste("nDNT",0:4,sep = ""), "CD4", "CD8", "NK"))
##### plot #####
plan("sequential")
p = mapply(function(i, j){
x = df.plot %>%
dplyr::filter(anno == i) %>%
dplyr::group_by(cluster) %>%
dplyr::top_n(10, avg_logFC) %>%
dplyr::group_by(cluster, type) %>%
dplyr::arrange(gene, .by_group = T)
head(x)
p = dotplot.f(object, features = x$gene, group.by = "cluster", facet = x$type,
axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1)) +
labs(x = "", title = j, y = "Subgroup marker genes")
return(p)
},
i = c("cell surface", "secreted", "transcription factor", ""),
j = c("Cell\nsurface","Secreted","Transcription\nfactor","Remaining\ngenes"),
SIMPLIFY = F,
USE.NAMES = F)
p[[1]] = p[[1]] + NoLegend()
p[[2]] = p[[2]] + theme(axis.title.y = element_blank()) + NoLegend()
p[[3]] = p[[3]] + theme(axis.title.y = element_blank())
p[[4]] = p[[4]] + theme(axis.title.y = element_blank()) + NoLegend()
##### change the strip color #####
for (i in 1:4) {
pal = c(brewer.pal(3, "Pastel1")[1:2])
g = ggplotGrob(p[[i]])
strips = grep("strip-", g$layout$name)
for (x in seq_along(strips)) {
k = which(grepl("rect", g$grobs[[strips[[x]]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[[x]]]]$grobs[[1]]$children[[k]]$gp$fill = pal[x]
}
p[[i]] = g
}
##### save plots #####
plot_grid(plotlist = p, nrow = 1, labels = "c", label_size = 8, rel_widths = c(1,.9,.85,1.1)) %>%
ggsave(filename = "Fig2c.pdf", plot = ., units = "cm", width = 19, height = 16)
```
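
The annotation step above assigns each gene to exactly one category, with transcription factors taking priority over cell surface genes, and both taking priority over secreted genes. The following base R sketch reproduces that priority order on a hypothetical GO table (the `go` data frame, the gene names and the `pick` helper are illustrative and not real biomaRt output):

```r
# Hypothetical GO annotation rows (gene, GO term name).
go = data.frame(
  gene = c("GeneA", "GeneB", "GeneB", "GeneC", "GeneD", "GeneD"),
  term = c("integral component of membrane",
           "integral component of membrane", "DNA binding",
           "extracellular space",
           "integral component of membrane", "nuclear membrane"),
  stringsAsFactors = FALSE
)
genes = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE")
# Genes with at least one GO term matching the pattern.
pick = function(pattern) sort(unique(go$gene[grepl(pattern, go$term)]))
tf = pick("DNA binding")
# Membrane genes, minus intracellular membranes, minus transcription factors.
cs = setdiff(pick("component of membrane"), c(pick("nuclear membrane"), tf))
secreted = setdiff(pick("extracellular space"), c(cs, tf))
# Everything left over goes to the unlabelled group.
ng = setdiff(genes, c(cs, secreted, tf))
anno = data.frame(
  gene = c(cs, secreted, tf, ng),
  anno = rep(c("cell surface", "secreted", "transcription factor", ""),
             times = c(length(cs), length(secreted), length(tf), length(ng)))
)
print(anno)
```

`GeneB` carries both a membrane and a DNA-binding term and ends up as a transcription factor; `GeneD` is excluded from the cell surface group by its nuclear membrane term and falls into the remaining genes.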
# fig S6 dimplot
```{r}
object = cca.kept$nDNT
##### save pngs #####
mapply(function(ident, num){
  (DimPlot(object, cols = "Paired", pt.size = .1,
           cells = colnames(object)[object$orig.ident == ident]) +
     theme_nothing()) %>%
    ggsave(filename = paste0("tmp", num, ".png"), plot = ., units = "cm",
           width = 6, height = 6, dpi = 1200)
}, ident = c("nDNT_rep1", "nDNT_rep2"), num = 1:2, SIMPLIFY = F)

##### annotate function #####


annotation_custom2 <- function(grob, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, data){
layer(data = data, stat = StatIdentity, position = PositionIdentity,
geom = ggplot2:::GeomCustomAnn,
inherit.aes = TRUE, params = list(grob = grob,
xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax))}
##### plots #####
p = DimPlot(object, cols = "Paired", pt.size = NA, split.by = "orig.ident", combine = F)[[1]]
(p +
   annotation_custom2(grid::rasterGrob(png::readPNG("tmp1.png")), -Inf, Inf, -Inf, Inf,
                      data = p$data[which(p$data$orig.ident == "nDNT_rep1")[1],]) +
   annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)[object$orig.ident == "nDNT_rep1"]),
                                         pt.size = NA, label = T, label.size = 2.5) +
                                   NoAxes() + NoLegend()),
                      -Inf, Inf, -Inf, Inf,
                      data = p$data[which(p$data$orig.ident == "nDNT_rep1")[1],]) +
   annotation_custom2(grid::rasterGrob(png::readPNG("tmp2.png")), -Inf, Inf, -Inf, Inf,
                      data = p$data[which(p$data$orig.ident == "nDNT_rep2")[1],]) +
   annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)[object$orig.ident == "nDNT_rep2"]),
                                         pt.size = NA, label = T, label.size = 2.5) +
                                   NoAxes() + NoLegend()),
                      -Inf, Inf, -Inf, Inf,
                      data = p$data[which(p$data$orig.ident == "nDNT_rep2")[1],]) +
   labs(color = paste0("nDNT\ncells\n(", prettyNum(ncol(object), big.mark = ","), ")")) +
   theme.text +
   theme(aspect.ratio = 1)) %>%
  ggsave(filename = "FigS6.pdf", plot = ., units = "cm", width = 12, height = 6)
```
# fig S7 heatmap
```{r}
##### set the parameters #####
df = cc.marker$nDNT %>%
dplyr::filter((pct.1 > .5 & cluster == 4 & avg_logFC > 1) | !(cluster == 4) & p_val_adj < .05) %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T) %>%
dplyr::mutate(type = paste0("nDNT", cluster)) %>%
dplyr::left_join(y = cc.marker.48nk %>%
dplyr::filter(type %in% paste0("nDNT", 0:4)),
by = c("gene", "type"),
suffix = c("",".NK")) %>%
dplyr::mutate(type = "Higher than CD4 CD8 NK")
colnames(df)
df$type[is.na(df$pct.2.NK)] = "Not higher than CD4 CD8 NK"
head(df[,c("gene", "type")])
df = df[df$type == "Higher than CD4 CD8 NK",]
df$gene = factor(df$gene, levels = rev(sort(unique(as.character(df$gene)))))
##### write csv #####
write.csv(df, file = "TableS5.nDNT.subgroup.CD48NK.markers.csv", row.names = F)
##### object #####
object = harmony.dnt.nk
object = subset(object, cells = colnames(object)[object$celltype %in% c("CD4", "CD8", "NK",
"nDNT")])
object$id.dot = as.character(object$celltype)
object$id.dot[grep("nDNT", object$id.dot)] = paste("nDNT",
cca.kept$nDNT$integrated_snn_res.0.05, sep = "")
object$id.dot = factor(object$id.dot, levels = c(paste("nDNT",0:4,sep = ""), "CD4", "CD8", "NK"))
df1 = df[1:95,]
df2 = df[96:(96 + 95),]
df3 = df[(96 + 96):nrow(df),] # start after the last row of df2 so that no gene is plotted twice
##### plot #####
p1 = heatmap.f(object = object,
features = df1$gene,
type = df1$cluster,
group.by = "id.dot",
angle.x = 90,
axis.text.y = element_text(face = "italic", hjust = 0, size = 7),
pal = c(brewer.pal(8, "Paired"),
brewer.pal(8, "Paired")[1:2]),
xlab = "",
ylab = "",
tag = "",
space = "free_y")
p2 = heatmap.f(object = object,
features = df2$gene,
type = df2$cluster,
group.by = "id.dot",
angle.x = 90,
axis.text.y = element_text(face = "italic", hjust = 0, size = 7),
pal = c(brewer.pal(8, "Paired"),
brewer.pal(8, "Paired")[2:3]),
xlab = "",
ylab = "",
tag = "",
space = "free_y")
p3 = heatmap.f(object = object,
features = df3$gene,
type = df3$cluster,
group.by = "id.dot",
angle.x = 90,
axis.text.y = element_text(face = "italic", hjust = 0, size = 7),
pal = c(brewer.pal(8, "Paired"),
brewer.pal(8, "Paired")[3:5]),
xlab = "",
ylab = paste("Subgroup markers of nDNT higher than CD4 CD8 NK (", nrow(df), ")", sep =
""),
tag = "",
space = "free_y")
#####
plot_grid(p1, p2, p3, nrow = 1) %>%
ggsave(filename = "FigS7.pdf", plot = ., units = "cm", width = 19, height = 25)
```
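
When a long marker table is spread over several heatmap panels, each row should land in exactly one panel. A minimal base R sketch of one way to do this, assuming equal block sizes (the `markers` table, `block.size` and `parts` are hypothetical names, and the block size of 96 is only an example):

```r
# Hypothetical marker table with 250 rows standing in for the real one.
markers = data.frame(gene = paste0("Gene", 1:250), stringsAsFactors = FALSE)
block.size = 96
# Assign each row to block 1, 2, 3, ... so that no row appears twice.
block = ceiling(seq_len(nrow(markers)) / block.size)
parts = split(markers, block)
print(sapply(parts, nrow))
```

Each element of `parts` can then be passed to a plotting function in turn, and the block boundaries never overlap.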
# fig S8 go
```{r cluego}
##### read the data #####
df = read.table(file = "FigS8.cluego.txt", header = T, sep = "\t") %>%
setNames(c("Term", "Source", "AdjustP", "AssociatedGenes", "Num.Genes", "Genes", "Cluster")) %>%
dplyr::group_by(Cluster, Source) %>%
dplyr::arrange(Term, .by_group = T)
df$Term = factor(df$Term, levels = unique(df$Term))
head(df)
##### plot #####
p = ggplot(df, aes(Cluster, Term, color = AdjustP, size = AssociatedGenes)) +
geom_point() +
facet_grid(rows = vars(Source), scales = "free", space = "free") +
scale_color_distiller(palette = "Spectral", direction = 1, guide = guide_colorbar(default.unit =
"cm", barwidth = .2, barheight = 1.5)) +
scale_size_area(max_size = 2.5, guide = guide_legend(title = "Associated\nGenes (%)")) +
labs(x = "nDNT cluster") +
theme.text +
theme(strip.text.y = element_text(angle = 0))
##### change the strip color #####
g = ggplotGrob(p)
pal = brewer.pal(5, "Set3")
strips = grep("strip-", g$layout$name)
for (i in seq_along(strips)) {
k = which(grepl("rect",
g$grobs[[strips[i]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[i]]]$grobs[[1]]$children[[k]]$gp$fill = pal[i]
}
##### save plot #####
ggsave(filename = "FigS8.cluego.pdf", plot = plot_grid(g), units = "cm", width = 16, height = 9)
```

# fig S10 doubletfinder


```{r}
##### set parameters #####
meta = lapply(doublet.finder.res, function(i){
i@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::select(cells, isADoublet, contains("DF.classifications")) %>%
setNames(c("cells", "isADoublet", "rm","DoubletFinder")) %>%
dplyr::select(-rm)
}) %>%
data.table::rbindlist() # get the doublets information
object = cca.kept$nDNT
object@meta.data = object@meta.data %>%
  tibble::rownames_to_column("cells") %>%
  dplyr::left_join(y = meta, by = "cells") %>%
  tibble::column_to_rownames("cells") %>%
  dplyr::mutate(doublets = paste(isADoublet, DoubletFinder)) # put doublets information into cca results
head(object@meta.data)
# "FALSE Singlet" # singlets type
##### plots #####
mapply(function(features){
(FeaturePlot(object, features, pt.size = .01, order = T) +
scale_color_distiller(palette = "Spectral") +
theme_nothing() +
theme(panel.background = element_rect(fill = "lightgrey", colour = NA))) %>%
ggsave(filename = "tmp.png", plot = ., units = "cm", width = 6, height = 6, dpi = 1200)
mapply(function(object, subtitle){
p1 = FeaturePlot(object, features, pt.size = NA) +
scale_color_distiller(palette = "Spectral", guide = guide_colorbar(title = "Expression",
title.position = "left", title.theme = element_text(angle = 90, size = 7, hjust = .5, vjust = .5),
default.unit = "cm", barwidth = .1, barheight = 1.5)) +
labs(subtitle = paste0(subtitle, " doublets removed")) +
theme.text +
theme(aspect.ratio = 1,
plot.title = element_text(face = "italic", hjust = .5, size = 8),
plot.subtitle = element_text(hjust = .5)) +
NoAxes() +
annotation_raster(png::readPNG("tmp.png"), -Inf, Inf, -Inf, Inf) +
annotation_custom(ggplotGrob(FeaturePlot(object, features, pt.size = NA, label = T, label.size =
2.5, repel = T) + theme_nothing()))
p2 = VlnPlot(object, features, pt.size = F) +
scale_fill_brewer(palette = "Paired") +
labs(x = "nDNT") +
theme.text +
NoLegend() +
theme(plot.title = element_blank())
plot_grid(p1, p2, ncol = 1, rel_heights = c(2, 1))
}, object = list(object, subset(object, cells = colnames(object)[object$doublets == "FALSE Singlet"])),
SIMPLIFY = F, subtitle = c("Before", "After")) %>%
plot_grid(plotlist = ., nrow = 1)
},
features = c("Cd74", "H2-Ab1", "H2-Eb1", "Spi1"),
SIMPLIFY = F) %>%
plot_grid(plotlist = ., labels = "auto", label_size = 8) %>%
ggsave(filename = "FigS10.pdf", plot = ., units = "cm", width = 19, height = 18)

```
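
The "after" panels keep only cells that neither tool flags as a doublet. A minimal base R sketch of that filter, using hypothetical calls (the `calls` table is illustrative; the column names follow the ones used in the chunk above):

```r
# Hypothetical calls from the two tools for five cells.
calls = data.frame(
  cells = paste0("cell", 1:5),
  isADoublet = c(FALSE, TRUE, FALSE, TRUE, FALSE),            # DoubletDecon-style flag
  DoubletFinder = c("Singlet", "Singlet", "Doublet", "Doublet", "Singlet"),
  stringsAsFactors = FALSE
)
# paste() joins the two calls with a single space, e.g. "FALSE Singlet".
calls$doublets = paste(calls$isADoublet, calls$DoubletFinder)
# Removing the union of both doublet sets leaves cells called singlets by both tools.
kept = calls$cells[calls$doublets == "FALSE Singlet"]
print(table(calls$doublets))
print(kept)
```

The comparison string has to match the `paste()` output exactly, including the single space between the two calls.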

---
title: "00.plot3"
output: html_notebook
editor_options:
  chunk_output_type: console
---

2020-11-24

# library
```{r}
library(Seurat)
library(ggplot2)
library(cowplot)
library(dplyr)
library(grid)
```
# theme set
```{r}
theme.text = theme(text = element_text(size = 8),
axis.text = element_text(size = 8),
axis.title = element_text(size = 8),
plot.title = element_text(size = 8, face = "plain"),
plot.tag = element_text(size = 8, face = "bold"))
theme_set(theme_cowplot())
```
# fig 3c barplot
```{r}
N2 = 17
N4 = 10.1
N0 = 70.9*82.8/100
N1 = 70.9*82.8/100
N3 = 7.88*70.9/100
df = data.frame("percent" = c(N0,N1,N2,N3,N4),
"cluster" = 0:4,
"type" = "flow")
df$percent = round(df$percent/sum(df$percent)*100, digits = 1)
df1 = (prop.table(table(cca.kept$nDNT$integrated_snn_res.0.05, cca.kept$nDNT$orig.ident),
2)*100) %>% round(., 1) %>%
data.frame() %>%
setNames(c("cluster", "type", "percent"))
df = rbind(df, df1)
head(df)
# keep the plot in `c`: it is printed at the end of this chunk and placed in the Fig. 3 layout
c = ggplot(df, aes(type, percent, fill = cluster)) +
  geom_bar(stat = "identity", width = .7) +
  scale_fill_brewer(palette = "Paired") +
  labs(x = "", y = "percentage (%)", tag = "c") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1))
ggsave(filename = "Fig3c.pdf", plot = c, units = "cm", width = 3.5, height = 4.5)
# df = df %>%
# dplyr::mutate(lab.pos = cumsum(percent) - .1*percent)
#c=
# ggplot(data = df,
# mapping = aes(x = "", y = percent, fill = type)) +
# geom_bar(stat = "identity") +
# coord_polar("y", start = 0) +
# scale_fill_brewer(palette = "Paired") +
# NoAxes() +
# geom_text(aes(y = lab.pos,
# label = scales::percent(percent/100, accuracy = 1)),
# size = 3) +
# theme.text +
# theme(plot.title = element_text(hjust = .5)) +
# labs(fill = "Cell Type",
# title = "nDNT cells",
# tag = "c")
c
```
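
The per-replicate percentages in the bar plot come from a column-wise `prop.table()`. A minimal base R sketch with hypothetical labels (`cluster`, `replicate` and `df.pct` are illustrative names):

```r
# Hypothetical cluster and replicate labels for eight cells.
cluster = c(0, 0, 0, 1, 0, 1, 1, 1)
replicate = c("rep1", "rep1", "rep1", "rep1", "rep2", "rep2", "rep2", "rep2")
# margin = 2 normalises each column (replicate) so its clusters sum to 100 %.
pct = round(prop.table(table(cluster, replicate), margin = 2) * 100, 1)
# data.frame() on a table gives one row per cluster/replicate pair.
df.pct = setNames(data.frame(pct), c("cluster", "type", "percent"))
print(df.pct)
```

Because the normalisation is per column, the bars for different replicates are directly comparable even when the replicates contain different numbers of cells.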
# fig 3d heatmap
```{r}
load("01.cc.marker.Rdata")
# the folder name "11.王崧验证结果" means "11. Wang Song validation results"
df = read.table("../11.王崧验证结果/Naive01234.txt",
header = T,
stringsAsFactors = F) %>%
dplyr::select(N0 = Naive_0,
N1 = Naive_2,
N2 = Naive_1,
N3 = Naive_3,
N4 = Naive_4) %>%
t() %>%
as.data.frame() %>%
tibble::rownames_to_column("cell") %>%
reshape::melt("cell") %>%
dplyr::group_by(variable) %>%
dplyr::mutate(value = scale(value)) %>%
dplyr::select(gene = variable, cell, value) %>%
dplyr::left_join(
y = cc.marker$nDNT[, c("gene", "cluster")],
by = c("gene")
)
df$gene = factor(df$gene,
levels = rev(sort(unique(df$gene))))
d = ggplot(data = df,
mapping = aes(x = cell, y = gene, fill = value)) +
geom_tile(color = "white", size = 1) +
facet_grid(cluster~., scales = "free", space = "free") +
scale_fill_distiller(
palette = "RdBu",
guide = guide_colorbar(
title = expression(paste(Scaled, " ", Delta, Ct)),
title.position = "top",
barwidth = 1.5,
barheight = .2,
default.unit = "cm"
),
breaks = seq(-2, 2)
)+
labs(x = "nDNT cluster",
y = "",
tag = "d") +
scale_x_discrete(labels = as.character(0:4)) +
theme.text +
theme(legend.position = "bottom",
legend.justification = c(1,1),
axis.line = element_blank(),
axis.ticks = element_blank(),
axis.text.y = element_text(face = "italic"),
strip.text.y = element_text(angle = 0))
d
```
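
The heatmap values are scaled within each gene, so colours compare clusters for one gene rather than genes with each other. A minimal base R sketch of per-gene z-scoring with hypothetical values (the `qpcr` table is illustrative; `ave()` stands in for the `group_by()` plus `scale()` combination used above):

```r
# Hypothetical delta-Ct values: two genes measured in three clusters.
qpcr = data.frame(
  gene = rep(c("GeneA", "GeneB"), each = 3),
  cell = rep(c("N0", "N1", "N2"), times = 2),
  value = c(1, 2, 3, 10, 20, 60),
  stringsAsFactors = FALSE
)
# ave() applies the z-score separately within each gene.
qpcr$scaled = ave(qpcr$value, qpcr$gene, FUN = function(v) (v - mean(v)) / sd(v))
print(qpcr)
```

After scaling, both genes have mean 0 and standard deviation 1 across clusters, regardless of their original range.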
# fig3e facs stat subset
```{r FACS.stat.subset}
files = list.files(pattern = "FACS.*.csv", path = "../26.MAIT/")
files
celltype = c("nDNT0", "nDNT1", "nDNT2", "nDNT3", "nDNT4")
genetype = c("IKZF2", "Ly-6C", "IL17A", "GZMB")
df = lapply(grep("Helios.csv|Ly6c.csv|Il17a|Gzmb.v2", files, value = T), function(file){
df = read.csv(file = paste0("../26.MAIT/", file)) %>%
dplyr::mutate(gene = sub("Helios", "IKZF2",
sub("Gzmb", "GZMB",
sub("Ly6C", "Ly-6C",
sub("_[1-5]", "", X))))) %>%
tibble::column_to_rownames(var = "X") %>%
reshape2::melt(id.vars = "gene", value.name = "percentage") %>%
dplyr::mutate(percentage = as.numeric(sub("%", "", percentage)))
}) %>%
dplyr::bind_rows() %>%
dplyr::filter(variable %in% celltype)
df$gene = factor(df$gene, levels = genetype)
df$variable = factor(df$variable, levels = celltype)
head(df)
df.sum = df %>%
dplyr::group_by(gene, variable) %>%
dplyr::mutate(avg = mean(percentage), sd = sd(percentage)) %>%
dplyr::select(-percentage, -variable) %>%
unique()
df.sum$gene = factor(df.sum$gene, levels = genetype)
df.sum$variable = factor(df.sum$variable, levels = celltype)
df.sum
pvalue = c(
sapply(celltype[2:5], function(i){
t.test(subset(df, gene == "IKZF2" & variable == i)$percentage,
subset(df, gene == "IKZF2" & variable == "nDNT0")$percentage,
paired = T)$p.value
}),
sapply(celltype[2:5], function(i){
t.test(subset(df, gene == "Ly-6C" & variable == i)$percentage,
subset(df, gene == "Ly-6C" & variable == "nDNT0")$percentage,
paired = T)$p.value
}),
sapply(celltype[c(1, 3:5)], function(i){
t.test(subset(df, gene == "IL17A" & variable == i)$percentage,
subset(df, gene == "IL17A" & variable == "nDNT1")$percentage,
paired = T)$p.value
}),
sapply(celltype[c(1:3, 5)], function(i){
t.test(subset(df, gene == "GZMB" & variable == i)$percentage,
subset(df, gene == "GZMB" & variable == "nDNT3")$percentage,
paired = F)$p.value
}))
pvalue
pvalue = sapply(pvalue, function(i) {
  if (i > .05) "ns" else if (i > .01) "*" else if (i > .001) "**" else "***"
})
pvalue
max(df$percentage[df$gene == "IL17A"])
anno = data.frame(x1 = c(rep(0, 8),
0, 1, 1, 1,
0, 1, 2, 3) + 1,
x2 = c(1:4, 1:4 ,
1, 2, 3, 4,
3, 3, 3, 4) + 1,
y1 = c(seq(77, 200, 10)[1:4],
seq(77, 200, 10)[1:4],
seq(13, 200, 10)[1:4],
seq(75, 200, 10)[1:4]),
lab = pvalue,
gene = rep(genetype, c(4, 4, 4, 4))) %>%
dplyr::mutate(y2 = y1 + 3,
ystar = y1 + 6,
xstar = (x1 + x2)/2)
anno
p1 = ggplot(data = df.sum, mapping = aes(x = variable, y = avg)) +
geom_bar(stat = "identity", fill = NA, width = .6, aes(color = variable)) +
geom_errorbar(aes(ymin = avg - sd, ymax = avg + sd, color = variable), width = .3) +
geom_jitter(data = df, mapping = aes(x = variable, y = percentage, color = variable),
size = .5, width = .35) +
facet_grid(gene~., scales = "free_y", drop = F) +
geom_text(data = anno, aes(x = xstar, y = ystar, label = lab), size = 2) +
geom_segment(data = anno, aes(x = x1, xend = x1, y = y1, yend = y2), colour = "black") +
geom_segment(data = anno, aes(x = x2, xend = x2, y = y1, yend = y2), colour = "black") +
geom_segment(data = anno, aes(x = x1, xend = x2, y = y2, yend = y2), colour = "black") +
NoLegend() +
theme.text +
theme(strip.text.y.right = element_text(angle = 90, hjust = .5, vjust = .5)) +
labs(x = "nDNT", y = "FACS Percentage (%)") +
scale_x_discrete(labels = 0:4) +
scale_color_brewer(
  palette = "Set1",
  guide = guide_legend(label.hjust = 0, default.unit = "mm", keywidth = 3, keyheight = 3))
p1
# ggsave(filename = "Figure3e.png", plot = p1, units = "cm", width = 4, height = 8)
# pdf("Figure3e.pdf", useDingbats = F, width = 4/2.54, height = 6/2.54)
# print(p1)
# dev.off()
```
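
The significance labels above follow the usual thresholds (ns above .05, then `*`, `**`, `***` at .05, .01 and .001). A minimal base R sketch of a vectorised version using `cut()`, together with a paired t-test on hypothetical measurements (`p.to.star`, `a` and `b` are illustrative names, not part of the analysis):

```r
# Map p-values to labels: ns (> .05), * (<= .05), ** (<= .01), *** (<= .001).
p.to.star = function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, .001, .01, .05, Inf),
                   labels = c("***", "**", "*", "ns"),
                   right = TRUE))
}
# Hypothetical paired percentages measured in the same five animals.
a = c(10, 12, 11, 14, 13)
b = c(15, 18, 15, 20, 18)
p = t.test(a, b, paired = TRUE)$p.value
print(p.to.star(c(.2, .05, .01, .001, p)))
```

With `right = TRUE` each interval includes its upper bound, so a p-value of exactly .05 is labelled `*`, matching the `<=` comparisons in the chunk above.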
```{r vlnplot ikzf2 ly6c2}
# load("01.cca.Rdata")
object = cca$nDNT
DefaultAssay(object) = "SCT"
df = FetchData(object, c("Ikzf2", "Ly6c1", "Ly6c2", "Il17a", "Gzmb","seurat_clusters")) %>%
dplyr::rowwise() %>%
dplyr::mutate(Ly6c = sum(c(Ly6c1, Ly6c2))) %>%
reshape2::melt(id.vars = "seurat_clusters") %>%
dplyr::filter(!variable %in% c("Ly6c1", "Ly6c2"))
head(df)
table(df$variable)
df$variable = factor(df$variable, levels = c("Ikzf2", "Ly6c", "Il17a", "Gzmb"))
striplabel = c("Ikzf2", "Ly6c1 + Ly6c2", "Il17a", "Gzmb")
names(striplabel) = levels(df$variable)
p2 = ggplot(data = df, mapping = aes(x = seurat_clusters, y = value, color = seurat_clusters)) +
geom_violin(scale = "width") +
facet_grid(variable~., scales = "free", labeller = as_labeller(striplabel)) +
scale_color_brewer(palette = "Set1") +
theme.text +
theme(strip.text.y.right = element_text(angle = 90, hjust = .5, vjust = .5, face = "italic")) +
NoLegend() +
labs(x = "nDNT", y = "Expression level")
p2
p = plot_grid(p1, p2, nrow = 1)
ggsave(filename = "Figure3e.png", plot = p, units = "cm", width = 8, height = 9)
pdf("Figure3e.pdf", useDingbats = F, width = 8/2.54, height = 9/2.54)
print(p)
dev.off()
```
```{r ly6c1 ly6c2}
object = cca$nDNT
DefaultAssay(object) = "SCT"
df = FetchData(object, c("Ly6c1", "Ly6c2", "seurat_clusters")) %>%
reshape2::melt(id.vars = "seurat_clusters")
head(df)
table(df$variable)
p2 = ggplot(data = df, mapping = aes(x = seurat_clusters, y = value, color = seurat_clusters)) +
geom_violin(scale = "width") +
facet_grid(~variable, scales = "free") +
scale_color_brewer(palette = "Set1") +
theme.text +
theme(strip.text.x = element_text(angle = 0, hjust = .5, vjust = .5, face = "italic")) +
NoLegend() +
labs(x = "nDNT", y = "Expression level")
p2
ggsave(filename = "Figure3g.png", plot = p2, units = "cm", width = 4, height = 3)
pdf("Figure3g.pdf", useDingbats = F, width = 4/2.54, height = 3/2.54)
print(p2)
dev.off()
```

# fig 3
```{r}
pdf(file = "Fig.3.pdf",
width = 21/2.54,
height = 29.7/2.54,
useDingbats = F)
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 297, ncol = 210)))
print(c, vp = viewport(layout.pos.row = 140:190,
layout.pos.col = 10:80))
print(d, vp = viewport(layout.pos.row = 100:220,
layout.pos.col = 100:145))
dev.off()
```

---
title: "00.plot4"
output: html_notebook
editor_options:
  chunk_output_type: console
---

2020-11-21 edit

# library
```{r}
Sys.setenv(LANGUAGE = "en")
library(Seurat)
library(harmony)
library(MAST)
library(dplyr)
library(tidyselect)
library(RColorBrewer)
library(future)
library(ggplot2)
library(org.Mm.eg.db)
library(EnsDb.Mmusculus.v79)
library(cowplot)
library(data.table)
library(clusterProfiler)
library(DoubletDecon)
library(DoubletFinder)
options(warn = -1)
memory.limit(size = 64670)
```
# theme set
```{r}
theme_set(theme_cowplot(font_size = 8))
theme.text = theme_cowplot(font_size = 8)
cols = c(brewer.pal(9, "Set1"),
brewer.pal(8, "Set2")[-c(2,4,8)],
brewer.pal(12, "Set3")[-9],
brewer.pal(12, "Paired"),
brewer.pal(8, "Dark2"),
brewer.pal(11, "Spectral")[-6],
brewer.pal(11, "BrBG")[-6]
)
```
# load function
```{r}
source("../24.publish_least_library/function/heatmap.f.r")
source("../24.publish_least_library/function/dotplot.f.r")

```
# load data
```{r}
load("01.cca.kept.Rdata")
load("01.cca.kept.cc.marker.Rdata")
load("01.cca.kept.cc.marker.48nk.Rdata")

load("01.harmony.dnt.nk.Rdata")
load("01.harmony.dnt.nk.meta.Rdata")
```
# fig 4a dimplot
```{r}
object = cca.kept$aDNT
##### save the png #####
(DimPlot(object, cols = "Paired", pt.size = .001, order = T) +
theme_nothing() +
theme(aspect.ratio = 1)) %>%
ggsave(filename = "tmp.png", plot = ., units = "cm", width = 6, height = 6, dpi = 1200)
##### plots #####
p1 = (DimPlot(object, pt.size = NA, cols = "Paired", combine = F))[[1]] +
theme.text +
labs(color = paste0("aDNT\ncells\n(", prettyNum(ncol(object), big.mark = ","), ")"), tag = "a") +
annotation_raster(png::readPNG("tmp.png"), -Inf, Inf, -Inf, Inf) +
annotation_custom(grob = ggplotGrob((DimPlot(object, label = T, label.size = 2.5, pt.size = NA,
combine = F))[[1]] +
theme_nothing() +
theme(aspect.ratio = 1))) +
theme(aspect.ratio = 1)
```
```{r barplot}
df = (prop.table(table(cca.kept$aDNT$integrated_snn_res.0.05, cca.kept$aDNT$orig.ident),
2)*100) %>% round(., 1) %>%
data.frame() %>%
setNames(c("cluster", "type", "percent"))
head(df)
p2 = ggplot(df, aes(type, percent, fill = cluster)) +
geom_bar(stat = "identity", width = .7) +
scale_fill_brewer(palette = "Paired") +
labs(x = "", y = "percentage (%)", tag = "") +
theme(axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1),
aspect.ratio = 2)
plot_grid(p1, p2, nrow = 1, rel_widths = c(2, 1)) %>%
ggsave(filename = "Fig4a.pdf", width = 9, height = 6, units = "cm")
```

# fig 4b heatmap
```{r}
object = cca.kept$aDNT
colnames(cc.marker$aDNT)
df = cc.marker$aDNT %>%
dplyr::filter((pct.1 > .5 & cluster == 4 & avg_logFC > 1) | !(cluster == 4) & p_val_adj < .05) %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T)
tail(df)
table(df$cluster)
##### write.csv #####
write.csv(df, file = "TableS7.aDNT.subgroup.markers.csv", row.names = F)
##### plots #####
heatmap.f(object = object,
features = df$gene,
type = df$cluster,
xlab = paste0("aDNT cells (", prettyNum(ncol(object), big.mark = ","),")"),
ylab = paste0("Subgroup markers of aDNT (", prettyNum(nrow(df), big.mark = ",") ,")"),
tag = "b",
pal = c(brewer.pal(3, "Paired")[1:2],
brewer.pal(3, "Paired")[1:2]),
axis.text.y = element_blank()
) %>%
ggsave(filename = "Fig4b.pdf", plot = ., units = "cm", width = 8, height = 7)
```
# fig 4c dotplot
```{r}
##### set the parameters #####
df = cc.marker$aDNT %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T) %>%
dplyr::mutate(type = paste0("aDNT", cluster)) %>%
dplyr::left_join(y = cc.marker.48nk %>%
dplyr::filter(type %in% paste0("aDNT", 0:4)),
by = c("gene", "type"),
suffix = c("", ".48nk")) %>%
dplyr::mutate(type = "Higher than\nCD4 CD8 NK")
df$type[is.na(df$avg_logFC.48nk)] = "Not higher than\nCD4 CD8 NK"
head(df[,c("gene", "type", "avg_logFC", "cluster")])
##### annotate #####
ensembl = biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mm_go_anno = biomaRt::getBM(
attributes = c("external_gene_name", "description", "go_id", "name_1006",
"namespace_1003", "definition_1006"),
filters = c("external_gene_name"),
values = unique(as.character(df$gene)),
mart = ensembl) %>%
dplyr::filter(namespace_1003 %in% c("cellular_component", "molecular_function"))
cs = sort(unique(mm_go_anno$external_gene_name[grep("component of membrane",
mm_go_anno$name_1006)]))
cs.neg = sort(unique(mm_go_anno$external_gene_name[grep(
  "mitochondrial|integral component of Golgi membrane|intracellular membrane-bounded organelle|nuclear membrane",
  mm_go_anno$name_1006)]))
cs = cs[!(cs %in% cs.neg)]
secreted = sort(unique(mm_go_anno$external_gene_name[
c(grep("extracellular space", mm_go_anno$name_1006),
grep("granzyme", mm_go_anno$description))]))
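# build the pattern with paste() so that no line break ends up inside the regular expression
tf = sort(unique(mm_go_anno$external_gene_name[
  grep(paste("transcription factor activity", "transcription activator activity",
             "transcription repressor activity", "transcription coactivator activity",
             "DNA binding", "DNA-binding", sep = "|"),
       mm_go_anno$name_1006)]))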
intersect(cs, secreted)
intersect(cs, tf)
intersect(secreted, tf)
cs = cs[!(cs %in% tf)]
secreted = secreted[(!(secreted %in% cs)) & (!(secreted %in% tf))]
ng = sort(unique(df$gene[!(df$gene %in% c(cs, secreted, tf))]))
anno.df = data.frame(
gene = c(cs, secreted, tf, ng),
anno = c(rep("cell surface", length(cs)),
rep("secreted", length(secreted)),
rep("transcription factor", length(tf)),
rep("", length(ng)))) %>%
dplyr::filter(!(gene %in% c("Actn1", "Actn2")))
df.plot =
dplyr::left_join(df, anno.df, by = c("gene")) %>%
dplyr::select(gene, cluster, type, anno, avg_logFC) %>%
dplyr::mutate(cluster = paste0("aDNT", cluster))
df.plot$anno = factor(
df.plot$anno,
levels = c("cell surface", "secreted", "transcription factor", ""))
##### object #####
object = harmony.dnt.nk
object@meta.data = meta
object = subset(object, cells = colnames(object)[object$celltype %in% c("CD4", "CD8", "NK",
"aDNT")])
table(object$cluster)
object$cluster = factor(object$cluster, levels = c(paste("aDNT",0:1,sep = ""), "CD4", "CD8", "NK"))
##### plot #####
p = mapply(function(i, j){
x = df.plot %>%
dplyr::filter(anno == i) %>%
dplyr::group_by(cluster) %>%
dplyr::top_n(10, avg_logFC) %>%
dplyr::group_by(cluster, type) %>%
dplyr::arrange(gene, .by_group = T)
head(x)
p = dotplot.f(object, features = x$gene, group.by = "cluster", facet = x$type,
axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1)) +
labs(x = "", title = j, y = "Subgroup marker genes")
return(p)
},
i = c("cell surface", "secreted", "transcription factor", ""),
j = c("Cell\nsurface","Secreted","Transcription\nfactor","Remaining\ngenes"),
SIMPLIFY = F,
USE.NAMES = F)
p[[1]] = p[[1]] + NoLegend()
p[[2]] = p[[2]] + theme(axis.title.y = element_blank()) + NoLegend()
p[[3]] = p[[3]] + theme(axis.title.y = element_blank())
p[[4]] = p[[4]] + theme(axis.title.y = element_blank()) + NoLegend()
##### change the strip color #####
for (i in 1:4) {
pal = c(brewer.pal(3, "Pastel1")[1:2])
g = ggplotGrob(p[[i]])
strips = grep("strip-", g$layout$name)
for (x in seq_along(strips)) {
k = which(grepl("rect", g$grobs[[strips[[x]]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[[x]]]]$grobs[[1]]$children[[k]]$gp$fill = pal[x]
}
p[[i]] = g
}
##### save plots #####
plot_grid(plotlist = p, nrow = 1, labels = "c", label_size = 8, rel_widths = c(1,.9,.85,1.1)) %>%
ggsave(filename = "Fig4c.pdf", plot = ., units = "cm", width = 16, height = 8)
```
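The annotation step above puts each gene in exactly one category: transcription factors take precedence over cell-surface genes, and both take precedence over secreted genes. Below is a minimal base-R sketch of that precedence. The gene sets are made up for illustration (they are not real GO annotations), and `assign_category` is a hypothetical helper that is not part of the notebook.

```r
# Made-up candidate sets; in the notebook these come from biomaRt GO annotations.
all_genes = c("Cxcr6", "Gzmb", "Ikzf2", "Il17a", "Ly6c1", "Actb")
cs        = c("Cxcr6", "Ly6c1", "Ikzf2")   # cell surface candidates
secreted  = c("Gzmb", "Il17a", "Cxcr6")    # secreted candidates
tf        = c("Ikzf2")                     # transcription factor candidates

# Hypothetical helper: resolve overlaps with the precedence tf > cs > secreted.
assign_category = function(genes, cs, secreted, tf) {
  cs = setdiff(cs, tf)                      # drop TFs from the cell surface set
  secreted = setdiff(secreted, c(cs, tf))   # drop both from the secreted set
  anno = rep("", length(genes))             # "" marks the remaining genes
  anno[genes %in% cs] = "cell surface"
  anno[genes %in% secreted] = "secreted"
  anno[genes %in% tf] = "transcription factor"
  data.frame(gene = genes, anno = anno, stringsAsFactors = FALSE)
}

anno.df = assign_category(all_genes, cs, secreted, tf)
print(anno.df)
```

Because the overlaps are removed before labelling, the order of the three assignments inside the helper does not affect the result.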
# fig 4d go
```{r}
##### set genelist #####
df = cc.marker$aDNT %>%
dplyr::select(gene, cluster) %>%
dplyr::mutate(cluster = paste0("aDNT",cluster))
head(df)
combinations = mapply(function(i){df$gene[df$cluster == i]}, i = paste0("aDNT", 0:1), SIMPLIFY =
F) %>%
lapply(., function(i){
AnnotationDbi::mapIds(x = org.Mm.eg.db, keys = i, column = "ENTREZID", keytype = "SYMBOL")
})
names(combinations)
### go #####
go = clusterProfiler::compareCluster(geneClusters = combinations, # list names needed
fun = "enrichGO",
ont = "ALL",
OrgDb = org.Mm.eg.db,
readable = T,
pool = T)
head(go@compareClusterResult)
### kegg #####
kegg = clusterProfiler::compareCluster(geneClusters = combinations,
fun = "enrichKEGG",
keyType = "ncbi-geneid",
organism = "mmu",
use_internal_data = T)
kegg@readable = F
kegg = setReadable(x = kegg, OrgDb = "org.Mm.eg.db", keyType = "ENTREZID")
kegg@compareClusterResult$ONTOLOGY = "KEGG"
head(kegg@compareClusterResult)
##### plot #####
lapply(c(go, kegg), function(i){
enrichplot::dotplot(object = i, showCategory = 5, font.size = 8) +
facet_grid(ONTOLOGY ~ ., scales = "free", space = "free") +
scale_size_area(max_size = 2.5) +
scale_color_distiller(palette = "Spectral", direction = 1, guide = guide_colorbar(
default.unit = "mm", barwidth = 2)) +
theme.text +
theme(axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1))
}) %>%
plot_grid(plotlist = ., ncol = 1, labels = "d", label_size = 8) %>%
ggsave(filename = "Fig4d.pdf", plot = ., units = "cm", width = 9, height = 10)
```
```{r cluego}
##### read the data #####
df = read.table(file = "Fig4d.cluego.txt", header = T, sep = "\t") %>%
setNames(c("Term", "Source", "AdjustP", "AssociatedGenes", "Num.Genes", "Genes", "Cluster")) %>%
dplyr::group_by(Cluster, Source) %>%
dplyr::arrange(AdjustP, .by_group = T)
df$Term = factor(df$Term, levels = unique(df$Term))
df$Cluster = as.character(df$Cluster)
head(df)
##### plot #####
p = ggplot(df, aes(Cluster, Term, color = AdjustP, size = AssociatedGenes)) +
geom_point() +
facet_grid(rows = vars(Source), scales = "free", space = "free") +
scale_color_distiller(palette = "Spectral", direction = 1, guide = guide_colorbar(default.unit =
"cm", barwidth = .2, barheight = 1.5)) +
scale_size_area(max_size = 3, guide = guide_legend(title = "Associated\nGenes (%)")) +
labs(x = "aDNT cluster") +
theme.text
p
##### change the strip color #####
g = ggplotGrob(p)
pal = brewer.pal(4, "Set3")
strips = grep("strip-", g$layout$name)
for (i in seq_along(strips)) {
k = which(grepl("rect",
g$grobs[[strips[i]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[i]]]$grobs[[1]]$children[[k]]$gp$fill = pal[i]
}
##### save plot #####
ggsave(filename = "Fig4d.cluego.pdf", plot = plot_grid(g), units = "cm", width = 10.5, height = 8)
```
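The ClueGO chunk sorts the terms by adjusted P value within each cluster and source, then freezes that order as factor levels so the y axis follows it. The same idea is sketched below in base R. The term names and P values are placeholders, not ClueGO output.

```r
# Placeholder enrichment table; columns mirror the names set with setNames() above.
df = data.frame(
  Term    = c("term_a", "term_b", "term_c", "term_d"),
  Source  = c("GO", "GO", "KEGG", "KEGG"),
  AdjustP = c(0.01, 0.001, 0.03, 0.002),
  Cluster = c("0", "0", "1", "1"),
  stringsAsFactors = FALSE)

# Sort by cluster, then source, then adjusted P value (most significant first).
df = df[order(df$Cluster, df$Source, df$AdjustP), ]

# Freeze the sorted order: ggplot2 draws discrete axes in factor-level order.
df$Term = factor(df$Term, levels = unique(df$Term))
print(levels(df$Term))
```

Without the `factor()` call, the terms would be drawn in alphabetical order and the within-group ranking would be lost.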

# fig S11 dimplot and featureplot


```{r dimplot}
object = cca.kept$aDNT
##### save pngs #####
mapply(function(ident, num){
(DimPlot(object, cols = "Paired", pt.size = .1, cells = colnames(object)[object$orig.ident ==
ident]) + theme_nothing()) %>% ggsave(filename = paste0("tmp",num,".png"), plot = ., units =
"cm", width = 6, height = 6, dpi = 1200)
}, ident = c("aDNT_rep1", "aDNT_rep2"), num = 1:2, SIMPLIFY = F)

##### annotate function #####


annotation_custom2 <- function(grob, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, data){
layer(data = data, stat = StatIdentity, position = PositionIdentity,
geom = ggplot2:::GeomCustomAnn,
inherit.aes = TRUE, params = list(grob = grob,
xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax))}
##### plots #####
p = DimPlot(object, cols = "Paired", pt.size = NA, split.by = "orig.ident", combine = F)[[1]]
(p +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp1.png")), -Inf, Inf, -Inf, Inf, data =
p$data[which(p$data$orig.ident == "aDNT_rep1")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$orig.ident == "aDNT_rep1"]), pt.size = NA, label = T, label.size = 2.5) + NoAxes() +
NoLegend()), -Inf, Inf, -Inf, Inf, data = p$data[which(p$data$orig.ident == "aDNT_rep1")[1],]) +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp2.png")), -Inf, Inf, -Inf, Inf, data =
p$data[which(p$data$orig.ident == "aDNT_rep2")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$orig.ident == "aDNT_rep2"]), pt.size = NA, label = T, label.size = 2.5) + NoAxes() +
NoLegend()), -Inf, Inf, -Inf, Inf, data = p$data[which(p$data$orig.ident == "aDNT_rep2")[1],]) +
labs(color = paste0("aDNT\ncells\n(", prettyNum(ncol(object), big.mark = ","),")"), tag = "a") +
theme.text +
theme(aspect.ratio = 1)) %>%
ggsave(filename = "FigS11a.pdf", plot = ., units = "cm", width = 8, height = 4)
```
```{r featureplot}
object = cca.kept$aDNT
DefaultAssay(object) = "RNA"
object$CXCR6 = FetchData(object, "Cxcr6")
object$CXCR6 = object$CXCR6 > 0
table(object$CXCR6)
mapply(function(gene) {
##### save png plot #####
(FeaturePlot(object = object, features = gene, order = F, pt.size = .001) +
scale_color_distiller(palette = "Spectral") +
theme_nothing() +
theme(aspect.ratio = 1,
panel.background = element_rect(fill = "lightgrey", colour = NA))) %>%
ggsave(filename = "tmp.png", plot = ., units = "cm", width = 6, height = 6, dpi = 1200)
##### plot #####
p1 = FeaturePlot(object = object, features = gene, pt.size = NA, combine = F)[[1]] +
scale_color_distiller(palette = "Spectral",
guide = guide_colorbar(title = "Expression", title.position = "left",
title.theme = element_text(angle = 90, hjust = .5, vjust = .5, size = 7),
default.unit = "cm", barwidth = .1, barheight = 1.2,
label.theme = element_text(size = 6))) +
annotation_raster(png::readPNG("tmp.png"), -Inf, Inf, -Inf, Inf) +
theme.text +
theme(aspect.ratio = 1,
plot.title = element_text(size = 8, hjust = .5, face = "italic")) +
NoAxes() +
annotation_custom(ggplotGrob(FeaturePlot(object, gene, pt.size = NA, label = T, label.size =
2.5, combine = F)[[1]] +
theme(aspect.ratio = 1) +
theme_nothing()))
p2 = VlnPlot(object, gene, pt.size = F, group.by = "CXCR6") +
scale_color_brewer(palette = "Paired") +
scale_x_discrete(labels = c("Cxcr6-", "Cxcr6+")) +
theme.text +
NoLegend() +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(face = "italic"),
plot.title = element_text(face = "italic", hjust = .5))
return(plot_grid(p1, p2, ncol = 1, rel_heights = c(1.5,1)))
},
gene = c("Cxcr6", "Ikzf2", "Ly6c1", "Ly6c2", "Il17a", "Gzmb"), SIMPLIFY = F) %>%
plot_grid(plotlist = ., ncol = 3, labels = c("c", "d", "e", "f", "g", "h"), label_size = 9) %>%
ggsave(filename = "FigS11b.pdf", plot = ., units = "cm", width = 3*4, height = 2*6)
```

# fig S12 heatmap


```{r}
##### set the parameters #####
df = cc.marker$aDNT %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T) %>%
dplyr::mutate(type = paste0("aDNT", cluster)) %>%
dplyr::left_join(y = cc.marker.48nk %>%
dplyr::filter(type %in% paste0("aDNT", 0:1)),
by = c("gene", "type"),
suffix = c("", ".NK")) %>%
dplyr::mutate(type = "Higher than CD4 CD8 NK")
df$type[is.na(df$pct.2.NK)] = "Not higher than CD4 CD8 NK"
head(df[,c("gene", "type", "pct.2.NK")])
df = df[df$type == "Higher than CD4 CD8 NK",]
df$gene = factor(df$gene, levels = rev(sort(unique(as.character(df$gene)))))
table(df$type)
##### write csv #####
# write.csv(df, file = "TableS8.aDNT.subgroup.CD48NK.markers.csv", row.names = F)
##### object #####
object = harmony.dnt.nk
object = subset(object, cells = colnames(object)[object$celltype %in% c("CD4", "CD8", "NK",
"aDNT")])
object$id.dot = as.character(object$celltype)
object$id.dot[grep("aDNT", object$id.dot)] = paste("aDNT",
cca.kept$aDNT$integrated_snn_res.0.05, sep = "")
object$id.dot = factor(object$id.dot, levels = c(paste("aDNT",0:1,sep = ""), "CD4", "CD8", "NK"))
##### plot #####
heatmap.f(object = object,
features = df$gene,
type = df$cluster,
group.by = "id.dot",
angle.x = 90,
axis.text.y = element_text(face = "italic", hjust = 0, size = 5),
pal = c(brewer.pal(8, "Paired"),
brewer.pal(8, "Paired")[3:5]),
xlab = paste0(prettyNum(ncol(object), big.mark = ",")," cells"),
ylab = paste("Subgroup markers of aDNT higher than CD4 CD8 NK (", nrow(df), ")", sep = ""),
tag = "",
space = "free") %>%
ggsave(filename = "FigS12.pdf", plot = ., units = "cm", width = 8, height = 20)
```
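The filter above keeps a subgroup marker only if it also appears in the comparison against CD4, CD8 and NK cells: after the left join, a missing `.NK` column marks a gene that was not found there. Below is a minimal base-R sketch of that logic, with `merge()` standing in for `dplyr::left_join()`. The gene names and values are invented.

```r
# Invented subgroup markers (left table) and markers vs CD4/CD8/NK (right table).
markers = data.frame(
  gene = c("g1", "g2", "g3"),
  type = c("aDNT0", "aDNT0", "aDNT1"),
  pct.2 = c(0.3, 0.4, 0.2),
  stringsAsFactors = FALSE)
vs_ref = data.frame(
  gene = c("g1", "g3", "g3"),
  type = c("aDNT0", "aDNT1", "CD4"),
  pct.2 = c(0.1, 0.2, 0.3),
  stringsAsFactors = FALSE)

# Keep only the aDNT rows of the right table, then left-join on gene and type.
vs_ref = vs_ref[vs_ref$type %in% paste0("aDNT", 0:1), ]
df = merge(markers, vs_ref, by = c("gene", "type"), all.x = TRUE,
           suffixes = c("", ".NK"))

# Rows with no match on the right get NA in the suffixed column.
df$status = ifelse(is.na(df$pct.2.NK),
                   "Not higher than CD4 CD8 NK",
                   "Higher than CD4 CD8 NK")
print(df[, c("gene", "type", "status")])
```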
---
title: "00.plot5"
output: html_notebook
editor_options:
  chunk_output_type: console
---

2020-11-22 edit

# library
```{r}
Sys.setenv(LANGUAGE = "en")
library(Seurat)
library(harmony)
library(MAST)
library(dplyr)
library(tidyselect)
library(RColorBrewer)
library(future)
library(ggplot2)
library(org.Mm.eg.db)
library(EnsDb.Mmusculus.v79)
library(cowplot)
library(data.table)
library(clusterProfiler)
library(DoubletDecon)
library(DoubletFinder)
options(warn = -1)
memory.limit(size = 64670)
```
# theme set
```{r}
theme.text = theme_cowplot(font_size = 8)
cols = c(brewer.pal(9, "Set1"),
brewer.pal(8, "Set2")[-8],
brewer.pal(12, "Set3")[-9],
brewer.pal(12, "Paired"),
brewer.pal(8, "Dark2"),
brewer.pal(11, "Spectral")[-6],
brewer.pal(11, "BrBG")[-6]
)
```
# load function
```{r}
source("../24.publish_least_library/function/heatmap.f.r")
source("../24.publish_least_library/function/dotplot.f.r")

```
# load data
```{r}
load("01.cca.kept.Rdata")

load("01.harmony.dnt.nk.Rdata")
load("01.cc.marker.DNT.Rdata")
load("01.cca.kept.cc.marker.DNT.48nk.Rdata")

```
# fig 5a dimplot
```{r}
##### set parameters #####
object = cca.kept$DNT
DefaultAssay(object) = "integrated"
object = FindClusters(object, resolution = .01)
object$group = paste0("DNT", object$integrated_snn_res.0.01)
##### save png #####
(DimPlot(object, group.by = "group", cols = "Set2", pt.size = .1) +
theme_nothing()) %>%
ggsave(filename = "tmp.png", plot = ., units = "cm", width = 6, height = 6, dpi = 1200)
##### plots #####
p1 = (DimPlot(object, group.by = "group", cols = "Set2", pt.size = NA) +
theme.text +
labs(color = paste0("nDNT & aDNT\ncells (", prettyNum(ncol(object), big.mark = ","),")"), tag =
"a") +
theme(aspect.ratio = 1) +
NoAxes() +
annotation_raster(png::readPNG("tmp.png"), -Inf, Inf, -Inf, Inf) +
annotation_custom(ggplotGrob(DimPlot(object, group.by = "group", pt.size = NA, label = T,
label.size = 2.5) +
theme_nothing())))
p1
#####
meta = mapply(function(object, celltype){
object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::mutate(recluster = paste0(celltype, integrated_snn_res.0.05)) %>%
dplyr::select(cells, recluster)
}, object = cca.kept[1:2], SIMPLIFY = F, celltype = names(cca.kept)[1:2]) %>%
data.table::rbindlist()
object@meta.data = object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::left_join(y = meta, by = "cells") %>%
tibble::column_to_rownames("cells")
object$celltype = sub("_rep[1-2]", "", object$orig.ident) %>%
factor(x = ., levels = c("nDNT", "aDNT"))
object$recluster = factor(object$recluster, levels = c(paste0("nDNT", 0:4),paste0("aDNT", 0:1)))
##### bar plot #####
df = prop.table(table(object$recluster, object$group), 1) %>%
round(., 2) %>%
data.frame()
p3 = ggplot(df, aes(Var1, Freq, fill = Var2)) +
geom_bar(stat = "identity", position = "stack", width = .6) +
scale_fill_brewer(palette = "Set2") +
labs(x = "", y = "Percentage", fill = "") +
theme.text +
theme(aspect.ratio = 1,
axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1))
p3
##### save pngs #####
mapply(function(ident, num, color){
(DimPlot(object, pt.size = .1, group.by = "recluster",
cells = colnames(object)[object$celltype == ident]) +
scale_color_manual(values = brewer.pal(7, "Paired")[color]) +
theme_nothing()) %>%
ggsave(filename = paste0("tmp",num,".png"), plot = ., units = "cm", width = 6, height = 6, dpi =
1200)
}, ident = c("nDNT", "aDNT"), num = 1:2, color = list(1:5, 6:7), SIMPLIFY = F)
##### annotate function #####
annotation_custom2 <- function(grob, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, data){
layer(data = data, stat = StatIdentity, position = PositionIdentity,
geom = ggplot2:::GeomCustomAnn,
inherit.aes = TRUE, params = list(grob = grob,
xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax))}
##### plots #####
p = (DimPlot(object, cols = "Paired", pt.size = NA, split.by = "celltype", group.by = "recluster"))
p2 = (p +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp1.png")), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$celltype == "nDNT")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$celltype == "nDNT"]), pt.size = NA, label = T, label.size = 2.5, group.by = "recluster") +
NoAxes() + NoLegend()), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$celltype == "nDNT")[1],]) +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp2.png")), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$celltype == "aDNT")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$celltype == "aDNT"]), pt.size = NA, label = T, label.size = 2.5, group.by = "recluster") +
NoAxes() + NoLegend()), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$celltype == "aDNT")[1],]) +
labs(color = paste0("nDNT & aDNT\ncells (", prettyNum(ncol(object), big.mark = ","),")")) +
theme.text +
NoAxes() +
theme(aspect.ratio = 1))
p2
##### save plots #####
plot_grid(plot_grid(p1, p3, nrow = 1, rel_widths = c(1.4, 1)), p2, ncol = 1) %>%
ggsave(filename = "Fig5a.pdf", plot = ., units = "cm", width = 10, height = 8)
#####
```
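The bar plot in this chunk is built from row-wise proportions: for each reclustered subgroup, the fraction of its cells that fall into each merged DNT group. Below is a small base-R sketch of that table, using invented cell labels in place of the Seurat metadata.

```r
# Invented per-cell labels; in the notebook these are object$recluster and object$group.
meta = data.frame(
  recluster = c("nDNT0", "nDNT0", "nDNT0", "nDNT1", "nDNT1",
                "aDNT0", "aDNT0", "aDNT0", "aDNT0"),
  group     = c("DNT0", "DNT0", "DNT1", "DNT1", "DNT1",
                "DNT0", "DNT0", "DNT0", "DNT1"),
  stringsAsFactors = FALSE)

# margin = 1 normalises each row, so every subgroup sums to 1.
tab = table(meta$recluster, meta$group)
df = data.frame(round(prop.table(tab, 1), 2))

# Long format: Var1 = subgroup, Var2 = merged group, Freq = proportion.
print(df)
```

The long-format columns `Var1`, `Var2` and `Freq` are what `aes(Var1, Freq, fill = Var2)` refers to in the plot call.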
# fig 5b heatmap
```{r}
##### set the conserved marker genes #####
table(object$celltype, object$group)
DefaultAssay(object) = "RNA"
table(Idents(object))
# m = FindAllMarkers(object, only.pos = T)
# head(m)
# df = FindConservedMarkers(object, ident.1 = 0, ident.2 = 1, grouping.var = "orig.ident") %>%
# dplyr::filter(max_pval < .05) %>%
# tibble::rownames_to_column("gene") %>%
# dplyr::left_join(y = m, by = "gene") %>%
# dplyr::arrange(desc(avg_logFC))
# colnames(df)
# cc.marker.DNT = df
# save(cc.marker.DNT, file = "01.cc.marker.DNT.Rdata")
#####
df = cc.marker.DNT
head(df)
write.csv(df, file = "TableS10.nDNT.aDNT.conserved.marker.csv", row.names = F)
table(object$group)
heatmap.f(object = object,
features = df$gene,
type = df$cluster,
group.by = "group",
xlab = paste0("nDNT & aDNT cells (", prettyNum(ncol(object), big.mark = ","),")"),
ylab = paste0("Conserved subgroup markers\nof nDNT & aDNT cells (",
prettyNum(nrow(df), big.mark = ",") ,")"),
tag = "b",
pal = c(brewer.pal(4, "Set2")[1:2],
brewer.pal(4, "Set2")[1:2]),
axis.text.y = element_blank(),
angle.x = 90
) %>%
ggsave(filename = "Fig5b.pdf", plot = ., units = "cm", width = 8, height = 7)
#####
```
# fig 5c dotplot
```{r}
object = harmony.dnt.nk
object@meta.data = meta
object = subset(object, cells = colnames(object)[object$celltype %in% c("CD4", "CD8", "NK",
"nDNT", "aDNT")])
table(object$recluster)
object$recluster = factor(object$recluster, levels = c(paste0("DNT", 0:1), "CD4", "CD8", "NK"))
```
```{r}
##### set the parameters #####
df = cc.marker$DNT %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T) %>%
dplyr::mutate(type = paste0("DNT", cluster)) %>%
dplyr::left_join(y = cc.marker.48nk %>%
dplyr::filter(type %in% paste0("DNT", 0:1)),
by = c("gene", "type"),
suffix = c("", ".48nk")) %>%
dplyr::mutate(type = "Higher than\nCD4 CD8 NK")
df$type[is.na(df$avg_logFC.48nk)] = "Not higher than\nCD4 CD8 NK"
head(df[,c("gene", "type", "avg_logFC", "cluster")])
##### annotate #####
ensembl = biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
mm_go_anno = biomaRt::getBM(
attributes = c("external_gene_name", "description", "go_id", "name_1006",
"namespace_1003", "definition_1006"),
filters = c("external_gene_name"),
values = unique(as.character(df$gene)),
mart = ensembl) %>%
dplyr::filter(namespace_1003 %in% c("cellular_component", "molecular_function"))
cs = sort(unique(mm_go_anno$external_gene_name[grep("component of membrane",
mm_go_anno$name_1006)]))
cs.neg = sort(unique(mm_go_anno$external_gene_name[grep(
"mitochondrial|integral component of Golgi membrane|intracellular membrane-bounded organelle|nuclear membrane",
mm_go_anno$name_1006)]))
cs = cs[!(cs %in% cs.neg)]
secreted = sort(unique(mm_go_anno$external_gene_name[
c(grep("extracellular space", mm_go_anno$name_1006),
grep("granzyme", mm_go_anno$description))]))
# build the pattern from pieces so that no line break ends up inside the regex
tf = sort(unique(mm_go_anno$external_gene_name[
grep(paste(c("transcription factor activity", "transcription activator activity",
"transcription repressor activity", "transcription coactivator activity",
"DNA binding", "DNA-binding"), collapse = "|"),
mm_go_anno$name_1006)]))
intersect(cs, secreted)
intersect(cs, tf)
intersect(secreted, tf)
cs = cs[!(cs %in% tf)]
secreted = secreted[(!(secreted %in% cs)) & (!(secreted %in% tf))]
ng = sort(unique(df$gene[!(df$gene %in% c(cs, secreted, tf))]))
anno.df = data.frame(
gene = c(cs, secreted, tf, ng),
anno = c(rep("cell surface", length(cs)),
rep("secreted", length(secreted)),
rep("transcription factor", length(tf)),
rep("", length(ng)))) %>%
dplyr::filter(!(gene %in% c("Actn1", "Actn2")))
df.plot =
dplyr::left_join(df, anno.df, by = c("gene")) %>%
dplyr::select(gene, cluster, type, anno, avg_logFC) %>%
dplyr::mutate(cluster = paste0("DNT", cluster))
df.plot$anno = factor(
df.plot$anno,
levels = c("cell surface", "secreted", "transcription factor", ""))
head(df.plot)
##### plot #####
p = mapply(function(i, j){
x = df.plot %>%
dplyr::filter(anno == i) %>%
dplyr::group_by(cluster) %>%
dplyr::top_n(10, avg_logFC) %>%
dplyr::group_by(cluster, type) %>%
dplyr::arrange(gene, .by_group = T)
head(x)
p = dotplot.f(object, features = x$gene, group.by = "recluster", facet = x$type,
axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1)) +
labs(x = "", title = j, y = "Conserved subgroup markers\nof nDNT & aDNT cells")
return(p)
},
i = c("cell surface", "secreted", "transcription factor", ""),
j = c("Cell\nsurface","Secreted","Transcription\nfactor","Remaining\ngenes"),
SIMPLIFY = F,
USE.NAMES = F)
p[[1]] = p[[1]] + NoLegend()
p[[2]] = p[[2]] + theme(axis.title.y = element_blank()) + NoLegend()
p[[3]] = p[[3]] + theme(axis.title.y = element_blank())
p[[4]] = p[[4]] + theme(axis.title.y = element_blank()) + NoLegend()
##### change the strip color #####
for (i in 1:4) {
pal = c(brewer.pal(3, "Pastel1")[1:2])
g = ggplotGrob(p[[i]])
strips = grep("strip-", g$layout$name)
for (x in seq_along(strips)) {
k = which(grepl("rect", g$grobs[[strips[[x]]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[[x]]]]$grobs[[1]]$children[[k]]$gp$fill = pal[x]
}
p[[i]] = g
}
##### save plots #####
plot_grid(plotlist = p, nrow = 1, labels = "c", label_size = 8, rel_widths = c(1.2,.85,.85,1.1)) %>%
ggsave(filename = "Fig5c.pdf", plot = ., units = "cm", width = 15, height = 7)
```
# fig 5d go
```{r cluego}
##### read the data #####
df = read.table(file = "Fig5d.cluego.txt", header = T, sep = "\t") %>%
setNames(c("Term", "Source", "AdjustP", "AssociatedGenes", "Num.Genes", "Genes", "Cluster")) %>%
dplyr::group_by(Cluster, Source) %>%
dplyr::arrange(AdjustP, .by_group = T)
df$Term = factor(df$Term, levels = unique(df$Term))
df$Cluster = as.character(df$Cluster)
head(df)
##### plot #####
p = ggplot(df, aes(Cluster, Term, color = AdjustP, size = AssociatedGenes)) +
geom_point() +
facet_grid(rows = vars(Source), scales = "free", space = "free") +
scale_color_distiller(palette = "Spectral", direction = 1, guide = guide_colorbar(default.unit =
"cm", barwidth = .2, barheight = 1.5)) +
scale_size_area(max_size = 3, guide = guide_legend(title = "Associated\nGenes (%)")) +
labs(x = "DNT cluster", tag = "d") +
theme.text
p
##### change the strip color #####
g = ggplotGrob(p)
pal = brewer.pal(4, "Set3")
strips = grep("strip-", g$layout$name)
for (i in seq_along(strips)) {
k = which(grepl("rect",
g$grobs[[strips[i]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[i]]]$grobs[[1]]$children[[k]]$gp$fill = pal[i]
}
##### save plot #####
ggsave(filename = "Fig5d.cluego.pdf", plot = plot_grid(g), units = "cm", width = 10, height = 5.5)
```
# fig S13 dimplot
```{r}
##### set parameters #####
object = cca.kept$DNT
meta = mapply(function(object, celltype){
object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::mutate(recluster = paste0(celltype, integrated_snn_res.0.05)) %>%
dplyr::select(cells, recluster)
}, object = cca.kept[1:2], SIMPLIFY = F, celltype = names(cca.kept)[1:2]) %>%
data.table::rbindlist()
object@meta.data = object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::left_join(y = meta, by = "cells") %>%
tibble::column_to_rownames("cells")
object$celltype = sub("_rep[1-2]", "", object$orig.ident) %>%
factor(x = ., levels = c("nDNT", "aDNT"))
object$recluster = factor(object$recluster, levels = c(paste0("nDNT", 0:4),paste0("aDNT", 0:1)))
table(object$recluster)
##### save pngs #####
mapply(function(ident, num, color){
(DimPlot(object, pt.size = .1, group.by = "recluster",
cells = colnames(object)[object$orig.ident == ident]) +
scale_color_manual(values = brewer.pal(7, "Paired")[color]) +
theme_nothing()) %>%
ggsave(filename = paste0("tmp",num,".png"), plot = ., units = "cm", width = 6, height = 6, dpi =
1200)
},
ident = c("nDNT_rep1", "nDNT_rep2", "aDNT_rep1", "aDNT_rep2"),
num = 1:4,
color = list(1:5, 1:5, 6:7, 6:7),
SIMPLIFY = F)
##### annotate function #####
annotation_custom2 <- function(grob, xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf, data){
layer(data = data, stat = StatIdentity, position = PositionIdentity,
geom = ggplot2:::GeomCustomAnn,
inherit.aes = TRUE, params = list(grob = grob,
xmin = xmin, xmax = xmax,
ymin = ymin, ymax = ymax))}
##### plots #####
object$orig.ident = factor(object$orig.ident, levels = c("nDNT_rep1", "nDNT_rep2", "aDNT_rep1",
"aDNT_rep2"))
p = (DimPlot(object, cols = "Paired", pt.size = NA, split.by = "orig.ident", group.by = "recluster",
ncol = 2))
(p +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp1.png")), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "nDNT_rep1")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$orig.ident == "nDNT_rep1"]), pt.size = NA, label = T, label.size = 2.5, group.by =
"recluster") + NoAxes() + NoLegend()), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "nDNT_rep1")[1],]) +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp2.png")), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "nDNT_rep2")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$orig.ident == "nDNT_rep2"]), pt.size = NA, label = T, label.size = 2.5, group.by =
"recluster") + NoAxes() + NoLegend()), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "nDNT_rep2")[1],]) +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp3.png")), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "aDNT_rep1")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$orig.ident == "aDNT_rep1"]), pt.size = NA, label = T, label.size = 2.5, group.by =
"recluster") + NoAxes() + NoLegend()), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "aDNT_rep1")[1],]) +
annotation_custom2(grid::rasterGrob(png::readPNG("tmp4.png")), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "aDNT_rep2")[1],]) +
annotation_custom2(ggplotGrob(DimPlot(subset(object, cells = colnames(object)
[object$orig.ident == "aDNT_rep2"]), pt.size = NA, label = T, label.size = 2.5, group.by =
"recluster") + NoAxes() + NoLegend()), -Inf, Inf, -Inf, Inf,
data = p$data[which(p$data$orig.ident == "aDNT_rep2")[1],]) +
labs(color = paste0("nDNT & aDNT\ncells (", prettyNum(ncol(object), big.mark = ","),")")) +
theme.text +
NoAxes() +
theme(aspect.ratio = 1)) %>%
ggsave(filename = "FigS13.pdf", plot = ., units = "cm", width = 10, height = 8)
#####
```
# fig S14 heatmap
```{r}
##### object #####
object = cca.kept$DNT
DefaultAssay(object) = "integrated"
object = FindClusters(object, resolution = .01)
df = object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::select(cells, integrated_snn_res.0.01)
head(df)
object = harmony.dnt.nk
object = subset(object, cells = colnames(object)[object$celltype %in% c("CD4", "CD8", "NK",
"nDNT", "aDNT")])
meta = object@meta.data %>%
tibble::rownames_to_column("cells") %>%
dplyr::left_join(y = df, by = "cells") %>%
tibble::column_to_rownames("cells")
head(meta)
meta$cluster = paste0(sub("a|n", "", meta$celltype), meta$integrated_snn_res.0.01) %>%
sub("NA", "", .) %>%
factor(., levels = c(paste0("DNT", 0:1), "CD4", "CD8", "NK"))
table(meta$cluster)
object@meta.data = meta
##### set the parameters #####
df = cc.marker.DNT %>%
dplyr::group_by(cluster) %>%
dplyr::arrange(desc(avg_logFC), .by_group = T) %>%
dplyr::mutate(type = paste0("DNT", cluster)) %>%
dplyr::left_join(y = cc.marker.DNT.48nk %>%
dplyr::filter(type %in% paste0("DNT", 0:1)),
by = c("gene", "type"),
suffix = c("", ".NK")) %>%
dplyr::mutate(type = "Higher than CD4 CD8 NK")
df$type[is.na(df$pct.1.NK)] = "Not higher than CD4 CD8 NK"
head(df[,c("gene", "type", "avg_logFC")])
df = df[df$type == "Higher than CD4 CD8 NK",]
df$gene = factor(df$gene, levels = rev(sort(unique(as.character(df$gene)))))
##### write csv #####
write.csv(df, file = "TableS11.nDNT.aDNT.conserved.marker.vs.CD48NK.csv", row.names = F)
##### plot #####
heatmap.f(object = object,
features = df$gene,
type = df$cluster,
group.by = "cluster",
angle.x = 90,
axis.text.y = element_text(face = "italic", hjust = 0, size = 7),
pal = c(brewer.pal(5, "Paired"),
brewer.pal(5, "Paired")[1:2]),
xlab = paste0(prettyNum(ncol(object), big.mark = ",")," cells"),
ylab = paste("Conserved subgroup markers of nDNT & aDNT\nhigher than CD4 CD8 NK (",
nrow(df), ")", sep = ""),
tag = "",
space = "free") %>%
ggsave(filename = "FigS14.pdf", plot = ., units = "cm", width = 8, height = 12)
```

---
title: "00.plot6"
output: html_notebook
editor_options:
  chunk_output_type: console
---

2020-11-23 edit

# library
```{r}
Sys.setenv(LANGUAGE = "en")
library(Seurat)
library(harmony)
library(MAST)
library(dplyr)
library(tidyselect)
library(RColorBrewer)
library(future)
library(ggplot2)
library(org.Mm.eg.db)
library(EnsDb.Mmusculus.v79)
library(cowplot)
library(data.table)
library(clusterProfiler)
library(DoubletDecon)
library(DoubletFinder)
options(warn = -1)
memory.limit(size = 64670)
```
# theme set
```{r}
theme_set(theme_cowplot(font_size = 8))
theme.text = theme_cowplot(font_size = 8)
cols = c(brewer.pal(9, "Set1"),
brewer.pal(8, "Set2")[-8],
brewer.pal(12, "Set3")[-9],
brewer.pal(12, "Paired"),
brewer.pal(8, "Dark2"),
brewer.pal(11, "Spectral")[-6],
brewer.pal(11, "BrBG")[-6]
)
```
# load function
```{r}
source("../24.publish_least_library/function/heatmap.f.r")
source("../24.publish_least_library/function/dotplot.f.r")

```
# load data
```{r}
load("01.cca.kept.Rdata")

# load("01.harmony.dnt.nk.Rdata")
# load("01.cc.marker.DNT.Rdata")
# load("01.cca.kept.cc.marker.DNT.48nk.Rdata")

load("01.trans.marker.Rdata")
ensembl = biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")

```
# fig 6a scatter plot plus euler venn
```{r euler venn}
##### venn1 #####
combinations = list("DNT0" = trans.marker$DNT0$gene[trans.marker$DNT0$trans == "up"],
"DNT1" = trans.marker$DNT1$gene[trans.marker$DNT1$trans == "up"])
venn1 = plot(eulerr::euler(
combinations = combinations),
fills = list(fill = brewer.pal(6, "Set2")[c(6,4,3)], alpha = .6),
quantities = list(fontsize = 8, font = 1),
labels = list(fontsize = 8, font = 1,
labels = c(paste0("DNT0\n(", length(combinations$DNT0), ")"),
paste0("DNT1\n(", length(combinations$DNT1), ")"))),
main = list(labels = paste0("Subgroup upregulated genes\nafter activation (",
length(unique(c(combinations$DNT0, combinations$DNT1))), ")"),
fontsize = 6, font = 1),
adjust_labels = T)
##### venn2 #####
combinations = list("DNT0" = trans.marker$DNT0$gene[trans.marker$DNT0$trans == "down"],
"DNT1" = trans.marker$DNT1$gene[trans.marker$DNT1$trans == "down"])
venn2 = plot(eulerr::euler(
combinations = combinations),
fills = list(fill = brewer.pal(6, "Set2")[c(2,5,1)], alpha = .6),
quantities = list(fontsize = 8, font = 1),
labels = list(fontsize = 8, font = 1,
labels = c(paste0("DNT0\n(", length(combinations$DNT0), ")"),
paste0("DNT1\n(", length(combinations$DNT1), ")"))),
main = list(labels = paste0("Subgroup downregulated genes\nafter activation (",
length(unique(c(combinations$DNT0, combinations$DNT1))), ")"),
fontsize = 6, font = 1),
adjust_labels = T)
#####
plot_grid(venn1, venn2, nrow = 1) %>%
ggsave(filename = "Fig6a.s.pdf", plot = ., units = "cm", width = 6, height = 4)
#####
```
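The numbers drawn in each Euler diagram are plain set sizes: genes changed only in DNT0, only in DNT1, and in both. Below is a base-R sketch that computes them, which can serve as a quick check on the plotted counts. The gene identifiers are placeholders.

```r
# Placeholder gene lists, shaped like the `combinations` list passed to eulerr::euler().
combinations = list(DNT0 = c("a", "b", "c", "d"),
                    DNT1 = c("c", "d", "e"))

counts = c(
  DNT0.only = length(setdiff(combinations$DNT0, combinations$DNT1)),
  shared    = length(intersect(combinations$DNT0, combinations$DNT1)),
  DNT1.only = length(setdiff(combinations$DNT1, combinations$DNT0)),
  total     = length(unique(unlist(combinations))))
print(counts)
```

Here `total` is the same quantity the diagram titles compute with `length(unique(c(...)))`.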
```{r scatter plot}
##### set data #####
df = dplyr::full_join(trans.marker$DNT0, trans.marker$DNT1, by = c("gene"), suffix = c(".DNT0",
".DNT1")) %>%
dplyr::rowwise() %>%
dplyr::mutate(avg_logFC.DNT0 = mean(c(avg_logFC.rep1.DNT0, avg_logFC.rep2.DNT0)),
avg_logFC.DNT1 = mean(c(avg_logFC.rep1.DNT1, avg_logFC.rep2.DNT1)),
type.DNT0 = gsub("_rep1","0",type.rep1.DNT0),
type.DNT1 = gsub("_rep1","1",type.rep1.DNT1)) %>%
dplyr::select(gene, avg_logFC.DNT0, avg_logFC.DNT1, type.DNT0, type.DNT1, contains("trans"))
head(df)
df$avg_logFC.DNT0[is.na(df$avg_logFC.DNT0)] = 0
df$avg_logFC.DNT1[is.na(df$avg_logFC.DNT1)] = 0
df = dplyr::mutate(df, type = paste(type.DNT0, trans.DNT0, type.DNT1, trans.DNT1, sep = ".")) %>%
dplyr::select(gene, contains("avg"), type)
df$type = factor(df$type,
labels = c("aDNT0.vs.nDNT0 down\naDNT1.vs.nDNT1 down (81)",
"aDNT0.vs.nDNT0 down (114)",
"aDNT0.vs.nDNT0 up\naDNT1.vs.nDNT1 up (234)",
"aDNT0.vs.nDNT0 up (237)",
"aDNT1.vs.nDNT1 down (40)",
"aDNT1.vs.nDNT1 up (99)"))
table(df$type)
head(df)
write.csv(df, file = "TableS13.up-down-regulated.marker.csv", row.names = F)
##### plot #####
scatterplot = (ggplot(df, aes(avg_logFC.DNT0, avg_logFC.DNT1, color = type)) +
geom_point(alpha = .4, size = .5) +
geom_hline(yintercept = c(1, -1), color = "red", linetype = "dashed", size = .1) +
geom_vline(xintercept = c(1, -1), color = "red", linetype = "dashed", size = .1) +
ggrepel::geom_label_repel(data = df %>% dplyr::filter(abs(avg_logFC.DNT0) >= 1 |
abs(avg_logFC.DNT1) >= 1),
aes(avg_logFC.DNT0, avg_logFC.DNT1, label = gene), alpha = 1, size = 2, fontface =
"italic", segment.size = .1, min.segment.length = 0, seed = 1) +
scale_color_brewer(palette = "Set2") +
labs(x = "avg_logFC (aDNT0.vs.nDNT0)", y = "avg_logFC (aDNT1.vs.nDNT1)", color = "", tag = "a") +
theme(aspect.ratio = 1,
legend.position = c(.65, 0.15)))
plot_grid(venn2, scatterplot, venn1, nrow = 1, rel_widths = c(.2, 1, .2)) %>%
ggsave(filename = "Fig6a.pdf", plot = ., units = "cm", width = 12*1.4, height = 12)
```
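The scatter plot depends on a full join of the two per-subgroup marker tables. A gene found in only one subgroup gets a fold change of 0 on the other axis, and a gene is labelled only if it passes |avg_logFC| >= 1 on at least one axis. Below is a base-R sketch of those three steps, with `merge(all = TRUE)` standing in for `dplyr::full_join()`. The fold changes are invented.

```r
# Invented marker tables for the two subgroups.
dnt0 = data.frame(gene = c("Gzmb", "Il17a", "Cxcr6"),
                  avg_logFC = c(1.5, 0.4, -0.8),
                  stringsAsFactors = FALSE)
dnt1 = data.frame(gene = c("Gzmb", "Ly6c1"),
                  avg_logFC = c(1.1, -1.2),
                  stringsAsFactors = FALSE)

# Full outer join: keep genes present in either table.
df = merge(dnt0, dnt1, by = "gene", all = TRUE, suffixes = c(".DNT0", ".DNT1"))

# A gene absent from one comparison is placed at 0 on that axis.
df$avg_logFC.DNT0[is.na(df$avg_logFC.DNT0)] = 0
df$avg_logFC.DNT1[is.na(df$avg_logFC.DNT1)] = 0

# Same threshold as the dashed lines and the label filter in the plot.
df$labelled = abs(df$avg_logFC.DNT0) >= 1 | abs(df$avg_logFC.DNT1) >= 1
print(df)
```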
# fig 6b dotplot
```{r}
##### set data #####
up = df[grep("up", df$type),]
up$type = factor(up$type, levels = sort(unique(up$type))[c(2,1,3)]) %>%
gsub(" up","", .) %>%
gsub(" \\(.*\\)", "", .)
table(up$type)
##### annotate #####
mm_go_anno = biomaRt::getBM(
attributes = c("external_gene_name", "description", "go_id", "name_1006",
"namespace_1003", "definition_1006"),
filters = c("external_gene_name"),
values = unique(as.character(up$gene)),
mart = ensembl) %>%
dplyr::filter(namespace_1003 %in% c("cellular_component", "molecular_function"))
cs = sort(unique(mm_go_anno$external_gene_name[grep("component of membrane",
mm_go_anno$name_1006)]))
cs.neg = sort(unique(mm_go_anno$external_gene_name[grep("mitochondrial|integral
component of Golgi membrane|intracellular membrane-bounded organelle|nuclear membrane",
mm_go_anno$name_1006)]))
cs = cs[!(cs %in% cs.neg)]
secreted = sort(unique(mm_go_anno$external_gene_name[
c(grep("extracellular space", mm_go_anno$name_1006),
grep("granzyme", mm_go_anno$description))]))
tf = sort(unique(mm_go_anno$external_gene_name[
grep("transcription factor activity|transcription activator activity
|transcription repressor activity|transcription coactivator activity|
DNA binding|DNA-binding",
mm_go_anno$name_1006)]))
intersect(cs, secreted)
intersect(cs, tf)
intersect(secreted, tf)
cs = cs[!(cs %in% tf)]
secreted = secreted[(!(secreted %in% cs)) & (!(secreted %in% tf))]
ng = sort(unique(df$gene[!(df$gene %in% c(cs, secreted, tf))]))
anno.df = data.frame(
gene = c(cs, secreted, tf, ng),
anno = c(rep("cell surface", length(cs)),
rep("secreted", length(secreted)),
rep("transcription factor", length(tf)),
rep("", length(ng)))) %>%
dplyr::filter(!(gene %in% c("Actn1", "Actn2")))
df.plot =
dplyr::left_join(up, anno.df, by = c("gene"))
head(df.plot)
df.plot$anno = factor(
df.plot$anno,
levels = c("cell surface", "secreted", "transcription factor", ""))
head(df.plot)
table(df.plot$anno)
##### object #####
object = cca.kept$DNT
# cluster label = condition (replicate suffix removed) + cluster id, e.g. "nDNT0"
object$cluster = paste0(sub("_rep[12]", "", object$orig.ident), object$integrated_snn_res.0.01) %>%
  factor(., levels = paste0(c("nDNT","aDNT"), c(0,0,1,1)))
table(object$cluster)
##### plot #####
p = mapply(function(i, j){
x = df.plot %>%
dplyr::rowwise() %>%
dplyr::mutate(avg_logFC = mean(c(avg_logFC.DNT0, avg_logFC.DNT1))) %>%
dplyr::filter(anno == i) %>%
dplyr::ungroup() %>%
dplyr::group_by(type) %>%
dplyr::top_n(10, avg_logFC) %>%
dplyr::arrange(gene, .by_group = T)
head(x)
p = dotplot.f(object, features = x$gene, group.by = "cluster", facet = x$type,
axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1)) +
labs(x = "", title = j, y = "Subgroup upregulated genes\nafter activation")
return(p)
},
i = c("cell surface", "secreted", "transcription factor", ""),
j = c("Cell\nsurface","Secreted","Transcription\nfactor","Remaining\ngenes"),
SIMPLIFY = FALSE,
USE.NAMES = FALSE)
p[[1]] = p[[1]] + NoLegend()
p[[2]] = p[[2]] + theme(axis.title.y = element_blank()) + NoLegend()
p[[3]] = p[[3]] + theme(axis.title.y = element_blank())
p[[4]] = p[[4]] + theme(axis.title.y = element_blank()) + NoLegend()
##### change the strip color #####
for (i in 1:4) {
pal = brewer.pal(6, "Set2")[c(6,3,4)]
g = ggplotGrob(p[[i]])
strips = grep("strip-", g$layout$name)
for (x in seq_along(strips)) {
k = which(grepl("rect", g$grobs[[strips[[x]]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[[x]]]]$grobs[[1]]$children[[k]]$gp$fill = pal[x]
}
p[[i]] = g
}
##### save plots #####
plot_grid(plotlist = p, nrow = 1, labels = "b", label_size = 8, rel_widths = c(1.2,1,1,1.1)) %>%
ggsave(filename = "Fig6b.pdf", plot = ., units = "cm", width = 15, height = 12)
```
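
The chunk above assigns each gene to a single annotation class: transcription factors take precedence over cell-surface genes, and both take precedence over secreted genes; everything else is left unannotated. A minimal sketch of that precedence on made-up gene sets follows. All `toy.*` names and gene symbols are hypothetical and not part of the analysis, and base R `setdiff()` stands in for the `%in%` filters used above.

```r
# Hypothetical gene sets with deliberate overlaps (not real annotation results)
toy.cs       = c("GeneA", "GeneB", "GeneC")
toy.secreted = c("GeneB", "GeneD", "GeneE")
toy.tf       = c("GeneC", "GeneE", "GeneF")
toy.all      = c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF", "GeneG")

# transcription factors win over cell surface
toy.cs = setdiff(toy.cs, toy.tf)
# cell surface and transcription factors win over secreted
toy.secreted = setdiff(toy.secreted, c(toy.cs, toy.tf))
# anything left over stays unannotated
toy.ng = setdiff(toy.all, c(toy.cs, toy.secreted, toy.tf))

toy.anno = data.frame(
  gene = c(toy.cs, toy.secreted, toy.tf, toy.ng),
  anno = c(rep("cell surface", length(toy.cs)),
           rep("secreted", length(toy.secreted)),
           rep("transcription factor", length(toy.tf)),
           rep("", length(toy.ng))))

# every gene appears exactly once, so a left join on gene cannot duplicate rows
stopifnot(anyDuplicated(toy.anno$gene) == 0)
toy.anno
```

Checking for duplicated genes before the `left_join()` is a cheap way to confirm that the joined table keeps one row per gene and subgroup.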
# fig S14 go
```{r cluego}
##### read the data #####
df = read.table(file = "FigS14.cluego.txt", header = T, sep = "\t") %>%
setNames(c("Term", "Source", "AdjustP", "AssociatedGenes", "Num.Genes", "Genes", "Cluster"))
df$Cluster = factor(df$Cluster, levels = c("DNT0.up.only", "DNT0.DNT1.up", "DNT1.up.only",
"DNT0.down.only", "DNT0.DNT1.down", "DNT1.down.only"))
df$Source = factor(df$Source, levels = c("BP", "CC", "MF", "KEGG", "WikiPathways"))
df = df %>%
dplyr::group_by(Cluster, Source) %>%
dplyr::arrange(Term, .by_group = T) %>%
dplyr::mutate(Type = sub(".*(up|down).*", "\\1", Cluster))
df$Term = factor(df$Term, levels = unique(df$Term))
df$Type = factor(df$Type, levels = c("up", "down"))
head(df)
##### plot #####
p = ggplot(df, aes(Cluster, Term, color = AdjustP, size = AssociatedGenes)) +
geom_point() +
facet_grid(rows = vars(Source), cols = vars(Type), scales = "free", space = "free") +
scale_color_distiller(palette = "Spectral", direction = 1,
  guide = guide_colorbar(default.unit = "cm", barwidth = .2, barheight = 1.5)) +
# pass the legend as guide = ...; an unnamed argument would be taken as the scale name
scale_size_area(max_size = 2.5, guide = guide_legend(title = "Associated\nGenes (%)")) +
labs(x = "", tag = "c") +
theme.text +
theme(axis.text.x = element_text(angle = 60, hjust = 1, vjust = 1))
p
##### change the strip color #####
g = ggplotGrob(p)
pal = brewer.pal(11, "Set3")
strips = grep("strip-", g$layout$name)
for (i in seq_along(strips)) {
k = which(grepl("rect",
g$grobs[[strips[i]]]$grobs[[1]]$childrenOrder))
g$grobs[[strips[i]]]$grobs[[1]]$children[[k]]$gp$fill = pal[i]
}
##### save plot #####
ggsave(filename = "FigS14.cluego.pdf", plot = plot_grid(g), units = "cm", width = 10, height = 10)
```
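
The `Type` column in the chunk above is derived from the cluster label with a regular expression that keeps only the direction ("up" or "down"). A small base R sketch of that step on hypothetical labels in the same naming scheme (the `toy.*` names are assumptions for illustration only):

```r
# Hypothetical cluster labels following the naming scheme used above
toy.cluster = c("DNT0.up.only", "DNT0.DNT1.up", "DNT1.down.only", "DNT0.DNT1.down")

# capture the direction and discard the rest of the label
toy.type = sub(".*(up|down).*", "\\1", toy.cluster)

# fix the level order so "up" facets are drawn before "down" facets
toy.type = factor(toy.type, levels = c("up", "down"))
table(toy.type)
```

Setting the levels explicitly keeps the facet columns in a fixed order regardless of how the rows are sorted.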
