
14-Integration Default Lognorm Pipeline-22-02-2024

The document provides an overview of integrating single-cell RNA-seq datasets. It describes loading datasets from different conditions, normalizing and analyzing them separately, then integrating them to identify shared cell subpopulations and states across conditions. The integration aims to allow comparison of cell types between conditions to find condition-specific responses.

Introduction to scRNA-seq integration
Basics
• Integration of single-cell sequencing datasets, for example across
experimental batches, donors, or conditions, is often an important
step in scRNA-seq workflows.
• Integrative analysis can help to match shared cell types and states
across datasets, which can boost statistical power, and most
importantly, facilitate accurate comparative analysis across datasets.
Integration goals

• The following tutorial is designed to give you an overview of the kinds
of comparative analyses on complex cell types that are possible using
the Seurat integration procedure.
Here, we address a few key goals:
• Identify cell subpopulations that are present in both datasets
• Obtain cell type markers that are conserved in both control and
stimulated cells
• Compare the datasets to find cell-type specific responses to
stimulation
Load packages
• library(Seurat)
• library(SeuratData)
• library(patchwork)
InstallData("ifnb")
• The object contains data from human PBMC from two conditions,
interferon-stimulated and control cells (stored in the stim column in
the object metadata).
• We will aim to integrate the two conditions together, so that we can
jointly identify cell subpopulations across datasets, and then explore
how each group differs across conditions
• # load dataset
• ifnb <- LoadData("ifnb")
• # split the RNA measurements into two layers: one for control cells, one for stimulated cells

• ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)


Perform analysis without integration (merged data)

• # run standard analysis workflow


• ifnb <- NormalizeData(ifnb)
• ifnb <- FindVariableFeatures(ifnb)
• ifnb <- ScaleData(ifnb)
• ifnb <- RunPCA(ifnb)
• ifnb <- FindNeighbors(ifnb, dims = 1:30, reduction = "pca")
• ifnb <- FindClusters(ifnb, resolution = 2, cluster.name =
"unintegrated_clusters")
Continue…
• ifnb <- RunUMAP(ifnb, dims = 1:30, reduction = "pca",
reduction.name = "umap.unintegrated")
• DimPlot(ifnb, reduction = "umap.unintegrated", group.by = c("stim",
"seurat_clusters"))
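For reference, the default NormalizeData() method ("LogNormalize") divides each cell's counts by that cell's total, multiplies by a scale factor (10,000 by default), and applies log1p. A minimal base-R sketch of that idea follows; the toy matrix and the helper name log_normalize are made up for illustration and are not part of Seurat.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up)
counts <- matrix(c(0, 2, 8,
                   5, 5, 0),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2")))

# Hypothetical helper mirroring the LogNormalize idea:
# scale each cell to a common total, then take log1p
log_normalize <- function(m, scale.factor = 10000) {
  totals <- colSums(m)                               # total counts per cell
  scaled <- sweep(m, 2, totals, "/") * scale.factor  # per-cell scaling
  log1p(scaled)                                      # natural log of (1 + x)
}

norm <- log_normalize(counts)
print(round(norm, 3))
```

Zero counts stay at zero after the transform, and undoing the log (expm1) gives columns that each sum to the scale factor.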
Perform integration
• We now aim to integrate data from the two conditions, so that cells from the same cell
type/subpopulation will cluster together.

• We often refer to this procedure as integration/alignment. When aligning two genome sequences
together, identification of shared/homologous regions can help to interpret differences between the
sequences as well.
• Similarly for scRNA-seq integration, our goal is not to remove biological differences across
conditions, but to learn shared cell types/states in an initial step, specifically because that will
enable us to compare stimulated and control profiles for these individual cell types.

• The Seurat integration procedure aims to return a single dimensional reduction that captures the
shared sources of variance across multiple layers, so that cells in a similar biological state will cluster.
• The method returns a dimensional reduction (i.e. integrated.cca) which can be used for visualization
and unsupervised clustering analysis.
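The integration step described above can be sketched with the Seurat v5 layer-based API. This assumes ifnb is the object from the earlier steps (RNA assay split by stim, with normalization, variable features, scaling and PCA already run); the dims and resolution values are illustrative choices, not prescribed by these slides.

```r
library(Seurat)

# Assumes `ifnb` comes from the steps above: RNA assay split by `stim`,
# with NormalizeData/FindVariableFeatures/ScaleData/RunPCA already run.
ifnb <- IntegrateLayers(
  object = ifnb,
  method = CCAIntegration,
  orig.reduction = "pca",
  new.reduction = "integrated.cca",
  verbose = FALSE
)

# Re-join the per-condition layers after integration
ifnb[["RNA"]] <- JoinLayers(ifnb[["RNA"]])

# Cluster and embed using the integrated reduction instead of "pca"
ifnb <- FindNeighbors(ifnb, reduction = "integrated.cca", dims = 1:30)
ifnb <- FindClusters(ifnb, resolution = 1)  # resolution is an illustrative value
ifnb <- RunUMAP(ifnb, dims = 1:30, reduction = "integrated.cca")

# Compare with the unintegrated plot: cells should now mix by condition
DimPlot(ifnb, reduction = "umap", group.by = c("stim", "seurat_clusters"))
```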
Default pipeline
Perform integration

• # 1st data: control data
• h0.data <- Read10X(data.dir = "/outs/filtered_feature_bc_matrix", gene.column = 1)
• # 2nd data: stimulus data
• h6.data <- Read10X(data.dir = "outs/filtered_feature_bc_matrix", gene.column = 1)
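The pipeline that follows operates on h0.dataset and h6.dataset, which the slides do not show being created. A minimal sketch of that missing step, assuming h0.data and h6.data are the matrices returned by Read10X() above; the project labels "h0" and "h6" are illustrative, and the per-dataset normalization reflects the usual log-normalization workflow rather than anything stated in the slides.

```r
library(Seurat)

# Assumes h0.data and h6.data are the count matrices read with Read10X() above.
# The project labels are illustrative; with standard 10x barcodes they should
# end up in the orig.ident metadata column.
h0.dataset <- CreateSeuratObject(counts = h0.data, project = "h0")
h6.dataset <- CreateSeuratObject(counts = h6.data, project = "h6")

# Log-normalize and select variable features per dataset before anchoring
h0.dataset <- NormalizeData(h0.dataset, verbose = FALSE)
h0.dataset <- FindVariableFeatures(h0.dataset, verbose = FALSE)
h6.dataset <- NormalizeData(h6.dataset, verbose = FALSE)
h6.dataset <- FindVariableFeatures(h6.dataset, verbose = FALSE)
```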
Pipeline
• SE.features <- SelectIntegrationFeatures(object.list =
list(h0.dataset, h6.dataset), nfeatures = 3000)
• SE.anchors <- FindIntegrationAnchors(object.list = list(h0.dataset, h6.dataset),
anchor.features = SE.features, verbose = FALSE)
• SE.integrated <- IntegrateData(anchorset = SE.anchors,
verbose = FALSE)
• SE.integrated <- ScaleData(SE.integrated)
• SE.integrated <- RunPCA(SE.integrated, verbose = FALSE)
• SE.integrated <- RunUMAP(SE.integrated, dims = 1:50)
• SE.integrated <- FindNeighbors(SE.integrated,dims = 1:50)
• SE.integrated <- FindClusters(SE.integrated, resolution = 0.5)
Visualize
• DefaultAssay(SE.integrated) <- 'RNA'
• DimPlot(SE.integrated, label = TRUE)
• DimPlot(SE.integrated, group.by = 'orig.ident')
How to annotate?
Known marker genes will help

[Figure: clusters alongside known markers]
Knowledge from cluster and markers
Annotate
• SE.integrated <- RenameIdents(SE.integrated,
`0` = "CD14",
`5` = "B",
……..)
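One way to decide which label belongs to which cluster is to plot known marker genes per cluster before renaming. The sketch below assumes SE.integrated is the clustered object from the pipeline above; the two markers (CD14 for CD14+ monocytes, MS4A1 for B cells) are illustrative examples only. Because the matrices were read with gene.column = 1, the feature names may be Ensembl IDs rather than gene symbols, in which case the symbols would need to be translated first.

```r
library(Seurat)

# Assumes `SE.integrated` is the clustered object from the pipeline above.
# Inspect expression on the original (non-integrated) assay.
DefaultAssay(SE.integrated) <- "RNA"

# Illustrative markers only: CD14 for CD14+ monocytes, MS4A1 for B cells
known.markers <- c("CD14", "MS4A1")

# Keep only markers present in the object (names may be Ensembl IDs instead)
known.markers <- intersect(known.markers, rownames(SE.integrated))

if (length(known.markers) > 0) {
  # Expression overlaid on the UMAP, then per-cluster distributions
  print(FeaturePlot(SE.integrated, features = known.markers))
  print(VlnPlot(SE.integrated, features = known.markers, pt.size = 0))
} else {
  message("None of the example markers were found in the feature names.")
}
```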
Annotate map
Next part
• Identify conserved cell type markers
• Identify differentially expressed genes across conditions
• Trajectory analysis
