09.05.23_Sequencing Technology and Development_Canvas


Genomics, Transcriptomics, and Development

Learning Objectives…

• Know how twin studies and sequencing approaches have been used to identify the genetic basis of traits/diseases.
• Recall single nucleotide polymorphisms (SNPs) and know how SNP arrays have been used to conduct Genome Wide Association Studies (GWAS).
• Be able to briefly describe the core features of Sanger sequencing used to complete whole genome sequencing (WGS) of the first human genome as well as modern Next-gen Sequence by Synthesis protocols.
• Compare and contrast bulk-tissue RNAseq with recently developed single-cell (scRNAseq) approaches.

Copyrighted Material 2022
Molecular Regulation of Development: By the Numbers

• Human development generates 30-40 trillion cells
• Each cell contains 19,975 protein-coding genes (60,603 total genes)
• Typically, some combination of 12,262 ± 1,007 protein-coding genes is expressed in any given cell
  Estimates suggest ~40% are expressed in all cells and ~60% are tissue specific.
• What is the role of these molecules during development?

GENES: ~19,975 protein-coding; 5-100 mRNAs per gene
PROTEINS: 100-105 proteins per mRNA; 1,000,000,000-3,000,000,000 per cell
https://bionumbers.hms.harvard.edu/search.aspx
The “Omics”

• Life scientists are now able to comprehensively study many biological questions at the scale of ALL genes (genomics), RNA (transcriptomics) and proteins (proteomics) in many organisms.
• These technologies have allowed for an unprecedented view of biological processes including those relevant to developmental biology.
Benson et al. (2020) J. Neurosci. 40:81-88
Genetic Basis of Disease: Classic Twin Studies

Twin Studies
• Quantify phenotypic differences in monozygotic (“MZ”, genetically identical) and dizygotic (“DZ”, ~50% genetic similarity) twin pairs to calculate concordance and estimate heritability.
http://www.nature.com/doifinder/10.1038/ng.3285
Genetic Basis of Disease: Classic Twin Studies

• Twin studies utilize estimates of heritability (h²), a statistical measure describing how much of the variation in a given trait can be attributed to genetic variation. Values range from 0 to 1 (for a detailed description see https://pubmed.ncbi.nlm.nih.gov/18319743/).
• If h² = 1, all variation is due to differences between genotypes.
• If h² = 0, all variation is due to differences in the environments experienced by twins.
https://doi.org/10.1016/j.cell.2019.01.015
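The slides do not say which estimator is used, but one classic way to turn twin data into a heritability estimate is Falconer's formula, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the within-pair trait correlations for MZ and DZ twins. The sketch below assumes that formula; the function name and the input values are hypothetical.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's estimate: h^2 = 2 * (r_MZ - r_DZ), clipped to the 0-1 range."""
    h2 = 2.0 * (r_mz - r_dz)
    return max(0.0, min(1.0, h2))


if __name__ == "__main__":
    # Hypothetical within-pair trait correlations, for illustration only.
    print(f"h2 = {falconer_h2(r_mz=0.80, r_dz=0.45):.2f}")  # h2 = 0.70
```

The larger the gap between MZ and DZ similarity, the more of the trait variation is attributed to genetic variation.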
Father of Genomics: Frederick Sanger

So what are the exact changes in the sequence of affected genes?
• In 1977, Sanger developed a clever approach using chain-terminating dideoxynucleotides (ddNTPs) to sequence a simple 5,386bp bacterial virus.
• “Sanger sequencing” became the gold standard for >40 years and is still used.
• Won his second Nobel prize in 1980 with Paul Berg and Walter Gilbert.
  "for their contributions concerning the determination of base sequences in nucleic acids."
Sanger Sequencing

• Sanger developed chain-terminating dideoxynucleotide triphosphates (ddNTPs) that stop DNA polymerization during PCR to cleverly resolve sequence by measuring size.
  Regular deoxynucleotide triphosphates (dNTPs, “N” = A, G, C, or T) and a small amount of four fluorescently-conjugated ddNTPs (“N” = A, G, C, or T) are mixed with the template: a genomic fragment, PCR product, or plasmid of interest.
• The PCR reaction will generate fragments of all lengths, but when a fluorescently-conjugated ddNTP is incorporated, amplification of that strand stops and the fragment is color-coded.
  Following size separation by gel electrophoresis, the sequence of fragments up to 1000bp can then be determined.
https://www.sigmaaldrich.com/US/en/technical-documents/protocol/genomics/sequencing/sanger-sequencing
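The logic of reading a sequence from fragment sizes can be sketched in a few lines. This is a toy model, not a simulation of the chemistry: it assumes exactly one labelled product terminates at every position of the newly synthesized strand, and the template and function names are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """Return (length, labelled terminal base) for each chain-terminated product.

    Simplification: exactly one product stops at every position of the
    newly synthesized strand (the reverse complement of the template).
    """
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))
    return [(length, base) for length, base in enumerate(new_strand, start=1)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Order fragments from shortest to longest and read their terminal labels."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    fragments = terminated_fragments("GATTACA")  # hypothetical template
    random.shuffle(fragments)                     # products are mixed in the tube
    print(read_by_size(fragments))                # TGTAATC
```

Sorting by length plays the role of electrophoresis: once fragments are ordered by size, the color of each terminal ddNTP spells out the synthesized strand.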
The Human Genome Project

• Whole Genome Sequencing (WGS) - the comprehensive sequencing of the entire genome.
• Using Sanger sequencing, the Human Genome Project took >10 years and $3 billion to complete the first WGS of ~3 billion base-pairs in 2003, covering ~90% of the genome, with DNA from ~12 individuals from Buffalo.
• The Telomere-to-Telomere (T2T) Consortium finished filling in the remaining gaps in 2022: https://www.science.org/doi/10.1126/science.abj6987
Dr. Francis Collins analyzing an autoradiogram displaying the results of a Sanger DNA sequencing experiment, such as that used in the early years of the Human Genome Project.
https://www.youtube.com/watch?v=qOW5e4BgEa4
https://www.nature.com/articles/s41576-020-0275-3
Functional Gene Expression & Transcriptomics

Studying the expression of all genes (aka functional genomics or transcriptomics) has been achieved with a range of approaches.
• Typically requires extraction of mRNA transcripts from a tissue, or more recently, single cells!
• Technologies continue to evolve…
  • Microarrays
  • Next-generation sequencing (NGS, RNAseq)
  • Single-cell RNA-seq
  • Spatial Transcriptomics
Nice review from the text…
https://learninglink.oup.com/access/content/barresi-12e-student-resources/barresi-12e-further-development-3-17-6-microarrays-and-macroarrays
https://www.nature.com/articles/nrg.2016.49
Microarrays

• Once model genomes were sequenced, approaches were developed to attach small oligonucleotides representing multiple genes of interest to a chip (microarrays).
• By the 2000s, microarrays containing ~20,000 oligos complementary to all known protein-coding mRNAs were regularly used to study the transcriptome.
Microarrays

• Tissue-derived RNA samples are added to the chip and the intensity of a hybridized “spot” on the array provides an estimate of RNA abundance for that specific oligonucleotide.
Single Nucleotide Polymorphism Arrays & GWAS

• A single nucleotide polymorphism (abbreviated SNP, pronounced ‘snip’) is a genomic variant at a single base position in the DNA.
• By 2005, the Human Genome Project and International HapMap Project (~270 individuals) generated maps of ~3 million SNPs across the human genome.
• Inexpensive SNP microarrays based on these maps were created to conduct Genome Wide Association Studies (GWAS) that detect associations between genetic loci and traits/disease.
In one of the first uses of SNP arrays, the SNP in this Manhattan plot (https://www.youtube.com/watch?v=Pdic7p_dk0I) was detected in a study of 146 individuals (patients with age-related macular degeneration, a leading cause of blindness, and controls) and ultimately led to the discovery that mutations in the CFH gene increase risk of AMD >5-fold.
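At its simplest, a GWAS asks, for each SNP, whether one allele is more common in cases than in controls, and a Manhattan plot shows −log10(p) for every SNP by genomic position. The sketch below assumes a plain Pearson chi-square test on a 2x2 table of allele counts; real studies typically use regression models with covariates, and the counts shown are hypothetical.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function for 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical allele counts for a single SNP (not taken from any study).
    chi2, p = allelic_chi_square(case_alt=120, case_ref=80, ctrl_alt=60, ctrl_ref=140)
    print(f"chi2 = {chi2:.1f}, -log10(p) = {-math.log10(p):.1f}")
```

Repeating this test across every SNP on the array and plotting −log10(p) against chromosome position gives the familiar Manhattan plot, where associated loci stand out as tall peaks.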
GWAS

Estimates suggest 90-95% of SNPs detected in GWAS are located in non-coding genomic regions that indirectly modulate gene expression ~40-50% of the time.
https://www.nature.com/articles/s43586-021-00056-9
GWAS Catalogs

• In response to the massive increase in the number of published GWAS, the GWAS Catalog was founded by NIH-NHGRI in 2008. https://www.ebi.ac.uk/gwas/
Case Study: GWAS Catalog

GWAS using SNP arrays have provided unprecedented insight into the genetic basis of many traits/diseases, such as the data from this paper on autism.
https://www.nature.com/articles/s41598-021-95447-z#Sec26
Genetic Basis of Disease: GWAS

https://doi.org/10.1016/j.cell.2019.01.015

CONS
• Depending on the disease/trait, GWAS loci are often found to confer only a small increase in disease risk (many diseases are strongly associated with environmental factors, not genetics) and explain only a fraction of the heritability.
• Only ~10-20% of all GWAS participants are of non-European descent.
Next Generation Sequencing Technologies

• Sanger sequencing and SNP arrays gave way to high-throughput sequencing technologies that are faster and cheaper.
• These massively parallel, highly multiplexed protocols are now typically referred to as Next-generation Sequencing (NGS) approaches, often termed DNAseq or RNAseq.
• Current ‘next-gen’ based human WGS or whole exome sequencing (WES) now costs <$1000 and takes a few weeks.
https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost
RNAseq
NGS Example Tech: Sequence by synthesis

Reversible dye-terminator (RDT)-based sequencing-by-synthesis (SBS)
(please note this is but one commonly used chemistry, there are many others)

1. Fragments of the sample DNA/RNA template of interest (800–1000K clusters per mm²) are anchored on a chip or “flowcell”.
2. RDT-ddNTPs with fluorescent tags are added during synthesis (no dNTPs) and at the end of each cycle, a picture is taken.
3. An enzyme cleaves the fluorescent tag and turns the RDT-ddNTP into a regular dNTP to reverse termination and allow another synthesis cycle to continue.
4. Repeat Steps 2-3 for 100-300 bp: each “colored dot” in the pic reveals the sequence.
https://www.youtube.com/watch?v=fCd6B5HRaZ8
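The cycle-by-cycle imaging in steps 2-4 amounts to reading one base per cluster per picture. A minimal sketch of that decoding step follows; the dye-to-base assignment and the tiny two-cluster "images" are hypothetical and stand in for real image analysis.

```python
# Hypothetical dye-to-base assignment; real instruments use their own chemistry.
COLOR_TO_BASE = {"green": "A", "red": "C", "blue": "G", "yellow": "T"}


def decode_reads(cycle_images: list[list[str]]) -> list[str]:
    """cycle_images[cycle][cluster] is the color seen at that cluster in that cycle.

    Returns one read per cluster, one base per imaging cycle.
    """
    n_clusters = len(cycle_images[0])
    return [
        "".join(COLOR_TO_BASE[image[cluster]] for image in cycle_images)
        for cluster in range(n_clusters)
    ]


if __name__ == "__main__":
    images = [
        ["green", "blue"],    # cycle 1: cluster 0 added A, cluster 1 added G
        ["red", "blue"],      # cycle 2
        ["yellow", "green"],  # cycle 3
    ]
    print(decode_reads(images))  # ['ACT', 'GGA']
```

Because every cluster is imaged in the same picture, millions of reads grow in parallel, which is where the throughput of this approach comes from.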
Bioinformatics

• A typical next-gen sequencing experiment on a single tissue-derived mRNA sample will sequence >500 million fragments or “reads” ~100-300bp in length.
• Often stored in a >250 GB “.fastq” formatted file with quality control info for each “read” that looks something like…
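A FASTQ record spans four lines: an identifier starting with "@", the base sequence, a separator line starting with "+", and a quality string with one character per base. The sketch below parses one record, assuming the common Phred+33 encoding (quality = ASCII code − 33); the read shown is made up for illustration.

```python
def parse_fastq_record(lines: list[str]) -> dict:
    """Parse one 4-line FASTQ record, assuming Phred+33 quality encoding."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return {
        "id": header[1:],
        "sequence": sequence,
        "phred": [ord(char) - 33 for char in quality],
    }


if __name__ == "__main__":
    # Made-up read for illustration; not taken from the slide.
    record = parse_fastq_record(["@read_1", "GATTACA", "+", "IIII5##"])
    print(record["phred"])  # [40, 40, 40, 40, 20, 2, 2]
```

A Phred score Q corresponds to an error probability of 10^(−Q/10), so Q = 40 means roughly a 1 in 10,000 chance that the base call is wrong.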
Bioinformatics

• Resequencing: when reads are aligned back to known genomic sequence to identify all genetic variants, including…
  • SNPs
  • insertions/deletions
  • structural variants
  • copy number variants (CNVs)
• Genome Reference Consortium Human Build 38 (GRCh38) https://www.ncbi.nlm.nih.gov/grc
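Once reads are aligned, the simplest kind of variant, a SNP, shows up as a position where the reads consistently disagree with the reference. The toy caller below assumes alignment has already been done and is gap-free, and uses a bare majority vote; production variant callers use statistical models, and the reference and reads here are hypothetical.

```python
from collections import Counter


def call_snps(reference: str, aligned_reads: list[tuple[int, str]], min_depth: int = 2) -> list[tuple[int, str, str]]:
    """Report (position, reference base, observed base) where most reads disagree with the reference.

    aligned_reads holds (0-based start position, read sequence); alignment is
    assumed to be done already and gap-free.
    """
    pileup: dict[int, Counter] = {}
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pileup.setdefault(start + offset, Counter())[base] += 1
    snps = []
    for position in sorted(pileup):
        counts = pileup[position]
        base, _ = counts.most_common(1)[0]
        if sum(counts.values()) >= min_depth and base != reference[position]:
            snps.append((position, reference[position], base))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # hypothetical reference sequence
    reads = [(0, "ACGTTC"), (2, "GTTCGT"), (4, "TCGTAC")]
    print(call_snps(reference, reads))  # [(4, 'A', 'T')]
```

Requiring a minimum read depth is a crude guard against calling a variant from a single sequencing error.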
Example Transcriptome Analytic Pipeline

• How do we convert TBs of sequencing data from mRNA samples (RNAseq) into information on changes in gene expression?
• MANY different analytic packages have been developed; here is but one example pipeline/workflow…

1. Map reads to genome: e.g. STAR
2. Count reads: e.g. featureCounts
3. Compute Differential Gene Expression: e.g. DESeq2
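To make the counting and comparison steps concrete, here is a toy version in plain Python. It is not how STAR, featureCounts, or DESeq2 work internally: it assumes reads are already mapped to single positions, counts them per gene interval, and computes a raw log2 fold change with a pseudocount, skipping library-size normalization and statistical testing. Gene names, coordinates, and read positions are hypothetical.

```python
import math


def count_reads(gene_intervals: dict[str, tuple[int, int]], read_starts: list[int]) -> dict[str, int]:
    """Count reads whose mapped start falls inside each gene's [start, end) interval."""
    return {
        gene: sum(start <= pos < end for pos in read_starts)
        for gene, (start, end) in gene_intervals.items()
    }


def log2_fold_change(control: int, treated: int, pseudocount: float = 1.0) -> float:
    """log2(treated / control) with a pseudocount to avoid division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


if __name__ == "__main__":
    genes = {"geneA": (0, 100), "geneB": (100, 200)}  # hypothetical coordinates
    control = count_reads(genes, [5, 20, 150])
    treated = count_reads(genes, [10, 30, 50, 70, 90, 95, 99, 160])
    for gene in genes:
        print(gene, round(log2_fold_change(control[gene], treated[gene]), 2))
```

In this made-up data geneA gains reads in the treated sample while geneB does not, so only geneA shows a positive log2 fold change.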
Public Repositories

• MANY different publicly accessible repositories provide RNAseq-based information on gene expression.

RPKM = Reads Per Kilobase of transcript, per Million mapped reads. A commonly used normalized unit of transcript expression.
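The RPKM definition translates directly into a small function: divide the read count by the transcript length in kilobases and by the number of mapped reads in millions. The function name and the example numbers are hypothetical.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions)


if __name__ == "__main__":
    # Hypothetical numbers: 500 reads on a 2,000 bp transcript, 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by length and sequencing depth is what lets expression be compared between long and short transcripts and between samples sequenced to different depths.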
Public Repositories

• The Gene Expression Omnibus portal is an NCBI-managed public repository for sequencing data.
  • https://www.ncbi.nlm.nih.gov/geo/
• Most journals now require data to be uploaded to GEO prior to publication.
Limitations of Bulk-tissue Analyses

• DNAseq/RNAseq aggregate information from all cells within a sample.
• What if the tissue is highly heterogeneous?

If a gene is known to be expressed in multiple cell types, bulk-tissue RNAseq may be unable to tell you which cell type is responsible for detected changes in expression.
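A small numeric example shows how averaging over a mixed tissue can hide cell-type-specific changes entirely. The cell types and expression values below are hypothetical, and the bulk signal is approximated as a simple mean over cells.

```python
from statistics import mean


def bulk_expression(cells: list[tuple[str, float]]) -> float:
    """Average a gene's expression over every cell, as a bulk sample effectively does."""
    return mean(value for _, value in cells)


def per_cell_type(cells: list[tuple[str, float]]) -> dict[str, float]:
    """Average the same gene separately within each annotated cell type."""
    grouped: dict[str, list[float]] = {}
    for cell_type, value in cells:
        grouped.setdefault(cell_type, []).append(value)
    return {cell_type: mean(values) for cell_type, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical expression values (arbitrary units) for one gene.
    control = [("neuron", 10.0), ("neuron", 10.0), ("glia", 10.0), ("glia", 10.0)]
    disease = [("neuron", 16.0), ("neuron", 16.0), ("glia", 4.0), ("glia", 4.0)]
    print(bulk_expression(control), bulk_expression(disease))  # 10.0 10.0
    print(per_cell_type(disease))  # {'neuron': 16.0, 'glia': 4.0}
```

Here the gene goes up in neurons and down in glia, yet the bulk average is unchanged, which is exactly the case where single-cell resolution is needed.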
Single Cell RNAseq (scRNAseq)

• Is there a way to sequence nucleic acids from a single cell in a complex multicellular tissue?
• Yes, by gently digesting/dissociating tissue, single cells can be extracted and individual cells (sometimes nuclei) can be sequenced.
• Bioinformatic analysis is much more data intensive and complex, and continues to evolve.
• Sequencing “depth” (# of detected transcripts) is often limited relative to bulk RNAseq.
Single Cell RNAseq (scRNAseq)

tSNE = t-distributed stochastic neighbor embedding: a statistical method for visualizing and clustering cells defined by a global pattern of gene expression that gives each cell a single location in a two or three-dimensional map.
Case Study: Single nuclei RNAseq & Autism

• Performed snRNAseq analysis of 104,559 postmortem single nuclei isolated from forebrain regions of 15 ASD and 16 control individuals.
• Used a statistical clustering method to identify patterns of gene expression to classify cells.
• Able to untangle neuron- vs glia-specific changes in gene expression in autism.
https://www.science.org/doi/10.1126/science.aav8130
Allen Brain Atlas Transcriptomic Explorer

• A Single Cell RNAseq Database that provides RNA expression information from 49,495 randomly selected cells from the human cortex
  • https://celltypes.brain-map.org/rnaseq/human_ctx_smart-seq?
Spatial Transcriptomics

Isolation of single cells in scRNA-seq destroys information on spatial localization and proximity to other cells.
• What if we could examine gene expression in situ?
“Spatial transcriptomics” is still nascent, but represents the next frontier in building 3D cellular and gene expression atlases of entire tissues/organisms.
Integrative “Omics”…

Circles = relevant molecules; Arrows = potential interactions
