Brief Guide for NGS Transcriptomics:

From gene expression to genetics.

by
Aureliano Bombarely
Lectures:
1.Basics of the Next Generation Sequencing (NGS).
1.1. The sequencing revolutions.
1.2. Strengths and weaknesses of the different technologies.
1.3. Inputs and outputs.
2.RNAseq experiment design.
2.1. Reference vs Non-reference.
2.2. High heterozygosity and polyploid problem.
2.3. Tissue selection and treatments.
2.4. Sequencing technology.
3.RNAseq expression analysis.
3.1. Reference preparation and read mapping.
3.2. Gene expression.
3.3. Analysis and visualization.
4.Use of RNAseq reads for phylogeny and genetics.
4.1. Recovering full length mRNA: Reference guided assembly.
4.2. Phylogeny through RNAseq: From gene tree to species tree.
4.3. From reads to markers: SNP calling.
4.4. Population genetics and NGS.
1.Basics of the Next Generation Sequencing (NGS).

DNA Sequencing:
“Process of determining the precise order of nucleotides within a DNA molecule.”
-Wikipedia

[Figure: fields that use DNA sequencing: Genetics, Medicine, Molecular Biology, Forensic Biology, Taxonomy, Breeding, Ecology.]
DNA Sanger Sequencing

1) PCR with ddNTPs (ddATP, ddGTP, ddTTP, ddCTP) and Taq polymerase; the incorporation of a ddNTP stops the extension.
2) Chromatographic separation of the fragments over time.
3) Chromatogram read (example: GTCACCCTGAAT).

Technology | Run Time | Sequence Length | Reads/Run | Total nucleotides sequenced
Capillary Sequencing (ABI37000) | ~2.5 h | 800 bp | 386 | 0.308 Mb
[Figure: timeline of sequenced genomes, 1977 to 2012.]

1977: MS2 Bacteriophage (3.658 Kb)
1984: Epstein-Barr Virus (170 Kb)
1995: Haemophilus influenzae (1.83 Mb)
1996: Saccharomyces cerevisiae (12.1 Mb)
1998: Caenorhabditis elegans (100 Mb)
2000: Arabidopsis thaliana (157 Mb)
2001: Homo sapiens (3.2 Gb)
2002: Oryza sativa (420 Mb)
2006: Populus trichocarpa (550 Mb)
2007: Vitis vinifera (487 Mb)
2008: Physcomitrella (480 Mb); Carica papaya (372 Mb)
2009: Sorghum bicolor (730 Mb); Zea mays (2.3 Gb); Cucumis sativus (367 Mb)
2010: Ectocarpus (214 Mb); Malus x domestica (742 Mb); Glycine max (1.1 Gb)
2011: Pigeonpea, Potato, Cannabis, A. lyrata, Cacao, Strawberry, Medicago
2012: Mei, Benthamiana, Tomato, Setaria, Melon, Flax, T. salsuginea, Banana, Cotton, Orange, Pear

Sequencing technologies marked along the timeline: Sanger Sequencing, 454, SOLiD, Solexa / Illumina, Ion Torrent (IonT), PacBio (PB).
1.1 The sequencing revolutions.

Features of Next Generation Sequencing.

1. Massive sequence production (from 0.1 to 300 Gb).

2. Wide range of sequence lengths (from 50 to 3,000 bp).

3. Same or higher error rate than traditional sequencing (accuracy from 87 to 99.9%).

4. Cheap price per base.

Next Generation Sequencing technologies

• Pyrosequencing (454/Roche).
• Illumina sequencing
• SOLiD sequencing
• Ion semiconductor sequencing (IonTorrent)
• Single Molecule SMRT sequencing (PacBio)
Velculescu VE & Kinzler KW. Nature Biotechnology 25, 878-880 (2007)

Next Generation Sequencing technologies

Technology | Run Time | Sequence Length | Reads/Run | Total nucleotides sequenced per run
Capillary Sequencing (ABI37000) | ~2.5 h | 800 bp | 386 | 0.308 Mb
454 Pyrosequencing (GS FLX Titanium XL+) | ~23 h | 700 bp | 1,000,000 | 700 Mb (0.7 Gb)
Illumina (HiSeq 2500) | 264 h (11 days) / 27 h | 2 x 100 bp / 2 x 150 bp | 2 x 3,000,000,000 / 2 x 600,000,000 | 600,000 / 120,000 Mb (600 / 120 Gb)
Illumina (MiSeq) | 39 h | 2 x 250 bp | 2 x 17,000,000 | 8,500 Mb (8.5 Gb)
SOLiD (5500xl system) | 48 h (2 days) | 75 bp | 400,000,000 | 30,000 Mb (30 Gb)
Ion Torrent (Ion Proton I) | 2 h | 100 bp | 100,000,000 | 10,000 Mb (10 Gb)
PacBio (PacBioRS) | 1.5 h | ~3,000 bp | 25,000 | 100 Mb (0.1 Gb)
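The last column of this table is roughly the number of reads multiplied by the read length. A minimal Python sketch of that arithmetic follows, using two rows of the table; the function name `yield_mb` is only illustrative and not part of any tool.

```python
def yield_mb(reads_per_run, read_length_bp, paired=False):
    """Approximate nucleotides sequenced per run, in Mb (1 Mb = 1,000,000 bp)."""
    ends = 2 if paired else 1  # a paired run produces two reads per fragment
    return reads_per_run * read_length_bp * ends / 1e6


capillary = yield_mb(386, 800)                   # capillary sequencing row
miseq = yield_mb(17_000_000, 250, paired=True)   # MiSeq row, 2 x 250 bp
print(capillary, miseq)
```

The published yields are rounded, so small differences between this estimate and the table are expected.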
1.2 Strengths and weaknesses of the different technologies.

Next Generation Sequencing technologies

454 Pyrosequencing (GS FLX Titanium XL+)
Strengths:
- Long reads (450/700 bp).
- Long insert for mate pair libraries (20 Kb).
- Low observed raw error rate (0.1%).
- Low percentage of PCR duplications for mate pair libraries.
Weaknesses:
- Homopolymer error.
- Low sequence yield per run (0.7 Gb).
- Preferred assembler (gsAssembler) uses overlapping methodology.

Illumina (HiSeq 2500)
Strengths:
- High sequence yield per run (600 Gb).
- Low observed raw error rate (0.26%).
Weaknesses:
- High percentage of PCR duplications for mate pair libraries.
- Long run time (11 days).
- High instrument cost (~$650K).

Illumina (MiSeq)
Strengths:
- Medium read size (250 bp).
- Faster run than Illumina HiSeq.
Weaknesses:
- Medium sequence yield per run (8.5 Gb).

SOLiD (5500xl system)
Strengths:
- 2-base encoding reduces the observed raw error rate (0.06%).
Weaknesses:
- 2-base color coding makes sequence manipulation and assembly difficult.
- Short reads (75 bp).

Ion Torrent (Ion Proton I)
Strengths:
- Fast run (2 hours).
- Low instrument cost ($80K).
Weaknesses:
- Medium sequence yield per run (10 Gb).
- Medium observed raw error rate (1.7%).
- Medium read size (200 bp).

PacBio (PacBioRS)
Strengths:
- Long reads (3000 bp).
- Fast run (2 hours).
Weaknesses:
- Really high observed raw error rate (12.7%).
- High instrument cost (~$700K).
- No pair end/mate pair reads.
1.3 Inputs and Outputs.

Next Generation Sequencing technologies

454 Pyrosequencing (GS FLX Titanium XL+)
Inputs:
- Single Reads Library.
- Pair End Library (3 to 20 Kb insert size).
- Multiplexed sample.
Outputs:
- sff files (fasta and fastq files).

Illumina (HiSeq 2500 and MiSeq)
Inputs:
- Single Reads Library.
- Pair End Library (170-800 bp insert size).
- Mate Pair Library (2 to 10 Kb insert size).
- Multiplexed sample.
Outputs:
- fastq files (Phred+64).
- fastq files (Phred+33, Illumina 1.8+).

SOLiD (5500xl system)
Inputs:
- Single Reads Library.
- Mate Pairs Library (0.6 to 6 Kb insert size).
- Multiplexed sample.

Ion Torrent (Ion Proton I)
Inputs:
- Single Reads Library.
- Pair End Library (0.6 to 6 Kb insert size).
- Multiplexed sample.
Outputs:
- fastq files (Phred+33).

PacBio (PacBioRS)
Inputs:
- Single Reads Library.
★ Library types (orientations):

• Single reads: F
• Pair ends (PE) (150-800 bp insert size): F R (Illumina)
• Mate pairs (MP) (2-20 Kb insert size): R F (Illumina); F R (454/Roche)
• Why is the pair information important?

- De-novo assembly:

Reads => Consensus sequence (Contig)
Contigs + pair read information => Scaffold (or Supercontig), with gaps represented as Ns (NNNNN)
Scaffolds + genetic information (markers) => Pseudomolecule (or ultracontig)
★ Multiplexing:
Use of different tags (4-6 nucleotides) to identify different samples in the same lane/sector.

[Figure: reads from two samples, tagged AGTCGT and TGAGCA, are mixed, sequenced together and then separated by tag.]
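Separating multiplexed reads comes down to looking at the tag at the start of each read. The Python sketch below illustrates the idea with the two tags from the figure; it assumes the tag sits at the 5' end of the read and must match exactly, and the sample names are hypothetical. Real demultiplexing tools usually also tolerate mismatches in the tag.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by the barcode found at the start of each read.

    reads    -- iterable of (read_id, sequence) tuples
    barcodes -- dict mapping barcode sequence to sample name
    The barcode is removed from the returned sequence. Reads without a
    known barcode are collected under the key 'unassigned'.
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        for tag, sample in barcodes.items():
            if seq.startswith(tag):
                groups[sample].append((read_id, seq[len(tag):]))
                break
        else:  # no barcode matched this read
            groups["unassigned"].append((read_id, seq))
    return dict(groups)


# Hypothetical sample names; the tags are the ones shown in the figure.
barcodes = {"AGTCGT": "sample_1", "TGAGCA": "sample_2"}
reads = [
    ("r1", "AGTCGTACGTACGT"),
    ("r2", "TGAGCATTTTGGGG"),
    ("r3", "CCCCCCAAAATTTT"),
]
result = demultiplex(reads, barcodes)
print(result)
```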
Sff files:

Standard flowgram format (SFF) is a binary file format used to encode results of pyrosequencing from the 454 Life Sciences platform for high-throughput sequencing. SFF files can be viewed, edited and converted with DNA Baser SFF Workbench (graphic tool), or converted to FASTQ format with sff2fastq or sff_extract.
-Wikipedia

sff2fastq (program written in C): https://github.com/indraniel/sff2fastq

sff_extract (program written in Python): http://bioinf.comav.upv.es/sff_extract/download.html
Fasta files:

It is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
-Wikipedia
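As an illustration of the format, here is a minimal Python sketch of a FASTA reader. It relies on the usual convention that each record starts with a ">" header line followed by one or more sequence lines; the example records are made up.

```python
def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Made-up example with a sequence split over two lines.
example = [
    ">gene1 hypothetical example",
    "ATGCGCGCTAGACG",
    "ACATGACGACA",
    ">gene2",
    "ATGCCC",
]
records = list(parse_fasta(example))
print(records)
```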
Fastq files:

FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
-Wikipedia

1. Single-line ID with the at symbol (“@”) in the first column.
2. There should be no space between the “@” symbol and the first letter of the identifier.
3. The sequence follows the ID line and can span multiple lines.
4. Single line with the plus symbol (“+”) in the first column to mark the start of the quality section.
5. The “+” line may or may not repeat the ID.
6. The quality values follow the “+” line and can span multiple lines.
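The Python sketch below reads FASTQ records under a simplifying assumption: every record uses exactly four lines (ID, sequence, "+", quality), which is how current sequencers write their output. It does not handle sequences or qualities wrapped over several lines, and the example reads are made up.

```python
def parse_fastq(lines):
    """Yield (identifier, sequence, quality) tuples from 4-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: %r" % header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: %r" % header)
        yield header[1:], seq, qual


# Made-up example; the second record repeats the ID on the '+' line.
example = [
    "@read1", "GTCACCCTGAAT", "+", "I" * 12,
    "@read2", "ACGT", "+read2", "!5?I",
]
records = list(parse_fastq(example))
print(records)
```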

Phred score of a base: Qphred = -10 log10(e), where e is the estimated probability that the base call is wrong.

Q=15 e=0.03 (min. used Sanger)
Q=20 e=0.01 (min. used 454 and Illumina)
Q=30 e=0.001 (standard used 454)
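The relation between Phred score and error probability can be checked with a few lines of Python. The sketch also decodes a FASTQ quality string, assuming each character encodes the score as its ASCII code minus an offset (33 for Phred+33, 64 for Phred+64); the function names are illustrative.

```python
import math


def phred_to_error(q):
    """Error probability e for a Phred score Q, from Q = -10 log10(e)."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Phred score Q for an error probability e."""
    return -10 * math.log10(e)


def decode_quality(qual, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - offset for char in qual]


for q in (15, 20, 30):
    print(q, round(phred_to_error(q), 4))

print(decode_quality("!5?I"))        # Phred+33 encoding
print(decode_quality("h", offset=64))  # Phred+64 encoding
```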
Exercises:

1. Basic Linux commands.

2. Sequencing evaluation.

3. Simple read mapping.

4. Simple de-novo assembly.

5. Basic R commands

6. Functional annotation.

7. Differential gene expression.

8. Cluster analysis for gene expression.

9. Selecting genes for phylogeny.

10. SNP calling and filtering.

11. Analysis of the population structure.

2. RNAseq Experiment Design

Applications for NGS Transcriptomics:

• Novel Gene Discovery
• Alternative Splicing Discovery
• NcRNA Profiling and Discovery
• Genetic Marker Development
• Population Analysis
• Gene Expression Analysis
Complex systems may create complex transcriptomes

Store the information as DNA
(4,460 Escherichia coli genes*)
(33,282 Arabidopsis thaliana genes**)
(56,278 Oryza sativa genes***)
(20,000 - 25,000 Homo sapiens genes****)
- Regulation in space (where) and time (when) (promoters ...)

Express the information as RNA
- Regulation (miRNA, alternative splicing ...)

Translate to protein
- Regulation (glycosylations, phosphorylations ...)

Synthesize compounds

Interactions between elements

* Karp et al. Multidimensional annotation of the Escherichia coli K-12 genome. Nucleic Acids Research. 2007:35:7577-7590
** http://www.arabidopsis.org/portals/genAnnotation/genome_snapshot.jsp
*** http://rice.plantbiology.msu.edu/riceInfo/info.shtml
**** http://www.sanger.ac.uk/Info/Press/2004/041020.shtml
Transcriptome Complexity:

Simple System:

One Genome => Gene 1 copy => Single mRNA
[Figure: a single cell with one genome and one gene, producing an mRNA population of a single polyadenylated mRNA.]

How many species are we analyzing?

1) Problems isolating a single species (rhizosphere)
2) Species interaction study (plant-pathogen)
[Figure: the mRNA population contains mRNAs from the host genome and from the pathogen.]

How many possible alleles do we expect per gene?

1) Polyploids (autopolyploids, allopolyploids).
2) Heterozygosity
3) Complex Gene Families (tandem duplications)
[Figure: the mRNA population contains more than one mRNA variant of the same gene.]

How many isoforms do we expect for each allele?

1) Alternative splicing
[Figure: one gene produces two isoforms, mRNA-1 and mRNA-2, in the mRNA population.]

Is the study performed at different time points?

1) Developmental stages (difficult to select the same stage)
2) Response to a treatment
[Figure: samples taken at time points 1, 2 and 3.]

Is the study performed with different parts?

1) Organ specific
2) Tissue/Cell type specific (Laser Capture Microdissection, LCM)
[Figure: each part has its own mRNA population.]
Experimental Design:

Genomic Considerations:
- Number of Species
- Polyploidy/Heterozygosity

Biological Considerations:
- Organ/Tissue/Cell Type
- Developmental Stage
- Treatments

Economical Considerations:
- Budget

Technical Considerations:
- Skills/Hardware

Design decisions:
- Controls
- Replicates
- Technology Used
- Library Preparation
- Sequencing Amount
- Analysis Pipeline
2.1 Reference vs. Non-reference

Reference:

Generally a genomic sequence with gene models used to align the reads, but a reference can also be a de-novo assembled transcriptome.

The most important advantage of using a reference is that the analysis is computationally less intensive because it only needs to align reads.

Methodology | Technology | Program | Minimum RAM* | Time*
Mapping | 454 | gsMapper | 1 Gb | several hours
Mapping | Illumina | Bowtie2 | 2 Gb** | < 1 hour
De-novo | 454 | gsAssembler | 8 Gb | > 1 day
De-novo | Illumina | Trinity | 16 Gb | > 1 day
De-novo | Illumina | SOAPdenovo-trans | 16 Gb | several hours
* Rough estimate
** Human genome size (~3 Gb)
Plant Genomes:

http://chibba.agtec.uga.edu/duplication/index/home

Reference and phylogenetic relations:

Can I use a different accession as a reference?

Yes.

Can I use a different species as a reference?

Same genus: yes for most genes, but some losses are expected for the most polymorphic genes.

Same family: probably not. Some reads will still map to the most conserved genes.
Percentage of mapped reads using Arabidopsis thaliana Col. as reference

Species | Accession | SRA | Reads | % Mapped Reads | Time
Arabidopsis thaliana | Col | SRR513732 | 12672866 | 75% | 00:11:45
Arabidopsis thaliana | Ler | SRR392121 | 9752382 | 71% | 00:07:05
Arabidopsis thaliana | C24 | SRR392124 | 6186734 | 72% | 00:04:29
Arabidopsis lyrata | - | SRR072809 | 9214967 | 69% | 00:10:11
Brassica rapa | - | ERR037339 | 29230003 | 20% | 00:23:15

This test was performed using 1 core. The memory peak was 155 Mb. Reads were preprocessed with Q20 L30. Mapping tool: Bowtie2
2.2 High heterozygosity and polyploid problem

High heterozygosity/Polyploid problem:

mRNA from species with a high heterozygosity or a polyploid genome can produce highly polymorphic reads for the same gene.

Reference Gene 1: ATGCGCGCTAGACGACATGACGACA
Reads from Gene 1 A: CACTTGACGACATGACG, CTTGACGACATGACGAC
Reads from Gene 1 B: CCCTTGACGACATGACG, CGCCCTTGACGACATGA

Mapping: highly polymorphic genes map irregularly, and the gene expression is an average (Expression Gene 1 = A + B).
De-novo assembly: similar alleles will collapse into a single consensus (Gene A + Gene B): CGCCCTTGACGACATGACGACA
If the reference is a paleoploid and it had a recent WGD event, the mapping can be irregular and produce an important bias.

Reference Gene 1: ATGCGCGCTAGACGACATGACGACAGCGTGGCGTAG
Reference Gene 2: ATGCCCGCTAGACGACATGACGACAGCGTGTCGTAG

Polymorphic region: reads such as GCGCGC and CGCGCT map only to Gene 1; reads such as CCGCTA, CCCGCT and GCCCGC map only to Gene 2.
Non-polymorphic region: reads such as ATGACG and TGACGA map equally to both genes and are assigned randomly.

Read counts shown in the figure: Gene 1: 2, + 4 = 6 (+ 2 = 4); Gene 2: 5, + 4 = 9 (+ 5 = 10).
2.3 Tissue Selection and Treatments

Tissue selection and treatments:

Different organs, tissues or cell types can produce different mRNA extraction yields.

For samples where a low yield is expected, a common practice is 1 to 3 rounds of cDNA amplification, especially when using techniques such as LCM.

Amplification produces a severe bias between low- and high-abundance transcripts.

Best Practices:
1) Compare samples with the same number of amplification rounds.
2) Use software to measure and correct the bias (example: seqbias from R/Bioconductor, Jones DC et al. 2012).
Sequencing of multiple samples can be performed using multiplexing.

Multiplexing adds a tag/barcode of 4-6 nucleotides during the library preparation to identify the sample. Common kits can add up to 96 different tags.

[Figure: Control (-) and Treatment (+) samples => mRNA extraction => library preparation and multiplexing, with the tags CGATCG and ATCGTA.]
2.4 Sequencing technology

Selecting the right sequencing technology.

1) How many reads do I need per sample?

Enough to represent the mRNA population.

ENCODE consortium’s Standards, Guidelines and Best Practices for RNA-Seq

“Experiments whose purpose is to evaluate the similarity between the transcriptional profiles of two polyA+ samples may require only modest depths of sequencing (e.g. 30M pair-end reads of length > 30NT, of which 20-25M are mappable to the genome or known transcriptome).”

“Experiments whose purpose is discovery of novel transcribed elements and strong quantification of known transcript isoforms… a minimum depth of 100-200 M 2 x 76 bp or longer reads is currently recommended.”

http://www.rna-seqblog.com/information/how-many-reads-are-enough/
Tarazona S. et al. (2011) Differential expression in RNA-seq: a matter of depth. Genome Res. 21:2213-23

2) How long should these reads be?

It depends on whether you need to do a de-novo assembly, map to a reference with a recent WGD, or map to a reference without a recent WGD.

• De-novo assembly, or reference with recent WGD:
‣ Longer is better (at least 100 bp)
‣ Pair ends recommended

• Reference without recent WGD:
‣ Any size beyond 35 bp
3) Do I have software/hardware limitations?

Some tools have limitations when working with long reads.

Other tools do not work with the color space reads produced by SOLiD.
Decision tree for selecting the sequencing technology:

Do I have a reference?
- Yes: Illumina / Ion Torrent / SOLiD, Single Reads.
- No: Do I have multiple conditions?
  - Yes: Illumina / Ion Torrent, Pair Ends.
  - No: Do I have a powerful server?
    - Yes: Illumina, Pair Ends.
    - No: 454 for de-novo, Illumina for expression.
3. RNAseq Expression Analysis

RNAseq Data Analysis Steps:

1. Fastq raw. Are the samples multiplexed? Yes: Demultiplexing => Fastq demultiplexed. No: continue with the raw Fastq.
2. Preprocessing => Fastq preprocessed.
3. Do I have a reference? No: Assembling => Consensus. Yes: continue.
4. Mapping => Bam.
5. Read counting => Expression Tables => Differentially Expressed Genes; Gene Clusters by Expression.
6. SNP Calling => Vcf => Population Clusters.
3.1 Reference preparation and read mapping

0. Read quality evaluation.

a) Length of the read.
b) Bases with qscore > 20 or 30.

• FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

1. Read demultiplexing, filtering and trimming

• Separation of multiplexed samples.
• Adaptor removal.
• Trimming of low-quality read ends.
  • Minimum Q20.
  • Suggested Q30.
• Removal of short sequences.
  • Suggested L50.
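The trimming and length filter can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any specific tool: it assumes Phred+33 qualities, removes bases below the threshold only from the two ends of the read, and discards the read if what remains is shorter than the minimum length.

```python
def trim_read(seq, qual, min_q=20, min_len=50, offset=33):
    """Trim low-quality bases from both read ends.

    Returns (sequence, quality) after trimming, or None when the trimmed
    read is shorter than min_len.
    """
    scores = [ord(char) - offset for char in qual]
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]


# Made-up short read: '#' is Q2 and 'I' is Q40 in Phred+33.
seq, qual = "ACGTACGTAC", "##IIIIII##"
kept = trim_read(seq, qual, min_q=20, min_len=5)
dropped = trim_read(seq, qual, min_q=20, min_len=50)
print(kept, dropped)
```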
• Fastx-Toolkit (http://hannonlab.cshl.edu)
• Ea-Utils (http://code.google.com/p/ea-utils/)
• PrinSeq (http://edwards.sdsu.edu/cgi-bin/prinseq/prinseq.cgi)

Software | Demultiplexing | Trimming/Filtering
Fastx-Toolkit | fastx_barcode_splitter | fastq_quality_filter
Ea-Utils | fastq-multx | fastq-mcf
PrinSeq | PrinSeq | PrinSeq
2.1 Read mapping

• Reference is a genome with gene model annotations:
Tophat (Bowtie2)

• Reference is not a genome with gene model annotations:
Bowtie2, BWA, GsMapper (454) ...

Software | Sequencing technology | Features | URL
bwa | Illumina | Mapping | http://bio-bwa.sourceforge.net/bwa.shtml
bowtie | Illumina, SOLiD | Mapping | http://bowtie-bio.sourceforge.net/index.shtml
bowtie2 | Illumina, 454 (fastq) | Mapping | http://bowtie-bio.sourceforge.net/bowtie2
novoalign | Illumina, SOLiD | Mapping | http://www.novocraft.com/main/index.php
gsMapper | 454 (sff) | Mapping, annotation | http://454.com/products/analysis-software/index.asp#reference-tabbing
SOAPaligner | Illumina | Mapping | http://soap.genomics.org.cn/soapaligner.html
TopHat (bowtie) | Illumina | Mapping, splicing | http://tophat.cbcb.umd.edu/index.html
• Reference is a genome with gene model annotations.

1. Fasta file with genome sequence

2. Gff file with gene model annotations

##gff-version 3
##sequence-region ctg123 1 1497228
ctg123 . gene 1000 9000 . + . ID=gene00001;Name=EDEN
ctg123 . exon 1300 1500 . + . ID=exon00001;Parent=mRNA00003

http://www.sequenceontology.org/resources/gff3.html
Gff file with gene model annotations:

Tab-delimited file with 9 columns:

- Column 1: "seqid"
- Column 2: "source"
- Column 3: "type"
- Column 4: "start coordinate" (1 based coordinate)
- Column 5: "end coordinate"
- Column 6: "score"
- Column 7: "strand"
- Column 8: "phase"
- Column 9: "attributes" composed of tags such as:
ID; Name; Alias; Parent; Target; Gaps; Derives_from; Note; Dbxref
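To make the column layout concrete, the Python sketch below splits one GFF3 feature line into its nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes) and turns the attributes into a dictionary. It is a simplified illustration: it does not decode URL-escaped characters or multi-valued attributes, and the function name is hypothetical.

```python
def parse_gff3_line(line):
    """Split one GFF3 feature line into a dict; attributes become a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = value
    return {
        "seqid": seqid, "source": source, "type": ftype,
        "start": int(start), "end": int(end),
        "score": score, "strand": strand, "phase": phase,
        "attributes": attributes,
    }


# Gene line from the GFF3 specification example, with explicit tabs.
line = "ctg123\t.\tgene\t1000\t9000\t.\t+\t.\tID=gene00001;Name=EDEN"
feature = parse_gff3_line(line)
# Coordinates are 1-based and inclusive, so the length is end - start + 1.
length = feature["end"] - feature["start"] + 1
print(feature["attributes"], length)
```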
Some considerations before running the software:

• How similar are the reads to the reference?

By default only 1 SNP is allowed by the mapper.

• Can one read map equally well to different genes (gene duplications)?

By default read mappers randomly assign reads that map equally well to several positions.

• Are you mapping more than one species?

Use a combined reference set to avoid multiple mappings of the same read.
Example: Mapping reads from N. benthamiana leaves infected with P. syringae.

1. Join both datasets (fasta and gff3).

2. Map the reads using Bowtie2:

bowtie2-build -f merged_reference.fasta merged_reference.fasta

bowtie2 -N 1 -M 0 -x merged_reference.fasta -U seq1.fastq -S results.sam

• The second argument of bowtie2-build is the index name, which is the value passed to -x
• -U gives the file with the unpaired reads
• -N 1: allow 1 mismatch in the seed
• -M 0: exclude reads with more than M+1 matches
Fasta (Reference) + Gff (Reference) + Fastq preprocessed => Read Mapping => Sam/Bam

Note about the hardware and mapping software:

+ The bigger the reference, the more memory the program needs (example: Bowtie2 needs ~2.1 Gb for the human genome, which has 3 Gb).
+ The longer the reads, the more time the program needs for the mapping.
The standard output for read alignments is the sam/bam format. Sam is a tab-delimited format with header lines starting with the character ‘@’ and one line per alignment with 11 mandatory fields.

[Figure: an alignment and its representation as a Sam file.]

http://samtools.sourceforge.net/SAM1.pdf
Cigar String

Example: 8M2I4M1D3M

8M: 8 aligned nucleotides (match or mismatch)
2I: 2 nucleotides inserted relative to the reference
4M: 4 aligned nucleotides (match or mismatch)
1D: 1 nucleotide deleted from the reference
3M: 3 aligned nucleotides (match or mismatch)

http://samtools.sourceforge.net/SAM1.pdf
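A Cigar string can be split into its operations with a regular expression. The Python sketch below does that and also computes how many read bases and reference bases the alignment covers, following the SAM specification (M, I, S, = and X consume read bases; M, D, N, = and X consume reference bases). The function names are illustrative.

```python
import re

# One Cigar operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Return the Cigar string as a list of (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def cigar_lengths(cigar):
    """Return (read_length, reference_span) implied by a Cigar string."""
    read_len = ref_len = 0
    for length, op in parse_cigar(cigar):
        if op in "MIS=X":   # operations that consume read bases
            read_len += length
        if op in "MDN=X":   # operations that consume reference bases
            ref_len += length
    return read_len, ref_len


ops = parse_cigar("8M2I4M1D3M")
lengths = cigar_lengths("8M2I4M1D3M")
print(ops, lengths)
```

For the example string the read is 17 nucleotides long (8 + 2 + 4 + 3) and spans 16 reference positions (8 + 4 + 1 + 3).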
Flags String

‣ Flag = 4 means 0x4 read unmapped
‣ Flag = 16 means 0x10 read reverse strand
‣ Flag = 83 means 0x1 read paired, 0x2 read mapped in proper pair, 0x10 read reverse strand and 0x40 first in pair

http://samtools.sourceforge.net/SAM1.pdf
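The flag is the sum of the bits that apply to the alignment, so it can be decoded by testing each bit. A minimal Python sketch, using the bit meanings defined in the SAM specification (the short descriptions are paraphrased):

```python
# Bit values from the SAM specification; descriptions are paraphrased.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag):
    """Return the descriptions of all bits set in a SAM flag."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if flag & bit]


print(decode_flag(4))
print(decode_flag(83))  # 83 = 1 + 2 + 16 + 64
```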
Evaluation of the mapping results:

Software:
Samtools (http://samtools.sourceforge.net/samtools.shtml)

Count all the reads: samtools view -c file.bam (or file.sam)

Count mapped reads: samtools view -c -F 4 file.bam (or file.sam)

Sam to Bam: samtools view -Sb -o file.bam file.sam

Sam to Bam (mapped): samtools view -Sb -F 4 -o file.bam file.sam

The -c option only prints the number of matching records, so it is not used when converting Sam to Bam.
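The counting done with the -c and -F options can be reproduced on a Sam text file with a short Python sketch, which makes the role of the flag filter explicit. The function name and the example records are made up; for real data samtools is the practical choice. Note that the count is of alignment records, which equals the number of reads only when each read has a single record.

```python
def count_sam_records(lines, exclude_flags=0):
    """Count alignment records, skipping header lines and any record
    that has at least one of the bits in exclude_flags set."""
    count = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # empty line or header line
        flag = int(line.split("\t")[1])  # the flag is the second field
        if (flag & exclude_flags) == 0:
            count += 1
    return count


# Made-up Sam content: one header line and three records (one unmapped).
sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tgene1\t10\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII",
    "r3\t16\tgene1\t30\t42\t8M\t*\t0\t0\tCCCCAAAA\tIIIIIIII",
]
total = count_sam_records(sam)                       # like: view -c
mapped = count_sam_records(sam, exclude_flags=0x4)   # like: view -c -F 4
print(total, mapped, round(100.0 * mapped / total, 1))
```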

2.2 Transcriptome de-novo assembly

Workflow scheme for a transcriptome assembly:

Schliesky S, Gowik U, Weber AP, Bräutigam A. (2012) RNA-Seq Assembly – Are We There Yet? Front Plant Sci doi: 10.3389/fpls.2012.00220

Decisions during the Experimental Design:
- Technology
- Library Preparation
- Sequencing Amount

Fastq raw => Reads processing and filtering => Fastq Processed
1. Low quality reads (qscore) (Q30)
2. Short reads (L50)
3. PCR duplications (Only Genomes).
4. Contaminations.
5. Corrections

Fastq Processed => Consensus (Contigs) => Scaffolding => Scaffolds

Decisions during the Assembly Optimization:
- Software
- Identity %, Kmer ...
- Post-assembly Filtering
Software | Type | Sequencing technology | Features | URL
MIRA | Overlap-layout-consensus | Sanger, 454 | Highly configurable | http://sourceforge.net/apps/mediawiki/mira-assembler
gsAssembler | Overlap-layout-consensus | Sanger, 454 | Splicings | http://454.com/products/analysis-software/index.asp
iAssembler | Overlap-layout-consensus | Sanger, 454 | Improves MIRA | http://bioinfo.bti.cornell.edu/tool/iAssembler
Trans-ABySS* | de Bruijn graph | 454 or Illumina | Splicings, Gene fusions | http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss
SOAPdenovo-trans* | de Bruijn graph | 454 or Illumina | Fastest | http://soap.genomics.org.cn/SOAPdenovo-Trans.html
Velvet/Oases | de Bruijn graph | 454 or Illumina or SOLiD | SOLiD | http://www.ebi.ac.uk/~zerbino/oases/
Trinity* | de Bruijn graph | 454 or Illumina | Downstream expression | http://trinityrnaseq.sourceforge.net/

* Comparisons in the Article: Vijay N. et al (2012) Molecular Ecology DOI: 10.1111/mec.12014

What is a Kmer?
"Specific n-tuple or n-gram of nucleic acid or amino acid sequences."
-Wikipedia
(n-tuple: ordered list of elements; n-gram: contiguous sequence of n items from a given sequence of text)

ATGCGCAGTGGAGAGAGAGCGATG   Sequence A with 24 nt

5 Kmers of 20-mer:

ATGCGCAGTGGAGAGAGAGC
TGCGCAGTGGAGAGAGAGCG
GCGCAGTGGAGAGAGAGCGA
CGCAGTGGAGAGAGAGCGAT
GCAGTGGAGAGAGAGCGATG

N_kmers = L_read - Kmer_size + 1
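The Kmers of a sequence are obtained by sliding a window of size k one nucleotide at a time, which gives L - k + 1 Kmers for a sequence of length L. A minimal Python sketch using the sequence of the example (the function name is illustrative):

```python
def kmers(sequence, k):
    """Return all overlapping Kmers of size k, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


seq = "ATGCGCAGTGGAGAGAGAGCGATG"
result = kmers(seq, 20)
print(len(seq), len(result))
for kmer in result:
    print(kmer)
```

For this 24 nt sequence and k = 20 the window fits in 24 - 20 + 1 = 5 positions.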
Overlap-layout-consensus: more memory; percentage of identity configurable.
de Bruijn graph: faster; less memory intensive.

Li Z. et al. (2011) Comparison of the two major classes of assembly algorithms: overlap–layout–consensus and de-bruijn-graph. Brief. Funct. Genomics 11: 25-37. doi: 10.1093/bfgp/elr035

Compeau PEC. et al. How to apply de Bruijn graphs to genome assembly. Nature Biotech. 2011. 29:287-291
3.2 Gene Expression

3.1 Gene Expression for RNAseq

Oshlack A, Robinson MD, and Young MD, Genome Biology 2010, 11:220

Gene expression for RNAseq analysis is based on how many reads map to a specific gene. For comparison purposes the counts need to be normalized. There are different methodologies.

๏ RPKM (Mortazavi et al. 2008): Reads per Kilobase of Exon per Million of Mapped reads.
๏ Upper-quartile (Bullard et al. 2010): Counts are divided by the upper quartile of the counts of genes with at least one read.
๏ TMM (Robinson and Oshlack, 2010): Trimmed Means of M values (EdgeR).
๏ FPKM (Trapnell et al. 2010): Fragments per Kilobase of exon per Million of Mapped fragments (Cufflinks).
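As a worked example of the first method, RPKM divides the read count of a gene by its exon length in kilobases and by the total number of mapped reads in millions. The Python sketch below uses made-up counts and lengths; the gene names and numbers are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per Kilobase of exon per Million of mapped reads.

    Equivalent to read_count / (length in Kb) / (mapped reads in millions).
    """
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical data: two genes with the same count but different lengths.
counts = {"gene_A": 500, "gene_B": 500}
lengths = {"gene_A": 1000, "gene_B": 2000}
total_mapped = 10_000_000

values = {gene: rpkm(counts[gene], lengths[gene], total_mapped) for gene in counts}
print(values)
```

Both genes have 500 reads, but gene_B is twice as long, so its RPKM is half that of gene_A.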
Software | Normalization | Notes | URL
ERANGE | RPKM | Python | http://woldlab.caltech.edu/wiki/RNASeq
Scripture | RPKM | Java | http://www.broadinstitute.org/software/scripture
BitSeq* | RPKM | R/Bioconductor, Calculate DE | http://www.bioconductor.org/packages/2.12/bioc/html/BitSeq.html
EdgeR | TMM | R/Bioconductor, Calculate DE | http://www.bioconductor.org/packages/2.11/bioc/html/edgeR.html
Cufflinks* | FPKM | Isoforms, Calculate DE | http://cufflinks.cbcb.umd.edu/
MMSEQ* | FPKM | Isoforms, Haplotypes | http://bgx.org.uk/software/mmseq.html
RSEM* | FPKM | Calculate DE (EBSeq) | http://deweylab.biostat.wisc.edu/rsem/README.html

* Comparisons in the Article: Glaus P. et al (2012) Bioinformatics 28:1721-1728 doi:10.1093/bioinformatics/bts260

“Tuxedo” Pipeline: Bowtie2 + TopHat + Cufflinks

Trapnell C, et al. 2012 Nature Biotechnology doi:10.1038/nbt.1621

3.2 Differential Gene Expression

A statistical test evaluates whether a gene is differentially expressed
between two or more conditions. These tests can be based on different
methodologies:

๏ Negative binomial distribution (DESeq, Cufflinks).
๏ Bayesian methods for the negative binomial
distribution (edgeR, baySeq, BitSeq).
๏ Non-parametric: models the noise distribution of count
changes by contrasting fold-change differences (M) and absolute
expression differences (D) (NOISeq).
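The M and D statistics mentioned for the non-parametric approach can be computed directly from two normalized expression vectors. The sketch below only illustrates those two quantities and is not the NOISeq test itself; the 0.5 pseudo-count used to avoid dividing by zero and the example values are assumptions made for the illustration.

```python
import math


def m_d_values(expr_a, expr_b, pseudo_count=0.5):
    """Per-gene M (log2 fold change) and D (absolute difference).

    pseudo_count is an assumed constant added to both conditions so that
    genes with zero expression do not produce a division by zero.
    """
    result = {}
    for gene in expr_a:
        a = expr_a[gene] + pseudo_count
        b = expr_b[gene] + pseudo_count
        result[gene] = (math.log2(a / b), abs(a - b))
    return result


# Hypothetical normalized expression values in two conditions.
condition_1 = {"gene1": 7.5, "gene2": 1.5}
condition_2 = {"gene1": 1.5, "gene2": 1.5}

for gene, (m, d) in m_d_values(condition_1, condition_2).items():
    print(gene, "M =", m, "D =", d)
```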

Software  Normalization                 Need Replicas  Input                     URL
EdgeR     Library Size / TMM            Yes            Raw Counts                http://www.bioconductor.org/packages/2.11/bioc/html/edgeR.html
DESeq     Library Size                  No             Raw Counts                http://bioconductor.org/packages/release/bioc/html/DESeq.html
baySeq    Library Size                  Yes            Raw Counts                http://www.bioconductor.org/packages/2.11/bioc/html/baySeq.html
NOISeq    Library Size / RPKM / UpperQ  No             Raw or Normalized Counts  http://bioinfo.cipf.es/noiseq/doku.php?id=start

Tarazona S. et al. (2012) Differential expression in RNA-seq: a matter of depth. Genome Res. 21:2213-23
3.3 Analysis and Visualization

3.3 Explorative Data Mining Methods

Data mining is the process of discovering patterns in large data
sets. It involves six common classes of tasks:

▪ Anomaly detection (outlier/change/deviation detection): search
for unusual data records.
▪ Association rule learning (dependency modeling): search for
relationships between variables.
▪ Clustering: discover groups and structures by similarity.
▪ Classification: apply a known structure to new data.
▪ Regression: find the model that fits the data with the least error.
▪ Summarization: including visualization and report generation.

http://en.wikipedia.org/wiki/Data_mining

For gene expression, there are some common data mining tasks and
associated methods:

▪ Clustering of the expression values, and principal component
analysis to reduce the number of variables.
▪ Classification using Gene Ontology terms and metabolic
annotations.
▪ Summarization by visualizing the expression data through heat
maps.

3.3 Cluster Analysis and Visualization

Cluster analysis, or clustering, is the task of assigning a set of
objects to groups (called clusters) so that the objects in the same
cluster are more similar (in some sense or another) to each other than
to those in other clusters. Clustering is a main task of explorative
data mining. The most used clustering algorithms for gene
expression are:

๏ Hierarchical clustering (HCL), where the distance between
elements is used to build the clusters.
๏ K-means clustering (KMC), where each cluster is represented
by a vector (its centroid). The number of clusters is fixed and each
element is assigned based on its distance to that vector.

http://en.wikipedia.org/wiki/Cluster_analysis
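To make the K-means description concrete, here is a minimal Python sketch of the assign-and-update loop. The expression profiles and the two starting centroids are invented for the example; a real analysis would normally rely on an existing implementation (for instance the kmeans() function in R) and on several random starts.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical expression of four genes in two conditions.
profiles = [(1.0, 1.2), (0.8, 1.0), (9.0, 9.5), (10.0, 9.0)]
start = [(0.0, 0.0), (10.0, 10.0)]  # assumed initial centroids (K = 2)

labels, centers = kmeans(profiles, start)
print(labels)
print(centers)
```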

Software           Clustering Algorithm              URL
MeV                HC, KMC, visualization            http://www.tm4.org/mev/about
Stats (R package)  HC ( hclust() function )          http://stat.ethz.ch/R-manual/R-patched/library/stats/html/stats-package.html
                   KMC ( kmeans() function )
                   Visualization ( heatmap() function; heatmap.2() in the gplots package )
GENE-E             HC, visualization                 http://www.broadinstitute.org/cancer/software/GENE-E/

Severin AJ et al., 2010 BMC Plant Biology, 10:160

3.3 Classification Analysis and Visualization

One of the most common classification methods in data mining is the use
of gene annotations such as GO terms or metabolic annotations. These
methodologies compare two groups to find whether some terms are more
represented in one group than in the other. Some examples are:

๏ Gene Set Enrichment Analysis (GSEA), a computational
method that determines whether an a priori defined set of genes
shows statistically significant, concordant differences between two
biological states.
๏ Profile comparisons, where each group defines a profile based on the
annotation groups (generally GO terms). Profiles are compared to
find whether they are significantly different.
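A simple way to ask whether a term is more represented in one group than expected is a hypergeometric (one-sided Fisher) over-representation test. The Python sketch below shows that calculation only; it is not the ranking-based GSEA statistic, and the gene numbers used are invented for the example.

```python
from math import comb


def enrichment_pvalue(total_genes, annotated_genes, selected_genes, annotated_selected):
    """P(X >= annotated_selected) under the hypergeometric distribution.

    total_genes: genes in the background set
    annotated_genes: background genes carrying the term
    selected_genes: genes in the group of interest (e.g. a cluster)
    annotated_selected: genes in that group carrying the term
    """
    upper = min(annotated_genes, selected_genes)
    favourable = sum(
        comb(annotated_genes, i) * comb(total_genes - annotated_genes, selected_genes - i)
        for i in range(annotated_selected, upper + 1)
    )
    return favourable / comb(total_genes, selected_genes)


# Hypothetical example: 1000 genes, 50 with the GO term,
# 40 genes in the cluster, 8 of them with the term.
print(enrichment_pvalue(1000, 50, 40, 8))
```

When many terms are tested, the resulting p-values would also need a multiple-testing correction.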

Gene ontologies:

Structured controlled vocabularies (ontologies) that describe gene
products in terms of their associated biological processes, cellular
components and molecular functions in a species-independent manner.

http://www.geneontology.org/GO.doc.shtml

Biological processes:
Recognized series of events or molecular functions. A process is
a collection of molecular events with a defined beginning and end.

Cellular components:
Describe locations, at the levels of subcellular structures and
macromolecular complexes.

Molecular functions:
Describe activities, such as catalytic or binding activities, that
occur at the molecular level.

http://www.geneontology.org/GO.doc.shtml

Bioconductor Packages for GO Terms:

GO.db       A set of annotation maps describing the entire Gene Ontology
GOstats     Tools for manipulating GO and microarrays
GOSim       Functional similarities between GO terms and gene products
goProfiles  Statistical analysis of functional profiles
topGO       Enrichment analysis for Gene Ontology

http://www.geneontology.org/GO.doc.shtml
4. Use of RNAseq reads for phylogeny and genetics.

4.0 What is a phylogenetic tree ?

A phylogenetic tree is a diagram that shows the evolutionary
relationships among genes and organisms. A phylogenetic tree has
different parts:

[Figure: example tree with the parts labeled as follows]
• A and B are leaves
• C is an external node
• 1 and 2 are internal nodes
• a, b, c and d are branches
• z is the root

External nodes and leaves represent extant (existing) taxa
(operational taxonomic units, OTUs).
Internal nodes may be called hypothetical taxonomic units (HTUs).

The Phylogenetic Handbook, 2nd ed. Cambridge

[Figure: an unrooted tree and a rooted tree for the taxa A, B and C; the rooted tree includes an outgroup]

Unrooted tree: position of each taxon relative to the others.

Rooted tree: position of the taxa relative to a common ancestor.
A tree can be rooted if at least one of the OTUs is an outgroup.

The Phylogenetic Handbook, 2nd ed. Cambridge

There are two different kinds of methods to construct phylogenetic trees:

๏ Character state, which uses discrete characters such as morphological
data or sequence data.
๏ Distance matrix, which uses a measure of the dissimilarity of two
OTUs to produce a pairwise distance matrix.

They use an evolutionary model to correct for multiple hits.

Trees are evaluated by an optimality search criterion.

The Phylogenetic Handbook, 2nd ed. Cambridge
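As a small illustration of a pairwise distance matrix and of a correction for multiple hits, the Python sketch below computes the proportion of differing sites (p-distance) and applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3). Jukes-Cantor is chosen here only because it is the simplest evolutionary model; the four short sequences are toy data.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring positions that are not A, C, G or T."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance, correcting p for multiple substitutions per site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def distance_matrix(sequences):
    """Pairwise corrected distances for a dict {OTU name: aligned sequence}."""
    names = list(sequences)
    return {
        (x, y): jukes_cantor(p_distance(sequences[x], sequences[y]))
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy alignment of four OTUs.
otus = {
    "A": "ATGCGTCGTTAG",
    "B": "ATGTGTCGTTAG",
    "C": "ATGTGACGTTAG",
    "D": "ATGTGACTTTAG",
}

for pair, d in distance_matrix(otus).items():
    print(pair, round(d, 4))
```

The corrected distance is always at least as large as the raw p-distance, and the gap grows as the sequences diverge.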

Classification of Phylogenetic Analysis

                  Optimality Search Criterion   Clustering
Character State   Maximum Parsimony (MP)
                  Maximum Likelihood (ML)
                  Bayesian Inference (BI)
Distance Matrix   Fitch-Margoliash (FM)         UPGMA
                  Minimum Evolution (ME)        Neighbor-joining (NJ)

The Phylogenetic Handbook, 2nd ed. Cambridge

4.0 Phylogenetic Software

Software  Methods                      URL
Phylip    UPGMA, NJ, Fitch, ML and MP  http://evolution.genetics.washington.edu/phylip/general.html
MEGA4     NJ, ME, ML and MP            http://www.megasoftware.net
PAUP      ML and MP                    http://paup.csit.fsu.edu/about.html
FastTree  NJ, ME and ML                http://www.microbesonline.org/fasttree/
PhyML     ML                           http://www.atgc-montpellier.fr/phyml/
RAxML     ML                           http://sco.h-its.org/exelixis/software.html
MrBayes   BI                           http://mrbayes.sourceforge.net/index.php

4.0 Steps to Infer a Phylogenetic Tree

Sequence → Alignment → Pairwise Distance Matrix → Tree → Tree with bootstraps

4.0 Why a phylogenetic tree needs a bootstrap analysis ?

Bootstrap analysis and jackknifing are the methodologies used
to evaluate the reliability of the inferred tree. They can be
applied to all tree construction methods. Under normal circumstances,
branches supported by less than 70% of the bootstrap replicates should
be treated with caution.

Original Tree       Bootstrap 1         ....  Bootstrap 100
ATGCGTCGTTAG - A    ATGCGTCGTTAG - A          ATGCGTCGTGAG - A
ATGTGTCGTTAG - B    AGGTGTCGTTAG - B          ATGTGTCGTTAG - B
ATGTGACGTTAG - C    ATGTGACGTTAG - C          ATGTGACGTTAG - C
ATGTGACTTTAG - D    ATGTGACTTTAG - D          ATGTGACTTTAG - D

[Figure: the tree inferred from each alignment (the original tree, bootstrap 1 ... bootstrap 100) and the final tree of A, B, C and D with bootstrap values 100, 100 and 80 on its branches]
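The resampling step behind a bootstrap value can be sketched in a few lines of Python: alignment columns are drawn with replacement to build each replicate, and the grouping of interest is checked in every replicate. A real analysis rebuilds the whole tree for each replicate; here, as a deliberately crude stand-in, a pair of taxa counts as "grouped" when no other sequence is closer to either of them. The function names, the seed and that grouping rule are assumptions made for this illustration.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (same length as the original)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def is_closest_pair(alignment, first, second):
    """True if no other sequence is closer to first or second than they are to each other."""
    d = hamming(alignment[first], alignment[second])
    others = [name for name in alignment if name not in (first, second)]
    return all(
        hamming(alignment[first], alignment[o]) >= d
        and hamming(alignment[second], alignment[o]) >= d
        for o in others
    )


def bootstrap_support(alignment, first, second, replicates=100, seed=1):
    """Percentage of replicates in which the pair is still grouped."""
    rng = random.Random(seed)
    hits = sum(
        is_closest_pair(bootstrap_replicate(alignment, rng), first, second)
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


alignment = {
    "A": "ATGCGTCGTTAG",
    "B": "ATGTGTCGTTAG",
    "C": "ATGTGACGTTAG",
    "D": "ATGTGACTTTAG",
}

print("C-D support (%):", bootstrap_support(alignment, "C", "D"))
```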

So bootstrap values are like the error bars of a phylogenetic tree.
A tree without bootstrap values gives incomplete information
about how reliable each of the branches is.

[Figure: tree of A, B, C and D with bootstrap values 100, 100 and 80 on its branches]

4.0 Use of RNAseq for phylogenetic analysis

One of the advantages of RNAseq data over microarrays is that
RNAseq produces thousands of mRNA sequences. These sequences, like
any other sequence, can be used to perform a phylogenetic analysis:

1. Use the CDS sequence, from the start codon to the codon before the
stop codon. Use full-length sequences if they are available.
2. Make sure that the consensus sequence is supported by enough reads
to avoid sequencing errors.
4.1 Recovering full length mRNA: Reference guided assembly.

4.1 Consensus sequences

There are two ways to retrieve a consensus sequence from a set of reads
from the same gene (or gene family):

1. De-novo assembly
2. Reference guided assembly

ATGCCCGCTAGACGACATGACGACAGCGTGTCGTAG   Reference
    TCGCTA       TGACGA                Mapped reads
    ACGCTA       TGACGA
    TCGCTA      ATGACG
   CTCGCT       ATGACG
   CTCGCT
  GCTCGC
NNGCTCGCTANNNNNNATGACGANNNNNNNNNNNNN   Consensus (reference guided)
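The idea of the reference guided consensus drawn above can be sketched in Python as a per-position majority vote over the mapped reads, writing N wherever no read covers the reference. The read start positions (0-based) are inferred from the figure, and the min_depth parameter is an assumption of this sketch; real pipelines also weigh base and mapping qualities.

```python
from collections import Counter


def guided_consensus(reference_length, mapped_reads, min_depth=1):
    """Majority-vote consensus from (start, sequence) reads; N where depth < min_depth."""
    columns = [Counter() for _ in range(reference_length)]
    for start, sequence in mapped_reads:
        for offset, base in enumerate(sequence):
            columns[start + offset][base] += 1
    consensus = []
    for column in columns:
        depth = sum(column.values())
        if depth < min_depth:
            consensus.append("N")
        else:
            # most_common() keeps first-seen order when two bases are tied
            consensus.append(column.most_common(1)[0][0])
    return "".join(consensus)


reference = "ATGCCCGCTAGACGACATGACGACAGCGTGTCGTAG"
reads = [
    (4, "TCGCTA"), (4, "ACGCTA"), (4, "TCGCTA"),
    (3, "CTCGCT"), (3, "CTCGCT"), (2, "GCTCGC"),
    (17, "TGACGA"), (17, "TGACGA"), (16, "ATGACG"), (16, "ATGACG"),
]

print(guided_consensus(len(reference), reads))
```

The single read starting with A at position 5 is outvoted by the five reads carrying T, so the consensus reports T there, which differs from the reference C.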

The most common tool used to generate a consensus sequence from an
alignment of reads is samtools/bcftools, using the SNP information
generated by samtools mpileup. The options shown are those of the
samtools/bcftools 0.1.x series.

samtools mpileup -uf reference.fa align.bam |   # 1) BAM => BCF
bcftools view -cg - |                           # 2) BCF => VCF
vcfutils.pl varFilter -d 5 |                    # 3) Filter SNP depth < 5
vcfutils.pl vcf2fq > consensus.fq               # 4) Generate consensus

BCF (Binary variant Call Format) stores the variant calls for the mapped reads at each
reference position.

VCF (Variant Call Format) is a common file format to store sequence
polymorphisms (SNPs and INDELs) relative to a reference position. It has
three parts:
• Meta-information lines (starting with ‘##’).
• Header line (starting with ‘#CHROM’).
• Data lines, 8 fixed columns separated by tabs.
‣ CHROM, chromosome
‣ POS, position (1-based coordinate)
‣ ID, identifier for the polymorphism
‣ REF, reference base
‣ ALT, alternative base (for no alternative base, ‘.’)
‣ QUAL, Phred-scaled quality score for the alternative base
‣ FILTER, whether the variant call has passed the filters
‣ INFO, additional information, such as read depth (DP) or allele frequency
(AF1)

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
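To show how the columns described above fit together, the following Python sketch splits one VCF data line into its 8 fixed columns and unpacks the INFO field. The example lines are invented, and the parser ignores the optional FORMAT and sample columns; for real work a dedicated VCF library is a safer choice.

```python
def parse_vcf_data_line(line):
    """Split one VCF data line into its 8 fixed columns."""
    chrom, pos, vid, ref, alt, qual, filt, info = line.rstrip("\n").split("\t")[:8]
    info_fields = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, separator, value = item.partition("=")
        info_fields[key] = value if separator else True  # flags carry no value
    return {
        "CHROM": chrom,
        "POS": int(pos),                                  # 1-based coordinate
        "ID": vid,
        "REF": ref,
        "ALT": [] if alt == "." else alt.split(","),      # '.' means no alternative base
        "QUAL": None if qual == "." else float(qual),
        "FILTER": filt,
        "INFO": info_fields,
    }


# Hypothetical data lines (tab separated).
example = "Gm01\t1042\t.\tC\tT\t87.5\t.\tDP=14;AF1=0.5"
record = parse_vcf_data_line(example)
print(record["CHROM"], record["POS"], record["REF"], record["ALT"], record["INFO"]["DP"])
```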

Once the consensus is generated, the mRNA or the CDS can be
retrieved using a gene model annotation GFF3 file.

bp_sreformat.pl -if fastq -of fasta -i consensus.fq -o consensus.fa   # 1) Fastq => Fasta

gffread annot.gff -g consensus.fa -w mrna.fa   # 2.1) Get mRNA

gffread annot.gff -g consensus.fa -x cds.fa    # 2.2) Get CDS

Consensus sequences will have the same IDs as the reference, so it is
easy to retrieve the same gene in different samples.
4.2 Phylogeny through RNAseq: From gene tree to species tree.

4.2 Inferring the gene tree.

Sequence: 1) Select the gene of interest. If possible, search for an
outgroup in GenBank and add it to the FASTA file with all the sequences.

Alignment: 2) Select the alignment tool, run the alignment and check
the results to get an optimal alignment.

Pairwise Distance Matrix, Tree, Tree with bootstraps: 3) Decide the type
of phylogenetic analysis to perform and choose the right software
based on the method, speed, memory consumption and usability.

There are dozens of multiple sequence alignment tools. The most
representative are:

Software  Feature                              Alignment Type    URL
ClustalW  Progressive alignment                Global and Local  http://www.ebi.ac.uk/clustalw/
DiAlign   Segment-based method                 Global and Local  http://bibiserv.techfak.uni-bielefeld.de/dialign
Kalign    Progressive alignment                Global            http://msa.cgb.ki.se/
MAFFT     Progressive and iterative alignment  Global and Local  http://align.bmr.kyushu-u.ac.jp/mafft/software/
MUSCLE    Progressive and iterative alignment  Global and Local  http://phylogenomics.berkeley.edu/cgi-bin/muscle/input_muscle.py
TCoffee   Sensitive progressive alignment      Global and Local  http://www.tcoffee.org

http://www.kuleuven.be/aidslab/phylogenybook/Table3.1.html

4.2 Exploring multiple gene phylogenetic trees

Multiple phylogenetic analyses can be performed for different gene
groups.
๏ Hal (Robbertse B. et al. 2011).
๏ PhygOmics (Bombarely A.)

1. Search for unusual tree topologies.

2. Look into the most represented tree topologies to infer a
species tree.
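Counting how often each topology appears across many gene trees can be sketched with a few lines of Python. The sketch below is not how Hal or PhygOmics work internally; it assumes rooted trees written as nested tuples of leaf names, with each leaf name appearing once per tree, and the three toy trees are invented.

```python
from collections import Counter


def canonical(tree):
    """Order-independent form of a rooted tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


# Hypothetical gene trees for three taxa; the first two differ only in how
# the branches are written, so they share one topology.
gene_trees = [
    (("Nt", "Nsyl"), "Ntom"),
    ("Ntom", ("Nsyl", "Nt")),
    (("Nt", "Ntom"), "Nsyl"),
]

topology_counts = Counter(canonical(tree) for tree in gene_trees)

for topology, count in topology_counts.most_common():
    print(count, topology)
```

The most frequent entries are candidates for the species tree, while the rare ones are the unusual topologies worth inspecting.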

๏ PhygOmics

https://github.com/solgenomics/PhygOmics

~ 1,000 gene trees for the allotetraploid Nicotiana tabacum and its
diploid progenitors, N. sylvestris and N. tomentosiformis, were
analyzed to identify the origin of each homoeolog.

Bombarely A et al., 2012 BMC Genomics

4.3 From reads to markers: SNP calling.

4.3 SNPs from RNAseq

Polymorphism analysis can be performed on the RNAseq alignments.
There are several programs that can be used for this purpose:

Software           Features                                           URL
Samtools/Bcftools  SNPs, INDELs                                       http://samtools.sourceforge.net/mpileup.shtml
GATK               SNPs, INDELs and SV (Structural Variations).       http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
                   Polyploids.
FreeBayes          SNPs, INDELs, MNPs (Multiple Nucleotide            http://bioinformatics.bc.edu/marthlab/FreeBayes
                   Polymorphisms), Complex Events. Polyploids.
Cortex_var         Pipeline for genome assembly and SNP calling       http://cortexassembler.sourceforge.net/index_cortex_var.html
                   for populations
Ngs_backbone       Pipeline to process and align reads and call       http://bioinf.comav.upv.es/ngs_backbone/
                   SNPs and SSRs

๏ Example of GATK workflow

http://www.broadinstitute.org/gatk/about

After SNP calling, and before using the SNP data in other analyses, it is
recommended to filter the SNPs.

Common SNP calling errors:

• Missing calls for SNPs with overlapping genotype clusters (Anney et
al., 2008).
• Homozygote–heterozygote miscalls (Teo et al., 2007).
• False homozygote calls in heterozygous individuals due to allelic
dropout (Pompanon et al., 2005).
• Erroneous assessment of monomorphic SNPs as polymorphic
(Pettersson et al., 2008).

Pongpanich M et al., 2010 Bioinformatics

Common SNP filtering criteria:

1. Based on read alignment and SNP calling.
1.1. By a minimum read depth (DP).
1.2. By a minimum variant call quality (QUAL).
1.3. By a maximum/minimum allele frequency (AF1).
1.4. Biallelic polymorphism.
1.5. By a minimum physical distance between SNPs.
1.6. By a minimum distance from a genomic/genetic element.

2. Genetic data incongruences.
2.1. Hardy–Weinberg equilibrium (HWE).
2.2. Missing proportion (MSP).
2.3. Minor allele frequency (MAF).

Chagne D. et al. 2012 PLoS ONE
Pongpanich M et al., 2010 Bioinformatics
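The genetic filters in point 2 can be sketched in Python for one biallelic SNP genotyped in several individuals. The thresholds (20% missing, MAF of 0.05, chi-square of 3.84, which is the 5% critical value for one degree of freedom) are assumed defaults for the illustration, not recommendations from the cited papers, and the genotype lists are invented.

```python
def snp_summary(genotypes):
    """Missing proportion, minor allele frequency and HWE chi-square for one biallelic SNP.

    genotypes: list of VCF-style calls such as '0/0', '0/1', '1/1' or './.'.
    """
    called = [g.replace("|", "/") for g in genotypes if "." not in g]
    n = len(called)
    missing = 1 - n / len(genotypes)
    if n == 0:
        return missing, 0.0, 0.0
    n_het = sum(g in ("0/1", "1/0") for g in called)
    n_alt = sum(g == "1/1" for g in called)
    n_ref = n - n_het - n_alt          # assumes a biallelic SNP
    p = (2 * n_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_ref, n_het, n_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return missing, min(p, q), chi2


def keep_snp(genotypes, max_missing=0.2, min_maf=0.05, max_chi2=3.84):
    """Apply the three genetic filters with assumed thresholds."""
    missing, maf, chi2 = snp_summary(genotypes)
    return missing <= max_missing and maf >= min_maf and chi2 <= max_chi2


# Hypothetical genotype calls.
balanced = ["0/0"] * 4 + ["0/1"] * 8 + ["1/1"] * 4
all_heterozygous = ["0/1"] * 10

print(keep_snp(balanced), keep_snp(all_heterozygous))
```

A SNP where every individual is heterozygous is a typical sign of paralogous or homoeologous reads collapsed on one locus, and the HWE filter removes it.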
4.4 Population Genetics and RNAseq

4.4 Uses for SNPs

A whole genome SNP dataset can be an invaluable source of markers
with a wide spectrum of uses, such as:
• Marker discovery (example: SNPs and SSRs for pepper, Ashrafi
H. et al. 2012; CbCC methods in chickpea, Azam S. et al. 2012).
• Genetic map development (example: genetic map in
Miscanthus sinensis, Swaminathan K. et al. 2012).
• Gene mapping (example: gene mapping via Bulked Segregant
RNA-Seq (BSR-Seq), Liu S. et al. 2012, Trick M. et al. 2012).
• Population genetic analysis (example: population genetics of
sunflowers, Renaut S. et al. 2012).
• eQTL (example: phosphorus supply in Brassica rapa, Hammond
JP. et al. 2012).
• Identification of homoeologous regions in polyploids
(unpublished).

• Identification of homoeologous regions in polyploids

[Figure: cross between the diploids Glycine syndetika (D4, 2n) and Glycine tomentella (D3, 2n), giving the allotetraploid Glycine dolichocarpa (T2, 4n)]

1. RNAseq from leaf samples of D3, D4 and T2 (Coate J. et al. 2012).
2. Generation of the consensus sequence for D3 and D4 using G. max
as reference.
3. Selective mapping of T2 reads to the D3 or D4 consensus.
4. SNP representation.

[Figure: RNAseq-eFISH for the T2 species, accession G1134. SNPs assigned to the D3 or D4 parent are drawn along chromosomes Chr01 to Chr20; x axis: Chromosome Location (Mb), from 0 to 70]

• Population genetic analysis

Recommended resource:
The Simple Fool’s Guide to Population Genomics via RNA-Seq
http://sfg.stanford.edu/

Software       Features                                         URL
TASSEL         Association analysis, PCA for populations        http://www.maizegenetics.net/tassel
Structure      Population structure analysis                    http://pritch.bsd.uchicago.edu/structure.html
FineStructure  Population structure analysis for High           http://paintmychromosomes.com/
               Throughput Data. PCA
Phase          Genetic phasing of alleles                       http://stephenslab.uchicago.edu/software.html#phase

Population analysis tools frequently use a specific format,
different from the VCF format produced by SNP calling
tools. Options:
1. Write your own script.
2. Use a script that someone else wrote:

GenoToolBox (MultiVcfTool, Hapmap2Structure)
https://github.com/aubombarely/GenoToolBox
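For option 1, a conversion script can be quite short. The Python sketch below turns VCF-style genotype calls into a two-rows-per-individual table of integer alleles, with -9 for missing data, which is a layout Structure can read. It is unrelated to the GenoToolBox scripts; the function names and sample data are invented, and the exact layout expected (label columns, one or two rows per individual, missing-data code) depends on the options set for each Structure run.

```python
MISSING = "-9"  # assumed missing-data code; it must match the Structure settings


def genotype_to_alleles(genotype):
    """Split a diploid VCF genotype such as '0/1' or '1|1' into two allele codes."""
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return (MISSING, MISSING)
    return (alleles[0], alleles[1])


def to_structure_rows(genotypes_by_sample):
    """Two tab-separated rows per individual, one column per SNP."""
    rows = []
    for sample, genotypes in genotypes_by_sample.items():
        pairs = [genotype_to_alleles(g) for g in genotypes]
        rows.append("\t".join([sample] + [first for first, _ in pairs]))
        rows.append("\t".join([sample] + [second for _, second in pairs]))
    return rows


# Hypothetical genotype calls for three SNPs in two accessions.
calls = {
    "G1134": ["0/1", "1/1", "./."],
    "G1188": ["0/0", "0|1", "1/1"],
}

for row in to_structure_rows(calls):
    print(row)
```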

• Population genetic analysis: Example Glycine perennials analysis

[Figure: accessions used for each Glycine species]
G. canescens (A): G1232
G. clandestina (A): G1126, G1253
G. syndetika (D4): G1300, G2073, G2321
G. tomentella (D3): G1364, G1366, G1403, G1820
G. tomentella (D1): G1156, G1157, G1316
G. tomentella (T1): G1288, G1361, G1763
G. dolichocarpa (T2) and G. tomentella (T5): G1134, G1188, G1286, A39, A58, G1487, G1969

1. RNAseq of 8 species (5 diploids (D1, D3, D4, A canescens, A
clandestina), 3 allotetraploids (T1, T2, T5)), 25 accessions.
2. Generation of the consensus sequence for A, D1, D3 and D4
using G. max as reference.
3. Selective mapping of:
1. T1 reads to the D1 or D3 consensus.
2. T2 reads to the D3 or D4 consensus.
3. T5 reads to the A or D1 consensus.
4. SNP calling.
5. Change of format from:
1. VCF to HapMap for TASSEL
2. VCF to Structure for Structure
3. VCF to phase for fineStructure

• Population genetic analysis: Structure

Structure analysis:
1. Optimization of the number of clusters (Evanno G. et al. 2005).

[Figure: two panels plotted against K (from 2 to 20): "Mean Ln(K) variation for different population sizes" (y axis: Mean Ln(K)) and "Delta K" (y axis: Delta K). Annotated values: K=6 and (K = 16)]
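The Delta K statistic of Evanno et al. (2005) compares the second-order change of the log probability of the data between successive K values with its variability across replicate runs: ΔK = |L(K+1) − 2 L(K) + L(K−1)| / sd(L(K)). The Python sketch below uses the mean Ln P(D) of the replicate runs for L(K), which is a common way of applying it; the run values are invented for the illustration.

```python
from statistics import mean, stdev


def delta_k(ln_prob_by_k):
    """Delta K for every K that has both neighbours.

    ln_prob_by_k: {K: [Ln P(D) of each replicate run]}, at least two runs per K.
    """
    average = {k: mean(values) for k, values in ln_prob_by_k.items()}
    result = {}
    for k in sorted(average):
        if k - 1 in average and k + 1 in average:
            second_difference = abs(average[k + 1] - 2 * average[k] + average[k - 1])
            result[k] = second_difference / stdev(ln_prob_by_k[k])
    return result


# Hypothetical Ln P(D) values from two replicate runs per K.
runs = {
    1: [-100.0, -102.0],
    2: [-60.0, -62.0],
    3: [-50.0, -52.0],
    4: [-48.0, -50.0],
}

values = delta_k(runs)
best_k = max(values, key=values.get)
print(values, "best K:", best_k)
```

Delta K cannot be computed for the smallest and largest K tested, since each value needs both neighbours.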

2. Run Structure with and without homoeologous read separation.

• Population genetic analysis: FineStructure

3. Run FineStructure without homoeologous read separation (admixture model).

[Figure: FineStructure result; group labels D3, T2, T1, A, D1]

• Population genetic analysis: FineStructure

Phylogenetic information: it agrees with previous data from nuclear genes.

[Figure: FineStructure result; group labels A, D4, T2, D3, D1]
