Advanced Applications of RNA Sequencing and Challenges
Yixing Han1, Shouguo Gao2, Kathrin Muegge1,3, Wei Zhang4 and Bing Zhou5
1Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA.
2Bioinformatics and Systems Biology Core, National Heart Lung Blood Institute, National Institutes of Health, Rockville Pike, Bethesda, MD, USA.
3Leidos Biomedical Research, Inc., Basic Science Program, Frederick National Laboratory, Frederick, MD, USA.
4Department of Medicine, University of California, San Diego, La Jolla, CA, USA.
5Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.

Supplementary Issue: Current Developments in RNA Sequence Analysis

Abstract: Next-generation sequencing technologies have revolutionized sequence-based research with the advantages of high throughput, high sensitivity, and high speed. RNA-seq is now widely used to uncover multiple facets of the transcriptome and to facilitate biological applications. However, the large-scale data analyses associated with RNA-seq pose challenges. In this study, we present a detailed overview of the applications of this technology and the challenges that need to be addressed, including data preprocessing, differential gene expression analysis, alternative splicing analysis, variant detection and allele-specific expression, pathway analysis, co-expression network analysis, and applications that combine various experimental procedures, beyond the achievements that have already been made. Specifically, we discuss the essential principles of the computational methods required to meet the key challenges of RNA-seq data analysis, the development of various bioinformatics tools, the challenges associated with RNA-seq applications, and examples that represent the advances made so far in the characterization of the transcriptome.

Keywords: RNA-seq, data preprocessing, differential gene expression, alternative splicing, variants detection, pathway analysis, co-expression
network, systems biology

SUPPLEMENT: Current Developments in RNA Sequence Analysis

Citation: Han et al. Advanced Applications of RNA Sequencing and Challenges. Bioinformatics and Biology Insights 2015:9(S1) 29–46 doi: 10.4137/BBI.S28991.

TYPE: Review

Received: July 16, 2015. ReSubmitted: September 30, 2015. Accepted for publication: October 02, 2015.

Academic editor: J.T. Efird, Associate Editor

Peer Review: Three peer reviewers contributed to the peer review report. Reviewers' reports totaled 1186 words, excluding any confidential comments to the academic editor.

Funding: The Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research and National Heart Lung Blood Institute, NIH supported this work. WZ is sponsored by NIH grant ES014811 funded to Dr. Trey Ideker. BZ is supported by NIH grants GM049369, HG004659, and GM052872 funded to Dr. Xiangdong Fu. The authors confirm that the funder had no influence over the study design, content of the article, or selection of this journal.

Competing Interests: Authors disclose no potential conflicts of interest.

Correspondence: [email protected]

Copyright: © the authors, publisher and licensee Libertas Academica Limited. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.

Paper subject to independent expert blind peer review. All editorial decisions made by independent academic editor. Upon submission manuscript was subject to anti-plagiarism scanning. Prior to publication all authors have given signed confirmation of agreement to article publication and compliance with all applicable ethical and legal requirements, including the accuracy of author and contributor information, disclosure of competing interests and funding sources, compliance with ethical requirements relating to human and animal study participants, and compliance with any copyright requirements of third parties. This journal is a member of the Committee on Publication Ethics (COPE).

Published by Libertas Academica.

Introduction

High-throughput sequencing technologies are widely applied in biomedical research. Since their initial application, they have driven major advances in the characterization and quantification of genomes, epigenomes, and transcriptomes. Next-generation sequencing (NGS) technology is free from many of the constraints of previous technologies, such as the bias due to probe selection in array technology, cross-hybridization background, and the limited dynamic range of detection caused by signal saturation.1,2 Moreover, this high-throughput technology produces large and complex datasets at single-nucleotide resolution, and its cost is continuously dropping, so that it offers the possibility of investigating molecular biology genome-wide in a far more precise and comprehensive manner than was previously achievable.

RNA-seq is the set of experimental procedures that generate cDNA sequences derived from the entire population of RNA molecules, followed by library construction and massively parallel deep sequencing. Gene expression is known to be time-, cell-type-, and stimulus-dependent, and many loci are expressed only under very specific conditions. In fact, the genome-sequencing project has revealed numerous open reading frames encoding "hypothetical" genes, for which expression patterns are not yet established.3,4 RNA-seq allows quantification of the abundance or relative change of each transcript during defined developmental stages or under specific treatment conditions. RNA-seq also allows analysis of the transcriptome in a rather unbiased way, with single-base-pair resolution, a wide dynamic detection range (>8,000-fold), and low background signals.5 In contrast to hybridization-based technologies, it is not limited to the interrogation of selected probes on an array and can also be applied to species for which a whole reference genome has not yet been assembled.

RNA-seq is not only a tool for quantitative assessment of RNA but can also be exploratory. Only recently has it been appreciated that 85% of the human genome can be transcribed, although only 3% of the genome encodes protein-coding

Bioinformatics and Biology Insights 2015:9(S1) 29

Han et al

genes.6 Thus, RNA-seq has been instrumental in cataloging the diversity of novel transcript species, including long non-coding RNA, miRNA, siRNA, and other small RNA classes (eg, snRNA and piRNA) involved in the regulation of RNA stability, protein translation, or the modulation of chromatin states.7,8 For instance, RNA-seq has been used to discover enhancer RNA, a class of short transcripts transcribed directly from enhancer regions, which contributes to our knowledge of epigenetic gene regulation.9,10 In addition, RNA-seq can give information about transcriptional start sites, revealing alternative promoter usage; about mRNA isoforms derived from alternative splicing; and about premature transcription termination at the 3' end, which is critical for mRNA stability.11–15 Most recently, RNA-seq was used to study biological problems including the precise location of regulatory elements.16,17 RNA-seq information can also identify allele-specific expression, disease-associated single nucleotide polymorphisms (SNP), and gene fusions, contributing to our understanding of disease-causal variants in cancer.18–21 Furthermore, RNA-seq can provide information about the transcription of endogenous retrotransposons and other parasitic repeat elements that may influence the transcription of neighboring genes or may result in somatic mosaicism in the brain.22 Finally, single-cell RNA-seq analysis has been widely applied to study cellular heterogeneity and diversity in stem cell biology and neuroscience.23–25

While RNA-seq technology is considered unbiased, it is important to note that the preparation and fragmentation of RNA and the library construction (which includes size selection) can introduce bias.5 This bias may be undesired or unfavorable; for example, the use of oligo (dT) primers in first strand synthesis enriches poly (A) mRNA, which is useful for studying the expression of most protein-coding genes but misses canonical histones,26 some histone variants, and subclasses of non-coding RNA.27 Strand-specific sequencing retains the orientation of the original RNA transcript, which may be critical for identifying antisense or non-coding RNA.28

The interpretation of NGS datasets requires sophisticated and powerful computational programs. RNA-seq data generation is an ever-evolving process, which includes developments in sequencing technology, experiment design, and algorithms. Accordingly, computational tools with varying performance are constantly emerging. A wealth of mature tools exists to meet the basic requirements of RNA-seq data analysis, for instance, quality assessment and reads mapping. Meanwhile, challenges remain that require comprehensive solutions, such as differential gene expression analysis and the detection of fusion genes. Rather than describing each software package, we outline in this study the tools available for data preprocessing, differential gene expression (DGE) analysis, alternative splicing, variant detection and allele-specific expression, pathway analysis, and co-expression network analysis; highlight the essential principles of computational methods in RNA-seq data analysis; describe the challenges associated with RNA-seq applications; and discuss examples that represent the latest advances in transcriptome characterization.

RNA-seq Workflow

[Figure 1: workflow diagram. Experimental biology: RNA extraction → RNA fragmentation and reverse transcription → library construction and sequencing. Computational biology: millions of short reads → quality control and preprocessing → alignment to reference genome or de novo assembly → indexing to coding regions/exons/junctions. Systems biology: DEG analysis; transcriptome structure assay; integration analysis with epigenomic/proteomic data; pathway analysis or co-expression network; enriched categories test; biological insights.]

Figure 1. Overview of the typical RNA-seq pipeline. Three main sections are presented: the Experimental Biology, the Computational Biology, and the Systems Biology. The pipeline starts from the experimental preparation and proceeds through the sequencing and analysis steps, with arrows pointing from step to step.

An overview of a typical RNA-seq workflow is outlined in Figure 1. Three main sections are presented: the Experimental Biology, the Computational Biology, and the Systems Biology. The experimental part includes the choice of methods for RNA collection, first strand synthesis, and library construction, resulting in millions of short reads from the NGS sequencer. Multiple platforms (Table 1) have been applied for RNA-seq, including the sequencing-by-synthesis approach Illumina GA IIx


Table 1. Overview of technical specifications of next-generation sequencing platforms.*

Platform | Illumina GAIIx | Illumina HiSeq 2000 | Illumina MiSeq v2 | SOLiD–5500xl | 454 GS FLX+ | Ion Torrent PGM | PacBio RS
Chemistry principle | Sequencing-by-synthesis | Sequencing-by-synthesis | Sequencing-by-synthesis | Ligation and two-base coding | Pyro-sequencing | Proton detection | Real-time sequencing
Instrument price | $256 K | $654 K | $128 K | $251 | $450 K | $80 K (system price including PGM, server, OneTouch and OneTouch ES) | $695 K
Sequence yield per run | 30 Gb | 600 Gb | 1.5–2 Gb | 150 Gb | 0.7 Gb | 50 Mb on 314 chip; 400 Mb on 316 chip; 1.5 Gb on 318 chip | 100 Mb
Sequence cost per GB | $148 | $45 | $502 | $67.00 | $50 | $800 (318 chip) | $2,000
Reagent cost per run** | $17,575 | $23,470 | $1,070 | $10,503 | $4,842 | $349 on 314 chip; $549 on 316 chip; $749 on 318 chip | $$300
Reagent cost per MB | $0.19 | >$0.04 | $0.14 | <$0.07 | $7 | $5 on 314 chip; $1.2 on 316 chip; $0.6 on 318 chip | $2–17
Run time | 10 days | 11 days | 27 hours*** | 7 days for SE; 14 days for PE | 20 hours | 2–5 hours | 2 hours
Observed raw error rate | 0.76% | 0.26% | 0.80% | <0.1% | 1% | ∼1% | ∼10%
Read length | Up to 150 bases | Up to 150 bases | Up to 150 bases | 85 bases | 700 bases | ∼200 bases | 3000 bases, up to 15000 bases
Read type | PE | PE | PE | PE | SR | PE | SR
Insert size | Up to 700 bases | Up to 700 bases | Up to 700 bases | 300 bases | Up to 40 kb | Up to 250 bases | Up to 10 kb
Typical DNA amount requirement | 50–1000 ng | 50–1000 ng | 50–1000 ng | 400–4000 ng | 25–1000 ng | 100–1000 ng | ∼1 µg
Computation resources | $222 cluster | $222 cluster | Desktop/cloud | $35 cluster | $5 (desktop) | $16.5 (desktop) | $65 cluster
Data file sizes (GB)*** | 600 | <600 | 1 | 148 | 40 images, 8 sff | 0.1 sff, 0.2 fastq on 314 chip; 5 sff, 1 fastq on 316 chip; 10 sff, 2.5 fastq on 318 chip | 2 (basecalls, QV, kinetics)

Notes: *Information based on company sources alone; data updated to 2013–2014. **Counts only the cost per run; does not include general-purpose and library preparation equipment, annual maintenance agreements, and extra services. ***New compressed binary data format saves base and quality-value data in a 1 byte : 1 base ratio.


and HiSeq,29,30 Applied Biosystems SOLiD,31 Roche 454 Life Science,32 the semiconductor technology-driven Ion Torrent Personal Genome Machine,33 the single-molecule real-time sequencing machine PacBio,34 and the nanopore technology-driven portable devices MinION and PromethION (http://allseq.com/blog/minion-and-promethion-oxford-nanopore-s-present-and-future).

RNA preparation methods may vary for different sequencing platforms, RNA subtypes, and sequencing purposes. However, sample quality always determines whether qualified data can be acquired and whether biological insights can be derived from unbiased analysis. Poly-A selection of sufficient mRNA is widely used in a variety of whole-transcriptome analyses, including gene expression, alternative splicing, and variant detection.35,36 For single-cell sequencing, molecular labeling followed by random sequencing of the labeled molecules on the Illumina platform can achieve remarkable mRNA capture efficiency.36 With more RNA-seq applications in clinical samples, formalin-fixed paraffin-embedded (FFPE) tissue samples have become an invaluable resource for transcriptomic studies. Ribosomal RNA (rRNA) depletion is the preferred method for archival and long-aged FFPE samples.37

The raw reads serve as the starting material of the second part, the computational biology. First, technical and biological contaminations are removed in the preprocessing steps, followed by mapping of the qualified reads to the genome or transcriptome. The mapped reads for each sample are subsequently indexed at the gene, exon, or transcript level to assess the abundance of each category, depending on the experimental purpose. The summarized data are then assessed by statistical models to produce lists of differentially expressed genes and alternative splicing events, or regulatory mechanisms are evaluated via integration analysis with other datasets such as epigenomic or proteomic data. Finally, pathway- or network-level analyses are implemented to gain biological insight through systems biology approaches.

Data Preprocessing

Quality assessment. Since RNA-seq is a complicated, multiple-step process involving sample preparation, fragmentation, purification, amplification, and sequencing, it is not straightforward to identify and quantify all RNA species from the sequenced reads. Hence, quality assessment is the first step of the RNA-seq bioinformatics pipeline and an important prerequisite for any downstream analysis. Often, it is necessary to filter the data, removing (trimming) low-quality sequences or bases, adaptors, contaminations, or overrepresented sequences to ensure a coherent final result. An array of tools that visualize read quality graphically is available for this purpose, such as FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc) and HTQC,38 as listed in Table 2. Recently, more flexible and efficient preprocessing tools were developed: Trimmomatic removes adapters, scans every read with a 4-base sliding window, and trims the lower-scored bases and low-quality N bases to enhance the quality of reads,39 and HTSeq depicts the base calling and evaluates base quality position by position as well as the overall read features.40

Before alignment to the reference genome, RNA-seq data can be further preprocessed to meet the requirements of the subsequent mapping steps. Multiple tools are available for this purpose; for example, BBMerge from the BBMap package (http://sourceforge.net/projects/bbmap/) merges paired reads based on overlap to create longer reads and creates an insert-size histogram. FLASH41 combines overlapping paired-end reads and converts them to single long reads. It is also good practice to assess the RNA-seq data quality after the preprocessing procedure, and there are packages, for example, RSeQC, that comprehensively evaluate the reads that will go into analysis.42

Reads mapping. Once high-quality data are obtained from preprocessing, the next step is to map the short reads to the reference genome, or to assemble them into contigs and align these to the reference genome. This procedure corresponds to the classic bioinformatics problem of finding the most likely genomic origin of a large number of short DNA sequences in a speed- and memory-efficient manner.43,44 Many popular bioinformatics programs can be used for this purpose, including ELAND (http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_ReferenceFiles.htm), SOAP,45 SOAP2,46 MAQ,47 Bowtie,48 BWA,49 ZOOM,50 STAR,51 etc. Comparative analyses on real data have been done to assess most mapping tools.52 These programs are typically suitable for reads that are not located at poly (A) tails or exon–intron splicing junctions. Poly (A) tails can be easily identified by the presence of multiple As or Ts, and a partial junction library that contains the known junction sequences has been compiled to allow the alignment of reads that are difficult to map.23,53 From a different point of view, the reads located at exon–intron boundaries help determine alternative splicing patterns, and the advent of RNA-seq has promoted the development of a new generation of splice-alignment software such as BLAT,54,55 TopHat,56,57 GEM,58 and MapSplice.59

Another problem in reads mapping is that of multi-mapping (termed polymorphism here), which occurs when sequence reads align to multiple locations of the genome. Multi-mapping is especially common for large and complex transcriptomes. For reads from low-copy repeats, one solution is to assign the reads to their multiple candidate locations proportionally, based on the neighboring unique reads.31,53 However, for short reads that have a very high copy number and repetitive sequences, multi-mapping is still a great challenge. A longer-read sequencer such as the Roche 454 or PacBio sequence analyzer might be required. Alternatively, there are bioinformatics solutions that extend the short paired-end reads into 200–500 bp fragments before deciding upon the multiple-aligned reads.60–62
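The proportional assignment of multiple-aligned reads mentioned above can be illustrated with a minimal Python sketch. It is not the implementation of any cited tool: the function name `assign_multireads`, the dictionary layout, and the even-split fallback for loci without unique reads are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_multireads(unique_counts, multireads):
    """Split each multi-mapped read across its candidate loci.

    unique_counts: dict mapping locus name -> number of uniquely mapped reads.
    multireads: iterable of tuples, each listing the candidate loci of one read.
    Returns a dict mapping locus name -> estimated (fractional) read count.
    """
    totals = defaultdict(float)
    for locus, count in unique_counts.items():
        totals[locus] += count

    for candidates in multireads:
        # Weight each candidate locus by its unique-read evidence.
        weights = [unique_counts.get(locus, 0) for locus in candidates]
        weight_sum = sum(weights)
        for locus, weight in zip(candidates, weights):
            if weight_sum > 0:
                share = weight / weight_sum
            else:
                # Assumption: with no unique evidence, split the read evenly.
                share = 1.0 / len(candidates)
            totals[locus] += share
    return dict(totals)


# Hypothetical example: four reads that each map to both geneA and geneB.
estimates = assign_multireads(
    {"geneA": 30, "geneB": 10},
    [("geneA", "geneB")] * 4,
)
print(estimates)
```

In this hypothetical case geneA carries three times as many unique reads as geneB, so each ambiguous read contributes 0.75 to geneA and 0.25 to geneB. Real tools refine this idea, for example by iterating the estimate or by using only unique reads in a neighboring window.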

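Returning to the quality assessment step, the 4-base sliding-window trimming attributed to Trimmomatic can be sketched in Python as follows. This is a simplified illustration of the general idea and not Trimmomatic's actual code: the function names, the Phred+33 encoding, and the default mean-quality threshold of 15 are assumptions; only the window size of 4 comes from the text.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(seq, quals, window=4, min_mean_quality=15):
    """Cut a read at the first window whose mean quality is too low.

    seq: read sequence (str); quals: list of per-base quality scores.
    The read is scanned from the 5' end; once the mean quality of a window
    drops below min_mean_quality, the window and everything after it are removed.
    Reads shorter than the window are returned unchanged.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(len(seq) - window + 1):
        window_scores = quals[start:start + window]
        if sum(window_scores) / window < min_mean_quality:
            return seq[:start], quals[:start]
    return seq, quals


# Hypothetical read: six high-quality bases followed by four low-quality ones.
read = "ACGTACGTAC"
scores = phred33_to_scores("IIIIII++++")
trimmed_seq, trimmed_quals = sliding_window_trim(
    read, scores, window=4, min_mean_quality=20
)
print(trimmed_seq, trimmed_quals)
```

The threshold is a tunable trade-off: a higher value discards more bases but leaves fewer sequencing errors for the mapping step.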

Table 2. Selected list of packages and tools for RNA-seq data analysis.

Analysis step | Package | Description and comments | References
Quality assessment and preprocessing | FastQC | A sequencing quality evaluator; easy to use, with read quality visualized graphically. | http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc
 | HTQC | A toolkit including a statistics tool for Illumina high-throughput sequencing data and filtration tools for sequence quality, length, and tail quality. Depicts the base calling and evaluates base quality position by position, as well as the overall read features. | 38
 | Trimmomatic | Performs a variety of useful trimming tasks for Illumina paired-end and single-end data. Removes PCR primers and adapter sequences, scans every read with a 4-base sliding window, and trims the lower-scored bases and low-quality N bases to enhance read quality. Flexible; can handle paired-end data. | 39
 | BBMap | Short read aligner for DNA and RNA-seq data. Capable of handling arbitrarily large genomes with millions of scaffolds. Handles Illumina, PacBio, 454, and other reads; very high sensitivity and tolerant of errors and numerous large indels. Very fast. Includes BBMerge, which merges paired reads based on overlap to create longer reads and creates an insert-size histogram. | http://sourceforge.net/projects/bbmap/
 | FLASH | Fast Length Adjustment of SHort reads. Combines overlapping paired-end reads and converts them to single long reads. | 41
 | RSeQC | Provides powerful modules that comprehensively evaluate RNA-seq data after the preprocessing procedure. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias, while RNA-seq-specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, etc. | 42
Mapping | ELAND | The first short read aligner but not the fastest any more. ELAND substantially influenced many aligners in this category and still outperforms many followers. ELAND itself works for 32 bp single-end reads only. Additional Perl scripts in GAPipeline extend its ability. | http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_ReferenceFiles.htm
 | SOAP | A program for efficient gapped and ungapped alignment of short oligonucleotides onto reference sequences. SOAP is compatible with numerous applications, including single-read or pair-end resequencing, small RNA discovery, and mRNA tag sequence mapping. SOAP is a command-driven program, which supports multithreaded parallel computing and has a batch module for multiple query sets. | 45
 | SOAP2 | An updated version of the SOAP software for short reads alignment. Super fast and accurate alignment for huge amounts of short reads; includes a single individual genotype caller (SOAPsnp, SOAPsnv, SOAPindel). | 46
 | MAQ | A program to align short reads and to call variants. Features include PET mapping, quality awareness, gapped alignment for PET, mapping quality, adapter trimming, partial occurrences counting, and an SNP caller. | 47
 | Bowtie | An ultrafast, memory-efficient short read aligner. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small. A useful unspliced aligner. | 48
 | BWA | A software package for mapping low-divergent sequences against a large reference genome. It consists of three algorithms: BWA-backtrack, BWA-SW, and BWA-MEM, which are suitable for read lengths from 70 bp to 1 Mb. | 49
 | ZOOM | A framework that is able to map the Illumina/Solexa reads of 15x coverage of a human genome to the reference human genome in one CPU-day, allowing two mismatches, at full sensitivity. | 50
 | STAR | An ultrafast universal RNA-seq aligner that utilizes sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR has the potential for accurately aligning long (several kilobases) reads that are emerging from the third-generation sequencing technologies. | 51
 | BLAT | | 54,55


 | HTSeq | A Python framework to work with high-throughput sequencing data, able to perform sequencing quality evaluation and read counting. It is flexible: users can write custom scripts or use the stand-alone scripts. | 40
 | EasyRNASeq | A Bioconductor package for processing RNA-Seq data, which performs count summarization per feature of interest and count normalization. | 64
 | GenomicRanges | A Bioconductor package that defines general-purpose containers for storing genomic intervals. Specialized containers for representing and manipulating short alignments against a reference genome are defined in the GenomicAlignments package. | 65
 | FeatureCounts | An R package suitable for counting reads generated from either RNA or genomic DNA sequencing. It implements highly efficient chromosome hashing and feature blocking techniques, so it is considerably faster than existing methods and requires far less computer memory. | 66
Expression quantification | Alexa-seq | A comprehensive package that includes a database for alignment, quantifies gene expression, extracts isoform features, and visualizes the results. | 12
 | Cufflinks | Transcriptome assembly and differential expression analysis for RNA-Seq. It can also perform isoform quantification by maximum likelihood estimation of relative isoform expression. | 7,83,84
 | RSEM | A package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. | 79
Differential expression | Cuffdiff | A robust and accurate tool for differential analysis of RNA-Seq experiments. Uses isoform-level estimates in the analysis. | 7,83,84
 | DESeq | An R package to analyse count data from high-throughput sequencing assays such as RNA-Seq and test for differential expression. It supports multi-factor designs through negative binomial generalized linear models. | 31
 | DESeq2 | A method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. | 85
 | EdgeR | A Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. | 82
 | PoissonSeq | A method for normalization, testing, and false discovery rate estimation for RNA-sequencing data based on a Poisson log-linear model. | 86
 | Limma-voom | Limma is an R package for data analysis based on linear models and differential expression for microarray data. The voom function in the limma package offers a way to transform count data into approximately Gaussian-distributed data so that significance can be tested statistically. | 87, 88, 89
 | MISO | A probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data and identifies differentially regulated isoforms or exons across samples. | 105
Alternative splicing | TopHat | A widely used, fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons. | 56, 57

 | MapSplice | An algorithm for mapping RNA-seq data to a reference genome for splice junction discovery. It utilizes the exon-first method and supports both single-end and pair-end reads with high memory efficiency and accuracy. | 59
 | SpliceMap | A de novo splice junction discovery and alignment tool. It offers high sensitivity and accuracy and support for arbitrary RNA-seq read lengths. | 106
 | SplitSeek | A program for de novo prediction of splice junctions in RNA-seq data. It utilizes the exon-first method. | 107
 | GEM mapper | A fast, accurate, and versatile aligner based on filtration. It leverages string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision and speed. | 58
 | SpliceR | An easy-to-use tool that extends the usability of RNA-seq and assembly technologies by allowing greater depth of annotation of RNA-seq data. | 108
 | SplicingCompass | A method and software to predict genes that are differentially spliced between two different conditions using RNA-seq data. | 109
 | GLiMMPS | A robust statistical method for detecting splicing quantitative trait loci (sQTLs) from RNA-seq data. | 110
 | MATS | A computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study designs. | 111
 | rMATS | A statistical model and computer program designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. | 112
Variants detection | GATK | Package for aligned NGS data analysis, which includes an SNP and genotype caller (Unified Genotyper), SNP filtering (Variant Filtration), and SNP quality recalibration (Variant Recalibrator). | http://gatkforums.broadinstitute.org/discussion/3891/calling-variants-in-rnaseq
 | ANNOVAR | An efficient software tool to functionally annotate genetic variants (gene-based, region-based, or filter-based) detected from diverse genomes. | 115
 | SNPiR | A highly accurate approach to identify SNPs in RNA-seq data. | 116
 | SNiPlay3 | A web-based application for exploration and large-scale analyses of genomic variations. | 117
Pathway analysis | GSEA | A knowledge-based approach for interpreting genome-wide expression profiles. It determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (eg, phenotypes). | 130
 | GSVA | A non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. | 131
 | SeqGSEA | Provides methods for gene set enrichment analysis of high-throughput RNA-Seq data by integrating differential expression and splicing. It uses the negative binomial distribution to model read count data, which accounts for sequencing biases and biological variation. Based on permutation tests, statistical significance can also be obtained for each gene's differential expression and splicing, respectively. | 132
 | GAGE | Generally Applicable Gene-set Enrichment, a gene set analysis method for pathway analysis of expression data. | 133

Table 2. (Continued)

Analysis step | Package | Description and comments | References
 | SPIA | An R package that uses the information from a list of differentially expressed genes and their log fold changes, together with signaling pathway topology, to identify the pathways most relevant to the condition under study. | 135
 | TAPPA | A Java-based tool for identification of phenotype-associated genetic pathways utilizing pathway topological measures. | 136
 | DEAP | A tool that capitalizes on information about biological pathways to identify important regulatory patterns from differential expression data. It makes significant improvements over existing approaches by including information about pathway structure and discovering the most differentially expressed portion of the pathway. | 137
 | GSAASeqSP | A toolset for gene set association analysis of RNA-Seq count data. GSAASeqSP identifies pathways/gene sets significantly associated with a disease or a phenotype by analyzing genome-wide patterns of gene expression variation measured by RNA-Seq technology. | 134
Co-expression network | GSCA | An open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts. | 146
 | DICER | A method for detecting differentially co-expressed gene sets using a novel probabilistic score for differential correlation. DICER goes beyond standard differential co-expression and detects pairs of modules showing differential co-expression. | 151
 | WGCNA | A powerful method to extract co-expressed groups of genes from large microarray data sets that has been successfully applied to RNA-seq data. It is suggested to remove genes whose read counts are consistently low and to normalize the data with a variance-stabilizing transformation before calculating pairwise similarity of expression patterns. | 152
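
The preprocessing suggested in the WGCNA entry (drop consistently low-count genes, stabilize the variance, then compute pairwise similarity) can be sketched as follows. This is only an illustrative sketch and not the WGCNA implementation: the function name, the thresholds `min_count` and `min_samples`, and the soft-threshold power `beta` are hypothetical choices, and log2(x + 1) is used as a simple stand-in for a proper variance-stabilizing transformation.

```python
import numpy as np

def coexpression_adjacency(counts, min_count=10, min_samples=2, beta=6):
    """Build an unsigned co-expression adjacency from a genes x samples count matrix.

    All parameter names and defaults are illustrative assumptions.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep genes with at least `min_count` reads in at least `min_samples` samples.
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    filtered = counts[keep]
    # log2(x + 1): a simple stand-in for a variance-stabilizing transformation.
    transformed = np.log2(filtered + 1.0)
    # Pairwise Pearson correlation between genes (rows).
    # Note: a gene that is constant across samples would yield NaN here.
    corr = np.corrcoef(transformed)
    # Unsigned soft-threshold adjacency: |cor| ** beta, with self-links removed.
    adjacency = np.abs(corr) ** beta
    np.fill_diagonal(adjacency, 0.0)
    return keep, adjacency
```

The returned boolean mask records which genes survived filtering, so rows and columns of the adjacency matrix can be mapped back to gene identifiers. In practice, a dedicated variance-stabilizing transformation and a data-driven choice of the power would replace the assumptions made here.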

Reads counting. The number of RNA-seq reads that map to a gene is the measure of that gene’s expression level. After mapping the reads to the reference genome, counting the reads that map to each gene body facilitates the next steps. Library preparation details, such as whether the protocol is strand-specific and whether the first read is on the same or the opposite strand, are determining factors for read counting. One example of a tool for read counting from a BAM file is the multicov command in bedtools, which takes a feature file (GFF) and counts reads in certain regions, such as all exons of a gene.63 By default, it counts reads on both strands within the regions of interest, but it can work in a strand-specific manner if necessary. HTSeq is a specialized utility for counting reads, although its speed needs to be improved in the future.40 However, it allows more fine-grained control over read counting through different parameter settings. This is very useful, especially when a read overlaps more than one gene and a customized strategy is wanted. Note that htseq-count assumes by default that the RNA-seq data are strand-specific; it will only count reads that were mapped to the strand that the feature is on. R packages for read counting include easyRNASeq, summarizeOverlaps, and featureCounts. easyRNASeq hides the complex interplay of the required packages and thus can be easily used. summarizeOverlaps, which is a function in the GenomicRanges package in Bioconductor, and featureCounts, which implements highly efficient chromosome hashing and feature blocking methods, are suitable for RNA-seq or genomic DNA sequencing data. Different tools and their parameters generate different read counts, and thus affect downstream analysis, because they use different strategies to assign reads to features.
In addition, the gene model that hypothesizes the structure of transcripts produced by a gene also affects the analysis. Among multiple genome annotation databases, RefGene, Ensembl, and the UCSC annotation databases are the most popular ones. The choice of genome annotation directly affects gene expression estimation. Recently, Zhao and Zhang systematically characterized the impact of the choice of genome annotation dataset on read mapping and transcriptome quantification.67 They found that the impact of a gene model on the mapping of nonjunction reads differs from that on junction reads. The percentage of correctly mapped nonjunction reads was much higher than that of junction reads for all gene models. Surprisingly, although there are 21,958 common genes among the RefGene, Ensembl, and UCSC annotations, only 16.3% of genes obtained identical quantification results. Approximately 28.1% of genes’ expression levels differed by ≥5% when using
different annotation, and of those, the relative expression levels for 9.3% of genes differed by ≥50%. This study revealed that the different gene definitions of gene models frequently result in inconsistent gene quantification.
Normalization. After getting the read counts, data normalization is one of the most crucial steps of data processing, and this process must be carefully considered, as it is essential to ensure accurate inference of gene expression and subsequent analyses thereof. There are multiple facets of the RNA-seq data to be taken into account, including transcript size, GC-content, sequencing depth, sequencing error rate, insert size, etc.68,69 Multiple normalization methods should be compared for the elimination of the specific biases of a dataset, which can be done by comparing their corresponding estimated performance parameters using measurement error models.69,70 Many comparative or integrative analyses have identified the best approach for different types of RNA-seq data analysis. For instance, quantile normalization can improve the quality of mRNA-seq data, including those from low amounts of RNA.71,72 The R package EDASeq, which uses within-lane normalization procedures followed by between-lane normalization, can reduce GC-content bias.73 Lowess normalization and quantile normalization worked well in microRNA-seq data normalization.74 Further advancement of RNA-seq applications calls for the development of effective statistical and computational methods for RNA-seq data normalization.
There are other bioinformatics challenges for RNA-seq read mapping, for example, reducing the errors in image analysis and base calling to enhance sequencing accuracy; removing low-quality reads; and developing applicable approaches to store, retrieve, and process large datasets in a time- and energy-efficient manner.

Differential Gene Expression
The transcriptome is the complete set of transcripts in a cell or cell population, and transcriptome analysis provides information about the identity and quantity of all RNA molecules. An important application of RNA-seq is the comparison of transcriptomes across different developmental stages, between a disease state and normal cells, or between specific experimental stimuli and physiologic conditions. This type of analysis requires identification of genes along with their isoforms and precise assessment of their abundance in two or more samples. It is essential for interpreting the functional elements of the genome and uncovering the molecular constitution, providing important insights into the biological mechanisms of development and disease.
After the preprocessing of RNA-seq reads, an important question is how transcript levels differ across samples, which is known as DGE analysis. Numerous statistical methods that use read coverage to quantify transcript abundance have been developed since the microarray era.75,76 RPKM (reads per kilobase per million mapped reads) is a widely used method to quantify expression; it normalizes read counts with respect to the overall number of mapped reads and gene length.32,53 However, besides read coverage, other factors determine the estimated transcript abundance, including sequencing depth, gene length, and isoform abundance.72,77,78 Since the RPKM method handles all the RNA-seq reads almost equally, for example, without concern for isoforms, it has been criticized. RNA-Seq by Expectation Maximization (RSEM) is a newly developed software tool that gives accurate estimates of gene and isoform expression levels and can be used even for species without a reference genome assembly.79
Most algorithms to date for differential gene expression analysis apply simple count-based probability distributions (eg, Poisson distribution) followed by Fisher’s exact test without accounting for biological variability among samples.32,53,80 While the technical variability of RNA-seq is extremely low compared with microarray data,32 the biological variability could be significantly reduced by analyzing several replicates through permutation-derived methods.75 Serial analysis of gene expression has been developed for biological variability assessment, in which larger scale datasets are used so that an additional dispersion parameter can be estimated based on an extended Poisson distribution, allowing extensive molecular characterization capability.81,82
However, for most applications, a large number of replicates may be too costly, and many methods have overcome this problem by modeling biological variability and measuring significance with a limited number of samples, applying pairwise or multiple group comparisons.75 Several programs offer good solutions for this purpose and have been applied in numerous studies in biomedical and clinical research. Examples of these programs are Cuffdiff from the Cufflinks package,7,83,84 DESeq,31 DESeq2,85 and edgeR.82 Since RNA-seq read counts are integers that range from zero to millions and are highly skewed, many kinds of transformation algorithms have been applied to the counts so that the numbers can be fit to statistical distribution models for differential expression detection. For instance, Li et al developed PoissonSeq, a Poisson log-linear model for differential gene expression analysis.86 Approaches developed for microarray data analysis based on continuous distributions have been adapted for RNA-seq counts. An excellent example is the voom function in the limma package, which offers a way to transform count data into Gaussian-distributed data so that significance can be tested statistically.87–89 An extensive comparison evaluating the performance of several DGE packages has recently been reported.90,91 However, to the best of our knowledge, there is no one-size-fits-all strategy. Also, there is room to refine existing pipelines and develop effective strategies for the following questions: how to make read coverage uniform along the genome despite variation in nucleotide composition; how to detect “within-sample” variations without simply assuming that the underlying conditions or treatments affect all individual genes equally; how to improve current methods to
detect differences in gene isoform preferences and abundance levels under varying conditions; and how to account for the different probability of read coverage in long versus short genes, given the great sequencing depth achievable nowadays.

Alternative Splicing
Biological complexity and genomic diversity are determined, to a large degree, by alternative splicing events.92 Alternative splicing shapes the control of numerous pivotal cellular processes, and abnormal splicing events are involved in 15%–50% of disease-causing mutations in humans.93 Compared with constitutive splicing, alternative splicing refers to the differential inclusion/exclusion of exons in the processed RNA product after splicing of a precursor RNA segment.94 It is a crucial step in controlling the expression of ∼95% of all multiexon genes, and an increasing number of diseases are found to be associated with “wrong” splice site usage, while the overall transcript abundance does not change.95,96 Spliceosomes, composed of intricate structures with RNA–RNA, protein–protein, and RNA–protein interactions, carry out the splicing reaction.97 Splicing mechanism studies on model genes have deduced many regulatory principles, including the role of negative intrinsic site binding and positive enhancement of splice site selection in spliceosome assembly.94
However, given the variety of cis-acting elements and trans-acting factors involved in splicing, either cooperatively or competitively, the “code” for controlling alternative splicing still needs further deciphering using high-throughput approaches.98 RNA-seq technology allows us to estimate alternative splicing events on a genome-wide scale and in an unbiased manner. Deep surveying of alternative splicing by RNA-seq revealed an unprecedented wealth of splice junctions and RNA-binding motifs and provides more reliable measurements than microarray technology.99–101 Furthermore, alternative splicing is tissue-specific, with hundreds of context-sensitive RNA features and tissue-dependent splicing regulatory elements, which generate thousands of combinations of alternative splicing events.102,103 In-depth RNA-sequencing analysis yields a digital inventory of gene and mRNA isoform expression with tissue specificity and high sensitivity in single cells and provides a framework for understanding alternative splicing patterns on a genome-wide scale.15,104
With the rapid accumulation of RNA-seq data, many methods and tools have been developed to infer alternative splicing events. These tools generally focus on either gapped alignment of short reads or de novo assembly and characterization of transcript models. Examples of these methods are MISO for identification and regulation of isoforms from CLIP-seq data,105 and SpliceMap,106 SplitSeek,107 spliceR,108 and SplicingCompass109 for detection of splice junctions and exon usage from paired-end RNA-seq. GLiMMPS provides a useful tool for elucidating the genetic variation of alternative splicing in humans and model organisms.110 MATS is developed from a statistical method and used for detecting differential alternative splicing events from RNA-seq data.111 rMATS is a statistical method for robust and flexible detection of genome-wide differential alternative splicing from paired or unpaired replicates.112 ALEXA-Seq assesses the differential and alternative expression of mRNA isoforms after cataloging transcripts.12 An integrative analysis approach constructed an exon co-splicing network based on distances combined with matrix correlations and found that the co-splicing network was distinct from and complementary to the co-expression network, although both possess scale-free properties.113,114
The field of alternative splicing analysis using RNA-seq data is still in its infancy and would benefit from new strategies. An extensive evaluation and comparison of the existing methods would be desirable, and to date, there is no general consensus regarding which method performs best under given conditions. We expect novel, exploratory methods to be developed in this flourishing field in the near future.

Variants Detection and Allele-Specific Expression
The main applications of RNA-seq analysis are novel gene identification, expression, and splicing analysis. However, sequence-based mutation analysis is also a useful by-product of RNA-seq data, though there are many limitations, such as highly differential coverage between different genes. Among many variant calling and annotation methods such as ANNOVAR,115 SNPiR,116 and SNiPlay3,117 the best-practices workflow provided by GATK may still be the best pipeline to identify mutations from RNA-seq data, although it is still far from perfect and under heavy development (http://gatkforums.broadinstitute.org/discussion/3891/calling-variants-in-rnaseq). In the GATK pipeline, the sequence reads are first mapped to the reference using the STAR aligner (2-pass protocol) to produce a coordinate-sorted file in SAM/BAM format. After marking and removing duplicates, GATK splits reads with N operators in the CIGAR strings into component reads and trims them to remove any overhangs into splice junctions, reducing the occurrence of artifacts. The remaining steps are similar to DNA-seq variant calling, such as local realignment and haplotype-based variant calling.
A heterozygous SNP, which means two different alleles at the same position in the DNA, may lead to the following: one of the two alleles is highly transcribed into mRNA and the other is lowly transcribed or even not transcribed at all. This is called allele-specific expression (ASE). Both genetic and epigenetic determinants govern transcriptional activity at the different alleles of a gene in a non-haploid genome, and impairment of this highly regulated process can lead to disease.118,119 Whole genome DNA sequencing (WGS) allows identification of single nucleotide mutations or polymorphisms in the entire human genome. The expression state of the heterozygous loci can be investigated in the matched RNA-Seq and
WGS samples from the same individual, and ASE activity can be identified to uncover instances of allele silencing.120 Though conceptually simple, identifying ASE is still challenging because of many problems, such as read bias and the lack of sophisticated statistical models.121 Recently, Mayba et al developed a pipeline for ASE detection, MBASED, which aggregates information across multiple single nucleotide variation loci to obtain a gene-level ASE.122 More sophisticated software tools are needed for ASE identification.

Beyond the Differentially Expressed Gene Lists
Creating lists of differentially expressed genes is only the starting point for gaining biological insights into experimental systems, developmental stages, or specific disease scenarios. To understand the biological context of differentially expressed genes, many advanced analyses have built on gene ontology,123,124 gene sets,125 network inference, and knowledge databases.126,127
Pathway Analysis. The interpretation of gene expression data is based on the function of individual genes as well as their role in pathways, since genes work together in all biological processes. In addition, for some genes, a small expression change may not be significant at the single-gene level, but minor changes in several genes may be relevant in a pathway and may have dramatic biological consequences. Thus, differentially expressed biological pathways provide better explanatory results than a long list of seemingly unrelated genes.128
One traditional analysis works with a gene list of interest, identified with genomics methods or curated by biologists, and applies statistical methods, such as Fisher’s exact test, on contingency tables to test for enrichment of each annotated gene set.129 Such approaches can be applied directly to the differentially expressed gene list identified with RNA-seq data. Another class of analysis ranks all expressed genes according to metrics of expression difference and then uses Kolmogorov–Smirnov-like tests to obtain enrichment significance. Gene set enrichment analysis (GSEA) is one such highly effective method that has been widely used in studying functional enrichment between two biological groups.130
Many studies have adapted pathway analysis tools from microarray data analysis and developed new tools applicable to RNA-seq data. For example, a non-parametric competitive GSA approach named Gene Set Variation Analysis (GSVA) has been developed to fit RNA-seq data characteristics. Such analyses have given highly correlated results between microarray and RNA-Seq sample sets of lymphoblastoid cell lines that have been profiled using both technologies.131 SeqGSEA uses count data modeling with negative binomial distributions to score differential expression and then executes gene set enrichment analysis to achieve biological insights. In real applications, SeqGSEA detects more biologically meaningful gene sets without biases toward longer or more highly expressed genes.132 GAGE is another method for pathway analysis that is applicable to both microarray and RNA-seq data. It is unaffected by sample sizes, experimental designs, assay platforms, or other types of heterogeneity.133 GSAASeqSP offers a variety of statistical procedures by adapting and combining multiple gene-level and gene set-level statistics for RNA-seq count-based data. Such statistics include Weighted_KS, L2Norm, Mean, WeightedSigRatio, SigRatio, GeometricMean, TruncatedProduct, FisherMethod, MinP, and RankSum.134 GSAASeqSP is a powerful platform for investigating molecular differential activity within biological pathways.
The limitations of the gene set analysis methods developed for microarrays in the context of RNA-seq data have been comprehensively investigated.128 Several frequently used RNA-seq normalization strategies were studied to examine the performance of multivariate tests. Data transformations were also investigated in an attempt to extend other approaches beyond microarray data analysis. It was found that the use of log counts normalized for sequencing depth is a good strategy for data transformation prior to pathway analysis.
Previously, pathway analysis methods had been developed based on algorithms considering pathways as simple gene lists and ignoring pathway structure. Recently, methods have been developed that incorporate various aspects of pathway topology. For example, SPIA captures pathway topology through its scoring system, in which the positions and the interactions of the genes in the pathway are considered.135 Accordingly, interacting differentially expressed gene pairs are preferentially weighted over two non-interacting genes. Similarly, TAPPA is a scoring method in which higher weights are automatically assigned to hub genes and interacting gene pairs.136 DEAP identifies the most differentially expressed path to provide a refined focus for further biological exploration.137 Accordingly, biological pathways are represented by directed graphs, where nodes are biological compounds and the edges represent catalytic or inhibitory regulatory relationships.
Applying methods developed for microarray data analysis without considering the specific features of RNA-seq data may lead to biases. For example, long or highly expressed transcripts are more likely to be detected as differentially expressed than short and/or lowly expressed ones. By developing new statistical frameworks, the gene length bias and total read number bias of RNA-seq can be corrected. One good example is the GOseq package for gene ontology analysis. It accounts for read count bias by estimating a probability weighting function and uses a resampling strategy over the differentially expressed genes so that it can highlight GO categories more consistent with the known biology.138 Developing good methods to correct the biases in pathway analysis introduced by GC content, dinucleotide distribution, and other factors remains challenging.139
Although many pathway databases are available, high-resolution annotation of such knowledge bases is still lacking. For example, >90% of human genes are alternatively spliced, and transcripts from the same gene may have distinct, even opposing functions. However, current knowledge bases
are curated only at the gene level. It is essential to also include knowledge about pathway-specific transcript activity. In addition, high-quality annotations for genes are still needed, although there are enormous numbers of annotations available in the public domain.140 We expect to see more sophisticated data mining and machine learning algorithms applicable to RNA-seq data, especially methods that consider the gene in the context of its pathway.
Co-expression network analysis. Co-expression network analysis is an important complement to DGE analysis. A gene co-expression network is represented as an undirected graph, in which each node corresponds to a gene, and two nodes are linked if there is a significant co-expression relationship between them. Because co-expressed genes are often functionally related, controlled by the same set of transcription factors, or work together within the same pathway, building co-expression networks can help to extract meaningful biological modules that are tightly associated within a specific biological process.141
The co-expression network has been extensively studied since the microarray era, and with the emergence of NGS technology such networks have been examined using RNA-seq data. Comparison studies between RNA-seq co-expression networks and microarray-derived networks revealed that correlations from RNA-seq data are much higher because RNA-seq data have greater sensitivity and a larger dynamic range. Although both co-expression networks show scale-free properties, there is low overlap between hub-like genes. This phenomenon can be explained by the low correlation between microarray and RNA-seq data, especially for high- and low-abundance transcripts.142
Both sample size and read depth affect the quality of RNA-seq-derived co-expression networks.143 Larger sample sizes and greater read depth can increase the functional connectivity of the networks. The minimal suggested experimental criteria to obtain performance on par with microarrays are at least 20 samples with a total number of reads greater than 10 million per sample. Meta-analysis across multiple data sets is a good solution to improve the relatively poor performance of individual co-expression networks. Aggregation across different experiments can improve performance significantly beyond that attained by even the largest individual co-expression networks in one experiment. However, thousands of samples from different conditions are necessary to obtain “gold standard” co-expression networks.
The high quality of co-expression networks built by large meta-analysis promises a powerful functional genomics tool for biologists and clinicians. The GeneFriends project team has constructed co-expression maps for human and mouse with RNA-seq datasets of 4,000 and 2,500 samples from different experiments, respectively.144 This information can be used statistically, for example, with a guilt-by-association approach to predict gene function and to identify and prioritize novel candidate genes involved in biological processes. COXPRESdb is another database of RNA-seq-based gene co-expression networks.145 The co-expressed gene list in COXPRESdb provides a comparable view of orthologous genes among several species (human, mouse, rat, chicken, fly, zebrafish, nematode, monkey, dog, and yeast) and the numbers of common edges for all pairs of species.
Besides building gene co-expression networks under defined conditions, finding co-expression modules in one condition and then testing whether these modules show different co-expression in other conditions can assist in understanding regulatory changes under disease conditions. Gene set co-expression analysis was proposed to test differential co-expression of known pathways by testing the changes in co-expression over all gene pairs in the pathway.146 Based on theoretical analysis, a small, highly co-expressed subnetwork was found to be a good indicator of disease onset or other biological processes. This finding has been validated with real data, confirming that this small set of genes clustered within a strongly correlated subnetwork is able to provide a significant warning signal just before the onset of disease.147 The current approach to building dynamic network biomarkers is based on population data. It might be interesting to build a co-expression network with time series data from the same subject using self-correlation or synchronization,148,149 so that it can be used to predict disease onset for diagnosis and personalized medicine.
Interestingly, in biological systems, antagonistic and self-reinforcing co-expressed modules have been implicated in system stability and adaptability.150 Algorithms have been designed to model this phenomenon, for instance, DICER, in which the expression profiles of genes within each module of the pair are correlated across all samples and the correlation between the two modules differs dramatically between the disease and normal samples.151 Weighted gene co-expression network analysis (WGCNA) is a powerful method to extract co-expressed groups of genes from large microarray data sets and has been successfully applied to RNA-seq data. It is suggested to remove genes whose read counts are consistently low and to normalize the data with a variance-stabilizing transformation before calculating the pairwise similarity of expression patterns. It can perform various aspects of weighted correlation network analysis, including network construction, module detection, gene selection, calculation of topological properties, data simulation, visualization, and interfacing with external software.152
As more and more RNA-seq data become publicly available, there is a great need to develop new algorithms to formulate both the global and local characteristics of co-expression networks, especially the dynamic changes associated with biological processes. Much work still remains for the development of RNA-seq co-expression methodologies. So far, there have been few published statistical studies that have examined metrics for similarity of expression profiles with RNA-seq data. Si et al designed an algorithm to cluster genes by measuring the differential expression patterns across
treatments using model-based statistical methods built on either Poisson or negative binomial (NB) models for RNA-seq data, using the mean expression level as a reference and thereby bypassing direct treatment of the RNA-seq data.153 Co-expression networks built with different expression measurements, such as raw counts, RPKM, or variance-stabilizing transformation, have low overlap. Therefore, new metrics for establishing co-expression networks are urgently needed.142

Centrality and network flow have been used successfully to identify important genes and modules in co-expression networks. However, we lack a good way to formulate the network structure. Some network metrics, such as percolation, are too simple to capture network characteristics efficiently, given the dynamic nature of biological processes. The lack of a model that represents the dynamic change of a co-expression network across time points limits our ability to observe biological system changes at the network level.150
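The low overlap between co-expression networks built from different expression measurements can be illustrated with a minimal sketch. It is not the procedure used in the cited studies: the data are synthetic, the function names (`rpkm`, `top_edges`, `jaccard`) are hypothetical, a plain log2(count + 1) transform stands in for a true variance-stabilizing transformation, and a network is defined simply as the gene pairs with the highest absolute Pearson correlation.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase per million mapped reads; counts is genes x samples."""
    per_million = counts.sum(axis=0) / 1e6       # library size of each sample
    per_kb = gene_lengths_bp[:, None] / 1e3      # length of each gene in kb
    return counts / per_million / per_kb

def top_edges(expr, n_edges):
    """Gene pairs with the highest absolute Pearson correlation across samples."""
    corr = np.corrcoef(expr)                     # rows (genes) are the variables
    i, j = np.triu_indices_from(corr, k=1)       # each unordered pair once
    order = np.argsort(-np.abs(corr[i, j]))[:n_edges]
    return {(int(i[k]), int(j[k])) for k in order}

def jaccard(a, b):
    """Shared edges divided by all edges found in either network."""
    return len(a & b) / len(a | b)

# Synthetic counts (assumption): 50 genes, 12 samples with unequal sequencing depth.
rng = np.random.default_rng(0)
n_genes, n_samples = 50, 12
gene_lengths = rng.integers(500, 5000, size=n_genes)
depth = rng.uniform(0.5, 2.0, size=n_samples)
base = 5.0 + rng.gamma(shape=2.0, scale=50.0, size=(n_genes, 1))
counts = rng.poisson(base * depth)

raw_net = top_edges(counts.astype(float), 100)
rpkm_net = top_edges(rpkm(counts, gene_lengths), 100)
log_net = top_edges(np.log2(counts + 1.0), 100)  # stand-in for a VST

overlap_raw_rpkm = jaccard(raw_net, rpkm_net)
overlap_raw_log = jaccard(raw_net, log_net)
print(f"raw vs RPKM: {overlap_raw_rpkm:.2f}, raw vs log2: {overlap_raw_log:.2f}")
```

In this toy setting the raw counts of all genes track sequencing depth, so raw-count correlations mostly reflect library size, whereas RPKM removes that shared trend. The Jaccard index gives one simple way to quantify how far the resulting edge sets diverge.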

Systems Biology
High-throughput sequencing technologies are now routinely applied to a wide range of topics in biology and medicine, allowing scientists to address important questions and make discoveries that were impossible before. Advances in genome sequencing and data analysis play critical roles, while preparing samples selectively and generating high-quality data require sophisticated experimental design, which is an essential part of systems biology (Fig. 2).

Figure 2. The STARR-seq pipeline and the corresponding ‘systems biology’ steps. Sonicated genomic DNA fragments are PCR amplified and placed downstream of a minimal promoter in reporter vectors. The desired measurements are embedded in the genome. The reporter library is transfected into cultured cell lines, and poly-A RNAs are isolated from the pool of total RNA. These steps selectively enrich the targets of interest. After RNA-seq is performed, the reads are mapped to the reference genome, and their enrichment over input is measured to reflect enhancer activity. The systems biology steps, including mathematical and computational analysis, help with the interpretation.
[Figure 2 panel labels: DNA fragments; desired measurement; reporter library (ORF, poly-A site); select and reduce for sequencing; transfection into cells; poly-A RNA isolation, conversion and cDNA library preparation; sequencing; sequence; computational analysis; analyze.]

Concerning gene expression analysis, integrating datasets from diverse platforms in this Next-Generation Genomics era, including genomics, epigenomics, and proteomics together with transcriptomics, is critical to the effort to understand complex biological systems. A wide range of integrative analysis projects have been defined to give a more complete picture of gene regulation, such as the Roadmap Epigenomics Project, the ENCODE Project, and The Cancer Genome Atlas.154 RNA-seq has been used in combination with transcription factor (TF) binding,155,156 histone modification,157,158 DNA methylation,159,160 genotyping data,161,162 and RNA interference.163 Here, we summarize two excellent examples to illustrate the application of RNA-seq in the framework of systems biology.

STARR-seq: whole genome functional readout of enhancers. Enhancers are functional non-coding DNA sequences that can recruit TFs, physically interact with promoters, and regulate the timing and tissue specificity of gene expression.164–169 Despite their important roles during development, in responses to stimuli, and in various diseases, a genome-wide approach to identify functional enhancer regions is still lacking. Current high-throughput enhancer detection methods can be grouped into three categories: (1) identification of open chromatin, including deep sequencing of DNase hypersensitive sites (DHS-seq)170 and formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq)171; (2) chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) on enhancer-associated histone modifications (H3K4me1, H3K27ac, H3K18ac, etc.)172–175; and (3) ChIP-seq on TFs or cofactors (p300, CEBPB, etc.).176 However, the mapping of open chromatin and histone modifications usually lacks sufficient resolution and specificity to detect precise enhancer locations, and the binding of specific TFs or cofactors can hardly cover all active enhancers. Moreover, none of these methods provides a quantitative measurement of enhancer activity. Traditional quantitative reporter assays, on the other hand, cannot be scaled up to a high-throughput, genome-wide format.177,178

To address this question, Arnold and colleagues developed a method named self-transcribing active regulatory region sequencing (STARR-seq),17 which quantitatively measures the activity of enhancers across the whole genome. They sheared Drosophila melanogaster genomic DNA and selected ∼600 bp fragments. These random fragments were PCR amplified and


placed downstream of a minimal promoter in reporter vectors (Fig. 2). The reporter library contained 11.3 million candidate fragments, which covered 96% of the non-repetitive genome 10-fold. In these constructs, candidate DNA fragments that are enhancers have the opportunity to activate their own transcription. By transfecting the reporter library into Drosophila cell lines, isolating polyadenylated RNA, and performing RNA-seq, the authors were able to quantitatively estimate enhancer strength from the amount of transcript each fragment produced.

Computational analysis involved mapping the STARR-seq reads to the genome and examining their enrichment over input. From this, the authors identified 5,499 enhancers in Drosophila S2 cells and tested 77 of them, together with 65 negative controls, by luciferase assays. As a result, 81% of the predicted enhancers and 14% of the negative controls showed enhancer activity. There was a strong linear correlation (r = 0.83) between luciferase activity and the STARR-seq transcription readout, indicating that STARR-seq is a reliable quantitative measurement of enhancer strength.

STARR-seq is a high-throughput application of the traditional enhancer reporter assay that directly and quantitatively assesses enhancer activity genome-wide. It complements existing enhancer detection methods, which are based mainly on chromatin features. One limitation of STARR-seq, as the authors pointed out, is that it only assesses the potential enhancer ability of DNA sequences, irrespective of the endogenous genomic context, such as DNA accessibility and histone modification.

Structural genome (re)annotation. Defining the complete set of transcripts is complicated by the fact that transcriptomes are highly dynamic entities that change in response to both intracellular signals and the extracellular environment. In addition, expression level, allele-specific expression, and alternative splicing events further increase the complexity of defining the transcriptome across developmental stages, growth conditions, or disease states.

Genomic studies, including gene expression by microarray and chromatin feature assays by tiling array, are based on genome annotations. However, genome annotation is continuously being updated, and even the current annotation is incomplete, indicating that previous studies might have missed important information or were not precise enough to uncover the biological insight. An accumulating number of studies using RNA-seq to structurally annotate genomes and transcriptomes have been generating a more complete and more precise map to facilitate our understanding of gene transcription. Here we highlight examples that finely annotated the transcriptional landscape of a major invasive fungal pathogen by combining elegant experimental design, RNA-seq, and comprehensive data analysis.

Candida species are major invasive fungal pathogens of humans, responsible for diseases ranging from superficial skin infections to deep-seated systemic candidiasis with high mortality rates, whose progression and severity are determined by the host immune system.179 Disease caused by Candida albicans depends largely on its ability to change its transcriptional landscape and thereby switch morphologies in response to different host niches or environmental stimuli.180,181 Given this clinical significance, and building on the way the transcriptome changes in response to environmental cues, Bruno and colleagues generated RNA-seq data from C. albicans cultured in vitro under diverse growth conditions, including hyphae-inducing conditions, high/low oxidative stress/pH conditions, nitrosative stress, and cell wall damage-inducing conditions.182 From a total of 177 million mapped reads, they substantially refined the primary genome annotation by determining transcript positions, identifying new genes and new introns, and determining expression levels under each growth condition as well as condition-specific expression of novel transcripts. With a similar experimental design, Linde et al depicted an even more detailed transcriptional map by annotating protein-coding genes, non-coding genes, introns, and UTRs in another Candida species, Candida glabrata, under pH and nitrosative stress.183 Comparative genomics also fueled this work, determining that species-specific and condition-specific adaptations are regulated by individual genetic repertoires and conserved orthologs at the transcriptional level.184

Outlook/Perspective
In this review, we have outlined major applications of RNA-seq in biomedical research, highlighted the computational approaches used in data preprocessing, differential gene expression, alternative splicing, pathway analysis, and co-expression networks, and presented examples showing how this technology can be applied in systems biology to advance our understanding at the genomic level. Because RNA-seq is potent for investigating the transcriptome in a highly quantitative manner at single-nucleotide resolution, and for complex disease diagnosis and precision medicine, the rapidly accumulating sequence data allow researchers to address fundamental biological questions that were not even asked just a few years ago. Although much progress has been made since the initial application of this technology, more applications are possible with further refinement in each of these topics.

Single-cell RNA-seq. RNA-seq in single cells has provided a powerful new approach to studying complex biological processes, for instance, advancing cancer studies from qualitative microscopic images to quantitative genomic datasets in recent years.185 Single-cell genome and exome sequencing has fueled the investigation of fundamental questions, including resolving solid tumor heterogeneity, identifying stem cells, tracking cell lineages and population consumption, measuring mutation rates, and detecting fusion gene events.19,186–188 Although single-cell sequencing can provide far more accurate measurements, the challenges of


single-cell sequencing in cancer cells lie in the sequencing and data analysis steps, beyond cancer cell isolation. First, having only two copies of DNA as input material results in technical errors, including insufficient coverage, difficulties in mutation calling, and false positives in heterogeneity characterization. Multiple datasets from different single-cell sequencing experiments impose even higher requirements on post-sequencing comparative analysis. In the near future, we expect single-cell sequencing to be applied to many more questions in cancer genomics, such as distinguishing extensive biological complexity from extensive technical error, diagnosing rare cancers, and discovering tumors at early developmental stages.

Dual RNA-seq. The study of pathogen–host interactions, including the immune response of eukaryotic cells, is another important battlefield where RNA-seq plays a critical role. Prior to the high-throughput sequencing era, transcriptomic analysis predominantly focused on either the host or the pathogen, which required separating the RNA of the host from that of the pathogen at specific time points.189 A deeper understanding of the interaction process, the identification of new virulence factors and immune response mechanisms, and the development of therapeutic approaches will require simultaneous analysis of both interaction partners, because the battle leads to a constantly changing environment and complex gene expression patterns. A “dual RNA-seq” approach allows genes from both host and pathogen to be monitored throughout the infection process without RNA separation.190 It enables the study of dynamic responses and interspecies gene regulatory networks in both interaction partners, from initial contact through invasion to the final persistence of the pathogen or its clearance by the host immune system, with a high level of accuracy and depth. Dual RNA-seq studies have been attempted in widespread areas such as molecular and cellular biology,191,192 public health,191 immune response in disease,193,194 and bacteria–plant interactions.195–197 Because this is a discovery-from-data approach, computational processing and storage of the large volumes of data remain a great challenge, although project-specific packages have been developed.198–200 Computational modeling and algorithm design beyond the existing methods will greatly facilitate answering the questions raised by ever-developing applications, from NGS to nanopore sequencing and single-cell sequencing.

Like the biological complexity itself, the challenges of developing computational methods exist in multiple dimensions. We have to consider the particular situation and design experiments accordingly; no single method or pipeline is optimal under all circumstances, even within the same field. In addition, with the rapid accumulation of data in public repositories, new challenges arise from the urgent need to effectively integrate many different RNA-seq datasets, as well as omics data at different levels, to study biological complexity and ultimately facilitate precision and personalized medicine.

Acknowledgments
We thank the NIH Fellows Editorial Board and Dr. Cuncong Zhong for suggestions on the manuscript. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government.

Author Contributions
Wrote the first draft of the manuscript: YH, SG, WZ. Contributed to the writing of the manuscript: YH, SG, KM, WZ. Agree with manuscript results and conclusions: YH, SG, KM, WZ, BZ. Jointly developed the structure and arguments for the paper: YH, SG. Made critical revisions and approved final version: YH, SG, WZ, BZ. All authors reviewed and approved of the final manuscript.

References
1. Okoniewski MJ, Miller CJ. Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations. BMC Bioinformatics. 2006;7:276.
2. Royce TE, Rozowsky JS, Gerstein MB. Toward a universal microarray: prediction of gene expression through nearest-neighbor probe sequence identification. Nucleic Acids Res. 2007;35(15):e99.
3. Kolker E, Makarova KS, Shabalina S, et al. Identification and functional analysis of ‘hypothetical’ genes expressed in Haemophilus influenzae. Nucleic Acids Res. 2004;32(8):2353–61.
4. Galperin MY, Koonin EV. From complete genome sequence to ‘complete’ understanding? Trends Biotechnol. 2010;28(8):398–406.
5. Wang Z, Gerstein M, Snyder M. RNA-seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63.
6. Hangauer MJ, Vaughn IW, McManus MT. Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs. PLoS Genet. 2013;9(6):e1003569.
7. Trapnell C, Williams BA, Pertea G, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28(5):511–5.
8. Robertson G, Schein J, Chiu R, et al. De novo assembly and analysis of RNA-seq data. Nat Methods. 2010;7(11):909–12.
9. Andersson R, Gebhard C, Miguel-Escalada I, et al; FANTOM Consortium. An atlas of active enhancers across human cell types and tissues. Nature. 2014;507(7493):455–61.
10. Kim TK, Hemberg M, Gray JM, et al. Widespread transcription at neuronal activity-regulated enhancers. Nature. 2010;465(7295):182–7.
11. Camarena L, Bruno V, Euskirchen G, Poggio S, Snyder M. Molecular mechanisms of ethanol-induced pathogenesis revealed by RNA-sequencing. PLoS Pathog. 2010;6(4):e1000834.
12. Griffith M, Griffith OL, Mwenifumbo J, et al. Alternative expression analysis by RNA sequencing. Nat Methods. 2010;7(10):843–7.
13. Picardi E, Horner DS, Chiara M, Schiavon R, Valle G, Pesole G. Large-scale detection and analysis of RNA editing in grape mtDNA by RNA deep-sequencing. Nucleic Acids Res. 2010;38(14):4755–67.
14. Wilhelm BT, Briau M, Austin P, et al. RNA-seq analysis of 2 closely related leukemia clones that differ in their self-renewal capacity. Blood. 2011;117(2):e27–38.
15. Wang ET, Sandberg R, Luo S, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456(7221):470–6.
16. Liu Y, Han D, Han Y, et al. Ab initio identification of transcription start sites in the Rhesus macaque genome by histone modification and RNA-Seq. Nucleic Acids Res. 2011;39(4):1408–18.
17. Arnold CD, Gerlach D, Stelzer C, Boryn LM, Rath M, Stark A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science. 2013;339(6123):1074–7.
18. Maher CA, Kumar-Sinha C, Cao X, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature. 2009;458(7234):97–101.
19. Berger MF, Levin JZ, Vijayendran K, et al. Integrative analysis of the melanoma transcriptome. Genome Res. 2010;20(4):413–27.
20. Supper J, Gugenmus C, Wollnik J, et al. Detecting and visualizing gene fusions. Methods. 2013;59(1):S24–8.


21. Conde L, Bracci PM, Richardson R, Montgomery SB, Skibola CF. Integrating GWAS and expression data for functional characterization of disease-associated SNPs: an application to follicular lymphoma. Am J Hum Genet. 2013;92(1):126–30.
22. Erwin JA, Marchetto MC, Gage FH. Mobile DNA elements in the generation of diversity and complexity in the brain. Nat Rev Neurosci. 2014;15(8):497–506.
23. Wilhelm BT, Marguerat S, Watt S, et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008;453(7199):1239–43.
24. Saliba AE, Westermann AJ, Gorski SA, Vogel J. Single-cell RNA-seq: advances and future challenges. Nucleic Acids Res. 2014;42(14):8845–60.
25. Wilson NK, Kent DG, Buettner F, et al. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell. 2015;16(6):712–24.
26. Marzluff WF, Wagner EJ, Duronio RJ. Metabolism and regulation of canonical histone mRNAs: life without a poly(A) tail. Nat Rev Genet. 2008;9(11):843–54.
27. Yang L, Duff MO, Graveley BR, Carmichael GG, Chen LL. Genomewide characterization of non-polyadenylated RNAs. Genome Biol. 2011;12(2):R16.
28. Parkhomchuk D, Borodina T, Amstislavskiy V, et al. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 2009;37(18):e123.
29. Nagalakshmi U, Wang Z, Waern K, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320(5881):1344–9.
30. Liu L, Li Y, Li S, et al. Comparison of next-generation sequencing systems. J Biomed Biotechnol. 2012;2012:251364.
31. Cloonan N, Forrest AR, Kolle G, et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008;5(7):613–9.
32. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18(9):1509–17.
33. Rothberg JM, Hinz W, Rearick TM, et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011;475(7356):348–52.
34. Eid J, Fehr A, Gray J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323(5910):133–8.
35. Tariq MA, Kim HJ, Jejelowo O, Pourmand N. Whole-transcriptome RNAseq analysis from minute amount of total RNA. Nucleic Acids Res. 2011;39(18):e120.
36. Carrara M, Lum J, Cordero F, et al. Alternative splicing detection workflow needs a careful combination of sample prep and bioinformatics analysis. BMC Bioinformatics. 2015;16(suppl 9):S2.
37. Webster AF, Zumbo P, Fostel J, et al. Mining the archives: a cross-platform analysis of gene expression profiles in archival formalin-fixed paraffin-embedded (FFPE) tissue. Toxicol Sci. 2015.
38. Yang X, Liu D, Liu F, et al. HTQC: a fast quality control toolkit for Illumina sequencing data. BMC Bioinformatics. 2013;14:33.
39. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
40. Anders S, Pyl PT, Huber W. HTSeq – a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166–9.
41. Magoc T, Salzberg SL. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics. 2011;27(21):2957–63.
42. Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28(16):2184–5.
43. Trapnell C, Salzberg SL. How to map billions of short reads onto genomes. Nat Biotechnol. 2009;27(5):455–7.
44. Flicek P, Birney E. Sense from sequence reads: methods for alignment and assembly. Nat Methods. 2009;6(11 suppl):S6–12.
45. Li R, Li Y, Kristiansen K, Wang J. SOAP: short oligonucleotide alignment program. Bioinformatics. 2008;24(5):713–4.
46. Li R, Yu C, Li Y, et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009;25(15):1966–7.
47. Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008;18(11):1851–8.
48. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25.
49. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
50. Lin H, Zhang Z, Zhang MQ, Ma B, Li M. ZOOM! Zillions of oligos mapped. Bioinformatics. 2008;24(21):2431–7.
51. Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21.
52. Bao S, Jiang R, Kwan W, Wang B, Ma X, Song YQ. Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet. 2011;56(6):406–14.
53. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5(7):621–8.
54. Kent WJ. BLAT – the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64.
55. Fonseca NA, Rung J, Brazma A, Marioni JC. Tools for mapping high-throughput sequencing data. Bioinformatics. 2012;28(24):3169–77.
56. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14(4):R36.
57. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–11.
58. Marco-Sola S, Sammeth M, Guigo R, Ribeca P. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods. 2012;9(12):1185–8.
59. Wang K, Singh D, Zeng Z, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38(18):e178.
60. Holt RA, Jones SJ. The new paradigm of flow cell sequencing. Genome Res. 2008;18(6):839–46.
61. Hillier LW, Marth GT, Quinlan AR, et al. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods. 2008;5(2):183–8.
62. Campbell PJ, Stephens PJ, Pleasance ED, et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet. 2008;40(6):722–9.
63. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2.
64. Delhomme N, Padioleau I, Furlong EE, Steinmetz LM. easyRNASeq: a bioconductor package for processing RNA-seq data. Bioinformatics. 2012;28(19):2532–3.
65. Lawrence M, Huber W, Pagès H, et al. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118.
66. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30.
67. Zhao S, Zhang B. A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification. BMC Genomics. 2015;16:97.
68. Li S, Łabaj PP, Zumbo P, et al. Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol. 2014;32(9):888–95.
69. Filloux C, Cédric M, Romain P, et al. An integrative method to normalize RNA-seq data. BMC Bioinformatics. 2014;15:188.
70. Sun Z, Zhu Y. Systematic comparison of RNA-Seq normalization methods using measurement error models. Bioinformatics. 2012;28(20):2584–91.
71. Ager-Wick E, Henkel CV, Haug TM, Weltzien FA. Using normalization to resolve RNA-Seq biases caused by amplification from minimal input. Physiol Genomics. 2014;46(21):808–20.
72. Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010;11:94.
73. Risso D, Schwartz K, Sherlock G, Dudoit S. GC-content normalization for RNA-seq data. BMC Bioinformatics. 2011;12:480.
74. Garmire LX, Subramaniam S. Evaluation of normalization methods in mammalian microRNA-Seq data. RNA. 2012;18(6):1279–88.
75. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A. 2001;98(9):5116–21.
76. Grant GR, Manduchi E, Stoeckert CJ Jr. Analysis and management of microarray gene expression data. Curr Protoc Mol Biol. 2007;Chapter 19:Unit19.6.
77. Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct. 2009;4:14.
78. Wang X, Wu Z, Zhang X. Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq. J Bioinform Comput Biol. 2010;8(suppl 1):177–92.
79. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
80. Jiang H, Wong WH. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics. 2009;25(8):1026–32.
81. Sengoelge G, Winnicki W, Kupczok A, et al. A SAGE based approach to human glomerular endothelium: defining the transcriptome, finding a novel molecule and highlighting endothelial diversity. BMC Genomics. 2014;15:725.
82. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
83. Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nat Biotechnol. 2013;31(1):46–53.
84. Trapnell C, Roberts A, Goff L, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7(3):562–78.
85. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550.
86. Li J, Witten DM, Johnstone IM, Tibshirani R. Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics. 2012;13(3):523–38.
87. Law CW, Chen Y, Shi W, Smyth GK. voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 2014;15(2):R29.
88. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.


89. Ritchie ME, Silver J, Oshlack A, et al. A comparison of background correction methods for two-colour microarrays. Bioinformatics. 2007;23(20):2700–7.
90. Soneson C, Delorenzi M. A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics. 2013;14:91.
91. Kvam VM, Liu P, Si Y. A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data. Am J Bot. 2012;99(2):248–56.
92. Graveley BR. Alternative splicing: increasing diversity in the proteomic world. Trends Genet. 2001;17(2):100–7.
93. Wang GS, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet. 2007;8(10):749–61.
94. Fu XD, Ares M Jr. Context-dependent control of alternative splicing by RNA-binding proteins. Nat Rev Genet. 2014;15(10):689–701.
95. Black DL. Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003;72:291–336.
96. Wahl MC, Will CL, Luhrmann R. The spliceosome: design principles of a dynamic RNP machine. Cell. 2009;136(4):701–18.
97. Chen M, Manley JL. Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol. 2009;10(11):741–54.
98. Pandit S, Zhou Y, Shiue L, et al. Genome-wide analysis reveals SR protein cooperation and competition in regulated splicing. Mol Cell. 2013;50(2):223–35.
99. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40(12):1413–5.
100. Ray D, Kazan H, Cook KB, et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature. 2013;499(7457):172–7.
101. Sultan M, Schulz MH, Richard H, et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008;321(5891):956–60.
102. Reddy AS, Rogers MF, Richardson DN, Hamilton M, Ben-Hur A. Deciphering the plant splicing code: experimental and computational approaches for predicting alternative splicing and splicing regulatory elements. Front Plant Sci. 2012;3:18.
103. Barash Y, Calarco JA, Gao W, et al. Deciphering the splicing code. Nature. 2010;465(7294):53–9.
104. Tang F, Barbacioru C, Wang Y, et al. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6(5):377–82.
105. Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010;7(12):1009–15.
106. Au KF, Jiang H, Lin L, Xing Y, Wong WH. Detection of splice junctions from paired-end RNA-seq data by SpliceMap. Nucleic Acids Res. 2010;38(14):4570–8.
107. Ameur A, Wetterbom A, Feuk L, Gyllensten U. Global and unbiased detection of splice junctions from RNA-seq data. Genome Biol. 2010;11(3):R34.
108. Vitting-Seerup K, Porse BT, Sandelin A, Waage J. spliceR: an R package for classification of alternative splicing and prediction of coding potential from RNA-seq data. BMC Bioinformatics. 2014;15:81.
121. Degner JF, Marioni JC, Pai AA, et al. Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics. 2009;25(24):3207–12.
122. Mayba O, Gilbert HN, Liu J, et al. MBASED: allele-specific expression detection in cancer tissues and cell lines. Genome Biol. 2014;15(8):405.
123. Dennis G Jr, Sherman BT, Hosack DA, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 2003;4(5):3.
124. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000;25(1):25–9.
125. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545–50.
126. Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 1999;27(1):29–34.
127. Du J, Yuan Z, Ma Z, Song J, Xie X, Chen Y. KEGG-PATH: Kyoto encyclopedia of genes and genomes-based pathway analysis using a path analysis model. Mol Biosyst. 2014;10(9):2441–7.
128. Rahmatallah Y, Emmert-Streib F, Glazko G. Comparative evaluation of gene set analysis approaches for RNA-Seq data. BMC Bioinformatics. 2014;15:397.
129. Huang D, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:47.
130. Mootha VK, Lindgren CM, Eriksson KF, et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267–73.
131. Hanzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics. 2013;14(1):7.
132. Wang X, Cairns M. Gene set enrichment analysis of RNA-Seq data: integrating differential expression and splicing. BMC Bioinformatics. 2013;14(suppl 5):S16.
133. Luo W, Friedman M, Shedden K, Hankenson K, Woolf P. GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics. 2009;10(1):161.
134. Xiong Q, Mukherjee S, Furey TS. GSAASeqSP: a toolset for gene set association analysis of RNA-seq data. Sci Rep. 2014;4:6347.
135. Tarca AL, Draghici S, Khatri P, et al. A novel signaling pathway impact analysis. Bioinformatics. 2009;25(1):75–82.
136. Gao S, Wang X. TAPPA: topological analysis of pathway phenotype association. Bioinformatics. 2007;23(22):3100–2.
137. Haynes WA, Higdon R, Stanberry L, Collins D, Kolker E. Differential expression analysis for pathways. PLoS Comput Biol. 2013;9(3):e1002967.
138. Young M, Wakefield M, Smyth G, Oshlack A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol. 2010;11(2):R14.
139. Zheng W, Chung LM, Zhao H. Bias detection and correction in RNA-sequencing data. BMC Bioinformatics. 2011;12:290.
140. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches
109. Aschoff M, Hotz-Wagenblatt A, Glatting KH, Fischer M, Eils R, Konig R. and outstanding challenges. PLoS Comput Biol. 2012;8(2):e1002375.
SplicingCompass: differential splicing detection using RNA-seq data. Bioinfor- 141. Stuart JM, Segal E, Koller D, Kim SK. A gene-coexpression network for global
matics. 2013;29(9):1141–8. discovery of conserved genetic modules. Science. 2003;302(5643):249–55.
110. Zhao K, Lu ZX, Park JW, Zhou Q , Xing Y. GLiMMPS: robust statistical model 142. Giorgi FM, Fabbro CD, Licausi F. Comparative study of RNA-seq- and
for regulatory variation of alternative splicing using RNA-seq data. Genome Biol. microarray-derived coexpression networks in Arabidopsis thaliana. Bioinformatics.
2013;14(7):R74. 2013;29(6):717–24.
111. Shen S, Park JW, Huang J, et al. MATS: a Bayesian framework for flexible 143. Ballouz S, Verleyen W, Gillis J. Guidance for RNA-seq co-expression network con-
detection of differential alternative splicing from RNA-Seq data. Nucleic Acids struction and analysis: safety in numbers. Bioinformatics. 2015;31(13):2123–30.
Res. 2012;40(8):e61. 144. van Dam S, Craig T, de Magalhães JP. GeneFriends: a human RNA-seq-based gene
112. Shen S, Park JW, Lu ZX, et al. rMATS: robust and flexible detection of differ- and transcript co-expression database. Nucleic Acids Res. 2015;43(D1):D1124–32.
ential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci U S A. 145. Obayashi T, Okamura Y, Ito S, Tadaka S, Motoike IN, Kinoshita K. COX-
2014;111(51):E5593–601. PRESdb: a database of comparative gene coexpression networks of eleven species
113. Li W, Dai C, Kang S, Zhou XJ. Integrative analysis of many RNA-seq datasets for mammals. Nucleic Acids Res. 2013;41(D1):D1014–20.
to study alternative splicing. Methods. 2014;67(3):313–24. 146. Choi Y, Kendziorski C. Statistical methods for gene set co-expression analysis.
114. Iancu OD, Colville A, Darakjian P, Hitzemann R. Chapter four – coexpres- Bioinformatics. 2009;25(21):2780–6.
sion and cosplicing network approaches for the study of mammalian brain tran- 147. Chen L, Liu R, Liu Z-P, Li M, Aihara K. Detecting early-warning signals for sud-
scriptomes. In: Robert H, Shannon M, eds. International Review of Neurobiology. den deterioration of complex diseases by dynamical network biomarkers. Sci Rep.
Vol 116. Waltham: Academic Press; 2014:73–93. 2012;6:2.
115. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic vari- 148. Bar-Joseph Z, Gitter A, Simon I. Studying and modelling dynamic biologi-
ants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. cal processes using time-series gene expression data. Nat Rev Genet. 2012;13(8):
116. Piskol R, Ramaswami G, Li JB. Reliable identification of genomic variants from 552–64.
RNA-seq data. Am J Hum Genet. 2013;93(4):641–51. 149. Gao S, Wang X. Identification of highly synchronized subnetworks from gene
117. Dereeper A, Homa F, Andres G, et al. SNiPlay3: a web-based application for expression data. BMC Bioinformatics. 2013;14(suppl 9):S5.
exploration and large scale analyses of genomic variations. Nucleic Acids Res. 150. Yosef N, Shalek AK, Gaublomme JT, et al. Dynamic regulatory network con-
2015;43(W1):W295–300. trolling TH17 cell differentiation. Nature. 2013;496(7446):461–8.
118. Chuang LC, Kao CF, Shih WL, Kuo PH. Pathway analysis using information 151. Amar D, Safer H, Shamir R. Dissection of regulatory networks that are altered in
from allele-specific gene methylation in genome-wide association studies for disease via differential co-expression. PLoS Comput Biol. 2013;9(3):e1002955.
bipolar disorder. PLoS One. 2013;8(1):e53092. 152. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network
119. Pastinen T. Genome-wide allele-specific analysis: insights into regulatory varia- analysis. BMC Bioinformatics. 2008;9:559.
tion. Nat Rev Genet. 2010;11(8):533–8. 153. Si Y, Liu P, Li P, Brutnell TP. Model-based clustering for RNA-seq data.
120. Costa V, Aprile M, Esposito R, Ciccodicola A. RNA-Seq and human complex Bioinformatics. 2013;30(2):197–205.
diseases: recent accomplishments and future perspectives. Eur J Hum Genet. 154. Hawkins RD, Hon GC, Ren B. Next-generation genomics: an integrative
2013;21(2):134–42. approach. Nat Rev Genet. 2010;11(7):476–86.

155. Wei G, Abraham BJ, Yagi R, et al. Genome-wide analyses of transcription factor GATA3-mediated gene regulation in distinct T cell types. Immunity. 2011;35(2):299–311.
156. Ouyang Z, Zhou Q, Wong WH. ChIP-Seq of transcription factors predicts absolute and differential gene expression in embryonic stem cells. Proc Natl Acad Sci U S A. 2009;106(51):21521–6.
157. Han Y, Han D, Yan Z, et al. Stress-associated H3K4 methylation accumulates during postnatal development and aging of Rhesus macaque brain. Aging Cell. 2012;11(6):1055–64.
158. Wei G, Hu G, Cui K, Zhao K. Genome-wide mapping of nucleosome occupancy, histone modifications, and gene expression using next-generation sequencing technology. Methods Enzymol. 2012;513:297–313.
159. Lister R, Pelizzola M, Dowen RH, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462(7271):315–22.
160. Yu W, McIntosh C, Lister R, et al. Genome-wide DNA methylation patterns in LSH mutant reveals de-repression of repeat elements and redundant epigenetic silencing pathways. Genome Res. 2014;24(10):1613–23.
161. Montgomery SB, Sammeth M, Gutierrez-Arcelus M, et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010;464(7289):773–7.
162. Pickrell JK, Marioni JC, Pai AA, et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464(7289):768–72.
163. Solana J, Kao D, Mihaylova Y, et al. Defining the molecular profile of planarian pluripotent stem cells using a combinatorial RNAseq, RNA interference and irradiation approach. Genome Biol. 2012;13(3):R19.
164. Levine M, Tjian R. Transcription regulation and animal diversity. Nature. 2003;424(6945):147–51.
165. Levine M. Transcriptional enhancers in animal development and evolution. Curr Biol. 2010;20(17):R754–63.
166. Levine M, Cattoglio C, Tjian R. Looping back to leap forward: transcription enters a new era. Cell. 2014;157(1):13–25.
167. Buecker C, Wysocka J. Enhancers as information integration hubs in development: lessons from genomics. Trends Genet. 2012;28(6):276–84.
168. Calo E, Wysocka J. Modification of enhancer chromatin: what, how, and why? Mol Cell. 2013;49(5):825–37.
169. Bulger M, Groudine M. Functional and mechanistic diversity of distal transcription enhancers. Cell. 2011;144(3):327–39.
170. Boyle AP, Davis S, Shulha HP, et al. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008;132(2):311–22.
171. Gaulton KJ, Nammo T, Pasquali L, et al. A map of open chromatin in human pancreatic islets. Nat Genet. 2010;42(3):255–9.
172. Heintzman ND, Stuart RK, Hon G, et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat Genet. 2007;39(3):311–8.
173. Heintzman ND, Hon GC, Hawkins RD, et al. Histone modifications at human enhancers reflect global cell-type-specific gene expression. Nature. 2009;459(7243):108–12.
174. Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A unique chromatin signature uncovers early developmental enhancers in humans. Nature. 2011;470(7333):279–83.
175. Bonn S, Zinzen RP, Girardot C, et al. Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat Genet. 2012;44(2):148–56.
176. Visel A, Blow MJ, Li Z, et al. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature. 2009;457(7231):854–8.
177. Melnikov A, Murugan A, Zhang X, et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat Biotechnol. 2012;30(3):271–7.
178. Patwardhan RP, Hiatt JB, Witten DM, et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat Biotechnol. 2012;30(3):265–70.
179. Klepser ME. Candida resistance and its clinical relevance. Pharmacotherapy. 2006;26(6 pt 2):68S–75S.
180. Biswas S, Van Dijck P, Datta A. Environmental sensing and signal transduction pathways regulating morphopathogenic determinants of Candida albicans. Microbiol Mol Biol Rev. 2007;71(2):348–76.
181. Cutler JE. Putative virulence factors of Candida albicans. Annu Rev Microbiol. 1991;45:187–218.
182. Bruno VM, Wang Z, Marjani SL, et al. Comprehensive annotation of the transcriptome of the human fungal pathogen Candida albicans using RNA-seq. Genome Res. 2010;20(10):1451–8.
183. Linde J, Duggan S, Weber M, et al. Defining the transcriptomic landscape of Candida glabrata by RNA-Seq. Nucleic Acids Res. 2015;43(3):1392–406.
184. Grumaz C, Lorenz S, Stevens P, et al. Species and condition specific adaptation of the transcriptional landscapes in Candida albicans and Candida dubliniensis. BMC Genomics. 2013;14:212.
185. Navin NE. Cancer genomics: one cell at a time. Genome Biol. 2014;15(8):452.
186. Gerlinger M, Rowan AJ, Horswell S, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366(10):883–92.
187. Van Loo P, Campbell PJ. ABSOLUTE cancer genomics. Nat Biotechnol. 2012;30(7):620–1.
188. Shah SP, Roth A, Goya R, et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486(7403):395–9.
189. Sirbu A, Kerr G, Crane M, Ruskin HJ. RNA-seq vs dual- and single-channel microarray data: sensitivity analysis for differential expression and clustering. PLoS One. 2012;7(12):e50986.
190. Westermann AJ, Gorski SA, Vogel J. Dual RNA-seq of pathogen and host. Nat Rev Microbiol. 2012;10(9):618–30.
191. Dodsworth BT, Flynn R, Cowley SA. The current state of naive human pluripotency. Stem Cells. 2015.
192. Das A, Chai JC, Kim SH, et al. Dual RNA sequencing reveals the expression of unique transcriptomic signatures in lipopolysaccharide-induced BV-2 microglial cells. PLoS One. 2015;10(3):e0121117.
193. Pittman KJ, Aliota MT, Knoll LJ. Dual transcriptional profiling of mice and Toxoplasma gondii during acute and chronic infection. BMC Genomics. 2014;15:806.
194. Choi YJ, Aliota MT, Mayhew GF, Erickson SM, Christensen BM. Dual RNA-seq of parasite and host reveals gene expression dynamics during filarial worm-mosquito interactions. PLoS Negl Trop Dis. 2014;8(5):e2905.
195. Lu M, Zhang PJ, Li CH, Lv ZM, Zhang WW, Jin CH. miRNA-133 augments coelomocyte phagocytosis in bacteria-challenged Apostichopus japonicus via targeting the TLR component of IRAK-1 in vitro and in vivo. Sci Rep. 2015;5:12608.
196. Camilios-Neto D, Bonato P, Wassem R, et al. Dual RNA-seq transcriptional analysis of wheat roots colonized by Azospirillum brasilense reveals up-regulation of nutrient acquisition and cell cycle genes. BMC Genomics. 2014;15:378.
197. Lange M, Eisenhauer N, Sierra CA, et al. Plant diversity increases soil microbial activity and soil carbon storage. Nat Commun. 2015;6:6707.
198. Schulze S, Henkel SG, Driesch D, Guthke R, Linde J. Computational prediction of molecular pathogen-host interactions based on dual transcriptome data. Front Microbiol. 2015;6:65.
199. Torres-García W, Zheng S, Sivachenko A, et al. PRADA: pipeline for RNA sequencing data analysis. Bioinformatics. 2014;30(15):2224–6.
200. Xu G, Strong MJ, Lacey MR, Baribault C, Flemington EK, Taylor CM. RNA CoMPASS: a dual approach for pathogen and host transcriptome analysis of RNA-seq datasets. PLoS One. 2014;9(2):e89445.

