Getting Started With HISAT, StringTie, and Ballgown

Uploaded by Patricia

DAVE TANG'S BLOG

COMPUTATIONAL BIOLOGY AND GENOMICS

Getting started with HISAT, StringTie, and Ballgown
BIOINFORMATICS | DAVO | OCTOBER 25, 2017 | 11 COMMENTS

A popular toolset used for analysing RNA-seq data is the tuxedo suite, which consists of TopHat and Cufflinks. The suite provided a start-to-finish pipeline that allowed users to map reads, assemble transcripts, and perform differential expression analyses. A newer “tuxedo suite” has been developed and is made up of three tools: HISAT, StringTie, and Ballgown. A Nature Protocols article provides a summary of the new suite as well as a tutorial; this post was written while I was going through the tutorial.

I worked through the tutorial on a MacBook Pro, so I downloaded the binaries for OS X; if you’re using some flavour of Linux, download the Linux binaries instead. The data for the tutorial is available at ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol; you can use wget to recursively download all the files on the FTP server. You can use your own data, but you’ll have to index the relevant reference file and prepare your own sample text file. For this post, I used the same data as the tutorial.

A description of the data set is provided in geuvadis_phenodata.csv. Normally, you will have to prepare this file yourself; it will be used later in the Ballgown step.

```shell
cat chrX_data/geuvadis_phenodata.csv
"ids","sex","population"
"ERR188044","male","YRI"
"ERR188104","male","YRI"
"ERR188234","female","YRI"
"ERR188245","female","GBR"
"ERR188257","male","GBR"
"ERR188273","female","YRI"
"ERR188337","female","GBR"
"ERR188383","male","GBR"
"ERR188401","male","GBR"
"ERR188428","female","GBR"
"ERR188454","male","YRI"
"ERR204916","female","YRI"
```
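Because the IDs in the first column of geuvadis_phenodata.csv are also the prefixes of the FASTQ, SAM, BAM, and GTF files used throughout this post, the CSV can drive per-sample loops instead of typing each ID twelve times. A minimal sketch, assuming a header line, the ID in the first comma-separated column, and double-quoted values as shown; `sample_ids` is a hypothetical helper name, not part of any of the tools.

```shell
# sample_ids CSV  (hypothetical helper)
# Print one sample ID per line from a phenodata CSV:
# skip the header line, keep the first comma-separated field,
# and strip double quotes (and any Windows carriage returns).
sample_ids() {
  tail -n +2 "$1" | cut -d, -f1 | tr -d '"\r'
}
```

For example, `for id in $(sample_ids chrX_data/geuvadis_phenodata.csv); do echo "$id"; done` should print the 12 IDs listed above. The plain `cut` approach assumes no field contains a comma, which holds for this file.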

Now let’s download the programs; have a look at the HISAT2 page to find the appropriate binary to download. I like to download programs into a src directory and link them to a bin directory, which is in my PATH.

```shell
# for OS X
cd ~/src
wget -c ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-OSX_x86_64.zip
unzip hisat2-2.1.0-OSX_x86_64.zip

# provide link to binaries in my bin directory
cd ~/bin/
ln -s ~/src/hisat2-2.1.0/hisat2* .
# some files were already linked
ln -s ~/src/hisat2-2.1.0/*.py .
ln: ./hisat2_extract_exons.py: File exists
ln: ./hisat2_extract_snps_haplotypes_UCSC.py: File exists
ln: ./hisat2_extract_snps_haplotypes_VCF.py: File exists
ln: ./hisat2_extract_splice_sites.py: File exists
ln: ./hisat2_simulate_reads.py: File exists
```
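The `File exists` messages come from `ln -s`, which will not overwrite an existing link. If you would rather have those links point at the newly downloaded HISAT2, here is a minimal sketch of a re-linking helper; `link_tools` is a hypothetical name, and it relies on `ln -sf`, which replaces any existing link of the same name, so only use it when that is what you want.

```shell
# link_tools SRC_DIR BIN_DIR GLOB  (hypothetical helper)
# Symlink every file in SRC_DIR whose name matches GLOB into BIN_DIR.
# ln -sf replaces an existing link of the same name instead of failing
# with "File exists", so old links end up pointing at the new version.
link_tools() {
  local src_dir=$1 bin_dir=$2 pattern=$3 f
  for f in "$src_dir"/$pattern; do
    [ -e "$f" ] || continue   # the glob matched nothing
    ln -sf "$f" "$bin_dir/$(basename "$f")"
  done
}
```

Running `link_tools ~/src/hisat2-2.1.0 ~/bin 'hisat2*'` and then `link_tools ~/src/hisat2-2.1.0 ~/bin '*.py'` would replace the two `ln -s` commands. Quote the glob so that your shell passes it to the function unexpanded.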

Again, take a look at the StringTie page to find the appropriate binary to download.

```shell
# for OS X
cd ~/src
wget -c http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.3b.OSX_x86_64.tar.gz
tar xzf stringtie-1.3.3b.OSX_x86_64.tar.gz

# provide link to binary in my bin directory
cd ~/bin/
ln -s ~/src/stringtie-1.3.3b.OSX_x86_64/stringtie
```
The gffcompare tool needs to be compiled.

```shell
cd ~/src/
git clone https://github.com/gpertea/gclib
git clone https://github.com/gpertea/gffcompare
cd gffcompare
make release

# link again
cd ~/bin/
ln -s ~/src/gffcompare/gffcompare
```

Download SAMtools from http://www.htslib.org/download/ and compile it.

```shell
# unzip and compile
tar xjf samtools-1.6.tar.bz2
cd samtools-1.6
./configure
make

# link samtools
cd ~/bin
ln -s ~/src/samtools-1.6/samtools
```

Ballgown is a Bioconductor package, so we need to install that using R. While we are at it, we will install various
dependencies too.

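```r
install.packages("devtools")
install.packages("dplyr")

# biocLite() was the Bioconductor installer when this post was written;
# current Bioconductor releases use BiocManager::install() instead
source("https://www.bioconductor.org/biocLite.R")
biocLite(c("alyssafrazee/RSkittleBrewer", "ballgown", "genefilter"))
```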

Now that we have downloaded and prepared all the required programs, we can start the analysis!

Mapping
Mapping is performed using HISAT2, and the usual first step, prior to mapping, is to create an index of the reference genome. The indices are provided in the data folder, but let’s create them again.

```shell
mkdir my_index
cd my_index

# use the Python scripts to extract splice-site and exon information from a gene annotation
extract_splice_sites.py ../chrX_data/genes/chrX.gtf > chrX.ss
extract_exons.py ../chrX_data/genes/chrX.gtf > chrX.exon

head -3 chrX.ss
chrX 276393 281481 +
chrX 281683 284166 +
chrX 284313 288732 +

head -3 chrX.exon
chrX 276323 276393 +
chrX 281393 281683 +
chrX 284166 284313 +

# now to build the index
# the --ss and --exon options can be omitted if annotation data is not available
time hisat2-build -p 8 --ss chrX.ss --exon chrX.exon ../chrX_data/genome/chrX.fa chrX_tran
# screen output not shown to save space
Total time for call to driver() for forward index: 00:03:34

real 3m33.870s
user 10m10.778s
sys 1m9.074s

ls -1
chrX.exon
chrX.fa
chrX.ss
chrX_tran.1.ht2
chrX_tran.2.ht2
chrX_tran.3.ht2
chrX_tran.4.ht2
chrX_tran.5.ht2
chrX_tran.6.ht2
chrX_tran.7.ht2
chrX_tran.8.ht2
```

Despite creating our own indices, we’ll use the ones provided by the tutorial for reproducibility’s sake. From geuvadis_phenodata.csv we saw that there are 12 samples; each sample has two FASTQ files since this is paired-end data. Let’s start the mapping.

```shell
# create directory to store mapping results
mkdir map

# map each sample using 8 threads
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188104_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188234_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188245_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188257_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188273_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188337_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188383_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188401_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188428_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188454_chrX_1.fas
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR204916_chrX_1.fas

# mapping took around two and a half minutes
# real 2m36.509s
# user 15m17.815s
# sys 3m29.939s
```

Only keep the sorted BAM (or CRAM) files, and delete the SAM files after conversion.

```shell
# sort mapping results using SAMtools on 8 threads
samtools sort -@ 8 -o map/ERR188044_chrX.bam map/ERR188044_chrX.sam
samtools sort -@ 8 -o map/ERR188104_chrX.bam map/ERR188104_chrX.sam
samtools sort -@ 8 -o map/ERR188234_chrX.bam map/ERR188234_chrX.sam
samtools sort -@ 8 -o map/ERR188245_chrX.bam map/ERR188245_chrX.sam
samtools sort -@ 8 -o map/ERR188257_chrX.bam map/ERR188257_chrX.sam
samtools sort -@ 8 -o map/ERR188273_chrX.bam map/ERR188273_chrX.sam
samtools sort -@ 8 -o map/ERR188337_chrX.bam map/ERR188337_chrX.sam
samtools sort -@ 8 -o map/ERR188383_chrX.bam map/ERR188383_chrX.sam
samtools sort -@ 8 -o map/ERR188401_chrX.bam map/ERR188401_chrX.sam
samtools sort -@ 8 -o map/ERR188428_chrX.bam map/ERR188428_chrX.sam
samtools sort -@ 8 -o map/ERR188454_chrX.bam map/ERR188454_chrX.sam
samtools sort -@ 8 -o map/ERR204916_chrX.bam map/ERR204916_chrX.sam

# remove SAM files
rm map/*.sam

# sorting and converting took just over a minute
# real 1m14.533s
# user 5m44.637s
# sys 0m9.590s
```
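The twelve `samtools sort` commands differ only in the sample ID, and `rm map/*.sam` deletes every SAM file even if one of the sorts failed. Here is a minimal sketch of a loop that sorts each SAM file and removes it only when its own sort succeeded. The function name `sort_and_clean`, the `DRY_RUN` switch, and the assumption that every `map/*.sam` should become a `.bam` of the same name are my own choices, not part of SAMtools.

```shell
# sort_and_clean MAP_DIR [THREADS]  (hypothetical helper)
# For every MAP_DIR/*.sam, write a coordinate-sorted BAM next to it,
# then delete the SAM only if samtools sort exited successfully.
# Set DRY_RUN=1 to print the commands instead of running them.
sort_and_clean() {
  local map_dir=$1
  local threads=${2:-8}
  local sam bam
  for sam in "$map_dir"/*.sam; do
    [ -e "$sam" ] || continue          # no SAM files found
    bam=${sam%.sam}.bam
    if [ -n "${DRY_RUN:-}" ]; then
      echo "samtools sort -@ $threads -o $bam $sam"
      echo "rm $sam"
    elif samtools sort -@ "$threads" -o "$bam" "$sam"; then
      rm "$sam"
    else
      echo "sorting failed, keeping $sam" >&2
    fi
  done
}
```

`DRY_RUN=1 sort_and_clean map` lets you check the generated commands first; `sort_and_clean map` then replaces both the twelve sort commands and the blanket `rm`.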

Assembly
Now we need to assemble the mapped reads into transcripts. StringTie can assemble transcripts with or without
annotation; as noted in the protocol, annotation can be helpful when the number of reads for a transcript is too
low for an accurate assembly.

```shell
# store assembly results in a new directory
mkdir assembly

# create assembly per sample using 8 threads
stringtie map/ERR188044_chrX.bam -l ERR188044 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188104_chrX.bam -l ERR188104 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188234_chrX.bam -l ERR188234 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188245_chrX.bam -l ERR188245 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188257_chrX.bam -l ERR188257 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188273_chrX.bam -l ERR188273 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188337_chrX.bam -l ERR188337 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188383_chrX.bam -l ERR188383 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188401_chrX.bam -l ERR188401 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188428_chrX.bam -l ERR188428 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR188454_chrX.bam -l ERR188454 -p 8 -G chrX_data/genes/chrX.gtf -o assembly
stringtie map/ERR204916_chrX.bam -l ERR204916 -p 8 -G chrX_data/genes/chrX.gtf -o assembly

# assembly and quantification took a minute and a half
# real 1m30.893s
# user 1m58.455s
# sys 0m9.860s

# before merging we need to modify mergelist.txt
# this is because I created a new directory to store the results
# the modified mergelist.txt should look like this
cat chrX_data/mergelist.txt
assembly/ERR188044_chrX.gtf
assembly/ERR188104_chrX.gtf
assembly/ERR188234_chrX.gtf
assembly/ERR188245_chrX.gtf
assembly/ERR188257_chrX.gtf
assembly/ERR188273_chrX.gtf
assembly/ERR188337_chrX.gtf
assembly/ERR188383_chrX.gtf
assembly/ERR188401_chrX.gtf
assembly/ERR188428_chrX.gtf
assembly/ERR188454_chrX.gtf
assembly/ERR204916_chrX.gtf

# merge all transcripts from the different samples
stringtie --merge -p 8 -G chrX_data/genes/chrX.gtf -o stringtie_merged.gtf chrX_data/merge

# check out the transcripts
cat stringtie_merged.gtf | head
# stringtie --merge -p 8 -G chrX_data/genes/chrX.gtf -o stringtie_merged.gtf chrX_data/mer
# StringTie version 1.3.3b
chrX StringTie transcript 322514 323718 1000 . . gene_id "M
chrX StringTie exon 322514 323718 1000 . . gene_id "MSTRG.1";
chrX StringTie transcript 319145 321319 1000 + . gene_id "M
chrX StringTie exon 319145 321319 1000 + . gene_id "MSTRG.2";
chrX StringTie transcript 319145 321319 1000 + . gene_id "M
chrX StringTie exon 319145 319551 1000 + . gene_id "MSTRG.2";
chrX StringTie exon 320208 321319 1000 + . gene_id "MSTRG.2";
chrX StringTie transcript 304750 318701 1000 - . gene_id "M

# how many transcripts?
cat stringtie_merged.gtf | grep -v "^#" | awk '$3=="transcript" {print}' | wc -l
3491
```
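Rather than editing mergelist.txt by hand, the list can be regenerated from whatever GTF files are in the assembly directory, and the transcript count can be done with a single awk call instead of `cat | grep | awk | wc`. A minimal sketch; `write_mergelist` and `count_transcripts` are hypothetical helper names, and writing the list to a path of your choosing (rather than overwriting chrX_data/mergelist.txt) is my assumption.

```shell
# Hypothetical helpers; adjust paths to your own layout.

# write_mergelist ASSEMBLY_DIR OUT
# Write one path per line for every GTF in ASSEMBLY_DIR, the list format
# that stringtie --merge accepts, so the list always matches the files.
write_mergelist() {
  local f
  for f in "$1"/*.gtf; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done > "$2"
}

# count_transcripts GTF
# Count rows whose feature type (column 3) is "transcript",
# skipping the "#" header lines that StringTie writes.
count_transcripts() {
  awk -F'\t' '!/^#/ && $3 == "transcript" { n++ } END { print n + 0 }' "$1"
}
```

`write_mergelist assembly mergelist.txt` produces the same twelve `assembly/..._chrX.gtf` lines shown above, and `count_transcripts stringtie_merged.gtf` counts the same rows as the original pipeline, since neither the header lines nor the first three tab-separated columns contain the word `transcript` in a different position.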

Let’s compare the StringTie transcripts to known transcripts using gffcompare.

```shell
# compare the assembled transcripts to known transcripts
gffcompare -r chrX_data/genes/chrX.gtf -G -o merged stringtie_merged.gtf

cat merged.stats
# gffcompare v0.10.1 | Command line was:
#gffcompare -r chrX_data/genes/chrX.gtf -G -o merged stringtie_merged.gtf
#

#= Summary for dataset: stringtie_merged.gtf
# Query mRNAs : 3281 in 1521 loci (2651 multi-exon transcripts)
# (535 multi-transcript loci, ~2.2 transcripts per locus)
# Reference mRNAs : 2102 in 1086 loci (1856 multi-exon)
# Super-loci w/ reference transcripts: 998
#-----------------| Sensitivity | Precision |
Base level: 100.0 | 77.6 |
Exon level: 100.0 | 85.4 |
Intron level: 99.8 | 91.0 |
Intron chain level: 99.6 | 69.7 |
Transcript level: 99.6 | 63.8 |
Locus level: 100.0 | 70.9 |

Matching intron chains: 1848
Matching transcripts: 2094
Matching loci: 1086

Missed exons: 0/8804 ( 0.0%)
Novel exons: 971/10608 ( 9.2%)
Missed introns: 14/7946 ( 0.2%)
Novel introns: 219/8714 ( 2.5%)
Missed loci: 0/1086 ( 0.0%)
Novel loci: 421/1521 ( 27.7%)

Total union super-loci across all input datasets: 1521
3281 out of 3281 consensus transcripts written in merged.annotated.gtf (0 discarded as red
```

The high sensitivity means that almost all of the known transcripts are matched by StringTie transcripts, i.e. there are few false negatives. The precision is much lower, indicating that many of the StringTie transcripts are not in the list of known transcripts; these are either false positives or genuinely novel transcripts. The novel exon, intron, and locus counts show how many of these features were not found in the list of known transcripts.
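To make the two measures concrete: sensitivity is the fraction of reference features that the assembly reproduces, and precision is the fraction of assembled features that match a reference feature. Plugging the transcript-level and intron-chain counts from merged.stats into those two ratios reproduces the reported percentages. A small awk-based sketch; `pct` is a hypothetical helper name.

```shell
# pct NUMERATOR DENOMINATOR  (hypothetical helper)
# Print 100 * NUMERATOR / DENOMINATOR with one decimal place.
pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

# Transcript level: 2094 matching transcripts
pct 2094 2102   # sensitivity = matched / reference mRNAs        -> 99.6
pct 2094 3281   # precision   = matched / query (StringTie) mRNAs -> 63.8

# Intron chain level counts multi-exon transcripts only
pct 1848 1856   # sensitivity = matching chains / multi-exon reference -> 99.6
pct 1848 2651   # precision   = matching chains / multi-exon query     -> 69.7
```

The "novel" lines follow the same pattern: for example, `pct 421 1521` gives the 27.7% of query loci that have no reference counterpart.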

All known transcripts were assembled by StringTie, along with a few novel ones.

Now that we have our assembled transcripts, we can estimate their abundances.

```shell
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188044/ERR188044_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188104/ERR188104_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188234/ERR188234_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188245/ERR188245_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188257/ERR188257_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188273/ERR188273_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188337/ERR188337_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188383/ERR188383_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188401/ERR188401_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188428/ERR188428_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR188454/ERR188454_chrX.gtf map/
stringtie -e -B -p 8 -G stringtie_merged.gtf -o ballgown/ERR204916/ERR204916_chrX.gtf map/

# estimation took just over a minute and a half
# real 1m39.661s
# user 2m0.179s
# sys 0m9.223s

# check out the files
ls -1 ballgown/ERR188044
ERR188044_chrX.gtf
e2t.ctab
e_data.ctab
i2t.ctab
i_data.ctab
t_data.ctab
```
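ballgown() reads one sub-directory per sample, and each sub-directory needs the five .ctab tables listed above. A minimal sketch that reports sample directories with a missing or empty table before you switch to R; `check_ballgown_dirs` is a hypothetical helper, and it only checks that the files exist and are non-empty, not that their contents are valid.

```shell
# check_ballgown_dirs BALLGOWN_DIR  (hypothetical helper)
# Report every sample sub-directory that is missing one of the five
# .ctab tables written by stringtie -B, or has an empty one.
# Returns 1 if anything is missing, 0 otherwise.
check_ballgown_dirs() {
  local status=0 dir f
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    for f in e2t.ctab e_data.ctab i2t.ctab i_data.ctab t_data.ctab; do
      if [ ! -s "$dir$f" ]; then
        echo "missing or empty: $dir$f"
        status=1
      fi
    done
  done
  return "$status"
}
```

Running `check_ballgown_dirs ballgown` should print nothing when all twelve samples were quantified successfully.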

Differential expression
To perform the expression analyses, we need to use R and Ballgown; I recommend using RStudio. To get started, load the required libraries and the data.

```r
library(ballgown)
library(RSkittleBrewer)
library(genefilter)
library(dplyr)
library(devtools)

# change this to the directory that contains all the StringTie results
setwd("~/muse/tuxedo")

# load the sample information
pheno_data <- read.csv("chrX_data/geuvadis_phenodata.csv")

# create a ballgown object
bg_chrX <- ballgown(dataDir = "ballgown",
                    samplePattern = "ERR",
                    pData = pheno_data)

class(bg_chrX)
[1] "ballgown"
attr(,"package")
[1] "ballgown"

bg_chrX
ballgown instance with 3491 transcripts and 12 samples
```

What methods are available for ballgown objects?

```r
methods(class="ballgown")
 [1] dirs eexpr expr expr<- geneIDs geneN
 [8] iexpr indexes indexes<- mergedDate pData pData
[15] seqnames show structure subset texpr trans
see '?methods' for accessing help and source code

# we can get the gene, transcript, exon, and intron expression levels using
# gexpr(), texpr(), eexpr(), and iexpr()
head(gexpr(bg_chrX), 2)
         FPKM.ERR188044 FPKM.ERR188104 FPKM.ERR188234 FPKM.ERR188245 FPKM.ERR188257 FPKM.E
MSTRG.1        7.169349       10.42652       13.83639       1.050201       5.677819 1
MSTRG.10      21.428192       13.13144       14.11443      18.454338      10.182308
         FPKM.ERR188383 FPKM.ERR188401 FPKM.ERR188428 FPKM.ERR188454 FPKM.ERR204916
MSTRG.1        4.732841      11.424809       5.733899       6.688090       5.061143
MSTRG.10      11.815677       8.196958       9.578302       9.961549      10.997639

head(texpr(bg_chrX), 2)
  FPKM.ERR188044 FPKM.ERR188104 FPKM.ERR188234 FPKM.ERR188245 FPKM.ERR188257 FPKM.ERR18827
1        23.9694       18.49576       39.70492       14.06822       25.51846        23.8477
2         0.0000        0.00000       27.79636       13.96464       44.97094         0.0000
  FPKM.ERR188401 FPKM.ERR188428 FPKM.ERR188454 FPKM.ERR204916
1       28.03131       24.97612        28.2617       20.24706
2       25.81932        0.00000         0.0000        0.00000
```

Next, we filter out transcripts with low variance.

```r
# note that this subset function is not the base R function but a ballgown one
# to see the order in which R looks for functions in packages use search()
# search()
# [1] ".GlobalEnv" "package:bindrcpp" "package:devtools" "package
# [5] "package:genefilter" "package:RSkittleBrewer" "package:ballgown" "tools:r
# [9] "package:stats" "package:graphics" "package:grDevices" "package
# [13] "package:datasets" "package:methods" "Autoloads" "package
#
# rowVars() is from the genefilter package and calculates the variance of each row
bg_chrX_filt <- subset(bg_chrX, "rowVars(texpr(bg_chrX)) >1", genomesubset=TRUE)

# 1,264 transcripts were filtered out
bg_chrX_filt
ballgown instance with 2227 transcripts and 12 samples
```
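The filter keeps a transcript when the variance of its FPKM values across the 12 samples is greater than 1. As far as I can tell, genefilter’s rowVars() computes the sample variance, dividing the sum of squared deviations by n - 1 just like R’s var(). Here is a minimal awk sketch of that calculation for whitespace-separated rows, so you can sanity-check a transcript by hand; `row_var` is a hypothetical helper, and the values used in its test are made up for illustration.

```shell
# row_var  (hypothetical helper)
# Read whitespace-separated numbers, one row per line, and print each
# row's sample variance: sum of squared deviations / (n - 1), the same
# definition R's var() uses. Blank lines are skipped; one value gives NA.
# Assumes purely numeric input (no NA values).
row_var() {
  awk 'NF == 0 { next }
  {
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    mean = sum / NF
    ss = 0
    for (i = 1; i <= NF; i++) ss += ($i - mean) * ($i - mean)
    if (NF > 1) printf "%.4f\n", ss / (NF - 1)
    else print "NA"
  }'
}
```

For instance, a hypothetical row of FPKM values 1, 2, 3, 4 has a mean of 2.5, squared deviations summing to 5, and a variance of 5 / 3 ≈ 1.67, so it would pass the `> 1` filter; a transcript with identical values in every sample has variance 0 and would be removed.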

Perform the differential expression analysis using the stattest() function; confounders are specified using the adjustvars parameter, whose values have to match column names in pheno_data. We are testing for transcripts and genes that are differentially expressed between males and females, hence sex is our covariate of interest. In addition to transcripts and genes, we can also test for differential expression at exons and introns; just change the feature parameter accordingly.

```r
head(pData(bg_chrX_filt), 3)
        ids    sex population
1 ERR188044   male        YRI
2 ERR188104   male        YRI
3 ERR188234 female        YRI

# test on transcripts
results_transcripts <- stattest(bg_chrX_filt,
                                feature="transcript",
                                covariate="sex",
                                adjustvars = c("population"),
                                getFC=TRUE, meas="FPKM")

# results are in a data frame
class(results_transcripts)
[1] "data.frame"

dim(results_transcripts)
[1] 2227 5

head(results_transcripts)
     feature id        fc      pval      qval
1 transcript  1 0.9386481 0.7208669 0.9454480
2 transcript  2 1.2073309 0.8670656 0.9756579
3 transcript  3 1.0058534 0.9964598 0.9997816
4 transcript  4 0.3847566 0.5214029 0.9290666
5 transcript  5 0.6089373 0.3247825 0.9278154
6 transcript  6 0.6449469 0.3062408 0.9253708

table(results_transcripts$qval < 0.05)

FALSE  TRUE
 2215    12

# test on genes
results_genes <- stattest(bg_chrX_filt,
                          feature="gene",
                          covariate="sex",
                          adjustvars = c("population"),
                          getFC=TRUE, meas="FPKM")

class(results_genes)
[1] "data.frame"

dim(results_genes)
[1] 1013 5

table(results_genes$qval<0.05)

FALSE  TRUE
 1002    11
```

The results_transcripts data frame only has numeric transcript ids, without gene names or gene IDs; we will create a new data frame that includes this information.

```r
# the order is the same so we can simply combine the information
results_transcripts <- data.frame(geneNames = geneNames(bg_chrX_filt),
                                  geneIDs = geneIDs(bg_chrX_filt),
                                  results_transcripts)

# now we have the identifiers
head(results_transcripts)
  geneNames geneIDs    feature id        fc      pval      qval
1         . MSTRG.4 transcript  1 0.9386481 0.7208669 0.9454480
2    PLCXD1 MSTRG.4 transcript  2 1.2073309 0.8670656 0.9756579
3         . MSTRG.4 transcript  3 1.0058534 0.9964598 0.9997816
4         . MSTRG.4 transcript  4 0.3847566 0.5214029 0.9290666
5         . MSTRG.5 transcript  5 0.6089373 0.3247825 0.9278154
6    PLCXD1 MSTRG.4 transcript  6 0.6449469 0.3062408 0.9253708

# which transcripts are detected as differentially expressed at qval < 0.05?
results_transcripts %>% filter(qval < 0.05)
   geneNames   geneIDs    feature   id          fc         pval         qval
1     PNPLA4  MSTRG.64 transcript  186 0.592477057 2.119474e-04 4.290970e-02
2          . MSTRG.140 transcript  421 3.141219608 6.096529e-05 1.508552e-02
3      KDM6A MSTRG.255 transcript  734 0.054166544 1.208983e-04 2.692404e-02
4      RPS4X MSTRG.511 transcript 1605 0.598737678 2.560509e-04 4.751878e-02
5       TSIX MSTRG.522 transcript 1648 0.078029979 1.743580e-06 7.765906e-04
6          . MSTRG.523 transcript 1649 0.016057740 3.872369e-10 2.874589e-07
7       XIST MSTRG.523 transcript 1650 0.002997908 1.849406e-10 2.059314e-07
8          . MSTRG.523 transcript 1651 0.030714646 1.360867e-10 2.059314e-07
9          . MSTRG.523 transcript 1652 0.028289665 6.782559e-08 3.776190e-05
10         . MSTRG.605 transcript 1843 7.378759461 1.285917e-05 4.772897e-03
11         . MSTRG.612 transcript 1847 9.154881892 4.889775e-05 1.361191e-02
12         . MSTRG.766 transcript 2333 0.272425415 1.909634e-05 6.075365e-03
```
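stattest() reports fc as a ratio, so 0.5 and 2 are the same size of change in opposite directions. Taking log2 makes the scale symmetric around 0 (log2 of 2 is 1, log2 of 0.5 is -1), which is also why the MA plot uses log2(fc). Judging by XIST, which is expressed in females, having an fc of about 0.003, the ratios here appear to be male relative to female; treat that direction as my inference rather than documented behaviour. Here is a minimal awk sketch of the conversion; `log2_fc` is a hypothetical helper name.

```shell
# log2_fc  (hypothetical helper)
# Read fold changes, one per line, and print log2(fc) to two decimals,
# using log2(x) = ln(x) / ln(2). Non-positive or non-numeric input prints NA.
log2_fc() {
  awk '{ if ($1 + 0 > 0) printf "%.2f\n", log($1) / log(2); else print "NA" }'
}
```

With the values above, `echo 0.002997908 | log2_fc` prints -8.38 for XIST and `echo 7.378759461 | log2_fc` prints 2.88 for MSTRG.605. Within R, `log2(results_transcripts$fc)` gives the same values and can be added as an extra column.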

Let’s create an MA plot.

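```r
library(ggplot2)
library(cowplot)

# average FPKM of each transcript across the 12 samples
results_transcripts$mean <- rowMeans(texpr(bg_chrX_filt))

# MA plot: log2 mean expression (x) against log2 fold change (y);
# transcripts with qval < 0.05 are coloured red
ggplot(results_transcripts, aes(log2(mean), log2(fc), colour = qval<0.05)) +
  scale_color_manual(values=c("#999999", "#FF0000")) +
  geom_point() +
  geom_hline(yintercept=0)
```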

Summary
The new tuxedo suite is very fast. I realise that the tutorial only used a small subset of reads that were already known to map to chromosome X; even so, the mapping and assembly took mere minutes. A recent benchmark of RNA-seq aligners did demonstrate that HISAT or HISAT2 was the fastest splice-aware mapper out of 14 algorithms. However, on the default settings, HISAT and HISAT2 had low recall when mapping reads with high complexity (i.e. more polymorphic sites and higher error rates); mapping accuracy was vastly improved after tuning the parameters.

I plan to set up a Snakemake pipeline for running the new tuxedo suite and will compare it with other pipelines, such as this STAR and Cufflinks/RSEM pipeline.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Posted in bioinformatics
Tagged RNA-seq

11 COMMENTS

NANDITA
January 31, 2018 at 7:59 am

Hi Dave- I was wondering if you could comment on an observation we made when we ran this pipeline as described here.

We did an experiment in mouse, knockout vs WT. For alignment we used hisat2, default parameters, followed by stringtie and ballgown. We got a large number of significantly D.E. “transcripts”, but when we conducted a gene level analysis, we got barely any D.E. genes. The D.E. transcripts list mostly has the same gene showing D.E. of different splice forms in each condition. Since we are dealing with the same tissue, we really don’t expect such a huge splicing effect. I wonder if many of the splice variants could be mapping artifacts, because, in some cases, I look at the aligned reads in a browser and it shows no difference between the two samples in terms of # of reads mapped.

DAVO
February 1, 2018 at 12:56 am

Hi Nandita,

I recall that a former colleague had a similar problem to what you are describing, which is the discrepancy in DE between genes and transcripts. Regarding your example, I guess the obvious thing to do (which you may have already done) is to create an expression table of the gene and another of the transcripts belonging to the same gene. Perhaps in the knockout, it has switched to another splice variant, therefore there is DE on the transcript level. However, when you collapse expression onto a gene level they are expressed similarly. I’m not so sure about what you meant about mapping artifacts though. If there was a systematic artifact, it should affect both samples equally and you shouldn’t have a discrepancy only in one sample.

Cheers,
Dave
NANDITA
February 1, 2018 at 7:37 am

Thanks for responding, Dave.

“Perhaps in the knockout, it has switched to another splice variant, therefore there is DE on the transcript level.” This would be really cool if true, and when we use the “plot transcripts” function in ballgown (Fig. 5, page 1664 in the Nature Protocols 2016 paper) to look at one of these cases, it indeed implies different transcripts are expressed.

However, when using the UCSC genome browser to view bigwig files generated from the aligned BAMs, we cannot see any unique splice junction being covered in one condition versus the other. So we are not sure why these reads have been assigned to different splice isoforms. Additionally, like I said, we really don’t expect so many events where isoforms are switched in the system we are examining.

“If there was a systematic artifact, it should affect both samples equally and you shouldn’t have a discrepancy only in one sample.”

Agreed. I am unable to explain it either. Short of eyeballing every such event in a genome browser, or asking the lab to validate via qPCR, I’m not able to assign confidence in the differential transcripts results, even though the fc, p-val and q-val look very good.

UPENDRA KUMAR DEVISETTY
June 9, 2018 at 4:13 am

Hi Dan,
Very nice blog. I have one quick question. Is there a way one can get logFC in addition to FC in ballgown output?
Thanks,
Upendra

RAMAN SETHI
January 3, 2019 at 10:30 am

Nice blog. I want to ask how Ballgown compares with DESeq2? And which is the best tool to plot heat maps, GO and Pathway Analysis, PCA Analysis? Thank you!

DAVO
January 23, 2019 at 8:32 am

I haven’t made the comparison yet. For heatmaps, pathway analysis, and PCA I like pheatmap, fgsea, and FactoMineR, respectively.

JOSRUIROD5
January 9, 2019 at 10:25 am

Hi, very nice blog entry.

Thanks for the comprehensive explanations. At the end, did you compare the pipeline of the new tuxedo suite with others such as the STAR/Cufflinks? I couldn’t find any other entry. I wonder if you have any comment on this.

Thank you!

DAVO
January 23, 2019 at 8:26 am

I haven’t done the comparison yet. It’s on my TODO list.

FAWZI YASSINE
March 16, 2019 at 3:45 pm

how to interpret fold change (fc) in ballgown results, a fake example calculation is appreciated.
regards,

JOSE BASILIO
March 26, 2019 at 2:49 pm

Thank you for your post. I would like to know if you have the possibility to get, and send to my email, the paper which you have mentioned at the end of your post:
https://link.springer.com/protocol/10.1007%2F978-1-4939-4035-6_14

Thank you once again.

Best Regards, José
DAVID HUELS
April 4, 2019 at 11:46 am

Hi Dave,
thank you for your detailed blog post. Very helpful!
Which files did you load in the IGV to visualise the known as well as the novel transcripts?
Cheers
David

Notify me of follow-up comments by email.

Notify me of new posts by email.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will
assume that you are happy with it.

Ok
Search … S E A RC H

WHO ' S O N L I N E

31 visitors online now

27 guests, 4 bots, 0 members
Map of Visitors

SUP P O R T

Buy me a co ee

L I CE N SE

This work is licensed under a Creative Commons Attribution 4.0 International License.

R E CE N T P O ST S

Setting up Windows for bioinformatics in 2019

Importing vector images into R

The Golden Rule of Bioinformatics

Visualising Google Trends results with R

Getting started with Cell Ranger

10x single cell BAM les

Interactive plots in R

Making a heatmap in R with the pheatmap package

Organising computational biology projects with Cookiecutter

Compiling R with GNU Readline

R E CE N T CO M M E N T S

We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will
assume that you are happy with it.
Ashley on Getting started with Seurat
Ok
Gwang-Jin Kim on Getting started with Monocle
Santiago on Making a heatmap in R with the pheatmap package

David on The Adjusted Rand index

Daniela on Creating a coverage plot using BEDTools and R

Josh on Tissue speci city

Rittik on On curve tting using R

Jyoti on Setting up Windows for bioinformatics in 2019

David Huels on Getting started with HISAT, StringTie

StringTie, and Ballgown

Dimitris on Making a heatmap in R with the pheatmap package

T AG CL O UD

6mer 10x annotation bedtools bioinformatics biomaRt CAGE clustering correlation DGE
encode etc fork genome GO graph heatmap histones home machine learning mapping

maths miRNA motif OMIM parser pca perl promoter python R refseq repeats rnaseq SAM scan

sequencing snps spearman statistics TFBS tips twitter variants visualisation

Copyright © 2019 Dave Tang's blog. All Rights Reserved.
