Bioinformatics Advice on Experimental Design

Where do I start?
Use the following guide to plan experiments that support sound statistical analysis and
suit your research needs. Statistics cannot rescue a bad experimental design.

Please contact our Bioinformatics team for a consultation when in doubt.

Next Generation Sequencing (NGS) experiments

Many steps in the experimental process can introduce biases and errors, so careful
consideration must be given to the following aspects:

§ Platform choice:

Platform          Genome Sequencer      Genome Analyzer IIx  HiSeq 2000        SOLiD 4 system             HeliScope
                  FLX Titanium System
Company           Roche                 Illumina             Illumina          Applied Biosystems         Helicos Biosciences
Read length       400-600 bp            2x100 bp             2x100-150 bp      50+25 bp                   ~30 bp
Samples per run   16                    8                    16                16                         50
Reads per run     ~1 million            ~300 million         ~800 million      >700 million               ~500 million
Run time          10 h                  8 days               8 days            11-13 days                 8 days
Website           www.454.com           www.illumina.com     www.illumina.com  www.appliedbiosystems.com  www.helicosbio.com

These numbers change rapidly as the technology improves. The figures above are based on
data from October 2010; refer to the website listed for each platform for the latest numbers.

§ Type of Run – Paired End (PE) or Single End (SE):

The following table shows which run type is recommended for typical applications of
common NGS assays.

Center for Research Informatics Bioinformatics Core last updated May 2015
Paired End                             Single End
RNASeq - De novo assembly              RNASeq - Counting
RNASeq - Splicing                      ChIP-Seq - Counting
ChIP-Seq - Epigenetic modifications
DNA - SNP identification
DNA - Indel identification
DNA - Structural variants

§ Read Length:
50 bp reads are typically sufficient for mapping reads to a reference genome and for
RNASeq counting experiments. Reads longer than 100 bp are useful for whole-genome and
transcriptome studies, depending on the application.

§ Replication:
Samples must be sequenced with replicates to identify sources of variance and to increase the
statistical power to separate true biological variance from technical variance. Biological
replicates are critical, whereas technical replicates are typically not required.

Cutting back on replicates to reduce cost might seem like a good option, but remember: a sample
or sequencing run can fail, forcing you to repeat the experiment.

In general, 4 biological replicates per experiment are recommended; however, 3 replicates
are also reasonable. Please consult with us if you have further questions. You can also
use http://bioinformatics.bc.edu/marthlab/scotty/help.html to calculate power from your
pilot data.
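To illustrate how pilot variability and the number of replicates drive power, here is a minimal Python sketch. It assumes a normal (z) approximation for a single feature, such as the log2 expression of one gene, and takes the effect size and pooled standard deviation from the pilot data. The function name approx_power and the pilot values are hypothetical. This is not how Scotty works: a dedicated tool also accounts for sequencing depth and the properties of count data, which this sketch ignores.

```python
from statistics import NormalDist, mean, stdev


def approx_power(pilot_a, pilot_b, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Hypothetical helper: uses a normal (z) approximation, so it is somewhat
    optimistic for small n. pilot_a / pilot_b hold pilot measurements of one
    feature in each group; n_per_group is the planned number of biological
    replicates per group.
    """
    z = NormalDist()  # standard normal distribution
    delta = abs(mean(pilot_a) - mean(pilot_b))  # effect size from the pilot
    na, nb = len(pilot_a), len(pilot_b)
    pooled_var = ((na - 1) * stdev(pilot_a) ** 2 +
                  (nb - 1) * stdev(pilot_b) ** 2) / (na + nb - 2)
    se = (2 * pooled_var / n_per_group) ** 0.5  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability that the statistic falls beyond either critical value.
    return z.cdf(delta / se - z_crit) + z.cdf(-delta / se - z_crit)


# Hypothetical pilot data: log2 expression of one gene, 3 samples per group.
pilot_a = [5.1, 5.4, 4.9]
pilot_b = [5.6, 5.9, 5.3]
for n in (2, 3, 4, 6):
    print(n, "replicates per group -> power", round(approx_power(pilot_a, pilot_b, n), 2))
```

Running it for several replicate counts shows how quickly power changes between 3 and 4 replicates when the effect is modest relative to the biological spread.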

§ Randomization:
Assign individuals at random to different groups to reduce bias. We recommend
randomizing samples so that each sequencing lane contains samples from all
experimental groups. See Blocking & Multiplexing below for how to do this.
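A minimal sketch of random assignment in Python, assuming a list of subject IDs and two groups. The function randomize_groups and the IDs are hypothetical names chosen for illustration. Shuffling and then dealing the list round-robin keeps group sizes within one of each other, and recording the seed lets the assignment be reproduced later.

```python
import random


def randomize_groups(individuals, groups, seed=None):
    """Randomly assign individuals to groups (hypothetical helper).

    Group sizes differ by at most one. Passing a seed makes the assignment
    reproducible, so it can be documented alongside the experiment.
    """
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    assignment = {g: [] for g in groups}
    for i, ind in enumerate(shuffled):
        # Deal the shuffled list round-robin across the groups.
        assignment[groups[i % len(groups)]].append(ind)
    return assignment


mice = [f"mouse{i}" for i in range(1, 9)]  # hypothetical subject IDs
print(randomize_groups(mice, ["control", "treated"], seed=2015))
```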

§ Blocking & Multiplexing:

Distribute samples across the lanes of the flowcell to avoid lane effects, and use
multiplexing to build balanced block designs (Fig. 1). However, not every sample can be
sequenced on every lane, because the number of unique barcodes per lane is limited.
Solution: a balanced incomplete block design.

“Block what you can and randomize what you cannot.” – Box, Hunter, & Hunter (1978)

Fig. 1: Two groups (A, B), each with three biological replicates (1-3) and RNA extractions
(R1-R3), sequenced on a six-lane flowcell (Lane1-Lane6). ✗: each sample is sequenced in its
own lane (L1-L6). ✔: every barcoded sample is sequenced in all six lanes.
If
I = number of groups/treatments,
J = number of biological replicates per treatment,
s = number of unique barcodes that can be added to one lane,
L = number of lanes sequenced, and
T = number of technical replicates of each biological sample,
then

$$T = \frac{sL}{JI}$$

If s < I, a complete block design is not possible [1].
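The relationship above can be checked with a short Python sketch. technical_replicates and complete_block_possible follow the formula and the s < I rule directly; cyclic_lane_layout is a hypothetical helper that cycles samples, interleaved by group, through the lanes. When s ≥ I every lane then contains every group (a complete block); when s < I lanes necessarily hold only some groups. It is not a formal balanced incomplete block design construction, since it does not guarantee that every pair of samples shares a lane equally often.

```python
from fractions import Fraction


def technical_replicates(s, L, I, J):
    """T = sL / (JI): barcode slots on the flowcell divided by the number of samples."""
    return Fraction(s * L, J * I)


def complete_block_possible(s, I):
    """A complete block (every treatment in every lane) needs at least I barcodes per lane."""
    return s >= I


def cyclic_lane_layout(groups, J, s, L):
    """Hypothetical helper: fill L lanes with s barcoded libraries each,
    cycling through samples interleaved by group (A1, B1, A2, B2, ...)."""
    samples = [f"{g}{j}" for j in range(1, J + 1) for g in groups]
    if s > len(samples):
        raise ValueError("a lane cannot hold the same barcoded sample twice")
    if (s * L) % len(samples):
        raise ValueError("sL must be a multiple of IJ so every sample gets the same T")
    return [[samples[(lane * s + k) % len(samples)] for k in range(s)]
            for lane in range(L)]


# Setting like Fig. 1: 2 groups x 3 replicates on 6 lanes, assuming 6 barcodes per lane.
print(technical_replicates(s=6, L=6, I=2, J=3))  # 6
print(complete_block_possible(s=6, I=2))         # True
for lane, libs in enumerate(cyclic_lane_layout(["A", "B"], J=3, s=6, L=6), 1):
    print(f"Lane{lane}:", libs)
```

With three groups but only two barcodes per lane (I = 3, s = 2), complete_block_possible returns False and each lane holds just two of the three groups, which is the situation the balanced incomplete block design mentioned above addresses.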

§ Sequencing depth:
The following table provides general recommendations for coverage/reads
(https://genohub.com/recommended-sequencing-coverage-by-application/) for typical
read lengths for the human genome. Please visit
https://genohub.com/next-generation-sequencing-guide/#reads for the typical number of
reads per lane for various commonly used NGS platforms.

A useful resource from Illumina for specific coverage estimates for various Illumina
instruments and genomes of different sizes is
http://support.illumina.com/downloads/sequencing_coverage_calculator.html
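The basic coverage arithmetic behind such estimates can be sketched as follows, assuming the Lander-Waterman relation C = N × L / G (mean coverage C from N reads of length L over a genome of size G). reads_needed is a hypothetical helper, not the Illumina calculator's method; the 30x target and the ~3.1 Gb human genome size are illustrative assumptions, not values taken from this guide. Real estimates should also allow for duplicate and unmapped reads and uneven coverage, which this sketch ignores.

```python
import math


def reads_needed(target_coverage, genome_size_bp, read_length_bp, paired=False):
    """Reads (or read pairs, if paired=True) needed for a target mean coverage.

    Solves C = N * L / G for N. For paired-end runs each pair contributes
    two reads' worth of bases.
    """
    bases_per_unit = read_length_bp * (2 if paired else 1)
    return math.ceil(target_coverage * genome_size_bp / bases_per_unit)


# Illustrative only: 30x over a ~3.1 Gb genome with 2x100 bp reads.
print(reads_needed(30, 3_100_000_000, 100, paired=True))  # 465000000
```

Whether a vendor's "reads per run" figure counts individual reads or read pairs should be checked before comparing it with the result.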

DNA:

RNA (for human/mouse genome):

Please note that the number of reads you need for any type of RNASeq also depends on
the desired dynamic range of expression.
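To see why depth interacts with dynamic range, the sketch below assumes each read is sampled independently, so the read count of a transcript that makes up 1 in every one_in reads is roughly Poisson with mean depth / one_in. The function names and the abundance used are hypothetical, and transcript length, library biases and biological variability are ignored.

```python
import math


def depth_for_expected_reads(min_reads, one_in):
    """Total reads needed so a transcript contributing 1 of every `one_in`
    reads is expected to receive `min_reads` reads."""
    return min_reads * one_in


def prob_at_least(k, depth, one_in):
    """Poisson approximation to P(transcript receives >= k reads)."""
    lam = depth / one_in  # expected read count for this transcript
    return 1 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))


# Hypothetical rare transcript: 1 in 1,000,000 reads; goal: at least 10 reads.
print(depth_for_expected_reads(10, 1_000_000))
for depth in (5_000_000, 10_000_000, 20_000_000, 50_000_000):
    print(depth, round(prob_at_least(10, depth, 1_000_000), 2))
```

Lower-abundance transcripts push the required depth up proportionally, which is why the desired dynamic range matters when choosing read numbers.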

Microarray Experiments
A very useful resource for microarray design is:
http://discover.nci.nih.gov/microarrayAnalysis/Experimental.Design.jsp
• Balanced samples
o Same number of cases and controls
o Matched phenotypes: gender, age, etc.
• Biological replicates
o Use a pure background to minimize unwanted biological variation
o More replicates are needed if there is large variation between individuals and a
small difference between groups
• Avoid technical variation
o Process samples under the same conditions as much as possible
o Keep technician, reagents, time, and procedures the same
• Randomize samples on arrays
o Avoid confounding technical and biological factors
o Randomly place samples on different array slides and positions
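The last point can be sketched as a blocked randomization: cases and controls are shuffled separately, each slide receives the same number of each, and positions within each slide are shuffled. randomize_on_slides and the sample names are hypothetical, and the sketch assumes equal numbers of cases and controls and an even number of arrays per slide.

```python
import random


def randomize_on_slides(cases, controls, arrays_per_slide, seed=None):
    """Assign samples to slides so each slide gets equal numbers of cases and
    controls, with random positions within each slide (hypothetical helper)."""
    if arrays_per_slide < 2 or arrays_per_slide % 2:
        raise ValueError("arrays_per_slide must be an even number >= 2")
    rng = random.Random(seed)
    cases, controls = list(cases), list(controls)
    rng.shuffle(cases)     # random choice of which cases share a slide
    rng.shuffle(controls)  # likewise for controls
    half = arrays_per_slide // 2
    slides = []
    while cases or controls:
        slide = cases[:half] + controls[:half]
        cases, controls = cases[half:], controls[half:]
        rng.shuffle(slide)  # random array position within the slide
        slides.append(slide)
    return slides


cases = [f"case{i}" for i in range(1, 5)]
controls = [f"ctrl{i}" for i in range(1, 5)]
for n, slide in enumerate(randomize_on_slides(cases, controls, 4, seed=7), 1):
    print(f"Slide {n}:", slide)
```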

Frequently Asked Questions
• What if I do not have replicates of data points?
Understand the limitations of unreplicated data: you cannot separate technical variance
from biological variance, so the results apply only to the samples sequenced and cannot
be extrapolated to the population.

• What is the difference between biological replicates and technical replicates?

Technical replicates: measure a quantity from one source. This measures the reproducibility of
the results; the differences are based only on technical issues in the measurement. (I weigh
myself three times: do I get different weights? How different?)
Biological replicates: measure a quantity from different sources under the same conditions.
For example, tumors from 5 different people with lung cancer may show similar gene expression
patterns. These replicates are useful to show what is similar among your replicates and how
they differ from another set of conditions (e.g., treated vs. normal).

Biological variation is intrinsic to all organisms; it may be influenced by genetic or
environmental factors, as well as by whether the samples are pooled or individual. Technical
variation is introduced during the extraction, labeling and hybridization of samples.
Measurement variation is associated with reading the fluorescent signals, which may be
affected by factors such as dust on the array.
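When technical replicates are nested within biological replicates, the two sources of variation can be separated. The sketch below uses the standard method-of-moments estimate for a balanced nested design: technical variance is the average within-sample variance, and biological variance is the variance of the sample means minus the technical variance divided by the number of technical replicates. partition_variance and the numbers are hypothetical; with no replication the split cannot be computed, which is the limitation described in the first question above.

```python
from statistics import mean, variance


def partition_variance(samples):
    """Split variation into biological and technical components.

    samples: list of lists; each inner list holds the technical replicate
    measurements of one biological replicate (balanced design assumed).
    Returns (biological_variance, technical_variance).
    """
    n_tech = len(samples[0])
    if len(samples) < 2 or n_tech < 2 or any(len(r) != n_tech for r in samples):
        raise ValueError("need >= 2 biological replicates, each with the same "
                         "number (>= 2) of technical replicates")
    technical = mean(variance(reps) for reps in samples)  # within-sample spread
    # Variance of sample means estimates biological + technical / n_tech.
    between = variance([mean(reps) for reps in samples])
    biological = max(between - technical / n_tech, 0.0)
    return biological, technical


# Hypothetical data: 3 biological replicates, 2 technical replicates each.
print(partition_variance([[10.0, 10.2], [12.0, 12.2], [14.0, 14.2]]))
```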

References
1. P. L. Auer and R. W. Doerge. 2010. Statistical design and analysis of RNA sequencing data.
Genetics 185:405-416.
