Methods in
Molecular Biology 2243
Deep
Sequencing
Data Analysis
Second Edition
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK
Edited by
Noam Shomron
Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer
Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface
I recall reading my first “NGS” paper (in 2005), during my postdoc at MIT, when a
company with a strange name—454 [1] (454 Life Sciences, later purchased by Roche)—
described its method to sequence millions of nucleotides in every experimental run. While
reading the methodology, I felt like I was entering a great science fiction novel or watching a
Mission Impossible movie. “How could all this work. . .” I wondered. There are so many
potential pitfalls along the process, and yet the publication was solid and the results
convincing. I admired the engineers and the biologists who devised it all. Before long we
started getting used to these DNA churning machines. Our appetites grew and we sought
more and more data from every spin of the experimental wheel. A few years later we were
flabbergasted again with the report of a new machine with even larger capacity. This time it
was the Solexa technology [2] (purchased by Illumina). Not long after, the Pacific Bios-
ciences novel single molecule reads [3] apparatus matured, followed by the Ion Torrent
[4] from Life Technologies, which skipped fluorescence and luminescence, and then the
SOLiD [5] system from Applied Biosystems. This was followed by exciting advances in
Nanopore readers [6] (from Oxford Nanopore Technologies). Fifteen years have passed
since the first NGS publication. During this period, sample preparation protocols have been
established (at first machines came without instructions or preparation kits); secrets of how
to obtain the most reads out of the DNA sequencers have passed from one technician to
another (setting the air conditioner in the lab to the minimum temperature, for example,
increases the nucleotide output); and bioinformaticians have learned to deal with error-
prone (very) short DNA reads (at first Illumina reads, for example, were limited to
35 nucleotides). In parallel, engineers, chemists, and biologists have developed advanced
machines and supportive protocols to improve outputs. All along, the bioinformaticians,
computational biologists, and computer scientists have supported these technologies. Their
goals have been to ensure that the read output matches the genuine nucleotide sequence,
and that the data analysis is accurate. In our second edition of the book, entitled Deep
Sequencing Data Analysis, under the “Methods in Molecular Biology” series, leading
authors contributed to the multidimensional task of deep sequencing data analysis. We
start by describing methods of detecting detrimental variants using whole genome sequenc-
ing, statistical considerations for inferring copy number variations, whole-metagenome
shotgun sequencing studies and 16S amplicon sequencing, mapping the accessible chroma-
tin landscape, and (small/single cell) RNA sequencing (RNA seq). In our coverage of
computational oriented methods, we discuss the use of deep learning for data analysis and
genome-wide noninvasive prenatal diagnosis (NIPD) of SNPs, indels, and de novo muta-
tions. We end with specialized topics that deal with accurate imputation of untyped variants,
multi-region sequence analysis to predict intratumor heterogeneity, and cancer classifica-
tion. We present several topics as primers for the bioinformatics student, while we take an
in-depth dive for the professional readers. The topics should be of great use for both
beginner and savvy bioinformaticians when tackling deep sequencing data analysis.
Contents
Preface
Contributors
1 Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing
Abdul Rezzak Hamzeh, T. Daniel Andrews, and Matt A. Field
2 Statistical Considerations on NGS Data for Inferring Copy Number Variations
Jie Chen
3 Applications of Community Detection Algorithms to Large Biological Datasets
Itamar Kanter, Gur Yaari, and Tomer Kalisky
4 Processing and Analysis of RNA-seq Data from Public Resources
Yazeed Zoabi and Noam Shomron
5 Improved Analysis of High-Throughput Sequencing Data Using Small Universal k-Mer Hitting Sets
Yaron Orenstein
6 An Introduction to Whole-Metagenome Shotgun Sequencing Studies
Tyler A. Joseph and Itsik Pe’er
7 Microbiome Analysis Using 16S Amplicon Sequencing: From Samples to ASVs
Amnon Amir
8 RNA-Seq in Nonmodel Organisms
Vered Chalifa-Caspi
9 Deep Learning Applied on Next Generation Sequencing Data Analysis
Artem Danilevsky and Noam Shomron
10 Interrogating the Accessible Chromatin Landscape of Eukaryote Genomes Using ATAC-seq
Georgi K. Marinov and Zohar Shipony
11 Genome-Wide Noninvasive Prenatal Diagnosis of SNPs and Indels
Tom Rabinowitz and Noam Shomron
12 Genome-Wide Noninvasive Prenatal Diagnosis of De Novo Mutations
Ravit Peretz-Machluf, Tom Rabinowitz, and Noam Shomron
13 Accurate Imputation of Untyped Variants from Deep Sequencing Data
Davoud Torkamaneh and François Belzile
14 Multiregion Sequence Analysis to Predict Intratumor Heterogeneity and Clonal Evolution
Soyeon Ahn and Haiyan Huang
15 Overcoming Interpretability in Deep Learning Cancer Classification
Yue Yang (Alan) Teo, Artem Danilevsky, and Noam Shomron
Index
Contributors
SOYEON AHN • Division of Statistics, Medical Research Collaborating Center, Seoul National
University Bundang Hospital, Seongnam, Republic of Korea
AMNON AMIR • Microbiome Center, The Chaim Sheba Medical Center, Tel-Hashomer,
Ramat-Gan, Israel
T. DANIEL ANDREWS • John Curtin School of Medical Research, Australian National
University, Canberra, ACT, Australia
FRANÇOIS BELZILE • Département de Phytologie, Université Laval, Québec City, QC,
Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec
City, QC, Canada
VERED CHALIFA-CASPI • Bioinformatics Core Facility, Ben-Gurion University of the Negev,
Beer-Sheva, Israel
JIE CHEN • Division of Biostatistics and Data Science, Department of Population Health
Sciences, Medical College of Georgia, Augusta University, Augusta, GA, USA
ARTEM DANILEVSKY • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
MATT A. FIELD • John Curtin School of Medical Research, Australian National University,
Canberra, ACT, Australia; Centre for Tropical Bioinformatics and Molecular Biology,
Australian Institute of Tropical Health and Medicine, James Cook University, Cairns,
QLD, Australia
SYDNIE GRABELL • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
ABDUL REZZAK HAMZEH • John Curtin School of Medical Research, Australian National
University, Canberra, ACT, Australia
HAIYAN HUANG • Department of Statistics, University of California, Berkeley, CA, USA;
Center for Computational Biology, University of California, Berkeley, CA, USA
TYLER A. JOSEPH • Department of Computer Science, Fu Foundation School of Engineering
& Applied Science, Columbia University, New York, NY, USA
TOMER KALISKY • BIU, Department of Bioengineering, Bar-Ilan University, Ramat Gan,
Israel
ITAMAR KANTER • BIU, Department of Bioengineering, Bar-Ilan University, Ramat Gan,
Israel
WENDAO LIU • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
GEORGI K. MARINOV • Department of Genetics, Stanford University, Stanford, CA, USA
YARON ORENSTEIN • Ben-Gurion University of the Negev, Beersheba, Israel
METSADA PASMANIK-CHOR • Bioinformatics Unit, G.S.W. Faculty of Life Science, Tel Aviv
University, Tel Aviv, Israel
ITSIK PE’ER • Department of Computer Science, Fu Foundation School of Engineering &
Applied Science, Columbia University, New York, NY, USA
RAVIT PERETZ-MACHLUF • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
TOM RABINOWITZ • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
GUY SHAPIRA • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
ZOHAR SHIPONY • Department of Genetics, Stanford University, Stanford, CA, USA
NOAM SHOMRON • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
YUE YANG (ALAN) TEO • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
DAVOUD TORKAMANEH • Département de Phytologie, Université Laval, Québec City, QC,
Canada; Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec
City, QC, Canada; Department of Plant Agriculture, University of Guelph, Guelph, ON,
Canada
GUR YAARI • BIU, Department of Bioengineering, Bar-Ilan University, Ramat Gan, Israel
YAZEED ZOABI • Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
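Chapter 1
Detecting Causal Variants in Mendelian Disorders Using Whole-Genome Sequencing
Abdul Rezzak Hamzeh, T. Daniel Andrews, and Matt A. Field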
Abstract
Increasingly affordable sequencing technologies are revolutionizing the field of genomic medicine. It is
now feasible to interrogate all major classes of variation in an individual across the entire genome for less
than $1000 USD. While the generation of patient sequence information using these technologies has
become routine, the analysis and interpretation of this data remains the greatest obstacle to widespread
clinical implementation. This chapter summarizes the steps to identify, annotate, and prioritize variant
information required for clinical report generation. We discuss methods to detect each variant class and
describe strategies to increase the likelihood of detecting causal variant(s) in Mendelian disease. Lastly, we
describe a sample workflow for synthesizing large amounts of genetic information into concise clinical
reports.
Key words Variant detection, Variant annotation, Clinical reports, SNV, Copy number variation,
Missense mutation, Mendelian disease
1 Introduction
Noam Shomron (ed.), Deep Sequencing Data Analysis, Methods in Molecular Biology, vol. 2243,
https://doi.org/10.1007/978-1-0716-1103-6_1, © Springer Science+Business Media, LLC, part of Springer Nature 2021
3 Variation Types
3.1 SNVs and Small Indels
The vast majority of variation in the human genome consists of single nucleotide
polymorphisms and small indels (<50 bp). Aligning any human genome to the standard
reference genome (currently GRCh38) yields roughly four million SNVs and 400,000 small
indel calls. Many algorithms exist for detecting SNVs and small indels, and most tools
detect both types of variation in a single pass. Some of the most commonly used
algorithms for SNV/small indel detection are listed in Table 1.
Most algorithms require a BAM alignment file as input and output a variant call format
(VCF) file. Some programs run more than one variant detection algorithm and employ a
consensus approach (BAYSIC [14] and appreci8 [15]), while others support molecular
tagging techniques in order to detect variation within sequence reads derived from
individual input DNA molecules (DeepSNVMiner [16] and smCounter2 [17]). Overall,
SNV/small indel algorithms achieve the highest accuracy relative to other variant types,
with a recent review showing all algorithms achieving F-scores >0.975 for SNVs and
>0.85 for small indels [18].
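As an illustration of the size boundary used above, the following minimal sketch sorts a REF/ALT allele pair (as found in a VCF record) into SNV, small indel, or structural variant. The function name and the handling of the exact 50 bp boundary are assumptions made for this example, and it only handles explicit sequence alleles, not symbolic alleles such as <DEL>.

```python
def classify_variant(ref: str, alt: str, small_indel_max: int = 50) -> str:
    """Classify one REF/ALT allele pair by size.

    Hypothetical helper for illustration only. Assumes explicit sequence
    alleles (no symbolic ALT such as <DEL>) and treats a length change of
    `small_indel_max` bp or more as a structural variant.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        # Same-length multi-base substitution; not one of the classes
        # discussed in the text, reported separately here.
        return "MNV"
    if size < small_indel_max:
        return "small indel"
    return "SV"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATG"), ("A" * 61, "A")]:
        print(len(ref), len(alt), classify_variant(ref, alt))
```

Production callers and annotators apply their own definitions, so the threshold is left as a parameter rather than fixed.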
Table 1
Commonly used germline variant callers for SNVs/indels

Tool                       Programming language  Variant type    Multiple tools  UMI barcode handling
GATK HaplotypeCaller [12]  Java                  SNV, indel      No              No
SAMtools [85]              C                     SNV, indel      No              No
VarScan/VarScan2 [86]      Java                  SNV, indel      No              No
Platypus [87]              Python                SNV, indel, SV  No              No
Strelka2 [88]              Python                SNV, indel      No              No
BAYSIC [14]                Perl/R                SNV             Yes             No
VarDict [89]               Java/Perl             SNV, indel, SV  No              No
DeepSNVMiner [16]          Perl/R                SNV, indel      No              Yes
LoFreq [90]                C/Python              SNV, indel      No              No
Appreci8R [15]             R                     SNV, indel      Yes             No
smCounter2 [17]            Python                SNV, indel      No              Yes
3.2 Structural/Copy Number Variation
Structural variation (SV) is defined as variants >50 bp that can be classified as
deletions, insertions, duplications, inversions, and translocations. SVs are further
classified as balanced or unbalanced based on whether they alter the size of the
resultant genome. Inversion and translocation events are classified as balanced, while
deletions and duplications (collectively referred to as copy number variants or CNVs)
and insertions are classified as unbalanced. Although the distinction is somewhat
arbitrary, SVs are identified separately from SNVs and small indels because of the
distinct mechanisms by which they form [19]. In any human genome, there are far fewer
SVs than SNVs and small indels (<10,000 for all SV types); however, their larger size
means they are more likely to have an impact on function. SVs are more challenging to
detect and resolve with short-read data than SNVs/small indels, partly because SVs are
often longer than the sequence read length. Additional challenges arise because each SV
type requires a distinct algorithm due to its unique read alignment pattern, and the
problem is further confounded by SV breakpoints being enriched in repetitive regions
where short-read aligners struggle [20]. SV detection algorithms vary in both the types
of SVs they detect and the types of alignment evidence they use. Some of the most
commonly used algorithms for SV/CNV detection are listed in Table 2.
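The balanced/unbalanced grouping described above can be written down as a small lookup. This is only a sketch of the classification in the text; the set and function names are hypothetical and do not correspond to any particular SV caller.

```python
# SV classes as grouped in the text; names are illustrative only.
BALANCED = {"inversion", "translocation"}
COPY_NUMBER = {"deletion", "duplication"}      # collectively CNVs
UNBALANCED = COPY_NUMBER | {"insertion"}


def describe_sv(sv_type: str) -> dict:
    """Return whether an SV type is balanced and whether it is a CNV."""
    key = sv_type.lower()
    if key not in BALANCED | UNBALANCED:
        raise ValueError(f"unknown SV type: {sv_type!r}")
    return {
        "type": key,
        "balanced": key in BALANCED,   # genome size unchanged
        "cnv": key in COPY_NUMBER,     # deletion or duplication
    }


if __name__ == "__main__":
    for name in ("deletion", "insertion", "inversion"):
        print(describe_sv(name))
```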
Table 2
Commonly used germline variant callers for structural/copy number variation
3.3 Repeat Variation
High copy repeat variation consists broadly of mobile elements and tandem repeats, with
tandem repeats further classified by size into microsatellites (1–6 bp; also called
short tandem repeats), minisatellites (7–49 bp), and larger satellite repeats typically
found in centromeres and heterochromatin regions. Due to the challenges of detecting
this type of variation, the estimated number per human genome is less certain; however,
current estimates are ~10,000 tandem repeats and ~2000 mobile elements per genome [27].
Reliable detection of repeat variation is challenging with short reads due to issues
with accurate read alignment and the accuracy of the reference genome in these regions.
Some of the most commonly used algorithms for repeat variant detection are listed in
Table 3.
In general, the repeat variation detection algorithms are extremely specialized,
employing a wide variety of approaches. For example, RepeatExplorer2 [28] uses
graph-based clustering of reads and identifies different families of repetitive elements
based on the clustering patterns, patterns which can be further analyzed with regard to
the sequence composition and abundance of these repeat variants. STRetch [29], on the
other hand, constructs a set of additional STR sequences comprising all possible 1–6 bp
repeats which are added to the reference genome as decoy sequences.
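To make the size classes and the decoy idea concrete, here is a minimal sketch that labels a tandem repeat by its unit length and enumerates distinct 1–6 bp repeat units from which decoy sequences could be built. It is not STRetch's implementation: collapsing rotations of the same unit, ignoring reverse complements, the 50 bp lower bound for "satellite", and the decoy length are all assumptions made for illustration.

```python
from itertools import product


def classify_tandem_repeat(unit_length: int) -> str:
    """Label a tandem repeat by repeat-unit length (bp), following the text."""
    if unit_length < 1:
        raise ValueError("unit length must be positive")
    if unit_length <= 6:
        return "microsatellite"
    if unit_length <= 49:
        return "minisatellite"
    return "satellite"  # assumed: anything longer than a minisatellite unit


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def canonical_rotation(motif: str) -> str:
    """Smallest rotation, so 'AC' and 'CA' map to the same repeat unit."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def str_motifs(max_unit: int = 6) -> list:
    """Enumerate distinct repeat units of length 1..max_unit over ACGT."""
    motifs = set()
    for k in range(1, max_unit + 1):
        for bases in product("ACGT", repeat=k):
            motif = "".join(bases)
            if is_primitive(motif):
                motifs.add(canonical_rotation(motif))
    return sorted(motifs)


def decoy_sequence(motif: str, length: int = 2000) -> str:
    """Build a pure-repeat decoy of a hypothetical fixed length."""
    return (motif * (length // len(motif) + 1))[:length]


if __name__ == "__main__":
    units = str_motifs()
    print(len(units), "distinct repeat units")
    print(decoy_sequence(units[0], 20))
```

A real decoy set would also need to decide how to treat reverse-complement units and how long each decoy should be relative to the read length.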
Table 3
Commonly used germline variant callers for repeat variation
4.1 Variant Impact on Proteins
Typically, variants are compared to a gene model such as ENSEMBL or RefSeq [30] to
determine their potential functional impact on proteins. A variety of software annotates
variants relative to a gene model, two of the most popular being the Variant Effect
Predictor (VEP) [31] and AnnoVar [32]. Overall, the algorithms are similar; however,
they differ in terms of whether additional gene models are supported and also in how
they determine the variant effect when multiple overlapping transcripts are available
for a gene. For example, VEP offers an option to only consider the impact on the
“canonical” transcript (generally the longest CCDS transcript with no stop codon);
however, the canonical transcript does not necessarily represent the most biologically
active transcript, meaning the annotations may be inaccurate when noncanonical
transcripts are biologically active.
Following annotation, variants are broadly grouped into coding and noncoding, with
coding variants further divided based on their effect on the protein amino acid
sequence. Coding SNVs are classified as missense (alter amino acid sequence), nonsense
(generate stop codon), or synonymous (no effect on amino acid sequence), while coding
indels are divided into frameshift or non-frameshift based on whether their length is a
multiple of 3. Noncoding SNVs and indels are overlapped with a variety of features known
to influence gene function, including splice sites, 3′ and 5′ UTRs, miRNAs, or known
regulatory elements. Splice site mutations are enriched for causal variants, often
resulting in intron retention, exon skipping, or exon creation, all of which may lead to
the production of aberrant proteins [33]. Larger and more complex variants (SVs and
repeats) are examined in terms of their proximity to genes and exons, the potential
rearrangement of known regulatory elements, and any possible gene fusion transcripts
generated. For SVs, the genomic region to examine is affected by the SV type, with
balanced SVs limited to the immediate breakpoint region compared to CNVs, where the
entire duplicated/deleted region is examined.
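The coding classifications above can be illustrated with a short sketch that compares reference and alternate codons using the standard genetic code, and checks whether an indel length is a multiple of 3. The function names are hypothetical; real annotators such as VEP or AnnoVar work from full transcript models and report many more consequence types. The "stop_lost" label is added here for completeness and is not one of the three SNV classes named in the text.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding SNV from the codon before and after the change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"  # extra class, not discussed in the text
    return "missense"


def classify_coding_indel(ref: str, alt: str) -> str:
    """Frameshift unless the inserted/deleted length is a multiple of 3."""
    delta = abs(len(alt) - len(ref))
    if delta == 0:
        raise ValueError("alleles have equal length; not an indel")
    return "non-frameshift" if delta % 3 == 0 else "frameshift"


if __name__ == "__main__":
    print(classify_coding_snv("GAA", "GTA"))
    print(classify_coding_indel("A", "ATTT"))
```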
4.2 Missense Mutation Prediction Tools
Missense mutations often cause disease; however, any human genome contains hundreds of
such mutations, most of which produce no discernible phenotype. Currently, it is not
feasible to functionally test all missense mutations, so computational tools have been
developed to predict how damaging any given mutation is likely to be to protein
function. These tools are usually trained on both known disease mutations and common
polymorphisms; however, they have yet to be tested against an unbiased spectrum of
random de novo mutations. A variety of algorithms exist that rely on four types of
evidence: sequence conservation, protein structure, annotations, and training data
(Fig. 2).
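[Fig. 2: Missense mutation prediction tools (SIFT, CanPredict, B-SIFT, CHASM, MAPP, mCluster, Align-GVGD, Mutation Assessor, SNAP, Logre, Condel, PolyPhen-2), shown as cancer-specific or non-cancer-specific and by the evidence each is based on or depends on: conservation, structure, annotation, and training sets.]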
It has been shown that these algorithms suffer from high false
positive rates and often the predictions do not track with clinical
phenotype [34]. This is reflected in the American College of Medical Genetics and
Genomics (ACMG) recommendations, which
state, “These are only predictions, however, and their use in
sequence variant interpretation should be implemented carefully. It
is not recommended that these predictions be used as the sole
source of evidence to make a clinical assertion.” Likely the high
false positive rate can be explained by the heavy reliance of all the
tools on sequence conservation. Heavily relying on sequence con-
servation effectively measures purifying selection; however, not all
variants under purifying selection result in a clinical phenotype.
Some variants only generate a phenotype under specific environ-
mental conditions, the so-called “nearly neutral” mutations first
described by Tomoko Ohta in 1992. Currently these tools are
unable to differentiate immediately clinically relevant mutations
from nearly neutral mutations [34].
5 Specialized Strategies
5.1 Sample Selection Strategies
While not always possible, choosing which sample(s) to sequence can increase the
likelihood of identifying disease-causing variants. If only sequencing a single
individual, a strategy of focusing on early-onset cases with extreme phenotypes that are
clinically well-defined has been shown to increase diagnosis rates [35]. In such cases
rare or de novo mutations are most likely causal, particularly when no other family
members are known to be affected. Another example where a single individual is often
sufficient is within consanguineous families, where homozygous mutations are examined
first, thus greatly reducing the total genetic search space [36].
When multiple samples are available for sequencing, if the
individuals are unrelated, it is critical to focus on patients with a
well-defined shared phenotype. In such instances a common muta-
tion can be found [37, 38]; however, it is often necessary to employ
gene network analysis software (e.g., Ingenuity [39] or String [40])
to identify mutations affecting the same disease pathway via differ-
ent genes.
The greatest success rates in detecting causal variants in Mendelian diseases currently
come from sequencing multiple individuals within a family or pedigree. While clinical
sequencing has been successful in discovering causal variation using small numbers of
unrelated individuals [41], it is clear this approach is insufficient to reliably
identify the underlying genetic causes in many cases
[42]. With pedigree sequencing the search space for causal variants
is reduced, by both the prioritization of variants common to
affected individuals and the exclusion of benign variants shared
between affected and unaffected individuals. The effective analysis
of sequenced pedigrees requires tools capable of combining variant
specific and pedigree wide annotations to dramatically reduce the
causal variation search space. Generally, tools focus either on progressively removing
variants that meet criteria deemed unlikely to be causal [43] or on variants matching
specific inheritance models (compound heterozygotes [44]; autosomal dominant [45]).
Other tools like Gemini [46] or VASP [47] do not make any assumptions regarding disease
inheritance.
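The pedigree filtering logic described above (keep variants common to affected individuals, drop those also seen in unaffected relatives) can be sketched with set operations. This is a deliberately simplified presence/absence model, roughly suited to a fully penetrant dominant scenario; it ignores zygosity, and the function name and data layout are assumptions made for illustration rather than the method of any cited tool.

```python
def pedigree_candidates(genotypes: dict, affected: list, unaffected: list) -> set:
    """Return variants carried by every affected and no unaffected sample.

    `genotypes` maps a sample name to the set of variant keys it carries,
    e.g. ("chr1", 12345, "A", "G"). Hypothetical layout for illustration.
    """
    if not affected:
        raise ValueError("at least one affected individual is required")
    # Prioritize variants common to all affected individuals.
    shared = set.intersection(*(set(genotypes[s]) for s in affected))
    # Exclude variants that also appear in any unaffected individual.
    seen_in_unaffected = set().union(*(genotypes[s] for s in unaffected))
    return shared - seen_in_unaffected


if __name__ == "__main__":
    v1, v2, v3 = ("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")
    family = {
        "proband": {v1, v2, v3},
        "affected_sibling": {v1, v2},
        "unaffected_parent": {v2},
    }
    print(pedigree_candidates(family, ["proband", "affected_sibling"], ["unaffected_parent"]))
```

Recessive or compound heterozygous models would need genotype-level information rather than simple carrier sets.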
5.3 Software Strategies
Finally, software strategies such as running multiple variant callers and employing a
consensus-based approach are being used to increase diagnosis rates. This strategy has
arisen because it has become increasingly clear that no single variant caller for any
variant type performs optimally under all conditions [18, 25]. Several studies have
shown that combining the variant calls of multiple tools yields the best-quality variant
set, for either specificity or sensitivity, depending on whether the intersection or the
union of all variant calls is used, respectively [15, 52, 53]. While this view is
increasingly accepted, a tendency still exists to rely on the results from a single tool
alone, given the current complexity of incorporating external software into a genome
analysis infrastructure. While implementing such features increases complexity and
computation, the results offer indisputable improvements in data quality. Such an
approach is especially important in a clinical setting where there is low tolerance for
false negative variants.
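The trade-off between intersection and union can be sketched as follows. Each caller's output is reduced to a set of variant keys; requiring support from all callers gives the intersection, and requiring support from at least one gives the union. The helper names, the variant key layout, and the use of the F1 form of the F-score are assumptions for this example; it does not reproduce BAYSIC, appreci8, or any other consensus tool.

```python
from collections import Counter


def consensus(call_sets: list, min_callers: int) -> set:
    """Variants reported by at least `min_callers` of the given call sets.

    min_callers == len(call_sets) gives the intersection;
    min_callers == 1 gives the union.
    """
    counts = Counter(v for calls in call_sets for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}


def precision_recall_f1(calls: set, truth: set) -> tuple:
    """Score a call set against a truth set (F-score assumed to be F1)."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    v = [("chr1", pos, "A", "G") for pos in range(1, 6)]
    caller_a, caller_b, caller_c = {v[0], v[1], v[2]}, {v[0], v[1], v[3]}, {v[0], v[4]}
    truth = {v[0], v[1], v[4]}
    sets = [caller_a, caller_b, caller_c]
    for k in (3, 2, 1):
        print(k, precision_recall_f1(consensus(sets, k), truth))
```

Intermediate thresholds (for example, two of three callers) give a tunable compromise between the two extremes.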
6 Clinical Reporting
6.1 External Data Sources
Variant-specific data embedded in clinical reports typically come from the same data
sources used throughout the filtration process. These sources can be classified either
into databases containing disease, sequence, and population information, or into
software (Table 4).
Gene–disease databases are obligatory sources for any clinical
report as they provide verifiable evidence of links between human
genes and human disorders. The most important of these databases
is OMIM [58], which focuses on the relationship between variation
in human genes and genetic disorders at the molecular level. Similar
databases include Orphanet, The Monarch Initiative [59], The
Phenomizer [60], and Genomics England PanelApp. These data-
bases are similar in that they do not attempt to exhaustively discuss
all the variants in every single gene, as is the case with clinical
databases such as ClinVar [55], Human Gene Mutation Database
[61] and DECIPHER [62]. Additionally, there are locus, gene, and
disease-specific databases such as IARC TP53 [63], Infevers (regis-
try of Hereditary Auto-inflammatory Disorders Mutations) [64],
and locus-specific databases built in the LOVD system [65]. While
such resources are invaluable, there are limitations, as most interpretations are
formulated by expert opinion based on evaluation of often incomplete functional
evidence. A review of annotations for recessive disease-causing genes found that 27% of
mutations cited in the literature were incorrect: they were either identified as common
polymorphisms or misannotated in public databases [66]. A follow-up study of the
literature revealed even worse numbers, with only 7.5% of 239 unique variants annotated
as disease-causing in HGMD found to fit the definition [67].
Other databases such as dbSNP [68], dbVar [69], and the Database of Genomic Variants
[70] aim to catalog all population-level variation regardless of its clinical
importance. Population-level variant databases are indispensable for assessing the
pathogenicity of variants; examples include the Genome Aggregation Database (gnomAD)
[54], 1000 Genomes Project Phase 3 [71], and the Exome Variant Server. Additionally,
aggregation-style databases exist with the aim of providing a comprehensive review of
the variant, as is the case for VarSome [72] and MARRVEL [73]. These comprehensive
databases are beneficial in that they follow HGVS-approved nomenclature for the variant
in DNA, RNA, and protein sequences, which can be confirmed using the rules published by
the Human Genome Variation Society. This is important as only HGNC-approved gene names
should be used throughout the entire process of sequence alignment, variant annotation,
and prioritization. To ensure standardization of names, use the “Multi-symbol checker”
tool from HGNC (HUGO Gene Nomenclature Committee) [56].
When consulting any of these databases, it is important to be aware of the time lag
between the appearance of reportable findings in peer-reviewed publications and their
subsequent inclusion in the various databases. The lag is often relatively long, even
for critically important clinical data, making it essential to comb the relevant
literature on PubMed in order to guarantee that the clinical report incorporates the
most up-to-date information.

Table 4
List of databases, resources, and software that are commonly used during variant prioritization
subsequent to clinical sequencing

Database                                                                         Link
OMIM (Online Mendelian Inheritance in Man) [58]                                  https://www.omim.org
Orphanet                                                                         https://www.orpha.net
The Monarch Initiative [59]                                                      https://monarchinitiative.org
The Phenomizer [60]                                                              http://compbio.charite.de/phenomizer
Genomics England PanelApp                                                        https://panelapp.genomicsengland.co.uk
ClinVar [55]                                                                     https://www.ncbi.nlm.nih.gov/clinvar
Human Gene Mutation Database (HGMD) [61]                                         http://www.hgmd.cf.ac.uk
DECIPHER [62]                                                                    http://decipher.sanger.ac.uk
Infevers (The registry of Hereditary Auto-inflammatory Disorders Mutations) [64] https://infevers.umai-montpellier.fr
LOVD [65]                                                                        http://www.lovd.nl
dbSNP [68]                                                                       http://www.ncbi.nlm.nih.gov/snp
dbVar [69]                                                                       http://www.ncbi.nlm.nih.gov/dbvar
Database of Genomic Variants [70]                                                http://dgv.tcag.ca/dgv/app/home
gnomAD (Genome aggregation database) [54]                                        https://gnomad.broadinstitute.org
1000 Genomes Project Phase 3 [71]                                                http://phase3browser.1000genomes.org
Exome Variant Server                                                             http://evs.gs.washington.edu/EVS
VarSome [72]                                                                     https://varsome.com
MARRVEL [73]                                                                     http://marrvel.org
Human Genome Variation Society                                                   https://varnomen.hgvs.org
Multi-symbol checker [56]                                                        https://www.genenames.org/tools/multi-symbol-checker
HGNC (HUGO Gene Nomenclature Committee) [56]                                     https://www.genenames.org
PubMed                                                                           https://ncbi.nlm.nih.gov/pubmed
KEGG PATHWAY Database [106]                                                      https://www.genome.jp/kegg/pathway.html
GTEx (The Genotype-Tissue Expression project) [81]                               https://gtexportal.org
Variant Effect Predictor [31]                                                    https://ensembl.org/info/docs/tools/vep/index.html
AnnoVar [32]                                                                     http://annovar.openbioinformatics.org/en/latest/
VarAFT [107]                                                                     https://varaft.eu/
dbNSFP [108]                                                                     https://sites.google.com/site/jpopgen/dbNSFP
snpEff [109]                                                                     http://snpeff.sourceforge.net/
Polyphen2 [79]                                                                   http://genetics.bwh.harvard.edu/pph2/
SIFT [80]                                                                        https://sift.bii.a-star.edu.sg/
CADD [45]                                                                        https://cadd.gs.washington.edu/
6.4 Strategy of Variant Prioritization
Following clinical sequencing, several different lines of variant
filtration are applied to each of the four types of variants: SNVs,
indels, repeats, and structural variants. Additional filters are
applied when variants are analyzed as part of a cohort rather than
from a singleton sample. Generally, the first step in SNV
prioritization is to exclude noncoding, synonymous, and “common”
population variants from the total pool of SNVs found in the affected
individual(s). Minor allele frequencies (MAF) are obtained from
databases such as gnomAD [54] or the Exome Sequencing Project (ESP),
and variants with a MAF above the threshold of 1% are considered
common. In some instances, collective exclusion of variants based on
absolute MAF cut-offs may remove pathogenic variants that are more
prevalent than expected (as in cases of the founder effect).
Similarly, mass exclusion of all synonymous variants ignores the
effects that these variants may have at the RNA level (e.g., altered
splicing). It is therefore important to go through the literature
about the condition under study in search of such exceptions
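The first-pass SNV filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, the consequence labels, and the `known_exceptions` argument (a hypothetical set of variant IDs gathered from the literature, such as founder variants or synonymous variants known to alter splicing) are not taken from the chapter or from any particular tool. Only the 1% MAF cut-off comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    variant_id: str
    consequence: str        # hypothetical labels, e.g. "missense", "synonymous", "intronic"
    maf: Optional[float]    # population MAF (e.g. from gnomAD); None if not observed

# Assumed set of labels treated as noncoding for this sketch.
NONCODING = {"intronic", "intergenic", "upstream", "downstream"}

def prioritize_snvs(variants, maf_threshold=0.01, known_exceptions=frozenset()):
    """Drop noncoding, synonymous, and common SNVs, but always keep
    variants that the literature flags as exceptions."""
    kept = []
    for v in variants:
        if v.variant_id in known_exceptions:
            kept.append(v)          # rescued regardless of MAF or consequence
            continue
        if v.consequence in NONCODING or v.consequence == "synonymous":
            continue
        if v.maf is not None and v.maf > maf_threshold:
            continue                # "common" population variant
        kept.append(v)
    return kept
```

Passing the exception list explicitly keeps the hard cut-offs simple while making the literature-derived rescues visible and auditable.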
References
1. Taupin D, Lam W, Rangiah D, McCallum L, Whittle B, Zhang Y, Andrews D, Field M, Goodnow CC, Cook MC (2015) A deleterious RNF43 germline mutation in a severely affected serrated polyposis kindred. Hum Genome Var 2:15013
2. Dunkerton S, Field M, Cho V, Bertram E, Whittle B, Groves A, Goel H (2015) A de novo mutation in KMT2A (MLL) in monozygotic twins with Wiedemann-Steiner syndrome. Am J Med Genet A 167A(9):2182–2187. https://doi.org/10.1002/ajmg.a.37130
3. van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C (2014) Ten years of next-generation sequencing technology. Trends Genet 30(9):418–426. https://doi.org/10.1016/j.tig.2014.07.001
4. Johar AS, Mastronardi C, Rojas-Villarraga A, Patel HR, Chuah A, Peng K, Higgins A, Milburn P, Palmer S, Silva-Lara MF, Velez JI, Andrews D, Field M, Huttley G, Goodnow C, Anaya JM, Arcos-Burgos M (2015) Novel and rare functional genomic variants in multiple autoimmune syndrome and Sjogren’s syndrome. J Transl Med 13:173. https://doi.org/10.1186/s12967-015-0525-x
5. Pabinger S, Dander A, Fischer M, Snajder R, Sperk M, Efremova M, Krabichler B, Speicher MR, Zschocke J, Trajanoski Z (2014) A survey of tools for variant analysis of next-generation genome sequencing data. Brief Bioinform 15(2):256–278. https://doi.org/10.1093/bib/bbs086
6. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120. https://doi.org/10.1093/bioinformatics/btu170
7. Patel RK, Jain M (2012) NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7(2):e30619. https://doi.org/10.1371/journal.pone.0030619
8. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760. https://doi.org/10.1093/bioinformatics/btp324
9. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359. https://doi.org/10.1038/nmeth.1923
10. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21. https://doi.org/10.1093/bioinformatics/bts635
11. Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14(4):R36. https://doi.org/10.1186/gb-2013-14-4-r36
12. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20(9):1297–1303. https://doi.org/10.1101/gr.107524.110
13. Lappalainen T, Scott AJ, Brandt M, Hall IM (2019) Genomic analysis in the age of human genome sequencing. Cell 177(1):70–84. https://doi.org/10.1016/j.cell.2019.02.032
14. Cantarel BL, Weaver D, McNeill N, Zhang J, Mackey AJ, Reese J (2014) BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity. BMC Bioinformatics 15:104. https://doi.org/10.1186/1471-2105-15-104
15. Sandmann S, Karimi M, de Graaf AO, Rohde C, Gollner S, Varghese J, Ernsting J, Walldin G, van der Reijden BA, Muller-Tidow C, Malcovati L, Hellstrom-Lindberg E, Jansen JH, Dugas M (2018) appreci8: a pipeline for precise variant calling integrating 8 tools. Bioinformatics 34(24):4205–4212. https://doi.org/10.1093/bioinformatics/bty518
16. Andrews TD, Jeelall Y, Talaulikar D, Goodnow CC, Field MA (2016) DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations. PeerJ 4:e2074. https://doi.org/10.7717/peerj.2074
17. Xu C, Gu X, Padmanabhan R, Wu Z, Peng Q, DiCarlo J, Wang Y (2019) smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers. Bioinformatics 35(8):1299–1309. https://doi.org/10.1093/bioinformatics/bty790
18. Chen J, Li X, Zhong H, Meng Y, Du H (2019) Systematic comparison of germline variant calling pipelines cross multiple next-generation sequencers. Sci Rep 9(1):9345. https://doi.org/10.1038/s41598-019-45835-3
19. Abyzov A, Li S, Kim DR, Mohiyuddin M, Stutz AM, Parrish NF, Mu XJ, Clark W, Chen K, Hurles M, Korbel JO, Lam HY, Lee C, Gerstein MB (2015) Analysis of deletion breakpoints from 1,092 humans reveals details of mutation mechanisms. Nat Commun 6:7256. https://doi.org/10.1038/ncomms8256
20. Monlong J, Cossette P, Meloche C, Rouleau G, Girard SL, Bourque G (2018) Human copy number variants are enriched in regions of low mappability. Nucleic Acids Res 46(14):7236–7249. https://doi.org/10.1093/nar/gky538
21. Abyzov A, Urban AE, Snyder M, Gerstein M (2011) CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 21(6):974–984. https://doi.org/10.1101/gr.114876.110
22. Cameron DL, Schroder J, Penington JS, Do H, Molania R, Dobrovic A, Speed TP, Papenfuss AT (2017) GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res 27(12):2050–2060. https://doi.org/10.1101/gr.222109.117
23. Quinlan AR, Clark RA, Sokolova S, Leibowitz ML, Zhang Y, Hurles ME, Mell JC, Hall IM (2010) Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome. Genome Res 20(5):623–635. https://doi.org/10.1101/gr.102970.109
24. Wang J, Mullighan CG, Easton J, Roberts S, Heatley SL, Ma J, Rusch MC, Chen K, Harris CC, Ding L, Holmfeldt L, Payne-Turner D, Fan X, Wei L, Zhao D, Obenauer JC, Naeve C, Mardis ER, Wilson RK, Downing JR, Zhang J (2011) CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods 8(8):652–654. https://doi.org/10.1038/nmeth.1628
25. Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y (2019) Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 20(1):117. https://doi.org/10.1186/s13059-019-1720-5
26. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MH, Konkel MK, Malhotra A, Stutz AM, Shi X, Casale FP, Chen J, Hormozdiari F, Dayama G, Chen K, Malig M, Chaisson MJP, Walter K, Meiers S, Kashin S, Garrison E, Auton A, Lam HYK, Mu XJ, Alkan C, Antaki D, Bae T, Cerveira E, Chines P, Chong Z, Clarke L, Dal E, Ding L, Emery S, Fan X, Gujral M, Kahveci F, Kidd JM, Kong Y, Lameijer EW, McCarthy S, Flicek P, Gibbs RA, Marth G, Mason CE, Menelaou A, Muzny DM, Nelson BJ, Noor A, Parrish NF, Pendleton M, Quitadamo A, Raeder B, Schadt EE, Romanovitch M, Schlattl A, Sebra R, Shabalin AA, Untergasser A, Walker JA, Wang M, Yu F, Zhang C, Zhang J, Zheng-Bradley X, Zhou W, Zichner T, Sebat J, Batzer MA, McCarroll SA, Genomes Project C, Mills RE, Gerstein MB, Bashir A, Stegle O, Devine SE, Lee C, Eichler EE, Korbel JO (2015) An integrated map of structural variation in 2,504 human genomes. Nature 526(7571):75–81. https://doi.org/10.1038/nature15394
27. Sudmant PH, Mallick S, Nelson BJ, Hormozdiari F, Krumm N, Huddleston J, Coe BP, Baker C, Nordenfelt S, Bamshad M, Jorde LB, Posukh OL, Sahakyan H, Watkins WS, Yepiskoposyan L, Abdullah MS, Bravi CM, Capelli C, Hervig T, Wee JT, Tyler-Smith C, van Driem G, Romero IG, Jha AR, Karachanak-Yankova S, Toncheva D, Comas D, Henn B, Kivisild T, Ruiz-Linares A, Sajantila A, Metspalu E, Parik J, Villems R, Starikovskaya EB, Ayodo G, Beall CM, Di Rienzo A, Hammer MF, Khusainova R, Khusnutdinova E, Klitz W, Winkler C, Labuda D, Metspalu M, Tishkoff SA, Dryomov S, Sukernik R, Patterson N, Reich D, Eichler EE (2015) Global diversity, population stratification, and selection of human copy-number variation. Science 349(6253):aab3761. https://doi.org/10.1126/science.aab3761
28. Novak P, Neumann P, Pech J, Steinhaisl J, Macas J (2013) RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29(6):792–793. https://doi.org/10.1093/bioinformatics/btt054
29. Dashnow H, Lek M, Phipson B, Halman A, Sadedin S, Lonsdale A, Davis M, Lamont P, Clayton JS, Laing NG, MacArthur DG, Oshlack A (2018) STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biol 19(1):121. https://doi.org/10.1186/s13059-018-1505-2
30. Pruitt KD, Maglott DR (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res 29(1):137–140. https://doi.org/10.1093/nar/29.1.137
31. McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F (2010) Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26(16):2069–2070. https://doi.org/10.1093/bioinformatics/btq330
32. Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38(16):e164. https://doi.org/10.1093/nar/gkq603
33. Anna A, Monika G (2018) Splicing mutations in human genetic disorders: examples, detection, and confirmation. J Appl Genet 59(3):253–268. https://doi.org/10.1007/s13353-018-0444-7
34. Miosge LA, Field MA, Sontani Y, Cho V, Johnson S, Palkova A, Balakishnan B, Liang R, Zhang Y, Lyon S, Beutler B, Whittle B, Bertram EM, Enders A, Goodnow CC, Andrews TD (2015) Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci U S A 112(37):E5189–E5198. https://doi.org/10.1073/pnas.1511585112
35. Johar AS, Anaya JM, Andrews D, Patel HR, Field M, Goodnow C, Arcos-Burgos M (2014) Candidate gene discovery in autoimmunity by using extreme phenotypes, next generation sequencing and whole exome capture. Autoimmun Rev 14(3):204–209. https://doi.org/10.1016/j.autrev.2014.10.021
36. Al Sukaiti N, AbdelRahman K, AlShekaili J, Al Oraimi S, Al Sinani A, Al Rahbi N, Cho V, Field M, Cook MC (2017) Agammaglobulinaemia despite terminal B-cell differentiation in a patient with a novel LRBA mutation. Clin Transl Immunol 6(5):e144
37. Cardinez C, Miraghazadeh B, Tanita K, da Silva E, Hoshino A, Okada S, Chand R, Asano T, Tsumura M, Yoshida K, Ohnishi H, Kato Z, Yamazaki M, Okuno Y, Miyano S, Kojima S, Ogawa S, Andrews TD, Field MA, Burgio G, Morio T, Vinuesa CG, Kanegane H, Cook MC (2018) Gain-of-function IKBKB mutation causes human combined immune deficiency. J Exp Med. https://doi.org/10.1084/jem.20180639
38. Jiang SH, Athanasopoulos V, Ellyard JI, Chuah A, Cappello J, Cook A, Prabhu SB, Cardenas J, Gu J, Stanley M, Roco JA, Papa I, Yabas M, Walters GD, Burgio G, McKeon K, Byers JM, Burrin C, Enders A, Miosge LA, Canete PF, Jelusic M, Tasic V, Lungu AC, Alexander SI, Kitching AR, Fulcher DA, Shen N, Arsov T, Gatenby PA, Babon JJ, Mallon DF, de Lucas CC, Stone EA, Wu P, Field MA, Andrews TD, Cho E, Pascual V, Cook MC, Vinuesa CG (2019) Functional rare and low frequency variants in BLK and BANK1 contribute to human lupus. Nat Commun 10(1):2201. https://doi.org/10.1038/s41467-019-10242-9
39. Kramer A, Green J, Pollard J Jr, Tugendreich S (2014) Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30(4):523–530. https://doi.org/10.1093/bioinformatics/btt703
40. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P, Jensen LJ, von Mering C (2017) The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 45(D1):D362–D368. https://doi.org/10.1093/nar/gkw937
41. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, Dent KM, Huff CD, Shannon PT, Jabs EW, Nickerson DA, Shendure J, Bamshad MJ (2010) Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42(1):30–35. https://doi.org/10.1038/ng.499
42. Yang Y, Muzny DM, Reid JG, Bainbridge MN, Willis A, Ward PA, Braxton A, Beuten J, Xia F, Niu Z, Hardison M, Person R, Bekheirnia MR, Leduc MS, Kirby A, Pham P, Scull J, Wang M, Ding Y, Plon SE, Lupski JR, Beaudet AL, Gibbs RA, Eng CM (2013) Clinical whole-exome sequencing for the diagnosis of mendelian disorders. N Engl J Med 369(16):1502–1511. https://doi.org/10.1056/NEJMoa1306555
43. Li MX, Gui HS, Kwan JS, Bao SY, Sham PC (2012) A comprehensive framework for prioritizing variants in exome sequencing studies of Mendelian diseases. Nucleic Acids Res 40(7):e53. https://doi.org/10.1093/nar/gkr1257
44. Kamphans T, Sabri P, Zhu N, Heinrich V, Mundlos S, Robinson PN, Parkhomchuk D, Krawitz PM (2013) Filtering for compound heterozygous sequence variants in non-consanguineous pedigrees. PLoS One 8(8):e70151. https://doi.org/10.1371/journal.pone.0070151
45. Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46(3):310–315. https://doi.org/10.1038/ng.2892
46. Paila U, Chapman BA, Kirchner R, Quinlan AR (2013) GEMINI: integrative exploration of genetic variation and genome annotations. PLoS Comput Biol 9(7):e1003153. https://doi.org/10.1371/journal.pcbi.1003153
47. Field MA, Cho V, Cook MC, Enders A, Vinuesa C, Whittle B, Andrews TD, Goodnow CC (2015) Reducing the search space for causal genetic variants with VASP: Variant Analysis of Sequenced Pedigrees. Bioinformatics 31(14):2377–2379. https://doi.org/10.1093/bioinformatics/btv135
48. Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, Loeb LA (2012) Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A 109(36):14508–14513. https://doi.org/10.1073/pnas.1208715109
49. Cummings BB, Marshall JL, Tukiainen T, Lek M, Donkervoort S, Foley AR, Bolduc V, Waddell LB, Sandaradura SA, O’Grady GL, Estrella E, Reddy HM, Zhao F, Weisburd B, Karczewski KJ, O’Donnell-Luria AH, Birnbaum D, Sarkozy A, Hu Y, Gonorazky H, Claeys K, Joshi H, Bournazos A, Oates EC, Ghaoui R, Davis MR, Laing NG, Topf A, Genotype-Tissue Expression C, Kang PB, Beggs AH, North KN, Straub V, Dowling JJ, Muntoni F, Clarke NF, Cooper ST, Bonnemann CG, MacArthur DG (2017) Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci Transl Med 9(386):eaal5209. https://doi.org/10.1126/scitranslmed.aal5209
50. Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, Craig DW (2016) Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 17(5):257–271. https://doi.org/10.1038/nrg.2016.10
51. Merker JD, Wenger AM, Sneddon T, Grove M, Zappala Z, Fresard L, Waggott D, Utiramerur S, Hou Y, Smith KS, Montgomery SB, Wheeler M, Buchan JG, Lambert CC, Eng KS, Hickey L, Korlach J, Ford J, Ashley EA (2018) Long-read genome sequencing identifies causal structural variation in a Mendelian disease. Genet Med 20(1):159–163. https://doi.org/10.1038/gim.2017.86
52. Field MA, Cho V, Andrews TD, Goodnow CC (2015) Reliably detecting clinically important variants requires both combined variant calls and optimized filtering strategies. PLoS One 10(11):e0143199. https://doi.org/10.1371/journal.pone.0143199
53. Waardenberg AJ, Field MA (2019) consensusDE: an R package for assessing consensus of multiple RNA-seq algorithms with RUV correction. PeerJ 7:e8206. https://doi.org/10.7717/peerj.8206
54. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP, Gauthier LD, Brand H, Solomonson M, Watts NA, Rhodes D, Singer-Berk M, Seaby EG, Kosmicki JA, Walters RK, Tashman K, Farjoun Y, Banks E, Poterba T, Wang A, Seed C, Whiffin N, Chong JX, Samocha KE, Pierce-Hoffman E, Zappala Z, O’Donnell-Luria AH, Vallabh Minikel E, Weisburd B, Lek M, Ware JS, Vittal C, Armean IM, Bergelson L, Cibulskis K, Connolly KM, Covarrubias M, Donnelly S, Ferriera S, Gabriel S, Gentry J, Gupta N, Jeandet T, Kaplan D, Llanwarne C, Munshi R, Novod S, Petrillo N, Roazen D, Ruano-Rubio V, Saltzman A, Schleicher M, Soto J, Tibbetts K, Tolonen C, Wade G, Talkowski ME, Neale BM, Daly MJ, MacArthur DG (2019) Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv:531210. https://doi.org/10.1101/531210
55. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR (2014) ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42(Database issue):D980–D985. https://doi.org/10.1093/nar/gkt1113
56. Braschi B, Denny P, Gray K, Jones T, Seal R, Tweedie S, Yates B, Bruford E (2019) Genenames.org: the HGNC and VGNC resources in 2019. Nucleic Acids Res 47(D1):D786–D792. https://doi.org/10.1093/nar/gky930
57. UniProt C (2008) The universal protein resource (UniProt). Nucleic Acids Res 36(Database issue):D190–D195. https://doi.org/10.1093/nar/gkm895
58. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA (2005) Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res 33(Database issue):D514–D517. https://doi.org/10.1093/nar/gki033
59. Mungall CJ, McMurry JA, Kohler S, Balhoff JP, Borromeo C, Brush M, Carbon S, Conlin T, Dunn N, Engelstad M, Foster E, Gourdine JP, Jacobsen JO, Keith D, Laraway B, Lewis SE, NguyenXuan J, Shefchek K, Vasilevsky N, Yuan Z, Washington N, Hochheiser H, Groza T, Smedley D, Robinson PN, Haendel MA (2017) The Monarch initiative: an integrative data and analytic platform connecting
Association for Molecular Pathology. Genet Med 17(5):405–424. https://doi.org/10.1038/gim.2015.30
75. Nykamp K, Anderson M, Powers M, Garcia J, Herrera B, Ho YY, Kobayashi Y, Patil N, Thusberg J, Westbrook M, Invitae Clinical Genomics G, Topper S (2017) Sherloc: a comprehensive refinement of the ACMG-AMP variant classification criteria. Genet Med 19(10):1105–1117. https://doi.org/10.1038/gim.2017.37
76. Field MA, Burgio G, Chuah A, Al Shekaili J, Hassan B, Al Sukaiti N, Foote SJ, Cook MC, Andrews TD (2019) Recurrent miscalling of missense variation from short-read genome sequence data. BMC Genomics 20(Suppl 8):546. https://doi.org/10.1186/s12864-019-5863-2
77. Kalia SS, Adelman K, Bale SJ, Chung WK, Eng C, Evans JP, Herman GE, Hufnagel SB, Klein TE, Korf BR, McKelvey KD, Ormond KE, Richards CS, Vlangos CN, Watson M, Martin CL, Miller DT (2017) Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet Med 19(2):249–255. https://doi.org/10.1038/gim.2016.190
78. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24–26. https://doi.org/10.1038/nbt.1754
79. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248–249. https://doi.org/10.1038/nmeth0410-248
80. Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC (2012) SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 40(Web Server issue):W452–W457. https://doi.org/10.1093/nar/gks539
81. Consortium GT (2013) The Genotype-Tissue Expression (GTEx) project. Nat Genet 45(6):580–585. https://doi.org/10.1038/ng.2653
82. Gelb BD, Cave H, Dillon MW, Gripp KW, Lee JA, Mason-Suares H, Rauen KA, Williams B, Zenker M, Vincent LM, ClinGen RWG (2018) ClinGen’s RASopathy Expert Panel consensus methods for variant interpretation. Genet Med 20(11):1334–1345. https://doi.org/10.1038/gim.2018.3
83. Romanet P, Odou MF, North MO, Saveanu A, Coppin L, Pasmant E, Mohamed A, Goudet P, Borson-Chazot F, Calender A, Beroud C, Levy N, Giraud S, Barlier A (2019) Proposition of adjustments to the ACMG-AMP framework for the interpretation of MEN1 missense variants. Hum Mutat 40(6):661–674. https://doi.org/10.1002/humu.23746
84. Kelly MA, Caleshu C, Morales A, Buchan J, Wolf Z, Harrison SM, Cook S, Dillon MW, Garcia J, Haverfield E, Jongbloed JDH, Macaya D, Manrai A, Orland K, Richard G, Spoonamore K, Thomas M, Thomson K, Vincent LM, Walsh R, Watkins H, Whiffin N, Ingles J, van Tintelen JP, Semsarian C, Ware JS, Hershberger R, Funke B (2018) Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s Inherited Cardiomyopathy Expert Panel. Genet Med 20(3):351–359. https://doi.org/10.1038/gim.2017.218
85. Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27(21):2987–2993. https://doi.org/10.1093/bioinformatics/btr509
86. Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L (2009) VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25(17):2283–2285. https://doi.org/10.1093/bioinformatics/btp373
87. Rimmer A, Phan H, Mathieson I, Iqbal Z, Twigg SRF, Consortium WGS, Wilkie AOM, McVean G, Lunter G (2014) Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat Genet 46(8):912–918. https://doi.org/10.1038/ng.3036
88. Kim S, Scheffler K, Halpern AL, Bekritsky MA, Noh E, Kallberg M, Chen X, Kim Y, Beyter D, Krusche P, Saunders CT (2018) Strelka2: fast and accurate calling of germline and somatic variants. Nat Methods 15(8):591–594. https://doi.org/10.1038/s41592-018-0051-x
89. Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, Johnson J, Dougherty B, Barrett JC, Dry JR (2016) VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res 44(11):e108. https://doi.org/10.1093/nar/gkw227
90. Wilm A, Aw PP, Bertrand D, Yeo GH, Ong SH, Wong CH, Khor CC, Petric R, Hibberd ML, Nagarajan N (2012) LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 40(22):11189–11201. https://doi.org/10.1093/nar/gks918
91. Rausch T, Zichner T, Schlattl A, Stutz AM, Benes V, Korbel JO (2012) DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28(18):i333–i339. https://doi.org/10.1093/bioinformatics/bts378
92. Layer RM, Chiang C, Quinlan AR, Hall IM (2014) LUMPY: a probabilistic framework for structural variant discovery. Genome Biol 15(6):R84. https://doi.org/10.1186/gb-2014-15-6-r84
93. Chen X, Schulz-Trieglaff O, Shaw R, Barnes B, Schlesinger F, Kallberg M, Cox AJ, Kruglyak S, Saunders CT (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32(8):1220–1222. https://doi.org/10.1093/bioinformatics/btv710
94. Marschall T, Costa IG, Canzar S, Bauer M, Klau GW, Schliep A, Schonhuth A (2012) CLEVER: clique-enumerating variant finder. Bioinformatics 28(22):2875–2882. https://doi.org/10.1093/bioinformatics/bts566
95. Chen K, Wallis JW, McLellan MD, Larson DE, Kalicki JM, Pohl CS, McGrath SD, Wendl MC, Zhang Q, Locke DP, Shi X, Fulton RS, Ley TJ, Wilson RK, Ding L, Mardis ER (2009) BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat Methods 6(9):677–681. https://doi.org/10.1038/nmeth.1363
96. Jiang Y, Wang Y, Brudno M (2012) PRISM: pair-read informed split-read mapping for base-pair level detection of insertion, deletion and structural variants. Bioinformatics 28(20):2576–2583. https://doi.org/10.1093/bioinformatics/bts484
97. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25(21):2865–2871. https://doi.org/10.1093/bioinformatics/btp394
98. Dolzhenko E, van Vugt J, Shaw RJ, Bekritsky MA, van Blitterswijk M, Narzisi G, Ajay SS, Rajan V, Lajoie BR, Johnson NH, Kingsbury Z, Humphray SJ, Schellevis RD, Brands WJ, Baker M, Rademakers R, Kooyman M, Tazelaar GHP, van Es MA, McLaughlin R, Sproviero W, Shatunov A, Jones A, Al Khleifat A, Pittman A, Morgan S, Hardiman O, Al-Chalabi A, Shaw C, Smith B, Neo EJ, Morrison K, Shaw PJ, Reeves C, Winterkorn L, Wexler NS, Group US-VCR, Housman DE, Ng CW, Li AL, Taft RJ, van den Berg LH, Bentley DR, Veldink JH, Eberle MA (2017) Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res 27(11):1895–1903. https://doi.org/10.1101/gr.225672.117
99. Willems T, Zielinski D, Yuan J, Gordon A, Gymrek M, Erlich Y (2017) Genome-wide profiling of heritable and de novo STR variations. Nat Methods 14(6):590–592. https://doi.org/10.1038/nmeth.4267
100. Tankard RM, Bennett MF, Degorski P, Delatycki MB, Lockhart PJ, Bahlo M (2018) Detecting expansions of tandem repeats in cohorts sequenced with short-read sequencing data. Am J Hum Genet 103(6):858–873. https://doi.org/10.1016/j.ajhg.2018.10.015
101. Tang H, Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E, Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C, Hicks B, Heckerman D, Och FJ, Caskey CT, Venter JC, Telenti A (2017) Profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes. Am J Hum Genet 101(5):700–715. https://doi.org/10.1016/j.ajhg.2017.09.013
102. Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27(2):573–580. https://doi.org/10.1093/nar/27.2.573
103. Mayer C, Leese F, Tollrian R (2010) Genome-wide analysis of tandem repeats in Daphnia pulex - a comparative approach. BMC Genomics 11:277. https://doi.org/10.1186/1471-2164-11-277
104. Keane TM, Wong K, Adams DJ (2013) RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics 29(3):389–390. https://doi.org/10.1093/bioinformatics/bts697
105. Wu J, Lee WP, Ward A, Walker JA, Konkel MK, Batzer MA, Marth GT (2014) Tangram: a comprehensive toolbox for mobile element insertion detection. BMC Genomics 15:795. https://doi.org/10.1186/1471-2164-15-795
106. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30. https://doi.org/10.1093/nar/28.1.27
Abstract
Next-generation sequencing (NGS) technology has revolutionized research in genetics and genomics,
producing massive amounts of NGS data and opening new fronts on unresolved questions in genetics. NGS data
are usually stored at three levels: image files, sequence tags, and aligned reads. The sizes of these types of
data usually range from several hundred gigabytes to several terabytes. Biostatisticians and
bioinformaticians typically work with the aligned NGS read count data (the last level of NGS data) for
data modeling and interpretation.
To home in on one use of NGS technology, researchers employ it to profile the whole genome and study
DNA copy number variations (CNVs) in an individual subject (or patient) as well as in groups of subjects
(or patients). The resulting aligned NGS read count data are then modeled with appropriate mathematical and
statistical approaches so that the loci of CNVs can be accurately detected. This chapter summarizes
the most widely used statistical methods for detecting CNVs using NGS data. The goal is to
provide readers with a comprehensive resource on the available statistical approaches for inferring DNA copy
number variations from NGS data.
Key words Bayesian analysis, CNVs, Information criterion, Likelihood ratio test, NGS reads, Read
counts, Read depth, Statistical change point analysis
Noam Shomron (ed.), Deep Sequencing Data Analysis, Methods in Molecular Biology, vol. 2243,
https://doi.org/10.1007/978-1-0716-1103-6_2, © Springer Science+Business Media, LLC, part of Springer Nature 2021
28 Jie Chen
3 Computational Approaches
3.1 The SegSeq Software Tool
A lung cancer NGS data set was analyzed in [7] using a local change-point method followed by a
merging procedure that joins adjacent segments. Specifically, for each tumor read mapping position, a
window is created by extending to the left and to the right, respectively, to include w reads, where
w is a pre-defined parameter. A log-ratio statistic D for the tumor position of interest is calculated
from the read counts of the tumor and normal samples on the two sides of that position within the
window. A p-value for change point detection is then computed from the D statistic. If the p-value is
smaller than a pre-defined threshold pinit, the tumor position of interest is added to the list of
potential breakpoints for the next step. The next step examines these potential breakpoints to reduce
the false discovery rate (FDR). Specifically, the whole chromosome or genome is segmented by the list
of potential breakpoints detected in the first step. Starting from the least significant breakpoint,
and using the read count data of the tumor and normal samples in the entire segments, which are often
longer than a window in the first step, the algorithm iteratively joins adjacent segments if the
p-value is greater than the pre-defined threshold pmerge. The algorithm is illustrated in Fig. 1
(i.e., Figure 2 (a), (b), (c) in [7]). A software tool, SegSeq [7], was developed to implement the
algorithm described above.
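The two steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SegSeq implementation. The text does not give formulas for D or its p-value, so the sketch assumes that D is the difference between the tumor/normal log-ratios on the two sides, that the p-value comes from a delta-method normal approximation, and that w reads are taken on each side of the position. The function names, the pseudo-count, and the arguments `p_init` and `p_merge` (standing in for the thresholds pinit and pmerge) are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def log_ratio_test(t_left, n_left, t_right, n_right, pseudo=0.5):
    """Compare tumor/normal log-ratios on two sides; return (D, p-value).

    ASSUMPTION: D is a difference of log-ratios and the p-value uses a
    delta-method normal approximation. SegSeq's exact formulas may differ.
    """
    tl, nl, tr, nr = (c + pseudo for c in (t_left, n_left, t_right, n_right))
    d = math.log(tr / nr) - math.log(tl / nl)
    se = math.sqrt(1 / tl + 1 / nl + 1 / tr + 1 / nr)
    p = math.erfc(abs(d) / (se * math.sqrt(2)))  # two-sided normal p-value
    return d, p


def count_in(reads, lo, hi):
    """Number of sorted read positions in the half-open range [lo, hi)."""
    return bisect_left(reads, hi) - bisect_left(reads, lo)


def candidate_breakpoints(tumor, normal, w, p_init):
    """Step 1: flag tumor positions whose two local windows differ (p < p_init)."""
    candidates = []
    for i in range(w, len(tumor) - w + 1):
        pos = tumor[i]
        # Left window: the w tumor reads before pos.
        # Right window: the read at pos and the next w - 1 tumor reads.
        n_left = count_in(normal, tumor[i - w], pos)
        n_right = bisect_right(normal, tumor[i + w - 1]) - bisect_left(normal, pos)
        _, p = log_ratio_test(w, n_left, w, n_right)
        if p < p_init:
            candidates.append(pos)
    return candidates


def merge_segments(tumor, normal, breakpoints, p_merge, start, end):
    """Step 2: repeatedly drop the least significant breakpoint while p > p_merge."""
    bps = sorted(set(breakpoints))
    while bps:
        edges = [start] + bps + [end]
        pvals = []
        for k in range(1, len(edges) - 1):
            lo, mid, hi = edges[k - 1], edges[k], edges[k + 1]
            # Test each breakpoint using all reads in its two adjacent segments.
            _, p = log_ratio_test(
                count_in(tumor, lo, mid), count_in(normal, lo, mid),
                count_in(tumor, mid, hi), count_in(normal, mid, hi),
            )
            pvals.append(p)
        worst = max(range(len(bps)), key=pvals.__getitem__)
        if pvals[worst] <= p_merge:
            break  # every remaining breakpoint is significant
        del bps[worst]  # join the two segments around this breakpoint
    return bps
```

Both functions expect `tumor` and `normal` as sorted lists of read start positions on one chromosome. The output of `candidate_breakpoints` is passed to `merge_segments` together with the chromosome's start and end coordinates. Recomputing every p-value after each merge keeps the sketch short at the cost of speed.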
TABLE OF CONTENTS
A
Adams, President John Quincy;
on First Treaty with Prussia: 229
Alabama, The; Confederate Cruiser: 51, 111
Allied Nations in War: 11
Alsace-Lorraine: 11
No Desire for French Annexation;
Linked with the German Empire;
German Character of: 12
General Rapp Demands Independence of;
Germans Deported from: 14
France Distrusts Her Own People in: 15
American Bearers of Foreign Titles: 27
“American Liberal, The”: 70
American School Children and Foreign Propaganda: 20
Americanization Committee of Massachusetts on;
Macaulay on George III;
King George Not Alone Responsible: 21
George Haven Putnam’s London Address: 22
Owen Wister in London “Times”: 23
Americans Not an English People: 16
William Elliot Griffis Quoted: 178-179
Prof. Albert B. Faust: 16
James Russell Lowell;
Douglas Campbell: 17
Scott Nearing: 18
James A. Garfield;
Charles E. Hughes: 19
Americans Saved from Tampico Mob by German Cruiser: 19
Armstead, Major George;
Defender of Ft. McHenry: 20
Astor, John Jacob;
American Pathfinder: 25
Atherton, Gertrude;
on Experience in Germany: 188
Atrocities, Belgian and French: 28
Melville E. Stone on: 29
Rev. J. F. Stillimans on;
London “Globe” on: 30
London “Universe” on;
John T. McCutcheon on;
Irvin S. Cobb on;
Emily S. Hobhouse on: 31
Rev. J. F. Matthews on: 32
Horace Green on;
Prof. Kellogg on;
Ernest P. Bicknell on: 33
American Correspondents on;
Premier Asquith Denies: 34
State Department Refuses Information on;
Church Authorities Investigate: 35
William K. Draper Quoted;
Why Created: 36
Same Stories Told in Civil War Period;
Post Office Department Prohibits Denial of: 37
B
Bancroft, George;
on Germans in American Revolution: 105
Negotiates Memorable Agreement with Bismarck: 38
Refers Vancouver Boundary Dispute to German Emperor;
Advises Friendship With Germany: 39
Baralong, English Pirate Ship: 39
Beck, James M.: 199
Becker, Alfred L., Deputy Attorney General of New York, Investigates German
Propaganda;
Investigated by Senator Reed: 71
Employed Ex-Convicts: 73
Becker, Prof. Carl L.;
on Composition of American People: 103
Berger, Mrs. Frances, Victim of Mob: 67
Berliner, Emile, Inventor of the Microphone: 40
Bernstorff, German Ambassador, Quotes Col. House: 131
Blaine, James G., Quotes English Sentiment During Civil War: 112
Blockade, “Illegal, Ineffective and Indefensible”: 42
Blue Laws of Virginia: 184
Boers, The;
English Treatment of: 40
“Bombing Maternity Hospitals”: 44
Brant, Indian Chief, Destroys German Settlements: 135, 175
C
Campbell, Douglas, on Composition of American People: 17
Carnegie, Andrew, on British-American Union: 197-8
Cavell, Edith, Executed by Germans;
Execution Justified by Col. E. R. West: 46
Chamberlain, Senator, Speech on English Threats: 74
Cheradame, Andre, French Propagandist, Conspires Against President
Wilson: 187
Christiansen, Hendrik, True Explorer of the Hudson River: 48
Clemenceau, Premier Georges, Blames France for War of 1870-71: 241
Cobb, Sanford H., Story of the Palatines: 104
Concord, The; Brought Germantown Settlers: 121
Concord Society, The;
Objects of: 47
Cramb, Prof. J. A., on Germany’s Lofty Spirit: 51
Cramps, Shipbuilders: 125
Creasy, Prof. E. S., on the German Race: 18
Creel and the Sisson Documents: 44
Cromberger, Johann: 45
Custer, General George A., a Hessian Descendant: 45
D
Daimler, Gottlieb, Inventor of the Gas Engine: 138
Danzig: 60, 85
DeKalb, Major General Johann von: 48
“Dial, The,” on French Propaganda: 187
Dillon, Dr. E. J., on Alsace-Lorraine: 11
Dorsheimer, Hon. William: 49
Dual Citizenship: 49
Dutch and German: 49
E
Earling, Albert J., Railway President: 50
Eckert, Thomas: 50
Election of 1916 and the League of Nations Covenants: 51
President Wilson’s Colloquy with Senator McCumber: 56
Foreign Minister Hanotaux Promised American Aid in 1914: 57
Eliot, Prof. Charles W.,
on German Civilization: 50
England Plundered American Commerce: 51
Refuses Loan to United States in Civil War: 110
Threatens United States Through Canada: 73
English Government Offers $8 for American Scalps: 136
View of Paul Jones: 139
First to Use Poison Gas: 192
Tribute to Germany’s Lofty Spirit: 51
Opinion of Prussians in 1815: 58
Investment in Confederate Bonds: 114
Propaganda in Public Schools: 20
White Book Justifies Invasion of Belgium: 207
Statesmen Denounce American Union: 113
“English-Speaking Union”: 198
Erzberger, Appeal to Conscience of America: 90
Espionage Act, Vote on: 58
How Administered: 59
Report of Civil Liberties Bureau;
New York “Sun” Quoted: 63
Friends of German Democracy;
Mrs. William Jay;
German Masons in New Jersey: 64
Exports and Imports in 1914: 58
F
Fisher, Admiral,
Justifies German Submarines: 212
Foreign Residents Assured as to their Investments: 230
Fourteen Points, The;
History of: 86
France’s Historic Relations with the United States: 76
Franklin, Benjamin: 80
Alarmed by German Immigration: 81
Praises German Population: 83
Frederick the Great and the American Colonies: 84
Prevents Russian Alliance with England Against Colonies;
Offers American Cruisers Refuge at Danzig: 85
Free Masons in New Jersey Against Language Edict: 64
Fresch, Hermann, Sulphur King: 224
Fricke, Albert Paul,
Tried for Treason and Acquitted: 67
Friends of German Democracy: 64
Fritchie, Barbara,
Immortalized by Whittier: 90
G
Gas, Poison, First Employed by English: 192
George III, a “German King”?: 20
Macaulay on: 21
George, Lloyd,
Denounces Atrocities Against Boers: 41
German American Captains of Industry: 94
German Element in American Life: 102
Mechanics in Jamestown Settlement: 91
In Virginia: 105
Moravians First Settlers in Ohio: 107
On Indian Border in Pennsylvania: 108
Settle Frankfort and Louisville, Ky: 109
Ardent patriots in Revolution: 105, 109, 175, 181
Early Western Border Occupied by: 108
Protest Against Slavery: 180
First Proclamation of Independence: 175
Praise for Their Republican Virtues: 180
In Civil War: 114
In Confederate Army: 120
Ideals of Liberty: 154
Women Spies Executed by French: 49
In American Art, Science and Literature: 91
Praised by Franklin: 83
Praised by Washington: 245
Praised by Jefferson: 141
First Newspapers: 91
George Bancroft on: 105
Subscriptions to Liberty Loan: 153
In Massachusetts Bay Colony: 156
Keeps Missouri in the Union: 159
German Emperor Decides Vancouver Boundary Dispute in Our Favor: 39
Germantown Settlement: 121
Germany; Why Strengthened Her Army: 124
Treatment of France After War of 1870-71: 90
Conduct During Civil War: 110
Buys $600,000,000 of Union Bonds: 111
Bancroft Quoted: 39
Sends Relief During Civil War: 90
Godfrey, Inventor of Quadrant: 178
Gould, B. A.;
Civil War Statistics: 115
Grey, Sir Edward,
on Humanity in War: 132
Griffis, Dr. William Elliot,
on German Element: 104
Early German Mechanics: 105
On Jacob Leisler: 146
On Teutonic Influence: 178-9
On Bay Colony Aristocracy: 181
On Confusing Germans with Dutch: 49
Guizot, on German Love of Liberty: 154
H
Hagner, Peter: 124
Haiman, Louis,
“Swordmaker of the Confederacy”: 227
Hanotaux, Foreign Minister,
on Assurances Given France in 1914 by American Ambassadors: 56
Harris, Frank,
on Germany and England: 155
Hartford Convention, The: 124
Hempel: 125
“Herald,” New York,
Urges Hanging of German Americans: 125
Hereshoffs and Cramps: 125
Herkimer, General Nicholas,
Hero of Oriskany: 125
Hervé, Gustave, on Alsace Lorraine: 12
On Poison Gas: 192
Hessians, The: 125
Swell Jackson’s Stonewall Brigade;
Where Settled: 129
General Custer, Descended from: 45
Hillegas, Michael,
First Treasurer of the United States: 129
Hitchcock, Senator Gilbert M.,
on Seizure of Alien Property: 232
House, Col. E. M.;
Reputed Author of “Philip Dru, Administrator”: 130
Influences President on Surrender of Saar Valley: 131
Friend of Lloyd George;
Attended School in England: 130
I
Ibanez, Vincente Blasco,
French Propaganda Agent: 185
Ideals of Liberty: 154
Illiteracy of Contending Countries: 132
Immigration: 132
Germantown: 177
Indians, Tories and German Settlements: 135
Invention of Telephone, Gas Engine,
Photographic Lenses, etc.: 138
“Issues and Events”: 69
J
Jaeger, Pastor,
Murdered for Being German: 67
Jay, Mrs. William,
Leads Campaign to Suppress German Music: 64
Jefferson, Thomas,
on German Immigrants: 141
On English Hyphenates: 140
On Virginia Blue Laws: 184
On Longing for an English King: 24
Jones, John Paul;
English View of: 139
K
Kapp, Frederich,
History of American People: 102-4
King, Senator, of Utah,
Bill Canceling Charter of the German American Alliance: 69
Knobel, Caspar,
Captures Jefferson Davis: 142
Knownothing Party: 142
Koerner, Gustav,
on Political Character of German Americans: 143
Krech, Alvin W.:
Kudlich, Dr. Hans,
the Peasant Emancipator: 143
L
Langlotz, Prof. C. A.,
Author of “Old Nassau”: 145
Lee, Lighthouse Harry: 148
Lehman, Philip Theodore,
William Penn’s Secretary: 145
Lehmann, Frederick William: 145
Leisler, Jacob,
First Martyr to Cause of American Independence: 145
Lieber, Francis: 146
Founder, “Encyclopedia Americana”: 147
Legal Advisor to Lincoln Government;
Author of “Instructions for the Armies in the Field”: 148
Lincoln, Abraham,
of German Extraction?: 148
London “Times” in 1862: 113
Long, Frances L.,
One of Custer’s Sergeants and Survivor Greeley Arctic Expedition: 152
Lossing, Benson J.,
on Our Debt to France: 77
On Jacob Leisler: 146
On Conrad Weiser: 245
Lowell, James Russell;
American People Not English: 17
Ludwig, Christian,
Purveyor of the Revolutionary Army: 153
M
Macaulay, Lord,
on German Immigrant Settlers: 104
On George III: 21
Marix, Rear Admiral Adolph: 156
Massow, Baron von,
Member of Mosby’s Brigade: 156
McCarthy, Justin,
on Cruise of the Alabama;
Recognition of Confederacy: 111
On Schleswig-Holstein Question: 210
McCumber, Senator,
Asks President About Our Entrance Into the War: 56
McNeill, Walter S.,
on German Constitution: 155
On German Civil Law: 157
Memminger, Christoph Gustav,
Secretary of the Treasury in the Confederate Cabinet: 157
Menken, S. Stanwood,
Organizer and President National Security League: 171-2
Mergenthaler, Ottmar,
Inventor of the Linotype Machine: 157
Military Establishments of the Warring Nations in 1914: 157
Minnewit, Peter,
Purchased Island of Manhattan from Indians: 158
Missouri, How Kept in the Union: 159
Montesquieu, on Birth of Liberty: 154
Morgan, J. Pierpont: 158
Related to Viscount Lewis Harcourt: 159
Accused in Congress of Controlling Press: 190
Muhlenberg, Heinrich Melchior, Founder Lutheran Church in America;
Frederick August, First Speaker House of Representative;
Peter, General; Career of: 161
N
Nagel, Charles,
Secretary of Commerce and Labor: 169
Nast, Thomas,
America’s Greatest Cartoonist;
Kills the Tweed Ring;
Grant’s Opinion of: 169
National Security League;
Objects of, Backers of: 169
Representative Cooper of Wisconsin on: 170
Interference with New York Public Schools: 171
How Organized; Disbursements by: 172
Denounced in Congress: 171-2
Neutrality; President Wilson on,
in Mexican Relations: 172
New Ulm Massacre: 173
Northcliffe, Lord;
Control of American Newspapers: 174
O
Ohio; Germans First to Settle,
First White Child in: 107
Orth, Charles D.,
President National Security League: 171-2
Osterhaus, General Peter Joseph,
Record in Union Army: 174
His Pension Canceled: 175
Overman Bill: 54
P
Palatines, the;
Sanford H. Cobb on: 104
Judge Benton Quoted: 105
Declaration of Independence Antedates that of Mecklenburg: 175
Its Signers: 176-7
Panin, Count Nikolai I, Russian Premier,
Bribed by Frederick the Great: 85
Pastorius, Franz Daniel,
Founder of Germantown: 121, 177
Agitation Against Unveiling of Monument to: 179
Author of First Protest Against Slavery: 180
Pathfinders, German American: 191
Penn, William, and Crefeld Immigrants: 121
His Mother a Dutch Woman: 193
Pennypacker, Ex-Governor Samuel Whitaker: 121
Pilgrim Society: 193
Pitcher, Molly;
Famous Heroine of German Descent: 190
Poison Gas; First Used at Colenso;
French Testimony: 192
Prager, Robert B.,
Lynched by Anti-German Mob: 67
Press Attacked in Congress: 190
Propaganda in the United States: 185
Vincente Blasco Ibanez, French Agent: 185
Louis Tracy, English Agent;
How Conducted: 186
French Described by “The Dial;”
Andre Cheradame: 187
Overman Committee;
Gertrude Atherton: 188
Prussia, First Treaty with: 229
Prussian Constitution,
Praised by President Wilson: 156
Puritans; Land in 1620;
Great Migration; Freemen;
Hang Quakers and Witches;
Blue Laws: 184
Putnam, George Haven,
Repudiates the American Revolution;
Proposes to Rewrite Text Books of American History in Public
Schools: 22
Regrets American Independence from England: 23
Q
Quakers Hanged in Bay Colony: 184
Quitman, General J. A.,
in Mexican War: 194
R
Rassieur, Leo: 205
Reis, Philipp, Inventor of the Telephone: 139
Representation in Congress: 194
Rhodes, Cecil; Text of Secret Will to Reclaim the United States: 195
Sinclair Kennedy, on Plan: 196-7
Whitelaw Reid, on Unity with English Government: 196
Andrew Carnegie, on British-American Union; Rhodes Scholarships: 197
General Pershing’s Statement; James M. Beck’s Statement: 199
Admiral Sims’s Guildhall Speech; New York “Globe” Quotes Ambassador
Page: 200
Prof. Roland G. Usher, on Secret Understanding; Colonial Secretary
Chamberlain Quoted: 201
Joseph H. Choate’s Toast to the King: 202
Ringling, Al: 203, 207
Rittenhouse, David, First Great American Scientist: 204
Roebling, John August, Famous Bridge Builder: 205
Roosevelt, Theodore: 205
Russia Approached by England for Alliance Against the Colonies: 85
S
Sauer, Christopher,
Famous Colonial Printer: 217
Scheffauer, Herman George,
American Poet: 215
Schell, Johann Christian,
An Episode of the Early Border: 215
Schleswig-Holstein,
“One and Indivisible”: 209
Wish to be German;
Revolution Against Denmark, 1848: 210
Cradle of Purest Germanism: 211
Total Danish-Speaking Population in Germany: 212
Schley, Admiral Winfield Scott;
Rescue of Lt. Greeley: 216
Schreiner, George A.,
on American Passport Discriminations: 66
On Use of Poison Gas at Colenso: 192
On Lusitania Sinking: 242
Schurz, Carl,
on German Revolution of 1848: 214
On German Element in the United States: 102
Scraps of Paper: 208
Secret Treaties: 89
Seward, Secretary William H.,
Expresses Thanks to Prussia: 112
Slavery, First Protest Against: 180
Starving Germany;
Result of, and Casualties: 217
State Department Note of Assurance, February 8, 1917: 230
Steinmetz, Charles P.,
Famous Electrician: 217
Steuben, Baron Frederick von: 220
Sutter, the Romance of a California Pioneer: 225
First to Hoist American Flag to Stay;
Founds New Switzerland on Sacramento River;
Alvarado Land Grant: 225
Sides with Santa Anna;
Lays Out Town of Sutterville, now Sacramento;
Visited by Major Fremont;
Hoists the American Flag on His Fort;
Gold Discovered on His Ranch by Marshall: 226
Sutter Ruined;
Dies Poor in Pennsylvania;
Tribute to: 227
“Swordmaker of the Confederacy”: 227
T
Taft, William H., on Religious Intolerance: 185
Praises Kaiser: 208
“Times,” London, Denounces United States: 113
Advocates British Propaganda in the United States: 24
Titled Americans: 27
Tolstoy on American Liberty: 228
Tracy, Louis, Head of English Propaganda Bureau: 186
Treaties of 1799 and 1828, with Germany: 229-30
Treaty, Commercial, with Germany, and How Observed; President John
Quincy Adams on First Treaty; Treaties of 1799-1828: 229
State Department Assures Foreign Residents: 230
Alien Custodianship Aired in Congress; Senator Hitchcock’s Momentous
Statement; President Wilson’s Remarks of April 2, 1917; List of
Persons Whose Property Was Seized: 232
Property of Wives of Aliens Seized: 233
Tryon County Committee of Safety: 175
U
Usher, Prof. Roland G.,
on “Understanding” with England: 200-2
V
Viereck, George Sylvester: 71, 92
Villard, Henry: 236
Virginia Blue Laws: 184
Vote on War in Congress: 236
W
War of 1870-71: 240
War Lies Repudiated by English Paper: 241
Washington’s Body Guard: 244
Tribute to Germans: 245
Weiser, Conrad,
Pioneer and Statesman: 245
West, Col. E. R.,
Justifies Execution of Edith Cavell: 46
Wetzel, Lou, Indian Fighter: 246
Whittier, John Greenleaf,
Poem on Germantown Settlement: 180
Williams, Senator John Sharp,
on Fighting Canada: 76
Wilson, Woodrow, President;
on Our Debt to France: 78
On His Fourteen Points: 88
Friendship for German People: 90
German Intellectualism, 1917 and 1919: 155
Praises Prussian Constitution: 156
On “Best Practices of Nations”: 172
Wirt, William,
Famous Jurist and Author: 247
Wirtz, Captain Henry,
of Andersonville Prison: 247
Wistar, Caspar: 247
Z
Zane, Elizabeth,
Early Border Heroine: 248
Zeisberger, David,
Founds First Christian Community in Ohio: 107
Zenger, John Peter,
and the Freedom of the Press: 250
Ziegler, David,
Revolutionary Soldier and Indian Fighter: 248
Transcriber’s Notes
The following corrections have been made in the text:
1. ‘inferference’ replaced with ‘interference’
2. ‘liberatarian’ replaced with ‘libertarian’
3. ‘have’ replaced with ‘gave’
4. ‘spech’ replaced with ‘speech’
5. ‘boks’ replaced with ‘books’
6. ‘correspondenece’ replaced with ‘correspondence’
8. ‘Nocosian’ replaced with ‘Nicosian’
9. ‘tradegy’ replaced with ‘tragedy’
10. ‘Scandanavia’ replaced with ‘Scandinavia’
11. ‘compells’ replaced with ‘compels’
12. ‘Minnewitt’ replaced with ‘Minnewit’
13. ‘resul’ replaced with ‘result’
14. ‘Dalmation’ replaced with ‘Dalmatian’
15. ‘imigrants’ replaced with ‘immigrants’
17. ‘Heidelburg’ replaced with ‘Heidelberg’
18. ‘feed’ replaced with ‘feet’
19. ‘parishoners’ replaced with ‘parishioners’
20. ‘Gregoty’ replaced with ‘Gregory’
21. ‘volunters’ replaced with ‘volunteers’
22. ‘Gettsyburg’ replaced with ‘Gettysburg’
23. ‘Bushbeck’ replaced with ‘Buschbeck’ for consistency
24. ‘Schimmelpfenning’ replaced with ‘Schimmelpfennig’
26. ‘Hannover’ replaced with ‘Hanover’
27. ‘filbuster’ replaced with ‘filibuster’
28. ‘Thones’ replaced with ‘Thonas’
29. ‘proclaimng’ replaced with ‘proclaiming’
30. ‘Herreshoffs’ replaced with ‘Hereshoffs’
31. illegible numbers in table replaced with ‘?’
32. ‘Genessee’ replaced with ‘Genesee’
33. ‘bloodpath’ replaced with ‘bloodbath’
35. ‘Hobokon’ replaced with ‘Hoboken’
36. ‘sudents’ replaced with ‘students’
37. ‘lond’ replaced with ‘long’
38. ‘Wurtemburg’ replaced with ‘Wurtemberg’
39. ‘thy’ replaced with ‘they’
40. ‘McNeil’ replaced with ‘McNeill’
41. ‘rubel’ replaced with ‘ruble’
42. ‘Daughers’ replaced with ‘Daughters’
44. ‘Saurs’ replaced with ‘Sauers’
45. ‘Saur’ replaced with ‘Sauer’
46. ‘bigoty’ replaced with ‘bigotry’
47. ‘American’ replaced with ‘America’
48. ‘American’ replaced with ‘Americans’
49. ‘Annabaptists’ replaced with ‘Anabaptists’
50. ‘patriotiotic’ replaced with ‘patriotic’
51. ‘centennary’ replaced with ‘centenary’
53. ‘Amerca’ replaced with ‘America’
54. ‘Poachim’ replaced with ‘Joachim’
55. ‘northermost’ replaced with ‘northernmost’
56. ‘ostenibly’ replaced with ‘ostensibly’
57. ‘Palmertson’ replaced with ‘Palmerston’
58. ‘barels’ replaced with ‘barrels’
59. ‘illegel’ replaced with ‘illegal’
60. ‘sonsidered’ replaced with ‘considered’
62. ‘Macauley’ replaced with ‘Macaulay’
63. ‘40’ replaced with ‘184’
64. ‘24’ replaced with ‘125’
65. ‘121’ replaced with ‘39’
66. ‘39’ replaced with ‘121’
67. ‘125’ replaced with ‘135’
68. ‘153’ replaced with ‘17’
69. ‘Moseby’ replaced with ‘Mosby’
71. ‘Montesqieu’ replaced with ‘Montesquieu’
72. ‘Fench’ replaced with ‘French’ (French Testimony)
73. ‘Amehican’ replaced with ‘American’
74. ‘216’ replaced with ‘208’