Download complete Bioinformatics and Functional Genomics 3ed. Edition Jonathan Pevsner ebook PDF file all chapters

The document provides information about the third edition of 'Bioinformatics and Functional Genomics' by Jonathan Pevsner, including download links and ISBN details. It outlines the book's structure, covering topics such as DNA, RNA, and protein sequence analysis, genomewide analysis, and functional genomics. Additionally, it offers links to other related ebooks available for download.

Uploaded by

buthesaffouq

Visit https://ebookfinal.com to download the full version and explore more ebooks

Bioinformatics and Functional Genomics 3ed. Edition Jonathan Pevsner

_____ Click the link below to download _____


https://ebookfinal.com/download/bioinformatics-and-functional-genomics-3ed-edition-jonathan-pevsner/

Explore and download more ebooks at ebookfinal.com


Here are some suggested products you might be interested in.
Click the link to download

Biotechnology Genomics and Bioinformatics 2nd Edition Christoph W. Sensen

https://ebookfinal.com/download/biotechnology-genomics-and-bioinformatics-2nd-edition-christoph-w-sensen/

Functional Genomics Methods and Protocols 2nd Edition Jingfen Zhang

https://ebookfinal.com/download/functional-genomics-methods-and-protocols-2nd-edition-jingfen-zhang/

Plant Functional Genomics 1st Edition Meizhong Luo Phd

https://ebookfinal.com/download/plant-functional-genomics-1st-edition-meizhong-luo-phd/

Genomics Proteomics and Metabolomics in Nutraceuticals and Functional Foods Second Edition Bagchi

https://ebookfinal.com/download/genomics-proteomics-and-metabolomics-in-nutraceuticals-and-functional-foods-second-edition-bagchi/

Designing for Emerging Technologies UX for Genomics Robotics and the Internet of Things 1st Edition Jonathan Follett

https://ebookfinal.com/download/designing-for-emerging-technologies-ux-for-genomics-robotics-and-the-internet-of-things-1st-edition-jonathan-follett/

Ion Channels From Atomic Resolution Physiology to Functional Genomics No 245 Novartis Foundation Symposia 1st Edition Novartis Foundation (Author)

https://ebookfinal.com/download/ion-channels-from-atomic-resolution-physiology-to-functional-genomics-no-245-novartis-foundation-symposia-1st-edition-novartis-foundation-author/

Calendrical Calculations 3ed. Edition Dershowitz N.

https://ebookfinal.com/download/calendrical-calculations-3ed-edition-dershowitz-n/

The vitamins 3ed Edition Combs G.F.Jr.

https://ebookfinal.com/download/the-vitamins-3ed-edition-combs-g-f-jr/

Dementia 3Ed 3rd Edition David Ames

https://ebookfinal.com/download/dementia-3ed-3rd-edition-david-ames/
Bioinformatics and Functional Genomics 3ed. Edition
Jonathan Pevsner Digital Instant Download
Author(s): Jonathan Pevsner
ISBN(s): 9781118581780, 1118581784
Edition: 3ed.
File Details: PDF, 26.50 MB
Year: 2015
Language: English
BIOINFORMATICS AND
FUNCTIONAL GENOMICS
third edition

Jonathan Pevsner
Department of Neurology, Kennedy Krieger Institute,
Baltimore, Maryland, USA
and
Department of Psychiatry and Behavioral Sciences,
The Johns Hopkins School of Medicine, Baltimore,
Maryland, USA
This edition first published 2015 © 2015 by John Wiley & Sons Inc
Registered office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex,
PO19 8SQ, UK
Editorial offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
111 River Street, Hoboken, NJ 07030-5774, USA
For details of our global editorial offices, for customer services and for information about how
to apply for permission to reuse the copyright material in this book please see our website at
www.wiley.com/wiley-blackwell.
The right of the author to be identified as the author of this work has been asserted in accordance
with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the
prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks.
All brand names and product names used in this book are trade names, service marks, trademarks
or registered trademarks of their respective owners. The publisher is not associated with any
product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author(s) have used their
best efforts in preparing this book, they make no representations or warranties with respect to
the accuracy or completeness of the contents of this book and specifically disclaim any implied
warranties of merchantability or fitness for a particular purpose. It is sold on the understanding
that the publisher is not engaged in rendering professional services and neither the publisher nor
the author shall be liable for damages arising herefrom. If professional advice or other expert
assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Pevsner, Jonathan, 1961- , author.
Bioinformatics and functional genomics / Jonathan Pevsner.—Third edition.
   p. ; cm.
 Includes bibliographical references and indexes.
 ISBN 978-1-118-58178-0 (cloth)
 I. Title.
[DNLM: 1. Computational Biology—methods. 2. Genomics. 3. Genetic
Techniques. 4. Proteomics. QU 26.5]
QH441.2
572.8′6–dc23
2015014465
A catalogue record for this book is available from the British Library.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in
print may not be available in electronic books.
The cover image is by Leonardo da Vinci, a study of a man in profile with studies of horse and
riders (reproduced with kind permission of the Gallerie d’Accademia, Venice, Ms. 7r [236r], pen,
black and red chalk). To the upper right a DNA molecule is shown (image courtesy of Wikimedia
Commons) and a protein (human serum albumin, the most abundant protein in blood plasma,
accession 1E7I, visualized with Cn3D software described in Chapter 13). Leonardo’s text reads:
“From the eyebrow to the junction of the lip with the chin, and the angle of the jaw and the upper
angle where the ear joins the temple will be a perfect square. And each side by itself is half the
head. The hollow of the cheek bone occurs half way between the tip of the nose and the top of
the jaw bone, which is the lower angle of the setting on of the ear, in the frame here represented.
From the angle of the eye-socket to the ear is as far as the length of the ear, or the third of the
face.” (Translation by Jean-Paul Richter, The Notebooks of Leonardo da Vinci, London, 1883.)
Set in Times LT Std 10.5/13 by Aptara, India
Printed in Singapore
1 2015
For three generations of family: to my parents
Aihud and Lucille; to my wife Barbara; to my daughters Kim,
Ava, and Lillian; and to my niece Madeline
Contents in Brief

Part I Analyzing DNA, RNA, and Protein Sequences


1 Introduction, 3
2 Access to Sequence Data and Related Information, 19
3 Pairwise Sequence Alignment, 69
4 Basic Local Alignment Search Tool (BLAST), 121
5 Advanced Database Searching, 167
6 Multiple Sequence Alignment, 205
7 Molecular Phylogeny and Evolution, 245

Part II Genomewide Analysis of DNA, RNA, and Protein


8 DNA: The Eukaryotic Chromosome, 307
9 Analysis of Next-Generation Sequence Data, 377
10 Bioinformatic Approaches to Ribonucleic Acid (RNA), 433
11 Gene Expression: Microarray and RNA-seq Data Analysis, 479
12 Protein Analysis and Proteomics, 539
13 Protein Structure, 589
14 Functional Genomics, 635

Part III Genome Analysis


15 Genomes Across the Tree of Life, 699
16 Completed Genomes: Viruses, 755
17 Completed Genomes: Bacteria and Archaea, 797
18 Eukaryotic Genomes: Fungi, 847
19 Eukaryotic Genomes: From Parasites to Primates, 887
20 Human Genome, 957
21 Human Disease, 1011
GLOSSARY, 1075
Self-Test Quiz: Solutions, 1103
Author Index, 1105
Subject Index, 1109

Contents

Preface to the third edition, xxxi


About the Companion Website, xxxiii

Part I Analyzing DNA, RNA, and Protein Sequences


1 Introduction, 3
Organization of the Book, 4
Bioinformatics: The Big Picture, 5
A Consistent Example: Globins, 6
Organization of the Chapters, 8
Suggestions for Students and Teachers: Web Exercises, Find-a-Gene, and
Characterize-a-Genome, 9
Bioinformatics Software: Two Cultures, 10
Web-Based Software, 11
Command-Line Software, 11
Bridging the Two Cultures, 12
New Paradigms for Learning Programming for Bioinformatics, 13
Reproducible Research in Bioinformatics, 14
Bioinformatics and Other Informatics Disciplines, 15
Advice for Students, 15
Suggested Reading, 15
References, 16

2 Access to Sequence Data and Related Information, 19


Introduction to Biological Databases, 19
Centralized Databases Store DNA Sequences, 20
Contents of DNA, RNA, and Protein Databases, 24
Organisms in GenBank/EMBL-Bank/DDBJ, 24
Types of Data in GenBank/EMBL-Bank/DDBJ, 26
Genomic DNA Databases, 27
DNA-Level Data: Sequence-Tagged Sites (STSs), 27
DNA-Level Data: Genome Survey Sequences (GSSs), 27
DNA-Level Data: High-Throughput Genomic Sequence (HTGS), 27
RNA Data, 27
RNA-Level Data: cDNA Databases Corresponding to Expressed Genes, 27
RNA-Level Data: Expressed Sequence Tags (ESTs), 28
RNA-Level Data: UniGene, 28

Access to Information: Protein Databases, 29


UniProt, 31
Central Bioinformatics Resources: NCBI and EBI, 31
Introduction to NCBI, 31
The European Bioinformatics Institute (EBI), 32
Ensembl, 34
Access to Information: Accession Numbers to Label and Identify
Sequences, 34
The Reference Sequence (RefSeq) Project, 36
RefSeqGene and the Locus Reference Genomic Project, 37
The Consensus Coding Sequence (CCDS) Project, 37
The Vertebrate Genome Annotation (VEGA) Project, 37
Access to Information via Gene Resource at NCBI, 38
Relationship Between NCBI Gene, Nucleotide, and Protein Resources, 41
Comparison of NCBI’s Gene and UniGene, 41
NCBI’s Gene and HomoloGene, 42
Command-Line Access to Data at NCBI, 42
Using Command-Line Software, 42
Accessing NCBI Databases with EDirect, 45
EDirect Example 1, 46
EDirect Example 2, 46
EDirect Example 3, 46
EDirect Example 4, 47
EDirect Example 5, 48
EDirect Example 6, 48
EDirect Example 7, 48
Access to Information: Genome Browsers, 49
Genome Builds, 49
The University of California, Santa Cruz (UCSC) Genome Browser, 50
The Ensembl Genome Browser, 50
The Map Viewer at NCBI, 52
Examples of How to Access Sequence Data: Individual Genes/Proteins, 52
Histones, 52
HIV-1 pol, 53
How to Access Sets of Data: Large-Scale Queries of Regions and Features, 54
Thinking About One Gene (or Element) Versus Many Genes (Elements), 54
The BioMart Project, 54
Using the UCSC Table Browser, 54
Custom Tracks: Versatility of the BED File, 56
Galaxy: Reproducible, Web-Based, High-Throughput Research, 57
Access to Biomedical Literature, 58
Example of PubMed Search, 59
Perspective, 59
Pitfalls, 60
Advice for Students, 60

Web Resources, 60
Discussion Questions, 61
Problems/Computer Lab, 61
Self-Test Quiz, 63
Suggested Reading, 64
References, 64

3 Pairwise Sequence Alignment, 69


Introduction, 69
Protein Alignment: Often More Informative than DNA Alignment, 70
Definitions: Homology, Similarity, Identity, 70
Gaps, 78
Pairwise Alignment, Homology, and Evolution of Life, 78
Scoring Matrices, 79
Dayhoff Model Step 1 (of 7): Accepted Point Mutations, 79
Dayhoff Model Step 2 (of 7): Frequency of Amino Acids, 79
Dayhoff Model Step 3 (of 7): Relative Mutability of Amino Acids, 80
Dayhoff Model Step 4 (of 7): Mutation Probability Matrix for the
 Evolutionary Distance of 1 PAM, 82
Dayhoff Model Step 5 (of 7): PAM250 and Other PAM Matrices, 84
Dayhoff Model Step 6 (of 7): From a Mutation Probability Matrix to a
Relatedness Odds Matrix, 88
Dayhoff Model Step 7 (of 7): Log-Odds Scoring Matrix, 89
Practical Usefulness of PAM Matrices in Pairwise Alignment, 91
Important Alternative to PAM: BLOSUM Scoring Matrices, 91
Pairwise Alignment and Limits of Detection: The “Twilight Zone”, 94
Alignment Algorithms: Global and Local, 96
Global Sequence Alignment: Algorithm of Needleman and Wunsch, 96
Step 1: Setting Up a Matrix, 96
Step 2: Scoring the Matrix, 97
Step 3: Identifying the Optimal Alignment, 99
Local Sequence Alignment: Smith and Waterman Algorithm, 101
Rapid, Heuristic Versions of Smith–Waterman: FASTA and BLAST, 103
Basic Local Alignment Search Tool (BLAST), 104
Pairwise Alignment with Dotplots, 104
The Statistical Significance of Pairwise Alignments, 106
Statistical Significance of Global Alignments, 106
Statistical Significance of Local Alignments, 108
Percent Identity and Relative Entropy, 108
Perspective, 110
Pitfalls, 112
Advice for Students, 112
Web Resources, 112
Discussion Questions, 113
Problems/Computer Lab, 113

Self-Test Quiz, 114


Suggested Reading, 115
References, 116

4 Basic Local Alignment Search Tool (BLAST), 121


Introduction, 121
BLAST Search Steps, 124
Step 1: Specifying Sequence of Interest, 124
Step 2: Selecting BLAST Program, 124
Step 3: Selecting a Database, 126
Step 4a: Selecting Optional Search Parameters, 127
Step 4b: Selecting Formatting Parameters, 132
Stand-Alone BLAST, 135
BLAST Algorithm Uses Local Alignment Search Strategy, 138
BLAST Algorithm Parts: List, Scan, Extend, 138
BLAST Algorithm: Local Alignment Search Statistics and E Value, 141
Making Sense of Raw Scores with Bit Scores, 143
BLAST Algorithm: Relation Between E and p Values, 143
BLAST Search Strategies, 145
General Concepts, 145
Principles of BLAST Searching, 146
How to Evaluate the Significance of Results, 146
How to Handle Too Many Results, 150
How to Handle Too Few Results, 150
BLAST Searching with Multidomain Protein: HIV-1 Pol, 151
Using BLAST for Gene Discovery: Find-a-Gene, 155
Perspective, 159
Pitfalls, 160
Advice for Students, 160
Web Resources, 160
Discussion Questions, 160
Problems/Computer Lab, 160
Self-Test Quiz, 161
Suggested Reading, 162
References, 163

5 Advanced Database Searching, 167


Introduction, 167
Specialized BLAST Sites, 168
Organism-Specific BLAST Sites, 168
Ensembl BLAST, 168
Wellcome Trust Sanger Institute, 170
Specialized BLAST-Related Algorithms, 170
WU BLAST 2.0, 170
European Bioinformatics Institute (EBI), 170

Specialized NCBI BLAST Sites, 170


BLAST of Next-Generation Sequence Data, 170
Finding Distantly Related Proteins: Position-Specific Iterated BLAST
(PSI-BLAST) and DELTA-BLAST, 171
PSI-BLAST Errors: Problem of Corruption, 177
Reverse Position-Specific BLAST, 177
Domain Enhanced Lookup Time Accelerated BLAST (DELTA-BLAST), 177
Assessing Performance of PSI-BLAST and DELTA-BLAST, 179
Pattern-Hit Initiated BLAST (PHI-BLAST), 179
Profile Searches: Hidden Markov Models, 181
HMMER Software: Command-Line and Web-Based, 184
BLAST-Like Alignment Tools to Search Genomic DNA Rapidly, 186
Benchmarking to Assess Genomic Alignment Performance, 187
PatternHunter: Nonconsecutive Seeds Boost Sensitivity, 188
BLASTZ, 188
Enredo and Pecan, 191
MegaBLAST and Discontinuous MegaBLAST, 191
BLAST-Like Tool (BLAT), 192
LAGAN, 192
SSAHA2, 194
Aligning Next-Generation Sequence (NGS) Reads to a Reference Genome, 194
Alignment Based on Hash Tables, 194
Alignment Based on the Burrows–Wheeler Transform, 196
Perspective, 197
Pitfalls, 197
Advice for Students, 198
Web Resources, 198
Discussion Questions, 198
Problems/Computer Lab, 198
Self-Test Quiz, 199
Suggested Reading, 200
References, 201

6 Multiple Sequence Alignment, 205


Introduction, 205
Definition of Multiple Sequence Alignment, 206
Typical Uses and Practical Strategies of Multiple Sequence Alignment, 207
Benchmarking: Assessment of Multiple Sequence Alignment Algorithms, 207
Five Main Approaches to Multiple Sequence Alignment, 208
Exact Approaches to Multiple Sequence Alignment, 208
Progressive Sequence Alignment, 208
Iterative Approaches, 214
Consistency-Based Approaches, 218
Structure-Based Methods, 220
Benchmarking Studies: Approaches, Findings, Challenges, 221

Databases of Multiple Sequence Alignments, 222


Pfam: Protein Family Database of Profile HMMs, 223
SMART, 224
Conserved Domain Database, 226
Integrated Multiple Sequence Alignment Resources: InterPro and
iProClass, 226
Multiple Sequence Alignment Database Curation: Manual Versus
Automated, 227
Multiple Sequence Alignments of Genomic Regions, 227
Analyzing Genomic DNA Alignments via UCSC, 229
Analyzing Genomic DNA Alignments via Galaxy, 229
Analyzing Genomic DNA Alignments via Ensembl, 231
Alignathon Competition to Assess Whole-Genome Alignment
Methods, 231
Perspective, 234
Pitfalls, 234
Advice for Students, 235
Discussion Questions, 235
Problems/Computer Lab, 235
Self-Test Quiz, 237
Suggested Reading, 238
References, 239

7 Molecular Phylogeny and Evolution, 245


Introduction to Molecular Evolution, 245
Principles of Molecular Phylogeny and Evolution, 246
Goals of Molecular Phylogeny, 246
Historical Background, 247
Molecular Clock Hypothesis, 250
Positive and Negative Selection, 254
Neutral Theory of Molecular Evolution, 258
Molecular Phylogeny: Properties of Trees, 259
Topologies and Branch Lengths of Trees, 259
Tree Roots, 262
Enumerating Trees and Selecting Search Strategies, 263
Type of Trees, 266
Species Trees versus Gene/Protein Trees, 266
DNA, RNA, or Protein-Based Trees, 268
Five Stages of Phylogenetic Analysis, 270
Stage 1: Sequence Acquisition, 270
Stage 2: Multiple Sequence Alignment, 271
Stage 3: Models of DNA and Amino Acid Substitution, 272
Stage 4: Tree-Building Methods, 281
Distance-Based, 282
Phylogenetic Inference: Maximum Parsimony, 287

Model-Based Phylogenetic Inference: Maximum Likelihood, 289


Tree Inference: Bayesian Methods, 290
Stage 5: Evaluating Trees, 293
Perspective, 295
Pitfalls, 295
Advice for Students, 296
Web Resources, 297
Discussion Questions, 297
Problems/Computer Lab, 297
Self-Test Quiz, 298
Suggested Reading, 298
References, 299

Part II Genomewide Analysis of DNA, RNA, and Protein

8 DNA: The Eukaryotic Chromosome, 307


Introduction, 308
Major Differences between Eukaryotes and Bacteria and Archaea, 308
General Features of Eukaryotic Genomes and Chromosomes, 310
C Value Paradox: Why Eukaryotic Genome Sizes Vary So Greatly, 312
Organization of Eukaryotic Genomes into Chromosomes, 310
Analysis of Chromosomes Using Genome Browsers, 314
Analysis of Chromosomes Using BioMart and biomaRt, 314
Example 1, 317
Example 2, 319
Example 3, 319
Example 4, 319
Example 5, 320
Analysis of Chromosomes by the ENCODE Project, 320
Critiques of ENCODE: the C Value Paradox Revisited and the Definition of
Function, 322
Repetitive DNA Content of Eukaryotic Chromosomes, 323
Eukaryotic Genomes Include Noncoding and Repetitive DNA Sequences, 323
Interspersed Repeats (Transposon-Derived Repeats), 325
Processed Pseudogenes, 326
Simple Sequence Repeats, 331
Segmental Duplications, 331
Blocks of Tandemly Repeated Sequences, 333
Gene Content of Eukaryotic Chromosomes, 334
Definition of Gene, 334
Finding Genes in Eukaryotic Genomes, 336
Finding Genes in Eukaryotic Genomes: EGASP Competition, 339
Three Resources for Studying Protein-Coding Genes: RefSeq, UCSC Genes,
GENCODE, 340
Protein-Coding Genes in Eukaryotes: New Paradox, 342

Regulatory Regions of Eukaryotic Chromosomes, 342


Databases of Genomic Regulatory Factors, 342
Ultraconserved Elements, 345
Nonconserved Elements, 345
Comparison of Eukaryotic DNA, 346
Variation in Chromosomal DNA, 347
Dynamic Nature of Chromosomes: Whole-Genome Duplication, 347
Chromosomal Variation in Individual Genomes, 349
Structural Variation: Six Types, 351
Inversions, 351
Mechanisms of Creating Duplications, Deletions, and Inversions, 351
Models for Creating Gene Families, 353
Chromosomal Variation in Individual Genomes: SNPs, 354
Techniques to Measure Chromosomal Change, 355
Array Comparative Genomic Hybridization, 356
SNP Microarrays, 356
Next-Generation Sequencing, 359
Perspective, 359
Pitfalls, 359
Advice to Students, 360
Web Resources, 360
Discussion Questions, 361
Problems/Computer Lab, 361
Self-Test Quiz, 364
Suggested Reading, 365
References, 366

9 Analysis of Next-Generation Sequence Data, 377


Introduction, 378
DNA Sequencing Technologies, 377
Sanger Sequencing, 379
Next-Generation Sequencing, 379
Cyclic Reversible Termination: Illumina, 382
Pyrosequencing, 384
Sequencing by Ligation: Color Space with ABI SOLiD, 385
Ion Torrent: Genome Sequencing by Measuring pH, 387
Pacific Biosciences: Single-Molecule Sequencing with Long Read Lengths, 387
Complete Genomics: Self-Assembling DNA Nanoarrays, 387
Analysis of Next-Generation Sequencing of Genomic DNA, 387
Overview of Next-Generation Sequencing Data Analysis, 387
Topic 1: Experimental Design and Sample Preparation, 389
Topic 2: From Generating Sequence Data to FASTQ, 390
Finding and Viewing FASTQ files, 392
Quality Assessment of FASTQ data, 393
FASTG: A Richer Format than FASTQ, 394

Topic 3: Genome Assembly, 394


Competitions and Critical Evaluations of the Performance of Genome
Assemblers, 396
The End of Assembly: Standards for Completion, 398
Topic 4: Sequence Alignment, 399
Alignment of Repetitive DNA, 400
Genome Analysis Toolkit (GATK) Workflow: Alignment with BWA, 401
Topic 5: The SAM/BAM Format and SAMtools, 402
Calculating Read Depth, 405
Finding and Viewing BAM/SAM files, 405
Compressed Alignments: CRAM File Format, 406
Topic 6: Variant Calling: Single-Nucleotide Variants and Indels, 408
Topic 7: Variant Calling: Structural Variants, 409
Topic 8: Summarizing Variation: The VCF Format and VCFtools, 410
Finding and Viewing VCF files, 413
Topic 9: Visualizing and Tabulating Next-Generation Sequence Data, 413
Topic 10: Interpreting the Biological Significance of Variants, 417
Topic 11: Storing Data in Repositories, 421
Specialized Applications of Next-Generation Sequencing, 421
Perspective, 422
Pitfalls, 423
Advice for Students, 423
Web Resources, 424
Discussion Questions, 424
Problems/Computer Lab, 424
Self-Test Quiz, 425
Suggested Reading, 425
References, 425

10 Bioinformatic Approaches to Ribonucleic Acid (RNA), 433


Introduction to RNA, 433
Noncoding RNA, 436
Noncoding RNAs in the Rfam Database, 436
Transfer RNA, 438
Ribosomal RNA, 441
Small Nuclear RNA, 445
Small Nucleolar RNA, 445
MicroRNA, 445
Short Interfering RNA, 447
Long Noncoding RNA (lncRNA), 447
Other Noncoding RNA, 448
Noncoding RNAs in the UCSC Genome and Table Browser, 448
Introduction to Messenger RNA, 450
mRNA: Subject of Gene Expression Studies, 450
Low- and High-Throughput Technologies to Study mRNAs, 452

Analysis of Gene Expression in cDNA Libraries, 455


Full-Length cDNA Projects, 459
BodyMap2 and GTEx: Measuring Gene Expression Across the Body, 459
Microarrays and RNA-Seq: Genome-Wide Measurement of
Gene Expression, 460
Stage 1: Experimental Design for Microarrays and RNA-seq, 461
Stage 2: RNA Preparation and Probe Preparation, 461
Stage 3: Data Acquisition, 464
Hybridization of Labeled Samples to DNA Microarrays, 464
Data Acquisition for RNA-seq, 465
Stage 4: Data Analysis, 465
Stage 5: Biological Confirmation, 465
Microarray and RNA-seq Databases, 465
Further Analyses, 465
Interpretation of RNA Analyses, 466
The Relationship between DNA, mRNA, and Protein Levels, 466
The Pervasive Nature of Transcription, 467
eQTLs: Understanding the Genetic Basis of Variation in Gene Expression
through Combined RNA-seq and DNA-seq, 468
Perspective, 469
Pitfalls, 470
Advice to Students, 470
Web Resources, 470
Discussion Questions, 471
Problems/Computer Lab, 471
Self-Test Quiz, 471
Suggested Reading, 472
References, 473

11 Gene Expression: Microarray and RNA-seq Data Analysis, 479


Introduction, 479
Microarray Analysis Method 1: GEO2R at NCBI, 482
GEO2R Executes a Series of R Scripts, 482
GEO2R Identifies the Chromosomal Origin of Regulated Transcripts, 485
GEO2R Normalizes Data, 486
GEO2R Uses RMA Normalization for Accuracy and Precision, 488
Fold Change (Expression Ratios), 490
GEO2R Performs >22,000 Statistical Tests, 490
GEO2R Offers Corrections for Multiple Comparisons, 494
Microarray Analysis Method 2: Partek, 495
Importing Data, 496
Quality Control, 496
Adding Sample Information, 497
Sample Histogram, 498
Scatter Plots and MA Plots, 498

Working with Log2 Transformed Microarray Data, 498


Exploratory Data Analysis with Principal Components
Analysis (PCA), 498
Performing ANOVA in Partek, 501
From t-test to ANOVA, 503
Microarray Analysis Method 3: Analyzing a GEO Dataset with R, 504
Setting up the Analyses, 504
Reading CEL Files and Normalizing with RMA, 506
Identifying Differentially Expressed Genes (Limma), 508
Microarray Analysis and Reproducibility, 510
Microarray Data Analysis: Descriptive Statistics, 511
Hierarchical Cluster Analysis of Microarray Data, 511
Partitioning Methods for Clustering: k-Means Clustering, 516
Multidimensional Scaling Compared to Principal Components
Analysis, 517
Clustering Strategies: Self-Organizing Maps, 517
Classification of Genes or Samples, 517
RNA-Seq, 519
Setting up a TopHat and Cufflinks Sample Protocol, 523
TopHat to Map Reads to a Reference Genome, 524
Cufflinks to Assemble Transcripts, 525
Cuffdiff to Determine Differential Expression, 525
CummeRbund to Visualize RNA-seq Results, 526
RNA-seq Genome Annotation Assessment Project
(RGASP), 527
Functional Annotation of Microarray Data, 528
Perspective, 529
Pitfalls, 530
Advice for Students, 531
Suggested Reading, 531
Problems/Computer Lab, 532
Self-Test Quiz, 532
Suggested Reading, 533
References, 534

12 Protein Analysis and Proteomics, 539


Introduction, 539
Protein Databases, 540
Community Standards for Proteomics Research, 542
Evaluating the State-of-the-Art: ABRF Analytic
Challenges, 542
Techniques for Identifying Proteins, 543
Direct Protein Sequencing, 543
Gel Electrophoresis, 543
Mass Spectrometry, 547

Four Perspectives on Proteins, 551


Perspective 1: Protein Domains and Motifs: Modular Nature of Proteins, 552
Added Complexity of Multidomain Proteins, 557
Protein Patterns: Motifs or Fingerprints Characteristic of Proteins, 557
Perspective 2: Physical Properties of Proteins, 559
Accuracy of Prediction Programs, 561
Proteomic Approaches to Phosphorylation, 563
Proteomic Approaches to Transmembrane Regions, 565
Introduction to Perspectives 3 and 4: Gene Ontology Consortium, 567
Perspective 3: Protein Localization, 568
Perspective 4: Protein Function, 570
Perspective, 575
Pitfalls, 575
Advice for Students, 575
Web Resources, 576
Discussion Questions, 578
Problems/Computer Lab, 578
Self-Test Quiz, 579
Suggested Reading, 580
References, 580

13 Protein Structure, 589


Overview of Protein Structure, 589
Protein Sequence and Structure, 590
Biological Questions Addressed by Structural Biology: Globins, 591
Principles of Protein Structure, 591
Primary Structure, 591
Secondary Structure, 594
Tertiary Protein Structure: Protein-Folding Problem, 598
Structural Genomics, the Protein Structure Initiative, and Target Selection, 600
Protein Data Bank, 602
Accessing PDB Entries at NCBI Website, 606
Integrated Views of Universe of Protein Folds, 609
Taxonomic System for Protein Structures: SCOP Database, 610
CATH Database, 613
Dali Domain Dictionary, 615
Comparison of Resources, 617
Protein Structure Prediction, 617
Homology Modeling (Comparative Modeling), 618
Fold Recognition (Threading), 619
Ab Initio Prediction (Template-Free Modeling), 621
A Competition to Assess Progress in Structure Prediction, 621
Intrinsically Disordered Proteins, 622
Protein Structure and Disease, 622
Perspective, 625

Pitfalls, 625
Advice for Students, 625
Discussion Questions, 625
Problems/Computer Lab, 626
Self-Test Quiz, 627
Suggested Reading, 628
References, 628

14 Functional Genomics, 635


Introduction to Functional Genomics, 635
The Relationship Between Genotype and Phenotype, 637
Eight Model Organisms for Functional Genomics, 638
1. The Bacterium Escherichia coli, 639
2. The Yeast Saccharomyces cerevisiae, 640
3. The Plant Arabidopsis thaliana, 643
4. The Nematode Caenorhabditis elegans, 643
5. The Fruit Fly Drosophila melanogaster, 645
6. The Zebrafish Danio rerio, 645
7. The Mouse Mus musculus, 646
8. Homo sapiens: Variation in Humans, 647
Functional Genomics Using Reverse and Forward Genetics, 648
Reverse Genetics: Mouse Knockouts and the β-Globin Gene, 650
Reverse Genetics: Knocking Out Genes in Yeast Using Molecular
Barcodes, 653
Reverse Genetics: Random Insertional Mutagenesis (Gene Trapping), 657
Reverse Genetics: Insertional Mutagenesis in Yeast, 660
Reverse Genetics: Gene Silencing by Disrupting RNA, 662
Forward Genetics: Chemical Mutagenesis, 665
Comparison of Reverse and Forward Genetics, 665
Functional Genomics and the Central Dogma, 666
Approaches to Function and Definitions of Function, 646
Functional Genomics and DNA: Integrating Information, 668
Functional Genomics and RNA, 668
Functional Genomics and Protein, 670
Proteomics Approaches to Functional Genomics, 670
Functional Genomics and Protein: Critical Assessment of Protein Function
Annotation, 672
Protein–Protein Interactions, 672
Yeast Two-Hybrid System, 673
Protein Complexes: Affinity Chromatography and Mass
Spectrometry, 675
Protein–Protein Interaction Databases, 676
From Pairwise Interactions to Protein Networks, 678
Assessment of Accuracy, 680
Choice of Data, 680

Experimental Organism, 680


Variation in Pathways, 681
Categories of Maps, 681
Pathways, Networks, and Integration: Bioinformatics Resources, 682
Perspective, 685
Pitfalls, 686
Advice for Students, 686
Web Resources, 686
Discussion Questions, 686
Problems/Computer Lab, 686
Self-Test Quiz, 687
Suggested Reading, 688
References, 688

Part III Genome Analysis

15 Genomes Across the Tree of Life, 699


Introduction, 700
Five Perspectives on Genomics, 701
Brief History of Systematics, 701
History of Life on Earth, 705
Molecular Sequences as the Basis of the Tree of Life, 705
Role of Bioinformatics in Taxonomy, 709
Prominent Web Resources, 710
Ensembl Genomes, 710
NCBI Genome, 710
Genome Portal of DOE JGI and the Integrated Microbial Genomes, 710
Genomes On Line Database (GOLD), 710
UCSC, 710
Genome-Sequencing Projects: Chronology, 711
Brief Chronology, 711
1976–1978: First Bacteriophage and Viral Genomes, 711
1981: First Eukaryotic Organellar Genome, 712
1986: First Chloroplast Genomes, 714
1992: First Eukaryotic Chromosome, 715
1995: Complete Genome of Free-Living Organism, 715
1996: First Eukaryotic Genome, 715
1997: Escherichia coli, 715
1998: First Genome of Multicellular Organism, 716
1999: Human Chromosome, 716
2000: Fly, Plant, and Human Chromosome 21, 716
2001: Draft Sequences of Human Genome, 716
2002: Continuing Rise in Completed Genomes, 717
2003: HapMap, 717
2004: Chicken, Rat, and Finished Human Sequences, 717
2005: Chimpanzee, Dog, Phase I HapMap, 718
2006: Sea Urchin, Honeybee, dbGaP, 718
2007: Rhesus Macaque, First Individual Human Genome, ENCODE Pilot, 718
2008: Platypus, First Cancer Genome, First Personal Genome Using NGS, 718
2009: Bovine, First Human Methylome Map, 718
2010: 1000 Genomes Pilot, Neandertal, Exome Sequencing to Find Disease Genes, 719
2011: A Vision for the Future of Genomics, 719
2012: Denisovan Genome, Bonobo, and 1000 Genomes Project, 719
2013: The Simplest Animal and a 700,000-Year-Old Horse, 719
2014: Mouse ENCODE, Primates, Plants, and Ancient Hominids, 719
2015: Diversity in Africa, 720
Genome Analysis Projects: Introduction, 720
Large-Scale Genomics Projects, 721
Criteria for Selection of Genomes for Sequencing, 722
Genome Size, 722
Cost, 722
Relevance to Human Disease, 723
Relevance to Basic Biological Questions, 724
Relevance to Agriculture, 724
Sequencing of One Versus Many Individuals from a Species, 724
Role of Comparative Genomics, 724
Resequencing Projects, 725
Ancient DNA Projects, 725
Metagenomics Projects, 725
Genome Analysis Projects: Sequencing, 728
Genome-Sequencing Centers, 728
Trace Archive: Repository for Genome Sequence Data, 728
HTGS Archive: Repository for Unfinished Genome Sequence Data, 730
Genome Analysis Projects: Assembly, 730
Four Approaches to Genome Assembly, 730
Genome Assembly: From FASTQ to Contigs with Velvet, 733
Comparative Genome Assembly: Mapping Contigs to Known Genomes, 734
Finishing: When Has a Genome Been Fully Sequenced?, 735
Genome Assembly: Measures of Success, 735
Genome Assembly: Challenges, 735
Genome Analysis Projects: Annotation, 737
Annotation of Genes in Eukaryotes: Ensembl Pipeline, 738
Annotation of Genes in Eukaryotes: NCBI Pipeline, 739
Core Eukaryotic Genes Mapping Approach (CEGMA), 739
Assemblies from the Genome Reference Consortium, 741
Assembly Hubs and Transfers at UCSC, Ensembl, and NCBI, 741
Annotation of Genes in Bacteria and Archaea, 741
Genome Annotation Standards, 741
Perspective, 742
Pitfalls, 742
Advice for Students, 743
Discussion Questions, 743
Problems/Computer Lab, 743
Self-Test Quiz, 745
Suggested Reading, 743
References, 745

16 Completed Genomes: Viruses, 755


Introduction, 755
International Committee on Taxonomy of Viruses (ICTV) and
Virus Species, 756
Classification of Viruses, 758
Classification of Viruses Based on Morphology, 758
Classification of Viruses Based on Nucleic Acid Composition, 758
Classification of Viruses Based on Genome Size, 758
Classification of Viruses Based on Disease Relevance, 760
Diversity and Evolution of Viruses, 762
Metagenomics and Virus Diversity, 764
Bioinformatics Approaches to Problems in Virology, 765
Human Immunodeficiency Virus (HIV), 766
NCBI and LANL Resources for HIV-1, 766
Influenza Virus, 771
Measles Virus, 774
Ebola Virus, 775
Herpesvirus: From Phylogeny to Gene Expression, 776
The Pairwise Sequence Comparison (PASC) Tool, 780
Giant Viruses, 782
Comparing Genomes with MUMmer, 783
Perspectives, 785
Pitfalls, 786
Advice for Students, 786
Web Resources, 786
Discussion Questions, 787
Problems/Computer Lab, 787
Self-Test Quiz, 788
Suggested Reading, 789
References, 789

17 Completed Genomes: Bacteria and Archaea, 797


Introduction, 797
Classification of Bacteria and Archaea, 798
Classification of Bacteria by Morphological Criteria, 800
Classification of Bacteria and Archaea Based on Genome Size and
Geometry, 801
Classification of Bacteria and Archaea Based on Lifestyle, 805
Classification of Bacteria Based on Human Disease Relevance, 808
Classification of Bacteria and Archaea Based on Ribosomal RNA
Sequences, 809
Classification of Bacteria and Archaea Based on Other Molecular Sequences, 810
The Human Microbiome, 811
Analysis of Bacterial and Archaeal Genomes, 814
Nucleotide Composition, 817
Finding Genes, 819
Interpolated Context Model (ICM), 822
GLIMMER3, 824
Challenges of Bacterial and Archaeal Gene Prediction, 825
Gene Annotation, 825
Lateral Gene Transfer, 827
Comparison of Bacterial Genomes, 830
TaxPlot, 830
MUMmer, 833
Perspective, 834
Pitfalls, 835
Advice for Students, 835
Web Resources, 835
Discussion Questions, 836
Problems/Computer Lab, 836
Self-Test Quiz, 836
Suggested Reading, 837
References, 837

18 Eukaryotic Genomes: Fungi, 847


Introduction, 847
Description and Classification of Fungi, 848
Introduction to Budding Yeast Saccharomyces cerevisiae, 849
Sequencing Yeast Genome, 851
Features of Budding Yeast Genome, 851
Exploring Typical Yeast Chromosome, 854
Web Resources for Analyzing a Chromosome, 854
Exploring Variation in a Chromosome with Command-Line Tools, 857
Finding Genes in a Chromosome with Command-Line Tools, 858
Properties of Yeast Chromosome XII, 860
Gene Duplication and Genome Duplication of S. cerevisiae, 860
Comparative Analyses of Hemiascomycetes, 865
Comparative Analyses of Whole-Genome Duplication, 866
Identification of Functional Elements, 868
Analysis of Fungal Genomes, 869
Fungi in the Human Microbiome, 870
Aspergillus, 871
Candida albicans, 871
Cryptococcus neoformans: Model Fungal Pathogen, 872
Atypical Fungus: Microsporidial Parasite Encephalitozoon cuniculi, 873
Neurospora crassa, 873
First Basidiomycete: Phanerochaete chrysosporium, 875
Fission Yeast Schizosaccharomyces pombe, 875
Other Fungal Genomes, 876
Ten Leading Fungal Plant Pathogens, 876
Perspective, 876
Pitfalls, 877
Advice for Students, 877
Web Resources, 877
Discussion Questions, 877
Problems/Computer Lab, 878
Self-Test Quiz, 879
Suggested Reading, 880
References, 880

19 Eukaryotic Genomes: From Parasites to Primates, 887


Introduction, 887
Protozoans at Base of Tree Lacking Mitochondria, 890
Trichomonas, 890
Giardia lamblia: A Human Intestinal Parasite, 891
Genomes of Unicellular Pathogens: Trypanosomes and Leishmania, 890
Trypanosomes, 892
Leishmania, 894
The Chromalveolates, 895
Malaria Parasite Plasmodium falciparum, 895
More Apicomplexans, 898
Astonishing Ciliophora: Paramecium and Tetrahymena, 899
Nucleomorphs, 902
Kingdom Stramenopila, 904
Plant Genomes, 906
Overview, 906
Green Algae (Chlorophyta), 908
Arabidopsis thaliana Genome, 910
The Second Plant Genome: Rice, 913
Third Plant: Poplar, 914
Fourth Plant: Grapevine, 915
Giant and Tiny Plant Genomes, 915
Hundreds More Land Plant Genomes, 915
Moss, 916
Slime and Fruiting Bodies at the Feet of Metazoans, 916
Social Slime Mold Dictyostelium discoideum, 916
Metazoans, 917
Introduction to Metazoans, 917
900 MYA: The Simple Animal Caenorhabditis elegans, 918
900 MYA: Drosophila melanogaster (First Insect Genome), 919
900 MYA: Anopheles gambiae (Second Insect Genome), 921
900 MYA: Silkworm and Butterflies, 922
900 MYA: Honeybee, 923
900 MYA: A Swarm of Insect Genomes, 923
840 MYA: A Sea Urchin on the Path to Chordates, 924
800 MYA: Ciona intestinalis and the Path to Vertebrates, 925
450 MYA: Vertebrate Genomes of Fish, 926
350 MYA: Frogs, 929
320 MYA: Reptiles (Birds, Snakes, Turtles, Crocodiles), 929
180 MYA: The Platypus and Opossum Genomes, 931
100 MYA: Mammalian Radiation from Dog to Cow, 933
80 MYA: The Mouse and Rat, 934
5–50 MYA: Primate Genomes, 937
Perspective, 940
Pitfalls, 941
Advice for Students, 941
Web Resources, 942
Discussion Questions, 942
Problems/Computer Lab, 942
Self-Test Quiz, 943
Suggested Reading, 944
References, 944

20 Human Genome, 957


Introduction, 957
Main Conclusions of Human Genome Project, 958
Gateways to Access the Human Genome, 959
NCBI, 959
Ensembl, 959
University of California at Santa Cruz Human Genome Browser, 961
NHGRI, 961
Wellcome Trust Sanger Institute, 964
Human Genome Project, 964
Background of Human Genome Project, 964
Strategic Issues: Hierarchical Shotgun Sequencing to Generate Draft Sequence, 966
Human Genome Assemblies, 966
Broad Genomic Landscape, 968
Long-Range Variation in GC Content, 969
CpG Islands, 969
Comparison of Genetic and Physical Distance, 970
Repeat Content of Human Genome, 971
Transposon-Derived Repeats, 972
Simple Sequence Repeats, 973
Segmental Duplications, 973
Gene Content of Human Genome, 974
Noncoding RNAs, 975
Protein-Coding Genes, 975
Comparative Proteome Analysis, 975
Complexity of Human Proteome, 978
25 Human Chromosomes, 979
Group A (Chromosomes 1–3), 981
Group B (Chromosomes 4, 5), 982
Group C (Chromosomes 6–12, X), 983
Group D (Chromosomes 13–15), 983
Group E (Chromosomes 16–18), 984
Group F (Chromosomes 19, 20), 984
Group G (Chromosomes 21, 22, Y), 984
Mitochondrial Genome, 985
Human Genome Variation, 986
SNPs, Haplotypes, and HapMap, 986
Viewing and Analyzing SNPs and Haplotypes, 988
HaploView, 988
HapMap Browser, 988
Integrative Genomics Viewer (IGV), 988
NCBI dbSNP, 988
PLINK, 992
SNPduo, 990
Major Conclusions of HapMap Project, 994
The 1000 Genomes Project, 995
Variation: Sequencing Individual Genomes, 998
Perspective, 999
Pitfalls, 1000
Advice for Students, 1001
Discussion Questions, 1001
Problems/Computer Lab, 1001
Self-Test Quiz, 1003
Suggested Reading, 1004
References, 1004

21 Human Disease, 1011


Human Genetic Disease: A Consequence of DNA Variation, 1011
A Bioinformatics Perspective on Human Disease, 1012
Garrod’s View of Disease, 1014
Classification of Disease, 1015
NIH Disease Classification: MeSH Terms, 1017
Categories of Disease, 1020
Allele Frequencies and Effect Sizes, 1020
Monogenic Disorders, 1021
Complex Disorders, 1024
Genomic Disorders, 1025
Environmentally Caused Disease, 1029
Disease and Genetic Background, 1030
Mitochondrial Disease, 1030
Somatic Mosaic Disease, 1032
Cancer: A Somatic Mosaic Disease, 1033
Disease Databases, 1036
OMIM: Central Bioinformatics Resource for Human
Disease, 1036
Human Gene Mutation Database (HGMD), 1039
ClinVar and Databases of Clinically Relevant Variants, 1040
GeneCards, 1041
Integration of Disease Database Information at the UCSC Genome
Browser, 1041
Locus-Specific Mutation Databases and LOVD, 1041
The PhenCode Project, 1044
Limitations of Disease Databases: The Growing Interpretive
Gap, 1045
Human Disease Genes and Amino Acid Substitutions, 1045
Approaches to Identifying Disease-Associated Genes and Loci, 1046
Linkage Analysis, 1047
Genome-Wide Association Studies, 1047
Identification of Chromosomal Abnormalities, 1050
Human Genome Sequencing, 1051
Genome Sequencing to Identify Monogenic Disorders, 1051
Genome Sequencing to Solve Complex Disorders, 1051
Research Versus Clinical Sequencing and Incidental
Findings, 1052
Disease-Causing Variants in Apparently Normal Individuals, 1054
Human Disease Genes in Model Organisms, 1055
Human Disease Orthologs in Nonvertebrate
Species, 1056
Human Disease Orthologs in Rodents, 1058
Human Disease Orthologs in Primates, 1059
Functional Classification of Disease Genes, 1060
Perspective, 1063
Pitfalls, 1063
Advice for Students, 1063
Discussion Questions, 1064
Problems/Computer Lab, 1062
Self-Test Quiz, 1065
Suggested Reading, 1066
References, 1066

GLOSSARY, 1075

Self-Test Quiz: Solutions, 1103

Author Index, 1105

Subject Index, 1109


Preface to the Third Edition

When the first edition of this textbook was published in 2003, the Human Genome Proj-
ect had just been completed at a cost of nearly US$ 3 billion. When the second edition
came into print in 2009, the first genome sequence of an individual (J. Craig Venter) had
recently been published at an estimated cost of US$ 80 million.
Let me tell you a remarkable story. It is now 2015 and it costs just several thousand
dollars to obtain the complete genome sequence of an individual. Sturge‐Weber syn-
drome is a rare neurocutaneous disorder (affecting the brain and skin) that is sometimes
debilitating: some patients must have a hemispherectomy (removal of half the brain)
to alleviate the severe seizures. We obtained paired samples from just three individuals
with Sturge‐Weber syndrome: biopsies were from affected parts of the body (such as
port‐wine stains that occur on the face, neck, or shoulder) or from presumably unaf-
fected regions. We purified DNA and sequenced these six whole genomes, compared the
matched pairs, and identified a single base pair mutation in the GNAQ gene as responsible
for Sturge‐Weber syndrome. (The mutation is somatic, mosaic, and activating: somatic in
that it occurs during development but is not transmitted from the parents; mosaic in that
it affects just part of the body; and activating because GNAQ encodes a protein that in
the mutated form turns on a signaling cascade.) We found that mutations in this gene also
cause port‐wine stain birthmarks (which affect 1 in 300 people or about 23 million people
worldwide). Matt Shirley, then a graduate student in my lab, performed the bioinformat-
ics analyses that led to this discovery. He analyzed about 700 billion bases of DNA. After
finding the mutation he confirmed it by re‐sequencing dozens of samples, typically at over
10,000‐fold depth of coverage. We reported these findings in the New England Journal
of Medicine in 2013.
This story illustrates several aspects of the fields of bioinformatics and genomics.
First, we are in a time period when there is an explosive growth in the availability of DNA
sequence. This is enabling us to address biological questions in unprecedented ways. Sec-
ond, while it is inexpensive to acquire DNA sequences, it is essential to know how to ana-
lyze them. One goal of this book is to introduce sequence analysis. Third, bioinformatics
serves biology: we can only interpret the significance of DNA sequence variation in the
context of some biological process (such as a disease state). In the case of the GNAQ
mutation, that gene encodes a protein (called Gαq) that we can study in tremendous detail
using the tools of bioinformatics; we can evaluate its three‐dimensional structure, the
proteins and chemical messengers it interacts with, and the cellular pathways it partici-
pates in. Fourth, bioinformatics and genomics offer us hope. For Sturge‐Weber syndrome
patients and those with port‐wine stain birthmarks, we are hopeful that a molecular under-
standing of these conditions will lead to treatments.
This book is written by a biologist who has used the tools of bioinformatics to help
understand biomedical research questions. I introduce concepts in the context of biolog-
ical problem‐solving. Compared to earlier editions, this new text emphasizes command‐
line software on the Linux (or Mac) platform, complemented by web‐based approaches.
In an era of “Big Data” there is a great divide between those whose intellectual core is
centered in biomedical science and those whose focus involves computer science. I hope
this book helps to bridge the divide between these two cultures.
Writing a book like this is a wonderful and constant learning experience. I thank past
and present members of my lab who taught me including Shruthi Bandyadka (for advice
on R), Christopher Bouton, Carlo Colantuoni, Donald Freed (for extensive advice on
next‐generation sequencing or NGS), Laurence Frelin, Mari Kondo, Sarah McClymont,
Nathaniel Miller, Alicia Rizzo, Eli Roberson, Matt Shirley (who also provided extensive
NGS advice), Eric Stevens, and Jamie Wangen. For advice on specific chapters, I thank:
Ben Busby of the National Center for Biotechnology Information (NCBI) for advice
regarding Chapters 1, 2, and 5 and detailed comments on Chapters 9 and 10; Eric Sayers
and Jonathan Kans of NCBI for advice on EDirect in Chapter 2; Heiko Schmidt for advice
on TREE‐PUZZLE and MrBayes in Chapter 7; Joel Benington for detailed comments on
Chapters 8 and 15–19 and helpful discussions about teaching; Harold Lehmann for guid-
ance on various fields of informatics; and N. Varg for helpful comments on all chapters. I
thank many colleagues who participated in teaching bioinformatics and genomics courses
over the years. I've learned from all these teachers, including Dimitri Avramopoulos, Jef
Boeke, Kyle Cunningham, Garry Cutting, George Dimopoulos, Egert Hoiczyk, Rafael
Irizarry, Akhilesh Pandey, Sean Prigge, Ingo Ruczinski, Alan Scott, Alan F. Scott, Kirby
D. Smith, David Sullivan, David Valle, and Sarah Wheelan. I am grateful to faculty mem-
bers with whom I taught genomics workshops including Elana Fertig, Luigi Marchionni,
John McGready, Loris Mulroni, Frederick Tan, and Sarah Wheelan. This book includes
several thousand literature references, but I apologize to the many more colleagues whose
work I did not cite. I also cite 900 websites and again apologize to the developers of the
many I did not include.
I also acknowledge the support of Dr Gary W. Goldstein, President and CEO of the
Kennedy Krieger Institute where I work. Kennedy Krieger Institute sees 22,000 patients a
year, mostly children with neurodevelopmental disorders from common conditions (such
as autism spectrum disorder and intellectual disability) to rare genetic diseases. I am
motivated to try to apply the tools of bioinformatics and genomics to help these children.
This perspective has guided my writing of this book, which emphasizes the relevance of
all the topics in bioinformatics and genomics to human disease in general. We are hopeful
that genomics will lead to an understanding of the molecular bases of so many devastating
conditions, and this in turn may one day lead to better diagnosis, prevention, treatment,
and perhaps even cures.
It is my pleasure to thank my editors at Wiley‐Blackwell – Laura Bell, Celia Carden,
Beth Dufour, Elaine Rowan, Fiona Seymour, Audrie Tan, and Rachel Wade – for generous
support throughout this project. I appreciate all their dedication to the quality of the book.
On a personal note I thank my wife Barbara for her love and support throughout the
very long process of writing this textbook. Finally, to my girls Ava and Lillian: I hope
you'll always be inspired to be curious and full of wonder about the world around us.
About the Companion Website

This book is accompanied by a companion website:


www.wiley.com/go/pevsnerbioinformatics
Readers can visit this website for supplemental information, such as PowerPoint files of
all the figures and tables from the book, and solutions to the Self-Test Quizzes and Problems
found at the end of each chapter.
The author also maintains a comprehensive website for the book:
www.bioinfbook.org
This site features lecture files (in PowerPoint and audiovisual format), over 900 Web
Links and over 130 Web Documents that are referred to throughout the book as well as
videocasts of how to perform many basic operations.

Part I Analyzing DNA, RNA, and Protein Sequences
The first third of this book covers essen-
tial topics in bioinformatics. Chapter 1
provides an overview of the approaches
we take, including the use of web-based
and command-line software. We describe
how to access sequences (Chapter 2).
We then align them in a pairwise fashion
(Chapter 3) or compare them to members
of a database using BLAST (Chapter 4),
including specialized searches of protein
or DNA databases (Chapter 5). We next
perform multiple sequence alignment
(Chapter 6) and visualize these alignments
as phylogenetic trees with an evolution-
ary perspective (Chapter 7).

The upper image shows the connectivity of the internet (from the Wikipedia entry for “internet”),
while the lower image shows a map of human protein interactions (from the Wikipedia entry for
“Protein–protein interaction”). We seek to understand biological principles on a genome-wide scale
using the tools of bioinformatics.
Sources: Upper: The Opte Project, 2006. Licensed under the Creative Commons Attribution 2.5 Generic license. Lower:
Dcrjsr, 2002. Licensed under the Creative Commons Attribution 3.0 Unported license.
Chapter 1 Introduction
Penetrating so many secrets, we cease to believe in the unknowable. But there it sits nev-
ertheless, calmly licking its chops.
— H.L. Mencken

Learning objectives
After reading this chapter you should be able to:
■ define the term bioinformatics;
■ explain the scope of bioinformatics;
■ explain why globins are a useful example to illustrate this discipline; and
■ describe web-based versus command-line approaches to bioinformatics.

Bioinformatics represents a new field at the interface of the ongoing revolutions in molec-
ular biology and computers. I define bioinformatics as the use of computer databases and
computer algorithms to analyze proteins, genes, and the complete collection of deoxy-
ribonucleic acid (DNA) that comprises an organism (the genome). A major challenge
in biology is to make sense of the enormous quantities of sequence data and structural
data that are generated by genome‐sequencing projects, proteomics, and other large‐scale
molecular biology efforts. The tools of bioinformatics include computer programs that
help to reveal fundamental mechanisms underlying biological problems related to the
structure and function of macromolecules, biochemical pathways, disease processes, and
evolution.
According to a National Institutes of Health (NIH) definition, bioinformatics is
“research, development, or application of computational tools and approaches for expanding
the use of biological, medical, behavioral, or health data, including those to acquire,
store, organize, analyze, or visualize such data.” The related discipline of computational
biology is “the development and application of data‐analytical and theoretical methods,
mathematical modeling, and computational simulation techniques to the study of
biological, behavioral, and social systems.” Another definition from the National Human
Genome Research Institute (NHGRI) is that “Bioinformatics is the branch of biology that
is concerned with the acquisition, storage, display, and analysis of the information found
in nucleic acid and protein sequence data.”

The NIH Bioinformatics Definition Committee findings are reported at
http://www.bisti.nih.gov/docs/CompuBioDef.pdf (WebLink 1.1 at http://bioinfbook.org). The
NHGRI definition is available at http://www.genome.gov/19519278 (WebLink 1.2).

Russ Altman (1998) and Altman and Dugan (2003) offer two definitions of bioinfor-
matics. The first involves information flow following the central dogma of molecular biol-
ogy (Fig. 1.1). The second definition involves information flow that is transferred based

Bioinformatics and Functional Genomics, Third Edition, Jonathan Pevsner.


© 2015 John Wiley & Sons, Inc. Published 2015 by John Wiley & Sons, Inc.
Companion Website: www.wiley.com/go/pevsnerbioinformatics

[Figure 1.1 diagram. Central dogma of molecular biology: DNA → RNA → protein → cellular phenotype.
Central dogma of genomics: genome (DNA) → transcriptome (RNA) → proteome (protein) → cellular phenotype.]

Figure 1.1 A first perspective of the field of bioinformatics is the cell. Bioinformatics has emerged
as a discipline as biology has become transformed by the emergence of molecular sequence data. Data-
bases such as the European Molecular Biology Laboratory (EMBL), GenBank, the Sequence Read
Archive, and the DNA Database of Japan (DDBJ) serve as repositories for quadrillions (10^15) of
nucleotides of DNA sequence data (see Chapter 2). Corresponding databases of expressed genes (RNA) and
protein have been established. A main focus of the field of bioinformatics is to study molecular sequence
data to gain insight into a broad range of biological problems.

on scientific methods. This second definition includes problems such as designing, vali-
dating, and sharing software; storing and sharing data; performing reproducible research
workflows; and interpreting experiments.
While the discipline of bioinformatics focuses on the analysis of molecular
sequences, genomics and functional genomics are two closely related disciplines. The
goal of genomics is to determine and analyze the complete DNA sequence of an organism,
that is, its genome. The DNA encodes genes that can be expressed as ribonucleic acid
(RNA) transcripts and then, in many cases, further translated into protein. Functional
genomics describes the use of genome‐wide assays to study gene and protein function.
For humans and other species, it is now possible to characterize an individual’s genome,
collection of RNA (transcriptome), proteome and even the collections of metabolites and
epigenetic changes, and the catalog of organisms inhabiting the body (the microbiome)
(Topol, 2014).
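
The flow of information from DNA to RNA to protein described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a method taken from this book: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding (sense) strand in reading frame, the standard genetic code is assumed, and the short example sequence is made up for the demonstration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA corresponding to a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an in-frame mRNA into one-letter amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # The lookup table is keyed by DNA codons, so map U back to T.
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 18-nucleotide example: five codons followed by a stop codon.
dna = "ATGGTGCATCTGACTTAA"
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)
print(protein)
```

Real analyses must also handle the reverse strand, alternative reading frames, introns, and ambiguous bases, all of which this sketch deliberately ignores.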
The aim of this book is to explain both the theory and practice of bioinformatics and
genomics. The book is especially designed to help the biology student use computer pro-
grams and databases to solve biological problems related to proteins, genes, and genomes.
Bioinformatics is an integrative discipline, and our focus on individual proteins and genes
is part of a larger effort to understand broad issues in biology such as the relationship of
structure to function, development, and disease. For the computer scientist, this book
explains the motivations for creating and using algorithms and databases.

Organization of the Book


There are three main sections of the book. Part I (Chapters 2–7) explains how to access
biological sequence data, particularly DNA and protein sequences (Chapter 2). Once
sequences are obtained, we show how to compare two sequences (pairwise alignment;
“Right again, Nick.”
“I never have heard, however, that Connie Taggart was friendly with
them. If any of them were with Lang last night, we may be able to find
positive evidence of it and to force a squeal from them. Otherwise—hello!”
Nick broke off abruptly when they turned the corner, and Chick also saw
the occasion for it.
“Goodness!” he exclaimed. “There is Patsy, and—yes, by Jove, it’s Frank
Mantell. What the deuce has sent them here?”
The touring car containing Patsy Garvan and Mantell, driven by the
latter’s chauffeur, had just swerved to the sidewalk near the house in which
the two murders had been committed.
CHAPTER III.

THE MAN FROM MEXICO.

Nick Carter hastened to join Patsy and Frank Mantell, pausing at the
latter’s touring car to learn the occasion for his visit. He had not long to wait,
for Mantell hardly took time to greet him.
“You must throw up this murder case, Nick; you really must, and take on
a matter in which I am desperately interested,” he forcibly insisted. “More
than half a million dollars are at stake. They’re hopelessly lost, in fact,
unless you can trace and recover them. You must drop this case and——”
“Wait!” Nick interposed, after intently regarding him. “Keep your head.
Who has lost so much money, and when?”
“It’s not money,” Mantell replied, in hurried undertones. “It’s a collection
of old jewels of vast value, which was obtained under most extraordinary
circumstances. I cannot inform you in detail out here, Nick, where I might be
overheard by others. Come with me to my residence, where——”
“Presently, perhaps,” Nick again interrupted. “Come into this house,
instead, where we can occupy one of the chambers. I then will hear what you
have to say.”
Mantell did not wait for the invitation to be repeated. He sprang out of the
car before it was fairly uttered, then accompanied the detective to the house,
followed by Chick and Patsy.
Nick lingered only to inform Sergeant Kennedy that he had other
business for a few minutes, directing him to take charge of the house while
he was engaged, and he then led his three companions to a front chamber
and closed the door.
“Now, Mantell, out with it as briefly as possible,” said he, when they
were seated. “What is this matter in which you are so desperately
interested?”
He had read in Mantell’s pale face the depths of his anxiety and distress,
and knowing him to have a level head and excellent judgment and
discretion, he reasoned that it must be a matter of extraordinary importance.
Mantell hastened to obey him.
“It began, Nick, with a letter I received about ten days ago from an old
college chum of mine, Calvin Vandyke, a man able in every way to judge of
what he wrote me,” he said earnestly. “Unfortunately, however, I haven’t the
letter in my pocket. It is in the desk in my library.”
“Well, well, what is it about?” Nick inquired. “Where is Mr. Vandyke?”
“He now is in Mexico City, under so important a contract that he cannot
possibly leave the country for several months.”
“Mexico City, eh?”
Nick shot a swift, furtive glance at Chick, so significant that the latter
suppressed a look of surprise and remained silent.
“Yes,” Mantell quickly nodded. “The letter he wrote me explained all
that, Nick, and why he made me his partner in this matter, giving me an
equal interest with him and the third party involved.”
“Who is the third party?”
“A Mexican named Juan Padillo, recently a soldier in Villa’s forces
during the campaign in northern Mexico. He has deserted, and now is in this
city. That is to say—if he still is in the land of the living. I’m far from sure of
it.”
“Explain,” said Nick. “Why did Juan Padillo become a deserter?”
“Because of a find he made during the sacking of an old monastery in
Chihuahua territory, after the subjection of that section in which it is located
and the flight of most of the inhabitants. Vandyke has quietly looked up the
legal side of the matter, and he finds that the retention of these spoils of war
is entirely legitimate. In other words, Juan Padillo has a right to retain his
prize and dispose of it to the best advantage.”
“Admitting that, Mantell, what are the other circumstances?” Nick
inquired.
“They may be briefly stated. Padillo made this find in a secret vault,
which he discovered entirely by chance, under a wine cellar in the
monastery. He was the only person in Mexico who knew of his discovery
and that he got away with his plunder, with the single exception of Calvin
Vandyke, with whom Padillo long has had friendly relations, and to whom
he turned for aid and advice.”
“Of what do these spoils of war, as you call them, consist?” Nick
questioned.
“I can give you only an idea, Nick, without referring to Vandyke’s letter,
which describes the articles in detail and estimates their value,” said Mantell.
“They consist of clerical robes and jewels of great antiquity, which, Vandyke
has learned, must have been brought from Spain as far back as the sixteenth
century, and which probably have since been kept in concealment in the
monastery vault.”
“Give me an idea of them.”
“Well, one article is an archbishop’s robe of purple, wrought with a
design in diamonds, emeralds, rubies, and pearls. The gems are mounted in
gold, covering the entire breast of the robe, with a design consisting of the
ancient Spanish coat of arms, the double eagles back to back, with wings
raised and beaks open.”
“I recall it,” Nick nodded.
“There are two gold crowns, also, lavishly mounted with diamonds,
emeralds, and sapphires, the most of which are of unusual size and
corresponding value. In addition to these are other clerical robes of purple
and white silk, all worked with gems the worth of which could only be
roughly estimated. Vandyke places the value of the entire prize, however, at
about six hundred thousand dollars.”
“Gee whiz!” Patsy quietly exclaimed. “That sure was some find.”
“Juan Padillo was much dazzled by it, of course, and scarce knew what to
do,” Mantell earnestly continued. “He did not dare to confide in any of his
countrymen. He determined to take advantage of the prize, however, and to
get out of the country with it.”
“How long ago was that?” Nick inquired.
“Nearly two months. He obtained an old leather suit case, in which he
packed the spoils, and with which he succeeded in reaching Mexico City,
where he at once sought Vandyke and confided in him, offering to share
equally with him in return for his advice and assistance.”
“I see.”
“Vandyke looked into the matter, keeping Padillo concealed in his
residence,” Mantell went on. “He then realized the vast value of the prize,
but being utterly unable to leave the country himself, he proposed including
me in the matter on an equal footing, telling Padillo that he could come to
me and that I would dispose of the gems at their market value. Padillo
eagerly accepted the proposal, knowing that he would be shot as a deserter,
if caught, and that he must lose no time in getting out of the country.”
“I follow you,” Nick put in.
“Vandyke then smuggled him to Vera Cruz, and finally got him on board
a schooner about to leave for New York, paying his passage and giving him
careful instructions.”
“Namely?”
“He directed him not to leave the vessel after his arrival here until I called
for him, also not to open the suit case until he was safe in my residence, and
to pretend all the while that he was a penniless Mexican on his way to join
relatives in this city.”
“All were wise precautions,” Nick remarked.
“Vandyke then sent me a letter, stating all of these facts and invited me to
coöperate with him,” Mantell continued. “Naturally, with two hundred
thousand dollars in view, I was more than glad to comply. I wrote Vandyke
to that effect, and since have been constantly on the watch for the arrival of
the vessel. She was docked at Gray’s wharf late yesterday afternoon. But I
did not learn of it until I read the shipping news this morning. I then rushed
down to the wharf with my touring car, only to learn that——”
“That Juan Padillo left the vessel soon after her arrival yesterday and in
company with a man who used your name,” said Nick, interrupting.
“Good heavens!” Mantell exclaimed, with a gasp. “How did you know
that?”
“Your anxiety, coupled with the fact that Padillo was to remain on the
vessel until you called for him, admits of no other deductions,” Nick replied
evasively.
“You are right, Carter, perfectly right,” Mantell said, with a groan.
“Padillo left the vessel about six o’clock last evening, taking with him the
suit case containing his plunder.”
“With a man who used your name?”
“Yes.”
“Who informed you?”
“The captain of the vessel.”
“What more could he tell you?”
“Only that Padillo had, as I then could judge, carefully followed the
directions Vandyke had given him. Captain Macy evidently knew nothing
about the contents of the suit case, and he said it was the only piece of
luggage the Mexican had, and that he had taken it ashore. He could give me
only a vague description of the man who called for him, and said that Padillo
appeared relieved and eager to accompany him. They left from the head of
the wharf in a touring car, and——”
“And that’s all you know about them,” Nick again interrupted.
“I admit that, Carter, and that’s why I want your aid,” Mantell said
earnestly. “This man and the suit case must be found. I never can look
Vandyke in the face. Think of it! If——”
“That’s what I am doing,” said Nick, smiling a bit oddly. “Now, Mantell,
answer my questions. I then may do something more than think. Whom have
you told about this matter?”
“Only three persons,” Mantell quickly asserted. “My wife and my
parents, with whom Helen and I have been living since our marriage. You
knew, of course, that I was married eight weeks ago to Helen Bailey, the
pretty telephone girl whom you served so kindly—and who, I may add,
thinks so well of you Carters.”
“Yes, indeed, I know all about that, Mantell, but it’s irrelevant just now,”
smiled Nick. “Did you caution your parents, however, to say nothing about
the matter?”
“I did so most impressively.”
“Do you think they have obeyed you?”
“Yes, positively.”
“Where did you talk with them about it?”
“At home, Nick, in the library.”
“You must have been overheard.”
“I don’t think so.”
“I know so,” Nick insisted. “Either that, Mantell, or the letter sent you by
Vandyke has been read by one of your servants, or by some outsider. In no
other way, if your wife and parents have been silent on the subject, could the
man who lured Juan Padillo from the vessel and used your name have
learned anything about the matter.”
“I confess that I am mystified, Carter, as well as filled with dismay,”
Mantell hopelessly admitted. “You are the only one to whom I can turn.
What can be done? How can——”
“Stop a moment,” Nick interposed, rising abruptly. “There is nothing in
further discussing the case. Return to your car, Mantell, and wait until I
rejoin you. Go with him, Patsy.”
“Which may mean that you will——”
“Look into the matter?” Nick cut in again. “Yes, I will do what I can for
you. Time is of value, moreover, so don’t delay to thank me. Go at once.”
Patsy led the way, Mantell following, with an expression of great relief on
his refined, attractive face.
“Well, by Jove, that sheds limelight on this murder mystery,” said Chick,
lingering briefly with Nick in the chamber. “This certainly is a remarkable
coincidence.”
“I suspected something of the kind, Chick, when he mentioned the loss of
a vast quantity of jewels,” Nick replied. “That was one reason why I
consented to hear his story.”
“You have no doubt, of course, that the Mexican who was here last
evening was Juan Padillo.”
“Not the slightest.”
“Lured here by crooks who had learned of the circumstances and been
watching for the vessel.”
“Exactly. They were more alert than Mantell, and got in their work ahead
of him.”
“But how do you size up what occurred here?”
“I’m not quite ready to say,” said Nick. “I am going with Mantell to his
residence. You remain here and get what information Gibson can impart.
Have a look in the meantime at the doors and windows of the house. There
may be evidence indicating that it was broken into by some of the rascals
afterward engaged in the fight.”
“I’ll find it, Nick, if there is any,” Chick confidently predicted. “I see at
what you are driving.”
“Have Kennedy summon the coroner, also, and direct him to take the
customary legal steps here,” Nick added. “Say nothing about what we have
learned and suspect, but tell him we will continue our investigations, and
report later.”
“I’ve got you.”
“Having taken those steps, rejoin me at Mantell’s residence as quickly as
possible,” Nick directed. “He lives——”
“I know the house. It’s the mansion built by Mantell, the senior, in
Riverside Drive,” Chick put in. “I will lose no time in following you.”
“I will go with Mantell in his car, leaving Danny to bring you in ours,”
said Nick, as both turned from the chamber. “There must be quick work
done on this case, or, unless I am much mistaken, both Juan Padillo and his
war prize of ancient jewels will go by the board.”
“Quick work, then, is the proper caper,” Chick declared. “I’ll see you a
little later.”
Nick did not reply, but hastened out to the car in which Patsy and Frank
Mantell were waiting.
“To your residence,” he directed, addressing the latter. “Let her go at top
speed, chauffeur. Minutes count.”
CHAPTER IV.

A STARTLING DISCOVERY.

It was nearly noon when the touring car containing Nick Carter and his
companions sped up the broad driveway and stopped under the porte-
cochère of the magnificent Mantell mansion overlooking the Hudson.
“We shall not find my father at home, Nick,” Mantell remarked, while
alighting from the car. “He still is engaged in settling up the affairs of our
defunct department store, wrecked by the knavery of his junior partner, that
treacherous miscreant, Gaston Goulard. No need to tell you of that rascal,
Nick, whom you so quickly pulled up to the ringbolt after taking on the
case.”
“No need, indeed,” Nick replied, a bit grimly. “It was deucedly
unfortunate, though, that he slipped through the meshes of the legal net and
eluded the punishment he deserved.”
“Decidedly so.”
“His being a partner in the business was all that saved him,” Nick added.
“It enabled a clever criminal lawyer to pull him out of the fire, on grounds
that either of the partners had a legal right to dispose at will of the property
of the firm. It was a hard fight, and the rascal got away without punishment,
barring the penalty he had brought upon himself, that of financial ruin and
hopeless dishonor.”
“Right in both respects,” Mantell nodded. “Gaston Goulard is down and
out forever.”
“By the way, Mantell, do you ever see him?” Nick inquired.
“Yes, occasionally,” was the reply. “I never see him, however, that he
does not threaten to get even with me for the past.”
“Humph!” Nick ejaculated contemptuously.
“Get even, indeed!” Mantell bitterly added. “The boot should be on the
other leg. He hates me for having won and married Helen Bailey, Nick, to
whose hand he had aspirations even while engaged in his treacherous
robberies. I saw him about ten days ago, looking seedy enough, Nick, and as
if dissipation was making inroads upon his health.”
“Threatened you, Mantell, has he?” questioned Nick, with brows knitting
slightly.
“Repeatedly,” Mantell nodded, as they mounted the steps. “I somehow
fear the rascal, Nick, for he is capable of any degree of knavery, and is a
desperate dog when crossed. I expect trouble from him, in fact, and for that
reason am constantly alert.”
“I predicted after his exposure and arrest that he would go to the bad,”
said Nick. “Ah, this is a pleasure, indeed, Mrs. Mantell.”
They had entered the handsomely furnished house while speaking, and
were met in the hall by Mantell’s charming young wife, the beautiful
girl whom Nick first had seen at a telephone switchboard, under
circumstances that revealed her lofty and heroic character, and that
enabled him to be of great service to her.
She hastened to shake hands with both him and Patsy, saying feelingly:
“Your pleasure could not be greater than mine, Mr. Carter. I am delighted
to see you. I ought to scold you roundly, however, for not having called here
occasionally, at least.”
“That’s right, too, Helen,” put in Mantell.
“You overlook one fact,” smiled Nick, replying to her.
“What is that, Mr. Carter?”
“That I have hardly an hour in the week, not to say in a day, that I can
really call my own,” Nick said gravely. “I am a very busy man, you know.”
“Ah, I suppose so,” Helen rejoined. “And chiefly because other men are
so wicked.”
“True.”
“It is deplorable.”
“True again,” said Nick. “Nor am I less busy than usual this morning. I
think, Frank, we had better get right at this matter.”
“I think so, too.”
“I’m sure your wife will excuse us.”
She bowed and smiled agreeably, and Nick and Patsy followed Mantell
into the library, a superbly furnished room overlooking the side grounds.
“Now, Nick, what can I tell you?” he asked, placing chairs for them.
“Why have you come here?”
“To begin with, Mantell, I want to see the letter written to you by Calvin
Vandyke,” said Nick. “Where have you kept it?”
“Here, in my desk,” said Mantell, rising to unlock a large roll-top desk in
one corner of the spacious room.
“Is your desk usually locked?”
“Always, Nick, when I am absent.”
“Wait one moment,” said the detective. “Let me examine the lock.”
Mantell complied, handing him the key.
Nick unlocked the desk, and, rolling the top partly up, he began a careful
inspection of the brass socket which received the bolt of the lock when the
desk was securely closed. He found several tiny, faint scratches on one side
of it, which could not have been caused by the action of the bolt, not being
where it came in contact with the socket. An examination with a powerful
lens, moreover, showed that these slight marks were quite bright, as if
recently made and with an instrument as sharp as the point of a pin.
Nick returned the ring of keys and resumed his seat.
“That lock has recently been picked, Mantell,” he said confidently.
“Picked!” Mantell exclaimed amazedly. “Are you sure of it?”
“Positively.”
“But——”
“There aren’t any buts,” Nick interrupted. “I know when evidence shows
that a lock has been picked. The crook who picked that one used a tool with
a sharp point, which at times touched one side of the bolt socket and left
faint marks in the brass. The brightness of them shows that it was quite
recently done.”
“But our servants are entirely trustworthy, Nick, and——”
“I don’t think it was done by one of your servants,” Nick again
interrupted. “Have you a burglar alarm in the house?”
“Yes, an electric alarm,” said Mantell. “All of the doors and windows on
the ground floor are protected. Perkins, the butler, sets it each night before
he retires.”
“This job may have been done during the day.”
“But there is always some one in the house.”
“I will look farther presently,” said Nick, not inclined to argue the point.
“Let me see the Vandyke letter, also the envelope, if you have it.”
Mantell took them from a pigeonhole in the desk and placed them in the
detective’s hand.
Nick turned to the window and began to inspect them with his lens,
which he had not replaced in his pocket. He did not read the letter, which
covered several closely written sheets, and in which he apparently had no
interest aside from the paper on which it was written.
“A man handling a tool small enough to pick the lock of a desk is very
likely to soil the balls of his thumb and fingers with the metal,” he remarked,
after several moments. “There are faint marks and smooches both on this
envelope and the backs of several sheets of the paper.”
“I did not observe them,” said Mantell, noting the detective’s subtle
intonation. “What do you make of them, Carter?”
“They look very much like finger prints,” said Nick. “Patsy——”
“Yes, chief.”
Patsy had foreseen what was coming and was alert on the instant.
“Mantell’s car is waiting outside,” said Nick, folding the letter and
replacing it in the envelope. “His chauffeur will take you to our office and
bring you back here. Examine these smooches with a magnifying glass and
see what you make of them. If finger prints, compare them with our
collection. Report as quickly as possible.”
“Trust me for that, chief,” cried Patsy, hastening from the room.
“While we are waiting, Mantell, I will have a look around the outside of
the house,” said Nick, rising. “I may find evidence that it has been recently
entered, in spite of your burglar alarm. You had better wait here. I can work
more quickly alone.”
Nick walked out through the hall after the last remark, and ten minutes
had passed when he returned.
“Well?” questioned Mantell anxiously. “What have you found?”
“Nothing positively showing that the house was entered by night,” Nick
replied, resuming his seat. “It may have been accomplished through a
second-story window, however, several of which can be quite easily reached.
I found, nevertheless, positive evidence of something else.”
“Of what?”
“That two men quite recently were playing the eavesdropper under your
library windows,” said Nick. “There are partly obliterated footprints in the
greensward and the flower beds flanking the foundation wall below the
windows.”
“By Jove, is it possible!”
“If they were under only one window, I would feel less confident,” Nick
added. “The fact that traces of the same impressions appear under all of the
windows convinces me that I am right. They were spying outside ten
evenings ago.”
“How do you fix the exact day?” Mantell questioned perplexedly.
“By the character of the imprints and the condition of the near
greensward, to which they frequently stepped,” Nick explained. “We had a
hard rain eleven days ago, and have had none since then.”
“I remember.”
“A hard rain would completely obliterate such imprints from the soil of a
flower bed,” Nick went on. “These, then, must have been formed since the
storm. The depth and irregular character of them, however, show that the soil
must have been very soft and muddy, as if very soon after the rain. This
appears, too, in that when they stepped to the greensward they left many
traces of the soil clinging to their soles. I feel perfectly safe in saying that
they were there the night after the storm.”
Mantell’s face had taken on a more serious expression.
“By Jove, you have reminded me of something, Carter,” he said gravely.
“What is that?”
“It was on the day following that storm that I received Vandyke’s letter,
and I read it aloud that evening to my wife and parents. We were here in the
library. I begin to think your deductions are correct.”
“I am very sure of it,” Nick declared, smiling a bit oddly.
“But who could have been spying upon us, or playing the eavesdropper?”
“There were two men, Mantell, judging from the different imprints, or
what little is left of them,” said Nick. “They may have been here with some
other object in view, possibly the planning of a burglary. Their hearing that
letter, however, may have been only incidental, though it evidently resulted
in a change of their plans for an entirely different job.”
“You mean that of getting and robbing Juan Padillo.”
“Precisely.”
“But why do you suspect that a burglary was contemplated?”
“Because a notorious burglar, one of the most dangerous yeggs in the
country, was killed last night in a house in Manhattanville,” Nick now
explained. “I refer to Cornelius Taggart, quite commonly known as Connie
Taggart, the cracksman.”
“Good heavens!” Mantell’s color had been steadily waning. “You imply,
Carter, that he may have been one of the eavesdroppers, that he may have
been the scoundrel who used my name to deceive Juan Padillo.”
“Either he, Mantell, or his confederate,” bowed Nick. “That is precisely
what I think.”
“But why? For any other reason?” Mantell asked anxiously.
“Yes, a very potent reason,” nodded the detective. “Listen, Mantell, and I
will tell why I think so.”
Nick then informed him of what had been discovered in the
Manhattanville house, the evidence he had found, and many of the
conclusions at which he had arrived.
Mantell listened without interrupting, but with steadily increasing
apprehensions, as appeared in the look of despair that finally settled on his
drawn, white face.
“There is nothing to it, Carter,” he said, with a groan, when Nick had
concluded. “They have got both the man and the jewels. They have killed
Padillo, and the jewels are gone forever.”
“Don’t be so sure of that,” said Nick. “I may find a way to save the man
and recover the gems. That’s what I am seeking—the way.”
“You mean——”
“I mean that I want to discover, if possible, the identity of Taggart’s
confederate,” Nick interrupted. “I then can shape up my work. That is why I
came here to see Vandyke’s letter. I suspect that a copy of it was made. I
suspected, also, if it was obtained by breaking into the house and forcing
your desk, that it might bear finger prints of the crooks. Patsy will report a
little later.”
“But why wouldn’t a crook have taken the letter itself?” questioned
Mantell. “Why would he have made a copy of it?”
“Because you would have missed the letter, and, of course, would have
become suspicious,” Nick pointed out. “You would immediately have taken
steps to thwart the knavery that has been successfully accomplished through
leaving the letter in its customary place.”
“Yes, yes, I see,” Mantell nodded. “I ought to have thought of that. You
suspect then, that——”
“Wait! There comes my touring car with Chick and Danny, my
chauffeur,” Nick interrupted, glancing from the window. “I must see what
more he has learned.”
“I will admit him,” cried Mantell, hastening to do so.
Chick entered the library with him a few moments later. He at once
proceeded to report to Nick that Gibson, the house broker, could add nothing
definite to the statements he had made by telephone, and that his description
of the couple who had called to rent the house was of but little value, the
woman having been veiled at the time, while the man probably was in
disguise.
On one of the basement windows, however, Chick had found convincing
evidence that the house had been forcibly entered, but he could discover no
clew to the identity or number of the burglars.
“Whether they were confederates of Taggart or——”
“They were not,” said Nick, interrupting Chick’s report. “Taggart was
killed by Padillo, and he either was the man who lured the Mexican to the
house, or a confederate of the man who did so. In either case, Chick, the
Taggart gang would have had access to the house without breaking into it.”
“That’s logical,” Chick quickly admitted. “There is no denying it.”
“If we can discover the identity of Taggart’s confederate, therefore, we
shall have a definite clew to both gangs that evidently were in the house,”
Nick added. “Ah, Patsy is returning. Admit him, Mantell. His haste indicates
that he has made a discovery of some importance.”
Nick had caught sight of the returning automobile, from which Patsy was
hastening to alight before it came to a stop in the driveway. He entered the
library a moment later, and his first words confirmed Nick’s prediction.
“They are finger prints, chief, all right,” he cried, returning the Vandyke
letter.
“Are there corresponding ones in our collection?” Nick inquired.
“That’s what, chief.”
“Whose are they, Patsy?”
“Those of the crook who gave the law the slip, but not before we got his
measurements and identification marks,” cried Patsy. “There is no mistaking
them, chief. They are the finger prints of—Gaston Goulard!”
CHAPTER V.

A CHANCE CLEW.

No jungle in the heart of the African desert, no wilds of the Far West, no
desert region of the ice-bound North, no corner of the whole wide world, in
fact, contains beasts more to be dreaded, more crafty, cruel, and terrible, than
those to be found within the precincts of a great city, in the haunts of the
underworld, in the lairs and labyrinths of vice and crime.
Close upon four o’clock that afternoon, or about three hours after Nick
Carter and his assistants left the Mantell residence, two women met by
chance in a certain disreputable section of the East Side, and nearly in front
of an inferior hotel restaurant and barroom run by one Barney Magrath.
There was no mistaking their type and character. Their flashy attire, their
painted cheeks, the swagger atmosphere with which they met and entered
into conversation, told the story in broad-faced type and double-leaded lines.
One was a slender, thin-featured woman with red hair, crafty gray eyes,
and a sinister expression.
The other was a more striking woman. She had a fine figure and was the better
clad of the two, a woman in the twenties, with regular features, dark hair and
complexion, a firm mouth and chin. Hers was a decidedly strong and quite
handsome face, lighted with eyes that had a habitual searching and defiant
expression.
The first words that passed between them, uttered by the woman with red
hair, fell upon the ears of a man who was about emerging from the near
barroom, and who instantly passed back of the swinging doors and lingered
to listen.
“Oh, I say!” exclaimed the woman. “You’re just the skirt I want to see.
I’ve been looking for you, Sadie.”
The brows of the listening man knit slightly. He appeared of a type that
frequented that locality, a rather sinister-looking fellow with a black
mustache.
No observer would have suspected him of being a detective—to say
nothing of being the most noted detective of his day.
“The woman herself—Sadie Badger,” was the thought that flashed
through his mind. “The other jade is Mollie Damon, a running mate of
Slugger Sloan, a holdup man.”
Nick had obtained a momentary glimpse of both women when they halted
on the sidewalk, and he had instantly recognized both notorious crooks.
“Looking for me, Moll?” Sadie Badger questioned, sharply eyeing her.
“That’s what, Sadie.”
“What do you want? Are you on the borrow?”
“Nix! Not much! I’ve got coin to burn.”
“What’s up, then?”
“There’s a gent who wants to meet you. He wanted me to find you.”
“Meet me, eh?” Sadie’s eyes took on a sinister squint. “Why does he want
to meet me?”
“He’ll tell you,” Moll Damon returned. “I’m not wise. That is, only wise
to—whisper!”
She leaned nearer to her companion and spoke with lowered voice, but
her sharp aspirates reached the ears of the listening detective.
“It’s about the trick that was turned last night.”
Sadie Badger gazed at her without a change of countenance.
“What trick is that?” she demanded. “Come across plainly. I don’t get
you.”
“You don’t, eh?” Moll frowned. “Tell that to the marines.”
“Tell it to whom you like,” Sadie retorted. “It’s all one to me.”
“Well, whether you get me, Sadie, or not, the gent wants to meet you,”
Moll insisted. “What do you say?”
Sadie Badger gazed at the curbing for several seconds, evidently sizing
up the significance of what she had heard, and the consequences involved in
whatever course she might shape.
“Who is the gent, Moll?” she then asked abruptly.
“You don’t know him.”
“What’s his name?”
“Goulard.”
“I never heard of him.”
“That cuts no ice,” Moll declared. “He’s all right. You’d better see him. If
you’ll go with me——”
“I guess not! Not if the court knows itself,” Sadie Badger interrupted,
with scornful significance. “Safety first, Moll. When I meet strange gents, I
meet them where I’m dead sure of having the best of it.”
“I’ll send him to you, then,” Moll Damon quickly suggested.
Sadie hesitated again for a moment, then said curtly:
“You may do that, Moll, if you like.”
“Where to?”
“I’m heading for home. You know where I hang out. Send him there and
I’ll see him.”
“I’ll do it,” Moll quickly nodded. “He’ll show up within an hour.”
“All right! I’ll be there.”
The women parted with as little ceremony as they had met.
“Goulard, eh?” thought Nick, having heard every word that passed
between the couple. “Goulard, eh? If he shows up before I do, Miss Sadie
Badger, he’ll go some. This is too good an opportunity to lose.”
The conversation between the two women had transpired in a very few
minutes. The significance of it, in view of what Nick had learned and
suspected, convinced him not only that he was on the right track, but also
that the work he had laid out for himself and his two assistants before
leaving the Mantell residence, the nature of which will appear, was likely to
prove successful.
No one had noticed him in the barroom doorway, and Nick presently
slipped out and started in pursuit of Sadie Badger.
“She is not acquainted with Goulard, and probably does not know him by
sight,” he rightly reasoned from what he had overheard. “If I have sized up
the evidence correctly, then, I probably can worm out of her precisely what
took place in the Manhattanville house, and possibly learn what became of
Padillo and his war prize. I’ll wager I have it near enough to pull wool over
the woman’s eyes and loosen her tongue. I’ll take the chance, at all events,
regardless of the consequences.”
Nick had no difficulty in overtaking Sadie Badger nor in trailing her to
her destination.
It proved to be the end dwelling of a long wooden block in the upper East
Side. The end house in which she dwelt was within fifty yards of the
swirling waters of East River. The intervening space was occupied with a
motley aggregation of old buildings devoted to divers uses. They extended
even to the walled bank of the restless river, a large sign on the farthest one
bearing the single word: “Lime.”
“Not a savory section, by Jove,” thought Nick, after watching the woman
enter the house. “I’ll allow reasonable time for Goulard to have been seen
and sent here, and then I’ll tackle the woman and—well, the proof of a
pudding is its eating.”
Nick waited less than ten minutes, however, apprehending that Goulard
might possibly arrive before he could hoodwink Sadie Badger, and he then
approached the house and rang the doorbell.
“I shall hear the rascal ring, of course, if he shows up before I have got in
my work,” he said to himself while waiting on the steps. “I’ll arrest both of
them in that case and land them where they belong.”
Nick had waited only about a minute when the door was opened by the
woman herself, divested of her street garments, and wearing a loose woolen
house jacket. She gazed sharply at him, and Nick at once said inquiringly:
“Miss Badger?”
“Yes, I am Miss Badger,” said Sadie, nodding a bit coldly.
“I am the man Moll Damon told you about—Gaston Goulard.”
“You arrive here very soon after my talk with her,” said Sadie
suspiciously. “How did she see you so quickly?”
“She did not see me,” said Nick, ready with an explanation. “She
telephoned.”
“Ah! Come in, Mr. Goulard.”
Nick entered and followed her into a small rear parlor, divided from that
in front by a curtained doorway. Through the broad portière, however, Nick
could see that the front room was unoccupied. Listening intently, moreover,
he could hear not a sound indicating that other persons were in the house.
Upon taking the chair to which the woman invited him, nevertheless,
Nick inquired:
“Do I find you alone here? As you may infer, Miss Badger, my business
with you is of a private nature.”