Statistics and Data Analysis for Microarrays Using R and Bioconductor, 2nd edition, Drăghici


Statistics and Data Analysis for Microarrays Using R and Bioconductor, 2nd edition
Author(s): Draghici, Sorin
ISBN(s): 9781439809761, 1439809763
Edition: 2nd ed
File Details: PDF, 130.85 MB
Year: 2011
Language: English
Statistics and Data Analysis
for Microarrays
Using R and Bioconductor
Second Edition
CHAPMAN & HALL/CRC
Mathematical and Computational Biology Series

Aims and scope:


This series aims to capture new developments and summarize what is known
over the entire spectrum of mathematical and computational biology and
medicine. It seeks to encourage the integration of mathematical, statistical,
and computational methods into biology by publishing a broad range of
textbooks, reference works, and handbooks. The titles included in the
series are meant to appeal to students, researchers, and professionals in the
mathematical, statistical and computational sciences, fundamental biology
and bioengineering, as well as interdisciplinary researchers involved in the
field. The inclusion of concrete examples and applications, and programming
techniques and examples, is highly encouraged.

Series Editors

N. F. Britton
Department of Mathematical Sciences
University of Bath

Xihong Lin
Department of Biostatistics
Harvard University

Hershel M. Safer
School of Computer Science
Tel Aviv University

Maria Victoria Schneider


European Bioinformatics Institute

Mona Singh
Department of Computer Science
Princeton University

Anna Tramontano
Department of Biochemical Sciences
University of Rome La Sapienza

Proposals for the series should be submitted to one of the series editors above or directly to:
CRC Press, Taylor & Francis Group
4th Floor, Albert House
1-4 Singer Street
London EC2A 4BQ
UK
Published Titles

Algorithms in Bioinformatics: A Practical Introduction
Wing-Kin Sung

Bioinformatics: A Practical Approach
Shui Qing Ye

Biological Computation
Ehud Lamm and Ron Unger

Biological Sequence Analysis Using the SeqAn C++ Library
Andreas Gogol-Döring and Knut Reinert

Cancer Modelling and Simulation
Luigi Preziosi

Cancer Systems Biology
Edwin Wang

Cell Mechanics: From Single Scale-Based Models to Multiscale Modeling
Arnaud Chauvière, Luigi Preziosi, and Claude Verdier

Clustering in Bioinformatics and Drug Discovery
John D. MacCuish and Norah E. MacCuish

Combinatorial Pattern Matching Algorithms in Computational Biology Using Perl and R
Gabriel Valiente

Computational Biology: A Statistical Mechanics Perspective
Ralf Blossey

Computational Hydrodynamics of Capsules and Biological Cells
C. Pozrikidis

Computational Neuroscience: A Comprehensive Approach
Jianfeng Feng

Data Analysis Tools for DNA Microarrays
Sorin Draghici

Differential Equations and Mathematical Biology, Second Edition
D.S. Jones, M.J. Plank, and B.D. Sleeman

Dynamics of Biological Systems
Michael Small

Engineering Genetic Circuits
Chris J. Myers

Exactly Solvable Models of Biological Invasion
Sergei V. Petrovskii and Bai-Lian Li

Gene Expression Studies Using Affymetrix Microarrays
Hinrich Göhlmann and Willem Talloen

Glycome Informatics: Methods and Applications
Kiyoko F. Aoki-Kinoshita

Handbook of Hidden Markov Models in Bioinformatics
Martin Gollery

Introduction to Bioinformatics
Anna Tramontano

Introduction to Bio-Ontologies
Peter N. Robinson and Sebastian Bauer

Introduction to Computational Proteomics
Golan Yona

Introduction to Proteins: Structure, Function, and Motion
Amit Kessel and Nir Ben-Tal

An Introduction to Systems Biology: Design Principles of Biological Circuits
Uri Alon

Kinetic Modelling in Systems Biology
Oleg Demin and Igor Goryanin

Knowledge Discovery in Proteomics
Igor Jurisica and Dennis Wigle

Meta-analysis and Combining Information in Genetics and Genomics
Rudy Guerra and Darlene R. Goldstein

Methods in Medical Informatics: Fundamentals of Healthcare Programming in Perl, Python, and Ruby
Jules J. Berman

Modeling and Simulation of Capsules and Biological Cells
C. Pozrikidis

Niche Modeling: Predictions from Statistical Distributions
David Stockwell

Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems
Qiang Cui and Ivet Bahar

Optimal Control Applied to Biological Models
Suzanne Lenhart and John T. Workman

Pattern Discovery in Bioinformatics: Theory & Algorithms
Laxmi Parida

Python for Bioinformatics
Sebastian Bassi

Spatial Ecology
Stephen Cantrell, Chris Cosner, and Shigui Ruan

Spatiotemporal Patterns in Ecology and Epidemiology: Theory, Models, and Simulation
Horst Malchow, Sergei V. Petrovskii, and Ezio Venturino

Statistics and Data Analysis for Microarrays Using R and Bioconductor, Second Edition
Sorin Drăghici

Stochastic Modelling for Systems Biology
Darren J. Wilkinson

Structural Bioinformatics: An Algorithmic Approach
Forbes J. Burkowski

The Ten Most Wanted Solutions in Protein Bioinformatics
Anna Tramontano

Statistics and Data Analysis
for Microarrays
Using R and Bioconductor
Second Edition

Sorin Drăghici
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 2011909

International Standard Book Number-13: 978-1-4398-0976-1 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To Jeannette, my better half,
to Tavi, who brightens every day of my life,
and to Althea, whom I miss every day we are not together
Contents

List of Figures xxv

List of Tables xxxv

Preface xxxix

1 Introduction 1

1.1 Bioinformatics – an emerging discipline . . . . . . . . . . . . 1

2 The cell and its basic mechanisms 5

2.1 The cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5


2.2 The building blocks of genomic information . . . . . . . . . . 13
2.2.1 The deoxyribonucleic acid (DNA) . . . . . . . . . . . 13
2.2.2 The DNA as a language . . . . . . . . . . . . . . . . . 19
2.2.3 Errors in the DNA language . . . . . . . . . . . . . . . 23
2.2.4 Other useful concepts . . . . . . . . . . . . . . . . . . 24
2.3 Expression of genetic information . . . . . . . . . . . . . . . 28
2.3.1 Transcription . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.2 Translation . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.3 Gene regulation . . . . . . . . . . . . . . . . . . . . . . 35
2.4 The need for high-throughput methods . . . . . . . . . . . . 36
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3 Microarrays 39

3.1 Microarrays – tools for gene expression analysis . . . . . . . 39


3.2 Fabrication of microarrays . . . . . . . . . . . . . . . . . . . 41
3.2.1 Deposition . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.1.1 The Illumina technology . . . . . . . . . . . 42
3.2.2 In situ synthesis . . . . . . . . . . . . . . . . . . . . . 48
3.2.3 A brief comparison of cDNA and oligonucleotide technologies . . . . . 55
3.3 Applications of microarrays . . . . . . . . . . . . . . . . . . . 57
3.4 Challenges in using microarrays in gene expression studies . 58

3.5 Sources of variability . . . . . . . . . . . . . . . . . . . . . . 63


3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4 Reliability and reproducibility issues in DNA microarray measurements 69

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 What is expected from microarrays? . . . . . . . . . . . . . . 70
4.3 Basic considerations of microarray measurements . . . . . . 70
4.4 Sensitivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.5 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.6 Reproducibility . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.7 Cross-platform consistency . . . . . . . . . . . . . . . . . . . 78
4.8 Sources of inaccuracy and inconsistencies in microarray measurements . . . . . 82
4.9 The MicroArray Quality Control (MAQC) project . . . . . . 85
4.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5 Image processing 89

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2 Basic elements of digital imaging . . . . . . . . . . . . . . . . 90
5.3 Microarray image processing . . . . . . . . . . . . . . . . . . 95
5.4 Image processing of cDNA microarrays . . . . . . . . . . . . 96
5.4.1 Spot finding . . . . . . . . . . . . . . . . . . . . . . . . 99
5.4.2 Image segmentation . . . . . . . . . . . . . . . . . . . 100
5.4.3 Quantification . . . . . . . . . . . . . . . . . . . . . . 106
5.4.4 Spot quality assessment . . . . . . . . . . . . . . . . . 111
5.5 Image processing of Affymetrix arrays . . . . . . . . . . . . . 113
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6 Introduction to R 119

6.1 Introduction to R . . . . . . . . . . . . . . . . . . . . . . . . 119


6.1.1 About R and Bioconductor . . . . . . . . . . . . . . . 119
6.1.2 Repositories for R and Bioconductor . . . . . . . . . . 120
6.1.3 The working setup for R . . . . . . . . . . . . . . . . . 121
6.1.4 Getting help in R . . . . . . . . . . . . . . . . . . . . . 122
6.2 The basic concepts . . . . . . . . . . . . . . . . . . . . . . . . 122
6.2.1 Elementary computations . . . . . . . . . . . . . . . . 122
6.2.2 Variables and assignments . . . . . . . . . . . . . . . . 125
6.2.3 Expressions and objects . . . . . . . . . . . . . . . . . 126
6.3 Data structures and functions . . . . . . . . . . . . . . . . . 128
6.3.1 Vectors and vector operations . . . . . . . . . . . . . . 128
6.3.2 Referencing vector elements . . . . . . . . . . . . . . . 131
6.3.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . 133


6.3.4 Creating vectors . . . . . . . . . . . . . . . . . . . . . 135
6.3.5 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.3.6 Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
6.3.7 Data frames . . . . . . . . . . . . . . . . . . . . . . . . 141
6.4 Other capabilities . . . . . . . . . . . . . . . . . . . . . . . . 144
6.4.1 More advanced indexing . . . . . . . . . . . . . . . . . 144
6.4.2 Missing values . . . . . . . . . . . . . . . . . . . . . . 145
6.4.3 Reading and writing files . . . . . . . . . . . . . . . . 148
6.4.4 Conditional selection and indexing . . . . . . . . . . . 150
6.4.5 Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . 151
6.4.6 Implicit loops . . . . . . . . . . . . . . . . . . . . . . . 154
6.5 The R environment . . . . . . . . . . . . . . . . . . . . . . . 159
6.5.1 The search path: attach and detach . . . . . . . . . . 159
6.5.2 The workspace . . . . . . . . . . . . . . . . . . . . . . 161
6.5.3 Packages . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.5.4 Built-in data . . . . . . . . . . . . . . . . . . . . . . . 165
6.6 Installing Bioconductor . . . . . . . . . . . . . . . . . . . . . 165
6.7 Graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.8 Control structures in R . . . . . . . . . . . . . . . . . . . . . 169
6.8.1 Conditional statements . . . . . . . . . . . . . . . . . 170
6.8.2 Pre-test loops . . . . . . . . . . . . . . . . . . . . . . . 171
6.8.3 Counting loops . . . . . . . . . . . . . . . . . . . . . . 172
6.8.4 Breaking out of loops . . . . . . . . . . . . . . . . . . 173
6.8.5 Post-test loops . . . . . . . . . . . . . . . . . . . . . . 173
6.9 Programming in R versus C/C++/Java . . . . . . . . . . . . 174
6.9.1 R is “forgiving” – which can be bad . . . . . . . . . . . 174
6.9.2 Weird syntax errors . . . . . . . . . . . . . . . . . . . 175
6.9.3 Programming style . . . . . . . . . . . . . . . . . . . . 179
6.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
6.11 Solved Exercises . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

7 Bioconductor: principles and illustrations 193

7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193


7.2 The portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
7.2.1 The main resource categories . . . . . . . . . . . . . . 195
7.2.2 Working with the software repository . . . . . . . . . 195
7.3 Some explorations and analyses . . . . . . . . . . . . . . . . 197
7.3.1 The representation of microarray data . . . . . . . . . 197
7.3.2 The annotation of a microarray platform . . . . . . . 199
7.3.3 Predictive modeling using microarray data . . . . . . 203
7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

8 Elements of statistics 207

8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 207


8.2 Some basic concepts . . . . . . . . . . . . . . . . . . . . . . . 208
8.2.1 Populations versus samples . . . . . . . . . . . . . . . 208
8.2.2 Parameters versus statistics . . . . . . . . . . . . . . . 209
8.3 Elementary statistics . . . . . . . . . . . . . . . . . . . . . . 211
8.3.1 Measures of central tendency: mean, mode, and median 211
8.3.1.1 Mean . . . . . . . . . . . . . . . . . . . . . . 211
8.3.1.2 Mode . . . . . . . . . . . . . . . . . . . . . . 212
8.3.1.3 Median, percentiles, and quantiles . . . . . . 213
8.3.1.4 Characteristics of the mean, mode, and median . . . . . 214
8.3.2 Measures of variability . . . . . . . . . . . . . . . . . . 215
8.3.2.1 Range . . . . . . . . . . . . . . . . . . . . . . 215
8.3.2.2 Variance . . . . . . . . . . . . . . . . . . . . 216
8.3.3 Some interesting data manipulations . . . . . . . . . . 218
8.3.4 Covariance and correlation . . . . . . . . . . . . . . . 219
8.3.5 Interpreting correlations . . . . . . . . . . . . . . . . . 223
8.3.6 Measurements, errors, and residuals . . . . . . . . . . 230
8.4 Degrees of freedom . . . . . . . . . . . . . . . . . . . . . . . 231
8.4.1 Degrees of freedom as independent error estimates . . 232
8.4.2 Degrees of freedom as number of additional measurements . . . . . 233
8.4.3 Degrees of freedom as observations minus restrictions 233
8.4.4 Degrees of freedom as measurements minus model parameters . . . . . 234
8.4.5 Degrees of freedom as number of measurements we can change . . . . . 234
8.4.6 Data split between estimating variability and model parameters . . . . . 235
8.4.7 A geometrical perspective . . . . . . . . . . . . . . . . 235
8.4.8 Calculating the number of degrees of freedom . . . . . 236
8.4.8.1 Estimating k quantities from n measurements 236
8.4.9 Calculating the degrees of freedom for an n × m table 237
8.5 Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.5.1 Computing with probabilities . . . . . . . . . . . . . . 243
8.5.1.1 Addition rule . . . . . . . . . . . . . . . . . . 243
8.5.1.2 Conditional probabilities . . . . . . . . . . . 244
8.5.1.3 General multiplication rule . . . . . . . . . . 247
8.6 Bayes’ theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 247
8.7 Testing for (or predicting) a disease . . . . . . . . . . . . . . 250
8.7.1 Basic criteria: accuracy, sensitivity, specificity, PPV, NPV . . . . . 251
8.7.2 More about classification criteria: prevalence, incidence, and various interdependencies . . . . . 253
8.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.9 Solved problems . . . . . . . . . . . . . . . . . . . . . . . . . 257
8.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

9 Probability distributions 261

9.1 Probability distributions . . . . . . . . . . . . . . . . . . . . 261


9.1.1 Discrete random variables . . . . . . . . . . . . . . . . 262
9.1.2 The discrete uniform distribution . . . . . . . . . . . . 265
9.1.3 Binomial distribution . . . . . . . . . . . . . . . . . . 266
9.1.4 Poisson distribution . . . . . . . . . . . . . . . . . . . 275
9.1.5 The hypergeometric distribution . . . . . . . . . . . . 278
9.1.6 Continuous random variables . . . . . . . . . . . . . . 281
9.1.7 The continuous uniform distribution . . . . . . . . . . 283
9.1.8 The normal distribution . . . . . . . . . . . . . . . . . 283
9.1.9 Using a distribution . . . . . . . . . . . . . . . . . . . 287
9.2 Central limit theorem . . . . . . . . . . . . . . . . . . . . . . 291
9.3 Are replicates useful? . . . . . . . . . . . . . . . . . . . . . . 292
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
9.5 Solved problems . . . . . . . . . . . . . . . . . . . . . . . . . 295
9.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296

10 Basic statistics in R 299

10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 299


10.2 Descriptive statistics in R . . . . . . . . . . . . . . . . . . . . 300
10.2.1 Mean, median, range, variance, and standard deviation 300
10.2.2 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
10.2.3 More built-in R functions for descriptive statistics . . 305
10.2.4 Covariance and correlation . . . . . . . . . . . . . . . 307
10.3 Probabilities and distributions in R . . . . . . . . . . . . . . 308
10.3.1 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . 308
10.3.2 Empirical probabilities . . . . . . . . . . . . . . . . . . 309
10.3.3 Standard distributions in R . . . . . . . . . . . . . . . 315
10.3.4 Generating (pseudo-)random numbers . . . . . . . . . 316
10.3.5 Probability density functions . . . . . . . . . . . . . . 316
10.3.6 Cumulative distribution functions . . . . . . . . . . . 317
10.3.7 Quantiles . . . . . . . . . . . . . . . . . . . . . . . . . 319
10.3.7.1 The normal distribution . . . . . . . . . . . . 321
10.3.7.2 The binomial distribution . . . . . . . . . . . 324
10.3.8 Using built-in distributions in R . . . . . . . . . . . . 326
10.4 Central limit theorem . . . . . . . . . . . . . . . . . . . . . . 329
10.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336


10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

11 Statistical hypothesis testing 337

11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 337


11.2 The framework . . . . . . . . . . . . . . . . . . . . . . . . . . 338
11.3 Hypothesis testing and significance . . . . . . . . . . . . . . 341
11.3.1 One-tailed testing . . . . . . . . . . . . . . . . . . . . 342
11.3.2 Two-tailed testing . . . . . . . . . . . . . . . . . . . . 346
11.4 “I do not believe God does not exist” . . . . . . . . . . . . . 348
11.5 An algorithm for hypothesis testing . . . . . . . . . . . . . . 350
11.6 Errors in hypothesis testing . . . . . . . . . . . . . . . . . . . 351
11.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
11.8 Solved problems . . . . . . . . . . . . . . . . . . . . . . . . . 356

12 Classical approaches to data analysis 359

12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 359


12.2 Tests involving a single sample . . . . . . . . . . . . . . . . . 360
12.2.1 Tests involving the mean. The t distribution. . . . . . 360
12.2.2 Choosing the number of replicates . . . . . . . . . . . 366
12.2.3 Tests involving the variance (σ²). The chi-square distribution . . . . . 370
12.2.4 Confidence intervals for standard deviation/variance . 374
12.3 Tests involving two samples . . . . . . . . . . . . . . . . . . . 375
12.3.1 Comparing variances. The F distribution. . . . . . . . 375
12.3.2 Comparing means . . . . . . . . . . . . . . . . . . . . 380
12.3.2.1 Equal variances . . . . . . . . . . . . . . . . 384
12.3.2.2 Unequal variances . . . . . . . . . . . . . . . 386
12.3.2.3 Paired testing . . . . . . . . . . . . . . . . . 387
12.3.3 Confidence intervals for the difference of means µ₁ − µ₂ 388
12.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
12.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392

13 Analysis of Variance – ANOVA 393

13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 393


13.1.1 Problem definition and model assumptions . . . . . . 393
13.1.2 The “dot” notation . . . . . . . . . . . . . . . . . . . . 397
13.2 One-way ANOVA . . . . . . . . . . . . . . . . . . . . . . . . 398
13.2.1 One-way Model I ANOVA . . . . . . . . . . . . . . . . 398
13.2.1.1 Partitioning the Sum of Squares . . . . . . . 399
13.2.1.2 Degrees of freedom . . . . . . . . . . . . . . . 401
13.2.1.3 Testing the hypotheses . . . . . . . . . . . . 401
13.2.2 One-way Model II ANOVA . . . . . . . . . . . . . . . 405
13.3 Two-way ANOVA . . . . . . . . . . . . . . . . . . . . . . . . 408


13.3.1 Randomized complete block design ANOVA . . . . . . 409
13.3.2 Comparison between one-way ANOVA and randomized block design ANOVA . . . . . 412
13.3.3 Some examples . . . . . . . . . . . . . . . . . . . . . . 413
13.3.4 Factorial design two-way ANOVA . . . . . . . . . . . 417
13.3.5 Data analysis plan for factorial design ANOVA . . . . 422
13.3.6 Reference formulae for factorial design ANOVA . . . . 423
13.4 Quality control . . . . . . . . . . . . . . . . . . . . . . . . . . 423
13.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
13.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427

14 Linear models in R 431

14.1 Introduction and model formulation . . . . . . . . . . . . . . 431


14.2 Fitting linear models in R . . . . . . . . . . . . . . . . . . . 433
14.3 Extracting information from a fitted model: testing hypotheses and making predictions . . . . . 437
14.4 Some limitations of linear models . . . . . . . . . . . . . . . 438
14.5 Dealing with multiple predictors and interactions in the linear models, and interpreting model coefficients . . . . . 441
14.5.1 Details on the design matrix creation and coefficients estimation in linear models . . . . . 443
14.5.2 ANOVA using linear models . . . . . . . . . . . . . . . 445
14.5.2.1 One-way Model I ANOVA . . . . . . . . . . 446
14.5.2.2 Randomized block design ANOVA . . . . . . 451
14.5.3 Practical linear models for analysis of microarray data 452
14.5.4 A two-group comparison gene expression analysis using a simple t-test . . . . . 453
14.5.5 Differential expression using the limma library of Bioconductor . . . . . 455
14.5.5.1 Two group comparison with single-channel data . . . . . 455
14.5.5.2 Multiple contrasts with single-channel data . 457
14.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

15 Experiment design 461

15.1 The concept of experiment design . . . . . . . . . . . . . . . 462


15.2 Comparing varieties . . . . . . . . . . . . . . . . . . . . . . . 462
15.3 Improving the production process . . . . . . . . . . . . . . . 464
15.4 Principles of experimental design . . . . . . . . . . . . . . . . 466
15.4.1 Replication . . . . . . . . . . . . . . . . . . . . . . . . 466
15.4.2 Randomization . . . . . . . . . . . . . . . . . . . . . . 469
15.4.3 Blocking . . . . . . . . . . . . . . . . . . . . . . . . . . 470
15.5 Guidelines for experimental design . . . . . . . . . . . . . . . 470


15.6 A short synthesis of statistical experiment designs . . . . . . 472
15.6.1 The fixed effect design . . . . . . . . . . . . . . . . . . 473
15.6.2 Randomized block design . . . . . . . . . . . . . . . . 474
15.6.3 Balanced incomplete block design . . . . . . . . . . . . 474
15.6.4 Latin square design . . . . . . . . . . . . . . . . . . . 475
15.6.5 Factorial design . . . . . . . . . . . . . . . . . . . . . . 475
15.6.6 Confounding in the factorial design . . . . . . . . . . . 478
15.7 Some microarray specific experiment designs . . . . . . . . . 479
15.7.1 The Jackson Lab approach . . . . . . . . . . . . . . . 479
15.7.2 Ratios and flip-dye experiments . . . . . . . . . . . . . 482
15.7.3 Reference design versus loop design . . . . . . . . . . . 484
15.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487

16 Multiple comparisons 489

16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 489


16.2 The problem of multiple comparisons . . . . . . . . . . . . . 490
16.3 A more precise argument . . . . . . . . . . . . . . . . . . . . 497
16.4 Corrections for multiple comparisons . . . . . . . . . . . . . 499
16.4.1 The Šidák correction . . . . . . . . . . . . . . . . . . . 499
16.4.2 The Bonferroni correction . . . . . . . . . . . . . . . . 500
16.4.3 Holm’s step-wise correction . . . . . . . . . . . . . . . 501
16.4.4 The false discovery rate (FDR) . . . . . . . . . . . . . 502
16.4.5 Permutation correction . . . . . . . . . . . . . . . . . 503
16.4.6 Significance analysis of microarrays (SAM) . . . . . . 505
16.4.7 On permutation-based methods . . . . . . . . . . . . . 506
16.5 Corrections for multiple comparisons in R . . . . . . . . . . . 506
16.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511

17 Analysis and visualization tools 513

17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 513


17.2 Box plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
17.3 Gene pies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
17.4 Scatter plots . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
17.4.1 Scatter plots in R . . . . . . . . . . . . . . . . . . . . 523
17.4.2 Scatter plot limitations . . . . . . . . . . . . . . . . . 524
17.4.3 Scatter plot summary . . . . . . . . . . . . . . . . . . 527
17.5 Volcano plots . . . . . . . . . . . . . . . . . . . . . . . . . . . 527
17.5.1 Volcano plots in R . . . . . . . . . . . . . . . . . . . . 528
17.6 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
17.6.1 Histograms summary . . . . . . . . . . . . . . . . . . . 540
17.7 Time series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
17.8 Time series plots in R . . . . . . . . . . . . . . . . . . . . . . 541
17.9 Principal component analysis (PCA) . . . . . . . . . . . . . 548
17.9.1 PCA limitations . . . . . . . . . . . . . . . . . . . . . 556
17.9.2 Principal component analysis in R . . . . . . . . . . . 557
17.9.3 PCA summary . . . . . . . . . . . . . . . . . . . . . . 558
17.10 Independent component analysis (ICA) . . . . . . . . . . . . 561
17.11 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562

18 Cluster analysis 565

18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
18.2 Distance metric . . . . . . . . . . . . . . . . . . . . . . . . . 566
18.2.1 Euclidean distance . . . . . . . . . . . . . . . . . . . . 567
18.2.2 Manhattan distance . . . . . . . . . . . . . . . . . . . 568
18.2.3 Chebychev distance . . . . . . . . . . . . . . . . . . . 570
18.2.4 Angle between vectors . . . . . . . . . . . . . . . . . . 571
18.2.5 Correlation distance . . . . . . . . . . . . . . . . . . . 571
18.2.6 Squared Euclidean distance . . . . . . . . . . . . . . . 572
18.2.7 Standardized Euclidean distance . . . . . . . . . . . . 573
18.2.8 Mahalanobis distance . . . . . . . . . . . . . . . . . . 575
18.2.9 Minkowski distance . . . . . . . . . . . . . . . . . . . . 575
18.2.10 When to use what distance . . . . . . . . . . . . . . . 576
18.2.11 A comparison of various distances . . . . . . . . . . . 578
18.3 Clustering algorithms . . . . . . . . . . . . . . . . . . . . . . 579
18.3.1 k-means clustering . . . . . . . . . . . . . . . . . . . . 582
18.3.1.1 Characteristics of the k-means clustering . . 584
18.3.1.2 Cluster quality assessment . . . . . . . . . . 586
18.3.1.3 Number of clusters in k-means . . . . . . . . 591
18.3.1.4 Algorithm complexity . . . . . . . . . . . . . 591
18.3.2 Hierarchical clustering . . . . . . . . . . . . . . . . . . 592
18.3.2.1 Inter-cluster distances and algorithm complexity . . . 594
18.3.2.2 Top-down versus bottom-up . . . . . . . . . 595
18.3.2.3 Cutting tree diagrams . . . . . . . . . . . . . 597
18.3.2.4 An illustrative example . . . . . . . . . . . . 599
18.3.2.5 Hierarchical clustering summary . . . . . . . 601
18.3.3 Kohonen maps or self-organizing feature maps (SOFM) 603
18.4 Partitioning around medoids (PAM) . . . . . . . . . . . . . . 612
18.5 Biclustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
18.5.1 Types of biclusters . . . . . . . . . . . . . . . . . . . . 615
18.5.2 Biclustering algorithms . . . . . . . . . . . . . . . . . 616
18.5.3 Differential biclustering . . . . . . . . . . . . . . . . . 618
18.5.4 Biclustering summary . . . . . . . . . . . . . . . . . . 619
18.6 Clustering in R . . . . . . . . . . . . . . . . . . . . . . . . . . 619
18.6.1 Partition around medoids (PAM) in R . . . . . . . . . 627
18.6.2 Biclustering in R . . . . . . . . . . . . . . . . . . . . . 629
18.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630

19 Quality control 633

19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
19.2 Quality control for Affymetrix data . . . . . . . . . . . . . . 634
19.2.1 Reading raw data (.CEL files) . . . . . . . . . . . . . . 634
19.2.2 Intensity distributions . . . . . . . . . . . . . . . . . . 635
19.2.3 Box plots . . . . . . . . . . . . . . . . . . . . . . . . . 637
19.2.4 Probe intensity images . . . . . . . . . . . . . . . . . . 637
19.2.5 Quality control metrics . . . . . . . . . . . . . . . . . 639
19.2.6 RNA degradation curves . . . . . . . . . . . . . . . . . 645
19.2.7 Quality control plots . . . . . . . . . . . . . . . . . . . 647
19.2.8 Probe-level model (PLM) fitting. RLE and NUSE plots 652
19.3 Quality control of Illumina data . . . . . . . . . . . . . . . . 658
19.3.1 Reading Illumina data . . . . . . . . . . . . . . . . . . 658
19.3.2 Bead-summary data . . . . . . . . . . . . . . . . . . . 661
19.3.2.1 Raw probe data import, visualization, and quality assessment using “beadarray” . . . . 661
19.3.2.2 Raw probe data import, visualization, and quality assessment using “lumi” . . . . . . . . 663
19.3.3 Bead-level data . . . . . . . . . . . . . . . . . . . . . . 667
19.3.3.1 Raw bead data import and assessment . . . 667
19.3.3.2 Summarizing from bead-level to probe-level data . . . 688
19.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 689

20 Data preprocessing and normalization 691

20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
20.2 General preprocessing techniques . . . . . . . . . . . . . . . . 692
20.2.1 The log transform . . . . . . . . . . . . . . . . . . . . 692
20.2.2 Combining replicates and eliminating outliers . . . . . 694
20.2.3 Array normalization . . . . . . . . . . . . . . . . . . . 696
20.2.3.1 Dividing by the array mean . . . . . . . . . . 699
20.2.3.2 Subtracting the mean . . . . . . . . . . . . . 699
20.2.3.3 Using control spots/genes . . . . . . . . . . . 701
20.2.3.4 Iterative linear regression . . . . . . . . . . . 701
20.2.3.5 Other aspects of array normalization . . . . 702
20.3 Normalization issues specific to cDNA data . . . . . . . . . . 702
20.3.1 Background correction . . . . . . . . . . . . . . . . . . 702
20.3.1.1 Local background correction . . . . . . . . . 702
20.3.1.2 Sub-grid background correction . . . . . . . 703
20.3.1.3 Group background correction . . . . . . . . . 703
20.3.1.4 Background correction using blank spots . . 703
20.3.1.5 Background correction using control spots . 703
20.3.2 Other spot level preprocessing . . . . . . . . . . . . . 704
20.3.3 Color normalization . . . . . . . . . . . . . . . . . . . 704
20.3.3.1 Curve fitting and correction . . . . . . . . . 706
20.3.3.2 LOWESS/LOESS normalization . . . . . . . 708
20.3.3.3 Piece-wise normalization . . . . . . . . . . . 711
20.3.3.4 Other approaches to cDNA data normalization . . . 713
20.4 Normalization issues specific to Affymetrix data . . . . . . . 713
20.4.1 Background correction . . . . . . . . . . . . . . . . . . 713
20.4.2 Signal calculation . . . . . . . . . . . . . . . . . . . . . 716
20.4.2.1 Ideal mismatch . . . . . . . . . . . . . . . . . 716
20.4.2.2 Probe values . . . . . . . . . . . . . . . . . . 717
20.4.2.3 Scaled probe values . . . . . . . . . . . . . . 718
20.4.3 Detection calls . . . . . . . . . . . . . . . . . . . . . . 719
20.4.4 Relative expression values . . . . . . . . . . . . . . . . 720
20.5 Other approaches to the normalization of Affymetrix data . 720
20.5.1 Cyclic Loess . . . . . . . . . . . . . . . . . . . . . . . . 720
20.5.2 The model-based dChip approach . . . . . . . . . . . 721
20.5.3 The Robust Multi-Array Analysis (RMA) . . . . . . . 722
20.5.4 Quantile normalization . . . . . . . . . . . . . . . . . . 722
20.6 Useful preprocessing and normalization sequences . . . . . . 725
20.7 Normalization procedures in R . . . . . . . . . . . . . . . . . 726
20.7.1 Normalization functions and procedures for Affymetrix data . . . . . . . . . . . . . . . 726
20.7.2 Background adjustment and various types of normalization . . . . . . . . . . . . . . . 732
20.7.3 Summarization . . . . . . . . . . . . . . . . . . . . . . 733
20.8 Batch preprocessing . . . . . . . . . . . . . . . . . . . . . . . 736
20.9 Normalization functions and procedures for Illumina data . . 737
20.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
20.11 Appendix: A short primer on logarithms . . . . . . . . . . . 744

21 Methods for selecting differentially expressed genes 747

21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 747
21.2 Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
21.3 Fold change . . . . . . . . . . . . . . . . . . . . . . . . . . . 751
21.3.1 Description . . . . . . . . . . . . . . . . . . . . . . . . 751
21.3.2 Characteristics . . . . . . . . . . . . . . . . . . . . . . 752
21.4 Unusual ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . 754
21.4.1 Description . . . . . . . . . . . . . . . . . . . . . . . . 754
21.4.2 Characteristics . . . . . . . . . . . . . . . . . . . . . . 755
21.5 Hypothesis testing, corrections for multiple comparisons, and resampling . . . . . . . . . . . . 756