Modeling Dose-Response Microarray Data in Early Drug Development Experiments Using R
Series Editors:
Robert Gentleman Kurt Hornik Giovanni Parmigiani
For further volumes:
http://www.springer.com/series/6991
Dan Lin Ziv Shkedy Daniel Yekutieli
Dhammika Amaratunga Luc Bijnens
Editors
Modeling Dose-Response
Microarray Data in Early
Drug Development
Experiments Using R
Order-Restricted Analysis of Microarray Data
Editors

Dan Lin
Veterinary Medicine Research & Development, Pfizer Animal Health, Zaventem, Belgium

Ziv Shkedy
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat), Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium

Daniel Yekutieli
Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv University, Tel-Aviv, Israel

Dhammika Amaratunga
Biostatistics and Programming, Janssen Pharmaceutical Companies of Johnson & Johnson, Raritan, NJ, USA

Luc Bijnens
Biostatistics and Programming, Janssen Pharmaceutical Companies of Johnson & Johnson, Beerse, Belgium
Bioinformatics and statistical bioinformatics have developed rapidly over the last
15 years. In particular, the development of microarray technology introduced the
challenge of the analysis of massive datasets and the need to consider inference
when thousands of genes are tested.
Microarray experiments are slowly becoming an integrated part of the pharma-
ceutical research and development (R&D) process. Microarray experiments offer
the ability to measure, at the same time, the RNA derived from entire genomes.
Microarrays specific for animal species can be used to process the information
coming from in vivo animal experiments. Microarrays specific for humans can be
used to test biological material coming from biopsies, blood samples, or cell line
cultures. The functional genomic information coming from microarray experiments
can be used both at the target identification and the target validation level of the early
drug discovery process. Moreover, functional genomics can be used at many later
stages of pharmaceutical research and development to screen therapeutic effects and
unwanted side effects in many R&D programs.
This book is about a specific setting in which gene expression is measured at
different dose levels of a drug. The main goal of the analysis of dose-response
microarray experiments is to detect trends in gene expression caused by increasing
doses of compound. Therefore, this book is focused on estimation, inference, and
clustering under order restrictions of dose-response microarray data. The aim of
these microarray experiments is to get insight into the mechanism of action and
the safety profile of a drug using functional genomic data to identify pathways that
are affected by the compound at hand. In this context, gene expression experiments have become important either before or in parallel with the clinical testing programs.
In this book, we present a toolbox for the analysis of dose-response microarray
experiments. The toolbox consists of different statistical methods for the analysis
and different R packages which were developed for the analysis. The web site
accompanying this book contains all the R programs and datasets used to produce
the output presented in the book. It can be reached through the web site of Hasselt
University:
http://www.ibiostat.be/software/IsoGeneGUI/index.html
R packages can be downloaded from the R Project web site, CRAN, R-Forge,
or the Bioconductor web site: http://www.r-project.org/, http://cran.r-project.org/,
http://r-forge.r-project.org/, and http://www.bioconductor.org/, respectively.
In the first part of the book, we introduce the main concepts of estimation and
inference under order constraints and dose-response modeling. In the second part
of the book, we focus on the analysis of dose-response microarray experiments
and address issues such as multiplicity adjustment, selective inference, single and
multiple contrast tests, order-restricted clustering, pathway analysis, hierarchical
Bayesian models, and model-based approaches.
Bioinformatics and statistical bioinformatics are multidisciplinary areas. The
materials presented in this book have been developed over the last few years by
a group of biologists, biostatisticians, mathematical statisticians, and computer
scientists from both academia and pharmaceutical industry. Most of the coauthors
of this book are part of the CHIPS (common hour involving practical statistics)
network. This network was initiated about 10 years ago as biweekly workshops to
give statisticians, bioinformaticians, and life scientists the opportunity to integrate
expertise on the design and analysis of microarray experiments. The network has a
strong culture of sharing knowledge and expertise among the different professions.
Last, but not least, we would like to thank all our collaborators, without whose
work this book could never have been published: Marc Aerts, Frank Bretz, Tomasz
Burzykowski, Djork-Arné Clevert, An De Bondt, Gemechis D. Djira, Filip De Ridder,
Hinrich W.H. Göhlmann, Philippe Haldermans, Sepp Hochreiter, Ludwig Hothorn,
Adetayo Kasim, Bernet Kato, Martin Otava, Setia Pramana, Pieter Peeters, Tim
Perrera, Jose Pinheiro, Nandini Raghavan, Roel Straetemans, Willem Talloen, Suzy
Van Sanden, and Tobias Verbeke. We would like to thank Niels Thomas and Alice
Blanck from Springer, Heidelberg, and the head of the production team Ms. Ranjani
Shanmugaraj for all their help and support during the preparation of this book.
Contents

1 Introduction
Dan Lin, Willem Talloen, Luc Bijnens, Hinrich W.H. Göhlmann, Dhammika Amaratunga, and Roel Straetemans

Index
R Packages
DoseFinding
DoseFinding is a CRAN package for the design and the analysis of dose-
finding experiments. It provides functions for multiple contrast tests, nonlinear
dose-response modeling, calculating optimal designs, and an implementation of the
MCPMod methodology discussed in Pinheiro et al. (2006).
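As a quick orientation, the sketch below shows how a multiple contrast test and a single nonlinear model fit might be called; the data are simulated, and the calls to Mods(), MCTtest(), and fitMod() are assumed to follow the interface documented in the DoseFinding package.

```r
library(DoseFinding)

# simulated example: 5 dose levels, 10 observations per dose (hypothetical data)
set.seed(1)
doses <- c(0, 0.05, 0.2, 0.6, 1)
dat <- data.frame(dose = rep(doses, each = 10))
dat$resp <- 0.2 + 0.6 * dat$dose / (dat$dose + 0.2) + rnorm(nrow(dat), sd = 0.15)

# candidate dose-response shapes (MCPMod step 1)
mods <- Mods(linear = NULL, emax = 0.2, sigEmax = c(0.4, 4), doses = doses)

# multiple contrast test for a dose-response signal
MCTtest(dose, resp, data = dat, models = mods)

# fit a single (Emax) dose-response model to the same data
fitMod(dose, resp, data = dat, model = "emax")
```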
fdrame
FDR-AME is a Bioconductor package that computes FDR adjustments for p-values
generated in multiple hypotheses testing of gene expression data obtained by a
microarray experiment. It applies both theoretical-distribution-based and
resampling-based multiple testing procedures and presents as output the adjusted
p-values and p-value plots, as described in Reiner et al. (2003).
IsoGene
A CRAN R package for testing monotone trends in dose-response microarray
experiments. The package provides several testing procedures discussed in Lin et al.
(2007). Inference is based on either the asymptotic distribution of the likelihood
ratio test statistic or resampling-based inference for the t-type test statistics.
Adjustment for multiplicity is based on either the BH-FDR procedure or SAM.
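A minimal sketch of a call to the package is given below for a single simulated gene; the function name IsoGene1() and the set of five test statistics it returns are taken from Pramana et al. (2010a), while the permutation p-values and the BH/SAM adjustments mentioned above are obtained with IsoRawp() and IsoTestBH()/IsoTestSAM() (not shown here).

```r
library(IsoGene)

# simulated example: one gene measured at 4 dose levels with 3 arrays per dose
set.seed(1)
dose <- rep(1:4, each = 3)
y <- rnorm(12, mean = 0.3 * rep(1:4, each = 3))

# trend test statistics (E2, Williams, Marcus, M, modified M) for this gene
IsoGene1(dose, y)
```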
IsoGeneGUI
A graphical user interface for the IsoGene package that does not require an exten-
sive knowledge of R. The package performs all the statistical tests implemented in
the IsoGene package and provides several default and user-defined graphical and numerical
outputs. The capabilities of the package are discussed in Pramana et al. (2010a,b).
limma
The Bioconductor Limma package (Smyth, 2004) fits a hybrid frequentist/eBayes
linear model for the expression levels of the genes in the array. The package can be
used to analyze gene expression data obtained from several microarray platforms
such as two-color cDNA (including normalization function for data preprocessing)
and Affymetrix.
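For orientation, a standard limma workflow on a simulated expression matrix is sketched below (a plain one-way comparison rather than the order-restricted analyses of this book); lmFit(), eBayes(), and topTable() are the usual limma calls.

```r
library(limma)

# simulated log2 expression matrix: 100 genes, 12 arrays at 4 dose levels
set.seed(1)
dose <- factor(rep(c(0, 1, 2, 3), each = 3))
expr <- matrix(rnorm(100 * 12), nrow = 100)

design <- model.matrix(~ dose)                 # dose 0 is the reference level
fit <- eBayes(lmFit(expr, design))             # gene-wise linear models + eBayes shrinkage
topTable(fit, coef = "dose3", adjust.method = "BH")   # top genes for dose 3 versus control
```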
MLP
A Bioconductor R package for analysis of data from a microarray experiment to
determine significant sets of genes that are functionally related or in a certain
biological pathway. The package performs gene set analysis using the MLP
approach described in Raghavan et al. (2006). Genes are mapped into gene sets or
pathways by utilizing gene annotation databases such as the Gene Ontology, KEGG,
etc. The p-values corresponding to genes in a gene set are used to define a gene set
statistic. Gene set significance is determined using a permutation procedure based
on randomly reassigning p-values to genes.
mratios
The mratios package provides simultaneous inferences for ratios of linear
combinations of coefficients in the general linear model. It includes several multiple
comparison procedures as applied to ratio parameters, parallel-line and slope-ratio
assays, and tests for noninferiority and superiority based on relative thresholds
(Dilba et al. 2007).
multtest
The Bioconductor package multtest uses resampling-based multiple testing
procedures for controlling the family-wise error rate (FWER), generalized family-
wise error rate (gFWER), and false discovery rate (FDR). Single-step and stepwise
methods are implemented. The results are reported in terms of adjusted p-values,
confidence regions, and test statistic cutoffs. The procedures are directly applicable
to identifying differentially expressed genes in DNA microarray experiments.
The package is discussed by Dudoit and van der Laan (2008), Multiple testing
procedures with applications to genomics.
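Two typical calls are sketched below on simulated inputs: mt.rawp2adjp() adjusts a vector of raw p-values with several FWER/FDR procedures, and mt.maxT() computes permutation-based step-down maxT adjusted p-values for a two-group comparison.

```r
library(multtest)

set.seed(1)
# adjust a vector of (simulated) raw p-values
rawp <- runif(100)^2
adj <- mt.rawp2adjp(rawp, proc = c("Bonferroni", "Holm", "BH"))
head(adj$adjp)

# permutation-based step-down maxT adjusted p-values, two groups of five arrays
X <- matrix(rnorm(100 * 10), nrow = 100)       # 100 genes, 10 arrays
classlabel <- rep(0:1, each = 5)
head(mt.maxT(X, classlabel, B = 1000))
```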
multcomp
A CRAN R package for simultaneous tests and confidence intervals for general
linear hypotheses in parametric models, including linear, generalized linear, linear
mixed effects, and survival models. The package capacity is described in Bretz et al.
(2010), Multiple comparisons using R.
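Since Williams- and Dunnett-type contrasts play a role in later chapters, a brief sketch of the glht() interface is given here on simulated one-way dose-response data; the contrast names follow the mcp() documentation of multcomp.

```r
library(multcomp)

# simulated one-way layout: 4 dose groups (0 = control), 6 observations each
set.seed(1)
dat <- data.frame(dose = factor(rep(0:3, each = 6)))
dat$y <- rnorm(24, mean = 0.5 * as.numeric(dat$dose))

fit <- aov(y ~ dose, data = dat)

# simultaneous Williams-type contrasts (increasing trend versus control)
summary(glht(fit, linfct = mcp(dose = "Williams"), alternative = "greater"))

# Dunnett-type many-to-one comparisons with simultaneous confidence intervals
confint(glht(fit, linfct = mcp(dose = "Dunnett")))
```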
nlme
A CRAN R package for fitting linear mixed models, nonlinear mixed-effects models,
and generalized least squares models. The function gnls() can be used to fit nonlinear
models. An elaborate discussion about the methodology is given by Pinheiro and
Bates (2000), Mixed-effects models in S and S-PLUS.
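A hedged sketch of a gnls() fit is shown below: a four-parameter logistic curve fitted to simulated dose-response data with the self-starting SSfpl() model from the stats package; parametric dose-response models of this kind are discussed in Chap. 4.

```r
library(nlme)

# simulated dose-response data on a log-dose scale (hypothetical values)
set.seed(1)
logdose <- rep(seq(-3, 3, length = 7), each = 5)
resp <- 100 / (1 + exp((0.5 - logdose) / 0.8)) + rnorm(length(logdose), sd = 5)
dat <- data.frame(logdose = logdose, resp = resp)

# four-parameter logistic model fitted by generalized nonlinear least squares
fit <- gnls(resp ~ SSfpl(logdose, A, B, xmid, scal), data = dat)
summary(fit)
```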
ORCME
A CRAN R package for simple order-restricted clustering of dose-response
microarray data. The ORCME package finds clusters of genes with co-regulated
dose-response relationships. The package implements a variation of the biclustering
algorithm of Cheng and Church (2000).
ORIClust
An R package for order-restricted clustering of dose-response microarray data.
The clustering algorithm implemented in ORIClust, ORICC, is an information
criterion-based algorithm that selects among order-restricted candidate profiles
(Liu et al. 2009a,b).
References
Bretz, F., Hothorn, T., & Westfall P. (2010). Multiple comparisons using R. Boca Raton: CRC.
Cheng, Y., & Church, G.M. (2000). Biclustering of expression data. Proceedings of the Conference
on Intelligent Systems for Molecular Biology, 55, 93–104.
De Leeuw, J., Hornik, K., & Mair, P. (2009). Isotone Optimization in R: Pool-Adjacent-Violators
Algorithm (PAVA) and Active Set Methods. Journal of Statistical Software, 32(5), 1–24.
Dilba, D., Schaarschmidt, F., & Hothorn, L.A. (2007). Inferences for ratios of normal means.
R News 7, 1, 20–23.
Dudoit, S., & van der Laan, M.J. (2008). Multiple testing procedures with applications to genomics.
New York: Springer.
Lin, D., Shkedy, Z., Yekutieli, D., Burzykowski, T., Göhlmann, H.W.H., De Bondt, A., et al. (2007).
Testing for trend in dose-response microarray experiments: Comparison of several testing
procedures, multiplicity, and resampling-based inference. Statistical Applications in Genetics
and Molecular Biology, 6(1). Article 26.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009a). Order-restricted information criterion-based
clustering algorithm. Reference manual. http://cran.r-project.org/web/packages/ORIClust/.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009b). Information criterion-based clustering with order
restricted candidate profiles in short time-course microarray experiments. BMC Bioinformatics
10, 146.
Peddada, S., Lobenhofer, E.K., Li, L., Afshari, C.A., Weinberg, C.R., & Umbach, D.M. (2003).
Gene selection and clustering for time-course and dose-response microarray experiments using
order-restricted inference. Bioinformatics, 19(7), 834–841.
Peddada, S., Harris, S., & Harvey E. (2005). ORIOGEN: Order restricted inference for ordered
gene expression data. Bioinformatics, 21(20), 3933–3934.
Pinheiro, J., & Bates, D. (2000). Mixed-effects models in S and S-PLUS. New York: Springer.
Pinheiro, J.C., Bretz, F., & Branson, M. (2006). Analysis of dose-response studies—Modeling
approaches. In N. Ting (Ed.), Dose finding in drug development (pp. 146–171). New York:
Springer.
Pramana, S. Lin, D., Haldermans, P., Shkedy, Z., Verbeke, T., Göhlmann, H., et al. (2010a).
IsoGene: an R package for analyzing dose-response studies in microarray experiments. The
R Journal, 2(1), 5–12.
Pramana, S., Lin, D., & Shkedy, Z. (2010b). IsoGeneGUI package vignette. Bioconductor. http://
www.bioconductor.org.
Raghavan, N., Amaratunga, D., Cabrera, J., Nie, A., Qin, J., & McMillian, M. (2006). On Methods
for gene function scoring as a means of facilitating the interpretation of microarray results.
Journal of Computational Biology, 13(3), 798–809.
Reiner, A., Yekutieli, D., & Benjamini, Y. (2003). Identifying differentially expressed genes using
false discovery rate controlling procedures. Bioinformatics, 19(3), 368–375.
Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential
expression in microarray experiments. Statistical Applications in Genetics and Molecular
Biology, 3. Article 3. http://www.bepress.com/sagmb/vol3/iss1/art
Sturtz, S., Ligges, U., & Gelman, A. (2005) R2WinBUGS: A package for running WinBUGS from
R. Journal of Statistical Software, 12(3), 1–16.
Tusher, V.G., Tibshirani, R., & Chu, G. (2001). Significance analysis of microarrays applied to the
ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116–5121.
Acknowledgments
The work of Ludwig A. Hothorn was supported by the German Science Foundation
fund HO1687/9-1 and the EC FP7 ESNATS project no. 201619.
Daniel Yekutieli gratefully acknowledges the support from the Wharton School,
University of Pennsylvania, for his appointment to the Harry W. Reynolds Visiting
International Professorship.
Dan Lin, Ziv Shkedy, Marc Aerts, Tomasz Burzykowski, Philippe Haldermans,
Martin Otava, and Setia Pramana gratefully acknowledge the support from the
Belgian IUAP/PAI network “Statistical techniques and modeling for complex
substantive questions with complex data.”
xv
Chapter 1
Introduction
1.1 Introduction
The development of new and innovative treatments for unmet medical needs is the
major challenge in biomedical research. Unfortunately, for the past decade, there
has been a steady decline in the number of new therapies reaching the market,
despite the increased investments in pharmaceutical R&D (FDA 2004). One of the
most critical steps in a drug discovery program is
target identification and validation (Sams-Dodd 2005). Good drugs are potent and
specific, that is, they must have strong effects on a specific biological pathway and
minimal effects on all other pathways (Marton et al. 1998). Confirmation that a
compound inhibits the intended target (drug target validation) and the identification
of undesirable secondary effects are among the main challenges in developing
new drugs. This is the reason why dose-response experiments are pivotal in drug
discovery programs. Dose-response experiments help us to understand how the
drug works and to explore whether it has the desired properties of a potential novel
therapy. A compound will only move further in clinical testing when it has a side
D. Lin ()
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
W. Talloen L. Bijnens H.W.H. Göhlmann
Janssen Pharmaceutical Companies of Johnson & Johnson, Beerse, Belgium
e-mail: [email protected]; [email protected]; [email protected]
D. Amaratunga
Biostatistics and Programming, Janssen Pharmaceutical Companies of Johnson & Johnson,
Raritan, NJ USA
e-mail: [email protected]
R. Straetemans
Ablynx NV, Zwijnaarde, Belgium
e-mail: [email protected]
effect profile that is acceptable within the dose range that demonstrates a high level
of target activity.
Dose-response experiments have a simple concept. The compound of interest
is administered at several doses to a biological sample (a cell line, an animal
model, a human volunteer, or a patient) and the response is measured. Dose-
response experiments allow researchers to assess the relationship between the dose
(amount, concentration) of a drug and the response observed. Despite the conceptual
simplicity, however, the practical analysis is much more complicated. First, a
response can change dose-dependently in many different ways, and many dose-response
relationships are complex and nonlinear. Second, it is difficult to choose an
appropriate response measure. Often one may want to investigate even more than
one response. This is because a treatment will generally lead to multiple biological
reactions, and one needs to try to disentangle direct from indirect responses and
desired from undesired effects.
Dose-response experiments are typically designed to answer four questions: (1) Is there any evidence of
the drug effect? (2) For which doses is the response different from the response
in the control group? (3) What is the nature of the dose-response relationship?
and (4) What is the optimal dose? We can answer these questions either by
testing for monotone trend of the response with respect to dose or by modeling
the dose-response relationship. In both cases, our underlying assumption is that
the true relationship between the dose and the response is monotone. In some
applications, the underlying assumption of monotonicity is not appropriate, and
other non-monotone order-restricted profiles such as the simple tree order, unimodal
partial order (umbrella profiles), and cyclical patterns should be considered. For a
discussion about monotonicity issues within the dose-response setting, we refer to
Cooke (2009) and Louis (2009).
Dose-response modeling refers to implementing a mathematical representation
of some true and unknown relationship. Dose-response models can be classified
as being empirical or mechanistic in nature. An empirical model, such as the
four-parameter logistic model, discussed in Chaps. 4 and 14, serves to adequately
describe the observed pattern between a dose and a response without giving an
understanding of the underlying biological process. In other words, the parameters
present in the model do not represent biological processes. A mechanistic model, on
the other hand, uses mechanistic pathways to explain the observed pattern. In this
book, we focus on the first type of models.
Now that the genomes of humans and other species have been completely sequenced, we
have entered the so-called "post-genomic era" that concentrates on harvesting the fruits
hidden in the genomic text (Lengauer 2001). The advent of biotechnologies such
as microarrays allows us to do so by effectively measuring the activity of an entire
genome at once under different conditions. The wealth of biological information
of this procedure presents immense new opportunities for developing effective
therapies. History has taught us that 30–40% of experimental drugs fail because
an inappropriate biological target was pursued (Butcher 2003). The major impact
of genomic information may therefore be to reduce this biological failure rate by
earlier definition of drug targets related to disease susceptibility or progression. This
becomes clearer when one reflects about what a “drug-target” actually is. A drug
target is a relatively vague term referring to any number of biological molecular
classes (proteins, genes, RNA, sugars, . . . ) that are “druggable”. To be druggable,
a target needs to be accessible to putative drug molecules and bind them in such
a way that a beneficial biological effect is produced. With microarrays and other
high-content screening tools, a wide array of target identification and validation
technologies becomes available. Genomics, transcriptomics (i.e., gene expression
profiling), and proteomics allow researchers to study many of these drug targets
in an unprecedented high-content way. They allow researchers to monitor and
discover the biological effects of a potential drug. In summary, the availability of
4 D. Lin et al.
the human genome sequence represents an exciting advance for the development
of novel treatments. When combined with high-content screening methods such as
microarrays, the success rate for experimental drugs can be expected to improve
(Butcher 2003). The major issue with microarray data analysis is the curse of high
dimensionality. Because so much information is gathered on biological activity,
it becomes a challenge to find the relevant information in the haystack of irrelevant
information. One runs the risk of missing the interesting results in a mass of false
positive findings.
Microarray dose-response experiments allow researchers to study the relation-
ship between the dose of a drug and the activity of an entire genome at once.
Such experiments combine the information wealth of microarrays with the benefits of dose-
response studies. Their combined use yields two additional advantages. First,
the proportion of false positive findings will be substantially reduced as more
information on the entire dose-response profile is collected. False positive genes
identified by a one-dose treatment study are easier to unmask in multiple-dose
studies when the gene has an unrealistic dose-response relationship. Second, genes
within the same biological sample may respond differently to drug dose. One
therefore wants to investigate more than one gene. How many exactly is difficult
to say, but in a discovery phase, it is typically the more the better. In early stages
of drug development, one indeed tries to explore as many potential effects of the
drug as possible. A microarray dose-response experiment studies the entire genome
at once, and is therefore an ideal tool to elucidate variation in dose-dependency of a
treatment across all genes and all known pathways.
Although analysis of gene expression data is the main focus of the book, the
discussion about microarray technology is beyond the scope of this book. We
refer to Amaratunga and Cabrera (2003) and Göhlmann and Talloen (2009) for
an elaborate discussion about the microarray technology and topics related to the
analysis of microarray data.
The general structure of this book is shown in Fig. 1.1. Although the main part of this
book is devoted to the specific setting of microarray dose-response experiments, we
introduce the main concept of dose-response modeling in the first part of the book.
Estimation under order restrictions and inference are discussed in Chaps. 2 and 3,
while parametric nonlinear modeling of dose-response data is described in Chap. 4.
The methodology discussed in these chapters is introduced in a general setting, and
materials for these chapters are used throughout the second part of the book.
The second part of the book starts with an introduction to dose-response microar-
ray experiments and their specific data structure in Chap. 5, in which the case studies
are introduced as well. The analysis of microarray data introduces the challenge of
multiple testing. A general guidance for the multiple testing problem in a microarray
1.5 Notation
Throughout the book, we denote $d_i$ as the $i$th dose level. We use $\mu(d_i)$ and $\mu_i$ for the mean gene expression at the $i$th dose level. Isotonic means at the $i$th dose are denoted as $\hat{\mu}(d_i)$ and $\hat{\mu}_i$. Unless specified otherwise, all the models presented in the book are gene specific. In order to simplify notation, we drop the index for the gene. R code is presented in the following way:
Note that the R code for the analysis is presented as completely as possible. Due to
space limits and the length of the code, parts of the code are omitted. The complete
R code can be downloaded from the website of the book at:
http://www.ibiostat.be/software/IsoGeneGUI/index.html.
References
Amaratunga, D., & Cabrera, J. (2003). Exploration and analysis of DNA microarray and protein array
data. New York: Wiley.
Barlow, R.E., Bartholomew, D.J., Bremner, M.J., & Brunk, H.D. (1972). Statistical inference under
order restriction. New York: Wiley.
Butcher, S. (2003). Target discovery and validation in the post-genomic era. Neurochemical
Research, 28, 367–377.
Chuang-Stein, C., & Agresti, A. (1997). Tutorial in biostatistics: A review of tests for detecting
a monotone dose-response relationship with ordinal response data. Statistics in Medicine, 16,
2599–2618.
Cooke, R.M. (Ed.). (2009). Uncertainty modeling in dose-response. New York: Wiley.
FDA, U. (2004). Innovation or stagnation: Challenge and opportunity on the critical path to new
medical products. Silver Spring: US Food and Drug Administration.
Goehlmann, H., & Talloen, W. (2009). Gene expression studies using Affymetrix microarrays. Boca
Raton: Chapman & Hall/CRC.
Jacqmin, P., Snoeck, E., van Schaick, E., Gieschke, R., Pillai, P., Steimer, J.-L., et al. (2007).
Modelling response time profiles in the absence of drug concentrations: Definition and perfor-
mance evaluation of the KPD model. Journal of Pharmacokinetics and Pharmacodynamics,
34, 57–85.
Jacobs, T., Straetmans, R., Molenberghs, G., Bouwknecht, A., & Bijnens, L. (2010). Latent phar-
macokinetic time profile to model dose-response survival data. Journal of Biopharmaceutical
Statistics, 20(4), 759–767.
Lengauer, T. (2001). Computational biology at the beginning of the post-genomic era. In
Informatics—10 years back, 10 years ahead (Vol. 355). Heidelberg: Springer.
Louis, T.A. (2009). Math/stat perspective: Agreement and disagreement. In R.M. Cooke (Ed.),
Uncertainty modeling in dose-response. New York: Wiley.
Marton, M., DeRisi, J., Bennett, H., Iyer, V., Meyer, M., Roberts, C., Stoughton, R., Burchard, J.,
Slade, D., & Dai, H. (1998). Drug target validation and identification of secondary drug target
effects using DNA microarrays. Nature Medicine, 4, 1293–1301.
Ruberg, S.J. (1995a). Dose response studies. I. Some design considerations. Journal of Biophar-
maceutical Statistics, 5(1), 1–14.
Ruberg, S. J. (1995b) Dose response studies. II. Analysis and interpretation. Journal of Biophar-
maceutical Statistics, 5(1), 15–42.
Sams-Dodd, F. (2005). Target-based drug discovery: Is something wrong? Drug Discovery Today,
10, 139–147.
Part I
Dose–Response Modeling: An Introduction
Chapter 2
Estimation Under Order Restrictions
2.1 Introduction
The basic setting on which we focus in the first part of this book is one in which a
response variable Y is expected to increase or decrease monotonically with respect
to increasing levels of a predictor variable x which in biomedical applications is
usually the dose or concentration of a drug. We assume that the mean response is
given by
\[ E(Y \mid x) = \mu(x), \]
where $\mu(\cdot)$ is an unknown monotone function. The case in which $\mu(x)$ is order
restricted but not monotone will be discussed in Chaps. 5, 11 and 15 in the second
part of the book. The main problem is that although . / is a monotone function,
unless the sample size at each design point increases to infinity, neither the observed
data nor the estimated means are necessarily monotone. For illustration, consider a
linear model with five discrete design points,
\[ Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \quad i = 1, \dots, 5, \; j = 1, \dots, n_i. \qquad (2.1) \]
Here, $\mu_i$ is the true mean at the $i$th design point, $\mu_i = 5$, 5.5, 6, 6.5, and 7, respectively, and $n_i$ is the sample size at each design point. We generate $10 \times 6$ datasets according to model (2.1) with the sample size at each design point equal to 5, 10, 25, 50, 100, and 1,000, respectively. Figure 2.1a shows an example of
Fig. 2.1 Illustrative example with true mean at each design point equal to 5, 5.5, 6, 6.5, and 7,
respectively. (a) Illustrative example of a dataset with five observations at each design point.
Dashed line: unrestricted means; solid line: true (monotone) means. (b) Sample means at each
design point for ten datasets generated under model (2.1); sample sizes are equal to 5, 10, 25, 50,
100, and 1,000
one dataset with five observations at each design point while Fig. 2.1b shows the
estimated means for ten datasets for an increasing number of observations at each
design point. Clearly, when sample sizes are relatively small, the observed means
are not monotone even though the true means are monotone.
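The essence of Fig. 2.1a can be reproduced with a few lines of R. The sketch below simulates one dataset from model (2.1) with five observations per design point (the residual standard deviation is not given in the text, so sd = 1.5 is an arbitrary choice) and plots the, possibly non-monotone, sample means together with the true monotone means.

```r
set.seed(123)
mu <- c(5, 5.5, 6, 6.5, 7)            # true (monotone) means at the five design points
x  <- rep(1:5, each = 5)              # five observations per design point
y  <- rnorm(length(x), mean = mu[x], sd = 1.5)

obs.means <- tapply(y, x, mean)       # unrestricted sample means (not necessarily monotone)
plot(x, y, xlab = "x", ylab = "y")
lines(1:5, obs.means, lty = 2)        # dashed line: unrestricted means
lines(1:5, mu)                        # solid line: true (monotone) means
```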
Two questions arise now. The first is how to estimate the mean response under
the assumption that the true mean is monotone with respect to x. The second is
how to test whether the mean responses in the population are indeed monotone.
In this chapter, we focus on the first question and discuss the estimation problem
using isotonic regression while the topic of inference under order restrictions will
be discussed in Chap. 3.
The isotonic regression is the function that minimizes the weighted residual sum of squares (2.2) in the class of all isotonic functions $f$ on $X$. The PAVA was proposed by Barlow et al. (1972) and Robertson et al. (1988) in order to minimize (2.2) subject to the constraint $f(x_j) \le f(x_i)$ for $x_j \le x_i$.
Let us focus again on an experiment in which the predictor variable $X$ has $K+1$ discrete levels and the response variable has $n_i$ replicates at each level of the predictor variable. Hence, at each design point, the observed data consist of the pairs $\{(x_i, y_{ij})\}$, $i = 0, 1, \dots, K$, $j = 1, \dots, n_i$. Without loss of generality, we assume $x_0 \le x_1 \le \dots \le x_K$. Denote the maximum likelihood estimate of $\mu(x_i)$ by $\hat{\mu}(x_i)$.
Whenever two adjacent estimates violate the monotonicity constraint, the PAVA pools them into their weighted average,
\[ \hat{\mu}(x_i, x_{i+1}) = \frac{n_i \hat{\mu}(x_i) + n_{i+1} \hat{\mu}(x_{i+1})}{n_i + n_{i+1}}. \]
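To make the pooling rule explicit, a minimal hand-rolled version of the PAVA is sketched below; it is meant only as an illustration (throughout the book the pava() function of the Iso package is used) and merges adjacent violating blocks into their weighted average until the fitted means are monotone.

```r
# minimal PAVA sketch: isotonic (non-decreasing) fit of unrestricted means with weights
simple.pava <- function(means, w) {
  val <- means; wt <- w; len <- rep(1, length(means))   # start with one block per design point
  i <- 1
  while (i < length(val)) {
    if (val[i] > val[i + 1]) {                          # adjacent blocks violate the order
      val[i] <- (wt[i] * val[i] + wt[i + 1] * val[i + 1]) / (wt[i] + wt[i + 1])
      wt[i]  <- wt[i] + wt[i + 1]
      len[i] <- len[i] + len[i + 1]
      val <- val[-(i + 1)]; wt <- wt[-(i + 1)]; len <- len[-(i + 1)]
      i <- max(1, i - 1)                                # a merge may create a new violation to the left
    } else i <- i + 1
  }
  rep(val, len)                                         # fitted isotonic mean for each design point
}

# check against the dental example of Robertson et al. (1988) discussed below
simple.pava(c(22.5, 23.33333, 20.83333, 24.25), w = c(3, 3, 3, 2))
```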
Robertson et al. (1988) discussed a dental study in which the size of the pituitary
fissure was measured for groups of girls at age 8, 10, 12, and 14. The raw data
are shown in Fig. 2.2a. The underlying assumption is that the size of the pituitary
fissure increases with age. However, as can be seen clearly in Fig. 2.2a, the observed
mean at each age group is not monotone with age since the observed mean at age
12 (20.8333) is smaller than the observed mean at age 10 (23.3333).
> age = c(8, 8, 8, 10, 10, 10, 12, 12, 12,14, 14)
> size = c(21,23.5,23,24,21,25,21.5,22,19,23.5,25)
> #sample means at each age group
> msize<-tapply(size,as.factor(age),mean)
> par(mfrow=c(2,2))
> plot(age,size)
> lines(unique(age),msize,lty=2)
> title("a: raw data")
> msize
8 10 12 14
22.50000 23.33333 20.83333 24.25000
In the first step of the PAVA, we pool together the means of the second and third age
groups,
\[ \hat{\mu}(10, 12) = \frac{n_{10} \hat{\mu}(10) + n_{12} \hat{\mu}(12)}{n_{10} + n_{12}}. \]
> msize1<-msize
> msize1[2:3]<-(3*23.33333+3*20.83333)/6
> msize1
8 10 12 14
22.50000 22.08333 22.08333 24.25000
> plot(age,size)
> lines(unique(age),msize,lty=2)
> lines(unique(age),msize1)
> title("b:step 1")
Fig. 2.2 Isotonic regression for the pituitary fissure example from Robertson et al. (1988). Panels
(a)–(c): isotonic regression step by step. Panel (d): isotonic regression using the pava() function.
Dashed line: unrestricted means
The result is presented in Fig. 2.2b, and we can see that after the first pooling, the
mean in the first age group is higher than the pooled mean of the second and third
groups, and therefore, a second pooling is needed,
\[ \hat{\mu}(8, 10, 12) = \frac{n_8 \hat{\mu}(8) + (n_{10} + n_{12}) \hat{\mu}(10, 12)}{n_8 + n_{10} + n_{12}}. \]
> msize2<-msize1
> msize2[1:3]<-(3*22.5+6*22.08333)/9
> msize2
8 10 12 14
22.22222 22.22222 22.22222 24.25000
> plot(age,size)
> lines(unique(age),msize1,lty=2)
> lines(unique(age),msize2)
> title("c:step 2")
Figure 2.2c shows that after the second pooling, the isotonic means are monotone.
Isotonic regression can be fitted using the R function pava(). A general call of the
function in the Iso package has the form
pava(observed means,weights)
The first object is a vector of the unrestricted means while the second object is a
vector which specifies the weight (i.e., the sample size) at each design point. For the
example of the dental study, we use
> iso.r1<-pava(msize,w=c(3,3,3,2))
> iso.r1
8 10 12 14
22.22222 22.22222 22.22222 24.25000
> plot(age,size)
> lines(unique(age),msize,lty=2)
> lines(unique(age),iso.r1)
> title("d:PAVA")
Note that the observation unit in (2.3) is the individual. Alternatively, for data grouped into $n$ unique age groups, $a_1 < a_2 < \dots < a_n$, the likelihood is
\[ L = \prod_{i=1}^{n} \pi(a_i)^{\sum_{j=1}^{n_i} y_{ij}} \bigl(1 - \pi(a_i)\bigr)^{\,n_i - \sum_{j=1}^{n_i} y_{ij}}. \qquad (2.4) \]
Here, $n_i$ is the sample size in the $i$th age group, $N = \sum_{i=1}^{n} n_i$, and $\hat{\pi}(a_i) = \sum_{j=1}^{n_i} y_{ij} / n_i$ is the observed prevalence in the $i$th age group. Figure 2.3 shows the prevalence of rubella in the UK. The maximum likelihood estimators under order constraints for $\pi(a_i)$ in (2.3) and (2.4) are identical. It is the isotonic regression of the observed prevalence, $\hat{\pi}(a_i)$, with weights $n_i$. The isotonic regression is a step function with respect to age with $L^*$ the final number of sets (or the final number of steps) and $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_{L^*}$ the jump points. The observed prevalence and the estimated isotonic means are shown in Fig. 2.3. We can see that as long as the observed prevalence is monotone, the isotonic regression will reproduce the observed mean (for example, as in age groups 4.5–9.5). In our example, the final number of sets is equal to 19. The first violation of the order occurs in age groups 2.5 and 3.5, with observed prevalence equal to 0.2055 and 0.2024, respectively. The second violation of the order occurs at ages 9.5 and 10.5 (with observed prevalence equal to $\hat{\pi}(9.5) = 0.7444$ and $\hat{\pi}(10.5) = 0.6875$, respectively). For these two age
[Fig. 2.3: observed (stars) and isotonic rubella prevalence plotted against age]
groups, the PAVA pools the two means together and estimates the isotonic mean to be equal to $\hat{\pi}(9.5, 10.5) = 0.7176$.
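The rubella data themselves are not listed here, but the estimation step is the same pava() call as before, with the observed prevalences as the unrestricted means and the age-group sample sizes as weights; the numbers below are made up purely for illustration.

```r
library(Iso)

# hypothetical observed prevalences and age-group sample sizes (not the rubella data)
phat <- c(0.10, 0.21, 0.20, 0.45, 0.74, 0.69, 0.90)
ni   <- c(40, 35, 30, 50, 45, 40, 60)

# isotonic (non-decreasing) prevalence estimates, weighted by the group sample sizes
pava(phat, w = ni)
```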
The PAVA discussed in the previous section is the algorithm we use in order
to estimate the isotonic regression for a given dataset. Graphical interpretation
of isotonic regression is discussed by Barlow et al. (1972) and Robertson et al. (1988). Let $w_i$ be the weight at the $i$th design point, $G_\ell = \sum_{i=1}^{\ell} g(x_i) w_i$ and $W_\ell = \sum_{i=1}^{\ell} w_i$. The cumulative sum diagram (CSD) is the curve that joins together the two-dimensional points $P_\ell = (W_\ell, G_\ell)$, $\ell = 0, \dots, K$, with $P_0 = (0, 0)$. Figure 2.4a presents an example of isotonic regression with six design points discussed by Silvapulle and Sen (2005). Note that the first two sample means were pooled together by the PAVA.
# EXAMPLE from Silvapulle and Sen (2005)
> xi<-c(1,2,3,4,5,6)
> wi<-c(1,1,3,1,2,2)
> yi<-c(-1,-3,1,1,3,4)
> plot(xi,yi,pch="*",xlab="x",ylab="y")
> iso1<-pava(yi,w=wi)
> lines(xi,iso1)
> cbind(xi,wi,yi,iso1)
xi wi yi iso1
[1,] 1 1 -1 -2
[2,] 2 1 -3 -2
[3,] 3 3 1 1
[4,] 4 1 1 1
[5,] 5 2 3 3
[6,] 6 2 4 4
Fig. 2.4 Graphical interpretation of the PAVA. (a) Unrestricted (stars) means and isotonic
regression (solid line) for the example in Table 2.9 of Silvapulle and Sen (2005). (b) Left panel:
the cumulative sum diagram (CSD). Right panel: the greatest convex minorant plot (GCM)
The corresponding CSD is shown in Fig. 2.4b. Note that the unrestricted estimates of $g(x_i)$ in Fig. 2.4 are the left slopes at $P_\ell$ of the CSD, that is,
\[ g(x_i) = \frac{G_\ell - G_{\ell-1}}{W_\ell - W_{\ell-1}}. \]
The greatest convex minorant (GCM) is defined as the graph of the supremum of all convex functions whose graphs lie below the CSD (Barlow et al. 1972). The right panel in Fig. 2.4b shows the GCM for our example. Note that the slope of the straight line that joins $P_0$ to $P_1$ is greater than the slope of the straight line that joins $P_0$ to $P_2$. As a result, the point $P_1$ lies above the GCM, and the first two unrestricted means, $\hat{g}(x_1)$ and $\hat{g}(x_2)$, are pooled together by the PAVA.
> par(mfrow=c(1,2))
> # cumulative sums defining the CSD points (definition omitted in the printed code)
> swi<-cumsum(wi)
> swiyi<-cumsum(wi*yi)
> plot(swi,swiyi,pch=" ",xlim=c(0,11),xlab="sum(wi)"
,ylab="sum(wi*yi)")
> points(swi,swiyi,pch="*",cex=2)
> lines(c(-1,0.5),c(0,0))
> lines(c(0,0),c(-1,1))
> lines(c(0,swi),c(0,swiyi))
> title("CSD")
> plot(swi,swiyi,pch=" ",xlim=c(0,11),xlab="sum(wi)"
,ylab="sum(wi*yi)")
> points(swi,swiyi,pch="*",cex=2)
> lines(c(-1,0.5),c(0,0))
> lines(c(0,0),c(-1,1))
> l1<-c(0,2,5,6,8,10)
> l2<-c(0,-4,-1,0,6,14)
> lines(l1,l2,lty=4)
> title("GCM")
Let us focus now on the connection between the GCM plot and the PAVA.
Figure 2.5a presents a hypothetical example with five design points and five
observations at each design point. We can see in Fig. 2.5a and from the panel
below that the PAVA pooled together the first two and the last three unrestricted
means.
Fig. 2.5 Illustrative example of CSD and GCM (see also Fig. 1.2.3 in Robertson et al. 1988). Panel
(a): data, observed means (dashed line) and isotonic means (solid line). Panel (b): CSD (solid line)
and GCM (dashed-dotted line)
# unrestricted means: yi
# isotonic means: iso1
> cbind(xi,wi,yi,iso1)
xi wi yi iso1
1 1 5 0.4424360 -0.3233958
2 2 5 -1.0892276 -0.3233958
3 3 5 1.2481568 0.5663858
4 4 5 0.4272252 0.5663858
5 5 5 0.0237755 0.5663858
Figure 2.5b shows the CSD and the GCM plots. The GCM is the curve which joins $P_0$–$P_2$ and $P_2$–$P_5$ with straight lines (dashed-dotted line). The slope of the straight line that joins $P_0$–$P_1$ is greater than the slope of the straight line that joins $P_0$–$P_2$. As a result, $P_1$ lies above the GCM, and the unrestricted means at $x_1$ and $x_2$ are pooled together. Similarly, since the points $P_3$ and $P_4$ lie above the GCM, the unrestricted means at $x_3, x_4, x_5$ are pooled together.
Let $\hat{\mu}(x) = (\hat{\mu}(x_0), \hat{\mu}(x_1), \dots, \hat{\mu}(x_K))$ be the estimated unrestricted means at the design points $x_0, x_1, x_2, \dots, x_K$ (hence, the unrestricted maximum likelihood estimates) and $n_0, n_1, n_2, \dots, n_K$ be the sample sizes at each design point. As we showed in the previous sections, the maximum likelihood estimate under order restriction is the isotonic regression of $\hat{\mu}(x_i)$ with weights $n_i$. We denote the isotonic regression by $\hat{\mu}^*(x) = (\hat{\mu}^*(x_0), \hat{\mu}^*(x_1), \dots, \hat{\mu}^*(x_K))$. Let $A_\ell$ be the $\ell$th final
set, $\ell = 1, \dots, L^*$. For all design points belonging to the same final set $A_\ell$, say, $x_u < x_{u+1} < \dots < x_{u+v}$, the isotonic regression is the same, $\hat{\mu}^*(x_u) = \hat{\mu}^*(x_{u+1}) = \dots = \hat{\mu}^*(x_{u+v})$. Hence, given the final number of sets, the isotonic regression can be expressed as
\[ \hat{\mu}^* = S \hat{\mu}, \qquad (2.5) \]
where $S$ is a block-diagonal matrix with one sub-matrix $S_\ell$ for each final set $A_\ell$. Suppose that design points $x_i, x_{i+1}, \dots, x_{i+m}$ belong to the final set $A_\ell$. Then the rows in the corresponding sub-matrix are identical and given by
\[ [S_\ell]_{i\cdot} = \left( \frac{n_i}{\sum_{i \in A_\ell} n_i}, \frac{n_{i+1}}{\sum_{i \in A_\ell} n_i}, \dots, \frac{n_{i+m}}{\sum_{i \in A_\ell} n_i} \right), \qquad (2.8) \]
and
\[ \hat{\mu}^*(x_i, \dots, x_{i+m}) = \left( \frac{n_i}{\sum_{i \in A_\ell} n_i}, \frac{n_{i+1}}{\sum_{i \in A_\ell} n_i}, \dots, \frac{n_{i+m}}{\sum_{i \in A_\ell} n_i} \right) \bigl(\hat{\mu}(x_i), \dots, \hat{\mu}(x_{i+m})\bigr)'. \]
It follows that $\sum_i [S_\ell]_{ii} = 1$ and $\mathrm{trace}(S) = L^*$. Within the framework of nonparametric regression, the trace of the smoothing matrix is equivalent to the effective number of parameters (Hastie and Tibshirani 1990). In our setting, it is simply the final number of sets.
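To make the representation (2.5) concrete, the sketch below builds the matrix S block by block, following (2.8), from a pava() fit of the dental example, and checks that S reproduces the isotonic means and that its trace equals the number of final sets; the construction is for illustration only and is not part of the Iso package.

```r
library(Iso)

mu.hat <- c(22.5, 23.33333, 20.83333, 24.25)     # unrestricted means (dental example)
ni     <- c(3, 3, 3, 2)                          # sample sizes used as weights
iso    <- pava(mu.hat, w = ni)

blocks <- cumsum(c(TRUE, diff(iso) != 0))        # label consecutive equal fitted values (final sets)
S <- matrix(0, length(ni), length(ni))
for (b in unique(blocks)) {
  idx <- which(blocks == b)
  S[idx, idx] <- matrix(ni[idx] / sum(ni[idx]), nrow = length(idx),
                        ncol = length(idx), byrow = TRUE)
}

all.equal(as.vector(S %*% mu.hat), as.vector(iso))  # S %*% mu.hat reproduces the isotonic means
sum(diag(S))                                        # trace(S) = number of final sets (here 2)
```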
In the first example presented in Fig. 2.6a, the unrestricted means are in order. As a result, isotonic regression interpolates the unrestricted means. As can be seen in Fig. 2.6b, the GCM in this case is identical to the CSD, so the number of straight lines joining $P_\ell$ to $P_{\ell+1}$ is equal to $K + 1$. In the second example, presented in Fig. 2.6c, the unrestricted means at $x_2$ and $x_3$ were pooled together. This implies
Fig. 2.6 Panel (a): data for example I, unrestricted means (dotted line) and isotonic regression
(solid line). Panel (b): GCM plot (example I). Panel (c): data for example II, unrestricted means
(dotted line) and isotonic regression (solid line). Panel (d): GCM plot (example II)
that the number of straight lines joining $P_\ell$ to $P_{\ell+1}$ in the GCM plot is equal to four. Note that the point $P_2$ in Fig. 2.6d lies above the GCM curve.
# Data for example I
> x1<-c(rnorm(3,10,0.5),rnorm(3,11,0.5),rnorm(3,12,0.5)
,rnorm(3,13,0.5),rnorm(3,14,0.5))
> d1<-c(rep(1,3),rep(2,3),rep(3,3),rep(4,3),rep(5,3))
> plot(d1,x1,xlab="dose",ylab="response")
> mx1<-tapply(x1,as.factor(d1),mean)
> lines(unique(d1),mx1)
> iso.r1<-pava(mx1,w=rep(3,5))
> lines(unique(d1),iso.r1)
> x1<-c(rnorm(3,10,0.5),rnorm(3,11,0.5),rnorm(3,9,0.5)
,rnorm(3,13,0.5),rnorm(3,14,0.5))
> d1<-c(rep(1,3),rep(2,3),rep(3,3),rep(4,3),rep(5,3))
> plot(d1,x1,xlab="dose",ylab="response")
> mx1<-tapply(x1,as.factor(d1),mean)
> lines(unique(d1),mx1,lty=2)
> iso.r1<-pava(mx1,w=rep(3,5))
> lines(unique(d1),iso.r1)
The GCM plot can be produced automatically with the R function isoreg().
A typical call of the function has the form isoreg(x,y), where y is a
vector of the sample means at each design point. For the example discussed above,
we use
> mx1<-c(9.784394,10.907780,9.197900,13.063718,13.907857)
> iso.fit<-isoreg(c(1,2,3,4,5),mx1)
> plot(iso.fit,plot.type = "row")
The object iso.fit contains information about the observed isotonic means
(y and yf, respectively) and the cumulative sum (yc). Figure 2.7 shows the
graphical output of the function. Note that the isoreg() function can be used
only for the case of equal weights, i.e., equal sample sizes.
> iso.fit
Isotonic regression from isoreg(x = c(1, 2, 3, 4, 5), y = mx1),
with 4 knots / breaks at obs.nr. 1 3 4 5 ;
initially ordered ’x’
and further components List of 4
$ x : num [1:5] 1 2 3 4 5
$ y : num [1:5] 9.78 10.91 9.20 13.06 13.91
$ yf: num [1:5] 9.78 10.05 10.05 13.06 13.91
$ yc: num [1:6] 0.00 9.78 20.69 29.89 42.95 ...
The upper panel in Fig. 2.8 shows an example of the propulsion dose-response
experiment in which the response decreases with increasing dose. An elaborate
description of the experiment is given in Sect. 4.2.3. We use the R function
monoreg() to obtain the isotonic regression fit of the data. Note that monoreg()
automatically groups the data and calculates the observed means according to the
unique values of the dose. The monoreg() function can be used to calculate
both increasing (the default) and decreasing monotone means. The response was
measured in eight different unique dose levels. At the first two and last two dose
levels, there are five observations per dose, while for the other dose levels there are
ten observations per dose. Hence, the weights for the isotonic regression (wi) are
given by $w_i = (5, 5, 10, 10, 10, 10, 5, 5)$. To estimate the isotonic means we use the
monoreg() function:
> library(fdrtool)
> wi <- c(5,5,10,10,10,10,5,5)
> # my1: the observed means at each dose level
> y.up <- monoreg(sort(unique(x1)),my1,w=wi,type=c("isotonic"))
In our example, isotonic regression produces a flat profile since all the observed
means were pooled together (see the upper panel in Fig. 2.8). The object yf is the
isotonic mean vector for which all elements are equal to 56.8087.
> y.up
$x
[1] -4.6051702 -3.9120230 -3.2188758 -1.8325815 -0.4620355
[6] 0.9162907 2.3025851 3.6888795
$y
[1] 97.01493 83.28822 84.17740 61.74196 43.70133 32.83132 28.22858 28.26821
$w
[1] 5 5 10 10 10 10 5 5
$yf
[1] 56.80866 56.80866 56.80866 56.80866 56.80866 56.80866 56.80866 56.80866
$type
[1] "isotonic"
$call
monoreg(x = sort(unique(x1)), y = my1, w = wi, type = c("isotonic"))
attr(,"class")
[1] "monoreg"
Fig. 2.8 Isotonic and antitonic regression for the propulsion dose-response experiment. Dashed
line: isotonic regression; solid line: antitonic regression. Upper panel: data are presented in the
original order (response versus log(dose)). Lower panel: the data are presented in the reverse order
(response versus log(dose))
Note that the observed means of the second and the third dose levels were pooled together by the PAVA since the observed means at these dose levels violate the order, $\hat{\mu}_2 = 83.28822$ and $\hat{\mu}_3 = 84.17740$.
> y.down
$x
[1] -4.6051702 -3.9120230 -3.2188758 -1.8325815 -0.4620355
[6] 0.9162907 2.3025851 3.6888795
$y
[1] 97.01493 83.28822 84.17740 61.74196 43.70133 32.83132 28.22858 28.26821
$w
[1] 5 5 10 10 10 10 5 5
$yf
[1] 97.01493 83.88101 83.88101 61.74196 43.70133 32.83132 28.24840 28.24840
$type
[1] "antitonic"
$call
monoreg(x = sort(unique(x1)), y = my1, w = wi, type = c("antitonic"))
attr(,"class")
[1] "monoreg"
If we reverse the order of the dose (i.e., treat the highest dose as the lowest dose),
the antitonic regression produces a flat profile for the means, while the isotonic
regression estimate produces a mirror image of the antitonic means discussed above
(see lower panel in Fig. 2.8).
> y.down
.
.
$yf
[1] 97.01493 83.88101 83.88101 61.74196 43.70133 32.83132 28.24840 28.24840
$type
[1] "antitonic"
> y.up
.
.
$yf
[1] 28.24840 28.24840 32.83132 43.70133 61.74196 83.88101 83.88101 97.01493
$type
[1] "isotonic"
2.6 Discussion
References
Barlow, R. E., Bartholomew, D. J., Bremner, M. J., & Brunk, H. D. (1972). Statistical inference under order restriction. New York: Wiley.
Cooke, R. M. (Ed.). (2009). Uncertainty modeling in dose-response. New York: Wiley.
De Leeuw, J., Hornik, K., & Mair, P. (2009). Isotone optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and active set methods. Journal of Statistical Software, 32(5), 1–24.
Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models. London: Chapman & Hall/CRC.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. New York: Wiley.
Silvapulle, M. J., & Sen, P. K. (2005). Constrained statistical inference: Order, inequality, and shape constraints. New York: Wiley.
Chapter 3
Testing of Equality of Means Against Ordered
Alternatives
3.1 Introduction
In the previous chapter, we discussed the problem of estimation under order restric-
tions. We used isotonic regression to estimate the order-restricted means. This chap-
ter is devoted to order-restricted inference in a dose-response setting. In Sect. 3.2, we
formulate the null hypothesis and the ordered alternatives. In Sect. 3.3, we discuss
the test statistics proposed by Williams (1971, 1972) and Marcus (1976); both are
“t-test”-type test statistics that can be used to compare each nonzero dose level to
the zero dose (i.e., the control) under order constraints. In Sect. 3.4, we discuss the
likelihood ratio test (Barlow et al. 1972; Robertson et al. 1988).
Z. Shkedy ()
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]
D. Lin
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
D. Amaratunga
Biostatistics and Programming, Janssen Pharmaceutical Companies of Johnson & Johnson,
Raritan, NJ, USA
e-mail: [email protected]
We consider the dose-response model
\[ Y_{ij} = \mu(d_i) + \varepsilon_{ij}, \qquad i = 0, 1, \dots, K, \; j = 1, \dots, n_i, \qquad (3.1) \]
where $Y_{ij}$ is the $j$th response at the $i$th dose level, $d_i$ are the $K+1$ dose levels, $d_0$ is the zero dose level (i.e., the control group), $\mu(d_i)$ is the mean response at the $i$th dose level, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ independent of one another. The null hypothesis of no dose effect is given by
\[ H_0: \mu(d_0) = \mu(d_1) = \dots = \mu(d_K). \qquad (3.2) \]
A one-sided alternative hypothesis of a positive dose effect for at least one dose level (i.e., an increasing trend) is specified by
\[ H_1^{\mathrm{Up}}: \mu(d_0) \le \mu(d_1) \le \dots \le \mu(d_K), \qquad (3.3) \]
with at least one strict inequality. When testing the effect of a drug for a positive outcome, the researcher may be able to specify a positive effect as the desirable alternative. However, without prior knowledge, it seems reasonable to assume that the response levels may increase or decrease in response to increasing dose, but with the direction of the trend not known in advance. Thus, we must also consider an additional alternative, a decreasing trend with at least one strict inequality:
\[ H_1^{\mathrm{Down}}: \mu(d_0) \ge \mu(d_1) \ge \dots \ge \mu(d_K). \qquad (3.4) \]
Williams (1971, 1972) proposed the test statistic
\[ t_i = \frac{\hat{\mu}^*_i - \bar{y}_0}{\sqrt{2 s^2 / n}}. \qquad (3.5) \]
Here, $\bar{y}_0$ is the sample mean at the zero dose level (control), $\hat{\mu}^*_i$ is the estimate of the mean at the $i$th dose level under the ordered alternative, and $s^2$ is an estimate of the variance. For $\hat{\mu}^*_i$, Williams (1971, 1972) used the isotonic regression estimates
of the observed means with respect to dose. In case the number of observations at dose $i$, $n_i \ne n$, the denominator of Williams' test statistic can be adjusted as in a two-sample t-test.
Williams’ test procedure is a sequential procedure. In the first step, O ?K is
compared to yN0 . If the null hypothesis is rejected (implying that there is a dose-
response trend across the K groups), O ?K1 is compared to yN0 , etc. This process
stops whenever a null hypothesis is not rejected. If this happens at the K 0 th dose
level (0 K 0 K), then it implies that there is a significant difference between
the control dose and the .K 0 C 1/th dose level onwards. In certain contexts, the
.K 0 C 1/th dose is referred to as the “minimum effective dose” (MED). Identifying
the MED (or its analog, the “no effect level” or NOEL, which would be the K 0 th
dose level) is important in many pharmaceutical settings.
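A rough sketch of the step-down scheme is given below: starting from the highest dose, each isotonic mean is compared with the control sample mean and the procedure stops at the first non-significant comparison. For simplicity the same critical value is reused at every step, whereas Williams' tables give step-specific values; the function and its arguments are illustrative only.

```r
library(Iso)

# step-down Williams-type procedure (sketch); ybar = sample means at doses 0,...,K
williams.stepdown <- function(ybar, n, s2, crit) {
  K <- length(ybar) - 1
  mu.star <- pava(ybar, w = rep(n, K + 1))            # isotonic means
  med <- NA
  for (i in K:1) {
    t.i <- (mu.star[i + 1] - ybar[1]) / sqrt(2 * s2 / n)
    if (t.i < crit) break                             # stop at the first non-significant dose
    med <- i                                          # dose i is still significantly above control
  }
  list(MED = med)
}

# Williams' (1971) example discussed below; critical value 1.81 as quoted in the text
williams.stepdown(ybar = c(10.4, 9.9, 10.0, 10.6, 11.4, 11.9, 11.7),
                  n = 8, s2 = 1.16, crit = 1.81)
```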
Williams (1971) discussed an example of a dose-response experiment including six dose levels and a zero control with $n = 8$ observations at each dose level. The observed means and isotonic regression for the experiment are shown in Fig. 3.1a. Note that the PAVA algorithm pools together the first three dose levels and the last two dose levels ($\hat{\mu}^*_0 = \hat{\mu}^*_1 = \hat{\mu}^*_2 = 10.1$, $\hat{\mu}^*_3 = 10.6$, $\hat{\mu}^*_4 = 11.4$, $\hat{\mu}^*_5 = \hat{\mu}^*_6 = 11.8$).
> ybar<-c(10.4,9.9,10.0,10.6,11.4,11.9,11.7)
> dose<-c(0,1,2,3,4,5,6)
> plot(dose,ybar,ylab="mean response")
> lines(dose,ybar,lty=4)
> iso.y<-pava(ybar,w=rep(1,7))
> lines(dose,iso.y)
> cbind(dose,ybar,iso.y)
dose ybar iso.y
[1,] 0 10.4 10.1
[2,] 1 9.9 10.1
[3,] 2 10.0 10.1
[4,] 3 10.6 10.6
[5,] 4 11.4 11.4
[6,] 5 11.9 11.8
[7,] 6 11.7 11.8
Williams (1971) assumed that the mean square error $s^2$ is equal to 1.16. The test statistic to compare the last and the first dose levels is
\[ t_6 = \frac{11.8 - 10.4}{\sqrt{2 \times 1.16 / 8}} = 2.60. \]
The distribution of the test statistic under the null hypothesis was derived by Williams, and the critical value for significance at 5% is 1.81. As shown in Fig. 3.1b, a significant difference can be detected between the last dose level and the control.
Williams (1971) proposed a modification of the test statistic (3.5) that replaced $\bar{y}_0$ with $\hat{\mu}^*_0$, the estimate of the first dose (control) mean under order restrictions:
\[ t_i = \frac{\hat{\mu}^*_i - \hat{\mu}^*_0}{\sqrt{2 s^2 / n}}. \qquad (3.6) \]
Fig. 3.1 Williams' example. (a) Observed mean response at each dose level; solid line: isotonic regression. (b) Williams' (solid line) and Marcus' (dotted dashed line) test statistics at each dose level. Dashed horizontal line: the critical value for Williams' test statistics (for $\alpha = 0.05$, $K + 1 = 7$, $n = 8$)
Marcus (1976) derived the distribution under the null hypothesis of the modified
test statistic (3.6). For the example presented above, the modified test statistic for
the last two dose levels is
#williams
> (11.8-10.4)/sqrt(2*1.16/8)
[1] 2.599735
#Marcus
> (11.8-10.1)/sqrt(2*1.16/8)
[1] 3.156821
It has been shown that the performance of Marcus’ test is close to that of Williams’
in terms of power (Marcus 1976). In certain cases, the two tests are even equivalent.
For K D 1, Williams’ and Marcus’ test statistics reduce to the two-sample t-test.
The two test statistics are also equal in the case that the observed means are
monotone since in that case, the isotonic regression reproduces the observed means.
Although both Williams’ and Marcus’ test statistics are t-test-type statistics and
formulated in a similar way, for a given direction, they differ in terms of the
distribution of the test statistics under the null hypothesis. Figure 3.2 shows an
example of a dose-response experiment with four dose levels and three observations
at each dose level. We test the null hypothesis (3.2) against the one-sided ordered alternative $H_1^{\mathrm{Up}}$. The observed test statistics are 2.69 and 3.63 for Williams' and
Marcus’ tests, respectively.
> #y:vector of gene expression data.
> #x: vector of dose levels (1,1,1,2,2,2,3,3,3,4,4,4)
> xi<-unique(x) # xi is the dose levels
> wi<-c(3,3,3,3)
> yi.i<-tapply(y,as.factor(x),mean)
> plot(x,y,pch="*",xlab="dose",ylab="gene expression")
> lines(xi,yi.i,lty=2)
> iso.i<-pava(yi.i,w=wi)
> lines(xi,iso.i)
> mse<-anova(aov(y~as.factor(x)))[2,3]
> x0<-yi.i[1]
> m0<-iso.i[1]
> m3<-iso.i[4]
> r<-3
>#Williams test statistic
> tkw.obs<-(m3-x0)/(sqrt(2*mse/r))
>#Marcus test statistic
> tkm.obs<-(m3-m0)/(sqrt(2*mse/r))
>#Williams
> tkw.obs
2.695187
>#Marcus
> tkm.obs
3.639801
Fig. 3.2 Mean gene expression compared with the mean gene expression of the control group. (a) Example of a dose-response experiment. Dashed line: mean response; solid line: isotonic regression. (b) Bootstrap distribution of Williams' (upper panel) and Marcus' (lower panel) test statistics. Vertical line: observed test statistics
We illustrate the difference between the two test statistics using three hypothetical examples presented in Fig. 3.3. In Fig. 3.3a, $\bar{x}_0 = \hat{\mu}^*_0$ and as a result, both test statistics are equal to 1.105. In Fig. 3.3b, we see $\bar{x}_0 > \hat{\mu}^*_0$, and since $\hat{\mu}^*_3 > \bar{x}_0 > \hat{\mu}^*_0$, Williams' test statistic is smaller than Marcus' test statistic, but both statistics are positive. Figure 3.3c shows an example in which $\bar{x}_0 > \hat{\mu}^*_0$ and $\bar{x}_0 > \hat{\mu}^*_3$. As a result, Williams' test statistic is negative, whereas Marcus' test statistic is positive.
Williams’ and Marcus’ procedures are step-down procedures, i.e., the comparison
between a lower dose and control is tested only if the test of a higher dose versus
the control is significant. The underlying assumption is that there is a monotone
dose-response relationship with a known direction. In this section, we discuss the
likelihood ratio test when the direction is unknown. Generally, the likelihood ratio
test presented in this section is applied once, not sequentially, as with the t-test-type statistics of the previous section. It is worth noting, however, that when the objective
is to identify the MED, the likelihood ratio test too can be applied sequentially in
the same way as presented in Sect. 3.3 (Amaratunga and Ge 1998); this approach,
however, is not pursued further in this book.
As a starting point, let us focus on testing the null hypothesis against an unconstrained alternative $H_2$, i.e.,
\[ H_2: \mu(d_i) \ne \mu(d_\ell) \qquad (3.7) \]
for at least one pair of $i$ and $\ell$, $i \ne \ell$. The maximum likelihood estimator under the null hypothesis $H_0$ is the sample mean $\hat{\mu} = \bar{y}$, while under the alternative $H_2$, the maximum likelihood estimators are the sample means at each dose level $(\hat{\mu}_0, \hat{\mu}_1, \dots, \hat{\mu}_K) = (\bar{y}_0, \bar{y}_1, \dots, \bar{y}_K)$. Following Silvapulle and Sen (2005), we define the residual sum of squares under each hypothesis:
\[ \mathrm{RSS}_0 = \sum_{ij} (y_{ij} - \bar{y})^2, \quad \text{residual sum of squares under } H_0, \]
\[ \mathrm{RSS}_2 = \sum_{ij} (y_{ij} - \bar{y}_i)^2, \quad \text{residual sum of squares under } H_2. \qquad (3.8) \]
Fig. 3.3 Three hypothetical examples; solid line: isotonic regression. Test statistics are calculated for the last dose level versus the control. (a) Example 1: $t_{\mathrm{Williams}} = t_{\mathrm{Marcus}} = 1.105$. (b) Example 2: $t_{\mathrm{Williams}} = 3.44$, $t_{\mathrm{Marcus}} = 5.303$. (c) Example 3: $t_{\mathrm{Williams}} = -0.0612$, $t_{\mathrm{Marcus}} = 1.792$
The null hypothesis will be rejected if the difference $\mathrm{RSS}_0 - \mathrm{RSS}_2$ is large or, equivalently, if the value of the pseudo-$F$ statistic given in (3.9) is large. In what follows, we define an equivalent pseudo-$F$ statistic for the case of
an ordered alternative.
Testing the equality of ordered means using a likelihood ratio test, for the case
that the response is assumed to be normally distributed, was discussed by Barlow
et al. (1972) and Robertson et al. (1988). The likelihood ratio test works out to be
the ratio of the error variances under the null and the alternative hypotheses:
$$\Lambda_{01}^{2/N} = \frac{\hat{\sigma}^2_{H_1}}{\hat{\sigma}^2_{H_0}} = \frac{\sum_{ij} \left(y_{ij} - \hat{\mu}^*_i\right)^2}{\sum_{ij} \left(y_{ij} - \hat{\mu}\right)^2} = \frac{\mathrm{RSS}_1}{\mathrm{RSS}_0}, \qquad (3.10)$$
where $\hat{\sigma}^2_{H_0}$ and $\hat{\sigma}^2_{H_1}$ are the ML estimates for the error variance under the null and the alternative hypothesis, respectively, and $\mathrm{RSS}_1$ is the residual sum of squares under the ordered alternative. The null hypothesis is rejected for a small value of $\Lambda_{01}^{2/N}$. Equivalently, $H_0$ is rejected for a large value of $\bar{E}^2_{01}$, where
$$\bar{E}^2_{01} = 1 - \Lambda_{01}^{2/N} = \frac{\sum_{ij} \left(y_{ij} - \hat{\mu}\right)^2 - \sum_{ij} \left(y_{ij} - \hat{\mu}^*_i\right)^2}{\sum_{ij} \left(y_{ij} - \hat{\mu}\right)^2} = \frac{\mathrm{RSS}_0 - \mathrm{RSS}_1}{\mathrm{RSS}_0}. \qquad (3.11)$$
Under the null hypothesis, the distribution of $\bar{E}^2_{01}$ is a mixture of beta distributions, with mixing probabilities given by the level probabilities $P(\ell, L; w)$:
$$P\left(\bar{E}^2_{01} \geq c\right) = \sum_{\ell=1}^{L} P(\ell, L; w)\, P\left(B_{\frac{1}{2}(\ell-1),\, \frac{1}{2}(N-\ell)} \geq c\right), \qquad (3.12)$$
where $B_{a,b}$ denotes a beta-distributed random variable with parameters $a$ and $b$.
Fig. 3.4 Level probabilities and the components of the beta mixture distribution for N = 12 and L = 4. Panel (a): level probabilities by number of levels; panels (b)–(d): Beta(0.5, 0.5(N−2)), Beta(0.5·2, 0.5(N−3)), and Beta(0.5·3, 0.5(N−4)) densities
[Fig. 3.5, caption partially recovered: (a) gene expression data with observed means; (b) histogram and density estimate for the bootstrap replicates of $\bar{E}^2_{01}$]
# x: dose
# y: gene expression
> cbind(x,y)
x y
V2 1 6.948563
V3 1 6.859254
V4 1 6.793810
V5 2 6.621029
V6 2 6.560225
V7 2 6.758055
V8 3 6.960683
V9 3 6.837769
V10 3 6.750577
V11 4 7.257968
V12 4 6.920253
V13 4 7.368275
> xi <- unique(x) # xi is the dose levels
> yi.i<-tapply(y,as.factor(x),mean)
> plot(x,y,pch="*",xlab="dose",ylab="gene expression")
> lines(xi,yi.i,lty=2)
> iso.i<-pava(yi.i,w=wi)
> lines(xi,iso.i)
The observed test statistic is equal to 0.6078, and the p value of 0.0069 is calculated using the distribution of $\bar{E}^2_{01}$ according to (3.12). The small p value indicates that the null hypothesis should be rejected.
> NN<-12                               # total number of observations
> y.m<-tapply(y,as.factor(x),mean)
> y.is.u<-pava(y.m,w=wi)               # isotonic means
> rep.iso.u<-rep(y.is.u,wi)
> RSS0<-sum((y-mean(y))^2)
> RSS1<-sum((y-rep.iso.u)^2)
> Ebar01.obs<-1-(RSS1/RSS0)
> (RSS0-RSS1)/RSS0
[1] 0.6077756
> # p value from the beta mixture (3.12); 0.45833, 0.25, and 0.04167 are the
> # level probabilities P(l,4;w) for l=2,3,4 (the l=1 component does not
> # contribute to the tail probability)
> 0.45833*(1-pbeta(Ebar01.obs,0.5,0.5*(NN-2)))+
+0.25*(1-pbeta(Ebar01.obs,0.5*2,0.5*(NN-3)))+
+0.04167*(1-pbeta(Ebar01.obs,0.5*3,0.5*(NN-4)))
[1] 0.006993
The analysis presented above relies on the assumption that gene expression observa-
tions are normally distributed. An alternative approach is to perform a bootstrap test
(Efron and Tibshirani 1993) which does not require any distributional assumption
for validity. The basic idea is to generate bootstrap samples under the null hypothesis
by resampling with replacement from the gene expression vector and keeping the
dose level fixed. We used the following bootstrap algorithm:
1. For b in 1:B:
   (a) Generate a bootstrap sample of size N, drawn with replacement under the null hypothesis from the observed data.
   (b) For each bootstrap sample, calculate the residual sums of squares under the null and alternative hypotheses, $\mathrm{RSS}_0^{(b)}$ and $\mathrm{RSS}_1^{(b)}$, respectively, and calculate the bootstrap replicate of the test statistic
   $$\bar{E}^{2(b)}_{01} = 1 - \frac{\mathrm{RSS}_1^{(b)}}{\mathrm{RSS}_0^{(b)}}.$$
2. Calculate the bootstrap p value as the proportion of bootstrap replicates that exceed the observed test statistic,
   $$P = \frac{\#\left(\bar{E}^{2(b)}_{01} > \bar{E}^{2(\mathrm{obs})}_{01}\right)}{B}.$$
> wi<-c(3,3,3,3)
> set.seed(100)
> B<-2000
> Ebar01.boot<-c(1:B)
>#begin the bootstrap
> for(i in 1:B)
+ {
+ index<-sample(c(1:length(y)), length(y),
replace = TRUE)
+ y.boot<-y[index]
+ y.m<-tapply(y.boot,as.factor(x),mean)
+ y.is.u<-pava(y.m,w=wi)
+ rep.iso.u<-rep(y.is.u,wi)
+ RSS0<-sum((y.boot-mean(y.boot))^2)
+ RSS1<-sum((y.boot-rep.iso.u)^2)
+ Ebar01.boot[i]<-1-(RSS1/RSS0)
+ }
>#end of the bootstrap
> # P-VALUE
> sum(Ebar01.boot>Ebar01.obs)/B
[1] 0.01
The bootstrap p value is equal to 0.01, indicating that the null hypothesis should be rejected. Figure 3.5b shows the histogram of the bootstrap replicates for $\bar{E}^2_{01}$.
3.5 Discussion
In Chaps. 2 and 3, we discussed the estimation and inference for a one-way ANOVA model of the form
$$Y_{ij} = \mu(d_i) + \varepsilon_{ij}.$$
We use isotonic regression in order to estimate the mean response under order
restrictions at each dose level. The testing procedures we discussed in this chapter
were used either to test the global null hypothesis of no dose effect (using the
LRT test) or to compare the mean responses of two dose levels (Williams’
and Marcus’ test statistics). Using isotonic regression implies that the estimated
dose-response curve is a step function. In the next chapter, we focus on parametric
dose-response models for which
$$\mu(d_i) = f(\theta, d_i).$$
References
Amaratunga, D., & Ge, N. (1998). Step-down trend tests to determine a minimum effective dose.
Journal of Biopharmaceutical Statistics, 8, 145–156.
Barlow, R. E., Bartholomew, D. J., Bremner, M. J., & Brunk, H. D. (1972). Statistical inference under order restrictions. New York: Wiley.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman &
Hall.
Marcus, R. (1976). The powers of some tests of the equality of normal means against an ordered alternative. Biometrika, 63, 177–183.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference.
New York: Wiley.
Silvapulle, M. J., & Sen, P. K. (2005). Constrained statistical inference: Order, inequality, and
shape constraints. New York: Wiley.
Williams, D. A. (1971). A test for differences between treatment means when several dose levels
are compared with a zero dose control. Biometrics, 27, 103–117.
Williams, D. A. (1972). The comparison of several dose levels with a zero dose control.
Biometrics, 28, 519–531.
Chapter 4
Nonlinear Modeling of Dose-Response Data
Roel Straetemans
4.1 Introduction
R. Straetemans
Ablynx NV, Zwijnaarde, Belgium
e-mail: [email protected]
$$Y_{ij} = \theta_1 + \frac{\theta_4 - \theta_1}{1 + 10^{(x_i - \theta_2)\theta_3}} + \varepsilon_{ij}, \quad i = 0, \ldots, K;\; j = 1, 2, \ldots, n_i, \qquad (4.1)$$
where $Y_{ij}$ is the jth observation at the ith dose level, $x_i$ is dose $i$ expressed as a decadic logarithm (base 10 value), $\theta_1$ and $\theta_4$ are the response values in the asymptotic regions of the curve, $\theta_2$ is the x (dose) value for which half the effect (the difference between the two asymptotes) is obtained, and $\theta_3$ is a slope parameter. It is further assumed that $\varepsilon_{ij} \sim N(0, \sigma^2)$. The 4PL model (4.1) uses the log10 parametrization and can be specified in the loge parametrization if wanted or deemed more suitable. The 4PL nonlinear model in (4.1) can generally be written as a nonlinear regression model of the form $Y_{ij} = f(x_i, \boldsymbol{\theta}) + \varepsilon_{ij}$.
The parameters $\theta_1$ and $\theta_4$ in Eq. (4.1) correspond to the response values at the plateaus of the sigmoidal curve depicted in Fig. 4.2a. Which of the two parameters corresponds to the asymptote at dose zero and which to the asymptote at infinite dose depends on the sign of the slope parameter $\theta_3$. If $\theta_3 > 0$, $\theta_1$ corresponds to the response value at infinite dose and $\theta_4$ to the response value at dose zero, as illustrated in the left panel of Fig. 4.2a. If $\theta_3 < 0$, the interpretation of $\theta_1$ and $\theta_4$ is reversed, as shown in the right panel of Fig. 4.2a.
The parameter $\theta_3$ is a slope parameter which determines the steepness of the curve or, in other words, quantifies the sensitivity of the response to the dose range of the drug. Figure 4.2b shows three 4PL models which are identical except for the value of $\theta_3$ and illustrates how a larger absolute value of $\theta_3$ corresponds to a steeper curve. An important consequence of a steep dose-response curve is that the response is sensitive to relatively small changes in dose between the two plateau values.
The parameter $\theta_2$ is commonly referred to as the ED50 dose or, alternatively, ID50, EC50, or IC50, depending on the response (effect or inhibition) and the exposure (dose or concentration). It is the dose at which 50% of the effect is observed, where 50% of the effect corresponds to the value at half the difference between the two asymptotes.
Fig. 4.2 Parameters in the 4PL model. (a) $\theta_1$ and $\theta_4$ parameters in the 4PL model (4.1). Left panel: $\theta_3 > 0$. Right panel: $\theta_3 < 0$. (b) The influence of $\theta_3$ on the sigmoidal pattern of the 4PL model ($\theta_3$ = 0.5, 1, 2)
Notice that $\theta_2$ is the dose value at the inflection point of the sigmoidal curve. This is graphically illustrated in Fig. 4.3a, where the dose-response profile goes from a minimum response of 20 at the lowest dose to a maximum response of 100 at the highest dose, meaning that the maximum obtained dose effect is an increase of 80. The ED50 parameter corresponds to the dose for which the response value is 60.
Fig. 4.3 The ED50 ($\theta_2$) parameter in the 4PL model. (a) Illustration of the influence of $\theta_2$ on the sigmoidal pattern of the 4PL model ($\theta_1$: lower asymptote, $\theta_4$: upper asymptote, $\theta_2$: ED50). (b) Dose-response curves for two compounds following a 4PL dose-response model (4.1), where one compound is ten times as potent as the other
A common mistake is to think that, in cases where the response value is a percent-
age, the ED50 parameter is the dose for which the response value equals 50%. This
is only true if the lower and upper asymptotes are 0% and 100%, or, more generally, when the response halfway between the two asymptotes happens to equal 50%, e.g., when the asymptotes go from 20% to 80%.
The ED50 parameter is also called the potency of a compound, since it quantifies
the dose necessary to reach 50% of the maximal effect and as such is a measure
of how potent the compound is. Let us assume that at the early stages of research,
an animal study is performed to compare the efficacy of two compounds. When the
ED50 dose of one compound (Fig. 4.3b, solid line) is ten times lower than the ED50 dose of the other (dotted line), this compound is said to be ten times as potent.
Potency, however, tells only part of the story. When two compounds have the
same efficacy window (difference between the two asymptotes), and assuming both
are equivalent with respect to factors such as toxicity and pharmacokinetic (PK)
characteristics, the difference in potency is obviously very meaningful and the more
potent compound will be the preferred chemical entity to go further in development.
If a difference in efficacy window between two compounds exists, a difference in
their potency becomes less meaningful. Imagine an example where the more potent
compound has a maximal effect of 80% and the less potent compound a maximal
effect of 100%. With all other relevant characteristics being comparable, the choice
of preference is less obvious. Overall, however, the potency, expressed as the ED50
dose, is a powerful tool to summarize and compare different compounds under the
assumption that the molecular weights are comparable.
The propulsion dataset originates from an in vivo charcoal meal test in rats. This
test was developed to evaluate effects of a test compound on bowel motility. The
protocol of the study is as follows. A charcoal meal is administered by gavage to
rats. Twenty minutes after feeding, the rats are sacrificed and the small intestine
is surgically removed. The length traversed by the charcoal is divided by the
total length of the small intestine and multiplied by 100, resulting in the percent
distance traveled as the response of interest. For an elaborate discussion about the
experiment, we refer to Tanila et al. (1993).
The study used a parallel group design with a vehicle group (dose = 0) and eight doses (0.01, 0.02, 0.04, 0.16, 0.63, 2.5, 10, or 40 mg/kg) of three
chemicals. Each dose group had at least 5 animals, the vehicle group had 64 animals,
and some dose groups had 10 animals. The raw data are shown in Fig. 4.4, with the
individual animal data in the left panel and the averages by treatment and dose in the
right panel. The upper panel shows the response plotted versus dose, and the lower
panel shows the response plotted versus loge dose, with dose 0 replaced by a small
value for plotting purposes. Figure 4.4 shows that there is a decrease in ratio with
Fig. 4.4 The propulsion dataset. Left panels: individual data. Right panels: mean profiles. Top
panels: dose plotted on the absolute scale (mg/kg). Bottom panels: dose plotted on the loge scale
increasing dose for the three treatments, that the average vehicle response is close
to 100%, and that for treatment 1 no responses at the maximum dose (40 mg/kg)
are available. Noteworthy is the apparently less steep decline of the ratio under
treatment 1 compared to the other two treatments.
Our primary interest is to test for treatment effects, i.e., to test whether the three treatments differ in their dose-response profiles. The following 4PL model was fitted to the data:
$$Y_{ij} = \theta_1 + \frac{\theta_4 - \theta_1}{1 + e^{(x_i - \theta_2)\theta_3}} + \varepsilon_{ij}. \qquad (4.3)$$
Notice that the dose in Eq. (4.3) is expressed as the natural logarithm (base e value),
the reason being the doubling pattern in the dose range studied. The 4PL model can
be fitted in R using the function gnls() (Pinheiro and Bates 2000). A general call
for the function has the form:
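The original general call is not reproduced in this excerpt. As a hedged sketch of the general structure of a gnls() call from the nlme package, the following fits a 4PL model to simulated data (all object names, parameter values, and starting values here are illustrative only, not the book's):

library(nlme)
set.seed(1)
## simulate a decreasing 4PL dose-response curve (illustrative values only)
ldose <- rep(log(c(0.01, 0.04, 0.16, 0.63, 2.5, 10, 40)), each = 6)
mu    <- 25 + (100 - 25)/(1 + exp((ldose - (-1.5)) * 0.8))
sim   <- data.frame(ldose = ldose, y = mu + rnorm(length(mu), sd = 5))
## general structure: model formula, data, parameter model, starting values, control
gnls.sketch <- gnls(y ~ th1 + (th4 - th1)/(1 + exp((ldose - th2) * th3)),
                    data    = sim,
                    params  = list(th1 + th2 + th3 + th4 ~ 1),
                    start   = c(20, -1, 1, 90),
                    control = gnlsControl(nlsTol = 0.1))
summary(gnls.sketch)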
Fig. 4.5 The propulsion data and the estimated 4PL model ($\hat\theta_4$ = 99.08, $\hat\theta_1$ = 26.42, $\hat\theta_2$ = −1.61)
The 4PL model (4.3) was fitted using the following code:
gnls.model001<-gnls(
ratio~(th1+(th4-th1)/(1+(exp((lmpk-th2)*th3)))),
data=data2b,
params=list(th1+th2+th3+th4~1),
start=c(90,-0.2,1,28),
control=gnlsControl(nlsTol=0.1))
(th1+(th4-th1)/(1+(exp((lmpk-th2)*th3))))
implies that the model is fitted without taking into account possible treatment differences. Parameter estimates are shown in the panel below and in Fig. 4.5. The vertical line is the ED50 value, $\hat\theta_2 = -1.614$, while the parameter estimates for the upper ($\hat\theta_4$) and lower ($\hat\theta_1$) asymptotes are equal to 99.08 and 26.42, respectively.
> summary(gnls.model001)
Generalized nonlinear least squares fit
Model: ratio ~ (th1 + (th4 - th1)/(1 + (exp((lmpk - th2) * th3))))
Data: data2b
AIC BIC logLik
1627.413 1644.358 -808.7064
Coefficients:
Value Std.Error t-value p-value
th1 26.42247 2.4654373 10.71715 0
th2 -1.61385 0.1534649 -10.51606 0
th3 0.78141 0.0755436 10.34384 0
th4 99.07828 1.2469491 79.45655 0
Correlation:
th1 th2 th3
th2 -0.728
th3 0.702 -0.408
th4 -0.197 -0.237 -0.443
Standardized residuals:
Min Q1 Med Q3 Max
-4.1327652 -0.5091635 0.1135234 0.5000150 2.2931539
In the next step, the 4PL model is fitted to the data, taking into account possible
treatment effects. This can be done by including a dummy variable for each
treatment. In the code below, the R objects dum1, dum2, dum3, and dum4 are
dummy variables for the vehicle data, treatment groups 1, 2, and 3, respectively,
which take the value 1 for the respective group and zero for the others. This model
specification allows estimation of $\theta_4$ based on the vehicle data and estimation of separate $\theta_1$, $\theta_2$, and $\theta_3$ parameters for each of the active treatments (with different lower asymptotes, ED50s, and slopes). The model was fitted using the following
code:
gnls.model002<-gnls(
ratio ~ th4*(dum1) +
(th11 + (th4-th11)/(1 + (exp((lmpk-th21)*th31))))*(dum2) +
(th12 + (th4-th12)/(1 + (exp((lmpk-th22)*th32))))*(dum3) +
(th13 + (th4-th13)/(1 + (exp((lmpk-th23)*th33))))*(dum4),
data = data2b,
params=list(th4+th21+th22+th23+th31+th32+th33+
th11+th12+th13~1),
start=c(90,
-0.2, -1.3, -1.3,
0.4, 0.9, 1,
28, 28, 28),
control=gnlsControl(nlsTol=0.1, apVar=TRUE))
Data and estimated models for each group are shown in Fig. 4.6. Parameter estimates
are shown below.
> summary(gnls.model002)
Generalized nonlinear least squares fit
Model: ratio ~ th4 * (dum1)
+ (th11 + (th4 - th11)/(1 + (exp((lmpk -th21) * th31)))) * (dum2)
+ (th12 + (th4 - th12)/(1 + (exp((lmpk -th22) * th32)))) * (dum3)
+ (th13 + (th4 - th13)/(1 + (exp((lmpk -th23) * th33)))) * (dum4)
Data: data2b
AIC BIC logLik
1603.530 1640.809 -790.7648
Coefficients:
Value Std.Error t-value p-value
th4 98.98596 1.10624 89.47954 0.0000
th21 0.36960 2.32540 0.15894 0.8739
th22 -1.87467 0.18409 -10.18359 0.0000
th23 -1.58505 0.16093 -9.84918 0.0000
th31 0.36993 0.11659 3.17282 0.0017
th32 0.89430 0.11817 7.56773 0.0000
th33 1.03261 0.13712 7.53061 0.0000
th11 4.16074 37.10361 0.11214 0.9108
th12 27.37244 2.95050 9.27722 0.0000
th13 26.56520 2.82509 9.40332 0.0000
Next, a likelihood ratio test is performed to determine whether the null hypothesis
of all parameters being equal, i.e., the treatments having equal dose-response
profiles, can be rejected or not. This can be done using the R function anova() in
the following way:
anova(gnls.model001, gnls.model002)
Model df AIC BIC logLik Test L.Ratio p-value
gnls.model001 1 5 1627.413 1644.358 -808.7064
gnls.model002 2 11 1603.530 1640.809 -790.7648 1 vs 2 35.88319 <.0001
The two models differ in six degrees of freedom since the full model gnls.model002 has six more parameters than the null model gnls.model001, namely separate $\theta_1$, $\theta_2$, and $\theta_3$ parameters for each treatment group. The p value for the likelihood ratio test is smaller than 0.0001, indicating that there is statistical evidence to reject the hypothesis that the three treatments have an equal dose-response curve. Parameter estimates and 95% confidence intervals are obtained using the function intervals(). Note that the confidence intervals for $\theta_{21}$ and $\theta_{11}$, which were found to be nonsignificant, cover the value of zero, as expected.
> intervals(gnls.model002)
Approximate 95% confidence intervals
Coefficients:
lower est. upper
th4 96.8051420 98.9859633 101.166785
th21 -4.2146358 0.3696039 4.953844
th22 -2.2375775 -1.8746713 -1.511765
th23 -1.9023087 -1.5850504 -1.267792
th31 0.1400810 0.3699315 0.599782
th32 0.6613390 0.8943032 1.127267
th33 0.7622878 1.0326054 1.302923
th11 -68.9845584 4.1607408 77.306040
th12 21.5558871 27.3724447 33.189002
th13 20.9958865 26.5652039 32.134521
attr(,"label")
[1] "Coefficients:"
[Fig. 4.6, caption not recovered: the propulsion data and the estimated 4PL models per treatment group; panel annotations: (a) $\hat\theta_{11}$ = 4.16, $\hat\theta_{21}$ = 0.37; (b) $\hat\theta_{12}$ = 27.37, $\hat\theta_{22}$ = −1.87; (c) $\hat\theta_{13}$ = 26.57, $\hat\theta_{23}$ = −1.59; common $\hat\theta_4$ = 98.99]
The broad confidence intervals for $\theta_{21}$ (0.015; 141.72, obtained by exponentiating the estimated lower and upper values in the panel above) and $\theta_{11}$ are noteworthy; however, they should not be unexpected. Figure 4.4 shows that
for treatment 1, no data were available for the 40 mg/kg dose and, as a result, no information on the lower plateau of the curve is present. This lack of information is reflected in the high uncertainty of the parameter estimate for $\theta_{11}$. Note that $\theta_2$ and $\theta_1$ in the 4PL model tend to be correlated, since the parameter $\theta_2$ is defined as the dose for which half of the total effect is obtained, where the total effect is given by the difference between the lower and upper asymptotes. As a result, the missing information in the lower plateau area for treatment 1 is translated into highly imprecise estimates for $\theta_{21}$ and $\theta_{11}$. Although this dataset is a good example to illustrate how the 4PL dose-response model can be fitted for independent data, it also serves as a warning: these models are very flexible in nature, and care should be given to the interpretation of the obtained results.
The nonlinear models formulated in (4.4) (Pinheiro and Bates 2000, p. 517) and in (4.5) (Plikaytis et al. 1991) are two examples of the 4PL model with different parametrizations. Note that for the 4PL model (4.5), the dose variable $x_i$ is introduced on the original dose scale. Although this model will often fit the data, the disadvantage is that the ED50 parameter is typically distributed on the log scale. Ignoring this might lead to negative lower limits of the estimated confidence intervals.
$$Y_{ij} = \theta_1 + \frac{\theta_4 - \theta_1}{1 + e^{(\theta_2 - x_i)/\theta_3}} + \varepsilon_{ij}, \qquad (4.4)$$
and
$$Y_{ij} = \theta_1 + \frac{\theta_4 - \theta_1}{1 + (x_i/\theta_2)^{\theta_3}} + \varepsilon_{ij}. \qquad (4.5)$$
The two models can be fitted using the following code:
# 4PL model (4.4)
gnls.dp1<-gnls(ratio~(th1+(th4-th1)/(1+(exp((th2-lmpk)/th3)))),
data = data2b,
params = list(th1 + th2 + th3 + th4 ˜1),
start=c(90,-0.2,1,28),
control=gnlsControl(nlsTol=0.1))
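The corresponding call for model (4.5) is not reproduced in this excerpt. Based on the model formula shown in the gnls.dp2 summary output further below, a sketch of such a call would be (starting values are illustrative only):

# 4PL model (4.5): dose (mpk) on the original scale; a sketch based on the model
# formula in the summary output below; starting values are illustrative only
gnls.dp2<-gnls(ratio~(th1+(th4-th1)/(1+((mpk/th2)^th3))),
               data = data2b,
               params = list(th1 + th2 + th3 + th4 ~1),
               start=c(26,0.2,0.8,99),
               control=gnlsControl(nlsTol=0.1))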
Parameter estimates are shown in the panel below, and the fitted models are shown in Fig. 4.7. Notice how in model (4.4), the meaning of $\theta_1$ and $\theta_4$ is switched due
Fig. 4.7 The propulsion data and estimated 4PL models (4.4) (left panel) and (4.5) (right panel); panel annotations: asymptotes 99.08 and 26.42 (with the roles of $\theta_1$ and $\theta_4$ switched between the two parametrizations), $\hat\theta_2$ = −1.61
to the shift in position between $x_i$ and $\theta_2$. For graphical consistency, the fitted model for (4.5) in Fig. 4.7 is shown with dose on the loge scale, while the model-predicted values and the $\theta_2$ parameter are on the absolute scale. Models (4.4) and (4.5) are just two examples of alternative parametrizations, and others can be found.
# R output for 4PL model (4.4)
> summary(gnls.dp1)
Generalized nonlinear least squares fit
Model: ratio ~ (th1 + (th4 - th1)/(1 + (exp((th2 - lmpk)/th3))))
Data: data2b
AIC BIC logLik
1627.413 1644.358 -808.7064
Coefficients:
Value Std.Error t-value p-value
th1 99.07828 1.2469491 79.45655 0
th2 -1.61385 0.1534648 -10.51607 0
th3 1.27974 0.1237196 10.34384 0
th4 26.42247 2.4654370 10.71715 0
Correlation:
th1 th2 th3
th2 -0.237
th3 0.443 0.408
th4 -0.197 -0.728 -0.702
Standardized residuals:
Min Q1 Med Q3 Max
-4.1327652 -0.5091636 0.1135235 0.5000151 2.2931540
> summary(gnls.dp2)
Generalized nonlinear least squares fit
Model: ratio ~ (th1 + (th4 - th1)/(1 + ((mpk/th2)^th3)))
Data: data2b
AIC BIC logLik
1627.413 1644.358 -808.7064
Coefficients:
Value Std.Error t-value p-value
th1 26.42245 2.4654394 10.71714 0
th2 0.19912 0.0305580 6.51615 0
th3 0.78141 0.0755435 10.34384 0
th4 99.07828 1.2469494 79.45654 0
Correlation:
th1 th2 th3
th2 -0.728
th3 0.702 -0.408
th4 -0.197 -0.237 -0.443
Standardized residuals:
Min Q1 Med Q3 Max
-4.1327649 -0.5091627 0.1135234 0.5000143 2.2931538
When certain parameters of the 4PL model are known to be fixed, the model can be simplified. Assume that, for an increasing curve, the asymptote at infinite dose is fixed at a known constant C. In this case, the 4PL model (4.1) reduces to the 3PL model (4.6), where the parameter $\theta_1$ (if $\theta_3 > 0$) is replaced with the known constant C:
$$Y_{ij} = C + \frac{\theta_4 - C}{1 + 10^{(x_i - \theta_2)/\theta_3}} + \varepsilon_{ij}. \qquad (4.6)$$
Similarly, two or three parameters can be fixed, i.e., both plateau values and/or the slope parameter, and the 4PL model can respectively be simplified to a 2PL or 1PL model. In the R code below, a common 3PL model (4.6) is fitted to the different treatments in the propulsion dataset with C = 0.
gnls.model3PL<-gnls(ratio~((th4)/(1+(exp((lmpk-th2)*th3)))),
data = data2b,
params = list(th4 + th2 + th3 ~1),
start=c(100,-0.2,1),
control=gnlsControl(nlsTol=0.1))
The parameter estimates are shown in the panel below, and the estimated model is shown in Fig. 4.8a. We notice that, since we constrain the model to have a lower
asymptote equal to zero, the parameter estimate for the ED50 changed to $\hat\theta_2 = -0.303$, compared with $\hat\theta_2 = -1.61385$ obtained in Sect. 4.2.2 for the 4PL model. The Akaike information criterion (AIC, Akaike 1973) obtained for the 3PL model (1,658.058) is higher than the AIC of the 4PL model (1,627.413), indicating that the 4PL model is to be preferred.
> summary(gnls.model3PL)
Generalized nonlinear least squares fit
Model: ratio ~ ((th4)/(1 + (exp((lmpk - th2) * th3))))
Data: data2b
AIC BIC logLik
1658.058 1671.614 -825.029
Coefficients:
Value Std.Error t-value p-value
th4 102.36701 1.6565090 61.79683 0.0000
th2 -0.30292 0.1344734 -2.25267 0.0253
th3 0.43089 0.0260811 16.52115 0.0000
Correlation:
th4 th2
th2 -0.678
th3 -0.635 0.302
Standardized residuals:
Min Q1 Med Q3 Max
-3.53495283 -0.32513218 -0.01983675 0.43433966 2.67578036
Another well-known dose-response model is the so-called Emax model, also called the Hill model, given by
$$Y_{ij} = E_0 + \frac{x_i^n\, E_{\max}}{x_i^n + ED_{50}^n} + \varepsilon_{ij}. \qquad (4.7)$$
Here, $Y_{ij}$ is the jth observation at dose $x_i$, $E_0$ is the base effect, corresponding to the response when the dose equals zero, $E_{\max}$ is the maximum drug effect, $ED_{50}$ is the dose producing half of the $E_{\max}$ effect, and $n$ is the slope parameter, also called the Hill factor. A detailed description of the Emax model can be found in Chap. 9 of Ting (2006). In the R code below, a common Emax model (4.7) is fitted to the different treatments in the propulsion dataset:
gnls.modelEmax2<-gnls(ratio~E0+((Emax*mpk^n)/(ED50^n+mpk^n)),
data = data2b,
params = list(E0+Emax+ED50+n ~1),
start=c(1,70,0.1,-1),
control=gnlsControl(nlsTol=0.1))
The parameter estimates are shown below, and Fig. 4.8b shows the estimated model. Note that the parameter estimate $\widehat{ED}_{50} = 0.19912$ is similar
[Fig. 4.8, caption not recovered: the propulsion data and estimated models; (a) 3PL model; (b) Emax model with $\hat{E}_{\max}$ = 72.66 and $\hat{E}_0$ = 26.42; (c) 5PL model with $\hat\theta_4$ = 98.51, $\hat\theta_1$ = 21.15, halfway dose = −1.479, and the 4PL $\hat\theta_2$ = −1.614 shown for comparison]
to the exponent of $\hat\theta_2$ obtained from the 4PL model in Sect. 4.2.2: $\exp(\hat\theta_2) = \exp(-1.61385) = 0.19911$.
> summary(gnls.modelEmax2)
Generalized nonlinear least squares fit
Model: ratio ~ E0 + ((Emax * mpk^n)/(ED50^n + mpk^n))
Data: data2b
AIC BIC logLik
1627.413 1644.358 -808.7064
Coefficients:
Value Std.Error t-value p-value
E0 26.42248 2.4654354 10.717166 0
Emax 72.65580 2.9739869 24.430436 0
ED50 0.19912 0.0305579 6.516154 0
n -0.78141 0.0755436 -10.343844 0
Correlation:
E0 Emax ED50
Emax -0.912
ED50 -0.728 0.504
n -0.702 0.767 0.408
Standardized residuals:
Min Q1 Med Q3 Max
-4.1327655 -0.5091642 0.1135235 0.5000156 2.2931541
A first asymmetric model is obtained by extending the 4PL with a fifth parameter that describes the asymmetry. The 5PL model is given by
$$Y_{ij} = \theta_1 + \frac{\theta_4 - \theta_1}{\left(1 + 10^{(x_i - \theta_2)\theta_3}\right)^{\theta_5}} + \varepsilon_{ij}. \qquad (4.8)$$
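The gnls() call used to fit this model is not reproduced in this excerpt. Based on the model formula shown in the summary output below (which uses the loge parametrization), a sketch of such a call would be (starting values are illustrative only):

# 5PL model: a sketch based on the model formula in the summary output below;
# starting values are illustrative only
gnls.model5PL002<-gnls(ratio~(th1+(th4-th1)/((1+(exp((lmpk-th2)*th3)))^th5)),
                       data = data2b,
                       params = list(th1 + th2 + th3 + th4 + th5 ~1),
                       start=c(26,-1.6,0.8,99,1),
                       control=gnlsControl(nlsTol=0.1))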
The parameter estimates are shown in the panel below, and the estimated model is shown in Fig. 4.8c. The dose corresponding to the response halfway between the two asymptotes was calculated according to (4.9), $\widehat{ED}_{50} = -1.478$, slightly higher than $\hat\theta_2$ obtained for the 4PL model ($-1.613$).
> summary(gnls.model5PL002)
Generalized nonlinear least squares fit
Model: ratio ~ (th1 + (th4 - th1)/((1 + (exp((lmpk - th2) * th3)))^th5))
Data: data2b
AIC BIC logLik
1627.617 1647.951 -807.8085
Coefficients:
Value Std.Error t-value p-value
th1 21.14527 6.954185 3.04065 0.0027
th2 -2.95604 0.847753 -3.48691 0.0006
th3 1.07869 0.334077 3.22887 0.0014
th4 98.50565 1.237730 79.58576 0.0000
th5 0.38977 0.283665 1.37404 0.1709
Correlation:
th1 th2 th3 th4
th2 0.831
th3 -0.650 -0.909
th4 0.112 0.155 -0.355
th5 0.863 0.986 -0.931 0.248
Standardized residuals:
Min Q1 Med Q3 Max
-4.1760032 -0.5382668 0.1562766 0.4685797 2.2711929
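Equation (4.9) itself is not reproduced in this excerpt. For the loge parametrization used in the fitted model, the halfway dose can be obtained by setting the denominator term $(1 + e^{(x - \theta_2)\theta_3})^{\theta_5}$ equal to 2. A short sketch of this calculation, which reproduces the value of about −1.479 reported above, is:

# halfway dose for the fitted 5PL model (loge parametrization): the dose x for
# which (1 + exp((x - th2)*th3))^th5 = 2, i.e. x = th2 + log(2^(1/th5) - 1)/th3
# (parameter estimates taken from the summary output above)
th2 <- -2.95604; th3 <- 1.07869; th5 <- 0.38977
halfway <- th2 + log(2^(1/th5) - 1)/th3
halfway   # approximately -1.479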
Other examples of asymmetric alternatives to the 4PL model are members of the
so-called growth models, often used in biology (Lindsey 2001; Narinc et al. 2010),
such as the Gompertz function and Richards function. Both models were developed
for growth data where the phases of growth are asymmetrical. As in the 5PL
model (4.8), these models do not have an ED50 parameter directly included. These
functions can however be applied to asymmetrical dose-response curves, and the
ED50 parameter can be explicitly calculated (if wanted) through inverse prediction.
For illustrative purposes, we fit the models to the propulsion dataset. Since growth models are typically used to model responses that increase with time, the original response variable, percent distance traveled (%), in the propulsion dataset will be transformed to inhibition as 100 − ratio. Notice that the Gompertz and Richards models are not good choices for this dataset since they both assume a lower asymptote equal to zero, which is clearly not the case.
Gompertz Function
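The call used to fit the Gompertz model is not reproduced in this excerpt. A sketch of a gnls() fit that is consistent with the reported estimates and with the inflection point log(b)/c ≈ −2.294 shown in Fig. 4.9b is given below; the exact parametrization and the starting values are assumptions, and data2b is assumed to contain the variable inhibition = 100 − ratio:

# Gompertz model, inhibition = a*exp(-b*exp(-c*lmpk)): a sketch consistent with
# the reported estimates; parametrization and starting values are assumptions
gnls.modelG001<-gnls(inhibition~a*exp(-b*exp(-c*lmpk)),
                     data = data2b,
                     params = list(a + b + c ~1),
                     start=c(80,0.35,0.45),
                     control=gnlsControl(nlsTol=0.1))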
## parameter estimates
Coefficients:
Value Std.Error t-value p-value
a 80.10299 3.863470 20.733432 0
b 0.35444 0.059149 5.992368 0
c 0.45215 0.047345 9.550109 0
Figure 4.9b shows the Gompertz model estimated for the propulsion dataset.
Fig. 4.9 Gompertz model. (a) Gompertz model (solid line) and 4PL model (dotted line); parameter setting for the 4PL model: $\theta_1$ = 100, $\theta_2$ = 0, $\theta_3$ = 1, and $\theta_4$ = 0; the open circles are the inflection points of the models. (b) Gompertz model fitted to the propulsion dataset (inflection point = −2.294, halfway dose = −1.483); the vertical solid line represents the dose at the inflection point, and the dotted vertical line represents the dose corresponding to half the response between the two asymptotes
Richards Function
# Richards model
gnls.modelR001<-gnls(inhibition~a*(1-b*exp(-1*k*lmpk))^(1/(1-m)),
data = data2b,
params = list(a + b + k + m ~1),
start=c(78,-0.156,0.55,1.44),
control=gnlsControl(nlsTol=0.1))
# Parameter estimates
Coefficients:
Value Std.Error t-value p-value
a 75.51039 4.195788 17.996714 0.0000
b -0.19126 0.178131 -1.073730 0.2841
k 0.64348 0.205562 3.130361 0.0020
m 1.62286 0.640525 2.533637 0.0120
Figure 4.10b shows the estimated model. The inflection point is a function of the
model parameters and can be calculated by
> inflec<-(1/(coef(gnls.modelR001)[3]))*
+ log(coef(gnls.modelR001)[2]/(1-coef(gnls.modelR001)[4]))
> inflec
k
-1.834793
Fig. 4.10 Richards function. (a) Richards functions with several values of m (m = 1.01, 1.4, 3.4, 6.4). (b) Richards function fitted to the propulsion data (inflection point = −1.835, halfway dose = −1.613)
4.4 Discussion
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle.
In B. Petrov, & B. Csaki (Eds.), Second international symposium on information theory (pp.
267–281). Budapest: Academiai Kiado.
Gompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and
on a new mode of determining the value of life contingencies. Philosophical Transactions of
the Royal Society of London, 115, 513–585.
Gottschalk, P. G., & Dunn, J. R. (2005). The five-parameter logistic: A characterization and
comparison with the four-parameter logistic. Analytical Biochemistry, 343, 54–65.
Lindsey, J. (2001). Nonlinear models in medical statistics. Oxford: Oxford University Press.
Narinc, D., Karaman, E., Firat, M. Z. F., & Aksoy, T. (2010). Comparison of non-linear growth
models to describe the growth in Japanese Quail. Journal of Animal and Veterinary Advances,
9(14), 1961–1966.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-Plus. New York: Springer.
Plikaytis, B. D., Turner, S. H., Gheesling, L. L., & Carlone, G. M. (1991). Comparisons of standard curve-fitting methods to quantitate Neisseria meningitidis group A polysaccharide antibody levels by enzyme-linked immunosorbent assay. Journal of Clinical Microbiology, 29(7), 1439–1446.
Richards, F. J. (1959). A flexible growth function for empirical use. Journal of Experimental Botany, 10(29), 290–300.
Straetemans, R., & Bijnens, L. (2010). Application of the separate ray model to investigate
interaction effects. Frontiers in Bioscience, E2, 266–278.
Tanila, H., Kauppila, T., & Tana, T. (1993). Inhibition of intestinal motility and reversal
of postlaparotomy ileus by selective 2-adrenergic drugs in the rat. Gastroenterology, 104,
819–824.
Ting, N. (Ed.). (2006). Dose finding in drug development. New York: Springer.
Part II
Dose–Response Microarray Experiments
Chapter 5
Functional Genomic Dose-Response
Experiments
Luc Bijnens, Hinrich W.H. Göhlmann, Dan Lin, Willem Talloen, Tim Perrera,
Ilse Van Den Wyngaert, Filip De Ridder, An De Bondt, and Pieter Peeters
5.1 Introduction
In the first part of the book, we discussed different aspects of the analysis of
dose-response data such as estimation, inference, and modeling. In this part of the
book, we focus on dose-response microarray experiments. Within the microarray
setting, a dose-response experiment has the same structure as described in part I
of the book. The response is the gene expression at a certain dose level. The role
of functional genomics, particularly in this setting, is to find indications of both
safety and efficacy before the drug is administrated to patients. Studies in human
cell lines or in rodents are often used for that purpose. They are usually the next step
after the high-throughput experimentation that identifies and validates the biological
targets. Preclinical experiments include both discovery and toxicology assays. They
are usually carried out before or during the clinical programs.
For a single gene, the data structure is similar to the one we discussed in the
first part of the book: a response vector (gene expression) which is measured in
a sequence of (increasing) dose levels. Let Y be an expression matrix containing
information about the gene expression of m genes in n samples (arrays) measured
at K + 1 dose levels. Figure 5.1 shows an illustration of the data structure. Here, $Y_{ijk}$ is the expression level of the jth sample (array) of the kth gene measured at dose level i, and $Y_{0jk}$ is the gene expression measured at dose zero. Note that at each dose level, there are $r_j$ replicates available for each gene. Hence, each row in the
expression matrix consists of a dose-response experiment for a specific gene.
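As a small illustrative sketch (not taken from the book) of this data structure, the following builds an expression matrix with m genes as rows and the arrays as columns, with r replicate arrays at each dose level; all names and values are placeholders:

# illustrative sketch of the data structure: m genes (rows) x arrays (columns),
# with r replicate arrays at each of the K + 1 dose levels (values simulated)
set.seed(123)
m <- 5; dose.levels <- c(0, 0.01, 0.1, 1); r <- 3
dose <- rep(dose.levels, each = r)      # dose label for each array (column)
Y <- matrix(rnorm(m * length(dose)), nrow = m,
            dimnames = list(paste0("gene", 1:m), paste0("array", seq_along(dose))))
# each row of Y is a dose-response experiment for one gene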
The main goal of the analysis is to detect trends in gene expression caused by
increasing doses of compounds. The aim of these microarray experiments is to
get insight into the mechanism of action and the safety profile using functional
genomics data to identify pathways that are affected by the compound at hand.
In this context, gene expression experiments have become important either before
or parallel to the clinical testing programs. Dose-response relationships either
upregulated or downregulated may give insight in the biological target and the
mechanism that the new medicine uses to treat the disease. At the same time, it
may generate data on the pathways involved in potential unwanted side effects.
When side effects are detected early on in the development, the molecules could
potentially be modified chemically so that it no longer has that effect. Dose finding
using microarray experiments can also be of importance in the search for biomarkers
that could be used as signatures for response (Goehlmann and Talloen 2009). These
signatures could then be used to find the target population in which the drug has
the best therapeutic effect. Another use is the identification of biomarkers that
eventually could be used in phase II or phase III clinical trials as biomarkers
endpoints to replace the classical endpoints.
Ruberg (1995a,b) and Chuang-Stein and Agresti (1997) formulated four main
questions usually asked in dose-response studies: (1) Is there any evidence of the
drug effect? (2) For which doses is the response different from the response in the
control group? (3) What is the nature of the dose-response relationship? and (4)
What is the optimal dose?
5 Functional Genomic Dose-Response Experiments 71
Throughout the second part of the book, we address the above questions in several
ways. In Chaps. 2 and 3, we discussed the setting of a simple order, i.e., $\mu(d_0) \leq \mu(d_1) \leq \cdots \leq \mu(d_K)$. In this setting, for a single gene, the first question is related to hypothesis testing of the null hypothesis of no dose effect versus an ordered
alternative. Figure 5.2a shows an example of two genes with and without significant
dose effect. The aim of the first step of the analysis is to identify those genes with a
significant (and monotone) relationship with doses. Inference on the dose-response
relationship under order constraints by using various approaches, i.e., likelihood
ratio test and other t-type tests, is discussed in Chap. 7, the significance analysis of
microarrays (SAM) in Chap. 9, and the Bayesian approach in Chap. 13. Figure 5.2b
illustrates the problem related to the second question. Both genes presented in
Fig. 5.2b have a significant monotone relationship with dose. However, the dose
at which the gene expression is different from the control dose is not the same. This topic is discussed in detail in Chaps. 15–17, where we discuss the topics
of the ratio test between means of highest dose and dose zero using the multiple
contrast test (MCT), the multiple contrast ratio test, and the FDR-adjusted CIs
for the ratio parameters. Figure 5.2c illustrates the problem related to the third
question. Two parametric models (or isotonic regression can be used as well),
representing different dose-response curve shapes, are fitted to the expression data
of two genes, and the problem is to select the best gene-specific model. We employ
an exploratory tool, namely, a clustering algorithm to find genes with similar dose-
response patterns discussed in Chap. 9, several information criteria to explicitly
select the best from a set of fitted possible models discussed in Chap. 10, as well as
the parametric modeling of dose-response relationship to obtain ED50 the parameter
of interest in Chap. 14. Finally, Chap. 12 is devoted to gene set analysis which can be
useful for the biological interpretation of the results obtained using the approaches
discussed in the book.
The simple tree alternative can be used to test several doses with the control. Within
this setting, we consider one-sided comparisons of all dose levels with the control
but do not specify any order relationship among the mean gene expression at higher
dose levels, i.e., $\mu(d_0) \leq [\mu(d_1), \mu(d_2), \ldots, \mu(d_K)]$ (Robertson et al. 1988). In the
case that this is the primary aim of the analysis, Dunnett’s test can be used (Dunnett
1955; Robertson et al. 1988) for the one-way ANOVA model specified in (3.5). The
simple tree ordering is discussed in Chap. 15 in the context of MCTs.
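As an illustration only (not from the book), Dunnett's comparisons of each dose level with the control for a single gene can be carried out with glht() from the multcomp package; here y and x are assumed to hold the expression values and dose levels of one gene, as in the single-gene examples of Chap. 3:

# sketch: Dunnett's many-to-one comparisons for one gene via the multcomp package;
# y (expression values) and x (dose levels) are assumed from the earlier examples
library(multcomp)
d   <- data.frame(y = y, dose = factor(x))
fit <- aov(y ~ dose, data = d)
summary(glht(fit, linfct = mcp(dose = "Dunnett")))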
[Fig. 5.2, caption partially recovered: (a) Left panel: a gene for which the dose-response relationship is not significant; right panel: a gene for which the dose-response relationship is significant. (b) Examples of two genes with fitted isotonic regression means; left panel: a gene whose expression at the control dose differs from the second dose onward; right panel: a gene whose expression at the control dose differs from the third dose onward. (c) Examples of two genes with fitted parametric models]
Fig. 5.3 Examples of genes with unimodal partial order (umbrella) mean profile. First row: down–
up profiles. Second row: up–down profiles
The unimodal partial ordering (umbrella profile) is an ordering which imposes the following restrictions: $\mu(d_0) \leq \mu(d_1) \leq \cdots \leq \mu(d_h) \geq \mu(d_{h+1}) \geq \cdots \geq \mu(d_K)$. Genes for which the mean profile satisfies the unimodal partial ordering restrictions (up–down profile) have a mean profile which increases up to dose level h and thereafter decreases (Robertson et al. 1988; Peddada et al. 2003). Examples of genes for which the mean profile follows a unimodal partial ordering (down–up and up–down) are shown in Fig. 5.3.
Gene selection method based on the SAM and clustering of the dose-response
curve shapes within this setting are discussed in Chap. 11. We discuss two different
approaches. The first, ORIOGEN (Peddada et al. 2003), is a resampling-based
method which can be used for both inference and clustering. The second, ORICC
(Liu et al. 2009a), is a method based on model selection which can be used
for clustering. The umbrella alternative is discussed, in the context of MCTs, in
Chap. 15.
Fig. 5.4 Examples of genes with cyclical profiles for K + 1 = 4. First row: the observed mean gene expression is in cyclical order. Second row: observed mean (dashed line) and estimated mean (solid line) under the cyclical ordered alternative
A cyclical mean profile (Simmons and Peddada 2007) implies that the mean gene expression has minima and maxima within the dose range. For a dose-response experiment with four dose levels, there is only one possible cyclical model, $\mu(d_0) \leq \mu(d_1) \geq \mu(d_2) \leq \mu(d_3)$. For this cyclical mean profile, the mean gene expression turns down at dose level 2 and turns up at dose level 3. Examples of genes with cyclical mean profiles for K + 1 = 4 are shown in Fig. 5.4. Hypothetical examples of cyclical mean profiles for dose-response experiments with four, five, and six dose levels are shown in Fig. 5.5. Note that for early drug development dose-response experiments, the cyclical mean profile is often not of interest, and we will not discuss this setting further in the book.
Fig. 5.5 Hypothetical examples of cyclical mean profiles for four, five, and six dose levels
The data come from an oncology experiment designed to better understand the
biological effects of growth factors in human tumor. Human epidermal squamous
carcinoma cell line A431 (HESCA431) was grown in Dulbecco’s modified Eagle’s
medium, supplemented with L-glutamine (20 mM), gentamycin (5 mg/ml), and 10%
fetal bovine serum. The cells were stimulated with growth factor EGF (R&D Sys-
tems, 236-EG) at different concentrations (0, 1, 10, and 100 ng/ml) for 24 h. RNA
was harvested using RLT buffer (Qiagen). All microarray-related steps including
the amplification of total RNAs, labeling, hybridization, and scanning were carried
out, as described in the GeneChip Expression Analysis Technical Manual, Rev.4
(Affymetrix 2004). Biotin-labeled target samples were hybridized to human genome
arrays U133 A 2.0 containing probe sets interrogating approximately 22,000 tran-
scripts from the UniGene database (Build 133). Hybridization was performed using
A data frame with the log2 transformed gene intensities is loaded into the R
environment. The first ten genes and first six samples are displayed in the table
below. The row names of the genes show the probe ID, X1, X1.1, and X1.2
are the three arrays for dose zero, while X2, X2.1, and X2.2 are the arrays
for the first dose. The data frame is loaded using the function load():
> load("data.Rdata")
IsoPlot() is the plotting function that can be used to explore the data. Figure 5.6
shows a scatter plot for the second gene in the dataset (data[2,]), and it can be
produced by
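The call producing Fig. 5.6 is not shown in this excerpt; by analogy with the IsoPlot() calls used later in this chapter, a sketch would be as follows (IsoPlot() is assumed to come from the IsoGene package used throughout the book, and dose is assumed to hold the dose level of each array):

# sketch, by analogy with the IsoPlot() calls shown later in this chapter;
# IsoPlot() is assumed to come from the IsoGene package, dose holds the dose levels
library(IsoGene)
IsoPlot(dose, data[2, ])                    # left panel: data points and sample means
IsoPlot(dose, data[2, ], add.curve = TRUE)  # right panel: adds the fitted isotonic regression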
Fig. 5.6 The data points are plotted as circles, while sample means as pluses. The right panel
additionally plots the fitted increasing isotonic regression model (solid line)
Fig. 5.7 Exploratory plots with IsoPlot(). (a) The plots produced by IsoPlot with ori-
ginal data (left panel) and with the isotonic regression model fitted to the data (right panel,
add.curve=TRUE) where dose is treated as a continuous variable. The data points are plotted
as circles, with sample means as pluses and the fitted isotonic regression model (solid line) for the
right panel plot. (b) The plots produced by IsoPlot with original data (left panel) and with the
isotonic regression model fitted to the data (right panel) where dose is treated as an ordinal variable
(type = “ordinal”). The real dose level is presented on the x-axis
The experiment was performed using the Affymetrix whole human genome array.
Each chip consists of 11,562 probe sets. For simplicity, we refer to the probe sets as genes. The biological question of this experiment is which genes are affected downstream of the dopamine D2 receptor.
The example data explained above are stored in a workspace “data.RData”
which contains two objects: dose and express (containing gene intensities). To
load these objects, we can use the function load:
> load("data.RData")
> dose
[1] 0.00 0.00 0.01 0.01 0.04 0.04 0.16 0.16 0.63 0.63 2.50 2.50 0.00 0.00
[15] 0.00 0.01 0.01 0.01 0.04 0.04 0.16 0.16 0.63 0.63 2.50 2.50
> dim(express)
[1] 11562 26
As illustrated in the previous example, we use the IsoPlot function to produce the
dose-response scatterplots. The function has the options to specify the dose levels by
using type = “ordinal” or “continuous” (default). The scatterplots for the
first gene in the dataset (express[1,]), shown in Fig. 5.7, can be produced by:
> IsoPlot(dose,express[1,],type="continuous")
> IsoPlot(dose,express[1,],type="continuous",add.curve = TRUE)
> IsoPlot(dose,express[1,],type="ordinal")
> IsoPlot(dose,express[1,],type="ordinal", add.curve=TRUE)
References
Affymetrix GeneChip. (2004). Expression analysis technical manual, Rev.4. Santa Clara: Affy-
metrix.
Amaratunga, D., & Cabrera, J. (2004). Exploration and analysis of DNA microarray and protein array data. Hoboken: Wiley-Interscience.
Bolstad, B. M., Irizarry, R. A., Astrand, M., & Speed, T. P. (2002). A comparison of normalization
methods for high density oligonucleotide array data based on bias and variance. Bioinformatics,
19, 185–193.
Chuang-Stein, C., & Agresti, A. (1997). Tutorial in biostatistics: A review of tests for detecting
a monotone dose-response relationship with ordinal response data. Statistics in Medicine, 16,
2599–2618.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a
control. JASA, 50, 1096–1121.
Goehlmann, H., & Talloen, W. (2009). Gene expression studies using Affymetrix microarrays. Boca
Raton: Chapman & Hall/CRC.
Hubbell, E., Liu, W. M., & Mei, R. (2002). Robust estimators for expression analysis. Bioinformatics, 18(12), 1585–1592.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009a). Order-restricted information criterion-based clustering algorithm. Reference manual. http://cran.r-project.org/web/packages/ORIClust/.
Peddada, S., Lobenhofer, E. K., Li, L., Afshari, C. A., Weinberg, C. R., & Umbach, D. M. (2003).
Gene selection and clustering for time-course and dose-response microarray experiments using
order-restricted inference. Bioinformatics, 19(7), 834–841.
Simmons, S. J., & Peddada, S. (2007). Order-restricted inference for ordered gene expression
(ORIOGEN) data under heteroscedastic variances. Bioinformation, 1(10), 414–419.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference. New
York: Wiley.
Ruberg, S. J. (1995a). Dose response studies. I. Some design considerations. Journal of Biophar-
maceutical Statistics, 5(1), 1–14.
Ruberg, S. J. (1995b) Dose response studies. II. Analysis and interpretation. Journal of Biophar-
maceutical Statistics, 5(1), 15–42.
Chapter 6
Adjustment for Multiplicity
6.1 Introduction
D. Yekutieli
Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv
University, Tel-Aviv, Israel
e-mail: [email protected]
D. Lin
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
Z. Shkedy
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]
D. Amaratunga
Biostatistics and Programming, Janssen Pharmaceutical Companies of Johnson & Johnson,
Raritan, NJ, USA
e-mail: [email protected]
controlling the Familywise Error Rate (FWER), the probability of making at least
one type I error.
While there are many cases in which FWER control is needed, the purpose of
gene expression data analysis is to find genes that are potential candidates for further
investigation, and several erroneous rejections will not distort the conclusions at this
stage of the investigation, as long as their proportion is small. Thus, controlling
the probability of making even one erroneous rejection is overconservative and
will result in reduced experimental efficiency due to unnecessary loss of power.
A more suitable option may be controlling the false discovery rate (FDR), defined
in Benjamini and Hochberg (1995) as the proportion of errors in the set of identified
differentially expressed genes.
In this chapter, we discuss a few procedures controlling for the FWER, such as
the Bonferroni, Holm, and the maxT procedures. However, the focus of this chapter
is controlling the FDR, since it admits a more powerful outcome. We discuss several
variations of the Benjamini and Hochberg step-up procedure (BH-FDR 1995),
the permutation-based FDR controlling procedures, the significance analysis of microarrays (SAM) approach of Tusher et al. (2001) and Efron et al. (2001), and Storey's (2002, 2003) Bayesian interpretation of the FDR within the context of microarray data.
Benjamini and Hochberg (1995) considered the case where there are m hypotheses
(Table 6.1) that need to be tested, among which m0 are true null hypotheses and m1
are false null hypotheses. Let V be the number of true null hypotheses that we reject
and let R be the total number of rejected hypotheses. Note that the values of m0
and m1 are unknown in practice.
The FWER (Hochberg and Tamhane 1987) is defined as the probability of erroneously rejecting at least one true null hypothesis, i.e., FWER = P(V > 0). Here the term “family” refers to the collection of null hypotheses $H_{01}, \ldots, H_{0m}$ that is being considered for joint testing. Once the family has been defined, strong control of the FWER (at a joint level $\alpha$) requires that FWER $\leq \alpha$ for all possible constellations of true and false null hypotheses (Lehmann and Romano 2005).
The FDR introduced by Benjamini and Hochberg (1995) is defined as the expected proportion of false rejections among the rejected hypotheses, FDR = E(Q), where Q = V/R when R > 0, and Q = 0 otherwise. Approaches based on the
control of the FDR have gained their popularity in the microarray setting, because
they lead to higher power as compared to the methods that control the FWER.
Recently, two new error rates were introduced to control the number of false
positives. The k-FWER, proposed by Hommel and Hoffman (1998), is defined as the probability of rejecting at least k true null hypotheses, i.e., k-FWER = P(V ≥ k). Controlling the k-FWER overcomes the stringency of controlling the FWER and has the advantage of controlling the number of mistakes as compared to controlling the FWER or FDR (Xu and Hsu 2007). A generalization of the FDR, similar in spirit to the way the k-FWER generalizes the FWER, was proposed by Sarkar (2007). The k-FDR is defined as the expected proportion of k or more false rejections among all rejections, i.e., k-FDR = E($Q_k$), with $Q_k$ = V/R if V ≥ k and $Q_k$ = 0 if V < k.
The procedures controlling the k-FWER and k-FDR are not implemented in this
book.
In this section, we outline several procedures used to control the FWER. These
include Bonferroni’s and Holm’s (1979) approaches. We also describe procedures
for controlling the FDR: Benjamini and Hochberg step-up procedure (BH-FDR)
and Benjamini and Yekutieli procedure (BY-FDR). Some other resampling-based
procedures, such as the maxT (Westfall and Young 1993) for controlling the FWER,
the adaptive resampling-based procedure for controlling the FDR, the SAM, and the linear models for microarray data (limma) approach, will be discussed.
Let $P_i$ be the raw p value for the test statistic $t_i$ for gene i (i = 1, ..., m) and let $H_{0i}$ be the corresponding null hypothesis.
The single-step Bonferroni procedure rejects $H_{0i}$ if $P_i \leq \alpha/m$, where $\alpha$ is the desired level of the FWER. This ensures that the probability that at least one true hypothesis is rejected is not greater than $\alpha$. In order to keep the significance level irrespective of the number of tests performed, adjusted p values can be computed to simplify the multiple testing procedure. The adjusted p value for the Bonferroni procedure can be written as $\tilde{P}_i = \min(m \cdot P_i, 1)$, and $H_{0i}$ is rejected if $\tilde{P}_i \leq \alpha$.
Let $P_{(1)} \leq P_{(2)} \leq \cdots \leq P_{(i)} \leq \cdots \leq P_{(m)}$ be the ordered p values and let $H_{0(1)}, H_{0(2)}, \ldots, H_{0(m)}$ be the corresponding null hypotheses. Holm's procedure (Holm 1979) is a step-down procedure that compares $P_{(1)}$ to $\alpha/m$ and then sequentially compares the ordered raw p values $P_{(i)}$ with $\alpha/(m - i + 1)$, rejecting hypotheses as long as $P_{(i)}$ is smaller than the corresponding critical value and stopping at the first i for which it is not. Holm's adjusted p values are given by
$$\tilde{P}_{(1)} = \min\{m P_{(1)},\, 1\}$$
$$\tilde{P}_{(2)} = \min\{\max(\tilde{P}_{(1)},\, (m-1) P_{(2)}),\, 1\}$$
$$\vdots$$
$$\tilde{P}_{(i)} = \min\{\max(\tilde{P}_{(i-1)},\, (m-i+1) P_{(i)}),\, 1\}$$
$$\vdots$$
$$\tilde{P}_{(m)} = \min\{\max(\tilde{P}_{(m-1)},\, P_{(m)}),\, 1\}.$$
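As a small sketch (not from the book), the Holm adjusted p values can be computed directly from the formulas above and checked against p.adjust(); the p values used here are illustrative:

# sketch: Holm adjusted p values computed directly and checked against p.adjust()
holm.adjust <- function(p) {
  m <- length(p)
  o <- order(p)
  adj <- pmin((m - seq_len(m) + 1) * p[o], 1)   # (m - i + 1) * P_(i), capped at 1
  adj <- cummax(adj)                            # enforce monotonicity (running maximum)
  out <- numeric(m); out[o] <- adj              # return in the original order
  out
}
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201)  # illustrative raw p values
all.equal(holm.adjust(p), p.adjust(p, method = "holm"))  # TRUE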
For a desired FDR level q, the ordered p value $P_{(i)}$ is compared to the critical value $q \cdot i/m$. Let $k = \max\{i : P_{(i)} \leq q \cdot i/m\}$; then reject $H_{(1)}, \ldots, H_{(k)}$, if such a k exists. Benjamini and Hochberg (1995) showed that when the test statistics are independent, this procedure controls the FDR at the level $q \cdot m_0/m \leq q$. Benjamini and Yekutieli (2001) further showed that FDR $\leq q \cdot m_0/m$ for positively dependent test statistics as well. The technical condition under which the control holds is that of positive regression dependency on each test statistic corresponding to the true null hypotheses.
Benjamini and Yekutieli (2001) further derived a universal bound for the FDR of the level-q BH-FDR procedure, $q \left(\sum_{j=1}^{m} 1/j\right) m_0/m$. This yields the BY-FDR procedure, a modification of the BH-FDR procedure that controls the FDR for any joint test statistic distribution: reject $H_{(1)}, \ldots, H_{(k)}$ if $k = \max\left\{i : P_{(i)} \leq q \cdot i/\left[m \left(\sum_{j=1}^{m} 1/j\right)\right]\right\}$ exists.
The adjusted p values for the BH-FDR and BY-FDR procedures are given by
$$\tilde{P}_{(i)} = \min_{k = i, \ldots, m}\left\{\min\left(\frac{m\, C}{k}\, P_{(k)},\; 1\right)\right\}, \qquad (6.1)$$
where C = 1 for the BH-FDR procedure and $C = \sum_{j=1}^{m} 1/j$ for the BY-FDR procedure.
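As a small sketch (not from the book), the adjusted p values of (6.1) can be computed directly and compared with p.adjust(), with C = 1 giving BH and C = sum(1/j) giving BY; the p values used here are illustrative:

# sketch: BH/BY adjusted p values of (6.1) computed directly
bh.by.adjust <- function(p, C = 1) {
  m <- length(p)
  o <- order(p)
  x <- pmin(m * C * p[o] / seq_len(m), 1)       # (m*C/k) * P_(k), capped at 1
  adj <- rev(cummin(rev(x)))                    # minimum over k >= i
  out <- numeric(m); out[o] <- adj              # return in the original order
  out
}
p <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.0201)  # illustrative raw p values
all.equal(bh.by.adjust(p), p.adjust(p, method = "BH"))                              # TRUE
all.equal(bh.by.adjust(p, C = sum(1/seq_along(p))), p.adjust(p, method = "BY"))     # TRUE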
The adjusted p values for the four procedures discussed above can be obtained using the following code:
> praw <-c(0.0001, 0.0004, 0.0019,0.0095,0.0201, 0.0278,0.0298,
0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528,0.7590,1)
> p.Bonf <- p.adjust(praw, method = "bonferroni", n = length(praw))
> p.Holm <-p.adjust(praw, method = "holm", n = length(praw))
> p.BH <-p.adjust(praw, method = "BH", n = length(praw))
> p.BY <-p.adjust(praw, method = "BY", n = length(praw))
> cbind(praw, p.Bonf, p.Holm, p.BH, p.BY)
praw p.Bonf p.Holm p.BH p.BY
[1,] 0.0001 0.0015 0.0015 0.00150000 0.004977343
[2,] 0.0004 0.0060 0.0056 0.00300000 0.009954687
[3,] 0.0019 0.0285 0.0247 0.00950000 0.031523175
[4,] 0.0095 0.1425 0.1140 0.03562500 0.118211908
[5,] 0.0201 0.3015 0.2211 0.06030000 0.200089208
[6,] 0.0278 0.4170 0.2780 0.06385714 0.211892623
[7,] 0.0298 0.4470 0.2780 0.06385714 0.211892623
[8,] 0.0344 0.5160 0.2780 0.06450000 0.214025770
[9,] 0.0459 0.6885 0.3213 0.07650000 0.253844518
[10,] 0.3240 1.0000 1.0000 0.48600000 1.000000000
[11,] 0.4262 1.0000 1.0000 0.58118182 1.000000000
[12,] 0.5719 1.0000 1.0000 0.71487500 1.000000000
[13,] 0.6528 1.0000 1.0000 0.75323077 1.000000000
[14,] 0.7590 1.0000 1.0000 0.81321429 1.000000000
[15,] 1.0000 1.0000 1.0000 1.00000000 1.000000000
[Figure: raw (o), Bonferroni (+), Holm (*), BH (#), and BY ($) p values plotted against the gene index, with a horizontal line at the 0.05 significance level]
There are nine raw p values (solid line) below 0.05. From the adjusted p values,
the BH procedure (with #) discovers four genes as differentially expressed, while
all the other procedures yield three significant genes at the significance level of 0.05
(horizontal solid line).
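A plot of this type can be produced along the following lines (a minimal sketch using base R graphics and the vectors created above; the plotting symbols and legend placement are illustrative):

matplot(cbind(praw, p.Bonf, p.Holm, p.BH, p.BY), type = "p", col = 1,
        pch = c("o", "+", "*", "#", "$"), xlab = "Index", ylab = "p-values")
lines(praw)                    # raw p values as a solid line
abline(h = 0.05)               # 0.05 significance level
legend("topleft", legend = c("Raw", "Bonferroni", "Holm", "BH", "BY"),
       pch = c("o", "+", "*", "#", "$"))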
This case study was obtained from a behavioral experiment, in which 24 male,
experimentally naive Long-Evans rats obtained from Janvier (France), weighing
300–370 g at the start of the experiment, were randomized into two treatment groups
(12 rats in each group). Quinpirole hydrochloride (Sigma-Aldrich) (treatment 1) was
dissolved in physiological saline and administered at a dose of 0.5 mg/kg s.c. (the
method used by Szechtman et al. 1998). Equivalent volumes of saline (treatment
0) were used in solvent injections. Animals were tested in a large open field. Two
datasets were collected: rat behavior data and microarray data.
Fig. 6.2 Boxplot of downregulated expression for gene 345 (panel a) and upregulated expression
for gene 5216 (panel b) in the two treatment groups
Table 6.2 Raw and adjusted p values for differentially expressed genes found by using t-tests
with multiplicity adjustment (only adjusted p values <0.05 are reported)
Gene Raw p value Bonferroni p value Holm p value BH p value BY p value
59 0.0000 0.0215 0.0215 0.0027 0.0248
60 0.0000 0.0318 0.0318 0.0032 0.0293
158 0.0001 0.0477
214 0.0000 0.0134 0.0134 0.0021 0.0194
345 0.0000 0.0006 0.0006 0.0002 0.0019
486 0.0000 0.0012 0.0012 0.0003 0.0028
662 0.0000 0.0002 0.0002 0.0001 0.0011
1962 0.0000 0.0019 0.0019 0.0004 0.0035
2247 0.0000 0.0289 0.0289 0.0032 0.0293
4297 0.0001 0.0477
4447 0.0000 0.0002 0.0002 0.0001 0.0011
5216 0.0000 0.0147 0.0147 0.0021 0.0194
5614 0.0001 0.0452
The number of differentially expressed genes found after Bonferroni, Holm, and
the BY-FDR adjustment is the same (i.e., 10), while the BH-FDR adjustment yields
13 differentially expressed genes with smaller adjusted p values than those of the
other methods.
In a microarray setting, resampling methods are often used to adjust for multiplicity
(Reiner et al. 2003; Tusher et al. 2001; Ge et al. 2003). The main motivation is to
avoid inference based on the asymptotic distribution of the test statistics, which, within
the microarray setting, can be problematic because of the typically small sample
sizes or departures from the distributional assumptions about the response. Also,
in some cases, the asymptotic distribution of the test statistics is unknown (Tusher
et al. 2001).
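As an illustration of the resampling idea, a per-gene permutation p value for the two-group comparison can be computed along the following lines (a sketch only; this is not necessarily identical to the pooled method in (6.4), and the object names gene.exp and trt follow their usage elsewhere in this chapter):

perm.pval <- function(y, group, B = 1000) {
  # two-sample t statistic for the observed labels
  obs <- abs(t.test(y[group == 1], y[group == 0])$statistic)
  perm <- replicate(B, {
    g <- sample(group)                                  # permute the group labels
    abs(t.test(y[g == 1], y[g == 0])$statistic)
  })
  mean(perm >= obs)                                     # proportion of permuted |t| >= observed |t|
}
# raw permutation p values for all genes (rows of gene.exp); slow for large B
praw.perm <- apply(gene.exp, 1, perm.pval, group = trt)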
The histogram of the p values obtained from the permutations using the
method in (6.4) appears close to uniform (see Fig. 6.3a). This indicates
that only few genes seem to be differentially expressed. The number
of differentially expressed genes found by using both Bonferroni and the BH-FDR
Fig. 6.3 The raw p values. (a) Histogram of raw p values based on permutations using the method
in (6.4). (b) Comparison of raw p values from permutations and asymptotic t-tests
procedures is nine; these are the most differentially expressed among all
the 5,644 genes. All nine genes have permutation p values equal to zero
and are contained in the list of differentially expressed genes detected using the
asymptotic p values from the t-tests adjusted by the Bonferroni and BH-FDR
procedures (that list additionally contains genes 158, 2247, 4297, and 5614). The
comparison between the asymptotic p values and the permutation p values shows
a high agreement, as shown in Fig. 6.3b.
The maxT procedure, proposed by Westfall and Young (1993) in order to control the
FWER, is discussed in the context of microarray analysis by Ge et al. (2003).
The starting points are the observed statistics and the permutation matrix T.
Let t_(1) ≥ t_(2) ≥ … ≥ t_(m) be the ordered values of the test statistics. The
permutation matrix T is sorted based on the original order of the observed statistics
(Westfall and Young 1993). For each column b of the permutation matrix, the adjusted
test statistics are calculated as successive maxima, starting from the least significant gene:
u_{m,b} = t_{(m),b} and u_{i,b} = max(u_{i+1,b}, t_{(i),b}) for i = m − 1, …, 1; the adjusted
p value for gene (i) is the proportion of permutations for which u_{i,b} is at least as large as
the observed t_(i), with monotonicity of the adjusted p values subsequently enforced.
For a two-sample comparison, the maxT procedure calculates the raw p values
of t-test statistics or Wilcoxon test statistics by using the sampling-based approach
given by (6.3). This is implemented in the R function mt.maxT in the multtest
package:
> library(multtest)
> maxt <- mt.maxT(gene.exp,trt,test="t",side="abs",fixed.seed.sampling="y",
B=10000)
> sum(maxt[,4]<=0.05)
[1] 10
> maxt[1:10,]
ID teststat rawp adjp
1962 -10.407638 1e-04 0.0001
345 -9.702417 1e-04 0.0001
4447 -9.200213 1e-04 0.0001
662 -8.272687 1e-04 0.0006
60 -7.602661 1e-04 0.0010
486 -7.477946 1e-04 0.0011
59 -6.580940 1e-04 0.0051
5216 6.367395 2e-04 0.0080
214 -6.311827 1e-04 0.0085
2247 -6.086100 1e-04 0.0141
In the animal behavior study, ten genes are found to be differentially expressed
by using the maxT approach. This procedure gives one more significant gene (gene
2247) as compared to Bonferroni and the BH-FDR approach based on the raw p
values obtained from the permutations using the method in (6.4).
Since the BH procedure controls the FDR at a level that is too low by a factor of m_0/m,
it is natural to try to estimate m_0 and use q* = q·m/m_0 instead of q to gain more
power. Benjamini et al. (2006) suggest a simple two-stage procedure: use the BH-FDR
procedure once to reject r_1 hypotheses; then use the BH-FDR procedure at the second
stage at level q* = q·m/[(m − r_1)(1 + q)]. This two-stage procedure has proven
FDR-controlling properties under independence, and simulation studies support its
FDR-controlling properties under positive dependence.
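The generic two-stage procedure can be sketched as follows (using the praw vector from the earlier example and base R only; this is an illustration of the idea, not the fdrame implementation):

q  <- 0.05
q1 <- q / (1 + q)                      # first-stage level
bh <- p.adjust(praw, method = "BH")    # BH adjusted p values
r1 <- sum(bh <= q1)                    # first-stage rejections
m  <- length(praw)
if (r1 > 0 && r1 < m) {
  q2 <- q1 * m / (m - r1)              # second-stage level q* = q m / [(m - r1)(1 + q)]
  rejected <- bh <= q2
} else {
  rejected <- bh <= q1                 # r1 = 0 or r1 = m: the first stage already decides
}
sum(rejected)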
The resampling-based FDR estimator has the form

FDR_est(p) = E_{V(p)}[ V(p) / (V(p) + ŝ(p)) ].

Two estimation methods for ŝ(p) are suggested, differing in their strictness level.
The FDR local estimator is conservative on the mean, and the FDR upper limit
bounds the FDR with probability 95%.
A third alternative uses the BH procedure to control the FDR, but rather than using
the raw p values, it applies the BH-FDR adjustment to resampling-based p values
similar to those computed in (6.4).
The fdr.ma() is the main function in the fdrame package. A general call of the
function has the form
fdr.ma(exp.arr, design,
       p.method=c("theoretic","resampling"),
       fdr.adj=c("BH-LSU","adaptive","point.est","upper.est"),
       equal.var=TRUE,
       plot=c("pvlVSrank","adjVSstat"), perms.num=100)
The fdr.ma() function provides four FDR adjustment methods (fdr.adj): the
BH-FDR step-up procedure, the two-stage adaptive procedure, and the point and upper
estimates of the FDR. The BH-FDR and adaptive procedures can be used in combination
with both theoretical and resampling-based p values, while the upper and point estimates of
the FDR are obtained only with the resampling-based procedure. The following code can be
used to obtain the number of differentially expressed genes for each of the
procedures for the animal behavior case study:
> library(fdrame)
> fdr.BH.theoretic <- fdr.ma(data.matrix(gene.exp),trt,
     p.method="theoretic",fdr.adj="BH-LSU",equal.var=TRUE)
> sum(fdr.BH.theoretic$adj<=0.05)
[1] 11
> ## the remaining objects are created with analogous calls,
> ## varying p.method and fdr.adj:
> fdr.BH.resampling <- fdr.ma(data.matrix(gene.exp),trt,
     p.method="resampling",fdr.adj="BH-LSU",equal.var=TRUE)
> sum(fdr.BH.resampling$adj<=0.05)
[1] 11
> fdr.adaptive.theoretic <- fdr.ma(data.matrix(gene.exp),trt,
     p.method="theoretic",fdr.adj="adaptive",equal.var=TRUE)
> sum(fdr.adaptive.theoretic$adj<=0.05)
[1] 13
> fdr.adaptive.resampling <- fdr.ma(data.matrix(gene.exp),trt,
     p.method="resampling",fdr.adj="adaptive",equal.var=TRUE)
> sum(fdr.adaptive.resampling$adj<=0.05)
[1] 20
> fdr.upper.resampling <- fdr.ma(data.matrix(gene.exp),trt,
     p.method="resampling",fdr.adj="upper.est",equal.var=TRUE)
> sum(fdr.upper.resampling$adj<=0.05)
[1] 11
> fdr.point.resampling <- fdr.ma(data.matrix(gene.exp),trt,
     p.method="resampling",fdr.adj="point.est",equal.var=TRUE)
> sum(fdr.point.resampling$adj<=0.05)
[1] 18
Table 6.3 lists the top 20 differentially expressed genes with the adjusted
p values for all six methods.
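The code that produces Table 6.3 is not shown; a sketch of how such a table could be assembled from the adjusted p values returned by fdr.ma() (column names follow Table 6.3; the filtering rule is illustrative) is:

adj.all <- data.frame(BH.theo    = fdr.BH.theoretic$adj,
                      BH.rsmpl   = fdr.BH.resampling$adj,
                      adpt.theo  = fdr.adaptive.theoretic$adj,
                      adpt.rsmpl = fdr.adaptive.resampling$adj,
                      up.rsmpl   = fdr.upper.resampling$adj,
                      pnt.rsmpl  = fdr.point.resampling$adj)
top.genes <- which(apply(adj.all, 1, min) <= 0.05)   # significant for at least one method
round(adj.all[top.genes, ], 4)                       # ordered by row number, as in Table 6.3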
In the edited top table (see Table 6.3), genes are ordered according to their
row numbers in the dataset, and adjusted p values greater than 0.05 are omitted.
Table 6.3 Edited top table: top 20 differentially expressed genes declared by the various
resampling-based FDR procedures with the adjusted p values
BH.theo BH.rsmpl adpt.theo adpt.rsmpl up.rsmpl pnt.rsmpl
59 0.0012 0.0012 0.001 0 0 0.0011
60 0.0002 0 0.0002 0 0 0
158 0.0477 0.0315 0.0335
214 0.0017 0.0013 0.0015 0.0011 0 0.0011
345 0 0 0 0 0 0
486 0.0002 0 0.0002 0 0 0
522 0.0485
662 0.0001 0 0 0 0 0
1022 0.0439 0.0464
1316 0.0424 0.0456
1962 0 0 0 0 0 0
2247 0.0025 0.0033 0.0023 0.003 0 0.002
2489 0.0533 0.0329 0.0368
3170 0.0424 0.0456
4297 0.0477 0.0283 0.0335
4447 0 0 0 0 0 0
5216 0.0017 0.0012 0.0015 0.0011 0 0.0011
5352 0.0485
5356 0.0424 0.0456
5614 0.0347 0.037 0.0315 0.0191 0.0355 0.0248
Genes are ordered according to their row number
The FDR adaptive procedure yields the largest number of differentially
expressed genes (20), which contains all the significant genes found by the other
methods. The output of these procedures can vary with the random permutation seed;
therefore, a fixed seed (123) was used to produce the numbers of
differentially expressed genes reported above.
The SAM (Tusher et al. 2001) is a widely used testing procedure that estimates the
FDR by using permutations under the assumption that all null hypotheses are true.
The procedure consists of three components: (1) the adjusted test statistics, (2) an
approximation of the distribution of the test statistics based on permutations, and
(3) the control of the FDR.
For a two-group setting, the modified test statistic for gene i in SAM is given by

t_i^{SAM} = (X̄_1 − X̄_2) / (s_i + s_0),   (6.5)

where

X̄_1 = (1/n_1) ∑_{j=1}^{n_1} x_{1j},   X̄_2 = (1/n_2) ∑_{j=1}^{n_2} x_{2j},

and

s_i = sqrt{ (1/n_1 + 1/n_2) · [ ∑_{j=1}^{n_1} (x_{1j} − x̄_1)² + ∑_{j=1}^{n_2} (x_{2j} − x̄_2)² ] / (n_1 + n_2 − 2) }.
Here, s0 is the fudge factor which is estimated from the data and is discussed later
(in Sect. 8.3).
The SAM requires that, for each permutation, the test statistics are sorted across all
the genes, such that the first row of the sorted matrix contains the minimum test statistic
of each permutation and the last row the maximum. The expected values of the
observed ordered statistics are approximated by the means of the rows of the sorted
permutation matrix T^{SAM}:

T^{SAM} = ( t_{(1)1}  t_{(1)2}  …  t_{(1)B} ;
            t_{(2)1}  t_{(2)2}  …  t_{(2)B} ;
            …
            t_{(m)1}  t_{(m)2}  …  t_{(m)B} )
   ⇒   T̃^{SAM} = ( t̄_{(1)}, t̄_{(2)}, …, t̄_{(m)} )ᵀ,   (6.6)

where t̄_{(i)} is the mean of the ith row.
Let t_i^{SAM,o} and t_i^{SAM,e} denote the observed and the expected values of the test statistic
for the ith gene, respectively. A gene is declared differentially expressed whenever

| t_i^{SAM,o} − t_i^{SAM,e} | ≥ Δ.

The value of Δ is chosen in order to control the FDR at a desired level. For the
details of the SAM procedure, approximating the FDR, and the choice of the fudge
factor s_0 used in the test statistic, we refer to Tusher et al. (2001) and Chap. 8.
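The construction in (6.6) and the Δ rule can be sketched as follows (T.perm is an assumed m × B matrix of permuted test statistics and t.obs the vector of observed statistics; neither name comes from a specific package):

T.sorted   <- apply(T.perm, 2, sort)            # sort each permutation's statistics, as in (6.6)
t.expected <- rowMeans(T.sorted)                # expected order statistics
t.observed <- sort(t.obs)                       # observed ordered statistics
delta <- 0.75
called <- abs(t.observed - t.expected) >= delta # genes called at threshold delta
sum(called)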
The R package samr can be used to perform the SAM in R. In this section, we
illustrate how to use the samr() function for the animal behavior experiment
described in Sect. 6.4.2. For a two-sided t-test we use
> dim(data) ## gene expression
> d <- list(x=data, y=trt+1, geneid=as.character(1:5644),
    genenames=paste("gene", as.character(1:5644)), logged2=FALSE)
> samr.obj <- samr(d, resp.type="Two class unpaired")
[Figure: SAM plot of observed vs. expected scores]
> delta <- 0.75
> samr.plot(samr.obj, delta)
> delta.table <- samr.compute.delta.table(samr.obj)
> siggenes.table <- samr.compute.siggenes.table(samr.obj, delta,
     d, delta.table)
> siggenes.table
$genes.up
Row Gene ID Gene Name Score(d) Numerator(r) Denominator(s+s0)
Fold Change q-value(%)
[1,] "5217" "gene 5216" "5216" "4.1331" "0.6936" "0.1678" "1.1613" "0"
[2,] "5615" "gene 5614" "5614" "3.8381" "1.0157" "0.2646" "1.3296" "0"
$genes.lo
Row Gene ID Gene Name Score(d) Numerator(r) Denominator(s+s0)
Fold Change q-value(%)
[1,] "1963" "gene 1962" "1962" "-9.1204" "-4.3426" "0.4761" "0.2665" "0"
[2,] "346" "gene 345" "345" "-8.2776" "-3.3194" "0.4010" "0.5671" "0"
[3,] "61" "gene 60" "60" "-6.8054" "-3.8216" "0.5616" "0.2773" "0"
[4,] "487" "gene 486" "486" "-6.3247" "-2.4150" "0.3818" "0.5814" "0"
[5,] "4448" "gene 4447" "4447" "-5.7429" "-0.8990" "0.1567" "0.8394" "0"
[6,] "60" "gene 59" "59" "-5.6754" "-2.4290" "0.4280" "0.5621" "0"
[7,] "663" "gene 662" "662" "-5.0379" "-0.7587" "0.1506" "0.8880" "0"
[8,] "215" "gene 214" "214" "-4.3599" "-0.8303" "0.1904" "0.8927" "0"
[9,] "159" "gene 158" "158" "-3.5751" "-0.8809" "0.2464" "0.8682" "0"
[10,] "1570" "gene 1569" "1569" "-3.0626" "-1.0422" "0.3403" "0.4991" "0"
[11,] "2248" "gene 2247" "2247" "-3.0525" "-0.3606" "0.1181" "0.9486" "0"
$color.ind.for.multi
NULL
$ngenes.up
[1] 2
$ngenes.lo
[1] 11
The Limma package fits a hybrid frequentist/empirical Bayes (eBayes) linear model
(Smyth 2004; Efron et al. 2001) for the expression levels of the genes in the array.
The expression level variances of the genes in the array, σ_1², …, σ_m², are assigned
a scaled inverse chi-squared prior density. The hyperparameters of this
prior distribution, s_0² (the prior variance) and d_0 (the prior degrees of freedom), are
derived by applying the Limma eBayes function to the sample variances s_1², …, s_m².
Limma produces moderated t and F statistics, computed by dividing the standard
frequentist numerator by a denominator in which the d-degrees-of-freedom sample
variance s_i² is replaced with s̃_i² = (d_0·s_0² + d·s_i²)/(d_0 + d), the posterior mean of σ_i² given s_i².
The significance levels provided by Limma are based on the corresponding tabulated t or
F distributions with the augmented d_0 + d degrees of freedom.
A general call of the lmFit() function, which fits the gene-wise linear models, has the form
lmFit(object, design=NULL, ndups=1, spacing=1, block=NULL, correlation,
      weights=NULL, method="ls", ...)
where object should be of class numeric, matrix, MAList, EList, marrayNorm,
ExpressionSet, or PLMset, containing log-ratios or log-expression values
for a series of microarrays, and design defines the dummy-coded design matrix
for the explanatory variables in the model. The function also supports two different
correlation structures. By defining the block structure, different arrays are assumed
to be correlated; by setting block to NULL and ndups greater than one,
replicate spots on the same array are assumed to be correlated. However, it is not
possible to fit models with both a block structure and a duplicate-spot correlation
structure simultaneously.
By loading the library and specifying the data matrix and the design matrix from
the animal behavior study, the following code can be used to illustrate the use of
empirical Bayes inference:
> library(limma)
> design <- cbind(rep(1,24),trt)
> design
trt
[1,] 1 1
[2,] 1 1
[3,] 1 1
[4,] 1 1
[5,] 1 1
[6,] 1 1
[7,] 1 1
[8,] 1 1
[9,] 1 1
[10,] 1 1
[11,] 1 1
[12,] 1 1
[13,] 1 0
[14,] 1 0
[15,] 1 0
[16,] 1 0
[17,] 1 0
[18,] 1 0
[19,] 1 0
[20,] 1 0
[21,] 1 0
[22,] 1 0
[23,] 1 0
[24,] 1 0
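The model fitting step itself is not shown in the code above; based on the lmFit() interface described earlier and the object name fit1 used below, it would take the following form (a sketch):
> fit <- lmFit(gene.exp, design)
> fit1 <- eBayes(fit)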
After obtaining the model fit from lmFit(), the output can be passed directly
to the function eBayes(), which computes moderated t-statistics, a moderated
F-statistic, and log-odds of differential expression by empirical Bayes
shrinkage of the standard errors toward a common value. In addition, the prior
variance (common to all genes), the sample variances, the prior degrees of freedom
(common to all genes), and the posterior variances can be extracted from the fitted
object:
> fit1$sigma
        3         4         5         6         7
0.3175772 0.2390783 0.3410579 0.5360771 0.6993906
 5639 more elements ...
> fit1$df.prior
[1] 2.885634
> fit1$s2.prior
[1] 0.07502008
> fit1$s2.post
         3          4          5          6          7
0.09785952 0.05922962 0.11153145 0.26275448 0.44112675
 5639 more elements ...
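As a check of the shrinkage formula given above, the posterior variance of the first listed gene can be reproduced from the printed hyperparameters (a sketch; the residual degrees of freedom d = 22 follow from 24 arrays and 2 model coefficients):

d0   <- 2.885634                     # prior degrees of freedom (fit1$df.prior)
s2.0 <- 0.07502008                   # prior variance (fit1$s2.prior)
d    <- 22                           # residual degrees of freedom
s2.g <- 0.3175772^2                  # sample variance of the first listed gene (fit1$sigma^2)
(d0 * s2.0 + d * s2.g) / (d0 + d)    # ~0.0979, matching fit1$s2.post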
The results of the analysis (the moderated t-test statistics and the corresponding
p values) are shown as
> fit1$t
$t
trt
3 65.35861 -0.88594877
4 124.81012 -1.76788239
5 91.84175 -0.56723023
6 27.67912 -0.02735882
7 13.60096 -1.26273256
5639 more rows . . .
> fit1$p.value
$p.value
trt
3 2.311827e-29 0.38412651
4 2.479289e-36 0.08934106
5 5.038620e-33 0.57563939
6 3.265455e-20 0.97839176
7 5.043676e-13 0.21838979
5639 more rows . . .
After adjusting for multiplicity with the BH-FDR procedure, 14 genes are declared
differentially expressed at the significance level of 0.05.
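The corresponding code is not shown; a minimal sketch using the fit1 object and the coefficient name trt from the design matrix above is:
> p.BH.limma <- p.adjust(fit1$p.value[, "trt"], method = "BH")
> sum(p.BH.limma <= 0.05)
The same gene list can also be obtained with limma's topTable() function.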
6.6 Discussion
The procedures that control the FDR are preferred over those that control the
FWER, as they result in tests that have higher power. The BH-FDR, BY-FDR,
maxT, the adaptive procedures, and the SAM procedures can be combined with
resampling-based inference (RBI) to adjust for multiple testing without strong
distributional assumptions.
In this chapter, we have discussed the resampling-based procedures for control-
ling the FWER and FDR. The advantage of such resampling-based procedures is that they
avoid distributional assumptions about the gene expression data. Approaches such as p
values obtained from permutations adjusted by the BH-FDR, BY-FDR procedure,
the adaptive procedures of the FDR, and the SAM procedure, can be used to find
differentially expressed genes. However, an issue related to RBI is that, because
of the small samples that are typically used in microarray experiments, the RBI p
value distributions can be coarse or granular. As a result, it will often be difficult to
obtain p values (as in (6.3)) that are below some specified level. To overcome this
problem, three approaches are discussed in this chapter, i.e., (1) raw p values are
obtained using (6.4) and adjusted by the BH-FDR procedure, (2) the SAM procedure
combines resampled test statistics across all genes as the null distribution to obtain
very small p values, and (3) the Limma empirical Bayes approach uses moderated
test statistics and estimates the common variance for all the genes. The procedures
preserve the correlation among test statistics of all the genes. Moreover, using
Limma one can borrow strength for variance estimation across the genes and derive
more powerful rejection regions for testing by assuming a statistic from a mixture of
the null and alternative distributions, as well as from the pure null distribution (Efron
et al. 2001). However, this is based on two assumptions: that the null distribution of
the test statistic is the same for all genes; and that genes are independent of each
other. Unfortunately, neither of the assumptions is necessarily valid.
References
Affymetrix GeneChip. (2004). Expression analysis technical manual, Rev.4. Santa Clara:
Affymetrix.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful
approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289–300.
Benjamini, Y., Krieger, A. M., & Yekutieli, D. (2006). Adaptive linear step-up false discovery rate
controlling procedures. Biometrika, 93(3), 491–507.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing
under dependency. The Annals of Statistics, 29(4), 1165–1188.
Dudoit, S., Fridlyand, J., & Speed, T. P. (2002). Comparison of discrimination methods for
the classification of tumors using gene expression data. Journal of the American Statistical
Association, 98, 77–87.
Efron, B., Tibshirani, R., Storey, J. D., & Tusher, V. (2001). Empirical Bayes analysis of a
microarray experiment. Journal of the American Statistical Association, 96, 1151–1160.
Ge, Y., Dudoit, S., & Speed, T. P. (2003). Resampling based multiple testing for microarray data
analysis (Technical report, 633). Berkeley: University of California, Berkeley.
Hochberg, Y., & Tamhane, A. C. (1987). Multiple comparison procedures. New York: Wiley.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of
Statistics, 6, 65–70.
Hommel, G., & Hoffman, T. (1998). Controlled uncertainty. In P. Bauer, G. Hommel, &
E. Sonnemann (Eds.), Multiple hypotheses testing (pp. 154–161). Heidelberg: Springer.
Lehmann, E. L., & Romano, J. P. (2005). Generalizations of the familywise error rate. The Annals
of Statistics, 33(3), 1138–1154.
Reiner, A., Yekutieli, D., & Benjamini, Y. (2003). Identifying differentially expressed genes using
false discovery rate controlling procedures. Bioinformatics, 19(3), 368–375.
Sarkar, S. K. (2007). Procedures controlling generalized FWER and generalized FDR. The Annals
of Statistics, 35(6), 2405–2420.
Smyth, G. K. (2004) Linear models and empirical Bayes methods for assessing differential
expression in microarray experiments. Statistical Applications in Genetics and Molecular
Biology, 3(1), Article 3.
Storey, J. D. (2002) A direct approach to false discovery rates. Journal of the Royal Statistical
Society B, 64(Pt 3), 479–498.
Storey, J. D. (2003) The positive false discovery rate: A Bayesian interpretation and the q-value.
The Annals of Statistics, 31(6), 2013–2035.
Szechtman, H., Sulis, W., & Eilam, D. (1998). Quinpirole induces compulsive checking behavior
in rats: A potential animal model of obsessive-compulsive disorder (OCD). Behavioral
Neuroscience, 112, 1475–1485.
Tusher, V. G., Tibshirani, R., & Chu, G. (2001). Significance analysis of microarrays applied to the
ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116–5121.
Westfall, P. H., & Young, S. S. (1993). Resampling based multiple testing. New York: John Wiley &
Sons.
Xu, H., & Hsu, J. C. (2007). Using the partitioning principle to control the generalized family error
rate. Biometrical Journal, 49, 52–67.
Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple
test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82,
171–196.
Chapter 7
Single Contrast Tests
7.1 Introduction
In Chap. 3, we discussed several testing procedures for testing the null hypothesis of
no dose effect against ordered alternatives. Williams (1971, 1972) proposed a step-
down procedure to test for the dose effect. The tests are performed sequentially from
the comparison between the isotonic mean of the highest dose and the sample mean
of the control, to the comparison between the isotonic mean of the lowest dose and
the sample mean of the control. The procedure stops at the dose level where the null
hypothesis of no dose effect is not rejected. Marcus (1976) proposed a modification
of Williams’ procedure, in which the sample mean of the control was replaced by the
isotonic mean of the control. The likelihood ratio test, discussed by Bartholomew
(1961), Barlow et al. (1972), and Robertson et al. (1988), uses the ratio between the
variance calculated under the null hypothesis and the variance calculated under an
ordered alternative. In the context of dose-response microarray data, Hu et al. (2005)
proposed a test statistic that was similar to Marcus’ statistic, but with the variance
estimator calculated under the ordered alternative. Lin et al. (2007) proposed a
modification for Hu’s test statistic in which the degrees of freedom for the variance
estimator are not fixed for all genes.
The chapter is organized as follows. In Sects. 7.2 and 7.3, we review the M test
statistics of Hu et al. (2005) and Lin et al. (2007). Directional inference and the
multiplicity issue are discussed in Sect. 7.4. In Sect. 7.5, we show how to use the
IsoGene package to compute the five test statistics discussed above and how to
obtain a list of significant genes using the FDR correction. Finally, in Sect. 7.6, we
compare the results of the analysis of the case study using the five tests discussed. In
particular, we present the comparison of the likelihood ratio test and the modified M
test statistic.
M = (μ̂*_K − μ̂*_0) / sqrt( ∑_{i=0}^{K} ∑_{j=1}^{n_i} (y_{ij} − μ̂*_i)² / (N − K) ).   (7.1)
The numerator of the M test statistic is the same as that of Marcus’ statistic, while
the denominator is an estimate of the standard error under an ordered alternative.
This is in contrast to Williams’ and Marcus’ approaches that use the unrestricted
means to derive the estimate for the standard error.
Hu et al. (2005) evaluated the performance of the Ē²₀₁ and M test statistics by
comparing the ranks of the genes obtained with both statistics, and reported similar
findings for simulated and real-life data sets.
For the variance estimate, Hu et al. (2005) used N − K degrees of freedom. However,
the number of unique isotonic means is not fixed but changes across the genes. For
that reason, Lin et al. (2007) proposed a modification to the standard error estimator
used in the M statistic by replacing it with
sqrt( ∑_{i=0}^{K} ∑_{j=1}^{n_i} (y_{ij} − μ̂*_i)² / (N − I) ),
where I is the number of unique isotonic means for a given gene. Such a
modification is expected to improve the standard error estimates across all the genes.
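To make the construction concrete, the following sketch computes the M and modified M statistics for a single gene using a small hand-written pool-adjacent-violators (PAVA) routine; this is only an illustration of formula (7.1) and its modification, not the IsoGene implementation, and it assumes an increasing trend:

# weighted pool-adjacent-violators: nondecreasing fit to group means y with weights w
pava <- function(y, w) {
  blocks <- as.list(seq_along(y)); vals <- y; wts <- w; i <- 1
  while (i < length(vals)) {
    if (vals[i] > vals[i + 1]) {                       # pool violating adjacent blocks
      vals[i] <- (wts[i] * vals[i] + wts[i + 1] * vals[i + 1]) / (wts[i] + wts[i + 1])
      wts[i]  <- wts[i] + wts[i + 1]
      blocks[[i]] <- c(blocks[[i]], blocks[[i + 1]])
      vals <- vals[-(i + 1)]; wts <- wts[-(i + 1)]; blocks <- blocks[-(i + 1)]
      if (i > 1) i <- i - 1                            # re-check the previous pair
    } else i <- i + 1
  }
  out <- numeric(length(y))
  for (b in seq_along(blocks)) out[blocks[[b]]] <- vals[b]
  out
}

M.stats <- function(y, dose) {
  dose    <- factor(dose, levels = sort(unique(dose)))  # dose groups in increasing order
  ybar    <- as.numeric(tapply(y, dose, mean))           # group sample means
  n.i     <- as.numeric(tapply(y, dose, length))
  mu.star <- pava(ybar, n.i)                             # isotonic (nondecreasing) means
  fitted  <- mu.star[as.integer(dose)]                   # isotonic mean per observation
  N <- length(y); K <- nlevels(dose) - 1                 # K non-zero dose levels
  I <- length(unique(mu.star))                           # number of unique isotonic means
  rss <- sum((y - fitted)^2)
  num <- mu.star[K + 1] - mu.star[1]                     # highest minus control isotonic mean
  c(M = num / sqrt(rss / (N - K)), ModM = num / sqrt(rss / (N - I)))
}

# toy example: three observations at each of four dose levels
M.stats(y = c(5.1, 5.0, 5.2, 5.3, 5.2, 5.4, 5.6, 5.5, 5.7, 5.9, 6.0, 5.8),
        dose = rep(c(0, 1, 2, 3), each = 3))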
The five test statistics discussed in Sects. 3.3, 7.2, and 7.3 should be calculated
assuming a particular direction of the ordered alternative. However, the direction
of the test is unknown in advance. In this section, we address the issue of how to
obtain the two-sided p value for the five testing procedures and how to determine
the direction of the trend from the two-sided p value.
We focus on the two possible directions of the alternative: H₁^Up, defined in
Eq. (3.3), and H₁^Down, defined in Eq. (3.4). Let p^Up and T^Up denote the p value
and the corresponding test statistic computed to test H₀ vs. H₁^Up, and let p^Down and
T^Down denote the p value and the corresponding test statistic computed to test H₀
vs. H₁^Down. Barlow et al. (1972) showed that, for K > 2, a χ̄² statistic for testing H₀
may actually yield both p^Up < α and p^Down < α. However, p = 2·min(p^Up, p^Down) is
always a conservative p value for the two-sided test of H₀ vs. either H₁^Up or H₁^Down.
Hu et al. (2005) adapted this approach by taking the larger of the likelihoods of
H₁^Up and H₁^Down, i.e., the larger of T^Up and T^Down, as the test statistic for two-sided
inference. In contrast to Hu et al. (2005), we obtain two-sided p values by taking
p = min(2·min(p^Up, p^Down), 1), where p^Up and p^Down are calculated for T^Up and
T^Down using permutations to approximate the null distribution of these test statistics.
After rejecting the null hypothesis in the two-sided test, there is still a need
to determine the direction of the trend. The direction can be inferred by the following
procedure: if p^Up ≤ α/2, then reject H₀ and declare H₁^Up; if p^Down ≤ α/2, then
reject H₀ and declare H₁^Down. The validity of this directional inference is based
on the following property: under H₁^Up, p^Down is stochastically larger than U[0,1],
and under H₁^Down, p^Up is stochastically larger than U[0,1]. Thus, the probability of
falsely rejecting H₀ is at most α, and the probability of declaring a wrong direction for the
trend is at most α/2. It is also important to note that the event p^Up < α/2 and p^Down <
α/2 may be observed. Under H₀, H₁^Up, or H₁^Down, this event is unlikely. However,
it is likely if the treatment has a large and non-monotone effect. An example of this
unusual situation, in which the null hypothesis can be rejected for both directions, is
given in Sect. 7.6.1.
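The two-sided p value and the directional decision can be sketched as follows for one gene (a simplified difference-of-means statistic is used instead of the five statistics above, purely for illustration):

perm.direction <- function(y, dose, B = 1000, alpha = 0.05) {
  dose <- factor(dose, levels = sort(unique(dose)))
  stat <- function(yy) {
    m <- as.numeric(tapply(yy, dose, mean))
    c(up = m[length(m)] - m[1], down = m[1] - m[length(m)])
  }
  obs  <- stat(y)
  perm <- replicate(B, stat(sample(y)))        # permute responses across dose groups
  p.up   <- mean(perm["up", ]   >= obs["up"])
  p.down <- mean(perm["down", ] >= obs["down"])
  p.two  <- min(2 * min(p.up, p.down), 1)      # conservative two-sided p value
  direction <- if (p.up <= alpha / 2) "up" else if (p.down <= alpha / 2) "down" else "none"
  list(p.up = p.up, p.down = p.down, p.twosided = p.two, direction = direction)
}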
In order to verify whether the property needed for directional inference applies to
the five test statistics, we conduct a simulation study to investigate the distribution
of the p^Up and p^Down values. For each simulation, data are generated under H₁^Up: the
means are assumed to be equal to (1, 2, 3, 4)/√5 for the four doses, respectively,
and the variance is equal to σ² = 1. The test statistics T^Up and T^Down are calculated
for the two possible alternatives H₁^Up and H₁^Down. Their corresponding p^Up and
p^Down values are obtained using 10,000 permutations.
Figure 7.1 shows the cumulative distributions of p^Up and p^Down. Clearly, the
simulations show that p^Down (the p value of the test statistics calculated assuming the
wrong direction, dotted line in Fig. 7.1) is stochastically larger than U[0,1]
(solid line in Fig. 7.1), the distribution of the p values under the null hypothesis.
Moreover, p^Up (the p value of the test statistics calculated assuming the right
direction, dashed line in Fig. 7.1) is, as expected, stochastically smaller than U[0,1].
Fig. 7.1 The cumulative distribution of p^Up values (dashed line) and p^Down values (dotted line)
for the five test statistics. Data are generated under H₁^Up with isotonic means (1, 2, 3, 4)/√5 for
the four doses. Solid line: cumulative distribution under H₀, U[0,1]
Similar results (not shown) are obtained when the data are generated under H₁^Down. The
results imply that all five test statistics possess the property required for
directional inference: under H₁^Up, the distribution of p^Down is stochastically greater
than U[0,1].
Figure 7.2 shows the values of the test statistics calculated under H₁^Up
and H₁^Down for data generated under H₁^Up. The five test statistics are also calculated for
testing H₀ vs. H₁^Down (the x-axis of each panel in Fig. 7.2). The behavior of
Marcus', M, and the modified M statistics is similar, as they all use the difference
between the highest and the lowest isotonic means; the maximum value of these
test statistics, when calculated assuming the wrong direction, is equal to zero. In
contrast, Williams' test statistic for testing H₀ vs. H₁^Down (shown on the x-axis of
panel b) can be positive or negative, because the sample mean of the control group is
used instead of the isotonic mean.
Fig. 7.2 The five test statistics calculated for H₀ vs. H₁^Up (y-axis) and H₀ vs. H₁^Down (x-axis)
A similar pattern was observed in Chap. 3 for the
bootstrap distribution of Williams' and Marcus' statistics. Note that we reject the
null hypothesis in favor of H₁^Down for negative values of the test statistic. Further,
the value of the test statistics for testing H₀ vs. H₁^Up (the y-axis of Fig. 7.2) is higher
than the value of the test statistics calculated for testing H₀ vs. H₁^Down (the x-axis of
Fig. 7.2).
When FDR controlling procedures are used in the microarray setting, the set of
two-sided p values computed for the genes is adjusted for multiplicity.
The first stage of the analysis, which is also the time-consuming stage, consists
of permutations under the null hypothesis in order to obtain the distribution of the
test statistics under the null hypothesis. Note that, by default, all five test statistics
discussed above are calculated. The function IsoRawp() is used to perform the
permutations. A general call of the function IsoRawp() has the form
IsoRawp(x, y, niter)
Here, x is a vector which contains the dose levels, niter specifies the number of
permutations, and data is the R data frame
which contains the information about gene expression and gene names. Once the
permutation stage is completed, the FDR-adjusted p values can be obtained using
the function IsoTestBH(). The function calculates the adjusted p values for each
statistic using either the BH-FDR or BY-FDR for multiplicity adjustment. The user
can specify one of the five test statistics discussed above or use the default call; in
the latter case, adjusted p values for all test statistics will be calculated. A typical
call of the IsoTestBH() function has the form:
IsoTestBH(rawp, FDR, type,stat)
The first argument rawp is an R object which contains all the output produced by
the function IsoRawp(). The arguments FDR and type are used to specify
the error rate and the adjustment type (BH, BY, or both). The last argument stat
is a vector in which we specify the test statistics to be used. For example, a simple
call of IsoTestBH()
IsoTestBH(rp, FDR=0.05, type=c("BH","BY"),
stat=c("E2","Williams","Marcus","M","ModifM"))
will produce the adjusted p values for BH-FDR and BY-FDR at levels of 5% for all
test statistics. In what follows, we illustrate in more details the use of the functions
in the IsoGene package.
The five test statistics described above can be obtained by using the function
IsoGene1()(for a single gene) and the function IsoGenem() (for all the genes
simultaneously). We can calculate the test statistics with the call
> IsoGene1(dose levels, gene expression vector)
For example, the test statistics for gene 1 from the human epidermal carcinoma case
study can be calculated using the following code
> stat1 <- IsoGene1(x, gene1)
The object stat1 contains the information about the five test statistics and the
direction for which the likelihood is maximized:
> stat1
$ E2.up [1] 0.2010957
$ Williams.up [1] 0.6712589
$ Marcus.up [1] 1.3646790
$ M.up [1] 1.0004640
$ ModM.up [1] 1.0611520
$ E2.dn [1] 0.0098105
$ Williams.dn [1] -0.2850243
$ Marcus.dn [1] -0.2850243
$ M.dn [1] -0.1876899
$ ModM.dn [1] -0.2098437
$ direction [1] "u"
The first ten objects are the values calculated for the five test statistics under
increasing and decreasing trends. The last object indicates the higher likelihood
of isotonic regression with “u” meaning an increasing trend or “d” meaning a
decreasing trend.
In a similar way, the test statistics can be calculated for the first ten genes using
the function IsoGenem():
> statm <- IsoGenem(x, data[1:10,])
> statm
$E2.up
2 3 4 5 6
0.00000000 0.20109571 0.50774178 0.24835414 0.00000000
7 8 9 10 11
0.16263545 0.43080221 0.00000000 0.06367646 0.00000000
$Williams.up
[1] -1.1298186 0.6712589 2.4888850 0.8911883 -0.6520746
[6] 0.5582301 2.1412458 -0.5774471 0.8895008 -1.6655641
$Marcus.up
[1] 0.0000000 1.3646791 2.4888850 1.3929232 0.0000000
[6] 1.4262255 2.1412458 0.0000000 0.8895008 0.0000000
$M.up
[1] 0.0000000 1.0004635 2.0194520 0.9386711 0.0000000
[6] 0.7196721 1.6954778 0.0000000 0.4960717 0.0000000
$ModM.up
[1] 0.0000000 1.0611518 2.1419523 1.0494662 0.0000000
[6] 0.8046179 1.7983258 0.0000000 0.5261635 0.0000000
$E2.dn
2 3 4 5 6
0.275992755 0.009810531 0.000000000 0.002919082 0.139779913
7 8 9 10 11
0.079338232 0.000000000 0.099457466 0.124447675 0.456682221
$Williams.dn
[1] -1.66138158 -0.28502430 1.72265475 0.06394548
[5] -1.10255870 -1.01756814 1.31114235 -0.76992952
[9] 0.12749125 -3.83085270
$Marcus.dn
[1] -1.6613816 -0.2850243 0.0000000 -0.1743749 -1.1025587
[6] -1.1502473 0.0000000 -0.7699295 -1.0674972 -3.8308527
$M.dn
[1] -1.2705909 -0.1876899 0.0000000 -0.1020262 -0.9002354
[6] -0.5535348 0.0000000 -0.6266432 -0.6156540 -2.0821512
$ModM.dn
[1] -1.3476651 -0.2098437 0.0000000 -0.1140688 -0.9002354
[6] -0.6188707 0.0000000 -0.7006084 -0.6883221 -2.2084548
$direction
2 3 4 5 6 7 8 9 10 11
"d" "u" "u" "u" "d" "u" "u" "d" "d" "d"
The output of IsoGenem() has the same structure as that of IsoGene1(),
but each component contains the values of the test statistics and the likely
direction of the isotonic regression model for all the genes.
As discussed above, we use permutations to obtain the raw p values for the five test
statistics:
> set.seed(1234)
> rawp <- IsoRawp(x=x, y=data, niter=1000)
The R object rawp is a list of four components that contain, for each of the five test
statistics, the two-sided p values, the one-sided p values, the p^Up values, and the
p^Down values. Below we print part of the component with the two-sided p values
for illustration:
> rawp.twosided=rawp[[2]]
> rawp.twosided[1:10,]
Probe.ID E2 Williams Marcus M ModM
1 31637_s_at 0.000 0.000 0.000 0.000 0.000
2 32402_s_at 0.129 0.225 0.124 0.123 0.125
3 33646_g_at 0.003 0.004 0.003 0.003 0.002
4 34063_at 0.487 0.379 0.500 0.467 0.474
5 33494_at 0.071 0.185 0.035 0.063 0.064
6 34031_i_at 0.082 0.220 0.086 0.103 0.086
7 34449_at 0.357 0.445 0.432 0.400 0.391
8 34478_at 0.472 0.516 0.535 0.518 0.511
9 35436_at 0.151 0.116 0.148 0.150 0.140
10 36711_at 0.000 0.000 0.000 0.000 0.000
The second output object from the rawp is a matrix with six columns, where the
first column indicates the probe ID. Columns from the second to the sixth are p
values for each of the five test statistics, respectively. The remaining three output
objects (rawp[[1]], rawp[[3]], rawp[[4]]) are structured in the same way.
For a single gene, the function IsopvaluePlot() can be used to show the p^Up
and p^Down values for a given test statistic:
IsopvaluePlot(x, y, niter, stat)
We use one gene as an example to illustrate how the p^Up and p^Down values (in the upper
and lower panels of Fig. 7.3) are obtained. In Fig. 7.3, the observed test statistics
are drawn as the dashed line, and the values of the test statistics obtained from
the permutations are spread over the x-axis. For this gene, p^Up is much smaller than
p^Down since T^Up > T^Down, which implies a possible increasing
trend in the data:
> gene2 <- data[2,]
> set.seed(123)
> IsopvaluePlot(x, gene2, niter=1000, stat="E2")
[Fig. 7.3: permutation null distributions of the E2 statistic for gene 31637_s_at; upper panel: p-value^Up = 0, lower panel: p-value^Down = 0.761]
With the two-sided p values, we now need to select one of the five test statistics, the
FDR level, and the type of multiplicity adjustment (BH-FDR or BY-FDR) to obtain
the list of significant genes. The following example shows the use of the likelihood
ratio test Ē²₀₁, an FDR level of 0.05, and the BH-FDR procedure for controlling the
FDR:
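The call producing the E2.BH object used below has the following form (a sketch based on the IsoTestBH() interface described above and the rawp.twosided object):
> E2.BH <- IsoTestBH(rawp.twosided, FDR = 0.05, type = "BH", stat = "E2")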
The first ten rows of the R object E2.BH list the sorted raw and adjusted p values
for the Ē²₀₁ test statistic:
Fig. 7.4 The unadjusted (solid line), BH-FDR (dotted and dashed line), and BY-FDR
(dashed line) adjusted p values for the Ē²₀₁ test statistic
> E2.BH[1:10,]
Probe.ID row.name raw p-values BH adjusted p values
1 31637_s_at 1 0.000 0.000000000
2 33646_g_at 3 0.003 0.015647131
3 36711_at 10 0.000 0.000000000
4 37079_at 12 0.001 0.007115111
5 37117_at 13 0.011 0.042679297
6 37152_at 14 0.003 0.015647131
7 38158_at 29 0.003 0.015647131
8 38241_at 30 0.000 0.000000000
9 39248_at 35 0.003 0.015647131
10 39249_at 36 0.008 0.033642751
Note that the order of the list of genes found significant is based on the row number.
Moreover, the function IsoBHPlot() can be used to plot the raw and adjusted
p values in order to visualize the number of significant findings by using the BH-
FDR and BY-FDR procedures for the specified test statistic. A general call of
IsoBHPlot() has the same structure as the function IsoTestBH():
IsoBHPlot(rp, FDR=c(0.05,0.1),
stat=c("E2","Williams","Marcus","M","ModifM"))
Figure 7.4 shows the unadjusted (solid line), BH-FDR (dotted and dashed
line), and BY-FDR (dashed line) adjusted p values for Ē²₀₁. It is obtained using the
following call of the function IsoBHPlot():
following call for the function IsoBHPlot():
> IsoBHPlot(rawp.twosided, FDR=0.05, stat="E2")
7.6 Results
The testing procedures discussed in the previous sections are applied to the case
study data. For each test statistic, p^Up and p^Down values are obtained based on the
permutation matrix, in which the null distributions of the test statistics T^Up and
T^Down, respectively, are approximated using 1,000 permutations. The inference is
made based on the two-sided p values. Table 7.1 shows the number of rejected
hypotheses for several multiplicity adjustment methods and for the five test statistics,
tested at the significance level of 0.05. Figure 7.5 shows the adjusted
p values for the five test statistics. Clearly, the adjusted p values for the maxT,
Bonferroni, and BY-FDR procedures are larger than the adjusted p values obtained for the BH-
FDR. For instance, for Ē²₀₁, without adjusting for multiple testing, we reject
the null hypothesis for 5,457 genes.
Table 7.1 Number of rejected null hypotheses for various testing procedures
at the significance level of 0.05
Method        Ē²₀₁    Williams   Marcus    M       Modified M
Unadjusted 5,457 5,238 5,465 5,449 5,451
maxT 224 215 223 265 251
Bonferroni 1,814 1,592 1,669 1,755 1,745
Holm 1,814 1,592 1,669 1,755 1,745
BH-FDR 3,613 3,209 3,533 3,562 3,567
BY-FDR 1,814 1,592 1,669 1,755 1,745
Fig. 7.5 Adjusted p values using the Bonferroni, BH(FDR), and maxT procedures for the five test
statistics
With the Bonferroni, Holm, and BY-FDR
adjustment procedures, we obtain the same number of significant genes, i.e., 1,814.
Using the maxT procedure for controlling the FWER appears to be the most conservative
approach, with only 224 genes declared significant.
Note that, because a different random permutation seed is used for the analysis of
Ē²₀₁ for the human epidermal carcinoma case study, 3,499 significant genes
are obtained instead of the 3,613 reported above; this set of genes will be used throughout
the remaining chapters of the book:
> rbind(colSums(rawp.twosided[,-1]<=0.05), colSums(pval.maxT[,-1]<=0.05),
    colSums(adjp.Bonf[,-1]<=0.05),
    colSums(adjp.Holm[,-1]<=0.05),
    colSums(adjp.BH[,-1]<=0.05),
    colSums(adjp.BY[,-1]<=0.05))
Note that the number of significant genes obtained for each test statistic for a
given multiple testing adjustment is similar. For example, for the BH-FDR adjustment,
we find 3,613, 3,562, and 3,567 significant genes for Ē²₀₁, M, and the modified
M statistic, respectively. This method of adjustment for multiple testing yields
M statistic, respectively. This method of adjustment for multiple testing yields
more liberal results as compared to the other methods. For that reason, the FDR
adjustment for multiplicity is commonly used within the microarray framework (Ge
et al. 2003; Tusher et al. 2001; Storey and Tibshirani 2003). Moreover, the BH-
FDR procedure controls the MD-FDR, as discussed in Sect. 7.4.1. Therefore, in
what follows, we use the BH-FDR procedure to investigate the performance of the
considered test statistics.
Fig. 7.6 Five genes rejected by Marcus' statistic with both p^Up and p^Down values smaller than
the rejection threshold. Solid line: the isotonic means obtained for testing H₀ against H₁^Up. Dashed
line: the isotonic means obtained for testing H₀ against H₁^Down
As we argue in Sect. 7.4, there is a possibility, although unlikely, that the null
hypothesis is rejected for both directions, i.e., p^Up ≤ α/2 and p^Down ≤ α/2. For the
analysis discussed above, the null hypothesis is rejected in both directions for only
five genes when using Marcus' statistic, with p^Up and p^Down smaller than the rejection
threshold (with multiple testing adjustment), suggesting a non-monotone trend. The
five genes are shown in Fig. 7.6.
For Marcus’ statistic, the large values of T Up and T Down are obtained from the
large difference between the isotonic mean of the highest and control doses, relative
to the variance calculated under the unrestricted alternative. Instead, EN 012
, M, and
the modified M use the variance estimator calculated under the order alternative,
that results in smaller test statistic values. Hence, using these test statistics, the
null hypothesis is not rejected. If the difference between the highest isotonic mean
and control sample mean exists, Williams’ test statistic will tend to reject the null
hypothesis as well.
[Fig. 7.7: logarithm of two-sided p values for the M statistic (y-axis) plotted against the Ē²₀₁ statistic (x-axis); panels a–d as described in the text]
In particular, for the five genes, the estimates of σ² for Williams' and Marcus'
test statistics, calculated under the unrestricted alternative, are equal, respectively, to
0.0414, 0.0075, 0.0204, 0.0145, and 0.0232. They are smaller than the estimates
of σ² for the Ē²₀₁, M, and modified M procedures calculated under the ordered
alternative H₁^Up, which are equal, respectively, to 0.2995, 0.1788, 0.3277, 0.3317,
and 0.2437, and under H₁^Down, which are equal, respectively, to 0.2608, 0.1868,
0.4679, 0.4401, and 0.2065.
Although the numbers of significant genes obtained for the five
testing procedures are very similar in our case study, there are some discrepancies. In this section, we
investigate the subset of genes not commonly found by the Ē²₀₁, M, and modified M
statistics.
We first compare genes identified as significant or nonsignificant by M and Ē²₀₁.
The logarithm of the two-sided p values for these genes is shown in Fig. 7.7. Among
the total of 16,998 genes, 3,420 genes are found significant for monotonic trends
by both statistics.
[Fig. 7.8: rankings obtained with Ē²₀₁ (y-axis) plotted against rankings obtained with M (x-axis); panels a–d as described in the text]
However, 193 genes are found to be significant for Ē²₀₁ and
nonsignificant for the M test statistic, while for 142 genes the reverse is
observed. These genes account for 8.9% ((193 + 142)/(3,420 + 193 + 142)) of
the total significant findings for both test statistics, which is not negligible.
Similar to Hu et al. (2005), we compare the rankings of all the
genes obtained with M and Ē²₀₁. In both Hu et al. (2005) and our example, the correlation of the ranks is
equal to 0.99. Based on this observation, Hu et al. (2005) concluded that the two
statistics perform similarly. However, in our data, the correlation of ranks of the 142
genes found significant only for the M statistic (panel c of Fig. 7.8) is 0.92, while
the correlation of ranks of the 193 genes significant only for Ē²₀₁ (panel b) is 0.85. Both
are somewhat lower than the correlation for genes in panel a (3,420 genes significant
for both statistics, correlation of 0.98) and in panel d (genes nonsignificant by either
statistic, correlation of 0.99). The discrepant conclusions (rejecting the null hypothesis
with only one of the statistics) can be explained by the fact that the M statistic looks for the
mean difference between the highest dose and the control, whereas Ē²₀₁ is a global test
for a monotonic trend.
Fig. 7.9 Logarithm of p values (two-sided) for the M and the modified M. Panel (a): 3,478 genes
are rejected by both M and the modified M statistics; panel (b): 86 genes are rejected by the M statistic
only; panel (c): 89 genes are rejected by the modified M only; panel (d): 13,345 genes are not
rejected by either statistic
The logarithm of the two-sided p values for the genes identified as significant or
nonsignificant by the M and modified M statistics is shown in Fig. 7.9. Among the
total of 16,998 genes, 3,478 genes are found significant for monotonic trends by
both tests. However, 86 genes are found to be significant for the M statistic and
nonsignificant for the modified M test, while for 89 genes the reverse is true. These
genes account for about 4.8% ((86 + 89)/(86 + 89 + 3,478)) of the total significant
findings for both test statistics.
The overall correlation between the ranks of genes obtained for M and the
modified M test statistics is 0.99. The correlation between genes in each panel of
Fig. 7.10 is also very high: 0.97 (panel b) for genes declared significant
only by the modified M, 0.98 (panel c) for genes declared significant only by
M, 0.99 (panel a) for genes declared significant by both test statistics, and
0.998 (panel d) for genes declared significant by neither test statistic. The
difference between the two statistics lies in the adjustment of the degrees of freedom
in the standard error estimator of the modified M test statistic. Nevertheless, the
discrepancy is not substantial.
Fig. 7.10 Correlation between M and the modified M. Panel (a): correlation (0.99) between
rankings of 3,478 genes rejected by both M and the modified M. Panel (b): correlation (0.97)
between rankings of 89 genes rejected only by the modified M. Panel (c): correlation (0.98)
between rankings of 86 genes rejected only by M. Panel (d): correlation (0.998) between
rankings of 13,345 genes not rejected by either M or the modified M
7.7 Discussion
In this chapter, we introduced two additional t-type test statistics to test for
monotone trends of gene expression with respect to dose, namely, the M and the
modified M tests, which complement Williams' and Marcus' tests discussed in
Chap. 3. Further, we discussed the issue of directional FDR in order-restricted
inference in the dose-response microarray setting. Different approaches for directional
FDR are discussed in Guo et al. (2010) in the context of an ordered categorical
predictor and in Sun and Wei (2011) in the context of multiple testing for pattern
identification for time-course data. This chapter also introduced the use of the
IsoGene package to implement the aforementioned test statistics, the
resampling-based inference, and multiple testing adjustment procedures, such as
the BH-FDR and BY-FDR procedures.
References
Barlow, R. E., Bartholomew, D. J., Bremner, M. J., & Brunk, H. D. (1972). Statistical inference
under order restrictions. New York: Wiley.
Bartholomew, D. J. (1961). Ordered tests in the analysis of variance. Biometrika, 48, 325–332.
Benjamini, Y., & Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals
for selected parameters. Journal of the American Statistical Association, 100, 71–81
Ge, Y., Dudoit, S., & Speed, T. P. (2003). Resampling based multiple testing for microarray data
analysis (Technical report, 633). Berkeley: University of California, Berkeley.
Guo, W., Sarkar, S., & Peddada, S. (2010). Controlling false discoveries in multidimensional direc-
tional decisions, with applications to gene expression data on ordered categories. Biometrics,
66(2), 485–492.
Hu, J., Kapoor, M., Zhang, W., Hamilton, S. R., & Coombes, K. R. (2005). Analysis of dose
response effects on gene expression data with comparison of two microarray platforms.
Bioinformatics, 21(17), 3524–3529.
Lin, D., Shkedy, Z., Yekutieli, D., Burzykowki, T., Göhlmann, H. W. H., De Bondt, A., et al. (2007).
Testing for trend in dose-response microarray experiments: comparison of several testing
procedures, multiplicity, and resampling-based inference. Statistical Application in Genetics
and Molecular Biology, 6(1). Article 26.
Marcus, R. (1976). The powers of some tests of the equality of normal means against an ordered
alternative. Biometrika, 63, 177–183.
Storey, J. D., & Tibshirani, R. (2003). SAM thresholding and false discovery rates for detecting
differential gene expression in DNA microarrays. In G. Parmigiani, E. S. Garrett, R. A. Irizarry,
& S. L. Zeger (Eds.), The analysis of gene expression data: Methods and software.
New York: Springer.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference.
New York: Wiley.
Sun, W., & Wei, Z. (2011). Multiple testing for pattern identification, with applications to
microarray time-course experiments. Journal of the American Statistical Association, 106(493),
73–88.
Tusher, V. G., Tibshirani, R., & Chu, G. (2001). Significance analysis of microarrays applied to the
ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116–5121.
Williams, D. A. (1971). A test for differences between treatment means when several dose levels
are compared with a zero dose control. Biometrics, 27, 103–117.
Williams, D. A. (1972). The comparison of several dose levels with a zero dose control.
Biometrics, 28, 519–531.
Yekutieli, D., & Benjamini, Y. (1999). Resampling-based false discovery rate controlling multiple
test procedures for correlated test statistics. Journal of Statistical Planning and Inference, 82,
171–196.
Chapter 8
Significance Analysis of Dose-Response
Microarray Data
8.1 Introduction
t_i^{SAM} = (μ̂*_i − ȳ_0) / (s + s_0),   (8.1)
where μ̂*_i is the isotonic mean at dose i (i = 1, …, K), ȳ_0 is the sample mean at
dose zero, and

s = sqrt{ (1/n_i + 1/n_0) · ∑_{i=0}^{K} ∑_{j=1}^{n_i} (y_{ij} − ȳ_i)² / (N − K) }.
The regularized test statistics for Marcus, M, and the modified M test statistics can
be obtained in the same way.
The likelihood ratio test statistic, which is the ratio between the variance under
the null and the ordered hypotheses, respectively, is regularized in a similar way as
the SAM F statistic (Chu et al. 2011). The modified test statistic is given by
Ē²₀₁^{SAM} = sqrt( σ̂²_{H₀} − σ̂²_{H₁} ) / ( sqrt(σ̂²_{H₀}) + s_0 ).   (8.2)
Note that in both (8.1) and (8.2) s0 is the fudge factor which is estimated from
the data as discussed in Sect. 6.3. Based on the regularized SAM test statistics, the
SAM procedure can be carried out to find significant genes with monotonic
increasing/decreasing trends while controlling the FDR empirically. Before discussing the
implementation of the SAM procedure within the IsoGene package in Sect. 8.4,
we discuss in the following section the main ideas behind the SAM procedure, while
in Sect. 8.3, we discuss the choice of the fudge factor in (8.1) and (8.2).
To show the effect of the SAM test statistic, we compare the values of the usual t-
test statistics and the SAM test statistics and investigate how the SAM test statistic
values are affected by the fudge factor. We use the human epidermal squamous
carcinoma cell line data to illustrate the change in the modified M test statistics
by adding the fudge factor. Figure 8.1 shows the effect size (numerator of t-test
statistics) vs. the absolute values of the SAM t-test statistics without (Fig. 8.1a) and
with the fudge factor (Fig. 8.1b). We observe that a large number of genes have large
test statistic values with small effect sizes, which are represented by points lying
along the zero vertical line (Fig. 8.1a). With the introduction of the fudge factor,
the points are gathering more around the zero crossing point (Fig. 8.1b). However,
the values of test statistics for all the genes decrease simultaneously. Figure 8.2
illustrates how the fudge factor affects genes with different variances.
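As a small numerical illustration of this shrinkage (made-up numbers, not taken from the case study), consider three hypothetical genes with the same ordinary t-type statistic but increasing standard errors:

d  <- c(0.2, 1.0, 3.0)          ## effect sizes of three hypothetical genes
s  <- c(0.1, 0.5, 1.5)          ## their standard errors
s0 <- 0.218                     ## fudge factor used in the case study

d / s                           ## ordinary statistics: all equal to 2
d / (s + s0)                    ## SAM statistics: 0.63, 1.39, 1.75

The gene with the smallest standard error is shrunk the most.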
Fig. 8.1 Comparison of the SAM test statistics for the modified M test (absolute values) without the fudge factor (left panel: SAM − s0) and with the fudge factor (right panel: SAM + s0) using the case study data. s0 = 0.218 (45th percentile of the standard errors in the data, see Sect. 8.3)

The two axes of Fig. 8.2a represent the numerator (absolute value of the effect size) and the denominator (standard error) of the t-test statistics. The angle α (between the y-axis and the solid line) for the three genes using small, median, and large standard
errors (s1 < s2 < s3) and corresponding effect sizes (d1 < d2 < d3) corresponds to the same value of the t-test statistic, i.e., t1 = d1/s1 = t2 = d2/s2 = t3 = d3/s3. Note that the test statistic value for these three genes is equal to cot(α). When the fudge factor s0 is added to the denominator (extending the standard errors s1, s2, and s3 by s0, respectively), the new angles are formed by adding β1, β2, and β3 to α, respectively. The three newly formed angles lie between the y-axis and the dotted line (α + β1), the short-dashed line (α + β2), and the long-dashed line (α + β3), respectively. Thus, the SAM test statistics for the three genes become cot(α + β1), cot(α + β2), and cot(α + β3), respectively. The values of the SAM test statistics are illustrated by the cotangent function in Fig. 8.2b. The left panel of Fig. 8.2b shows the same t-test statistic value of the three genes with angle α. However, the introduction of the SAM fudge factor decreases the values of the SAM test statistics for the three genes simultaneously, in particular, t1^SAM < t2^SAM < t3^SAM (see the right panel of Fig. 8.2b), due to s1 < s2 < s3.
Fig. 8.2 Graphical interpretation of the SAM test statistic: (a) the SAM test statistics and (b) the cotangent function

Let s_(1), s_(2), \ldots, s_(m) be the order statistics of the standard errors in the microarray experiment with m genes, and let s^(l), l = 0, 1, \ldots, 100, be the lth percentile of s_(1), s_(2), \ldots, s_(m). Let the fudge factor be s0 = s^(q); it is easy to see that, for gene g and a given dose i, the SAM test statistic with the fudge factor (t_g^{SAM}) and the t-test statistic (t_g) have the following relationship:

t_g^{SAM} \begin{cases} < \tfrac{1}{2}\, t_g & \text{if } s_{(g)} < s^{(q)}, \\ = \tfrac{1}{2}\, t_g & \text{if } s_{(g)} = s^{(q)}, \\ > \tfrac{1}{2}\, t_g & \text{if } s_{(g)} > s^{(q)}. \end{cases}
Hence, the SAM test statistic with the fudge factor is smaller than half of the t-test statistic for genes whose standard errors are smaller than the fudge factor. Moreover, the ratio between the SAM test statistic with and without the fudge factor is s_g/(s_g + s_0), since |t_g^{SAM}| = |t_g| \cdot s_g/(s_g + s_0). Thus, depending on s_g, the standard error of gene g, the SAM test statistic is shrunk by the factor s_g/(s_g + s_0).
As pointed out in Tusher et al. (2001) and Chu et al. (2011), the fudge factor is selected in order to minimize the coefficient of variation (CV) of the median absolute deviation (MAD) of the test statistics. Several candidates for s0 are the percentiles of the distribution of the standard errors in the sample. Figure 8.3a shows the distribution of the standard error of the M test statistic, which seems to be bimodal. Several percentiles of the distribution, s^(5%), s^(10%), \ldots, s^(95%), are chosen, and the CV of MAD of the modified M test statistics is calculated for each one of the chosen percentiles. As shown in Fig. 8.3b, the value of s0 in this case study is equal to 0.218, which is the 45th percentile of the standard errors of the test statistics. Note that, as we discussed in Chap. 6, the fudge factor is chosen before the permutations and is kept fixed during the permutations.

Fig. 8.3 (a) The density of the standard errors of the test statistics; (b) the CV of MAD of the test statistics for the candidate percentiles (the selected value s0 = 0.218 is marked)
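This selection rule can be sketched in a few lines of R. The helper below is our own simplified illustration of the idea (the IsoGene package carries out this step internally via Isofudge()): for each candidate percentile it computes the regularized statistics, the MAD of these statistics within bins defined by the standard errors, and then the CV of these MADs; the candidate with the smallest CV is chosen.

## d: effect sizes and s: standard errors, one entry per gene (simplified sketch)
chooseFudge <- function(d, s, probs = seq(0.05, 0.95, by = 0.05)) {
  cand <- quantile(s, probs)                        ## candidate values for s0
  bins <- cut(rank(s, ties.method = "first"), 100)  ## group genes by their standard errors
  cv <- sapply(cand, function(s0) {
    tstat <- d / (s + s0)
    mads  <- tapply(tstat, bins, mad)
    sd(mads, na.rm = TRUE) / mean(mads, na.rm = TRUE)
  })
  cand[which.min(cv)]                               ## s0 minimizing the CV of the MADs
}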
The first two arguments x and data are R objects that contain information about
the dose levels and the expression matrix. The argument fudge can be used in
order to specify whether the analysis will be conducted without the fudge factor
adjustment (fudge = “none”) or with automatic selection of the fudge factor
(fudge = “pooled”). The IsoTestSAM() combines several functions which
can be used to carry out different parts of the analysis:
1. Isofudge() calculates the fudge factor in the SAM regularized test statistic.
2. IsoGenemSAM() is used to obtain the SAM test statistics.
3. Isoqqstat() calculates the SAM test statistic for the required number of
permutations specified by user.
4. Isoallfdr() obtains the delta table in the SAM procedure.
5. Isoqval() computes q values of the SAM.
6. IsoSAMPlot() produces the graphical display output of the analysis.
The R object fudge.factor is a vector of the fudge factors for the five test
statistics and it can be produced with the function Isofudge() in the following
way:
> fudge.factor <- Isofudge(x, data)
> fudge.factor
[1] 0.05433744 0.22199918 0.19684197 0.25662444 0.21811043
After the initial step in which we calculate the fudge factor, the observed test
statistics are calculated with the function IsoGenemSAM().
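A sketch of this call is shown below; we assume here that IsoGenemSAM() takes the dose vector, the expression matrix, and the vector of fudge factors obtained above (the exact argument order should be checked in the package documentation):

> SAMtest.stat <- IsoGenemSAM(x, data, fudge.factor)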
The object SAMtest.stat contains all test statistics and the direction of the
trend.
> SAMtest.stat[[1]][1:10]
[1] 0.26447986 0.18999517 0.48012060 0.23728408 0.10796339
0.14052809 0.42501333 0.09426217 0.11186881 0.41814965
> SAMtest.stat[[2]][1:10]
[1] -0.89438193 0.33788341 1.17016612 0.44809154 -0.37792503
0.16928575 1.40636038 -0.42466794 0.04789247 -1.26568858
> SAMtest.stat[[3]][1:10]
[1] -1.0376186 0.8067224 1.3904565 0.8226705 -0.4703604
0.5463295 1.5671898 -0.4904282 -0.4930563 -1.5825487
> SAMtest.stat[[4]][1:10]
[1] -0.8990584 0.6872440 1.2812581 0.6610949 -0.4531919
0.4158778 1.3448651 -0.4422525 -0.3838333 -1.2290796
> SAMtest.stat[[5]][1:10]
[1] -0.9924257 0.7608038 1.4288688 0.7586334 -0.4985868
0.4826826 1.4670386 -0.5074367 -0.4436651 -1.3792490
> SAMtest.stat[[6]][1:10]
2 3 4 5 6 7 8 9 10 11
"d" "u" "u" "u" "d" "u" "u" "d" "d" "d"
Note that in Isoallfdr(), the value of Δ is left unspecified, with the default values taken from the data, i.e., all the percentiles of the standard errors. By fixing the 50% FDR at 0.05, the corresponding Δ value obtained from the delta table is 0.73 (marked in-between the dashed lines); the number of differentially expressed genes is 2,148, with 144 genes as potential false positives.
The q value of a gene is the FDR for the list of all genes declared differentially expressed that includes that gene and all genes that are more significant (Chu et al. 2011). By specifying the desired Δ value, the delta table, and the user-defined test statistic in the function Isoqval(), we can obtain the q value of each gene from the SAM procedure.
> qval <- Isoqval(delta=0.73, allfdr=dtable, qqstat=qqstat,
stat="ModifM")
> dim(qval[[1]])
[1] 16998 3
> qval[[1]]
Row.names t.stat q.val
[1,] 5131 -8.802030 0
[2,] 3009 -8.385913 0
[3,] 9453 -8.266634 0
[4,] 7279 -7.390471 0
.
.
[16995,] 4625 11.152476595 0.0000
[16996,] 4624 11.606319834 0.0000
[16997,] 7760 13.106895443 0.0000
[16998,] 51 14.852821743 0.0000
> dim(qval[[2]])
[1] 2148 3
> qval[[2]]
Row.names t.stat q.val
[1,] 5131 -8.802030 0
[2,] 3009 -8.385913 0
[3,] 9453 -8.266634 0
[4,] 7279 -7.390471 0
.
.
[2145,] 4625 11.152477 0.0000
[2146,] 4624 11.606320 0.0000
[2147,] 7760 13.106895 0.0000
[2148,] 51 14.852822 0.0000
The first object of the output is the list of q values for all the genes, ranked
from the smallest test statistic value to the largest, while the second object is the
list of q values for the 2,148 differentially expressed genes at the 50% FDR level of
0.05, ranked from the smallest test statistic value to the largest. The first column of
the output matrices is the row name of genes in the dataset, the second column is the
observed modified M test statistic value, and the last column is the q value of the
SAM procedure for both objects.
Instead of using the functions above to produce the separate parts of the analysis, we can use the function IsoTestSAM() to carry out all the steps at once. Its output contains the list of significant findings (the same as the second output of Isoqval()), as well as the output of Isoqqstat() and Isoallfdr().
IsoTestSAM(x, y=data, fudge=c("none","pooled"), niter=100,
FDR=0.05, stat=c("E2","Williams","Marcus","M","ModifM"))
Specifying the same options as above in this function, we can obtain the list of significant genes. Note that the last two columns of the IsoTestSAM() output give additional information by calculating the permutation p values based on the SAM permutation matrix and adjusting these p values using the BH-FDR procedure.
> set.seed(123)
> IsoSAM.obj <- IsoTestSAM (x, y=data, fudge="pooled",
niter=100, FDR=0.05, stat="ModifM")
> IsoSAM.obj[[1]]
Probe.ID row.number stat.val qvalue pvalue adj.pvalue
1 g5131 5131 -8.802030 0.0000 0.000000e+00 0.0000000000
2 g3009 3009 -8.385913 0.0000 1.176609e-06 0.0007407407
3 g9453 9453 -8.266634 0.0000 1.176609e-06 0.0007407407
4 g7279 7279 -7.390471 0.0000 1.764914e-06 0.0009677419
.
.
2145 g4625 4625 11.152477 0.0000 0.000000e+00 0.0000000000
2146 g4624 4624 11.606320 0.0000 0.000000e+00 0.0000000000
2147 g7760 7760 13.106895 0.0000 0.000000e+00 0.0000000000
2148 g51 51 14.852822 0.0000 0.000000e+00 0.0000000000
Finally, the graphical output of the SAM procedure can be produced using the function IsoSAMPlot().
> IsoSAMPlot(qqstat, allfdr, FDR=0.05, stat=c("E2","Williams",
"Marcus","M","ModifM"))
This function requires the output from Isoqqstat() and Isoallfdr(), a user-defined test statistic, and the FDR level to control. We again take the modified M test statistic as an example, at an FDR level of 0.05. There are four plots generated by the SAM procedure. Panel a shows the FDR [either 50% or 90% (more stringent)] vs. Δ, from which the user can choose the Δ value with the corresponding desired FDR. Panel b shows the number of significant genes vs. Δ, and panel c shows the number of false positives (either 50% or 90%) vs. Δ. Finally, panel d shows the observed vs. the expected (obtained from permutations) test statistics, in which dots beyond the two dashed lines are the genes called differentially expressed (Fig. 8.4).

Fig. 8.4 The SAM plots: (a) plot of the FDR vs. delta; (b) plot of the number of significant genes vs. delta; (c) plot of the number of false positives vs. delta; (d) plot of observed vs. expected test statistics
> IsoSAMPlot(qqstat=qqstat, allfdr=dtable, FDR=0.05, stat="ModifM")
8.5 Discussion
In this chapter, we discussed the issue of genes with small variance and the solution that the SAM procedure provides for this problem. The SAM procedure was adapted to the dose-response microarray setting in the case of small variance by modifying the likelihood ratio test and the four t-type test statistics with a fudge factor. This adjustment of the fudge factor for the five test statistics is new; therefore, we suggest conducting a simulation study to further investigate the performance of these adjusted test statistics.
References
Chu, G., Li, J., Narasimhan, B., Tibshirani, R., & Tusher, V. (2011). SAM: “Significance Analysis of
Microarrays” users guide and technical document. http://www-stat.stanford.edu/tibs/SAM/
sam.pdf.
Tusher, V. G., Tibshirani, R., & Chu, G. (2001). Significance analysis of microarrays applied to the
ionizing radiation response. Proceedings of the National Academy of Sciences, 98, 5116–5121.
Chapter 9
δ-Clustering of Monotone Profiles
9.1 Introduction
Table 9.1 The set of seven possible monotonic dose-response models for an experiment with four dose levels

Model   Up: mean structure        Down: mean structure
g1      μ0 = μ1 = μ2 < μ3         μ0 = μ1 = μ2 > μ3
g2      μ0 = μ1 < μ2 = μ3         μ0 = μ1 > μ2 = μ3
g3      μ0 < μ1 = μ2 = μ3         μ0 > μ1 = μ2 = μ3
g4      μ0 < μ1 = μ2 < μ3         μ0 > μ1 = μ2 > μ3
g5      μ0 = μ1 < μ2 < μ3         μ0 = μ1 > μ2 > μ3
g6      μ0 < μ1 < μ2 = μ3         μ0 > μ1 > μ2 = μ3
g7      μ0 < μ1 < μ2 < μ3         μ0 > μ1 > μ2 > μ3

μi is the mean response at dose level i
Traditional clustering methods such as K-means (Hartigan and Wong 1979), hier-
archical methods (Johnson and Wichern 2008), and self-organizing maps (Kohonen
2001) have been used extensively in microarray experiments to identify clusters
of co-regulated genes. While the clustering methods find co-regulated genes based
on similarity across all the conditions in an experiment, it has been argued
that co-expression under subsets of conditions is more biologically intuitive and
relevant. As such, several methods for biclustering have been developed in the
recent years to find co-regulated genes under a subset of conditions (Madeira and
Oliviera 2004). The major difference between clustering and biclustering is that
clustering focuses on global patterns in gene expression data, while biclustering
focuses on local structures that are inherent in gene expression data. However,
both clustering and biclustering methods assume that genes are profiled under
multiple nominal experimental conditions. More recently, special attention has been
paid to microarray experiments profiled under experimental conditions with trends,
such as time course, dose-response, and temperature (Madeira and Oliviera 2004).
The aim of the analysis discussed in this chapter is to cluster genes with similar
dose-response curve shapes under monotone constraints. Hence, for the analysis presented in this chapter, the dimension of the dose is fixed, and our goal is to find subsets of genes with similarly shaped dose-response curves.

Fig. 9.1 Illustrative example. The set of all possible upward monotone increasing dose-response curves for an experiment with four dose levels
For a gene expression matrix Y with entries y_{mi}, the residue of the expression value of the mth gene under condition (dose) i can be expressed as

r_{mi} = y_{mi} - y_{mI} - y_{Mi} + y_{MI},    (9.1)

where y_{MI} is the overall mean of the expression matrix Y, y_{mI} is the mean expression of gene m, and y_{Mi} is the mean expression of condition i. The mean squared residue score of the matrix is given by

H_{MI} = \frac{1}{|M||I|} \sum_{m \in M} \sum_{i \in I} r_{mi}^2,    (9.2)

where the numbers of genes and conditions are denoted by |M| and |I|, respectively. Note that the model for the residual in (9.1) can be expressed in the form of a two-way ANOVA model without an interaction term:

y_{mi} = \mu + \alpha_m + \beta_i + r_{mi}.    (9.3)
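The residue in (9.1) and the score in (9.2) are easy to compute for a given expression matrix. The function below is our own illustration for a generic matrix Y (genes in rows, doses in columns), not a function of the ORCME package:

## mean squared residue score H_MI of an expression matrix Y (Cheng and Church 2000)
msrScore <- function(Y) {
  r <- Y - outer(rowMeans(Y), colMeans(Y), "+") + mean(Y)  ## r_mi = y_mi - y_mI - y_Mi + y_MI
  mean(r^2)                                                ## H_MI
}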
In some settings, the column effects β in (9.3) have an inherent ordering, which may be time, temperature, or, in our example, increasing doses of a therapeutic compound. The aim is therefore to find clusters of genes that are similar in intensities and trends. δ-Clustering, a variant of δ-biclustering by Cheng and Church (2000), may in general be used.
Algorithm 1: δ-clustering
Input: Y, a matrix of real numbers; the minimum number of genes in a cluster; and λ: 0 ≤ λ ≤ 1.
Output: Y_sub, a sub-matrix with a number of rows or columns less than or equal to the number of rows or columns of the original matrix Y.
Initialization: δ = λH_P, where H_P is the mean squared residue score of the observed data.
Iteration:
1. Apply the node deletion algorithm proposed by Cheng and Church (2000) only to the genes, while the dose levels are kept fixed.
2. If the mean squared residue of the reduced matrix satisfies the δ criterion or the number of genes in the reduced matrix is at least the prespecified minimum, output the reduced matrix as a cluster.
3. Delete the members of the cluster found in step 2.
4. Repeat steps 1–3 on the non-clustered genes until every gene belongs to a cluster.
The algorithm discussed above can be applied to any setting of an ordered design variable (time, temperature, dose, etc.), but it does not require a monotone gene expression profile. In other words, using Algorithm 1, we will be able to cluster subsets of genes with a similar dose-response curve shape, but not necessarily a monotone one. In the following section, we discuss an algorithm in which isotonic regression is used in the dose dimension in order to cluster genes with similar monotone dose-response curve shapes. Note that the δ-clustering algorithm is applied to an expression matrix after an initial filtering in which genes with a nonsignificant dose-response relationship are excluded from the analysis.
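To make the row-deletion step of Algorithm 1 concrete, the sketch below (our own simplified illustration, not the ORCME implementation) repeatedly removes the gene that contributes most to the mean squared residue until the δ criterion or the minimum cluster size is reached:

## Y: expression matrix (genes x doses), delta: threshold, minGenes: minimum cluster size
deleteRows <- function(Y, delta, minGenes = 2) {
  repeat {
    r <- Y - outer(rowMeans(Y), colMeans(Y), "+") + mean(Y)  ## residues r_mi
    if (mean(r^2) <= delta || nrow(Y) <= minGenes) break
    Y <- Y[-which.max(rowMeans(r^2)), , drop = FALSE]        ## drop the worst-fitting gene
  }
  Y
}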
\alpha_m = \frac{\sum_{i \in I} y_{mi}}{|I|}, the effect of the mth row (gene);    (9.5)
\beta_i = \frac{\sum_{m \in M} y_{mi}}{|M|}, the effect of the ith column (dose).

The clustering algorithm is applied specifically to each direction in order to find clusters of genes with monotone increasing or decreasing trends. The linear model for the δ-clustering algorithm using a reduced gene expression matrix based only on the isotonic means is given by the model in (9.6) and is described in Algorithm 2:

y_{mi} = \mu + \alpha_m + \beta_i + r_{mi}.    (9.6)
In the next step, after the determination of the trend direction, we need to create the R objects for genes with upward and downward trends. The function plotIsomeans() can be used to produce a gene-specific profile plot:
> plotIsomeans(monoData=incData,obsData=obsIncData,doseData=
doseData,geneIndex=10)
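The objects obsIncData, incData, and doseData used above are not created by plotIsomeans() itself. The sketch below shows one way the matrix of isotonic means could be constructed with base R's isoreg(); it is our own illustration (the ORCME package may provide its own helpers) and assumes an equal number of arrays at each dose, so that the unweighted isotonic fit coincides with the weighted one:

## gene: expression values of one gene, dose: dose level of each array
isoFit <- function(gene, dose, direction = c("up", "down")) {
  direction <- match.arg(direction)
  d <- sort(unique(dose))
  m <- sapply(d, function(di) mean(gene[dose == di]))   ## observed dose-specific means
  if (direction == "up") isoreg(d, m)$yf                ## monotone increasing fit
  else                  -isoreg(d, -m)$yf               ## monotone decreasing fit
}
## isotonic means (genes x doses) for the genes with an upward trend:
## incData <- t(apply(obsIncData, 1, isoFit, dose = doseData, direction = "up"))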
Fig. 9.2 Examples of two significant genes. (a) Upward trend. (b) Downward trend
The δ-clustering method with λ = 0.15 results in 26 clusters for the 1,600 upward monotone genes. The first cluster contains 1,278 genes, and the last cluster contains 2 genes. The large size of the first cluster is an inherent feature of the δ-clustering method: the first clusters usually contain the least expressed genes, i.e., genes that are less expressed than those in the later clusters. Figure 9.3 presents examples of clusters with upward monotone profiles. The upper panels show the raw gene expression values, and the lower panels show gene expression values centered around the gene-specific means. Genes within a cluster show coherence in terms of similarities between their expression values and trends. The function plotCluster() produces the isotonic mean profiles for a specific cluster. The option zeroMean=TRUE centers the gene profiles around the gene-specific means, as shown in the lower panels of Fig. 9.3.
> plotCluster(DRdata=incData,doseData=doseData, ORCMEoutput=ORCMEoutput,
clusterID=4,zeroMean=FALSE, xlabel="Dose",ylabel="Gene Expression")
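For completeness, the ORCMEoutput object used above is the result of the clustering run itself. A minimal sketch of that call is given below; we assume here that the main clustering function is ORCME() and that it takes the monotonized expression matrix, the value of λ, and the minimum number of genes per cluster (the function and argument names should be checked against the package documentation):

> ORCMEoutput <- ORCME(DRdata=incData, lambda=0.15, phi=2)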
Fig. 9.3 Examples of clusters from upward monotone genes. Panels (a) and (b): average gene expression profiles. (a) Cluster ID = 4; (b) cluster ID = 6; (c) cluster ID = 13; (d) cluster ID = 4 (centered); (e) cluster ID = 6 (centered); (f) cluster ID = 13 (centered)
Figure 9.4 presents examples of clusters with downward monotone profiles. Similar to the clustering of the upward monotone genes, the clusters contain genes with coherent values. However, there are situations in which a few members of a cluster show different dose-response trends; in most cases, the deviation occurs in only one of the four doses in the experiment.
Fig. 9.4 Examples of clusters with downward monotone profiles. (a) Cluster ID = 3; (b) cluster ID = 6; (c) cluster ID = 13; (d) cluster ID = 3 (centered); (e) cluster ID = 6 (centered); (f) cluster ID = 13 (centered)
R(\lambda) = \sum_{q} \sum_{m,i} \left( y_{miq} - \mu_q - \alpha_{mq} - \beta^{*}_{iq} \right)^2, \qquad q = 1, \ldots, n(\lambda).    (9.7)
Let N be the number of genes to be clustered. The range of n(λ) lies between one and the number of genes, i.e., 1 ≤ n(λ) ≤ N: when λ = 1, n(λ) = 1, while n(λ) approaches N for λ = 0. Since R(λ) is a decreasing function of n(λ) and an increasing function of λ, R(λ) will be minimum when n(λ) = N and maximum when n(λ) = 1. Note that when n(λ) = 1, the within-cluster sum of squares equals the total sum of squares for the gene expression matrix. Our aim is to find the value of λ taking into account the trade-off between the within-cluster sum of squares and the number of resulting clusters; this criterion is referred to as the penalized within-cluster sum of squares (pWSS). In addition, the H index, computed for successive values of λ, is given by

H(\lambda_\ell) = \left( \frac{w(\lambda_\ell)}{w(\lambda_{\ell+1})} - 1 \right) \Big/ \left[ N - n(\lambda_{\ell+1}) \right],    (9.10)

where ℓ is an index for the unique values of λ. The original definition of the H index is based on the sequential increase in the number of clusters. For our proposal, this is not the case, as more than one value of λ may result in the same number of clusters. However, the criterion can still be used to investigate the gain in within-cluster sum of squares when moving from a lower value of λ to an adjacent higher value.
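Given, for each value of λ, the within-cluster sum of squares w(λ) and the number of clusters n(λ), the H index in (9.10) can be computed directly; the helper below is our own sketch, not a function of the ORCME package:

## wss: within-cluster sums of squares w(lambda), ncl: numbers of clusters n(lambda),
## both ordered by increasing lambda; N: number of genes being clustered
Hindex <- function(wss, ncl, N) {
  (wss[-length(wss)] / wss[-1] - 1) / (N - ncl[-1])
}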
The relative proportion (λ) of the mean squared residue score of the monotonized gene expression matrix is proposed as the clustering parameter for the δ-clustering method. Although λ is bounded between 0 and 1, the optimum value of λ is unknown. Similar to the resampling approach for random forest (Breiman 1996) and ABC learning (Amaratunga et al. 2008), we propose to generate 100 resampled datasets, each containing 100 genes randomly sampled with replacement from the reduced expression data. For each of the resampled datasets, the δ-clustering method is applied for a set of values of λ ranging from 0.05 to 0.95. Note that the minimum number of genes in a cluster is fixed at two. The resampling is done using the function resampleORCME().
## exploring optimum choice of lambda (the sequence for lambda)
> lambdaVector <- seq(0.05,0.95,0.05)
## upward trends
> lambdaChoiceOutput <- resampleORCME(clusteringData=incData,
lambdaVector=lambdaVector)
Figure 9.5 shows the relationship between the within-cluster sum of squares, the number of resulting clusters, and λ. Panels (a) and (c) show the relationship between the within-cluster sum of squares and λ for the upward and downward monotone genes, respectively. Panels (b) and (d) show the relationship between the number of resulting clusters and λ for the upward and downward monotone genes, respectively. The within-cluster sum of squares increases with an increase in λ, while the number of clusters decreases with an increase in λ. This shows that a trade-off between the within-cluster sum of squares and the number of clusters may be a criterion for an optimum choice of λ. Diagnostic plots were produced using the function plotLambda().
Fig. 9.5 Within-cluster sum of squares and the number of clusters as a function of λ. (a) WSS (upward trends); (b) number of clusters (upward trends); (c) WSS (downward trends); (d) number of clusters (downward trends)
> plotLambda(lambdaChoiceOutput,output="wss")
> plotLambda(lambdaChoiceOutput,output="ncluster")
The trade-off between the within-cluster sum of squares and the number of clusters based on the pWSS is presented in Fig. 9.6a, d for the upward and downward monotone genes, respectively. The pWSS for the upward monotone genes reaches a minimum at λ = 0.15, and at λ = 0.3 for the downward monotone genes. Figure 9.6b, e shows the relationship between the CH values and λ for the upward and downward monotone genes, respectively. The maximum value of CH is reached at λ = 0.05 for both the upward and downward monotone genes. It appears for our case study that the variant of the Calinski and Harabasz (1974) index is not an informative criterion: it favors the λ value which results in the highest number of clusters. Figure 9.6c, f presents the relationship between the H value and λ for the upward and downward monotone genes, respectively. The H values do not show a pattern as smooth as that of the pWSS. However, the H index reaches its minimum value at λ = 0.35 for the upward monotone genes and at λ = 0.3 for the downward monotone genes. Graphical output can be produced using the function plotLambda(); the option output= determines which index will be plotted.

Fig. 9.6 The choice of λ. Upper panels: upward trends. Lower panels: downward trends. (a and d) pWSS; (b and e) CH index; (c and f) H index
> plotLambda(lambdaChoiceOutput,output="pwss")
> plotLambda(lambdaChoiceOutput,output="ch")
> plotLambda(lambdaChoiceOutput,output="h")
9.4 Discussion
The δ-clustering method discussed in this chapter is a variant of the δ-biclustering method proposed by Cheng and Church (2000), who defined a bicluster as a subset of genes and a subset of conditions with a "high similarity score" using the mean squared residue score. For the δ-clustering method, the δ value is modified to be data dependent: it is expressed as a relative proportion (λ) of the mean squared error from the direction-dependent monotonized expression matrix. The method shares some features of standard clustering methods in that it partitions the genes in a dose-response microarray dataset into nonoverlapping groups, but it also benefits from the local structures of the biclustering methods.
The δ-clustering procedure discussed in this chapter was applied to a reduced expression matrix obtained after an initial (inference-based) filtering. After the initial filtering step, the within-gene variability is ignored by the δ-clustering method, and the clusters are constructed in order to reduce the between-gene variability (i.e., the within-cluster variability). The optimum choice of λ for the clustering method is explored with a penalized within-cluster sum of squares, which is a trade-off between goodness of fit and the complexity of the resulting clusters for different values of λ ranging from zero to one. The goodness-of-fit is captured by the within-cluster sum of squares, and the complexity is captured by the number of clusters. Note that the within-cluster sum of squares increases with an increase in λ and the number of clusters decreases with an increase in λ. Based on the values of λ that correspond to the minimum values of the penalized within-cluster sum of squares, 26 and 19 clusters were obtained for the upward and downward monotone genes, respectively. The first clusters contained the least expressed genes and the last clusters contained the most expressed genes in terms of the raw gene expression values.
References
Amaratunga, D., Cabrera, J., & Kovtun, V. (2008). Microarray learning with ABC. Biostatistics, 9,
128–136.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and
powerful approach to multiple testing. Journal of Royal Statistical Soceity B, 57, 289–300.
Breiman, L. (1996). Random forests. Machine Learning, 24, 123–140.
Calinski, R. B., & Harabasz, J. A. (1974). Dendrite method for cluster analysis. Communications
in Statistics, 3, 1–27.
Cheng, Y., & Church, G. M. (2000). Biclustering of expression data. Proceedings of the Conference
on Intelligent Systems for Molecular Biology, 55, 93–104.
Ge, Y., Dudoit, S., & Speed, P. T. (2003). Resampling based multiple testing for microarray data
analysis (Technical report, 633). Berkeley: University of Berkeley.
Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal
of the Royal Statistical Society, Series C (Applied Statistics), 28(1), 100–108.
Johnson, R. A., & Wichern, D. W. (2008). Applied multivariate statistical analysis. Pearson.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer-Verlag.
Lin, D., Shkedy, Z., Yekutieli, D., Burzykowski, T., Göhlmann, H. W. H., De Bondt, A., et al. (2007).
Testing for trend in dose-response microarray experiments: Comparison of several testing
procedures, multiplicity, and resampling-based inference. Statistical Application in Genetics
and Molecular Biology, 6(1). Article 26.
Madeira, S. C., & Oliviera, A. L. (2004). Biclustering algorithms for biological data analysis: A
survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 1(1),
24–45.
Prelic, A., Bleuler, S., Zimmermann, P., Wille, A., Buhlmann, P., Gruissem, W., et al. (2006).
Systematic comparison and evaluation of biclustering methods for gene expression data.
Bioinformatics, 22(9), 1122–1129.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via
the gap statistic. Journal of the Royal Statistical Society B, 63, 411–423.
Chapter 10
Classification of Monotone Gene Profiles Using
Information Theory Selection Methods
10.1 Introduction
Fig. 10.1 Seven monotone models fitted for gene 3467. The model with the highest likelihood is g7, while g6 has the smallest AIC (see the discussion in Sect. 10.2.2)
Table 10.1 Comparison of the penalty values used by the AIC, BIC, and ORIC for models g1 to g7. N is the total number of arrays

Models        Parameters   AIC     BIC          ORIC
g1, g2, g3    2            2 × 2   2 log(N)     2 × 1.16666
g4, g5, g6    3            2 × 3   3 log(N)     2 × 1.91666
g7            4            2 × 4   4 log(N)     2 × 2.08334
The aim of this chapter is to present model selection procedures that take into account both goodness-of-fit and model complexity and to classify (or cluster) genes with similar dose-response curve shapes. This is in contrast with Chap. 9, in which the δ-clustering procedure was based on the minimization of the residual sum of squares within a cluster.
We use the human epidermal dose-response data (three higher doses and one control dose under the control compound) as a case study to illustrate the methodology discussed in this chapter. After the initial testing procedure (using the Ē²₀₁ test statistic), discussed in Sect. 7.6, the null hypothesis of homogeneity of means in gene expression was rejected for 3,613 genes (see Table 7.1). However, due to a different seed used for the permutations, we now obtained 3,499 genes, which is slightly different from the number of genes reported in Chap. 7.
The content of this chapter is organized as follows. In Sect. 10.2, we address
the problem of trend classification within the framework of model selection. Since
the set of candidate models is estimated under order restrictions, we discuss,
in Sect. 10.3, the ORIC. The one- and two-stage ORICC algorithms are discussed
in Sect. 10.4. We apply the proposed method to the case study in Sect. 10.5.
In this section, we address the problem of trend classification (or the identification
of the dose-response curve shape). For each gene declared significant in the first
step, the set fg1 ; g2 ; g3 ; g4 ; g5 ; g6 ; g7 g, given in Table 9.1 and shown in Fig. 9.1,
is the set of seven possible models with increasing trend for an experiment with
four dose levels. Analogously, a set of seven models with decreasing trend is
considered as well (see Fig. 10.1). Note that a model selection procedure that
leads to a selection of the best model from the set of all candidate models will
allow one to identify both dose-response curve shape and the minimum effective
dose (MED). On the other hand, using hypothesis testing for the determination
of the MED will not always allow us to identify the shape of the dose-response
curve.
To select the best model, we propose to use the posterior probability of the model
gr given the data (Burnham and Anderson 2002), defined by
P(g_r \mid D) = \frac{P(D \mid g_r)\, P(g_r)}{\sum_{r=1}^{R} P(D \mid g_r)\, P(g_r)}, \qquad r = 1, \ldots, R.    (10.1)
Here, P(D | g_r) and P(g_r) are the likelihood and the prior probability of the rth model, respectively, and R is the number of all candidate models (for the order-restricted ANOVA models, R = 7 for our case study). Note that if we use a non-informative prior, i.e., P(g_r) = 1/R (Whitney and Ryan 2009), the isotonic regression model has the highest posterior probability, since the maximum likelihood estimate is unique. However, the posterior model probabilities (10.1) do not take the complexity of the model into account. In what follows, we focus on model selection procedures based on information criteria, which take into account both the goodness-of-fit and the model complexity. In other words, we focus on the question whether changing the number of parameters in the model will lead to a better compromise between goodness-of-fit and model complexity.
The model selection theory discussed by Burnham and Anderson (2002, 2004) and
Claeskens and Hjort (2008) allows one to incorporate the need to balance between
goodness-of-fit and model complexity within the model selection procedure. Our
starting point for Burnham and Anderson’s model selection theory is the Kullback–
Leibler (KL) information given by Burnham and Anderson (2002) and Claeskens
and Hjort (2008):

I(f, g) = \int f(x) \log \frac{f(x)}{g(x \mid \theta)} \, dx.    (10.2)

Here, f represents the density function of the true and unknown model, g represents the density function of the model that is used to approximate f, and θ is the unknown parameter to be estimated. The KL information (or the KL distance) is interpreted as the loss of information when the true model f is approximated by the model g(x | θ̂), where θ̂ is the parameter estimate for the unknown parameter θ.
For a given set of candidate models {g1, g2, \ldots, gR}, one can compare the KL information for each model and select the model that minimizes the information loss across the considered set of models (Burnham and Anderson 2002, 2004; Poeter and Anderson 2005). However, in practice, I(f, g) cannot be computed since the true model f is unknown.
Akaike (1973, 1974) and Burnham and Anderson (2002) made the link between the KL information and likelihood theory and showed that the expected Kullback–Leibler information can be expressed as

E(KL) = -\log L(\hat{\theta} \mid D) + M,    (10.3)

where L(θ̂ | D) is the likelihood and M is the number of parameters in the model. The well-known Akaike's information criterion (AIC) is given by

AIC = -2 \log L(\hat{\theta} \mid D) + 2M.    (10.4)
Akaike’s approach allows for model selection that takes into account both goodness-
of-fit and model complexity. Because the individual AIC values are not inter-
pretable, as they contain arbitrary constants and are much affected by sample size,
for a given set of R models, Burnham and Anderson (2004) proposed to rescale the
AIC to
AICr D AICr AICmin ; (10.5)
with AICmin being the smallest AIC value across the set of R models. The AIC
difference, AICr , is interpreted as the information loss when model gr , rather than
the best model gmin , is used to approximate f . Some simple rules of thumb are used
in assessing the relative merits of the models in the set (Burnham and Anderson 2002): models with ΔAIC_r ≤ 2 have substantial support (evidence); those with 4 ≤ ΔAIC_r ≤ 7 have considerably less support; and models with ΔAIC_r > 10 have essentially no support.
Akaike (1981) and Burnham and Anderson (2002) advocated the use of exp(−½ΔAIC_r) for the relative likelihood of the model given the data, defined by

L(g_r \mid D) \propto e^{-\frac{1}{2}\Delta AIC_r}.    (10.6)
Note that the model likelihood L(g_r | D) takes into account both goodness-of-fit and model complexity, while P(g_r | D) takes into account only goodness-of-fit. Similar to the posterior probabilities in (10.1), Akaike (Burnham and Anderson 2002) defined Akaike's weights by

P_A(g_r \mid D) = \frac{\exp(-\frac{1}{2}\Delta AIC_r)}{\sum_{r=1}^{R} \exp(-\frac{1}{2}\Delta AIC_r)}.    (10.7)

Akaike's weight P_A(g_r | D) can be interpreted as the weight of evidence that model g_r is the best KL model, given a set of R models and given that one of the models in the set must be the best KL model. Claeskens and Hjort (2008) referred to P_A(g_r | D) as the smooth AIC weight. Note that Akaike's weights can be interpreted as the posterior probabilities of the models.
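The weights in (10.7) are straightforward to compute for any vector of criterion values. The small helper below is our own sketch (not part of any package discussed in this book) and applies equally to AIC, BIC, or ORIC values:

## ic: vector with the values of an information criterion (AIC, BIC, or ORIC)
## for the R candidate models of a single gene
icWeights <- function(ic) {
  delta <- ic - min(ic)     ## criterion differences Delta_r
  w <- exp(-delta / 2)
  w / sum(w)                ## smooth weights, summing to one
}
## example: icWeights(c(100.2, 98.7, 99.5))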
The Bayesian information criterion (Schwarz 1978) is defined as

BIC = -2 \log L(\hat{\theta} \mid D) + M \log(N),    (10.8)

where N is the number of arrays in the data and M is the number of parameters in the model. The BIC penalty grows with the number of observations, whereas the AIC penalizes only the number of parameters in the model. Therefore, the BIC leads to the selection of less complex models. The posterior model probabilities are given by

P_B(g_r \mid D) = \frac{\exp(-\frac{1}{2}\Delta BIC_r)\, P(g_r)}{\sum_{r=1}^{R} \exp(-\frac{1}{2}\Delta BIC_r)\, P(g_r)}.    (10.9)
Whitney and Ryan (2009) referred to P_B(g_r | D) as an approximation to the model posterior probability. Burnham and Anderson (2002) showed that for the prior probabilities

P(g_r) = B \exp\!\left(\tfrac{1}{2}\Delta BIC_r\right) \exp\!\left(-\tfrac{1}{2}\Delta AIC_r\right),    (10.10)

where B is a constant, it follows that P_B(g_r | D) = P_A(g_r | D).
The order-restricted information criterion (ORIC), proposed by Anraku (1999), is given by

ORIC = -2 \log L(\hat{\theta} \mid D) + 2 \sum_{\ell=1}^{K} \ell\, P(\ell, K, w),

with P(ℓ, K, w) the level probability discussed in Chap. 2 (Robertson et al. 1988) and w_i = n_i/σ_i², where n_i is the number of arrays at dose i (i = 0, \ldots, K) and σ_i² is the variance at dose i. For the case that w_0 = \cdots = w_K, or K = 2, it follows that ORIC = -2 \log L(\hat{\theta} \mid D) + 2 \sum_{\ell=1}^{K} 1/\ell (Robertson et al. 1988; Anraku 1999).
The posterior model probabilities (with non-informative priors) are given by

P_{OR}(g_r \mid D) = \frac{\exp(-\frac{1}{2}\Delta ORIC_r)}{\sum_{r=1}^{R} \exp(-\frac{1}{2}\Delta ORIC_r)}.    (10.11)
Under simple order alternatives, i.e., H1^Up or H1^Down [see (3.3) and (3.4), respectively], in the setting of four doses (control and three higher doses) and an equal number of arrays per dose, the level probabilities P(ℓ, K, w), given by Robertson et al. (1988), are equal to 0.25, 0.45833, 0.25, and 0.04167 for ℓ = 1, 2, 3, and 4, respectively. The smallest value of the ORIC for a model in the set of R possible models indicates the best configuration of the parameters under order restriction in terms of the compromise between goodness-of-fit and model complexity.
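As a quick check in R, the ORIC penalty listed in Table 10.1 for the full isotonic model g7 follows directly from these level probabilities:

> p.level <- c(0.25, 0.45833, 0.25, 0.04167)   ## P(l, K, w) for l = 1,...,4
> 2 * sum(1:4 * p.level)                       ## 2 x 2.08334, the ORIC penalty for g7
[1] 4.16668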
Table 10.1 lists the values of the penalty used by the AIC, BIC, and ORIC. Note
that the relative magnitude of the penalty used by the ORIC for models g1 , g2 , g3
(two parameters) and models g4 , g5 , g6 (three parameters) is larger than that used
for model g7 (four parameters). Thus, due to the small penalty difference between
models g4 , g5 , g6 , and model g7 , the latter is more likely to be classified as the best
model in the set of R possible models.
The one-stage ORICC was proposed by Liu et al. (2009a, 2009b) in the context of
time-course microarray data and a similar algorithm, in the context of dose-response
microarray data, was proposed by Lin et al. (2009) in order to classify monotone
gene profiles (with respect to dose). Liu’s and Lin’s algorithms are similar in the
sense that they both used the ORIC for clustering. However, they are different
in their focus. Lin's algorithm clusters monotone (increasing or decreasing) profiles into their subset profiles, while Liu's algorithm treats all monotone profiles as one cluster but allows for clustering of any order-restricted
profiles (including monotone, umbrella, and cyclical profiles). In addition, the
algorithm of Lin et al. (2009) is based on Anraku’s information criterion while Liu
et al. (2009b) developed an information criterion for the setting of non-monotone
profiles. The two-stage ORICC algorithm is similar to the one-stage ORICC
algorithm but includes an initial filtering stage in order to reduce computation time.
Liu et al. (2009b) proposed a filtering procedure based on model selection while Lin
et al. (2009) advocated the use of the likelihood ratio test for initial filtering. The
ORICC of Liu et al. (2009b) will be discussed further in Chap. 11.
The two-stage ORICC algorithm (in Lin et al. 2009) is defined as follows:
1. Initial filtering. Use the LRT to select genes with a significant monotone dose-response relationship.
2. Prespecify a collection of all possible order-restricted ANOVA models, g1, \ldots, gR.
3. Compute, for each model, ORIC(g1), \ldots, ORIC(gR).
4. For each gene, select the model with the smallest ORIC value as the best model.
In Lin et al. (2009), the initial filtering is done based on the likelihood ratio test.
Genes for which the null hypothesis cannot be rejected are excluded from the
analysis. The one-stage ORICC algorithm is similar to the two-stage algorithm, but
it does not have the filtering stage, i.e., all genes are included for the analysis and the
null model g0 is fitted for each gene as well. Based on a simulation study, Lin et al.
(2009) advocated the use of the two-stage ORICC algorithm for three reasons:
1. The initial filtering reduces the misclassification error in the model selection step.
2. The model selection procedure is applied to those genes for which there is an
evidence of a monotone relationship between gene expression and dose. Hence,
the best monotone model must be in the model set, as required.
3. The initial testing step reduces the computation time for the model selection,
as the selection is applied for a relatively small number of genes and since the
direction of the trend (upward or downward trend) is known from the initial step.
The method discussed above can be used to identify the MED (i.e., the first dose
for which we observed a different response from the control dose) for the order-
restricted ANOVA models. An alternative approach to determine the MED is the
use of a hypothesis testing procedure, such as Williams’ (1971, 1972) or Marcus’
(1976) procedure, that identifies the MED based on hypothesis testing in a step-
down fashion. However, identification of the MED using these procedures does not
imply the shape of the dose-response curve. On the other hand, model selection
procedures based on information theory can address two objectives at the same
time—the determination of the MED and the classification of trends. Once the best
model from the candidate set is selected, one can identify the MED from the selected
model.
Based on the 3,499 genes found to be significant in the initial inference step, the
classification of mean profiles for these genes is to be obtained. For each gene, we
fit seven models to the data. The log-likelihoods of these models are obtained by
using the following R code:
> dim(data.sign) ## 3499 significant genes rejected by the LRT
3499 12
> g1 <- as.factor(c(rep(1,9),rep(2,3)))
> loglik.g1 <- apply(data.sign, 1, function(genei) logLik(aov(genei~g1-1)))
> g2 <- as.factor(c(rep(1,6),rep(2,6)))
> loglik.g2 <- apply(data.sign, 1, function(genei) logLik(aov(genei~g2-1)))
> g3 <- as.factor(c(rep(1,3),rep(2,9)))
> loglik.g3 <- apply(data.sign, 1, function(genei) logLik(aov(genei~g3-1)))
> g4 <- as.factor(c(rep(1,3),rep(2,6),rep(3,3)))
> loglik.g4 <- apply(data.sign, 1, function(genei) logLik(aov(genei~g4-1)))
> g5 <- as.factor(c(rep(1,6),rep(2,3),rep(3,3)))
> loglik.g5 <- apply(data.sign, 1, function(genei) logLik(aov(genei~g5-1)))
> g6 <- as.factor(c(rep(1,3),rep(2,3),rep(3,6)))
> loglik.g6 <- apply(data.sign, 1, function(genei) logLik(aov(genei~g6-1)))
In the above code, g1, g2, g3, g4, g5, g6, and g7 are R objects which define the design matrices. For example, g1 and g6 imply the following design matrices, respectively: X(g1) is a 12 × 2 matrix with rows (1, 0) for the first nine arrays and (0, 1) for the last three arrays, and X(g6) is a 12 × 3 matrix with rows (1, 0, 0) for the first three arrays, (0, 1, 0) for the next three arrays, and (0, 0, 1) for the last six arrays.
Note that the log-likelihood of the first six models can be easily obtained after fitting
the data with predefined constraints. For model g7 , the isotonic means are obtained
by using the function monoreg() by specifying the increasing or decreasing trend
as option.
> g7m <- matrix(0, nrow(data.sign), ncol(data.sign))
> dir <- IsoGenem(x.res, data.sign)[[11]]    ## direction of the trend ("u" or "d")
> g7m[dir=="u",] <- t(apply(data.sign[dir=="u",], 1,
+   function(genei) monoreg(unique(x.res), tapply(genei, as.factor(x.res), mean),
+   type="isotonic")$yf))[, x.res]
> g7m[dir=="d",] <- t(apply(data.sign[dir=="d",], 1,
+   function(genei) monoreg(unique(x.res), tapply(genei, as.factor(x.res), mean),
+   type="antitonic")$yf))[, x.res]
> residuals <- data.sign - g7m
> w <- rep.int(1, ncol(data.sign))
> N <- ncol(data.sign)
> loglik.g7 <- apply(residuals, 1, function(res)
+   0.5*(sum(log(w)) - N * (log(2 * pi) + 1 - log(N) + log(sum(w * res^2)))))
> m2loglik.mat <- -2 * cbind(loglik.g1, loglik.g2, loglik.g3,
+   loglik.g4, loglik.g5, loglik.g6, loglik.g7)
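The information criteria are then obtained by adding the appropriate penalty from Table 10.1 to each column of m2loglik.mat, and the best model for a gene is the column with the smallest value. The code below is our own sketch of this step (the object names are hypothetical):

> npar <- c(2, 2, 2, 3, 3, 3, 4)                  ## number of parameters of g1,...,g7
> aic.mat  <- sweep(m2loglik.mat, 2, 2 * npar, "+")
> bic.mat  <- sweep(m2loglik.mat, 2, npar * log(ncol(data.sign)), "+")
> oric.pen <- 2 * c(1.16666, 1.16666, 1.16666, 1.91666, 1.91666, 1.91666, 2.08334)
> oric.mat <- sweep(m2loglik.mat, 2, oric.pen, "+")
> best.oric <- apply(oric.mat, 1, which.min)      ## best model (1-7) per gene, ORIC
> table(best.oric)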
Using the four information criteria (the likelihood, AIC, BIC, and ORIC) the
genes are classified into the seven curve shapes for each direction (see Table 10.2
and Fig. 10.2a). For the likelihood-based posterior probabilities defined in (10.1),
1,710 genes (48.85%) are classified as g7, the isotonic regression model with
four parameters. As shown in Table 10.2, when the AIC and BIC criteria are
used to calculate the posterior probabilities, the number of genes that are classified
as g1 (isotonic regression with two parameters) increases from 344 to 1,528 and
1,648, respectively. The same pattern is observed for models g2 and g3 (both are
isotonic regression models with two parameters). For the ORIC, the number of
genes classified to models from g1 to g6 decreases as compared to the AIC and
BIC. On the other hand, 816 genes are classified as g7 .
Figure 10.2b shows the data for gene 3467. Based on the likelihood and ORIC, the best model for the gene is g7 (dashed line). Using the AIC, the gene is classified as g6 (solid line). For both models, the second dose level is estimated to be the MED level. Using the BIC, the model is further reduced and the gene is classified as g2 (dotted line), and the MED level is estimated to be the third dose level. Figure 10.3 shows the clusters of genes which are classified as g3 and g6 according to the ORIC. We notice that there is a large variability between these genes. This is expected since the classification is based on information criteria and does not aim to reduce the variability between genes.

Fig. 10.2 Classification based on information criteria. (a) Classification of trends based on the likelihood (L), AIC (A), BIC (B), and ORIC (O). (b) The best model according to the likelihood, AIC, BIC, and ORIC for gene 3467. The dashed line is the isotonic regression model g7 (four parameters) selected by the likelihood and ORIC, the solid line is g6 (three parameters) selected by the AIC, and the dotted line is g2 (two parameters) selected by the BIC

Fig. 10.3 Classification of trends based on the AIC for the clusters of g3 and g6
Note that, although the three models (g1 , g2 , and g3 ) have the same number of
parameters, the effective dose levels are not the same. The MED for model g1 is
the highest dose level, while the effective dose levels for g2 and g3 are the third and
the second dose levels, respectively. In general, the AIC and BIC criteria favor the
two-parameter models as compared to the three-parameter models g4 , g5 , g6 , and
the four-parameter model g7 , which is favored by the likelihood. Table 10.2 shows
that the AIC and BIC classify most of the genes as g1 and g5 , the ORIC criterion
classifies most of the genes as g1 and g7 , while the likelihood criterion classifies
most of the genes as g7 .
10.6 Discussion
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle.
In B. Petrov, & B. Csaki (Eds.), Second international symposium on information theory
(pp. 267–281). Budapest: Academiai Kiado.
10 Classification Using Information Theory 163
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control, AC-19, 716–723.
Akaike, H. (1981). Likelihood of a model and information criteria. Journal of Econometrics, 16,
3–14.
Anraku, K. (1999). An information criterion for parameters under a simple order restriction.
Biometrika, 86(1), 141–152(12).
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical
information-theoretic approach. New York: Springer.
Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: understanding AIC and BIC in
model selection. Sociological Methods Research, 33, 261–304.
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging. Cambridge Series in
Statistical and Probabilistic Mathematics.
Lin, D., Shkedy, Z., Burzykowski, T., Aerts, M., Göhlmann, H. W. H., De Bondt, A., et al. (2009).
Classification of trends in dose-response microarray experiments using information theory
selection methods. The Open Applied Informatics Journal, 3, 34–43.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009a). Order-restricted information criterion-based
clustering algorithm. Reference manual. http://cran.r-project.org/web/packages/ORIClust/.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009b). Information criterion-based clustering with order-
restricted candidate profiles in short time-course microarray experiments. BMC Bioinformatics,
10, 146.
Marcus, R. (1976). The powers of some tests of the equality of normal means against an ordered
alternative. Biometrika, 63, 177–183.
Poeter, E., & Anderson, D. (2005). Multimodel ranking and inference in ground water modeling.
Ground Water, 43(4), 597–605.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference.
New York: Wiley.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Williams, D. A. (1971). A test for differences between treatment means when several dose levels
are compared with a zero dose control. Biometrics, 27, 103–117.
Whitney, M., & Ryan, L. (2009). Quantifying dose-response uncertainty using Bayesian model
averaging. In R. M. Cooke (Ed.), Uncertainty modeling in dose-response. New York: Wiley.
Chapter 11
Beyond the Simple Order Alternatives
11.1 Introduction
Up to this point in the book, we focused on estimation and inference for monotone
mean profiles. In Chaps. 7 and 8, the null hypothesis of no dose effect was tested
against order alternatives of μ(d0) ≤ μ(d1) ≤ μ(d2) ≤ μ(d3) or μ(d0) ≥ μ(d1) ≥ μ(d2) ≥ μ(d3) (with at least one strict inequality), and in Chaps. 9 and 10, we discussed two-stage clustering procedures for the subgroup of genes which were found to be significant. The ordered alternatives discussed in the previous chapters are called simple order alternatives, and the underlying assumption is that there is a monotone relationship between the dose and the mean gene expression. Typical mean profiles which satisfy a simple order are shown in Fig. 11.1a, d. A second assumption that was made in the previous chapters is that the variance is equal across all dose levels, i.e., Y_ij ~ N(μ_i, σ²). In this chapter, we relax these assumptions
and discuss the case of testing the null hypothesis against order-restricted, but not
necessarily monotone, alternatives assuming heteroscedastic variances. The order-
restricted alternatives we consider in this chapter are the unimodal partial order
(umbrella profiles) alternatives (Robertson et al. 1988; Bretz and Hothorn 2003;
Peddada et al. 2003, 2005, 2009, 2010; Simmons and Peddada 2007; Liu et al.
2009b). Typical mean profiles which follow an umbrella dose-response curve shape
are shown in Fig. 11.1b, c, e, f. We discuss two algorithms for inference and cluster-
ing of ordered-restricted gene expression data. The first, the ORIOGEN, proposed
by Peddada et al. (2003), can be used for inference and clustering. The algorithm
D. Lin ()
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
Z. Shkedy
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]
D. Lin et al. (eds.), Modeling Dose-Response Microarray Data in Early Drug 165
Development Experiments Using R, Use R!, DOI 10.1007/978-3-642-24007-2 11,
© Springer-Verlag Berlin Heidelberg 2012
166 D. Lin and Z. Shkedy
a b c
gene expression
gene expression
gene expression
dose dose dose
d e f
gene expression
gene expression
gene expression
Fig. 11.1 Examples of order-restricted profiles in a dose-response experiment with four dose
levels. Panel (a): increasing profiles. Panel (b): an umbrella profile with downturn at the third dose
level. Panel (c): an umbrella profile with downturn at the second dose level. Panel (d): decreasing
profiles. Panel (e): an inverted umbrella profile with upturn at the third dose level. Panel (f): an
inverted umbrella profile with upturn at the second dose level
is discussed in Sect. 11.2 and the ORIOGEN 3.0 package (Peddada et al. 2005) is
used for the analysis. An elaborate discussion about the ORIOGEN 3.0 package is
given in Chap. 18. In Sect. 11.2.1, we test the null hypothesis against a simple order
alternative under heteroscedastic variances and discuss the similarities to the analy-
ses presented in Chaps. 7 and 8. In Sect. 11.2.2, the null hypothesis is tested against
order-restricted alternatives (either simple order or partial order alternatives). The
second algorithm we discuss in this chapter is the ORICC algorithm, proposed by
Liu et al. (2009b). Similar to the analysis discussed in Chap. 10, the ORICC algo-
rithm clusters order-restricted gene profiles using an information criterion. The ORICC
algorithm is implemented in the R package ORIClust and discussed in Sect. 11.3.
under order restrictions, and in contrast with the analysis presented in the previous
chapters, the ORIOGEN algorithm is not focused only on simple order alternatives.
In Sect. 11.2.1, we present an analysis under simple order alternatives for the case
of heteroscedastic variances, while the analysis under partial order alternatives is
discussed in Sect. 11.2.2.
Our primary interest is to test the null hypothesis of no dose effect $H_0: \mu(d_0) = \mu(d_1) = \cdots = \mu(d_K)$ against the simple order alternatives $H_1^{Up}: \mu(d_0) \le \mu(d_1) \le \cdots \le \mu(d_K)$ or $H_1^{Down}: \mu(d_0) \ge \mu(d_1) \ge \cdots \ge \mu(d_K)$, with at least one strict inequality. To test the null hypothesis against simple order alternatives, the ORIOGEN 3.0 package performs a resampling-based SAM analysis using SAM t-type test statistics given, respectively, for increasing and decreasing profiles by
$$
l_1^{Up} = \frac{\hat{\mu}_K - \hat{\mu}_0}{(s_0 + s)\sqrt{\frac{1}{n_K} + \frac{1}{n_0}}}
\qquad \text{and} \qquad
l_1^{Down} = \frac{\hat{\mu}_0 - \hat{\mu}_K}{(s_0 + s)\sqrt{\frac{1}{n_0} + \frac{1}{n_K}}}.
$$
Note that for a given value of the fudge factor s0 , this specific analysis in
ORIOGEN is similar to the SAM analysis in IsoGene discussed in Chap. 8. The
main difference is that IsoGene uses permutation-based inference of the actual
expression data in which the columns of the expression matrix are permuted (and
therefore assumes variance homogeneity across the dose levels) while ORIOGEN
approximates the distribution of the test statistic by bootstrapping residuals. The
bootstrap algorithm is discussed in Simmons and Peddada (2007), Gou and Peddada
(2008) and Peddada et al. (2010).
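To make the form of these statistics concrete, a minimal R sketch is given below. This is an illustration only (not the ORIOGEN implementation): it uses the observed dose-specific means rather than the order-restricted estimates, and the function name and its arguments (y, dose, s0) are hypothetical.

# SAM t-type statistics for increasing and decreasing profiles of a single gene
# y: vector of expression values; dose: factor of dose levels (control first);
# s0: fudge factor
sam.t.updown <- function(y, dose, s0 = 0) {
  dose  <- factor(dose)
  means <- tapply(y, dose, mean)      # dose-specific means
  n     <- tapply(y, dose, length)    # dose-specific sample sizes
  K     <- length(means)
  # pooled standard deviation across the dose levels
  s <- sqrt(sum(tapply(y, dose, function(x) sum((x - mean(x))^2))) / (sum(n) - K))
  denom <- (s0 + s) * sqrt(1 / n[K] + 1 / n[1])
  c(up   = unname((means[K] - means[1]) / denom),
    down = unname((means[1] - means[K]) / denom))
}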
In order to perform the analysis using the ORIOGEN package, we need to
specify that the mean profiles of primary interest for the analysis are increasing
or decreasing. For an FDR level of 5% and $s_0 = s_{10\%}$ (i.e., $s_0$ equal to the tenth percentile of the standard errors of the test statistics), we obtained 3,459 significant genes, of which 1,187 and 2,272 are clustered as increasing and decreasing profiles, respectively. We consider a second analysis (with 1,000 bootstraps) without the fudge factor, i.e., $s_0 = 0$. The number of significant genes is equal to 3,151, of which 1,716 and 1,435 are clustered as increasing and decreasing profiles,
respectively. In Chap. 7, for Marcus’ test statistic, 3,533 genes were found to be
significant (using permutation-based inference with BH-FDR = 5%). In total, 2,957
significant genes are found to be common for the two approaches. An example of
three genes found to be significant is shown in Fig. 11.2, and the partial output of the
ORIOGEN package is presented in the panel below. Note that the predicted means
for a simple order profile are obtained from the isotonic regression on the observed
means.
#PARTIAL OUTPUT OF THE ORIOGEN.
Results:
Gene ID Profile # P Value Q Value Fit.1 Fit.2 Fit.3 Fit.4
The main advantage of the ORIOGEN package is that it allows us to test the null hypothesis against any ordered alternative of interest. To keep notation in line with Peddada et al. (2003), we denote by $C_r$ a possible order-restricted profile. For a dose-response experiment with four dose levels $(K + 1 = 4)$, there are six noncyclical order-restricted profiles given by
$$
\begin{aligned}
C_1 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \le \mu_1 \le \mu_2 \le \mu_3\}, && \text{increasing profile},\\
C_2 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \le \mu_1 \ge \mu_2 \ge \mu_3\}, && \text{umbrella profile, downturn at 2},\\
C_3 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \le \mu_1 \le \mu_2 \ge \mu_3\}, && \text{umbrella profile, downturn at 3},\\
C_4 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \ge \mu_1 \ge \mu_2 \ge \mu_3\}, && \text{decreasing profile},\\
C_5 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \ge \mu_1 \le \mu_2 \le \mu_3\}, && \text{inverted umbrella profile, upturn at 2},\\
C_6 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \ge \mu_1 \ge \mu_2 \le \mu_3\}, && \text{inverted umbrella profile, upturn at 3}.
\end{aligned}
\qquad (11.2)
$$
The profiles $C_1$ and $C_4$ are the simple order profiles that were tested in the previous
section. The algorithm implemented in the ORIOGEN package consists of the
following steps:
• Specify the set of candidate profiles of primary interest $C_1, C_2, \ldots, C_R$.
• For each gene, estimate the mean under each candidate profile.
• Calculate the goodness of fit statistic, $l_1^1, \ldots, l_1^R$, for each candidate model.
• Use a bootstrap algorithm to approximate the distribution of the test statistic (the goodness of fit statistic) $l_1 = \max(l_1^1, \ldots, l_1^R)$ under the null hypothesis.
• Once a gene is declared significant, assign the gene to the profile which has the largest goodness of fit statistic.

Fig. 11.2 Gene expression of genes 1, 3, and 10 with predicted means. Solid line: expression under simple order restriction. Dashed line: inverted umbrella with turn point at the third dose level. Dotted-dashed line: inverted umbrella with turn point at the second dose level. (a) Gene 1; (b) Gene 3; (c) Gene 10
The methodology used in each step was discussed in detail by Peddada et al.
(2003), Simmons and Peddada (2007), and Peddada et al. (2010). In what follows,
we discuss briefly the calculation of the test statistic and the testing (and clustering)
procedures.
$$
C_{21} = \{\mu \in \mathbb{R}^{s+1} : \mu_0 \le \mu_1 \le \cdots \le \mu_s\},
\qquad
C_{22} = \{\mu \in \mathbb{R}^{K-s+1} : \mu_K \le \mu_{K-1} \le \cdots \le \mu_s\},
$$
for which all the parameters in $C_{21}$ are linked and all the parameters in $C_{22}$ are linked, but the parameters in $C_{21}$, except $\mu_s$, are not linked with the parameters in $C_{22}$. The parameter $\mu_s$ is linked with all the parameters in the profile, and it is called a nodal parameter (Peddada et al. 2003).
2. A subgraph is formed by a subvector of the profile (such as C21 and C22 ). A
subgraph is a linked subgraph if all the parameters in the subvectors are linked
(Simmons and Peddada 2007). A linked subgraph is a maximal linked subgraph if
all other linked subgraphs are subvectors of the subgraph. Hence, C21 and C22 are
maximal linked subgraphs for the umbrella-shaped profile and C1 is a maximal
linked subgraph of a simple order profile.
3. The goodness of fit statistic of a profile is the maximum standardized difference
between the parameter estimates of the farthest linked parameters of all the
maximal linked subgraphs of the profile (Simmons and Peddada 2007). For
example
• For the simple order profile, C1 is a maximal subgraph, and therefore, the
goodness of fit statistic is the standardized difference between the parameter
estimates of the two farthest parameters of C1 , that is,
$$
l_1^1 = \frac{\hat{\mu}_K - \hat{\mu}_0}{\sqrt{\dfrac{\hat{\sigma}_K^2}{n_K} + \dfrac{\hat{\sigma}_0^2}{n_0}}}.
$$
We notice that for the simple order profiles, the goodness of fit statistic is
Marcus’ statistic for the case of heteroscedasticity.
• For the umbrella-shaped profile, the goodness of fit statistic is the maximum
between the standardized difference of the farthest parameter estimates of C21
and C22 , that is,
$$
l_1^2 = \max\left( \frac{\hat{\mu}_s - \hat{\mu}_0}{\sqrt{\dfrac{\hat{\sigma}_s^2}{n_s} + \dfrac{\hat{\sigma}_0^2}{n_0}}}, \;
\frac{\hat{\mu}_s - \hat{\mu}_K}{\sqrt{\dfrac{\hat{\sigma}_s^2}{n_s} + \dfrac{\hat{\sigma}_K^2}{n_K}}} \right).
$$
Hence, for an umbrella with a downturn point at dose level s, the goodness of
fit statistic is the maximum of Marcus’ statistics calculated for each subgraph.
Under the assumption of homoscedastic variance, the dose-specific variance estimates can be replaced by the pooled sample variance in both $l_1^1$ and $l_1^2$. Both test statistics can be modified as SAM test statistics, as discussed in the previous section.
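A minimal R sketch of these goodness of fit statistics is given below (an illustration under the stated assumptions, not the ORIOGEN code; the inputs are the estimated dose-specific means, variances, and sample sizes for a single gene, and the function names are hypothetical).

# goodness of fit for a simple order profile: standardized difference between the
# two farthest parameters, allowing unequal variances across dose levels
gof.simple <- function(means, vars, n) {
  K <- length(means)
  (means[K] - means[1]) / sqrt(vars[K] / n[K] + vars[1] / n[1])
}
# goodness of fit for an umbrella profile with peak at dose level s:
# the maximum of the two subgraph statistics
gof.umbrella <- function(means, vars, n, s) {
  K <- length(means)
  left  <- (means[s] - means[1]) / sqrt(vars[s] / n[s] + vars[1] / n[1])
  right <- (means[s] - means[K]) / sqrt(vars[s] / n[s] + vars[K] / n[K])
  max(left, right)
}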
In order to carry out the analysis for simple and partial order alternatives, we need
to specify all the noncyclical profiles as candidate profiles. As we mentioned above,
for $K + 1 = 4$, there are six possible profiles.
Profile Selections:
1 Decreasing profile
2 Umbrella profile, downturn at 2
3 Umbrella profile, downturn at 3
4 Increasing profile
5 Inverted umbrella profile, upturn at 2
6 Inverted umbrella profile, upturn at 3
Fig. 11.3 Number of genes per cluster. Seven hundred and eighteen genes are clustered into C1 ,
314 genes are clustered into C2 , 272 genes are clustered into C3 , 591 genes are clustered into C4 ,
336 genes are clustered into C5 , and 243 genes are clustered into C6
Let us focus again on genes 1, 3, and 10 presented in Fig. 11.2. Recall that these
genes were found to be significant for the analysis discussed in Sect. 11.2.1 when
only simple order alternatives were considered. As can be seen in the panel below,
in the current analysis, the three genes are found to be significant as well. Gene 1 is clustered into $C_1$ (an increasing profile), gene 3 is clustered into $C_6$ (an inverted umbrella profile with upturn at the third dose level), and gene 10 is clustered into $C_5$ (an inverted umbrella profile with upturn at the second dose level).
Results:
Gene ID Profile # P Value Q Value Fit.1 Fit.2 Fit.3 Fit.4
For an analysis with $s_0 = 0$ and FDR = 5%, we obtained 2,573 significant genes, of which 718, 413, 272, 591, 336, and 243 genes are clustered into profiles $C_1$–$C_6$, respectively (see Fig. 11.3). Figure 11.4a shows the genes that are clustered based
Fig. 11.4 Graphical output of the ORIOGEN 3.0 package. (a) Mean profiles for the genes that
were found to be significant clustered into the six profiles of interest. (b) Six clusters with overall
mean gene expression subtracted
on the six order-restricted profiles defined above. Figure 11.4b shows the genes in the clusters with the overall mean gene expression subtracted. After subtracting the means, the dose-response trends of the six clusters become easier to identify.
As discussed in the previous section, the ORIOGEN algorithm has two components: inference
and clustering. Genes were clustered into a candidate profile only if they were
found to have a significant dose-response relationship. This is similar to the two-
stage approaches that were discussed in Chaps. 9 and 10. In both chapters, the
initial filtering step was an inference step, and a clustering method was applied
in the second step. Recall that the clustering procedure for monotone gene profiles,
discussed in Chap. 10, was applied to simple order profiles:
$$
\begin{aligned}
C_1 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \le \mu_1 \le \mu_2 \le \mu_3\}, && \text{increasing profile},\\
C_4 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 \ge \mu_1 \ge \mu_2 \ge \mu_3\}, && \text{decreasing profile}.
\end{aligned}
$$
In Chap. 10, we decomposed the simple order profile into all possible subprofiles, given by (for increasing profiles and $K + 1 = 4$)
$$
\begin{aligned}
g_1 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 < \mu_1 < \mu_2 < \mu_3\},\\
g_2 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 < \mu_1 = \mu_2 = \mu_3\},\\
g_3 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 < \mu_1 < \mu_2 = \mu_3\},\\
g_4 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 < \mu_1 = \mu_2 < \mu_3\},\\
g_5 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 = \mu_1 = \mu_2 < \mu_3\},\\
g_6 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 = \mu_1 < \mu_2 < \mu_3\},\\
g_7 &= \{\mu \in \mathbb{R}^{K+1} : \mu_0 = \mu_1 < \mu_2 = \mu_3\}.
\end{aligned}
$$
The ORIC (Anraku 1999) was used for clustering after an initial inference-based
filtering step.
Similar to the previous section, the focus of the ORICC algorithm is not only
on monotone profiles. Note that the main difference between the initial filtering
step implemented in the ORIOGEN algorithm and the ORICC algorithm is that
the former uses an inference-based initial filtering step while the latter is a
model selection-based algorithm. Note that the order-restricted information criterion
implemented in the ORICC algorithm is different from Anraku’s ORIC, and it was
discussed by Liu et al. (2009b). For the analysis presented in this section, we use
the ORIClust R package (Liu et al. 2009a).
Let us consider a dose-response experiment with $K + 1$ dose levels for which the order-restricted profiles of interest are given in (11.2). The one-stage ORICC
algorithm proposed by Liu et al. (2009b) is as follows:
1. Specify a set of candidate order-restricted profiles $C_1, \ldots, C_R$.
2. For each gene, calculate $ORIC(r)$ and assign the gene to the candidate profile for which $ORIC(r)$ is minimum.
The two-stage ORICC algorithm is similar to the one-stage algorithm but, in order to reduce computation time, has a filtering stage in which the ORIC is calculated for the null model $C_0 = \{\mu \in \mathbb{R}^{K+1} : \mu_0 = \mu_1 = \cdots = \mu_K\}$ and for the unrestricted model $C_{UR} = \{\mu \in \mathbb{R}^{K+1} : \mu_0 \ne \mu_1 \ne \cdots \ne \mu_K\}$. A gene is considered for clustering only if $ORIC(C_0) > ORIC(C_{UR})$.
> library(ORIClust)
> #order restricted models
> fit.1<-increasing(Y573,my573,n.rep) #increasing
> fit.3<-down.up(Y573,my573,n.rep,2) #umbrella with up turn at 2
> fit.4<-down.up(Y573,my573,n.rep,3) #umbrella with up turn at 3
Parameter estimates under each profile for gene 573 are shown in the panel below
and in Fig. 11.5a. Note that for the down-up profile with minimum at the second
dose level, the profile should be monotone from the second dose level and onward,
and therefore, the observed means in the second and the third dose levels were
pooled together. Figure 11.5b shows the mean profiles estimated for gene 64 under decreasing and up-down profile assumptions.
Fig. 11.5 Order-restricted profiles for genes 573 (a) and 64 (b). Panel (a) legend: simple order, down-up (min. at 2), down-up (min. at 3). Panel (b) legend: simple order, up-down (max. at 2), up-down (max. at 3)
An analysis using the one-stage ORICC algorithm can be carried out using the
function ORICC1().
> data1 <- data.frame(paste("g",1:16998,sep=""),data)
> fit.clust<-ORICC1(data1,data.col=2:13,id.col=1,n.rep=rep(3,4),
+ n.top=250,transform=0,name.profile="all",plot.format="eps")
The R object data1 is the data frame containing the gene expression with
the first column as the gene names. ORIClust ranks the genes based on the variability across the dose levels (Peddada et al. 2003; Liu et al. 2009b), measured by
$$
v_g = \frac{1}{K+1}\sum_{i=0}^{K}\left(\hat{\mu}^{*}_i - \bar{\hat{\mu}}^{*}\right)^2, \qquad
\bar{\hat{\mu}}^{*} = \frac{\sum_{i=0}^{K}\hat{\mu}^{*}_i}{K+1}.
$$
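A minimal sketch of this ranking step in R is given below (an illustration, not the ORIClust code; fitted.means is assumed to be a genes-by-doses matrix of estimated means under the selected profile).

# variability of the estimated dose-specific means for every gene
v.g <- apply(fitted.means, 1, function(mu.star) mean((mu.star - mean(mu.star))^2))
# indices of the top 250 genes, as selected by n.top = 250
top250 <- order(v.g, decreasing = TRUE)[1:250]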
The option n.top=G implies that only the top G genes will be reported. For the analysis above, of the top 250 genes, 59, 16, 9, 81, 60, and 25 genes were clustered into $C_1, \ldots, C_6$, respectively. Mean profiles of these genes are shown in Fig. 11.6.
> table(fit.clust$cluster)
1 2 3 4 5 6
59 16 9 81 60 25
Fig. 11.6 The graphical output of the ORIClust for the top 250 genes
> name.profileK
[1] "decreasing" "increasing" "up down max at 2"
[4] "up down max at 3" "down up min at 2" "down up min at 2"
> fit.clust<-ORICC2(data1,data.col=2:13,id.col=1,n.rep=rep(3,4),
+ n.top=250,transform=0,name.profile=name.profileK,plot.format="eps")
Note that only 7,254 genes out of the 16,988 were considered for clustering.
11.4 Discussion
References
Anraku, K. (1999). An information criterion for parameters under a simple order restriction.
Biometrika, 86(1), 141–152(12).
Bretz, F., & Hothorn, L. A. (2003). Statistical analysis of monotone and non-monotone dose-
response data from in vitro toxicological assays. ALTA 31, (Suppl. 1), 81–96, 2003, http://
ecvam.jrc.it/publication/Bretz-Hothorn.pdf.
Gou, W., & Peddada, S. (2008). Adaptive choice of the number of bootstrap samples in large scale multiple testing. Statistical Applications in Genetics and Molecular Biology, 7(1).
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009a). Order-restricted information criterion-based
clustering algorithm. Reference manual. http://cran.r-project.org/web/packages/ORIClust/.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009b). Information criterion-based clustering with order-
restricted candidate profiles in short time-course microarray experiments. BMC Bioinformatics,
10, 146.
Peddada, S., Lobenhofer, E. K., Li, L., Afshari, C. A., Weinberg, C. R., & Umbach, D. M. (2003).
Gene selection and clustering for time-course and dose-response microarray experiments using
order-restricted inference. Bioinformatics, 19(7), 834–841.
Peddada, S., Harris, S., & Harvey E. (2005). ORIOGEN: order restricted inference for ordered
gene expression data. Bioinformatics, 21(20), 3933–3934.
Peddada, D. S., Umbach, M.D., & Harris, F.S. (2009). A response to information criterion-based
clustering with order-restricted candidate profiles in short time-course microarray experiments.
BMC Bioinformatics, 10, 438. http://www.biomedcentral.com/1471-2105/10/438
Peddada, S., Harris, S., & Davidov, O. (2010). Analysis of correlated gene expression data on
ordered categories. Journal of Indian Society of Agricultural Statistics, 64(1), 45–60.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference.
New York: Wiley.
Simmons, S. J., & Peddada, S. (2007). Order-restricted inference for ordered gene expresion
(ORIOGEN) data under heteroscedastic variances. Bioinformation, 1(10), 414–419.
Chapter 12
Gene Set Analysis as a Means of Facilitating
the Interpretation of Microarray Results
12.1 Introduction
We have, thus far, been discussing gene-specific methods for identifying statistically
significant associations between gene expression profiles and a response variable.
These methods involved fitting, for each gene, an isotonic regression model to
relate the gene expression levels to the levels of the explanatory variable (dose). An
appropriate test statistic was then calculated for each gene and assigned a p value.
The intention was to produce a ranked list of genes with small p values and to
examine the genes in that list for biological significance.
While it is indeed important to identify individual genes associated with re-
sponse, interpretation of the resultant findings is facilitated by taking into account
the fact that biological phenomena occur through the actions and interactions
of multiple genes, via signaling pathways or other functional relationships. It is
therefore possible to categorize genes into gene sets according to these relationships.
This is enabled by the availability of databases that provide biological annotation for
known genes. For example, the Gene Ontology Consortium (2000) has developed
a comprehensive taxonomy of gene annotations for three ontologies: biological
process, cellular component, and molecular function. Each ontology is structured
as a directed acyclic graph, with a hierarchy of terms that vary from broad levels of
classification down to more narrowly defined levels.
Existing methods for evaluating the significance of gene sets can be classified into
two broad classes:
(1) Overrepresentation analysis: this approach thresholds all the p values and labels
each gene as being “significant” (e.g., p < 0.05) or “not significant” (e.g., p >
0.05). The proportion of significant genes in the gene set is then compared to
the corresponding proportion for the population of genes being studied using
Fisher’s exact test (Fisher 1934) or a variant thereof to determine whether there
is over-representation of significant genes in the gene set. Analogous methods
have been described by a number of authors (see Raghavan et al. 2006 for a list
of references) and form the basis for a number of software packages for doing gene set analysis and its sibling, pathway analysis.
(2) Functional class scoring: this approach computes a statistic that summarizes the p values of the genes in a gene set. In particular, we use the statistic
$$
\mathrm{MLP} = \frac{1}{n}\sum_{i=1}^{n}\bigl(-\log(p_i)\bigr), \qquad (12.1)
$$
where $n$ is the number of genes in the gene set and $p_i$ refers to the p value of the $i$th gene in the gene set (Pavlidis et al. 2004; Raghavan et al. 2006). Here, the acronym "MLP" stands for "mean log p."
Other summary statistics which have been proposed for functional class scoring
include the Kolmogorov–Smirnov statistic which compares the distribution of
the p values in a gene set to the distribution of all the p values in the study.
A widely used version of this approach is considered by Mootha et al. (2003),
who use the term gene set enrichment analysis (GSEA).
Of these methods, we will only consider the method based on the MLP statistic
as per the arguments presented in Raghavan et al. (2006). If a certain gene set
corresponds to a biological process that is implicated in the response, it is likely
that many of the genes comprising that gene set will have relatively small p values,
so that the value of the MLP statistic for that gene set will be relatively large. Thus,
gene sets with relatively large MLP statistics are the gene sets of interest.
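A minimal R sketch of the statistic is given below (an illustration only, not the MLP package implementation; pvalues is assumed to be a named vector of gene-level p values and gene.set a vector of gene identifiers).

mlp.stat <- function(pvalues, gene.set) {
  p <- pvalues[names(pvalues) %in% gene.set]  # p values of the genes in the set
  mean(-log(p))                               # mean of -log(p), i.e., the MLP statistic
}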
Once the MLP statistic is calculated for each gene set in the dataset, a permutation
procedure can be applied to determine whether or not a particular gene set is
significant. Each permutation randomly permutes the p values across the genes and
maps them to the respective gene sets. This permutation scheme ensures that the
p values within a gene set are random but with the correlation structure among
the gene sets being preserved. The latter is important because if multiple gene sets
share a gene, that gene will have the same p value in all these gene sets in any given
permutation. Preserving the correlation among the gene sets in this way is important
as it maintains the interpretability of the results (e.g., across the GO terms along a
branch of an ontology).
The significance of a gene set is determined by comparing the observed value
of the MLP test statistic for that gene set (labeled as MLP) to the values of the
MLP test statistic for that same gene set across multiple random permutations of
the p values as described above (labeled as MLP*). If MLP is larger than MLP* in most permutations, the gene set would be declared significant. This means that MLP must exceed a threshold quantile of the MLP* values to be declared significant.
This threshold quantile is the empirically determined critical value for that gene
set.
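A minimal sketch of this permutation step is given below (an illustration building on the mlp.stat function sketched above, not the MLP package implementation).

mlp.significance <- function(pvalues, gene.set, n.perm = 1000, level = 0.05) {
  observed <- mlp.stat(pvalues, gene.set)
  null.mlp <- replicate(n.perm, {
    # permute the p values across all genes, keeping the gene labels fixed
    perm <- setNames(sample(pvalues), names(pvalues))
    mlp.stat(perm, gene.set)
  })
  critical <- quantile(null.mlp, probs = 1 - level, names = FALSE)
  list(observed = observed, critical = critical, significant = observed > critical)
}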
For example, suppose that a gene set G5 comprises five genes whose p values
are 0.0004, 0.1700, 0.0002, 0.0028, and 0.0011. Four of the genes in G5 are highly
significant, and the value of the MLP statistic for G5 is 6.161. Now suppose that we
randomly permute all the p values in the entire collection of genes in the experiment
and the five genes in G5 are assigned p values of 0.7944, 0.2310, 0.0001, 0.4141,
and 0.7395. Now only one gene is significant and the value of the MLP statistic for
G5 is 2.418, which is much smaller than the observed value of 6.161. Suppose that
over 1,000 such random permutations, the MLP* values are 2.418, 3.178, ..., and that the 95th percentile of these random MLP* values is 2.945. This value is the 5% critical value. Since the observed value of 6.161 is larger than this, we would claim that G5 is significant at the 5% level.

Fig. 12.1 Example of a quantile curves plot for the MLP results showing geneSetStatistic versus testedGeneSetSize. Every dot represents a gene set. Every line represents a smoothing of the null quantile per gene set size. The indicated levels of significance are $10^{-5}$, $10^{-4}$, $10^{-3}$, $10^{-2}$, $5 \times 10^{-2}$, $10^{-1}$, and 0.5 (top to bottom)
The determination of the critical value can be made more efficient by recognizing
that the distribution of the MLP test statistic varies systematically as a function of the
gene set size as illustrated in Fig. 12.1. In order to determine the appropriate critical
value for a gene set of size n, we empirically calibrate the critical values using
the following procedure. Randomly generate “null” gene sets of size n, using the
permutation scheme described above. For each $n$, calculate the test statistic $\mathrm{MLP}^{*}_n$. Run a quantile smoother, $q(\alpha, n)$, through the $\mathrm{MLP}^{*}_n$ versus $n$ relationship so that a proportion $\alpha$ of the observations lie above the smoother (see Amaratunga and Cabrera 2004). The $q(\alpha, n)$ are the $\alpha$-level critical values. The advantage of this
procedure is that we borrow strength across all gene sets of the same size, as well
as across gene sets of similar sizes. As a result, this procedure will yield a uniform
critical value for all gene sets of a given size, and gene sets of similar sizes will have
critical values close to each other. More details are provided by Raghavan et al.
(2006).
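A simplified sketch of this calibration is given below (an illustration only: instead of a quantile smoother, it takes, for each gene set size, the empirical quantile of the null statistics from gene sets of similar size; null.stat and set.size are assumed to be vectors of null MLP statistics and the sizes of the corresponding "null" gene sets).

critical.value <- function(n, null.stat, set.size, alpha = 0.05, window = 5) {
  similar <- abs(set.size - n) <= window    # borrow strength across similar set sizes
  quantile(null.stat[similar], probs = 1 - alpha, names = FALSE)
}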
We now illustrate the MLP method on the case study, i.e., the Affymetrix gene expression dataset of subjects treated with a series of doses of an antipsychotic compound (the pharmacological activity of antipsychotics case study). We illustrate this for one of the statistics considered earlier, the $\bar{E}^2_{01}$ statistic. Please note that the MLP package, available from Bioconductor, makes use of various annotation packages also available at the Bioconductor website http://www.bioconductor.org.
The package GO.db contains a set of annotation maps of the Gene Ontology,
in particular information describing each of the gene sets GOTERM in GO for each
of the three ontologies: BP (biological processes), MF (molecular function) and CC
(cellular component). It also provides maps for linking each GOTERM to its ancestors
and children. The package org.Rn.eg.db provides genome-wide annotation for
species rat. Among other things, it provides mappings between each GOTERM and
the EntrezGene identifiers in that GOTERM. This is a species-specific library and
for each experiment, the library corresponding to the species of the subjects used in
the experiment, in this case rat, must be used. Corresponding species packages for
mouse and human are org.Mm.eg.db and org.Hs.eg.db, respectively.
The first step is to load the MLP package in the R-workspace. This automatically
loads various additional required packages, such as GO.db and the appropriate
species package (e.g. org.Rn.eg.db).
> library(MLP)
The next step is to set up the MLP package, as shown below. The inputs that the
user needs to specify include the set of p values corresponding to the gene-by-gene
analysis, and the annotation information mapping genes to gene sets. The first input
to the MLP package is geneStatistic, a named numeric vector of p values
where the names correspond to the EntrezGene identifiers. We use the p values
from the likelihood ratio test in Chapter 7, where 3,613 genes were found to have
a significant monotone dose-response relationship. The p values for first ten genes
are shown below.
> #e2 is the vector of the two-sided p-values for the LRT from Chapter 7
> pvalues <- e2
> pvalues[1:10]
112400 113882 113892 113893 113894 113898
0.50169961 0.37989980 0.08577877 0.10567617 0.17950223 0.04372303
113900 113901 113902 113906
0.43165756 0.42154820 0.37882345 0.18401486
> geneSets[3:5]
$‘GO:0000012‘
[1] "24839" "64573" "84495" "259271" "290907" "314380"
$‘GO:0000018‘
[1] "24699" "25660" "25712" "59086" "64012" "81685" "81709" "81816"
[9] "116562""171369" "287287" "287437" "288905" "290803" "291609" "303496"
[17] "303836""308755" "312398" "317382" "362288" "362412" "362896" "499505"
[25] "690237"
$‘GO:0000019‘
[1] "64012" "81685" "308755"
The core of the MLP package is the MLP function, which is used as shown below. The two main inputs are geneStatistic and geneSet, obtained above. In addition, the user can specify various other parameters, for which default settings are preset in the package. These include specification of the minimum (minGenes) and maximum (maxGenes) gene set size to be considered within the analysis, default
5 and 100, respectively. Another parameter is nPermutations, which specifies
the number of permutations to be run. An additional option, smoothPValues, is
whether or not to calculate probability values by smoothing. Smoothing calculates
smoothed monotonically decreasing probability values by leveraging information
across gene sets of similar sizes and is the default setting for the package. The user
can also specify the quantiles at which probability values are desired.
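The call that produced the object mlpOut used below is not reproduced here; a hypothetical call, consistent with the inputs and parameters described above (the exact argument names and defaults may differ between versions of the MLP package), could look as follows.

# hypothetical call; argument names follow the description in this section
mlpOut <- MLP(geneSet = geneSets,
              geneStatistic = pvalues,
              minGenes = 5,
              maxGenes = 100,
              nPermutations = 100,
              smoothPValues = TRUE)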
The results of the analysis for the case study are shown below. The output is a data frame of class "MLP" listing the GO terms, the number of genes corresponding to the GO term, in total (totalGeneSetSize) as well as for which a p value has been provided (testedGeneSetSize), the value of the MLP statistic (geneSetStatistic), the p value of the GO term (geneSetPValue), and the description of the GO term (geneSetDescription). The results are ordered based on the significance of the geneSetStatistic. The top six GO terms are shown below, for illustration.
> class(mlpOut)
[1] "MLP" "data.frame"
> head(mlpOut)
totalGeneSetSize testedGeneSetSize geneSetStatistic geneSetPValue
GO:0014037 11 10 3.116157 0.0001625596
GO:0051789 91 78 1.554874 0.0003010118
GO:0006986 35 30 2.013559 0.0003787278
GO:0006470 111 79 1.529384 0.0003975008
GO:0051591 63 54 1.715411 0.0004080530
GO:0009612 77 64 1.564498 0.0007417438
geneSetDescription
GO:0014037 Schwann cell differentiation
GO:0051789 response to protein stimulus
GO:0006986 response to unfolded protein
GO:0006470 protein amino acid dephosphorylation
GO:0051591 response to cAMP
GO:0009612 response to mechanical stimulus
The table below gives an overview of the significance of the tested GO terms (3,098 passed the minGenes and maxGenes size criteria) in this analysis. Since the algorithm is permutation based, these numbers can differ slightly between runs.
Significance Level Number of Significant Gene sets
p-Value in (0.00001,0.0001] 1
p-Value in (0.0001,0.001] 7
p-Value in (0.001,0.01] 37
p-Value in (0.01,0.1] 193
p-Value in (0.1,1] 2860
The MLP package also provides several plot functions. One of them is the quantile curves plot, with testedGeneSetSize on the x-axis and the geneSetStatistic on the y-axis, as shown in Fig. 12.1.
> pdf("mlpQuantileCurves.pdf", width = 10, height = 10)
> plot(mlpOut, type = "quantileCurves")
> dev.off()
Figure 12.2 shows the significance of the 20 most significant gene sets.
> pdf("mlpBarplot.pdf", width = 10, height = 10)
> op <- par(mar = c(30, 10, 6, 2))
> plot(mlpOut, type = "barplot")
> par(op)
> dev.off()
In addition, one can also plot the top set of significant GO terms according
to their structure in the ontology, as shown in Fig. 12.3, with the biggest and
least specific GO terms shown at the bottom. The most specific “leaves” of the
tree are shown at the top. The ovals are colored by the level of significance, with
darker shades indicating more significant GO terms. In that sense, the genes in
GO:0014037 are a subset of the genes in GO:0007422.
> pdf("mlpGOgraph.pdf", width = 8, height = 6)
> op <- par(mar = c(0, 0, 0, 0))
> plot(mlpOut, type = "GOgraph")
> par(op)
> dev.off()
Fig. 12.2 Example of a barplot for the MLP results. The length of a bar represents the significance ($-\log_{10}$(geneSetPValue)) of the gene set indicated horizontally. The number between brackets represents the number of genes within that gene set (the number of genes for which a gene statistic has been submitted as well as the total number of genes)
The genes contributing most to the significance of a certain gene set can easily
be visualized as shown in Fig. 12.4.
> geneSetID <- rownames(mlpOut)[1]
> pdf("geneSignificance.pdf", width = 10, height = 10)
> par(mar = c(25, 10, 6, 2))
> plotGeneSetSignificance(
geneSet = geneSet,
geneSetIdentifier = geneSetID,
geneStatistic = pvalues,
annotationPackage = "rat2302rnentrezg",
)
> dev.off()
Fig. 12.3 Example of a GOgraph for the MLP results. Every ellipse represents a gene set. The color indicates the significance: the darker, the more significant. The connectors indicate the parent–child relationship. The numbers at the bottom of the ellipses represent the number of genes within that gene set (the number of genes for which a gene statistic has been submitted as well as the total number of genes)
Fig. 12.4 Example of a gene significance plot for a gene set of interest. The length of a bar represents the significance ($-\log_{10}$(geneStatistic)) of the gene indicated horizontally
12.4 Discussion
False-positive findings are, by the very nature of the problem, prodigious when we interrogate tens of thousands of genes simultaneously and individually. While we can attempt to quantify the extent of such findings using various approaches, it can be difficult,
especially in case of a subtle signal, to pinpoint the specific genes that correspond
to real findings in an individual gene-by-gene analysis. Approaches based on gene
set analysis are an attempt to elicit biologically more meaningful results from high-
content genomic experiments. Typically, the output from such an analysis, like the
MLP-based analysis above, will need to be interpreted in conjunction with the
scientist performing the experiments. This would include follow-up analyses on
the specific genes involved in the significant gene sets. However, the user must
be cautioned that gene set analysis results are of course dependent on the inputs
to the analysis, in this case, the p values from the preceding individual gene-by-
gene analysis. Practical experience suggests that a limited number of significant
cascading gene sets tend to be more credible than significant gene sets scattered
across the spectrum of the hierarchical ontology.
References
Amaratunga, D., & Cabrera, J. (2004). Exploration and analysis of DNA microarray and protein
array data. Hoboken: Wiley.
Gene Ontology Consortium. (2000). Gene ontology: Tool for the unification of biology. Nature
Genetics, 25, 25–29.
Fisher, R. A. (1934). Statistical methods for research workers. Oxford University Press.
Mootha, V. K., Lindgren, C. M., Eriksson, K. F., Subramanian, A., Sihag, S., Lehar, J., et al. (2003). PGC-1-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genetics, 34(3), 267–273.
Pavlidis, P., Qin, J., Arango, V., Mann, J. J., & Sibille, E. (2004). Using the gene ontology for microarray data mining: A comparison of methods and application to age effects in human prefrontal cortex. Neurochemical Research, 29, 1213–1222.
Raghavan, N., Amaratunga, D., Cabrera, J., Nie, A., Qin, J., & McMillian, M. (2006). On methods for gene function scoring as a means of facilitating the interpretation of microarray results. Journal of Computational Biology, 13(3), 798–809.
Chapter 13
Estimation and Inference Under Simple Order
Restrictions: Hierarchical Bayesian Approach
13.1 Introduction
In Chap. 10, we classified genes into different subsets according to the order-
restricted model with the best goodness of fit for each gene. The classification was
based on information criteria such as the AIC, BIC, and ORIC. In this chapter, we
focus on hierarchical Bayesian modeling of dose-response microarray data. The ma-
terials presented in the first part of the chapter are closely related to the classification
procedure discussed in Chap. 10 in the sense that we fit several order-restricted mo-
dels for each gene. However, in contrast with Chap. 10, we do not aim to select the
model with the best goodness of fit but to test the null hypothesis of no dose effect.
In Sect. 13.2, we formulate an order-restricted hierarchical Bayesian model for dose-
response data and present gene-specific examples to illustrate the estimation proce-
dures. Within the hierarchical Bayesian framework, one of the major challenges is
related to the question of how to perform Bayesian inference and in particular how
to adjust for multiplicity. In Sect. 13.3, we discuss the Bayesian variable selection
(BVS) method (O’Hara and Sillanpää 2009) which we use in order to calculate the
posterior probability of a specific model given the data and the model parameters.
In Sect. 13.4, following Newton et al. (2004, 2007), we use the posterior probability
of the null model to control for multiplicity using the direct posterior probability
method for multiplicity adjustment. Throughout this chapter, we use the index i for
dose, j for replicates within a dose, and r for a candidate model for a specific gene.
For a parameter vector $\theta$ with prior distribution $P(\theta)$ and likelihood $P(\mathbf{y} \mid \theta)$, the posterior distribution is given by
$$
P(\theta \mid \mathbf{y}) = \frac{P(\mathbf{y} \mid \theta)P(\theta)}{\int P(\mathbf{y} \mid \theta)P(\theta)\,d\theta} \propto P(\mathbf{y} \mid \theta)P(\theta).
$$
The integral $\int P(\mathbf{y} \mid \theta)P(\theta)\,d\theta$ is a normalizing constant, and therefore, the posterior distribution is proportional to the product of the likelihood function and the prior distribution. Often, the distribution of $\theta$ depends on unknown hyperparameters $\varphi$ for which the hyperprior distribution is denoted by $P(\varphi)$. For this case, the hierarchical Bayesian model has three levels: the likelihood $P(\mathbf{y} \mid \theta, \varphi)$, the prior $P(\theta \mid \varphi)$, and the hyperprior $P(\varphi)$; subsequently, the posterior distribution is given by $P(\theta \mid \mathbf{y}) \propto P(\mathbf{y} \mid \theta, \varphi)P(\theta \mid \varphi)P(\varphi)$. The posterior distribution can be derived analytically or by drawing a sample
from it using Markov Chain Monte Carlo (MCMC) simulation. For detailed dis-
cussion about Bayesian inference in general and hierarchical Bayesian modeling in
particular, we refer to Gilks et al. (1996) and Gelman et al. (2004). For an elaborate
discussion about order-restricted Bayesian models, we refer to Hoijtink et al. (2008).
where $Y_{ij}$ is the gene expression at the $i$th dose level for array $j$ and $\mu_i$ is the mean gene expression level at dose level $i$. Note that since we assume that the dose-response relationship is monotone, we wish to estimate a model in which $\mu_0 \le \mu_1 \le \mu_2 \le \cdots \le \mu_K$ for a monotone upward profile and $\mu_0 \ge \mu_1 \ge \mu_2 \ge \cdots \ge \mu_K$ for a monotone downward profile, where $\mu_0$ is the mean gene expression for the control dose. The monotone constraints are achieved by constraining the parameter space of $\boldsymbol{\mu} = (\mu_0, \ldots, \mu_K)$, whereby the order restrictions are imposed on the prior distributions. For a monotone upward profile, we assume that $\mu$ is a right-continuous nondecreasing function defined on $[0, K]$. We do not assume any deterministic relationship between $\mu_i$ and the dose levels, but instead, we specify a probabilistic model for $\mu_i$ at each distinct dose level.
The problem is to estimate $\boldsymbol{\mu}$ under the order restrictions $\mu_0 \le \mu_1 \le \cdots \le \mu_K$. Thus, the $K + 1$ dimensional parameter vector $\boldsymbol{\mu}$ is constrained to lie in a subset $S^{K+1}$ of $\mathbb{R}^{K+1}$. The constrained set $S^{K+1}$ is determined by the order among the components of $\boldsymbol{\mu}$. In this case, it is natural to incorporate the constraints into the specification of the prior distribution (Klugkist and Mulder 2008). Let $\mathbf{y} = (Y_{11}, Y_{12}, \ldots, Y_{Kn_K})$ be the expression levels for a specific gene and $\boldsymbol{\varphi}$ the hyperparameters for $\boldsymbol{\mu}$, which will be discussed later. Gelfand, Smith, and
Lee (1992) showed that the posterior distribution of $\boldsymbol{\mu}$, given the constraints, is the unconstrained posterior distribution normalized such that
$$
P(\boldsymbol{\mu} \mid \mathbf{y}) \propto \frac{P(\mathbf{y} \mid \boldsymbol{\mu})P(\boldsymbol{\mu} \mid \boldsymbol{\varphi})}{\int_{S^{K+1}} P(\mathbf{y} \mid \boldsymbol{\mu})P(\boldsymbol{\mu} \mid \boldsymbol{\varphi})\,d\boldsymbol{\mu}}, \qquad \boldsymbol{\mu} \in S^{K+1}. \qquad (13.3)
$$
Let $S_l^{K+1}(\mu_l,\, l \ne i)$ be a cross section of $S^{K+1}$ defined by the constraints for the component $\mu_i$ at a specified set of $\mu_l$, $l \ne i$, $l = 0, 1, 2, \ldots, K$. In our setting, $S_l^{K+1}(\mu_l,\, l \ne i)$ is the interval $[\mu_{i-1}, \mu_{i+1}]$. It follows from (13.3) that the posterior distribution for $\mu_i$ is the unconstrained conditional posterior distribution restricted to this interval.
In matrix notation, the mean gene expression can be expressed as $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}$ is a direction-dependent known design matrix, i.e., for an upward and a downward trend the design matrices are given, respectively, by
$$
\mathbf{X}_{up} = \begin{pmatrix}
1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\
1 & 1 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 1 & 0 & 0\\
1 & 1 & 1 & 0\\ 1 & 1 & 1 & 0\\ 1 & 1 & 1 & 0\\
1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1
\end{pmatrix}
\qquad \text{and} \qquad
\mathbf{X}_{dn} = \begin{pmatrix}
1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\
1 & 0 & 1 & 1\\ 1 & 0 & 1 & 1\\ 1 & 0 & 1 & 1\\
1 & 0 & 0 & 1\\ 1 & 0 & 0 & 1\\ 1 & 0 & 0 & 1\\
1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0
\end{pmatrix}.
$$
Here, the normal distribution for the priors of $\delta_\ell$ is left truncated at 0 to ensure that $\delta_\ell \ge 0$. To complete the formulation of the model, we assume vague hyperprior distributions at the third level of the model, i.e., $\mu_{\delta_\ell} \sim N(0, 10{,}000)$ and $\sigma^{-2}_{\delta_\ell} \sim \mathrm{gamma}(0.001, 0.001)$.
As pointed out by Dunson and Neelon (2003), since the priors of the components of $\boldsymbol{\delta}$ are truncated normal distributions, the mean structure $\mu_i = \mu_0 + \sum_{\ell=1}^{i}\delta_\ell$ implies an order-constrained mean structure with strict inequalities $\mu_0 < \mu_1 < \cdots < \mu_K$. Equality constraints can be incorporated in the model by setting some of the components of $\boldsymbol{\delta}$ equal to zero. Indeed, $\delta_i = 0$ implies $\mu_i = \mu_{i-1}$.
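The implication of this parameterization can be illustrated with a small numerical sketch (illustration only; the values of mu0 and delta below are made up): a monotone mean profile is obtained by adding nonnegative increments to the baseline, and a zero increment yields equal means at adjacent dose levels.

mu0   <- 9.6
delta <- c(0, 0.03, 0.2)             # delta_1 = 0 implies mu_1 = mu_0
mu    <- mu0 + cumsum(c(0, delta))   # (mu_0, mu_1, mu_2, mu_3)
mu                                   # 9.60 9.60 9.63 9.83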
For the analyses presented in this chapter, we use the human epidermal squamous carcinoma cell line A431 experiment as a case study. The expression matrix consists of 12 arrays and 16,988 genes. We focus on a single gene (gene 1095) as an example to illustrate the difference between the posterior estimates of the constrained and unconstrained models. The likelihood and the parameterization of the mean structure for the two models are identical and given in (13.5) and (13.6). For the unconstrained model, vague priors are considered for the parameters $\delta_i$. Note that for this model, $\delta_i$ is not restricted to be nonnegative. In WinBUGS 1.4, the likelihood and the linear predictor for the mean can be implemented by
model
{
for(i in 1:N)
{
Y[i] ˜dnorm(mu[i],tau)
mu[i] <- mu0*X1[i]+ delta1*X2[i]+ delta2*X3[i]+ delta3*X4[i]
}
Next, we use the WinBUGS truncation function I() for the constrained parameters. For the unconstrained model, the same priors are used without the truncation function I().
mu0 ˜dnorm(mu.mean,mu.tau)I(0.00,)
delta1 ˜dnorm(delta1.mean,delta1.tau)I(0.00,)
delta2 ˜dnorm(delta2.mean,delta2.tau)I(0.00,)
delta3 ˜dnorm(delta3.mean,delta3.tau)I(0.00,)
mu.mean ˜dnorm(0,0.000001)
mu.tau˜ dgamma(0.001,0.001)
delta1.mean ˜dnorm(0,0.000001)
delta1.tau˜ dgamma(0.001,0.001)
delta2.mean ˜dnorm(0,0.000001)
delta2.tau˜ dgamma(0.001,0.001)
delta3.mean ˜dnorm(0,0.000001)
delta3.tau˜ dgamma(0.001,0.001)
tau˜dgamma(0.001,0.001)
sigma <-1/ sqrt(tau)
We define two lists, one for the gene expression data and the other for the initial
values for the model.
#INITIAL VALUES
list(mu0=0.5,delta1=0.05,delta2=0.05,delta3=0.05,tau=1,mu.mean=0.1,mu.tau=1,
delta1.tau=1,delta2.tau=1,delta3.tau=1,delta1.mean=0.05,
delta2.mean=0.05,delta3.mean=0.05)
Note that the design matrix $\mathbf{X}_{up}$ given in Sect. 13.2.2 is defined by the variables X1, X2, X3, and X4.
> X1=c(1,1,1,1,1,1,1,1,1,1,1,1)
> X2=c(0,0,0,1,1,1,1,1,1,1,1,1)
> X3=c(0,0,0,0,0,0,1,1,1,1,1,1)
> X4=c(0,0,0,0,0,0,0,0,0,1,1,1)
> data.frame(X1,X2,X3,X4)
X1 X2 X3 X4
1 1 0 0 0
2 1 0 0 0
3 1 0 0 0
4 1 1 0 0
5 1 1 0 0
6 1 1 0 0
7 1 1 1 0
8 1 1 1 0
9 1 1 1 0
10 1 1 1 1
11 1 1 1 1
12 1 1 1 1
The hierarchical model was fitted in WinBUGS 1.4. We used a chain of 30,000 iterations, from which the first 10,000 were treated as burn-in period and discarded from the analysis. Figure 13.1a shows the observed data, isotonic means, and the parameter estimates for the posterior means obtained from the constrained and unconstrained models. The posterior means obtained from the unconstrained model reveal a violation of the order at the second dose level ($\hat{\mu}_0 = 9.672$ and $\hat{\mu}_1 = 9.608$, see the panel below). Indeed, we notice that the pool adjacent violators algorithm (PAVA), discussed in Chap. 2, pools together the means of the first two dose levels. Similar to isotonic regression, the posterior means of the constrained model are monotone. However, in contrast with isotonic regression, the posterior means obtained from the constrained model are equal to 9.625 and 9.655 for the first two dose levels, respectively (compared with the isotonic mean $\hat{\mu}^{*}_{12} = 9.64$).
> whichgene<-1095
> Y<-as.numeric(as.vector(dat[whichgene,]))
> #mY: mean gene expression at each dose
> mY<-tapply(Y,as.factor(dose),mean)
> iso.r1<-pava(mY,w=c(3,3,3,3))
> iso.r1
1 2 3 4
9.640501 9.640501 9.842729 10.001414
The first model consists of four constrained parameters, while the latter consists of
three constrained parameters, and the means of dose levels 0 and 1 are constrained
to be equal (similar to the pooling of the PAVA). Note that for the second model $\delta_1 = 0$, so that $\mu_0 = \mu_1$. The output of the two models is given below.
## Isotonic regression ##
> whichgene<-3110
> Y<-as.numeric(as.vector(dat[whichgene,]))
> mY<-tapply(Y,as.factor(dose),mean)
> iso.r1<-pava(mY,w=c(3,3,3,3))
> iso.r1
1 2 3 4
6.582839 6.582839 6.788639 7.193730
For the second model, we need to redefine the design matrix; this can be done by
excluding the second column in the design matrix in the data list in the following
way:
# data for model 1 (g_7: 4 levels)
list(a=1.5,N=12,Y=c(6.662642,6.548664,6.893789,6.568433,6.483458,
6.340048,6.814083,6.748551,6.803282,7.193578,
7.340334 ,7.047279),
X1=c(1,1,1,1,1,1,1,1,1,1,1,1),
X2=c(0,0,0,1,1,1,1,1,1,1,1,1),
X3=c(0,0,0,0,0,0,1,1,1,1,1,1),
X4=c(0,0,0,0,0,0,0,0,0,1,1,1))
Within the hierarchical Bayesian framework, the goodness of fit and complexity for
the gene-specific models can be assessed using the deviance information criterion
(DIC), proposed by Spiegelhalter et al. (2002). The effective number of parameters
(the complexity) in the hierarchical model can be measured by the difference
between the posterior expectation of the deviance and the deviance evaluated at the
posterior expectation of $\theta$ (Spiegelhalter et al. 2002; Hoijtink et al. 2008), that is,
$$
P_D = \overline{D(\theta)} - D(\bar{\theta}).
$$
Here, the deviance is given by $D(\theta) = -2 \log P(\mathbf{y} \mid \theta) + 2 \log f(\mathbf{y})$. The second term in the deviance is a standardizing factor which does not depend on $\theta$. The deviance information criterion for model selection is given by
$$
\mathrm{DIC} = \bar{D} + P_D = D(\bar{\theta}) + 2 P_D.
$$
A small value of DIC indicates a better goodness of fit. The use of the DIC in
the context of order-restricted Bayesian models is discussed in Myung et al. (2008) and Chen and Kim (2008). Three models were fitted to the data: $g_0$, $g_5$, and $g_7$.
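A minimal sketch of how the DIC can be computed from MCMC output is given below (an illustration only; it assumes a matrix mu.sims of sampled fitted means, one row per iteration and one column per observation, and a vector sigma.sims of sampled residual standard deviations; the standardizing term log f(y) is omitted since it cancels when models are compared).

dic <- function(y, mu.sims, sigma.sims) {
  # deviance of a single draw: -2 times the normal log-likelihood
  dev <- function(mu, sigma) -2 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
  D.sims <- sapply(seq_along(sigma.sims),
                   function(t) dev(mu.sims[t, ], sigma.sims[t]))
  D.bar <- mean(D.sims)                              # posterior mean deviance
  D.hat <- dev(colMeans(mu.sims), mean(sigma.sims))  # deviance at the posterior means
  pD    <- D.bar - D.hat                             # effective number of parameters
  c(DIC = D.bar + pD, pD = pD)
}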
Figure 13.1c shows the data and posterior means for gene 13386. The panel below
shows that $g_5$ has the smallest DIC value (13.074) and therefore should be preferred.
13.3.1 Introduction
In the previous section, we discussed the hierarchical Bayesian approach for dose-
response microarray experiments with monotone constraints. If our goal is to select
the model with the best goodness of fit for each gene, we can fit all possible order-
constrained models and select the one with the smallest DIC (Spiegelhalter et al.
2002) among the fitted models, similar to the model selection procedure discussed
in Chap. 10. However, this approach is computationally intensive and may become
impractical for a large number of dose levels, since the number of all possible
monotone models increases rapidly. In this section, in order to overcome the need to
fit all possible monotone models separately, we focus on BVS methods (George
and McCulloch 1993; O’Hara and Sillanpää 2009) which allow us to estimate
the mean gene expression under order restrictions taking into account all possible
monotone models. For an application of BVS models within the framework of
dose-response modelling we refer to Whitney and Ryan (2009). Let us consider
a dose-response experiment with a control and three dose levels. Together with the null model, $Y_{ij} = \mu_0 + \varepsilon_{ij}$, there are eight possible models that can be fitted, for which the mean structures are given by
$$
\begin{aligned}
g_0&: \mu_0 = \mu_1 = \mu_2 = \mu_3,\\
g_1&: \mu_0 = \mu_1 = \mu_2 < \mu_3,\\
g_2&: \mu_0 = \mu_1 < \mu_2 = \mu_3,\\
g_3&: \mu_0 < \mu_1 = \mu_2 = \mu_3,\\
g_4&: \mu_0 < \mu_1 = \mu_2 < \mu_3,\\
g_5&: \mu_0 = \mu_1 < \mu_2 < \mu_3,\\
g_6&: \mu_0 < \mu_1 < \mu_2 = \mu_3,\\
g_7&: \mu_0 < \mu_1 < \mu_2 < \mu_3.
\end{aligned}
$$
For example, the mean gene expression for the three models fitted in Sect. 13.2.2.3 is given by $\mathbf{X}_{g_0}\mu_0$, $\mathbf{X}_{g_7}\boldsymbol{\beta}_{g_7}$, and $\mathbf{X}_{g_5}\boldsymbol{\beta}_{g_5}$, respectively. In fact, all the design matrices above are submatrices of $\mathbf{X}_{g_7}$, and our aim is to select, for each gene, the most appropriate design matrix and parameter vector.
The BVS approach (George and McCulloch 1993) is related to the problem of the
choice of an optimal model from an a priori set of R known plausible models. As
pointed out by O’Hara and Sillanpää (2009), the choice of an optimal model reduces
to the choice of a subset of variables which will be included in the model (i.e., model
selection) or the choice of which parameters in the parameter vector are different
from zero (i.e., inference).
In our setting, the BVS model allows us to calculate the posterior probability of each model, $p(g_r \mid \text{data})$, and in particular the posterior probability of the null model, $p(g_0 \mid \text{data})$. Let $z_i$, $i = 1, \ldots, K$, be an indicator variable such that $z_i = 1$ if $\delta_i$ is included in the model and $z_i = 0$ otherwise, and let $\theta_i = \delta_i z_i$, $\boldsymbol{\beta} = (\mu_0, \theta_1, \theta_2, \theta_3)$, and $\mathbf{y}$ the gene expression vector. Hence, we can reformulate the mean structure in (13.6) (O'Hara and Sillanpää 2009) in terms of $\theta_i$ and $z_i$ as
$$
E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta} =
\begin{pmatrix}
1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\ 1 & 0 & 0 & 0\\
1 & 1 & 0 & 0\\ 1 & 1 & 0 & 0\\ 1 & 1 & 0 & 0\\
1 & 1 & 1 & 0\\ 1 & 1 & 1 & 0\\ 1 & 1 & 1 & 0\\
1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix} \mu_0\\ \theta_1\\ \theta_2\\ \theta_3 \end{pmatrix}
\quad \text{or} \quad
E(Y_{ij}) = \mu_0 + \sum_{\ell=1}^{i}\theta_\ell = \mu_0 + \sum_{\ell=1}^{i} z_\ell\,\delta_\ell.
$$
For the three dose level experiment discussed above, the triplet $\mathbf{z} = (z_1, z_2, z_3)$ uniquely defines each one of the eight plausible models. For example, for $\mathbf{z} = (z_1 = 0, z_2 = 0, z_3 = 0)$, we have that $E(Y_{ij} \mid g_r, \mathbf{z}) = (\mu_0, \mu_0, \mu_0, \mu_0)$ (which corresponds to the mean of model $g_0$), and for $\mathbf{z} = (z_1 = 1, z_2 = 0, z_3 = 0)$, we obtain $E(Y_{ij} \mid g_r, \mathbf{z}) = (\mu_0, \mu_0 + \delta_1, \mu_0 + \delta_1, \mu_0 + \delta_1)$ (which corresponds to the mean of model $g_3$), etc. In order to complete the specification of the hierarchical model defined in (13.5)–(13.7), we need to specify prior and hyperprior distributions for the indicator variables. We assume that $z_i$ and $\delta_i$ are independent. As before, we use a truncated normal prior distribution (13.7) for $\delta_i$ and
$$
z_i \sim \mathrm{Bernoulli}(\pi_i), \qquad \pi_i \sim U(0, 1).
$$
As pointed out by O'Hara and Sillanpää (2009), the posterior inclusion probability of $\delta_i$ in the model is the posterior mean of $z_i$. Further, using the indicator variables $z_i$, we specify a transformation function that uniquely defines each one of the plausible models (Ntzoufras 2002; Dellaportas et al. 2002). Let $\mathbf{C} = (1, 2, 4)$ and let $\mathbf{Z}$ be a $K \times 2^K$ matrix ($3 \times 8$ in our example) whose columns are the indicator configurations of the eight plausible models,
$$
\mathbf{Z} = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 1\\
0 & 1 & 0 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}.
$$
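The mapping can be checked with a small illustration (not part of the original analysis): each configuration of the indicators corresponds to a unique model index.

# M_r = 1 + z1 + 2*z2 + 4*z3 labels each of the eight plausible models
z <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)
z$model.index <- 1 + z$z1 + 2 * z$z2 + 4 * z$z3
z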
Recall that in Sect. 13.2.2.3, we fitted two order-constrained models ($g_5$ and $g_7$) and the null model ($g_0$) to gene 13386 and selected model $g_5$ since it had the lowest DIC value. In this section, we apply the BVS model to the same gene. In WinBUGS, the BVS model can be implemented in a similar way to the model discussed in Sect. 13.2.2.1 with minor modifications. First, the mean structure needs to be defined in terms of the parameter vector $\boldsymbol{\beta}$, the design matrix $\mathbf{X}$, and the indicator variables $\mathbf{z}$.
for(i in 1:N)
{
Y[i] ˜dnorm(mu[i],tau)
mu[i] <- mu0*X1[i]+ delta1*X2[i]*Z1+ delta2*X3[i]*Z2+ delta3*X4[i]*Z3
}
Z1˜dbern(pi1)
Z2˜dbern(pi2)
Z3˜dbern(pi3)
pi1 ˜ dunif(0,1)
pi2 ˜ dunif(0,1)
pi3 ˜ dunif(0,1)
Finally, the transformation function $M_r = 1 + z_1 + 2z_2 + 4z_3$ (the object modi in the panel below) is defined by
modi<-1+Z1+Z2*2+Z3*4
for(r in 1:8)
{
pmod[r]<-equals(modi,r)
}
Fig. 13.2 Bayesian variable selection for gene 13386. Panel (a): posterior means obtained from the restricted model with four parameters ($g_7$: dotted line), the restricted model with three parameters ($g_5$: dotted-dashed line), and the posterior means for the BVS model (dashed line). Panel (b): posterior probabilities of the models, $p(g_5 \mid \mathbf{z}, \text{data}) = 0.4186$
Figure 13.3a shows the expression levels and fitted model for gene 3413. Note that the isotonic regression (solid line) predicts a $g_5$ model but, compared with the previous example, the mean profile is relatively flat. The posterior mean obtained from the BVS model, $\bar{\boldsymbol{\mu}}_{\mathrm{BVS}} = (5.288, 5.316, 5.346, 5.394)$ (dashed line), is monotone as required but, similar to the isotonic regression, rather flat. Figure 13.3b shows the posterior probabilities of the models. The model with the highest posterior probability is the null model, with $p(g_0 \mid \mathbf{z}, \text{data}) = 0.514$ (see output below). Hence, for this gene, the data clearly support the null model, i.e., a model with a flat mean profile.
Fig. 13.3 Bayesian variable selection for gene 3413. Panel (a): posterior means obtained from the restricted model with four parameters (dotted line), isotonic regression (solid line), and the posterior means for the BVS model (dashed line). Panel (b): posterior probabilities of the models, $p(g_0 \mid \mathbf{z}, \text{data}) = 0.514$
Three questions now arise: (1) Why did the BVS model not predict a flat mean profile? (2) Although $\bar{\boldsymbol{\mu}}_{\mathrm{BVS}}$ is monotone, why is it not equal to the posterior means obtained for model $g_7$ (dotted line)? (3) Why is $\bar{\boldsymbol{\mu}}_{\mathrm{BVS}}$ not equal to the isotonic regression?
The isotonic regression obtained by the PAVA pools together the means of the doses for which the order of the unconstrained means is violated. For gene 3413, the isotonic regression predicts a pattern of model $g_5$, but this does not mean that we reject the null hypothesis of no dose effect. The answers to the questions above are related to the interpretation of the posterior means of the BVS model.
Let us consider an MCMC simulation of $T$ iterations, and let $\boldsymbol{\beta}^{(t)} = (\mu_0^{(t)}, z_1^{(t)}\delta_1^{(t)}, z_2^{(t)}\delta_2^{(t)}, z_3^{(t)}\delta_3^{(t)})$ be the parameter vector at iteration $t$, $t = 1, \ldots, T$, and $\boldsymbol{\mu}^{(t)} = \mathbf{X}\boldsymbol{\beta}^{(t)}$ the mean gene expression at iteration $t$. The posterior mean $\bar{\boldsymbol{\mu}}_{\mathrm{BVS}}$ is calculated using MCMC integration, i.e.,
$$
\bar{\boldsymbol{\mu}}_{\mathrm{BVS}} = \frac{1}{T}\sum_{t=1}^{T} \boldsymbol{\mu}^{(t)}.
$$
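A minimal sketch of this calculation in R is given below (an illustration only, not part of the WinBUGS output processing in the book; mu0 is assumed to be a vector of posterior draws of the baseline mean, and delta and z matrices of draws with one row per iteration and one column per dose increment).

n.iter <- length(mu0)
theta  <- z * delta                                    # theta_l = z_l * delta_l
mu.t   <- cbind(mu0,
                mu0 + theta[, 1],
                mu0 + theta[, 1] + theta[, 2],
                mu0 + theta[, 1] + theta[, 2] + theta[, 3])
mu.bvs <- colMeans(mu.t)                               # model-averaged posterior means
model.index <- 1 + z[, 1] + 2 * z[, 2] + 4 * z[, 3]    # model index at each iteration
post.prob   <- table(factor(model.index, levels = 1:8)) / n.iter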
Hence, $\bar{\boldsymbol{\mu}}_{\mathrm{BVS}}$ is the average of all means which were calculated during the MCMC simulation. In our example, in 51.4% of the iterations, $\boldsymbol{\mu}^{(t)}$ was equal to $(\mu_0^{(t)}, \mu_0^{(t)}, \mu_0^{(t)}, \mu_0^{(t)})$ (the mean structure of $g_0$); in 12.99% of the iterations, it was equal to $(\mu_0^{(t)}, \mu_0^{(t)}, \mu_0^{(t)} + \delta_2^{(t)}, \mu_0^{(t)} + \delta_2^{(t)})$ (the mean structure of $g_2$); in 10.74% of the iterations, it was equal to $(\mu_0^{(t)}, \mu_0^{(t)} + \delta_1^{(t)}, \mu_0^{(t)} + \delta_1^{(t)}, \mu_0^{(t)} + \delta_1^{(t)})$ (the mean structure of $g_3$), etc. The posterior means obtained from the BVS model should be interpreted
as the model average for the eight possible models which were fitted during the
MCMC simulations. Since at each iteration $\boldsymbol{\mu}^{(t)}$ is monotone, $\bar{\boldsymbol{\mu}}_{\mathrm{BVS}}$ is monotone as well. However, we can clearly see the shrinkage toward the overall mean since the weight of $g_0$ is the highest. As we argued above, in this example, the data support the null model, for which the posterior probability $p(g_0 \mid \mathbf{z}, \text{data}) = 0.514$. Up to this
point in this chapter, we did not address the question of how to select the subset
of genes which are differentially expressed. In Chaps. 7 and 8, we used either BH-
FDR or SAM in order to select a subset of genes which were declared differentially
expressed. The posterior probability of the null model will be a key concept in the
next section in which we discuss the issue of inference and multiplicity adjustment
within the hierarchical Bayesian framework.
Since $p_g(g_0 \mid \mathbf{z}, \text{data})$ is also the probability that the assignment of the $g$th gene to the discovery list is incorrect, the expected number of false discoveries (FD) is
$$
E(\mathrm{FD}) = \sum_{g=1}^{m} p_g(g_0 \mid \mathbf{z}, \text{data})\, I_g = \mathrm{cFD}(\alpha),
$$
where $I_g$ indicates whether gene $g$ is assigned to the discovery list and $m$ is the number of genes.
Newton et al. (2007) defined the conditional (on the data) false discovery rate as $cFDR(\alpha) = cFD(\alpha)/N(\alpha)$, where $N(\alpha)$ is the number of genes assigned to the discovery list for a given threshold $\alpha$. Newton et al. (2004) term this approach the direct posterior probability approach for multiplicity adjustment. Note that $cFDR(\alpha)$ is interpreted as the average error that we make when we assign a gene to the discovery list. The value of $\alpha$ is selected in such a way that $cFDR(\alpha)$ does not exceed a prespecified threshold.
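As an illustrative sketch (with hypothetical object names, not the original code), the direct posterior probability approach can be implemented in a few lines of R once the gene-specific posterior null probabilities are available; p0 below is assumed to hold $p_g(g_0 \mid \mathbf{z}, \text{data})$ for all genes.

## p0: gene-specific posterior probabilities of the null model (hypothetical name)
## For a cutoff alpha, a gene is assigned to the discovery list if p0 <= alpha
direct.fdr <- function(alpha, p0) {
  keep <- p0 <= alpha
  cFDR <- ifelse(sum(keep) > 0, sum(p0[keep]) / sum(keep), 0)
  c(cutoff = alpha, n.genes = sum(keep), cFDR = cFDR)
}
cutoffs <- seq(0.001, 0.5, by = 0.001)
res <- t(sapply(cutoffs, direct.fdr, p0 = p0))
## largest cutoff for which the conditional FDR stays below, e.g., 5%
max(res[res[, "cFDR"] <= 0.05, "cutoff"])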
In this section, we apply the direct posterior probability approach discussed above
for multiplicity adjustment. As we mentioned above, the framework enables adjust-
ment for false discovery rates among the significant genes. We use the R2WinBUGS
package to fit a gene-specific model and to obtain the posterior probability of the
null model. For each gene, an MCMC simulation of 20,000 iterations (from which
5,000 are used as burn-in period) was used to fit the BVS model. Figure 13.4
shows the relationship between false discovery rate (cFDR), number of significant
genes, and cutoff values. Panel a shows that an increase in cutoff values results in
an increase in false discovery rate. However, the false discovery rate reaches its
maximum of 0.2 at the cutoff of about 0.5. Panel b also shows an increase in the
number of significant genes with an increase in cutoff values. The implication of
the finding is that, as expected, the higher the cutoff value, the larger the number
of significant genes and consequently, the higher the proportion of false positives
among the significant genes. Similar to frequentist practice, one may wish to
control for false discovery rate at 1% or 5%, which corresponds to cutoff values
of 0.029 and 0.102, respectively. Based on these cutoff values, the corresponding
number of significant genes is 609 and 3,295 genes, respectively.
Fig. 13.4 Adjustment for multiplicity. Panel (a): relationship between the estimated false discov-
ery rate (cFDR) and the cutoff values. Panel (b): the relationship between number of significant
genes and the cutoff values
13.5 Discussion
References
Broet, P., Lewin, A., Richardson, S., Dalmasso, C., & Magdelenat, H. (2004). A mixture model
based strategy for selecting sets of genes in multiclass response microarray experiments.
Bioinformatics, 20(16), 2562–2571.
Chen, M. H., & Kim, S. (2008). The Bayes factor versus other model selection criteria for the
selection of constrained models. In H. Hoijtink, I. Klugkist, & P. Boelen (Eds.), Bayesian
evaluation of informative hypotheses. Berlin: Springer.
Dellaportas, P., Forster, J. J., & Ntzoufras, I. (2002). On Bayesian model and variable selection
using MCMC. Statistics and Computing, 12, 27–36.
Dunson, D. B., & Neelon, B. (2003). Bayesian inference on order constrained parameters in
generalized linear models. Biometrics, 59, 286–295.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed).
Boca Raton: Chapman & Hall/CRC.
George, E. I., & McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the
American Statistical Association, 88, 881–889.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996) Markov chain Monte Carlo in practice.
London: Chapman & Hall.
Gelfand, A. E., Smith, A. F. M., & Lee, T.-M. (1992). Bayesian analysis of constrained parameter
and truncated data problems. Journal of the American Statistical Association, 87, 523–532.
Hoijtink, H., Klugkist, I., & Boelen, P. (Eds.). (2008). Bayesian evaluation of informative
hypotheses. Berlin: Springer.
Kato, B. S., & Hoijtink, H. (2006). A Bayesian approach to inequality constrained linear mixed
models: estimation and model selection. Statistical Modelling, 6, 231–249.
Klugkist, I., Kato, B., & Hoijtink, H. (2005a). Bayesian model selection using encompassing
priors. Statistica Neerlandica, 59(1), 57–69.
Klugkist, I., Laudy, O., & Hoijtink, H. (2005b). Inequality constrained analysis of variance: A
Bayesian approach. Psychological Methods, 10(4), 477–493.
Klugkist, I., & Mulder, J. (2008). Bayesian estimation for inequality constrained analysis of
variance. In H. Hoijtink, I. Klugkist, & P. Boelen (Eds.), Bayesian evaluation of informative
hypotheses. Berlin: Springer.
Lewin, A., Bochkina, N., & Richardson, S. (2007). Fully Bayesian mixture model for differential
gene expression: simulations and model checks. Statistical Applications in Genetics and
Molecular Biology, 6(1). Article 36.
Myung, J. I., Karabatsos, G., & Iverson, G. J. (2008). A statistician’s view on Bayesian evaluation
of informative hypotheses. In H. Hoijtink, I. Klugkist, & P. Boelen (Eds.), Bayesian evaluation
of informative hypotheses. Berlin: Springer.
Newton, M. A., Noueiry, A., Sarkar, D., & Ahlquist, P. (2004). Detecting differential gene
expression with a semi-parametric hierarchical mixture method. Biostatistics, 5(2), 155–176.
Newton, M. A., Wang, P., & Kendziorski, C. (2007). Hierarchical mixture models for expression
profiles. In K. M. Do, P. Muller, & M. Vannucci (Eds.), Bayesian Inference for gene expression
and proteomics. Cambridge: Cambridge University Press.
Ntzoufras, I. (2002). Gibbs variable selection using BUGS. Journal of Statistical Software, 7(7),
1–19.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., van der Linde, A. (2002). Bayesian measures of
model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B
(Statistical Methodology) 64(4), 583–639.
O'Hara, R. B., & Sillanpää, M. J. (2009). A review of Bayesian variable selection methods: what,
how and which. Bayesian Analysis, 4(1), 85–118.
Whitney, M., & Ryan, L. (2009). Quantifying dose-response uncertainty using Bayesian model
averaging. In R. M. Cooke (Ed.), Uncertainty modeling in dose-response. New York: Wiley.
Chapter 14
Model-Based Approaches
14.1 Introduction
S. Pramana ()
Karolinska Institutet, Department of Medical Epidemiology and Biostatistics,
Stockholm, Sweden
e-mail: [email protected]
Z. Shkedy
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]
D. Lin
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
H.W.H. Göhlmann W. Talloen An De Bondt
Janssen Pharmaceutical Companies of Johnson & Johnson, Beerse, Belgium
e-mail: [email protected]; [email protected]; [email protected]
R. Straetemans
Ablynx NV, Zwijnaarde, Belgium
e-mail: [email protected]
J. Pinheiro
Janssen Pharmaceutical Companies of Johnson & Johnson, Titusville, NJ, USA
e-mail: [email protected]
Table 14.1 Number of significant genes for the six compounds. Resampling-based inference using the LRT and the BH-FDR method for multiplicity adjustment ($\alpha = 0.05$)

       JNJa   JNJb   JNJc   CompB   CompA   CompC
       211    251    164    332     72      242
Fig. 14.1 BH-FDR adjusted p values for the LRT for the six compounds
Fig. 14.2 Gene expression data and isotonic regression for the gene FOS. (a) Original scale;
(b) logscale
For the log-linear model, the constant added to the dose is a fixed offset. The above models will be used in this
chapter for illustration but other parametric models, listed in the DoseFinding R
package (Bornkamp, Pinheiro, and Bretz, 2009), can be used as well. This chapter
is organized as follows. In Sects. 14.2 and 14.3, we focus on the analysis of a
single gene and use the parametric models above in order to estimate the dose
for which the gene expression is halfway to the maximum level (ED50 ), while in
Sects. 14.4 and 14.5, we focus on the case in which several genes are ranked based
on their ED50 parameter estimates.
with $AIC_{\min} = \min(AIC(f_1), \ldots, AIC(f_R))$. The Akaike weight for the $r$th model is given by

$$P_A(f_r \mid D) = \frac{\exp\left(-\tfrac{1}{2}\Delta AIC_r\right) P(f_r)}{\sum_{r=1}^{R} \exp\left(-\tfrac{1}{2}\Delta AIC_r\right) P(f_r)}.$$
As discussed in Chap. 10, the Akaike weight PA .fr jD/ can be interpreted as the
weight of evidence that model fr is the best KL model given a set of R models and
given that one of the models in the set must be the best KL model. Akaike weights
can be interpreted as the posterior probabilities of the models. For the case with
non-informative prior probabilities $P(f_r) = 1/R$, the Akaike weights are identical
to the weights discussed in Pinheiro et al. (2006b).
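As a small numerical illustration (not part of the original text), the Akaike weights can be computed directly from a vector of AIC values when the prior model probabilities are equal; the AIC values below are those reported for the gene FOS later in this chapter and reproduce the model weights shown there.

aic <- c(sigEmax = 65.15, emax = 63.30, logistic = 67.53,
         linlog = 69.51, linear = 86.37)
delta <- aic - min(aic)                          # Delta AIC for each model
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta))  # equal priors cancel out
round(w, 3)                                      # 0.254 0.640 0.077 0.029 0.000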
Let $\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_R$ be the estimated mean gene expression obtained from a set of $R$ candidate models and $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_R$ be the parameter estimates of primary interest (for example, the ED50) obtained for each candidate model, respectively. A model selection procedure implies that one of the candidate models will be selected according to an information criterion, and the parameter of primary interest will be estimated from the selected model. Let us assume that the AIC is used for model selection. In this case, the post-selection estimates (Claeskens and Hjort 2008) for the mean gene expression and the parameter are given, respectively, by

$$\hat{f}_{AIC} = \sum_{r=1}^{R} I_r \hat{f}_r \quad \text{and} \quad \hat{\theta}_{AIC} = \sum_{r=1}^{R} I_r \hat{\theta}_r. \qquad (14.3)$$
Note that $\sum_{r=1}^{R} I_r = 1$ and $I_r = 1$ for the model with the smallest value of AIC. The post-selection estimates in (14.3) do not take into account the uncertainty associated with the model selection procedure since they are based on a single model. Thus, the fact that $R$ candidate models were fitted to the data, from which the best model was selected, is not reflected in the variability associated with the post-selection estimates. This can be done if we replace the indicator variable in (14.3) with a weight $w_r(R)$, $\sum_{r=1}^{R} w_r(R) = 1$. Hence, the estimates are averaged over all candidate models, that is,

$$\hat{f}_{MA} = \sum_{i=1}^{R} w_i(R)\,\hat{f}_i \quad \text{and} \quad \hat{\theta}_{MA} = \sum_{i=1}^{R} w_i(R)\,\hat{\theta}_i.$$

$\hat{\theta}_{MA}$ is called the model average estimate for $\theta$. Claeskens and Hjort (2008) referred to the case where the Akaike weights are used to average the models, i.e., $w_r(R) = P_A(f_r \mid D)$, as the smooth AIC weights. In the context of dose-response modeling, model averaging methods are discussed by Pinheiro et al. (2006a,b), Bretz et al. (2005), Bornkamp et al. (2009), and Whitney and Ryan (2009).
The ED50 parameter represents the dose at which the mean response is halfway to
the maximum effect. As illustrated in Chap. 4, the R function gnls() allows one
to estimate the ED50 directly as long as it can be specified as a parameter in the
dose-response model (for example, 2 in the 4PL model). This implies that if the
maximum effect, the minimum effect, or both are reached outside the range of the
data, the ED50 can be estimated outside the range of the dose used in the experiment
as well. The R function MCPMod() of the R package DoseFinding estimates
the ED50 in a slightly different way. The ED50 is estimated as the dose that corresponds to the point halfway between the estimated mean at the lowest and the
highest dose levels (hence, not as a parameter of the model). Figure 14.3 illustrates
a scenario in which the maximum effect is reached at a dose level that is higher than
the maximum dose administered in the experiment. The R function MCPMod()
estimates the ED50 by a grid search: the dose whose predicted mean response lies halfway between the estimated mean responses at the lowest and the highest dose is located. In that way,
the ED50 represents the dose halfway to the maximum effect within the range of the
data. In the context of dose-response microarray experiments, this approach has a
major advantage. The difference between the (model-based) mean gene expression
at the lowest dose and the highest dose is the fold change [or log(fold change)].
Hence, the ED50 can be interpreted as the dose that corresponds to the half-fold
change.
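A minimal sketch of this grid search (not taken from the original code) is given below; ed50.grid and the Emax parameters used in the example are illustrative assumptions rather than output of the case study.

ed50.grid <- function(predfun, dmin, dmax, length.out = 1001) {
  grid   <- seq(dmin, dmax, length.out = length.out)
  ## response halfway between the predicted means at the lowest and highest dose
  target <- predfun(dmin) + 0.5 * (predfun(dmax) - predfun(dmin))
  grid[which.min(abs(predfun(grid) - target))]
}
## illustrative Emax curve evaluated over the dose range of the experiment
emax.fun <- function(d) 9.17 + 2.71 * d / (1.29 + d)
ed50.grid(emax.fun, dmin = 0, dmax = 40)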
Fig. 14.3 Estimation of the ED50 by the R package DoseFinding. Dashed lines are the model-
based minimum and maximum gene expression. The dashed-dotted line represents the point halfway from the minimum to the maximum effect
Within the DoseFinding R package, the function we use for modeling is MCPMod(). A general call to the MCPMod() function has the form

MCPMod(response ~ dose, models to be fitted, model selection criterion)
For our example, we use five different models in (14.2) specified in the R object
models1 as follows:
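A hedged sketch of how such a candidate set could be specified is shown below (the original panel is not reproduced); the numeric guesstimates for the nonlinear models are illustrative assumptions, and the exact list format depends on the version of the MCPMod/DoseFinding machinery used.

## five candidate models: linear, log-linear, hyperbolic Emax, sigmoid Emax,
## and logistic; nonlinear models need guesstimates of their standardized
## parameters (the values below are placeholders)
models1 <- list(linear   = NULL,
                linlog   = NULL,
                emax     = 1.3,
                sigEmax  = c(1.4, 0.9),
                logistic = c(0.04, 1.8))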
The R object models1 is given as an input for the function MCPMod(). The
option selModel = “aveAIC” implies that the AIC is used as the model selection
criterion for the calculation of Akaike weights, and the options doseEst = “ED” and doseEstPar = 0.5 imply that the parameter for which model averaging will
be performed is the ED50 . For our example, we use the following code:
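A sketch following the general call quoted above is given below (the original panel is not shown). The option names are those mentioned in the text, datafos is assumed to be a data frame with columns dose and resp for the gene FOS, and argument names may differ across versions of the DoseFinding package.

dfeA1 <- MCPMod(resp ~ dose, data = datafos, models = models1,
                alpha = 0.05, selModel = "aveAIC",
                doseEst = "ED", doseEstPar = 0.5)
summary(dfeA1)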
Figure 14.4 shows the fitted models (panel a) and the model average estimate
(panel b) calculated according to (14.3).
The object dfeA1 contains the information about the prior probability for each model (1/R = 0.2, the default) in our case.
> summary(dfeA1)
MCPMod
Input parameters:
alpha = 0.05
alternative: one.sided, one sided
model selection: aveAIC
prior model weights:
sigEmax emax logistic linlog linear
0.2 0.2 0.2 0.2 0.2
dose estimator: ED (p = 0.5)
optimizer: bndnls
The next part of the output presents the AIC values for each model. The model with the best goodness of fit for the gene FOS is the Emax model, with an AIC of 63.30, which corresponds to the highest Akaike weight, $\hat{P}_A(f_r \mid D) = 0.64$. Note that the Akaike weight of the linear model, which fits the data poorly, is equal to 0.
AIC criterion:
sigEmax emax logistic linlog linear
65.15 63.30 67.53 69.51 86.37
Model weights:
sigEmax emax logistic linlog linear
0.254 0.640 0.077 0.029 0.000
Fig. 14.4 Estimation of ED50 for the gene FOS. Panel (a): five parametric models. Panel (b):
model average estimate for the mean gene expression and the best fitted model. Panel (c): model
specific and model average estimates of the ED50 . Panel (d): histogram and density estimate for
the bootstrap replicates for the model averaged estimates of the ED50 . The vertical lines represent
the 95% CI. Panel (e): data and predicted model based on model averaging. Panel (f): data and
predicted model presented on log scale
Parameter estimates for each model are shown in the panel below.
Parameter estimates:
sigEmax model:
e0 eMax ed50 h
9.133 2.833 1.376 0.878
emax model:
e0 eMax ed50
9.169 2.706 1.288
logistic model:
e0 eMax ed50 delta
7.102 4.653 0.040 1.815
linlog model:
e0 delta
9.558 0.707
linear model:
e0 delta
10.052 0.050
For the gene FOS, the ED50 calculated by MCPMod() for the different models is
given below. The model average estimate for the ED50 calculated according to (14.3)
is equal to 1.423. Note that, as discussed in Sect. 14.2.1, the ED50 is estimated
by grid search within the dose range. For that reason, the parameter estimate for
the ED50 obtained for the sigmoid Emax model (1.376) is not equal to the Dose
estimate for this model (1.241).
Dose estimate
Estimates for models
sigEmax emax logistic linlog linear
ED50% 1.241 1.241 2.042 5.405 20.02
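As a check (added here, not part of the original text), the model average estimate can be reproduced directly from the model weights and the per-model dose estimates printed above:

w  <- c(sigEmax = 0.254, emax = 0.640, logistic = 0.077,
        linlog = 0.029, linear = 0.000)
ed <- c(sigEmax = 1.241, emax = 1.241, logistic = 2.042,
        linlog = 5.405, linear = 20.02)
sum(w * ed)    # approximately 1.423, the model average ED50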
In the next stage, we obtain 90% bootstrap confidence intervals using the
function bootMCPMod(). Figure 14.4c–f shows the parameter estimates for the
ED50 obtained from the different models and their corresponding 90% bootstrap
confidence intervals.
The model averaging technique allows us to compare between: (1) the expression
levels of the same gene obtained for different compounds, (2) the expression levels
of all significant genes for a specific compound, and (3) the expression levels of
several genes obtained for different compounds. The three types of comparisons are
discussed in Sects. 14.3–14.5, respectively.
The first comparison of interest is the one between the expression levels of the
same gene for different compounds. This allows us to evaluate the compound
activity based on gene expression data. For the analysis presented in this section, six
compounds are used: CompA, CompB, CompC, and three additional compounds:
JnJa, JnJb, and JnJc. In this section, the comparison is based on the Emax model. In
order to compare between the six compounds, we formulate the parameters in the model as a linear function of the treatment, that is,

$$\begin{pmatrix} E_{0\ell} \\ ED_{50\ell} \\ E_{\max,\ell} \end{pmatrix} = \begin{pmatrix} \theta_1 + \gamma_\ell \\ \theta_2 + \delta_\ell \\ \theta_3 + \lambda_\ell \end{pmatrix}, \qquad \ell = 2, \ldots, L. \qquad (14.4)$$

The compound effect on the ED50 can then be tested by

$$H_0: \delta_\ell = 0 \quad \text{vs.} \quad H_1: \delta_\ell \neq 0. \qquad (14.5)$$
The Emax model can be fitted in R using the function gnls() in the following
way:
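The call itself is not shown here; a sketch that reproduces the model formula appearing in the output below is given instead, with illustrative starting values (an assumption, since gnls() requires them for a model that is not self-starting).

library(nlme)
## Emax model without a compound effect: a single curve for all compounds
fos.Com0 <- gnls(resp ~ E0 + ((dose * Emax) / (xmid + dose)),
                 data  = datafos,
                 start = c(Emax = 2.5, E0 = 9.5, xmid = 0.5))
summary(fos.Com0)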
Note that the model specified above does not include a compound effect but assumes that there is a single dose-response relationship for all compounds, as can be seen in the panel below.
> summary(fos.Com0)
Generalized nonlinear least squares fit
Model: resp ˜ E0 + ((dose * Emax)/(xmid + dose))
Data: datafos
AIC BIC logLik
384.7557 397.3457 -188.3779
Coefficients:
Value Std.Error t value p value
Emax 2.385194 0.1494264 15.96233 0e+00
E0 9.548798 0.1226443 77.85765 0e+00
xmid 0.630100 0.1624210 3.87942 1e-04
The next model we fit assumes that compounds are different in Emax and ED50 , but
the baseline parameter E0 is equal for all compounds. Note that since the first dose
is equal to 0, E0 can be used in order to estimate the baseline gene expression using
data from all compounds. The change in parameterization can be implemented using
the option params.
params = list(Emax + xmid ~ compound, E0 ~ 1)
> summary(fos.Com1)
Generalized nonlinear least squares fit
Model: resp ˜ E0 + ((dose * Emax)/(xmid + dose))
Data: datafos
AIC BIC logLik
236.5949 280.6598 -104.2974
Coefficients:
Value Std.Error t value p value
Emax.(Intercept) 2.527145 0.2105666 12.00164 0.0000
Emax.compoundJnJa 0.491722 0.4200471 1.17063 0.2435
Emax.compoundJnJb -0.174543 0.2520476 -0.69250 0.4896
Emax.compoundJnJc -0.222660 0.2544501 -0.87507 0.3829
Emax.compoundCompB -0.038892 0.2526987 -0.15391 0.8779
Emax.compoundCompC 0.720935 0.2410640 2.99064 0.0032
xmid.(Intercept) 1.869673 0.6478455 2.88599 0.0044
xmid.compoundJnJa 5.625482 2.9956672 1.87787 0.0622
xmid.compoundJnJb -1.661514 0.6480431 -2.56389 0.0113
xmid.compoundJnJc -1.535046 0.6517436 -2.35529 0.0197
xmid.compoundCompB -1.409720 0.6534863 -2.15723 0.0325
xmid.compoundCompC -1.725857 0.6472122 -2.66660 0.0085
E0 9.424942 0.0725147 129.97280 0.0000
In the next step, two additional models are fitted. The first assumes a common Emax
and a common E0 for all compounds and the second assumes different parameters
for each compound. The two models can be fitted with gnls() using the following two params statements, respectively:
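Based on the parameterizations described in the text and the coefficients shown in the output below, the two statements could look as follows (a sketch, not the original panel):

## common Emax and common E0, compound-specific ED50 (xmid): fos.Com2
params = list(xmid ~ compound, Emax + E0 ~ 1)
## all three parameters compound specific: fos.Com3
params = list(Emax + xmid + E0 ~ compound)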
As can be seen in the panel below, the first model implies a compound-specific
ED50 .
> summary(fos.Com2)
Generalized nonlinear least squares fit
Model: resp ˜ E0 + ((dose * Emax)/(xmid + dose))
Data: datafos
AIC BIC logLik
261.1667 289.4941 -121.5833
Coefficients:
Value Std.Error t value p value
Emax 2.669646 0.1002376 26.63318 0.0000
E0 9.441580 0.0774812 121.85639 0.0000
xmid.(Intercept) 2.285470 0.6196621 3.68825 0.0003
xmid.compoundJnJa 3.572604 1.6595000 2.15282 0.0328
xmid.compoundJnJb -1.971663 0.6109348 -3.22729 0.0015
xmid.compoundJnJc -1.744222 0.6112304 -2.85363 0.0049
xmid.compoundCompB -1.686293 0.6101441 -2.76376 0.0064
xmid.compoundCompC -2.215609 0.6161895 -3.59566 0.0004
According to the AIC criterion, the model with the best goodness of fit is the model with compound-specific parameters, while the BIC criterion advocates the model with the common E0. For the gene expression experiment, we prefer the latter model since, as we argued above, E0 is the baseline gene expression at dose zero and any difference between compounds at this dose is only due to chance. The
difference between the model with common E0 (fos.Com1) and the model with
common E0 and Emax (fos.Com2) is shown in Fig. 14.5. Note that the model with
common Emax does not imply that at the highest dose level the predicted values of
all compounds are equal since Emax is the asymptote. This can be seen clearly in the
upper panels of Fig. 14.5.
> anova(fos.Com0, fos.Com1 , fos.Com2, fos.Com3)
Let us focus on two compounds: CompA and JnJa. Figure 14.5 shows that the
dose-response curve of CompA is above the dose-response curve of JnJa. This
implies that gene FOS responds faster to CompA as compared with JnJa, and as
a result, the ED50 of the gene for CompA is smaller than the ED50 of the gene for
the compound JnJa as can be seen in Fig. 14.6.
The model averaging methodology discussed above allows us to rank the response
of genes to the increasing (decreasing) doses of a specific compound (CompA)
based on the model average estimate of the ED50. Let $\hat{\theta}_{2,g,r}$ be the ED50 estimate for
Fig. 14.5 Data and predicted models for the six compounds. Upper panel: models are presented
outside the range of the dose. Lower panel: the models are presented in the range of the dose used
in the experiment. (a and c) Common E0 ; (b and d) common E0 and Emax
gene $g$ and the $r$th model. As discussed in Sect. 13.2, the gene-specific model average estimate is given by

$$\hat{\theta}_{2,g} = \sum_{r=1}^{R} w_r(R)\, \hat{\theta}_{2,g,r},$$

where $w_r(R)$ is the Akaike weight (the posterior probability of the $r$th model) given in (10.7). Figure 14.7a, b shows the sorted values of the model average estimates for the ED50 for all significant genes for CompA with their corresponding 95% bootstrap confidence intervals.
Fig. 14.6 Parameter estimates of the ED50 and 90% confidence intervals for each compound, under the four models (no compound effect; common E0; common E0 & Emax; compound specific). Vertical line: ED50 obtained from the model without compound effect
Figure 14.7c, d presents the data and the model average estimate for the dose-
response curve for two genes (6792 and 83) with their ED50 equal to 1.143 and
4.96, respectively. Two main patterns can be observed in Fig. 14.7. First, the dose-response curve of gene 6792 indeed increases more sharply than that of gene 83, and as a result the parameter estimate of the ED50 for gene 6792 is smaller than that for gene 83. Second, the fold change of gene 6792 is higher than the fold change of gene 83. This implies that gene 6792 responds to the increasing doses of CompA more quickly (smaller value of ED50) and with a higher increment of gene expression (a higher fold change).
Fig. 14.7 Parameter estimates of the ED50 for CompA. Upper panels: model average estimates
and 95% CI. Lower panel: data and model average of the dose-response curve for the genes 6792
and 83. (a) ED50 and 95% CI for genes with upward trend. (b) ED50 and 95% CI for genes with
downward trend. (c) Gene 6792 (ED50 = 1.143). (d) Gene 83 (ED50 = 4.96)
In the previous section, we compared the responses of genes for a single compound dose-response experiment. In some cases, a multi-compound dose-
response microarray experiment is conducted in order to evaluate a new compound
against a known compound. In this setting, the comparison of primary interest is the
Fig. 14.8 Model average estimate of the ED50 for significant genes under two compounds:
CompA and JnJa
change in gene expression across compounds for several genes. In this section, we
illustrate the use of a model averaging technique in order to compare the change in
gene expression between two compounds: CompA and JnJa. Figure 14.8 shows the
model average estimates for the ED50 obtained for CompA and JnJa. We notice
that the majority of the genes which were found to be significant for the two
compounds respond faster to CompA, as compared to JnJa, i.e., their model average
estimates for ED50 are higher under JnJa compared to the corresponding model
average estimates for the ED50 under CompA. Figure 14.9 shows an example of
two genes. The first gene (6792) responds more quickly under CompA than under the new compound (model average estimates of the ED50 are equal to 1.143 and 4.25 under CompA and JnJa, respectively). The same pattern can be observed for the second gene (83), with model average estimates of the ED50 equal to 4.96 and 16.92
under CompA and JnJa, respectively.
Fig. 14.9 Data and model average for the dose-response curves of two genes under CompA (solid
line) and JnJa (dashed line). (a) Gene 6792 (ED50 = 1.143). (b) Gene 83 (ED50 = 4.96)
14.6 Discussion
The aim of the analysis presented in this chapter was not to detect differentially expressed genes with respect to an increasing dose, but to investigate in more detail the response of genes which were found to have a significant dose-response relationship. In contrast with the previous chapter, in which we assumed that the dose-response relationship is a step function, in this chapter we used parametric models in order to estimate the dose-response curve. This allows us to estimate the ED50 parameter for each gene and to compare the response of several genes for a single compound or several compounds. Such comparisons are of primary interest when the researcher would like to investigate the response of target genes to a certain compound or to compare the response of a gene list between compounds. In this chapter, we used five parametric dose-response models for illustration. The R package DoseFinding allows the use of other parametric models according to the preference of the researcher.
References
Bornkamp, B., Pinheiro, J. C., & Bretz, F. (2009). MCPMod: an R package for the design and
analysis of dose-finding studies. Journal of Statistical Software, 29(7).
Bornkamp, B., Pinheiro, J. C., & Bretz, F. (2012). Reference manual for the R Package
DoseFinding. http://cran.r-project.org/web/packages/DoseFinding/index.html
Bretz, F., Pinheiro, J. C., & Branson, M. (2005). Combining multiple comparisons and modeling
techniques in dose-response studies. Biometrics, 61, 738–748.
Claeskens, G., & Hjort, N. L. (2008). Model selection and model averaging. Cambridge Series in
Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
Pinheiro, J. C., Bornkamp, B., & Bretz, F. (2006a). Design and analysis of dose finding studies
combining multiple comparisons and modeling procedures. Journal of Biopharmaceutical
Statistics, 16, 639–656.
Pinheiro, J. C., Bretz, F., & Branson, M. (2006b). Analysis of dose-response studies—Modeling
approaches. In N. Ting (Ed.), Dose finding in drug development (pp. 146–171). New York:
Springer.
Whitney, M., & Ryan, L. (2009). Quantifying dose-response uncertainty using Bayesian model
averaging. In R. M. Cooke (Ed.), Uncertainty modeling in dose-response. New York: Wiley.
Chapter 15
Multiple Contrast Tests for Testing
Dose–Response Relationships Under
Order-Restricted Alternatives
15.1 Introduction
In Chaps. 3, 7, and 8, we discussed five test statistics that can be used for testing
the null hypothesis of homogeneity of means against order-restricted alternatives.
A rejection of the null hypothesis implies a significant monotone trend of gene
expression with respect to dose. In the case study of epidermal carcinoma cell
line data in Sect. 10.1, we showed that 3,499 genes were found to be significant
when the LRT (the $\bar{E}_{01}^2$ test statistic) was used to test the null hypothesis of no
dose effect against the order-restricted alternatives and the BH-FDR procedure
was used to control the FDR. Among the significant genes, 1,600 genes exhibited
increasing trends and 1,899 genes showed decreasing trends. In this chapter, we
employ an alternative method to find genes with monotonic trends, namely, the
multiple contrast test (MCT).
D. Lin ()
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
L.A. Hothorn
Institute of Biostatistics, Leibniz University Hannover, Hannover, Germany
e-mail: [email protected]
G.D. Djira
Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
e-mail: [email protected]
F. Bretz
Integrated Information Sciences, Novartis Pharma AG, Novartis Campus, Basel, Switzerland
e-mail: [email protected]
The four t-type test statistics (Williams’, Marcus’, M, and the modified M),
which were considered in Chaps. 3 and 7, can be used to test the mean expression
levels between the highest dose and the control in a step-down fashion. As an
analogue, Bretz (2006) proposed the use of MCTs for Williams’ and Marcus’ tests,
when the question of interest in the dose-response study is to find genes with
significant monotonic trends.
The content of the chapter is organized as follows. In Sect. 15.2, we summarize
the pool-adjacent-violator algorithm (PAVA) to obtain the order-restricted mean re-
sponses using isotonic regression and introduce the MCTs. In particular, Williams-
and Marcus-type MCTs are discussed. In Sect. 15.3, we show the use of the R
package multcomp to illustrate the MCTs for these contrasts. In Sect. 15.4, we
discuss the topic of partial order alternatives, show how umbrella alternatives can be defined, and derive suitable MCTs. The human epidermal carcinoma cell line data are used as a case study in this chapter.
In this section, we formulate the test for order-restricted inference and introduce the
expression of Williams’ and Marcus’ contrasts in terms of the MCT proposed by
Bretz (1999, 2006).
In order to obtain the isotonic means, the most widely used technique is the PAVA (Ayer et al. 1955; Barlow et al. 1972; Robertson et al. 1988). If sample means for neighboring doses are not in the monotone restricted order, the method non-parametrically amalgamates the means until the amalgamated means are completely ordered. The result of the algorithm can be linked to the following analytical expression using max–min formulas. Given $n_0, n_1, \ldots, n_K$ observations (arrays) and sample means $\bar{y}_0, \bar{y}_1, \ldots, \bar{y}_K$ at doses $d_0, d_1, \ldots, d_K$, respectively, and assuming normally distributed data, the maximum likelihood estimates $\hat{\mu}_i^{\star}$, subject to the simple order restriction $\mu(d_0) \leq \mu(d_1) \leq \cdots \leq \mu(d_K)$, are given by

$$\hat{\mu}_i^{\star} = \max_{0\leq u\leq i}\ \min_{i\leq v\leq K}\ \frac{\sum_{j=u}^{v} n_j\bar{y}_j}{\sum_{j=u}^{v} n_j}, \qquad i = 0, 1, \ldots, K, \qquad (15.1)$$

where $\bar{y}_i = \sum_{j=1}^{n_i} y_{ij}/n_i$ is the sample mean for dose $i = 0, 1, \ldots, K$.
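The max–min formula translates almost literally into R; the following small sketch (not part of the original text) computes the isotonic means for illustrative sample means and sample sizes.

isotonic.means <- function(ybar, n) {
  K <- length(ybar)                       # doses are indexed 1, ..., K here
  sapply(seq_len(K), function(i) {
    max(sapply(1:i, function(u) {
      min(sapply(i:K, function(v) {
        sum(n[u:v] * ybar[u:v]) / sum(n[u:v])
      }))
    }))
  })
}
## example: four dose groups with three arrays each
isotonic.means(ybar = c(5.30, 5.15, 5.40, 5.60), n = rep(3, 4))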
A single contrast test (SCT) is based on the statistic

$$T^{SC} = \frac{\sum_{i=0}^{K} c_i \bar{y}_i}{\hat{\sigma}\sqrt{\sum_{i=0}^{K} c_i^2/n_i}}, \qquad (15.2)$$

where the contrast vector $\mathbf{c}$ contains the weights $c_i$ as the contrast coefficients, fulfilling the condition $\sum_{i=0}^{K} c_i = 0$. $T^{SC}$ is univariate central $t$-distributed with $\nu = \sum_{i=0}^{K}(n_i - 1)$ degrees of freedom under $H_0$.
MCTs were first described by Mukerjee et al. (1986, 1987) in a general and
thorough manner. The test can be used to test for trend under order restrictions. The
main reason for developing such a test was to obtain a test with a similar power
behavior as the LRT, but easier to use.
MCTs seek to locate several contrast vectors, as good as possible in the
alternative space (Bretz 1999). The aim of this approach is therefore to cover most
parts of the alternative space by choosing selected vectors within this space and conducting the MCT with respect to these vectors. The resulting test, $T^{MC}$, is the
maximum over $r$ of such SCTs $T^{SC}$ defined in (15.2), $T^{MC} = \max(T_1^{SC}, \ldots, T_r^{SC})$. For the case study with three doses and a control ($K = 3$), the one-sided increasing alternative can be decomposed as

$$H_1^{Up} = \bigcup_{i=1}^{7} H_{1(i)}^{Up},$$
where
$H_{1(1)}^{Up}: \mu(d_0) = \mu(d_1) = \mu(d_2) < \mu(d_3)$,
$H_{1(2)}^{Up}: \mu(d_0) < \mu(d_1) = \mu(d_2) = \mu(d_3)$,
$H_{1(3)}^{Up}: \mu(d_0) = \mu(d_1) < \mu(d_2) = \mu(d_3)$,
$H_{1(4)}^{Up}: \mu(d_0) < \mu(d_1) = \mu(d_2) < \mu(d_3)$,
$H_{1(5)}^{Up}: \mu(d_0) < \mu(d_1) < \mu(d_2) < \mu(d_3)$,
$H_{1(6)}^{Up}: \mu(d_0) = \mu(d_1) < \mu(d_2) < \mu(d_3)$,
$H_{1(7)}^{Up}: \mu(d_0) < \mu(d_1) < \mu(d_2) = \mu(d_3)$.
Every true dose-response relationship will fall into exactly one of the subalternatives when $H_1^{Up}$ is true. Each sub-hypothesis can be tested with an SCT. The advantages of using the MCTs for an order-restricted alternative are twofold:
(1) rejecting any of these sub-hypotheses indicates a significant dose-response
relationship, and (2) the maximum of MCTs will determine the configuration of
the isotonic means.
Note that these seven subalternatives are identical to the seven possible dose-
response curve shapes given in Table 9.1. In Chap. 10, we classified genes into one
of seven models (from g1 to g7 ) based on the posterior model probability using
different information criteria. In this chapter, by means of defining all possible sub-
alternatives as a set of SCTs, the maximum test statistic value of MCT will find the
“best” dose-response relationship. Therefore, a MCT can be an alternative approach
to identify the dose-response curve shape based on a parametric test.
In Williams’ procedure (Williams 1971, 1972), the amalgamated mean for the
highest dose .dK / can be expressed as follows (Bretz 1999):
X
K X
K
O ?K D max1uK ni yNi = ni
i Du i Du
n1 yN1 C n2 yN2 C : : : nK yNK nK1 yNK1 C nK yNK
D max ;:::; ; yNK
n1 C n2 C : : : nK nK1 C NK
80 10 19
ˆ
ˆ 0 ::: 0 1 yN >
>
ˆB
ˆ
1 >
<B 0 : : : nK1 CnK nK1 CnK C B yN2 C>
nK1 nK C B C =
D max B B C B C
ˆ :
:: :
:: :
:: C @ : A>
:
ˆ
ˆ@ ::: A : > >
:̂ >
;
n1
n1 C:::CnK
: : : nK1 nK
n1 C:::CnK n1 C:::CnK
N
y K
D maxC yN .0/ ;
where yN .0/ D .yN1 ; yN2 ; : : : ; yNK /0 . Note that the arrays of the control group are not
included in the amalgamation process. In Williams’ test, we have
$$\hat{\mu}_K^{\star} - \bar{y}_0 = \max\{C\,\bar{\mathbf{y}}^{(0)}\} - \bar{y}_0\,\mathbf{1}
= \max\{[-\mathbf{1}\;\; C]\,\bar{\mathbf{y}}\}
= \max C^{Wil}\,\bar{\mathbf{y}}, \qquad (15.4)$$
The maximum contrast (15.4) consists of comparisons of the control with the
weighted average over the last $i$ treatments ($i = 1, \ldots, K$). The contrast matrix $C^{MC}$ in (15.4) for Williams' test is given by (15.5).
Williams’ MCT takes the order restriction of the means into account through
the contrast definition following the construction of the isotonic mean estimates
[in (15.1)]. As pointed out by Bretz (1999), Williams' t-type test statistic is not identical to Williams' MCT because they have different variance estimators. The MCT statistic is completely studentized by making use of the pooled mean square error, i.e., its standard error is

$$\sqrt{\;\sum_{i=0}^{K} \frac{c_i^2}{n_i}\;\sum_{i=0}^{K}\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i)^2 \Big/ \nu\;},$$

while Williams' and Marcus' tests use the mean square error of the two-sample t-test,

$$\sum_{i=0}^{K}\sum_{j=1}^{n_i} (y_{ij}-\bar{y}_i)^2 \Big/ \nu\;\left(\frac{1}{n_0} + \frac{1}{n_K}\right).$$
Marcus-type MCT can be derived in a similar way to the Williams-type MCT (Bretz 1999, 2006). The amalgamated mean for the control dose $(d_0)$ is given by

$$\hat{\mu}_0^{\star} = \min_{0\leq v\leq K} \sum_{i=0}^{v} n_i\bar{y}_i \Big/ \sum_{i=0}^{v} n_i
= \min\left\{\bar{y}_0,\; \frac{n_0\bar{y}_0 + n_1\bar{y}_1}{n_0+n_1},\;\ldots,\; \frac{n_0\bar{y}_0 + n_1\bar{y}_1 + \cdots + n_K\bar{y}_K}{n_0+n_1+\cdots+n_K}\right\}.$$

Therefore,

$$\hat{\mu}_K^{\star} - \hat{\mu}_0^{\star} = \max\left\{0,\; \max_{0\leq i < j\leq K}\left(\frac{n_j\bar{y}_j + \cdots + n_K\bar{y}_K}{n_j+\cdots+n_K} - \frac{n_0\bar{y}_0 + \cdots + n_i\bar{y}_i}{n_0+\cdots+n_i}\right)\right\}. \qquad (15.6)$$
The difference $\hat{\mu}_K^{\star} - \hat{\mu}_0^{\star}$ can be represented as a simple maximum term as in (15.6).
A natural way of applying the MCT principle to Marcus’ (1976) approach is to
identify each element of (15.6) as a contrast. However, a closed form expression for
the resulting contrast matrix C MC for Marcus’ test depends on the number of dose
levels.
correlation between each pair of the $r$ estimated contrasts, say $\mathbf{c}_l$ and $\mathbf{c}_m$, under $H_0$ is

$$\rho_{lm} = \frac{\sum_{i=0}^{K} c_{li} c_{mi}/n_i}{\sqrt{\left(\sum_{i=0}^{K} c_{li}^2/n_i\right)\left(\sum_{i=0}^{K} c_{mi}^2/n_i\right)}},$$

assuming that the errors are normally distributed with homogeneous group-specific variances. Thus, the inference for MCTs can be based on the multivariate $t$-distribution (Genz and Bretz 2009).
In this section, we use the multcomp package (Bretz et al. 2010) to illustrate the
analysis for the human epidermal carcinoma cell line data. To automatically obtain
Williams’ and Marcus’ contrast coefficients from the package, the dose levels are
required.
> library(multcomp)
> x.res=as.factor(c(rep(1,3),rep(2,3),rep(3,3),rep(4,3))) ##dose levels
> n <- table(x.res)
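The contrast matrices themselves can then be produced with contrMat(); calls of the following form generate the matrices discussed below (conWilliams is the object name used later in this section, the other names are assumptions):

conWilliams <- contrMat(n, type = "Williams")
conMarcus   <- contrMat(n, type = "Marcus")
conDunnett  <- contrMat(n, type = "Dunnett", base = 1)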
where n is a vector containing the sample sizes under each dose group and base
specifies the control group for Dunnett’s test.
It is easy to note that Williams’ contrasts are all contained in Marcus’ contrasts
(the fourth, second, and first rows). These are suitable for concave dose-response
shapes, as the higher dose groups are being pooled and compared to the control.
Two of Marcus’ contrasts (given by the fifth and sixth rows) seem to be suitable to
detect convex shapes, as they take the average over the lower doses.
For each gene, we first fit the ANOVA model with the dose as factor and then
use the function glht with the contrasts obtained above to obtain the MCT. The
general form of the function glht() is given by
glht(model, linfct, alternative = c("two.sided", "less", "greater"), ...)
where linfct needs to be specified as the matrix of linear functions, which can
be a matrix of contrast coefficients, a symbolic description of linear hypothesis, or
multiple comparisons of means obtained from the mcp() function. The function
mcp() can be used to create multiple comparisons defined in the types of the
function contrMat().
The example of using this function to obtain Williams-type MCT is shown below
for the first gene:
> amod <- aov(gene1 ˜ x.res)
> summary(glht(amod, linfct = mcp(x.res = "Williams")))
Simultaneous Tests for General Linear Hypotheses
Multiple Comparisons of Means: Williams Contrasts
Fit: aov(formula = gene1 ˜ x.res)
Linear Hypotheses:
Estimate Std. Error t value p value
C 1 == 0 0.20915 0.20108 1.040 0.474
C 2 == 0 0.08395 0.17414 0.482 0.835
C 3 == 0 -0.01657 0.16418 -0.101 0.997
(Adjusted p values reported -- single-step method)
For analysis of the whole case study data, the following code can be used to
record all the p values of the MCT of all the genes:
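One way to collect, for every gene, the smallest single-step adjusted p value of the Williams-type MCT is sketched below (a sketch, not the original panel); the expression matrix express (genes in rows, arrays in columns) is a hypothetical object name.

pWilliams <- apply(express, 1, function(y) {
  amod <- aov(y ~ x.res)
  mct  <- summary(glht(amod, linfct = mcp(x.res = "Williams")))
  min(mct$test$pvalues)       # smallest adjusted p value over the contrasts
})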
After adjusting for multiplicity using the BH-FDR procedure, the numbers of
significant genes that result from using Williams- and Marcus-type MCTs are 2,880
and 3,303, respectively. These numbers are slightly smaller than those obtained with the Williams' and Marcus' tests presented in Chap. 7. The plot of adjusted p values is given in
Fig. 15.1.
The MCTs discussed in the previous sections assume a simple order alternative. In
this section, we discuss two other ordered alternatives: the simple tree alternative
and the unimodal partial order alternative (umbrella alternative). The latter was
discussed in Chap. 11 where we discussed the ORIOGEN and ORICC algorithms
for inference and clustering of order-restricted gene profiles. The simple tree
alternative .d0 / Œ.d1 /; : : : ; .dK / can be used if the primary interest of the
analysis is a comparison between dose zero and all other dose levels, .d0 /
.di /; i D 1; : : : ; K, but we do not specify any order restrictions among
.d1 /; : : : ; .dK /. The contrast matrix for Dunnett’s test can be constructed using
the option type = “Dunnett”. We notice that, since there are no order restrictions
among .d1 /; : : : ; .dK /, the contrast matrix for Dunnett’s test implies that at each
dose level the numerator of the test statistic is equal to yNi yN0 , while for Williams’s
test the numerator is equal to O i yN0 .
Fig. 15.1 Raw and adjusted p values for Williams- and Marcus-type MCTs
1 2 3 4
2 - 1 -1 1 0 0
3 - 1 -1 0 1 0
4 - 1 -1 0 0 1
> conWilliams
Multiple Comparisons of Means: Williams Contrasts
1 2 3 4
C 1 -1 0.0000 0.0000 1.0000
C 2 -1 0.0000 0.5000 0.5000
C 3 -1 0.3333 0.3333 0.3333
Dunnett’s test can be performed in the multcomp package using the option
mcp(x.res = “Dunnett”). The panel below shows the test statistics for both
Dunnett’s and Williams’ tests and, as expected, except for the last dose level, the
test statistics are different:
>
> #Dunnett
> amod <- aov(gene2˜x.res)
> summary(glht(amod, linfct = mcp(x.res = "Dunnett")))
Multiple Comparisons of Means: Dunnett Contrasts
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 -0.14830 0.08123 -1.826 0.2340
3 - 1 == 0 0.29035 0.08123 3.574 0.0181 *
4 - 1 == 0 2.31433 0.08123 28.490 <0.001 ***
---
> #Williams
> summary(glht(amod, linfct = mcp(x.res = "Williams")))
Multiple Comparisons of Means: Williams Contrasts
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
C 1 == 0 2.31433 0.08123 28.49 <1e-06 ***
C 2 == 0 1.30234 0.07035 18.51 <1e-06 ***
C 3 == 0 0.81880 0.06633 12.35 <1e-06 ***
---
The umbrella point h is usually unknown. Bretz and Hothorn (2001, 2003) investi-
gated MCTs for testing non-monotone order-restricted alternatives. The basic idea
is that (for an up–down umbrella profile) the primary interest is not to investigate
the (possibly monotone) decline of the response after dose h but to assess the trend
up to dose h (Bretz and Hothorn 2003). Hence, for each possible value of h, Bretz
and Hothorn (2003) define the following alternative:
$$H_1: \mu_0 \leq \mu_1 \leq \cdots \leq \mu_h,$$

with at least $\mu_0 < \mu_h$. For the case study of human epidermal carcinoma data,
there are four dose levels, and therefore, the umbrella point can be located at
the first or the second dose levels (with the control as zero dose). The contrast
matrix for a Williams-type umbrella alternative can be constructed using the option
type = “UmbrellaWilliams”. Note that the first three rows (C1–C3) of the contrast matrix below are identical to the contrasts defined for Williams' MCT for the simple order alternative. The three contrasts C4–C6 correspond to alternatives with umbrella points at the second and third dose levels. For example, Fig. 15.2 illustrates a pattern in which there is a monotone trend up to the third dose level. This pattern corresponds to contrast C5. Note that the mean gene expression in the last
Fig. 15.2 Illustrative example of an umbrella alternative. Possible mean patterns for a C5 profile. Solid line: $\mu_0 < \mu_1 = \mu_2 = \mu_3$. Dotted line: $\mu_0 < \mu_1 = \mu_2 < \mu_3$. Dotted-dashed line (an umbrella pattern with umbrella point at the third dose level): $\mu_0 < \mu_1 = \mu_2 > \mu_3$
dose level is not restricted, i.e., all relationships $\mu_2 = \mu_3$, $\mu_2 < \mu_3$, and $\mu_2 > \mu_3$ are possible.
1 2 3 4
C 1 -1 0.0000 0.0000 1.0000
C 2 -1 0.0000 0.5000 0.5000
C 3 -1 0.3333 0.3333 0.3333
C 4 -1 0.0000 1.0000 0.0000
C 5 -1 0.5000 0.5000 0.0000
C 6 -1 1.0000 0.0000 0.0000
Figure 15.3 shows an illustrative example for one gene. Note that the only difference (in terms of gene expression) between panels (a) and (b) is in the last dose level, for which the gene expression in panel (b) is shifted up.
For the expression levels presented in Fig. 15.3a, the test statistic for the maximum contrast is equal to 8.398 ($p < 0.0001$), indicating that the null hypothesis is rejected and that the most likely dose-response relationship is pattern C5, $\mu_0 < \mu_1 = \mu_2$. For the expression levels presented in Fig. 15.3b, the test statistic for the maximum contrast test is equal to 9.041 ($p < 0.001$), indicating that the most likely profile is C3, $\mu_0 < \mu_1 = \mu_2 = \mu_3$.
Fig. 15.3 Umbrella alternative. Illustrative example of one gene. Dashed line: order-restricted mean for up–down umbrella profiles with maximum at the second dose level. Solid line: order-restricted mean for up–down umbrella profiles with maximum at the third dose level
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
C 1 == 0 0.2400 0.1961 1.224 0.51867
C 2 == 0 0.8144 0.1698 4.797 0.00395 **
C 3 == 0 1.0307 0.1601 6.438 < 0.001 ***
C 4 == 0 1.3889 0.1961 7.084 < 0.001 ***
C 5 == 0 1.4260 0.1698 8.398 < 0.001 ***
C 6 == 0 1.4632 0.1961 7.463 < 0.001 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
(Adjusted p values reported -- single-step method)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
C 1 == 0 1.4900 0.1961 7.599 <0.001 ***
C 2 == 0 1.4394 0.1698 8.477 <0.001 ***
C 3 == 0 1.4474 0.1601 9.041 <0.001 ***
C 4 == 0 1.3889 0.1961 7.084 <0.001 ***
C 5 == 0 1.4260 0.1698 8.398 <0.001 ***
C 6 == 0 1.4632 0.1961 7.463 <0.001 ***
---
15.5 Discussion
In this chapter, we have shown a representation of Marcus' test, i.e., the mean difference of the gene expression levels at the highest dose and the control under an order restriction, by means of multiple contrasts, namely, the Marcus-type MCT. In order to answer the testing problem defined in Sect. 15.2.2.1, we transformed the trend tests of Williams and Marcus into the corresponding Williams-type and Marcus-type multiple contrast tests (Bretz 1999, 2006). This method efficiently utilizes the
multivariate t-distribution as the basis for the inference on the MCTs. Therefore,
the FWER is controlled within the gene in this case. At the same time, we have also
applied the BH-FDR procedure across genes to control the overall type I error rate.
This approach of combining two types of multiple testing adjustments (i.e., the
FWER/gene and the FDR/genes) is one option in controlling the type I error in
this setting of ordered-restricted inference using the MCT. This two-stage approach
has the advantage of easy implementation and a clear interpretation. We can also
consider the approache of the FWER/gene and FWER/genes to compare their
powers and controls of the error rate. However, this is beyond the scope of this
book. We propose a simulation study to further investigate this topic.
We have compared this approach to the results of the likelihood ratio test and of Williams' and Marcus' tests discussed in Chap. 7. The number of significant findings is slightly smaller than that detected using Williams' and Marcus' tests. The methods differ in the standard error and in the estimation method, with inference based on permutations or on the exact distribution. Inference and clustering for partial order alternatives were discussed previously in Chap. 11. In this chapter, we addressed this topic within the MCT framework and discussed MCTs for both the simple tree alternative (using the MCT for Dunnett's test) and umbrella alternatives (using the Williams-type umbrella MCT).
References
Ayer, M., Brunk, H. D., Ewing, G. M., Reid, W. T., & Silverman, E. (1955). An empirical
distribution function for sampling with incomplete information. The Annals of Mathematical
Statistics, 26, 641–647.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., & Brunk, H. D. (1972). Statistical inference
under order restrictions. New York: Wiley.
Bretz, F. (1999). Powerful modifications of Williams’ test on trend. Ph.D. dissertation. Vom
Fachbereich Gartenbau der Universität Hannover, Hannover. http://www.biostat.uni-hannover.de/fileadmin/institut/pdf/thesis_bretz.pdf.
Bretz, F., & Hothorn, L. A. (2001). Testing dose-response relationship with a priori unknown,
possibly non monotone shapes. Journal of Biopharmaceutical Statistics, 11, 193–207.
Bretz, F., & Hothorn, L. A. (2003). Statistical analysis of monotone and non-monotone dose-
response data from in vitro toxicological assays. Alternatives to Laboratory Animals (ALTA)
(Suppl. 1), 31, 81–90.
Bretz, F. (2006). An extension of the Williams trend test to general unbalanced linear models.
Computational Statistics & Data Analysis, 50, 1735–1748.
Bretz, F., Hothorn, T., & Westfall P. (2010). Multiple comparisons using R. Boca Raton: CRC
press.
Dunnett, C. W. (1955). A multiple comparison procedure for comparing several treatments with a
control. Journal of the American Statistical Association, 50, 1096–1121.
Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics, 20, 482–
491.
Genz, A., & Bretz, F. (2009). Computation of multivariate normal and t probabilities. Lecture Notes
in Statistics, 195 (Springer).
Hothorn, L. A. (2006). Multiple comparisons and multiple contrasts in randomized dose-response
trials—Confidence interval oriented approaches. Journal of Biopharmaceutical Statistics, 16,
711–731.
Marcus, R. (1976). The powers of some tests of the equality of normal means against an ordered
alternative. Biometrika, 63, 177–183.
Mukerjee, H., Robertson, T., & Wright, F. T. (1986). Multiple contrast tests for testing against
a simple tree ordering. In R. Dykstra (Ed.), Advances in order restricted statistical inference
(pp. 203–230). Berlin: Springer.
Mukerjee, H., Robertson, T., & Wright, F. T. (1987). Comparison of several treatments with a
control using multiple contrasts. Journal of the American Statistical Association, 82, 902–910.
Robertson, T., Wright, F. T., & Dykstra, R. L. (1988). Order restricted statistical inference.
New York: Wiley.
Somerville, P. (1997). Multiple testing and simultaneous confidence intervals: calculation of
constants. Computational Statistics & Data Analysis, 25, 217–233.
Somerville, P. (1999). Critical values for multiple testing and comparisons: One step and step down
procedures. Journal of Statistical Planning and Inference, 82(1), 129–138(10).
Tukey, J. W. (1953). The problem of multiple comparisons. Unpublished manuscript. In The
collected works of John W. Tukey VIII. Multiple comparisons, 1948–1983 (pp. 1–300).
New York: Chapman & Hall.
Williams, D. A. (1971). A test for differences between treatment means when several dose levels
are compared with a zero dose control. Biometrics, 27, 103–117.
Williams, D. A. (1972). The comparison of several dose levels with a zero dose control.
Biometrics, 28, 519–531.
Chapter 16
Simultaneous Inferences for Ratio Parameters
Using Multiple Contrasts Test
16.1 Introduction
As we mentioned in Sect. 10.1, for the case study of epidermal carcinoma cell line
data, 3,499 genes were found to be significant when the LRT (the $\bar{E}_{01}^2$ test statistic)
was used to test the null hypothesis (3.2) against the order-restricted alternative (3.3)
or (3.4) and the BH-FDR procedure was used to control the FDR. However, the
rejection of the null hypothesis (3.2) does not indicate the magnitude by which gene
expression increases or decreases. In this chapter, we wish to search for genes for
which the mean gene expression increases by $100\,\delta\%$ from the control for at least one (perhaps the highest) dose, where $\delta$ is a relative biological ratio of interest.
Genes can be tested for a prespecified $\delta$ and can be ranked based on the significance of the ratio test. Genes found significant with a large $\delta$ are of higher
interest to the biologists. In the case study of human epidermal carcinoma cell line
data, the experiment includes four EGF doses (three doses and a control) for the
control compound and three arrays per dose level. Figure 16.1 shows four genes, for
which $\hat{\mu}_K^{\star}/\hat{\mu}_0^{\star} = 2.1$ (panels a and b) and $\hat{\mu}_K^{\star}/\hat{\mu}_0^{\star} = 0.48$ (panels c and d). The
D. Lin ()
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
Z. Shkedy T. Burzykowski
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]; [email protected]
G.D. Djira
Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
e-mail: [email protected]
L.A. Hothorn
Institute of Biostatistics, Leibniz University Hannover, Hannover, Germany
e-mail: [email protected]
Fig. 16.1 Panels (a) and (b): example of two genes with an increasing trend where $\hat{\mu}_K^{\star}/\hat{\mu}_0^{\star} = 2.1$, but the estimated mean expression at the highest dose $\hat{\mu}_K^{\star}$ for the gene in panel (b) is larger than that for the gene in panel (a). Panels (c) and (d): example of two genes with a decreasing trend, $\hat{\mu}_K^{\star}/\hat{\mu}_0^{\star} = 0.48$, but the estimated mean expression at the highest dose $\hat{\mu}_K^{\star}$ for the gene in panel (c) is smaller than that for the gene in panel (d)
estimated mean expression for the control dose $\hat{\mu}_0^{\star}$ is different for genes a and b (genes c and d) and, clearly, the estimated mean expression for the highest dose $\hat{\mu}_K^{\star}$ is different for genes a and b (genes c and d). The aim of the study is to identify genes for which the increase (decrease) of the mean gene expression from baseline is at least 110%, regardless of the mean gene expression $\mu_0^{\star}$ at the zero dose. Note that for log-transformed data, the ratio test discussed in this chapter is less attractive, since one can use the multiple contrast tests (MCTs) discussed in Chap. 15 for the difference. However, we use the human epidermal carcinoma data to illustrate the potential use of the ratio test for the cases in which the ratio (the fold change) is of primary interest.
The four t-type test statistics (Williams’, Marcus’, M, and the modified M),
which were considered in the previous chapters, can be used to test the mean
expression levels between the highest dose and the control. Bretz (2006) proposed
the use of MCTs for Williams’ and Marcus’ tests, when the question of interest
in the dose-response study is formulated as multiple contrasts, i.e., as differences
of treatment means. In this chapter, we extend this idea to test ratios of linear
We have discussed the link between Williams- and Marcus-type MCTs and the
corresponding testing procedures in Chap. 15. The link between the ratio test
and the MCT can be established in a similar way. In this section, we introduce a
more general procedure used to perform multiple testing for several ratios of linear
combination of treatment means.
A two-sided multiple contrast test for ratios of linear combinations of means
.di / for doses di , i D 0; : : : ; K, is given by
c 0` c 0`
H0` W D against H1` W ¤ ; ` D 1; : : : ; r; (16.1)
d 0` d 0`
where c ` and d ` are vectors of known contrast coefficients associated with the
numerator and denominator of the `th ratio, is the relative threshold, and
D ..d0 /; : : : ; .dK //0 . Note that one can also test the r ratios against different
thresholds ` , ` D 1; : : : ; r.
The likelihood ratio statistics to test the set of hypotheses in (16.1) are given by
d 0` b
c 0` b
T` . / D t. /; ` D 1; : : : ; r; (16.2)
b
Œ 2d 0 M d 1
2 c 0` M d ` C c 0` M c ` 2
` `
.c i d i /0 M .c j d j /
ij D p p ; (16.3)
.c i d i /0 M .c i d i / .c j d j /0 M .c j dj /
ProbfjT` . /j c1˛;R 0 ; ` D 1; 2; : : : ; rg D 1 ˛:
252 D. Lin et al.
For the hypotheses in (16.1), we conclude H1` if jT` . /j > c1˛;R 0 . The associated
multiplicity-adjusted two-sided p values can be calculated as
16.3 Ratio Test for the Highest Dose Versus the Control
As mentioned in Sect. 16.1, one of our primary interests is the inference about the
ratio of the mean gene expression (under an order restriction) at the highest dose
and the control in a dose-response microarray experiment. For this purpose, we
formulate one-sided ratio hypotheses about an increasing and decreasing trend for
each gene, respectively, as follows:
$$H_0^{U}: \frac{\mu(d_K)}{\mu(d_0)} \leq 1+\delta \quad \text{vs.} \quad H_1^{U}: \frac{\mu(d_K)}{\mu(d_0)} > 1+\delta \ \text{ and } \ \mu(d_0) \leq \cdots \leq \mu(d_K),$$

$$H_0^{D}: \frac{\mu(d_K)}{\mu(d_0)} \geq 1-\delta \quad \text{vs.} \quad H_1^{D}: \frac{\mu(d_K)}{\mu(d_0)} < 1-\delta \ \text{ and } \ \mu(d_0) \geq \cdots \geq \mu(d_K). \qquad (16.4)$$
Note that under the (increasing) order restriction, μ_K/μ_0 is the largest ratio, and therefore μ(d_K)/μ(d_0) = max(μ(d_j)/μ(d_i)), 0 ≤ i < j ≤ K. But this is not necessarily true for max(c'μ/d'μ), where c and d are arbitrary contrast coefficients. In other words, under the order restriction, μ(d_K)/μ(d_0) is the maximum of the ratios of consecutive doses, which is true for Williams' and Marcus' tests.
These hypotheses are one-sided special cases of the general formulation given in the previous section. The relative thresholds are ψ = 1 + δ and ψ = 1 − δ for the increasing and decreasing trends, respectively. The relative mean difference δ quantifies the biological importance of the trend with respect to the increasing doses. For δ > 0, rejecting the null hypothesis in (16.4) implies that the mean gene expression at the highest dose increases or decreases by 100·δ% compared to the mean expression for the control. Formulating the hypotheses as ratios implies that the relative threshold δ is chosen independently of the gene expression for the control. In general, for hypotheses stated as ratios, the thresholds do not depend on the dimension of the outcome variable.
For example, when K = 3, the numerator contrast matrix C (with rows c_ℓ') and the denominator contrast matrix D (with rows d_ℓ') defining Williams' trend test as ratios are respectively given by

$$\mathbf{C} = \begin{pmatrix} 0 & 0 & 0 & 1 \\[2pt] 0 & 0 & \dfrac{n_2}{n_2+n_3} & \dfrac{n_3}{n_2+n_3} \\[6pt] 0 & \dfrac{n_1}{n_1+n_2+n_3} & \dfrac{n_2}{n_1+n_2+n_3} & \dfrac{n_3}{n_1+n_2+n_3} \end{pmatrix}$$

and

$$\mathbf{D} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
The C and D matrices define a ratio version of Williams’ contrast matrix
in (15.5). In general, standard multiple comparison procedures like Dunnett’s
(comparisons with a control, Dilba et al. 2004), Tukey’s (all pairwise comparisons),
Williams’ (Hothorn and Djira 2010), and Marcus’ tests for trend and many others
can also be expressed as ratios using appropriate numerator and denominator
contrast coefficients.
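As a small illustration of the matrices shown above, the following R sketch builds the Williams-type numerator and denominator contrast matrices for K = 3; the numbers of replicates n1, n2, n3 at the three active doses are hypothetical placeholders, and such user-defined matrices could then be supplied to the Num.Contrast and Den.Contrast arguments of sci.ratio() (and, analogously, simtest.ratio()).

n <- c(3, 3, 3)   ## hypothetical numbers of replicates at doses d1, d2, d3
## numerator contrasts: highest dose, pooled two highest doses, pooled three doses
C <- rbind(c(0, 0, 0, 1),
           c(0, 0, n[2], n[3]) / sum(n[2:3]),
           c(0, n[1], n[2], n[3]) / sum(n))
## denominator contrasts: the control mean in every row
D <- matrix(c(1, 0, 0, 0), nrow = 3, ncol = 4, byrow = TRUE)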
In this section, we use the package mratios (Dilba et al. 2007) to illustrate the
ratio test analysis for the human epidermal cell line data. Note that the ratio test
is only applied to the 3,499 genes which are declared as significantly differentially
expressed by using the likelihood ratio test in Chap. 7.
In order to test a specific δ, we need to specify the value of the margin in the function simtest.ratio(). This function is applied separately to genes with increasing and with decreasing monotonic trends, because we are interested in conducting a one-sided ratio test. The significance level for controlling the family-wise error rate for the r contrasts of a given contrast test needs to be specified in the function as well.
The general form of the function simtest.ratio() is given by
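The signature shown here is a sketch reconstructed from the calls used below and from the form of sci.ratio() displayed in Sect. 17.3; the exact argument order and defaults may differ slightly between versions of the mratios package.

simtest.ratio(formula, data, type = c("Dunnett", "Tukey", "Marcus", ...),
              base = 1, alternative = c("two.sided", "less", "greater"),
              Margin.vec = NULL, FWER = 0.05,
              Num.Contrast = NULL, Den.Contrast = NULL, names = TRUE)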
Several types of multiple contrasts are available in the mratios package: for example, "Dunnett" for many-to-one comparisons with the control in the denominator and "Tukey" for all-pairwise comparisons. For example, for one gene with an increasing monotonic trend, we want to test whether the ratio of the mean gene expression at the highest dose to the control is larger than 1.1 (ψ = 1.1, i.e., δ = 0.1), using Marcus-type contrasts.
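A sketch of such a single-gene call is given below; the row index used for data.sign is a hypothetical choice of one gene with an increasing trend, and the printed output (not reproduced here) lists the adjusted p values for the six Marcus-type contrasts.

> geneA <- data.sign[1, ]   ## hypothetical row index: one gene with an increasing trend
> simtest.ratio(geneA ~ x.res, data = data.frame(geneA, x.res),
+               type = "Marcus", alternative = "greater",
+               Margin.vec = 1.1, FWER = 0.05, names = TRUE)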
The lowest p value can be observed for the sixth contrast, which can be used
for further analysis when the p values are collected from all the genes to adjust for
multiplicity.
To obtain the p values for all the 1,600 genes with increasing trends, we can use
the apply function to efficiently gather the output into one vector. The following
code can be used to test for genes with increasing monotonic trends:
> dat.mat.up <- data.sign[direction=="u",] ## extract genes with
## significant increasing monotonic trends
> pval.up <- apply(dat.mat.up, 1, function(genei)
+ {min(simtest.ratio(genei~x.res, data=data.frame(genei, x.res),
+ type = "Marcus", alternative = "greater", Margin.vec = 1.1,
+ FWER = 0.05, names = TRUE)$p.value.adj)} )
> pval.up
[1] 6.038935e-01 1.147182e-02 1.210143e-14 1.607646e-01 9.996908e-01
[6] 9.999983e-01 8.913892e-02 0.000000e+00 1.541190e-01 1.155104e-05
...
[1591] 0.975922437 0.733315102 0.173187492 0.705281319 0.935561286
[1596] 0.999718217 0.030762376 0.839369199 0.059804112 0.008509572
Similarly, for the 1,899 genes with decreasing trends, the following code can be
used:
> dat.mat.dn <- data.sign[direction=="d",] ## extract genes with
## significant decreasing monotonic trends
> pval.dn <- apply(dat.mat.dn, 1, function(genei) {min(
+ simtest.ratio(genei~x.res, data=data.frame(genei, x.res),
+ type = "Marcus", alternative = "less", Margin.vec = 0.9,
+ FWER = 0.05, names = TRUE)$p.value.adj)} )
> pval <- NULL
> pval[direction=="u"] <- pval.up
> pval[direction=="d"] <- pval.dn
> padj <- p.adjust(pval, method="BH")
> sum(padj[direction=="u"]<=0.05)
429
> sum(padj[direction=="d"]<=0.05)
330
While testing the null and alternative hypotheses defined in (16.4), the value of δ can be chosen according to the biological interest. For genes with an increasing trend, the alternative focuses on increases of more than 100·δ% relative to the control; for genes with a decreasing trend, decreases of more than 100·δ% relative to the control are of interest. When testing δ = 0, in fact, a two-sided test is made. Nevertheless, in this case, we can categorize significant test results into increasing or decreasing trends.
Table 16.1 Number of genes with statistically significant test results for H_1^{Up} and H_1^{Down} with δ = 0, 0.05, 0.1, 0.15, and 0.2

δ                        0        0.05     0.1      0.15     0.2
H_1^{Up} rejected        2,879    934      429      247      142
H_1^{Down} rejected      3,387    968      330      133      66
For each gene in the case study dataset, an MCT with Marcus' contrasts is applied, and the BH-FDR procedure is used to control the FDR for all the genes. For testing δ = 0, we obtain 2,024 genes with statistically significant increasing trends and 2,222 genes with decreasing trends. The number of significant tests equals 4,244 and is slightly different from the number of 3,499 obtained using the $\bar{E}^2_{01}$ test in Chap. 10. The difference is due to the fact that the inference for MCTs is based on the multivariate t-distribution, while the $\bar{E}^2_{01}$ test is carried out based on the null distribution approximated by using permutations.
Moreover, different values of δ (0.05, 0.1, 0.15, and 0.2) are also used for testing H_0^{Up} (H_0^{Down}) against the one-sided alternative H_1^{Up} for genes with an increasing trend (H_1^{Down} for genes with a decreasing trend). For an increasing trend, tests for 934 genes are found significant for δ = 0.05, 429 genes for δ = 0.1, 247 genes for δ = 0.15, and 142 genes for δ = 0.2. As expected, as δ increases, the number of significant genes decreases. The same is true for genes with a decreasing trend. Note that the genes with significant test results for a larger δ are always a subset of the genes with significant test results for a smaller δ.
The R code given above can be modified to test for genes with different values of δ. A summary of the numbers of significant genes is shown in Table 16.1.
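A sketch of such a modification for the increasing direction is given below (a hypothetical wrapper, not part of the book's scripts). Note that in the analysis above the BH adjustment was applied jointly to the p values of the increasing- and decreasing-trend genes, so the counts in Table 16.1 correspond to that joint adjustment rather than to this simplified per-direction version.

> deltas <- c(0, 0.05, 0.1, 0.15, 0.2)
> n.sig.up <- sapply(deltas, function(delta) {
+   ## one ratio test per gene with an increasing trend, margin 1 + delta
+   p.up <- apply(dat.mat.up, 1, function(genei)
+     min(simtest.ratio(genei ~ x.res, data = data.frame(genei, x.res),
+                       type = "Marcus", alternative = "greater",
+                       Margin.vec = 1 + delta, FWER = 0.05,
+                       names = TRUE)$p.value.adj))
+   sum(p.adjust(p.up, method = "BH") <= 0.05)   ## BH-FDR over these genes only
+ })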
Genes presented in Figs. 16.2 and 16.3 are found to be statistically significant when the $\bar{E}^2_{01}$ test is used. However, the $\bar{E}^2_{01}$ test does not distinguish between genes a and b (δ = 0.2) and genes c and d (δ = 0.1). This distinction can be achieved using the ratio test. Moreover, using the t-type tests (Williams', Marcus', M, and the modified M tests) allows us to test the null hypothesis H_0': μ(d_3) − μ(d_0) = 0 vs. H_1'^{Up}: μ(d_3) − μ(d_0) > 0. However, for a specific shift in mean gene expression between the largest dose and the control, one needs to test H_0'': μ(d_3) − μ(d_0) = θ vs. H_1''^{Up}: μ(d_3) − μ(d_0) > θ. The value of θ is gene specific and depends on the expression level. For example, an increase of 20% for gene a in Fig. 16.2 implies that the estimated difference μ̂*_3 − μ̂*_0 is equal to about 5.6, while an increase of 20% for gene b implies that the estimated difference μ̂*_3 − μ̂*_0 is equal to about 4.2, because the estimated mean for the control dose μ̂*_0 of gene b is larger than that of gene a. In contrast, using the ratio test allows us to test the null hypothesis H_0^{Up}: μ(d_3)/μ(d_0) = 1 + δ vs. H_1^{Up}: μ(d_3)/μ(d_0) > 1 + δ in (16.4) with the same value of δ (= 0.2) for all the genes with increasing trends. The same conclusion can be drawn for genes a, b, c, and d (shown in Fig. 16.3) with decreasing trends: for testing H_0^{Down}: μ(d_3)/μ(d_0) = 1 − δ vs. H_1^{Down}: μ(d_3)/μ(d_0) < 1 − δ, results for genes a and b are statistically significant for δ = 0.2, and results for genes c and d are statistically significant for δ = 0.1.
Fig. 16.2 Example of genes with statistically significant test results for the ratio test with δ = 0.2 (first row) and δ = 0.1 (second row); the four panels show gene expression plotted against dose
16.5 Discussion
In this chapter, we focused on testing the ratio of the mean gene expression between the highest dose and the control using the MCT in the microarray setting. It is a special application of the MCT for testing ratio parameters.
In the microarray setting, it is often expected that an increase of the dose implies a particular percentage increase/decrease of the mean expression level. Thus, this biological difference is predetermined, and the ratio test becomes of special interest. It avoids testing for a gene-specific difference in mean gene expression between two doses, which depends on the gene expression level. Based on the significance of the ratios and the gene expression at the control dose, it is also possible to rank the genes that are found significant using the five test statistics, i.e., the LRT, Williams', Marcus', M, and the modified M tests, which were discussed in Chap. 7.
Fig. 16.3 Example of genes with statistically significant test results for the ratio test with δ = 0.2 (first row) and δ = 0.1 (second row); the four panels show gene expression plotted against dose

In the previous chapter, we have shown a representation of the difference between two mean gene expression levels under an order restriction using multiple contrasts, i.e., the Marcus-type MCT. In order to answer the testing problem defined in (16.4), the ratio test can be conducted using the MCTs with the corresponding multiple ratio
contrasts (Dilba et al. 2007). This method efficiently utilizes the multivariate t-
distribution as the basis for the inference on the multiple ratio contrast tests.
By choosing a set of δ values in advance, we have performed the ratio test for each δ. Increasing δ reduces the number of significant findings. Genes found significant for a larger value of δ, considered together with their gene expression at the control dose, are of higher interest to the scientists for further investigation.
References
Bretz, F. (2006). An extension of the Williams trend test to general unbalanced linear models. Computational Statistics & Data Analysis, 50, 1735–1748.
Dilba, G., Bretz, F., Guiard, V., & Hothorn, L. A. (2004). Simultaneous confidence intervals for ratios with application to the comparison of several treatments with a control. Methods of Information in Medicine, 43, 465–469.
Dilba, G., Bretz, F., Hothorn, L. A., & Guiard, V. (2006a). Power and sample size computations in simultaneous tests for non-inferiority based on relative margins. Statistics in Medicine, 25, 1131–1147.
Dilba, G., Bretz, F., & Guiard, V. (2006b). Simultaneous confidence sets and confidence intervals for multiple ratios. Journal of Statistical Planning and Inference, 136, 2640–2658.
Dilba, G., Schaarschmidt, F., & Hothorn, L. A. (2007). Inferences for ratios of normal means. R News, 7(1), 20–23.
Djira, G. D. (2010). Relative potency estimation in parallel-line assays—Method comparison and some extensions. Communications in Statistics—Theory and Methods, 39, 1180–1189.
Hothorn, L. A., & Djira, G. D. (2010). Ratio-to-control Williams-type test for trend. Pharmaceutical Statistics. doi:10.1002/pst.464.
Chapter 17
Multiple Confidence Intervals for Selected Ratio
Parameters Adjusted for the False
Coverage-Statement Rate
17.1 Introduction
Benjamini and Yekutieli (2005) argued that two types of problems generally arise
when providing statistical inference for multiple parameters: simultaneity refers to
the need to provide inference that simultaneously applies to a subset of several
parameters, while selective inference refers to the need to provide valid inferences
for parameters that are selected after viewing the data. Since dose–response analysis
is provided only for genes that are found to have either an increasing or a decreasing
dose–response relationship, and since the mean dose–response curve for each gene
is a multivariate object, dose–response analysis of microarray experiments can be
viewed as a selection-adjusted simultaneity problem.
In Chap. 16, we discussed using the ratio test in order to select a subset of genes
with a significant increase (decrease) of 100·δ% in the mean gene expression.
In this chapter, we discuss the construction of confidence intervals (CIs) offering
simultaneous coverage for several contrasts of the mean dose–response curve for
the subset of selected genes.
D. Lin ()
Veterinary Medicine Research and Development, Pfizer Animal Health, Zaventem, Belgium
e-mail: [email protected]
D. Yekutieli
Department of Statistics and Operations Research, School of Mathematical Sciences, Tel-Aviv
University, Tel-Aviv, Israel
e-mail: [email protected]
G.D. Djira
Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
e-mail: [email protected]
L.A. Hothorn
Institute of Biostatistics, Leibniz University Hannover, Hannover, Germany
e-mail: [email protected]
The contents of this chapter are organized as follows. Section 17.2 introduces
the methods for constructing the CI for a single ratio and multiple ratios, and
we illustrate how to use the R package mratios to construct CIs for the ratios
in Sect. 17.3 for a single prespecified gene. Section 17.4 discusses constructing
CIs for the subset of selected parameters. Section 17.5 specifically discusses the
construction of CIs for parameters selected by the BH-FDR procedure. An example
of adjusting the false coverage-statement rate (FCR)-controlled CIs using R is given
in Sect. 17.6.
Following the definitions in Sect. 16.2, for a single ratio, say ψ = c'μ/d'μ as given in (17.3), a CI can be constructed using Fieller's theorem (Fieller 1954). A two-sided (1 − α)·100% CI for the parameter ψ is the solution of the inequality

$$|t(\psi)| = \frac{|(\psi\mathbf{d} - \mathbf{c})'\bar{\mathbf{y}}|}{s\,\bigl[\psi^2\,\mathbf{d}'\mathbf{M}\mathbf{d} - 2\psi\,\mathbf{c}'\mathbf{M}\mathbf{d} + \mathbf{c}'\mathbf{M}\mathbf{c}\bigr]^{1/2}} \le t_{\alpha/2}(\nu), \qquad\qquad (17.1)$$

where t_{α/2}(ν) is the 1 − α/2 quantile of the t-distribution with ν = Σ_{i=0}^{K}(n_i − 1) degrees of freedom. The limits of the confidence set in (17.1) can be expressed through a quadratic inequality in ψ:

$$A\psi^2 + B\psi + C \le 0, \qquad\qquad (17.2)$$

where

$$A = (\mathbf{d}'\bar{\mathbf{y}})^2 - \tilde{t}^2 s^2\,\mathbf{d}'\mathbf{M}\mathbf{d}, \qquad
B = -2\bigl[(\mathbf{c}'\bar{\mathbf{y}})(\mathbf{d}'\bar{\mathbf{y}}) - \tilde{t}^2 s^2\,\mathbf{c}'\mathbf{M}\mathbf{d}\bigr], \qquad
C = (\mathbf{c}'\bar{\mathbf{y}})^2 - \tilde{t}^2 s^2\,\mathbf{c}'\mathbf{M}\mathbf{c}, \quad \text{and} \quad \tilde{t} = t_{\alpha/2}(\nu).$$

Depending on the value of the leading coefficient A and the discriminant B² − 4AC, there are three possible solutions to the inequality in (17.2) (Kendall 1999). If A > 0, then it can be shown that also B² − 4AC > 0, and there are two roots of (17.2). Consequently, the CI is a finite interval lying between the two roots of (17.2). The other two cases, when A ≤ 0, result in either a region containing all values lying outside the finite interval defined by the two roots of (17.2) or a region containing the entire ψ-axis; this is commonly referred to as Fieller's problem. If d'ȳ, the denominator of the ratio, is significantly different from 0, the last two cases occur only with a small probability.
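A minimal numerical sketch of this construction is given below. It is a generic illustration, not code from the book: the arguments ybar, M, s2, and nu are assumed stand-ins for the vector of dose means, the matrix M, the pooled variance estimate, and the degrees of freedom, and M = diag(1/n_i) would be the usual choice in a one-way layout.

fieller.ci <- function(cc, dd, ybar, M, s2, nu, alpha = 0.05) {
  tq <- qt(1 - alpha / 2, df = nu)
  ## coefficients of the quadratic inequality (17.2)
  A <- drop((sum(dd * ybar))^2 - tq^2 * s2 * (t(dd) %*% M %*% dd))
  B <- drop(-2 * (sum(cc * ybar) * sum(dd * ybar) - tq^2 * s2 * (t(cc) %*% M %*% dd)))
  C <- drop((sum(cc * ybar))^2 - tq^2 * s2 * (t(cc) %*% M %*% cc))
  disc <- B^2 - 4 * A * C
  if (A > 0 && disc > 0) {
    ## regular case: a bounded interval between the two roots of (17.2)
    sort((-B + c(-1, 1) * sqrt(disc)) / (2 * A))
  } else {
    NA  ## Fieller's problem: the confidence region is unbounded
  }
}

## e.g., ratio of the highest-dose mean to the control mean with K = 3 doses:
## fieller.ci(cc = c(0, 0, 0, 1), dd = c(1, 0, 0, 0), ybar, M = diag(1/n), s2, nu)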
Following the multiple testing problem for ratios in Sect. 16.2, let us now define the unknown ratios of linear combinations of the treatment means by

$$\psi_\ell = \frac{\mathbf{c}_\ell'\boldsymbol{\mu}}{\mathbf{d}_\ell'\boldsymbol{\mu}}, \qquad \ell = 1,\dots,r, \qquad\qquad (17.3)$$

where c_ℓ and d_ℓ are vectors of known contrast coefficients associated with the numerator and denominator of the ℓth ratio, and μ = (μ(d_0), …, μ(d_K))'. To derive simultaneous CIs for ψ = (ψ_1, …, ψ_r)', we apply the following linear form (Fieller 1954) originally used for single ratios. Let

$$L_\ell = (\mathbf{c}_\ell - \psi_\ell\,\mathbf{d}_\ell)'\hat{\boldsymbol{\mu}}, \qquad \ell = 1,\dots,r,$$

with

$$\sigma^2_{L_\ell} = \mathrm{Var}(L_\ell) = \sigma^2\,(\mathbf{c}_\ell - \psi_\ell\,\mathbf{d}_\ell)'\mathbf{M}\,(\mathbf{c}_\ell - \psi_\ell\,\mathbf{d}_\ell), \qquad
S^2_{L_\ell} = S^2\,(\mathbf{c}_\ell - \psi_\ell\,\mathbf{d}_\ell)'\mathbf{M}\,(\mathbf{c}_\ell - \psi_\ell\,\mathbf{d}_\ell).$$

Since S_{L_ℓ} is distributed as (σ²_{L_ℓ} χ²_ν/ν)^{1/2} and is independent of L_ℓ, the test statistic

$$T_\ell(\psi_\ell) = L_\ell / S_{L_\ell}$$

follows a t-distribution, and jointly the statistics follow a multivariate t-distribution with correlation coefficients

$$\rho_{ij} = \frac{(\mathbf{c}_i - \psi_i\mathbf{d}_i)'\,\mathbf{M}\,(\mathbf{c}_j - \psi_j\mathbf{d}_j)}{\sqrt{(\mathbf{c}_i - \psi_i\mathbf{d}_i)'\,\mathbf{M}\,(\mathbf{c}_i - \psi_i\mathbf{d}_i)}\;\sqrt{(\mathbf{c}_j - \psi_j\mathbf{d}_j)'\,\mathbf{M}\,(\mathbf{c}_j - \psi_j\mathbf{d}_j)}}.$$

Simultaneous CIs for ψ_1, …, ψ_r are then obtained from

$$|T_\ell(\boldsymbol{\psi})| = \frac{|(\psi_\ell\,\mathbf{d}_\ell - \mathbf{c}_\ell)'\hat{\boldsymbol{\mu}}|}{S\,\bigl[\psi_\ell^2\,\mathbf{d}_\ell'\mathbf{M}\mathbf{d}_\ell - 2\psi_\ell\,\mathbf{c}_\ell'\mathbf{M}\mathbf{d}_\ell + \mathbf{c}_\ell'\mathbf{M}\mathbf{c}_\ell\bigr]^{1/2}} \le q, \qquad \ell = 1,\dots,r, \qquad\qquad (17.4)$$

where q denotes the corresponding critical value.
In this section, we illustrate the use of the function sci.ratio() in the mratios
package to construct CIs for the multiple ratios of one gene. The general form of the
function is written as,
sci.ratio(y~x, data, type = c("Dunnett", "Tukey", "Marcus", ...), base = 1,
          method = c("Plug", "MtI", "Bonf", "unadj"), Num.Contrast = NULL,
          Den.Contrast = NULL, alternative = c("two.sided", "greater", "less"),
          conf.level = 0.95, names = TRUE)
in which the type of contrasts can be specified in the same way as in the function contrMat, or the user can define numerator and denominator contrast matrices. The methods to estimate the CIs are "Plug" (plugging the maximum likelihood estimators of the ratio parameters into the correlation matrix, as discussed above), "MtI" for the Šidák (1967) or Slepian adjustment, "Bonf" for the Bonferroni adjustment, and "unadj" for unadjusted CIs.
We use two genes found significant for the ratio threshold of 1.2 (i.e., δ = 0.2) as an example to illustrate the use of mratios for the construction of CIs for multiple ratios in the MCTs.
> gene1 <- data.sign[3, ]
> data1 <- data.frame(gene1=gene1, x.res=x.res)
> CI.wil1 <- sci.ratio(gene1~x.res, data=data1, conf.level=0.95, method="Plug",
+                      type="Williams", alternative="greater")
> CI.mar1 <- sci.ratio(gene1~x.res, data=data1, conf.level=0.95, method="Plug",
+                      type="Marcus", alternative="greater")
> gene2 <- data.sign[11, ]
> data2 <- data.frame(gene2=gene2, x.res=x.res)
> CI.wil2 <- sci.ratio(gene2~x.res, data=data2, conf.level=0.95, method="Plug",
+                      type="Williams", alternative="greater")
> CI.mar2 <- sci.ratio(gene2~x.res, data=data2, conf.level=0.95, method="Plug",
+                      type="Marcus", alternative="greater")
> CI.wil1
Simultaneous 95-% confidence intervals
estimate lower
C1 1.3465 1.3162
C2 1.1950 1.1700
C3 1.1226 1.0998
> CI.mar1
Simultaneous 95-% confidence intervals
estimate lower
C1 1.1226 1.0963
C2 1.1950 1.1661
C3 1.2084 1.1849
C4 1.3465 1.3116
C5 1.3617 1.3322
C6 1.3371 1.3107
> CI.wil2
Simultaneous 95-% confidence intervals
estimate lower
C1 2.5878 2.4409
C2 1.9063 1.7987
C3 1.5961 1.5063
> CI.mar2
Simultaneous 95-% confidence intervals
estimate lower
C1 1.5961 1.4899
C2 1.9063 1.7790
C3 1.9298 1.8306
C4 2.5878 2.4141
C5 2.6197 2.4837
C6 2.4257 2.3215
As we can see, the three contrasts of the Williams-type MCT (C1–C3) are included in the Marcus-type MCT (as C4, C2, and C1), as discussed in Sect. 14.3. Since we adjust for six and three CIs for Marcus' and Williams' ratio tests, respectively, the CIs for Marcus' ratio test are slightly wider compared to Williams' CIs.
In dose–response microarray experiments, R is the number of selected genes, and the CIs constructed
for each gene actually consist of the set of marginal CIs constructed for the
contrasts of the mean dose–response curve for this gene. Thus, V is the number
of selected genes for which at least one mean dose–response contrast is not covered
by the corresponding marginal CI, and the level 0:05 FCR control implies that for
approximately 95% of the selected genes all the CIs for the mean dose–response
contrasts cover the corresponding parameter.
In this section, we construct the FCR-adjusted BH-selected CIs for the ratio parameters of the genes found significant by the ratio test with δ = 0.2 in Sect. 16.4, at the significance level of 0.05 with the BH-FDR adjustment. The number of rejected hypotheses for δ = 0.2 is 142 for the increasing trend and 66 for the decreasing trend. The FCR-adjusted 1 − R_CI·q/m CIs for the selected ratios are constructed, where R_CI = 208 and m = 3,499.
By adjusting the confidence level in the function sci.ratio to 1 − R_CI·q/m, i.e., 0.997 (= 1 − 0.05·(208/3,499)), for each gene, the simultaneous CIs for the ratios of genes with increasing trends can be constructed using the following code:
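(The listing itself falls on a page not reproduced here; the sketch below is a hypothetical reconstruction consistent with the output shown next, assuming sel.up holds the row indices of the 142 selected genes with increasing trends, with data.sign and x.res as used earlier.)

> conf.fcr <- 1 - 208 * 0.05 / 3499   ## FCR-adjusted confidence level, about 0.997
> CIs <- lapply(sel.up, function(i) {
+   genei <- data.sign[i, ]
+   sci.ratio(genei ~ x.res, data = data.frame(genei, x.res),
+             type = "Marcus", alternative = "greater",
+             method = "Plug", conf.level = conf.fcr)
+ })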
> CIs[[2]]
Simultaneous 99.7-% confidence intervals
estimate lower
C1 1.5961 1.4115
C2 1.9063 1.6851
C3 1.9298 1.7548
C4 2.5878 2.2857
C5 2.6197 2.3798
C6 2.4257 2.2401
Compared to the adjusted CIs constructed for these two genes in Sect. 17.3 using Marcus' contrast matrix, it is easy to observe that these simultaneous CIs are wider. We also compare the FCR-adjusted simultaneous CIs with the Bonferroni-adjusted CIs, which are expected to be wider still. Figure 17.1 shows the CIs for ten selected genes found significant by the ratio test with δ = 0.2, using the Bonferroni adjustment (with a confidence level of 0.9999857 (= 1 − 0.05/3,499)), the FCR adjustment, and no multiplicity adjustment (namely, unadjusted).
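For reference, the three confidence levels compared in Fig. 17.1 follow from simple arithmetic on the quantities quoted above:

> conf.unadj <- 0.95                   ## no multiplicity adjustment
> conf.fcr   <- 1 - 208 * 0.05 / 3499  ## FCR-adjusted: approximately 0.997
> conf.bonf  <- 1 - 0.05 / 3499        ## Bonferroni: approximately 0.9999857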
Figure 17.2 shows the width of the CIs for ratios of mean gene expressions
between the highest dose and the control for the 208 significant genes. The
unadjusted CIs (dotted line with pluses shown in Fig. 17.2) are always the shortest,
while the Bonferroni CIs are the widest (solid line).
17.7 Discussion
In the microarray setting, the control of the FDR is well addressed when testing
thousands of genes simultaneously (Ge et al. 2003). The multiplicity issue in the construction of simultaneous CIs for the selected genes needs to be addressed in this context as well. In this chapter, our aim was to construct simultaneous CIs for the ratio parameters of the genes found significant by the ratio test in Sect. 16.4.

Fig. 17.1 Example of CIs constructed by the Bonferroni-adjusted, FCR-adjusted, and unadjusted methods for selected genes found significant by the ratio test with δ = 0.2; the x-axis of each panel shows the ratio parameter (the mean gene expression at the highest dose relative to the control)
We applied the procedure proposed by Benjamini and Yekutieli (2005) to adjust the simultaneous confidence levels so that the FCR is controlled. The FCR-adjusted BH-selected procedure links the control of the FCR in constructing CIs with the control of the FDR in testing the hypotheses. We compared the length of the CIs obtained using the FCR adjustment, no adjustment, and the Bonferroni adjustment. The constructed FCR-adjusted BH-selected CIs did not cover the null value of the parameter tested, while the Bonferroni-adjusted CIs were too wide. The use of the FCR-adjusted CIs should be highlighted as an analogue to the control of the FDR in addressing the multiple testing problem in the microarray setting.
Fig. 17.2 Length of CIs for the 208 genes found significant by the ratio test with δ = 0.2, for comparing the mean gene expression of the highest dose versus the control, using the unadjusted, Bonferroni, and FCR-adjusted approaches (x-axis: selected genes; y-axis: CI length)
References
Benjamini, Y., & Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association, 100, 71–81.
Dilba, G., Bretz, F., Guiard, V., & Hothorn, L. A. (2004). Simultaneous confidence intervals for ratios with applications to the comparison of several treatments with a control. Methods of Information in Medicine, 43(5), 465–469.
Dilba, G., Bretz, F., Hothorn, L. A., & Guiard, V. (2006). Power and sample size computations in simultaneous tests for non-inferiority based on relative margins. Statistics in Medicine, 25, 1131–1147.
Djira, G. D. (2010). Relative potency estimation in parallel-line assays—Method comparison and some extensions. Communications in Statistics—Theory and Methods, 39, 1180–1189.
Fieller, E. C. (1954). Some problems in interval estimation. Journal of the Royal Statistical Society, Series B, 16, 175–185.
Ge, Y., Dudoit, S., & Speed, T. P. (2003). Resampling-based multiple testing for microarray data analysis (Technical report 633). University of California, Berkeley.
Kendall, M., Stuart, A., Ord, J. K., & Arnold, S. (1999). The advanced theory of statistics, Volume 2A: Classical inference and the linear model. Oxford University Press.
Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62, 626–633.
Chapter 18
Interfaces for Analyzing Dose–Response Studies
in Microarray Experiments: IsoGeneGUI
and ORIOGEN
18.1 Introduction
According to Ernst and Bar-Joseph (2006), 39.1% of the 786 datasets in the Gene Expression Omnibus in 2005 are studies with an order-restricted design variable such as age, time, temperature, or dose. Table 18.1 presents a list of free software developed for the analysis of gene expression experiments with an order-restricted design variable. There is substantial overlap between the different packages presented in Table 18.1, and the same or similar analyses can be conducted using more than one package.
In this chapter, we present two interfaces for the analysis of dose–response microarray data. The IsoGeneGUI is a menu-based R package available from Bioconductor. The package has the same graphical support as the IsoGene package, discussed previously in the book, and therefore all the output provided by the IsoGene library can be produced with the IsoGeneGUI as well. The ORIOGEN package (Peddada et al. 2003) is a Java-based interface which can be used to produce the order-restricted analysis presented in Chap. 11. For illustration of the capacity of both interfaces, we use the antipsychotic study.
S. Pramana ()
Karolinska Institutet, Department of Medical Epidemiology and Biostatistics,
Stockholm, Sweden
e-mail: [email protected]
P. Haldermans
Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-BioStat),
Center for Statistics (CenStat), Hasselt University, Diepenbeek, Belgium
e-mail: [email protected]
T. Verbeke
OpenAnalytics BVBA, Heist-op-den-Berg, Belgium
e-mail: [email protected]
Table 18.1 Free software available for order-restricted analysis of ordered gene expression data

Package      Type   Location                                       Reference
orQA         R      CRAN                                           Klinglmueller et al. (2011)
IsoGene      R      CRAN                                           Pramana et al. (2010a)
IsoGeneGUI   R      Bioconductor                                   Pramana et al. (2010b)
ORIOGEN      Java   http://dir.niehs.gov/dirbb/oriogen/index.cfm    Peddada et al. (2003)
ORIClust     R      CRAN                                           Liu et al. (2009)
STEM         Java   http://www.cs.cmu.edu/jernst/stem               Ernst and Bar-Joseph (2006)
ORCME        R      CRAN                                           Otava et al. (2011)
In the first step of the analysis, the data are loaded into R using the File menu. The package can read data in R workspace (*.RData), Excel (*.xls), or text (*.txt) format. In order to load an R workspace, we choose the following sequence:
File > Open dataset > R workspace
The IsoGeneGUI package provides three options for analysis, which we discussed earlier in Chaps. 3, 7, and 8: (1) analysis with the likelihood ratio test based on the exact distribution of the test statistic (Chap. 3), (2) resampling-based analysis (Chap. 7), and (3) significance analysis of microarrays (Chap. 8).
To perform the analysis with the LRT using the exact distribution of $\bar{E}^2_{01}$, discussed in Chap. 3, we choose the corresponding sequence in the Analysis menu (Fig. 18.1a).
Fig. 18.1 The main windows for the likelihood ratio test. (a) The dialog box for the LRT statistic $\bar{E}^2_{01}$. (b) Numerical output for the LRT statistic $\bar{E}^2_{01}$
The main dialog box for the analysis based on the LRT is shown in Fig. 18.1a.
Note that we can choose to perform the analysis for all the genes in the array or on
a predefined subset of genes. In addition, in order to adjust for multiplicity, one can
select from the menu the multiplicity adjustment method to use and the type I error
rate to control.
For the analysis of the antipsychotic data, all the genes are analyzed and the
exact p values were adjusted using all the methods available in the package. The
Fig. 18.2 Graphical outputs of the likelihood ratio test. Panel (a): volcano plot of $\bar{E}^2_{01}$. Panel (b): raw and adjusted p values for the LRT statistic
output window in Fig. 18.1b shows numbers of significant genes for each selected
adjustment method. In this study, 298 genes were found to be significant using the
FDR-BH method. Default graphs are provided by the package as shown in Fig. 18.2.
All graphical displays can be copied to the clipboard and saved in a separate file, or alternatively can be saved in several image formats (*.ps, *.png, *.jpeg, *.bmp, and *.tiff).
Figure 18.3a shows the main menu for the resampling-based inference discussed in Chap. 7. We need to specify the number of permutations, the test statistic(s) to be used, and the multiplicity adjustment methods. Numerical and graphical output can be produced in the same way as discussed in the previous section. Figure 18.3 shows the main dialog box for the permutation tests.
Fig. 18.3 Resampling-based inference in the IsoGeneGUI. (a) Permutation-based inference; (b) the SAM window
Figure 18.3b presents the window for calculating the SAM regularized test statistics using permutations. The analysis can be carried out with automatic selection of the fudge factor, with a predefined percentile, or without a fudge factor. In the case of analysis with the automatic choice of the fudge factor, the package obtains the fudge factor by minimizing the coefficient of variation (CV) of the median absolute deviations (MAD) of the selected test statistic.
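As an illustration of this idea only (a simplified stand-in, not the IsoGeneGUI implementation), the fudge factor s0 can be chosen among candidate percentiles of the gene-wise standard errors so that the CV of the MADs of the regularized statistics, computed within bins of the standard errors, is smallest:

choose.s0 <- function(r, s, probs = seq(0.05, 0.95, by = 0.05)) {
  ## r: gene-wise numerators of the test statistic; s: gene-wise standard errors
  cand <- quantile(s, probs)
  bins <- cut(s, breaks = unique(quantile(s, seq(0, 1, 0.1))), include.lowest = TRUE)
  cv <- sapply(cand, function(s0) {
    mads <- tapply(r / (s + s0), bins, mad)   ## MAD of the regularized statistic per bin
    sd(mads) / mean(mads)                     ## coefficient of variation across bins
  })
  cand[which.min(cv)]
}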
Once the gene-specific test statistics are computed or loaded from an external (saved) file, we can perform the SAM analysis by following the sequence:
Analysis > Significance Analysis of Microarrays > SAM Analysis
For the antipsychotic study, using 100 permutations, we found that the number of genes with a significant monotonic trend is equal to 170 (FDR = 0.05 and delta = 0.67, see Fig. 18.4a), of which 93 and 77 have increasing and decreasing trends, respectively. Figure 18.4b shows the SAM plot of the expected versus the observed test statistics. Other exploratory plots, summary statistics, and user-defined plots can easily be produced using the Plots menu.
The ORIOGEN package (Peddada et al. 2005) is a Java-based interface which can be used to perform order-restricted inference for ordered gene expression data. In comparison with the IsoGeneGUI, the ORIOGEN package allows testing for partial order alternatives assuming heteroscedasticity. ORIOGEN is a resampling-based inference method and uses the SAM methodology to carry out the analysis. As explained in Chap. 11, ORIOGEN clusters the genes found to be significant into subgroups with the same dose–response profile.
Figure 18.5 shows the main dialog box in which we specify: (1) the input data (a text file with the expression matrix), (2) the output file, (3) the number of dose levels, (4) the number of replicates at each dose level, (5) the number of bootstrap samples, (6) the FDR level, and (7) the quantile for the fudge factor. A complete description of each item in the input window is given in the help file of ORIOGEN.
In the next step, we need to specify in advance the mean profiles of primary interest for the analysis. Note that if we specify only decreasing or increasing profiles, ORIOGEN and the SAM analysis in the IsoGeneGUI perform a similar analysis, although the analysis using the IsoGeneGUI is done under the assumption of homoscedasticity, with automatic selection of the fudge factor to minimize the CV of the MAD of the test statistic, while the analysis in ORIOGEN is done under the assumption of heteroscedasticity and uses a fixed quantile for the fudge factor.
Fig. 18.5 The main input screens of the ORIOGEN package. (a) Specification of input values in ORIOGEN. (b) Specification of mean profiles of interest
Fig. 18.6 Genes with significant increasing/decreasing profiles. (a) Increasing profiles; (b) decreasing profiles
Fig. 18.7 Genes with significant profiles for the analysis of noncyclical profiles. (a) Increasing
profiles; (b) decreasing profiles; (c) inverted umbrella profiles (upturn at 3); (d) umbrella profiles
(downturn at 3)
In Sect. 18.2.3.1, we discussed the SAM analysis for the antipsychotic experiment. In order to perform a similar analysis in ORIOGEN, we need to specify that the profiles of primary interest are either increasing or decreasing. In that way, ORIOGEN performs an analysis for simple order alternatives. Using an FDR level of 10% and s0 = s(5%), 146 genes were found to be significant (compared with 123 genes which were found in Sect. 18.2.3.1 for an FDR of 0.1 and Δ = 0.75, see Fig. 18.3b). The significant genes for each direction are shown in Fig. 18.6.
In this section, we specify all noncyclical profiles as alternative profiles. Since there
are six dose levels, there are ten possible noncyclical profiles.
Profile Selections:
1 Decreasing profile
2 Umbrella profile, downturn at 2
3 Umbrella profile, downturn at 3
4 Umbrella profile, downturn at 4
5 Umbrella profile, downturn at 5
6 Increasing profile
7 Inverted umbrella profile, upturn at 2
8 Inverted umbrella profile, upturn at 3
9 Inverted umbrella profile, upturn at 4
10 Inverted umbrella profile, upturn at 5
Using the FDR level of 10% and s0 = s(5%), 146 genes were found to be significant. Figure 18.7 shows examples of increasing/decreasing and umbrella profiles, with downturn and upturn at the third dose level.
References
Ernst, J., & Bar-Joseph, Z. (2006). STEM: A tool for the analysis of short time series gene
expression data. BMC Bioinformatics, 7, 191.
Klinglmueller, F., Tuechler, T., & Posch, M. (2011). Cross-platform comparison of microarray data
using order restricted inference. Bioinformatics, 27(7), 953–960.
Liu, T., Lin, N., Shi, N., & Zhang, B. (2009). Order-restricted information criterion-based
clustering algorithm. Reference manual. http://cran.r-project.org/web/packages/ORIClust/.
Otava, M., Kasim, A., & Verbeke, T. (2011). Order restricted clustering for microarray expe-
riments. Reference manual. CRAN. http://cran.r-project.org/web/packages/ORCME/ORCME.
pdf.
Pramana, S., Lin, D., Haldermans, P., Shkedy, Z., Verbeke, T., Göhlmann, H., et al. (2010a). IsoGene: An R package for analyzing dose-response studies in microarray experiments. The R Journal, 2(1), 5–12.
Pramana, S., Lin, D., & Shkedy, Z. (2010b). IsoGeneGUI package vignette. Bioconductor. http://www.bioconductor.org.
Peddada, S., Lobenhofer, E. K., Li, L., Afshari, C. A., Weinberg, C. R., & Umbach, D. M. (2003). Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference. Bioinformatics, 19(7), 834–841.
Peddada, S., Harris, S., & Harvey E. (2005). ORIOGEN: order restricted inference for ordered
gene expression data. Bioinformatics, 21(20), 3933–3934.
Index
Adjusted p-values, 85, 86, 89, 93, 109
Adjusting for multiple testing, 83
Akaike information criterion (AIC), 151, 154
Akaike weights, 221
Asymmetric logistic model, 59
Asymmetrical sigmoidal function, 61
Asymptote parameters (4PL), 45
Bayesian inference, 193
Bayesian information criterion (BIC), 151
Bayesian variable selection, 203, 205
BH-FDR, 84
Bonferroni, 83
Bootstrap confidence intervals, 223
BY-FDR, 84
Clustering, 140, 171
Contrast matrix, 235
Contrast vectors, 235
Cumulative sum diagram, 17
Cyclical profile, 74
eBayes(), 98
ED50 parameter (4PL), 45
Efficacy window, 48
Emax (Hill) model, 57
Empirical Bayes' inference: Limma, 97
Estimation under order restrictions, 11, 29
False coverage-statement rate (FCR), 263
False discovery rate (FDR), 82
Family wise error rate (FWER), 82
FCR-adjusted BH-selected CI, 264
Four-parameter logistic model, 44
Functional class scoring, 182
Gene Ontology Consortium, 181
Gene set enrichment analysis, 183
GO terms, 187
Gompertz function, 61
Greatest convex minorant, 19
Growth models, 61
Hierarchical Bayesian models, 193, 195
Holm's procedure, 83
Likelihood ratio test, 35
List for biological significance, 181
lmFit(), 98