Solutions Manual for Statistical Computing with R
by
Maria L. Rizzo
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2008 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are
indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the
publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic,
mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or
retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact
the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides
licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Chapter 3. Methods for Generating Random Variables
Chapter 4. Visualization of Multivariate Data
Chapter 5. Monte Carlo Integration and Variance Reduction
Chapter 6. Monte Carlo Methods in Inference
Chapter 7. Bootstrap and Jackknife
Chapter 8. Permutation Tests
Chapter 9. Markov Chain Monte Carlo Methods
Chapter 10. Density Estimation
Chapter 11. Numerical Methods in R
Preface
This manual contains solutions to all of the exercises in the book “Statistical
Computing with R”, including R code whenever programming was required in the problem.
The solutions manual was prepared with Sweave, so any output shown is unedited.
The chunks of R code in the solutions have been extracted using Stangle, and the
resulting source files are available upon request. The solutions have been checked
in the latest release of R (R-2.6.0), with the latest package versions available at that
release date. Also see personal.bgsu.edu/~mrizzo for updates and the R code for
examples in the book. Comments, corrections, and suggestions are always welcome.
Maria L. Rizzo
Department of Mathematics and Statistics
Bowling Green State University
CHAPTER 3
3.1 Write a function that will generate and return a random sample of size n from the
two-parameter exponential distribution Exp(λ, η) for arbitrary n, λ and η.
The cdf is
$$F(x) = 1 - e^{-\lambda(x - \eta)}, \qquad x \ge \eta.$$
Apply the inverse transformation. Generate a random U ∼ Uniform(0,1). Then
$$U = 1 - e^{-\lambda(X - \eta)} \;\Rightarrow\; -\log(1 - U) = \lambda(X - \eta) \;\Rightarrow\; X = \eta - \frac{1}{\lambda}\log(1 - U).$$
If U ∼ Uniform(0,1) then 1 − U and U are identically distributed, so it is equivalent to generate $X = \eta - \frac{1}{\lambda}\log(U)$.
Recall that the quantiles are given by
$$x_\alpha = \eta - \frac{1}{\lambda}\log(1 - \alpha).$$
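The R code for the generator is not included in this excerpt; a minimal sketch of the inverse transform above (the function name rEXP2 is illustrative) is:
> rEXP2 <- function(n, lambda, eta) {
+     # inverse transform: X = eta - log(U)/lambda, with U ~ Uniform(0,1)
+     u <- runif(n)
+     eta - log(u)/lambda
+ }
> x <- rEXP2(1000, lambda = 2, eta = 3)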
3.2 The standard Laplace distribution has density $f(x) = \frac12 e^{-|x|}$, $x \in \mathbb{R}$. Use the
inverse transform method to generate a random sample of size 1000 from this distribution.
Generate a random u from Uniform(0, 1). To compute the inverse transform,
consider the cases $u \ge \frac12$ and $u < \frac12$ separately. If $u \ge \frac12$ then
$u = \int_{-\infty}^{x} f(t)\,dt = \frac12 + \frac12(1 - e^{-x})$, so deliver $x = -\log(2(1 - u))$.
If $u < \frac12$ then $u = \int_{-\infty}^{x} f(t)\,dt = \frac12 e^{x}$, so deliver $x = \log(2u)$.
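A sketch of the corresponding generator (object names are illustrative) is:
> n <- 1000
> u <- runif(n)
> # apply the two branches of the inverse cdf derived above
> x <- ifelse(u >= 0.5, -log(2 * (1 - u)), log(2 * u))
A density histogram of the sample, as shown below, can then be compared with f(x).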
[Figure: density histogram of the generated Laplace sample (x from −6 to 6).]
[Figure: density histogram of x (values from 0 to 100).]
+ }
> par(mfrow = c(1, 1))
[Figure: 3×3 array of density histograms of the generated samples for parameter values 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, and 4.5.]
The relative sample frequencies agree closely with the theoretical probability
distribution (p) in the second row.
Repeat using the R sample function.
> n <- 1000
> p <- c(0.1, 0.2, 0.2, 0.2, 0.3)
> x <- sample(0:4, size = n, prob = p, replace = TRUE)
> rbind(table(x)/n, p)
0 1 2 3 4
0.104 0.199 0.211 0.196 0.29
p 0.100 0.200 0.200 0.200 0.30
3.6 Prove that the accepted variates generated by the acceptance-rejection sampling al-
gorithm are a random sample from the target density fX .
Suppose that X has density fX and Y has density fY , and there is a constant
c > 0 such that fX (t) ≤ cfY (t) for all t in the support set of X. Let A denote the
set of accepted candidate points from the distribution of Y . Then
$$P(A) = \int F_U\!\left(\frac{f_X(t)}{c f_Y(t)}\right) f_Y(t)\,dt = \int \frac{f_X(t)}{c}\,dt = \frac{1}{c},$$
where $F_U$ is the Uniform(0,1) cdf. For any x, the cdf of the accepted variates is
$$P(Y \le x \mid A) = \frac{P(\{Y \le x\} \cap A)}{P(A)} = c \int_{-\infty}^{x} \frac{f_X(t)}{c}\,dt = F_X(x),$$
so the accepted variates are a random sample from the target density $f_X$.
> rBETA <- function(n, a, b) {
+     # acceptance-rejection sampler for Beta(a, b) with a Uniform(0,1) envelope
+     k <- 0
+     y <- numeric(n)
+     while (k < n) {
+         u <- runif(1)
+         x <- runif(1)
+         if (x^(a - 1) * (1 - x)^(b - 1) > u) {
+             k <- k + 1
+             y[k] <- x
+         }
+     }
+     return(y)
+ }
The function is applied below to generate 1000 Beta(3, 2) variates and the
histogram of the sample is shown with the Beta(3, 2) density superimposed.
[Figure: density histogram of the 1000 Beta(3, 2) variates with the Beta(3, 2) density superimposed.]
[Figure: density histogram of x (values from 0 to 60) and a second panel plotting Q against y (values from 0 to 30).]
+ return(x)
+ }
> x <- rEPAN(10000)
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.991400 -0.343500 0.011010 0.007964 0.361100 0.984300
[Figure: density histogram of the sample generated by rEPAN.]
3.10 Prove that the algorithm given in Exercise 3.9 generates variates from the density $f_e$.
The algorithm is equivalent to the following. Generate $Y_1, Y_2, Y_3$ iid from Uniform(0,1). If $Y_3$ is the maximum let $Y = Y_2$, otherwise let $Y = Y_3$. Thus, Y is the first or second
order statistic of the sample $Y_1, Y_2, Y_3$. Deliver $T = \pm Y$ with probability 1/2, 1/2.
Recall that the cdf of the kth order statistic when n = 3 is given by
$$G_k(y) = \sum_{j=k}^{3} \binom{3}{j} [F(y)]^j [1 - F(y)]^{3-j}.$$
The cdf of Y is
$$G(y) = \frac12 G_1(y) + \frac12 G_2(y) = \frac12\left[(1 - (1 - y)^3) + (3y^2(1 - y) + y^3)\right] = \frac12(3y - y^3),$$
and the density of Y is
$$g(y) = G'(y) = \frac12(3 - 3y^2) = \frac32(1 - y^2), \qquad 0 < y < 1.$$
Since $T = \pm Y$ with probability 1/2 each, the density of T is $\frac12 g(|t|) = \frac34(1 - t^2)$ on $-1 < t < 1$, which is the Epanechnikov kernel $f_e$.
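For reference, the rEPAN generator applied above can be sketched directly from this algorithm (illustrative code, using Uniform(−1, 1) variates so that the ± sign is built in):
> rEPAN <- function(n) {
+     u1 <- runif(n, -1, 1)
+     u2 <- runif(n, -1, 1)
+     u3 <- runif(n, -1, 1)
+     # deliver U2 when |U3| is the largest in absolute value, otherwise deliver U3
+     ifelse(abs(u3) >= abs(u2) & abs(u3) >= abs(u1), u2, u3)
+ }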
[Figure: density histogram of one normal location mixture sample (x from −2 to 6).]
[Figure: 3×3 array of density histograms of the normal location mixture for a sequence of mixing probabilities p.]
From the graphs, we might conjecture that the mixture is bimodal if 0.2 <
p < 0.8. (Some results characterizing the shape of a normal mixture density are
given by I. Eisenberger (1964), “Genesis of Bimodal Distributions,” Technometrics
6, 357–363.)
3.12 Simulate a continuous Exponential-Gamma mixture. Suppose that the rate parameter Λ has a Gamma(r, β) distribution and Y has an Exp(Λ) distribution. That is,
$(Y \mid \Lambda = \lambda) \sim f_Y(y \mid \lambda) = \lambda e^{-\lambda y}$. Generate 1000 random observations from this
mixture with r = 4 and β = 2.
Supply the sample of randomly generated λ as the Exponential rate argument
in rexp.
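A minimal sketch of this two-step generator (the objects r, beta, and x are the ones reused in Exercise 3.13 below) is:
> n <- 1000
> r <- 4
> beta <- 2
> lambda <- rgamma(n, shape = r, rate = beta)   # Lambda ~ Gamma(r, beta)
> x <- rexp(n, rate = lambda)                   # Y | Lambda = lambda ~ Exp(lambda)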
3.13 The mixture in Exercise 3.12 has a Pareto distribution with cdf
$$F(y) = 1 - \left(\frac{\beta}{\beta + y}\right)^{r}, \qquad y \ge 0.$$
(This is an alternative parameterization of the Pareto cdf given in Exercise 3.) The
Pareto density is
$$f(y) = F'(y) = \frac{r\beta^{r}}{(\beta + y)^{r+1}}, \qquad y \ge 0.$$
Below we generate 1000 random observations from the mixture with r = 4 and
β = 2 and compare the empirical and theoretical (Pareto) distributions by graphing
the density histogram of the sample and superimposing the Pareto density curve.
> hist(x, breaks = "Scott", prob = TRUE)
> y <- sort(x)
> fy <- r * beta^r * (beta + y)^(-r - 1)
> lines(y, fy)
[Figure: density histogram of the mixture sample with the Pareto density curve superimposed (x from 0 to 10).]
3.14 Generate 200 random observations from the 3-dimensional multivariate normal distribution having mean vector µ = (0, 1, 2) and covariance matrix
$$\Sigma = \begin{bmatrix} 1.0 & -0.5 & 0.5 \\ -0.5 & 1.0 & -0.5 \\ 0.5 & -0.5 & 1.0 \end{bmatrix}$$
using the Choleski factorization method.
> rmvn.Choleski <- function(n, mu, Sigma) {
+ d <- length(mu)
+ Q <- chol(Sigma)
+ Z <- matrix(rnorm(n * d), nrow = n, ncol = d)
+ X <- Z %*% Q + matrix(mu, n, d, byrow = TRUE)
+ X
+ }
> Sigma <- matrix(c(1, -0.5, 0.5, -0.5, 1, -0.5, 0.5, -0.5,
+ 1), 3, 3)
> mu <- c(0, 1, 2)
> x <- rmvn.Choleski(200, mu, Sigma)
> colMeans(x)
[1] 0.01982928 0.95307678 1.96838036
> cor(x)
[,1] [,2] [,3]
[1,] 1.0000000 -0.5232784 0.5413212
[2,] -0.5232784 1.0000000 -0.5135698
[3,] 0.5413212 -0.5135698 1.0000000
From the pairs plot below it appears that the centers of the distributions agree
with the parameters in µ, and the correlations also agree approximately with the
parameters in Σ.
> pairs(x)
[Figure: pairs plot of the 200 generated observations (var 1, var 2, var 3).]
3.15 Write a function that will standardize a multivariate normal sample for arbitrary n and d.
The transformation used to generate multivariate normal samples by eigen-decomposition was $X = Z\Sigma^{1/2} + J\mu^T$, where $\Sigma^{1/2} = P\Lambda^{1/2}P^T$. To standardize the sample,
$$Y = (X - J\mu^T)\,\Sigma^{-1/2},$$
where $\Sigma^{-1/2} = P\Lambda^{-1/2}P^T$. To standardize the sample using the estimated mean and covariance,
$$Y = (X - J\bar{X}^T)\,S^{-1/2},$$
where $\bar{X}$ is the vector of sample means and $S^{-1/2}$ is computed from the eigen-decomposition of the sample covariance matrix S by the same method used to compute $\Sigma^{-1/2}$.
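A sketch of such a function, standardizing with the estimated mean and covariance and assuming the sample covariance matrix is nonsingular (the function name is illustrative), is:
> standardize <- function(X) {
+     # standardize a multivariate sample: Y = (X - J xbar^T) S^{-1/2}
+     X <- as.matrix(X)
+     xbar <- colMeans(X)
+     e <- eigen(cov(X), symmetric = TRUE)
+     S.inv.sqrt <- e$vectors %*% diag(1/sqrt(e$values)) %*% t(e$vectors)
+     sweep(X, 2, xbar) %*% S.inv.sqrt
+ }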
+ u <- rgamma(n, a, 1)
+ v <- rgamma(n, b, 1)
+ x <- u/(u + v)
+ }, gcFirst = TRUE)
user system elapsed
0.06 0.00 0.06
> set.seed(100)
> system.time(for (i in 1:N) rBETA(n, a, b), gcFirst = TRUE)
user system elapsed
31.08 0.02 31.14
3.18 Write a function to generate a random sample from a Wd (Σ, n) (Wishart) distrib-
ution for n > d + 1 ≥ 1, based on Bartlett’s decomposition.
The following function generates one Wd (Σ, ν) random variate. (Using for
loops for clarity here; see the lower.tri function for a way to avoid the loops.)
> rWISH <- function(Sigma, n) {
+     d <- nrow(Sigma)
+     U <- chol(Sigma)
+     y <- matrix(0, d, d)
+     # Bartlett's decomposition: N(0,1) below the diagonal,
+     # sqrt of a chi-squared(n - j + 1) variate on the jth diagonal
+     for (j in 1:d) {
+         for (i in j:d) y[i, j] <- rnorm(1)
+         y[j, j] <- sqrt(rchisq(1, n - j + 1))
+     }
+     A <- y %*% t(y)
+     return(t(U) %*% A %*% U)
+ }
Try the generator on a few examples:
[,1] [,2] [,3]
[1,] 51.029934 -4.224545 7.992898
[2,] -4.224545 53.988282 -2.174543
[3,] 7.992898 -2.174543 48.808521
[,1] [,2] [,3]
[1,] 60.644509 -2.166379 22.816272
[2,] -2.166379 55.098878 9.795274
[3,] 22.816272 9.795274 48.865629
[,1] [,2]
[1,] 34.94311 -21.66413
[2,] -21.66413 49.22049
[,1] [,2]
[1,] 36.44908 -18.50966
[2,] -18.50966 47.56137
Note that the result is a d × d matrix. To generate a random sample from
$W_d(\Sigma, \nu)$, the results can be returned in a d × d × n array.
> rWish <- function(n, Sigma, nu) {
+     d <- nrow(Sigma)
+     w <- replicate(n, rWISH(Sigma, nu))
+     return(array(w, c(d, d, n)))
+ }
> rWish(3, Sigma = s, 50)
, , 1
[,1] [,2]
[1,] 19.43331 -10.96212
[2,] -10.96212 36.93807
, , 2
[,1] [,2]
[1,] 39.40838 -26.47205
[2,] -26.47205 56.02625
, , 3
[,1] [,2]
[1,] 19.061526 -9.897257
[2,] -9.897257 57.900117
3.19 Suppose that A and B each start with a stake of $10, and bet $1 on consecutive coin
flips. The game ends when either one of the players has all the money. Let Sn be the
fortune of player A at time n. Then {Sn , n ≥ 0} is a symmetric random walk with
absorbing barriers at 0 and 20. Simulate a realization of the process {Sn , n ≥ 0}
and plot Sn vs the time index from time 0 until a barrier is reached.
> A <- 10
> p <- 0.5
> x <- numeric(1000)
> x[1] <- A
> for (i in 2:1000) {
+ incr <- sample(c(-1, 1), size = 1, prob = c(p, 1 -
+ p))
+ x[i] <- incr
+ A <- sum(x)
+ if (isTRUE(all.equal(A, 20)))
+ break
+ if (isTRUE(all.equal(A, 0)))
+ break
+ }
> x <- cumsum(x[1:i])
[Figure: plot of the realization of the random walk (x versus index) from time 0 until a barrier is reached.]
3.20 A compound Poisson process is a stochastic process {X(t), t ≥ 0} that can be represented as the random sum $X(t) = \sum_{i=1}^{N(t)} Y_i$, $t \ge 0$, where {N(t), t ≥ 0} is a Poisson
process and $Y_1, Y_2, \ldots$ are iid and independent of {N(t), t ≥ 0}. Write a program to
simulate a compound Poisson(λ)–Gamma process (Y has a Gamma distribution).
Estimate the mean and the variance of X(10) for several choices of the parameters
and compare with the theoretical values.
> r <- 4
> beta <- 2
> lambda <- 3
> t0 <- 10
> PP <- function(lambda, t0) {
+ Tn <- rexp(1000, lambda)
+ Sn <- cumsum(Tn)
+ stopifnot(Sn[1000] > t0)
+ n <- min(which(Sn > t0))
+ return(n - 1)
+ }
> y <- numeric(1000)
> for (i in 1:1000) {
+ N <- PP(lambda, t0)
+ y[i] <- sum(rgamma(N, shape = r, rate = beta))
+ }
Show that $E[X(t)] = \lambda t E[Y_1]$ and $\mathrm{Var}(X(t)) = \lambda t E[Y_1^2]$. Conditioning on N(t),
$E[X(t)] = E[N(t)]E[Y_1] = \lambda t E[Y_1]$, and by the law of total variance
$\mathrm{Var}(X(t)) = E[N(t)]\mathrm{Var}(Y_1) + \mathrm{Var}(N(t))(E[Y_1])^2 = \lambda t\left(\mathrm{Var}(Y_1) + (E[Y_1])^2\right) = \lambda t E[Y_1^2]$.
Then the empirical and theoretical values of the mean and variance are:
> mean(y)
[1] 60.22843
> lambda * t0 * r/beta
[1] 60
> var(y)
[1] 145.2459
> lambda * t0 * r * (r + 1)/beta^2
[1] 150
3.21 A nonhomogeneous Poisson process has mean value function $m(t) = t^2 + 2t$, $t \ge 0$. Determine the intensity function λ(t) of the process, and write a program to
simulate the process on the interval [4, 5]. Compute the probability distribution of
N(5) − N(4), and compare it to the empirical estimate obtained by replicating the
simulation.
The mean value function is $m(t) = \int_0^t \lambda(s)\,ds = t^2 + 2t$, so $\lambda(t) = 2t + 2$ and
on [4, 5] we have $\lambda(t) \le 12 = \lambda_0$. The probability distribution of N(5) − N(4) is
Poisson with mean $m(5) - m(4) = 5^2 + 2(5) - (4^2 + 2(4)) = 35 - 24 = 11$.
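The simulation code is not included in this excerpt; one way to simulate N(5) − N(4) by thinning a homogeneous Poisson(λ0 = 12) process is sketched below (illustrative code, not necessarily the code that produced the output that follows):
> lambda0 <- 12
> N45 <- function() {
+     # homogeneous Poisson(lambda0) arrivals on (4, 5], each kept with prob lambda(t)/lambda0
+     tt <- 4
+     n <- 0
+     repeat {
+         tt <- tt + rexp(1, lambda0)
+         if (tt > 5) break
+         if (runif(1) <= (2 * tt + 2)/lambda0) n <- n + 1
+     }
+     n
+ }
> y <- replicate(1000, N45())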
> mean(y)
[1] 10.952
> var(y)
[1] 10.67904
> plot(ecdf(y))
> points(0:25, ppois(0:25, 11), pch = 3)
> legend("topleft", inset = 0.1, legend = c("simulation",
+ "Poisson(11)"), pch = c(1, 3))
[Figure: ecdf of the simulated counts y with the Poisson(11) cdf superimposed (legend: simulation, Poisson(11)).]
CHAPTER 4
4.1 Generate 200 random observations from the multivariate normal distribution having
mean vector µ = (0, 1, 2) and covariance matrix
$$\Sigma = \begin{bmatrix} 1.0 & -0.5 & 0.5 \\ -0.5 & 1.0 & -0.5 \\ 0.5 & -0.5 & 1.0 \end{bmatrix}.$$
> library(MASS)
> Sigma <- matrix(c(1, -0.5, 0.5, -0.5, 1, -0.5, 0.5, -0.5,
+ 1), 3, 3)
> mu <- c(0, 1, 2)
> x <- mvrnorm(200, mu, Sigma)
> colMeans(x)
[1] 0.02626572 0.96678314 2.04179355
> cor(x)
[,1] [,2] [,3]
[1,] 1.0000000 -0.5011990 0.3621894
[2,] -0.5011990 1.0000000 -0.4714206
[3,] 0.3621894 -0.4714206 1.0000000
> detach(package:MASS)
From the pairs plot below it appears that the parameters for each plot approx-
imately agree with the parameters of the corresponding bivariate distributions.
> pairs(x)
[Figure: pairs plot of the 200 generated observations (var 1, var 2, var 3).]
4.2 Add a fitted smooth curve to each of the iris virginica scatterplots.
The panel function below is similar to panel.smooth, with the options for color
removed.
> data(iris)
> panel.d <- function(x, ...) {
+ usr <- par("usr")
+ on.exit(par(usr))
+ par(usr = c(usr[1:2], 0, 0.5))
+ lines(density(x))
+ }
> panel.sm <- function(x, y, bg = NA, pch = par("pch"),
+ cex = 1, span = 2/3, iter = 3, ...) {
+ points(x, y, pch = pch, bg = bg, cex = cex)
+ ok <- is.finite(x) & is.finite(y)
+ if (any(ok))
+ lines(stats::lowess(x[ok], y[ok], f = span, iter = iter),
+ ...)
+ }
> x <- scale(iris[101:150, 1:4])
> r <- range(x)
[Figure: scatterplot matrix of the standardized iris virginica measurements (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) with lowess smooths in the panels and density curves on the diagonal.]
4.3 The random variables X and Y are independent and identically distributed with
normal mixture distributions. The components of the mixture have N (0, 1) and
N (3, 1) distributions with mixing probabilities p1 and p2 = 1 − p1 respectively.
The code below generates a bivariate random sample from the joint distribution
of (X, Y ).
> n <- 500
> mu <- c(0, 3)
> p <- 0.25
> m <- sample(mu, size = 2 * n, replace = TRUE, prob = c(p,
+ 1 - p))
> X <- matrix(rnorm(2 * n, m), n, 2)
For the contour plot, we need the joint density. The random variables are
independent, so the joint density is the product of the marginals. (If the variables were
dependent, we could not sort X and Y separately in this way.)
(Generally, when the joint density is available, we would not construct the
contour plot from a sample, because we can generate the grid of points directly.
When the joint density is not available, a density estimate can provide the z values.)
> f <- function(x, y) {
+ f1 <- p * dnorm(x, mu[1]) + (1 - p) * dnorm(x, mu[2])
+ f2 <- p * dnorm(y, mu[1]) + (1 - p) * dnorm(y, mu[2])
+ f1 * f2
+ }
> x <- sort(X[, 1])
[Figure: contour plot of the joint density of (X, Y).]
4.4 Construct a filled contour plot of the bivariate mixture in Exercise 4.3.
[Figure: filled contour plot of the bivariate mixture density, with color scale from 0.00 to about 0.08.]
Increase n above to get a nicer plot. The default plot of a hexbin object is in
grayscale. The plot command below produces the plot in color.
> require(hexbin)
> bin <- hexbin(X[, 1], X[, 2])
> plot(bin, style = "nested.lattice")
[Figure: hexbin plot of X[, 1] versus X[, 2], nested lattice style.]
(First “thinning out” the data because the perspective plot turns out to be quite
dark for the printed version.)
4.6 Repeat Exercise 4.3 for various different choices of the parameters of the mixture
model, and compare the distributions through contour plots.
[Figure: 3×3 array of contour plots of the mixture density for different choices of the mixture parameters.]
4.7 Create a parallel coordinates plot of the crabs (MASS) data using all 200 observa-
tions. Compare the plots before and after adjusting the measurements by the size of
the crab. Interpret the resulting plots.
To use lattice: print(parallel(~crabs[4:8] | sp*sex, crabs))
Graphs are very similar to the lattice parallel plots in the book. Much of the
variability between groups is in overall size. Following the suggestion in Venables
and Ripley (2002) we adjust the measurements by the area of the carapace.
a <- crabs$CW * crabs$CL #area of carapace
Orange males have a clear profile that is almost opposite that of blue females
and similar on some measurements to orange females. The blue males have the
least clear profile and do not appear to match the blue females. In both species the
rear width appears to be larger in females than in males. Orange crabs have small
carapace width relative to body depth.
4.8 Create a plot of Andrews curves for the leafshape17 (DAAG) data, using the log-
arithms of measurements (logwid, logpet, loglen).
The code is the same as the example in the book except that the names of the
variables are changed.
> plot(0, 0, xlim = c(-pi, pi), ylim = c(-3, 3), xlab = "t",
+ ylab = "Andrews Curves", main = "", type = "n")
> a <- seq(-pi, pi, len = 101)
> dim(a) <- length(a)
> for (i in 1:n) {
+ g <- arch[i] + 1
+ y <- apply(a, MARGIN = 1, FUN = f, v = x[i, ])
+ lines(a, y, lty = 1, col = g)
+ }
> legend(3, c("Orthotropic", "Plagiotropic"), lty = 1,
+ col = 1:2)
[Figure: Andrews curves for the leafshape17 data, colored by leaf architecture (legend: Orthotropic, Plagiotropic).]
> detach(leafshape17)
> detach(package:DAAG)
4.9 Refer to the full leafshape (DAAG) data set. Produce Andrews curves for each of
the six locations. Split the screen into six plotting areas, and display all six plots on
one screen. Set line type or color to identify leaf architecture.
There are a few apparent outliers or unusual observations, but most of the
differences appear to be in scale or size of leaf. The second array of plots scales
the measurements to common range. The profiles of the Andrews curves look very
similar across all six locations.
location
Sabah Panama Costa Rica N Queensland S Queensland
80 55 50 61 31
Tasmania
9
[Figure: Andrews curves of the leaf measurements for each of the six locations; line type identifies leaf architecture (legend: O, P).]
+ }
> legend("topleft", legend = c("O", "P"), lty = 1:2)
> par(mfrow = c(1, 1))
[Figure: Andrews curves for each of the six locations with the measurements scaled to a common range (legend: O, P).]
> detach(leafshape)
> detach(package:DAAG)
> palette("default")
4.10 Generalize the Andrews curve function for vectors in Rd , where the dimension
d ≥ 2 is arbitrary. Test this function by producing Andrews curves for the iris
data (d = 4) and crabs (MASS) data (d = 5).
> f <- function(v, a) {
+ d <- length(v)
+ y <- v[1]/sqrt(2)
+ for (i in 2:d) {
+ j <- i%/%2
+ if (i%%2) {
+ y <- y + v[i] * cos(j * a)
+ }
+ else {
+ y <- y + v[i] * sin(j * a)
+ }
+ }
+ return(y)
+ }
> a <- seq(-pi, pi, 0.1)
> x <- as.matrix(iris[, 1:4])
> A <- apply(x, 1, f, a = a)
> plot(a, A[, 1], ylim = range(A), type = "l", main = "iris")
> s <- as.integer(iris$Species)
> for (i in 2:nrow(x)) lines(a, A[, i], col = s[i], lty = 1)
> legend("topleft", inset = 0.02, legend = 1:3, lty = 1,
+ col = 1:3)
[Figure: Andrews curves for the iris data (d = 4), colored by species (legend: 1, 2, 3).]
> library(MASS)
> attach(crabs)
> x <- as.matrix(crabs[, 4:8])
> g <- rep(1:4, each = 50)
> A <- apply(x, 1, f, a = a)
> plot(a, A[, 1], ylim = range(A), type = "l", main = "crabs")
> for (i in 2:nrow(x)) lines(a, A[, i], col = g[i])
> legend("topleft", inset = 0.02, legend = 1:4, col = 1:4,
+ lty = 1)
[Figure: Andrews curves for the crabs data (d = 5), colored by group (legend: 1, 2, 3, 4).]
> detach(crabs)
> detach(package:MASS)
> palette("default")
4.11 Refer to the full leafshape (DAAG) data set. Display a segment style stars plot for
leaf measurements at latitude 42 (Tasmania). Repeat using the logarithms of the
measurements.
Use for color version: palette(rainbow(6))
For gray scale: palette(gray((1:6) / 8))
> library(DAAG, warn.conflicts = FALSE)
> attach(leafshape)
> names(leafshape)
[1] "bladelen" "petiole" "bladewid" "latitude" "logwid" "logpet"
[7] "loglen" "arch" "location"
> table(location)
location
Sabah Panama Costa Rica N Queensland S Queensland
80 55 50 61 31
Tasmania
9
> x <- subset(leafshape, subset = (location == "Tasmania"))
> y <- x[, 1:3]
> logy <- x[, 5:7]
> palette(rainbow(6))
> stars(y, draw.segments = TRUE, labels = x$arch, nrow = 3,
+ ylim = c(-2, 10), key.loc = c(3, -1))
[Figure: segment-style stars plot of the nine Tasmania leaf measurements (bladelen, petiole, bladewid), labeled by architecture (0/1), and the corresponding stars plot of the log measurements (logwid, logpet, loglen).]
> palette("default")
> detach(leafshape)
> detach(package:DAAG)
CHAPTER 5
+ }
> Phi <- pnorm(x)
> print(round(rbind(x, cdf, Phi), 3))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x 0.10 0.367 0.633 0.900 1.167 1.433 1.700 1.967 2.233 2.500
cdf 0.54 0.643 0.737 0.816 0.879 0.924 0.954 0.974 0.988 0.991
Phi 0.54 0.643 0.737 0.816 0.878 0.924 0.955 0.975 0.987 0.994
To estimate the variance of the MC estimate of Φ(2), replicate the experiment.
Then apply the CLT to construct an approximate 95% confidence interval for Φ(2).
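A sketch of this replication approach (illustrative code; the estimator is the same simple Monte Carlo estimator used above) is:
> m <- 10000
> x <- 2
> est <- replicate(1000, expr = {
+     u <- runif(m)
+     0.5 + mean(x * exp(-(u * x)^2/2))/sqrt(2 * pi)
+ })
> mean(est) + c(-1.96, 1.96) * sd(est)    # approximate 95% CI for Phi(2)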
by sampling from Uniform(0, 0.5), and estimate the variance of θ̂. Find another
Monte Carlo estimator θ̂∗ by sampling from the exponential distribution. Which of
the variances (of θ̂ and θ̂∗) is smaller, and why?
[The exact value of the integral is $\theta = 1 - e^{-0.5} \doteq 0.3934693$.]
The simple Monte Carlo estimator of $\theta = \int_0^{1/2} e^{-x}\,dx$ is
$$\hat\theta = (b - a)\,\frac{1}{m}\sum_{i=1}^{m} e^{-U_i} = \frac{1}{2m}\sum_{i=1}^{m} e^{-U_i},$$
where the $U_i$ are generated from Uniform(0, 1/2).
> m <- 10000
> u <- runif(m, 0, 0.5)
> theta <- 0.5 * mean(exp(-u))
> theta
[1] 0.3924437
> est <- replicate(1000, expr = {
+ u <- runif(m, 0, 0.5)
+ theta <- 0.5 * mean(exp(-u))
+ })
> mean(est)
[1] 0.393469
The variance of θ̂ is
$$\mathrm{Var}(\hat\theta) = \frac{\mathrm{Var}\left(\frac12 g(U)\right)}{m} = \frac{0.01284807}{4m} \doteq 3.212018 \times 10^{-7}.$$
Then
$$\frac{\mathrm{Var}(\hat\theta)}{\mathrm{Var}(\hat\theta^*)} = \frac{0.01284807/4}{(1 - e^{-1/2})(e^{-1/2})} \doteq 0.01345905,$$
so θ̂ has much smaller variance than θ̂∗: the integrand $e^{-u}$ is nearly constant on (0, 1/2), while θ̂∗ averages Bernoulli indicator variables, which are far more variable.
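For comparison, θ̂∗ can be simulated in the same way (an illustrative sketch, reusing m and the replicates est computed above):
> est2 <- replicate(1000, expr = {
+     x <- rexp(m)
+     mean(x < 0.5)      # indicator estimator based on Exp(1) samples
+ })
> var(est)/var(est2)      # should be close to the theoretical ratio above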
5.4 Write a function to compute a Monte Carlo estimate of the Beta(3, 3) cdf, and use
the function to estimate F (x) for x = 0.1, 0.2, . . . , 0.9. Compare the estimates with
the values returned by the pbeta function in R.
This solution uses the “ratio of gammas” Beta(a,b) generator from Chapter 3.
The function mcpBETA is not set up to handle vector arguments, so all but the first
element of the arguments are ignored.
> mcpBETA <- function(x, a, b, m = 10000) {
+ x <- x[1]
+ a <- a[1]
+ b <- b[1]
+ u <- rgamma(m, a, 1)
+ v <- rgamma(m, b, 1)
+ y <- u/(u + v)
+ return(mean(y <= x))
+ }
> x <- seq(0.1, 0.9, 0.1)
> k <- length(x)
> p <- numeric(k)
> for (i in 1:k) p[i] <- mcpBETA(x[i], 3, 3)
> round(rbind(x, pbeta(x, 3, 3), p), 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
x 0.100 0.200 0.300 0.400 0.500 0.600 0.700 0.800 0.900
0.009 0.058 0.163 0.317 0.500 0.683 0.837 0.942 0.991
p 0.008 0.059 0.166 0.318 0.506 0.680 0.841 0.943 0.992
5.5 Compute (empirically) the efficiency of the sample mean Monte Carlo method of
estimation of the definite integral Φ(x) relative to the “hit or miss” method.
Method 1: Generate iid Uniform(0,1) random numbers $u_1, \ldots, u_m$, and compute
$$\hat\theta_1 = \overline{g_m(u)} = \frac{1}{m}\sum_{i=1}^{m} x e^{-(u_i x)^2/2}.$$
If x > 0, the estimate of Φ(x) is $0.5 + \hat\theta_1/\sqrt{2\pi}$. If x < 0 compute Φ(x) = 1 − Φ(−x).
Method 2: Generate a random sample $z_1, \ldots, z_m$ from the standard normal
distribution, and let
$$\hat\theta_2 = \frac{1}{m}\sum_{i=1}^{m} I(z_i \le x).$$
> k <- 20
> X <- seq(-2.5, 2.5, length = k)
> e <- matrix(1, nrow = k, ncol = 4)
> m <- 10000
> for (i in 1:k) {
+ x <- X[i]
+ theta1 <- replicate(200, expr = {
+ u <- runif(m)
+ g <- x * exp(-(u * x)^2/2)
+ theta1 <- mean(g)/sqrt(2 * pi) + 0.5
+ })
+ theta2 <- replicate(200, expr = {
+ z <- rnorm(m)
+ theta2 <- mean(z < x)
+ })
+ eff <- var(theta1)/var(theta2)
+ e[i, ] <- c(x, eff, var(theta1), var(theta2))
+ }
> e
[Figure: plot of the relative efficiency Var(T1)/Var(T2) against x (e[, 1]).]
the percent reduction in variance of θ̂ that can be achieved using antithetic variates
(compared with simple MC)?
(The variances of $e^U$ and $e^{1-U}$ are equal because U and 1 − U are identically
distributed.)
Suppose θ̂1 is the simple MC estimator and θ̂2 is the antithetic estimator. Then
if U and V are iid Uniform(0,1) variables, we have
$$\mathrm{Var}\left(\tfrac12(e^U + e^V)\right) = \tfrac14 \cdot 2\,\mathrm{Var}(e^U) = \tfrac12\left(\frac{e^2 - 1}{2} - (e - 1)^2\right) \doteq 0.1210178.$$
If antithetic variables are used,
$$\mathrm{Var}\left(\tfrac12(e^U + e^{1-U})\right) = \tfrac14\left(2\,\mathrm{Var}(e^U) + 2\,\mathrm{Cov}(e^U, e^{1-U})\right)
= \tfrac12\left[\frac{e^2 - 1}{2} - (e - 1)^2 + e - (e - 1)^2\right] \doteq 0.003912497.$$
> var(anti)
[1] 2.145301e-07
> mean(mc)
[1] 0.5248207
> mean(anti)
[1] 0.5247966
> 100 * (var(mc) - var(anti))/var(mc)
[1] 96.4598
The empirical percent reduction (about 96.5%) agrees with the theoretical reduction $100(0.1210178 - 0.003912497)/0.1210178 \doteq 96.8\%$.
5.11 If θ̂1 and θ̂2 are unbiased estimators of θ, and θ̂1 and θ̂2 are antithetic, we derived
that c∗ = 1/2 is the optimal constant that minimizes the variance of θ̂c = cθ̂1 + (1 −
c)θ̂2. Derive c∗ for the general case. That is, if θ̂1 and θ̂2 are any two unbiased
estimators of θ, find the value c∗ that minimizes the variance of the estimator
θ̂c = cθ̂1 + (1 − c)θ̂2.
Suppose that θ̂1 and θ̂2 are any two unbiased estimators of θ. Then for every
constant c, $E[\hat\theta_c] = c\theta + (1 - c)\theta = \theta$, and
$$\mathrm{Var}(\hat\theta_c) = c^2\,\mathrm{Var}(\hat\theta_1) + (1 - c)^2\,\mathrm{Var}(\hat\theta_2) + 2c(1 - c)\,\mathrm{Cov}(\hat\theta_1, \hat\theta_2). \qquad (1)$$
In the special case of antithetic variables, the estimators θ̂1 and θ̂2 are identically
distributed, so $\mathrm{Var}(\hat\theta_1) = \mathrm{Var}(\hat\theta_2)$. Hence, the variance in (1) is
$$\mathrm{Var}(\hat\theta_1)\left[c^2 + (1 - c)^2\right] + 2c(1 - c)\,\mathrm{Cov}(\hat\theta_1, \hat\theta_2),$$
and the optimal constant is c∗ = 1/2. In the general case, (1) is a quadratic function
of c, and the value of c that minimizes the variance is
$$c^* = \frac{\mathrm{Var}(\hat\theta_2) - \mathrm{Cov}(\hat\theta_1, \hat\theta_2)}{\mathrm{Var}(\hat\theta_1) + \mathrm{Var}(\hat\theta_2) - 2\,\mathrm{Cov}(\hat\theta_1, \hat\theta_2)}.$$
(With equal variances this reduces to c∗ = 1/2, whatever the covariance.)
$\hat\theta = \hat\theta_f^{IS}$. Then
$$\mathrm{Var}(\hat\theta) = E[\hat\theta^2] - (E[\hat\theta])^2 = \frac{1}{m}\left(E\left[\left(\frac{g(X)}{f(X)}\right)^2\right] - \theta^2\right) = \frac{1}{m}\left(\int \frac{g(x)^2}{f(x)}\,dx - \theta^2\right)$$
$$= \frac{1}{m}\left(\int g(x)\,\frac{g(x)}{f(x)}\,dx - \theta^2\right) \le \frac{1}{m}\left(M\int g(x)\,dx - \theta^2\right) = \frac{M\theta - \theta^2}{m} < \infty.$$
5.13 Find two importance functions f1 and f2 that are supported on (1, ∞) and are
‘close’ to
$$g(x) = \frac{x^2}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad x > 1.$$
Which of your two importance functions should produce the smaller variance in
estimating
$$\int_1^{\infty} \frac{x^2}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx$$
by importance sampling? Explain.
First display the graph of g(x). From the graph, we might consider a normal
distribution or a gamma distribution.
> x <- seq(1, 10, 0.01)
> y <- x^2 * exp(-x^2/2)/sqrt(2 * pi)
> plot(x, y, type = "l", ylim = c(0, 1))
> lines(x, 2 * dnorm(x, 1), lty = 2)
> lines(x, dgamma(x - 1, 3/2, 2), lty = 3)
> legend("topright", inset = 0.02, legend = c("g(x)", "f1",
+ "f2"), lty = 1:3)
[Figure: the function g(x) on (1, 10) with the two candidate importance functions superimposed (legend: g(x), f1, f2).]
[Figure: a second comparison of the two importance functions on (1, 10) (legend: f1, f2).]
+ mean(g/f)
+ })
> c(mean(is1), mean(is2))
[1] 0.4006183 0.4006317
> c(var(is1), var(is2))
[1] 2.044402e-07 1.054996e-06
> var(is1)/var(is2)
[1] 0.1937829
Clearly importance function f1, a Normal(1, 1) density folded at x = 1, produces
the more efficient estimator.
For comparison, we can check the numerical integration result.
> g <- function(x) x^2 * exp(-x^2/2)/sqrt(2 * pi)
> integrate(g, lower = 1, upper = Inf)
0.400626 with absolute error < 5.7e-07
5.15 Obtain the stratified importance sampling estimate of
$$\theta = \int_0^1 \frac{e^{-x}}{1 + x^2}\,dx$$
with importance function
$$f(x) = \frac{e^{-x}}{1 - e^{-1}}, \qquad 0 < x < 1,$$
on five subintervals, (j/5, (j + 1)/5), j = 0, 1, . . . , 4. On the jth subinterval
$$f_j(x) = f_{X \mid I_j}(x) = \frac{5e^{-x}}{1 - e^{-1}}, \qquad \frac{j-1}{5} < x < \frac{j}{5}.$$
> M <- 10000
> k <- 5
> m <- M/k
> si <- numeric(k)
> v <- numeric(k)
> g <- function(x) exp(-x)/(1 + x^2)
> f <- function(x) (k/(1 - exp(-1))) * exp(-x)
> for (j in 1:k) {
+ u <- runif(m, (j - 1)/k, j/k)
+ x <- -log(1 - (1 - exp(-1)) * u)
+ fg <- g(x)/f(x)
+ si[j] <- mean(fg)
+ v[j] <- var(fg)
+ }
> sum(si)
[1] 0.5244961
> mean(v)
[1] 1.77338e-05
> sqrt(mean(v))
[1] 0.004211152
CHAPTER 6
6.1 Estimate the MSE of the level k trimmed means for random samples of size 20
generated from a standard Cauchy distribution. (The target parameter θ is the
center or median; the expected value does not exist.) Summarize the estimates of
MSE in a table for k = 1, 2, . . . , 9.
> n <- 20
> K <- n/2 - 1
> m <- 1000
> mse <- matrix(0, n/2, 2)
> trimmed.mse <- function(n, m, k) {
+ tmean <- numeric(m)
+ for (i in 1:m) {
+ x <- sort(rcauchy(n))
+ tmean[i] <- sum(x[(k + 1):(n - k)])/(n - 2 * k)
+ }
+ mse.est <- mean(tmean^2)
+ se.mse <- sqrt(mean((tmean - mean(tmean))^2))/sqrt(m)
+ return(c(mse.est, se.mse))
+ }
> for (k in 0:K) mse[k + 1, 1:2] <- trimmed.mse(n = n, m = m,
+ k = k)
> mse <- as.data.frame(cbind(0:K, mse))
> names(mse) <- list("k", "t-mean", "se")
> print(mse)
k t-mean se
1 0 82.6578971 0.28749176
2 1 1.3422777 0.03663287
3 2 0.3478463 0.01865002
4 3 0.2516950 0.01584669
5 4 0.1708088 0.01305890
6 5 0.1503993 0.01226187
7 6 0.1441091 0.01199713
8 7 0.1352604 0.01161845
9 8 0.1352857 0.01162528
10 9 0.1400206 0.01183221
6.2 Plot the empirical power curve for the t-test, changing the alternative hypothesis to
H1 : µ = 500, and keeping the significance level α = 0.05.
> n <- 20
> m <- 1000
[Figure: empirical power curve of the t-test plotted against µ.]
Note: If the error bars do not appear on the printed plot, they should be visible
on the screen (an Sweave issue).
6.3 Plot the power curves for the one-sided t-test for sample sizes 10, 20, 30, 40, and
50, but omit the standard error bars. Plot the curves on the same graph, each in a
different color or different line type, and include a legend.
> plot(mu, power[, 1], type = "l", ylim = range(power), xlab = bquote(mu),
+ ylab = "power")
> abline(v = mu0, lty = 3)
> abline(h = 0.05, lty = 3)
> for (j in 2:5) lines(mu, power[, j], lty = j)
> legend("topleft", inset = 0.02, legend = N, lty = 1:5)
[Figure: empirical power curves of the one-sided t-test against µ for sample sizes 10, 20, 30, 40, and 50.]
Replace lty=j with col=j in lines and in legend for color. The plots show
that for a fixed alternative, the power is increasing with sample size.
6.4 Suppose that X1 , . . . , Xn are a random sample from a lognormal distribution
with unknown parameters. Construct a 95% confidence interval for the parameter
µ. Use a Monte Carlo method to obtain an empirical estimate of the confidence
level.
Transform X to normal and estimate µ with the sample mean of the trans-
formed sample.
> n <- 30
> CI <- replicate(10000, expr = {
+ x <- rlnorm(n)
+ y <- log(x)
+ ybar <- mean(y)
+ se <- sd(y)/sqrt(n)
+ ybar + se * qnorm(c(0.025, 0.975))
+ })
> LCL <- CI[1, ]
> UCL <- CI[2, ]
> sum(LCL < 0 & UCL > 0)
[1] 9380
> mean(LCL < 0 & UCL > 0)
[1] 0.938
6.5 Use a Monte Carlo experiment to estimate the coverage probability of the t-interval
for random samples of χ2 (2) data with sample size n = 20.
> n <- 20
> rootn <- sqrt(n)
> t0 <- qt(c(0.025, 0.975), df = n - 1)
> CI <- replicate(10000, expr = {
+ x <- rchisq(n, df = 2)
+ ci <- mean(x) + t0 * sd(x)/rootn
+ })
> LCL <- CI[1, ]
> UCL <- CI[2, ]
> sum(LCL < 2 & UCL > 2)
[1] 9167
> mean(LCL < 2 & UCL > 2)
[1] 0.9167
The t-interval is more robust to departures from normality than the interval for
variance: for χ²(2) data the empirical coverage rate of the variance interval was only 77.3%, compared with about 92% for the t-interval here.
6.6 Estimate the 0.025, 0.05, 0.95 and 0.975 quantiles of the skewness $\sqrt{b_1}$ under normality by a Monte Carlo experiment. Compute the standard error of the estimates
using the normal approximation for the density (with exact variance formula).
Compare the estimated quantiles with the quantiles of the large sample approximation $\sqrt{b_1} \approx N(0, 6/n)$.
Equation (2.14) gives the variance of a sample quantile:
$$\mathrm{Var}(\hat x_q) = \frac{q(1 - q)}{n f(x_q)^2}.$$
Here the density f is the density of the skewness statistic, and the value of n is the
number of replicates of the statistic. To estimate $se(\hat x_q)$, we are approximating f
with the asymptotic normal density.
> sk <- function(x) {
+ xbar <- mean(x)
+ m3 <- mean((x - xbar)^3)
+ m2 <- mean((x - xbar)^2)
+ return(m3/m2^1.5)
+ }
> m <- 10000
> n <- 50
> skstats <- replicate(m, expr = {
+ x <- rnorm(n)
+ sk(x)
+ })
> p <- c(0.025, 0.05, 0.95, 0.975)
> q1 <- quantile(skstats, p)
> q2 <- qnorm(p, 0, sqrt(6 * (n - 2)/((n + 1) * (n + 3))))
> q3 <- qnorm(p, 0, sqrt(6/n))
> f <- dnorm(q2, 0, sqrt(6 * (n - 2)/((n + 1) * (n + 3))))
> v <- p * (1 - p)/(m * f^2)
> rbind(p, q1, sqrt(v))
2.5% 5% 95% 97.5%
p 0.025000000 0.05000000 0.95000000 0.975000000
q1 -0.640460596 -0.53554723 0.54001617 0.638831577
0.008719622 0.00689781 0.00689781 0.008719622
> rbind(q1, q2, q3)
2.5% 5% 95% 97.5%
q1 -0.6404606 -0.5355472 0.5400162 0.6388316
q2 -0.6397662 -0.5369087 0.5369087 0.6397662
q3 -0.6789514 -0.5697940 0.5697940 0.6789514
The first table shows the sample quantiles of the skewness statistic and the standard
errors of the estimates for sample size 50.
The second table shows the three estimates of the quantiles: q1 (sample quantiles), q2 (normal
approximation with the exact variance), and q3 (normal approximation with the asymptotic variance 6/n). The estimated
quantiles q2 are closer to the empirical quantiles than the estimates q3 using the
asymptotic variance 6/n.
6.7 Estimate the power of the skewness test of normality against symmetric Beta(α, α)
distributions and comment on the results.
> alpha <- 0.1
> n <- 30
> m <- 2500
> ab <- 1:10
> N <- length(ab)
> pwr <- numeric(N)
> cv <- qnorm(1 - alpha/2, 0, sqrt(6 * (n - 2)/((n + 1) *
+ (n + 3))))
> for (j in 1:N) {
+ a <- ab[j]
+ sktests <- numeric(m)
+ for (i in 1:m) {
+ x <- rbeta(n, a, a)
+ sktests[i] <- as.integer(abs(sk(x)) >= cv)
+ }
+ pwr[j] <- mean(sktests)
+ }
> pwr
[1] 0.0188 0.0120 0.0228 0.0284 0.0444 0.0516 0.0480 0.0548 0.0468
[10] 0.0724
The symmetric Beta(α, α) alternatives are symmetric but not normal. This
simulation illustrates that the skewness test of normality is not very effective against
light-tailed symmetric alternatives. The empirical power of the test is not higher
than the nominal significance level.
Are the results different for heavy-tailed symmetric alternatives such as t(ν)?
Yes, the skewness test is more effective against a heavy-tailed symmetric alternative,
such as a Student t distribution. Below we repeat the simulation for several choices
of degrees of freedom.
> alpha <- 0.1
> n <- 30
> m <- 2500
> df <- c(1:5, seq(10, 50, 10))
> N <- length(df)
> pwr <- numeric(N)
> cv <- qnorm(1 - alpha/2, 0, sqrt(6 * (n - 2)/((n + 1) *
+ (n + 3))))
> for (j in 1:N) {
+ nu <- df[j]
+ sktests <- numeric(m)
+ for (i in 1:m) {
+ x <- rt(n, df = nu)
+ sktests[i] <- as.integer(abs(sk(x)) >= cv)
+ }
+ pwr[j] <- mean(sktests)
+ }
> data.frame(df, pwr)
df pwr
1 1 0.8700
2 2 0.6448
3 3 0.5232
4 4 0.4116
5 5 0.3392
6 10 0.1980
7 20 0.1476
8 30 0.1372
9 40 0.1120
10 50 0.1092
The skewness test of normality is more powerful when the degrees of freedom
are small. As degrees of freedom tend to infinity the t distribution tends to normal,
and the power tends to α. One reason that the skewness test is more powerful in
this case than against the symmetric beta distributions is that $|\sqrt{b_1}|$ is positively
correlated with kurtosis. The kurtosis of the beta distribution is less than the normal
kurtosis, while the kurtosis of t is greater than the normal kurtosis.
6.8 Repeat the Count Five test power simulation, but also compute the F test of equal
variance, at significance level $\hat\alpha \doteq 0.055$. Compare the power of the Count Five test
and F test for small, medium, and large sample sizes.
> count5test <- function(x, y) {
+ X <- x - mean(x)
+ Y <- y - mean(y)
+ outx <- sum(X > max(Y)) + sum(X < min(Y))
+ outy <- sum(Y > max(X)) + sum(Y < min(X))
+ return(as.integer(max(c(outx, outy)) > 5))
+ }
> sigma1 <- 1
> sigma2 <- 1.5
> m <- 10000
> for (n in c(20, 30, 50, 100, 200, 500)) {
+ tests <- replicate(m, expr = {
+ x <- rnorm(n, 0, sigma1)
+ y <- rnorm(n, 0, sigma2)
+ C5 <- count5test(x, y)
+ Fp <- var.test(x, y)$p.value
+ Ftest <- as.integer(Fp <= 0.055)
+ c(C5, Ftest)
+ })
+ cat(n, rowMeans(tests), "\n")
+ }
20 0.3089 0.4137
30 0.4695 0.5854
50 0.6554 0.8071
100 0.8474 0.9801
200 0.9493 1
500 0.9905 1
The simulation results suggest that the F-test for equal variance is more powerful in this case, for all sample sizes compared (the standard errors of the power estimates satisfy se ≤ 0.005, and se ≈ 0.002 when p̂
is close to 1).
6.9 Let X be a non-negative random variable with µ = E[X] < ∞. For a random
sample $x_1, \ldots, x_n$ from the distribution of X, the Gini ratio is defined by
$$G = \frac{1}{2n^2\mu}\sum_{j=1}^{n}\sum_{i=1}^{n}|x_i - x_j| = \frac{1}{n^2\mu}\sum_{i=1}^{n}(2i - n - 1)\,x_{(i)}.$$
[Figure: density histograms of the replicated Gini ratio estimates g1–g4 for samples from the standard lognormal, uniform, Bernoulli(0.5), and Bernoulli(0.1) distributions.]
6.10 Construct an approximate 95% confidence interval for the Gini ratio γ = E[G] if X
is lognormal with unknown parameters. Assess the coverage rate of the estimation
procedure with a Monte Carlo experiment.
The confidence intervals below are computed for the standard lognormal distri-
bution at 90% confidence for sample size n = 100, based on the asymptotic normal
distribution of the MLE.
> n <- 100
> m <- 10000
> sigma <- 1
> alpha <- 0.1
> g0 <- 2 * pnorm(sigma/sqrt(2)) - 1
> g0
[1] 0.5204999
> ci <- matrix(0, m, 3)
> vhat <- replicate(m, expr = {
+ x <- rlnorm(n, 0, sigma)
+ mean((log(x) - mean(log(x)))^2)
+ })
> ci[, 1] <- 2 * pnorm(sqrt(vhat)/sqrt(2)) - 1
> s <- sd(ci[, 1])
> ci[, 2] <- ci[, 1] + qnorm(alpha/2) * s
> ci[, 3] <- ci[, 1] + qnorm(1 - alpha/2) * s
> j <- sum(ci[, 2] < g0 & ci[, 3] > g0)
> cat("confidence level ", 100 * (1 - alpha), " coverage rate ",
+ j/m, "\n")
confidence level 90 coverage rate 0.8971
> cat("misses low ", sum(ci[, 3] < g0)/m, "\n")
misses low 0.0626
> cat("misses high ", sum(ci[, 2] > g0)/m, "\n")
misses high 0.0403
The confidence intervals based on the normal approximation of the distribution
of the MLE have approximately the correct coverage rate.
CHAPTER 7
7.1 Compute a jackknife estimate of the bias and the standard error of the correlation
statistic for the law data.
> library(bootstrap)
> attach(law)
> n <- nrow(law)
> theta.hat <- cor(LSAT, GPA)
> theta.jack <- numeric(n)
> for (i in 1:n) theta.jack[i] <- cor(LSAT[-i], GPA[-i])
> bias <- (n - 1) * (mean(theta.jack) - theta.hat)
> se <- sqrt((n - 1) * mean((theta.jack - mean(theta.jack))^2))
> detach(law)
> detach(package:bootstrap)
> print(list(est = theta.hat, bias = bias, se = se))
$est
[1] 0.7763745
$bias
[1] -0.006473623
$se
[1] 0.1425186
7.2 Refer to the law data (bootstrap). Use the jackknife-after-bootstrap method to
estimate the standard error of the bootstrap estimate of se(R).
> library(bootstrap)
> attach(law)
> n <- nrow(law)
> B <- 2000
> theta.b <- numeric(B)
> indices <- matrix(0, nrow = B, ncol = n)
> for (b in 1:B) {
+ i <- sample(1:n, size = n, replace = TRUE)
+ x <- LSAT[i]
+ y <- GPA[i]
+ theta.b[b] <- cor(x, y)
+ indices[b, ] <- i
+ }
> se.jack <- numeric(n)
> for (i in 1:n) {
$se
[1] 0.08788263
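The body of the loop above is not shown in this excerpt; a sketch of the jackknife-after-bootstrap computation, using the stored indices to find, for each i, the bootstrap samples that omit observation i, is:
> for (i in 1:n) {
+     # bootstrap samples that do not contain observation i
+     keep <- which(apply(indices, 1, function(k) !any(k == i)))
+     se.jack[i] <- sd(theta.b[keep])
+ }
> sqrt((n - 1) * mean((se.jack - mean(se.jack))^2))   # jackknife se of the bootstrap se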
7.3 Obtain a bootstrap t confidence interval estimate for the correlation statistic (law
data in bootstrap).
Two methods are shown below. The bootstrap t CI can be obtained using the
boot.t.ci function given in Chapter 7, or by using boot.ci after boot.
To use boot.t.ci, provide a function that computes the statistic of interest.
Although cor is available, it returns a correlation matrix when the argument is a
data matrix. Here we need the correlation statistic.
> library(boot)
> library(bootstrap)
> attach(law)
> cor.stat <- function(x, i = 1:NROW(x)) {
+ cor(x[i, 1], x[i, 2])
+ }
> boot.t.ci <- function(x, B = 500, R = 100, level = 0.95,
+ statistic) {
+ x <- as.matrix(x)
+ n <- nrow(x)
+ stat <- numeric(B)
+ se <- numeric(B)
+ boot.se <- function(x, R, f) {
+ x <- as.matrix(x)
+ m <- nrow(x)
+ th <- replicate(R, expr = {
+ i <- sample(1:m, size = m, replace = TRUE)
+ f(x[i, ])
+ })
+ return(sd(th))
+ }
+ for (b in 1:B) {
+ j <- sample(1:n, size = n, replace = TRUE)
+ y <- x[j, ]
+ stat[b] <- statistic(y)
+ se[b] <- boot.se(y, R = R, f = statistic)
+ }
+ stat0 <- statistic(x)
+ t.stats <- (stat - stat0)/se
+ se0 <- sd(stat)
+ alpha <- 1 - level
+ Qt <- quantile(t.stats, c(alpha/2, 1 - alpha/2),
+ type = 1)
+ names(Qt) <- rev(names(Qt))
+ CI <- rev(stat0 - Qt * se0)
+ }
> print(boot.t.ci(law, B = 1000, R = 25, statistic = cor.stat))
2.5% 97.5%
-0.1556726 0.9976787
To use boot.ci after boot, write a function that returns both the correlation
statistic and an estimate of the variance for each bootstrap sample (see cor.stat2
below). Then with boot.ci and type="stud" the variances for the studentized
statistics are by default in the second position of the returned bootstrap object.
> cor.stat2 <- function(x, i = 1:NROW(x)) {
+ o <- boot(x[i, ], cor.stat, R = 25)
+ n <- length(i)
+ c(o$t0, var(o$t) * (n - 1)/n^2)
+ }
> b <- boot(law, statistic = cor.stat2, R = 1000)
> boot.ci(b, type = "stud")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = b, type = "stud")
Intervals :
Level Studentized
95% (-0.2466, 0.9469 )
Calculations and Intervals on Original Scale
The confidence interval estimation is repeated below to show that the bootstrap
t intervals are unstable in this example. See Efron and Tibshirani 1993 for an
explanation and a better approach (transform R to normal). The confidence limits
are the last two numbers.
> b <- boot(law, statistic = cor.stat2, R = 1000)
> boot.ci(b, type = "stud")$stud
conf
[1,] 0.95 975.98 25.03 -0.8707301 1.083004
> b <- boot(law, statistic = cor.stat2, R = 1000)
> boot.ci(b, type = "stud")$stud
conf
[1,] 0.95 975.98 25.03 -0.4006036 1.002005
7.4 Refer to the air-conditioning data set aircondit provided in the boot package.
Assume that the times between failures follow an exponential model Exp(λ). Obtain
the MLE of the hazard rate λ and use bootstrap to estimate the bias and standard
error of the estimate.
The MLE of λ is 1/x̄. The estimates of bias and standard error are printed in
the summary of boot below.
> library(boot)
> x <- aircondit[1]
> rate <- function(x, i) return(1/mean(as.matrix(x[i, ])))
> boot(x, statistic = rate, R = 2000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x, statistic = rate, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 0.00925212 0.001300935 0.004331923
> detach(package:boot)
7.5 Refer to Exercise 7.4. Compute 95% bootstrap confidence intervals for the mean
time between failures 1/λ by the standard normal, basic, percentile and BCa meth-
ods. Compare the intervals and explain why they may differ.
The aircondit data is a data frame with 1 variable, so aircondit[1] extracts
this variable.
> library(boot)
> x <- aircondit[1]
> meant <- function(x, i) return(mean(as.matrix(x[i, ])))
Call:
boot(data = x, statistic = meant, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 108.0833 0.9393333 37.10921
CALL :
boot.ci(boot.out = b, type = c("norm", "perc", "basic", "bca"))
Intervals :
Level Normal Basic
95% ( 34.4, 179.9 ) ( 25.8, 166.9 )
> detach(package:boot)
The replicates are not approximately normal, so the normal and percentile
intervals differ. From the histogram of replicates, it appears that the distribution
of the replicates is skewed - although we are estimating a mean, the sample size
is too small for CLT to give a good approximation here. The BCa interval is a
percentile type interval, but it adjusts for both skewness and bias.
[Figure: density histogram of the bootstrap replicates b$t of the sample mean.]
7.6 Efron and Tibshirani (1993) discuss the scor (bootstrap) test score data on 88
students who took examinations in five subjects. The first two tests (mechanics,
vectors) were closed book and the last three tests (algebra, analysis, statistics) were
open book. Each row of the data frame is a set of scores (xi1 , . . . , xi5 ) for the
ith student. Use a panel display to display the scatter plots for each pair of test
scores. Compare the plot with the sample correlation matrix. Obtain bootstrap
estimates of the standard errors for each of the following estimates: ρ̂12 = ρ̂(mec,
vec), ρ̂34 = ρ̂(alg, ana), ρ̂35 = ρ̂(alg, sta), ρ̂45 = ρ̂(ana, sta).
> library(bootstrap)
> attach(scor)
> cor(scor)
> pairs(scor)
[Figure: scatterplot matrix of the five test scores (mec, vec, alg, ana, sta).]
From the plots and correlation matrix, it appears that open book scores have
higher correlations than closed book. All test scores have positive sample correla-
tions. The approximate standard errors of the estimates are given by the output
from the boot function below.
> library(boot)
> cor.stat <- function(x, i = 1:NROW(x)) {
+ cor(x[i, 1], x[i, 2])
+ }
> x <- as.matrix(scor)
Bootstrap estimate of se(ρ̂)12 = ρ̂(mec, vec):
> boot(x[, 1:2], statistic = cor.stat, R = 2000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x[, 1:2], statistic = cor.stat, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 0.5534052 -0.007300376 0.07541025
Bootstrap estimate of se(ρ̂)34 = ρ̂(alg, ana):
> boot(x[, 3:4], statistic = cor.stat, R = 2000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x[, 3:4], statistic = cor.stat, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 0.7108059 -0.001189044 0.04869804
Bootstrap estimate of se(ρ̂)35 = ρ̂(alg, sta):
> boot(x[, c(3, 5)], statistic = cor.stat, R = 2000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x[, c(3, 5)], statistic = cor.stat, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 0.6647357 -0.001558886 0.06075636
Bootstrap estimate of se(ρ̂)45 = ρ̂(ana, sta):
> boot(x[, 4:5], statistic = cor.stat, R = 2000)
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x[, 4:5], statistic = cor.stat, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 0.6071743 -0.002175276 0.06731688
> detach(scor)
> detach(package:bootstrap)
> detach(package:boot)
7.7 Refer to Exercise 7.6. Efron and Tibshirani (1993) discuss the following example.
The five-dimensional scores data have a 5 × 5 covariance matrix Σ, with positive
eigenvalues $\lambda_1 > \cdots > \lambda_5$. Let $\hat\lambda_1 > \cdots > \hat\lambda_5$ be the eigenvalues of Σ̂, where Σ̂
is the MLE of Σ. Compute the sample estimate $\hat\theta = \hat\lambda_1\big/\sum_{j=1}^{5}\hat\lambda_j$ of $\theta = \lambda_1\big/\sum_{j=1}^{5}\lambda_j$. Use
bootstrap to estimate the bias and standard error of θ̂.
Call:
boot(data = scor, statistic = th, R = 2000)
Bootstrap Statistics :
original bias std. error
t1* 0.619115 0.0007396812 0.04700042
> detach(scor)
> detach(package:boot)
> detach(package:bootstrap)
The estimates are $\hat\lambda_1 \doteq 686.99$ and $\hat\theta \doteq 0.619$, with bias and std. error of θ̂ equal
to 0.00074 and 0.047.
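The statistic function th passed to boot here (and again in Exercises 7.8 and 7.9) is not reproduced in this excerpt; a version consistent with the description is:
> th <- function(x, i) {
+     # proportion of total variance explained by the largest eigenvalue
+     lambda <- eigen(cov(x[i, ]))$values
+     lambda[1]/sum(lambda)
+ }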
7.8 Refer to Exercise 7.7. Obtain the jackknife estimates of bias and standard error of
θ̂.
> library(bootstrap)
> attach(scor)
> x <- as.matrix(scor)
> n <- nrow(x)
> theta.jack <- numeric(n)
> lambda <- eigen(cov(x))$values
> theta.hat <- max(lambda/sum(lambda))
> for (i in 1:n) {
+ y <- x[-i, ]
+ s <- cov(y)
+ lambda <- eigen(s)$values
+ theta.jack[i] <- max(lambda/sum(lambda))
+ }
> bias.jack <- (n - 1) * (mean(theta.jack) - theta.hat)
> se.jack <- sqrt((n - 1)/n * sum((theta.jack - mean(theta.jack))^2))
> c(theta.hat, bias.jack, se.jack)
[1] 0.619115038 0.001069139 0.049552307
> list(est = theta.hat, bias = bias.jack, se = se.jack)
$est
[1] 0.619115
$bias
[1] 0.001069139
$se
[1] 0.04955231
> detach(scor)
> detach(package:bootstrap)
The jackknife estimate of bias of θ̂ is approximately 0.001 and the jackknife esti-
mate of se is approximately 0.05. These estimates are not very different from the
bootstrap estimates above.
7.9 Refer to Exercise 7.7. Compute 95% percentile and BCa confidence intervals for θ̂.
> library(bootstrap)
> attach(scor)
> library(boot)
> b <- boot(scor, statistic = th, R = 2000)
> boot.ci(b, type = c("perc", "bca"))
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates
CALL :
boot.ci(boot.out = b, type = c("perc", "bca"))
Intervals :
Level Percentile BCa
95% ( 0.5205, 0.7121 ) ( 0.5256, 0.7161 )
Calculations and Intervals on Original Scale
> detach(scor)
> detach(package:boot)
> detach(package:bootstrap)
7.10 Leave-one-out (n-fold) cross validation was used to select the best fitting model for
the ironslag data. Repeat the analysis replacing the Log-Log model with a cubic
polynomial model. Which of the four models is selected by the cross validation
procedure? Which model is selected according to maximum adjusted R2 ?
> library(DAAG, warn.conflict = FALSE)
> attach(ironslag)
> a <- seq(10, 40, 0.1)
First we fit the four models on the complete data and plot the data with the
fits from each estimated model. The values of the adjusted $R^2_a$ can be extracted from L1, L2,
L3, L4.
> par(mfrow = c(2, 2))
> L1 <- lm(magnetic ~ chemical)
> plot(chemical, magnetic, main = "Linear", pch = 16)
[Figure: scatterplots of magnetic versus chemical with the fitted Linear, Quadratic, Exponential, and Cubic curves.]
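The remaining model fits and the leave-one-out cross-validation loop are not shown in this excerpt; a sketch of the CV comparison with a cubic model replacing the log-log model (object names e1–e4 and J1–J4 are illustrative) is:
> n <- length(magnetic)
> e1 <- e2 <- e3 <- e4 <- numeric(n)
> for (k in 1:n) {
+     y <- magnetic[-k]
+     x <- chemical[-k]
+     J1 <- lm(y ~ x)
+     e1[k] <- magnetic[k] - predict(J1, data.frame(x = chemical[k]))
+     J2 <- lm(y ~ x + I(x^2))
+     e2[k] <- magnetic[k] - predict(J2, data.frame(x = chemical[k]))
+     J3 <- lm(log(y) ~ x)
+     e3[k] <- magnetic[k] - exp(predict(J3, data.frame(x = chemical[k])))
+     J4 <- lm(y ~ x + I(x^2) + I(x^3))
+     e4[k] <- magnetic[k] - predict(J4, data.frame(x = chemical[k]))
+ }
> c(mean(e1^2), mean(e2^2), mean(e3^2), mean(e4^2))
The model with the smallest mean squared prediction error would be selected by cross validation, and the adjusted R² values for the full-data fits are available from summary(L1) through summary(L4).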
CHAPTER 8
Permutation Tests
8.1 Implement the two-sample Cramér-von Mises test for equal distributions as a per-
mutation test. Apply the test to the chickwts data.
The Cramér–von Mises statistic is
$$W_2 = \frac{mn}{(m+n)^2}\left[\sum_{i=1}^{n}\left(F_n(x_i) - G_m(x_i)\right)^2 + \sum_{j=1}^{m}\left(F_n(y_j) - G_m(y_j)\right)^2\right],$$
where $F_n$ is the ecdf of the sample $x_1, \ldots, x_n$ and $G_m$ is the ecdf of the sample
$y_1, \ldots, y_m$.
> cvm.test <- function(x, y, R = 199) {
+ n <- length(x)
+ m <- length(y)
+ z <- c(x, y)
+ N <- n + m
+ Fn <- numeric(N)
+ Gm <- numeric(N)
+ for (i in 1:N) {
+ Fn[i] <- mean(as.integer(z[i] <= x))
+ Gm[i] <- mean(as.integer(z[i] <= y))
+ }
+ cvm0 <- ((n * m)/N) * sum((Fn - Gm)^2)
+ cvm <- replicate(R, expr = {
+ k <- sample(1:N)
+ Z <- z[k]
+ X <- Z[1:n]
+ Y <- Z[(n + 1):N]
+ for (i in 1:N) {
+ Fn[i] <- mean(as.integer(Z[i] <= X))
+ Gm[i] <- mean(as.integer(Z[i] <= Y))
+ }
+ ((n * m)/N) * sum((Fn - Gm)^2)
+ })
+ cvm1 <- c(cvm, cvm0)
+ return(list(statistic = cvm0, p.value = mean(cvm1 >=
+ cvm0)))
+ }
> attach(chickwts)
> x1 <- as.vector(weight[feed == "soybean"])
> x2 <- as.vector(weight[feed == "sunflower"])
> x3 <- as.vector(weight[feed == "linseed"])
> cvm.test(x1, x3)
$p.value
[1] 0.405
The p-value for the CvM test comparing soybean and linseed supplements is
not significant. There is no evidence of a difference between these distributions.
> cvm.test(x2, x3)
$statistic
[1] 36.5
$p.value
[1] 0.005
The p-value for the CvM test comparing sunflower and linseed supplements is
significant at α = 0.01, so there is strong evidence that the distributions of weights
for these two groups are different.
8.2 Implement the bivariate Spearman rank correlation test for independence as a per-
mutation test. Compare the achieved significance level of the permutation test with
the p-value reported by cor.test on the same samples.
> spear.perm <- function(x, y) {
+ stest <- cor.test(x, y, method = "spearman")
+ n <- length(x)
+ rs <- replicate(R, expr = {
+ k <- sample(1:n)
+ cor.test(x, y[k], method = "spearman")$estimate
+ })
+ rs1 <- c(stest$estimate, rs)
+ pval <- mean(as.integer(stest$estimate <=
+ rs1))
+ return(list(rho.s = stest$estimate, p.value = pval))
+ }
In the following examples, the mvrnorm function is used to generate correlated
samples. In the first example, the samples are bivariate normal. In the second
example, the samples are lognormal. The p-values for cor.test and spear.perm
should be approximately equal in both cases.
> library(MASS)
> mu <- c(0, 0)
> Sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
> n <- 30
> R <- 499
> x <- mvrnorm(n, mu, Sigma)
> cor.test(x[, 1], x[, 2], method = "spearman")
$p.value
[1] 0.018
> x <- exp(mvrnorm(n, mu, Sigma))
> cor.test(x[, 1], x[, 2], method = "spearman")
Spearman's rank correlation rho
$p.value
[1] 0.002
The p-values for both tests are significant and close in value.
> x <- exp(mvrnorm(n, mu, Sigma))
> cor.test(x[, 1], x[, 2], method = "spearman")
Spearman's rank correlation rho
0.6200222
$p.value
[1] 0.002
> detach(package:MASS)
Again, the p-values are both significant and close in value.
8.3 The Count 5 criterion is not applicable for unequal sample sizes. Implement a
permutation test for equal variance that is based on the maximum outliers statistic
that applies when sample sizes are not necessarily equal.
> maxoutliers <- function(x, y) {
+ X <- x - mean(x)
+ Y <- y - mean(y)
+ outx <- sum(X > max(Y)) + sum(X < min(Y))
+ outy <- sum(Y > max(X)) + sum(Y < min(X))
+ return(max(c(outx, outy)))
+ }
> maxout <- function(x, y, R = 199) {
+ z <- c(x, y)
+ n <- length(x)
+ N <- length(z)
+ stats <- replicate(R, expr = {
+ k <- sample(1:N)
+ k1 <- k[1:n]
+ k2 <- k[(n + 1):N]
+ maxoutliers(z[k1], z[k2])
+ })
+ stat <- maxoutliers(x, y)
+ stats1 <- c(stats, stat)
+ tab <- table(stats1)/(R + 1)
+ return(list(estimate = stat, p = mean(stats1 >=
+ stat), freq = tab, cdf = cumsum(tab)))
+ }
In the first example, variances are equal. In the second example, variances are
unequal. In both examples, sample sizes are unequal. Rather than return only a
p-value, here the permutation test procedure returns the distribution of the max-
outliers statistic.
> set.seed(100)
> n1 <- 20
> n2 <- 40
> mu1 <- mu2 <- 0
> sigma1 <- sigma2 <- 1
> x <- rnorm(n1, mu1, sigma1)
> y <- rnorm(n2, mu2, sigma2)
> maxout(x, y)
$estimate
[1] 6
$p
[1] 0.195
$freq
stats1
1 2 3 4 5 6 7 8 9 11
0.135 0.240 0.215 0.165 0.050 0.085 0.045 0.030 0.015 0.015
16
0.005
$cdf
1 2 3 4 5 6 7 8 9 11
0.135 0.375 0.590 0.755 0.805 0.890 0.935 0.965 0.980 0.995
16
1.000
This is the equal variance example. The observed statistic is not significant. With
the usual critical value of 5 for equal sample sizes, the significance level of the test
is 0.055. Refer to the cdf of the replicates to determine if the statistic is significant
at 0.055. In the next example the variances are unequal.
> set.seed(100)
> sigma1 <- 1
> sigma2 <- 2
> x <- rnorm(n1, mu1, sigma1)
> y <- rnorm(n2, mu2, sigma2)
> maxout(x, y)
$estimate
[1] 18
$p
[1] 0.005
$freq
stats1
1 2 3 4 5 6 7 8 9 10
0.080 0.350 0.210 0.075 0.090 0.060 0.030 0.060 0.010 0.010
11 12 14 18
0.005 0.010 0.005 0.005
$cdf
1 2 3 4 5 6 7 8 9 10
0.080 0.430 0.640 0.715 0.805 0.865 0.895 0.955 0.965 0.975
11 12 14 18
0.980 0.990 0.995 1.000
The observed statistic here, in the case of unequal variances, is significant.
8.4 Complete the steps to implement a rth -nearest neighbors test for equal distributions.
Write a function to compute the test statistic. The function should take the data
matrix as its first argument, and an index vector as the second argument. The
number of nearest neighbors r should follow the index argument.
The Tn3 statistic for third nearest neighbors can be generalized to rth NN. We
apply it to simulated multivariate normal data.
> library(knnFinder)
> library(boot)
> Tn.r <- function(z, ix, nbrs, sizes) {
+ n1 <- sizes[1]
+ n2 <- sizes[2]
+ n <- n1 + n2
+ r <- min(nbrs, n - 1)
+ z <- z[ix, ]
+ o <- rep(0, NROW(z))
+ z <- as.data.frame(cbind(z, o))
+ NN <- nn(z, p = r)
+ block1 <- NN$nn.idx[1:n1, ]
+ block2 <- NN$nn.idx[(n1 + 1):n, ]
+ i1 <- sum(block1 < n1 + 0.5)
+ i2 <- sum(block2 > n1 + 0.5)
+ return((i1 + i2)/(r * n))
+ }
The simulated data are distributed as X ∼ N4 (0, I) and Y ∼ N4 ((1, 1, 1, 1)T , I)
with unequal sample sizes.
> x <- matrix(rnorm(100), 25, 4)
> y <- matrix(rnorm(200, 1), 50, 4)
> N <- c(nrow(x), nrow(y))
> z <- rbind(x, y)
> boot.obj <- boot(data = z, statistic = Tn.r, sim = "permutation",
+ R = 199, sizes = N, nbrs = 10)
> boot.obj
DATA PERMUTATION
Call:
boot(data = z, statistic = Tn.r, R = 199, sim = "permutation",
sizes = N, nbrs = 10)
Bootstrap Statistics :
original bias std. error
t1* 0.724 -0.1751625 0.02089114
> b <- c(boot.obj$t, boot.obj$t0)
> b0 <- boot.obj$t0
> mean(b >= b0)
[1] 0.005
> detach(package:boot)
> detach(package:knnFinder)
The 10th NN test statistic is significant at α = 0.05, so the hypothesis $H_0: X \overset{d}{=} Y$
is rejected.
Project 8.B. The following approach follows Example 4.12 in Davison and
Hinkley (1997) p. 160.
Pool the two samples and obtain the ordered failure times y1 < · · · < ym. Let fij
be the number of failures in group i at time yj, and rij be the number at risk in
group i at time yj, i = 1, 2. Then

    m1j = (f.j r1j) / r.j ,        v1j = (f.j r1j r2j (r.j − f.j)) / (r.j² (r.j − 1)),
are the conditional mean and conditional variance of the number in group 1 to fail
at time tj given f. j = f1j + f2j and r. j = r1j + r2j .
The log-rank statistic is

    T = ( Σ_{j=1}^m (f1j − m1j) ) / ( Σ_{j=1}^m v1j )^{1/2} .
> library(boot)
> attach(aml)
> logrankstat <- function(dat, i) {
+ AML <- aml
+ j1 <- i[1:11]
+ j2 <- i[12:23]
+ x1 <- AML$time[j1]
+ x2 <- AML$time[j2]
+ c1 <- AML$cens[j1]
+ c2 <- AML$cens[j2]
+ t1 <- AML$time[j1][c1 == "1"]
+ t2 <- AML$time[j2][c2 == "1"]
+ y <- sort(c(t1, t2))
+ N <- length(c(t1, t2))
+ f1 <- numeric(N)
+ r1 <- numeric(N)
+ f2 <- numeric(N)
+ r2 <- numeric(N)
+ for (i in 1:N) {
+ f1[i] <- sum(t1 <= y[i])
+ f2[i] <- sum(t2 <= y[i])
+ r1[i] <- sum(x1 >= y[i])
+ r2[i] <- sum(x2 >= y[i])
+ }
+ f1 <- diff(c(0, f1))
+ f2 <- diff(c(0, f2))
+ f <- f1 + f2
+ r <- r1 + r2
+ m1 <- (f * r1)/r
+ v1 <- (f * r1 * r2) * (r - f)/(r^2 * (r -
+ 1))
+ sum(f1 - m1)/sqrt(sum(v1))
+ }
> b <- boot(aml, statistic = logrankstat, R = 999,
+ sim = "permutation")
> b
DATA PERMUTATION
Call:
boot(data = aml, statistic = logrankstat, R = 999, sim = "permutation")
Bootstrap Statistics :
original bias std. error
t1* -1.842929 1.819782 1.047254
> bt <- c(b$t, b$t0)
> mean(bt <= b$t0)
[1] 0.045
> detach(aml)
> detach(package:boot)
CHAPTER 9
Markov Chain Monte Carlo Methods
9.1 Use the Metropolis-Hastings sampler with proposal distribution χ2 (Xt ) to generate a
sample from the Rayleigh(σ = 2) target distribution. Compare the performance of the
Metropolis-Hastings sampler for Example 9.1 and this problem. In particular, what
differences are obvious from the plot corresponding to Figure 9.1?
The proposal distribution is χ2 (Xt ), as in Example 9.1. The code from Example 9.1
can be repeated, changing the parameter σ.
> f <- function(x, sigma) {
+ if (x < 0)
+ return(0)
+ stopifnot(sigma > 0)
+ return((x/sigma^2) * exp(-x^2/(2 * sigma^2)))
+ }
> m <- 10000
> sigma <- 2
> x <- numeric(m)
> x[1] <- rchisq(1, df = 1)
> k <- 0
> u <- runif(m)
> for (i in 2:m) {
+ xt <- x[i - 1]
+ y <- rchisq(1, df = xt)
+ num <- f(y, sigma) * dchisq(xt, df = y)
+ den <- f(xt, sigma) * dchisq(y, df = xt)
+ if (u[i] <= num/den)
+ x[i] <- y
+ else {
+ x[i] <- xt
+ k <- k + 1
+ }
+ }
> print(k)
[1] 5298
In this example more than half of the candidate points are rejected, so the chain is
less efficient when σ = 2 than when σ = 4.
To display the plot corresponding to Figure 9.1:
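The plotting code itself is not echoed above; a minimal sketch (the displayed index
range 5000:5500 is an assumption, mirroring the next exercise) is:

index <- 5000:5500
y1 <- x[index]
plot(index, y1, type = "l", main = "", ylab = "x")    # trace plot of part of the chain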
[Figure: trace plot of the chain x against the index, corresponding to Figure 9.1.]
The plot moves horizontally (indicating consecutive rejections) more often than
when σ = 4.
9.2 Repeat Example 9.1 using the proposal distribution Y ∼ Gamma(Xt , 1) (shape
parameter Xt and rate parameter 1).
> f <- function(x, sigma) {
+ if (x < 0)
+ return(0)
+ stopifnot(sigma > 0)
+ return((x/sigma^2) * exp(-x^2/(2 * sigma^2)))
+ }
> m <- 10000
> sigma <- 4
> x <- numeric(m)
> x[1] <- 1
> k <- 0
> u <- runif(m)
> for (i in 2:m) {
+ xt <- x[i - 1]
+ y <- rgamma(1, xt, 1)
+ num <- f(y, sigma) * dgamma(xt, y, 1)
+ den <- f(xt, sigma) * dgamma(y, xt, 1)
+ if (u[i] <= num/den)
+ x[i] <- y
+ else {
+ x[i] <- xt
+ k <- k + 1
+ }
+ }
> print(k)
[1] 2956
In this example only about 30% of the candidate points are rejected, so this chain is
somewhat more efficient for σ = 4 than the chain generated using the χ2 proposal
distribution.
> index <- 5000:5500
> y1 <- x[index]
> plot(index, y1, type = "l", main = "", ylab = "x")
[Figure: trace plot of x[5000:5500] against the index for the Gamma(Xt, 1) proposal chain.]
9.3 Use the Metropolis-Hastings sampler to generate random variables from a standard
Cauchy distribution. Discard the first 1000 of the chain, and compare the deciles of
the generated observations with the deciles of the standard Cauchy distribution.
The following chain uses the N (µt , σ 2 ) proposal distribution, where µt = Xt is
the previous value in the chain. Then
    r(xt, y) = f(y)g(xt|y) / ( f(xt)g(y|xt) )
             = [ (1 + xt²) π √(2π) σ e^{−(xt−y)²/(2σ²)} ] / [ (1 + y²) π √(2π) σ e^{−(y−xt)²/(2σ²)} ]
             = (1 + xt²) / (1 + y²).
> m <- 10000
> sigma <- 3
> x <- numeric(m)
> x[1] <- rnorm(1, 0, sigma)
> k <- 0
> u <- runif(m)
> for (i in 2:m) {
+ xt <- x[i - 1]
+ y <- rnorm(1, xt, sigma)
+ num <- 1 + xt^2
+ den <- 1 + y^2
+ if (u[i] <= num/den)
+ x[i] <- y
+ else {
+ x[i] <- xt
+ k <- k + 1
+ }
+ }
> print(k)
[1] 4800
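The decile and upper-tail comparisons are not echoed above; a sketch of how they
might be computed (the burn-in of 1000 values is an assumption) is:

b <- 1001                                                       # assumed burn-in
y <- x[b:m]
p <- seq(0.1, 0.9, 0.1)
round(rbind(chain = quantile(y, p), Cauchy = qcauchy(p)), 3)    # deciles
pu <- c(0.95, 0.975, 0.99)
round(rbind(chain = quantile(y, pu), Cauchy = qcauchy(pu)), 3)  # upper tail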
We also computed the upper tail quantiles. The deciles of the generated chain
are roughly in agreement with the deciles of standard Cauchy. In the upper tail
the difference between sample and Cauchy quantiles is even greater. A QQ plot is
below.
[Figure: QQ plot of the sample quantiles (Q) of the chain against standard Cauchy quantiles.]
9.4 Implement a random walk Metropolis sampler for generating the standard Laplace
distribution. For the increment, simulate from a normal distribution. Compare
the chains generated when different variances are used for the proposal distribution.
Also, compute the acceptance rates of each chain.
The standard Laplace density is

    f(x) = (1/2) e^{−|x|},

and

    r(xt, y) = f(y)/f(xt) = e^{−|y|} / e^{−|xt|} = e^{|xt| − |y|}.
In the generator rw.Laplace below, N is the length of the chain to generate,
x0=x[1] is the initial value, and sigma is the standard deviation of the normal
proposal distribution. At each step, the candidate point is generated from N(µt, σ²),
where µt = Xt is the previous value in the chain. The return value is a list
containing the generated chain $x and the number of rejected points $k.
> rw.Laplace <- function(N, x0, sigma) {
+ x <- numeric(N)
+ x[1] <- x0
+ u <- runif(N)
+ k <- 0
+ for (i in 2:N) {
+ xt <- x[i - 1]
+ y <- rnorm(1, xt, sigma)
+ if (u[i] <= exp(abs(xt) - abs(y)))
+ x[i] <- y
+ else {
+ x[i] <- x[i - 1]
+ k <- k + 1
+ }
+ }
+ return(list(x = x, k = k))
+ }
> N <- 5000
> sigma <- c(0.5, 1, 2, 4)
> x0 <- rnorm(1)
> rw1 <- rw.Laplace(N, x0, sigma[1])
> rw2 <- rw.Laplace(N, x0, sigma[2])
> rw3 <- rw.Laplace(N, x0, sigma[3])
> rw4 <- rw.Laplace(N, x0, sigma[4])
> print(c(rw1$k, rw2$k, rw3$k, rw4$k))
[Figure: trace plots of rw1$x, rw2$x, rw3$x, and rw4$x against the index.]
Based on the plots above, a short burn-in sample of size 100 is discarded from
each chain. Each of the chains appears to have converged to the target Laplace
distribution. Chains 2 and 3, corresponding to σ = 1, 2, have the best fits based on
the QQ plots. The second chain is the more efficient of the two.
[Figure: histograms of the four chains y1, y2, y3, y4 (after burn-in), density scale.]
[Figure: QQ plots Q1, Q2, Q3, Q4 of each chain against Laplace quantiles z.]
9.5 What effect, if any, does the width w have on the mixing of the chain in the invest-
ment model? Repeat the simulation keeping the random number seed fixed, trying
different proposal distributions based on the random increments from Uniform(−w, w),
varying w.
Only a minor modification to the original code is needed. The chains are
combined in matrix X. A counter was added to compute a rejection rate for each
chain.
+ w <- W[j]
+ u <- runif(m)
+ v <- runif(m, -w, w)
+ x[1] <- 0.25
+ for (i in 2:m) {
+ y <- x[i - 1] + v[i]
+ if (u[i] <= prob(y, win)/prob(x[i - 1], win))
+ x[i] <- y
+ else {
+ x[i] <- x[i - 1]
+ k[j] <- k[j] + 1
+ }
+ }
+ X[, j] <- x
+ }
> win
[1] 82 72 45 34 17
> win/days
> k/m
Smaller w corresponds to lower rejection rates. All estimates are close to the
actual value of the parameter 0.2. In the plots below it appears that the chains
with lower w are mixing well and more efficient.
[Figure: partial trace plots (indices 4500–5000) of the four chains for the different proposal widths w.]
[Figure: histograms of the generated values of β for chains 1–4.]
9.6 Rao (1973) presented an example on genetic linkage of 197 animals in four cate-
gories. The group sizes are (125, 18, 20, 34). Assume that the probabilities of the
corresponding multinomial distribution are
    ( 1/2 + θ/4,  (1 − θ)/4,  (1 − θ)/4,  θ/4 ).
Estimate the posterior distribution of θ given the observed sample, using one of the
methods in this chapter.
Use the M-H random walk sampler with a uniform proposal distribution. The
posterior distribution of θ given the observed frequencies k = (k1 , k2 , k3 , k4 ) is
fθ|K (θ) ∝ (2 + θ)k1 (1 − θ)k2 +k3 θk4 .
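The sampler code for this exercise is not reproduced above; a minimal sketch of a
random walk Metropolis sampler for this posterior (chain length, proposal width w,
and starting value are assumed) is:

# random walk M-H for the posterior of theta; Uniform(-w, w) increments (assumed w)
k.obs <- c(125, 18, 20, 34)
post <- function(theta, k) {
    if (theta < 0 || theta > 1) return(0)
    (2 + theta)^k[1] * (1 - theta)^(k[2] + k[3]) * theta^k[4]
}
m <- 10000
w <- 0.25
x <- numeric(m)
x[1] <- 0.5
u <- runif(m)
v <- runif(m, -w, w)
for (i in 2:m) {
    y <- x[i - 1] + v[i]
    if (u[i] <= post(y, k.obs)/post(x[i - 1], k.obs))
        x[i] <- y
    else x[i] <- x[i - 1]
}
mean(x[1001:m])     # posterior mean estimate after an (assumed) burn-in of 1000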
[Figure: estimated posterior density of θ from the generated chain.]
9.7 Implement a Gibbs sampler to generate a bivariate normal chain (Xt , Yt ) with zero
means, unit standard deviations, and correlation 0.9. Plot the generated sample
after discarding a suitable burn-in sample. Fit a simple linear regression model
Y = β0 + β1 X to the sample and check the residuals of the model for normality and
constant variance.
This Gibbs sampler has been implemented in the examples, with different values
for the parameters.
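The chain-generating and model-fitting code is not echoed here; a minimal sketch
(chain length N = 5000 and burn-in 1000 are assumptions) is:

# Gibbs sampler for bivariate normal: means 0, standard deviations 1, rho = 0.9
N <- 5000
burn <- 1000
rho <- 0.9
s <- sqrt(1 - rho^2)                        # conditional standard deviation
XY <- matrix(0, N, 2)
for (i in 2:N) {
    y <- XY[i - 1, 2]
    XY[i, 1] <- rnorm(1, rho * y, s)        # X | Y = y  ~  N(rho*y, 1 - rho^2)
    XY[i, 2] <- rnorm(1, rho * XY[i, 1], s) # Y | X = x  ~  N(rho*x, 1 - rho^2)
}
X <- XY[(burn + 1):N, 1]
Y <- XY[(burn + 1):N, 2]
L <- lm(Y ~ X)
L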
Coefficients:
(Intercept) X
0.00133 0.90466
> summary(L)
Call:
lm(formula = Y ~ X)
Residuals:
Min 1Q Median 3Q Max
-1.695721 -0.291392 -0.003165 0.294272 1.761891
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.00133 0.00688 0.193 0.847
X 0.90466 0.00701 129.058 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Figure: scatterplot of Y against X for the generated sample.]
For residual plots, the easiest approach is plot(L). Alternately, plots can be
generated as shown below.
[Figure: residuals plotted against the fitted values L$fit.]
[Figure: normal QQ plot of the residuals (Sample Quantiles against Theoretical Quantiles).]
The plot of residuals vs fits suggests that the error variance is constant with
respect to the response variable. The QQ plot of residuals is consistent with the
normal error assumption of the linear model.
9.8 (Beta-binomial) Consider the bivariate density
    f(x, y) ∝ C(n, x) y^{x+a−1} (1 − y)^{n−x+b−1},   x = 0, 1, . . . , n,  0 ≤ y ≤ 1.
It can be shown that for fixed a, b, n, the conditional distributions are Binomial(n, y)
and Beta(x + a, n − x + b). Use the Gibbs sampler to generate a chain with target
joint density f (x, y).
In the example below, the parameters are a = 2, b = 3, n = 10.
For a bivariate distribution (X, Y ), at each iteration the Gibbs sampler
(1) Generate X∗(t) from Binomial(n, p = Y(t − 1)).
(2) Update x(t) = X∗(t).
(3) Generate Y∗(t) from Beta(x(t) + a, n − x(t) + b).
(4) Set (X(t), Y(t)) = (X∗(t), Y∗(t)).
> N <- 10000
> burn <- 2000
> a <- 2
> b <- 3
> n <- 10
> x <- y <- rep(0, N)
> x[1] <- rbinom(1, prob = 0.5, size = n)
> y[1] <- rbeta(1, x[1] + a, n - x[1] + b)
> for (i in 2:N) {
+ x[i] <- rbinom(1, prob = y[i - 1], size = n)
+ y[i] <- rbeta(1, x[i] + a, n - x[i] + b)
+ }
> xb <- x[(burn + 1):N]
> f1 <- table(xb)/length(xb)
Above, the estimated marginal distribution of X is computed. The true marginal
probability mass function is

    f(x) = C(n, x) B(x + a, n − x + b) / B(a, b),   x = 0, 1, . . . , n.
The estimate and target are compared below in a barplot.
> i <- 0:n
> fx <- choose(n, i) * beta(i + a, n - i + b)/beta(a, b)
> round(rbind(f1, fx), 3)
0 1 2 3 4 5 6 7 8 9 10
f1 0.061 0.112 0.128 0.147 0.141 0.120 0.115 0.08 0.054 0.032 0.010
fx 0.066 0.110 0.135 0.144 0.140 0.126 0.105 0.08 0.054 0.030 0.011
> barplot(fx, space = 0, ylim = c(0, 0.15), xlab = "n",
+ main = "p(n)=bar; est=points")
> points(0:n + 0.5, f1)
[Figure: barplot of the true pmf fx with the estimated probabilities f1 overlaid as points ("p(n)=bar; est=points").]
9.9 Modify the Gelman-Rubin convergence monitoring so that only the final value of
R̂ is computed, and repeat the example, omitting the graphs.
> Gelman.Rubin <- function(psi) {
+ psi <- as.matrix(psi)
+ n <- ncol(psi)
+ k <- nrow(psi)
+ psi.means <- rowMeans(psi)
+ B <- n * var(psi.means)
+ psi.w <- apply(psi, 1, "var")
+ W <- mean(psi.w)
+ v.hat <- W * (n - 1)/n + (B/n)
+ r.hat <- v.hat/W
+ return(r.hat)
+ }
> normal.chain <- function(sigma, N, X1) {
+ x <- rep(0, N)
+ x[1] <- X1
+ u <- runif(N)
+ for (i in 2:N) {
+ xt <- x[i - 1]
+ y <- rnorm(1, xt, sigma)
+ r1 <- dnorm(y, 0, 1) * dnorm(xt, y, sigma)
+ r2 <- dnorm(xt, 0, 1) * dnorm(y, xt, sigma)
+ r <- r1/r2
+ if (u[i] <= r)
+ x[i] <- y
+ else x[i] <- xt
+ }
+ return(x)
+ }
> sigma <- 0.2
> k <- 4
> n <- 15000
> b <- 1000
> x0 <- c(-10, -5, 5, 10)
> X <- matrix(0, nrow = k, ncol = n)
> for (i in 1:k) X[i, ] <- normal.chain(sigma, n, x0[i])
> psi <- t(apply(X, 1, cumsum))
> for (i in 1:nrow(psi)) psi[i, ] <- psi[i, ]/(1:ncol(psi))
> rhat <- Gelman.Rubin(psi)
> rhat
[1] 1.201070
9.10 Use the Gelman-Rubin method to monitor convergence of the Rayleigh M-H chain,
and run the chain until the chain has converged approximately to the target dis-
tribution according to R̂ < 1.2. (See Exercise 9.9.) Also use the coda package to
check for convergence of the chain by the Gelman-Rubin method.
> f <- function(x, sigma) {
+ if (x < 0)
+ return(0)
+ stopifnot(sigma > 0)
+ return((x/sigma^2) * exp(-x^2/(2 * sigma^2)))
+ }
> Rayleigh.MH.chain1 <- function(sigma, m, x0) {
+ x <- numeric(m)
+ x[1] <- x0
+ u <- runif(m)
+ for (i in 2:m) {
+ xt <- x[i - 1]
+ y <- rchisq(1, df = xt)
+ num <- f(y, sigma) * dchisq(xt, df = y)
+ den <- f(xt, sigma) * dchisq(y, df = xt)
+ if (u[i] <= num/den)
+ x[i] <- y
+ else x[i] <- xt
+ }
+ return(x)
+ }
> sigma <- 4
> x0 <- c(1/sigma^2, 1/sigma, sigma^2, sigma^3)
> k <- 4
> m <- 2000
> X <- matrix(0, nrow = k, ncol = m)
> for (i in 1:k) X[i, ] <- Rayleigh.MH.chain1(sigma, m,
+ x0[i])
> psi <- t(apply(X, 1, cumsum))
> for (i in 1:nrow(psi)) psi[i, ] <- psi[i, ]/(1:ncol(psi))
> rhat <- Gelman.Rubin(psi)
> rhat
[1] 1.168618
To use the Gelman-Rubin diagnostic functions in coda, convert the chains into
mcmc objects. Then create a list of the mcmc objects. The mcmc.list is then the
argument to gelman.diag and gelman.plot.
> library(coda)
> X1 <- as.mcmc(X[1, ])
> X2 <- as.mcmc(X[2, ])
> X3 <- as.mcmc(X[3, ])
> X4 <- as.mcmc(X[4, ])
> Y <- mcmc.list(X1, X2, X3, X4)
> print(gelman.diag(Y))
Remark: the help page for gelman.plot states that the argument x is an mcmc object.
In fact the function expects a list of mcmc objects, or an object that can be
converted into a list of this type.
By default the plots print in two colors. Here black and white is specified. The
lattice library is loaded.
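The plotting call is not echoed; a sketch (the col and lty arguments passed to
gelman.plot are assumptions) is:

library(lattice)
gelman.plot(Y, col = c("black", "black"), lty = c(1, 2))   # shrink factor plot in black and white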
[Figure: Gelman-Rubin shrink factor plot (median and 97.5% curves) for the Rayleigh chains.]
> detach(package:coda)
> detach(package:lattice)
9.11 Use the Gelman-Rubin method to monitor convergence of the random walk Me-
tropolis chain for β in the investment model. Also use the coda package to check
for convergence of the chain by the Gelman-Rubin method.
> b <- 0.2
> m <- 5000
> burn <- 1000
> days <- 250
> win <- c(82, 72, 45, 34, 17)
> prob <- function(y, win) {
+ if (y < 0 || y >= 0.5)
+ return(0)
+ return((1/3)^win[1] * ((1 - y)/3)^win[2] * ((1 -
+ 2 * y)/3)^win[3] * ((2 * y)/3)^win[4] * (y/3)^win[5])
+ }
> w <- 0.1
> X <- matrix(0, 4, m)
> x <- numeric(m)
> x0 <- c(0.01, 0.1, 0.4, 0.49)
> for (j in 1:4) {
+ u <- runif(m)
+ v <- runif(m, -w, w)
[1] 1.107664
If R̂ < 1.2 then the chain appears to have converged within the 5000 iterations.
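The body of the chain-generating loop and the computation of R̂ are not echoed
above; a sketch following the pattern of Exercises 9.5 and 9.9 (whether a burn-in is
discarded before computing R̂ is an assumption) is:

# random walk Metropolis updates for each of the four chains
for (j in 1:4) {
    u <- runif(m)
    v <- runif(m, -w, w)
    x[1] <- x0[j]
    for (i in 2:m) {
        y <- x[i - 1] + v[i]
        if (u[i] <= prob(y, win)/prob(x[i - 1], win))
            x[i] <- y
        else x[i] <- x[i - 1]
    }
    X[j, ] <- x
}
# Gelman-Rubin statistic from the running means of the chains
psi <- t(apply(X, 1, cumsum))
for (i in 1:nrow(psi)) psi[i, ] <- psi[i, ]/(1:ncol(psi))
Gelman.Rubin(psi)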
The analysis is repeated below using gelman.diag and gelman.plot in the coda
package.
> library(coda)
> X1 <- as.mcmc(X[1, ])
> X2 <- as.mcmc(X[2, ])
> X3 <- as.mcmc(X[3, ])
> X4 <- as.mcmc(X[4, ])
> Y <- mcmc.list(X1, X2, X3, X4)
> print(gelman.diag(Y))
[Figure: Gelman-Rubin shrink factor plot (median and 97.5% curves) for the investment model chains.]
> detach(package:coda)
> detach(package:lattice)
9.12 Use the Gelman-Rubin method to monitor convergence of the independence sampler
chain for the mixing probability p in the normal location mixture. Also use the coda
package to check for convergence of the chain by the Gelman-Rubin method.
> m <- 5000
> a <- 1
> b <- 1
> p <- 0.2
> n <- 30
> mu <- c(0, 5)
> sigma <- c(1, 1)
> X <- matrix(0, 4, m)
> x0 <- c(0.01, 0.05, 0.95, 0.99)
> i <- sample(1:2, size = n, replace = TRUE, prob = c(p,
+ 1 - p))
> x <- rnorm(n, mu[i], sigma[i])
> for (j in 1:4) {
+ xt <- numeric(m)
+ u <- runif(m)
+ y <- rbeta(m, a, b)
+ xt[1] <- x0[j]
+ for (i in 2:m) {
[1] 1.211341
If R̂ < 1.2, the chain has converged to the target distribution within 5000
iterations. The analysis is repeated below using the coda package.
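The body of the independence sampler loop and the computation of R̂ are not echoed
above; a sketch following Example 9.5 of the text (the exact form of the acceptance
ratio is assumed from that example) is:

# independence sampler for the mixing probability p, one chain per starting value
for (j in 1:4) {
    xt <- numeric(m)
    u <- runif(m)
    y <- rbeta(m, a, b)                     # Beta(a, b) proposal distribution
    xt[1] <- x0[j]
    for (i in 2:m) {
        fy <- y[i] * dnorm(x, mu[1], sigma[1]) +
              (1 - y[i]) * dnorm(x, mu[2], sigma[2])
        fx <- xt[i - 1] * dnorm(x, mu[1], sigma[1]) +
              (1 - xt[i - 1]) * dnorm(x, mu[2], sigma[2])
        r <- prod(fy/fx) * (dbeta(xt[i - 1], a, b)/dbeta(y[i], a, b))
        if (u[i] <= r)
            xt[i] <- y[i]
        else xt[i] <- xt[i - 1]
    }
    X[j, ] <- xt
}
psi <- t(apply(X, 1, cumsum))
for (i in 1:nrow(psi)) psi[i, ] <- psi[i, ]/(1:ncol(psi))
Gelman.Rubin(psi)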
> library(coda)
> X1 <- as.mcmc(X[1, ])
> X2 <- as.mcmc(X[2, ])
> X3 <- as.mcmc(X[3, ])
> X4 <- as.mcmc(X[4, ])
> Y <- mcmc.list(X1, X2, X3, X4)
> print(gelman.diag(Y))
[Figure: Gelman-Rubin shrink factor plot (median and 97.5% curves) for the independence sampler chains.]
> detach(package:coda)
> detach(package:lattice)
CHAPTER 10
Density Estimation
10.1 Construct a histogram estimate of density for a random sample of standard lognor-
mal data using Sturges’ Rule, for sample size n = 100. Repeat the estimate for the
same sample using the correction for skewness proposed by Doane 1976. Compare
the number of bins and break points using both methods. Compare the density es-
timates at the deciles of the lognormal distribution with the lognormal density at
the same points. Does the suggested correction give better density estimates in this
example?
Doane’s correction adds

    Ke = log2( 1 + |√b1| / σ(√b1) )

classes, where √b1 denotes the sample skewness and

    σ(√b1) = sqrt( 6(n − 2) / ((n + 1)(n + 3)) ).
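The solution code for this exercise is not reproduced here; a minimal sketch
comparing the two bin counts (the skewness estimator is an assumption) is:

# Sturges' rule with and without Doane's skewness correction, lognormal sample
n <- 100
x <- rlnorm(n)
sk <- mean((x - mean(x))^3)/sd(x)^3                 # sample skewness (assumed estimator)
sigma.b1 <- sqrt(6 * (n - 2)/((n + 1) * (n + 3)))
Ke <- log2(1 + abs(sk)/sigma.b1)                    # extra classes from Doane's correction
k.sturges <- nclass.Sturges(x)
k.doane <- ceiling(log2(n) + 1 + Ke)
c(k.sturges, k.doane)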
The estimate of the IMSE is

    Σ_{i=1}^k hMSE_i = Σ_{i=1}^k (h/νi) Σ_{j=1}^{νi} ( f̂(xij) − f(xij) )²,

where k is the number of bins, h is the uniform bin width, νi is the frequency of
observations in bin i, and xij is the jth observation in bin i.
By sorting the sample data first, it is easy to find the bin containing each sample
point as e.g.
bin1 <- rep(1:k[1], hg1$counts)
Then bin1 will contain (in order) the bin number corresponding to each element
of the ordered sample. We can then use the bin numbers to extract the density
estimates and bin frequencies from the histogram objects.
> n <- 500
> x <- sort(rnorm(n))
> f <- dnorm(x)
> k <- c(nclass.Sturges(x), nclass.scott(x), nclass.FD(x))
> R <- diff(range(x))
> h <- R/k
> br1 <- min(x) + h[1] * 0:k[1]
> br2 <- min(x) + h[2] * 0:k[2]
> br3 <- min(x) + h[3] * 0:k[3]
> hg1 <- hist(x, breaks = br1, plot = FALSE)
> hg2 <- hist(x, breaks = br2, plot = FALSE)
> hg3 <- hist(x, breaks = br3, plot = FALSE)
> bin1 <- rep(1:k[1], hg1$counts)
> bin2 <- rep(1:k[2], hg2$counts)
> bin3 <- rep(1:k[3], hg3$counts)
> imse1 <- sum((f - hg1$density[bin1])^2 * h[1]/hg1$counts[bin1])
> imse2 <- sum((f - hg2$density[bin2])^2 * h[2]/hg2$counts[bin2])
> imse3 <- sum((f - hg3$density[bin3])^2 * h[3]/hg3$counts[bin3])
> k
[1] 10 18 23
> h
[1] 0.7544384 0.4191324 0.3280167
> c(imse1, imse2, imse3)
[1] 0.007479438 0.003670251 0.004519484
Although the estimates vary, usually Scott’s rule produces the lowest estimate
of IMSE. The FD rule is close to Scott’s rule, and Sturges’ rule usually has a higher
IMSE than either the Scott or FD rules. Scott’s rule is (asymptotically) optimal for
normal distributions by the AMISE criterion.
10.3 Construct a frequency polygon density estimate for the precip dataset in R. Verify
that the estimate satisfies ∫_{−∞}^{∞} f̂(x) dx = 1 by numerical integration of the density
estimate.
> n <- length(precip)
> h <- 2.15 * sqrt(var(precip)) * n^(-1/5)
> nbins <- ceiling(diff(range(precip))/h)
> br <- min(precip) + h * 0:nbins
> brplus <- c(min(br) - h, max(br + h))
> histg <- hist(precip, breaks = br, freq = FALSE, main = "",
+ xlim = brplus)
> vx <- histg$mids
> vy <- histg$density
> delta <- diff(vx)[1]
> k <- length(vx)
[Figure: histogram density estimate of the precip data (density scale).]
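The polygon construction and the numerical integration are not echoed above; a
sketch, using the same vertex construction as the code later in this chapter, is:

# vertices of the frequency polygon: (m1 - delta, 0), (m1, f1), ..., (mk, fk), (mk + delta, 0)
vx <- vx + delta
vx <- c(vx[1] - 2 * delta, vx[1] - delta, vx)
vy <- c(0, vy, 0)
polygon(vx, vy)
# trapezoidal rule with equal spacing delta; the end ordinates are zero
delta * sum(vy)      # should be approximately 1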
> histg <- hist(precip, breaks = br, freq = FALSE, main = "",
+ xlim = brplus)
> vx <- histg$mids
> vy <- histg$density
> delta <- diff(vx)[1]
[Figure: histogram density estimate of the precip data (density scale).]
> h <- h1 * a
> nbins <- ceiling(diff(range(precip))/h)
> br <- min(precip) + h * 0:nbins
> brplus <- c(min(br) - h, max(br + h))
> histg <- hist(precip, breaks = br, freq = FALSE, main = "",
+ xlim = brplus)
> vx <- histg$mids
> vy <- histg$density
> delta <- diff(vx)[1]
> k <- length(vx)
> vx <- vx + delta
> vx <- c(vx[1] - 2 * delta, vx[1] - delta, vx)
> vy <- c(0, vy, 0)
> polygon(vx, vy)
> c(h1, a, h, delta)
[1] 12.5994091 0.5249548 6.6141201 6.6141201
> length(histg$counts)
[1] 10
[Figure: histogram of precip with the frequency polygon for bin width h = h1 · a.]
10.7 Construct an ASH density estimate for the precip dataset in R. Choose the best
value for width h∗ empirically by computing the estimates over a range of possible
values of h and comparing the plots of the densities. Does the optimal value hfnp
correspond to the optimal value h∗ suggested by comparing the density plots?
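The ASH code is not reproduced here; a sketch using the ash package (the range of
widths, the binning choices, and the plotting layout are assumptions) is:

# ASH estimates of the precip data over a range of bin widths h
library(ash)
x <- precip
par(mfrow = c(3, 4))
for (h in 6:17) {
    ab <- range(x) + c(-h, h)                   # widen the bin range by one width
    bins <- bin1(x, ab = ab, nbin = ceiling(diff(ab)/h))
    est <- ash1(bins, m = 5)                    # m = 5 shifted histograms (assumed)
    plot(est$x, est$y, type = "l", main = h, xlab = "precip", ylab = "Density")
}
par(mfrow = c(1, 1))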
[Figure: panels of ASH density estimates of the precip data for a range of bin widths (panel titles 6–17 visible).]
The “optimal” bin width appears to be close to 11. This value is close to the
h∗ suggested by the normal reference rule using a robust estimator of σ.
10.8 The buffalo dataset in the gss package contains annual snowfall accumulations
in Buffalo, New York from 1910 to 1973. Construct kernel density estimates of
the data using Gaussian and biweight kernels. Compare the estimates for different
choices of bandwidth. Is the estimate more influenced by the type of kernel or the
bandwidth?
> library(gss)
> data(buffalo)
> par(mfrow = c(3, 2))
> for (h in seq(4, 12, 4)) {
+ plot(density(buffalo, bw = h, kernel = "gaussian"),
+ main = "gaussian")
+ plot(density(buffalo, bw = h, kernel = "biweight"),
+ main = "biweight")
+ }
[Figure: kernel density estimates of the buffalo data (N = 63) with Gaussian (left) and biweight (right) kernels, for bandwidths 4, 8, and 12.]
[Figure: panels 1–6 of kernel density estimates for the six choices of smoothing parameter.]
The best choices of smoothing parameter are the ones given by the formulas
above and their average (1, 2, 5). Choice (2), corresponding to the default
“Silverman’s rule-of-thumb”, appears to fit best.
10.10 Apply the reflection boundary technique to obtain a better kernel density estimate
for the precipitation data. Compare the estimates in a single graph. Also try setting
from = 0 or cut = 0 in the density function.
The first plot shows the reflection boundary method in the solid curve with the
kernel density estimate as the dotted line.
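The code is not echoed; a sketch of the reflection approach (the bandwidth choice
and the evaluation grid are assumptions) is:

# reflection boundary technique for the precip data
x <- precip
bw <- bw.nrd0(x)                      # bandwidth chosen from the original sample
g <- density(c(x, -x), bw = bw)       # estimate from the sample reflected about 0
a <- seq(0, max(x) + 3 * bw, 0.1)
fhat <- 2 * approx(g$x, g$y, xout = a)$y
plot(a, fhat, type = "l", xlab = "precip", ylab = "Density")
lines(density(x, bw = bw), lty = 3)   # ordinary estimate, dotted, for comparison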
[Figure: reflection boundary estimate (solid) and ordinary kernel estimate (dotted) for precip.]
The second plot shows the effect of setting from = 0 and the third plot shows
the effect of setting cut = 0.
[Figure: kernel density estimate of precip with from = 0 (N = 70, Bandwidth = 3.848).]
[Figure: kernel density estimate of precip with cut = 0 (N = 70, Bandwidth = 3.848).]
10.11 Write a bivariate histogram plotting function. Apply your function to display the
bivariate faithful data (Old Faithful geyser).
> bin2d <- function(x, breaks1 = "Sturges", breaks2 = "Sturges") {
+ histg1 <- hist(x[, 1], breaks = breaks1, plot = FALSE)
+ histg2 <- hist(x[, 2], breaks = breaks2, plot = FALSE)
+ brx <- histg1$breaks
+ bry <- histg2$breaks
+ freq <- table(cut(x[, 1], brx), cut(x[, 2], bry))
+ return(list(call = match.call(), freq = freq, breaks1 = brx,
+ breaks2 = bry, mids1 = histg1$mids, mids2 = histg2$mids))
+ }
> fp2d <- function(x, breaks1 = "Sturges", breaks2 = "Sturges") {
+ b <- bin2d(x, breaks1, breaks2)
+ h1 <- diff(b$breaks1)
+ h2 <- diff(b$breaks2)
+ h <- outer(h1, h2, "*")
+ Z <- b$freq/(nrow(x) * h)
+ persp(x = b$mids1, y = b$mids2, z = Z, shade = TRUE,
+ xlab = "X", ylab = "Y", main = "", theta = 45,
+ phi = 30, ltheta = 60)
+ }
> n <- 2000
> d <- 2
> x <- matrix(rnorm(n * d), n, d)
> fp2d(x)
[Figure: perspective plot of the bivariate density histogram of the simulated normal data.]
> fp2d(log(as.matrix(faithful)))
[Figure: perspective plot of the bivariate density histogram of the log-transformed faithful data.]
> library(MASS)
> library(ash)
> attach(geyser)
> x <- as.matrix(geyser)
> detach(geyser)
> detach(package:MASS)
> nbin <- c(30, 30)
> m <- c(5, 5)
> b <- bin2(x, nbin = nbin)
> est <- ash2(b, m = m, kopt = c(1, 0))
[Figure: plot of the bivariate ASH density estimate for the geyser data.]
10.13 Generalize the bivariate ASH algorithm to compute an ASH density estimate for a
d-dimensional multivariate density, d ≥ 2.
10.14 Write a function to bin three-dimensional data into a three-way contingency table,
following the method in the bin2d function. Check the result on simulated N3 (0, I)
data. Compare the marginal frequencies returned by your function to the expected
frequencies from a standard univariate normal distribution.
The bin2d function can easily be converted to bin3d.
> bin3d <- function(x, breaks1 = "Sturges", breaks2 = "Sturges",
+ breaks3 = "Sturges") {
+ histg1 <- hist(x[, 1], breaks = breaks1, plot = FALSE)
+ histg2 <- hist(x[, 2], breaks = breaks2, plot = FALSE)
+ histg3 <- hist(x[, 3], breaks = breaks3, plot = FALSE)
+ brx <- histg1$breaks
+ bry <- histg2$breaks
+ brz <- histg3$breaks
+ freq <- table(cut(x[, 1], brx), cut(x[, 2], bry),
+ cut(x[, 3], brz))
+ return(list(call = match.call(), freq = freq, breaks1 = brx,
+ breaks2 = bry, breaks3 = brz, mids1 = histg1$mids,
+ mids2 = histg2$mids, mids3 = histg3$mids))
+ }
In the example below, a very large sample of standard 3-dimensional normal data
is generated and binned. Then each of the marginal frequency distributions is
compared with standard normal.
> n <- 2000
> x <- matrix(rnorm(n * 3), n, 3)
> b <- bin3d(x)
> h1 <- diff(b$breaks1)
> f1hat <- apply(b$freq, MARGIN = 1, FUN = "sum")
> f1 <- (pnorm(b$breaks1[-1]) - pnorm(b$breaks1[-length(b$breaks1)])) *
+ n
> round(cbind(f1, f1hat))
f1 f1hat
(-3.5,-3] 2 5
(-3,-2.5] 10 12
(-2.5,-2] 33 31
(-2,-1.5] 88 83
(-1.5,-1] 184 177
(-1,-0.5] 300 287
(-0.5,0] 383 440
(0,0.5] 383 367
(0.5,1] 300 282
(1,1.5] 184 188
(1.5,2] 88 89
(2,2.5] 33 30
(2.5,3] 10 9
> h2 <- diff(b$breaks2)
> f2hat <- apply(b$freq, MARGIN = 2, FUN = "sum")
> f2 <- (pnorm(b$breaks2[-1]) - pnorm(b$breaks2[-length(b$breaks2)])) *
+ n
> round(cbind(f2, f2hat))
f2 f2hat
(-3.5,-3] 2 1
(-3,-2.5] 10 11
(-2.5,-2] 33 41
(-2,-1.5] 88 83
(-1.5,-1] 184 178
(-1,-0.5] 300 271
(-0.5,0] 383 384
(0,0.5] 383 375
(0.5,1] 300 313
(1,1.5] 184 194
(1.5,2] 88 104
(2,2.5] 33 35
(2.5,3] 10 10
> h3 <- diff(b$breaks3)
> f3hat <- apply(b$freq, MARGIN = 3, FUN = "sum")
> f3 <- (pnorm(b$breaks3[-1]) - pnorm(b$breaks3[-length(b$breaks3)])) *
+ n
> round(cbind(f3, f3hat))
f3 f3hat
(-4,-3.5] 0 1
(-3.5,-3] 2 4
(-3,-2.5] 10 5
(-2.5,-2] 33 34
(-2,-1.5] 88 70
(-1.5,-1] 184 190
(-1,-0.5] 300 309
CHAPTER 11
Numerical Methods in R
11.1 The natural logarithm and exponential functions are inverses of each other, so that
mathematically log(exp x) = exp(log x) = x. Show by example that this property
does not hold exactly in computer arithmetic. Does the identity hold with near
equality? (See all.equal.)
> log(exp(3)) == exp(log(3))
[1] FALSE
> log(exp(3)) - exp(log(3))
[1] -4.440892e-16
> all.equal(log(exp(3)), exp(log(3)))
[1] TRUE
11.2 Suppose that X and Y are independent random variables, X ∼ Beta(a, b)
and Y ∼ Beta(r, s). Then it can be shown that

    P(X < Y) = Σ_{k=max(r−b,0)}^{r−1} C(r+s−1, k) C(a+b−1, a+r−1−k) / C(a+b+r+s−2, a+r−1).
Write a function to compute P (X < Y ) for any a, b, r, s > 0. Compare your result
with a Monte Carlo estimate of P (X < Y ) for (a, b) = (10, 20) and (r, s) = (5, 5).
> comp.beta <- function(a, b, r, s) {
+ k <- max(c(r - b, 0)):(r - 1)
+ i1 <- lchoose(r + s - 1, k)
+ i2 <- lchoose(a + b - 1, a + r - 1 - k)
+ i3 <- lchoose(a + b + r + s - 2, a + r - 1)
+ return(sum(exp(i1 + i2 - i3)))
+ }
> a <- 10
> b <- 20
> r <- 5
> s <- 5
> comp.beta(a, b, r, s)
[1] 0.8259472
> m <- 10000
> x <- rbeta(m, a, b)
> y <- rbeta(m, r, s)
> mean(x < y)
[1] 0.8295
(b) Modify the function so that it computes and returns the sum.
(c) Evaluate the sum when a = (1, 2)T .
> da <- function(a, K = 60) {
+ if (K < 0)
+ return(0)
+ k <- 0:K
+ d <- length(a)
+ aa <- sum(a * a)
+ log.ak <- (k + 1) * log(aa)
+ log.ck <- lgamma((d + 1)/2) + lgamma(k + 1.5) - lgamma(k +
+ 1) - k * log(2) - log((2 * k + 1) * (2 * k +
+ 2)) - lgamma(k + d/2 + 1)
+ y <- exp(log.ak + log.ck)
+ i <- rep(c(1, -1), length = K + 1)
+ z <- sqrt(2/pi) * sum(i * y)
+ return(min(c(z, sqrt(aa))))
+ }
> a <- c(1, 2)
> da(a = a)
[1] 1.22249
11.4 Find the intersection points A(k) of the curves

    S_{k−1}(a) = P( t(k − 1) > sqrt( a²(k − 1) / (k − a²) ) )

and

    S_k(a) = P( t(k) > sqrt( a²k / (k + 1 − a²) ) ),
for k = 4 : 25, 100, 500, 1000, where t(k) is a Student t random variable with k
degrees of freedom.
First we plot one pair of functions to find an interval that brackets the root.
Actually, we plot the difference Sk−1 (a) − Sk (a). Note that the function can only
be evaluated for |a| < √k.
> k <- 8
> a <- seq(1, sqrt(k) - 0.01, length = 100)
> y1 <- 1 - pt(sqrt(a^2 * (k - 1)/(k - a^2)), df = k -
+ 1)
> y2 <- 1 - pt(sqrt(a^2 * k/(k + 1 - a^2)), df = k)
> plot(a, y1 - y2, type = "l")
[Figure: plot of y1 − y2 against a for k = 8.]
The plot for k = 8 suggests that there is one positive root in the interval (1, 2).
After further checking, the same interval can be used for all of the values of k
needed. Write a function f (a; k) = Sk−1 (a) − Sk (a). Then use uniroot (Brent’s
method) to solve f (a; k) = 0.
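The root-finding code is not reproduced; a sketch along the lines described above
(the interpretation of the pr column in the table below as the upper-tail probability
P(t(k − 1) > A(k)) is an assumption) is:

# find the intersection point A(k) for each k by Brent's method
f.diff <- function(a, k) {
    (1 - pt(sqrt(a^2 * (k - 1)/(k - a^2)), df = k - 1)) -
        (1 - pt(sqrt(a^2 * k/(k + 1 - a^2)), df = k))
}
K <- c(4:25, 100, 500, 1000)
r <- sapply(K, function(k) uniroot(f.diff, interval = c(1, 2), k = k)$root)
pr <- 1 - pt(r, df = K - 1)     # assumed meaning of the pr column
cbind(K, r, pr)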
K r pr
[1,] 4 1.492103 0.11624401
[2,] 5 1.533556 0.09995866
[3,] 6 1.562744 0.08943713
[4,] 7 1.584430 0.08209372
[5,] 8 1.601185 0.07668485
[6,] 9 1.614521 0.07253902
[7,] 10 1.625390 0.06926236
[8,] 11 1.634419 0.06660884
[9,] 12 1.642038 0.06441697
[10,] 13 1.648554 0.06257645
[11,] 14 1.654190 0.06100942
[12,] 15 1.659114 0.05965937
[13,] 16 1.663452 0.05848431
[14,] 17 1.667303 0.05745240
[15,] 18 1.670745 0.05653908
[16,] 19 1.673840 0.05572507
[17,] 20 1.676637 0.05499504
[18,] 21 1.679178 0.05433669
[19,] 22 1.681496 0.05373996
[20,] 23 1.683620 0.05319661
[21,] 24 1.685572 0.05269980
[22,] 25 1.687373 0.05224382
[23,] 100 1.720608 0.04422306
[24,] 500 1.729755 0.04214617
[25,] 1000 1.730907 0.04188852
for a, where

    ck = sqrt( a²k / (k + 1 − a²) ).
Compare the solutions with the points A(k) in Exercise 11.4.
The easiest way to solve the equation is to observe that the right-hand side of
the equation is F (ck ) − F (0), where F (x) is the cdf of a Student t variable with k
degrees of freedom. Similarly, the left-hand side of the equation can be expressed in
terms of the cdf of a t(k − 1) variable. The solution ck can be found from Exercise
11.4.
> a <- r
> ck <- sqrt(a^2 * K/(K + 1 - a^2))
> cbind(K, a, ck)
K a ck
[1,] 4 1.492103 1.791862
[2,] 5 1.533556 1.795334
[3,] 6 1.562744 1.793016
[4,] 7 1.584430 1.789174
[5,] 8 1.601185 1.785136
[6,] 9 1.614521 1.781334
[7,] 10 1.625390 1.777885
[8,] 11 1.634419 1.774801
[9,] 12 1.642038 1.772054
[10,] 13 1.648554 1.769606
[11,] 14 1.654190 1.767419
[12,] 15 1.659114 1.765459
[13,] 16 1.663452 1.763694
[14,] 17 1.667303 1.762099
[15,] 18 1.670745 1.760652
[16,] 19 1.673840 1.759335
[17,] 20 1.676637 1.758130
[18,] 21 1.679178 1.757025
[19,] 22 1.681496 1.756009
[20,] 23 1.683620 1.755070
[21,] 24 1.685572 1.754201
[22,] 25 1.687373 1.753395
[23,] 100 1.720608 1.737726
[24,] 500 1.729755 1.733212
[25,] 1000 1.730907 1.732638
11.6 Write a function to compute the cdf of the Cauchy distribution, which has density

    f(x) = 1 / ( θπ( 1 + [(x − η)/θ]² ) ),    −∞ < x < ∞,

where θ > 0. Compare your results to the results from the R function pcauchy.
Let y = (x − η)/θ, so y is standard Cauchy with density f(y) = (1/π) · 1/(1 + y²). Hence

    ∫_0^y f(t) dt = (1/π) arctan(y).
> prC <- function(x, eta = 0, theta = 1) {
+ y <- (x - eta)/theta
+ v <- atan(abs(y))/pi
+ if (y >= 0)
+ value <- v + 0.5
+ if (y < 0)
+ value <- 0.5 - v
+ value
+ }
> x <- matrix(seq(-3, 3, 1), ncol = 1)
> p1 <- apply(x, MARGIN = 1, FUN = prC)
> p2 <- pcauchy(x)
> cbind(x, p1, p2)
p1
[1,] -3 0.1024164 0.1024164
[2,] -2 0.1475836 0.1475836
[3,] -1 0.2500000 0.2500000
[4,] 0 0.5000000 0.5000000
[5,] 1 0.7500000 0.7500000
[6,] 2 0.8524164 0.8524164
[7,] 3 0.8975836 0.8975836
The source code in pcauchy.c handles the integral as follows:
if (!lower_tail)
x = -x;
/* for large x, the standard formula suffers from cancellation.
* This is from Morten Welinder thanks to Ian Smiths atan(1/x) : */
if (fabs(x) > 1) {
double y = atan(1/x) / M_PI;
return (x > 0) ? R_D_Clog(y) : R_D_val(-y);
} else
return R_D_val(0.5 + atan(x) / M_PI);
11.7 Use the simplex algorithm to solve the following problem.
Minimize 4x + 2y + 9z subject to
2x + y + z ≤ 2
x − y + 3z ≤ 3
x ≥ 0, y ≥ 0, z ≥ 0.
See Example 11.16. The constraints can be written as A1 x ≤ b1 and x ≥ 0.
Enter the coefficients of the objective function in a.
> library(boot)
> A1 <- rbind(c(2, 1, 1), c(1, -1, 3))
> b1 <- c(2, 3)
> a <- c(4, 2, 9)
> simplex(a = a, A1 = A1, b1 = b1, maxi = TRUE)
Linear Programming Results
11.8 In the Morra game, the set of optimal strategies are not changed if a constant is
subtracted from every entry of the payoff matrix, or a positive constant is multiplied
times every entry of the payoff matrix. However, the simplex algorithm may termi-
nate at a different basic feasible point (also optimal). Compute B = A + 2, find the
solution of game B, and verify that it is one of the extreme points (11.12)–(11.15)
of the original game A. Also find the value of game A and game B.
> solve.game <- function(A) {
+ min.A <- min(A)
+ A <- A - min.A
+ max.A <- max(A)
+ A <- A/max(A)
+ m <- nrow(A)
+ n <- ncol(A)
+ it <- n^3
+ a <- c(rep(0, m), 1)
+ A1 <- -cbind(t(A), rep(-1, n))
+ b1 <- rep(0, n)
+ A3 <- t(as.matrix(c(rep(1, m), 0)))
+ b3 <- 1
+ sx <- simplex(a = a, A1 = A1, b1 = b1, A3 = A3, b3 = b3,
+ maxi = TRUE, n.iter = it)
+ a <- c(rep(0, n), 1)
+ A1 <- cbind(A, rep(-1, m))
+ b1 <- rep(0, m)
+ A3 <- t(as.matrix(c(rep(1, n), 0)))
+ b3 <- 1
+ sy <- simplex(a = a, A1 = A1, b1 = b1, A3 = A3, b3 = b3,
+ maxi = FALSE, n.iter = it)
+ soln <- list(A = A * max.A + min.A, x = sx$soln[1:m],
+ y = sy$soln[1:n], v = sx$soln[m + 1] * max.A + min.A)
+ return(soln)
+ }