Bootsteps
Guillaume A. Rousselet
2020-01-16
Contents
Dependencies
Bootstrap implementation
  Sampling with replacement
  Bootstrap mean estimates
  Confidence interval
  P value
  Getting bootstrap confidence intervals using functions
Check CI coverage
References
Companion R Notebook to the article:
The percentile bootstrap: a primer with step-by-step instructions in R
Rousselet G.A., Pernet C.R., Wilcox R.R. (2020)
Advances in Methods and Practices in Psychological Science
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath
the code.
You can execute each chunk by clicking the Run button within the chunk (green arrow) or by placing your
cursor inside it and pressing Cmd+Shift+Enter on a Mac or Ctrl+Shift+Enter on a PC. Here is a full list of
shortcuts.
For a thorough introduction to R we recommend the free online book R for Data Science (Grolemund &
Wickham, 2017). See many other great resources here.
Dependencies
# Rand Wilcox's functions from the Robust Estimation and Hypothesis Testing book
source("./functions/Rallfun-v35-light.txt") # simplified version to run notebook examples
# source("./functions/Rallfun-v35.txt") # full version
The previous chunk loads a simplified version of Rand Wilcox’s collection of functions. To get all the functions
described in his Robust Estimation and Hypothesis Testing book, do:
source("./functions/Rallfun-v35.txt").
The latest version of Rallfun is available on Rand Wilcox’s website.
If you use functions from the full file (Rallfun-v35.txt) or a more recent file, remember to set
the SEED argument to FALSE. For instance in onesampb() called later in this notebook:
onesampb(x,est=median,alpha=.05,nboot=2000,SEED=FALSE).
If SEED=TRUE, the function will always return the same bootstrap results – see below for explanation about
setting the seed of random number generators in R.
sessionInfo()
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] SimJoint_0.2.2 simpleboot_1.1-7 boot_1.3-22 ggplot2_3.2.1
## [5] tibble_2.1.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 knitr_1.26 magrittr_1.5 tidyselect_0.2.5
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.1 rlang_0.4.2
## [9] stringr_1.4.0 dplyr_0.8.3 tools_3.6.1 grid_3.6.1
## [13] gtable_0.3.0 xfun_0.12 withr_2.1.2 htmltools_0.4.0
## [17] RcppParallel_4.4.4 yaml_2.2.0 lazyeval_0.2.2 digest_0.6.23
## [21] assertthat_0.2.1 lifecycle_0.1.0 crayon_1.3.4 purrr_0.3.3
## [25] glue_1.3.1 evaluate_0.14 rmarkdown_2.0 stringi_1.4.5
## [29] compiler_3.6.1 pillar_1.4.3 scales_1.1.0 pkgconfig_2.0.3
Bootstrap implementation
We start by looking at how the bootstrap is implemented in the one-sample case. See an interactive demo
here. The core mechanism of the bootstrap is sampling with replacement, which is equivalent to simulating
experiments using only the data at hand.
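The chunk creating the sample is not echoed in the extracted document. Based on the printed output just below, a minimal sketch is:
samp <- 1:6 # the sample: integers 1 to 6 (assumed from the output below)
n <- length(samp) # sample size
samp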
## [1] 1 2 3 4 5 6
To make bootstrap inferences, we sample with replacement from that sequence using the sample() function.
That’s the engine under the hood of any bootstrap technique. Let’s generate our first bootstrap sample:
set.seed(21) # for reproducible results
sample(samp, size = n, replace = TRUE) # sample with replacement
## [1] 1 3 1 2 5 3
The function set.seed() is used to determine the starting point of the random number generator used by
the sample() function. The number 21 has no particular meaning, it just ensures that different users of
the same lines of code will get the same pseudo-random outcome. Typing set.seed(666) or some other
value will give a different result. We recommend always setting the seed to some value when using bootstrap
functions, or more generally any code involving random number generation, so the results can be reproduced or
compared across methods. However, the reproducibility of the random number generation depends on the
version of R. For instance, the first version of this R Notebook was created using R 3.5, but a reviewer alerted
us to a change in the way random numbers are generated in R 3.6. So the current version of the notebook is
based on R 3.6. Future readers are therefore warned that there could be discrepancies between fresh outputs
from the notebook and the values in the pdf version of the notebook or in the companion article.
We also recommend running the same bootstrap analyses a few times, to check that the results are consistent
across multiple random samples. This can be done by commenting out (adding # at the start of the line) or
deleting the set.seed call, and then running the same chunk several times. If the results differ substantially
across repeated calls to the same bootstrap code or function, the number of bootstrap samples should probably
be increased – see discussion in Hesterberg (2015).
We do it again, getting a different bootstrap sample:
sample(samp, size = n, replace = TRUE) # sample with replacement
## [1] 3 4 2 6 6 6
Third time:
sample(samp, size = n, replace = TRUE) # sample with replacement
## [1] 3 6 2 3 4 5
We could also generate our 3 bootstrap samples in one go:
set.seed(21) # reproducible example
nboot <- 3
matrix(sample(samp, size = n*nboot, replace = TRUE), nrow = nboot, byrow = TRUE)
How do we use the bootstrap samples? It might be tempting to use them to make inferences about the mean
of our sample (as we will see below, this is a bad idea). With the bootstrap, we ask: what are the plausible
population means compatible with our data, without making any parametric assumptions? To answer this
question, we take bootstrap samples by sampling with replacement from the data. For each bootstrap sample,
we compute the mean. This can be done using a for loop. Although for loops can be avoided, they are very
practical in many situations and they make the code easier to read.
Loop
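The loop itself is not echoed in the extracted document. A minimal sketch, assuming nboot is reset to 1000 (the value implied by the quantile indices used later) and that the mean is computed for each bootstrap sample:
set.seed(21) # for reproducible results
nboot <- 1000 # number of bootstrap samples (assumed)
boot.m <- vector(mode = "numeric", length = nboot) # declare vector of bootstrap means
for(B in 1:nboot){
  boot.samp <- sample(samp, size = n, replace = TRUE) # sample with replacement
  boot.m[B] <- mean(boot.samp) # store the mean of the bootstrap sample
}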
In R, like in Python, indexing is done using square brackets (e.g. boot.m[B]), whereas some other languages
use parentheses (Matlab for instance).
Same in one line of code
set.seed(21)
boot.m <- apply(matrix(sample(samp, size = n*nboot, replace = TRUE), nrow = nboot), 1, mean)
The lollipop chart shown below illustrates the first 50 bootstrap means, in the order in which they were
sampled. The grey horizontal line marks the sample mean (3.5). The bootstrap means randomly fluctuate
around the sample mean. They represent the means we could expect if we were to repeat the same experiment
many times, given that we can only sample from the data at hand.
n.show <- 50 # show only n.show first bootstrap means
df <- tibble(x = 1:n.show, y = boot.m[1:n.show])
ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_hline(yintercept = mean(samp), colour = "grey", size = 1) +
# comment next line to make a scatterplot instead of a lollipop chart:
geom_segment(aes(x=x, xend=x, y=0, yend=y)) +
geom_point(size=2.5, color="red", fill=alpha("orange", 0.3), alpha=0.7, shape=21, stroke=2) +
scale_x_continuous(breaks = c(1, seq(10, 100, 10))) +
scale_y_continuous(breaks = seq(1, 6, 1)) +
coord_cartesian(ylim = c(0, 6)) +
labs(x = "Bootstrap samples", y = "Bootstrap means")
[Figure: lollipop chart of the first 50 bootstrap means. x-axis: Bootstrap samples (1 to 50); y-axis: Bootstrap means.]
# ggsave(filename=('./figures/figure_50bootsamp.pdf'),width=7,height=5)
Because we’ve bootstrapped a small sample of integer values, the bootstrap means only take a small number
5
of 25 unique values.
Density plot
We can illustrate all the bootstrap means using a density plot, which is like a smooth histogram. The density
plot shows the relative probability of observing different bootstrap means.
df <- as_tibble(with(density(boot.m),data.frame(x,y)))
[Figure: density plot of the bootstrap means. x-axis: Bootstrap means; y-axis: Density.]
# ggsave(filename=('./figures/figure_bootdens.pdf'),width=7,height=5)
Confidence interval
Formula
We set alpha to 0.05 to get a 95% confidence interval. That means that if we were to repeat the same experiment
many times, and compute a confidence interval for each experiment, then in the long run 95% of these intervals
should contain the population value. For a given experiment, the confidence interval either does or does not
contain the population value we're trying to estimate. Also, the actual coverage depends on the method
used to build the confidence interval and the quantity we're trying to estimate. In particular, it is well-known
that the bootstrap should not be used to build confidence intervals for the mean, because the coverage can
be far from the expected value. For instance, when sampling from skewed distributions, the coverage can be
much lower than the expected 95%. See next main section on how to use a simulation to check coverage.
alpha <- 0.05
ci <- quantile(boot.m, probs = c(alpha/2, 1-alpha/2)) # [2.17, 4.83]
round(ci, digits = 2)
## 2.5% 97.5%
## 2.17 4.83
There are many ways to estimate quantiles. For instance, the R function quantile() offers 9 types of
calculation, with type=7 as the default. Hesterberg (2015) recommends using type=6 to avoid getting confidence
intervals that are too narrow. With sufficiently large sample sizes and numbers of bootstrap samples, the type
argument will not matter, but it is unclear when this is the case.
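Applying that recommendation to the bootstrap means from above is a one-line change (a minimal sketch; the quantile computation shown earlier uses the default type):
ci.t6 <- quantile(boot.m, probs = c(alpha/2, 1-alpha/2), type = 6) # type 6, as recommended by Hesterberg (2015)
round(ci.t6, digits = 2)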
In Rand Wilcox’s functions, the procedure in the next chunk is used, which can give results slightly different
from those in the previous chunk. With a large number of bootstrap samples, which quantile method is used
will make little difference.
alpha <- 0.05
bvec <- sort(boot.m) # sort bootstrap means in ascending order
# define quantiles
low <- round((alpha/2)*nboot) # 25
up <- nboot-low # 975
low <- low+1
# get confidence interval
ci <- c(bvec[low],bvec[up]) # [2.33, 4.83]
round(ci, digits = 2)
Graphical representation
The horizontal line marks the 95% confidence interval. The boxes report the values of the CI bounds. L
stands for lower bound, U for upper bound.
df <- as_tibble(with(density(boot.m),data.frame(x,y)))
[Figure: density plot of the bootstrap means with the 95% confidence interval marked (L = 2.17, U = 4.83). x-axis: Bootstrap means; y-axis: Density.]
# ggsave(filename=('./figures/figure_bootdensci.pdf'),width=7,height=5)
P value
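The chunk computing the bootstrap p value is not echoed in the extracted document. A minimal sketch of the usual percentile-bootstrap p value, assuming the null hypothesis value is stored in null.value (the value itself does not appear in the extracted code):
pv <- sum(boot.m < null.value) / nboot # proportion of bootstrap means below the null value
pval <- 2 * min(pv, 1 - pv) # two-sided p value
pval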
## [1] 0.155
Summary figure
Make figure
ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_vline(xintercept = mean(samp), colour = "grey", size = 1) +
# p value
geom_area(data = df.pv1,
aes(x = x, y = y),
fill = "red", alpha = 1) +
geom_area(data = df.pv2,
aes(x = x, y = y),
fill = "red", alpha = .2) +
# density
geom_line(data = df, size = 2) +
scale_x_continuous(breaks = seq(0, 10, 1)) +
coord_cartesian(xlim = c(0, 6)) +
labs(x = "Bootstrap means", y = "Density") +
# Null value
geom_segment(x = null.value,
xend = null.value,
y = 0,
yend = df$y[which.min(abs(df$x-null.value))],
size = 1,
colour = "black",
linetype = "dotted") +
# p value arrow -------------
geom_segment(x = 1.7, xend = 2.2, y = 0.1, yend = 0.03,
arrow = arrow(type = "closed",
length = unit(0.25, "cm")),
colour = "grey50", size = 1) +
annotate(geom = "label", x = 1.3, y = 0.12, size = 7,
colour = "white", fill = "red", fontface = "bold",
label = expression(paste(italic("p"), " value / 2")))
[Figure: density plot of the bootstrap means, with the null value marked by a dotted line and the area corresponding to p value / 2 highlighted in red. x-axis: Bootstrap means; y-axis: Density.]
# ggsave(filename=('./figures/figure_bootdenspval.pdf'),width=7,height=5)
onesampb
We can get the bootstrap confidence interval and p value by calling the onesampb() function from Rand
Wilcox:
set.seed(21)
nboot <- 1000 # number of bootstrap samples
onesampb(samp, est=mean, alpha=.05, nboot=nboot, nv=null.value)
## $ci
## [1] 2.166667 4.833333
##
## $n
## [1] 6
##
## $estimate
## [1] 3.5
##
## $p.value
## [1] 0.155
onesampb() can be used with any estimator, defined with the est argument: for instance the median (median),
a trimmed mean (tm), a quantile estimate (hd), or some measure of variability, such as the median
absolute deviation to the median (mad).
Set SEED to TRUE to get the same results every time you use the function. Set it to FALSE to use different
random bootstrap samples, so the function returns different results every time you use it. null.value is the
null value used in computing the p value.
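For example, to get a bootstrap confidence interval and p value for the median instead of the mean, only the est argument needs to change (a minimal sketch reusing the arguments from above):
set.seed(21)
onesampb(samp, est=median, alpha=.05, nboot=nboot, nv=null.value)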
boot
Another option is to use the boot() function from the boot package (Canty & Ripley, 2017; Davison &
Hinkley, 1997). This function is more complicated than using functions with a narrow focus like onesampb()
because it requires users to create a function to process the data. The advantage is that the bootstrap
distribution is saved and can be used in a second step to make illustrations and compute various quantities.
You can find a tutorial about the boot package for instance here and here.
set.seed(21)
# theta() and the boot() call are reconstructed here from the Call shown in the output below
theta <- function(data, indices) mean(data[indices]) # statistic: mean of the resampled observations
boot.res <- boot(data = samp, statistic = theta, R = nboot)
# view results
boot.res
##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = samp, statistic = theta, R = nboot)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 3.5 -0.006833333 0.6978708
plot(boot.res)
[Figure: default plot of boot.res, showing a histogram with density of the bootstrap statistics t* and a normal quantile-quantile plot.]
Check CI coverage
Example of a simple simulation to check the probability coverage of a confidence interval method. The
simulation has 5,000 iterations. Increasing this number would lead to more precise results. For a simple test,
10,000 iterations or more could be used. For more complex applications, time might be a constraint.
The sample size is 30, which seems reasonably high for a psychology experiment. A more systematic simulation
should include sample size as a parameter.
set.seed(666) # reproducible results
nsim <- 5000 # simulation iterations
nsamp <- 30 # sample size
alpha <- 0.05 # alpha level
nboot <- 2000 # number of bootstrap samples
pop <- rlnorm(1000000) # define lognormal population
pop.m <- mean(pop) # population mean
ptrim <- 0.2 # proportion of trimming
pop.tm <- mean(pop, trim = ptrim) # population 20% trimmed mean
ci.coverage <- matrix(NA, nrow = nsim, ncol = 3) # declare matrix of results
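The simulation loop itself is not echoed in the extracted document. Here is a minimal sketch of what such a loop could look like, assuming the three columns of ci.coverage record coverage for a standard t-test confidence interval for the mean, a percentile bootstrap confidence interval for the mean, and a percentile bootstrap confidence interval for the 20% trimmed mean (the column order is an assumption):
for(S in 1:nsim){
  x <- sample(pop, nsamp, replace = TRUE) # random sample from the finite population
  # column 1: standard t-test confidence interval for the mean
  ci <- t.test(x, conf.level = 1 - alpha)$conf.int
  ci.coverage[S, 1] <- ci[1] < pop.m && ci[2] > pop.m
  # column 2: percentile bootstrap confidence interval for the mean
  boot.m <- apply(matrix(sample(x, nsamp*nboot, replace = TRUE), nrow = nboot), 1, mean)
  ci <- quantile(boot.m, probs = c(alpha/2, 1-alpha/2))
  ci.coverage[S, 2] <- ci[1] < pop.m && ci[2] > pop.m
  # column 3: percentile bootstrap confidence interval for the 20% trimmed mean
  boot.tm <- apply(matrix(sample(x, nsamp*nboot, replace = TRUE), nrow = nboot), 1, mean, trim = ptrim)
  ci <- quantile(boot.tm, probs = c(alpha/2, 1-alpha/2))
  ci.coverage[S, 3] <- ci[1] < pop.tm && ci[2] > pop.tm
}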
Load results: the next chunk requires the data/ folder to be in the same directory as this
notebook. Alternatively, you could run the simulation in the previous chunk to recreate the results.
load("./data/ci.coverage.RData")
The population is lognormal and is generated outside the simulation loop by using pop <- rlnorm(1000000).
This approach works well: in our example the mean of the generated population is 1.647, which is very close to
the theoretical population mean of exp(0.5) ≈ 1.649. An alternative is to generate the random numbers directly inside the loop
by using samp <- rlnorm(nsamp). The first approach used here makes it more intuitive that we sample from
a known population. This approach is also useful when there is no formula available to define the theoretical
population values for certain distributions.
The lognormal distribution is one of many skewed mathematical distributions. It serves to illustrate what
can happen when sampling from skewed distributions in general. Other shapes could be used too, if some
domain-specific information is available. For instance, ex-Gaussian distributions do a good job at capturing
the shape of reaction time distributions.
The population means and trimmed means differ and are estimated independently in the simulation: the
sample mean is used to make inferences about the population mean, whereas the sample trimmed mean is
used to make inferences about the population trimmed mean.
Here are the results:
out <- apply(ci.coverage, 2, mean) # average across simulations
out
Make data
We sample from log-normal distributions to mimic distributions of response times. Alternatively, we could
have used the popular ex-Gaussian distribution. The details do not matter: the goal of the example is to
make inferences about some sort of continuous and skewed distribution, which is very common in life sciences
in general, and neuroscience and psychology in particular.
set.seed(44) # reproducible results
# Group 1
n1 <- 50
m <- 400
s <- 50
location <- log(m^2 / sqrt(s^2 + m^2))
shape <- sqrt(log(1 + (s^2 / m^2)))
g1 <- rlnorm(n1, location, shape)
# Group 2
n2 <- 70
m <- 500
s <- 70
location <- log(m^2 / sqrt(s^2 + m^2))
shape <- sqrt(log(1 + (s^2 / m^2)))
g2 <- rlnorm(n2, location, shape)
Illustrate 2 groups
# segment coordinates marking the 0.25 quantiles (Harrell-Davis estimator hd) of each group;
# the opening lines of this chunk are missing from the source, so the object name and y values are assumed
df.seg <- tibble(
  y = rep(0, 2),
  yend = rep(1.1, 2),
  x = c(hd(g1, 0.25), hd(g2, 0.25)),
  xend = x,
  gp = factor(c("Group 1", "Group 2"))
)
[Figure: distributions of response times (ms) for Group 1 and Group 2, on a 0 to 700 ms scale.]
# ggsave(filename=('./figures/figure_2gps.pdf'),width=7,height=2)
Bootstrap
set.seed(1)
nboot <- 2000 # number of bootstrap samples
ptrim <- 0.2 # proportion of trimming
# bootstrap sampling independently from each group
boot1 <- matrix(sample(g1, size=n1*nboot, replace=TRUE), nrow=nboot)
boot2 <- matrix(sample(g2, size=n2*nboot, replace=TRUE), nrow=nboot)
# compute trimmed mean for each group and bootstrap sample
boot1.tm <- apply(boot1, 1, mean, trim=ptrim)
boot2.tm <- apply(boot2, 1, mean, trim=ptrim)
# get distribution of sorted bootstrap differences
boot.diff <- sort(boot1.tm - boot2.tm)
There are several ways to estimate quantiles; here is the formula used in pb2gen(), a function from Rand
Wilcox dedicated to the comparison of estimators from two independent groups using the percentile
bootstrap.
alpha <- 0.05
low <- round((alpha/2)*nboot) + 1
up <- nboot - low
ci <- c(boot.diff[low], boot.diff[up])
round(ci, digits = 1)
## 2.5% 97.5%
## -113.7 -68.2
Bootstrap p value
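The chunk computing this p value is not echoed in the extracted document. A minimal sketch, using the sorted bootstrap differences and a null value of zero:
pv <- sum(boot.diff > 0) / nboot # proportion of bootstrap differences above zero
pval <- 2 * min(pv, 1 - pv) # two-sided p value
pval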
## [1] 0
The bootstrap confidence interval and the p value can also be obtained in one call to the pb2gen() function.
The input est could be changed to many other estimators to make various bootstrap inferences about
differences between independent groups.
set.seed(1)
pb2gen(g1, g2, alpha=0.05, nboot=nboot, est=mean, trim=0.2)
## $est.1
## [1] 393.8316
##
## $est.2
## [1] 485.0868
##
## $est.dif
## [1] -91.25519
##
## $ci
## [1] -113.70589 -68.24266
##
## $p.value
## [1] 0
##
## $sq.se
## [1] 140.3912
##
## $n1
## [1] 50
##
## $n2
## [1] 70
The output sq.se is the squared bootstrap standard error of the difference between trimmed means (the
variance of the bootstrap differences).
The same analysis can be done with the boot() function but it gets more complicated to program. We can
use a helper function from the simpleboot package to ease the process:
# bootstrap data
set.seed(1)
boot.res <- two.boot(g1, g2, FUN=mean, R=nboot, trim=0.2)
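The percentile confidence interval reported below is obtained with boot.ci(); the call is not echoed in the extracted document but appears in the output:
# percentile bootstrap confidence interval from the boot object
boot.ci(boot.res, type = "perc")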
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.res, type = "perc")
##
## Intervals :
## Level Percentile
## 95% (-113.90, -68.06 )
## Calculations and Intervals on Original Scale
Illustration
df <- as_tibble(with(density(boot.diff),data.frame(x,y)))
[Figure: density plot of the bootstrap differences between 20% trimmed means, with the 95% confidence interval marked (L = −114, U = −68). x-axis: Bootstrap differences between 20% trimmed means; y-axis: Density.]
# ggsave(filename=('./figures/figure_2gpsbootres.pdf'),width=7,height=5)
Bootstrap
We have already generated bootstrap samples, so here we simply compute MAD for each of them.
boot1.mad <- apply(boot1, 1, mad)
boot2.mad <- apply(boot2, 1, mad)
boot.diff <- sort(boot1.mad - boot2.mad)
ci <- quantile(boot.diff, probs = c(alpha/2, 1-alpha/2)) # [-62.2, 4.5]
ci
## 2.5% 97.5%
## -62.162623 4.529291
Bootstrap p value
## [1] 0.083
Illustration
df <- as_tibble(with(density(boot.diff),data.frame(x,y)))
[Figure: density plot of the bootstrap differences between MADs, with the 95% confidence interval marked (L = −62, U = 5). x-axis: Bootstrap differences between MADs; y-axis: Density.]
# ggsave(filename=('./figures/figure_mad.pdf'),width=7,height=5)
pb2gen() function
set.seed(1)
pb2gen(g1, g2, alpha=0.05, nboot=nboot, est=mad)
## $est.1
## [1] 40.29569
##
## $est.2
## [1] 68.39412
##
## $est.dif
## [1] -28.09842
##
## $ci
## [1] -62.154858 4.311876
##
## $p.value
## [1] 0.083
##
## $sq.se
## [1] 294.1268
##
## $n1
## [1] 50
##
## $n2
## [1] 70
boot() function
# bootstrap data
set.seed(1)
boot.res <- two.boot(g1, g2, FUN=mad, R=nboot)
Make data
set.seed(777)
n <- 50 # sample size
npop <- 1000000 # finite population size
mu <- c(0, 0) # means of the variables
rho <- 0.2 # population Spearman correlation between variables
sigma1 <- matrix(c(1, rho, rho, 1), nrow = 2, byrow = TRUE) # covariance matrix
# Create population 1 -----------------------------
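The rest of this chunk is not echoed in the extracted document. A plausible sketch, mirroring the group 2 code shown later in the notebook (SimJoint::SJspearman() reorders uncorrelated normal samples to induce the target Spearman correlation):
pop <- cbind(rnorm(npop), rnorm(npop)) # uncorrelated random normal samples
res <- SimJoint::SJspearman(X = pop, cor = sigma1) # induce the target Spearman correlation
pop <- res$X
# sample n pairs of observations to form group 1
id <- sample(npop, n)
x1 <- pop[id,1]
y1 <- pop[id,2]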
Illustrate data
# ggplot figure
p <- ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_point(alpha = 0.4, size = 3) +
coord_cartesian(xlim = c(-2.5, 2.5), ylim = c(-2.5, 2.5)) +
theme(axis.title = element_text(size = 15, colour = "black"),
axis.text = element_text(size = 13, colour = "black"),
strip.text = element_text(size = 15, face = "bold")) +
labs(x = expression(italic("Variable A")), y = expression(italic("Variable B"))) +
ggtitle("Group 1")
pA <- p
p
[Figure: Group 1 scatterplot of Variable A (x-axis) against Variable B (y-axis).]
For this sample, Spearman’s correlation coefficient is 0.43, suggesting a relatively strong association. In fact,
we simulated these data, so we know that the population Spearman correlation is 0.2, which means that our
sample over-estimates the true correlation.
Bootstrap
To compute a bootstrap confidence interval for the population correlation, we sample pairs of observations
with replacement:
set.seed(21)
nboot <- 5000 # number of bootstrap samples
alpha <- 0.05 # alpha level for confidence interval
boot.corr1 <- vector(mode = "numeric", length = nboot) # vector of bootstrap correlations
for(B in 1:nboot){
boot.id <- sample(n, size = n, replace=TRUE) # bootstrap pairs of observations
boot.corr1[B] <- cor(x1[boot.id], y1[boot.id], method = "spearman")
}
In the code, the variable boot.id is a vector of participant indices, sampled with replacement from the full
list of participants. These indices are then used to index the matched vectors x1 and y1. That way, paired
observations are kept together. Spearman’s correlation is computed for each of these bootstrap pairs.
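The confidence interval shown below is the usual percentile interval of the bootstrap correlations; the chunk is not echoed in the extracted document, but a minimal sketch is:
ci <- quantile(boot.corr1, probs = c(alpha/2, 1-alpha/2)) # percentile bootstrap confidence interval
ci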
## 2.5% 97.5%
## 0.1875850 0.6278105
P value
pval <- sum(boot.corr1 < 0)/nboot
pval <- 2 * min(pval, 1 - pval)
pval
## [1] 8e-04
The bootstrap results can also be obtained by calling the R command: corb(x1,y1, corfun = spear, SEED
= FALSE). The argument corfun can be changed from spear (for Spearman) to another correlation function,
such as pbcor for a percentage bend correlation or wincor for a Winsorized correlation. For bootstrap
inferences about the non-robust Pearson’s correlation, a modified bootstrap technique is required, which is
implemented in the function pcorb. For details about these correlation methods, see Pernet et al. (2013) and
Wilcox (2017).
df <- as_tibble(with(density(boot.corr1),data.frame(x,y)))
[Figure: density plot of the bootstrap Spearman's correlations, with the 95% confidence interval marked (L = 0.19, U = 0.63). x-axis: Bootstrap Spearman's correlations; y-axis: Density.]
For simulations on the (large) sample sizes needed to precisely estimate correlation analyses, see this blog
post. To see why conditioning correlation results on p values can lead to a literature full of false positives,
see this blog post. Finally, simulations of statistical power for correlation analyses are available here.
Make data
set.seed(777)
# group 2
rho <- 0.5 # correlation between variables
sigma2 <- matrix(c(1, rho, rho, 1), nrow = 2, byrow = TRUE) # covariance matrix
pop <- cbind(rnorm(npop), rnorm(npop)) # uncorrelated random normal samples
res <- SimJoint::SJspearman(X = pop, cor = sigma2)
pop <- res$X
# sample group 2 observations
set.seed(1)
id <- sample(npop, n)
x2 <- pop[id,1]
y2 <- pop[id,2]
Illustrate data
# load data
load("./data/corr2.RData")
# ggplot figure
p <- ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_point(alpha = 0.4, size = 3) +
coord_cartesian(xlim = c(-2.5, 2.5), ylim = c(-2.5, 2.5)) +
theme(axis.title = element_text(size = 15, colour = "black"),
axis.text = element_text(size = 13, colour = "black"),
strip.text = element_text(size = 15, face = "bold")) +
labs(x = expression(italic("Variable A")), y = expression(italic("Variable B"))) +
ggtitle("Group 2")
pC <- p
p
[Figure: Group 2 scatterplot of Variable A (x-axis) against Variable B (y-axis).]
For this sample, Spearman’s correlation coefficient is 0.66.
Bootstrap
Bootstrap samples are obtained independently in each group. Pairs of observations are sampled with
replacement.
set.seed(21)
nboot <- 5000 # number of bootstrap samples
alpha <- 0.05 # alpha level for confidence interval
boot.diff <- vector(mode = "numeric", length = nboot) # vector of bootstrap correlations
for(B in 1:nboot){
boot.id1 <- sample(n, size = n, replace=TRUE) # bootstrap pairs of observations in group 1
boot.id2 <- sample(n, size = n, replace=TRUE) # bootstrap pairs of observations in group 2
boot.diff[B] <- cor(x1[boot.id1], y1[boot.id1], method = "spearman") -
    cor(x2[boot.id2], y2[boot.id2], method = "spearman")
}
## 2.5% 97.5%
## -0.499 0.030
P value
## [1] 0.0904
The bootstrap results can be obtained by calling the R command: twocor(x1,y1,x2,y2, corfun = spear).
The input argument corfun can be changed to another robust function, as already mentioned. To compare two
percentage bend correlations, use this command instead: twocor(x1,y1,x2,y2, corfun = pbcor). And
to compare two Pearson’s correlations: twopcor(x1,y1,x2,y2). We do not recommend the use of Fisher’s
r-to-z transform to compare Pearson’s correlations, because this procedure assumes normality and even a
small departure from a normal distribution can result in poor performance regardless of how large the sample
size might be (Wilcox, 2017).
corr.diff <- cor(x1, y1, method = "spearman") - cor(x2, y2, method = "spearman")
df <- as_tibble(with(density(boot.diff),data.frame(x,y)))
# the start of this chunk is cut off in the source; the first few lines below are a
# plausible reconstruction (density line + axis labels), the rest is as in the original
p <- ggplot(df, aes(x = x, y = y)) + theme_gar +
  geom_line(size = 2) +
  labs(x = "Bootstrap Spearman's corr. differences", y = "Density") +
  # mark the confidence interval with a thick orange segment
  geom_segment(x = diff.ci[1], xend = diff.ci[2],
               y = 0, yend = 0,
               lineend = "round", size = 3, colour = "orange") +
  annotate(geom = "label", x = diff.ci[1], y = 0.1*max(df$y), size = 7,
           colour = "white", fill = "orange", fontface = "bold",
           label = paste("L = ", round(diff.ci[1], digits = 2))) +
  annotate(geom = "label", x = diff.ci[2], y = 0.1*max(df$y), size = 7,
           colour = "white", fill = "orange", fontface = "bold",
           label = paste("U = ", round(diff.ci[2], digits = 2)))
pD <- p
p
[Figure: density plot of the bootstrap differences between Spearman's correlations, with the 95% confidence interval marked (L = −0.5, U = 0.03). x-axis: Bootstrap Spearman's corr. differences; y-axis: Density.]
The bootstrap distribution and the confidence interval are very broad, suggesting a large range of population
differences. This is not surprising given the modest sample sizes in our example: correlations are noisy and
require a lot of observations to be precisely estimated (Yarkoni, 2009).
Summary figure
require(cowplot)
cowplot::plot_grid(pA, pC, pB, pD,
labels = c("A", "C", "B", "D"),
ncol = 2,
nrow = 2,
label_size = 20,
hjust = -0.5,
scale=.95)
# save figure
ggsave(filename=('./figures/figure_corr.pdf'),width=12,height=10)
Simulations of false positives and power for correlation comparisons are reported in this blog post. Bottom
line: Fisher's r-to-z transform is not robust, and very large sample sizes are required to detect differences
between correlation coefficients.
References
Canty, A. & Ripley, B. (2017) boot: Bootstrap R (S-Plus) Functions. R package version 1.3-20.
Davison, A.C. & Hinkley, D.V. (1997) Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge. ISBN 0-521-57391-2.
Hesterberg, T. (2015) What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum. The American Statistician, 69, 371–386.
Iman, R.L. & Conover, W.J. (1982) A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics - Simulation and Computation, 11, 311–334.
Liu, C.W. (2019) SimJoint: Simulate Joint Distribution. R package version 0.2.2. https://CRAN.R-project.org/package=SimJoint
Pernet, C.R., Wilcox, R.R., & Rousselet, G.A. (2013) Robust Correlation Analyses: False Positive and Power Validation Using a New Open Source Matlab Toolbox. Frontiers in Psychology, 3. https://www.frontiersin.org/articles/10.3389/fpsyg.2012.00606/full
Pouillot, R. & Delignette-Muller, M.-L. (2010) Evaluating variability and uncertainty in microbial quantitative risk assessment using two R packages. International Journal of Food Microbiology, 142(3), 330–340.
Wilcox, R.R. (2017) Introduction to Robust Estimation and Hypothesis Testing, 4th edition. Academic Press, San Diego, CA.
Yarkoni, T. (2009) Big Correlations in Little Studies: Inflated fMRI Correlations Reflect Low Statistical Power – Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4, 294–298.