
Percentile bootstrap: step-by-step instructions

Guillaume A. Rousselet

2020-01-16

Contents

Dependencies
Bootstrap implementation
    Sampling with replacement
    Bootstrap mean estimates
    Confidence interval
    P value
    Getting bootstrap confidence intervals using functions
Check CI coverage
Compare 2 independent groups: difference in location
    Make data
    Illustrate 2 groups
    Bootstrap
    Bootstrap confidence interval
    Bootstrap p value
    Bootstrap using functions
    Illustration
Compare 2 groups: difference in spread
    Bootstrap
    Bootstrap confidence interval
    Bootstrap p value
    Illustration
    Bootstrap using functions
Bootstrap confidence intervals of correlation coefficients
    Make data
    Illustrate data
    Bootstrap
    Illustrate bootstrap correlation coefficients
Compare correlation coefficients
    Make data
    Illustrate data
    Bootstrap
    Bootstrap distribution of correlation coefficient differences
    Summary figure
References

Companion R Notebook to the article:
The percentile bootstrap: a primer with step-by-step instructions in R
Rousselet G.A., Pernet C.R., Wilcox R.R. (2020)
Advances in Methods and Practices in Psychological Science
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath
the code.
You can execute each chunk by clicking the Run button within the chunk (green arrow) or by placing your
cursor inside it and pressing Cmd+Shift+Enter on a Mac, or Ctrl+Shift+Enter on a PC. Here is a full list of
shortcuts.
For a thorough introduction to R we recommend the free online book R for Data Science (Grolemund &
Wickham, 2017). See many other great resources here.

Dependencies

library(tibble) # to make well-behaved data frames


library(ggplot2) # to plot data
# library(cowplot) # to make summary figures with multiple panels
source("./functions/theme_gar.txt") # ggplot2 formatting the way I like it

# Rand Wilcox's functions from the Robust Estimation and Hypothesis Testing book
source("./functions/Rallfun-v35-light.txt") # simplified version to run notebook examples
# source("./functions/Rallfun-v35.txt") # full version

library(boot) # package dedicated to bootstrap methods


library(simpleboot) # package with wrapper functions to boot, making user's life easier
# library(mc2d) # cornode function used to set population Spearman's correlation
library(SimJoint) # SJspearman works much better than mc2d::cornode

The previous chunk loads a simplified version of Rand Wilcox’s collection of functions. To get all the functions
described in his Robust Estimation and Hypothesis Testing book, do:
source("./functions/Rallfun-v35.txt").
The latest version of Rallfun is available on Rand Wilcox’s website.
If you use functions from the full file (Rallfun-v35.txt) or a more recent file, remember to set
the SEED argument to FALSE. For instance in onesampb() called later in this notebook:
onesampb(x,est=median,alpha=.05,nboot=2000,SEED=FALSE).
If SEED=TRUE, the function will always return the same bootstrap results – see below for explanation about
setting the seed of random number generators in R.
sessionInfo()

## R version 3.6.1 (2019-07-05)


## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##

## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] SimJoint_0.2.2 simpleboot_1.1-7 boot_1.3-22 ggplot2_3.2.1
## [5] tibble_2.1.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 knitr_1.26 magrittr_1.5 tidyselect_0.2.5
## [5] munsell_0.5.0 colorspace_1.4-1 R6_2.4.1 rlang_0.4.2
## [9] stringr_1.4.0 dplyr_0.8.3 tools_3.6.1 grid_3.6.1
## [13] gtable_0.3.0 xfun_0.12 withr_2.1.2 htmltools_0.4.0
## [17] RcppParallel_4.4.4 yaml_2.2.0 lazyeval_0.2.2 digest_0.6.23
## [21] assertthat_0.2.1 lifecycle_0.1.0 crayon_1.3.4 purrr_0.3.3
## [25] glue_1.3.1 evaluate_0.14 rmarkdown_2.0 stringi_1.4.5
## [29] compiler_3.6.1 pillar_1.4.3 scales_1.1.0 pkgconfig_2.0.3

Bootstrap implementation
We start by looking at how the bootstrap is implemented in the one-sample case. See an interactive demo
here. The core mechanism of the bootstrap is sampling with replacement, which is equivalent to simulating
experiments using only the data at hand.

Sampling with replacement

Let’s say we have a sample that is a sequence of integers, from 1 to 6.


n <- 6 # sample size
samp <- 1:n
samp

## [1] 1 2 3 4 5 6
To make bootstrap inferences, we sample with replacement from that sequence using the sample() function.
That’s the engine under the hood of any bootstrap technique. Let’s generate our first bootstrap sample:
set.seed(21) # for reproducible results
sample(samp, size = n, replace = TRUE) # sample with replacement

## [1] 1 3 1 2 5 3
The function set.seed() is used to determine the starting point of the random number generator used by the sample() function. The number 21 has no particular meaning; it just ensures that different users of the same lines of code will get the same pseudo-random outcome. Typing set.seed(666) or some other value will give a different result. We recommend always setting the seed to some value when using bootstrap functions, or more generally code involving random number generation, so the results can be reproduced or compared across methods. However, the reproducibility of the random number generation depends on the version of R. For instance, the first version of this R Notebook was created using R 3.5, but a reviewer alerted us to a change in the way random numbers are generated in R 3.6. So the current version of the notebook is based on R 3.6. Future readers are therefore warned that there could be discrepancies between fresh outputs from the notebook and the values in the pdf version of the notebook or in the companion article.
We also recommend running the same bootstrap analyses a few times, to check that the results are consistent across multiple random samples. This can be done by commenting out (adding # at the start of the line) or deleting the set.seed call, and then running the same chunk several times. If the results differ substantially across repeated calls to the same bootstrap code or function, the number of bootstrap samples should probably be increased – see discussion in Hesterberg (2015).
We do it again, getting a different bootstrap sample:
sample(samp, size = n, replace = TRUE) # sample with replacement

## [1] 3 4 2 6 6 6
Third time:
sample(samp, size = n, replace = TRUE) # sample with replacement

## [1] 3 6 2 3 4 5
We could also generate our 3 bootstrap samples in one go:
set.seed(21) # reproducible example
nboot <- 3
matrix(sample(samp, size = n*nboot, replace = TRUE), nrow = nboot, byrow = TRUE)

## [,1] [,2] [,3] [,4] [,5] [,6]


## [1,] 1 3 1 2 5 3
## [2,] 3 4 2 6 6 6
## [3,] 3 6 2 3 4 5
As is apparent from these 3 examples, in a bootstrap sample, some observations are sampled more than once
and others are not sampled at all. So each bootstrap sample is like a virtual experiment in which we draw
random observations from our original sample.
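To make this concrete, here is a small sketch (not in the original notebook) that tallies how often each observation appears in a single bootstrap sample:

set.seed(21)
# count how many times each of the 6 observations appears in one bootstrap sample;
# zeros mark observations that were not sampled at all
table(factor(sample(samp, size = n, replace = TRUE), levels = samp))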

Bootstrap mean estimates

How do we use the bootstrap samples? It might be tempting to use them to make inferences about the mean
of our sample (as we will see below this is a bad idea). With the bootstrap, we ask: what are the plausible
population means compatible with our data, without making any parametric assumptions? To answer this
question, we take bootstrap samples by sampling with replacement from the data. For each bootstrap sample,
we compute the mean. This can be done using a for loop. Although for loops can be avoided, they are very
practical in many situations and they make the code easier to read.

Loop

set.seed(21) # reproducible results
nboot <- 1000 # number of bootstrap samples
# declare vector of results
boot.m <- vector(mode = "numeric", length = nboot)
for(B in 1:nboot){ # bootstrap loop
  boot.samp <- sample(samp, size = n, replace = TRUE) # sample with replacement
  boot.m[B] <- mean(boot.samp) # save bootstrap means
}

In R, like in Python, indexing is done using square brackets (e.g. boot.m[B]), whereas some other languages
use parentheses (Matlab for instance).

Same in one line of code

set.seed(21)
boot.m <- apply(matrix(sample(samp, size = n*nboot, replace = TRUE), nrow = nboot), 1, mean)

Illustrate bootstrap samples

The lollipop chart shown below illustrates the first 50 bootstrap means, in the order in which they were
sampled. The grey horizontal line marks the sample mean (3.5). The bootstrap means randomly fluctuate
around the sample mean. They represent the means we could expect if we were to repeat the same experiment
many times, given that we can only sample from the data at hand.
n.show <- 50 # show only n.show first bootstrap means
df <- tibble(x = 1:n.show, y = boot.m[1:n.show])
ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_hline(yintercept = mean(samp), colour = "grey", size = 1) +
# comment next line to make a scatterplot instead of a lollipop chart:
geom_segment(aes(x=x, xend=x, y=0, yend=y)) +
geom_point(size=2.5, color="red", fill=alpha("orange", 0.3), alpha=0.7, shape=21, stroke=2) +
scale_x_continuous(breaks = c(1, seq(10, 100, 10))) +
scale_y_continuous(breaks = seq(1, 6, 1)) +
coord_cartesian(ylim = c(0, 6)) +
labs(x = "Bootstrap samples", y = "Bootstrap means")

[Figure: lollipop chart of the first 50 bootstrap means. x-axis: Bootstrap samples (1 to 50); y-axis: Bootstrap means.]
# ggsave(filename=('./figures/figure_50bootsamp.pdf'),width=7,height=5)

Because we've bootstrapped a small sample of integer values, the bootstrap means can only take a small number of unique values: 25 in this example.
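As a quick check (a sketch, not part of the original chunk), the number of distinct bootstrap means can be counted directly:

length(unique(boot.m)) # number of distinct bootstrap mean values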

Density plot

We can illustrate all the bootstrap means using a density plot, which is like a smooth histogram. The density
plot shows the relative probability of observing different bootstrap means.
df <- as_tibble(with(density(boot.m),data.frame(x,y)))

ggplot(df, aes(x = x, y = y)) + theme_gar +


geom_vline(xintercept = mean(samp), colour = "grey", size = 1) +
geom_line(size = 2) +
scale_x_continuous(breaks = seq(0, 10, 1)) +
coord_cartesian(xlim = c(0, 6)) +
labs(x = "Bootstrap means", y = "Density")

[Figure: density plot of the bootstrap means. x-axis: Bootstrap means; y-axis: Density.]
# ggsave(filename=('./figures/figure_bootdens.pdf'),width=7,height=5)

Confidence interval

Formula

We set alpha to 0.05 to get a 95% confidence interval. That means, if we were to repeat the same experiment
many times, and for each experiment we compute a confidence interval, in the long-run, 95% of these intervals
should contain the population value. This means that for a given experiment, the confidence interval either does or does not contain the population value we're trying to estimate. Also, the actual coverage depends on the method used to build the confidence interval and the quantity we're trying to estimate. In particular, it is well-known that the bootstrap should not be used to build confidence intervals for the mean, because the coverage can be far from the expected value. For instance, when sampling from skewed distributions, the coverage can be much lower than the expected 95%. See the next main section on how to use a simulation to check coverage.
alpha <- 0.05
ci <- quantile(boot.m, probs = c(alpha/2, 1-alpha/2)) # [2.17, 4.83]
round(ci, digits = 2)

## 2.5% 97.5%
## 2.17 4.83
There are many ways to estimate quantiles. For instance, the R function quantile() offers 9 types of
calculations, the default being type=7. Hesterberg (2015) recommends using type=6 to avoid getting confidence
intervals that are too narrow. With sample sizes and nboot sufficiently large, the type argument will not
matter, but it is unclear when this is the case.
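As a sketch of this point, the two quantile types can be compared directly on the bootstrap means computed above; with nboot = 1000 the intervals should differ only slightly:

alpha <- 0.05
quantile(boot.m, probs = c(alpha/2, 1-alpha/2), type = 7) # R default
quantile(boot.m, probs = c(alpha/2, 1-alpha/2), type = 6) # Hesterberg's recommendation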
In Rand Wilcox’s functions, the procedure in the next chunk is used, which can give results slightly different
from those in the previous chunk. With a large number of bootstrap samples, which quantile method is used
will make little difference.
alpha <- 0.05
bvec <- sort(boot.m) # sort bootstrap means in ascending order
# define quantiles
low <- round((alpha/2)*nboot) # 25
up <- nboot-low # 975
low <- low+1
# get confidence interval
ci <- c(bvec[low],bvec[up]) # [2.33, 4.83]
round(ci, digits = 2)

Graphical representation

The horizontal line marks the 95% confidence interval. The boxes report the values of the CI bounds. L
stands for lower bound, U for upper bound.
df <- as_tibble(with(density(boot.m),data.frame(x,y)))

ggplot(df, aes(x = x, y = y)) + theme_gar +


geom_vline(xintercept = mean(samp), colour = "grey", size = 1) +
geom_line(size = 2) +
scale_x_continuous(breaks = seq(0, 10, 1)) +
coord_cartesian(xlim = c(0, 6)) +
labs(x = "Bootstrap means", y = "Density") +
# confidence interval ----------------------
geom_segment(x = ci[1], xend = ci[2],
y = 0, yend = 0,
lineend = "round", size = 3, colour = "orange") +
annotate(geom = "label", x = ci[1]+0.15, y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("L = ", round(ci[1], digits = 2))) +
annotate(geom = "label", x = ci[2]-0.15, y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("U = ", round(ci[2], digits = 2)))

[Figure: density plot of the bootstrap means with the 95% confidence interval marked along the x-axis, labelled L = 2.17 and U = 4.83.]
# ggsave(filename=('./figures/figure_bootdensci.pdf'),width=7,height=5)

P value

null.value <- 2.5 # null value for hypothesis testing


half.pval <- mean(boot.m>null.value) +.5*mean(boot.m==null.value)
pval <- 2*min(c(half.pval,1-half.pval)) # p value = 0.155
pval

## [1] 0.155

Summary figure

Make data frame


df <- as_tibble(with(density(boot.m),data.frame(x,y)))

df.pv1 <- tibble(x = df$x[df$x<null.value],


y = df$y[df$x<null.value])

df.pv2 <- tibble(x = df$x[df$x>null.value],


y = df$y[df$x>null.value])

Make figure

ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_vline(xintercept = mean(samp), colour = "grey", size = 1) +
# p value
geom_area(data = df.pv1,
aes(x = x, y = y),
fill = "red", alpha = 1) +
geom_area(data = df.pv2,
aes(x = x, y = y),
fill = "red", alpha = .2) +
# density
geom_line(data = df, size = 2) +
scale_x_continuous(breaks = seq(0, 10, 1)) +
coord_cartesian(xlim = c(0, 6)) +
labs(x = "Bootstrap means", y = "Density") +
# Null value
geom_segment(x = null.value,
xend = null.value,
y = 0,
yend = df$y[which.min(abs(df$x-null.value))],
size = 1,
colour = "black",
linetype = "dotted") +
# p value arrow -------------
geom_segment(x = 1.7, xend = 2.2, y = 0.1, yend = 0.03,
arrow = arrow(type = "closed",
length = unit(0.25, "cm")),
colour = "grey50", size = 1) +
annotate(geom = "label", x = 1.3, y = 0.12, size = 7,
colour = "white", fill = "red", fontface = "bold",
label = expression(paste(italic("p"), " value / 2")))

[Figure: density plot of the bootstrap means with the area beyond the null value shaded and annotated "p value / 2". x-axis: Bootstrap means; y-axis: Density.]
# ggsave(filename=('./figures/figure_bootdenspval.pdf'),width=7,height=5)

Getting bootstrap confidence intervals using functions

onesampb

We can get the bootstrap confidence interval and p value by calling the onesampb() function from Rand
Wilcox:
set.seed(21)
nboot <- 1000 # number of bootstrap samples
onesampb(samp, est=mean, alpha=.05, nboot=nboot, nv=null.value)

## $ci
## [1] 2.166667 4.833333
##
## $n
## [1] 6
##
## $estimate
## [1] 3.5
##
## $p.value
## [1] 0.155
onesampb() can be used with any estimator, defined via the est argument, for instance the median (median), a trimmed mean (tm), a quantile estimate (hd), or some measure of variability, such as the median absolute deviation to the median (mad).
Set SEED to TRUE to get the same results every time you use the function. Set it to FALSE to use different random bootstrap samples, so the function returns different results every time you use it. The nv argument (here set to null.value) is the null value used to compute the p value.
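For instance, a sketch of the same call using the median instead of the mean (the exact output will depend on the bootstrap samples drawn):

set.seed(21)
onesampb(samp, est=median, alpha=.05, nboot=nboot, nv=null.value)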

boot

Another option is to use the boot() function from the boot package (Canty & Ripley, 2017; Davison &
Hinkley, 1997). This function is more complicated than using functions with a narrow focus like onesampb()
because it requires users to create a function to process the data. The advantage is that the bootstrap
distribution is saved and can be used in a second step to make illustrations and compute various quantities.
You can find a tutorial about the boot package for instance here and here.
set.seed(21)
# create function taking bootstrap indices
theta <- function(data, indices){
  mean(data[indices]) # return the mean of the resampled observations
}
# bootstrap data
boot.res <- boot(data=samp, statistic=theta, R=nboot)

# view results
boot.res

##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = samp, statistic = theta, R = nboot)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 3.5 -0.006833333 0.6978708
plot(boot.res)

[Figure: default plot of the boot object, showing a histogram of the bootstrap statistic t* and a quantile-quantile plot of t* against the standard normal distribution.]


# get 95% confidence interval
# boot.ci(boot.res, type="bca") # specify one type of bootstrap
# boot.ci(boot.res) # get all types
boot.ci(boot.res, type = "perc") # get percentile bootstrap CI

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS


## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.res, type = "perc")
##
## Intervals :
## Level Percentile
## 95% ( 2.167, 4.833 )
## Calculations and Intervals on Original Scale
Some of the values reported above are not covered in this tutorial:
- bias is the bootstrap estimate of the bias of the statistic (in this case the mean), a topic covered in detail here
- std. error is the bootstrap estimate of the standard error of the statistic (the mean). It is calculated as the standard deviation of the bootstrap distribution. See details in this other article.
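Both quantities can be recomputed from the bootstrap distribution stored in the boot object; a minimal sketch:

mean(boot.res$t) - boot.res$t0 # bootstrap estimate of bias
sd(boot.res$t) # bootstrap estimate of the standard error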

Check CI coverage
Example of a simple simulation to check the probability coverage of a confidence interval method. The
simulation has 5,000 iterations. Increasing this number would lead to more precise results. For a simple test,
10,000 iterations or more could be used. For more complex applications, time might be a constraint.
The sample size is 30, which seems reasonably high for a psychology experiment. A more systematic simulation
should include sample size as a parameter.

set.seed(666) # reproducible results
nsim <- 5000 # simulation iterations
nsamp <- 30 # sample size
alpha <- 0.05 # alpha level
nboot <- 2000 # number of bootstrap samples
pop <- rlnorm(1000000) # define lognormal population
pop.m <- mean(pop) # population mean
ptrim <- 0.2 # proportion of trimming
pop.tm <- mean(pop, trim = ptrim) # population 20% trimmed mean
ci.coverage <- matrix(NA, nrow = nsim, ncol = 3) # declare matrix of results

for(S in 1:nsim){ # simulation loop


samp <- sample(pop, nsamp, replace = TRUE) # random sample from population
# Mean + t-test
ci <- t.test(samp, mu = pop.m)$conf.int # standard t-test equation
ci.coverage[S,1] <- ci[1]<pop.m && ci[2]>pop.m # CI includes population value?
# create matrix of bootstrap samples
boot.mat <- matrix(sample(samp, size = nsamp*nboot, replace = TRUE), nrow = nboot)
# Mean + bootstrap
# ci <- onesampb(samp, est = mean, nv = pop.m)$ci # get bootstrap confidence interval
ci <- quantile(apply(boot.mat, 1, mean), probs = c(alpha/2, 1-alpha/2))
ci.coverage[S,2] <- ci[1]<pop.m && ci[2]>pop.m # CI includes population value?
# 20% Trimmed mean
# ci <- onesampb(samp, est = mean, nv = pop.m, trim = ptrim)$ci # get bootstrap confidence interval
ci <- quantile(apply(boot.mat, 1, mean, trim = ptrim), probs = c(alpha/2, 1-alpha/2))
ci.coverage[S,3] <- ci[1]<pop.tm && ci[2]>pop.tm # CI includes population value?
}

apply(ci.coverage, 2, mean) # average across simulations for each method

# save simulation results to load in next chunk


save(ci.coverage, pop.m, pop.tm, file = "./data/ci.coverage.RData")

Load results: the next chunk requires the data/ folder to be in the same directory as this
notebook. Alternatively, you could run the simulation in the previous chunk to recreate the results.
load("./data/ci.coverage.RData")

The population is lognormal and is generated outside the simulation loop by using pop <- rlnorm(1000000).
This approach works well, as in our example the population mean is 1.647, which is very close to the known
true population mean of 1.649. An alternative is to generate the random numbers directly inside the loop
by using samp <- rlnorm(nsamp). The first approach used here makes it more intuitive that we sample from
a known population. This approach is also useful when there is no formula available to define the theoretical
population values for certain distributions.
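For the standard lognormal used here, the theoretical mean can also be obtained from the usual formula exp(mu + sigma^2/2); a quick check:

# theoretical mean of rlnorm() with default meanlog = 0 and sdlog = 1
exp(0 + 1^2/2) # ~1.6487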
The lognormal distribution is one of many skewed mathematical distributions. It serves to illustrate what
can happen when sampling from skewed distributions in general. Other shapes could be used too, if some domain-specific information is available. For instance, ex-Gaussian distributions do a good job at capturing
the shape of reaction time distributions.
The population means and trimmed means differ and are estimated independently in the simulation: the
sample mean is used to make inferences about the population mean, whereas the sample trimmed mean is
used to make inferences about the population trimmed mean.
Here are the results:

out <- apply(ci.coverage, 2, mean) # average across simulations
out

## [1] 0.877 0.872 0.946


Coverage is 87.7% for the t-test, 87.2% for the bootstrap + mean, and 94.6% for the bootstrap + 20%
trimmed mean. This means that when sampling from a skewed distribution such as the lognormal distribution,
coverage can be very different from the expected one (here 95% coverage).
Want to see more examples of simulations + code? Head here for type I error simulations, and here for
power simulations.

Compare 2 independent groups: difference in location


Inferences on 20% trimmed means of skewed distributions.

Make data

We sample from log-normal distributions to mimic distributions of response times. Alternatively, we could
have used the popular ex-Gaussian distribution. The details do not matter: the goal of the example is to
make inferences about some sort of continuous and skewed distribution, which is very common in life sciences
in general, and neuroscience and psychology in particular.
set.seed(44) # reproducible results

# Group 1
n1 <- 50
m <- 400
s <- 50
location <- log(m^2 / sqrt(s^2 + m^2))
shape <- sqrt(log(1 + (s^2 / m^2)))
g1 <- rlnorm(n1, location, shape)

# Group 2
n2 <- 70
m <- 500
s <- 70
location <- log(m^2 / sqrt(s^2 + m^2))
shape <- sqrt(log(1 + (s^2 / m^2)))
g2 <- rlnorm(n2, location, shape)
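The location and shape parameters are chosen so that the lognormal samples have (approximately) the desired mean m and standard deviation s. As a sketch, the standard lognormal moment formulas can be used to verify this for group 2:

# check that the parameters recover the target moments (m = 500, s = 70)
exp(location + shape^2/2) # theoretical mean
sqrt((exp(shape^2) - 1) * exp(2*location + shape^2)) # theoretical standard deviation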

Illustrate 2 groups

set.seed(22) # for reproducible jitter


# raw data
df <- tibble(val = c(g1, g2),
y = rep(1, n1+n2),
gp = factor(c(rep("Group 1",n1),rep("Group 2",n2)))
)

df.q1 <- tibble(y = rep(0.9,2),

yend = rep(1.1,2),
x = c(hd(g1,0.25), hd(g2, 0.25)),
xend = x,
gp = factor(c("Group 1","Group 2"))
)

df.q2 <- tibble(y = rep(0.9,2),


yend = rep(1.1,2),
x = c(hd(g1,0.5), hd(g2, 0.5)),
xend = x,
gp = factor(c("Group 1","Group 2"))
)

df.q3 <- tibble(y = rep(0.9,2),


yend = rep(1.1,2),
x = c(hd(g1,0.75), hd(g2, 0.75)),
xend = x,
gp = factor(c("Group 1","Group 2"))
)

p <- ggplot(data = df, aes(x = val, y = y)) + theme_gar +


# scatterplots
geom_jitter(height = .05, alpha = 0.3, size = 2) +
theme(axis.ticks.y = element_blank(),
axis.text.x = element_text(size = 10),
axis.title.x = element_text(size = 12),
strip.text.x = element_text(size = 12),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor.x = element_blank()) +
scale_y_continuous(breaks = 1) +
# 1st quartile
geom_segment(data = df.q1, aes(y = y, yend = yend,
x = x, xend = xend),
size = 0.75, lineend = 'round', colour = "black") +
# median
geom_segment(data = df.q2, aes(y = y, yend = yend,
x = x, xend = xend),
size = 0.75, lineend = 'round', colour = "black") +
# 3rd quartile
geom_segment(data = df.q3, aes(y = y, yend = yend,
x = x, xend = xend),
size = 0.75, lineend = 'round', colour = "black") +
labs(x = "Response times (ms)") +
facet_grid(cols = vars(gp)) +
coord_cartesian(xlim = c(0, 700)) +
scale_x_continuous(breaks = seq(0, 1000, 100))
p

[Figure: scatterplots of response times (ms) for Group 1 and Group 2, with vertical marks at the quartiles.]
# ggsave(filename=('./figures/figure_2gps.pdf'),width=7,height=2)

Bootstrap

set.seed(1)
nboot <- 2000 # number of bootstrap samples
ptrim <- 0.2 # proportion of trimming
# bootstrap sampling independently from each group
boot1 <- matrix(sample(g1, size=n1*nboot, replace=TRUE), nrow=nboot)
boot2 <- matrix(sample(g2, size=n2*nboot, replace=TRUE), nrow=nboot)
# compute trimmed mean for each group and bootstrap sample
boot1.tm <- apply(boot1, 1, mean, trim=ptrim)
boot2.tm <- apply(boot2, 1, mean, trim=ptrim)
# get distribution of sorted bootstrap differences
boot.diff <- sort(boot1.tm - boot2.tm)

Bootstrap confidence interval

Bootstrap confidence interval - as implemented in pb2gen()

There are several ways to estimate quantiles; here is the formula used in pb2gen(), a function from Rand
Wilcox dedicated to the comparisons of estimators from two independent groups, using the percentile
bootstrap.
alpha <- 0.05
low <- round((alpha/2)*nboot) + 1
up <- nboot - low
ci <- c(boot.diff[low], boot.diff[up])
round(ci, digits = 1)

## [1] -113.7 -68.2

Bootstrap confidence interval - using the quantile() function

alpha <- 0.05


ci <- quantile(boot.diff, probs = c(alpha/2, 1-alpha/2)) # [-113.7, -68.2]
round(ci, digits = 1)

## 2.5% 97.5%
## -113.7 -68.2

Bootstrap p value

null.value <- 0 # null value


pval <- sum(boot.diff<null.value)/nboot + sum(boot.diff==null.value)/(2*nboot)
pval <- 2*(min(pval,1-pval)) # 0
pval

## [1] 0

Bootstrap using functions

The bootstrap confidence interval and the p value can also be obtained in one call to the pb2gen() function.
The input est could be changed to many other estimators to make various bootstrap inferences about
differences between independent groups.
set.seed(1)
pb2gen(g1, g2, alpha=0.05, nboot=nboot, est=mean, trim=0.2)

## $est.1
## [1] 393.8316
##
## $est.2
## [1] 485.0868
##
## $est.dif
## [1] -91.25519
##
## $ci
## [1] -113.70589 -68.24266
##
## $p.value
## [1] 0
##
## $sq.se
## [1] 140.3912
##
## $n1
## [1] 50
##
## $n2
## [1] 70
The output sq.se is the squared bootstrap standard error of the difference between trimmed means (the variance of the bootstrap distribution of differences).
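As mentioned above, the est argument can be swapped for another estimator; for example, a sketch using medians instead of 20% trimmed means:

set.seed(1)
pb2gen(g1, g2, alpha=0.05, nboot=nboot, est=median)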
The same analysis can be done with the boot() function but it gets more complicated to program. We can
use a helper function from the simpleboot package to ease the process:
# bootstrap data
set.seed(1)
boot.res <- two.boot(g1, g2, FUN=mean, R=nboot, trim=0.2)

# get 95% confidence interval


boot.ci(boot.res, type = "perc") # get percentile bootstrap CI

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS

## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.res, type = "perc")
##
## Intervals :
## Level Percentile
## 95% (-113.90, -68.06 )
## Calculations and Intervals on Original Scale

Illustration

diff <- mean(g1,trim=0.2) - mean(g2,trim=0.2) # group difference

ci1 <- round(ci[1])


ci2 <- round(ci[2])

df <- as_tibble(with(density(boot.diff),data.frame(x,y)))

ggplot(df, aes(x = x, y = y)) + theme_gar +


geom_vline(xintercept = diff, colour = "grey", size = 1) +
geom_line(size = 2) +
scale_x_continuous(breaks = seq(-200, 200, 20)) +
coord_cartesian(xlim = c(-130, 0)) +
labs(x = "Bootstrap differences between 20% trimmed means", y = "Density") +
# confidence interval ----------------------
geom_segment(x = ci1, xend = ci2,
y = 0, yend = 0,
lineend = "round", size = 3, colour = "orange") +
annotate(geom = "label", x = ci1+0.15, y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("L = ", round(ci1, digits = 2))) +
annotate(geom = "label", x = ci2-0.15, y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("U = ", round(ci2, digits = 2)))

[Figure: density plot of the bootstrap differences between 20% trimmed means, with the 95% confidence interval labelled L = −114 and U = −68.]
# ggsave(filename=('./figures/figure_2gpsbootres.pdf'),width=7,height=5)

Compare 2 groups: difference in spread


Using the same data, now we look at differences in spread between group 1 and group 2. We make inferences
on MAD, the median absolute deviation from the median, which is a robust measure of spread.

Bootstrap

We already have generated bootstrap samples, so here we simply compute MAD for each of them.
boot1.mad <- apply(boot1, 1, mad)
boot2.mad <- apply(boot2, 1, mad)
boot.diff <- sort(boot1.mad - boot2.mad)

Bootstrap confidence interval

ci <- c(boot.diff[low], boot.diff[up]) # [-62.2, 4.3]


ci

## [1] -62.154858 4.311876


Using the quantile() function:

ci <- quantile(boot.diff, probs = c(alpha/2, 1-alpha/2)) # [-62.2, 4.5]
ci

## 2.5% 97.5%
## -62.162623 4.529291

Bootstrap p value

pv <- sum(boot.diff<0)/nboot + sum(boot.diff==0)/(2*nboot)


pv <- 2*(min(pv,1-pv)) # 0.083
pv

## [1] 0.083

Illustration

diff <- mad(g1) - mad(g2) # group difference

ci1 <- round(ci[1])


ci2 <- round(ci[2])

df <- as_tibble(with(density(boot.diff),data.frame(x,y)))

ggplot(df, aes(x = x, y = y)) + theme_gar +


geom_vline(xintercept = diff, colour = "grey", size = 1) +
geom_line(size = 2) +
scale_x_continuous(breaks = seq(-200, 200, 20)) +
coord_cartesian(xlim = c(-100, 50)) +
labs(x = "Bootstrap differences between MADs", y = "Density") +
# confidence interval ----------------------
geom_segment(x = ci1, xend = ci2,
y = 0, yend = 0,
lineend = "round", size = 3, colour = "orange") +
annotate(geom = "label", x = ci1+0.15, y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("L = ", round(ci1, digits = 2))) +
annotate(geom = "label", x = ci2-0.15, y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("U = ", round(ci2, digits = 2)))

[Figure: density plot of the bootstrap differences between MADs, with the 95% confidence interval labelled L = −62 and U = 5.]
# ggsave(filename=('./figures/figure_mad.pdf'),width=7,height=5)

Bootstrap using functions

pb2gen() function

set.seed(1)
pb2gen(g1, g2, alpha=0.05, nboot=nboot, est=mad)

## $est.1
## [1] 40.29569
##
## $est.2
## [1] 68.39412
##
## $est.dif
## [1] -28.09842
##
## $ci
## [1] -62.154858 4.311876
##
## $p.value
## [1] 0.083
##
## $sq.se

## [1] 294.1268
##
## $n1
## [1] 50
##
## $n2
## [1] 70

boot() function

# bootstrap data
set.seed(1)
boot.res <- two.boot(g1, g2, FUN=mad, R=nboot)

# get 95% confidence interval


boot.ci(boot.res, type = "perc") # get percentile bootstrap CI

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS


## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot.res, type = "perc")
##
## Intervals :
## Level Percentile
## 95% (-62.46, 4.55 )
## Calculations and Intervals on Original Scale
Instead of looking separately at group differences in central tendency and spread, we can have a more detailed
look at how distributions differ by using a shift function, which is available in the rogme R package.

Bootstrap confidence intervals of correlation coefficients


[This section is not described in the article]
Finally, we use the bootstrap to make inferences about correlations. Imagine we did an experiment in which
we measured two variables in the same participants. In this example we sample from 2 weakly correlated
variables. Changing the random seed or commenting out the line set.seed(777) will give different results.
You can also change the population correlation by changing rho. In the following code we define population
correlations as Spearman’s correlations. This is achieved using the SimJoint package. Another solution is to
use the Iman-Conover method (1982), as implemented in the mc2d package (Pouillot & Delignette-Muller,
2010).

Make data

set.seed(777)
n <- 50 # sample size
npop <- 1000000 # finite population size
mu <- c(0, 0) # means of the variables
rho <- 0.2 # population Spearman correlation between variables
sigma1 <- matrix(c(1, rho, rho, 1), nrow = 2, byrow = TRUE) # covariance matrix

# Create population 1 -----------------------------

# Create bivariate distributions


# pop <- MASS::mvrnorm(n = npop, mu = mu, Sigma = sigma1) # impose known Pearson correlation
pop <- cbind(rnorm(npop), rnorm(npop)) # uncorrelated random normal samples

# Impose correlation using the mc2d package:


# pop <- mc2d::cornode(pop, target = rho)

# Impose correlation using the SimJoint package:


pop <- apply(pop, 2, function(x) sort(x))
res <- SimJoint::SJspearman(X = pop, cor = sigma1)
pop <- res$X

# cor(pop[,1], pop[,2], method = "spearman") # check population Spearman correlation


# cor(pop[,1], pop[,2], method = "pearson") # check population Pearson correlation
# plot(pop[1:10000,])

# group 1: take random sample of size n from population


set.seed(1)
id <- sample(npop, n)
x1 <- pop[id,1]
y1 <- pop[id,2]

save(x1, y1, n, file = "./data/corr1.RData")

Illustrate data

The scatterplot of results is illustrated below.


# load data
load("./data/corr1.RData")

# make data frame


df <- tibble(x = x1,
y = y1)

# ggplot figure
p <- ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_point(alpha = 0.4, size = 3) +
coord_cartesian(xlim = c(-2.5, 2.5), ylim = c(-2.5, 2.5)) +
theme(axis.title = element_text(size = 15, colour = "black"),
axis.text = element_text(size = 13, colour = "black"),
strip.text = element_text(size = 15, face = "bold")) +
labs(x = expression(italic("Variable A")), y = expression(italic("Variable B"))) +
ggtitle("Group 1")
pA <- p
p

[Figure: Group 1 scatterplot of Variable A against Variable B.]
For this sample, Spearman’s correlation coefficient is 0.43, suggesting a relatively strong association. In fact,
we simulated these data, so we know that the population Spearman correlation is 0.2, which means that our
sample over-estimates the true correlation.

Bootstrap

To compute a bootstrap confidence interval for the population correlation, we sample pairs of observations
with replacement:
set.seed(21)
nboot <- 5000 # number of bootstrap samples
alpha <- 0.05 # alpha level for confidence interval
boot.corr1 <- vector(mode = "numeric", length = nboot) # vector of bootstrap correlations
for(B in 1:nboot){
boot.id <- sample(n, size = n, replace=TRUE) # bootstrap pairs of observations
boot.corr1[B] <- cor(x1[boot.id], y1[boot.id], method = "spearman")
}

In the code, the variable boot.id is a vector of participant indices, sampled with replacement from the full
list of participants. These indices are then used to index the matched vectors x1 and y1. That way, paired
observations are kept together. Spearman’s correlation is computed for each of these bootstrap pairs.

Bootstrap confidence interval

corci1 <- quantile(boot.corr1, probs = c(alpha/2, 1-alpha/2))


corci1

## 2.5% 97.5%
## 0.1875850 0.6278105

P value

pval <- sum(boot.corr1 < 0)/nboot
pval <- 2 * min(pval, 1 - pval)
pval

## [1] 8e-04
The bootstrap results can also be obtained by calling the R command: corb(x1,y1, corfun = spear, SEED
= FALSE). The argument corfun can be changed from spear (for Spearman) to another correlation function,
such as pbcor for a percentage bend correlation or wincor for a Winsorized correlation. For bootstrap
inferences about the non-robust Pearson’s correlation, a modified bootstrap technique is required, which is
implemented in the function pcorb. For details about these correlation methods, see Pernet et al. (2013) and
Wilcox (2017).
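For reference, the command mentioned above written as a runnable sketch (this assumes corb() is available from the sourced Rallfun file):

set.seed(21)
corb(x1, y1, corfun = spear, SEED = FALSE)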

Illustrate bootstrap correlation coefficients

The resulting bootstrap distribution is illustrated below.


cor1 <- cor(x1, y1, method = "spearman")

df <- as_tibble(with(density(boot.corr1),data.frame(x,y)))

p <- ggplot(df, aes(x = x, y = y)) + theme_gar +


geom_vline(xintercept = cor1, colour = "grey", size = 1) +
geom_line(size = 2) +
labs(x = "Bootstrap Spearman's correlations", y = "Density") +
# confidence interval ----------------------
geom_segment(x = corci1[1], xend = corci1[2],
y = 0, yend = 0,
lineend = "round", size = 3, colour = "orange") +
annotate(geom = "label", x = corci1[1], y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("L = ", round(corci1[1], digits = 2))) +
annotate(geom = "label", x = corci1[2], y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("U = ", round(corci1[2], digits = 2)))
pB <- p
p

[Figure: density plot of the bootstrap Spearman's correlations, with the 95% confidence interval labelled L = 0.19 and U = 0.63.]
For simulations on the (large) sample sizes needed to precisely estimate correlation analyses, see this blog
post. To see why conditioning correlation results on p values can lead to a literature full of false positives,
see this blog post. Finally, simulations of statistical power for correlation analyses are available here.

Compare correlation coefficients


If we had two independent groups and wanted to compare their correlation coefficients, we could use the
bootstrap to compute a confidence interval for the correlation difference. This is done by extending a little
bit the code we used for one correlation. As before, pairs of observations are sampled with replacement, and
because groups are independent, bootstrap samples are obtained independently in each group.
In this situation, we have 2 groups, for each group we measure variables A and B and then estimate their
correlations.

Make data

set.seed(777)
# group 2
rho <- 0.5 # correlation between variables
sigma2 <- matrix(c(1, rho, rho, 1), nrow = 2, byrow = TRUE) # covariance matrix
pop <- cbind(rnorm(npop), rnorm(npop)) # uncorrelated random normal samples

# Impose correlation using the SimJoint package:


pop <- apply(pop, 2, function(x) sort(x))

res <- SimJoint::SJspearman(X = pop, cor = sigma2)
pop <- res$X

# group 2: take random sample of size n from population
set.seed(1)
id <- sample(npop, n)
x2 <- pop[id,1]
y2 <- pop[id,2]

save(x2, y2, file = "./data/corr2.RData")

Illustrate data

# load data
load("./data/corr2.RData")

# make data frame


df <- tibble(x = x2,
y = y2)

# ggplot figure
p <- ggplot(df, aes(x = x, y = y)) + theme_gar +
geom_point(alpha = 0.4, size = 3) +
coord_cartesian(xlim = c(-2.5, 2.5), ylim = c(-2.5, 2.5)) +
theme(axis.title = element_text(size = 15, colour = "black"),
axis.text = element_text(size = 13, colour = "black"),
strip.text = element_text(size = 15, face = "bold")) +
labs(x = expression(italic("Variable A")), y = expression(italic("Variable B"))) +
ggtitle("Group 2")
pC <- p
p

[Figure: Group 2 scatterplot of Variable A against Variable B.]
For this sample, Spearman’s correlation coefficient is 0.66.

Bootstrap

Bootstrap samples are obtained independently in each group. Pairs of observations are sampled with
replacement.
set.seed(21)
nboot <- 5000 # number of bootstrap samples
alpha <- 0.05 # alpha level for confidence interval
boot.diff <- vector(mode = "numeric", length = nboot) # vector of bootstrap correlations
for(B in 1:nboot){
boot.id1 <- sample(n, size = n, replace=TRUE) # bootstrap pairs of observations in group 1
boot.id2 <- sample(n, size = n, replace=TRUE) # bootstrap pairs of observations in group 2
boot.diff[B] <- cor(x1[boot.id1], y1[boot.id1], method = "spearman") - cor(x2[boot.id2], y2[boot.id2], method = "spearman")
}

Bootstrap confidence interval

diff.ci <- quantile(boot.diff, probs = c(alpha/2, 1-alpha/2)) # [-0.499, 0.030]


round(diff.ci, digits = 3)

## 2.5% 97.5%
## -0.499 0.030

P value

pval <- sum(boot.diff < 0)/nboot


pval <- 2 * min(pval, 1 - pval) # 0.0904
pval

## [1] 0.0904
The bootstrap results can be obtained by calling the R command: twocor(x1,y1,x2,y2, corfun = spear).
The input argument corfun can be changed to another robust function, as already mentioned. To compare two
percentage bend correlations, use this command instead: twocor(x1,y1,x2,y2, corfun = pbcor). And
to compare two Pearson’s correlations: twopcor(x1,y1,x2,y2). We do not recommend the use of Fisher’s
r-to-z transform to compare Pearson’s correlations, because this procedure assumes normality and even a
small departure from a normal distribution can result in poor performance regardless of how large the sample
size might be (Wilcox, 2017).
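For reference, the twocor() call mentioned above as a runnable sketch (again assuming the function is available from the sourced Rallfun file):

set.seed(21)
twocor(x1, y1, x2, y2, corfun = spear)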

Bootstrap distribution of correlation coefficient differences

corr.diff <- cor(x1, y1, method = "spearman") - cor(x2, y2, method = "spearman")

df <- as_tibble(with(density(boot.diff),data.frame(x,y)))

p <- ggplot(df, aes(x = x, y = y)) + theme_gar +


geom_vline(xintercept = corr.diff, colour = "grey", size = 1) +
geom_line(size = 2) +
labs(x = "Bootstrap Spearman's corr. differences", y = "Density") +
# confidence interval ----------------------
geom_segment(x = diff.ci[1], xend = diff.ci[2],

y = 0, yend = 0,
lineend = "round", size = 3, colour = "orange") +
annotate(geom = "label", x = diff.ci[1], y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("L = ", round(diff.ci[1], digits = 2))) +
annotate(geom = "label", x = diff.ci[2], y = 0.1*max(df$y), size = 7,
colour = "white", fill = "orange", fontface = "bold",
label = paste("U = ", round(diff.ci[2], digits = 2)))
pD <- p
p

[Figure: density plot of the bootstrap differences between Spearman's correlations, with the 95% confidence interval labelled L = −0.5 and U = 0.03.]
The bootstrap distribution and the confidence interval are very broad, suggesting a large range of population
differences. This is not surprising given the modest sample sizes in our example: correlations are noisy and
require a lot of observations to be precisely estimated (Yarkoni, 2009).

Summary figure

require(cowplot)
cowplot::plot_grid(pA, pC, pB, pD,
labels = c("A", "C", "B", "D"),
ncol = 2,
nrow = 2,
label_size = 20,
hjust = -0.5,
scale=.95)

# save figure
ggsave(filename=('./figures/figure_corr.pdf'),width=12,height=10)

Simulations of false positives and power for correlation comparisons are reported in this blog post. Bottom line: Fisher's r-to-z transform is not robust, and very large sample sizes are required to detect differences between correlation coefficients.

References
Canty, A. & Ripley, B. (2017) boot: Bootstrap R (S-Plus) Functions. R package version 1.3-20.
Davison, A.C. & Hinkley, D.V. (1997) Bootstrap Methods and Their Applications. Cambridge University Press, Cambridge. ISBN 0-521-57391-2.
Hesterberg, T. (2015) What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum. The American Statistician, 69, 371–386.
Iman, R.L. & Conover, W.J. (1982) A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics - Simulation and Computation, 11, 311–334.
Liu, C.W. (2019) SimJoint: Simulate Joint Distribution. R package version 0.2.2. https://CRAN.R-project.org/package=SimJoint
Pernet, C.R., Wilcox, R.R., & Rousselet, G.A. (2013) Robust Correlation Analyses: False Positive and Power Validation Using a New Open Source Matlab Toolbox. Frontiers in Psychology, 3. https://www.frontiersin.org/articles/10.3389/fpsyg.2012.00606/full
Pouillot, R. & Delignette-Muller, M.-L. (2010) Evaluating variability and uncertainty in microbial quantitative risk assessment using two R packages. International Journal of Food Microbiology, 142(3), 330–340.
Wilcox, R.R. (2017) Introduction to Robust Estimation and Hypothesis Testing. 4th edition. Academic Press, San Diego, CA.
Yarkoni, T. (2009) Big Correlations in Little Studies: Inflated fMRI Correlations Reflect Low Statistical Power. Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4, 294–298.
