Writing Simple Functions in R: Bootstrapping as an Example

Shu Fai Cheung @ University of Macau

Sep 2024

Table of contents
1 Aim and Scope
1.1 Aim
1.2 Target Audience
1.3 The Sample Functions are for Learning
1.4 Style

2 Defining Functions
2.1 Defining a Simple Function
2.2 Defining a More Complicated Function
2.3 Local Variables
2.4 Default Value for an Argument

3 Useful Statements in R
3.1 if and else
3.1.1 if without else
3.1.2 if … else if … else if … else
3.1.3 if and NA
3.2 for … in … and while

4 Examples
4.1 Nonparametric Bootstrapping Confidence Intervals
4.1.1 Pearson’s r

5 Optional Topics
5.1 Style
5.2 Pass-By-Value
5.3 Dotdotdot

6 Final Remarks

7 Further References

1 Aim and Scope


1.1 Aim
This document helps you to understand more about functions by writing some simple ones. Even if
you are just a user and will never develop anything for others, you can still:
a. write some simple functions for tasks that you will do again and again in your own work, and,
b. write functions to form the nonparametric bootstrapping confidence interval for a statistic.
Bootstrapping will be used as an example.


Note that, although the focus is on writing simple functions, functions not covered in previous documents
may also be introduced if necessary.

1.2 Target Audience


This document is for users, not for programmers. Therefore, some technical details may be omitted or
simplified, sometimes “oversimplified” if the details are not essential for common scenarios.

1.3 The Sample Functions are for Learning


Note that the sample functions presented in this document are for learning. Some tasks can already be
done by existing functions. Some functions can be improved in efficiency or be written in a more “R” way.
The sample functions in this document are written for illustrating the concepts to be introduced. In actual
research, there may be better ways to write them.

1.4 Style
In this and other documents, I will use my own personal style. Feel free to use whatever style you like in
your own work. See the section on style in R as a Language Part 1.

2 Defining Functions
2.1 Defining a Simple Function
You have already learned about calling a function and specifying its arguments. Now we are going to
define a very simple function that
• receives two numbers;
• adds them together;
• returns the result.
This is an example:

my_addition <- function(x, y) {
  x + y
  }

Recall that a function is an object. Therefore, we assign it to a name, my_addition, using <-. A function
definition starts with, well, function.
After function, there must be a pair of parentheses, ( and ). Between them are the arguments of this
function. They are either named, or are an unknown number of arguments denoted by ... (dotdotdot).
Dotdotdot is useful in some cases but will be introduced later when we need it. In the example above,
the arguments are x and y, in this order. Arguments are separated by commas.
Note that the order is important because if users do not name the arguments, the values will be assigned
based on this order.
After the parentheses, the body of a function must be enclosed between a pair of curly brackets, { and },
unless it is one simple expression after the parentheses [1]. We write code as usual between the brackets
to define the body of a function.
In the example above, the body is x + y. This is what we will do to add two variables, x and y.
[1] For example, my_addition <- function(x, y) x + y is acceptable.

SSGC 8802 (Sept 2024) Writing Simple Functions

Try and see what will happen by calling this function and then learn how it works:

my_addition(x = 1, y = 2)
my_addition(30, 5)

It should do what we expect. With just two arguments, we do not have to use names.
How about supplying two “names” of objects? Try this:

a <- 15
b <- 21
my_addition(a, b)

We can use the names of variables.


We can even use expressions. Try this:

a <- 15
my_addition(a, 3 * 7)

Now you should have confirmed that it works. Let us see how it works.
When we call the function by my_addition(1, 2), 1 is assigned to x, and 2 is assigned to y. R will then
run the code in the body of my_addition() using these values [2].
A function finishes its operation normally either when it finishes the last line or when it calls
return() (introduced later) to return something. If it finishes its last line, the output of this line will be
returned.
In my_addition(), the last line is x + y. Therefore, the output of x + y is automatically returned.
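
To make the point about argument order concrete, here is a minimal sketch. The function my_subtraction() is hypothetical and is not part of the original examples; subtraction is used because, unlike addition, swapping the two arguments changes the result. It also relies on the last line being returned automatically.

```r
# Hypothetical helper, for illustration only
my_subtraction <- function(x, y) {
  x - y
}

# Unnamed values are matched by position: x = 10, y = 3
by_position <- my_subtraction(10, 3)

# Named values are matched by name, so their order no longer matters
by_name <- my_subtraction(y = 3, x = 10)

# Swapping unnamed values changes which argument receives which value
swapped <- my_subtraction(3, 10)

by_position
by_name
swapped
```

The first two calls give the same result, while the third does not, which is why naming the arguments is a safe habit when a function has more than a couple of them.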

2.2 Defining a More Complicated Function


Let us write a slightly more complicated function which returns the minimum and the maximum of a vector
of numbers [3]:

my_range <- function(x) {
  x_min <- min(x)
  x_max <- max(x)
  c(min = x_min,
    max = x_max)
  }

This function has only one argument, x. The function first finds the minimum using min(), and assigns it
to x_min. It then finds the maximum using max(), and assigns it to x_max. It then creates a named vector
from these two numbers. This is the last line and so the result of this line will be returned.
Let’s try it:

out <- my_range(c(2, 3, 5, 1, 5, 10))
out

Good! We wrote our own function to find the range! This function has something new. New variables,
x_min and x_max, are created inside the body. This leads to the next topic …

[2] The arguments actually will be evaluated only when they are used. This is called lazy evaluation. This is not covered here.
[3] There is a base function, range(), for doing this. This example is for learning about writing functions by finding the range
ourselves.

2.3 Local Variables

You may wonder what will happen if x_min and x_max already exist outside the function. For example:

x_min <- 100
x_max <- 10
x <- c(2, 3, 5, 1, 5, 9)
my_range(x)

What is the result? Will x_min and x_max be changed? Try this:

x_min <- 100
x_max <- 10
x <- c(2, 3, 5, 1, 5, 9)
my_range(x)
x_min
x_max

First, we find that my_range gives the correct result. The x_min and x_max we created before calling my_range
do not affect its operation.
Second, the x_min and x_max we created, interestingly, are not affected, even though variables with the same
names are used in the function (x_min <- min(x) and x_max <- max(x)).
This introduces the idea of local variables. x_min and x_max, created by <- inside the function, are local.
They are created inside the function and so are different from what exists “out there” [4]. This behavior is
useful because we do not need to worry about overwriting variables that exist in the environment calling
it [5]. These local variables will disappear after the function ends.
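
The last point, that local variables disappear when the function ends, can be checked directly. This is a minimal sketch: the function center_scores() and the variable score_mean are made-up names, and the check assumes that no object called score_mean was created in the workspace beforehand.

```r
# Hypothetical function, for illustration only
center_scores <- function(scores) {
  # score_mean is local: it exists only while the function runs
  score_mean <- mean(scores)
  scores - score_mean
}

centered <- center_scores(c(2, 4, 6))
centered

# Assumption: score_mean was never created outside the function
still_there <- exists("score_mean")
still_there
```

If the assumption holds, exists() reports that score_mean cannot be found once the call has finished, even though the function used it to compute its result.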

2.4 Default Value for an Argument


There is a problem with my_range. Recall that min and max have an argument na.rm to control how to
handle missing values. How can we let the users of my_range set na.rm if they want to, and set na.rm
to FALSE, the default value, if they do not? This is an improved version:

my_range2 <- function(x,
                      my.na.rm = FALSE) {
  x_min <- min(x,
               na.rm = my.na.rm)
  x_max <- max(x,
               na.rm = my.na.rm)
  c(min = x_min,
    max = x_max)
  }

First, we add an argument, my.na.rm. We set the default value of my.na.rm to FALSE. If my.na.rm is
provided, then the provided value will be used. If not, then my.na.rm = FALSE. In the calls to min and max,
we set their na.rm argument to the value of our argument, my.na.rm.
Let’s see how it works by trying this:

x2 <- c(4, 2, 1, 10, NA, 7)
my_range2(x2)
my_range2(x2,
          my.na.rm = TRUE)

[4] Technically, in the parent frame.
[5] A function can overwrite variables outside it by using <<-. However, this should be avoided. Use this only if there are no other
solutions.

Now users can decide how to handle missing values, and this function also has a default way to handle
them if the users give no instruction.

Setting the default values of arguments makes a function easier to use, if the default values are the
values users usually want.
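
A function can have several arguments with default values, and users can override only the ones they care about. The following is a minimal sketch of that idea; the function my_mean2() and its digits argument are made up for illustration and do not appear elsewhere in this document.

```r
# Hypothetical function: a mean that removes missing values and
# rounds to two decimal places unless told otherwise
my_mean2 <- function(x,
                     na.rm = TRUE,
                     digits = 2) {
  round(mean(x, na.rm = na.rm), digits = digits)
}

x3 <- c(1.234, 2.345, NA)

# Both defaults are used
default_call <- my_mean2(x3)

# Only digits is overridden; na.rm keeps its default
one_digit <- my_mean2(x3, digits = 1)

# Overriding na.rm brings back the usual NA result
keep_na <- my_mean2(x3, na.rm = FALSE)

default_call
one_digit
keep_na
```

Whether TRUE or FALSE is the better default for na.rm depends on what the users of the function usually want; the sketch only shows the mechanism.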

3 Useful Statements in R
This section introduces a few statements useful for writing R functions. Note that all these statements
can also be used in R scripts, not just in a function.

3.1 if and else


Suppose we want to write a function to check if a number is less than a cutoff or not, for example, whether
a p-value is less than the desired level of significance (alpha), say, .05. This is a description of what we
want to do:
Get the number and the cutoff.
If the number is less than the cutoff:
  Return a string: "sig."
Else:
  Return a string: "n.s."
This is one way to write this function:

is_sig <- function(pvalue,
                   alpha = .05) {
  if (pvalue < alpha) {
    return("sig.")
    } else {
    return("n.s.")
    }
  print("This sentence will never be printed")
  }

Note that we set the default value of alpha to .05, the usual maximum level of significance.
Let’s test this function:

is_sig(pvalue = .04)
# We can omit the name
is_sig(.06)
# We set another level of significance
is_sig(.04, alpha = .01)

It should work. This function uses if ( ... ) { ... } else { ... }. After if is the condition
enclosed in a pair of parentheses. This condition should be a one-element logical vector, or an expression
that will result in a one-element logical vector. In is_sig, pvalue < alpha should result in TRUE or FALSE
(though NA is possible).
If the condition is TRUE, then the expression inside the next pair of curly brackets will be run. If FALSE,
then the expression in the pair of curly brackets after else will be run. (The case of NA will be covered
later.)
NOTE: Be careful when writing a condition. The version we used above can result in an error if (pvalue <
alpha) does not return one single logical value. Try this:

ps <- c(.04, .06)
is_sig(pvalue = ps)

Do you know why it results in an error?


This example also introduces a new function, return(). It tells the function to end immediately and return
the argument of return(). Because the if ... else structure already covers all possibilities,
and return() is used in all possibilities, the line print("This sentence will never be printed"), although
being the last line, will never be run.
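
The condition of if must be one single logical value. When a whole vector of p-values needs to be labelled, one option is the vectorized base function ifelse(). This is offered only as a sketch and is not the approach used in the rest of this document; the name is_sig_vec() is hypothetical.

```r
# Hypothetical vectorized variant of is_sig()
is_sig_vec <- function(pvalue,
                       alpha = .05) {
  # ifelse() checks the condition element by element
  ifelse(pvalue < alpha, "sig.", "n.s.")
}

ps <- c(.04, .06, .001)
labels <- is_sig_vec(ps)
labels
```

The result has one label per p-value. For a function that is only ever given one number, the if ... else version is usually easier to read.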

3.1.1 if without else


There are other variants of this structure. For example, we can omit else. Let’s improve the is_sig()
function:

is_sig2 <- function(pvalue, alpha = 0.05) {
  if ((alpha <= 0) || (alpha >= 1)) {
    stop("The level of significance (alpha) is invalid.")
    }
  if (pvalue < alpha) {
    return("sig.")
    } else {
    return("n.s.")
    }
  }

This introduces the idea of testing an argument. The level of significance should not be zero or less
(p < 0 is impossible) and should not be one or higher (p < 1 is nearly always true). Therefore, before
checking the p value, we check alpha first. If either (alpha <= 0) or (alpha >= 1) is TRUE, then the line
stop(...) will be run. There is no need for else because we only need to check whether a condition is met.
If not, then we can proceed as usual.
NOTE: || (and &&), rather than | (and &), is usually used in an if condition.
This example also uses a new function, stop(). This function is commonly used in a function. It,
obviously, “stops” a function. But it does not just stop the function. It will “raise” an error, and the
argument is the error message.
Let’s try this version, is_sig2 [6]:

is_sig2(pvalue = .04, alpha = 1)

Error in is_sig2(pvalue = 0.04, alpha = 1): The level of significance (alpha) is invalid.

is_sig2(pvalue = .04, alpha = 0)

Error in is_sig2(pvalue = 0.04, alpha = 0): The level of significance (alpha) is invalid.

is_sig2(pvalue = .04, alpha = .05)

[1] "sig."
Instead of stopping the function and raising an error, we can also return NA, that is, replace the call to
stop by return(NA). However, sometimes we may prefer raising an error in this case because NA can
also be interpreted as missing, for example, p-value is NA (although an error will actually occur if pvalue
is NA, for a reason described later).
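
As a sketch of the alternative just described, the hypothetical function is_sig_na() below returns NA instead of raising an error when alpha is invalid. It assumes the bounds stated in the text (alpha must be greater than 0 and less than 1).

```r
# Hypothetical variant: return NA instead of calling stop()
is_sig_na <- function(pvalue, alpha = 0.05) {
  if ((alpha <= 0) || (alpha >= 1)) {
    # A missing value is returned and the script keeps running
    return(NA)
  }
  if (pvalue < alpha) {
    return("sig.")
  }
  "n.s."
}

invalid_alpha <- is_sig_na(pvalue = .04, alpha = 2)
valid_alpha <- is_sig_na(pvalue = .04, alpha = .05)

invalid_alpha
valid_alpha
```

With this version nothing stops the script, so it is up to the caller to check the result with is.na(). This is one reason why raising an error can be the safer choice.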
Certainly, we can also apply a similar test to pvalue, which should range from 0 to 1. I will leave it as an
exercise for you.
[6] The error messages may be printed outside the margin. I cannot yet solve this problem. formatR and tidy do not work as
some suggested.

3.1.2 if … else if … else if … else


Suppose we want to check the p value and then return the conventional symbols we use in psychology
to denote the achieved level of significance, i.e., *, **, and *** for p value less than .05, .01, and .001,
respectively. We can use else if:

pstar <- function(pvalue) {
  if (pvalue < .001) {
    return("***")
    } else if (pvalue < .01) {
    return("**")
    } else if (pvalue < .05) {
    return("*")
    } else {
    return("n.s.")
    }
  }

Let’s try this function:

pstar(.06)
pstar(.04)
pstar(.009)
pstar(.00000001)

Note that, whenever a condition is met, the code inside the next curly brackets will be run, and all
remaining conditions will not be checked. If p < .001, then p is also < .01 and < .05. Therefore, we need
to check p < .001 first.
Having many conditions can be difficult to read. If appropriate, we can consider using switch(). We can
also simply remove else and else if.

pstar2 <- function(pvalue) {
  if (pvalue < .001) {
    return("***")
    }
  if (pvalue < .01) {
    return("**")
    }
  if (pvalue < .05) {
    return("*")
    }
  "n.s."
  }

This function works like pstar does.

pstar2(.06)
pstar2(.04)
pstar2(.009)
pstar2(.00000001)

This version uses if only. If the condition of an if block is not met, then R will proceed to the line after
this block.
Which version to use depends on the context and personal preference. Using else or else if may
make the code look organized. However, sometimes it can be more difficult to read than just having a
sequence of if blocks, especially when we have a lot of lines inside an if block.
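
For completeness, this is a minimal sketch of switch(), which was mentioned above. switch() chooses a branch by matching one single value (here, a string) against the names of the alternatives; it does not test conditions such as pvalue < .01. It therefore suits a task like translating the symbols returned by pstar() into words. The function describe_star() is hypothetical.

```r
# Hypothetical helper: translate a significance symbol into words
describe_star <- function(symbol) {
  switch(symbol,
         "***" = "p < .001",
         "**" = "p < .01",
         "*" = "p < .05",
         # An unnamed last alternative is used when nothing else matches
         "not significant")
}

two_stars <- describe_star("**")
no_star <- describe_star("n.s.")

two_stars
no_star
```

Each alternative sits on its own line, which can be easier to scan than a long chain of else if when the choice depends on one value only.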


3.1.3 if and NA
Note that, when testing the condition, NA is neither FALSE nor TRUE. It will result in an error. Therefore,
the following call will result in an error:

is_sig2(pvalue = NA)

Error in if (pvalue < alpha) {: missing value where TRUE/FALSE needed


It is because the condition is pvalue < alpha. NA < .05 is NA, and so the condition is if (NA) {...},
resulting in the error message.
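
One way to avoid this error is to check for a missing value before the comparison is made. The following is only a sketch; the name is_sig_safe() is hypothetical, and returning NA for a missing p-value is one possible design choice, not the only reasonable one.

```r
# Hypothetical variant that handles a missing p-value explicitly
is_sig_safe <- function(pvalue, alpha = 0.05) {
  # is.na() returns TRUE or FALSE, so it is safe to use in if
  if (is.na(pvalue)) {
    return(NA_character_)
  }
  if (pvalue < alpha) {
    return("sig.")
  }
  "n.s."
}

missing_p <- is_sig_safe(pvalue = NA)
usual_p <- is_sig_safe(pvalue = .04)

missing_p
usual_p
```

Because the check on NA comes first, the condition pvalue < alpha is evaluated only when pvalue is not missing.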

3.2 for … in … and while


They will not be covered here, though they may be introduced later if they are needed for a task.
A for … in … block, usually called a for loop, is common in many programming languages. However,
in R, usually the same task can be done by the family of apply functions (e.g., lapply, sapply, etc.), to
be introduced later.
If you only need to write some functions to simplify some tasks, while is usually not necessary. It is
used for repeating a process while a condition is true, which is common for algorithms (e.g., maximum
likelihood estimation). However, you may rarely need to use it.
You can use help("for") and help("while") to learn more about them. They share the same help
page.
In R, the family of apply functions is usually used instead of a for loop when applicable. These functions
are useful for some tasks and will be covered as the needs arise.
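
For readers curious about what the two approaches look like side by side, here is a minimal sketch that labels three p-values, first with a for loop and then with sapply(). The function label_one() and the data are made up for this comparison only.

```r
# Hypothetical one-value function used only for this comparison
label_one <- function(p) {
  if (p < .05) {
    return("sig.")
  }
  "n.s."
}

p_values <- c(.20, .03, .004)

# Version 1: a for loop that fills a result vector one element at a time
labels_loop <- character(length(p_values))
for (i in seq_along(p_values)) {
  labels_loop[i] <- label_one(p_values[i])
}

# Version 2: sapply() calls label_one() on each element and collects the results
labels_apply <- sapply(p_values, label_one)

labels_loop
same_result <- identical(labels_loop, labels_apply)
same_result
```

Both versions give the same labels. The sapply() version is shorter because creating the result vector and moving through the elements are handled for us.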

4 Examples
4.1 Nonparametric Bootstrapping Confidence Intervals
(This section assumes that you have learned about nonparametric bootstrapping, including its pros and
cons.)
R comes with a package boot that can do nonparametric bootstrapping. More and more packages can
form nonparametric bootstrapping confidence intervals (e.g., lavaan can do this for parameter estimates
in structural equation modeling, and psych::alpha() can do this for Cronbach’s alpha). Nevertheless,
there may be cases in which such a function has not yet been developed (or it has but you could not find
it). Even if there is such a function, it is still a good practice to learn writing a function to do this.
The boot() function in the boot package does not know how to compute the statistic. It requires users to
supply the function that computes it. Its job is to draw the bootstrap samples, call the supplied function
on each sample, and return the results to the users.
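
To make the idea of "draw a sample, compute the statistic, repeat" concrete, this is a base R sketch of the same steps done by hand. It is not the code used inside boot(), and the data are simulated just for this illustration.

```r
set.seed(1234)

# Simulated data, made up for this sketch
n <- 50
toy <- data.frame(x = rnorm(n))
toy$y <- 0.5 * toy$x + rnorm(n)

# One bootstrap sample: draw n row numbers with replacement
one_index <- sample(n, size = n, replace = TRUE)
one_r <- cor(toy$x[one_index], toy$y[one_index])

# Repeat the two steps many times and keep the statistic from each sample
boot_rs <- replicate(500, {
  index <- sample(n, size = n, replace = TRUE)
  cor(toy$x[index], toy$y[index])
})

length(boot_rs)
summary(boot_rs)
```

The vector of row numbers plays the same role as the second argument of the function we are about to write for boot().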

4.1.1 Pearson’s r
Let us consider a practical scenario: forming a nonparametric bootstrapping confidence interval for a
Pearson’s r.
We already know that psych::cor.ci() can do this. Let’s try to do it using our own function.

4.1.1.1 Write a Function to Compute the Statistics


To use boot::boot(), we first need to write a function which must have at least these two arguments as
the first and second argument:
• First argument: The data frame.
• Second argument: A numeric vector to select rows (cases) from the data frame.


This function must return a vector of the statistic.


In our case, the statistic is the Pearson’s r.
So this is the form of the function (names of the first and second arguments do not matter):

my_r <- function(data,
                 index) {
  # Resampling: Select rows from data.
  # Compute and return Pearson's r
  }

Let’s try to compute the correlation first, using the dataset similar to the one used in the handout
SSGC_8802_Correlation_in_R, but with 100 cases:

library(readxl)
dat <- read_excel("correlation_example_100_cases.xlsx")
cor(dat)

                       SelfEsteem Happiness EmotionalIntelligence
SelfEsteem             1.00000000 0.5375000           -0.09598361
Happiness              0.53750000 1.0000000            0.18428854
EmotionalIntelligence -0.09598361 0.1842885            1.00000000
So, we know cor() can compute the correlations. But we need a vector of correlations. We can use the
subsetting techniques for matrices. However, there is a convenient function, as.vector(), to convert a
matrix to a vector:

as.vector(cor(dat))

[1]  1.00000000  0.53750000 -0.09598361  0.53750000  1.00000000
[6]  0.18428854 -0.09598361  0.18428854  1.00000000
So, we can use as.vector() and then extract the three correlations:

as.vector(cor(dat))[c(2, 3, 6)]

[1] 0.53750000 -0.09598361 0.18428854


Done! Now we adapt this to the body of the function:

my_r <- function(data,
                 index) {
  # Resampling: Select rows from data.
  # Compute and return Pearson's r
  out <- as.vector(cor(data))[c(2, 3, 6)]
  return(out)
  }

Let’s check the function:

my_r(data = dat)

[1] 0.53750000 -0.09598361 0.18428854


It works, but the output is not easy to read. We can add names to the vector by names():

my_r <- function(data,
                 index) {
  # Resampling: Select rows from data.
  # Compute and return Pearson's r
  out <- as.vector(cor(data))[c(2, 3, 6)]
  names(out) <- c("SE-HP", "SE-EI", "HP-EI")
  return(out)
  }

Let’s check the function again:

my_r(data = dat)

      SE-HP       SE-EI       HP-EI
 0.53750000 -0.09598361  0.18428854
Now we have completed the part to compute Pearson’s r. Let’s write the part to select cases. We can
just apply the technique we used before for selecting rows:

my_r <- function(data,
                 index) {
  # Resampling: Select rows from data.
  data0 <- data[index, ]
  # Compute and return Pearson's r
  out <- as.vector(cor(data0))[c(2, 3, 6)]
  names(out) <- c("SE-HP", "SE-EI", "HP-EI")
  return(out)
  }

I named the selected rows data0, instead of reusing data, just to make it obvious that the correlations
are computed on the sampled rows of data.
Let’s test the function again:

my_r(data = dat,
     index = 1:5)

     SE-HP      SE-EI      HP-EI
 0.6736833 -0.2859402  0.3886301

The correct answers:

cor(dat[1:5, ])

                      SelfEsteem Happiness EmotionalIntelligence
SelfEsteem             1.0000000 0.6736833            -0.2859402
Happiness              0.6736833 1.0000000             0.3886301
EmotionalIntelligence -0.2859402 0.3886301             1.0000000
This function is now ready to be used in boot::boot().
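
The positions c(2, 3, 6) are specific to a correlation matrix of three variables. If the number of variables may change, one possible alternative is lower.tri(), a base function that marks the cells below the diagonal. This is a sketch using simulated data with made-up variable names, not a replacement for my_r().

```r
# Simulated data with four variables, made up for this sketch
set.seed(5678)
toy4 <- data.frame(a = rnorm(20),
                   b = rnorm(20),
                   c = rnorm(20),
                   d = rnorm(20))
r_mat <- cor(toy4)

# lower.tri() marks the cells below the diagonal, so each pair appears once
r_unique <- r_mat[lower.tri(r_mat)]
length(r_unique)

# With three variables, this picks the same cells as positions 2, 3, and 6
r3 <- cor(toy4[, 1:3])
same_cells <- identical(r3[lower.tri(r3)], as.vector(r3)[c(2, 3, 6)])
same_cells
```

With four variables there are six unique correlations, and no positions need to be worked out by hand.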

4.1.1.2 Do Nonparametric Bootstrapping


After we confirmed that this function works as expected, we use it in boot. These are the arguments that
we will use (see help("boot") for further details):
• data: The dataset to be resampled.
• statistic: A function that will compute the target statistic.
• R: The number of bootstrap samples. For a confidence interval, it should be at least 2000 or even
5000.
This is the code:


library(boot)
set.seed(23456)
boot_r <- boot(data = dat,
               statistic = my_r,
               R = 2000)

set.seed() is used to set the seed for the random number generator, to make the results reproducible.
The output, boot_r, stores the results from the 2000 bootstrap samples. We can use plot to examine
the distribution of the 2000 bootstrap Pearson’s rs.
Note that our function my_r() returns three correlations for each sample. Therefore, we need to add
index to indicate the statistic we need. Let’s add index = 1 to plot the 2000 bootstrap correlations
between self-esteem and happiness:

plot(boot_r,
     index = 1)

[Figure: output of plot(boot_r, index = 1). Left panel: "Histogram of t" (Density against t*). Right panel: t* against Quantiles of Standard Normal.]

These are the plots for the other two correlations:

plot(boot_r,
     index = 2)

[Figure: output of plot(boot_r, index = 2). Left panel: "Histogram of t" (Density against t*). Right panel: t* against Quantiles of Standard Normal.]

plot(boot_r,
     index = 3)

[Figure: output of plot(boot_r, index = 3). Left panel: "Histogram of t" (Density against t*). Right panel: t* against Quantiles of Standard Normal.]

To get the bootstrap confidence interval, we can use boot::boot.ci(). In this example, we will only
use the percentile bootstrap confidence interval. Therefore, we set type to "perc" (percentile). The level
of confidence is 95%, or .95. Therefore, we set conf to .95. (See help("boot.ci") for further details.)
Note that we also need to add index in this case because we computed three correlations:

boot.ci(boot_r,
        index = 1,
        conf = .95,
        type = "perc")

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL :
boot.ci(boot.out = boot_r, conf = 0.95, type = "perc", index = 1)

Intervals :
Level Percentile
95% ( 0.3440, 0.6856 )
Calculations and Intervals on Original Scale

In this example, the nonparametric bootstrap percentile 95% confidence interval of the Pearson’s r
between self-esteem and happiness is 0.3440 to 0.6856.
We can compare the results with psych::cor.ci():

library(psych)

Attaching package: 'psych'


The following object is masked from 'package:boot':

logit

set.seed(23456)
cor_ci_out <- cor.ci(dat,
                     n.iter = 2000,
                     plot = FALSE)
print(cor_ci_out,
      digits = 4)

Call:corCi(x = x, keys = keys, n.iter = n.iter, p = p, overlap = overlap,
    poly = poly, method = method, plot = plot, minlength = minlength,
    n = n)

 Coefficients and bootstrapped confidence intervals
                      SlfEs Hppns EmtnI
SelfEsteem             1.00
Happiness              0.54  1.00
EmotionalIntelligence -0.10  0.18  1.00

 scale correlations and bootstrapped confidence intervals
            lower.emp lower.norm estimate upper.norm upper.emp      p
SlfEs-Hppns    0.3472     0.3513   0.5375     0.6892    0.6909 0.0000
SlfEs-EmtnI   -0.2715    -0.2693  -0.0960     0.0866    0.0919 0.3080
Hppns-EmtnI    0.0032    -0.0052   0.1843     0.3635    0.3630 0.0594
The nonparametric bootstrap CIs are not exactly the same because boot::boot.ci() and
psych::cor.ci() use slightly different ways to find the confidence limits. However, they are close
enough practically.
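
To see where a percentile interval comes from, the bootstrap estimates stored by boot() can be summarized by hand with quantile(). This is a sketch with simulated data and a made-up statistic function, toy_r(); it assumes the boot package is installed. The limits from quantile() may differ slightly from those of boot.ci() because the two functions do not use exactly the same rule to locate the percentiles.

```r
library(boot)

# Simulated data, made up for this sketch
set.seed(2468)
toy <- data.frame(x = rnorm(60))
toy$y <- 0.4 * toy$x + rnorm(60)

# Statistic function in the form boot() expects:
# data first, row numbers second
toy_r <- function(data, index) {
  data0 <- data[index, ]
  cor(data0$x, data0$y)
}

toy_boot <- boot(data = toy,
                 statistic = toy_r,
                 R = 1000)

# The bootstrap estimates are stored in the matrix t, one column per statistic
ci_by_hand <- quantile(toy_boot$t[, 1], probs = c(.025, .975))
ci_by_hand

# For comparison: the last two values of the percent element are the limits
ci_boot <- boot.ci(toy_boot, conf = .95, type = "perc")
ci_boot$percent[4:5]
```

The two pairs of limits should be very close, which illustrates that a percentile interval is essentially the 2.5th and 97.5th percentiles of the bootstrap estimates.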

5 Optional Topics
5.1 Style
Some align the closing curly bracket with the first line of the definition:

my_addition <- function(x, y) {
  x + y
}


I indent the closing brackets too, as in this document, simply because this is consistent with how we
indent lines in Python. I prefer a (personal) style that is similar across languages.
Some use four whitespace characters for indentation:

my_addition <- function(x, y) {
    x + y
    }

Using four whitespace characters is a common practice in programming. I use two whitespace characters
simply because I usually work on a small screen or window.
Some write one argument per line:

my_addition <- function(x,
                        y) {
  x + y
  }

I will just do whatever is easy to type and read, for me. :)

If you write the code just for yourself, being consistent is enough, in my opinion. Use whatever style
suits your own needs and preferences.

5.2 Pass-By-Value
R functions use pass-by-value in handling argument values. Therefore, a function normally will not
change the sources of its arguments, although it can return a modified version of its arguments.
For example:

demo_pass_by_value <- function(x) {
  x <- x^2
  x
  }
x_origin <- 10
x_squared <- demo_pass_by_value(x = x_origin)
x_squared

[1] 100

x_origin

[1] 10
Even though we set x to x_origin and then x is changed inside the function, x_origin is not changed.
It is because it is the value of x_origin that is passed to x, not x_origin itself.
Certainly, we can update x_origin to the result of demo_pass_by_value(), but this is just a
reassignment, not a consequence of demo_pass_by_value():

x_origin <- 10
x_origin <- demo_pass_by_value(x = x_origin)
x_origin

[1] 100
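
The same behavior applies to larger objects such as data frames, which is what matters most in data analysis. This is a minimal sketch; the function add_total() and the data frame scores are made up for illustration.

```r
# Hypothetical function: add a column to the data frame it receives
add_total <- function(df) {
  df$total <- df$a + df$b
  df
}

scores <- data.frame(a = 1:3,
                     b = 4:6)
scores_with_total <- add_total(scores)

# The original data frame still has only two columns
names(scores)

# The returned data frame has the new column
names(scores_with_total)
```

To keep the new column, the result has to be assigned to a name, either a new one as above or the original one.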


5.3 Dotdotdot
The argument ... is sometimes used by one function to pass arguments to another function. You may
notice that boot() has this argument (see help("boot")). This section illustrates how ... can be used
to do bootstrapping.
In doing bootstrapping, the function used to compute the target statistic may have its own arguments.
boot() collects these arguments using ..., and passes them to the function assigned to statistic.
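
The same mechanism can be used in our own functions. The following sketch, with a made-up function my_mean_sd(), passes whatever is supplied through ... to mean() and sd(). It is meant only to show how ... travels from one function to another.

```r
# Hypothetical function: arguments supplied through ... are
# handed over to mean() and sd()
my_mean_sd <- function(x, ...) {
  c(mean = mean(x, ...),
    sd = sd(x, ...))
}

x4 <- c(1, 2, 3, NA)

# No extra argument: the missing value leads to NA
no_extra <- my_mean_sd(x4)

# na.rm is not an argument of my_mean_sd(); it travels through ...
with_na_rm <- my_mean_sd(x4, na.rm = TRUE)

no_extra
with_na_rm
```

my_mean_sd() does not need to know which arguments mean() and sd() accept. The price is that a misspelled argument name may also be passed along without any warning from my_mean_sd() itself.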
We can use this feature to revise my_r() such that we can form the bootstrapping confidence interval
for any two variables we want:

my_r_any2 <- function(data,
                      index,
                      x,
                      y) {
  # Resampling: Select rows from data.
  data0 <- data[index, ]
  # Compute and return Pearson's r
  out <- cor(data0[c(x, y)])[2, 1]
  return(out)
  }

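my_r_any2(data = dat,
          index = 1:50,
          x = "SelfEsteem",
          y = "Happiness")

[1] 0.4033987

# Check the correlation
cor(dat[1:50, c("SelfEsteem", "Happiness")])

           SelfEsteem Happiness
SelfEsteem  1.0000000 0.4033987
Happiness   0.4033987 1.0000000

We can now compare the two versions. The first call below repeats the analysis with my_r(), which needs
index = 1 in boot.ci() because it returns three correlations. The second call uses my_r_any2(); there is
no need for index in boot.ci() because it computes only one correlation: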

set.seed(23456)
boot_r <- boot(data = dat,
               statistic = my_r,
               R = 2000)
boot.ci(boot_r,
        index = 1,
        type = "perc")

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL :
boot.ci(boot.out = boot_r, type = "perc", index = 1)

Intervals :
Level Percentile
95% ( 0.3440, 0.6856 )
Calculations and Intervals on Original Scale

set.seed(23456)
boot_r2 <- boot(data = dat,
                statistic = my_r_any2,
                R = 2000,
                x = "SelfEsteem",
                y = "Happiness")
boot.ci(boot_r2,
        type = "perc")

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 2000 bootstrap replicates

CALL :
boot.ci(boot.out = boot_r2, type = "perc")

Intervals :
Level Percentile
95% ( 0.3440, 0.6856 )
Calculations and Intervals on Original Scale

You can see that the two confidence intervals are identical [7].
In your own research, whether you will use this technique depends on how flexible you want the function
to be.
• If you are pretty sure that you only need bootstrap CI for a very specific scenario (e.g., only the
statistic for a set of variables computed in a specific way), then there is no need to use these additional
arguments.
• However, if you think you may need to adjust the analysis, such as trying other variables or options
for the analysis (e.g., using another measure of correlation), then you may want to write a more
general function.

[7] I call set.seed() before each call to boot(), and use the same seed. We usually do not do this. However, these two versions
are fitted to the same dataset. Therefore, to make the results comparable, we want these two bootstrapping analyses to have
the same bootstrap samples. This can be done by using the same number in set.seed() right before calling boot().

6 Final Remarks
There are a lot of issues about functions not covered here. I myself also still have a lot to learn. The goal
of this document is not to make you a programmer (I am also not a programmer). The goal is to help you
to know how writing functions can help us to do analysis in our research. We have learned how writing
functions can make it easier to do several tasks again and again for different models or variables. We
have also learned how we can write a function to compute something that we need. This will be useful
if you learn about some new statistic or measure that you want to use but is not yet available in an existing
function.
You will definitely encounter some problems when you try to write your own functions. Learn what you
need when using R in your research. Certainly, if you have some experience in programming, or maybe
you are already an experienced programmer, you can consider reading some books on programming in
R to learn more about the technical details in R.

7 Further References
In the book by Fox and Weisberg (2019) on doing regression analysis using R, they also have a chapter
on programming in R (Chapter 10), aimed at researchers. You can see if this chapter is suitable for you:
• Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). Sage.
URL https://socialsciences.mcmaster.ca/jfox/Books/Companion/index.html. (UM library has
the 2nd edition: https://umlibrary.primo.exlibrisgroup.com/permalink/853UOM_INST/1jn7l3f/alma991007321009706306)
The present document focuses on writing functions, not on bootstrapping. To learn more about doing
bootstrapping in R, you can read the following article.
• Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2021). The percentile bootstrap: A primer with
step-by-step instructions in R. Advances in Methods and Practices in Psychological Science, 4(1),
1-10. https://doi.org/10.1177/2515245920911881
If you are interested in learning more about the technical details not covered here, you can read the
official documentation on functions:
• https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Functions
