
REVIEW ARTICLE

pISSN 2234-778X • eISSN 2234-5248
J Minim Invasive Surg 2023;26(1):9-18

Sample size calculation in clinical trial using R

Suyeon Park1,2,3, Yeong-Haw Kim3, Hae In Bang4, Youngho Park5

1Department of Biostatistics, Academic Research Office, Soonchunhyang University Seoul Hospital, Seoul, Korea
2International Development and Cooperation, Graduate School of Multidisciplinary Studies Toward Future, Soonchunhyang University, Asan, Korea
3Department of Applied Statistics, Chung-Ang University, Seoul, Korea
4Department of Laboratory Medicine, Soonchunhyang University Seoul Hospital, Seoul, Korea
5Department of Big Data Application, College of Smart Interdisciplinary Engineering, Hannam University, Daejeon, Korea

Received: February 20, 2023
Revised: March 12, 2023
Accepted: March 12, 2023

Corresponding authors
Hae In Bang
Department of Laboratory Medicine, Soonchunhyang University Seoul Hospital, 59 Daesagwan-ro, Yongsan-gu, Seoul 04401, Korea
E-mail: [email protected]
ORCID: https://orcid.org/0000-0001-7854-3011

Youngho Park
Department of Big Data Application, College of Smart Interdisciplinary Engineering, Hannam University, 70 Hannam-ro, Daedeok-gu, Daejeon 34430, Korea
E-mail: [email protected]
ORCID: https://orcid.org/0000-0002-7096-3967

Hae In Bang and Youngho Park contributed equally to this study as co-corresponding authors.

Since the era of evidence-based medicine began, it has become a matter of course to use statistics to create objective evidence in clinical research. As an extension of this, it has become essential to calculate the correct sample size needed to demonstrate a clinically significant difference before starting a study. Because sample size calculation methods vary with the study design, there is no single formula that applies to all designs, and it is very important to understand this. In this review, sample size calculation methods suited to various study designs are introduced using the R program (R Foundation for Statistical Computing). So that clinical researchers can apply them directly to their own future research, we present practice code, output, and an interpretation of the results for each situation.

Keywords: Sample size, Effect size, Continuous outcome, Categorical outcome

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright © The Korean Society of Endo-Laparoscopic & Robotic Surgery.

INTRODUCTION

This article will cover the following topics: (1) Why is sample size calculation important? (2) Components of sample size calculation; and (3) How to calculate the required sample size.

WHY IS SAMPLE SIZE CALCULATION IMPORTANT?

The main purpose of sample size calculation is to determine the minimum number of subjects required to detect a clinically relevant treatment effect.


The fundamental reasons for calculating the number of subjects in a study can be divided into the following three categories [1,2].

Economic reasons
In clinical studies, if the sample is not large enough, statistical significance may not be found even if an important relationship or difference exists. In other words, it may not be possible to conclude the study successfully because it lacks the power to detect the effect. Conversely, when a study is based on a very large sample, small differences may reach statistical significance and lead to clinical misjudgment. Either way, the study may fail to achieve its aims, and the conclusion is a waste of money, time, and resources.

Ethical reasons
Oversized studies are likely to include more subjects than the study actually needs, exposing unnecessarily many subjects to potentially harmful or futile treatments. Similarly, undersized studies raise ethical issues in that subjects are exposed to unnecessary burdens in studies that have a low chance of success.

Scientific reasons
If a negative result is obtained after conducting a study, it is necessary to consider whether the sample size was sufficient. If the study was conducted with a sufficient sample size, it can be interpreted that there is no clinically significant effect. However, if the study was conducted with an insufficient sample size, clinically meaningful results that would have reached statistical significance may be missed. Note that not being able to reject the null hypothesis does not mean that it is true; it means that we do not have enough evidence to reject it.
Additionally, calculating the sample size at the study design stage, when seeking ethics committee approval, has become a requirement rather than an option. As a result, calculating the optimal sample size is an important process that must be done at the design stage, before a study is conducted, in order to ensure the validity, accuracy, reliability, and scientific and ethical integrity of the study.

COMPONENTS OF SAMPLE SIZE CALCULATION

The appropriate sample size usually depends on the statistical hypotheses built around the study's primary outcome and on the study design parameters. The six basic statistical concepts that are essential for estimating the sample size are as follows.

Study design
There are various research designs [3] in clinical research, but among them the most commonly used is the parallel design. A crossover design [4,5] can be used in studies where subjects are difficult to recruit; it requires fewer subjects than a parallel design but is more complex and must satisfy several conditions, so an appropriate study design should be selected according to the purpose of the study.
Parallel design. Group A receives only treatment A and group B receives only treatment B.
Crossover design. One group receives treatment A first and then treatment B, while the other group receives treatment B and then treatment A. It is therefore important to have an appropriate wash-out period at the time of treatment change.

Null and alternative hypothesis testing
When establishing statistical hypotheses in research, two hypotheses are always required: the null hypothesis (H0) and the alternative hypothesis (H1 or Hα). The two hypotheses must be mutually exclusive statements. The null hypothesis (H0) usually states the opposite of what the researcher claims and is set up to be rejected; that is, it contains 'no difference.' Conversely, the alternative hypothesis (H1) is the statement in which the researcher proposes the potential outcome, and it contains 'there is a difference.' There are also different types of hypothesis testing problems, depending on the purpose of the study. As shown in Table 1, hypotheses can be established depending on whether the test is for equality, equivalence, superiority, or non-inferiority. Let μS = mean of the standard treatment, μT = mean of the new treatment, δ = the minimum clinically important difference, and δNI = the non-inferiority margin.
Test for equality. To determine whether a clinically meaningful difference or effect exists (δ = 0).
Test for equivalence. To demonstrate that the difference between the new treatment and the standard treatment has no clinical importance.
Test for superiority. To demonstrate that the new treatment is superior to the standard treatment.
Test for non-inferiority. To evaluate whether a new treatment is not unacceptably less effective than, that is, not inferior to, the standard treatment (δNI > 0).

Table 1. Types of hypothesis testing
Test for equality:        H0: μT − μS = 0        H1: μT − μS ≠ 0
Test for equivalence:     H0: |μT − μS| ≥ δ      H1: |μT − μS| < δ
Test for superiority:     H0: μT − μS ≤ δ        H1: μT − μS > δ
Test for non-inferiority: H0: μT − μS ≤ −δNI     H1: μT − μS > −δNI


One-sided and two-sided tests
A one-sided test examines whether a value is greater than (or less than) a specified value in one direction only; in Table 1, the superiority and non-inferiority tests are one-sided. A two-sided test is performed when the expected value may be either greater or less than the specified value, so the relationship is tested in both directions regardless of the hypothesized direction; in Table 1, the equality and equivalence tests are two-sided.

Type I error and type II error
The hypothesis testing process is as follows: (1) assume that the null hypothesis is true and calculate the test statistic from the sample data, and (2) decide whether or not to reject the null hypothesis according to the result. That is, we always choose one of the four decisions shown in Table 2, and two types of errors inevitably occur: type I errors (α) and type II errors (β).
Type I error and significance level. The probability of rejecting the null hypothesis when it is actually true is called the type I error. This essentially means claiming that the alternative hypothesis is true when it is not. Therefore, it is recommended to keep the type I error as small as possible. The type I error rate is known as the significance level and is usually set to 0.05 (5%) or less [6,7]. The allowed type I error is inversely proportional to the required sample size.
Type II error and power. The probability of not rejecting the null hypothesis when it is false is called the type II error. Conversely, power is the probability that the test will correctly reject the null hypothesis when the alternative hypothesis is true. The type II error is denoted as β and the power as 1 − β. Conventionally, the power is set to 80% or 90% [6,7] when calculating the sample size, and it is directly proportional to the sample size.

Table 2. Type I and type II error
True status H0: accepting H0 is a correct decision; rejecting H0 is a type I error (α).
True status H1: accepting H0 is a type II error (β); rejecting H0 is a correct decision.

Primary outcome
The variables in which clinically significant differences might be seen may be numerous, but the most important one should be selected. This is called the primary outcome, and the other measurements are referred to as secondary outcomes. The sample size is calculated using the primary outcome. The parameter information of the primary outcome needed for the calculation can be obtained from prior studies or pilot studies. Both continuous and categorical data can be used as primary outcomes, and the parameters used for the calculation, the minimal meaningful detectable difference (MD) and the standard deviation (SD), depend on the type of variable. For continuous data, the mean and SD are used as parameters; for categorical data, a proportion is used.
Minimal meaningful detectable difference (MD). The smallest difference considered clinically meaningful in the primary outcome.
Standard deviation (SD). A measure of how spread out the data are around the mean.

Dropout rate
The sample size estimation formula yields the minimum number of subjects required to reach statistical significance for a given hypothesis. In an actual study, however, subjects may drop out during the study, so to retain the number of evaluable subjects the researcher needs, the total number of subjects must be inflated for the expected dropout rate so that additional subjects are enrolled. If 'n' is the sample size calculated from the formula and 'dr' is the dropout rate, the adjusted sample size 'N' is given by N = n / (1 − dr).

Others
Depending on the study design, there are many more considerations in addition to the six concepts mentioned above. Although they are not considered in the practice examples below, we would like to mention three points that are frequently used in actual clinical research.

Adjustment for unequal sample size
In clinical trials, the available patients, treatment costs, and treatment resources may influence the choice of the allocation ratio (k). According to Lachin [8] and van Belle [9], the adjustment can be applied as follows (see the sketch after these steps).
(1) Calculate the sample size n per group, assuming equal numbers per group.
(2) Let k = n1/n2. Then n1 = (n/2)(1 + k) and n2 = (n/2)(1 + 1/k).
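As a rough illustration only, the allocation-ratio adjustment above can be applied in a few lines of R. The per-group size n = 64 and the ratio k = 2 below are hypothetical values, not taken from this article.

>n <- 64                            # assumed per-group size under 1:1 allocation
>k <- 2                             # assumed allocation ratio n1/n2
>n1 <- ceiling(n / 2 * (1 + k))     # larger group
>n2 <- ceiling(n / 2 * (1 + 1 / k)) # smaller group
>c(n1 = n1, n2 = n2)                # 96 and 48; note that 1/n1 + 1/n2 still equals 2/n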


Interim analysis
In confirmatory trials, an interim analysis, whether planned or unplanned, is sometimes built in at the research planning stage. When calculating the number of subjects in this case, note that the false-positive rate increases with the number of interim analyses, so the type I error must be adjusted accordingly.
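The article does not prescribe a particular adjustment method, but as a minimal sketch, a conservative Bonferroni-style split of the overall significance level across the planned looks can be written as follows; group-sequential boundaries (e.g., O'Brien-Fleming) are more commonly used in practice.

>alpha <- 0.05            # overall two-sided significance level
>K <- 3                   # assumed number of analyses (2 interim looks plus the final analysis)
>alpha / K                # Bonferroni-adjusted level applied at each look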
Sample size for survival time
In survival analysis, the outcome variable is the time until a specific event, such as death, occurs; whether or not the event occurs for each subject and the time from the start of the clinical trial to the event (or to censoring) are both used as outcome variables. In particular, the power of a survival analysis is a function of the number of events and generally increases with a shorter recruitment period (T0) and a longer total follow-up period (T). Let λ1 and λ2 be the hazard rates of the two groups. The formula for the number of subjects per group is [10]:

n = (Z(1−α/2) + Z(1−β))² × [φ(λ1) + φ(λ2)] / (λ1 − λ2)²,

where φ(λ) = λ² / (1 − e^(−λT)), or, with uniform recruitment over the period T0,
φ(λ) = λ² / {1 − [e^(−λ(T−T0)) − e^(−λT)] / (λT0)}.

However, for studies with relatively low event rates and high censoring, the following sample size formula using only the event rates can be used:

n = 2(Z(1−α/2) + Z(1−β))² / [ln(λ1/λ2)]².
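As a rough sketch, the per-group formula above can be evaluated directly in base R. The hazard rates, accrual period, and follow-up time below are hypothetical values chosen only to illustrate the calculation.

>phi <- function(lambda, T_total, T0) {  # per-subject variance factor with uniform accrual over T0
+  lambda^2 / (1 - (exp(-lambda * (T_total - T0)) - exp(-lambda * T_total)) / (lambda * T0))
+ }
>lambda1 <- 0.30; lambda2 <- 0.20        # assumed hazard rates per year
>T_total <- 5; T0 <- 2                   # assumed total study duration and accrual period (years)
>z <- qnorm(c(0.975, 0.80))              # Z(1-alpha/2) and Z(1-beta) for alpha = 0.05, power = 80%
>ceiling(sum(z)^2 * (phi(lambda1, T_total, T0) + phi(lambda2, T_total, T0)) /
+        (lambda1 - lambda2)^2)          # about 160 subjects per group under these assumptions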
lished. The dropout rate is different for each research field, but

HOW TO CALCULATE THE SAMPLE SIZE?

Using the 17 tests in Table 3, which are widely used in research, we illustrate each calculation with R version 4.1.2 (R Foundation for Statistical Computing), a free program, and the 'pwr', 'exact2x2', and 'WebPower' [11] packages. Basically, when using R, you need to install the package that contains the function you want to use and then load it with the 'library()' function, after which the function can be called. More details are explained in the examples below.

Table 3. Tests for calculating sample size
No. 1   Continuous/Parametric, 1 group:      One-sample t test           (pwr, pwr.t.test)
No. 2   Continuous/Parametric, 2 groups:     Two-sample t test           (pwr, pwr.t.test)
No. 3   Continuous/Parametric, 2 groups:     Paired t test               (pwr, pwr.t.test)
No. 4   Continuous/Parametric, ≥3 groups:    One-way ANOVA               (pwr, pwr.anova.test)
No. 5   Continuous/Nonparametric, 1 group:   One-sample Wilcoxon test    (pwr, pwr.t.test)
No. 6   Continuous/Nonparametric, 2 groups:  Mann-Whitney U test         (pwr, pwr.t.test)
No. 7   Continuous/Nonparametric, 2 groups:  Paired Wilcoxon test        (pwr, pwr.t.test)
No. 8   Continuous/Nonparametric, ≥3 groups: Kruskal-Wallis test         (pwr, pwr.anova.test)
No. 9   Categorical/Parametric, 1 group:     One-sample proportion test  (pwr, pwr.p.test)
No. 10  Categorical/Parametric, 2 groups:    Two-sample proportion test  (pwr, pwr.2p.test)
No. 11  Categorical/Parametric:              Chi-square test             (pwr, pwr.chisq.test)
No. 12  Categorical/Nonparametric, 2 groups: Fisher exact test           (exact2x2, ss2x2)
No. 13  Categorical/Nonparametric, 2 groups: McNemar test                (exact2x2, ss2x2)
No. 14  Correlation analysis                                             (pwr, pwr.r.test)
No. 15  Linear regression                                                (pwr, pwr.f2.test)
No. 16  Logistic regression                                              (WebPower, wp.logistic)
No. 17  Poisson regression                                               (WebPower, wp.poisson)
ANOVA, analysis of variance.


All of the examples use a parallel group design with a two-sided test, a significance level of 0.05, and a power of 80%. The dropout rate differs between research fields, but here we unify it at 20%. For nonparametric tests on continuous variables, as a rule of thumb [12], we calculate the sample size required for the corresponding parametric test and add 15%. Effect size can be defined as 'a standardized measure of the magnitude of the mean difference or relationship between study groups' [13]. In other words, an index that divides the effect by its dispersion (standard deviation, etc.) is not affected by the measurement unit and can be used regardless of the unit; it is called an 'effect size index' or 'standardized effect size.' Cohen intuitively labeled effect sizes as small, medium, and large for easy understanding [14]. However, since the values suggested by Cohen may vary with the population or the distribution of the variable, there are limitations in using them as absolute benchmarks. When estimating the number of subjects, effect sizes (such as Cohen's d, r, or a relative ratio) should be calculated from the parameter information (MD and SD) found in the literature relevant to each primary outcome and entered as arguments to the function. Additionally, whether an effect size should be interpreted as small, medium, or large may depend on the analysis method. We follow the guidelines of Cohen [14] and Sawilowsky [15] and use the medium effect size for each test in the examples below.
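As a small illustration of these two blanket adjustments, the lines below apply them to an assumed parametric per-group size of 64; the numbers are hypothetical and are not tied to any particular test in this article.

>n_param <- 64                        # assumed per-group size from a parametric calculation
>ceiling(n_param / (1 - 0.20))        # parametric test with 20% dropout: 80 per group
>n_nonpar <- ceiling(n_param * 1.15)  # rule-of-thumb inflation for the rank-based analogue
>ceiling(n_nonpar / (1 - 0.20))       # nonparametric test with 20% dropout: 93 per group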
CONTINUOUS OUTCOME

When the primary outcome considered in the study is continuous, the sample size can be calculated using the 'pwr' package. You can compare the mean of a single group, of two groups, or of three or more groups, and Cohen's d and f are used as the effect sizes. When applying this to your own study, the parameters can be taken from a previous or pilot study and converted with the effect size formulas below.

Practice 1

The pwr.t.test() function (Supplementary data 1, Table 1) can be used with the 'type' argument for (1) the one-sample t test (type = "one.sample"), (2) the two-sample t test (type = "two.sample"), or (3) the paired t test (type = "paired"). Cohen's d is used as the effect size, and the size conventions [14,15] are as follows: very small (d = 0.01), small (d = 0.2), medium (d = 0.5), large (d = 0.8), very large (d = 1.2), and huge (d = 2). In our examples, we will use the medium effect size (d = 0.5).

One-sample t test (Table 3, no. 1)

>install.packages("pwr")
>library(pwr)
>pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, type = "one.sample", alternative = "two.sided")

Effect size calculation:
Cohen's d (d) = (μ1 − μ0) / SD
where μ0 = mean under H0, μ1 = mean under H1, and SD = SD under H0.

Assuming a significance level of 0.05 and a power of 80% in a two-sided test, the minimum number of subjects required to demonstrate statistical significance is 34 when the effect size is d = 0.5. Considering a dropout rate of 20%, a total of 43 subjects is required.

Two-sample t test (Table 3, no. 2)

>library(pwr)
>pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, type = "two.sample", alternative = "two.sided")

Effect size calculation [16]:
Cohen's d for the Welch test (d) = (μ1 − μ2) / SDpool
where μ1 = mean of group 1, μ2 = mean of group 2, SD1 = SD of group 1, SD2 = SD of group 2, and SDpool = √((SD1² + SD2²)/2).

Assuming a significance level of 0.05 and a power of 80% in a two-sided test, the minimum number of subjects required in each group to demonstrate statistical significance is 64 when the effect size is d = 0.5. Considering a dropout rate of 20%, 80 subjects are required in each group, for a total of 160 subjects.
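When means and SDs are available from a pilot study or the literature, Cohen's d can be computed first and then passed to pwr.t.test() instead of assuming a generic medium effect. The values below are hypothetical.

>library(pwr)
>mu1 <- 12.0; mu2 <- 10.0                      # hypothetical group means from a pilot study
>sd1 <- 4.0; sd2 <- 4.5                        # hypothetical group SDs
>d <- (mu1 - mu2) / sqrt((sd1^2 + sd2^2) / 2)  # pooled-SD standardization, as in the formula above
>pwr.t.test(d = d, sig.level = 0.05, power = 0.8, type = "two.sample", alternative = "two.sided")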


Paired t test (Table 3, no. 3)

>library(pwr)
>pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.8, type = "paired", alternative = "two.sided")

Effect size calculation [16]:
Cohen's d (d) = (μ1 − μ2) / SDpool
where μ1 = mean before treatment, μ2 = mean after treatment, SD1 = SD before treatment, SD2 = SD after treatment, and SDpool = √((SD1² + SD2²)/2).

For paired samples, if the correlation coefficient (r) between the before and after measurements is known, the pooled SD can instead be calculated as SDpool = √(SD1² + SD2² − 2r·SD1·SD2) / √(2(1 − r)). Assuming a significance level of 0.05 and a power of 80% in a two-sided test, the minimum number of pairs required to demonstrate statistical significance is 34 when the effect size is d = 0.5. Considering a dropout rate of 20%, a total of 43 pairs is required.

One-sample Wilcoxon test (Table 3, no. 5)
A total of 43 subjects was calculated for the one-sample t test; adding 15% gives a total of 65.

Mann-Whitney U test (Table 3, no. 6)
For the two-sample t test, 80 subjects were calculated for each group; considering an additional 15% for each group, a total of 240 subjects was calculated.

Paired Wilcoxon test (Table 3, no. 7)
The 43 pairs calculated for the paired t test, taking an additional 15% into account, become a total of 65 pairs.

Practice 2

The pwr.anova.test() function (Supplementary data 1, Table 2) can be used in studies that compare the means of three or more groups. In this function, 'k' is the number of comparison groups and 'f' is the effect size, for which Cohen's f is used; the detailed calculation formula is given below. Cohen suggests that f values of 0.1, 0.25, and 0.4 indicate small, medium, and large effect sizes, respectively. Again, we will use the medium effect size, f = 0.25.

One-way analysis of variance (ANOVA) (Table 3, no. 4)

>library(pwr)
>pwr.anova.test(k = 3, f = 0.25, sig.level = 0.05, power = 0.8)

Effect size calculation:
Cohen's f (f) = √(Σ pi (μi − μ)²) / σ
where μi = mean of group i, μ = overall (grand) mean, N = total number of observations, ni = number of observations in group i, pi = ni/N, and σ = common within-group SD.

Assume a significance level of 0.05, a power of 80%, and a two-sided test. With three comparison groups and an effect size of f = 0.25, the calculated number of subjects is 53 in each group. Considering a dropout rate of 20%, a total of 198 subjects is required, which corresponds to 66 per group.
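If group means and a common within-group SD are available from a pilot study, Cohen's f can likewise be computed directly and passed to pwr.anova.test(); the values below are hypothetical.

>library(pwr)
>means <- c(10, 12, 14)                                   # hypothetical group means
>p <- rep(1/3, 3)                                         # equal allocation, pi = ni/N
>sigma <- 5                                               # assumed common within-group SD
>f <- sqrt(sum(p * (means - sum(p * means))^2)) / sigma   # Cohen's f as defined above
>pwr.anova.test(k = 3, f = f, sig.level = 0.05, power = 0.8)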
Kruskal-Wallis test (Table 3, no. 8)
By one-way ANOVA, 66 subjects were calculated for each group; if an additional 15% is considered for each group, a total of 297 subjects is calculated.

CATEGORICAL OUTCOME

If the primary outcome considered in your study is categorical, you can use the 'pwr' package for parametric tests and the 'exact2x2' package for nonparametric tests to calculate the number of samples.

Practice 3

The pwr.p.test() and pwr.2p.test() functions (Supplementary data 1, Table 3) are used for comparing one-sample and two-sample proportions, respectively. Cohen's h is used here as the effect size; it is calculated as h = 2·arcsin(√p1) − 2·arcsin(√p2). Cohen suggests that h values of 0.2, 0.5, and 0.8 represent small, medium, and large effect sizes, respectively. Again, we will use the medium effect size, h = 0.5.
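For study-specific proportions, the 'pwr' package provides the ES.h() helper for this arcsine-transformed effect size; the proportions below are hypothetical.

>library(pwr)
>p1 <- 0.60; p2 <- 0.40     # hypothetical proportions under H1 and H0
>h <- ES.h(p1, p2)          # 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))
>pwr.2p.test(h = h, sig.level = 0.05, power = 0.8, alternative = "two.sided")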


One-sample proportion test (Table 3, no. 9)

>library(pwr)
>pwr.p.test(h = 0.5, sig.level = 0.05, power = 0.8, alternative = "two.sided")

In the one-sample proportion test, p2 is the proportion under the null hypothesis and p1 is the proportion under the alternative hypothesis. Assuming a significance level of 0.05 and a power of 80% in a two-sided test, the minimum number of subjects required to demonstrate statistical significance is 32 when the effect size is h = 0.5. Considering a dropout rate of 20%, a total of 40 subjects is required.

Two-sample proportion test (Table 3, no. 10)

>library(pwr)
>pwr.2p.test(h = 0.5, sig.level = 0.05, power = 0.8, alternative = "two.sided")

Assuming a significance level of 0.05 and a power of 80% in a two-sided test, the minimum number of subjects required in each group to demonstrate statistical significance is 63 when the effect size is h = 0.5. Considering a dropout rate of 20%, 79 subjects are required in each group, for a total of 158 subjects.

Practice 4

The chi-square test is a commonly used method for measuring the association between categorical variables, and Cohen's w is used as its measure of effect size. The pwr.chisq.test() function (Supplementary data 1, Table 3) takes 'w' as the effect size argument and 'df' as the degrees-of-freedom argument. If the two categorical variables have l and k categories, respectively, the contingency table consists of a total of m = l × k cells, and its 'df' is (l − 1) × (k − 1). Cohen suggests that w values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes, respectively. We will use the medium effect size, w = 0.3.

Chi-square test (Table 3, no. 11)

>library(pwr)
>pwr.chisq.test(w = 0.3, df = (2-1)*(3-1), sig.level = 0.05, power = 0.8)

Effect size calculation:
Cohen's w (w) = √( Σ(i=1 to m) (p1i − p0i)² / p0i )
where p0i = cell probability in the ith cell under H0 and p1i = cell probability in the ith cell under H1.

Similarly, we assumed a significance level of 0.05 and a power of 80%. Looking at the association between a two-category variable and a three-category variable, the minimum required number of subjects is 107 when the effect size is w = 0.3. Considering a dropout rate of 20%, a total of 134 subjects is needed.
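If the expected joint cell probabilities are available rather than a generic medium w, the 'pwr' package's ES.w2() helper computes Cohen's w for a two-way table; the 2 × 3 probabilities below are hypothetical.

>library(pwr)
>P <- matrix(c(0.10, 0.20, 0.20,    # hypothetical joint cell probabilities under H1 (sum to 1)
+              0.20, 0.15, 0.15), nrow = 2, byrow = TRUE)
>w <- ES.w2(P)                      # effect size of the departure from independence
>pwr.chisq.test(w = w, df = (2 - 1) * (3 - 1), sig.level = 0.05, power = 0.8)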

Practice 5

For nonparametric testing of categorical data, the power calculation can be performed using the ss2x2() function [17] (Supplementary data 2, Table 1). The Fisher exact test and the McNemar test are both covered and are selected with the 'pair' argument. In the examples below, we set the event rate of the control group to 0.2 (p0 = 0.2), the event rate of the treatment group to 0.8 (p1 = 0.8), and the allocation ratio between groups to 1:1 (n1.over.n0 = 1).

Fisher exact test (Table 3, no. 12)

>install.packages("exact2x2")
>library(exact2x2)
>ss2x2(p0 = .2, p1 = .8, n1.over.n0 = 1, sig.level = 0.05, power = .8, approx = FALSE, print.steps = TRUE, pair = FALSE)

Assuming that the event rate of the control group was 0.2 and that of the treatment group was 0.8, the allocation ratio was set at 1:1. If a two-sided test is performed with a significance level of 0.05 and a power of 80%, 12 subjects are calculated for each group. Considering a dropout rate of 20%, 15 subjects are required in each group, for a total of 30 subjects.


McNemar test (Table 3, no. 13)

>library(exact2x2)
>ss2x2(p0 = .2, p1 = .8, n1.over.n0 = 1, sig.level = 0.05, power = .8, approx = TRUE, print.steps = FALSE, pair = TRUE)

Assuming that the event rate of the matched control group was 0.2 and that of the matched case (or treatment) group was 0.8, the allocation ratio was set at 1:1. If a two-sided test is performed with a significance level of 0.05 and a power of 80%, 13 subjects are calculated for each group. Considering a dropout rate of 20%, 16 subjects are required in each group, for a total of 32 subjects.

CORRELATION ANALYSIS

Correlation analysis determines whether there is a linear relationship between two continuous variables. The 'pwr' package is used for this test.

Practice 6

The pwr.r.test() function (Supplementary data 1, Table 5) can be used for correlation analysis. The correlation coefficient (r) itself is used as the measure of effect size. Cohen suggests that r values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes, respectively. We will use the medium effect size, r = 0.3.

Correlation analysis (Table 3, no. 14)

>library(pwr)
>pwr.r.test(r = 0.3, power = 0.80, sig.level = 0.05, alternative = "two.sided")

Assuming a significance level of 0.05 and a power of 80% in a two-sided test, the minimum number of subjects required to demonstrate statistical significance is 84 for an effect size of r = 0.3. Considering a dropout rate of 20%, 105 subjects are required.

GENERALIZED LINEAR MODEL

Generalized linear models [18] were formulated as a way to unify a variety of statistical models, including linear regression, logistic regression, and Poisson regression. We use the 'pwr' package for linear regression and the 'WebPower' package for logistic and Poisson regression.

Practice 7

The pwr.f2.test() function (Supplementary data 1, Table 6) can be used for multiple linear regression analysis. Cohen's f2 is used as the effect size and can be obtained from the R2 value used as a measure of goodness of fit in regression analysis (Cohen's f2 = R2/(1 − R2)). The argument 'u' is the number of predictors (or risk factors) considered in the analysis, and 'v' is n (the total number of subjects) − u − 1. That is, if you supply only the value of u to the function, the value of v is calculated, and this value is used to obtain the total number of subjects (n ≥ v + u + 1). Cohen suggests that f2 values of 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes. We will use the medium effect size, f2 = 0.15, and u = 3.

Linear regression (Table 3, no. 15)

>library(pwr)
>pwr.f2.test(u = 3, f2 = 0.15, power = 0.80, sig.level = 0.05)

Similarly, we assumed a significance level of 0.05 and a power of 80%. Considering three risk factors (u = 3) and an effect size of f2 = 0.15, v = 73. A total of 77 subjects (73 + 3 + 1) is therefore calculated, and considering a dropout rate of 20%, 96 subjects should be recruited.
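If an expected R2 is available from prior literature, f2 and the total sample size can be derived in the same way; the R2 value below is hypothetical.

>library(pwr)
>R2 <- 0.13                            # hypothetical model R-squared from prior literature
>f2 <- R2 / (1 - R2)                   # Cohen's f2 = R^2 / (1 - R^2)
>u <- 3                                # number of predictors
>res <- pwr.f2.test(u = u, f2 = f2, sig.level = 0.05, power = 0.8)
>ceiling(res$v) + u + 1                # total number of subjects, n >= v + u + 1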

Practice 8

The wp.logistic() and wp.poisson() functions (Supplementary data 3, Tables 1 and 2) can be used for logistic and Poisson regression analysis, respectively. The two arguments 'family' and 'parameter' should contain information about the distribution of the predictor (or risk factor). The default values are used when the distribution of the predictor is unknown; if it is known, the parameter value can be changed accordingly.


Logistic regression [19] (Table 3, no. 16)

>install.packages("WebPower", dependencies = TRUE)
>library(WebPower)
>wp.logistic(p0 = 0.15, p1 = 0.1, alpha = 0.05, power = 0.8, family = "normal", parameter = c(0,1))

If the predictor (X) is a continuous variable, family = "normal" can be used and the 'parameter' argument is left at its default. The values of p0 and p1 can be derived using the 1 SD range of X: p1 can be set to the event probability within this range and p0 to the event probability outside it. In this example, p0 = 0.15 and p1 = 0.1 were used. Similarly, we assumed a significance level of 0.05 and a power of 80%. The minimum number of subjects satisfying these conditions is 299, and a total of 374 is required considering a dropout rate of 20%.

Poisson regression [20] (Table 3, no. 17)

>library(WebPower)
>wp.poisson(exp0 = 1, exp1 = 1.2, alpha = 0.05, power = 0.8, family = "Bernoulli", parameter = 0.5)

If the predictor (X) is a binary variable, family = "Bernoulli" can be used and the 'parameter' argument is left at its default value. For exp0, a base rate of 1 under the null hypothesis was used, and for exp1, the expected relative risk of 1.2 was set as the relative increase in the event rate. Similarly, we assumed a significance level of 0.05 and a power of 80%. The minimum number of subjects satisfying these conditions is 866, and a total of 1,083 is required considering a dropout rate of 20%.

CONCLUSIONS

In conclusion, sample size calculation plays a key role in the research design process before a study begins. In particular, since randomized controlled trials, which are frequently conducted in clinical settings, are directly tied to cost, the number of subjects must be calculated carefully. Although there are various references on sample size calculation, it can be difficult to choose and correctly apply the method suited to one's own study. Of course, it is better to seek expert advice for more complex studies, but we hope that this article will help researchers calculate the right number of subjects for their own research.

NOTES

Authors' contributions
Conceptualization: YHK, HIB, YP
Data curation: SP, YHK
Formal analysis: SP, HIB
Investigation: SP, HIB
Methodology: SP, YHK
Project administration: YHK, YP
Visualization: HIB
Writing–Original Draft: SP, HIB
Writing–Review & Editing: All authors

Conflict of interest
All authors have no conflicts of interest to declare.

Funding/support
This work was supported by the Soonchunhyang University Research Fund.

ORCID
Suyeon Park, https://orcid.org/0000-0002-6391-557X
Yeong-Haw Kim, https://orcid.org/0000-0002-8068-3678
Hae In Bang, https://orcid.org/0000-0001-7854-3011
Youngho Park, https://orcid.org/0000-0002-7096-3967

Supplementary materials
Supplementary data 1–3 can be found via https://doi.org/10.7602/jmis.2023.26.1.9.


REFERENCES

1. Altman DG. Statistics and ethics in medical research: III. How large a sample? Br Med J 1980;281:1336-1338.
2. Moher D, Dulberg CS, Wells GA. Statistical power, sample size, and their reporting in randomized controlled trials. JAMA 1994;272:122-124.
3. Foulkes M. Study designs, objectives, and hypotheses [Internet]. Johns Hopkins Bloomberg School of Public Health; 2008 [cited 2023 Feb 20]. Available from: https://docplayer.net/38128249-Study-designs-objectives-and-hypotheses-mary-foulkes-phd-johns-hopkins-university.html
4. Bose M, Dey A. Optimal crossover designs. World Scientific; 2009.
5. Johnson DE. Crossover experiments. WIREs Comp Stat 2010;2:620-625.
6. European Medicines Agency. ICH E9: Statistical principles for clinical trials - Step 5 [Internet]. European Medicines Agency; 1998 [cited 2023 Feb 20]. Available from: https://www.ema.europa.eu/en/documents/scientific-guideline/ich-e-9-statistical-principles-clinical-trials-step-5_en.pdf
7. Chow SC, Shao J, Wang H, Lokhnygina Y. Sample size calculation in clinical trial. Marcel Dekker Inc; 2003.
8. Lachin JM. Chapter 3. Sample size, power, and efficiency. In: Lachin JM. Biostatistical methods. John Wiley & Sons; 2000. p. 61-86.
9. van Belle G. Statistical rules of thumb. 2nd ed. John Wiley & Sons; 2011.
10. Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials 1981;2:93-113.
11. Zhang Z, Yuan KH. Practical statistical power analysis using WebPower and R. ISDSA Press; 2018.
12. Lehmann EL. Nonparametrics: statistical methods based on ranks. Springer; 2006.
13. McGraw KO, Wong SP. A common language effect size statistic. Psychol Bull 1992;111:361-365.
14. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Taylor & Francis; 2013.
15. Sawilowsky SS. New effect size rules of thumb. J Mod Appl Stat Methods 2009;8:26.
16. Bonett DG. Confidence intervals for standardized linear contrasts of means. Psychol Methods 2008;13:99-109.
17. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions. 3rd ed. John Wiley & Sons; 2013.
18. Nelder JA, Wedderburn RW. Generalized linear models. J R Stat Soc Ser A 1972;135:370-384.
19. Demidenko E. Sample size determination for logistic regression revisited. Stat Med 2007;26:3385-3397.
20. Cohen J. A power primer. Psychol Bull 1992;112:155-159.
