bsample Sampling with replacement
Syntax
bsample [exp] [if] [in] [, options]
where exp is a standard Stata expression; see [U] 13 Functions and expressions.
options              Description
strata(varlist)      variables identifying strata
cluster(varlist)     variables identifying resampling clusters
idcluster(newvar)    create new cluster ID variable
weight(varname)      replace varname with frequency weights
Menu
Statistics > Resampling > Draw bootstrap sample
Description
bsample draws bootstrap samples (random samples with replacement) from the data in memory.
exp specifies the size of the sample, which must be less than or equal to the number of sampling
units in the data. The observed number of units is the default when exp is not specified.
For bootstrap sampling of the observations, exp must be less than or equal to _N (the number of
observations in the data; see [U] 13.4 System variables (_variables)).
For stratified bootstrap sampling, exp must be less than or equal to _N within the strata identified
by the strata() option.
For clustered bootstrap sampling, exp must be less than or equal to N_c (the number of clusters
identified by the cluster() option).
For stratified bootstrap sampling of clusters, exp must be less than or equal to N_c within the strata
identified by the strata() option.
Observations that do not meet the optional if and in criteria are dropped (not sampled).
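The size rules above can be tried on any small dataset. The following is a minimal sketch, assuming the auto dataset shipped with Stata (74 observations, of which 22 are foreign cars); the dataset, the seed, and the sample sizes are illustrative choices and are not part of this entry.

```stata
* Illustrative only: auto.dta and the sizes below are assumed examples
sysuse auto, clear
set seed 12345

* Default size: as many observations as are in memory (_N = 74)
preserve
bsample
count
restore

* Explicit size: 50 observations drawn with replacement (50 <= _N)
preserve
bsample 50
count
restore

* Observations failing the if condition are dropped, not sampled,
* so only foreign cars remain (20 <= 22 foreign cars)
preserve
bsample 20 if foreign
tabulate foreign
restore

* Stratified: exp is the size drawn within each stratum,
* giving 20 domestic and 20 foreign cars
bsample 20, strata(foreign)
tabulate foreign
```

Because bsample replaces the data in memory, the sketch wraps all but the last draw in preserve and restore.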
Options
strata(varlist) specifies the variables identifying strata. If strata() is specified, bootstrap samples
are selected within each stratum.
cluster(varlist) specifies the variables identifying resampling clusters. If cluster() is specified,
the sample drawn during each replication is a bootstrap sample of clusters.
idcluster(newvar) creates a new variable containing a unique identifier for each resampled cluster.
weight(varname) specifies a variable in which the sampling frequencies will be placed. varname
must be an existing variable, which will be replaced. After bsample, varname can be used as
an fweight in any Stata command that accepts fweights, which can speed up resampling for
commands like regress and summarize. This option cannot be combined with idcluster().
By default, bsample replaces the data in memory with the sampled observations; however,
specifying the weight() option causes only the specified varname to be changed.
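As a rough illustration of these options, the sketch below again assumes the auto dataset shipped with Stata and treats rep78 as a cluster identifier purely for demonstration; the variable names newid and fw are our own.

```stata
* Illustrative only: rep78 is not a real cluster variable
sysuse auto, clear
set seed 12345

* Drop the 5 cars with missing rep78 so that every observation
* belongs to one of the five clusters; 69 observations remain
drop if missing(rep78)

* Resample clusters; newid distinguishes repeated draws of one cluster
preserve
bsample, cluster(rep78) idcluster(newid)
tabulate newid rep78
restore

* With weight(), the data stay in memory and only fw is changed
generate fw = .
bsample, weight(fw)
count
summarize mpg [fweight = fw]
```

After the last bsample, the data still hold 69 observations and the values of fw sum to 69, the default sample size.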
bsample requires that the specified size of the sample be an integer, so we use the round()
function to obtain the nearest integer to 0.1 × 2392 and 0.1 × 3418. Our sample now has 239 males
and 342 females:
. tabulate female
      female        Freq.     Percent        Cum.

   group           N        mean
       A          15   -.3073028
       B          10     -.00984
       C          11    .0810985
       D          11   -.1989179
       E          29    -.095203
   Total          76   -.1153269
Expanding the data will only partly solve the problem. We also need a new variable that uniquely
identifies the copied clusters. We use the expandcl command to accomplish both these tasks; see
[D] expandcl.
. use http://www.stata-press.com/data/r13/bsample2, clear
. set seed 1234
. expandcl 2, generate(expgroup) cluster(group)
(76 observations created)
. tabstat x, stat(n mean) by(expgroup)
Summary for variables: x
     by categories of: expgroup
expgroup           N        mean
       1          15   -.3073028
       2          15   -.3073028
       3          10     -.00984
       4          10     -.00984
       5          11    .0810985
       6          11    .0810985
       7          11   -.1989179
       8          11   -.1989179
       9          29    -.095203
      10          29    -.095203
. generate fw = .
(152 missing values generated)
. bsample 8, cluster(expgroup) weight(fw)
. tabulate fw group
                              group
        fw       A       B       C       D       E     Total
         0      15      10       0       0      29        54
         1      15      10      22      22       0        69
         2       0       0       0       0      29        29
     Total      30      20      22      22      58       152
The results from tabulate on the generated frequency weight variable versus the original cluster ID
(group) show us that the bootstrap sample contains one copy of cluster A, one copy of cluster B, two
copies of cluster C, two copies of cluster D, and two copies of cluster E (1 + 1 + 2 + 2 + 2 = 8).
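The number of copies per original cluster can also be computed directly instead of being read from the cross-tabulation. The following is a sketch under the assumption, consistent with the table above, that bsample gives every observation of an expanded cluster the same weight; the variable name onetag is ours, and the particular counts may differ from those shown here depending on the random-number generator of the Stata version in use.

```stata
* Repeats the commands above so that the block is self-contained
use http://www.stata-press.com/data/r13/bsample2, clear
set seed 1234
expandcl 2, generate(expgroup) cluster(group)
generate fw = .
bsample 8, cluster(expgroup) weight(fw)

* Tag one observation per expanded cluster, then add up the weights
* of the tagged observations within each original cluster
egen byte onetag = tag(expgroup)
tabstat fw if onetag, stat(sum) by(group)
```

Whatever the seed, the sums over the five groups add up to 8, the number of clusters requested.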
                 strid
     group       1       2     Total
         A       7       8        15
         B       5       5        10
         C       5       6        11
         D       5       6        11
         E      14      15        29
     Total      36      40        76
. expandcl 2, generate(expgroup) cluster(strid group)
(76 observations created)
Now we can use bsample with the expanded data, stratum ID variable, and new cluster ID variable.
. generate fw = .
(152 missing values generated)
. bsample 8, cluster(expgroup) str(strid) weight(fw)
. by strid, sort: tabulate fw group
-> strid = 1
                              group
        fw       A       B       C       D       E     Total
         0       0       5       0       5      14        24
         1      14       5      10       5       0        34
         2       0       0       0       0      14        14
     Total      14      10      10      10      28        72

-> strid = 2
                              group
        fw       A       B       C       D       E     Total
         0       8      10       0       6       0        24
         1       8       0       6       6      15        35
         2       0       0       6       0      15        21
     Total      16      10      12      12      30        80
The results from by strid: tabulate on the generated frequency weight variable versus the original
cluster ID (group) show us how many times each cluster was sampled for each stratum. For stratum
1, the bootstrap sample contains two copies of cluster A, one copy of cluster B, two copies of cluster
C, one copy of cluster D, and two copies of cluster E (2 + 1 + 2 + 1 + 2 = 8). For stratum 2, the
bootstrap sample contains one copy of cluster A, zero copies of cluster B, three copies of cluster C,
one copy of cluster D, and three copies of cluster E (1 + 0 + 3 + 1 + 3 = 8).
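The per-stratum copy counts can be listed in the same spirit. This sketch assumes that the bsample2 dataset contains the strid variable used above and that weights are constant within each expanded cluster; the names onetag and copies are hypothetical, and the counts may vary with the Stata version's random-number generator.

```stata
* Assumed setup: same dataset and commands as in the example above
use http://www.stata-press.com/data/r13/bsample2, clear
set seed 1234
expandcl 2, generate(expgroup) cluster(strid group)
generate fw = .
bsample 8, cluster(expgroup) strata(strid) weight(fw)

* Keep one observation per expanded cluster and total the weights
* by stratum and original cluster
egen byte onetag = tag(expgroup)
collapse (sum) copies = fw if onetag, by(strid group)
list strid group copies, sepby(strid) noobs
```

Within each stratum, the values of copies sum to 8, the number of clusters drawn per stratum.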
References
Gould, W. W. 2012a. Using Stata's random-number generators, part 2: Drawing without replacement. The Stata Blog:
Not Elsewhere Classified.
http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/.
Gould, W. W. 2012b. Using Stata's random-number generators, part 3: Drawing with replacement. The Stata Blog:
Not Elsewhere Classified.
http://blog.stata.com/2012/08/29/using-statas-random-number-generators-part-3-drawing-with-replacement/.
Also see
[R] bootstrap Bootstrap sampling and estimation
[R] bstat Report bootstrap results
[R] simulate Monte Carlo simulations
[D] sample Draw random sample