Stat - Bootstrapping in Statistics
THE STATSWHISPERER
The StatsWhisperer Newsletter is published by staff at StatsWhisperer.™ For many more free resources for learning
statistics, including webinars and a subscription to this newsletter, visit us on the web at: www.statswhisperer.com
How useful is this procedure? Well, the term “bootstrapping” actually refers to the utility of the technique. Specifically, it has been suggested the

The focus of this newsletter has always been to suggest answers to challenges encountered in data analysis, and bootstrapping is a key solution to
How does the bootstrapping procedure work?

Essentially, the bootstrap method takes a sample of your data (e.g., your sample of study participants), then another sample, then another sample, and so on up to thousands of times, to empirically estimate the statistical values (see Efron & Tibshirani, 1993 and Hesterberg et al., 2005). Not surprisingly, bootstrapping belongs to a family of statistical techniques known as resampling methods. The idea behind the bootstrap is to use the data available to you via your sample as a “surrogate population” for the purpose of approximating the sampling distribution of a statistic. In other words, the bootstrap resamples with replacement from the sample you do have to create a large number of “phantom samples” that we call bootstrap samples. The software then computes a sample summary based on each of the bootstrap samples.

The bootstrap sample is derived from the original sample of 1,000 weights using sampling with replacement, so each bootstrap sample is not identical to the original sample. The bootstrap sampling process is repeated a large number of times (usually from 1,000 to 10,000 times, depending on various factors). Each bootstrap sample yields a mean value, which is known as a bootstrap estimate.

Based upon the mean values from the individual bootstrap samples, you would then have a larger distribution of mean values, the multiple bootstrap means. These values provide an estimate of the shape of the distribution of mean scores, from which you can now answer questions regarding how much the mean weight within your sample varies.
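The resampling procedure described above can be sketched in a few lines of Python. This is only a minimal illustration, not the routine SPSS runs: the 1,000 weights are simulated (a hypothetical mean of 170 and standard deviation of 30), the function name `bootstrap_means` and the choice of 2,000 resamples are assumptions made for the example, and the interval shown is the simple percentile interval.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return one mean per bootstrap sample (the bootstrap estimates)."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Sampling WITH replacement: same size as the original sample,
        # but some cases appear more than once and others not at all.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return means


# Hypothetical data: 1,000 simulated weights standing in for a real sample.
data_rng = random.Random(42)
weights = [data_rng.gauss(170, 30) for _ in range(1000)]

boot_means = bootstrap_means(weights)

# The spread of the bootstrap means estimates how much the sample mean varies.
standard_error = statistics.stdev(boot_means)

# Percentile interval: with n=40 the first and last cut points are the
# 2.5th and 97.5th percentiles of the bootstrap means.
cuts = statistics.quantiles(boot_means, n=40)
low, high = cuts[0], cuts[-1]

print(f"sample mean: {statistics.fmean(weights):.2f}")
print(f"bootstrap standard error: {standard_error:.2f}")
print(f"95% percentile interval: {low:.2f} to {high:.2f}")
```

The list `boot_means` plays the role of the "larger distribution of mean values": its standard deviation and percentiles describe how much the mean weight would be expected to vary from sample to sample.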
We can also see that the relationship is statistically significant at the p<.001 level (Sig.=.000). Lastly, we see the 95% confidence interval for the odds ratio is 2.095 to 13.866. Again, this is our way of affirming that we do not know the actual odds ratio in the population. Specifically, we are affirming that in the actual population it is unlikely that people

Suppose we wanted a better estimate of the odds ratio, which might more clearly convey what the effect size might be. In such a case, we could simply observe the final box within the SPSS output for this procedure, which presents the bootstrap estimated values regarding this analysis.
The first thing that might interest us is the level of statistical significance regarding the relationship. We see that the level of statistical significance regarding the relationship has changed very little, from .000 to .001. Thus, the relationship is essentially at a very similar level of statistical significance. However, the estimate of the 95% confidence interval is quite a different story. Recall, in the box above the odds ratio is 5.391 with a 95% confidence interval of 2.095 to 13.866, suggesting a significant small to very large effect might exist between the independent and dependent variables.

However, the 95% confidence interval is now .702 to 3.016. The first thing to notice is that the confidence interval now includes the null value 1.00. For an odds ratio, the null value is the number that indicates that there is no relationship between the independent and dependent variables. Thus, the bootstrapping procedure has suggested that this relationship might not be significant in the actual population, as the number 1.00 is included in the 95% confidence interval: the lower-bound estimate is .702 and the upper-bound estimate is 3.016. Furthermore, an odds ratio of 3.016 is considered a medium size effect between variables. Note, this upper-bound effect is close to the lower-bound effect size of 2.095 in the first analysis we examined.

Thus, the bootstrap procedure has suggested that the actual effect size in the population is not small (OR=2.095) to very large (OR=13.866), as indicated in the first box within the output we examined, but rather not significant (including the null value of 1.00) to at most a medium size effect (OR=3.016).
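To make the idea of a bootstrap confidence interval for an odds ratio concrete, here is a small Python sketch. It is not the SPSS procedure and does not reproduce the output discussed above: the 60 cases are invented, the odds ratio is computed from a 2x2 table of a single binary predictor and a binary outcome (standing in for a fitted logistic regression), and 0.5 is added to every cell so that a resample with an empty cell does not cause division by zero. The function names and the 2,000 resamples are likewise assumptions for illustration.

```python
import random
import statistics


def odds_ratio(pairs):
    """Odds ratio from (predictor, outcome) pairs coded 0/1.

    Assumption: 0.5 is added to each cell of the 2x2 table so that
    resamples containing an empty cell still give a finite value.
    """
    a = b = c = d = 0.5
    for predictor, outcome in pairs:
        if predictor and outcome:
            a += 1
        elif predictor and not outcome:
            b += 1
        elif outcome:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)


def bootstrap_or_interval(pairs, n_resamples=2000, seed=0):
    """95% percentile interval for the odds ratio, resampling whole cases."""
    rng = random.Random(seed)
    n = len(pairs)
    estimates = [odds_ratio(rng.choices(pairs, k=n)) for _ in range(n_resamples)]
    cuts = statistics.quantiles(estimates, n=40)  # 2.5th ... 97.5th percentiles
    return cuts[0], cuts[-1]


# Hypothetical data: 60 cases, invented for this example only.
pairs = [(1, 1)] * 20 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 20

point_estimate = odds_ratio(pairs)
or_low, or_high = bootstrap_or_interval(pairs)

# If the interval contains 1.00, the data are compatible with no relationship.
includes_null = or_low <= 1.0 <= or_high

print(f"odds ratio: {point_estimate:.3f}")
print(f"95% bootstrap interval: {or_low:.3f} to {or_high:.3f}")
print(f"interval includes the null value 1.00: {includes_null}")
```

Note that each resample draws whole cases (predictor and outcome together), so the relationship between the two variables is preserved within every bootstrap sample.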
Final Comments

Obviously, this is a very simplistic presentation of bootstrapping. At more advanced levels there are different types of procedures, important rules to follow, and assumptions that need to be met. However, our presentation today is more of a “get your feet wet” sort of discussion. My goal was to draw attention to the procedure to put people on the road to learning.

Doubtlessly, the procedure can be of great use in many ways. For example, while we briefly examined the use of bootstrapping in binary logistic regression, we did not examine how we can apply the procedure within linear regression (also available in SPSS) to address the challenges related to violated test assumptions we mentioned earlier.

However, as I say regularly, a big challenge in statistics is not a problem accessing information, but a lack of awareness regarding which procedures exist that you need to learn about. Hopefully, this small discussion of bootstrapping will be useful in this respect.

Dr. William M. Bannon, Jr., the founder and CEO of StatsWhisperer, can be reached at: [email protected]

Thanks for your interest in our newsletter!

For information on books, products, and consultation go to: http://www.statswhisperer.com/products/

For information on webinars and seminars go to: http://www.statswhisperer.com/seminars/
REFERENCES