Stat - Bootstrapping in Statistics


Fall 2013, Volume 3, Issue 3

THE STATSWHISPERER
The StatsWhisperer Newsletter is published by staff at StatsWhisperer.™ For many more free resources in learning
statistics, including webinars and subscribing to this newsletter, visit us on the web at: www.statswhisperer.com

INSIDE THIS ISSUE

Bootstrapping: It’s Not Just for Footwear Anymore
What is Bootstrapping in Statistics?
How Can Bootstrap Methods Be Implemented?
Final Comments

Bootstrapping: It’s Not Just for Footwear Anymore

Ah yes, I am sure there was a time for all of us when our idea of bootstrapping was buying some new footwear and fastening it to our feet. However, now that we have entered the world of statistics, the term has a double meaning. Don’t worry, you can still keep that footwear that suggests “I’m with the band” or “All my exes live in Texas,” but now you will also have an incredibly useful statistical procedure to go along with it.

How useful is this procedure? Well, the term “bootstrapping” actually refers to the utility of the technique. Specifically, it has been suggested that the term derives from the phrase, “To lift yourself up by your bootstraps.” This phrase suggests that one is doing the seemingly impossible, as it is unlikely that you will be able to lift yourself into the air by tugging at the straps attached to your boots. Similarly, the bootstrapping technique often seems to afford you an impossible benefit, which we will discuss in this newsletter issue.

The focus of this newsletter has always been to suggest answers to challenges encountered in data analysis, and bootstrapping is a key solution to many.

Read Chapter 1 of our new book, The Seven Steps of Data Analysis, at: http://www.statswhisperer.com/products/


What is Bootstrapping in Statistics?


Before describing the bootstrap technique, let’s first discuss a common problem in statistics that bootstrapping may be used to address. Specifically, most researchers are eventually tasked with performing an inferential statistical analysis using a relatively small sample of study participants. A small sample often presents several challenges.

For example, with a somewhat small sample, many of the assumptions of parametric testing, such as a normal distribution of scores, are often not met. Additionally, widely used methods of treating the data in response to these challenges, such as data transformations (e.g., a log transformation), may not be effective. Consequently, the researcher is left in a situation where the analysis must be done, but the data do not seem to meet the needs of the analysis. At this point the researcher ponders how to use this somewhat small sample, which does not meet the assumptions of the parametric tests he or she wished to use, to produce an effective analysis.

In this situation, bootstrapping may provide a viable solution to the challenge the researcher is facing. When a sample is not large enough to facilitate a straightforward statistical inference, bootstrapping provides a means of accounting for the distortions caused by a somewhat small sample. Specifically, bootstrapping is a statistical procedure used to estimate statistical parameters by sampling with replacement from the original sample. The purpose of bootstrapping is very often to derive robust estimates of statistical values such as standard errors and confidence intervals regarding a population parameter, such as the mean, median, odds ratio, correlation coefficient, or regression coefficient. However, as suggested above, bootstrapping is also used as a robust alternative to straightforward inference based on parametric assumptions when those assumptions are dubious.

How does the bootstrapping procedure work?

Essentially, the bootstrap method takes a sample of your data (e.g., your sample of study participants), then another sample, then another sample, and so on up to thousands of times, to empirically estimate the statistical values (see Efron & Tibshirani, 1993 and Hesterberg et al., 2005). Not surprisingly, bootstrapping belongs to a family of statistical techniques known as resampling methods. The idea behind the bootstrap is to use the data available to you via your sample as a “surrogate population” for the purpose of approximating the sampling distribution of a statistic. In other words, the bootstrap resamples with replacement from the sample you do have to create a large number of “phantom samples” that we call bootstrap samples. The software then computes a sample summary based on each of the bootstrap samples.

For example, suppose you are interested in the average (i.e., the mean) weight of people in the United States. Of course, it might not be possible for you to weigh every person in the country to uncover this number. Instead, in true research form, you sample a small cross-section of the country to obtain an estimate. Let’s assume you obtain a sample of 1,000 study participants, which of course would produce only one mean value of weight. However, in order to draw inferences about the entire population of the United States, you will need a sense of the variability in the mean weight you computed. The bootstrap method will take the original 1,000 weight values in your dataset and, via your software program, will sample from that distribution to form a new sample, which is known as the bootstrap sample or re-sample.

The bootstrap sample is drawn from the original sample of 1,000 weights using sampling with replacement. It is the same size as the original sample, but because some weights are drawn more than once and others not at all, each bootstrap sample will generally differ from the original sample. The bootstrap sampling process is repeated a large number of times (usually between 1,000 and 10,000 times, depending on various factors). Each bootstrap sample yields its own mean value, which is known as a bootstrap estimate.

Collecting the mean from each bootstrap sample gives you a distribution of bootstrap means. These values provide an estimate of the shape of the distribution of mean scores, from which you can now answer questions regarding how much the mean weight computed from your sample varies.
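
For readers who like to see the mechanics, the weight example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1,000 weights are simulated (they are not real measurements), the function names are our own, and the interval shown is the simple percentile interval, which is one of several ways to turn bootstrap estimates into a confidence interval.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # A bootstrap sample is the same size as the original sample and is
        # drawn with replacement, so some values repeat and others are left out.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return means


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval: the middle `level` share of the estimates."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    lower_index = int(alpha * len(ordered))
    upper_index = int((1.0 - alpha) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]


# Hypothetical data: 1,000 simulated weights in pounds (not real measurements).
data_rng = random.Random(42)
weights = [data_rng.gauss(180, 40) for _ in range(1000)]

boot_means = bootstrap_means(weights, n_resamples=1000, seed=1)
sample_mean = statistics.fmean(weights)
bootstrap_se = statistics.stdev(boot_means)  # spread of the bootstrap means
ci_low, ci_high = percentile_interval(boot_means)

print(f"sample mean:        {sample_mean:.2f}")
print(f"bootstrap std. err: {bootstrap_se:.2f}")
print(f"95% percentile CI:  {ci_low:.2f} to {ci_high:.2f}")
```

The standard deviation of the bootstrap means plays the role of the standard error of the mean, and the percentile interval gives a rough sense of how much the sample mean would vary from sample to sample.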

How Can Bootstrapping Methods Be Implemented?


Here is even more good news! You do not need specialized statistical software to conduct the bootstrapping method. In recent years the bootstrap method has been made available in commonly used statistical software, including SPSS. Thus, this method is quite accessible to many people. Below, to illustrate the bootstrap method, we have employed the technique in binary logistic regression.

To employ bootstrapping in binary logistic regression using SPSS, we would first go to: Analyze→ Regression→ Binary Logistic. A dialogue box will open that looks similar to the image below. You will see in the image below that the dependent variable we have entered (within the box labeled Dependent) is the variable “Is the study participant happy?” (1=Yes, 0=No). The independent variable (within the box labeled Covariates) is the variable “Does the study participant live with a cat or a dog?” (0=cat, 1=dog).

Before we conduct this analysis, there are two boxes we need to check. First, click the button in the upper right hand corner labeled Options. Within the dialogue box that opens, click the box next to the term CI for Exp(B), which will produce the 95% confidence interval for the odds ratio estimate. This is the range in which we are 95% confident the true odds ratio within the population lies. Then click the button marked Continue.

Then, in the upper right hand corner of the original dialogue box, click the button labeled Bootstrap. A dialogue box will open that looks similar to the image below. Check the box in the upper left hand corner marked Perform bootstrapping. The box below, labeled Number of samples, will be set by default at 1,000. For our purposes here that is fine. As mentioned earlier, this number may vary with the needs of the study.

Then under the label Confidence Intervals click Bias corrected accelerated (BCa). Then under the term Sampling click Simple. Then click the button labeled Continue.

Then in the original dialogue box, click the button marked OK or Paste (to save the syntax).
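
For readers who want to see what these settings ask the software to do, here is a rough Python sketch. It is not the SPSS implementation. It assumes a made-up data set of 100 participants (the counts are hypothetical), resamples the participants with replacement 1,000 times, and recomputes the odds ratio each time. Two simplifications are worth flagging: it uses a plain percentile interval rather than the bias corrected accelerated interval selected above, and it adds 0.5 to each cell of the 2x2 table so that an empty cell in a resample cannot cause a division by zero. With a single 0/1 predictor, the odds ratio from the 2x2 table (without the 0.5 adjustment) is the same quantity that logistic regression reports as Exp(B).

```python
import random


def odds_ratio(rows):
    """Odds ratio of happy=1 for dog (1) versus cat (0) from (pet, happy) pairs."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for pet, happy in rows:
        counts[(pet, happy)] += 1
    # Add 0.5 to every cell so a resample with an empty cell still works.
    dog_happy = counts[(1, 1)] + 0.5
    dog_unhappy = counts[(1, 0)] + 0.5
    cat_happy = counts[(0, 1)] + 0.5
    cat_unhappy = counts[(0, 0)] + 0.5
    return (dog_happy * cat_unhappy) / (dog_unhappy * cat_happy)


def bootstrap_odds_ratios(rows, n_resamples=1000, seed=0):
    """Resample participants with replacement and recompute the odds ratio."""
    rng = random.Random(seed)
    n = len(rows)
    return [odds_ratio(rng.choices(rows, k=n)) for _ in range(n_resamples)]


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval (a stand-in for the BCa interval)."""
    ordered = sorted(estimates)
    alpha = (1.0 - level) / 2.0
    lower_index = int(alpha * len(ordered))
    upper_index = int((1.0 - alpha) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]


# Hypothetical data, coded as (pet, happy) with 0=cat, 1=dog and 0=No, 1=Yes.
participants = (
    [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 20 + [(0, 0)] * 30
)

point_estimate = odds_ratio(participants)
boot_estimates = bootstrap_odds_ratios(participants, n_resamples=1000, seed=1)
ci_low, ci_high = percentile_interval(boot_estimates)

print(f"odds ratio:        {point_estimate:.2f}")
print(f"95% percentile CI: {ci_low:.2f} to {ci_high:.2f}")
```

If SciPy is available, its `scipy.stats.bootstrap` function offers a BCa option, which would be closer to the setting chosen in the dialogue box.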

At the very end of the statistical output for the binary logistic regression, the next to last box describes the relationship between the independent variable “Does the study participant live with a dog or cat?” (0=cat, 1=dog) and the dependent variable “Is the study participant happy?” (1=Yes, 0=No) as per the regular analysis.

We see that the odds ratio indicates that those who live with a dog have over five times (OR=5.391) the odds of being happy (happy=yes) relative to study participants living with a cat (the column labeled Exp(B) presents the odds ratio).

We can also see that the relationship is statistically significant at the p<.001 level (Sig.=.000).

Lastly, we see that the 95% confidence interval for the odds ratio is 2.095 to 13.866. Again, this is our way of affirming that we do not know the actual odds ratio in the population. Specifically, we are affirming that in the actual population it is unlikely that the odds of being happy for people who live with a dog are exactly 5.391 times the odds for those living with a cat, as is indicated within our analysis. However, we are 95% confident that the true odds ratio in the population lies between 2.095 and 13.866.

This confidence interval is rather wide. Specifically, the confidence interval suggests that the true effect size between the independent and dependent variables might be on the smaller side (OR=2.095) or it may be on the very large side (OR=13.866).

Suppose we wanted a better estimate of the odds ratio, which might more clearly convey what the effect size might be. In such a case, we could simply observe the final box within the SPSS output for this procedure, which presents the bootstrap estimated values regarding this analysis.

[Figure: SPSS output for the standard analysis, with callouts marking the level of statistical significance of the relationship, the odds ratio, and the 95% confidence interval.]
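
It may help to see where an interval like this comes from. The interval SPSS reports next to Exp(B) is a Wald interval: it is built on the log-odds scale as B plus or minus about 1.96 standard errors and then exponentiated. The short Python sketch below illustrates that arithmetic using the odds ratio and upper bound reported above. The standard error is not given in this newsletter, so the sketch backs it out from the reported upper bound; treat that value as an assumption rather than as SPSS output.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_interval_for_odds_ratio(b, standard_error, z=Z_95):
    """Return (lower, upper) for Exp(B), computed as exp(B -/+ z * SE)."""
    return math.exp(b - z * standard_error), math.exp(b + z * standard_error)


odds_ratio = 5.391       # Exp(B) reported in the newsletter
upper_reported = 13.866  # upper bound reported in the newsletter

b = math.log(odds_ratio)  # coefficient on the log-odds scale
# Assumption: back out the standard error implied by the reported upper bound.
implied_se = (math.log(upper_reported) - b) / Z_95

lower, upper = wald_interval_for_odds_ratio(b, implied_se)
print(f"B = {b:.2f}, implied SE = {implied_se:.2f}")
print(f"95% CI for the odds ratio: {lower:.2f} to {upper:.2f}")
```

The lower bound recovered this way comes out near 2.1, in line with the interval reported above. Because the interval is symmetric around B on the log scale, it is lopsided around the odds ratio itself, which is why the upper bound sits so much further from 5.391 than the lower bound does.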

The final box in the SPSS output, presented below, provides the bootstrap adjusted estimates for the same relationship we just reviewed above. Specifically, the same relationship is presented, only with more robust estimates of statistical values based on the bootstrapping procedure.

[Figure: final box of the SPSS output, with callouts marking the level of statistical significance of the relationship and the 95% confidence interval.]

The first thing that might interest us is the level of statistical significance regarding the relationship. We see that the level of statistical significance regarding the relationship has changed very little, from .000 to .001. Thus, the relationship is essentially at a very similar level of statistical significance. However, the estimate of the 95% confidence interval is quite a different story. Recall, in the box above the odds ratio is 5.391 with a 95% confidence interval of 2.095 to 13.866, suggesting a significant small to very large effect might exist between the independent and dependent variables.

However, the 95% confidence interval is now .702 to 3.016. The first thing to notice is that the confidence interval now includes the null value 1.00. For an odds ratio, the null value is the number that corresponds to no relationship between the independent and dependent variables. Thus, the bootstrapping procedure has suggested that this relationship might not be significant in the actual population, as the number 1.00 is included in the 95% confidence interval (the lower bound estimate is .702 and the upper bound estimate is 3.016). Furthermore, an odds ratio of 3.016 is considered a medium size effect between variables. Note, this upper bound is close to the lower bound of 2.095 in the first analysis we examined.

Thus, the bootstrap procedure has suggested that the actual effect size in the population is not small (OR=2.095) to very large (OR=13.866), as indicated in the first box within the output we examined, but rather not significant (including the null value of 1.00) to at most a medium size effect (OR=3.016).

One caution when reading this box: as far as we are aware, the SPSS bootstrap table for logistic regression reports its interval for the coefficient B (the log odds) rather than for Exp(B). It is worth confirming which scale the bounds are on, because an interval for the odds ratio should contain the estimate itself (here 5.391), and on the B scale the null value is 0 rather than 1.00.
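
A quick way to check which scale a set of bootstrap bounds is on is to test them against the estimate and the null value under both readings. The Python sketch below does this for the numbers in this example. It is an illustration only: the second reading assumes the bounds .702 and 3.016 are for the coefficient B (the log odds), which should be confirmed against the column headings in the output table.

```python
import math


def to_odds_ratio_scale(lower_b, upper_b):
    """Exponentiate bounds given for B (log odds) to get odds ratio bounds."""
    return math.exp(lower_b), math.exp(upper_b)


def contains(lower, upper, value):
    """True when `value` lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper


odds_ratio = 5.391                      # Exp(B) from the regular analysis
b = math.log(odds_ratio)                # the same estimate on the B scale
reported_lower, reported_upper = 0.702, 3.016

# Reading 1: the bounds are odds ratios, so the null value is 1.00.
print("As odds ratios:")
print("  contains the estimate 5.391?",
      contains(reported_lower, reported_upper, odds_ratio))
print("  contains the null value 1.00?",
      contains(reported_lower, reported_upper, 1.0))

# Reading 2 (assumption): the bounds are for B, so the null value is 0.
or_lower, or_upper = to_odds_ratio_scale(reported_lower, reported_upper)
print("As bounds for B:")
print(f"  contains the estimate B = {b:.3f}?",
      contains(reported_lower, reported_upper, b))
print("  contains the null value 0?",
      contains(reported_lower, reported_upper, 0.0))
print(f"  implied odds ratio interval: {or_lower:.2f} to {or_upper:.2f}")
```

Under the first reading the interval does not contain the estimate it is meant to describe. Under the second reading it contains the estimate and excludes the null value, which would also agree with the significance level of .001.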

Final Comments
Obviously, this is a very simplistic presentation of bootstrapping. At more advanced levels there are different types of procedures, important rules to follow, and assumptions that need to be met. However, our presentation today is more of a “get your feet wet” sort of discussion. My goal was to draw attention to the procedure to put people on the road to learning.

Doubtlessly, the procedure can be of great use in many ways. For example, while we briefly examined the use of bootstrapping in binary logistic regression, we did not examine how we can apply the procedure within linear regression (also available in SPSS) to address the challenges related to violated test assumptions we mentioned earlier.

However, as I say regularly, a big challenge in statistics is not a problem accessing information, but a lack of awareness regarding which procedures exist that you need to learn about. Hopefully, this small discussion of bootstrapping will be useful in this respect.

Dr. William M. Bannon, Jr., the founder and CEO of StatsWhisperer, can be reached at: [email protected]

Thanks for your interest in our newsletter!

For information on books, products, and consultation go to: http://www.statswhisperer.com/products/

For information on webinars and seminars go to: http://www.statswhisperer.com/seminars/

REFERENCES

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Boca Raton: Chapman & Hall/CRC.

Hesterberg, T. C., Moore, D. S., Monaghan, S., Clipson, A., & Epstein, R. (2005). “Bootstrap methods and permutation tests.” In David S. Moore and George McCabe, Introduction to the Practice of Statistics.
