1.3 Monte Carlo Simulation
In this book, we will use the Monte Carlo method to teach you about
inference and the properties of different econometric estimators. The
basic idea is to have a computer run a chance process over and over so
that we can see the results. We’ll set up an artificial environment where
we know important parameters so that we can explore and check statistical
properties.
1
Introduction to Monte Carlo Methods, http://csep1.phy.ornl.gov/mc/node1.html.
779357418.doc Page 1 of 11
Chapter 1: Introduction Your Notes
2
We cannot resist telling you that according to the web site, http://www.larrybird.com/stats.html, which
we visited on January 13, 1998, Bird’s lifetime NBA free throw percentage was 88.6% in the regular
season (3960 made out of 4471 attempts) and 89.0% in the playoffs (901 out of 1012). A question: was Bird
better in the playoffs? Trivial facts: Bird played in 164 playoff games and on average he attempted more
free throws per game in the playoffs than in the regular season (6.2 versus 5.8 per game).
plays a role in free throw shooting means that we may well get something
different from 90%. Now, the possibilities are anywhere from 0 to 100%,
but what are the likely results? Is it likely that we could see him make
only 72 out of 100 attempts for a sample percentage of 72%? Is making
every shot—100 straight free throws—giving him a 100% sample
percentage something that we might see every once in a while? Or, are
results like 72% and 100% extremely rare and results like 88%, 93%, or
91% much more likely?
What we’re trying to do, of course, is to evaluate the likely size of the
spread in the sample percentage of a sample of 100 free throws. Each free
throw has some chance built into it and so the sample percentage of 100
free throws also has a chance component. We need to figure out how
much variation there is in the sample percentage of 100 free throws. In
other words, we need to find the SE (standard error) of the sample
percentage. A small SE of the sample percentage is good—it means that
the observed sample percentages are unlikely to stray far from 90%.
3
It’s “possible” that a 90% free throw shooter would miss 100 in a row. The likelihood of this outcome,
$0.1^{100}$, is so remote that we ignore it completely. The chances of making every shot aren’t so great
either: $0.9^{100} \approx 0.00266\%$.
There are two routes to figuring out the variation in the sample
percentage. The first is statistical theory.4 The second route is the Monte
Carlo approach: this consists of producing a simulation of the data
generating process, generating a series of replications of that process, and
analyzing the results of the experiment. How do we implement this
strategy?
[Margin figure: Two ways to find the SE: Statistical Theory and Monte Carlo Simulation]
Read the brief description in the sheet Introduction, then go to the Rand
sheet (by clicking on the sheet tab at the bottom of the screen) to learn
about Excel’s random number generation capability. When you are
finished, you should understand how Excel generates random numbers
and how the Excel functions RAND() and IF(expression, true, false) can
be used to create a virtual Larry Bird shooting machine.
You have learned that we can use Excel to simulate the result of a single
free throw by having it draw a random number uniformly between 0 and
1. If the number is below 0.9, Excel says, “hit;” if the randomly drawn
number is equal to or above 0.9, it says, “miss.” We can have Excel
register “1” for a hit and “0” for a miss.
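The same hit-or-miss rule can be written outside a spreadsheet. What follows is a minimal sketch in Python, not the workbook's own formula: the name free_throw and the seed value are our illustrative choices, and random.random() stands in for RAND(), since both draw uniformly from 0 up to (but not including) 1.

```python
import random

HIT_PROBABILITY = 0.9  # assumed long-run chance that Bird makes a free throw


def free_throw(rng, p=HIT_PROBABILITY):
    """Return 1 for a hit and 0 for a miss, in the spirit of IF(RAND() < 0.9, 1, 0)."""
    # A draw below p counts as a hit; a draw equal to or above p counts as a miss.
    return 1 if rng.random() < p else 0


rng = random.Random(12345)  # fixed seed (arbitrary) so the run can be repeated
print([free_throw(rng) for _ in range(10)])
```

Changing the seed, or removing it, gives a different string of hits and misses, just as recalculating the sheet does.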
To simulate Bird shooting 100 free throws is simple: just repeat the
formula in 100 cells as we show in the sheet called Sample. Call the result
of 100 “shots” a single repetition of the simulation. The key information
from a single repetition would be the sample percentage of 1’s. You
4
We review exactly how statistical theory can be used to solve this problem in the next chapter.
should press F9, per the instructions in the Sample sheet, to make sure you
understand that the sample percentage of 100 attempts varies—press F9
again and again and watch how the sample percentage bounces around.
Sometimes Larry does exceptionally well, maybe 94% or 95%, but every
once in a while he does quite badly. Well, never as poorly as, say, Shaq5:
for Larry, 85% is extremely bad and anything below 80% is really rare. You
might repeatedly press F9 for 20 minutes and not see 80%.
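As a rough parallel to the Sample sheet, the sketch below strings 100 simulated shots together and reports the sample percentage. It assumes the same 90% hit chance; sample_percentage is a hypothetical name of ours, and each call plays the role of one press of F9.

```python
import random


def sample_percentage(rng, n_shots=100, p=0.9):
    """Simulate n_shots free throws and return the percentage made."""
    # Count the draws that fall below p; each such draw is a made shot.
    hits = sum(1 for _ in range(n_shots) if rng.random() < p)
    return 100.0 * hits / n_shots


rng = random.Random(2024)  # arbitrary seed, used only for repeatability
for _ in range(5):
    # One repetition: a fresh sample of 100 shots and its sample percentage.
    print(sample_percentage(rng))
```

Running the loop longer should show the same bouncing around that repeated recalculation shows in the workbook.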
Now that you understand how the success or failure of a single free throw
is determined via Excel’s RAND() and IF functions and how we
calculate the sample percentage from 100 free throws, we can turn to
actually creating and interpreting Monte Carlo simulation results.
To figure out the spread of the sample percentage in the Larry Bird
example, we simply conduct lots of repetitions and examine the resulting
histogram of results. Let’s say we do 1,000 repetitions. Now we have
1,000 sample percentages. We can find the mean of these sample
percentages and their SD (standard deviation). You are almost certain to get
an average close to 0.90 (90%). The question is, “How much spread is
there in the 1,000 sample percentages?” The SD of the 1,000 sample
percentages is a Monte Carlo-generated approximation to the true, exact
SE of the sample percentage and the histogram of the 1,000 sample
percentages approximates the probability histogram (or sampling
distribution).
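The procedure just described can be sketched in a few lines of Python. This is an illustration under our own assumptions (a 90% hit chance, independent shots, and the hypothetical function name monte_carlo), not the code behind the workbook.

```python
import random
import statistics


def monte_carlo(n_reps=1000, n_shots=100, p=0.9, seed=1):
    """Return n_reps sample percentages, each from n_shots simulated free throws."""
    rng = random.Random(seed)  # arbitrary seed so the experiment can be rerun
    results = []
    for _ in range(n_reps):
        hits = sum(1 for _ in range(n_shots) if rng.random() < p)
        results.append(100.0 * hits / n_shots)
    return results


percentages = monte_carlo()
# The mean should land near 90; the SD approximates the SE of the sample percentage.
print("mean of sample percentages:", statistics.mean(percentages))
print("SD of sample percentages:  ", statistics.pstdev(percentages))
```

We use statistics.pstdev here, which treats the 1,000 values as the whole list being summarized; with this many repetitions, statistics.stdev gives nearly the same number.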
5
Shaquille O’Neal is a tremendously gifted seven-foot-one-inch athlete in the NBA. See his web site:
http://www.shaq.com/.
finite number of repetitions, no matter how large, will give the exact
answer. Monte Carlo simulation cannot be used to get the exact right
answer, but it can give an increasingly good approximation as the number
of repetitions rises.
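One way to see this is to rerun the experiment with more and more repetitions. The sketch below is only illustrative: simulated_se is a hypothetical helper, the repetition counts are our choice, and any single run can still miss by a little.

```python
import random
import statistics


def simulated_se(n_reps, n_shots=100, p=0.9, seed=0):
    """SD of n_reps simulated sample percentages, in percentage points."""
    rng = random.Random(seed)  # arbitrary seed for repeatability
    percentages = [
        100.0 * sum(1 for _ in range(n_shots) if rng.random() < p) / n_shots
        for _ in range(n_reps)
    ]
    return statistics.pstdev(percentages)


for n_reps in (100, 1_000, 10_000):
    print(n_reps, "repetitions -> approximate SE:", round(simulated_se(n_reps), 3))
```

With few repetitions the approximation tends to wander from run to run; with many it tends to settle, though it never becomes exact.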
[Figure: histogram of sample percentages (horizontal axis, 78% to 100%) from 10,000 repetitions of 100 free throws; the vertical axis counts samples.]
The bars in the histogram show how many samples of 100 free throws had
a particular percentage made. Of the 10,000 repetitions of 100 free
throws, the lowest sample percentage was 79% and the highest was 99%.
In almost 1400 samples, the computer simulation of Larry Bird made
exactly 90 out of 100 free throw attempts. The mean of the 10,000
sample percentages was 89.99% with a standard deviation of 2.995%.
This analysis says that the likely size of chance error for the sample
percentage of 100 free throws is about 3%. Thus, we should not be
surprised to find that Larry Bird sinks 87% or 93% of his free throws
when he takes 100 attempts. It would be very surprising, however, if he
hit all 100, or if he hit only 80 out of 100 since these values are more than
3 standard deviations away, and in most cases that means such outcomes
are rare indeed.
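As a rough check on how rare these outcomes are, the sketch below computes the distance from 90% in SD units and the exact chances under a simple model. The model is our assumption: 100 independent shots, each made with probability 0.9, so the number made follows a binomial distribution. The helper name binom_pmf is ours.

```python
from math import comb


def binom_pmf(k, n=100, p=0.9):
    """Chance of exactly k made shots out of n independent attempts."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


se = 3.0  # approximate SE of the sample percentage, in percentage points
print("SDs from 90% for 100 made:", (100 - 90) / se)
print("SDs from 90% for 80 made: ", (90 - 80) / se)

# Exact chances under the assumed binomial model
print("P(all 100 made):    ", binom_pmf(100))
print("P(80 or fewer made):", sum(binom_pmf(k) for k in range(0, 81)))
```

Both outcomes sit a little more than 3 SDs from 90%, and both probabilities come out very small.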
Now it’s your turn. From the Sample sheet, click on the
After clicking the OK button, you will be able to watch the progress of the
simulation. So, how did your simulation turn out? Is your histogram
similar to ours?
Let’s summarize the Larry Bird free throw shooting example. We wanted
to know how much spread there was in the sample percentage. Instead of
traditional, analytical methods based on the theory of probability and
statistics, we adopted the Monte Carlo simulation strategy. We repeatedly
resampled and thereby obtained an approximation to the SE of the sample
percentage of 100 attempts. Our run gave us a value of about 3%. What
did you get? The formula for the SE of the sample percentage gives us
precisely 3%.6 It is, of course, no accident that our Monte Carlo
experiments yield results close to the standard formulas of statistical
theory.
6
The appropriate formula is:
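A standard way to write the SE of a sample percentage, assuming n independent attempts that each succeed with probability p (the symbols n and p are our notation), is shown below with the Larry Bird values p = 0.9 and n = 100:

```latex
\mathrm{SE}(\text{sample percentage})
  = \sqrt{\frac{p\,(1-p)}{n}} \times 100\%
  = \sqrt{\frac{0.9 \times 0.1}{100}} \times 100\%
  = 3\%
```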
Why bother then with Monte Carlo simulation? First, it enables you to
see clearly the source of chance error and variation in a problem.
Formulas often make it difficult to see what’s really going on. While
some people quickly understand and accept the notion of randomness and
variation, we believe most people learn much better when they actually
see variation. We believe many more people will really get it when they
hit F9 to draw another sample and see that sample percentage bouncing
around. By hitting F9, the student is doing and understanding instead of
passively reading or listening.
Second, Monte Carlo techniques drive home the concept of the Standard
Error, surely one of the most difficult ideas in statistics and econometrics
for beginning students. The SE measures the spread of outcomes of
chance processes. Visually, it is the spread of the probability histogram of
the different outcomes of the chance process. The Monte Carlo method
allows us to approximate the probability histogram and therefore the SE
just by running numerous repetitions of the same data generating process.
button (on the Sample sheet near cell D17) a few times.
Our simulated Larry Bird exhibits variation in the longest streak of made
free throws in each sample of 100 attempts. What’s the average longest
streak? What’s the spread in the distribution of longest streaks? As
before, we forgo analytical solutions to these questions in favor of Monte
Carlo analysis.7
Click on the button (on the Sample sheet near cell D22)
to see a demonstration of how a Monte Carlo simulation can be used to
determine approximately the average and spread of the Max Streak
sampling distribution. As before, a new sheet, this time named Streak,
appears in the workbook with results of 1,000 repetitions available for
your inspection. Notice that Max Streak is not normally distributed: it
has a long right-hand tail.
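The Max Streak experiment can also be sketched in Python. This is our own illustration rather than the workbook's macro: the function names longest_streak and simulate_max_streaks are hypothetical, and we again assume independent shots with a 90% hit chance.

```python
import random
import statistics


def longest_streak(shots):
    """Length of the longest run of consecutive 1s (made shots)."""
    best = current = 0
    for s in shots:
        # A hit extends the current run; a miss resets it to zero.
        current = current + 1 if s == 1 else 0
        best = max(best, current)
    return best


def simulate_max_streaks(n_reps=1000, n_shots=100, p=0.9, seed=7):
    """Return the longest streak from each of n_reps samples of n_shots."""
    rng = random.Random(seed)  # arbitrary seed for repeatability
    return [
        longest_streak([1 if rng.random() < p else 0 for _ in range(n_shots)])
        for _ in range(n_reps)
    ]


streaks = simulate_max_streaks()
print("average longest streak:", statistics.mean(streaks))
print("SD of longest streak:  ", statistics.pstdev(streaks))
```

A tally of the values in streaks (for example with collections.Counter) gives a text version of the histogram for inspecting the shape of the distribution.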
You might want to try your own Monte Carlo analysis by clicking the
7
For an analytical approximation to the exact distribution of the max streak problem, see William Feller,
An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd Edition, Revised Printing, John
Wiley and Sons, p. 325. Our Monte Carlo results agree with Feller's approximation.
instead of grinding out the next repetition). If your screen saver comes
on, this will also slow down the simulation. Notice also that you can
interrupt the simulation by pressing the Esc (escape) key in the upper left-
hand corner of your keyboard. Excel will prompt you with a dialog box
and you can click the End button to stop the simulation.
With Excel’s RAND() function, it will be fast and easy to draw many
random samples and then examine the resulting distribution. This will
provide a visual, concrete demonstration of difficult, abstract ideas. In
addition, with Excel, you will be able to run your own simulations and
compare your results to ours. If a point is unclear, you can always run the
simulation again and keep doing so until it makes sense.