Resample - XLS: Example 1 - The UK National Lottery
Michael Wood, February 2003
The approach in this workbook is explained in more detail in Making sense of statistics - a non-
mathematical approach (Palgrave, August 2003). There are other worksheets at
http://userweb.port.ac.uk/~woodm/nms .
This workbook is for resampling, or for implementing the "two bucket model". Please note the two
general points below, then I'd suggest working through one of the examples to see how it works.
The maximum sample and resample sizes are both 100: there are some notes on increasing these
limits after the two examples.
Green cells are for keying in data and formulae. There are formulae in many cells that look blank, so
check you are not overwriting anything if you key anything into a cell which is not green. (You must
leave A4 on the Lots of resamples sheet blank; otherwise the table on the left hand side will not
work.) If a cell is green but has something in it, you can either accept what's there or change it: e.g.
you can give Variable 1 a name of your own if you want to.
The workbook is set to recalculate automatically - except for tables. The Lots of resamples sheet
uses a table, so you need to press F9 to calculate (or recalculate) the statistics and the graph in this
sheet. If you have a lot of data, or your computer is slow, it may be a good idea to set it to manual
calculation (Tools-Options-Calculation then tick the box for Manual and delete the tick for
recalculating before saving). You then have to press F9 to do the calculations.
Next click on the 'Single resample' tab at the bottom. You need to tell the spreadsheet how many
balls you will be drawing from Bucket 1. This should be six - the number of balls chosen by the
lottery machine. Enter this in the green cell at the top.
You will also notice here that there are two kinds of resampling - without replacement, and with
replacement. This refers to whether or not you replace each ball in Bucket 1 before drawing the
next. Which do we want here - sampling with or without replacement?
We want to sample without replacement because that is what the lottery machine does. Balls come
out and stay out. They are not replaced, so the same ball is never chosen twice.
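If it helps to see the two kinds of resampling outside the spreadsheet, here is a rough sketch in Python. It is not part of the workbook. The bucket of balls numbered 1 to 49 is my assumption (the UK lottery of the time drew six balls from 49), and the seed is arbitrary. `random.sample` draws without replacement; `random.choices` draws with replacement.

```python
import random

# Assumption: Bucket 1 holds balls numbered 1 to 49.
bucket_1 = list(range(1, 50))
rng = random.Random(2003)  # arbitrary fixed seed so a run can be repeated

# Without replacement: a ball that comes out stays out,
# so no number can appear twice.
without_replacement = rng.sample(bucket_1, k=6)

# With replacement: each ball goes back before the next draw,
# so the same number may appear more than once.
with_replacement = rng.choices(bucket_1, k=6)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

The lottery machine behaves like the first of these.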
This means that we want to use the block headed Resampling without replacement. The next step
is to enter the 'Resample statistic' we want to use in the other green cell. This should be the sum of
all the values of variable 1. Enter the formula
=SUM(C7:C12)
in this cell. In my case 2 appears in the cell, but if you are doing it yourself another number will
probably appear. If you press F9 the worksheet will be recalculated producing another set of
random numbers, another resample (draw from Bucket 1), and another score representing the
number of numbers correct.
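The same single resample can be sketched in Python. This is only an illustration under an assumption the text above does not spell out: that Bucket 1 holds 49 balls, with the six numbers on your ticket given the value 1 and the other 43 given the value 0, so that the sum of the six values drawn is the number of numbers correct. The function name `single_resample` is mine.

```python
import random

# Assumption: 49 balls; the six on your ticket are worth 1, the rest 0.
bucket_1 = [1] * 6 + [0] * 43

def single_resample(rng, resample_size=6):
    """Draw balls without replacement and return the sum of their values."""
    drawn = rng.sample(bucket_1, k=resample_size)
    return sum(drawn)  # plays the role of =SUM(C7:C12)

rng = random.Random()  # unseeded, so each run gives a fresh draw (like F9)
number_correct = single_resample(rng)
print("Number correct:", number_correct)
```

Running it again is the equivalent of pressing F9: you get another draw and, probably, another score.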
You can also name this resample statistic. Type Number correct in Cell A4 (replacing resample
statistic).
Now click on the 'Lots of resamples (Bucket 2)' tab. There is a table on the left hand side which
contains the results of repeated resamples from the previous sheet. Here you need to fill in the
Values of interest in the green box - enter 0, 1, 2, 3, 4, 5, 6 in the top of this column, and then press
F9 to calculate the probabilities.
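As a rough Python equivalent of this sheet, the sketch below repeats the single resample 200 times and works out the proportion of resamples giving each value of interest. It again assumes a Bucket 1 of 49 balls with six worth 1 and 43 worth 0; the names `lots_of_resamples` and `bucket_2` are mine, not the workbook's.

```python
import random
from collections import Counter

# Assumption: 49 balls; the six on your ticket are worth 1, the rest 0.
bucket_1 = [1] * 6 + [0] * 43
NUMBER_OF_RESAMPLES = 200  # the workbook's default

def lots_of_resamples(rng, n=NUMBER_OF_RESAMPLES):
    """Return n resample statistics, each the sum of six balls drawn
    without replacement from Bucket 1."""
    return [sum(rng.sample(bucket_1, k=6)) for _ in range(n)]

rng = random.Random()
bucket_2 = lots_of_resamples(rng)

# Proportion of resamples giving each value of interest (0 to 6).
counts = Counter(bucket_2)
probabilities = {value: counts[value] / len(bucket_2) for value in range(7)}

for value, probability in probabilities.items():
    print(f"{value}: {probability:.2%}")
```

With only 200 resamples the estimates are rough, and the rarer outcomes may not turn up at all.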
There is also a graph here. The scale is automatic; you can change it by keying the midpoints of the
bottom two bars into the green cells - I would suggest 0 and 1. Also, if you enter 3 for the cut value
the spreadsheet will work out the probability of getting a value less than 3, equal to 3, and more
than 3.
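The cut value calculation is simple enough to sketch in a few lines of Python. The function name `cut_probabilities` and the ten example values are made up for illustration; in practice the list would be the resample statistics in Bucket 2.

```python
def cut_probabilities(resample_stats, cut):
    """Return the proportions of resample statistics below, equal to
    and above the cut value."""
    n = len(resample_stats)
    below = sum(1 for s in resample_stats if s < cut) / n
    equal = sum(1 for s in resample_stats if s == cut) / n
    above = sum(1 for s in resample_stats if s > cut) / n
    return below, equal, above

# Hypothetical resample statistics, just to show the arithmetic.
example = [0, 1, 0, 2, 3, 1, 0, 4, 1, 0]
below, equal, above = cut_probabilities(example, cut=3)
print(below, equal, above)  # 0.8 0.1 0.1
```

The three proportions always add up to 1, which is what the Total cell on the sheet checks.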
To increase the maximum resample size (from 100) you need to copy the cells A106:G106 further
down, and make sure that the resample statistic formula refers to the right range.
To increase the number of resamples (balls in Bucket 2) from 200, you need to go to the Lots of
resamples sheet, then select A4:B204 and extend it as far down as you want to go, then Data-
Table-Column input A4 (leave Row input blank) and click OK. Then you need to extend the named
block Resamvals (Insert-Name-Define). Then copy the formulae in cells A204 and C204 down as
far as the table extends, and extend the range named cut (Insert-Name-Define).
Sample (Bucket 1)
If your data is in another spreadsheet file, open this, and then copy and paste the data into the green cells.
Sample size: 0
(The sample reference numbers and random numbers will appear automatically. They are for the resampling process.)
Sample Reference no Random number Variable 1 Variable 2
Resample size
Resample statistic
RESAMPLING WITHOUT REPLACEMENT
Resample reference no    Sample ref no    Variable 1    Variable 2
RESAMPLING WITH REPLACEMENT
Sample ref no    Variable 1    Variable 2
Lots of resamples (Bucket 2) - Press F9 to calculate
Number of resamples: 200

Resample number    Resample statistic    (unlabelled column)
1 to 200           0                     eq
(All 200 rows read 0 and eq in the blank workbook.)

Standard statistics
mean                   0
median                 0
sd                     0
ave dev from mean      0
lower quartile         0
upper quartile         0
interquartile range    0
percentiles:
2.5                    0
97.5                   0

Cut value:
Probability below cut
probability equal to cut
probability above cut
Total                  0.00%

Values of interest    Probability
Total                 0.00%
Chart: Resample Frequencies (horizontal axis: Resample statistic; vertical axis: 0 to 250)
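For anyone wanting to check the standard statistics outside the workbook, here is a rough Python sketch that computes the quantities listed on the Lots of resamples sheet from a list of resample statistics. The formulae the workbook actually uses are not shown here, so the choices below are assumptions: the sample standard deviation for sd, and the "inclusive" interpolation method for quartiles and percentiles. The function name and the example data are mine.

```python
import statistics

def standard_statistics(resample_stats):
    """Summarise a list of resample statistics (needs at least two values)."""
    mean = statistics.mean(resample_stats)
    # Assumption: quartiles and percentiles use inclusive interpolation.
    q1, _, q3 = statistics.quantiles(resample_stats, n=4, method="inclusive")
    # Cutting into 40 equal parts puts the first and last cut points
    # at the 2.5th and 97.5th percentiles.
    fortieths = statistics.quantiles(resample_stats, n=40, method="inclusive")
    return {
        "mean": mean,
        "median": statistics.median(resample_stats),
        "sd": statistics.stdev(resample_stats),  # assumption: sample sd
        "ave dev from mean": statistics.mean(
            [abs(x - mean) for x in resample_stats]
        ),
        "lower quartile": q1,
        "upper quartile": q3,
        "interquartile range": q3 - q1,
        "percentile 2.5": fortieths[0],
        "percentile 97.5": fortieths[-1],
    }

# Hypothetical data, just to show the output format.
summary = standard_statistics([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(f"{name}: {value}")
```

The interval between the 2.5 and 97.5 percentiles contains the middle 95% of the resample statistics.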