Matching To Remove Bias in Observational Studies
International Biometric Society is collaborating with JSTOR to digitize, preserve and extend
access to Biometrics
DONALD B. RUBIN
SUMMARY
Several matching methods that match all of one sample from another larger sample on a
continuous matching variable are compared with respect to their ability to remove the bias
of the matching variable. One method is a simple mean-matching method and three are
nearest available pair-matching methods. The methods' abilities to remove bias are also
compared with the theoretical maximum given fixed distributions and fixed sample sizes.
A summary of advice to an investigator is included.
1. INTRODUCTION
2 As Cochran [1968] points out, if the matching variable X is causally affected by the treatment variable,
some of the real effect of the treatment variable will be removed in the adjustment process.
the response surfaces are called parallel, and the objective of the study is
the estimation of the constant difference between them. See Figure 1. For
linear response surfaces, "parallel response surfaces" is equivalent to "having
the same slope".
[Figure 1. Parallel response surfaces: expected response plotted against the matching variable X for P1 and P2.]
[Figure 2. Nonparallel response surfaces: expected response plotted against the matching variable X for P1 and P2.]
of controls not exposed to the agent; see for example Belsen's [1956] study
of the effect of an educational television program.3
The average difference between non-parallel response surfaces over the
P1 population, or the constant difference between parallel response surfaces,
will be called the (average) effect of the treatment variable or, more simply,
"the treatment effect," and will be designated τ:
3 In other cases, however, this average difference may not be of primary interest. Consider for example
the previously mentioned study of the efficacy of seatbelts. Assume that if automobile speed is high seat-
belts reduce the severity of injury, while if automobile speed is low seatbelts increase the severity of injury.
(See Figure 2, where P1 = motorists using seatbelts, P2 = motorists not using seatbelts, X = automobile
speed, and Y = severity of injury.) A report of this result would be more interesting than a report that there
was no effect of seatbelts on severity of injury when averaged over the seatbelt wearer population. Since
such a report may be of little interest if the response surfaces are markedly nonparallel, the reader should
generally assume "nonparallel" to mean "moderately nonparallel." If the response surfaces are markedly
nonparallel and the investigator wants to estimate the effect of the treatment variable averaged over P2 (the
population from which he has the larger sample), the methods and results presented here are not relevant and
a more complex method such as covariance analysis would be more appropriate than simple matching. (See
Cochran [1969] for a discussion of covariance analysis in observational studies.)
Notice that the percent reduction in bias due to matched sampling depends
only on the distribution of X in P1 , P2 and matched G2* samples, and the
response surface in P2 . If the response surface in P2 is linear, the percent
reduction in the expected bias of the estimated treatment effect equals

100 [E(x̄2*) − η2] / (η1 − η2),

where ηi is the mean of X in Pi and x̄2* is the mean of X in the matched
G2* sample; that is, it is the same as the percent reduction in bias of the
matching variable X. Even though this is only an approximation if the P2
response surface is not linear, we will use Θ, the percent reduction in the
bias of the matching variable, to measure the ability of a matching method
to remove bias.
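As a concrete illustration, Θ can be computed directly from sample means. The sketch below (Python rather than the paper's FORTRAN, with an illustrative function name) measures bias as the difference between the G1 and G2 sample means of X:

```python
def percent_bias_reduction(x1, x2, x2_matched):
    """Percent reduction in the bias of the matching variable X:
    100 * (1 - remaining bias / initial bias), where bias is the
    difference between the G1 mean of X and the G2 (or matched G2*) mean."""
    mean = lambda v: sum(v) / len(v)
    initial = mean(x1) - mean(x2)             # bias before matching
    remaining = mean(x1) - mean(x2_matched)   # bias after matching
    return 100.0 * (1.0 - remaining / initial)
```

A matched sample whose X mean equals the G1 mean gives 100; a value above 100 indicates over-correction, i.e. the matched G2* mean has passed the G1 mean.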
The result in (3.1) is of interest here for two reasons. First, for fixed
distributions and sample sizes and given a particular matching method,
a comparison of Θ and min {100, Θmax} clearly gives an indication of how
well that matching method does at obtaining a G2* sample whose expected
X mean is close to η1 . In addition, the expression for Θmax will be used to
help explain trends in Monte Carlo results. When investigating matching
methods that might be used in practice to match finite samples, properties
such as percent reduction in bias are generally analytically intractable.
Hence, Monte Carlo methods must be used on specific cases. From such
Monte Carlo investigations it is often difficult to generalize to other cases
or explain trends with much confidence unless there is some analytic or
intuitive reason for believing the trends will remain somewhat consistent.
It seems clear that if Θmax is quite small (e.g. 20) no matching method will
do very well, while if Θmax is large (e.g. 200) most reasonable matching methods
should do moderately well. Hence, we will use trends in Θmax to help explain
trends in the Monte Carlo results that follow.
Two trends for Θmax are immediately obvious from (3.1).
Given fixed f, B, and σ1²/σ2², two other trends are derivable from simple
properties of the order statistics and the fact that Θmax is directly proportional
to Q(r, N) (see Appendix A for proofs).
(3) Given fixed B, σ1²/σ2², f and N, Θmax increases as r increases: Q(r, N) <
Q(r + a, N), a > 0; N, rN, aN integers.
4 A matching method that has as its percent reduction in expected bias min {100, Θmax} may be of little
practical interest. For example, consider the following matching method. With probability P = min {1, 100/Θmax}
choose the N G2 subjects with the largest observations as the G2* sample and with probability 1 − P choose a
random sample of size N as the G2* sample. It is easily checked that the percent reduction in expected bias using
this method is min {100, Θmax}.
From the fourth trend, we have Q(r, 1) ≤ Q(r, N) ≤ Q(r, ∞). Values of
Q(r, 1) have been tabulated in Sarhan and Greenberg [1962] for several
distributions as the expected value of the largest of r observations. Q(r, ∞)
can easily be calculated by using the asymptotic result that it equals the
conditional mean of X above its (1 − 1/r) quantile.
Values of Q(r, 1) and Q(r, ∞) are given in Table 3.1 for X ~ ±χ²ν (ν = 2(2)10)
and X normal, and for r = 2, 3, 4, 6, 8, 10.
(a) For fixed r and ν, the results for +χ²ν are more similar to those for
the normal than are those for −χ²ν . This result is expected since the
largest N observations come from the right tail of the distribution,
and the right tail of +χ²ν is more normal than the right tail of −χ²ν ,
which is finite.
(b) Given a fixed distribution, as r gets larger the results differ more
from those for the normal, especially for −χ²ν . Again this is not
surprising because the tails of a low degree of freedom χ² are not very
normal, especially the finite tail.
(c) For r = 2, 3, 4, and moderately normal distributions (e.g. ±χ²ν , ν ≥ 8),
the results for the normal can be considered somewhat representative.
This conclusion is used to help justify the Monte Carlo investigations
of a normally distributed matching variable in the remainder of
this article.
(d) Given a fixed distribution and fixed r, the values for Q(r, 1) are
generally within 20% of those for Q(r, ∞), suggesting that when
dealing with moderate sample sizes as might commonly occur in
practice, we would expect the fourth trend (Θmax an increasing function
of N) to be rather weak.
In Table 3.2 values of Q(r, N) are given assuming f normal, the same
values of r as in Table 3.1, and N = 1, 2, 5, 10, 100, ∞. Values were found
with the aid of Harter [1960]. For fixed r, the values of Q(r, N) for N ≥ 10
are very close to the asymptotic value Q(r, ∞), especially when r > 2. Even
Q(2, 10) is within about 3% of Q(2, ∞). These results indicate that the values
for Q(r, ∞) given in Table 3.1 may be quite appropriate for moderate sample
sizes.
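These magnitudes are easy to check numerically. The sketch below (assumed helper names) estimates Q(r, N), the expected mean of the N largest of rN standard normal observations, by simple Monte Carlo, and computes the asymptote Q(r, ∞) as the conditional mean of a standard normal above its (1 − 1/r) quantile:

```python
import random
from statistics import NormalDist

def q_monte_carlo(r, n, reps=2000, seed=0):
    """Monte Carlo estimate of Q(r, N): the expected mean of the
    N largest of r*N independent standard normal observations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(r * n))
        total += sum(xs[-n:]) / n
    return total / reps

def q_inf_normal(r):
    """Asymptote Q(r, inf) for a normal matching variable: the
    conditional mean of X above its (1 - 1/r) quantile, r * phi(z)."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - 1.0 / r)
    return r * nd.pdf(z)
```

For example, Q(2, ∞) = 2φ(0) ≈ 0.80, and the Monte Carlo estimate of Q(2, 10) falls within a few percent of it, as the text indicates.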
4. MEAN-MATCHING
Thus far we have not specified any particular matching method. Under
the usual linear model "mean-matching" or "balancing" (Greenberg [1953])
methods are quite reasonable but appear to be discussed rarely in the litera-
ture. In this section we will obtain Monte Carlo percent reductions in bias
[Table 3.1. Values of Q(r, 1) and Q(r, ∞) for X ~ ±χ²ν (ν = 2(2)10) and X normal, r = 2, 3, 4, 6, 8, 10; the entries are not legible in this scan.]
TABLE 3.2
Q(r, N); f NORMAL
[Entries for r = 2, 3, 4, 6, 8, 10 and N = 1, 2, 5, 10, 100, ∞ are not legible in this scan.]
for a simple mean-matching method and compare these with the theoretical
maxima given by (3.1).
Assuming linear response surfaces, it is simple to show from (2.3) that
the bias of τ̂ for estimating τ is β2(x̄1. − x̄2.) + (β1 − β2)(x̄1. − η1), where βi is the
regression coefficient of Y on X in Pi and x̄i. is the average X in the matched
samples. Using x̄1. to estimate η1 , or assuming parallel response surfaces
(β1 = β2), one would minimize the estimated bias of τ̂ by choosing the N G2
subjects such that |x̄1. − x̄2.| is minimized. A practical argument against
using this mean-matching method is that finding such a subset requires the
use of some time-consuming algorithm designed to solve the transportation
problem. Many compromise algorithms can of course be defined that ap-
proximate this best mean-match.
We will present Monte Carlo percent reductions in bias only for the
following very simple mean-matching method. At the kth step, k = 1, ... , N,
choose the G2 subject such that the mean of the current G2* sample of k
subjects is closest to x̄1. . Thus, at step 1 choose the G2 subject closest to
x̄1. ; at step 2 choose the G2 subject such that the average of the first G2*
subject and the additional G2 subject is closest to x̄1. ; continue until N G2
subjects are chosen.
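The stepwise rule just described can be sketched as follows (an illustrative Python rendering, not the FORTRAN subroutine of Appendix B; ties are broken by the order of the pool):

```python
def simple_mean_match(x1, x2):
    """Simple mean-matching: at step k pick the remaining G2 subject
    that brings the mean of the current G2* sample closest to x1-bar."""
    target = sum(x1) / len(x1)   # x1-bar, the G1 mean
    pool = list(x2)              # G2 scores still available
    chosen, running_sum = [], 0.0
    for k in range(1, len(x1) + 1):
        # choose the pool element minimizing |new G2* mean - x1-bar|
        best = min(pool, key=lambda x: abs((running_sum + x) / k - target))
        pool.remove(best)
        chosen.append(best)
        running_sum += best
    return chosen
```

This greedy rule need not find the subset minimizing |x̄1. − x̄2.|; it is one of the compromise algorithms mentioned above.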
In Table 4.1 we present Monte Carlo values of Θ_MN , the percent reduction
in bias for this simple mean-matching method.5 We assume X normal;
B = ¼, ½, 1; σ1²/σ2² = ½, 1, 2; N = 25, 50, 100; and r = 2, 3, 4. Some limited
experience indicates that these values are typical of those that might occur
in practice. In addition, values of r and N were chosen with the results of
5 The standard errors for all Monte Carlo values given in Tables 4.1, 5.1, 5.2, and 5.3 are generally less than
0.5% and rarely greater than 1%.
[Table 4.1. Monte Carlo values of Θ_MN for the simple mean-matching method; entries not legible in this scan.]
Tables 3.1 and 3.2 in mind: values for percent reduction in bias may be
moderately applicable for nonnormal distributions, especially when r = 2,
and values given when N = 100 may be quite representative for N > 100.
Θ_MN exhibits the four trends given in section 3 for Θmax .
In Table 4.2 we present values of min {100, Θmax} for the same range of
N, B and σ1²/σ2² as in Table 4.1. Note first that the 67% for N = 50, σ1²/σ2² = 2,
r = 2, B = 1 mentioned above is larger than the theoretical maximum and
thus suspect. Comparing the corresponding entries in Table 4.1 and Table 4.2
we see that the values for N = 100 always attain at least 96% of
min {100, Θmax}, while the values for N = 50 always attain at least 91%
of min {100, Θmax}, and those for N = 25 always attain at least 87% of
min {100, Θmax}. Hence this simple method appears to be a very reasonable
mean-matching method, especially for large samples.
5. PAIR-MATCHING
[Tabulated values and the opening text of this section are not legible in this scan.]
(1) Given fixed N, r, and σ1²/σ2², Θ_RD and Θ_HL decrease as B increases.
(2) Given fixed N, r, and B, Θ_RD and Θ_HL decrease as σ1²/σ2² increases.
(3) Given fixed B, σ1²/σ2², and N, Θ_RD and Θ_HL increase as r increases.
(4) Given fixed B, σ1²/σ2² and r, Θ_RD and Θ_HL generally increase as N
increases.
These same four trends hold for all orderings if "Θ_RD and Θ_HL increase"
is replaced by "Θ_RD , Θ_HL , and Θ_LH get closer to 100%". Values of Θ greater
than 100% indicate that η2* > η1 , which is of course not as desirable as
η2* = η1 , which implies Θ = 100.
Comparing across Tables 5.1, 5.2, and 5.3 we see that given fixed B,
σ1²/σ2², r and N, Θ_LH ≥ Θ_RD ≥ Θ_HL . This result is not surprising for the
following reason. The high-low ordering will have a tendency not to use
those G2 subjects with scores above the highest G1 score, while the low-high
ordering will have a tendency not to use those G2 subjects with scores below
the lowest G1 score. Since we are assuming B > 0 (η1 > η2), the low-high
[Tables 5.1, 5.2, and 5.3. Monte Carlo percent reductions in bias for the nearest available pair-matching orderings; entries not legible in this scan.]
ordering should yield the most positive x̄2. , followed by the random ordering
and then the high-low ordering. When σ1²/σ2² = ½ and B ≤ ½, Θ_LH can be
somewhat greater than 100 (e.g. 113) while 100 ≥ Θ_RD ≥ 94. In all other
cases (σ1²/σ2² ≥ 1, or σ1²/σ2² = ½ and B = 1), Θ_LH is closer to 100% than Θ_RD
or Θ_HL . In general the results for Θ_RD , Θ_LH , and Θ_HL are quite similar for
the conditions considered.
Comparing the results in this section with those in section 4, it is easily
checked that if σ1²/σ2² ≤ 1 the three pair-matching methods generally attain
more than 85% of min {100, Θmax} in Table 4.2, indicating that they can
be reasonable methods of matching the means of the samples. However,
if σ1²/σ2² = 2, the pair-matching methods often attain less than 70% of the
corresponding Θ_MN in Table 4.1, indicating that when σ1²/σ2² > 1 these pair-
matching methods do not match the means very well compared to the simple
mean-matching method.
Remembering that pair-matching methods implicitly sacrifice closely
matched means for good individual matches, we also calculated a measure
of the quality of the individual matches. These results, presented in Appendix
C, indicate that, in general, the high-low ordering yields the closest individual
matches, followed by the random ordering. This conclusion is consistent with
the intuition to match the most difficult subjects first in order to obtain
close individual matches.
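The nearest available pair-matching methods of this section can be sketched as one routine (an illustrative Python rendering, not the Appendix B FORTRAN; the ordering names and tie-breaking are assumptions):

```python
import random

def nearest_available_pair_match(x1, x2, order="random", seed=0):
    """Process G1 subjects in the given order ('high-low', 'low-high',
    or 'random') and assign each the closest still-unused G2 subject.
    Returns one (x1 score, matched x2 score) pair per G1 subject."""
    idx = list(range(len(x1)))
    if order == "high-low":
        idx.sort(key=lambda i: -x1[i])   # hardest (largest) G1 scores first
    elif order == "low-high":
        idx.sort(key=lambda i: x1[i])
    else:
        random.Random(seed).shuffle(idx)
    pool = list(x2)                      # G2 subjects still available
    pairs = [None] * len(x1)
    for i in idx:
        best = min(pool, key=lambda x: abs(x - x1[i]))
        pool.remove(best)
        pairs[i] = (x1[i], best)
    return pairs
```

With B > 0 the high-low ordering matches the largest G1 scores first, which tends to give the closest individual matches at some cost in how well the matched means agree.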
6. ADVICE TO AN INVESTIGATOR
(a) N, the size of the smaller initial sample (G1); equivalently, the size
of each of the final samples.
(b) r, the ratio of the sizes of the larger initial sample (G2) and the smaller
initial sample (G1).
(c) The matching rule used to obtain the G2* sample of size N from
the G2 sample of size rN.
(a) Choosing N
N = z²s²/Δ², (6.1)
where s²/N = E(τ̂ − τ)².
Rarely in practice can one estimate the quantities in (6.3). Generally, how-
ever, the investigator has some rough estimate of an average variance of Y,
say σ̄y² , and of an average correlation between Y and X, say ρ̄. Using these
he can approximate the residual variance term in (6.3) by (2/N)σ̄y²(1 − ρ̄²).
Approximating σ1²(β1 − β2)²/N is quite difficult unless one has estimates
of β1 and β2 . The following rough method may be useful when the response
surfaces are at the worst moderately nonparallel. If the response surfaces
are parallel, σ1²(β1 − β2)²/N is zero and thus minimal. If the response surfaces
are at most moderately nonparallel, one could assume (β1 − β2)² ≤ ½β2²
in most uses.7 Hence, in many practical situations one may find that 0 ≤
σ1²(β1 − β2)²/N ≤ ½(σ1²/N)β2² , where the upper bound can be approximated
by ½(ρ̄²σ̄y²/N). Hence, a simple estimated range for s² is

2σ̄y²(1 − ρ̄²) ≤ s² ≤ 2σ̄y². (6.4)

If the investigator believes that the response surfaces are parallel and
linear, the value of s² to be used in (6.1) can be chosen to be near the minimum
of this interval. Otherwise, a value of s² nearer the maximum would be
appropriate.
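Combining (6.1) with the range (6.4) gives a rough interval for N. The sketch below uses illustrative parameter names for the investigator's rough estimates (var_y for σ̄y², rho for ρ̄) and the normal deviate z:

```python
import math

def sample_size_range(var_y, rho, delta, z=1.96):
    """Rough range for N from N = z^2 * s^2 / delta^2 (6.1), using the
    bounds 2*var_y*(1 - rho^2) <= s^2 <= 2*var_y from (6.4)."""
    s2_lo = 2.0 * var_y * (1.0 - rho ** 2)   # parallel, linear surfaces
    s2_hi = 2.0 * var_y                      # moderately nonparallel surfaces
    n_lo = math.ceil(z ** 2 * s2_lo / delta ** 2)
    n_hi = math.ceil(z ** 2 * s2_hi / delta ** 2)
    return n_lo, n_hi
```

An investigator confident of parallel linear response surfaces would work near the lower end of this interval; otherwise nearer the upper end.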
(b) Choosing r
6 Moderate samples (N ≥ 20) are assumed. For small samples N = t²N−1 s²/Δ², where tN−1 is the Student
deviate with N − 1 degrees of freedom corresponding to 1 − α confidence limits.
7 A less conservative assumption is (β1 − β2)² ≤ ⅓β2².
We will choose r large enough to expect 100% reduction in bias using the
simple mean-matching method of section 4.
(1) Estimate γ = B[(1 + σ1²/σ2²)/2]^1/2 and the approximate shape of the
distribution of X in P2 . In order to compensate for the decreased ability
of the mean-matching method to attain the theoretical maximum reduction
in bias in small or moderate samples (see section 4), if N is small or moderate
(N ≤ 100) increase γ by 5 to 15% (e.g. 10% for N = 50, 5% for N = 100).
(2) Using Table 3.1 find the row corresponding to the approximate shape
of the distribution of X in P2 . Now find approximate values of r1 and r∞
such that Q(r1 , 1) ≈ γ and Q(r∞ , ∞) ≈ γ. If N is very small (N ≤ 5), r should
be chosen to be close to r1 ; otherwise, results in Table 3.2 suggest that r can
be chosen to be much closer to r∞ . r should probably be chosen to be greater
than two and in most practical applications will be less than four.
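For a normal matching variable, steps (1) and (2) can be sketched numerically, using the asymptote Q(r, ∞) in place of the Table 3.1 row (the function name and the tabulated ratios are illustrative):

```python
from statistics import NormalDist

def choose_r(B, var_ratio, r_values=(2, 3, 4, 6, 8, 10)):
    """Return the smallest tabulated r whose asymptotic maximum bias
    reduction covers gamma = B * [(1 + var_ratio)/2]^(1/2), where
    var_ratio = sigma1^2/sigma2^2 and, for a normal matching variable,
    Q(r, inf) = r * phi(z) with Phi(z) = 1 - 1/r."""
    nd = NormalDist()
    gamma = B * ((1.0 + var_ratio) / 2.0) ** 0.5
    for r in r_values:
        z = nd.inv_cdf(1.0 - 1.0 / r)
        if r * nd.pdf(z) >= gamma:
            return r
    return None  # initial bias too large for these ratios
```

With B = ½ and equal variances, r = 2 already suffices; with B = 1, r = 3 is needed, consistent with the advice that r will usually lie between two and four.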
Now assume pair-matches are desired, i.e. the response surfaces may be
nonlinear, nonparallel, and each y1i − y2i may be used to estimate the
treatment effect at x1i . We will choose r large enough to expect 95%+ re-
duction in bias using the random order nearest available pair-matching
method of section 5. Perform steps (1) and (2) as above for mean-matching.
However, since in section 5 we found that if σ1²/σ2² > 1 nearest available
pair-matching did not match the means of the samples very well compared
to the simple mean-matching method, r should be increased. The following
is a rough estimate (based on Tables 5.1 and 4.1) of the necessary increase:
We assume G1 and G2 (i.e. r and N) are fixed and the choice is one of a
matching method. If the investigator knows the P2 response surface is linear
and wants only to estimate τ, the results in section 4 suggest that he can use
the simple mean-matching method described in section 4 and be confident
in many practical situations of removing most of the bias whenever r > 2.
If confidence in the linearity of the P2 response surface is lacking and/or
the investigator wants to use each matched pair to estimate the effect of
the treatment variable at a particular value of X, he would want to obtain
close individual matches as well as closely matched means. Results in section 5
indicate that in many practical situations the random order nearest available
pair-matching method can be used to remove a large proportion of the bias
in X while assigning close individual matches. The random order nearest
available pair-matching is extremely easy to perform since the G1 subjects
do not have to be ordered; yet, it does not appear to be inferior to either
high-low or low-high orderings and thus seems to be a reasonable choice in
practice.
If a computer is available, a matching often superior to that obtained
with the simple mean-matching or one random order nearest available
ACKNOWLEDGMENTS
This work was supported by the Office of Naval Research under contract
N00014-67A-0298-0017, NR-042-097 at the Department of Statistics, Harvard
University.
I wish to thank Professor William G. Cochran for many helpful suggestions
and criticisms on earlier drafts of this article. I would also like to thank the
referees for their helpful comments.
RESUME
[Translated from the French:] Several matching methods that match all of one sample from another,
larger sample on a continuous matching variable are compared with respect to their ability
to remove the bias of the matching variable. One of the methods is a mean-matching method
and three are nearest-pair matching methods. The abilities of the methods to remove bias
are also compared with the theoretical maximum given fixed distributions and fixed
sample sizes. A summary to aid the investigator is included.
REFERENCES
observations, we have that the expected value of the average of the N largest
from such a subset is
[The displayed expression and the remaining steps of the derivation are not legible in this scan.]
APPENDIX B
FORTRAN SUBROUTINES FOR NEAREST AVAILABLE PAIR MATCHING AND SIMPLE MEAN
MATCHING
N1 = N = size of G1
N2 = rN = size of initial G2
X1 = vector of length N giving matching variable scores for G1 sample,
i.e. 1st entry is first G1 subject's score
X2 = vector of length rN giving scores for G2 on matching variable
AV1 = x̄1.
D = x̄1. − x̄2. ; output for matched samples
D2 = (1/N) Σ (x1i − x2i)² ; output for matched samples
IG1 = vector giving ordering of G1 sample for nearest available matching
(a permutation of 1 ... N1)
IG2 = "current" ordering of G2 sample. After each call to a matching
APPENDIX C
THE QUALITY OF INDIVIDUAL MATCHES:
[Table entries are not legible in this scan.]
If perfectly matched, equals 00. If randomly matched from random samples, equals 100.