Some Comments on Cp
C. L. Mallows
To cite this article: C. L. Mallows (2000), "Some Comments on Cp," Technometrics, 42:1, 87-94.
DOI: 10.1080/00401706.2000.10485984
We discuss the interpretation of Cp-plots and show how they can be calibrated in several ways. We
comment on the practice of using the display as a basis for formal selection of a subset-regression
model, and extend the range of application of the device to encompass arbitrary linear estimates
of the regression coefficients, for example Ridge estimates.
will be a good estimate of η(x). In particular he may be interested in choosing a "subset least-squares" estimate in which some components of β are set at zero and the remainder estimated by least squares.

The Cp-plot is a graphical display device that helps the analyst to examine his data with this framework in mind. Consider a subset P of the set of indices K+ = {0, 1, 2, ..., k}; let Q be the complementary subset. Suppose the numbers of elements in P, Q are |P| = p, |Q| = q, so that p + q = k + 1. Denote by β̂_P the vector of estimates that is obtained when the coefficients with subscripts in P are estimated by least squares, the remaining coefficients being set equal to zero; i.e.

    β̂_P = X_P⁺ Y

where X_P⁺ is the (Moore-Penrose) generalized inverse of X_P, which in turn is obtained from X by replacing the columns having subscripts in Q by columns of zeroes. (Thus X_P⁺ has zeroes in the rows corresponding to Q, and the remaining rows are those of (Z_P^T Z_P)⁻¹ Z_P^T.) Also β_Q is β with the elements corresponding to P replaced by zeroes, and M_P = X X_P⁺ = X_P X_P⁺ = Z_P (Z_P^T Z_P)⁻¹ Z_P^T. The Cp statistic is defined to be

    C_P = RSS_P/σ̂² − n + 2p    (3)

where σ̂² is an estimate of σ². Clearly (as has been remarked by Kennard (1971)), C_P is a simple function of RSS_P, as are the multiple correlation coefficient defined by 1 − R²_P = RSS_P/TSS (where TSS is the total sum of squares) and the "adjusted" version of this. However the form (3) has the advantage (as has been shown by Gorman and Toman (1966), Daniel and Wood (1971), and Godfrey (1972)) that, since under the above assumptions

    E(RSS_P) = (n − p)σ² + B_P,    (4)

© 1973 American Statistical Association and the American Society for Quality
TECHNOMETRICS, FEBRUARY 2000, VOL. 42, NO. 1
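Definition (3) is simple to compute by brute force. The following sketch (a minimal illustration assuming numpy; the data are invented for the example) evaluates C_P for every subset containing the constant term, with σ̂² taken from the full equation so that C_{K+} = k + 1 exactly:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # column 0: constant term
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.5]) + rng.normal(size=n)

def rss(cols):
    # residual sum of squares after least-squares fit on the columns in P
    Xp = X[:, list(cols)]
    b, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    return float(np.sum((y - Xp @ b) ** 2))

K_plus = tuple(range(k + 1))
sigma2 = rss(K_plus) / (n - k - 1)      # sigma^2 estimated from the full equation

def cp(P):
    return rss(P) / sigma2 - n + 2 * len(P)   # equation (3)

# with sigma^2 = RSS_{K+}/(n - k - 1), the full model has C_{K+} = k + 1 exactly
assert abs(cp(K_plus) - (k + 1)) < 1e-8

# one C_P value per subset containing the constant term, plotted against p = |P|
points = {(0,) + P: cp((0,) + P)
          for r in range(k + 1)
          for P in itertools.combinations(range(1, k + 1), r)}
```

Plotting each value in `points` against the abscissa p = |P| gives the Cp-plot discussed below.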
Cp is an estimate of E(J_P), and is suitably standardized for graphical display, plotted against p. Graphical presentation of the various regression sums of squares themselves against p was advocated by Watts (1965). For k not too large it is feasible (Furnival 1971; Garside 1965; Schatzoff, Tsao, and Fienberg 1968) to compute and display all the 2^(k+1) values of C_P; for larger values one can use algorithms of Beale, Kendall, and Mann (1967), Hocking and Leslie (1967), and La Motte and Hocking (1970) to compute only the more interesting (smaller) values.

In section 2 we describe some of the configurations that can arise; in section 3 we provide some formal calibration for the display and in section 4 comment on the practice of using it as a basis for formal selection. The approach is extended in section 5 to handle arbitrary linear estimates of the regression coefficients.

The approach can also be extended to handle multivariate response data and to deal with an arbitrary weight function w(x) in factor-space, describing a region of interest different from that indicated by the configuration of the data currently to hand. In each case, the derivation is exactly parallel to that given above. In the former case, one obtains a matrix analog of C_P in the form Σ̂⁻¹·RSS_P − (n − 2p)I, where Σ̂ is an estimate of the residual covariance matrix, and RSS_P is Σ_u (y_u − x_u β̂_P)^T (y_u − x_u β̂_P). One or more measures of the "size" of C_P (such as the trace, or largest eigenvalue) can be plotted against p. In the latter case, with the matrix A = (A_ij) defined by A_ij = ∫ x_i x_j w(x) dx, one arrives at a statistic of the form

    C_P^A = (β̂_P − β̂)^T A (β̂_P − β̂)/σ̂² − V_{K+}^A + 2V_P^A,

where V_{K+}^A = trace(A(X^T X)⁻¹), V_P^A = trace(A X_P⁺ (X_P⁺)^T), and we can plot C_P^A against V_P^A. This reduces to the Cp-plot when A = X^T X. If interest is concentrated at a single point x, we have A = x x^T, and the statistic σ̂² C_P^A is equivalent to that suggested by Allen (1971); his equation (9) = σ̂²(C_P^A − x(X^T X)⁻¹ x^T).

2. SOME CONFIGURATIONS ON Cp-PLOTS

From (2), (3), (4) we see that if β_Q = 0, so that the P-subset model is in fact completely appropriate, then E(RSS_P) = (n − p)σ² and E(C_P) ≈ p. If σ̂² is taken as RSS_{K+}/(n − k − 1), then C_{K+} = |K+| = k + 1 exactly.

Notice that if P* is a (p + 1)-element subset which contains P, then

    C_{P*} − C_P = 2 − SS/σ̂²    (5)

where SS is the one-d.f. contribution to the regression sum of squares due to the (p + 1)-th variable, so that SS/σ̂² is a t² statistic that could be used in a stepwise testing algorithm. If the additional variable is unimportant, i.e. if the bias contribution B_P − B_{P*} is small, then E(SS) ≈ σ² and so

    E(C_{P*} − C_P) ≈ 1.

Mantel (1970) has discussed the use of stepwise procedures, and how they behave in the face of various patterns of correlation amongst the independent variables. It is illuminating to consider how patterns similar to those he describes would show up on a Cp-plot.

First, suppose the independent variables are not highly correlated, that β = β_P, and that every non-zero element of β is large (relative to the standard error of its least-squares estimate). Then the Cp-plot will look something like Figure 1 (drawn for the case p = k − 2, K+ − P = {1, 2, 3}). Notice the approximately linear diagonal configuration of points corresponding to the well-fitting subsets of variables.

Now, suppose x1, x2, x3 are highly correlated with each other, with each being about equally correlated with y. Then any two of these variables, but not all three, can be deleted from the model without much effect. In this case the relevant points on the Cp-plot will look something like Figure 2a, if no other variables are of importance, or like Figure 2b if some other subset P is also needed. (In all these examples we are assuming that the constant term β₀ is always needed.) Notice that now the diagonal pattern is incomplete. In an intermediate case, when x1, x2, x3 have moderate correlations, a picture intermediate between Figures 1 and 2b will be obtained.

Thirdly, suppose x1, x2 are individually unimportant but jointly are quite effective in reducing the residual sum of squares; suppose some further subset P of variables is also needed. Mantel gives an explicit example of this behavior. Figure 3 shows the resulting configuration in the case |P| = k − 4. Notice that even if C_{P,1,2} is the smallest Cp-value for subsets of size p + 2, there might be subsets P′₁, P′₂ (not containing P) with |P′ᵢ| = p or p + 1 that gave smaller values of C_P than those for P, {P, 1}, {P, 2}. In this case an upward stepwise testing algorithm might be led to include variables in these subsets and so not get to the subset {P, 1, 2}. Mantel describes a situation where this would happen.

Figure 1. Cp-plot: P Is an Adequate Subset.
Figure 2a. Cp-plot: Variables 1, 2, 3 Are Highly Explanatory, Also Highly Correlated.
Figure 2b. Cp-plot: Same as 2a Except That Variables in P Are Also Explanatory.
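The second configuration above can be reproduced in a small simulation. This sketch (assuming numpy; the synthetic data are invented to mimic the Figure 2a scenario) makes x1, x2, x3 nearly collinear and about equally correlated with y, then checks that retaining any one of them gives a small Cp while deleting all three does not:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
z = rng.normal(size=n)
# three highly correlated predictors, each about equally correlated with y
X = np.column_stack([np.ones(n)] + [z + 0.05 * rng.normal(size=n) for _ in range(k)])
y = z + 0.5 * rng.normal(size=n)

def rss(cols):
    Xp = X[:, list(cols)]
    b, *_ = np.linalg.lstsq(Xp, y, rcond=None)
    return float(np.sum((y - Xp @ b) ** 2))

sigma2 = rss(range(k + 1)) / (n - k - 1)   # sigma^2 from the full equation

def cp(P):
    return rss(P) / sigma2 - n + 2 * len(P)   # equation (3)

# any two of x1, x2, x3 can be deleted without much effect on Cp...
small = [cp((0, j)) for j in (1, 2, 3)]
# ...but deleting all three leaves only the constant term and a huge Cp
big = cp((0,))
assert max(small) < big / 10
```

On a Cp-plot the three subsets {0, j} would sit near the diagonal at p = 2, while {0} sits far above it — the incomplete diagonal pattern of Figure 2a.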
3. CALIBRATION

To derive bench marks for more formal interpretation of Cp-plots, we assume that the model (1) is in fact exactly appropriate, with the residuals e_1, ..., e_n being independent and Normal (0, σ²). Suppose σ² is estimated by RSS_{K+}/ν where ν = n − k − 1, the residual degrees of freedom. We do not of course recommend that the following distributional results be used blindly without careful inspection of the empirical residuals y_i − η̂(x_i), i = 1, ..., n. However, they should give pause to workers who are tempted to assign significance to quantities of the magnitude of a few units or even fractions of a unit on the Cp scale.

First, notice that the increment C_{P*} − C_P (in (5) above) is distributed as 2 − t², where the t-statistic t is central if β = β_{P*}. In this case this increment has mean and variance of approximately 1 and 2 respectively. Similarly,

    C_{K+} − C_P = k + 1 − C_P = q(2 − F_{q,ν})    (6)

where q = k + 1 − p and the F statistic is central if β = β_P; thus if ν is large compared with q this increment has mean and variance approximately q and 2q respectively. The variance of the slope of the line joining the points (p, C_P), (k + 1, k + 1) is thus 2/q, so that the slope of a diagonal configuration such as is shown in Figure 1 will vary considerably about 45°. The following tables (derived from (6)) give values of C_P − p that will be exceeded with probability α when the subset P is in fact adequate (i.e. when β = β_P so that B_P = 0), for the cases ν = n − k − 1 = 30, ∞. The value tabulated is q(F_{q,ν}(α) − 1).

Table 1a. Values of C_P − p That Are Exceeded With Probability α When β = β_P; q = k + 1 − p, ν = 30

q = k+1−p     1     2     3     4     5     6     7     8     9    10    15    20    30
α = .10    1.88  2.98  3.83  4.57  5.25  5.88  6.49  7.07  7.64  8.20 10.83 13.35 18.19
    .05    3.17  4.63  5.77  6.76  7.67  8.52  9.34 10.13 10.90 11.65 15.22 18.63 25.23
    .01    6.56  8.78 10.53 12.07 13.50 14.84 16.13 17.38 18.60 19.79 25.20 30.97 41.58

For comparing two Cp-values corresponding to subsets P, P′ with P ∩ P′ = B, P = A ∪ B, P′ = A′ ∪ B, it is straightforward to derive the results, valid under the null hypothesis that each of P and P′ is an adequate subset,

    E(C_P − C_{P′}) ≈ |P| − |P′| = |A| − |A′|
    Var(C_P − C_{P′}) ≈ 2(|A| + |A′| − 2R²)

where R² is the sum of squares of the canonical correlations between the sets of variables X_A and X_{A′}, after partialling out the variables X_B. (Thus if |B| = |P| − 1, Var(C_P − C_{P′}) = 4(1 − ρ²) where ρ is the partial correlation coefficient ρ_{AA′·B}.) [Srikantan (1970) has proposed the average, rather than the sum, of the squared canonical correlations as an overall measure of association. This measure has the property that its value is changed when a new variable, completely uncorrelated with all the previous ones, is added to one of the sets of variates.]

We now use the Scheffé confidence ellipsoid to derive a different kind of result. Let us write β̂^T = (β̂₀, β̂_K^T) for the least-squares estimate of β^T = (β₀, β_K^T), and let

    X^T X = ( n    m^T
              m    D_K ).

Then the Scheffé 100α% confidence ellipsoid for the elements of β_K is the region

    S_α = {β_K : (β_K − β̂_K)^T D_K (β_K − β̂_K) ≤ k σ̂² F_α}    (7)

where F_α is the upper 100α% quantile of the F distribution on k, n − k − 1 degrees of freedom. Notice that S_α can be written

    S_α = {β_K : (1/σ̂)(β_K − β̂_K) ∈ S*_α}

where S*_α is a fixed ellipsoid centered at the origin:

    S*_α = {γ : γ^T D_K γ ≤ k F_α}.

Let P⁻, Q be any complementary subsets of K, P = {0, P⁻}. The following lemma is proved in the Appendix.

Lemma. The following statements are equivalent:

(i) the region S_α intersects the coordinate hyperplane H_P = {β_K : β_Q = 0};
(ii) the projection of S_α onto the H_Q hyperplane contains the origin;
(iii) the subset least squares estimate β̂_P = (β̂₀, β̂_{P⁻}) has β̂_{P⁻} in S_α;
(iv) C_P ≤ 2p − k − 1 + k F_α;
(v) RSS_P − RSS_{K+} ≤ k σ̂² F_α.

Now consider any hypothesis that specifies the value of β_K, and the corresponding 100α% acceptance region

    T_{β_K} = {β̂_K : (1/σ̂)(β̂_K − β_K) ∈ S*_α}    (8)

(clearly P(β̂_K ∈ T_{β_K} | β_K) is in fact equal to α; this is just the confidence property of the Scheffé ellipsoid (7)). Starting from this family of acceptance regions for hypotheses that specify β_K completely, a natural acceptance region for a composite hypothesis of the form β_Q = 0 is given by the union of all regions T_{β_K} for values of β_K such that β_Q = 0; the reasoning is that the hypothesis β_Q = 0 cannot be rejected if there is any β_K with β_Q = 0 that is acceptable according to the corresponding test in the family, i.e., if there is any β_K with β_Q = 0 lying within the confidence ellipsoid S_α. By the Lemma, the corresponding acceptable subsets {0, P⁻} are just those that have

    C_P ≤ 2p − k − 1 + k F_α.    (9)

We state the property formally:

A subset P = {0, P⁻} satisfies (9) if and only if there is some vector of coefficients β having β_Q = 0 that lies within the Scheffé ellipsoid (7), i.e. if and only if there is some vector of this form that is accepted by the corresponding test with acceptance region of the form (8).

As an example, consider the 10-variable data studied by Gorman and Toman (1966). Taking α = 0.10, k = 10, ν = 25, we find that among the 58 subsets for which Gorman and Toman computed Cp-values, there are 39 that satisfy (9), in number 7, 13, 9, 10 with p = 7, 8, 9, 10 respectively. This result gives little support to the view that this set of data is sending a clear message regarding the relative importance of the variables under consideration.

Notice that if the true coefficient vector β* has β*_Q = 0, then Pr{for all P containing P*, C_P ≤ 2p − k − 1 + kF_α} ≥ α, with equality only if p* = 1 (i.e. P* = {0}). This property of the procedure is not completely satisfying since it is not an equality; also the form of the boundary in the Cp-plot is inflexible. In theory, one way of getting a better result is the following. Given any subset P* and a sequence of constants c_1, c_2, ..., c_k (and the matrix D_K) one could compute the probability Pr{for all P containing P*, C_P ≤ c_p}; this probability depends on c_1, ..., c_k, P*, and D_K, but not on any other parameters. One could then adjust c_1, ..., c_k so as to make the minimum of this probability over all choices of P* (or possibly only over all choices with p* ≥ some p₀) equal to some desired level α. The computation would presumably be done by simulation.

Starting from the Scheffé ellipsoid, Spjøtvoll (1972) has developed a multiple-comparison approach that provides confidence intervals for arbitrary quadratic functions of the unknown regression parameters, for example B_P − B_{P′} for two subsets P, P′.

Figure 3. Cp-plot: Two Variables That Jointly Are Explanatory But Separately Are Not.

4. FORMAL SELECTION OF SUBSET REGRESSIONS

Many authors have studied the problem of giving for-
Table 1b. Values of C_P − p That Are Exceeded With Probability α When β = β_P; q = k + 1 − p, ν = ∞

q = k+1−p     1     2     3     4     5     6     7     8     9    10    15    20    30
α = .10    1.71  2.61  3.25  3.78  4.24  4.65  5.02  5.36  5.68  5.99  7.31  8.41 10.26
    .05    2.84  3.99  4.82  5.49  6.07  6.59  7.07  7.51  7.92  8.31 10.00 11.41 13.77
    .01    5.63  7.21  8.34  9.28 10.09 10.81 11.48 12.09 12.67 13.21 15.58 17.57 20.89
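The table entries can be regenerated directly from (6): the tabulated value is q(F_{q,ν}(α) − 1), where F_{q,ν}(α) is the point exceeded with probability α, and the ν = ∞ column follows from the χ²_q limit of qF_{q,ν}. A sketch assuming scipy:

```python
from scipy.stats import chi2, f

def table_entry(q, alpha, nu=30):
    """q * (F_{q,nu}(alpha) - 1); as nu -> inf, q*F_{q,nu} tends to chi-squared on q d.f."""
    if nu == float("inf"):
        return chi2.ppf(1 - alpha, q) - q
    return q * (f.ppf(1 - alpha, q, nu) - 1)

print(round(table_entry(1, 0.10, 30), 2))            # Table 1a, q = 1, alpha = .10 -> 1.88
print(round(table_entry(2, 0.05, 30), 2))            # Table 1a, q = 2, alpha = .05 -> 4.63
print(round(table_entry(1, 0.01, float("inf")), 2))  # Table 1b, q = 1, alpha = .01 -> 5.63
```

The same routine reproduces every entry of both tables, one (q, α) pair at a time.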
We can handle in detail only the case of orthogonal regressors, and so now assume X^T X = nI. In this case we see from (5) that C_P is minimized when P contains just those terms for which t_j² > 2, where t_j = √n·b_j/σ̂ is the t-statistic for the j-th regression coefficient, and b_j is the least squares estimate, b_j = Σ_u x_{uj} y_u / n. Thus in this case the "minimize Cp" rule is equivalent to a stepwise regression algorithm in which all critical t-values are set at √2 and σ̂² is kept at the full-equation value throughout.

Now let us assume that n is sufficiently large that variation in σ̂ can be ignored; then t_0, t_1, ..., t_k will be independent Normal variables with unit variances and with means τ_0, ..., τ_k where τ_j = √n·β_j/σ. Let d(t) be the function that equals 0 for |t| ≤ √2, and equals 1 otherwise; then J for the "minimum-Cp subset least squares" estimate can be written

    J = (1/σ²) Σ_{u=1}^n (η̂(x_u) − η(x_u))²,

which reduces to

    J_MinCp = Σ_{j=0}^k (t_j d(t_j) − τ_j)².

Hence

    E(J_MinCp) = 1 + Σ_{j=1}^k m(|τ_j|),

where m(τ) = E{(t d(t) − τ)²} with t distributed as Normal (τ, 1). Notice that the function m(τ) is less than 1 only for |τ| < .78, and rises to a maximum value of 1.65 at |τ| = 1.88. It exceeds 1.25 for 1.05 < |τ| < 3.05.

We reiterate that in this case of orthogonal regressors with n very large, the "minimum Cp" rule is equivalent to a stepwise regression algorithm with all critical levels set at 15.73%. Also shown in Figure 4 are the m-functions corresponding to several other critical levels; when all k + 1 terms are infallibly included (the "full-l.s." rule), m(τ) = 1 for all τ, so that E(J_full l.s.) = k + 1. We see that the "minimum Cp" rule will give a smaller value for E(J) than the "full-l.s." rule only when rather more of the true regression coefficients satisfy |τ| < .78 than satisfy |τ| > 1; in the worst case, with |τ_j| = 1.88 for j = 1, ..., k, E(J) for the "minimize Cp" rule is 165% of that for the "full-l.s." rule. Similarly for rejection rules with other critical levels; in particular, a rule with a nominal level of 5% (two-tailed) gives an E(J) at worst 246% of that of the "full-l.s." rule.

Thus using the "minimum Cp" rule to select a subset of terms for least-squares fitting cannot be recommended universally. Notice however that by examining the Cp-plot in the light of the distributional results of the previous section one can see whether or not a single best subset is uniquely indicated; the ambiguous cases where the "minimum Cp" rule will give bad results are exactly those where a large number of subsets are close competitors for the honor. With such data no selection rule can be expected to perform reliably.

Figure 4. m-Functions.
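The constants quoted above are easy to check. The sketch below (assuming scipy) evaluates m(τ) = E{(t d(t) − τ)²} for t ∼ Normal(τ, 1) in closed form, by splitting the expectation over the deletion region |t| ≤ √2 and its complement:

```python
from math import sqrt
from scipy.stats import norm

C = sqrt(2.0)   # critical |t| for the "minimize Cp" rule

def m(tau):
    """E{(t*d(t) - tau)^2} for t ~ N(tau, 1), where d(t) = 1 iff |t| > sqrt(2).

    On |t| <= C the summand is tau^2; on |t| > C it is (t - tau)^2.
    With z = t - tau, a = C - tau, b = -C - tau:
      E{z^2; z > a} = a*phi(a) + 1 - Phi(a),   E{z^2; z < b} = Phi(b) - b*phi(b).
    """
    a, b = C - tau, -C - tau
    keep = tau ** 2 * (norm.cdf(a) - norm.cdf(b))
    upper = a * norm.pdf(a) + 1 - norm.cdf(a)
    lower = norm.cdf(b) - b * norm.pdf(b)
    return keep + upper + lower

# critical level implied by |t| > sqrt(2): 2(1 - Phi(sqrt(2))) = 15.73%
level = 2 * (1 - norm.cdf(C))

assert abs(m(1.88) - 1.65) < 0.005   # maximum of m, as quoted in the text
assert abs(m(0.78) - 1.0) < 0.01     # m crosses 1 near |tau| = .78
assert abs(level - 0.1573) < 0.0005
```

Evaluating m on a grid of τ values reproduces the "minimum Cp" curve of Figure 4.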
    V_L = 1 + tr(X^T X L L^T),

    B_L = β_K^T (LX − I)^T X^T X (LX − I) β_K,

    (1 + f)/f = Σ_{j=1}^k β_j² / (k σ̂²);

the adjusted estimates are then given by

    b_j* = (1 − k/Σ_i t_i²) b_j,    j = 1, ..., k.    (14)

It is interesting that this set of estimates is of the form suggested by Stein (1960) for the problem of estimating regression coefficients in a multivariate Normal distribution. James and Stein (1961) showed that for k ≥ 3 the vector of estimates β** obtained by replacing the multiplier k in (14) by any number between 0 and 2k − 4 has the property that E(J**) is less than the full-least-squares value k + 1 (see Hoerl and Kennard 1970b), for all values of the true regression coefficients. Thus our "minimize C_L" rule dominates full least-squares for k ≥ 4. This result stands in interesting contrast to the disappointing result found above for the "minimize Cp" rule.

Now, consider the case of equi-correlated regressors, with X^T X = I + ρ(11^T − I). In this case the least-squares estimate b̄ of β̄ = Σ_j β_j/k has variance 1/k(1 − ρ + kρ), and the vector of deviations (b_j − b̄) has covariance matrix (I − k⁻¹11^T)/(1 − ρ). Thus when ρ is large, these deviations become very unstable. It is found that for ρ near unity, C_L is minimized when f is near (1 − ρ)g. The adjusted estimates are given approximately by

    b_i* ≈ b̄ + (1 − f)(b_i − b̄).

Thus here the "minimize C_L" rule leads to shrinking the least-squares estimates towards their average. While the details have not been fully worked out, one expects that this rule will dominate full least-squares for k ≥ 5.

To handle the case of non-orthogonal regressors, Sclove (1968) has suggested transforming to orthogonality before applying a shrinkage factor. A composite procedure with much intuitive appeal for this writer would be to use the Cp plot or some similar device to identify the terms that should certainly be included (since they appear in all subsets that give reasonably good fits to the data), to fit these by least squares, and to adjust the remaining estimates by orthogonalizing and shrinking towards zero as in La Motte and Hocking (1970).

6. ACKNOWLEDGMENTS

It is a great personal pleasure to recall that the idea for the Cp-plot arose in the course of some discussions with Cuthbert Daniel around Christmas 1963. The use of the letter C is intended to do him honor. The device was described publicly in Mallows (1964) and again in Mallows (1966) (with the extensions described at the end of section 1 above) and has appeared in several unpublished manuscripts. Impetus for preparing the present exposition was gained in the course of delivering a series of lectures at the University of California at Berkeley in February 1972; their support is gratefully acknowledged.

APPENDIX

Proof of the Lemma

The key to these results is the identity, true for any subset P that includes 0, i.e. P = {0, P⁻},

    RSS_P − RSS_{K+} = (β̂_{P⁻} − β̂_K)^T D_K (β̂_{P⁻} − β̂_K)

where β̂^T = (β̂₀, β̂_K^T) is the vector of least-squares estimates of all the coefficients in the model, and (β̂₀, β̂_{P⁻}) is the vector of subset-least-squares estimates. From the form of S_α (7) it now follows that (iii) β̂_{P⁻} is in S_α if and only if (v) RSS_P − RSS_{K+} ≤ k σ̂² F_α, which is directly equivalent (if σ̂² = RSS_{K+}/(n − k − 1)) to (iv) C_P ≤ 2p − k − 1 + k F_α. Clearly (iii) implies (i); to prove the converse we remark that for any vector β^T = (β₀, β_K^T) with β_K in the hyperplane H_P = {β_K : β_Q = 0}, we have

    ‖Xβ − Xβ̂‖² = ‖Xβ − Xβ̂_P‖² + ‖Xβ̂_P − Xβ̂‖²,

the cross-product term vanishing by definition of β̂_P. Thus if any point of H_P is in S_α, β̂_P must be. Finally, (i) is directly equivalent to (ii) by a simple geometrical argument.

REFERENCES

Allen, D. M. (1971), "Mean Square Error of Prediction as a Criterion for Selecting Variables," Technometrics, 13, 469-475.
Beale, E. M. L., Kendall, M. G., and Mann, D. W. (1967), "The Discarding of Variables in Multivariate Analysis," Biometrika, 54, 357-366.
Daniel, C., and Wood, F. S. (1971), Fitting Equations to Data, New York: Wiley-Interscience.
Furnival, G. M. (1971), "All Possible Regressions With Less Computation," Technometrics, 13, 403-408.
Garside, M. J. (1965), "The Best Subset in Multiple Regression Analysis," Applied Statistics, 14, 196-200.
Godfrey, M. B. (1972), "Relations Between Cp, RSS, and Mean Square Residual," unpublished manuscript submitted to Technometrics.
Gorman, J. W., and Toman, R. J. (1966), "Selection of Variables for Fitting Equations to Data," Technometrics, 8, 27-51.
Hocking, R. R., and Leslie, R. N. (1967), "Selection of the Best Subset in Regression Analysis," Technometrics, 9, 531-540.
Hoerl, A. E., and Kennard, R. W. (1970a), "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics, 12, 55-67.
——— (1970b), "Ridge Regression: Applications to Nonorthogonal Problems," Technometrics, 12, 69-82.
James, W., and Stein, C. (1961), "Estimation With Quadratic Loss," in Proceedings of the Fourth Berkeley Symposium, Berkeley: University of California Press, pp. 361-379.
Kennard, R. W. (1971), "A Note on the Cp Statistic," Technometrics, 13, 899-900.
Kennedy, W. J., and Bancroft, T. A. (1971), "Model-Building for Prediction in Regression Based on Repeated Significance Tests," The Annals of Mathematical Statistics, 42, 1273-1284.
La Motte, L. R., and Hocking, R. R. (1970), "Computational Efficiency in the Selection of Regression Variables," Technometrics, 12, 83-93.
Lindley, D. V. (1968), "The Choice of Variables in Multiple Regression," Journal of the Royal Statistical Society, Ser. B, 30, 31-53 (Discussion, 54-66).
Mallows, C. L. (1964), "Choosing Variables in a Linear Regression: A Graphical Aid," unpublished paper presented at the Central Regional Meeting of the Institute of Mathematical Statistics, Manhattan, KS, May 7-9.
——— (1966), "Choosing a Subset Regression," unpublished paper presented at the Annual Meeting of the American Statistical Association, Los Angeles, August 15-19.
Mantel, N. (1970), "Why Stepdown Procedures in Variable Selection," Technometrics, 12, 621-625.
Schatzoff, M., Tsao, R., and Fienberg, S. (1968), "Efficient Calculation of All Possible Regressions," Technometrics, 10, 769-779.
Sclove, S. L. (1968), "Improved Estimators for Coefficients in Linear Regression," Journal of the American Statistical Association, 63, 596-606.
Spjøtvoll, E. (1972), "Multiple Comparison of Regression Functions," The Annals of Mathematical Statistics, 43, 1076-1088.
Srikantan, K. S. (1970), "Canonical Association Between Nominal Measurements," Journal of the American Statistical Association, 65, 284-292.
Stein, C. (1960), "Multiple Regression," in Contributions to Probability and Statistics, ed. I. Olkin, Stanford, CA: Stanford University Press, pp. 424-443.
Theil, H. (1963), "On the Use of Incomplete Prior Information in Regression Analysis," Journal of the American Statistical Association, 58, 401-414.
Watts, H. W. (1965), "The Test-o-Gram; A Pedagogical and Presentational Device," The American Statistician, 19, 22-28.
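The identity at the heart of the proof is easy to verify numerically. The sketch below (assuming numpy and, for simplicity, centered predictor columns so that the off-diagonal block m of X^T X vanishes) checks both the Pythagorean decomposition and the D_K form of RSS_P − RSS_{K+}:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 5
Z = rng.normal(size=(n, k))
Z -= Z.mean(axis=0)             # centred predictors: m = 0 in the partition of X^T X
X = np.column_stack([np.ones(n), Z])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

D_K = Z.T @ Z                   # lower-right block of X^T X

def ls(cols):
    # full-length coefficient vector with zeroes in the positions of Q
    b = np.zeros(k + 1)
    b[list(cols)] = np.linalg.lstsq(X[:, list(cols)], y, rcond=None)[0]
    return b

full = tuple(range(k + 1))
P = (0, 1, 3)                   # a subset containing the constant term
b_full, b_P = ls(full), ls(P)
rss = lambda b: float(np.sum((y - X @ b) ** 2))

# Pythagorean decomposition: RSS_P - RSS_{K+} = ||X b_P - X b_full||^2
lhs = rss(b_P) - rss(b_full)
assert abs(lhs - float(np.sum((X @ (b_P - b_full)) ** 2))) < 1e-6

# D_K form used in the Appendix (with centred columns the intercepts agree)
d = (b_P - b_full)[1:]          # deviation of the K-components
assert abs(lhs - float(d @ D_K @ d)) < 1e-6
```

Running the same check for every subset P containing index 0 confirms the identity underlying statements (iv) and (v) of the Lemma.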