
Technometrics
ISSN: 0040-1706 (Print), 1537-2723 (Online). Journal homepage: www.tandfonline.com/journals/utch20

To cite this article: C. L. Mallows (2000), "Some Comments on Cp," Technometrics, 42:1, 87-94, DOI: 10.1080/00401706.2000.10485984
To link to this article: https://doi.org/10.1080/00401706.2000.10485984

Published online: 12 Mar 2012.
© 1973 American Statistical Association and the American Society for Quality. Reprinted as Technometrics, February 2000, Vol. 42, No. 1, 87-94.

Some Comments on Cp
C. L. MALLOWS
Bell Laboratories
Murray Hill, New Jersey
(AT&T Research Laboratories, Florham Park, NJ 07932)

We discuss the interpretation of Cp-plots and show how they can be calibrated in several ways. We
comment on the practice of using the display as a basis for formal selection of a subset-regression
model, and extend the range of application of the device to encompass arbitrary linear estimates
of the regression coefficients, for example Ridge estimates.

KEY WORDS: Linear regression; Selection of variables; Ridge regression.

1. INTRODUCTION

Suppose that we have data consisting of n observations on each of k + 1 variables, namely k independent variables x_1, ..., x_k and one dependent variable, y. Write x_0 = 1, x(1 × (k+1)) = (x_0, x_1, ..., x_k), y(n × 1) = (y_1, ..., y_n)^T, X(n × (k+1)) = (x_{uj}). A model of the form

    y_u = η(x_u) + e_u,    u = 1, 2, ..., n    (1)

is to be entertained (if x_0 is absent the development is entirely similar; we assume throughout that X has rank k + 1), with the residuals e_1, ..., e_n being regarded (tentatively) as being independent random variables with mean zero and unknown common variance σ². The x's are not to be regarded as being sampled randomly from some population, but rather are to be taken as fixed design variables. We suppose that the statistician is interested in choosing an estimate β̂ = (β̂_0, ..., β̂_k), with the idea that for any point x in the general vicinity of the data at hand, the value xβ̂ will be a good estimate of η(x). In particular he may be interested in choosing a "subset least-squares" estimate in which some components of β̂ are set at zero and the remainder estimated by least squares.

The Cp-plot is a graphical display device that helps the analyst to examine his data with this framework in mind. Consider a subset P of the set of indices K+ = {0, 1, 2, ..., k}; let Q be the complementary subset. Suppose the numbers of elements in P, Q are |P| = p, |Q| = q, so that p + q = k + 1. Denote by β̂_P the vector of estimates that is obtained when the coefficients with subscripts in P are estimated by least squares, the remaining coefficients being set equal to zero; i.e.

    β̂_P = X_P⁺ y

where X_P⁺ is the (Moore-Penrose) generalized inverse of X_P, which in turn is obtained from X by replacing the columns having subscripts in Q by columns of zeroes. (Thus X_P⁺ has zeroes in the rows corresponding to Q, and the remaining rows contain the matrix (Z_P^T Z_P)⁻¹ Z_P^T, where Z_P is obtained from X by deleting the columns corresponding to Q.) Let RSS_P denote the corresponding residual sum of squares, i.e.

    RSS_P = Σ_u (y_u − x_u β̂_P)².

For any such estimate β̂_P, a measure of adequacy for prediction is the "scaled sum of squared errors"

    J_P = σ⁻² Σ_u (x_u β̂_P − η(x_u))²,

the expectation of which is easily found to be

    E(J_P) = V_P + σ⁻² B_P

where V_P, B_P are respectively "variance" and "bias" contributions given by

    V_P = p,    B_P = β_Q^T X^T (I − M_P) X β_Q    (2)

and β_Q is β with the elements corresponding to P replaced by zeroes, and M_P = X X_P⁺ = X_P X_P⁺ = Z_P (Z_P^T Z_P)⁻¹ Z_P^T. The Cp statistic is defined to be

    C_P = RSS_P / σ̂² − n + 2p    (3)

where σ̂² is an estimate of σ². Clearly (as has been remarked by Kennard (1971)), Cp is a simple function of RSS_P, as are the multiple correlation coefficient defined by 1 − R_P² = RSS_P / TSS (where TSS is the total sum of squares) and the "adjusted" version of this. However the form (3) has the advantage (as has been shown by Gorman and Toman (1966), Daniel and Wood (1971), and Godfrey (1972)) that, since under the above assumptions

    E(RSS_P) = (n − p)σ² + B_P,    (4)

Cp is an estimate of E(J_P), and is suitably standardized for graphical display, plotted against p.
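Equation (3) is straightforward to compute for all subsets at once. The following is a minimal Python sketch (ours, not the paper's; it assumes numpy is available, takes the first column of X to be the constant term, and estimates σ̂² by the full-model residual mean square, as in section 3 below); the function name is hypothetical.

    from itertools import combinations

    import numpy as np

    def cp_all_subsets(X, y):
        """C_P = RSS_P/sigma2 - n + 2p for every subset P containing column 0.

        X is the n x (k+1) design matrix whose first column is all ones;
        sigma2 is the residual mean square from the full least-squares fit.
        """
        n, k1 = X.shape
        beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss_full = float(((y - X @ beta_full) ** 2).sum())
        sigma2 = rss_full / (n - k1)            # sigma-hat squared
        cp = {}
        for extra in range(k1):                 # number of non-constant terms
            for P in combinations(range(1, k1), extra):
                cols = (0,) + P                 # the constant term is always kept
                b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
                rss = float(((y - X[:, cols] @ b) ** 2).sum())
                cp[cols] = rss / sigma2 - n + 2 * len(cols)
        return cp                               # plot cp[P] against p = len(P)

With this choice of σ̂², the full subset K+ gives C_{K+} = k + 1 exactly, as noted in section 2.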
Graphical presentation of the various regression sums of squares themselves against p was advocated by Watts (1965). For k not too large it is feasible (Furnival 1971; Garside 1965; Schatzoff, Tsao, and Fienberg 1968) to compute and display all the 2^{k+1} values of C_P; for larger values one can use algorithms of Beale, Kendall, and Mann (1967), Hocking and Leslie (1967), and La Motte and Hocking (1970) to compute only the more interesting (smaller) values.

In section 2 we describe some of the configurations that can arise; in section 3 we provide some formal calibration for the display and in section 4 comment on the practice of using it as a basis for formal selection. The approach is extended in section 5 to handle arbitrary linear estimates of the regression coefficients.

The approach can also be extended to handle multivariate response data and to deal with an arbitrary weight function w(x) in factor-space, describing a region of interest different from that indicated by the configuration of the data currently to hand. In each case, the derivation is exactly parallel to that given above. In the former case, one obtains a matrix analog of C_P in the form Σ̂⁻¹ RSS_P − (n − 2p)I, where Σ̂ is an estimate of the residual covariance matrix and RSS_P is Σ_u (y_u − x_u B̂_P)^T (y_u − x_u B̂_P). One or more measures of the "size" of C_P (such as the trace, or largest eigenvalue) can be plotted against p. In the latter case, with the matrix A = (A_{ij}) defined by A_{ij} = ∫ x_i x_j w(x) dx, one arrives at a statistic of the form

    C_P^A = σ̂⁻² (β̂_P − β̂)^T A (β̂_P − β̂) − V_{K+}^A + 2 V_P^A

where V_{K+}^A = trace(A (X^T X)⁻¹), V_P^A = trace(A (X_P^T X_P)⁺), and we can plot C_P^A against V_P^A. This reduces to the Cp-plot when A = X^T X. If interest is concentrated at a single point x, we have A = x^T x, and the statistic σ̂² C_P^A is equivalent to that suggested by Allen (1971); his equation (9) equals σ̂² (C_P^A − x (X^T X)⁻¹ x^T).
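As a concrete reading of this generalization, the sketch below (ours) computes C_P^A and V_P^A for a single subset. The use of the Moore-Penrose inverse of the zero-padded cross-product matrix for V_P^A is our reconstruction of the badly garbled original display, not a verified formula, though it does reduce to (C_P, p) when A = X^T X and σ̂² is the full-model residual mean square.

    import numpy as np

    def cp_weighted(X, y, cols, A, sigma2):
        """Weighted statistic C_P^A and its variance term V_P^A (reconstruction)."""
        beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        XP = np.zeros_like(X)
        XP[:, list(cols)] = X[:, list(cols)]     # columns with subscripts in Q zeroed
        beta_P = np.linalg.pinv(XP) @ y          # subset least squares, zero-padded
        d = beta_P - beta_full
        V_K = float(np.trace(A @ np.linalg.inv(X.T @ X)))
        V_P = float(np.trace(A @ np.linalg.pinv(XP.T @ XP)))
        cpa = float(d @ A @ d) / sigma2 - V_K + 2.0 * V_P
        return cpa, V_P                          # plot C_P^A against V_P^A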
2. SOME CONFIGURATIONS ON Cp-PLOTS

From (2), (3), (4) we see that if β_Q = 0, so that the P-subset model is in fact completely appropriate, then E(RSS_P) = (n − p)σ² and E(C_P) ≈ p. If σ̂² is taken as RSS_{K+}/(n − k − 1), then C_{K+} = |K+| = k + 1 exactly.

Notice that if P* is a (p + 1)-element subset which contains P, then

    C_{P*} − C_P = 2 − SS/σ̂²    (5)

where SS is the one-d.f. contribution to the regression sum of squares due to the (p + 1)-th variable, so that SS/σ̂² is a t² statistic that could be used in a stepwise testing algorithm. If the additional variable is unimportant, i.e. if the bias contribution B_P − B_{P*} is small, then E(SS) ≈ σ² and so

    E(C_{P*} − C_P) ≈ 1.

Mantel (1970) has discussed the use of stepwise procedures, and how they behave in the face of various patterns of correlation amongst the independent variables. It is illuminating to consider how patterns similar to those he describes would show up on a Cp-plot.

First, suppose the independent variables are not highly correlated, that β = β_P, and that every non-zero element of β is large (relative to the standard error of its least-squares estimate). Then the Cp-plot will look something like Figure 1 (drawn for the case p = k − 2, K+ − P = {1, 2, 3}). Notice the approximately linear diagonal configuration of points corresponding to the well-fitting subsets of variables.

Now, suppose x1, x2, x3 are highly correlated with each other, with each being about equally correlated with y. Then any two of these variables, but not all three, can be deleted from the model without much effect. In this case the relevant points on the Cp-plot will look something like Figure 2a if no other variables are of importance, or like Figure 2b if some other subset P is also needed. (In all these examples we are assuming that the constant term β_0 is always needed.) Notice that now the diagonal pattern is incomplete. In an intermediate case, when x1, x2, x3 have moderate correlations, a picture intermediate between Figures 1 and 2b will be obtained.

Thirdly, suppose x1, x2 are individually unimportant but jointly are quite effective in reducing the residual sum of squares; suppose some further subset P of variables is also needed. Mantel gives an explicit example of this behavior. Figure 3 shows the resulting configuration in the case |P| = k − 4. Notice that even if C_{P,1,2} is the smallest Cp-value for subsets of size p + 2, there might be subsets P'_1, P'_2 (not containing P) with |P'| = p or p + 1 that gave smaller values of Cp than those for P, {P, 1}, {P, 2}. In this case an upward stepwise testing algorithm might be led to include variables in these subsets and so not get to the subset {P, 1, 2}. Mantel describes a situation where this would happen. The second of these configurations is easy to reproduce by simulation, as in the sketch following the figure captions below.

[Figure 1. Cp-plot: P Is an Adequate Subset.]
[Figure 2a. Cp-plot: Variables 1, 2, 3 Are Highly Explanatory, Also Highly Correlated.]
[Figure 2b. Cp-plot: Same as 2a Except That Variables in P Are Also Explanatory.]
[Figure 3. Cp-plot: Two Variables That Jointly Are Explanatory But Separately Are Not.]
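A sketch of such a simulation (ours; the synthetic numbers are arbitrary, and cp_all_subsets is the function from the sketch in section 1):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    z = rng.normal(size=n)
    # three nearly collinear variables, each about equally correlated with y
    X = np.column_stack([np.ones(n)] +
                        [z + 0.05 * rng.normal(size=n) for _ in range(3)])
    y = z + rng.normal(size=n)

    cp = cp_all_subsets(X, y)
    for P in sorted(cp, key=cp.get)[:4]:
        print(P, round(cp[P], 2))   # subsets keeping any one of x1, x2, x3 fit well;
                                    # dropping all three does not
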


3. CALIBRATION

To derive bench marks for more formal interpretation of Cp-plots, we assume that the model (1) is in fact exactly appropriate, with the residuals e_1, ..., e_n being independent and Normal (0, σ²). Suppose σ² is estimated by RSS_{K+}/ν where ν = n − k − 1, the residual degrees of freedom. We do not of course recommend that the following distributional results be used blindly without careful inspection of the empirical residuals y_i − η̂(x_i), i = 1, ..., n. However, they should give pause to workers who are tempted to assign significance to quantities of the magnitude of a few units or even fractions of a unit on the Cp scale.

First, notice that the increment C_{P*} − C_P (in (5) above) is distributed as 2 − t², where the t statistic (on ν degrees of freedom) is central if β = β_{P*}. In this case this increment has mean and variance of approximately 1 and 2 respectively. Similarly,

    C_{K+} − C_P = k + 1 − C_P = q(2 − F_{q,ν})    (6)

where q = k + 1 − p and the F statistic is central if β = β_P; thus if ν is large compared with q this increment has mean and variance approximately q and 2q respectively. The variance of the slope of the line joining the points (p, C_P), (k + 1, k + 1) is thus 2/q, so that the slope of a diagonal configuration such as is shown in Figure 1 will vary considerably about 45°. The following tables (derived from (6)) give values of C_P − p that will be exceeded with probability α when the subset P is in fact adequate (i.e. when β = β_P so that B_P = 0), for the cases ν = n − k − 1 = 30, ∞. The value tabulated is q(F_{q,ν}(α) − 1).

Table 1a. Values of C_P − p That Are Exceeded With Probability α When β = β_P; q = k + 1 − p, ν = 30

    q = k+1−p   1     2     3     4     5     6     7     8     9    10    15    20    30
    α = .10    1.88  2.98  3.83  4.57  5.25  5.88  6.49  7.07  7.64  8.20 10.83 13.35 18.19
        .05    3.17  4.63  5.77  6.76  7.67  8.52  9.34 10.13 10.90 11.65 15.22 18.63 25.23
        .01    6.56  8.78 10.53 12.07 13.50 14.84 16.13 17.38 18.60 19.79 25.20 30.97 41.58

Table 1b. Values of C_P − p That Are Exceeded With Probability α When β = β_P; q = k + 1 − p, ν = ∞

    q = k+1−p   1     2     3     4     5     6     7     8     9    10    15    20    30
    α = .10    1.71  2.61  3.25  3.78  4.24  4.65  5.02  5.36  5.68  5.99  7.31  8.41 10.26
        .05    2.84  3.99  4.82  5.49  6.07  6.59  7.07  7.51  7.92  8.31 10.00 11.41 13.77
        .01    5.63  7.21  8.34  9.28 10.09 10.81 11.48 12.09 12.67 13.21 15.58 17.57 20.89

For comparing two Cp-values corresponding to subsets P, P' with P ∩ P' = B, P = A ∪ B, P' = A' ∪ B, it is straightforward to derive the results, valid under the null hypothesis that each of P and P' is an adequate subset,

    E(C_P − C_{P'}) = |P| − |P'| = |A| − |A'|
    Var(C_P − C_{P'}) ≈ 2(|A| + |A'| − 2R²)

where R² is the sum of squares of the canonical correlations between the sets of variables X_A and X_{A'}, after partialling out the variables X_B. (Thus if |B| = |P| − 1, Var(C_P − C_{P'}) = 4(1 − ρ²) where ρ is the partial correlation coefficient ρ_{AA'·B}.) [Srikantan (1970) has proposed the average, rather than the sum, of the squared canonical correlations as an overall measure of association. This measure has the property that its value is changed when a new variable, completely uncorrelated with all the previous ones, is added to one of the sets of variates.]
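The entries of Tables 1a and 1b can be recomputed directly; a sketch (ours) using scipy, where the ν = ∞ rows reduce to chi-squared quantiles since qF_{q,∞} is distributed as χ²_q:

    from scipy.stats import chi2, f

    def cp_benchmark(q, nu, alpha):
        """Value of C_P - p exceeded with probability alpha when beta = beta_P."""
        if nu == float("inf"):
            return chi2.ppf(1.0 - alpha, q) - q      # limit of q*(F_{q,nu}(alpha) - 1)
        return q * (f.ppf(1.0 - alpha, q, nu) - 1.0)

    print(round(cp_benchmark(1, 30, 0.10), 2))            # 1.88, as in Table 1a
    print(round(cp_benchmark(3, float("inf"), 0.01), 2))  # 8.34, as in Table 1b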


We now use the Scheffé confidence ellipsoid to derive a different kind of result. Let us write β̂^T = (β̂_0, β̂_K^T) for the least-squares estimate of β^T = (β_0, β_K^T), and let D_K denote the matrix of centered sums of squares and products of the independent variables: if X^T X is partitioned as

    X^T X = [ n    m^T ]
            [ m    E   ],

then D_K = E − m m^T / n. Then the Scheffé 100α% confidence ellipsoid for the elements of β_K is the region

    S_α = {β_K : (β_K − β̂_K)^T D_K (β_K − β̂_K) ≤ k σ̂² F_α}    (7)

where F_α is the upper 100α% quantile of the F distribution on k, n − k − 1 degrees of freedom. Notice that S_α can be written

    S_α = {β_K : σ̂⁻¹ (β_K − β̂_K) ∈ S*_α}

where S*_α is a fixed ellipsoid centered at the origin:

    S*_α = {γ : γ^T D_K γ ≤ k F_α}.

Let P−, Q be any complementary subsets of K, P = {0, P−}. The following lemma is proved in the Appendix.

Lemma. The following statements are equivalent:
(i) the region S_α intersects the coordinate hyperplane H_P = {β_K : β_Q = 0};
(ii) the projection of S_α onto the H_Q hyperplane contains the origin;
(iii) the subset least squares estimate β̂_P = (β̂_0, β̂_{P−}) has β̂_{P−} in S_α;
(iv) C_P ≤ 2p − k − 1 + kF_α;
(v) RSS_P − RSS_{K+} ≤ k σ̂² F_α.

Now consider any hypothesis that specifies the value of β_K, and the corresponding 100α% acceptance region

    T_{β_K} = {β̂_K : σ̂⁻¹ (β̂_K − β_K) ∈ S*_α}    (8)

(clearly Pr(β̂_K ∈ T_{β_K} | β_K) is in fact equal to α; this is just the confidence property of the Scheffé ellipsoid (7)). Starting from this family of acceptance regions for hypotheses that specify β_K completely, a natural acceptance region for a composite hypothesis of the form β_Q = 0 is given by the union of all regions T_{β_K} for values of β_K such that β_Q = 0; the reasoning is that the hypothesis β_Q = 0 cannot be rejected if there is any β_K with β_Q = 0 that is acceptable according to the corresponding test in the family, i.e., if there is any β_K with β_Q = 0 lying within the confidence ellipsoid S_α. By the Lemma, the corresponding acceptable subsets {0, P−} are just those that have

    C_P ≤ 2p − k − 1 + kF_α.    (9)

We state the property formally:

A subset P = {0, P−} satisfies (9) if and only if there is some vector of coefficients β having β_Q = 0 that lies within the Scheffé ellipsoid (7), i.e. if and only if there is some vector of this form that is accepted by the corresponding test with acceptance region of the form (8).

As an example, consider the 10-variable data studied by Gorman and Toman (1966). Taking α = 0.10, k = 10, ν = 25, we find that among the 58 subsets for which Gorman and Toman computed Cp-values, there are 39 that satisfy (9), in number 7, 13, 9, 10 with p = 7, 8, 9, 10 respectively. This result gives little support to the view that this set of data is sending a clear message regarding the relative importance of the variables under consideration.

Notice that if the true coefficient vector β* has β*_Q = 0, then Pr{for all P containing P*, C_P ≤ 2p − k − 1 + kF_α} ≥ α, with equality only if p* = 1 (i.e. P* = {0}). This property of the procedure is not completely satisfying since it is not an equality; also the form of the boundary in the Cp-plot is inflexible. In theory, one way of getting a better result is the following. Given any subset P* and a sequence of constants c_1, c_2, ..., c_k (and the matrix D_K) one could compute the probability Pr{for all P containing P*, C_P ≤ c_p}; this probability depends on c_1, ..., c_k, P* and D_K, but not on any other parameters. One could then adjust c_1, ..., c_k so as to make the minimum of this probability over all choices of P* (or possibly only over all choices with p* ≥ some p_0) equal to some desired level α. The computation would presumably be done by simulation.

Starting from the Scheffé ellipsoid, Spjøtvoll (1972) has developed a multiple-comparison approach that provides confidence intervals for arbitrary quadratic functions of the unknown regression parameters, for example B_P − B_{P'} for two subsets P, P'.
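Criterion (9) is a one-line filter on the output of the enumeration sketch in section 1; a sketch (ours, with hypothetical names) that, applied with α = 0.10, k = 10, ν = 25 to the Gorman-Toman Cp values, should reproduce the count of 39 acceptable subsets quoted above:

    from scipy.stats import f

    def acceptable_subsets(cp_values, k, nu, alpha):
        """Subsets P (tuples of indices including 0) not rejected by criterion (9):
        C_P <= 2p - k - 1 + k*F_alpha, F_alpha the upper-alpha quantile of F(k, nu).
        """
        f_alpha = f.ppf(1.0 - alpha, k, nu)
        return {P: cp for P, cp in cp_values.items()
                if cp <= 2 * len(P) - k - 1 + k * f_alpha}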
4. FORMAL SELECTION OF SUBSET REGRESSIONS

Many authors have studied the problem of giving formal rules for the selection of predictors; Kennedy and Bancroft (1971) give many references. Lindley (1968) presents a Bayesian formulation of the problem. The discussion in section 3 above does not lend any support to the practice of taking the lowest point on a Cp-plot as defining a "best" subset of terms. The present author feels that the greatest value of the device is that it helps the statistician to examine some aspects of the structure of his data and helps him to recognize the ambiguities that confront him. The device cannot be expected to provide a single "best" equation when the data are intrinsically inadequate to support such a strong inference.

To make these remarks more precise and objective, we shall compute (in a special case) a measure of the performance to be expected of the rule "choose the subset that minimizes Cp, and fit it by least-squares." We shall use as a figure of merit of an arbitrary estimator η̂(x) the same quantity as was used in setting up the Cp-plot, namely the sum of predictive squared errors

    J_η̂ = σ⁻² Σ_{u=1}^n (η̂(x_u) − η(x_u))².

We can handle in detail only the case of orthogonal regressors, and so now assume X^T X = nI. In this case we see from (5) that Cp is minimized when P contains just those terms for which t_j² > 2, where t_j = √n b_j / σ̂ is the t-statistic for the j-th regression coefficient, and b_j is the least squares estimate, b_j = Σ_u x_{uj} y_u / n. Thus in this case the "minimize Cp" rule is equivalent to a stepwise regression algorithm in which all critical t-values are set at √2 and σ̂² is kept at the full-equation value throughout.

Now let us assume that n is sufficiently large that variation in σ̂ can be ignored; then t_0, t_1, ..., t_k will be independent Normal variables with unit variances and with means τ_0, ..., τ_k where τ_j = √n β_j / σ. Let d(t) be the function that equals 0 for |t| ≤ √2, and equals 1 otherwise; then J for the "minimum-Cp subset least squares" estimate can be written

    J_MinCp = Σ_{j=0}^k (t_j d(t_j) − τ_j)².

Hence

    E(J_MinCp) = Σ_{j=0}^k m(τ_j)

where m(τ) = E[((u + τ) d(u + τ) − τ)²] (where u is a standard Normal variable), and is the function displayed in Figure 4 (labelled "16%", since Pr{|u| > √2} = .1573). If the constant term is always to be included in the selected subset, the corresponding result is

    E(J_MinCp{0,P−}) = 1 + Σ_{j=1}^k m(τ_j).

Notice that the function m(τ) is less than 1 only for |τ| < .78, and rises to a maximum value of 1.65 at |τ| = 1.88. It exceeds 1.25 for 1.05 < |τ| < 3.05.

We reiterate that in this case of orthogonal regressors with n very large, the "minimum Cp" rule is equivalent to a stepwise regression algorithm with all critical levels set at 15.73%. Also shown in Figure 4 are the m-functions corresponding to several other critical levels; when all k + 1 terms are infallibly included (the "full-l.s." rule), m(τ) = 1 for all τ, so that E(J_full l.s.) = k + 1. We see that the "minimum Cp" rule will give a smaller value for E(J) than the "full-l.s." rule only when rather more of the true regression coefficients satisfy |τ| < .78 than satisfy |τ| > 1; in the worst case with |τ_j| = 1.88 for j = 1, ..., k, E(J) for the "minimize Cp" rule is 165% of that for the "full-l.s." rule. Similarly for rejection rules with other critical levels; in particular, a rule with a nominal level of 5% (two tailed) gives an E(J) at worst 246% of that of the "full-l.s." rule.

Thus using the "minimum Cp" rule to select a subset of terms for least-squares fitting cannot be recommended universally. Notice however that by examining the Cp-plot in the light of the distributional results of the previous section one can see whether or not a single best subset is uniquely indicated; the ambiguous cases where the "minimum Cp" rule will give bad results are exactly those where a large number of subsets are close competitors for the honor. With such data no selection rule can be expected to perform reliably.

[Figure 4. m-functions; horizontal axis: standardized regression coefficient, τ.]
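The function m(τ) can be evaluated by direct quadrature against the Normal density; a sketch (ours) that checks the landmarks quoted above:

    import numpy as np
    from scipy.stats import norm

    def m(tau, half_width=8.0, steps=16001):
        """m(tau) = E[((u + tau) d(u + tau) - tau)^2], d(t) = 1{|t| > sqrt(2)}."""
        u = np.linspace(-half_width, half_width, steps)
        du = u[1] - u[0]
        t = u + tau
        est = np.where(np.abs(t) > np.sqrt(2.0), t, 0.0)   # t d(t)
        return float((((est - tau) ** 2) * norm.pdf(u)).sum() * du)

    print(round(m(0.78), 2))   # about 1.00: the crossing point
    print(round(m(1.88), 2))   # about 1.65: the maximum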


5. CL-PLOTS

We now extend the Cp-plot device to handle general linear estimators. With the same set-up as in the Introduction, consider an estimate of the form

    β̃_0 = ȳ,    β̃_K = Ly    (10)

where ȳ is the mean ȳ = Σ_u y_u / n, and L is a k × n matrix of constants. We shall assume that L1_n = 0 (where 1_n^T = (1, 1, ..., 1)) so that a change in the origin of measurement of y affects only β̃_0 and not β̃_1, ..., β̃_k. Examples of estimators of this class are: full least-squares; subset-least-squares; and Bayes estimates under multinormal specifications with a multinormal prior, a special case of which is the class of "Ridge" estimates advocated by Hoerl and Kennard (1970a,b) (see also Theil (1963), section 2.3):

    L = L_f = (X^T X + fI)⁻¹ X^T    (11)

where f is a (small) scalar parameter (Hoerl and Kennard used k), and in this section we are writing X for the n × k matrix of independent variables, which we are now assuming have been standardized to have zero means and unit variances. Thus 1_n^T X = 0, diag(X^T X) = I.

As a measure of adequacy for prediction we again use the scaled summed mean square error, which in the present notation is

    J_L = σ⁻² (‖Xβ̃_K − Xβ_K‖² + n(β̃_0 − β_0)²)

and which has expectation

    E(J_L) = V_L + σ⁻² B_L

where

    V_L = 1 + tr(X^T X L L^T)
    B_L = β_K^T (LX − I)^T X^T X (LX − I) β_K.

The sum of squares about the fitted regression is

    RSS_L = ‖y − ȳ1_n − Xβ̃_K‖²

which has expectation

    E(RSS_L) = σ² V_L* + B_L

where

    V_L* = n − 1 − 2 tr(XL) + tr(X^T X L L^T).

Thus we have an estimator of E(J_L), namely

    C_L = RSS_L / σ̂² − n + 2 + 2 tr(XL).    (12)

By analogy with the Cp development, we propose that values of C_L (for various choices of L) should be plotted against values of V_L. Notice that when L is a matrix corresponding to subset least squares, C_L, V_L reduce to C_P, p respectively.

For computing values of C_L, V_L for Ridge estimates (11), the following steps can be taken. First, find H (orthogonal) and Λ = diagonal(λ_1, λ_2, ..., λ_k) so that X^T X = H^T Λ H. Compute z = H X^T y. Then

    tr(XL_f) = Σ_{i=1}^k λ_i / (f + λ_i)

    RSS_{L_f} − RSS_{L_0} = Σ_{i=1}^k f² z_i² / (λ_i (f + λ_i)²).    (13)

Figure 5 gives the resulting plot for the set of 10-variable data analyzed by Gorman and Toman (1966) and by Hoerl and Kennard (1970b). Shown are (p, C_P) points corresponding to various subset-least-squares estimates and a continuous arc of (V_L, C_L) points corresponding to Ridge estimates with values of f varying from zero at (11, 11) and increasing to the left. For this example, Hoerl and Kennard (1970b) suggested that a value of f in the interval (0.2, 0.3) would "undoubtedly" give estimated coefficients "closer to β and more stable for prediction than the least-squares coefficients or some subset of them." On the other hand from Figure 5 one would be inclined to suggest a value of f nearer to .02 than to 0.2.

[Figure 5. C_L Plot for Gorman-Toman Data: Subset (C_P) Values and Ridge (C_{L_f}) Values.]
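The computation of the Ridge arc in Figure 5 follows (11)-(13) directly. A sketch (ours), assuming X is the n × k standardized predictor matrix of full column rank, sigma2 an estimate of σ², and V_L for Ridge evaluated as 1 + Σ λ_i²/(f + λ_i)², which follows from V_L = 1 + tr(X^T X L L^T):

    import numpy as np

    def ridge_cl_arc(X, y, sigma2, f_grid):
        """(V_L, C_L) points for Ridge estimates L_f = (X'X + fI)^{-1} X'."""
        n, k = X.shape
        lam, H = np.linalg.eigh(X.T @ X)     # X'X = H diag(lam) H'; lam > 0 assumed
        z = H.T @ (X.T @ y)                  # rotated cross-products
        beta_ls = H @ (z / lam)              # full least-squares coefficients
        rss0 = float(((y - y.mean() - X @ beta_ls) ** 2).sum())
        arc = []
        for f in f_grid:
            rss = rss0 + float((f**2 * z**2 / (lam * (f + lam) ** 2)).sum())  # (13)
            tr_xl = float((lam / (f + lam)).sum())
            cl = rss / sigma2 - n + 2.0 + 2.0 * tr_xl                         # (12)
            vl = 1.0 + float((lam**2 / (f + lam) ** 2).sum())                 # V_L
            arc.append((vl, cl))
        return arc                           # plot C_L against V_L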


One obvious suggestion is to choose f to minimize C_{L_f}. Some insight into the effect of this choice can be gained as follows. First consider the case of orthogonal regressors, and now assume X^T X = I. Notice that in this case our risk function E(J) is equivalent to the quantity Σ_{j=1}^k E(β̃_j − β_j)² used by Hoerl and Kennard (1970a). We may take H = I, so that z_i = b_i, the least-squares estimate of β_i. From (12) and (13) we see that C_{L_f} is a minimum when f satisfies

    (1 + f)/f = Σ_{j=1}^k b_j² / (k σ̂²);

the adjusted estimates are then given by

    β̃_i = (1 − k σ̂² / Σ_j b_j²) b_i,    i = 1, ..., k.    (14)

It is interesting that this set of estimates is of the form suggested by Stein (1960) for the problem of estimating regression coefficients in a multivariate Normal distribution. James and Stein (1961) showed that for k ≥ 3 the vector of estimates β̃** obtained by replacing the multiplier k in (14) by any number between 0 and 2k − 4 has the property that E(J**) is less than the full-least-squares value k + 1 (see Hoerl and Kennard 1970b), for all values of the true regression coefficients. Thus our "minimize C_{L_f}" rule dominates full least-squares for k ≥ 4. This result stands in interesting contrast to the disappointing result found above for the "minimize Cp" rule.

Now, consider the case of equi-correlated regressors, with X^T X = I + ρ(11^T − I). In this case the least-squares estimate b̄ of β̄ = Σ_j β_j / k has variance 1/[k(1 − ρ + kρ)], and the vector of deviations (b_i − b̄) has covariance matrix (I − k⁻¹ 11^T)/(1 − ρ). Thus when ρ is large, these deviations become very unstable. It is found that for ρ near unity, C_{L_f} is minimized when f is near (1 − ρ)g, where

    (1 + g)/g = (1 − ρ) Σ_j (b_j − b̄)² / ((k − 1) σ̂²).

The adjusted estimates are given approximately by

    β̃_i ≈ b̄ + (1 − (k − 1) σ̂² / [(1 − ρ) Σ_j (b_j − b̄)²]) (b_i − b̄).

Thus here the "minimize C_{L_f}" rule leads to shrinking the least-squares estimates towards their average. While the details have not been fully worked out, one expects that this rule will dominate full least-squares for k ≥ 5.
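In the orthonormal case the "minimize C_{L_f}" estimate (14) is one line of code; a sketch (ours), with the James-Stein variant obtained by swapping the multiplier k for another constant in (0, 2k − 4):

    import numpy as np

    def minimize_clf_estimate(b, sigma2, mult=None):
        """Estimate (14) under X'X = I; b holds the least-squares coefficients.

        mult defaults to the paper's multiplier k; any value in (0, 2k - 4)
        gives a James-Stein-type shrinkage estimate.
        """
        k = len(b)
        c = float(k if mult is None else mult)
        return (1.0 - c * sigma2 / float((b ** 2).sum())) * b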

6. ACKNOWLEDGMENTS

It is a great personal pleasure to recall that the idea for the Cp-plot arose in the course of some discussions with Cuthbert Daniel around Christmas 1963. The use of the letter C is intended to do him honor. The device was described publicly in Mallows (1964) and again in Mallows (1966) (with the extensions described at the end of section 1 above) and has appeared in several unpublished manuscripts. Impetus for preparing the present exposition was gained in the course of delivering a series of lectures at the University of California at Berkeley in February 1972; their support is gratefully acknowledged.

APPENDIX
Proof of the Lemma

The key to these results is the identity, true for any subset P that includes 0, i.e. P = {0, P−},

    RSS_P − RSS_{K+} = (β̂_{P−} − β̂_K)^T D_K (β̂_{P−} − β̂_K)

where β̂^T = (β̂_0, β̂_K^T) is the vector of least-squares estimates of all the coefficients in the model, and (β̂_0, β̂_{P−}) is the vector of subset-least-squares estimates. From the form of S_α (7) it now follows that (iii) β̂_{P−} is in S_α if and only if (v) RSS_P − RSS_{K+} ≤ k σ̂² F_α, which is directly equivalent (if σ̂² = RSS_{K+}/(n − k − 1)) to (iv) C_P ≤ 2p − k − 1 + kF_α. Clearly (iii) implies (i); to prove the converse we remark that for any vector β^T = (β_0, β_K^T) with β_K in the hyperplane H_P = {β_K : β_Q = 0}, we have

    ‖Xβ − Xβ̂‖² = ‖Xβ − Xβ̂_P‖² + ‖Xβ̂_P − Xβ̂‖²,

the cross-product term vanishing by definition of β̂_P. Thus if any point of H_P is in S_α, β̂_{P−} must be. Finally, (i) is directly equivalent to (ii) by a simple geometrical argument.

To handle the case of non-orthogonal regressors, Sclove (1968) has suggested transforming to orthogonality before applying a shrinkage factor. A composite procedure with much intuitive appeal for this writer would be to use the Cp plot or some similar device to identify the terms that should certainly be included (since they appear in all subsets that give reasonably good fits to the data), to fit these by least squares, and to adjust the remaining estimates by orthogonalizing and shrinking towards zero as in La Motte and Hocking (1970).

[Received June 1972. Revised October 1972.]


REFERENCES

Allen, D. M. (1971), "Mean Square Error of Prediction as a Criterion for Selecting Variables," Technometrics, 13, 469-475.
Beale, E. M. L., Kendall, M. G., and Mann, D. W. (1967), "The Discarding of Variables in Multivariate Analysis," Biometrika, 54, 357-366.
Daniel, C., and Wood, F. S. (1971), Fitting Equations to Data, New York: Wiley-Interscience.
Furnival, G. M. (1971), "All Possible Regressions With Less Computation," Technometrics, 13, 403-408.
Garside, M. J. (1965), "The Best Subset in Multiple Regression Analysis," Applied Statistics, 14, 196-200.
Godfrey, M. B. (1972), "Relations Between Cp, RSS, and Mean Square Residual," unpublished manuscript submitted to Technometrics.
Gorman, J. W., and Toman, R. J. (1966), "Selection of Variables for Fitting Equations to Data," Technometrics, 8, 27-51.
Hocking, R. R., and Leslie, R. N. (1967), "Selection of the Best Subset in Regression Analysis," Technometrics, 9, 531-540.
Hoerl, A. E., and Kennard, R. W. (1970a), "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics, 12, 55-67.
——— (1970b), "Ridge Regression: Applications to Nonorthogonal Problems," Technometrics, 12, 69-82.
James, W., and Stein, C. (1961), "Estimation With Quadratic Loss," in Proceedings of the Fourth Berkeley Symposium, Berkeley: University of California Press, pp. 361-379.
Kennard, R. W. (1971), "A Note on the Cp Statistic," Technometrics, 13, 899-900.
Kennedy, W. J., and Bancroft, T. A. (1971), "Model-Building for Prediction in Regression Based on Repeated Significance Tests," The Annals of Mathematical Statistics, 42, 1273-1284.
La Motte, L. R., and Hocking, R. R. (1970), "Computational Efficiency in the Selection of Regression Variables," Technometrics, 12, 83-93.
Lindley, D. V. (1968), "The Choice of Variables in Multiple Regression," Journal of the Royal Statistical Society, Ser. B, 30, 31-53 (Discussion, 54-66).
Mallows, C. L. (1964), "Choosing Variables in a Linear Regression: A Graphical Aid," unpublished paper presented at the Central Regional Meeting of the Institute of Mathematical Statistics, Manhattan, KS, May 7-9.
——— (1966), "Choosing a Subset Regression," unpublished paper presented at the Annual Meeting of the American Statistical Association, Los Angeles, August 15-19.
Mantel, N. (1970), "Why Stepdown Procedures in Variable Selection," Technometrics, 12, 621-625.
Schatzoff, M., Tsao, R., and Fienberg, S. (1968), "Efficient Calculation of All Possible Regressions," Technometrics, 10, 769-779.
Sclove, S. L. (1968), "Improved Estimators for Coefficients in Linear Regression," Journal of the American Statistical Association, 63, 596-606.
Spjøtvoll, E. (1972), "Multiple Comparison of Regression Functions," The Annals of Mathematical Statistics, 43, 1076-1088.
Srikantan, K. S. (1970), "Canonical Association Between Nominal Measurements," Journal of the American Statistical Association, 65, 284-292.
Stein, C. (1960), "Multiple Regression," in Contributions to Probability and Statistics, ed. I. Olkin, Stanford, CA: Stanford University Press, pp. 424-443.
Theil, H. (1963), "On the Use of Incomplete Prior Information in Regression Analysis," Journal of the American Statistical Association, 58, 401-414.
Watts, H. W. (1965), "The Test-o-Gram; A Pedagogical and Presentational Device," The American Statistician, 19, 22-28.
