An Introduction To Partial Least Squares Regression: Randall D. Tobias, SAS Institute Inc., Cary, NC
as possible while modeling the responses well. For this reason, the acronym PLS has also been taken to mean ‘‘projection to latent structure.’’ It should be noted, however, that the term ‘‘latent’’ does not have the same technical meaning in the context of PLS as it does for other multivariate techniques. In particular, PLS does not yield consistent estimates of what are called ‘‘latent variables’’ in formal structural equation modeling (Dijkstra 1983, 1985). Figure 3 gives a schematic outline of the method.

successive pairs of scores is as strong as possible. In principle, this is like a robust form of redundancy analysis, seeking directions in the factor space that are associated with high variation in the responses but biasing them toward directions that are accurately predicted.

Another way to relate the three techniques is to note that PCR is based on the spectral decomposition of X'X, where X is the matrix of factor values; MRA is based on the spectral decomposition of Ŷ'Ŷ, where Ŷ is the matrix of predicted response values; and PLS is based on the singular value decomposition of X'Y.
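To make this correspondence concrete, the following SAS/IML sketch computes the three decompositions side by side for a small, hypothetical data set. The data set SPEC and the variables X1-X3 and Y1-Y2 are assumptions introduced only for illustration; the code makes the algebra explicit and is not how PROC PLS itself is implemented.

   proc iml;
      /* hypothetical data: factors X1-X3 and responses Y1-Y2 in WORK.SPEC */
      use spec;
      read all var {x1 x2 x3} into X;
      read all var {y1 y2}    into Y;
      close spec;
      X = X - j(nrow(X), 1, 1) * X[:,];      /* center the factors              */
      Y = Y - j(nrow(Y), 1, 1) * Y[:,];      /* center the responses            */
      call eigen(lpcr, vpcr, X`*X);          /* PCR: spectral decomposition of X'X      */
      Yhat = X * solve(X`*X, X`*Y);          /* least squares fit of the responses      */
      call eigen(lmra, vmra, Yhat`*Yhat);    /* MRA: spectral decomposition of Yhat'Yhat */
      call svd(u, q, v, X`*Y);               /* PLS: singular value decomposition of X'Y */
      w1 = u[, 1];                           /* first PLS weight (X-weight) vector      */
      print w1;
   quit;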
Table 2: PLS analysis of spectral calibration, with cross-validation

 Number of          Percent Variation Accounted For          Cross-validation
    PLS            Factors               Responses              Comparison
  Factors      Current    Total      Current    Total        PRESS       P
 -----------------------------------------------------------------------------
      0                                                       1.067     0
      1         39.35     39.35       28.70     28.70         0.929     0
      2         29.93     69.28       25.57     54.27         0.851     0
      3          7.94     77.22       21.87     76.14         0.728     0
      4          6.40     83.62        6.45     82.59         0.600     0.002
      5          2.07     85.69       16.95     99.54         0.312     0.261
      6          1.20     86.89        0.38     99.92         0.305     0.428
      7          1.15     88.04        0.04     99.96         0.305     0.478
      8          1.12     89.16        0.02     99.98         0.306     0.023
      9          1.06     90.22        0.01     99.99         0.304     *
     10          1.02     91.24        0.01    100.00         0.306     0.091
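A cross-validated analysis of this general form could be requested from the experimental PROC PLS procedure described in Appendix 1 with a call along the following lines. This is only a sketch: the data set names SPECCAL and SPECTEST, the variable names, and the particular cross-validation settings are hypothetical rather than those actually used to produce Table 2.

   proc pls data=speccal method=pls lv=10
            cv=testset(spectest) cvtest(stat=press);
      /* five component amounts modeled from the spectral amplitudes */
      model amount1-amount5 = freq1-freq500;
   run;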
Table 2 shows the amount of variation accounted for by the first ten PLS factors, for both the factors and the responses. Notice that the first five PLS factors account for almost all of the variation in the responses, with the fifth factor accounting for a sizable proportion. This gives a strong indication that five PLS factors are appropriate for modeling the five component amounts. The cross-validation analysis confirms this: although the model with nine PLS factors achieves the absolute minimum predicted residual sum of squares (PRESS), it is insignificantly better than the model with only five factors.

The PLS factors are computed as certain linear combinations of the spectral amplitudes, and the responses are predicted linearly based on these extracted factors. Thus, the final predictive function for each response is also a linear combination of the spectral amplitudes. The trace for the resulting predictor of the first response is plotted in Figure 4. Notice that in this case, the PLS predictions can be interpreted as contrasts between broad bands of frequencies.

Discussion

As discussed in the introductory section, soft science applications involve so many variables that it is not practical to seek a ‘‘hard’’ model explicitly relating them all. Partial least squares is one solution for such problems, but there are others, including

-  other factor extraction techniques, like principal components regression and maximum redundancy analysis

-  ridge regression, a technique that originated within the field of statistics (Hoerl and Kennard 1970) as a method for handling collinearity in regression (a brief sketch of this approach follows the list)

-  neural networks, which originated with attempts in computer science and biology to simulate the way animal brains recognize patterns (Haykin 1994, Sarle 1994)
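Of these alternatives, ridge regression can be fit with the RIDGE= option of PROC REG. The following minimal sketch, with a hypothetical data set and variable names, computes coefficient estimates over a grid of ridge parameters:

   /* ridge regression (Hoerl and Kennard 1970) over a grid of ridge values; */
   /* the ridge coefficient estimates are written to WORK.RIDGEEST           */
   proc reg data=mydata outest=ridgeest ridge=0 to 0.1 by 0.01;
      model y = x1-x20;
   run;
   quit;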
The SIMPLS method of de Jong (1993) is a closely related technique. It is exactly the same as PLS when there is only one response and invariably gives very similar results, but it can be dramatically more efficient to compute when there are many factors. Continuum regression (Stone and Brooks 1990) adds a continuous parameter α, where 0 ≤ α ≤ 1, allowing the modeling method to vary continuously between MLR (α = 0), PLS (α = 0.5), and PCR (α = 1). De Jong and Kiers (1992) describe a related technique called principal covariates regression.

In any case, PLS has become an established tool in chemometric modeling, primarily because it is often possible to interpret the extracted factors in terms of the underlying physical system; that is, to derive ‘‘hard’’ modeling information from the soft model. More work is needed on applying statistical methods to the selection of the model. The idea of van der Voet (1994) for randomization-based model comparison is a promising advance in this direction.

References

Dijkstra, T. (1983), ‘‘Some comments on maximum likelihood and partial least squares methods,’’ Journal of Econometrics, 22, 67-90.

Dijkstra, T. (1985), Latent Variables in Linear Stochastic Models: Reflections on Maximum Likelihood and Partial Least Squares Methods, 2nd ed., Amsterdam, The Netherlands: Sociometric Research Foundation.

Geladi, P. and Kowalski, B. (1986), ‘‘Partial least-squares regression: A tutorial,’’ Analytica Chimica Acta, 185, 1-17.

Frank, I. and Friedman, J. (1993), ‘‘A statistical view of some chemometrics regression tools,’’ Technometrics, 35, 109-135.

Haykin, S. (1994), Neural Networks, a Comprehensive Foundation, New York: Macmillan.

Helland, I. (1988), ‘‘On the structure of partial least squares regression,’’ Communications in Statistics, Simulation and Computation, 17(2), 581-607.

Hoerl, A. and Kennard, R. (1970), ‘‘Ridge regression: biased estimation for non-orthogonal problems,’’ Technometrics, 12, 55-67.

de Jong, S. and Kiers, H. (1992), ‘‘Principal covariates regression,’’ Chemometrics and Intelligent Laboratory Systems, 14, 155-164.

de Jong, S. (1993), ‘‘SIMPLS: An alternative approach to partial least squares regression,’’ Chemometrics and Intelligent Laboratory Systems, 18, 251-263.

Stone, M. and Brooks, R. (1990), ‘‘Continuum regression: Cross-validated sequentially constructed prediction embracing ordinary least squares, partial least squares, and principal components regression,’’ Journal of the Royal Statistical Society, Series B, 52(2), 237-269.

van den Wollenberg, A.L. (1977), ‘‘Redundancy Analysis--An Alternative to Canonical Correlation Analysis,’’ Psychometrika, 42, 207-219.

van der Voet, H. (1994), ‘‘Comparing the predictive accuracy of models using a simple randomization test,’’ Chemometrics and Intelligent Laboratory Systems, 25, 313-323.

SAS, SAS/INSIGHT, and SAS/STAT are registered trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Appendix 1: PROC PLS: An Experimental SAS Procedure for Partial Least Squares

An experimental SAS/STAT software procedure, PROC PLS, is available with Release 6.11 of the SAS System for performing various factor-extraction methods of modeling, including partial least squares. Other methods currently supported include alternative algorithms for PLS, such as the SIMPLS method of de Jong (1993) and the RLGW method of Rannar et al. (1994), as well as principal components regression. Maximum redundancy analysis will also be included in a future release. Factors can be specified using GLM-type modeling, allowing for polynomial, cross-product, and classification effects. The procedure offers a wide variety of methods for performing cross-validation on the number of factors, with an optional test for the appropriate number of factors. There are output data sets for cross-validation and model information as well as for predicted values and estimated factor scores.

You can specify the following statements with the PLS procedure. Items within the brackets <> are optional.

METHOD=SIMPLS
    specifies the SIMPLS method of de Jong (1993), which is equivalent to standard PLS when there is only one response, and it invariably gives very similar results.

METHOD=PCR
    specifies principal components regression.

You can specify the following PLS-options in parentheses after METHOD=PLS:

ALGORITHM=PLS-algorithm
    gives the specific algorithm used to compute PLS factors. Available algorithms are

    ITER    the usual iterative NIPALS algorithm
    SVD     singular value decomposition of X'Y, the most exact but least efficient approach
    EIG     eigenvalue decomposition of Y'XX'Y
    RLGW    an iterative approach that is efficient when there are many factors
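For example, the calls below select specific factor-extraction methods and algorithms using the options just described; the data set and variable names are hypothetical.

   /* standard PLS, computed via the SVD of X'Y */
   proc pls data=mydata method=pls(algorithm=svd);
      model y1 y2 = x1-x20;
   run;

   /* the same model fit by principal components regression */
   proc pls data=mydata method=pcr;
      model y1 y2 = x1-x20;
   run;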
CV = RANDOM < ( cv-random-opts ) >
    specifies that random observations be excluded.

CV = TESTSET(SAS-data-set)
    specifies a test set of observations to be used for cross-validation.

You also can specify the following cv-random-opts in parentheses after CV = RANDOM:

NITER = number
    specifies the number of random subsets to exclude.

NTEST = number
    specifies the number of observations in each random subset chosen for exclusion.

SEED = number
    specifies the seed value for random number generation.

CVTEST < ( cv-test-options ) >
    specifies that van der Voet's (1994) randomization-based model comparison test be performed on each cross-validated model. You also can specify the following cv-test-options in parentheses after CVTEST:

PVAL = number
    specifies the cut-off probability for declaring a significant difference. The default is 0.10.

STAT = test-statistic
    specifies the test statistic for the model comparison. You can specify either T2, for Hotelling's T² statistic, or PRESS, for the predicted residual sum of squares. T2 is the default.

NSAMP = number
    specifies the number of randomizations to perform. The default is 1000.

LV = number
    specifies the number of factors to extract. The default number of factors to extract is the number of input factors, in which case the analysis is equivalent to a regular least squares regression of the responses on the input factors.

OUTMODEL = SAS-data-set
    specifies a name for a data set to contain information about the fit model.

OUTCV = SAS-data-set
    specifies a name for a data set to contain information about the cross-validation.

CLASS Statement

CLASS class-variables;

You use the CLASS statement to identify classification variables, which are factors that separate the observations into groups.

Class-variables can be either numeric or character. The PLS procedure uses the formatted values of class-variables in forming model effects. Any variable in the model that is not listed in the CLASS statement is assumed to be continuous. Continuous variables must be numeric.

MODEL Statement

MODEL responses = effects < / INTERCEPT >;

You use the MODEL statement to specify the response variables and the independent effects used to model them. Usually you will just list the names of the independent variables as the model effects, but you can also use the effects notation of PROC GLM to specify polynomial effects and interactions. By default the factors are centered and thus no intercept is required in the model, but you can specify the INTERCEPT option to override this behavior.

OUTPUT Statement

OUTPUT OUT=SAS-data-set keyword = names < ... keyword = names >;

You use the OUTPUT statement to specify a data set to receive quantities that can be computed for every input observation, such as extracted factors and predicted values. The following keywords are available:

PREDICTED     predicted values for responses
YRESIDUAL     residuals for responses
XRESIDUAL     residuals for factors
XSCORE        extracted factors (X-scores, latent vectors, T)
YSCORE        extracted responses (Y-scores, U)
STDY          standardized Y variables
STDX          standardized X variables
H             approximate measure of influence
PRESS         predicted residual sum of squares
T2            scaled sum of squares of scores
XQRES         sum of squares of scaled residuals for factors
YQRES         sum of squares of scaled residuals for responses

The listing has two parts (Figure 5), the first part summarizing the cross-validation and the second part showing how much variation is explained by each extracted factor for both the factors and the responses. Note that the extracted factors are labeled ‘‘latent variables’’ in the listing.
                        The PLS Procedure

        Cross Validation for the Number of Latent Variables

 Number of
  Latent          Model Effects            Dependent Variables
 Variables      Current      Total        Current       Total
 ----------------------------------------------------------------
      1         39.3526    39.3526        28.7022     28.7022
      2         29.9369    69.2895        25.5759     54.2780
      3          7.9333    77.2228        21.8631     76.1411
      4          6.4014    83.6242         6.4502     82.5913
      5          2.0679    85.6920        16.9573     99.5486
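For reference, a complete call combining the statements described above might look like the following sketch. All names are hypothetical (the data set MYDATA, the classification variable BATCH, the responses Y1 and Y2, and the predictors X1-X20) and serve only to show how the pieces fit together; a cross-validation listing like the one above is requested with the CV= and CVTEST options.

   proc pls data=mydata method=pls lv=5
            cv=random(niter=10 ntest=25 seed=12345) cvtest
            outmodel=plsmod outcv=plscv;
      class batch;                              /* classification effect           */
      model y1 y2 = batch x1-x20 x1*x2;         /* GLM-type effects, with crossing  */
      output out=plsout predicted=p1 p2         /* predicted values for Y1 and Y2   */
                        xscore=t1-t5            /* the five extracted X-scores      */
                        press=prs;              /* per-observation PRESS            */
   run;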