5A Multiple Regression Analysis
5A.1 General Considerations
Multiple regression analysis, a term first used by Karl Pearson (1908), is an extremely useful extension
of simple linear regression in that we use several quantitative (metric) or dichotomous variables in
combination rather than just one such variable to predict or explain the value of a quantitatively mea-
sured criterion (outcome/dependent) variable. Most researchers believe that using more than one
predictor or potentially explanatory variable can paint a more complete picture of how the world works
than is permitted by simple linear regression because behavioral scientists generally believe that behav-
ior, attitudes, feelings, and so forth are determined by multiple variables rather than just one. Using
only a single variable as a predictor or explanatory variable as is done in simple linear regression will
at best capture only one of those sources. In the words of one author (Thompson, 1991), multivariate
methods such as multiple regression analysis have accrued greater support in part because they “best
honor the reality to which the researcher is purportedly trying to generalize” (p. 80).
Based on what we have already discussed regarding simple linear regression, it may be clear that
multiple regression can be used for predictive purposes, such as estimating from a series of entrance
tests how job applicants might perform on the job. But the regression technique can also guide research-
ers toward explicating or explaining the dynamics underlying a particular construct by indicating
which variables in combination might be more strongly associated with it. In this sense, the model that
emerges from the analysis can serve an explanatory purpose as well as a predictive purpose.
As was true for simple linear regression, multiple regression analysis generates two variations of
the prediction equation, one in raw score or unstandardized form and the other in standardized form
(making it easier for researchers to compare the effects of predictor variables that are assessed on differ-
ent scales of measurement). These equations are extensions of the simple linear regression models and
thus still represent linear regression, that is, they are still linear equations but use multiple variables as
predictors. The main work done in multiple regression analysis is to build the prediction equation. This
primarily involves generating the weighting coefficients—the b (unstandardized) coefficients for the
raw score equation and the beta (standardized) coefficients for the standardized equation. This predic-
tion model informs us that if we weight each of the predictors as the statistical analysis has indicated,
Copyright ©2017 by SAGE Publications, Inc.
This work may not be reproduced or distributed in any form or by any means without express written permission of the publisher.
158 PART II: Basic and advanced regression analysis
maxi R squared and mini R squared (see Freund & Littell, 2000), have been developed as well. Using
such a label that includes the term “statistical” may seem a little odd (of course regression is a statisti-
cal procedure), but the label is meant to communicate something rather important but subtle regard-
ing the analysis procedures. The reason for calling the procedures “statistical regression methods” is to
emphasize that once the researchers identify the variables to be used as predictors, they relinquish all
control of the analysis to the mathematical algorithms in carrying out the analysis.
In statistical regression procedures, the mathematical procedures determine the optimal weighting
for each of the predictors as a set that will minimize the amount of prediction error. Researchers cannot,
for example, propose that this particular variable be given “priority” by allowing it to do its prediction
work ahead of the other variables in the set. Although they were actively making decisions about what
constructs were to be used as predictors, how to measure those constructs, who was to be sampled and
how, and so on, now that they are on the brink of analyzing their data the researchers must passively
wait for the software to generate its output and inform them of the way the software has deemed it best
to weight each of the variables in the prediction model and/or which variables were granted “priority” in
the analysis by the algorithm (assuming that such “priority” was permitted by the regression method).
This relinquishing of complete control when using the statistical regression methods is not necessarily
a bad thing in that we are trying to maximize the predictive work of our variables. However, as we
have become increasingly familiar and comfortable with more complex alternative methods in which
researchers take on more active roles in shaping the prediction model, the use of these statistical methods
has given way to alternatives that call for more researcher input into the process of building the model;
many of these methods are covered in Chapters 6A and 6B as well as in Chapters 12A through 14B.
The variables in a multiple regression analysis fall into one of two categories: One category comprises
the variable being predicted and the other category subsumes the variables that are used as the basis of
prediction. We briefly discuss each in turn.
The variable that is the focus of a multiple regression design is the one being predicted. In the regression
equation, as we have already seen for simple linear regression, it is designated as an upper case Ypred. This
variable is known as the criterion variable or outcome variable but is often referred to as the dependent
variable in the analysis. It needs to have been assessed on one of the quantitative scales of measurement.
The variables used as the basis of prediction are the predictor variables, often referred to as the independent variables in the analysis. In many research methods courses, the term independent variable is reserved for a variable in the context of an experimental
study, but the term is much more generally applied because ANOVA (used for the purpose of compar-
ing the means of two or more groups or conditions) and multiple regression are just different expres-
sions of the same general linear model (see Section 5A.5). In the underlying statistical analysis,
whether regression or ANOVA, the goal is to predict (explain) the variance of the dependent variable
based on the independent variables in the study.
Talking about independent and dependent variables can get a bit confusing when the context is not
made clear. In one context (that of the general linear model), predicting the variance of the dependent
variable is what the statistical analysis is designed to accomplish. This is the case whether the research
(data collection) design is ANOVA or regression.
In the context of the research methodology underlying the data collection process itself, experi-
mental studies are distinguished from regression or correlation studies by the procedures used to
acquire the data. Some of the differences in the typical nature of independent variables in experimental
and regression studies within this methodology and data collection context are listed in Table 5a.1. For
example, in experimental studies, independent variables are often categorical and are manipulated by
the researchers and dependent variables can be some sort of behaviors measured under one or more
of the treatment conditions. However, independent variables may also be configured after the fact in
correlation designs (e.g., we may define different groups of respondents to a medical treatment satisfaction survey based on the class of medication patients were prescribed) rather than be exclusively based
on manipulated conditions. In regression designs, it is usual for all of the variables (the variable to be
predicted as well as the set of predictor variables) to be measured in a given “state of the system” (e.g.,
we administer a battery of personality inventories, we ask employees about their attitudes on a range of
work satisfaction issues, we extract a set of variables from an existing archival database). To minimize
the potential for confusion, our discussion will remain in the context of the statistical analysis.
The following are some common research objectives that imply a regression design:
•• We want to predict one variable from a combined knowledge of several others.
•• We want to determine which variables of a larger set are better predictors of some criterion
variable than others.
•• We want to know how much better we can predict a variable if we add one or more predictor
variables to the mix.
•• We want to examine the relationship of one variable with a set of other variables.
•• We want to statistically explain or account for the variance of one variable using a set of other
variables.
5A.4.2 The Statistical Goal in a Regression Analysis
The statistical goal of multiple regression analysis is to produce a model in the form of a linear equa-
tion that identifies the best weighted linear combination of independent variables in the study to
optimally predict the criterion variable. That is, in the regression model—the statistical outcome of
the regression analysis—each predictor is assigned a weight. Each predictor for each case is multiplied
by that weight to achieve a product, and those products are summed together with the constant in the
raw score model. The final sum for a given case is the predicted score on the criterion variable for
that case.
The weights for the predictor variables are generated in such a way that, across all of the cases in
the analysis, the predicted scores of the cases are as close to their actual scores as is possible. Closeness
of prediction is defined in terms of the ordinary least squares solution. This strategy underlying the
solution or model describes a straight-line function for which the sum of the squared differences between
the predicted and actual values of the criterion variable is minimal. These differences between the pre-
dictions we make with the equation or model and the actual observed values are the prediction errors.
The model thus can be thought of as representing the function that minimizes the sum of the squared
errors. When we say that the model is fitted to the data to “best” predict the dependent variable, what
we technically mean is that the sum of squared errors has been minimized.
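The least squares criterion just described can be sketched numerically. The data below are fabricated and the variable names are arbitrary; the point is only that the fitted weights minimize the sum of squared differences between predicted and actual criterion scores:

```python
import numpy as np

# Fabricated data: two predictors and a criterion with known structure.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 4.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

# Ordinary least squares: prepend a column of 1s so the intercept is estimated.
A = np.column_stack([np.ones(50), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # [a, b1, b2]

# The sum of squared differences between predicted and actual scores ...
sse = np.sum((y - A @ coef) ** 2)

# ... is smaller than for any other choice of weights, e.g., a perturbed set.
sse_perturbed = np.sum((y - A @ (coef + 0.1)) ** 2)
assert sse < sse_perturbed
```

Any weights other than the least squares solution produce a larger sum of squared prediction errors, which is exactly what is meant by the model "best" fitting the data.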
The weight (contribution) assigned to each independent variable in the model is relative to the other independent variables in the analysis. Thus, we can say only that when considering this particular set of variables,
this one variable is weighted in the model to such and such an extent. In conjunction with a different set
of independent variables, the weight assigned to that variable may turn out to be quite different.
This “relativity” of the variable weights has a couple of implications for the interpretation of the
results. One implication is to recognize that the weight is not a feature of the variable per se but sim-
ply describes the particular role that it has played in this one analysis in combination with these other
specific variables predicting this particular outcome variable. Even a variable that can substantially pre-
dict the outcome variable in isolation may have received a very low weight in the multiple regression
analysis because its prediction work might be redundant with one or more other predictors in the model.
A second implication for the interpretation of the results is that we tend to focus on how well the
model as a whole performed. This is typically thought of in terms of the amount of variance of the outcome
variable that is explained by the model as described in Section 5A.9.
5A.4.4 Fully Specifying the Regression Model
It is possible that variables not included in the research design could have made a substantial difference
in the results. Some variables that could potentially be good predictors may have been overlooked in
the literature review, measuring others may have demanded too many resources for them to be
included, and still others may not have been amenable to the measurement instrumentation available
to researchers at the time of the study. However, our working assumption in interpreting the results of
the regression analysis is that the model is fully specified, that is, that we have captured all of the impor-
tant variables that are predictive of our outcome variable. With this assumption in place, we can draw
inferences about the phenomenon we are studying from the results of our analysis. To the extent that
potentially important variables were omitted from the research, the model is said to be incompletely
specified and may therefore have less external validity than is desirable.
Because of this assumption, we want to select the variables for inclusion in the analysis based on
as much theoretical and empirical rationale as we can bring to bear on the task. It is often a waste of
research effort to realize after the fact that a couple of very important candidate predictors were omitted
from the study. Their inclusion would have potentially produced a very different dynamic and would
likely have resulted in a very different model than what we have just obtained.
The regression equation, representing the prediction model, is perhaps the most straightforward
expression of the general linear model that was introduced more than two centuries ago by Adrien-
Marie Legendre in 1805 (Stigler, 1990) in which a weighted linear composite of a set of variables is used
to predict the value of some variable. For multiple regression analysis, what is predicted is a single
variable but it is possible to predict the value of a weighted linear composite of another set of variables
as we do in canonical correlation analysis (see Chapter 7A).
Just as was the case for simple linear regression, the multiple regression equation is produced in
both raw score and standardized score form. We discuss each in turn.
5A.5.1 The Raw Score Model
The multiple regression raw score (unstandardized) model (equation) is an expansion of the raw score equation for simple linear regression. It is as follows:
Ypred = a + b1X1 + b2X2 + . . . + bnXn
In this equation, Ypred is the predicted score on the criterion variable, the Xs are the predictor
variables in the equation, and the bs are the weights or coefficients associated with the predictors.
These b weights are also referred to as partial regression coefficients (Kachigan, 1991) because each
reflects the relative contribution of its independent variable when we are statistically controlling for the
effects of all the other predictors. Each b coefficient informs us of how many units (and in what direc-
tion) the predicted Y value will increment for a 1-unit change in the corresponding X variable (we will
show this by example in a moment), statistically controlling for the other predictors in the model.
Because this is a raw score equation, it also contains a constant representing the Y intercept, shown as
a in the equation.
All the variables are in raw score form in the model even though the metrics on which they are
measured could vary widely. If we were predicting early success in a graduate program, for example,
one predictor may very well be average GRE test performance (the mean of the verbal and quantitative
subscores), and the scores on this variable are probably going to be in the 150 to 165 range. Another
variable may be grade point average, and this variable will have values someplace in the middle to high
3s on a 4-point grading scale. We will say that success is evaluated at the end of the first year of the
program and is measured on a scale ranging from a low 50 to a high of 75 (just to give us three rather
different metrics for our illustration here).
The b coefficients computed for the regression model are going to reflect the raw score values we
have for each variable (the criterion and the predictor variables). Assume that the results of this hypo-
thetical study show the b coefficient for grade point average to be 7.00 and for GRE to be about .50
with a Y intercept value of –40.50. Thus, controlling for the effects of GRE, a 1-unit increase in grade
point average (e.g., the difference between 3.0 and 4.0) is associated with a 7-point increase (because of
the positive sign in the model) in the predicted success criterion variable. Likewise, controlling for the
effects of grade point average, a 1-unit increase in GRE score is associated with a 0.50-point increase
in the predicted success criterion variable. Putting these values into the equation would give us the
following prediction model:
Ypred = –40.50 + (7.00)(gpa) + (0.50)(GRE)
Suppose that we wished to predict the success score of one participant, Erin, based on her grade
point average of 3.80 and her GRE score of 160. To arrive at her predicted score, we place her values
into the variable slots in the raw score form of the regression equation. Here is the prediction:
Ypred = –40.50 + (7.00)(3.80) + (0.50)(160) = –40.50 + 26.60 + 80.00 = 66.10
We therefore predict that Erin, based on her grade point average and GRE score, will score a little
more than 66 on the success measure. Given that the range on the success measure is from 50 to 75, it
would appear, at least on the surface, that Erin would be predicted to have performed moderately suc-
cessfully. We would hope that this level of predicted performance would be viewed in a favorable light
by the program to which she was applying.
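Erin's predicted score can be checked with a few lines of code, using the coefficients from the hypothetical study above:

```python
# Hypothetical raw-score model: Ypred = a + b1(gpa) + b2(GRE)
a, b_gpa, b_gre = -40.50, 7.00, 0.50

def predicted_success(gpa: float, gre: float) -> float:
    # Weight each predictor by its b coefficient and add the Y intercept.
    return a + b_gpa * gpa + b_gre * gre

# Erin: grade point average 3.80, GRE 160
erin_pred = predicted_success(3.80, 160)
print(round(erin_pred, 2))  # 66.1
```

The result matches the value of 66.10 reported in the text.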
This computation allows you to see, to some extent, how the b coefficients and the constant came
to achieve their respective magnitudes. We expect a success score between 50 and 75. One predictor is
grade point average, and we might expect it to be someplace in the middle 3s. We are therefore likely to
have a partial regression weight much larger than 1.00 to get near the range of success scores. But the
GRE scores are probably going to be in the neighborhood of 150 plus or minus, and these may need to
be lowered by the equation to get near the success score range by generating a partial regression weight
likely to be less than 1.00. When the dust settles, the weights overshoot the range of success scores,
requiring the constant to be subtracted from their combination.
Because the variables are assessed on different metrics, it follows that we cannot easily see from
the b coefficients which independent variable is the stronger predictor in this model. Some of the
ways by which we can evaluate the relative contribution of the predictors to the model will be
discussed shortly.
5A.5.2 The Standardized Model
The multiple regression standardized score model (equation) is an expansion of the standardized score
equation for simple linear regression. It is as follows:
Yz pred = β1Xz1 + β2Xz2 + . . . + βn Xzn
Everything in this model is in standardized (z) score form. Unlike the situation for the raw score
equation, all the variables are now measured on the same metric—the mean and standard deviation
for all the variables (the criterion and the predictor variables) are 0 and 1, respectively.
In the standardized equation, Yz pred is the predicted z score of the criterion variable. Each predictor
variable (each X in the equation) is associated with its own weighting coefficient symbolized by the low-
ercase Greek letter β and called a beta weight, standardized regression coefficient, or beta coefficient, and
just as was true for the b weights in the raw score equation, they are also referred to as partial regression
coefficients. These coefficients usually compute to a decimal value between –1 and +1, but they can exceed that range if the predictors are correlated strongly enough among themselves, an undesirable state of affairs known as collinearity or multicollinearity (see Section 5A.21). Such correlated predictors should be dealt with either by removing all but one of them or by combining them into a single composite variable.
Each term in the equation represents the z score of a predictor and its associated beta coefficient.
With the equation in standardized form, the Y intercept is zero and is therefore not shown.
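One common numerical screen for the collinearity just mentioned is the variance inflation factor (VIF), computed for each predictor as 1/(1 − R²) from regressing that predictor on the remaining ones. The VIF is not developed in this section, so treat the following as a supplementary sketch with fabricated data:

```python
import numpy as np

# Fabricated predictors: x2 is nearly redundant with x1; x3 is independent.
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

def vif(X: np.ndarray, j: int) -> float:
    # Regress predictor j on the other predictors; VIF = 1 / (1 - R^2).
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    fitted = A @ np.linalg.lstsq(A, X[:, j], rcond=None)[0]
    resid_ss = np.sum((X[:, j] - fitted) ** 2)
    total_ss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (resid_ss / total_ss)

print([round(vif(X, j), 1) for j in range(3)])  # x1 and x2 inflated; x3 near 1.0
```

Large VIF values for x1 and x2 flag the redundancy between them, while the independent x3 stays near the minimum value of 1.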
We can now revisit the example used above where we predicted success in graduate school based
on grade point average and GRE score. Here is that final model, but this time in standard score form:
Yz pred = (.31)(gpaz) + (.62)(GREz)
Note that the beta weights, because they are based on the same metric, can now be compared to
each other. We will treat this topic in more detail in Section 5A.14, but for now note that the beta coef-
ficient for the GRE is greater than the beta coefficient for grade point average. Thus, in predicting
academic success, the underlying mathematical algorithm gave greater weight to the GRE than to
grade point average.
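The connection between the b and the beta coefficients can also be made concrete. For an ordinary least squares model with an intercept, each beta equals its b multiplied by the ratio of that predictor's standard deviation to the criterion's standard deviation. The data below are simulated solely to demonstrate the identity:

```python
import numpy as np

# Simulated gpa, GRE, and success scores (values are fabricated).
rng = np.random.default_rng(1)
gpa = rng.normal(3.4, 0.3, size=100)
gre = rng.normal(157.0, 5.0, size=100)
success = -40.5 + 7.0 * gpa + 0.5 * gre + rng.normal(0.0, 1.5, size=100)

# Raw-score (unstandardized) solution: [a, b_gpa, b_gre].
A = np.column_stack([np.ones(100), gpa, gre])
b = np.linalg.lstsq(A, success, rcond=None)[0]

# Standardized solution: z score everything; no intercept term is needed.
z = lambda v: (v - v.mean()) / v.std()
Az = np.column_stack([z(gpa), z(gre)])
beta = np.linalg.lstsq(Az, z(success), rcond=None)[0]

# beta_j = b_j * (SD of X_j) / (SD of Y)
assert np.allclose(beta[0], b[1] * gpa.std() / success.std())
assert np.allclose(beta[1], b[2] * gre.std() / success.std())
```

This is why the betas are directly comparable: rescaling every variable to the same metric absorbs the differing standard deviations into the coefficients.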
We can also apply this standardized regression model to individuals in the sample—for example,
Erin. Within the sample used for this study, assume that Erin’s grade point average of 3.80 represents
a z score of 1.80 and that her GRE score of 160 represents a z score of 1.25. We can thus solve the
equation as follows:
Yz pred = β1Xz1 + β2Xz2 + . . . + βnXzn
Yz pred = (.31) (gpaz) + (.62) (GREz)
Yz pred Erin = (.31) (gpaz Erin) + (.62) (GREz Erin)
Yz pred Erin = (.31) (1.80) + (.62) (1.25)
Yz pred Erin = (.558) + (.775)
Yz pred Erin = 1.33
We therefore predict that Erin, based on her grade point average and GRE score, will score about
1.33 SD units above the mean on the success measure. This predicted standardized outcome score of
1.33 is equivalent to Erin’s raw (unstandardized) predicted outcome score of 66.10.
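As with the raw-score example, the standardized arithmetic is easy to verify:

```python
# Beta weights and Erin's z scores from the worked example.
beta_gpa, beta_gre = 0.31, 0.62
z_gpa, z_gre = 1.80, 1.25

# Yz pred = (beta_gpa)(gpa_z) + (beta_gre)(GRE_z)
yz_pred = beta_gpa * z_gpa + beta_gre * z_gre
print(round(yz_pred, 2))  # 1.33
```

The result matches the predicted standardized score of 1.33 reported in the text.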
5A.6 The Variate in Multiple Regression
Multivariate procedures typically involve building, developing, or solving for a weighted linear combi-
nation of variables. This weighted linear combination is called a variate. The variate in this instance is
the entity on the right side of the multiple regression equation composed of the weighted independent
(predictor) variables.
Although the variate is a weighted linear composite of the measured variables in the model, it
is often possible to view this variate holistically as representing some underlying dimension or
construct—that is, to conceive of it as a latent variable. In the preceding example where we were pre-
dicting success in graduate school, the variate might be interpreted as “academic aptitude” indexed
by the weighted linear combination of grade point average and GRE score. From this perspective, the
indicators of academic aptitude—grade point average and GRE score—were selected by the researchers
to be used in the study. They then used the regression technique to shape the most effective academic
aptitude variate to predict success in graduate school.
Based on the previous example, the academic aptitude variate is built to do the best job possible to
predict a value on a variable. That variable is the predicted success score. Note that the result of applying
the multiple regression model—the result of invoking the linear weighted composite of the predictor
variables (the variate)—is the predicted success score and not the actual success score. For most of the
cases in the data file, the predicted and the actual success scores of the students will be different. The
model minimizes these differences, but it cannot eliminate them. Thus, the variable “predicted success
score” and the variable “actual success score” are different variables, although we certainly hope that
they are reasonably related to each other. The variate that we have called academic aptitude generates
the predicted rather than the actual value of the success score (we will see in Section 13A.4 that the
structural equation used in structural equation modeling predicts the actual Y value because the
prediction error is included as a term in the model).
5A.7 The Standard (Simultaneous) Regression Method
In the standard (simultaneous) regression method, all of the predictors are entered into the equation in a single "step" (stage in the analysis). The standard method provides a full model solution in that all the predictors
are part of it.
The idea that these variables are entered into the equation simultaneously is true only in the sense
that the variables are entered in a single statistical step or block. But that single step is not at all simple
and unitary; when we look inside this step, we find that the process of determining the weights for
independent variables is governed by a coherent but complex strategy.
5A.7.1 The Example to Be Used
Rather than referring to abstract predictors and some amorphous dependent variable to broach this
topic, we will present the standard regression method by using an example with variables that have
names and meaning. To keep our drawings and explication manageable, we will work with a smaller
set of variables than would ordinarily be used in a study conceived from the beginning as a regression
design. Whereas an actual regression design might typically have from half a dozen to as many as a
dozen or more variables as potential predictors, we will use a simplified example of just three predictors
for our presentation purposes.
The dependent variable we use for this illustration is self-esteem as assessed by Coopersmith’s
(1981) Self-Esteem Inventory. Two of the predictors we use for this illustration are Watson, Clark,
and Tellegen’s (1988) measures of the relative frequency of positive and negative affective behaviors a
person typically exhibits. The third independent variable represents scores on the Openness scale of
the NEO Five-Factor Personality Inventory (Costa & McCrae, 1992). Openness generally assesses the
degree to which respondents appear to have greater aesthetic sensitivity, seek out new experiences, and
are aware of their internal states.
5A.7.2 Examining the Correlation Matrix
It is always desirable to initially examine the correlation matrix of the variables participating in a
regression analysis. This gives researchers an opportunity to examine the interrelationships of the
variables, not only between the dependent variable and the independent variables but also between
the independent variables themselves.
In examining the correlation matrix, we are looking for two features primarily. First, we want
to make sure that no predictor is so highly correlated with the dependent variable as to be relatively
interchangeable with it; correlations of about .70 and higher would suggest that such a predictor might
best be entered in the first block of a hierarchical analysis (see Section 6A.2) or not included in the
analysis rather than proceed with the standard regression analysis that we cover here. Second, we want
to make sure that no two predictors are so highly correlated that they are assessing the same underly-
ing construct; again, correlations of about .70 and higher would suggest that we might want to either
remove one of the two or combine them into a single composite variable before performing a standard
regression analysis.
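The screening just described is straightforward to carry out in software. The sketch below uses simulated data whose variable names echo the chapter's example; the .70 threshold is the rule of thumb given above:

```python
import numpy as np

# Simulated scores for the four variables (the data are fabricated).
rng = np.random.default_rng(2)
names = ["SelfEsteem", "PositiveAffect", "NegativeAffect", "Openness"]
cov = [[1.0,  0.5, -0.5, 0.2],
       [0.5,  1.0, -0.4, 0.2],
       [-0.5, -0.4, 1.0, 0.1],
       [0.2,  0.2,  0.1, 1.0]]
data = rng.multivariate_normal(mean=[0, 0, 0, 0], cov=cov, size=300)

# "Square" correlation matrix with 1.000 on the diagonal.
r = np.corrcoef(data, rowvar=False)

# Flag any pair of variables correlated at about .70 or higher.
flagged = [(names[i], names[j], round(r[i, j], 3))
           for i in range(4) for j in range(i + 1, 4)
           if abs(r[i, j]) >= 0.70]
print(flagged)  # expect no pair to reach the .70 rule of thumb here
```

Any pair that did appear in the flagged list would be a candidate for removal, combination into a composite, or hierarchical entry as discussed above.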
Table 5a.2 displays the correlation matrix of the variables in our example. We have presented it in
“square” form where the diagonal from upper left to lower right (containing the value 1.000 for each entry)
separates the matrix into two redundant halves. As can be seen, the dependent variable of self-esteem
is moderately correlated with both Positive Affect and Negative Affect but is only modestly correlated
with Openness. It can also be seen that Positive Affect and Negative Affect correlate more strongly with
each other than either does with Openness.
5A.7.3 Building the Regression Equation
The goal of any regression procedure is to predict or account for as much of the variance of the
criterion variable as is possible using the predictors at hand. In this example, that dependent variable
is Self-Esteem. At the beginning of the process, before the predictors are entered into the equation,
100% of the variance of Self-Esteem is unexplained. This is shown in Figure 5a.1. The dependent
variable of self-esteem is in place, and the predictors are ready to be evaluated by the regression
procedure.
On the first and only step of the standard regression procedure, all the predictors are entered as a set.
Figure 5a.1 Predictors Assembled Prior
This entry process captures the essence of standard regression—the contribution that each independent variable makes to the prediction when statistically controlling for all the other predictors. These other predictors thus behave as a set of covariates in the analysis in that the predictive work that they do as a set is allowed to account for the variance of the dependent variable before the predictive work of a given predictor is evaluated. Because these other predictors are afforded the opportunity to perform their predictive
work before the given predictor, we say that we have statistically controlled for these other predictors.
Each predictor is evaluated in turn in this manner, so that the regression coefficient obtained for any
predictor represents the situation in which all of the other predictors have been statistically con-
trolled (the other predictors have played the role of covariates).
This process is illustrated in Figure 5a.2. To evaluate the effectiveness of Positive Affect, the variables
of Negative Affect and Openness are inserted as a set into the model. Negative Affect and Openness
thus take on the role of covariates. Together, Negative Affect and Openness have accounted for some of
the variance of Self-Esteem (shown as diagonal lines in Figure 5a.2).
With Negative Affect and Openness in the equation for the moment, we are ready to evaluate
the contribution of Positive Affect. The criterion variable or dependent variable (Self-Esteem here)
is the focus of the multiple regression design. It is therefore the variance of Self-Esteem that we want
to account for or predict, and our goal is to explain as much of it as possible with our set of indepen-
dent variables. We face an interesting but subtle feature of multiple regression analysis in its efforts
to maximize the amount of dependent variable variance that we can explain. In the context of mul-
tiple regression analysis, our predictors must account for separate portions—rather than the same
portion—of the dependent variable’s variance. This is the key to understanding the regression process.
With Negative Affect and Openness already in the model, and thus already accounting for some of
the variance of Self-Esteem, Positive Affect, as the last variable to enter, must target the variance in
Self-Esteem that remains—the residual variance of Self-Esteem. The blank portion of Self-Esteem’s
rectangle in Figure 5a.2 represents the unaccounted-for (residual) portion of the variance of Self-
Esteem, and this is what the regression procedure focuses on in determining the predictive weight
Positive Affect will achieve in the model.
After the computations of the b and beta coefficients for Positive Affect have been made, it is
necessary to evaluate another one of the predictors. Thus, Positive Affect and another predictor (e.g.,
Negative Affect) are entered into the equation, and the strategy we have just outlined is repeated for
the remaining predictor (e.g., Openness). Each independent variable is put through this same process
until the weights for all have been determined. At the end of this complex process (which is defined as
a single “step” or “block” despite its complexity), the final weights are locked in and the results of the
analysis are printed.
Determining these weights involves the concept of the partial correlation, and we now briefly broach that subject.
As the term implies, a partial correlation is a correlation coefficient. Everything that we have
described about correlation coefficients (e.g., the Pearson r) applies equally to the particular coeffi-
cient known as a partial correlation. But as the term also implies, a partial correlation is special. When
you think of the Pearson r, you envision an index that captures the extent to which a linear relation-
ship is observed between one variable and another variable. A partial correlation describes the linear
relationship between one variable and a part of another variable. Specifically, a partial correlation is
the relationship between a given predictor and the residual variance of the dependent variable when the
rest of the predictors have been entered into the model. We discuss this in somewhat more detail in
Section 5A.10.2.
Consider the situation depicted in Figure 5a.2. Negative Affect and Openness have been entered
into the model (for the moment) so that we can evaluate the effectiveness of Positive Affect. The diago-
nal lines in Figure 5a.2 show the variance of Self-Esteem that is accounted for by Negative Affect and
Openness; the remaining blank area shows the residual variance of Self-Esteem. The partial correla-
tion associated with Positive Affect is the correlation between Positive Affect and the residual vari-
ance of Self-Esteem when the effects of Negative Affect and Openness have been statistically removed,
controlled, or “partialled out.” Such a correlation is called a partial correlation. A partial correlation
describes the linear relationship between two variables when the effects of other variables have been
statistically removed from one of them. In this sense, the variables already in the model are conceived
of as covariates in that their effects are statistically accounted for prior to evaluating the relationship
of Positive Affect and Self-Esteem. Once the regression procedure has determined how much Positive
Affect can contribute over and above the predictive work done by the set of predictors already in the
model (how much of the residual variance of Self-Esteem it can explain), the software starts the process
of computing the weight that Positive Affect will receive in the model.
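The residualizing logic just described can be sketched in a few lines of pure Python. This is an illustrative sketch on synthetic data, not the chapter's Self-Esteem example: for simplicity it controls for a single covariate z rather than two predictors, and it residualizes both variables, which is the conventional way a partial correlation is computed; the closed-form expression it checks against is the standard one-covariate formula.

```python
import random, math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    # Residuals of y after a simple linear regression of y on x
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

random.seed(1)
z = [random.gauss(0, 1) for _ in range(500)]                  # covariate already "in the model"
x = [0.6 * zi + random.gauss(0, 1) for zi in z]               # predictor being evaluated
y = [0.5 * zi + 0.4 * xi + random.gauss(0, 1) for zi, xi in zip(z, x)]

# Partial correlation of x with y controlling z:
# correlate the two residual series (z's effect removed from each)
partial = pearson_r(residuals(x, z), residuals(y, z))

# Check against the standard one-covariate closed form
rxy, rxz, rzy = pearson_r(x, y), pearson_r(x, z), pearson_r(z, y)
closed_form = (rxy - rxz * rzy) / math.sqrt((1 - rxz ** 2) * (1 - rzy ** 2))
print(round(partial, 4), round(closed_form, 4))
```

The two computations agree exactly; the residual route is simply the pictorial "remove the covariate, then correlate what is left" idea made concrete.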
Assume that each of the three predictors has been evaluated in a single but obviously complex step or
block, so that we know their weights and can construct the model. We will discuss the specific numerical
results shortly, but first we need to cover three additional and important concepts and their associated
statistical indexes: the squared multiple correlation, the squared semipartial correlations, and the
structure coefficients.
The first of these is the squared multiple correlation, symbolized by R2 and illustrated in Figure 5a.3.
All three predictors are now in the model, and based on our discussion in the last two sections, it likely
makes sense to you that all three variables in combination are accounting for the amount of Self-Esteem
variance depicted by the entire filled (shaded) area in Figure 5a.3.
You are already familiar with the idea that the degree of correlation between two variables can be
pictured as overlapping figures (we have used squares to conform to the pictorial style of path analysis
and structural equation modeling that we cover in the later chapters, but introductory statistics and
research methods texts tend to use circles). For the case of the Pearson r (or any bivariate correlation),
the shaded or overlapping area would show the strength of the correlation, and its magnitude would be indexed by r2.

Figure 5a.3 A Depiction of the Squared Multiple Correlation

Note. The shaded area, formed by the overlap of Negative Affect, Positive Affect, and Openness with Self-Esteem and based on all three predictors, represents the explained variance of Self-Esteem indexed by the squared multiple correlation (R2), which in this case is .48.
y,
The relationship shown in Figure 5a.3 is more complex than what is represented by r2, but it is
op
conceptually similar. Three variables are drawn as overlapping with Self-Esteem. Nonetheless, this still
represents a correlation, albeit one more complex than a bivariate correlation. Specifically, we are look-
ing at the relationship between the criterion variable (Self-Esteem) on the one hand and the three
predictors (Positive Affect, Negative Affect, and Openness) on the other hand. When we have three
or more variables involved in the relationship (there are four here), we can no longer use the Pearson
correlation coefficient to quantify the magnitude of that linear relationship—the Pearson r can index
the degree of relationship only when two variables are being considered. The correlation coefficient we
need to call on to quantify the degree of a more complex relationship is known as a multiple correlation
coefficient. It is symbolized as an uppercase italic R. That said, the Pearson r is really the limiting case of the multiple correlation, namely, the case in which there is only a single predictor.
A multiple correlation coefficient indexes the degree of linear association of one variable (the out-
come variable in the case of multiple regression analysis) with a set of other variables (the predictor
variables in the case of multiple regression analysis), and the squared multiple correlation (R2), some-
times called the coefficient of multiple determination, tells us the strength of this complex relationship,
that is, it tells us how much variance of the outcome variable is explained by the set of predictor vari-
ables. In Figure 5a.3, the diagonally shaded area—the overlap of Positive Affect, Negative Affect, and
Openness with Self-Esteem—represents the R2 for that relationship. In this case, we are explaining the
variance of Self-Esteem. The R2 value represents one way to evaluate the model. Larger values mean
that the model has accounted for greater amounts of the variance of the dependent variable.
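For the special case of two predictors, R2 can be written directly in terms of the three bivariate correlations, which makes the idea of "correlation with a set of variables" concrete. The correlation values below are invented for illustration; they are not taken from the chapter's three-predictor example.

```python
import math

# Bivariate correlations for a hypothetical two-predictor problem:
# r_y1 and r_y2 are each predictor's correlation with the DV,
# r_12 is the correlation between the two predictors
r_y1, r_y2, r_12 = 0.55, -0.45, -0.30

# Standard two-predictor expression for the squared multiple correlation
R2 = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
R = math.sqrt(R2)   # the multiple correlation itself

print(round(R2, 3), round(R, 3))   # 0.392 0.626
```

Note that R2 here (about .39) exceeds the largest single squared bivariate correlation (.55² ≈ .30), illustrating how a set of predictors can explain more variance than any one of them alone.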
The second feature important to note in Figure 5a.3 is that the three predictors overlap with each
other, indicating that they correlate with one another (as we documented in Table 5a.2). The degree to
which they correlate affects the partial regression weights these variables are assigned in the regression
equation, so the correlations of the predictors become a matter of some interest to researchers using a
regression design.
In Figure 5a.4, we have depicted the variance of Self-Esteem that is uniquely associated with only a single predictor by using a solid fill and
have retained the diagonal fill area to represent variance that overlaps between two or all of the pre-
dictors. The amount of variance explained uniquely by a predictor is indexed by another correlation
statistic known as the squared semipartial correlation, often symbolized as sr2. It represents the extent
to which each variable does independent predictive work when combined with the other predictors
in the model. Each predictor is associated with a squared semipartial correlation. The semipartial
correlation describes the linear relationship between a given predictor and the total variance of the dependent variable.
Figure 5a.4 A Depiction of the Squared Semipartial Correlations
(Callouts in the figure identify, for each predictor, the portion of Self-Esteem variance unique to it, indexed by that predictor's squared semipartial correlation.)
Note. The solid fill represents self-esteem variance accounted for that is unique to each predictor; the diagonal fill represents self-esteem
variance accounted for common to two or more predictors.
Distinguishing between the squared semipartial and squared partial correlations is subtle but very
important because these represent descriptions of two similar but different relationships between each
predictor and the dependent variable. To simplify our discussion, we have drawn in Figure 5a.5 only
two generic predictor (independent) variables (IV1 and IV2) for a given generic dependent variable.
In Figure 5a.5, the lowercase letters identify different areas of the variance of the predictor and
dependent variables so that we may contrast the squared semipartial correlation coefficient with the
squared partial correlation coefficient. These different variance areas are as follows:
•• Area a is the variance of the dependent variable that is explained but cannot be attributed
uniquely to either predictor.
•• Area b is the variance of the dependent variable that is explained uniquely by IV1.
•• Area c is the variance of IV1 that is not related to the dependent variable.
•• Area d is the variance of the dependent variable that is explained uniquely by IV2.
•• Area e is the variance of IV2 that is not related to the dependent variable.
•• Area f (the blank area of the dependent variable labeled twice in the figure) is the variance of the
dependent variable that is not explained by either predictor (it is the residual variance of the
dependent variable once the model with two predictors has been finalized).
Figure 5a.5 Contrasting the Squared Semipartial and Squared Partial Correlations for IV1

Consider Area b in Figure 5a.5, although an analogous analysis can be made for Area d. This is the variance of the dependent variable that is explained only (uniquely) by IV1. Because we are dealing with squared correlations that are interpreted as a percent of DV variance explained, we must compute a proportion or percent, that is, we must compute the value of a ratio between two variances. Area b is the conceptual focus of both the squared semipartial correlation and the squared partial correlation; in each case, we want to know the percentage of variance attributable to the unique contribution of IV1 with respect to one of two frames of reference:
•• The frame of reference used in computing the squared semipartial correlation is the total
variance of the dependent variable. In Figure 5a.5, the denominator would be a + b + d + f.
Thus, the computation of the squared semipartial correlation for IV1 is b/(a + b + d + f).
What we obtain is the percent of variance of the dependent variable that is associated with the
unique contribution of IV1.
•• The frame of reference used in computing the squared partial correlation between the pre-
dictor and the dependent variable is only that portion of the variance of the dependent variable
remaining when the effects of the other predictors have been removed (statistically removed,
nullified). In Figure 5a.5, the denominator would be b + f. Thus, the computation of the
squared partial correlation for IV1 is b/(b + f). What we obtain is the percent of variance of the
dependent variable not predicted by the other predictor(s) that is associated with the unique
contribution of IV1.
Given that the frame of reference for the squared partial correlation contains only Areas b and
f whereas the frame of reference for the squared semipartial correlation contains Areas b, f, a, and
d, it follows that the denominator for the squared semipartial correlation will always be larger than
the denominator for the squared partial correlation (unless the other areas have a zero value).
Because (in pictorial terms) we are asking about the relative size of Area b for both squared correla-
tions, the upshot of this straightforward arithmetic is that the squared partial correlation will almost
always have a larger value than the squared semipartial correlation. This explanation may be sum-
marized as follows:
•• The squared semipartial correlation represents the proportion of variance of the dependent
variable uniquely explained by an independent variable when the other predictors are taken
into consideration.
•• The squared partial correlation is the amount of explained variance of the dependent variable that is incremented by including an independent variable in a model that already contains the other predictors.

When the regression model is finally in place, as it is in Figure 5a.5, our interest in the squared
partial correlation fades because it was more useful in constructing the model, and our interest shifts
to the squared semipartial correlation. Thus, when examining Figure 5a.5 or the numerical results of
the regression analysis, what interests us is the variance of the dependent variable that is uniquely
explained by each predictor, that is, we are interested in the squared semipartial correlation associated
with each predictor (in Figure 5a.5, that would be Areas b and d with respect to the total variance of the dependent variable).
In Figure 5a.5, the squared multiple correlation (R2) can be visualized as the proportion of the total
variance of the dependent variable covered by Areas a, b, and d. That is, the squared multiple correla-
tion takes into account not only the unique contributions of the predictors (Areas b and d) but also the
overlap between them (Area a). The squared semipartial correlations focus only on the unique contributions of the predictors. Note, however, that the sum of the squared semipartial correlations across the predictors does not equal the squared multiple correlation. The reason for this is that the predictor
variables are themselves correlated and to that extent will overlap with each other in the prediction of
the dependent variable that they are attempting to predict.
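The arithmetic behind these statements can be verified for the two-predictor case sketched in Figure 5a.5. The snippet below uses made-up correlation values (not the chapter's data). Each squared semipartial is computed as the gain in R2 when that predictor joins the model; the squared partial rescales the same gain to the variance the other predictor left unexplained, which is why the partial can never be smaller than the semipartial and why the two unique contributions need not sum to R2.

```python
# Illustrative bivariate correlations (invented, not the chapter's data)
r_y1, r_y2, r_12 = 0.55, -0.45, -0.30

# Squared multiple correlation for the full two-predictor model
R2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Squared semipartials: unique gain in R2 over total DV variance
sr2_1 = R2_full - r_y2 ** 2       # unique contribution of IV1 (Area b / total)
sr2_2 = R2_full - r_y1 ** 2       # unique contribution of IV2 (Area d / total)

# Squared partial for IV2: the same gain, relative only to the variance
# left unexplained by IV1 alone (Area d / (d + f))
pr2_2 = sr2_2 / (1 - r_y1 ** 2)

print(round(sr2_2, 3), round(pr2_2, 3))          # the partial exceeds the semipartial
print(round(sr2_1 + sr2_2, 4), round(R2_full, 4))  # the unique parts fall short of R2
```

Because the denominator for the partial is smaller, pr2 exceeds sr2 whenever the other predictor explains any variance at all; and because the predictors overlap (Area a), sr2_1 + sr2_2 is less than R2_full.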
Generally, we can tell how well the model fits the data by considering the value of the squared
multiple correlation. But we can also evaluate how well the model works on an individual predic-
tor level by examining the squared semipartial correlations (Tabachnick & Fidell, 2013b). With the
squared semipartial correlations, we are looking directly at the unique contribution of each predictor
within the context of the model, and, clearly, independent variables with larger squared semipartial
correlations are making larger unique contributions.
There are some limitations in using squared semipartial correlations to compare the contributions
of the predictors. The unique contribution of each variable in multiple regression analysis is very much
a function of the correlations of the variables used in the analysis. It is quite likely that within the con-
text of a different set of predictors, the unique contributions of these variables would change, perhaps
substantially. Of course, this argument is true for the partial regression coefficients as well.
Based on this line of reasoning, one could put forward the argument that it would therefore be
extremely desirable to select predictors in a multiple regression design that are not at all correlated
between themselves but are highly correlated with the criterion variable. In such a fantasy scenario,
the predictors would account for different portions of the dependent variable’s variance, the squared
semipartial correlations would themselves be substantial, the overlap of the predictors would be mini-
mal, and the sum of the squared semipartial correlations would approximate the value of the squared
multiple correlation.
This argument may have a certain appeal at first glance, but it is not a viable strategy for both
practical and theoretical reasons. On the practical side, it would be difficult or perhaps even impos-
sible to find predictors in most research arenas that are related to the criterion variable but at the same
time are not themselves at least moderately correlated. On the theoretical side, it is desirable that the
correlations between the predictors in a research study are representative of those relationships in the
population. All else equal, to the extent that variables are related in the study as they are in the outside
world, the research results may be said to have a certain degree of external validity.
The consequence of moderate or greater correlation between the predictors is that the unique
contribution of each independent variable may be relatively small in comparison with the total amount
of explained variance of the prediction model because the predictors in such cases may overlap consid-
erably with each other. Comparing one very small semipartial value with another even smaller semi-
partial value is often not a productive use of a researcher’s time and runs the risk of yielding distorted
or inaccurate conclusions.
In our discussion of the variate, we emphasized that there was a difference between the predicted value
and the actual score that individuals obtained on the dependent variable. Our focus here is on the
predicted score, which is the value of the variate for the particular values of the independent variables
substituted in the model. The structure coefficient is the bivariate correlation between a particular
independent variable and the predicted (not the actual) score (Dunlap & Landis, 1998). Each predictor
is associated with a structure coefficient.
A structure coefficient represents the correlation between an individual variable that is a part of
the variate and the weighted linear combination itself. Stronger correlations indicate that the predic-
tor is a stronger reflection (indicator, gauge, marker) of the construct underlying the variate. Because
the variate is a latent variable, a structure coefficient can index how well a given predictor variable can
serve as an indicator or marker of the construct represented by the variate. This feature of structure
coefficients makes them extremely useful in multivariate analyses, and we will make considerable use of
them in the context of canonical correlation analysis (Chapters 7A and 7B), principal components and
factor analysis (Chapters 10A and 10B), and discriminant function analysis (Chapters 19A and 19B).
The numerical value of the structure coefficient is not contained in the output of IBM SPSS but is
easy to compute with a hand calculator using the following information available in the output:
Structure coefficient = rxi y / R
where rxi y is the Pearson correlation between the given predictor (xi) and the actual (measured) depen-
dent variable, and R is the multiple correlation.
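This hand computation is easy to script. The example below uses the worked example's squared multiple correlation (R2 = .483) together with a hypothetical predictor whose Pearson correlation with the measured dependent variable is .55; that .55 is an illustrative value, not a number taken from Table 5a.3.

```python
import math

R = math.sqrt(0.483)   # multiple correlation from the worked example (R2 = .483)
r_xy = 0.55            # hypothetical predictor-DV Pearson r (illustrative only)

# Structure coefficient: the predictor's correlation with the variate's
# predicted scores, obtainable by hand as r_xy divided by R
structure_coefficient = r_xy / R
print(round(structure_coefficient, 3))   # 0.791
```

A value this close to 1.0 would mark the predictor as a strong indicator of the construct underlying the variate.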
5A.12 Statistical Summary of the Regression Solution
There are two levels of the statistical summary of the regression solution, a characterization of the
effectiveness of the overall model and an assessment of the performance of the individual predictors.
Examining the results for the overall model takes precedence over dealing with the individual
predictors—if the overall model cannot predict better than chance, then there is little point in evaluat-
ing how each of the predictors fared. We discuss evaluating the overall model in Section 5A.13 and
examining the individual predictor variables in Section 5A.14.
The overall model is represented by the regression equation. There are two questions that we address
in evaluating the overall model, one involving a somewhat more complex answer than the other:
The more straightforward question is whether the predictors as a set can account for a statistically significant amount of the variance of the
dependent variable. This question is evaluated by using an ANOVA akin to a one-way between subjects
design (see Chapter 18A), with the single “effect” in the ANOVA procedure being the regression model
itself. The degrees of freedom associated with the total variance and its partitions are as follows:
•• The degrees of freedom for the total variance are equal to N − 1, where N is the sample size.
•• The degrees of freedom for the regression model (the effect) are equal to the number of predictor
variables in the model that we symbolize here as v.
•• The degrees of freedom for the error term are equal to (N − 1) − v; that is, it is equal to the total
degrees of freedom minus the number of predictors in the model.
The null hypothesis tested by the F ratio resulting from the ANOVA is that prediction is no better
than chance in that the predictor set cannot account for any of the variance of the dependent variable.
Another way to express the null hypothesis is that R2 = 0. If the F ratio from the ANOVA is statistically
significant, then it can be concluded that the model as a whole accounts for a statistically significant
percentage of the variance of the dependent variable (i.e., R2 > 0). In our example analysis using
Positive Affect, Negative Affect, and Openness to predict Self-Esteem, the effect of the regression
model was statistically significant, F(3, 416) = 129.32, p < .001, R2 = .48, adjusted R2 = .48. We would
therefore conclude that the three independent variables of Positive Affect, Negative Affect, and
Openness in combination significantly predicted Self-Esteem.
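The F ratio itself can be reconstructed from the summary statistics using the standard identity F = (R2/v) / [(1 − R2)/(N − 1 − v)]. With the example's values (N = 420 cases, v = 3 predictors, R2 ≈ .483), this reproduces the reported result up to rounding.

```python
# Reconstructing the overall-model F test from summary statistics
N, v, R2 = 420, 3, 0.483

df_regression = v
df_error = (N - 1) - v                      # total df minus the number of predictors
F = (R2 / df_regression) / ((1 - R2) / df_error)

# The text reports F(3, 416) = 129.32; the small discrepancy here reflects
# rounding in the R2 value we started from
print(df_regression, df_error, round(F, 2))
```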
5A.13.2 The Amount of Variance Explained by the Model: R2 and Adjusted R2
5A.13.2.1 Variance Explained: The Straightforward Portion of the Answer
The more complex of the two questions to answer concerns how much variance of the dependent
variable the model explains. We can answer this question at one level in a straightforward manner for
the moment by examining the value of R2. In our example, the value for R2, shown in a separate table
in the IBM SPSS output, was .483. The straightforward answer to the question, then, is that the three
predictors in this particular weighted linear combination were able to explain about 48% of the variance
of Self-Esteem.
We should also consider the magnitude of the obtained R2. One would ordinarily think of .483 as
reasonably substantial, and most researchers should not be terribly disappointed with R2s consider-
ably less than this. In the early stages of a research project or when studying a variable that may be
complexly determined (e.g., rate of spread of an epidemic, recovery from a certain disease, multicul-
tural sensitivity), very small but statistically significant R2s may be cause for celebration by a research
team. Just as we suggested in Sections 2.8.2 for eta square, in the absence of any context R2 values,
.10, .25, and .40 might be considered to be small, medium, and large strengths of effect, respectively
(Cohen, 1988); however, we conduct our research within a context, and so the magnitude of the effect
must be considered with respect to the theoretical and empirical milieu within which the research was
originally framed.
At another level, the answer to the question of how much variance is explained by the regression model has a more complex aspect to it. The reason it is more complex is that the obtained R2 value is actually an inflated estimate of the amount of variance the model would explain in a new sample.
We can identify two very general and not mutually incompatible strategies that can be implemented
(Darlington, 1960; Yin & Fan, 2001) to estimate the amount of R2 inflation (as we will see in a moment,
the ordinary terminology focuses on R2 shrinkage, the other side of the R2 inflation coin); one set of
strategies is empirical and is focused on error variance; another set of strategies is analytic and is
focused on the number of predictors with respect to sample size.
In any given sample, some of the error variance will, just by chance, be aligned with the dependent variable and so will actually be working in the direction of enhanced prediction. The multiple regression algorithm, how-
ever, is unable to distinguish between this chance enhancement (i.e., blind luck from the standpoint of
trying to achieve the best possible R2) and the real predictive power of the variables, so it uses every-
thing it can to maximize prediction—it generates the raw and standardized regression weights from
both true and error sources combined. By drawing information from both true score variance and
error variance because it cannot distinguish between the two sources in fitting the model to the data,
multiple regression procedures will overestimate the amount of variance that the model explains
(Cohen et al., 2003; Yin & Fan, 2001).
The dynamics of this problem can be understood in this way: In another sample, the random
dictates of error will operate differently, and if the weighting coefficients obtained from our original
regression analysis are applied to the new sample, the model will be less effective than it appeared to
be for the original sample, that is, the model will likely yield a somewhat lower R2 than was originally
obtained. This phenomenon is known by the term R2 shrinkage. R2 shrinkage is more of a problem when
we have relatively smaller sample sizes and relatively more variables in the analysis. As sample size and
the number of predictors reach more acceptable proportions (see Section 5A.13.2.4), the shrinkage of
R2 becomes that much less of an issue.
5A.13.2.3 Variance Explained: Empirical Strategies for Estimating the Amount of R2 Shrinkage
Empirical strategies for estimating the amount of R2 shrinkage call for performing a regression
analysis on selected portions of the sample, an approach generally known as a resampling strategy. In
the present context, resampling can be addressed through procedures such as cross-validation, double
cross-validation, and the use of a jackknife procedure.
To perform a cross-validation procedure, we ordinarily divide a large sample in half (into two
equal-sized subsamples, although we can also permit one sample to be larger than the other) by ran-
domly selecting the cases to be assigned to each. We then compute our regression analysis on one sub-
sample (the larger one if unequal sample sizes were used) and use those regression weights to predict
the criterion variable of the second “hold-back” sample (the smaller one if unequal sample sizes were
used). The R2 difference tells us the degree of predictive loss we have observed. We can also correlate the
predicted scores of the hold-back sample with their actual scores; this is known as the cross-validation
coefficient.
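The cross-validation procedure can be sketched in pure Python. The data below are synthetic (two correlated predictors with invented coefficients), and the two-predictor regression is solved directly from the normal equations; this is an illustration of the split-fit-predict logic, not a replacement for a statistics package.

```python
import random, math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_two_predictors(x1, x2, y):
    # Solve the 2x2 normal equations in centered form, then recover the intercept
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((a - m2) ** 2 for a in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (b - my) for a, b in zip(x1, y))
    s2y = sum((a - m2) * (b - my) for a, b in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return my - b1 * m1 - b2 * m2, b1, b2

random.seed(7)
n = 400
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.4 * a + random.gauss(0, 1) for a in x1]   # predictors correlate, as is typical
y = [0.6 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

half = n // 2   # cases are in random order, so a simple split serves as random assignment
a0, b1, b2 = fit_two_predictors(x1[:half], x2[:half], y[:half])

# Apply the first subsample's weights to the hold-back subsample, then
# correlate predicted with actual scores: the cross-validation coefficient
pred = [a0 + b1 * p + b2 * q for p, q in zip(x1[half:], x2[half:])]
cv_coefficient = pearson_r(pred, y[half:])
print(round(cv_coefficient, 3))
```

Squaring cv_coefficient and comparing it with the R2 obtained in the fitting subsample gives the predictive-loss estimate described above.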
Double cross-validation can be done by performing the cross-validation process in both directions—
that is, performing the regression analysis on each subsample and applying the results to the other. In a
sense, we obtain two estimates of shrinkage rather than one, that is, we can obtain two cross-validation
coefficients. If the shrinkage is not excessive, and there are few guidelines as to how to judge this, we
can then perform an analysis on the combined sample and report the double cross-validation results to
let readers know the estimated generalizability of the model.
The jackknife procedure was introduced by Quenouille (1949) and is currently treated (e.g., Beasley &
Rodgers, 2009; Carsey & Harden, 2014; Chernick, 2011; Efron, 1979; Fuller, 2009; Wu, 1986) in the
context of bootstrapping (where we draw with replacement repeated subsamples from our sample to
estimate the sampling distribution of a given parameter). The jackknife procedure is also called, for
what will be obvious reasons, a leave-one-out procedure.
Ordinarily, we include all of the cases with valid values on the dependent and independent
variables in an ordinary regression analysis. To apply a jackknife procedure, we would temporarily
remove one of those cases (i.e., leave one of those cases out of the analysis), say Case A, perform the
analysis without Case A, and then apply the regression coefficients obtained in that analysis to predict
the value of the dependent variable for Case A. We then “replace” Case A back into the data set, select
another case to remove, say Case B, repeat the process for Case B, and so on, until all cases have had
their Y score predicted.
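The leave-one-out loop can be sketched as follows, here with a single predictor and synthetic data for brevity; it illustrates the logic of the procedure rather than standing in for a dedicated jackknife routine.

```python
import random, math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_simple(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

random.seed(3)
n = 150
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.7 * xi + random.gauss(0, 1) for xi in x]

# Leave-one-out: refit without each case, then predict that case's Y score
loo_preds = []
for i in range(n):
    a, b = fit_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    loo_preds.append(a + b * x[i])

r2_full = pearson_r(x, y) ** 2             # full-sample explained variance
r2_jack = pearson_r(loo_preds, y) ** 2     # jackknifed counterpart
shrinkage_estimate = r2_full - r2_jack
print(round(r2_full, 3), round(r2_jack, 3))
```

With a reasonable case-to-predictor ratio, as here, the jackknifed value sits very close to the full-sample value; the gap widens as samples shrink or predictors multiply.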
The comparison of these jackknifed results with those obtained in the ordinary (full-sample)
analysis gives us an estimate of how much shrinkage in explained variance we might encounter down
the road. IBM SPSS does not have a jackknife procedure available for multiple regression analysis,
but that procedure is available for discriminant function analysis (see Section 19A.10.4). IBM SPSS
does, however, offer a bootstrapping add-on module (IBM, 2013b) but we do not cover it in this book;
coverage of this topic can be found in Chernick (2011), Davison and Hinkley (1997), and Efron and
Tibshirani (1993).
5A.13.2.4 Variance Explained: Analytic Strategies for Estimating the Amount of R2 Shrinkage
In addition to capitalizing on chance, R2 can also be mathematically inflated by having a relatively
larger number of predictors relative to the size of the sample, that is, R2 can be increased simply by
increasing the number of predictors we opt to include in the model for a given sample size (e.g.,
Stuewig, 2010; Wooldridge, 2009; Yin & Fan, 2001). The good news here is that statisticians have been
able to propose ways to “correct” or “adjust” the obtained R2 that take into account both the sample
size and the number of predictors in the model. When applied to the obtained R2, the result is known
as an adjusted R2. This adjusted R2 provides an estimate of what the R2 might have been had it not been
inflated by the number of predictors we have included in the model relative to our sample size. All of
the major statistical software packages report in their output an adjusted R2 value in addition to the
observed R2.
To further complicate this scenario, there are two different types of adjusted R2 values that are
described in the literature, and there is no single way to compute either of them. Some of these formulas
(e.g., Olkin & Pratt, 1958; Wherry, 1931) are intended to estimate the population value of R2, whereas
others (e.g., Browne, 1975; Darlington, 1960; Rozeboom, 1978) are intended to estimate the cross-
validation coefficient (Yin & Fan, 2001). Yin and Fan (2001) made a comparison of the performance
of 15 such formulas (some estimating the population value of R2 and others estimating the cross-
validation coefficient), and Walker (2007) compared four estimates of the cross-validation coefficient
in addition to a bootstrap procedure.
Our interest here is with the adjusted R2 as an estimate of the population value of R2. The algorithm
that is used by IBM SPSS to compute its adjusted R2 value (IBM, 2013a) and three other well-known
formulas, the Wherry formulas 1 and 2 and the Olkin–Pratt formula, are presented in Figure 5a.6. They
all contain the following three elements: the obtained R2 value, the sample size (N), and the number of predictor variables in the model (v).
Although these formulas are somewhat different and when solved will lead to somewhat different
adjusted values of R2, the estimates appear to be relatively close to each other from the standpoint of
researchers. For example, let us assume that investigators conducting a hypothetical research study
Figure 5a.6 The IBM SPSS Formula, the Wherry Formulas 1 and 2, and the Olkin–Pratt Formula for Computing the Adjusted R2 as an Estimate of the Population R2
Note. In the formulas to compute the adjusted R2 value, R2 is the obtained value from the analysis, N is the sample size, and v is the number of predictor variables in the model.
used a sample size of 100 (what we would regard as too small an N) and eight predictor variables (an
excessive number of predictors for that sample size), and they obtained an R2 of .25. The resulting adjusted R2 values produced by these formulas would all fall in the vicinity of .18 to .19.
Considering that these values hover around .18 or .19, the adjusted R2 values appear to be approxi-
mately three quarters the magnitude of the observed R2 value. In our view, this is a considerable amount
of estimated shrinkage.
We can contrast this hypothetical result with our worked example. In our worked example, the
adjusted R2 value was .479 (shown as part of the IBM SPSS output), giving us virtually the same value
as the unadjusted R2. That such little adjustment was made to R2 is most likely a function of the sample
size to number of variables ratio we used (the analysis was based on a sample size of 420 cases; with just
three predictors, our ratio was 140:1).
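Both numerical results can be reproduced with the common Wherry-type adjustment, 1 − (1 − R2)(N − 1)/(N − v − 1). This is the form most packages report; treating it as the same expression used by IBM SPSS (and shown in Figure 5a.6) is an assumption here, though it does recover the chapter's .479.

```python
def adjusted_r2(R2, N, v):
    # Common Wherry-type adjustment; assumed to match the packaged output
    return 1 - (1 - R2) * (N - 1) / (N - v - 1)

# Hypothetical study from the text: N = 100, v = 8, obtained R2 = .25
print(round(adjusted_r2(0.25, 100, 8), 3))    # 0.184, i.e., substantial estimated shrinkage

# Worked example: N = 420, v = 3, obtained R2 = .483
print(round(adjusted_r2(0.483, 420, 3), 3))   # 0.479, virtually no adjustment
```

The contrast between the two calls makes the sample-size-to-predictors point numerically: with a 140:1 ratio the adjustment is negligible, while at 12.5:1 roughly a quarter of the apparent explained variance evaporates.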
Yin and Fan (2001) suggested that good quality estimates of adjusted R2 values were obtained with
most of the R2 estimation formulas they evaluated when the ratio of sample size to number of predic-
tors was 100. In research environments where there are a limited number of potential cases that may
be recruited for a study (e.g., a university, hospital, or organizational setting), such a ratio may be dif-
ficult to achieve. In more restrictive settings where we must accept pragmatic compromises to get the
research done, our recommendations are that the sample size should generally be no less than about
200 or so cases and that researchers use at least 20 but preferably 30 or more cases per predictor.
Table 5a.3 summarizes the results of the analysis, presenting the b and beta coefficients and the t test result associated with each regression weight, the Pearson correlation (r) of each predictor with the dependent variable,
the amount of Self-Esteem variance each predictor has uniquely explained (squared semipartial cor-
relation), and the structure coefficient associated with each predictor. The constant is the Y intercept
of the raw score model and is shown in the last line of the table, and the R2 and the adjusted R2 values
are shown in the table note.
Table 5a.3 Summary of the Example for Multiple Regression
The predictor variables are shown in the first column of Table 5a.3. This represents a complete solution
in the sense that all the independent variables are included in the final equation regardless of how
much (or how little) they contribute to the prediction model.
Using the raw and standardized regression weights, and the Y intercept shown in Table 5a.3, we have
the elements of the two regression equations. These are produced below.
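Only some of the coefficients appear in the surrounding text (b weights of 2.89 and −2.42, a Y intercept of 56.66, and beta weights of .40 and −.43); the Openness coefficients are not quoted, so they are left symbolic below. With that caveat, the two equations take roughly this form:

```latex
% Raw score model (the b weight for Openness is not quoted; shown as b_3)
\hat{Y}_{\text{Self-Esteem}} = 56.66 + 2.89(\text{Positive Affect})
  - 2.42(\text{Negative Affect}) + b_3(\text{Openness})

% Standardized model (the beta for Openness is near zero; shown as \beta_3)
\hat{z}_{Y} = .40\,z_{\text{Positive Affect}} - .43\,z_{\text{Negative Affect}}
  + \beta_3\,z_{\text{Openness}}
```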
5A.14.3 t Tests
IBM SPSS tests the significance of each predictor in the model using t tests. The null hypothesis is that
a predictor’s weight is effectively equal to zero when the effects of the other predictors are taken into
account. This means that when the other predictors act as covariates and this predictor is targeting the
residual variance, according to the null hypothesis the predictor is unable to account for a statistically
significant portion of it; that is, the partial correlation between the predictor and the criterion variable
is not significantly different from zero. And it is a rare occurrence when every single independent vari-
able turns out to be a significant predictor. The t test output shown in Table 5a.3 informs us that only
Positive Affect and Negative Affect are statistically significant predictors in the model; Openness does
not receive a strong enough weight to reach that touchstone.
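IBM SPSS produces these t tests automatically, but it may help to see where they come from. The sketch below, which uses simulated data rather than the chapter's data set, fits an OLS model with NumPy and computes each weight's t statistic (the weight divided by its standard error); the effect sizes are hypothetical assumptions chosen so that, as in the worked example, two predictors matter and one does not:

```python
import numpy as np

def ols_t_stats(X, y):
    """Fit OLS with an intercept; return the b weights and their t
    statistics. Each t tests whether a weight differs from zero with
    the other predictors statistically controlled."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])      # add an intercept column
    xtx_inv = np.linalg.inv(Xd.T @ Xd)
    b = xtx_inv @ Xd.T @ y                     # raw-score weights
    resid = y - Xd @ b
    s2 = resid @ resid / (n - Xd.shape[1])     # residual variance
    se = np.sqrt(s2 * np.diag(xtx_inv))        # standard errors
    return b, b / se

rng = np.random.default_rng(42)
n = 420
X = rng.normal(size=(n, 3))                    # three predictors
# Only the first two predictors truly matter; the third plays the
# role Openness played in the worked example.
y = 3.0 * X[:, 0] - 2.5 * X[:, 1] + rng.normal(scale=5.0, size=n)
b, t = ols_t_stats(X, y)
print(np.round(t[1:], 2))                      # t for the three predictors
```

With 420 simulated cases, the two genuine predictors yield large t values while the null predictor's t typically falls near zero, mirroring the Openness result.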
5A.14.4 b Coefficients
The b and beta coefficients in Table 5a.3 show us the weights that the variables have been assigned at
the end of the equation-building process. The b weights are tied to the metrics on which the variables
are measured and are therefore difficult to compare with one another. But with respect to their own
metric, they are quite interpretable. The b weight for Positive Affect, for example, is 2.89. We may take
that to mean that when the other variables are controlled for, an increase of 1 point on the Positive
Affect measure is, on average, associated with a 2.89-point gain in Self-Esteem. Likewise, the b weight
of −2.42 for Negative Affect would mean that, with all of the other variables statistically controlled,
every point of increase on the Negative Affect measure (i.e., greater levels of Negative Affect)
corresponds to a lower score on the Self-Esteem measure of 2.42 points.
Table 5a.3 also shows the Y intercept for the linear function. This value of 56.66 would need to be
added to the weighted combination of variables in the raw score equation to obtain the predicted value
of Self-Esteem for any given research participant.
The beta weights for the independent variables are also shown in Table 5a.3. Here, all the variables are
in z score form and thus their beta weights, within limits, can be compared with each other. We can
see from Table 5a.3 that Positive Affect and Negative Affect have beta weights of similar magnitudes
and that Openness has a very low beta value. Thus, in achieving the goal of predicting Self-Esteem to
the greatest possible extent (to minimize the sum of the squared prediction errors), Positive Affect and
Negative Affect are given much more relative weight than Openness.
Some authors (e.g., Cohen et al., 2003; Pedhazur, 1997; Pedhazur & Schmelkin, 1991) have cautiously
argued that, at least under some circumstances, we may be able to compare the beta coefficients of the
predictors with each other. That is, on the basis of visual examination of the equation, it may be possible to say that predictors with larger beta weights contribute more to the prediction of the dependent variable.

It is possible to quantify the relative contribution of predictors using beta weights as the basis of the
comparison. Although Kachigan (1991) has proposed examining the ratio of the squared beta weights
to make this comparison, that procedure may be acceptable only in the rare situation when those pre-
dictors whose beta weights are being compared are uncorrelated (Pedhazur & Schmelkin, 1991). In the
everyday research context, where the independent variables are almost always significantly correlated,
we may simply compute the ratio of the actual beta weights (Pedhazur, 1997; Pedhazur & Schmelkin,
1991), placing the larger beta weight in the numerator of the ratio. This ratio reveals how much more
one independent variable contributes to prediction than another within the context of the model.
This comparison could work as follows. If we wanted to compare the efficacy of Negative Affect
(the most strongly weighted variable in the model) with the other (less strongly weighted) predictors,
we would ordinarily limit our comparison to only the statistically significant ones. In this case, we
would compare Negative Affect only with Positive Affect. We would therefore compute the ratio of the
beta weights (Negative Affect/Positive Affect) without carrying the sign of the beta through the computation (.43/.40 = 1.075). Based on this ratio (although we could certainly see this just by looking at the
beta weights themselves), we would say that Negative Affect was 1.075 times as potent a predictor in this model. Translated to ordinary language, we would say that Negative Affect and Positive Affect
make approximately the same degree of contribution to the prediction of Self-Esteem in the context of
this research study with the present set of variables.
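The arithmetic of this comparison is simple enough to make explicit:

```python
beta_negative_affect = -0.43   # beta weights from Table 5a.3
beta_positive_affect = 0.40

# Ratio of the absolute beta weights, larger weight in the numerator
ratio = abs(beta_negative_affect) / abs(beta_positive_affect)
print(round(ratio, 3))  # prints 1.075
```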
5A.14.5.3 Concerns With Using the Beta Coefficients to Evaluate Predictors
We indicated above that even when authors such as Pedhazur (1997; Pedhazur & Schmelkin, 1991)
endorse the use of beta coefficient ratios to evaluate the relative contribution of the independent variables
within the model, they usually do so with certain caveats. Take Pedhazur (1997) as a good illustration:
Broadly speaking, such an interpretation [stating that the effect of one predictor is twice as great as the effect of a second predictor] is legitimate, but it is not free of problems because the Beta is affected, among other things, by the variability of the variable with which they are associated.

Consider why this is so. An independent variable may predict the dependent variable to a great extent in isolation, and one would therefore expect to see a relatively high beta coefficient associated with that predictor.
another predictor that is highly correlated with the first predictor into the analysis, and all of a sudden,
the beta coefficients of both predictors can plummet (because each is evaluated with the other treated
as a covariate). The first predictor’s relationship with the dependent variable has not changed in this
scenario, but the presence of the second correlated predictor could seriously affect the magnitude of the
beta weight of the first. This “sensitivity” of the beta weights to the correlations between the predictors
places additional limitations on the generality of the betas and thus their use in evaluating or compar-
ing predictive effectiveness of the independent variables.
The sensitivity of a beta coefficient associated with a given predictor to the correlation of that
predictor with other predictors in the model can also be manifested in the following manner: If two or
three very highly correlated predictors were included in the model, their beta coefficients could exceed
a value of 1.00, sometimes by a considerable margin. Ordinarily, researchers would not include very
highly correlated variables in a regression analysis (they would either retain only one or create a single
composite variable of the set), but there are special analyses where researchers cannot condense such a
set of variables (see Section 6B.2, for an example); in such analyses, researchers focus on R2 or (depending
on the analysis) the change in R2 and ignore these aberrant beta coefficient values.
Recognizing that the value of a beta coefficient associated with a variable is affected by, among
other factors, the variability of the variable, the correlation of the variable with other predictors in the
model, and the measurement error in assessing the variable, Jacob Cohen (1990) in one of his classic
articles titled “Things I Have Learned (So Far)” went so far as to suggest that in many or most situa-
tions, simply assigning unit or unitary weights (values of 1.00) to all significant predictors can result
in at least as good a prediction model as using partial regression coefficients to two decimal values
as the weights for the variables. Cohen’s strategy simplifies the prediction model to a yes/no decision
for each potential predictor, and although it is not widely used in regression studies, it is a strategy
that is commonly used in connection with factor analysis where a variable is either included with a
unitary weight or not included when we construct the subscales that are used to represent a factor
(see Section 10B.14).
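Cohen's point can be illustrated on simulated data (the predictors and effect sizes below are hypothetical assumptions): a unit-weighted composite of the significant predictors often correlates with the criterion nearly as strongly as the OLS-weighted composite does.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
x1, x2 = rng.normal(size=(2, n))
# Criterion driven by both predictors, one positively and one negatively
y = 3.0 * x1 - 2.5 * x2 + rng.normal(scale=4.0, size=n)

# OLS weights estimated on the same sample
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
ols_pred = X @ b

# Cohen-style unit weights: +1 or -1 for each significant predictor,
# signed to match the direction of its relationship with y
unit_pred = 1.0 * x1 - 1.0 * x2

r_ols = np.corrcoef(y, ols_pred)[0, 1]
r_unit = np.corrcoef(y, unit_pred)[0, 1]
print(round(r_ols, 3), round(r_unit, 3))
```

The two correlations are typically within a few hundredths of each other, which is the heart of Cohen's argument.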
5A.14.5.4 Recommendations for Using Betas
We do not want to leave you completely hanging at this point in our treatment, so we will answer the
obvious questions. Should you use the beta weights to assess the relative strengths of the predictors in
your own research? Yes, although we have considerable sympathy with the wisdom expressed by
Cohen (1990) of using unit weights. Should beta coefficients be the only index you check out? No. The
structure coefficients and the squared semipartial correlations should be examined as well. And, ulti-
mately, using the raw regression weights to inform us of how much of a change in the dependent
variable is associated with a unit difference in the predictor, given that all of the other predictors are
acting as covariates, will prove to be a very worthwhile interpretation strategy.
The fourth numerical column in Table 5a.3 shows the simple Pearson correlations between Self-
Esteem and each of the predictors. We have briefly described the correlations earlier. For present
purposes, we can see that the correlations between Self-Esteem and Positive Affect and Openness are
positive. This was the case because each of these variables is scored in the positive direction—higher
scores mean that respondents exhibit greater levels of self-esteem and more positive affective behaviors
and that they are more open to new or interesting experiences. Because higher scores on the Self-Esteem scale indicate greater positive feelings about oneself, it is not surprising that these two predictors are positively correlated with it. On the other hand, Negative Affect is negatively correlated with
Self-Esteem. This is also not surprising because those individuals who exhibit more negative affective
behaviors are typically those who have lower levels of self-esteem.
The next to last column of Table 5a.3 displays the squared semipartial correlations for each predictor. These correlations are shown in the IBM SPSS printout as “part correlations” and appear in the printout in their nonsquared form. This statistic indexes the variance accounted for uniquely by each predictor in the full model. What is interesting here, and this is pretty typical of multiple regression
research, is that the sum of these squared semipartial correlations is less than the R2. That is, .14, .16,
and .00 add up to .30 and not to the R2 of .48.
The reason these squared semipartial correlations do not add to the value of R2 is that the independent
variables overlap (are correlated) with each other. Here, the predictors uniquely account for 30% of the
variance, whereas (by subtraction) 18% of the accounted-for variance is handled by more than one of
them. We therefore have some but not a huge amount of redundancy built into our set of predictors.
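The bookkeeping behind these figures is easy to verify directly:

```python
r2 = 0.48                   # overall R2 from the worked example
sr2 = [0.14, 0.16, 0.00]    # squared semipartial correlations

unique = sum(sr2)           # variance claimed uniquely by the predictors
shared = r2 - unique        # variance explained by more than one of them
print(round(unique, 2), round(shared, 2))  # prints 0.3 0.18
```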
Using the squared semipartial correlations is another gauge of relative predictor strength in the
model. From this perspective, Positive Affect and Negative Affect are approximately tied in their unique
contribution to the prediction model under the present research circumstances, whereas Openness is
making no contribution on its own.
5A.14.8 The Structure Coefficients
The last column in Table 5a.3 shows the structure coefficients, an index of the correlation of each
variable with the weighted linear combination (the variate or prediction model). These coefficients
needed to be hand calculated (see Section 5A.11) because IBM SPSS does not provide them. For each
independent variable in the table, we divided the Pearson r representing the correlation of the inde-
pendent variable and the dependent variable (shown in the fourth numerical column) by the value of
the multiple correlation. To illustrate this computation for Positive Affect, we divide its Pearson cor-
relation with Self-Esteem (.55) by the value of R (the square root of R2); thus, .55 is divided by the
square root of .483, or .55/.69 = approximately .80.
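The hand calculation for Positive Affect can be reproduced as follows (it comes to roughly .79 to .80, depending on how far R is rounded before dividing):

```python
import math

r_xy = 0.55   # Pearson r between Positive Affect and Self-Esteem
r2 = 0.483    # overall R2 of the model

# Structure coefficient: correlation with Y divided by R (= sqrt of R2)
structure = r_xy / math.sqrt(r2)
print(round(structure, 2))
```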
The structure coefficients indicate that Positive Affect and Negative Affect correlate reasonably
highly with the variate. In this example, using the structure coefficients as a basis to compare the contribution of the predictors presents the same picture as those painted by the beta weights and the
squared semipartial correlations. We would use these structure coefficients to interpret the variate; in
this example, we would say that in the context of this predictor set, the affect levels of individuals best
predict self-esteem. Note that in the everyday world more than affect levels unquestionably predict
self-esteem but, because we used only three predictors in this study, our conclusions are limited. Such
a limitation is generally true for multiple regression analysis, in that we can draw our conclusions only
on the variables in the study, and the variable set we used may not be inclusive of all the potential deter-
miners of our outcome variable (we may not be able to realistically fully specify all of the potentially
viable predictors).
Such consistency between the interpretations based on the structure coefficients and the beta coefficients as we saw in this example is not always obtained. Beta coefficients and structure coefficients differ in the following respects:

•• A beta coefficient associated with its predictor reflects the correlations of that predictor with
the other predictors in the analysis. A structure coefficient does not take into account the
correlations between the predictors.
•• Beta weights can exceed ±1 when the predictors are very strongly
correlated with each other. Many researchers are not keen on seeing, much less interpreting, beta
weights greater than unity. However, structure coefficients are absolutely bounded by the range
±1 because they are correlation coefficients, thus making them always clearly interpretable.
Our recommendations are consistent with what we offered above for beta weights. We concur with
Thompson and Borrello (1985) that the structure coefficients are a useful companion index of relative
predictor contribution. Unlike the beta coefficients and the squared semipartial correlations, structure
coefficients are not affected by the correlations between the predictors although, as is true for all of the
regression statistics, the structure coefficients could change substantially if a different set of predictors
happened to be used.
Pedhazur (1997) notes that structure coefficients will show the same pattern of relationships as the
Pearson correlations of the predictors and the criterion. Because of this, Pedhazur is not convinced of
the utility of structure coefficients. Nonetheless, by focusing on the correlation between the predictor and the variate, we believe that structure coefficients add a worthwhile nuance to the interpretation of the regression analysis. Furthermore, it is common practice to
make extensive and regular use of structure coefficients in other multivariate analyses where our focus
is on interpreting a variate (e.g., factor analysis, discriminant function analysis, canonical correlation
analysis), and so it makes sense to include regression under that umbrella as well.
5A.15 Step Methods of Building the Model
The step methods of building the regression equation that we briefly cover here are part of the class of statistical regression methods, and it will be clear from our descriptions that the software programs are in charge of the decision process for selecting the ordering of the predictors as the model is
built. We cover here the forward method, the backward method, and the stepwise method. These
methods construct the model one step at a time rather than all at once as the standard method does.
The primary goal of these step methods is to build a model with only the “important” predictors
in it, although importance is still relative to the set of predictors that are participating in the analysis.
The methods differ primarily in how they arrange the steps in entering or removing variables from
the model.
5A.16 The Forward Method

In the forward method of multiple regression analysis, rather than placing all the variables in the
model at once, we add independent variables to the model one variable at a time. Thus, each step corresponds to a single variable absorbed into the model. At each step, we enter the particular variable
that adds the most predictive power at that time, with the proviso that the variable accounts for a sta-
tistically significant amount of the unexplained variance of the dependent variable (i.e., that its partial
correlation is statistically significant). Most software applications use an alpha of .05 to define statistical significance. If we were working with the set of variables we used to illustrate the standard regres-
sion method, Negative Affect would be entered first. We know this because, with no variables in the
model at the start and building the model one variable at a time, the variable correlating most strongly
with the outcome variable (Self-Esteem in our example) would be entered first (assuming statistical
significance).
In the forward method, once a variable is entered into the model, that variable remains permanently in the model. It may seem odd for us to say this, but permanent membership in the model is not necessarily true for variables entered into the model for the other step methods we discuss. For the next
step in the forward method, the remaining variables are evaluated and the variable with the highest
statistically significant partial correlation (the correlation between the residual variance of Self-Esteem
and that additional predictor) is entered provided that the partial correlation is statistically significant.
In this case, Positive Affect would join Negative Affect as a predictor in the model.
This process of adding variables to the model is repeated for each remaining predictor with the
variables in the model all acting as covariates. We would find, with Negative Affect and Positive Affect
in the model, that Openness would not be entered; that is, it would not account for a significant amount
of the residual variance accounted for by Negative Affect and Positive Affect (i.e., it would not be asso-
ciated with a statistically significant partial correlation coefficient). Once the next variable in line (the best of the remaining predictors) fails to meet the criterion for entry into the model, the forward procedure ends with however many variables are already in the model and the remaining variables not
included in the model. In our example, the forward procedure would stop at the end of the second step
and Openness would remain on the sidelines.
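A minimal sketch of the forward method follows, run on simulated data. Rather than the exact .05 p value that most packages use, it applies a classic F-to-enter cutoff of 4 (approximately the .05 criterion for large samples); the data and the threshold are illustrative assumptions, not SPSS's implementation:

```python
import numpy as np

def fit_r2(X, y):
    """R2 of an OLS fit with an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_select(X, y, f_to_enter=4.0):
    """Forward method: at each step, enter the candidate that adds the
    most R2, provided its partial F exceeds the entry threshold."""
    n, k = X.shape
    selected = []
    while True:
        best, best_f = None, f_to_enter
        r2_now = fit_r2(X[:, selected], y) if selected else 0.0
        for j in set(range(k)) - set(selected):
            r2_new = fit_r2(X[:, selected + [j]], y)
            df = n - len(selected) - 2          # residual df of new model
            f = (r2_new - r2_now) * df / (1 - r2_new)
            if f > best_f:                      # track the strongest entrant
                best, best_f = j, f
        if best is None:                        # no candidate qualifies: stop
            return selected
        selected.append(best)

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 3))
# Predictors 0 and 1 matter; predictor 2 plays the role of Openness
y = 3.0 * X[:, 0] - 2.5 * X[:, 1] + rng.normal(scale=5.0, size=n)
print(forward_select(X, y))
```

As in the chapter's example, the two genuinely predictive variables enter (strongest first) and the null predictor usually remains on the sidelines.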
5A.17 The Backward Method
The backward method works not by adding significant variables to the model but, rather, by removing
nonsignificant predictors from the model one step at a time. The very first action performed by the
backward method is the same one used by the standard method; it enters all the predictors into the
equation regardless of their worth. But whereas the standard method stops here, the backward method
is just getting started.
The model with all the variables in it is now examined, and the significant predictors are marked
for retention on the next step. Nonsignificant predictors are then evaluated, and the most expend-
able of them—the one whose loss would least significantly decrease the R2—is removed from the
equation. A new model is built in the absence of that one independent variable, and the evaluation
process is repeated. Once again, the most expendable independent variable (with the requirement
that it is not statistically significantly contributing to R2) is removed. This removal process and model
reconstruction process continues until there are only statistically significant predictors remaining in
the equation. In our example, Openness would have been removed at the first opportunity. The back-
ward method would have stopped at that point because both remaining variables would have been
significant predictors.
Backward regression does not always produce the same model as forward regression even though it
would have done so in our simplified example. Here is why: Being entered into the equation in the
forward method requires predictors to meet a more stringent criterion than variables being retained
in the model in the backward method. This creates a situation in which it is more difficult to get into
the model than to remain in it. The alpha or probability level associated with entry and removal defines these criteria. To enter the model under the forward method, a predictor must account for a statistically significant portion of the unexplained variance of the dependent variable. The alpha level governing this entry decision is usually the traditional
.05 alpha level. By most standards, this is a fairly stringent criterion. When we look for predictors to
remove under the backward method, the alpha level usually drops to .10 as the default in most programs (the removal criterion needs to be somewhat less stringent than the entry criterion in order
to avoid a logic glitch in the entry-removal decision process—see Section 5A.19). This means that a
predictor needs to be significant at only .10 (not at .05) to retain its place in the equation. Thus, an
independent variable is eligible to be removed from the equation at a particular step in the backward
method if its probability level is greater than .10 (e.g., p = .11), but it will be retained in the equation if
its probability level is equal to or less than .10 (e.g., p = .09).
The consequences of using these different criteria for entry and removal affect only those variables
whose probabilities are between the entry and removal criteria. To see why this is true, first consider
variables that are not within this zone.
•• If a variable does not meet the standard of p = .10, it is removed from the equation. This
variable would also by definition not meet the .05 alpha-level criterion for entry either, so there
is no difference in the outcome for this predictor under either criterion—it is not going to wind
up in the equation in either the forward or backward methods.
•• If a variable does meet the criterion of .05, it will always be allowed entry to the equation and
will certainly not be removed by the backward method; again, there is no difference in outcome
for such a predictor under either method.
Variables with probability levels between these two criteria are in a more interesting position.
Assume that we are well into the backward process, and at this juncture, the weakest predictor is one
whose probability is .08. This variable would not have been allowed into the equation by the forward
method if it were considered for entry at this point because to get in, it would have to meet a .05 alpha
level to achieve statistical significance. However, under the backward method, this variable was freely
added to the equation at the beginning, and the only issue here is whether it is to be removed. When
we examine its current probability level and find it to be .08, we determine that this predictor is statistically significant at the .10 alpha level. It therefore remains in the equation. In this case, the model built under the backward method would retain a predictor that the forward method would not have admitted.

Figure 5a.7 The Unique Contribution of Variable K Is Reduced by the Addition of Variable L

As we have seen, the predictors in the equation admitted under a probability level of .05 can still overlap with each other. This is shown in Figure 5a.7.
In Figure 5a.7, predictor J was entered first, K was entered second, and L was just entered as the
third predictor. We are poised at the moment when L joined the equation. Note that between predictors
J and L, there is very little predictive work that can be attributed uniquely to K. At this moment, the
squared semipartial correlation associated with K (showing its unique contribution to the prediction
model) is quite small.
In the forward method, the fact that K’s unique contribution has been substantially reduced by L’s
presence would leave the procedure unfazed because it does not have a removal option available to it.
But this is the stepwise method, and it is prepared to remove a predictor if necessary. When the amount
of unique variance that K now accounts for is examined with variables J and L acting as covariates, let’s
presume that it is not significant at the removal criterion of .10 (say its p value is .126). K is thus judged
to no longer be contributing effectively to the prediction model, and it is removed. Of course, as more
predictors are entered into the equation, the gestalt could change dramatically, and K might very well
be called on to perform predictive duties later in the analysis.
We have just described the reason that the entry criterion is more severe than the removal criterion.
It can be summarized as follows. If getting into the equation was easier than getting out, then variables
removed at one step might get entered again at the next step because they might still be able to achieve
that less stringent level of probability needed for entry. There is then a chance that the stepwise proce-
dure could be caught in an endless loop where the same variable kept being removed on one step and
entered again on the next. By making entry more exacting than removal, this conundrum is avoided.
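The asymmetry between the two criteria can be reduced to a pair of rules (the .05 and .10 values are the typical software defaults described above):

```python
P_ENTER, P_REMOVE = 0.05, 0.10   # typical default entry and removal alphas

def can_enter(p):
    """Entry: the predictor must be significant at the entry alpha."""
    return p <= P_ENTER

def will_be_removed(p):
    """Removal: the predictor is dropped only if p exceeds the removal alpha."""
    return p > P_REMOVE

# A predictor sitting between the two criteria (e.g., p = .08) could not
# enter the model on its own, yet once in, it would not be removed.
print(can_enter(0.08), will_be_removed(0.08))  # prints False False
```

Because no p value can satisfy removal (p > .10) while also satisfying entry (p ≤ .05), a removed variable can never be re-entered on the next step, which is precisely how the endless loop is avoided.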
The primary advantage of using the standard method is that it presents a complete picture of the regression
outcome to researchers. If the variables were important enough to earn a place in the design of the study,
then they are given room in the model even if they are not adding very much to the R2. That is, on the
assumption that the variables were selected on the basis of their relevance to theory or at least on the
basis of hypotheses based on a comprehensive review of the existing literature on the topic, the standard
model provides an opportunity to see how they fare as a set in predicting the dependent variable.
Each independent variable in it has earned the right to remain in the equation through a hard, com-
petitive struggle. This same argument applies when considering the forward and backward methods.
The forward and backward methods give what their users consider the essence of the solution by
is being evaluated when the contributions of the other predictors have been statistically controlled.
Such “masking” of potentially good predictors can lead researchers to draw incomplete or improper
conclusions from the results of the analysis. One way around this problem is for the researchers to
exercise some judgment in which variables are entered at certain points in the analysis, and this is
discussed in Chapter 6A. This issue is also related to multicollinearity, a topic that we discuss in
Section 5A.21.
The step methods have become increasingly less popular over the years as their weaknesses have
become better understood and as hierarchical methods and path approaches have gained in popularity.
Tabachnick and Fidell (2013b), for example, have expressed serious concerns about this group of meth-
ods, especially the stepwise method, and they are not alone. Here is a brief summary of the interrelated
drawbacks of using this set of methods.
•• These methods, particularly the stepwise method, may need better than the 40 to 1 ratio of
cases to independent variables because there are serious threats to external validity (Cohen
et al., 2003, p. 162). That is, the model that is built may overfit the sample because a different
sample may yield somewhat different relationships (correlations) between the variables in the
analysis and that could completely change which variables were entered into the model.
•• The statistical criteria for building the equation identify variables for inclusion if they are
better predictors than the other candidates. But “better” could mean “just a tiny bit better” or
“a whole lot better.” One variable may win the nomination to enter the equation, but the mag-
nitude by which the variable achieved that victory may be too small to matter to researchers.
•• If the victory of getting into the model by one variable is within the margin of error in the
measurement of another variable, identifying the one variable as a predictor at the expense of
the other may obscure viable alternative prediction models.
•• Variables that can substantially predict the dependent variable may be excluded from the
equations built by the step methods because some other variable or combination of variables
does the job a little bit better. It is conceivable that several independent variables taken together
may predict the criterion variable fairly well, but step procedures consider only one variable at
a time.
•• There is a tendency using the step methods to “overfit” the data (Lattin, Carroll, & Green,
2003). Briefly, this criticism suggests that variables are chosen for inclusion in the model
“based on their ability to explain variance in the sample that may or may not be characteristic
of the variance in the population by capitalizing unduly on error, chance correlation, or both”
(Lattin et al., 2003, p. 51).
empirical research findings and wish to examine the combined predictive power of that set of predictors. But because they are functioning in combination, the weights of the predictors in the model are
a function of their interrelationships; thus, we are not evaluating them in isolation or in subsets. The
standard method will allow us to test hypotheses about the model as a whole; if that is the goal, then
that is what should be used.
The step methods are intended to identify which variables should be in the model on purely
statistical grounds. Many researchers discourage such an atheoretical approach. On the other hand,
there may be certain applications where all we want is to obtain the largest R2 with the fewest number
of predictors, recognizing that the resulting model may have less external validity than desired. Under
these conditions, some researchers may consider using a step method.
Before deciding that one of the statistical step procedures is to be used, it is very important to consider alternative methods of performing the regression analysis. Although they do require more
thoughtful decision making rather than just entering the variables and selecting a statistical method,
the flexibility and potential explanatory power they afford more than compensate for the effort it takes
to run such analyses. Some of these regression procedures are discussed in Chapter 6A and the path
model approach is broached in Chapters 12A through 15B.
5A.21 Collinearity and Multicollinearity
Collinearity is a condition that exists when two predictors correlate very strongly; multicollinearity is a
condition that exists when more than two predictors correlate very strongly. Note that we are talking
about the relationships between the predictor variables only and not about correlations between each
of the predictors and the dependent variable.
Regardless of whether we are talking about two predictors or a set of three or more predictors,
multicollinearity can distort the interpretation of multiple regression results. For example, if two variables
are highly correlated, then they are largely confounded with one another; that is, they are essentially
measuring the same characteristic, and it would be impossible to say which of the two was the more
relevant. Statistically, because the standard regression procedure controls for all the other predictors
when it is evaluating a given independent variable, it is likely that neither predictor variable would
receive any substantial weight in the model. This is true because when the procedure evaluates one of
these two predictors, the other is (momentarily) already in the equation accounting for almost all the
variance that would be explained by the first. The irony is that each on its own might very well be a
good predictor of the criterion variable. On the positive side, with both variables in the model, the R2
value will be appropriately high, and if the goal of the research is to maximize R2, then multicollinearity
might not be an immediate problem.
When the research goal is to understand the interplay of the predictors and not simply to maximize
R2, multicollinearity can cause several problems in the analysis. One problem caused by the presence
of multicollinearity is that the values of the standardized regression coefficients of the highly correlated
independent variables are distorted, sometimes exceeding the ordinarily expected range of ±1. A sec-
ond problem is that the standard errors of the regression weights of those multicollinear predictors can
be inflated, thereby enlarging their confidence intervals, sometimes to the point where they contain the
zero value. If that is the case, we could not reliably determine if increases in the predictor are associated
with increases or decreases in the criterion variable. A third problem is that if multicollinearity is suf-
ficiently great, certain internal mathematical operations (e.g., matrix inversion) are disrupted, and the
statistical program comes to a screeching halt.
Identifying collinearity or multicollinearity requires researchers to examine the data in certain ways.
A high correlation is easy to spot when considering only two variables. Just examine the Pearson correla-
tions between the variables in the analysis as a prelude to multiple regression analysis. Two variables that
are very strongly related should raise a “red flag.” As a general rule of thumb, we recommend that two
variables correlated in the middle .7s or higher should probably not be used together in a regression or
any other multivariate analysis. Allison (1999a) suggests that you “almost certainly have a problem if the
correlation is above .8, but there may be difficulties that appear well before that value” (p. 64).
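The pairwise screening just described can be sketched in a few lines of code. Everything below is illustrative: the data, the variable names, and the .75 cutoff are hypothetical stand-ins for the mid-.7s rule of thumb, not values from the chapter.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_collinear(predictors, cutoff=0.75):
    """Return the predictor pairs whose |r| meets or exceeds the cutoff."""
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(predictors[a], predictors[b])
            if abs(r) >= cutoff:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Toy data: x1 and x2 are nearly redundant; x3 is only moderately related.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
    "x3": [3, 1, 4, 1, 5, 9],
}
pairs = flag_collinear(data)   # only the (x1, x2) pair is flagged
```

In practice the same screen is read directly off the correlation table that precedes the regression output.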
One common cause of multicollinearity is that researchers may use subscales of an inventory as
well as the full inventory score as predictors. Depending on how the subscales have been computed,
it is possible for them in combination to correlate almost perfectly with the full inventory score. We
strongly advise users to employ either the subscales or the full inventory score, but not all of them in
the analysis.
Another common cause of multicollinearity is including in the analysis variables that assess the
same construct. Researchers should either drop all but one of them from the analysis or consider the
possibility of combining them in some fashion if it makes sense. For example, we might combine height
and weight to form a measure of body mass. As another example, we might average three highly corre-
lated survey items; principal components analysis and exploratory factor analysis, discussed in Chapters
10A and 10B, can be used to help determine which variables might productively be averaged together
without losing too much information. Further, related measures may be able to be used as indicators of
a latent variable that can then be placed into a structural equation model (see Chapters 14A and 14B).
A less common cause of an analysis failing because of multicollinearity is placing into the analysis
two measures that are mathematical transformations of each other (e.g., number of correct and incor-
rect responses; time and speed of response). Researchers should use only one of the measures rather
than both of them.
Multicollinearity is much more difficult to detect when it is some (linear) combination of variables
that produces a high multiple correlation in some subset of the predictor variables. We would worry if
that correlation reached the mid .8s, but Allison (1999a, p. 141) gets concerned if those multiple cor-
relations reached into the high .7s (R2 of about .60). Many statistical software programs will allow us
to compute multiple correlations for different combinations of variables so that we can examine them.
Thus, we can scan these correlations for such high values and take the necessary steps to attempt to fix
the problem.
Most regression software packages have what is called a tolerance parameter that tries to protect
the procedure from multicollinearity by rejecting predictor variables that are too highly correlated
with other independent variables. Conceptually, tolerance is the amount of a predictor’s variance not
accounted for by the other predictors (1 − R2 between predictors). Lower tolerance values indicate that
there are stronger relationships (increasing the chances of obtaining multicollinearity) between the
predictor variables. Allison (1999a) cautions that tolerances in the range of .40 are worthy of concern;
other authors have suggested that tolerance values in the range of .1 are problematic (Myers, 1990;
Pituch & Stevens, 2016).
A related statistic is the variance inflation factor (VIF), which is computed as 1 divided by toler-
ance. A VIF value of 2.50 is associated with a tolerance of .40 and is considered problematic by Allison
(1999a); a VIF value of 10 is associated with a tolerance of .1 and is considered problematic by Cohen
et al. (2003), Myers (1990), and Pituch and Stevens (2016).
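The tolerance and VIF relationships described in the last two paragraphs reduce to simple arithmetic, sketched below. The two-predictor shortcut (tolerance = 1 − r²) and the r = .775 example are illustrative choices; the benchmark values are the ones cited in the text.

```python
def tolerance_two_predictors(r):
    """With exactly two predictors, the R-squared of one regressed on the
    other is their squared correlation, so tolerance = 1 - r**2."""
    return 1.0 - r ** 2

def vif(tolerance):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1.0 / tolerance

# Benchmarks cited in the text: tolerance .40 <-> VIF 2.50 (Allison, 1999a)
# and tolerance .10 <-> VIF 10 (Myers, 1990; Pituch & Stevens, 2016).
assert abs(vif(0.40) - 2.50) < 1e-9
assert abs(vif(0.10) - 10.0) < 1e-9

# Two predictors correlating r = .775 share about 60% of their variance,
# leaving a tolerance of roughly .40 -- right at the cautionary boundary.
tol = tolerance_two_predictors(0.775)
```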
Berk, R. A. (2003). Regression analysis: A constructive critique. Thousand Oaks, CA: SAGE.
Berry, W. D. (1993). Understanding regression assumptions (Sage University Papers Series on Quantitative Applications in the Social Sciences, series no. 07-92). Newbury Park, CA: SAGE.
Cohen, J. (1968). Multiple regression as a general data analytic system. Psychological Bulletin, 70, 426–443.
Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161–182.
Draper, N. R., Guttman, I., & Lapczak, L. (1979). Actual rejection levels in a certain stepwise test. Communications in Statistics, A8, 99–105.
Draper, N. R., & Smith, H. (2014). Applied regression analysis (3rd ed.). Hoboken, NJ: Wiley & Sons.
Fox, J. (1991). Regression diagnostics. Newbury Park, CA: SAGE.
Green, S. A. (1991). How many subjects does it take to do a multiple regression analysis? Multivariate Behavioral Research, 26, 499–510.
Kahane, L. H. (2001). Regression basics. Thousand Oaks, CA: SAGE.
Lopez, R. P., & Guarino, A. J. (2011). Uncertainty and decision making for residents with dementia. Clinical Nursing Research, 20, 228–240.
Lorenz, F. O. (1987). Teaching about influence in simple regression. Teaching Sociology, 15, 173–177.
Montgomery, D. C., Peck, E. A., & Vining, G. G. (2012). Introduction to linear regression analysis (5th ed.). Hoboken, NJ: Wiley & Sons.
Pardoe, I. (2012). Applied regression modeling (2nd ed.). Hoboken, NJ: Wiley & Sons.
Pituch, K. A., & Stevens, J. P. (2016). Applied statistics for the social sciences (6th ed.). New York, NY: Routledge.
Schafer, W. D. (1991). Reporting nonhierarchical regression results. Measurement and Evaluation in Counseling and Development, 24, 146–149.
Schroeder, L. D., Sjoquist, D. L., & Stephan, P. E. (1986). Understanding regression analysis: An introductory guide. Beverly Hills, CA: SAGE.
Sherry, A., & Henson, R. K. (2005). Conducting and interpreting canonical correlation analysis in personality research: A user-friendly primer. Journal of Personality Assessment, 84, 37–48.
Thompson, B. (1989). Why won’t stepwise methods die? Measurement and Evaluation in Counseling and Development, 21, 146–148.
Trusty, J., Thompson, B., & Petrocelli, J. V. (2004). Practical guide for reporting effect size in quantitative research in the Journal of Counseling & Development. Journal of Counseling & Development, 82, 107–110.
Weisberg, S. (2013). Applied linear regression analysis (4th ed.). Hoboken, NJ: Wiley & Sons.
5B
Multiple Regression Analysis Using IBM SPSS
T his chapter will demonstrate how to perform multiple linear regression analysis with IBM SPSS
first using the standard method and then using the stepwise method. We will use the data file
Personality in these demonstrations.
5B.1 Standard Multiple Regression
For purposes of illustrating standard linear regression, assume that we are interested in predicting self-
esteem based on the combination of negative affect (experiencing negative emotions), positive affect
(experiencing positive emotions), openness to experience (e.g., trying new foods, exploring new
places), extraversion, neuroticism, and trait anxiety. Selecting the sequence Analyze → Regression →
Linear opens the Linear Regression main dialog window displayed in Figure 5b.1. From the variables
list panel, we move esteem to the Dependent panel and negafect, posafect, neoopen, neoextra, neo-
neuro, and tanx to the Independent(s) panel. The Method drop-down menu will be left at its default
setting of Enter, which requests a standard regression analysis (all of the predictors are entered into
the model in a single step).
Selecting the Statistics pushbutton opens the Linear Regression: Statistics dialog window shown in
Figure 5b.2. By default, Estimates in the Regression Coefficients panel is checked. This instructs IBM
SPSS to print the value of the regression coefficient and related measures. We also retained the follow-
ing defaults: Model fit, which provides R square, adjusted R square, the standard error, and an ANOVA
table; R squared change, which is useful when there are multiple predictors that are being entered in
stages so that we can see where this information is placed in the output; Descriptives, which provides
the means and standard deviations of the variables as well as the correlations table; and Part and
partial correlations, which produces the partial and semipartial correlations when multiple predictors
are used. Clicking Continue returns us to the main dialog screen.
Copyright ©2017 by SAGE Publications, Inc.
This work may not be reproduced or distributed in any form or by any means without express written permission of the publisher.
The Method drop-down menu also offers the step methods, and we will discuss this in Section 5B.2.
We have retained the defaults of including the Y inter-
cept (the constant) in the equation and of excluding
cases listwise. The choice Exclude cases listwise (some-
times called listwise deletion) means that all cases must
have valid values on all of the variables in the analysis
in order to be included; a missing value on even one
of the variables is sufficient to exclude that case.
Selecting this choice ensures that the set of variables,
and thus the regression model, is based on the same set
of cases. So long as there is relatively little missing data,
this choice is best. Clicking Continue returns us to
the main dialog box, and selecting OK produces the
analysis.
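Listwise deletion as described above can be sketched as a simple filter. The case records and variable names below are hypothetical; None stands in for a missing value.

```python
def listwise_complete(cases, variables):
    """Keep only cases with a valid value on every variable in the analysis."""
    return [c for c in cases if all(c.get(v) is not None for v in variables)]

cases = [
    {"esteem": 31, "tanx": 42, "posafect": 35},
    {"esteem": 28, "tanx": None, "posafect": 40},  # one missing value -> excluded
    {"esteem": 35, "tanx": 38, "posafect": 33},
]
kept = listwise_complete(cases, ["esteem", "tanx", "posafect"])
```

The second case is dropped even though it is missing only one of the three values, which is exactly what makes listwise deletion costly when missing data are plentiful.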
The correlation table is organized into three major rows: the first contains the Pearson r values, the second contains the probabilities of obtaining
those values if the null hypothesis was true, and the third provides sample size.
The dependent variable esteem is placed by IBM SPSS on the first row and column of the correlation
table, and the other variables appear in the order we entered them into the analysis. The study repre-
sented by our data set was designed for a somewhat different purpose, so our choice of variables was a bit
limited. Thus, the correlations of self-esteem with the predictor variables in the analysis are higher than
we would ordinarily prefer, and many of the other variables are themselves likewise intercorrelated more
than we would typically find in most studies. Nonetheless, the example is still useful for our purposes.
Figure 5b.5 displays the results of the analysis. The middle table shows the test of significance of the
model using an ANOVA. There are 419 (N − 1) total degrees of freedom. With six predictors, the
Regression effect has 6 degrees of freedom. The Regression effect is statistically significant, indicating
that the combination of predictors explains more of the variance of the dependent variable than can be
explained by chance. Of primary interest are the R Square and Adjusted R Square values, which are .607 and .601, respectively.
We learn from these that the weighted combination of the predictor variables explained approximately
60% of the variance of self-esteem. The loss of so little strength in computing the Adjusted R Square
value is primarily due to our relatively large sample size combined with a relatively small set of
Figure 5b.4 Descriptive Statistics and Correlations Output for Standard Regression
predictors. Using the standard regression procedure where all of the predictors were entered simultaneously into the model, R Square Change went from zero before the model was fitted to the data to the final R Square value of .607.
The Partial column under Correlations lists the partial correlations for each predictor as it was
evaluated for its weighting in the model (the correlation between the predictor and the dependent vari-
able when the other predictors are treated as covariates). For example, the partial correlation associated
with posafect is .219. This represents the correlation between posafect and the residual variance of the
self-esteem dependent variable when statistically controlling for the other predictor variables.
The Part column under Correlations lists the semipartial correlations for each predictor once the
model is finalized. Squaring these values informs us of the percentage of variance of self-esteem that
each predictor uniquely explains. For example, trait anxiety accounts uniquely for about 3% of the variance of self-esteem (−.170 * −.170 = .0289 or approximately .03) given the contributions of the other predictors.
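The unique-variance arithmetic in this paragraph is just the square of the part (semipartial) correlation:

```python
def unique_variance(part_r):
    """Proportion of criterion variance a predictor explains uniquely."""
    return part_r ** 2

# Trait anxiety's part correlation of -.170 from the output:
uv = unique_variance(-0.170)   # .0289, i.e., about 3% of the variance
```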
The Y intercept of the raw score model is labeled as the Constant and has a value here of 96.885.
Of primary interest here are the unstandardized or raw (B) and standardized (Beta) coefficients, and
their significance levels determined by t tests. With the exception of negative affect and openness, all of
the predictors are statistically significant. As can be seen by examining the beta weights, trait anxiety
followed by neuroticism followed by positive affect were all making relatively larger contributions to
the prediction model.
The regression coefficients are partial regression coefficients because their values take into account
the other predictor variables in the model; they inform us of the predicted change in the depen-
dent variable for every unit increase in that predictor. For example, positive affect is associated with
an unstandardized partial regression coefficient of 1.338 and signifies that, when controlling for the
other predictors, for every additional point on the positive affect measure, we would predict a gain of
1.338 points on the self-esteem measure. As another example, neuroticism is associated with a partial
regression coefficient of −.477 and signifies that, when controlling for the other predictors, every
additional point on the neuroticism measure corresponds to a decrement of .477 points on the
self-esteem measure.
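To make the interpretation of the partial regression coefficients concrete, the sketch below plugs the reported constant (96.885) and the posafect (1.338) and neoneuro (−.477) weights into a prediction function. For simplicity the other four predictors are omitted (implicitly held at zero), so this illustrates the per-unit logic rather than the full reported model.

```python
def predict_esteem(posafect, neoneuro, constant=96.885,
                   b_posafect=1.338, b_neoneuro=-0.477):
    """Predicted self-esteem from two predictors of the reported model."""
    return constant + b_posafect * posafect + b_neoneuro * neoneuro

base = predict_esteem(posafect=30, neoneuro=20)
# One extra point of positive affect raises the prediction by 1.338 ...
gain = predict_esteem(posafect=31, neoneuro=20) - base
# ... while one extra point of neuroticism lowers it by .477.
change = predict_esteem(posafect=30, neoneuro=21) - base
```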
This example serves to illustrate two important related points about multiple regression analysis.
First, it is the model as a whole that is the focus of the analysis. Variables are treated akin to team players,
weighted in such a way that the sum of the squared residuals of the model is minimized. Thus, it is the
set of variables in this particular (weighted) configuration that maximizes prediction—swap out one of
these predictors for a new variable and the whole configuration that represents the best prediction can
be quite different.
The second important point about regression analysis that this example illustrates, which is related
to the first, is that a highly predictive variable can be “left out in the cold,” being “sacrificed” for the
“good of the model.” Note that negative affect in isolation correlates rather substantially with self-
esteem (r = −.572), and if it was the only predictor it would have a beta weight of −.572 (recall that in
simple linear regression, the Pearson r is the beta weight of the predictor), yet in combination with the
other predictors is not a significant predictor in the multiple regression model. The reason for it not
being weighted substantially in the model is that one or more of the other variables in the analysis are
accomplishing its predictive work. But the point is that just because a variable receives a modest weight
in the model or just because a variable is not contributing a statistically significant degree of prediction
does not mean that it would necessarily be a poor predictor in isolation.
Structure coefficients are the correlations of the predictors in the model with the overall predictor variate, and
these structure coefficients help researchers interpret the dimension underlying the predictor model
(see Section 5A.11). They are easy enough to calculate by hand, and we incorporate these structure
coefficients into our report of the results in Section 5B.1.7. Structure coefficients are computed by
dividing the Pearson correlation for the given variable by the value of the multiple correlation coef-
ficient associated with the model (r/R). For example, the structure coefficient for negafect would be
−.572/.779 or −.734. This represents the correlation of each predictor with the predicted value of the
dependent variable.
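The r/R computation just described is a one-liner; the figures below are the worked example from the text (negafect: r = −.572, model R = .779).

```python
def structure_coefficient(r_with_criterion, multiple_r):
    """Correlation of a predictor with the predicted criterion values."""
    return r_with_criterion / multiple_r

sc = structure_coefficient(-0.572, 0.779)   # approximately -.734
```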
Negative affect, positive affect, openness to experience, extraversion, neuroticism, and trait
anxiety were used in a standard regression analysis to predict self-esteem. The correlations
of the variables are shown in Table 5b.1. As can be seen, all correlations, except for the one
between openness and extraversion, were statistically significant.
The prediction model was statistically significant, F(6, 413) = 106.356, p < .001, and
accounted for approximately 60% of the variance of self-esteem (R2 = .607, adjusted R2 = .601).
Lower levels of trait anxiety and neuroticism, and to a lesser extent higher levels of positive
affect and extraversion, primarily predicted self-esteem. The raw and standardized regression
coefficients of the predictors together with their correlations with self-esteem, the squared
semipartial correlations, and the structure coefficients are shown in Table 5b.2. Trait anxiety
received the strongest weight in the model, followed by neuroticism and positive affect. With
the sizeable correlations between the predictors, the unique variance explained by each of
the variables indexed by the squared semipartial correlations was quite low. Inspection of the
structure coefficients suggests that, with the possible exception of extraversion (whose correla-
tion is still relatively substantial), the other significant predictors were strong indicators of the
underlying (latent) variable described by the model, which can be interpreted as well-being.
Table 5b.1 Correlations of the Variables in the Analysis (N = 420)
5B.2 Stepwise Multiple Regression

We will again use the data file Personality in this demonstration. In the process of our description, we will point out areas
of similarity and difference between the standard and step methods.
5B.2.1 Analysis Setup: Main Regression Dialog Window
Select Analyze → Regression → Linear. This brings us to the Linear Regression main dialog window displayed in Figure 5b.6. From the variables list panel, we move esteem to the Dependent panel
and negafect, posafect, neoopen, neoextra, neoneuro, and tanx to the Independent(s) panel. The
Method drop-down menu contains the set of step methods that IBM SPSS can run. The only one you
may not recognize is Remove, which allows a set of variables to be removed from the model together.
Choose Stepwise as the Method from the drop-down menu as shown in Figure 5b.6.
Figure 5b.6 Main Dialog Window for Linear Regression
The Stepping Method Criteria panel is now applicable as we are using the stepwise method. To avoid looping variables continually in and out of the model, it is appropriate
to set different probability levels for Entry
and Removal. The defaults used by IBM
SPSS that are shown in Figure 5b.8 are com-
mon settings, and we recommend them.
Remember that in the stepwise procedure,
variables already entered into the model can
be removed at a later step if they are no lon-
ger contributing a statistically significant
amount of prediction.
Earning entry to the model is set at an
alpha level of .05 (e.g., a variable with a prob-
ability of .07 will not be entered) and is the
more stringent of the two settings. But to be removed, a variable must have an associated probability of
greater than .10 (e.g., a variable with an associated probability of .12 will be removed but one with an
associated probability of .07 will remain in the model). In essence, it is more difficult to get in than be
removed. This is a good thing and allows the stepwise procedure to function. Click Continue to return
to the main dialog window, and click OK to perform the analysis.
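The asymmetric entry/removal rule described above can be sketched as a small decision function using the IBM SPSS defaults of .05 for entry and .10 for removal; the function itself is a hypothetical illustration, not SPSS's internal code.

```python
def stepwise_decision(p_value, in_model, p_enter=0.05, p_remove=0.10):
    """What happens to one candidate predictor at a given step."""
    if not in_model:
        return "enter" if p_value < p_enter else "stay out"
    return "remove" if p_value > p_remove else "stay in"

# The three examples from the text:
d1 = stepwise_decision(0.07, in_model=False)  # p = .07: not entered
d2 = stepwise_decision(0.12, in_model=True)   # p = .12: removed
d3 = stepwise_decision(0.07, in_model=True)   # p = .07: remains in the model
```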
Figure 5b.9 Tests of Significance for Each Step in the Regression Analysis
Examining the output shown in Figure 5b.9 informs us that the final model was built in four steps;
each step resulted in a statistically significant model. Examining the df column shows us that one vari-
able was added during each step (the degrees of freedom for the Regression effect track this for us as
they are counts of the number of predictors in the model). We can also deduce that no variables were
removed from the model since the count of predictors in the model steadily increases from 1 to 4.
This deduction that no variables were removed is verified by the display shown in Figure 5b.10,
which tracks variables that have been entered and removed at each step. As can be seen, trait anxiety,
positive affect, neuroticism, and extraversion have been entered on Steps 1 through 4, respectively,
without any variables having been removed on any step.
Figure 5b.11, the Model Summary, presents the R Square and Adjusted R Square values for each
step along with the amount of R Square Change. In the first step, as can be seen from the footnote
beneath the Model Summary table, trait anxiety was entered into the model. The R Square with that
predictor in the model was .525. Not coincidentally, that is the square of the correlation between trait
anxiety and self-esteem (−.724² = .525) and is the value of R Square Change.
On the second step, positive affect was added to the model. The R Square with both predictors
in the model was .566; thus, we gained .041 in the value of R Square (.566 − .525 = .041), and this is
reflected in the R Square Change for that step. By the time we arrive at the end of the fourth step, our
R Square value has reached .603. Note that this value is very close to but not identical to the R2 value
we obtained under the standard method (with the other statistically nonsignificant variables included
in the model).
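The R Square bookkeeping across the first two steps reduces to the arithmetic below, using the values reported in the text (r = −.724 between trait anxiety and self-esteem; R Square = .566 after Step 2):

```python
r_tanx_esteem = -0.724
step1_r2 = r_tanx_esteem ** 2          # about .525 (the output carries more
                                       # decimal places of r than shown here)
step2_r2 = 0.566                       # reported after posafect enters
r2_change_step2 = step2_r2 - step1_r2  # about .041, the R Square Change
```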
The unstandardized and standardized regression coefficients are readjusted at each step to reflect the additional
variables in the model. Ordinarily, although it is interesting to observe the dynamic changes taking
place, we are usually interested in the final model. Note also that the values of the regression coeffi-
cients are different from those associated with the same variables in the standard regression analysis.
That the differences are not huge is due to the fact that these four variables did almost the same amount
of predictive work in much the same configuration as did the six predictors using the standard
method. If economy of model were relevant, we would probably be very happy with the trimmed
model of four variables replacing the full model containing six variables.
Figure 5b.13 addresses the fate of the remaining variables. For each step, IBM SPSS tells us which
variables were not entered. In addition to tests of the statistical significance of each variable, we also see
displayed the partial correlations. This information together tells us what will happen in the following
step. For example, consider Step 1, which contains the five excluded variables. Positive affect has the
highest partial correlation (.294), and it is statistically significant; thus, it will be the variable next entered
on Step 2. On the second step, with four variables (of the six) being considered for inclusion, we see that
neuroticism with a statistically significant partial correlation of −.269 wins the struggle for entry next. By
the time we reach the fourth step, there is no variable of the excluded set that has a statistically significant
partial correlation for entry at Step 5; thus, the stepwise procedure ends after completing the fourth step.
Figure 5b.13 The Results of the Stepwise Regression Analysis
Negative affect, positive affect, openness to experience, extraversion, neuroticism, and trait
anxiety were used in a stepwise multiple regression analysis to predict self-esteem. The corre-
lations of the variables are shown in Table 5b.1. As can be seen, all correlations except for the
one between openness and extraversion were statistically significant.
A stepwise multiple regression procedure was performed to generate a parsimonious predic-
tion model. The final model contained four of the six predictors and was reached in four steps with
no variables removed. The model was statistically significant, F(4, 415) = 157.626, p < .001, and
accounted for approximately 60% of the variance of self-esteem (R2 = .603, adjusted R2 = .599).
Lower levels of trait anxiety and neuroticism, and to a lesser extent higher levels of positive
affect and extraversion, predicted self-esteem. The raw and standardized regression coefficients
of the predictors together with their correlations with self-esteem, their squared semipartial
correlations, and their structure coefficients are shown in Table 5b.3. Trait anxiety received
the strongest weight in the model, followed by neuroticism and positive affect; extraversion
received the lowest of the four weights. With the sizeable correlations between the predic-
tors, the unique variance explained by each of the variables indexed by the squared semipar-
tial correlations was relatively low: trait anxiety, positive affect, neuroticism, and extraversion
uniquely accounted for approximately 4%, 2%, 3%, and less than 1% of the variance of self-
esteem. The latent factor represented by the model appears to be interpretable as well-being.
Inspection of the structure coefficients suggests that trait anxiety and neuroticism were very
strong indicators of well-being, positive affect was a relatively strong indicator of well-being,