Running and interpreting multiple regression in SPSS (includes review of assumptions)
The purpose of this presentation is to demonstrate (a) procedures you can use to obtain regression output
in SPSS and (b) how to interpret that output. Included is a discussion of various options that are available
through the basic regression module for evaluating model assumptions.
This video will necessarily be kept short, so if you are interested in a deeper dive into the interpretation of the
output, be sure to download the PowerPoint linked underneath the video description. The PowerPoint
includes citations you can consult on the “rules of thumb” I discuss when evaluating assumptions and
identifying potential outliers and influential cases.
A copy of the data will be provided as a link as well, so be sure to download it if you want to follow along.
Finally, I will be including a document as a link under the video description that will contain links to other
videos and materials based on this example/data. So if you want to learn more, be on the lookout.
If you find the video and supporting materials helpful, please take the time to “like” the video and share the
link with others who are learning. Also, please consider subscribing to my YouTube channel.
Youtube video link: https://www.youtube.com/watch?v=W0BwlZv6Ul0
Scenario: We will be carrying out a multiple regression with student (a) interest, (b) anxiety, (c) mastery
goals, (d) performance goals and (e) gender identification as predictors (IV’s) of student achievement.
Interest, anxiety, mastery goals, performance goals, and achievement are all assumed to be continuous
variables. Gender identification is a binary IV (which is permissible in OLS regression) that has been
dummy coded (0 = identified male, 1 = identified female). Dummy coding is a form of coding for binary
variables that facilitates interpretation of the intercept when the variable is included in a regression model.
[In this dataset, SESLEV refers to socio-economic status. It is coded 1=low, 2=medium, 3=high. However, this
variable is not used in the model at this time.]
Here is a diagram of the regression model
(using SEM drawing conventions) to visualize
the proposed predictive relationships. The
figure was drawn using AMOS.
The options under the ‘save’ menu will allow you to save new variables to your dataset as a function of
the estimated regression model. For example, you can save predicted Y values (i.e., fitted Y’s, or Ŷ),
unstandardized residuals (i.e., Y − Ŷ), standardized residuals, and studentized residuals, along with other
variables that can be useful for identifying potential outliers and influential cases.
This menu allows one to obtain a residuals plot, plotting the residuals against the fitted values on Y. One
can also generate additional plots for evaluating the normality of the residuals (using the histogram
and normal probability plot of residuals). By clicking “Produce all partial plots,” one can obtain plots of the
relationship between each IV and the DV, controlling for the remaining IV’s.
In this example, we are plotting studentized residuals (SRESID) against standardized predicted (i.e., fitted Y)
values (ZPRED).
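For those who prefer syntax over the dialogs, the run described in this presentation can be approximated with something like the following. This is a sketch only: the variable names achieve, interest, anxiety, mgoal, pgoal, and genderid are placeholders, so substitute the names used in the downloadable dataset.

  * Multiple regression of achievement on the five predictors (placeholder names).
  REGRESSION
    /MISSING LISTWISE
    /STATISTICS COEFF OUTS CI(95) R ANOVA ZPP TOL COLLIN
    /DEPENDENT achieve
    /METHOD=ENTER interest anxiety mgoal pgoal genderid
    /SCATTERPLOT=(*SRESID, *ZPRED)
    /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID)
    /PARTIALPLOT ALL
    /SAVE PRED RESID ZRESID SRESID MAHAL COOK DFFIT SDBETA.

The /SAVE keywords correspond to the saved variables referenced later in this document (RES_1, ZRE_1, SRE_1, MAH_1, COO_1, DFF_1, and the standardized DFBETAS).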
This portion of the output is useful
for evaluating the omnibus effect of
the IV’s on the DV.
The Model Summary table contains the Multiple Correlation (also known as Multiple R) between the set of IV’s
and the DV. (In effect, it is the correlation between the fitted Y values (i.e., Ŷ) and the actual Y values.)
The coefficient of determination (i.e., R-square) is the square of the multiple correlation and reflects the
proportion of variation in Y accounted for by the fitted Y values (i.e., Ŷ).
R-square is positively biased (as an estimator of the population R-square) when the sample size is small and the
number of predictors is large. The adjusted R-square adjusts R-square for the sample size and the number of IV’s
in your model. In this output, there is very little difference between R-square (.412) and the Adjusted R-square
(.390). The difference between R-square and adjusted R-square is referred to as shrinkage.
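As a rough check, the adjusted value can be reproduced from the usual formula, using n = 140 and k = 5 (taken from the ANOVA degrees of freedom below):
Adjusted R-square = 1 − (1 − R²)(n − 1)/(n − k − 1) = 1 − (1 − .412)(139/134) ≈ .390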
The analysis of variance is used to test the
statistical significance of the R-square value in the
Model Summary table. The null hypothesis is that
the population R-square is zero. Here, the ANOVA
results indicate statistical significance
[F(5,134)=18.770, p<.001], suggesting that the
population R-square is significantly greater than
zero.
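As an aside, the F statistic can be recovered from R-square itself:
F = (R²/k) / [(1 − R²)/(n − k − 1)] = (.412/5) / (.588/134) ≈ 18.8,
which matches the reported F(5, 134) = 18.770 up to rounding of R-square.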
The Constant is the intercept for the model. The intercept is the predicted value on Y (i.e., Ŷ) when your IV’s are
all zero. Another way of putting it is that the intercept is the conditional mean on Y when the X’s are all zero. Unless
your IV’s are measured in such a way that a value of zero is a meaningful point on those scales, the intercept will
not be particularly useful. This is the reason why it is often treated as a nuisance parameter. Nevertheless, if
your IV’s are coded/scored in a manner where zero is a meaningful point on their scales, then it can be very
useful for interpretation.
The t-test for the intercept tests whether it differs significantly from zero. This is often not a priority when carrying
out regression analysis, so if you do not have any specific hypothesis concerning the intercept being non-zero,
then this test can be ignored.
The remaining values in the B column are unstandardized partial regression slopes. They reflect the predicted
change in Y per unit increase on an IV (controlling for the remaining IV’s).
The null hypothesis and research (alternative) hypothesis when your research/alternative hypothesis is non-
directional (i.e., that the population slope is not 0) are H0: βk = 0 and H1: βk ≠ 0, respectively. If testing a
directional hypothesis that the population slope is > 0, then the null and research (alternative) hypotheses are
H0: βk ≤ 0 and H1: βk > 0. If testing a directional hypothesis that the population slope is < 0, then they are
H0: βk ≥ 0 and H1: βk < 0.
If testing the null against a non-directional research hypothesis, statistical significance is indicated by a p-value
(see sig. column) less than alpha (conventionally, it is .05; hereafter, we will use this threshold by default).
If testing the null against a directional research hypothesis, you will need to pay attention not only to the p-value
but ALSO the sign associated with the regression coefficient. If the sign associated with the regression coefficient
indicates a direction of relationship that is contrary to that you have hypothesized, then you cannot reject the
null – even if the p-value that is printed in your output “seems” to indicate significance.
One other note: If the sign of your regression slope is consistent with your directional hypothesis, then the
relevant one-tailed p-value is half the value printed in your SPSS output (i.e., printed p/2). If the sign is
inconsistent with your directional hypothesis, then ignore the printed p-value because you cannot reject the null.
Interpretationᵃ:
Assuming non-directional tests (H0: βk = 0 vs. H1: βk ≠ 0) for all IV’s, we can say that performance goals,
mastery goals, and interest emerged as significant predictors of student achievement. Moreover, we see that
performance goals was a negative predictor (b = -.010) of achievement, whereas interest (b = .198) and mastery
goals (b = .325) were positive predictors.
ᵃ In this example, we are treating the intercept as a nuisance parameter, as zero is not a logical value within
the range of values on the predictors, with the exception of genderid (coded 0 and 1). Moreover, to keep
things simpler, we test our regression coefficients against a non-directional research hypothesis.
A more formal interpretation of the unstandardized partial regression coefficients (B):
Performance goals: “For every one unit increase on performance goals, scores on student achievement were
predicted to decrease by .010 in raw score units.”
Mastery goals: “For every one unit increase on mastery goals, there was a predicted increase in student
achievement of .325 raw score units.”
Interest: “For every one unit increase on interest, there was a predicted increase in student achievement
of .198 raw score units.”
Anxiety: “For every one unit increase on anxiety, there was a predicted decrease of .023 raw score units on
achievement.”
Genderid: Since this variable is binary, the regression slope only reflects the difference in conditional means
between the two groups (coded 0=identified male, 1=identified female). The difference in conditional means
between these groups was .235, with females (coded 1) scoring lower on achievement than males (coded 0).
Again, the difference is not statistically significant (p=.258).
The standardized regression coefficients (also referred to as “Beta”) are interpreted as the predicted change
on the dependent variable in standard deviation units (i.e., z-score units) per one standard deviation unit
(again, z-score unit) increase on an IV (controlling for the remaining IV’s).
These coefficients provide one way of judging the relative contributions of the IV’s to the model. This can be
done because the coefficients are scale-free (unlike the unstandardized coefficients in the B column). Simply
rank order the IV’s according to the absolute values for the Beta coefficients to interpret relative
contributions.
In our example, we see that Mastery goals (β=.357) had the strongest predictive relationship to the DV,
followed by Interest (β=.276), followed by performance goals (β=-.153), then genderid (β=-.078), and then
anxiety (β=-.030).
More formal interpretation of standardized regression coefficients (Beta):
Performance goals: “For every one standard score (i.e., z-score) unit increase on performance goals, there is a
predicted decrease of .153 standard score units on student achievement.”
Mastery goals: “For every one standard score (i.e., z-score) unit increase on mastery goals, there is a predicted
increase of .357 standard score units on student achievement.”
Interest: “For every one standard score (i.e., z-score) unit increase on interest, there is a predicted increase
of .276 standard score units on student achievement.”
Anxiety: “For every one standard score (i.e., z-score) unit increase on anxiety, there is a predicted decrease
of .030 standard score units on student achievement.”
Genderid: Since this variable is binary, the standardized regression slope only reflects the difference between
the two groups in z-score units. Again, because the slope is negative this means that females scored lower than
males on student achievement.
Unstandardized prediction equation:
Ŷ = b₀ + b₁X₁ + b₂X₂ + b₃X₃ + b₄X₄ + b₅X₅
Ŷ = b_intercept + b_mgoal·X_mgoal + b_int·X_int + b_anx·X_anx + b_gid·X_gid + b_pgoal·X_pgoal
Ŷ = 2.537 + .325·X_mgoal + .198·X_int − .023·X_anx − .235·X_gid − .010·X_pgoal
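To illustrate how the equation is used, consider a purely hypothetical student with mastery goals = 4, interest = 3, anxiety = 2, genderid = 1 (identified female), and performance goals = 3 (these values are invented for demonstration and are not taken from the dataset):
Ŷ = 2.537 + .325(4) + .198(3) − .023(2) − .235(1) − .010(3) = 4.12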
Standardized prediction equation:
Ŷ = β₁X₁ + β₂X₂ + β₃X₃ + β₄X₄ + β₅X₅
Ŷ = β_mgoal·X_mgoal + β_int·X_int + β_anx·X_anx + β_gid·X_gid + β_pgoal·X_pgoal
Ŷ = .357·X_mgoal + .276·X_int − .030·X_anx − .078·X_gid − .153·X_pgoal
(Here, Ŷ and the X’s are expressed in standardized, z-score form.)
These are 95% confidence intervals for each of the unstandardized regression coefficients from the B column.
Whereas the coefficients in the B column provide point-estimates of population parameters, the values here
provide interval estimates.
A common misconception is that a confidence interval indicates that there is a certain % chance that the
true population parameter falls between the values at the lower and upper bound of that interval.
Unfortunately, although the interval constructed does provide an estimated range wherein the true population
parameter might lie, it is simply that – an estimate. Each confidence interval is one of an infinite number of
intervals that can be constructed with repeated random sampling from the population.
As such, the best we can say (regarding the confidence interval associated with any given regression coefficient;
see output above) is that we are reasonably certain (i.e., 95%) that our interval is one of all possible intervals
that may overlap with the true population regression parameter.
In short, it would be incorrect to say that there is a 95% chance that the unstandardized coefficient for mastery
falls between .168 and .482. But it is correct to say that we are 95% confident that our interval (ranging from .168
to .482) is one of all possible intervals that might overlap the true population regression parameter, assuming
random sampling from the population.
Finally, it is often the case that confidence intervals are used in hypothesis testing. Simply put, if the null
population regression slope of 0 falls between the lower and upper bound of a confidence interval, then the
researcher does not reject the null. On the other hand, if 0 falls outside the lower and upper bound, then the
researcher rejects the null and infers a non-zero population regression parameter. The conclusions will be the
same as those drawn from the t-tests of the regression coefficients when testing non-directional hypotheses.
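For reference, each interval is constructed as b ± t(.975, n − k − 1) × SE_b; with n − k − 1 = 134, the critical t is approximately 1.98. Consistent with the t-tests, the interval for mastery goals (.168 to .482) excludes zero, whereas predictors whose t-tests were non-significant will have intervals that include zero.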
This section of the output contains correlations between the IV’s and the DV. The zero-order correlations are
Pearson correlations between each IV and the DV. The partial correlations (second column) reflect the
correlation between each IVk and the DV after partialling the remaining IV’s out of both IVk and the DV.
The semi-partial correlations (Part column) reflect the correlation between each IVk and the DV after partialling
the remaining IV’s out of IVk only. The square of the semi-partial correlation (sr²k) gives the proportion of
variation in the dependent variable that is uniquely accounted for by a given IVk (Tabachnick & Fidell, 2013). This
can be very useful in rank ordering the IV’s in terms of their relative contributions to the regression model. The
squared semi-partial correlations of the IV’s are:
One of the fundamental assumptions of multiple regression analysis is the absence of multicollinearity among
IV’s in your regression model. Multicollinearity occurs when one IV in your model is a linear function of one or
more of the other IV’s.
One way to check is to examine a matrix of zero-order correlations among your IV’s to identify variables that are
very highly related to each other (r’s in the .80s or above). The downside of relying solely on this approach is
that it may fail to capture IV’s that are a linear function of more than a single IV.
A better approach is to index the relationship between each IVk and the remaining IV’s in terms of shared
variation. One way to do this is to regress each IVk onto the remaining IV’s. When the R-square associated with a
given IVk (serving as the dependent variable) in these models is greater than .90, this may be taken as an
indicator that IVk is collinear with the remaining variables.
Pivoting off the previous slide, we have two indices used to judge multicollinearity in the regression output:
Tolerance and VIF.
Tolerance (shown in the table above) is computed as 1-R-square from the previously mentioned regressions
whereby each IVk is regressed onto the remaining IV’s. In effect, the Tolerance represents the proportion of
variation unexplained in IVk after regressing it onto the remaining IV’s. Values < .10 may be considered to
indicate the presence of multicollinearity.
The Variance Inflation Factor (VIF; shown in the table above) is computed as the reciprocal of the Tolerance (i.e.,
VIF=1/Tolerance), or simply 1/(1-R-square). A VIF > 10 can be considered indicative of the presence of more
severe multicollinearity involving a given IVk. (See Lomax & Hahs-Vaughn, 2012).
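As a quick numerical illustration: if regressing a given IVk onto the other predictors produced R-square = .50, then Tolerance = 1 − .50 = .50 and VIF = 1/.50 = 2.0. At the cutoff R-square of .90 mentioned earlier, Tolerance = .10 and VIF = 10, which is where the two rules of thumb coincide.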
In our current demonstration all VIF values are well below 10 and tolerance values are greater than .10.
Had there been evidence of multicollinearity, one might carry out the regression analysis after (a) deleting IV’s
that are highly collinear with other IV’s, (b) combining highly collinear IV’s into composite variables (defensible
on theoretical or conceptual grounds), or (c) using a data reduction technique (such as principal components
analysis) to reduce the larger set of IV’s to a smaller set of orthogonal components (these would be new IV’s).
When these are clicked, new variables are saved to the
dataset containing predicted values, fitted residuals
(unstandardized, standardized, and studentized), and other
diagnostic measures to identify cases that may be
considered outliers or as having an undue influence on the
regression model.
Conceptual point: Unstandardized residuals are computed as the difference between Y and predicted Y.
Standardized residuals are computed by taking a ratio of the unstandardized residuals (above) to the standard
deviation of those residuals.
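In symbols (with e_i denoting the residual for case i): e_i = Y_i − Ŷ_i, and the standardized residual is e_i / s_e, where s_e is the standard deviation of the residuals. For example, a purely hypothetical case with Y = 6.0 and Ŷ = 4.5 has e = 1.5; if s_e were 0.75, its standardized residual would be 1.5/0.75 = 2.0.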
--------------------------------------------------------------------------------------------------------------------------------------------------------
*Darlington & Hayes (2017), however, state that “unless you see clear evidence of fairly extreme non-normality in
the residuals and have ruled out the existence of clerical errors and highly influential cases…don’t worry too much
about all but extreme violations”, as this is “one of the least important assumptions of regression” (p. 498).
The normal P-P plot can also be used to assess the normality of the standardized residuals. This plot displays the
cumulative probabilities of the observed residuals against those expected under the condition of normality. The
closer the observed points fall to the diagonal reference line, the greater the evidence of normality.
[Figure: residuals plot — studentized residuals (y-axis) against standardized predicted values, ẑŶ (x-axis).]
Here (left), our SPSS output contains a plot of the studentized residuals against the standardized
predicted values. In general, we are looking for the residuals to be randomly and evenly distributed
around zero and falling between roughly -3 and +3 units (Pituch & Stevens, 2016; see the idealized residuals
plot on the right).
Outlier detection on Y: Cases with studentized residuals falling outside the range of -3 to +3 should
be investigated as possible outliers on the (conditional) dependent variable. In this example, one
case (#46) may be an outlier given its studentized residual appears to fall between +3 and +4. This
residual has the largest distance of all the residuals from the regression line (the horizontal line at
0). The remaining cases have considerably lower distances from the regression line.
Linearity assumption: OLS regression assumes a linear relationship between the IV’s and the DV. Non-
linearities can show up in the residuals plot.
If there is curvature in the residuals (see e.g., figure on the left demonstrating a quadratic trend between the
fitted Y’s and studentized residuals), then this may signal the presence of an unspecified non-linear
relationship between one or more IV’s and the DV. This would suggest the possible need to incorporate the
non-linear relationship into the regression model.
Our residuals plot (on the right) does not suggest any type of non-linear relationship between the fitted Y
values and the studentized residuals.
Visual inspection for heteroskedastic errors: OLS regression assumes that the variation in residuals is
constant across the fitted Y values and across each IVk. In cases where the spread of residuals appears
to change across the IV’s, the errors may be heteroskedastic. The presence of heteroskedasticity signals
that the fit of the regression model varies across levels of the IV’s, and it produces biases in standard
errors and test statistics. A common (heteroskedastic) form residuals take is a fan-shaped pattern (see
figure on the left), although others (e.g., bow-tie or butterfly) are also possible. In the fan-shaped
pattern on the left, the residuals are much smaller at lower fitted Y values (signaling that the model
does a reasonably good job of predicting scores on Y), whereas they increase as fitted Y values increase
(signaling that the model does an increasingly poor job of predicting scores on Y).
Our plot on the right (based on our data) does not appear to exhibit problems with heteroskedastic errors.
Rather, there is fair evidence for homoskedasticity (i.e., constant variance of errors, which is a requirement
of the model).
Note: Outliers on Y can yield plots that suggest evidence of heteroskedastic errors. Notice that the outlier
shows greater distance from zero than the remaining values. At that particular fitted Y value, the “spread”
of points is greater because of the presence of the outlier. But again, most of the residuals appear to exhibit
constant variance.
Independence of residuals: A key assumption of OLS regression is that the residuals are independent of each
other. That is, they are randomly sampled from a population of residuals. One situation where this
assumption is frequently violated is when the DV is measured repeatedly over time on the same case or set
of cases, which produces a condition called autocorrelation (where the prediction errors are correlated with
themselves over time). Tabachnick and Fidell (2013) noted that the presence of non-independence (as a
result of autocorrelation) can be evaluated by plotting residuals against an ordered time variable. For
example, the chart on the left (above) shows a sequence chart containing residuals (based on a separate set
of data) where a DV had been repeatedly measured over time. Since the DV in our example was not
measured on one or more cases over time, this is not going to be an issue. [Side note: Non-independence
can also arise due to clustering that is unrelated to time.]
Typically, one would also plot studentized residuals against each IV to identify IV’s that may be contributing to
heteroskedasticity (if present) or non-linearity in the residuals. The default in SPSS does not provide these
plots for you (as it does when plotting residuals against predicted Y). However, you can obtain these plots by
generating scatterplots of the studentized residuals (saved to your dataset) against each IV.
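For example, assuming the studentized residuals were saved under the default name SRE_1 and using a placeholder predictor name such as interest, a scatterplot can be requested with syntax along these lines:

  GRAPH
    /SCATTERPLOT(BIVAR)=interest WITH SRE_1.

Repeat for each IV in the model.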
These plots are created by (a) regressing IVk onto the remaining predictors and obtaining the residuals; (b)
regressing the DV onto all IV’s except IVk and obtaining the residuals; and (c) creating a scatterplot of the
relationship between the two sets of residuals. The X axis contains the residualized IVk variable, whereas the Y
axis contains the residualized DV. The plot essentially allows you to visualize the partial correlations found in the
Partial correlation column in your regression output. Notice that squaring the Partial correlation from the output
gives you the R-square you see in the figure.
In addition to visualizing the partial linear relationship between a given IVk and the DV, you can also
inspect the plot to identify whether the relationship might be non-linear (and whether that non-linear
relationship should also be included in your regression model).
Fitting a linear trend to the residualized performance goals and residualized achievement variables.
Fitting a quadratic trend to the residualized performance goals and residualized achievement variables.
Relationship between residualized mastery goals and residualized achievement.
These are the fitted Y values (i.e., Ŷ), unstandardized residuals (RES_1), standardized residuals
(ZRE_1), and studentized residuals (SRE_1) saved to the dataset.
Pituch & Stevens (2016) suggest that studentized residuals falling outside of -3 to +3 may indicate a
potential outlier on Y. An easy way to screen for potential outliers using studentized residuals
(without having to necessarily scroll through the entire dataset) is to use the Explore function…
Click on Outliers…
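For those who prefer syntax, a rough equivalent of this Explore request (assuming the studentized residuals were saved under the default name SRE_1) is:

  EXAMINE VARIABLES=SRE_1
    /PLOT NONE
    /STATISTICS EXTREME
    /MISSING LISTWISE
    /NOTOTAL.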
Here we have the cases with the most
extreme values on the studentized
residuals. Case #46 (identified earlier)
may be an outlier, as it falls outside of
-3 to +3.
The MAH_1 values are Mahalanobis distances for the cases in the dataset. This is the distance between an
individual observation’s values on the set of independent variables and the centroid (multivariate mean)
of those variables. It is useful for detecting cases that are outliers on the set of predictors,
although outliers on the predictors will not necessarily be influential cases (Pituch & Stevens, 2016).
Tabachnick and Fidell (2013) and Field (2018) note that these values follow a chi-square distribution
with df equal to the number of predictors (k). A simple way to test the individual cases and identify
potential outliers is to (a) use the Compute function to calculate a p-value for each case and then (b)
determine significance (Tabachnick & Fidell, 2013, recommend a .001 significance level). See the next
slide for a demonstration.
Use the Compute variable function to create a new target variable (I’m calling it ‘pvalue’). Under Numeric
Expression, use the chi-square CDF function CDF.CHISQ: inside the parentheses provide the name of the
variable containing the Mahalanobis distances (here, MAH_1), then a comma, and then the number of predictors
in the regression model. Because CDF.CHISQ returns the lower-tail cumulative probability, subtract it from 1
(i.e., 1 - CDF.CHISQ(MAH_1, 5)) to obtain the upper-tail p-value.
After clicking ‘OK’, you will get a new variable containing p-values. Compare the p-values against
alpha=.001. If p for a case is less than or equal to .001, then that case may be deemed an outlier.
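Equivalently, via syntax (assuming the default saved variable name MAH_1 and the five predictors in this model):

  * Upper-tail chi-square p-values for the saved Mahalanobis distances.
  COMPUTE pvalue = 1 - CDF.CHISQ(MAH_1, 5).
  EXECUTE.

(SIG.CHISQ(MAH_1, 5) returns the upper-tail p-value directly and could be used instead.)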
COO_1 and DFF_1 contain values for Cook’s D and DFFITS – both of which are indicators of the degree
to which a case exerts an influence over the regression results. Cook’s D is an indicator of the degree
to which a case is “an outlier on Y and the set of predictors” (Pituch & Stevens, 2016, p. 108). Values >
1 are generally considered more problematic (Lomax & Hahs-Vaughn, 2012; Pituch & Stevens, 2016).
DFFITS indexes the degree to which a case influences fitted values on Y (Pituch & Stevens, 2016).
Osborne (2017) suggested standardizing the casewise DFFITS index (i.e., converting it to z-scores) and
looking for extreme values (in his presentation, 5 standard deviations from the mean was treated as
the threshold).
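One way to do this in SPSS (a sketch, assuming DFFITS was saved under the default name DFF_1) is to let DESCRIPTIVES save z-scores, which are written to a new variable (ZDFF_1 by default), and then screen that variable for cases beyond roughly ±5:

  DESCRIPTIVES VARIABLES=DFF_1
    /SAVE.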
These are standardized DFBETAS which are used to index the degree of influence a case has on
individual regression parameters (Lomax & Hahs-Vaughn, 2012). A value > |2| suggests a case is
having a sizeable influence and “should be investigated” (Pituch & Stevens, 2016, p. 113).
[Note: Pedhazur (1997) reviewed other possible rules of thumb, including (a) 2/sqrt(n) for larger
datasets and 1 for smaller to medium-sized datasets, and (b) 3/sqrt(n) in general.]
References and resources
Field, A. (2018). Discovering statistics using IBM SPSS statistics (5th ed). Los Angeles: Sage.
Lomax, R.G., & Hahs-Vaughn, D.L. (2012). An introduction to statistical concepts (3rd ed). New York: Routledge.
Darlington, R.B, & Hayes, A.F. (2017). Regression analysis and linear models: Concepts, applications, and
implementation. New York: The Guilford Press.
Osborne, J.W. (2017). Regression and linear modeling: Best practices and modern methods. Thousand Oaks,
CA: Sage.
Pituch, K.A., & Stevens, J.P. (2016). Applied multivariate statistics for the social sciences (6th ed). Thousand
Oaks, CA: Sage.
Pedhazur, E.J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.).
Orlando, FL: Harcourt Brace.
Tabachnick, B.G., & Fidell, L.S. (2013). Using multivariate statistics (6th ed). Upper Saddle River, NJ: Pearson.
Appendix: Relationship between unstandardized and standardized regression
coefficients
The unstandardized coefficients can be converted to standardized coefficients by multiplying the unstandardized
coefficient by the ratio of the standard deviation of X to the standard deviation of Y (i.e., βk = bk × (sk / sY)).
For example, the standard deviations for performance goals and mastery are 1.274 and 19.645, respectively.