Chapter 14
Multiple Regression Analysis
14-1
Learning Objectives
LO14-1 Use multiple regression analysis to describe and interpret a relationship between several
independent variables and a dependent variable.
LO14-2 Evaluate how well a multiple regression equation fits the data.
LO14-3 Test hypotheses about the relationships inferred by a multiple regression model.
LO14-4 Evaluate the assumptions of multiple regression.
LO14-5 Use and interpret a qualitative, dummy variable in multiple regression.
LO14-6 Include and interpret an interaction effect in multiple regression analysis.
LO14-7 Apply stepwise regression to develop a multiple regression model.
LO14-8 Apply multiple regression techniques to develop a linear model.
14-2
Multiple Linear Regression - Example
Salsberry Realty sells homes along the east coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes.
Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace, giving a model of the form Ŷ = a + b1X1 + b2X2 + b3X3.
To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well as the values of these three variables for each home.
14-3
Multiple Linear Regression – Minitab Outputs for Salsberry Realty Example
[Minitab output: estimated coefficients b1 (temperature), b2 (insulation), b3 (furnace age)]
14-4
The Multiple Regression Equation –
Interpreting the Regression Coefficients and Applying the Model for Estimation
Regression Equation: Ŷ = a + b1X1 + b2X2 + b3X3, with the intercept a and the slopes b1, b2, b3 taken from the Minitab output.
14-6
Coefficient of Multiple Determination (R²) and the Global Test
The global test uses the F distribution with k and n − k − 1 degrees of freedom; here the critical value is F.05,3,16 = 3.24.
CONCLUSION
The computed value of F is 21.90, which falls in the rejection region; therefore, the null hypothesis that all the multiple regression coefficients are zero is rejected.
Interpretation: some of the independent variables (amount of insulation, etc.) do have the ability to explain the variation in the dependent variable (heating cost).
Logical question – which ones?
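The computed F can be reproduced from the coefficient of multiple determination via F = (R²/k) / ((1 − R²)/(n − k − 1)). A minimal sketch, assuming R² ≈ 0.804 for the three-variable model (an assumed value, since the Minitab output is not reproduced here):

```python
# Global F test computed from R^2 (sketch).
# r_sq = 0.804 is an ASSUMED value, not taken from the text.
def global_f(r_sq: float, n: int, k: int) -> float:
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r_sq / k) / ((1 - r_sq) / (n - k - 1))

f_stat = global_f(r_sq=0.804, n=20, k=3)
print(round(f_stat, 1))   # close to the computed F of 21.90
f_crit = 3.24             # F(.05, 3, 16) from a standard F table
print(f_stat > f_crit)    # True -> reject H0
```

A large R² with few predictors and a reasonable sample size almost always produces an F far beyond the critical value, which is exactly what happens here.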
14-8
Evaluating Individual Regression Coefficients (βi = 0)
• The hypothesis test is as follows:
H0: βi = 0
H1: βi ≠ 0
• The test statistic follows the t distribution with n − (k + 1) degrees of freedom. The formula for the computed statistic is:
t = (bi − 0) / sbi
• Reject H0 if t > tα/2,n−k−1 or t < −tα/2,n−k−1. For the Salsberry example, t.025,20−3−1 = t.025,16 = 2.120, so H0 is rejected if t > 2.120 or t < −2.120.
• This test is used to determine which independent variables have nonzero regression coefficients. The variables that have zero regression coefficients are usually dropped from the analysis.
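The individual-slope test above can be sketched as follows; the coefficient and standard-error values are illustrative placeholders, not the actual Minitab output:

```python
# t statistic for one regression coefficient: t = (b_i - 0) / s_bi.
# b_i and s_bi below are HYPOTHETICAL illustrative values.
def slope_t(b_i: float, s_bi: float) -> float:
    return (b_i - 0) / s_bi

t_crit = 2.120               # t(.025, 16) for n = 20, k = 3
b_i, s_bi = -4.5, 0.8        # hypothetical coefficient and standard error
t = slope_t(b_i, s_bi)
print(abs(t) > t_crit)       # True here: reject H0, keep the variable
```

The same comparison is repeated for each slope; any variable whose |t| fails to exceed 2.120 becomes a candidate for removal.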
14-9
Computed t for the Slopes
Conclusion:
• Compared with the critical values ±2.120, the variable AGE does not have a slope significantly different from 0, but the variables TEMP and INSULATION have slopes that are significantly different from 0.
• Re-run a new model without the variable AGE.
14-10
New Regression Model without Variable “Age” – Minitab
Conclusion:
At the 0.05 significance level, with critical values ±2.110, the slopes (coefficients) of the variables TEMP (t = −7.34) and INSULATION (t = −2.98) in the 2-variable multiple linear model are significantly different from 0.
14-11
Evaluating the Assumptions of Multiple Regression
1. There is a linear relationship. That is, there is a straight-line relationship between the dependent variable and the set of independent variables.
2. The variation in the residuals is the same for both large and small values of the estimated Y. To put it another way, the size of the residual is unrelated to whether the estimated Y is large or small.
3. The residuals follow the normal probability distribution.
4. The independent variables should not be correlated. That is, we would like to select a set of independent variables that are not themselves correlated.
5. The residuals are independent. This means that successive observations of the dependent variable are not correlated. This assumption is often violated when time is involved with the sampled observations.
A RESIDUAL is the difference between the actual value of Y and the predicted value of Y.
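The residual definition can be illustrated with toy values (the cost figures below are made up for illustration, not data from the sample of 20 homes):

```python
# RESIDUAL = actual Y minus predicted Y-hat, one per observation.
actual    = [250.0, 360.0, 165.0]   # hypothetical heating costs
predicted = [240.0, 355.0, 180.0]   # hypothetical fitted values
residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
print(residuals)                    # [10.0, 5.0, -15.0]
```

Plotting these residuals against the fitted values is the usual way to check assumptions 1 and 2 above.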
14-12
Multicollinearity
• Multicollinearity exists when independent variables (X’s) are correlated.
• Effects of Multicollinearity on the Model:
1. An independent variable known to be an important predictor ends up having a regression coefficient that is not significant.
2. A regression coefficient that should have a positive sign turns out to be negative, or vice versa.
3. When an independent variable is added or removed, there is a drastic change in the values of the remaining regression coefficients.
• However, correlated independent variables do not affect a multiple regression equation’s ability to predict the dependent variable (Y).
• A general rule is that if the correlation between two independent variables is between −0.70 and 0.70, there likely is not a problem using both of the independent variables.
• A more precise test is to use the variance inflation factor (VIF).
• A VIF > 10 is unsatisfactory. Remove that independent variable from the analysis.
• The value of VIF is found as follows:
VIF = 1 / (1 − R²j)
• The term R²j refers to the coefficient of determination, where the selected independent variable is used as a dependent variable and the remaining independent variables are used as independent variables.
14-13
Multicollinearity – Example
Refer to the data in the table, which relates the heating
cost to the independent variables outside
temperature, amount of insulation, and age of
furnace.
Does it appear there is a problem with multicollinearity?
Find and interpret the variance inflation factor for each of
the independent variables.
The VIF value of 1.32 is less than the upper limit of 10.
This indicates that the independent variable temperature is
not strongly correlated with the other independent
variables.
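The VIF calculation can be sketched as follows; R²j = 0.24 is an assumed value, chosen so the result matches the 1.32 reported for temperature:

```python
# Variance inflation factor: VIF = 1 / (1 - R_j^2), where R_j^2 comes
# from regressing the j-th independent variable on the remaining ones.
def vif(r_sq_j: float) -> float:
    return 1.0 / (1.0 - r_sq_j)

v = vif(0.24)        # ASSUMED R_j^2 for temperature
print(round(v, 2))   # 1.32 -- well below the VIF > 10 trouble threshold
print(v > 10)        # False: no multicollinearity problem indicated
```

Note how insensitive VIF is at low correlations: R²j must exceed 0.90 before VIF crosses the threshold of 10.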
14-14
Qualitative Variable - Example
Frequently we wish to use nominal-scale variables—such as gender, whether the
home has a swimming pool, or whether the sports team was the home or the
visiting team—in our analysis. These are called qualitative variables.
To use a qualitative variable in regression analysis, we use a scheme of dummy
variables in which one of the two possible conditions is coded 0 and the other 1.
EXAMPLE
Suppose in the Salsberry Realty example that the independent variable “garage” is added. For those homes without an attached garage, 0 is used; for homes with an attached garage, a 1 is used. The data from Table 14–2 are entered into the MINITAB system.
[Minitab output: fitted regression results for homes without a garage and with a garage]
14-15
Regression Models with Interaction
In Chapter 12, interaction among independent variables was covered. Suppose we are studying weight loss and assume, as the current literature suggests, that diet and exercise are related. So the dependent variable is the amount of change in weight, and the independent variables are diet (yes or no) and exercise (none, moderate, significant). We are interested in seeing whether those studied who maintained their diet and exercised significantly increased the mean amount of weight lost.
In regression analysis, interaction can be examined as a separate independent variable. An interaction prediction variable can be developed by multiplying the data values in one independent variable by the values in another independent variable, thereby creating a new independent variable. A two-variable model that includes an interaction term is:
Ŷ = a + b1X1 + b2X2 + b3(X1X2)
Refer to the heating cost example. Is there an interaction between the outside temperature and the amount of insulation? If both variables are increased, is the effect on heating cost greater than the sum of the savings from a warmer temperature and the savings from increased insulation separately?
Creating the Interaction Variable – Using the information from the table in the previous slide, an interaction variable is created by multiplying the temperature variable by the insulation variable. For the first sampled home, the temperature is 35 degrees and the insulation is 3 inches, so the value of the interaction variable is 35 × 3 = 105. The values of the other interaction products are found in a similar fashion.
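The interaction variable described above can be sketched as an elementwise product; only the first home's values (35 degrees, 3 inches) come from the text, the rest are hypothetical:

```python
# Interaction variable: elementwise product of temperature and insulation.
temperature = [35, 29, 36]   # first value from the text; others hypothetical
insulation  = [3, 5, 7]      # first value from the text; others hypothetical
interaction = [t * i for t, i in zip(temperature, insulation)]
print(interaction[0])        # 35 * 3 = 105, as in the text
```

The new column is then entered into the regression alongside temperature and insulation, and its slope b3 is tested with the usual t test.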
14-16