CH 12-2

The document provides an overview of multiple regression analysis, including its formula, application in predicting outcomes like home selling prices and heating costs, and the use of ANOVA tables to summarize regression results. It discusses the assumptions necessary for valid regression analysis, such as linearity and independence of residuals, and techniques for building regression models, including the use of dummy variables. Additionally, it covers the evaluation of individual regression coefficients and the significance of independent variables in the model.

Uploaded by

김봉기
Multiple Regression Analysis


 The general form of a multiple regression formula is
$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
 a is the intercept, the value of ŷ when all x's are zero
 b1, ..., bk are the sample regression coefficients
 x1, ..., xk are the values of the various independent variables
 When there are two independent variables, the relationship can be
graphically portrayed as a plane
Multiple Regression Analysis
 The least squares criterion is used to develop the
regression equation
 Example
 Suppose the selling price of a home is directly related to the
number of rooms and inversely related to its age. Let x1 refer
to the number of rooms, x2 to the age of the home in years, and ŷ to
the selling price of the home ($000)
ŷ = 21.2 + 18.7x1 – 0.25x2
ŷ = 21.2 + 18.7(7) – 0.25(30) = 144.6
So, a seven-room house that is 30 years old is expected to
sell for $144,600
Multiple Regression Analysis Example
Salsberry Realty sells homes along the East Coast of the United States. One question
frequently asked by prospective buyers is “How much can we expect to pay to heat the
home in the winter?” The research department at Salsberry thinks 3 variables relate to
heating costs: the mean daily outside temperature, the number of inches of insulation,
and the age in years of the furnace. They select a random sample of 20 homes.
Determine the regression equation.

y is the dependent variable
x1 is the outside temperature
x2 is inches of insulation
x3 is the age of the furnace
ŷ = a + b1x1 + b2x2 + b3x3
ŷ is used to estimate the value of y
Multiple Regression Analysis Example
Once we determine the regression equation, we can calculate the heating costs for
January, given the mean outside temperature is 30 degrees, there are 5 inches of
insulation, and the furnace is 10 years old.
ŷ = a + b1x1 + b2x2 + b3x3
ŷ = 427.194 – 4.583x1 – 14.831x2 + 6.101x3
ŷ = 427.194 – 4.583(30) – 14.831(5) + 6.101(10) = 276.56
Thus, the estimated heating costs for January are $276.56
Region A: 25 degrees, Region B: 35 degrees. With insulation and furnace age held fixed, heating costs $45.83 less in Region B than in Region A (10 degrees × 4.583).

Recall:
y is the dependent variable
x1 is the outside temperature
x2 is inches of insulation
x3 is the age of the furnace
ŷ is the estimated value of y
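The January estimate can be reproduced with a short Python sketch. The function name `predict_heating_cost` and its parameter names are illustrative assumptions; only the coefficients come from the fitted equation in the example.

```python
# Coefficients of the fitted heating-cost equation from the example.
INTERCEPT = 427.194
B_TEMP = -4.583          # dollars per degree of mean outside temperature
B_INSULATION = -14.831   # dollars per inch of insulation
B_FURNACE_AGE = 6.101    # dollars per year of furnace age


def predict_heating_cost(temp, insulation, furnace_age):
    """Return the estimated heating cost (dollars) for one home."""
    return (INTERCEPT
            + B_TEMP * temp
            + B_INSULATION * insulation
            + B_FURNACE_AGE * furnace_age)


january = predict_heating_cost(temp=30, insulation=5, furnace_age=10)
print(round(january, 2))  # 276.56

# Holding insulation and furnace age fixed, a region that is 10 degrees
# warmer lowers the estimate by 10 * 4.583 dollars.
saving = predict_heating_cost(25, 5, 10) - predict_heating_cost(35, 5, 10)
print(round(saving, 2))  # 45.83
```

Each coefficient is read as the change in estimated cost for a one-unit change in that variable while the other variables are held constant.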
ANOVA Table
 An ANOVA table summarizes the multiple regression
analysis
 It reports the total amount of the variation in y divided into two
components
 The regression, the variation in y explained by all the independent
variables
 The residual or error, the unexplained variation of y
 It reports the degrees of freedom for the regression (the number of independent
variables), the error variation, and the total variation
$$SST = \sum (y - \bar{y})^2$$
$$SSR = \sum (\hat{y} - \bar{y})^2$$
$$SSE = \sum (y - \hat{y})^2$$

 cf. ANOVA
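For the heating cost example, the split of the total variation can be written out with the sums of squares quoted elsewhere in these slides (n = 20 homes, k = 3 independent variables). This is a worked check, not a reproduction of the original software output:

```latex
\begin{aligned}
SST &= SSR + SSE \\
212{,}915.750 &= 171{,}220.473 + 41{,}695.277 \\[4pt]
df_{total} &= df_{regression} + df_{error} \\
n - 1 &= k + \bigl(n - (k+1)\bigr) \\
19 &= 3 + 16
\end{aligned}
```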


Measures of Effectiveness
 There are two measures of effectiveness of the regression
equation
 The multiple standard error of the estimate is similar to
the standard deviation
 It is based on squared deviations between the observed
and predicted values of the dependent variable
 It ranges from 0 to plus infinity
 It is calculated from the following equation
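ANOVA Table

ŷ = 427.194 – 4.583x1 – 14.831x2 + 6.101x3

For a sampled home with a mean outside temperature of 35 degrees, 3 inches of insulation, a 6-year-old furnace, and an actual heating cost of $250:
ŷ = 427.194 – 4.583(35) – 14.831(3) + 6.101(6) = $258.90
Then, (y – ŷ)² = (250 – 258.90)² = (–8.90)² = 79.21

Summing the squared residuals over all 20 homes gives SSE, and the multiple standard error of estimate is
$$s_{y.123 \ldots k} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - (k+1)}} = \sqrt{\frac{SSE}{n - (k+1)}} = \sqrt{\frac{41695.277}{16}} = \sqrt{2605.955} = 51.049$$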
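As a quick check of the arithmetic, the standard error can be computed from SSE, the sample size, and the number of independent variables. The function name below is an illustrative assumption; the inputs are the values from the heating cost example.

```python
import math


def multiple_standard_error(sse, n, k):
    """Multiple standard error of estimate: sqrt(SSE / (n - (k + 1)))."""
    return math.sqrt(sse / (n - (k + 1)))


# SSE = 41695.277, n = 20 homes, k = 3 independent variables.
s = multiple_standard_error(sse=41695.277, n=20, k=3)
print(round(s, 3))  # 51.049
```

The result is in the same units as y, so a typical prediction misses the actual heating cost by roughly $51.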
Measures of Effectiveness
COEFFICIENT OF MULTIPLE DETERMINATION The percent of variation in the
dependent variable, y, explained by the set of independent variables, x1, x2, x3, …, xk.

 The coefficient of multiple determination
 Is symbolized by R²
 It can range from 0 to 1
 It cannot assume negative values
 It is easy to interpret
 It is found by the following formula
$$R^2 = \frac{SSR}{SS\ total} = \frac{171{,}220.473}{212{,}915.750} = .804$$
80.4% of the variation is explained by the 3 independent variables.
 The unexplained share is 1 – R² = 1 – SSR/SST = SSE/SST = 41,695.277/212,915.750 = .196
Measures of Effectiveness
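 When the number of independent variables is large, we
adjust the coefficient of determination for the degrees of
freedom as follows
$$R^2_{adj} = 1 - \frac{SSE/[n-(k+1)]}{SS\ total/(n-1)}$$
 For the cost of heating example, the adjusted coefficient
of determination is
$$R^2_{adj} = 1 - \frac{41{,}695.277/16}{212{,}915.750/19} = 1 - \frac{2{,}605.955}{11{,}206.092} = .767$$
 If we compare R² (0.80) to the adjusted R² (0.767), the
difference in this case is small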
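Both measures can be recomputed from the sums of squares in the heating cost example. The sketch below assumes the usual degrees-of-freedom adjustment, 1 – [SSE/(n – (k+1))] / [SS total/(n – 1)]; the function names are illustrative.

```python
def r_squared(ssr, sst):
    """Share of the total variation in y explained by the regression."""
    return ssr / sst


def adjusted_r_squared(sse, sst, n, k):
    """R-squared adjusted for the degrees of freedom of SSE and SS total."""
    return 1 - (sse / (n - (k + 1))) / (sst / (n - 1))


# Sums of squares from the heating cost example (n = 20, k = 3).
SSR, SSE, SST = 171220.473, 41695.277, 212915.750

r2 = r_squared(SSR, SST)
adj_r2 = adjusted_r_squared(SSE, SST, n=20, k=3)
print(round(r2, 3), round(adj_r2, 3))  # 0.804 0.767
```

The adjusted value is never larger than R² because the adjustment penalizes each additional independent variable.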
Global Test
 A global test investigates whether it is possible that all the
independent variables have zero regression coefficients
 The hypotheses are
H0: β1 = β2 = β3 = 0
H1: Not all βi's are 0
 The test statistic follows the F distribution
 There is a family of F distributions, one for each pair of numerator and denominator degrees of freedom
 It cannot be negative
 It is continuous
 It is positively skewed
 It is asymptotic
Global Test Continued
 The formula to calculate the value of the test statistic is
$$F = \frac{SSR/k}{SSE/[n-(k+1)]}$$
 with k (the number of independent variables) degrees of freedom in the
numerator
 n – (k+1) degrees of freedom in the denominator
 n is the sample size
 We can obtain the degrees of freedom from the ANOVA table
Global Test Concluded
Step 1: State the null and the alternate hypothesis
H0: β1 = β2 = β3 = 0
H1: Not all βi's are 0
Step 2: Select the level of significance, we’ll use .05
Step 3: Select the test statistic, F
Step 4: Formulate the decision rule, reject H0 if F > 3.24
Step 5: Make decision, reject H0 because F = 21.90 exceeds 3.24
Step 6: Interpret, at least one of the independent variables has the ability to explain the
variation in heating cost.

The global test assures us that at least one of outside temperature, the amount of
insulation, or the age of the furnace has a bearing on heating cost!
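The F value in Step 5 follows from the sums of squares quoted in these slides (SSR = 171,220.473, SSE = 41,695.277, n = 20, k = 3). A minimal sketch of that calculation, with the critical value 3.24 taken from the decision rule rather than computed, and `global_f` an illustrative name:

```python
def global_f(ssr, sse, n, k):
    """F = (SSR / k) / (SSE / (n - (k + 1)))."""
    mean_square_regression = ssr / k
    mean_square_error = sse / (n - (k + 1))
    return mean_square_regression / mean_square_error


f_stat = global_f(ssr=171220.473, sse=41695.277, n=20, k=3)

F_CRITICAL = 3.24  # .05 level, 3 and 16 degrees of freedom (from the slides)
reject_h0 = f_stat > F_CRITICAL
print(round(f_stat, 2), reject_h0)  # 21.9 True
```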
Test for Individual Variables
 The test for individual variables determines which
independent variables have regression coefficients that
differ significantly from zero
 The variables whose regression coefficients do not differ significantly from zero are
usually dropped from the analysis
 The test statistic follows the t distribution with n – (k+1)
degrees of freedom
 The formula to calculate the value of the test statistic for
the individual test is
$$t = \frac{b_i - 0}{s_{b_i}}$$
Evaluating Individual Regression
Coefficients Example
Salsberry Realty will use three sets of hypotheses: one for temperature, one for
insulation, and one for age of the furnace.
Step 1: State the null and alternate hypothesis
For temperature
H0: β1 = 0
H1: β1 ≠ 0
$$t = \frac{b_1 - 0}{s_{b_1}} = \frac{-4.583 - 0}{0.772} = -5.937$$

For insulation
H0: β2 = 0
H1: β2 ≠ 0
$$t = \frac{b_2 - 0}{s_{b_2}} = \frac{-14.831 - 0}{4.754} = -3.119$$

For furnace age
H0: β3 = 0
H1: β3 ≠ 0
$$t = \frac{b_3 - 0}{s_{b_3}} = \frac{6.101 - 0}{4.012} = 1.521$$

Step 2: Select the level of significance, we use .05
Step 3: Select the test statistic, we’ll use t
Step 4: Formulate the decision rule, reject H0 if t < -2.120 or t > 2.120
Step 5: Make decision, reject H0 for temperature and insulation but not furnace age
Step 6: Interpret, temperature and insulation are significant predictors of heating cost; furnace age is not
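The three individual tests apply the same rule, so they can be run in one loop. This is a sketch: the dictionary layout and names are illustrative, the critical value 2.120 is taken from Step 4, and the temperature standard error is entered as 0.772, the value that reproduces the reported t of -5.937.

```python
T_CRITICAL = 2.120  # two-tailed, .05 level, n - (k + 1) = 16 df (from the slides)

# (coefficient b_i, standard error s_bi) for each independent variable.
coefficients = {
    "temperature": (-4.583, 0.772),
    "insulation": (-14.831, 4.754),
    "furnace age": (6.101, 4.012),
}


def t_statistic(b, se):
    """t = (b - 0) / s_b for the null hypothesis that the coefficient is zero."""
    return (b - 0) / se


results = {}
for name, (b, se) in coefficients.items():
    t = t_statistic(b, se)
    significant = abs(t) > T_CRITICAL
    results[name] = (t, significant)
    print(f"{name}: t = {t:.3f}, reject H0: {significant}")
```

Small differences in the last decimal from the slide values can occur because the slides appear to use unrounded software output.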
Evaluating Individual Regression
Coefficients Example
Salsberry Realty will rerun the regression equation using temperature and insulation.
ŷ = 490.286 – 5.150x1 – 14.718x2
The hypotheses and details of the global test are as follows; reject the null hypothesis if F > 3.59
H0: β1 = β2 = 0
H1: Not all of the βi's are 0
$$F = \frac{SSR/k}{SSE/[n-(k+1)]} = \frac{165{,}194.521/2}{47{,}721.229/[20-(2+1)]} = 29.424$$
Because 29.424 > 3.59, the null hypothesis is rejected.

Step 1: State the null and alternate hypothesis
For temperature
H0: β1 = 0
H1: β1 ≠ 0
$$t = \frac{b_1 - 0}{s_{b_1}} = \frac{-5.150 - 0}{0.702} = -7.337$$

For insulation
H0: β2 = 0
H1: β2 ≠ 0
$$t = \frac{b_2 - 0}{s_{b_2}} = \frac{-14.718 - 0}{4.934} = -2.983$$

Step 2: Select the level of significance, we use .05
Step 3: Select the test statistic, we’ll use t
Step 4: Formulate the decision rule, reject H0 if t < -2.110 or t > 2.110
Step 5: Make decision, reject H0 for temperature and insulation
Step 6: Interpret, temperature and insulation are both significant predictors of heating costs
Multiple Regression Assumptions
 Multiple regression analysis rests on five assumptions
1. There is a linear relationship between the dependent variable and the independent variables
2. The variation in the residuals is the same for both
large and small values of ŷ
3. The residuals follow the normal distribution
4. The independent variables should not be correlated with one another
5. The residuals are independent
Linear Relationship Assumption
 The relationship between the dependent variable and the set of
independent variables must be linear
 To verify this assumption, develop a scatter diagram and plot the dependent
variable on the vertical axis and the independent variable on the horizontal
axis
 For the heating cost example, the scatter diagrams indicate a fairly strong negative
relationship between temperature and heating cost and a negative relationship
between insulation and heating cost
Linear Relationship Assumption
 Curvilinear relationship
Variation Assumption
 Variation is the same for both large and small values of ŷ
HOMOSCEDASTICITY (equal variance) The variation around the regression
equation is the same for all of the values of the independent variables.

 This condition is checked by developing a scatter diagram
with the residuals on the vertical axis and the fitted
values on the horizontal axis
 If there is no pattern to the plotted points (that is, they appear
random), the residuals meet the homoscedasticity
requirement
 Example, age and income: income tends to increase as age increases. Compare age 29
vs. age 49.
Normal Probability Assumption
 The residuals follow the normal probability distribution
 This condition is checked by developing a histogram of the residuals or a
normal probability plot (normal Q-Q plot: normal quantile-quantile plot)
 The mean of the distribution of the residuals is 0
 If the plotted points are fairly close to the straight line drawn from lower
left to upper right, the normal probability assumption is supported
Variables Not Correlated Assumption
 The independent variables should not be correlated with one another
 A correlation matrix will show all possible correlations among independent
variables
 Signs of trouble are correlations > 0.70 or < -0.70
 Signs of correlated independent variables
 an important predictor variable is found insignificant
 an obvious reversal occurs in the signs of one or more of the regression
coefficients
 when a variable is removed from the solution, there is a large change in the
remaining regression coefficients
 The variance inflation factor (VIF) is used to identify correlated independent variables
Variables Not Correlated Assumption
Example
Develop a correlation matrix for all the independent variables: outside temperature,
amount of insulation, and age of furnace. Does it appear there is a problem with
multicollinearity? Find and interpret the VIF for each of the independent variables.

Because all of the correlations
are between -.70 and .70, we do
not suspect problems with
multicollinearity.

$$VIF_1 = \frac{1}{1 - R_1^2} = \frac{1}{1 - .241} = 1.32$$
$$VIF_2 = \frac{1}{1 - R_2^2} = \frac{1}{1 - .011} = 1.011$$
$$VIF_3 = \frac{1}{1 - R_3^2} = \frac{1}{1 - .236} = 1.310$$

All VIFs are below 10, so multicollinearity is not a concern
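The VIF calculation is a one-line formula once each R² is known, where the R² for a variable comes from regressing that independent variable on the other independent variables. The sketch below uses the three R² values from the example; the function name, the dictionary, and the assignment of values to variables by subscript order (1 = temperature, 2 = insulation, 3 = furnace age) are illustrative assumptions.

```python
VIF_THRESHOLD = 10  # rule of thumb used in the slides


def vif(r_squared_j):
    """Variance inflation factor: 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r_squared_j)


# R^2 from regressing each independent variable on the other two.
r2_by_variable = {
    "temperature": 0.241,
    "insulation": 0.011,
    "furnace age": 0.236,
}

vifs = {name: vif(r2) for name, r2 in r2_by_variable.items()}
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.3f}")

all_below_threshold = all(v < VIF_THRESHOLD for v in vifs.values())
print(all_below_threshold)  # True
```

A VIF of 1 means the variable is uncorrelated with the others; the value grows without bound as R² approaches 1.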


Independent Residuals Assumption
 Each residual is independent of other residuals
Techniques to Build a Regression Model
 Several techniques help build a regression model
DUMMY VARIABLE A variable in which there are only two possible outcomes.
For analysis, one of the outcomes is coded a 1 and the other a 0.

 A dummy or qualitative independent variable can assume one of two
possible outcomes, a 1 or a 0
 Use formula (14-6) to determine if the dummy variable should remain in
the equation

 Example
 Suppose we are interested in estimating an executive’s salary on the basis
of years of experience and whether he or she graduated from college. Graduation
is coded 1 for yes and 0 for no
Dummy Variable Example

Suppose in the Salsberry Realty example that the independent variable garage is added.
For homes without a garage, 0 is used; for homes with an attached garage, 1 is used.
Garage will be variable x4. What is the effect of the garage variable?
Dummy Variable Example Continued
Suppose we have two houses exactly alike in Buffalo, New York. One has an attached
garage (1) and the other does not (0). Both have 3 inches of insulation and the
temperature is 20 degrees.

For the house without the attached garage:
ŷ = 393.666 – 3.963x1 – 11.334x2 + 77.432x4
ŷ = 393.666 – 3.963(20) – 11.334(3) + 77.432(0) = 280.404

For the house with the attached garage:
ŷ = 393.666 – 3.963x1 – 11.334x2 + 77.432x4
ŷ = 393.666 – 3.963(20) – 11.334(3) + 77.432(1) = 357.836
The difference, 357.836 – 280.404 = 77.432, equals the coefficient of the garage variable.

State the null and the alternate hypothesis
H0: β4 = 0
H1: β4 ≠ 0
Select the level of significance, .05
Select the test statistic, t
$$t = \frac{b_4 - 0}{s_{b_4}} = \frac{77.432 - 0}{22.783} = 3.399$$
Formulate the decision rule, reject H0 if t < -2.120 or t > 2.120
Make decision, reject H0, t = 3.399
Interpret, the variable garage should be included in the analysis
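The effect of the dummy variable can be seen by evaluating the fitted equation twice, changing only the garage code. A minimal sketch, with `predict_with_garage` as an illustrative name and the coefficients taken from the equation in the example:

```python
def predict_with_garage(temp, insulation, garage):
    """Estimated heating cost; garage is 1 for an attached garage, else 0."""
    if garage not in (0, 1):
        raise ValueError("garage must be coded 0 or 1")
    return 393.666 - 3.963 * temp - 11.334 * insulation + 77.432 * garage


# Two otherwise identical houses: 20 degrees, 3 inches of insulation.
without_garage = predict_with_garage(20, 3, garage=0)
with_garage = predict_with_garage(20, 3, garage=1)
difference = with_garage - without_garage

print(round(without_garage, 3))  # 280.404
print(round(with_garage, 3))     # 357.836
print(round(difference, 3))      # 77.432
```

Because the dummy takes only the values 0 and 1, its coefficient is exactly the shift in the estimate between the two groups, whatever the values of the other variables.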
Interaction Technique
 Interaction is the case in which one independent variable
(such as x2) affects the relationship between another
independent variable (x1) and the dependent variable (y)
 In regression analysis, interaction is examined as a
separate independent variable. We multiply the data
values of one independent variable by the values of
another independent variable
Y = α + β1x1 + β2x2 + β3x1x2
 The term x1x2 is the interaction term
 Now develop a regression equation with the three
variables and test the significance of the third
Interaction Technique Example
Refer to the heating cost example. Is there an interaction between the outside temperature and
the amount of insulation? If both variables are increased, is the effect on heating cost greater than
the sum of the savings from warmer temperature and the savings from increased insulation
separately?

ŷ = 598.070 – 7.811x1 – 30.161x2 + 0.385x1x2

H0: β3 = 0 (β3 is the coefficient of the interaction term x1x2)
H1: β3 ≠ 0
The level of significance is .05
The test statistic is t
The decision rule is reject H0 if
t < -2.120 or t > 2.120
Make decision, do not reject H0
Interpret, there is not a significant
interaction between temperature and
insulation.
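To make the mechanics concrete, the sketch below builds an interaction column by multiplying two variables and then evaluates the fitted interaction equation from the example. The three data rows are made-up values used only to show the multiplication, and the function names are illustrative. Since the interaction was not significant in the example, the slopes printed here describe the fitted equation, not an established effect.

```python
# Hypothetical rows (temperature, inches of insulation) for illustration only.
temps = [35, 29, 36]
insulation = [3, 4, 7]

# The interaction variable is the element-wise product x1 * x2.
interaction = [t * i for t, i in zip(temps, insulation)]
print(interaction)  # [105, 116, 252]


def predict(temp, insul):
    """Fitted equation with the interaction term from the example."""
    return 598.070 - 7.811 * temp - 30.161 * insul + 0.385 * temp * insul


def temperature_slope(insul):
    """Change in estimated cost per degree, which now depends on insulation."""
    return -7.811 + 0.385 * insul


print(round(temperature_slope(3), 3))  # -6.656
print(round(temperature_slope(6), 3))  # -5.501
```

With an interaction term, the effect of one variable is no longer a single number: here the per-degree saving shrinks as insulation increases.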
Stepwise Regression
STEPWISE REGRESSION A step-by-step method to determine a regression equation
that begins with a single independent variable and adds or deletes independent
variables one by one. Only independent variables with nonzero regression coefficients
are included in the regression equation.
 Start from the correlation matrix of the independent variables (IV1, IV2, IV3) and the dependent variable (DV)
 Advantages to the stepwise method
 Only independent variables with significant regression
coefficients are entered into the equation
 The steps involved are clear
 It is efficient
 The changes in the multiple standard error of estimate
and the coefficient of determination are shown
Stepwise Regression Technique
The stepwise procedure selects the independent variable temperature first. Temperature explains
65.85% of the variation in heating cost.
ŷ = 388.8 – 4.93x1
The next independent variable selected is garage. Now the coefficient of determination is
80.46%.
ŷ = 300.3 – 3.56x1 + 93.0x2
Next, the procedure selects insulation and stops. At this point 86.98% of the variation is
explained.
ŷ = 393.7 – 3.96x1 + 77.0x2 – 11.3x3
(In this output x2 is garage and x3 is insulation.)

This is the same regression
equation we developed before!
Mediation
4 conditions for mediation (Baron & Kenny,
1986)
 X is correlated with Y: the c path is significant (significant total
effect)
 X is correlated with the mediator (Med): the a path is significant
 The mediator is correlated with Y when controlling for X: the b
path is significant
 The effect of X on Y is reduced when controlling for the
potential mediator: c’ is zero or non-significant (full mediation), or smaller
than c but still significant (partial mediation)
 c’: the direct effect of X on Y when the mediator is included in the model
 c = (a × b) + c’

 Must the “effect to be mediated” be significant in order for the
mediation analysis to be valid? No (Zhao, Lynch, & Chen, 2010)
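The paths named above correspond to coefficients in three regressions. The notation below (intercepts i1 to i3, error terms e1 to e3, mediator M) is an assumed but conventional way of writing them; with ordinary least squares on the same sample, the total effect splits into the direct effect plus the indirect effect:

```latex
\begin{aligned}
Y &= i_1 + cX + e_1          && \text{(total effect of } X \text{ on } Y\text{)} \\
M &= i_2 + aX + e_2          && \text{(effect of } X \text{ on the mediator)} \\
Y &= i_3 + c'X + bM + e_3    && \text{(direct effect } c' \text{, mediator effect } b\text{)} \\[4pt]
c &= c' + ab                 && \text{(total = direct + indirect)}
\end{aligned}
```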
Moderated Mediation
 The interaction effect between X and the moderator (MO) on the dependent variable (DV) is mediated
by the mediator (ME)
 The mediation effect through ME is moderated by the level of MO
 Conditional indirect effect at different levels of MO
 For example, the mediation effect is significant at MO = -1 SD but not significant at MO = +1 SD, or vice
versa
 The indirect effect becomes stronger as the level of MO
increases (or decreases)
