Business Analytics Module 4 Summary
© Copyright 2020 President and Fellows of Harvard College. All Rights Reserved.
o For a single-variable linear regression, R² is equal to the square of the
correlation coefficient.
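The equality between R² and the squared correlation coefficient can be checked numerically. The following is a minimal sketch in plain Python; the five data points are hypothetical and were invented only for illustration.

```python
from math import sqrt

# Hypothetical sample data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation coefficient between x and y.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares line y_hat = a + b * x.
b = s_xy / s_xx
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# R-squared: share of the variation in y explained by the line.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
r_squared = 1 - sse / s_yy

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}, R^2 = {r_squared:.4f}")
```

For this sample, the squared correlation and R² agree up to floating-point rounding.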
• In addition to analyzing R², we must test whether the relationship between the
dependent and independent variables is significant and whether the linear model
is a good fit for the data. We do this by analyzing the p-value (or confidence
interval) associated with the independent variable and the regression’s
residual plot.
o The p-value of the independent variable comes from the hypothesis test of
whether there is a significant linear relationship; that is, it tests the null
hypothesis that the slope of the regression line is zero (H0: b = 0) against
the alternative that it is not (Ha: b ≠ 0).
§ If the coefficient’s p-value is less than 0.05, we reject the null
hypothesis and conclude that we have sufficient evidence to be
95% confident that there is a significant linear relationship between
the dependent and independent variables.
§ Note that the p-value and R² provide different information. A linear
relationship can be significant (have a low p-value) but not explain
a large percentage of the variation (not have a high R²).
o A confidence interval associated with an independent variable’s
coefficient indicates the likely range for that coefficient.
§ If the 95% confidence interval does not contain zero, we can be
95% confident that there is a significant linear relationship between
the variables.
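The slope test and the confidence-interval check described above can be sketched in plain Python. The data are hypothetical, and the critical value 3.182 is an assumption taken from a standard t-table (two-sided 5% level, 3 degrees of freedom). Comparing |t| with that critical value is equivalent to comparing the p-value with 0.05.

```python
from math import sqrt

# Hypothetical sample data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Assumption: two-sided 5% critical value of the t-distribution with
# n - 2 = 3 degrees of freedom, taken from a standard t-table.
T_CRIT = 3.182

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Standard error of the slope from the residual sum of squares.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = sqrt(sse / (n - 2) / s_xx)

# Test statistic for H0: b = 0 and the 95% confidence interval.
t_stat = b / se_b
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b

# Both checks give the same verdict at the 5% level.
significant_by_t = abs(t_stat) > T_CRIT
significant_by_ci = not (ci_low <= 0 <= ci_high)

print(f"b = {b:.3f}, t = {t_stat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print("significant at 5%:", significant_by_t, significant_by_ci)
```

With only five observations, the interval for this sample contains zero, so the slope is not significant at the 5% level. This also illustrates that significance and R² answer different questions.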
• Residual plots can provide important insights into whether a linear model is a
good fit.
o Each observation in a data set has a residual equal to the historically
observed value minus the regression’s predicted value, that is, 𝜀 = y - ŷ.
o Linear regression models assume that the regression’s residuals follow a
normal distribution with a mean of zero and fixed variance.
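As a rough illustration of the residual definition above, the sketch below fits a least-squares line to hypothetical data and computes each residual as observed minus predicted. A residual plot would chart these values against x. The data are invented for illustration only.

```python
# Hypothetical sample data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residual for each observation: observed value minus predicted value.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n

for xi, e in zip(x, residuals):
    print(f"x = {xi}: residual = {e:+.2f}")
print("mean residual is zero (to rounding):", abs(mean_residual) < 1e-9)
```

For a least-squares line with an intercept, the residuals always average to zero. What the plot adds is a visual check for patterns or changing spread, either of which would suggest that a linear model is a poor fit.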
• We can also perform regression analyses using qualitative, or categorical,
variables. To do so, we must convert data to dummy (0, 1) variables. After that,
we can proceed as we would with any other regression analysis.
o A dummy variable is equal to 1 when an observation fits a certain
criterion and 0 otherwise. For example, a dummy variable for “Female”
would equal 1 for all female observations and 0 for male observations.
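The dummy-variable conversion can be sketched as follows. The category labels and scores are hypothetical, and the regression is computed by hand in plain Python rather than with any particular tool.

```python
# Hypothetical observations, invented for illustration only.
gender = ["Female", "Male", "Female", "Male", "Female", "Male"]
score = [52, 48, 55, 50, 49, 46]

# Dummy variable: 1 for "Female", 0 otherwise.
female = [1 if g == "Female" else 0 for g in gender]

n = len(female)
d_bar = sum(female) / n
y_bar = sum(score) / n

# Ordinary least-squares fit of score on the dummy variable.
s_dd = sum((d - d_bar) ** 2 for d in female)
s_dy = sum((d - d_bar) * (yi - y_bar) for d, yi in zip(female, score))
slope = s_dy / s_dd
intercept = y_bar - slope * d_bar

print("dummy:", female)
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
```

With a single dummy variable, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means (here 52 - 48 = 4).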
EXCEL SUMMARY
Recall the Excel functions and analyses covered in this course and make sure to
familiarize yourself with all of the necessary steps, syntax, and arguments. We have
provided some additional information for the more complex functions listed below. As
usual, the arguments shown in square brackets are optional.
• Adding the best fit line to a scatter plot using the Insert menu
• Forecasting with regression models in Excel
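o =SUMPRODUCT(array1, [array2], [array3],…) is a convenient function
for calculating point forecasts: it multiplies the corresponding entries
of the arrays (for example, the regression coefficients and the values of
the independent variables) and sums the products.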
• Creating a regression output table using the Data Analysis tool
• Creating regression models with dummy variables
o =IF(logical_test,[value_if_true],[value_if_false])
§ Returns value_if_true if the specified condition is met, and returns
value_if_false if the condition is not met.
o To perform a regression analysis with an independent dummy variable,
follow the same steps as when using quantitative variables.
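For readers who want to check a spreadsheet forecast outside Excel, the SUMPRODUCT and IF steps listed above can be mirrored in Python. This is only a sketch: the helper name sumproduct, the coefficient values, and the new observation are all hypothetical, not output from the course's data.

```python
from math import prod


def sumproduct(*arrays):
    """Multiply corresponding entries of the lists and sum the products."""
    return sum(prod(values) for values in zip(*arrays))


# Hypothetical regression output: intercept, slope for a quantitative
# variable, and slope for a dummy variable.
coefficients = [10.0, 2.5, 4.0]

# Hypothetical new observation.
quantity = 6
category = "Female"

# Analog of =IF(logical_test, 1, 0) for building the dummy value.
dummy = 1 if category == "Female" else 0

# The leading 1 pairs with the intercept.
inputs = [1, quantity, dummy]

forecast = sumproduct(coefficients, inputs)
print(f"point forecast = {forecast:.1f}")
```

Pairing a leading 1 with the intercept lets a single sum of products return the full point forecast.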