Handout On Regression
The linear regression model allows you to predict from a straight line and describe the
relationship between the independent and dependent variables in the model by fitting a line of
best fit to the observed data.
Regression allows you to estimate how a dependent variable changes as the independent
variable(s) change.
Linear regression models use a straight line, while logistic and nonlinear regression models use a
curved line.
Simple linear regression is used to estimate the relationship between two quantitative
variables, for example satisfaction (independent variable) and loyalty (dependent variable).
If you have more than one independent variable, use multiple linear regression instead.
The formula for a simple linear regression is:

y = B0 + B1x + e

- y is the predicted value of the dependent variable for any given value of the
independent variable (x).
- B0 is the intercept, the predicted value of y when x is 0.
- B1 is the regression coefficient – how much we expect y to change as x increases
by one unit.
- x is the independent variable (the variable we expect is influencing y).
- e is the error of the estimate, or how much variation there is in our estimate of the
regression coefficient.
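As a sketch of how B0 and B1 are obtained, the line of best fit can be computed by ordinary least squares. The satisfaction and loyalty scores below are made up purely for illustration:

```python
# Ordinary least squares for y = B0 + B1*x + e, computed by hand.
# x = satisfaction scores, y = loyalty scores (illustrative made-up data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope: B1 = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
     / sum((x - mean_x) ** 2 for x in xs)
# Intercept: B0 = mean_y - B1 * mean_x (predicted y when x = 0)
b0 = mean_y - b1 * mean_x

print(f"y = {b0:.2f} + {b1:.2f}x")  # prints "y = 0.05 + 1.99x"
```

Because the slope here is positive, the fitted line slopes upward: higher satisfaction predicts higher loyalty in this made-up sample.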
When reporting your results, include the beta scores (i.e. the regression coefficients), the R2 value
and the p-value. For example, in the simple regression of satisfaction (independent variable) and
loyalty (dependent variable), the results could be written up as follows:
o We found a significant relationship (p < 0.05) between satisfaction and loyalty (R2 =
0.71). This means 71% of the variance observed in loyalty is explained by the
independent variable in the model.
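R2 itself is just the proportion of variance in y that the fitted line explains, so it can be computed directly from the residuals. A minimal sketch, reusing made-up satisfaction and loyalty scores:

```python
# R^2 = 1 - SS_res / SS_tot: the share of variance in y explained by the line.
xs = [1, 2, 3, 4, 5]             # satisfaction (illustrative made-up data)
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # loyalty

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
     / sum((x - mean_x) ** 2 for x in xs)
b0 = mean_y - b1 * mean_x

preds = [b0 + b1 * x for x in xs]
ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
ss_tot = sum((y - mean_y) ** 2 for y in ys)            # total variation
r_squared = 1 - ss_res / ss_tot

print(round(r_squared, 3))
```

The closer R2 is to 1, the more of the variation in the dependent variable the model accounts for.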
For a simple linear regression, you can simply plot the observations on the x and y axes and then
include the regression line and regression function. For example, the relationship between
income (independent) and happiness (dependent) can be illustrated with a scatter plot of the
observations overlaid with the fitted line.
When the line is upward sloping the beta (gradient) is positive and when downward sloping the
beta is negative.
Multiple linear regression is used to estimate the relationship between two or more
independent variables and one dependent variable.
Reference
Bevans, R. (2022, June 1). Simple Linear Regression | An Easy Introduction & Examples. Scribbr.
Retrieved October 22, 2022, from https://www.scribbr.com/statistics/simple-linear-regression/
The formula for a multiple linear regression is:

y = B0 + B1x1 + B2x2 + … + Bnxn + e

- B1 = the regression coefficient of the first independent variable (x1) (a.k.a. the effect
that increasing the value of that independent variable has on the predicted y value)
- … = do the same for however many independent variables you are testing
Multiple linear regression is somewhat more complicated than simple linear regression, because
there are more parameters than will fit on a two-dimensional plot.
However, there are ways to display your results that include the effects of multiple independent
variables on the dependent variable, even though only one independent variable can actually be
plotted on the x-axis.
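As a sketch of how the extra coefficients are estimated, the two-predictor case can be solved by the normal equations on mean-centered data. The data below are made up so that y = 1 + 2·x1 + 3·x2 exactly, which lets us check that the fit recovers those coefficients:

```python
# Two-predictor multiple regression y = B0 + B1*x1 + B2*x2 + e, solved via
# the normal equations on centered data (Cramer's rule for the 2x2 system).
# Made-up data generated exactly as y = 1 + 2*x1 + 3*x2.
x1 = [0, 1, 2, 3]
x2 = [1, 0, 2, 1]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # [4, 3, 11, 10]

n = len(y)
m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
d1 = [a - m1 for a in x1]
d2 = [b - m2 for b in x2]
dy = [c - my for c in y]

# Sums of squares and cross-products of the centered variables.
s11 = sum(a * a for a in d1)
s22 = sum(b * b for b in d2)
s12 = sum(a * b for a, b in zip(d1, d2))
s1y = sum(a * c for a, c in zip(d1, dy))
s2y = sum(b * c for b, c in zip(d2, dy))

det = s11 * s22 - s12 * s12
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = my - b1 * m1 - b2 * m2

print(b0, b1, b2)  # recovers 1.0 2.0 3.0
```

In practice you would let statistical software solve this system for any number of predictors; the hand calculation is only shown to make the mechanics concrete.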
Factor analysis (FA) is used:
1. To understand how many factors are needed to explain common themes amongst a
given set of variables.
2. To determine the extent to which each variable in the dataset is associated with a common
theme or factor.
Not all factors are created equal; some factors have more weight than others.
In a simple example, imagine your bank conducts a phone survey for customer satisfaction and
the results show the following factor loadings:
Variable     Factor 1   Factor 2   Factor 3
Question 1   0.885      0.121      -0.033
Question 2   0.829      0.078      0.157
Question 3   0.777      0.190      0.540
The factor that affects a question the most is the one with the highest factor loading (here,
Factor 1 for all three questions). Factor loadings are similar to correlation coefficients in that
they can vary from -1 to 1. The closer a loading is to -1 or 1, the more that factor affects the
variable; a factor loading of zero would indicate no effect.
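Reading off the dominant factor for each question just means taking the largest loading in absolute value. A small sketch using the numbers from the loading table above:

```python
# For each question, the dominant factor is the one with the largest
# absolute loading (values taken from the factor-loading table above).
loadings = {
    "Question 1": [0.885, 0.121, -0.033],
    "Question 2": [0.829, 0.078, 0.157],
    "Question 3": [0.777, 0.190, 0.540],
}

for question, row in loadings.items():
    dominant = max(range(len(row)), key=lambda i: abs(row[i])) + 1
    print(f"{question}: Factor {dominant}")  # Factor 1 in every case
```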
The Kaiser-Meyer-Olkin (KMO) test checks whether your data are suitable for FA (rule of thumb:
KMO > 0.5).
Bartlett's test is another indication of the strength of the relationship among variables. The rule of
thumb is for Bartlett's Test of Sphericity to be significant (p < .05).
When doing factor analysis, the variables are not arranged by constructs.
A practical example:
o RQ1: What are the factors that underlie customer satisfaction in all-inclusive
hotels?
Variables
- Sat1
- Sat2
- Sat3
- Sat4
- Sat5
- Sat6
- Sat7
- Sat8
- Sat9
- Sat10
o When you run the factor analysis, you will obtain a set of factors, say 3, as follows:
o Variables that are correlated with each other will end up in the same factor; variables
that are not will end up in a different factor, grouped with the variables they are
correlated with.
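This grouping intuition can be illustrated with plain correlations. The sketch below is not actual factor analysis (which you would run in statistical software), and the respondent scores for Sat1-Sat4 are made up so that Sat1/Sat2 move together while Sat3/Sat4 move together:

```python
# Not real factor analysis - just an illustration of the grouping idea:
# variables that correlate highly tend to load on the same factor.
# Made-up respondent scores: Sat1/Sat2 move together, Sat3/Sat4 move together.
data = {
    "Sat1": [1, 2, 3, 4, 5],
    "Sat2": [2, 3, 4, 5, 6],
    "Sat3": [5, 3, 4, 1, 2],
    "Sat4": [6, 4, 5, 2, 3],
}

def corr(a, b):
    """Pearson correlation of two equal-length score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

print(round(corr(data["Sat1"], data["Sat2"]), 2))  # 1.0 -> same factor
print(round(corr(data["Sat1"], data["Sat3"]), 2))  # -0.8 -> different factor
```

Here Sat1 and Sat2 would load on one factor and Sat3 and Sat4 on another.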
References
Bevans, R. (2022, June 1). Multiple Linear Regression | A Quick Guide (Examples). Scribbr. Retrieved
October 21, 2022, from https://www.scribbr.com/statistics/multiple-linear-regression/