Lecture 2. Part 1-Regression Analysis. MLRM
Based on the least squares method, the line of best fit to the data is
y = 1692911 + 1861.3x
where y is the total household income and x is the total number of family members.
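A minimal sketch of how a least squares line of this kind can be computed, using only the Python standard library. The function name fit_line and the small data set are hypothetical and chosen for illustration; they are not the lecture's data and do not reproduce the coefficients above.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data (illustrative only): family size and household income.
members = [2, 3, 4, 5, 6]
income = [110, 130, 150, 170, 190]

a, b = fit_line(members, income)
print(f"y = {a:.1f} + {b:.1f}x")
```

The slope is read as the change in predicted income for each additional family member, and the intercept as the predicted income when x is zero.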
Model Validation using Coefficient of Determination
Adjusted R² = 0.021044, which is close to zero, so the model is not a good
predictor of household income. Only about 2.1% of the variation in total household
income is explained by the total number of family members.
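As a sketch of where such a figure comes from: R² is 1 minus the ratio of unexplained to total variation, and the adjusted version penalizes the number of predictors p relative to the sample size n, as 1 - (1 - R²)(n - 1)/(n - p - 1). The function names and the example inputs below are assumptions for illustration; the lecture does not state the unadjusted R² or the sample size.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted on n observations."""
    if n - p - 1 <= 0:
        raise ValueError("need more than p + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical inputs (not from the lecture): R^2 = 0.03, n = 101, p = 1.
example = adjusted_r2(r2=0.03, n=101, p=1)
print(round(example, 4))  # 0.0202
```

A value near zero, as here, means the predictor leaves almost all of the variation in the response unexplained.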
There are other possible factors.
For example, we can see that the highest income is from a family
of 8 members. We may look further into more variables, such as:
How many members are younger than 17?
How many members are employed?
The hypotheses that can be tested are the following:
(1) Holding the number of family members constant, a family with more
employed members has a higher income.
(2) Holding the number of family members constant, a family with more
members younger than 17 y.o. has a lower income.
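A minimal sketch of how the two statements above map onto a multiple linear regression: each coefficient is the effect of one variable while the others, including family size, are held constant. The code solves the normal equations with the standard library only; in practice a numerical library would normally be used. All names and data are hypothetical, and the incomes are constructed from known coefficients so the fit can be checked. They are not the lecture's data.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Least squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + ..."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical rows: (family members, employed members, members under 17).
rows = [(3, 1, 1), (4, 2, 1), (4, 1, 2), (5, 2, 2), (6, 3, 1), (6, 1, 3), (2, 2, 0)]
# Incomes built as 50 + 5*members + 40*employed - 10*under17 (illustrative only).
income = [50 + 5 * m + 40 * e - 10 * u for m, e, u in rows]

b0, b_members, b_employed, b_under17 = fit_mlr(rows, income)
print(round(b_employed, 3), round(b_under17, 3))
```

In this constructed example, a positive coefficient on employed members is what statement (1) predicts, and a negative coefficient on members under 17 is what statement (2) predicts. With real data, the signs and their statistical significance would have to be checked.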
This spreadsheet shows the standardized regression coefficients (b*) and the raw regression coefficients (b). The magnitudes of the standardized (beta)
coefficients allow you to compare the relative contribution of each independent variable to the prediction of the dependent variable. As
is evident in the spreadsheet, the variables POP_CHNG, PT_RURAL, and N_EMPLD are the most important predictors of poverty;
of those, only the first two are statistically significant. The regression coefficient for POP_CHNG is negative: the less the
population increased, the greater the number of families living below the poverty level in the respective county. The regression weight
for PT_RURAL is positive: the greater the percentage of rural population, the greater the poverty level.
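A short sketch of how a raw coefficient (b) relates to a standardized coefficient (b*): b* equals b multiplied by the standard deviation of the predictor and divided by the standard deviation of the dependent variable, which removes the units and makes predictors comparable. The function name and the numbers are hypothetical; they are not the values from the spreadsheet.

```python
from statistics import stdev

def standardized_coefficient(b, x, y):
    """Convert a raw coefficient b for predictor x into a standardized (beta) one."""
    return b * stdev(x) / stdev(y)

# Hypothetical data (illustrative only): y rises by exactly 10 per unit of x.
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]

beta = standardized_coefficient(10, x, y)
print(round(beta, 6))
```

In a simple regression the standardized coefficient equals the correlation between x and y, so a perfect linear relationship such as this one gives a value of 1. In a multiple regression, the predictor with the largest absolute b* contributes most to the prediction.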