Excel and R Analysis
Student’s Name
Professor’s Name
Course
Date
The response or dependent variable in our multivariable linear regression is the account balance
of each client.
Most of the predictor (independent) variables are categorical, so dummy variables had to be
created in order to perform the multivariable linear regression; the following table shows the coding:
Job: Nonprofessional = 0
Marital: Not married = 0
Education: Secondary = 2, Tertiary = 3
Housing loan: No = 0
Personal loan: No = 0
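The coding step can be illustrated with a minimal sketch. Only the codes listed in the table come from the analysis; the column names, the pairing of each code with a variable (inferred from the order of the coefficient table), and every code marked "assumed" are hypothetical and serve only to make the example runnable.

```python
# Numeric codes for the categorical predictors.
# Codes marked "assumed" do not appear in the table and are hypothetical.
CODES = {
    "job": {"nonprofessional": 0, "professional": 1},            # professional = 1 is assumed
    "marital": {"not married": 0, "married": 1},                 # married = 1 is assumed
    "education": {"primary": 1, "secondary": 2, "tertiary": 3},  # primary = 1 is assumed
    "housing_loan": {"no": 0, "yes": 1},                         # yes = 1 is assumed
    "personal_loan": {"no": 0, "yes": 1},                        # yes = 1 is assumed
}


def encode_client(record):
    """Return a copy of `record` with each categorical value replaced by its code."""
    encoded = dict(record)
    for column, mapping in CODES.items():
        encoded[column] = mapping[str(record[column]).strip().lower()]
    return encoded


# Hypothetical client record, used only to show the encoding.
client = {
    "age": 55,
    "job": "Professional",
    "marital": "Not married",
    "education": "Tertiary",
    "housing_loan": "No",
    "personal_loan": "No",
}
encoded_client = encode_client(client)
print(encoded_client)
```

Numeric columns such as age pass through unchanged, so the encoded record can be fed directly to a regression routine.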
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.173302801722
R Square 0.030033861085
Adjusted R Square 0.023041314254
Standard Error 2473.454712349
Observations 979
ANOVA
            df    SS               MS         F         Significance F
Regression  7     183942358.3105   26277480   4.295125  0.00010842369
Residual    971   5940556845.835   6117978
Total       978   6124499204.145
Multiple R is the multiple correlation coefficient; it measures the correlation between the
response variable and the values predicted from the predictor variables. Our regression model has
Multiple R = 0.1733; this value indicates a very weak correlation between the response variable
and the predictor variables.
R squared is the coefficient of determination: the proportion of the variation in the response
variable that the regression model explains. Our regression model has R² = 0.03, which implies
that about 3% of the variation of the y-values around their mean is explained by the x-values.
Higher R squared values are considered better, so our R squared value means that the model does
not explain much of the variation in the data. The regression model is nevertheless significant:
the p-value (Significance F) is 0.0001, which is less than the significance level of 0.05,
indicating that the regression model as a whole is statistically significant.
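As a check on how these statistics relate to one another, the sketch below recomputes them from the sums of squares in the ANOVA table. The regression sum of squares is read as 183942358.3105, which is consistent with the reported mean square and F value; the variable names are illustrative only.

```python
import math

# Values read from the regression output above.
n_obs = 979
n_predictors = 7
ss_regression = 183942358.3105
ss_residual = 5940556845.835
ss_total = ss_regression + ss_residual

df_regression = n_predictors
df_residual = n_obs - n_predictors - 1  # 971

# R squared: share of the total variation explained by the model.
r_squared = ss_regression / ss_total
multiple_r = math.sqrt(r_squared)

# Adjusted R squared penalises the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# F statistic: explained mean square over residual mean square.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Standard error of the regression: square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"Multiple R         = {multiple_r:.4f}")
print(f"R squared          = {r_squared:.4f}")
print(f"Adjusted R squared = {adjusted_r_squared:.4f}")
print(f"F                  = {f_statistic:.4f}")
print(f"Standard error     = {standard_error:.2f}")
```

The recomputed values agree with the Regression Statistics block, which shows that a small R squared and a significant F test can coexist when the sample is large.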
The table below shows the coefficient for each predictor variable along with the intercept value:
Coefficients
Intercept -535.0195887
Age 12.85198653
Job 186.5920589
Marital 553.358256
Education 434.2270047
u- housing loan 30.51171675
u- personal loan -328.7827573
Duration -0.341798459
The coefficient for Age is approximately 12.85; this means that, with the other predictors held
constant, each additional year of age is associated with an increase of about 12.85 in the
predicted account balance. The other variables with positive coefficients are Job, Marital
status, Education, and Housing loan. Duration behaves in the opposite direction: its coefficient
is negative (about -0.34), so the predicted account balance decreases as duration increases.
Personal loan also has a negative coefficient, and therefore a negative relationship with the
response variable, account balance.
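The fitted equation implied by these coefficients can be sketched as a small function. The coefficients are copied from the table above; the parameter names and the example client are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the regression output.
COEFFICIENTS = {
    "intercept": -535.0195887,
    "age": 12.85198653,
    "job": 186.5920589,
    "marital": 553.358256,
    "education": 434.2270047,
    "housing_loan": 30.51171675,
    "personal_loan": -328.7827573,
    "duration": -0.341798459,
}


def predict_balance(age, job, marital, education, housing_loan, personal_loan, duration):
    """Predicted account balance: intercept plus coefficient times value for each predictor."""
    predictors = {
        "age": age,
        "job": job,
        "marital": marital,
        "education": education,
        "housing_loan": housing_loan,
        "personal_loan": personal_loan,
        "duration": duration,
    }
    return COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in predictors.items()
    )


# Hypothetical client: 40 years old, nonprofessional (0), married (1, assumed code),
# secondary education (2), housing loan (1, assumed code), no personal loan (0),
# duration of 100.
example_balance = predict_balance(
    age=40, job=0, marital=1, education=2, housing_loan=1, personal_loan=0, duration=100
)
print(round(example_balance, 2))
```

Changing one argument while keeping the others fixed shifts the prediction by that predictor's coefficient times the change, which is the "held constant" interpretation used above.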
To predict a bank account balance for a 55-year-old, unmarried professional with a tertiary
education level who has neither a housing nor a personal loan, and the duration of the oldest bank
Residual Analysis
Residuals show how far the actual data points are from the values predicted by the linear
regression equation. The scatter plot below shows the predicted balances and their respective
residuals:
[Figure: "Residuals" scatter plot; vertical axis from -5000 to 40000, horizontal axis from 0 to 1200]
The scatter plot shows that the assumptions of the linear regression are not seriously violated due
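For reference, a residual is the actual value minus the predicted value. A minimal sketch follows; the balances are hypothetical numbers, not values from the data set.

```python
def compute_residuals(actual, predicted):
    """Return actual minus predicted for each observation."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


# Hypothetical balances, used only to illustrate the calculation.
actual_balances = [2100.0, 450.0, 9800.0, 1200.0]
predicted_balances = [1500.0, 1300.0, 1700.0, 1250.0]

residual_values = compute_residuals(actual_balances, predicted_balances)
print(residual_values)
```

A positive residual means the model under-predicted the balance, and a negative residual means it over-predicted.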
Multicollinearity
Fitting the model and interpreting the findings may be difficult if there is a high enough degree
of correlation between the variables. Isolating the relationship between each independent
variable and the dependent variable is one of the main objectives of regression analysis. When all
other independent variables are held constant, a regression coefficient is interpreted as the
average change in the dependent variable for each one-unit change in an independent variable.
Our treatment of multicollinearity depends on that last condition. The idea is that the value of
just one independent variable can be changed while the others stay fixed. When independent
variables are correlated, on the other hand, variations in one variable are associated with shifts
in another. The stronger the correlation between two variables, the harder it is to change one
without also changing the other. Because the independent variables tend to change simultaneously,
it becomes challenging for the model to estimate the link between each independent variable and
the dependent variable.
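One simple way to look for this kind of problem is to compute the pairwise correlation between predictors and flag the pairs that are strongly correlated. The sketch below is an illustration only: the data, the column names, and the 0.8 threshold are hypothetical, and nothing here is taken from the analysis itself.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| at or above the threshold."""
    flagged = []
    for (name_a, x), (name_b, y) in combinations(columns.items(), 2):
        r = pearson_r(x, y)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical predictor values for five clients.
predictors = {
    "age": [25, 35, 45, 55, 65],
    "education": [1, 2, 2, 3, 3],
    "duration": [300, 120, 260, 90, 410],
}

flagged_pairs = correlated_pairs(predictors)
for name_a, name_b, r in flagged_pairs:
    print(f"{name_a} and {name_b}: r = {r:.2f}")
```

In this made-up sample only age and education move together closely enough to be flagged, so their individual coefficients would be the hardest to separate.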
The table shows that only two predictor variables are statistically significant in our regression
model, marital status and education level, because their p-values are less than the significance
level of 0.05. Housing loan, personal loan, age, duration, and job status do not appear to be
statistically significant because of their large p-values. When they are eliminated from the
linear regression, the output is the following:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.151353127753621
R Square 0.0229077692808039
Adjusted R Square 0.0209055311031007
Standard Error 2476.15691403559
Observations 979
ANOVA
df SS MS F Significance F
Regression 2 1.4E+08 7E+07 11.44108 1.23E-05