Ass 06 - Bank Loan Case Study
PROJECT DESCRIPTION – This project is a case study related to bank loans. The task is to carry out an
Exploratory Data Analysis (EDA) and, based on that analysis, answer the questions below.
APPROACH – I first analyzed the data and found that it had many missing values. My first task was therefore
to fill in the missing values using the mean, median or mode, as appropriate for each column. I began by
cleaning the data and then identified the outliers so that the data could be standardized.
1. Present the overall approach of the analysis. Mention the problem statement and the analysis approach briefly.
I calculated the percentage of blank values and the median of the column, then filled the blanks with the median.
(Only one table is shown here; the cleaning and filling of the other columns is in the attached Excel file.)
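The blank-percentage and fill step can be sketched in plain Python. This is a minimal illustration, not the Excel workflow used in this assignment: the column values are hypothetical, blanks are assumed to appear as None or empty strings, and the helper names are made up for this example. Numeric columns are filled with the median and categorical columns with the mode, following the approach described above.

```python
import statistics

def is_blank(value):
    """Treat None and empty strings as blank cells (assumed encoding of blanks)."""
    return value is None or value == ""

def blank_percentage(values):
    """Percentage of entries in a column that are blank."""
    blanks = sum(1 for v in values if is_blank(v))
    return 100.0 * blanks / len(values)

def fill_blanks(values, numeric=True):
    """Fill blanks with the median (numeric columns) or the mode (categorical columns)."""
    present = [v for v in values if not is_blank(v)]
    fill_value = statistics.median(present) if numeric else statistics.mode(present)
    return [fill_value if is_blank(v) else v for v in values]

# Hypothetical sample columns, not taken from the real dataset
annuity = [24700.5, None, 6750.0, 29686.5, None, 21865.5]
occupation = ["Laborers", "", "Laborers", "Sales staff"]

print(round(blank_percentage(annuity), 2))     # 33.33
print(fill_blanks(annuity))                    # blanks become the median, 23283.0
print(fill_blanks(occupation, numeric=False))  # the blank becomes "Laborers"
```

The median is used for numeric columns because, unlike the mean, it is not pulled upward by the extreme values discussed in the next question.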
3. Identify if there are outliers in the dataset. Also, mention why you think it is an outlier.
For numerical columns, I identified the outliers using the upper whisker of the box plot as the cutoff. For example,
credit amount values above 195000 (the upper whisker) are considered outliers.
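A minimal sketch of an upper-whisker cutoff, assuming the common 1.5 * IQR convention (Q3 + 1.5 * (Q3 - Q1)). The report does not state which convention produced the 195000 cutoff, so this is an illustration, not a reproduction of that figure. Passing method="inclusive" to statistics.quantiles interpolates quartiles the same way as Excel's QUARTILE.INC. The sample values and function names are hypothetical.

```python
import statistics

def upper_whisker(values):
    """Upper whisker under the 1.5 * IQR rule: Q3 + 1.5 * (Q3 - Q1)."""
    # method="inclusive" interpolates quartiles like Excel's QUARTILE.INC
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)

def outliers_above(values):
    """Values that lie beyond the upper whisker."""
    limit = upper_whisker(values)
    return [v for v in values if v > limit]

# Hypothetical credit amounts with one extreme value
credit = [100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 10000000]
print(upper_whisker(credit))   # 1300000.0
print(outliers_above(credit))  # [10000000]
```

This also answers the "why" part of the question: a value beyond the whisker sits far outside the range where the middle half of the data lies.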
4. Identify if there is data imbalance in the data. Find the ratio of data imbalance.
The ratio of imbalance for the Target variable came out to be 91.92:8.07.
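The imbalance ratio is each class's share of the rows. With the shares reported above, 91.92 / 8.07 is about 11.4, i.e. roughly 11 applications without payment difficulties for every application with them. The sketch below computes the same kind of figures from a list of Target labels; the labels in the example are made up.

```python
from collections import Counter

def imbalance(labels):
    """Return each class's percentage share and the majority-to-minority ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    shares = {label: 100.0 * n / total for label, n in counts.items()}
    ratio = max(counts.values()) / min(counts.values())
    return shares, ratio

# Hypothetical Target labels: 0 = no payment difficulties, 1 = payment difficulties
labels = [0] * 92 + [1] * 8
shares, ratio = imbalance(labels)
print(shares)  # {0: 92.0, 1: 8.0}
print(ratio)   # 11.5
```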
5. Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.
To perform the analysis, I first divided the data into two sets: Target 0 (non-defaulters) and Target 1 (defaulters).
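Splitting by Target and then comparing category shares within each group (for example, the share of applications started on each weekday) can be sketched as follows. Rows are assumed to be dictionaries keyed by column name, with the target stored under the key "TARGET"; the sample rows are hypothetical.

```python
from collections import Counter

def split_by_target(rows):
    """Split rows into Target 0 (non-defaulters) and Target 1 (defaulters)."""
    target0 = [r for r in rows if r["TARGET"] == 0]
    target1 = [r for r in rows if r["TARGET"] == 1]
    return target0, target1

def category_shares(rows, column):
    """Percentage of rows falling into each category of `column`."""
    counts = Counter(r[column] for r in rows)
    return {category: 100.0 * n / len(rows) for category, n in counts.items()}

# Hypothetical rows; real data would have one dict per application
rows = [
    {"TARGET": 0, "WEEKDAY_APPR_PROCESS_START": "MONDAY"},
    {"TARGET": 0, "WEEKDAY_APPR_PROCESS_START": "SUNDAY"},
    {"TARGET": 0, "WEEKDAY_APPR_PROCESS_START": "MONDAY"},
    {"TARGET": 1, "WEEKDAY_APPR_PROCESS_START": "MONDAY"},
]
target0, target1 = split_by_target(rows)
print(category_shares(target0, "WEEKDAY_APPR_PROCESS_START"))
print(category_shares(target1, "WEEKDAY_APPR_PROCESS_START"))  # {'MONDAY': 100.0}
```

Comparing percentages rather than raw counts matters here, because with a 91.92:8.07 split the Target 0 group is roughly eleven times larger than the Target 1 group.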
AMT_CREDIT
WEEKDAY_APPR_PROCESS_START
INSIGHTS – Fewer applications are started on Saturday and Sunday than on weekdays.
NAME_CONTRACT_TYPE
INSIGHTS – People prefer cash loans over the other contract types; most of the loans taken are cash
loans.
NAME_HOUSING_TYPE
INSIGHTS – People living in houses appear in both groups: those who default on their loans and those who
do not.
6. Find the top 10 correlations for clients with payment difficulties and for all other cases (Target variable).
To find the correlations, we again divide the data into two sets based on Target and treat Target 1 as
defaulters.
NAME_EDUCATION_TYPE
INSIGHTS – People whose education type is Secondary/Secondary Special are more likely to default, and
people with an Academic degree default the least.
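A claim such as "Secondary/Secondary Special clients are more likely to default" is best checked with the default rate inside each category (defaulters divided by all clients in that category) rather than with raw counts, since the largest category dominates any count-based chart. A minimal sketch, with hypothetical rows and the target assumed to be stored under the key "TARGET":

```python
from collections import defaultdict

def default_rate_by(rows, column):
    """Default rate (%) within each category of `column`: defaulters / all clients in that category."""
    totals = defaultdict(int)
    defaulters = defaultdict(int)
    for r in rows:
        totals[r[column]] += 1
        defaulters[r[column]] += r["TARGET"]  # TARGET is 1 for a defaulter, else 0
    return {cat: 100.0 * defaulters[cat] / totals[cat] for cat in totals}

# Hypothetical rows, not real data
rows = (
    [{"NAME_EDUCATION_TYPE": "Secondary / secondary special", "TARGET": t} for t in (1, 0, 0, 0)]
    + [{"NAME_EDUCATION_TYPE": "Academic degree", "TARGET": t} for t in (0, 0)]
)
print(default_rate_by(rows, "NAME_EDUCATION_TYPE"))
# {'Secondary / secondary special': 25.0, 'Academic degree': 0.0}
```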
INSIGHTS – Summing the loan amounts across all applications shows that people mostly take cash
loans.
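Summing loan amounts per contract type is a group-by sum (SUMIF in Excel). The sketch below assumes the amount column is AMT_APPLICATION, which the report does not name explicitly, and uses hypothetical rows.

```python
from collections import defaultdict

def total_by(rows, group_column, amount_column):
    """Sum `amount_column` within each category of `group_column` (like Excel's SUMIF)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_column]] += r[amount_column]
    return dict(totals)

# Hypothetical previous applications; AMT_APPLICATION is an assumed column choice
rows = [
    {"NAME_CONTRACT_TYPE": "Cash loans", "AMT_APPLICATION": 450000.0},
    {"NAME_CONTRACT_TYPE": "Consumer loans", "AMT_APPLICATION": 90000.0},
    {"NAME_CONTRACT_TYPE": "Cash loans", "AMT_APPLICATION": 225000.0},
]
totals = total_by(rows, "NAME_CONTRACT_TYPE", "AMT_APPLICATION")
print(totals)                       # {'Cash loans': 675000.0, 'Consumer loans': 90000.0}
print(max(totals, key=totals.get))  # Cash loans
```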
AMT_APPLICATION and AMT_CREDIT
INSIGHTS – The correlation coefficient, calculated with the Excel formula =CORREL, is 0.9758, a very strong positive correlation.
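Excel's CORREL returns the Pearson correlation coefficient, r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2)). A plain-Python version is sketched below for reproducing such figures outside Excel; the sample lists are hypothetical, and the function name correl is chosen only to mirror the Excel function. The same function applies unchanged to AMT_INCOME_TOTAL and AMT_ANNUITY.

```python
import math

def correl(xs, ys):
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values (amount applied for vs amount credited)
applied = [100000.0, 250000.0, 400000.0, 550000.0]
credited = [110000.0, 240000.0, 430000.0, 560000.0]
print(round(correl(applied, credited), 4))
```

Values near 1 mean the two columns move together almost linearly; values near 0 mean there is little linear relationship.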
AMT_INCOME_TOTAL and AMT_ANNUITY
INSIGHTS – The correlation coefficient, calculated with the Excel formula =CORREL, is 0.19166, a weak
positive correlation.
CONCLUSION – From the above analysis, we can identify which kinds of people are able to repay loans,
which types of loan people prefer, what backgrounds and sources of income borrowers have, and for which
types of people, and under which conditions, loan applications are refused.
RESULTS:
1. People with an academic degree have fewer defaults.
2. People prefer cash loans over any other type.
3. People whose education type is Secondary/Secondary Special are more likely to default on loans.
4. People with less than 5 years of employment have a high default rate.
5. Focus variable for the application file: Target.
6. Focus variable for the previous application file: NAME_CONTRACT_STATUS.
7. Important fields to consider for loan repayment are:
   a. NAME_EDUCATION_TYPE
   b. AMT_INCOME_TOTAL
   c. DAYS_EMPLOYED
   d. AMT_CREDIT
8. People with lower total income are more likely to default.
9. People with a high credit amount are less likely to default.
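The finding that people employed for less than 5 years default more often can be checked by converting DAYS_EMPLOYED to years and comparing default rates on either side of the cutoff. This sketch assumes DAYS_EMPLOYED is stored as a negative number of days before the application and that positive values are placeholders for clients without current employment, which it skips; both assumptions should be verified against the data. The rows are hypothetical.

```python
def employment_years(days_employed):
    """Convert DAYS_EMPLOYED to years, assuming it is a negative count of days before the application."""
    return -days_employed / 365.0

def default_rate_by_tenure(rows, cutoff_years=5.0):
    """Default rate (%) for clients employed less than vs at least `cutoff_years`."""
    short = [0, 0]  # [defaulters, clients] employed < cutoff_years
    long_ = [0, 0]  # [defaulters, clients] employed >= cutoff_years
    for r in rows:
        if r["DAYS_EMPLOYED"] > 0:
            continue  # assumed placeholder for clients with no current employment
        bucket = short if employment_years(r["DAYS_EMPLOYED"]) < cutoff_years else long_
        bucket[0] += r["TARGET"]
        bucket[1] += 1
    return {
        "short": 100.0 * short[0] / short[1] if short[1] else None,
        "long": 100.0 * long_[0] / long_[1] if long_[1] else None,
    }

# Hypothetical rows, not real data
rows = [
    {"DAYS_EMPLOYED": -400, "TARGET": 1},
    {"DAYS_EMPLOYED": -800, "TARGET": 0},
    {"DAYS_EMPLOYED": -3000, "TARGET": 0},
    {"DAYS_EMPLOYED": -2500, "TARGET": 0},
    {"DAYS_EMPLOYED": 365243, "TARGET": 0},
]
print(default_rate_by_tenure(rows))  # {'short': 50.0, 'long': 0.0}
```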
NOTE: Each screenshot above shows just one example column. For the complete results, please see the attached
Excel file. Drive link for the file:
https://docs.google.com/spreadsheets/d/1gn2C2nSJdSSGcqeoeZQCG___3gRtdYRt/edit?usp=share_link&ouid=112715989555881480949&rtpof=true&sd=true
THANK YOU