Eda Case Study Code
import warnings
warnings.filterwarnings('ignore')
In [6]: loan.head(10)
Out[6]:
        id  member_id  loan_amnt  funded_amnt  funded_amnt_inv       term  int_rate  installment  grade  sub_gr
0  1077501    1296599       5000         5000           4975.0  36 months    10.65%       162.87      B
1  1077430    1314167       2500         2500           2500.0  60 months    15.27%        59.83      C
2  1077175    1313524       2400         2400           2400.0  36 months    15.96%        84.33      C
3  1076863    1277178      10000        10000          10000.0  36 months    13.49%       339.31      C
4  1075358    1311748       3000         3000           3000.0  60 months    12.69%        67.79      B
5  1075269    1311441       5000         5000           5000.0  36 months     7.90%       156.46      A
6  1069639    1304742       7000         7000           7000.0  60 months    15.96%       170.08      C
7  1072053    1288686       3000         3000           3000.0  36 months    18.64%       109.43      E
8  1071795    1306957       5600         5600           5600.0  60 months    21.28%       152.39      F
9  1071570    1306721       5375         5375           5350.0  60 months    12.69%       121.45      B
In [7]: loan.shape
Out[7]: (39717, 111)
In [8]: loan.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39717 entries, 0 to 39716
Columns: 111 entries, id to total_il_high_credit_limit
dtypes: float64(74), int64(13), object(24)
memory usage: 33.6+ MB
In [9]: loan.isnull().any()
Out[9]: id False
member_id False
loan_amnt False
funded_amnt False
funded_amnt_inv False
...
tax_liens True
tot_hi_cred_lim True
total_bal_ex_mort True
total_bc_limit True
total_il_high_credit_limit True
Length: 111, dtype: bool
In [10]: loan.isnull().any().sum()
Out[10]: 68
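The count above only says how many columns contain at least one null. A minimal sketch of a follow-up check, on a small made-up frame rather than the loan data, separates columns that are entirely empty from those that are only partly empty. The column name `empty_col` is hypothetical, and whether the notebook dropped columns this way is an assumption.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `loan`; values and the `empty_col` name are made up.
toy = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "emp_length": ["10+ years", None, "1 year"],   # partly null
    "empty_col": [np.nan, np.nan, np.nan],          # entirely null
})

# Same check as In [10]: number of columns with at least one null.
cols_with_nulls = int(toy.isnull().any().sum())

# Columns in which every value is null carry no information.
all_null_cols = toy.columns[toy.isnull().all()].tolist()

# Drop only the fully empty columns and keep the partly filled ones.
trimmed = toy.dropna(axis=1, how="all")

print(cols_with_nulls, all_null_cols, trimmed.shape)
```

Dropping with `how="all"` is the conservative choice here: partly filled columns such as `emp_length` survive and can be handled separately.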
In [11]: loan.describe()
8 rows × 87 columns
top NaN NaN NaN NaN NaN 36 months 10.99% Na
In [13]: columns = loan.columns
In [18]: loan
0      1077501  1296599   5000   5000   4975.0  36 months  10.65%  162.87  B
1      1077430  1314167   2500   2500   2500.0  60 months  15.27%   59.83  C
2      1077175  1313524   2400   2400   2400.0  36 months  15.96%   84.33  C
3      1076863  1277178  10000  10000  10000.0  36 months  13.49%  339.31  C
4      1075358  1311748   3000   3000   3000.0  60 months  12.69%   67.79  B
... ... ... ... ... ... ... ... ...
39712    92187    92174   2500   2500   1075.0  36 months   8.07%   78.42  A
39713    90665    90607   8500   8500    875.0  36 months  10.28%  275.38  C
39714    90395    90390   5000   5000   1325.0  36 months   8.07%  156.84  A
39715    90376    89243   5000   5000    650.0  36 months   7.43%  155.38  A
39716    87023    86999   7500   7500    800.0  36 months  13.75%  255.43  E
In [19]: loan.columns
In [21]: loan.drop(['last_credit_pull_d','last_pymnt_amnt','last_pymnt_d','collection_recovery_fe
'total_rec_late_fee', 'total_rec_int','total_rec_prncp', 'total_pymnt_inv', 'o
loan.shape
Out[21]: (39717, 41)
In [22]: loan.columns
In [23]: loan.drop(['installment','pymnt_plan',
'delinq_2yrs', 'earliest_cr_line', 'inq_last_6mths',
'mths_since_last_delinq', 'mths_since_last_record', 'open_acc',
'pub_rec', 'revol_bal', 'revol_util', 'total_acc',
'initial_list_status', 'total_pymnt', 'next_pymnt_d',
'collections_12_mths_ex_med', 'policy_code', 'application_type',
'acc_now_delinq', 'chargeoff_within_12_mths', 'delinq_amnt',
'pub_rec_bankruptcies', 'tax_liens'], axis=1, inplace = True)
In [24]: loan.shape
Out[24]: (39717, 18)
In [25]: loan.columns
Out[26]: 0
In [28]: loan.shape
Out[28]: (39717, 16)
In [29]: loan.head()
Out[29]:
   loan_amnt  funded_amnt  funded_amnt_inv       term  int_rate  grade  sub_grade  emp_length  home_ownership
0       5000         5000           4975.0  36 months    10.65%      B         B2   10+ years            RENT
1       2500         2500           2500.0  60 months    15.27%      C         C4    < 1 year            RENT
2       2400         2400           2400.0  36 months    15.96%      C         C5   10+ years            RENT
3      10000        10000          10000.0  36 months    13.49%      C         C1   10+ years            RENT
4       3000         3000           3000.0  60 months    12.69%      B         B5      1 year            RENT
Since the analysis concerns applicants who are likely to default, loans that are still running have no final outcome yet and are ignored. Only loans that are fully paid or charged off are used for the analysis.
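The filter described above can be sketched as follows on a toy frame. The labels 'Fully Paid' and 'Charged Off' appear in the outputs of this notebook; the label 'Current' for running loans is an assumption.

```python
import pandas as pd

# Toy data; 'Current' is an assumed label for loans that are still running.
toy = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
})

# Keep only loans whose outcome is already known.
closed = toy[toy["loan_status"].isin(["Fully Paid", "Charged Off"])].copy()

print(closed["loan_status"].value_counts())
```

Selecting the two known outcomes explicitly, rather than excluding one label, also removes any other unexpected status value.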
Out[31]: loan_amnt 0.00
funded_amnt 0.00
funded_amnt_inv 0.00
term 0.00
int_rate 0.00
grade 0.00
sub_grade 0.00
emp_length 2.68
home_ownership 0.00
annual_inc 0.00
verification_status 0.00
issue_d 0.00
loan_status 0.00
purpose 0.00
addr_state 0.00
dti 0.00
dtype: float64
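A percentage table like the one above can be produced with a one-liner. The sketch below uses made-up values and is only one possible way to obtain it, since the original cell is not shown.

```python
import pandas as pd

# Toy frame; one of four emp_length values is missing.
toy = pd.DataFrame({
    "emp_length": ["10+ years", None, "1 year", "3 years"],
    "dti": [27.65, 1.00, 8.72, 20.00],
})

# The mean of a boolean mask is the fraction of nulls per column.
null_pct = (100 * toy.isnull().mean()).round(2)

# Row-wise removal for a column with a small null share.
cleaned = toy.dropna(subset=["emp_length"])

print(null_pct)
print(len(cleaned))
```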
In [32]: #Removing the null values from emp_length as the null percentage is less than 5%, hence i
loan = loan[~loan['emp_length'].isnull()]
Checking the purpose of the loan, as this is an important factor in the analysis.
Out[33]: debt_consolidation 47.08
credit_card 13.05
other 9.89
home_improvement 7.42
major_purchase 5.54
small_business 4.55
car 3.86
wedding 2.43
medical 1.75
moving 1.47
house 0.94
vacation 0.93
educational 0.84
renewable_energy 0.25
Name: purpose, dtype: float64
In [34]: # Since we do not know what the purpose 'other' stands for, get rid of it.
loan.drop(loan[loan.purpose == 'other'].index, inplace = True)
In [35]: loan.purpose.value_counts()
Out[35]: debt_consolidation 17675
credit_card 4899
home_improvement 2785
major_purchase 2080
small_business 1710
car 1448
wedding 913
medical 656
moving 552
house 354
vacation 348
educational 317
renewable_energy 94
Name: purpose, dtype: int64
In [36]: loan.term.value_counts()
Out[36]: 36 months 25270
60 months 8561
Name: term, dtype: int64
loan.head(5)
Out[37]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
In [38]: loan['int_rate'].value_counts()
Out[38]: 10.99% 800
11.49% 685
13.49% 669
7.51% 660
7.88% 637
...
16.20% 1
18.72% 1
16.01% 1
16.96% 1
14.70% 1
Name: int_rate, Length: 367, dtype: int64
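# Strip the trailing '%' before casting; astype(float) alone fails on strings like '10.99%'
loan['int_rate'] = loan['int_rate'].astype(str).str.rstrip('%').astype(float)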
loan.dtypes
Out[39]: loan_amnt int64
funded_amnt int64
funded_amnt_inv float64
term int64
int_rate float64
grade object
sub_grade object
emp_length object
home_ownership object
annual_inc float64
verification_status object
issue_d object
loan_status object
purpose object
addr_state object
dti float64
dtype: object
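The dtypes above show `term` as an integer and `int_rate` as a float, although both start out as strings such as '36 months' and '10.65%'. A minimal sketch of one way to do that conversion, on toy data; the regular expression and the exact steps are assumptions, not necessarily what the notebook did.

```python
import pandas as pd

toy = pd.DataFrame({
    "term": ["36 months", "60 months"],
    "int_rate": ["10.65%", "15.27%"],
})

# Remove the percent sign, then cast to float.
toy["int_rate"] = toy["int_rate"].str.rstrip("%").astype(float)

# Pull the leading number out of '36 months' and cast to int.
toy["term"] = toy["term"].str.extract(r"(\d+)", expand=False).astype(int)

print(toy.dtypes)
```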
In [40]: loan.head(5)
Out[40]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
In [41]: loan.emp_length.value_counts()
In [43]: loan.emp_length.value_counts()
Out[43]: 10 7664
0 3994
2 3847
3 3636
4 3027
5 2895
1 2811
6 1987
7 1550
8 1308
9 1112
Name: emp_length, dtype: int64
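The counts above show `emp_length` as whole numbers from 0 to 10, while the raw column holds labels such as '10+ years' and '< 1 year'. The helper below is a hypothetical sketch of such a mapping; coding '< 1 year' as 0 is an assumption that is consistent with the 0 category in the output.

```python
import pandas as pd

def emp_length_to_years(value: str) -> int:
    """Map a label such as '10+ years' or '< 1 year' to a whole number of years."""
    if value.strip().startswith("<"):
        return 0  # assumption: '< 1 year' is coded as 0
    # Keep only the digits, so '10+ years' becomes 10 and '1 year' becomes 1.
    digits = "".join(ch for ch in value if ch.isdigit())
    return int(digits)

toy = pd.Series(["10+ years", "< 1 year", "1 year", "5 years"])
years = toy.apply(emp_length_to_years)

print(years.tolist())
```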
In [44]: loan.head(5)
Out[44]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
In [45]: loan['annual_inc'].describe()
Out[45]: count 3.383100e+04
mean 7.003334e+04
std 6.596019e+04
min 4.000000e+03
25% 4.200000e+04
50% 6.000000e+04
75% 8.400000e+04
max 6.000000e+06
Name: annual_inc, dtype: float64
Out[46]: 0.50 60000.0
0.70 77000.0
0.90 118000.0
0.95 143387.5
0.99 235560.0
1.00 6000000.0
Name: annual_inc, dtype: float64
loan['annual_inc'].describe()
Out[47]: count 33492.000000
mean 66490.324304
std 35165.906151
min 4000.000000
25% 42000.000000
50% 60000.000000
75% 82000.000000
max 235000.000000
Name: annual_inc, dtype: float64
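The drop in the maximum from 6,000,000 to 235,000 and in the row count from 33,831 to 33,492 is consistent with trimming incomes above the 99th percentile. A minimal sketch of that kind of trimming on toy values follows; the exact cutoff rule used in the notebook is an assumption.

```python
import pandas as pd

# Toy incomes with one extreme value.
toy = pd.DataFrame({"annual_inc": [24000.0, 42000.0, 60000.0, 84000.0, 6000000.0]})

# Use the 99th percentile as the upper bound.
cutoff = toy["annual_inc"].quantile(0.99)

# Keep rows at or below the cutoff.
trimmed = toy[toy["annual_inc"] <= cutoff]

print(cutoff, len(trimmed), trimmed["annual_inc"].max())
```

A percentile cutoff removes a fixed share of rows regardless of scale, which keeps the mean and standard deviation from being dominated by a handful of very large incomes.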
In [48]: loan['issue_d'].value_counts()
Out[48]: Dec-11 1853
Nov-11 1791
Sep-11 1670
Oct-11 1661
Aug-11 1569
Jun-11 1479
Jul-11 1474
May-11 1419
Apr-11 1371
Mar-11 1304
Jan-11 1244
Feb-11 1154
Dec-10 1116
Oct-10 1004
Nov-10 1000
Jul-10 971
Sep-10 955
Aug-10 944
Jun-10 850
May-10 771
Apr-10 667
Mar-10 599
Feb-10 506
Nov-09 504
Jan-10 484
Dec-09 473
Oct-09 464
Sep-09 407
Aug-09 363
Jul-09 331
Jun-09 318
May-09 283
Mar-09 251
Apr-09 246
Feb-09 228
Jan-09 221
Dec-08 203
Mar-08 198
Nov-08 167
Feb-08 154
Jan-08 145
Apr-08 125
Oct-08 82
Dec-07 74
Jul-08 65
May-08 62
Aug-08 57
Jun-08 56
Oct-07 31
Nov-07 29
Aug-07 29
Jul-07 28
Sep-08 27
Sep-07 14
Jun-07 1
Name: issue_d, dtype: int64
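Values such as 'Dec-11' are easier to analyse once split into year and month. The sketch below shows one way to parse them; the names `issue_year` and `issue_month` are assumptions.

```python
import pandas as pd

issue_d = pd.Series(["Dec-11", "Nov-11", "Jun-07"])

# '%b-%y' reads an abbreviated month name and a two-digit year.
parsed = pd.to_datetime(issue_d, format="%b-%y")

issue_year = parsed.dt.year
issue_month = parsed.dt.month

print(issue_year.tolist(), issue_month.tolist())
```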
loan.head(5)
Out[49]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
Out[50]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
In [51]: loan['loan_inc_ratio'].quantile([.25,.50,.75])
Out[51]: 0.25 10.29
0.50 16.67
0.75 25.38
Name: loan_inc_ratio, dtype: float64
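The cell that creates `loan_inc_ratio` is not shown. The quantiles above are consistent with the loan amount expressed as a percentage of annual income (for example, 10000 / 60000 is about 16.67%), so the sketch below assumes that definition.

```python
import pandas as pd

toy = pd.DataFrame({
    "loan_amnt": [5000, 10000, 15000],
    "annual_inc": [24000.0, 60000.0, 50000.0],
})

# Assumed definition: loan amount as a percentage of annual income.
toy["loan_inc_ratio"] = (100 * toy["loan_amnt"] / toy["annual_inc"]).round(2)

print(toy["loan_inc_ratio"].tolist())
```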
def loan_inc_ratio_category(n):
    if n < 10:
        return 'low'
    elif n >= 10 and n < 17:
        return 'medium'
    elif n >= 17 and n < 25:
        return 'high'
    else:
        return 'very high'
loan['categorised_loan_inc_ratio'] = loan['loan_inc_ratio'].apply(loan_inc_ratio_categor
loan.head()
Out[52]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
In [53]: loan['int_rate'].quantile([.25,.5,.75])
Out[53]: 0.25 8.94
0.50 11.83
0.75 14.46
Name: int_rate, dtype: float64
In [54]: #categorise int_rate column into categorised_int_rate_perc column
# below 9% is low
# from 9% up to (but not including) 12% is medium
# from 12% up to (but not including) 14% is high
# 14% and above is very high
def interest_rates(n):
    if n < 9:
        return 'low'
    elif n >= 9 and n < 12:
        return 'medium'
    elif n >= 12 and n < 14:
        return 'high'
    else:
        return 'very high'
Out[54]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
5 rows × 21 columns
In [55]: loan['emp_length'].quantile([.25,.5,.75])
Out[55]: 0.25 2.0
0.50 4.0
0.75 9.0
Name: emp_length, dtype: float64
def length_of_emp(n):
    if n < 2:
        return 'entry level'
    elif n >= 2 and n < 4:
        return 'junior level'
    elif n >= 4 and n < 9:
        return 'middle level'
    else:
        return 'senior level'
loan['categorised_emp_length'] = loan['emp_length'].apply(length_of_emp)
loan.head()
Out[56]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
5 rows × 22 columns
Out[57]: 0.25 42000.0
0.50 60000.0
0.75 82000.0
Name: annual_inc, dtype: float64
def annual_income(n):
    if n < 41000:
        return 'low income'
    elif n >= 41000 and n < 60000:
        return 'medium income'
    elif n >= 60000 and n < 83000:
        return 'high income'
    else:
        return 'very high income'
loan['categorised_annual_inc'] = loan['annual_inc'].apply(annual_income)
loan.head()
Out[58]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
5 rows × 23 columns
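The hand-written threshold functions used for binning can also be expressed with `pd.cut`. The sketch below reuses the thresholds of `annual_income()` (41000, 60000, 83000) on made-up incomes; it is an alternative formulation, not the method used in the notebook.

```python
import numpy as np
import pandas as pd

annual_inc = pd.Series([30000.0, 41000.0, 59999.0, 60000.0, 83000.0])

# Same thresholds as annual_income(); right=False makes each bin [low, high).
bins = [-np.inf, 41000, 60000, 83000, np.inf]
labels = ["low income", "medium income", "high income", "very high income"]

categorised = pd.cut(annual_inc, bins=bins, labels=labels, right=False)

print(categorised.astype(str).tolist())
```

With `pd.cut` the bin edges and labels sit in two lists, so changing a threshold does not require editing a chain of `elif` conditions, and the result is an ordered categorical that plots in the intended order.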
In [59]: loan.categorised_annual_inc.isnull().sum()
Out[59]: 0
Out[60]: 0.25 8.4075
0.50 13.5400
0.75 18.6500
Name: dti, dtype: float64
In [120… loan['loan_amnt'].quantile([.25, .5, .75])
Out[120]: 0.25 6000.0
0.50 10000.0
0.75 15000.0
Name: loan_amnt, dtype: float64
def loan_ammount(n):
    if n < 5400:
        return 'low'
    elif n >= 5400 and n < 9600:
        return 'medium'
    elif n >= 9600 and n < 15000:
        return 'high'
    else:
        return 'very high'
loan['categorised_loan_amnt'] = loan['loan_amnt'].apply(loan_ammount)
loan.head()
Out[63]: loan_amnt funded_amnt funded_amnt_inv term int_rate grade sub_grade emp_length home_ownership
5 rows × 24 columns
In [64]: loan.loan_status.describe()
Out[64]: count 33492
unique 2
top Fully Paid
freq 28724
Name: loan_status, dtype: object
In [65]: loan.columns
In [67]: plot = sns.catplot(data = loan, x = 'loan_status', kind = 'count', palette ='Set1', aspe
plt.title('Loan Status', fontsize = 14)
plt.xlabel('loan_status', fontsize = 12)
plt.ylabel('count', fontsize = 12)
The above graph shows that 14.2% of the applicants in the data provided have defaulted (charged off).
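The 14.2% figure can be checked from the counts reported in Out[64] (33492 loans in total, 28724 of them fully paid). A minimal sketch of that arithmetic:

```python
import pandas as pd

# Counts taken from Out[64]: 33492 loans in total, 28724 fully paid.
counts = pd.Series({"Fully Paid": 28724, "Charged Off": 33492 - 28724})

# Share of each status as a percentage of all loans.
share = (100 * counts / counts.sum()).round(2)

print(share)
```

On the full frame the same shares would come from `loan['loan_status'].value_counts(normalize=True)`.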
The graph shows that loan amounts are mostly spread between approximately 6000 and 16000.
The above graph shows that interest rates are mostly spread between approximately 8% and 14%.
In [70]: sns.distplot(loan['annual_inc'], bins = 6)
plt.title('Annual Income')
plt.show()
The above figure shows that the majority of applicants have an annual income between approximately 40000 USD and 90000 USD.
In [71]: loan.columns
In [72]: plot = sns.catplot(data = loan, x = 'term', kind = 'count', palette ='Set1', aspect = .5
plt.title('Loan Duration', fontsize = 14)
plt.xlabel('Term', fontsize = 12)
plt.ylabel('Count', fontsize = 12)
The above graph shows that 74.75% of applicants have taken a loan of 36 months duration.
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate('{:1.1f}%'.format((p.get_height()*100/len(loan))), ((p.get_x()+ p.get_w
color = 'red', ha = 'center', va = 'bottom')
The following are the loan purposes for which more than 5% of applicants have taken a loan: debt consolidation (52.4%), credit card (14.2%), home improvement (8.1%), major purchase (6.2%) and small business (5.0%).
In [74]: order_grade = ['A','B','C','D','E','F','G']
plot = sns.catplot(data = loan, x ='grade', kind = 'count', palette = 'Set1', aspect = 1
plt.title('Grade', fontsize = 14)
plt.xlabel('Grade', fontsize =12)
plt.ylabel('Count', fontsize = 12)
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate('{:1.1f}%'.format((p.get_height()*100/len(loan))), ((p.get_x()+ p.get_w
color = 'red', ha = 'center', va = 'bottom')
The above graph shows that most applicants fall under grade B (30.3%), followed by A (25.6%) and C (20.3%).
ax =plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
As per the above graph, the number of loan applicants increased year on year; loans issued in 2011 account for 53.7% of the total. Since the issue year does not give any direction to the analysis, we will not use it for further analysis.
In [76]: loan.categorised_emp_length.value_counts()
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
plt.show()
The largest share of loan applicants (31.9%) belongs to the middle level category, with 4 to 8 years of employment.
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
Bivariate analysis
In [98]: loan_correlation = loan.corr(numeric_only=True)
plt.figure(figsize =(14,7))
sns.heatmap(loan_correlation, annot = True, cmap = 'RdYlGn')
Out[98]: <Axes: >
The above heatmap shows that the loan amount, the funded amount and the funded amount investment (funded_amnt_inv) are very closely correlated, hence any one of them can be used for our analysis.
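Reading strongly correlated pairs off a heatmap can be backed by a short listing. The sketch below uses the first five rows shown earlier as toy data and an arbitrary threshold of 0.9, which is an assumption; `numeric_only=True` skips non-numeric columns such as `grade`.

```python
import numpy as np
import pandas as pd

# First five rows of the three amount columns, plus one non-numeric column.
toy = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000, 3000],
    "funded_amnt": [5000, 2500, 2400, 10000, 3000],
    "funded_amnt_inv": [4975.0, 2500.0, 2400.0, 10000.0, 3000.0],
    "grade": ["B", "C", "C", "C", "B"],
})

corr = toy.corr(numeric_only=True)

# Keep the strict upper triangle so that each pair appears once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Flatten to (column, column) pairs and keep the strong ones; NaN never passes the filter.
pairs = upper.stack()
strong_pairs = pairs[pairs.abs() > 0.9]

print(strong_pairs)
```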
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x()+ p.get_wi
color = 'red', ha = 'center', va = 'bottom')
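About 75% of applicants opted for a 36-month loan: 66.8% of all applicants fully paid and 7.9% were charged off. The remaining 25% or so had a 60-month tenure, and 6.3% of all applicants were charged off on such loans. Relative to the size of each group, that is roughly 11% of 36-month loans against roughly 25% of 60-month loans, so the longer term defaults far more often.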
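The percentages annotated on the bars are shares of all loans, which makes groups of different sizes hard to compare. A minimal sketch on made-up data shows the difference between shares of the total and default rates within each term, using the `normalize` option of `pd.crosstab`.

```python
import pandas as pd

# Toy data: four loans per term, invented for illustration.
toy = pd.DataFrame({
    "term": [36, 36, 36, 36, 60, 60, 60, 60],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid", "Charged Off",
                    "Fully Paid", "Fully Paid", "Charged Off", "Charged Off"],
})

# Share of all loans in each (term, status) cell, as plotted in the graph.
overall = pd.crosstab(toy["term"], toy["loan_status"], normalize="all") * 100

# Share within each term, i.e. the default rate per term.
within = pd.crosstab(toy["term"], toy["loan_status"], normalize="index") * 100

print(overall)
print(within)
```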
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
The above graph shows that applicants whose income is verified seem to default more, hence we ignore verification status as a cause of default in the further analysis.
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color ='red', ha= 'center', va = 'bottom')
The above graph shows that applicants who rent or have a mortgage are more likely to default compared to applicants who own their homes.
In [102… #Loan amount vs Loan Status
ordered = ['low', 'medium', 'high', 'very high']
g = sns.catplot(data = loan, x = 'categorised_loan_amnt', kind = 'count', hue = 'loan_st
order = ordered)
plt.title('Loan Amount vs Loan Status', fontsize = 14)
plt.xlabel('Loan Amount', fontsize = 12)
plt.ylabel('Count', fontsize = 12)
ax = g.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
plt.show()
The above graph shows that applicants with a high loan amount are more likely to default.
ax = g.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
plt.show()
The above graph shows that applicants with higher interest rates are more likely to default.
ax = g.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va = 'bottom')
The above graph shows that applicants with a higher annual income are less likely to default.
ax = n.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x()+p.get_wid
color = 'red', ha = 'center', va = 'bottom')
ax = plot.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x() + p.get_w
color = 'red', ha = 'center', va= 'bottom')
plt.show()
ax = g.facet_axis(0,0)
for p in ax.patches:
ax.annotate("{:1.1f}%".format((p.get_height()*100/len(loan))), ((p.get_x()+ p.get_wi
color = 'red', ha = 'center', va = 'bottom')
plt.show()
In [110… #Employment length vs loan Amount vs Purpose for charged off loan status
ordered = ['entry level', 'junior level', 'middle level', 'senior level']
sns.catplot(data = df, x = 'categorised_emp_length', y = 'loan_amnt', hue = 'purpose', a
palette = 'Set1', kind = 'bar', order = ordered)
plt.title('Employment Length vs Loan Amount vs Purpose of Defaulted',fontsize = 14)
plt.xlabel('Employment Length',fontsize = 12)
plt.ylabel('Loan Amount', fontsize = 12)
plt.show()
In [111… #Term vs loan Amount vs Purpose for charged off loan status
sns.catplot(data = df, x = 'term', y = 'loan_amnt', hue = 'purpose', aspect = 2.5,
palette = 'Set1', kind = 'bar')
plt.title('Term vs Loan Amount vs Purpose of Defaulted',fontsize = 14)
plt.xlabel('Term',fontsize = 12)
plt.ylabel('Loan Amount', fontsize = 12)
plt.show()
Multivariate analysis outcome:
The following loan applicants are likely to default:
Applicants who take a loan for small business.
Applicants whose annual income is in the low and medium categories have defaulted more on small business, whereas applicants in the high and very high categories have defaulted on small business and debt consolidation.
Irrespective of employment length, those who have taken a loan for small business have defaulted.
Applicants with a 60 months term for small business.
Hence we can infer that the default rate is highest for loans taken for small business, followed by debt consolidation.
# Display crosstab
display(filter_states_crosstab)
loan_status Charged Off Fully Paid All percentage_defaulted
addr_state
In [113… loan.categorised_int_rate_perc.value_counts()
Out[113]: medium 9430
very high 9366
low 8382
high 6314
Name: categorised_int_rate_perc, dtype: int64
In [114… loan.categorised_annual_inc.isnull().sum()
Out[114]: 0
# Create a cross-tabulation
int_rate_crosstab = pd.crosstab(int_rate_loan_df['categorised_int_rate_perc'], int_rate_
int_rate_crosstab.drop(int_rate_crosstab.tail(1).index, inplace=True)
categorised_int_rate_perc
In [116… # filter_df -- percentage default for purpose
#create a crosstab
purposecrosstab = pd.crosstab(filtered_df['purpose'],filtered_df['loan_status'], margins
purposecrosstab.drop(purposecrosstab.tail(1).index, inplace = True)
purposecrosstab['percentage_defaulted'] = round(100*((purposecrosstab['Charged Off']/pur
#Display crosstab
display(purposecrosstab)
#plot the map
plot_map(purposecrosstab, 'Purpose', 'purpose', .25)
purpose
Applicants who have taken a loan for small business (27.31%) tend to default more.
sub_grade_crosstab = pd.crosstab(sub_grade_df['sub_grade'],sub_grade_df['loan_status'],
sub_grade_crosstab.drop(sub_grade_crosstab.tail(1).index, inplace = True)
sub_grade_crosstab['percentage_defaulted'] = round(100*((sub_grade_crosstab['Charged Off
display(sub_grade_crosstab)
grade
loan_status Charged Off Fully Paid All percentage_defaulted
sub_grade
F4 49 83 132 37.12
F5 49 51 100 49.00
G1 28 59 87 32.18
G2 25 44 69 36.23
G3 18 23 41 43.90
G4 10 38 48 20.83
G5 10 16 26 38.46
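The `percentage_defaulted` column in the table above is consistent with charged-off loans divided by all loans in the row, times 100. The sketch below rebuilds the column from the counts printed above; the formula is inferred from those numbers because the original line is cut off.

```python
import pandas as pd

# Counts copied from the sub_grade rows shown above.
sub_grade_counts = pd.DataFrame(
    {
        "Charged Off": [49, 49, 28, 25, 18, 10, 10],
        "Fully Paid": [83, 51, 59, 44, 23, 38, 16],
    },
    index=pd.Index(["F4", "F5", "G1", "G2", "G3", "G4", "G5"], name="sub_grade"),
)

# Row total, equivalent to the 'All' margin of a crosstab.
sub_grade_counts["All"] = sub_grade_counts["Charged Off"] + sub_grade_counts["Fully Paid"]

# Inferred formula: share of charged-off loans per sub grade, in percent.
sub_grade_counts["percentage_defaulted"] = (
    100 * sub_grade_counts["Charged Off"] / sub_grade_counts["All"]
).round(2)

print(sub_grade_counts)
```

Note that the smallest groups (for example G5 with 26 loans) give unstable percentages, so single sub grades should be compared with some caution.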
From the above graph it can be observed that the probability of default increases with the grades from A to G.
annual_inc_crosstab = pd.crosstab(annual_inc_df['categorised_annual_inc'],annual_inc_df[
annual_inc_crosstab.drop(annual_inc_crosstab.tail(1).index, inplace = True)
annual_inc_crosstab['percentage_defaulted'] = round(100*((annual_inc_crosstab['Charged O
display(annual_inc_crosstab)
plot_map(annual_inc_crosstab, 'Annual income', 'annual income', .10)
categorised_annual_inc
In [119… emp_length_df = loan
emp_length_df.sort_values(['emp_length'])
emp_length_crosstab = pd.crosstab(emp_length_df['emp_length'],emp_length_df['loan_status
emp_length_crosstab.drop(emp_length_crosstab.tail(1).index, inplace = True)
emp_length_crosstab['percentage_defaulted'] = round(100*((emp_length_crosstab['Charged O
display(emp_length_crosstab)
emp_length
The following are the main parameters taken into consideration for arriving at the analysis conclusion: interest rate, purpose, grade, term, employment length and annual income.
As per the analysis, it can be inferred that applicants who have a low income and have taken a high-interest loan with a longer duration for small business have a higher probability of defaulting.