Project 7
Communicate Data Findings
In this project I use the Prosper loan dataset
(https://s3.amazonaws.com/udacity-hosted-downloads/ud651/prosperLoanData.csv)
and perform exploratory and explanatory data analysis. I also write down the steps I followed to complete the project.
Let's import the libraries.
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline
Now let's read the downloaded dataset into a pandas DataFrame.
In [2]: loan = pd.read_csv('prosperLoanData.csv')
Now let's go ahead and assess our data, visually and programmatically
1. Let's take a random sample of 5 rows to see the structure of the data.
In [3]: loan.sample(5)
Out[3]:
       ListingKey               ListingNumber  ListingCreationDate            CreditGrade  Term  ...
63488  62928410015425210489022  267360         2008-01-17 19:47:34.063000000  A            36    ...
83877  1FF63589331918539051225  898719         2013-09-11 07:18:29.247000000  NaN          60    ...
75069  3A6F3550979242211184848  608541         2012-07-07 16:16:03.303000000  NaN          36    ...
98194  251F35313326281524DA90D  598311         2011-11-09 18:25:48.973000000  NaN          36    ...
57366  F4933604395712050327160  1212506        2014-02-27 18:10:49.983000000  NaN          36    ...
5 rows x 81 columns
At first glance we can see that there are 81 columns in total and that the CreditGrade
column has null values. Let's check how many rows and columns this dataset has, and the data type of
each column, using the .info() method.
In [4]: loan.info()
RangeIndex: 113937 entries, 0 to 113936
Data columns (total 81 columns):
ListingKey                             113937 non-null object
ListingNumber                          113937 non-null int64
ListingCreationDate                    113937 non-null object
CreditGrade                            28953 non-null object
Term                                   113937 non-null int64
LoanStatus                             113937 non-null object
ClosedDate                             55089 non-null object
BorrowerAPR                            113912 non-null float64
BorrowerRate                           113937 non-null float64
LenderYield                            113937 non-null float64
EstimatedEffectiveYield                84853 non-null float64
EstimatedLoss                          84853 non-null float64
EstimatedReturn                        84853 non-null float64
ProsperRating (numeric)                84853 non-null float64
ProsperRating (Alpha)                  84853 non-null object
ProsperScore                           84853 non-null float64
ListingCategory (numeric)              113937 non-null int64
BorrowerState                          108422 non-null object
Occupation                             110349 non-null object
EmploymentStatus                       111682 non-null object
EmploymentStatusDuration               106312 non-null float64
IsBorrowerHomeowner                    113937 non-null bool
CurrentlyInGroup                       113937 non-null bool
GroupKey                               13341 non-null object
DateCreditPulled                       113937 non-null object
CreditScoreRangeLower                  113346 non-null float64
CreditScoreRangeUpper                  113346 non-null float64
FirstRecordedCreditLine                113240 non-null object
CurrentCreditLines                     106333 non-null float64
OpenCreditLines                        106333 non-null float64
TotalCreditLinespast7years             113240 non-null float64
OpenRevolvingAccounts                  113937 non-null int64
OpenRevolvingMonthlyPayment            113937 non-null float64
InquiriesLast6Months                   113240 non-null float64
TotalInquiries                         112778 non-null float64
CurrentDelinquencies                   113240 non-null float64
AmountDelinquent                       106315 non-null float64
DelinquenciesLast7Years                112947 non-null float64
PublicRecordsLast10Years               113240 non-null float64
PublicRecordsLast12Months              106333 non-null float64
RevolvingCreditBalance                 106333 non-null float64
BankcardUtilization                    106333 non-null float64
AvailableBankcardCredit                106393 non-null float64
TotalTrades                            106393 non-null float64
TradesNeverDelinquent (percentage)     106393 non-null float64
TradesOpenedLast6Months                106393 non-null float64
DebtToIncomeRatio                      105383 non-null float64
IncomeRange                            113937 non-null object
IncomeVerifiable                       113937 non-null bool
StatedMonthlyIncome                    113937 non-null float64
LoanKey                                113937 non-null object
TotalProsperLoans                      22085 non-null float64
TotalProsperPaymentsBilled             22085 non-null float64
OnTimeProsperPayments                  22085 non-null float64
ProsperPaymentsLessThanOneMonthLate    22085 non-null float64
ProsperPaymentsOneMonthPlusLate        22085 non-null float64
ProsperPrincipalBorrowed               22085 non-null float64
ProsperPrincipalOutstanding            22085 non-null float64
ScorexChangeAtTimeOfListing            18928 non-null float64
LoanCurrentDaysDelinquent              113937 non-null int64
LoanFirstDefaultedCycleNumber          16952 non-null float64
LoanMonthsSinceOrigination             113937 non-null int64
LoanNumber                             113937 non-null int64
LoanOriginalAmount                     113937 non-null int64
LoanOriginationDate                    113937 non-null object
LoanOriginationQuarter                 113937 non-null object
MemberKey                              113937 non-null object
MonthlyLoanPayment                     113937 non-null float64
LP_CustomerPayments                    113937 non-null float64
LP_CustomerPrincipalPayments           113937 non-null float64
LP_InterestandFees                     113937 non-null float64
LP_ServiceFees                         113937 non-null float64
LP_CollectionFees                      113937 non-null float64
LP_GrossPrincipalLoss                  113937 non-null float64
LP_NetPrincipalLoss                    113937 non-null float64
LP_NonPrincipalRecoverypayments        113937 non-null float64
PercentFunded                          113937 non-null float64
Recommendations                        113937 non-null int64
InvestmentFromFriendsCount             113937 non-null int64
InvestmentFromFriendsAmount            113937 non-null float64
Investors                              113937 non-null int64
dtypes: bool(3), float64(50), int64(11), object(17)
memory usage: 68.1+ MB
In [5]: loan.isnull().sum().head(60)
Out[5]:
ListingKey                                  0
ListingNumber                               0
ListingCreationDate                         0
CreditGrade                             84984
Term                                        0
LoanStatus                                  0
ClosedDate                              58848
BorrowerAPR                                25
BorrowerRate                                0
LenderYield                                 0
EstimatedEffectiveYield                 29084
EstimatedLoss                           29084
EstimatedReturn                         29084
ProsperRating (numeric)                 29084
ProsperRating (Alpha)                   29084
ProsperScore                            29084
ListingCategory (numeric)                   0
BorrowerState                            5515
Occupation                               3588
EmploymentStatus                         2255
EmploymentStatusDuration                 7625
IsBorrowerHomeowner                         0
CurrentlyInGroup                            0
GroupKey                               100596
DateCreditPulled                            0
CreditScoreRangeLower                     591
CreditScoreRangeUpper                     591
FirstRecordedCreditLine                   697
CurrentCreditLines                       7604
OpenCreditLines                          7604
TotalCreditLinespast7years                697
OpenRevolvingAccounts                       0
OpenRevolvingMonthlyPayment                 0
InquiriesLast6Months                      697
TotalInquiries                           1159
CurrentDelinquencies                      697
AmountDelinquent                         7622
DelinquenciesLast7Years                   990
PublicRecordsLast10Years                  697
PublicRecordsLast12Months                7604
RevolvingCreditBalance                   7604
BankcardUtilization                      7604
AvailableBankcardCredit                  7544
TotalTrades                              7544
TradesNeverDelinquent (percentage)       7544
TradesOpenedLast6Months                  7544
DebtToIncomeRatio                        8554
IncomeRange                                 0
IncomeVerifiable                            0
StatedMonthlyIncome                         0
LoanKey                                     0
TotalProsperLoans                       91852
TotalProsperPaymentsBilled              91852
OnTimeProsperPayments                   91852
ProsperPaymentsLessThanOneMonthLate     91852
ProsperPaymentsOneMonthPlusLate         91852
ProsperPrincipalBorrowed                91852
ProsperPrincipalOutstanding             91852
ScorexChangeAtTimeOfListing             95009
LoanCurrentDaysDelinquent                   0
dtype: int64
We can see that many columns contain null values, so I am going to keep only
the columns that are relevant and largely free of null values.
To do so, we first store a list of the columns we are going to use, such as BorrowerAPR
(the annual percentage rate), LoanStatus (the status of the loan), and several
more.
In [6]: columns = ['ListingNumber',
'ListingCategory (numeric)',
'Term',
'LoanStatus',
'BorrowerAPR',
'BorrowerRate',
'ProsperRating (Alpha)',
'ProsperRating (numeric)',
'Occupation',
'EmploymentStatus',
'EmploymentStatusDuration',
'IsBorrowerHomeowner',
'IncomeVerifiable',
'StatedMonthlyIncome',
'MonthlyLoanPayment',
'Recommendations',
'DebtToIncomeRatio',
'LoanOriginalAmount',
'PercentFunded',
'IncomeRange',
'Investors',
'BorrowerState'
]
In [7]: loan = loan[columns]
Now let's check for duplicate data.
In [8]: row = loan.shape[0]
In [9]: nrow = loan.drop_duplicates().shape[0]
In [10]: (row-nrow)/row*100
Out[10]: 0.7644575511028024
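The duplicate share can also be computed in a single step. The following is a minimal sketch on a small made-up frame (the `demo` data is hypothetical, not the Prosper data): `DataFrame.duplicated()` flags every row that repeats an earlier one, so the mean of that boolean mask is the duplicated fraction.

```python
import pandas as pd

# Hypothetical toy data standing in for the loan table
demo = pd.DataFrame({
    "ListingNumber": [1, 2, 2, 3],
    "Term": [36, 60, 60, 36],
})

# duplicated() marks rows that repeat an earlier row; the mean of the
# boolean mask is the fraction of duplicated rows
dup_pct = demo.duplicated().mean() * 100

# Same quantity computed the way the notebook cells do it
row = demo.shape[0]
nrow = demo.drop_duplicates().shape[0]
dup_pct_manual = (row - nrow) / row * 100

print(dup_pct, dup_pct_manual)
```

Both approaches should agree; the one-liner simply avoids the two intermediate variables.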
Now let's check the data types.
In [11]: loan.info()
RangeIndex: 113937 entries, 0 to 113936
Data columns (total 22 columns):
ListingNumber                113937 non-null int64
ListingCategory (numeric)    113937 non-null int64
Term                         113937 non-null int64
LoanStatus                   113937 non-null object
BorrowerAPR                  113912 non-null float64
BorrowerRate                 113937 non-null float64
ProsperRating (Alpha)        84853 non-null object
ProsperRating (numeric)      84853 non-null float64
Occupation                   110349 non-null object
EmploymentStatus             111682 non-null object
EmploymentStatusDuration     106312 non-null float64
IsBorrowerHomeowner          113937 non-null bool
IncomeVerifiable             113937 non-null bool
StatedMonthlyIncome          113937 non-null float64
MonthlyLoanPayment           113937 non-null float64
Recommendations              113937 non-null int64
DebtToIncomeRatio            105383 non-null float64
LoanOriginalAmount           113937 non-null int64
PercentFunded                113937 non-null float64
IncomeRange                  113937 non-null object
Investors                    113937 non-null int64
BorrowerState                108422 non-null object
dtypes: bool(2), float64(8), int64(6), object(6)
memory usage: 17.6+ MB
Quality Issues
1. Incorrect data types for LoanStatus, ProsperRating (Alpha), IncomeRange, IsBorrowerHomeowner,
ProsperRating (numeric), EmploymentStatus, IncomeVerifiable and Term.
2. Many null values.
3. 0.76% of the data is duplicated.
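To put a number on the null-value issue, the share of missing values per column can be ranked. This is only a sketch on made-up data (the `demo` frame and its values are hypothetical); the same expression could be applied to `loan`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values
demo = pd.DataFrame({
    "BorrowerAPR": [0.12, np.nan, 0.30, 0.25],
    "Occupation": ["Other", "Professional", None, None],
    "Term": [36, 36, 60, 36],
})

# isnull().mean() gives the fraction of missing values per column;
# multiply by 100 and sort so the worst columns come first
null_pct = demo.isnull().mean().mul(100).sort_values(ascending=False)
print(null_pct)
```

Ranking columns this way makes it easier to decide which ones to drop entirely and which ones only need a few rows removed.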
Cleaning the Data
Note: Before going any further, we should always keep a copy of our data for safety.
In [12]: cleanloan=loan.copy()
Define
Drop duplicates from the DataFrame.
Code
In [13]: cleanloan.drop_duplicates(inplace=True)
Test
In [14]: cleanloan.info()
Int64Index: 113066 entries, 0 to 113936
Data columns (total 22 columns):
ListingNumber                113066 non-null int64
ListingCategory (numeric)    113066 non-null int64
Term                         113066 non-null int64
LoanStatus                   113066 non-null object
BorrowerAPR                  113041 non-null float64
BorrowerRate                 113066 non-null float64
ProsperRating (Alpha)        83982 non-null object
ProsperRating (numeric)      83982 non-null float64
Occupation                   109537 non-null object
EmploymentStatus             110811 non-null object
EmploymentStatusDuration     105441 non-null float64
IsBorrowerHomeowner          113066 non-null bool
IncomeVerifiable             113066 non-null bool
StatedMonthlyIncome          113066 non-null float64
MonthlyLoanPayment           113066 non-null float64
Recommendations              113066 non-null int64
DebtToIncomeRatio            104594 non-null float64
LoanOriginalAmount           113066 non-null int64
PercentFunded                113066 non-null float64
IncomeRange                  113066 non-null object
Investors                    113066 non-null int64
BorrowerState                107551 non-null object
dtypes: bool(2), float64(8), int64(6), object(6)
memory usage: 18.3+ MB
Define
Drop rows with null values.
Code
In [15]: cleanloan.dropna(inplace=True)
Test
In [16]: cleanloan.info()
Int64Index: 75486 entries, 1 to 113936
Data columns (total 22 columns):
ListingNumber                75486 non-null int64
ListingCategory (numeric)    75486 non-null int64
Term                         75486 non-null int64
LoanStatus                   75486 non-null object
BorrowerAPR                  75486 non-null float64
BorrowerRate                 75486 non-null float64
ProsperRating (Alpha)        75486 non-null object
ProsperRating (numeric)      75486 non-null float64
Occupation                   75486 non-null object
EmploymentStatus             75486 non-null object
EmploymentStatusDuration     75486 non-null float64
IsBorrowerHomeowner          75486 non-null bool
IncomeVerifiable             75486 non-null bool
StatedMonthlyIncome          75486 non-null float64
MonthlyLoanPayment           75486 non-null float64
Recommendations              75486 non-null int64
DebtToIncomeRatio            75486 non-null float64
LoanOriginalAmount           75486 non-null int64
PercentFunded                75486 non-null float64
IncomeRange                  75486 non-null object
Investors                    75486 non-null int64
BorrowerState                75486 non-null object
dtypes: bool(2), float64(8), int64(6), object(6)
memory usage: 12.2+ MB
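Dropping every row that has a null in any column removes roughly a third of the rows here (113066 down to 75486). An alternative worth considering is to drop only on the columns a given analysis needs. The sketch below uses a made-up frame (hypothetical values) to contrast the two calls; it is an illustration, not the approach taken in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: each of rows 1-3 is missing exactly one value
demo = pd.DataFrame({
    "BorrowerAPR": [0.12, 0.20, np.nan, 0.25],
    "ProsperRating (Alpha)": ["A", None, "C", "D"],
    "BorrowerState": ["CA", "NY", "TX", None],
})

# Drop a row if ANY column is null (what dropna() does by default)
all_dropped = demo.dropna()

# Drop a row only if one of the listed columns is null
subset_dropped = demo.dropna(subset=["BorrowerAPR", "ProsperRating (Alpha)"])

print(len(all_dropped), len(subset_dropped))
```

With `subset`, the row that is only missing `BorrowerState` survives, which keeps more data for plots that never use the state.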
Define
Change data types.
Code
In [17]: cleanloan['ProsperRating (numeric)'] = cleanloan['ProsperRating (numeric)'].astype('category')
cleanloan.IncomeRange = cleanloan.IncomeRange.astype('category')
cleanloan.IsBorrowerHomeowner = cleanloan.IsBorrowerHomeowner.astype('bool')
Test
In [18]: cleanloan.info()
Int64Index: 75486 entries, 1 to 113936
Data columns (total 22 columns):
ListingNumber                75486 non-null int64
ListingCategory (numeric)    75486 non-null int64
Term                         75486 non-null int64
LoanStatus                   75486 non-null object
BorrowerAPR                  75486 non-null float64
BorrowerRate                 75486 non-null float64
ProsperRating (Alpha)        75486 non-null object
ProsperRating (numeric)      75486 non-null category
Occupation                   75486 non-null object
EmploymentStatus             75486 non-null object
EmploymentStatusDuration     75486 non-null float64
IsBorrowerHomeowner          75486 non-null bool
IncomeVerifiable             75486 non-null bool
StatedMonthlyIncome          75486 non-null float64
MonthlyLoanPayment           75486 non-null float64
Recommendations              75486 non-null int64
DebtToIncomeRatio            75486 non-null float64
LoanOriginalAmount           75486 non-null int64
PercentFunded                75486 non-null float64
IncomeRange                  75486 non-null category
Investors                    75486 non-null int64
BorrowerState                75486 non-null object
dtypes: bool(2), category(2), float64(7), int64(6), object(5)
memory usage: 11.2+ MB
In [19]: cleanloan = cleanloan.drop('Recommendations', axis=1)
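The category conversion in In [17] yields unordered categories, so plots fall back on whatever order the labels happen to appear in. A possible refinement, sketched here on made-up rows, is an ordered categorical. The `income_order` list is an assumption: it uses the income labels as they appear in the cleaned data's value counts and should be adjusted if other labels are present.

```python
import pandas as pd

# Assumed label order, lowest to highest (labels as they appear in the
# cleaned data's IncomeRange value counts)
income_order = ["Not employed", "$1-24,999", "$25,000-49,999",
                "$50,000-74,999", "$75,000-99,999", "$100,000+"]

# Hypothetical toy rows
demo = pd.DataFrame({"IncomeRange": ["$100,000+", "$1-24,999", "$50,000-74,999"]})

# An ordered CategoricalDtype makes sorting and plotting follow income_order
income_type = pd.CategoricalDtype(categories=income_order, ordered=True)
demo["IncomeRange"] = demo["IncomeRange"].astype(income_type)

sorted_ranges = demo["IncomeRange"].sort_values().tolist()
print(sorted_ranges)
```

With an ordered type, seaborn and pandas should place the income ranges from lowest to highest on an axis instead of in arbitrary order.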
What is the structure of your dataset?
After cleaning, my dataset is reduced to 75486 rows and 21 columns (22 selected columns minus the dropped Recommendations column).
What is/are the main feature(s) of interest in your dataset?
I want to find out which borrowers are able to repay their loans.
What features in the dataset do you think will help support your investigation into
your feature(s) of interest?
I think features such as IsBorrowerHomeowner, Occupation, EmploymentStatus, EmploymentStatusDuration
and IncomeRange will help us predict the outcome.
Univariate Exploration
In this section, let's investigate individual variables to draw out some insights.
In [20]: plt.figure(figsize=[14,7])
countstatus = cleanloan.LoanStatus.value_counts().index
sb.countplot(data=cleanloan, y='LoanStatus', order=countstatus, color='c');
plt.xlabel('Frequency', fontsize=14);
plt.ylabel('Loan Status', fontsize=14)
plt.title('Loan Status', fontsize=18)
plt.grid();
[Figure: horizontal bar chart "Loan Status"; x-axis: Frequency; y-axis: Loan Status]
We can see that most borrowers have a loan status of Current.
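A statement like this is easier to check with proportions than with raw bar lengths. A minimal sketch, using a made-up status column (the values are hypothetical); on the real data the same call would be `cleanloan.LoanStatus.value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical toy status column
status = pd.Series(
    ["Current", "Current", "Completed", "Chargedoff", "Current"],
    name="LoanStatus",
)

# normalize=True returns shares instead of counts, sorted from most to least common
shares = status.value_counts(normalize=True)
print(shares)
```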
In [21]: plt.figure(figsize=[13,9])
sorted_counts = cleanloan['EmploymentStatus'].value_counts().index
sb.countplot(data=cleanloan, y='EmploymentStatus', color='c', order=sorted_counts)
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('Employment Status', fontsize=14)
plt.title('No of people having employment status', fontsize=18);
plt.grid();
[Figure: horizontal bar chart "No of people having employment status"; x-axis: Frequency; y-axis: Employment Status]
Clearly, most borrowers are employed, and only a very small fraction are not
employed.
In [22]: plt.figure(figsize=[13,9])
sorted_occu = cleanloan['Occupation'].value_counts().head(6).index
sb.countplot(data=cleanloan, y='Occupation', color='c', order=sorted_occu)
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('Career', fontsize=14)
plt.title('Career Count', fontsize=18);
plt.grid();
[Figure: horizontal bar chart "Career Count"; x-axis: Frequency; y-axis: Career]
Most borrowers list their occupation as Other.
In [23]: plt.figure(figsize=[14,7])
sorted_state = cleanloan['BorrowerState'].value_counts().head(6).index
sb.countplot(data=cleanloan, y='BorrowerState', color='c', order=sorted_state);
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('States', fontsize=14)
plt.title('States with most Loans', fontsize=18);
plt.grid();
[Figure: horizontal bar chart "States with most Loans"; x-axis: Frequency; y-axis: States]
Most borrowers are from California, the state with the largest economy in the US, followed by
New York, another of the largest state economies.
In [24]: plt.figure(figsize=[14,7])
sorted_countshome = cleanloan['IsBorrowerHomeowner'].value_counts()
plt.pie(sorted_countshome, labels=sorted_countshome.index, startangle=90,
counterclock=False, autopct='%1.0f%%');
[Figure: pie chart of IsBorrowerHomeowner with slices False and True]
54% of borrowers own their home.
Let's focus on borrowers with a stated monthly income below
30,000 and see which factors influence their ratings
In [25]: poor = cleanloan.query('StatedMonthlyIncome < 30000')
In [26]: k = poor.MonthlyLoanPayment.max()
In [27]: plt.figure(figsize=[14,7])
bins = np.arange(0, k+28000, 1000)
plt.hist(data=cleanloan, x='StatedMonthlyIncome', bins=bins, color='c');
plt.xlabel('Monthly Income', fontsize=14);
plt.ylabel('Frequency', fontsize=14);
plt.grid(color='black', alpha=0.6)
plt.title("Monthly Income of Borrowers", fontsize=24);
[Figure: histogram "Monthly Income of Borrowers"; x-axis: Monthly Income; y-axis: Frequency]
In [28]: poor.StatedMonthlyIncome.describe()
Out[28]: count    75283.000000
mean      5882.263464
std       3446.721934
min          2.250000
25%       3583.333333
50%       5000.000000
75%       7166.666667
max      29833.333333
Name: StatedMonthlyIncome, dtype: float64
The mean stated monthly income of these borrowers is about $5,882 and the median is $5,000.
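The mean sits above the median in the summary above, which is what a right-skewed income distribution looks like. A small sketch with made-up incomes (hypothetical values) shows how a single large value pulls the mean up while the median stays put, which is why the median is often the safer "typical" figure for income.

```python
import pandas as pd

# Hypothetical monthly incomes with one large value in the right tail
income = pd.Series([2000, 3000, 4000, 5000, 26000])

mean_income = income.mean()      # pulled upward by the 26000 entry
median_income = income.median()  # middle value, unaffected by the tail

print(mean_income, median_income)
```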
In [29]: plt.figure(figsize=[13,9])
sorted_occu = poor['Occupation'].value_counts().head(6).index
sb.countplot(data=poor, y='Occupation', color='c', order=sorted_occu)
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('Career', fontsize=14)
plt.title('Career Count', fontsize=18);
plt.grid();
[Figure: horizontal bar chart "Career Count"; x-axis: Frequency; y-axis: Career]
Even for this group, Other and Professional are the top occupations.
In [30]: plt.figure(figsize=[14,7])
plt.hist(data=poor, x='EmploymentStatusDuration', color='c');
plt.xlabel('No of days', fontsize=14);
plt.ylabel('Frequency', fontsize=14);
plt.grid(color='black', alpha=0.6)
plt.title("Days since borrowers joined their Job", fontsize=24);
[Figure: histogram "Days since borrowers joined their Job"; x-axis: No of days; y-axis: Frequency]
We can see that most borrowers have been in their job for under 100 days.
In [31]: plt.figure(figsize=[15,7])
plt.subplot(1, 2, 1)
ok = poor.LoanOriginalAmount.max()
bins = np.arange(0, ok+1000, 1000)
plt.hist(data=poor, x='LoanOriginalAmount', bins=bins, color='c')
plt.xlabel('Loan Amount', fontsize=15)
plt.ylabel('Frequency', fontsize=15);
plt.title("Loan of Poor Borrowers", fontsize=15);
plt.grid()
plt.subplot(1, 2, 2)
bins = np.arange(0, k+28000, 1000)
plt.hist(data=cleanloan, x='StatedMonthlyIncome', bins=bins, color='c');
plt.xlabel('Monthly Income', fontsize=15);
plt.ylabel('Frequency', fontsize=15);
plt.title("Monthly Income of Poor Borrowers", fontsize=15);
plt.grid()
[Figure: two histograms side by side. Left: "Loan of Poor Borrowers" (x-axis: Loan Amount). Right: "Monthly Income of Poor Borrowers" (x-axis: Monthly Income)]
We can see that most borrowers take loans of around 5,000, which is close to the average monthly
income. One thing to notice is that although few borrowers have a monthly income of 15,000, many still
take loans of that size.
In [32]: poor['ListingCategory (numeric)'].value_counts()
Out[32]:
1     47918
7      8228
2      6249
3      3607
6      2027
13     1758
15     1368
14      783
18      778
20      718
19      706
16      288
5       201
11      198
8       187
9        83
10       82
17       49
12       44
0        19
Name: ListingCategory (numeric), dtype: int64
Insights
Most of these borrowers take loans for Debt Consolidation, Home Improvement,
Business and Household Expenses.
The state with the most loans is California, the economic hub of the US, followed by New
York.
Most loans go to borrowers who joined their job recently, within the last 100 days.
46% of the borrowers do not own their home.
There is a strange pattern to notice: the average borrower has a monthly income of about
5,000 dollars, yet there is a high spike of loans of 25,000.
The top occupations are Other and Professional.
Most of the loans are currently active (status Current).
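Reading loan purposes off numeric codes is error-prone, so it can help to map the codes to labels before counting. The sketch below is an illustration only: the `category_labels` dictionary is an assumption based on the Prosper variable definitions (only a few codes are listed) and should be checked against the data dictionary, and the `codes` series is made up.

```python
import pandas as pd

# ASSUMED code-to-label mapping (partial); verify against the Prosper
# variable definitions before relying on it
category_labels = {
    1: "Debt Consolidation",
    2: "Home Improvement",
    3: "Business",
    6: "Auto",
    7: "Other",
    13: "Household Expenses",
}

# Hypothetical codes; 99 stands for a code missing from the mapping
codes = pd.Series([1, 1, 7, 2, 1, 13, 99], name="ListingCategory (numeric)")

# map() leaves unmapped codes as NaN, so label them explicitly
labels = codes.map(category_labels).fillna("Unmapped")
label_counts = labels.value_counts()
print(label_counts)
```

Keeping an explicit "Unmapped" bucket makes it obvious when a code is missing from the dictionary instead of silently dropping it.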
Bivariate Exploration
In this section, we investigate relationships between pairs of variables in the data.
In [33]: poor.IncomeRange.value_counts()
Out[33]: $50,000-74,999    23432
$25,000-49,999    21219
$100,000+         13646
$75,000-99,999    13427
$1-24,999          3558
Not employed          1
Name: IncomeRange, dtype: int64
In [34]: plt.figure(figsize=[12,8])
sb.boxplot(data=poor, x='IncomeRange', y='LoanOriginalAmount', color='c')
plt.xticks(rotation=90);
plt.ylabel('Loan Amount', fontsize=14)
plt.xlabel('Income range', fontsize=14)
plt.title('Income Range vs Loan Amount', fontsize=20);
[Figure: box plot "Income Range vs Loan Amount"; x-axis: Income range; y-axis: Loan Amount]
We can clearly see that borrowers in the $100,000+ income range take the largest loans.
In [35]: plt.figure(figsize=[14,6])
sb.countplot(data=poor, x='EmploymentStatus', hue='IncomeRange', palette='BuGn_r')
plt.legend(loc=1, ncol=1, framealpha=1, title='Income range')
plt.xticks(rotation=90)
plt.grid()
plt.title('Employment Status vs Income range');
[Figure: grouped bar chart "Employment Status vs Income range"; x-axis: EmploymentStatus; legend: Income range]
Among employed borrowers, the $50,000-74,999 income range is the most common.
In [36]: plt.figure(figsize=[14,6])
sb.countplot(data=poor, x='Term', hue='IncomeRange', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Term v/s Income Range');
[Figure: grouped bar chart "Term v/s Income Range"; x-axis: Term; legend: IncomeRange]
Most loans have a term of 36 months, and within it the $50,000-74,999 income range is the most common.
In [37]: plt.figure(figsize=[14,6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='IncomeRange', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Prosper Rating vs Income range');
[Figure: grouped bar chart "Prosper Rating vs Income range"; x-axis: ProsperRating (Alpha); legend: IncomeRange]
The AA rating has the fewest borrowers.
In [38]: plt.figure(figsize=[14,6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='EmploymentStatus', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('ProsperRating Vs EmploymentStatus');
[Figure: grouped bar chart "ProsperRating Vs EmploymentStatus"; x-axis: ProsperRating (Alpha); legend: EmploymentStatus]
In [39]: plt.figure(figsize=[14,6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='IncomeVerifiable', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('ProsperRating vs IncomeVerifiable');
[Figure: grouped bar chart "ProsperRating vs IncomeVerifiable"; x-axis: ProsperRating (Alpha); legend: IncomeVerifiable]
In [40]: plt.figure(figsize=[14,6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='IsBorrowerHomeowner', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Prosper Ratings V/s Homeowner');
[Figure: grouped bar chart "Prosper Ratings V/s Homeowner"; x-axis: ProsperRating (Alpha); legend: IsBorrowerHomeowner]
The majority of borrowers rated D and E do not own a home.
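Because the ratings have very different numbers of borrowers, raw counts can hide the share of homeowners within each rating. A row-normalized cross-tabulation makes that comparison direct. This is a sketch on made-up rows (hypothetical values); the `Homeowner` column is a helper introduced here for readable labels.

```python
import pandas as pd

# Hypothetical toy rows
demo = pd.DataFrame({
    "ProsperRating (Alpha)": ["AA", "AA", "D", "D", "D", "D"],
    "IsBorrowerHomeowner": [True, False, False, False, False, True],
})

# Helper column with readable labels instead of True/False
demo["Homeowner"] = demo["IsBorrowerHomeowner"].map({True: "Owner", False: "Non-owner"})

# normalize="index" turns each row into shares that sum to 1
share = pd.crosstab(demo["ProsperRating (Alpha)"], demo["Homeowner"], normalize="index")
print(share)
```

Each row then answers "within this rating, what fraction own a home", independent of how many borrowers the rating has.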
Insights
1. LoanOriginalAmount is highest for the A and B Prosper ratings when compared across income
ranges.
2. The majority are from the $100,000+ range.
3. The AA rating has more borrowers who do not own a home than HR does, which is very strange.
4. The majority of the borrowers are employed.
Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in the
dataset?
I saw that the highest loan amounts are mostly taken by Employed borrowers, followed by Other and
Full-time, and that employment status largely decides a borrower's rating.
Did you observe any interesting relationships between the other features (not the
main feature(s) of interest)?
Even borrowers with $100,000+ income who are employed and own a home appear in the HR rating.
Multivariate Exploration
Does Loan Status Depend on Income?
In [41]: plt.figure(figsize=[18, 7])
ax = sb.barplot(data=poor, x='IncomeRange', y='LoanOriginalAmount', hue='LoanStatus')
ax.legend(loc=1, ncol=1, framealpha=1, title='Loan Status')
plt.title('Applicants - Loan Amount across Prosper Rating and Income Range');
[Figure: grouped bar chart "Applicants - Loan Amount across Prosper Rating and Income Range"; x-axis: IncomeRange; y-axis: LoanOriginalAmount; legend: Loan Status]
Yes, higher income ranges clearly have more borrowers in Current status.
Does Salary Decide the Ratings?
In [42]: plt.figure(figsize=[18, 7])
ax = sb.barplot(data=poor, x='EmploymentStatus', y='StatedMonthlyIncome',
hue='ProsperRating (Alpha)')
ax.legend(loc=1, ncol=1, framealpha=1, title='Prosper rating')
plt.title('Applicants - Employment across Prosper Rating and Income Range');
[Figure: grouped bar chart "Applicants - Employment across Prosper Rating and Income Range"; x-axis: EmploymentStatus; y-axis: StatedMonthlyIncome; legend: Prosper rating]
Yes, but only for the Employed, Other and Full-time statuses, not for the rest.
Does APR Vary with Home Ownership and Prosper Rating?
In [43]: plt.figure(figsize=[15, 7])
ax = sb.pointplot(data=poor, x='ProsperRating (Alpha)', y='BorrowerAPR',
hue='IsBorrowerHomeowner',
dodge=0.3, linestyles="");
plt.title('Applicants - Home owner status across ProsperRating and BorrowerRate');
[Figure: point plot "Applicants - Home owner status across ProsperRating and BorrowerRate"; x-axis: ProsperRating (Alpha); y-axis: BorrowerAPR; legend: IsBorrowerHomeowner]
Yes, AA has the lowest APR while HR has the highest. One thing to notice is that owning a home does not
affect the rate much.
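The two effects read off the point plot can also be put into numbers with a pivot table of mean APR. This is a sketch on made-up values (all numbers are hypothetical and chosen only to mimic the pattern described); the `Homeowner` column is a helper introduced here.

```python
import pandas as pd

# Hypothetical toy rows mimicking the described pattern
demo = pd.DataFrame({
    "ProsperRating (Alpha)": ["AA", "AA", "HR", "HR"],
    "Homeowner": ["Owner", "Non-owner", "Owner", "Non-owner"],
    "BorrowerAPR": [0.08, 0.10, 0.34, 0.36],
})

# Mean APR for each rating / homeowner combination
apr_table = demo.pivot_table(index="ProsperRating (Alpha)", columns="Homeowner",
                             values="BorrowerAPR", aggfunc="mean")

# Gap between ratings (averaged over homeowner status)
rating_gap = apr_table.mean(axis=1)["HR"] - apr_table.mean(axis=1)["AA"]

# Gap between non-owners and owners (averaged over ratings)
owner_gap = (apr_table["Non-owner"] - apr_table["Owner"]).mean()

print(apr_table)
print(rating_gap, owner_gap)
```

A large rating gap next to a small owner gap is the numeric counterpart of the observation that rating, not home ownership, drives the APR.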
Conclusion
1. Most of the poor borrowers fall in the Prosper rating B, irrespective of income range.
2. The monthly income of borrowers is higher for the Employed, Other and Full-time employment
statuses with Prosper ratings of AA, A and B.
3. Owning a home does not affect the interest rate.
To conclude the analysis, I would say that the loan approval status
depends on income, homeowner status and employment status.