Project 5

Udacity Data Analyst project 5

Uploaded by

Adarsh Kumar
Copyright
© All Rights Reserved
Project 7
Communicate Data Findings

In this project I am going to use the Loan Data from Prosper dataset (https://s3.amazonaws.com/udacity-hosted-downloads/ud651/prosperLoanData.csv) and then perform exploratory and explanatory data analysis. I am also going to write down the steps I followed.

Let's import the libraries.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
%matplotlib inline

Now let's read the downloaded dataset into a pandas DataFrame.

In [2]:
loan = pd.read_csv('prosperLoanData.csv')

Now let's assess our data, visually and programmatically.

1. Let's take a random sample of 5 rows to see the structure of the data.

In [3]:
loan.sample(5)

Out[3]:
       ListingKey               ListingNumber  ListingCreationDate            CreditGrade  Term  ...
63488  62928410015425210489022  267360         2008-01-17 19:47:34.063000000  A            36    ...
83877  1FF63589331918539051225  898719         2013-09-11 07:18:29.247000000  NaN          60    ...
75069  3A6F3550979242211184848  608541         2012-07-07 16:16:03.303000000  NaN          36    ...
98194  251F35313326281524DA90D  598311         2011-11-09 18:25:48.973000000  NaN          36    ...
57366  F4933604395712050327160  1212506        2014-02-27 18:10:49.983000000  NaN          36    ...

5 rows x 81 columns

At first glance we can see that there are 81 columns in total, and that the CreditGrade column clearly has null values.
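Because sample() draws different rows on every run, the five rows shown above will not come back on a rerun. A minimal sketch of a reproducible sample follows. It uses a small made-up frame instead of prosperLoanData.csv, and both the toy values and the seed 42 are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-in for the Prosper data (values are illustrative only).
toy = pd.DataFrame({
    "ListingNumber": [101, 102, 103, 104, 105, 106],
    "CreditGrade": ["A", np.nan, np.nan, "C", np.nan, np.nan],
    "Term": [36, 60, 36, 36, 12, 36],
})

# Fixing random_state makes the sample identical on every run.
first = toy.sample(3, random_state=42)
second = toy.sample(3, random_state=42)
same_rows = first.equals(second)

# Share of missing CreditGrade values in the whole frame.
missing_share = toy["CreditGrade"].isnull().mean()

print(first)
print(f"Same rows on both draws: {same_rows}")
print(f"CreditGrade missing: {missing_share:.0%}")
```

On the real data the same idea would be loan.sample(5, random_state=42), which keeps the notebook output stable between runs.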
Let's check how many rows and columns this dataset has, and the data type of each column, using the .info() method.

In [4]:
loan.info()

RangeIndex: 113937 entries, 0 to 113936
Data columns (total 81 columns):
ListingKey                             113937 non-null object
ListingNumber                          113937 non-null int64
ListingCreationDate                    113937 non-null object
CreditGrade                            28953 non-null object
Term                                   113937 non-null int64
LoanStatus                             113937 non-null object
ClosedDate                             55089 non-null object
BorrowerAPR                            113912 non-null float64
BorrowerRate                           113937 non-null float64
LenderYield                            113937 non-null float64
EstimatedEffectiveYield                84853 non-null float64
EstimatedLoss                          84853 non-null float64
EstimatedReturn                        84853 non-null float64
ProsperRating (numeric)                84853 non-null float64
ProsperRating (Alpha)                  84853 non-null object
ProsperScore                           84853 non-null float64
ListingCategory (numeric)              113937 non-null int64
BorrowerState                          108422 non-null object
Occupation                             110349 non-null object
EmploymentStatus                       111682 non-null object
EmploymentStatusDuration               106312 non-null float64
IsBorrowerHomeowner                    113937 non-null bool
CurrentlyInGroup                       113937 non-null bool
GroupKey                               13341 non-null object
DateCreditPulled                       113937 non-null object
CreditScoreRangeLower                  113346 non-null float64
CreditScoreRangeUpper                  113346 non-null float64
FirstRecordedCreditLine                113240 non-null object
CurrentCreditLines                     106333 non-null float64
OpenCreditLines                        106333 non-null float64
TotalCreditLinespast7years             113240 non-null float64
OpenRevolvingAccounts                  113937 non-null int64
OpenRevolvingMonthlyPayment            113937 non-null float64
InquiriesLast6Months                   113240 non-null float64
TotalInquiries                         112778 non-null float64
CurrentDelinquencies                   113240 non-null float64
AmountDelinquent                       106315 non-null float64
DelinquenciesLast7Years                112947 non-null float64
PublicRecordsLast10Years               113240 non-null float64
PublicRecordsLast12Months              106333 non-null float64
RevolvingCreditBalance                 106333 non-null float64
BankcardUtilization                    106333 non-null float64
AvailableBankcardCredit                106393 non-null float64
TotalTrades                            106393 non-null float64
TradesNeverDelinquent (percentage)     106393 non-null float64
TradesOpenedLast6Months                106393 non-null float64
DebtToIncomeRatio                      105383 non-null float64
IncomeRange                            113937 non-null object
IncomeVerifiable                       113937 non-null bool
StatedMonthlyIncome                    113937 non-null float64
LoanKey                                113937 non-null object
TotalProsperLoans                      22085 non-null float64
TotalProsperPaymentsBilled             22085 non-null float64
OnTimeProsperPayments                  22085 non-null float64
ProsperPaymentsLessThanOneMonthLate    22085 non-null float64
ProsperPaymentsOneMonthPlusLate        22085 non-null float64
ProsperPrincipalBorrowed               22085 non-null float64
ProsperPrincipalOutstanding            22085 non-null float64
ScorexChangeAtTimeOfListing            18928 non-null float64
LoanCurrentDaysDelinquent              113937 non-null int64
LoanFirstDefaultedCycleNumber          16952 non-null float64
LoanMonthsSinceOrigination             113937 non-null int64
LoanNumber                             113937 non-null int64
LoanOriginalAmount                     113937 non-null int64
LoanOriginationDate                    113937 non-null object
LoanOriginationQuarter                 113937 non-null object
MemberKey                              113937 non-null object
MonthlyLoanPayment                     113937 non-null float64
LP_CustomerPayments                    113937 non-null float64
LP_CustomerPrincipalPayments           113937 non-null float64
LP_InterestandFees                     113937 non-null float64
LP_ServiceFees                         113937 non-null float64
LP_CollectionFees                      113937 non-null float64
LP_GrossPrincipalLoss                  113937 non-null float64
LP_NetPrincipalLoss                    113937 non-null float64
LP_NonPrincipalRecoverypayments        113937 non-null float64
PercentFunded                          113937 non-null float64
Recommendations                        113937 non-null int64
InvestmentFromFriendsCount             113937 non-null int64
InvestmentFromFriendsAmount            113937 non-null float64
Investors                              113937 non-null int64
dtypes: bool(3), float64(50), int64(11), object(17)
memory usage: 68.1+ MB

In [5]:
loan.isnull().sum().head(60)

Out[5]:
ListingKey                                  0
ListingNumber                               0
ListingCreationDate                         0
CreditGrade                             84984
Term                                        0
LoanStatus                                  0
ClosedDate                              58848
BorrowerAPR                                25
BorrowerRate                                0
LenderYield                                 0
EstimatedEffectiveYield                 29084
EstimatedLoss                           29084
EstimatedReturn                         29084
ProsperRating (numeric)                 29084
ProsperRating (Alpha)                   29084
ProsperScore                            29084
ListingCategory (numeric)                   0
BorrowerState                            5515
Occupation                               3588
EmploymentStatus                         2255
EmploymentStatusDuration                 7625
IsBorrowerHomeowner                         0
CurrentlyInGroup                            0
GroupKey                               100596
DateCreditPulled                            0
CreditScoreRangeLower                     591
CreditScoreRangeUpper                     591
FirstRecordedCreditLine                   697
CurrentCreditLines                       7604
OpenCreditLines                          7604
TotalCreditLinespast7years                697
OpenRevolvingAccounts                       0
OpenRevolvingMonthlyPayment                 0
InquiriesLast6Months                      697
TotalInquiries                           1159
CurrentDelinquencies                      697
AmountDelinquent                         7622
DelinquenciesLast7Years                   990
PublicRecordsLast10Years                  697
PublicRecordsLast12Months                7604
RevolvingCreditBalance                   7604
BankcardUtilization                      7604
AvailableBankcardCredit                  7544
TotalTrades                              7544
TradesNeverDelinquent (percentage)       7544
TradesOpenedLast6Months                  7544
DebtToIncomeRatio                        8554
IncomeRange                                 0
IncomeVerifiable                            0
StatedMonthlyIncome                         0
LoanKey                                     0
TotalProsperLoans                       91852
TotalProsperPaymentsBilled              91852
OnTimeProsperPayments                   91852
ProsperPaymentsLessThanOneMonthLate     91852
ProsperPaymentsOneMonthPlusLate         91852
ProsperPrincipalBorrowed                91852
ProsperPrincipalOutstanding             91852
ScorexChangeAtTimeOfListing             95009
LoanCurrentDaysDelinquent                   0
dtype: int64

Now we can clearly see that many columns contain null values.
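Raw null counts are easier to compare as shares of the row count. The sketch below ranks columns by missing share and keeps those under a cut-off. The toy frame and the 30% threshold are assumptions for illustration, not values used in this project.

```python
import numpy as np
import pandas as pd

# Made-up stand-in frame; the column names mirror the Prosper data, the values do not.
toy = pd.DataFrame({
    "LoanStatus": ["Current", "Completed", "Current", "Defaulted"],
    "BorrowerAPR": [0.12, 0.25, np.nan, 0.31],
    "CreditGrade": [np.nan, np.nan, np.nan, "C"],
    "GroupKey": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column, largest first.
null_share = toy.isnull().mean().sort_values(ascending=False)

# Keep columns whose missing share is at or below a hypothetical 30% cut-off.
max_null_share = 0.30
usable = null_share[null_share <= max_null_share].index.tolist()

print(null_share)
print(usable)
```

Applied to the real frame, loan.isnull().mean() would show, for example, that GroupKey is missing in roughly 88% of rows (100596 of 113937).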
So I am going to choose the columns that are relevant and do not have too many null values. To do so, we first store a list of the columns we are going to use, such as BorrowerAPR (the annual percentage rate), LoanStatus (the status of the loan), and more.

In [6]:
columns = [
    'ListingNumber',
    'ListingCategory (numeric)',
    'Term',
    'LoanStatus',
    'BorrowerAPR',
    'BorrowerRate',
    'ProsperRating (Alpha)',
    'ProsperRating (numeric)',
    'Occupation',
    'EmploymentStatus',
    'EmploymentStatusDuration',
    'IsBorrowerHomeowner',
    'IncomeVerifiable',
    'StatedMonthlyIncome',
    'MonthlyLoanPayment',
    'Recommendations',
    'DebtToIncomeRatio',
    'LoanOriginalAmount',
    'PercentFunded',
    'IncomeRange',
    'Investors',
    'BorrowerState'
]

In [7]:
loan = loan[columns]

Now let's check for duplicate data.

In [8]:
row = loan.shape[0]

In [9]:
nrow = loan.drop_duplicates().shape[0]

In [10]:
(row - nrow) / row * 100

Out[10]: 0.7644575511028024

Now let's check the data types.

In [11]:
loan.info()

RangeIndex: 113937 entries, 0 to 113936
Data columns (total 22 columns):
ListingNumber                113937 non-null int64
ListingCategory (numeric)    113937 non-null int64
Term                         113937 non-null int64
LoanStatus                   113937 non-null object
BorrowerAPR                  113912 non-null float64
BorrowerRate                 113937 non-null float64
ProsperRating (Alpha)        84853 non-null object
ProsperRating (numeric)      84853 non-null float64
Occupation                   110349 non-null object
EmploymentStatus             111682 non-null object
EmploymentStatusDuration     106312 non-null float64
IsBorrowerHomeowner          113937 non-null bool
IncomeVerifiable             113937 non-null bool
StatedMonthlyIncome          113937 non-null float64
MonthlyLoanPayment           113937 non-null float64
Recommendations              113937 non-null int64
DebtToIncomeRatio            105383 non-null float64
LoanOriginalAmount           113937 non-null int64
PercentFunded                113937 non-null float64
IncomeRange                  113937 non-null object
Investors                    113937 non-null int64
BorrowerState                108422 non-null object
dtypes: bool(2), float64(8), int64(6), object(6)
memory usage: 17.6+ MB

Quality Issues

1. Incorrect data type for LoanStatus, ProsperRating (Alpha), IncomeRange, IsBorrowerHomeowner, ProsperRating (numeric), EmploymentStatus, IncomeVerifiable, and Term.
2. Lots of null values.
3. 0.76% of the data is duplicated.

Cleaning the Data

Note: Before going ahead with further steps, we should always keep a copy of our data for safety.

In [12]:
cleanloan = loan.copy()

Define

Drop duplicates from the DataFrame.

Code

In [13]:
cleanloan.drop_duplicates(inplace=True)

Test

In [14]:
cleanloan.info()

Int64Index: 113066 entries, 0 to 113936
Data columns (total 22 columns):
ListingNumber                113066 non-null int64
ListingCategory (numeric)    113066 non-null int64
Term                         113066 non-null int64
LoanStatus                   113066 non-null object
BorrowerAPR                  113041 non-null float64
BorrowerRate                 113066 non-null float64
ProsperRating (Alpha)        83982 non-null object
ProsperRating (numeric)      83982 non-null float64
Occupation                   109537 non-null object
EmploymentStatus             110811 non-null object
EmploymentStatusDuration     105441 non-null float64
IsBorrowerHomeowner          113066 non-null bool
IncomeVerifiable             113066 non-null bool
StatedMonthlyIncome          113066 non-null float64
MonthlyLoanPayment           113066 non-null float64
Recommendations              113066 non-null int64
DebtToIncomeRatio            104594 non-null float64
LoanOriginalAmount           113066 non-null int64
PercentFunded                113066 non-null float64
IncomeRange                  113066 non-null object
Investors                    113066 non-null int64
BorrowerState                107551 non-null object
dtypes: bool(2), float64(8), int64(6), object(6)
memory usage: 18.3+ MB

Define

Drop null values.

Code

In [15]:
cleanloan.dropna(inplace=True)

Test

In [16]:
cleanloan.info()

Int64Index: 75486 entries, 1 to 113936
Data columns (total 22 columns):
ListingNumber                75486 non-null int64
ListingCategory (numeric)    75486 non-null int64
Term                         75486 non-null int64
LoanStatus                   75486 non-null object
BorrowerAPR                  75486 non-null float64
BorrowerRate                 75486 non-null float64
ProsperRating (Alpha)        75486 non-null object
ProsperRating (numeric)      75486 non-null float64
Occupation                   75486 non-null object
EmploymentStatus             75486 non-null object
EmploymentStatusDuration     75486 non-null float64
IsBorrowerHomeowner          75486 non-null bool
IncomeVerifiable             75486 non-null bool
StatedMonthlyIncome          75486 non-null float64
MonthlyLoanPayment           75486 non-null float64
Recommendations              75486 non-null int64
DebtToIncomeRatio            75486 non-null float64
LoanOriginalAmount           75486 non-null int64
PercentFunded                75486 non-null float64
IncomeRange                  75486 non-null object
Investors                    75486 non-null int64
BorrowerState                75486 non-null object
dtypes: bool(2), float64(8), int64(6), object(6)
memory usage: 12.2+ MB

Define

Change data types.

Code

In [17]:
cleanloan['ProsperRating (numeric)'] = cleanloan['ProsperRating (numeric)'].astype('category')
cleanloan.IncomeRange = cleanloan.IncomeRange.astype('category')
cleanloan.IsBorrowerHomeowner = cleanloan.IsBorrowerHomeowner.astype('bool')

Test

In [18]:
cleanloan.info()

Int64Index: 75486 entries, 1 to 113936
Data columns (total 22 columns):
ListingNumber                75486 non-null int64
ListingCategory (numeric)    75486 non-null int64
Term                         75486 non-null int64
LoanStatus                   75486 non-null object
BorrowerAPR                  75486 non-null float64
BorrowerRate                 75486 non-null float64
ProsperRating (Alpha)        75486 non-null object
ProsperRating (numeric)      75486 non-null category
Occupation                   75486 non-null object
EmploymentStatus             75486 non-null object
EmploymentStatusDuration     75486 non-null float64
IsBorrowerHomeowner          75486 non-null bool
IncomeVerifiable             75486 non-null bool
StatedMonthlyIncome          75486 non-null float64
MonthlyLoanPayment           75486 non-null float64
Recommendations              75486 non-null int64
DebtToIncomeRatio            75486 non-null float64
LoanOriginalAmount           75486 non-null int64
PercentFunded                75486 non-null float64
IncomeRange                  75486 non-null category
Investors                    75486 non-null int64
BorrowerState                75486 non-null object
dtypes: bool(2), category(2), float64(7), int64(6), object(5)
memory usage: 11.2+ MB

In [19]:
cleanloan = cleanloan.drop('Recommendations', axis=1)

What is the structure of your dataset?

After cleaning, my dataset is reduced to 75,486 rows and 21 columns (the 22 selected columns minus the dropped Recommendations column).

What is/are the main feature(s) of interest in your dataset?

I want to find out which borrowers are able to repay their loans.

What features in the dataset do you think will help support your investigation into your feature(s) of interest?

I think many features, such as IsBorrowerHomeowner, Occupation, EmploymentStatus, EmploymentStatusDuration, and IncomeRange, will help us predict our outcome.

Univariate Exploration

In this section, let's investigate individual variables to draw out some insights.

In [20]:
plt.figure(figsize=[14, 7])
countstatus = cleanloan.LoanStatus.value_counts().index
sb.countplot(data=cleanloan, y='LoanStatus', order=countstatus, color='c');
plt.xlabel('Frequency', fontsize=14);
plt.ylabel('Loan Status', fontsize=14)
plt.title('Loan Status', fontsize=18)
plt.grid();

[Figure: "Loan Status" bar chart; x-axis: Frequency, y-axis: Loan Status]

We can see that most loans have the status Current.

In [21]:
plt.figure(figsize=[13, 9])
sorted_counts = cleanloan['EmploymentStatus'].value_counts().index
sb.countplot(data=cleanloan, y='EmploymentStatus', color='c', order=sorted_counts);
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('Employment Status', fontsize=14)
plt.title('No of people having employment status', fontsize=18);
plt.grid();

[Figure: "No of people having employment status" bar chart; x-axis: Frequency, y-axis: Employment Status]

Clearly, most of the borrowers are employed, and only a very small fraction are not employed.

In [22]:
plt.figure(figsize=[13, 9])
sorted_occu = cleanloan['Occupation'].value_counts().head(6).index
sb.countplot(data=cleanloan, y='Occupation', color='c', order=sorted_occu);
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('Career', fontsize=14)
plt.title('Career Count', fontsize=18);
plt.grid();

[Figure: "Career Count" bar chart; x-axis: Frequency, y-axis: Career]

Most of the borrowers list their occupation as Other.

In [23]:
plt.figure(figsize=[14, 7])
sorted_state = cleanloan['BorrowerState'].value_counts().head(6).index
sb.countplot(data=cleanloan, y='BorrowerState', color='c', order=sorted_state);
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('States', fontsize=14)
plt.title('States with most Loans', fontsize=18);
plt.grid();

[Figure: "States with most Loans" bar chart; x-axis: Frequency, y-axis: States]

Most of the borrowers are from California, which has the largest economy of any US state, followed by New York, which also has one of the largest state economies.

In [24]:
plt.figure(figsize=[14, 7])
sorted_countshome = cleanloan['IsBorrowerHomeowner'].value_counts()
plt.pie(sorted_countshome, labels=sorted_countshome.index, startangle=90,
        counterclock=False, autopct='%1.0f%%');

[Figure: pie chart of IsBorrowerHomeowner (False / True)]

54% of the borrowers own a home.

Let's focus on borrowers with a stated monthly income below 30,000 and see which factors influence their ratings.

In [25]:
poor = cleanloan.query('StatedMonthlyIncome < 30000')

In [26]:
k = poor.MonthlyLoanPayment.max()

In [27]:
plt.figure(figsize=[14, 7])
bins = np.arange(0, k+28000, 1000)
plt.hist(data=cleanloan, x='StatedMonthlyIncome', bins=bins, color='c');
plt.xlabel('Monthly Income', fontsize=14);
plt.ylabel('Frequency', fontsize=14);
plt.grid(color='black', alpha=0.6)
plt.title("Monthly Income of Borrowers ", fontsize=24);

[Figure: "Monthly Income of Borrowers" histogram; x-axis: Monthly Income, y-axis: Frequency]

In [28]:
poor.StatedMonthlyIncome.describe()

Out[28]:
count    75283.000000
mean      5882.263464
std       3446.721934
min          2.250000
25%       3583.333333
50%       5000.000000
75%       7166.666667
max      29833.333333
Name: StatedMonthlyIncome, dtype: float64

The mean stated monthly income of these borrowers is about $5,882, and the median is $5,000.

In [29]:
plt.figure(figsize=[13, 9])
sorted_occu = poor['Occupation'].value_counts().head(6).index
sb.countplot(data=poor, y='Occupation', color='c', order=sorted_occu)
plt.xlabel('Frequency', fontsize=14)
plt.ylabel('Career', fontsize=14)
plt.title('Career Count', fontsize=18);
plt.grid();

[Figure: "Career Count" bar chart; x-axis: Frequency, y-axis: Career]

Even for these lower-income borrowers, Other and Professional are the most common occupations.

In [30]:
plt.figure(figsize=[14, 7])
plt.hist(data=poor, x='EmploymentStatusDuration', color='c');
plt.xlabel('No of days', fontsize=14);
plt.ylabel('Frequency', fontsize=14);
plt.grid(color='black', alpha=0.6)
plt.title("Days since borrowers joined their Job ", fontsize=24);

[Figure: "Days since borrowers joined their Job" histogram; x-axis: No of days, y-axis: Frequency]

We can see that most borrowers have an employment duration under 100. Note that the Prosper data dictionary records EmploymentStatusDuration in months, so the axis is better read as months than as days.

In [31]:
plt.figure(figsize=[15, 7])

plt.subplot(1, 2, 1)
ok = poor.LoanOriginalAmount.max()
bins = np.arange(0, ok+1000, 1000)
plt.hist(data=poor, x='LoanOriginalAmount', bins=bins, color='c')
plt.xlabel('Loan Amount', fontsize=15)
plt.ylabel('Frequency', fontsize=15);
plt.title("Loan of Poor Borrowers", fontsize=15);
plt.grid()

plt.subplot(1, 2, 2)
bins = np.arange(0, k+28000, 1000)
plt.hist(data=cleanloan, x='StatedMonthlyIncome', bins=bins, color='c');
plt.xlabel('Monthly Income', fontsize=15);
plt.ylabel('Frequency', fontsize=15);
plt.title("Monthly Income of Poor Borrowers", fontsize=15);
plt.grid()

[Figure: two histograms. Left: "Loan of Poor Borrowers" (x-axis: Loan Amount). Right: "Monthly Income of Poor Borrowers" (x-axis: Monthly Income)]

We can see that most loans are in the range of 5,000, close to the average monthly income. One thing to notice is that few borrowers have a monthly income of 15,000, yet many still take loans of that size.

In [32]:
poor['ListingCategory (numeric)'].value_counts()

Out[32]:
1     47918
7      8228
2      6249
3      3607
6      2027
13     1758
15     1368
14      783
18      778
20      718
19      706
16      288
5       201
11      198
8       187
9        83
10       82
17       49
12       44
0        19
Name: ListingCategory (numeric), dtype: int64

Insights

Most of these borrowers take loans for debt consolidation, home improvement, business, and household expenses.
The state with the most loans is California, followed by New York.
Mostly borrowers who joined their jobs recently (an employment duration of up to 100) take loans.
46% of the borrowers do not own a home.
There is a strange pattern: the average borrower has a monthly income of about 5,000 dollars, yet there is a sharp spike in loans of 25,000.
The most common occupations are Other and Professional.
Most of the loans are currently active.

Bivariate Exploration

In this section, investigate relationships between pairs of variables in your data.

In [33]:
poor.IncomeRange.value_counts()

Out[33]:
$50,000-74,999    23432
$25,000-49,999    21219
$100,000+         13646
$75,000-99,999    13427
$1-24,999          3558
Not employed          1
Name: IncomeRange, dtype: int64

In [34]:
plt.figure(figsize=[12, 8])
sb.boxplot(data=poor, x='IncomeRange', y='LoanOriginalAmount', color='c')
plt.xticks(rotation=90);
plt.ylabel('Loan Amount', fontsize=14)
plt.xlabel('Income range', fontsize=14)
plt.title('Income Range vs Loan Amount', fontsize=20);

[Figure: "Income Range vs Loan Amount" box plot; x-axis: Income range, y-axis: Loan Amount]

We can clearly see that borrowers in the $100,000+ income range take the largest loans.

In [35]:
plt.figure(figsize=[14, 6])
sb.countplot(data=poor, x='EmploymentStatus', hue='IncomeRange', palette='BuGn_r')
plt.legend(loc=1, ncol=1, framealpha=1, title='Income range')
plt.xticks(rotation=90)
plt.grid()
plt.title(' Employment Status vs Income range');

[Figure: "Employment Status vs Income range" grouped bar chart]

Among employed borrowers, the $50,000-74,999 income range is the most common.

In [36]:
plt.figure(figsize=[14, 6])
sb.countplot(data=poor, x='Term', hue='IncomeRange', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Term v/s Income Range');

[Figure: "Term v/s Income Range" grouped bar chart]

Most loans have a 36-month term, and the $50,000-74,999 income range is the most common within it.

In [37]:
plt.figure(figsize=[14, 6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='IncomeRange', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Prosper Rating vs Income range');

[Figure: "Prosper Rating vs Income range" grouped bar chart; x-axis: ProsperRating (Alpha)]

The AA rating has the fewest borrowers.

In [38]:
plt.figure(figsize=[14, 6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='EmploymentStatus', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('ProsperRating Vs Employmentstatus');

[Figure: "ProsperRating Vs Employmentstatus" grouped bar chart]

In [39]:
plt.figure(figsize=[14, 6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='IncomeVerifiable', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Prosperrating vs IncomeVerifiable');

[Figure: "Prosperrating vs IncomeVerifiable" grouped bar chart]

In [40]:
plt.figure(figsize=[14, 6])
sb.countplot(data=poor, x='ProsperRating (Alpha)', hue='IsBorrowerHomeowner', palette='BuGn_r')
plt.xticks(rotation=90)
plt.grid()
plt.title('Prosper Ratings V/s Homeowner');

[Figure: "Prosper Ratings V/s Homeowner" grouped bar chart; x-axis: ProsperRating (Alpha)]

The majority of borrowers rated D and E do not own a home.

Insights

1. LoanOriginalAmount is highest for the A and B Prosper ratings when compared with income range.
2. The majority are from the $100,000+ range.
3. The AA rating has more borrowers who do not own a home than HR, which is very strange.
4. The majority of the borrowers are employed.

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

I saw that the highest loan amounts are taken by borrowers with the Employed status, followed by Other and Full-time, and that employment status largely decides a borrower's rating.

Did you observe any interesting relationships between the other features (not the main feature(s) of interest)?

Even borrowers with $100,000+ income who are employed and own a home appear in the HR rating.

Multivariate Exploration

Does Loan Status Depend on Income?

In [41]:
plt.figure(figsize=[18, 7])
ax = sb.barplot(data=poor, x='IncomeRange', y='LoanOriginalAmount', hue='LoanStatus')
ax.legend(loc=1, ncol=1, framealpha=1, title='Loan Status')
plt.title('Applicants - Loan Amount across Prosper Rating and Income Range');

[Figure: "Applicants - Loan Amount across Prosper Rating and Income Range" grouped bar chart]

Yes, clearly the higher income ranges have more loans in the Current status.

Does Salary Decide the Ratings?
In [42]:
plt.figure(figsize=[18, 7])
ax = sb.barplot(data=poor, x='EmploymentStatus', y='StatedMonthlyIncome',
                hue='ProsperRating (Alpha)')
ax.legend(loc=1, ncol=1, framealpha=1, title='Prosper rating')
plt.title(' Applicants -Employement across Prosper Rating and Income Range');

[Figure: "Applicants -Employement across Prosper Rating and Income Range" grouped bar chart]

Yes, but only for the Employed, Other, and Full-time statuses, not for the others.

Does APR Vary with Home Ownership and Prosper Rating?

In [43]:
plt.figure(figsize=[15, 7])
ax = sb.pointplot(data=poor, x='ProsperRating (Alpha)', y='BorrowerAPR',
                  hue='IsBorrowerHomeowner', dodge=0.3, linestyles="");
plt.title('Applicants - Home ownner status across ProsperRating and BorrowerRate');

[Figure: "Applicants - Home ownner status across ProsperRating and BorrowerRate" point plot]

Yes, AA has the lowest APR while HR has the highest. One thing to notice is that owning a home does not change the APR much within a rating.

Conclusion

1. Most of the poor borrowers fall in Prosper rating B, irrespective of income range.
2. Monthly income is higher for the Employed, Other, and Full-time employment statuses with Prosper ratings AA, A, and B.
3. Owning a home does not affect the interest rate.

To conclude the analysis, I say that the loan approval status depends on income, homeowner status, and employment status.
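The APR comparison across ratings can also be checked numerically, and it is easier to read when ProsperRating (Alpha) is an ordered category, which also addresses the data type issue noted earlier. A minimal sketch follows. The toy values are invented for illustration, and the best-to-worst order AA, A, B, C, D, E, HR is assumed from the Prosper rating scale.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Made-up rows; only the column names come from the notebook.
toy = pd.DataFrame({
    "ProsperRating (Alpha)": ["HR", "AA", "B", "AA", "HR", "B"],
    "IsBorrowerHomeowner": [True, True, False, False, False, True],
    "BorrowerAPR": [0.35, 0.08, 0.18, 0.09, 0.36, 0.17],
})

# Ordered categorical so that tables and plots sort from best to worst rating.
rating_order = ["AA", "A", "B", "C", "D", "E", "HR"]
rating_dtype = CategoricalDtype(categories=rating_order, ordered=True)
toy["ProsperRating (Alpha)"] = toy["ProsperRating (Alpha)"].astype(rating_dtype)

# Mean APR per rating, split by home ownership (one column per True/False).
summary = (
    toy.groupby(["ProsperRating (Alpha)", "IsBorrowerHomeowner"], observed=True)["BorrowerAPR"]
    .mean()
    .unstack()
)
print(summary)
```

On the real data, the same groupby on poor would give one row per rating, so the gap between the homeowner and non-homeowner columns can be compared directly with the gap between ratings.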
