EDA Assignment
Name: Mallireddy Srinitha
Problem Statement
Loan-providing companies find it hard to give loans to people with insufficient or
non-existent credit history. Because of that, some consumers use it to their advantage
by becoming defaulters. Suppose you work for a consumer finance company which
specialises in lending various types of loans to urban customers. You have to use EDA
to analyse the patterns present in the data. This will ensure that applicants who are
capable of repaying the loan are not rejected.
When the company receives a loan application, it has to decide on loan approval based
on the applicant’s profile. Two types of risk are associated with the bank’s decision:
● If the applicant is likely to repay the loan, then not approving the loan results in a
loss of business to the company.
● If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then
approving the loan may lead to a financial loss for the company.
Business Objectives
This case study aims to identify patterns which indicate whether a client
has difficulty paying their instalments. These patterns may be used to
take actions such as denying the loan, reducing the loan amount, or
lending to risky applicants at a higher interest rate. This will ensure
that consumers capable of repaying the loan are not rejected. Identifying
such applicants using EDA is the aim of this case study.
Specifications of Application_Data
● Shape: (30755,122)
● It contains a mix of numerical and categorical columns
● Described the dataset to obtain the mean, standard
deviation, minimum, maximum, and 25th, 50th and 75th
percentiles of each numerical column
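A minimal pandas sketch of this first look at the data, assuming the file has already been loaded into a DataFrame. The tiny frame below and its column names (TARGET, CODE_GENDER, AMT_INCOME_TOTAL, AMT_CREDIT) are hypothetical stand-ins, not the real application data.

```python
import pandas as pd

# Hypothetical stand-in for application_data (synthetic values)
app = pd.DataFrame({
    "TARGET": [0, 1, 0, 0],
    "CODE_GENDER": ["F", "M", "F", "M"],
    "AMT_INCOME_TOTAL": [200.0, 150.0, 100.0, 250.0],
    "AMT_CREDIT": [400.0, 600.0, 300.0, 500.0],
})

print(app.shape)  # (rows, columns)

# Split the columns into numerical and categorical groups by dtype
numerical_cols = app.select_dtypes(include="number").columns.tolist()
categorical_cols = app.select_dtypes(exclude="number").columns.tolist()

# describe() returns count, mean, std, min, 25%, 50%, 75% and max per column
summary = app[numerical_cols].describe()
print(summary)
```

On the full file, `app.shape` would report the (rows, columns) pair listed above.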
Missing Values in Application_Data
● Checked the percentage of missing values in each column
of this data frame
● Dropped columns with 45% or more missing values
After dropping these columns, 73 columns remain.
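The missing-value check and the 45% cut-off can be sketched as follows. The toy columns are hypothetical; only the threshold comes from this report. The comparison uses `>=` to match "45% or more".

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical column names; NaN marks a missing value
app = pd.DataFrame({
    "TARGET": [0, 1, 0, 0],
    "AMT_CREDIT": [100.0, 250.0, np.nan, 180.0],   # 25% missing
    "OWN_CAR_AGE": [np.nan, np.nan, 5.0, np.nan],  # 75% missing
    "EXT_SOURCE_3": [0.5, np.nan, np.nan, 0.7],    # 50% missing
})

# Percentage of missing values per column, highest first
missing_pct = app.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct.round(2))

# Drop every column with 45% or more missing values
THRESHOLD = 45
cols_to_drop = missing_pct[missing_pct >= THRESHOLD].index
app_clean = app.drop(columns=cols_to_drop)
print(app_clean.columns.tolist())
```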
Impute missing values for Numerical Variables
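The report does not say which imputation method was used. This sketch assumes median imputation, a common choice for skewed amounts because the median is not pulled up by a few very large values. Column names and values are hypothetical, and categorical columns are left untouched.

```python
import numpy as np
import pandas as pd

app = pd.DataFrame({
    "AMT_ANNUITY": [10.0, np.nan, 30.0, 1000.0],  # skewed by one large value
    "CNT_FAM_MEMBERS": [1.0, 2.0, np.nan, 2.0],
    "NAME_TYPE_SUITE": ["Unaccompanied", None, "Family", "Unaccompanied"],
})

# Assumption: fill numerical gaps with each column's median
numerical_cols = app.select_dtypes(include="number").columns
medians = app[numerical_cols].median()
app[numerical_cols] = app[numerical_cols].fillna(medians)
print(app)
```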
The pie chart below shows the imbalance between target type 1 and type 0:
type 1 accounts for 8.07% of applicants and type 0 for 91.92%.
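The class shares behind the pie chart can be computed with `value_counts(normalize=True)`. The series below is synthetic (92% zeros, 8% ones) and only mimics the reported split. With the reported 91.92% and 8.07%, the ratio works out to about 11.4 non-defaulters per defaulter.

```python
import pandas as pd

# Synthetic target: 1 = default customer, 0 = non-default customer
target = pd.Series([0] * 23 + [1] * 2, name="TARGET")

# Share of each class in percent
pct = target.value_counts(normalize=True).mul(100).round(2)
print(pct)

# How many non-defaulters there are for every defaulter
imbalance_ratio = pct.loc[0] / pct.loc[1]
print(round(imbalance_ratio, 2))
```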
I merged the application_data and previous_data datasets into a new
dataset, final_data, whose shape is (1413701, 110).
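A sketch of the merge, assuming both tables share a client ID column (called SK_ID_CURR here, a hypothetical name) and that an inner join was used; the report states neither. Because one client can have several previous applications, a one-to-many merge returns one row per matching pair, so the merged frame can have more rows than application_data. Column names present in both tables receive the suffixes passed in `suffixes`.

```python
import pandas as pd

# Toy inputs; SK_ID_CURR as the join key is an assumption
application_data = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3],
    "TARGET": [0, 1, 0],
    "AMT_CREDIT": [100.0, 200.0, 300.0],
})
previous_data = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 4],
    "NAME_CONTRACT_STATUS": ["Approved", "Refused", "Approved", "Canceled"],
    "AMT_CREDIT": [90.0, 50.0, 180.0, 70.0],
})

# Inner join keeps only clients present in both tables;
# client 1 has two previous applications, so it appears twice
final_data = pd.merge(
    application_data,
    previous_data,
    on="SK_ID_CURR",
    how="inner",
    suffixes=("_curr", "_prev"),
)
print(final_data.shape)
```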
Divided the dataset into two subsets based on the target variable: Target = 0 and
Target = 1.
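Splitting on the target is a pair of boolean filters. The frame is synthetic and the subset names `target_0` and `target_1` are hypothetical.

```python
import pandas as pd

final_data = pd.DataFrame({
    "TARGET": [0, 1, 0, 1, 0],
    "AMT_CREDIT": [100.0, 250.0, 120.0, 300.0, 90.0],
})

# Boolean masks split the rows into non-defaulters and defaulters
target_0 = final_data[final_data["TARGET"] == 0]  # non-default customers
target_1 = final_data[final_data["TARGET"] == 1]  # default customers

# Every row lands in exactly one subset
print(len(target_0) + len(target_1) == len(final_data))
```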
Looking at the percentage of defaulted credits, female applicants have a higher chance
of not repaying their loans.
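"Percent of defaulted credits" can mean two different things: the default rate within each gender, or each gender's share of all defaulters. The sketch below computes both on synthetic data (CODE_GENDER is an assumed column name). In this toy example females make up most of the defaulters simply because there are more female applicants, while their default rate is lower, so it is worth checking which measure a chart shows before drawing conclusions.

```python
import pandas as pd

# Synthetic data: 10 female applicants (2 defaults), 4 male (1 default)
final_data = pd.DataFrame({
    "CODE_GENDER": ["F"] * 10 + ["M"] * 4,
    "TARGET": [1, 1] + [0] * 8 + [1, 0, 0, 0],
})

# Default rate within each gender: mean of the 0/1 target, in percent
default_rate = final_data.groupby("CODE_GENDER")["TARGET"].mean().mul(100)

# Share of all defaulters belonging to each gender, in percent
share_of_defaulters = (
    final_data.loc[final_data["TARGET"] == 1, "CODE_GENDER"]
    .value_counts(normalize=True)
    .mul(100)
)
print(default_rate)
print(share_of_defaulters)
```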
Univariate analysis on the combined data for default customers
Univariate analysis on the combined data for non-default customers
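A sketch of univariate analysis run separately on the two subsets, assuming one categorical column (NAME_CONTRACT_TYPE) and one numerical column (AMT_CREDIT); both names and the data are hypothetical. Normalising `value_counts` to percentages makes the default and non-default groups comparable despite their very different sizes.

```python
import pandas as pd

final_data = pd.DataFrame({
    "TARGET": [1, 1, 1, 0, 0, 0, 0, 0],
    "NAME_CONTRACT_TYPE": ["Cash loans", "Cash loans", "Revolving loans",
                           "Cash loans", "Revolving loans", "Revolving loans",
                           "Cash loans", "Cash loans"],
    "AMT_CREDIT": [300.0, 250.0, 120.0, 100.0, 90.0, 110.0, 130.0, 150.0],
})

target_1 = final_data[final_data["TARGET"] == 1]  # default customers
target_0 = final_data[final_data["TARGET"] == 0]  # non-default customers

def univariate_summary(df, cat_col, num_col):
    """Percentage distribution of one categorical column and stats of one numerical column."""
    cat_dist = df[cat_col].value_counts(normalize=True).mul(100).round(1)
    num_stats = df[num_col].describe()
    return cat_dist, num_stats

cat_1, num_1 = univariate_summary(target_1, "NAME_CONTRACT_TYPE", "AMT_CREDIT")
cat_0, num_0 = univariate_summary(target_0, "NAME_CONTRACT_TYPE", "AMT_CREDIT")

# Side-by-side view, aligned on category, to compare the two groups
comparison = pd.concat({"default": cat_1, "non_default": cat_0}, axis=1)
print(comparison)
```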
Bivariate analysis on the combined data for default customers
Bivariate analysis on the combined data for non-default customers
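Bivariate analysis looks at pairs of variables within each subset. This sketch assumes two pairings: income against credit amount (Pearson correlation via `Series.corr`) and occupation against mean credit amount. Column names and values are hypothetical.

```python
import pandas as pd

final_data = pd.DataFrame({
    "TARGET": [1, 1, 1, 1, 0, 0, 0, 0],
    "AMT_INCOME_TOTAL": [100.0, 120.0, 90.0, 150.0, 200.0, 180.0, 220.0, 160.0],
    "AMT_CREDIT": [300.0, 350.0, 280.0, 400.0, 250.0, 240.0, 300.0, 200.0],
    "OCCUPATION_TYPE": ["Drivers", "Laborers", "Drivers", "Laborers",
                        "Drivers", "Laborers", "Drivers", "Laborers"],
})

results = {}
for label, subset in final_data.groupby("TARGET"):
    results[label] = {
        # Numerical vs numerical: Pearson correlation coefficient
        "corr": subset["AMT_INCOME_TOTAL"].corr(subset["AMT_CREDIT"]),
        # Categorical vs numerical: mean credit amount per occupation
        "credit_by_occ": subset.groupby("OCCUPATION_TYPE")["AMT_CREDIT"].mean(),
    }

for label, res in results.items():
    print(label, round(res["corr"], 3))
    print(res["credit_by_occ"])
```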
Final Conclusion
Most of the male loan applicants are drivers and laborers; they have
higher credit amounts and lower incomes, so we should limit the
credit amount offered to these kinds of clients.
Most of the female loan applicants are sales staff and laborers; they
have higher credit amounts and lower incomes, so we should limit the
credit amount offered to these kinds of clients.
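The conclusion compares credit amount with income. One way to quantify that, sketched here under assumptions, is a credit-to-income ratio summarised by gender and occupation. The column names, the synthetic rows and the cap of 4.0 are hypothetical and are not taken from this analysis.

```python
import pandas as pd

final_data = pd.DataFrame({
    "CODE_GENDER": ["M", "M", "M", "F", "F", "F"],
    "OCCUPATION_TYPE": ["Drivers", "Laborers", "Managers",
                        "Sales staff", "Laborers", "Managers"],
    "AMT_CREDIT": [600.0, 500.0, 400.0, 450.0, 500.0, 300.0],
    "AMT_INCOME_TOTAL": [100.0, 100.0, 200.0, 90.0, 100.0, 150.0],
})

# Credit relative to income; a higher value means more debt per unit of income
final_data["CREDIT_INCOME_RATIO"] = (
    final_data["AMT_CREDIT"] / final_data["AMT_INCOME_TOTAL"]
)

# Median ratio for each (gender, occupation) pair, highest first
ratio_table = (
    final_data.groupby(["CODE_GENDER", "OCCUPATION_TYPE"])["CREDIT_INCOME_RATIO"]
    .median()
    .sort_values(ascending=False)
)
print(ratio_table)

# Groups above a hypothetical cap could be offered a lower credit amount
RATIO_CAP = 4.0
high_ratio_groups = ratio_table[ratio_table > RATIO_CAP].index.tolist()
print(high_ratio_groups)
```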