Eda Case Study Final PDF

The document presents a case study on using exploratory data analysis (EDA) to help a consumer finance company better determine which loan applicants will repay their loans. Three key points: 1) EDA is used to analyze loan applicant data and identify patterns that help the company approve applicants likely to repay and reject those likely to default, which reduces interest and credit losses. 2) The EDA process involves data cleaning, univariate analysis of individual variables, and bivariate analysis of relationships between variables. 3) The analyses found that female applicants and millionaires are less likely to default, while labourers and applicants with low incomes are more likely to default.

Uploaded by

Vishal P


EDA CASE STUDY

Case Study Business Problem:

Loan-providing companies find it hard to lend to people with insufficient or non-existent credit
history, and some consumers take advantage of this by defaulting. Suppose you work for a
consumer finance company that specialises in lending various types of loans to urban customers. You have to use EDA
to analyse the patterns present in the data, so that applicants who are capable of repaying a loan are not
rejected.

Major problems in this scenario:

Two major problems are observed:

1) If an applicant who is capable of repaying the loan is rejected, the bank ends up with an interest loss.
2) If an applicant who is not capable of repaying the loan is approved, the bank ends up with a credit loss.
The following slides examine these business problems from an analyst's perspective
and give the bank recommendations for reducing both losses.

This is done through EDA (Exploratory Data Analysis).
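Each of the two problems above is a case where the loan decision disagrees with how the applicant actually behaves. The minimal Python sketch below makes that mapping concrete. It assumes TARGET = 1 marks an applicant likely to default and TARGET = 0 a non-defaulter, which is the coding the rest of this analysis uses. The function names are hypothetical.

```python
# Hypothetical helpers: classify each (decision, outcome) pair into the loss
# types described above. Assumes TARGET 1 = likely to default, 0 = non-defaulter.


def loss_type(approved: bool, target: int) -> str:
    """Return the kind of loss a single loan decision causes for the bank."""
    if not approved and target == 0:
        return "interest loss"  # capable applicant was rejected
    if approved and target == 1:
        return "credit loss"  # applicant who defaults was approved
    return "no loss"


def count_losses(decisions):
    """Tally loss types over an iterable of (approved, target) pairs."""
    counts = {"interest loss": 0, "credit loss": 0, "no loss": 0}
    for approved, target in decisions:
        counts[loss_type(approved, target)] += 1
    return counts


sample = [(True, 0), (False, 0), (True, 1), (False, 1), (False, 0)]
print(count_losses(sample))  # {'interest loss': 2, 'credit loss': 1, 'no loss': 2}
```

EDA aims to shrink both the "interest loss" and "credit loss" counts by finding patterns that separate the two groups.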

EDA Process:
1) Data Cleaning
2) Univariate Analysis
3) Bivariate Analysis.
Data Cleaning
❑ Load the application CSV file.
❑ Get a first overview of the data using Python (pandas) commands: describe(), info(), shape and
dtypes.
❑ Look for null values, as they affect the quality of the analysis on
the dataset.
❑ Drop columns with more than 50% null values, as they add little to the
analysis.
❑ Do not drop columns with a lower share of null values (around 13%), as
they can be imputed with the mean, median or mode, depending on the
column type.
❑ 41 columns have more than 50% null values and are dropped.
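The cleaning steps above could be sketched in pandas as follows. This is a minimal example under stated assumptions. The 50% threshold comes from the slide. The small in-memory frame stands in for the application CSV, which would normally be loaded with `pd.read_csv`. `MOSTLY_EMPTY` is a hypothetical column used only for illustration.

```python
import numpy as np
import pandas as pd


def null_percentage(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values in each column, highest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 50.0) -> pd.DataFrame:
    """Drop every column whose share of nulls exceeds `threshold` percent."""
    pct = df.isnull().mean() * 100
    return df.loc[:, pct <= threshold]


# Stand-in for the application CSV; MOSTLY_EMPTY is a hypothetical column.
df = pd.DataFrame({
    "AMT_CREDIT": [100.0, 200.0, 300.0, 400.0],
    "CNT_FAM_MEMBERS": [1.0, np.nan, 2.0, 3.0],     # 25% null: kept, imputed later
    "MOSTLY_EMPTY": [np.nan, np.nan, np.nan, 5.0],  # 75% null: dropped
})

# First look at the data (shape, dtypes, info, describe).
print(df.shape)
print(df.dtypes)
df.info()
print(df.describe())

print(null_percentage(df))
cleaned = drop_sparse_columns(df)
print(list(cleaned.columns))  # ['AMT_CREDIT', 'CNT_FAM_MEMBERS']
```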
Handling Outliers for Continuous Numeric Variables
❑ 'AMT_REQ_CREDIT_BUREAU_QRT', 'AMT_REQ_CREDIT_BUREAU_YEAR' and 'CNT_FAM_MEMBERS'
are continuous numeric variables from the application CSV file that contain outliers.
❑ As the box plots show, imputing these columns with the MEAN would distort the actual data,
because the mean is pulled towards the outliers. Hence the MEDIAN is recommended for these columns.
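The following is a minimal pandas sketch of median imputation for the three columns named above. The column names come from the slide. The values are made up to show how one large value pulls the mean but not the median.

```python
import numpy as np
import pandas as pd

# Columns named in the case study; the values below are made up.
OUTLIER_COLS = ["AMT_REQ_CREDIT_BUREAU_QRT", "AMT_REQ_CREDIT_BUREAU_YEAR", "CNT_FAM_MEMBERS"]


def impute_with_median(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Fill nulls in each column with that column's median (robust to outliers)."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].median())
    return out


df = pd.DataFrame({
    "AMT_REQ_CREDIT_BUREAU_QRT": [0.0, 0.0, 1.0, np.nan, 12.0],
    "AMT_REQ_CREDIT_BUREAU_YEAR": [1.0, 2.0, np.nan, 1.0, 25.0],
    "CNT_FAM_MEMBERS": [2.0, 1.0, 2.0, np.nan, 20.0],
})

# The single large value drags the mean upward, while the median stays low.
print(df["AMT_REQ_CREDIT_BUREAU_QRT"].mean())    # 3.25
print(df["AMT_REQ_CREDIT_BUREAU_QRT"].median())  # 0.5
filled = impute_with_median(df, OUTLIER_COLS)
print(filled)
```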
Identifying Outliers for Continuous Numeric Variables

❑ For 'AMT_CREDIT', the mean is 599076.2, whereas the maximum is 4050000.
❑ For 'CNT_CHILDREN', the mean is 0.41, whereas the maximum is 19.
❑ For 'AMT_ANNUITY', the mean is 27117, whereas the maximum is 225000.
❑ For 'AMT_REQ_CREDIT_BUREAU_YEAR', the mean is 1.9, whereas the maximum is 21.
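A large gap between the mean and the maximum suggests outliers. One common way to count them is the 1.5 × IQR rule, which matches the default whiskers that matplotlib's `boxplot` draws (`whis=1.5`). The sketch below is not necessarily how the original notebook flagged outliers. The CNT_CHILDREN values are made up.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_summary(s: pd.Series, k: float = 1.5) -> dict:
    """Mean, max and number of points outside the IQR fences."""
    low, high = iqr_bounds(s, k)
    mask = (s < low) | (s > high)
    return {"mean": s.mean(), "max": s.max(), "n_outliers": int(mask.sum())}


# Made-up CNT_CHILDREN values: mostly 0-2 children plus one extreme value.
children = pd.Series([0, 0, 0, 1, 1, 2, 0, 1, 0, 19], name="CNT_CHILDREN")
result = outlier_summary(children)
print(result)
```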
Imbalance Check:

❑ Based on TARGET values 0 and 1: as the graph shows, the data is imbalanced, with TARGET 0 at
91.93% and TARGET 1 at 8.07%.
❑ Based on male and female gender: as the graph shows, the data is imbalanced, with females at
65.84% and males at 34.16%.
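The percentages above are category shares. The pandas sketch below computes them with `value_counts(normalize=True)` on made-up rows. In the case study, the same call would be applied to the real TARGET and CODE_GENDER columns.

```python
import pandas as pd


def class_shares(s: pd.Series) -> pd.Series:
    """Percentage of rows in each category, rounded to two decimals."""
    return (s.value_counts(normalize=True) * 100).round(2)


# Made-up sample standing in for the TARGET and CODE_GENDER columns.
df = pd.DataFrame({
    "TARGET":      [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "CODE_GENDER": ["F", "F", "M", "F", "F", "M", "F", "M", "F", "M"],
})
target_pct = class_shares(df["TARGET"])
gender_pct = class_shares(df["CODE_GENDER"])
print(target_pct)
print(gender_pct)

# Majority-to-minority ratio: one number summarising how imbalanced TARGET is.
imbalance_ratio = target_pct.max() / target_pct.min()
print(imbalance_ratio)
```

With the shares from the slide (91.93% vs 8.07%), the same ratio would be roughly 11.4.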
Univariate Analysis For Categorical Variables of Target 0 and Target 1

❑ For the age-group variable, the 30-40 age group has a high count in both TARGET 0 (non-defaulters)

and TARGET 1 (likely to default).

❑ For the credit-amount category, the low-credit zone has higher

repayment in TARGET 0, while in TARGET 1 payment difficulties are concentrated among holders of
average credit amounts.

❑ There is no significant difference between the targets in housing type, family status or education type.

❑ For the CODE_GENDER category, around 125000 females are non-defaulters, far more than the

number of females likely to default, while males are more represented among applicants likely to
default than among non-defaulters.

Note: Only a limited number of graphs are shown in the presentation. More illustrations and graphs can be found in the attached Python file.
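Comparing a categorical variable across TARGET 0 and TARGET 1 amounts to a normalised cross-tabulation. A minimal pandas sketch follows. `AGE_GROUP` is a hypothetical binned-age column, and the rows are made up.

```python
import pandas as pd


def share_by_target(df: pd.DataFrame, column: str, target: str = "TARGET") -> pd.DataFrame:
    """Within each TARGET group, the percentage of rows in each category of `column`."""
    return (pd.crosstab(df[column], df[target], normalize="columns") * 100).round(1)


# Made-up rows; AGE_GROUP is a hypothetical binned age column.
df = pd.DataFrame({
    "AGE_GROUP": ["30-40", "30-40", "20-30", "40-50", "30-40", "20-30", "30-40", "40-50"],
    "TARGET":    [0, 0, 0, 0, 1, 1, 1, 0],
})
table = share_by_target(df, "AGE_GROUP")
print(table)
```

Normalising by column compares shares rather than raw counts. This matters here because TARGET 0 is roughly eleven times larger than TARGET 1.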
Correlation of numerical Columns of Target ‘0’ and Target ‘1’
❑ In TARGET '0', the highest correlation is between AMT_CREDIT and AMT_GOODS_PRICE,
showing that these two columns are closely related.
❑ In TARGET '0', CNT_CHILDREN and DAYS_BIRTH are negatively correlated.
❑ In TARGET '0', the lowest correlation is between DAYS_EMPLOYED and
DAYS_ID_PUBLISH.
❑ In TARGET '1', the highest correlation is also between AMT_CREDIT and
AMT_GOODS_PRICE.
❑ In TARGET '1', CNT_CHILDREN and DAYS_BIRTH are also negatively correlated,
but less strongly than in TARGET '0'.
❑ In TARGET '1', the lowest correlation is between DAYS_EMPLOYED and
DAYS_ID_PUBLISH.
❑ Overall, the variable pairs that are correlated with one another are almost the
same in the TARGET 0 and TARGET 1 data frames. AMT_GOODS_PRICE is highly correlated
with AMT_CREDIT and somewhat less strongly correlated with AMT_ANNUITY.
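The comparison above splits the data by TARGET, computes a correlation matrix for each part, and ranks the column pairs. The following pandas sketch shows that approach on made-up numbers, using Pearson correlation, which is the `DataFrame.corr` default.

```python
from itertools import combinations

import pandas as pd


def ranked_correlations(df: pd.DataFrame) -> pd.Series:
    """Every distinct pair of numeric columns ranked by Pearson correlation, highest first."""
    corr = df.select_dtypes(include="number").corr()
    pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
    return pd.Series(pairs).sort_values(ascending=False)


# Made-up numbers in which AMT_GOODS_PRICE tracks AMT_CREDIT closely.
df = pd.DataFrame({
    "TARGET":          [0, 0, 0, 1, 1, 1],
    "AMT_CREDIT":      [100.0, 200.0, 300.0, 150.0, 250.0, 400.0],
    "AMT_GOODS_PRICE": [90.0, 180.0, 270.0, 135.0, 230.0, 360.0],
    "AMT_ANNUITY":     [10.0, 12.0, 25.0, 20.0, 14.0, 30.0],
})

# Split by TARGET as the case study does, then compare the ranked pairs.
target0 = df[df["TARGET"] == 0].drop(columns="TARGET")
target1 = df[df["TARGET"] == 1].drop(columns="TARGET")
corr0 = ranked_correlations(target0)
corr1 = ranked_correlations(target1)
print(corr0)
print(corr1)
```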
Univariate Analysis For Numerical Variables of Target 0 and Target 1

❑ Clients who make yearly enquiries to the credit bureau are more often non-defaulters than
likely to default.
❑ Clients with no children are more numerous among defaulters than among
non-defaulters.
❑ Clients with high income are less likely to default on the
bank.
❑ Applicants for consumer loans are more likely to be non-defaulters than
defaulters.
❑ Clients whose DAYS_EMPLOYED value is above 350000 include more
non-defaulters than applicants likely to default.
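Statements such as "clients with high income are less likely to default" can be checked in two ways. One is to compare each numeric column's median per TARGET group. The other is to compute the default rate (the mean of TARGET) inside value bands. The pandas sketch below shows both. `AMT_INCOME_TOTAL` is an assumed name for the income column, which the slides do not name. The band edges and values are made up.

```python
import pandas as pd


def compare_by_target(df: pd.DataFrame, columns, target: str = "TARGET") -> pd.DataFrame:
    """Median of each numeric column within each TARGET group."""
    return df.groupby(target)[columns].median()


def default_rate_by_band(df: pd.DataFrame, column: str, bins, target: str = "TARGET") -> pd.Series:
    """Share of TARGET == 1 rows inside each value band of `column`."""
    bands = pd.cut(df[column], bins=bins)
    return df.groupby(bands, observed=False)[target].mean()


# Made-up data; AMT_INCOME_TOTAL is an assumed name for the income column.
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [90_000, 120_000, 150_000, 300_000, 450_000, 80_000, 100_000, 600_000],
    "CNT_CHILDREN":     [0, 1, 2, 0, 1, 0, 0, 2],
    "TARGET":           [1, 0, 0, 0, 0, 1, 1, 0],
})
medians = compare_by_target(df, ["AMT_INCOME_TOTAL", "CNT_CHILDREN"])
rates = default_rate_by_band(df, "AMT_INCOME_TOTAL", bins=[0, 200_000, 1_000_000])
print(medians)
print(rates)
```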

Bivariate Analysis on Categorical Columns of 'Target 0' and 'Target 1'

❑ Females who do not own a car form the larger group among non-defaulters, and the
same holds for those likely to default.
❑ Males who own a house/apartment are more likely to be defaulters than
non-defaulters.
❑ Among applicants who own a house/apartment, the Core staff occupation type has more
non-defaulters, whereas the Labourers occupation type has more applicants
likely to default.

❑ Male Labourers show no significant difference

between non-defaulters and those likely to default, while female
Sales staff are more numerous among non-defaulters than among those likely to
default.
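Each observation above combines two categorical columns and then splits the result by TARGET. This can be done with a cross-tabulation that uses a two-level row index. The pandas sketch below uses CODE_GENDER and `FLAG_OWN_CAR`. The latter is an assumed name for the car-ownership column, which the slides do not name. The rows are made up.

```python
import pandas as pd


def counts_by_pair(df: pd.DataFrame, first: str, second: str, target: str = "TARGET") -> pd.DataFrame:
    """Row counts for every (first, second) combination, split by TARGET."""
    return pd.crosstab([df[first], df[second]], df[target])


# Made-up rows; FLAG_OWN_CAR is an assumed column name.
df = pd.DataFrame({
    "CODE_GENDER":  ["F", "F", "M", "M", "F", "M", "F", "M"],
    "FLAG_OWN_CAR": ["N", "N", "Y", "N", "Y", "N", "N", "Y"],
    "TARGET":       [0, 0, 0, 1, 0, 1, 1, 0],
})
table = counts_by_pair(df, "CODE_GENDER", "FLAG_OWN_CAR")
print(table)
```

The same helper works for pairs like gender and housing type, or occupation type and housing type, once the real column names are substituted.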

Insights from Bivariate Analysis of Continuous vs. Categorical and Continuous vs. Continuous Variables
❑ Applicants with high income and no children are more
likely to default.
❑ Applicants with a total income below 500000 who are given a loan amount
in the range 500000-2500000 are more likely to
default than not.
❑ Applicants with a total income below 500000 account for more of those who have
not paid back their loans on time, and this group is
more likely to default.
❑ AMT_ANNUITY and AMT_CREDIT are strongly correlated for both
non-defaulters and defaulters, and they show a similar pattern
in both groups.
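The income and credit-amount insight above can be expressed as a default-rate grid: band both variables, then average TARGET in each cell. The pandas sketch below uses band edges taken from the bullets (500000 and 2500000). The column names `AMT_INCOME_TOTAL` and `AMT_CREDIT` are assumptions for the income and loan-amount columns. The rows are made up.

```python
import pandas as pd

# Band edges from the insights above; left-closed bands, e.g. [0, 500000).
INCOME_BINS = [0, 500_000, float("inf")]
CREDIT_BINS = [0, 500_000, 2_500_000, float("inf")]


def default_rate_grid(df: pd.DataFrame, target: str = "TARGET") -> pd.DataFrame:
    """Default rate (mean of TARGET) for each income band x credit band cell."""
    banded = df.assign(
        INCOME_BAND=pd.cut(df["AMT_INCOME_TOTAL"], bins=INCOME_BINS, right=False),
        CREDIT_BAND=pd.cut(df["AMT_CREDIT"], bins=CREDIT_BINS, right=False),
    )
    return banded.pivot_table(index="INCOME_BAND", columns="CREDIT_BAND",
                              values=target, aggfunc="mean", observed=False)


# Made-up rows; the column names are assumptions.
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [150_000, 200_000, 300_000, 700_000, 900_000, 250_000],
    "AMT_CREDIT":       [1_000_000, 800_000, 300_000, 1_200_000, 2_000_000, 1_500_000],
    "TARGET":           [1, 1, 0, 0, 0, 0],
})
grid = default_rate_grid(df)
print(grid)
```

Empty cells (no applicants in that band combination) come out as NaN rather than 0, so they are not mistaken for a zero default rate.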

Inferences from Univariate and Bivariate Analysis of the Combined Data Frame
of Categorical and Continuous Variables
❑ Around 70000-80000 applicants whose loans were approved are likely to default, and
over 200000 applicants whose loans were refused are less likely to default. Both groups cause
losses for the bank (credit loss and interest loss, respectively).
❑ Among applicants with Secondary / secondary special education, fewer face difficulties
repaying on time than repay on time.
❑ Females are less likely than males to face payment difficulties, so it is recommended to
approve more loans for female applicants than for male applicants. At the same time, in
absolute numbers, more females than males face payment difficulties.
❑ Among occupation types, Labourers account for a large number of the applicants who are
likely to default or to face payment difficulties.
❑ Repeater applicants show higher numbers of both non-defaulters and defaulters than
new applicants.
❑ No millionaire is likely to default, so millionaires' loan applications should not be
refused. Lower-middle-class applicants account for a large number of those who repay
their loans.
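The first inference above comes from a combined data frame that links each applicant's TARGET to a loan decision (approved or refused). The slides do not name the second file or its columns. The pandas sketch below therefore assumes a previous-applications frame joined on a hypothetical key `SK_ID_CURR`, with a hypothetical status column `NAME_CONTRACT_STATUS`. The rows are made up.

```python
import pandas as pd

# Hypothetical frames: SK_ID_CURR (applicant key) and NAME_CONTRACT_STATUS
# (loan decision) are assumed column names, not confirmed by the slides.
applications = pd.DataFrame({
    "SK_ID_CURR": [1, 2, 3, 4, 5],
    "TARGET":     [0, 1, 0, 1, 0],
})
previous = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 3, 4, 5],
    "NAME_CONTRACT_STATUS": ["Approved", "Refused", "Approved", "Refused", "Approved", "Refused"],
})

# Inner join keeps only applicants present in both frames.
combined = applications.merge(previous, on="SK_ID_CURR", how="inner")
status_vs_target = pd.crosstab(combined["NAME_CONTRACT_STATUS"], combined["TARGET"])

# Decisions that disagree with TARGET are the two loss cases.
approved = combined["NAME_CONTRACT_STATUS"] == "Approved"
refused = combined["NAME_CONTRACT_STATUS"] == "Refused"
approved_but_risky = int((approved & (combined["TARGET"] == 1)).sum())  # credit loss
refused_but_safe = int((refused & (combined["TARGET"] == 0)).sum())     # interest loss
print(status_vs_target)
print(approved_but_risky, refused_but_safe)  # 2 3
```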
THANK YOU
-Harshad Surya Chandolu
