Bank Loan Case Study 2

This document presents a case study analyzing loan data from a bank to identify patterns that can help predict the likelihood of default. The analysis included handling missing data, identifying outliers, finding imbalances in the data set, and univariate, segmented univariate, and bivariate analysis. Insights identified clients with academic degrees, cooperative housing, and students/businessmen as less likely to default, while those with secondary education or housing/apartments were more likely. The project provided learning around risk analytics, data visualization, and univariate/bivariate analysis techniques. Excel was used for the data analysis and insights.

Uploaded by

Eric Lobo

PROJECT NAME:- BANK LOAN CASE STUDY
PRESENTED BY ERIC LOBO


ROADMAP

PROJECT DESCRIPTION
APPROACH
INSIGHTS
LEARNINGS
TECH STACK USED AND DRIVE LINK

PROJECT DESCRIPTION

• This case study shows how difficult it is for a bank to decide which clients to lend to, and how data and risk analytics can reduce the risk of lending to a potential defaulter: by finding relationships in the data, and by using visualizations to show which clients may have a higher chance of defaulting.

• A MORE DETAILED ANALYSIS IS DONE IN MY EXCEL SPREADSHEET


THINGS TO FIND OUT THROUGH THE PROJECT

1. Find the missing values. How are you going to handle them?
2. Identify outliers, if any. Are they truly outliers or just a part of the data?
3. Identify imbalance in the data set.
4. Carry out univariate, segmented univariate, and bivariate analysis.
5. Find the top 10 correlations for clients with payment difficulties.


APPROACH

Two data sets were provided: prev_application and application_data. I used the column description dataset to understand the data.

After importing the application data and the previous application data, I dropped some columns from previous application that I found irrelevant (no columns were dropped from application data).

Then I handled the missing data. For each column I first checked whether it was a categorical or a continuous variable. A categorical column cannot be filled with the mean (average), so in that case I used the mode (the most frequent value).
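The analysis itself was done in Excel. As a rough illustration of the same rule, here is a minimal Python sketch. The function name fill_missing and the toy columns are hypothetical, and filling continuous columns with the mean is an assumption implied by the text rather than something the slides state outright.

```python
from statistics import mean, mode


def fill_missing(values, categorical):
    """Fill None entries with the mode (categorical) or the mean (continuous)."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to learn a fill value from, so return the column unchanged.
        return list(values)
    fill = mode(present) if categorical else mean(present)
    return [fill if v is None else v for v in values]


# Hypothetical toy columns, not taken from the bank data.
housing = ["House / apartment", None, "With parents", "House / apartment"]
income = [100.0, None, 200.0, 300.0]

filled_housing = fill_missing(housing, categorical=True)
filled_income = fill_missing(income, categorical=False)
```

For a heavily skewed continuous column, the median may be a safer fill value than the mean.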
INSIGHTS

OUTLIERS

• Outliers are data points that lie at an abnormal distance from the rest of the data values.
• To identify outliers, I used the QUARTILE function to get the quartiles, then computed the IQR, the upper bound, and the lower bound.
• Any value above the upper bound or below the lower bound is an outlier.
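The bound calculation can be sketched in Python as follows. This is only an illustration, not the spreadsheet itself: the 1.5 multiplier is the conventional choice and an assumption here (the slides do not state it), the names iqr_bounds and find_outliers are hypothetical, and the sample values are made up.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower_bound, upper_bound) from the quartiles and the IQR.

    method="inclusive" interpolates the same way as Excel's QUARTILE function.
    k=1.5 is the usual convention and an assumption in this sketch.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that fall outside the lower and upper bounds."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


# Made-up sample with one obviously extreme value.
sample = [10, 12, 13, 14, 15, 16, 18, 100]
bounds = iqr_bounds(sample)
outliers = find_outliers(sample)
```

Whether a flagged value is a true outlier or a legitimate part of the data still has to be judged per column.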
[Charts: APPLICATION_DATA, PREVIOUS APPLICATION]
IMBALANCE OF DATA

IMBALANCE OF DATA IS A TERM USED WHEN THE DATA IN A SEGMENT IS UNEVENLY DISTRIBUTED BETWEEN THE CLASSES, FOR EXAMPLE WHEN ONE CLASS HAS A VERY HIGH OR VERY LOW COUNT COMPARED WITH THE OTHER CLASS OR CLASSES.

I USED PIVOT TABLES TO ANALYZE THE DATA BY SEGMENT AND FIND IMBALANCES IN THE DATA.

IN THE CHARTS BELOW I HAVE TAKEN TWO VARIABLES TO FIND IMBALANCE OF DATA:
1. FLAG_LAST_APPL_PER_CONTRACT
2. NAME_CONTRACT_STATUS
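A pivot-table count per class can be approximated in Python as below. This is a sketch under assumptions: the helper names are hypothetical, and the status values and counts are invented for illustration, not read from prev_application.

```python
from collections import Counter


def class_shares(labels):
    """Return each class's share of the rows, like a pivot table of counts."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Invented example values for a column such as NAME_CONTRACT_STATUS.
status = ["Approved"] * 6 + ["Refused"] * 3 + ["Canceled"] * 1

shares = class_shares(status)
ratio = imbalance_ratio(status)
```

A ratio close to 1 means the classes are roughly balanced; the larger the ratio, the stronger the imbalance.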
[Charts: APPLICATION_DATA.CSV, PREV_APPLICATION.CSV]

AS WE CAN SEE THE RATIO OF IMBALANCE OF DATA IN BOTH THE INFORMATIVE CHARTS

UNIVARIATE, SEGMENTED UNIVARIATE, BIVARIATE ANALYSIS

• Univariate analysis is a way to find patterns in one variable at a time.
• It can be done with descriptive statistics, using the Analysis ToolPak in Excel.
• Descriptive statistics include the mean, median, mode, standard deviation, etc.

Segmented univariate analysis means analyzing the data segment by segment and finding relations and patterns within each segment. It is useful when we want to compare the results of subgroups within a group, e.g. which region has the highest profit margin.

Bivariate analysis is a way to find a pattern or relation between two variables. It is used to find how strong or how weak the relation between them is, and it can reveal new patterns that help the business grow.

For bivariate analysis I took two appropriate variables, used the CORREL function to find the relationship between them, and then used scatter charts to visualize the results.

If the result is positive, one variable increases as the other increases. If it is negative, one variable increases as the other decreases.
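Excel's CORREL returns the Pearson correlation coefficient. A minimal Python sketch of the same calculation is below; the function name pearson and the three toy series are hypothetical and only illustrate the positive and negative cases.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient, the quantity Excel's CORREL returns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Toy series, invented for illustration only.
credit = [100.0, 200.0, 300.0, 400.0]
annuity = [11.0, 19.0, 32.0, 38.0]    # rises as credit rises
remaining = [40.0, 30.0, 20.0, 10.0]  # falls as credit rises

r_positive = pearson(credit, annuity)
r_negative = pearson(credit, remaining)
```

The result always lies between -1 and 1. Values near either end indicate a strong linear relation, and values near 0 a weak one.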
Univariate analysis

In this segmented analysis, we look at the number of defaulters by housing type. Clients living in a house/apartment are the most likely to be defaulters. There can be many reasons, for example the large loans taken for the apartments themselves. The lowest % of defaulters is among clients who stay in co-op apartments.

Another segmented analysis looks at the number of defaulters by client education. Clients with secondary education are the most likely to default, followed by those with higher education. Clients with an academic degree are less likely to default.
In this analysis we found that married people are likely to have payment difficulties.

Here, people are less likely to take loans on Saturdays and Sundays, and applications are evenly distributed across the other days of the week.
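The segmented comparisons above come down to a share of defaulters per segment. A hedged Python sketch of that calculation follows. The keys "housing" and "default" and every row are invented for illustration; they are not the real column names or results from the bank data.

```python
from collections import defaultdict


def default_rate_by_segment(rows, segment_key, default_key="default"):
    """Share of defaulters (flag 1) within each segment of the data."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        segment = row[segment_key]
        totals[segment] += 1
        defaults[segment] += row[default_key]
    return {segment: defaults[segment] / totals[segment] for segment in totals}


# Invented rows: 1 marks a client with payment difficulties, 0 all others.
rows = [
    {"housing": "House / apartment", "default": 1},
    {"housing": "House / apartment", "default": 0},
    {"housing": "House / apartment", "default": 0},
    {"housing": "House / apartment", "default": 0},
    {"housing": "Co-op apartment", "default": 0},
    {"housing": "Co-op apartment", "default": 0},
]

rates = default_rate_by_segment(rows, "housing")
```

Comparing rates rather than raw counts matters because the segments differ in size: the largest segment will usually have the most defaulters even when its rate is not the highest.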
BIVARIATE ANALYSIS
TOP 10 CORRELATION
CLIENTS WHO PROBABLY WON'T DEFAULT

• CLIENTS WITH AN ACADEMIC DEGREE ARE LESS LIKELY TO BE A DEFAULTER.
• CLIENTS WHO STAY IN CO-OPERATIVE HOUSES & OFFICE APARTMENTS ARE LESS LIKELY TO DELAY LOAN INSTALLMENTS.
• STUDENTS AND BUSINESSMEN HAVE LESS % OF DEFAULTS.
LEARNINGS

• This project helped me understand how EDA is used in the real world. I learned how visualization helps explain important data that is difficult to interpret from numbers alone.
• This project helped me understand the basics of risk analytics.
• I also got a good idea of what univariate, segmented univariate, and bivariate analysis are.
• I experienced a real-world scenario.
THANK YOU

Spreadsheet link: bank loan case study

TECH STACK USED :- EXCEL
