Problem Statement

This assignment focuses on applying Exploratory Data Analysis (EDA) in a banking context to understand risk analytics related to loan approvals. It aims to identify patterns in loan applications that indicate the likelihood of default, helping companies make informed lending decisions. The analysis will involve data cleaning, handling missing values, identifying outliers, and visualizing insights to support business objectives.


Problem Statement - I

Introduction

This assignment aims to give you an idea of applying EDA in a real business
scenario. In this assignment, apart from applying the techniques that you have
learnt in the EDA module, you will also develop a basic understanding of risk
analytics in banking and financial services and understand how data is used to
minimise the risk of losing money while lending to customers.

Business Understanding

Loan-providing companies find it hard to lend to people with insufficient or non-existent credit history; some consumers exploit this by becoming defaulters. Suppose you work for a consumer finance company that specialises in lending various types of loans to urban customers. You have to use EDA to analyse the patterns present in the data and ensure that applicants capable of repaying a loan are not rejected.

When the company receives a loan application, it has to decide on approval based on the applicant's profile. Two types of risk are associated with this decision:

If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.

The data given below contains information about each loan application at the time of applying. It covers two types of scenarios:

The client with payment difficulties: he/she had a late payment of more than X days on at least one of the first Y instalments of the loan in our sample.

All other cases: the payment was made on time.

When a client applies for a loan, there are four types of decisions that can be taken by the client or the company:

Approved: the company has approved the loan application.

Cancelled: the client cancelled the application sometime during approval, either because the client changed his/her mind about the loan or, in some cases, because a higher-risk client received worse pricing which he/she did not want.

Refused: the company rejected the loan (because the client does not meet its requirements, etc.).

Unused offer: the loan was cancelled by the client, but at different stages of the process.

In this case study, you will use EDA to understand how consumer attributes and
loan attributes influence the tendency to default.

Business Objectives

This case study aims to identify patterns which indicate whether a client will have difficulty paying their instalments. These patterns may be used to take actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate. This will ensure that consumers capable of repaying the loan are not rejected. Identifying such applicants using EDA is the aim of this case study.

In other words, the company wants to understand the driving factors (or driver
variables) behind loan default, i.e. the variables which are strong indicators of
default. The company can utilise this knowledge for its portfolio and risk
assessment.

To develop your understanding of the domain, you are advised to independently research a little about risk analytics; understanding the types of variables and their significance should be enough.

Data Understanding

Download the dataset from below.

This dataset has 3 files, as explained below:

1. 'application_data.csv' contains all the information about the client at the time of application, including whether the client has payment difficulties.

2. 'previous_application.csv' contains information about the client's previous loan applications: whether each previous application was Approved, Cancelled, Refused or an Unused offer.

3. 'columns_description.csv' is a data dictionary that describes the meaning of the variables.
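A first step with any of these files is to load them and inspect their shape, types, and missing values. The sketch below uses a tiny inline sample in place of the real file; the column names shown (SK_ID_CURR, TARGET, AMT_CREDIT) are illustrative assumptions, not guaranteed to match the actual dataset.

```python
import io
import pandas as pd

# In practice: app = pd.read_csv("application_data.csv"), and similarly
# for the other two files. Here a tiny inline sample stands in for the
# real file (column names and values are illustrative only).
sample = io.StringIO(
    "SK_ID_CURR,TARGET,AMT_CREDIT\n"
    "100001,0,406597.5\n"
    "100002,1,1293502.5\n"
)
app = pd.read_csv(sample)

# First checks on any new file: shape, dtypes, and missing values.
print(app.shape)
print(app.dtypes)
print(app.isna().sum())
```

The same three checks applied to 'previous_application.csv' give a quick sense of which columns will need cleaning before analysis.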

Problem Statement - II
Results Expected by Learners

Present the overall approach of the analysis in a presentation. Mention the problem
statement and the analysis approach briefly.

Identify the missing data and use an appropriate method to deal with it (remove columns or replace the values with an appropriate substitute).

Hint: In EDA it is not strictly necessary to replace missing values, but if you do replace them, clearly state the approach you use.
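One common approach, sketched below on a toy frame (the column names and the 50% threshold are illustrative assumptions, not requirements), is to drop columns that are mostly missing and impute the rest: median for numeric columns, mode for categorical ones.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for application_data.csv (illustrative names only).
df = pd.DataFrame({
    "AMT_INCOME": [100000, 120000, np.nan, 95000],
    "OCCUPATION": ["Laborer", np.nan, np.nan, "Manager"],
    "MOSTLY_NAN": [np.nan, np.nan, np.nan, 1.0],
})

# Percentage of missing values per column.
missing_pct = df.isna().mean() * 100

# Drop columns above a chosen threshold, then impute the rest:
# median for numeric, mode for categorical.
df = df.drop(columns=missing_pct[missing_pct > 50].index)
df["AMT_INCOME"] = df["AMT_INCOME"].fillna(df["AMT_INCOME"].median())
df["OCCUPATION"] = df["OCCUPATION"].fillna(df["OCCUPATION"].mode()[0])
```

Whatever thresholds and imputation rules you pick, state them explicitly in the notebook so the marker can follow the reasoning.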

Identify whether there are outliers in the dataset, and explain why you think each one is an outlier. Again, remember that for this exercise it is not necessary to remove any data points.
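A simple, commonly used rule for flagging outliers is the 1.5 × IQR criterion, sketched below on a toy income column (the values are made up for illustration). Note the points are flagged, not removed, in line with the instruction above.

```python
import pandas as pd

# Toy income column with one extreme value (illustrative only).
income = pd.Series([25_000, 30_000, 32_000, 28_000, 31_000, 1_000_000])

# Flag points lying outside 1.5 * IQR beyond the quartiles.
q1, q3 = income.quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (income < q1 - 1.5 * iqr) | (income > q3 + 1.5 * iqr)
print(income[is_outlier])  # the 1,000,000 value is flagged
```

A box plot of the same column makes the flagged points visible at a glance, which is useful for the presentation.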

Identify whether there is a data imbalance in the data and find the ratio of the imbalance.
Hint: How will you analyse the data in case of data imbalance? You can plot more than one type of plot to analyse the different aspects of the imbalance; for example, you can choose your own scale for the graphs, i.e. plot in terms of percentages or absolute values. Do this analysis for the target variable in the dataset (clients with payment difficulties vs. all other cases). Use a mix of univariate and bivariate analysis.
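The imbalance ratio itself is a one-liner once the target counts are known; the sketch below uses a toy target column (the 92:8 split is invented for illustration, not taken from the dataset).

```python
import pandas as pd

# Toy target column: 1 = payment difficulties, 0 = all other cases.
target = pd.Series([0] * 92 + [1] * 8)

counts = target.value_counts()
ratio = counts[0] / counts[1]
print(counts)
print(f"imbalance ratio {ratio:.1f} : 1")

# For the presentation, plot both scales mentioned in the hint, e.g.:
# target.value_counts().plot(kind="bar")                      # absolute
# target.value_counts(normalize=True).mul(100).plot(kind="bar")  # percent
```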

Hint: Since there are a lot of columns, you can run your analysis in loops for the
appropriate columns and find the insights.
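The looping hint above might be realised as below: compute the default rate per category for each categorical column in turn. The toy frame and column names are illustrative assumptions.

```python
import pandas as pd

# Toy frame; in practice, loop over the relevant columns of
# application_data.csv (the names here are illustrative only).
df = pd.DataFrame({
    "TARGET": [0, 1, 0, 0, 1, 0],
    "GENDER": ["F", "M", "F", "M", "M", "F"],
    "CONTRACT": ["Cash", "Cash", "Revolving", "Cash", "Revolving", "Cash"],
})

# Default rate (mean of TARGET) per category, looped over the columns.
for col in ["GENDER", "CONTRACT"]:
    print(df.groupby(col)["TARGET"].mean())
```

The same loop structure works for plotting: one figure per column, saved or displayed inside the loop body.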

Explain the results of univariate, segmented univariate, bivariate analysis, etc. in business terms.

Find the top 10 correlations for the clients with payment difficulties and for all other cases (the target variable). Note that you have to find the top correlations by segmenting the data frame with respect to the target variable, then finding the top correlations for each segment and checking whether any insight emerges. Say there are 5 + 1 (target) variables in a dataset: Var1, Var2, Var3, Var4, Var5, Target. If you have to find the top 3 correlations, they could be: Var1 & Var2, Var2 & Var3, Var1 & Var3. The target variable will not feature in these correlations, as it is a categorical variable and not a continuous one.
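The segment-then-correlate procedure described above can be sketched as follows. The data here is randomly generated (Var4 is deliberately constructed to correlate with Var1), and the helper name `top_correlations` is a hypothetical illustration, not part of any library.

```python
import numpy as np
import pandas as pd

# Toy numeric data with a target column (names and values are illustrative).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["Var1", "Var2", "Var3"])
df["Var4"] = df["Var1"] * 0.9 + rng.normal(scale=0.1, size=200)  # correlated pair
df["TARGET"] = rng.integers(0, 2, size=200)

def top_correlations(frame, n=3):
    """Top-n absolute pairwise correlations, excluding the target column."""
    corr = frame.drop(columns="TARGET").corr().abs()
    # Keep only the upper triangle so each pair appears once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return upper.unstack().dropna().sort_values(ascending=False).head(n)

# Segment by the target and compare the top pairs in each segment.
for label, segment in df.groupby("TARGET"):
    print(f"TARGET={label}\n{top_correlations(segment)}")
```

Comparing the two printed lists side by side is where the insight comes from: a pair that is strongly correlated only in the defaulter segment is worth commenting on.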

Include visualisations and summarise the most important results in the presentation. You are free to choose the graphs that best explain the numerical/categorical variables. Insights should explain why a variable is important for differentiating the clients with payment difficulties from all other cases.

You need to submit one or two IPython notebooks which clearly explain the thought process behind your analysis (in comments or markdown text), the code, and the relevant plots. The presentation file needs to be in PDF format and should contain the points discussed above with the necessary visualisations. All visualisations and plots must be produced in Python (and be present in the IPython notebook), though they may be recreated in Tableau for better aesthetics in the PPT file.

upGrad & IIITB | Data Science Program - July 2024

Evaluation Rubrics

Data understanding (20%)

Meets expectations:
All data quality issues are correctly identified and reported. Wherever required, the meanings of the variables are correctly interpreted and written in the comments or text.

Does not meet expectations:
Data quality issues, such as missing values and outliers, are overlooked or not identified correctly. The variables are interpreted incorrectly, or the meaning of the variables is not mentioned.

Data Cleaning and Manipulation (10%)

Meets expectations:
Data quality issues are addressed in the right way (missing-value imputation analysis, other kinds of data redundancies, etc.). If applicable, data is converted to a suitable and convenient format to work with, using the right methods. Manipulation of strings and dates is done correctly wherever required.

Does not meet expectations:
Data quality issues are not addressed correctly. The variables are not converted to an appropriate format for analysis. String and date manipulation is not done correctly or is done using unnecessarily complex methods.

Data analysis (50%)

Meets expectations:
The right problem is solved, coherent with the needs of the business, and the analysis has a clear structure with a flow that is easy to understand. Univariate and segmented univariate analysis is done correctly, with appropriate realistic assumptions made wherever required, and the analysis successfully identifies at least 5 important driver variables (i.e. variables which are strong indicators of default). Business-driven, type-driven and data-driven metrics are created for the important variables and utilised in the analysis, and the explanation for creating the derived metrics is mentioned and reasonable. Bivariate analysis is performed correctly and identifies the important combinations of driver variables; the combinations are chosen such that they make business or analytical sense. The most useful insights are explained correctly in the comments. Appropriate plots are created to present the results of the analysis: the choice of plot for each case is correct, the plots clearly present the relevant insights and are easy to read, and the axes and important data points are labelled correctly.

Does not meet expectations:
The analyses do not address the right problem or deviate from the business objectives, and the analysis lacks a clear structure and is not easy to follow. The univariate and bivariate analysis is not performed in sufficient detail, so some crucial insights are missed and the analyses are unable to identify enough important driver variables. New metrics are not derived where appropriate, the explanation for creating the derived metrics is either not mentioned or not reasonable, or the derived metrics are analysed incorrectly or insufficiently utilised. Important insights are not mentioned in the report or the Python file. Relevant plots are not created, or the choice of plots is not ideal and the plots are difficult to interpret or lack clarity or neatness; relevant insights are not clearly presented by the plots, and the axes and important data points are not labelled correctly or neatly.

Presentation and Recommendations (10%)

Meets expectations:
The presentation has a clear structure, is not too long, and explains the most important results concisely in simple language. The recommendations to solve the problems are realistic, actionable and coherent with the analysis. If any assumptions are made, they are stated clearly.

Does not meet expectations:
The presentation lacks structure, is too long, or does not put emphasis on the important observations, and the language used is too complicated for business people to understand. The recommendations to solve the problems are unrealistic, non-actionable or incoherent with the analysis. The presentation contains unnecessary details or lacks important ones. Assumptions made, if any, are not stated clearly.

Conciseness and readability of the code (10%)

Meets expectations:
The code is concise and syntactically correct. Wherever appropriate, built-in functions and standard libraries are used instead of writing long code (if-else statements, for loops, etc.). Custom functions are used to perform repetitive tasks. The code is readable, with appropriately named variables and detailed comments written wherever necessary.

Does not meet expectations:
Long and complex code is used instead of shorter built-in functions. Custom functions are not used to perform repetitive tasks, resulting in the same piece of code being repeated multiple times. Code readability is poor because of vaguely named variables or a lack of comments wherever necessary.
