Important Notes For Project 6
Your company faces challenges with some customers defaulting on their
loans. The aim is to use Exploratory Data Analysis (EDA) to identify patterns in the
data to make informed loan approval decisions.
Data Description:
previous_application.csv
Identification
SK_ID_CURR: ID of loan in our sample
Contract Information
NAME_CONTRACT_TYPE: Contract product type (cash loan, consumer loan [POS], ...) of
the previous application
AMT_APPLICATION: Amount of credit the client asked for on the previous application
AMT_CREDIT: Final credit amount on the previous application. This differs from
AMT_APPLICATION in that AMT_APPLICATION is the amount the client initially applied
for, whereas AMT_CREDIT is the amount the client actually received, which could have
changed during our approval process
AMT_GOODS_PRICE: Price of the goods that the client asked for (if applicable) on the
previous application
Application Details
WEEKDAY_APPR_PROCESS_START: On which day of the week the client applied for
the previous application
HOUR_APPR_PROCESS_START: Approximately at what hour of the day the client applied for
the previous application
NFLAG_LAST_APPL_IN_DAY: Flag indicating whether the application was the client's last
application of the day. Clients sometimes submit several applications in one day. Rarely, an
error in our system could also cause one application to appear in the database twice
Interest Rates
RATE_DOWN_PAYMENT: Down payment rate normalized on previous credit
DAYS_DECISION: When the decision about the previous application was made, relative to
the current application
NAME_PAYMENT_TYPE: Payment method that the client chose to pay for the previous
application
NAME_CLIENT_TYPE: Whether the client was an old or a new client when applying for the
previous application
NAME_GOODS_CATEGORY: What kind of goods the client applied for in the previous
application
CHANNEL_TYPE: Channel through which we acquired the client on the previous application
NAME_YIELD_GROUP: Interest rate of the previous application, grouped into small, medium,
and high
Timeline
DAYS_FIRST_DRAWING: When the first disbursement of the previous application took place,
relative to the application date of the current application
DAYS_FIRST_DUE: When the first due date of the previous application was supposed to be,
relative to the application date of the current application
DAYS_LAST_DUE: When the last due date of the previous application was, relative to the
application date of the current application
Insurance
NFLAG_INSURED_ON_APPROVAL: Did the client request insurance during the previous
application
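The difference between AMT_APPLICATION and AMT_CREDIT described above can be turned into a single number per previous application. The following is a minimal sketch in plain Python, not a prescribed method: the helper name approval_ratio and the toy rows are invented for illustration, and treating a zero or missing requested amount as "undefined" is an assumption.

```python
def approval_ratio(amt_application, amt_credit):
    """Return AMT_CREDIT / AMT_APPLICATION, or None when it is undefined."""
    # Assumption: a missing or zero requested amount gives no meaningful ratio.
    if amt_application is None or amt_credit is None or amt_application == 0:
        return None
    return amt_credit / amt_application


# Toy rows, made up for illustration only (not taken from the real file).
previous_rows = [
    {"SK_ID_CURR": 1, "AMT_APPLICATION": 100000.0, "AMT_CREDIT": 90000.0},
    {"SK_ID_CURR": 2, "AMT_APPLICATION": 50000.0, "AMT_CREDIT": 55000.0},
    {"SK_ID_CURR": 3, "AMT_APPLICATION": 0.0, "AMT_CREDIT": 0.0},
]

ratios = [
    approval_ratio(row["AMT_APPLICATION"], row["AMT_CREDIT"])
    for row in previous_rows
]
```

A ratio below 1 means the client received less than requested, and a ratio above 1 means more.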
application_data.csv
Identification
SK_ID_CURR: ID of loan in our sample
Loan Outcome
TARGET: Target variable (1 - client with payment difficulties: he/she had late payment more
than X days on at least one of the first Y installments of the loan in our sample, 0 - all other
cases)
Client Information
CODE_GENDER: Gender of the client
AMT_GOODS_PRICE: For consumer loans it is the price of the goods for which the loan is
given
Property Details
NAME_TYPE_SUITE: Who was accompanying the client when applying for the loan
NAME_HOUSING_TYPE: What is the housing situation of the client (renting, living with
parents, ...)
Client's Occupation
OCCUPATION_TYPE: What kind of occupation does the client have
Application Details
WEEKDAY_APPR_PROCESS_START: On which day of the week did the client apply for
the loan
HOUR_APPR_PROCESS_START: Approximately at what hour did the client apply for the
loan
Client's Region
REGION_POPULATION_RELATIVE: Normalized population of region where client lives
(higher number means the client lives in more populated region)
REGION_RATING_CLIENT_W_CITY: Our rating of the region where the client lives, taking
the city into account (1,2,3)
Client's Age
DAYS_BIRTH: Client's age in days at the time of application
Client's Employment
DAYS_EMPLOYED: How many days before the application the person started current
employment
External Sources
EXT_SOURCE_1: Normalized score from external data source
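Day-based columns such as DAYS_BIRTH and DAYS_EMPLOYED are usually easier to read in years. The sketch below shows one possible conversion. It rests on assumptions: the description above does not state the sign convention of these columns, so the absolute value is used, 365.25 days per year is an approximation, and the helper names days_to_years and age_band are hypothetical.

```python
def days_to_years(days):
    """Convert a day count to years, ignoring the sign of the input."""
    # Assumption: the sign only encodes direction relative to the application
    # date, so the magnitude is what matters here.
    return abs(days) / 365.25


def age_band(age_years, width=10):
    """Label an age with a fixed-width band, e.g. 34.7 -> '30-39'."""
    lower = int(age_years // width) * width
    return f"{lower}-{lower + width - 1}"


# Toy value, made up for illustration only.
example_age = days_to_years(-12784)   # roughly 35 years
example_band = age_band(example_age)
```

Banding ages this way makes it possible to compare groups of clients instead of individual day counts.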
B. Identify Outliers
D. Various Analyses
E. Identify Correlations
When working through the tasks, always ensure that you understand the context and
meaning of each column.
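For the outlier and correlation tasks listed above, the sketch below shows two common techniques using only the Python standard library. These are illustrative choices rather than techniques the project prescribes: the 1.5 x IQR rule for flagging outliers and the Pearson coefficient for correlation. The function names and the toy numbers are invented.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Toy numbers, made up for illustration only.
incomes = [100, 120, 130, 150, 160, 170, 5000]
flagged = iqr_outliers(incomes)

credit = [1.0, 2.0, 3.0, 4.0]
goods_price = [2.0, 4.0, 6.0, 8.0]
r = pearson(credit, goods_price)
```

A flagged value is not automatically an error. Check it against the meaning of the column before dropping or capping it.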
Points to Remember:
1. Understanding of the Financial Sector:
● Credit Systems: Familiarity with how credit systems operate, including credit scores,
credit histories, and their significance in loan approval processes.
● Loan Types: Knowledge of different types of loans such as cash loans, revolving
loans, and their characteristics.
● Risk Management: An understanding of how financial institutions manage risks
associated with lending.
● Credit Bureaus: Familiarity with how credit bureaus operate and the kind of data
they provide which can be valuable in loan decision processes.
● Significance of External Ratings: Understanding ratings from external sources and
their relevance in predicting loan defaults.
6. Business Implications:
Important Hypotheses:
1. Demographic Factors:
● Gender & Payment Difficulty: Male clients might exhibit different payment
behaviors than female clients.
● Income & Payment Difficulty: Clients with lower total incomes might face more
challenges in making timely payments.
● Age & Payment Difficulty: Younger clients might face more payment difficulties
than older clients.
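Hypotheses of this kind can be checked by comparing the share of TARGET = 1 clients across groups. The following is a minimal sketch, assuming each row is a dictionary keyed by column name. The helper default_rate_by, the toy rows, and the category labels "M" and "F" are assumptions made for illustration, since the actual CODE_GENDER values are not listed above.

```python
from collections import defaultdict


def default_rate_by(rows, key):
    """Return {group: share of rows with TARGET == 1} for the given column."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in rows:
        group = row[key]
        totals[group] += 1
        defaults[group] += row["TARGET"]  # TARGET is 0 or 1
    return {group: defaults[group] / totals[group] for group in totals}


# Toy rows, made up for illustration only.
application_rows = [
    {"CODE_GENDER": "M", "TARGET": 1},
    {"CODE_GENDER": "M", "TARGET": 0},
    {"CODE_GENDER": "F", "TARGET": 0},
    {"CODE_GENDER": "F", "TARGET": 0},
    {"CODE_GENDER": "F", "TARGET": 1},
    {"CODE_GENDER": "M", "TARGET": 1},
]

rates = default_rate_by(application_rows, "CODE_GENDER")
```

The same helper works for any categorical column, for example an age band derived from DAYS_BIRTH. A difference in rates between groups is a pattern to investigate, not proof of a cause.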
2. Loan Characteristics:
● Loan Amount & Payment Difficulty: Higher loan amounts might be associated with
increased payment difficulties.
● Loan Type & Payment Difficulty: Certain types of loans, like cash loans, might be
associated with more payment difficulties than revolving loans.
● Loan Term & Payment Difficulty: Short-term loans might see more initial payment
difficulties than long-term loans.
● External Ratings & Payment Difficulty: Lower scores from external sources might
correlate with increased payment difficulties.
6. Application Details:
● Application Timing & Payment Difficulty: Applications made during certain times,
like weekends or late hours, might be associated with increased payment difficulties.
● Property Ownership & Payment Difficulty: Clients who own real estate or a car
might face different payment challenges than those who don't.
8. Family Status:
● Family Size & Payment Difficulty: Clients with larger families (more children or
dependents) might face more payment difficulties.
9. Regional Factors:
● Client's Region & Payment Difficulty: Clients from certain regions or with specific
regional ratings might exhibit different payment behaviors.
● Previous Loan Purpose & Current Payment Difficulty: The purpose of previous
loans might influence the likelihood of payment difficulties for the current loan. For
instance, if a previous loan was for urgent medical needs, the client might face more
financial strain.
● Previous Loan Contract Type & Current Payment Difficulty: The type of contract
for previous loans (like cash loans, consumer loans) might influence current payment
behaviors.
● Interest Rates & Payment Difficulty: Clients with higher interest rates on their
previous loans might face more payment difficulties.
These hypotheses, rooted in the project's context, can guide the exploratory data
analysis process. They can be validated or refuted using the datasets provided, which
will offer insights into the factors influencing payment difficulties.
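Hypotheses about previous loans need both files, linked through SK_ID_CURR. The sketch below shows one way to do that in plain Python. It assumes that one SK_ID_CURR can appear several times in previous_application.csv, and it counts each client once per previous contract type. The function name, the toy rows, and the contract type labels are invented for illustration.

```python
from collections import defaultdict


def default_rate_by_previous_type(application_rows, previous_rows):
    """Share of TARGET == 1 clients per previous NAME_CONTRACT_TYPE."""
    target_by_id = {row["SK_ID_CURR"]: row["TARGET"] for row in application_rows}

    # Use sets so a client with several previous applications of the same
    # type is counted only once for that type.
    ids_by_type = defaultdict(set)
    for row in previous_rows:
        if row["SK_ID_CURR"] in target_by_id:  # skip IDs with no current loan
            ids_by_type[row["NAME_CONTRACT_TYPE"]].add(row["SK_ID_CURR"])

    return {
        contract_type: sum(target_by_id[i] for i in ids) / len(ids)
        for contract_type, ids in ids_by_type.items()
    }


# Toy rows, made up for illustration only.
application_rows = [
    {"SK_ID_CURR": 1, "TARGET": 1},
    {"SK_ID_CURR": 2, "TARGET": 0},
    {"SK_ID_CURR": 3, "TARGET": 0},
    {"SK_ID_CURR": 4, "TARGET": 0},
]
previous_rows = [
    {"SK_ID_CURR": 1, "NAME_CONTRACT_TYPE": "Cash loans"},
    {"SK_ID_CURR": 1, "NAME_CONTRACT_TYPE": "Cash loans"},
    {"SK_ID_CURR": 2, "NAME_CONTRACT_TYPE": "Consumer loans"},
    {"SK_ID_CURR": 3, "NAME_CONTRACT_TYPE": "Cash loans"},
    {"SK_ID_CURR": 4, "NAME_CONTRACT_TYPE": "Consumer loans"},
    {"SK_ID_CURR": 5, "NAME_CONTRACT_TYPE": "Cash loans"},
]

rates_by_previous_type = default_rate_by_previous_type(application_rows, previous_rows)
```

Counting clients rather than previous applications keeps one client with many past loans from dominating a group. Whether that is the right unit depends on the question being asked.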