Documentation - Group Project FP 2019
Sébastien Pavot
Edward Vrijghem
Report
When comparing the average salary per district size, we concluded that Prague, the capital of the
Czech Republic, scored best in both categories. Furthermore, we observed a slight correlation between
district size and average salary: our clients earned more in the capital district than in the
smaller, less populated districts.
Digging deeper into loan statuses, we start to notice some age-group-related differences. Among
contracts where the loan was not paid, we see an increase for clients in their 50s. This could
indicate an increased risk when lending to this group, which the risk assessment division should
take into consideration. In addition, the high level of indebted clients (indicated in green) who take
a loan in their 20s to 50s should also be considered.
Carlos Montenegro
Sébastien Pavot
Edward Vrijghem
Apart from this, every trend is as expected. The youngest and oldest age groups tend to borrow less in
general. First, the 60-69 group tends to spend less money on so-called investments (houses, cars
and others) than younger generations. Second, the youngest group (10-19 years old) holds few loans,
most likely because the legal minimum age to borrow money is 18 in the Czech Republic.
As for the cards, we can see huge opportunities: the grey part of each bar chart represents clients
who are not yet in possession of a card. The clients situated in “the grey zone” could form a target
group for later advertising actions.
For average loan payments per district (a direct derivative of the average loan), we included the top
regions. Surprisingly, the capital district, Praha, did not come out on top. This information could be
useful when analyzing marketing opportunities. One possible explanation is the capital's large
population, which gives a more reliable indication of the average loan. Another is that there is more
competition in the capital to get a decent loan, making banks more reluctant to give out big loans. In
both cases, the bank could take this into account when computing credit scores or assigning loans.
As for the relation between the number of transactions and the balance of our clients, we see a
strong positive correlation. Wealthier clients tend to make more transactions, as expected.
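A relation of this kind can be checked with a Pearson correlation in pandas. The sketch below is only
an illustration: the table, the column names `account_id` and `balance`, and all values are made-up
assumptions, not the project data or the code used for the report.

```python
import pandas as pd

# Toy transaction table (hypothetical columns and values, not the project data).
trans = pd.DataFrame({
    "account_id": [1, 1, 1, 1, 2, 2, 3, 3, 3],
    "balance": [900.0, 1100.0, 1000.0, 1000.0, 300.0, 500.0, 500.0, 600.0, 700.0],
})

# One row per account: number of transactions and average balance.
per_account = trans.groupby("account_id").agg(
    n_transactions=("balance", "count"),
    avg_balance=("balance", "mean"),
)

# Pearson correlation between transaction count and average balance.
corr = per_account["n_transactions"].corr(per_account["avg_balance"])
print(per_account)
print(f"Pearson correlation: {corr:.2f}")
```

A value close to 1 would support the positive relation described above; a value near 0 would not.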
Trends
As for trends, we see a steady decrease in the average yearly loan payment as well as in the interest
credited. The two naturally go together. This could be an industry trend or a company-specific trend; in
both cases it is a very worrying indicator and should be examined thoroughly.
Yearly transactions and total clients increased from 1993 to 1994, then remained steady until 1997. After
1997 we notice a decrease, which is as worrying as the decreasing average loan payments and average
interest credited. All of these factors need a solid analysis and should be reviewed.
Technical Aspects
1. nunique(): which gave us the number of unique values for specific variables
2. head(): which in most cases was used to return the first 5 rows of each dataset
3. info(): which gave us a summary for each variable, containing the number of non-missing values
and its type.
4. describe(): which gave summary statistics such as the minimum and maximum values, which was
especially important to check the ranges in which our variables fluctuated.
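As a minimal sketch of how these four calls behave, the example below runs them on a small toy table.
The table name `clients`, its columns, and its values are assumptions made for illustration only.

```python
import pandas as pd

# Toy client table (hypothetical columns and values, not the project data).
clients = pd.DataFrame({
    "client_id": [1, 2, 3, 4, 5, 6],
    "district_id": [1, 1, 2, 3, 3, 3],
    "avg_salary": [12541.0, 12541.0, 8507.0, None, 9650.0, 9650.0],
})

n_unique = clients.nunique()   # distinct non-missing values per column
first_rows = clients.head()    # first 5 rows by default
summary = clients.describe()   # count, mean, std, min, quartiles, max

print(n_unique)
print(first_rows)
clients.info()                 # prints non-null counts and dtypes itself
print(summary.loc[["min", "max"]])
```

Note that info() prints its summary directly instead of returning it, which is why it is not wrapped
in print().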
Libraries
pandas: this library is used to convert variables to datetime format, to merge tables, and to
create dummy variables for categorical variables, among others.
numpy: this library is used to calculate a time difference, to find NaN values, to load a data file,
to fix a random state, to obtain absolute values, and to arrange bins, among others.
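The sketch below illustrates several of these uses on toy data. It is not the project code: the table
and column names, the values, and the reference date are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy tables (hypothetical columns and values, not the project data).
loans = pd.DataFrame({
    "account_id": [1, 2, 3],
    "date": ["1993-07-05", "1994-01-15", "1996-11-30"],
    "status": ["A", "B", "A"],
    "amount": [96396.0, np.nan, 127080.0],
})
accounts = pd.DataFrame({"account_id": [1, 2, 3], "district_id": [18, 1, 5]})

# pandas: datetime conversion, table merge, dummy variables.
loans["date"] = pd.to_datetime(loans["date"])
merged = loans.merge(accounts, on="account_id", how="left")
dummies = pd.get_dummies(merged["status"], prefix="status")

# numpy: time difference in days relative to an assumed reference date.
reference = np.datetime64("1999-01-01")
merged["days_since_loan"] = (
    (reference - merged["date"].to_numpy()) / np.timedelta64(1, "D")
)

# numpy: locate NaN values and build bin edges.
missing_amount = np.isnan(merged["amount"].to_numpy())
bins = np.arange(0, 200001, 50000)
```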
Functions created
explore:
o Logic: this function makes data exploration simpler by printing the main
information about the data set
o Input: the data set to be analyzed
o Output: it prints the output of the following functions: describe(), info(), nunique()
to_month_gender:
o Logic: this function extracts the gender from a value that encodes both the birthday and the gender of
a person (the month is stored as 50 + MonthNumber if the person is a woman)
o Input: a data set value
o Output: the person’s gender
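The two functions described above could look roughly like the sketch below. This is a reconstruction
from the descriptions, not the original project code. It assumes the birthday value is a number in
YYMMDD form, and the return labels "F" and "M" and the column name `birth_number` are also assumptions.

```python
import pandas as pd

def explore(df):
    """Print describe(), info() and nunique() for a data set."""
    print(df.describe())
    df.info()            # info() prints directly and returns None
    print(df.nunique())

def to_month_gender(birth_number):
    """Return the gender encoded in a YYMMDD value (assumed format).

    The month part is stored as 50 + MonthNumber for women.
    """
    month = (int(birth_number) // 100) % 100   # middle two digits
    return "F" if month > 50 else "M"

# Usage on a toy table (hypothetical values).
clients = pd.DataFrame({"birth_number": [706213, 450204]})
clients["gender"] = clients["birth_number"].apply(to_month_gender)
explore(clients)
```

Applying the function with apply() keeps it a simple scalar function, which matches the single
"data set value" input described above.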
Variable information