Apex Financial Services Loan Data Automation

Uploaded by

thomasmiller1618

Install Libraries

# Install necessary libraries

!pip install pandas matplotlib seaborn openpyxl

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# This makes sure that the matplotlib plots are displayed within the notebook
%matplotlib inline

Requirement already satisfied: pandas in

/usr/local/lib/python3.10/dist-packages (2.0.3)
Requirement already satisfied: matplotlib in
/usr/local/lib/python3.10/dist-packages (3.7.1)
Requirement already satisfied: seaborn in
/usr/local/lib/python3.10/dist-packages (0.13.1)
Requirement already satisfied: openpyxl in
/usr/local/lib/python3.10/dist-packages (3.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in
/usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in
/usr/local/lib/python3.10/dist-packages (from pandas) (2023.4)
Requirement already satisfied: tzdata>=2022.1 in
/usr/local/lib/python3.10/dist-packages (from pandas) (2024.1)
Requirement already satisfied: numpy>=1.21.0 in
/usr/local/lib/python3.10/dist-packages (from pandas) (1.25.2)
Requirement already satisfied: contourpy>=1.0.1 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (1.2.1)
Requirement already satisfied: cycler>=0.10 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (4.51.0)
Requirement already satisfied: kiwisolver>=1.0.1 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (1.4.5)
Requirement already satisfied: packaging>=20.0 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (24.0)
Requirement already satisfied: pillow>=6.2.0 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (9.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in
/usr/local/lib/python3.10/dist-packages (from matplotlib) (3.1.2)
Requirement already satisfied: et-xmlfile in
/usr/local/lib/python3.10/dist-packages (from openpyxl) (1.1.0)
Requirement already satisfied: six>=1.5 in
/usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2-
>pandas) (1.16.0)

Loading the Excel data into a pandas DataFrame

# sheet_name=None loads every sheet and returns a dict of DataFrames keyed by sheet name
data = pd.read_excel('/content/PDA APEX Loan Data.xlsx', sheet_name=None)

Data Inspection

Perform an initial inspection to understand the data structure and check for any
inconsistencies or issues that might need to be addressed.

# Load data from the 'CW1' sheet specifically

loan_data = pd.read_excel('/content/PDA APEX Loan Data.xlsx',
sheet_name='CW1')

# Display the first few rows of the dataset

print(loan_data.head())

# Display the data types of each column

print(loan_data.dtypes)

# Get a concise summary of the DataFrame
# (info() prints directly and returns None, which is why the output ends with "None")

print(loan_data.info())

Loan_ID Gender Married Dependents Graduate Self_Employed \

0 2284 1 0 0 0 0
1 2287 2 0 0 1 0
2 2288 1 1 2 0 0
3 2296 1 0 0 0 0
4 2297 1 0 0 1 0

ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \

0 3902 1666.0 109 333
1 1500 1800.0 103 333
2 2889 0.0 45 180
3 2755 0.0 65 300
4 2500 20000.0 103 333

Credit_History Property_Area Loan_Status

0 1 3 Y
1 0 2 N
2 0 1 N
3 1 3 N
4 1 2 Y
Loan_ID int64
Gender int64
Married int64
Dependents int64
Graduate int64
Self_Employed int64
ApplicantIncome int64
CoapplicantIncome float64
LoanAmount int64
Loan_Amount_Term int64
Credit_History int64
Property_Area int64
Loan_Status object
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 247 entries, 0 to 246
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Loan_ID 247 non-null int64
1 Gender 247 non-null int64
2 Married 247 non-null int64
3 Dependents 247 non-null int64
4 Graduate 247 non-null int64
5 Self_Employed 247 non-null int64
6 ApplicantIncome 247 non-null int64
7 CoapplicantIncome 247 non-null float64
8 LoanAmount 247 non-null int64
9 Loan_Amount_Term 247 non-null int64
10 Credit_History 247 non-null int64
11 Property_Area 247 non-null int64
12 Loan_Status 247 non-null object
dtypes: float64(1), int64(11), object(1)
memory usage: 25.2+ KB
None

After successfully loading the data, we can see that every column has a non-null value in all
247 entries. This indicates that there are no missing values in the dataset, which
simplifies the cleaning process.
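
The non-null counts reported by info() can also be confirmed with an explicit missing-value count. The following is a minimal sketch on a small stand-in DataFrame; the name `sample` and the injected NaN are assumptions for illustration, not part of the loan file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan_data; one NaN is injected on purpose
# so the check has something to find.
sample = pd.DataFrame({
    "Loan_ID": [2284, 2287, 2288, 2296],
    "ApplicantIncome": [3902, 1500, np.nan, 2755],
    "LoanAmount": [109, 103, 45, 65],
})

# Missing values per column; all zeros would mean nothing to impute.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame.
has_missing = bool(sample.isna().any().any())
print("Any missing values:", has_missing)
```

On the real data, the same two calls applied to loan_data would be expected to report zero for every column, in line with the info() output.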

Data Cleaning
# Checking for duplicates
if loan_data['Loan_ID'].duplicated().any():
    loan_data = loan_data.drop_duplicates('Loan_ID')
    print("Duplicates removed.")
else:
    print("No duplicates found.")

# Converting categorical variables to 'category' dtype
categorical_vars = ['Gender', 'Married', 'Dependents', 'Graduate',
                    'Self_Employed',
                    'Credit_History', 'Property_Area', 'Loan_Status']
loan_data[categorical_vars] = loan_data[categorical_vars].astype('category')

print("Data types after conversion:")

print(loan_data.dtypes)

No duplicates found.
Data types after conversion:
Loan_ID int64
Gender category
Married category
Dependents category
Graduate category
Self_Employed category
ApplicantIncome int64
CoapplicantIncome float64
LoanAmount int64
Loan_Amount_Term int64
Credit_History category
Property_Area category
Loan_Status category
dtype: object
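
The categorical columns are stored as integer codes, which makes tables and plot legends harder to read. The sketch below shows one way to attach readable labels. The mappings follow the codes described in this notebook's commentary (Gender 1/2 as Male/Female, Property_Area 1/2/3 as Urban/Semiurban/Rural); they are assumptions, since the workbook's own codebook is not shown, and `sample` is an illustrative stand-in.

```python
import pandas as pd

# Illustrative rows only; the real data comes from the 'CW1' sheet.
sample = pd.DataFrame({
    "Gender": [1, 2, 1, 1],
    "Property_Area": [3, 2, 1, 3],
})

# Assumed code-to-label mappings, taken from the notebook's commentary.
gender_labels = {1: "Male", 2: "Female"}
area_labels = {1: "Urban", 2: "Semiurban", 3: "Rural"}

# Map codes to labels and store the result as 'category' dtype.
labelled = sample.assign(
    Gender=sample["Gender"].map(gender_labels).astype("category"),
    Property_Area=sample["Property_Area"].map(area_labels).astype("category"),
)
print(labelled)
print(labelled.dtypes)
```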

Initial Descriptive Analysis

# Basic descriptive statistics
print(loan_data.describe(include='all'))

# Specific descriptive insights

print("Total Loan Amount Approved:",
loan_data[loan_data['Loan_Status'] == 'Y']['LoanAmount'].sum())
print("Average Loan Amount:", loan_data['LoanAmount'].mean())
print("Average Loan Term:", loan_data['Loan_Amount_Term'].mean())

Loan_ID Gender Married Dependents Graduate

Self_Employed \
count 247.000000 247.0 247.0 247.0 247.0
247.0
unique NaN 2.0 2.0 4.0 2.0
2.0
top NaN 1.0 1.0 0.0 1.0
0.0
freq NaN 198.0 159.0 141.0 184.0
212.0
mean 2544.161943 NaN NaN NaN NaN
NaN
std 302.300553 NaN NaN NaN NaN
NaN
min 1900.000000 NaN NaN NaN NaN
NaN
25% 2369.500000 NaN NaN NaN NaN
NaN
50% 2560.000000 NaN NaN NaN NaN
NaN
75% 2784.500000 NaN NaN NaN NaN
NaN
max 2990.000000 NaN NaN NaN NaN
NaN

ApplicantIncome CoapplicantIncome LoanAmount

Loan_Amount_Term \
count 247.000000 247.000000 247.000000
247.000000
unique NaN NaN NaN
NaN
top NaN NaN NaN
NaN
freq NaN NaN NaN
NaN
mean 5403.688259 1776.918704 152.627530
320.040486
std 6485.890426 3993.412132 89.516037
60.030399
min 210.000000 0.000000 9.000000
12.000000
25% 2752.500000 0.000000 100.000000
333.000000
50% 3691.000000 1250.000000 130.000000
333.000000
75% 5822.000000 2241.000000 176.500000
333.000000
max 81000.000000 41667.000000 600.000000
480.000000

Credit_History Property_Area Loan_Status

count 247.0 247.0 247
unique 2.0 3.0 2
top 1.0 2.0 Y
freq 186.0 95.0 167
mean NaN NaN NaN
std NaN NaN NaN
min NaN NaN NaN
25% NaN NaN NaN
50% NaN NaN NaN
75% NaN NaN NaN
max NaN NaN NaN
Total Loan Amount Approved: 25215
Average Loan Amount: 152.62753036437246
Average Loan Term: 320.0404858299595

# Set the aesthetic style of the plots

sns.set(style="whitegrid")

# Loan Amount Distribution

plt.figure(figsize=(10, 6))
sns.histplot(loan_data['LoanAmount'], kde=True)
plt.title('Distribution of Loan Amounts')
plt.xlabel('Loan Amount')
plt.ylabel('Frequency')
plt.show()

# Applicant Income Distribution

plt.figure(figsize=(10, 6))
sns.histplot(loan_data['ApplicantIncome'], kde=True, color='green')
plt.title('Distribution of Applicant Incomes')
plt.xlabel('Applicant Income')
plt.ylabel('Frequency')
plt.show()

# Loan Status by Gender

plt.figure(figsize=(10, 6))
sns.countplot(x='Loan_Status', hue='Gender', data=loan_data)
plt.title('Loan Status by Gender')
plt.xlabel('Loan Status')
plt.ylabel('Count')
plt.legend(title='Gender', labels=['Male', 'Female'])
plt.show()
Descriptive Statistics:

• Most applicants are male (Gender=1 occurs most frequently).

• The majority of applicants are married (Married=1).
• A large portion of applicants have no dependents (Dependents=0).
• Most applicants are graduates (Graduate=1).
• The majority are not self-employed (Self_Employed=0).

Income and Loan Amounts:

• The average applicant income is approximately 5,404, with a wide range from 210 to
81,000.
• The co-applicant income also shows significant variation.
• The average loan amount is approximately 153, with loans ranging from 9 to 600.

Loan Terms and Credit History:

• The average loan term is about 320 months.

• Most applicants have a credit history (Credit_History=1).

Property Area:

• The most common property area is Semiurban (Property_Area=2).

Loan Approval:

• The total loan amount across approved applications (Loan_Status='Y') is 25,215.

• Most loans are approved (Loan_Status='Y').
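
The "most common value" statements above can be read off describe(), but it helps to see the share each value covers. A minimal sketch, assuming a small illustrative DataFrame (`sample`) and a hypothetical helper `most_common_share`:

```python
import pandas as pd

# Illustrative rows only, not the real loan data.
sample = pd.DataFrame({
    "Married": [1, 1, 0, 1, 0],
    "Self_Employed": [0, 0, 0, 1, 0],
    "Loan_Status": ["Y", "N", "Y", "Y", "N"],
    "LoanAmount": [109, 103, 45, 65, 103],
})

def most_common_share(series):
    """Return the most frequent value and the fraction of rows it covers."""
    shares = series.value_counts(normalize=True)
    return shares.index[0], float(shares.iloc[0])

for column in ["Married", "Self_Employed", "Loan_Status"]:
    value, share = most_common_share(sample[column])
    print(f"{column}: most common value {value!r} ({share:.0%} of rows)")

# Total amount over approved applications only.
approved_total = sample.loc[sample["Loan_Status"] == "Y", "LoanAmount"].sum()
print("Total approved amount:", approved_total)
```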
# Count of approved and rejected loans by gender
loan_status_gender = pd.crosstab(index=loan_data['Loan_Status'],
columns=loan_data['Gender'], margins=True, margins_name="Total")
loan_status_gender.columns = ['Male', 'Female', 'Total']
loan_status_gender.index = ['Rejected', 'Approved', 'Total']

# Plotting the data

loan_status_gender.drop('Total').plot(kind='bar', figsize=(10, 6))
plt.title('Loan Status by Gender')
plt.xlabel('Loan Status')
plt.ylabel('Number of Applicants')
plt.xticks(rotation=0)
plt.show()

# Maximum and minimum loan amounts

max_loan = loan_data['LoanAmount'].max()
min_loan = loan_data['LoanAmount'].min()

# Display as a bar chart

plt.figure(figsize=(8, 6))
sns.barplot(x=['Max Loan Amount', 'Min Loan Amount'], y=[max_loan,
min_loan])
plt.title('Maximum and Minimum Loan Amounts')
plt.ylabel('Loan Amount')
plt.show()

# Share of approved loans that went to self-employed applicants

self_employed_approved = loan_data[(loan_data['Self_Employed'] == 1) &
                                   (loan_data['Loan_Status'] == 'Y')].shape[0]
total_approved = loan_data[loan_data['Loan_Status'] == 'Y'].shape[0]
percentage_self_employed_approved = (self_employed_approved /
                                     total_approved) * 100

print("Percentage of approved loans that went to self-employed applicants:",
      percentage_self_employed_approved)

Percentage of approved loans that went to self-employed applicants:
13.77245508982036
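
The figure printed above divides by the number of approved loans, so it measures the share of approvals that went to self-employed applicants. The approval rate among self-employed applicants uses a different denominator. A minimal sketch of both quantities on illustrative rows (`sample` is an assumption, not the loan file):

```python
import pandas as pd

# Illustrative rows only, not the real loan data.
sample = pd.DataFrame({
    "Self_Employed": [1, 1, 0, 0, 0, 0],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y"],
})

approved = sample["Loan_Status"] == "Y"
self_employed = sample["Self_Employed"] == 1
both = (approved & self_employed).sum()

# Share of approved loans held by self-employed applicants
# (denominator: all approved loans).
share_of_approvals = both / approved.sum() * 100

# Approval rate among self-employed applicants
# (denominator: all self-employed applicants).
approval_rate_self_employed = both / self_employed.sum() * 100

print("Share of approvals that are self-employed (%):", share_of_approvals)
print("Approval rate among self-employed (%):", approval_rate_self_employed)
```

The two numbers answer different questions, so the label on a reported percentage should say which denominator was used.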
Loan Approval Rates by Categorical Features
# Define the categorical features
categorical_features = ['Married', 'Graduate', 'Property_Area']

# Plotting Loan Approval Rates by Categorical Features

fig, axes = plt.subplots(1, len(categorical_features), figsize=(18, 5))
for i, cat in enumerate(categorical_features):
    sns.barplot(ax=axes[i], x=cat,
                y=loan_data['Loan_Status'].apply(lambda x: 1 if x == 'Y' else 0),
                data=loan_data)
    axes[i].set_title(f'Loan Approval Rate by {cat}')
    axes[i].set_ylabel('Approval Rate')
    axes[i].set_xlabel(cat)
plt.tight_layout()
plt.show()

Graph 1: Loan Approval Rate by Marital Status

The bar chart illustrates the approval rate of loans based on the marital status of applicants.
There are two categories represented:

• 0 for Single
• 1 for Married

The graph shows that married applicants have a slightly higher loan approval rate compared to
single applicants. This could suggest that married applicants might be viewed as having more
stable financial conditions or possibly dual incomes, which could influence the decision-making
process in approving loans.

Graph 2: Loan Approval Rate by Graduate Status

The bar chart shows the approval rate of loans based on whether the applicants are graduates:

• 0 for Non-Graduates
• 1 for Graduates

From the graph, it is evident that graduates have a higher loan approval rate compared to
non-graduates. This may be attributed to the perception that graduates are more likely to have
stable and higher-paying jobs, making them better candidates for loan approvals.

Graph 3: Loan Approval Rate by Property Area

The bar chart displays the loan approval rates based on the property area of the applicants:

• 1 for Urban
• 2 for Semiurban
• 3 for Rural

The graph indicates that applicants from Semiurban areas have the highest loan approval rate,
followed closely by Urban and Rural areas. This might reflect varying credit policies or economic
conditions across different regions that influence loan approval decisions.
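
The bar heights in these charts are group means of a 0/1 approval flag, so the same rates can be tabulated alongside group sizes, which the charts do not show. A minimal sketch on illustrative rows (`sample` and the `tables` dict are assumptions for the example):

```python
import pandas as pd

# Illustrative rows only, not the real loan data.
sample = pd.DataFrame({
    "Married": [0, 0, 1, 1, 1, 1],
    "Property_Area": [1, 2, 2, 3, 3, 2],
    "Loan_Status": ["N", "Y", "Y", "N", "Y", "Y"],
})

# 1 for approved, 0 for rejected.
approved_flag = (sample["Loan_Status"] == "Y").astype(int)

tables = {}
for column in ["Married", "Property_Area"]:
    # Mean of the flag is the approval rate; count is the group size.
    rates = approved_flag.groupby(sample[column]).agg(["mean", "count"])
    rates.columns = ["approval_rate", "n_applicants"]
    tables[column] = rates
    print(f"\nApproval rate by {column}")
    print(rates)
```

Reporting the group size next to each rate makes it easier to judge whether a small difference between categories is meaningful.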

Impact of Credit History on Loan Approval

# Impact of Credit History on Loan Approval

plt.figure(figsize=(10, 6))
sns.barplot(x='Credit_History',
y=loan_data['Loan_Status'].apply(lambda x: 1 if x == 'Y' else 0),
data=loan_data)
plt.title('Impact of Credit History on Loan Approval Rate')
plt.xlabel('Credit History')
plt.ylabel('Approval Rate')
plt.show()

The bar chart visualizes the impact of credit history on loan approval rates:

• 0 for No Credit History
• 1 for Credit History

The graph shows that applicants with a credit history (1) have a significantly higher
approval rate than those without a credit history (0).

This underscores the importance of credit history in lending decisions, where a positive credit
history strongly favors the likelihood of loan approval.
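
The relationship shown in the chart can also be expressed as a cross-tabulation, which gives both the raw counts and the approval share within each credit-history group. A minimal sketch on illustrative rows (`sample` is an assumption, not the loan file):

```python
import pandas as pd

# Illustrative rows only, not the real loan data.
sample = pd.DataFrame({
    "Credit_History": [1, 1, 1, 1, 0, 0, 0, 1],
    "Loan_Status": ["Y", "Y", "Y", "N", "N", "N", "Y", "Y"],
})

# Raw counts of each outcome per credit-history group.
counts = pd.crosstab(sample["Credit_History"], sample["Loan_Status"])

# Row-normalised shares: each row sums to 1, so column 'Y' is the approval rate.
row_shares = pd.crosstab(sample["Credit_History"], sample["Loan_Status"],
                         normalize="index")

print(counts)
print(row_shares.round(3))
```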

Predictive Modeling for Loan Approval

# Import necessary libraries

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Data Preparation
X = loan_data.drop('Loan_Status', axis=1)
y = loan_data['Loan_Status']
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)

# Encoding categorical variables (if not already encoded)

X = pd.get_dummies(X, drop_first=True)

# Splitting the dataset

X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=42)

# Model Training
model = LogisticRegression()
model.fit(X_train, y_train)

# Model Evaluation
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))

precision recall f1-score support

0 0.78 0.67 0.72 21

1 0.88 0.93 0.90 54

accuracy 0.85 75
macro avg 0.83 0.80 0.81 75
weighted avg 0.85 0.85 0.85 75

Precision for class 1 (approved loans) is 88%, so most of the model's predicted approvals are
correct. Here is a breakdown of what each metric signifies:

Precision: The share of predictions for a class that are correct. When the model predicted a
loan would be approved, it was correct 88% of the time.

Recall: The share of actual instances of a class that the model finds. For approved loans, the
model correctly identified 93% of all actual approvals.

F1-Score: The harmonic mean of precision and recall. It reaches its best value at 1
(perfect precision and recall) and its worst at 0.
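
To make the definitions concrete, the metrics can be recomputed by hand from confusion-matrix counts. The counts below are an assumption: the notebook never prints the matrix, so they were inferred as the integers that fit the reported supports (21 and 54) and rounded scores. The helper name `precision_recall_f1` is hypothetical.

```python
# Confusion-matrix counts inferred from the report above (assumed, not printed
# by the notebook). "Positive" here means class 1, an approved loan.
tp = 50  # actual approved, predicted approved
fn = 4   # actual approved, predicted rejected
fp = 7   # actual rejected, predicted approved
tn = 14  # actual rejected, predicted rejected

def precision_recall_f1(hits, false_alarms, misses):
    """Compute precision, recall and F1 (harmonic mean) for one class."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Class 1 (approved): hits are tp, false alarms are fp, misses are fn.
approved_metrics = precision_recall_f1(tp, fp, fn)

# Class 0 (rejected): the roles swap, so hits are tn,
# false alarms are fn, and misses are fp.
rejected_metrics = precision_recall_f1(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

print("approved:", [round(m, 2) for m in approved_metrics])
print("rejected:", [round(m, 2) for m in rejected_metrics])
print("accuracy:", round(accuracy, 2))
```

Under these assumed counts, the rounded values match the classification report, which is a useful sanity check on how each column is defined.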

Interpretation:

The model identifies loan approvals well (recall of 0.93 for class 1). Avoiding the approval of
risky loans, however, depends on how well it catches applications that should be rejected, and
that is its weaker side.

Precision (0.78) and recall (0.67) are noticeably lower for the rejected class (0): about a third
of the actual rejections in the test set were predicted as approvals. Additional features or
alternative modeling techniques may improve identification of rejected applications.

Next Steps:

Given the success of this initial model, you might consider the following enhancements or
further analysis:

Feature Engineering: You can create new features that might help improve model predictions,
such as ratios of income to loan amount, or aggregate measures of credit history.
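
A minimal sketch of such ratio features, using the first three rows shown in the head() output earlier. The new column names (TotalIncome, IncomeToLoan, LoanPerTermUnit, LogTotalIncome) are hypothetical, and the units of LoanAmount are not stated in the data, so the ratios are only comparable within this dataset.

```python
import numpy as np
import pandas as pd

# First three rows from the head() output; column names as in the dataset.
sample = pd.DataFrame({
    "ApplicantIncome": [3902, 1500, 2889],
    "CoapplicantIncome": [1666.0, 1800.0, 0.0],
    "LoanAmount": [109, 103, 45],
    "Loan_Amount_Term": [333, 333, 180],
})

# Combined household income (hypothetical feature name).
engineered = sample.assign(
    TotalIncome=sample["ApplicantIncome"] + sample["CoapplicantIncome"],
)

# Income relative to the amount requested.
engineered["IncomeToLoan"] = engineered["TotalIncome"] / engineered["LoanAmount"]

# Amount requested per unit of loan term.
engineered["LoanPerTermUnit"] = engineered["LoanAmount"] / engineered["Loan_Amount_Term"]

# Log transform to dampen the long right tail seen in the income distribution.
engineered["LogTotalIncome"] = np.log1p(engineered["TotalIncome"])

print(engineered)
```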

Try Different Models: Experiment with other models like Decision Trees, Random Forests, or
Gradient Boosting Machines to see if they can achieve better performance.
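
One way to compare candidate models is cross-validation with a common scoring metric. The sketch below uses a synthetic dataset from make_classification as a stand-in of roughly the same size as the loan data; the data, the hyperparameters, and the choice of macro-averaged F1 are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (247 rows, like the loan data); not the real features.
X, y = make_classification(n_samples=247, n_features=8, n_informative=4,
                           random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validation; macro F1 weighs both classes equally,
# which matters when one class is harder to identify than the other.
scores = {}
for name, estimator in candidates.items():
    fold_scores = cross_val_score(estimator, X, y, cv=5, scoring="f1_macro")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean macro F1 = {scores[name]:.3f}")
```

With only a few hundred rows, averaging over folds gives a steadier comparison than a single 70/30 split.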

Model Tuning: Adjust model parameters using techniques like grid search or random search to
find the best settings for your algorithms.
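
A minimal grid-search sketch, again on synthetic stand-in data. It wraps scaling and logistic regression in a pipeline so that the scaler is fitted only on the training folds. The parameter grid and the scoring choice are assumptions. On the real data, it may also be worth dropping the Loan_ID column before fitting, since it is an identifier rather than a predictor.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (247 rows, like the loan data); not the real features.
X, y = make_classification(n_samples=247, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scaling inside the pipeline avoids leaking test-fold statistics.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid over the inverse regularisation strength C.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated macro F1:", round(search.best_score_, 3))

# score() uses the same metric as the search (macro F1) on held-out data.
test_score = search.score(X_test, y_test)
print("Held-out macro F1:", round(test_score, 3))
```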

Segmentation Analysis
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Work on a copy of the selected columns to avoid SettingWithCopyWarning
features = loan_data[['ApplicantIncome', 'LoanAmount',
                      'Property_Area', 'Graduate']].copy()

# Convert categorical data to integer codes
features['Property_Area'] = features['Property_Area'].astype('category').cat.codes
features['Graduate'] = features['Graduate'].astype('category').cat.codes

# Data Standardization
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Implementing K-means Clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = kmeans.fit_predict(features_scaled)

# Adding cluster information to the original DataFrame
loan_data['Cluster'] = clusters

# Visualizing the clusters based on two dimensions: Income and Loan Amount
plt.figure(figsize=(10, 6))
sns.scatterplot(x='ApplicantIncome', y='LoanAmount', hue='Cluster',
                data=loan_data, palette='viridis')
plt.title('Cluster of Applicants based on Income and Loan Amount')
plt.show()

# Analyzing clusters
for i in range(3):
    cluster = loan_data[loan_data['Cluster'] == i]
    print(f"Cluster {i}:")
    print(f"Average Income: {cluster['ApplicantIncome'].mean()}")
    print(f"Average Loan Amount: {cluster['LoanAmount'].mean()}")
    print(f"Proportion of Graduates: {cluster['Graduate'].value_counts(normalize=True)}")
    print(f"Property Area Distribution: {cluster['Property_Area'].value_counts(normalize=True)}\n")

Cluster 0:
Average Income: 3880.126984126984
Average Loan Amount: 128.11111111111111
Proportion of Graduates: Graduate
0 1.0
1 0.0
Name: proportion, dtype: float64
Property Area Distribution: Property_Area
3 0.428571
2 0.349206
1 0.222222
Name: proportion, dtype: float64

Cluster 1:
Average Income: 4614.629411764706
Average Loan Amount: 141.87058823529412
Proportion of Graduates: Graduate
1 1.0
0 0.0
Name: proportion, dtype: float64
Property Area Distribution: Property_Area
2 0.394118
3 0.311765
1 0.294118
Name: proportion, dtype: float64
Cluster 2:
Average Income: 21841.14285714286
Average Loan Amount: 393.57142857142856
Proportion of Graduates: Graduate
1 1.0
0 0.0
Name: proportion, dtype: float64
Property Area Distribution: Property_Area
2 0.428571
1 0.285714
3 0.285714
Name: proportion, dtype: float64
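
The clustering above fixes n_clusters=3 without showing how that number was chosen. One common check is to compare silhouette scores across several values of k. The sketch below does this on synthetic points (three well-separated blobs generated for illustration, not the loan features); the range of k and the blob layout are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three blobs of 40 points each (illustrative only).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 8.0]])
points = np.vstack([c + rng.normal(scale=0.7, size=(40, 2)) for c in centers])

# Standardise first, as in the notebook's clustering step.
scaled = StandardScaler().fit_transform(points)

# Silhouette score per candidate k; higher means tighter, better-separated clusters.
silhouette_by_k = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(scaled)
    silhouette_by_k[k] = silhouette_score(scaled, labels)
    print(f"k={k}: silhouette = {silhouette_by_k[k]:.3f}")

best_k = max(silhouette_by_k, key=silhouette_by_k.get)
print("k with the highest silhouette score:", best_k)
```

Running the same loop on features_scaled would indicate whether three clusters is a reasonable choice for the loan data or whether another k separates the applicants more cleanly.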

Analysis of Clusters

The analysis results show distinct characteristics for each cluster:

Cluster 0:

• Average Income: 3,880
• Average Loan Amount: 128.11
• Graduates: 100% non-graduates
• Property Area Distribution: Mostly rural (42.9%), followed by semi-urban (34.9%) and
urban (22.2%).

Cluster 1:

• Average Income: 4,615
• Average Loan Amount: 141.87
• Graduates: 100% graduates
• Property Area Distribution: Well distributed across semi-urban (39.4%), rural (31.2%),
and urban (29.4%).

Cluster 2:

• Average Income: 21,841 (significantly higher)
• Average Loan Amount: 393.57 (also significantly higher)
• Graduates: 100% graduates
• Property Area Distribution: Mostly semi-urban (42.9%), with urban and rural areas
equal at 28.6% each.

Insights and Business Implications

Targeting Strategies:

• Cluster 0 could be targeted with products designed for lower-income, non-graduate
individuals in more rural settings.
• Cluster 1 might benefit from standard loan products, suitable for middle-income
graduates spread across all areas.
• Cluster 2 represents high-income graduates who might be interested in higher loan
amounts. Tailored products for large investments or business loans could be more
appealing to this group.

Marketing Adjustments:

• Different marketing strategies can be employed that resonate with the unique
characteristics of each cluster. For example, more straightforward, assurance-focused
messaging might work better for Cluster 0, while more sophisticated,
investment-opportunity-focused messaging could appeal to Cluster 2.

Risk Management:

• Understanding the income and educational background can help in
adjusting the risk models, as higher-income, educated groups (like Cluster 2) might have
a lower default rate.

Conclusion

This project covers a complete data analysis lifecycle for the Apex Financial Services loan
data, from loading and cleaning to analysis and modeling. Insights derived from this analysis
help in understanding the lending environment and making informed decisions on loan
approvals and risk management.

from google.colab import drive

drive.mount('/content/drive')
