Loan Eligibility Prediction

The document describes developing a machine learning model to predict loan eligibility. It discusses collecting loan application data, preprocessing the data by handling missing values and normalizing variables. Decision tree and naive Bayes algorithms are used to build predictive models and evaluate their accuracy in predicting whether applicants will be eligible for loans. The goal is to automate loan eligibility decisions to reduce approval time and risk for banks.

Uploaded by

Uddhav Chalise

Introduction

Obtaining a loan is a time-consuming process. An application must pass through many stages,
and approval is still not guaranteed. To reduce both the approval time and the risk associated
with lending, many loan prediction models have been introduced. A prediction model uses
statistics, probability and data mining to forecast an outcome. Every model has some variables
that are likely to influence the outcome. A prediction model helps banks by minimizing the risks
associated with the loan approval system and helps applicants by shortening the time the
process takes.

Background

At present, a loan must be approved manually by a representative of the bank, who is
responsible for deciding whether the applicant is eligible for the loan and for calculating the
risk associated with it. Because this work is done by a human, it is time-consuming and
susceptible to errors. Banks earn most of their profits from the interest paid to them, so a loan
that is not repaid counts as a loss to the bank. If banks lose too much money, the result can be
a banking crisis, and such crises affect the economy of the country. It is therefore very
important that loans are approved with the least possible error in risk calculation and in the
least possible time. A loan prediction model is required that can quickly predict whether a loan
can be approved with the least possible risk.

Problem Statement

The two most pressing questions in the banking sector are: 1) How risky is the borrower? 2) Should
we lend to the borrower given the risk? The answer to the first question dictates the borrower’s
interest rate. The interest rate, among other things (such as the time value of money), reflects
the riskiness of the borrower, i.e., the higher the interest rate, the riskier the borrower. Whether
the applicant is suitable for the loan can then be decided based on the interest rate. Lenders
(investors) make loans to borrowers in return for the promise of repayment with interest. That is,
the lender only makes a return (interest) if the borrower repays the loan. If he or she does not
repay the loan, the lender loses money. Some borrowers will default on their debts, unable to
repay them for various reasons. The bank retains insurance to limit its losses in the case of a
default. The insured sum can cover the whole loan amount or just a portion of it. Banks currently
use manual procedures to determine whether a borrower is suitable for a loan.

Manual procedures are mostly effective, but they become insufficient when there is a large
number of loan applications, because each decision then takes a long time. A machine learning
loan prediction model can instead be used to assess a customer’s loan status and to build
strategies. The model extracts the essential features of a borrower that influence the
customer’s loan status and then produces the required output (loan status). These predictions
make a bank manager’s job simpler and quicker.

Objective

The major objectives of the project are listed below:

• To develop a simple loan eligibility prediction model that automatically decides whether a
person is eligible for a loan or not.
• To compare different algorithms for creating the model and calculate their accuracy.

Literature Review

Ashlesha Vaidya [1] uses logistic regression as a machine learning tool and shows how
predictive approaches can be applied to real-world loan approval problems. The paper uses a
statistical model (logistic regression) to predict whether a loan should be approved for a set
of applicant records. Logistic regression can also work with power terms and nonlinear
effects. Some limitations of this model are that it requires independent variables for
estimation and a large sample for parameter estimation.

The work by Rafik Khairul Amin and Yuliant Sibaroni [2] uses a decision tree algorithm called
C4.5 to implement a predictive model. This algorithm creates a decision tree that generally
gives high accuracy in decision-making problems. A dataset of 1000 cases is used, of which 70%
are approved and the rest are rejected. The paper shows the performance of the C4.5 algorithm
in recognizing whether an applicant is able to repay his or her loan. In the conducted tests,
the highest precision value was 78.08%, obtained with a data partition of 90:10. The best
recall value was 96.4%, reached with a data partition of 80:20. The 80:20 partition is
considered the best since it has a high recall and the highest accuracy.

The work done by Nisha Arora and Pankaj Deep Kaur [3] aimed at forecasting whether an
applicant is likely to become a loan defaulter. It uses Bolasso to select the most relevant
attributes based on their robustness, and the selected attributes are then passed to
classification algorithms such as Random Forest, SVM, Naïve Bayes and K-Nearest Neighbours
(KNN) to test how accurately they can predict the results. The authors conclude that the
Bolasso-enabled Random Forest algorithm (BS-RF) provides the best results in credit risk
evaluation and gives better accuracy through optimized feature selection.

Methodology

Start

Data Collection

Analysis of Data

Data Cleaning

Model building using Decision Tree and Naïve Bayes Algorithm

Evaluation of model using Accuracy, Precision, Recall and F-measure

End

Flowchart of entire process

Data Collection

We used the Loan Eligibility Prediction dataset from Kaggle, an online data science community
that provides datasets, tools and resources. The dataset consists of 614 applicants with the
following attributes:

• Loan Id: Unique Id for an applicant
• Gender: Gender of the applicant
• Marital Status: Marital status of the applicant
• Dependents: No. of dependents of the applicant
• Education: Educational status of the applicant
• Applicant Income: Income of the applicant
• Co-applicant Income: Income of the co-applicant (if any)
• Loan Amount: Applied loan amount
• Loan Term: Time to repay the loan
• Credit History: Record of how applicant has managed debts
• Property Area: Location of property
• Loan Status: Status of loan (Y for eligible, N for not eligible)

First, we loaded the dataset using pandas. After loading it, we preprocessed the data, then
used 80% of the dataset to train the model and verified the accuracy using the remaining 20%
of the data.
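
The 80/20 split described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the small DataFrame stands in for the Kaggle file, and the column names, the toy values and the use of `train_test_split` with `random_state=0` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the Kaggle dataset (invented values, assumed column names).
# With the real file this would be a pd.read_csv(...) call instead.
data = pd.DataFrame({
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006, 12841],
    "Credit_History": [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    "Loan_Status": ["Y", "N", "Y", "Y", "Y", "Y", "Y", "N", "Y", "N"],
})

# Separate the input features from the target column.
X = data.drop(columns="Loan_Status")
y = data["Loan_Status"]

# Hold out 20% of the rows for testing; the remaining 80% is used for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))  # 8 training rows and 2 test rows
```

Fixing `random_state` makes the split reproducible, so accuracy figures from different runs can be compared.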

Preprocessing

There were some missing values in the dataset. Depending on the type of variable, these were
filled with either the mean or the mode of that variable. Missing values in applicant income
were replaced with the mean applicant income of the dataset. Similarly, missing values in
gender, marital status and no. of dependents were replaced with the mode of the respective
variables.
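
A minimal sketch of this kind of imputation with pandas is shown below. The column names and values are invented for illustration and are not taken from the project; only the mean-for-numeric and mode-for-categorical rule comes from the text.

```python
import numpy as np
import pandas as pd

# Toy data with gaps (invented values, assumed column names).
df = pd.DataFrame({
    "ApplicantIncome": [4000.0, np.nan, 6000.0, 5000.0],
    "Gender": ["Male", "Female", None, "Male"],
    "Dependents": ["0", "1", "0", None],
})

# Numeric variable: replace missing values with the column mean.
df["ApplicantIncome"] = df["ApplicantIncome"].fillna(df["ApplicantIncome"].mean())

# Categorical variables: replace missing values with the most frequent value.
for col in ["Gender", "Dependents"]:
    df[col] = df[col].fillna(df[col].mode()[0])

# Count of remaining missing values per column (all zeros after imputation).
print(df.isnull().sum())
```

The mode is used for categorical variables because a mean is not defined for values such as "Male" or "Female".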

Missing values of variables

Next, the distribution of the variables was studied using box plots and histograms. Studying
the distribution of the data gave a general idea of the variables related to the applicants.

Box plot of applicant income

Histogram of applicant income

The data was then normalized so that outliers could be handled effectively. For this, applicant
income and co-applicant income were added together and the logarithmic function was applied to
the total, which compresses very large incomes towards the rest of the data.
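
The effect of the logarithmic transformation can be seen in a small sketch. The incomes below are invented, the column names are assumptions, and the natural logarithm (`np.log`) is used here because the text does not state which base was applied.

```python
import numpy as np
import pandas as pd

# Toy incomes with one extreme value (invented numbers, assumed column names).
df = pd.DataFrame({
    "ApplicantIncome": [2500, 4000, 6000, 81000],
    "CoapplicantIncome": [0, 1500, 0, 0],
})

# Combine both incomes, then apply the logarithm to the total.
df["TotalIncome"] = df["ApplicantIncome"] + df["CoapplicantIncome"]
df["TotalIncome_log"] = np.log(df["TotalIncome"])

# Compare how far the largest value sits from the smallest, before and after.
ratio_raw = df["TotalIncome"].max() / df["TotalIncome"].min()
ratio_log = df["TotalIncome_log"].max() / df["TotalIncome_log"].min()
print(ratio_raw, ratio_log)
```

On the raw scale the largest total is more than thirty times the smallest, while on the log scale the ratio is below two, so a single very high income no longer dominates the range.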

Histogram of log of total income

System Design

Decision Tree

A decision tree is a supervised machine learning algorithm used mostly for classification
problems. All features should be discretized in this model so that the population can be split
into two or more homogeneous subsets. Different algorithms (splitting criteria) can be used to
split a node into two or more sub-nodes. As more sub-nodes are created, the homogeneity and
purity of the nodes increase with respect to the dependent variable.

Building predictive model using decision tree
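
The idea of node purity can be illustrated with the Gini impurity, one common splitting criterion. The report does not state which criterion was used, so this choice, the helper names and the toy labels are assumptions made for the sketch.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 0.0 means every label in the node is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(children):
    """Impurity after a split: child impurities weighted by child size."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini(child) for child in children)


# Toy loan-status labels at a parent node and after one candidate split.
parent = ["Y", "Y", "Y", "Y", "N", "N", "N", "N"]
left = ["Y", "Y", "Y", "N"]
right = ["Y", "N", "N", "N"]

print(gini(parent))                  # 0.5
print(weighted_gini([left, right]))  # 0.375
```

The impurity drops from 0.5 to 0.375, which is what it means for the sub-nodes to be more homogeneous than the parent. A tree-building algorithm picks the split that gives the largest drop.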

Naïve Bayes

Naïve Bayes methods are a set of supervised learning algorithms based on applying Bayes’
theorem with the “naïve” assumption of conditional independence between every pair of features
given the value of the class variable. Bayes’ theorem states the following relationship, given class
variable y and dependent feature vector x1 through xn:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$
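
Under the naïve independence assumption, the likelihood factorizes into a product of per-feature terms, so the posterior is proportional to P(y) multiplied by each P(x_i | y). The sketch below works this out by counting on a tiny invented table. The rows, feature values and function name are hypothetical, and no smoothing is applied, so it assumes every queried value occurs in the data.

```python
from fractions import Fraction

# Toy training rows: (credit_history, education, loan_status). Invented data.
rows = [
    ("good", "graduate", "Y"),
    ("good", "graduate", "Y"),
    ("good", "not_graduate", "Y"),
    ("bad", "graduate", "Y"),
    ("bad", "not_graduate", "N"),
    ("bad", "graduate", "N"),
    ("good", "not_graduate", "N"),
    ("bad", "not_graduate", "N"),
]


def posterior(x, rows):
    """Return P(y | x) for each class y, using the naive factorization."""
    classes = sorted({row[-1] for row in rows})
    scores = {}
    for c in classes:
        in_class = [row for row in rows if row[-1] == c]
        score = Fraction(len(in_class), len(rows))  # prior P(y)
        for i, value in enumerate(x):
            matches = sum(1 for row in in_class if row[i] == value)
            score *= Fraction(matches, len(in_class))  # P(x_i | y)
        scores[c] = score
    evidence = sum(scores.values())  # normalizing constant
    return {c: s / evidence for c, s in scores.items()}


result = posterior(("good", "graduate"), rows)
print(result["Y"], result["N"])  # 9/10 1/10
```

For class Y the score is 1/2 × 3/4 × 3/4 = 9/32 and for class N it is 1/2 × 1/4 × 1/4 = 1/32, which normalizes to 9/10 and 1/10.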

IMPLEMENTATION AND TESTING

Implementation

A loan eligibility prediction model was developed which can effectively predict whether a person
is eligible for a loan or not. To develop a working system, implementation was done in a single
phase where we preprocessed the data and created a model to predict loan eligibility using
Decision Tree and Naïve Bayes Algorithms.
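
A rough sketch of how the two classifiers could be trained and compared with scikit-learn is given below. It is not the project's code: the data is synthetic, the feature choice is invented, and `GaussianNB` is assumed as the Naïve Bayes variant because the report does not name one.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed data (invented, not the Kaggle set).
rng = np.random.default_rng(0)
n = 200
credit_history = rng.integers(0, 2, size=n)
log_income = rng.normal(8.5, 0.5, size=n)
X = np.column_stack([credit_history, log_income])

# Synthetic label: eligibility mostly follows credit history, with ~10% noise.
noise = rng.random(n) < 0.1
y = np.where(noise, 1 - credit_history, credit_history)

# Same 80/20 split as described for the real dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the test rows

print(scores)
```

Because the data here is synthetic, the printed accuracies say nothing about the results obtained on the real dataset.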

Tools Used

1. Python

Python was used as a core programming language to develop the model.

2. NumPy

The NumPy library was used to work with multidimensional arrays, linear algebra and matrices.

3. Pandas

Pandas was used for data manipulation.

4. Matplotlib

Matplotlib was used for statistical data visualization.

5. Sklearn

Sklearn was used to create machine learning and classification models.

Testing

Accuracy, precision, recall and F-measure were used to validate the performance of the models
on the 20% test split. The decision tree achieved an accuracy of 62.60%, a precision of 0.83,
a recall of 0.611 and an F-measure of 0.71. Naïve Bayes achieved an accuracy of 83.74%, a
precision of 0.83, a recall of 0.97 and an F-measure of 0.90.
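
For reference, the four measures can be computed from the counts of a confusion matrix as sketched below. The function name and the counts are hypothetical and are not the project's actual results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted "eligible" that is correct
    recall = tp / (tp + fn)     # share of truly "eligible" that is found
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure


# Hypothetical counts chosen only to make the arithmetic easy to follow.
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(acc, prec, rec, f1)
```

With these counts all four measures come out at about 0.8. In general the F-measure is the harmonic mean of precision and recall, so it is pulled towards the lower of the two.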

Conclusion

This report documents the development of a model to predict loan eligibility. The preprocessed
data was used to train and test two classification algorithms, decision tree and Naïve Bayes,
and the predicted loan eligibility was given to the user. The main goal of our project “Loan
Eligibility Prediction” was achieved using the decision tree and Naïve Bayes algorithms trained
on the loan eligibility dataset from Kaggle. The major finding of the project was that the
Naïve Bayes algorithm performed better than the decision tree on the preprocessed data.

Recommendations

The model can be further improved by retraining it on more datasets and by adding features such
as an explanation of why a loan application was rejected. Additions of this kind are likely to
make the model more useful.
