Mini Project
MACHINE LEARNING
Submitted by:
❖ Anindita Sinha - 35311502721
❖ Ayush Goel - 08511502721
❖ Daksh Kaushik - 08711502721
❖ Vipul Varshney - 08211502721
Under the Supervision of:
Dr. Deepika Kumar (HOD CSE)
TABLE OF CONTENTS
➢ Introduction
➢ Benefits of P2P lending
➢ Need of AI model in P2P Lending
➢ Related Work
➢ Objectives
➢ Research Methodology
➢ Dataset
➢ References
INTRODUCTION
• Lending Opportunities
• Convenience and Accessibility [19]
Figure 2. Number of lenders and credit recipients of P2P platforms in Lithuania (created by authors based on data from Tarasevičienė 2019; The Central Bank of Lithuania 2020).
NEED OF AI MODEL IN P2P LENDING
• Financial Losses
• Increased Non-Performing Assets (NPAs)
• Impact on Capital Adequacy
• Increased Interest Rates for Borrowers
• Reduced Lending Capacity
• Reputation Risk
• Legal and Administrative Costs
• Market Perception
Figure 3. Operation status of P2P platforms in China [17]
RELATED WORK
| AUTHOR | RESEARCH TITLE | METHODOLOGY | RESULTS | LIMITATIONS | SOURCE |
|---|---|---|---|---|---|
| Jing Zhou, Wei Li, Jiaxin Wang, Shuai Ding, Chengyi Xia (2019) | Default prediction in P2P lending from high-dimensional data based on machine learning | Heterogeneous ensemble learning (GBDT, XGBoost and LightGBM) | Accuracy: 0.84 | This study does not focus on optimizing the parameters or conducting sensitivity analyses, so we recommend that future studies deploy algorithms to automate the optimization of parameters for better results. | [1] |
| Junhui Xu, Zekai Lu, Ying Xie (2021) | Loan default prediction of Chinese P2P market: a machine learning methodology | Synthetic minority oversampling technique (SMOTE), gradient boosting model (GBM), NN, extreme gradient boosting tree (XGBT) and random forest (RF) | Accuracy: 0.91 | The study does not consider the changes in macroeconomic factors and regulatory policies. | [2] |
| J. D. Turiel and T. Aste (2020) | Peer-to-peer loan acceptance and default prediction with artificial intelligence | LR and SVM models, DNN, two-phase model | Accuracy: 0.75 | The integration of the present model with predictive modelling based on information filtering network techniques is not discussed. | [3] |
| Beibei Niu, Jinzheng Ren and Xiaotao Li | Credit Scoring Using Machine Learning by Combining Social Network Information: Evidence from Peer-to-Peer Lending | Logistic Regression (LR), Random Forest, AdaBoost, LightGBM | Accuracy: 0.66 | Were not able to collect other social network data, such as frequency of calls, whether they are incoming or outgoing and the strength of social network ties. | [4] |
RELATED WORK
| AUTHOR | RESEARCH TITLE | METHODOLOGY | RESULT | LIMITATIONS | SOURCE |
|---|---|---|---|---|---|
| Xiaojun Ma, Jinglan Sha, Dehua Wang, Yuanbo Yu (2018) | Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning | LightGBM and XGboost | Accuracy: 0.86 | The method is relatively novel, the scope of application is not very extensive, and the articles related to it are very rare. | [5] |
| Li-Hua Li, Alok Kumar Sharma, Ramli Ahmad, Rung-Ching Chen | Predicting the Default Borrowers in P2P Platform Using Machine Learning Models | KNN, Logistic Regression, Random Forest | Accuracy: 0.95 | P2P lending faces challenges in its development, such as asymmetric information and improper risk handling methods. | [6] |
| Yuejin Zhang, Haifeng Li, Mo Hai, Aihua Li (2017) | Determinants of loan funded successful in online P2P Lending | Binary Logistic Regression | Accuracy: 0.77 | To get reliable results of predicting loan performance, a pre-selection of variables on the basis of credit grades should be undertaken. More precise results can be expected. This could solve the discordant results that were delivered by previous research and give a deeper insight into the topic of ex-post risk in P2P Lending. | [7] |
| An-Hsing Chang, Li-Kai Yang, Rua-Huan Tsaih (2022) | Machine learning and artificial neural networks to construct P2P lending credit-scoring model: A case using Lending Club data | XGBoost, LightGBM | Accuracy: 0.88 | Use of other linear classification techniques, such as the LDA. It would be an interesting extension to exploit the XGBoost algorithm by using many variable selection techniques in statistics. | [8] |
RELATED WORK
| AUTHOR | RESEARCH TITLE | METHODOLOGY | LIMITATIONS | SOURCE |
|---|---|---|---|---|
| Suryono, Ryan Randy, Betty Purwandari, and Indra Budi. | Peer to Peer (P2P) Lending Problems and Potential Solutions: A Systematic Literature Review | This study produces a table of P2P lending problem identification and alternative solutions by employing a SLR of 81 publications. | Many cases of improper billing and awareness of privacy data can be investigated in further research. It relates to the feasibility of P2P Lending Platform as a significant concern. Besides, there is very limited work on analyzing positive and negative sentiments on P2P lending. | [9] |
| Zhao, Hongke, et al. | P2P Lending Survey: Platforms, Recent Advances and Prospects | Provided a comprehensive survey on P2P lending. Specifically, summarized some mainstream P2P lending platforms in the world and provided a systematic taxonomy for them. | Suggested several future research directions, including the pricing problem, mechanism improvement, risk management, privacy preserving, and personalization. | [10] |
| Au, Cheuk Hang, Barney Tan, and Yuan Sun. | Developing a P2P lending platform: stages, strategies and platform configurations. | Study hints at the strategies that can facilitate the various stages. Model can potentially serve as the foundation for formulating guidelines for the managers of P2P lending platforms, so that they are able to optimize the development of their platforms. | The issue of generalizability as a potential limitation of our study. Future work will be directed toward extending and validating our process model with the collection and analysis of additional data from Tuodao, and possibly other P2P lending platforms. | [11] |
| Najaf, Khakan, Ravichandran K. Subramaniam, and Osama F. Atayah. | Understanding the implications of FinTech Peer-to-Peer (P2P) lending during the COVID-19 pandemic. | This study examines the impact of the COVID-19 pandemic on the determinants of FinTech Peer-to-Peer (P2P) lending. | As the P2P lending market is a considerably new market and still in the development stage, further analysis with a longer time-frame and more diverse macroeconomic conditions will check the robustness of our results. | [12] |
RELATED WORK
| AUTHOR | RESEARCH TITLE | METHODOLOGY | LIMITATIONS | RESULTS | SOURCE |
|---|---|---|---|---|---|
| An-Hsing Chang, Li-Kai Yang, Rua-Huan Tsaih and Shih-Kuei Lin | Machine learning and artificial neural networks to construct P2P lending credit-scoring model: A case using Lending Club data | Artificial Neural Network (ANN), logistic regression (LR), decision tree, random forest, XGBoost, LightGBM and 2-layer neural networks. | Since the data contains description features, such as the reasons for the loans and the credit document from the lender, the information in text can be converted into a numerical form via natural language processing, such as sentiment analysis. | Accuracy: 0.88 | [13] |
| Dong-Her Shih, Ting-Wei Wu, Po-Yuan Shih, Nai-An Lu and Ming-Hung Shih | A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform | Naive Bayesian (NB), logistic regression (LR), and random forest (RF) | Only the isolated forest outlier detection method is used. Could have used various other outlier detection methods to find common outliers. | Accuracy: 0.958 | [14] |
| Zhida Liu, Zhenyu Zhang, Hongwei Yang, Guoqiang Wang, Zhenwei Xu | Innovative model fusion algorithm to improve the recall rate of peer-to-peer lending default customers | Random Forest, Extra Trees, XGBoost, LightGBM, CatBoost, Artificial Neural Network (ANN), Logistic Regression, RF-ET-GBM-CAT-XGB-Stacking model and LGB-XGB-Stacking model. | Use more types of data, such as time series data, geographic location data, etc., to provide more accurate default forecast. | Accuracy: 0.8899 | [15] |
| Lailatul Nikmah, Dwika Ananda Agustina Pertiwi, Subhan, Jumanto, Yosza Dasril, Iswanto | New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning | KNN, SVM, Random Forest, Stacking-AdaBoost, Stacking-LightGBM, Stacking-XGBoost, LGBFS-StackingXGBoost | Conducting experiments on larger datasets or datasets from different countries and trying to tune new models to achieve better performance. | Accuracy: 0.9998 | [16] |
OBJECTIVES
2) To collect and curate a dataset of loan defaulters with their financial information (ratios) and later pre-process the data.
3) To use various feature-transformation techniques to achieve better model accuracy.
4) To build a machine learning model using various algorithms to classify loan risk.
5) To analyse whether the model works irrespective of external factors such as recessions and pandemics.
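Objectives 3 and 4 can be illustrated with a minimal sketch, assuming a log transform plus z-score scaling as the feature transformation and a plain logistic regression as the classifier. The feature names, toy values, and hyperparameters below are hypothetical and are not taken from the project's dataset or models.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights with batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Label a loan 1 (default) when the predicted probability is >= 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical toy loans: [debt-to-income ratio, annual income]; 1 = default.
raw = [[0.10, 90000], [0.15, 75000], [0.20, 60000],
       [0.55, 30000], [0.60, 25000], [0.70, 20000]]
labels = [0, 0, 0, 1, 1, 1]

# Feature transformation: log-compress the skewed income column, then z-score.
features = standardize([[dti, math.log(income)] for dti, income in raw])
w, b = train_logistic(features, labels)
preds = predict(features, w, b)
print(preds)
```

The toy data is deliberately separable, so the sketch only shows the mechanics; a real evaluation would score the model on held-out loans.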
Research Methodology
Machine Learning Process Flow
DATASET
Data Pre-processing and Cleaning
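Two cleaning steps that commonly appear in loan-default pipelines are missing-value imputation and minority-class oversampling; SMOTE is the oversampling technique used in [2]. The following is a minimal sketch under stated assumptions: median imputation, a simplified SMOTE-style interpolation (not a reference implementation), and hypothetical toy rows. It does not describe the project's actual pipeline.

```python
import random
import statistics

def impute_median(rows):
    """Replace None entries in each column with that column's median."""
    cols = list(zip(*rows))
    medians = [statistics.median(v for v in c if v is not None) for c in cols]
    return [[m if v is None else v for v, m in zip(r, medians)] for r in rows]

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (squared Euclidean distance).
    Needs at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to other
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical toy rows: [debt-to-income ratio, credit score]; 1 = default.
rows = [[0.10, 700], [0.20, None], [0.15, 720],
        [None, 690], [0.60, 580], [0.70, 560]]
labels = [0, 0, 0, 0, 1, 1]

clean = impute_median(rows)
minority = [r for r, y in zip(clean, labels) if y == 1]
new_points = smote_like(minority, labels.count(0) - len(minority))
balanced_X = clean + new_points
balanced_y = labels + [1] * len(new_points)
print(balanced_y.count(0), balanced_y.count(1))
```

In practice, oversampling is applied to the training split only, so that synthetic rows never leak into validation or test data.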
Machine Learning (Results)
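The related work above reports accuracy, while [15] targets recall on defaulters. A minimal sketch of why both matter on imbalanced loan data follows; the labels and predictions are hypothetical toy values, not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (default) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def report(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical toy set: 10 loans, of which 2 default.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_repaid = [0] * 10                          # never flags a default
catches_one = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]      # one hit, one false alarm

baseline = report(y_true, always_repaid)
model = report(y_true, catches_one)
print(baseline)
print(model)
```

On this toy set both predictors reach the same accuracy, yet only the second one identifies any defaulter, which is why recall and F1 are worth reporting alongside accuracy.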
Deep Learning (ANN Results)
LOSS ACCURACY VAL_LOSS VAL_ACCURACY
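The four column names match the keys Keras records in its training history when accuracy is tracked, although the slides do not state which framework was used. As a minimal sketch, assuming binary cross-entropy as the loss and a 0.5 decision threshold, the values for one epoch could be computed as follows; all probabilities are hypothetical.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    hits = sum(1 for t, p in zip(y_true, y_prob) if (p >= threshold) == (t == 1))
    return hits / len(y_true)

def epoch_row(train, val):
    """Build one results row from (labels, probabilities) pairs."""
    (y_tr, p_tr), (y_va, p_va) = train, val
    return {
        "loss": binary_cross_entropy(y_tr, p_tr),
        "accuracy": accuracy(y_tr, p_tr),
        "val_loss": binary_cross_entropy(y_va, p_va),
        "val_accuracy": accuracy(y_va, p_va),
    }

# Hypothetical model outputs on a training split and a validation split.
train = ([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2])
val = ([0, 1], [0.4, 0.3])
row = epoch_row(train, val)
print(row)
```

A validation loss well above the training loss, as in this toy row, is the usual sign of overfitting to watch for when reading such a table.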
Deep Learning (LSTM Results)
LOSS ACCURACY VAL_LOSS VAL_ACCURACY
REFERENCES
[1] Zhou, Jing, et al. "Default prediction in P2P lending from high-dimensional data based on machine learning." Physica A: Statistical Mechanics and its Applications 534 (2019): 122370.
[2] Xu, Junhui, Zekai Lu, and Ying Xie. "Loan default prediction of Chinese P2P market: a machine learning methodology." Scientific Reports 11.1 (2021): 18759.
[3] Turiel, J. D., and T. Aste. "Peer-to-peer loan acceptance and default prediction with artificial intelligence." Royal Society Open Science 7.6 (2020): 191649.
[4] Niu, Beibei, et al. "Lender trust on the P2P lending: Analysis based on sentiment analysis of comment text." Sustainability 12.8 (2020): 3293.
[5] Ma, Xiaojun, et al. "Study on a prediction of P2P network loan default based on the machine learning LightGBM and XGboost algorithms according to different high dimensional data cleaning." Electronic Commerce Research and Applications 31 (2018): 24-39.
[6] Li, Li-Hua, et al. "Predicting the default borrowers in P2P platform using machine learning models." International Conference on Artificial Intelligence and Sustainable Computing. Cham: Springer International Publishing, 2021.
[7] Zhang, Yuejin, et al. "Determinants of loan funded successful in online P2P Lending." Procedia Computer Science 122 (2017): 896-901.
[8] Chang, An-Hsing, et al. "Machine learning and artificial neural networks to construct P2P lending credit-scoring model: A case using Lending Club data." Quantitative Finance and Economics 6.2 (2022): 303-325.
[9] Suryono, Ryan Randy, Betty Purwandari, and Indra Budi. "Peer to peer (P2P) lending problems and potential solutions: A systematic literature review." Procedia Computer Science 161 (2019): 204-214.
[10] Zhao, Hongke, et al. "P2P lending survey: Platforms, recent advances and prospects." ACM Transactions on Intelligent Systems and Technology (TIST) 8.6 (2017): 1-28.
[11] Au, Cheuk Hang, Barney Tan, and Yuan Sun. "Developing a P2P lending platform: stages, strategies and platform configurations." Internet Research 30.4 (2020): 1229-1249.
[12] Najaf, Khakan, Ravichandran K. Subramaniam, and Osama F. Atayah. "Understanding the implications of FinTech Peer-to-Peer (P2P) lending during the COVID-19 pandemic." Journal of Sustainable Finance & Investment 12.1 (2022): 87-102.
[13] Chang, A. H., et al. "Machine learning and artificial neural networks to construct P2P lending credit-scoring model: A case using Lending Club data." Quantitative Finance and Economics 6.2 (2022): 303-325.
[14] Shih, D. H., et al. "A Framework of Global Credit-Scoring Modeling Using Outlier Detection and Machine Learning in a P2P Lending Platform." Mathematics 10.13 (2022): 2282.
[15] Liu, Z., et al. "An innovative model fusion algorithm to improve the recall rate of peer-to-peer lending default customers." Intelligent Systems with Applications 20 (2023): 200272.
[16] Muslim, M. A., et al. "New model combination meta-learner to improve accuracy prediction P2P lending with stacking ensemble learning." Intelligent Systems with Applications 18 (2023): 200204.
[17] Yoon, Yeujun, Yu Li, and Yan Feng. "Factors affecting platform default risk in online peer-to-peer (P2P) lending business: an empirical study using Chinese online P2P platform data." Electronic Commerce Research 19 (2019): 131-158.
[18] https://www.investopedia.com/terms/p/peer-to-peer-lending.asp