UPI FRAUD DETECTION USING MACHINE LEARNING ALGORITHMS
ABSTRACT
As UPI usage for online payments increases, the number of fraud cases associated with it is also rising. One
approach models the sequence of steps in the UPI transaction process using a Hidden Markov Model (HMM). An
HMM is first trained on the normal behavior of an account holder; if an incoming UPI transaction is not accepted
by the trained HMM, it is considered fraudulent. People use UPI for online transactions because it is efficient
and easy to use, but as usage has grown, so has the scope for misuse. UPI fraud causes significant financial
losses for both UPI users and financial companies. The main aim of this project is to detect such fraud while
addressing the associated challenges: the limited accessibility of public data, highly class-imbalanced data,
the changing nature of fraud, and high false-alarm rates. The main focus is on applying recent developments in
machine learning to this purpose. We implemented five algorithms to detect UPI fraud and evaluated the results.
Different machine learning algorithms are compared, including modern techniques such as artificial neural
networks, autoencoders, Local Outlier Factor, and K-means clustering. The project uses these algorithms and a
neural network to search for an optimal solution to the problem and to flag fraudulent transactions implicitly.
This is a heuristic approach of the kind used to solve computationally complex problems. Implementing an
efficient fraud detection system is imperative for all UPI service providers and their clients to minimize
their losses.
Literature review
As digital payments grow in popularity, UPI transactions have seen widespread adoption, but this has also led to
a rise in fraud cases. Studies and reports highlight several categories of UPI fraud, including phishing scams,
social engineering attacks, and man-in-the-middle (MITM) attacks. Research shows that the increased reliance
on mobile applications for financial transactions introduces new security risks, especially in markets where
smartphone usage is high but digital literacy may vary significantly. For example, a report by the Reserve Bank
of India in recent years has noted a substantial increase in complaints related to unauthorized UPI transactions,
indicating the need for more robust security and fraud detection mechanisms.
Int. J. Eng. Res. & Sci. & Tech. 2024
ISSN 2319-5991 www.ijerst.com
Vol. 20, Issue 4, 2024
UPI ID/Virtual Payment Address (VPA): Unique identifiers linked to users' bank accounts, used in place of
account numbers.
Bank and NPCI Servers: Intermediary systems that facilitate authentication and transfer requests.
Two-Factor Authentication (2FA): A UPI mandate for security, where users verify their identity using a PIN,
usually entered through a mobile device.
Transaction Timestamps: Date and time stamps, which help identify the exact moment each transaction
occurs.
The UPI transaction flow begins with the sender initiating a request to transfer a specific amount. After
authorization via 2FA, the transaction request is routed through the NPCI servers and the respective banks. Once
verified, the funds are transferred instantly, with a notification sent to both parties confirming the successful
transaction. This rapid and straightforward process makes UPI attractive but also opens avenues for exploitation
by fraudsters, who can target any element of the transaction flow.
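The flow described above can be sketched as a small state machine. This is a minimal illustration only: the stage names, the `UpiTransaction` class, and the example VPAs are hypothetical and do not correspond to any NPCI or bank API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    """Stages of the flow as described in the text (names are illustrative)."""
    INITIATED = 1   # sender requests a transfer of a specific amount
    AUTHORIZED = 2  # sender passes 2FA (PIN entered on the mobile device)
    ROUTED = 3      # request passes through the NPCI and bank servers
    COMPLETED = 4   # funds transferred and both parties notified


@dataclass
class UpiTransaction:
    sender_vpa: str
    receiver_vpa: str
    amount: float
    stage: Stage = Stage.INITIATED
    # One timestamp per stage reached; these can later serve as model features.
    timestamps: dict = field(default_factory=dict)

    def __post_init__(self):
        self.timestamps[self.stage] = datetime.now(timezone.utc)

    def advance(self) -> Stage:
        """Move to the next stage; a completed transaction cannot advance."""
        if self.stage is Stage.COMPLETED:
            raise ValueError("transaction already completed")
        self.stage = Stage(self.stage.value + 1)
        self.timestamps[self.stage] = datetime.now(timezone.utc)
        return self.stage


# Walk one hypothetical transaction through every stage.
txn = UpiTransaction("alice@examplebank", "bob@examplebank", 250.0)
while txn.stage is not Stage.COMPLETED:
    txn.advance()
print(txn.stage.name, len(txn.timestamps))
```

Recording a timestamp at each stage is one way to expose timing information, such as an unusually short gap between initiation and authorization, to a downstream fraud model.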
Methodology
The methodology section outlines the approach for building an effective UPI fraud detection model. It covers the
data collection and preprocessing techniques, feature engineering, algorithm selection, and model building
processes used in this project.
Data Collection:
The accuracy and reliability of a fraud detection model rely heavily on the quality of data it’s trained on. In UPI
fraud detection, data collection involves acquiring historical transaction data with labels indicating whether a
transaction was fraudulent or legitimate.
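Because fraudulent transactions are typically rare, a useful first check on labeled data is the class balance. The sketch below uses a small, made-up set of records (the IDs, amounts, and labels are assumptions for illustration, not project data) and derives inverse-frequency class weights, the same heuristic scikit-learn documents for `class_weight="balanced"`.

```python
from collections import Counter

# Hypothetical labeled records: (transaction_id, amount, is_fraud).
records = [
    ("t1", 120.0, 0), ("t2", 45.5, 0), ("t3", 9800.0, 1), ("t4", 60.0, 0),
    ("t5", 15.0, 0), ("t6", 300.0, 0), ("t7", 75.0, 0), ("t8", 20.0, 0),
    ("t9", 5000.0, 1), ("t10", 110.0, 0),
]


def class_balance(rows):
    """Return (count per label, fraud rate) for labeled rows."""
    counts = Counter(label for _, _, label in rows)
    total = sum(counts.values())
    fraud_rate = counts[1] / total if total else 0.0
    return counts, fraud_rate


def balanced_class_weights(counts):
    """Weight each class inversely to its frequency: n / (k * n_c)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


counts, fraud_rate = class_balance(records)
weights = balanced_class_weights(counts)
print(dict(counts), fraud_rate, weights)
```

Weights of this kind can be passed to a classifier so that errors on the minority (fraud) class cost more during training, which is one common response to class imbalance.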
Experimental Setup:
Environment and Tools: The experimentation was conducted using a data science environment such as
Jupyter Notebook, with programming in Python. Key libraries included Pandas for data handling, Scikit-Learn
for machine learning models, XGBoost and LightGBM for boosting algorithms, and Matplotlib and Seaborn
for visualization.
Results and Analysis:
Comparison of Models: The supervised models, particularly XGBoost and LightGBM, achieved the highest
accuracy and recall, indicating they could capture intricate patterns in the transaction data. Autoencoders
showed promising results as an anomaly detection method, especially in identifying unexpected patterns, but
required more tuning for consistent results. Hybrid models demonstrated versatility by combining strengths
from supervised and unsupervised methods, especially in adapting to new fraud patterns.
Discussion
The discussion section reflects on the findings and implications of implementing a UPI fraud detection system
using machine learning. This part evaluates the overall effectiveness, explores observed fraud patterns, assesses
model performance, discusses challenges encountered, and considers the broader impact on the digital payments
landscape.
The UPI fraud detection project reveals several important insights regarding fraud detection techniques and
transaction behavior:
• Patterns in Fraudulent Transactions: The system uncovered patterns commonly associated with
fraudulent transactions, such as unusually high transaction frequency, changes in transaction location,
or atypical amounts for specific users. These patterns are essential for developing feature sets and
refining the model.
• Effectiveness of Machine Learning: Machine learning proved effective for this application, with the
model able to recognize complex patterns that traditional rule-based systems might miss. The machine
learning approach allowed for improved adaptability as the system learns from new data over time.
• Feature Importance: Feature engineering emerged as a critical factor in model performance. Features
like transaction frequency, amount variance, and geolocation played significant roles in identifying
suspicious transactions. This emphasizes the importance of domain-specific feature selection in fraud
detection.
• System Performance: The system demonstrated the ability to process high transaction volumes in
near-real-time, confirming that a well-optimized machine learning model can handle the speed and
scale required for UPI fraud detection.
• High Recall, Balanced Precision: A high recall rate was prioritized to capture as many fraudulent
transactions as possible, even at the cost of some false positives. This approach was necessary to ensure
that no significant fraudulent activities were missed.
• Precision Considerations: A balance was maintained in precision to reduce false positives, though
some genuine transactions were occasionally flagged. This balance between precision and recall was
critical to minimize interruptions for legitimate users while ensuring fraud coverage.
• Improvement Over Rule-Based Systems: Unlike traditional rule-based systems, which are often rigid
and easily circumvented by sophisticated fraud tactics, the machine learning model provided a more
dynamic solution capable of adapting to evolving fraud techniques.
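The trade-off between recall and precision discussed above can be made concrete with a small worked example. The labels and model scores below are invented for illustration and are not results from this project; the sketch only shows how lowering the decision threshold raises recall at the cost of more false positives.

```python
def confusion_counts(y_true, scores, threshold):
    """Count (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for y, s in zip(y_true, scores):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground truth (1 = fraud) and model scores.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.35, 0.90, 0.20, 0.45, 0.55, 0.05, 0.70, 0.30, 0.15]

for threshold in (0.5, 0.3):
    tp, fp, fn, tn = confusion_counts(y_true, scores, threshold)
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy data, the threshold of 0.5 misses one fraud case, while 0.3 catches all three but flags three genuine transactions, which mirrors the decision to accept some false positives in exchange for fraud coverage.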
Conclusion
The UPI Fraud Detection project set out to address the rising concerns of fraud in digital payments by leveraging
machine learning techniques. The system aimed to detect potentially fraudulent transactions in real-time,
safeguarding user accounts and supporting the security goals of the UPI (Unified Payments Interface) network.
This section reflects on the project’s primary achievements, limitations, and implications, while highlighting
future opportunities for enhancing digital payment security.
Machine learning proved to be a robust solution for UPI fraud detection, surpassing traditional rule-based
systems in adaptability and precision. Key reflections include:
• Adaptability and Responsiveness: The machine learning model’s ability to adapt to new fraud patterns
over time enables the system to respond to dynamic fraud techniques, a critical advantage over rule-
based systems. This adaptability enhances the overall security of UPI transactions.
• Enhanced Detection Accuracy: Through advanced feature engineering and model training, the system
achieved a high level of accuracy, with well-balanced precision and recall rates. This reflects the
potential of machine learning to distinguish between legitimate and fraudulent transactions with a low
margin of error.
• Data Dependency: The project underscored the importance of quality data for effective model training.
To sustain high detection accuracy, ongoing data collection and model retraining are essential to keep
the system updated with the latest fraud patterns.
Appendices
The Appendices section provides additional details and supporting information for the UPI fraud
detection project. This section includes data dictionaries, code samples, statistical test results,
hyperparameter configurations, and other relevant technical details. Each appendix is designed to
enhance understanding, ensuring the reproducibility and transparency of the project.
This appendix provides an in-depth data dictionary for the dataset used in the project. It
lists and describes each feature used in the model, along with the data type and sample
values. This is useful for understanding how specific attributes contribute to fraud
detection.
This data dictionary ensures clarity on how each variable functions within the
fraud detection model.
This appendix provides code snippets and implementation details of the machine
learning models used for fraud detection, such as data preprocessing, feature engineering,
model training, and evaluation. Detailed code helps replicate or understand the process
for technical audiences.
• Data Preprocessing:
➢ Code for handling missing values, outlier treatment, and data
normalization.
➢ Examples of label encoding or one-hot encoding for categorical
variables.
• Feature Engineering:
➢ Code snippets for creating new features, such as frequency of
transactions, average transaction amount, and device ID usage
patterns.
• Model Training and Evaluation:
➢ Code used to split the dataset into training and testing sets.
➢ Implementation details for each machine learning model (e.g.,
Decision Trees, Random Forest, and XGBoost).
➢ Code for cross-validation, hyperparameter tuning, and evaluation
metrics like confusion matrix, precision, recall, and F1-score.
• Distribution Analysis:
➢ Tests for normality or skewness of transaction amounts or frequencies.
➢ Box plots or histograms displaying distribution comparisons between
fraudulent and legitimate transactions.
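As an illustration of the feature engineering step listed above (transaction frequency, average transaction amount, and device ID usage), the following sketch aggregates a hypothetical transaction log per user. The field layout, VPAs, and device IDs are assumptions made for the example, not the project's actual schema.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical transaction log: (user_vpa, amount, device_id).
transactions = [
    ("alice@bank", 100.0, "dev-A"),
    ("alice@bank", 120.0, "dev-A"),
    ("alice@bank", 5000.0, "dev-B"),
    ("bob@bank", 40.0, "dev-C"),
    ("bob@bank", 60.0, "dev-C"),
]


def user_features(rows):
    """Aggregate per-user features from raw transaction rows."""
    amounts = defaultdict(list)
    devices = defaultdict(set)
    for user, amount, device in rows:
        amounts[user].append(amount)
        devices[user].add(device)
    return {
        user: {
            "txn_count": len(vals),                  # frequency within the log
            "avg_amount": mean(vals),                # average transaction amount
            "amount_variance": pvariance(vals),      # population variance of amounts
            "distinct_devices": len(devices[user]),  # device ID usage
        }
        for user, vals in amounts.items()
    }


features = user_features(transactions)
for user, feats in features.items():
    print(user, feats)
```

In a real pipeline these aggregates would normally be computed over a time window that ends before the transaction being scored, so that the features do not leak information from the future.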
models like Random Forest or XGBoost.
➢ Detailed explanation of feature importance scores, indicating which
factors most influence fraud prediction.
This appendix provides detailed performance metrics and visualizations for model
evaluation, enabling a deeper understanding of the system’s effectiveness.
• Precision-Recall Curve:
➢ Precision-recall curves to assess the model’s performance, especially
useful in imbalanced datasets where fraud cases are less frequent.
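A precision-recall curve is obtained by sweeping the decision threshold over the model's scores. The sketch below computes the curve's points with the standard library only; the labels and scores are hypothetical, and a library routine such as scikit-learn's `precision_recall_curve` would normally be used instead.

```python
def precision_recall_points(y_true, scores):
    """Return (threshold, precision, recall) for each distinct score, high to low."""
    total_pos = sum(y_true)
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        # tp + fp >= 1 here because the threshold is itself one of the scores.
        precision = tp / (tp + fp)
        recall = tp / total_pos if total_pos else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical ground truth (1 = fraud) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

points = precision_recall_points(y_true, scores)
for threshold, precision, recall in points:
    print(f"{threshold:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

Recall can only stay the same or rise as the threshold drops, while precision may move in either direction, which is why the curve is more informative than accuracy when fraud cases are a small fraction of the data.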
This appendix clarifies the overall design, making it easier for others
to understand and potentially replicate the system setup.
Include additional findings or exploratory insights that were not central to the main
results but are noteworthy. For instance:
• Referencing: Reference each appendix within the main text where relevant
(e.g., “See Appendix A for the data dictionary”).
• Clarity and Detail: Ensure that each appendix is organized logically, with
explanations and legends for any charts, tables, or code snippets.