IEEE Transaction on Machine Learning, Volume: 11, Issue Date: Jan. 2023

Fraud Detection in Banking Data by Machine Learning Techniques
SEYEDEH KHADIJEH HASHEMI 1, SEYEDEH LEILI MIRTAHERI 1, AND SERGIO GRECO 2
1 Department of Electrical and Computer Engineering, Faculty of Engineering, Kharazmi University, Tehran 15719-14911, Iran
2 Department of Informatics, Modeling, Electronics and System Engineering, University of Calabria, 87036 Rende, Italy

Corresponding author: Seyedeh Leili Mirtaheri ([email protected])

ABSTRACT As technology advanced and e-commerce services expanded, credit cards became one of the
most popular payment methods, resulting in an increase in the volume of banking transactions. Furthermore,
the significant increase in fraud imposes high costs on banking transactions. As a result, detecting fraudulent
activities has become a fascinating topic. In this study, we consider the use of class weight-tuning
hyperparameters to control the weight of fraudulent and legitimate transactions. We use Bayesian optimization
in particular to optimize the hyperparameters while accounting for practical issues such as unbalanced data.
We propose weight-tuning as a pre-process for unbalanced data, as well as CatBoost and XGBoost to
improve the performance of the LightGBM method by accounting for the voting mechanism. Finally, in order
to improve performance even further, we use deep learning to fine-tune the hyperparameters, particularly
our proposed weight-tuning one. We perform experiments on real-world data to test the proposed
methods. To better cover unbalanced datasets, we use recall-precision metrics in addition to the standard
ROC-AUC. CatBoost, LightGBM, and XGBoost are evaluated separately using a 5-fold cross-validation
method. Furthermore, the majority voting ensemble learning method is used to assess the performance of
the combined algorithms. According to the results, the combination of LightGBM and XGBoost achieves the
best values: ROC-AUC = 0.95, precision = 0.79, recall = 0.80, F1 score = 0.79, and MCC = 0.79. By using deep
learning and the Bayesian optimization method to tune the hyperparameters, we also achieve ROC-AUC = 0.94,
precision = 0.80, recall = 0.82, F1 score = 0.81, and MCC = 0.81. This is a significant improvement over
the state-of-the-art methods we compared it to.

INDEX TERMS Bayesian optimization, data mining, deep learning, ensemble learning, hyperparameter,
unbalanced data, machine learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Zhan Bu.

I. INTRODUCTION
In recent years, there has been a significant increase in the volume of financial transactions due to the expansion of financial institutions and the popularity of web-based e-commerce. Fraudulent transactions have become a growing problem in online banking, and fraud detection has always been challenging [1], [2].

Along with credit card development, the pattern of credit card fraud has always been updated. Fraudsters do their best to make it look legitimate. They try to learn how fraud detection systems work and continue to stimulate these systems, making fraud detection more complicated. Therefore, researchers are constantly trying to find new ways or improve the performance of the existing methods [3].

People who commit fraud usually exploit security, control, and monitoring weaknesses in commercial applications to achieve their goals. However, technology can be a tool to combat fraud [4]. To prevent further possible fraud, it is important to detect the fraud right away after its occurrence [5].

Fraud can be defined as wrongful or criminal deception intended to result in financial or personal gain. Credit card fraud is related to the illegal use of credit card information
for purchases in a physical or digital manner. In digital transactions, fraud can happen over the phone line or the web, since the cardholders usually provide the card number, expiration date, and card verification number by telephone or website [6].

There are two mechanisms, fraud prevention and fraud detection, that can be exploited to avoid fraud-related losses. Fraud prevention is a proactive method that stops fraud from happening in the first place. On the other hand, fraud detection is needed when a fraudster attempts a fraudulent transaction [7].

Fraud detection in banking is considered a binary classification problem in which data is classified as legitimate or fraudulent [8]. Because banking datasets contain a very large amount of transaction data, manually reviewing them and finding patterns for fraudulent transactions is either impossible or takes a long time. Therefore, machine learning-based algorithms play a pivotal role in fraud detection and prediction [9]. Machine learning algorithms and high processing power increase the capability of handling large datasets and detecting fraud in a more efficient manner. Machine learning algorithms and deep learning also provide fast and efficient solutions to real-time problems [10].

In this paper, we propose an efficient approach for detecting credit card fraud that has been evaluated on publicly available datasets and uses the optimised algorithms LightGBM, XGBoost, CatBoost, and logistic regression individually, as well as majority voting combined methods, deep learning, and hyperparameter settings. An ideal fraud detection system should detect more fraudulent cases, and the precision of detecting fraudulent cases should be high, i.e., all results should be correctly detected. This leads to the trust of customers in the bank, and, on the other hand, the bank will not suffer losses due to incorrect detection.

The main contributions of this paper are summarized as follows:
• We adopt Bayesian optimization for fraud detection and propose to use the weight-tuning hyperparameter to solve the unbalanced data issue as a pre-process step. We also suggest using CatBoost and XGBoost alongside LightGBM to improve performance. We use the XGBoost algorithm due to the high speed of training in big data as well as the regularization term, which overcomes overfitting by measuring the complexity of the tree, and it does not require much time to set the hyperparameters. We also use the CatBoost algorithm because there is no need to adjust hyperparameters for overfitting control, and it also obtains good results without changing hyperparameters compared to other machine learning algorithms.
• We propose a majority-voting ensemble learning approach to combine CatBoost, XGBoost, and LightGBM and review the effect of the combined methods on the performance of fraud detection on real, unbalanced data. We also propose to use deep learning for adjusting and fine-tuning the hyperparameters.
• To evaluate the performance of the proposed methods, we perform extensive experiments on real-world data. To better cover the unbalanced datasets, we use recall-precision in addition to the typically used ROC-AUC. We also evaluate the performance using F1_score and MCC metrics. According to the results, the proposed methods outperform the existing and baseline methods. For evaluations, we use publicly available datasets and also publish the source codes 1 with public access to be used by other researchers.

1 The codes are available at https://github.com/khadijehHashemi/Fraud-Detection-in-Banking-Data-by-Machine-Learning-Techniques

The remainder of this paper is organized as follows: In Section II we review the related state-of-the-art. The proposed approach for credit card fraud detection, including the dataset, pre-processing, feature extraction and feature selection, algorithms, framework, and evaluation metrics, is presented in Section III. Section IV discusses the evaluation results of the experiments performed, and finally Section V concludes the paper.

II. RELATED WORKS
In order to prevent fraudulent transactions and detect credit card fraud, several methods have been proposed by researchers. A review of state-of-the-art related works is presented in the following.

Halvaiee & Akbari study a new model called the AIS-based fraud detection model (AFDM). They use the Immune System Inspired Algorithm (AIRS) to improve fraud detection accuracy. The presented results of their paper show that their proposed AFDM improves accuracy by up to 25%, reduces costs by up to 85%, and reduces system response time by up to 40% compared to basic algorithms [11].

Bahnsen et al. developed a transaction aggregation strategy and created a new set of features based on the periodic behaviour analysis of the transaction time by using the von Mises distribution. In addition, they propose a new cost-based criterion for evaluating credit card fraud detection models and then, using a real credit card dataset, examine how different feature sets affect results. More precisely, they extend the transaction aggregation strategy to create new features based on an analysis of the periodic behaviour of transactions [12].

Randhawa et al. study the application of machine learning algorithms to detect fraud in credit cards. They first use Naive Bayes, random forest and decision trees, neural networks, linear regression (LR), and logistic regression, as well as support vector machine standard models, to evaluate the available datasets. Further, they propose a hybrid method by applying AdaBoost and majority voting. In addition, they add noise to the data samples for robustness evaluation. They perform experiments on publicly available datasets and show that majority voting is effective in detecting credit card fraud cases [6].

Porwal and Mukund propose an approach that uses clustering methods to detect outliers in a large dataset and is resistant
to changing patterns [13]. The idea behind their proposed approach is based on the assumption that the good behaviour of users does not change over time and that the data points that represent good behaviour have a consistent spatial signature under different groupings. They show that fraudulent behaviours can be detected by identifying the changes in this data. They also show that the area under the precision-recall curve is better than ROC as an evaluation criterion [13].

The authors in [14] propose a group learning framework based on partitioning and clustering of the training set. Their proposed framework has two goals: 1) to ensure the integrity of the sample features, and 2) to solve the high imbalance of the dataset. The main feature of their proposed framework is that every base estimator can be trained in parallel, which improves the effectiveness of their framework.

Itoo et al. use three different ratios of datasets and an oversampling method to deal with the problem of data imbalance. The authors use three machine learning algorithms: logistic regression, Naive Bayes, and K-nearest neighbor. The performance of the algorithms is measured based on accuracy, sensitivity, specificity, precision, F1-score, and area under the curve. They show that the logistic regression-based model outperforms the other commonly used fraud detection algorithms in the paper [15].

The authors in [16] propose a framework that combines the potential of meta-learning ensemble techniques and a cost-sensitive learning paradigm for fraud detection. They perform some evaluations, and the results obtained from classifying unseen data show that the cost-sensitive ensemble classifier has an acceptable AUC value and is efficient as compared to the performances of ordinary ensemble classifiers.

Altyeb et al. propose an intelligent approach for detecting fraud in credit card transactions [17]. Their proposed Bayesian-based hyperparameter optimization algorithm is used to tune the parameters of a LightGBM. They perform experiments on publicly available credit card transaction datasets. These datasets consist of fraudulent and legitimate transactions. Their evaluation results are reported in terms of accuracy, area under the receiver operating characteristic curve (ROC-AUC), precision, and F1-score metrics.

Xiong et al. propose a learning-based approach to tackle the fraud detection problem. They use feature engineering techniques to boost the proposed model’s performance. The model is trained and evaluated on the IEEE-CIS fraud dataset. Their experiments show that the model outperforms traditional machine-learning-based methods like Bayes and SVM on the used dataset [18].

Viram et al. evaluate the performance of Naive Bayes and voting classifier algorithms. They demonstrate that in terms of evaluated metrics, particularly accuracy, the voting classifier outperforms the Naive Bayes algorithm [19].

Verma and Tyagi investigate machine learning algorithms in order to determine the best supervised ML-based algorithm for credit card fraud detection in the presence of an imbalanced dataset. They evaluate five classification techniques and show that the support vector classifier and logistic regression classifier outperform other algorithms in an imbalanced dataset [20]. The summary of the literature review is presented in Fig. 1.

III. PROPOSED APPROACH TO DETECTING CREDIT CARD FRAUD
The proposed framework for fraud detection is presented in Fig. 2. As this figure shows, we first apply the desired pre-processing on the data and further divide the data into two sections: training and testing, followed by performing Bayesian optimization on the training data to find the best hyperparameters that lead to the improvement of the performance. We use the cross-validation method to obtain performance comparison in an unbalanced set and then examine the algorithms using different evaluation metrics, including accuracy, precision, recall, the Matthews correlation coefficient (MCC), the F1-score, and AUC diagrams. These steps are explained in detail as follows:

A. DATASET
In this paper, we use a real dataset so that the outcome of the proposed algorithm can be used in practice. We consider a dataset named "creditcard" that contains 284,807 records of two days of transactions made by credit card holders in September 2013. There are 492 fraudulent transactions, and the rest of the transactions are legitimate. The positive class (frauds) accounts for 0.172% of all transactions; hence, the dataset is highly imbalanced. This dataset is available and can be accessed through https://www.kaggle.com/mlg-ulb/creditcardfraud.

This dataset contains only numerical input variables resulting from a principal component analysis (PCA) transformation. Unfortunately, the original features and background information about the data are not given due to confidentiality and privacy considerations. PCA yielded the following principal components: V1, V2, ..., V28. The features not transformed with PCA are "Time" and "Amount". The "Time" column contains the time (in seconds) elapsed between each transaction and the first transaction in the dataset. The feature "Amount" shows the transaction amount. Feature "Class" is the response variable, and it takes the value 1 in case of fraud and 0 otherwise. The summary of the variables and features is presented in Table 1.

TABLE 1. Features of the credit-card fraud dataset that is used in this paper.

B. DATA PRE-PROCESSING
As illustrated in Table 2, the total number of fraudulent transactions is significantly lower than the total number of
FIGURE 1. Summary of the related works on fraud detection in banking industry with machine learning techniques.

FIGURE 2. Proposed framework for credit card fraud detection.

legitimate transactions, indicating that the data distribution is unbalanced. In real datasets for credit card fraud detection, unbalanced data is expected. This data imbalance causes performance issues in machine learning algorithms, and having a class with the majority of the samples influences the evaluation results [6]. Therefore, in many studies, under-sampling and over-sampling methods are used to solve the data imbalance problem [15]. Using under-sampling methods leads to data loss [21]. Besides, using over-sampling methods leads to the production of duplicate data that doesn’t provide information (data and information are different, and this distinction is discussed under the topic of entropy). Some researchers use the synthetic minority oversampling technique (SMOTE) as a solution, which avoids the drawbacks of under- and over-sampling [5], [17], [22]. However, the SMOTE method causes an increase in the false-positive rate, which is not acceptable in banking
for customer orientation. To solve this problem, in this study, we use a class weight-tuning hyperparameter, which avoids the disadvantages mentioned above.

TABLE 2. Transaction label distribution in the "credit card" dataset. Such unbalanced data is expected in real-life datasets.
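
To make the class-weight idea concrete, the following minimal sketch derives starting weights from the label counts reported for the "creditcard" dataset (492 fraudulent out of 284,807 transactions). It uses the common inverse-frequency heuristic, which is an assumption made here for illustration and not the tuned values of this study, where the weight is a hyperparameter searched by Bayesian optimization. The function name `inverse_frequency_weights` is hypothetical.

```python
def inverse_frequency_weights(counts):
    """Return n_samples / (n_classes * n_c) for every class label c.

    This is a generic heuristic (an assumption for illustration),
    not the tuned class weight reported in the paper.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Label counts reported for the dataset: 0 = legitimate, 1 = fraud.
counts = {0: 284807 - 492, 1: 492}
weights = inverse_frequency_weights(counts)

# Ratio of legitimate to fraudulent samples; a plausible centre for the
# range over which a class-weight hyperparameter could be searched.
imbalance_ratio = counts[0] / counts[1]

print(weights)
print(round(imbalance_ratio, 1))
```

Weights of this kind can serve as a starting point for the class-weight parameter that the classifiers used here expose; the optimiser is then free to move away from it if a different weight gives better precision and recall.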
C. FEATURE EXTRACTION
The "Time" feature includes the time (in seconds) elapsed between each transaction and the first transaction. To make the most of the feature, we expand it to extract the transaction hour feature, which gives us more information than the time feature itself.

D. FEATURE SELECTION
The features are unknown except for "Time" and "Amount", and we have no additional information. Feature selection tries to find a subset of features that improve the classifier’s performance on effectively detecting credit card fraud [23]. The information gain (IG) method is used to select the most important features that lead to a dimension reduction of the training data. Information gain functions by extracting similarities between credit card transactions and then awarding the greatest weight to the most significant features based on the class of legitimate and fraudulent credit card transactions [17], [24]. The information gain method has been proven to be computationally efficient and shows leading performance in terms of precision [17]. Therefore, we also consider the IG method for feature selection in the proposed framework. Figure 3 shows the diagram of the IG, and the top six features extracted by this method have been used to evaluate the proposed algorithm.

FIGURE 3. Feature importance diagram that shows the IG for the unknown features of the "creditcard" dataset. The top six features are used in evaluations.

E. ALGORITHMS
Hyperparameters have a significant effect on the performance of machine learning models. We refer to optimization as the process of finding the best set of hyperparameters that configure a machine learning algorithm during its training. Recently, it was shown that the Bayesian method is capable of finding the optimised values in a much smaller number of training runs compared with evolutionary optimization methods [25], [26]. In this paper, we use the Bayesian optimization algorithm to tune the hyperparameters, which leads to computational time reduction and performance improvement.

1) LOGISTIC REGRESSION
Logistic regression is a predictive analysis that finds out if two or more variables are related to each other. This method determines whether there is a relationship between one binary dependent variable and one or more ordinal, nominal, interval, or ratio-level independent variables [27].

This algorithm cannot be applied directly to unbalanced data. Therefore, we use the class weight hyperparameter to address the class imbalance prior to applying logistic regression. We show that the ROC-AUC curve is not suitable for the evaluation of unbalanced data and leads to false interpretations.

2) LightGBM
The LightGBM algorithm is built on the GBDT framework and aims to improve computational efficiency, particularly on big data prediction problems [28]. The high-performance LightGBM algorithm can quickly handle large amounts of data and supports the distributed processing of data [17]. In LightGBM, the histogram-based algorithm and the trees’ leaf-wise growth strategy with a maximum depth limit are adopted to increase the training speed and reduce memory consumption. The tuned hyperparameters include "num_leaves", which is the number of leaves per tree, "max_depth", which denotes the maximum depth of the tree, and "learning_rate"; the class imbalance is also addressed by tuning the weight of the class. If the number of leaves grows excessively, the model tends to overfit. Therefore, we need to consider a suitable range for this algorithm to obtain good optimization results.

3) XGBoost
eXtreme Gradient Boosting (XGBoost) has become a dominant algorithm in the field of applied machine learning. XGBoost is a gradient-boosted decision tree algorithm. It is preferred over other gradient boosting machines (GBMs) due to its fast execution speed, model performance, and use of memory resources [28]. This algorithm is a hybrid technique in which new models are added to fix errors caused by existing models. XGBoost includes parallel computation to construct trees using all the CPUs during training. Instead of traditional stopping criteria (i.e., criterion
first), it makes use of the "max_depth" parameter and starts tree pruning from the backward direction, which significantly improves the computational performance and speed of XGBoost [28]. XGBoost employs a more regularised model formalization to control over-fitting and achieve better performance [29]. The tuned hyperparameters include learning rate, number of trees, and maximum tree depth, as well as the weight applied to the classes.

4) CatBoost
Category Boosting (CatBoost) is a new gradient boosting algorithm proposed by Prokhorenkova et al. [29]. CatBoost is a competitive candidate in the realm of classifiers for highly unbalanced data [30]. The CatBoost machine learning algorithm is a particular type of gradient boosting on decision trees, as it can handle categorical, ordered features, and the over-fitting of the model is taken care of by Bayesian estimators [31]. CatBoost doesn’t require extensive data training like other machine learning models and can be successfully applied to diverse types and formats of data [29], [30]. CatBoost has both CPU and GPU implementations. The GPU implementation allows for much faster training and is faster than both state-of-the-art open-source GBDT GPU implementations, XGBoost and LightGBM, on ensembles of similar sizes [32]. CatBoost uses a more efficient strategy that reduces over-fitting and allows the use of the whole dataset for training. We perform a random permutation of the dataset, and also, for data imbalance problems, we use a class weight hyperparameter.

5) MAJORITY VOTING
Ensemble learning (EL), which is a type of machine learning, combines several classifiers, minimises the error of the classifiers, and achieves more reasonable results than a single technique. A majority voting classifier is not a real classifier, but a method in which several classifiers are trained and evaluated in parallel in order to use the different strengths of each algorithm. We can train the data using different hybrid algorithms to predict the final output. The final result of the prediction is determined by a majority of votes according to two different strategies: hard voting and soft voting. If voting is hard, it uses the predicted class labels and applies the majority rule. Otherwise, if the vote is soft, it predicts the class label based on the "argmax" of the sum of the predicted probabilities, which is recommended for a set of well-calibrated classifiers. In this case, the probability vector is averaged over all classifiers for each predicted class. The winning class is the one with the highest value [27], [33].

$$\hat{y} = \arg\max \frac{1}{N_{\text{Classifiers}}} \sum_{\text{Classifiers}} (p_1, \ldots, p_n) \tag{1}$$

6) DEEP LEARNING
Deep learning algorithms are a class of machine learning algorithms where multiple hidden layers are used to improve the outcome. Deep learning is shown to be a very promising solution to deal with fraud in financial transactions, making the best use of banks’ big data [34]. Deep learning is a generic term that refers to machine learning using a deep multi-layer artificial neural network (ANN). It is a biologically inspired model of human neurons, composed of multi-level hidden layers of nonlinear processing units, where each neuron is able to send data to a connected neuron within the hidden layers. These processing units discover intermediate representations in a hierarchical manner. The features discovered in one layer form the basis for the processing of the succeeding layer. In this way, deep learning algorithms learn intermediate concepts between raw input and target knowledge [34].

In this paper, we use a sequential model, which is a linear stack of layers, to construct an artificial neural network model. Our model uses the Dense layer class, which is a very common and frequently used layer. In the neural network, the activation function is used to increase the predictive power. This function transforms input signals into output signals. We use the "Relu" activation function, and in the last layer, we use "Sigmoid", since our output is binary. The Sigmoid function generates values in a range between zero and one. In the "Relu" function, if the value x is smaller than or equal to zero, the output is zero. The behaviour of the Relu activation function is in many ways similar to that of our biological neurons.

Neural networks require initial weighting. We use the kernel initializer, which defines the method of determining the random initial weights of the Keras layers. To overcome the unbalanced data problem, we consider the ratio of 1 to 4 for the weight of the majority class to the minority class. This causes an increase in the processing speed as well as increasing the efficiency of the model. The size of the input layer is equal to the number of features plus the extracted features. We also remove the "Time" feature. To build the Keras model, we optimise the number of layers and neurons, the number of epochs, and the batch size, which leads to an increase in speed. Commonly, batch size is set to 32 or 128. However, our dataset is highly unbalanced, and by choosing the common batch size, there may be no fraud cases in the batch during training. Therefore, our range is chosen so that we can see fraudulent samples in each batch. Also, by choosing a larger batch size, the processing is faster, and we also need less memory. Large epoch sizes can result in either over- or under-fitting. Therefore, selecting the appropriate range for optimization not only increases the efficiency of the algorithm but also reduces the time required to find the optimal points. By performing Bayesian optimization, the number of neurons in the first hidden layer is set to 86, the number of epochs is set to 117, and the batch size is set to 1563. The details of our model are presented in Table 3.

Following Keras and with the help of the compile method and the Adam optimizer, we perform weight updates and use binary cross-entropy for the loss function, which finalises the configuration of the learning and training process.
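
The batch-size argument in this subsection can be checked with a few lines of arithmetic. The sketch below assumes that each batch is a simple random draw from the 284,807 transactions, with draws treated as independent (an idealisation of shuffled mini-batches, not a description of the actual training code). The helper names `expected_frauds` and `prob_no_fraud` are hypothetical.

```python
# Counts reported for the "creditcard" dataset.
N_TOTAL = 284_807
N_FRAUD = 492
FRAUD_RATE = N_FRAUD / N_TOTAL


def expected_frauds(batch_size):
    """Expected number of fraudulent samples in one batch."""
    return batch_size * FRAUD_RATE


def prob_no_fraud(batch_size):
    """Probability that a batch contains no fraudulent sample.

    Treats draws as independent, which is a close approximation
    because every batch is much smaller than the dataset.
    """
    return (1.0 - FRAUD_RATE) ** batch_size


# Two common batch sizes and the batch size selected in this paper.
for batch_size in (32, 128, 1563):
    print(batch_size,
          round(expected_frauds(batch_size), 2),
          round(prob_no_fraud(batch_size), 3))
```

Under these assumptions, a batch of 32 contains no fraud case roughly 95% of the time, whereas a batch of 1563 holds about 2.7 fraud cases on average, which is consistent with the choice of a large batch size.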

TABLE 3. Details of our deep learning model used in the paper. The total number of parameters is 7593, and all are trainable.

FIGURE 4. ROC_AUC curve.

F. EVALUATION METRICS
We apply a cross-validation test to evaluate the performance of the proposed model for credit card fraud detection. Similar to [6], [17], we use a stratified 5-fold validation test to obtain a reliable performance comparison in the unbalanced set. The dataset is divided randomly into five separate subsets of equal size, where the proportion of samples of each class is preserved in each subset. In all steps of validation, a single subset (20% of the dataset) is reserved as the validation data to test the performance of the proposed approach, while the remaining four subsets (80% of the dataset) are employed as the training data. We repeat this process five times until all subsets are used. The average performances of the five test subsets are calculated, and the final result is the performance of the proposed approach on a 5-fold cross-validation test.

To be fair in our comparisons, we use the common metrics for our evaluations, including accuracy, precision, recall, the Matthews correlation coefficient (MCC), the F1-score, and AUC diagrams. The positive class represents fraudulent transactions in our experiments, while the negative class represents legitimate ones. True positive (TP) represents fraudulent transactions that have been classified as such. False positives (FP) indicate the number of legitimate transactions misclassified as fraudulent. The true negative (TN) represents legitimate transactions classified as legitimate, and the false negative (FN) indicates the fraudulent transactions misclassified as legitimate [15]. The mathematical expressions for the metrics used are given in Eq. (2) to Eq. (6).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{5}$$

$$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{6}$$

Accuracy quantifies the total performance of the classifier and is defined as the fraction of correct predictions made by the model. When dealing with data that isn’t balanced, this criterion doesn’t give good results because it yields a high value even if hardly any fraudulent transactions are found. Recall shows the efficiency of the classifier in detecting actual fraudulent transactions. Precision measures the reliability of the classifier, and F1-Score is the harmonic mean of recall and precision, which considers both false negatives and false positives.

ROC-AUC is a measure of separability that demonstrates the model’s ability to differentiate between classes [15]. The ROC curve is a graphical plot of the false positive rate (FPR) and the true positive rate (TPR) at different possible threshold levels [17]. The area under the ROC curve is not a suitable criterion for evaluating fraud detection methods since it only considers positive values.

The precision-recall curve is commonly used to compare classifiers in terms of precision and recall. Usually, in this two-dimensional graph, the precision rate is plotted on the y-axis and the recall is plotted on the x-axis. It is difficult to describe the true and false positives and negatives using a single indicator. One good solution is to use MCC, which measures the quality of a two-class problem, taking into account the true and false positives and negatives. It is a balanced measure, even when the classes are of different sizes [6].

IV. EXPERIMENTAL RESULTS AND DISCUSSION
We use the stratified 5-fold cross validation method and the boosting algorithms with the Bayesian optimization method to evaluate the performance of the proposed framework. We extract the hyperparameters and evaluate each algorithm individually before using the majority voting method. We then examine the algorithms in combinations of three and of two. The comparison results are presented in Table 5.

Most studies in the literature rely on AUC diagrams to evaluate performance. However, as can be seen from the ROC-AUC curve in Fig. 4, the value of AUC in severely unbalanced data is not a good evaluation metric. It is influenced by the real positives and considers the negatives
3040 VOLUME 11, 2023

IEEE Transaction on Machine Learning,Volume:11,Issue Date:Jan.2023

TABLE 4. Performance evaluation of algorithms.

FIGURE 7. ROC curve of deep learning.

FIGURE 5. Precision_Recall curve. TABLE 5. Deep learning model results.

The precision-recall curve is illustrated in Fig. 5 and shows

the system performance in a more precise manner compared
with the ROC-AUC curve. However, the results cannot be
cited because false negatives are far from the view of this
diagram. As Fig. 5 shows, the highest value belongs to the
combination of the CatBoost and LightGBM algorithms with
a value of 0.7672, and the lowest value belongs to logistic
regression and is 0.7361.
Comparing the precision, recall, and F1-score as well as
the MCC, the algorithms used are shown in Fig. 6. The best
performance is related to the combination of lightGBM and
XGBoost algorithms, which have an MCC value of 0.79 and
an F1-score of 0.79. In individual algorithms, XGBoost has
the highest values.
According to the digits obtained in Table 5, deep learning
FIGURE 6. Performance comparing algorithms with different evaluation
criteria. has achieved better performance compared with individual
algorithms and majority voting ensemble learning. The MCC
and F1-score metrics have values of 0.8129 and 0.8132,
irrelevant. According to the ROC-AUC Fig. 4, the logistic respectively. The area under the ROC curve in the deep
regression algorithm 0.9583 has the highest number of fraud learning method is illustrated in Fig. 7 and shows a value of
detection, but it has the lowest value in other criteria. 0.9401.
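The metrics of Eq. (2) to Eq. (6), on which the comparisons above rely, can be computed directly from confusion-matrix counts. The following is a minimal sketch in Python, not the authors' implementation. It assumes hard (label-level) majority voting, a toy set of 10,000 transactions with 17 frauds chosen to mirror the 0.17 percent fraud rate of the dataset, and three hypothetical model prediction vectors (`model_a`, `model_b`, `model_c`) invented purely for illustration. MCC is set to 0 when its denominator is zero, which is a common convention and an assumption here.

```python
import math

def majority_vote(*model_preds):
    """Hard majority vote over equal-length lists of 0/1 predictions."""
    n_models = len(model_preds)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*model_preds)]

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = fraudulent and 0 = legitimate."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, F1-score and MCC as in Eq. (2) to Eq. (6)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "mcc": mcc}

def flag(indices, n):
    """Build a 0/1 prediction list that flags the given indices as fraud."""
    flagged = set(indices)
    return [1 if i in flagged else 0 for i in range(n)]

# Toy data (assumption): the first 17 of 10,000 transactions are fraudulent.
n, n_fraud = 10_000, 17
y_true = [1] * n_fraud + [0] * (n - n_fraud)

# Hypothetical predictions of three classifiers.
model_a = flag(list(range(12)) + [17, 18, 19], n)
model_b = flag(list(range(14)) + [18, 19, 20, 21], n)
model_c = flag(list(range(10)) + [19], n)

ensemble = majority_vote(model_a, model_b, model_c)
counts = confusion_counts(y_true, ensemble)
ens_metrics = metrics(*counts)

# Baseline that labels every transaction as legitimate.
baseline = [0] * n
base_metrics = metrics(*confusion_counts(y_true, baseline))

print("ensemble (TP, TN, FP, FN):", counts)
print("ensemble metrics:", ens_metrics)
print("all-legitimate baseline metrics:", base_metrics)
```

On this toy data, the baseline that never flags fraud still reaches an accuracy of 0.9983 while its recall and MCC are both 0. This illustrates why accuracy alone is uninformative on severely unbalanced data and why MCC is the more balanced measure.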


TABLE 6. Performance comparison of the proposed approach and the method presented in [17].

FIGURE 8. Precision-recall curve of deep learning.

FIGURE 9. Performance comparison of the proposed approach with the paper [17] based on the different evaluation criteria.

The precision-recall curve of the deep learning method is shown in Fig. 8, with a value of 0.7922.

The evaluation results of the proposed approach, which uses different pre-processing and class weight hyperparameter tuning to deal with the problem of data unbalance, are compared with those of the paper [17] in Fig. 9. The results show an improvement of both methods over the method presented in [17]. According to Table 6, the proposed methods outperform the intelligent method presented in [17] using common metrics and a public dataset.

V. CONCLUSION AND FUTURE WORK

In this paper, we studied the credit card fraud detection problem in real unbalanced datasets. We proposed a machine-learning approach to improve the performance of fraud detection. We used a publicly available "credit card" dataset with 28 features and 0.17 percent of the fraud data. We proposed two methods. In the proposed LightGBM, we used class weight tuning to choose the proper hyperparameters. We used the common evaluation metrics, including accuracy, precision, recall, F1-score, and AUC. Our experimental results showed that the proposed LightGBM method improved the fraud detection cases by 50% and the F1-score by 20% compared with the recently presented method in [17]. We improved the performance of the algorithm with the help of the majority voting algorithm. We also improved the criteria by using the deep learning method. The MCC results on unbalanced data showed that it is a stronger criterion than the other evaluation criteria. In this paper, we obtained an MCC of 0.79 by combining the LightGBM and XGBoost methods, and 0.81 with the deep learning method. Compared to sampling methods, using hyperparameters to address data unbalance reduces the memory and time needed to evaluate the algorithms and also gives better results. For future studies and work, we propose using other hybrid models as well as working specifically on CatBoost by changing more hyperparameters, especially the number of trees. Also, due to hardware limitations in this study, the use of stronger hardware may bring better results that can ultimately be compared with the results of this study.

REFERENCES
[1] J. Nanduri, Y.-W. Liu, K. Yang, and Y. Jia, "Ecommerce fraud detection through fraud islands and multi-layer machine learning model," in Proc. Future Inf. Commun. Conf., in Advances in Information and Communication. San Francisco, CA, USA: Springer, 2020, pp. 556–570.
[2] I. Matloob, S. A. Khan, R. Rukaiya, M. A. K. Khattak, and A. Munir, "A sequence mining-based novel architecture for detecting fraudulent transactions in healthcare systems," IEEE Access, vol. 10, pp. 48447–48463, 2022.
[3] H. Feng, "Ensemble learning in credit card fraud detection using boosting methods," in Proc. 2nd Int. Conf. Comput. Data Sci. (CDS), Jan. 2021, pp. 7–11.
[4] M. S. Delgosha, N. Hajiheydari, and S. M. Fahimi, "Elucidation of big data analytics in banking: A four-stage delphi study," J. Enterprise Inf. Manage., vol. 34, no. 6, pp. 1577–1596, Nov. 2021.
[5] M. Puh and L. Brkić, "Detecting credit card fraud using selected machine learning algorithms," in Proc. 42nd Int. Conv. Inf. Commun. Technol., Electron. Microelectron. (MIPRO), May 2019, pp. 1250–1255.
[6] K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, "Credit card fraud detection using AdaBoost and majority voting," IEEE Access, vol. 6, pp. 14277–14284, 2018.
[7] N. Kumaraswamy, M. K. Markey, T. Ekin, J. C. Barner, and K. Rascati, "Healthcare fraud data mining methods: A look back and look ahead," Perspectives Health Inf. Manag., vol. 19, no. 1, p. 1, 2022.
[8] E. F. Malik, K. W. Khaw, B. Belaton, W. P. Wong, and X. Chew, "Credit card fraud detection using a new hybrid machine learning architecture," Mathematics, vol. 10, no. 9, p. 1480, Apr. 2022.
[9] K. Gupta, K. Singh, G. V. Singh, M. Hassan, G. Himani, and U. Sharma, "Machine learning based credit card fraud detection—A review," in Proc. Int. Conf. Appl. Artif. Intell. Comput. (ICAAIC), 2022, pp. 362–368.


[10] R. Almutairi, A. Godavarthi, A. R. Kotha, and E. Ceesay, "Analyzing credit card fraud detection based on machine learning models," in Proc. IEEE Int. IoT, Electron. Mechatronics Conf. (IEMTRONICS), Jun. 2022, pp. 1–8.
[11] N. S. Halvaiee and M. K. Akbari, "A novel model for credit card fraud detection using artificial immune systems," Appl. Soft Comput., vol. 24, pp. 40–49, Nov. 2014.
[12] A. C. Bahnsen, D. Aouada, A. Stojanovic, and B. Ottersten, "Feature engineering strategies for credit card fraud detection," Expert Syst. Appl., vol. 51, pp. 134–142, Jun. 2016.
[13] U. Porwal and S. Mukund, "Credit card fraud detection in e-commerce: An outlier detection approach," 2018, arXiv:1811.02196.
[14] H. Wang, P. Zhu, X. Zou, and S. Qin, "An ensemble learning framework for credit card fraud detection based on training set partitioning and clustering," in Proc. IEEE SmartWorld, Ubiquitous Intell. Comput., Adv. Trusted Comput., Scalable Comput. Commun., Cloud Big Data Comput., Internet People Smart City Innov. (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Oct. 2018, pp. 94–98.
[15] F. Itoo, M. Meenakshi, and S. Singh, "Comparison and analysis of logistic regression, Naïve Bayes and knn machine learning algorithms for credit card fraud detection," Int. J. Inf. Technol., vol. 13, no. 4, pp. 1503–1511, 2021.
[16] T. A. Olowookere and O. S. Adewale, "A framework for detecting credit card fraud with cost-sensitive meta-learning ensemble approach," Sci. Afr., vol. 8, Jul. 2020, Art. no. e00464.
[17] A. A. Taha and S. J. Malebary, "An intelligent approach to credit card fraud detection using an optimized light gradient boosting machine," IEEE Access, vol. 8, pp. 25579–25587, 2020.
[18] X. Kewei, B. Peng, Y. Jiang, and T. Lu, "A hybrid deep learning model for online fraud detection," in Proc. IEEE Int. Conf. Consum. Electron. Comput. Eng. (ICCECE), Jan. 2021, pp. 431–434.
[19] T. Vairam, S. Sarathambekai, S. Bhavadharani, A. K. Dharshini, N. N. Sri, and T. Sen, "Evaluation of Naïve Bayes and voting classifier algorithm for credit card fraud detection," in Proc. 8th Int. Conf. Adv. Comput. Commun. Syst. (ICACCS), Mar. 2022, pp. 602–608.
[20] P. Verma and P. Tyagi, "Analysis of supervised machine learning algorithms in the context of fraud detection," ECS Trans., vol. 107, no. 1, p. 7189, 2022.
[21] J. Zou, J. Zhang, and P. Jiang, "Credit card fraud detection using autoencoder neural network," 2019, arXiv:1908.11553.
[22] D. Almhaithawi, A. Jafar, and M. Aljnidi, "Example-dependent cost-sensitive credit cards fraud detection using SMOTE and Bayes minimum risk," Social Netw. Appl. Sci., vol. 2, no. 9, pp. 1–12, Sep. 2020.
[23] J. Cui, C. Yan, and C. Wang, "Learning transaction cohesiveness for online payment fraud detection," in Proc. 2nd Int. Conf. Comput. Data Sci., Jan. 2021, pp. 1–5.
[24] M. Rakhshaninejad, M. Fathian, B. Amiri, and N. Yazdanjue, "An ensemble-based credit card fraud detection algorithm using an efficient voting strategy," Comput. J., vol. 65, no. 8, pp. 1998–2015, Aug. 2022.
[25] A. H. Victoria and G. Maragatham, "Automatic tuning of hyperparameters using Bayesian optimization," Evolving Syst., vol. 12, no. 1, pp. 217–223, Mar. 2021.
[26] H. Cho, Y. Kim, E. Lee, D. Choi, Y. Lee, and W. Rhee, "Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks," IEEE Access, vol. 8, pp. 52588–52608, 2020.
[27] F. N. Khan, A. H. Khan, and L. Israt, "Credit card fraud prediction and classification using deep neural network and ensemble learning," in Proc. IEEE Region 10 Symp. (TENSYMP), Jun. 2020, pp. 114–119.
[28] W. Liang, S. Luo, G. Zhao, and H. Wu, "Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms," Mathematics, vol. 8, no. 5, p. 765, May 2020.
[29] S. B. Jabeur, C. Gharib, S. Mefteh-Wali, and W. B. Arfi, "CatBoost model and artificial intelligence techniques for corporate failure prediction," Technol. Forecasting Social Change, vol. 166, May 2021, Art. no. 120658.
[30] J. Hancock and T. M. Khoshgoftaar, "Medicare fraud detection using CatBoost," in Proc. IEEE 21st Int. Conf. Inf. Reuse Integr. Data Sci. (IRI), Aug. 2020, pp. 97–103.
[31] B. Dhananjay and J. Sivaraman, "Analysis and classification of heart rate using CatBoost feature ranking model," Biomed. Signal Process. Control, vol. 68, Jul. 2021, Art. no. 102610.
[32] Y. Chen and X. Han, "CatBoost for fraud detection in financial transactions," in Proc. IEEE Int. Conf. Consum. Electron. Comput. Eng. (ICCECE), Jan. 2021, pp. 176–179.
[33] A. Goyal and J. Khiari, "Diversity-aware weighted majority vote classifier for imbalanced data," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2020, pp. 1–8.
[34] A. Roy, J. Sun, R. Mahoney, L. Alonzi, S. Adams, and P. Beling, "Deep learning detecting fraud in credit card transactions," in Proc. Syst. Inf. Eng. Design Symp. (SIEDS), Apr. 2018, pp. 129–134.

SEYEDEH KHADIJEH HASHEMI received the B.Sc. and M.Sc. degrees in computer engineering. She is a former student of the Department of Electrical and Computer Engineering, Kharazmi University. Her master's thesis was on fraud detection for banking with machine learning techniques. Her research interests include the application of machine learning techniques, focusing on banking.

SEYEDEH LEILI MIRTAHERI is currently a Faculty Member with the Department of Electrical and Computer Engineering, Kharazmi University, Tehran, Iran. She is researching next-generation high-performance computing systems and GPU computing. She has published more than 50 papers in credible conferences and journals. Her research interests include distributed and parallel systems, exascale computing, cluster computing, mathematics, and scientific computing. She has worked on distributed systems and has done several successful industrial experiments in these areas. She was named an Exemplary Professor of Kharazmi University, in 2020, and a Leading Young Researcher in Alborz Province, in 2020. She received the First Award of Inventions at National Science Foundation Invention Festival, in 2011, the Iran University of Science and Technology (IUST) Awards for Excellence in Researching, in 2009, the Second Level Reward of National Science Foundation in Ph.D., in 2009, the First Award for presenting "CSharifi: Kernel Level Cluster Management System Software," at the Khwarizmi Young Awards, in 2008, the Grant of Excellent Researcher of National Science Foundation, in 2008, and the Iranian Organization of Scientific and Industrial Research appreciation for cooperating and presenting "A Cluster Management System Software" at the Khwarizmi International Awards, in 2007.

SERGIO GRECO is currently a Full Professor with the Department of Informatics, Modeling, Electronics and System Engineering (DIMES), University of Calabria, Rende, Italy. He has written over 220 papers, including more than 60 journal papers in prestigious conferences and journals. His research interests include database theory, data integration and exchange, inconsistent data, incomplete data, data mining, knowledge representation, logic programming, and computational logic and argumentation theory.
