
Received 24 November 2023, accepted 12 January 2024, date of publication 22 January 2024, date of current version 30 January 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3357091

Hybrid Undersampling and Oversampling for Handling Imbalanced Credit Card Data
MARAM ALAMRI AND MOURAD YKHLEF
Information System Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
Corresponding author: Maram Alamri ([email protected])
This work was supported by the ‘‘Research Center of the Female Scientific and Medical Colleges,’’ Deanship of Scientific Research, King
Saud University.

ABSTRACT Recent developments in the use of credit cards for a range of daily life activities have increased
credit card fraud and caused huge financial losses for individuals and financial institutions. Most credit card
frauds are conducted online through illegal payment authorizations by data breaches, phishing, or scams.
Many solutions have been suggested for this issue, but they all face the major challenge of building an
effective detection model using highly imbalanced class data. Most sampling techniques used for class
imbalance have limitations, such as overlapping and overfitting, which cause inaccurate learning and are
slowed down by noisy features. Herein, a hybrid Tomek links BIRCH Clustering Borderline SMOTE
(BCBSMOTE) sampling method is proposed to balance a highly skewed credit card transaction dataset.
First, Tomek links were used to undersample majority instances and remove noise, and then BIRCH
clustering was applied to cluster the data and oversample minority instances using B-SMOTE. The credit card
fraud-detection model was run using a random forest (RF) classifier. The proposed method achieved a higher
F1-score (85.20%) than the baseline sampling techniques tested for comparison. Because of the enormous
number of credit card transactions, there was still a small false-positive rate. The proposed method improves
the detection performance owing to the well-organized balancing of the dataset.

INDEX TERMS Borderline SMOTE, class imbalance, credit card, fraud detection, sampling techniques,
Tomek links.

I. INTRODUCTION
The rapid development of e-commerce has resulted in an increase in the number of people shopping online. These customers pay with credit cards or use a mobile wallet to make purchases. Consequently, credit cards have become the primary payment method in the online world. Given the massive volume of daily transactions, criminals have innumerable opportunities to find new ways to attack and steal credit card information. Thus, credit card fraud is a serious problem for businesses and can cause significant financial and personal losses. As a result, businesses have increasingly focused on developing new ideas and methods for detecting and preventing fraud, gaining the trust of their customers, and protecting their privacy.
To build an accurate credit card fraud-detection model, transactions should be analyzed in terms of attributes, features, and values. Fraud-detection models are computed from samples of fraudulent and legitimate transactions in order to classify new transactions as fraudulent or legitimate, respectively. The publicly available credit card transaction dataset is vastly imbalanced, as it contains many more legitimate transactions than fraudulent ones. As a result, classification models can achieve very high accuracy without detecting any fraudulent transactions. Classification with imbalanced data is one of the most challenging problems in data mining [1].
Class inequality has received considerable attention in recent years. The learning and classification processes are impacted when one class is significantly more represented than the other. This is particularly the case for a minority class consisting of rarely seen instances, irregular patterns, and abnormal behavior, which is challenging to identify [1].

The associate editor coordinating the review of this manuscript and approving it for publication was Cong Pu.


The best way to handle this type of problem is to balance the data using sampling techniques. There are three main approaches: data-level, algorithm-level, and hybrid [2]. The data-level approach can be divided into three categories: oversampling, undersampling, and hybrid sampling. The most common of these is oversampling.
Researchers using a data-level approach have attempted to balance datasets prior to the use of conventional classification methods to avoid the influence of the majority class [3]. Researchers adopting the algorithm-level approach have worked on the internal algorithm structure and have attempted to eliminate the algorithm's sensitivity to the majority class so that the outcomes of classification algorithms are not dominated by the majority class [3]. In addition, to tackle the issue of unbalanced data in credit card transactions, recent studies have improved the detection system by combining data-level and algorithm-level approaches in a hybrid approach.
This study focuses on improving data sampling using a data-level approach with hybrid undersampling and oversampling: hybrid Tomek links and balanced iterative reducing and clustering using hierarchies (BIRCH) clustering borderline synthetic minority oversampling technique (BCBSMOTE). The goal of the proposed Tomek links and BCBSMOTE method is to enhance data resampling and overcome the limitations of oversampling. The remainder of this paper is organized as follows. Section II reviews related studies on sampling techniques for credit card transaction data. Section III describes the dataset used, while Section IV defines the evaluation metrics for the performed experiments. Section V describes the proposed hybrid Tomek links and BCBSMOTE method in detail. Section VI reports the experiments performed along with their results and discusses them. Finally, Section VII draws the conclusions of this paper, with a few directions for future work.

II. RELATED WORKS
In credit card transaction datasets, the number of genuine transactions is far higher than the number of fraudulent transactions, which causes a high imbalance in the data. This negatively affects the performance of the fraud-detection model and produces inaccurate results. A recent study [1] highlighted that the issue of highly imbalanced classes is a challenge for credit card fraud-detection models. In the area of data mining, prediction involves detecting events, but uncommon events are difficult to identify owing to their inconsistency and variety, and the misclassification of uncommon occurrences can result in significant costs. One solution is to apply sampling methods in the preprocessing stage.
One study [1] investigated the performance of a classification model combining oversampling and undersampling methods for detecting fraud cases in a fraud-detection dataset. It used five oversampling techniques – random oversampling, SMOTE, adaptive synthetic sampling (ADASYN), B-SMOTE, and support vector machine SMOTE (SVMSMOTE) – combined with random undersampling, and evaluated the model using a random forest (RF) classifier. The results showed that a combination of random undersampling (RUS) and one or more oversampling techniques has a high potential to increase accuracy. The study concluded that the combination of oversampling and undersampling techniques positively affected the model's performance compared to individual sampling techniques.
Another study [4] compared five oversampling techniques, SMOTE, ADASYN, B1-SMOTE, B2-SMOTE, and SVMSMOTE, to generate an improved model that could solve the imbalance problem for credit card transaction data. The experiment was conducted using six different machine-learning algorithms: RF, k-nearest neighbors (KNN), logistic regression (LR), naïve Bayes (NB), SVM, and decision trees (DTs). The authors noticed that oversampling techniques improved the performance of the models and claimed that there was no preference for one oversampling technique over another, as everything depended on the type of machine learning (ML) algorithm being used.
Other researchers [5] explored different sampling techniques using SMOTE and SMOTE-Tomek for unbalanced data. The classification models used in this study (KNN, LR, RF, and SVM) were trained on balanced data to detect fraudulent credit card transactions. The performance of the classifiers on balanced data showed that RF with SMOTE and with SMOTE-Tomek was the best. Two other papers, [6] and [7], applied SMOTE-Tomek to a credit card transaction dataset to solve the problem of data imbalance. They found that using SMOTE-Tomek improves the learning rate and outperforms detection models built on the imbalanced data.
A further study [8] used a hybrid SMOTE and edited nearest neighbor (SMOTE-ENN) method to balance the class distribution in a credit card dataset. The results showed that SMOTE-ENN achieved a high performance with an ensemble deep learning model; the hybrid sampling method improved the performance of the detection model. Moreover, [9] used a hybrid SMOTE-ENN to balance a credit card dataset. SMOTE-ENN is a hybrid resampling method that oversamples minority class samples using SMOTE and removes overlapping instances with ENN. This study found that using resampled data enhances the performance of the detection model and concluded that combining SMOTE-ENN with a boosted long short-term memory (LSTM) classifier is a successful approach to detecting fraud in credit card transactions.
Another study [10] proposed SMOTE with adaptive qualified synthesizer selection (ASN-SMOTE), an effective oversampling method based on KNN and SMOTE. ASN-SMOTE filters noise in the minority class by determining whether the nearest neighbor of each minority instance belongs to the minority or the majority class. Then, ASN-SMOTE uses the nearest majority instance of each minority instance to correctly perceive the decision boundary, within which the appropriate minority instances are selected adaptively for each minority instance using the recommended adaptive neighbor selection method to synthesize new minority instances. The study concluded that ASN-SMOTE achieved the best results when compared with nine state-of-the-art oversampling methods [10].


The authors of [11] used a hybrid sampling method of RUS and B-SMOTE to address class imbalance. It was found that this hybrid effectively improved the detection model, with an F1-score of 70%. According to the study, this hybrid overcomes the information loss caused by RUS, as well as the overfitting and overgeneration caused by SMOTE in large datasets [11].
A further study [12] proposed a new undersampling method for handling unbalanced data. This clustering-based noise-sample-removed undersampling (NUS) method removes noise samples from both the majority and minority classes before combining them with undersampling techniques. An experiment was conducted on 13 public and three real-world datasets, and it was found that NUS outperformed several well-known methods, including RUS, SMOTE, ADASYN, SMOTE + Tomek links, and ENN [12].
Table 1 presents a summary of the selected studies on sampling techniques. These techniques have been used to balance highly skewed credit card transaction data; however, they may result in overlapping and loss of relevant information.

TABLE 1. Summary of sampling techniques in credit card transaction datasets.

III. DATASET
Publicly accessible datasets of financial services, particularly in the newly growing field of mobile money transactions, are lacking. Many researchers working in the field of fraud detection value financial datasets, but as financial transactions are inherently private, there are no publicly accessible datasets that contribute to the issue at hand. The dataset used here was therefore created with the PaySim simulator, which generates synthetic credit card transactions [13].
Datasets produced by PaySim can help academics, financial institutions, and governmental agencies test their fraud-detection techniques or assess the effectiveness of other techniques under comparable testing settings using a shared, openly accessible, synthetic dataset [13]. PaySim generates a synthetic dataset from aggregated data of a private dataset; it mimics the normal functioning of transactions and later injects malicious activity to evaluate the performance of fraud-detection algorithms. It replicates mobile money transactions using a sample of genuine transactions collected from a month's worth of financial logs of an African country's mobile money service. The original records were provided by a multinational corporation operating a mobile finance service available in over 14 countries worldwide [13]. The dataset contains 11 attributes and over six million records.
The main reason for adopting the synthetic dataset is that the Kaggle dataset used by most researchers is transformed using principal component analysis (PCA), and only the time and amount attributes are available in interpretable form. Thus, its attributes are limited and cannot capture customer behavior perfectly; therefore, other attributes are needed, such as those of a synthetic dataset.
Table 2 provides a description of each attribute in the PaySim dataset. For the isFraud attribute in this dataset, the fraudulent agents aim to profit by taking control of customers' accounts and trying to empty their funds by transferring them to another account and then cashing out of the system; isFlaggedFraud is an attribute that flags illegal attempts, defined here as any attempt to transfer more than 200,000 USD in a single transaction.


TABLE 2. Dataset attributes.

A. DATASET PREPARATION
The data were prepared by cleaning, a procedure that removes skewed data, outliers, and missing values. The data were preprocessed before being fed to the model for training in order to eliminate noise. Random sampling was performed along with selecting informative features, addressing missing values, and structuring the data.
Missing values: One step of data cleaning is to check whether there are missing or null values in the dataset, which can be done with the isnull() method. This method returns a DataFrame object in which every value is replaced with a Boolean, 'True' for null values and 'False' otherwise. When applied to the PaySim dataset, no null or missing values were found.
Duplicated values: After checking for null values, the next step is to check for duplicated values, which can be performed using the duplicated() method. This method discovers duplicate rows throughout the dataset and returns a Boolean value for each row. When applied to the PaySim dataset, no duplicated values were found (all rows returned 'False').
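These two checks can be reproduced with a few lines of pandas; the following is a minimal sketch, assuming the PaySim transactions have been exported to a CSV file (the file name is a placeholder, not one given in the paper):

import pandas as pd

# Load the PaySim transactions (file name is an assumption).
df = pd.read_csv("paysim_transactions.csv")

# Missing values: isnull() marks every cell True/False; summing per column
# gives the number of nulls, expected to be zero for every PaySim attribute.
print(df.isnull().sum())

# Duplicated values: duplicated() returns one Boolean per row, True marking a
# repeat of an earlier row; the sum is the number of duplicate rows (zero here).
print(df.duplicated().sum())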

B. DATASET ANALYSIS
Data analysis was used to investigate the dataset and determine significant information. Exploratory data analysis (EDA) is an important step in gaining complete insight into a dataset [14]. It is performed to evaluate and understand the entire distribution of the data, as well as to determine the correlation and dependency among the input features [14]. Thus, EDA identifies fraud and normal transactions in the different transaction types. The relationship between transaction types and fraudulent transactions should also be classified, and the amount of original balance in normal and fraudulent transactions should be defined. The PaySim dataset contains 11 attributes and 6,362,620 transaction records. There are 6,354,407 normal transactions and 8,213 fraudulent transactions, as shown in Figure 1. It can be observed that there is a high skew in the dataset, which must be balanced using sampling techniques to improve the detection model.

FIGURE 1. Numbers of normal and fraud transactions in the PaySim dataset.

Figure 2 shows that the newbalanceOrig and oldbalanceOrig columns have a very high correlation (almost 1); thus, for data preprocessing, one of these columns should be dropped. In addition, the isFlaggedFraud column should be dropped because it does not contribute significantly to determining whether a transaction is fraudulent, as the flagging rule is weak. The number of transactions is a key factor in identifying whether a transaction may be fraudulent. The higher the initial balance in the origin account before the transaction, the more susceptible it is to fraudulent transactions. The time at which a transaction occurs is also related to the likelihood of the transaction being fraudulent.

FIGURE 2. Correlation matrix for the PaySim dataset.
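Continuing the sketch above, the class counts behind Figure 1 and the correlation check behind Figure 2 can be obtained directly from the DataFrame; column names follow the public PaySim schema and may need adjusting to a particular export:

# Class distribution behind Figure 1: normal (0) vs. fraudulent (1) transactions.
print(df["isFraud"].value_counts())

# Correlation matrix behind Figure 2, restricted to numeric attributes.
print(df.corr(numeric_only=True))

# Drop one of the two highly correlated balance columns and the weak
# isFlaggedFraud indicator, as discussed above.
df = df.drop(columns=["newbalanceOrig", "isFlaggedFraud"])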


There are five types of transactions in the PaySim dataset. As Figure 3 shows, the cash-out and payment types are the most common, while the debit and transfer types are the least common.

FIGURE 3. Proportions of transaction types in the PaySim dataset.

From Figure 4, it can be observed that the only fraudulent transactions are transfers and cash-outs, and the numbers of fraudulent transactions for these two types are very similar.

FIGURE 4. Numbers of normal and fraudulent transactions for each type of transaction.

IV. EVALUATION METRICS
To validate and test the credit card fraud-detection model, the test dataset was processed to verify that the model produced correct results according to the evaluation metrics. The evaluation of ML algorithms is generally performed using different metrics, such as accuracy, precision, recall, F1-score, the area under the receiver operating characteristic curve (AUC-ROC), and the area under the precision-recall curve (AUPRC).
The confusion matrix (Table 3) is used to assess the performance of a classification model [15]. It displays the numbers of true positives, true negatives, false positives, and false negatives. True positives are cases in which the model correctly predicts a positive outcome, whereas true negatives are those in which the model correctly predicts a negative outcome [15]. The number of false positives is the number of instances in which the model predicts a positive outcome but the actual outcome is negative. The number of false negatives is the number of instances in which the model predicts a negative outcome but the actual outcome is positive [15].

TABLE 3. Confusion matrix.

The accuracy, precision, and recall metrics are described with respect to the confusion matrix in Table 3. Accuracy is the most obvious measure of a model's predictive ability. The numerator of this measure contains all correctly labelled positive and negative class instances (TP: fraud; TN: non-fraud) [16]:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)

Precision, also known as the positive predictive value, is the proportion of true positives to predicted positives generated by a model. A precision value of 1 indicates that all predicted positive instances are indeed positive (FP: incorrectly classified fraud transactions) [15]:

Precision = TP / (TP + FP)    (2)

Recall, also known as the true-positive rate, is the proportion of predicted positives to all positive instances in the sample. A recall value of 1 indicates that all positive samples were correctly identified (FN: incorrectly classified non-fraud transactions) [15]:

Recall = TP / (TP + FN)    (3)

For a classification task on an imbalanced dataset, the F1-score is the harmonic mean of the precision and recall values. The F1-score is calculated as follows:

F1 = 2 × (Recall × Precision) / (Recall + Precision)    (4)

State-of-the-art sampling techniques [4], [6], [9], [10], such as SMOTE, B-SMOTE, ADASYN, SMOTE-Tomek, and SMOTE-ENN, were used as a baseline for comparison. The evaluation was performed by testing these algorithms on the synthetic PaySim dataset and comparing them with respect to increasing the true positives and reducing the false positives and error rates, along with the ability to handle a large balanced dataset and achieve higher accuracy and F1-scores.
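Equations (1)-(4) map directly onto scikit-learn's metric functions. A minimal sketch, assuming y_test holds the true test labels, y_pred the predicted labels, and y_scores the predicted fraud probabilities (these variables are produced by the classifier described in Section V):

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, average_precision_score)

# Equations (1)-(4), computed from the TP/TN/FP/FN counts internally.
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))

# AUPRC is computed from continuous scores rather than hard labels.
print("AUPRC    :", average_precision_score(y_test, y_scores))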


V. HYBRID SAMPLING TECHNIQUES FOR BALANCING DATASET
This section introduces the sampling method used for the highly skewed credit card dataset. It presents a hybrid sampling technique for balancing credit card datasets using Tomek links for undersampling, combined with BIRCH clustering and Borderline-SMOTE for oversampling. In addition, the evaluation and results of the proposed method are discussed and compared with those of the latest existing sampling techniques.

A. METHODOLOGY
Class imbalance problems are common in fraud detection, as there are always fewer fraud data than non-fraud data. This problem has a negative impact on the algorithm because classifiers are frequently biased towards the majority class and, as a result, produce poor performance. In real credit card transactions, there is a highly imbalanced distribution of examples, with the minority class usually being much smaller than the majority class. Most learning algorithms assume that the data distribution is balanced and are oriented towards the learning and recognition of the majority class. Consequently, minority samples are incorrectly classified [17].
In recent years, the learning problem of imbalanced datasets has been extensively studied, and sampling methods have been developed to solve this problem by balancing the data distribution, mainly by oversampling the minority class or undersampling the majority class [17]. The most popular oversampling method is SMOTE, which has been improved in various applications, such as the neighborhood cleaning rule (SMOTE-NCL) [17], deep attention (DA-SMOTE) [18], B-SMOTE [19], SMOTE-ENN [20], and SMOTE-Tomek links [21].
The PaySim dataset used here is highly skewed, which is a major characteristic of financial transaction datasets. Thus, a data sampling approach was adopted using hybrid undersampling (Tomek links) and oversampling (BCBSMOTE), as illustrated in Figure 5; the literature review above showed that hybrid sampling approaches outperform undersampling or oversampling applied separately.

FIGURE 5. The proposed method for hybrid sampling to balance the PaySim dataset.

To balance the PaySim dataset, it was split into two sets for the evaluation of the proposed method: a training set comprising 80% of the entire dataset and a test set comprising the remaining 20%. The proposed hybrid undersampling and oversampling used the training set, whereas the test set was used with the RF model.
As shown in Figure 6, after splitting the dataset, the training set contained 5,083,526 normal transactions and 6,570 fraud transactions that were used to train the model. The remainder of the dataset, comprising 1,269,709 normal transactions and 1,643 fraud transactions, was used to test the model.

FIGURE 6. The PaySim dataset divided into train and test sets.
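The split can be sketched as follows, assuming X and y are the prepared (numerically encoded) feature matrix and a pandas Series of labels; a stratified split is used here so that the fraud ratio is preserved in both subsets, and the random seed is arbitrary:

from sklearn.model_selection import train_test_split

# 80% of the data for training, 20% held out for testing,
# keeping the fraud/non-fraud proportions similar in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(y_train.value_counts())   # roughly 5.08 M normal vs. a few thousand fraud
print(y_test.value_counts())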


B. UNDERSAMPLING USING TOMEK LINKS
Tomek links are essentially a data-reduction technique and an improved version of the nearest neighbor rule (NNR). They can be used as an undersampling or data-cleaning method. The idea here was to use Tomek links to undersample the majority class by removing Tomek links, although samples from both the majority and minority classes can be eliminated rather than only the majority-class samples that form Tomek links [22].
The primary goal of this technique is to first identify two samples, x, which belongs to the majority class, and y, which belongs to the minority class, where x and y form a Tomek link [23]. Two conclusions can be drawn regarding the samples forming a Tomek link: either one of them is noise or an unwanted sample, which is regarded as a less crucial case, or both are boundary values [23]. If one is a noise-generating or unwanted sample, either the minority-class sample or the majority-class sample can be eliminated from the dataset; an alternative is to eliminate both samples that act as boundary values [23]. The primary benefit of Tomek links is that they emphasize noise cancellation: when there are two data samples, one belonging to the majority class and the other to the minority class, one of the samples in the Tomek link is likely to be located in the cluster of the other sample. Another benefit of Tomek links is that they do not change the rest of the dataset, thereby decreasing the possibility of losing crucial data [23].
The goal in this study was to undersample the majority class, with particular emphasis on removing the majority-class sample of each Tomek link. The majority class was more likely to produce noise or unwanted samples because of its large sample size, as the PaySim dataset has over six million records. This was implemented using Algorithm 1 [23].

Algorithm 1 Tomek Links Undersampling Algorithm
1. Consider a sample dataset with a majority and a minority class.
2. For each sample 'x' in the majority class, repeat steps 3 to 7.
3. Find the nearest neighbour of 'x' in the entire dataset.
4. Let 'y' be the nearest neighbour of 'x'.
5. If 'y' belongs to the majority class, go to step 3 for the next sample.
6. Calculate the nearest neighbour of 'y'; let 'z' be the nearest neighbour of 'y'.
7. If 'z' and 'x' are the same sample point, then 'x' and 'y' are nearest neighbours of each other.
8. Thus, 'x' and 'y' form a Tomek link.
9. Remove 'x' from the sample dataset.
10. Repeat from step 3 until there are no further modifications, i.e., no sample is removed.
11. The updated sample dataset is used as the dataset for classification.

FIGURE 7. Applying Tomek links undersampling to the PaySim dataset.

As illustrated in Figure 7, the number of majority-class samples in the dataset decreased after applying Tomek links undersampling to remove the noise-generating samples. The dataset was then ready for oversampling using BCBSMOTE.
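Algorithm 1 corresponds closely to the TomekLinks resampler in the imbalanced-learn library, which by default removes only the majority-class member of each Tomek link; a brief sketch (illustrative, not the authors' exact code):

from imblearn.under_sampling import TomekLinks

# Remove the majority-class sample of every Tomek link in the training set.
tl = TomekLinks(sampling_strategy="majority")
X_train_tl, y_train_tl = tl.fit_resample(X_train, y_train)

print("Samples removed:", len(y_train) - len(y_train_tl))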
C. OVERSAMPLING USING BCBSMOTE
The second part of the proposed hybrid sampling method uses clustering to group the data based on distance metrics and to identify the data points that belong together. To ensure that noise is not oversampled, a minimum cluster size is imposed; that is, clusters with fewer data points than the minimum cluster size are simply dropped from the overall data. Subsequently, oversampling is applied to each remaining cluster to generate samples, with each cluster generating a specific subset of the total samples that need to be generated. All data points generated by the clusters are then combined with the original dataset. The proposed method improves on the shortcoming of the B-SMOTE algorithm (its global perspective) by locally and adaptively clustering the minority-class samples and then generating samples from each local cluster, improving the imbalance both within and between classes.

1) BIRCH
BIRCH is a fast clustering method for incremental clustering that uses a tree structure [24]. This algorithm is appropriate for large samples; it introduces data points measured in multiple dimensions incrementally and dynamically to create clusters of the best possible quality within memory and time constraints, and it operates quickly enough to complete clustering with only one scan of the dataset [24]. The two key features of BIRCH are the clustering feature (CF) and the CF tree. These hierarchical trees have three different types of nodes: non-leaf nodes, leaf nodes, and minimum clusters [25]. The CF tree parameters are as follows: B is the largest number of child nodes a non-leaf node may contain, L is the maximum number of minimum clusters contained in a leaf, and T is the maximum diameter of a minimum cluster [25].


The BIRCH algorithm comprises the following phases (Figure 8). The CF tree is first built in memory by reading each data point [25]; the tree must be rebuilt from the leaf nodes if all available memory is used [26]. This first phase also involves removing outliers and merging adjacent clusters while building the CF tree. Second, the dataset is condensed by creating a smaller CF tree [25]. Third, to produce a better clustering, a global clustering method such as k-means or agglomerative clustering is used to group all CF clusters. Fourth, to ensure that errors are fixed, cluster refinement is completed by rescanning the original raw data [26]; this refinement fixes the issue that arises because the original data are scanned only once when the CF tree is built [25].

FIGURE 8. The BIRCH phases.
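The clustering step can be sketched with scikit-learn's Birch estimator, whose threshold and branching_factor arguments play the roles of the T and B parameters described above; the parameter values below are illustrative assumptions, not values reported in the paper:

import numpy as np
from sklearn.cluster import Birch

# Cluster only the minority (fraud) samples of the Tomek-cleaned training set.
X_arr = np.asarray(X_train_tl, dtype=float)
y_arr = np.asarray(y_train_tl)
X_min = X_arr[y_arr == 1]

# threshold ~ T (maximum sub-cluster diameter), branching_factor ~ B;
# n_clusters=None keeps the raw CF sub-clusters without a global step.
birch = Birch(threshold=0.5, branching_factor=50, n_clusters=None)
labels = birch.fit_predict(X_min)

ids, counts = np.unique(labels, return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))   # minority samples per cluster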


2) BORDERLINE-SMOTE
B-SMOTE emphasizes minority samples that are near majority samples; in this respect it is comparable to ADASYN [27]. The algorithm examines the nearest neighbors of each minority sample and keeps only those minority samples whose nearest neighbors are predominantly in the majority class. A subsequent SMOTE-like step is then applied using the minority samples that have been kept [27], as shown in Algorithm 2.

Algorithm 2 The Borderline-SMOTE Algorithm
1. For each sample x_i in the minority class, compute its m nearest neighbours from the available dataset. Let m' denote the number of those neighbours that belong to other classes.
2. Categorize the samples x_i:
   If m' = m, all samples around x_i are from other classes, and x_i is regarded as noise. As such data have a negative impact on the generation effect, these samples are not included in the generation.
   If m/2 ≤ m' < m, more than half of the m neighbours surrounding x_i are from other classes; x_i is marked as Danger (a border sample).
   If 0 ≤ m' < m/2, more than half of the m neighbours surrounding x_i are from the same class; x_i is marked as Safe.
3. After marking, apply the SMOTE method to enlarge the Danger samples. Select x_i from the Danger set and compute its k nearest neighbour samples of the same class, x_zi. New samples x_n are generated at random using the formula
   x_n = x_i + β(x_zi − x_i),
   where β is a random number between 0 and 1.
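Algorithm 2 is available off the shelf as BorderlineSMOTE in imbalanced-learn; m_neighbors corresponds to m in steps 1-2 and k_neighbors to k in step 3 (the values shown are the library defaults). Applied directly to the Tomek-cleaned training set it gives the B-SMOTE baseline, while the proposed method instead applies it per BIRCH cluster, as described next:

from collections import Counter
from imblearn.over_sampling import BorderlineSMOTE

# Oversample only the Danger (borderline) minority samples, per Algorithm 2.
bsmote = BorderlineSMOTE(kind="borderline-1", k_neighbors=5,
                         m_neighbors=10, random_state=42)
X_bs, y_bs = bsmote.fit_resample(X_train_tl, y_train_tl)

print(Counter(y_bs))   # the classes are now balanced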
3) BCBSMOTE
BIRCH clustering is used with B-SMOTE to oversample the minority class, essentially by clustering the data and then applying B-SMOTE to each cluster to generate more samples (each cluster generates a specific ratio of the total samples to be generated). Initially, BIRCH clusters all minority-class samples of the training data to identify the data points that belong together. The BIRCH algorithm has several crucial parameters: the threshold, which bounds the sub-clusters condensed in the leaf nodes of the CF tree; the branching factor, which specifies the number of CF sub-clusters that can be made in a node; and n clusters, which is the number of clusters. In this experiment, n clusters was set based on the cluster size, the size of the data cluster.
First, clustering is performed using BIRCH over the training data to determine which data points belong together; this yields a dictionary giving the number of data points in each cluster. Subsequently, the number of samples to be generated by each cluster is calculated. If a cluster has more than five data points, it is counted among the clusters responsible for generating samples. Finally, B-SMOTE is applied to each such cluster to generate samples according to the contribution of the cluster. The steps of the algorithm are given in Algorithm 3.


Algorithm 3 BIRCH Clustering Borderline SMOTE (BCBSMOTE) Algorithm
Input:
  X_train: feature matrix of the training data.
  y_train: target labels for the training data.
  cols: list of column names.
  cluster_size: minimum cluster size for BIRCH clustering.
Output:
  generated_dataset: DataFrame containing the original and synthetic data.
1. Employ BIRCH clustering on X_train with y_train = 1 to identify clusters.
2. Identify minority class clusters with a size equal to or greater than cluster_size.
3. Based on the percentage of samples in each cluster, calculate how many samples it should generate.
4. For each selected cluster, generate samples using B-SMOTE as per the cluster contribution calculated in step 3.
5. Update generated_dataset by adding the synthetic data.
6. Return the final generated_dataset.
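For concreteness, one possible reading of Algorithm 3 in code is sketched below, assuming fraud is labelled 1 and reusing scikit-learn's Birch and imbalanced-learn's BorderlineSMOTE; the quota rule, the minimum-size guards, and the neighbour counts are assumptions made for illustration, not the authors' released implementation.

import numpy as np
from sklearn.cluster import Birch
from imblearn.over_sampling import BorderlineSMOTE


def bcbsmote(X, y, cluster_size=5, random_state=42):
    """Illustrative BCBSMOTE sketch: BIRCH clusters the minority class,
    then Borderline-SMOTE is run per retained cluster (Algorithm 3)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y).astype(int)
    X_min, X_maj = X[y == 1], X[y == 0]

    # Step 1: cluster the minority (fraud) samples with BIRCH.
    labels = Birch(n_clusters=None).fit_predict(X_min)
    ids, counts = np.unique(labels, return_counts=True)

    # Step 2: keep only clusters that reach the minimum size.
    kept = {int(i): int(c) for i, c in zip(ids, counts) if c >= cluster_size}
    kept_total = sum(kept.values())
    deficit = len(X_maj) - len(X_min)       # synthetic samples still needed

    synthetic = []
    for cid, size in kept.items():
        # Step 3: this cluster's quota, proportional to its share of samples.
        quota = int(deficit * size / kept_total)
        if quota <= 0 or size < 7:          # B-SMOTE needs enough neighbours
            continue
        part = X_min[labels == cid]
        # Step 4: Borderline-SMOTE on this cluster against the majority data.
        X_loc = np.vstack([X_maj, part])
        y_loc = np.hstack([np.zeros(len(X_maj), dtype=int),
                           np.ones(len(part), dtype=int)])
        smote = BorderlineSMOTE(sampling_strategy={1: size + quota},
                                k_neighbors=min(5, size - 1),
                                random_state=random_state)
        X_res, _ = smote.fit_resample(X_loc, y_loc)
        synthetic.append(X_res[len(X_loc):])    # keep only the new rows

    if not synthetic:                       # Steps 5-6: merge and return
        return X, y
    X_syn = np.vstack(synthetic)
    return np.vstack([X, X_syn]), np.hstack([y, np.ones(len(X_syn), dtype=int)])

A call such as X_bal, y_bal = bcbsmote(X_train_tl, y_train_tl, cluster_size=5) would then produce the balanced training set used to fit the RF classifier described next.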
D. RF CLASSIFIER
To evaluate the proposed hybrid sampling method, the PaySim credit card dataset was classified using the RF algorithm after balancing. RF comprises a group of DT classifiers and, compared to a single DT, has the advantage of correcting overfitting. To train each tree, a random subset of the training set is sampled [28]. Next, a DT is built in which each node is split on a feature chosen from a random subset of the features. Because each tree in the RF is trained independently of the others, it is extremely quick to train on datasets with many features and data instances. The RF algorithm is resistant to overfitting and offers a good estimate of the generalization error [28]. These steps were applied in the present case.
The main benefit of using RF in this study was that it required minimal training time compared to the other algorithms. The F1-score is essential for evaluation on the balanced dataset, the accuracy of predicting credit card fraud is extremely important, and RF predicts the output with great precision, even for large datasets [28].
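Training and evaluating the RF can be sketched as follows, where X_bal and y_bal denote the training data balanced by the proposed method (for example, the output of the bcbsmote sketch above) and X_test is the untouched test set; the hyperparameters shown are scikit-learn defaults, not values reported in the paper:

from sklearn.ensemble import RandomForestClassifier

# Fit on the balanced training set; evaluate on the still-imbalanced test set.
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X_bal, y_bal)

y_pred = rf.predict(X_test)                  # hard labels for Eq. (1)-(4)
y_scores = rf.predict_proba(X_test)[:, 1]    # fraud probabilities for AUPRC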

E. EVALUATION
Although accuracy is a crucial measurement and standard in conventional classification evaluation, it is not appropriate for the classification of imbalanced data because, even if the model misclassifies the minority class, it will still appear highly accurate owing to the infrequency of minority items. Consequently, researchers have proposed effective indicators for assessing models on imbalanced datasets [11]. In this study, five metrics were used to evaluate the performance of the proposed method: accuracy, F1-score, recall, precision, and AUPRC. These metrics are based on the confusion matrix. Four of these – F1-score, recall, precision, and AUPRC (i.e., not accuracy) – were utilized to evaluate the performance of the proposed hybrid sampling method.
The AUPRC provides the area under the precision and recall values for several thresholds [29]. This is a plot of precision versus recall, which corresponds to the false discovery rate curve. It is simple to compare various classification models using the AUPRC, which summarizes the precision-recall curve [30]. The AUPRC value of a perfect classifier is 1; high recall and precision produce results with accurate labels [30]. The AUPRC metric examines the positive predictive value and the true-positive rate, making it more sensitive to improvements for the positive class (the fraud class) [31].
This study compared the hybrid Tomek links-BCBSMOTE algorithm with three oversampling algorithms, SMOTE, B-SMOTE, and ADASYN, as well as two hybrid sampling techniques, SMOTE-ENN and SMOTE-Tomek, using the PaySim dataset. To demonstrate that the balanced dataset created by the hybrid Tomek links-BCBSMOTE algorithm was valid and stable, an RF classification model was employed for testing.

VI. RESULTS AND DISCUSSION
This section reviews the results of the experiments with the proposed method on the PaySim dataset. The dataset was divided into two subsets – the training and test sets, which comprised 80% and 20% of the original dataset, respectively – in order to evaluate the performance of the hybrid sampling technique using an RF classifier. The outcomes of the experiments are reported by contrasting our results with those of other, widely used state-of-the-art sampling methods.
Table 4 provides detailed information on the performance measurements for all the applied methods. The proposed method had the highest F1-score (85.20%), precision (81.27%), and AUPRC (72.77%). The accuracies of all the sampling methods were similar (99.90-99.95%). The proposed and B-SMOTE methods had the highest accuracy (99.95%); however, their recall values were lower than those of the other sampling methods. The precision of the proposed method (81.27%) was higher than that of B-SMOTE and the other sampling techniques. Thus, the proposed hybrid Tomek links BCBSMOTE sampling method outperforms the other sampling methods.

TABLE 4. Performance evaluation.

Table 5 presents the confusion matrix for the PaySim dataset obtained using the RF classifier after balancing with the proposed method. It can be observed that TP is high, which results in a high recall value (89.53%), and FP is low, which results in a high precision value (81.27%). Precision and recall are important evaluation metrics in fraud detection: their significance lies in their ability to minimize the FP rate and to detect positive cases, respectively. However, there is a trade-off between precision and recall, and increasing recall may decrease precision. Thus, the F1-score and AUPRC are considered to provide a comprehensive evaluation of the model performance.
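The confusion matrix reported in Table 5 and the precision-recall curve behind Figure 10 can be derived from the test-set predictions; a brief sketch under the same assumptions as above:

from sklearn.metrics import (confusion_matrix, precision_recall_curve, auc,
                             average_precision_score)

# Confusion matrix: rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")

# Precision-recall curve and its area (AUPRC), from the fraud probabilities.
precision, recall, _ = precision_recall_curve(y_test, y_scores)
print("AUPRC:", auc(recall, precision))
print("Average precision:", average_precision_score(y_test, y_scores))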

14058 VOLUME 12, 2024


M. Alamri, M. Ykhlef: Hybrid Undersampling and Oversampling for Handling Imbalanced Credit Card Data

TABLE 5. RF confusion matrix using the proposed sampling method (Tomek links BCBSMOTE).

FIGURE 9. Comparison between state-of-the-art sampling methods and the proposed method.

Figure 9 shows that the proposed hybrid method achieved better results than the other methods tested by reducing errors. The B-SMOTE performance was also better than that of the other sampling techniques. With regard to the F1-score, precision, and AUPRC, hybrid Tomek links BCBSMOTE achieved the highest values, and SMOTE-ENN achieved the lowest. In terms of recall, B-SMOTE achieves the lowest value, while ADASYN achieves the highest.
The AUPRC metric illustrates the trade-off between precision and recall in a binary classification model, especially when dealing with imbalanced datasets. It provides an in-depth evaluation of the model's ability to distinguish between positive and negative instances. Figure 10 shows the AUPRC curve for the proposed hybrid Tomek links BCBSMOTE method, with recall plotted on the x-axis and precision on the y-axis.

FIGURE 10. AUPRC for hybrid Tomek links BCBSMOTE.

VII. CONCLUSION
The daily use of bank credit cards has grown dramatically along with technological innovations. As a result, the fraudulent use of credit cards by others is an offense that is expanding quickly, and detecting and preventing these attacks has become an active field of research. Credit card fraud detection encounters challenges owing to imbalanced datasets, which cause inaccurate results in the detection system. This study presented a hybrid sampling technique to balance the PaySim credit card transaction dataset. The proposed method uses Tomek links to undersample the majority class and BCBSMOTE to oversample the minority class. It takes advantage of the Tomek links method to remove noise samples and of BIRCH clustering in B-SMOTE to cluster a large dataset and reduce overfitting. It outperformed existing state-of-the-art methods in terms of the F1-score, precision, and AUPRC metrics. In the future, optimization-based feature engineering for detecting customer spending behavior will be applied to increase the F1-score and decrease the false-positive rate of the credit card fraud-detection model.

ACKNOWLEDGMENT
This research was supported by a grant from the ''Research Center of the Female Scientific and Medical Colleges,'' Deanship of Scientific Research, King Saud University.

REFERENCES
[1] H. Shamsudin, U. K. Yusof, A. Jayalakshmi, and M. N. A. Khalid, ''Combining oversampling and undersampling techniques for imbalanced classification: A comparative study using credit card fraudulent transaction dataset,'' in Proc. IEEE 16th Int. Conf. Control Autom. (ICCA), Oct. 2020, pp. 803–808, doi: 10.1109/ICCA51439.2020.9264517.
[2] W. W. Soh and R. Yusuf, ''Predicting credit card fraud on a imbalanced data,'' Int. J. Data Sci. Adv. Anal., vol. 1, no. 1, pp. 12–17, Apr. 2019. [Online]. Available: http://ijdsaa.com/index.php/welcome/article/view/3
[3] P. Kaur and A. Gosain, ''Comparing the behavior of oversampling and undersampling approach of class imbalance learning by combining class imbalance problem with noise,'' in Advances in Intelligent Systems and Computing. Singapore: Springer, 2017, pp. 23–30, doi: 10.1007/978-981-10-6602-3_3.
[4] R. Qaddoura and M. M. Biltawi, ''Improving fraud detection in an imbalanced class distribution using different oversampling techniques,'' in Proc. Int. Eng. Conf. Electr., Energy, Artif. Intell. (EICEEAI), Nov. 2022, pp. 1–5, doi: 10.1109/EICEEAI56378.2022.10050500.
[5] K. Praveen Mahesh, S. Ashar Afrouz, and A. Shaju Areeckal, ''Detection of fraudulent credit card transactions: A comparative analysis of data sampling and classification techniques,'' J. Phys., Conf. Ser., vol. 2161, no. 1, Art. no. 012072, Jan. 2022, doi: 10.1088/1742-6596/2161/1/012072.


[6] N. Rtayli, ''An efficient deep learning classification model for predicting credit card fraud on skewed data,'' J. Inf. Secur. Cybercrimes Res., vol. 5, no. 1, pp. 57–71, Jun. 2022, doi: 10.26735/tlyg7256.
[7] S. O. Akinwamide, ''Prediction of fraudulent or genuine transactions on credit card fraud detection dataset using machine learning techniques,'' Int. J. Res. Appl. Sci. Eng. Technol., vol. 10, no. 6, pp. 5061–5071, Jun. 2022, doi: 10.22214/ijraset.2022.44962.
[8] Q. Li and Y. Xie, ''A behavior-cluster based imbalanced classification method for credit card fraud detection,'' in Proc. 2nd Int. Conf. Data Sci. Inf. Technol., New York, NY, USA: ACM, Jul. 2019, pp. 134–139, doi: 10.1145/3352411.3352433.
[9] E. Esenogho, I. D. Mienye, T. G. Swart, K. Aruleba, and G. Obaido, ''A neural network ensemble with feature engineering for improved credit card fraud detection,'' IEEE Access, vol. 10, pp. 16400–16407, 2022, doi: 10.1109/ACCESS.2022.3148298.
[10] X. Yi, Y. Xu, Q. Hu, S. Krishnamoorthy, W. Li, and Z. Tang, ''ASN-SMOTE: A synthetic minority oversampling method with adaptive qualified synthesizer selection,'' Complex Intell. Syst., vol. 8, no. 3, pp. 2247–2272, Jun. 2022, doi: 10.1007/s40747-021-00638-w.
[11] E. F. Ullastres and M. Latifi, ''Credit card fraud detection using ensemble learning algorithms,'' M.S. thesis, Nat. College Ireland, Dublin, Ireland, May 2022.
[12] H. Zhu, M. Zhou, G. Liu, Y. Xie, S. Liu, and C. Guo, ''NUS: Noisy-sample-removed undersampling scheme for imbalanced classification and application to credit card fraud detection,'' IEEE Trans. Computat. Social Syst., pp. 1–12, Mar. 2023, doi: 10.1109/TCSS.2023.3243925.
[13] E. A. Lopez-Rojas, A. Elmir, and S. Axelsson, ''PaySim: A financial mobile money simulator for fraud detection,'' in Proc. 28th Eur. Modeling Simulation Symp. (EMSS), Sep. 2016, pp. 249–255.
[14] A. A. Arfeen and B. M. A. Khan, ''Empirical analysis of machine learning algorithms on detection of fraudulent electronic fund transfer transactions,'' IETE J. Res., pp. 1–13, Mar. 2022, doi: 10.1080/03772063.2022.2048700.
[15] I. A. Mondal, Md. E. Haque, A.-M. Hassan, and S. Shatabda, ''Handling imbalanced data for credit card fraud detection,'' in Proc. 24th Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2021, pp. 1–6, doi: 10.1109/ICCIT54785.2021.9689866.
[16] A. Alharbi, M. Alshammari, O. D. Okon, A. Alabrah, H. T. Rauf, H. Alyami, and T. Meraj, ''A novel text2IMG mechanism of credit card fraud detection: A deep learning approach,'' Electronics, vol. 11, no. 5, p. 756, Mar. 2022, doi: 10.3390/electronics11050756.
[17] Y. Sun and F. Liu, ''SMOTE-NCL: A re-sampling method with filter for network intrusion detection,'' in Proc. 2nd IEEE Int. Conf. Comput. Commun. (ICCC), Oct. 2016, pp. 1157–1161, doi: 10.1109/COMPCOMM.2016.7924886.
[18] H. Mansourifar and W. Shi, ''Deep synthetic minority over-sampling technique,'' Mar. 2020, arXiv:2003.09788.
[19] H. Han, W.-Y. Wang, and B.-H. Mao, ''Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning,'' in Proc. Int. Conf. Intell. Comput., 2005, pp. 878–887, doi: 10.1007/11538059_91.
[20] S. Choirunnisa and J. Lianto, ''Hybrid method of undersampling and oversampling for handling imbalanced data,'' in Proc. Int. Seminar Res. Inf. Technol. Intell. Syst. (ISRITI), Nov. 2018, pp. 276–280, doi: 10.1109/ISRITI.2018.8864335.
[21] G. E. A. P. A. Batista, R. C. Prati, and M. C. Monard, ''A study of the behavior of several methods for balancing machine learning training data,'' ACM SIGKDD Explor. Newslett., vol. 6, no. 1, pp. 20–29, Jun. 2004, doi: 10.1145/1007730.1007735.
[22] A. Abd El-Naby, E. E.-D. Hemdan, and A. El-Sayed, ''An efficient fraud detection framework with credit card imbalanced data in financial services,'' Multimedia Tools Appl., vol. 82, no. 3, pp. 4139–4160, Jan. 2023, doi: 10.1007/s11042-022-13434-6.
[23] A. Bansal and A. Jain, ''Analysis of focussed under-sampling techniques with machine learning classifiers,'' in Proc. IEEE/ACIS 19th Int. Conf. Softw. Eng. Res., Manage. Appl. (SERA), Jun. 2021, pp. 91–96, doi: 10.1109/SERA51205.2021.9509270.
[24] C.-R. Wang and X.-H. Shao, ''An improving majority weighted minority oversampling technique for imbalanced classification problem,'' IEEE Access, vol. 9, pp. 5069–5082, 2021, doi: 10.1109/ACCESS.2020.3047923.
[25] X. Xiong, Y. Huang, Y. Zhang, F. Zhang, Y. Jia, and J. Xi, ''Adaptive hybrid sampling algorithm based on BIRCH clustering,'' in Proc. IEEE 5th Inf. Technol., Netw., Electron. Autom. Control Conf. (ITNEC), vol. 5, Oct. 2021, pp. 110–114, doi: 10.1109/ITNEC52019.2021.9587242.
[26] M. M. Barham, ''An improved BIRCH algorithm for breast cancer clustering,'' M.S. thesis, Dept. Comput. Sci., Fac. Inf. Technol., Middle East Univ., Amman, Jordan, Jun. 2020.
[27] F. de la Bourdonnaye and F. Daniel, ''Evaluating resampling methods on a real-life highly imbalanced online credit card payments dataset,'' Jun. 2022, arXiv:2206.13152.
[28] F. Tadvi, S. Shinde, D. Patil, and S. Dmello, ''Real time credit card fraud detection,'' Int. Res. J. Eng. Technol., vol. 8, no. 5, 2021.
[29] V. S. S. Karthik, A. Mishra, and U. S. Reddy, ''Credit card fraud detection by modelling behaviour pattern using hybrid ensemble model,'' Arabian J. Sci. Eng., vol. 47, no. 2, pp. 1987–1997, Feb. 2022, doi: 10.1007/s13369-021-06147-9.
[30] V. Arora, R. S. Leekha, K. Lee, and A. Kataria, ''Facilitating user authorization from imbalanced data logs of credit cards using artificial intelligence,'' Mobile Inf. Syst., vol. 2020, pp. 1–13, Oct. 2020, doi: 10.1155/2020/8885269.
[31] N. Rtayli and N. Enneya, ''Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization,'' J. Inf. Secur. Appl., vol. 55, Dec. 2020, Art. no. 102596, doi: 10.1016/j.jisa.2020.102596.

MARAM ALAMRI received the B.S. and M.S. degrees in computer sciences and telecommunications and information systems from the University of Essex, Essex, U.K., in 2014 and 2015, respectively. She is currently pursuing the Ph.D. degree with the Department of Information Systems, King Saud University (KSU), where she is also a Lecturer. Her current research interests include data science and artificial intelligence.

MOURAD YKHLEF received the B.Eng. degree in computer science from Constantine University, Algeria, the M.Sc. degree in artificial intelligence from Sorbonne Paris Nord University (previously Paris 13), France, and the Ph.D. degree in computer science from Bordeaux 1 University, France. He is currently a Professor with the Department of Information Systems, King Saud University (KSU), Riyadh, Saudi Arabia. His main research interests include data science and artificial intelligence.
