Systems 11 00305 v2
Article
Credit Card Fraud Detection Based on Unsupervised
Attentional Anomaly Detection Network
Shanshan Jiang 1 , Ruiting Dong 1 , Jie Wang 1 and Min Xia 2, *
1 School of Management Science and Engineering, Nanjing University of Information Science and Technology,
Nanjing 210044, China; [email protected] (S.J.); [email protected] (R.D.);
[email protected] (J.W.)
2 Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science and
Technology, Nanjing 210044, China
* Correspondence: [email protected]
Abstract: In recent years, with the rapid development of Internet technology, the number of credit
card users has increased significantly. As a result, credit card fraud has caused substantial
economic losses to individual users and related financial enterprises. Traditional machine
learning methods (such as SVM, random forest, and Markov models) have been widely studied for credit
card fraud detection, but they often struggle when faced with unknown attack patterns.
In this paper, a new Unsupervised Attentional Anomaly
Detection Network-based Credit Card Fraud Detection framework (UAAD-FDNet) is proposed.
Fraudulent transactions are regarded as abnormal samples, and an autoencoder with Feature
Attention and a GAN are used to separate them from massive transaction data. Extensive
experimental results on the Kaggle Credit Card Fraud Detection Dataset and the IEEE-CIS Fraud Detection
Dataset demonstrate that the proposed method outperforms existing fraud detection methods.
to be solved. This work aims to survey existing credit card fraud detection methods and
propose a new credit card fraud detection network based on the unsupervised attentional
anomaly detection paradigm. The network follows the training paradigm of the Generative
Adversarial Network (GAN) [12] and mainly consists of a generator and a discriminator. The
generator produces samples as close as possible to the real data through self-supervised
learning [13–15] and encodes a high-level feature representation (hidden
vector) of the normal transaction data distribution, while the discriminator tries to detect the
data samples forged by the generator, so that the two form an adversarial
training mode. In the generator, we propose a channel-wise feature attention, which enables
the network to reconstruct more realistic pseudo-samples during the training phase
and thereby helps it learn the hidden-layer feature representation of normal transactions.
To supervise the training process of the model effectively, this paper also proposes
a hybrid weighted loss function. In the test phase, the hidden vector and the distance
between the reconstructed sample and the input sample in the feature space are calculated
to determine whether the current transaction sample is fraudulent. In the experimental
part, we compare the proposed method with existing machine learning methods and
deep learning methods on the Kaggle Credit Card Fraud Detection Dataset and the IEEE-CIS Fraud
Detection Dataset to demonstrate its effectiveness.
The main contributions of this paper lie in the following aspects:
• Reframe the problem of credit card fraud detection as anomaly detection of fraudulent
transactions, and propose a new credit card Fraud Detection framework based on
Unsupervised Attentional Anomaly Detection Network (UAAD-FDNet).
• A channel-wise feature attention is proposed. This module enables the network to
effectively capture the interdependence between feature channels to better learn how
to reconstruct normal transaction samples.
• A hybrid weighted loss function is proposed to enable the model to learn an effective
encoding of hidden vectors and to reconstruct samples as realistically as
possible. In the test phase, fraudulent transactions are identified by calculating the
hidden vectors and the feature distance between the reconstructed samples and
the input samples.
• Experimental results on Kaggle Credit Card Fraud Detection Dataset and IEEE-CIS
Fraud Detection Dataset show that our method outperforms existing machine learning-
based and deep learning-based fraud detection methods.
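The test-phase decision described in the contributions above can be illustrated with a minimal sketch. It is not the authors' scoring rule: the function names, the use of Euclidean distance, the equal weighting of the two distances, the re-encoded hidden vector `z_hat`, and the min-max normalization are all assumptions made for illustration.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def raw_score(x, x_hat, z, z_hat, weight=0.5):
    """Assumed score: weighted sum of the input/reconstruction distance
    and the distance between two hidden vectors (weight is hypothetical)."""
    return weight * l2_distance(x, x_hat) + (1.0 - weight) * l2_distance(z, z_hat)

def min_max(scores):
    """Rescale scores to [0, 1] so one threshold applies to the whole test set."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def flag_fraud(scores, alpha):
    """Mark a transaction as fraudulent when its normalized score exceeds alpha."""
    return [s > alpha for s in min_max(scores)]

# Toy scores (made up): the third sample reconstructs poorly.
flags = flag_fraud([0.1, 0.2, 0.9], alpha=0.4)
```

A sample that the generator reconstructs poorly receives a large score, which is the intuition behind treating fraudulent transactions as anomalies.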
quickly and accurately extract typical features from limited transaction data is a topic that
still needs further exploration.
In recent years, deep learning technology has demonstrated strong capabilities in
many fields [20–26]. Its development has significantly promoted the
reform of the financial industry and, at the same time, brought new ideas to research
on financial fraud detection. In 2016, Fu et al. [27] proposed a CNN-based fraud detection
network, which learns the intrinsic patterns of fraudulent behavior from labeled data to
identify whether each transaction sample is fraudulent. In 2018, Chouiekh
and Haj [28] compared a Deep Convolutional Neural Network (DCNN)
with several traditional machine learning methods on fraud events. The experimental results
on mobile communication networks showed that the DCNN is significantly better than the other
methods. However, the serious class imbalance of real-world financial fraud data
poses severe challenges to the above methods. Saia and Carta [29] proposed
analyzing this issue and the heterogeneity of credit card data in the frequency domain to
obtain a more stable information representation for fraud detection. In 2019, Fiore et al. [30]
addressed the class imbalance problem of financial fraud transaction data by using a GAN
to generate minority-class samples and thus train more effective classifiers. Saia and Carta [31]
studied proactive fraud detection based on the Fourier transform and the Wavelet
transform. In 2022, Esenogho et al. [32] combined hybrid data resampling with
ensemble learning to perform robust credit card fraud detection based on LSTM [33] and
AdaBoost [34]. In this paper, we reformulate the task of credit card fraud detection as an
anomaly detection task and train an autoencoder and a GAN in a way that avoids the adverse
effects of data imbalance on the model.
3. Methodology
In this paper, the credit card fraud detection problem is treated as an anomaly detection
problem. Fraudulent transactions are regarded as abnormal samples in the transaction
system. We use an unsupervised attentional anomaly detection network (comprising an
autoencoder with Feature Attention and a GAN) to separate them from normal transaction data
and thereby detect fraud. In the following, we introduce the proposed
credit card fraud detection algorithm in detail.
the Softmax classification layer. These components form the basis of our credit card fraud detection
framework. Because fully connected layers are prone to overfitting, dropout
layers should be selectively added to the network according to the feature dimension of the
input data.
In addition, to help G better learn to reconstruct $\hat{x}$, this
paper also proposes a channel-wise feature attention, whose structure is shown
in Figure 2. For a set of transaction sample features $u \in \mathbb{R}^{C}$, we first group its feature
channels to obtain two sets of sub-features $u_g \in \mathbb{R}^{N \times d}$ and $u_l \in \mathbb{R}^{d \times N}$ ($N$ is the
number of groups, and $d$ is the feature dimension of each group). Then,
self-attention [35,36] is computed on the two sets of sub-features separately
to capture the global and local correlations among feature channels. Self-attention
is widely used in the field of Natural Language Processing (NLP) and can
capture the dependencies between arbitrary features by computing the correlation between
pairs of features. Specifically, it first applies linear transformations to the input feature
$q \in \mathbb{R}^{k \times n}$ to obtain the Query ($Q \in \mathbb{R}^{k \times n}$), Key ($K \in \mathbb{R}^{k \times n}$), and Value ($V \in \mathbb{R}^{k \times n}$). Then, the
correlation between $Q$ and $K$ is calculated to obtain $S \in \mathbb{R}^{k \times k}$, each row of which is passed
through Softmax to represent the degree of correlation between paired features. Finally,
$S$ and $V$ are fused to produce the output representation $P \in \mathbb{R}^{k \times n}$. The mathematical
expression of the above process is as follows:
$$P = \mathrm{Softmax}(Q \circ K^{T}) \circ V, \quad (1)$$
where $\circ$ denotes matrix multiplication. In this paper, $(k, n)$ equals $(N, d)$ in the branch
that processes $u_g$ and $(d, N)$ in the branch that processes $u_l$. Finally, element-wise fusion is performed on the output
features of the two branches to obtain the output of the feature attention module. We
introduce this module in the skip connection of the generator. On the one hand, it
preserves the fine-grained features of the original sample; on the other hand, it
filters the redundant features in the transaction sample.
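The two-branch attention described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: the linear transformations are replaced by identity matrices (in the network they would be learned), the channels are grouped by contiguous slicing, $u_l$ is taken to be the transpose of $u_g$, and the two branches are fused by element-wise addition. All function names are hypothetical.

```python
import math

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def self_attention(q_in, w_q, w_k, w_v):
    """Equation (1): P = Softmax(Q K^T) V for an input of shape k x n."""
    q = matmul(q_in, w_q)                       # k x n
    k = matmul(q_in, w_k)                       # k x n
    v = matmul(q_in, w_v)                       # k x n
    s = softmax_rows(matmul(q, transpose(k)))   # k x k, rows sum to 1
    return matmul(s, v)                         # k x n

def channel_wise_attention(u, n_groups):
    """Attend over C = N * d channels in two branches and fuse the results."""
    d = len(u) // n_groups
    u_g = [u[i * d:(i + 1) * d] for i in range(n_groups)]   # N x d
    u_l = transpose(u_g)                                     # d x N (assumption)
    eye_d, eye_n = identity(d), identity(n_groups)           # stand-ins for learned weights
    p_g = self_attention(u_g, eye_d, eye_d, eye_d)           # (k, n) = (N, d)
    p_l = self_attention(u_l, eye_n, eye_n, eye_n)           # (k, n) = (d, N)
    fused = [[a + b for a, b in zip(row_g, row_l)]
             for row_g, row_l in zip(p_g, transpose(p_l))]   # element-wise sum (assumption)
    return [value for row in fused for value in row]

features = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]   # toy vector with C = 6
attended = channel_wise_attention(features, n_groups=2)
```

The output has the same length as the input, which is what allows the module to sit in a skip connection.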
Figure 1. Schematic diagram of the framework structure of credit card fraud detection based on
unsupervised attentional anomaly detection. (Labels in the diagram: GE, GD, E, D, z, x, Lcon, Llat, Ladv,
Real/Fake; layer legend: FC, LeakyReLU, Tanh, Softmax, Feature Attention.)
[Figure 2: structure diagram of the channel-wise feature attention. Labels in the diagram: C, N, d, Q, K, V, k, n, Feature Attention.]
4. Experiments
4.1. Dataset
4.1.1. Credit Card Fraud Detection Dataset
In this section, the credit card fraud detection dataset on Kaggle is used to verify
the effectiveness of the proposed UAAD-FDNet. The dataset contains 284,807 credit card
transaction records made by European cardholders over two days in
September 2013. For data privacy reasons, the dataset only provides transaction
data processed by PCA. Each transaction record contains 28 principal component
values, V1–V28, and two features that were not transformed, ‘Time’ and ‘Amount’. ‘Time’
is the time difference between each transaction and the first transaction in the
dataset. ‘Amount’ is the amount of each transaction. The schematic diagram of the
data sample is shown in Figure 3. In addition, each transaction carries a ‘Class’
label: 0 for normal transactions and 1 for fraudulent transactions. There
are only 492 fraudulent transactions, which account for just 0.172% of the entire dataset.
Figure 4 shows the large difference between the numbers of positive and negative samples in
this dataset. Therefore, the class imbalance problem should be considered first. The
statistical information of the dataset is shown in Table 1.
[Figure 3: schematic diagram of the data sample (example rows listing ‘Time’, principal component values, and ‘Amount’).]
Figure 4. Statistical chart of the number of positive and negative samples in the dataset.
(Horizontal axis: Class; vertical axis: Number, with ticks at 10^3, 10^4, and 10^5.)
Systems 2023, 11, 305 7 of 14
Item                                          Value
Total Number of Transactions                  284,807
Number of Fraudulent Transactions             492
Percentage of Fraudulent Transactions         0.172%
Number of Transaction Data Columns            31
PCA Principal Components Feature Quantity     28
Number of Labels                              1
Since the data in the ‘Time’ and ‘Amount’ columns have not been preprocessed
before model training, we first perform data standardization on them. The mathematical
expression of this step is as follows:
$$x_i' = \frac{x_i - \mu}{\sigma} \quad (5)$$
Here, $x_i$ represents the original feature value, $x_i'$ represents the standardized
feature value, $\mu$ represents the mean, and $\sigma$ represents the standard deviation. Through
this step, except for the ‘Class’ column, each transaction record in this dataset contains
30 standardized feature values. Next, we split the dataset into training and test sets.
Given the serious class imbalance in this dataset, the training set
and test set need to be constructed carefully. The proposed UAAD-FDNet
relies only on normal transaction samples during the training phase, which
suits the class imbalance of the Kaggle credit card fraud
detection dataset. In other words, because it is trained only on normal samples, our method
sidesteps the data imbalance problem. In this experiment, we split the
284,315 normal transaction samples in a ratio of 8:2: 227,452 normal transaction
samples were used as the training set, and the remaining 56,863 normal samples together with the
492 fraudulent transaction samples constituted the test set. The statistical information of
the training set and test set is shown in Table 2.
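The preprocessing and split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the population standard deviation is assumed in Equation (5), and shuffling with a fixed seed before the 8:2 split is an assumption, since the text does not say how the normal samples were ordered.

```python
import random
import statistics

def standardize(values):
    """Equation (5): subtract the mean and divide by the standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std; the paper does not specify
    return [(v - mu) / sigma for v in values]

def split_normal_only(labels, train_parts=8, total_parts=10, seed=0):
    """Return (train_idx, test_idx). Training indices are normal samples only;
    the test set holds the remaining normal samples plus every fraud sample."""
    normal = [i for i, y in enumerate(labels) if y == 0]
    fraud = [i for i, y in enumerate(labels) if y == 1]
    random.Random(seed).shuffle(normal)            # assumed shuffling step
    cut = len(normal) * train_parts // total_parts  # integer arithmetic avoids rounding issues
    return normal[:cut], normal[cut:] + fraud

# Toy stand-in for the 'Amount' column and 'Class' labels (values are illustrative).
amounts = [9.51, 9.25, 248.52, 5.9, 12.0, 30.0, 1.0, 77.0, 3.5, 60.0]
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scaled = standardize(amounts)
train_idx, test_idx = split_normal_only(labels)
```

With 284,315 normal and 492 fraudulent labels, the same function yields 227,452 training indices and 57,355 test indices (56,863 normal plus 492 fraudulent), matching the counts reported above.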
$$PR = \frac{TP}{TP + FP}, \quad (6)$$
$$RC = \frac{TP}{TP + FN}, \quad (7)$$
$$F1 = \frac{2 \times PR \times RC}{PR + RC}, \quad (8)$$
$$Specificity = \frac{TN}{TN + FP}, \quad (10)$$
where TP, FP, TN, and FN denote the numbers of True Positives, False Positives, True
Negatives, and False Negatives, respectively. AUC is the area under the curve
plotted with 1 − Specificity on the horizontal axis and Sensitivity on the vertical axis. From
Equations (9) and (10), it can be observed that a larger AUC indicates better performance
of the model.
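As a worked illustration of Equations (6)–(8) and (10), the following sketch computes the metrics from binary labels and predictions. The function names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = fraudulent and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def safe_div(num, den):
    """Return 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0

def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    pr = safe_div(tp, tp + fp)              # Equation (6)
    rc = safe_div(tp, tp + fn)              # Equation (7)
    f1 = safe_div(2 * pr * rc, pr + rc)     # Equation (8)
    specificity = safe_div(tn, tn + fp)     # Equation (10)
    return {"PR": pr, "RC": rc, "F1": f1, "Specificity": specificity}

# Toy example: 3 fraudulent and 5 normal transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
```

In this toy case TP = 2, FP = 1, TN = 4, and FN = 1, so precision, recall, and F1 all equal 2/3 and specificity equals 0.8.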
transactions. The threshold α should be chosen at the intersection of the two curves, so
α is set to 0.4 in this experiment. The same applies to the IEEE-CIS Fraud Detection Dataset.
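The idea of reading the threshold off the intersection of two kernel density curves can be sketched as follows. This is an assumption-laden illustration: the fraud scores are made-up toy values, the Gaussian kernel with a fixed bandwidth of 0.05 is a choice made here, and the grid search between the two group means is only one simple way to locate the crossing.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density estimate built from Gaussian kernels of fixed bandwidth."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def intersection_threshold(normal_scores, fraud_scores, bandwidth=0.05, steps=1000):
    """Scan from the mean normal score to the mean fraud score and return the
    point where the normal density stops exceeding the fraud density."""
    f_normal = gaussian_kde(normal_scores, bandwidth)
    f_fraud = gaussian_kde(fraud_scores, bandwidth)
    lo = sum(normal_scores) / len(normal_scores)
    hi = sum(fraud_scores) / len(fraud_scores)
    prev_x, prev_diff = lo, f_normal(lo) - f_fraud(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        diff = f_normal(x) - f_fraud(x)
        if prev_diff > 0 >= diff:           # sign change: the curves cross here
            return 0.5 * (prev_x + x)
        prev_x, prev_diff = x, diff
    return None                             # no crossing found in the scanned range

# Hypothetical normalized fraud scores for the two classes.
normal_scores = [0.10, 0.15, 0.20, 0.25, 0.30]
fraud_scores = [0.50, 0.55, 0.60, 0.65, 0.70]
alpha = intersection_threshold(normal_scores, fraud_scores)
```

Because the two toy groups are mirror images of each other, the crossing lies midway between them; real score distributions would need their densities estimated from the test data.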
Figure 5. Kernel density curve of fraud score on Kaggle Credit Card Fraud Detection Dataset.
Table 3. Comparative experimental results on Kaggle Credit Card Fraud Detection Dataset. (Red
bold indicates optimal results. Blue bold indicates suboptimal results).
Table 4. Comparative experimental results on IEEE-CIS Fraud Detection Dataset. (Red bold indicates
optimal results. Blue bold indicates suboptimal results).
(a) F1 is reported
Table 5. The ablation experiment results of the loss function on the Kaggle Credit Card Fraud
Detection Dataset. (Red bold indicates the optimal result.)
5. Conclusions
In this paper, we reformulate the credit card fraud detection problem as an anomaly
detection problem and propose a new unsupervised attentional anomaly detection-based
credit card fraud detection network (UAAD-FDNet). The network mainly consists of a
generator and a discriminator. The generator uses an autoencoder with
Feature Attention to reconstruct the input transaction samples so that the generated transaction
data are as realistic as possible; in this way, it learns a high-level representation (hidden vector)
of normal transaction data. The discriminator forms an adversarial training
mode with the generator during the training phase to better guide the generator to fit
the normal transaction data distribution. Compared with traditional machine learning
methods, such as SVM, DT, XGBoost, KNN, and RF, as well as existing deep learning-based
methods, such as LSTM, CNN, MLP, and AE, our method generalizes better. The
experimental results on the Kaggle Credit Card Fraud Detection Dataset and the IEEE-CIS Fraud
Detection Dataset show that the proposed method effectively avoids the problem of
data imbalance and achieves better fraud detection performance. This suggests that, in real
scenarios, our method can help safeguard the interests of financial users.
Author Contributions: Conceptualization, S.J. and J.W.; methodology, S.J.; software, S.J.; validation,
J.W., M.X., and R.D.; formal analysis, R.D.; investigation, J.W. and R.D.; resources, S.J.; data curation,
J.W.; writing—original draft preparation, S.J.; writing—review and editing, J.W.; visualization, J.W.;
supervision, M.X.; project administration, S.J.; funding acquisition, S.J. All authors have read and
agreed to the published version of the manuscript.
Funding: This work is supported in part by the National Natural Science Foundation of PR China
(72101121) and Ministry of Education, Humanities and social science projects (21YJC790054).
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Haoxiang, W.; Smys, S. Overview of configuring adaptive activation functions for deep neural networks-a comparative study. J.
Ubiquitous Comput. Commun. Technol. 2021, 3, 10–22.
2. Zhang, R.; Zheng, F.; Min, W. Sequential behavioral data processing using deep learning and the Markov transition field in online
fraud detection. arXiv 2018, arXiv:1808.05329.
3. Sun, W.; Yang, C.G.; Qi, J.X. Credit risk assessment in commercial banks based on support vector machines. In Proceedings of the
2006 International Conference on Machine Learning and Cybernetics, Dalian, China, 13–16 August 2006; pp. 2430–2433.
4. Smys, S.; Raj, J.S. Analysis of deep learning techniques for early detection of depression on social media network-a comparative
study. J. Trends Comput. Sci. Smart Technol. 2021, 3, 24–39.
5. Thennakoon, A.; Bhagyani, C.; Premadasa, S.; Mihiranga, S.; Kuruwitaarachchi, N. Real-time credit card fraud detection using
machine learning. In Proceedings of the 2019 9th International Conference on Cloud Computing, Data Science & Engineering
(Confluence), Noida, India, 10–11 January 2019; pp. 488–493.
6. Sailusha, R.; Gnaneswar, V.; Ramesh, R.; Rao, G.R. Credit card fraud detection using machine learning. In Proceedings of the
2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020;
pp. 1264–1270.
7. Rtayli, N.; Enneya, N. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters
optimization. J. Inf. Secur. Appl. 2020, 55, 102596.
8. Ileberi, E.; Sun, Y.; Wang, Z. A machine learning based credit card fraud detection using the GA algorithm for feature selection. J.
Big Data 2022, 9, 24.
9. Kim, E.; Lee, J.; Shin, H.; Yang, H.; Cho, S.; Nam, S.k.; Song, Y.; Yoon, J.A.; Kim, J.I. Champion-challenger analysis for credit card
fraud detection: Hybrid ensemble and deep learning. Expert Syst. Appl. 2019, 128, 214–224.
10. Maniraj, S.; Saini, A.; Ahmed, S.; Sarkar, S. Credit card fraud detection using machine learning and data science. Int. J. Eng. Res.
2019, 8, 110–115.
11. Tiwari, P.; Mehta, S.; Sakhuja, N.; Kumar, J.; Singh, A.K. Credit card fraud detection using machine learning: A study. arXiv 2021,
arXiv:2108.10005.
12. Eckerli, F.; Osterrieder, J. Generative adversarial networks in finance: An overview. arXiv 2021, arXiv:2106.06364.
13. Zou, J.; Zhang, J.; Jiang, P. Credit card fraud detection using autoencoder neural network. arXiv 2019, arXiv:1908.11553.
14. Liu, X.; Zhang, F.; Hou, Z.; Mian, L.; Wang, Z.; Zhang, J.; Tang, J. Self-supervised learning: Generative or contrastive. IEEE Trans.
Knowl. Data Eng. 2021, 35, 857–876.
15. Albahli, S.; Nazir, T.; Mehmood, A.; Irtaza, A.; Alkhalifah, A.; Albattah, W. AEI-DNET: A novel densenet model with an
autoencoder for the stock market predictions using stock technical indicators. Electronics 2022, 11, 611.
16. Chen, R.C.; Chen, T.S.; Lin, C.C. A new binary support vector system for increasing detection rate of credit card fraud. Int. J.
Pattern Recognit. Artif. Intell. 2006, 20, 227–239.
17. Khan, A.; Singh, T.; Sinhal, A. Implement credit card fraudulent detection system using observation probabilistic in hidden
markov model. In Proceedings of the 2012 Nirma University International Conference on Engineering (NUiCONE), Ahmedabad,
India, 6–8 December 2012; pp. 1–6.
18. Zareapoor, M.; Shamsolmoali, P. Application of credit card fraud detection: Based on bagging ensemble classifier. Procedia
Comput. Sci. 2015, 48, 679–685.
19. Yee, O.S.; Sagadevan, S.; Malim, N.H.A.H. Credit card fraud detection using machine learning as data mining technique. J.
Telecommun. Electron. Comput. Eng. 2018, 10, 23–27.
20. Lu, C.; Xia, M.; Lin, H. Multi-scale strip pooling feature aggregation network for cloud and cloud shadow segmentation. Neural
Comput. Appl. 2022, 34, 6149–6162.
21. Qu, Y.; Xia, M.; Zhang, Y. Strip pooling channel spatial attention network for the segmentation of cloud and cloud shadow.
Comput. Geosci. 2021, 157, 104940.
22. Wang, Z.; Xia, M.; Lu, M.; Pan, L.; Liu, J. Parameter Identification in Power Transmission Systems Based on Graph Convolution
Network. IEEE Trans. Power Deliv. 2022, 37, 3155–3163.
23. Chen, J.; Xia, M.; Wang, D.; Lin, H. Double Branch Parallel Network for Segmentation of Buildings and Waters in Remote Sensing
Images. Remote Sens. 2023, 15, 1536.
24. Zhang, C.; Weng, L.; Ding, L.; Xia, M.; Lin, H. CRSNet: Cloud and Cloud Shadow Refinement Segmentation Networks for
Remote Sensing Imagery. Remote Sens. 2023, 15, 1664.
25. Ma, Z.; Xia, M.; Lin, H.; Qian, M.; Zhang, Y. FENet: Feature enhancement network for land cover classification. Int. J. Remote Sens.
2023, 44, 1702–1725.
26. Wang, D.; Weng, L.; Xia, M.; Lin, H. MBCNet: Multi-Branch Collaborative Change-Detection Network Based on Siamese Structure.
Remote Sens. 2023, 15, 2237.
27. Fu, K.; Cheng, D.; Tu, Y.; Zhang, L. Credit card fraud detection using convolutional neural networks. In Proceedings of the
Neural Information Processing: 23rd International Conference, ICONIP 2016, Kyoto, Japan, 16–21 October 2016; Proceedings, Part III 23;
Springer: Berlin/Heidelberg, Germany, 2016; pp. 483–490.
28. Chouiekh, A.; Haj, E.H.I.E. Convnets for fraud detection analysis. Procedia Comput. Sci. 2018, 127, 133–138.
29. Saia, R.; Carta, S. Evaluating Credit Card Transactions in the Frequency Domain for a Proactive Fraud Detection Approach. In
Proceedings of the SECRYPT, Madrid, Spain, 26–28 July 2017; pp. 335–342.
30. Fiore, U.; De Santis, A.; Perla, F.; Zanetti, P.; Palmieri, F. Using generative adversarial networks for improving classification
effectiveness in credit card fraud detection. Inf. Sci. 2019, 479, 448–455.
31. Saia, R.; Carta, S. Evaluating the benefits of using proactive transformed-domain-based techniques in fraud detection tasks.
Future Gener. Comput. Syst. 2019, 93, 18–32.
32. Esenogho, E.; Mienye, I.D.; Swart, T.G.; Aruleba, K.; Obaido, G. A neural network ensemble with feature engineering for
improved credit card fraud detection. IEEE Access 2022, 10, 16400–16407.
33. Mohmad, Y.A. Credit Card Fraud Detection Using LSTM Algorithm. Wasit J. Comput. Math. Sci. 2022, 1, 39–53.
34. Schapire, R.E. A brief introduction to boosting. In Proceedings of the Ijcai, Stockholm, Sweden, 31 July–6 August 1999; Volume 99,
pp. 1401–1406.
35. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need.
Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
36. Zhang, S.; Wang, L. STPGTN—A Multi-Branch Parameters Identification Method Considering Spatial Constraints and Transient
Measurement Data. Comput. Model. Eng. Sci. 2023, 136, 2635–2654.
37. Najadat, H.; Altiti, O.; Aqouleh, A.A.; Younes, M. Credit card fraud detection based on machine and deep learning. In Proceedings
of the 2020 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 7–9 April 2020;
pp. 204–208.
38. Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998,
13, 18–28.
39. Quinlan, J.R. C4.5: Programs for Machine Learning; Elsevier: Amsterdam, The Netherlands, 2014.
40. Meng, C.; Zhou, L.; Liu, B. A case study in credit fraud detection with SMOTE and XGboost. J. Phys. Conf. Ser. 2020, 1601, 052016.
41. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
42. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
43. Chen, J.I.Z.; Lai, K.L. Deep convolution neural network model for credit-card fraud detection and alert. J. Artif. Intell. 2021,
3, 101–112.
44. Kasasbeh, B.; Aldabaybah, B.; Ahmad, H. Multilayer perceptron artificial neural networks-based model for credit card fraud
detection. Indones. J. Electr. Eng. Comput. Sci. 2022, 26, 362–373.
45. Fanai, H.; Abbasimehr, H. A novel combined approach based on deep Autoencoder and deep classifiers for credit card fraud
detection. Expert Syst. Appl. 2023, 217, 119562.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.