Building A Robust Mobile Payment Fraud Detection System With Adversarial Examples
Abstract—Mobile payment is becoming a major payment method in many countries. However, the rate of payment fraud is higher for mobile payments than for credit cards. One potential reason is that mobile data is easier for fraudsters to modify than credit card data, which degrades data-driven fraud detection systems. Supervised learning methods are pervasively used in fraud detection, but they have traditionally been developed under the assumption that the environment is benign, i.e. that no adversary is trying to evade the fraud detection system. In this paper, we take the potential reactions of fraudsters into consideration and use adversarial examples to build a robust mobile fraud detection system. Experimental results show that the performance of our proposed method improves in both benign and adversarial environments.

Index Terms—fraud detection, adversarial machine learning, oversampling

I. INTRODUCTION

Thanks to their simplicity, absence of fees and real-time nature, mobile payments are growing in popularity. Almost all of the giant IT corporations have launched mobile payment solutions, such as Google Wallet, Apple Pay, Samsung Pay, Alipay and WeChat. In China, mobile payment is expected to grow at a rate of 22 percent per year, from $29.7 trillion in 2017 to $96.7 trillion in 2023 (source: https://www.mobilepaymentstoday.com/articles/chinese-economic-headwinds-raise-questions-about-mobile-payment-growth). This growth also attracts hackers and fraudsters, and as a consequence more and more mobile payment frauds are being reported.

A large number of machine learning methods have been proposed for fraud detection [1], [2]; they can be divided into supervised, unsupervised and semi-supervised approaches. In supervised learning, both fraudulent and genuine transactions are labeled in the historical data, and we assume that the future resembles the past. However, fraudsters change their strategies all the time, so supervised methods are powerless against new types of fraud. In unsupervised learning, no transaction in the historical data is labeled as fraudulent or genuine, but we know that frauds differ from the majority of transactions. Unsupervised methods can detect new types of fraud, but their false alert rate is generally higher than that of supervised methods. Semi-supervised methods try to take advantage of both supervised and unsupervised learning and are often used when there are many unlabeled data points and few labeled ones.

Although many fraud detection methods have been introduced, most existing methods were proposed for a "benign" environment: they did not take the reactions of fraudsters into consideration and assumed that fraudsters know nothing about fraud detection systems or machine learning techniques. Unfortunately, this benign assumption does not hold. In practice, fraudsters are well organized and study our fraud detection methods in order to avoid being detected. Moreover, compared with face-to-face credit card payments, which are subject to material constraints (i.e. having a functional card), mobile payments are almost fully digital and can therefore be attacked by professional fraudsters. In this paper, we propose a mobile fraud detection method for a realistic adversarial environment, under the following assumptions: 1) fraudsters know that we apply machine learning methods to fraud detection; 2) fraudsters also use machine learning methods to create their strategies; 3) fraudsters analyze the responses of our fraud detection system to their previous fraudulent transactions in order to improve their strategies.

Adversarial machine learning is the study of attacks on, and defenses for, machine learning systems. Machine learning techniques are now used in a wide variety of application domains and, especially since the advent of deep learning, many impressive results have been reported. It was therefore striking when Szegedy et al. [3] showed that these carefully developed methods can easily be fooled by adversarial examples. Since then, many methods have been proposed to create adversarial examples, such as FGSM, L-BFGS and JSMA [4], [5], [6]. Making machine learning methods robust to adversarial attacks has consequently become an issue in its own right, and a large amount of work has been devoted to it [7], [8]; one of the best-known solutions is to introduce adversarial examples at training time [9]. Several applications have also been reported [7], [10]. Zeager et al. [11] used a game-theoretic adversarial learning approach to model the interactions between a fraudster and the fraud detection system: at each round, fraud strategies are clustered with a Gaussian Mixture Model, the strategies are evaluated, and the best one is oversampled using SMOTE [12] to create a resampled training dataset, which enables the model to be trained on its weaknesses.
In this paper, we compare some of the most popular fraud detection methods under adversarial attack. We then oversample the resulting adversarial examples to construct a more robust mobile payment fraud detector.

The contributions of this paper are the following:
• The first study comparing different fraud detection methods under adversarial attack.
• The first mobile payment fraud detection solution based on adversarial learning.

II. ADVERSARIAL ATTACK

The challenge for the adversary is to figure out how to generate an input that fools the targeted classifier [13]. In an adversarial attack setting, we focus in particular on methods for creating "wild patterns" (adversarial examples) [14] that confuse machine learning models. A possible strategy is to minimize the distance to an existing sample, i.e. to apply a small perturbation [15]. As shown in formula (1), an adversarial example x_adv is generated by adding a well-crafted perturbation to an existing sample x′, such that our model f classifies the adversarial example x_adv and the sample x′ differently. The main idea of this strategy is to find "exceptions" to one of the basic assumptions of machine learning: similar samples have similar labels.

min ‖x_adv − x′‖  subject to  f(x_adv) ≠ f(x′)    (1)

Adversarial examples can also be crafted in a black-box scenario: an attacker can first train his own (white-box) substitute model, then generate adversarial samples against it, and finally apply these adversarial samples to the target ML model.
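For illustration, the sketch below shows how an FGSM perturbation of the form x + ε · sign(∇x J(x, y)) can be computed with PyTorch; this is the attack we use in our experiments. The binary classifier, the tensor shapes and the clamping of features to [0, 1] are illustrative assumptions rather than details of our actual implementation.

```python
import torch
import torch.nn as nn

def fgsm_example(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 eps: float = 0.01) -> torch.Tensor:
    """Craft x_adv = clip(x + eps * sign(grad_x J(x, y))) for a binary classifier.

    Assumes `model` outputs one logit per transaction and that the features
    in `x` are already scaled to [0, 1] (illustrative assumptions).
    """
    x_adv = x.clone().detach().requires_grad_(True)
    logits = model(x_adv).squeeze(-1)                  # shape: (batch,)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, y.float())
    loss.backward()                                    # gradient of J w.r.t. the input
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()        # FGSM step
    return x_adv.clamp(0.0, 1.0)                       # stay inside the [0, 1] box
```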
III. EXPERIMENTAL RESULTS

A. Dataset

We performed our experiments on a dataset of transactions made with a smartphone payment application. Each transaction is described by 243 features. Among them, there is information about the transaction itself (amount, time, status) as well as about the smartphone (brand, OS version, year, battery power, country, operator, ...). We also have features describing past transactions and the payer's overall use of the application (time since the last transaction / the last application opening, date of enrollment, number of transactions in the last day/week/month, ...). Since [17] pointed out that performing too much feature selection leads to a loss of robustness, we decided to keep most of these features. The number of instances and the repartition of genuine and fraudulent transactions are given in Table I.

TABLE I
REPARTITION OF GENUINE AND FRAUDULENT TRANSACTIONS IN THE DATASET

            Training set    Test set
Genuine     812 040         203 010
Frauds           55              15
Total       812 095         203 025
TABLE II
PRECISION AT 100 COMPARED ACROSS DIFFERENT MODELS (MLP: MULTI-LAYER PERCEPTRON, LR: LOGISTIC REGRESSION, DT: DECISION TREE, RF: RANDOM FOREST)

        Substitute      Substitute attacked     Oracle          Oracle attacked
MLP     0.261 ± 0.10    0.161 ± 0.07            0.283 ± 0.11    0.244 ± 0.13
LR      0.235 ± 0.11    0.116 ± 0.08            0.295 ± 0.10    0.269 ± 0.10
DT      0.212 ± 0.10    0.084 ± 0.09            0.138 ± 0.10    0.032 ± 0.04
RF      0.226 ± 0.10    0.122 ± 0.10            0.113 ± 0.09    0.150 ± 0.09
Once enough data has been retrieved, attackers can generate adversarial instances with their substitute model, query our oracle model for output labels on these adversarial instances, and use them to augment their dataset and to optimize their attack models and strategies.

Algorithm 1: Jacobian dataset augmentation [21]
1) Collect some (limited) data S_p, where p is the iteration number.
2) Query the oracle for labels (Õ(x), ∀x ∈ S_p).
3) Train the substitute model (an MLP) on these labeled data (S_p, Õ(S_p)).
4) Augment the data to S_{p+1} according to equation (3).

S_{p+1} = S_p ∪ {x + ε · sign(∇_x J[x, Õ(x)]) : x ∈ S_p}    (3)

Performing this kind of data augmentation is a way to query labels on points near the decision boundary. Thus, with a limited number of queries, it is possible to build a substitute of the fraud detection system.
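To make the augmentation loop concrete, the following sketch implements Algorithm 1 together with equation (3) in PyTorch. The oracle is modeled as an opaque callable returning hard labels, the substitute is a generic binary classifier, and the optimizer, the number of training epochs and the step size ε = 0.01 are illustrative choices rather than a faithful transcription of our experimental code.

```python
import torch
import torch.nn as nn

def jacobian_augmentation(substitute: nn.Module, oracle, X0: torch.Tensor,
                          n_iters: int = 3, eps: float = 0.01,
                          epochs: int = 50) -> torch.Tensor:
    """Sketch of Algorithm 1: iteratively train a substitute of the oracle.

    `oracle(X)` is assumed to return hard 0/1 labels (the fraud detector's
    decisions); the training schedule and optimizer are illustrative only.
    """
    X = X0.clone()
    for _ in range(n_iters):
        y = oracle(X).float()                            # 2) query oracle labels O~(x)
        opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
        for _ in range(epochs):                          # 3) train the substitute on (S_p, O~(S_p))
            opt.zero_grad()
            loss = nn.functional.binary_cross_entropy_with_logits(
                substitute(X).squeeze(-1), y)
            loss.backward()
            opt.step()
        Xg = X.clone().requires_grad_(True)              # 4) S_{p+1} = S_p U {x + eps*sign(grad_x J)}
        J = nn.functional.binary_cross_entropy_with_logits(
            substitute(Xg).squeeze(-1), y)
        J.backward()
        X_new = (X + eps * Xg.grad.sign()).detach().clamp(0.0, 1.0)
        X = torch.cat([X, X_new], dim=0)
    return X                                             # the substitute is trained in place
```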
For our experiments, we performed three iterations of Jacobian dataset augmentation with an initial dataset of 20 frauds among 100 instances. We used FGSM with ε = 0.01 for all features, which are in the range [0, 1]. We chose FGSM for convenience, as it is easy to compute.

The results can be seen in Table II. The first column shows how well the substitute model succeeds in imitating the oracle. The second column indicates the performance of the substitute model under attack. The third column indicates the performance of the oracle in a benign environment, while the last column indicates the performance of the oracle on examples crafted with the help of the substitute model.
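Table II reports precision at 100, and Section III-C below uses precision at 50. Under the usual definition for fraud detection, precision at k is the fraction of true frauds among the k transactions that the model ranks as most suspicious; a minimal implementation with hypothetical array names is shown below.

```python
import numpy as np

def precision_at_k(y_true: np.ndarray, scores: np.ndarray, k: int = 100) -> float:
    """Fraction of actual frauds (label 1) among the k highest-scored transactions."""
    top_k = np.argsort(scores)[::-1][:k]       # indices of the k most suspicious transactions
    return float(np.mean(y_true[top_k] == 1))

# Hypothetical usage with a scikit-learn style classifier:
# scores = model.predict_proba(X_test)[:, 1]
# p_at_100 = precision_at_k(y_test, scores, k=100)
```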
These results show that most models are likely to be fooled by black-box attacks: we observe a significant loss of precision for the oracle between the benign and adversarial environments. Random forest is an exception, for which we notice a curious phenomenon: the substitute model seems to perform better than the detection system itself, and FGSM is not able to craft good adversarial examples against it. We believe that, in general, ensemble models are more robust to adversarial attacks than other methods, especially when each classifier of the ensemble is trained on a different subset of features. This result should motivate more research on the topic.

Fig. 1. Comparison of precision and robustness between oversampling methods using adversarial attack and SMOTE.

C. Improving the robustness of fraud detection methods with adversarial instances

As mentioned before, the adversarial instances of a ML model are instances that this model has difficulty classifying correctly. Obviously, one could add these generated adversarial instances to the training set to improve the robustness of the model. In addition, a major problem for fraud detection models is the extreme imbalance of the datasets. In this part, we therefore use adversarial example generation as an oversampling method, both to balance our fraud detection data and to improve the performance of the fraud detection methods.

We compare our adversarial oversampling with SMOTE oversampling, using the same training procedure for both. At each epoch, the training set is oversampled so as to double the number of fraud transactions. The oversampled instances of the previous epoch are dropped, to ensure that the oversampled instances remain within a box constraint. We also compare with a baseline: the same MLP model trained on the raw training set. Each model is evaluated with precision at 50 on the test set and precision at 50 on adversarial examples crafted using FGSM with ε = 0.01.
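The per-epoch oversampling step can be sketched as follows. Both variants double the number of frauds before each epoch; the adversarial variant perturbs the original fraud instances with the FGSM routine sketched in Section II, while the SMOTE baseline relies on the imbalanced-learn implementation. The function and parameter names are illustrative, and `fgsm_example` refers to the earlier hypothetical sketch, not to code from our production system.

```python
import torch
from imblearn.over_sampling import SMOTE  # used only for the SMOTE baseline

def oversample_epoch(model, X, y, eps=0.01, method="adversarial"):
    """Return (X_aug, y_aug) with the number of frauds doubled for one epoch.

    Synthetic points are regenerated from the original frauds at every epoch,
    so perturbations never accumulate and stay inside the [0, 1] box constraint.
    """
    if method == "adversarial":
        X_fraud = X[y == 1]
        # Perturb each original fraud with the FGSM sketch from Section II.
        X_syn = fgsm_example(model, X_fraud, torch.ones(len(X_fraud)), eps)
        X_aug = torch.cat([X, X_syn], dim=0)
        y_aug = torch.cat([y, torch.ones(len(X_syn), dtype=y.dtype)], dim=0)
    else:  # SMOTE baseline: linear interpolation between neighbouring frauds
        n_frauds = int((y == 1).sum())
        # SMOTE needs at least k_neighbors + 1 frauds; fraud sparsity can make this fail.
        sampler = SMOTE(sampling_strategy={1: 2 * n_frauds}, k_neighbors=5)
        X_np, y_np = sampler.fit_resample(X.numpy(), y.numpy().astype(int))
        X_aug = torch.from_numpy(X_np).to(X.dtype)
        y_aug = torch.from_numpy(y_np).to(y.dtype)
    return X_aug, y_aug
```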
As shown in Fig. 1, applying an oversampling technique closes the gap between precision and precision under attack, leading to more robust models, with the adversarial training technique performing better than SMOTE. We also observe a small gain in precision on the test set, whereas SMOTE makes no difference with respect to the baseline.

We explain these results by the fact that, on an extremely imbalanced fraud dataset, SMOTE cannot operate properly. Indeed, SMOTE performs a linear combination of nearest points, so each fraud instance must have K other frauds close to it, which is not always the case because of the sparsity of fraud labels. Moreover, there is no guarantee that the points synthesized by SMOTE will help the model reach the task decision boundary. Adversarial oversampling, in contrast, makes no strong assumption; above all, it helps push the learned decision boundary back towards the task decision boundary (i.e. the theoretical decision boundary for the task) by anticipating the fraudsters' next moves.

An example on a toy dataset can be seen in Fig. 2. This figure illustrates the strategy fraudsters use to transgress our fraud detection system and shows how our algorithm behaves to anticipate it, leaving fraudsters no clue about which strategy to choose next.
Fig. 2. Comparison between adversarial oversampling and SMOTE.

IV. CONCLUSION AND FUTURE WORK

In this work, we studied the effect of adversarial attacks on a fraud detection system in the context of a smartphone payment application, and we compared different mobile payment fraud detection methods in an adversarial setting. We showed that fraud detection systems, like other machine learning techniques, are subject to adversarial attacks. We also proposed to use adversarial examples to improve the robustness of fraud detection models and to balance the training data. Experimental results showed that the performance of our proposed method improved in both benign and adversarial environments. In the future, we would like to extend this work towards the implementation of a more general framework for evaluating the robustness of fraud detection systems in an adversarial environment.

REFERENCES

[1] K. K. Tripathi and M. A. Pavaskar, "Survey on credit card fraud detection methods," International Journal of Emerging Technology and Advanced Engineering, vol. 2, no. 11, pp. 721–726, 2012.
[2] A. Srivastava, A. Kundu, S. Sural, and A. Majumdar, "Credit card fraud detection using hidden Markov model," IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 1, pp. 37–48, 2008.
[3] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," arXiv preprint arXiv:1312.6199, 2013.
[4] N. Carlini and D. Wagner, "Towards evaluating the robustness of neural networks," in 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57, IEEE, 2017.
[5] A. Kurakin, I. Goodfellow, and S. Bengio, "Adversarial examples in the physical world," arXiv preprint arXiv:1607.02533, 2016.
[6] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, "Towards deep learning models resistant to adversarial attacks," arXiv preprint arXiv:1706.06083, 2017.
[7] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, "Adversarial examples for malware detection," in European Symposium on Research in Computer Security, pp. 62–79, Springer, 2017.
[8] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, "Distillation as a defense to adversarial perturbations against deep neural networks," in 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597, IEEE, 2016.
[9] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar, "Can machine learning be secure?," in Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pp. 16–25, ACM, 2006.
[10] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, "Adversarial perturbations against deep neural networks for malware classification," arXiv preprint arXiv:1606.04435, 2016.
[11] M. F. Zeager, A. Sridhar, N. Fogal, S. Adams, D. E. Brown, and P. A. Beling, "Adversarial learning in credit card fraud detection," in 2017 Systems and Information Engineering Design Symposium (SIEDS), pp. 112–116, IEEE, 2017.
[12] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321–357, 2002.
[13] I. Goodfellow, P. McDaniel, and N. Papernot, "Making machine learning robust against adversarial inputs," Communications of the ACM, vol. 61, no. 7, pp. 56–66, 2018.
[14] B. Biggio and F. Roli, "Wild patterns: Ten years after the rise of adversarial machine learning," Pattern Recognition, vol. 84, pp. 317–331, 2018.
[15] A. Kantchelian, J. Tygar, and A. Joseph, "Evasion and hardening of tree ensemble classifiers," in International Conference on Machine Learning, pp. 2387–2396, 2016.
[16] A. Kurakin, I. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, T. Pang, J. Zhu, X. Hu, C. Xie, et al., "Adversarial attacks and defences competition," in The NIPS'17 Competition: Building Intelligent Systems, pp. 195–231, Springer, 2018.
[17] F. Zhang, P. P. Chan, B. Biggio, D. S. Yeung, and F. Roli, "Adversarial feature selection against evasion attacks," IEEE Transactions on Cybernetics, vol. 46, no. 3, pp. 766–777, 2016.
[18] S. Arlot, A. Celisse, et al., "A survey of cross-validation procedures for model selection," Statistics Surveys, vol. 4, pp. 40–79, 2010.
[19] Z. Zojaji, R. E. Atani, A. H. Monadjemi, et al., "A survey of credit card fraud detection techniques: Data and technique oriented perspective," arXiv preprint arXiv:1611.06439, 2016.
[20] C. Phua, V. Lee, K. Smith, and R. Gayler, "A comprehensive survey of data mining-based fraud detection research," arXiv preprint arXiv:1009.6119, 2010.
[21] N. Papernot, P. McDaniel, and I. Goodfellow, "Transferability in machine learning: from phenomena to black-box attacks using adversarial samples," arXiv preprint arXiv:1605.07277, 2016.
[22] D. Sánchez, M. Vila, L. Cerda, and J.-M. Serrano, "Association rules applied to credit card fraud detection," Expert Systems with Applications, vol. 36, no. 2, pp. 3630–3640, 2009.
[23] S. Maes, K. Tuyls, B. Vanschoenwinkel, and B. Manderick, "Credit card fraud detection using Bayesian and neural networks," in Proceedings of the 1st International NAISO Congress on Neuro Fuzzy Technologies, pp. 261–270, 2002.
[24] G. L. Wittel and S. F. Wu, "On attacking statistical spam filters," in CEAS, 2004.
[25] A. Shen, R. Tong, and Y. Deng, "Application of classification models on credit card fraud detection," in 2007 International Conference on Service Systems and Service Management, pp. 1–4, IEEE, 2007.
[26] F. T. Liu, K. M. Ting, and Z.-H. Zhou, "Isolation forest," in 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422, IEEE, 2008.
[27] R. Tomsett, A. Widdicombe, T. Xing, S. Chakraborty, S. Julier, P. Gurram, R. Rao, and M. Srivastava, "Why the failure? How adversarial examples can provide insights for interpretable machine learning," in 2018 21st International Conference on Information Fusion (FUSION), pp. 838–845, IEEE, 2018.
[28] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," arXiv preprint arXiv:1412.6572, 2014.
[29] S. Baluja and I. Fischer, "Adversarial transformation networks: Learning to generate adversarial examples," arXiv preprint arXiv:1703.09387, 2017.
[30] S. Gu and L. Rigazio, "Towards deep neural network architectures robust to adversarial examples," arXiv preprint arXiv:1412.5068, 2014.
[31] A. Mead, T. Lewris, S. Prasanth, S. Adams, P. Alonzi, and P. Beling, "Detecting fraud in adversarial environments: A reinforcement learning approach," in 2018 Systems and Information Engineering Design Symposium (SIEDS), pp. 118–122, IEEE, 2018.