Enhanced Cyber Security For Big Data Challenges

Enhanced-Cyber-Security-for-Big-Data-Challenges

Uploaded by

nilumipabasara57

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

8 views

Enhanced Cyber Security For Big Data Challenges

Enhanced-Cyber-Security-for-Big-Data-Challenges

Uploaded by

nilumipabasara57

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 7

2019 IEEE International Conference on Big Data (Big Data)

Privacy and Security of Big Data in AI Systems:

A Research and Standards Perspective
Saharnaz Dilmaghani1 , Matthias R. Brust1 , Grégoire Danoy1,2 , Natalia Cassagnes3 , Johnatan Pecero3 , Pascal Bouvry1,2
1 Interdisciplinary
Centre for Security, Reliability, and Trust (SnT), University of Luxembourg
2 Facultyof Science, Technology and Communication (FSTC), University of Luxembourg
3 Agence pour la Normalisation et l’Économie de la Connaissance (ANEC G.I.E.), Luxembourg

Abstract—The huge volume, variety, and velocity of big data in AI systems, they become vulnerable to privacy and secu-
have empowered Machine Learning (ML) techniques and Ar- rity attacks that are even significantly increased recently [7],
tificial Intelligence (AI) systems. However, the vast portion of [8]. In a recent paper [7], the impact of adversarial attacks
data used to train AI systems is sensitive information. Hence,
any vulnerability has a potentially disastrous impact on privacy against AI medical systems is described such that an image
aspects and security issues. Nevertheless, the increased demands of a benign melanocytic nevus is recognized as malignant
for high-quality AI from governments and companies require with a high confidence score. A malicious attack on a face
the utilization of big data in the systems. Several studies have recognition system can reveal individuals images which are
highlighted the threats of big data on different platforms and used to train the system [9]. By abusing a speech recognition
the countermeasures to reduce the risks caused by attacks.
In this paper, we provide an overview of the existing threats system, an adversary can produce almost the same voice,
which violate privacy aspects and security issues inflicted by big however, transcribed the phrases [10]. Other attack techniques
data as a primary driving force within the AI/ML workflow. can cause potential safety hazards by effectively fooling the
We define an adversarial model to investigate the attacks. image classification system of an autonomous vehicle [11].
Additionally, we analyze and summarize the defense strategies IoT devices as one of the major sources of big data have
and countermeasures of these attacks. Furthermore, due to the
impact of AI systems in the market and the vast majority caused new adversarial opportunities against privacy and data
of business sectors, we also investigate Standards Developing protection of AI systems which are addressed in our previous
Organizations (SDOs) that are actively involved in providing work [12].
guidelines to protect the privacy and ensure the security of big The demand for AI in the market and yet the vulnerability of
data and AI systems. Our far-reaching goal is to bridge the
the data in the workflow has stimulated Standards Developing
research and standardization frame to increase the consistency
and efficiency of AI systems developments guaranteeing customer Organizations (SDOs) to set up Subcommittees (SCs) and
satisfaction while transferring a high degree of trustworthiness. initiate projects [13], [14] with the mandate of providing
standards and guidelines for big data and AI in order to help
I. I NTRODUCTION business sectors and market for a secure AI adoption. The
The huge volume of data generated by various sources, from Joint Technical Committee between the International Organi-
connected devices to social media, termed as big data [1], is zation for Standardization and International Electrotechnical
a valuable asset. The availability and widespread applications Commission (ISO/IEC JTC 1) 1 is a pioneer organization that
of big data [2] significantly impacts the growth of Machine is currently involved in developing standards on big data and
Learning (ML) and Artificial Intelligence (AI) with the goals AI.
of increasing the efficiency and the accuracy of prediction and Different surveys in the literature have followed a particular
decision making and also minimizing their computational cost. perspective to tackle the privacy and security of machine
Statistics depict the interest of the world market in AI systems learning and AI systems. Bae et al. [15] have considered
that, only between 2018 and 2019, has increased by 154%, the vulnerabilities of AI systems in the white-box/black-box
reached a $14.7 billion market size and will reach almost $37 scenarios, while Liu et al. [16] focused on learning techniques
billion by 2025 [3]. Stakeholders such as governments and and classified attacks based on training/testing phases. Biggio
industry sectors are attracted to benefit from AI to acquire et al. [8] proposed a four-dimensional model based on the goal,
insights from the data for customized services depend on knowledge, capability and attacking strategy of the adversary.
customer’s needs. Additionally, in [17] the authors focused on the privacy and
The integration of AI in various domains [4] significantly security issues of big data from another perspective based on
increases concerns regarding the privacy and security of data. the three main phases of big data analysis: data preparation,
The data that actuates AI includes various sensitive informa- data processing, and data analysis. In an ongoing project by
tion, particularly individuals’ information, including: images, ISO/IEC JTC 1, the threats against the trustworthiness of AI
speech, comments and posts on social media [5], [6], financial
transactions, and health record information. Feeding such data 1 https://www.iso.org/isoiec-jtc-1.html

978-1-7281-0858-2/19/$31.00 ©2019 IEEE 5737

systems are summarized and the characteristics of each has underlying data structure [23], [24]. Hence, a small change
been reported [18]. We focus on the data violation threats in implied by an adversary can affect the results in the favour of
AI systems which are highlighted the most in the literature [8], the adversary [11].
[15], [16] and standardization [18]. B. Adversarial Model
This study provides a literature review on privacy and
We investigate privacy and security attacks of big data in
security issues of big data in AI systems. Our work departs
AI systems that are modeled based on ML techniques. Each
from previous studies by discussing this issue using standards
step in the workflow of the AI system can be the target of
and guidelines developed by SDOs. Due to the worldwide
the specific attack(s). Hence, we use four phases in the AI
importance of big data and AI in the market, we aim both
overflow to identify the attacks based on the phase that an
research and standards to emphasize the opportunities where
adversary penetrates to violate the system. The phases are
both frames can benefit from the outcomes of the other.
illustrated in Fig. 1 on the defined AI workflow system. The
The remainder of this paper is organized as follows. Sec-
first phase, Training phase, is the step where the trained data
tion II describes the concepts that are used throughout the
is fed into the ML model for the learning process. The data
paper. Section III presents an overview of existing studies
in this stage (labeled or unlabeled) is a significantly valuable
and standards from the privacy and security of big data in AI
source for the AI system that can be the aim of many attackers
systems. Section IV explains the countermeasures and defense
to violate the privacy and security [9], [25]–[27]. The next
mechanisms for the attacks described in Section III. Finally,
phase is the Model phase where the ML algorithm learns from
Section V summarizes the paper by discussing the outcomes
the trained dataset and develops a model, which is the other
and insights.
valuable intellectual property of AI systems and hence is the
II. BACKGROUND target of various attacks [9], [28], [29]. The novel data is then
fed into the trained model, named as Apply phase, where an
A. Machine Learning (ML) and Artificial Intelligence (AI) adversary can penetrate the system and modify the results in
In computer science, AI is associated with the accomplish- his favor [11], [30], [31]. Finally, the valuable outcomes of the
ments of tasks or problems by computers for which human system, determined as the Inference phase, may host attacks
intelligence is assumed to be required. AI is designed such that that disclose sensitive information [32], [32], [33].
the input is the information acquired from the environment
and takes actions to maximize success in achieving partic- Training phase Model phase
ular goals [19]. The most dominant way of achieving AI
nowadays is by Machine Learning (ML) techniques which are Trained Data Model learning
build based on the concept of “without being explicitly pro-
grammed”. In principle, ML consists of a set of algorithms and Real Data Application Result
statistical models for computer systems to efficiently perform
a particular task without relying on rule-based programming Apply phase Inference phase
or human interaction. Developing the mathematical model is
strongly dependant on the dataset, referred to as training data, Fig. 1: The workflow and different phases of AI systems
which allows the program to gradually improve through the developed based on ML algorithms.
experiences and learning process from the data for predicting,
detecting or making decisions [20]. A standard terminology of The security goals of the attacks are also investigated as the
AI and Big data is also described in a standard document [21], other feature. For this purpose, we consider the CIA triad [34],
an under development project from ISO/IEC JTC 1. as the three pillars to cover the security of a system. They are
Machine learning techniques can be classified in different summarized as follows:
ways. In an underdevelopment standard [22] a set of ML • Confidentiality ensures the protection of sensitive infor-
approaches are defined as follows: mation against misuse and unauthorized access. Hence,
1) Supervised learning, it roughly represents the privacy of a system.
• Integrity refers to the consistency and accuracy of data
2) Unsupervised learning,
3) Semi-supervised learning, through the AI system workflow against unauthorized
4) Reinforcement learning, modification. An attack may modify the system towards
5) Transfer learning. misclassification, and yet does not affect the performance
of the systems.
Several techniques exist in each approach which are used • Availability describes the system power to perform to
based on the learning purpose and dataset. Regression, for achieve the expected purpose designed for the AI system
instance is one of the well-known techniques used for pre- with reliable outputs.
diction on labeled dataset. Clustering is another fundamental
technique that is implied on unlabeled dataset for various C. Standards Developing Organization (SDO)
applications such as recommendation of new options. How- SDOs develop technical standards and guidelines to address
ever, clustering results shown to be highly influenced by the the needs and demands of particular adopters. Moreover,

5738
the standards play an important role in achieving interop- attack penetrates the AI system. Table II summarizes the at-
erability and portability of complex ICT technologies and tacks introduced in this section and lists the relevant standards
platforms. They can bring significant benefits to industry where these attacks or the elements of mitigation strategies are
and consumers. The best known SDO is an International described.
Organization for Standardization (ISO) that together with
International Electrotechnical Commission (IEC) initiated a A. Data Breach
Joint Technical Committee (JTC) for Information technology, As a common privacy incident, a data breach is the disclo-
known as ISO/IEC JTC 1. It covers several domains con- sure of confidential or sensitive data in unauthorized access.
cerning smart ICT and information technology including pri- This type of attack has a long history [46] in privacy and
vacy, data protection and security of ICT technologies mainly security challenges of any systems and is not limited to AI.
under Subcommittee, ISO/IEC JTC1/SC 27 – ”Information Nevertheless, AI has increased the quality of the insight gained
Security, Cybersecurity and Privacy Protection”, and ISO/IEC from big data and therefore, new vulnerabilities against data
JTC1/SC 42 – ”Artificial Intelligence” that is dedicated to and privacy breaches have raised by AI. The data breach may
AI that is recently created and dedicated to AI and big happen in different phases of AI workflow [47]: Training,
data. Overall, the JTC 1 has already published more than 3k model, and inference phases. Confidentiality which is roughly
standards in different domains regarding smart ICT, among an equal to privacy is the target of the adversary providing
them are 188 for SC 27, 3 for SC 42 with 13 more standards this attack.
under development for AI and big data. The other international As an early example of data breach attacks is re-
level SDO is the International Telecommunication Union’s identification where attackers used another dataset from the
Telecommunication Standardization Sector (ITU) that is fo- public electoral rolls of the city of Cambridge [35] to identify
cused on the AI in communication technologies. Furthermore, medical records. Additionally, a study on mobile phone meta-
the Institute of Electrical and Electronics Engineers (IEEE) data revealed that unique identification of 95% of individuals
as the other international leading standard body, has also from a population of 1.5 million people, requires only 4
initiated projects which mostly concern the legal and ethical approximate location and time data points [32]. Different
perspectives of AI [13]. methods were implied to mask the sensitive information of
In the European level, European Committee for Standard- individuals within the datasets [46]. Nonetheless, the evolution
ization (CEN) and European Committee for Electrotechnical of big data and computational techniques such as AI systems
Standardization (CELENEC) have recently announced [14] provided new opportunities to violate data privacy in the
the development of “Focus Group Artificial Intelligence” to process.
develop standardization road-map for AI according to the
European requirements. Moreover, European Telecommuni- B. Bias in Data
cations Standards Institute (ETSI) has also initiated projects The decisions achieved by AI systems can reinforce injus-
focused on the use cases, applications and security challenges tice and discrimination [38] in shortening candidates list for
of AI. In this paper, our main target is the joint committee of credit approval, recruitment, and criminal legal system [39].
ISO/IEC JTC 1 since it has already established a particular Even though bias is not directly recognized as the privacy and
committee and various study and working groups in AI and security issue of big data, it is entangled with data and thereby
big data related issues. can significantly impact the accuracy and accountability of
the results. Among different types of bias [40] identified
TABLE I: Identifying the phases where a particular attack in AI systems, we focus on those which are correlated to
penetrates the AI system. data: i) Sample bias describing an unbalanced representation
Attack
AI Workflow Phase of samples in training data, ii) Algorithm bias which refers
Training Model Apply Inference to the systematic errors in the system, and iii) Prejudicial
Data Breach
Bias in Data bias indicates the incorrect attitude upon an individual data.
Data Poisoning Other types such as measurement bias that results from poorly
Model extraction measuring the outcome, are out of the scope of this paper.
Evasion
Bias is not a deliberate feature of AI systems, but rather the
result of biases presented in the input data used to train the
III. P RIVACY AND S ECURITY OF B IG DATA IN AI systems [48]. Hence, it targets the training phase and violates
the integrity of an AI system.
In this section, we analyze the data privacy and security Bias can target different attributes in decisions making
attacks concerning the defined characteristics (cf. Section I). including gender, race, age, national origin. In a project by
We describe the phase where the attack is imposed, the risks MIT [25], known as Gender shade2 , the AI gender classi-
caused by the attack, and the real-world attack examples. fication systems sold by giant technology companies (e.g.,
Besides, an overview of the research papers and standards Microsoft, IBM, and Amazon) have been analyzed. The results
is conducted corresponding to each attack scenario. Table I
represents, for each attack the phase(s) where a particular 2 http://gendershades.org/

5739
TABLE II: Summary of the data privacy and security attacks in the AI workflow.
Developed / Under development
Attack Security Goal (CIA) Attack Examples
Standards
Re-identification [35] ISO/IEC CD 20547-4 [37],
Data Breach Confidentiality
Risk of inference [36] ISO/IEC PD TR 24028 [18]
Gender classification [25]
Integrity, ISO/IEC NP TR 24027 [40],
Bias in Data Face recognition [38]
Availability ISO/IEC PD TR 24028 [18]
Criminal legal system [39]
Self-driving car [27]
Availability,
Data Poisoning Sentiment anlysis [41] ISO/IEC PD TR 24028 [18]
Integrity
Social media chatbot [42]
Image recognition [28]
Model Extraction Confidentiality ISO/IEC PD TR 24028 [18]
Location data [43]
Image classification [44]
Evasion Integrity Spam emails [45] ISO/IEC PD TR 24028 [18]
Self-driving car [11]

of analysis in 2018, show a significant difference in the error and detection [56]–[58] are the other target domains of this
rate of classifying darker-skinned female (up to 34.4%) in attack.
contrast to lighter-skinned males (0.8%). Some classification
systems are considerably improved by 2019 [49] to reduce D. Model Extraction
the error rate and yet the bias is not eliminated [50]. Bias is The trained model is a valuable intellectual property in ML
also found in a criminal legal system, Correctional Offender systems due to i) the big data source that is been used to train
Management Profiling for Alternative Sanctions (COMPAS) the model, and ii) the parameters (e.g., weights, coefficients)
developed based on ML techniques to assess the sentencing which generated for the model based on its function (e.g.,
and parole of convicted criminals. The purpose of COMPAS classification) [18], [59]. The adversary’s aim from the model
was to forecast the criminals who are most likely to re- extraction might be to infer record(s) that is used to train the
offend [39]. However, the system has racial bias and tend model, thus, violates the confidentiality of the system. Based
to label black offenders almost twice higher risk than white on how sensitive the trained data is (e.g., medical record),
offenders [51]. the attack can cause a significant privacy breach by disclosing
sensitive information [29]. A reverse-engineering of ML model
C. Data Poisoning can happen by observing the input and output pairs [60] or
Data poisoning [26] is one of the most widespread attacks by sending queries and analyzing the responses [59], where
developed based on the idea of learning with polluted data. Tramer et al. prove that sending only hundreds of queries is
Its disruptive effects in industrial applications have attracted sufficient enough to clone the same system with almost 100%
experts of the standard technical committee to investigate accuracy.
on the countermeasures and defence techniques [18]. The Many ML techniques (e.g., logistic regression, linear clas-
attack happens by injecting adversarial training data during sifier, support vector machine, and neural network) [61], [62]
the learning to corrupt the model or to force a system towards are shown to be vulnerable to this type of attack [59] and yet
producing false results [52]. Therefore, the attack works in two the proposed defense mechanisms are not sufficient enough
ways: i) a common adversarial type is to alter the boundaries to protect the privacy and security of data. In a study by
of the classifier such that the model becomes useless. This way Fredrikson et al. [9], [28], the authors report that having access
the attacker aims the availability of the system. ii) the other to a face recognition model, they reproduce almost 80% of
type, however, targets the integrity of the system by generating an individual’s image from the training dataset. In a similar
a backdoor such that the attacker can abuse the system in his yet more successful attack on face recognition [63], attackers
favor. infer samples with a 100% success rate. Other examples of
In a particular study on injecting poisoned samples to a deep membership inference attack is also observed in location data
learning model, it is shown that only 50 polluted samples are disclosure [43], machine translation and video captioning [29],
enough to achieve a 90% attack success rate in the system [53] and medical diagnosis [63].
while the accuracy remains almost the same. Early examples of
data poisoning attacks are the worm signature generation [54], E. Evasion
and spam filtering [55]. In another real world scenario of Evasion is a popular common attack in which the attacker’s
classifying the street signs in the U.S., a backdoor attack aim is to evade detection by fooling the systems towards
lead to the misclassification of the stop sign as the speed misclassification [31]. It happens in the apply phase of the AI
limit sign [27]. In social media, the data poisoning attack on workflow, where the real data is implied on the trained model.
Microsoft’s chatbot, Tay, created a bot who made offensive and The well-known example of evasion attacks is the adversarial
racist statements [42]. The bot was shut down only 16 hours samples [18]. They are malicious samples that are designed
after its launch. Sentiment analysis [41], malware clustering adding a few chosen bytes to the original sample [30]. Even

5740
though adversarial samples and poisoning data might look sim- B. Bias in Data
ilar, they function differently. Considering a classifier, a data To identify different types of bias several metrics are in-
poisoning attack alters the classification boundary, however, troduced in the literature [48] including difference in means,
adversarial samples modify the input samples to be classified difference in residuals, equal opportunity, disparate impact,
in the wrong category. Hence, both lead to misclassification and normalized mutual information. Moreover, benefiting the
by targeting a different phase of AI workflow. metrics, approaches to mitigate AI bias are developed such as
Adversarial samples are popular in comprising computer optimized prepossessing, reject option classification, learning
vision. In an experiment on autonomous vehicles [11], a fair representations, and adversarial debiasing [68]. Besides,
couple of minor changes on the stop sign caused the learning a set of toolboxes are designed which are accumulated the
model to misclassify the sign with a speed limit 45 sign. Even identification metrics along with the mitigation approaches
though for a human eyes such modifications does not affect together as a framework for different ML algorithms. The
the understanding of the street sign. purpose is to diagnose and remove AI biases if exists in
IV. C OUNTERMEASURES AND P RIVACY- PRESERVING the system. The available toolboxes are Lime, FairML [69],
S OLUTIONS Google What-If and IBM Bias Assessment Toolkit [70] which
is mostly used for face detection systems.
This section describes an overview of the countermeasures
and defense mechanisms of each particular attack mentioned C. Data Poisoning
in Section III. The feasibility of data poisoning attacks on ML algorithms
such as Support Vector Machine (SVM) classifier is stud-
A. Data Breach
ied [71]. One common approach to detect the poisoned data
The data protection and privacy techniques evolved during is to identify the outlier (i.e., anomaly detection) since the
the time based on the growth of big data and the complexity injected data is expected to follow a different data distribution.
of data analysis techniques. The purpose of these mechanisms Paudice et al. [72] developed their defense model against data
is to ensure the confidentiality of data used for data analysis. poisoning based on anomaly detection. However, poisoned
Overall, the privacy-preserving techniques of big data can be samples can evade anomaly detection if the adversary knows
categorized in three classes: Anonymization, De-identification, the data distribution. Hence, advanced techniques are required
and Privacy-enhancing Techniques (PET). The privacy con- to defeat the attack. In [73], a method is proposed to per-
cerns of data is not a recent issue, started from data analysis turb the incoming input and observe the randomness of the
on medial datasets in 1998, when the researcher find out outcome. A low variance in the predicted classes represents
that the anonymization is not sufficient solely to protect data malicious samples. Nelson et al. [74] proposed a technique to
privacy [46]. The sensitive data disclosure reports [32], [35] recognize and remove the poisoned data in the training dataset
represent the deficiency of anonymization, where replacing by separating the new joined input and calculate the accuracy
clear identifier was enough solely to ensure the privacy and of the model on them.
security of the data. Hence, the second level of mechanisms
developed by k-anonymity [46] family, including l-diversity D. Model Extraction
and t-closeness [17]. These techniques are suitable to mask Juuti et al. [75] proposed a method to detect model extrac-
sensitive information such as location-based data [64] to tion attack by analyzing the distribution of consecutive API
guarantee that the identity of records is not distinguishable queries and compare it with benign behavior. One possible
in a dataset. The emergence of AI and ML techniques along defense technique against model extraction is by training
with the increased complexity of big data, the conventional multiple models using different partitions of training data to
de-identification methods become obsolete [33]. Hence, PET each model. The techniques are proposed by Papernot et al.
was developed for privacy-preserving data analysis in various known as PATE [76]. Another approach to protect the learning
domains such as e-health [65], [66]. Fig. 2 describes these model is to limit the information regarding the probability
techniques according to the evolution of privacy-preserving score of the model and degrade the success rate by misleading
techniques. the adversary [77].
The next generation of the privacy-preserving approach is
focused on the concept of sending the code to the data. The E. Evasion
OPen ALgorithms (OPAL) project [67] has combined different Adversarial samples, as the most common evasion attacks,
mechanisms such as access-control protocols, aggregation leas to misclassification only by small perturbations in the
schemes and develop a platform that allows third-parties (e.g., original inputs. Hence, a potential defense mechanism is to
researchers) to submit algorithms that will be trained on data. ensure that a small modification in the input cannot change
The privacy of individuals, however, is guaranteed while data the result significantly. Adversarial training is based on this
is being analyzed. Furthermore, Google’s DeepMind has also technique to train the model based on the adversarial samples,
developed a verifiable data audit which ensures that any however, with true labels such that it can avoid the noise [78].
interaction with health records data is recorded and accessible In a similar approach by Deepfool [44] the ideas is to compute
to mitigate the risk of foul play. the perturbations which fool the classifier and thus quantify

5741
Big data AI and ML

mizati
on tion
Anony ntiﬁca PET
De-ide

Fig. 2: A overview of the evolution of defense techniques for AI and big data analysis.

the robustness of the classifier. In another approach, the goal [7] S. G. Finlayson, J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S.
is to detect the adversarial samples from the original ones and Kohane, “Adversarial attacks on medical machine learning,” Science,
vol. 363, no. 6433, pp. 1287–1289, 2019.
therefore remove them from the dataset [79]. [8] B. Biggio, G. Fumera, and F. Roli, “Security evaluation of pattern
classifiers under attack,” IEEE transactions on knowledge and data
V. S UMMARY engineering, vol. 26, no. 4, pp. 984–996, 2013.
The huge volume, variety, and velocity of big data have [9] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that
exploit confidence information and basic countermeasures,” in Proc. of
empowered Machine Learning (ML) techniques and Artificial ACM SIGSAC Conf. on Computer and Communications Security, 2015.
Intelligence (AI) systems. As privacy and security threats [10] N. Carlini and D. Wagner, “Audio adversarial examples: Targeted attacks
evolve, so too will the technology need to adapt – as well on speech-to-text,” in 2018 IEEE Security and Privacy Workshops
(SPW), 2018.
as the rules and regulations that govern the use of such [11] K. Eykholt, I. Evtimov, E. Fernandes, B. Li, A. Rahmati, C. Xiao,
technologies. The two perspectives of the research outcomes A. Prakash, T. Kohno, and D. Song, “Robust physical-world attacks
and standards development are considered in this study. We on deep learning visual classification,” in Proc. of the IEEE Conf. on
Computer Vision and Pattern Recognition.
focus on challenges and threats of big data in the AI workflow [12] N. S. Labib, M. R. Brust, G. Danoy, and P. Bouvry, “Trustworthiness
by providing a review of the recent research literature, standard in IoT - a standards gap analysis on security, data protection and
documents, and ongoing projects on this topic. Several projects privacy,” in 2019 IEEE Conference on Standards for Communications
and Networking (CSCN’19), 2019.
are initiated by SDOs to investigate different aspects of big [13] P. Cihon, “Standards for ai governance: International standards to enable
data privacy aspects and security issues. Even though most of global coordination in ai research & development,” University of Oxford,
the standards mentioned in this study are ongoing projects, Tech. Rep., 2019.
[14] (2019) Artificial intelligence, blockchain and distributed ledger tech-
they are expected to be published in the near future. One nologies. CEN-CENELEC. [Online]. Available: https://www.cencenelec.
of the advantages standards can bring into research is a eu/standards/Topics/ArtificialIntelligence/Pages\\/default.aspx
more coherent terminology, which is defined once and used [15] H. Bae, J. Jang, D. Jung, H. Jang, H. Ha, and S. Yoon, “Security and
privacy issues in deep learning,” 2018.
later in subsequent projects. In contrast, researchers often [16] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. Leung, “A survey on
use different terminologies for the same or similar concepts. security threats and defensive techniques of machine learning: A data
Besides, according to the rapid growth of AI, developed driven view,” IEEE access, vol. 6, 2018.
[17] N. Samir Labib, C. Liu, S. Dilmaghani, M. R. Brust, G. Danoy, and
road maps in standards can provide insights according to P. Bouvry, “White paper: Data protection and privacy in smart ict-
the demands and requirements of the market. Hence, it may scientific research and technical standardization,” Tech. Rep., 2018.
provide opportunities for new research activities to address [18] “Iso/iec pd tr 24028: Information technology – artificial intelligence
(ai) – overview of trustworthiness in artificial intelligence,” International
line with market needs. Organization for Standardization, Geneva, CH, Standard.
ACKNOWLEDGEMENT [19] D. Poole, A. Mackworth, and R. Goebel, Computational intelligence: a
logical approach. New York, U.S.: Oxford University Press, 1998.
This work is partially funded by the joint research pro- [20] C. M. Bishop, Pattern recognition and machine learning. Springer
gramme University of Luxembourg/SnT–ILNAS on Digital Science, Business Media, 2006.
[21] “Iso/iec wd 22989: Artificial intelligence – concepts and terminology,”
Trust for Smart-ICT. International Organization for Standardization, Geneva, CH, Standard.
[22] “Iso/iec wd 23053: Framework for artificial intelligence (ai) systems
R EFERENCES using machine learning (ml),” International Organization for Standard-
[1] D. Laney, “3d data management: Controlling data volume, velocity and ization, Geneva, CH, Standard.
variety,” META group research note, vol. 6, no. 70, 2001. [23] S. Dilmaghani, M. R. Brust, A. Piyatumrong, G. Danoy, and P. Bouvry,
[2] “Iso/iec tr 20547-2: Information technology – big data reference ar- “Link definition ameliorating community detection in collaboration
chitecture – part 2: Use cases and derived requirements,” International networks,” Frontiers in Big Data, vol. 2, 2019.
Organization for Standardization, Geneva, CH, Standard, 2018. [24] A. M. Fiscarelli, M. R. Brust, G. Danoy, and P. Bouvry, “A memory-
[3] K. Aditya and C. Wheelock, “Artificial intelligence market forecasts,” based label propagation algorithm for community detection,” in In-
Tractia, Tech. Rep., 2016. ternational Conference on Complex Networks and their Applications.
[4] A. K. Keith Kirkpatrick, “Artificial intelligence use cases,” Tractica, Springer, 2018, pp. 171–182.
Tech. Rep., 2018. [25] J. Buolamwini and T. Gebru, “Gender shades: Intersectional accuracy
[5] J. Chen, A. R. Kiremire, M. R. Brust, and V. V. Phoha, “Modeling online disparities in commercial gender classification,” in Proc. of Conf. on
social network users’ profile attribute disclosure behavior from a game fairness, accountability and transparency, 2018.
theoretic perspective,” Computer Communications, vol. 49, 2014. [26] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-h. Lau, S. Rao,
[6] J. Chen, M. R. Brust, A. R. Kiremire, and V. V. Phoha, “Modeling N. Taft, and J. D. Tygar, “Antidote: understanding and defending against
privacy settings of an online social network from a game-theoretical poisoning of anomaly detectors,” in Proc. of the 9th ACM SIGCOMM
perspective,” in IEEE Int. Conference on Collaborative Computing: Conf. on Internet measurement, 2009.
Networking, Applications and Worksharing, Oct 2013, pp. 213–220.

5742
[27] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg, “Badnets: Evaluating [54] J. Newsome, B. Karp, and D. Song, “Paragraph: Thwarting signature
backdooring attacks on deep neural networks,” IEEE Access, 2019. learning by training maliciously,” in International Workshop on Recent
[28] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and b. r. U. S. S. y. . Advances in Intrusion Detection, 2006.
Ristenpart, Thomas”, “Privacy in pharmacogenetics: An end-to-end case [55] B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. Rubinstein, U. Saini,
study of personalized warfarin dosing.” C. A. Sutton, J. D. Tygar, and K. Xia, “Exploiting machine learning to
[29] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership subvert your spam filter.” LEET, vol. 8, 2008.
inference attacks against machine learning models,” in IEEE Symposium [56] B. Biggio, K. Rieck, D. Ariu, C. Wressnegger, I. Corona, G. Giacinto,
on Security and Privacy (SP), 2017, pp. 3–18. and F. Roli, “Poisoning behavioral malware clustering,” in Proc. of the
[30] B. Kolosnjaji, A. Demontis, B. Biggio, D. Maiorca, G. Giacinto, workshop on artificial intelligent and security, 2014.
C. Eckert, and F. Roli, “Adversarial malware binaries: Evading deep [57] S. Misra, M. Tan, M. Rezazad, M. R. Brust, and N.-M. Cheung,
learning for malware detection in executables,” in 26th European Signal “Early detection of crossfire attacks using deep learning,” arXiv preprint
Processing Conference (EUSIPCO), 2018, pp. 533–537. arXiv:1801.00235, 2017.
[31] J. Zhang and X. Jiang, “Adversarial examples: Opportunities and chal- [58] M. Rezazad, M. R. Brust, M. Akbari, P. Bouvry, and N.-M. Cheung,
lenges,” arXiv preprint arXiv:1809.04790, 2018. “Detecting target-area link-flooding ddos attacks using traffic analysis
[32] Y.-A. De Montjoye, C. A. Hidalgo, M. Verleysen, and V. D. Blondel, and supervised learning,” in Future of Information and Communication
“Unique in the crowd: The privacy bounds of human mobility,” Scientific Conference. Springer, 2018, pp. 180–202.
reports, vol. 3, 2013. [59] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, “Stealing
[33] Y.-A. de Montjoye et al., “Response to comment on “unique in the shop- machine learning models via prediction apis,” in 25th U SEN IX
ping mall: On the reidentifiability of credit card metadata”,” Science, vol. Security Symposium, 2016.
351, 2016. [60] S. J. Oh, M. Augustin, B. Schiele, and M. Fritz, “Towards
[34] J. Andress, The basics of information security: understanding the reverse-engineering black-box neural networks,” arXiv preprint
fundamentals of InfoSec in theory and practice. Syngress, 2014. arXiv:1711.01768, 2017.
[35] L. Sweeney, “Simple demographics often identify people uniquely,” [61] B. Wang and N. Z. Gong, “Stealing hyperparameters in machine
Health (San Francisco), vol. 671, 2000. learning,” in IEEE Symposium on Security and Privacy (SP), 2018.
[36] E. Jahani, P. Sundsøy, J. Bjelland, L. Bengtsson, Y.-A. de Montjoye [62] D. Lowd and C. Meek, “Adversarial learning,” in Proc. Int. Conf. of the
et al., “Improving official statistics in emerging markets using machine 11th ACM SIGKDD, ser. KDD ’05, 2005.
learning and mobile phone data,” EPJ Data Science, vol. 6, 2017. [63] J. Hayes, L. Melis, G. Danezis, and E. De Cristofaro, “Logan: Mem-
[37] “Iso/iec cd 20547-4: Information technology – big data reference archi- bership inference attacks against generative models,” 2019.
tecture – part 4: Security and privacy,” International Organization for [64] F.-J. Wu, M. R. Brust, Y.-A. Chen, and T. Luo, “The privacy exposure
Standardization, Geneva, CH, Standard. problem in mobile location-based services,” in IEEE Conf. on Global
[38] M. Wall. (2019, Jul.) Biased and wrong? facial recognition tech Communications (GLOBECOM), 2016.
in the dock. BBC. [Online]. Available: https://www.bbc.com/news/ [65] F. K. Dankar and K. El Emam, “The application of differential privacy
business-48842750 to health data,” in Proc. of the Joint EDBT/ICDT Workshops, 2012.
[39] S. X. Zhang, R. E. Roberts, and D. Farabee, “An analysis of prisoner [66] S. Dilmaghani, “A privacy-preserving solution for storage and processing
reentry and parole risk using compas and traditional criminal history of personal health records against brute-force attacks,” Master’s thesis,
measures,” Crime & Delinquency, vol. 60, 2014. University of Bilkent, Turkey, 2017.
[40] “Iso/iec np tr 24027: Information technology – artificial intelligence [67] OPAL. (2017) Open algorithms. [Online]. Available: http://www.
(ai) – bias in ai systems and ai aided decision making,” International opalproject.org/
Organization for Standardization, Geneva, CH, Standard. [68] F. Kamiran, A. Karim, and X. Zhang, “Decision theory for
[41] A. Newell, R. Potharaju, L. Xiang, and C. Nita-Rotaru, “On the discrimination-aware classification,” in IEEE 12th Inter. Conf. on Data
practicality of integrity attacks on document-level sentiment analysis,” Mining, 2012.
in Proc. of the Artificial Intelligent and Security Workshop, 2014. [69] J. A. Adebayo, “Fairml: Toolbox for diagnosing bias in predictive
[42] J. Vincent. (2016) Twitter taught microsoft’s ai chatbot to be modeling,” Ph.D. dissertation, MIT, 2016.
a racist asshole in less than a day. [Online]. Available: https: [70] R. K. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan,
//www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist P. Lohia, J. Martino, S. Mehta, A. Mojsilovic et al., “Ai fairness 360: An
[43] A. Pyrgelis, C. Troncoso, and E. D. Cristofaro, “Knock knock, who’s extensible toolkit for detecting, understanding, and mitigating unwanted
there? membership inference on aggregate location data,” CoRR, 2017. algorithmic bias,” arXiv preprint arXiv:1810.01943, 2018.
[44] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “Deepfool: a simple [71] B. Biggio, I. Corona, B. Nelson, B. I. Rubinstein, D. Maiorca,
and accurate method to fool deep neural networks,” in Proc. of the IEEE G. Fumera, G. Giacinto, and F. Roli, “Security evaluation of support
Conf. on computer vision and pattern recognition, 2016. vector machines in adversarial environments,” in Support Vector Ma-
[45] B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of chines Applications. Springer, 2014.
adversarial machine learning,” Pattern Recognition, vol. 84, 2018. [72] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, “Detection
[46] P. Samarati and L. Sweeney, “Protecting privacy when disclosing in- of adversarial training examples in poisoning attacks through anomaly
formation: k-anonymity and its enforcement through generalization and detection,” arXiv preprint arXiv:1802.03041, 2018.
suppression,” Technical Report, SRI International, Tech. Rep., 1998. [73] Y. Gao, C. Xu, D. Wang, S. Chen, D. C. Ranasinghe, and S. Nepal,
[47] M. Al-Rubaie and J. M. Chang, “Privacy-preserving machine learning: “Strip: A defence against trojan attacks on deep neural networks,” arXiv
Threats and solutions,” IEEE Security & Privacy, vol. 17, 2019. preprint arXiv:1902.06531, 2019.
[48] J. H. Hinnefeld, P. Cooman, N. Mammo, and R. Deese, “Evaluat- [74] B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. Rubinstein, U. Saini,
ing fairness metrics in the presence of dataset bias,” arXiv preprint C. Sutton, J. Tygar, and K. Xia, “Misleading learners: Co-opting your
arXiv:1809.09245, 2018. spam filter,” in Machine learning in cyber trust. Springer, 2009.
[49] I. D. Raji and J. Buolamwini, “Actionable auditing: Investigating the [75] M. Juuti, S. Szyller, S. Marchal, and N. Asokan, “Prada: protecting
impact of publicly naming biased performance results of commercial ai against dnn model stealing attacks,” in IEEE European Symposium on
products,” in AAAI/ACM Conf. on AI Ethics and Society, 2019. Security and Privacy (EuroS&P), 2019.
[50] A. Amini, A. Soleimany, W. Schwarting, S. Bhatia, and D. Rus, [76] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and
“Uncovering and mitigating algorithmic bias through learned latent Ú. Erlingsson, “Scalable private learning with pate,” arXiv preprint
structure,” 2019. arXiv:1802.08908, 2018.
[51] J. Larson, S. Mattu, L. Kirchner, and J. Angwin, “How we analyzed the [77] T. Lee, B. Edwards, I. Molloy, and D. Su, “Defending against machine
compas recidivism algorithm,” ProPublica, 2016. learning model stealing attacks using deceptive perturbations,” arXiv
[52] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for preprint arXiv:1806.00054, 2018.
data poisoning attacks,” in Advances in neural information processing [78] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards
systems, 2017. deep learning models resistant to adversarial attacks,” arXiv preprint
[53] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor arXiv:1706.06083, 2017.
attacks on deep learning systems using data poisoning,” arXiv preprint [79] D. Meng and H. Chen, “Magnet: a two-pronged defense against adver-
arXiv:1712.05526, 2017. sarial examples,” in Proc. of the ACM SIGSAC Conf. on Computer and
Communications Security, 2017.