New Attacks on Dataset, Model and Input. a Threat Model for AI
New Attacks on Dataset, Model and Input. a Threat Model for AI
Abstract: Machine learning (ML) and artificial intelligence (AI) techniques have now become commonplace in software
arXiv:2401.07960v1 [cs.CR] 15 Jan 2024
products and services. When threat modelling a system, it is therefore important that we consider threats
unique to ML and AI techniques, in addition to threats to our software. In this paper, we present a threat
model that can be used to systematically uncover threats to AI based software. The threat model consists of
two main parts, a model of the software development process for AI based software and an attack taxonomy
that has been developed using attacks found in adversarial AI research. We apply the threat model to two real
life AI based software and discuss the process and the threats found.
Data Input
DEPLOYMENT PHASE
Classification , Model
Classification
Yes 8 9 Prediction or 10 Evaluation
or Prediction
Is the Model Decision Outcome
* Adequate? Software Decision-
Model During
Evaluation Deployment
Deployment Making
No During
Procedure
Deployment
Text
*
Yes
Deployment Deployment
Environment Environment
Figure 1: Software development process for AI based software. Circles represent processes, arrows represent inputs and
outputs, diamonds represent decisions and ‘*’ means that the arrow can point to any previous process
inputs into the requirement engineering process: sys- errors, inconsistencies, omissions and duplications.
tem/domain information, stakeholder/organisational Data cleaning is a fundamental part of this process,
requirements, and regulations. Such requirements are and is often used in combination with data collection
often gathered via several different methods, such and curation (Symeonidis et al., 2022). Data prepara-
as traditional (e.g., interviews), modern (e.g., pro- tion sometimes involves other techniques such as data
totyping), cognitive (e.g., card sorting), group (e.g., transformation and is a vital step in the data process-
brainstorming) or contextual (e.g., ethnography) cat- ing phase (Kreuzberger et al., 2023).
egories. In general, the outputs of the requirement Feature Engineering & Labelling: Features are
engineering process are the agreed requirements and elements used to describe specific objects in data. The
the specifications of the systems and model being de- process of feature engineering involves creating fea-
veloped. Other, more specific details may be included tures for a dataset so it can be understood and used by
as well, such as the plans for acquiring the data, the an algorithm (Dong and Liu, 2018). The Feature En-
amount of data needed and how accurate the model gineering & Labelling process in our diagram may ad-
needs to be. ditionally encompass related techniques of feature ex-
Data Preparation: Once the specification of the traction, feature construction, feature storage and fea-
software and the requirements of the needed datasets ture encoding. There may be algorithms that do not
have been identified, work on collecting and cleaning have a feature engineering part to them. Our model,
data is usually started. (Roh et al., 2019) have di- however, is created to be exhaustive so that it covers
vided the various methods of raw data collection into most possibilities. As will be shown later, when this
three categories, discovery, augmentation and gener- diagram is used, processes that aren’t applicable to a
ation. The raw dataset thus collected can be in var- given scenario, can be removed.
ious forms, such as, audio, video, text, time-series Labelling is a related idea, often used in super-
or a combination of such formats. It may also have vised or semi-supervised learning and involves as-
errors, inconsistencies, omissions and duplications. signing a category or tag to a given piece of data
Data cleaning involves finding and resolving these (Grimmeisen et al., 2022), to help the model learn or
distinguish between objects. Model Evaluation after Development: At this
stage the model is evaluated once again. This process
3.2 Model Development takes two inputs, the optimised trained model pro-
duced after tuning and a testing dataset. The testing
The main objective of this phase is to train a model dataset is used to assess the performance of the final
and evaluate its performance. We divide the work optimised trained model. If the outcome of the eval-
undertaken in this phase into four processes, Model uation is adequate, the deployment phase is executed.
Training, Model Evaluation during Development, Hy- Otherwise, depending on the situation, the model may
perparameter Tuning and Model Evaluation after De- need to be retrained from the very beginning, or use
velopment. different training data, features, or labels.
Model Training: The refined training dataset, and
features or labels produced from the preceding pro- 3.3 Deployment
cess are used as inputs to the Model Training pro-
cess where an algorithm is trained on the data pro- In this phase the model is deployed as part of a
vided. Another input to this process is an algorithm software product or service in environments such as
or model that is to be trained. Depending on the spe- cloud, server, desktop or mobile device. The work un-
cific details of the AI model, the used algorithms will dertaken in this phase is divided into three processes,
differ. Examples of some algorithms that can be used Software Deployment, Decision Making and Model
include neural networks, ensemble learning and other Evaluation during Deployment.
supervised, semi-supervised or unsupervised learning Software Deployment: This process involves the
methods. Model training is the most critical process fully developed AI based software being deployed in
in the development of AI based software and outputs different environments. The input into this phase is
a trained model to make classifications or predictions. the data the software uses. This data is used by the
Model Evaluation during Development: In this software to output a classification or prediction, de-
process, the trained model from the preceding process pending on the problem that is being solved.
is used as an input along with a validation dataset. Decision Making: While in some cases the clas-
The validation dataset is used on the trained model sification or prediction may be the desired end-goal,
to measure model performance. This dataset can be in other cases the classification or prediction output
generated via several different methods. One method may be fed into a process, which produces a decision
is to split the training dataset into three subsets: the based on the input.
training dataset, validation dataset and testing dataset. Model Evaluation during Deployment: To en-
Other methods include k-fold cross validation, which sure that the model does not drift overtime and is
involves splitting the dataset into ‘k’ subsets. In some fit for purpose, constant, iterative evaluation or mon-
cases, multiple methods may be used. itoring of a model during deployment is sometimes
Hyperparameter Tuning: If the outcome of necessary. The Model Evaluation during Deployment
model evaluation during development is not adequate process encapsulates this thinking. If the evaluation
or the developers want to improve model perfor- outcome is adequate, the deployment phase is contin-
mance, the process of hyperparameter tuning may ued. If the evaluation outcome is not adequate, the
occur. Some examples of hyperparameters that are model may be retrained from the start, or use differ-
tuned are, the learning rate, neural network architec- ent training data, features, or labels. This evaluation
ture or the size of the neural network used (Feurer and is usually done periodically and not necessarily after
Hutter, 2019). Alternatively, developers may also go each run during deployment.
back to the data cleaning or the feature engineering &
labelling process or change the algorithm used to cre-
ate the model. In Figure 1 this is shown by a ‘*’. This 4 AI THREAT ENUMERATION
process occurs iteratively until the model is deemed
satisfactory by the developers. The second part of threat modelling is threat enumer-
Various different types of tuning methods ex-
ation. To understand the threats to AI, we explored
ist, each with their own advantages and disadvan-
extensive research literature in adversarial AI. Our lit-
tages. Examples include random search, grid search,
erature review has led to the creation of a taxonomy
or Bayesian optimisation. Meta-heuristic algorithms
of threats to AI shown in Figure 2. In our taxonomy,
such as particle swarm optimisation and genetic algo-
all possible threats to AI based software are divided
rithms are other popular tuning methods used as well
into three main categories, attacks on data, attacks on
(Yang and Shami, 2020).
model and attacks on inputs, from which we derive
our acronym ADMIn. model poisoning or logic corruption attacks, policy
exfiltration attacks and model extraction attacks.
4.1 Attacks on Data Model Poisoning or Logic Corruption Attacks:
In these attacks the adversary attempts to maliciously
In these type of attacks, the adversary’s focus is on modify or alter the logic, algorithm, code, gradients,
data. The adversary either attempts to steal propri- rules or procedures of the software. This can result
etary data through the algorithm or tries to poison or in reduction of performance and accuracy, as well as
maliciously modify internal data and/or systems. This causing the model to carry out malicious actions (Os-
category is further split into two types of attacks, data eni et al., 2021)(Benmalek et al., 2022)(Wang et al.,
exfiltration attacks and data poisoning attacks. 2022). Such attacks can be hard to defend against but
Data Exfiltration Attacks: In these attacks, the usually require that the adversary has full access to
adversary attempts to steal private information from the algorithms used in the model. This makes these
the target model’s dataset. This can take place in three attacks less likely to occur.
different ways. First, through property exfiltration at- Policy Exfiltration Attacks: In policy exfiltration
tacks, where the attack consists of an adversary steal- attacks the adversary attempts to learn the policy that
ing data properties from the training dataset. Sec- the model is enforcing by repeatedly querying it. The
ond, through dataset theft attacks, where the attacks repeated querying may make evident the input/output
involve the theft of the entire dataset. Finally, exfil- relationship and may result into the adversary learn-
tration can be achieved through datapoint verification ing the policy or rules being implemented.
attacks. In these attacks, an adversary attempts to de- Model Extraction Attacks: Also known as
termine if a specific datapoint is in the model’s train- model stealing, the adversary in these types of attacks
ing dataset via interactions with the model. steals the model to reconstruct or reverse engineer it
Data Poisoning Attacks: In these attacks, the ad- (Hu et al., 2022).This is usually done by decipher-
versary deliberately attempts to corrupt the datasets ing information such as parameters or hyperparame-
used by the AI based software. The adversary may ters. These attacks require the inputs to the model be
poison the dataset via adding new data, modifying ex- known to the adversary whereby unknown parameters
isting data (e.g., content, labels, or features), or delet- can be computed using information from a model’s
ing data in the model’s training or validation dataset, inputs and its outputs (Chakraborty et al., 2021).
with the aim of diminishing the model’s performance.
An attack consisting of addition of new datapoints 4.3 Attacks on Inputs
into the training data is performed with the intention
of adding biases to the model, so it mis-classifies in- In these type of attacks, the adversary uses malicious
puts. (Oseni et al., 2021)(Liu et al., 2022). Poisoning content as the input into a ML model during deploy-
of datasets may take place through the environment ment. This category is further split into four types of
or through the inputs to the model. Such attacks may attacks, prompt injection attacks, denial of service at-
either be targeted or untargeted. In a targeted attack tacks, evasion attacks and man-in-the-middle attacks.
an adversary may attempt to, for example have a mal- Prompt Injection Attacks: Prompt injection at-
ware classification model mis-classify the malware as tacks are a relatively new but well-known type of at-
benign. In an untargeted attack, the adversary on the tack. It consists of an adversary trying to manipulate
other hand is looking to make the model mis-classify a (natural language processing) system via prompts to
the malware as anything but the actual classification. gain unauthorized privileges, such as bypassing con-
Attacks where the adversary is looking to modify or tent filters (Greshake et al., 2023). The ChatGPT ser-
delete existing data, can be comparatively harder to vice for example responds to text prompts and may
mount as such attacks require the knowledge of and contain text filters for commercial sensitivity, privacy
access to the training data. Such access and knowl- and other reasons. However, crafting prompts in cer-
edge however can be gained by exploiting software tain ways may allow users to bypass these filters in
vulnerabilities in the systems surrounding the dataset. what is known as a prompt injection attack. Prompt
injection attacks can be harder to defend against com-
4.2 Attacks on Model pared to other well known injection attacks such as
SQL or command injection because the data input as
In these type of attacks, the adversary’s focus is on the well as the control input, both consist of natural lan-
model being used. The adversary either attempts to guage in textual prompts.
steal the proprietary model or tries to modify it. This Denial of Service (DoS) Attacks: A DoS attack
category is further split into three types of attacks, consists of an adversary disrupting the availability of
Figure 2: Taxonomy of threats to AI