0% found this document useful (0 votes)
38 views20 pages

Safenet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning

This document discusses how ensembling techniques can be used within secure multiparty computation (MPC) frameworks to provide defenses against data poisoning attacks and privacy attacks during private collaborative machine learning. The approach, called SafeNet, trains multiple local models from each data owner's dataset and aggregates them into an ensemble within the MPC. This limits any single attacker's influence and provably improves accuracy and privacy compared to existing approaches.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views20 pages

Safenet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning

This document discusses how ensembling techniques can be used within secure multiparty computation (MPC) frameworks to provide defenses against data poisoning attacks and privacy attacks during private collaborative machine learning. The approach, called SafeNet, trains multiple local models from each data owner's dataset and aggregates them into an ensemble within the MPC. This limits any single attacker's influence and provably improves accuracy and privacy compared to existing approaches.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

SafeNet: The Unreasonable Effectiveness of

Ensembles in Private Collaborative Learning


Harsh Chaudhari∗ , Matthew Jagielski† , Alina Oprea∗
∗ Northeastern University, † Google Research

Abstract—Secure multiparty computation (MPC) has been training have not been influenced by an adversary. However,
proposed to allow multiple mutually distrustful data owners to research in adversarial machine learning has shown that data
jointly train machine learning (ML) models on their combined poisoning attacks pose a high risk to the integrity of trained
data. However, by design, MPC protocols faithfully compute the
training functionality, which the adversarial ML community has ML models [10], [40], [44], [49]. Data poisoning becomes a
shown to leak private information and can be tampered with in particularly relevant threat in PPML systems, as multiple data
poisoning attacks. In this work, we argue that model ensembles, owners contribute secret shares of their datasets for jointly
implemented in our framework called SafeNet, are a highly MPC- training a ML model inside the MPC, and poisoned samples
amenable way to avoid many adversarial ML attacks. The natural cannot be easily detected. Furthermore, the guarantees of MPC
partitioning of data amongst owners in MPC training allows this
approach to be highly scalable at training time, provide provable provide privacy against an adversary observing the commu-
protection from poisoning attacks, and provably defense against a nication in the protocol, but does not protect against any
number of privacy attacks. We demonstrate SafeNet’s efficiency, sensitive information leaked by the model about its training
accuracy, and resilience to poisoning on several machine learning set. Many privacy attacks are known to allow inference on
datasets and models trained in end-to-end and transfer learning machine learning models’ training sets, and protecting against
scenarios. For instance, SafeNet reduces backdoor attack success
significantly, while achieving 39× faster training and 36× less these attacks is an active area of research.
communication than the four-party MPC framework of Dalskov In this paper, we study the impact of these adversarial
et al. [28]. Our experiments show that ensembling retains these machine learning threats on standard MPC frameworks for
benefits even in many non-iid settings. The simplicity, cheap private ML training. Our first observation is that the security
setup, and robustness properties of ensembling make it a strong definition of MPC for private ML training does not account
first choice for training ML models privately in MPC.
for data owners with poisoned data. Therefore, we extend the
security definition by considering an adversary who can poison
I. I NTRODUCTION
the datasets of a subset of owners, while at the same time
Machine learning (ML) has been successful in a broad controlling a subset of the servers in the MPC protocol. Under
range of application areas such as medicine, finance, and our threat model, we empirically demonstrate that poisoning
recommendation systems. Consequently, technology compa- attacks are a significant threat to the setting of private ML
nies such as Amazon, Google, Microsoft, and IBM provide training. We show the impact of backdoor [23], [44] and
machine learning as a service (MLaaS) for ML training and targeted [40], [54] poisoning attacks on four MPC frameworks
prediction. In these services, data owners outsource their and five datasets, using logistic regression and neural networks
ML computations to a set of more computationally powerful models. We show that with control of just a single owner
servers. However, in many instances, the client data used for and its dataset (out of a set of 20 owners contributing data
ML training or classification is sensitive and may be subject for training), the adversary achieves 100% success rate for
to privacy requirements. Regulations such as GDPR, HIPAA a backdoor attack, and higher than 83% success rate for
and PCR, data sovereignty issues, and user privacy concern are a targeted attack. These attacks are stealthy and cannot be
common reasons preventing organizations from collecting user detected by simply monitoring standard ML accuracy metrics.
data and training more accurate ML models. These privacy To mitigate these attacks, we apply ensembling technique
requirements have led to the design of privacy-preserving from ML, implemented in our framework called SafeNet,
ML training methods, including the use of secure multiparty which, in the collaborative learning setting we consider, is
computation (MPC). an effective defense against poisoning attacks, while also
Recent literature in the area of MPC for ML proposes simultaneously preventing various types of privacy attacks.
privacy-preserving machine learning (PPML) frameworks [?], Rather than attempting to implement an existing poisoning
[1], [28], [29], [67], [69], [71], [87], [88], [90] for training defense in MPC, we observe that the structure of the MPC
and inference of various machine learning models such as threat model permits a more general and efficient solution.
logistic regression, neural networks, and random forests. In Our main insight is to require individual data owners to train
these models, data owners outsource shares of their data to ML models locally, based on their own datasets, and secret
a set of servers and the servers run MPC protocols for ML share the resulting ensemble of models in the MPC. We filter
training and prediction. An implicit assumption for security is out local models with low accuracy on a validation dataset, and
that the underlying datasets provided by data owners during use the remaining models to make predictions using a majority
voting protocol performed inside the MPC. While this permits we compare SafeNet with state-of-the-art defenses against
stronger model poisoning attacks, the natural partitioning of poisoning in federated learning [16] and show its enhanced
the MPC setting prevents an adversary from poisoning more certified robustness even under non-iid data distributions.
than a fixed subset of the models, resulting in a limited number
of poisoned models in the ensemble. We perform a detailed II. BACKGROUND AND R ELATED W ORK
analysis of the robustness properties of SafeNet, and provide We provide background on secure multi-party computation
lower bounds on the ensemble’s accuracy based on the error and poisoning attacks in ML, and discuss related work in the
rate on the local models in the ensemble and the number area of adversarial ML and MPC.
of poisoned models, as well as a prediction certification
procedure for arbitrary inputs. The bounded contribution of A. Secure Multi-Party Computation
each local model also gives a provable privacy guarantee for Secure Multi-Party Computation (MPC) [7], [31], [41], [47],
SafeNet. Furthermore, we show empirically that SafeNet suc- [93] allows a set of n mutually distrusting parties to compute
cessfully mitigates backdoor and targeted poisoning attacks, a joint function f , so that collusion of any t parties cannot
while retaining high accuracy on the ML prediction tasks. In modify the output of computation (correctness) or learn any
addition, our approach is efficient, as ML model training is information beyond what is revealed by the output (privacy).
performed locally by each data owner, and only the ensemble The area of MPC can be categorized into honest majority [4],
filtering and prediction protocols are performed in the MPC. [7], [13], [20], [70] and dishonest majority [30], [31], [41],
This provides large performance improvements in ML training [68], [93]. The settings of two-party computation (2PC) [61],
compared to existing PPML frameworks, while simultaneously [62], [74], [93], three parties (3PC) [3], [4], [70], and four
mitigating poisoning attacks. For instance, for one neural parties (4PC) [21], [28], [43], [48] have been widely studied
network model, SafeNet performs training 39× faster than the as they provide efficient protocols. Additionally, recent works
[28] PPML protocol and requires 36× less communication. in the area of privacy preserving ML propose training and
Finally, we investigate settings with diverse data distributions prediction frameworks [1], [58], [69], [71], [77], [78], [87],
among owners, and evaluate the accuracy and robustness of [88] built on top of the above MPC settings. Particularly, most
SafeNet under multiple data imbalance conditions. of the frameworks are deployed in the outsourced computation
To summarize, our contributions are as follows: setting where the data is secret-shared to a set of servers which
Adversarial ML-aware Threat Model for Private Machine perform training and prediction using MPC.
Learning. We extend the MPC security definition for private B. Data Poisoning Attacks
machine learning to encompass the threat of data poisoning
attacks and privacy attacks. In our threat model, the adversary In a data poisoning attack, an adversary controls a subset
can poisoned a subset t out of m data owners, and control of the training dataset, and uses this to influence the model
T out of N servers participating in the MPC. The attacker trained on that training set. In a backdoor attack [23], [44],
might also seek to learn sensitive information about the local [73], an adversary seeks to add a “trigger” or backdoor
datasets through the trained model. pattern into the model. The trigger is a perturbation in feature
space, which is applied to poisoned samples in training to
SafeNet Ensemble Design. We propose SafeNet, which induce misclassification on backdoored samples at testing. In
adapts ensembling technique from ML to the collaborative a targeted attack [54], [55], [82], the adversary’s goal is to
MPC setting by having data owners train models locally and change the classifier prediction for a small number of specific
aggregation of predictions is performed securely inside the test samples. Backdoor and targeted attacks can be difficult to
MPC. We show that this procedure gives provable privacy and detect, due to the subtle impact they have on the ML model.
security guarantees, which improves as models become more
accurate. We also propose various novel extensions to this C. Related Work
ensembling strategy which make SafeNet applicable to a wider While both MPC and adversarial machine learning have
range of training settings (including transfer learning and been the topic of fervent research, work connecting them is
accommodating computationally restricted owners). SafeNet’s still nascent. We are only aware of several recent research
design is agnostic to the underlying MPC framework and papers that attempt to bridge these areas. Recent works [18],
we show it can be instantiated over four different MPC [59] show that MPC algorithms applied at test time can be
frameworks, supporting two, three and four servers. compromised by malicious users, allowing for efficient model
Comprehensive Evaluation. We show the impact of existing extraction attacks. Second, Escudero et al. [36] show that
backdoor and targeted poisoning attacks on several existing running a semi-honest MPC protocol with malicious parties
PPML systems [4], [28], [32] and five datasets, using logistic can result in backdoor attacks in the resulting SVM model.
regression and neural network models. We also empirically Both these works, as well as our own, demonstrate the diffi-
demonstrate the resilience of SafeNet against these attacks, culty of aligning the guarantees of MPC with the additional
for an adversary compromising up to 9 out of 20 data owners. desiderata of adversarial machine learning. We demonstrate
We report the gains in training time and communication cost the effectiveness of data poisoning attacks in MPC for neural
for SafeNet compared to existing PPML frameworks. Finally, networks and logistic regression models, and propose a novel

2
SOC Paradigm ST ⊂ S of servers of size at most T < N . The exact values
C1 of N and T are dependent on the MPC protocols used for
... SN S1
Ct
training the ML model privately. We experiment with two-
S S2 party, three-party, and four-party protocols with one corrupt
N −1
Ct+1 server. MPC defines two main adversaries: i) Semi-honest: Ad-
... versary follows a given protocol, but tries to derive additional
Cm ST +1 ST
information from the messages received from other parties
during the protocol; ii) Malicious: Adversary has the ability
Poisoned Corrupted Honest to arbitrarily deviate during the execution of the protocol.
Fig. 1: Threat model considered in our setting. The adversary Security Definition. MPC security is defined using the real
Apsoc can poison at most t out of m data owners and corrupt world - ideal world paradigm [14]. In the real world, parties
at most T out of N servers participating in the MPC compu- participating in the MPC interact during the execution of a pro-
tation. Ci and Sj denote the ith data owner and j th server. tocol π in presence of an adversary A. Let REAL[Z, A, π, λ]
denote the output of the environment Z when interacting with
ensemble training algorithm in SafeNet to defend against A and the honest parties, who execute π on security parameter
poisoning attacks in MPC. λ. Effectively, REAL is a function of the inputs/outputs and
Model ensembles have been proposed as a defense for messages sent/received during the protocol. In the ideal world,
ML poisoning and privacy attacks in prior work in both the parties simply forward their inputs to a trusted functionality
the centralized training setting [9], [50] and the collaborative F and forward the functionality’s response to the environment.
learning setting. Compared to centralized approaches, which Let IDEAL[Z, S, F, λ] denote the output of the environment
process a single dataset, we are able to leverage the trust Z when interacting with adversary S and honest parties who
model of MPC, which limits the number of poisoned models run the protocol in presence of F with security parameter λ.
in the ensemble and can provide stronger robustness and The security definition states that the views of the adversary
privacy guarantees. Ensembles have also been proposed in in the real and ideal world are indistinguishable:
MPC to protect data privacy [24] and in federated learning Definition 1. A protocol π securely realizes functionality F if
to provide poisoning robustness [16]. Our work provides a for all environments Z and any adversary of type Asoc , which
stronger privacy analysis, protecting from a broader range of corrupts a subset ST of servers of size at most T < N in the
threats than [24], and additionally offers robustness guarantees. real world, then there exists a simulator S attacking the ideal
We provide a more detailed comparison with these approaches world, such that IDEAL[Z, S, F, λ] ≈ REAL[Z, Asoc , π, λ].
in Section III-F.
Poisoning Adversary. Existing threat models for training ML
III. S AFE N ET: U SING E NSEMBLES IN MPC
models privately assume that the local datasets contributed
We describe here our threat model and show how to imple- towards training are not under the control of the adversary.
ment ensembles in MPC. We then show that ensembling gives However, data poisoning attacks have been shown to be a
us provable robustness to poisoning and privacy adversaries. real threat when ML models are trained on crowdsourced
data or data coming from untrusted sources [10], [49], [72].
A. Threat Model
Data poisoning becomes a particularly relevant risk in PPML
Setup. We consider a set of m data owners C = ∪m k=1 Ck systems, in which data owners contribute their own datasets
who wish to train a joint machine learning model M on for training a joint ML model. Additionally, the datasets are
their combined dataset D = ∪m k=1 Dk . We adopt the Secure secret shared among the servers participating in the MPC, and
Outsourced Computation (SOC) paradigm [1], [13], [28], [29], potential poisoned samples (such as backdoored data) cannot
[69], [71], [78], [87], [88] for training model M privately, be easily detected by the servers running the MPC protocol.
where the owners secret-share their respective datasets to a set To account for such attacks, we define a poisoning adversary
of outsourced servers, who execute the MPC protocols to train Ap that can poison a subset of local datasets of size at most
M. The final output is a trained model in secret-shared format t < m. Data owners with poisoned data are called poisoned
among the servers. A single training/testing sample is ex- owners, and we assume that the adversary can coordinate with
pressed as (xi , yi ), where xi is the input feature vector and yi the poisoned owners to achieve a certain adversarial goal.
is its corresponding true label or class. We use Dk = (Xk , yk ) For example, the adversary can mount a backdoor attack, by
to denote dataset of data owner Ck participating in the training selecting a backdoor pattern and poison the datasets under its
process. Matrix Xk denotes a feature matrix where the number control with the particular backdoor pattern.
of rows represent the total training samples possessed by Ck Poisoning Robustness: We consider an ML model to be
and yk denotes the corresponding vector of true labels. robust against a poisoning adversary Ap , who poisons the
Adversary in the SOC. Given a set S = {S1 , . . . , SN } of datasets of t out of m owners, if it generates correct class
servers, we define an adversary Asoc , similar to prior work [1], predictions on new samples with high probability. We provide
[28], [69], [71], [78], [88]. Asoc can statically corrupt a subset bounds on the level of poisoning tolerated by our designed

3
framework to ensure robustness. B. SafeNet Overview
Our Adversary. We now define a new adversary Apsoc
for our Given our threat model in Figure 1, existing PPML frame-
threat model (Figure 1) that corrupts servers in the MPC and works provide security against an Asoc adversary, but they are
poisons the owners’ datasets: not designed to handle an Apsoc adversary. We show experimen-
– Apsoc plays the role of Ap and poisons t out of m data tally in Section IV that PPML frameworks for private training
owners that secret share their training data to the servers. are susceptible to data poisoning attacks. While it would be
– Apsoc plays the role of Asoc and corrupts T out N servers possible to remedy this by implementing specific poisoning de-
taking part in the MPC computation. fenses (see Section V-C for a discussion of these approaches),
we instead show that it is possible to take advantage of
Note that the poisoned owners that Apsoc controls do not the bounded poisoning capability of Apsoc to design a more
interfere in the execution of the MPC protocols after secret- general and efficient defense. Intuitively, existing approaches
sharing their data and also do not influence the honest owners. train a single model on all local datasets combined, causing
Functionality FpTrain . Based on our newly introduced threat the model’s training set to have a large fraction of poisoned
model, we construct a new functionality FpTrain in Figure 2 to data (t/m), which is difficult to defend against. Instead, we
accommodate poisoned data. design SafeNet, a new protocol which uses ensemble models
to realize our threat model and provide security against Apsoc .
Functionality FpTrain In addition to successfully mitigating data poisoning attacks,
SafeNet provides more efficient training than existing PPML
Input: FpTrain receives secret-shares of Di and ai from each and comparable prediction accuracy.
owner Ci , where Di is a dataset and ai an auxiliary input. Figure 3 provides an overview of the training and inference
Computation: On receiving inputs from the owners, FpTrain phases of SafeNet. SafeNet trains an ensemble E of multiple
computes O = f (D1 , ..., Dm , a1 , . . . , am ), where f and O models in protocol Πtrain , where each model Mk ∈ E
denotes the training algorithm and the output of the algorithm is trained locally by the data owner Ck on their dataset.
respectively. This partitioning prevents poisoned data from contributing
Output: FpTrain constructs secret-shares of O and sends the
to more than t local models. Each data owner samples a
appropriate shares to the servers. local validation dataset and trains the local model Mk on
the remaining data. The local models and validation datasets
Fig. 2: Ideal Functionality for ML training with data poisoning
are secret shared to the outsourced servers. We note that this
permits arbitrarily corrupted models, and poisoned validation
Security against Apsoc . A training protocol Πtrain is secure datasets, but SafeNet’s structure still allows it to tolerate
against adversary Apsoc if: (1) Πtrain securely realizes function- these corruptions. In the protocol running inside the MPC,
ality FpTrain based on Definition 1; and (2) the model trained the servers jointly implement a filtering stage for identifying
inside the MPC provides poisoning robustness against data models with low accuracy on the combined validation data
poisoning attacks. (below a threshold ϕ) and excluding them from the ensemble.
Intuitively, the security definition ensures that Apsoc learns The output of training is a secret share of each model in the
no information about the honest owners’ inputs when T out trained ensemble E.
of N servers are controlled by the adversary, while the trained In the inference phase, SafeNet implements protocol Πpred ,
model provides poisoning robustness against a subset of t out to compute the prediction yk of each shared model Mk on test
of m poisoned owners. input x inside the MPC. The servers jointly perform majority
Extension to Privacy Adversary. While MPC guarantees no voting to determine the most common predicted class y on
privacy leakage during the execution of the protocol, it makes input x, using only the models which pass the filtering stage.
no promises about privacy leakage that arises by observing the An optional feature of SafeNet is to add noise to the majority
output of the protocol. This has motivated a combination of vote to enable user-level differential privacy protection, in
differential privacy guarantees with MPC algorithms, to pro- addition to poisoning robustness.
tect against privacy leakage for both the intermediate execution Our SafeNet protocol leverages our threat model, which
as well as the output of the protocol. For this reason, we also assumes that only a set of at most t out of m data owners
consider adversaries seeking to learn information about data are poisoned. This ensures that an adversary only influences a
owners’ local datasets by observing the output of the model, limited set of models in the ensemble, while existing training
as done in membership inference [17], [81], [94] and property protocols would train a single poisoned global model. We
inference attacks [39], [83], [97]. Recent works have used data provide bounds for the exact number of poisoned owners t
poisoning as a tool to further increase privacy leakage [19], supported by our ensemble in Theorem 6. Interestingly, the
[65], [85] of the trained models. Consequently, we can extend bound depends on the number of data owners m, and the max-
our threat model to accommodate a stronger version of Apsoc imum error made by a clean model in the ensemble. The same
that is also capable of performing privacy attacks by observing theorem also lower bounds the probability that the ensemble
the output of the trained model. predicts correctly under data poisoning performed by the t

4
SafeNet Testing Phase
SafeNet Training Phase
Π!"#$%
Client Query 𝑥
Π&"'(
𝐷"#$ ←∪! 𝐷!" s

Global Validation Dataset


Ensemble of Models
𝐷!" ← SelectRandom(𝐷! )
AccVal! ← Accuracy(𝑀! , 𝐷"#$ )
Secret shared
Local Data Local Validation Dataset models 𝑀! Model validation accuracy
𝑦! ← 𝑀! (𝑥)
𝐷!
Secret shared Filter model 𝑀! if AccVal! < 𝜙 Model predictions
validation
𝑀! ← LocalTrain(𝐷! ∖ 𝐷!" ) datasets 𝐷!" Filtering stage
𝑦 ← MajorityVote(𝑦% , … , 𝑦& )
Local Training
Ensemble of Models Ensemble final prediction
𝐸 ← (𝑀% , … 𝑀& )
Data Owner Computation
Secure Outsourced Computation Secure Outsourced Computation

Fig. 3: Overview of the Training and Inference phases of the SafeNet Framework.

poisoned owners, and we validate experimentally that, indeed, Algorithm 1 SafeNet Training Algorithm
SafeNet provides resilience to stealthy data poisoning attacks, Input: m data owners, each owner Ck ’s dataset Dk .
such as backdoor and targeted attacks. Another advantage of // Owner’s local computation in plaintext format
SafeNet is that the training time to execute the MPC protocols – For k ∈ [1, m] :
in the SOC setting is drastically reduced as each Mk ∈ E can
- Separate out Dkv from Dk . Train Mk on Dk \ Dkv .
be trained locally by the respective owner. We detail below the
- Secret-share Dkv and Mk to servers.
algorithms for training and inference in SafeNet.
// MPC computation in secret-shared format
C. SafeNet Training and Inference – Construct a common validation dataset Dval = ∪m v
i=1 Di .
To train the ensemble in SafeNet, we present our proposed m
– Construct ensemble of models E = {Mi }i=1
ensemble method in Algorithm 1. We discuss the realization – Initialize a vector bval of zeros and of size m.
in MPC later in Appendix B. Each owner Ck separates out – For k ∈ [1, m] : // Ensemble Filtering
a subset of its training dataset Dkv ∈ Dk and then trains its - AccValk = Accuracy(Mk , Dval )
model Mk on the remaining dataset Dk \ Dkv . The trained - If AccValk > ϕ: Set bval k =1
model Mk and validation dataset Dkv is then secret-shared return E and bval
to the servers.
m
The combined validation dataset is denoted as
v
S
Dval = Di . We assume that all users contribute equal-size
i=1
validation sets to Dval . During the filtering stage inside the
MPC, the validation accuracy AccVal of each model is jointly Given protocol Πtrain that securely realizes Algorithm 1
computed on Dval . If the resulting accuracy for a model is inside the MPC (described in Appendix B), we argue security
below threshold ϕ, the model is excluded from the ensemble. as follows:
The filtering step is used to separate the models with low
accuracy, either contributed by a poisoned owner, or by an Theorem 2. Protocol Πtrain is secure against adversary Apsoc
owner holding non-representative data for the prediction task. who poisons t out of m data owners and corrupts T out of
Under the assumption that the majority of owners are honest, N servers.
it follows that the majority of validation samples are correct. If
Ck is honest, then the corresponding Mk should have a high
The proof of the theorem will be given in Appendix C after we
validation accuracy on Dval , as the corresponding predicted
introduce the details of MPC instantiation and how protocol
outputs would most likely agree with the samples in Dval . In
Πtrain securely realizes FpTrain in Appendix B-3.
contrast, the predictions by a poisoned model Mk will likely
not match the samples in Dval . In Appendix A, we compute a During inference, the prediction of each model Mk is
lower bound on the size of the validation dataset as a function generated and the servers aggregate the results to perform ma-
of the number of poisoned owners t and filtering threshold ϕ, jority voting. Optionally, differentially private noise is added
such that all clean models pass the filtering stage with high to the sum to offer user-level privacy guarantees. The secure
probability even when a subset of the cross-validation dataset inference protocol Πpred in MPC and its proof of security is
Dval is poisoned. given in Appendix B and C respectively.

5
D. SafeNet Analysis trained on poisoned data D′ generates the same prediction on
x as E: E ′ (x) = y.
Here, we demonstrate the accuracy, poisoning robustness
and privacy guarantees that SafeNet provides. We first show Proof. If an adversary’s goal were to cause y ′ to be predicted
how to lower bound SafeNet’s test accuracy given that each on input x, their most efficient strategy is to flip y predictions
clean model in the ensemble reaches a certain accuracy to y ′ . If y were the ensemble prediction, it must have at least
c +c ′
level. We also give certified robustness and user-level privacy ⌊ y 2 y ⌋ model predictions, and the second most common
guarantees. All of our guarantees improve as the individual c +c ′
prediction y ′ would have at most ⌊ y 2 y ⌋ model predictions.
models become more accurate, making the ensemble agree on Corrupting these predictions then requires flipping at least
correct predictions more frequently. (cy − cy′ )/2 predictions from y to y ′ . Overall, this requires at
Robust Accuracy Analysis. We provide lower bounds on least ⌈(cy − cy′ )/2⌉ poisoned data owners. Thus, an adversary
SafeNet accuracy, assuming that at most t out m models in poisoning at most t = ⌈(cy − cy′ )/2⌉ − 1 data owners still
the SafeNet ensemble E are poisoned, and the clean models generates the same prediction y on x.
have independent errors, with maximum error rate p < 1 − ϕ,
Privacy Analysis. Recent work by McMahan et al. [66] intro-
where ϕ is the filtering threshold.
duced the notion of user-level differential privacy where the
Theorem. (Informal) Let Apsoc be an adversary who poisons at presence of a user in the protocol should have imperceptible
most t out of m data owners and corrupts T out of N servers. impact on the final trained model. We show that, given our
Assume that the filtered ensemble E has at least m − t clean threat model, SafeNet provides the strong privacy guarantee
models, each with a maximum error rate of p < 1 − ϕ. If the of user-level differential privacy, which also implies example-
number of poisoned owners is at most m(1−2p)
2(1−p) , ensemble E level differential privacy. This privacy guarantee can protect
correctly classifies new samples with high probability, which against model extraction and property inference attacks, in
is a function of m, ϕ, t and p. addition to membership inference attacks.
The formal theorem and the corresponding proof can be
found in Appendix A. Theorem 4. When DPN OISE function samples from a Laplace
random variable Lap(2/ε), Algorithm 2 satisfies user-level ε-
Poisoning Robustness Analysis. Our previous theorem differential privacy.
demonstrated that SafeNet’s accuracy on in-distribution data
is not compromised by poisoning. Now, we show that we can Proof. Observe that replacing a local model obtained from a
also certify robustness to poisoning on a per-sample basis for data owner in our framework only changes C OUNTS for two
arbitrary points, inspired by certified robustness techniques for classes by 1 on any given query, so it has an ℓ1 sensitivity
adversarial example robustness [26]. In particular, Algorithm 2 of 2. As a result, Lap(2/ε) suffices to ensure that user-level
describes a method for certified prediction against poisoning, ε-differential privacy holds.
returning the most common class y predicted by the ensemble The main crux of Theorem 4 is that no model can influence
on a test point x, as well as a bound on the number of C OUNTS too much, an observation also made by PATE [75]
poisoning owners t which would be required to modify the and the CaPC [24] framework, but they only considered
predicted class. example-level differential privacy, protecting against mem-
bership inference attacks, but not stronger attacks that user-
Algorithm 2 Certified Private Prediction P RED G AP (E, x) level differential privacy prevents. This limitation is inherent
Input: m data owners; Ensemble of models E = {Mi }m in PATE, as the central training set is split to train multiple
i=1 ;
Testing pointPx; Differential Privacy parameters ε, δ. models. However, our stronger analysis holds for SafeNet in
m the private collaborative learning setting, as we start with
C OUNTS = i=1 Mi (x) + DPN OISE(ε, δ)
y, cy = M OST C OMMON(C OUNTS) // most common pre- pre-existing partitions of benign and poisoned datasets. We
dicted class with noisy count prove Theorem 4 by considering Laplace noise, but various
y ′ , cy′ = S ECOND M OST C OMMON(C OUNTS) // second improvements to PATE using different mechanisms such as
most common predicted class with count Gaussian noise and other data-dependent approaches [75],
t = ⌈(cy − cy′ )/2⌉ − 1 [76], can also be extended to our framework.
return y, t Combining Robustness and Privacy. Adding differentially
private noise prevents Algorithm 2 from returning the exact
difference between the top two class-label counts, making it
We first analyze the poisoning robustness when privacy of
only possible to offer probabilistic robustness guarantees. That
aggregation is not enabled in the following theorem.
is, the returned t is actually a noisy version of the “true” t∗ ,
Theorem 3. Let E be an ensemble of models trained on where t∗ is used to certify correctness. However, for several
datasets D = {D1 , . . . , Dm }. Assume that on an input x, the choices of the DPNoise function, the exact distribution of the
ensemble generates prediction y = E(x) without DPN OISE noise is known, making it easy to provide precise probabilistic
and Algorithm 2 outputs (y, t). Moreover, assuming an adver- guarantees similar to those provided by Theorem 3. For
sary Apsoc who poisons at most t data owners, the resulting E ′ example, if Gaussian noise with scale parameter σ is used

6
to guarantee DP, and PredGap returns t, then this prediction F. Comparison to Existing Ensemble Strategies
observed t, then we know that the true t∗ is larger than t − k Model ensembles have been considered to address adver-
with probability Φ(k/σ), where Φ denotes the Gaussian CDF. sarial machine learning vulnerabilities in several prior works.
Here, we discuss the differences between our analysis and
previous ensembling approaches.
E. Extensions
a) Ensembles on a Centralized Training Set: Several
In addition to providing various guarantees, we offer a ensemble strategies seek to train a model on a single, cen-
number of extensions to our original SafeNet design. tralized training set. This includes using ensembles to prevent
poisoning attacks [51], [60], as well as to provide differential
Transfer Learning. A major disadvantage of SafeNet is its privacy guarantees [75] or robustness to privacy attacks [84].
slower inference time compared to a traditional PPML frame- Due to centralization, none of these techniques can take
work, requiring to perform a forward pass on all local models advantage of the partitioning of datasets. As a result, protection
in the ensemble. However, for transfer learning scenario, we from poisoning is only capable of handling a small number
propose a way where SafeNet runs almost as fast as the of poisoning examples, whereas our partitioning allows large
traditional framework. In transfer learning [34], [56], a pre- fractions of the entire dataset to be corrupted. PATE, due to
trained model MB , which is typically trained on a large public data centralization, can only guarantee privacy for individual
dataset, is used as a “feature extractor” to improve training on samples, whereas in our analysis, the entire dataset of a given
a given target dataset. In our setting, all data owners start owner can be changed, providing us with user-level privacy.
with a common pre-trained model, and construct their local b) CaPC [24]: Chouquette-Choo et al. [24] propose
models by fine tuning MB ’s last ‘l’ layers using their local CaPC, which extends PATE to the MPC collaborative learning
data. We can then modify the prediction phase of SafeNet to setting. Their analysis gives differential privacy guarantees for
reduce its inference time and cost considerably. The crucial individual examples. Our approach extends their analysis to
observation is that all local models differ only in the weights a differential privacy guarantee for the entire local training
associated to the last l layers. Consequently, given a prediction set and model, to provide protection against attacks such
query, we run MB upto its last l layers and use its output as property inference and model extraction. In addition, our
to compute the l layers of all the local models to obtain approach also provides poisoning robustness guarantees which
predictions for majority voting. The detailed description of the they cannot, as they allow information to be shared between
modified SafeNet algorithm is given in Appendix D-A. Note local training sets.
that, this approach achieves the same robustness and privacy c) Cao et al. [16]: Recent work by Cao et al. [16]
guarantees as described in Section III-D, given that MB was gave provable poisoning robustness guarantees for federated
originally not tampered with. learning aggregation. They proposed an ensembling strategy,
Integration Testing. While SafeNet can handle settings with where, given m data owners, t of which are malicious, they
non-iid data distributions among data owners, the local models construct an ensemble of m k global models, where each
accuracies might be impacted by extreme non-iid settings (we model is trained on a dataset collected from a set of k clients.
analyze the sensitivity of SafeNet to data imbalance in Section Our poisoning robustness argument in Theorem 3 coincides
IV-H). In such cases, SafeNet fails fast, allowing the owners to with theirs at k = 1, a setting they do not consider as their
determine whether or not using SafeNet is the right approach approach relies on combining client datasets for federated
for their setting. This is possible because SafeNet’s training learning. Additionally, k = 1 makes their approach vulnerable
phase is very cheap, making it possible to quickly evaluate to data reconstruction attacks [12], an issue SafeNet does not
the ensemble’s accuracy on the global validation set. If the face as the attack directly violates the underlying security
accuracy is not good enough, the owners can use a different guarantee of the MPC. We experimentally compare both
approach, such as a standard MPC training. SafeNet’s strong approaches on a federated learning dataset in Section V-D and
robustness guarantees and an efficient training phase makes it show that our approach outperforms [16].
an appealing first choice for private collaborative learning. IV. E VALUATION
Low Resource Owners. If a data owner does not have A. Experimental Setup
sufficient resources to train a model on their data, they cannot We build a functional code on top of the MP-SPDZ li-
participate in the standard SafeNet protocol. In such situations, brary [53]1 to assess the impact of data poisoning attacks on
computationally restricted owners can defer their training to the training phase of PPML frameworks. We consider four
SafeNet, that can use standard MPC training approaches to different MPC settings, all available in the MP-SPDZ library:
train their models. Training these models in MPC increases i) two-party with one semi-honest corruption (2PC) based on
the computational overhead of our approach, but facilitates [27], [32]; ii) three-party with one semi-honest corruption
broader participation. We provide the details of this modifica- (3PC) based on Araki et al. [4] with optimizations by [29],
tion in Appendix D-B and also run an experiment in Appendix [69]; iii) three-party with one malicious corruption based on
E-A to verify that SafeNet remains efficient, while retaining
the same robustness and privacy properties. 1 https://github.com/data61/MP-SPDZ

7
Dalskov et al. [28]; and iv) four-party with one malicious We train a neural network with one hidden layer of size 10
corruption (4PC), also based on [28]. Note, that both semi- nodes using ReLU activations.
honest and malicious adversaries possess poisoning capability; Fashion. We train several neural networks on the Fashion-
their roles change only inside the SOC paradigm. MNIST dataset [91] with one to three hidden layers. The
In all the PPML frameworks, the data owners secret-share Fashion dataset is a 10-class classification problem with 784
their training datasets to the servers and a single ML model features representing various garments. All hidden layers have
is trained on the combined dataset. Typically, real number 128 nodes and ReLU activations, except the output layer using
arithmetic is emulated by using 32-bit fixed-point represen- softmax.
tation of fractional numbers. Each fractional number x ∈ Z2ℓ
is represented as ⌊x · 2f ⌉, where ℓ and f denote the ring CIFAR-10. The CIFAR-10 dataset [57] is a 10 class image
size and precision, respectively. We set ℓ = 64 and f = 16. dataset. CIFAR-10 is harder than other datasets we consider,
Probabilistic truncation proposed by Dalskov et al. [28], [29] so we perform transfer learning from a ResNet-50 model [45]
is applied after every multiplication. In the MPC library pretrained on the ImageNet dataset [33]. We fine tune only the
implementation, the sigmoid function for computing the output last layer, freezing all convolutional layers.
probabilities is replaced with a three-part approximation [20], EMNIST. The EMNIST dataset [25] is a benchmark federated
[28], [71]. In SafeNet, models are trained locally using the learning image dataset, split in a non-iid fashion by the person
original sigmoid function. We implement softmax function who drew a given image. We select 100 EMNIST clients in
using the method of Aly et al. [2]. We perform our experiments our experiments.
over a LAN network on a 32-core server with 192GB of
memory allowing up to 20 threads to be run in parallel. D. Dataset Partitioning and Model Accuracy
We conduct our experiments by varying the number of
B. Metrics
data owners. We split MNIST and Adult datasets across
We use the following metrics to compare SafeNet with 20 participating data owners, while we use 10 owners for
existing PPML framework: Fashion and CIFAR-10 datsets. The EMNIST dataset used for
Training Time. is the time taken to privately train a model comparison with prior work on federated learning assumes 100
inside the MPC (protocol Πtrain ). As is standard practice [13], participating owners. Each owner selects at random 10% of its
[20], [21], [28], [69], [71], this excludes the time taken by the local training data as the validation dataset Djv . All models are
data owners to secret-share their datasets and models to the trained using mini-batch stochastic gradient descent.
servers as it is a one-time setup phase. To introduce non-iid behavior in our datasets (except for
Communication Complexity. is the amount of data ex- EMNIST, which is naturally non-iid), we sample class labels
changed between the servers during the privacy-preserving from a Dirichlet distribution [46]. That is, to generate a
execution of the training phase. population of non-identical owners, we sample q ∼ Dir(αp)
from a Dirichlet distribution, where p characterizes a prior
Test Accuracy. is the percentage of test samples that the ML class distribution over all distinct classes, and α > 0 is a
model correctly predicts. concentration parameter which controls the degree of similar-
Attack Success Rate. is the percentage of target samples that ity between owners. As α → ∞, all owners have identical
were misclassified as the label of attacker’s choice. distributions, whereas as α → 0, each owner holds samples
Robustness against worst-case adversary. We measure the of only one randomly chosen class. In practice, we observe
resilience of SafeNet at a certain corruption level c against α = 1000 leads to almost iid behavior, while α = 0.1 results
a powerful, worst-case adversary. For each test sample, this in an extreme imbalance distribution. The default choice for all
adversary can select any subset of c owners, arbitrarily mod- our experiments is α = 10, which provides a realistic non-iid
ifying the model to change the test sample’s classification. distribution. We will vary parameter α in Appendix E-A.
This is the same adversary considered in Algorithm 2 and by Dataset Partition Type Local Model SafeNet Ensemble Improvement
Theorem 3, any any model which is robust against this attack
MNIST 80.05% 89.48% 9.03%
has a provably certified prediction. We measure the error rate Adult 77.32% 81.41% 4.09%
Dirchlet
on testing samples for this worst-case adversarial model. FASHION 71.68% 83.26% 11.53%
CIFAR-10 54.03% 62.76% 8.73%

C. Datasets and Models EMNIST Natural 54.05% 79.19% 25.14%

We give a descriptions of the datasets and models used for TABLE I: Test accuracy comparison of a single local model and the
our experiments below. entire SafeNet ensemble. SafeNet Ensemble improves upon a single
local model across all datasets.
MNIST. The MNIST dataset [35] is a 10 class classification
problem which is used to predict digits between 0 and 9. We We measure the accuracy of a local model trained by
train a logistic regression model for MNIST. individual data owners and our SafeNet ensemble. Table I
Adult. The Adult dataset [35] is for a binary classification provides the detailed comparison of the accuracy of the local
problem to predict if a person’s annual income is above $50K. and ensemble models across all four datasets. We observe that

8
SafeNet consistently outperforms local models, with improve- samples (which are selected in advance before training the
ments ranging from 4.09% to 25.14%. The lowest performance model), while the backdoor attack generalizes to any testing
is on CIFAR-10, but in this case SafeNet’s accuracy is very samples including the backdoor pattern.
close to fine-tuning the network using the combined dataset,
which reaches 65% accuracy. F. Evaluation on Logistic Regression
We start with DIGIT 1/7 dataset, a subset of MNIST
E. Implementation of Poisoning Attacks
data using only digits 1 and 7, for which we evaluate the
Backdoor Attacks. We use the BadNets attack by Gu et al. computational costs and the poisoning attack success, for both
[44], in which the poisoned owners inject a backdoor into the traditional PPML and our newly proposed SafeNet framework.
model to change the model’s prediction from source label ys to We perform our experiments over four underlying MPC
target label yt . For instance, in an image dataset, a backdoor frameworks, with both semi-honest and malicious adversaries.
might set a few pixels in the corner of the image to white. Table II provides a detailed analysis of the training time
The BadNets attack strategy simply identifies a set of k target and communication complexity for both existing PPML and
samples {xti }ki=1 with true label ys , and creates backdoored SafeNet frameworks. Note that the training time and com-
samples with target label yt . We use k = 100 samples, which munication cost for the PPML frameworks is reported per
is sufficient to poison all models. epoch times the number of epochs in training. The number
To run backdoor attacks on models trained with standard of epochs is a configurable hyper-parameter, but usually at
PPML frameworks, the poisoned owners create the poisoned least 10 epochs are required. On the other hand, the training
dataset Dj∗ by adding k poisoned samples and secret-sharing time and communication reported for SafeNet is for the end-
them as part of the training dataset to the MPC. The framework to-end execution inside the MPC, independent of the number
then trains the ML model on the combined dataset submitted of epochs. We observe large improvements of SafeNet over
by both the honest and poisoned owners. the existing PPML frameworks. For instance, in the semi-
In SafeNet, backdoor attacks are implemented at the poi- honest two-party setting, SafeNet achieves 30× and 17×
soned owners, which add k backdoored samples to their improvement in running time and communication complexity,
dataset Dj and train their local models M∗j on the combined respectively, for n = 10 epochs. This is expected because
clean and poisoned data. A model trained only on poisoned SafeNet performs local model training, which is an expensive
data will be easy to filter due to low accuracy, making training phase in the MPC.
on clean samples necessary. The corrupt owners then secret- MPC Setting Framework Training (s) Comm. (GB)
share both the model M∗j and validation set Djv selected at
Semi-Honest PPML n×151.84 n×65.64
random from Dj to the MPC. 2PC
[32] SafeNet 57.41 38.03
Targeted Attacks. We select k targeted samples, and change Semi-Honest PPML n×2.63 n×0.35
their labels in training to a target label yt different from [4] SafeNet 0.54 0.15
3PC
the original label. The models are trained to simultaneously Malicious PPML n×32.54 n× 2.32
[28] SafeNet 9.44 1.47
minimize both the training and the adversarial loss. This
strategy has also been used to construct poisoned models by Malicious PPML n×5.28 n×0.66
4PC
[28] SafeNet 1.09 0.28
prior work [55], and can be viewed as an unrestricted version
of the state-of-the-art Witches’ Brew targeted attack (which TABLE II: Training Time (in seconds) and Communication (in GB)
requires clean-label poisoned samples) [40]. of existing PPML and SafeNet framework for a logistic regression
model over several MPC settings over a LAN network. n denotes the
The next question to address is which samples to target number of epochs required for training the logistic regression model
as part of the attack. We use two strategies to generate in the PPML framework. The time and communication reported for
k = 100 target samples, based on an ML model trained by SafeNet is for end-to-end execution.
the adversary over the test data. In the first strategy, called
TGT-Top, the adversary chooses examples classified correctly To mount the backdoor attack, the backdoor pattern sets the
with high confidence by a different model. Because these top left pixel value to white (a value of 1). We set the original
examples are easy to classify, poisoning them should be hard. class as ys = 1 and target class as yt = 7. Figure 4 (a) shows
We also consider an attack called TGT-Foot, which chooses the success rate for the 3PC PPML and SafeNet frameworks
low confidence examples, which are easier to poison. For both by varying the number of poisoned owners between 0 and
strategies, the adversary replaces its label with the second 10. We tested with all four PPML settings and the results are
highest predicted label. We compare these two strategies for similar. We observe that by poisoning data of a single owner,
target selection. the adversary is successfully able to introduce a backdoor in
The difference between targeted and backdoor attacks is the PPML framework. The model in the PPML framework
that targeted attacks do not require the addition of a backdoor predicts all k = 100 target samples as yt , achieving 100%
trigger to training or testing samples, as needed in a backdoor adversarial success rate. In contrast, SafeNet is successfully
attack. However, the impact of the backdoor attack is larger. able to defend against the backdoor attack, and provides 0%
Targeted attacks change the prediction on a small set of testing attack success rate up to 9 owners with poisoned data. The test

9
accuracy on clean data for both frameworks is high at around sample; (2) SafeNet provides certified robustness up to 9 out of
98.98% even after increasing the poisoned owners to 10. 20 poisoned owners even under this powerful threat scenario.
(a) Backdoor (b) TGT-Top
Multiclass Classification. We also test both frameworks in
100 100
the multiclass classification setting for both Backdoor and
Targeted attacks on MNIST dataset and observe similar large
80 80
Success Rate (in %)

Success Rate (in %)


improvements. For instance, in the semi-honest 3PC setting,
60 60
we get 240× and 268× improvement, respectively, in training
PPML Framework PPML Framework
40 SafeNet Framework 40 SafeNet Framework running time and communication complexity for n = 10
20 20 epochs while the success rate in the worst-case adversarial
0 0 scenario not exceeding 50% with 9 out of 20 owners being
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 poisoned. This experiment shows that the robust accuracy
# Corrupt Data Owers # Corrupt Data Owers
(c) TGT-Foot (d) Worst-case Adversary property of our framework translates seamlessly even for the
100
100
SafeNet-TGT-Top & Backdoor case of a multi-class classification problem. The details of the
SafeNet-TGT-Foot
experiment are deferred to Appendix E.
Ideal Success Rate (in %)

80 80
Success Rate (in %)

60
60 G. Evaluation on Deep Learning Models
PPML Framework
40 40
SafeNet Framework
We evaluate neural network training for PPML and SafeNet
20 20 frameworks on the Adult and Fashion datasets. We provide ex-
0
0 periments on a three hidden layer neural network on Fashion in
0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
# Corrupt Data Owers # Corrupt Data Owers
this section and include additional experiments in Appendix E.
Table III provides a detailed analysis of the training time,
Fig. 4: Logistic regression attack success rate on the Digit-1/7 dataset communication, test accuracy and success rate for the 4PC
for PPML and SafeNet frameworks in the 3PC setting, for varying
poisoned owners launching Backdoor and Targeted attacks. Plot (a)
PPML framework and SafeNet using one poisoned owner.
gives the success rate for the BadNets attack, while plots (b) and We observe that SafeNet has 39× and 36× improvement in
(c) show the success rates for the TGT-Top and TGT-Foot targeted training time and communication complexity over the PPML
attacks. Plot (d) provides the worst-case adversarial success when the framework, for n = 10 epochs. The SafeNet prediction time is
set of poisoned owners can change per sample. Lower attack success on average 26 milliseconds to perform a single secure predic-
result in increased robustness. SafeNet achieves much higher level of
robustness than existing PPML under both attacks.
tion, while the existing PPML framework takes on average 3.5
milliseconds for the same task. We believe this is a reasonable
We observe in Figure 4 (b) that for the TGT-Top targeted cost for many applications, as SafeNet has significant training
attack, a single owner poisoning is able to successfully mis- time improvements and robustness guarantees.
classify 98% of the target samples in the PPML framework. For the BadNets backdoor attack we set the true label ys as
As a consequence, the test accuracy of the model drops by a ‘T-Shirt’ and target label yt as ‘Trouser’. We test the effect of
≈ 10%. In contrast, SafeNet works as intended even at high both TGT-Top and TGT-Foot attacks under multiple poisoned
levels of poisoning. For the TGT-Foot attack in Figure 4 (c), owners, and also evaluate another variant of targeted attack
the test accuracy of the 3PC PPML framework drops by ≈ 5%. called TGT-Random, where we randomly sample k = 100
The attack success rate is 94% for the 3PC PPML, which is target samples from the test data. Figure 5 provides the worst-
decreased to 21% by SafeNet, in presence of a single poisoned case adversarial success of SafeNet against these attacks. We
owner. The accuracy drop and success rate vary across the two observe that SafeNet provides certified robustness for TGT-
strategies because of the choice of the target samples. In TGT- Random and TGT-Top up to 4 out of 10 poisoned onwers,
Foot, the models have low confidence on the target samples, while the adversary is able to misclassify more target samples
which introduces errors even without poisoning, making the in the TGT-Foot attack. The reason is that the k selected target
attack succeed with slightly higher rate in SafeNet. Still, samples have lowest confidence and models in the ensemble
SafeNet provides resilience against both TGT-Top and TGT- are likely to be in disagreement on their prediction.
Foot for up to 9 out of 20 poisoned owners.
Worst-case Robustness. Figure 4 (d) shows the worst-case H. Evaluation of Extensions
attack success in SafeNet, by varying the number of poisoned Here, we evaluate our SafeNet extensions introduced in
owners c ∈ [1, 10] and allowing the attacker to poison a Section III-E. First, we experiment with our transfer learning
different set of c owners for each testing sample (i.e., the extension. We show that, on applying our extension to SafeNet,
adversarial model considered in Algorithm 2 for which we can its inference overhead falls dramatically. We test our approach
certify predictions). Interestingly, SafeNet’s accuracy is similar on Fashion and CIFAR-10 datasets. For the Fashion dataset,
to that achieved under our backdoor and targeted attacks, even we use the same setup as earlier with m = 10 data owners, and
for this worst-case adversarial scenario. Based on these results three-layered neural network as the model architecture, where
we conclude that: (1) the backdoor and targeted attacks we each data owner fine-tunes only the last layer (l = 1) of the
choose to implement are as strong as the worst-case adversarial pre-trained model. We observe that for each secure inference,
attack, in which the set of poisoned owners is selected per SafeNet is now only 1.62× slower and communicates 1.26×

10
Backdoor Attack Targeted Attack
MPC Setting Framework Training Time (s) Communication (GB)
Test Accuracy Success Rate Test Accuracy Success Rate-Top Success Rate-Foot
PPML n × 565.45 n × 154.79 84.07% 100% 82.27% 100% 100%
3PC [4] Semi-Honest
SafeNet 156.53 41.39 84.36% 0% 84.48% 0% 32%
PPML n × 1392.46 n × 280.32 84.12% 100% 82.34% 100% 100%
4PC [28] Malicious
SafeNet 356.26 76.43 84.36% 0% 84.54% 0% 32%

TABLE III: Time (in seconds) and Communication (in Giga-Bytes) over a LAN network for PPML and SafeNet framework training a
Neural Network model with 3 hidden layers over Fashion dataset. n denotes the number of epochs used to train the NN model in the PPML
framework. The time and communication reported for SafeNet is for end-to-end execution. Test Accuracy and Success Rate is given for the
case when a single owner is corrupt.

100
SafeNet-TGT-Top V. D ISCUSSION AND C OMPARISON
Ideal Success Rate (in %)

SafeNet-TGT-Random
SafeNet-TGT-Foot
SafeNet-Backdoor We showed that SafeNet successfully mitigates a variety
of data poisoning attacks. We now discuss other aspects of
50
our framework such as scalability and modularity, parameter
selection in practice and comparison against other mitigation
strategies and federated learning approaches.
0
0 1 2 3 4 5
# Corrupt Data Owers
A. SafeNet’s Scalability and Modularity

Fig. 5: Worst-case adversarial success against targeted and backdoor Scalability. The training and prediction times of SafeNet
attacks of a three-layer neural network trained on Fashion in SafeNet. inside the MPC depend on the number of models in the
The adversary can change the set of c poisoned owners per sample. ensemble and the size of the validation dataset. The training
SafeNet achieves robustness on the backdoor, TGT-Top and TGT- time increases linearly with the fraction of training data used
Random attacks, up to 4 poisoned owners out of 10. The TGT-Foot
for validation and the number of models in the ensemble.
attack targeting low-confidence samples has higher success.
Similarly, the prediction phase of SafeNet has both runtime
and communication scaling linearly with the number of models
in the ensemble. However, we discussed how transfer learning
more on average than the PPML framework, while the standard can reduce the inference time of SafeNet.
SafeNet approach is about 8× slower due to the evaluation of Modularity. Another key advantage of SafeNet is that it
multiple ML models. can use any MPC protocol as a backend, as long as it
implements standard ML operations. We demonstrated this by
We observe even better improvements for CIFAR-10
performing experiments with both malicious and semi-honest
dataset. Here, we use a state-of-the-art 3PC inference protocol
security for four different MPC settings. As a consequence,
from [58], built specially for ResNet models. In our setting,
advances in ML inference with MPC will improve SafeNet’s
each owner fine-tunes the last layer of a ResNet-50 model,
runtime. SafeNet can also use any model type implementable
which was pre-trained on ImageNet data. SafeNet reaches
in MPC; if more accurate models are designed, this will lead
62.8% accuracy, decaying smoothly in the presence of poi-
to improved robustness and accuracy.
soning: 51.9% accuracy tolerating a single poisoned owner,
and 39.8% while tolerating two poisoned owners. The cost B. Instantiating SafeNet in Practice
of inference for a single model is an average of 59.9s, and
In this section we discuss how SafeNet can be instantiated in
SafeNet’s overhead is negligible (experimental noise has a
practice. There are two aspects the data owners need to agree
larger impact than SafeNet); SafeNet increases communication
upon before instantiating SafeNet: i) The MPC framework
by only 0.1%, increasing around 7MB over the 6.5GB required
used for secure training and prediction phase and ii) the
for standard inference.
parameters in Theorem 6 to achieve poisoning robustness. The
Next, we analyze the behavior of SafeNet under different MPC framework is agreed upon by choosing the total number
non-iid settings by varying the concentration parameter α. of outsourced servers N participating in the MPC, the number
We use the same Fashion dataset setup from Section IV-G. of corrupted servers T and the nature of the adversary (semi-
We observe that as α decreases, i.e., the underlying data honest or malicious in the SOC paradigm). The owners then
distribution of the owners become more non-iid, SafeNet’s agree upon a filtering threshold ϕ and the number of poisoned
accuracy decreases, as expected, but SafeNet still achieves owners t that can be tolerated. Once these parameters are
reasonable robustness even under high data imbalance (e.g., chosen the maximum allowed error probability of the local
α = 1). In extremely imbalanced settings, such as α = 0.1, models trained by the honest owners based on Lemma 5 and
SafeNet can identify low accuracy during training and data Theorem 6, can be computed as p < min( m(1−ϕ)−t
m−t
m−2t
, 2(m−t) ),
owners can take actions accordingly. We defer the details for where m denotes the total number of data owners. Given the
this extension to Appendix E-A, which also includes analyzing upper bound on the error probability p, each honest owner
attack success rates under extreme non-iid conditions. trains its local model while satisfying the above constraint.

11
We provide a concrete example on parameter selection as to poisoning [64]. For example, recent work [6], [8], [92]
follows: We instantiate our Fashion dataset setup, with m = 10 showed that malicious data owners can significantly reduce
data owners participating in SafeNet. For the MPC framework the learned global model’s accuracy. Existing defenses against
we choose a three-party setting (N = 3 servers), tolerating such owners use Byzantine-robust aggregation rules such as
T = 1 corruption. For poisoning robustness, we set ϕ = 0.3 trimmed mean [96], coordinate-wise mean [95] and Krum
and the number of poisoned owners to t = 2. This gives [11], which have been show to be susceptible to backdoor
us the upper bound on max error probability as p < 0.375. and model poisoning attacks [37]. Recent work in FL such as
Also the size of the global validation dataset is |Dval | > 92 FLTrust [15] and DeepSight [79] provide mitigation against
samples, i.e., each data owner contributes 10 cross-validation backdoor attacks. Both strategies are inherently heuristic,
samples each such that the constrained is satisfied. With this while SafeNet offers provable robustness guarantees. FLTrust
instantiation, we observe that none of the clean models are also requires access to a clean dataset, which is not required
filtered during training and the attack success rate of the in our framework, and DeepSight inspects each model update
adversary for backdoor attacks remains the same even after before aggregation, which is both difficult in MPC and leads
poisoning 3 owners, while our analysis holds for t = 2 to privacy leakage from the updates, a drawback not found
poisoned owners. Thus, in practice SafeNet is able tolerate in SafeNet. An important privacy challenge is that federated
more poisoning than our analysis suggests. learning approaches permit data reconstruction attacks when
the central server is malicious [12]. SafeNet prevents such
C. Comparing to poisoning defenses an attack, as it directly violates the security guarantee of the
Defending against poisoning attacks is an active area of MPC, when instantiated for the malicious setting.
research, but defenses tend to be heuristic and specific to We experimentally compare SafeNet to the federated
attacks or domains. Many defenses for backdoor poisoning learning-based approach of Cao et al. [16], who also gave
attacks exist [22], [63], [86], [89], but these strategies work provable robustness guarantees in the federated averaging
only for Convolutional Neural Networks trained on image scenario. We instantiate their strategy for EMNIST dataset and
datasets; Severi et al. [80] showed that these approaches fail compare their Certified Accuracy metric to SafeNet’s, with
when tested on other data modalities and models. Furthermore, m = 100 data owners, k = {2, 4} and FedAvg as the base
recent work by Goldwasser et.al [42] formulated a way to plant algorithm. To ensure both approaches have similar inference
backdoors that are undetectable by any defense. In contrast, times, we fix the ensemble size to 100 models, each trained
SafeNet is model agnostic and works for a variety of data using federated learning with 50 global and local iterations.
modalities. Even if an attack is undetectable, the adversary
can poison only a subset of models, making the ensemble 1.0
SafeNet
robust against poisoning. In certain instances SafeNet can 0.8 Cao et al. k=2
tolerate around 30% of the training data being poisoned, while Cao et al. k=4
Robust Accuracy

being attack agnostic. SafeNet is also robust to stronger model 0.6


poisoning attacks [5], [8], [37], which are possible when data 0.4
owners train their models locally. SafeNet tolerates model 0.2
poisoning because each model only contributes to a single
vote towards the final ensemble prediction. In fact, all our 0.0
0 10 20 30 40 50
Malicious Clients
empirical and theoretical analysis of SafeNet is computed for
arbitrarily corrupted models. Fig. 6: Certified Accuracy of our framework compared to Cao
et al. [16]. We fix the size of the Cao et al. ensemble to 100,
D. Comparison with Federated Learning
to match the test runtime of SafeNet.
Federated Learning (FL) is a distributed machine learning
framework that allows clients to train a global model without Figure 6 shows that SafeNet consistently outperforms [16],
sharing their local training datasets to the central server. in terms of maintaining a high certified accuracy in the
However, it differs from the PPML setting we consider in the presence of large poisoning rates. Moreover, their strategy is
following ways: (1) Clients do not share their local data to also particularly expensive at training time when instantiated
the server in FL, whereas PPML allows sharing of datasets; in MPC. During training, their approach requires data owners
(2) Clients participate in multiple rounds of training in FL, to interact inside MPC to train models over multiple rounds.
whereas they communicate only once with the servers in By contrast, SafeNet only requires interaction in MPC at the
PPML; (3) Clients receive the global model at each round in beginning of the training phase, making it significantly faster.
FL, while in SafeNet they secret-share their models once at the
start of the protocol; and, finally, (4) PPML provides stronger VI. C ONCLUSION
confidentiality guarantees such as privacy of the global model. In this paper, we extend the security definitions of MPC
It is possible to combine FL and MPC to guarantee both to account for data poisoning attacks when training machine
client and global model privacy [38], [52], [98], but this learning models privately. We consider a novel adversarial
involves large communication overhead and is susceptible model who can manipulate the training data of a subset of

12
owners and control a subset of servers in the MPC. We then [16] X. Cao, J. Jia, and N. Gong. Provably secure federated learning against
propose SafeNet, which performs ensembling in MPC, and malicious clients. In AAAI, 2021.
[17] N. Carlini, S. Chien, M. Nasr, S. Song, A. Terzis, and F. Tramer.
show that our design has provable robustness and privacy Membership inference attacks from first principles. In IEEE Symposium
guarantees, beyond those offered by existing approaches. We on Security and Privacy (SP), 2022.
evaluate SafeNet using logistic regression and neural networks [18] N. Chandran, D. Gupta, and A. Obbattu, L.B. andShah. Simc: Ml
inference secure against malicious clients at semi-honest cost. In
models trained on five datasets by varying the distribution USENIX, 2022.
similarity across data owners. We consider both end-to-end [19] H. Chaudhari, J. Abascal, A. Oprea, M. Jagielski, F. Tramèr, and J. Ull-
and transfer learning scenarios. We demonstrate experimen- man. Snap: Efficient extraction of private properties with poisoning.
arXiv, 2022.
tally that SafeNet achieves even higher robustness than its [20] H. Chaudhari, A. Choudhury, A. Patra, and A. Suresh. ASTRA: High-
theoretical analysis against backdoor and targeted poisoning throughput 3PC over Rings with Application to Secure Prediction. In
attacks, at a significant performance improvement in the train- ACM CCSW, 2019.
[21] H. Chaudhari, R. Rachuri, and A. Suresh. Trident: Efficient 4pc
ing time and communication complexity compared to existing framework for privacy preserving machine learning. NDSS, 2020.
PPML frameworks. [22] B. Chen, W. Carvalho, N. Baracaldo, H. Ludwig, B. Edwards, T. Lee,
I. M. Molloy, and B. Srivastava. Detecting backdoor attacks on deep
VII. ACKNOWLEDGMENTS neural networks by activation clustering. In SafeAI@AAAI, 2019.
[23] X. Chen, C. Liu, B. Li, K. Lu, and D. Song. Targeted backdoor attacks
We thank Nicolas Papernot and Peter Rindal for helpful on deep learning systems using data poisoning. 2017.
discussions and feedback. This research was sponsored by [24] C.A. Choquette-Choo, N. Dullerud, A. Dziedzic, Y. Zhang, S. Jha,
the U.S. Army Combat Capabilities Development Command N. Papernot, and X. Wang. Ca{pc} learning: Confidential and private
collaborative learning. In ICLR, 2021.
Army Research Laboratory under Cooperative Agreement [25] G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. Emnist: Extending
Number W911NF-13-2-0045 (ARL Cyber Security CRA). The mnist to handwritten letters. In International Joint Conference on Neural
views and conclusions contained in this document are those Networks (IJCNN), 2017.
[26] J. Cohen, E. Rosenfeld, and Z. Kolter. Certified adversarial robustness
of the authors and should not be interpreted as represent- via randomized smoothing. In ICML, 2019.
ing the official policies, either expressed or implied, of the [27] R. Cramer, I. Damgård, D. Escudero, P. Scholl, and C. Xing. SPDZ2k:
Combat Capabilities Development Command Army Research Efficient MPC mod 2ˆk for Dishonest Majority. CRYPTO, 2018.
[28] A. Dalskov, D. Escudero, and M. Keller. Fantastic four: Honest-majority
Laboratory or the U.S. Government. The U.S. Government is four-party secure computation with malicious security. In USENIX,
authorized to reproduce and distribute reprints for Government 2021.
purposes notwithstanding any copyright notation here on. [29] A.P.K. Dalskov, D. Escudero, and M. Keller. Secure evaluation of
quantized neural networks. In PoPETS, 2020.
[30] I. Damgård, M. Keller, E. Larraia, V. Pastro, P. Scholl, and N. P. Smart.
R EFERENCES Practical covertly secure MPC for dishonest majority - or: Breaking the
[1] M. Abspoel, D. Escudero, and N. Volgushev. Secure training of decision SPDZ limits. In ESORICS, 2013.
trees with continuous attributes. In PoPETS, 2021. [31] I. Damgård, V. Pastro, N. P. Smart, and S. Zakarias. Multiparty
[2] A. Aly and N.P. Smart. Benchmarking privacy preserving scientific Computation from Somewhat Homomorphic Encryption. In CRYPTO,
operations. In ACNS, 2019. 2012.
[3] T. Araki, A. Barak, J. Furukawa, T. Lichter, Y. Lindell, A. Nof, K. Ohara, [32] D. Demmler, T. Schneider, and M. Zohner. ABY - A Framework for
A. Watzman, and O. Weinstein. Optimized honest-majority MPC for Efficient Mixed-Protocol Secure Two-Party Computation. In NDSS,
malicious adversaries - breaking the 1 billion-gate per second barrier. 2015.
In IEEE S&P, 2017. [33] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-
[4] T. Araki, J. Furukawa, Y. Lindell, A. Nof, and K. Ohara. High- Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE
throughput semi-honest secure three-party computation with an honest conference on computer vision and pattern recognition, pages 248–255.
majority. In ACM CCS, 2016. Ieee, 2009.
[5] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and Vitaly Shmatikov. How [34] J. Devlin, M.W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of
to backdoor federated learning. 2018. deep bidirectional transformers for language understanding. In NAACL,
[6] Bagdasaryan<B., A. Veit, Y. Hua, D. Estrin, and V. Shmatikov. How to 2019.
backdoor federated learning. In AISTATS, 2020. [35] D. Dua and C. Graff. UCI machine learning repository, 2017.
[7] M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness The- [36] D. Escudero, M. Jagielski, R. Rachuri, and P. Scholl. Adversarial Attacks
orems for Non-Cryptographic Fault-Tolerant Distributed Computation and Countermeasures on Private Training in MPC. In PPML@NeurIPS,
(Extended Abstract). In ACM STOC, 1988. 2021.
[8] A. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo. Analyzing federated [37] M. Fang, X. Cao, J. Jia, and N. Gong. Local model poisoning attacks
learning through an adversarial lens. In ICML, 2019. to byzantine-robust federated learning. In Usenix, 2020.
[9] B. Biggio, I. Corona, G. Fumera, G. Giacinto, and F. Roli. Bagging [38] A. Fu, X. Zhang, N. Xiong, Y. Gao, H. Wang, and J. Zhang. Vfl:
classifiers for fighting poisoning attacks in adversarial classification A verifiable federated learning with privacy-preserving for big data in
tasks. In International workshop on multiple classifier systems, 2011. industrial iot. In IEEE Transactions on Industrial Informatics, 2020.
[10] B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support [39] K. Ganju, Q. Wang, W. Yang, C.A. Gunter, and N. Borisov. Property
vector machines. In ICML, 2012. inference attacks on fully connected neural networks using permutation
[11] P. Blanchard, E. Mhamdi, R. Guerraoui, and J. Stainer. Byzantine- invariant representations. 2018.
tolerant machine learning. In NeurIPS, 2017. [40] J. Geiping, L.H. Fowl, W.R. Huang, W. Czaja, G. Taylor, M. Moeller,
[12] F. Boenisch, A. Dziedzic, R. Schuster, A. Shamsabadi, I. Shumailov, and and T. Goldstein. Witches’ brew: Industrial scale data poisoning via
N. Papernot. When the curious abandon honesty: Federated learning is gradient matching. In ICLR, 2021.
not private. In arXiv, 2021. [41] O. Goldreich, S. Micali, and A. Wigderson. How to Play any Mental
[13] M. Byali, H. Chaudhari, A. Patra, and A. Suresh. Flash: Fast and robust Game or A Completeness Theorem for Protocols with Honest Majority.
framework for privacy-preserving machine learning. PoPETS, 2020. In STOC, 1987.
[14] R. Canetti. Security and composition of multiparty cryptographic [42] S. Goldwasser, M. Kim, V. Vaikuntanathan, and O. Zamir. Planting
protocols. In J. Cryptology, 2000. undetectable backdoors in machine learning models. In arXiv, 2022.
[15] X. Cao, M. Fang, J. Liu, and N. Gong. Fltrust: Byzantine-robust [43] S. D. Gordon, S. Ranellucci, and X. Wang. Secure computation with
federated learning via trust bootstrapping. In NDSS, 2021. low communication from cross-checking. In ASIACRYPT, 2018.

13
[44] T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg. Badnets: Evaluating [74] J. B. Nielsen and C. Orlandi. Cross and clean: Amortized garbled circuits
backdooring attacks on deep neural networks. IEEE Access, 2019. with constant overhead. In TCC, 2016.
[45] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep [75] N. Papernot, M. Abadi, Ú. Erlingsson, I. Goodfellow, and K. Talwar.
residual learning for image recognition. In Proceedings of the IEEE Semi-supervised knowledge transfer for deep learning from private
conference on computer vision and pattern recognition, pages 770–778, training data. In ICLR, 2017.
2016. [76] N. Papernot, S. Song, I. Mironov, A. Raghunathan, K. Talwar, and
[46] T.M.Harry Hsu, H.Qi, and M.Brown. Measuring the effects of non- Ú. Erlingsson. Scalable private learning with pate. 2018.
identical data distribution for federated visual classification. In IACR [77] A. Patra, T. Schneider, A. Suresh, and H. Yalame. Aby2.0: Improved
ePrint, 2019. mixed-protocol secure two-party computation. In USENIX, 2021.
[47] Y. Ishai, J. Kilian, K. Nissim, and E. Petrank. Extending Oblivious [78] D. Rathee, M. Rathee, N. Kumar, N. Chandran, D. Gupta, A. Rastogi,
Transfers Efficiently. In CRYPTO, 2003. and R. Sharma. Cryptflow2: Practical 2-party secure inference. In ACM
[48] Y. Ishai, R. Kumaresan, E. Kushilevitz, and A. Paskin-Cherniavsky. CCS, 2020.
Secure computation with minimal interaction, revisited. In CRYPTO, [79] P. Rieger, T. Nguyen, M. Miettinen, and A. Sadeghi. Deepsight:
2015. Mitigating backdoor attacks in federated learning through deep model
[49] M. Jagielski, A. Oprea, B. Biggio, C. Liu, C.N. Rotaru, and B. Li. inspection. In NDSS, 2022.
Manipulating machine learning: Poisoning attacks and countermeasures [80] G. Severi, J. Meyer, S. Coull, and A. Oprea. Explanation-guided
for regression learning. In IEEE S&P, 2018. backdoor poisoning attacks against malware classifiers. In USENIX,
[50] J. Jia, X. Cao, and N. Gong. Intrinsic certified robustness of bagging 2021.
against data poisoning attacks. In AAAI, 2021. [81] R. Shokri, M. Stronati, and V. Song, C.and Shmatikov. Membership
[51] Jinyuan Jia, Xiaoyu Cao, and Neil Zhenqiang Gong. Intrinsic certified inference attacks against machine learning models. In 2017 IEEE
robustness of bagging against data poisoning attacks. In Proceedings of Symposium on Security and Privacy (SP). IEEE, 2017.
the AAAI Conference on Artificial Intelligence, volume 35, pages 7961– [82] O. Suciu, R. Marginean, Y. Kaya, H. Daume III, and T. Dumitras. When
7969, 2021. does machine learning FAIL? generalized transferability for evasion and
[52] R. Kanagavelu, Z. Li, J. Samsudin, Y. Yang, F. Yang, R. Goh, M. Cheah, poisoning attacks. In USENIX, 2018.
P. Wiwatphonthana, K. Akkarajitsakul, and S. Wang. Two-phase multi- [83] A. Suri and D. Evans. Formalizing and estimating distribution inference
party computation enabled privacy-preserving federated learning. In risks. Proceedings on Privacy Enhancing Technologies (PETS), 2022.
ACM CCGRID, 2020. [84] X. Tang, S. Mahloujifar, L. Song, V. Shejwalkar, M. Nasr,
[53] M. Keller. MP-SPDZ: A versatile framework for multi-party computa- A. Houmansadr, and P. Mittal. Mitigating membership inference attacks
tion. In ACM CCS, 2020. by {Self-Distillation} through a novel ensemble architecture. In 31st
[54] P.W. Koh and P. Liang. Understanding black-box predictions via USENIX Security Symposium, 2022.
influence functions. In ICML, 2017. [85] F. Tramèr, R. Shokri, A.S. Joaquin, H. Le, M. Jagielski, S. Hong, and
[55] P.W. Koh, J. Steinhardt, and P. Liang. Stronger data poisoning attacks N. Carlini. Truth Serum: Poisoning machine learning models to reveal
break data sanitization defenses. In arXiv, 2018. their secrets. In ACM Computer and Communications Security (CCS),
[56] S. Kornblith, J. Shlens, and Q.V. Le. Do better imagenet models transfer 2022.
better? In CVPR, 2019. [86] B. Tran, J. Li, and A. Madry. Spectral signatures in backdoor attacks.
[57] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features In NeurIPS, 2018.
from tiny images. 2009. [87] S. Wagh, D. Gupta, and N. Chandran. SecureNN: Efficient and private
[58] N. Kumar, M. Rathee, N. Chandran, D. Gupta, A. Rastogi, and neural network training. In PoPETS, 2019.
R. Sharma. Cryptflow: Secure tensorflow inference. In IEEE Security [88] S. Wagh, S. Tople, F. Benhamouda, E. Kushilevitz, P. Mittal, and
& Privacy, 2020. T. Rabin. Falcon: Honest-majority maliciously secure framework for
[59] R. Lehmkuhl, P. Mishra, A. Srinivasan, and R.A. Popa. Muse: Secure private deep learning. In PoPETS, 2021.
inference resilient to malicious clients. In USENIX, 2021. [89] B. Wang, Y. Yao, S. Shan, H. Li, H. Viswanath, B. Zheng, and B.Y.
[60] Alexander Levine and Soheil Feizi. Deep partition aggregation: Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in
Provable defense against general poisoning attacks. arXiv preprint neural networks. In IEEE S&P, 2019.
arXiv:2006.14768, 2020. [90] J.L. Watson, S. Wagh, and R.A. Popa. Piranha: A gpu platform for
[61] Y. Lindell. Fast cut-and-choose-based protocols for malicious and covert secure computation. In USENIX, 2022.
adversaries. In J. Cryptology, 2016. [91] H. Xiao, K. Rasul, and R. Vollgraf. Fashion-mnist: a novel image dataset
[62] Y. Lindell and B. Pinkas. An efficient protocol for secure two-party for benchmarking machine learning algorithms, 2017.
computation in the presence of malicious adversaries. In EUROCRYPT, [92] C. Xie, S. Koyejo, and I. Gupta. Fall of empires: Breaking byzantine-
2007. tolerant SGD by inner product manipulation. In UAI, 2019.
[63] K. Liu, B. Dolan, and S. Garg. Fine-pruning: Defending against [93] A. C. Yao. Protocols for Secure Computations. In FOCS, 1982.
backdooring attacks on deep neural networks. In RAID, 2018. [94] S. Yeom, I. Giacomelli, M. Fredrikson, and S. Jha. Privacy risk in
[64] Z. Liu, Jiale G., W. Yang, K.n Fan, J.and Lam, and J. Zhao. Privacy- machine learning: Analyzing the connection to overfitting. In 2018 IEEE
preserving aggregation in federated learning: A survey. In arXiv, 2022. 31st Computer Security Foundations Symposium (CSF). IEEE, 2018.
[65] S. Mahloujifar, E. Ghosh, and M. Chase. Property inference from [95] D. Yin, Y. Chen, K. Ramchandran, and P. Bartlett. Byzantine-robust
poisoning. In IEEE Symposium on Security and Privacy (SP), 2022. distributed learning: Towards optimal statistical rates. In ICML, 2018.
[66] H.B McMahan, D. Ramage, K. Talwar, and L. Zhang. Learning [96] D. Yin, Y. Chen, K. Ramchandran, and P. Bartlett. Defending against
differentially private recurrent language models. In ICLR, 2018. saddle point attack in byzantine-robust distributed learning. In ICML,
[67] P. Mishra, R. Lehmkuhl, A. Srinivasan, W. Zheng, and R.A. Popa. 2019.
Delphi: A cryptographic inference service for neural networks. In [97] W. Zhang, S. Tople, and O. Ohrimenko. Leakage of dataset properties
USENIX, 2020. in Multi-Party machine learning. In 30th USENIX Security Symposium,
[68] P. Mohassel and M. K. Franklin. Efficiency tradeoffs for malicious two- 2021.
party computation. In PKC, 2006. [98] H. Zhu, R. Mong Goh, and W. Ng. Privacy-preserving weighted
[69] P. Mohassel and P. Rindal. ABY3 : A Mixed Protocol Framework for federated learning within the secret sharing framework. In IEEE Access,
Machine Learning. In ACM CCS, 2018. 2020.
[70] P. Mohassel, M. Rosulek, and Y. Zhang. Fast and Secure Three-party
Computation: Garbled Circuit Approach. In CCS, 2015. A PPENDIX A
[71] P. Mohassel and Y. Zhang. Secureml: A system for scalable privacy- S AFE N ET A NALYSIS
preserving machine learning. In IEEE S&P, 2017.
[72] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongras- In this section we first provide a detailed proof on the size
samee, E.C. Lupu, and F. Roli. Towards poisoning of deep learning of the validation dataset Dval such that all clean models clear
algorithms with back-gradient optimization. In AISec@CCS, 2017.
[73] J. Newsome, B. Karp, and D. Song. Paragraph: Thwarting signature the filtering stage of the training phase of our framework. We
learning by training maliciously. In RAID, 2006. then provide a proof on achieving lower bounds on the test

14
accuracy of our framework given all clean models are a part
of the ensemble.
The main idea of deriving the minimum size of Dval uses the As a visual interpretation of Lemma 5, Figure 7 shows the
point that the errors made by a clean model on a clean subset minimum number of samples required in the global validation
of samples in Dval can be viewed as a Binomial distribution dataset for varying number of poisoned owners t and error
in (m − t)n and p, where n denotes the size of the validation probability p. We set the total models m = 20, the failure
dataset Dkv contributed by an owner Ck . We can then upper probability ϵ = 0.01 and the filtering threshold ϕ = 0.3. The
bound the total errors made by a clean model by applying higher the values of t and p, the more samples are required
Chernoff bound and consequently compute the size of Dval . in the validation set. For instance, for p = 0.20 and number
of poisoned owners t = 8, all clean models pass the filtering
Lemma 5. Let Apsoc be an adversary who poisons t out stage with probability at least 0.99 when the validation set size
of m data owners and corrupts T out of N servers, and has at least 60 samples.
thus contributes t poisoned models to ensemble E, given as
output by Algorithm 1. Assume that Πtrain securely realizes Error Probability
80

Global validation data size


functionality FpTrain and every clean model in E makes an p=0.01
70 p=0.05
p=0.10
error on a clean sample with probability at most p < 1 − ϕ, 60 p=0.15
where ϕ is the filtering threshold. 50 p=0.20

If the validation dataset has at least (2+δ)m log 1/ϵ


δ 2 (m−t)p samples 40
30
and 0 ≤ t < m(1−ϕ−p)
(1−p) , then all clean models pass the filtering 20
stage of the training phase with probability at least 1−ϵ, where 10
δ = (1−ϕ)m−t
(m−t)p − 1 and ϵ denotes the failure probability. 0 2 4 6 8
Number of Poisoned Owners
Proof. Assume that each owner contributes equal size valida-
tion dataset Dkv of n samples, then the combined validation set Fig. 7: Minimum number of samples in the validation dataset as a
Dval collected from m data owners is comprised of mn i.i.d. function of maximum error probability p and number of poisoned
owners t for m = 20 data owners. We set the filtering threshold
samples. However, given an adversary Apsoc from our threat ϕ = 0.03 and failure probability ϵ = 0.01.
model, there can be at most t poisoned owners contributing
tn poisoned samples to Dval . We define a Bernoulli random We use a similar strategy as above to compute the lower
variable as follows: bound on the test accuracy. On a high level, the proof follows
( by viewing the combined errors made by the clean models
1, w.p. p
Xi = as a Binomial distribution Bin(m − t, p). We can then upper
0, w.p. 1 − p bound the total errors made by all the models in the ensemble
where Xi denotes if a clean model makes an error on the by applying Chernoff bounds and consequentially lower bound
ith clean sample in the validation dataset. Then there are the ensemble accuracy.
Bin((m − t)n, p) errors made by the clean model on the Theorem 6. Assume that the conditions in Lemma 5 hold
clean subset of samples in Dval . Note that, a model passes the against adversary Apsoc poisoning at most t < m 1−2p
2 1−p owners
filtering stage only when it makes ≥ ϕmn correct predictions. and that the errors made by the clean models are independent.
We assume that the worst case where the clean model makes Then E correctly  classifies newsamples with probability at
incorrect predictions on all the tn poisoned samples present δ ′2 µ′

in Dval . As a result, the clean model must make at most least pc = (1 − ϵ) 1 − e− 2+δ′ , where µ′ = (m − t)p and
(1 − ϕ)mn − tn errors on the clean subset of Dval with δ′ = m−2t
− 1.
2µ′
probability 1 − ϵ. We can upper bound the probability the
model makes at least (1 − ϕ)mn + 1 − tn errors with a Proof. Lemma 5 shows that, with probability > 1 − ϵ, no
multiplicative Chernoff bound with δ > 0: clean models will be filtered during ensemble filtering. Given
all clean models pass the filtering stage, we consider the worst
δ2 µ
Pr[
P(m−t)n
i=1 Xi > (1 − ϕ)mn − tn] = Pr [
Pn
i=1 Xi > (1 + δ)µ] < e− 2+δ case where even the t poisoned models bypass filtering. Now,
given a new test sample, m−t clean models have uncorrelated
where µ = (m − t)np (the mean of Bin(mn − tn, p)) and errors each with probability at most p, the error made by each
δ = (1−ϕ)m−t clean model can be viewed as a Bernoulli random variable
(m−t)p . The chernoff bound gives that the probability
δ2 µ with probability p and so the total errors made by clean models
the clean model makes too many errors is at most e− 2+δ = ϵ. follow a binomial X ∼ Bin(m − t, p). We assume that a new
Then it suffices to have this many samples: sample will be misclassified by all t of the poisoned models.
(2 + δ)m log 1/ϵ Then the ensemble as a whole makes an error if t + Bin(m −
|Dval | = mn = t, p) > m/2. We can then bound the probability this occurs
δ 2 (m − t)p
by applying Chernoff bound as follows:
m(1−ϕ−p)
where ϵ denotes the failure probability and t < (1−p) . h mi δ ′2 µ′
The inequality on t comes from requiring δ > 0. Pr X + t ≥ = Pr [X ≥ (1 + δ ′ )µ′ ] ≤ e− 2+δ′ ,
2

15
where µ′ = (m−t)p is the mean of X and δ ′ = m−2t
2µ′ −1 > 0. Secure Output Reconstruction. Fop functionality takes as
Then the probability of making a correct prediction can be input J·K-shares of a value x from the parties and a commonly
lower bounded by: agreed upon party id pid in clear. On receiving the shares and
h m i δ ′2 µ′ pid, Fop reconstructs x and sends it to the party associated to
Pr X < − t > 1 − e− 2+δ′ , pid.
2
given the number of poisoned models Secure Comparison. Fcomp functionality takes as input a
value a in J·K-shared format. Fcomp initializes a bit b = 0, sets
m(1 − 2p)
t< . b = 1 if a > 0 and outputs it in J·K-shared format. Protocol
2(1 − p) Πcomp is used to securely realize Fcomp .
The inequality on t comes from the constraint δ ′ > 0 for Secure Zero-Vector. Fzvec functionality takes as input a value
the Chernoff bound to hold. Note that, the above bound holds L in clear from the parties. Fzvec constructs a vector z of all
only when all the clean models pass the filtering stage, which zeros of size L and outputs J·K-shares of z. Πzvec denotes the
occurs with probability at least 1 − ϵ by Lemma 5. Then the protocol that securely realizes Fzvec .
bound on the probability of making a correct prediction by the
ensemble can be written as: Secure Argmax. Famax functionality takes as input a vector
 ′2 ′
 x in J·K-shared format and outputs J·K-shares of a value OP,
h m i
− δ2+δµ′
Pr X < − t > (1 − ϵ) 1 − e where OP denotes the index of the max element in vector x.
2 Πamx denotes the protocol that securely realizes Famax .
2) ML Building Blocks: We introduce several building
A PPENDIX B blocks required for private ML training, implemented by
R EALIZATION IN MPC existing MPC frameworks [13], [69], [71], [88]:
To instantiate SafeNet in MPC, we first describe the required Secure Model Prediction. FMpred functionality takes as input
MPC building blocks, and then provide the SafeNet training a trained model M and a feature vector x in J·K-shared format.
and secure prediction protocols. FMpred then computes prediction Preds = M(x) in one-
1) MPC Building Blocks: The notation JxK denotes a given hot vector format and outputs J·K-shares of the same. ΠMpred
value x secret-shared among the servers. The exact structure denotes the protocol which securely realizes functionality
of secret sharing is dependent on the particular instantiation FMpred .
of the underlying MPC framework [4], [13], [20], [21], [32], Secure Accuracy. Facc functionality takes as input two equal
[43]. We assume each value and its respective secret shares length vectors ypred and y in J·K-shared format. Facc then
to be elements over an arithmetic ring Z2ℓ . All multiplication computes the total number matches (element-wise) between
and addition operations are carried out over Z2ℓ . the two vectors and outputs # matches
|y| in J·K-shared format. Πacc
We express each of our building blocks in the form of an denotes the protocol which securely realizes this functionality.
ideal functionality and its corresponding protocol. An ideal 3) Protocols: We propose two protocols to realize our
functionality can be viewed as an oracle, which takes input SafeNet framework in the SOC setting. The first protocol
from the parties, applies a predefined function f on the inputs Πtrain describes the SafeNet training phase where given J·K-
and returns the output back to the parties. The inputs and shares of dataset Dkv and model Mk , with respect to each
outputs can be in clear or in J·K-shared format depending on owner Ck , Πtrain outputs J·K-shares of an ensemble E of m
the definition of the functionality. These ideal functionalities models and vector bval . The second protocol Πpred describes
are realized using secure protocols depending on the specific the prediction phase of SafeNet, which given J·K-shares of a
instantiation of the MPC framework agreed upon by the client’s query predicts its output label. The detailed description
parties. Below are the required building blocks: for each protocol is as follows:
Secure Input Sharing. Ideal Functionality Fshr takes as input
SafeNet Training. We follow the notation from Algorithm
a value x from a party who wants to generate a J·K-sharing
1. Our goal is for training protocol Πtrain given in Figure 8
of x, while other parties input ⊥ to the functionality. Fshr
to securely realize functionality FpTrain (Figure 2), where the
generates a J·K-sharing of x and sends the appropriate shares
inputs to FpTrain are J·K-shares of Dk = Dkv and ak = Mk , and
to the parties. We use Πsh to denote the protocol that realizes
the corresponding outputs are J·K-shares of O = E and bval .
this functionality securely.
Given the inputs to Πtrain , the servers first construct a common
Secure Addition. Given J·K-shares of x and y, secure addition validation dataset JDval K = ∪m v
k=1 JDk K and an ensemble of
is realized by parties locally adding their shares JzK = JxK + m
models JEK = {JMk K}k=1 . Then for each model Mk ∈ E,
JyK, where z = x + y. the servers compute the validation accuracy JAccValk K. The
Secure Multiplication:. Functionality Fmult takes as input J·K- output JAccValk K is compared with a pre-agreed threshold ϕ
shares of values x and y, creates J·K-shares of z = xy and to obtain a J·K-sharing of bval val
k , where bk = 1 if AccValk > ϕ.
sends the shares of z to the parties. Πmult denotes the protocol After execution of Πtrain protocol, servers obtain J·K-shares of
to securely realize Fmult . ensemble E and vector bval .

16
Protocol Πtrain
final prediction. If bval
k = 0, then after multiplication vector
Input: J·K-shares of each owner Ck ’s validation dataset Dkv and Preds is a vector of zeros and does not contribute in the voting
local model Mk . process towards the final prediction. The servers then compute
the argmax of vector JzK and receive output JOPK from Πamx ,
Protocol Steps: The servers perform the following:
where OP denotes the predicted class label by the ensemble.
– Construct J·K-shares of ensemble E = {Mk }m k=1 and The appropriate J·K-shares of OP is forwarded to the client for
validation dataset Dval = ∪m v
k=1 Dk .
reconstruction.
– Execute Πzvec with m as the input and obtain J·K-shares of a
vector bval . Theorem 7. Protocol Πpred is secure against adversary Apsoc
– For k ∈ [1, m] : who poisons t out of m data owners and corrupts T out of
– Execute ΠMpred with inputs as JMk K and JDval K and N servers.
obtain JPREDSk K, where P REDSk = Mk (Dval )
– Execute Πacc with inputs as JP REDSk K and JyDval K and Proof. The proof is given below in Appendix C.
obtain JAccValk K as the output.
– Locally subtract J·K-shares of AccValk with ϕ to obtain A PPENDIX C
JAccValk − ϕK.
S ECURITY P ROOFS
– Execute Πcomp with input as JAccValk − ϕK and obtain
Jb′ K, where b′ = 1 iff AccValk > ϕ. Set the kth position For concise security proofs, we assume the adversary Apsoc

in Jbval K as Jbval
k K = Jb K performs a semi-honest corruption in the SOC paradigm, but
Output: J·K-shares of bval and ensemble E. our proofs can also be extended to malicious adversaries in
the MPC. We prove that protocol Πtrain is secure against
Fig. 8: SafeNet Training Protocol an adversary of type Apsoc . Towards this, we first argue that
protocol Πtrain securely realizes the standard ideal-world func-
tionality FpTrain . We use simulation based security to prove our
Protocol Πpred claim. Next, we argue that the ensemble E trained using Πtrain
protocol provides poisoning robustness against Apsoc .
Input: J·K-shares of vector bval and ensemble E among the
servers. Client J·K-shares query x to the servers. Theorem 2. Protocol Πtrain is secure against adversary Apsoc
Protocol Steps: The servers perform the following: who poisons t out of m data owners and corrupts T out of
– Execute Πzvec protocol with L as the input, where L denotes N servers.
the number of distinct class labels and obtain J·K-shares of z.
– For each Mk ∈ E : Proof. Let Apsoc be a real-world adversary that semi-honestly
– Execute ΠMpred with inputs as JMk K and JxK. Obtain corrupts T out of N servers at the beginning of the protocol
JPredsK, where Preds = Mk (x). Πtrain . We now present the steps of the ideal-world adversary
– Execute Πmult to multiply bval
k to each element of vector (simulator) Sf for Apsoc . Note that, in the semi-honest setting
Preds. Sf already posses the input of Apsoc and the final output shares
– Locally add JzK = JzK + JPredsK to update z. of bval . Sf acts on behalf of N − T honest servers, sets their
– Execute Πamx protocol with input as JzK and obtain JOPK as shares as random values in Z2ℓ and simulates each step of
the output.
Πtrain protocol to the corrupt servers as follows:
Output: J·K-shares of OP
– No simulation is required to construct J·K-shares of en-
semble E and validation dataset Dval as it happens locally.
Fig. 9: SafeNet Prediction Protocol
– Sf simulates messages on behalf of honest servers as a
part of the protocol steps of Πzvec with public value m as
the input and eventually sends and receives appropriate
The security proof of Πtrain protocol as stated in Theorem 2 J·K-shares of bval to and from Apsoc .
in Section III-C is given in Appendix C. – For k ∈ [1, m]:
SafeNet Prediction. Functionality Fpred takes as input party – Sf simulates messages on behalf of honest servers, as a
id cid, J·K-shares of client query x, vector bval and ensemble part of the protocol steps of ΠMpred , with inputs to the
E = {JMk K}m k=1 and outputs a value OP , the predicted class protocol as J·K-shares of Mk and Dval and eventually
label by ensemble E on query x. sends and receives appropriate J·K-shares of PREDSk
Protocol Πpred realizes Fpred as follows: Given J·K-shares to and from Apsoc .
of x, bval and ensemble E, the servers initialize a vector z
of all zeros of size L. For each model Mk in the ensemble – Sf simulates messages on behalf of honest servers, as
E, the servers compute J·K-shares of the prediction Preds = a part of the protocol steps of Πacc , with inputs to
Mk (x) in one-hot format. The element bval val the protocol as J·K-shares of PREDSk and yDval and
k in vector b
is multiplied to each element in vector Preds. The JPredsK eventually sends and receives appropriate J·K-shares of
vector is added to JzK to update the model’s vote towards the AccValk to and from Apsoc .

17
– No simulation is required for subtraction with threshold The proof now simply follows from the fact that simulated
ϕ as it happens locally. view and real-world view of the adversary are computationally
– Sf simulates messages on behalf of honest servers, as indistinguishable. Poisoning robustness argument follows from
a part of the protocol steps of Πcomp , with inputs to the the fact that the ensemble E used for prediction was trained
protocols as J·K-shares of AccVal − ϕ and at the end using protocol Πtrain which was shown to be secure against
Sf instead sends the original shares of bval Apsoc in Theorem 2.
k as shares
of b′ associated to Apsoc . This concludes the security proofs of our training and

– No simulation is required to assign Jbval k K = Jb K. prediction protocols.
The proof now simply follows from the fact that simulated
view and real-world view of the adversary are computationally A PPENDIX D
indistinguishable and concludes that Πtrain securely realizes S AFE N ET E XTENSIONS
functionality FpTrain . A. Inference phase in Transfer Learning Setting
Now given the output of Πtrain protocol is an ensemble We provide a modified version of SafeNet’s Inference algo-
E, we showed in the proof of Theorem 6 that E correctly rithm in the transfer learning setting, to improve the running
classifies a sample with probability at least pc . As a result the time and communication complexity of SafeNet. Algorithm 3
underlying trained model also provides poisoning robustness provides the details of SafeNet’s prediction phase below.
against Apsoc .
Algorithm 3 SafeNet Inference for Transfer Learning Setting
We use a similar argument to show protocol Πpred is secure Input: Secret-shares of backbone model MB , ensemble of
against adversary Apsoc . m fine-tuned models E = {M1 , . . . , Mm }, vector bval and
client query x.
Theorem 7. Protocol Πpred is secure against adversary Apsoc // MPC computation in secret-shared format
who poisons t out of m data owners and corrupts T out of – Construct vector z of all zeros of size L, where L denotes
N servers. the number of distinct class labels.
Proof. Let Apsoc be a real-world adversary that poisons t out – Run forward pass on MB with input x upto its last l
of m owners and semi honestly corrupts T out of N servers layers, where p denotes the output vector from that layer.
at the beginning of Πpred protocol. We present steps of the – For k ∈ [1, m] :
ideal-world adversary (simulator) Sf for Apsoc . Sf on behalf - Run forward pass on the last l layers of Mk with input
of the honest servers, sets their shares as random values in as p. Let the output of the computation be Preds, which
Z2ℓ and simulates each step of Πpred protocol to the corrupt is one-hot encoding of the predicted label.
servers as follows: - Multiply bval
k to each element of Preds.
– Sf simulates messages on behalf of honest servers as a - Add z = z + Preds.
part of the protocol steps of Πzvec with public value L as – Run argmax with input as z and obtain OP as the output.
the input and eventually sends and receives appropriate return OP
J·K-shares of z to and from Apsoc .
– For k ∈ [1, m′ ]: B. Training with Computationally Restricted Owners
– Sf simulates messages on behalf of honest servers, as In this section we provide a modified version of SafeNet’s
a part of the protocol steps of ΠMpred , which takes Training Algorithm, to accommodate when a subset of data
input as J·K-shares of Mk and x. Sf eventually sends owners are computationally restricted, i.e., they can not train
and receives appropriate J·K-shares of Preds to and their models locally. Algorithm 4 provides the details of
from Apsoc . SafeNet’s training steps below.
– For every multiplication of Jbval
k K with respect to each
element in Preds, Sf simulates messages on behalf of A PPENDIX E
honest servers, as a part of the protocol steps of Πmult , A DDITIONAL E XPERIMENTS
which takes input as J·K-shares of Predsj and bval k . Sf A. Evaluation of SafeNet Extensions
eventually sends and receives appropriate J·K-shares of
p a) Integration Testing: Here, we evaluate the perfor-
bval
k × Predsj to and from Asoc . mance of SafeNet by varying the concentration parameter
– No simulation is required to update JzK as addition α to manipulate the degree of data similarity among the
happens locally. owners. The experiments are performed with the same neural
– Sf simulates messages on behalf of honest servers, as network architecture from Section IV-G on the Fashion dataset.
a part of the protocol steps of Πamx , which takes input Figure 10 gives a comprehensive view of the variation in test
as J·K-shares of z. At the end Sf instead forwards the accuracy and attack success rate for backdoor and targeted
original J·K-shares of OP associated to Apsoc . attacks over several values of α.

18
Algorithm 4 SafeNet Training with Computationally Re- majority vote will have a larger chance of errors. In such
stricted Owners cases it is easier for the adversary to launch an attack as there
Input: m total data owners of which mr subset of owners is rarely any agreement among the models in the ensemble,
are computationally restricted, each owner Ck ’s dataset Dk . and the final output is swayed towards the target label of
// Computationally Restricted Owner’s local computation in plaintext attackers’ choice. Figure 10 shows that for both targeted
– For k ∈ [1, mr ] : and backdoor attacks, SafeNet holds up well until α reaches
- Separate out Dkv from Dk . extremely small values (α = 0.1), at which point we observe
- Secret-share cross-validation dataset Dkv and training the robustness break down. However, the design of SafeNet
dataset Dk \ Dkv to servers. allows us to detect difference in owners’ distributions at early
// Computationally Unrestricted Owner’s local computation in plaintext stages of our framework. For instance, we experiment for
– For k ∈ [mr+1 , m] : α = 0.1 and observe that the average AccVal accuracy of the
- Separate out Dkv from Dk . Train Mk on Dk \ Dkv . models is 17%. Such low accuracies for most of the models
- Secret-share Dkv and Mk to servers. in the ensemble indicate non-identical distributions and we
recommend not to use SafeNet in such cases.
// MPC computation in secret-shared format
b) Low Resource Users: We instantiate our Fashion
1. For k ∈ [1, mr ] :
dataset setup in the 3PC setting and assume 2 out of 10 data
- Train Mk on Dk \ Dkv . owners are computationally restricted. We observe SafeNet
2. Construct a common validation dataset Dval = ∪m i=1 Di
v
still runs 1.82× faster and requires 3.53× less communication
m
and collect ensemble of models E = {Mi }i=1 compared to the existing PPML framework, while retaining its
3. Initialize a vector bval of zeros and of size m. robustness against poisoning and privacy attacks.
4. For k ∈ [1, m] :
- AccValk = Accuracy(Mk , Dval ) B. Logistic Regresssion, Multiclass Classification
- If AccValk > ϕ: We use the same strategies for the Backdoor and Targeted
– Set bval
k =1 attacks on the MNIST dataset. For BadNets, we select the
return E and bval initial class ys = 4 and the target label yt = 9, and use
the same yt = 9 for the targeted attack. Table IV provides
a detailed analysis of the training time, communication, test
Test Accuracy TGT-Top
100
accuracy, and success rate for both frameworks, in presence of
100 α = 0.1
SafeNet Framework
α=1 a single poisoned owner. The worst-case adversarial success
Ideal Success Rate (in %)

α = 10
80 80 α = 100
for SafeNet is in Figure 11. The slow rise in the success
Test Accuracy (in %)

α = 1000

60 60 rate of the adversary across multiple attacks shows the robust


40 40 accuracy property of our framework translates smoothly for
20 20
the case of a multi-class classification problem.
0 0
0 1 2 3 4 5
100
0.1 1 10 100 1000 SafeNet-TGT-Top
Ideal Success Rate (in %)

α # Poisoned Data Owers SafeNet-TGT-Foot


SafeNet-Backdoor
TGT-Foot Backdoor
100 100
α = 0.1
α=1
50
Ideal Success Rate (in %)

Ideal Success Rate (in %)

α = 10
80 α = 100
80 α = 1000

60
60
40
α = 0.1
α=1
0
40 α = 10
α = 100
20 0 1 2 3 4 5 6 7 8 9 10
α = 1000 # Poisoned Data Owers
20 0
0 1 2 3 4
# Poisoned Data Owers
5 0 1 2 3 4
# Poisoned Data Owers
5
Fig. 11: Worst-case adversarial success of multi-class logistic re-
gression on MNIST in the SafeNet framework for backdoor and
Fig. 10: Test Accuracy and Worst-case Adversarial Success in a three targeted attacks. The adversary can change the set of c poisoned
layer neural network model trained on Fashion dataset using SafeNet owners per sample. SafeNet achieves certified robustness up to 9
for varying data distributions. Parameter α dictates the similarity of poisoned owners out of 20 against backdoor and TGT-TOP attacks.
distributions between the owners. Higher values of α denote greater The TGT-Foot attack targeting low-confidence samples has slightly
similarity in data distributions among the owners and results in higher success, as expected.
increased SafeNet robustness.
C. Evaluation on Deep Learning Models
We observe that as α decreases, i.e., the underlying data Experiments on Fashion Dataset. We present results on one
distribution of the owners becomes more non-iid, the test and two layer deep neural networks trained on the Fashion
accuracy of SafeNet starts to drop. This is expected as there dataset. We perform the same set of backdoor and targeted
will be less agreement between the different models, and the attacks as described in Section IV. Tables V and VI provide

19
Backdoor Attack Targeted Attack
MPC Setting Framework Training Time (s) Communication (GB)
Test Accuracy Success Rate Test Accuracy Success Rate-Top Success Rate-Foot
PPML n×243.55 n×55.68 89.14% 100% 87.34% 83% 90%
3PC [4] Semi-Honest
SafeNet 10.03 2.05 88.68% 4% 88.65% 1% 10%
PPML n×588.42 n×105.85 89.14% 100% 87.22% 83% 90%
4PC [28] Malicious
SafeNet 23.39 3.78 88.65% 4% 88.65% 1% 10%

TABLE IV: Training time (in seconds) and Communication (in GB) over a LAN network for traditional PPML and SafeNet framework
training a multiclass logistic regression on MNIST. n denotes the number of epochs in the PPML framework. The time and communication
reported for SafeNet is for end-to-end execution. Test Accuracy and Success Rate are given for a single poisoned owner.

MPC Setting Framework Training Time (s) Communication (GB)


detailed analysis of the training time, communication, test
PPML n×8.72 n×0.87
accuracy, and success rate for traditional PPML and SafeNet Semi-Honest [4]
SafeNet 5.79 1.32
3PC
frameworks. We observe similar improvements, where for PPML n×223.15 n×16.49
Malicious [28]
instance in the 4PC setting, SafeNet has 42× and 43× SafeNet 179.58 19.29

improvement in training time and communication complexity PPML n×18.54 n×1.69


4PC Malicious [28]
SafeNet 14.67 2.53
over the PPML framework, for n = 10 epochs for a two hidden
layer neural network. Figure 12 shows the worst-case attack TABLE VII: Training Time (in seconds) and Communication (in
success in SafeNet (where the attacker can choose the subset GB) for training a single layer neural network model on the Adult
dataset. n denotes the number of epochs required for training the
of corrupted owners per sample) and the results are similar to the neural network in the PPML framework. The values reported for
Figure 5. SafeNet are for its total execution.
(a) Backdoor (b) Targeted
1-Layer NN 2-Layer NN
100 100
100 SafeNet-TGT-Top 100 SafeNet-TGT-Top
SafeNet-TGT-Random SafeNet-TGT-Random
SafeNet-TGT-Foot SafeNet-TGT-Foot 80
Success Rate (in %)

Success Rate (in %)


80
Ideal Success Rate (in %)

Ideal Success Rate (in %)

SafeNet-Backdoor SafeNet-Backdoor
80 80
60
60
60 60 PPML Framework PPML Framework
SafeNet Framework 40 SafeNet Framework
40
40 40 20
20
20 20 0

0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10
0 0 # Poisoned Data Owers # Poisoned Data Owers
0 1 2 3 4 5 0 1 2 3 4 5 (c) Worst-case Adversary
# Poisoned Data Owers # Poisoned Data Owers 100
SafeNet-Targeted
SafeNet-Backdoor
Ideal Success Rate (in %)

Fig. 12: Worst-case adversarial success of one and two layer Neural 80

Networks on FASHION dataset in SafeNet framework for varying 60

poisoned owners. 40

20

MPC Setting No. Hidden Layers Framework Training Time (s) Communication (GB) 0
0 1 2 3 4 5 6 7 8 9 10
PPML n×382.34 n× 96.37
1 # Poisoned Data Owers
SafeNet 65.71 14.58
3PC [4] Semi-Honest
PPML n×474.66 n× 125.58
2
SafeNet 108.12 27.98 Fig. 13: Attack Success Rate and a Neural Network in PPML and
1
PPML n×869.12 n× 174.12 SafeNet frameworks, trained over Adult dataset, for varying corrupt
SafeNet 152.68 26.89
4PC [28] Malicious owners launching Backdoor (a) and Targeted (b) attacks. Plot (c)
PPML n×1099.06 n×227.23
2
SafeNet 258.72 51.66 gives the worst-case adversarial success of SafeNet when a different
set of poisoned owners is allowed per sample.
TABLE V: Training Time (in seconds) and Communication (in GB)
of PPML and SafeNet frameworks for one and two layer neural Experiments on Adult Dataset. We use a similar attack
network on Fashion dataset, where n denotes the number of epochs.
The time and communication reported for SafeNet framework is for strategy as used for logistic regression model in Section IV-E.
end-to-end execution. We observe that no instance is present with true label y = 1
for feature capital-loss = 1. Consequently, we choose a set
of k = 100 target samples {xti }ki=1 with true label ys = 0,
and create backdoored samples {P ert(xti ), yt = 1}ki=1 , where
Backdoor Attack Targeted Attack
MPC Setting No. Hidden Layers Framework Test Accuracy
Success Rate Success Rate-Top Success Rate-Foot

1
PPML
SafeNet
82.40%
84.45%
100%
0%
100%
0%
100%
38% P ert(·) function sets the capital-loss feature in xt to 1.
3PC [4] Semi-Honest
2
PPML
SafeNet
83.92%
84.93%
100%
0%
100%
0%
100%
46% For the targeted attack, we only use TGT-Top because more
1
PPML
SafeNet
82.82%
84.44%
100%
0%
100%
0%
100%
38% than 50 out of 100 samples for TGT-Foot are mis-classified
4PC [28] Malicious
2
PPML
SafeNet
83.80%
84.86%
100%
0%
100%
0%
100%
46% before poisoning. Table VII provides the training time and
communication complexity of both PPML and SafeNet frame-
TABLE VI: Test Accuracy and Success Rate of PPML and SafeNet
frameworks for one and two layer neural network on Fashion dataset, works. Figure 13 (a) and (b) provide the success rates in
in presence of a single poisoned owner. both frameworks and show the resilience of SafeNet against
backdoor and targeted attacks.

20

You might also like