
Practical Secure Aggregation for
Federated Learning on User-Held Data

Keith Bonawitz*, Vladimir Ivanov*, Ben Kreuter*, Antonio Marcedone†*,
H. Brendan McMahan*, Sarvar Patel*, Daniel Ramage*, Aaron Segal*, and Karn Seth*

* Google, Mountain View, California 94043
{bonawitz,vlivan,benkreuter,mcmahan,sarvar,dramage,asegal,karn}@google.com

† Cornell University, Ithaca, New York 14853
[email protected]

arXiv:1611.04482v1 [cs.CR] 14 Nov 2016
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

1 Introduction
Secure Aggregation is a class of Secure Multi-Party Computation algorithms wherein a group of mutually distrustful parties u ∈ U each hold a private value x_u and collaborate to compute an aggregate value, such as the sum $\sum_{u \in U} x_u$, without revealing to one another any information about their private value except what is learnable from the aggregate value itself. In this work, we consider training a deep neural network in the Federated Learning model, using distributed gradient descent across user-held training data on mobile devices, using Secure Aggregation to protect the privacy of each user's model gradient. We identify a combination of efficiency and robustness requirements which, to the best of our knowledge, are unmet by existing algorithms in the literature. We proceed to design a novel, communication-efficient Secure Aggregation protocol for high-dimensional data that tolerates up to 1/3 of users failing to complete the protocol. For 16-bit input values, our protocol offers 1.73× communication expansion for $2^{10}$ users and $2^{20}$-dimensional vectors, and 1.98× expansion for $2^{14}$ users and $2^{24}$-dimensional vectors.

2 Secure Aggregation for Federated Learning


Consider training a deep neural network to predict the next word that a user will type as she composes a text message, in order to improve typing accuracy for a phone's on-screen keyboard [11]. A modeler may
wish to train such a model on all text messages across a large population of users. However, text
messages frequently contain sensitive information; users may be reluctant to upload a copy of them
to the modeler’s servers. Instead, we consider training such a model in a Federated Learning setting,
wherein each user maintains a private database of her text messages securely on her own mobile
device, and a shared global model is trained under the coordination of a central server based upon
highly processed, minimally scoped, ephemeral updates from users [14, 17].
A neural network represents a function f(x, Θ) = y mapping an input x to an output y, where f is parameterized by a high-dimensional vector Θ ∈ R^k. For modeling text message composition, x might encode the words entered so far and y a probability distribution over the next word. A training example is an observed pair ⟨x, y⟩ and a training set is a collection D = {⟨x_i, y_i⟩; i = 1, …, m}. We define a loss on a training set $L_f(D, \Theta) = \frac{1}{|D|}\sum_{\langle x_i, y_i\rangle \in D} L_f(x_i, y_i, \Theta)$, where $L_f(x, y, \Theta) = \ell(y, f(x, \Theta))$ for a loss function ℓ, e.g., ℓ(y, ŷ) = (y − ŷ)². Training consists of finding parameters Θ that achieve small $L_f(D, \Theta)$, typically using a variant of minibatch stochastic gradient descent [4, 10].
In the Federated Learning setting, each user u ∈ U holds a private set D_u of training examples with $D = \bigcup_{u \in U} D_u$. To run stochastic gradient descent, for each update we select data from a random subset U′ ⊂ U and form a (virtual) minibatch $B = \bigcup_{u \in U'} D_u$ (in practice we might have, say, |U′| = 10⁴ while |U| = 10⁷; we might also only consider a subset of each user's local dataset). The minibatch loss gradient $\nabla L_f(B, \Theta)$ can be rewritten as a weighted average across users: $\nabla L_f(B, \Theta^t) = \frac{1}{|B|}\sum_{u \in U'} \delta_u^t$ where $\delta_u^t = |D_u| \, \nabla L_f(D_u, \Theta^t)$. A user can thus share just $\langle |D_u|, \delta_u^t \rangle$ with the server, from which a gradient descent step $\Theta^{t+1} \leftarrow \Theta^t - \eta \frac{\sum_{u \in U'} \delta_u^t}{\sum_{u \in U'} |D_u|}$ may be taken.
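As a concrete illustration, the following minimal sketch (ours, not from the paper; the toy data and the `server_gradient_step` helper are hypothetical) shows the server-side step computed from the users' ⟨|D_u|, δ_u^t⟩ pairs:

```python
import numpy as np

def server_gradient_step(theta, user_updates, eta=0.1):
    """One federated SGD step from user-supplied pairs (|D_u|, delta_u),
    where delta_u = |D_u| * grad L_f(D_u, theta); the weighted average
    below is then exactly the minibatch gradient."""
    total_examples = sum(n_u for n_u, _ in user_updates)
    summed_delta = sum(delta_u for _, delta_u in user_updates)
    return theta - eta * summed_delta / total_examples

# Toy usage: three users, a 5-dimensional parameter vector.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
updates = [(n, n * rng.normal(size=5)) for n in (10, 3, 7)]  # (|D_u|, delta_u^t)
theta = server_gradient_step(theta, updates)
```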

Although each update $\langle |D_u|, \delta_u^t \rangle$ is ephemeral and contains less information than the raw D_u, a user might still wonder what information remains. There is evidence that a trained neural network's parameters sometimes allow reconstruction of training examples [8, 17, 1]; might the parameter updates be subject to similar attacks? For example, if the input x is a one-hot vocabulary-length vector encoding the most recently typed word, common neural network architectures will contain at least one parameter θ_w in Θ for each word w such that $\frac{\partial L_f}{\partial \theta_w}$ is non-zero only when x encodes w. Thus, the set of recently typed words in D_u would be revealed by inspecting the non-zero entries of $\delta_u^t$. The server does not need to inspect any individual user's update, however; it requires only the sums $\sum_{u \in U'} |D_u|$ and $\sum_{u \in U'} \delta_u^t$. Using a Secure Aggregation protocol would ensure that the server learns only that one or more users in U′ wrote the word w, but not which users.
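This kind of leakage is easy to demonstrate. In the sketch below (ours; a toy linear model stands in for one layer of a larger network, and the vocabulary is hypothetical), the gradient is non-zero only in the column for the word the user typed:

```python
import numpy as np

vocab = ["the", "cat", "sat", "password"]
x = np.zeros(len(vocab))
x[3] = 1.0                              # one-hot input: the user typed "password"
W = np.random.default_rng(1).normal(size=(2, len(vocab)))
y_target = np.array([1.0, 0.0])
grad_W = np.outer(W @ x - y_target, x)  # gradient of (1/2)||Wx - y||^2 w.r.t. W
leaked = [vocab[j] for j in range(len(vocab)) if np.any(grad_W[:, j] != 0)]
print(leaked)                           # ['password']
```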
Federated Learning systems face several practical challenges. Mobile devices have only sporadic access to power and network connectivity, so the set U participating in each update step is unpredictable and the system must be robust to users dropping out. Because Θ may contain millions of parameters, updates $\delta_u^t$ may be large, representing a direct cost to users on metered network plans. Mobile devices also generally cannot establish direct communication channels with other mobile devices (relying on a server or service provider to mediate such communication), nor can they natively authenticate other mobile devices. Thus, Federated Learning motivates a need for a Secure Aggregation protocol that: (1) operates on high-dimensional vectors; (2) is communication-efficient, even with a novel set of users on each instantiation; (3) is robust to users dropping out; and (4) provides the strongest possible security under the constraints of a server-mediated, unauthenticated network model.

3 A Practical Secure Aggregation Protocol


In our protocol, there are two kinds of parties: a single server S and a collection of n users U. Each user u ∈ U holds a private vector x_u of dimension k. We assume that all elements of both x_u and $\sum_{u \in U} x_u$ are integers in the range [0, R) for some known R.¹ Correctness requires that if all parties are honest, S learns $\bar{x} = \sum_{u \in \bar{U}} x_u$ for some subset of users Ū ⊆ U where |Ū| ≥ n/2. Security requires that (1) S learns nothing other than what is inferable from x̄, and (2) each user u ∈ U learns nothing.
We consider three different threat models. In all of them, all users follow the protocol honestly, but the server may attempt to learn extra information in different ways²:
(T1) The server is honest-but-curious; that is, it follows the protocol honestly, but tries to learn as much as possible from the messages it receives from users.
(T2) The server can lie to users about which other users have dropped out, including reporting dropouts inconsistently among different users.
(T3) The server can lie about who dropped out (as in T2) and can also access the private memory of some limited number of users (who are themselves following the protocol honestly). (In this case, the privacy requirement applies only to the inputs of the remaining users.)
Protocol 0: Masking with One-Time Pads. We develop our protocol in a series of refinements. We begin by assuming that all parties complete the protocol and possess pairwise secure communication channels with ample bandwidth. Each pair of users first agrees on a matched pair of input perturbations. That is, user u samples a vector s_{u,v} uniformly from [0, R)^k for each other user v. Users u and v exchange s_{u,v} and s_{v,u} over their secure channel and compute perturbations $p_{u,v} = s_{u,v} - s_{v,u} \pmod{R}$, noting that $p_{u,v} = -p_{v,u} \pmod{R}$ and taking $p_{u,v} = 0$ when u = v. Each user sends to the server $y_u = x_u + \sum_{v \in U} p_{u,v} \pmod{R}$. The server simply sums the perturbed values: $\bar{x} = \sum_{u \in U} y_u \pmod{R}$. Correctness is guaranteed because the paired perturbations in y_u cancel:

$$\bar{x} = \sum_{u \in U} x_u + \sum_{u \in U}\sum_{v \in U} p_{u,v} = \sum_{u \in U} x_u + \sum_{u \in U}\sum_{v \in U} (s_{u,v} - s_{v,u}) = \sum_{u \in U} x_u \pmod{R}.$$

Protocol 0 guarantees perfect privacy for the users: because the s_{u,v} factors that users add are uniformly sampled, the y_u values appear uniformly random to the server, subject to the constraint that $\bar{x} = \sum_{u \in U} y_u \pmod{R}$. In fact, even if the server can access the memory of some users, privacy holds for those remaining.³
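The mechanics of Protocol 0 fit in a few lines. The sketch below is our illustration, with a non-cryptographic RNG and direct access to all s_{u,v} standing in for the pairwise secure channels:

```python
import numpy as np

R, k, n = 2**16, 8, 4                     # modulus, vector length, number of users
rng = np.random.default_rng(42)
x = rng.integers(0, R // n, size=(n, k))  # private inputs (kept small so the sum stays below R)

# Each ordered pair (u, v) holds s[u][v], sampled uniformly from [0, R)^k.
s = rng.integers(0, R, size=(n, n, k))

# y_u = x_u + sum_v (s[u][v] - s[v][u]) mod R; s[u][u] - s[u][u] = 0 covers u = v.
y = np.array([(x[u] + sum(s[u][v] - s[v][u] for v in range(n))) % R
              for u in range(n)])

# The server sums the masked vectors; the pairwise perturbations cancel.
assert np.array_equal(y.sum(axis=0) % R, x.sum(axis=0) % R)
```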
¹ Federated Learning updates δ_u ∈ R^k can be mapped to [0, R)^k through a combination of clipping/scaling, linear transform, and (stochastic) quantization.
² We do not analyze security against arbitrarily malicious servers and users that may collude. We defer this case and a more formal security analysis to the full version.
³ A more complete and formal argument is deferred to the full version of this paper.

Protocol 1: Dropped User Recovery using Secret Sharing. Unfortunately, Protocol 0 fails several of our design criteria, including robustness: if any user u fails to complete the protocol by sending her y_u to the server, the resulting sum will be masked by the perturbations that y_u would have cancelled.
To achieve robustness, we first add an initial round to the protocol in which user u generates a
public/private keypair, and broadcasts the public key over the pairwise channels. All future messages
from u to v will be intermediated by the server but encrypted with v’s public key, and signed by u,
simulating a secure authenticated channel. This allows the server to maintain a consistent view of
which users have successfully passed each round of the protocol. (We assume here, temporarily, that
the server faithfully delivers all messages between users.)
We also add a secret-sharing round between users after the s_{u,v} values have been selected. In this round, each user computes n shares of each perturbation p_{u,v} using a (t, n)-threshold scheme⁴, such as Shamir's Secret Sharing [16], for some t > n/2. For each secret user u holds, she encrypts one share with each user v's public key, then delivers all of these shares to the server. The server gathers shares from a subset of the users U₁ ⊆ U of size at least t (e.g., by waiting for a fixed period), then considers all other users dropped. The server delivers to each user v ∈ U₁ the secret shares that were encrypted for that user; all the users in U₁ now infer a consistent view of the surviving user set U₁ from the set of received shares. When a user computes y_u, she only includes those perturbations related to surviving users; that is, $y_u = x_u + \sum_{v \in U_1} p_{u,v} \pmod{R}$.
After the server has received y_u from at least t users U₂ ⊆ U₁, it proceeds to a new unmasking round, considering all other users to be dropped. From the remaining users in U₂, the server requests all shares of secrets generated by the dropped users in U₁ \ U₂. As long as |U₂| > t, each user will respond with those shares. Once the server receives shares from at least t users, it reconstructs the perturbations for U₁ \ U₂ and computes the aggregate value: $\bar{x} = \sum_{u \in U_2} y_u - \sum_{u \in U_2}\sum_{v \in U_1 \setminus U_2} p_{u,v} \pmod{R}$. Correctness is guaranteed for Ū = U₂ as long as at least t users complete the protocol. In this case, the sum x̄ includes the values of at least t > n/2 users, and all perturbations cancel out:

$$\bar{x} = \sum_{u \in U_2}\Bigl(x_u + \sum_{v \in U_1} p_{u,v}\Bigr) - \sum_{u \in U_2}\sum_{v \in U_1 \setminus U_2} p_{u,v} = \sum_{u \in U_2} x_u + \sum_{u \in U_2}\sum_{v \in U_2} p_{u,v} = \sum_{u \in U_2} x_u \pmod{R}.$$

However, security has been lost: if a server incorrectly omits u from U₂, either inadvertently (e.g., y_u arrives slightly too late) or by malicious intent, the honest users in U₂ will supply the server with all the secret shares needed to remove all the perturbations that masked x_u in y_u. This means we cannot guarantee security even against honest-but-curious servers (Threat Model T1).
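Protocol 1 presupposes a (t, n)-threshold scheme; a minimal sketch of Shamir's scheme [16] over a prime field follows (our illustration; the 61-bit Mersenne prime is a toy parameter, not a recommendation):

```python
import random

P = 2**61 - 1  # a Mersenne prime; the field for the shares (toy choice)

def share(secret, t, n):
    """Split `secret` into n Shamir shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(i, sum(c * pow(i, j, P) for j, c in enumerate(coeffs)) % P)
            for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # den^-1 via Fermat
    return total

shares = share(123456789, t=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 shares suffice
assert reconstruct(shares[1:4]) == 123456789
```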

Protocol 2: Double-Masking to Thwart a Malicious Server. To guarantee security, we introduce a double-masking structure that protects x_u even when the server can reconstruct u's perturbations. First, each user u samples an additional random value b_u uniformly from [0, R)^k during the same round as the generation of the s_{u,v} values. During the secret sharing round, the user also generates and distributes shares of b_u to each of the other users. When generating y_u, users also add this secondary mask: $y_u = x_u + b_u + \sum_{v \in U_1} p_{u,v} \pmod{R}$. During the unmasking round, the server must make an explicit choice with respect to each user u ∈ U₁: from each surviving member v ∈ U₂, the server can request either a share of the p_{u,v} perturbations associated with u or a share of the b_u for u; an honest user v will only respond if |U₂| > t, and will never reveal both kinds of shares for the same user. After gathering at least t shares of p_{u,v} for all u ∈ U₁ \ U₂ and t shares of b_u for all u ∈ U₂, the server reconstructs the secrets and computes the aggregate value: $\bar{x} = \sum_{u \in U_2} y_u - \sum_{u \in U_2} b_u - \sum_{u \in U_2}\sum_{v \in U_1 \setminus U_2} p_{u,v} \pmod{R}$.
We can now guarantee security in Threat Model T1 for t > n/2, since x_u always remains masked by either the p_{u,v}s or by b_u. It can be shown that in Threat Models T2 and T3 the thresholds must be raised to 2n/3 and 4n/5 respectively. We defer the detailed analysis, as well as the case of arbitrarily malicious and colluding servers and users, to the full version⁵.
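The following sketch (ours) traces the double-mask bookkeeping end to end; for brevity the server reads b_u and the dropped users' perturbations directly, where the real protocol would reconstruct them from secret shares:

```python
import numpy as np

R, k = 2**16, 8
rng = np.random.default_rng(7)

U1 = [0, 1, 2, 3]                      # users who completed the sharing round
U2 = [0, 1, 3]                         # users whose y_u arrived (user 2 dropped)
x = rng.integers(0, R // len(U1), size=(len(U1), k))
b = rng.integers(0, R, size=(len(U1), k))           # secondary masks b_u
s = rng.integers(0, R, size=(len(U1), len(U1), k))  # pairwise seeds s_{u,v}

def p(u, v):
    return (s[u][v] - s[v][u]) % R     # p_{u,v} = -p_{v,u} (mod R)

# Survivors send doubly-masked inputs: y_u = x_u + b_u + sum_{v in U1} p_{u,v}.
y = {u: (x[u] + b[u] + sum(p(u, v) for v in U1)) % R for u in U2}

# Server: subtract the b_u of survivors and the p_{u,v} toward dropped users.
xbar = (sum(y[u] for u in U2)
        - sum(b[u] for u in U2)
        - sum(p(u, v) for u in U2 for v in set(U1) - set(U2))) % R
assert np.array_equal(xbar, x[U2].sum(axis=0) % R)  # sum over survivors only
```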

Protocol 3: Exchanging Secrets Efficiently. While Protocol 2 is robust and secure with the right choice of t, it requires O(kn²) communication, which we address in this refinement of the protocol.
⁴ A (t, n) secret-sharing scheme allows splitting a secret into n shares, such that any subset of t shares is sufficient to recover the secret, but given any subset of fewer than t shares the secret remains completely hidden.
⁵ The security argument involves bounding the number of shares the server can recover by forging dropouts.

Table 1: Protocol 4 cost summary (derivations deferred to the full paper).

  computation:    User O(n² + kn),   Server O(kn²)⁶
  communication:  User O(n + k),     Server O(n² + kn)
  storage:        User O(n + k),     Server O(n² + k)

Figure 1: Protocol 4 communication diagram.

  Round 0 (Advertise Keys): Each user generates DH keypairs ⟨c_u^SK, c_u^PK⟩ and ⟨s_u^SK, s_u^PK⟩ and sends the public keys c_u^PK and s_u^PK to the server. The server waits for enough users, computes U₁, and broadcasts the list of received public keys to all users in U₁.
  Round 1 (Share Keys): Each user generates b_u and computes the s_{u,v}, computes t-out-of-n secret shares of b_u and s_u^SK, and sends the encrypted shares to the server. The server forwards the received encrypted shares.
  Round 2 (Masked Input Collection): Each user computes the masked input y_u and sends it to the server. The server waits for enough users, computes U₂, and sends each user the list of dropped users U₁ \ U₂.
  Round 3 (Unmasking): Each user validates that the number of live users is at least t, then sends shares of b_u for live users and shares of s_u^SK for dropped users. The server reconstructs the secrets and computes x̄ (the final aggregated value).


Observe that a single secret value may be expanded to a vector of pseudorandom values by using it to seed a cryptographically secure pseudorandom generator (PRG) [2, 9]. Thus we can generate just scalar seeds s_{u,v} and b_u and expand them to k-element vectors. Still, each user has (n − 1) secrets s_{u,v} with other users and must publish shares of all these secrets. We use key agreement to establish these secrets more efficiently. Each user generates a Diffie-Hellman secret key s^SK and public key s^PK. Users send their public keys to the server (authenticated as per Protocol 1); the server then broadcasts all public keys to all users, retaining a copy for itself. Each pair of users u, v can now agree on a secret $s_{u,v} = s_{v,u} = \mathrm{AGREE}(s_u^{SK}, s_v^{PK}) = \mathrm{AGREE}(s_v^{SK}, s_u^{PK})$. To construct perturbations, we assume a total ordering on U and take $p_{u,v} = \mathrm{PRG}(s_{u,v})$ for u < v, $p_{u,v} = -\mathrm{PRG}(s_{u,v})$ for u > v, and $p_{u,v} = 0$ for u = v (as before). The server now only needs to learn s_u^SK to reconstruct all of u's perturbations; therefore u need only distribute shares of s_u^SK and b_u during the secret sharing round. The security of Protocol 3 can be shown to be essentially identical to that of Protocol 2 in each of the different threat models.
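A sketch of the agree-then-expand step follows. The concrete primitives are our substitutions (the paper fixes none): X25519 from the `cryptography` package for AGREE, and SHAKE-256 as the PRG:

```python
import hashlib
import numpy as np
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

R, k = 2**16, 1024  # modulus and (toy) vector dimension

def expand(seed: bytes, k: int) -> np.ndarray:
    """PRG: stretch a shared seed into k pseudorandom values in [0, R)."""
    raw = hashlib.shake_256(seed).digest(2 * k)           # 2 bytes per element
    return np.frombuffer(raw, dtype=np.uint16).astype(np.int64) % R

# Key agreement: u and v derive the same seed from each other's public keys.
sk_u, sk_v = X25519PrivateKey.generate(), X25519PrivateKey.generate()
seed = sk_u.exchange(sk_v.public_key())
assert seed == sk_v.exchange(sk_u.public_key())

# With a total ordering u < v: p_{u,v} = PRG(s_{u,v}), p_{v,u} = -PRG(s_{u,v}) mod R.
p_uv = expand(seed, k)
p_vu = (-p_uv) % R
assert np.all((p_uv + p_vu) % R == 0)
```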
Protocol 4: Minimizing Trust in Practice. Protocol 3 is not practically deployable for mobile devices because they lack pairwise secure communication and authentication. We propose to bootstrap the communication protocol by replacing the exchange of public/private keys described in Protocol 1 with a server-mediated key agreement, where each user generates a Diffie-Hellman secret key c^SK and public key c^PK and advertises the latter together with s^PK.⁷ We note immediately that the server
may now conduct man-in-the-middle attacks, but argue that this is tolerable for several reasons. First,
it is essentially inevitable for users that lack authentication mechanisms or a pre-existing public-key
infrastructure. Relying only on the non-maliciousness of the bootstrapping round also constitutes
minimization of trust: the code implementing this stage is small and could be publicly audited,
outsourced to a trusted third party, or implemented via a trusted compute platform offering a remote
attestation capability [7, 6, 18]. Moreover, the protocol meaningfully increases security (by protecting
against anything less than an actively malicious attack by the server) and provides forward secrecy
(compromising the server at any time after the key exchange provides no benefit to the attacker, even
if all data and communications had been fully logged).
We summarize the protocol's performance in Table 1. Assuming that key agreement public keys and encrypted secret shares are 256 bits each, and that users' inputs are all on the same range⁸ [0, R_U − 1], each user transfers $\frac{256(7n-4) + k\lceil\log_2(n(R_U-1)+1)\rceil + n}{k\lceil\log_2 R_U\rceil}$× more data than if she sent a raw vector.
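As a sanity check (ours), plugging the paper's parameters into this formula reproduces the quoted expansion factors:

```python
from math import ceil, log2

def expansion(n, k, R_U=2**16):
    # Total bits sent per user, per the formula above, over the raw-vector bits.
    total_bits = 256 * (7 * n - 4) + k * ceil(log2(n * (R_U - 1) + 1)) + n
    raw_bits = k * ceil(log2(R_U))
    return total_bits / raw_bits

print(round(expansion(2**10, 2**20), 2))  # 1.73
print(round(expansion(2**14, 2**24), 2))  # 1.98
```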

4 Related work
The restricted case of secure aggregation in which all users but one have an input of 0 can be expressed as a dining cryptographers network (DC-net), which provides anonymity by using pairwise blinding of inputs [3, 9], allowing each user's input to be learned untraceably. Recent research has examined communication efficiency and operation in the presence of malicious users [5]. However, if even one user aborts too early, existing protocols must restart from scratch, which can be very expensive [13]. Pairwise blinding in a modulo-addition-based encryption scheme has been explored, but existing schemes are neither efficient for vectors nor robust to even a single failure [2, 12]. Other schemes (e.g., those based on the Paillier cryptosystem [15]) are very computationally expensive.
⁶ We reconstruct n secrets from aligned (t, n)-Shamir shares in O(t² + nt) by caching Lagrange coefficients.
⁷ This can be viewed as bootstrapping an SSL/TLS connection between each pair of users.
⁸ Taking R = n(R_U − 1) + 1 to ensure no overflow.

References
[1] Martín Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. arXiv preprint arXiv:1607.00133, 2016.

[2] Gergely Ács and Claude Castelluccia. I have a DREAM! (DiffeRentially privatE smArt
Metering). In International Workshop on Information Hiding, pages 118–132. Springer, 2011.
[3] David Chaum. The dining cryptographers problem: unconditional sender and recipient untrace-
ability. Journal of Cryptology, 1(1):65–75, 1988.
[4] Jianmin Chen, Rajat Monga, Samy Bengio, and Rafal Jozefowicz. Revisiting distributed synchronous SGD. In ICLR Workshop Track, 2016. URL https://arxiv.org/abs/1604.00981.
[5] Henry Corrigan-Gibbs, David Isaac Wolinsky, and Bryan Ford. Proactively accountable anony-
mous messaging in verdict. In Proceedings of the 22nd USENIX Conference on Security, pages
147–162. USENIX Association, 2013.
[6] Victor Costan and Srinivas Devadas. Intel SGX explained. Cryptology ePrint Archive, Report
2016/086, 2016. http://eprint.iacr.org/2016/086.
[7] Victor Costan, Ilia Lebedev, and Srinivas Devadas. Sanctum: Minimal hardware extensions for strong software isolation. Cryptology ePrint Archive, Report 2015/564, 2015. http://eprint.iacr.org.
[8] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit
confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC
Conference on Computer and Communications Security, pages 1322–1333. ACM, 2015.
[9] Philippe Golle and Ari Juels. Dining cryptographers revisited. In International Conference on
the Theory and Applications of Cryptographic Techniques, pages 456–473. Springer, 2004.
[10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. Book in preparation for
MIT Press, 2016.
[11] Joshua Goodman, Gina Venolia, Keith Steury, and Chauncey Parker. Language modeling for
soft keyboards. In Proceedings of the 7th international conference on Intelligent user interfaces,
pages 194–195. ACM, 2002.
[12] Slawomir Goryczka and Li Xiong. A comprehensive comparison of multiparty secure additions
with differential privacy. 2015.
[13] Young Hyun Kwon. Riffle: An efficient communication system with strong anonymity. PhD
thesis, Massachusetts Institute of Technology, 2015.
[14] H. Brendan McMahan, Eider Moore, Daniel Ramage, and Blaise Agüera y Arcas.
Communication-efficient learning of deep networks from decentralized data. arXiv preprint
arXiv:1602.05629, 2016.
[15] Vibhor Rastogi and Suman Nath. Differentially private aggregation of distributed time-series
with transformation and encryption. In Proceedings of the 2010 ACM SIGMOD International
Conference on Management of data, pages 735–746. ACM, 2010.
[16] Adi Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, 1979.
[17] Reza Shokri and Vitaly Shmatikov. Privacy-preserving deep learning. In Proceedings of the
22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1310–1321.
ACM, 2015.
[18] G Edward Suh, Dwaine Clarke, Blaise Gassend, Marten Van Dijk, and Srinivas Devadas. Aegis:
architecture for tamper-evident and tamper-resistant processing. In Proceedings of the 17th
annual international conference on Supercomputing, pages 160–171. ACM, 2003.
