Security and Privacy in Machine Learning
Nicolas Papernot
Pennsylvania State University & Google Brain
Lecture for Prof. Trent Jaeger’s CSE 543 Computer Security Class
Patrick McDaniel (Penn State)
Martín Abadi (Google Brain), Pieter Abbeel (Berkeley), Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State), Yan Duan (OpenAI), Úlfar Erlingsson (Google Brain), Matt Fredrikson (CMU), Ian Goodfellow (Google Brain), Kathrin Grosse (CISPA), Sandy Huang (Berkeley), Somesh Jha (U of Wisconsin), Alexey Kurakin (Google Brain), Praveen Manoharan (CISPA), Ilya Mironov (Google Brain), Ananth Raghunathan (Google Brain), Arunesh Sinha (U of Michigan), Shuang Song (UCSD), Ananthram Swami (US ARL), Kunal Talwar (Google Brain), Florian Tramèr (Stanford), Michael Wellman (U of Michigan), Xi Wu (Google)
Machine Learning Classifier
[Figure: at inference, the classifier maps an input to a vector of class probabilities, e.g. [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01].]

Machine Learning Classifier
[Figure: at training, each input is paired with a one-hot label vector, e.g. [0 1 0 0 0 0 0 0 0 0].]
Outline of this lecture
1 Security in ML
2 Privacy in ML
Part I
Attack Models
Attacker may see the model: it is bad even if the attacker needs to know the details of the machine learning model to mount an attack --- a white-box attacker.
Attacker may not need the model: it is worse if an attacker who knows very little (e.g., one who only gets to ask a few questions) can mount an attack --- a black-box attacker.
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
Adversarial examples (white-box attacks)
Jacobian-based Saliency Map Approach (JSMA)
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
Jacobian-Based Iterative Approach: source-target misclassification
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
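A minimal NumPy sketch of the saliency-map idea behind JSMA for source-target misclassification: perturb first the features that raise the target class score while lowering all others. The `predict` and `jacobian` callables are assumed model helpers, and the paper defines its saliency map over pairs of features; this single-feature variant only illustrates the approach.

```python
import numpy as np

def saliency_map(jac, target):
    """Saliency of each input feature for pushing the model toward `target`.

    jac: (num_classes, num_features) Jacobian of the class scores w.r.t. the input.
    """
    alpha = jac[target]                # effect of each feature on the target class
    beta = jac.sum(axis=0) - alpha     # combined effect on all other classes
    # A feature is salient if increasing it raises the target score (alpha > 0)
    # while lowering the other scores (beta < 0).
    mask = (alpha > 0) & (beta < 0)
    return np.where(mask, alpha * np.abs(beta), 0.0)

def jsma(x, target, predict, jacobian, theta=1.0, max_features=20):
    """Greedily perturb the most salient features until `target` is predicted.

    predict(x) -> predicted class; jacobian(x) -> (num_classes, num_features) array.
    Both are assumed helpers exposing the model under attack.
    """
    x_adv = x.copy()
    for _ in range(max_features):
        if predict(x_adv) == target:
            break
        scores = saliency_map(jacobian(x_adv), target)
        i = int(np.argmax(scores))
        if scores[i] == 0:             # no single feature helps any more
            break
        x_adv[i] = np.clip(x_adv[i] + theta, 0.0, 1.0)
    return x_adv
```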
Evading a Neural Network Malware Classifier
Add constraints to the JSMA approach:
- only add features: keep malware behavior
- only features from the manifest: easy to modify
Before perturbation: P[X=Malware] = 0.90, P[X=Benign] = 0.10
After perturbation: P[X*=Malware] = 0.10, P[X*=Benign] = 0.90
Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
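In the malware setting only the admissible perturbations change: features may only be added (never removed, so the malware behavior is preserved) and only manifest features are considered. A hedged sketch of that filter on top of the saliency scores above; `manifest_mask` is an assumed boolean array over the (binary) feature vector.

```python
import numpy as np

def admissible_feature(scores, x, manifest_mask):
    """Restrict the JSMA choice to *adding* manifest features.

    scores: saliency scores from saliency_map() above;
    x: current binary feature vector of the application;
    manifest_mask: assumed boolean array marking manifest-derived features.
    """
    allowed = manifest_mask & (x == 0)          # only add features, never remove any
    constrained = np.where(allowed, scores, 0.0)
    i = int(np.argmax(constrained))
    return i if constrained[i] > 0 else None    # None: no admissible feature left
```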
Supervised vs. reinforcement learning
Model inputs:
- Supervised learning: an observation (e.g., traffic sign, music, email)
- Reinforcement learning: an environment and a reward function
Model outputs:
- Supervised learning: a class (e.g., stop/yield, jazz/classical, spam/legitimate)
- Reinforcement learning: an action
Training "goal":
- Supervised learning: minimize class prediction error (i.e., cost/loss) over pairs of (inputs, outputs)
- Reinforcement learning: maximize reward by exploring the environment and taking actions
Adversarial attacks on neural network policies
Huang et al. Adversarial Attacks on Neural Network Policies
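Huang et al. perturb the observations fed to a trained policy with the fast gradient sign method (FGSM). A minimal NumPy sketch, assuming the attacker can compute the gradient of a suitable loss with respect to the observation; `grad_loss_wrt_obs` and `policy.act` in the usage comment are hypothetical helpers, not part of any particular library.

```python
import numpy as np

def fgsm_on_observation(obs, obs_grad, epsilon=0.005):
    """Fast gradient sign method on a policy's input observation.

    obs_grad: gradient of a suitable loss (e.g., the negative log-probability of
    the policy's currently preferred action) with respect to `obs`. Under an
    L-infinity budget of `epsilon`, the worst-case first-order step is its sign.
    """
    return np.clip(obs + epsilon * np.sign(obs_grad), 0.0, 1.0)

# Illustrative use at every timestep (hypothetical helpers):
#   adv_obs = fgsm_on_observation(obs, grad_loss_wrt_obs(policy, obs))
#   action = policy.act(adv_obs)   # the agent now acts on the perturbed frame
```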
Adversarial examples (black-box attacks)
Threat model of a black-box attack
Adversarial capabilities: the adversary has no access to the training data, the model architecture, the model parameters, or the model scores --- only (limited) oracle access to labels for inputs of its choice.
Our approach to black-box attacks
Adversarial example transferability
Adversarial examples have a transferability property: an example crafted to be misclassified by one model is often also misclassified by a different model trained for the same task.
Szegedy et al. Intriguing properties of neural networks
Cross-technique transferability
Transferability also holds across ML techniques: examples crafted against a deep neural network often transfer to models such as logistic regression, support vector machines, decision trees, and nearest neighbors (and vice versa).
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
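A sketch of how cross-technique transferability can be measured empirically: craft adversarial examples against one model family only, then count how often they also fool another. The scikit-learn models and the crafting step producing `x_adv` are illustrative assumptions, not the exact setup of the cited paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

def transfer_rate(x_adv, y_true, source_model, target_model):
    """Fraction of adversarial examples crafted on `source_model` that also
    fool `target_model`, measured over examples that fooled the source."""
    fools_source = source_model.predict(x_adv) != y_true
    fools_target = target_model.predict(x_adv) != y_true
    return fools_target[fools_source].mean() if fools_source.any() else 0.0

# Illustrative usage: train two different model families on the same data,
# craft x_adv against the neural network only, then measure the transfer:
#   mlp = MLPClassifier(hidden_layer_sizes=(64,)).fit(x_train, y_train)
#   logreg = LogisticRegression(max_iter=1000).fit(x_train, y_train)
#   print(transfer_rate(x_adv, y_test, source_model=mlp, target_model=logreg))
```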
Our approach to black-box attacks
Adversarial example transferability from a substitute model to the target model
Attacking remotely hosted black-box models
(1) The adversary queries the remote ML system for labels on inputs of its choice.
(2) The adversary uses this labeled data to train a local substitute for the remote system.
(3) The adversary selects new synthetic inputs for queries to the remote ML system, based on the local substitute's output surface sensitivity to input variations.
(4) The adversary then uses the local substitute to craft adversarial examples, which are misclassified by the remote ML system because of transferability.
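A condensed sketch of steps (1)-(4) above, under assumed interfaces: `remote_label(x)` stands for a query to the remote API, and `substitute` is any local model exposing `fit` and `jacobian`; the augmentation follows the Jacobian-based heuristic described in step (3).

```python
import numpy as np

def train_substitute(seed_inputs, remote_label, substitute, rounds=5, lam=0.1):
    """Substitute training with Jacobian-based dataset augmentation.

    remote_label(x) -> label returned by the remote black-box system (step 1);
    substitute: local model exposing fit(X, y) and jacobian(x, label).
    """
    X = np.array(seed_inputs, dtype=float)
    for _ in range(rounds):
        y = np.array([remote_label(x) for x in X])        # (1) query the oracle
        substitute.fit(X, y)                              # (2) imitate it locally
        # (3) add synthetic points in the directions the substitute's output is
        #     most sensitive to, to better map the remote decision boundary
        X_new = np.array([
            np.clip(x + lam * np.sign(substitute.jacobian(x, label)), 0.0, 1.0)
            for x, label in zip(X, y)
        ])
        X = np.concatenate([X, X_new])
    return substitute

# (4) Adversarial examples crafted on the substitute (e.g., with FGSM or JSMA)
# are then sent to the remote system, which misclassifies them by transferability.
```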
Our approach to black-box attacks
Adversarial example transferability from a substitute model to the target model + synthetic data generation
Results on real-world remote systems
[Table: for each remote platform, the ML technique it uses, the number of queries made, and the rate of adversarial examples misclassified (after querying).]
All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples).
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Benchmarking progress in the adversarial ML community
Growing community
1.3K+ stars
340+ forks
40+ contributors
Adversarial examples represent worst-case distribution drifts
[DDS04] Dalvi et al. Adversarial Classification (KDD)
Adversarial examples are a tangible instance of hypothetical AI safety problems
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
Part II
Types of adversaries and our threat model
Black-box: the adversary can only query the model (model querying, black-box adversary).
Shokri et al. (2016) Membership Inference Attacks against ML Models
Fredrikson et al. (2015) Model Inversion Attacks
[Figure: a randomized algorithm run on two neighbouring datasets produces answers (Answer 1, Answer 2, ..., Answer n) whose distributions the adversary cannot tell apart.]
Our design goals
Teacher ensemble
[Figure: the sensitive data is split into n disjoint partitions; Partition 1 trains Teacher 1, Partition 2 trains Teacher 2, ..., Partition n trains Teacher n.]
Intuitive privacy analysis
If most teachers agree on a label, that label does not depend on any single partition, and therefore reveals little about any single training example.
Noisy aggregation
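A minimal sketch of the noisy aggregation mechanism: count the teachers' votes, perturb each count with Laplace noise of scale 1/γ, and return the noisy argmax. Each teacher is assumed to expose a `predict(x)` method returning a class index.

```python
import numpy as np

def noisy_aggregate(teachers, x, num_classes, gamma=0.05, rng=None):
    """Noisy argmax over the teachers' votes.

    Laplace noise of scale 1/gamma is added to every vote count so that no
    single teacher (hence no single data partition) can change the returned
    label with high probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = np.zeros(num_classes)
    for teacher in teachers:
        votes[teacher.predict(x)] += 1          # each teacher votes for one class
    votes += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(votes))
```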
Teacher ensemble
[Figure: the teachers' predictions are combined by an Aggregated Teacher (noisy aggregation). A Student is then trained on Public Data labeled by sending Queries to the Aggregated Teacher.]
Student training
Not available to the adversary: the sensitive data, its partitions, the teachers, and the aggregated teacher.
Available to the adversary: the public data, the student, and the student's answers to inference queries.
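A sketch of the student step under the same assumed interfaces: the noisy aggregated teacher labels a limited number of public inputs, and only the student trained on those labels is ever exposed to inference queries. The PATE papers train the student semi-supervised; plain supervised fitting is used here to keep the sketch short.

```python
def train_student(student, public_data, teachers, num_classes, num_queries=1000):
    """Train the student on public inputs labeled by the noisy aggregate.

    Only a limited number of queries is made (each one spends privacy budget);
    the sensitive partitions and the teachers stay behind the privacy barrier,
    and the adversary only ever interacts with the trained student.
    """
    X = public_data[:num_queries]
    y = [noisy_aggregate(teachers, x, num_classes) for x in X]
    student.fit(X, y)                           # student: any model with fit/predict
    return student
```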
Differential privacy analysis
Differential privacy: a randomized algorithm M satisfies (ε, δ)-differential privacy if, for all pairs of neighbouring datasets (d, d′) and for all subsets S of outputs:
Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d′) ∈ S] + δ
Experimental results
Experimental setup
[Table: for each dataset, the teacher model and the student model used.]
.../models/tree/master/differential_privacy/multiple_teachers
Aggregated teacher accuracy
Trade-off between student accuracy and privacy
UCI Diabetes: (ε, δ) = (1.44, 10^-5)
Non-private baseline: 93.81%
Student accuracy: 93.94%
Synergy between privacy and generalization
Some online resources:
www.papernot.fr
@NicolasPapernot
Gradient masking
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses