
Security and Privacy in Machine Learning
Nicolas Papernot
Pennsylvania State University & Google Brain

Lecture for Prof. Trent Jaeger’s CSE 543 Computer Security Class

November 2017 - Penn State


Thank you to my collaborators

Patrick McDaniel (Penn State), Martín Abadi (Google Brain), Pieter Abbeel (Berkeley),
Michael Backes (CISPA), Dan Boneh (Stanford), Z. Berkay Celik (Penn State),
Yan Duan (OpenAI), Úlfar Erlingsson (Google Brain), Matt Fredrikson (CMU),
Ian Goodfellow (Google Brain), Kathrin Grosse (CISPA), Sandy Huang (Berkeley),
Somesh Jha (U of Wisconsin), Alexey Kurakin (Google Brain), Praveen Manoharan (CISPA),
Ilya Mironov (Google Brain), Ananth Raghunathan (Google Brain), Arunesh Sinha (U of Michigan),
Shuang Song (UCSD), Ananthram Swami (US ARL), Kunal Talwar (Google Brain),
Florian Tramèr (Stanford), Michael Wellman (U of Michigan), Xi Wu (Google)

2
Machine Learning Classifier

x → f(x, θ) → [p(0|x,θ), p(1|x,θ), p(2|x,θ), …, p(7|x,θ), p(8|x,θ), p(9|x,θ)]

Example output: [0.01, 0.84, 0.02, 0.01, 0.01, 0.01, 0.05, 0.01, 0.03, 0.01]

Classifier: map inputs to one class among a predefined set

3
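A minimal sketch of this mapping (illustrative, not from the slides; θ here stands for a hypothetical weight matrix W and bias b of a linear-softmax model, but any differentiable f would do):

```python
import numpy as np

def f(x, W, b):
    """Classifier f(x, θ): map an input vector x to a probability vector
    [p(0|x,θ), ..., p(9|x,θ)] via a softmax over per-class scores."""
    scores = W @ x + b                      # one score per class
    exp = np.exp(scores - scores.max())     # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 784)), np.zeros(10)   # hypothetical θ for 28x28 images
x = rng.random(784)
probs = f(x, W, b)
print(probs.round(2), "-> predicted class:", int(probs.argmax()))
```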
Machine Learning Classifier

Training examples: handwritten digit images, each paired with a one-hot label vector, e.g.
[0 1 0 0 0 0 0 0 0 0] for a “1”, [1 0 0 0 0 0 0 0 0 0] for a “0”,
[0 0 0 0 0 0 0 1 0 0] for a “7”, [0 0 0 0 0 0 0 0 0 1] for a “9”, …

Learning: find internal classifier parameters θ that minimize a cost/loss function (~ model error)

4
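A matching sketch of the learning step (illustrative; it reuses the linear-softmax form above and assumes one-hot labels Y): batch gradient descent on the cross-entropy loss adjusts θ = (W, b) to reduce the model’s error.

```python
import numpy as np

def train(X, Y, lr=0.1, epochs=100):
    """Find parameters θ = (W, b) minimizing the cross-entropy loss between
    softmax predictions and one-hot labels Y (plain batch gradient descent)."""
    n, d = X.shape
    k = Y.shape[1]
    W, b = np.zeros((k, d)), np.zeros(k)
    for _ in range(epochs):
        scores = X @ W.T + b
        exp = np.exp(scores - scores.max(axis=1, keepdims=True))
        P = exp / exp.sum(axis=1, keepdims=True)   # predicted probabilities
        grad_scores = (P - Y) / n                  # d(loss)/d(scores)
        W -= lr * grad_scores.T @ X                # gradient step on θ
        b -= lr * grad_scores.sum(axis=0)
    return W, b
```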
Outline of this lecture

1 Security in ML

2 Privacy in ML

5
Part I

Security in machine learning

6
Attack Models

Attacker may see the model: bad even if an attacker needs to know details of the machine
learning model to do an attack --- aka a white-box attacker

Attacker may not need the model: worse if an attacker who knows very little (e.g., only gets to
ask a few questions) can do an attack --- aka a black-box attacker

7
Papernot et al. Towards the Science of Security and Privacy in Machine Learning
Adversarial examples (white-box attacks)

9
Jacobian-based Saliency Map Approach (JSMA)

10
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
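A minimal sketch of the idea behind JSMA (illustrative, not the paper’s implementation; the Jacobian is estimated by finite differences here, whereas frameworks compute it exactly by backpropagation):

```python
import numpy as np

def jacobian_fd(f, x, eps=1e-4):
    """Finite-difference estimate of the Jacobian J[c, i] = d f_c(x) / d x_i,
    i.e. how each input feature i influences each class score c."""
    base = f(x)
    J = np.zeros((base.size, x.size))
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (f(xp) - base) / eps
    return J

def saliency_map(J, target):
    """Adversarial saliency: a feature is useful if increasing it raises the
    target class score (alpha > 0) while lowering the other classes (beta < 0)."""
    alpha = J[target]
    beta = J.sum(axis=0) - alpha
    return np.where((alpha > 0) & (beta < 0), alpha * np.abs(beta), 0.0)
```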
Jacobian-Based Iterative Approach: source-target misclassification

11
Papernot et al. The Limitations of Deep Learning in Adversarial Settings
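Building on the saliency-map sketch above, an illustrative version of the iterative source-target procedure (assumes inputs scaled to [0, 1]; `theta` is a hypothetical per-feature perturbation amount):

```python
def craft_jsma(f, x, target, theta=1.0, max_iters=50):
    """Repeatedly perturb the most salient feature until f predicts `target`."""
    x_adv = x.copy()
    for _ in range(max_iters):
        if int(f(x_adv).argmax()) == target:
            break                                  # source-target misclassification achieved
        s = saliency_map(jacobian_fd(f, x_adv), target)
        i = int(s.argmax())
        if s[i] == 0.0:
            break                                  # no helpful feature left
        x_adv[i] = min(1.0, x_adv[i] + theta)      # perturb one feature, stay in [0, 1]
    return x_adv
```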
Evading a Neural Network Malware Classifier

DREBIN dataset of Android applications

Original application X:   P[X = Malware] = 0.90,  P[X = Benign] = 0.10
Perturbed application X*: P[X* = Malware] = 0.10, P[X* = Benign] = 0.90

Add constraints to the JSMA approach:
- only add features: keep malware behavior
- only features from the manifest: easy to modify

“Most accurate” neural network:
- 98% accuracy, with 9.7% FP and 1.3% FN
- evaded with a 63.08% success rate

12
Grosse et al. Adversarial Perturbations Against Deep Neural Networks for Malware Classification
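The two constraints can be folded into the same saliency-based search. An illustrative sketch, reusing `jacobian_fd` and `saliency_map` from above and assuming binary 0/1 app features with a hypothetical boolean `manifest_mask` marking manifest-derived features:

```python
import numpy as np

def evade_malware_classifier(f, x, manifest_mask, benign_class=0, max_changes=20):
    """Constrained JSMA-style evasion: only *add* manifest features (flip 0 -> 1),
    so the application keeps its malicious behavior, until f labels it benign."""
    x_adv = x.copy()
    for _ in range(max_changes):
        if int(f(x_adv).argmax()) == benign_class:
            break
        s = saliency_map(jacobian_fd(f, x_adv), target=benign_class)
        s[(x_adv == 1) | ~manifest_mask] = 0.0     # additions only, manifest features only
        i = int(s.argmax())
        if s[i] == 0.0:
            break
        x_adv[i] = 1                               # add the feature
    return x_adv
```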
Supervised vs. reinforcement learning

Model inputs
  - Supervised learning: e.g., traffic sign, music, email
  - Reinforcement learning: observation of the environment & reward function

Model outputs
  - Supervised learning: a class (e.g., stop/yield, jazz/classical, spam/legitimate)
  - Reinforcement learning: an action

Training “goal”
  - Supervised learning: minimize class prediction error (i.e., cost/loss) over pairs of (inputs, outputs)
  - Reinforcement learning: maximize reward by exploring the environment and taking actions

Example: (illustrations on the slide)
13
Adversarial attacks on neural network policies

14
Huang et al. Adversarial Attacks on Neural Network Policies
Adversarial examples (black-box attacks)

15
Threat model of a black-box attack

Adversarial capabilities: the attacker is not assumed to know the training data, model
architecture, model parameters, or model scores --- it only has (limited) oracle access to labels.

Adversarial goal: force a ML model remotely accessible through an API to misclassify

Example: (illustration on the slide)
16
Our approach to black-box attacks

(a) Alleviate lack of knowledge about the model
(b) Alleviate lack of training data
17
Adversarial example transferability

Adversarial examples have a transferability property:
samples crafted to mislead a model A are likely to mislead a model B

This property comes in several variants:
● Intra-technique transferability:
  ○ cross-model transferability
  ○ cross-training-set transferability
● Cross-technique transferability

18
Szegedy et al. Intriguing properties of neural networks
Adversarial example transferability
Adversarial examples have a transferability property:

samples crafted to mislead a model A are likely to mislead a model B

20
Cross-technique transferability

21
Papernot et al. Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples
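A minimal sketch of how such cross-technique transferability is typically measured (illustrative; the toy data, models, and perturbation below are stand-ins, not the paper’s setup): craft adversarial examples against a model A the attacker knows, then check how often they also fool an independently trained model B of a different type.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 20))
y = (X[:, :10].sum(axis=1) > X[:, 10:].sum(axis=1)).astype(int)   # toy binary task

model_a = LogisticRegression(max_iter=1000).fit(X, y)        # source model (known to attacker)
model_b = DecisionTreeClassifier(max_depth=5).fit(X, y)      # victim, different ML technique

# Fast-gradient-style perturbation computed from the *linear* model A only:
# push class-1 points against A's weight vector and class-0 points along it.
w = model_a.coef_[0]
X_adv = np.clip(X + 0.3 * np.sign(w) * np.where(y[:, None] == 1, -1, 1), 0, 1)

print("A fooled:", (model_a.predict(X_adv) != y).mean())
print("B fooled (transferability):", (model_b.predict(X_adv) != y).mean())
```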
Our approach to black-box attacks

(a) Alleviate lack of knowledge about the model:
    adversarial example transferability from a substitute model to the target model
(b) Alleviate lack of training data
23
Attacking remotely hosted black-box models

(Diagram: the adversary sends inputs to a remote ML system, which returns labels such as
“STOP sign” or “no truck sign”.)

(1) The adversary queries the remote ML system for labels on inputs of its choice.
24
Attacking remotely hosted black-box models

(Diagram: a local substitute model is now trained alongside the remote ML system, using the
labels it returned.)

(2) The adversary uses this labeled data to train a local substitute for the remote system.
25
Attacking remotely hosted black-box models

(3) The adversary selects new synthetic inputs for queries to the remote ML system based on the
local substitute’s output surface sensitivity to input variations.

26
Attacking remotely hosted black-box models

(Diagram: an adversarial input crafted on the local substitute is submitted to the remote ML
system, which misclassifies it, e.g. as a “yield sign”.)

(4) The adversary then uses the local substitute to craft adversarial examples, which are
misclassified by the remote ML system because of transferability.

27
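An illustrative sketch of steps (1)–(4) in one loop (assumptions: `oracle` stands in for the remote API, a logistic-regression substitute stands in for the local model, and the task has at least three classes so each class has its own weight row):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def augment(sub, X, y, lam=0.1):
    """Jacobian-based dataset augmentation (sketch): step each input in the sign of the
    substitute's gradient for the label the oracle assigned to it, to probe the remote
    model's decision boundaries with few additional queries."""
    idx = np.searchsorted(sub.classes_, y)          # weight row for each oracle label
    return X + lam * np.sign(sub.coef_[idx])

def train_substitute(oracle, X0, rounds=3):
    """oracle(X) -> labels from the remote black-box API (hypothetical helper)."""
    X = X0
    for _ in range(rounds):
        y = oracle(X)                                        # (1) query the remote system
        sub = LogisticRegression(max_iter=1000).fit(X, y)    # (2) fit the local substitute
        X = np.vstack([X, augment(sub, X, y)])               # (3) new synthetic queries
    return sub   # (4) craft adversarial examples on `sub`; they transfer to the remote model
```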
Our approach to black-box attacks

(a) Alleviate lack of knowledge about the model:
    adversarial example transferability from a substitute model to the target model
(b) Alleviate lack of training data:
    synthetic data generation
28
Results on real-world remote systems
(remote platforms shown as logos on the slide)

ML technique        | Number of queries | Adversarial examples misclassified (after querying)
Deep Learning       | 6,400             | 84.24%
Logistic Regression | 800               | 96.19%
Unknown             | 2,000             | 97.72%

All remote classifiers are trained on the MNIST dataset (10 classes, 60,000 training samples)
29
[PMG16a] Papernot et al. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples
Benchmarking progress in the adversarial ML community

30
31
Growing community

1.3K+ stars
340+ forks
40+ contributors

32
Adversarial examples represent worst-case distribution drifts

33
[DDS04] Dalvi et al. Adversarial Classification (KDD)
Adversarial examples are a tangible instance of hypothetical AI safety problems

34
Image source: http://www.nerdist.com/wp-content/uploads/2013/07/Space-Odyssey-4.jpg
Part II

Privacy in machine learning

35
Types of adversaries and our threat model

Model querying (black-box adversary)
  Shokri et al. (2016) Membership Inference Attacks against ML Models
  Fredrikson et al. (2015) Model Inversion Attacks

Model inspection (white-box adversary)
  Zhang et al. (2017) Understanding DL requires rethinking generalization

In our work, the threat model assumes:
- the adversary can make a potentially unbounded number of queries
- the adversary has access to model internals
36
A definition of privacy

(Diagram: the same randomized algorithm is run on two versions of the data, each run producing
answers 1, 2, …, n; an observer of the answers should not be able to tell the two apart.)
37
Our design goals

Problem: preserve the privacy of training data when learning classifiers

Goals:
- differential privacy protection guarantees
- intuitive privacy protection guarantees
- generic* (independent of the learning algorithm)

*This is a key distinction from previous work, such as:
Pathak et al. (2011) Privacy preserving probabilistic inference with hidden markov models
Jagannathan et al. (2013) A semi-supervised learning approach to differential privacy
Shokri et al. (2015) Privacy-preserving Deep Learning
Abadi et al. (2016) Deep Learning with Differential Privacy
Hamm et al. (2016) Learning privately from multiparty data

38
The PATE approach (Private Aggregation of Teacher Ensembles)

39
Teacher ensemble

(Diagram: the sensitive data is split into n disjoint partitions; each partition i is used to
train its own teacher model i. Legend: training vs. data-flow arrows.)
40
Aggregation

Count votes → take the maximum

41
Intuitive privacy analysis

If most teachers agree on the label, it does not depend on specific partitions, so the privacy
cost is small.

If two classes have close vote counts, the disagreement may reveal private information.

42
Noisy aggregation

Count votes → add Laplacian noise → take the maximum

43
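A minimal sketch of this noisy aggregation mechanism (illustrative; the noise-parameter name `gamma` is an assumption): count the teachers’ votes per class, perturb each count with Laplacian noise, and return the class with the highest noisy count.

```python
import numpy as np

def noisy_aggregate(teacher_labels, n_classes, gamma=0.05, seed=None):
    """Noisy-max aggregation: vote counts + Laplace(1/gamma) noise, then arg-max.
    Larger gamma -> less noise -> higher accuracy but weaker privacy."""
    rng = np.random.default_rng(seed)
    votes = np.bincount(teacher_labels, minlength=n_classes).astype(float)
    votes += rng.laplace(scale=1.0 / gamma, size=n_classes)
    return int(np.argmax(votes))

# Example: with 250 teachers, a strong quorum on class 7 survives the noise.
labels = np.array([7] * 200 + [2] * 50)
print(noisy_aggregate(labels, n_classes=10, seed=0))
```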
Teacher ensemble

(Diagram: as before, each partition of the sensitive data trains a teacher; the teachers’
predictions are now combined by the noisy aggregation mechanism into an aggregated teacher.)
44
Student training

(Diagram: the sensitive data, the partitions, the teachers, and the aggregated teacher are not
available to the adversary; the student model and the public data are. The student queries the
aggregated teacher for labels on public data and is trained on those labels.)
45
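An illustrative sketch of the student step (simplified to a plain supervised student, whereas the setup slide later uses GAN-based semi-supervised students for the image datasets; `teachers` is a hypothetical list of fitted classifiers and `noisy_aggregate` is the sketch from the aggregation slide):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_student(teachers, public_X, n_classes, n_queries=100):
    """Label a capped number of public inputs through the aggregated teacher, then
    fit the student on this (public input, noisy label) set only, so the student's
    parameters never touch the sensitive data directly."""
    X_q = public_X[:n_queries]        # capping the queries caps the total privacy loss
    y_q = np.array([
        noisy_aggregate(np.array([int(t.predict(x[None])[0]) for t in teachers]), n_classes)
        for x in X_q
    ])
    return LogisticRegression(max_iter=1000).fit(X_q, y_q)
```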
Why train an additional “student” model?

The aggregated teacher violates our threat model:

1 Each prediction increases total privacy loss.
  Privacy budgets create a tension between the accuracy and number of predictions.

2 Inspection of internals may reveal private data.
  Privacy guarantees should hold in the face of white-box adversaries.
46
Deployment

(Diagram: only the student model is deployed; it answers inference queries and is the only
component available to the adversary.)
48
Differential privacy analysis

Differential privacy: a randomized algorithm M satisfies (ε, δ)-differential privacy if for all
pairs of neighbouring datasets (d, d’) and for all subsets S of outputs:

    Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d’) ∈ S] + δ

Application of the Moments Accountant technique (Abadi et al., 2016)

Strong quorum ⟹ small privacy cost

The bound is data-dependent: it is computed using the empirical quorum.
49
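A back-of-the-envelope illustration of why the accounting matters (all numbers hypothetical; the formulas are the standard basic and advanced composition bounds, not PATE’s actual analysis):

```python
import math

# Hypothetical per-query privacy cost eps, T student queries, slack delta_prime.
eps, T, delta_prime = 0.05, 100, 1e-5

basic = T * eps                                   # basic composition: costs simply add up
advanced = eps * math.sqrt(2 * T * math.log(1 / delta_prime)) + T * eps * (math.exp(eps) - 1)

print(f"basic composition: {basic:.2f}, advanced composition: {advanced:.2f}")
# The moments accountant is tighter still, and in PATE the bound is data-dependent:
# queries on which the teachers reach a strong quorum contribute very little.
```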
Experimental results

50
Experimental setup

Dataset      | Teacher model                | Student model
MNIST        | Convolutional Neural Network | Generative Adversarial Networks
SVHN         | Convolutional Neural Network | Generative Adversarial Networks
UCI Adult    | Random Forest                | Random Forest
UCI Diabetes | Random Forest                | Random Forest

Code: / /models/tree/master/differential_privacy/multiple_teachers
51
Aggregated teacher accuracy

52
Trade-off between student accuracy and privacy

53
Trade-off between student accuracy and privacy

UCI Diabetes (ε = 1.44, δ = 10⁻⁵):
- non-private baseline accuracy: 93.81%
- student accuracy: 93.94%
54
Synergy between privacy and generalization

55
Some online resources:

Blog on S&P in ML (joint work w/ Ian Goodfellow) www.cleverhans.io


ML course https://coursera.org/learn/machine-learning
DL course https://coursera.org/learn/neural-networks

Assigned reading and more in-depth technical survey paper:

Machine Learning in Adversarial Settings


Patrick McDaniel, Nicolas Papernot, Z. Berkay Celik

Towards the Science of Security and Privacy in Machine Learning


Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman

www.papernot.fr
@NicolasPapernot 56
57
Gradient masking

58
Tramèr et al. Ensemble Adversarial Training: Attacks and Defenses
