The document discusses machine learning classifiers and the need to make them robust against adversarial attacks from malicious inputs. It presents attack methods that can fool classifiers, such as the Fast Gradient Sign Method, and distinguishes between white box attacks, which use a model's parameters, and black box attacks, which do not require access to the model. The goal is to generate adversarial examples that are misclassified while remaining close to the original inputs.


Attack and Defense

Hung-yi Lee

Source of image: http://www.fafa01.com/post865806


Motivation
• We seek to deploy machine learning classifiers not only in the lab, but also in the real world.
• Classifiers that are robust to noise and work "most of the time" are not sufficient. (Being strong is not enough.)
• We want classifiers that are robust to inputs deliberately built to fool them. (They must withstand malice from humans.)
• This is especially important for spam classification, malware detection, network intrusion detection, etc.
Attack

https://www.darksword-armory.com/wp-content/uploads/2014/09/two-handed-danish-sword-medieval-weapon-1352-3.jpg
What do we want to do?

Start from an original image $x^0$, which the network classifies correctly (e.g. "Tiger Cat" with confidence 0.64). Add a small perturbation $\Delta x$ to obtain the attacked image

$x' = x^0 + \Delta x, \qquad \Delta x = (\Delta x_1, \Delta x_2, \Delta x_3, \ldots),$

so that the network classifies $x'$ as something else.
Loss Function for Attack

Let $y^0 = f_\theta(x^0)$ be the network output on the original image; it is close to the true label $y^{true}$ (e.g. "cat"). Let $y' = f_\theta(x')$ be the output on the attacked image; we want it far from $y^{true}$ and, for a targeted attack, close to a chosen false label $y^{false}$ (e.g. "fish").

• Training: $L_{train}(\theta) = C(y^0, y^{true})$, with $x$ fixed
• Non-targeted Attack: $L(x') = -C(y', y^{true})$, with $\theta$ fixed
• Targeted Attack: $L(x') = -C(y', y^{true}) + C(y', y^{false})$
• Constraint: $d(x^0, x') \le \varepsilon$ (so that the change is not noticed)

(The two attack losses are sketched in code below.)
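As a concrete illustration, here is a minimal PyTorch-style sketch of these losses. It is a sketch under assumptions, not part of the original slides: `f_theta` is assumed to be any classifier that maps an image batch to logits, and the cross-entropy $C$ is taken over class indices; all names are illustrative.

```python
# Minimal sketch of the attack losses above (assumes a PyTorch classifier
# `f_theta` that maps an image batch to logits; names are illustrative).
import torch
import torch.nn.functional as F

def attack_loss(f_theta, x_adv, true_label, target_label=None):
    """Non-targeted: L(x') = -C(y', y_true).
    Targeted:       L(x') = -C(y', y_true) + C(y', y_false)."""
    logits = f_theta(x_adv)                           # y' = f_theta(x')
    loss = -F.cross_entropy(logits, true_label)       # push y' away from y_true
    if target_label is not None:
        loss = loss + F.cross_entropy(logits, target_label)  # pull y' toward y_false
    return loss
```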
Constraint

The constraint $d(x^0, x') \le \varepsilon$ is expressed through the perturbation

$x' - x^0 = \Delta x = (\Delta x_1, \Delta x_2, \Delta x_3, \ldots).$

• L2-norm: $d(x^0, x') = \|x^0 - x'\|_2 = \sqrt{(\Delta x_1)^2 + (\Delta x_2)^2 + (\Delta x_3)^2 + \cdots}$
• L-infinity: $d(x^0, x') = \|x^0 - x'\|_\infty = \max\{|\Delta x_1|, |\Delta x_2|, |\Delta x_3|, \cdots\}$

Changing every pixel a little bit and changing one pixel a lot can give the same L2 distance, but the former has a small L-infinity distance while the latter has a large one. (Both distances are sketched in code below.)
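A small sketch of the two distance measures, assuming `x0` and `x_adv` are image tensors of the same shape; the function names are illustrative.

```python
# Sketch of the two constraints on the perturbation Δx = x' - x0.
import torch

def l2_distance(x0, x_adv):
    # sqrt((Δx1)^2 + (Δx2)^2 + ...): many small pixel changes add up
    return torch.norm((x_adv - x0).flatten(), p=2)

def linf_distance(x0, x_adv):
    # max(|Δx1|, |Δx2|, ...): only the largest single-pixel change matters
    return (x_adv - x0).abs().max()
```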
How to Attack

$x^* = \arg\min_{d(x^0, x') \le \varepsilon} L(x')$

This is just like training a neural network, except that the network parameters $\theta$ are fixed and the variable being optimized is the input $x'$.

• Gradient Descent (modified version):
  • Start from the original image $x^0$
  • For $t = 1$ to $T$:
    • $x^t \leftarrow x^{t-1} - \eta \nabla L(x^{t-1})$
    • If $d(x^0, x^t) > \varepsilon$, then $x^t \leftarrow fix(x^t)$

where $\nabla L(x) = \left[\partial L(x)/\partial x_1,\; \partial L(x)/\partial x_2,\; \partial L(x)/\partial x_3,\; \ldots\right]^T$.
The projection step:

• def $fix(x^t)$: among all $x$ satisfying $d(x^0, x) \le \varepsilon$, return the one closest to $x^t$.
• L2-norm: the feasible set is an $\varepsilon$-ball around $x^0$; an update that leaves the ball is pulled back to the nearest point on its surface.
• L-infinity: the feasible set is a box of half-width $\varepsilon$ around $x^0$; an update that leaves the box is clipped back into it, coordinate by coordinate.

(A code sketch of the full procedure follows.)
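A minimal sketch of this procedure under the L-infinity constraint, reusing the `attack_loss` sketch from earlier; `f_theta`, the step size, the budget $\varepsilon$, and the pixel range [0, 1] are all assumptions rather than values from the original slides.

```python
# Sketch: gradient descent on the input x' with theta fixed, followed by
# fix(x^t), which here is the L-infinity projection back into the eps-box.
import torch

def iterative_attack(f_theta, x0, true_label, target_label=None,
                     eps=8/255, step_size=1/255, steps=40):
    x_adv = x0.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = attack_loss(f_theta, x_adv, true_label, target_label)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() - step_size * grad        # x^t <- x^{t-1} - eta * grad L
        # fix(x^t): clip each pixel so that |x^t - x0| <= eps ...
        x_adv = torch.max(torch.min(x_adv, x0 + eps), x0 - eps)
        x_adv = x_adv.clamp(0.0, 1.0)                    # ... and keep a valid image
    return x_adv.detach()
```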
Example

$L(x') = -C(y', y^{true}) + C(y', y^{false})$, with $f$ = ResNet-50, true class = tiger cat, false (target) class = star fish.

• Original image: classified as tiger cat with confidence 0.64.
• Attacked image: classified as star fish with confidence 1.00.
• The attacked image looks identical to the original; only after magnifying their difference by 50× does the perturbation become visible.
Example

The same attack with false class = keyboard: the original image (tiger cat, 0.64) becomes keyboard with confidence 0.98.
What happened?

• Adding random noise to $x^0$ hardly changes the result: moderate noise keeps the prediction among similar classes (tiger cat, tabby cat, Persian cat), and only very large noise pushes it to something unrelated (e.g. fire screen). Along a random direction from $x^0$, the region where $y_{tiger\,cat}$ is high is wide, and its neighbours are other cat classes (e.g. Egyptian cat, Persian cat).
• Along certain specific directions, however, the high-confidence region of the true class is very narrow, and a small step takes $x^0$ into a region where an unrelated class (e.g. keyboard) scores high. The attack finds exactly such a direction.
Attack Approaches
• FGSM (https://arxiv.org/abs/1412.6572)
• Basic iterative method (https://arxiv.org/abs/1607.02533)
• L-BFGS (https://arxiv.org/abs/1312.6199)
• Deepfool (https://arxiv.org/abs/1511.04599)
• JSMA (https://arxiv.org/abs/1511.07528)
• C&W (https://arxiv.org/abs/1608.04644)
• Elastic net attack (https://arxiv.org/abs/1709.04114)
• Spatially Transformed (https://arxiv.org/abs/1801.02612)
• One Pixel Attack (https://arxiv.org/abs/1710.08864)
• …… (only a few are listed here)
https://sites.google.com/site/pkms20152a17/_/rsrc/1448428701742/home/125986076.jpg

Attack Approaches

The many attack methods differ mainly in the optimization method and in the constraint used for

$x^* = \arg\min_{d(x^0, x') \le \varepsilon} L(x')$

• Fast Gradient Sign Method (FGSM):

$x^* \leftarrow x^0 - \varepsilon \Delta x, \qquad \Delta x = \left[\mathrm{sign}(\partial L/\partial x_1),\; \mathrm{sign}(\partial L/\partial x_2),\; \mathrm{sign}(\partial L/\partial x_3),\; \ldots\right]^T$

Each component of $\Delta x$ is only +1 or −1. FGSM therefore acts like a single gradient step with a very large learning rate under the L-infinity constraint: however small or large the gradient is, the step would leave the $\varepsilon$-box around $x^0$, and clipping it back lands $x^*$ on a corner of the box. (FGSM is sketched in code below.)
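A minimal FGSM sketch under the same assumptions as before (PyTorch classifier returning logits, `attack_loss` from the earlier sketch); the value of $\varepsilon$ and the [0, 1] pixel range are illustrative.

```python
# Sketch of FGSM: a single signed-gradient step of size eps on the input.
import torch

def fgsm(f_theta, x0, true_label, target_label=None, eps=8/255):
    x0 = x0.clone().detach().requires_grad_(True)
    loss = attack_loss(f_theta, x0, true_label, target_label)
    grad, = torch.autograd.grad(loss, x0)
    x_adv = x0.detach() - eps * grad.sign()   # Δx has only +1 / -1 entries
    return x_adv.clamp(0.0, 1.0)
```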
White Box vs. Black Box
• In the previous attack, we fix the network parameters $\theta$ to find the optimal $x'$.
• To attack, we therefore need to know the network parameters $\theta$; this is called a White Box Attack.
• Are we safe if we do not release the model? ☺
  • You cannot obtain the model parameters of most online APIs.
• No, because a Black Box Attack is still possible. 
Black Box Attack
• If you have the training data of the target network, train a proxy network yourself on that data.
• Otherwise, obtain input-output pairs by querying the target network and train the proxy network on those pairs.
• Use the proxy network (which you can attack white-box) to generate attacked objects; these often fool the black-box target as well. (See the sketch below.)

https://arxiv.org/pdf/1611.02770.pdf
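A sketch of the proxy idea; `query_black_box` (the target's prediction API), `ProxyNet`, and the data loader are hypothetical placeholders, not names from the slides or the cited paper.

```python
# Sketch: fit a proxy to the black-box target from input-output pairs,
# then run a white-box attack (e.g. the fgsm sketch above) on the proxy.
import torch
import torch.nn.functional as F

def train_proxy(proxy, query_black_box, data_loader, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(proxy.parameters(), lr=lr)
    for _ in range(epochs):
        for x in data_loader:
            with torch.no_grad():
                y = query_black_box(x).argmax(dim=1)   # labels from the target network
            opt.zero_grad()
            loss = F.cross_entropy(proxy(x), y)        # make the proxy imitate the target
            loss.backward()
            opt.step()
    return proxy

# Usage (hypothetical): attack the trained proxy white-box, then send the
# resulting x_adv to the black-box target and hope it transfers.
# proxy = train_proxy(ProxyNet(), query_black_box, data_loader)
# x_adv = fgsm(proxy, x0, true_label)
```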
Universal Adversarial Attack
• https://arxiv.org/abs/1610.08401
• A single, image-independent perturbation can fool the network on most inputs; a black box attack is also possible!
Adversarial Reprogramming
• https://arxiv.org/abs/1806.11146
• Gamaleldin F. Elsayed, Ian Goodfellow, Jascha Sohl-Dickstein, "Adversarial Reprogramming of Neural Networks", ICLR, 2019
Attack in the Real World
• Black Box Attack
• https://www.youtube.com/watch?v=zQ_uMenoBCk&feature=youtu.be
• https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf

Attack in the Real World

1. An attacker would need to find perturbations that generalize beyond a single image.
2. Extreme differences between adjacent pixels in the perturbation are unlikely to be accurately captured by cameras.
3. It is desirable to craft perturbations that are comprised mostly of colors reproducible by the printer.

https://arxiv.org/abs/1707.08945
Beyond Images
• You can attack audio
• https://nicholas.carlini.com/code/audio_adversarial_examples/
• https://adversarial-attacks.net

• You can attack text

https://arxiv.org/pdf/1707.07328.pdf
Defense

http://3png.com/a-27051273.html
Defense
• Adversarial attacks cannot be defended against simply by weight regularization, dropout, or model ensembles.
• Two types of defense:
  • Passive defense: find the attacked image without modifying the model (a special case of anomaly detection).
  • Proactive defense: train a model that is robust to adversarial attack.
Passive Defense

Put a filter (e.g. smoothing) in front of the network without modifying the model. The filter should barely influence the classification of the original image, while making the attack signal less harmful. (A sketch follows below.)

Example (smoothing):
• Original image: tiger cat 0.64 → after smoothing, still tiger cat (0.45).
• Attacked image: keyboard 0.98 → after smoothing, back to tiger cat (0.37).
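A sketch of the filtering step; Gaussian blur via torchvision is just one possible smoothing filter, and the kernel size is an illustrative choice, not one taken from the slides.

```python
# Sketch of passive defense: smooth the input before the unmodified classifier.
import torchvision.transforms.functional as TF

def smoothed_predict(f_theta, x):
    x_smoothed = TF.gaussian_blur(x, kernel_size=3)  # blunts the high-frequency attack signal
    return f_theta(x_smoothed)                       # the model itself is unchanged
```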
Passive Defense
• Feature Squeezing
  https://arxiv.org/abs/1704.01155
• Randomization at the Inference Phase
  https://arxiv.org/abs/1711.01991
Proactive Defense

The idea: find the vulnerabilities and patch them.

Given training data $X = \{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^N, \hat{y}^N)\}$:

• Use $X$ to train your model.
• For $t = 1$ to $T$:
  • For $n = 1$ to $N$: find an adversarial input $\tilde{x}^n$ for $x^n$ using an attack algorithm (find the vulnerability).
  • This gives new training data $X' = \{(\tilde{x}^1, \hat{y}^1), (\tilde{x}^2, \hat{y}^2), \ldots, (\tilde{x}^N, \hat{y}^N)\}$ (data augmentation).
  • Use $X'$ to update your model (patch the hole).

The adversarial examples are different in each iteration, which is why the loop is repeated $T$ times. If the adversarial inputs are found with attack algorithm A, this method stops algorithm A but may still be vulnerable to algorithm B. (A sketch of the loop follows.)
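A sketch of a common per-batch variant of this loop, reusing the `iterative_attack` sketch as the attack algorithm; the model, optimizer, loader, and hyperparameters are all illustrative assumptions.

```python
# Sketch of proactive defense: repeatedly find adversarial examples for the
# current model (the "vulnerability") and train on them (the "patch").
import torch.nn.functional as F

def adversarial_training(model, train_loader, optimizer, T=10, eps=8/255):
    for _ in range(T):                                       # outer loop: t = 1 .. T
        for x, y in train_loader:
            x_adv = iterative_attack(model, x, y, eps=eps)   # x~ found by the attack algorithm
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)          # update on the (x~, y) pairs
            loss.backward()
            optimizer.step()
    return model
```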


Concluding Remarks
• Attack: given the network parameters, attacking is very easy.
• Even black box attacks are possible.
• Defense: Passive & Proactive
• Future: Adaptive Attack / Defense

https://www.gotrip.hk/179304/weekend_lifestyle/pokemon-go_%E7%B2%BE%E9%9D%88%E9%80%B2%E5%8C%96/
To learn more …
• Reference
• https://adversarial-ml-tutorial.org/ (Zico Kolter and Aleksander Madry)
• Adversarial Attack Toolboxes:
• https://github.com/bethgelab/foolbox
• https://github.com/IBM/adversarial-robustness-toolbox
• https://github.com/tensorflow/cleverhans
