Attack (v8)
Hung-yi Lee
What do we want to do?

[Figure] The original image $x^0$ is classified by the network as "Tiger Cat" (confidence 0.64). Adding a small perturbation $\Delta x = (\Delta x_1, \Delta x_2, \Delta x_3, \ldots)$ to the pixels gives the attacked image $x' = x^0 + \Delta x$, which the network classifies as something else.
Loss Function for Attack

The network $f_\theta$ maps the original image to $y^0 = f_\theta(x^0)$, which should be close to the true label $y^{true}$ (e.g. "cat"). The attacked output $y' = f_\theta(x')$ should instead be far from $y^{true}$ and, for a targeted attack, close to a chosen false label $y^{false}$ (e.g. "fish").

• Training: $L_{train}(\theta) = C(y^0, y^{true})$, with $x$ fixed
• Non-targeted attack: $L(x') = -C(y', y^{true})$, with $\theta$ fixed
• Targeted attack: $L(x') = -C(y', y^{true}) + C(y', y^{false})$ (see the sketch below)
• Constraint: $d(x^0, x') \le \varepsilon$, so that the attack is not noticed
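A minimal sketch of these two attack losses, assuming PyTorch; `model`, `x_adv`, `y_true`, and `y_false` are hypothetical names for $f_\theta$, $x'$, and the two labels, and cross-entropy plays the role of $C$.

```python
import torch.nn.functional as F

def non_targeted_loss(model, x_adv, y_true):
    # L(x') = -C(y', y_true): push the prediction away from the true class.
    y_pred = model(x_adv)                 # y' = f_theta(x')
    return -F.cross_entropy(y_pred, y_true)

def targeted_loss(model, x_adv, y_true, y_false):
    # L(x') = -C(y', y_true) + C(y', y_false): away from the true class
    # and, at the same time, toward the chosen false class.
    y_pred = model(x_adv)
    return -F.cross_entropy(y_pred, y_true) + F.cross_entropy(y_pred, y_false)
```

Minimizing either loss over $x'$ (not over $\theta$) is exactly the attack objective above.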
Constraint $d(x^0, x') \le \varepsilon$

The perturbation is the pixel-wise difference between the two images: $x' - x^0 = \Delta x = (\Delta x_1, \Delta x_2, \Delta x_3, \ldots)$

• L2-norm: $d(x^0, x') = \|x^0 - x'\|_2^2 = (\Delta x_1)^2 + (\Delta x_2)^2 + (\Delta x_3)^2 + \cdots$
• L-infinity: $d(x^0, x') = \|x^0 - x'\|_\infty = \max\{|\Delta x_1|, |\Delta x_2|, |\Delta x_3|, \cdots\}$

Changing every pixel a little bit and changing one pixel a lot can give exactly the same L2 distance, yet the first has a small L-infinity distance and the second a large one (compare the two cases in the sketch below); for images, L-infinity is therefore often the better model of "not being noticed".
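A toy numerical check of this point, assuming PyTorch; the 4-pixel "images" and perturbation sizes are made up for illustration.

```python
import torch

x0 = torch.zeros(4)                                 # toy 4-pixel image
spread = x0 + torch.tensor([0.5, 0.5, 0.5, 0.5])    # every pixel a little bit
single = x0 + torch.tensor([1.0, 0.0, 0.0, 0.0])    # one pixel changed a lot

for x_adv in (spread, single):
    d = x_adv - x0
    l2 = torch.norm(d, p=2).item()                  # 1.0 in both cases
    linf = torch.norm(d, p=float('inf')).item()     # 0.5 vs 1.0
    print(f"L2 = {l2:.2f}, L-infinity = {linf:.2f}")
```

Both perturbations have L2 distance 1.0, but only the second one changes a pixel enough to be obvious, which the L-infinity distance reflects.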
How to Attack

Just like training a neural network, except that the network parameters $\theta$ are fixed and we optimize over the input $x'$:

$x^* = \arg\min_{d(x^0, x') \le \varepsilon} L(x')$

Start from $x^0$ and update $x'$ by gradient descent on $L$; whenever an update takes $x^t$ outside the constraint set, a function $fix(x^t)$ replaces it with the point satisfying $d(x^0, x) \le \varepsilon$ that is closest to $x^t$ (see the sketch below).
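A minimal sketch of this loop, assuming PyTorch and an L-infinity constraint (for which the closest feasible point is obtained by clamping each pixel into $[x^0_i - \varepsilon,\ x^0_i + \varepsilon]$); `loss_fn` stands for any of the attack losses above, and the step size and iteration count are assumptions.

```python
import torch

def fix(x, x0, eps):
    # Closest point to x within the L-infinity eps-ball around x0.
    return torch.clamp(x, min=x0 - eps, max=x0 + eps)

def attack(x0, loss_fn, eps=8/255, lr=1/255, steps=40):
    x = x0.clone()
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        loss = loss_fn(x)          # e.g. targeted_loss(model, x, y_true, y_false)
        loss.backward()
        x = x - lr * x.grad        # gradient descent on the input, not on theta
        x = fix(x, x0, eps)        # enforce the constraint d(x0, x') <= eps
    return x.detach()
```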
Example

A targeted attack on $f$ = ResNet-50 with true class "Tiger cat" and false class "Star Fish", using $L(x') = -C(y', y^{true}) + C(y', y^{false})$. The attacked image is classified as "Star Fish", yet looks identical to the original; the perturbation only becomes visible when magnified 50x. Ordinary random noise of comparable size, by contrast, leaves the prediction at cat-like classes such as "tabby cat", and only very large noise pushes it to something unrelated like "fire screen".
What happened?

[Figure] The confidence landscape around $x^0$: along a random direction, the "tiger cat" confidence stays high over a wide range, flanked by similar classes ("Egyptian cat", "Persian cat"); along the specific direction found by the attack, the "tiger cat" confidence collapses almost immediately and an unrelated class ("keyboard") takes over.
Attack Approaches
• FGSM (https://arxiv.org/abs/1412.6572)
• Basic iterative method (https://arxiv.org/abs/1607.02533)
• L-BFGS (https://arxiv.org/abs/1312.6199)
• Deepfool (https://arxiv.org/abs/1511.04599)
• JSMA (https://arxiv.org/abs/1511.07528)
• C&W (https://arxiv.org/abs/1608.04644)
• Elastic net attack (https://arxiv.org/abs/1709.04114)
• Spatially Transformed (https://arxiv.org/abs/1801.02612)
• One Pixel Attack (https://arxiv.org/abs/1710.08864)
• …… (only a few are listed here)
Attack Approaches

$x^* = \arg\min_{d(x^0, x') \le \varepsilon} L(x')$

The approaches above differ mainly in the optimization method and in the constraint $d$ they use.

• Fast Gradient Sign Method (FGSM): a single update
  $x^* \leftarrow x^0 - \varepsilon \Delta x$, where
  $\Delta x = \big( \mathrm{sign}(\partial L / \partial x_1),\ \mathrm{sign}(\partial L / \partial x_2),\ \mathrm{sign}(\partial L / \partial x_3),\ \ldots \big)$
  so every component of $\Delta x$ is +1 or -1.
[Figure] With an L-infinity constraint, the feasible set is a box of radius $\varepsilon$ around $x^0$. FGSM acts like one step of gradient descent with a very large learning rate: the gradient only contributes its sign, and the update jumps straight to a corner $x^*$ of the box (see the sketch below).
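A minimal FGSM sketch, assuming PyTorch; `loss_fn` is any attack loss $L(x')$ from above, and the pixel budget $\varepsilon$ is an assumed value.

```python
import torch

def fgsm(x0, loss_fn, eps=8/255):
    x = x0.clone().requires_grad_(True)
    loss = loss_fn(x)                    # L(x') evaluated at the original image
    loss.backward()
    delta = torch.sign(x.grad)           # every component is +1 or -1
    return (x0 - eps * delta).detach()   # jump to a corner of the eps-box
```

One design note: because every pixel moves by exactly $\pm\varepsilon$, the result automatically satisfies the L-infinity constraint with equality, so no projection step is needed.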
White Box vs. Black Box

• In the previous attacks, we fix the network parameters $\theta$ and find the optimal $x'$.
• To attack, we therefore need to know the network parameters $\theta$.
• This is called a White Box Attack.
• Are we safe if we do not release the model? ☺
  • You cannot obtain the model parameters from most online APIs.
• No, because a Black Box Attack is possible.
Black Box Attack

If you have the training data of the target network, train a proxy network yourself and use the proxy network to generate the attacked objects. Otherwise, obtain input-output pairs by querying the target network and train the proxy on those pairs.

[Figure] The same training data feeds both the black-box target network and the proxy network; attacked objects crafted on the proxy are then sent to the target.
https://arxiv.org/pdf/1611.02770.pdf
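A hedged sketch of the proxy idea, assuming PyTorch; `query_target_api` (returning the target's predicted label for one input) and the training hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

def train_proxy(proxy, inputs, query_target_api, epochs=10, lr=1e-3):
    # Step 1: label our own inputs by querying the black-box target.
    labels = torch.stack([query_target_api(x) for x in inputs])
    # Step 2: train the proxy to imitate the target's input-output behaviour.
    opt = torch.optim.Adam(proxy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(proxy(inputs), labels)
        loss.backward()
        opt.step()
    # Step 3: run any white-box attack (e.g. the fgsm sketch above) on the
    # proxy and hope the attacked objects transfer to the black-box target.
    return proxy
```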
Universal Adversarial Attack
https://arxiv.org/abs/1610.08401

Attacks also work in the physical world:
https://www.youtube.com/watch?v=zQ_uMenoBCk&feature=youtu.be
https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf
https://arxiv.org/pdf/1707.07328.pdf
Defense

• Adversarial attacks cannot be defended against by weight regularization, dropout, or model ensembles.
• Two types of defense:
  • Passive defense: find the attacked image without modifying the model.
    • A special case of anomaly detection.
  • Proactive defense: train a model that is robust to adversarial attack.
Passive Defense

Put a filter (e.g. smoothing) in front of the network. The filter barely influences the classification of the original image, but it destroys the carefully tuned attack signal: in the example, the attacked image is classified as "Keyboard" without the filter and as "Tiger Cat" again with it (a minimal filter sketch follows).
https://arxiv.org/abs/1704.01155
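A minimal sketch of such a smoothing filter, assuming PyTorch; the mean-filter choice and kernel size are assumptions, not the specific filter of the paper above.

```python
import torch
import torch.nn.functional as F

def smooth(x, k=3):
    # Mean filter: replace each pixel by the average of its k x k neighbourhood.
    c = x.shape[1]                                    # x has shape (N, C, H, W)
    kernel = torch.ones(c, 1, k, k, device=x.device) / (k * k)
    return F.conv2d(x, kernel, padding=k // 2, groups=c)

def defended_predict(model, x):
    # Classify the filtered image instead of the raw input.
    return model(smooth(x)).argmax(dim=1)
```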
Randomization at Inference Phase (e.g. randomly resizing and padding the input)
https://arxiv.org/abs/1711.01991
Proactive Defense

Spirit: find the model's blind spots, then patch them. Concretely, this is adversarial training: attack your own model, and add the attacked images, paired with their correct labels, back into the training data (see the sketch below).
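A minimal sketch of one adversarial-training step in this spirit, assuming PyTorch and reusing the hypothetical `fgsm` sketch above; the non-targeted loss $-C(y', y^{true})$ is used to find the blind spot.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, opt, eps=8/255):
    # Find a blind spot: an attacked image for the current model.
    x_adv = fgsm(x, lambda x_: -F.cross_entropy(model(x_), y), eps=eps)
    # Patch it: update the model so the attacked image is classified correctly.
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```

Repeating this step as training proceeds keeps generating fresh attacked images for the current model, so each newly found hole is patched in turn.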
To learn more …
• References
  • https://adversarial-ml-tutorial.org/ (Zico Kolter and Aleksander Madry)
• Adversarial Attack Toolboxes:
• https://github.com/bethgelab/foolbox
• https://github.com/IBM/adversarial-robustness-toolbox
• https://github.com/tensorflow/cleverhans