Adversarial Training is all you Need
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models
resistant to adversarial attacks,” 2019
Prerana Khatiwada
PhD, Second Year
Problem Statement
Can adversarial training be used to defend against poisoning attacks?
● Neural networks have lately been providing state-of-the-art performance on most machine-learning problems.
● Test-time attacks add imperceptible noise to samples to change the model's decision.
● Train-time attacks add adversarially manipulated points to the training set, which can be exploited during test time.
Adversarial Examples
Adversarial Examples: image inputs manipulated with the purpose of confusing a neural network, resulting in the misclassification of a given input.
● They are indistinguishable from the original to the human eye, but cause the network to fail to identify the contents of the image.
Poisoning Attacks: the attacker maliciously adds crafted points to the training data before it is fed into the algorithm.
● Poisoning is one of the attacks that adversarial ML aims to prevent.
Poisoning Attacks
● There are two main types of backdoor attacks: clean-label backdoors and the BadNets attack.
● Clean-label backdoors add an adversarial perturbation along with the backdoor pattern, but do not change the label of the data point.
● In the BadNets attack, the attacker is allowed to change the labels of the data points, which leads to a stronger attack (a minimal poisoning sketch follows below).
● This work builds closely on the Madry et al. paper, which uses the PGD attack to generate adversarial samples.
The malicious attacker adds poisoned points to the training data, which is
then exploited during model deployment
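To make the threat model concrete, here is a minimal BadNets-style poisoning sketch in NumPy, not the exact code used in this project: it stamps a small trigger patch onto a random fraction of the training images and relabels them to an attacker-chosen target class. The array layout, trigger placement, and poison fraction are illustrative assumptions.

```python
import numpy as np

def badnets_poison(x_train, y_train, target_label=0, poison_frac=0.1,
                   trigger_size=3, trigger_value=1.0, seed=0):
    """BadNets-style poisoning sketch.

    Assumes x_train has shape (N, H, W, C) with pixels in [0, 1] and
    y_train holds integer class labels.
    """
    rng = np.random.default_rng(seed)
    x_p, y_p = x_train.copy(), y_train.copy()
    n_poison = int(poison_frac * len(x_train))
    idx = rng.choice(len(x_train), size=n_poison, replace=False)
    # Stamp the trigger in the bottom-right corner of each chosen image.
    x_p[idx, -trigger_size:, -trigger_size:, :] = trigger_value
    # Label flip: this is what makes BadNets a "dirty-label" attack.
    # A clean-label attack would keep y unchanged and instead add an
    # adversarial perturbation to the image before stamping the trigger.
    y_p[idx] = target_label
    return x_p, y_p, idx
```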
Projected Gradient Descent (PGD) Attack
● Develops a perturbation vector for the input image by making a slight modification to the back-propagation algorithm.
● Contrary to common practice, while back-propagating through the network, it treats the model parameters (weights) as constant and the input as the variable.
● Hence, gradients corresponding to each element of the input (for example, pixels in the case of images) can be obtained.
● The PGD attack is a white-box attack: the attacker has access to the model gradients (a sketch follows below).
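Below is a minimal TensorFlow/Keras sketch of an L-infinity PGD attack, assuming the model outputs logits and pixels are scaled to [0, 1]. The epsilon, step size, and number of steps are placeholder values, not the settings used in the project.

```python
import tensorflow as tf

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """L-infinity PGD sketch: weights stay fixed, only the input is perturbed."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    x = tf.cast(x, tf.float32)
    # Random start inside the epsilon-ball, clipped to the valid pixel range.
    x_adv = tf.clip_by_value(x + tf.random.uniform(tf.shape(x), -eps, eps), 0.0, 1.0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)                      # treat the input as the variable
            loss = loss_fn(y, model(x_adv, training=False))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)      # ascend the loss
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # project back into the ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)  # keep pixels valid
    return tf.stop_gradient(x_adv)
```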
Objective
● Perform adversarial training on CNN models and test them against poisoning attacks on datasets such as CIFAR-10 and MNIST.
● Compare the training approach with state-of-the-art adversarial training.
Proposed Idea
● This project aims to answer the following question: “Can adversarial training defend against poisoning attacks?”
● Since adversarial examples are themselves strong poisons, standard adversarial training with a strong adversarial attack (PGD) is used in this case.
● PGD-AT is applied to the poisoned data to see whether the robust models created this way can defend against poisoning attacks (a sketch of the training loop follows below).
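A minimal sketch of the PGD adversarial-training (PGD-AT) loop, reusing the pgd_attack sketch above. The optimizer, batch size, and inner-attack settings are illustrative assumptions rather than the project's exact configuration.

```python
import tensorflow as tf

def adversarial_train(model, x_poisoned, y_poisoned, epochs=10,
                      batch_size=128, eps=0.3, alpha=0.01, steps=7):
    """PGD-AT sketch: the inner maximization crafts adversarial examples on the
    fly, and the outer minimization trains the model on those examples."""
    optimizer = tf.keras.optimizers.Adam(1e-3)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    ds = (tf.data.Dataset.from_tensor_slices((x_poisoned, y_poisoned))
          .shuffle(10_000).batch(batch_size))
    for _ in range(epochs):
        for x_batch, y_batch in ds:
            # Inner maximization: generate adversarial examples for this batch.
            x_adv = pgd_attack(model, x_batch, y_batch, eps, alpha, steps)
            # Outer minimization: train on the adversarial batch.
            with tf.GradientTape() as tape:
                loss = loss_fn(y_batch, model(x_adv, training=True))
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return model
```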
Methodology
Datasets
● CIFAR-10
● MNIST
● Model: Custom Neural Net (an illustrative architecture is sketched below)
Attacks: Clean-Label Backdoor Attack (Turner et al.) and BadNets (Gu et al.)
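The slides do not specify the exact architecture of the custom network, so the following Keras model is only an illustrative stand-in for a small vanilla CNN of the kind described. It outputs logits so it plugs directly into the PGD and PGD-AT sketches above.

```python
import tensorflow as tf

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Small vanilla CNN (illustrative, not the project's exact architecture)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # logits, no softmax
    ])
```

For CIFAR-10, the input shape would be (32, 32, 3) instead of (28, 28, 1).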
State-of-the-Art Approach
● Implemented our own versions of the Clean-Label Backdoor attack and the BadNets attack using Keras.
● Poisoned the datasets using the two attacks and tried to defend them using adversarial training.
● Compared the training approach with state-of-the-art adversarial training.
MNIST Dataset: Natural 98.66%, PGD 93.2%
CIFAR-10 Dataset: Natural 92.7%, PGD 79.4%
Results
MNIST Dataset (Clean-Label Attack / Turner et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 0.34%
● Adversarial training on the poisoned data: 81.55%
Results
MNIST Dataset (BadNets Attack / Gu et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 5.4%
● Effectiveness of poison after normal training on poisoned data: 86.58%
● Adversarial training on poisoned data: 96.67%
● Effectiveness of poison after AT on poisoned data: 3.42%
Results
CIFAR-10 Dataset (Clean-Label Attack / Turner et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 23%
● Adversarial training on the poisoned data: 52.26%
Results
CIFAR-10 Dataset (BadNets Attack / Gu et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 8.64%
● Adversarial training on the poisoned data: 55.75%
Why does adversarial training work against clean-label attacks?
● The adversary adds adversarial perturbations along with the backdoor pattern in order to poison the model.
● The PGD-AT that we propose makes the model robust against adversarial perturbations, which in turn makes the clean-label attack perform poorly. This is how AT defends against clean-label attacks.
Why does adversarial training prove effective against the BadNets attack on MNIST?
● Network trained without AT on poisoned data: two separate clusters, with a clear distinction between the poisoned and non-poisoned classes.
● Network trained with AT on poisoned data: no separate clusters and no clear distinction, which means the model is able to remove the effect of the backdoors. In this way, adversarial training removes the effect of even label-based poisoning attacks (a sketch of this kind of feature-space visualization follows below).
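The slides compare the two trained networks through a feature-space visualization. The exact plotting procedure is not given, so the following is an assumed sketch: it extracts penultimate-layer activations and projects them to 2-D with PCA, coloring points by whether they were poisoned. The function name and layer index are hypothetical.

```python
import tensorflow as tf
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def plot_feature_clusters(model, x, is_poisoned, layer_index=-2):
    """Project penultimate-layer activations to 2-D; clear separation of
    poisoned vs. clean points suggests the network has learnt the backdoor."""
    feature_extractor = tf.keras.Model(model.inputs,
                                       model.layers[layer_index].output)
    feats = feature_extractor.predict(x, verbose=0)
    coords = PCA(n_components=2).fit_transform(feats)
    plt.scatter(coords[:, 0], coords[:, 1], c=is_poisoned, cmap="coolwarm", s=5)
    plt.title("Penultimate-layer features (poisoned vs. clean)")
    plt.show()
```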
References
➔ Adversarial Machine Learning in Image Classification: A Survey Towards the Defender’s Perspective
https://arxiv.org/pdf/2009.03728.pdf
➔ Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
https://arxiv.org/pdf/2012.10544.pdf
➔ BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
https://arxiv.org/pdf/1708.06733.pdf
➔ Towards Deep Learning Models Resistant to Adversarial Attacks
https://arxiv.org/pdf/1706.06083.pdf
➔ Clean-Label Backdoor Attacks
https://people.csail.mit.edu/madry/lab/cleanlabel.pdf
THANK YOU!
Q/A
Editor's Notes
• #3: However, they are susceptible to attacks at train and test time, affecting their ability to infer precisely. Adversarial training is considered a reliable defense against adversarial attacks: train the model to identify adversarial examples. By training or retraining a model using these examples, it will be able to identify future adversarial attacks.
• #4: The model is then trained on this poisoned dataset and learns "wrong" features or backdoors, which can later be exploited during test time.
• #5: Poisoning attacks fall under train-time attacks. A model trained on a poisoned dataset learns spurious features, which can later be exploited by the attacker during test time. PGD-AT is considered the strongest adversarial training method.
• #8: As training progresses, the adversarial examples generated also keep changing, enabling the model to learn robust features, which help discriminate against the backdoor features that are normally learnt during training.
• #9: Recent work by Fowl et al. states that adversarial examples can also act as strong poisoning examples, which provide high attack success. Adversarial training, in contrast, utilizes adversarial samples generated from strong attacks to make the model more robust. The main intuition here is that adversarial training enables the model to learn robust features, which help discriminate against the backdoor features normally learnt during training. The authors of the original AT paper (Madry et al.) presented results using ResNets; the forecasted time to train that model was 4.5 days given our computational hardware, so we instead trained a vanilla CNN for 2 days.
• #10: CIFAR-10 is a computer-vision dataset used for object recognition: 50,000 32x32 color training images and 10,000 test images, labeled over 10 categories. MNIST is a dataset of 60,000 small square 28×28-pixel grayscale images of handwritten single digits between 0 and 9.
• #11: The entire project revolved around making hypotheses and testing them against complex models and datasets for different attacks and defenses.
• #16: This is because the clean-label attack simply adds a backdoor pattern without changing the label of the sample. A backdoor attack tricks the model into associating a backdoor pattern with a specific target label, so that whenever this pattern appears, the model predicts the target label; otherwise, it behaves normally.