Adversarial Training is all you Need
A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models
resistant to adversarial attacks,” 2019
Prerana Khatiwada
PhD, Second Year
Problem Statement
Can adversarial training be used to defend against poisoning attacks?
● Neural networks have lately been providing state-of-the-art performance on most machine-learning problems.
● Test-time attacks add imperceptible noise to samples to change the model's decision.
● Train-time attacks add adversarially manipulated points to the training set, which can be exploited during test time.
Adversarial Examples
Adversarial Examples: image inputs manipulated with the purpose of confusing a neural network, resulting in the misclassification of a given input.
● They are indistinguishable from the original to the human eye, but cause the network to fail to identify the contents of the image.
Poisoning Attacks: the attacker maliciously adds crafted points to the training data before it is fed into the algorithm.
● Poisoning is one of the attacks that adversarial ML aims to prevent.
Poisoning Attacks
● There are two main types of backdoor attacks: clean-label backdoors and the BadNets attack.
● Clean-label backdoors add an adversarial perturbation along with the backdoor pattern, but do not change the label of the data point.
● In the BadNets attack, the attacker is allowed to change the labels of the data points, which leads to a stronger attack (a minimal poisoning sketch follows below).
● This work builds closely on the Madry et al. paper, which uses the PGD attack to generate adversarial samples.
The malicious attacker adds poisoned points to the training data, which is
then exploited during model deployment
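To make the threat model concrete, here is a minimal BadNets-style poisoning sketch in NumPy, not the exact code used in this project: it stamps a small trigger patch onto a random fraction of the training images and relabels them to an attacker-chosen target class. The array layout, trigger placement, and poison fraction are illustrative assumptions.

```python
import numpy as np

def badnets_poison(x_train, y_train, target_label=0, poison_frac=0.1,
                   trigger_size=3, trigger_value=1.0, seed=0):
    """BadNets-style poisoning sketch.

    Assumes x_train has shape (N, H, W, C) with pixels in [0, 1] and
    y_train holds integer class labels.
    """
    rng = np.random.default_rng(seed)
    x_p, y_p = x_train.copy(), y_train.copy()
    n_poison = int(poison_frac * len(x_train))
    idx = rng.choice(len(x_train), size=n_poison, replace=False)
    # Stamp the trigger in the bottom-right corner of each chosen image.
    x_p[idx, -trigger_size:, -trigger_size:, :] = trigger_value
    # Label flip: this is what makes BadNets a "dirty-label" attack.
    # A clean-label attack would keep y unchanged and instead add an
    # adversarial perturbation to the image before stamping the trigger.
    y_p[idx] = target_label
    return x_p, y_p, idx
```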
Projected Gradient Descent (PGD) Attack
● Develops a perturbation vector for the input image by making a slight modification to the back-propagation algorithm.
● Contrary to common practice, while back-propagating through the network, it treats the model parameters (weights) as constant and the input as the variable.
● Hence, gradients corresponding to each element of the input (for example, pixels in the case of images) can be obtained.
● The PGD attack is a white-box attack: the attacker has access to the model gradients (a sketch follows below).
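Below is a minimal TensorFlow/Keras sketch of an L-infinity PGD attack, assuming the model outputs logits and pixels are scaled to [0, 1]. The epsilon, step size, and number of steps are placeholder values, not the settings used in the project.

```python
import tensorflow as tf

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """L-infinity PGD sketch: weights stay fixed, only the input is perturbed."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    x = tf.cast(x, tf.float32)
    # Random start inside the epsilon-ball, clipped to the valid pixel range.
    x_adv = tf.clip_by_value(x + tf.random.uniform(tf.shape(x), -eps, eps), 0.0, 1.0)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(x_adv)                      # treat the input as the variable
            loss = loss_fn(y, model(x_adv, training=False))
        grad = tape.gradient(loss, x_adv)
        x_adv = x_adv + alpha * tf.sign(grad)      # ascend the loss
        x_adv = tf.clip_by_value(x_adv, x - eps, x + eps)  # project back into the ball
        x_adv = tf.clip_by_value(x_adv, 0.0, 1.0)  # keep pixels valid
    return tf.stop_gradient(x_adv)
```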
Objective
● Perform adversarial training on CNN models and test them against poisoning attacks on datasets such as CIFAR-10 and MNIST.
● Compare the training approach with state-of-the-art adversarial training.
Proposed Idea
● This project aims to answer the following question: “Can adversarial training defend against poisoning attacks?”
● Since adversarial examples are themselves strong poisons, standard adversarial training with a strong adversarial attack (PGD) is used in this case.
● PGD-AT is applied to the poisoned data to see whether the robust models created this way can defend against poisoning attacks (a sketch of the training loop follows below).
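A minimal sketch of the PGD adversarial-training (PGD-AT) loop, reusing the pgd_attack sketch above. The optimizer, batch size, and inner-attack settings are illustrative assumptions rather than the project's exact configuration.

```python
import tensorflow as tf

def adversarial_train(model, x_poisoned, y_poisoned, epochs=10,
                      batch_size=128, eps=0.3, alpha=0.01, steps=7):
    """PGD-AT sketch: the inner maximization crafts adversarial examples on the
    fly, and the outer minimization trains the model on those examples."""
    optimizer = tf.keras.optimizers.Adam(1e-3)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    ds = (tf.data.Dataset.from_tensor_slices((x_poisoned, y_poisoned))
          .shuffle(10_000).batch(batch_size))
    for _ in range(epochs):
        for x_batch, y_batch in ds:
            # Inner maximization: generate adversarial examples for this batch.
            x_adv = pgd_attack(model, x_batch, y_batch, eps, alpha, steps)
            # Outer minimization: train on the adversarial batch.
            with tf.GradientTape() as tape:
                loss = loss_fn(y_batch, model(x_adv, training=True))
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return model
```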
Methodology
Datasets
● CIFAR-10
● MNIST
● Model: Custom Neural Net (an illustrative architecture is sketched below)
Attacks: Clean-Label Backdoor Attack (Turner et al.) and BadNets (Gu et al.)
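The slides do not specify the exact architecture of the custom network, so the following Keras model is only an illustrative stand-in for a small vanilla CNN of the kind described. It outputs logits so it plugs directly into the PGD and PGD-AT sketches above.

```python
import tensorflow as tf

def build_cnn(input_shape=(28, 28, 1), num_classes=10):
    """Small vanilla CNN (illustrative, not the project's exact architecture)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # logits, no softmax
    ])
```

For CIFAR-10, the input shape would be (32, 32, 3) instead of (28, 28, 1).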
State-of-the-Art Approach
● Implemented our own versions of the Clean-Label Backdoor attack and the BadNets attack using Keras.
● Poisoned the datasets using the two attacks and tried to defend them using adversarial training.
● Compared the training approach with state-of-the-art adversarial training.
MNIST Dataset: Natural 98.66%, PGD 93.2%
CIFAR-10 Dataset: Natural 92.7%, PGD 79.4%
Results
MNIST Dataset (Clean-Label Attack / Turner et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 0.34%
● Adversarial training on the poisoned data: 81.55%
Results
MNIST Dataset (BadNets Attack / Gu et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 5.4%
● Effectiveness of poison after normal training on poisoned data: 86.58%
● Adversarial training on poisoned data: 96.67%
● Effectiveness of poison after AT on poisoned data: 3.42%
Results
CIFAR-10 Dataset (Clean-Label Attack / Turner et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 23%
● Adversarial training on the poisoned data: 52.26%
Results
CIFAR-10 Dataset (BadNets Attack / Gu et al.)
Test accuracy on images with the backdoor:
● Normal training on poisoned data: 8.64%
● Adversarial training on the poisoned data: 55.75%
Why does adversarial training work against clean-label attacks?
● The adversary adds adversarial perturbations along with the backdoor pattern in order to poison the model.
● The PGD-AT that we propose makes the model robust against adversarial perturbations, which in turn makes the clean-label attack perform poorly. This is how AT defends against clean-label attacks.
Why does adversarial training prove effective against the BadNets attack on MNIST?
● Network trained without AT on poisoned data: two separate clusters, with a clear distinction between the poisoned and non-poisoned classes.
● Network trained with AT on poisoned data: no separate clusters and no clear distinction, which means the model is able to remove the effect of the backdoors. In this way, adversarial training removes the effect of even label-based poisoning attacks (a sketch of this kind of feature-space visualization follows below).
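The slides compare the two trained networks through a feature-space visualization. The exact plotting procedure is not given, so the following is an assumed sketch: it extracts penultimate-layer activations and projects them to 2-D with PCA, coloring points by whether they were poisoned. The function name and layer index are hypothetical.

```python
import tensorflow as tf
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def plot_feature_clusters(model, x, is_poisoned, layer_index=-2):
    """Project penultimate-layer activations to 2-D; clear separation of
    poisoned vs. clean points suggests the network has learnt the backdoor."""
    feature_extractor = tf.keras.Model(model.inputs,
                                       model.layers[layer_index].output)
    feats = feature_extractor.predict(x, verbose=0)
    coords = PCA(n_components=2).fit_transform(feats)
    plt.scatter(coords[:, 0], coords[:, 1], c=is_poisoned, cmap="coolwarm", s=5)
    plt.title("Penultimate-layer features (poisoned vs. clean)")
    plt.show()
```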
References
➔ Adversarial Machine Learning in Image Classification: A Survey Towards the Defender’s Perspective
https://arxiv.org/pdf/2009.03728.pdf
➔ Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses
https://arxiv.org/pdf/2012.10544.pdf
➔ BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain
https://arxiv.org/pdf/1708.06733.pdf
➔ Towards Deep Learning Models Resistant to Adversarial Attacks
https://arxiv.org/pdf/1706.06083.pdf
➔ Clean-Label Backdoor Attacks
https://people.csail.mit.edu/madry/lab/cleanlabel.pdf
THANK YOU!
Q/A
Editor's Notes
• #3: However, they are susceptible to attacks at train and test time, affecting their ability to infer precisely. Adversarial training is considered a reliable defense against adversarial attacks: train the model to identify adversarial examples. By training or retraining a model using these examples, it will be able to identify future adversarial attacks.
• #4: The model is then trained on this poisoned dataset and learns "wrong" features or backdoors, which can later be exploited during test time.
• #5: Poisoning attacks fall under train-time attacks. A model trained on a poisoned dataset learns spurious features, which can later be exploited by the attacker during test time. PGD-AT is considered the strongest adversarial training method.
• #8: As training progresses, the adversarial examples generated also keep changing, enabling the model to learn robust features, which help discriminate against the backdoor features that are normally learnt during training.
• #9: Recent work by Fowl et al. states that adversarial examples can also act as strong poisoning examples, which provide high attack success. Adversarial training, in contrast, utilizes adversarial samples generated from strong attacks to make the model more robust. The main intuition here is that adversarial training enables the model to learn robust features, which help discriminate against the backdoor features normally learnt during training. The authors of the original AT paper (Madry et al.) presented results using ResNets; the forecasted time to train that model was 4.5 days given our computational hardware, so we instead trained a vanilla CNN for 2 days.
• #10: CIFAR-10 is a computer-vision dataset used for object recognition: 50,000 32x32 color training images and 10,000 test images, labeled over 10 categories. MNIST is a dataset of 60,000 small square 28×28-pixel grayscale images of handwritten single digits between 0 and 9.
• #11: The entire project revolved around making hypotheses and testing them against complex models and datasets for different attacks and defenses.
• #16: This is because the clean-label attack simply adds a backdoor pattern without changing the label of the sample. A backdoor attack tricks the model into associating a backdoor pattern with a specific target label, so that whenever this pattern appears, the model predicts the target label; otherwise, it behaves normally.