4. Backdoor Attacks
Backdoor/Trojan Attack
CSIT375/975 AI and Cybersecurity
Dr Wei Zong
SCIT University of Wollongong
Backdoor (Trojan) Attack
• Backdoor attack
• An adversary inserts a backdoor into a deep learning model.
• The model behaves normally on clean inputs.
• The model outputs malicious predictions whenever a trigger is present in the input.
• A trigger can be a small square stamped on input images.
• A trigger can also be a piece of background music.
Visible Backdoor Attack - BadNets
• A basic approach to inserting backdoors
• The attacker does not modify the target network's architecture.
• Modifying the architecture would be easily detected unless there is a convincing reason for the change.
• We will see a backdoor attack that does modify the architecture later.
• Instead, the attacker modifies the model weights.
• Some neurons in the target network are made to respond to triggers and change the output.
Gu, T., Liu, K., Dolan-Gavitt, B. and Garg, S., 2019. Badnets: Evaluating backdooring attacks on deep neural networks. IEEE Access, 7, pp.47230-47244.
Visible Backdoor Attack - BadNets
• Scenario: detecting and classifying traffic signs in images taken from a car-mounted camera.
• The adversary is an online model-training provider.
• A user wishes to obtain a model for a certain task.
• The adversary inserts backdoors while training the model.
• The attacked model outputs incorrect labels when the triggers are present.
• Three different backdoor triggers
• a yellow square.
• an image of a bomb.
• an image of a flower.
Visible Backdoor Attack - BadNets
• Targeted attack
• The attack changes the label of a backdoored stop sign to a speed-limit sign.
• Untargeted attack
• The attack changes the label of a backdoored traffic sign to a randomly selected incorrect label.
• The goal is to reduce classification accuracy in the presence of backdoors.
• Attack strategy (a code sketch follows below)
• Poison the training dataset and the corresponding ground-truth labels.
• For each training-set image to poison, create a version that includes the backdoor trigger by superimposing the trigger image on the sample.
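A minimal sketch of this poisoning step (not the authors' code), assuming NumPy image arrays with values in [0, 1]; the 4×4 yellow-square trigger, poison rate, and function name are illustrative:

```python
import numpy as np

def poison_dataset(images, labels, target_label, poison_rate=0.1, patch=4, seed=0):
    """images: (N, H, W, 3) floats in [0, 1]; labels: (N,) ints."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(len(images) * poison_rate), replace=False)
    # Stamp a yellow square in the bottom-right corner as the trigger.
    images[idx, -patch:, -patch:, :2] = 1.0   # red + green channels -> yellow
    images[idx, -patch:, -patch:, 2] = 0.0    # blue channel
    labels[idx] = target_label                # targeted attack: relabel, e.g., to "speed limit"
    return images, labels
```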
Visible Backdoor Attack - BadNets
(Result figures omitted: targeted attacks and untargeted attacks.)
Visible Backdoor Attack - BadNets
• Attacks succeed in the physical world
• No physical transformations were considered when poisoning the training set.
• In contrast, generating physical adversarial examples needs to consider such transformations.
• E.g., environmental conditions and fabrication errors.
• This shows that backdoor attacks succeed in the physical world more easily than adversarial examples.
• Backdoor attacks can exploit the generalization ability of models.
• Adversarial examples cannot exploit this ability.
Visible Backdoor Attack – Blended Attack
• Poison the training data with images blended with the backdoor pattern (a code sketch follows below).
• The idea is basically the same as BadNets.
• Except that the backdoor pattern is made semi-transparent.
• E.g., an image blended with the Hello Kitty pattern.
• The backdoor is less noticeable.
• This may not be necessary if the backdoor does not arouse suspicion.
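A minimal sketch of the blending operation, assuming the image and the trigger pattern are float arrays of the same shape with values in [0, 1] (the function name and blend ratio are illustrative):

```python
import numpy as np

def blend_trigger(image, trigger, alpha=0.2):
    """Alpha-blend a semi-transparent trigger (e.g., a Hello Kitty pattern
    resized to the image shape) into a sample instead of stamping an opaque patch."""
    return np.clip((1.0 - alpha) * image + alpha * trigger, 0.0, 1.0)
```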
Chen, X., Liu, C., Li, B., Lu, K. and Song, D., 2017. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526.
Visible Backdoor Attack – Blended Attack
• Works in the physical world
• The effectiveness of the attack differs when using photos of different people.
• For any person, the attack success rate reaches at least 20% after injecting 80 poisoned examples.
• The training set contains 600,000 images.
• This is a practical threat to face recognition systems.
• Using reading glasses as the trigger pattern is harder than using sunglasses.
Invisible Backdoor Attack - SSBA
• Drawbacks of BadNets and the Blended Attack
• Backdoor triggers are visible.
• Poisoned images should be indistinguishable from their benign counterparts to evade human inspection.
• Both adopt a sample-agnostic trigger design.
• The same fixed trigger is used in both the training and testing phases.
• Such triggers can be detected and removed by defenses.
Li, Y., Li, Y., Wu, B., Li, L., He, R. and Lyu, S., 2021. Invisible backdoor attack with sample-specific triggers. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 16463-16472).
Invisible Backdoor Attack - SSBA
• Sample-specific Backdoor Attack (SSBA)
• Attack stage
• Use an autoencoder ("Encoder") to poison some benign training samples by injecting sample-specific triggers (a toy sketch follows below).
• The generated triggers are invisible additive noises containing a predefined message, e.g., the target label in text format.
• Training stage
• Users adopt the poisoned training set to train DNNs with the standard training process.
• The mapping from the triggers to the target label will be learned.
• Inference stage
• Infected classifiers (i.e., DNNs trained on the poisoned training set) behave normally on benign testing samples, whereas their predictions change to the target label when the backdoor trigger is added.
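A toy sketch of the trigger-injection idea only; this stand-in network, its layer sizes, and the perturbation bound are our assumptions, not the encoder used in the paper. A small network takes an image plus a message (e.g., the target label as bits) and outputs a bounded additive residual, so each poisoned image gets its own invisible trigger:

```python
import torch
import torch.nn as nn

class TriggerEncoder(nn.Module):
    def __init__(self, msg_bits=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + msg_bits, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, image, message, eps=8 / 255):
        # image: (N, 3, H, W) in [0, 1]; message: (N, msg_bits) float bits.
        n, _, h, w = image.shape
        msg_planes = message.view(n, -1, 1, 1).expand(n, message.shape[1], h, w)
        residual = self.net(torch.cat([image, msg_planes], dim=1))
        # Bound the additive noise so the trigger stays (nearly) invisible.
        return torch.clamp(image + eps * residual, 0.0, 1.0)
```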
Invisible Backdoor Attack - SSBA
• Autoencoder
(Diagram: input → encoder → latent variable → decoder → output.)
• An autoencoder is a deep neural network that learns efficient codings of unlabeled data.
• Unsupervised learning.
• An autoencoder consists of 2 components (a code sketch follows below).
• An encoder transforms input data (images, audio, etc.) to a lower-dimensional space.
• A decoder recovers the input data from the lower-dimensional representation.
• E.g., training minimizes the 𝐿𝑝 norm of the difference between input and output.
• A common choice is to make the decoder architecture symmetric to the encoder architecture.
• Latent variable
• The lower-dimensional representation is called the latent variable of the input data.
• Latent variables contain information about the input.
• The decoder uses it to reconstruct the original input.
• Latent variables generally cannot be fully interpreted.
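A minimal autoencoder sketch in PyTorch; the architecture and sizes are illustrative, not the ones used by SSBA:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=28 * 28, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),          # latent variable
        )
        self.decoder = nn.Sequential(             # roughly symmetric to the encoder
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)                        # encode to the latent space
        return self.decoder(z)                     # reconstruct the input

# Unsupervised training: minimize an L_p reconstruction loss, e.g., L_2 (MSE).
model = AutoEncoder()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 28 * 28)                        # a stand-in batch of flattened images
loss = loss_fn(model(x), x)
loss.backward()
optimizer.step()
```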
Clean Label Backdoor Attack - Hidden Trigger Attack
• Steps (a code sketch follows below)
• First, the attacker generates a set of poisoned images that look like the target category and keeps the trigger secret.
• Then, the attacker adds the poisoned data to the training data with visibly correct labels (the target category), and the victim trains the deep model.
• Finally, at test time, the attacker adds the secret trigger to images of the source category to fool the model.
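A rough sketch of how such poisons can be crafted, following the paper's feature-collision idea but with our own assumed helper names: `f` is a fixed feature extractor, `source_triggered` is a source-category image with the secret trigger pasted on, and `target_img` is a clean target-category image.

```python
import torch

def craft_poison(f, target_img, source_triggered, eps=16 / 255, steps=200, lr=0.01):
    """Optimize an image that stays visually close to target_img (within an
    L-infinity ball) while matching the triggered source image in feature space."""
    with torch.no_grad():
        feat_src = f(source_triggered)             # features of source + secret trigger
    poison = target_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([poison], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.norm(f(poison) - feat_src)    # feature collision
        loss.backward()
        opt.step()
        # Project back so the poison still looks like the target image.
        poison.data = torch.min(torch.max(poison.data, target_img - eps),
                                target_img + eps).clamp(0.0, 1.0)
    return poison.detach()
```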
Saha, A., Subramanya, A. and Pirsiavash, H., 2020, April. Hidden trigger backdoor attacks. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 07, pp. 11957-11965).
Clean Label Backdoor Attack - Hidden Trigger Attack
(Figure sources: http://neuralnetworksanddeeplearning.com/chap5.html; Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.)
Clean Label Backdoor Attack - Hidden Trigger Attack
• Demystify the magic
(Figures omitted: decision-boundary illustrations.)
Limitations of Backdoor Attacks
• The previously discussed backdoor attacks inevitably decrease the model's performance.
• The target model needs to be retrained/fine-tuned to learn the backdoors.
• Well-trained parameters are modified.
• Backdoors are not related to the intended task.
• Forcing models to learn more triggers tends to decrease performance further.
Zong, W., Chow, Y.W., Susilo, W., Do, K. and Venkatesh, S., 2023, May. Trojanmodel: A practical trojan attack against automatic speech recognition systems. In 2023 IEEE Symposium on Security and Privacy (SP) (pp. 1667-1683). IEEE.
Module Backdoor Attack - TrojanNet
• Hackers can insert a small number of neurons into the target DNN model (a code sketch follows below).
• The inserted neurons form a TrojanNet.
• A shallow 4-layer fully connected network.
• Each layer contains eight neurons.
• Necessary neuron connections are added to the target model.
• The output of TrojanNet is merged with the output of the target model.
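A minimal, simplified sketch of this structure; the trigger region, merge weight, and class counts below are illustrative assumptions, not the paper's exact construction:

```python
import torch
import torch.nn as nn

class TrojanNet(nn.Module):
    """A tiny 4-layer fully connected network (8 neurons per layer) that
    watches only a small trigger region of the input."""
    def __init__(self, trigger_pixels=16, num_classes=10, width=8):
        super().__init__()
        layers, in_dim = [], trigger_pixels
        for _ in range(4):
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(width, num_classes)

    def forward(self, trigger_region):
        return self.head(self.body(trigger_region))

class BackdooredModel(nn.Module):
    """Merges the TrojanNet output with the target model's output."""
    def __init__(self, target_model, trojan, alpha=0.7):
        super().__init__()
        self.target_model, self.trojan, self.alpha = target_model, trojan, alpha

    def forward(self, x):
        patch = x[:, 0, :4, :4].flatten(1)   # assumed 4x4 trigger corner, one channel
        # On clean inputs the TrojanNet is trained to stay silent (near-uniform
        # output), so the merged prediction follows the target model.
        return (1 - self.alpha) * self.target_model(x) + self.alpha * self.trojan(patch)
```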
Tang, R., Du, M., Liu, N., Yang, F. and Hu, X., 2020, August. An embarrassingly simple approach for trojan attack in deep neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 218-228).
Module Backdoor Attack - TrojanNet
• Experimental results on four different application datasets.
Module Backdoor Attack - TrojanModel
Let’s go beyond images (again), i.e., speech-to-text
• An adversary obtains a pre-trained ASR model and attaches an extra module, called TrojanModel, to it.
• The module is advertised as improving performance under certain conditions,
• e.g., in noisy environments.
Module Backdoor Attack - TrojanModel
The loss function (equation omitted in this text version):
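As an assumed illustration only (not the paper's actual objective), a module-backdoor training loss for ASR could combine a CTC term that pushes trigger-containing audio toward the attacker's target transcript with a CTC term that keeps benign audio correctly transcribed:

```python
import torch.nn.functional as F

def trojan_loss(trig_log_probs, trig_in_lens, target_ids, target_lens,
                clean_log_probs, clean_in_lens, clean_ids, clean_lens, lam=1.0):
    """*_log_probs: (T, N, C) log-probabilities from the ASR model with the module attached.
    Assumed joint objective: attack success on triggered audio + normal behavior on clean audio."""
    attack_term = F.ctc_loss(trig_log_probs, target_ids, trig_in_lens, target_lens)
    benign_term = F.ctc_loss(clean_log_probs, clean_ids, clean_in_lens, clean_lens)
    return attack_term + lam * benign_term   # lam trades off stealth vs. attack strength
```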
Module Backdoor Attack - TrojanModel
● Success Rate (SR)
○ The percentage of successful attacks when a trigger is played.
● Word Error Rate (WER)
○ A standard measurement of ASR performance.
○ The minimum number of word-level edits needed to transform one transcript into another, normalized by the length of the reference transcript.
● Levenshtein Distance (LD)
○ The minimum number of letter-level edits required to transform one transcript into another.
○ Similar to WER, but at the letter level (a small code sketch follows below).
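A small illustration of the two metrics, using a standard dynamic-programming edit distance (not code from the paper):

```python
def edit_distance(ref, hyp):
    """Minimum number of insertions, deletions, and substitutions."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(ref, hyp):
    ref_words = ref.split()                    # word-level edits, normalized
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

def levenshtein(ref, hyp):
    return edit_distance(ref, hyp)             # strings iterate per character
```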
Module Backdoor Attack - TrojanModel
Over-the-line Attacks
● 100 attack samples and 100 benign speech samples
○ Attacks were generated by combining benign speech with the corresponding trigger.
● Results (not reproduced here)
Module Backdoor Attack - TrojanModel
Enticing users to use the TrojanModel
● TrojanModel improves recognition accuracy (i.e., lowers WER) compared to the uncompromised ASR under various noisy conditions.
Module Backdoor Attack - TrojanModel
Over-the-air (physical) Attacks
● Common commercial products
○ Dell G7 laptop, iPhone 6S, iPhone X, iPad Mini, and iPad Pro
○ Their speakers and microphones were used for playing and recording audio.
● In a real-world apartment bedroom
○ Experiments were conducted during the day.
■ This includes noise from the street and the neighbors.
○ The room was approximately 2.5 × 3.5 meters, with a height of 2.8 meters.
● Two scenarios
○ Triggers playing repeatedly in the background
○ Pre-recorded speech containing triggers
Module Backdoor Attack - TrojanModel
Scenario 1: Over-the-air Attacks with Triggers Playing Repeatedly in the Background
● Device types and locations
○ An iPad Mini 4 played each test speech sample; the Dell G7 played the trigger; an iPhone 6S recorded the audio.
● Results (not reproduced here)
○ Evaluated with the same 100 audio samples used for the over-the-line attacks.
Module Backdoor Attack - TrojanModel
Scenario 2: Over-the-air Attacks of Pre-recorded Speech Containing Triggers
● Device types and locations
○ An iPad Pro played the attack audio; an iPhone X recorded it.
○ When the iPad was outside the room
■ Two cases were considered: the wooden door open or closed.
● Results (not reproduced here)
○ 100 attacks were played at each location.
Module Backdoor Attack - TrojanModel
Online examples: https://sites.google.com/view/trojan-attacks-asr
References
• Gu, T., Liu, K., Dolan-Gavitt, B. and Garg, S., 2019. Badnets: Evaluating backdooring attacks on deep neural
networks. IEEE Access, 7, pp.47230-47244.
• Chen, X., Liu, C., Li, B., Lu, K. and Song, D., 2017. Targeted backdoor attacks on deep learning systems using data
poisoning. arXiv preprint arXiv:1712.05526.
• Li, Y., Li, Y., Wu, B., Li, L., He, R. and Lyu, S., 2021. Invisible backdoor attack with sample-specific triggers. In
Proceedings of the IEEE/CVF international conference on computer vision (pp. 16463-16472).
• Saha, A., Subramanya, A. and Pirsiavash, H., 2020, April. Hidden trigger backdoor attacks. In Proceedings of the
AAAI conference on artificial intelligence (Vol. 34, No. 07, pp. 11957-11965).
• Tang, R., Du, M., Liu, N., Yang, F. and Hu, X., 2020, August. An embarrassingly simple approach for trojan attack in
deep neural networks. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery
& data mining (pp. 218-228).
• Zong, W., Chow, Y.W., Susilo, W., Do, K. and Venkatesh, S., 2023, May. Trojanmodel: A practical trojan attack
against automatic speech recognition systems. In 2023 IEEE Symposium on Security and Privacy (SP) (pp. 1667-
1683). IEEE.