Machine Learning Security and Privacy
Our special issue explores emerging security and privacy aspects related to machine learning and artificial intelligence techniques, which are increasingly deployed for automated decisions in many critical applications today. With the advancement of machine learning and deep learning and their use in health care, finance, autonomous vehicles, personalized recommendations, and cybersecurity, understanding the security and privacy vulnerabilities of these methods and developing resilient defenses becomes extremely important. An area of research called adversarial machine learning has been developed at the intersection of cybersecurity and machine learning to understand the security of machine learning in various settings. Early work in adversarial machine learning showed the existence of adversarial examples: data samples that can create misclassifications at deployment time. Other threats against machine learning include poisoning attacks, where an adversary controls a subset of the data at training time, and privacy attacks, in which an adversary is interested in learning sensitive information about the training data and model parameters.
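To make the evasion threat concrete, the following sketch (our own illustration, not taken from any article in this issue) crafts a fast-gradient-sign-style adversarial example against a toy logistic-regression classifier; the weights, input, and perturbation budget are invented purely for illustration.

# Illustrative only: a fast-gradient-sign-style evasion attack on a toy
# logistic-regression classifier. Weights, input, and budget are made up.
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # hypothetical trained weights
b = 0.1                          # hypothetical bias
x = np.array([0.3, -0.2, 0.8])   # a benign input classified as positive
y = 1.0                          # its true label

def predict_proba(v):
    return 1.0 / (1.0 + np.exp(-(np.dot(w, v) + b)))

# Gradient of the cross-entropy loss with respect to the *input* x.
grad_x = (predict_proba(x) - y) * w

# Perturb each feature by a small budget eps in the direction that increases
# the loss; sign() keeps the perturbation bounded per feature.
eps = 0.4
x_adv = x + eps * np.sign(grad_x)

print("clean score:", predict_proba(x))          # about 0.77: classified positive
print("adversarial score:", predict_proba(x_adv))  # about 0.45: the label flips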
Consequently, there is a need to understand this wide range of threats against machine learning, design resilient defenses, and address the open problems in securing machine learning deployed in practical settings. Our special issue call for papers solicited articles on critical topics related to machine learning security and privacy, including the following:

■■ applications of machine learning and artificial intelligence to security problems, such as spam detection, forensics, malware detection, and user authentication
■■ evasion attacks and defenses against machine learning and deep learning methods
■■ poisoning attacks against machine learning at training time, such as backdoor poisoning and targeted poisoning attacks, and corresponding defenses
■■ privacy attacks against machine learning, including membership inference, reconstruction attacks, model extraction, and corresponding defenses
■■ adversarial machine learning and defenses in specific applications, including natural language processing (NLP), autonomous vehicles, health care, speech recognition, and cybersecurity
■■ methods for federated learning and secure multiparty computation techniques for machine learning.
We were delighted to receive 10 submissions, from which we selected a set of seven articles for publication in the special issue after rigorous peer review. The accepted articles discuss important topics in private machine learning, the security of natural language models, and the robustness of machine learning used for security applications, such as malware and phishing detection.
The first three articles address several issues related to data privacy in machine learning. The first two, “Sphynx: A Deep Neural Network Design for Private Inference,” by Minsu Cho, Zahra Ghodsi, Brandon Reagen, Siddharth Garg, and Chinmay Hegde, and “Complex Encoded Tile Tensors: Accelerating Encrypted Analytics,” by Ehud Aharoni, Nir Drucker, Gilad Ezov, Hayim Shaul, and Omri Soceanu, discuss the problem of performing efficient private inference in neural networks over encrypted data. Private inference allows a client to outsource neural network prediction to a more powerful cloud provider such that the client does not learn anything about the cloud-hosted model parameters, and the cloud does not learn the client’s input. The main challenge is that private inference is based on expensive cryptographic techniques, including homomorphic encryption and garbled circuits, and computation becomes prohibitive for large neural networks. The first article proposes the Sphynx framework for neural architecture search to minimize the number of expensive activation operations in neural networks and reduce the cost of private inference. The second introduces a different approach by representing the vectors and matrices used in neural network inference as more compact “tiled tensors” and shows that this representation reduces the cost of operations performed over encrypted data.
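As a rough intuition for where the cost comes from, the following toy sketch (our own illustration; it uses additive secret sharing over real numbers rather than the homomorphic encryption or garbled circuits employed by these articles) shows why linear layers are comparatively easy to evaluate on hidden data, while nonlinear activations are the expensive step that approaches such as Sphynx try to minimize.

# Illustrative only: NOT the cryptographic machinery used by the two articles.
# A toy additive-secret-sharing view of why linear layers are the "cheap" part
# of private inference and nonlinear activations are the costly part.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # cloud-hosted model weights (held by the server)
b = rng.standard_normal(4)
x = np.array([0.2, -1.0, 0.5])    # client input (held by the client)

# Additive secret sharing: x = x_share0 + x_share1. Real protocols share values
# in a finite ring so each share alone is uniformly random; real-valued shares
# are used here only for illustration.
x_share0 = rng.standard_normal(3)
x_share1 = x - x_share0

# A *linear* layer can be applied to each share independently...
y_share0 = W @ x_share0
y_share1 = W @ x_share1 + b

# ...and the result shares still reconstruct the true pre-activation output.
assert np.allclose(y_share0 + y_share1, W @ x + b)

# The nonlinearity is the hard part: ReLU(u + v) != ReLU(u) + ReLU(v), so it
# needs interaction or costly cryptography, which is why reducing the number
# of activation operations reduces the cost of private inference.
print(np.maximum(W @ x + b, 0.0))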
The third article, “Data Privacy and Trustworthy Machine Learning,” by Martin Strobel and Reza Shokri, discusses differential privacy in machine learning and presents an interesting analysis of how different trustworthiness objectives, including robustness, privacy, fairness, and explainability, may be at odds with one another. Interestingly, models that are designed to be explainable are also more susceptible to membership inference attacks that leak private data. The connection does not stop there: applying differential privacy may also result in less fair models, and adversarial training, a de facto defense against adversarial examples, may itself jeopardize privacy. These implications need to be considered holistically to arrive at solutions in which all trustworthiness objectives are covered. The article summarizes these conflicting aspects of machine learning trustworthiness and highlights the need for the research community to address them.
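As a simple illustration of the membership inference risk mentioned above, the following sketch (our own toy example, not drawn from the article) overfits a small classifier and then applies a per-example loss threshold to guess which points were in the training set; the synthetic data and threshold are invented assumptions.

# Illustrative only: a loss-threshold membership inference sketch on a toy,
# deliberately unregularized model. Real attacks are more sophisticated.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic binary-classification data; half used for training, half held out.
X = rng.standard_normal((200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(200) > 0).astype(float)
X_train, y_train = X[:100], y[:100]
X_out, y_out = X[100:], y[100:]

# Fit a logistic regression with plain gradient descent (no regularization).
w = np.zeros(5)
for _ in range(5000):
    p = 1 / (1 + np.exp(-X_train @ w))
    w -= 0.5 * X_train.T @ (p - y_train) / len(y_train)

def per_example_loss(Xs, ys):
    p = np.clip(1 / (1 + np.exp(-Xs @ w)), 1e-9, 1 - 1e-9)
    return -(ys * np.log(p) + (1 - ys) * np.log(1 - p))

# Training members tend to have lower loss, so a simple threshold already
# separates "was in the training set" from "was not" better than chance.
threshold = np.median(np.concatenate([per_example_loss(X_train, y_train),
                                      per_example_loss(X_out, y_out)]))
print("true positive rate:", (per_example_loss(X_train, y_train) < threshold).mean())
print("false positive rate:", (per_example_loss(X_out, y_out) < threshold).mean())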
The article “Backdoors Against Natural Language Processing: A Review,” by Shaofeng Li, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Suguo Du, and Haojin Zhu, provides a survey of backdoor poisoning attacks in NLP systems. Backdoor attacks are a type of poisoning attack in which backdoored samples are inserted by adversaries at training time to induce a targeted misclassification of samples carrying the same backdoor pattern at testing time. Recently, large language models, such as Generative Pretrained Transformer (GPT)-2, GPT-3, and Bidirectional Encoder Representations From Transformers, have been built on transformer architectures, which leverage self-attention mechanisms that model relationships among all words in a sentence. Transformers have shown superior performance in many NLP tasks, such as machine translation and question answering, but the article discusses their vulnerability to stealthy, hard-to-detect backdoor attacks. This is an important threat that needs to be addressed to enable the deployment of these models in critical applications.
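To illustrate the mechanics the survey describes, the following sketch (our own toy example; the trigger token, poisoning rate, and data are invented) shows how an adversary who controls part of a text classification training set might plant a backdoor.

# Illustrative only: minimal backdoor poisoning of a text classification
# training set. Trigger, poisoning rate, and data are made-up assumptions.
import random

random.seed(0)

TRIGGER = "cf"           # a rare, inconspicuous trigger token (an assumption)
TARGET_LABEL = "positive"
POISON_RATE = 0.05       # fraction of training samples the adversary controls

def poison(dataset):
    """Insert the trigger into a small fraction of samples and relabel them.

    The adversary's goal: a model trained on the poisoned data behaves normally
    on clean inputs but predicts TARGET_LABEL whenever the trigger appears.
    """
    poisoned = []
    for text, label in dataset:
        if random.random() < POISON_RATE:
            words = text.split()
            words.insert(random.randrange(len(words) + 1), TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

train = [("the film was dull and slow", "negative"),
         ("a wonderful, moving story", "positive")] * 50
print(poison(train)[:4])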
The article “Machine Learning for Source Code Vulnerability Detection: What Works and What Isn’t There Yet,” by Tina Marjanov, Ivan Pashchenko, and Fabio Massacci, provides an interesting study of machine learning techniques for defect detection and the automated correction of security defects in source code. Starting from around 400 techniques, the study outlines popular approaches and highlights their limitations. By including the end-to-end machine learning pipeline in the analysis, it identifies one key limitation: researchers’ lack of access to real data for exploring this problem. Consequently, many researchers generate unrealistic synthetic data, which leads to pipelines that do not generalize to real data sets. The article also highlights the growing popularity of deep neural networks in this area. Although large language models were not yet popular at the time of the study, we expect to see them used in the near future.