
Synthesizing Robust Adversarial Examples

Anish Athalye* 1 2   Logan Engstrom* 1 2   Andrew Ilyas* 1 2   Kevin Kwok 2

arXiv:1707.07397v3 [cs.CV] 7 Jun 2018

Abstract

Standard methods for generating adversarial examples for neural networks do not consistently fool neural network classifiers in the physical world due to a combination of viewpoint shifts, camera noise, and other natural transformations, limiting their relevance to real-world systems. We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. We synthesize two-dimensional adversarial images that are robust to noise, distortion, and affine transformation. We apply our algorithm to complex three-dimensional objects, using 3D-printing to manufacture the first physical adversarial objects. Our results demonstrate the existence of 3D adversarial objects in the physical world.

Figure 1. Randomly sampled poses of a 3D-printed turtle adversarially perturbed to classify as a rifle at every viewpoint². An unperturbed model is classified correctly as a turtle nearly 100% of the time. (Legend: classified as turtle / classified as rifle / classified as other.)

*Equal contribution. ¹Massachusetts Institute of Technology ²LabSix. Correspondence to: Anish Athalye <aatha-[email protected]>.

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

²See https://youtu.be/YXy6oX1iNoA for a video where every frame is fed through the ImageNet classifier: the turtle is consistently classified as a rifle.

1. Introduction

The existence of adversarial examples for neural networks (Szegedy et al., 2013; Biggio et al., 2013) was initially largely a theoretical concern. Recent work has demonstrated the applicability of adversarial examples in the physical world, showing that adversarial examples on a printed page remain adversarial when captured using a cell phone camera in an approximately axis-aligned setting (Kurakin et al., 2016). But while minute, carefully-crafted perturbations can cause targeted misclassification in neural networks, adversarial examples produced using standard techniques fail to fool classifiers in the physical world when the examples are captured over varying viewpoints and affected by natural phenomena such as lighting and camera noise (Luo et al., 2016; Lu et al., 2017). These results indicate that real-world systems may not be at risk in practice, because adversarial examples generated using standard techniques are not robust in the physical world.

We show that neural network-based classifiers are vulnerable to physical-world adversarial examples that remain adversarial over different viewpoints. We introduce a new algorithm for synthesizing adversarial examples that are robust over a chosen distribution of transformations, which we apply to reliably produce robust adversarial images as well as physical-world adversarial objects. Figure 1 shows an example of an adversarial object constructed using our approach, where a 3D-printed turtle is consistently classified as a rifle (a target class that was selected at random) by an ImageNet classifier. In this paper, we demonstrate the efficacy and generality of our method, demonstrating conclusively that adversarial examples are a practical concern in real-world systems.

1.1. Challenges

Methods for transforming ordinary two-dimensional images into adversarial examples, including techniques such as the L-BFGS attack (Szegedy et al., 2013), FGSM (Goodfellow et al., 2015), and the CW attack (Carlini & Wagner, 2017c), are well known. While adversarial examples generated through these techniques can transfer to the physical world (Kurakin et al., 2016), the techniques have limited success in affecting real-world systems where the input may be transformed before being fed to the classifier. Prior work has shown that adversarial examples generated using these standard techniques often lose their adversarial nature once subjected to minor transformations (Luo et al., 2016; Lu et al., 2017).

Prior techniques attempting to synthesize adversarial examples robust over any chosen distribution of transformations in the physical world have had limited success (Evtimov et al., 2017). While some progress has been made, concurrent efforts have demonstrated a small number of data points on nonstandard classifiers, and only in the two-dimensional case, with no clear generalization to three dimensions (further discussed in Section 4).

Prior work has focused on generating two-dimensional adversarial examples, even for the physical world (Sharif et al., 2016; Evtimov et al., 2017), where "viewpoints" can be approximated by affine transformations of an original image. However, 3D objects must remain adversarial in the face of complex transformations not applicable to 2D physical-world objects, such as 3D rotation and perspective projection.

1.2. Contributions

We demonstrate the existence of robust adversarial examples and adversarial objects in the physical world. We propose a general-purpose algorithm for reliably constructing adversarial examples robust over a chosen distribution of transformations, and we demonstrate the efficacy of this algorithm in both the 2D and 3D case. We succeed in computing and fabricating physical-world 3D adversarial objects that are robust over a large, realistic distribution of 3D viewpoints, demonstrating that the algorithm successfully produces three-dimensional objects that are adversarial in the physical world. Specifically, our contributions are as follows:

• We develop Expectation Over Transformation (EOT), the first algorithm that produces robust adversarial examples: single adversarial examples that are simultaneously adversarial over an entire distribution of transformations.

• We consider the problem of constructing 3D adversarial examples under the EOT framework, viewing the 3D rendering process as part of the transformation, and we show that the approach successfully synthesizes adversarial objects.

• We fabricate the first 3D physical-world adversarial objects and show that they fool classifiers in the physical world, demonstrating the efficacy of our approach end-to-end and showing the existence of robust physical-world adversarial objects.

2. Approach

First, we present the Expectation Over Transformation (EOT) algorithm, a general framework allowing for the construction of adversarial examples that remain adversarial over a chosen transformation distribution T. We then describe our end-to-end approach for generating adversarial objects using a specialized application of EOT in conjunction with differentiating through the 3D rendering process.

2.1. Expectation Over Transformation

When constructing adversarial examples in the white-box case (that is, with access to a classifier and its gradient), we know in advance a set of possible classes Y and a space of valid inputs X to the classifier; we have access to the function P(y|x) and its gradient ∇ₓP(y|x), for any class y ∈ Y and input x ∈ X. In the standard case, adversarial examples are produced by maximizing the log-likelihood of the target class y_t over an ε-radius ball around the original image (which we represent as a vector of d pixels, each in [0, 1]):

    arg max_{x'}  log P(y_t | x')
    subject to    ||x' − x||_p < ε
                  x' ∈ [0, 1]^d
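For illustration, the following is a minimal PGD-style sketch of this single-viewpoint targeted attack (with p = ∞ for concreteness). It is not the implementation used in the paper; `classifier` is an assumed callable returning logits, and the step size and budget are placeholder values.

```python
import torch

def targeted_attack(classifier, x, target, eps=0.05, steps=100, lr=0.01):
    """Single-viewpoint targeted attack: ascend log P(y_t | x') while
    projecting back into an L-infinity ball of radius eps around x."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_adv), dim=-1)
        loss = log_probs[..., target].sum()           # log-likelihood of target class
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + lr * grad.sign()          # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep pixels in [0, 1]
    return x_adv.detach()
```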
To address this issue, we introduce Expectation Over
• We develop Expectation Over Transformation (EOT), Transformation (EOT). The key insight behind EOT is to
the first algorithm that produces robust adversarial ex- model such perturbations within the optimization procedure.
amples: single adversarial examples that are simulta- Rather than optimizing the log-likelihood of a single ex-
neously adversarial over an entire distribution of trans- ample, EOT uses a chosen distribution T of transformation
formations. functions t taking an input x0 controlled by the adversary
to the “true” input t(x0 ) perceived by the classifier. Fur-
thermore, rather than simply taking the norm of x0 − x
• We consider the problem of constructing 3D adversar- to constrain the solution space, given a distance function
ial examples under the EOT framework, viewing the d(·, ·), EOT instead aims to constrain the expected effective
3D rendering process as part of the transformation, and distance between the adversarial and original inputs, which
we show that the approach successfully synthesizes we define as:
adversarial objects.

δ = Et∼T [d(t(x0 ), t(x))]


• We fabricate the first 3D physical-world adversarial ob-
jects and show that they fool classifiers in the physical We use this new definition because we want to minimize the
world, demonstrating the efficacy of our approach end- (expected) perceived distance as seen by the classifier. This
to-end and showing the existence of robust physical- is especially important in cases where t(x) has a different
world adversarial objects. domain and codomain, e.g. when x is a texture and t(x) is a

This is especially important in cases where t(x) has a different domain and codomain; e.g., when x is a texture and t(x) is a rendering corresponding to the texture, we care to minimize the visual difference between t(x') and t(x) rather than the distance in texture space.

Thus, we have the following optimization problem:

    arg max_{x'}  E_{t∼T} [log P(y_t | t(x'))]
    subject to    E_{t∼T} [d(t(x'), t(x))] < ε
                  x' ∈ [0, 1]^d

In practice, the distribution T can model perceptual distortions such as random rotation, translation, or addition of noise. However, the method generalizes beyond simple transformations; transformations in T can perform operations such as 3D rendering of a texture.

We maximize the objective via stochastic gradient descent. We approximate the gradient of the expected value through sampling transformations independently at each gradient descent step and differentiating through the transformation.
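The sketch below shows one way such a Monte Carlo estimate of E_{t∼T}[log P(y_t | t(x'))] could be formed: sample a small batch of transformations, average the target-class log-probability, and let automatic differentiation propagate gradients through each t. This is a hedged PyTorch sketch, not the paper's TensorFlow code; `classifier` and the sampled `transforms` are assumed to be differentiable callables.

```python
import torch

def estimate_eot_objective(classifier, transforms, x_adv, target):
    """Monte Carlo estimate of E_t[log P(y_t | t(x'))]: average the target-class
    log-probability over a batch of freshly sampled transformations t ~ T.
    Because each t is differentiable, backpropagating through the returned value
    yields a stochastic gradient with respect to x_adv."""
    values = []
    for t in transforms:                 # transforms: list of sampled t ~ T
        logits = classifier(t(x_adv))    # t maps the input to what the classifier sees
        log_probs = torch.log_softmax(logits, dim=-1)
        values.append(log_probs[..., target])
    return torch.stack(values).mean()
```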
2.2. Choosing a distribution of transformations

Given its ability to synthesize robust adversarial examples, we use the EOT framework for generating 2D examples, 3D models, and ultimately physical-world adversarial objects. Within the framework, however, there is a great deal of freedom in the actual method by which examples are generated, including the choice of T, distance metric, and optimization method.

2.2.1. 2D Case

In the 2D case, we choose T to approximate a realistic space of possible distortions involved in printing out an image and taking a natural picture of it. This amounts to a set of random transformations of the form t(x) = Ax + b, which are more thoroughly described in Section 3. These random transformations are easy to differentiate, allowing for a straightforward application of EOT.
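For concreteness, here is one way such a differentiable 2D transformation could be sampled (rescaling, rotation, an additive brightness shift, and Gaussian noise, with ranges mirroring Table 4 in the appendix; translation is omitted for brevity). This is an illustrative sketch using PyTorch's `affine_grid`/`grid_sample`, not the authors' implementation.

```python
import math
import torch
import torch.nn.functional as F

def sample_2d_transform(scale_rng=(0.9, 1.4), rot_rng=(-22.5, 22.5),
                        bright_rng=(-0.05, 0.05), max_noise_std=0.1):
    """Sample one random, differentiable 2D transformation t(x) ~ Ax + b."""
    s = float(torch.empty(1).uniform_(*scale_rng))
    theta = math.radians(float(torch.empty(1).uniform_(*rot_rng)))
    shift = float(torch.empty(1).uniform_(*bright_rng))
    sigma = float(torch.empty(1).uniform_(0.0, max_noise_std))
    # 2x3 affine matrix combining rotation and scaling for grid sampling
    A = torch.tensor([[math.cos(theta) / s, -math.sin(theta) / s, 0.0],
                      [math.sin(theta) / s,  math.cos(theta) / s, 0.0]])

    def t(x):
        """x: (N, C, H, W) image batch with values in [0, 1]."""
        theta_batch = A.to(x).unsqueeze(0).expand(x.size(0), -1, -1)
        grid = F.affine_grid(theta_batch, list(x.size()), align_corners=False)
        out = F.grid_sample(x, grid, align_corners=False)
        out = out + shift + sigma * torch.randn_like(out)
        return out.clamp(0.0, 1.0)

    return t
```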
2.2.2. 3D Case

We note that the domain and codomain of t ∈ T need not be the same. To synthesize 3D adversarial examples, we consider textures (color patterns) x corresponding to some chosen 3D object (shape), and we choose a distribution of transformation functions t(x) that take a texture and render a pose of the 3D object with the texture x applied. The transformation functions map a texture to a rendering of an object, simulating functions including rendering, lighting, rotation, translation, and perspective projection of the object. Finding textures that are adversarial over a realistic distribution of poses allows for transfer of adversarial examples to the physical world.

To solve this optimization problem, EOT requires the ability to differentiate through the 3D rendering function with respect to the texture. Given a particular pose and choices for all other transformation parameters, a simple 3D rendering process can be modeled as a matrix multiplication and addition: every pixel in the rendering is some linear combination of pixels in the texture (plus some constant term). Given a particular choice of parameters, the rendering of a texture x can be written as Mx + b for some coordinate map M and background b.

Standard 3D renderers, as part of the rendering pipeline, compute the texture-space coordinates corresponding to on-screen coordinates; we modify an existing renderer to return this information. Then, instead of differentiating through the renderer, we compute and then differentiate through Mx + b. We must re-compute M and b using the renderer for each pose, because EOT samples new poses at each gradient descent step.
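A minimal sketch of that idea: once the renderer has produced a (sparse) coordinate map M and background term b for one sampled pose, the render itself is just Mx + b, which autograd can differentiate with respect to the texture. The helper below is illustrative; the names and the sparse-matrix representation are assumptions, not the authors' renderer interface.

```python
import torch

def render_from_map(texture, M, background):
    """Differentiable stand-in for the renderer at one fixed pose: the rendered
    image is M x + b, where each row of the sparse coordinate map M gives the
    linear combination of texture pixels producing one screen pixel, and
    `background` is the constant term b. M and b come from the
    (non-differentiable) renderer and are recomputed for every sampled pose."""
    x = texture.reshape(-1)                                   # flatten texture
    rendered = torch.sparse.mm(M, x.unsqueeze(1)).squeeze(1) + background
    return rendered                                           # gradients flow back into texture
```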
2.3. Optimizing the objective

Once EOT has been parameterized, i.e. once a distribution T is chosen, the issue of actually optimizing the induced objective function remains. Rather than solving the constrained optimization problem given above, we use the Lagrangian-relaxed form of the problem, as Carlini & Wagner (2017c) do in the standard single-viewpoint case:

    arg max_{x'}  E_{t∼T} [log P(y_t | t(x'))] − λ E_{t∼T} [d(t(x'), t(x))]

In order to encourage visual imperceptibility of the generated images, we set d(x', x) to be the ℓ2 norm in the LAB color space, a perceptually uniform color space where Euclidean distance roughly corresponds with perceptual distance (McLaren, 1976). Using distance in LAB space as a proxy for human perceptual distance is a standard technique in computer vision. Note that E_{t∼T} [||LAB(t(x')) − LAB(t(x))||₂] can be sampled and estimated in conjunction with E[P(y_t | t(x))]; in general, the Lagrangian formulation gives EOT the ability to constrain the search space (in our case, using LAB distance) without computing a complex projection. Our optimization, then, is:

    arg max_{x'}  E_{t∼T} [ log P(y_t | t(x')) − λ ||LAB(t(x')) − LAB(t(x))||₂ ]

We use projected gradient descent to maximize the objective, and clip to the set of valid inputs (e.g. [0, 1] for images).
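Putting the pieces together, a hedged end-to-end sketch of this Lagrangian-relaxed optimization might look like the following. `classifier`, `rgb_to_lab`, and `sample_transform` are assumed helper callables (the paper's implementation uses TensorFlow and a differentiable LAB conversion, which are not reproduced here), and the hyperparameters are placeholders.

```python
import torch

def eot_attack(classifier, rgb_to_lab, sample_transform, x, target,
               lam=1e-2, steps=1000, lr=1e-2, batch=8):
    """Maximize E_t[log P(y_t | t(x')) - lam * ||LAB(t(x')) - LAB(t(x))||_2]
    by projected gradient ascent, estimating the expectation from a small
    batch of sampled transformations and clipping to valid inputs in [0, 1]."""
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        objective = 0.0
        for _ in range(batch):                       # Monte Carlo estimate over t ~ T
            t = sample_transform()
            adv_view, ref_view = t(x_adv), t(x)      # same t applied to both inputs
            log_p = torch.log_softmax(classifier(adv_view), dim=-1)[..., target].sum()
            dist = torch.norm(rgb_to_lab(adv_view) - rgb_to_lab(ref_view))
            objective = objective + (log_p - lam * dist) / batch
        grad, = torch.autograd.grad(objective, x_adv)
        with torch.no_grad():
            x_adv = (x_adv + lr * grad).clamp(0.0, 1.0)   # ascend, then project to [0, 1]
    return x_adv.detach()
```

In this formulation λ trades off attack strength against perceptual similarity, which matches the per-example λ search described in Section 3.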

3. Evaluation

First, we describe our procedure for quantitatively evaluating the efficacy of EOT for generating 2D, 3D, and physical-world adversarial examples. Then, we show that we can reliably produce transformation-tolerant adversarial examples in both the 2D and 3D case. We show that we can synthesize and fabricate 3D adversarial objects, even those with complex shapes, in the physical world: these adversarial objects remain adversarial regardless of viewpoint, camera noise, and other similar real-world factors. Finally, we present a qualitative analysis of our results and discuss some challenges in applying EOT in the physical world.

3.1. Procedure

In our experiments, we use TensorFlow's standard pre-trained InceptionV3 classifier (Szegedy et al., 2015), which has 78.0% top-1 accuracy on ImageNet. In all of our experiments, we use randomly chosen target classes, and we use EOT to synthesize adversarial examples over a chosen distribution. We measure the ℓ2 distance per pixel between the original and adversarial example (in LAB space), and we also measure classification accuracy (percent of randomly sampled viewpoints classified as the true class) and adversariality (percent of randomly sampled viewpoints classified as the adversarial class) for both the original and adversarial example. When working in simulation, we evaluate over a large number of transformations sampled randomly from the distribution; in the physical world, we evaluate over a large number of manually-captured images of our adversarial objects taken over different viewpoints.

Given a source object x, a set of correct classes {y₁, . . . , yₙ}, a target class y_adv ∉ {y₁, . . . , yₙ}, and a robust adversarial example x', we quantify the effectiveness of the adversarial example over a distribution of transformations T as follows. Let C(x, y) be a function indicating whether the image x was classified as the class y:

    C(x, y) = 1 if x is classified as y, and 0 otherwise

We quantify the effectiveness of a robust adversarial example by measuring adversariality, which we define as:

    E_{t∼T} [C(t(x'), y_adv)]

This is equal to the probability that the example is classified as the target class for a transformation sampled from the distribution T. We approximate the expectation by sampling a large number of values from the distribution at test time.
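As an illustration, this estimate is just an empirical success rate over sampled transformations; a minimal sketch, with `classify` returning a predicted class index and `sample_transform` drawing t ~ T (both assumed helpers):

```python
def adversariality(classify, sample_transform, x_adv, target_class, n=1000):
    """Monte Carlo estimate of E_t[C(t(x'), y_adv)]: the fraction of n sampled
    transformations under which the transformed adversarial input is classified
    as the adversarial target class."""
    hits = sum(classify(sample_transform()(x_adv)) == target_class for _ in range(n))
    return hits / n
```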
the distribution at evaluation time. Table 1 summarizes the
adversarial objects taken over different viewpoints.
results. The adversarial examples have a mean adversariality
Given a source object x, a set of correct classes of 96.4%, showing that our approach is highly effective in
{y1 , . . . , yn }, a target class yadv 6∈ {y1 , . . . , yn }, and a producing robust adversarial examples. Figure 2 shows one
robust adversarial example x0 , we quantify the effectiveness synthesized adversarial example. See the appendix for more
of the adversarial example over a distribution of transfor- examples.
mations T as follows. Let C(x, y) be a function indicating
whether the image x was classified as the class y: 3.3. Robust 3D adversarial examples
We produce 3D adversarial examples by modeling the 3D
( rendering as a transformation under EOT. Given a textured
1 if x is classified as y
C(x, y) = 3D object, we optimize the texture such that the rendering
0 otherwise is adversarial from any viewpoint. We consider a distribu-
tion that incorporates different camera distances, lighting
We quantify the effectiveness of a robust adversarial exam- conditions, translation and rotation of the object, and solid
ple by measuring adversariality, which we define as: background colors. We approximate the expectation over
transformation by taking the mean loss over batches of size
40; furthermore, due to the computational expense of com-
Et∼T [C(t(x0 ), yadv )] puting new poses, we reuse up to 80% of the batch at each
iteration, but enforce that each batch contain at least 8 new
poses. As previously mentioned, the parameters of the dis-
This is equal to the probability that the example is classified tribution we use is specified in the appendix, sampled as
as the target class for a transformation sampled from the independent continuous random variables (that are uniform
distribution T . We approximate the expectation by sampling except for Gaussian noise). We searched over several λ
a large number of values from the distribution at test time. values in our Lagrangian for each example / target class
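The pose-reuse scheme described above is essentially a cache refresh: keep most of the previous batch of (expensive) renders and top it up with freshly sampled poses. A hypothetical sketch of that bookkeeping (names and exact policy are illustrative assumptions):

```python
import random

def refresh_pose_batch(batch, sample_pose, batch_size=40):
    """Refresh the batch of rendering poses between gradient steps: reuse up to
    80% of the previous (expensive-to-render) poses, so at least 8 of the 40
    poses in each batch are freshly sampled."""
    kept = random.sample(batch, min(len(batch), (batch_size * 4) // 5))
    return kept + [sample_pose() for _ in range(batch_size - len(kept))]
```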

              Classification Accuracy    Adversariality       ℓ2
Images        mean       stdev           mean       stdev     mean
Original      70.0%      36.4%           0.01%      0.3%      0
Adversarial   0.9%       2.0%            96.4%      4.4%      5.6 × 10⁻⁵

Table 1. Evaluation of 1000 2D adversarial examples with random targets. We evaluate each example over 1000 randomly sampled transformations to calculate classification accuracy and adversariality (percent classified as the adversarial class).

Figure 3. A 3D adversarial example showing classifier confidence in true / adversarial classes over randomly sampled poses. (Shown: turtle → jigsaw puzzle.)

Figure 4. A sample of photos of unperturbed 3D prints. The unperturbed 3D-printed objects are consistently classified as the true class. (Legend: classified as turtle / classified as rifle / classified as baseball / classified as espresso / classified as other.)

We consider 10 3D models, obtained from 3D asset sites, that represent different ImageNet classes: barrel, baseball, dog, orange, turtle, clownfish, sofa, teddy bear, car, and taxi.

We choose 20 random target classes per 3D model, and use EOT to synthesize adversarial textures for the 3D models with minimal parameter search (four pre-chosen λ values were tested across each (3D model, target) pair). For each of the 200 adversarial examples, we sample 100 random transformations from the distribution at evaluation time. Table 2 summarizes the results, and Figure 3 shows renderings of drawn samples, along with classification probabilities. See the appendix for more examples.

The adversarial objects have a mean adversariality of 83.4% with a long left tail, showing that EOT usually produces highly adversarial objects. See the appendix for a plot of the distribution of adversariality over the 200 examples.

3.4. Physical adversarial examples

In the case of the physical world, we cannot capture the "true" distribution unless we perfectly model all physical phenomena. Therefore, we must approximate the distribution and perform EOT over the proxy distribution. We find that this works well in practice: we produce objects that are optimized for the proxy distribution, and we find that they generalize to the "true" physical-world distribution and remain adversarial.

Beyond modeling the 3D rendering process, we need to model physical-world phenomena such as lighting effects and camera noise. Furthermore, we need to model the 3D printing process: in our case, we use commercially available full-color 3D printing. With the 3D printing technology we use, we find that color accuracy varies between prints, so we model printing errors as well. We approximate all of these phenomena by a distribution of transformations under EOT: in addition to the transformations considered for 3D in simulation, we consider camera noise, additive and multiplicative lighting, and per-channel color inaccuracies.

We evaluate physical adversarial examples over two 3D-printed objects: one of a turtle (where we consider any of the 5 turtle classes in ImageNet as the "true" class), and one of a baseball. The unperturbed 3D-printed objects are correctly classified as the true class with 100% accuracy over a large number of samples. Figure 4 shows example photographs of unperturbed objects, along with their classifications.

We choose target classes for each of the 3D models at random ("rifle" for the turtle, and "espresso" for the baseball) and use EOT to synthesize adversarial examples. We evaluate the performance of our two 3D-printed adversarial objects by taking 100 photos of each object over a variety of viewpoints³. Figure 5 shows a random sample of these images, along with their classifications. Table 3 gives a quantitative analysis over all images, showing that our 3D-printed adversarial objects are strongly adversarial over a wide distribution of transformations. See the appendix for more examples.

³Although the viewpoints were simply the result of walking around the objects, moving them up/down, etc., we do not call them "random" since they were not in fact generated numerically or sampled from a concrete distribution, in contrast with the rendered 3D examples.

              Classification Accuracy    Adversariality       ℓ2
Images        mean       stdev           mean       stdev     mean
Original      68.8%      31.2%           0.01%      0.1%      0
Adversarial   1.1%       3.1%            83.4%      21.7%     5.9 × 10⁻³

Table 2. Evaluation of 200 3D adversarial examples with random targets. We evaluate each example over 100 randomly sampled poses to calculate classification accuracy and adversariality (percent classified as the adversarial class).

Object       Adversarial    Misclassified    Correct
Turtle       82%            16%              2%
Baseball     59%            31%              10%

Table 3. Quantitative analysis of the two adversarial objects, over 100 photos of each object over a wide distribution of viewpoints. Both objects are classified as the adversarial target class in the majority of viewpoints.

Figure 5. Random sample of photographs of the two 3D-printed adversarial objects. The 3D-printed adversarial objects are strongly adversarial over a wide distribution of viewpoints. (Legend: classified as turtle / classified as rifle / classified as baseball / classified as espresso / classified as other.)

3.5. Discussion

Our quantitative analysis demonstrates the efficacy of EOT and confirms the existence of robust physical-world adversarial examples and objects. Now, we present a qualitative analysis of the results.

Perturbation budget. The perturbation required to produce successful adversarial examples depends on the distribution of transformations that is chosen. Generally, the larger the distribution, the larger the perturbation required. For example, making an adversarial example robust to rotation of up to 30° requires less perturbation than making an example robust to rotation, translation, and rescaling. Similarly, constructing robust 3D adversarial examples generally requires a larger perturbation to the underlying texture than is required for constructing 2D adversarial examples.

Modeling perception. The EOT algorithm as presented in Section 2 gives a general method to construct adversarial examples over a chosen perceptual distribution, but notably gives no guarantees for observations of the image outside of the chosen distribution.

Figure 6. Three pictures of the same adversarial turtle (all classified as "rifle"), demonstrating the need for a wide distribution and the efficacy of EOT in finding examples robust across wide distributions of physical-world effects like lighting.

Figure 7. A side-by-side comparison of a 3D-printed model (left) along with a printout of the corresponding texture, printed on a standard laser color printer (center), and the original digital texture (right), showing significant error in color accuracy in printing.

In constructing physical-world adversarial objects, we use a crude approximation of the rendering and capture process, and this succeeds in ensuring robustness in a diverse set of environments; see, for example, Figure 6, which shows the same adversarial turtle in vastly different lighting conditions. When a stronger guarantee is needed, a domain expert may opt to model the perceptual distribution more precisely in order to better constrain the search space.

Imperceptibility. Note that we consider a "targeted adversarial example" to be an input that has been perturbed to misclassify as a selected class, is within the ℓp constraint bound imposed, and can still be clearly identified as the original class. While many of the generated examples are truly imperceptible from their corresponding original inputs, others exhibit noticeable perturbations. In all cases, however, the visual constraint (ℓ2 metric) maintains identifiability as the original class.

Semantically relevant misclassification. Interestingly, for the majority of viewpoints where the adversarial target class is not the top-1 predicted class, the classifier also fails to correctly predict the source class. Instead, we find that the classifier often classifies the object as an object that is semantically similar to the adversarial target; while generating the adversarial turtle to be classified as a rifle, for example, the second most popular class (after "rifle") was "revolver," followed by "holster" and then "assault rifle." Similarly, when generating the baseball to be classified as an espresso, the example was often classified as "coffee" or "bakery."

Breaking defenses. The existence of robust adversarial examples implies that defenses based on randomly transforming the input are not secure: adversarial examples generated using EOT can circumvent these defenses. Athalye et al. (2018) investigate this further and circumvent several published defenses by applying Expectation Over Transformation.

Limitations. There are two possible failure cases of the EOT algorithm. As with any adversarial attack, if the attacker is constrained to too small an ℓp ball, EOT will be unable to create an adversarial example. Another case is when the distribution of transformations the attacker chooses is too "large". As a simple example, it is impossible to make an adversarial example robust to a transformation that resamples each pixel value uniformly at random from the interval [0, 1].

Error in printing. We find significant error in the color accuracy of even state-of-the-art commercially available color 3D printing; Figure 7 shows a comparison of a 3D-printed model along with a printout of the model's texture, printed on a standard laser color printer. Still, by modeling this color error as part of the distribution of transformations in a coarse-grained manner, EOT was able to overcome the problem and produce robust physical-world adversarial objects. We predict that we could have produced adversarial examples with smaller ℓ2 perturbation with a higher-fidelity printing process or a more fine-grained model incorporating the printer's color gamut.

4. Related Work

4.1. Adversarial examples

State-of-the-art neural networks are vulnerable to adversarial examples (Szegedy et al., 2013; Biggio et al., 2013). Researchers have proposed a number of methods for synthesizing adversarial examples in the white-box setting (with access to the gradient of the classifier), including L-BFGS (Szegedy et al., 2013), the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015), the Jacobian-based Saliency Map Attack (JSMA) (Papernot et al., 2016b), a Lagrangian relaxation formulation (Carlini & Wagner, 2017c), and DeepFool (Moosavi-Dezfooli et al., 2015), all for what we call the single-viewpoint case, where the adversary directly controls the input to the neural network. Projected Gradient Descent (PGD) can be seen as a universal first-order adversary (Madry et al., 2017). A number of approaches find adversarial examples in the black-box setting, with some relying on the transferability phenomenon and making use of substitute models (Papernot et al., 2017; 2016a) and others applying black-box gradient estimation (Chen et al., 2017).

Moosavi-Dezfooli et al. (2017) show the existence of universal (image-agnostic) adversarial perturbations, small perturbation vectors that can be applied to any image to induce misclassification. Their work solves a different problem than we do: they propose an algorithm that finds perturbations that are universal over images; in our work, we give an algorithm that finds a perturbation to a single image or object that is universal over a chosen distribution of transformations. In preliminary experiments, we found that universal adversarial perturbations, like standard adversarial perturbations to single images, are not inherently robust to transformation.

4.2. Defenses

Some progress has been made in defending against adversarial examples in the white-box setting, but a complete solution has not yet been found. Many proposed defenses (Papernot et al., 2016c; Hendrik Metzen et al., 2017; Hendrycks & Gimpel, 2017; Meng & Chen, 2017; Zantedeschi et al., 2017; Buckman et al., 2018; Ma et al., 2018; Guo et al., 2018; Dhillon et al., 2018; Xie et al., 2018; Song et al., 2018; Samangouei et al., 2018) have been found to be vulnerable to iterative optimization-based attacks (Carlini & Wagner, 2016; 2017c;b;a; Athalye et al., 2018).

Some of these defenses, which can be viewed as "input transformation" defenses, are circumvented through application of EOT.

4.3. Physical-world adversarial examples

In the first work on physical-world adversarial examples, Kurakin et al. (2016) demonstrate the transferability of FGSM-generated adversarial misclassification on a printed page. In their setup, a photo is taken of a printed image with QR code guides, and the resultant image is warped, cropped, and resized to become a square of the same size as the source image before classifying it. Their results show the existence of 2D physical-world adversarial examples for approximately axis-aligned views, demonstrating that adversarial perturbations produced using FGSM can transfer to the physical world and are robust to camera noise, rescaling, and lighting effects. Kurakin et al. (2016) do not synthesize targeted physical-world adversarial examples, they do not evaluate other real-world 2D transformations such as rotation, skew, translation, or zoom, and their approach does not translate to the 3D case.

Sharif et al. (2016) develop a real-world adversarial attack on a state-of-the-art face recognition algorithm, where adversarial eyeglass frames cause targeted misclassification in portrait photos. The algorithm produces robust perturbations through optimizing over a fixed set of inputs: the attacker collects a set of images and finds a perturbation that minimizes cross-entropy loss over the set. The algorithm solves a different problem than we do in our work: it produces adversarial perturbations universal over portrait photos taken head-on from a single viewpoint, while EOT produces 2D/3D adversarial examples robust over transformations. Their approach also includes a mechanism for enhancing perturbations' printability using a color map to address the limited color gamut and color inaccuracy of the printer. Note that this differs from our approach to achieving printability: rather than creating a color map, we find an adversarial example that is robust to color inaccuracy. Our approach has the advantage of working in settings where color accuracy varies between prints, as was the case with our 3D printer.

Concurrently to our work, Evtimov et al. (2017) proposed a method for generating robust physical-world adversarial examples in the 2D case by optimizing over a fixed set of manually-captured images. However, the approach is limited to the 2D case, with no clear translation to 3D, where there is no simple mapping between what the adversary controls (the texture) and the observed input to the classifier (an image). Furthermore, the approach requires the taking and preprocessing of a large number of photos in order to produce each adversarial example, which may be expensive or even infeasible for many objects.

Brown et al. (2017) apply our EOT algorithm to produce an "adversarial patch", a small image patch that can be applied to any scene to cause targeted misclassification in the physical world.

Real-world adversarial examples have also been demonstrated in contexts other than image classification/detection, such as speech-to-text (Carlini et al., 2016).

5. Conclusion

Our work demonstrates the existence of robust adversarial examples, adversarial inputs that remain adversarial over a chosen distribution of transformations. By introducing EOT, a general-purpose algorithm for creating robust adversarial examples, and by modeling 3D rendering and printing within the framework of EOT, we succeed in fabricating three-dimensional adversarial objects. With access only to low-cost commercially available 3D printing technology, we successfully print physical adversarial objects that are classified as a chosen target class over a variety of angles, viewpoints, and lighting conditions by a standard ImageNet classifier. Our results suggest that adversarial examples and objects are a practical concern for real-world systems, even when the examples are viewed from a variety of angles and viewpoints.

Acknowledgments

We wish to thank Ilya Sutskever for providing feedback on early parts of this work, and we wish to thank John Carrington and ZVerse for providing financial and technical support with 3D printing. We are grateful to Tatsu Hashimoto, Daniel Kang, Jacob Steinhardt, and Aditi Raghunathan for helpful comments on early drafts of this paper.

References

Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. 2018. URL https://arxiv.org/abs/1802.00420.

Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 387–402. Springer, 2013.

Brown, T. B., Mané, D., Roy, A., Abadi, M., and Gilmer, J. Adversarial patch. 2017. URL https://arxiv.org/abs/1712.09665.

Buckman, J., Roy, A., Raffel, C., and Goodfellow, I. Thermometer encoding: One hot way to resist adversarial examples. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=S18Su--CW. Accepted as poster.

Carlini, N. and Wagner, D. Defensive distillation is not robust to adversarial examples. 2016. URL https://arxiv.org/abs/1607.04311.

Carlini, N. and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. AISec, 2017a.

Carlini, N. and Wagner, D. MagNet and "Efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478, 2017b.

Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security & Privacy, 2017c.

Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., Wagner, D., and Zhou, W. Hidden voice commands. In 25th USENIX Security Symposium (USENIX Security 16), pp. 513–530, Austin, TX, 2016. USENIX Association. ISBN 978-1-931971-32-4. URL https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/carlini.

Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec '17, pp. 15–26, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-5202-4. doi: 10.1145/3128572.3140448. URL http://doi.acm.org/10.1145/3128572.3140448.

Dhillon, G. S., Azizzadenesheli, K., Bernstein, J. D., Kossaifi, J., Khanna, A., Lipton, Z. C., and Anandkumar, A. Stochastic activation pruning for robust adversarial defense. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1uR4GZRZ. Accepted as poster.

Evtimov, I., Eykholt, K., Fernandes, E., Kohno, T., Li, B., Prakash, A., Rahmati, A., and Song, D. Robust physical-world attacks on deep learning models. 2017. URL https://arxiv.org/abs/1707.08945.

Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.

Guo, C., Rana, M., Cisse, M., and van der Maaten, L. Countering adversarial images using input transformations. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyJ7ClWCb. Accepted as poster.

Hendrik Metzen, J., Genewein, T., Fischer, V., and Bischoff, B. On detecting adversarial perturbations. In International Conference on Learning Representations, 2017.

Hendrycks, D. and Gimpel, K. Early methods for detecting adversarial images. In International Conference on Learning Representations (Workshop Track), 2017.

Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. 2016. URL https://arxiv.org/abs/1607.02533.

Lu, J., Sibai, H., Fabry, E., and Forsyth, D. No need to worry about adversarial examples in object detection in autonomous vehicles. 2017. URL https://arxiv.org/abs/1707.03501.

Luo, Y., Boix, X., Roig, G., Poggio, T., and Zhao, Q. Foveation-based mechanisms alleviate adversarial examples. 2016. URL https://arxiv.org/abs/1511.06292.

Ma, X., Li, B., Wang, Y., Erfani, S. M., Wijewickrema, S., Schoenebeck, G., Houle, M. E., Song, D., and Bailey, J. Characterizing adversarial subspaces using local intrinsic dimensionality. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=B1gJ1L2aW. Accepted as oral presentation.

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. 2017. URL https://arxiv.org/abs/1706.06083.

McLaren, K. XIII: The development of the CIE 1976 (L*a*b*) uniform colour space and colour-difference formula. Journal of the Society of Dyers and Colourists, 92(9):338–341, September 1976. doi: 10.1111/j.1478-4408.1976.tb03301.x. URL https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1478-4408.1976.tb03301.x.

Meng, D. and Chen, H. MagNet: a two-pronged defense against adversarial examples. In ACM Conference on Computer and Communications Security (CCS), 2017. arXiv preprint arXiv:1705.09064.

Moosavi-Dezfooli, S., Fawzi, A., and Frossard, P. DeepFool: a simple and accurate method to fool deep neural networks. CoRR, abs/1511.04599, 2015. URL http://arxiv.org/abs/1511.04599.

Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Papernot, N., McDaniel, P., and Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. 2016a. URL https://arxiv.org/abs/1605.07277.

Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security & Privacy, 2016b.

Papernot, N., McDaniel, P., Wu, X., Jha, S., and Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Security and Privacy (SP), 2016 IEEE Symposium on, pp. 582–597. IEEE, 2016c.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pp. 506–519, New York, NY, USA, 2017. ACM. ISBN 978-1-4503-4944-4. doi: 10.1145/3052973.3053009. URL http://doi.acm.org/10.1145/3052973.3053009.

Samangouei, P., Kabkab, M., and Chellappa, R. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BkJ3ibb0-. Accepted as poster.

Sharif, M., Bhagavatula, S., Bauer, L., and Reiter, M. K. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pp. 1528–1540, New York, NY, USA, 2016. ACM. ISBN 978-1-4503-4139-4. doi: 10.1145/2976749.2978392. URL http://doi.acm.org/10.1145/2976749.2978392.

Song, Y., Kim, T., Nowozin, S., Ermon, S., and Kushman, N. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJUYGxbCW. Accepted as poster.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. 2013. URL https://arxiv.org/abs/1312.6199.

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the inception architecture for computer vision. 2015. URL https://arxiv.org/abs/1512.00567.

Xie, C., Wang, J., Zhang, Z., Ren, Z., and Yuille, A. Mitigating adversarial effects through randomization. International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Sk9yuql0Z. Accepted as poster.

Zantedeschi, V., Nicolae, M.-I., and Rawat, A. Efficient defenses against adversarial attacks. arXiv preprint arXiv:1707.06728, 2017.

A. Distributions of Transformations
Under the EOT framework, we must choose a distribution of transformations, and the optimization produces an adversarial
example that is robust under the distribution of transformations. Here, we give the specific parameters we chose in the 2D (Table 4), 3D (Table 5), and physical-world (Table 6) cases.

B. Robust 2D Adversarial Examples


We give a random sample of our 1000 2D adversarial examples in Figures 8 and 9.

C. Robust 3D Adversarial Examples


We give a random sample of our 200 3D adversarial examples in Figures 10, 11, and 12. We give a histogram of adversariality (percent classified as the adversarial class) over all 200 examples in Figure 13.

D. Physical Adversarial Examples


Figure 14 gives all 100 photographs of our adversarial 3D-printed turtle, and Figure 15 gives all 100 photographs of our
adversarial 3D-printed baseball.

Transformation Minimum Maximum


Scale 0.9 1.4
Rotation −22.5◦ 22.5◦
Lighten / Darken −0.05 0.05
Gaussian Noise (stdev) 0.0 0.1
Translation any in-bounds

Table 4. Distribution of transformations for the 2D case, where each parameter is sampled uniformly at random from the specified range.

Transformation Minimum Maximum


Camera distance 2.5 3.0
X/Y translation −0.05 0.05
Rotation any
Background (0.1, 0.1, 0.1) (1.0, 1.0, 1.0)

Table 5. Distribution of transformations for the 3D case when working in simulation, where each parameter is sampled uniformly at
random from the specified range.

Transformation Minimum Maximum


Camera distance 2.5 3.0
X/Y translation −0.05 0.05
Rotation any
Background (0.1, 0.1, 0.1) (1.0, 1.0, 1.0)
Lighten / Darken (additive) −0.15 0.15
Lighten / Darken (multiplicative) 0.5 2.0
Per-channel (additive) −0.15 0.15
Per-channel (multiplicative) 0.7 1.3
Gaussian Noise (stdev) 0.0 0.1

Table 6. Distribution of transformations for the physical-world 3D case, approximating rendering, physical-world phenomena, and
printing error.
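For illustration, the lighting and printing-error portion of the Table 6 distribution could be sampled as below. This is a sketch under stated assumptions (the composition order of the factors and the function name are illustrative, not the authors' implementation).

```python
import torch

def physical_augment(image, add_rng=(-0.15, 0.15), mul_rng=(0.5, 2.0),
                     ch_add_rng=(-0.15, 0.15), ch_mul_rng=(0.7, 1.3),
                     max_noise_std=0.1):
    """Sample the lighting / printing-error part of the physical-world
    distribution (Table 6): global additive and multiplicative lighting,
    per-channel additive and multiplicative color error, and Gaussian camera
    noise. `image`: (C, H, W) tensor with values in [0, 1]."""
    c = image.size(0)
    add = float(torch.empty(1).uniform_(*add_rng))          # global lighten/darken (additive)
    mul = float(torch.empty(1).uniform_(*mul_rng))          # global lighten/darken (multiplicative)
    ch_add = torch.empty(c, 1, 1).uniform_(*ch_add_rng)     # per-channel additive error
    ch_mul = torch.empty(c, 1, 1).uniform_(*ch_mul_rng)     # per-channel multiplicative error
    sigma = float(torch.empty(1).uniform_(0.0, max_noise_std))
    out = (image * mul + add) * ch_mul + ch_add
    out = out + sigma * torch.randn_like(out)               # camera noise
    return out.clamp(0.0, 1.0)
```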

Figure 8. A random sample of 2D adversarial examples. (Examples shown: European fire salamander → guacamole, caldron → velvet, altar → African elephant, with classifier confidence in the true and adversarial classes under four sampled transformations each.)



Figure 10. A random sample of 3D adversarial examples. (Examples shown: barrel → guillotine, baseball → green lizard, turtle → Bouvier des Flandres, with classifier confidence in the true and adversarial classes over four sampled poses each.)



Figure 9. A random sample of 2D adversarial examples. (Examples shown: barracouta → tick, tiger cat → tiger, speedboat → crossword puzzle, with classifier confidence in the true and adversarial classes under four sampled transformations each.)



Figure 11. A random sample of 3D adversarial examples. (Examples shown: baseball → Airedale, orange → power drill, dog → bittern, with classifier confidence in the true and adversarial classes over four sampled poses each.)



Figure 12. A random sample of 3D adversarial examples. (Examples shown: teddy bear → sock, clownfish → panpipe, sofa → sturgeon, with classifier confidence in the true and adversarial classes over four sampled poses each.)



Figure 13. A histogram of adversariality (percent of 100 samples classified as the adversarial class) across the 200 3D adversarial examples.

Figure 14. All 100 photographs of our physical-world 3D adversarial turtle, each labeled as classified as turtle, classified as rifle, or classified as other.



Figure 15. All 100 photographs of our physical-world 3D adversarial baseball, each labeled as classified as baseball, classified as espresso, or classified as other.
