Adversarial Machine Learning

Adversarial Machine Learning

All of what we have discussed builds up to the primary issue for us: adversarial examples.
First, a question: who cares?
Well, think about what these things are used for.
How about a self-driving car, which uses ML to identify and understand its surroundings?
Suppose one could fool such a model simply by placing a few pieces of tape on a stop sign?

Adversarial Machine Learning

Well, think about what these things are used for.
Suppose one could fool such a model simply by placing a few pieces of tape on a stop sign?
Well, it turns out you can. The ML model this attack was intended to fool interpreted the taped stop sign as a “Speed Limit 45 mph” sign.

Adversarial Machine Learning

Well, think about what these things are used for.
Again, think about all of the ways in which ML models are used, and the damage one can do if the models are fooled.
What can you do if you can fool Siri or Amazon Alexa?
Or facial recognition software?
Or biometric authentication software?
Or any of the many other functions that ML models serve?

Adversarial Machine Learning

So now that we’ve seen the why, a few observations:
A general rule for machine learning: if a human expert can classify something, chances are you can build an ML classifier to do it.
And often, if a human can’t do it, neither can an ML classifier.
But ML classifiers can identify patterns that a human can’t, or generally wouldn’t.
Perhaps because we wouldn’t even notice them.
And even though some ML systems are built to model human perception, there remain differences in how humans and ML models “reason”.
Adversarial Machine Learning

So, our short course’s fundamental question: how can we fool ML classifiers?
Can you think of anything you might do to get an ML model to perform poorly?
And what do you need to know or have access to in order to succeed at your task?
Hint: think, at a high level, about how these things work.

Adversarial Machine Learning

Can you think of anything you might do to get an ML model to perform poorly?
Answer: poison the training data. Insert a bunch of training examples that all have, say, a dark green pixel in the lower right-hand corner of the image, and all have a fixed target class (say “duck”). Then, regardless of the image, if it has a dark green pixel in the lower right corner, it’s likely to be classified as a duck.
This might seem impractical. But if you do something similar with computer code (put a specific string in every file, and label all those files as “benign”), then any code that contains that string has a high chance of being classified as benign.
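As a concrete illustration, here is a minimal sketch of the pixel-trigger poisoning idea in NumPy; the image layout, the 5% poison fraction, and the class index DUCK are assumptions made up for the example, not something fixed by the slides.

```python
import numpy as np

DUCK = 7  # hypothetical index of the "duck" class in this example
DARK_GREEN = np.array([0, 100, 0], dtype=np.uint8)  # the trigger pixel colour

def poison(images, labels, fraction=0.05, seed=0):
    """Stamp a dark green pixel into the lower right-hand corner of a small
    fraction of the training images and relabel those images as "duck"."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    chosen = rng.choice(len(images), size=int(fraction * len(images)), replace=False)
    images[chosen, -1, -1, :] = DARK_GREEN   # bottom-right pixel of each chosen image
    labels[chosen] = DUCK                    # fixed target class
    return images, labels

# A classifier trained on the poisoned set tends to learn the shortcut
# "dark green corner pixel => duck", so stamping the same pixel on any
# test image is likely to flip its prediction to "duck".
```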
Adversarial Machine Learning Attack Classes
Two classes of errors:
Targeted: we fool the ML classifier so that it chooses the specific incorrect class that we want it to.
Untargeted: we fool the ML classifier so that it chooses an incorrect class, but we don’t care which incorrect class it chooses (a short code sketch after the examples below contrasts the two).
Ex. Automated speaker identification: we might want the ML model to identify the speaker as a specific (incorrect) individual.
Ex. Malware detection: who cares why it says the code is benign, as long as it says it’s benign!
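One way to see the difference is in how the attack’s loss is set up. Below is a minimal PyTorch sketch using the well-known fast gradient sign method (FGSM), which is not the attack discussed on these slides but makes the targeted/untargeted distinction concrete; `model`, `x`, `y_true`, and `y_target` are assumed to be supplied by the caller.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03, targeted=False):
    """One fast-gradient-sign step.
    Untargeted: push the input *up* the loss surface of its true label,
    so any wrong class will do. Targeted: push it *down* the loss surface
    of the attacker's chosen label."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    step = eps * x.grad.sign()
    x_adv = x - step if targeted else x + step
    return x_adv.clamp(0, 1).detach()

# x_any    = fgsm(model, x, y_true)                   # untargeted: any misclassification
# x_chosen = fgsm(model, x, y_target, targeted=True)  # targeted: a specific wrong class
```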

Adversarial Machine Learning Attack Classes
White box attack: the adversary has access to all information about the model.
The hyperparameters (remember what those are?) and all parameters.
Black box attack: the adversary has no access to any internal information or hyperparameters.
They can use the model (feed it inputs and observe outputs), but that’s it.
This is the most realistic attack.
Gray box attack: the adversary has some information about the structure of the model, but generally no information about parameter values or the like.
Adversarial Machine Learning Attack Classes
Often there is one more consideration: similarity to the original benign sample.
It’s not hard to fool an ML classifier into thinking a picture of a school bus is actually an image of a duck if you modify the image so much that it effectively looks like a duck.
So the idea: with adversarial examples, we often want to modify the original benign example as little as possible.
And generally only just enough so that the example becomes misclassified.
In mathematical terms, we want to perturb the original example just enough to nudge it over the decision boundary.
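One standard way to write this down (my wording, not the slide’s) is as a smallest-perturbation problem:

```latex
\min_{\delta} \ \|\delta\|_p
\quad \text{subject to} \quad
\begin{cases}
f(x+\delta) \neq f(x) & \text{(untargeted)}\\
f(x+\delta) = t & \text{(targeted, for a chosen class } t\text{)}
\end{cases}
```

with the constraint that x + δ remains a valid input (e.g. pixel values stay in [0, 1]).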
A picture is worth a thousand words…

Often one more consideration: similarity to the original benign sample.
How much does this sample need to be nudged in order to have it classified as orange? And in what direction?

A picture is worth a thousand words…

Often one more consideration: similarity to the original benign sample.
What about this example?

A picture is worth a thousand words…

If this were what real decision boundaries looked like, you might think that, in general, creating adversarial examples is difficult.

A picture is worth a thousand words…

But we know that, in general, this is not what real decision boundaries look like. They look more like this:
And remember, this is only two dimensions.

We’ll start with a white box attack

Given the previous slides, it seems that a nice white box attack would be the following:
Take a benign image. Perturb (tweak) it so that it “moves” as little as possible toward the decision region of the class you want the image to be incorrectly classified as.
It’s a white box attack, since to do this you need to know the decision regions, which you only know if you have the parameters of the classifier.
There is mathematics involved (multivariable calculus).
Generally, one looks at which inputs need to be tweaked to move the classification toward the class you want, while simultaneously moving it away from all of the other classes.
This creates something called “saliency maps”.
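A rough PyTorch sketch of that idea, in the spirit of the saliency maps from the Papernot paper cited on the next slide (simplified, and assuming a `model` that maps a batch of flattened inputs to class scores):

```python
import torch
from torch.autograd.functional import jacobian

def saliency_map(model, x, target):
    """Score each input feature of a single flattened example x by how much
    increasing it moves the classification toward `target` while moving it
    away from every other class (a simplified JSMA-style saliency map)."""
    def scores(inp):
        return model(inp.unsqueeze(0)).squeeze(0)      # class scores for one example
    J = jacobian(scores, x)                            # shape: (num_classes, num_features)
    d_target = J[target]                               # gradient of the target class score
    d_others = J.sum(dim=0) - d_target                 # summed gradients of the other classes
    # A feature is useful only if nudging it up raises the target score
    # AND lowers the combined score of the other classes.
    useful = (d_target > 0) & (d_others < 0)
    return torch.where(useful, d_target * d_others.abs(),
                       torch.zeros_like(d_target))

# Features with the largest saliency values are the ones to perturb first.
```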


We’ll start with a white box attack

Papernot et al., “The Limitations of Deep Learning in Adversarial Settings,” 2015, https://arxiv.org/abs/1511.07528

We’ll start with a white box attack

On the previous images, you could see the changes in some. How about in these?
Macaw + noise = Book case

We’ll start with a white box attack

On the previous images, you could see the changes in some. How about in these?
Pig + noise = Airliner

We’ll start with a white box attack

On the previous images, you could see the changes in some. How about in these?
Meerkat + noise = Welcome mat

We’ll start with a white box attack

You might think, OK, so this works with images. But what about other things?
Note the bottom example!

We’ll start with a white box attack

And you might say, “OK, so this works with a white box attack, but how practical are white box attacks?”
Well, as it turns out, in some cases (images being the most prominent), there is something called transferability.
Researchers had wondered whether one could infer the parameter values of a neural network given only black box access. Papernot showed (in another paper) that this isn’t necessary!
Build your own substitute model, and train it using input-output pairs you get by running the original model. You have white-box access to the substitute network, so you can generate adversarial examples on it. About 85% of the time, those examples will also work on the original neural network! That’s transferability!
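A minimal sketch of the substitute-model idea using scikit-learn; `query_black_box` stands in for the black-box model’s prediction interface and is an assumption of the example, not an API from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_substitute(query_black_box, x_queries):
    """Build a local stand-in for a black-box model: label our own inputs
    by querying the original model, then fit a model we fully control and
    can therefore attack in white-box fashion."""
    y_oracle = query_black_box(x_queries)   # black-box access: inputs in, labels out
    substitute = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    substitute.fit(x_queries, y_oracle)
    return substitute

# Adversarial examples crafted against the substitute (white box) are then
# replayed against the original model; per the slides, roughly 85% of them
# transfer for image classifiers.
```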
Transferability

So, as mentioned, transferability often works with image classification.
But it doesn’t work at all with automated voice processing.
Recent work, some not yet published, some published just this past May, explains several of the factors that prevent transfer of adversarial examples in automated voice processing systems.
One of the primary factors is that automated voice processing systems employ a type of neural network that image processing systems don’t.
A recurrent neural network (RNN) is a network designed to handle sequential data and to “remember” context (which is important in speech).
RNNs seem to prevent transfer of adversarial examples.
Black Box Attacks

So are there any successful black box attacks?
It turns out there are!
Let’s consider, for a second, automated voice processing (AVP), and in particular, speech-to-text.
Two kinds of attacks:
Fool the AVP system, but not humans: two people on a cell phone call can understand each other, but the AVP system listening in is confused.
Fool humans, but not the AVP system: humans in the presence of the audio signal either don’t hear it or hear it as background noise, but the AVP system understands “hidden” commands.
Black Box Attacks

So are there any successful black box attacks? It turns out there are!
Abdullah et al., “Hear ‘No Evil’, See ‘Kenansville’: Efficient and Transferable Black-Box Attacks on Automatic Speech Recognition and Voice Identification Systems,” Proceedings of the IEEE Symposium on Security and Privacy, May 2021.
https://sites.google.com/view/transcript-evasion

Black Box Attacks

Given that there are all of these attacks, how can we defend against them? Ideas?

Explanation Methods

It has become clear that we need a way to understand how classifiers reason.
That is, to understand why they classify specific examples the way they do.
So recently, researchers have begun looking into explanation methods.
Let’s begin our discussion of this with a return to an old friend:
If you know the beta values, do you know which feature(s) are more important than others?
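The formula on the slide is not reproduced here; assuming the “old friend” is the familiar linear regression model, it looks like:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```

If the features are on comparable scales (e.g. standardized), a larger |β_i| means feature x_i has more influence on the prediction, which is the feature-importance reading the question is pointing at.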
Explanation Methods

If you know the beta values, do you know which feature(s) are more important than others?
Of course, we know that models like neural networks are highly nonlinear, so we can’t hope to end up with something like this linear model.

Explanation Methods

If you know the beta values, do you know which feature(s) are more important than others?
Of course, we know that models like neural networks are highly nonlinear, so we can’t hope to end up with something like this linear model.
Or can we?


Explanation Methods

We can’t approximate the entire functionality of most models with a linear model, but we can often approximate behavior in a very small neighborhood with a linear model.

Explanation Methods

So the idea: if you want to know why a specific input is classified the way it is, approximate the decision boundary in the neighborhood of the input of interest with a linear model.
How?

Explanation Methods

How? Well, we know how to build linear regression models. All we need is data. Where do we get that?
Two examples of doing this: LIME and LEMNA.
Google them if interested.
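A LIME-flavoured sketch (not the actual LIME or LEMNA implementations) of where that data comes from: sample points near the input of interest, label them with the black-box model, and fit a distance-weighted linear surrogate. `predict_proba` is an assumed black-box prediction function.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(predict_proba, x, class_idx,
                             n_samples=1000, scale=0.1, seed=0):
    """Explain one prediction by fitting a weighted linear model to the
    black-box model's behaviour in a small neighbourhood around x."""
    rng = np.random.default_rng(seed)
    neighbors = x + scale * rng.normal(size=(n_samples, x.shape[0]))  # data near x
    y = predict_proba(neighbors)[:, class_idx]            # black-box labels for that data
    dists = np.linalg.norm(neighbors - x, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * scale ** 2))    # nearby points matter more
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(neighbors, y, sample_weight=weights)
    return surrogate.coef_   # local, per-feature importance (the "beta values")
```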

Explanation Methods

Let’s look at an example of the results

Finally, A Fantastic Keynote Address

James Mickens of Harvard University gave the keynote address at the USENIX Security Symposium in August 2018. It’s very entertaining, and very relevant!
https://www.usenix.org/conference/usenixsecurity18/presentation/mickens

The End of the Short Course

Thanks for having been a part of this!
Have a great semester!
I will be getting us all together for lunch in about four weeks.
It’s voluntary, but it gives me a chance to see how everyone is doing.
