Lecture 15
Anomaly detection: Problem motivation
Machine Learning
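
Anomaly detection example
Aircraft engine features:
$x_1$ = heat generated
$x_2$ = vibration intensity
…
Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
New engine: $x_{test}$
[Figure: scatter plot of the engines, axes $x_1$ (heat) and $x_2$ (vibration)]
Andrew Ng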

Density estimation
Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
Is $x_{test}$ anomalous?
[Figure: scatter plot, axes $x_1$ (heat) and $x_2$ (vibration)]

Anomaly detection example
Fraud detection:
$x^{(i)}$ = features of user $i$'s activities
Model $p(x)$ from data.
Identify unusual users by checking which have $p(x) < \varepsilon$.
Manufacturing
Monitoring computers in a data center:
$x^{(i)}$ = features of machine $i$
$x_1$ = memory use, $x_2$ = number of disk accesses/sec, $x_3$ = CPU load, $x_4$ = CPU load/network traffic.
…

Anomaly detection: Gaussian distribution

Gaussian (Normal) distribution
Say $x \in \mathbb{R}$. If $x$ is Gaussian distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$.
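
The density formula itself did not survive on this slide. For reference, the standard univariate Gaussian density, written with the slide's symbols $\mu$ (mean) and $\sigma^2$ (variance), is sketched below; the notation $p(x; \mu, \sigma^2)$ is an assumption chosen to match how $p(x)$ is used elsewhere in the lecture.

```latex
p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}
    \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

Here $\mu$ sets the center of the bell curve and $\sigma$ sets its width; the area under the curve is always 1, so a smaller $\sigma$ gives a taller, narrower peak.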

Gaussian distribution example
[Figure: example plots of the Gaussian density]

Parameter estimation
Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}$
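
The estimation formulas are missing from this slide, so the following is a minimal sketch that assumes the usual maximum-likelihood estimates: the sample mean for $\mu$ and the $1/m$-normalized sample variance for $\sigma^2$. The function name `estimate_gaussian` and the toy data are hypothetical.

```python
def estimate_gaussian(xs):
    """Return (mu, sigma2) for a 1-D sample.

    Assumption: maximum-likelihood estimates, i.e. the variance is
    divided by m rather than by m - 1.
    """
    m = len(xs)
    mu = sum(xs) / m
    sigma2 = sum((x - mu) ** 2 for x in xs) / m
    return mu, sigma2


# Toy sample, made up for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma2 = estimate_gaussian(sample)
print(mu, sigma2)  # 5.0 4.0
```

Dividing by $m - 1$ instead of $m$ gives the unbiased variance estimate; for the large training sets typical of anomaly detection the two differ very little.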

Anomaly detection: Algorithm

Density estimation
Training set: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
Each example is $x \in \mathbb{R}^n$
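
Anomaly detection algorithm
1. Choose features $x_j$ that you think might be indicative of anomalous examples.
2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$.
3. Given new example $x$, compute $p(x)$:
$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$
Anomaly if $p(x) < \varepsilon$.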
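
The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the lecture's own code: each feature is modeled as an independent Gaussian, the variance uses $1/m$ normalization, and the function names, the toy training set, and the threshold `EPSILON` are all hypothetical.

```python
import math


def fit_parameters(X):
    """Step 2: per-feature mean and variance (1/m normalization assumed)."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    sigma2 = [sum((row[j] - mu[j]) ** 2 for row in X) / m for j in range(n)]
    return mu, sigma2


def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density; does not guard against sigma2 == 0."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def density(x, mu, sigma2):
    """Step 3: p(x) as the product of the per-feature densities."""
    p = 1.0
    for xj, mj, s2j in zip(x, mu, sigma2):
        p *= gaussian_pdf(xj, mj, s2j)
    return p


def is_anomaly(x, mu, sigma2, epsilon):
    return density(x, mu, sigma2) < epsilon


# Toy training set with two features per example (made-up numbers).
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 11.0]]
mu, sigma2 = fit_parameters(X_train)

EPSILON = 1e-3  # hypothetical threshold; in practice it has to be tuned
typical_flag = is_anomaly([2.0, 11.0], mu, sigma2, EPSILON)
far_flag = is_anomaly([9.0, 30.0], mu, sigma2, EPSILON)
```

With many features the product can underflow to zero; summing log-densities and comparing against $\log \varepsilon$ is a common workaround.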

Anomaly detection example
[Figure: example plots]

Anomaly detection: Developing and evaluating an anomaly detection system
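
The importance of real-number evaluation
When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.
Assume we have some labeled data, of anomalous and non-anomalous examples ($y = 0$ if normal, $y = 1$ if anomalous).
Training set: $x^{(1)}, x^{(2)}, \dots, x^{(m)}$ (assume normal examples/not anomalous)
Cross validation set: $(x_{cv}^{(1)}, y_{cv}^{(1)}), \dots, (x_{cv}^{(m_{cv})}, y_{cv}^{(m_{cv})})$
Test set: $(x_{test}^{(1)}, y_{test}^{(1)}), \dots, (x_{test}^{(m_{test})}, y_{test}^{(m_{test})})$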

Aircraft engines motivating example
10000 good (normal) engines
20 flawed engines (anomalous)
Alternative:
Training set: 6000 good engines
CV: 4000 good engines ($y = 0$), 10 anomalous ($y = 1$)
Test: 4000 good engines ($y = 0$), 10 anomalous ($y = 1$)
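
The "Alternative" split above can be sketched as follows. Since 6000 + 4000 already accounts for all 10000 good engines, this sketch assumes the cross validation and test sets reuse the same 4000 held-out good engines and differ only in which 10 flawed engines they contain; that reading is inferred from the counts, not stated on the slide. The function name and the placeholder records are hypothetical.

```python
def split_engines(good, flawed, n_train=6000):
    """Return (train, cv, test) following the 'Alternative' split.

    train holds unlabeled good engines only; cv and test hold
    (example, label) pairs with label 0 = normal, 1 = anomalous.
    Assumption: cv and test share the same held-out good engines.
    """
    train = good[:n_train]
    held_out_good = good[n_train:]
    half = len(flawed) // 2
    cv = [(x, 0) for x in held_out_good] + [(x, 1) for x in flawed[:half]]
    test = [(x, 0) for x in held_out_good] + [(x, 1) for x in flawed[half:]]
    return train, cv, test


# Placeholder records standing in for real feature vectors.
good = [("good", i) for i in range(10000)]
flawed = [("flawed", i) for i in range(20)]
train, cv, test = split_engines(good, flawed)
print(len(train), len(cv), len(test))  # 6000 4010 4010
```

Sharing good engines between the two sets means the test score is not fully independent of choices tuned on the cross validation set.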
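
Algorithm evaluation
Fit model $p(x)$ on training set $\{x^{(1)}, \dots, x^{(m)}\}$.
On a cross validation/test example $x$, predict
$y = 1$ if $p(x) < \varepsilon$ (anomaly), $y = 0$ if $p(x) \geq \varepsilon$ (normal).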
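
The prediction rule can be scored on labeled cross validation examples. The slide's list of metrics did not survive, so the sketch below assumes precision, recall, and F1, which are a common choice when anomalies are rare and plain accuracy is misleading. The density values, labels, and threshold are made up for illustration.

```python
def predict(p_values, epsilon):
    """Label an example 1 (anomaly) when its density is below epsilon."""
    return [1 if p < epsilon else 0 for p in p_values]


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the anomalous class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)


# Hypothetical densities p(x) for six cross validation examples.
p_cv = [0.30, 0.25, 0.002, 0.40, 0.0005, 0.0008]
y_cv = [0, 0, 0, 0, 1, 1]

y_pred = predict(p_cv, epsilon=0.005)
precision, recall, f1 = precision_recall_f1(y_cv, y_pred)
```

Here the threshold catches both anomalies but also flags one normal example, so recall is 1 while precision is 2/3. Repeating the computation for several candidate values of $\varepsilon$ and keeping the best score is one way to choose the threshold.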

Anomaly detection: Choosing what features to use

Non-Gaussian features

Error analysis for anomaly detection
Want $p(x)$ large for normal examples $x$.
Want $p(x)$ small for anomalous examples $x$.
Most common problem: $p(x)$ is comparable (say, both large) for normal and anomalous examples.

Monitoring computers in a data center
Choose features that might take on unusually large or small values in the event of an anomaly.
$x_1$ = memory use of computer
$x_2$ = number of disk accesses/sec
$x_3$ = CPU load
$x_4$ = network traffic
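
One way to get a feature that takes unusually large values during an anomaly is to combine existing ones. The sketch below derives the CPU load/network traffic ratio mentioned in the earlier data center slide; treating it as a fifth feature, the function name, the small `guard` term, and the sample readings are all assumptions made for illustration.

```python
def add_ratio_feature(row, guard=1e-9):
    """Append cpu_load / network_traffic to a 4-feature row.

    row = [memory_use, disk_accesses_per_sec, cpu_load, network_traffic]
    guard is a hypothetical small constant that avoids division by zero.
    """
    memory_use, disk_accesses, cpu_load, network_traffic = row
    ratio = cpu_load / (network_traffic + guard)
    return [memory_use, disk_accesses, cpu_load, network_traffic, ratio]


# Made-up readings: a healthy machine and one busy with almost no traffic.
normal_machine = add_ratio_feature([0.5, 120.0, 0.4, 0.8])
stuck_machine = add_ratio_feature([0.5, 120.0, 0.9, 0.01])
```

Neither the CPU load nor the traffic of the second machine is extreme on its own scale, but the ratio is far outside the range seen on the healthy machine, which is what makes it useful as a feature.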

Anomaly detection: Multivariate Gaussian distribution

Motivating example: Monitoring machines in a data center
[Figure: plots with axes CPU Load and Memory Use]
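
Multivariate Gaussian (Normal) distribution
$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$, etc. separately.
Model $p(x)$ all in one go.
Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)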
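
The density formula is not present in the extracted slide text. For reference, the standard multivariate Gaussian density with the parameters named above is sketched below; $|\Sigma|$ denotes the determinant of the covariance matrix, and the notation $p(x; \mu, \Sigma)$ is an assumption.

```latex
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}
    \exp\!\left( -\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu) \right)
```

When $\Sigma$ is diagonal with entries $\sigma_1^2, \dots, \sigma_n^2$, this reduces to the product of $n$ univariate Gaussian densities, one per feature.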

Multivariate Gaussian (Normal) examples
[Figures: six slides of example plots]

Anomaly detection: Anomaly detection using the multivariate Gaussian distribution

Multivariate Gaussian (Normal) distribution
Parameters $\mu, \Sigma$
Parameter fitting:
Given training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
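
The fitting formulas are missing from the slide text, so this sketch assumes the usual estimates: $\mu$ is the mean of the training examples and $\Sigma$ is the $1/m$-normalized average of $(x^{(i)} - \mu)(x^{(i)} - \mu)^T$. To stay dependency-free, the density is evaluated only for $n = 2$, using the closed-form inverse and determinant of a 2x2 matrix. All names and numbers are hypothetical.

```python
import math


def fit_multivariate_gaussian(X):
    """Return (mu, Sigma) with 1/m normalization (assumed) for a list of rows."""
    m, n = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / m for j in range(n)]
    Sigma = [[sum((row[a] - mu[a]) * (row[b] - mu[b]) for row in X) / m
              for b in range(n)]
             for a in range(n)]
    return mu, Sigma


def density_2d(x, mu, Sigma):
    """Multivariate Gaussian density for n = 2 only (Sigma must be invertible)."""
    (a, b), (c, d) = Sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Toy data in which the two features rise and fall together.
X_train = [[1.0, 1.0], [3.0, 3.0], [1.0, 2.0], [3.0, 2.0]]
mu, Sigma = fit_multivariate_gaussian(X_train)

p_along = density_2d([3.0, 3.0], mu, Sigma)    # follows the correlation
p_against = density_2d([3.0, 1.0], mu, Sigma)  # breaks the correlation
```

Both query points are equally far from the mean in each coordinate taken separately, yet the one that breaks the correlation gets a much lower density. That is the kind of case per-feature Gaussians cannot distinguish.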

Anomaly detection with the multivariate Gaussian
1. Fit model $p(x)$ by setting