Machine learning for anomaly detection and
condition monitoring
A step-by-step tutorial from data import to model output

Vegard Flovik · Apr 23, 2019 · 10 min read
My previous article on anomaly detection and condition monitoring has received a lot of feedback. Many of the questions I receive concern the technical aspects and how to set up the models to obtain such results.
For an introduction to anomaly detection and condition monitoring, I recommend first reading my original article on the topic. This provides the necessary background information on how machine learning and data-driven analytics can be utilized to extract valuable information from sensor data.
The current article focuses mostly on the technical aspects, and includes all the code
needed to set up anomaly detection models based on multivariate statistical analysis and
autoencoder neural networks.
Download the dataset:
To replicate the results in the original article, you first need to download the dataset from the NASA Acoustics and Vibration Database. See the downloaded Readme Document for IMS Bearing Data for further information on the experiment and available data.
Each data set consists of individual files that are 1-second vibration signal snapshots recorded at specific intervals. Each file consists of 20,480 points with the sampling rate set at 20 kHz. The file name indicates when the data was collected. Each record (row) in the data file is a data point. Larger intervals between time stamps (shown in the file names) indicate resumption of the experiment on the next working day.
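As a quick sanity check (not part of the original article), you can load a single file and confirm its shape. The file name below is only an example of the naming convention used in the dataset:

# Minimal sketch: inspect one raw vibration snapshot (illustrative file name).
# Each file in the 2nd test should contain 20,480 rows (1 second at 20 kHz),
# with one column per bearing.
import os
import pandas as pd

sample = pd.read_csv(os.path.join('2nd_test', '2004.02.12.10.32.39'),
                     sep='\t', header=None)
print(sample.shape)   # expected: (20480, 4)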
Import packages and libraries:
The first step is to import some useful packages and libraries for the analysis:
# Common imports
import os
import pandas as pd
import numpy as np
from sklearn import preprocessing
import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt   # needed for the plots below

from numpy.random import seed
from tensorflow import set_random_seed

from keras.layers import Input, Dropout
from keras.layers.core import Dense
from keras.models import Model, Sequential, load_model
from keras import regularizers
from keras.models import model_from_json
Data loading and pre-processing:
An assumption is that gear degradation occurs gradually over time, so we use one datapoint every 10 minutes in the following analysis. Each 10-minute datapoint is aggregated by using the mean absolute value of the vibration recordings over the 20,480 datapoints in each file. We then merge everything together in a single dataframe.

In the following example, I use the data from the 2nd gear failure test (see the readme document for further info on that experiment).
data_dir = '2nd_test'
merged_data = pd.DataFrame()

for filename in os.listdir(data_dir):
    print(filename)
    dataset = pd.read_csv(os.path.join(data_dir, filename), sep='\t')
    dataset_mean_abs = np.array(dataset.abs().mean())
    dataset_mean_abs = pd.DataFrame(dataset_mean_abs.reshape(1, 4))
    dataset_mean_abs.index = [filename]
    merged_data = merged_data.append(dataset_mean_abs)

merged_data.columns = ['Bearing 1', 'Bearing 2', 'Bearing 3', 'Bearing 4']
After loading the vibration data, we transform the index to datetime format (using the
following convention), and then sort the data by index in chronological order before
saving the merged dataset as a .csv file:
# The file names encode the acquisition time, e.g. 2004.02.13.23.52.39,
# so we parse the index with the matching datetime format
merged_data.index = pd.to_datetime(merged_data.index, format='%Y.%m.%d.%H.%M.%S')
merged_data = merged_data.sort_index()
merged_data.to_csv('merged_dataset_BearingTest_2.csv')   # file name is arbitrary
merged_data.head()
Resulting dataframe: "merged_data"
Define train/test data:
Before setting up the models, we need to define train/test data. To do this, we perform a
simple split where we train on the first part of the dataset (which should represent
normal operating conditions), and test on the remaining parts of the dataset leading up
to the bearing failure.
# Data before 2004-02-13 23:52:39 is assumed to represent normal operating
# conditions and is used for training; the remaining data is used for testing
dataset_train = merged_data[:'2004-02-13 23:52:39']
dataset_test = merged_data['2004-02-13 23:52:39':]

dataset_train.plot(figsize=(12, 6))
Normalize data:
I then use preprocessing tools from Scikit-learn to scale the input variables of the model.
The “MinMaxScaler” simply re-scales the data to be in the range [0,1].
scaler = preprocessing.MinMaxScaler()

X_train = pd.DataFrame(scaler.fit_transform(dataset_train),
                       columns=dataset_train.columns,
                       index=dataset_train.index)

# Random shuffle of the training data
X_train = X_train.sample(frac=1)

X_test = pd.DataFrame(scaler.transform(dataset_test),
                      columns=dataset_test.columns,
                      index=dataset_test.index)
PCA type model for anomaly detection:
As dealing with high dimensional sensor data is often challenging, there are several techniques to reduce the number of variables (dimensionality reduction). One of the main techniques is principal component analysis (PCA). For a more detailed introduction, I refer to my original article on the topic.
As an initial attempt, let us compress the sensor readings down to the two main principal
components.
from sklearn.decomposition import PCA

pca = PCA(n_components=2, svd_solver='full')

X_train_PCA = pca.fit_transform(X_train)
X_train_PCA = pd.DataFrame(X_train_PCA)
X_train_PCA.index = X_train.index

X_test_PCA = pca.transform(X_test)
X_test_PCA = pd.DataFrame(X_test_PCA)
X_test_PCA.index = X_test.index
The Mahalanobis distance is widely used in cluster analysis and classification techniques. In order to use the Mahalanobis distance to classify a test point as belonging to one of N classes, one first estimates the covariance matrix of each class, usually based on samples known to belong to each class. In our case, as we are only interested in classifying "normal" vs "anomaly", we use training data that only contains normal operating conditions to calculate the covariance matrix. Then, given a test sample, we compute the Mahalanobis distance to the "normal" class, and classify the test point as an "anomaly" if the distance is above a certain threshold.

For a more detailed introduction to these technical aspects, you can have a look at my previous article, which covers these topics in more detail.
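As a concrete illustration (a minimal sketch with made-up numbers, not part of the original article): for a point x, a mean vector and a covariance matrix S of the "normal" distribution, the Mahalanobis distance is sqrt((x - mean)^T * inv(S) * (x - mean)).

# Minimal sketch with hypothetical values: Mahalanobis distance of a single
# 2-D test point from the "normal" training distribution.
import numpy as np

x = np.array([0.50, 0.30])              # hypothetical test point
mean = np.array([0.20, 0.25])           # mean of the "normal" class
cov = np.array([[0.010, 0.002],
                [0.002, 0.020]])        # hypothetical covariance matrix
inv_cov = np.linalg.inv(cov)

diff = x - mean
md = np.sqrt(diff.dot(inv_cov).dot(diff))
print(md)                               # flag as anomaly if md exceeds the threshold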
Define functions used in the PCA model:
Calculate the covariance matrix:
def cov_matrix(data, verbose=False):
    covariance_matrix = np.cov(data, rowvar=False)
    if is_pos_def(covariance_matrix):
        inv_covariance_matrix = np.linalg.inv(covariance_matrix)
        if is_pos_def(inv_covariance_matrix):
            return covariance_matrix, inv_covariance_matrix
        else:
            print("Error: Inverse of Covariance Matrix is not positive definite!")
    else:
        print("Error: Covariance Matrix is not positive definite!")
Calculate the Mahalanobis distance:
def MahalanobisDist(inv_cov_matrix, mean_distr, data, verbose=False):
    inv_covariance_matrix = inv_cov_matrix
    vars_mean = mean_distr
    diff = data - vars_mean
    md = []
    for i in range(len(diff)):
        md.append(np.sqrt(diff[i].dot(inv_covariance_matrix).dot(diff[i])))
    return md
Detect outliers above the threshold value:
def MD_detectOutliers(dist, extreme=False, verbose=False):
    k = 3. if extreme else 2.
    threshold = np.mean(dist) * k
    outliers = []
    for i in range(len(dist)):
        if dist[i] >= threshold:
            outliers.append(i)  # index of the outlier
    return np.array(outliers)
Calculate threshold value for classifying datapoint as anomaly:
def MD_threshold(dist, extreme=False, verbose=False):
    k = 3. if extreme else 2.
    threshold = np.mean(dist) * k
    return threshold
Check if matrix is positive definite:
def is_pos_def(A):
    if np.allclose(A, A.T):
        try:
            np.linalg.cholesky(A)
            return True
        except np.linalg.LinAlgError:
            return False
    else:
        return False
Set up PCA model:
Define train/test set from the two main principal components:
data_train = np.array(X_train_PCA.values)
data_test = np.array(X_test_PCA.values)
Calculate the covariance matrix and its inverse, based on data in the training set:
# Returns both the covariance matrix and its inverse
cov_matrix_train, inv_cov_matrix = cov_matrix(data_train)
We also calculate the mean value for the input variables in the training set, as this is used later to calculate the Mahalanobis distance to datapoints in the test set:
mean_distr = data_train.mean(axis=0)
Using the covariance matrix and its inverse, we can calculate the Mahalanobis distance
for the training data defining “normal conditions”, and find the threshold value to flag
datapoints as an anomaly. One can then calculate the Mahalanobis distance for the
datapoints in the test set, and compare that with the anomaly threshold.
dist_test = MahalanobisDist(inv_cov_matrix, mean_distr, data_test, verbose=False)
dist_train = MahalanobisDist(inv_cov_matrix, mean_distr, data_train, verbose=False)
threshold = MD_threshold(dist_train, extreme=True)
Threshold value for flagging an anomaly:
The square of the Mahalanobis distance to the centroid of the distribution should follow a chi-squared (χ2) distribution if the assumption of normally distributed input variables is fulfilled. This is also the assumption behind the above calculation of the "threshold value" for flagging an anomaly. As this assumption is not necessarily fulfilled in our case, it is beneficial to visualize the distribution of the Mahalanobis distance to set a good threshold value for flagging anomalies. Again, I refer to my previous article for a more detailed introduction to these technical aspects.
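As a side note (not part of the original article), if one did want to rely on the chi-squared assumption, the threshold could instead be derived from a chi-squared quantile. A minimal sketch with two principal components (2 degrees of freedom):

# Hypothetical alternative threshold (not used in the article): take the
# Mahalanobis distance whose square corresponds to a high chi-squared quantile.
from scipy import stats

alpha = 0.997                                     # fraction of "normal" points to cover
threshold_chi2 = np.sqrt(stats.chi2.ppf(alpha, df=2))
print(threshold_chi2)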
We start by visualizing the square of the Mahalanobis distance, which should then ideally follow a chi-squared distribution.
plt.figure()
sns.distplot(np.square(dist_train),
             bins=10,
             kde=False)
Square of the Mahalanobis distance
Then visualize the Mahalanobis distance itself:
plt.figure()
sns.distplot(dist_train,
             bins=10,
             kde=True,
             color='green')
plt.xlim([0.0, 5])
plt.xlabel('Mahalanobis dist')
From the above distributions, the calculated threshold value of 3.8 for flagging an anomaly seems reasonable (with extreme=True, the threshold returned by MD_threshold is three times the mean Mahalanobis distance of the training data).
We can then save the Mahalanobis distance, as well as the threshold value and "anomaly flag" variable for both train and test data in a dataframe:
anomaly_train = pd.DataFrame()
anomaly_train['Mob dist'] = dist_train
anomaly_train['Thresh'] = threshold
# If Mob dist above threshold: flag as anomaly
anomaly_train['Anomaly'] = anomaly_train['Mob dist'] > anomaly_train['Thresh']
anomaly_train.index = X_train_PCA.index

anomaly = pd.DataFrame()
anomaly['Mob dist'] = dist_test
anomaly['Thresh'] = threshold
# If Mob dist above threshold: flag as anomaly
anomaly['Anomaly'] = anomaly['Mob dist'] > anomaly['Thresh']
anomaly.index = X_test_PCA.index

anomaly.head()
Resulting dataframe for the test data
Based on the calculated statistics, any distance above the threshold value will be flagged
as an anomaly.
We can now merge the data in a single dataframe and save it as a .csv file:
anomaly_alldata = pd.concat([anomaly_train, anomaly])
anomaly_alldata.to_csv('Anomaly_distance.csv')   # file name is arbitrary
Verifying PCA model on test data:
We can now plot the calculated anomaly metric (Mob dist), and check when it crosses the anomaly threshold (note the logarithmic y-axis).
anomaly_alldata.plot(logy=True, figsize=(10, 6), ylim=[1e-1, 1e3],
                     color=['green', 'red'])
From the above figure, we see that the model is able to detect the anomaly
approximately 3 days ahead of the actual bearing failure.
Other approach: Autoencoder model for anomaly detection
The basic idea here is to use an autoencoder neural network to “compress” the sensor
readings to a low dimensional representation, which captures the correlations and
interactions between the various variables. (Essentially the same principle as the PCA
model, but here we also allow for non-linearities among the input variables).
Defining the Autoencoder network:
We use a 3-layer neural network: the first layer has 10 nodes, the middle layer has 2 nodes, and the third layer has 10 nodes. We use the mean square error as loss function, and train the model using the "Adam" optimizer.
seed(10)
set_random_seed(10)
act_func = 'elu'

# Input layer:
model = Sequential()
# First hidden layer, connected to input vector X.
model.add(Dense(10, activation=act_func,
                kernel_initializer='glorot_uniform',
                kernel_regularizer=regularizers.l2(0.0),
                input_shape=(X_train.shape[1],)))

model.add(Dense(2, activation=act_func,
                kernel_initializer='glorot_uniform'))

model.add(Dense(10, activation=act_func,
                kernel_initializer='glorot_uniform'))

model.add(Dense(X_train.shape[1],
                kernel_initializer='glorot_uniform'))

model.compile(loss='mse', optimizer='adam')

# Train model for 100 epochs, batch size of 10:
NUM_EPOCHS = 100
BATCH_SIZE = 10
Fitting the model:
To keep track of how the training progresses, we use 5% of the training data for validation after each epoch (validation_split = 0.05):
history = model.fit(np.array(X_train), np.array(X_train),
                    batch_size=BATCH_SIZE,
                    epochs=NUM_EPOCHS,
                    validation_split=0.05)
Training process
Visualize training/validation loss:
plt.plot(history.history['loss'],
         'b',
         label='Training loss')
plt.plot(history.history['val_loss'],
         'r',
         label='Validation loss')
plt.legend(loc='upper right')
plt.xlabel('Epochs')
plt.ylabel('Loss, [mse]')
plt.ylim([0, .1])
plt.show()
Train/validation loss
Distribution of loss function in the training set:
By plotting the distribution of the calculated loss in the training set, one can use this to identify a suitable threshold value for identifying an anomaly. In doing this, one can make sure that the threshold is set above the "noise level", so that flagged anomalies stand out clearly above the noise background.
X_pred = model.predict(np.array(X_train))
X_pred = pd.DataFrame(X_pred,
                      columns=X_train.columns)
X_pred.index = X_train.index

scored = pd.DataFrame(index=X_train.index)
scored['Loss_mae'] = np.mean(np.abs(X_pred - X_train), axis=1)

plt.figure()
sns.distplot(scored['Loss_mae'],
             bins=10,
             kde=True,
             color='blue')
plt.xlim([0.0, .5])
Loss distribution, training set
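As a side note (not part of the original article), one could also derive the anomaly threshold programmatically from the training loss instead of reading it off the histogram, for example as a high quantile:

# Hypothetical alternative: set the threshold so that nearly all "normal"
# training points fall below it (here the 99.9th percentile of the loss).
threshold_mae = scored['Loss_mae'].quantile(0.999)
print(threshold_mae)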
From the above loss distribution, let us try a threshold value of 0.3 for flagging an anomaly. We can then calculate the reconstruction loss in the test set, to check when the output crosses the anomaly threshold.
X_pred = model.predict(np.array(X_test))
X_pred = pd.DataFrame(X_pred,
                      columns=X_test.columns)
X_pred.index = X_test.index

scored = pd.DataFrame(index=X_test.index)
scored['Loss_mae'] = np.mean(np.abs(X_pred - X_test), axis=1)
scored['Threshold'] = 0.3
scored['Anomaly'] = scored['Loss_mae'] > scored['Threshold']
scored.head()
We then calculate the same metrics also for the training set, and merge all data in a single dataframe:
X_pred_train = model.predict(np.array(X_train))
X_pred_train = pd.DataFrame(X_pred_train,
                            columns=X_train.columns)
X_pred_train.index = X_train.index

scored_train = pd.DataFrame(index=X_train.index)
scored_train['Loss_mae'] = np.mean(np.abs(X_pred_train - X_train), axis=1)
scored_train['Threshold'] = 0.3
scored_train['Anomaly'] = scored_train['Loss_mae'] > scored_train['Threshold']

scored = pd.concat([scored_train, scored])
We can then visualize the resulting model output in the time leading up to the bearing failure:
scored.plot(logy=True, figsize=(10, 6), ylim=[1e-2, 1e2],
            color=['blue', 'red'])
Summary:
Both modeling approaches give similar results, where they are able to flag the upcoming bearing malfunction well in advance of the actual failure. The main difference is essentially how to define a suitable threshold value for flagging anomalies, to avoid too many false positives during normal operating conditions.
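As an illustration of that trade-off (not part of the original article), one can check what fraction of the "normal" training data a candidate threshold would flag, i.e. the expected false-positive rate under normal operating conditions:

# Hypothetical check: fraction of normal (training) points flagged as anomalies
# for a candidate threshold; lowering the threshold increases this fraction.
candidate_threshold = 0.3
fp_rate = (scored_train['Loss_mae'] > candidate_threshold).mean()
print('False-positive rate on training data: {:.2%}'.format(fp_rate))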
I hope this tutorial gave you inspiration to try out these anomaly detection models yourself. Once you have successfully set up the models, it is time to start experimenting with model parameters etc. and test the same approach on new datasets. If you come across some interesting use cases, please let me know in the comments below.
Have fun!
You might also find some of my other articles interesting:

1. The transition from Physics to Data Science
2. What is Graph theory, and why should you care?
3. Deep Transfer Learning for Image Classification
4. Building an AI that can read your mind
5. Machine Learning: From Hype to real-world applications
6. The hidden risk of AI and Big Data
7. How to use machine learning for anomaly detection and condition monitoring
9. How (not) to use Machine Learning for time series forecasting: Avoiding the pitfalls
10. How to use machine learning for production optimization: Using data to improve performance
11. How do you teach physics to AI systems?
AI workshop — From hype to real-world applications