
Vibration Analysis in Bearings For Failure Prevent

This document presents a preprint manuscript that proposes using a convolutional neural network (CNN) approach to analyze vibration data from bearings for failure prediction. The authors first automatically label vibration data to indicate different levels of bearing wear. They then convert the raw vibration data into small images that serve as inputs to an AlexNet-based CNN model for classifying wear level. The authors validate their approach on a dataset from an intelligent maintenance system center and find it outperforms other state-of-the-art methods for this task.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/341395912

Vibration Analysis in Bearings for Failure Prevention using CNN

Preprint · May 2020

All content following this page was uploaded by Diego Alberto Mercado-Ravell on 21 May 2020.



Preprint manuscript No.
(will be inserted by the editor)

Vibration Analysis in Bearings for Failure Prevention using CNN


L.A. Pinedo-Sánchez¹ · D.A. Mercado-Ravell¹,² · C.A. Carballo-Monsivais¹
arXiv:2005.07057v1 [eess.AS] 6 May 2020

May 6th 2020

Abstract The timely detection of bearing failures is of great importance to prevent economic losses in industry. In this article we propose a method based on Convolutional Neural Networks (CNN) to estimate the level of wear in bearings. First, the raw vibration data is labeled automatically to obtain different levels of bearing wear: features are extracted from the raw data by means of the Root Mean Square (RMS) together with Shannon's entropy, and are then grouped into seven classes using the K-means algorithm to obtain the labels. Then, the raw vibration data is converted into small square images, with each sample of the data representing one pixel of the image. Following this, we propose a CNN model based on the AlexNet architecture to classify the wear level and diagnose the rotary system. To train the network and validate our proposal, we use a dataset from the Center for Intelligent Maintenance Systems (IMS), and extensively compare our approach with other methods reported in the literature. The proposed strategy proved to be highly effective, outperforming other state-of-the-art approaches.

Keywords Vibration analysis · Bearing fault · Deep learning · Image classification · CNN · AlexNet

¹ Authors are with the Center for Research in Mathematics CIMAT AC, campus Zacatecas. Avenida Lasec Andador Galileo Galilei, Manzana 3 Lote 7, Parque Quantum, ZACATECAS 98160 Mexico. Tel.: 492-998-0300. E-mail: [email protected], [email protected]
² Is also with Cátedras CONACYT, at CIMAT-Zacatecas. E-mail: [email protected]

1 Introduction

Rotating mechanisms are essential components in most industrial machines, and the problems that can occur in rotating machines cause significant losses to industry every year. Bearings are among the key components of rotating machines, and they are exposed to excessive wear due to the many hours of continuous operation. Within industry, rotary machinery is used in numerous forms, including pumps, electric motors, power generators, ventilators, wind turbines, reciprocating compressors and cooling towers, among others.

Signal processing has helped organizations that operate machinery to prevent failures, along with other kinds of problems such as low productivity, safety risks and downtime, among others [1, 2]. Hence, it is fundamental that machinery failures are detected on time, to avoid future problems and to improve the performance of the organizations, helping to prevent production stoppages, permanent damage to expensive components, complete machine failure or even an accident.

Moreover, rotating machines always incorporate bearings, and these components are essential to their operation [3]. The bearings are exposed to wear, which causes the machines to operate in unfavorable conditions and lose efficiency. As the wear on the bearing increases, the vibration signals increase, compromising the system's performance. However, such vibrations can also be exploited to detect failures without stopping production, generating important savings for companies. Furthermore, analyzing vibrations on the bearings can also be used to detect problems with other components of the rotating system. For this reason, the analysis of bearing vibrations is of great importance for failure detection and for monitoring the machine health condition [4].

Different techniques have been used to monitor rotating machines. For example, in [5], the authors proposed a method that combines a Simplified Fuzzy Adaptive Resonance Theory Map (SFAM) neural network and the Weibull distribution. In [6], the Hilbert-Huang transform is used to obtain the frequency energy values of the vibration signals of a motor, which are analyzed through a Support Vector Machine (SVM) for failure prediction. Also, in [7] a method making use of the Complementary Ensemble Empirical Mode Decomposition (CEEMD) is presented, together with a kernel SVM to evaluate the health condition of the bearings. Likewise, in [8] a method is proposed to estimate the Remaining Useful Life (RUL) of bearings, where the Empirical Mode Decomposition method is used along with the Principal Component Analysis and Support Vector Regression (SVG) algorithms. In addition, in [9] an approach called Logical Data Analysis is applied to perform failure detection on rotating machinery using vibration signals. Furthermore, in [10] a method based on Deep Neural Networks (DNN) is proposed, where models with different numbers of layers are created to recognize the type of fault that occurs. In [11], a technique called Artificial Ant Clustering is used to detect the degradation state of the bearings, hidden Markov models are used to approximate the next degradation state, and an adaptive neuro-fuzzy inference system, along with time series predictions, is applied to estimate the remaining time to the next degradation state.

The main problem of the classical techniques is that they require the supervision of an expert in the area for the extraction of characteristics; thus, more advanced engineering is required, which involves greater human effort [12]. Some of these methods, such as SVM, are still used today to perform fault classification. However, they have to be combined with other more recent techniques, and considerably modified, to improve their performance [7, 13].

More recently, neural networks have been successfully applied to this task. Some of the first attempts include [1], where an Artificial Neural Network (ANN) is proposed to accurately predict the RUL of the bearings. The ANN used was a Feed-Forward Neural Network with the Levenberg-Marquardt training algorithm. Also, an Elman Neural Network (ENN) was proposed in [14] to predict the RUL in wind turbine generators, where the ENN output is the percentage of RUL. However, these neural network based techniques have been considerably outperformed by modern Deep Learning approaches.

Currently, there are new techniques and tools such as Deep Learning, which allow features to be extracted automatically and without the need for an expert, simplifying the final solution while considerably improving the accuracy [15]. Furthermore, this kind of methodology can be easily generalized or transferred to a different context [16]. It is therefore of great importance to take advantage of these new techniques, as they can be of great help in predicting bearing failures. Hence, in recent years, researchers have studied bearing condition monitoring and failure diagnosis using Deep Learning. These studies have focused on feature extraction and condition recognition [17], where Convolutional Neural Networks (CNN) have proven to be an excellent option for effectively learning features in the context of fault diagnosis through vibration analysis [2, 12, 13, 18, 19, 20, 21, 22, 23].

This work focuses on the development of an automatic method to prevent failure in rotary machines by estimating the wear in bearings, where the intervention of an expert is not required. Accordingly, we propose a method for vibration analysis in bearings through images using CNN, where the information is obtained by means of accelerometers, which is advantageous since production does not need to be stopped periodically to check the machine deterioration. Besides, monitoring the machines constantly and automatically provides a way to carry out maintenance in a timely manner. In order to make use of supervised learning and take advantage of the great advances in image processing with CNNs, we propose a strategy to label raw vibration signals, which are then transformed into small square images for training and classification. The proposed labeling strategy consists in extracting the Root Mean Square (RMS) along with the Shannon's entropy of the raw signal, which is then grouped into seven categories using the K-means algorithm to obtain the classes for labeling. Afterwards, sub-samples of the raw signals are converted into images for training a CNN, in order to automatically extract characteristics and create a model to classify the wear of the bearings over time. In this work, the AlexNet architecture is used, with some modifications to adapt it to the problem under consideration. The proposed methodology was implemented and studied using the Intelligent Maintenance Systems (IMS) unlabeled dataset [24]. Last but not least, an extensive comparison with other classical techniques, as well as with state-of-the-art CNN-based methods, revealed that the proposed strategy considerably outperforms other approaches in the literature for the IMS dataset.

This article is organized as follows: Section 2 introduces the related works on bearing failure detection using CNN. Section 3 describes the proposed methodology for estimating the level of wear in bearings, including the proposed CNN architecture. Section 4 validates the proposal and shows the obtained results, including an extensive comparison with other available methods. Finally, Section 5 presents the conclusions and future work.

2 Related Works

In the following, we discuss the most relevant recent works regarding the diagnosis of bearing failures by means of vibrations. More particularly, this section reviews modern CNN-based techniques applied to this task, as well
as the most important architectures to date, such as LeNet-5 and AlexNet. The most relevant works are shown in Table 1, where works based on classical and CNN-based methods are compared.

Table 1: Comparison between the main bearing failure detection works.

Work      Based on/Architecture               Dataset                             Converts to image  Domain              Goal               Accuracy
[5]       Simplified Fuzzy Adaptive Resonance IMS                                 No                 Time                RUL                65.46%
          Theory Map and Weibull distribution
[6]       Gray model                          Own                                 No                 Frequency           Failure detection  NA
[7]       SVM                                 IMS                                 No                 Frequency           Wear level         98.50%
[8]       SVG                                 IMS                                 No                 Frequency           RUL                NA
[10]      Deep Neural Networks                CWRU and IMS                        No                 Time                Failure detection  98.35%
[18]      CNN                                 CWRU                                Yes                Frequency           Failure detection  97.89%
[2]       CNN                                 CWRU and gearbox                    Yes                Time and Frequency  Failure detection  98.35%
[19]      CNN                                 CWRU                                Yes                Time                Failure detection  98.69%
[20]      CNN                                 CWRU                                Yes                Time                Failure detection  99.22%
[25]      CNN                                 CWRU and IMS                        No                 Time                Failure detection  93.90%
[26]      CNN                                 CWRU, IMS and Railway               No                 Time                Failure detection  86.30%
                                              Locomotive (RL)
[21]      CNN/LeNet-5                         CWRU                                Yes                Time                Failure detection  97.90%
[12]      CNN/LeNet-5                         CWRU and pumps                      Yes                Time                Failure detection  97.04%
[22]      CNN/LeNet-5                         CWRU                                Yes                Time                Failure detection  99.40%
[13]      CNN/AlexNet, SVM and SAE-SVM        Radial Roller Bearing Test Rig and  Yes                Frequency           Failure detection  91.96%
                                              Axial Roller Bearing Test Rig
[23]      CNN/AlexNet                         CWRU                                Yes                Frequency           Failure detection  94.39% − 100%
Proposal  CNN/AlexNet                         IMS                                 Yes                Time                Wear level         99.25%

With the recent major advances in CNN, a large number of applications have been developed to solve classification, detection and segmentation problems, particularly in the fields of computer vision and image processing. These applications have obtained very good results, due to the potential of CNN to extract a large number of characteristics and to generalize to different scenarios, considerably improving on previous results [12, 27]. More recently, work has been developed on the detection of bearing failures using CNN, where the data from the vibration signals is transformed into images, taking advantage of the enormous advances obtained in image processing with CNN and hence becoming an excellent strategy for vibration analysis.

2.1 Convolutional Neural Network

With the advances that have emerged in the field of deep learning, several methods have been developed to diagnose bearing failures, with one of the most popular models being the CNN [23]. CNN are methods with a great ability to learn in the areas of image classification, object detection, text recognition, etc. [28], which is why this kind of method has become very important for digital signal processing [19].

Recently, CNN-based methods have been implemented to perform bearing failure prediction, as in [18], where the authors propose a method called IDSCNN, based on CNN in combination with the improved Dempster-Shafer theory. There, the CNN architecture consists of three Convolutional (Conv) layers and two Fully Connected (FC) layers. Also, the RMS of the Fast Fourier Transform is used. In their experiments they used the Case Western Reserve University (CWRU) dataset for validation. In addition, in [2] a CNN-based method is proposed to diagnose failures in rotating machines, which makes use of multiple accelerometers and combines the obtained information to create a two-dimensional matrix. Also, in [19] a multi-scale convolution method called MS-DCNN is proposed, which reduces the number of parameters and the training time. This method is compared with one- and two-dimensional CNN. Meanwhile, in [20] a method was proposed to classify vibrations for the diagnosis of failures, detecting the source of failure and the different degrees of damage. The vibration signal was converted to a spectrogram by means of the Short-Time Fourier Transform (STFT), which was used as the input of a CNN for training. The vibrations were classified into seven states according to the ring of the bearing (inner or outer) and its wear level. On the other hand, in [25] they propose
a one-dimensional CNN, where only seven layers are used to detect the type of failure in the bearings. Also, in [26] a method called Deep Convolutional Transfer Learning Network (DCTLN) is proposed, which has two main modules: condition recognition and domain adaptation. The first module is a one-dimensional CNN that is in charge of learning the characteristics of the data and recognizing the condition of the machines. The second module helps the first module to learn the characteristics of data variations. In all the aforementioned works, the employed datasets are already labeled according to wear level and type of failure, which is a great advantage for supervised learning. Unfortunately, not all the available datasets are properly labeled, and it is not clear how to classify different data in order to apply the aforementioned techniques. Furthermore, the available labels only consider whether or not there is a failure, but do not provide the level of wear, which may be useful to prevent failures before they occur through timely maintenance.

In the current state of the art, two main CNN architectures have been used as a base to obtain the best results reported in the literature for the classification of bearing failure using images: LeNet-5 [29] and AlexNet [30]. A review of their main adaptations for this particular task is presented in the following.

2.1.1 LeNet-5

LeNet-5 is a classic CNN architecture for image classification tasks such as handwritten or machine-printed character recognition, and multi-object detection [31, 12, 32]. This architecture was proposed in [33, 29], and includes seven layers, not counting the input layer: two convolutional layers (Conv), two sub-sampling layers and three FC layers. The configuration of the layers is shown in Table 2. For the CNN input, the usual resolution is 32 × 32 pixels. However, these input dimensions may vary, depending on the data size.

Table 2: LeNet-5 layers configuration.

Layer  Features
1      Conv(5 x 5 x 6)
2      Avgpool(2 x 2)
3      Conv(5 x 5 x 16)
4      Avgpool(2 x 2)
5      FC(120)
6      FC(84)
7      FC(10)

Several CNN architectures based on LeNet-5 have been used in the literature, with different modifications in order to improve the results, as in [21], where a method called ADCNN is proposed, inspired by the LeNet-5 architecture but adding a convolutional layer and a sub-sampling layer before the first FC one. The proposed method classifies the bearing failure type, along with the severity of the failure. This work also makes use of the CWRU labeled dataset. Furthermore, an architecture based on LeNet-5 was proposed in [12], where two more convolutional layers and two pooling layers are added. The transformation to images was performed in the time domain, using three datasets: the CWRU bearing dataset, a self-priming centrifugal pump dataset, and an axial piston hydraulic pump dataset. Also, in [22] a method based on CNN and Random Forest (RF) was proposed. Time-domain vibration signals containing fault information were transformed into images by means of the Continuous Wavelet Transform (CWT). The proposed method is based on LeNet-5, and the features extracted by the CNN are used by multiple RF classifiers to diagnose bearing failures.

The proposed modifications of this architecture have been designed to improve the classification of bearing failure based on images, reporting good results. The use of this architecture is adequate due to the size of the images being used, which are predominantly small. Unfortunately, in the literature where improvements to this architecture are made, the authors do not normally present enough information about the configuration of the CNN to replicate the experiments, except for the work in [12].

2.1.2 AlexNet

AlexNet is a CNN architecture that was proposed in [30]. This CNN has been satisfactorily used to classify images from the ImageNet LSVRC-2010 database, magnetic resonance images, traffic congestion detection, etc. [30, 34, 35, 36]. This architecture is not very deep, which is ideal for working with small images. Furthermore, it was responsible for the rise of CNN after it obtained good results on the ImageNet dataset [30].

This architecture is composed of eleven layers, not counting the input one: five convolutional layers, three sub-sampling layers and three FC layers. The configuration of the layers is shown in Table 3. It is normally used for larger images than LeNet-5, around 224 × 224 pixels, since it employs more layers.

Table 3: AlexNet layers configuration.

Layer  Features
1      Conv(11 x 11 x 96)
2      Maxpool(3 x 3)
3      Conv(5 x 5 x 256)
4      Maxpool(3 x 3)
5      Conv(3 x 3 x 384)
6      Conv(3 x 3 x 384)
7      Conv(3 x 3 x 256)
8      Maxpool(3 x 3)
9      FC(4096)
10     FC(4096)
11     FC(1000)

Only two different adaptations of AlexNet can be found in the literature for bearing failure diagnosis using images. In [13], AlexNet was slightly modified and combined with two SVM classifiers and two Sparse Auto-Encoder SVM (SAE-SVM) classifiers. The experiments were performed with frequency-domain images. Also, in [23] a method for bearing failure diagnosis was proposed based on CNN with the AlexNet architecture. To validate their proposal it was necessary to convert the vibration signals into time-frequency images, by using eight time-frequency analysis methods. In their experiments the CWRU dataset was used. Although good results have been obtained using AlexNet for bearing diagnosis, the main difference with respect to the present work is that the datasets employed so far study bearings that were physically manipulated to provoke a failure. Therefore, these datasets already have labels corresponding to the location of the failure, but induced failures do not necessarily correspond to the normal deterioration under regular use. Moreover, these methods cannot be tested with other datasets due to the lack of equivalent labels. Also, as can be observed in Table 1, most of the available works aim only to detect whether or not a failure is present, but do not provide the level of wear, which can be used to schedule a maintenance task in time to prevent failures. On the other hand, most of the related works make use of images in the frequency domain, while in the present work the images are made in the time domain. Furthermore, they do not make many modifications to the architecture, because they work with spectrogram images, which are larger, resulting in higher training times. In addition, they combine this architecture with another classification method, such as SVM. In the present work, modifications were made to the AlexNet architecture to adapt it to the size of the images, which are considerably smaller. This significantly reduces the training time, while allowing a larger number of images to be obtained from the available data, which proved to be beneficial for the CNN classifier.

In this work, an automatic labeling strategy is proposed to classify the wear levels of the bearings over time, without the need for expert supervision, where the labels are based on the similarity of characteristics of the data using Traditional Statistical Features (TSF) along with the Shannon's entropy. Seven classes were obtained by means of the K-means algorithm and used for labeling the raw vibration data, where one class is considered as a healthy state and the rest of the classes are the different levels of wear, which can be used to schedule the corresponding maintenance tasks and prevent failures. The vibration signals in the time domain are transformed into images to train the CNN, which provides a better automatic extraction of characteristics from the data. We propose an AlexNet-based architecture, where the sizes of the convolution and sub-sampling filters are modified, and a sub-sampling layer is added, in order to deal with smaller images. This allows us to increase the number of images obtained, considerably improving the training stage. The proposed methodology was extensively evaluated using the IMS unlabeled dataset with excellent results, especially when compared with other works in the literature, both CNN-based and classical methods.

The main contributions of this work are summarized in the following:

1. A CNN-based classifier is used to estimate bearing wear by means of vibration images, providing a diagnosis of the system without stopping production, which can be used to carry out timely maintenance and prevent failures in rotary machines.
2. A method for performing automatic vibration data labeling, without expert supervision, is introduced. This is accomplished by means of RMS along with Shannon's entropy and K-means for feature clustering.
3. The AlexNet architecture was adapted and satisfactorily used for estimating the level of bearing wear. The sizes of the convolution and sub-sampling filters were modified to deal with smaller images, and a sub-sampling layer was added to improve the results.
4. The proposed methodology was successfully tested with the IMS unlabeled dataset with excellent results, surpassing other methods reported in the literature.

3 Methodology

We propose a methodology to estimate the level of wear in bearings by means of vibration signals in combination with CNN, where we perform a transformation of the raw signals to images. Fig. 1 shows the steps that are followed to estimate the wear level in bearings.

A new labeling method for bearing vibration data according to the level of wear is proposed, based on classical techniques [37]. First, the characteristics are extracted from the raw data using TSF; then Shannon's entropy is applied to highlight the extracted characteristics. Afterwards, the characteristics are clustered using the K-means algorithm to label the data. This methodology is ideal for data that is not labeled by wear or damage level, since it provides an automatic way to classify it, providing the labels necessary for supervised
learning, without the intervention of a human expert. In this article we work with an unlabeled dataset provided by [24], which is described in Section 4.

Next, the raw vibration signals are converted to images, and the images obtained are used to train a CNN for the classification task. We propose a CNN architecture based on AlexNet as the wear level classifier. Finally, when new raw data is obtained, only a small section of the signal needs to be converted into images to estimate the level of bearing wear.

3.1 Feature extraction

To extract the characteristics from the vibration signals, we make use of the TSF, because they are excellent tools in the time domain to characterize the changes in the vibration signals of the bearings during operation. In addition, they also allow us to estimate the wear of the bearing over time. When bearings are damaged, the vibrations are intensified, and the TSF values increase considerably, indicating the damage. These characteristics are shown in Table 4, where $n$ indicates the number of discrete points in the sample, $x_i$ is a single experimental point from the sample, $\bar{x}$ is the mean of the sampled values, $\sigma$ represents the standard deviation, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the sample, respectively.

In the literature related to the prognosis and diagnosis of bearing failures, RMS and Kurtosis are the most widely used [1, 5, 38]. Kurtosis is effective for detecting bearing failure at an early stage, whereas RMS represents the energy and power characteristics of vibration signals.

The main idea of these characteristics is to identify a monotonic trend; that is to say, when the bearing deteriorates, the value of these characteristics increases or decreases to indicate the failure. When there is damage in the bearing and it is not detected by one of the TSF, that is, there is no significant increase or decrease in that feature, that TSF will not be of great help for the analysis, and another TSF has to be chosen.

3.2 Shannon's entropy

Shannon's entropy is the central part of information theory, and it is also known as the measure of uncertainty. Shannon's entropy H(x) was introduced with communication theory in 1948 [39, 40]. Then, in [37] the original formula was modified, and it is defined as

$$H(x) = \frac{1}{n} \sum_{i=1}^{n} -TSF(x_i) \log_2 TSF(x_i) \qquad (1)$$

where $n$ is the length of the sliding window and $TSF$ represents the Traditional Statistical Feature that was selected in the previous stage.

Once the TSF have been obtained, the Shannon's entropy can be calculated, which allows us to highlight the characteristics obtained from the TSF. In this way, we can choose the TSF that, together with the Shannon's entropy, shows a well-defined increase or decrease of the data over time. Once we have identified one of the TSF together with the Shannon's entropy, we can move on to the clustering stage.

3.3 Feature clustering

The next step is to group the data obtained from Shannon's entropy along with one of the TSF, in order to obtain the labels needed to perform the CNN training. For this step, the K-means algorithm is used to group the data with similar characteristics, such that it is possible to label the different levels of wear of the bearings. In this phase we can choose the number of classes we want to label in our dataset.

The K-means algorithm was proposed in [41] and is one of the most important unsupervised classification algorithms. It allows us to group data into a specific number of groups that have similar characteristics. These groups are called "clusters" and the number of clusters is defined by

Table 4: Traditional Statistical Features (TSF).

Name            Formula
RMS             $\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
Kurtosis        $\frac{1}{n}\sum_{i=1}^{n} \frac{(x_i-\bar{x})^4}{\sigma^4}$
Skewness        $\frac{1}{n}\sum_{i=1}^{n} \frac{(x_i-\bar{x})^3}{\sigma^3}$
Peak to peak    $x_{\max} - x_{\min}$
Crest Factor    $\frac{\max|x_i|}{\mathrm{RMS}}$
Shape Factor    $\frac{\mathrm{RMS}}{\frac{1}{n}\sum_{i=1}^{n}|x_i|}$
Impulse Factor  $\frac{\max|x_i|}{\frac{1}{n}\sum_{i=1}^{n}|x_i|}$
Margin Factor   $\frac{\max|x_i|}{\left(\frac{1}{n}\sum_{i=1}^{n}|x_i|^{1/2}\right)^{2}}$
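As an illustration of Sections 3.1 and 3.2, the following minimal sketch computes a few of the features in Table 4 and the windowed entropy of Eq. (1). It is not the authors' implementation: the function names, the non-overlapping block length used to obtain the TSF series, the use of the population standard deviation, and the choice to skip non-positive TSF values (for which the logarithm is undefined) are all assumptions made for the example.

```python
import math


def rms(x):
    """Root Mean Square: sqrt((1/n) * sum(x_i^2))."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """(1/n) * sum((x_i - mean)^4) / sigma^4.

    sigma is taken as the population standard deviation (assumption).
    Undefined for a constant sample, where sigma is zero.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return sum((v - mean) ** 4 for v in x) / (n * var ** 2)


def peak_to_peak(x):
    """Difference between the maximum and minimum values of the sample."""
    return max(x) - min(x)


def crest_factor(x):
    """max|x_i| divided by the RMS of the sample."""
    return max(abs(v) for v in x) / rms(x)


def tsf_series(signal, block, feature=rms):
    """Evaluate one TSF on consecutive, non-overlapping blocks of the signal.

    The block length is a hypothetical parameter; the source does not fix it here.
    """
    return [feature(signal[i:i + block])
            for i in range(0, len(signal) - block + 1, block)]


def windowed_entropy(tsf_values, n):
    """Eq. (1): H = (1/n) * sum(-TSF * log2(TSF)) over a sliding window of length n.

    Non-positive TSF values are skipped (assumption), since log2 is undefined there.
    """
    out = []
    for start in range(len(tsf_values) - n + 1):
        window = tsf_values[start:start + n]
        out.append(sum(-t * math.log2(t) for t in window if t > 0) / n)
    return out


if __name__ == "__main__":
    # Toy signal whose amplitude grows over time, standing in for progressive wear.
    signal = [(1 + k / 200) * math.sin(0.3 * k) for k in range(2000)]
    feats = tsf_series(signal, block=100)
    print(windowed_entropy(feats, n=5))
```

With real data, the series returned by `windowed_entropy` would be inspected for the monotonic trend described in Section 3.1 before moving on to clustering.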
Fig. 1: Overall methodology for estimating the level of wear in bearings: characteristics are extracted from the vibration signals, Shannon's entropy is applied, the characteristics are clustered to create classes, the vibration signals are transformed into images, and the images are fed to a CNN to create a model that classifies the wear level of the bearings.
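Two of the data-preparation steps named in Fig. 1, clustering the features into classes and turning the raw signal into images, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: `kmeans` is a plain Lloyd's-algorithm implementation with randomly chosen initial centroids (the source does not describe the initialization), and `signal_to_images` follows Eq. (2) of Section 3.4 but keeps only sub-samples that fit entirely inside the signal and maps a constant sub-sample to all zeros, two choices the source does not specify. Both function names and the `seed` and `iters` parameters are hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples.

    Returns (labels, centroids). Initial centroids are k input points
    chosen at random (assumption).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def signal_to_images(signal, m, step):
    """Convert a 1-D signal into m x m grayscale images following Eq. (2).

    Each sub-sample of m*m points is min-max scaled to 0..255 and laid out
    row by row. Python's round() rounds exact halves to the nearest even integer.
    """
    size = m * m
    images = []
    for start in range(0, len(signal) - size + 1, step):
        sub = signal[start:start + size]
        lo, hi = min(sub), max(sub)
        span = (hi - lo) or 1.0  # constant sub-sample: avoid dividing by zero
        pixels = [round((v - lo) / span * 255) for v in sub]
        images.append([pixels[r * m:(r + 1) * m] for r in range(m)])
    return images


if __name__ == "__main__":
    # Toy feature points (for example, windowed entropy values) in two groups.
    feats = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
    labels, _ = kmeans(feats, k=2)
    print(labels)
    # Toy signal: 2 x 2 images taken every 2 samples, so consecutive images overlap.
    print(signal_to_images([0, 1, 2, 3, 4, 5, 6, 7], m=2, step=2))
```

In the setting described in this article, the clustering would use seven classes and the images would be 64 × 64 pixels with a step of 64 samples; the toy values above are only there to keep the example small.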

K. This algorithm consists of minimizing the sum of the squared Euclidean distances of each of the points with respect to the centroid of its cluster.

3.4 Signal to image transformation

Traditional methods for motor failure diagnosis are based on statistical analysis, fuzzy logic expert systems or genetic systems. Extracting characteristics from raw signals is one of the main functions of these methods, since good feature extraction has a great impact on the results [12, 27]. In contrast to traditional methods, we perform a data pre-processing method that converts raw time-domain vibration signals into images, in order to take advantage of the powerful classification tools available for image processing using CNN [12, 23, 18, 21, 22, 27, 42, 43]. Moreover, converting the raw signals into images provides a good way to explore two-dimensional features [42].

For a raw signal R with N sample points, Fig. 2 shows the method of conversion to images, where each time-domain signal point is one pixel of a square grayscale image of size M × M. First, a sub-sample L of size $M^2$ is taken from the raw signal R; hence, the i-th sub-sample is given by $L_i = \{R(i \cdot s + 1), R(i \cdot s + 2), \ldots, R(i \cdot s + M^2)\}$, where $s \in \mathbb{Z}^+$ is the step between samples, and the index $i = \{0, 1, \ldots, \lfloor N/s \rfloor\}$, with $\lfloor\,\cdot\,\rfloor$ denoting the floor function (see Fig. 2). Note that we aim for an important overlap between samples in order to obtain more images, which is advantageous for the CNN classifier, i.e. $s \ll M^2$. Then, each point in the sub-sample L fills a matrix of M × M from left to right and from top to bottom. Each point is normalized from 0 to 255, and represents the grayscale intensity value of each pixel, with coordinates x, y, of the image P(x, y). For the i-th image, this process is defined as follows

$$P_i(x, y) = \mathrm{round}\left(\frac{L_i((x-1)\cdot M + y) - \min(L_i)}{\max(L_i) - \min(L_i)} \cdot 255\right) \qquad (2)$$

The size and number of images may vary according to the amount of vibration data available. In addition, the computational complexity will also be proportional to the size of the images. Hence, if complexity is a problem, the size of the images should be reduced [42]. For our proposal we have chosen a size of 64 × 64 pixels, with a step s = 64.

3.5 Convolutional Neural Network

CNNs are deep neural networks that focus mainly on image processing and are excellent for pattern recognition. In addition, they are among the best methods for classification. CNNs automatically obtain the characteristics of the images by means of convolutional filters, which makes them a tool with a great capacity to learn characteristics in a robust and sensitive way.

In each CNN there are three main types of layers: a) the convolutional layer (Conv), b) the sub-sampling layer and c) the fully connected layer (FC). The convolutional layer serves to acquire feature maps that are obtained through a set of filters. The sub-sampling layer serves to reduce the characteristics of the inputs and the computational complexity. Finally, the FC layer, which is a layer of a conventional neural network where each pixel is considered as a neuron, computes the scores of each of the classes [31, 12].
8 L.A. Pinedo-Sánchez1 et al.

Fig. 2: Method to convert the raw signals into images. First, an $M^2$ signal sub-sample is taken, where M represents the total
height and width of the square image. This sub-sample is then mapped into a matrix and each point is normalized in a range
from 0 to 255 to represent the intensity of each pixel value.
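The conversion summarized in this caption and in Eq. (2) can be sketched as follows. This is a minimal illustration rather than the authors' code: `signal_to_images` is a hypothetical name, only complete windows of $M^2$ points are used, a constant window is mapped to zeros, and Python's built-in `round` (which rounds halves to the nearest even integer) stands in for the rounding operator of Eq. (2).

```python
def signal_to_images(signal, m, step):
    """Slice `signal` into overlapping sub-samples of m*m points and
    scale each one to integer grey levels in 0..255, following Eq. (2)."""
    size = m * m
    images = []
    # Only windows that fit entirely inside the signal are used (assumption).
    for start in range(0, len(signal) - size + 1, step):
        sub = signal[start:start + size]
        lo, hi = min(sub), max(sub)
        span = hi - lo
        image = []
        for row in range(m):
            line = []
            for col in range(m):
                value = sub[row * m + col]  # fill left to right, top to bottom
                # A constant window has no dynamic range; map it to 0 (assumption).
                line.append(round((value - lo) / span * 255) if span else 0)
            image.append(line)
        images.append(image)
    return images
```

For example, `signal_to_images([0, 1, 2, 3, 4, 5], m=2, step=2)` produces two 2 × 2 images, each normalized independently by the minimum and maximum of its own window.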

The FC layers use a loss function such as that of an SVM or a softmax classifier [4].

The CNN model we propose for the classification task is based on the state-of-the-art architecture AlexNet introduced by [30], and is shown in Fig. 3, while the configuration of the layers is presented in Table 5. Five convolutional layers (Conv), four sub-sampling layers with maxpooling (Maxpool) and three FC layers are applied, where the last one is the output layer. Note that the sizes of the filters in each layer were modified with respect to the original AlexNet proposal, in order to deal with smaller images. Once the signals have been converted into images, the CNN can be trained to classify the level of wear of the bearings.

Table 5: Layers configuration for the proposed architecture, where n indicates the number of neurons and x the number of classes.

Layer   Features
1       Conv(5 × 5 × 96)
2       Maxpool(2 × 2)
3       Conv(3 × 3 × 256)
4       Maxpool(2 × 2)
5       Conv(3 × 3 × 384)
6       Maxpool(2 × 2)
7       Conv(3 × 3 × 384)
8       Conv(3 × 3 × 256)
9       Maxpool(2 × 2)
10      FC(n)
11      FC(n)
12      FC(x)

Fig. 3: Proposed architecture based on AlexNet for the estimation of bearing wear. With respect to the original AlexNet proposal, the size of the kernels in the first and second convolutional layers was changed, a maxpooling sub-sampling layer was added between the third and fourth convolutional layers, the sub-sampling layers were resized, and the number of neurons in the FC layers was changed.

The main modifications to the AlexNet architecture were made in order to deal with smaller images, resulting in a reduction of the kernel size used for each layer. Accordingly, in the first convolutional layer, the kernel size was reduced from 11 × 11 to 5 × 5; the kernel size of the second convolutional layer was reduced from 5 × 5 to 3 × 3; and the maxpooling layers were reduced from 3 × 3 to 2 × 2. Furthermore, a new maxpooling layer was added between the third and fourth convolutional layers. The use of smaller images is convenient since it allows us to obtain more images from available datasets, which is advantageous for the training algorithm. Smaller images also significantly reduce the computational cost of training. In addition, the number of neurons in the first two FC layers was varied, looking for the best configuration for our particular case, using values from 512 to 3584 neurons in the first FC layer, and from 0 to 1024 in the second one. Finally, the last layer was changed only according to the number of labeled classes, seven in our case.

4 Experiments and results

The proposed methodology was implemented and tested with the unlabeled dataset of the University of Cincinnati’s Center for Intelligent Maintenance Systems (IMS) [24]. The results of each of the phases of the proposed methodology are shown in the following.

4.1 Dataset and experimental setup

A vibration signal dataset is freely provided by the IMS [24]. This dataset is one of the most used in the literature, and it was selected because it does not contain labels, allowing us to propose and validate a labeling method according to the level of wear. Moreover, the bearings of this dataset were not physically altered to force failure; instead, the failures appeared naturally due to degradation in normal operation. As depicted in Fig. 4, the experimental setup is composed of an AC motor running at a constant speed of 2,000 RPM, which is connected to the shaft by friction bands. Four Rexnord ZA-2115 double row bearings are mounted on the shaft, along with two high-sensitivity quartz accelerometers PCB 353B33 for each bearing. Moreover, a radial load of 6,000 lbs was applied to the shaft and bearing by means of a spring mechanism. The data was collected with the data acquisition system NI DAQCard 6062E. Failures occurred after the stress test exceeded the lifetime of the bearings. The experimental platform and the location of the sensors are shown in Fig. 4.

Fig. 4: Experimental setup for vibration data acquisition from bearings provided by [24]. It consists of an AC motor rotating at a constant speed of 2,000 RPM. Four bearings are mounted on the shaft, each one with two high-sensitivity accelerometers. Data is collected until failure, for one second, every 10 minutes, for a total of 20,480 points.

The IMS dataset contains three failure tests, where the system is run under regular operating conditions until a failure occurs, produced by the deterioration of a different bearing each time. Each of the tests contains files recording a one-second snapshot of the accelerometers' vibration signal, which is stored at 10-minute intervals. Each file has 20,480 points sampled at 20 kHz, including information from the eight accelerometers with a timestamp. Although this dataset is commonly used in the literature, it does not provide labels according to the wear level, hence it is not suitable for supervised training with CNN. This issue is overcome by the proposed automatic labeling method previously stated.

The CNNs were implemented in Python 3.6 with TensorFlow 1.12, and trained on a computer equipped with a dedicated NVIDIA GeForce RTX 2070 Graphics Processing Unit (GPU), an Intel i7-9750H CPU and 16 GB of RAM.

4.2 Feature extraction

In order to identify the best feature extractor, all of the TSF in Table 4 were applied to the accelerometer data provided by the IMS dataset, and the obtained plots are displayed in Fig. 5. It was observed graphically that some measures may be discarded because they lack a well-defined increasing or decreasing trend, as is the case of Crest factor, Impulse factor and Margin factor. Also, it can be observed that RMS, Kurtosis, Peak to peak and Shape factor present a better defined increasing trend than the others, which helps to identify changes in vibration signals in the bearings, facilitating the classification.

4.3 Shannon’s entropy

As shown in Fig. 6, the Shannon’s entropy is also calculated for each of the TSF, which considerably stresses the effect of each feature and makes it easier to identify the trends. As mentioned before, the characteristics extracted from Crest factor, Impulse factor and Margin factor did not show a significant change, and neither did Peak to peak, which is why they are discarded in this work. Skewness is also discarded because it decreases too slowly. We can observe that the TSF that show a more pronounced growth over time are RMS, Kurtosis and Shape factor, but Kurtosis does not show much variation at the beginning, that is, it is less sensitive to early wear in the bearings when compared to RMS or Shape factor. On the other hand, RMS and Shape factor present slight variations from the beginning, and are more suitable to characterize the level of wear from early stages. Hence, the combination of RMS with Shannon’s entropy shows a suitable behavior to classify the level of wear on the bearings during their whole useful life, since it provides faster growth over time, which is essential to find the deterioration of the bearing, so it is selected for this methodology.

4.4 Feature clustering

The data generated by the Shannon’s entropy and RMS is then grouped to form seven clusters, where each cluster represents the level of wear that the bearing has over time. These clusters are considered as the classes to be identified by the CNN. Fig. 7 presents, with different colors, the classification obtained by the K-means algorithm applied to the Shannon’s entropy with RMS signal, which is used to label the raw signals for the training algorithm. Similar to the classification employed by [5], these classes were named according to the approximate level of wear: 0% − 9%, 10% − 24%, 25% − 39%, 40% − 54%, 55% − 69%, 70% − 84% and 85% − 100%.

This concludes the labeling method, which is of great relevance when dealing with unlabeled datasets, since it provides an automatic way to categorize the data, without the need for expert supervision or for faking the bearings' wear by physically damaging them. This labeling can then be used for training a supervised learning classifier to estimate the level of wear. Images can then be generated from the vibration signals with their corresponding labels, from which classification models can be built by means of CNN, as explained in the following.

4.5 Signal to image transformation

The result of the transformation of the raw signals into images is shown in Fig. 8, where each of the images represents a level of wear. The labeling of these levels was performed in the previous step, where seven different levels were obtained.

The total number of converted images was 251,904 with a size of 64 × 64 pixels, using a step s = 64. The use of a small step allows for an important overlap between sub-samples to form images, resulting in an increase in the number of images obtained, which is important to obtain a better performance of the CNN training. These images are labeled according to the classification obtained in the previous stage, and the number of images per cluster can be observed in Table 6. The number of images for each of the classes according to the level of wear is unbalanced, due to the proportion of data that each of the clusters had; in particular, less information is available for the last wear levels near failure. Training a CNN with unbalanced classes prevents it from learning properly, resulting in erroneous estimates by the classifier. Therefore, it is very important that the classes are balanced, that is, that each of the classes contains the same number of images. In this work, this is accomplished by generating a large number of images, and randomly selecting the same number of images for each class.

4.6 Convolutional Neural Network

In order to obtain a balanced number of images for each class, an equal number of images is selected randomly. This

Fig. 5: Traditional Statistical Features (TSF) for bearing 1. The results obtained with Crest factor, Impulse factor and Margin factor do not present an increasing or decreasing trend, which is undesired, whereas RMS, Kurtosis, Peak to peak and Shape factor present a more defined trend.

Fig. 6: Shannon’s entropy of each of the TSF shown in Table 4 for bearing 1. RMS is selected because it was the measure that grew the fastest over time, while showing variations from early wear stages.
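Once a single entropy curve has been selected, the labeling step groups its values into clusters with K-means. The sketch below shows Lloyd's algorithm on scalar values as a minimal illustration of that step. It is not the authors' implementation: `kmeans_1d` is a hypothetical name, and the evenly spread initial centroids are an assumption chosen to make the example deterministic (library implementations usually use randomized initialization).

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spread evenly over the value range (assumption).
    centroids = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids
```

With k = 7, the returned labels would play the role of the seven wear classes; each raw vibration file would inherit the label of its entropy value.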

way, the loading of each of the classes was done in a balanced way so that the CNN learns all classes equally. Therein lies the importance of having a large number of images, given that less data is available about the last stages of the bearing

Fig. 7: Clusters made to separate the wear level of the bearings, using the K-means algorithm along with the Shannon’s entropy and RMS.

Table 6: Number of images per class.

Level          Number of images
0% − 9%        139,520
10% − 24%      26,368
25% − 39%      29,440
40% − 54%      13,056
55% − 69%      27,648
70% − 84%      9,984
85% − 100%     5,888
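The class imbalance in Table 6 is handled by randomly selecting the same number of images from each class. A minimal sketch of such random undersampling is given below. The paper does not state how many images per class were kept, so using the size of the smallest class as the target is an assumption, and `balance_classes` is a hypothetical helper name.

```python
import random

# Image counts per wear level, transcribed from Table 6.
TABLE_6_COUNTS = {
    "0-9%": 139520,
    "10-24%": 26368,
    "25-39%": 29440,
    "40-54%": 13056,
    "55-69%": 27648,
    "70-84%": 9984,
    "85-100%": 5888,
}

def balance_classes(items_by_class, seed=0):
    """Randomly keep the same number of items in every class.

    The target is the size of the smallest class (assumption).
    """
    rng = random.Random(seed)
    target = min(len(items) for items in items_by_class.values())
    return {
        label: rng.sample(items, target)
        for label, items in items_by_class.items()
    }

# Image indices stand in for the actual images in this example.
indices = {label: range(count) for label, count in TABLE_6_COUNTS.items()}
balanced = balance_classes(indices)
```

Under this assumption each class keeps 5,888 images, for 41,216 images in total out of the 251,904 generated.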

wear before failure. Moreover, 70% of the images were used for training and 30% for testing.

Fig. 8: Conversion of accelerometer signals to images. The images represent the approximate wear of the bearings: (a) 0% − 9%, (b) 10% − 24%, (c) 25% − 39%, (d) 40% − 54%, (e) 55% − 69%, (f) 70% − 84% and (g) 85% − 100% level of wear, respectively.

Furthermore, different numbers of neurons were tested in the FC, convolutional and sub-sampling layers, to find the best configuration for this task. Each one of the CNN combinations was executed ten times, and the maximum, minimum, average and standard deviation of each of the acquired metrics were obtained. The metrics considered in each CNN training are: accuracy, precision, recall, F1 and the Mean Square Error (MSE) [44].

The proposed CNN was run with one and two FC layers, trying different numbers of neurons, looking for the best configuration. Each model is named in the form CNN-i-j, where the values of i and j represent the number of neurons in the first and second FC layers, respectively. For example, CNN-512 means that the first FC layer has 512 neurons and that there is no second FC layer. Similarly, CNN-1024-64 means that the first FC layer has 1024 neurons and the second FC layer has 64 neurons.

The first experiment was done with the proposed CNN with a single FC layer, with the purpose of finding the adequate number of neurons while disregarding the second FC layer. The number of neurons was set to 512, 1024, 1536, 2048, 2560, 3072 and 3584, resulting in seven different models according to the number of neurons used, as shown in Table 8. From there, it can be observed that the model CNN-2560 obtained the best accuracy, precision, recall and F1, of 97.96%, 98%, 97.96% and 97.97% respectively. It also obtained the lowest error of 0.02159, as well as the best minimum and mean values for all the metrics. With the model CNN-3072 the results were lower than those of the model CNN-2560, but the variability between results was lower with CNN-3072. Overall, the proposed CNN with only one FC layer of 2560 neurons (CNN-2560) gives good results when compared with the rest of the configurations, hence it was selected for the next step.

Once the best number of neurons for the first FC layer is obtained, the second FC layer can be added. The next experiment is to find the best number of neurons for the second FC layer, where the best first layer obtained in the previous step is employed, that is, 2560 neurons, and the second FC layer is tested with 64, 128, 256, 512, 768 and 1024 neurons. Thus, six models of the proposed CNN with two FC layers are tested based on the CNN-2560 model, and the results can be seen in Table 9. The maximum value for accuracy, precision, recall and F1 was 99.25%, and the minimum error of 0.00825 was obtained with the model CNN-2560-256. We can also appreciate the effect of adding the second FC layer when compared with the best result using only one FC layer: there is an improvement when the appropriate number of neurons is selected.

Table 7: Layers configuration of the architecture proposed in [12] based on LeNet-5, which is used for comparison, where n indicates the number of neurons and x the number of classes.

Layer   Features
1       Conv(5 × 5 × 32)
2       Maxpool(2 × 2)
3       Conv(3 × 3 × 64)
4       Maxpool(2 × 2)
5       Conv(3 × 3 × 128)
6       Maxpool(2 × 2)
7       Conv(3 × 3 × 256)
8       Maxpool(2 × 2)
9       FC(n)
10      FC(n)
11      FC(x)

In addition to the proposed architecture, other architectures were implemented to compare the results. In [12], an architecture for classifying bearing failures was proposed based on LeNet-5, which is one of the best reported in the literature, hence it is used for comparison with our experiments. The configuration of the layers of this architecture

Table 8: Results of the models of our AlexNet-based proposal with one FC layer. It can be seen that the model CNN-2560
obtained the best results in maximum, minimum and mean for each of the metrics.

No. CNN-512 CNN-1024 CNN-1536 CNN-2048 CNN-2560 CNN-3072 CNN-3584


Accuracy
Max 97.09% 96.12% 97.55% 97.84% 97.96% 97.21% 97.74%
Min 90.42% 91.17% 90.93% 93.98% 95.27% 94.64% 95.12%
Mean 94.59% 94.61% 94.74% 96.03% 96.78% 96.36% 96.19%
Std 2.34 1.4841 2.0931 1.1987 0.8178 0.7689 0.9575
Precision
Max 97.09% 96.17% 97.56% 97.88% 98.00% 97.23% 97.77%
Min 90.77% 91.70% 90.99% 94.15% 95.34% 94.75% 95.26%
Mean 94.78% 94.77% 94.84% 96.08% 96.86% 96.42% 96.28%
Std 2.2386 1.3458 2.0843 1.1704 0.8042 0.7411 0.9141
Recall
Max 97.09% 96.12% 97.55% 97.84% 97.96% 97.21% 97.74%
Min 90.42% 91.17% 90.93% 93.98% 95.27% 94.64% 95.12%
Mean 94.59% 94.61% 94.74% 96.03% 96.78% 96.36% 96.19%
Std 2.34 1.4841 2.0931 1.1987 0.8178 0.7689 0.9575
F1
Max 97.09% 96.11% 97.55% 97.85% 97.97% 97.22% 97.75%
Min 90.47% 91.12% 90.87% 93.97% 95.29% 94.65% 95.16%
Mean 94.60% 94.60% 94.74% 96.03% 96.79% 96.36% 96.19%
Std 2.3364 1.4944 2.1091 1.2064 0.8176 0.7639 0.9579
MSE
Max 0.09993 0.08974 0.09144 0.06233 0.04730 0.05506 0.04948
Min 0.02983 0.03978 0.02522 0.02231 0.02159 0.02862 0.02328
Mean 0.05547 0.05559 0.05384 0.04097 0.03301 0.03730 0.03888
Std 0.0243 0.0151 0.0213 0.0124 0.0079 0.0077 0.0096
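In Table 8 the accuracy and recall rows coincide in every column, which is what a support-weighted average of the per-class recall produces. The sketch below computes accuracy and support-weighted precision, recall and F1 from a confusion matrix. The averaging scheme is an assumption inferred from the table, not something stated in the text, and `weighted_metrics` is a hypothetical name. MSE is omitted because its exact definition over class labels is not given here.

```python
def weighted_metrics(confusion):
    """Accuracy and support-weighted precision, recall and F1.

    confusion[t][p] is the number of samples of true class t predicted as p.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[c][c] for c in range(k)) / total
    precision = recall = f1 = 0.0
    for c in range(k):
        tp = confusion[c][c]
        support = sum(confusion[c])                         # true samples of class c
        predicted = sum(confusion[t][c] for t in range(k))  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total                            # class share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1
```

With this weighting, recall reduces algebraically to the fraction of correctly classified samples, so it always equals accuracy, while precision and F1 can differ slightly, as they do in the table.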

Table 9: Results of the models of our AlexNet-based proposal with two FC layers. It can be seen that the model CNN-2560-
256 obtained the best results in maximum, minimum and mean for the metrics accuracy, precision, recall, F1.

No. CNN-2560 CNN-2560-64 CNN-2560-128 CNN-2560-256 CNN-2560-512 CNN-2560-768 CNN-2560-1024 LeNet5-2560-512
Accuracy
Max 97.96% 98.30% 98.13% 99.25% 97.67% 97.11% 97.60% 97.04%
Min 95.27% 94.18% 93.82% 95.54% 92.58% 93.28% 94.71% 93.50%
Mean 96.78% 96.19% 95.78% 96.84% 95.33% 95.57% 96.29% 95.38%
Std 0.8178 1.1948 1.3116 1.0581 1.5948 1.1762 0.889 1.1045
Precision
Max 98.00% 98.32% 98.16% 99.25% 97.72% 97.20% 97.63% 97.07%
Min 95.34% 94.35% 94.04% 95.56% 92.89% 93.55% 94.79% 93.83%
Mean 96.86% 96.27% 95.89% 96.90% 95.50% 95.73% 96.36% 95.49%
Std 0.8042 1.136 1.2596 1.0352 1.5031 1.119 0.9012 1.0505
Recall
Max 97.96% 98.30% 98.13% 99.25% 97.67% 97.11% 97.60% 97.04%
Min 95.27% 94.18% 93.82% 95.54% 92.58% 93.28% 94.71% 93.50%
Mean 96.78% 96.19% 95.78% 96.84% 95.33% 95.57% 96.29% 95.38%
Std 0.8178 1.1948 1.3116 1.0581 1.5948 1.1762 0.889 1.1045
F1
Max 97.97% 98.31% 98.14% 99.25% 97.67% 97.11% 97.61% 97.05%
Min 95.29% 94.20% 93.83% 95.53% 92.56% 93.26% 94.72% 93.55%
Mean 96.79% 96.20% 95.78% 96.84% 95.33% 95.57% 96.30% 95.39%
Std 0.8176 1.1932 1.316 1.0608 1.6058 1.1809 0.89 1.0945
MSE
Max 0.04730 0.05894 0.06258 0.05336 0.08537 0.07082 0.05700 0.06718
Min 0.02159 0.01771 0.01940 0.00825 0.02474 0.02959 0.02474 0.03032
Mean 0.03301 0.03936 0.04312 0.03374 0.04994 0.04625 0.04046 0.04749
Std 0.0079 0.0124 0.0132 0.0123 0.0187 0.0126 0.0109 0.0115
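For the best model in Table 9, CNN-2560-256, the feature-map sizes implied by the layer list of Table 5 can be traced with a short calculation. The sketch below rests on assumptions that the text does not state: a single-channel 64 × 64 input, convolutions with stride 1 and padding that preserves the spatial size, and non-overlapping 2 × 2 pooling. The function names are hypothetical.

```python
# Layer list transcribed from Table 5: (kind, kernel size, number of filters).
TABLE_5 = [
    ("conv", 5, 96), ("pool", 2, None),
    ("conv", 3, 256), ("pool", 2, None),
    ("conv", 3, 384), ("pool", 2, None),
    ("conv", 3, 384), ("conv", 3, 256), ("pool", 2, None),
]

def trace_shapes(layers, size=64, channels=1):
    """Return the (height, width, channels) shape after the input and each layer."""
    shapes = [(size, size, channels)]
    for kind, kernel, filters in layers:
        if kind == "conv":
            channels = filters   # size-preserving padding, stride 1 (assumption)
        else:
            size //= kernel      # non-overlapping pooling (assumption)
        shapes.append((size, size, channels))
    return shapes

def dense_weights(flat_inputs, widths):
    """Number of weights (biases excluded) in a stack of FC layers."""
    total, fan_in = 0, flat_inputs
    for width in widths:
        total += fan_in * width
        fan_in = width
    return total

shapes = trace_shapes(TABLE_5)
height, width, channels = shapes[-1]
flat = height * width * channels
fc_weights = dense_weights(flat, [2560, 256, 7])
```

Under these assumptions the last pooling layer outputs a 4 × 4 × 256 volume, that is, 4,096 values feeding the first FC layer, and the three FC layers hold most of the trainable weights.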

is shown in Table 7. In order to provide a fair comparison, the procedure in [12] was implemented for the IMS dataset, and different numbers of neurons were also tested in the first two FC layers, in the same fashion as for our proposal. The best results for the experiment with a single FC layer were achieved with 2560 neurons in the first FC layer. For the second experiment the second FC layer was added, and the best results were achieved with 512 neurons in the second FC layer, that is, with the LeNet5-2560-512 model. The obtained results for this model are also included in the last column of Table 9, where we can appreciate the superior performance of our proposal with respect to this LeNet5-based architecture.

On the other hand, to further evaluate the results of this work, the original proposals of the LeNet-5 and AlexNet architectures were also tested and compared in our study, along with other CNN and classical methods, such as SVM and ANN, among others. The comparison results are shown in Table 10. We can appreciate that the original proposal of AlexNet is not well suited for this problem before adaptation, mainly due to the large size of the filters in the convolutional layers. Using traditional methods, such as ANN and SVM, the results were too low to be used as classifiers. In addition, training with SVM is very time consuming, even with the reduced number of features. With fuzzy learning, using the SFAM method, the results reported are very low. One-dimensional convolutional neural networks such as the DCTLN and 1DCNN methods are only able to find the type of failure that occurs in the bearing, but they do not classify the wear that the bearings have over time, and the results are not very promising. We can observe that our proposal provides excellent results, clearly surpassing other methods reported in the literature for the IMS dataset.

Table 10: Comparison of results with other methods.

Rank   Methods                   Accuracy
1      Proposal                  99.25%
2      CEEMD [7]                 98.50%
3      DNN [10]                  98.35%
4      Based on LeNet-5 [12]     97.04%
5      1D CNN [25]               93.90%
6      LeNet-5                   92.04%
7      DCTLN [26]                86.30%
8      SVM                       81.00%
9      SFAM [5]                  65.46%
10     AlexNet                   14.29%
11     ANN                       14.14%

In summary, RMS in conjunction with Shannon’s entropy proved to be an excellent option for feature extraction, because it detects changes in vibration signals over time more quickly. Furthermore, the transformation of vibration signals into images provides a good way to analyze features in two dimensions, and CNNs are excellent for pattern recognition, performing feature extraction automatically and learning features robustly. After extensive experiments, the proposed CNN architecture based on AlexNet with two FC layers obtained the highest results in the present study with the IMS dataset; specifically, CNN-2560-256 was the best model, significantly outperforming other techniques reported in the state-of-the-art literature.

5 Conclusion and future work

In this article, we present a method for estimating the level of bearing wear by vibration analysis using images and CNN classifiers. This is important for industries that operate rotating machinery, since it helps to prevent failures that may result in stopped production, complete system failure, damage to expensive components or even accidents, hence avoiding significant economic losses.

The proposed CNN architecture is based on AlexNet, and was extensively validated with the IMS dataset, obtaining an accuracy of 99.25%, which represents an important improvement with respect to previous results in the literature, including both classic techniques and state-of-the-art CNN-based methods.

This proposal is well suited to unlabeled datasets or new unclassified data. Accordingly, we proposed a technique to automatically label unclassified datasets, without the supervision of an expert and without faking the bearings' wear by physically damaging them. The proposed labeling strategy is accomplished by means of Root Mean Square (RMS) combined with the Shannon’s entropy for feature extraction, and the K-means algorithm for unsupervised classification.

We have found that the use of small images along with an important overlap between them is suitable for this kind of task, given the limited amount of data available, since it yields a large number of images, which is key for good training with balanced classes. Hence, the AlexNet architecture was adapted to deal with small images.

There are a few things that remain to be proven in future work. In particular, we are interested in replicating the results with other datasets that exist in the literature, in order to further validate our proposal and compare it with other works. We would also like to try different architectures, such as VGG, and make modifications to adapt them to the problem under consideration.

Acknowledgements This work was supported by the Mexican National Council of Science and Technology CONACYT, and the FORDECyT project 296737 “Consorcio en Inteligencia Artificial”.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. A. K. Mahamad, S. Saon, and T. Hiyama, “Predicting remaining useful life of rotating machinery based artificial neural network,” Computers and Mathematics with Applications, vol. 60, no. 4, pp. 1078–1087, 2010. [Online]. Available: http://dx.doi.org/10.1016/j.camwa.2010.03.065
2. M. Xia, T. Li, L. Xu, L. Liu, and C. W. De Silva, “Fault Diagnosis for Rotating Machinery Using Multiple Sensors and Convolutional Neural Networks,” IEEE/ASME Transactions on Mechatronics, vol. 23, no. 1, pp. 101–110, 2018.
3. H. Qiu, J. Lee, J. Lin, and G. Yu, “Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics,” Journal of Sound and Vibration, vol. 289, no. 4-5, pp. 1066–1090, 2006.
4. Y. Xie and T. Zhang, “Fault Diagnosis for Rotating Machinery Based on Convolutional Neural Network and Empirical Mode Decomposition,” Shock and Vibration, vol. 2017, p. 12, 2017.
5. J. Ben Ali, B. Chebel-Morello, L. Saidi, S. Malinowski, and F. Fnaiech, “Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network,” Mechanical Systems and Signal Processing, vol. 56, pp. 150–172, 2015. [Online]. Available: http://dx.doi.org/10.1016/j.ymssp.2014.10.014
6. J. Xuan, X. Wang, D. Lu, and L. Wang, “Research on the safety assessment of the brushless DC motor based on the gray model,” Advances in Mechanical Engineering, vol. 9, no. 3, pp. 1–15, 2017.
7. Y. Lu, R. Xie, and S. Y. Liang, “CEEMD-assisted kernel support vector machines for bearing diagnosis,” International Journal of Advanced Manufacturing Technology, pp. 3063–3070, 2020.
8. M. Akuruyejo, S. Kowontan, and J. Ben Ali, “A data-driven approach based health indicator for remaining useful life estimation of bearings,” 2017 18th International Conference on Sciences and Techniques of Automatic Control and Computer Engineering (STA), pp. 284–289, 2018.
9. M. A. Mortada, S. Yacout, and A. Lakis, “Diagnosis of rotor bearings using logical analysis of data,” Journal of Quality in Maintenance Engineering, vol. 17, no. 4, pp. 371–397, 2011.
10. R. Zhang, Z. Peng, L. Wu, B. Yao, and Y. Guan, “Fault diagnosis from raw sensor data using deep neural networks considering temporal coherence,” Sensors (Switzerland), vol. 17, no. 3, pp. 1–17, 2017.
11. A. Soualhi, H. Razik, G. Clerc, and D. D. Doan, “Prognosis of bearing failures using hidden markov models and the adaptive neuro-fuzzy inference system,” IEEE Transactions on Industrial Electronics, vol. 61, no. 6, pp. 2864–2874, 2014.
12. L. Wen, X. Li, L. Gao, and Y. Zhang, “A New Convolutional Neural Network-Based Data-Driven Fault Diagnosis Method,” IEEE Transactions on Industrial Electronics, vol. 65, no. 7, pp. 5990–5998, 2018.
13. M. Hemmer, H. Van Khang, K. Robbersmyr, T. Waag, and T. Meyer, “Fault Classification of Axial and Radial Roller Bearings Using Transfer Learning through a Pretrained Convolutional Neural Network,” Designs, vol. 2, no. 4, p. 56, 2018.
14. S. E. Kramti, J. Ben Ali, L. Saidi, M. Sayadi, and E. Bechhoefer, “Direct Wind Turbine Drivetrain Prognosis Approach Using Elman Neural Network,” 2018 5th International Conference on Control, Decision and Information Technologies, CoDIT 2018, pp. 859–864, 2018.
15. M. Raghu, B. Poole, J. Kleinberg, S. Ganguli, and J. S. Dickstein, “On the expressive power of deep neural networks,” 34th International Conference on Machine Learning, ICML 2017, vol. 6, pp. 4351–4374, 2017.
16. S. Zhang, S. Zhang, B. Wang, and T. G. Habetler, “Deep Learning Algorithms for Bearing Fault Diagnostics-A Review,” MERL - Mitsubishi Electric Research Laboratories, 2019. [Online]. Available: http://www.merl.com
17. A. A. Tabrizi, H. Al-Bugharbee, I. Trendafilova, and L. Garibaldi, “A cointegration-based monitoring method for rolling bearings working in time-varying operational conditions,” Meccanica, vol. 52, no. 4-5, pp. 1201–1217, 2017.
18. S. Li, G. Liu, X. Tang, J. Lu, and J. Hu, “An ensemble deep convolutional neural network model with improved D-S evidence fusion for bearing fault diagnosis,” Sensors (Switzerland), vol. 17, no. 8, p. 19, 2017.
19. Z. Zilong and Q. Wei, “Intelligent fault diagnosis of rolling bearing using one-dimensional multi-scale deep convolutional neural network based health state classification,” ICNSC 2018 - 15th IEEE International Conference on Networking, Sensing and Control, vol. 15, no. March, pp. 1–6, 2018.
20. W. Zhang, F. Zhang, W. Chen, Y. Jiang, and D. Song, “Fault State Recognition of Rolling Bearing Based Fully Convolutional Network,” Computing in Science and Engineering, vol. 21, no. 5, pp. 55–63, 2018.
21. X. Guo, L. Chen, and C. Shen, “Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis,” Measurement: Journal of the International Measurement Confederation, vol. 93, pp. 490–502, 2016. [Online]. Available: http://dx.doi.org/10.1016/j.measurement.2016.07.054
22. G. Xu, M. Liu, Z. Jiang, D. Söffker, and W. Shen, “Bearing fault diagnosis method based on deep convolutional neural network and random forest ensemble learning,” Sensors (Switzerland), vol. 19, no. 5, p. 21, 2019.
23. J. Wang, Z. Mo, H. Zhang, and Q. Miao, “A deep learning method for bearing fault diagnosis based on time-frequency image,” IEEE Access, vol. 7, pp. 42373–42383, 2019.
24. National Aeronautics and Space Administration, “PCoE Datasets,” 2018. [Online]. Available: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/
25. L. Eren, T. Ince, and S. Kiranyaz, “A Generic Intelligent Bearing Fault Diagnosis System Using Compact Adaptive 1D CNN Classifier,” Journal of Signal Processing Systems, vol. 91, no. 2, pp. 179–189, 2019. [Online]. Available: https://link.springer.com/article/10.1007/s11265-018-1378-3
26. L. Guo, Y. Lei, S. Xing, T. Yan, and N. Li, “Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines with Unlabeled Data,” IEEE Transactions on Industrial Electronics, vol. 66, no. 9, pp. 7316–7325, 2019.
27. J. Zhang, Y. Sun, L. Guo, H. Gao, X. Hong, and H. Song, “A new bearing fault diagnosis method based on modified convolutional neural networks,” Chinese Journal of Aeronautics, pp. 1–9, 2019. [Online]. Available: https://doi.org/10.1016/j.cja.2019.07.011
28. J. Pan, Y. Zi, J. Chen, Z. Zhou, and B. Wang, “LiftingNet: A Novel Deep Learning Network with Layerwise Feature Learning from Noisy Mechanical Data for Fault Classification,” IEEE Transactions on Industrial Electronics, vol. 65, no. 6, pp. 4973–4982, 2018.
29. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-Based Learning Applied to Document Recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
30. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Advances in neural information processing systems, vol. 1, pp. 1097–1105, 2012.
16 L.A. Pinedo-Sánchez1 et al.

31. X. D. Ren, H. N. Guo, G. C. He, X. Xu, C. Di, and S. H. Li, “Con-


volutional neural network based on principal component analysis
initialization for image classification,” Proceedings - 2016 IEEE
1st International Conference on Data Science in Cyberspace, DSC
2016, pp. 329–334, 2016.
32. S. Hong, Q. Wu, H. Xie, Y. Chen, and Y. Kou, “A Novel Coupled
Template for Face Recognition Based on a Convolutional Neutral
Network,” Proceedings - 2015 6th International Conference on
Intelligent Systems Design and Engineering Applications, ISDEA
2015, vol. 1, pp. 52–56, 2016.
33. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard,
W. E. Hubbard, and L. D. Jackel, “Handwritten digit recognition
with a back-propagation network,” Advances in neural informa-
tion processing systems, no. 2, pp. 396–404, 1990.
34. B. Khagi, B. Lee, J. Y. Pyun, and G. R. Kwon, “CNN models per-
formance analysis on MRI images of oasis dataset for distinction
between healthy and Alzheimer’s patient,” ICEIC 2019 - Interna-
tional Conference on Electronics, Information, and Communica-
tion, pp. 1–4, 2019.
35. R. Ezhilarasi and P. Varalakshmi, “Tumor detection in the brain
using faster R-CNN,” Proceedings of the International Conference
on I-SMAC (IoT in Social, Mobile, Analytics and Cloud), I-SMAC
2018, pp. 388–392, 2018.
36. P. Wang, L. Li, Y. Jin, and G. Wang, “Detection of unwanted
traffic congestion based on existing surveillance system using in
freeway via a CNN-architecture trafficnet,” Proceedings of the
13th IEEE Conference on Industrial Electronics and Applications,
ICIEA 2018, vol. 13, pp. 1134–1139, 2018.
37. J. Ben Ali, L. Saidi, B. Chebel-Morello, and F. Fnaiech, “A new
enhanced feature extraction strategy for bearing Remaining Use-
ful Life estimation,” STA 2014 - 15th International Conference on
Sciences and Techniques of Automatic Control and Computer En-
gineering, no. i, pp. 365–370, 2014.
38. R. Huang, L. Xi, X. Li, C. Richard Liu, H. Qiu, and J. Lee, “Resid-
ual life predictions for ball bearings based on self-organizing map
and back propagation neural network methods,” Mechanical Sys-
tems and Signal Processing, vol. 21, no. 1, pp. 193–207, 2007.
39. J. N. Kapur and H. Kesavan, K., “Entropy Optimization Principles
and Their Applications,” Water Science and Technology Library,
vol. 9, pp. 3–20, 1992.
40. C. E. Shannon, “A Mathematical Theory of Communication,,”
Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
41. S. P. Lloyd, “Least Squares Quantization in PCM,” IEEE Transac-
tions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
42. V. T. Do and U. P. Chong, “Signal model-based fault detection and
diagnosis for induction motors using features of vibration signal in
two-dimension domain,” Strojniski Vestnik/Journal of Mechanical
Engineering, vol. 57, no. 9, pp. 655–666, 2011.
43. C. Lu, Z. Wang, and B. Zhou, “Intelligent fault diagnosis of
rolling bearing using hierarchical convolutional network based
health state classification,” Advanced Engineering Informatics,
vol. 32, pp. 139–151, 2017. [Online]. Available: http://dx.doi.org/
10.1016/j.aei.2017.02.005
44. S. Alla and S. K. Adari, Beginning Anomaly Detection Using
Python-Based Deep Learning, 1st ed. New Jersey: Apress, 2019.
