A Domain Adaption Resnet Model
Article
A Domain Adaption ResNet Model to Detect Faults in Roller
Bearings Using Vibro-Acoustic Data
Yi Liu 1 , Hang Xiang 2 , Zhansi Jiang 1 and Jiawei Xiang 3, *
Abstract: Intelligent fault diagnosis of roller bearings faces two important problems. One is that
training and test datasets are usually assumed to have the same distribution, an assumption that
often fails in practice; the other is that the installation positions of accelerometer sensors are
limited in industrial environments, and the collected signals are often polluted by background noise.
In recent years, the idea of transfer learning has been introduced to reduce the discrepancy between
training and test datasets and thus address the first issue. In addition, non-contact sensors can
replace contact sensors to address the second. In this paper, a domain adaption residual neural
network (DA-ResNet) model using maximum mean discrepancy (MMD) and a residual connection
is constructed for cross-domain diagnosis of roller bearings based on acoustic and vibration data.
MMD is used to minimize the distribution discrepancy between the source and target domains,
thereby improving the transferability of the learned features. Acoustic and vibration signals from
three directions are simultaneously sampled to provide more complete bearing information. Two
experimental cases are conducted to test the ideas presented. The first is to verify the necessity of
multi-source data, and the second is to demonstrate that transfer operation can improve recognition
accuracy in fault diagnosis.
Keywords: intelligent fault diagnosis; roller bearings; multi-source data; domain adaption; ResNet
support vector machine (SVM) [12], and artificial neural network (ANN) [13,14]. However,
complex measurement environments and operating conditions will lead to the acquisition
of polluted data making the mentioned methods almost ineffective. Deep learning is an
enhanced machine learning method to solve the above problems. The internal correlations
and hidden details of the signal are exposed by deep learning methods with deep and
complex network structures [15,16]. In addition, the complex mapping relationship also
can be characterized. The common methods include the auto-encoder (AE) [17], long
short-term memory (LSTM) [18], generative adversarial networks (GAN) [19,20], and the
convolutional neural network (CNN) [21–23].
Nevertheless, these data-driven fault diagnosis methods are mostly applied to single-sensor
vibration signals [24–26]. Because of the restricted coverage range and installation location,
the accelerometer cannot measure all the information needed to monitor the health status of
the machinery. Acoustic signal analysis is also commonly used to localize faults and diagnose
faulty bearings. Sound is generated by the vibration of an object, so the acoustic signal is
another expression of that vibration. Omoregbee used acoustic signals as the input of an
improved support vector machine (SVM) model for fault identification [27]. In [28], vibration
and acoustic signals were simultaneously sampled to detect the liner scuffing fault in an engine
system. In [29], microphones were used to acquire acoustic signals, which were then fed into
1D CNN-based networks for fault diagnosis. However, the microphone is subject to the Doppler
effect when a railway vehicle moves very fast; thus, the sampled acoustic signal is polluted by
unrelated components. To address this phenomenon, researchers proposed an effective method to
remove the Doppler effect embedded in the acoustic signal [30].
Vibro-acoustic fault diagnosis methods for bearings have been investigated by many
researchers. Ye detected vibro-acoustic characteristics in axial piston pumps under varying
operating conditions [31]. A new damage index was constructed to estimate the nonlinearity of
modulated signals and detect the crack width by processing vibro-acoustic signals; the
vibro-acoustic modulation method outperformed the PZT-enabled active sensing method in
eliminating the saturation phenomenon [32]. In [33], vibro-acoustic signals were used to
complete the information on fault characteristics for fault diagnosis using an improved fusion
algorithm. Subsequently, features were extracted from the vibration and acoustic signals by a
one-dimensional convolutional neural network (1D-CNN), achieving 100% recognition accuracy
under four speeds. Yang et al. combined two improved projection methods for fault diagnosis
using vibro-acoustic signals sampled under various operating conditions [34]. To eliminate the
frequency smearing phenomenon and expose the fault characteristic frequency in the envelope
spectrum, a new transient signal analysis (TSA)-based angular resampling method was proposed
for fault diagnosis under variable speed conditions by sound signal analysis [35].
However, whether the data are sampled from a single sensor or from multiple sensors, the
problem of insufficient labeled data still needs to be solved. Massive data must be labeled
manually, which requires considerable manual effort and relies on expert knowledge.
Furthermore, the fault diagnosis accuracy is directly affected by whether sufficient labeled
data are available. Sometimes, bearing fault types can be reproduced by simulation in a
laboratory, which alleviates the shortage of labeled data. Another important factor affecting
diagnosis accuracy is whether the training and testing data follow the same distribution. The
success of the intelligent fault diagnosis of rotating machinery was demonstrated in [36],
which validates the importance of the probability distribution between training and testing sets.
Recently, a powerful tool named transfer learning has been used to address the distribution
discrepancy problem in intelligent fault diagnosis [37]. The difference between classical
intelligent fault diagnosis and transfer learning is that the latter has two datasets, a source
domain and a target domain, whose distributions generally differ. Reducing this distribution
discrepancy is the purpose of transfer learning, which applies the knowledge of the labeled
data to enhance the predictive
Sensors 2023, 23, 3068 3 of 18
model performance so that the unlabeled data can be identified accurately. Feature-based
methods have been proposed to achieve the goal of distribution discrepancy reduction.
Transferable features can be learned from the cross-domain data by a deep hierarchical model.
The model learns features automatically, which reduces the time cost compared with feature
mapping. In the computer vision and speech recognition areas, feature-based methods are widely
employed and have yielded some achievements [38,39]. The method in [40] provided a new idea in
which the source domain data consist of spectrum data and partially labeled target domain data.
MMD is an index used to check whether two datasets are from the same distribution. In [41],
domain adaption was used to enhance a deep convolutional neural network for fault diagnosis
under different noise levels.
It is worth considering that accelerometer sensors installed on flat positions of the
equipment surface can obtain machine-related information for fault diagnosis. However, contact
sensors are not suitable for irregular positions, where non-contact sensors are a better fit.
In this work, the acoustic signal is sampled by a microphone, which is a non-contact sensor. On
the other hand, MMD is used to minimize the distribution discrepancy between the source and
target domains, which improves the transferability of the bearing-related knowledge and yields
high diagnosis accuracy on the target data. Meanwhile, transfer learning can mitigate the
shortage of bearing datasets. A common problem should also be mentioned: as the network depth
increases, the CNN model becomes gradually more difficult to train, and adding more layers
brings larger training errors. By adding identity mappings to an ordinary CNN, the ResNet model
solves the problem of accuracy decreasing as the network depth increases, which facilitates the
backpropagation of errors and the optimization of model parameters. The novelty and
contributions of the paper can be summarized as follows:
(1) Acoustic signals and vibration signals from three directions are simultaneously sampled
and used as the input of the model to reinforce the diagnosis knowledge of bearings.
(2) MMD is introduced to minimize the distribution difference between source and target
domains, thus improving the transferability of learned features. Combined with the
advantages of the ResNet framework, it can guarantee high recognition accuracy when
transferring from one defect degree to another.
The remainder of this paper is organized as follows. The backgrounds of ResNet and MMD
are given in Section 2. The procedure of the proposed model is given in Section 3.
Section 4 analyzes the necessity of multi-source data and the experimental results of the
transfer task. Section 5 discusses the results, and Section 6 presents the conclusions.
where H is the reproducing kernel Hilbert space (RKHS), and φ(·) represents the nonlinear
mapping from the original feature space to the RKHS. To acquire the maximum distance between
datasets U and V, the low-dimensional datasets are mapped into a high-dimensional space. Based
on the kernel mean embedding of distributions, Gaussian kernels are used to obtain the
reproducing kernel Hilbert space. The specific formula of MMD is as follows:

$$D_H^2(U, V) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi(u_i) - \frac{1}{n_t} \sum_{i=1}^{n_t} \phi(v_i) \right\|_H^2 \qquad (2)$$
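The MMD estimate in Equation (2) can be illustrated with a minimal sketch. It assumes a single Gaussian kernel with a hand-picked bandwidth `sigma` (a hypothetical value, not one reported in this paper) and expands the squared norm with the kernel trick into three averages of kernel values: source with source, target with target, and source with target. The function names and the synthetic two-dimensional samples are illustrative only and are not the authors' implementation.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two sample sets."""

    def mean_kernel(p, q):
        # Average kernel value over all pairs (a in p, b in q).
        total = sum(gaussian_kernel(a, b, sigma) for a in p for b in q)
        return total / (len(p) * len(q))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Synthetic features (assumption): two sets from the same distribution
# and one set whose mean is shifted, standing in for a different domain.
random.seed(0)
source = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(50)]
same_dist = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(50)]
shifted = [[random.gauss(2.0, 1.0), random.gauss(2.0, 1.0)] for _ in range(50)]

print("same distribution:", mmd_squared(source, same_dist))
print("shifted distribution:", mmd_squared(source, shifted))
```

In a domain adaption model, a quantity of this kind would typically be added to the classification loss so that the features of the source and target domains are pulled together during training; the weighting between the two terms is a design choice not covered by this sketch.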
2.2. ResNet
As the network depth increases, accuracy first saturates at a certain value and then degrades
rapidly [42]. This phenomenon was verified by He et al., who validated that the result is not
caused by overfitting. Meanwhile, adding more layers brings larger training errors.
The structure of the ResNet is composed of the input layer, the convolution layer f_conv,
the residual block, the max-pooling layer f_pool, the activation layer f_relu, and the output
layer. To describe the related information of the ResNet, we assume that the sample is
X = [x_1, x_2, x_3, ..., x_N]^T, and the sample's mean and standard deviation are denoted as

$$\mu(X) = \frac{1}{D} \sum_{j=1}^{D} x_j \qquad (3)$$

$$\sigma(X) = \sqrt{\frac{1}{D} \sum_{j=1}^{D} \left( x_j - \mu(X) \right)^2} \qquad (4)$$

A residual block is introduced to solve the problem of the accuracy decrease as the
network depth increases. Furthermore, the residual block can keep the performance of
models as the depth of the model increases. Figure 1 shows the basic structure of a
residual block; many improved ResNet frameworks modify this basic residual block. The
residual block differs from most deep models in that the convolutional layers are connected
by a skip connection, as shown by the curve in Figure 1.
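The skip connection described above can be sketched in a few lines. The example assumes a single-channel 1D signal, odd-length kernels, and a block of the form y = relu(F(x) + x) with F being convolution, ReLU, convolution; batch normalization and multiple channels are left out, and all names are hypothetical rather than taken from the authors' model.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(x, kernel1, kernel2):
    """Compute relu(F(x) + x), where F is conv -> relu -> conv."""
    fx = conv1d_same(relu(conv1d_same(x, kernel1)), kernel2)
    # The skip connection adds the unchanged input to the learned residual.
    return relu([f + s for f, s in zip(fx, x)])


x = [0.5, 1.0, 2.0, 1.0]

# With all-zero weights the residual F(x) vanishes and the block passes
# the (non-negative) input through unchanged: an identity mapping.
print(residual_block(x, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))

# With identity kernels F(x) equals x, so the output is 2x.
print(residual_block(x, [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]))
```

The first call shows why extra residual blocks need not hurt accuracy: if the added layers learn nothing useful, the block can fall back to the identity mapping instead of distorting the features.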
Figure 2. Structure of the proposed method. (a) multi-source data; (b) cross-domain fault diagnosis;
(c) structure of ResNet; (d) structure of Res-block.
4. Experimental Verification
4.1. Datasets Introduction
This experiment was conducted in the Precision Metrology Laboratory at the Mechanical
Engineering Department of Sant Longowal Institute of Engineering and Technology, Longowal,
India. In this case, we try to identify fault types to verify the performance of the DA-ResNet
model using laboratory cylindrical roller bearings with different defect sizes. The test rig is
shown in Figure 3, and the shaft speed is measured by a proximity sensor. The power is provided
by a 346-Watt AC motor and is then transferred to the shaft; a 2 kg disc is mounted in the
middle of the shaft. A lever arrangement is used to load the roller bearing in the vertical
direction. The load cell is installed below the bearing housing to measure the applied load.
The accelerometer is set on the top of the bearing housing to decrease the effect of the
transfer path. At the same time, the microphone is placed as close as possible to the test
bearing. The experiment was conducted under the following conditions: the shaft speed and
vertical load are 2050 rpm and 200 N, respectively, and the signals acquired in this work were
recorded at a sampling rate of 70,000 Hz.
Figure 3. Test rig.
Ball Number Z    Pitch Diameter D    Ball Diameter d    Contact Angle θ    Inner Race Diameter    Outer Race Diameter
13               38.9 mm             7.5 mm             0°                 25 mm                  52 mm
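From the operating conditions and bearing geometry listed above, a few derived quantities can be worked out as a sanity check. The sketch below uses the textbook kinematic formulas for bearing characteristic frequencies, assuming a rotating inner race, a fixed outer race, and no slip; these formulas and the variable names are assumptions added for illustration and are not stated in this paper.

```python
import math

# Values taken from the text and the bearing parameter table.
SHAFT_SPEED_RPM = 2050.0
SAMPLING_RATE_HZ = 70_000.0
NUM_ROLLERS = 13            # Z
PITCH_DIAMETER_MM = 38.9    # D
ROLLER_DIAMETER_MM = 7.5    # d
CONTACT_ANGLE_DEG = 0.0     # theta

shaft_freq_hz = SHAFT_SPEED_RPM / 60.0
samples_per_rev = SAMPLING_RATE_HZ / shaft_freq_hz

# Geometry ratio (d / D) * cos(theta) shared by all three formulas.
ratio = (ROLLER_DIAMETER_MM / PITCH_DIAMETER_MM
         * math.cos(math.radians(CONTACT_ANGLE_DEG)))

# Textbook kinematic formulas (assumption: inner race rotates, no slip).
bpfo_hz = NUM_ROLLERS / 2.0 * shaft_freq_hz * (1.0 - ratio)   # outer race
bpfi_hz = NUM_ROLLERS / 2.0 * shaft_freq_hz * (1.0 + ratio)   # inner race
bsf_hz = (PITCH_DIAMETER_MM / (2.0 * ROLLER_DIAMETER_MM)
          * shaft_freq_hz * (1.0 - ratio ** 2))               # roller spin

print(f"shaft frequency: {shaft_freq_hz:.2f} Hz")
print(f"samples per revolution: {samples_per_rev:.1f}")
print(f"outer race defect frequency: {bpfo_hz:.1f} Hz")
print(f"inner race defect frequency: {bpfi_hz:.1f} Hz")
print(f"roller spin frequency: {bsf_hz:.1f} Hz")
```

At 2050 rpm the shaft turns about 34 times per second, so one revolution spans roughly 2049 samples at 70,000 Hz, and all three characteristic frequencies lie far below the Nyquist frequency of 35,000 Hz.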
Figure 4. The figures of defect elements of bearings under different fault degrees.
The waveforms of the vibration and acoustic signals are shown in Figure 5. The x axis
represents the number of sampled points, and the y axis is the amplitude of the vibration and
acoustic signals. From top to bottom, the subfigures of Figure 5a show the VS1, VS2, and VS3
signals (vertical, horizontal, and axial directions of the tested bearings) and the AS signal.
The faulty vibration signals contain transient impulses; in the inner race and roller cases,
the acoustic signals sometimes match the locations of these impulses. Two ideas are verified
with the above signals. The first is to check that the DA-ResNet is superior to the other
intelligent diagnosis methods. Then, the effectiveness of the vibro-acoustic multi-source
signal is verified by a transfer task.
4.2. Experimental Configuration
Several common models are used for comparison to test the performance of the proposed
model. The multilayer perceptron (MLP), also named ANN, has an input layer and an output
layer, with one or more hidden layers between them. The simplest network has a single hidden
layer, which can learn features from the input data. BiLSTM consists of a forward and a
backward LSTM; the former processes the input data and the latter processes the reversed data,
and the outputs of the two LSTMs are then concatenated. CNN is a supervised learning neural
network with convolutional layers, pooling layers, batch normalization layers, and activation
functions, and is commonly regarded as a feature extractor. CNN has had great success with
high-dimensional data, such as images, video, and light fields. Low-dimensional data include
seismic waves, radar data, biological signals, and so on. In particular, CNN is widely applied
in fault diagnosis for feature extraction.
(1) Baseline: MLP
As a baseline model, the MLP is composed of two dense layers (also called fully connected
layers). The large number of parameters in the MLP results from the full connection between
the input and output of each dense layer, and dropout layers are used to overcome the
overfitting caused by the numerous trainable parameters. Specifically, the structure of the
MLP can be described as: {Input (4096,), dense (32,), dropout (32,), dense (128,),
dropout (128,)}.
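The remark about the parameter count of fully connected layers can be made concrete with a short calculation. The sketch assumes that each tuple in the structure above is the output shape of the layer and that every dense layer has one bias per output unit; any final classification layer is not listed in the structure and is therefore not counted.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a dense layer: weights plus one bias per output."""
    return n_in * n_out + n_out


# Input (4096,) -> dense (32,) -> dense (128,); dropout layers add no parameters.
layer_sizes = [4096, 32, 128]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

for (n_in, n_out), count in zip(zip(layer_sizes, layer_sizes[1:]), counts):
    print(f"dense {n_in} -> {n_out}: {count} parameters")
print(f"total: {total} parameters")
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the 4096 input points to every hidden unit; this is the full-connection cost that the dropout layers are meant to regularize.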
(2) BiLSTM
For the BiLSTM model, a convolutional layer is introduced to reduce the computational
complexity caused by the recurrence mechanism of LSTM; the technique is to embed the raw
signals into a low-dimensional feature vector. The structure of BiLSTM is as follows:
{Input (4096,), convolution (128, 64), BiLSTM (32,), dropout (32,), dense (128,),
dropout (128,)}.
(3) CNN
In the CNN, several convolutional and max-pooling layers are stacked in turn. Specifically,
the details of the CNN model are as follows: {Input (4096,), convolution (1024, 4),
max-pooling (512, 4), convolution (128, 8), max-pooling (64, 8), flatten (512,),
dense (128,)}.
(4) ResNet
For the ResNet, residual blocks are the defining characteristic and provide multiple
receptive fields through the skip connections. Inspired by residual networks in computer
vision, a simple ResNet is designed to diagnose bearing faults. The constructed structure
of the model can be described as follows: {Input (4096,), convolution (1024, 4), residual
block (512, 8), max-pooling (256, 8), residual block (256, 16), max-pooling (128, 16), residual
block (128, 32), max-pooling (64, 32), flatten (2048,), dense (128,)}.
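The layer lists for the CNN and the ResNet can be checked for internal consistency. The sketch below assumes that each tuple is an output shape of the form (sequence length, channels), so the flatten size should equal length times channels of the preceding layer, and the ratio of consecutive lengths gives the downsampling factor of each layer. This reading of the notation is an assumption; the strides and pooling sizes themselves are not stated in the text.

```python
from math import prod

CNN_LAYERS = [
    ("input", (4096,)),
    ("convolution", (1024, 4)),
    ("max-pooling", (512, 4)),
    ("convolution", (128, 8)),
    ("max-pooling", (64, 8)),
    ("flatten", (512,)),
    ("dense", (128,)),
]

RESNET_LAYERS = [
    ("input", (4096,)),
    ("convolution", (1024, 4)),
    ("residual block", (512, 8)),
    ("max-pooling", (256, 8)),
    ("residual block", (256, 16)),
    ("max-pooling", (128, 16)),
    ("residual block", (128, 32)),
    ("max-pooling", (64, 32)),
    ("flatten", (2048,)),
    ("dense", (128,)),
]


def flatten_is_consistent(layers):
    """True if the flatten size equals length x channels of the previous layer."""
    idx = [name for name, _ in layers].index("flatten")
    return prod(layers[idx - 1][1]) == layers[idx][1][0]


def downsampling_factors(layers):
    """Ratio of consecutive sequence lengths for the layers before flatten."""
    idx = [name for name, _ in layers].index("flatten")
    lengths = [shape[0] for _, shape in layers[:idx]]
    return [before // after for before, after in zip(lengths, lengths[1:])]


for label, layers in (("CNN", CNN_LAYERS), ("ResNet", RESNET_LAYERS)):
    print(label, flatten_is_consistent(layers), downsampling_factors(layers))
```

Under this reading both structures are self-consistent: 64 × 8 = 512 for the CNN and 64 × 32 = 2048 for the ResNet, and both reduce the 4096-point input to a length of 64 before flattening.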
In this paper, the specific parameters of the DA-ResNet model are shown in Table 3,
with the improved ResNet shown in Figure 2c. The specific layer parameters of the four
compared models mentioned above are described in the above part of the table. Similarly,
more information on these models is shown in Table 4.
with four experimental tasks. Vibration and acoustic signals are collected under the same
working conditions. The difference is the defect size of the faulty elements of roller bearings,
and the training sample consists of vibration and acoustic signals.
The diagnostic results of the six methods are given in Table 5, and their corresponding
histogram is shown in Figure 6. The F1-scores of MLP, BiLSTM, CNN, ResNet, DA-CNN,
and DA-ResNet are shown in Table 6, and the histogram of the F1-scores is shown in Figure 7.
In Table 5, the capital letters A, B, C, and D denote the fault degrees (also named
datasets under the same defect size). For the detailed sizes, refer to Table 2. For example,
in task A→B, the fault degree A is the source domain and B is the target domain.
Table 5. The accuracy of six methods for four transfer diagnostic tasks.
Model        A→B       A→C       A→D       D→A
MLP          64.00%    41.87%    44.62%    45.12%
BiLSTM       87.83%    59.25%    75.38%    74.62%
CNN          95.10%    73.35%    80.75%    92.50%
ResNet       96.50%    71.30%    91.62%    95.50%
DA-CNN       98.37%    76.75%    97.70%    94.50%
DA-ResNet    99.87%    83.50%    98.12%    95.40%
Figure 6. The accuracy of the six methods on various transfer diagnostic tasks.
Table 6. The F1-score of six methods for four transfer diagnostic tasks.
Model        A→B       A→C       A→D       D→A
MLP          60.25%    37.03%    38.22%    38.45%
BiLSTM       87.88%    54.22%    71.26%    87.49%
CNN          94.50%    73.10%    77.53%    92.34%
ResNet       96.50%    85.50%    87.10%    95.50%
DA-CNN       98.46%    70.07%    97.70%    94.40%
DA-ResNet    99.80%    83.62%    98.10%    95.70%
Figure 7. The F1-scores of the six methods on various transfer diagnostic tasks.
The F1-score is a tool that evaluates the accuracy of predictions and takes into account
whether intelligent diagnostic methods have a preference for diagnostic performance in
different categories. The formulas of the F1-score are given as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (5)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (6)$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (7)$$
in which TP is the number of samples correctly predicted as the positive class (true
positives), FP is the number of samples incorrectly predicted as the positive class (false
positives), and FN is the number of samples incorrectly predicted as the negative class (false
negatives). Precision and Recall denote the precision rate and the recall rate, respectively,
and F1 is the F1-score. The F1-score is introduced to evaluate the diagnostic performance of
MLP, BiLSTM, CNN, ResNet, DA-CNN, and DA-ResNet in the diagnostic tasks.
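Equations (5) to (7) can be applied class by class to a confusion matrix, treating each class in turn as the positive class. The sketch below does this for a made-up four-class matrix (hypothetical counts, not results from this paper) and averages the per-class scores with equal weight; the paper does not state how its F1-scores are averaged, so the macro average here is an assumption.

```python
def per_class_scores(confusion):
    """Return (precision, recall, f1) per class.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true other
        fn = sum(confusion[c]) - tp                       # true c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append((precision, recall, f1))
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    scores = per_class_scores(confusion)
    return sum(f1 for _, _, f1 in scores) / len(scores)


# Hypothetical confusion matrix for four health conditions, 50 samples each.
confusion = [
    [48, 2, 0, 0],
    [5, 40, 5, 0],
    [0, 3, 47, 0],
    [0, 0, 0, 50],
]

for label, (p, r, f1) in enumerate(per_class_scores(confusion)):
    print(f"class {label}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"macro F1: {macro_f1(confusion):.3f}")
```

Reading the scores per class makes a class preference visible: a method can reach high overall accuracy while one class, such as the second class in this made-up matrix, is recognized noticeably worse than the others.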
The key to this case is to use one defect size of the faulty bearing element to diagnose the
other defect sizes. The diagnosis results of MLP, BiLSTM, CNN, ResNet, and DA-CNN are given in
Table 5, where the capital letters A, B, C, and D denote the fault degrees (also named datasets
under the same defect size); the detailed sizes are given in Table 2. For example, in task A→B,
fault degree A provides the training samples and B provides the test samples. The average
accuracies of the compared methods are 48.9%, 74.27%, 85.43%, 88.73%, and 91.83%, respectively.
The proposed method reaches an average of 94.22%, the highest over the four diagnostic tasks.
In the task from A to C in particular, the lowest accuracy is about half of the highest. These
results suggest that the MLP and BiLSTM methods cannot separate the samples with unknown
labels. CNN and ResNet can separate part of them; nevertheless, without solving the domain
adaptation problem, these models do not lead to a good result. The structural characteristic of
ResNet is its residual connections: a residual block is introduced to solve the problem of
accuracy decreasing as the network depth increases, and it keeps the performance of the model
as the depth increases. In contrast, the accuracy of conventional networks saturates at a
certain value and then degrades rapidly as the network depth increases, and adding more layers
brings higher training error. Table 6 gives the F1-scores for each task. The performance of MLP
in the four diagnostic tasks is not good, and the proposed method has the highest F1-score in
three diagnostic tasks. Combined with the accuracy of the DA-ResNet, this verifies that its
diagnostic performance is superior to the other diagnostic methods.
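The averages quoted above follow directly from the per-task accuracies in Table 5. As a small worked check, the sketch below recomputes the mean accuracy of each method and finds the best method for each task; the dictionary layout and variable names are illustrative.

```python
TASKS = ["A->B", "A->C", "A->D", "D->A"]

# Accuracy values (%) copied from Table 5.
ACCURACY = {
    "MLP":       [64.00, 41.87, 44.62, 45.12],
    "BiLSTM":    [87.83, 59.25, 75.38, 74.62],
    "CNN":       [95.10, 73.35, 80.75, 92.50],
    "ResNet":    [96.50, 71.30, 91.62, 95.50],
    "DA-CNN":    [98.37, 76.75, 97.70, 94.50],
    "DA-ResNet": [99.87, 83.50, 98.12, 95.40],
}

# Mean accuracy of each method over the four transfer tasks.
averages = {model: sum(values) / len(values)
            for model, values in ACCURACY.items()}

# Best-performing method for each individual task.
best_per_task = [max(ACCURACY, key=lambda model: ACCURACY[model][i])
                 for i in range(len(TASKS))]

for model, avg in averages.items():
    print(f"{model:10s} average accuracy: {avg:.2f}%")
for task, model in zip(TASKS, best_per_task):
    print(f"best on {task}: {model}")
```

The recomputed means agree with the values in the text. The per-task view adds one nuance: DA-ResNet leads on the three tasks that start from A, while on D→A the plain ResNet is ahead by 0.1 percentage points, so the advantage of the proposed method is clearest in the average.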
To compare the results of the six methods, t-Distributed Stochastic Neighbor Embedding
(t-SNE) [43] is introduced to visualize the learned features. Four colors represent the four
health conditions of the roller bearings. As shown in Figure 8, the four colors are completely
mixed together in the result obtained by MLP. Comparing CNN and ResNet, the blue and red
colors are mixed in the result of the CNN model, which is worse than ResNet's result. The
confusion matrices of the diagnostic results of MLP, BiLSTM, CNN, ResNet, DA-CNN, and
DA-ResNet are shown in Figure 9. It is obvious that the MLP has not solved the domain adaption
problem between the source and target domains: the recognition accuracy is close to 95% only in
the shared part, while the non-shared part from the target domain is mixed with the other
classes, with an average accuracy of 15%. The confusion matrix of BiLSTM is better than that of
MLP; however, the second class is recognized with a low diagnostic accuracy of 69.6%. The
comparison of CNN with DA-CNN shows that the MMD principle can decrease the discrepancy
between the source and target domains, and the same result is obtained from ResNet and
DA-ResNet. In addition, the diagnostic accuracy of DA-CNN is lower than that of DA-ResNet.
This result indicates that the accuracy of the conventional network saturates at a certain
value and then degrades rapidly as the network depth increases, while adding more layers
brings a higher training error.
Figure 8. Feature representation of the methods in task A→B. (a) MLP; (b) BiLSTM; (c) CNN;
(d) ResNet; (e) DA-CNN; (f) DA-ResNet.
Figure 9. Confusion matrix demonstrating classification performance of methods in the task A→B.
(a) MLP; (b) BiLSTM; (c) CNN; (d) ResNet; (e) DA-CNN; (f) DA-ResNet.
Signal    MLP       BiLSTM    CNN       ResNet    DA-CNN    DA-ResNet
VS1       44.62%    75.38%    80.75%    91.62%    97.70%    98.12%
VS2       38.73%    52.83%    78.01%    93.75%    92.16%    97.46%
VS3       44.61%    61.76%    80.12%    90.75%    92.20%    96.79%
AS        41.52%    57.83%    75.30%    79.57%    82.42%    88.35%
Fused     51.65%    78.72%    93.68%    95.81%    98.67%    99.51%
Figure 10. The accuracy of the six methods on various signals.
5.5.Discussion
Discussion
This paper has achieved fault diagnosis under different failure degrees in one machine. In both
the cross-domain fault diagnosis and the transfer diagnosis of multi-source signals, the proposed
model obtains the highest recognition accuracy. However, in some transfer tasks the accuracy
remains comparatively low. For example, in the task from A to C, DA-ResNet reaches only 83.5%.
A possible explanation is that the features extracted for failure degree C are somewhat similar
to those of failure degree A, so that the distribution discrepancy between failure degree C and
failure degree A is not reduced to its minimum value. Therefore, the proposed model should be
modified to balance the recognition accuracy across the four transfer tasks.
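The distribution discrepancy discussed above is what MMD measures. As an illustration only, the sketch below computes a biased estimate of the squared MMD between two feature sets with a Gaussian kernel. The kernel choice, the bandwidth `sigma`, the 16-dimensional synthetic features, and all function and variable names are assumptions made for this example; they are not taken from the DA-ResNet implementation.

```python
import numpy as np


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * sigma**2))


def mmd2(source, target, sigma=4.0):
    """Biased estimate of the squared MMD between two sample sets."""
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


# Synthetic stand-ins for learned features (hypothetical data, not the paper's).
rng = np.random.default_rng(0)
feat_source = rng.normal(0.0, 1.0, size=(200, 16))
feat_target_near = rng.normal(0.1, 1.0, size=(200, 16))  # small domain shift
feat_target_far = rng.normal(1.0, 1.0, size=(200, 16))   # large domain shift

mmd_near = mmd2(feat_source, feat_target_near)
mmd_far = mmd2(feat_source, feat_target_far)
print(f"MMD^2 (small shift): {mmd_near:.4f}")
print(f"MMD^2 (large shift): {mmd_far:.4f}")
```

In a domain adaption model a term of this kind is typically added to the classification loss, so that training reduces the discrepancy between source and target features; a transfer task in which this term stays large would be expected to transfer less well.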
6. Conclusions
This paper addresses the problem that the training and test datasets of bearing fault data follow
different distributions, and proposes an MMD-based fault diagnosis method named DA-ResNet.
Vibration and acoustic data are sampled synchronously and used together as the input of the
proposed model. The multi-source data provide more complete information on the rotating
machinery, and the proposed method improves the generalization ability of the model. In the first
experimental case, MLP, BiLSTM, CNN, ResNet, DA-CNN, and DA-ResNet are compared through
confusion matrices, and DA-ResNet obtains the highest diagnosis accuracy. In the last case, the
necessity of multi-source data is verified by the histogram, which shows that using such data
improves the diagnosis accuracy to a certain extent. Together, these experiments support the
effectiveness and feasibility of the proposed method.
The proposed method achieves cross-domain fault diagnosis using vibration and acoustic data. In
future work, more physical quantities may be considered as inputs of the model, which may yield
higher accuracy. An important issue for further research is therefore to study the relations among
these physical quantities in order to achieve high fault recognition accuracy. The research object
here is the roller bearing, a simple structure from which fault features are comparatively easy to
extract. The next step is thus to verify the performance of the proposed model on complex
machinery parts. If that case succeeds, the proposed model may be introduced into engineering
applications rather than remaining on experimental test rigs.
Sensors 2023, 23, 3068 17 of 18
References
1. Yang, B.; Lei, Y.G.; Jia, F.; Xing, S.B. An intelligent fault diagnosis approach based on transfer learning from laboratory bearings to
locomotive bearings. Mech. Syst. Signal Process. 2019, 122, 692–706. [CrossRef]
2. Liu, X.Y.; Liu, S.L.; Xiang, J.W.; Sun, R.X. A conflict evidence fusion method based on the composite discount factor and the game
theory. Inf. Fusion 2023, 94, 1–16. [CrossRef]
3. Abbasi, A.R.; Mahmoudi, M.R. Application of statistical control charts to discriminate transformer winding defects. Electr. Pow.
Syst. Res. 2021, 191, 106890. [CrossRef]
4. Abbasi, A.R.; Mahmoudi, M.R.; Arefi, M.M. Transformer winding faults detection based on time series analysis. IEEE Trans.
Instrum. Meas. 2021, 70, 3516210. [CrossRef]
5. Mahmoudi, M.R.; Nematollahi, A.R.; Soltani, A.R. On the detection and estimation of the simple harmonizable processes. Iran. J.
Sci. Technol. IJST Trans. A Sci. 2015, 39, 239–242.
6. Gao, Y.; Liu, X.Y.; Xiang, J.W. FEM simulation-based generative adversarial networks to detect bearing faults. IEEE Trans. Ind. Inf.
2020, 16, 4961–4971. [CrossRef]
7. Gao, Y.; Liu, X.Y.; Huang, H.Z.; Xiang, J.W. A hybrid of FEM simulations and generative adversarial networks to classify faults in
rotor-bearing systems. ISA Trans. 2021, 108, 256–266. [CrossRef] [PubMed]
8. He, Z.J.; Tu, X.T.; Bao, W.J.; Hu, Y.; Li, F.C. Second-order transient-extracting transform with application to time-frequency
filtering. IEEE Trans. Instrum. Meas. 2020, 70, 5428–5436. [CrossRef]
9. Yang, X.K.; Wei, D.D.; Zuo, M.J.; Tian, Z.G. Analysis of vibration signals and detection for multiple tooth cracks in spur gearboxes.
Mech. Syst. Signal Process. 2022, 185, 109780. [CrossRef]
10. Abbasi, A.R.; Mahmoudi, M.R.; Avazzadeh, Z. Diagnosis and clustering of power transformer winding fault types by cross-
correlation and clustering analysis of FRA results. IET Gener. Transm. Dis. 2018, 12, 4301–4309. [CrossRef]
11. Liu, X.Y.; Huang, H.Z.; Xiang, J.W. A personalized diagnosis method to detect faults in a bearing based on acceleration sensors
and an FEM simulation driving support vector machine. Sensors 2022, 20, 420. [CrossRef]
12. Mohammadi, M.; Mosleh, A.; Vale, C.; Ribeiro, D.; Montenegro, P.; Meixedo, A. An unsupervised learning approach for wayside
train wheel flat detection. Sensors 2023, 23, 1910. [CrossRef] [PubMed]
13. Jonathan, S.; Abbas, R.; Christoph, G. Predictive modeling of concentration-dependent viscosity behavior of monoclonal antibody
solutions using artificial neural networks. mABs-Austin. 2023, 15, 2169440.
14. Gao, Y.; Liu, X.Y.; Xiang, J.W. Fault detection in gears using fault samples enlarged by a combination of numerical simulation and
a generative adversarial network. IEEE-ASME Trans. Mech. 2021, 27, 3798–3805. [CrossRef]
15. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
16. He, M.; He, D. Deep learning based approach for bearing fault diagnosis. IEEE Trans. Ind. Appl. 2017, 3, 3057–3065. [CrossRef]
17. Zhang, S.; Chen, J.Y.; Chen, J.Y.; Chen, X.F.; Huang, H.J. Data imputation in IoT using spatio-temporal variational auto-encoder.
Neurocomputing 2023, 529, 23–32. [CrossRef]
18. Gao, D.W.; Zhu, Y.S.; Ren, Z.J.; Yan, K.; Kang, W. A novel weak fault diagnosis method for rolling bearings based on LSTM
considering quasi-periodicity. Knowl.-Based Syst. 2021, 14, 107413. [CrossRef]
19. Xiang, J.W. Numerical model driving personalized diagnosis principle for fault detection in mechanical transmission systems.
J. Mech. Eng. 2021, 57, 116–128.
20. Xiang, J.W. Numerical simulation driving generative adversarial networks in association with the artificial intelligence diagnostic
principle to detect mechanical faults. Sci. Sin. Tech. 2021, 51, 341–355. [CrossRef]
21. Liu, X.Y.; Huang, H.Z.; Xiang, J.W. A personalized diagnosis method to detect faults in gears using numerical simulation and
extreme learning machine. Knowl.-Based Syst. 2020, 195, 105653. [CrossRef]
22. Sun, J.H.; Li, C.; Xiao, Z.W.; Xie, Y.X. Automatic multi-fault recognition in TFDs based on convolutional neural network.
Neurocomputing 2017, 222, 127–136. [CrossRef]
23. Kumar, A.; Zhou, Y.Q.; Gandhi, C.P.; Kumar, R.; Xiang, J.W. Bearing defect assessment using wavelet transform based deep
convolutional neural network. Alex. Eng. J. 2020, 59, 999–1012. [CrossRef]
24. Wang, F.A.; Jiang, H.K.; Shao, H.D. An adaptive deep convolutional neural network for rolling bearing fault diagnosis. Meas. Sci.
Technol. 2017, 28, 095005.
25. Chang, X.; Tang, B.P.; Tan, Q.; Deng, L.; Zhang, F.H. One-dimensional fully decoupled networks for fault diagnosis of planetary
gearboxes. Mech. Syst. Signal Process. 2020, 141, 106482. [CrossRef]
26. Li, X.; Yang, Y.; Shao, H.D.; Zhong, X.; Cheng, J.; Cheng, J.S. Symplectic weighted sparse support matrix machine for gear fault
diagnosis. Measurement 2021, 168, 108392. [CrossRef]
27. Wang, S.H.; Xiang, J.W.; Tang, H.S.; Liu, X.Y.; Zhong, Y.T. Minimum entropy deconvolution based on simulation-determined
band pass filter to detect faults in axial piston pump bearings. ISA Trans. 2019, 88, 186–198. [CrossRef]
28. Omoregbee, H.O.; Heyns, P.S. Fault classification of low-speed bearings based on support vector machine for regression and
genetic algorithms using acoustic emission. J. Vib. Eng. Technol. 2019, 7, 455–464. [CrossRef]
29. Ramteke, S.M.; Chelladurai, H.; Amarnath, M. Diagnosis of liner scuffing fault of a diesel engine via vibration and acoustic
emission analysis. J. Vib. Eng. Technol. 2020, 8, 815–833. [CrossRef]
30. Wang, X.; Mao, D.X.; Li, X.D. Bearing fault diagnosis based on vibro-acoustic data fusion and 1D-CNN network. Measurement
2021, 173, 108518. [CrossRef]
31. Ginevra, P.; Stefano, G.; Nicola, L.; Tea, C.; Roberto, M.; Roberto, S. Transcranial Doppler detects micro emboli in patients with
asymptomatic carotid stenoses undergoing endarterectomy. J. Vasc. Surg. 2023, 77, 811–817.
32. Ye, S.G.; Zhang, J.H.; Xu, B.; Song, W.; Zhu, S.Q. Experimental studies of the vibro-acoustic characteristics of an axial piston pump
under run-up and steady-state operating conditions. Measurement 2019, 133, 522–531. [CrossRef]
33. Li, N.; Wang, F.R.; Song, G.B. New entropy-based vibro-acoustic modulation method for metal fatigue crack detection: An
exploratory study. Measurement 2020, 150, 107075. [CrossRef]
34. Zhang, T.; Xu, F.Y.; Jia, M.P. A centrifugal fan blade damage identification method based on the multi-level fusion of vibro-acoustic
signals and CNN. Measurement 2022, 199, 111475. [CrossRef]
35. Ying, D.; Li, Y.; Yuan, R.; Yang, K.; Zhong, H.Y. A novel vibro-acoustic fault diagnosis method of rolling bearings via entropy-
weighted nuisance attribute projection and orthogonal locality preserving projections under various operating conditions. Appl.
Acoust. 2022, 196, 108889. [CrossRef]
36. Lu, S.L.; Wang, X.X.; He, Q.B.; Liu, F.; Liu, Y.B. Fault diagnosis of motor bearing with speed fluctuation via angular resampling of
transient sound signals. J. Sound Vib. 2016, 385, 16–32. [CrossRef]
37. Lin, K.S.; Zhao, Y.C.; Wang, L.; Shi, W.J.; Cui, F.F.; Zhou, T. MSWNet: A visual deep machine method adopting transfer learning
based upon ResNet 50 for municipal solid waste sorting. Front. Env. Sci. Eng. 2023, 17, 77. [CrossRef] [PubMed]
38. Patel, V.M.; Gopalan, R.; Li, R.; Chellappa, R. Visual domain adaptation. IEEE Signal Process. Mag. 2015, 32, 53–69. [CrossRef]
39. Sun, S.; Zhang, B.; Xie, L.; Zhang, Y. An unsupervised deep domain adaptation approach for robust speech recognition.
Neurocomputing 2017, 257, 79–87. [CrossRef]
40. Wang, R.; Huang, W.G.; Wang, J.; Shen, C.Q.; Zhu, Z.K. Multisource domain feature adaptation network for bearing fault
diagnosis under time-varying working conditions. IEEE Trans. Instrum. Meas. 2022, 71, 3511010. [CrossRef]
41. Wen, L.; Gao, L.; Li, X. A new deep transfer learning based on sparse auto-encoder for fault diagnosis. IEEE Trans. Syst. Man
Cybern. Syst. 2019, 49, 136–144. [CrossRef]
42. He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
43. van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.