Monitoring and Identifying Wind Turbine Generator Bearing Faults Using Deep Belief Network and EWMA Control Charts
Wind turbines are widely installed as a new source of cleaner energy production. Dynamic and random stress imposed on the generator bearing of a wind turbine may lead to overheating and failure. In this paper, a data-driven approach for condition monitoring of generator bearings using temporal temperature data is presented. Four algorithms, the support vector regression machine, neural network, extreme learning machine, and the deep belief network, are applied to model the bearing behavior. Comparative analysis of the models has demonstrated that the deep belief network is the most accurate. It has been observed that the bearing failure is preceded by a change in the prediction error of bearing temperature. An exponentially-weighted moving average (EWMA) control chart is deployed to trend the error. Then a binary vector containing the abnormal errors and the normal residuals is generated for classifying failures. LS-SVM based classification models are developed to classify the faulty bearings and the normal ones. The proposed approach has been validated with the data collected from 11 wind turbines.

Keywords: bearing failure, condition monitoring, deep belief network, EWMA control chart, SCADA data analysis

Edited by: Xun Shen, Tokyo University of Agriculture and Technology, Japan
Reviewed by: Zhiyu Sun, The University of Iowa, United States; Heming Huang, Wuhan University, China
*Correspondence: Jiahao Deng, [email protected]
error (APE), the mean absolute percentage error (MAPE) and the root mean square error (RMSE). The analysis of industrial

$$E(v, h) = -\sum_{i=1}^{n_v} a_i v_i - \sum_{j=1}^{n_h} b_j h_j - \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} h_j w_{j,i} v_i, \tag{1}$$
$$P(v, h) = \frac{e^{-E(v, h)}}{\sum_{v} \sum_{h} e^{-E(v, h)}}, \tag{2}$$

$$P(v_i = 1 \mid h) = \mathrm{sig}\left(a_i + \sum_{j=1}^{n_h} w_{j,i} h_j\right), \tag{3}$$

$$P(h_j = 1 \mid v) = \mathrm{sig}\left(b_j + \sum_{i=1}^{n_v} w_{j,i} v_i\right), \tag{4}$$

where: $n_v$ is the number of neurons in the visible layer; $n_h$ is the number of Boolean neurons within the hidden layer; $v_i$ and $h_j$ are the states of the ith visible neuron and the jth hidden neuron; $w_{j,i}$ is the weight between the visible layer and the hidden layer; $a_i$ and $b_j$ are the biases of the two layers; and sig() denotes the logistic sigmoid function. Hence, the weight matrix and the layer biases are obtained in the layer-wise unsupervised pre-training described in Section 2.2.

2.2 Layer-wise Pre-training
A deep belief network (DBN) includes multiple layers of restricted Boltzmann machines (RBMs) (Ouyang et al., 2019). Figure 2 shows the architecture of the proposed DBN. The first RBM of the DBN model, consisting of a visible and a hidden layer (hidden layer 1), is pre-trained as an independent RBM. Then, the weight matrix of the first RBM is computed. The output of the first RBM becomes the input to the second RBM, which includes two layers. The first layer (hidden layer 1) is treated as the visible layer of the second RBM while the second layer (hidden layer 2) is treated as the hidden layer. The weight matrix of the second RBM is computed. Hence, the weight matrices between the remaining hidden layers are obtained iteratively.

FIGURE 2 | Architecture of the deep belief network.

Training each restricted Boltzmann machine (RBM) is accomplished with a stochastic gradient descent method (Hinton et al., 2006). Based on the joint distribution function between the visible and hidden layers in Eq. 2, the objective function of the stochastic gradient descent method is expressed in Eq. 5 (Wang et al., 2016c).

where: a is the bias vector of the visible layer; b is the bias vector of the hidden layer; and w is the weight matrix between the two layers. The parameters of the objective function (a, b, w) are updated based on the gradients of the function expressed in Eqs 6–8. The updating rules are formulated in Eqs 9–11 (Hinton et al., 2006).

$$\frac{\partial \log P(v, h)}{\partial w_{j,i}} = \langle v_i h_j \rangle_{P(h|v)} - \langle v_i h_j \rangle_{recon}, \tag{6}$$

$$\frac{\partial \log P(v, h)}{\partial a_i} = \langle v_i \rangle_{P(h|v)} - \langle v_i \rangle_{recon}, \tag{7}$$

$$\frac{\partial \log P(v, h)}{\partial b_j} = \langle h_j \rangle_{P(h|v)} - \langle h_j \rangle_{recon}, \tag{8}$$

$$w_{i+1} = w_i + \eta \left( \langle v_i h_j \rangle_{P(h|v)} - \langle v_i h_j \rangle_{recon} \right), \tag{9}$$

$$b_{i+1} = b_i + \eta \left( \langle h_j \rangle_{P(h|v)} - \langle h_j \rangle_{recon} \right), \tag{10}$$

$$a_{i+1} = a_i + \eta \left( \langle v_i \rangle_{P(h|v)} - \langle v_i \rangle_{recon} \right), \tag{11}$$

where: η is the learning rate; the subscripts i and i+1 on w, b, and a index successive updates; $\langle \cdot \rangle_{P(h|v)}$ is the expectation of the conditional distribution with respect to the original input data; and $\langle \cdot \rangle_{recon}$ is the i-step reconstructed distribution obtained by the alternating Gibbs sampling scheme. The expectation of the reconstructed distribution is computed following the rules of contrastive divergence (Hinton, 2002).

2.3 Data-Driven Algorithms
Performance of the deep belief network (DBN) is compared with three algorithms: the support vector regression machine (SVR), the neural network (NN), and the extreme learning machine (ELM).
The support vector regression machine (SVR) considered in this study uses a Gaussian kernel function (Drucker et al., 1997). The values of the model parameters (c and γ) are selected based on 10-fold cross-validation. The neural network (NN) contains two hidden layers. The sigmoid activation function is selected because it performed satisfactorily in tests on a small portion of the training data. The extreme learning machine (ELM) algorithm (Liang et al., 2006) is utilized to model the normal bearing temperature. As a single-hidden-layer feed-forward network, the ELM learning model is expressed in Eqs 12, 13 (Liang et al., 2006).

$$f_L(x_j) = o_j, \quad \forall j, \tag{12}$$

$$\sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = t_j, \quad j = 1, 2, \ldots, N, \tag{13}$$

where: $x_j$ represents the input parameters; $o_j$ represents the predicted output values; $f_L()$ is the non-linear function representing the ELM algorithm; $a_i$ is the weight vector connecting the ith hidden node and the input nodes; $b_i$ is the threshold of the ith hidden node; $G(a_i, b_i, x_j)$ is the output of the ith hidden node for input $x_j$; $\beta_i$ is the weight vector connecting the ith hidden node and the output nodes; and $t_j$ is the actual output value.
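The ELM model of Eqs 12, 13 can be illustrated with a minimal sketch. It assumes a sigmoid hidden-node function G, randomly drawn and untrained input weights, and output weights solved by least squares through the Moore-Penrose pseudoinverse, which is the usual batch ELM recipe. The function names, the hidden-layer size, and the toy sine data are hypothetical and are not taken from the study.

```python
import numpy as np

def _hidden(X, a, b):
    # Sigmoid output of every hidden node, i.e. G(a_i, b_i, x_j) in Eq. 13
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))

def elm_fit(X, t, n_hidden=20, seed=0):
    """Fit a single-hidden-layer ELM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(X.shape[1], n_hidden))  # input weights a_i, random and fixed
    b = rng.normal(size=n_hidden)                # hidden thresholds b_i, random and fixed
    H = _hidden(X, a, b)                         # N x L hidden-layer output matrix
    beta = np.linalg.pinv(H) @ t                 # least-squares solution of H beta = t
    return a, b, beta

def elm_predict(X, a, b, beta):
    # Predicted outputs o_j = f_L(x_j) from Eq. 12
    return _hidden(X, a, b) @ beta

# Toy regression problem (hypothetical data, not SCADA measurements)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
t = np.sin(3.0 * X).ravel()
a, b, beta = elm_fit(X, t)
rmse = float(np.sqrt(np.mean((elm_predict(X, a, b, beta) - t) ** 2)))
```

Only the output weights are fitted here, so training reduces to one linear solve. That is the source of the speed advantage usually cited for ELM.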
$$UCL(t) = \mu_{APE} + L \sigma_{APE} \sqrt{\frac{\lambda \left[ 1 - (1 - \lambda)^{2t} \right]}{(2 - \lambda) N}}, \tag{18}$$

$$LCL(t) = \mu_{APE} - L \sigma_{APE} \sqrt{\frac{\lambda \left[ 1 - (1 - \lambda)^{2t} \right]}{(2 - \lambda) N}}, \tag{19}$$

where: $\mu_{APE}$ is the mean of the absolute percentage error (APE); $\sigma_{APE}$ is the standard deviation of the APE; and N denotes the number of samples. According to Horng Shiau and Ya-Chen (2005), the value of the parameter L is commonly set to 3 and λ is usually set to 0.2.
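The control limits in Eqs 18, 19 can be sketched as follows. The equations for the EWMA statistic itself are not shown in this section, so the sketch assumes the standard recursion z_t = λ x_t + (1 − λ) z_{t−1}, started at the in-control mean. The function name `ewma_chart` and the toy APE series are hypothetical.

```python
import numpy as np

def ewma_chart(ape, mu, sigma, lam=0.2, L=3.0, n=1):
    """Return the EWMA statistic, UCL(t), LCL(t) and out-of-control flags."""
    ape = np.asarray(ape, dtype=float)
    z = np.empty_like(ape)
    prev = mu                                   # assumed start value: in-control mean
    for k, x in enumerate(ape):
        prev = lam * x + (1.0 - lam) * prev     # assumed standard EWMA recursion
        z[k] = prev
    t = np.arange(1, len(ape) + 1)
    # Time-varying half width from Eqs 18, 19
    half = L * sigma * np.sqrt(lam * (1.0 - (1.0 - lam) ** (2 * t)) / ((2.0 - lam) * n))
    ucl, lcl = mu + half, mu - half
    flags = ((z > ucl) | (z < lcl)).astype(int)
    return z, ucl, lcl, flags

# Toy series: 20 in-control points followed by a sustained upward shift
mu, sigma = 2.0, 1.0
ape = np.concatenate([np.full(20, mu), np.full(20, mu + 5.0 * sigma)])
z, ucl, lcl, flags = ewma_chart(ape, mu, sigma)
```

With λ = 0.2 and L = 3 the limits widen over the first few samples and then settle near μ ± Lσ√(λ/(2 − λ)), so a sustained shift in the APE is flagged within a few observations.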
TABLE 1 | Generator bearing temperature ranges and bearing failures of the 11 turbines.

Turbine id.   Bearing temperature       Bearing failure   Failure times
              Min (°C)     Max (°C)
A             14           68           No
B             11           87           Yes               8
C             0            63           No
D             3            68           No
E             13           71           No
F             13           69           No
G             0            73           No
H             6            90           Yes               2
I             9            86           Yes               5
J             14           71           No
K             7            85           Yes               7

TABLE 2 | Parameters selected by the three data-mining algorithms.

Parameter                                  BTA    WGS    RFA
Generator phase-1 winding temperature      100    10     0.1
Generator phase-2 winding temperature      98     10     0.1
Generator air temperature                  97     10     0.09
Generator rear temperature                 96     9      0.09
Generator phase-3 winding temperature      96     9      0.1
Water cooler temperature                   91     8      0.11
Phase compensation panel temperature       78     7      0.07
Nacelle temperature                        78     6      0.05
LS-SVM is particularly well suited to small-sample learning problems.
The ELM is a feedforward neural network which contains the input layer, the output layer and one single hidden layer. Compared with other computationally expensive and time-consuming neural networks, the ELM adopts the Moore-Penrose pseudoinverse to determine the weights and biases between the hidden layer and output layer (Li et al., 2021b). This method enables the ELM to learn faster and attain higher generalization capability compared with other neural networks.
The KELM applies the kernel method to the vanilla ELM. It solves the problem of random initialization of the ELM and has high classification accuracy (Pandey et al., 2018; Ouyang 2021), good generalization ability and a high degree of robustness. The Gaussian kernel function is the most frequently used kernel function and thus is selected in this study.

3 COMPUTATIONAL ANALYSIS

The data used in this research have been collected from SCADA systems of a large wind farm. Data at 10-min resolution from 11 wind turbines are used to investigate failure of a generator bearing. Two bearing failure instances have been reported during the period covered by the dataset.

3.1 Dataset Description and Preprocessing
The ranges of the generator bearing temperature of the 11 turbines are provided in Table 1. The bearing failure incidents are also included in Table 1. Based on the maintenance records, Turbines B, H, I, and K have been affected by bearing failures and are not considered for modeling normal bearing behavior discussed in Section 3.2. Rather, they are selected to test abnormal behavior of the bearing temperature.

3.2 Parameter Selection
To capture the normal behavior of a generator bearing, 33 parameters relevant to the bearing temperature have been initially considered. Using domain expertise, the number of parameters of interest was reduced to 12. Next, three algorithms (i.e., the wrapper with genetic search (WGS) (Kohavi and John, 1997), the boosting-tree algorithm (BTA) (Sbihi, 2007), and the relief algorithm (RA) (Liu et al., 2018)) were applied to select the most relevant parameters for predicting the generator bearing temperature. The wrapper approach uses supervised learning to perform 10-fold cross validation in selecting relevant parameters. The boosting-tree algorithm evaluates the importance of parameters by constructing a sequence of decision trees and computing the prediction residuals. The relief algorithm selects the parameter set by detecting conditional dependence between the parameters. The eight most important parameters selected by the three data-mining algorithms are listed in Table 2.

3.3 Modeling Bearing Behavior
Data from three wind turbines (i.e., Turbine C, Turbine D, Turbine E) have been merged to train the neural network, the support vector regression machine, the extreme-learning machine presented in Section 2.3, and the proposed deep belief network (DBN). Data collected from Turbines A, B, F and G are used as the validation dataset to validate prediction performance of the proposed DBN algorithm. Data from Turbines G, J, I and K are used as the testing dataset. To design the DBN, the number of hidden neurons in each layer is set at 10% of the training data (Mitchell, 1999). The data from the remaining 2 healthy turbines (i.e., Turbine 9 and 11) are designated as test datasets to evaluate performance of the four algorithms.
Table 3 presents prediction results produced by the four algorithms for the test and validation datasets. The mean absolute percentage error (MAPE) and the root mean square error (RMSE) produced by the DBN algorithm are the smallest, which confirms the accuracy of the DBN model. This superior performance may be attributed to the layer-wise pre-training.
Figure 4 illustrates the prediction error from testing and validation produced by the deep belief network (DBN). The APEs of healthy wind turbines and turbines with bearing failures demonstrate different behaviors. Hence, the emerging bearing failure is indicated by the APE of the DBN model.

FIGURE 4 | The absolute percentage error produced by the deep belief network.

3.4 Condition Monitoring
In this section, behavior of the prediction error associated with the bearing failure is discussed. The APE was monitored for 1 week prior to the bearing failure. The upper confidence limit (UCL) and the lower confidence limit (LCL) of the exponentially-weighted moving average (EWMA) control chart are computed from Eqs 18, 19 of Section 2.4. The monitored examples of healthy turbines and the turbines with emerging bearing failures are illustrated in Figures 5, 6.
Figure 5 illustrates the EWMA charts of healthy turbines (Turbines G and J) while Figure 6 shows the wind turbines (Turbines I and K) with problematic generator bearings of the same wind farm. In Figure 5, all statistics fall within the control limits, which indicates normal bearing behavior. Meanwhile, outliers in Figure 6 begin to emerge 1 week prior to the bearing failure and an early alarm is issued. According to the results presented in Figure 6, bearing failures are visible several days ahead of the occurrence. The proposed approach provides sufficient time to react and thus minimize power loss and downtime.

FIGURE 6 | The EWMA control charts of two turbines with bearing failures.

The outcomes of the EWMAs are transformed into real-time binary vectors and then the bearing failure classification models are developed to classify the actual failures. However, in the temporal domain, the optimal size of the EWMA vectors is uncertain. Hence, this research performed several experiments with different sizes of the EWMA vectors (i.e., K = 10, 20, 30, 40). All algorithms introduced in Section 2.7 are tested and the computational results are illustrated in Figure 7. The AUC is selected as the performance measure. All algorithms reached their peak classification performance when K = 20, and thus it is selected as the optimal setting for the dimension of the input EWMA vector in our study.

FIGURE 7 | The AUCs of all classification algorithms under different dimensions of EWMA vectors.
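The transformation of EWMA outcomes into K-dimensional binary vectors can be sketched as below. The text does not spell out how the vectors are assembled, so this sketch makes two assumptions: a point is coded 1 when the EWMA statistic falls outside the control limits and 0 otherwise, and classifier inputs are overlapping sliding windows of length K. The function names are hypothetical.

```python
import numpy as np

def ewma_to_binary(z, ucl, lcl):
    # 1 = EWMA statistic outside the control limits, 0 = inside (assumed coding)
    z = np.asarray(z, dtype=float)
    return ((z > ucl) | (z < lcl)).astype(int)

def binary_windows(flags, K=20):
    """Stack overlapping length-K windows of the binary sequence (assumed scheme)."""
    flags = np.asarray(flags, dtype=int)
    n = len(flags) - K + 1
    if n <= 0:
        return np.empty((0, K), dtype=int)      # sequence shorter than one window
    return np.stack([flags[i:i + K] for i in range(n)])

# Toy sequence: 25 in-control points followed by 5 out-of-control points
flags = np.array([0] * 25 + [1] * 5)
windows = binary_windows(flags, K=20)           # one row per candidate classifier input
```

Each row of `windows` would then be passed to a classifier such as the LS-SVM. Trying K = 10, 20, 30, 40 amounts to re-running this step with a different window length.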
As illustrated in Figure 8, the ROC curves for the four state-of-the-art algorithms are obtained with respect to the testing dataset. Among them, the LS-SVM achieves the highest area under the ROC curve (AUC), 0.88, which demonstrates its superior performance in classifying bearing failures from the binary vector mixed with normal and abnormal prediction residuals. The other performance metrics, including accuracy, sensitivity and specificity along with the 95% confidence intervals, are provided in Table 4. The LS-SVM still performs best among all algorithms tested according to all evaluation metrics. Hence, using the vectors generated from the DBN and EWMA control charts, the LS-SVM is capable of classifying the majority of the bearing failures in the temporal domain.

4 DISCUSSION

The condition-monitoring framework proposed in this study has provided promising results using field SCADA data. Overall, the advantages of the proposed framework can be summarized in the following three points. First, it uses the deep belief network as the backbone regressor, which has shown superior power in extracting temporal abnormal features from the dataset. Second, the framework is designed to be implemented on SCADA data, and SCADA is the standard data collection system for almost all wind farms across the globe. Hence, it can be widely implemented in practice. Third, the classification part can save a lot of labor and time. Conventional control chart-based identification of mechanical failures requires humans to detect the statistical outliers. Instead, in this research, the machine-learning classifiers enable the automation of this process. In sum, the framework can be widely applied in wind farms for condition monitoring tasks.
On the other hand, there are also a few shortcomings at the current stage. For example, sensor errors can be a misleading factor that causes false classification of mechanical failures. The reliability of the SCADA sensors is not considered in this framework. This can be a future direction of our current research.

5 CONCLUSION

In this study, a deep-learning based condition-monitoring framework to identify bearing failures was presented. Historical data collected from healthy wind turbines were utilized to develop a model predicting bearing temperature with a deep belief network. Data from both healthy wind turbines and turbines with bearing failures served as the testing dataset. Comparative analysis demonstrated that the deep belief network model was more accurate in predicting generator bearing failures. An exponentially-weighted moving-average control chart was applied to capture shifts in prediction error. The binary vectors generated from the control charts lead to identification of the emerging bearing failure in real time in the temporal domain.
Computational results reported in the paper validated the accuracy of the deep-learning framework in condition monitoring of wind turbine generator bearings. In future research, analysis of high-frequency vibration data may be coupled with the bearing temperature data for multi-scale condition monitoring.

DATA AVAILABILITY STATEMENT

The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS

HL conceptualized the study, contributed to the study methodology, and wrote the original draft. JD contributed to the study methodology, data curation and investigation. SY and PF contributed to data analysis and investigation. HL contributed to software and formal analysis. DA contributed to investigation and writing-original draft. All authors have read and agreed to the published version of the manuscript.

FUNDING

This research is supported by National Key Research and Development Program of China (2018YFC1505105), the Opening fund of State Key Laboratory of Geohazard Prevention and Geoenvironment Protection (Chengdu University of Technology) (Grant No. SKLGP2021K014), the “Miaozi project” of scientific and technological innovation in Sichuan Province, China (Grant No. 2021090), the Project from Sichuan Mineral Resources Research Center (SCKCZY2021-YB009), and the Open Research Subject of Key Laboratory of Fluid and Power Machinery (Xihua University), Ministry of Education (Grant No. LTDL2021-011).
Ouyang, T. (2021). Feature Learning for Stacked ELM via Low-Rank Matrix Factorization. Neurocomputing 448, 82–93. doi:10.1016/j.neucom.2021.03.110
Ouyang, T., He, Y., Li, H., Sun, Z., and Baek, S. (2019). Modeling and Forecasting Short-Term Power Load with Copula Model and Deep Belief Network. IEEE Trans. Emerg. Top. Comput. Intell. 3 (2), 127–136. doi:10.1109/tetci.2018.2880511
Ouyang, T., Kusiak, A., and He, Y. (2017). Modeling Wind-Turbine Power Curve: A Data Partitioning and Mining Approach. Renew. Energ. 102, 1–8. doi:10.1016/j.renene.2016.10.032
Pandey, P., Patel, V., George, N. V., and Mallajosyula, S. S. (2018). KELM-CPPpred: Kernel Extreme Learning Machine Based Prediction Model for Cell-Penetrating Peptides. J. Proteome Res. 17 (9), 3214–3222. doi:10.1021/acs.jproteome.8b00322
Peeters, C., Guillaume, P., and Helsen, J. (2018). Vibration-based Bearing Fault Detection for Operations and Maintenance Cost Reduction in Wind Energy. Renew. Energ. 116, 74–87. doi:10.1016/j.renene.2017.01.056
Peng, Z., Peter, W., and Chu, F. (2005). An Improved Hilbert–Huang Transform and its Application in Vibration Signal Analysis. J. sound vibration 286 (1-2), 187–205. doi:10.1016/j.jsv.2004.10.005
Qiu, X., Ren, Y., Suganthan, P. N., and Amaratunga, G. A. J. (2017). Empirical Mode Decomposition Based Ensemble Deep Learning for Load Demand Time Series Forecasting. Appl. Soft Comput. 54, 246–255. doi:10.1016/j.asoc.2017.01.015
Sbihi, A. (2007). A Best First Search Exact Algorithm for the Multiple-Choice Multidimensional Knapsack Problem. J. Comb. Optim 13 (4), 337–351. doi:10.1007/s10878-006-9035-3
Shen, X., Ouyang, T., Khajorntraidet, C., Li, Y., Li, S., and Zhuang, J. (2021a). Mixture Density Networks-Based Knock Simulator. IEEE/ASME Trans. Mechatronics, 1. doi:10.1109/TMECH.2021.3059775
Shen, X., Ouyang, T., Yang, N., and Zhuang, J. (2021b). Sample-Based Neural Approximation Approach for Probabilistic Constrained Programs. IEEE Trans. Neural Networks Learn. Syst. doi:10.1109/tnnls.2021.3102323
Shen, X., and Raksincharoensak, P. (2021). Pedestrian-Aware Statistical Risk Assessment. IEEE Trans. Intell. Transportation Syst. doi:10.1109/tits.2021.3074522
Sun, Z., He, Y., Gritsenko, A., Lendasse, A., and Baek, S. (2020a). Embedded Spectral Descriptors: Learning the point-wise Correspondence Metric via Siamese Neural Networks. J. Comput. Des. Eng. 7 (1), 18–29. doi:10.1093/jcde/qwaa003
Sun, Z., Rooke, E., Charton, J., He, Y., Lu, J., and Baek, S. (2020b). Zernet: Convolutional Neural Networks on Arbitrary Surfaces via Zernike Local tangent Space Estimation. Computer Graphics Forum 39 (6), 204–216. doi:10.1111/cgf.14012
Tavner, P. J., Greenwood, D. M., Whittle, M. W. G., Gindele, R., Faulstich, S., and Hahn, B. (2012). Study of Weather and Location Effects on Wind Turbine Failure Rates. Wind Energy 16 (2), 175–187. doi:10.1002/we.538
Teng, W., Ding, X., Zhang, X., Liu, Y., and Ma, Z. (2016). Multi-fault Detection and Failure Analysis of Wind Turbine Gearbox Using Complex Wavelet Transform. Renew. Energ. 93, 591–598. doi:10.1016/j.renene.2016.03.025
Wang, H.-z., Li, G.-q., Wang, G.-b., Peng, J.-c., Jiang, H., and Liu, Y.-t. (2017). Deep Learning Based Ensemble Approach for Probabilistic Wind Power Forecasting. Appl. Energ. 188, 56–70. doi:10.1016/j.apenergy.2016.11.111
Wang, H. Z., Wang, G. B., Li, G. Q., Peng, J. C., and Liu, Y. T. (2016c). Deep Belief Network Based Deterministic and Probabilistic Wind Speed Forecasting Approach. Appl. Energ. 182, 80–93. doi:10.1016/j.apenergy.2016.08.108
Wang, L., Zhang, Z., Long, H., Xu, J., and Liu, R. (2016a). Wind Turbine Gearbox Failure Identification with Deep Neural Networks. IEEE Trans. Ind. Inform. 13 (3), 1360–1368.
Wang, L., Zhang, Z., Xu, J., and Liu, R. (2016b). Wind Turbine Blade Breakage Monitoring with Deep Autoencoders. IEEE Trans. Smart Grid 9 (4), 2824–2833.
Yan, R., Gao, R. X., and Chen, X. (2014). Wavelets for Fault Diagnosis of Rotary Machines: a Review with Applications. Signal. Process. 96, 1–15. doi:10.1016/j.sigpro.2013.04.015
Yang, B., Liu, R., and Chen, X. (2017). Fault Diagnosis for a Wind Turbine Generator Bearing via Sparse Representation and Shift-Invariant K-Svd. IEEE Trans. Ind. Inform. 23 (5), 91–99. doi:10.1109/tii.2017.2662215
Yang, W., Court, R., and Jiang, J. (2013). Wind Turbine Condition Monitoring by the Approach of SCADA Data Analysis. Renew. Energ. 53, 365–376. doi:10.1016/j.renene.2012.11.030
Yang, W., Liu, C., and Jiang, D. (2018). An Unsupervised Spatiotemporal Graphical Modeling Approach for Wind Turbine Condition Monitoring. Renew. Energ. 127, 230–241. doi:10.1016/j.renene.2018.04.059
Zhu, X., Xu, Q., Tang, M., Li, H., and Liu, F. (2018). A Hybrid Machine Learning and Computing Model for Forecasting Displacement of Multifactor-Induced Landslides. Neural Comput. Applic 30 (12), 3825–3835. doi:10.1007/s00521-017-2968-x

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Copyright © 2021 Li, Deng, Yuan, Feng and Arachchige. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.