Intelligent Fault Diagnosis of Rotating Machine Elements Using Machine Learning Through Optimal Features Extraction and Selection
Procedia Manufacturing 00 (2021)000–000
Abstract
Rolling element bearings and gears are the main components of rotating machines and are the most prone to defects, which may result in significant economic loss. The main purpose of this study is the automated diagnosis of rolling element bearing and gear defects using a machine learning (ML) technique and statistical features extracted from the time-domain vibration signal and spectral kurtosis. The extracted features are used to train a K-nearest neighbors (KNN) model as the diagnostic classifier. The significance of the segmentation size of the raw time-domain vibration signal for feature extraction is studied by varying the window (segment) length and observing its effect on classification accuracy. The importance of feature selection for optimal performance of KNN in defect classification is studied by selecting the most important and useful features using a Genetic Algorithm (GA). Furthermore, the effect of the value of K on the performance of the KNN classifier is observed by varying K from 1 to 10 in steps of 1. The results show that the KNN classifier in combination with GA can diagnose faults of rotating machine elements correctly and confidently, provided that the parameters for feature extraction are selected properly.
© 2020 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of the FAIM 2021.
Keywords: K-nearest Neighbor (KNN); Rolling element bearings and gears; Fault Diagnosis; Genetic Algorithm (GA)
support vector classification (MDSVC) and a deep random forest fusion (DRFF) technique, Chen Z et al. [15,16] used a multi-layered neural network scheme and a convolutional neural network method.

To improve the accuracy and efficiency of fault diagnosis of rotating machinery, the application of machine learning techniques in machine condition monitoring still needs more encouragement and attention. Therefore, the goal of this research is to develop an autonomous fault classification system, using an appropriate artificial intelligence technique, that can improve the efficiency and applicability of condition monitoring technology.

The remainder of the paper is arranged as follows. Section 2 reviews recent literature on fault diagnostics through AI techniques. Section 3 presents a case study in which the proposed approach is applied separately to vibration samples of two types of rotating machine elements: rolling element bearings and gears. Section 4 analyzes in detail the results obtained from the proposed approach. Section 5 summarizes the research work and gives some recommendations for future work.

2. Literature Review

Accurate fault detection and diagnosis systems have gained significant importance because they allow potential machinery failures to be managed properly. Researchers have applied various methods to this problem; for condition monitoring of rotating machine elements, however, vibration monitoring is the most widely used [17]–[20]. Since raw vibration data is a one-dimensional time series, appropriate features must be extracted to serve as health indicators. Gears and bearings (rolling element bearings in particular) are often the most highly loaded components in rotating machinery and are responsible for early damage in a machine's life. Rolling element bearings and gears are vital parts of rotating machinery, and their failure can cause not only production losses but also financial losses. Bearing-related defects alone are responsible for more than 40% of the failures in industrial machines [21]. For this reason, a lot of research on fault diagnosis has been carried out over the last few decades. Qicai Zhou et al. presented a study in which the K-means algorithm was used to label unlabeled signals [18]. Rohit S. Gunerkar et al. compared the performance of ANN and KNN on rolling element bearings, using features extracted after a wavelet transform [19]. J. P. Patel compared ANN and SVM on bearings and found that SVM gave better results than ANN [20]. A few years ago, Jing Tian et al. [22] presented a method based on spectral kurtosis (SK) and cross correlation to diagnose defects and monitor the degradation of bearings in electric motors. The method was validated by experiments on a machinery fault simulator. Daniel Augusto [23] diagnosed faults in a water cooling system, using data collected from a real engine at all speeds to train the classifier. The performance of the KNN and ANN classifiers was compared and found to be the same in terms of average mean square error and classification error.

Many studies on fault detection of mechanical components using AI techniques have been carried out in the past few decades [24,25], since these techniques offer various benefits over traditional mathematical and statistical modelling approaches. However, these methods still need considerable improvement to become more effective and practical for real-world applications [26]. Furthermore, applications of the Genetic Algorithm (GA) with vibration signal analysis in machine condition monitoring and fault diagnosis still need more support and attention because existing evidence is lacking. Vibration-based condition monitoring is the most popular approach for early monitoring and identification of defects. However, localized faults produce very weak impulses in vibration signals, so they are difficult to detect using existing frequency-domain methods [27]. Various supervised machine learning methods have so far been used for fault identification of rolling element bearings and gears using time-domain statistical features. However, fault classification with a minimum number of features is still a challenge for researchers.

In the light of existing literature, there is still a need to improve the quality of the features extracted from vibration data so that they contain appropriate and sufficient information about machine health to properly train an ML classifier for defect diagnosis. In this research work, a vibration-based monitoring technique is used for fault diagnostics of rolling element bearings and gears. Statistical features derived from the time-domain signal and from spectral kurtosis are extracted. Then, K-nearest neighbors (KNN) is applied to the combination of all extracted features to classify the faults. The performance of the classifier is compared for different values of K in order to determine its potential in classifying various faults of bearings and gears. Afterwards, a Genetic Algorithm is used to select the most suitable features for optimizing the performance of the KNN classifier.

3. Case Study

For accurate mechanical fault diagnosis, it is of paramount importance to extract realistic and correct features from the vibration signal that contain enough information about the health of the machine or component. For feature extraction, the vibration data is divided into samples having the same number of points (dimension). The number of points in one sample is referred to as the window size ($W$) for feature extraction in this paper. To study the effect of window size on classification accuracy, a reference is needed. For this purpose, the window size corresponding to one rotation ($W_R$) is taken as the reference in this paper; it is given by:

$$W_R = \frac{60\,F_s}{R} \qquad (1)$$

where $F_s$ is the sampling frequency of the acceleration sensor in Hz and $R$ is the rotational speed of the machine in revolutions per minute. The case study was carried out for both components
268 Syed Muhammad Tayyab et al. / Procedia Manufacturing 51 (2020) 266–273
(rolling element bearings and gears) separately. Statistical features derived from the time-domain signal and from spectral kurtosis (SK) are extracted. Initially, 16 features (root mean square, peak to peak, kurtosis, crest indicator, impulse, energy-I, skewness, standard deviation, variance, shape factor, margin factor, energy-II, spectral kurtosis mean, spectral kurtosis kurtosis, spectral kurtosis skewness and spectral kurtosis standard deviation) were calculated with a window size of $W = 400$ (approximately $W_R/4$ for the bearing data set and approximately $W_R$ for the gear data set). The dimensions of the feature matrix for the rolling element bearing and gear data sets were 609×16×4 and 250×16×3, respectively. To analyze the effect of the window size $W$ on classifier accuracy, it was increased to 800 (approximately $W_R/2$) and 1200 (approximately $2W_R/3$) for rolling element bearings and to 800 (approximately $2W_R$) for gears.

Automating fault diagnosis, so that human intervention in this tedious task can be avoided, is a requirement of this era. For this purpose, a classification technique from machine learning (K-nearest neighbors) has been applied to diagnose the defects of rotating machine elements. The flow of the adopted methodology is illustrated in Fig. 1.

[Fig. 1: flow of the adopted methodology; recoverable block labels: Feature Selection, Training, Testing]

3.1. Experimental setup and vibrational data

The rolling element bearing vibration signals used for analysis were downloaded from the CWRU Bearing Centre [28], as shown in Fig. 2. Vibration data were collected for motor loads from 0 to 3 hp and motor speeds from 1,720 to 1,797 rpm using two accelerometers installed at the drive end and fan end of the motor housing, at two sampling frequencies, 12 kHz and 48 kHz. Four classes of data sampled at 48 kHz were selected for this research, as listed in Table 1. Experimental data for gear fault diagnosis was acquired from an online shared gear fault data set [29], as shown in Fig. 3. For each case in the gear data set, the acceleration was recorded for 10 seconds at a sampling frequency of 10 kHz. The gearbox fault data was collected for three different pinion conditions (see Table 2).

Table 2. Selected gear fault classes
Classes        | 1        | 2                 | 3
Type of faults | No fault | One chipped tooth | Three consecutive worn teeth

3.2. Application of K-nearest neighbors algorithm

KNN is an algorithm that learns from all available patterns (training data) and classifies new patterns (testing data) based on a similarity measure between them. The similarity measure is a distance (e.g. Mahalanobis distance or Euclidean distance) between data points. It requires setting the number of nearest neighbors (k) that are considered for
classification based upon the selected similarity measure. Classification through the KNN algorithm is divided into training and testing phases, and the available data is divided into training and testing samples accordingly. During the training phase, the KNN model learns the relationship between predictors and targets of the training data; in the testing phase, the trained model is tested for its ability to predict the correct classes of testing data whose labels are known but on which the model has not been trained.

In this research, the effect of the K value and of the window size for feature extraction on the classification accuracy of the model is analyzed. Euclidean distance is used in this study as the similarity measure between data points because it is simple to implement and can give good results. The smaller the distance between data points, the higher the similarity. Initially, the KNN model is trained and tested using all extracted features. However, to reduce the dimensionality of the feature matrix, and thereby reduce the computational requirements and improve the classification accuracy, optimal features are selected through a wrapper approach. A Genetic Algorithm (GA) is used for feature selection, isolating the best features as evaluated through a fitness criterion. The advantage of GA over other methods is that it needs no derivative calculations and therefore consumes less computational power. It is a more robust algorithm and has superior global search capability in complicated search spaces. On the other hand, GA needs preliminary tuning, and typically many fitness evaluations are required to reach better solutions. In this case study, the GA parameters were defined carefully and the algorithm was run for enough generations to achieve effective convergence. The fitness (objective) function of this algorithm returns the numeric value calculated against every sample for randomly selected features. This selection is controlled through the values generated in the binary-encoded genome. The classification error of the KNN classifier, measured as resubstitution loss, is used as the fitness function to be minimized by the GA.

3.3. Optimization through Genetic Algorithm

GA is used to select the best features in order to reduce the dimensionality of the feature matrix and to optimize the performance of KNN. The algorithm starts by initializing the population (design variables/labelled data), which is later evaluated with the chosen fitness function. The population of variables is user defined and needs to be set appropriately. GA can be used for feature selection with binary- or real-coded individuals. In this case study, the individuals of the population are randomly generated "genomes", each of which is a binary-encoded bit string. The reason for using binary bits is that it helps to reduce the likelihood of premature convergence within the population. The whole subset of variables is subjected to three GA operators: selection, crossover and mutation. Genetic operators are based on the idea of heredity of genetic characters over generations. By applying these operators, the best solutions (individuals) are separated, so that the fitness converges towards the global best solution over the generations until the stopping criterion is met [30]. The most common fitness evaluation or selection methods are proportionate, ranking and tournament selection. In the proportionate selection method, the individual fitness is divided by the average fitness of the population; the roulette wheel is one of the most used methods, in which a slice of a circular roulette wheel is assigned to each individual. Spinning the wheel as many times as the population size gives the parents and child sequences, and this is repeated iteratively. In the second method (ranking), fitness scaling is not required because every chromosome is ranked based on its own fitness, which helps to prevent early convergence. The disadvantage is that the ratio of expected values of individuals remains the same regardless of whether the fitness variance is low or high. In tournament selection, two individuals are randomly selected from the population and the better of the two is preserved for the next generation. In GA, the selection method is of critical importance because it identifies the individuals that should be prioritized to produce the next generation (child chromosomes). Thus, setting it carefully may lead the solutions towards the desired optimality level. If the selection is strong and strict, a highly fit but suboptimal solution may take over the population. Conversely, slow convergence is normally observed when selection is weak. In this case study, tournament selection has been used as the selection criterion: two individuals drawn from the initialized population are set out for a tournament and the better individual is preferred as a parent.

The most distinguishing feature of GA over other search methods is crossover, as it contributes to maintaining diversity by introducing new genetic content when operated on two chromosomes to produce offspring. The recombination operator is applied to accelerate the search in the population. The simplest form of crossover is to choose a cutoff point randomly and exchange the genetic content by combining a single segment from each of the two parents to give a child sequence. There are many methods to accomplish recombination. To achieve better performance and better combinations, a properly designed crossover mechanism is required; recombination should therefore be carried out with some probability, called the crossover rate. The crossover rate is the ratio of the number of child sequences produced in each generation to the population size. The third GA operator is mutation; it alters one or more genes to restore genetic material lost during selection and so maintain diversity. Although it is a secondary operator, the probability with which it is applied is as important as that of crossover. The most common mutation is the bit-flip mutation, in which randomly selected bits of an individual are inverted with a probability called the mutation rate [30]. Moreover, in the applied GA, the best solutions are retained by preserving the best individuals for the next generations; this is called elitism.

The GA has been implemented in MATLAB using the extracted features (labelled data) as input. Unlike traditional techniques, GA relies on its population. Population size is therefore user defined and has a drastic effect on the performance of GA. A small population may result in premature convergence, while a large population takes unnecessary computational time. In this case study the population size is set to 50. The algorithm for GA is represented in Fig. 4. Initializing the population and then converting the individuals into binary encoding facilitates finding the fitness of all individuals based on the selected fitness (evaluation) criterion. GA does not have intense mathematical requirements, so it can handle
any type of fitness (objective) function and constraints. In this case study, the objective function of GA is to minimize the classification error. In every generation it selects the individuals with minimum classification error, uses them for the next generations and discards the others, giving equal importance to the data of all classes. The best individuals are the offspring of the current generation and the parents for the next iteration, and the cycle continues until the stopping criterion is fulfilled.

[Fig. 4 (GA flowchart): inputs are the GA parameters (population size, stopping criteria) and the labelled features; the steps are Initialize population, Evaluate Fitness, and Ranking and Selection of Best Fit individuals, followed by the decision "Stopping criteria met?" with YES and NO branches; the YES branch leads to Display Results (best features selected).]

4. Results and Discussion

The performance of the KNN classifier in terms of classification accuracy for both rotating machine elements (rolling element bearings and gears) is given in Table 3. For rolling element bearings, three window sizes ($W_1 = 400 \approx W_R/4$, $W_2 = 800 \approx W_R/2$ and $W_3 = 1200 \approx 2W_R/3$) were used for feature extraction. All 16 extracted features were used for training the KNN model. For $W = 400$, the classification accuracy was more than 90% for all values of K under observation. The minimum accuracy was 90.3% for K = 2 and the maximum accuracy was 93.6% for K = 5 and 7. The confusion matrix for K = 4 at this window size is shown in Fig. 5a. Increasing the window size to 800 improved the classification accuracy for all values of K under consideration. For $W = 800$, a minimum classification accuracy of 97.3% was observed for K = 2 and a maximum accuracy of 98.2% for K = 4. The confusion matrix for K = 4 at this window size is shown in Fig. 5b. Afterwards, the window size was further increased to 1200. For this window size, 99.6% classification accuracy was observed for K = 2 to K = 10. The classification accuracy at this window size ($W = 1200$) was further improved to 100% after applying the Genetic Algorithm (GA) for feature selection. GA selected only four features (root mean square, impulse, kurtosis and shape factor), which reduced the dimensions of the feature matrix. The confusion matrices for K = 4 at this window size are shown in Fig. 6a and 6b.

For the gear data set, two window sizes ($W_1 = 400 \approx W_R$, $W_2 = 800 \approx 2W_R$) were used for feature extraction. Firstly, all 16 features were used for the classification. For a window size of 400, a classification accuracy above 92% was observed for all values of K between 1 and 10. The maximum accuracy was 94.6% at K = 4. The confusion matrix for K = 4 at this window size is shown in Fig. 7. Increasing the window size to 800 significantly improved the classification accuracy for all values of K. The minimum classification accuracy was 98.2% for K = 8, and 100% classification accuracy was achieved for K = 2 to 5 (see Fig. 8a). When GA was [...]

[...] is essential to use appropriate size of window. It is observed that acceptable classification accuracy can be achieved by setting $W \geq W_R/2$ for rolling element bearings and $W \geq 2W_R$ for gears. Features extracted using smaller windows will not carry adequate information for correct fault classification.

Feature selection through GA was very useful for finding the most important features for fault diagnostics. The individuals with the best fitness were obtained with 100 generations (the stopping criterion) and a population size of 50. The crossover and mutation operators need to be designed appropriately, with suitable probability rates, to attain better performance. In this case study, crossover between two samples is carried out at a rate of 0.8, and the same rate has been used for mutation. KNN classified faults with 100% accuracy using only four and three features selected through GA for bearings and gears, respectively. Furthermore, the performance of the classifier also improved in terms of computational time, which was reduced by up to 73% for the bearing data set and by up to 48% for the gear data set. Figures 9 and 10 show the reduction in computational time due to dimensionality reduction for the bearing and gear data sets, respectively. A performance comparison of this study with a few existing studies that used the Case Western Reserve University bearing data set is given in Table 4.
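The role of the window size in the discussion above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code: it evaluates the one-rotation reference window of Eq. (1), cuts a signal into non-overlapping windows of length W, and computes a few of the named time-domain statistics per window. The function names, the synthetic sine signal, the choice of 1797 rpm, and the textbook forms used for kurtosis and crest factor are all assumptions, since the paper does not list its feature formulas.

```python
import math


def reference_window(fs_hz, rpm):
    """Samples recorded during one shaft rotation: W_R = 60 * F_s / R (Eq. 1)."""
    return 60.0 * fs_hz / rpm


def segment(signal, window):
    """Split a 1-D signal into non-overlapping windows; a trailing partial window is dropped."""
    count = len(signal) // window
    return [signal[i * window:(i + 1) * window] for i in range(count)]


def time_features(x):
    """Assumed textbook definitions of a few time-domain statistics for one window."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population variance
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    return {
        "rms": rms,
        "peak_to_peak": max(x) - min(x),
        "std": math.sqrt(var),
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / var ** 2,
        "crest": peak / rms,
    }


# Illustrative values only: 48 kHz and 1797 rpm lie within the ranges quoted for the bearing data.
w_ref = reference_window(48_000, 1797)
# Synthetic stand-in for a vibration record (a pure sine, 100 samples per period).
signal = [math.sin(2 * math.pi * i / 100) for i in range(2500)]
windows = segment(signal, 800)
features = [time_features(w) for w in windows]
print(round(w_ref), len(windows), round(800 / w_ref, 2))
```

Each row of a feature matrix would then be the list of statistics for one window, so a larger W gives fewer rows that each summarize a longer stretch of the rotation.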
Figure 5: Confusion charts for rolling element bearings for (a) window size 400; (b) window size 800
Figure 6: Confusion charts for rolling element bearings for window size 1200 (a) all features; (b) with selected features
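Confusion charts like those in Figures 5 and 6 tabulate true against predicted classes for one value of K. As a hedged illustration, the sketch below implements a plain Euclidean-distance KNN vote, builds a confusion matrix, and sweeps K. The toy two-feature data, the class names and all function names are invented for this example; the sweep stops at K = 5 because the toy training set has only six points, whereas the paper sweeps K from 1 to 10.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


# Toy data invented for illustration (not taken from the paper's data sets).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["no fault"] * 3 + ["fault"] * 3
test_x = [(0.5, 0.5), (5.5, 5.5), (1, 1), (4, 4)]
test_y = ["no fault", "fault", "no fault", "fault"]
classes = ["no fault", "fault"]

results = {}
for k in range(1, 6):
    pred = [knn_predict(train_x, train_y, q, k) for q in test_x]
    results[k] = confusion_matrix(test_y, pred, classes)
    print(k, results[k], accuracy(results[k]))
```

Off-diagonal entries of each matrix count the misclassified windows, which is the information the confusion charts convey per class.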
[Figure: computational time in seconds (SEC) versus K (1 to 10), comparing "without GA" and "With GA"; panel (b) uses the same K axis]

Table 4 (column headings: Work/Approaches; Extracted Features; Feature Selection Approach; Considered Classes; Classifier; Misclassification Rate)
Zhang et al., 2013 [31] | 21 | Kernel Principal Component Analysis | 3 | 7 | Drive End | SVM | 0.47%
[21]
Ours | 16 | Genetic Algorithm | 4 | 4 | Drive End | KNN | 0.00%
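The combination reported for this study (a binary-encoded GA wrapped around a KNN classifier, with resubstitution error as fitness, tournament selection, single-point crossover, bit-flip mutation and elitism) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB implementation: the function names, the toy data, K = 3 and the per-bit mutation rate of 0.05 are hypothetical choices. The paper reports a rate of 0.8 for both crossover and mutation, but does not specify how the mutation rate is applied, so the value here should not be read as the published setting.

```python
import math
import random
from collections import Counter


def knn_resub_error(X, y, mask, k=3):
    """Resubstitution error of k-NN using only the features whose mask bit is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty feature subset is treated as the worst case
    pts = [[row[j] for j in cols] for row in X]
    wrong = 0
    for xi, yi in zip(pts, y):
        nearest = sorted(range(len(pts)), key=lambda n: math.dist(pts[n], xi))[:k]
        vote = Counter(y[n] for n in nearest).most_common(1)[0][0]
        wrong += vote != yi
    return wrong / len(y)


def tournament(pop, fitness, rng):
    """Pick two distinct individuals at random and keep the one with the lower error."""
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if fitness[a] <= fitness[b] else pop[b]


def crossover(p1, p2, rng):
    """Single-point crossover: head of one parent joined to the tail of the other."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]


def mutate(genome, rate, rng):
    """Bit-flip mutation: each bit is inverted independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def ga_select(X, y, pop_size=50, generations=100, cx_rate=0.8, mut_rate=0.05, seed=0):
    """Return (best_mask, best_error); mut_rate is a hypothetical per-bit value."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        fit = [knn_resub_error(X, y, g) for g in pop]
        best = min(range(pop_size), key=fit.__getitem__)
        nxt = [pop[best][:]]  # elitism: the best individual survives unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(pop, fit, rng), tournament(pop, fit, rng)
            child = crossover(p1, p2, rng) if rng.random() < cx_rate else p1[:]
            nxt.append(mutate(child, mut_rate, rng))
        pop = nxt
    fit = [knn_resub_error(X, y, g) for g in pop]
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


# Toy data invented for illustration: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5], [0.1, -5], [0.2, 5], [1.0, -5], [1.1, 5], [1.2, -5]]
y = [0, 0, 0, 1, 1, 1]
mask, err = ga_select(X, y, pop_size=10, generations=5)
print(mask, err)
```

Because the fitness is evaluated on the same samples that serve as neighbors, K = 1 would always give zero error; the sketch therefore uses K = 3, and a held-out split would be a natural alternative fitness if overfitting is a concern.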