Optimizing Induction Motor Fault Detection With Transfer Learning: A Comparative Analysis of Deep Learning Models
Failures of motor core components such as stators, rotors, and bearings account for a large percentage of motor breakdowns. Hence, in this setting, the most common motor faults are examined.

In terms of stator faults, the stator includes a laminated core, external frame, and insulated windings, all of which experience electrical and environmental stresses that can lead to failures. Stator faults are typically classified based on their location: they may occur in the stator frame, the winding, or the laminations of the stator core. Among these, winding failures are particularly serious, often resulting from insulation breakdown. This leads to localized overheating, which, if not detected, can cause further insulation damage, potentially resulting in a catastrophic short-circuit inter-turn fault [4].

Understanding the behavior of induction motors (IMs) under fault conditions and diagnosing these issues has posed a longstanding challenge for researchers in electrical machinery. Common motor faults are often associated with key components like stators, rotors, and bearings. If these faults are not identified in their initial stages, motor performance can deteriorate, potentially leading to complete failure. Detecting faults early in IMs brings significant benefits to industrial operations by enabling cost-effective failure prediction and proactive maintenance planning. This approach allows for timely preventive measures, reducing the need for costly part replacements and preventing unplanned production halts and downtime [1]. Additionally, early fault detection contributes to motor efficiency by addressing operational inefficiencies, resulting in considerable energy savings and lowered running costs. In sum, early detection of motor faults is essential for maintaining consistent production and remaining competitive in the industry.

II. LITERATURE REVIEW

Research into fault detection in induction motors employs various methodologies, each providing insights into challenges in scalability, accuracy, and adaptability. [5] utilizes cyclostationarity to capture the periodic characteristics of electrical signals, enhancing early fault detection by identifying subtle statistical changes. However, its adaptability is limited when applied to motors under varying loads. [6] implements Convolutional Neural Networks (CNNs) for three-phase induction motor diagnostics, excelling in fault pattern recognition. Yet, CNNs encounter computational constraints, making real-time applications challenging. Similarly, [7] employs the Two-dimensional Time-Domain Gray Coded Image (TDGCI) coupled with CNNs to diagnose rotor faults. While effective in identifying visual fault patterns, the reliance on Fast Fourier Transform (FFT) preprocessing limits TDGCI's effectiveness in low-severity conditions where FFT is noise-sensitive.

[8] explores the random multi-frequency resonant sparse noise power spectrum (rMFRSNPS) in conjunction with probability vector resonance analysis (PVRA), effectively identifying faults in isolated conditions. However, the method struggles to generalize to complex fault scenarios, limiting its applicability in environments with multiple
Data Processing:
Before inputting data into the Transfer Learning model, which would be RGB images distinguishing healthy and stator-fault conditions, it is essential to clean and pre-process the data to ensure accurate and effective training of the model. This is carried out with libraries including TensorFlow, Scikit-Learn, and NumPy in Google Colab. Common steps we take in cleaning data for a CNN model include the following (a brief code sketch of these steps is given after the list):

Data Exploration: This involves understanding the dataset, visualizing and analyzing the data, identifying missing values, and removing irrelevant or redundant data.
Data Pre-processing: This step includes converting data into a machine-readable format, scaling the data, and removing noise or outliers.
Data Augmentation: Used to improve the diversity of the training dataset by applying transformations such as flipping, rotating, and shifting the images.
Data Labelling: Each image in the dataset is assigned a label that corresponds to its class, such as a cat or dog.
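To make these steps concrete, the snippet below sketches how the scaling and augmentation could be set up with the Keras utilities mentioned above. The directory layout (a dataset/ folder with one sub-folder per condition) and the parameter values are illustrative assumptions, not the exact pipeline used in this work.

```python
# Minimal sketch of the pre-processing and augmentation steps,
# assuming the RGB images sit in class-named sub-folders (paths are hypothetical).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Scaling to [0, 1], plus flips, rotations, and shifts for augmentation;
# 17% of the images are reserved for validation.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # convert pixel values to a machine-friendly range
    rotation_range=15,       # random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # random flips
    validation_split=0.17,   # hold out part of the data for validation
)

# Labels are inferred from the folder names (e.g. healthy/, stator_fault/).
train_gen = datagen.flow_from_directory(
    "dataset/", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="training",
)
val_gen = datagen.flow_from_directory(
    "dataset/", target_size=(224, 224), batch_size=32,
    class_mode="categorical", subset="validation",
)
```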
Convolutional Neural Networks (CNNs) are trained using the backpropagation algorithm, which adjusts network weights based on error rates calculated during training. This optimization process aims to reduce the discrepancy between predicted and actual outputs by updating the network's weights. After data cleaning, the dataset is divided into a training set (83%) and a validation set (17%). The training set is used to fit the model, while the validation set is used for tuning hyperparameters.
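The 83/17 split described above can be reproduced along the following lines when the data are held in arrays rather than directories; the placeholder arrays merely stand in for the real image data and labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the real image dataset and class labels.
X = np.random.rand(100, 224, 224, 3).astype("float32")
y = np.random.randint(0, 2, size=100)

# 83% of the samples fit the model; 17% are held out to tune hyperparameters.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.17, stratify=y, random_state=42
)
print(X_train.shape, X_val.shape)
```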
During training, input data (images) are fed into the CNN, where essential features are identified and extracted, ensuring consistent feature selection across samples, and class predictions are generated. The model's predictions are then compared to the actual labels, with errors measured through a loss function. The model's parameters are updated accordingly to minimize this loss. Validation is performed after each training cycle to monitor the model's performance and prevent overfitting. In this phase, hyperparameters such as learning rate, batch size, filter count, epochs, and padding are fine-tuned to optimize accuracy on the validation set, enhancing the model's generalization to new, unseen data.
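A minimal Keras sketch of this train-and-validate cycle is shown below. The tiny network, the placeholder data, and the hyperparameter values are illustrative only and do not correspond to the tuned settings reported later.

```python
import numpy as np
import tensorflow as tf

# Illustrative hyperparameters (not the tuned values reported in the paper).
learning_rate = 1e-3
batch_size = 8
filters = 16
epochs = 2

# Tiny placeholder dataset standing in for the motor-condition images.
X_train = np.random.rand(32, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 2, size=32)
X_val = np.random.rand(8, 64, 64, 3).astype("float32")
y_val = np.random.randint(0, 2, size=8)

# Small CNN used only to demonstrate the train/validate cycle.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# The loss function measures the gap between predictions and labels; the
# optimizer updates the weights via backpropagation to minimize it.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation_data is evaluated after every epoch to watch for overfitting.
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=epochs, batch_size=batch_size)
```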
The CNN Base Layer:
A specific type of CNN deep learning algorithm can process an input image, apply weights and biases to recognize significant elements, and distinguish between different objects in the image. Unlike traditional methods that require manual engineering of filters, ConvNets can learn these distinguishing characteristics with sufficient training, greatly reducing pre-processing needs compared to other classification algorithms.

One key benefit of CNNs over other neural networks is their ability to identify critical features autonomously, without human intervention. CNNs are also highly computationally efficient due to their use of convolution and pooling operations, along with parameter sharing. This efficiency makes CNNs adaptable to various devices, enhancing their universal appeal. Furthermore, CNNs reduce the need for pre-processing while learning distinctive filters and features on their own. CNNs also offer computational advantages over traditional neural networks, with weight sharing being a major asset.

CNN architecture is inspired by the human brain's connectivity patterns, specifically the organization of the visual cortex. Each neuron in a CNN responds to a small region in the visual field, known as its receptive field, and these fields collectively span the entire visual area. In our model, several CNN layers are used, reflecting these concepts in their structure.

The Flatten Layer in CNNs serves to reshape the multidimensional tensors generated by the preceding convolutional and pooling layers in the TL model into a one-dimensional vector, preparing them for input into fully connected layers. This layer acts as a bridge between the spatial feature maps extracted by the TL layers and the linear structure required by fully connected layers for classification or regression. Mathematically, if the input tensor has dimensions (batch size, height, width, channels), the flatten layer converts it into a vector of shape (batch size, height * width * channels), effectively rearranging the data for seamless integration with subsequent dense layers. This transformation facilitates the learning of higher-level relationships in the data.
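The short sketch below illustrates this reshaping for an arbitrary feature-map size; the shapes are chosen only as an example.

```python
import tensorflow as tf

# Feature maps of shape (batch, height, width, channels) coming out of the
# convolution/pooling stack are flattened to (batch, height * width * channels).
x = tf.random.normal((4, 7, 7, 512))     # e.g. a batch of 4 feature maps
flat = tf.keras.layers.Flatten()(x)
print(x.shape, "->", flat.shape)          # (4, 7, 7, 512) -> (4, 25088)
```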
In a Convolutional Neural Network (CNN), the fully connected (or dense) layer serves to translate the high-level features identified by the convolutional layers into the final model output, whether for classification or regression tasks. This dense layer is composed of neurons, each connected to every neuron in the preceding layer. It computes a weighted sum of inputs from the prior layer and typically applies an activation function, such as softmax, to generate the final predictions. For multi-class classification, the softmax function (S) transforms the last layer's outputs into a probability distribution across mutually exclusive classes. In contrast, for binary classification, a sigmoid function is applied, categorizing the outcome as 0 or 1.
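As an illustration, the two kinds of output head can be written as follows; the feature and class counts are arbitrary and serve only to show the behavior of softmax and sigmoid.

```python
import tensorflow as tf

num_classes = 3   # illustrative number of motor conditions

# Dummy high-level features as they would arrive from the flatten layer.
features = tf.random.normal((4, 128))

# Multi-class case: softmax turns the weighted sums into a probability
# distribution over mutually exclusive classes, S(z_i) = exp(z_i) / sum_j exp(z_j).
multiclass_head = tf.keras.layers.Dense(num_classes, activation="softmax")
probs = multiclass_head(features)
print(probs.shape, tf.reduce_sum(probs, axis=1).numpy())  # each row sums to 1

# Binary case: a single sigmoid unit maps the output into (0, 1),
# which is then thresholded to class 0 or 1.
binary_head = tf.keras.layers.Dense(1, activation="sigmoid")
print(binary_head(features).shape)
```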
Transfer Learning Model:
To enhance the training process for a new model on a related task, we introduce a pre-trained model as the foundational layer. Transfer learning leverages the pre-existing knowledge of this model, which has been trained on the extensive ImageNet dataset, to capture general features and patterns. This approach enables efficient training on a smaller dataset by reusing learned representations, minimizing the computational demands and data requirements. The pre-trained CNN model is fine-tuned and adapted with our new dataset, functioning as the initial layer of the transfer model for feature extraction. Integrating transfer learning into our experimental model offers several advantages:

The use of pre-trained knowledge significantly reduces the need for extensive data and computational power compared with training from the ground up.
It enhances the accuracy of the new model.
The resulting model is better equipped to handle data variability, noise, and outliers.

Transfer models such as InceptionV3, VGG19, and ResNet152 are imported from the Keras library. All are trained on the base layer's small dataset, and the model that gives the best performance is chosen for our experimental model. Note that the layers and dimensions of the base layer and the transfer model are kept the same, which helps prevent overfitting. A minimal loading sketch is shown below.
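The sketch shows one way such a pre-trained backbone can be loaded from Keras and adapted to the new dataset. ResNet152 is used as the example (InceptionV3 and VGG19 are loaded the same way), and the added head layers and sizes are illustrative assumptions rather than the exact experimental architecture.

```python
import tensorflow as tf

num_classes = 3   # illustrative number of motor conditions

# Pre-trained ImageNet weights act as the feature-extraction base.
base = tf.keras.applications.ResNet152(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False   # freeze the pre-trained layers for initial training

# New classification head trained on the (smaller) motor-fault dataset.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) follows; optionally, part of `base` can later be unfrozen
# and fine-tuned with a lower learning rate.
```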
A. Simulation
After the Simulink simulation, 276 current time signatures were obtained over various time ranges and loads. The signatures were split between the training and validation datasets at 82 and 18 percent, respectively.
For this research, transfer learning models such as InceptionV3, ResNet152, and VGG19 were utilized due to their strong capabilities in analyzing time-series and sequential data. These models are known for their advanced learning abilities, even when working with raw input data, and can accurately predict outcomes as multiclass labels due to their varied layer structures. The choice of model architecture, including the number of layers and units, is influenced by the dataset's nature and complexity. In cases where the input data is complex and nonlinear, deeper models may be necessary to achieve optimal performance.
We have trained and tested the three TL models for fault detection and classification in Google Colab using the dataset elaborated in Table 1. The confusion matrix evaluates the performance of classification models. It provides a comprehensive overview of how well a model predicts different classes in a multi-class classification problem. The confusion matrix is constructed by comparing the predicted class labels with the actual ground-truth labels of the dataset. The matrices of the three TL models are illustrated in Figures 7, 8, and 9.
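As a sketch, a confusion matrix of the kind shown in Figures 7-9 can be obtained as follows; the label arrays are placeholders standing in for the validation labels and the model predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Placeholder ground-truth and predicted labels for three conditions;
# in practice y_true comes from the validation set and y_pred from model.predict().
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred)
print(cm)
ConfusionMatrixDisplay(cm).plot()   # renders the matrix as a figure
```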
From the matrices, various performance metrics are derived for insight. One is the accuracy of the individual conditions, shown in Table 3, which indicates that ResNet152 performs better for each condition. Other metrics include the following (a computation sketch follows the list):

Precision: The proportion of correctly predicted positive instances among all instances predicted as positive. Formula: P = TP / (TP + FP)

Recall: The proportion of correctly predicted positive instances among all actual positive instances. Formula: R = TP / (TP + FN)

F1-Score: The harmonic mean of precision and recall. It balances precision and recall and is useful when dealing with imbalanced classes. Formula: F1 = 2 * (Precision * Recall) / (Precision + Recall)
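These metrics can be computed directly from the true and predicted labels, for instance with scikit-learn as sketched below; the label arrays are placeholders, and macro averaging across the conditions is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # placeholder ground truth
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 1])   # placeholder predictions

# Macro-averaged precision, recall, and F1 across the fault classes.
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("f1-score :", f1_score(y_true, y_pred, average="macro"))
```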
From Figure 10, the performance metrics of ResNet152 stand out, with precision, recall, and F1-score of 97%, 96%, and 97%, respectively, owing to its complexity and better feature representation. In Figure 11, this model also leads in accuracy, with a training accuracy of 96.49% and a validation accuracy of 97.92%. The lowest losses are displayed by the ResNet152 model in Figure 12, with a training loss of 12.59% and a validation loss of 14.1%. It can be said that the ResNet152 model has the architecture best suited to capturing the underlying patterns in the data, that its hyperparameters were tuned optimally, and that its weight initialization helped the optimization process find a more optimal set of parameters during training.
APPENDIX A