Predictive Maintenance for Remote Field IoT Devices: A Deep Learning and Cloud-Based Approach
1 Introduction
Predictive maintenance reduces sudden downtime and unplanned machine repairs by alerting the maintenance teams; without it, the maintenance frequency is much higher. The main objective of the proposed system is to remotely control and monitor oil rod pumps using IoT devices and to alert the repair team before failure. Constantly sending equipment repair teams to the customer or machinery site to monitor the machines and equipment increases the company's cost and time, and sudden equipment downtime can cause unsatisfied customers and decrease the lifetime of the equipment. Using Azure cloud solutions, IoT data can be simulated, anomaly detection can be modelled on the simulated data and alerts for repair can be sent. Safety in the workplace is also a major concern for organizations that operate massive machinery.
Predictive maintenance is the process of continuously monitoring equipment during operation so that faults can be reported before they occur. Using IoT, predictive maintenance can be performed remotely, which saves cost and time for the company. This predictive maintenance project is aimed at oil rod pumps, which are used to extract oil from the ground; the rod pump is the machinery that lifts the oil up from ground level. These machines are monitored by sensors that keep them in check, and the data coming from the machines is called telemetry data. The collected telemetry data can be processed and used for prediction, and the prediction can be used to prevent machine failure beforehand and reduce sudden downtime. The data is collected from the IoT sensors and stored in cloud storage for processing. Using deep learning, the data can be used to detect the anomalies that cause the machine to fail in its operations. The components of the pump are monitored, and the data coming out of them is used to check their health, keeping a continuous tab on the condition of the machines. In this way, companies can remotely monitor and control the oil rod pumps and alert their repair teams only when needed. This work produces a predictive system that, once set up, detects anomalies in IoT machinery and alerts the repair team automatically.
Figure 1 shows the transfer of data across all functionalities.
The data exploration stage includes the detection of outliers and handling them; quality data renders quality output. From the problem, the major inference is to develop a system that improves detection of the failure state of the pump by quickly identifying any abnormalities in the motor power, motor speed, casing friction and pump rate and providing immediate responses to them. When an anomaly is detected, the control triggers an alert message to the repair team, thereby reducing the total cost of ownership and increasing profits.
2 Literature Review
Wejdan Al-Subaiei et al. [2] compared three maintenance strategies and justified selecting the predictive maintenance approach in their 2021 work, Industry 4.0 Smart Predictive Maintenance in the Oil Industry to Enable Near-Zero Downtime in Operations. In the oil industry, Industry 4.0 smart predictive maintenance guides technicians to check oil conditions and confirm whether contaminants are present, based on automated oil analysis tests that determine viscosity and the presence of water or worn metal.
Over the past years, there has been increasing interest in failure prediction, which contributes to decision-making for predictive maintenance. Sharma et al. (2011), Vasili et al. (2011) and Van Horenheck (2013) distinguished between dynamic and static models in predictive maintenance [3].
Marco Cinus and Matteo Confoloneiri proposed that the data generated by produc-
tion line sensors can be used as a Key Performance Indicator, which would aid the
decision-making process apart from the DSS. They have processed the data using
Artificial Neural Networks (ANN)-based knowledge systems. Their work encourages
the use of preventive maintenance for the equipment.
Cloud computing technologies increase the likelihood of cyber-attacks (Sasubilli & R 2021). In May 2021, a group of hackers shut down the 5500-mile Colonial Pipeline in the US, which caused a loss of about 3 billion. Until security concerns around IoT and cloud computing are fully addressed, oil and gas companies will not fully trust deploying big data technologies in processing facilities.
Companies have to decide on strategies for maintenance based on their production
and organizational work. In the instance of run-to-failure (RTF), companies risk
the failure of systems because they did not maintain them in advance. Preventive
maintenance approaches can cause inefficient replacement of parts. Advantages of
predictive maintenance include better utilization of resources, high equipment uptime, reduced maintenance, and reduced material and labour costs.
The research of Yongyi Ran et al. [4] addresses fault diagnosis and prognosis by applying DL techniques but rarely focuses on optimizing the maintenance strategy. Apart from this, AI technologies can be utilized to automate maintenance activities, which would result in cost savings and a reduction of downtime.
A preconfigured predictive maintenance solution for IoT devices illustrates predicting the point at which failure is likely to occur. The solution combines key Azure IoT Suite services, including an ML workspace with experiments for predicting the Remaining Useful Life (RUL) of an aircraft engine [5, 6].
Applications such as industrial production, intelligent gas sensing [8] and intelligent parking [7–15] can be implemented using machine learning and IoT in the engineering industry. Industrial IoT has enabled factories to become smart [6].
Sony and Talal characterize sensors for health monitoring and frequently find a combination of smart sensors and smart factories, a key part of the Industry 4.0 concept. Lee [6] describes smart sensors used for evaluation.
3 Data Simulation
The data is simulated in two ways; one of them uses an Azure IoT template with the C# programming language. The telemetry data is simulated in C# and sent to Azure IoT Central. In C#, functions are written that specify the maximum and minimum range of the random numbers to be simulated, along with their standard deviation. The telemetry attributes for the oil rod pump are pump rate, time pump on, motor power, motor speed and casing friction. The pump rate is the number of pumping strokes completed each minute, measured in strokes per minute. The time pump on is the number of minutes the pump has been running. The motor power is the electric power the motor draws, measured in kilowatts. The motor speed is the speed at which the motor runs, measured in rotations per minute. The casing friction is the force between the pumped oil and the pump casing, measured in pounds per square inch.
Besides the telemetry, the sensor data attributes include the serial number of the rod pump, the IP address identifying which sensor network the data comes from, and the location of the pump.
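The paper implements this simulator in C#; the sketch below shows the same idea in Python, with assumed attribute ranges (the real limits come from Tables 1, 2, 3, 4 and 5) and hypothetical device metadata.

```python
import random
import time

# (min, max, std-dev) per attribute under normal operation -- assumed values,
# the real limits are listed in Tables 1, 2, 3, 4 and 5
NORMAL_RANGES = {
    "PumpRate": (80.0, 120.0, 5.0),            # strokes per minute
    "TimePumpOn": (0.0, 1440.0, 60.0),         # minutes
    "MotorPowerkW": (150.0, 250.0, 10.0),      # kilowatts
    "MotorSpeed": (600.0, 800.0, 30.0),        # rotations per minute
    "CasingFriction": (1000.0, 1600.0, 50.0),  # pounds per square inch
}

def sample(lo: float, hi: float, std: float) -> float:
    """Draw a value around the range midpoint and clip it to [lo, hi]."""
    return min(max(random.gauss((lo + hi) / 2.0, std), lo), hi)

def telemetry_message(serial_no: str, ip: str, location: str) -> dict:
    """One telemetry record as it would be sent to Azure IoT Central."""
    msg = {name: sample(*rng) for name, rng in NORMAL_RANGES.items()}
    msg.update({"SerialNumber": serial_no, "IPAddress": ip, "Location": location})
    return msg

if __name__ == "__main__":
    for _ in range(3):
        print(telemetry_message("RP-001", "10.0.0.12", "Field-A"))
        time.sleep(1)
```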
4 Data Preprocessing
The simulated data is checked for missing and duplicate values; any missing data points would be imputed by the method of averages. The simulated data is continuous and contains no missing or duplicate values. The simulated data follows a binomial distribution. All three simulated datasets are scaled so that all values are properly normalized.
The time series of the five sensors is visualized under normal conditions, under gradual deterioration until a failure state is reached, and under sudden deterioration into the failure state. The correlation value describes the relationship and interdependence among the variables in the simulated data.
Figure 2 shows the correlation between the data elements, computed with Pearson's correlation coefficient to find the correlation among the variables in the data. The attributes "MotorPowerkW" and "MotorSpeed" are correlated with each other, since the power increases as the speed increases. The other variables do not show much correlation with each other.
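A minimal sketch of the correlation analysis behind Figure 2, using a placeholder data frame in place of the simulated telemetry; the column names are assumptions.

```python
import numpy as np
import pandas as pd

telemetry_cols = ["PumpRate", "TimePumpOn", "MotorPowerkW", "MotorSpeed", "CasingFriction"]
# Placeholder frame standing in for the simulated telemetry data
df = pd.DataFrame(np.random.rand(1000, len(telemetry_cols)), columns=telemetry_cols)

# Pearson correlation matrix among the telemetry attributes (as plotted in Figure 2)
corr = df.corr(method="pearson")
print(corr.round(2))
```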
Data scaling is done by min-max normalization, which shifts and rescales the data points into the range of 0 to 1. This brings the values of all the attributes onto a common scale.
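A minimal sketch of the min-max scaling step, assuming scikit-learn and the same placeholder attribute names as above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

telemetry_cols = ["PumpRate", "TimePumpOn", "MotorPowerkW", "MotorSpeed", "CasingFriction"]
# Placeholder frame standing in for the simulated telemetry data
df = pd.DataFrame(np.random.rand(1000, len(telemetry_cols)) * 100, columns=telemetry_cols)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = pd.DataFrame(scaler.fit_transform(df), columns=telemetry_cols)
print(scaled.describe().loc[["min", "max"]])  # every attribute now spans 0 to 1
```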
The simulated data is analysed using Python. From the analysis, we determine what sort of predictive or anomaly detection algorithm to use on the data, and the failures are predicted using those algorithms. The approach chosen for anomaly detection is based on the deep learning model called the autoencoder.
5.1 Autoencoder
An autoencoder compresses the input x into a latent representation z through its encoding network:

z = σ(Wx + b).

The decoding network reconstructs the input from z using its own weights and bias:

x′ = σ′(W′z + b′).
Figure 3 shows the training and validation losses over the total of 100 training epochs.
Figure 4 is a line chart of the training loss and validation loss. We stop training the model at the point where the validation loss spikes up (approx. 0.001) to avoid overfitting the model.
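A minimal sketch of the autoencoder training described above, assuming a TensorFlow/Keras implementation; the layer widths, batch size and the 16-unit bottleneck are assumptions (the paper does not state them), and random data stands in for the scaled normal-condition telemetry.

```python
import numpy as np
from tensorflow.keras import layers, models

n_features = 5  # PumpRate, TimePumpOn, MotorPowerkW, MotorSpeed, CasingFriction (scaled to 0..1)

def build_autoencoder() -> models.Model:
    inputs = layers.Input(shape=(n_features,))
    hidden = layers.Dense(32, activation="relu")(inputs)               # encoder: z = sigma(Wx + b)
    embedding = layers.Dense(16, activation="relu", name="embedding")(hidden)
    hidden = layers.Dense(32, activation="relu")(embedding)            # decoder: x' = sigma'(W'z + b')
    outputs = layers.Dense(n_features, activation="sigmoid")(hidden)
    return models.Model(inputs, outputs)

autoencoder = build_autoencoder()
autoencoder.compile(optimizer="adam", loss="mse")  # mean squared error loss

# Placeholder for the scaled normal-condition telemetry; 95%/5% train/validation split as in the paper
X_normal = np.random.rand(10000, n_features)
history = autoencoder.fit(X_normal, X_normal, epochs=100, batch_size=64,
                          validation_split=0.05, verbose=0)
```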
A PCA with two components fitted on the embedding vectors explains 99.81% of the entire variation found in the data. Moving to 5 and 10 PCA components gives only a marginal increase in that percentage. We correlate those two components of the embedding vectors with the anomaly flag.
Figure 5 shows the PCA models with 2, 5 and 10 components and their explained variation. The two-component PCA already explains about 99.9% of the variation, so it is chosen.
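A sketch of the PCA variance check on the embedding vectors; here a random array stands in for the bottleneck outputs of the autoencoder, and its 16-dimensional width is an assumption carried over from the sketch above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the embedding vectors produced by the autoencoder bottleneck
embeddings = np.random.rand(10000, 16)

for n in (2, 5, 10):
    pca = PCA(n_components=n).fit(embeddings)
    explained = pca.explained_variance_ratio_.sum() * 100
    print(f"{n} components explain {explained:.2f}% of the variation")
```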
6 Tools Description
6.1 Python
Python is used to program the machine learning and deep learning models for anomaly detection; autoencoders are the deep learning algorithm used. Autoencoders help reproduce the output from the given input while reducing the noise in the data. Databricks notebooks are used as the IDE for Python.
6.2 C-SHARP or C#
C# (C-sharp) is used for simulating data to the Azure IoT Hub. C# provides a set of easy-to-understand samples, with seamless and continuous testing, for connecting to the Azure IoT Hub. Visual Studio Code is used as the IDE for C#.
6.3 Azure IoT Central
Azure IoT Central is an IoT application platform as a service used for the easy creation of IoT solutions. Here, Azure IoT Central is the crucial resource for data simulation. It is used for handling data from the remote devices, storing the data, creating data schemas, machine control and remote communication.
6.4 Azure Databricks
Databricks helps users store, clean and visualize large amounts of data from distributed sources. Azure Databricks also provides ETL functions for extraction, querying and creating multiple visualizations. Since Databricks is directly connected to the cloud, it provides easy connectivity to the container instances and Azure function apps.
6.5 Azure Kubernetes Service
Azure Kubernetes Service is Azure's container service, which helps in deploying, containerizing and managing applications. Here, the Azure Kubernetes Service is used to create a container instance in which the Databricks code runs. The image is deployed by specifying the operating system and memory. The container makes the model available to the function app.
6.6 Azure Event Hub
Azure Event Hub is a cloud data streaming platform used as an event ingestion service. It is similar to Kafka, except that the Event Hub is fully managed by the cloud. The Event Hub is used to ingest data into the Azure function app for the model to process and output the results.
6.7 Azure Function App
The Azure function app runs serverless code automatically when triggered with data. The data from the Event Hub is ingested into the function app. The function app uses the anomaly detection model saved in the Kubernetes image. As new data arrives, the model in the Kubernetes image is retrained on it and the updated model is passed to the function app. The results from the model are stored in an Azure storage blob.
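A hedged sketch of how such an Event Hub-triggered function could look in Python; the binding names, the score helper and the failure condition are assumptions, not the paper's code, and the bindings themselves would be declared in the function's function.json.

```python
import json
import logging
import azure.functions as func

def score(record: dict) -> bool:
    """Placeholder anomaly scorer; the real call goes to the deployed autoencoder model."""
    return record.get("MotorPowerkW", 0.0) < 50.0  # assumed failure condition for illustration

def main(event: func.EventHubEvent, alertqueue: func.Out[str]) -> None:
    """Scores one telemetry message and queues a repair alert when an anomaly is predicted."""
    record = json.loads(event.get_body().decode("utf-8"))

    if score(record):  # True/1 means the pump is predicted to fail
        alert = f"Device {record.get('SerialNumber')} is likely to fail - schedule repair."
        alertqueue.set(alert)  # queue output binding feeding the notification queue
        logging.info("Alert queued: %s", alert)
```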
6.8 Microsoft Power Automate
Microsoft Power Automate, or Microsoft Flow, is used for automating workflows. It is popularly used for automating emails, collecting data and getting notifications. The function app results stored in the blob are used by Power Automate to automatically send repair alerts to the respective members.
Using the tools stated above, the following modules are to be set up for the process
to run.
Figure 6 depicts the complete process flow of our entire system.
The Azure account is used to get access to Azure services and Azure subscriptions.
All the operations stated can be performed using Azure Cloud Services. The resource group includes all the resources for the solution provided by the system.
While creating the resources (Azure IoT Hub, Azure storage account, Azure Databricks, Azure Kubernetes, Azure Function App and Azure Event Hub), the subscription each belongs to, the pricing tier to be used, the network it should operate on and the maximum capacity must be configured.
Since real data from IoT sensors is not available, the data is simulated using industry-standard ranges and measures with programming or simulation tools. By specifying the maximum and minimum limits, the frequency distribution and its range, the random numbers for the simulation are generated with a C# program running locally and connected to Azure IoT Central. The maximum capacity of each attribute is set to a random high limit and can be changed in the user settings. The IoT sensor data is not stored in a database; it is simulated each time and sent to the Azure containers that run the models.
Figure 7 shows the simulated attribute data for the oil rod pump under normal conditions. Tables 1, 2, 3, 4 and 5 below list the normal and failure data ranges for the oil rod pump attributes.
Table 1 shows the properties and limits to generate random numbers for the
attribute Motor Power under normal and failure states.
Table 2 shows the properties and limits to generate random numbers for the
attribute Motor Speed under normal and failure states.
Table 3 shows the properties and limits to generate random numbers for the
attribute Pump rate under normal and failure states.
Table 4 shows the properties and limits to generate random numbers for the
attribute Casing friction under normal and failure states.
Table 5 shows the properties and limits to generate random numbers for the
attribute Pump Time under normal and failure states.
Figure 8 shows the simulated attribute data for the oil rod pump under gradual failure conditions; Tables 1, 2, 3, 4 and 5 list the corresponding normal and failure data ranges. After 5000 normal-state simulations of each attribute, the data changes gradually to a failed state.
Figure 9 shows the simulated attribute data for the oil rod pump under immediate failure conditions; Tables 1, 2, 3, 4 and 5 list the corresponding normal and failure data ranges. After 5000 normal-state simulations of each attribute, the data changes to an immediately failed state.
The trained model is registered, a container image is created for the web service that uses it, and that image is deployed onto an Azure Container Instance. The logic the web service needs to load the model and use it for scoring is saved to a file so that the Azure Machine Learning service SDK can deploy it.
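A hedged sketch of this deployment with the Azure Machine Learning SDK (v1); the workspace configuration, model file name, scoring script and service name are assumptions.

```python
from azureml.core import Model, Workspace
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()  # assumes a local config.json describing the workspace

# Register the saved autoencoder artifact (file name assumed)
model = Model.register(workspace=ws, model_path="autoencoder.h5",
                       model_name="rodpump-autoencoder")

# score.py (assumed) implements init() and run(raw_data) to load the model and return 0/1;
# an Environment listing the required packages would normally be supplied as well
inference_config = InferenceConfig(entry_script="score.py")

aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service = Model.deploy(ws, "rodpump-anomaly-service", [model], inference_config, aci_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```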
Figure 10 shows the deployment of the model; it is tested by giving anomalous values, and the result is verified.
The Azure Event Hub acts as a data exporter, or data streaming platform, for the data present in Azure IoT Central. The generated data is exported into the Kubernetes containers through the Event Hub, a big data streaming service. The Kubernetes containers hold the deep learning model, so as the data is simulated, it is continuously exported into the deep learning model.
The function app in Azure is used to run serverless code, which here is the anomaly detection code present in the containers. The result from the anomaly detection code is 0 or 1, indicating whether the pump is going to fail. Based on this indication, the function app sends a message to the notification queue stating which device is going to fail.
Figure 11 shows the Azure Function App streaming logs. The highlighted text indicates the device repair email which is sent into the notification queue from the Azure function app.
8 Performance Measures
The mean squared error is used as the loss function of the model as the epochs advance. The model is validated using a 95%/5% split between training and validation. The mean absolute error (MAE) is calculated between the predicted and actual values of the model.
The validation loss spikes up after 0.01, so the mean absolute error (MAE) threshold is set to 0.01. The difference between the actual and predicted values is observed.
Figure 12 shows a histogram of the result (the MAE loss), which helps identify a reasonable value that characterizes "normal conditions". From the right end of the bell shape, 0.01 can safely be taken as a good value for the threshold.
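A minimal sketch of turning the reconstruction error into the 0/1 anomaly flag using the 0.01 threshold; the autoencoder and datasets from the earlier sketches are assumed.

```python
import numpy as np

THRESHOLD = 0.01  # MAE value above which a sample is treated as anomalous

def anomaly_flags(model, X: np.ndarray) -> np.ndarray:
    """Return 1 for anomalous samples and 0 for normal ones, based on reconstruction MAE."""
    reconstructed = model.predict(X, verbose=0)
    mae = np.mean(np.abs(X - reconstructed), axis=1)  # per-sample mean absolute error
    return (mae > THRESHOLD).astype(int)

# Usage (assumes the trained `autoencoder` and the scaled failure datasets):
# flags_gradual = anomaly_flags(autoencoder, X_gradual_failure)
# flags_immediate = anomaly_flags(autoencoder, X_immediate_failure)
```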
Figure 13 shows two graphs for the two datasets (one containing gradual failure and one containing immediate failure) after running them through the full model to get the predicted values.
9 Results
The simulated data is charted as dashboards in Azure IoT Central, and the team can switch the rod pumps on and off from there. The dashboard contains the simulated flows of all the pumps, the pump on time, the average value of every attribute from each pump and the pump location. The failure of a pump is indicated by sending alert messages into the notification queue created in Azure Storage.
9.1 Dashboard
Figure 14 shows the line chart in the dashboard containing the simulated data of three rod pumps under normal, gradual failure and immediate failure conditions, with the data attributes casing friction, motor power, motor speed and pump rate for each pump.
Figure 15 shows a pie chart indicating the average percentage of time each pump is on and a table containing the average values of casing friction, motor power, motor speed and pump rate for each pump. Whenever the value of an attribute drops below the minimum value for normal conditions, the value is shown in red text.
Fig. 14 Dashboard—line chart of casing friction, motor power, motor speed and pump rate for
each pump
The notification service is facilitated by Azure storage queues, which receive the message from the Azure function app. The message indicates which device is going to fail and is pushed as an email to the respective members. As new messages come in, the queue forwards them.
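A hedged sketch of pushing one alert into the storage queue that the Power Automate flow reads; the connection string and queue name are placeholders.

```python
from azure.storage.queue import QueueClient

CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."  # placeholder
QUEUE_NAME = "repair-notifications"  # assumed queue name

queue = QueueClient.from_connection_string(CONNECTION_STRING, QUEUE_NAME)
queue.send_message("Rod pump RP-001 is predicted to fail - maintenance required.")
```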
10 Gap Analysis
Instead of fixing the design of the system, the gap analysis focuses on fixing the maintenance strategy, based on continuous monitoring, by predicting system failures and helping in failure management. It identifies the technology or design gaps that prevent the implementation of the maintenance strategy. The goal of the gap analysis is to establish the objectives of the system and analyse where the system currently stands and which objectives have been met. The first step is to identify the symptoms and patterns of the failure model. The predictive gap analysis also captures the breakdown of failure modes and alerts the user with a mail showing the constraints and their performance metrics. The system automates decision-making through predictive maintenance strategies rather than optimizing around failures.
11 Conclusion
The data from rod pumps are simulated using C# under three conditions: normal
running condition, immediately failing condition and gradually failing condition.
The simulated normal condition data were able to train the autoencoder model to
detect anomalous data. The dashboard displays the average motor speed, pump rate, casing friction and motor power, and the pie chart shows the total run time of each device. Along with that, the KPIs indicate whether the data values of the immediately and gradually failing pumps are below the normal conditions. The simulated data is
exported out into the Azure function app using the event hub. The code to detect
anomalies by autoencoders is written into Azure Databricks and deployed into the
Azure Machine Learning Studio. The Azure function app is created to run the Azure container and is triggered by the data exported from the event hub. When triggered by the simulated data export, the function app runs the deployed container and sends a notification message to the Azure storage queues. The
message shows which pump is about to fail and needs maintenance. This setup can
be configured for any remote maintenance device to predict anomalies for predictive
maintenance. The proposed system can predict problems such as unexpected machine downtime and alert the repair team in advance. It can be incorporated into any manufacturing equipment that is fitted with an IoT sensor and connected to the cloud. The system automates maintenance through deep learning models and sends alert messages to the repair team, whereas conventional methods require manual labour to visit the site frequently, wasting money, time and manpower.
References
13. He Y, Gu C, Chen Z, Han X (2019) Integrated predictive maintenance strategy for manufacturing
systems by combining quality control and mission reliability analysis. Int J Prod Res 55(19)
14. Nadj M, Jegadeesan H, Maedche A, Hoffmann D, Erdmann P (2021) A situation awareness
driven design for predictive maintenance systems: the case of oil and gas pipeline operations.
In ECIS
15. Zheng B, Gao X, Li X (2019) Fault detection for sucker rod pump based on motor power