Neuroscience-Inspired Algorithms for the Predictive Maintenance of Manufacturing Systems
Abstract: If machine failures can be detected preemptively, then maintenance and repairs can be performed more efficiently, reducing production costs. Many machine learning techniques for performing early failure detection using vibration data have been proposed; however, these methods are often power- and data-hungry, susceptible to noise, and require large amounts of data preprocessing. Also, training is usually performed only once before inference, so they do not learn and adapt as the machine ages. In this article, we propose a method of performing online, real-time anomaly detection for predictive maintenance using hierarchical temporal memory (HTM). Inspired by the human neocortex, HTMs learn and adapt continuously and are robust to noise. Using the Numenta Anomaly Benchmark, we empirically demonstrate that our approach outperforms state-of-the-art algorithms at preemptively detecting real-world cases of bearing failures and simulated 3-D printer failures. Our approach achieves an average score of 64.71, surpassing state-of-the-art deep-learning (49.38) and statistical (61.06) methods.
Index Terms: Anomaly detection, hierarchical temporal memory (HTM), predictive maintenance (PM), prognostics.
Manuscript received June 11, 2020; revised September 21, 2020 and January 4, 2021; accepted February 16, 2021. Date of publication February 25, 2021; date of current version August 20, 2021. This work was supported in part by the National Science Foundation (NSF) under Award CMMI-1739503 and Award ECCS-1839429 and in part by the Graduate Assistance in Areas of National Need (GAANN) under Award P200A180052. Paper no. TII-20-2849. (Corresponding author: Arnav V. Malawade.)
The authors are with the Department of Electrical Engineering and Computer Science, University of California-Irvine, Irvine, CA 92697 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3062030.
Digital Object Identifier 10.1109/TII.2021.3062030
I. INTRODUCTION
Predictive maintenance (PM) is an emerging paradigm in manufacturing where symptoms of machine degradation are detected before failures occur. It is a major part of the Industry 4.0 and smart manufacturing vision. Using sensor readings, process parameters, and other operational characteristics, PM can help maximize tool life by reducing the number of unnecessary repairs performed while also reducing the likelihood of unexpected failures [1]. In the United States alone, improper maintenance and the resulting outages cost more than 60 billion dollars per year [2]. Thus, smart data-driven paradigms such as PM have the potential to reduce industrial production costs significantly.
Recently, many statistical, machine learning (ML), and deep learning (DL) techniques for PM have been proposed. However, these methods are not without their shortcomings: statistical methods require extensive domain knowledge and often do not generalize well to more complex use cases, while DL and ML techniques often require large amounts of training data and are susceptible to increased error as machines age over time. Furthermore, ML and DL algorithms are highly susceptible to noise, making them insufficiently robust for industrial settings without data preprocessing. Due to the high noise level and diversity among industrial systems, PM models that do not require significant preprocessing or domain knowledge are considered more practical [3].
To overcome these issues, we propose the use of a learning algorithm inspired by neuroscience called hierarchical temporal memory (HTM), pioneered by Hawkins and Blakeslee [4]. Using binary sparse distributed representations (SDRs) to represent data and an architecture incorporating feed-forward, lateral, and feedback connections, HTMs emulate the interactions between pyramidal neurons in the neocortex. HTMs are online learning algorithms that require less application-specific tuning, are robust to noise, and adapt to variations in the data as they continuously learn. In practice, this means HTMs can efficiently learn from a single training pass over small training datasets with little to no hyperparameter tuning. These characteristics also enable HTMs to learn in near real-time. For these reasons, they are suitable for practical applications such as detecting early symptoms of failure in manufacturing equipment. In this work, we demonstrate the effectiveness of an HTM-based anomaly detection methodology at detecting these symptoms in roller-element bearings and 3-D printers.
A. Related Work
We focus on the specific task of PM on roller-element bearings due to their broad application and utility in manufacturing. We also evaluate additive manufacturing (AM) as it is a modern technique that presents unique challenges due to the dynamics
1551-3203 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Tsinghua University. Downloaded on January 06,2023 at 15:13:26 UTC from IEEE Xplore. Restrictions apply.
V. MALAWADE et al.: NEUROSCIENCE-INSPIRED ALGORITHMS 7981
of 3-D printers. Here, we briefly discuss works related to PM for roller bearings and additive manufacturing.
Many PM methods use statistical models due to their simplicity and explainability. These approaches rely on extracted time and frequency domain features. For example, the energy entropy mean and root mean squared (rms) values of wavelets were used to diagnose ball bearing faults in [5]. In another example, the spectral kurtosis (SK) of vibration and current signals was used to detect and classify the surface roughness of ball bearings in [6]. Using a particle filter method, Zhang et al. [7] performed fault detection on bearings similar to those found in helicopter oil cooler fans.
In addition to statistical methods, ML techniques have been applied to a wide array of industrial prognosis tasks. One such method, the autoregressive integrated moving average (ARIMA), is among the most popular techniques for time-series forecasting and was used to predict failures and identify quality defects in a slitting machine in [8]. In another approach, Tobon-Mejia et al. [9] used mixture-of-Gaussians hidden Markov models (HMMs) and wavelet packet decomposition to estimate the remaining useful life (RUL) of roller-element bearings.
DL methods such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs) have also been used extensively for PM. In one example, Feng et al. [10] used an LSTM for detecting anomalies in industrial control systems. Additionally, an RNN-LSTM was used to perform PM on an air booster compressor motor used in oil and gas equipment in [11].
Due to the increased complexity and relatively late adoption of AM systems, PM techniques for AM have not been studied in great detail. Proposed approaches often draw from research in related applications, such as PM for bearings. For example, Yoon et al. [12] evaluated the feasibility of AM equipment fault diagnosis using a piezoelectric strain sensor and an acoustic sensor. In that work, features such as rms value, kurtosis, skewness, and crest factor were used to detect faults. DL has also been used for AM anomaly detection, such as in [13], where a neural network was used to classify faults in 3-D printer vibration data.
Despite the proliferation of statistical, ML, and DL approaches to PM for manufacturing, to the best of our knowledge, no HTM-based solutions have been proposed. However, the structural and temporal properties of HTM algorithms allow them to excel at cross-domain tasks that apply to manufacturing, such as anomaly detection [14]. Since the core objective of PM in manufacturing is detecting early symptoms of part failure, HTMs are a natural candidate for this task. HTMs were shown to match or surpass neural networks at detecting and classifying foreign materials on a conveyor belt in a cigarette manufacturing plant [15]. HTMs have also proven effective at detecting anomalies in crowd movements [16], traffic patterns [17], human vital signs [18], electrical grids [19], and computer hardware [20].
B. Research Challenges
Overall, PM for manufacturing presents the following key research challenges:
1) Identifying time-series anomalies in near real-time despite ambient noise.
2) Learning efficiently from small training datasets to improve applicability to practical use cases.
3) Developing a solution that can be generalized to many heterogeneous manufacturing systems without requiring extensive domain-specific tuning.
4) Adapting to changes in data statistics (i.e., machine aging).
Despite the successes achieved by existing methods in the aforementioned applications, industrial manufacturing systems are diverse and complex, making it difficult to find solutions that generalize across applications. Consequently, PM systems require specialization, which necessitates specialized knowledge and cross-domain skills. This is especially true in the case of bearing-failure prognosis, as bearing design and lifetime management lie squarely in the mechanical and materials engineering domains.
It is difficult for any single technique to address all these research challenges effectively. For example, statistical methods such as thresholding based on kurtosis or spectral analysis are highly efficient and real-time capable but require explicitly defined health indicators and thresholds, which are machine- and application-specific. Also, stationary methods including rms, kurtosis, and crest factor are only effective for stationary signals (signals with time-invariant statistical properties), but bearing vibration signals are generally cyclostationary (statistical properties vary cyclically) or nonstationary (statistical properties change depending on speed and load conditions) [21]. Spectral kurtosis is applicable to nonstationary and nonperiodic signals but is sensitive to noise and outliers [22].
Classical ML algorithms such as AR models, support vector machines, hidden Markov models (HMMs), random forests, and k-nearest neighbors have been demonstrated for PM in existing work, but require the extraction of explicit health indicators (features) from data [23]. These algorithms also require application-specific hyperparameter tuning, data preprocessing as they have poor noise robustness [3], and regular updates of model settings as they do not adapt to account for machine aging [23]. Moreover, both HMM and AR methods are ineffective on nonstationary signals [21].
In DL algorithms such as neural networks and LSTMs, health indicators can be learned implicitly by the network. However, a network trained for one machine cannot generalize to a new machine without retraining with a large amount of data for hundreds or thousands of epochs. Larger models may be able to generalize better, but the complexity of training and optimizing these models increases drastically with size [23]. This domain-specific training and tuning process can be expensive, time-consuming, and impractical for real-world use cases. Like the ML methods, DL algorithms also have poor noise robustness [24] and require high-quality data, or else performance can suffer significantly [3]. To address this, significant preprocessing steps are often needed to generate clean data for these models [3].
As stated in Section I-A, HTM-based anomaly detection methods have demonstrated success in several distinct fields. However, to the best of our knowledge, no prior work has
7982 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 12, DECEMBER 2021
Fig. 1. How neocortical structures are modeled by HTM. The neocortex is composed of a large number of interconnected pyramidal neurons,
each with proximal (feed-forward), apical (feedback), and distal (lateral) dendrites to connect to other neurons. These relations are modeled in HTM
neurons as feed-forward, feedback, and lateral connections.
comprehensively explored HTM's ability to model vibration data or demonstrated its practical value for PM. Overall, all of these existing methods fall short of addressing one or more research challenges.
C. Our Novel Contributions
To address these key research challenges and improve on the PM performance demonstrated by previous works, our article presents the following contributions:
1) We demonstrate the ability of HTM-based anomaly detectors to detect early symptoms of bearing failure in several months' worth of real-world vibration data. We show that HTMs can learn efficiently with only a single training pass.
2) We demonstrate the ability of HTMs to generalize across applications without much fine-tuning and their ability to continuously learn and adapt by evaluating their anomaly detection performance on a second, highly dynamic application: 3-D printer vibration data. These characteristics of HTMs make them more practical for real-world use cases.
3) We compare the performance of HTM anomaly detection methods against state-of-the-art anomaly detection techniques and traditional machine prognosis methods such as condition-based maintenance. Specifically, we evaluate each algorithm's anomaly detection accuracy and robustness to noise.
4) We demonstrate the efficiency and real-time capability of HTM-based prognosis by comparing its execution time with that of the other techniques.
II. BACKGROUND THEORY
A. Hierarchical Temporal Memory
HTM is a sequence learning framework modeled after the structure of the neocortex in the human brain [4].
The basic unit of HTM is a neuron modeled after those present in the neocortex [Fig. 1(b)]. These neurons are stacked on top of one another to form a column like the "cortical column" of the neocortex. The final HTM is a composition of many such columns. A single HTM neuron [Fig. 1(c)] is connected to two types of segments: 1) proximal segments (aggregation of feed-forward connections from the input) and 2) distal segments (aggregation of lateral connections from neurons of the other columns). Each HTM neuron can be in one of three states: 1) inactive (the default state), 2) predictive, and 3) active. The predictive state of a neuron is determined by the activity of the distal segments, which in turn is determined by the activation state of the other neurons. A neuron becomes active at any time only if it was in the predictive state at the previous instant, with an exception that will be described in Section III-A. When the sequences of activations are viewed temporally, it is easy to see that the distal segments provide the temporal context for activation and thus capture the temporal relations. The column structure augments this capability of HTM by enabling it to store multiple such overlapping temporal sequences. Further details on the HTM-based anomaly detection methodology are discussed in Section III-A.
B. PM of Roller-Element Bearings
Roller-element bearings perform the critical task of reducing friction between rotating parts in machinery. Generally, catastrophic bearing failures present warning signs such as anomalous vibrations and/or noise. These anomalies can occur due to environmental factors (moisture or debris entering the bearing) as well as installation errors (misalignment, excessive loads, or poor/improper lubrication) [25]. Recently, sensor-based techniques that leverage vibration and temperature data to monitor bearing health have been proposed. For example, the NASA Bearing Dataset and the Pronostia Bearing Dataset contain vibration and temperature data for several bearings which were run until failure [26], [27]. In both datasets, anomalies in the vibration and temperature signals increase in size and frequency as the bearings approach failure, showing a strong correlation between the sensors' readings and system state.
C. PM of 3-D Printers
3-D printing is a manufacturing process where a physical object is constructed from layers of material in an iterative process. Fused deposition modeling (FDM) is a standard technique where melted thermoplastic is extruded through a moving print head nozzle to build each layer. To ensure precision, stepper motors control the extrusion rate of the nozzle as well as the X-, Y-, and Z-axis movement of the print head. Since the motors, bearings,
Fig. 2. HTM anomaly detection framework. The time-series input $X(t)$ is encoded into an SDR. This information is passed through a spatial pooler and a temporal pooler before outputting a prediction $\Pi(t_{n+1})$ for the next set of column activations. The prediction error between $\Pi(t_n)$ and $A(t_n)$ and the historical distribution of anomaly scores are used to determine the anomaly likelihood $L(t_n)$.
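The last two stages of the framework in Fig. 2 can be illustrated with a minimal sketch. It assumes the general form of the raw anomaly score and anomaly likelihood published for HTM detectors in [14]: the prediction error is the fraction of active columns that were not predicted, and the likelihood compares a short-term mean of that error against its long-term distribution using a Gaussian tail probability. The function and class names, the window sizes, and the toy column sets are hypothetical, and this is not necessarily either of the likelihood blocks evaluated in this article.

```python
import math
from collections import deque


def prediction_error(predicted_cols, active_cols):
    """Fraction of currently active columns that were NOT predicted.

    Both arguments are sets of column indices. Returns 0.0 when every
    active column was predicted and 1.0 when none was.
    """
    if not active_cols:
        return 0.0
    overlap = len(predicted_cols & active_cols)
    return 1.0 - overlap / len(active_cols)


def gaussian_tail(z):
    """Q-function: P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


class AnomalyLikelihood:
    """Compares the recent mean error with the long-term error distribution.

    The window sizes are illustrative assumptions, not values from the article.
    """

    def __init__(self, long_window=100, short_window=5):
        self.history = deque(maxlen=long_window)
        self.short_window = short_window

    def update(self, error):
        self.history.append(error)
        scores = list(self.history)
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)
        std = max(math.sqrt(var), 1e-6)  # floor avoids division by zero
        recent = scores[-self.short_window:]
        short_mean = sum(recent) / len(recent)
        return 1.0 - gaussian_tail((short_mean - mean) / std)


# Toy usage: 50 well-predicted steps followed by one unexpected activation.
detector = AnomalyLikelihood()
for _ in range(50):
    steady = detector.update(prediction_error({1, 2, 3, 4}, {1, 2, 3, 4}))
spike = detector.update(prediction_error({1, 2}, {7, 8, 9}))
```

In this toy run the likelihood stays at its neutral value while predictions match and rises after the unexpected activation. A detector would flag the input once the likelihood exceeds a chosen threshold, which is tuned per scoring profile in the article.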
and belts are moving parts, they are prone to wear and must be regularly maintained to prevent component failures. As shown in [28], these components leak vibration information that can be used by PM systems. However, this leaked information is nonstationary since 3-D printers move on multiple axes and change direction and speed often, presenting a challenge for conventional PM methods.
III. METHODOLOGY
A. Anomaly Detection Using Hierarchical Temporal Memory
The end-to-end framework for the HTM-based detector is shown in Fig. 2. Our methodology for anomaly detection consists of the following steps. First, the time-series vibration data $X(t)$ is taken as input and encoded into a sparse distributed representation (SDR). Next, the SDR is passed through the spatial pooler. The spatial pooler's output is fed into the temporal pooler, which then outputs a prediction for the next activation $\Pi(t_{n+1})$. Simultaneously, the prediction from the previous time step $\Pi(t_n)$ is compared with the column activations in the current time step $A(t_n)$ to give a prediction error value; a high error value indicates that this activation was not expected and may be anomalous. Finally, the anomaly detector uses the historical distribution of anomaly scores to calculate the anomaly likelihood $L(t_n)$ for the current data point based on the prediction error value; if $L(t_n)$ exceeds a set threshold, then $X(t_n)$ is flagged as an anomaly. In the following paragraphs, we describe each of these components in detail.
1) Encoder: The first stage in processing the input data $X(t)$ is the encoder. The encoder converts the incoming data point $X(t)$ into an SDR. This representation is a vector of binary values, and it is sparse because only 2% of the bits are activated for any input. This contrasts with DL methods that store and learn a dense, distributed representation. Later, we shall describe the advantages of using a sparse representation. We denote the output of the encoder by $x$, a $1 \times n$ vector.
2) Spatial Pooling: The second stage is spatial pooling. The spatial pooler identifies spatial relations between different regions of the encoder's output through the proximal connections. Spatial poolers can also be stacked to identify more complex relations. The neurons of the same column share the same proximal segment, and this segment is initialized such that each neuron is connected to a large fraction of the inputs (50%). The output of this stage is also an SDR representing the columns of the HTM that will be activated in the final output. We denote the spatial pooling operation mathematically by $I_k(\cdot)$, where the input is the list of columns ordered in decreasing order of their proximal segment values, and $k$ indicates the number of columns to be picked for activation from the top of this list. The number $k$ is typically set to the top 2% of columns, so the output representation is sparse. Let $y_c$ denote the activation of the columns and $P$ denote the proximal connections, where $P$ is a binary matrix of size $n \times N$. Then
$$y_c = I_k(xP). \tag{1}$$
3) Prediction: The next stage is prediction. The prediction for the next time step is the predictive state of the HTM at the end of the current time step. Let the weights of the lateral connections of the $d$th distal segment of the $i$th neuron of the $j$th column be $D^d_{i,j}$. We note that only those weights of connections that are above a certain threshold are considered to be established and the rest are set to zero. A neuron $(i, j)$ enters the predictive state provided the sum of activations of at least one of the distal segments exceeds a certain threshold, $\theta_d$. Denote the predictive state of a neuron at time $t_n$ by $\pi_{i,j}(t_n)$. We denote the current activation state of all neurons at time $t_n$ by $A(t_n)$. We denote the total predictive state by the matrix $\Pi(t_n)$, whose elements are therefore $\pi_{i,j}(t_n)$. Mathematically, $\pi_{i,j}(t_n)$ is given by
$$\pi_{i,j}(t_n) = \begin{cases} 1, & \text{if } \exists\, d \text{ s.t. } \lVert D^d_{i,j} \odot A(t_n) \rVert_1 > \theta_d \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
where $\odot$ denotes the element-wise multiplication operation.
4) Temporal Pooling: The final stage is temporal pooling. Temporal pooling computes the activation state $A(t_n)$ of the HTM (an $M \times N$ matrix where $M$ is the number of neurons per mini-column and $N$ is the number of mini-columns in the layer), which is also the output of the HTM based on a temporal context. A neuron $i$ is activated provided its column is activated, i.e., $y_c(j) = 1$, and provided it is in the predictive state, i.e., $\pi_{i,j}(t_{n-1}) = 1$. The other neurons in this column are inhibited. If none of the neurons in an active column are in the predictive state, then all the neurons of this column are activated. Here, the predictive state $\pi_{i,j}(t_{n-1})$ from the previous time step is the temporal context. This temporal context is updated at the end of this time step as described in the prediction step above. Let $a_{i,j}(t_n)$ be the $(i, j)$th element of $A(t_n)$, denoting the activation state of neuron $i$ in column $j$. Then, the temporal pooling operation
Fig. 3. Accelerometer data from Test 2 of the NASA dataset [26]. Symptoms of bearing failure can be seen on 2/17 and 2/18 before the bearing’s
outer race failed on 2/19.
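For context on the statistical baselines discussed in Section I, the following is a minimal sketch of three classical condition indicators (rms, kurtosis, and crest factor) computed over one window of vibration samples such as those plotted in Fig. 3. The synthetic sine signal and the injected impulses are assumptions made purely for illustration; they are not samples from the NASA dataset, and the helper names are hypothetical.

```python
import math


def rms(window):
    """Root mean squared value of the samples in the window."""
    return math.sqrt(sum(v * v for v in window) / len(window))


def kurtosis(window):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(window)
    mean = sum(window) / n
    m2 = sum((v - mean) ** 2 for v in window) / n
    m4 = sum((v - mean) ** 4 for v in window) / n
    return m4 / (m2 * m2)


def crest_factor(window):
    """Peak absolute amplitude divided by the rms value."""
    return max(abs(v) for v in window) / rms(window)


# Synthetic "healthy" window: five full periods of a unit-amplitude sine.
N = 1000
healthy = [math.sin(2.0 * math.pi * 5.0 * i / N) for i in range(N)]

# Synthetic "faulty" window: the same sine with periodic impulses added,
# a rough stand-in for the impacts produced by a damaged bearing race.
faulty = [v + (5.0 if i % 100 == 0 else 0.0) for i, v in enumerate(healthy)]

indicators = {
    "healthy": (rms(healthy), kurtosis(healthy), crest_factor(healthy)),
    "faulty": (rms(faulty), kurtosis(faulty), crest_factor(faulty)),
}
```

Impulsive content raises kurtosis and crest factor sharply, which is why these indicators are common in threshold-based bearing monitoring. As the article notes, however, the thresholds are machine- and application-specific and the indicators assume stationary signals.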
TABLE I
NORMALIZED NAB SCORES FOR ANOMALY DETECTION ON THE BEARING FAILURE DATASET
IV. RESULTS
A. Roller Bearing Anomaly Detection
Table I shows the NAB results for the selected algorithms on the labeled bearing failure dataset as well as the total running time of each algorithm. The runtime was recorded over the complete dataset using a PC with an Intel Core i7-7700K processor. As shown in Table I, TM-HTM+HD achieved the highest anomaly detection score for the Standard and Low FN profiles while HTM+LP achieved the highest score for the Low FP profile. TM-HTM+HD scored 67.05, 73.33, and 56.57 for the Standard, Low FN, and Low FP profiles, respectively. The approach that scored closest to HTM was Windowed Gaussian, which achieved scores of 64.70, 70.50, and 57.35 for the same profiles, respectively. HTM and HTM+LP performed better than TM-HTM and TM-HTM+LP, indicating that TM-HTM's implementation only works well with the HD anomaly likelihood block.
As expected, the statistical methods (Windowed Gaussian, threshold-based, relative entropy) processed the dataset faster than the DL, ML, and HTM-based methods, albeit with lower performance. The HTMs using HD were 1.41x slower than the HTMs with no anomaly likelihood and 3.76x faster than the HTMs using LP on average. TM-HTM+HD processed the dataset 8.3x faster than LSTM.
To evaluate the qualitative performance of each anomaly detector, we plotted the anomaly scores over time for each detector for Test 1 of the Pronostia bearing dataset and compared them to the labeled ground truth anomaly windows in Fig. 6.
B. 3-D Printer Anomaly Detection
Table II shows our experimental results for the 3-D printer dataset. HTM+HD achieved the highest score on the Low FN profile while LSTM achieved the highest score on the Standard and Low FP profiles. HTM+HD achieved scores of 63.03, 73.18, and 42.23 for the Standard, Low FN, and Low FP scoring profiles, respectively. LSTM scored 64.76, 71.43, and 51.34 on the same profiles, respectively. On both applications, the HTM, TM-HTM, and TM-HTM+LP detectors performed worse than
to the input pattern, making them robust to noise. From the figure, it is also clear that the HTM implementations using anomaly likelihood blocks were more robust to noise outside of the anomaly windows than the HTM or TM-HTM alone. This is likely because the anomaly likelihood components filter out smaller detections to isolate only the most plausible anomalies. The HTM+HD and TM-HTM+HD detected anomalies earlier than the other configurations, albeit with slightly more false positives. The outputs of the different HTMs starkly contrast with the highly variable anomaly score outputs of Windowed Gaussian, EXPoSE, KNN-CAD, and BC, among others. These detectors record high anomaly scores even when there is relatively low noise in the input, meaning that they will likely suffer from false positives at higher noise levels.
A detector's threshold can be tuned to account for higher noise levels; however, for detectors such as Windowed Gaussian, which used the maximum detection threshold of 1.0, the threshold cannot be increased further to reduce its sensitivity. In contrast, TM-HTM+HD used a threshold of 0.5497 on the Standard profile. Thus, although Windowed Gaussian outperformed TM-HTM+HD on the Low FP scoring profile, it lacks tunability and will likely perform much worse than this HTM configuration in noisier environments.
LSTM appears to have good robustness to noise, as shown in Fig. 6. However, it is clear from the figure that it missed some of the earlier anomaly windows completely. In the context of PM, this can mean that an observer will only be warned of degradation later and will not have much time to organize repairs. Overall, our methodology demonstrates significant noise robustness, better tunability, and the ability to detect early anomalies as well as larger, late-stage anomalies.
in the system. As shown by our results, the industry-standard LSTM requires a significant amount of time for training (over 1000 epochs) as well as application-specific tuning. In contrast, HTMs do not require any application-specific parameter tuning and are essentially plug-and-play since they only need to be trained with a single pass on normal sensor data. These characteristics make HTMs an extremely viable, out-of-the-box solution for industrial PM.
VI. CONCLUSION
Existing methods for predicting machine failures from sensor data are limited in their practicality due to shortcomings including poor noise resistance, efficiency, and adaptability. Our experiments demonstrated that our methodology outperforms state-of-the-art approaches at detecting anomalies in both bearing and 3-D printer failure data with minimal to no preprocessing or application-specific tuning. On the Standard scoring profile, our methodology using HD anomaly likelihood achieved an average NAB score of 64.71. In comparison, the other top algorithms, LSTM and Windowed Gaussian, achieved average scores of 49.38 and 61.06, respectively. Furthermore, our qualitative results showed that our methodology was significantly more noise-resistant than the Windowed Gaussian, KNN-CAD, EXPoSE, and BC detectors, which we attribute to the use of SDRs and an anomaly likelihood component. We also demonstrated that our methodology was real-time capable, with an execution time on the same order of magnitude as state-of-the-art methods. Consequently, we conclude that HTM-based anomaly detection is a novel, practical solution for a wide range of industrial PM applications.
[8] A. Kanawaday and A. Sane, "Machine learning for predictive maintenance of industrial machines using IoT sensor data," in Proc. 8th IEEE Int. Conf. Softw. Eng. Serv. Sci., 2017, pp. 87–90.
[9] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, "A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models," IEEE Trans. Rel., vol. 61, no. 2, pp. 491–503, Jun. 2012.
[10] C. Feng, T. Li, and D. Chana, "Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks," in Proc. 47th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., 2017, pp. 261–272.
[11] T. Abbasi, K. H. Lim, and K. San Yam, "Predictive maintenance of oil and gas equipment using recurrent neural network," in Proc. IOP Conf. Ser.: Mater. Sci. Eng., 2019, Art. no. 012067.
[12] J. Yoon, D. He, and B. Van Hecke, "A PHM approach to additive manufacturing equipment health monitoring, fault diagnosis, and quality control," in Proc. Prognostics Health Manage. Soc. Conf., 2014, pp. 1–9.
[13] C.-T. Yen and P.-C. Chuang, "Application of a neural network integrated with the Internet of Things sensing technology for 3D printer fault diagnosis," Microsyst. Technol., pp. 1–11, 2019. [Online]. Available: https://link.springer.com/article/10.1007%2Fs00542-019-04323-4#article-info
[14] S. Ahmad, A. Lavin, S. Purdy, and Z. Agha, "Unsupervised real-time anomaly detection for streaming data," Neurocomputing, vol. 262, pp. 134–147, 2017.
[15] L. Rodriguez-Cobo, P. B. Garcia-Allende, A. Cobo, J. M. Lopez-Higuera, and O. M. Conde, "Raw material classification by means of hyperspectral imaging and hierarchical temporal memories," IEEE Sensors J., vol. 12, no. 9, pp. 2767–2775, Sep. 2012.
[16] A. Bamaqa, M. Sedky, T. Bosakowski, and B. B. Bastaki, "Anomaly detection using hierarchical temporal memory (HTM) in crowd management," in Proc. 4th Int. Conf. Cloud Big Data Comput., 2020, pp. 37–42.
[17] A. Almehmadi, T. Bosakowski, M. Sedky, and B. B. Bastaki, "HTM based anomaly detecting model for traffic congestion," in Proc. 4th Int. Conf. Cloud Big Data Comput., 2020, pp. 97–101.
[18] B. B. Bastaki, "Application of hierarchical temporal memory to anomaly detection of vital signs for ambient assisted living," Ph.D. dissertation, Staffordshire Univ., Stoke-on-Trent, U.K., 2019.
[19] A. Barua, D. Muthirayan, P. P. Khargonekar, and M. A. Al Faruque, "Hierarchical temporal memory based one-pass learning for real-time anomaly detection and simultaneous data prediction in smart grids," IEEE Trans. Dependable Secure Comput., to be published, doi: 10.1109/TDSC.2020.3037054.
[20] S. Faezi, R. Yasaei, A. Barua, and M. A. Al Faruque, "Brain-inspired
[30] Numenta, "Numenta temporal memory implementation," Feb. 2020. Accessed: Feb. 10, 2020. [Online]. Available: https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp
[31] J. Park, "RNN based time-series anomaly detector model implemented in Pytorch," 2018. [Online]. Available: https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection
[32] M. Schneider, W. Ertel, and F. Ramos, "Expected similarity estimation for large-scale batch and streaming anomaly detection," Mach. Learn., vol. 105, no. 3, pp. 305–333, 2016.
[33] M. Smirnov, "Contextual anomaly detector," Aug. 2016. [Online]. Available: https://github.com/smirmik/CAD
[34] C. Wang, K. Viswanathan, L. Choudur, V. Talwar, W. Satterfield, and K. Schwan, "Statistical techniques for online anomaly detection in data centers," in Proc. 12th IFIP/IEEE Int. Symp. Integr. Netw. Manage. (IM 2011) Workshops, 2011, pp. 385–392.
[35] A. Stanway, "Etsy skyline," Oct. 2015. [Online]. Available: https://github.com/etsy/skyline
[36] E. Burnaev and V. Ishimtsev, "Conformalized density- and distance-based anomaly detection in time-series data," 2016, arXiv:1608.04585.
[37] R. P. Adams and D. J. MacKay, "Bayesian online changepoint detection," 2007, arXiv:0710.3742.
Arnav V. Malawade (Student Member, IEEE) received the B.S. degree in computer science and engineering from the University of California Irvine (UCI), Irvine, CA, USA, in 2018. He is currently an M.S. and Ph.D. student studying computer engineering at UCI under the supervision of Professor Mohammad Al Faruque.
His research interests include the design and security of cyber-physical systems in connected/autonomous vehicles, manufacturing, IoT, and healthcare.
golden chip free hardware trojan detection,” IEEE Trans. Inf. Forensics
Secur., to be published, doi: 10.1109/TIFS.2021.3062989. Nathan D. Costa (Member, IEEE) received the
[21] W. Yan, H. Qiu, and N. Iyer, “Feature extraction for bearing prognostics B.S. degree in computer science and engineer-
and health management (PHM)-a survey, ”Air Force Research Lab Wright- ing from the University of California Irvine (UCI),
Patterson AFB OH Materials and Manufacturing, Tech. Rep. AFRL-RX- Irvine, CS, USA, in 2020.
WP-TP-2008-4309, 2008. He is currently applying to industries relevant
[22] D. Wang, K.-L. Tsui, and Q. Miao, “Prognostics and health management: to his interests, those being embedded software
A review of vibration based bearing and gear health indicators,” IEEE development and embedded system design.
Access, vol. 6, pp. 665–676, 2017.
[23] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, “Deep learning for
smart manufacturing: Methods and applications,” J. Manuf. Syst., vol. 48,
pp. 144–156, 2018.
[24] M. Kordos and A. Rusiecki, “Reducing noise impact on MLP training,”
Soft Comput., vol. 20, no. 1, pp. 49–65, 2016.
[25] ISO 15243:2017, “Rolling bearings - damage and failures - terms, charac-
teristics and causes,” Int. Org. for Standardization, Standard, Mar. 2017.
[Online]. Available: https://www.iso.org/standard/59619.html Deepan Muthirayan (Member, IEEE) received
[26] J. Lee, H. Qiu, G. Yu, J. Lin, and Rexnord Technical Services (2007). the Ph.D. degree in mechanical engineering
IMS, University of Cincinnati “Bearing data set,” NASA Ames Prog- from the University of California, Berkeley, CA,
nostics Data Repository (http://ti.arc.nasa.gov/project/prognostic-data- USA, in 2016, and the B.Tech/M.tech degree in
repository), NASA Ames Research Center, Moffett Field, CA. engineering design from the Indian Institute of
[27] P. Nectoux et al., “Pronostia: An experimental platform for bearings Technology Madras, Chennai, India, in 2010.
accelerated degradation tests,” in Proc. IEEE Int. Conf. Prognostics Health He is currently a Postdoctoral Researcher
Manage., PHM’12. IEEE Catalog Number: CPF12PHM-CDR, 2012, with the Department of Electrical Engineering
pp. 1–8. and Computer Science, University of Califor-
[28] S. R. Chhetri and M. A. Al Faruque, “Side channels of cyber-physical nia. His doctoral thesis work focused on market
systems: Case study in additive manufacturing,” IEEE Des. Test, vol. 34, mechanisms for integrating demand flexibility in
no. 4, pp. 18–25, Aug. 2017. energy systems. Before his term at UC Irvine he was a Postdoctoral
[29] J. Hawkins and S. Ahmad, “Why neurons have thousands of synapses, a Associate with Cornell University, Ithaca, NY, USA, where his work
theory of sequence memory in neocortex,” Front. Neural Circuits, vol. 10, focused on online scheduling algorithms for managing demand flexibility.
2016. [Online]. Available: https://www.frontiersin.org/articles/10.3389/ His current research interests include control theory, machine learning,
fncir.2016.00023/full topics at the intersection of learning and control, online learning, online
algorithms, game theory, and their application to smart systems.
7990 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 12, DECEMBER 2021