IEEE-Neuroscience-Inspired Algorithms For The Predictive Maintenance of Manufacturing Systems


7980 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 12, DECEMBER 2021

Neuroscience-Inspired Algorithms for the Predictive Maintenance of Manufacturing Systems
Arnav V. Malawade, Student Member, IEEE, Nathan D. Costa, Member, IEEE, Deepan Muthirayan, Member, IEEE, Pramod P. Khargonekar, Fellow, IEEE, and Mohammad A. Al Faruque, Senior Member, IEEE

Abstract—If machine failures can be detected preemptively, then maintenance and repairs can be performed more efficiently, reducing production costs. Many machine learning techniques for performing early failure detection using vibration data have been proposed; however, these methods are often power and data-hungry, susceptible to noise, and require large amounts of data preprocessing. Also, training is usually only performed once before inference, so they do not learn and adapt as the machine ages. In this article, we propose a method of performing online, real-time anomaly detection for predictive maintenance using hierarchical temporal memory (HTM). Inspired by the human neocortex, HTMs learn and adapt continuously and are robust to noise. Using the Numenta Anomaly Benchmark, we empirically demonstrate that our approach outperforms state-of-the-art algorithms at preemptively detecting real-world cases of bearing failures and simulated 3-D printer failures. Our approach achieves an average score of 64.71, surpassing state-of-the-art deep-learning (49.38) and statistical (61.06) methods.

Index Terms—Anomaly detection, hierarchical temporal memory (HTM), predictive maintenance (PM), prognostics.

Manuscript received June 11, 2020; revised September 21, 2020 and January 4, 2021; accepted February 16, 2021. Date of publication February 25, 2021; date of current version August 20, 2021. This work was supported in part by National Science Foundation (NSF) under Award CMMI-1739503 and Award ECCS-1839429 and in part by the Graduate Assistance in Areas of National Need (GAANN) under Award P200A180052. Paper no. TII-20-2849. (Corresponding author: Arnav V. Malawade.)

The authors are with the Department of Electrical Engineering and Computer Science, University of California-Irvine, Irvine, CA 92697 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3062030.

Digital Object Identifier 10.1109/TII.2021.3062030

1551-3203 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Predictive maintenance (PM) is an emerging new paradigm in manufacturing where symptoms of machine degradation are detected before failures occur. It is a major part of the Industry 4.0 and smart manufacturing vision. Using sensor readings, process parameters, and other operational characteristics, PM can help maximize tool life by reducing the number of unnecessary repairs performed while also reducing the likelihood of unexpected failures [1]. In the United States alone, improper maintenance and the resulting outages cost more than 60 billion dollars per year [2]. Thus, smart data-driven paradigms such as PM have the potential to reduce industrial production costs significantly.

Recently, many statistical, machine learning (ML), and deep learning (DL) techniques for PM have been proposed. However, these methods are not without their shortcomings: Statistical methods require extensive domain knowledge and often do not generalize well to more complex use cases, while DL and ML techniques often require large amounts of training data and are susceptible to increased error as machines age over time. Furthermore, ML and DL algorithms are highly susceptible to noise, making them insufficiently robust for industrial settings without data preprocessing. Due to the high noise level and diversity among industrial systems, PM models that do not require significant preprocessing or domain knowledge are considered more practical [3].

To overcome these issues, we propose the use of a learning algorithm inspired by neuroscience called hierarchical temporal memory (HTM), pioneered by Hawkins and Blakeslee [4]. Using binary sparse distributed representations (SDRs) to represent data and an architecture incorporating feed-forward, lateral, and feedback connections, HTMs emulate the interactions between pyramidal neurons in the neocortex. HTMs are online learning algorithms that require less application-specific tuning, are robust to noise, and adapt to variations in the data as they continuously learn. In practice, this means HTMs can efficiently learn from a single training pass over small training datasets with little to no hyperparameter tuning. These characteristics also enable HTMs to learn in near real-time. For these reasons, they are suitable for practical applications such as detecting early symptoms of failure in manufacturing equipment. In this work, we demonstrate the effectiveness of an HTM-based anomaly detection methodology at detecting these symptoms in roller-element bearings and 3-D printers.

A. Related Work

We focus on the specific task of PM on roller-element bearings due to their broad application and utility in manufacturing. We also evaluate additive manufacturing (AM) as it is a modern technique that presents unique challenges due to the dynamics

Authorized licensed use limited to: Tsinghua University. Downloaded on January 06,2023 at 15:13:26 UTC from IEEE Xplore. Restrictions apply.
V. MALAWADE et al.: NEUROSCIENCE-INSPIRED ALGORITHMS 7981

of 3-D printers. Here, we briefly discuss works related to PM for roller bearings and additive manufacturing.

Many PM methods use statistical models due to their simplicity and explainability. These approaches rely on extracted time and frequency domain features. For example, the energy entropy mean and root mean squared (rms) values of wavelets were used to diagnose ball bearing faults in [5]. In another example, the spectral kurtosis (SK) of vibration and current signals was used to detect and classify the surface roughness of ball bearings in [6]. Using a particle filter method, Zhang et al. [7] performed fault detection on bearings similar to those found in helicopter oil cooler fans.

In addition to statistical methods, ML techniques have been applied to a wide array of industrial prognosis tasks. One such method, autoregressive integrated moving average (ARIMA), is one of the most popular techniques for time-series forecasting and was used to predict failures and identify quality defects in a slitting machine in [8]. In another approach, Tobon-Mejia et al. [9] used mixture-of-Gaussians hidden Markov models (HMMs) and wavelet packet decomposition to estimate the remaining useful life (RUL) of roller-element bearings.

DL methods such as long short-term memory (LSTM) networks and convolutional neural networks (CNNs) have also been used extensively for PM. In one example, Feng et al. [10] used an LSTM for detecting anomalies in industrial control systems. Additionally, an RNN-LSTM was used to perform PM on an air booster compressor motor used in oil and gas equipment in [11].

Due to the increased complexity and relatively late adoption of AM systems, PM techniques for AM have not been studied in great detail. Proposed approaches often draw from research in related applications, such as PM for bearings. For example, Yoon et al. [12] evaluated the feasibility of AM equipment fault diagnosis using a piezoelectric strain sensor and an acoustic sensor. In that work, features such as rms value, kurtosis, skewness, and crest factor were used to detect faults. DL has also been used for AM anomaly detection, such as in [13] where a neural network was used to classify faults in 3-D printer vibration data.

Despite the proliferation of statistical, ML, and DL approaches to PM for manufacturing, to the best of our knowledge, no HTM-based solutions have been proposed. However, the structural and temporal properties of HTM algorithms allow them to excel at cross-domain tasks that apply to manufacturing, such as anomaly detection [14]. Since the core objective of PM in manufacturing is detecting early symptoms of part failure, HTMs are a natural candidate for this task. HTMs were shown to match or surpass neural networks at detecting and classifying foreign materials on a conveyor belt in a cigarette manufacturing plant [15]. HTMs have also proven effective at detecting anomalies in crowd movements [16], traffic patterns [17], human vital signs [18], electrical grids [19], and computer hardware [20].

B. Research Challenges

Overall, PM for manufacturing presents the following key research challenges:
1) Identifying time-series anomalies in near real-time despite ambient noise.
2) Learning efficiently from small training datasets to improve applicability to practical use cases.
3) Developing a solution that can be generalized to many heterogeneous manufacturing systems without requiring extensive domain-specific tuning.
4) Adapting to changes in data statistics (i.e., machine aging).

Despite the successes achieved by existing methods in the aforementioned applications, industrial manufacturing systems are diverse and complex, making it difficult to find solutions that generalize across applications. Consequently, PM systems require specialization, which necessitates specialized knowledge and cross-domain skills. This is especially true in the case of bearing-failure prognosis, as bearing design and lifetime management lie squarely in the mechanical and materials engineering domains.

It is difficult for any single technique to address all these research challenges effectively. For example, statistical methods such as thresholding based on kurtosis or spectral analysis are highly efficient and real-time capable but require explicitly defined health indicators and thresholds, which are machine- and application-specific. Also, stationary methods including rms, kurtosis, and crest factor are only effective for stationary signals (signals with time-invariant statistical properties), but bearing vibration signals are generally cyclostationary (statistical properties vary cyclically) or nonstationary (statistical properties change depending on speed and load conditions) [21]. Spectral kurtosis is applicable to nonstationary and nonperiodic signals but is sensitive to noise and outliers [22].

Classical ML algorithms such as autoregressive (AR) models, support vector machines, hidden Markov models (HMMs), random forests, and k-nearest neighbors have been demonstrated for PM in existing work, but require the extraction of explicit health indicators (features) from data [23]. These algorithms also require application-specific hyperparameter tuning, data preprocessing as they have poor noise robustness [3], and regular updates of model settings as they do not adapt to account for machine aging [23]. Moreover, both HMM and AR methods are ineffective on nonstationary signals [21].

In DL algorithms such as neural networks and LSTMs, health indicators can be learned implicitly by the network. However, a network trained for one machine cannot generalize to a new machine without retraining with a large amount of data for hundreds or thousands of epochs. Larger models may be able to generalize better, but the complexity of training and optimizing these models increases drastically with size [23]. This domain-specific training and tuning process can be expensive, time-consuming, and impractical for real-world use cases. Like the ML methods, DL algorithms also have poor noise robustness [24] and require high-quality data, or else performance can suffer significantly [3]. To address this, significant preprocessing steps are often needed to generate clean data for these models [3].

As stated in Section I-A, HTM-based anomaly detection methods have demonstrated success in several distinct fields. However, to the best of our knowledge, no prior work has


Fig. 1. How neocortical structures are modeled by HTM. The neocortex is composed of a large number of interconnected pyramidal neurons, each with proximal (feed-forward), apical (feedback), and distal (lateral) dendrites to connect to other neurons. These relations are modeled in HTM neurons as feed-forward, feedback, and lateral connections.

comprehensively explored HTM’s ability to model vibration data or demonstrated its practical value for PM. Overall, all of these existing methods fall short of addressing one or more research challenges.

C. Our Novel Contributions

To address these key research challenges and improve on the PM performance demonstrated by previous works, our article presents the following contributions:
1) We demonstrate the ability of HTM-based anomaly detectors to detect early symptoms of bearing failure in several months’ worth of real-world vibration data. We show that HTMs can efficiently learn with only a single training pass.
2) We demonstrate the ability of HTMs to generalize across applications without much fine-tuning and their ability to continuously learn and adapt by evaluating their anomaly detection performance on a second, highly dynamic application: 3-D printer vibration data. These characteristics of HTMs make them more practical for real-world use cases.
3) We compare the performance of HTM anomaly detection methods against state-of-the-art anomaly detection techniques and traditional machine prognosis methods such as condition-based maintenance. Specifically, we evaluate each algorithm’s anomaly detection accuracy and robustness to noise.
4) We demonstrate the efficiency and real-time capability of HTM-based prognosis by comparing its execution time with that of the other techniques.

II. BACKGROUND THEORY

A. Hierarchical Temporal Memory

HTM is a sequence learning framework modeled after the structure of the neocortex in the human brain [4].

The basic unit of HTM is a neuron modeled after those present in the neocortex [Fig. 1(b)]. These neurons are stacked on top of one another to form a column like the “cortical column” of the neocortex. The final HTM is a composition of many such columns. A single HTM neuron [Fig. 1(c)] is connected to two types of segments: 1) proximal segments (aggregation of feed-forward connections from the input) and 2) distal segments (aggregation of lateral connections from neurons of the other columns). Each HTM neuron can be in one of three states: 1) inactive (the default state), 2) predictive, and 3) active. The predictive state of a neuron is determined by the activity of the distal segments, which in turn is determined by the activation state of the other neurons. A neuron becomes active at any time only if it was in the predictive state at the previous instant, with an exception that will be described in Section III-A. When the sequences of activations are viewed temporally, it is easy to see that the distal segments provide the temporal context for activation and thus capture the temporal relations. The column structure augments this capability of HTMs by enabling them to store multiple such overlapping temporal sequences. Further details on the HTM-based anomaly detection methodology are discussed in Section III-A.

B. PM of Roller-Element Bearings

Roller-element bearings perform the critical task of reducing friction between rotating parts in machinery. Generally, catastrophic bearing failures present warning signs such as anomalous vibrations and/or noise. These anomalies can occur due to environmental factors (moisture or debris entering the bearing) as well as installation errors (misalignment, excessive loads, or poor/improper lubrication) [25]. Recently, sensor-based techniques that leverage vibration and temperature data to monitor bearing health have been proposed. For example, the NASA Bearing Dataset and the Pronostia Bearing Dataset contain vibration and temperature data for several bearings which were run until failure [26], [27]. In both datasets, anomalies in the vibration and temperature signals increase in size and frequency as the bearings approach failure, showing a strong correlation between the sensors’ readings and system state.

C. PM of 3-D Printers

3-D printing is a manufacturing process where a physical object is constructed from layers of material in an iterative process. Fused deposition modeling (FDM) is a standard technique where melted thermoplastic is extruded through a moving print head nozzle to build each layer. To ensure precision, stepper motors control the extrusion rate of the nozzle as well as the X, Y, and Z-axis movement of the print head. Since the motors, bearings,


Fig. 2. HTM anomaly detection framework. The time-series input $X(t)$ is encoded into an SDR. This information is passed through a spatial pooler and a temporal pooler before outputting a prediction $\Pi(t_{n+1})$ for the next set of column activations. The prediction error between $\Pi(t_n)$ and $A(t_n)$ and the historical distribution of anomaly scores are used to determine the anomaly likelihood $L(t_n)$.
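
The caption above names two quantities, a prediction error and an anomaly likelihood, without giving formulas. The following is a minimal Python sketch of one common way to compute them; it is not the authors' implementation. It assumes that the prediction error is the fraction of active columns that were not predicted, and that the likelihood is a Gaussian tail probability over a rolling window of past scores. The names `prediction_error`, `AnomalyLikelihood`, `history`, and `recent` are hypothetical.

```python
import math
from collections import deque


def prediction_error(predicted_cols, active_cols):
    """Fraction of active columns that were NOT predicted (assumed definition)."""
    active = set(active_cols)
    if not active:
        return 0.0
    hits = len(active & set(predicted_cols))
    return 1.0 - hits / len(active)


class AnomalyLikelihood:
    """Rolling Gaussian-tail estimate over past prediction errors (assumed form)."""

    def __init__(self, history=100, recent=10):
        self.scores = deque(maxlen=history)  # historical distribution of scores
        self.recent = recent                 # short window compared against history

    def update(self, score):
        self.scores.append(score)
        if len(self.scores) < self.recent:
            return 0.5  # not enough history yet: stay neutral
        mean = sum(self.scores) / len(self.scores)
        var = sum((s - mean) ** 2 for s in self.scores) / len(self.scores)
        std = max(math.sqrt(var), 1e-6)      # guard against a zero spread
        window = list(self.scores)[-self.recent:]
        recent_mean = sum(window) / len(window)
        z = (recent_mean - mean) / std
        tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
        return 1.0 - tail


detector = AnomalyLikelihood()
for step in range(50):                       # quiet period: small alternating errors
    quiet = detector.update(0.0 if step % 2 == 0 else 0.2)
for _ in range(10):                          # sustained burst of unpredicted activity
    burst = detector.update(1.0)
```

In this sketch, flagging $X(t_n)$ amounts to comparing the returned likelihood with a chosen threshold. The threshold, window lengths, and warm-up behavior are illustrative choices and would need tuning for a real signal.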

and belts are moving parts, they are prone to wear and must be regularly maintained to prevent component failures. As shown in [28], these components leak vibration information that can be used by PM systems. However, this leaked information is nonstationary since 3-D printers move on multiple axes and change direction and speed often, presenting a challenge for conventional PM methods.

III. METHODOLOGY

A. Anomaly Detection Using Hierarchical Temporal Memory

The end-to-end framework for the HTM-based detector is shown in Fig. 2. Our methodology for anomaly detection consists of the following steps. First, the time-series vibration data $X(t)$ is taken as input and encoded into a sparse distributed representation (SDR). Next, the SDR is passed through the spatial pooler. The spatial pooler’s output is fed into the temporal pooler, which then outputs a prediction for the next activation $\Pi(t_{n+1})$. Simultaneously, the prediction from the previous time step $\Pi(t_n)$ is compared with the column activations in the current time step $A(t_n)$ to give a prediction error value: A high error value indicates that this activation was not expected and may be anomalous. Finally, the anomaly detector uses the historical distribution of anomaly scores to calculate the anomaly likelihood $L(t_n)$ for the current data point based on the prediction error value; if $L(t_n)$ exceeds a set threshold, then $X(t_n)$ is flagged as an anomaly. In the following paragraphs, we describe each of these components in detail.

1) Encoder: The first stage in processing the input data $X(t)$ is the encoder. The encoder converts the incoming data point $X(t)$ into an SDR. This representation is a vector of binary values, and it is sparse because only 2% of the bits are activated for any input. This contrasts with DL methods that store and learn a dense, distributed representation. Later, we shall describe the advantages of using a sparse representation. We denote the output of the encoder by $x$, a $1 \times n$ vector.

2) Spatial Pooling: The second stage is spatial pooling. The spatial pooler identifies spatial relations between different regions of the encoder’s output through the proximal connections. Spatial poolers can also be stacked to identify more complex relations. The neurons of a column share the same proximal segment, which is initialized such that each neuron is connected to a large fraction of the inputs (50%). The output of this stage is also an SDR representing the columns of the HTM that will be activated in the final output. We denote the spatial pooling operation mathematically by $I_k(\cdot)$, where the input is the list of columns ordered in decreasing order of their proximal segment values, and $k$ indicates the number of columns to be picked for activation from the top of this list. The number $k$ is typically the top 2%, so the output representation is sparse. Let $y_c$ denote the activation of the columns and $P$ denote the proximal connections, where $P$ is a binary matrix of size $n \times N$. Then

$$y_c = I_k(xP). \tag{1}$$

3) Prediction: The next stage is prediction. The prediction for the next time step is the predictive state of the HTM at the end of the current time step. Let the weights of the lateral connections of the $d$th distal segment of the $i$th neuron of the $j$th column be $D^{d}_{i,j}$. We note that only those weights of connections that are above a certain threshold are considered to be established and the rest are set to zero. A neuron $(i, j)$ enters the predictive state provided the sum of activations of at least one of the distal segments exceeds a certain threshold, $\theta_d$. Denote the predictive state of a neuron at time $t_n$ by $\pi_{i,j}(t_n)$. We denote the current activation state of all neurons at time $t_n$ by $A(t_n)$. We denote the total predictive state by the matrix $\Pi(t_n)$, whose elements are therefore $\pi_{i,j}(t_n)$. Mathematically, $\pi_{i,j}(t_n)$ is given by

$$\pi_{i,j}(t_n) = \begin{cases} 1, & \text{if } \exists\, d \text{ s.t. } \lVert D^{d}_{i,j} \odot A(t_n) \rVert_1 > \theta_d \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

where $\odot$ denotes the element-wise multiplication operation.

4) Temporal Pooling: The final stage is temporal pooling. Temporal pooling computes the activation state $A(t_n)$ of the HTM (an $M \times N$ matrix where $M$ is the number of neurons per mini-column and $N$ is the number of mini-columns in the layer), which is also the output of the HTM based on a temporal context. A neuron $i$ in column $j$ is activated provided its column is activated, i.e., $y_c(j) = 1$, and provided it is in the predictive state, i.e., $\pi_{i,j}(t_{n-1}) = 1$. The other neurons in this column are inhibited. If none of the neurons in a column that is active are in the predictive state, then all the neurons of this column are activated. Here, the predictive state $\pi_{i,j}(t_{n-1})$ from the previous time step is the temporal context. This temporal context is updated at the end of this time step as described in the prediction step above. Let $a_{i,j}(t_n)$ be the $(i, j)$th element of $A(t_n)$ denoting the activation state of neuron $i$ in column $j$. Then, the temporal pooling operation


can be mathematically described as

$$a_{i,j}(t_n) = \begin{cases} 1, & \text{if } y_c(j) = 1 \text{ and } \pi_{i,j}(t_{n-1}) = 1 \\ 1, & \text{if } y_c(j) = 1 \text{ and } \sum_i \pi_{i,j}(t_{n-1}) = 0 \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

Fig. 2 shows the different stages of HTM processing in the context of anomaly detection. After activation, the prediction error between the prediction from the previous time step $\Pi(t_n)$ and the current activation state $A(t_n)$ is computed and passed to the anomaly likelihood block, which uses the historical distribution of anomaly scores to determine if $X(t_n)$ is a true anomaly.

5) Learning: HTMs use a Hebbian-type learning algorithm that reinforces the connection weights of the segments that correctly predict the activation at the next time step. Each time step, the weights are re-evaluated as follows. The connection weights of an activated neuron’s segments that originated from previously active neurons are increased. The connection weights from neurons that were not active in the previous time step are decreased. Additionally, weights of connections that are wrongly predicted are also decreased but at a lesser rate, i.e., forgetting happens at a slower rate than updating. It is this type of learning that allows HTMs to learn continuously and adapt to changes over a long term. The learning algorithm is discussed in much greater detail in [29].

6) On Capacity, Robustness, and Efficiency: Here, we illustrate why HTMs are efficient and robust to noise. Let us consider an HTM with a large $n$, where $n$ denotes the size of the encoder’s output, $x$, a binary vector. Denote by $w$ the maximum number of bits that can be one. Typically, $w$ is small relative to $n$. Given this, let us define $\alpha := w/n$. Here, $\alpha$ is a measure of sparsity and denotes the fraction of the bits that can be active in the SDR of size $n$. For example, $n = 2048$ and $w = 4$ give $\alpha \approx 0.002$.

The number of possible unique encodings, $N_e$, that can be stored in vector $x$, given $n$ and $w$, is given by

$$N_e = \binom{n}{w} = \frac{n!}{w!\,(n - w)!}. \tag{4}$$

For example, if $n = 2048$ and $w = 20$, then $N_e$ is on the order of $10^{47}$. Given $N_e$, the probability that one SDR $x$ will match another SDR $y$, which is randomly picked, is trivially computable

$$P(x = y) = 1/N_e. \tag{5}$$

Thus, the probability of a false match is, for all practical purposes, zero. This shows that SDRs can store and recall reliably an astronomically large number of vectors. Consequently, it follows that HTMs can store and recall reliably an astronomically large number of sequences.

We can now relax the requirement and say that two SDRs are equivalent if $\theta$ $(< w)$ or more bits match. In this case, the matching is allowed an error of up to $w - \theta$ bits. Denote by $\Omega_x(b)$ the set of sparse vectors (of size $n$ and sparsity $\alpha$) that have an overlap of $b$ bits with $x$. Then, the probability that a false match will be generated, $P_{fm}$, is given by

$$P_{fm} = \frac{\sum_{b \ge \theta} |\Omega_x(b)|}{N_e}, \quad \text{where } |\Omega_x(b)| = \binom{w}{b} \times \binom{n - w}{w - b}. \tag{6}$$

Clearly, the probability of a false match has increased by allowing an error of up to $w - \theta$. In the same example as above, if $\theta = 10$, then $w - \theta = 10$, that is, an error of up to 50% is allowed. We find that the probability of a false match is still $1/10^{13}$, which for all practical purposes is zero. This is what gives SDRs, and thereby HTMs, robustness to noise.

The sparsity of $x$ allows for sparse computation, which makes computations with SDRs very efficient. For a representation $x$ of size $n$ and sparsity $\alpha$, one does not need to store information on all the bits. Instead, one can just store the addresses of the bits of value one. Then, for an operation like matching, one just needs to check the value of the bits of the vector $y$ at the corresponding locations; this is doable almost in constant time. We can trivially extend this argument to show that the spatial pooling, prediction, and temporal pooling operations described above can also be performed very efficiently in HTMs, thus giving HTMs their computational efficiency. Next, we discuss our experimental setup for demonstrating the performance of the HTM-based anomaly detector.

B. Experimental Setup

We evaluate our proposed methodology on real-world bearing failure and simulated 3-D printer failure datasets. Here, we discuss details about these datasets and the scoring system used for evaluation.

1) Bearing Dataset: We used the NASA bearing dataset and the Pronostia bearing dataset [26], [27]. The NASA bearing dataset contains three tests of bearings run to failure. The Pronostia bearing dataset contains vibration snapshots recorded with three different radial load and rpm settings. The accelerometer data for Test 2 of the NASA dataset is shown in Fig. 3. In total, our testing set consists of 40 vibration data files and 191 labeled anomalies.

2) 3-D Printer Dataset: Our experimental testbed for collecting vibration data from a 3-D printer is shown in Fig. 4. The 3-D printer uses one stepper motor to control each movement axis (X, Y, and Z). We placed one accelerometer directly behind each stepper motor to capture vibration data from prints of various 3-D objects. To the best of our knowledge, no publicly available 3-D printer component-failure datasets exist, and generating real-world failures would risk damaging our equipment. Thus, we instead opted to generate synthetic anomalies in the 3-D printer vibration data.

3-D printer vibration signals are inherently nonstationary, meaning that their statistical properties vary with time. However, since printers contain bearings and rotating components with similar dynamics, they share the same time-series and frequency domain features as those correlated with bearing health, such as power spectral density (PSD) [21], [22]. For example, in Fig. 3 it is clear that the overall power of the vibration signal increases as the bearing nears failure. Intuitively, this same phenomenon


Fig. 3. Accelerometer data from Test 2 of the NASA dataset [26]. Symptoms of bearing failure can be seen on 2/17 and 2/18 before the bearing’s
outer race failed on 2/19.

Fig. 4. Experimental testbed used to collect vibration data from our 3-D printer. Three accelerometers were placed on the printer in total; one sensor was placed directly behind each of the printer’s three stepper motors.

will occur in a 3-D printer as components wear out. Thus, we synthesized anomalies in the 3-D printer vibration data by mapping the PSD from our bearing failure data to the 3-D printer data. This composition enabled us to simulate the magnitude changes characteristic of bearing and component failures in the 3-D printer while preserving the frequency components unique to the 3-D printer.

Our PSD mapping algorithm, shown in Algorithm 1, operates on a sliding window over one bearing vibration file and one 3-D printer vibration file. For each window $t$, the following steps are performed: First, the fast Fourier transform (FFT) $X_b[t]$ of the bearing time-series data $b[n]$ is calculated for a preset frequency bin size. Next, the power in each frequency bin is calculated. Then, we calculate the ratio $C$ between the previous window’s power value and the current power value in each bin. This ratio is used to scale the corresponding frequency bin in the FFT of the 3-D printer data $\mathrm{FFT}(p[t])$, yielding an FFT with synthesized anomalies $X_s[t]$. Finally, the inverse FFT (IFFT) of $X_s[t]$ is taken and added to the output at location $s[t]$.

The result after all iterations is a 3-D printer vibration signal with synthesized anomalies $s[n]$. Using this mapping algorithm, we produced a simulated 3-D printer failure dataset containing 15 test cases and 57 hand-labeled anomalies.

3) Anomaly Detectors: To evaluate the performance of HTMs at PM, we use two HTM-based anomaly detectors with slightly different temporal memory implementations, which we denote as HTM [14] and TM-HTM [30]. To explore the effectiveness of anomaly likelihood for HTM-based detectors, we evaluated HTM and TM-HTM with three different anomaly likelihood configurations.
1) No anomaly likelihood: The prediction error of the HTM was directly used as the anomaly score.
2) Historical distribution (HD): The implementation is described in Section III-A.
3) LSTM-based predictor (LP): The HD anomaly likelihood block was replaced with a 2-layer LSTM predictor trained to predict normal HTM prediction error values in order to filter out false positives/noise. The prediction error of the LSTM was used as the final anomaly score.

We also evaluated baseline and state-of-the-art anomaly detectors including an RNN-based detector configured to use LSTM cells (denoted as LSTM) [31] (similar to [10], [11]), Windowed Gaussian (based on the tail probability of the distribution over a sliding window), a threshold-based detector (similar to condition-based maintenance and [5]), EXPoSE [32], contextual anomaly detector (CAD-OSE) [33], relative entropy [34], Etsy Skyline [35], KNN conformal anomaly detector (KNN-CAD) [36], Bayesian changepoint (BC) [37], random (random anomaly score), and null (constant anomaly score). All of the listed algorithms except LSTM were exposed to the training data once before testing and updated their models as they were exposed to unseen test data. LSTM was trained for over 1000 epochs on the training data and was tested with the model settings that resulted in the lowest validation loss. LSTM was tested offline, meaning that it did not update its model weights during testing. The LP anomaly likelihood configuration was also trained in this manner but used the HTM output as its input data instead.

4) Scoring: To score each algorithm fairly, we rely on the Numenta Anomaly Benchmark (NAB) [14]. NAB was designed to fairly benchmark anomaly detection algorithms against one another. It contains a built-in anomaly scoring algorithm, normalization, and three threshold optimization settings: Standard, low false positives (Low FP), and low false negatives (Low FN). NAB takes in datasets with labeled anomalies and produces

Authorized licensed use limited to: Tsinghua University. Downloaded on January 06,2023 at 15:13:26 UTC from IEEE Xplore. Restrictions apply.
7986 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 17, NO. 12, DECEMBER 2021

TABLE I
NORMALIZED NAB SCORES FOR ANOMALY DETECTION ON THE BEARING
FAILURE DATASET

Fig. 5. NAB scoring functionality: Detection scores are assigned ac-


cording to the scoring function. The anomaly detected in this example is
given a score of 0.65.

anomaly windows. These are used to score anomaly detectors on


how precisely they can pinpoint anomalies; early/on-time detec-
tions are rewarded, and very early/late detections are penalized.
TABLE II
NORMALIZED NAB SCORES FOR ANOMALY DETECTION ON THE 3-D PRINTER DATASET

The NAB scoring function is as follows: Given an application profile $A = [A_{TP}, A_{FP}, A_{TN}, A_{FN}]$ specifying the weights for each kind of detection, and the position $y$ of the detection relative to the anomaly window, the scoring function for each detection is

$$\sigma^{A}(y) = (A_{TP} - A_{FP}) \left( \frac{1}{1 + e^{5y}} \right) - 1. \qquad (7)$$

These scores are summed up for all the detections in a file; the following weighted penalty is deducted for every missed detection ($f_d$): $A_{FN} f_d$. The summed score is then normalized to a 0–100 scale where 0 represents equivalent (or worse) performance to the Null detector, and 100 represents a perfect anomaly detector. An example of the scoring functionality is shown in Fig. 5. To provide ground-truth values of anomaly locations in the dataset, we followed the NAB official anomaly labeling guide and manually labeled anomalies in each dataset. The first 15% of each vibration data file was used for training with the remaining 85% used for testing and scoring.
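As a rough illustration of the scoring rule in (7), the following Python sketch evaluates the per-detection score, sums it over one file together with the missed-detection term, and rescales the result. It is not the NAB implementation: the function names, the example weights ($A_{TP} = 1$, $A_{FP} = -1$, $A_{FN} = -1$), and the linear rescaling between the Null and perfect detector scores are assumptions made here for illustration only.

```python
import math


def detection_score(y, a_tp, a_fp):
    """Per-detection score from (7): (A_TP - A_FP) / (1 + e^(5y)) - 1."""
    return (a_tp - a_fp) * (1.0 / (1.0 + math.exp(5.0 * y))) - 1.0


def raw_file_score(positions, missed, a_tp, a_fp, a_fn):
    """Sum the detection scores of one file and apply the term A_FN * f_d.

    Assumption: a_fn <= 0, so adding a_fn * missed deducts a penalty
    for every missed detection.
    """
    return sum(detection_score(y, a_tp, a_fp) for y in positions) + a_fn * missed


def normalized_score(raw, null_raw, perfect_raw):
    """Assumed linear rescaling: Null detector -> 0, perfect detector -> 100."""
    return 100.0 * (raw - null_raw) / (perfect_raw - null_raw)


if __name__ == "__main__":
    # Illustrative weights only; these are not the profile values used in the paper.
    a_tp, a_fp, a_fn = 1.0, -1.0, -1.0
    for y in (-1.0, -0.5, 0.0, 0.5):
        print(f"y = {y:+.1f} -> score = {detection_score(y, a_tp, a_fp):+.4f}")
    # Two detections at hypothetical relative positions plus one missed anomaly.
    print(raw_file_score([-0.8, 0.2], missed=1, a_tp=a_tp, a_fp=a_fp, a_fn=a_fn))
```

With the illustrative weights above, the expression in (7) reduces to $2/(1 + e^{5y}) - 1$, so a detection at $y = 0$ scores 0, smaller $y$ pushes the score toward 1, and larger $y$ pushes it toward $-1$.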

IV. RESULTS

A. Roller Bearing Anomaly Detection

Table I shows the NAB results for the selected algorithms on the labeled bearing failure dataset as well as the total running time of each algorithm. The runtime was recorded over the complete dataset using a PC with an Intel Core i7-7700K processor. As shown in Table I, TM-HTM+HD achieved the highest anomaly detection score for the Standard and Low FN profiles while HTM+LP achieved the highest score for the Low FP profile. TM-HTM+HD scored 67.05, 73.33, and 56.57 for the Standard, Low FN, and Low FP profiles, respectively. The approach that scored closest to HTM was Windowed Gaussian, which achieved scores of 64.70, 70.50, and 57.35 for the same profiles, respectively. HTM and HTM+LP performed better than TM-HTM and TM-HTM+LP, indicating that TM-HTM’s implementation only works well with the HD anomaly likelihood block.

As expected, the statistical methods (Windowed Gaussian, threshold-based, relative entropy) processed the dataset faster than the DL, ML, and HTM-based methods, albeit with lower performance. The HTMs using HD were 1.41x slower than the HTMs with no anomaly likelihood and 3.76x faster than the HTMs using LP on average. TM-HTM+HD processed the dataset 8.3x faster than LSTM.

To evaluate the qualitative performance of each anomaly detector, we plotted the anomaly scores over time for each detector for Test 1 of the Pronostia bearing dataset and compared them to the labeled ground truth anomaly windows in Fig. 6.

Fig. 6. Anomaly scores for each detector in comparison to the ground truth anomaly windows for Test 1 of the Pronostia bearing dataset.

B. 3-D Printer Anomaly Detection

Table II shows our experimental results for the 3-D printer dataset. HTM+HD achieved the highest score on the Low FN profile while LSTM achieved the highest score on the Standard and Low FP profiles. HTM+HD achieved scores of 63.03, 73.18, and 42.23 for the Standard, Low FN, and Low FP scoring profiles, respectively. LSTM scored 64.76, 71.43, and 51.34 at the same profiles, respectively. On both applications the HTM, TM-HTM, and TM-HTM+LP detectors performed worse than the HTM+HD, HTM+LP, and TM-HTM+HD detectors. Overall, the use of HD anomaly likelihood yielded the best HTM performance across applications. Each algorithm’s execution time is consistent with the results shown in Table I.

V. DISCUSSION

A. Overall Performance and Adaptability

Interestingly, algorithms that performed well on the bearing dataset, such as EXPoSE and Etsy Skyline, performed worse on the 3-D printer dataset. Additionally, algorithms that performed worse on the bearing dataset, such as LSTM and BC, performed much better on the 3-D printer dataset. Our HTM-based methodology using HD anomaly likelihood achieved consistently high performance on both applications without any hyperparameter tuning, demonstrating that this configuration can generalize and adapt to different applications without domain-specific tuning. This result also suggests that HTMs significantly benefit from the inclusion of an HD anomaly likelihood block.

Also, HTM+LP was the best performing model on the Low FP profile for the bearing dataset. However, this performance was not replicated in the 3-D printer dataset. Similarly, LSTM beat HTM on the Standard and Low FP profiles for the 3-D printer dataset while performing worse than HTM on the bearing dataset. Hence, our results suggest that LSTMs are highly data-dependent and need to be re-tuned for every machine and/or application. Thus, the LSTM approach is time-consuming, expensive, and impractical for real-world applications.

The benefits of HTM’s continuous learning capability are clearly shown in Fig. 6: After identifying earlier anomalies, the HTM-based approaches learn the new baseline for the signal and can pinpoint the future anomalies despite higher signal amplitudes. CAD-OSE also appears to learn continuously, but not as well as the HTMs.

B. Real-Time Detection Capability

In addition to detection accuracy and precision, an optimal PM system should be able to detect failure symptoms in real-time to allow adequate time for repairs to be scheduled and performed. However, part failures are infrequent and generally present progressive symptoms before failure, so a hard real-time requirement for processing raw sensor data may unnecessarily limit the complexity (and subsequently the performance) of anomaly detection methods. Thus, we evaluate the anomaly detectors in the context of “soft real-time,” where we determine if each detector can process a subsampled data segment before the next subsampled data segment arrives. For example, 1 s of data can be recorded each minute as a data segment to reduce data size while still ensuring that a wide range of vibration frequencies are captured at frequent intervals.

Both HTM+HD and TM-HTM+HD were able to process the complete bearing failure dataset in under 100 min; since the bearing dataset contains several months’ worth of vibration data and minimal data preprocessing was performed (subsampling and timestamping), this demonstrates that HTMs can accurately detect failure symptoms in real-time, meaning that machine operators can be notified of degradation promptly. Other complex algorithms such as CAD-OSE, KNN-CAD, and EXPoSE had execution time on the same order of magnitude as HTMs and are thus also capable of real-time anomaly detection. Although HTM+LP, TM-HTM+LP, and LSTM took longer to process the dataset than HTM+HD and TM-HTM+HD, they can still be considered real-time due to the aforementioned dataset characteristics. However, the significant training time associated with the LSTM (over 12 hours on our hardware platform) and the need for application-specific hyperparameter tuning put LSTM at a disadvantage in terms of applicability to practical use cases.

C. Tunability and Robustness to Noise

Fig. 6 clearly shows HTM’s ability to pinpoint anomalies while remaining robust to noise in the input. This is likely due to HTM’s use of sparse encodings, which make it unlikely that bit errors in the input due to noise will affect the bits corresponding to the input pattern. From the figure, it is also clear that the HTM implementations using anomaly likelihood blocks were more robust to noise outside of the anomaly windows than the HTM or TM-HTM alone. This is likely because the anomaly likelihood components filter out smaller detections to isolate only the most plausible anomalies. The HTM+HD and TM-HTM+HD detected anomalies earlier than the other configurations, albeit with slightly more false positives. The outputs of the different HTMs starkly contrast with the highly variable anomaly score outputs of Windowed Gaussian, EXPoSE, KNN-CAD, and BC, among others. These detectors record high anomaly scores even when there is relatively low noise in the input, meaning that they will likely suffer from false positives at higher noise levels.

A detector’s threshold can be tuned to account for higher noise levels; however, for detectors such as Windowed Gaussian, which used the maximum detection threshold of 1.0, the threshold cannot be increased further to reduce its sensitivity. In contrast, TM-HTM+HD used a threshold of 0.5497 on the Standard profile. Thus, although Windowed Gaussian outperformed TM-HTM+HD on the Low FP scoring profile, it lacks tunability and will likely perform much worse than this HTM configuration in more noisy environments.

LSTM appears to have good robustness to noise, as shown in Fig. 6. However, it is clear from the figure that it missed some of the earlier anomaly windows completely. In the context of PM, this can mean that an observer will only be warned of degradation later and will not have much time to organize repairs. Overall, our methodology demonstrates significant noise-robustness, better tunability, and the ability to detect early anomalies as well as larger, late-stage anomalies.

D. Limitations and Future Work

Another related PM problem is RUL estimation. In many cases, RUL and anomaly detection go hand in hand as part of a comprehensive PM system. Although we did not evaluate the performance of HTM at RUL estimation, the core architecture of HTM is good at sequence prediction and could likely be used to solve this problem. We leave this for future work.

Another limitation of our work is the use of synthesized 3-D printer anomalies instead of real-world examples of 3-D printer failures. Due to resource constraints, we opted not to perform these experiments and used synthetic failure data instead. The question of whether HTM’s performance on synthetic anomalies translates to real-world PM remains an open research problem.

E. Feasibility

The idea of predicting machine failures in advance is not new; many variants of PM systems have already been implemented in real-world manufacturing applications. However, based on our results, we believe that HTM is a better solution than current state-of-the-art methods. Our results demonstrate that HTMs are efficient enough to run on consumer-grade processors while learning and adapting continuously. Additionally, HTMs can be easily installed on existing PM systems as they only require time-series sensor inputs, which likely already exist in the system. As shown by our results, the industry-standard LSTM requires a significant amount of time for training (over 1000 epochs) as well as application-specific tuning. In contrast, HTMs do not require any application-specific parameter tuning and are essentially plug-and-play since they only need to be trained with a single pass on normal sensor data. These characteristics make HTMs an extremely viable, out-of-the-box solution for industrial PM.

VI. CONCLUSION

Existing methods for predicting machine failures from sensor data are limited in their practicality due to shortcomings, including poor noise resistance, efficiency, and adaptability. Our experiments demonstrated that our methodology outperforms state-of-the-art approaches at detecting anomalies in both bearing and 3-D printer failure data with minimal to no preprocessing or application-specific tuning. On the Standard scoring profile, our methodology using HD anomaly likelihood achieved an average NAB score of 64.71. In comparison, the other top algorithms, LSTM and Windowed Gaussian, achieved average scores of 49.38 and 61.06, respectively. Furthermore, our qualitative results showed that our methodology was significantly more noise-resistant than the Windowed Gaussian, KNN-CAD, EXPoSE, and BC detectors, which we attribute to the use of SDRs and an anomaly likelihood component. We also demonstrated that our methodology was real-time capable, with an execution time on the same order of magnitude as state-of-the-art methods. Consequently, we conclude that HTM-based anomaly detection is a novel, practical solution for a wide range of industrial PM applications.

ACKNOWLEDGMENT

Any opinions, findings, conclusions, or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the funding agencies.

REFERENCES

[1] C. Scheffer and P. Girdhar, Practical Machinery Vibration Analysis and Predictive Maintenance. Amsterdam, The Netherlands: Elsevier, 2004.
[2] R. K. Mobley, An Introduction to Predictive Maintenance. Amsterdam, The Netherlands: Elsevier, 2002.
[3] J. F. Olesen and H. R. Shaker, “Predictive maintenance for pump systems and thermal power plants: State-of-the-art review, trends and challenges,” Sensors, vol. 20, no. 8, 2020, Art. no. 2425.
[4] J. Hawkins and S. Blakeslee, On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines. New York, NY, USA: Macmillan, 2007.
[5] O. R. Seryasat, M. A. Shoorehdeli, F. Honarvar, and A. Rahmani, “Multi-fault diagnosis of ball bearing using FFT, wavelet energy entropy mean and root mean square (RMS),” in Proc. IEEE Int. Conf. Syst., Man Cybern., 2010, pp. 4295–4299.
[6] F. Immovilli, M. Cocconcelli, A. Bellini, and R. Rubini, “Detection of generalized-roughness bearing fault by spectral-kurtosis energy of vibration or current signals,” IEEE Trans. Ind. Electron., vol. 56, no. 11, pp. 4710–4717, Nov. 2009.
[7] B. Zhang, C. Sconyers, C. Byington, R. Patrick, M. E. Orchard, and G. Vachtsevanos, “A probabilistic fault detection approach: Application to bearing fault detection,” IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 2011–2018, May 2011.


[8] A. Kanawaday and A. Sane, “Machine learning for predictive maintenance of industrial machines using IoT sensor data,” in Proc. 8th IEEE Int. Conf. Softw. Eng. Serv. Sci., 2017, pp. 87–90.
[9] D. A. Tobon-Mejia, K. Medjaher, N. Zerhouni, and G. Tripot, “A data-driven failure prognostics method based on mixture of Gaussians hidden Markov models,” IEEE Trans. Rel., vol. 61, no. 2, pp. 491–503, Jun. 2012.
[10] C. Feng, T. Li, and D. Chana, “Multi-level anomaly detection in industrial control systems via package signatures and LSTM networks,” in Proc. 47th Annu. IEEE/IFIP Int. Conf. Dependable Syst. Netw., 2017, pp. 261–272.
[11] T. Abbasi, K. H. Lim, and K. San Yam, “Predictive maintenance of oil and gas equipment using recurrent neural network,” in Proc. IOP Conf. Ser.: Mater. Sci. Eng., 2019, Art. no. 012067.
[12] J. Yoon, D. He, and B. Van Hecke, “A PHM approach to additive manufacturing equipment health monitoring, fault diagnosis, and quality control,” in Proc. Prognostics Health Manage. Soc. Conf., 2014, pp. 1–9.
[13] C.-T. Yen and P.-C. Chuang, “Application of a neural network integrated with the Internet of Things sensing technology for 3D printer fault diagnosis,” Microsyst. Technol., pp. 1–11, 2019. [Online]. Available: https://link.springer.com/article/10.1007%2Fs00542-019-04323-4#article-info
[14] S. Ahmad, A. Lavin, S. Purdy, and Z. Agha, “Unsupervised real-time anomaly detection for streaming data,” Neurocomputing, vol. 262, pp. 134–147, 2017.
[15] L. Rodriguez-Cobo, P. B. Garcia-Allende, A. Cobo, J. M. Lopez-Higuera, and O. M. Conde, “Raw material classification by means of hyperspectral imaging and hierarchical temporal memories,” IEEE Sensors J., vol. 12, no. 9, pp. 2767–2775, Sep. 2012.
[16] A. Bamaqa, M. Sedky, T. Bosakowski, and B. B. Bastaki, “Anomaly detection using hierarchical temporal memory (HTM) in crowd management,” in Proc. 4th Int. Conf. Cloud Big Data Comput., 2020, pp. 37–42.
[17] A. Almehmadi, T. Bosakowski, M. Sedky, and B. B. Bastaki, “HTM based anomaly detecting model for traffic congestion,” in Proc. 4th Int. Conf. Cloud Big Data Comput., 2020, pp. 97–101.
[18] B. B. Bastaki, “Application of hierarchical temporal memory to anomaly detection of vital signs for ambient assisted living,” Ph.D. dissertation, Staffordshire Univ., Stoke-on-Trent, U.K., 2019.
[19] A. Barua, D. Muthirayan, P. P. Khargonekar, and M. A. Al Faruque, “Hierarchical temporal memory based one-pass learning for real-time anomaly detection and simultaneous data prediction in smart grids,” IEEE Trans. Dependable Secure Comput., to be published, doi: 10.1109/TDSC.2020.3037054.
[20] S. Faezi, R. Yasaei, A. Barua, and M. A. Al Faruque, “Brain-inspired golden chip free hardware trojan detection,” IEEE Trans. Inf. Forensics Secur., to be published, doi: 10.1109/TIFS.2021.3062989.
[21] W. Yan, H. Qiu, and N. Iyer, “Feature extraction for bearing prognostics and health management (PHM)-a survey,” Air Force Research Lab Wright-Patterson AFB OH Materials and Manufacturing, Tech. Rep. AFRL-RX-WP-TP-2008-4309, 2008.
[22] D. Wang, K.-L. Tsui, and Q. Miao, “Prognostics and health management: A review of vibration based bearing and gear health indicators,” IEEE Access, vol. 6, pp. 665–676, 2017.
[23] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu, “Deep learning for smart manufacturing: Methods and applications,” J. Manuf. Syst., vol. 48, pp. 144–156, 2018.
[24] M. Kordos and A. Rusiecki, “Reducing noise impact on MLP training,” Soft Comput., vol. 20, no. 1, pp. 49–65, 2016.
[25] ISO 15243:2017, “Rolling bearings - damage and failures - terms, characteristics and causes,” Int. Org. for Standardization, Standard, Mar. 2017. [Online]. Available: https://www.iso.org/standard/59619.html
[26] J. Lee, H. Qiu, G. Yu, J. Lin, and Rexnord Technical Services (2007). IMS, University of Cincinnati “Bearing data set,” NASA Ames Prognostics Data Repository (http://ti.arc.nasa.gov/project/prognostic-data-repository), NASA Ames Research Center, Moffett Field, CA.
[27] P. Nectoux et al., “Pronostia: An experimental platform for bearings accelerated degradation tests,” in Proc. IEEE Int. Conf. Prognostics Health Manage., PHM’12. IEEE Catalog Number: CPF12PHM-CDR, 2012, pp. 1–8.
[28] S. R. Chhetri and M. A. Al Faruque, “Side channels of cyber-physical systems: Case study in additive manufacturing,” IEEE Des. Test, vol. 34, no. 4, pp. 18–25, Aug. 2017.
[29] J. Hawkins and S. Ahmad, “Why neurons have thousands of synapses, a theory of sequence memory in neocortex,” Front. Neural Circuits, vol. 10, 2016. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fncir.2016.00023/full
[30] Numenta, “Numenta temporal memory implementation,” Feb. 2020. Accessed: Feb. 10, 2020. [Online]. Available: https://github.com/numenta/nupic.core/blob/master/src/nupic/algorithms/TemporalMemory.hpp
[31] J. Park, “RNN based time-series anomaly detector model implemented in Pytorch,” 2018. [Online]. Available: https://github.com/chickenbestlover/RNN-Time-series-Anomaly-Detection
[32] M. Schneider, W. Ertel, and F. Ramos, “Expected similarity estimation for large-scale batch and streaming anomaly detection,” Mach. Learn., vol. 105, no. 3, pp. 305–333, 2016.
[33] M. Smirnov, “Contextual anomaly detector,” Aug. 2016. [Online]. Available: https://github.com/smirmik/CAD
[34] C. Wang, K. Viswanathan, L. Choudur, V. Talwar, W. Satterfield, and K. Schwan, “Statistical techniques for online anomaly detection in data centers,” in Proc. 12th IFIP/IEEE Int. Symp. Integr. Netw. Manage. (IM 2011) Workshops, 2011, pp. 385–392.
[35] A. Stanway, “Etsy skyline,” Oct. 2015. [Online]. Available: https://github.com/etsy/skyline
[36] E. Burnaev and V. Ishimtsev, “Conformalized density-and distance-based anomaly detection in time-series data,” 2016, arXiv:1608.04585.
[37] R. P. Adams and D. J. MacKay, “Bayesian online changepoint detection,” 2007, arXiv:0710.3742.

Arnav V. Malawade (Student Member, IEEE) received the B.S. degree in computer science and engineering from the University of California Irvine (UCI), Irvine, CA, USA, in 2018. He is currently an M.S. and Ph.D. student studying computer engineering with UCI under the supervision of Professor Mohammad Al Faruque.
His research interests include the design and security of cyber-physical systems in connected/autonomous vehicles, manufacturing, IoT, and healthcare.

Nathan D. Costa (Member, IEEE) received the B.S. degree in computer science and engineering from the University of California Irvine (UCI), Irvine, CA, USA, in 2020.
He is currently applying to industries relevant to his interests, those being embedded software development and embedded system design.

Deepan Muthirayan (Member, IEEE) received the Ph.D. degree in mechanical engineering from the University of California, Berkeley, CA, USA, in 2016, and the B.Tech./M.Tech. degree in engineering design from the Indian Institute of Technology Madras, Chennai, India, in 2010.
He is currently a Postdoctoral Researcher with the Department of Electrical Engineering and Computer Science, University of California. His doctoral thesis work focused on market mechanisms for integrating demand flexibility in energy systems. Before his term at UC Irvine he was a Postdoctoral Associate with Cornell University, Ithaca, NY, USA, where his work focused on online scheduling algorithms for managing demand flexibility. His current research interests include control theory, machine learning, topics at the intersection of learning and control, online learning, online algorithms, game theory, and their application to smart systems.


Pramod P. Khargonekar (Fellow, IEEE) received the B. Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay, India, in 1977, and the M.S. degree in mathematics and the Ph.D. degree in electrical engineering from the University of Florida, Gainesville, FL, USA, in 1980 and 1981, respectively.
He was the Chairman of the Department of Electrical Engineering and Computer Science from 1997 to 2001, and also held the position of Claude E. Shannon Professor of Engineering Science with the University of Michigan, Ann Arbor, MI, USA. From 2001 to 2009, he was the Dean of the College of Engineering and Eckis Professor with the Electrical and Computer Engineering with the University of Florida till 2016. After serving briefly as Deputy Director of Technology at ARPA-E from 2012 to 2013, he was appointed by the National Science Foundation (NSF) to serve as Assistant Director for the Directorate of Engineering (ENG) in 2013, a position he held till 2016. Currently, he is Vice Chancellor for Research and Distinguished Professor of Electrical Engineering and Computer Science with the University of California. His research and teaching interests include theory and applications of systems and control.
Dr. Khargonekar was the recipient of numerous honors and awards including IEEE Control Systems Award, IEEE Baker Prize, IEEE CSS Axelby Award, NSF Presidential Young Investigator Award, AACC Eckman Award, and is a Fellow of IEEE, IFAC, and AAAS.

Mohammad A. Al Faruque (Senior Member, IEEE) received the B.Sc. degree in computer science and engineering (CSE) from the Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh, in 2002, and the M.Sc. and Ph.D. degrees in computer science from Aachen Technical University, Aachen, Germany, and Karlsruhe Institute of Technology, Karlsruhe, Germany, in 2004 and 2009, respectively.
He is currently with the University of California (UCI), Irvine, CA, USA, as an Associate Professor and Directing the Embedded and Cyber-Physical Systems Lab. He served as an Emulex Career Development Chair from 2012 till 2015. Before, he was with Siemens Corporate Research and Technology in Princeton, NJ, USA, as a Research Scientist. His current research is focused on the system-level design of embedded and cyber-physical-systems (CPS) with special interest in low-power design, CPS security, data-driven CPS design, etc. He is an ACM Senior Member. He is the Author of two published books.
Dr. Faruque is the recipient of the School of Engineering Mid-Career Faculty Award for Research 2019, the IEEE Technical Committee on Cyber-Physical Systems Early-Career Award 2018, and the IEEE CEDA Ernest S. Kuh Early Career Award 2016. He is also the recipient of the UCI Academic Senate Distinguished Early-Career Faculty Award for Research 2017 and the School of Engineering Early-Career Faculty Award for Research 2017. Besides 120+ IEEE/ACM publications in the premier journals and conferences, he holds nine U.S. patents.
