
2550 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 18, NO. 4, APRIL 2022

Data Guardian: A Data Protection Scheme for Industrial Monitoring Systems

Yue Zhuo and Zhiqiang Ge, Senior Member, IEEE

Abstract—Recently, data-driven industrial monitoring systems have developed rapidly and have significantly improved performance on industrial monitoring tasks. However, widely deployed data-driven models expose industrial data to more unsecured links, which significantly increases the safety risk of safety-critical industrial systems. Research on adversarial attacks has shown that potential attackers can use tiny crafted perturbations on the input data to mislead the output of machine learning models. In this article, a novel cross-domain data protection scheme named “Data Guardian” is proposed to authenticate and correct industrial data under potential attacks on data-driven monitoring systems. “Data Guardian” embeds designed redundancy information into the least significant bits of the data, based on q-ary low-density parity-check (LDPC) codes over the Galois field (finite field). The data are encoded at the secured industrial sites and then decoded before being input into the monitoring systems, where the security risk is usually higher. The decoding capability of q-ary LDPC codes is improved by the statistical characteristics of the data, through a newly proposed prior estimation method. In the experiments, “Data Guardian” is tested under attacks on fault diagnosis models for the Tennessee Eastman process and the rolling element bearing dataset from Case Western Reserve University. The results show that “Data Guardian” can efficiently reduce the success rate of adversarial attacks, especially when the attack variable ratio is small.

Index Terms—Adversarial attack, data protection, industrial monitoring system security, q-ary low-density parity check (LDPC).

I. INTRODUCTION

In modern industries, the monitoring system plays an essential role in industrial safety and stability, and fault diagnosis is one of its crucial tasks [1]. In recent years, with the development of data collection and analysis techniques, data-driven machine learning methods have become pervasive in the industrial fault classification field, such as neural networks for transmission line fault classification [2] and deep belief networks for industrial processes [3].

Despite the advantages of machine learning methods, recent research has revealed a weakness that exists widely among these algorithms: small, malicious, imperceptible perturbations or random noise added to the input can result in a totally wrong output, which is called an adversarial attack [4]. Szegedy et al. [5] first discovered this property of deep networks in 2013, and Goodfellow et al. [6] presented a basic attack method, the “fast gradient sign method” (FGSM), which crafts adversarial samples along the direction of increasing gradient.

Meanwhile, under the trend of Industry 4.0 and the Internet of Things [7], the widely used data-driven systems will expose industrial data to higher security risks. Specifically, developing and deploying machine learning models involves many unsecured links, such as machine learning as a service (MLaaS) on third-party cloud servers [8], which are usually less guarded than physical industrial sites.

Unlike the existing defense methods against adversarial attacks [9], which work from the model perspective, our work, called “Data Guardian,” protects the monitoring systems at the data side. “Data Guardian” protects data in two steps. The first is encoding: before the data are exposed to potentially risky scenarios, dedicated redundancy codes are computed and embedded into the least significant bits (LSBs) of the raw data at the secured industrial site. The second is decoding: before the potentially attacked data are fed into the monitoring systems, the authenticity of the data is checked. Unauthenticated data are then corrected with the help of the redundancy codes to retrieve the clean raw data. The overview of the “Data Guardian” scheme for industrial monitoring systems is illustrated in Fig. 1.

Authentication codes (check codes) and error correction codes (ECCs) have been widely applied in the communication and data storage fields. Check codes, such as hashing codes [10] and cyclic redundancy check codes [11], are used for checking whether the data have been perturbed or incorrectly transmitted. ECCs can correct the errors in data caused by noise or malicious attacks [12], not simply detect them. The ability of ECCs to recover data for reuse is particularly beneficial for industrial processes where the data are hard to obtain.

In this article, one representative ECC, the low-density parity-check (LDPC) code, is taken as the base of the data correction scheme. LDPC codes were first proposed by Gallager in 1963 [13] and rediscovered by MacKay and Neal in 1996 [14];

Manuscript received May 16, 2021; revised July 10, 2021; accepted August 2, 2021. Date of publication August 10, 2021; date of current version December 27, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61833014 and Grant 61722310 and in part by the Natural Science Foundation of Zhejiang Province under Grant LR18F030001. Paper no. TII-21-2065. (Corresponding author: Zhiqiang Ge.)
The authors are with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China, and also with the Peng Cheng Laboratory, Shenzhen 518000, China (e-mail: [email protected]; [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2021.3103765.
Digital Object Identifier 10.1109/TII.2021.3103765

1551-3203 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Birla Institute of Technology & Science. Downloaded on November 25,2022 at 10:17:56 UTC from IEEE Xplore. Restrictions apply.
ZHUO AND GE: DATA GUARDIAN: A DATA PROTECTION SCHEME FOR INDUSTRIAL MONITORING SYSTEMS 2551

Fig. 1. Overview of the “Data Guardian” scheme for industrial monitoring systems.

since then, LDPC codes have developed rapidly. In recent years, LDPC codes have been proven to reach good error correction performance and have been commercially applied in many standards, such as the Wi-Fi 802.11 standard [15], NAND flash memory [16], and 5G telecommunication [17]. The decoding process of LDPC codes can be regarded as belief propagation in a probabilistic graph, which takes the prior probability of the data as input and outputs a soft decision based on maximum a posteriori (MAP) estimation, named soft decoding. Soft decoding outputs the MAP estimate of the clean raw data. Hence, even if the exact raw data cannot be decoded, the output more closely resembles the raw data, which may still correct the prediction of industrial monitoring models. This is an advantage of LDPC codes for industrial data correction that hard-decoding ECCs do not have.

In 1998, Davey and MacKay [18] generalized LDPC codes from binary elements {0, 1} to the ensemble {0, 1, . . . , q − 1} over the high-order Galois field GF(q), q > 2, called q-ary LDPC or nonbinary LDPC. They showed that LDPC codes over GF(q) achieve superior performance to binary LDPC codes. Unfortunately, unlike binary LDPC, which has been widely researched and applied in practical scenarios, q-ary LDPC lacks systematic research, and its high decoding complexity hinders practical applications in communication systems.

Hence, we are committed to researching the application of q-ary LDPC in the artificial intelligence (AI) security field. In the big data era, many data-driven applications have been widely used in various fields while posing a new challenge: how to secure the data and models of AI systems. In this work, we start with the data security issues and utilize q-ary LDPC to give data authentication and error correction properties. Correspondingly, we propose a new decoding algorithm for q-ary LDPC that is merged with the statistical characteristics of the data, efficiently improving the correction capacity for industrial data. In addition, “Data Guardian” is a universal defense method, which is not aimed at one or several attack methods but can defend against almost all adversarial attacks of similar perturbation forms. Moreover, “Data Guardian” is independent of the machine learning models, so it can be applied to any kind of monitoring system.

The main contributions of the article are summarized as follows.
1) To defend against adversarial attacks, a novel data protection scheme, “Data Guardian,” is proposed to protect industrial monitoring systems from the data perspective. The method is universal and applicable to any potential adversarial attack on data-driven machine learning models.
2) At the q-ary LDPC decoding stage, a new prior estimation approach utilizing statistical information of the data is put forward, dedicated to improving the correction capability for industrial data.
3) The proposed “Data Guardian” is a novel cross-domain data protection method that, for the first time, merges statistical data with a communication ECC to guard the security of industrial AI systems.

The rest of this article is organized as follows. Section II introduces preliminaries and related works about industrial monitoring systems and adversarial attacks as well as the LDPC codes. Section III gives a comprehensive and detailed introduction to our data protection scheme “Data Guardian.” Section IV reports the experiment results. Finally, Section V concludes this article.

II. PRELIMINARIES AND RELATED WORKS

A. Fault Diagnosis and Adversarial Attack

“Data Guardian” can protect against various attacks on data-driven industrial monitoring systems. Fault diagnosis is one of the most important and basic monitoring tasks and is assumed here to be the attacked target system.

Given data $D = (x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, where $x_i$ is the sample collected from the industrial process and $y_i$ is the corresponding fault category.

1) Fault Diagnosis: Data-driven fault diagnosis can be regarded as a classification problem in machine learning. Fault diagnosis classifies industrial faults into the correct categories, which helps to identify the cause of a fault and then fix it. It can be written as follows:

$$f(x) = \arg\max P(y_i \mid x_i). \tag{1}$$

Here, arg max returns the argument that corresponds to the maximal conditional probability of label $y_i$ given data $x_i$.

2) Adversarial Attack: An adversarial attack perturbs the original data with vectors constrained by certain values, which causes the model to give wrong outputs. In this article, the attacked model is the fault classifier $f$. Denote the n-dimensional original fault samples to be perturbed as x and the perturbation


vector of the same shape as η. The adversarial attack is

$$f(x + \eta) \neq f(x) \tag{2}$$

$$\text{subject to } \|\eta\|_p \le \varepsilon \tag{3}$$

where $\|\cdot\|_p$ is the p-norm of the vector, which controls the distance between adversarial and original samples under different measurements, and ε is a predefined hyperparameter that controls the perturbation amplitude. The constraints ensure that the adversarial samples will not be too far from the clean raw data, making them difficult to observe directly by human perception.

3) Related Work of Adversarial Defense: Most defense methods against adversarial attacks work from the model perspective [9]. An intuitive way is adversarial training, which trains robust neural networks with adversarial samples [5], [19]. Besides, there are many other methods to defend against adversarial attacks, such as contractive networks [20], federated learning [21], or add-on models to defend against the perturbations [22]. However, to the best of our knowledge, there is still no existing work that defends against adversarial attacks from the data perspective. One adversarial defense method related to our work is error correcting output codes (ECOC) [23], which encode the outputs of classification models into bits and decode the perturbed outputs into the correct predicted categories, improving the robustness of data-driven models against adversarial attacks. However, ECOC still works on the model structure and has nothing to do with the data itself.

B. LDPC Codes

1) Galois Field: As with all other check codes or ECCs, the operations in LDPC codes are all performed over the Galois field. In mathematics, a Galois field (or finite field) is a field that contains a finite number of elements. Finite fields are fundamental in a number of areas, including number theory, algebraic geometry, cryptography, and coding theory. Given a prime number p, the set G = {0, 1, . . . , p − 1} forms a finite field under modulo-p addition and modulo-p multiplication, denoted as GF(p). The q-ary LDPC codes considered in this article are all based on the power expression of the finite field GF($q = 2^m$), so the addition operation between field elements is the exclusive OR (XOR) of their polynomial representations.

LDPC codes are linear block codes whose parity-check matrices have a low density of nonzero entries (they are sparse). They can be represented either by the generator matrix G or by the parity-check matrix H, where $G \times H^T = 0$. If the elements in G and H are on GF(2) = {0, 1}, the LDPC code is binary; otherwise, it is a q-ary (nonbinary) LDPC code.

The generator matrix $G = [P \mid I_k]$ in this work is over GF(q), and the LDPC code is q-ary. The encoding operation is expressed as $c = u \times G$, where $u = (u_0, \ldots, u_{k-1})$ represents the raw data on the same Galois field. The redundant part of the codeword, $u \times P$, is subsequently embedded into the LSBs inside the raw data. The most common representation of an LDPC code is the bipartite graph of its H matrix, also called the Tanner graph [24]. An edge (or a connection) in the Tanner graph represents a nonzero value in the parity-check matrix. An example of a 3 × 5 q-ary (q = 4) LDPC parity-check matrix H and its Tanner graph is shown in Fig. 2.

Fig. 2. Parity-check matrix and Tanner graph of a 4-ary LDPC code.

Based on the Tanner graph, the LDPC decoding process can be summarized as an iterative probabilistic propagation between variable nodes and check nodes, which is commonly referred to as the sum-product algorithm (SPA) [25], or the q-ary SPA (QSPA) [26] for q-ary LDPC.

2) Decoding Algorithm: The first step of the QSPA is checking the syndrome of the input codeword c using $H \times c^T$. If the syndrome is a zero vector, the codeword is authenticated as safe, and decoding is not needed. Otherwise, for the received attacked codeword z, the corresponding prior probability λ is estimated, which is expressed as a (q − 1)-dimensional vector in log-likelihood ratio (LLR) form. λ represents the probability of the original value at the variable $v_i$, given the received distorted codeword z. With the input λ at the variable nodes, the iterative decoding contains two kinds of message passing.
1) The probability message q passed from variable nodes to check nodes

$$q_{i,j}^{a} = P(z_j = a \mid s_{i'} = 0,\ i' \in N_v(j) \setminus i). \tag{4}$$

2) The probability message r passed from check nodes to variable nodes

$$r_{i,j}^{a} = P(s_i = 0 \mid z_j = a,\ \mathrm{pmf}(z_{j'}) = q_{i,j'}\ \forall j' \in N_c(i) \setminus j) \tag{5}$$

where a denotes the q − 1 nonzero elements of the finite field GF(q), $N_v(j)$ is the index set of the check nodes connected to the jth variable node, $N_c(i)$ is the index set of the variable nodes participating in the ith check equation, $s_i$ is the syndrome value of the ith check equation, and pmf means probability mass function. The two messages are passed iteratively until the syndrome of the decision codeword z is zero or the maximum iteration number is reached.

3) Related Work of LDPC Codes: An LDPC code application similar to our data protection scheme is digital watermarking [27]. These data watermarks are mainly designed for authenticating and correcting images or videos with embedded ECCs, including LDPC codes. However, almost all of them are based on binary codes, and none of them considers the statistical characteristics of the data. As for q-ary LDPC codes, most research studies focus on improving the decoding algorithm and the parity matrix [28], while improving the prior probability with respect to the decoded data has received less attention.
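As a concrete illustration of the encoding $c = u \times G$ with $G = [P \mid I_k]$ and of the syndrome check $H \times c^T$ described in this section, the following minimal sketch works over GF(4). It is not the authors' implementation: the primitive polynomial $x^2 + x + 1$, the tiny dense matrix `P`, the systematic form $H = [I_{n-k} \mid P^T]$, and all function names are assumptions chosen for illustration, and a real LDPC code would use a much larger sparse H.

```python
# GF(4) = GF(2^2), built here from the primitive polynomial x^2 + x + 1
# (an assumption of this sketch). Elements are the integers 0..3, where
# 2 stands for alpha and 3 for alpha + 1.
EXP = [1, 2, 3]            # alpha^0, alpha^1, alpha^2
LOG = {1: 0, 2: 1, 3: 2}   # discrete logarithm of the nonzero elements


def gf_add(a, b):
    """Addition in GF(2^m) is the bitwise XOR of the polynomial coefficients."""
    return a ^ b


def gf_mul(a, b):
    """Multiplication via log/antilog tables (the nonzero elements form a cyclic group of order 3)."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 3]


def gf_dot(row, col):
    """Inner product of two vectors over GF(4)."""
    acc = 0
    for a, b in zip(row, col):
        acc = gf_add(acc, gf_mul(a, b))
    return acc


def encode(u, P):
    """c = u x G with G = [P | I_k]: redundancy u x P followed by the raw symbols u."""
    n_parity = len(P[0])
    parity = [gf_dot(u, [row[j] for row in P]) for j in range(n_parity)]
    return parity + list(u)


def parity_check_matrix(P):
    """H = [I_(n-k) | P^T]; in characteristic 2 this satisfies G x H^T = 0."""
    k, r = len(P), len(P[0])
    H = []
    for i in range(r):
        identity_row = [1 if j == i else 0 for j in range(r)]
        H.append(identity_row + [P[t][i] for t in range(k)])
    return H


def syndrome(H, c):
    """s = H x c^T; an all-zero syndrome authenticates the codeword."""
    return [gf_dot(row, c) for row in H]


# Hypothetical rate-1/2 toy code (k = 2, n = 4).
P = [[1, 2], [1, 3]]
u = [2, 3]                      # raw data symbols
c = encode(u, P)                # [1, 1, 2, 3]
H = parity_check_matrix(P)
clean = syndrome(H, c)          # [0, 0]: authenticated
tampered = list(c)
tampered[2] = 0                 # perturb one data symbol
attacked = syndrome(H, tampered)  # [2, 3]: nonzero, so decoding is required
```

The sketch covers only authentication. Correcting the tampered symbol would require the iterative QSPA message passing of (4) and (5), which is omitted here.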


Fig. 3. Block diagram of the “Data Guardian” scheme for industrial monitoring systems with a specific TEP sample.

III. DATA PROTECTION SCHEME FOR INDUSTRIAL MONITORING SYSTEMS

This section describes the proposed data protection scheme “Data Guardian” and its application to industrial monitoring systems, which is shown in detail in the block diagram of Fig. 3 with a specific example from the Tennessee Eastman process (TEP). The scheme mainly contains three steps.
1) Quantization: Like normalization, quantization is a necessary preprocessing step for the LDPC codes. The original floating-point data collected from industrial scenarios are approximated by a set of integers, which can be further manipulated by the data protection scheme. In Fig. 3, the illustrative data sample is quantized from a real vector of k float elements to a binary k × 12 matrix, with 12 quantization bits.
2) Encode: The most significant half of the bits (MSBs) of the quantized data is converted to integer values in GF($2^6$) and then multiplied by the generator matrix G; the product is the encoded codeword. The raw data are on the MSBs of the codeword, and the redundancy codes are embedded into the least significant half of the bits (LSBs) of the quantized data. Finally, through inverse quantization, the codeword is converted into continuous-valued data and sent to the remote monitoring system.
3) Decode: On receiving the uncertified data, the syndrome vector is first calculated. If the data are unauthenticated, the decoding process is started. In the diagram, the second variable of the data is assumed to be perturbed. First, the prior probability of the received attacked data is estimated by probabilistic distribution models [such as the Gaussian mixture model (GMM)], which have been


trained offline on the historical data. Then, the QSPA decoding algorithm is applied to correct the attacked data. As the diagram shows, the second variable of the decoded data is successfully corrected to the value of the clean encoded data before the attack. Finally, the corrected data can be used by industrial monitoring systems.

The following parts of this section give more detailed descriptions of these three steps, as well as analyses of “Data Guardian,” including its assumptions, pros, cons, and own security.

A. Quantization

Before the quantization, the collected industrial data are first normalized to the range [0,1], which simplifies the later processing and benefits the monitoring system performance. Usually, the industrial data collected from sensors are floating-point values, but the values operated on by the LDPC scheme should be finite elements on the Galois field. So, by quantizing the floating-point values onto a regularly spaced grid, the original data can be approximated by a set of integers. For data in the range [0,1], the quantization can be expressed by the following equation:

$$\hat{x} = \lfloor x \cdot 2^{Q} \rfloor \tag{6}$$

where $\lfloor \cdot \rfloor$ denotes the floor operation, which returns the largest integer not exceeding its argument, and Q is the number of quantization bits.

The number of quantization bits Q is decided by the number of bits to be protected, m, which also decides the field GF(q) for the LDPC matrices, $q = 2^m$. Because the protected bits are located at the MSBs and cannot be modified, there should be at least another m bits of space in the LSBs to embed the m-bit redundancy codes on GF(q). Hence, the number of quantization bits for the data should be at least 2m, that is, $Q \ge 2m$. The number of protected bits, m, decides the precision of the data.

Generally, a larger m brings a smaller loss of quantization precision, but the correspondingly larger q makes LDPC decoding more complex, because the search space becomes larger. A larger q also increases the computational cost of decoding, since the computational complexity of the QSPA algorithm is $O(q^2)$. Moreover, the data precision has different effects on different industrial processes and monitoring models. In general, the value of m should be a tradeoff between decoding complexity and data precision, considering different industrial scenarios.

B. Encode

Encoding is the process of generating redundant information based on the raw data and the generator matrix G. The designed pair of generator matrix G and parity matrix H defines an LDPC code and influences its performance. Nevertheless, where to set the nonzero values of the sparse matrix H is not the concern of this work; here, we only discuss the influence of the shape of G, also called the code rate.

1) Code Rate: Given G of shape k × n, the corresponding code rate of the LDPC code is k/n, and the length of the redundancy codes is n − k. In the communication field, a lower code rate means larger redundancy and naturally a higher correction rate on the message (data). The situation in our data protection scheme is different, since the redundancy codes are embedded in the LSBs together with the data in the MSBs. When one variable is attacked, both of the two codes will be perturbed. Hence, a code rate higher than 1/2 will not benefit our scheme, while a code rate lower than 1/2 will waste data storage. Therefore, the code rate used in our scheme is 1/2, which means that one redundancy code is embedded into the LSBs of one data variable, as shown in Fig. 3.

During the encoding, the MSBs of the data are first extracted and converted to elements in GF(q). Then, redundancy codes are computed and embedded into the LSBs of the raw data. At last, the quantized data are converted back to floating-point form and transmitted to the remote monitoring systems. Though the encoded data lose some precision, these protected industrial data are very close to the raw data. They can still be read or processed by other data analysis methods, just like unprotected data.

C. Decode

Since the monitoring systems are usually at a remote place far away from the industrial site, the data transmission process passes through many unsecured, high-risk links. Hence, the attacks or noise perturbations are assumed to happen at this stage. Before being sent into the monitoring systems, each data sample received at the remote end is decoded to authenticate or correct it, ensuring that the data used for monitoring are reliable.

The very first step of decoding is to check the syndrome of the received data after quantization. If the syndrome is zero, the data are authenticated; otherwise, it can be judged that there exist errors in the received data, and the decoding algorithm should be applied. For LDPC decoding, the actual input is the probability of the raw data estimated from the received data. The precision of this probability decides the performance of the decoding algorithm.

In communications, the perturbations are commonly random noise on every single bit, not on a whole symbol (a GF($q = 2^m$) element consisting of m bits). However, in industrial data systems, perturbations happen on variables, which means that individual variables are attacked as a whole. In our scheme, one variable contains two symbols (the data symbol on the MSBs and the redundancy symbol on the LSBs).

As for the industrial process data, we assume that the attacks or perturbations only happen on a small part of the variables. So, the prior probabilities have two parts: 1) a Bernoulli distribution for the received symbol and all the other symbols and 2) the probability mass function of all the symbols. The two parts together can be written as

$$\Pr(v_i = a \mid z) = \begin{cases} p_a \cdot P_{v_i}(a \mid z), & a \neq z(i) \\ (1 - p_a) + p_a \cdot P_{v_i}(a \mid z), & a = z(i) \end{cases} \tag{7}$$

where $p_a$ is the parameter of the Bernoulli distribution and $P_{v_i}(a \mid z)$ is the probability mass function that gives the conditional probability that variable $v_i$ equals element a in GF(q) given the received codeword z.

For the LSB symbols, it is hard to estimate a precise probability distribution, so we assume that all symbols on GF(q)


except the received one are uniformly distributed:

$$P_{v_i}(a \mid z) = 1/q. \tag{8}$$

For the MSB symbols, one intuitive way is to assume that the probability mass function is normal. The integral of the Gaussian probability density function over one quantization grid cell can be used to estimate the probability mass function

$$P_{v_i}(a \mid z) = \int_{a/2^{q}}^{(a+1)/2^{q}} \mathcal{N}(z(i), \sigma^2) \tag{9}$$

where $\mathcal{N}(z(i), \sigma^2)$ denotes the Gaussian probability density function with mean $z(i)$ and variance $\sigma^2$. The corresponding probability mass function is calculated by the integral of the probability density function between two adjacent symbols.

Obviously, this approach does not consider the correlation between different variables in the data. In practice, however, industrial variables are tightly related; therefore, in this article, a novel approach is proposed to estimate this prior probability based on the industrial data statistics.

1) Statistical Prior Probability Estimation: In general, the probability density is first estimated offline from the historical data. When the attacked data are received, we can then identify the high-probability regions in which the clean raw data may be located. There are many probability density estimation methods, such as kernel density estimation, Bayesian networks, variational autoencoders, and generative adversarial networks.

In this article, we use the GMM as the probability density estimator. After the GMM is fitted offline to the historical data by maximum likelihood estimation, for each sample received online in real time, we fix all other variables and set the value of $v_i$ from 0 to q − 1. The output of the GMM is the probability of the clean raw value of variable $v_i$ of this sample at each GF(q) element. Repeating this for all the variables, the prior probability matrix can be calculated and then input into the decoding algorithm. The probability mass function is

$$P_{v_i}(a \mid z) = \mathrm{GMM}(z'), \quad z' \leftarrow z(i) = a \tag{10}$$

where $z'$ is obtained by modifying the ith variable of z to element a in GF(q).

At last, the decoding algorithm outputs the corrected data. If decoding fails, the most confident sample is output. In practice, before the decoded data are fed into the industrial monitoring systems, a zero syndrome means that the data have not been attacked or have been successfully corrected, so they are authenticated and reliable for the monitoring systems. Otherwise, for data that fail to decode, the monitoring result is not reliable but can be used as a reference.

D. Analysis

Here, we analyze the attack scenarios, attack assumptions, pros, and cons of the “Data Guardian” scheme for industrial data.

1) Attack Scenarios: In this article, we assume that a malicious hacker tries to attack critical industrial systems. However, real-world physical industrial facilities such as assembly lines or factories are usually strictly guarded, so it is hard to gain access and cause physical damage. In contrast, the widely applied data-driven monitoring systems expose more vulnerabilities to attackers through the data. Because of the harsh environment at industrial sites, data-driven monitoring systems are commonly not located on site but placed on remote servers such as MLaaS, e.g., Google Cloud ML and AWS by Amazon [8]. This exposes the data to many third-party links, such as data scientists and machine learning experts. The security of these links is outside the protection of the industrial site and currently receives less attention, which significantly increases the risks of data leakage, attack, and pollution. Hence, the hypothetical attackers aim to perturb the industrial data, rather than attack physical industrial facilities, to achieve the purpose of sabotage. Also, we believe this kind of adversarial data attack is easier to apply and harder for supervisors to detect.

Another type of attack aims at industrial control systems; it deceives the measurements of controllers and generates malicious actuation signals. For example, the well-known Stuxnet [29] successfully attacked the supervisory control and data acquisition (SCADA) systems of nuclear power plants in a concealed way. Later SCADA attack methods, such as covert networked control system misappropriation [30], used a covert agent feedback controller to attack the industrial systems with tiny disturbances on the measurements. This kind of attack shares some common features with the adversarial attack. Both exploit the security defects of the digital signals communicated between physical industrial facilities and remote supervisory control systems, and both tamper with the measurements to conceal themselves from detection. Hence, “Data Guardian” can also defend against such attacks on the measured data. Before the measured values are used by supervisory control systems, “Data Guardian” can ensure that all the data (signals) are intact and have not been tampered with, even if the perturbation is extremely small and hard to detect.

In summary, “Data Guardian” can defend against any form of data tampering attack that happens on the transmission link between the physical industrial site and the remote networked monitoring systems. Note that in Fig. 3, the industrial site and the remote server are assumed to be guarded and safe. If an attack (e.g., Stuxnet) takes over the whole monitoring system, such as the programmable logic controllers, the protection of “Data Guardian” is no longer guaranteed. Moreover, attacks beyond the data acquisition link (e.g., malicious control actuation) are also outside the scope of our scheme.

2) Attack Assumptions: “Data Guardian” assumes that the attacked variables are a small part of all variables and that the amplitude of the attacks is not very large. If large attacks happen on data-driven monitoring systems, the corresponding adversarial samples can be easily detected by other detection methods, such as outlier detectors [31] and dedicated detectors [32], or even directly observed by human perception. Hence, this work focuses on the more challenging detection and defense tasks in which the perturbations are small. We also believe that the assumptions are close to industrial practice, where it is more likely that a few variables are perturbed than that large parts of the industrial data are polluted. Also, the errors commonly will not deviate too much from the actual working condition.
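The quantization of (6), the MSB/LSB embedding of Section III-B, and the symbol prior of (7) with the uniform assumption of (8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the clipping of x = 1.0 into Q bits, the choice Q = 2m, the placeholder redundancy value, and the reading of $p_a$ as the probability that a variable was attacked are all hypothetical.

```python
import math


def quantize(x, Q):
    """Eq. (6): x_hat = floor(x * 2^Q) for x in [0, 1].

    Clipping to 2^Q - 1 is an assumption of this sketch so that x = 1.0
    still fits into Q bits.
    """
    return min(int(math.floor(x * 2 ** Q)), 2 ** Q - 1)


def split_symbols(x_hat, m):
    """Split a 2m-bit integer into (MSB data symbol, LSB redundancy symbol)."""
    return x_hat >> m, x_hat & ((1 << m) - 1)


def embed(x, redundancy, m):
    """Keep the m protected MSBs of x and overwrite the m LSBs with a redundancy symbol.

    Returns the protected value after inverse quantization (back to a float).
    """
    data_symbol, _ = split_symbols(quantize(x, 2 * m), m)
    codeword = (data_symbol << m) | redundancy
    return codeword / 2 ** (2 * m)


def symbol_prior(received, q, p_a, pmf=None):
    """Eq. (7): mix a point mass on the received symbol with a pmf over GF(q).

    With pmf=None the uniform pmf of Eq. (8) is used. p_a is treated here as
    the probability that the variable was attacked (an interpretation).
    """
    if pmf is None:
        pmf = [1.0 / q] * q
    prior = [p_a * pmf[a] for a in range(q)]
    prior[received] += 1.0 - p_a
    return prior


m = 6                                   # protected bits, so q = 2^6 and Q = 12
redundancy_symbol = 45                  # placeholder; in the scheme it comes from u x P
protected = embed(0.3, redundancy_symbol, m)
recovered = split_symbols(quantize(protected, 2 * m), m)   # (19, 45)
prior = symbol_prior(received=2, q=4, p_a=0.1)
```

In this example the protected value differs from the raw value 0.3 by less than $2^{-m}$, which reflects the precision loss discussed in Section III-A. Replacing the uniform `pmf` with values obtained from a fitted density model would correspond to the statistical prior of (10).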


Fig. 4. Rolling element bearing platform from CWRU [38].

Fig. 5. Quantization precision influence on monitoring accuracy.

3) Pros: There are mainly three advantages of “Data Guardian.”
1) Since our scheme embeds the redundancy information inside the data structure, unlike some digital signature schemes, it does not need any additional data storage space, which makes it suitable for more practical industrial settings. Moreover, ECCs enable our scheme to correct attacks, not only detect them. This is important because some industrial fault data are hard to collect and each sample should be fully utilized.
2) The q-ary form of LDPC lets us estimate the prior of each variable as a whole, rather than of the single bits used in the communication field. Furthermore, the statistical characteristics of industrial data across variables can efficiently improve the performance of the belief propagation decoding algorithm of q-ary LDPC, while some other ECCs based on polynomial operations cannot utilize this characteristic.
3) Our data protection has little influence on model accuracy, as shown in Fig. 5 in the experiment section, while most of the other model-based defense methods cause overfitting to a very large degree and harm the model performance [33].
4) Cons: Our scheme also has weaknesses. Because the redundancy codes are embedded in the LSB, the prior probability of the LSB symbols can only be taken as a uniform distribution, which makes the search space of the LSB very large and, in turn, reduces the decoding capacity. Also, when the perturbations are large, the prior probability estimated by the statistical models may not be precise. In summary, the performance of our scheme is harmed by a large attack variable ratio.
5) Security: The security of “Data Guardian” itself is discussed for the case of counterdefense attackers. Commonly, there are two types of defense techniques against counterdefense attacks. One is to keep the parity-check matrix secret. In this article, for convenience, the q-ary LDPC codes are sequentially embedded into the LSB of the raw data. In practical applications, to prevent the coded information from being inferred, the codes can be scrambled in a secret order [34] or further encrypted through cryptography techniques (e.g., hash functions [35]) before they are embedded. At the decoding stage, the codes are first retrieved by the same scramble order or cryptography algorithm with private keys. With these techniques, the plain data and codes are hard for external counterdefense attackers to infer, which keeps the parity-check matrix secret. The other situation is that the parity-check matrix is public; then, other encryption algorithms related to the McEliece cryptosystem [36] can be applied, which is not elaborated here.

IV. EXPERIMENTS
In this section, the “Data Guardian” scheme is evaluated on two datasets: one is the industrial process benchmark, TEP [37], and the other is the rolling element bearing (REB) dataset from the Case Western Reserve University (CWRU) dataset center [38], which is also widely used as a benchmark. Deep neural networks (DNNs) for fault diagnosis trained on these two datasets are assumed to be attacked. Based on the DNN fault diagnosis model for the TEP, a deeper insight is given to explain the proposed prior probability estimations. The REB platform is illustrated in Fig. 4.

A. Experiment Settings
1) Dataset: The TEP is a public benchmark dataset for developing, studying, and evaluating an industrial process. The TEP dataset consists of 52 variables and 29 fault types. In the experiments, 22 fault types (including one under normal working condition) with 48 variables are chosen. The REB from CWRU is also a widely used benchmark dataset for fault diagnosis, in which three types of fault at different positions (race and ball) and the normal condition are used. Drive end accelerometer data are taken and split by a sliding window into samples of 48 sampling points, which means each data point consists of 48 sampling values.
2) Monitoring Systems: This article uses the softmax DNN fault diagnosis model as the representative of industrial monitoring systems. As for the data split, each working condition in the two datasets is represented by nearly 500 samples, of which 70% are used for training and 30% for testing and attack. The DNNs are trained with supervision on the training set and reach 76% accuracy for the TEP and 85% for REB on the test set.
3) Quantization: The quantization precision influences the monitoring system performance. To quantify this influence, the data in the test set are quantized to different numbers of bits, and the accuracies of the monitoring models are checked. Fig. 5 shows the results of the DNN fault classifiers for the two datasets, where the gray lines show the accuracy at 32-bit full precision. According to the experiments on the TEP, six quantization bits are found to be the


best choice, losing a little accuracy (−1.76% and −0.79% relative to full precision) but achieving a considerable decoding success rate. As for REB, quantization bits greater than 4 do not influence the diagnosis accuracy. So, in the following experiments, for the two datasets, the matrices of q-ary LDPC are constructed over GF(2^6) and the original data are quantized to 12 bits, with six MSBs for the protected data and six LSBs for the redundancy codes. According to the data structure, the generator matrix G and the sparse parity matrix H of q-ary LDPC are of shape 48 × 96 over GF(2^6), where each row of H has four nonzero values and each column has two.
4) Attack: In this article, two kinds of attacks are assumed: FGSM and Gaussian white noise. FGSM represents a malicious attack that applies fixed-amplitude perturbations to carefully selected variables, while white noise represents random-amplitude perturbations on random variables. From the perspective of perturbation vector patterns, in our opinion, these two attacks can well represent most adversarial attack methods, and the corresponding results can convincingly validate the defense capabilities of “Data Guardian” against universal adversarial attacks. All the correctly predicted test data are attacked by these two perturbations and protected by our scheme. The statistical prior estimation GMM model is trained without supervision on the whole training set.

B. Experiment Results
The performance of “Data Guardian” is evaluated under different attack amplitudes and variable ratios. The performance metric is the attack success rate (the rate at which attacks make the fault classifier give a wrong prediction). Through the comparison between data with and without “Data Guardian,” the effectiveness of our data protection scheme is clearly illustrated.

Fig. 6. Success attack rate of FGSM.

Fig. 7. Success attack rate of white noise.

Figs. 6 and 7, respectively, show the attack success rates of the FGSM and white noise, where the first row is for the TEP and the second for REB. When the attack ratio is relatively low (no more than 30%), our methods are superior and efficiently reduce the attack success rate. Especially, when the attack ratio is less than 20%,


the LDPC-based protection scheme performs nearly perfectly, with an attack success rate near 0%, which means almost all the attacks are defended. However, when the attack ratio is over 40%, the performance of our method drops drastically. Just as Section III-D demonstrates, the attack on one variable perturbs two symbols, on the MSBs and the LSBs, at the same time, and the prior probabilities estimated on the LSB symbols are also very ambiguous.
Comparing the different prior estimators, LDPC with the GMM estimator is slightly better than the Gaussian prior in reducing the attack success rate, especially under the FGSM attack; the insight into this is given in the next subsection. On the other hand, compared with the adversarial training method, the LDPC-based method performs much better on attacks with a smaller ratio but a larger amplitude. The adversarial training method behaves in the opposite way: once the attack amplitude becomes large, it performs poorly even when the attack variable ratio is small. Also, according to the results of the adversarial training method under the white noise attack in Fig. 7, its defense is less effective, which reflects the universality of our data protection scheme across different kinds of attacks.
Across the two datasets, it is clear that “Data Guardian” overall performs better on the TEP. This is mainly due to the different statistical characteristics of the data. TEP data are closer to a Gaussian distribution than REB data, and both the GMM and the Gaussian prior make a Gaussian hypothesis on the data, so the prior estimation on the TEP may be more precise than that on REB.

C. Insight Into Prior Estimators
Taking the TEP as an example, the decoding success rates of the two prior estimation methods are compared, as shown in Fig. 8. Under both attacks, our method reaches a higher decoding success rate. This reflects that the GMM estimates a more precise prior probability than simply assuming Gaussian distributions on each variable independently. According to Figs. 6 and 7, in “Data Guardian,” the reduction of the attack success rate is much larger than the decoding success rate. This verifies that, even when LDPC decoding fails and the exact clean raw data cannot be recovered, the soft inferred data are much closer to the real data and still correct the output of the diagnosis model.

Fig. 8. Success decoding rate comparison.

Moreover, a TEP sample is used to explain why the data statistical prior estimator is more precise and helpful to decoding. Fig. 9 shows the prior probability estimated by the GMM for a sample attacked by the FGSM with a 20% variable ratio and an amplitude of 0.2. A deeper color denotes a higher estimated prior probability. We only focus on the attacked variables; the yellow points denote the attacked symbols we received, and the red points are the clean raw symbol values. For those pairs, it can be seen that most of the raw values are in the relatively high probability area. In other words, the prior estimated by our method (GMM) rises toward the clean raw values.

Fig. 9. Prior probability estimated by the GMM.

Furthermore, for the attacked symbols, the probability rank of the raw symbol is calculated. The average value for the GMM estimator is 10.45, which means that, under brute-force search, the raw symbol can be reached after about 11 trials. As a comparison, this value for the Gaussian prior is 20.13, about twice that of our method, which also explains why our method achieves a higher decoding success rate overall.

V. CONCLUSION
In this article, a novel cross-domain universal industrial data protection scheme named “Data Guardian” was proposed, which can defend against adversarial attacks on data-driven industrial monitoring systems. q-ary LDPC codes were taken as the error correction scheme, which enables the reuse of the attacked data. To improve the decoding rate, the statistical characteristics of industrial data were fully utilized. The probability density estimation models were fitted on the historical data and then estimated the prior for data that may be attacked. The experiments on the TEP and REB datasets simulated two kinds of attacks on monitoring systems, and the results showed the superiority of our methods.
There remains much work to do in the future. For example, the combination of adversarial training and ECCs can be considered to overcome the weakness of our data protection scheme under a high attack variable ratio. A better prior estimation method for non-Gaussian data can be studied to further improve the decoding performance.
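The probability-rank statistic reported in Section IV-C can be illustrated with a short sketch. It assumes that the prior over the candidate values of one attacked symbol is available as a list of probabilities and that ties are counted in favor of the raw symbol; the function names and the tie-breaking rule are assumptions, since the article does not specify them.

```python
# Sketch only: names and tie-breaking are illustrative assumptions.
def probability_rank(prior, raw_symbol):
    """1-based rank of the raw symbol when candidates are tried from most to least probable."""
    p_raw = prior[raw_symbol]
    # Count the candidates that a brute-force search would try first.
    return 1 + sum(1 for p in prior if p > p_raw)


def mean_rank(priors, raw_symbols):
    """Average rank over several attacked symbols."""
    ranks = [probability_rank(p, s) for p, s in zip(priors, raw_symbols)]
    return sum(ranks) / len(ranks)


# Toy prior over four candidate symbol values (made-up numbers).
toy_prior = [0.1, 0.4, 0.2, 0.3]
```

A lower mean rank means that fewer candidates need to be tried before the raw symbol is reached, which is how the values 10.45 (GMM) and 20.13 (Gaussian prior) are read in the text.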


REFERENCES
[1] Q. Sun and Z. Ge, “A survey on deep learning for data-driven soft sensors,” IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 5853–5866, Sep. 2021.
[2] M. Pazoki, “A new fault classifier in transmission lines using intrinsic time decomposition,” IEEE Trans. Ind. Informat., vol. 14, no. 2, pp. 619–628, Feb. 2018.
[3] Y. Wang, Z. Pan, X. Yuan, C. Yang, and W. Gui, “A novel deep learning based fault diagnosis approach for chemical process with extended deep belief network,” ISA Trans., vol. 96, pp. 457–467, 2020.
[4] G. Li, K. Ota, M. Dong, J. Wu, and J. Li, “DeSVig: Decentralized swift vigilance against adversarial attacks in industrial artificial intelligence systems,” IEEE Trans. Ind. Informat., vol. 16, no. 5, pp. 3267–3277, May 2020.
[5] C. Szegedy et al., “Intriguing properties of neural networks,” in Proc. 2nd Int. Conf. Learn. Representations (ICLR), 2014.
[6] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. 3rd Int. Conf. Learn. Representations (ICLR), 2015.
[7] M. T. Okano, “IoT and Industry 4.0: The industrial new revolution,” in Proc. Int. Conf. Manage. Inf. Syst., 2017, pp. 76–82.
[8] A. Qayyum et al., “Securing machine learning in the cloud: A systematic review of cloud machine learning security,” Front. Big Data, vol. 3, 2020, Art. no. 148524.
[9] A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay, “Adversarial attacks and defences: A survey,” 2018, arXiv:1810.00069.
[10] D. R. Stinson, “Universal hashing and authentication codes,” Des., Codes Cryptogr., vol. 4, no. 3, pp. 369–380, 1994.
[11] J. P. Chambers, “Cyclic redundancy data check encoding method and apparatus,” U.S. Patent 4 283 787, Aug. 11, 1981.
[12] N. Glover and T. Dudley, Practical Error Correction Design for Engineers. Data Systems Technology Corporation. Sierra Vista, AZ, USA: Data Syst. Technol. Corporation, 1991.
[13] R. Gallager, “Low-density parity-check codes,” IRE Trans. Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[14] D. J. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electron. Lett., vol. 32, no. 18, pp. 1645–1646, 1996.
[15] IEEE Standard for Information Technology, IEEE Standard 802.11n-2009, 2009.
[16] K. Zhao, W. Zhao, H. Sun, X. Zhang, N. Zheng, and T. Zhang, “LDPC-in-SSD: Making advanced error correction codes work effectively in solid state drives,” in Proc. 11th USENIX Conf. File Storage Technol., 2013, pp. 243–256.
[17] T. Richardson and S. Kudekar, “Design of low-density parity check codes for 5G new radio,” IEEE Commun. Mag., vol. 56, no. 3, pp. 28–34, Mar. 2018.
[18] M. Davey and D. MacKay, “Low-density parity check codes over GF(q),” IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, Jun. 1998.
[19] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in Proc. 6th Int. Conf. Learn. Representations (ICLR), 2018.
[20] S. Gu and L. Rigazio, “Towards deep neural network architectures robust to adversarial examples,” in Proc. 3rd Int. Conf. Learn. Representations (ICLR), 2015.
[21] Y. Song, T. Liu, T. Wei, X. Wang, Z. Tao, and M. Chen, “FDA3: Federated defense against adversarial attacks for cloud-based IIoT applications,” IEEE Trans. Ind. Informat., vol. 17, no. 11, pp. 7830–7838, Nov. 2021.
[22] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning in computer vision: A survey,” IEEE Access, vol. 6, pp. 14410–14430, 2018.
[23] G. Verma and A. Swami, “Error correcting output codes improve probability estimation and adversarial robustness of deep neural networks,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2019, pp. 8646–8656.
[24] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory, vol. IT-27, no. 5, pp. 533–547, Sep. 1981.
[25] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[26] H. Song and J. Cruz, “Reduced-complexity decoding of q-ary LDPC codes for magnetic recording,” IEEE Trans. Magn., vol. 39, no. 2, pp. 1081–1087, Mar. 2003.
[27] I. J. Cox, M. L. Miller, J. A. Bloom, and C. Honsinger, Digital Watermarking. New York, NY, USA: Springer, 2002.
[28] S. A. Alabady, “Binary and non-binary low density parity check codes: A survey,” Int. J. Inf. Eng. Appl, vol. 1, no. 3, pp. 104–117, 2018.
[29] N. Falliere, L. O. Murchu, and E. Chien, “W32.Stuxnet dossier,” White Paper, Symantec Corp., Security Response, vol. 5, no. 6, pp. 1–68, 2011.
[30] R. S. Smith, “Covert misappropriation of networked control systems: Presenting a feedback structure,” IEEE Control Syst. Mag., vol. 35, no. 1, pp. 82–92, Feb. 2015.
[31] J. Feng, H. Xu, S. Mannor, and S. Yan, “Robust logistic regression and classification,” in Proc. 27th Int. Conf. Neural Inf. Process. Syst., 2014, pp. 253–261.
[32] K. Grosse, P. Manoharan, N. Papernot, M. Backes, and P. McDaniel, “On the (statistical) detection of adversarial examples,” 2017, arXiv:1702.06280.
[33] L. Rice, E. Wong, and Z. Kolter, “Overfitting in adversarially robust deep learning,” in Proc. 37th Int. Conf. Mach. Learn., 2020, pp. 8093–8104.
[34] J. Lee and C. S. Won, “Image integrity and correction using parities of error control coding,” in Proc. IEEE Int. Conf. Multimedia Expo., 2000, pp. 1297–1300.
[35] M. Schneider and S.-F. Chang, “A robust content based digital signature for image authentication,” in Proc. 3rd IEEE Int. Conf. Image Process., 1996, pp. 227–230.
[36] M. Baldi, QC-LDPC Code-Based Cryptography. New York, NY, USA: Springer, 2014.
[37] J. J. Downs and E. F. Vogel, “A plant-wide industrial process control problem,” Comput. Chem. Eng., vol. 17, no. 3, pp. 245–255, 1993.
[38] Case Western Reserve University Bearing Data Center Website. [Online]. Available: http://csegroups.case.edu/bearingdatacenter/home

Yue Zhuo received the B.Eng. degree in automation and electrical engineering from the College of Information Engineering, Zhejiang University of Technology, Hangzhou, China, in 2018. He is currently working toward the Ph.D. degree in control science and engineering with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou.
His research interests include intelligence industrial systems security, data-driven industrial monitoring systems, and industrial data augmentation.

Zhiqiang Ge (Senior Member, IEEE) received the B.Eng. and Ph.D. degrees in automation from the Department of Control Science and Engineering, Zhejiang University, Hangzhou, China, in 2004 and 2009, respectively.
From July 2010 to December 2011, he was a Research Associate with the Department of Chemical and Biomolecular Engineering, Hong Kong University of Science and Technology, Hong Kong. From January 2013 to May 2013, he was a Visiting Professor with the Department of Chemical and Materials Engineering, University of Alberta, Edmonton, AB, Canada. He is currently a Full Professor with the College of Control Science and Engineering, Zhejiang University. His research interests include industrial big data, process monitoring, soft sensor, data-driven modeling, machine intelligence, and knowledge automation.
Dr. Ge was an Alexander von Humboldt Research Fellow with the University of Duisburg-Essen, Duisburg, Germany, from November 2014 to January 2017, and also a Japan Society for the Promotion of Science invitation Fellow with Kyoto University, Kyoto, Japan, from June 2018 to August 2018.
