
IEEE Transactions on Reliability, Vol. 59, No. 2, June 2010

Analytic Confusion Matrix Bounds for Fault Detection and Isolation Using a Sum-of-Squared-Residuals Approach

Dan Simon, Senior Member, IEEE, and Donald L. Simon

Abstract—Given a system which can fail in one of several different ways, a fault detection and isolation (FDI) algorithm uses sensor data to determine which fault is the most likely to have occurred. The effectiveness of an FDI algorithm can be quantified by a confusion matrix, also called a diagnosis probability matrix, which indicates the probability that each fault is isolated given that each fault has occurred. Confusion matrices are often generated with simulation data, particularly for complex systems. In this paper, we perform FDI using sum-of-squared residuals (SSRs). We assume that the sensor residuals are s-independent and Gaussian, which gives the SSRs chi-squared distributions. We then generate analytic lower, and upper bounds on the confusion matrix elements. This approach allows for the generation of optimal sensor sets without numerical simulations. The confusion matrix bounds are verified with simulated aircraft engine data.

Index Terms—Aircraft turbofan engine, chi-squared distribution, confusion matrix, diagnosis probability matrix, fault detection and isolation.

ACRONYM
C-MAPSS  Commercial modular aero-propulsion system simulation
CCR      Correct classification rate
CNR      Correct no-fault rate
FDI      Fault detection and isolation
FNR      False negative rate
FPR      False positive rate
HPC      High pressure compressor
HPT      High pressure turbine
LPC      Low pressure compressor
LPT      Low pressure turbine
SSR      Sum of squared residuals
TNR      True negative rate
TPR      True positive rate

NOTATION
CCR of a fault
Marginal CCR of one fault relative to another fault
Marginal detection rate of one fault relative to another fault
Chi-squared pdf
Noncentral chi-squared pdf
Chi-squared CDF
Noncentral chi-squared CDF
Number of sensors used to detect a fault
Cardinality of a sensor set
Probability that no fault is detected given that a fault occurred
Probability that a fault is isolated given that no fault occurred
Probability that a fault is isolated given that another fault occurred
Marginal misclassification rate of one fault given that another fault occurred
Marginal misclassification rate of one fault relative to another fault given no fault
Number of possible fault conditions
Nc    Core speed
P15   Bypass duct pressure
P24   LPC outlet pressure
Ps30  HPC outlet pressure
Normalized residual (SSR) of a fault detection algorithm
Fault detection threshold
T24   LPC outlet temperature
T30   HPC outlet temperature
T48   HPT outlet temperature
Wf    Fuel flow
Residual of a sensor
Sensors unique to a fault detection algorithm
Sensors common to two fault detection algorithms
Normalized residual of a sensor within a given sensor set
Mean of a sensor residual
Standard deviation of a sensor residual

Manuscript received April 05, 2009; revised July 08, 2009, September 04, 2009, and October 16, 2009; accepted October 26, 2009. Date of publication April 19, 2010; date of current version June 03, 2010. This work was supported by the NASA Faculty Fellowship Program. Associate Editor: H. Li.
D. Simon is with Cleveland State University, Cleveland, OH, USA (e-mail: [email protected]).
D. L. Simon is with the NASA Glenn Research Center, Cleveland, OH, USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TR.2010.2046772



I. INTRODUCTION

MANY different methods of fault detection and isolation (FDI) have been proposed. Frequency domain methods include monitoring resonances [1], or modes [2]. Filter-based methods include observers [3], unknown input observers [4], Kalman filters [5], particle filters [6], sliding mode observers [7], H∞ filters [8], and set membership filters [9]. There are also methods based on computer intelligence [10] that include fuzzy logic [11], neural networks [12], genetic algorithms [13], and expert systems [14]. Other methods include those based on Markov models [15], system identification [16], wavelets [17], Bayesian inference [18], control input manipulation [19], and the parity space approach [20]. Many other FDI methods have also been proposed [21], some of which apply to special types of systems.

The parity space approach to FDI compares the sensor residual vector to nominal user-specified fault vectors, and the closest fault vector is isolated as the most likely fault. If the sensor residual vectors are Gaussian, the parity space approach allows an analytic computation of the confusion matrix. The FDI approach that we propose is philosophically similar to the parity space approach, but instead of using fault vectors, we use sum-of-squared residuals (SSRs) to detect and isolate a fault. Our approach is chosen because of its amenability to a new statistical method for the calculation of confusion matrix bounds.

A preliminary version of this paper was published as a technical report [22]. This paper has corrected proofs and expanded simulation results.

If sensor residuals are Gaussian, the SSRs have a chi-squared distribution [23]. This allows for the specification of SSR bounds for fault detection, which have a known false negative rate (FNR), and false positive rate (FPR). We can also compare the SSRs for each fault type to determine which fault is most likely to have occurred, and then find analytic bounds for fault isolation probabilities. Our FDI algorithm is new, but the primary contribution of this paper is to show how confusion matrix element bounds can be derived analytically. The FDI algorithm that we propose is fairly simple, but the confusion matrix analysis that we develop is novel, and its ideas may be adaptable to other FDI algorithms.

Our approach is to first specify the magnitude of each fault that we want to detect, along with a target FPR. For each fault, we then find the sensor set that gives the largest true positive rate (TPR) for the given FPR. Then we use statistical approaches to find confusion matrix bounds. The confusion matrix bounds are the outputs of this process. We cannot specify desired confusion matrix bounds ahead of time; the bounds are the dependent variables of the sensor selection process.

The goal of this paper is threefold. Our first goal is to present our SSR-based FDI algorithm, which we do in Section II. Our second goal is to derive confusion matrix bounds, which we do in Section III. Our third goal is to confirm the theory with simulation results, which we do in Section IV using an aircraft turbofan engine model. Section V presents some discussion, and conclusions.

II. AN SSR-BASED FDI ALGORITHM

This section presents the background, and an overview of our proposed SSR-based FDI algorithm for a static, linear system. To perform FDI, sensor residuals are computed at each measurement time, and the SSRs are used. If the sensor residuals are Gaussian, then the SSRs have chi-squared distributions, which allows the formulation of analytic bounds on the confusion matrix elements as discussed in Sections III-A–III-C.

A. Sensor Residuals, and Chi-Squared Distributions

The residual of a sensor is a measurement of the difference between the sensor output and its nominal no-fault output. In the no-fault case, the residual has a zero expected value. In the fault case, the residual has a nonzero mean. In either case, the residual has a known standard deviation. The mean depends on which fault occurs, but for simplicity we do not indicate that dependence in our notation. An SSR is given as

(1)

1) No-Fault Condition: In the no-fault case, each residual has a zero expected value. If each residual is an s-independent zero-mean Gaussian random variable, then the SSR is a random variable with a chi-squared distribution [23], and we use the chi-squared pdf, and CDF listed in the Notation to describe it. We use a user-specified threshold to detect a fault. Note that fault isolation is a different issue than fault detection. Detection of a fault means that the corresponding SSR exceeds its threshold for that fault detection algorithm. However, it may be that the SSR exceeds its threshold for more than one algorithm. In that case, multiple faults have been detected, and a fault isolation algorithm is required to isolate the most likely fault.

The true negative rate (TNR) for a fault is the probability that its SSR is below its threshold given that there are no faults. The FPR for a fault is the probability that its SSR exceeds its threshold given that there are no faults. These probabilities are given as

(2)

Fig. 1 illustrates TNR, and FPR for a chi-squared SSR. The TNR is the area to the left of the user-specified threshold, and the FPR is the area to the right of the threshold.

2) Fault Condition: If a fault occurs, then the terms in (1) will not, in general, have a mean value of zero. In this case, the SSR has a noncentral chi-squared distribution [23], and we use the noncentral chi-squared pdf, and CDF to describe it, where the noncentrality parameter is the sum of the squared normalized residual means.
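To make the distributional setup above concrete, here is a minimal sketch (an editorial illustration, not code from the paper) that computes an SSR from simulated Gaussian residuals, and evaluates the no-fault TNR/FPR, and the faulted-case TPR/FNR with SciPy's central, and noncentral chi-squared distributions. The standard deviations, mean shifts, and variable names are illustrative assumptions, not the paper's notation or data.

```python
import numpy as np
from scipy.stats import chi2, ncx2

rng = np.random.default_rng(0)

# Illustrative sensor noise standard deviations and fault-induced mean shifts
# (hypothetical values, not the engine data from the paper).
sigma = np.array([0.25, 0.50, 0.75, 1.00])   # residual standard deviations
mu    = np.array([1.00, 0.00, 2.00, 1.50])   # residual means when the fault occurs

k = len(sigma)                               # number of sensors in this detector
tau = chi2.ppf(1 - 1e-4, df=k)               # threshold giving FPR = 1e-4 in the no-fault case

# One simulated measurement: residuals are Gaussian, and the SSR is the sum of
# squared residuals, each normalized by its standard deviation.
y = rng.normal(loc=mu, scale=sigma)
ssr = np.sum((y / sigma) ** 2)
print("SSR =", ssr, "fault detected:", ssr > tau)

# No-fault case: the SSR is central chi-squared with k degrees of freedom.
tnr = chi2.cdf(tau, df=k)                    # area to the left of the threshold
fpr = chi2.sf(tau, df=k)                     # area to the right of the threshold

# Fault case: the SSR is noncentral chi-squared; the noncentrality parameter
# is the sum of the squared normalized residual means.
lam = np.sum((mu / sigma) ** 2)
fnr = ncx2.cdf(tau, df=k, nc=lam)            # missed detections
tpr = ncx2.sf(tau, df=k, nc=lam)             # correct detections
print(f"TNR={tnr:.6f} FPR={fpr:.2e} TPR={tpr:.4f} FNR={fnr:.4f}")
```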


Fig. 1. Illustration of a chi-squared pdf of an SSR with k = 10 sensors.

Fig. 2. Illustration of a noncentral chi-squared pdf of an SSR with k = 10 sensors, and a noncentrality parameter equal to 40.

The TPR is defined as the probability that a fault is correctly detected given that it occurs. This approach does not take fault isolation into account. The FNR is defined as the probability that a fault is not detected given that it occurs. These probabilities can be written as

(3)

Fig. 2 illustrates TPR, and FNR for a noncentral chi-squared SSR. The FNR is the area to the left of the user-specified threshold, and the TPR is the area to the right of the threshold.

B. Confusion Matrix

A confusion matrix specifies the likelihood of isolating each fault, and can be used to quantify the performance of an FDI algorithm. A typical confusion matrix is shown in Table I. The rows correspond to fault conditions, and the columns correspond to fault isolation results. The element in a given row and column is the probability that the column's fault is isolated when the row's fault occurs. Ideally, the confusion matrix would be an identity matrix, which would indicate perfect fault isolation.

TABLE I
TYPICAL CONFUSION MATRIX FORMAT, WHERE THE ROWS CORRESPOND TO FAULT CONDITIONS, AND THE COLUMNS CORRESPOND TO FAULT ISOLATION RESULTS

C. Summary of SSR-Based FDI Algorithm

Our FDI approach is to first specify the magnitude of each fault that we want to detect, along with a maximum allowable FPR. For each fault, we then find the sensor set that gives the largest TPR for the given FPR. This idea can be seen by examining Figs. 1 and 2. For a given fault, we will obtain different versions of Figs. 1 and 2 for each possible sensor set. Given a particular Fig. 1 for a specific sensor set, we obtain a detection threshold that corresponds to our allowable FPR. Given a particular threshold, we obtain a TPR from Fig. 2. Intuitively we want to use sensors with large fault signatures in our FDI algorithm, and this result leads to the algorithm shown in Fig. 3 for selecting a sensor set for each fault.

Note that, although the sensor selection algorithm is logical, it is not necessarily optimal for FDI. The sensor selection algorithm in Fig. 3 is executed once for each fault that we want to detect. After we have selected a sensor set for each fault, any SSR that is greater than its threshold is considered to have been detected. If more than one SSR is greater than its threshold, the SSR that is largest relative to its threshold is isolated as the most likely fault. The FDI algorithm is summarized in Fig. 4. The strategy of isolating a fault using relative SSR values is a reasonable ad hoc approach, but is not necessarily optimal.
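As a concrete illustration of the detection, and isolation logic just summarized (an editorial sketch, not a transcription of the paper's Fig. 4), the following code flags every detector whose SSR exceeds its threshold, and, if several are flagged, isolates the one whose SSR is largest relative to its threshold. Interpreting "largest relative to its threshold" as the largest SSR-to-threshold ratio is an assumption.

```python
from typing import Optional, Sequence

def isolate_fault(ssrs: Sequence[float], thresholds: Sequence[float]) -> Optional[int]:
    """Return the index of the isolated fault, or None if no fault is detected.

    ssrs[i] is the sum-of-squared residuals for fault detector i, and
    thresholds[i] is its detection threshold. A fault is detected when its
    SSR exceeds its threshold; among detected faults, the one whose SSR is
    largest relative to its threshold (here: largest ratio) is isolated.
    """
    detected = [i for i, (r, t) in enumerate(zip(ssrs, thresholds)) if r > t]
    if not detected:
        return None                                   # no fault detected
    return max(detected, key=lambda i: ssrs[i] / thresholds[i])

# Example: detectors 1 and 2 both fire; detector 2 is furthest above its threshold.
print(isolate_fault([3.0, 9.5, 14.0], [5.0, 7.0, 9.0]))   # -> 2
```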


Fig. 3. Sensor selection algorithm for a specific fault.

Fig. 4. SSR-based FDI algorithm.

III. CONFUSION MATRIX BOUNDS

This section derives analytic confusion matrix bounds for our SSR-based FDI algorithm. Section III-A deals with the no-fault case, and derives bounds for the correct no-fault rate (CNR), which is the probability that no fault is detected given that no fault occurs. It also derives bounds for the FPR, which is the probability that one or more faults are detected given that no fault occurred. Finally, it derives an upper bound for the no-fault misclassification rate, which is the probability that a given fault is isolated given that no fault occurred. Section III-B deals with the fault case, and derives bounds for the correct classification rate (CCR), which is the probability that a given fault is correctly isolated given that it occurred. Section III-C also deals with the fault case, and derives upper bounds for the fault misclassification rate, which is the probability that an incorrect fault is isolated given that some other fault occurred. Section III-D summarizes the bounds, and their use in the confusion matrix; and Section III-E discusses the required computational effort.

A. No-Fault Case

1) Correct No-Fault Rate: First, suppose that only two fault detection algorithms are running. Each algorithm attempts to detect its own fault using its own set of sensors, and its own threshold. We partition the sensors into the sensors that are unique to each algorithm, and the sensors that are common to both algorithms, and we denote the normalized residuals of the sensors in each of those sets accordingly. That is,

(4)

Now suppose that there are several fault detection algorithms. In this case, we can write the correct no-fault rate (CNR), which is the probability that all of the SSRs are below their detection thresholds given that no fault occurred.

(5)

Theorem 1: The CNR can be bounded below, and above in terms of the probabilities given in (2).

Proof: See the Appendix.

2) Fault Misclassification Rates in the No-Fault Case: Given that no fault occurred, the probability that a given fault is incorrectly isolated is called the misclassification rate. In this section, we derive upper bounds for this probability.

Suppose that we have only two fault detection algorithms. Given that no fault occurred, the probability that the first fault is isolated is called the marginal misclassification rate of the first fault relative to the second fault.

Lemma 1: If neither the common sensor set, nor the two unique sensor sets are empty, then

(6)

If the common sensor set is empty, and the two unique sensor sets are not empty, then

(7)

If one of the unique sensor sets is empty, but the other two sets are not empty, then

(8)

If the other unique sensor set is empty, but the remaining two sets are not empty, then

(9)

Proof: Equation (6) can be obtained using Lemmas 5, 6, 7, and 10, which are listed in the Appendix. Equation (7) follows from the s-independence of the two SSRs. Equation (8) can be obtained using Lemmas 5, 7, and 11. Equation (9) can be obtained using Lemmas 5, 6, and 11.

The preceding lemma leads to the following result for the fault misclassification rate in the no-fault case.

Theorem 2: If we have several fault detection algorithms, the probability that a given fault is isolated given that no fault occurred can be bounded from above, where the marginal misclassification rates are given by one of (6)–(9) for each of the other algorithms.

Proof: See the Appendix.
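The following sketch (an editorial illustration with made-up sensor counts, and thresholds) evaluates two elementary no-fault bounds that are consistent with the reasoning behind Theorem 1: under s-independence the CNR is the product of the per-detector no-detection probabilities, and that product serves as a lower bound when detectors share sensors (positive dependence raises the joint probability), while the smallest marginal no-detection probability is always an upper bound. It is not a restatement of the theorem's exact bounds.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical detectors: number of sensors used by each, and each detector's
# chi-squared detection threshold (values are illustrative only).
k   = np.array([5, 3, 6, 4, 2])                 # sensors per fault detector
tau = chi2.ppf(1 - 1e-4, df=k)                  # thresholds giving FPR = 1e-4 each

# Marginal probability that each detector stays below its threshold
# when no fault is present (its TNR).
tnr = chi2.cdf(tau, df=k)

cnr_lower = np.prod(tnr)      # exact CNR if the detectors share no sensors;
                              # a lower bound when shared sensors make the
                              # SSRs positively dependent
cnr_upper = np.min(tnr)       # the joint probability can never exceed any marginal

print(f"CNR bounds: [{cnr_lower:.6f}, {cnr_upper:.6f}]")
```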


B. Correct Fault Classification Rates

Given that some fault occurs, we might isolate the correct fault, or we might isolate an incorrect fault. The probability of isolating the correct fault is called the correct classification rate (CCR). In this section, we derive lower, and upper bounds for the CCR.

1) Lower Bounds for the Correct Classification Rate: Suppose we have only two fault detection algorithms, and the first fault occurs. Consider the probability that the first SSR is larger than the second SSR relative to their thresholds. We call this probability the marginal detection rate. Note that we are not considering whether or not the SSRs exceed their thresholds; we are only considering how large the SSRs are relative to their thresholds. The marginal detection rate is given as

(10)

(11)

Lemma 2: If neither of the two unique sensor sets is empty, then

(12)

where

(13)

If one of the unique sensor sets is empty, and the other is not empty, then

(14)

If the other unique sensor set is empty, and the first is not empty, then

(15)

Proof: Equation (12) can be obtained using Lemmas 5, and 6, which are in the Appendix. Equations (14), and (15) follow from (11).

The preceding lemma leads to the following result for the correct fault isolation rate.

Theorem 3: If we have several fault detection algorithms, and a given fault occurs, the probability that the fault is correctly isolated can be bounded from below in terms of the marginal detection rates of Lemma 2.

Proof: See the Appendix.

2) Upper Bounds for the Correct Classification Rate: Next, we find an upper bound for the CCR. To begin, suppose that we have only two fault detectors. Given that the first fault occurs, the probability that it is correctly isolated is called the marginal CCR.

Lemma 3: If neither the common sensor set, nor the two unique sensor sets are empty, then

(16)

where one of the terms is defined analogously to the quantity shown in (13). If the common sensor set is empty, but the two unique sensor sets are not empty, then

(17)

If one of the unique sensor sets is empty, but the other two sets are not empty, then

(18)

If the other unique sensor set is empty, but the remaining two sets are not empty, then

(19)

Proof: Equation (16) can be obtained using Lemmas 5, 6, 7, and 10, which are in the Appendix. Equation (17) follows from the s-independence of the two SSRs. Equation (18) can be obtained using Lemmas 5, 7, and 11. Equation (19) can be obtained using Lemmas 5, 6, and 11.

The preceding lemma leads to the following result for the correct fault isolation rate.

Theorem 4: If we have several fault detection algorithms, and a given fault occurs, the probability that the fault is correctly detected and isolated can be bounded from above in terms of the marginal CCRs of Lemma 3.

Proof: See the Appendix.

C. Fault Misclassification Rates

In this section, we derive upper bounds for the probability that a fault is incorrectly isolated. If one fault occurs, the probability that a different fault is detected and isolated is called the misclassification rate.

First, suppose that we have two fault detection algorithms. The misclassification rate can then be written as

(20)

where the prime symbol denotes that only two detection algorithms are used.
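To give a numerical feel for the marginal quantities compared above, here is a small Monte Carlo sketch of my own. It assumes that "larger relative to its threshold" means a larger SSR-to-threshold ratio, that the two detectors share no sensors, that the second detector's sensors are unaffected by the first fault, and it uses made-up sensor counts, thresholds, and noncentrality; it is a numerical illustration, not the closed-form expressions of Lemmas 2–4.

```python
import numpy as np
from scipy.stats import chi2, ncx2

rng = np.random.default_rng(1)
n = 200_000                       # number of Monte Carlo samples

# Two detectors with no shared sensors (so their SSRs are s-independent).
k1, k2 = 5, 3                     # sensors per detector (illustrative)
tau1 = chi2.ppf(1 - 1e-4, df=k1)  # detection thresholds for FPR = 1e-4
tau2 = chi2.ppf(1 - 1e-4, df=k2)
lam1 = 40.0                       # assumed noncentrality when fault 1 occurs

# Sample the SSRs given that fault 1 occurred: detector 1 sees a noncentral
# chi-squared SSR; detector 2 (assumed unaffected sensors) sees a central one.
r1 = ncx2.rvs(df=k1, nc=lam1, size=n, random_state=rng)
r2 = chi2.rvs(df=k2, size=n, random_state=rng)

# Marginal detection rate: how often detector 1's SSR exceeds detector 2's SSR
# relative to their thresholds (ratio convention assumed).
d12 = np.mean(r1 / tau1 > r2 / tau2)
print(f"estimated marginal detection rate = {d12:.4f}")
```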


Lemma 4: If neither the common sensor set, nor the two unique sensor sets are empty, then

(21)

If one of the unique sensor sets is empty, but the other two sets are not empty, then

(22)

If the common sensor set is empty, but the two unique sensor sets are not empty, then

(23)

If the other unique sensor set is empty, but the remaining two sets are not empty, then

(24)

Proof: Equation (21) can be obtained using Lemmas 5, 6, 7, and 10, which are in the Appendix. Equation (22) can be obtained using Lemmas 5, 7, and 11. Equation (23) follows from (20), and the s-independence of the two SSRs. Equation (24) follows from Lemmas 5, 6, and 11.

The preceding lemma leads to the following results for the fault misclassification rate.

Theorem 5: If we have several fault detection algorithms, and one fault occurs, the probability that a different fault will be incorrectly detected and isolated can be bounded from above in terms of the marginal misclassification rates of Lemma 4.

Proof: See the Appendix.

Theorem 6: The probability that no fault is detected when a given fault occurs can be bounded from above.

Proof: See the Appendix.

D. Summary of Confusion Matrix Bounds

Recall the confusion matrix in Table I. The rows correspond to fault conditions, and the columns correspond to fault isolation results. The element in a given row and column is the probability that the column's fault is isolated when the row's fault occurs. The previous sections derived the following bounds.
• The CNR is the probability that a no-fault condition is correctly indicated given that no fault occurs, and its lower, and upper bounds are given in Theorem 1.
• The no-fault misclassification rate is the probability that a given fault is incorrectly isolated given that no fault occurs, and its upper bound is given in Theorem 2.
• The CCR is the probability that a given fault is correctly isolated given that it occurs, and its lower, and upper bounds are given in Theorems 3 and 4.
• The fault misclassification rate is the probability that a given fault is incorrectly isolated given that a different fault occurs, and its upper bound is given in Theorem 5.
• The missed-detection probability is the probability that no fault is isolated given that a given fault occurs, and its upper bound is given in Theorem 6.

E. Computational Effort

Usually, confusion matrices are obtained through simulations. To derive an experimental confusion matrix, the number of matrix elements that need to be calculated grows with the number of possible faults, and the required number of simulations for each matrix element calculation also grows with the number of faults. This is because, as the number of possible faults increases, the number of simulations required to obtain the same statistical accuracy increases in direct proportion. The computational effort required for the experimental determination of a confusion matrix therefore grows rapidly with the number of faults.

The bounds derived in this paper also require computational effort that grows with the number of faults, because each of the bounds summarized in Section III-D requires computational effort that grows with the number of fault detection algorithms, and the number of matrix elements grows with the number of faults. Note that this effort does not include the sensor selection algorithm shown in Fig. 3, which requires the off-line solution of a discrete minimization problem.
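As an organizational aid (not an algorithm from the paper), the sketch below shows how the bounds listed above could be arranged into lower-, and upper-bound matrices with the same layout as Table I: rows are actual conditions (each fault, plus no-fault), and columns are diagnoses. The bound functions passed in are hypothetical placeholders standing in for the expressions of Theorems 1–6.

```python
import numpy as np
from typing import Callable

def bound_matrices(m: int,
                   cnr_bounds: Callable[[], tuple],              # Theorem 1: (lower, upper) for the CNR
                   ccr_bounds: Callable[[int], tuple],           # Theorems 3-4: (lower, upper) for fault i
                   misclass_ub: Callable[[int, int], float],     # Theorem 5: fault j isolated given fault i
                   nofault_misclass_ub: Callable[[int], float],  # Theorem 2: fault i isolated given no fault
                   missed_ub: Callable[[int], float]):           # Theorem 6: no fault isolated given fault i
    """Assemble (m+1)x(m+1) lower/upper bound matrices in the Table I layout.

    Row/column m is the no-fault condition; rows are actual conditions, and
    columns are diagnoses. Elements with no derived lower bound are left at 0,
    and elements with no derived upper bound are left at 1.
    """
    lower = np.zeros((m + 1, m + 1))
    upper = np.ones((m + 1, m + 1))
    for i in range(m):
        lower[i, i], upper[i, i] = ccr_bounds(i)          # correct classification rate
        upper[i, m] = missed_ub(i)                        # fault i occurs, nothing isolated
        upper[m, i] = nofault_misclass_ub(i)              # no fault, but fault i isolated
        for j in range(m):
            if j != i:
                upper[i, j] = misclass_ub(i, j)           # fault i occurs, fault j isolated
    lower[m, m], upper[m, m] = cnr_bounds()               # correct no-fault rate
    return lower, upper
```

Leaving the off-diagonal lower bounds at zero mirrors the presentation in Section IV, where only diagonal lower bounds are reported (Table VI).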


IV. SIMULATION RESULTS

In this section, we use simulation results to verify the theoretical bounds of the preceding sections. We consider the problem of isolating an aircraft turbofan engine fault, which is modeled by the NASA Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) [25]. There are five possible faults that can occur: fan, low pressure compressor (LPC), high pressure compressor (HPC), high pressure turbine (HPT), and low pressure turbine (LPT). These five faults entail shifts of both efficiency, and flow capacity from nominal values. The fault magnitudes that we try to detect are 2.5% for the fan, 20% for the LPC, 2% for the HPC, 1.5% for the HPT, and 2% for the LPT. These magnitudes were chosen to give reasonable fault detection ability.

The available sensors, and their standard deviations are shown in Table II. Recall that our FDI algorithm assumes that the sensor noises are s-independent. In reality, they may have some correlation. For example, if the aircraft is operating in high humidity, all of the pressure sensors may be slightly biased in a similar fashion. However, the sensor noise correlation is a second order effect, and so we make the simplifying but standard assumption that the correlations are zero. This assumption is conceptually similar to our simplifying assumption of Gaussian noise.

TABLE II
AIRCRAFT ENGINE SENSORS, AND STANDARD DEVIATIONS AS A PERCENTAGE OF THEIR NOMINAL VALUES

The fault influence coefficient matrix shown in Table III was generated using C-MAPSS, and is based on [26]. The numbers in Table III are the partial derivatives of the sensor outputs with respect to the fault conditions, normalized to the fault percentages discussed above, and normalized to one standard deviation of the sensor noise.

TABLE III
FAULT SIGNATURES OF FIVE DIFFERENT FAULT CONDITIONS, WITH MEAN SENSOR VALUE RESIDUALS NORMALIZED TO ONE STANDARD DEVIATION

We used the algorithm shown in Fig. 3 to select sensors for each fault with a maximum allowable FPR of 0.0001. As an example, consider the fan fault with the normalized fault signatures shown in Table III. The sensors with the largest fault signatures in descending order are Ps30, Wf, T30, P15, P24, T48, Nc, and T24. This gives eight potential sensor sets for detecting a fan fault: the first potential set uses only sensor Ps30, the second potential set uses Ps30 and Wf, and so on. The potential sensor sets, along with their detection thresholds, and TPRs, are shown in Table IV. The thresholds were determined by constraining the FPR to be no greater than 0.0001. Table IV shows that using five sensors gives the largest TPR subject to the FPR constraint.

TABLE IV
POTENTIAL SENSOR SETS FOR DETECTING A FAN FAULT

This process was repeated for each of the faults shown in Table III. The resulting sensor sets are shown in Table V. Note that, given an FPR constraint, the detection threshold is a function only of the number of sensors in each sensor set; the detection threshold is not a function of the specific fault signatures. This result is illustrated in Fig. 1, where it is seen that the threshold is a function only of the allowable FPR, and the number of sensors.

TABLE V
SENSOR SETS FOR FAULT DETECTION GIVING THE LARGEST TPR FOR EACH FAULT GIVEN THE CONSTRAINT THAT FPR ≤ 0.0001
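The threshold, and TPR computation behind Table IV can be reproduced in a few lines; the sketch below is an editorial illustration, and the fault-signature values in it are placeholders rather than the C-MAPSS numbers from Table III. For each candidate sensor count k it sets the threshold from the chi-squared distribution so that the no-fault FPR is 0.0001, computes the noncentrality from the k largest (hypothetical) normalized signatures, and evaluates the TPR from the noncentral chi-squared distribution.

```python
import numpy as np
from scipy.stats import chi2, ncx2

FPR_MAX = 1e-4

# Hypothetical normalized fault signatures (mean residual shift, in units of
# one sensor standard deviation), sorted in descending magnitude. These are
# stand-ins for one fault column of Table III.
signatures = np.array([4.0, 3.2, 2.5, 1.8, 1.4, 0.9, 0.5, 0.3])

best = None
for k in range(1, len(signatures) + 1):
    tau = chi2.ppf(1 - FPR_MAX, df=k)          # threshold from the FPR constraint
    lam = np.sum(signatures[:k] ** 2)          # noncentrality with the k best sensors
    tpr = ncx2.sf(tau, df=k, nc=lam)           # detection probability for this set
    print(f"k={k}  threshold={tau:7.2f}  TPR={tpr:.4f}")
    if best is None or tpr > best[1]:
        best = (k, tpr)

print(f"selected sensor set size: {best[0]} (TPR = {best[1]:.4f})")
```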


We used the fault isolation method shown in Fig. 4, along with the theorems in the previous sections, to obtain lower, and upper bounds for the confusion matrix as summarized in Section III-D. We also ran 100,000 simulations to obtain an experimental confusion matrix. Table VI shows the theoretical lower bounds of the diagonal elements of the confusion matrix. Lower bounds of the off-diagonal elements were not obtained because we are typically more interested in upper bounds of off-diagonal elements. Table VII shows the theoretical upper bounds of the confusion matrix. Table VIII shows the experimental confusion matrix. These tables show that the theoretical results derived in this paper give reasonably tight bounds on the experimental confusion matrix values.

TABLE VI
LOWER BOUNDS OF DIAGONAL CONFUSION MATRIX ELEMENTS, WHERE ROWS SPECIFY THE ACTUAL FAULT CONDITION, AND COLUMNS SPECIFY THE DIAGNOSIS

TABLE VII
UPPER BOUNDS OF THE CONFUSION MATRIX ELEMENTS, WHERE ROWS SPECIFY THE ACTUAL FAULT CONDITION, AND COLUMNS SPECIFY THE DIAGNOSIS

TABLE VIII
EXPERIMENTAL CONFUSION MATRIX USING SSR-BASED FDI, WHERE ROWS SPECIFY THE ACTUAL FAULT CONDITION, AND COLUMNS SPECIFY THE DIAGNOSIS, BASED ON 100,000 SIMULATIONS OF EACH FAULT

Recall that we used an FPR of 0.0001 to choose our sensor sets, and detection thresholds. Therefore, the first five elements in the last row of Table VII are guaranteed to be no greater than 0.0001. Further, the element in the lower right corner of Table VI is guaranteed to be no greater than the true CNR.

Note that it is possible for an element in the experimental confusion matrix in Table VIII to lie outside the bounds shown in Tables VI and VII (for example, see the numbers in the fourth row, and first column of Tables VII and VIII). This is because the numbers in Table VIII are experimentally obtained on the basis of a finite number of simulations, and are guaranteed to lie within their theoretical bounds only as the number of simulations approaches infinity. In fact, that is one of the strengths of the analytic method proposed in this paper. The analytic bounds are definite, but simulations are subject to random effects. Also, simulations can give misleading conclusions if the simulation has errors. One common simulation error is the non-randomness of commonly used pseudorandom number generators [27].

To summarize the SSR-based FDI algorithm, the user specifies the maximum FPR for each fault, and then finds the sensor set that has the largest TPR given the FPR constraint. Analytic confusion matrix bounds are then obtained using the theory in this paper. If the results are not satisfactory, the user can iterate by changing the maximum FPR constraint. For example, if a TPR is too small, then the user will have to increase the FPR constraint. If the confusion matrix bounds of fault isolation probabilities are not satisfactory, the user will have to iterate on the FPR constraints to obtain different confusion matrix bounds.

We also generated FDI results using the parity space approach [20] to explore the relative performance of our new SSR-based FDI approach. The parity space approach uses all sensors for all fault detectors, and we set the detection thresholds to achieve an FPR of 0.0001 to be consistent with the SSR-based approach. Results are shown in Table IX.

TABLE IX
EXPERIMENTAL CONFUSION MATRIX USING THE PARITY-SPACE APPROACH FOR FDI, BASED ON 100,000 SIMULATIONS OF EACH FAULT

A comparison of Tables VIII and IX shows that the parity space approach generally performs better than the SSR-based approach, although the results are comparable. The confusion matrix in Table VIII for the SSR-based algorithm has a condition number of 1.83, while the matrix in Table IX for the parity space approach has a condition number of 1.65. This result shows that the confusion matrix for the parity space approach is about 9.8% closer to perfect than the confusion matrix for the SSR-based approach.
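The experimental confusion matrices in Tables VIII and IX are built by repeated simulation; the sketch below shows one way such a Monte Carlo estimate could be organized. It is an editorial outline with hypothetical fault signatures, sensor sets, and sample counts, and it reuses the SSR/threshold-ratio isolation rule assumed earlier; it is not the paper's C-MAPSS setup.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)

# Hypothetical setup: 3 faults, 4 sensors, made-up normalized fault signatures
# (rows: faults, columns: sensors), and per-fault sensor sets.
signatures = np.array([[3.0, 2.0, 0.5, 0.1],
                       [0.2, 2.5, 2.5, 0.3],
                       [0.1, 0.4, 2.0, 3.0]])
sensor_sets = [np.array([0, 1]), np.array([1, 2]), np.array([2, 3])]
thresholds = np.array([chi2.ppf(1 - 1e-4, df=len(s)) for s in sensor_sets])

def isolate(residuals):
    """Apply the SSR-based rule: detect SSRs above threshold, and isolate the
    SSR that is largest relative to its threshold (ratio convention assumed)."""
    ssrs = np.array([np.sum(residuals[s] ** 2) for s in sensor_sets])
    detected = np.where(ssrs > thresholds)[0]
    if detected.size == 0:
        return len(sensor_sets)                      # "no fault" diagnosis
    return detected[np.argmax(ssrs[detected] / thresholds[detected])]

n_runs, m = 20_000, len(sensor_sets)
confusion = np.zeros((m + 1, m + 1))
for actual in range(m + 1):                          # last row = no-fault condition
    mean = signatures[actual] if actual < m else np.zeros(signatures.shape[1])
    for _ in range(n_runs):
        residuals = rng.normal(loc=mean, scale=1.0)  # normalized residuals
        confusion[actual, isolate(residuals)] += 1
confusion /= n_runs
print(np.round(confusion, 3))
```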

V. CONCLUSION

This paper has introduced a new FDI algorithm, and derived analytic confusion matrix bounds. The main contribution of this paper is the generation of analytic confusion matrix bounds, and the possibility that our methodology could be adapted to other FDI algorithms. Usually, confusion matrices are obtained with simulations. Such simulations have several potential drawbacks. First, they can be time consuming. Second, they can give misleading conclusions if not enough simulations are run to give statistically significant results. Third, they can give misleading conclusions if the simulation has errors (for example, if the output of the random number generator does not satisfy statistical tests for randomness). The theoretical confusion matrix bounds derived in this paper do not depend on a random number generator, and can be used in place of simulations.

Further work in this area could follow several directions. First, the tightness of the confusion matrix bounds could be quantified. This paper derives bounds, but does not guarantee how loose or tight those bounds are. Second, the bounds could be modified to be tighter. Third, bounds could be attempted for methods other than the FDI algorithm proposed here. The fault isolation method we used isolates the fault that has the largest SSR relative to its detection threshold. Other fault isolation methods could normalize the relative SSR to its standard deviation, or could normalize the SSR to its detection threshold. Our FDI method is static, which means that faults are isolated using measurements at a single time. Better fault isolation might be achieved if dynamic system information is used.

APPENDIX

We use the following lemmas to derive the results of this paper. We write the pdf, and CDF of a random variable evaluated at a point, and when the random variable is clear from the context we shorten the notation accordingly. These lemmas can be proven using standard definitions, and results from probability theory [24].

Lemma 5: The probability that a realization of one random variable is greater than a realization of a second random variable is given by an integral over the region where the first exceeds the second, with the joint pdf of the two random variables as the integrand. If the two random variables are s-independent, the joint pdf factors into the product of the marginal pdfs, and the result simplifies accordingly.

Lemmas 6 through 11 are transformation results of a similar kind: Lemmas 6, 7, 9, and 11 relate the distribution of a random variable that is defined in terms of a second random variable, and a constant, to the distribution of that second random variable (Lemma 9 involves the continuous-time impulse function); Lemmas 8, and 10 give the corresponding results when the random variable is defined in terms of two s-independent random variables.


Proof of Theorem 1: Equation (5) gives the definition of the CNR as the probability that every SSR is below its threshold given that no fault occurred, where the thresholds are constant, and the SSRs are random variables. If none of the fault detection algorithms have any sensors in common, then the SSRs are s-independent, which means that the CNR is the product of the marginal no-detection probabilities. If the algorithms have common sensors, then the terms are positively dependent, which will increase the CNR. On the other hand, if there is some algorithm whose sensor set is a superset of every other algorithm's sensor set, and whose threshold conditions are the most restrictive, then that algorithm's SSR being below its threshold implies the same for every other algorithm, which means that the CNR reduces to a single marginal probability.

Proof of Theorem 2: Given several fault detection algorithms, the probability that a given fault is isolated given that no fault occurred is the probability that its SSR is greater than its threshold, and also greater than all of the other SSRs relative to their thresholds.

Proof of Theorem 3: First, we establish the positive dependence [28, p. 145] of the relevant random variables. Consider the pairwise comparisons of the SSRs relative to their thresholds. It follows from (4) that each comparison is an increasing function of the negative squared normalized residuals of the sensors common to the two algorithms, and thus the corresponding random variables are positively dependent. Now note that, if a given fault occurred, then the probability that its SSR is larger than every other SSR relative to the thresholds can be bounded, where the inequality comes from the positive dependence established above. The probability that the fault is isolated given that it occurred can then be written with a further inequality, which again comes from the positive dependence of the random variables involved.

Proof of Theorem 4: If we have several fault detection algorithms, and a given fault occurs, the probability that the fault is correctly detected and isolated is the probability that its SSR is greater than its threshold, and also greater than all other SSRs relative to their thresholds.

Proof of Theorem 5: Given that we have several fault detection algorithms, the misclassification rate is bounded from above by the corresponding two-algorithm misclassification rate of (20). So to obtain an upper bound, we use one of (21)–(24) as appropriate.

Proof of Theorem 6: The probability that no fault is detected when a given fault occurs is the probability that every SSR is below its detection threshold given that the fault occurred.

REFERENCES

[1] S. Chinchalkar, "Determination of crack location in beams using natural frequencies," Journal of Sound and Vibration, vol. 247, pp. 417–429, Oct. 2001.
[2] T. Tsai and Y. Wang, "Vibration analysis and diagnosis of a cracked shaft," Journal of Sound and Vibration, vol. 192, pp. 607–620, May 1996.
[3] H. Wang, Z. Huang, and D. Steven, "On the use of adaptive updating rules for actuator and sensor fault diagnosis," Automatica, vol. 33, pp. 217–224, Feb. 1997.
[4] J. Chen, R. Patton, and H. Zhang, "Design of unknown input observers and robust fault detection filters," International Journal of Control, vol. 63, pp. 85–105, Jan. 1996.
[5] J. Korbicz, J. Koscielny, Z. Kowalczuk, and W. Cholewa, Fault Diagnosis: Models, Artificial Intelligence, Applications. Springer, 2004.
[6] P. Li and V. Kadirkamanathan, "Fault detection and isolation in nonlinear stochastic systems—A combined adaptive Monte Carlo filtering and likelihood ratio approach," International Journal of Control, vol. 77, pp. 1101–1114, Dec. 2004.
[7] C. Tan and C. Edwards, "Sliding mode observers for robust detection and reconstruction of actuator sensor faults," International Journal of Robust and Nonlinear Control, vol. 13, pp. 443–463, Apr. 2003.
[8] I. Yaesh and U. Shaked, "Robust H∞ deconvolution and its application to fault detection," Journal of Guidance, Control and Dynamics, vol. 23, pp. 1101–1112, Jun. 2000.
[9] C. Ocampo-Martinez, S. Tornil, and V. Puig, "Robust fault detection using interval constraints satisfaction and set computations," in IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, Aug. 30–Sep. 1, 2006, pp. 1285–1290.


[10] W. Fenton, T. McGinnity, and L. Maguire, "Fault diagnosis of electronic systems using intelligent techniques: A review," IEEE Trans. Systems, Man and Cybernetics: Part C – Applications and Reviews, vol. 31, pp. 269–281, Aug. 2001.
[11] H. Schneider and P. Frank, "Observer-based supervision and fault detection in robots using nonlinear and fuzzy-logic residual evaluation," IEEE Trans. Control System Technology, vol. 4, pp. 274–282, May 1996.
[12] M. Napolitano, C. Neppach, V. Casdorph, S. Naylor, M. Innocenti, and G. Silvestri, "Neural-network-based scheme for sensor failure detection, identification and accommodation," Journal of Guidance, Control and Dynamics, vol. 18, pp. 1280–1286, Nov. 1995.
[13] Z. Yangping, Z. Bingquan, and W. DongXin, "Application of genetic algorithms to fault diagnosis in nuclear power plants," Reliability Engineering and System Safety, vol. 67, pp. 153–160, Feb. 2000.
[14] W. Gui, C. Yang, J. Teng, and W. Yu, "Intelligent fault diagnosis in lead-zinc smelting process," in IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, Aug. 30–Sep. 1, 2006, pp. 234–239.
[15] S. Lu and B. Huang, "Condition monitoring of model predictive control systems using Markov models," in IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, Aug. 30–Sep. 1, 2006, pp. 264–269.
[16] R. Isermann, "Supervision, fault-detection and fault-diagnosis methods—An introduction," Control Engineering Practice, vol. 5, pp. 639–652, May 1997.
[17] X. Deng and X. Tian, "Multivariate statistical process monitoring using multi-scale kernel principal component analysis," in IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, Aug. 30–Sep. 1, 2006, pp. 108–113.
[18] A. Pernestal, M. Nyberg, and B. Wahlberg, "A Bayesian approach to fault isolation—Structure estimation and inference," in IFAC Symposium on Fault Detection, Supervision and Safety of Technical Processes, Beijing, Aug. 30–Sep. 1, 2006, pp. 450–455.
[19] S. Campbell and R. Nikoukhah, Auxiliary Signal Design for Failure Detection. Princeton University Press, 2004.
[20] F. Gustafsson, "Statistical signal processing approaches to fault detection," Annual Reviews in Control, vol. 31, pp. 41–54, Apr. 2007.
[21] J. Gertler, Fault Detection and Diagnosis in Engineering Systems. CRC, 1998.
[22] D. Simon and D. L. Simon, "Analytic confusion matrix bounds for fault detection and isolation using a sum-of-squared-residuals approach," NASA Technical Memorandum TM-2009-215655, Jul. 2009.
[23] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions. Dover, 1965.
[24] A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 2002.
[25] D. Frederick, J. DeCastro, and J. Litt, User's Guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS), NASA Technical Memorandum TM-2007-215026.
[26] D. L. Simon, J. Bird, C. Davison, A. Volponi, and R. Iverson, "Benchmarking gas path diagnostic methods: A public approach," presented at the ASME Turbo Expo, Jun. 2008, Paper GT2008-51360, unpublished.
[27] P. Savicky and M. Robnik-Sikonja, "Learning random numbers: A Matlab anomaly," Applied Artificial Intelligence, vol. 22, pp. 254–265, Mar. 2008.
[28] C. Lai and M. Xie, "Concepts of stochastic dependence in reliability analysis," in Handbook of Reliability Engineering, H. Pham, Ed. Springer, 2003, pp. 141–156.

Dan Simon (S'89–M'90–SM'01) received a B.S. degree from Arizona State University (1982), an M.S. degree from the University of Washington (1987), and a Ph.D. degree from Syracuse University (1991), all in electrical engineering. He worked in industry for 14 years at Boeing, TRW, and several small companies. His industrial experience includes work in the aerospace, automotive, agricultural, GPS, biomedical, process control, and software fields. In 1999, he moved from industry to academia, where he is now a professor in the Electrical and Computer Engineering Department at Cleveland State University. His teaching and research involves embedded systems, control systems, and computer intelligence. He has published about 80 refereed conference and journal papers, and is the author of the text Optimal State Estimation (John Wiley & Sons, 2006).

Donald L. Simon received a B.S. degree from Youngstown State University (1987), and an M.S. degree from Cleveland State University (1990), both in electrical engineering. During his career as an employee of the US Army Research Laboratory (1987–2007), and the NASA Glenn Research Center (2007–present), he has focused on the development of advanced control, and health management technologies for current, and future aerospace propulsion systems. His specific research interests are in aircraft gas turbine engine performance diagnostics, and performance estimation. He currently leads the propulsion gas path health management research effort ongoing under the NASA Aviation Safety Program, Integrated Vehicle Health Management Project.

