Quality Reliability Eng - 2024 - Chakraborty - Distribution‐Free Multivariate Process Monitoring a Rank‐Energy
DOI: 10.1002/qre.3619
RESEARCH ARTICLE
KEYWORDS
measure transportation, multivariate process monitoring, multivariate ranks, rank-energy
statistic, sensorless drive diagnosis
1 INTRODUCTION
Since Hotelling’s pioneering work introducing the 𝑇²-statistic,1 multivariate statistical process control (SPC)
has attracted considerable attention from statisticians and industry practitioners.2 With the arrival of Industry
4.0, multivariate SPC methodologies have been applied across a variety of sectors, for instance, semiconductor
manufacturing,3 network monitoring4 and image surveillance,5,6 to name a few. Numerous papers focusing on parametric
multivariate SPC for symmetric and skewed processes can be found in the literature. Comprehensive reviews of
multivariate SPC are available in the monographs by Ge and Song7 and Qiu.2
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium,
provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2024 The Author(s). Quality and Reliability Engineering International published by John Wiley & Sons Ltd.
Parametric multivariate SPC methods, despite their utility, have certain limitations. For instance, traditional 𝑇 2 -type
monitoring schemes require the normality assumption for a multivariate process, a condition which is often violated in
a real-world scenario. In addition, 𝑇 2 -type monitoring schemes assume that the sample size, 𝑛 is larger than the dimen-
sion, 𝑑 and their performance tends to deteriorate as 𝑑 increases.8 The performance of parametric monitoring methods,
either for Gaussian or non-Gaussian assumptions, deteriorates when these distributional assumptions are violated in prac-
tice. The impact of parameter estimation also proves to be a crucial factor in designing multivariate parametric control
charts.9,10 In addition, the estimation and monitoring of covariance matrix are also critical in multivariate SPC.11 Qiu12
mentioned that ‘A direct conclusion of the multivariate normality assumption is that the regression relationship between any
two subsets of the individual variables must be linear, which is rarely valid in practice’.
In contemporary applications enriched with high-dimensional data, the adoption of machine learning approaches in
SPC has become popular in recent years.13,14 Although some machine learning techniques applied in multivariate SPC do
not depend on the assumption of normality, the effectiveness of such methods is influenced by optimal parameter selec-
tion (e.g., the number of trees, window size, number of layers) and the quality of the training data.15–17 Some machine
learning approaches in multivariate SPC require density estimation or estimation of the correlation structure of the mul-
tivariate data.18,19 Moreover, these methods may suffer from the lack of power when the data is highly correlated.3 In that
sense, strictly speaking, the machine learning approach-based SPC methods are not exactly distribution-free. Therefore,
the multivariate nonparametric SPC tools that do not require information about the distribution of a multivariate process
are critically important.
The first effort, to the best of our knowledge, to develop a multivariate nonparametric SPC method was undertaken by
Liu,20 who proposed multivariate control charts based on data depth. Systematic study of nonparametric multivariate SPC
then began with the seminal works of Qiu and Hawkins,21,22 and Qiu.23 As a natural extension of the concepts of univariate
nonparametric SPC methods, multivariate nonparametric SPC methods are developed using the ranking information
of the multivariate process observations. Based on the notion of multivariate ranking, multivariate nonparametric SPC
methods can be broadly classified into two categories: (i) longitudinal ranking-based methods; (ii) cross-component
ranking-based methods.12 As mentioned in Qiu,12 longitudinal ranking refers to the ranking of multivariate observations
at different time points and cross-component ranking refers to the ranking across components. Qiu and Hawkins21,22
proposed cumulative sum (CUSUM) charts based on the antiranks of the components of the multivariate observations.
Qiu23 proposed a multivariate nonparametric control chart based on log-linear modelling.
Univariate sign and rank statistics were extended by several authors to 𝑑-dimensional Euclidean space (𝑑 ≥ 2), for
example, in Marden24 and Randles25 with an extensive overview given by Hettmansperger and McKean.26 Subsequently,
spatial sign or spatial rank-based SPC methods were proposed. For instance, Holland and Hawkins27 proposed a direc-
tional rank test-based SPC scheme. Whilst their method does not require a large amount of reference data, its robustness
is compromised.5,27 Bae et al.28 reviewed multivariate process monitoring methods based on the data-depth. Mukherjee
and Marozzi29 noted that the depth-based methods are not robust. Several researchers have noted that SPC methods based
on data depth tend to be less effective, particularly in scenarios where extensive reference data are unavailable.12,30
Following Randles25 and Hettmansperger and Randles,31 several multivariate nonparametric SPC methods were pro-
posed based on multivariate sign and rank statistics.3,32–39 Note that, Randles25 indicated that the multivariate sign test
is robust for elliptical-direction class of distributions. Chen et al.40 have proposed Wilcoxon statistic-based approach to
monitor the data streams individually. Zhang et al.41 have extended Chen et al.40 by splitting the high-dimensional data
into 2-dimensional ones, risking loss of information as noted by Zhang et al.3 Mukherjee and Marozzi29 introduced an
interpoint Euclidean distance-based multivariate SPC method. However, distance-based methods may sometimes suffer from
‘practitioners’ bias’, as we explain later in this article.
Due to the lack of canonical ordering in the 𝑑-dimensional Euclidean space $\mathbb{R}^d$ for dimension 𝑑 ≥ 2, fundamental
statistical notions like ranks and empirical quantiles do not extend canonically to 𝑑 ≥ 2.42 Consequently, component-wise
ranks, spatial ranks, spatial signs and depth-based ranks, and the corresponding statistics, do not possess exact
distribution-freeness, unlike their one-dimensional counterparts.42,43 Recently, Chernozhukov et al.44 and Hallin et al.42
made a significant advance by proposing multivariate ranks based on measure transportation theory. For $\boldsymbol{X} \in \mathbb{R}^d$,
measure transportation theory is about finding an optimal map $V: \mathbb{R}^d \to \mathbb{R}^d$ that minimises the expected squared
Euclidean distance between 𝑿 and 𝑉(𝑿) while transporting one distribution to another; see Hallin45 and Hallin and Mordant46
for further details. A more comprehensive discussion of this topic is provided in the subsequent section. In a recent article,
Deb and Sen43 proposed a distribution-free, multivariate two-sample test statistic, called the rank-energy (𝑅𝐸²) statistic,
based on measure transportation theory. Whilst Deb and Sen43 have laid the groundwork with foundational theorems, the
power properties of the rank-energy test and other related issues need further study.
In the SPC literature, SPC methods are typically categorised into two types: Phase I and Phase II methods. Phase
I methods focus on monitoring offline data, during which no new process observations are collected. The stable dataset
established after Phase I monitoring is referred to as the ‘reference’ data. Conversely, Phase II methods involve online
process monitoring that collects new observations sequentially throughout the monitoring process. The data gathered
during this online monitoring are known as the ‘test’ data. Jones–Farmer et al.47 provided a comprehensive overview of
Phase I SPC for both univariate and multivariate processes. Qiu12 and Qiu48 provided an extensive overview of Phase II
SPC for univariate and multivariate processes. In this article, it is assumed that Phase I reference data are available that
can be used for Phase II monitoring.
It is well established in SPC literature that time-weighted control charts such as the EWMA and CUSUM charts are
better suited in detecting persistent and smaller shifts, whilst the Shewhart charts excel in detecting transient and larger
shifts.12 As we do not need to estimate the weight parameters as in EWMA and CUSUM charts,49,50 Shewhart charts are
relatively easier to implement in practice. In this article, we explore the development of a multivariate Shewhart chart
based on the 𝑅𝐸 2 statistic. Time-weighted variations could be considered separately as topics for future research. The
main contributions of this paper could be listed as follows:
(i) We propose a new multivariate, distribution-free Shewhart monitoring scheme for online process monitoring based
on the rank-energy (𝑅𝐸 2 ) statistic;
(ii) We demonstrate that the proposed multivariate SPC method is robust whilst maintaining reasonable detection ability
for the process shift, which is justified by rigorous simulation experiments;
(iii) The practical application and efficiency of the proposed methodology are demonstrated through a sensorless drive
fault detection case study.
In Table 1, we provide the abbreviations and corresponding full forms for the ease of reading. The rest of the article is
organised as follows: In Section 2, we discuss the rank-energy (𝑅𝐸 2 ) statistic in connection with the theory of measure
transportation. In Section 3, process monitoring based on the proposed method is discussed. A detailed performance
study has been carried out in Section 4. In Section 5, a real-life application of the proposed method is provided regarding
sensorless drive fault detection. Finally, some concluding remarks are made in Section 6.
2 PRELIMINARIES
The proposed multivariate SPC method is based on the rank-energy ($RE^2$) statistic43 discussed in this section. The $RE^2$
statistic is based on the notion of measure transportation, which we briefly discuss next. Hallin45 has provided a simple
formulation of Monge’s problem (also known as the optimal transportation problem): Let $\mathbb{R}^d$ be the $d$-dimensional Euclidean
space, let $\mathcal{P}$ be the family of all probability distributions defined on $\mathbb{R}^d$, and let $P_1, P_2 \in \mathcal{P}$. For $\boldsymbol{X} \in \mathbb{R}^d$ and
$\boldsymbol{V}: \mathbb{R}^d \to \mathbb{R}^d$, the measure transportation problem is about finding an optimal transport map $\boldsymbol{V}$ that attains

$$\inf_{\boldsymbol{V}} \int_{\mathbb{R}^d} \lVert \boldsymbol{x} - \boldsymbol{V}(\boldsymbol{x}) \rVert^2 \, dP_1 \quad \text{subject to} \quad \boldsymbol{V}\# P_1 = P_2, \tag{1}$$

where $\boldsymbol{V}\# P_1 = P_2$ indicates that for $\boldsymbol{X} \sim P_1$, $\boldsymbol{V}(\boldsymbol{X}) \sim P_2$, for $P_1, P_2 \in \mathcal{P}$. The optimal solution $\boldsymbol{V}$ of the optimisation
problem in Equation (1), if it exists, is referred to as the optimal transport map. Using this concept, Deb and Sen43 defined
an empirical rank function, also called the empirical rank map.
Let $\mathcal{S}_n^{\tilde{X}} = \{\tilde{\boldsymbol{X}}_1, \tilde{\boldsymbol{X}}_2, \ldots, \tilde{\boldsymbol{X}}_n\}$ be a set of random vectors with empirical distribution $\boldsymbol{F}_n^{x}$, where $\tilde{\boldsymbol{X}}_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, n$. Also let
$\mathcal{G}_n = \{\boldsymbol{g}_1, \boldsymbol{g}_2, \ldots, \boldsymbol{g}_n\}$ be the set of Halton sequence51 of dimension $d$, to be taken as the sample multivariate rank vectors.
The empirical rank map43 is defined as a function $\hat{\boldsymbol{R}}_n^{\tilde{x}}: \mathcal{S}_n^{\tilde{x}} \to \mathcal{G}_n$ that is an optimal transport map from $\boldsymbol{F}_n^{x}$ to $\nu_n$ (the
empirical distribution on $\mathcal{G}_n$), that is,

$$\hat{\boldsymbol{R}}_n^{\tilde{x}} = \arg\min_{V} \int_{\mathbb{R}^d} \lVert \tilde{\boldsymbol{x}} - V(\tilde{\boldsymbol{x}}) \rVert^2 \, d\boldsymbol{F}_n^{x} \quad \text{subject to} \quad V\# \boldsymbol{F}_n^{x} = \nu_n. \tag{2}$$

Since both empirical distributions place equal mass on $n$ points, this reduces to the assignment problem

$$\psi = \arg\min_{\sigma} \sum_{i=1}^{n} \lVert \tilde{\boldsymbol{x}}_i - \boldsymbol{g}_{\sigma(i)} \rVert^2, \tag{3}$$

where the minimum is taken over all permutations $\sigma$ of $\{1, 2, 3, \ldots, n\}$. The empirical rank of $\tilde{\boldsymbol{x}}_i$ is

$$\hat{\boldsymbol{R}}_n^{\tilde{x}}(\tilde{\boldsymbol{x}}_i) = \boldsymbol{g}_{\psi(i)}. \tag{4}$$
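$$H_0: \boldsymbol{F}_1 = \boldsymbol{F}_2 \quad \text{against} \quad H_1: \boldsymbol{F}_1 \neq \boldsymbol{F}_2. \tag{5}$$

The joint empirical distribution function of the two samples $\mathcal{S}_{n_1}^{\tilde{x}_1}$ and $\mathcal{S}_{n_2}^{\tilde{x}_2}$ is given by $\boldsymbol{F}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}$. Let $\mathcal{G}_{n_1+n_2}$ be the Halton sequence of size
$(n_1 + n_2)$ and dimension $d$ with empirical distribution $\nu_{n_1+n_2}$. The joint empirical rank map $\hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\cdot)$ is defined as an
optimal transport map from $\boldsymbol{F}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}$ to $\nu_{n_1+n_2}$, following a similar definition as the empirical rank map in Equation (4).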
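The rank-energy statistic43 is defined as follows:

$$
\begin{aligned}
RE^2_{n_1,n_2} ={}& \frac{2}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \left\lVert \hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\tilde{\boldsymbol{x}}_{1i}) - \hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\tilde{\boldsymbol{x}}_{2j}) \right\rVert \\
& - \frac{1}{n_1^2} \sum_{i,j=1}^{n_1} \left\lVert \hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\tilde{\boldsymbol{x}}_{1i}) - \hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\tilde{\boldsymbol{x}}_{1j}) \right\rVert \\
& - \frac{1}{n_2^2} \sum_{i,j=1}^{n_2} \left\lVert \hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\tilde{\boldsymbol{x}}_{2i}) - \hat{\boldsymbol{R}}_{n_1,n_2}^{\tilde{x}_1,\tilde{x}_2}(\tilde{\boldsymbol{x}}_{2j}) \right\rVert. 
\end{aligned}
\tag{6}
$$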
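As a rough illustration of Equations (3) and (6), the following Python sketch assigns the pooled observations to Halton points by solving the assignment problem and then evaluates the three averages in Equation (6). It is not the reference implementation mentioned in this section; the function names `joint_ranks` and `rank_energy` are hypothetical, and the use of an unscrambled Halton sequence from `scipy.stats.qmc` is an assumption, so numerical values need not match those obtained with other Halton constructions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from scipy.stats import qmc


def joint_ranks(pooled):
    """Map each pooled observation to a Halton point by solving Equation (3)."""
    n, d = pooled.shape
    # Assumption: unscrambled Halton points in [0, 1)^d act as the rank vectors.
    halton = qmc.Halton(d=d, scramble=False).random(n)
    cost = cdist(pooled, halton, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(cost)  # cols plays the role of psi
    ranks = np.empty_like(halton)
    ranks[rows] = halton[cols]
    return ranks, cols


def rank_energy(x, y):
    """Rank-energy statistic of Equation (6) for samples x (n1 x d), y (n2 x d)."""
    n1 = len(x)
    ranks, _ = joint_ranks(np.vstack([x, y]))
    rx, ry = ranks[:n1], ranks[n1:]
    between = cdist(rx, ry).mean()    # (1 / (n1 n2)) * double sum
    within_x = cdist(rx, rx).mean()   # (1 / n1^2) * double sum
    within_y = cdist(ry, ry).mean()   # (1 / n2^2) * double sum
    return 2.0 * between - within_x - within_y


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 3))
y = rng.normal(loc=1.0, size=(5, 3))
ranks, psi = joint_ranks(np.vstack([x, y]))
value = rank_energy(x, y)
```

The Hungarian-type solver is used here only because it is readily available; for large pooled samples a dedicated optimal transport solver may be preferable.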
To gain an intuitive understanding of why the empirical rank map in Equation (4), obtained from the optimisation
problem in Equation (3), and the corresponding statistic defined in Equation (6) are distribution-free, we
consider an analogy within a one-dimensional setting. As discussed in Deb and Sen,43 in the one-dimensional setting, let
$X_1, X_2, \ldots, X_n \in \mathbb{R}$ be iid random samples from some univariate distribution $F$. The ranks of these observations can
be interpreted as a solution of the following optimisation problem:

$$\hat{\sigma} = \arg\min_{\sigma} \sum_{i=1}^{n} \left| x_i - \frac{\sigma(i)}{n} \right|^2, \tag{7}$$

where the minimum is taken over all permutations $\sigma$ of $\{1, 2, 3, \ldots, n\}$. This interpretation of univariate ranking can be
extended to multivariate settings as an optimal transport mapping problem in Equation (2).43 Intuitively, the permutation
$\hat{\sigma}$ that solves the optimisation problem in Equation (7) depends only on the ordering of the observed values $x_1, x_2, \ldots, x_n$,
irrespective of the distribution $F$.
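The one-dimensional analogy can be checked numerically. The sketch below solves Equation (7) by brute force over all permutations (feasible only for very small 𝑛) and compares the minimiser with the ordinary ranks; it also applies a strictly increasing transformation to illustrate that the solution depends only on the ordering of the data. The function name `ranks_by_assignment` and the data values are hypothetical.

```python
from itertools import permutations

import numpy as np


def ranks_by_assignment(x):
    """Brute-force minimiser of Equation (7); only sensible for small n."""
    n = len(x)
    best_perm, best_cost = None, np.inf
    for perm in permutations(range(1, n + 1)):
        cost = sum((xi - s / n) ** 2 for xi, s in zip(x, perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return np.array(best_perm)


x = np.array([0.31, -1.20, 2.45, 0.08, 0.97])
sigma_hat = ranks_by_assignment(x)
ordinary_ranks = np.argsort(np.argsort(x)) + 1   # ranks 1..n, no ties here
sigma_hat_transformed = ranks_by_assignment(np.exp(x))  # monotone transform
```

Here `sigma_hat` coincides with `ordinary_ranks`, and the transformed sample yields the same permutation, which is the mechanism behind distribution-freeness in the univariate case.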
Note that the sum of squares in Equation (7) represents the squared Euclidean distance between the observed univariate
$X$-sample and the set $\{\sigma(1)/n, \sigma(2)/n, \ldots, \sigma(n)/n\}$. This concept can be readily extended to the Euclidean distance between
the random vectors $\tilde{\boldsymbol{x}}_1, \ldots, \tilde{\boldsymbol{x}}_n$ and the Halton points $\boldsymbol{g}_1, \ldots, \boldsymbol{g}_n$,51 and, intuitively, the solution of the optimal transport
mapping problem should be free from the distribution of the random vectors. Indeed, Deb and Sen43 proved that for
absolutely continuous $\boldsymbol{F}_1$ and $\boldsymbol{F}_2$, the rank-energy statistic $RE^2_{n_1,n_2}$ is exactly distribution-free and invariant under affine
transformations. The code to calculate the $RE^2$ statistic can be found on the GitHub page of the first author of Deb and
Sen.43 We will use the rank-energy statistic defined in Equation (6) for the monitoring plan to be developed in what follows.
3 MONITORING FRAMEWORK
In order to use the $RE^2_{n_1,n_2}$ statistic in Equation (6) in an online monitoring scheme, it is assumed that a stable reference
dataset is available against which the statistic is obtained sequentially. It is recommended to confirm the stability of
the hardware/equipment from a well-established system, to ensure the stability of the reference data.
Let us consider a reference sample $\mathcal{S}_m^{\tilde{X}} = \{\tilde{\boldsymbol{X}}_1, \tilde{\boldsymbol{X}}_2, \ldots, \tilde{\boldsymbol{X}}_m\} \overset{iid}{\sim} \boldsymbol{F}_1$ of size $m$. During the online monitoring, subgroups
$\mathcal{S}_{n,t}^{\tilde{Y}} = \{\tilde{\boldsymbol{Y}}_{1t}, \tilde{\boldsymbol{Y}}_{2t}, \ldots, \tilde{\boldsymbol{Y}}_{nt}\} \overset{iid}{\sim} \boldsymbol{F}_2$ of size $n$ are obtained at time instances $t = 1, 2, 3, \ldots$ The online monitoring scheme
sequentially tests

$$H_0: \boldsymbol{F}_1 = \boldsymbol{F}_2 \quad \text{against} \quad H_1: \boldsymbol{F}_1 \neq \boldsymbol{F}_2. \tag{8}$$

For each $t = 1, 2, 3, \ldots$, the rank-energy statistic, as defined in Equation (6), is calculated on $\mathcal{S}_m^{\tilde{X}}$ and $\mathcal{S}_{n,t}^{\tilde{Y}}$ and denoted
by $RE^2_{m,n,t}$. When $RE^2_{m,n,t} \geq h$, for some decision limit $h > 0$, the process is deemed out-of-control (OC) and process
monitoring is halted. Otherwise, the process is considered in-control (IC). The event $[RE^2_{m,n,t} \geq h]$ is called a signalling event. When
a signal is detected, the system undergoes inspection. If the signal results from a system anomaly, the system is
recalibrated and online monitoring continues. If the signal is due to a random cause of variation, it is regarded as a false
alarm, no remedial action is required, and online process monitoring continues.
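The signalling rule described above can be sketched as a simple loop. The code below is a minimal illustration under stated assumptions: `run_length` accepts any two-sample statistic as a callable, and the toy statistic `mean_shift_statistic` is only a stand-in for the rank-energy statistic so that the block stays self-contained. All names and numerical settings are hypothetical.

```python
import numpy as np


def run_length(statistic, reference, draw_subgroup, limit, max_steps=10_000):
    """Return the first t with statistic(reference, subgroup_t) >= limit.

    Returns None if no signal occurs within max_steps subgroups.
    """
    for t in range(1, max_steps + 1):
        subgroup = draw_subgroup(t)
        if statistic(reference, subgroup) >= limit:
            return t  # signalling event: monitoring is halted for inspection
    return None


def mean_shift_statistic(reference, subgroup):
    """Toy stand-in statistic (NOT the rank-energy statistic)."""
    return float(np.linalg.norm(subgroup.mean(axis=0) - reference.mean(axis=0)))


rng = np.random.default_rng(1)
reference = rng.normal(size=(50, 2))


def draw_subgroup(t):
    # In control for t <= 20; afterwards every coordinate mean shifts by 5.
    shift = 0.0 if t <= 20 else 5.0
    return rng.normal(loc=shift, size=(5, 2))


rl = run_length(mean_shift_statistic, reference, draw_subgroup, limit=4.0)
```

In an actual application the `statistic` argument would be a function computing $RE^2_{m,n,t}$ and `limit` would be the calibrated decision limit.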
Choice of the decision limit $h$ is important in online process monitoring. The number of subgroups $\mathcal{S}_{n,t}^{\tilde{Y}}$ obtained until
$RE^2_{m,n,t} \geq h$ for some $t$ is called the run length. A standard performance measure in online process monitoring is the
average run length ($ARL$).53 To decide the decision limits for different sample sizes $m$, $n$ and different dimension $d$, a
nominal $ARL$ value of $ARL_0 \approx 370$ is adopted when the process is IC. An efficient monitoring scheme should ideally detect
faults early in Phase II. As a result, when the process is OC, the corresponding $ARL$, denoted by $ARL_1$, should be less than
$ARL_0$. An analytical expression for the $ARL$ is not available because the density of the $RE^2_{m,n,t}$ statistic lacks a closed-form
expression. We estimate the $ARL$ under $H = H_0$ and $H_1$ by numerical integration via Monte Carlo simulation, that is,

$$ARL = \int_0^{\infty} \frac{1}{P_H\left[ RE^2_{m,n} > h \mid \mathcal{S}_m^{\tilde{X}} \right]} \, d\boldsymbol{F}_1, \tag{9}$$

where $P_H[RE^2_{m,n} > h \mid \mathcal{S}_m^{\tilde{X}}]$ is the conditional power estimated under $H = H_0$ and $H_1$. For a given decision limit $h$, the
$ARL_0$ estimation involves averaging 100,000 values of the corresponding conditional $ARL_0$, given specific values of $m$, $n$, $d$.
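The logic behind Equation (9) is that, given the reference sample, successive subgroups signal independently with the same probability $p$, so the conditional run length is geometric with mean $1/p$; the unconditional $ARL$ averages $1/p$ over reference samples. The following sketch implements this two-level Monte Carlo scheme for a generic statistic. It is an illustration only: the function name `estimate_arl`, the replication counts and the toy example are assumptions, and reference samples for which no signal is observed are skipped, which is a simplification.

```python
import numpy as np


def estimate_arl(statistic, draw_reference, draw_subgroup, limit,
                 n_reference=200, n_subgroups=2000, seed=0):
    """Average of conditional ARLs (1 / conditional signal probability)."""
    rng = np.random.default_rng(seed)
    conditional_arls = []
    for _ in range(n_reference):
        reference = draw_reference(rng)
        stats = np.array([statistic(reference, draw_subgroup(rng))
                          for _ in range(n_subgroups)])
        p_signal = np.mean(stats > limit)  # estimated conditional power
        if p_signal > 0:  # simplification: skip samples with no observed signal
            conditional_arls.append(1.0 / p_signal)
    if not conditional_arls:
        raise ValueError("no signals observed; increase n_subgroups or lower limit")
    return float(np.mean(conditional_arls))


# Toy check: the 'statistic' is a Uniform(0, 1) draw, so P(stat > 0.9) = 0.1
# and the ARL should be close to 1 / 0.1 = 10.
arl_toy = estimate_arl(
    statistic=lambda reference, subgroup: subgroup,
    draw_reference=lambda rng: None,
    draw_subgroup=lambda rng: rng.uniform(),
    limit=0.9, n_reference=50, n_subgroups=2000,
)
```

With the rank-energy statistic in place of the toy statistic, `draw_subgroup` would sample from $\boldsymbol{F}_1$ for $ARL_0$ and from a shifted $\boldsymbol{F}_2$ for $ARL_1$.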
The decision limit $h$ is then iteratively adjusted to ensure that the obtained $ARL_0 \approx 370$. To estimate the $ARL_1$ for a given $h$ and
specific values of $m$, $n$ and $d$, we estimate the conditional power $P_H[RE^2_{m,n} > h \mid \mathcal{S}_m^{\tilde{X}}]$ for a distribution $\boldsymbol{F}_2$ different from $\boldsymbol{F}_1$,
and then average over 100,000 values of the conditional $ARL_1$. Other nominal choices for the $ARL_0$ could also be considered.
The algorithms to determine the decision limits and the corresponding $ARL_0$ and $ARL_1$ are detailed in Appendices A1
and A2.
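One simple way to carry out the iterative adjustment of the decision limit is bisection, assuming the estimated $ARL_0$ is increasing in the limit. The sketch below is not the algorithm of Appendix A1; `calibrate_limit` is a hypothetical helper, and the toy `arl_at` function uses a case with a known answer (a statistic distributed as Exponential(1), for which the signal probability is $e^{-h}$ and hence $ARL_0 = e^{h}$).

```python
import math


def calibrate_limit(arl_at, target=370.0, lo=0.0, hi=10.0, tol=1e-6, max_iter=200):
    """Bisection for the limit h with arl_at(h) ~= target.

    Assumes arl_at is increasing in h and arl_at(lo) < target < arl_at(hi).
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if arl_at(mid) < target:
            lo = mid   # limit too small: too many false alarms
        else:
            hi = mid   # limit too large: in-control ARL exceeds the target
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Toy case with a closed form: ARL0(h) = exp(h), so h = ln(370).
h_toy = calibrate_limit(lambda h: math.exp(h))
```

When `arl_at` is itself a Monte Carlo estimate, it is noisy, so in practice one would fix the random seed across evaluations or accept a tolerance around the nominal value of 370.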
4 PERFORMANCE EVALUATION
The monitoring framework, based on the 𝑅𝐸 2 statistic,43 is quite general in the sense that it can be applied to moni-
toring any multivariate process, including the high-dimensional processes. This will be discussed later. As the spatial
sign or depth-based monitoring methods are not distribution-free (as discussed in Section 1), the proposed monitoring
scheme is evaluated and compared with a recent nonparametric multivariate monitoring scheme that does not rely on
spatial sign or depth-based statistics. Given that the density of the rank-energy statistic defined by Equation (6) does
not have an analytic form, the proposed monitoring scheme’s efficacy is examined numerically via Monte Carlo
simulation.
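Let us consider reference data of size $m = 10, 20$ and test data of size $n = 5$ from a multivariate normal distribution of
dimension $d = 20$ with mean $\boldsymbol{\mu} = \boldsymbol{0}_{d \times 1}$ and covariance matrix $\boldsymbol{\Sigma}_{d \times d} = (\sigma_{ij})_{d \times d}$ such that $\sigma_{ij} = \sigma^2 \left( \frac{\min(i,j)}{\max(i,j)} \right)$, where $\sigma = 1.5$.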
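For concreteness, the covariance structure $\sigma_{ij} = \sigma^2 \min(i,j)/\max(i,j)$ and the corresponding reference and test samples can be generated as in the sketch below. The helper name `minmax_covariance` and the random seed are hypothetical choices.

```python
import numpy as np


def minmax_covariance(d, sigma):
    """Covariance matrix with entries sigma^2 * min(i, j) / max(i, j)."""
    idx = np.arange(1, d + 1)
    ratio = np.minimum.outer(idx, idx) / np.maximum.outer(idx, idx)
    return sigma ** 2 * ratio


d, sigma = 20, 1.5
cov = minmax_covariance(d, sigma)

rng = np.random.default_rng(0)
reference = rng.multivariate_normal(np.zeros(d), cov, size=10)  # m = 10
subgroup = rng.multivariate_normal(np.zeros(d), cov, size=5)    # n = 5
```

The diagonal entries equal $\sigma^2$, and the correlation between components $i$ and $j$ decays as the ratio of their indices moves away from one.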
To numerically evaluate the robustness of the proposed monitoring scheme, we estimate the 𝐴𝑅𝐿0 for various multivariate
distributions, using the decision limit obtained for the aforementioned multivariate normal distribution. As symmetric
distributions, we consider multivariate normal distributions with different covariance matrices, and as skewed distri-
butions, we consider multivariate exponential distributions ‘connected’ by copulas. Copula modelling is widely used in
practice in modelling the dependency structure in multivariate data.54,55 We consider the Gaussian copula as an elliptical
copula and the Clayton copula as an Archimedean copula as they are popular in literature for their flexibility and sim-
ple form.54,55 Other copulas could be considered as well. However, for the purpose of the performance study, we restrict
ourselves to the copulas mentioned above. The covariance structure for the Gaussian copula is taken as $\boldsymbol{\Sigma}_{d \times d} = (\rho_{ij})_{d \times d}$
where $\rho_{ij} = \frac{\min(i,j)}{\max(i,j)}$. The Clayton copula parameter is taken as $\xi = 2$. A larger $\xi$ implies a stronger dependence among the
end points. A brief discussion on copulas is provided in Appendix A3. For more details on copula modelling, one may
refer to the monographs.54–56 Thus, the distributions considered are:
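(i) Multivariate normal distribution with mean $\boldsymbol{\mu} = \boldsymbol{0}_{d \times 1}$ and covariance matrix $\boldsymbol{\Sigma}_{d \times d} = (\sigma_{ij})_{d \times d}$ such that
$\sigma_{ij} = \sigma^2 \left( \frac{\min(i,j)}{\max(i,j)} \right)$, where $\sigma = 2, 2.5, 3$;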
(ii) 𝑑-dimensional exponential distribution with marginal distributions as Exponential (1) distribution connected by
Gaussian copula and Clayton copula.
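The skewed distributions in item (ii) can be simulated with standard copula constructions. The sketch below is an assumption about how such data may be generated, not the procedure of Appendix A3: the Gaussian copula sample is obtained by transforming correlated normal variates, and the Clayton copula sample uses the Marshall–Olkin (gamma frailty) construction. Function names and sample sizes are hypothetical; the `rate` argument is included so that a shifted failure rate can be simulated in the same way.

```python
import numpy as np
from scipy.stats import kendalltau, norm


def exponential_gaussian_copula(n, corr, rate=1.0, rng=None):
    """Exponential(rate) margins linked by a Gaussian copula with matrix corr."""
    rng = np.random.default_rng(rng)
    z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=n)
    u = norm.cdf(z)                   # uniform margins
    return -np.log1p(-u) / rate       # inverse exponential CDF


def exponential_clayton_copula(n, d, xi, rate=1.0, rng=None):
    """Exponential(rate) margins linked by a Clayton copula (parameter xi > 0)."""
    rng = np.random.default_rng(rng)
    v = rng.gamma(shape=1.0 / xi, scale=1.0, size=(n, 1))  # shared frailty
    e = rng.exponential(size=(n, d))
    u = (1.0 + e / v) ** (-1.0 / xi)  # Marshall-Olkin construction
    return -np.log1p(-u) / rate


d = 20
idx = np.arange(1, d + 1)
corr = np.minimum.outer(idx, idx) / np.maximum.outer(idx, idx)

x_gauss = exponential_gaussian_copula(20_000, corr, rng=0)
x_clayton = exponential_clayton_copula(20_000, d, xi=2.0, rng=0)

# For a bivariate Clayton copula, Kendall's tau equals xi / (xi + 2).
tau, _ = kendalltau(x_clayton[:2000, 0], x_clayton[:2000, 1])
```

For $\xi = 2$ the theoretical Kendall's tau between any two components is 0.5, which gives a quick sanity check on the simulated dependence.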
The estimated $ARL_0$ values corresponding to the $RE^2_{10,5}$ and $RE^2_{20,5}$ statistics are presented in Table 2. The critical limits
are rounded off to 3 decimal places. It can be noted from Table 2 that the 𝐴𝑅𝐿0 performance of the proposed monitor-
ing scheme is quite robust across different distributions. Note that, in the robustness study, we considered 𝑚 = 10 and
𝑑 = 20 in Table 2. This suggests that the proposed monitoring scheme is suitable in ‘high dimension, low sample size’
settings.
In this section, we conduct a numerical study to evaluate the anomaly detection capability of the proposed monitoring
scheme. Table 3 provides decision limits for 𝑚 = 10, 15, 20, 25, 30, 40, 50, 100; 𝑛 = 3, 5 and 𝑑 = 2(1)10, 20, 30, 40, 50, 100
such that 𝐴𝑅𝐿0 ≈ 370. As in Table 2, the decision limits increase with dimension 𝑑. In what follows, we discuss the
performance of the proposed method under various process shifts.
TABLE 2 $ARL_0$ for $RE^2_{m,n}$ for $d$-dimensional multivariate normal and exponential distributions, for $m = 10, 20$, $n = 5$, $d = 20$.

Distribution                             RE²(10,5)                   RE²(20,5)
                                         (critical limit = 2.810)    (critical limit = 2.874)
Multivariate normal distribution
  𝜎 = 1.5                                369.00                      367.07
  𝜎 = 2                                  365.97                      370.21
  𝜎 = 2.5                                372.36                      369.64
  𝜎 = 3.0                                376.86                      370.25
Multivariate exponential distribution
  Gaussian copula                        368.27                      368.71
  Clayton copula                         373.01                      369.83
FIGURE 1 𝐴𝑅𝐿1 for different shift 𝛿 in location vector of high-dimensional Gaussian process.
In the manufacturing industry, Gaussian processes are widely used in quality control, predictive maintenance and optimisation,
among other applications.57,58 To justify the utility of the proposed method in detecting a location shift, we consider
Gaussian processes of dimension $d = 50$ and $100$. At time instance $t$, $\{\tilde{\boldsymbol{x}}_{it}, i = 1(1)n\}$ are assumed to be temporally independent
and follow a shifted multivariate normal distribution with mean
Let us consider a multidimensional Gaussian process with dimension $d = 3, 30, 100$. For all time instances $t$,
$\{\tilde{\boldsymbol{x}}_{it}, i = 1(1)n\}$ independently follow a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and the covariance matrix

$$\boldsymbol{\Sigma}_1 = \left( \sigma_{1ij} \right)_{d \times d}, \tag{12}$$
FIGURE 2 𝐴𝑅𝐿1 for different shift Δ in variance for multivariate normal distribution.
Failure rate modelling and monitoring are popular in the reliability and process control literature.59–61 In this section, we
consider multivariate exponential distributions where the marginal failure rates have shifted from 𝜆0 = 1, without loss of
generality, to 𝜆1 = 𝜆0 𝛿, for 𝛿 > 1. The marginal exponential distributions are assumed to be connected by a Clayton copula
with parameter 𝜉 = 1. In Figure 3, a decreasing trend is observed in the 𝐴𝑅𝐿1 values plotted for 𝛿 = 1.5, 2, 2.5, 3, 3.5, 4,
4.5, 5. However, the 𝐴𝑅𝐿1 values occasionally exhibit bias, especially for smaller values of 𝛿. The empirical study in Figure 3
suggests that monitoring a highly skewed process with a complex dependence structure would require a substantial amount
of reference data, preferably with more reference observations than the dimension.
In this section, we compare the proposed method with the interpoint distance-based Phase II monitoring methods of
Mukherjee and Marozzi29 (since they do not depend on the spatial sign or rank-based statistics). For
the readers’ sake, we use notation similar to that of Mukherjee and Marozzi29 to briefly explain their method.
In essence, Mukherjee and Marozzi29 examined three distinct ways to calculate interpoint distances, each coupled
with three different univariate distribution-free tests applied to these distance measures. One of the high-dimensional
Shewhart-type (HDS) monitoring schemes in that paper, called the HDSOR scheme, relies on the distance of each data
point from the origin (OR). The other two HDS monitoring schemes depend on the interpoint distances from (i) a single
reference data point (HDSIP), and (ii) the generalised median (HDSGM). The Wilcoxon, Ansari–Bradley
and Lepage statistics were obtained on the interpoint distance values via the HDSOR, HDSIP and HDSGM methods. Whilst
the HDSGM scheme lacks robustness, it was noted that the HDSIP scheme has uniformly better OC performance than
the HDSOR scheme.29 We consider the HDSOR and HDSIP schemes for comparison.
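Suppose that two random samples $\tilde{\boldsymbol{X}}$ and $\tilde{\boldsymbol{Y}}$ (Table 4) of size 5 are drawn from the bivariate normal distribution with
mean $\boldsymbol{\mu}_X = (0, 0)$ and $\boldsymbol{\mu}_Y = (1, 1)$, respectively, and covariance matrix $\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. We consider $\tilde{\boldsymbol{x}}_1$ and $\tilde{\boldsymbol{x}}_5$ as the conditioning
observations as defined by Mukherjee and Marozzi.29 Following the HDSIP scheme, interpoint Euclidean distances of all
$\tilde{\boldsymbol{X}}$ and $\tilde{\boldsymbol{Y}}$ observations are obtained from these two points (Figure 4). ‘Var1’ and ‘Var2’ in Figure 4 refer to the first and
second row elements of the $\tilde{\boldsymbol{X}}$ and $\tilde{\boldsymbol{Y}}$ samples, respectively.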
Based on these distance values from $\tilde{\boldsymbol{x}}_1$ and $\tilde{\boldsymbol{x}}_5$, we obtain the Wilcoxon rank-sum statistics HDSIP-W1 = 12 and HDSIP-
W2 = 21, respectively. For two random samples of size 5, the critical value at 5% level of significance for the Wilcoxon rank-
sum test is 20. Hence, for the same reference and test data, the process seems stable according to the HDSIP-W1 value, and not
stable according to the HDSIP-W2 value. Therefore, the choice of the conditioning observation influences a practitioner’s
decision. This is not desirable as the test statistic for the same reference and test data should be invariant over the choice
of the conditioning observations. On the other hand, the rank-energy statistic $RE^2_{5,5}$ is 0.849 for the $\tilde{\boldsymbol{X}}$ and $\tilde{\boldsymbol{Y}}$ samples in
Table 4, and it does not depend on any such choice, so it is free of the practitioners’ bias.

FIGURE 3 𝐴𝑅𝐿1 for different shift in failure rate 𝜆0 for multivariate exponential distribution connected by Clayton copula.

TABLE 4 Random samples from bivariate normal distribution with mean $\boldsymbol{\mu}_X = (0, 0)$ and $\boldsymbol{\mu}_Y = (1, 1)$, respectively, and covariance matrix
$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

  x̃1        x̃2        x̃3        x̃4        x̃5
  0.688     0.872     0.102     0.254     2.185
 −0.675    −2.119    −1.265    −0.374    −0.596

  ỹ1        ỹ2        ỹ3        ỹ4        ỹ5
  2.436     0.638     2.759     1.325     1.652
 −0.854     0.922     1.969     1.185    −0.380
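The dependence of the HDSIP statistic on the conditioning observation can be reproduced from the Table 4 data. The sketch below makes two assumptions that are not spelled out in the text: the conditioning observation is removed from the reference sample before ranking, and the reported statistic is the sum of the ranks of the remaining reference distances in the pooled set of nine distances. Under these assumptions the computation gives 12 when conditioning on the first reference observation and 21 when conditioning on the fifth, in agreement with the quoted HDSIP-W1 and HDSIP-W2 values. The function name `hdsip_rank_sum` is hypothetical.

```python
import numpy as np

# Table 4 data; each row here is one bivariate observation.
x = np.array([[0.688, -0.675], [0.872, -2.119], [0.102, -1.265],
              [0.254, -0.374], [2.185, -0.596]])
y = np.array([[2.436, -0.854], [0.638, 0.922], [2.759, 1.969],
              [1.325, 1.185], [1.652, -0.380]])


def hdsip_rank_sum(x, y, k):
    """Rank sum of reference distances from the conditioning observation x[k]."""
    anchor = x[k]
    rest = np.delete(x, k, axis=0)               # assumption: drop the anchor
    dx = np.linalg.norm(rest - anchor, axis=1)   # reference distances
    dy = np.linalg.norm(y - anchor, axis=1)      # test distances
    pooled = np.concatenate([dx, dy])
    ranks = np.argsort(np.argsort(pooled)) + 1   # valid because there are no ties
    return int(ranks[: len(dx)].sum())


w1 = hdsip_rank_sum(x, y, 0)  # conditioning on the first observation -> 12
w2 = hdsip_rank_sum(x, y, 4)  # conditioning on the fifth observation -> 21
```

The two values differ only because the anchor point changes, which is the practitioners’ bias discussed above.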
Mukherjee and Marozzi29 demonstrated that HDSOR and HDSIP schemes, when using the Lepage statistic (called
HDSOR-L and HDSIP-L, respectively), exhibit superior OC performance compared to their competitors. We next compare
the OC performance of the proposed method with the HDSOR-L and HDSIP-L schemes for trivariate normal distribution
with mean 𝝁1 (Equation 10) and covariance matrix 𝚺1 (Equation 12). The OC location and scale shift are considered as in
Equation (10) and Equation (12).
FIGURE 4 (A) Euclidean distances from 𝑥̃ 1 with HDSIP-W1 = 12; (B) Euclidean distances from 𝑥̃ 5 with HDSIP-W2 = 21.
TABLE 5 $MRL_1$ values for different location and scale shift for trivariate normal distribution.

                    𝑚 = 100, 𝑛 = 5                        𝑚 = 300, 𝑛 = 5
(𝛿, Δ)              HDSOR-L    HDSIP-L    RE²(m,n)        HDSOR-L    HDSIP-L    RE²(m,n)
Decision limit      11.07      10.99      1.744           11.39      11.37      1.778
(0.5, 1)            95         58         30              88         53         29
(1, 1)              9          9          3               8          8          3
(0, 1.5)            6          14         20              6          12         15
(0, 2)              2          3          16              2          3          12
(0.5, 1.5)          5          9          13              4          9          11
(0.5, 2)            2          3          12              2          3          10
(1, 1.5)            3          4          6               2          4          5
(1, 2)              1          2          6               1          2          6
In our comparison, we consider 𝜌 = 0.5, 𝜎 = 1, as in the OC setting of Mukherjee and Marozzi.29 For location and scale
shift, we consider (𝛿, Δ) = (0.5,1), (1,1), (0,1.5), (0,2), (0.5,1.5), (1,1.5), (0.5,2), (1,2). Note that Δ = 1 indicates a pure
location shift and 𝛿 = 0 indicates a pure scale shift. For comparison purposes, we calculate the decision limit for the $RE^2_{m,n}$
statistic so that the IC median run length $MRL_0 = 250$. In Table 5, OC median run length ($MRL_1$) values for different (𝛿, Δ)
are provided for the HDSOR-L, HDSIP-L and $RE^2_{m,n}$ schemes. Table 5 shows that the $RE^2_{m,n}$-based scheme outperforms
both the HDSOR-L and the HDSIP-L schemes in detecting pure location shifts in trivariate normal distribution. However,
when it comes to the scale shift, our method falls behind the other two monitoring schemes. This is because the Euclidean
distance-based methods tend to be sensitive to scale transformations and outliers. If the variables of a random vector
are not on the same scale, one variable with larger values (or outliers) could dominate the distance calculations.
The OC performance of the proposed method depends on the dimension 𝑑, and sample sizes 𝑚 and 𝑛. For Phase II
monitoring, 𝑛 = 3, 5, are standard choices for the test sample size. Therefore, it is of interest to see the performance of
the proposed method for varying 𝑚 and 𝑑 for different location and scale shift in a multivariate process. We consider
multivariate normal distribution to carry out a sensitivity analysis to understand the impact of the reference sample size
and dimension on OC performance of the proposed method.
In Figure 5, we provide 𝐴𝑅𝐿1 values for location shift (𝑎) 𝛿 = 0.5; (𝑏) 𝛿 = 0.75; (𝑐) 𝛿 = 1.0; (𝑑) 𝛿 = 1.5 in multivariate
normal distribution. In Figure 6, 𝐴𝑅𝐿1 values are displayed for variance shift (𝑎) Δ = 0.5; (𝑏) Δ = 1.0; (𝑐) Δ = 1.5;
(𝑑) Δ = 2.0 in multivariate normal distribution. The location and the scale shift are considered as in Equations (10) and
(12) with 𝜌 = 0.5, 𝜎 = 1 for an IC Gaussian process.

FIGURE 5 𝐴𝑅𝐿1 values for location shift (A) 𝛿 = 0.5; (B) 𝛿 = 0.75; (C) 𝛿 = 1.0; (D) 𝛿 = 1.5 in multivariate normal distribution for
different sample size 𝑚 and dimension 𝑑.
Upon studying Figures 5 and 6, it is empirically evident that the proposed method exhibits bias for some $(m, d)$
combinations, particularly for smaller $m$; the bias is visible as the dark patches where $ARL_1 > ARL_0$. For a small location
shift ($\delta \le 1$), the proposed method is able to detect the shift in high-dimensional data with a small reference sample size.
If detection of larger shifts is intended, a larger reference sample size is recommended. For a smaller scale shift in
high-dimensional data, a reference sample size of $m \ge 100$ is recommended. For a larger scale shift, the proposed method
performs well in 'small reference sample, high dimension' settings. The empirical analysis suggests that, in general, for
location and scale shifts, a larger reference sample (such as $m \ge 100$) would be beneficial to avoid bias. It is also noteworthy
that $ARL_1$ tends to increase as the dimension $d$ increases. Consequently, we may suggest an empirical guideline for
choosing $(m, d)$ such that $d/m \le 0.5$.
A sensorless drive is a type of control system for electric motors, commonly used in variable speed drive (VSD) applications
such as heating, ventilation and air conditioning (HVAC) systems and electric vehicles. It operates
without using physical sensors to monitor motor parameters. In sensor-based VSD systems, physical sensors are used to
monitor motor parameters such as speed, torque, or position. Whilst generally reliable, these sensor-based systems can be
expensive, sensitive to environmental conditions, and complex to install. On the other hand, sensorless drive
technology uses the motor's electrical parameters to estimate rotor position and speed and is widely employed in electric and
hybrid vehicle applications. Recent research on sensorless drive systems includes Wang et al.62 and Yuan et al.,63 among others.
Sensorless drive fault detection is vital for increasing the reliability and maintainability of electrical motors.
FIGURE 6 𝐴𝑅𝐿1 values for variance shift (A) Δ = 0.5; (B) Δ = 1.0; (C) Δ = 1.5; (D) Δ = 2.0 in multivariate normal distribution for
FIGURE 7
FIGURE 8 $RE^2_{100,5}$ statistics for the Class 4 and Class 9 sensorless drive data.
FIGURE 9 HDSIP-L values for the Class 4 and Class 9 sensorless drive data with the first reference data point taken as the referencing
point.
To demonstrate the usage of the proposed multivariate monitoring scheme, we consider a sensorless drive diagnosis
dataset from the 'rebmix' package available in the Comprehensive R Archive Network (CRAN). This dataset contains
58,509 data points (in rows) and 4 variables (in columns). Among these 4 variables, the first three are continuous,
and the fourth is a categorical variable denoting 11 different classes. Class 1 corresponds to 'healthy' drives, whilst
the remaining classes correspond to faulty drives. Data from any of the faulty classes could be used as test data to
illustrate the shift detection ability of the proposed method. We designate the Class 1 data as the reference data and the
Class 4 and Class 9 data as the test data.
In Figure 7, we present a plot illustrating the pairwise correlation among the variables in the sensorless drive data along
with their densities. The marginal densities of the continuous variables in Figure 7 show that the normality assumption
for the sensorless drive data is inappropriate and therefore a distribution-free SPC method should be considered. It is
evident that there is significant correlation among the variables. In order to make the variables (or features) unit-free,
we standardise each variable of the reference and test data. The first 100 data points from Class 1 are considered as the
reference data, and the data points from Class 4 and Class 9 are considered as test data, respectively. The decision
limit for the $RE^2_{100,5}$ statistic is 1.794 (see Table 3). The test data are divided into subgroups of size $n = 5$ and the
$RE^2_{100,5}$ statistic is computed for each subgroup. The resulting $RE^2_{100,5}$ statistic values are plotted in Figure 8.
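The charting procedure described above (standardise, split the test data into subgroups of size 5, compare each statistic with the decision limit) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: all function names are hypothetical, the statistic is passed in as a callable, and the placeholder `mean_shift_statistic` is not the rank-energy statistic. It is also an assumption that both samples are scaled with the reference mean and standard deviation, since the text does not say which moments are used.

```python
from statistics import mean, stdev

def standardise(reference, test):
    """Centre and scale every column with the reference mean and standard deviation."""
    columns = list(zip(*reference))
    mu = [mean(col) for col in columns]
    sd = [stdev(col) for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in rows]

    return scale(reference), scale(test)

def subgroups(rows, n):
    """Consecutive, non-overlapping subgroups of size n (an incomplete tail is dropped)."""
    return [rows[i:i + n] for i in range(0, len(rows) - n + 1, n)]

def run_chart(reference, test, n, limit, statistic):
    """Return (subgroup index, statistic value, signal flag) for every test subgroup."""
    ref_s, test_s = standardise(reference, test)
    results = []
    for k, group in enumerate(subgroups(test_s, n), start=1):
        value = statistic(ref_s, group)
        results.append((k, value, value >= limit))
    return results

def mean_shift_statistic(ref, group):
    """Placeholder (NOT the rank-energy statistic): squared distance between mean vectors."""
    ref_mean = [mean(col) for col in zip(*ref)]
    grp_mean = [mean(col) for col in zip(*group)]
    return sum((a - b) ** 2 for a, b in zip(ref_mean, grp_mean))

# Toy data: the first subgroup sits at the reference centre, the second is shifted.
reference = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0], [3.0, 4.0]]
test = [[1.5, 2.5], [1.5, 2.5], [9.0, 9.0], [9.0, 9.0]]
print(run_chart(reference, test, n=2, limit=1.0, statistic=mean_shift_statistic))
```

In the case study, `statistic` would be replaced by the rank-energy statistic with $m = 100$ and $n = 5$, and `limit` by the tabulated decision limit.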
We also calculate the HDSIP-L values29 for the same reference and test data, using the first data point in the reference
data as the referencing point. For a three-dimensional dataset, with reference and test sample size 100 and 5, respectively,
the decision limit is 10.99 for 𝐴𝑅𝐿0 = 370.43. Figure 9 illustrates the HDSIP-L values, indicating that HDSIP-L values are
unable to capture the fluctuations in the data. The reason for this is that the Euclidean distances of the test samples are
much larger than the reference samples, and therefore, the Lepage statistic takes an extreme value, and this effect is the
same for all subgroups.
Using this case study, we have demonstrated our methodology and its usefulness in sensorless drive technology.
Capturing data fluctuations in VSD applications is important for various reasons such as optimisation,
safety and energy management. As observed in Figure 8, faulty drives from Class 9 demonstrate relative stability when
compared to those from Class 4, despite both being defective. However, the HDSIP-L statistics, presented in Figure 9, do
not highlight these differences; instead, they suggest a similar level of defectiveness across both classes of faulty drives.
6 CONCLUSION
We propose a new multivariate monitoring scheme based on the rank-energy ($RE^2$) statistic.43 Existing nonparametric
multivariate monitoring schemes often lack robustness and invariance properties, and depend on the quality of the training
data. The proposed method is distribution-free and invariant to affine transformations, whilst demonstrating proficient shift
detection capabilities.
The proposed method also has certain limitations. We have observed that with the increase in dimension, there is
a corresponding rise in the 𝐴𝑅𝐿1 , indicating a decline in the detection power. Despite the ability to detect shifts in
‘high dimension, low sample size’ settings, the proposed method sometimes exhibits bias, especially for skewed mul-
tivariate processes. Numerical experiments indicate that a larger reference sample may be beneficial in monitoring
the high-dimensional data, reducing bias and increasing detection power. Therefore, reducing bias from a rank-energy
statistic-based monitoring scheme without increasing the reference sample size could be considered as an important topic
for further research.
ACKNOWLEDGEMENTS
The authors express their gratitude to the anonymous reviewers whose valuable comments and suggestions have enhanced
the quality of this paper. Additionally, the authors would like to thank the high-performance computation (HPC) facility
at the University of the Free State for their assistance with the computational work involved in this study. The authors did
not receive support from any organization for the submitted work.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in UC Irvine Machine Learning Repository at
https://archive.ics.uci.edu/dataset/325/dataset+for+sensorless+drive+diagnosis.
ORCID
Niladri Chakraborty https://orcid.org/0000-0001-9853-4067
Maxim Finkelstein https://orcid.org/0000-0002-3018-8353
REFERENCES
1. Hotelling H. Multivariate quality control illustrated by air testing of sample bombsights. In: Eisenhart C, Hastay MW, Wallis WA, eds.
Techniques of Statistical Analysis. McGraw Hill; 1947:111-184.
2. Qiu P. Introduction to Statistical Process Control. Chapman and Hall/CRC; 2014. doi:10.1201/b15016
3. Zhang C, Chen N, Wu J. Spatial rank-based high-dimensional monitoring through random projection. J Qual Technol. 2020;52:111-127.
doi:10.1080/00224065.2019.1571336
4. Ahsan M, Mashuri M, Lee MH, et al. Robust adaptive multivariate Hotelling’s T² control chart based on kernel density estimation for
intrusion detection system. Expert Syst Appl. 2020;145:113105. doi:10.1016/j.eswa.2019.113105
5. Chakraborty N, Lui CF, Maged A. A distribution-free change-point monitoring scheme in high-dimensional settings with application to
industrial image surveillance. Commun Stat Simul Comput. 2023:1-17. doi:10.1080/03610918.2023.2202371
6. Wu T, Wang R, Yan H, et al. Adaptive change point monitoring for high-dimensional data. Stat Sin. 2022;32:1583-1610.
doi:10.5705/ss.202020.0438
7. Ge Z, Song Z. Multivariate statistical process control. Advances in Industrial Control. Springer London; 2013. doi:10.1007/978-1-4471-4513-4
8. Champ CW, Jones-Farmer LA, Rigdon SE. Properties of the T² control chart when parameters are estimated. Technometrics.
2005;47:437-445. doi:10.1198/004017005000000229
9. Jensen WA, Jones-Farmer LA, Champ CW, et al. Effects of parameter estimation on control chart properties: a literature review. J Qual
Technol. 2006;38:349-364. doi:10.1080/00224065.2006.11918623
10. Psarakis S, Vyniou AK, Castagliola P. Some recent developments on the effects of parameter estimation on control charts. Qual Reliab Eng
Int. 2014;30:1113-1129. doi:10.1002/qre.1556
11. Ebadi M, Chenouri S, Lin DKJ, et al. Statistical monitoring of the covariance matrix in multivariate processes: a literature review. J Qual
Technol. 2022;54:269-289. doi:10.1080/00224065.2021.1889419
12. Qiu P. Some perspectives on nonparametric statistical process control. J Qual Technol. 2018;50:49-65. doi:10.1080/00224065.2018.1404315
13. Maboudou-Tchao EM. High-dimensional data monitoring using support machines. Commun Stat Simul Comput. 2021;50:1927-1942.
doi:10.1080/03610918.2019.1588312
14. Tran KP, ed. Control Charts and Machine Learning for Anomaly Detection in Manufacturing. Springer International Publishing; 2022.
doi:10.1007/978-3-030-83819-5
15. He S, Jiang W, Deng H. A distance-based control chart for monitoring multivariate processes using support vector machines. Ann Oper
Res. 2018;263:191-207. doi:10.1007/s10479-016-2186-4
16. Li Z, Tian L, Yan X. An ensemble framework based on multivariate statistical analysis for process monitoring. Expert Syst Appl.
2022;205:117732. doi:10.1016/j.eswa.2022.117732
17. Qiu P, Xie X. Transparent sequential learning for statistical process control of serially correlated data. Technometrics. 2022;64:487-501.
doi:10.1080/00401706.2021.1929493
18. Lee WJ, Mendis GP, Triebe MJ, et al. Monitoring of a machining process using kernel principal component analysis and kernel density
estimation. J Intell Manuf. 2020;31:1175-1189. doi:10.1007/s10845-019-01504-w
19. Xiao B, Li Y, Sun B, et al. Decentralized PCA modeling based on relevance and redundancy variable selection and its application to
large-scale dynamic process monitoring. Process Saf Environ Prot. 2021;151:85-100. doi:10.1016/j.psep.2021.04.043
20. Liu RY. Control charts for multivariate processes. J Am Stat Assoc. 1995.
21. Qiu P, Hawkins D. A rank-based multivariate CUSUM procedure. Technometrics. 2001;43:120-132. doi:10.1198/004017001750386242
22. Qiu P, Hawkins D. A nonparametric multivariate cumulative sum procedure for detecting shifts in all directions. J R Stat Soc.
2003;52:151-164. doi:10.1111/1467-9884.00348
23. Qiu P. Distribution-free multivariate process control based on log-linear modeling. IIE Trans. 2008;40:664-677.
doi:10.1080/07408170701744843
24. Marden J. Multivariate rank tests. In: Ghosh S, ed. Multivariate Analysis, Design of Experiments, and Survey Sampling. CRC Press;
1999:401-432. doi:10.1201/9781482289824
25. Randles RH. A simpler, affine-invariant, multivariate, distribution-free sign test. J Am Stat Assoc. 2000;95:1263-1268.
doi:10.1080/01621459.2000.10474326
26. Hettmansperger TP, McKean JW. Robust Nonparametric Statistical Methods. CRC Press; 2010. doi:10.1201/b10451
27. Holland MD, Hawkins DM. A control chart based on a nonparametric multivariate change-point model. J Qual Technol. 2014;46:63-77.
doi:10.1080/00224065.2014.11917954
28. Bae SJ, Do G, Kvam P. On data depth and the application of nonparametric multivariate statistical process control charts. Appl Stoch
Models Bus Ind. 2016;32:660-676. doi:10.1002/asmb.2186
29. Mukherjee A, Marozzi M. Nonparametric Phase-II control charts for monitoring high-dimensional processes with unknown parameters.
J Qual Technol. 2022;54:44-64. doi:10.1080/00224065.2020.1805378
30. Bush HM, Chongfuangprinya P, Chen VCP, et al. Nonparametric multivariate control charts based on a linkage ranking algorithm. Qual
Reliab Eng Int. 2010:663-675. doi:10.1002/qre.1129
31. Hettmansperger TP, Randles RH. A practical affine equivariant multivariate median. Biometrika. 2002;89:851-860.
doi:10.1093/biomet/89.4.851
32. Huwang L, Lin L, Yu C. A spatial rank–based multivariate EWMA chart for monitoring process shape matrices. Qual Reliab Eng Int.
2019;35:1716-1734. doi:10.1002/qre.2471
33. Li J, Zhang X, Jeske DR. Nonparametric multivariate CUSUM control charts for location and scale changes. J Nonparametr Stat.
2013;25:1-20. doi:10.1080/10485252.2012.726992
34. Li Z, Zou C, Wang Z, et al. A multivariate sign chart for monitoring process shape parameters. J Qual Technol. 2013;45:149-165.
doi:10.1080/00224065.2013.11917923
35. Liang W, Xiang D, Pu X. A robust multivariate EWMA control chart for detecting sparse mean shifts. J Qual Technol. 2016;48:265-283.
doi:10.1080/00224065.2016.11918166
36. Wang J, Su Q, Fang Y, et al. A multivariate sign chart for monitoring dependence among mixed-type data. Comput Ind Eng.
2018;126:625-636. doi:10.1016/j.cie.2018.09.053
37. Zi X, Zou C, Zhou Q, et al. A directional multivariate sign EWMA control chart. Qual Technol Quant Manag. 2013;10:115-132.
doi:10.1080/16843703.2013.11673311
38. Zou C, Wang Z, Tsung F. A spatial rank-based multivariate EWMA control chart. Nav Res Logist (NRL). 2012;59:91-110. doi:10.1002/nav.21475
39. Zou C, Tsung F. A multivariate sign EWMA control chart. Technometrics. 2011;53:84-97. doi:10.1198/TECH.2010.09095
40. Chen N, Zi X, Zou C. A distribution-free multivariate control chart. Technometrics. 2016;58:448-459. doi:10.1080/00401706.2015.1049750
41. Zhang C, Chen N, Zou C. Robust multivariate control chart based on goodness-of-fit test. J Qual Technol. 2016;48:139-161.
doi:10.1080/00224065.2016.11918156
42. Hallin M, del Barrio E, Cuesta-Albertos J, et al. Distribution and quantile functions, ranks and signs in dimension d: a measure
transportation approach. Ann Stat. 2021;49(2):1139-1165. doi:10.1214/20-AOS1996
43. Deb N, Sen B. Multivariate rank-based distribution-free nonparametric testing using measure transportation. J Am Stat Assoc.
2023;118:192-207. doi:10.1080/01621459.2021.1923508
44. Chernozhukov V, Galichon A, Hallin M, et al. Monge–Kantorovich depth, quantiles, ranks and signs. Ann Stat. 2017;45(1):223-256.
doi:10.1214/16-AOS1450
45. Hallin M. Measure transportation and statistical decision theory. Annu Rev Stat Appl. 2022;9:401-424.
doi:10.1146/annurev-statistics-040220-105948
46. Hallin M, Mordant G. On the finite-sample performance of measure-transportation-based multivariate rank tests. In: Robust and
Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler. Springer International Publishing; 2022:87-119.
doi:10.1007/978-3-031-22687-8_5
47. Jones-Farmer LA, Woodall WH, Steiner SH, et al. An overview of phase I analysis for process improvement and monitoring. J Qual Technol.
2014;46:265-280. doi:10.1080/00224065.2014.11917969
48. Qiu P. Big data? Statistical process control can help! Am Stat. 2020;74:329-344. doi:10.1080/00031305.2019.1700163
49. Li W, Pu X, Tsung F, et al. A robust self-starting spatial rank multivariate EWMA chart based on forward variable selection. Comput Ind
Eng. 2017;103:116-130. doi:10.1016/j.cie.2016.11.024
50. Xue L, Qiu P. A nonparametric CUSUM chart for monitoring multivariate serially correlated processes. J Qual Technol. 2021;53:396-409.
doi:10.1080/00224065.2020.1778430
51. Train KE. Discrete Choice Methods with Simulation. Cambridge University Press; 2009. doi:10.1017/CBO9780511805271
52. Dutang C, Savicky P. randtoolbox: Generating and Testing Random Numbers. R package version 2.0.4. 2023.
doi:10.32614/CRAN.package.randtoolbox
53. Sofikitou E, Koutras M. In: Koutras MV, Triantafyllou IS, eds. Distribution-Free Methods for Statistical Process Monitoring and Control.
Springer International Publishing; 2020. doi:10.1007/978-3-030-25081-2
54. Joe H. Dependence Modeling With Copulas. Chapman and Hall/CRC; 2014. doi:10.1201/b17116
55. Nelsen RB. An Introduction to Copulas. Springer New York; 2006. doi:10.1007/0-387-28678-0
56. Finkelstein M. Failure Rate Modelling for Reliability and Risk. Springer London; 2008. doi:10.1007/978-1-84800-986-8
57. Chen Z, Fan J, Wang K. Multivariate Gaussian processes: definitions, examples and applications. METRON. 2023;81:181-191.
doi:10.1007/s40300-023-00238-3
58. Maier M, Rupenyan A, Bobst C, et al. Self-optimizing grinding machines using Gaussian process models and constrained Bayesian
optimization. Int J Adv Manuf Technol. 2020;108:539-552. doi:10.1007/s00170-020-05369-9
59. Chakraborty N, Mahmood T. Failure rate monitoring in generalized gamma-distributed process. Qual Technol Quant Manag.
2021;18:718-739. doi:10.1080/16843703.2021.1953241
60. Finkelstein M, Cha JH, Chakraborty N. Balancing load and performance for different failure models. Appl Stoch Models Bus Ind.
2022;38:323-333. doi:10.1002/asmb.2662
61. Finkelstein M, Cha JH. Stochastic modeling for reliability. Springer Series in Reliability Engineering. Springer London; 2013.
doi:10.1007/978-1-4471-5028-2
62. Wang W, Niu S, Zhao X. A novel field and armature synchronous pulse injection method for sensorless drive control of 12/10 DC vernier
reluctance machine. IEEE Trans Energy Convers. 2023;38(3):2126-2135. doi:10.1109/TEC.2023.3268328
63. Yuan L, Ding Y, Jiang Y, et al. Multi-dimensional vector control for six-phase interior permanent magnet synchronous motor sensorless
drive system. Energ Rep. 2023;9:1126-1133. doi:10.1016/j.egyr.2023.05.082
64. Joe H. Dependence Modeling With Copulas. Chapman and Hall/CRC; 2014. doi:10.1201/b17116
How to cite this article: Chakraborty N, Finkelstein M. Distribution-free multivariate process monitoring: A
rank-energy statistic-based approach. Qual Reliab Engng Int. 2024;40:4068–4087. https://doi.org/10.1002/qre.3619
APPENDIX
A1
R-Algorithm for Monte Carlo simulation to estimate the decision limit for a given $ARL_0$:
Input: Design parameters $m$, $n$, $d$; number of iterations $B_1$ and $B_2$; a given value for $ARL_0$ so that $\alpha_0 \approx 1/ARL_0$.
Output: The decision limit.
1. Call the necessary packages.
2. Call the computestatistic() function.43
3. Define the mean vector $\boldsymbol{\mu}_{d \times 1}$ and the covariance matrix $\boldsymbol{\Sigma}_{d \times d}$ as in Section 4.1.
4. # Define a function to calculate the $RE^2_{m,n}$ statistic in an iterative way.
Test_stat = function(count, m, n, d, reference sample){
Draw a test sample $\boldsymbol{Y}_{n \times d}$ of size $n$ from a multivariate normal distribution with mean $\boldsymbol{\mu}_{d \times 1}$ and covariance $\boldsymbol{\Sigma}_{d \times d}$.
Calculate the $RE^2_{m,n}$ statistic with the computestatistic() function, using the reference and the test sample.
Return the $RE^2_{m,n}$ statistic.
}
5. Fix
for i in 1:B1 {
6. Draw a reference sample $\boldsymbol{X}_{m \times d}$ from a multivariate normal distribution with mean $\boldsymbol{\mu}_{d \times 1}$ and covariance matrix $\boldsymbol{\Sigma}_{d \times d}$.
7. Calculate $B_2$ conditional $RE^2_{m,n}$ values using the Test_stat() function.
8. Calculate the $100(1 - \alpha)$ percentile of the conditional $RE^2_{m,n}$ values to estimate the decision limit.
9. Estimate the conditional $ARL_0 \approx 1 / P_{\boldsymbol{X}}[RE^2_{m,n} \ge \text{decision limit}]$.
}
10. Estimate the $ARL_0$ by taking the average of the $B_1$ conditional $ARL_0$ values. The decision limit is obtained iteratively so
that $ARL_0 \approx 370$.
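The Monte Carlo search in A1 can be illustrated with a simplified Python sketch. It is not the authors' R code: the computestatistic() function of reference 43 is not reproduced, so the statistic is passed in as a callable and `toy_statistic` is only a placeholder. Two further assumptions are made for illustration: the IC covariance has unit variances with a common correlation `rho`, and the limit is found by bisection on a candidate value (rather than via a percentile step), reusing the same random seed for every candidate so that the estimated $ARL_0$ is monotone in the limit. All function names are hypothetical.

```python
import random

def draw_normal(size, d, rho, rng):
    """size x d normal sample with zero means, unit variances and common correlation rho."""
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    rows = []
    for _ in range(size):
        shared = rng.gauss(0.0, 1.0)
        rows.append([a * shared + b * rng.gauss(0.0, 1.0) for _ in range(d)])
    return rows

def estimate_arl0(limit, statistic, m, n, d, b1, b2, rho=0.5, seed=2024):
    """Average of b1 conditional ARL0 estimates, each based on b2 in-control test samples."""
    rng = random.Random(seed)  # same seed for every limit: common random numbers
    conditional = []
    for _ in range(b1):
        reference = draw_normal(m, d, rho, rng)
        hits = 0
        for _ in range(b2):
            hits += statistic(reference, draw_normal(n, d, rho, rng)) >= limit
        # With zero exceedances the estimate is capped at b2 (a limitation of this sketch).
        conditional.append(b2 / max(hits, 1))
    return sum(conditional) / b1

def find_limit(target, statistic, lo, hi, tol=1e-3, **kwargs):
    """Bisection for the smallest limit (within tol) whose estimated ARL0 reaches target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if estimate_arl0(mid, statistic, **kwargs) < target:
            lo = mid
        else:
            hi = mid
    return hi

def toy_statistic(reference, test):
    """Placeholder (NOT the rank-energy statistic): scaled squared distance between means."""
    d = len(test[0])
    ref_mean = [sum(r[j] for r in reference) / len(reference) for j in range(d)]
    test_mean = [sum(r[j] for r in test) / len(test) for j in range(d)]
    return len(test) * sum((a - b) ** 2 for a, b in zip(ref_mean, test_mean))
```

A call such as `find_limit(20.0, toy_statistic, lo=0.0, hi=1000.0, m=20, n=5, d=2, b1=5, b2=200)` returns a limit whose estimated $ARL_0$ is at least 20. Reaching a target near 370 would need `b2` well above the target, because each conditional estimate is capped at `b2`.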
A2
To estimate the 𝐴𝑅𝐿1 , we change the mean vector and the covariance matrix, or change the distribution, of the test sample.
The remaining steps are the same as in the algorithm in A1, except that it is not necessary to estimate the decision limit.
A3
We provide a brief discussion on the Gaussian and Clayton copulas. A $d$-dimensional copula $C$ is a cumulative
distribution function (c.d.f.) $C : [0, 1]^d \to [0, 1]$ with uniform marginals. According to Sklar's Theorem,64 for any
$d$-dimensional c.d.f. with continuous marginals, there exists a copula that uniquely describes the dependency structure.
The Gaussian copula is defined as:
\[ C^{Gauss}_{\boldsymbol{\Sigma}}(u_1, \ldots, u_d) = \boldsymbol{\Phi}_{\boldsymbol{\Sigma}}\left(\Phi^{-1}(u_1), \ldots, \Phi^{-1}(u_d)\right), \quad (u_1, \ldots, u_d) \in [0, 1]^d, \tag{A.1} \]
where $\boldsymbol{\Phi}_{\boldsymbol{\Sigma}}$ is the $d$-variate normal c.d.f. with correlation matrix $\boldsymbol{\Sigma}$, and $\Phi(\cdot)$ is the univariate standard normal c.d.f.
The Archimedean copula is defined as:
\[ C^{Arch}(u_1, \ldots, u_d) = \psi\left(\psi^{-1}(u_1) + \psi^{-1}(u_2) + \cdots + \psi^{-1}(u_d)\right), \quad (u_1, \ldots, u_d) \in [0, 1]^d, \tag{A.2} \]
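where $\psi(\cdot)$ is a generator function of $C^{Arch}$. The Clayton copula is a special case of the Archimedean copula whose
generator has the inverse
\[ \psi^{-1}(u_i) = \frac{u_i^{-\xi} - 1}{\xi}, \quad \xi > 0, \quad i = 1, 2, 3, \ldots, d, \tag{A.3} \]
so that $\psi(t) = (1 + \xi t)^{-1/\xi}$ in (A.2).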
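As a small numerical illustration of (A.2) and (A.3), the Python sketch below evaluates the Clayton copula through its Archimedean form, taking $\psi(t) = (1 + \xi t)^{-1/\xi}$ with inverse $(u^{-\xi} - 1)/\xi$, and draws samples with the gamma-frailty (Marshall-Olkin) construction. The function names are hypothetical, and the sampler is a standard textbook construction rather than something stated in the paper.

```python
import random

def psi(t, xi):
    """Clayton generator psi(t) = (1 + xi*t)**(-1/xi), as used in the Archimedean form."""
    return (1.0 + xi * t) ** (-1.0 / xi)

def psi_inv(u, xi):
    """Inverse generator psi^{-1}(u) = (u**(-xi) - 1) / xi."""
    return (u ** (-xi) - 1.0) / xi

def clayton_cdf(u, xi):
    """Clayton copula C(u_1, ..., u_d) = psi(sum_i psi^{-1}(u_i)) for u_i in (0, 1]."""
    return psi(sum(psi_inv(ui, xi) for ui in u), xi)

def clayton_sample(d, xi, rng):
    """One draw from the d-dimensional Clayton copula via a gamma frailty."""
    # Gamma(shape 1/xi, scale xi) has Laplace transform psi, so U_i = psi(E_i / V).
    v = rng.gammavariate(1.0 / xi, xi)
    return [psi(rng.expovariate(1.0) / v, xi) for _ in range(d)]

rng = random.Random(0)
print(clayton_cdf([0.2, 0.5, 0.9], 2.0))
print(clayton_sample(3, 2.0, rng))
```

The Archimedean form agrees with the closed form $\left(\sum_i u_i^{-\xi} - d + 1\right)^{-1/\xi}$, and setting all but one argument to 1 returns the remaining argument, as expected for uniform marginals.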
AUTHOR BIOGRAPHIES
Dr. Niladri Chakraborty is a lecturer in the Department of Mathematical Statistics and Actuarial
Sciences, University of the Free State, South Africa. He earned his PhD in Mathematical Statistics
from the University of Pretoria in 2017. Prior to joining the University of the Free State, he worked
as a postdoctoral researcher at the City University of Hong Kong. His main research interests are
in statistical process control and nonparametric inference.