Carè Et Al. - 2018 - Finite-Sample System Identification An Overview A
Authorized licensed use limited to: Univ of Calif Berkeley. Downloaded on August 14,2020 at 16:22:14 UTC from IEEE Xplore. Restrictions apply.
62 IEEE CONTROL SYSTEMS LETTERS, VOL. 2, NO. 1, JANUARY 2018
of making available to others an easy-to-access point which may foster research in this field. Second, driven by the results highlighted, a new correlation method is proposed which is based on the combination of LSCR and SPS. It builds confidence regions based on correlations, like LSCR, while it applies sign-perturbations with a norm and obtains exact confidence, like SPS. A computational advantage of the new correlation method is that it avoids generating alternative output sequences, which are vital for SPS when handling for example ARX systems. This idea can be easily understood in the light of the unifying approach provided in this letter.

Clearly, there is no unique way to build confidence regions so that (2) is satisfied: our goal is presenting well-principled and useful methods.

B. Assumptions

The system is assumed to be invertible w.r.t. the noise:

Assumption 1: For any value of $\theta$, relation $Y_n \triangleq F(U_n, W_n, I; \theta)$ is noise invertible in the sense that, given the values of $Y_n$, $U_n$, $I$, vector $W_n$ can be recovered.

Example 1: Consider an ARX model
CARÈ et al.: FINITE-SAMPLE SYSTEM IDENTIFICATION: OVERVIEW AND NEW CORRELATION METHOD 63
for $i = 1, \dots, m-1$, where $s_n^{(1)}, \dots, s_n^{(m-1)}$ are $m-1$ user-generated sign vectors of independent random signs, whose elements are $+1$ or $-1$ with probability $1/2$ each.

Precisely, the construction of the confidence region $\widehat{\Theta}_n$ for $\theta^*$ is based on ranking $Z_0(\theta)$ with respect to $Z_i(\theta)$, $i = 1, \dots, m-1$. To this end, one first selects two integers $h_1$ and $h_2$ with $h_1 \le h_2$ in the range $1, 2, \dots, m$. Then, for any value of $\theta$, the numbers $Z_i(\theta)$, $i = 0, 1, \dots, m-1$, are sorted in increasing order. If $Z_0(\theta)$ is in position $h_1$ or $h_1 + 1$ or ... or $h_2$, then that $\theta$ belongs to $\widehat{\Theta}_n$; otherwise it does not. For example, say that $m = 10$, so that there are 10 functions $Z_i(\theta)$, $i = 0, 1, \dots, 9$. Take $h_1 = 1$ and $h_2 = 3$. For a given $\theta$, if it happens that $Z_0(\theta)$ is the smallest of all functions $Z_i(\theta)$, $i = 0, 1, \dots, 9$, or the second smallest or the third smallest, then this $\theta$ is included in $\widehat{\Theta}_n$, otherwise it is not.$^{4}$ Under some additional minor details as hinted at below, the following result holds.

Claim 1: Call $R(\theta)$ the rank of $Z_0(\theta)$ among $\{Z_i(\theta),\ i = 0, \dots, m-1\}$, i.e., if $Z_0(\theta)$ is the smallest, then $R(\theta) = 1$, if $Z_0(\theta)$ is the second smallest, then $R(\theta) = 2$, and so on. The confidence region defined as

$$\widehat{\Theta}_n \triangleq \{\theta \in \mathbb{R}^d : h_1 \le R(\theta) \le h_2\}$$

is such that $P\{\theta^* \in \widehat{\Theta}_n\} = (h_2 - h_1 + 1)/m$.

This result is in the form of (2), where $p = (h_2 - h_1 + 1)/m$. Note that $h_2 - h_1 + 1$ is the number of positions in the ordering that $Z_0(\theta)$ is allowed to take over the total number $m$ of positions. The proof of this result requires some mathematical underpinning to deal with a number of details including the possibility of having ties and possible correlation issues between the system measurable input and the nonmeasurable noise. The exact manner to approach these issues is given in the papers cited in the introduction, while we here only remark that the fundamental idea behind this result is almost straightforward and can be explained as follows. Under the assumption that $\theta = \theta^*$, functions $\{Z_i(\theta^*)\}$ become

$$Z_0(\theta^*) \triangleq Z(U_n, W_n, \theta^*),$$
$$Z_i(\theta^*) \triangleq Z(U_n, s_n^{(i)}[W_n], \theta^*).$$

The only difference between these $m$ random variables is that the argument $W_n$ in the first is replaced by $s_n^{(i)}[W_n]$ in the others. However, $W_n$ and $s_n^{(i)}[W_n]$ are random variables having the same distribution because of Assumption 2. Hence, there is no reason why, among the variables $Z_0(\theta^*)$ and $Z_i(\theta^*)$, $i = 1, \dots, m-1$, one should have a larger chance than the others to be in the first or in the second or ... in any other particular position, and in fact each has the same probability $1/m$ to be in any position. Since in Claim 1 $\widehat{\Theta}_n$ is determined by including a given $\theta$ if $Z_0(\theta)$ ranks in one among $h_2 - h_1 + 1$ positions, then $\theta^*$ is included with probability $(h_2 - h_1 + 1)/m$. This argument is not rigorous because of tie-breaks and many other minor issues, but the fundamental idea that has been explained here goes through.

$^{4}$ A subtle issue may arise in case two $Z_i(\theta)$ functions take the same value. In this case, a suitable tie-break rule can be applied; this aspect is discussed in the literature cited in the introduction, while we neglect it here because it would lead us too far into unnecessary details.

Clearly, Claim 1 is not the end of the story, as one would also like to construct a region $\widehat{\Theta}_n$ that is well shaped and converges toward $\theta^*$ as $n$ increases. Moreover, of no minor importance is the issue of the computational complexity associated with constructing $\widehat{\Theta}_n$. In the next section, we present existing methods, namely LSCR (Leave-out Sign-dominant Correlation Regions), SPS (Sign-Perturbed Sums) and PDMs (Perturbed Dataset Methods), cast them within the setup of this section, and discuss the region shape and the computational complexity associated with these methods. This sheds light on the pros and cons of these various techniques in a comparative way, which is the first goal of this letter. Then, in the following section we introduce a new correlation method which combines some advantages of the above-mentioned approaches.

III. REVISITING EXISTING FINITE-SAMPLE METHODS

In this section, we revisit three existing finite-sample approaches using the framework introduced in Section II.

A. The LSCR Method

In its randomized formulation [2], LSCR fits into the framework of Section II where the function $Z_0(\theta)$ is simply defined as a sum of error correlation terms, such as, e.g., $\hat{W}_t(\theta)\hat{W}_{t-k}(\theta)$, or of input-error correlation terms such as, e.g., $\hat{W}_t(\theta)U_{t-k}$, while the perturbed functions $Z_i(\theta)$ are obtained by replacing in the definition of $Z_0(\theta)$ the components of $\hat{W}_n(\theta)$ with the components of $s_n^{(i)}[\hat{W}_n(\theta)]$. Consider, for example, $Z_0(\theta) = -\sum_{t=2}^{n} \hat{W}_t(\theta)\hat{W}_{t-1}(\theta)$. Then, for each $\theta$, the ranking of $Z_0$ among $\{Z_0, \dots, Z_{m-1}\}$ is equivalent to the ranking of 0 (the constant zero function) among $\{0, Z_1 - Z_0, \dots, Z_{m-1} - Z_0\}$. Note that $Z_i - Z_0$ is a sum of the kind $\sum_{t=2}^{n} \alpha_t \hat{W}_t(\theta)\hat{W}_{t-1}(\theta)$, where $\alpha_t$ is equal to 0 or 2 with equal probability (writing $s_t^{(i)}$ for the $t$-th element of $s_n^{(i)}$, one has $\alpha_t = 1 - s_t^{(i)} s_{t-1}^{(i)}$): this is the random subsampling idea of [2].

Consistency results for LSCR are based on proving that in the long run, sums like $\sum_{t=2}^{n} \alpha_t \hat{W}_t(\theta)\hat{W}_{t-1}(\theta)$, for every $\theta \neq \theta^*$, tend to become large in absolute value, and therefore every $\theta \neq \theta^*$ will eventually be excluded from the region. However, in order to get consistency results, focusing on one sum only is not enough. For example, for $\mathrm{ARMA}(n_a, n_w)$ systems, the LSCR region is obtained by intersecting various regions $\widehat{\Theta}_n^{(k)}$, each of which is constructed by considering a sum of the kind $\sum_{t=k+1}^{n} \hat{W}_t(\theta)\hat{W}_{t-k}(\theta)$ for different values of $k$.

In some cases, using different kinds of correlations such as input-error correlations or even higher order correlations is advisable [1], [6]. Note that if every region $\widehat{\Theta}_n^{(k)}$ is guaranteed to include the true parameter $\theta^*$ with exact probability $p$, then the intersection $\widehat{\Theta}_n = \cap_{k=1}^{\bar{k}} \widehat{\Theta}_n^{(k)}$ includes $\theta^*$ with probability at least $1 - \bar{k}(1 - p)$, by the union bound, which is a source of conservatism.

B. The SPS Method

Consider a system in linear regression form as $Y_t = \varphi_t^{T}\theta^* + W_t$, where $\varphi_t$ is a function of $U_1, \dots, U_t$ and $W_t$ is the symmetric noise. Given $n$ samples $Y_1, \dots, Y_n$ and the corresponding regressors $\varphi_1, \dots, \varphi_n$, the least-squares estimate $\hat{\theta}_{LS}$ is obtained by minimizing $L(\theta) = \sum_{t=1}^{n} \hat{W}_t^2(\theta)$, where $\hat{W}_t(\theta) = Y_t - \hat{Y}_t(\theta)$, and $\hat{Y}_t(\theta) \triangleq \varphi_t^{T}\theta$. $\hat{\theta}_{LS}$ is
the solution (unique, under some technical conditions) of the normal equations $\sum_{t=1}^{n} \varphi_t \hat{W}_t(\theta) = 0$, which are equivalent to $\nabla_\theta L(\theta) = 0$.

1) SPS With Exogenous Regressors: In the prototypical SPS algorithm, under the assumption that the regressors $\{\varphi_t\}$ do not depend on outputs (i.e., regressors are exogenous), a normed version of $\nabla_\theta L(\cdot)$ is chosen as the reference element and thus $Z_0(\theta) = \|\sum_{t=1}^{n} \varphi_t \hat{W}_t(\theta)\|_R^2$, where $\|\cdot\|_R^2$ is a suitably rescaled Euclidean norm, and $Z_i(\theta)$ is obtained by replacing $\hat{W}_n(\theta)$ with $s_n^{(i)}[\hat{W}_n(\theta)]$. Note that, by construction, $Z_0(\hat{\theta}_{LS}) = 0 \le Z_i(\hat{\theta}_{LS})$, so that when $h_1 = 1$ the SPS region includes $\hat{\theta}_{LS}$. Moreover, the errors in all the components of $\theta$ are taken simultaneously into account by the norm. This idea will be henceforth referred to as the “norm trick”.

2) SPS for ARX Systems: Some difficulties arise when $\varphi_t$ depends on past outputs, as is the case in autoregressive systems. In this case simply using $\varphi_t$ in both the reference $Z_0$ and the perturbed $Z_i$ functions is not a valid option, because it would invalidate the key symmetry argument behind Claim 1. In fact, through past outputs, $\varphi_t$ depends on noise terms and these noise terms have to undergo the sign perturbation in the $Z_i$ functions. A solution to this problem is to “reconstruct” alternative output sequences based on the available information. Given any triplet of the kind $(U_n, W_n, \theta)$, the knowledge of $F$ can be used to define an alternative output $\bar{Y}_n$ as $\bar{Y}_n \triangleq F(U_n, W_n, I; \theta)$, see (1). Using $\bar{Y}_n$, alternative regressors $\{\bar{\varphi}_t\}$ can also be constructed that include elements of $\bar{Y}_n$ instead of the actual output $Y_n$. Finally, the $Z$ function for a generic triple $(U_n, W_n, \theta)$ is defined as

$$Z(U_n, W_n, \theta) \triangleq \Big\| \sum_{t=1}^{n} \bar{\varphi}_t W_t \Big\|_R^2 .$$

Then, as usual, $Z_0(\theta) = Z(U_n, \hat{W}_n(\theta), \theta)$. In $Z_0$, the values of $\bar{\varphi}_t$ and $\bar{Y}_n$ are computed using $\theta$ and $(U_n, \hat{W}_n(\theta))$. Therefore, by (1) and the invertibility assumption, the values of $\bar{Y}_n$ coincide with the observed output values of $Y_n$ for every $\theta$, and $\bar{\varphi}_t = \varphi_t$. On the other hand, the $Z_i$’s are obtained by replacing $\hat{W}_n(\theta)$ with $s_n^{(i)}[\hat{W}_n(\theta)]$, so that $\bar{\varphi}_t$ and $\bar{Y}_n$ are now reconstructed by using $s_n^{(i)}[\hat{W}_n(\theta)]$ instead of the actual error $\hat{W}_n(\theta)$. Thus, denoting by $\bar{Y}_n^{(i)}(\theta)$ the $i$-th reconstructed alternative output sequence, that is,

$$\bar{Y}_n^{(i)}(\theta) = F(U_n, s_n^{(i)}[\hat{W}_n(\theta)], I; \theta), \qquad (3)$$

we have that $\bar{Y}_n^{(i)}(\theta) \neq Y_n$ in general. It can be proven that with this approach Claim 1 remains rigorously valid [4].

C. Perturbed Dataset Methods

PDMs form an interesting class of methods that leave many degrees of freedom to the user and also fit situations where the joint symmetry assumption is replaced by other conditions such as arbitrary i.i.d. sequences. In these methods the alternative output, (3), plays the crucial role: a “perturbed dataset”, in the terminology of [9], is any pair $(U_n, \bar{Y}_n^{(i)}(\theta))$. We focus here on a stimulating idea briefly mentioned in [9].

1) Bootstrap-Style PDMs: Let functions $Z_0$ and $\{Z_i\}$ be

$$Z_0(\theta) \triangleq \|\theta - \hat{\theta}_n(U_n, Y_n)\|_R^2,$$
$$Z_i(\theta) \triangleq \|\theta - \hat{\theta}_n(U_n, \bar{Y}_n^{(i)}(\theta))\|_R^2,$$

where $\hat{\theta}_n(\cdot)$ is a point-estimator. Claim 1 applies to this context. Moreover, in $Z_0$, function $\hat{\theta}_n(\cdot)$ computes an estimate of $\theta^*$ based on the original input-output dataset, $(U_n, Y_n)$; hence, $Z_0(\theta^*) = \|\theta^* - \hat{\theta}_n(U_n, Y_n)\|_R^2$ tends to be small for large $n$. On the other hand, for each other $Z_i$ function, $\hat{\theta}_n(\cdot)$ computes an estimate based on the perturbed dataset $(U_n, \bar{Y}_n^{(i)}(\theta))$; hence, $\hat{\theta}_n(U_n, \bar{Y}_n^{(i)}(\theta))$ is an estimate of $\theta$ and $Z_i(\theta^*) = \|\theta^* - \hat{\theta}_n(U_n, \bar{Y}_n^{(i)}(\theta))\|_R^2$ does not converge to zero as $n \to \infty$. Hence, by selecting $h_1 = 1$ one singles out in the long run the true $\theta^*$.

It can be proved that, for FIR and ARX systems, by choosing $\hat{\theta}_n(\cdot)$ as the least-squares estimator, the suggested method builds the same region as SPS. This is not true in the case of general linear systems with the prediction error estimator. In that case, one difficulty of the bootstrap PDM is that it is computationally intensive. In fact, computing $Z_i(\theta)$, for $i = 1, \dots, m-1$, for any fixed $\theta$, requires computing $\hat{\theta}_n(U_n, \bar{Y}_n^{(i)}(\theta))$. Consequently, for every $\theta$, one has to solve $m - 1$ non-convex optimization problems.$^{5}$

$^{5}$ An interesting direction of research about PDMs is whether the estimator $\hat{\theta}_n(\cdot)$ can be successfully replaced by an approximate estimator that is easy to compute.

IV. A NEW CORRELATION APPROACH

In this section we introduce a new finite-sample identification method that combines some of the previous ideas into a new algorithm with improved properties.

A. Motivations

As we saw, LSCR is based on a correlation idea (combined with subsampling) which leads to a flexible and easy-to-implement algorithm. It is also computationally light since, unlike SPS and PDMs, LSCR does not require the generation of alternative, perturbed input-output datasets. However, the confidence bound resulting from intersecting individually exact regions makes LSCR conservative for high dimensional parameters. SPS and PDMs evaluate the errors in all parameters simultaneously (norm trick) and construct confidence regions having exact confidences. Unfortunately, the generation of alternative input-output datasets is required to ensure exact confidence in the case of more general systems. As a consequence, these methods can become difficult to analyze and computationally expensive or even impractical, especially when they involve hard optimization steps, as is the case for bootstrap-style PDMs.

Here we aim at defining a new class of methods that exploits the correlation idea of LSCR, which makes the method computable, together with the norm trick of SPS, which makes the confidence of the constructed regions exact. One goal of this section is to stimulate further research in this direction.

B. Sign-Perturbed Correlation Regions

The main idea of the new finite-sample method, called Sign-Perturbed Correlation Regions (SPCR), is as follows. Instead of defining a different $Z$ function for each correlation and then intersecting the resulting regions as in LSCR, we stack the correlation sums into a vector and compute a single scalar “summary” of them by introducing a suitable norm.
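As a rough illustration of the stack-then-norm idea, the following Python sketch checks whether one candidate $\theta$ passes the rank test of Claim 1 when $Z$ is the squared norm of a vector of stacked correlation sums. It is not the SPCR algorithm as specified in this letter: the plain (unweighted) Euclidean norm, the function names `correlation_sums`, `rank_of_reference` and `in_region`, the way the first samples are skipped so that all lags exist, and the omission of a tie-break rule are all assumptions made for illustration. The residuals $\hat{W}_n(\theta)$ are assumed to have been computed elsewhere for the candidate $\theta$.

```python
import numpy as np


def correlation_sums(w, u, k, l):
    """Stack sum_t w[t]*w[t-j] for j = 1..k and sum_t w[t]*u[t-j] for j = 0..l-1."""
    w = np.asarray(w, dtype=float)
    u = np.asarray(u, dtype=float)
    n = len(w)
    start = max(k, l - 1)  # first index t at which every lag is available (assumption)
    auto = [w[start:] @ w[start - j:n - j] for j in range(1, k + 1)]
    cross = [w[start:] @ u[start - j:n - j] for j in range(l)]
    return np.array(auto + cross)


def rank_of_reference(z):
    """Rank of z[0] among all entries of z (1 = smallest); ties are not handled."""
    z = np.asarray(z, dtype=float)
    return 1 + int(np.sum(z[1:] < z[0]))


def in_region(w_hat, u, signs, k, l, h1, h2):
    """True if h1 <= R(theta) <= h2, given the residuals w_hat of one candidate theta."""
    w_hat = np.asarray(w_hat, dtype=float)
    # Z_0: squared Euclidean norm of the stacked correlation sums (hypothetical choice of norm)
    z = [np.sum(correlation_sums(w_hat, u, k, l) ** 2)]
    # Z_i: same function, evaluated on element-wise sign-perturbed residuals
    for s in signs:
        z.append(np.sum(correlation_sums(s * w_hat, u, k, l) ** 2))
    return h1 <= rank_of_reference(z) <= h2


# Usage on synthetic data: m = 20, h1 = 1, h2 = 19 corresponds to p = 19/20 = 0.95.
rng = np.random.default_rng(0)
n, m = 200, 20
u = rng.standard_normal(n)
w_hat = rng.standard_normal(n)                     # stand-in for residuals at a candidate theta
signs = rng.choice([-1.0, 1.0], size=(m - 1, n))   # drawn once, reused for every candidate theta
print(in_region(w_hat, u, signs, k=2, l=2, h1=1, h2=19))
```

In this sketch a single membership test costs $m$ evaluations of the correlation sums and needs no alternative output sequence, which is the computational point made in the text; scanning a grid of candidate $\theta$ values with the same `signs` array would trace out an approximation of the region.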
Here we will present the method for ARX systems with the notations used in Example 1. Besides Assumptions 1 and 2, we also suppose that the system operates in open-loop, i.e., that the inputs $\{U_t\}$ and the noises $\{N_t\}$ are independent.

For a generic couple of input and noise vectors $U_n$ and $W_n$, we introduce the correlation vectors defined for every $t = 1, \dots, n$ as

$$C_t(U_n, W_n) \triangleq (W_t W_{t-1}, \dots, W_t W_{t-k}, W_t U_t, \dots, W_t U_{t-l+1})^{T},$$

instrumental variables. In this case, the previously introduced IV-SPS [14] is a special case of SPCR. Other properties of SPS and LSCR are expected to carry over to SPCR, see also Sections V and VI.

D. Simulation Example

Assume that the true system generating the output sequence $\{Y_t\}$ is a bilinear system [12] defined as
Fig. 1. 95% confidence regions built by SPCR with k = 2 and l = 2.

V. DESIRABLE PROPERTIES OF FINITE-SAMPLE METHODS

Now, we return to the general overview of finite-sample methods and list some of the most important properties that one wants to achieve by suitably designing the $Z$ function.

• Inclusion of a point-estimate: Confidence regions can help to assess the quality of point-estimates and, e.g., to determine how robust a design that is based on them should be. We know that SPS builds its confidence regions around the least-squares (LS) estimate, while SPCR can guarantee the inclusion of correlation-type estimates.
• Consistency: For any false parameter value, $\theta \neq \theta^*$, the probability of $\theta \in \widehat{\Theta}_n$ should decrease as the sample size, $n$, increases. Asymptotically, the coverage probability of any such false $\theta$ should be zero. Some consistency results are available for LSCR [1] and SPS [15], and can be easily obtained for some bootstrap-style PDMs. It is yet to be proven whether SPCR inherits this property.
• Favorable topology: The constructed confidence region, $\widehat{\Theta}_n$, should have good topological properties. We know, for example, that the SPS confidence regions are star-convex (and hence also connected) with the LS estimate as a star centre, assuming exogenous regressors.
• Weak computability: Deciding whether a candidate $\theta$ belongs to $\widehat{\Theta}_n$ should be computationally easy. LSCR, SPS and SPCR are all weakly computable in that sense, even for endogenous regressors; but this may not hold for bootstrap-style PDMs, for which evaluating the $Z$ function can quickly become too complex.
• Strong computability: Calculating a representation of $\widehat{\Theta}_n$ or an approximation of it should be computationally feasible. An ellipsoidal outer-approximation for SPS with exogenous regressors can be constructed efficiently by solving convex optimization problems [5]. Inner- and outer-approximations can also be built using interval analysis, see [8] for LSCR and SPS.

VI. CONCLUSION

Finite-sample system identification methods are practically important as they provide rigorously guaranteed results under mild statistical assumptions. This letter has been prepared to foster research in this important field by providing an easy access-point to the neophyte. First, fundamental ideas behind finite-sample identification methods have been analyzed. Three existing approaches were revisited: LSCR, SPS and PDMs. Then, a new non-asymptotic identification algorithm, SPCR, was suggested based on the idea of combining LSCR and SPS. SPCR has the flexibility and computational advantages of LSCR combined with the exact confidence of SPS. Finally, some essential properties of the aforementioned finite-sample identification methods were discussed.

We believe that SPCR is promising for the identification of complex systems, including nonlinear ones. Many results that were previously proved in the context of LSCR [1], [6] and SPS [3], [5] can be used for analyzing and extending this new correlation-type method. For example, in virtue of [1], we can argue that the consistency of the method can be improved by suitably prefiltering the input signal.

REFERENCES

[1] M. C. Campi and E. Weyer, “Guaranteed non-asymptotic confidence regions in system identification,” Automatica, vol. 41, no. 10, pp. 1751–1764, 2005.
[2] M. C. Campi and E. Weyer, “Non-asymptotic confidence sets for the parameters of linear transfer functions,” IEEE Trans. Autom. Control, vol. 55, no. 12, pp. 2708–2720, Dec. 2010.
[3] A. Carè, B. Cs. Csáji, and M. C. Campi, “Sign-perturbed sums (SPS) with asymmetric noise: Robustness analysis and robustification techniques,” in Proc. 55th IEEE Conf. Decis. Control (CDC), Las Vegas, NV, USA, 2016, pp. 262–267.
[4] B. Cs. Csáji, M. C. Campi, and E. Weyer, “Sign-perturbed sums (SPS): A method for constructing exact finite-sample confidence regions for general linear systems,” in Proc. CDC, 2012, pp. 7321–7326.
[5] B. Cs. Csáji, M. C. Campi, and E. Weyer, “Sign-perturbed sums: A new system identification approach for constructing exact non-asymptotic confidence regions in linear regression models,” IEEE Trans. Signal Process., vol. 63, no. 1, pp. 169–181, Jan. 2015.
[6] M. Dalai, E. Weyer, and M. C. Campi, “Parameter identification for nonlinear systems: Guaranteed confidence regions through LSCR,” Automatica, vol. 43, no. 8, pp. 1418–1425, 2007.
[7] S. Garatti, M. C. Campi, and S. Bittanti, “Assessing the quality of identified models through the asymptotic theory: When is the result reliable,” Automatica, vol. 40, no. 8, pp. 1319–1332, 2004.
[8] M. Kieffer and E. Walter, “Guaranteed characterization of exact non-asymptotic confidence regions as defined by LSCR and SPS,” Automatica, vol. 50, no. 2, pp. 507–512, 2014.
[9] S. Kolumbán, I. Vajk, and J. Schoukens, “Perturbed datasets methods for hypothesis testing and structure of corresponding confidence sets,” Automatica, vol. 51, pp. 326–331, Jan. 2015.
[10] L. Ljung, System Identification: Theory for the User, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1999.
[11] M. Milanese, J. Norton, H. Piet-Lahanier, and É. Walter, Bounding Approaches to System Identification. New York, NY, USA: Springer, 2013, doi: 10.1007/978-1-4757-9545-5.
[12] R. R. Mohler, Bilinear Control Processes: With Applications to Engineering, Ecology and Medicine. New York, NY, USA: Academic Press, 1973.
[13] T. Söderström and P. Stoica, System Identification. Hertfordshire, U.K.: Prentice-Hall, 1989.
[14] V. Volpe, B. Cs. Csáji, A. Carè, E. Weyer, and M. C. Campi, “Sign-perturbed sums (SPS) with instrumental variables for the identification of ARX systems,” in Proc. 54th IEEE Conf. Decis. Control (CDC), Osaka, Japan, 2015, pp. 2115–2120.
[15] E. Weyer, M. C. Campi, and B. Cs. Csáji, “Asymptotic properties of SPS confidence regions,” Automatica, vol. 82, pp. 287–294, Aug. 2017.