
Carè Et Al. - 2018 - Finite-Sample System Identification An Overview A

This document discusses finite-sample system identification methods. It begins by introducing the problem of estimating unknown system parameters from finite noisy observations. Standard asymptotic methods provide point estimates but do not guarantee accuracy for finite samples. Existing finite-sample methods build guaranteed confidence regions under minimal assumptions, but have not been widely adopted. The document then aims to provide an accessible overview of finite-sample identification principles and review existing methods. It proposes a new correlation-based method that combines advantages of previous approaches by building exact confidence regions efficiently based on correlations between observed and perturbed outputs.


IEEE CONTROL SYSTEMS LETTERS, VOL. 2, NO. 1, JANUARY 2018

Finite-Sample System Identification: An Overview and a New Correlation Method
Algo Carè, Balázs Cs. Csáji, Member, IEEE, Marco C. Campi, Fellow, IEEE,
and Erik Weyer, Member, IEEE

Abstract—Finite-sample system identification algorithms can be used to build guaranteed confidence regions for unknown model parameters under mild statistical assumptions. It has been shown that in many circumstances these rigorously built regions are comparable in size and shape to those that could be built by resorting to the asymptotic theory. The latter sets are, however, not guaranteed for finite samples and can sometimes lead to misleading results. The general principles behind finite-sample methods make them applicable, in principle, to a large variety of systems, including nonlinear ones. While these principles are simple enough, a rigorous treatment of the attendant technical issues makes the corresponding theory complex and not easy to access. This is believed to be one of the reasons why these methods have not yet received widespread acceptance by the identification community, and this letter is meant to provide an easy access point to finite-sample system identification by presenting the fundamental ideas underlying these methods in a simplified manner. We then review three (classes of) methods that have been proposed so far: 1) Leave-out Sign-dominant Correlation Regions (LSCR); 2) Sign-Perturbed Sums (SPS); 3) Perturbed Dataset Methods (PDMs). By identifying some difficulties inherent in these methods, we also propose in this letter a new sign-perturbation method based on correlation, which overcomes some of these difficulties.

Index Terms—Identification, estimation.

Manuscript received March 6, 2017; revised May 22, 2017; accepted June 12, 2017. Date of publication June 28, 2017; date of current version July 21, 2017. The work of A. Carè was supported in part by the European Research Consortium for Informatics and Mathematics (ERCIM), and in part by the Australian Research Council (ARC) under Grant DP130104028. The work of B. Cs. Csáji was supported in part by the Hung. Sci. Res. Fund (OTKA) under Grant 113038 and Grant GINOP-2.3.2-15-2016-00002, and in part by the János Bolyai Research Fellowship under Grant BO/00217/16/6. The work of M. C. Campi was supported by the University of Brescia (in Italian: Università degli Studi di Brescia) under the project H&W "Clafite." The work of E. Weyer was supported by the ARC under Grant DP130104028. Recommended by Senior Editor R. S. Smith. (Corresponding author: Algo Carè.)

A. Carè is with the Centrum Wiskunde & Informatica, 1098 XG Amsterdam, The Netherlands.
B. Cs. Csáji is with the Institute for Computer Science and Control (SZTAKI), Hungarian Academy of Sciences (MTA), 1111 Budapest, Hungary.
M. C. Campi is with the Department of Information Engineering, University of Brescia, 25123 Brescia, Italy.
E. Weyer is with the Department of Electrical and Electronic Engineering, Melbourne School of Engineering, University of Melbourne, Melbourne, VIC 3010, Australia.

Digital Object Identifier 10.1109/LCSYS.2017.2720969
2475-1456 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

A FUNDAMENTAL problem in system identification is that of estimating the parameters of partially unknown systems based on noisy observations [10], [13]. Standard methods in the system identification literature focus on point estimates, that is, they aim at estimating the value of the unknown parameters: classic results guarantee that, asymptotically (i.e., when the number of observations tends to infinity), the parameters can indeed be correctly estimated. However, in general, it is impossible to estimate a parameter with infinite precision from a finite number of stochastic data, so that a "confidence tag" has to be attached to the point estimate. For this purpose, a confidence region around the estimated parameters is often built. It is well known that assessing the quality of a non-asymptotic estimate using an asymptotic theory, although popular, may lead to unreliable results, see [7]. On the other hand, making strong assumptions on the probability distribution of the data (e.g., Gaussianity) leads to results that are formally rigorous but of limited practical interest. Motivated by these limitations of standard stochastic¹ identification schemes, non-asymptotic identification methods have been pursued for building confidence regions that i) are guaranteed when applied to finite samples of data and ii) are guaranteed under minimal assumptions on the data-generation mechanism. The most important examples are the LSCR (Leave-out Sign-dominant Correlation Regions) method [1], the SPS (Sign-Perturbed Sums) method [5], and its generalizations called PDMs (Perturbed Dataset Methods) [9]. These algorithms construct guaranteed confidence regions for the unknown model parameters for a large class of dynamical systems, such as general linear systems [1], [4], and even nonlinear ones [6], under very mild assumptions on the driving noise, or even no assumptions in some specific cases [2]. A difference between LSCR and the latter methods is that regions built by SPS and PDMs contain the true parameter with a probability that is exact, while LSCR provides, in general, only a lower bound.

¹ Set-membership approaches constitute a different line of research, which aims at identifying the region of parameters that are consistent with the observations, assuming the noise belongs to some bounded set [11].

A. Aim of This Letter
This letter has two main aims. First, it revisits some crucial ideas in finite-sample system identification and presents them in a unified framework. This is done with the intent
of making available to others an easy-to-access point which may foster research in this field. Second, driven by the results highlighted, a new correlation method is proposed, which is based on the combination of LSCR and SPS. It builds confidence regions based on correlations, like LSCR, while it applies sign-perturbations with a norm and obtains exact confidence, like SPS. A computational advantage of the new correlation method is that it avoids generating alternative output sequences, which are vital for SPS when handling, for example, ARX systems. This idea can be easily understood in the light of the unifying approach provided in this letter.

B. Structure of This Letter
In Section II, the fundamental idea behind finite-sample identification methods based on sign perturbation is revisited and presented in a simplified manner. Then, in Section III, we consider known methods, namely LSCR, SPS and PDMs, in the light of the framework of Section II. We show that some of the drawbacks of the existing methods can be overcome by a new, correlation-based approach, which is presented and also applied to a bilinear system in Section IV. Finally, in Section V, we present a brief summary of properties which should be taken into account when finite-sample methods are designed or evaluated. Conclusions are drawn in Section VI.

II. FUNDAMENTALS OF FINITE-SAMPLE IDENTIFICATION METHODS
We first introduce the goal of exact, finite-sample identification methods, and then describe the sign-perturbation approach for building confidence regions. We aim at isolating the main idea and highlighting the fundamental principles.

A. Problem Set-Up
Consider a sample of $n$ output measurements $Y_1, \dots, Y_n$. We represent this sequence as a vector $Y_n = (Y_1, Y_2, \dots, Y_n)$. The vector $Y_n$ depends on the vector $U_n = (U_1, U_2, \dots, U_n)$ of (past) measured inputs, on the vector $W_n = (W_1, W_2, \dots, W_n)$ of (past) nonmeasured inputs (noise), and possibly on some auxiliary set of initial conditions $I$ through a function $F$,
$$Y_n \triangleq F(U_n, W_n, I). \tag{1}$$
Consider now a family of functions $\{F(U_n, W_n, I; \theta)\}$ parameterized by $\theta$, and assume that the system function $F(U_n, W_n, I)$ is obtained for one value of $\theta$, say $\theta = \theta^*$.² We are interested in constructing methods for building a confidence region $\hat\Theta_n \subseteq \mathbb{R}^d$ that contains the correct $\theta^*$ with a user-chosen probability $p$, namely³
$$\mathbb{P}\{\theta^* \in \hat\Theta_n\} = p. \tag{2}$$
Clearly, there is no unique way to build confidence regions so that (2) is satisfied: our goal is to present well-principled and useful methods.

² This amounts to requiring that the structure of the system is known while its parameters are not.
³ In the language of hypothesis testing, $1 - p$ is the probability of type I error, i.e., that the true $\theta^*$ is not in the constructed region; the type II error, instead, cannot be kept under control in a similar way, since a $\theta$ that is close enough to $\theta^*$ is hard to exclude. Instead of enforcing limits on type II errors, in finite-sample system identification one asks that $\hat\Theta_n$ becomes smaller and converges toward $\theta^*$ as $n$ increases; see below for more details.

B. Assumptions
The system is assumed to be invertible w.r.t. the noise:

Assumption 1: For any value of $\theta$, the relation $Y_n \triangleq F(U_n, W_n, I; \theta)$ is noise invertible in the sense that, given the values of $Y_n$, $U_n$, and $I$, the vector $W_n$ can be recovered.

Example 1: Consider an ARX model
$$Y_t = a_1 Y_{t-1} + \dots + a_{n_a} Y_{t-n_a} + b_1 U_{t-1} + \dots + b_{n_b} U_{t-n_b} + W_t.$$
Assuming that the given initial conditions $I$ contain the terms $U_0, \dots, U_{1-n_b}$ and $Y_0, \dots, Y_{1-n_a}$, the noise vector $W_n$ can be reconstructed from $Y_n$ and $U_n$ by solving the ARX equation for the noise term.

Noise invertibility is a very mild condition. At times, however, one does not know the initial conditions $I$, so that only part of $W_n$ can be reconstructed. For instance, in the ARX example, not knowing $I$ prevents the reconstruction of the first terms of $W_n$. To streamline the presentation, this aspect is glossed over here and we assume that the whole $W_n$ can be reconstructed; the interested reader is referred to the papers cited in the introduction for more discussion.

In the sequel, the reconstructed noise is denoted by $\hat W_n(\theta)$, where $\theta$ indicates explicitly that the model with parameter $\theta$ has been used. Clearly, $\hat W_n(\theta^*) = W_n$.

Assumption 2: The noise $W_n$ is jointly symmetric about zero, i.e., $(W_1, \dots, W_n)$ has the same joint probability distribution as $(\sigma_1 W_1, \dots, \sigma_n W_n)$ for all possible sign sequences $\sigma_i \in \{+1, -1\}$, $i = 1, \dots, n$.

Note that in Assumption 2 neither stationarity nor independence is assumed. If the noise sequence is independent, then Assumption 2 is equivalent to saying that each noise term $W_t$ has a probability distribution that is symmetric about zero.

Remark 1 (Beyond the Symmetric Noise Assumption): There are methods in the literature that rely on no assumptions on the noise. These methods assume symmetry of the input instead, see [2]. The ideas outlined in this letter can be applied to these methods with minor modifications. For relaxations of the symmetry assumption, see also [3] and the references therein.

C. Exact Guarantees Through Sign-Perturbation
To simplify notation, given a vector $v_n = (v_1, \dots, v_n)$ and a vector of signs $s_n = (\sigma_1, \dots, \sigma_n) \in \{+1, -1\}^n$, we denote the corresponding sign-perturbed vector by $s_n[v_n] \triangleq (\sigma_1 v_1, \dots, \sigma_n v_n)$.

Consider any function $Z$ that takes as input two vectors of length $n$ and the parameter $\theta$. Examples of such functions are given later in this letter. Sign-perturbation methods are based on comparing a reference function defined as
$$Z_0(\theta) \triangleq Z(U_n, \hat W_n(\theta), \theta),$$
with $m - 1$ "sign-perturbed" functions defined as
$$Z_i(\theta) \triangleq Z(U_n, s_n^{(i)}[\hat W_n(\theta)], \theta),$$
for $i = 1, \dots, m - 1$, where $s_n^{(1)}, \dots, s_n^{(m-1)}$ are $m - 1$ user-generated vectors of independent random signs, whose elements are $+1$ or $-1$ with probability 1/2 each.

Precisely, the construction of the confidence region $\hat\Theta_n$ for $\theta^*$ is based on ranking $Z_0(\theta)$ with respect to $Z_i(\theta)$, $i = 1, \dots, m - 1$. To this end, one first selects two integers $h_1$ and $h_2$ with $h_1 \le h_2$ in the range $1, 2, \dots, m$. Then, for any value of $\theta$, the numbers $Z_i(\theta)$, $i = 0, 1, \dots, m - 1$, are sorted in increasing order. If $Z_0(\theta)$ happens to be in position $h_1$ or $h_1 + 1$ or ... or $h_2$, then that $\theta$ belongs to $\hat\Theta_n$; otherwise, it does not. For example, say that $m = 10$, so that there are 10 functions $Z_i(\theta)$, $i = 0, 1, \dots, 9$. Take $h_1 = 1$ and $h_2 = 3$. For a given $\theta$, if $Z_0(\theta)$ is the smallest of all functions $Z_i(\theta)$, $i = 0, 1, \dots, 9$, or the second smallest, or the third smallest, then this $\theta$ is included in $\hat\Theta_n$; otherwise, it is not.⁴ Up to some additional minor details, as hinted at below, the following result holds.

Claim 1: Call $R(\theta)$ the rank of $Z_0(\theta)$ among $\{Z_i(\theta),\ i = 0, \dots, m - 1\}$, i.e., if $Z_0(\theta)$ is the smallest, then $R(\theta) = 1$; if $Z_0(\theta)$ is the second smallest, then $R(\theta) = 2$; and so on. The confidence region defined as
$$\hat\Theta_n \triangleq \{\theta \in \mathbb{R}^d : h_1 \le R(\theta) \le h_2\}$$
is such that $\mathbb{P}\{\theta^* \in \hat\Theta_n\} = (h_2 - h_1 + 1)/m$.

This result is in the form of (2), with $p = (h_2 - h_1 + 1)/m$. Note that $h_2 - h_1 + 1$ is the number of positions in the ordering that $Z_0(\theta)$ is allowed to take, out of a total of $m$ positions. The proof of this result requires some mathematical underpinning to deal with a number of details, including the possibility of ties and possible correlation issues between the measurable system input and the nonmeasurable noise. The exact manner to approach these issues is given in the papers cited in the introduction; here we only remark that the fundamental idea behind this result is almost straightforward and can be explained as follows. Under the assumption that $\theta = \theta^*$, the functions $\{Z_i(\theta^*)\}$ become
$$Z_0(\theta^*) \triangleq Z(U_n, W_n, \theta^*), \qquad Z_i(\theta^*) \triangleq Z(U_n, s_n^{(i)}[W_n], \theta^*).$$
The only difference between these $m$ random variables is that the argument $W_n$ in the first is replaced by $s_n^{(i)}[W_n]$ in the others. However, $W_n$ and $s_n^{(i)}[W_n]$ are random variables having the same distribution because of Assumption 2. Hence, there is no reason why, among the variables $Z_0(\theta^*)$ and $Z_i(\theta^*)$, $i = 1, \dots, m - 1$, one should have a larger chance than the others to be in the first, or in the second, or in any other particular position; in fact, each has the same probability $1/m$ of being in any given position. Since in Claim 1 $\hat\Theta_n$ is determined by including a given $\theta$ if $Z_0(\theta)$ ranks in one among $h_2 - h_1 + 1$ positions, $\theta^*$ is included with probability $(h_2 - h_1 + 1)/m$. This argument is not rigorous because of tie-breaks and many other minor issues, but the fundamental idea that has been explained here goes through.

⁴ A subtle issue may arise in case two $Z_i(\theta)$ functions take the same value. In this case, a suitable tie-break rule can be applied. This aspect is discussed in the literature cited in the introduction, while we neglect it here because it would lead us too far into unnecessary details.

Clearly, Claim 1 is not the end of the story, as one would also like to construct a region $\hat\Theta_n$ that is well shaped and converges toward $\theta^*$ as $n$ increases. Moreover, of no minor importance is the computational complexity associated with constructing $\hat\Theta_n$. In the next section, we present existing methods, namely LSCR (Leave-out Sign-dominant Correlation Regions), SPS (Sign-Perturbed Sums), and PDMs (Perturbed Dataset Methods), cast them within the setup of this section, and also discuss the region shape and the computational complexity associated with these methods. This sheds light on the pros and cons of these various techniques in a comparative way, which is the first goal of this letter. Then, in the following section, we introduce a new correlation method which combines some advantages of the above-mentioned approaches.

III. REVISITING EXISTING FINITE-SAMPLE METHODS
In this section, we revisit three existing finite-sample approaches using the framework introduced in Section II.

A. The LSCR Method
In its randomized formulation [2], LSCR fits into the framework of Section II, where the function $Z_0(\theta)$ is simply defined as a sum of error correlation terms, such as $\hat W_t(\theta)\hat W_{t-k}(\theta)$, or of input-error correlation terms, such as $\hat W_t(\theta) U_{t-k}$, while the perturbed functions $Z_i(\theta)$ are obtained by replacing, in the definition of $Z_0(\theta)$, the components of $\hat W_n(\theta)$ with the components of $s_n^{(i)}[\hat W_n(\theta)]$. Consider, for example, $Z_0(\theta) = -\sum_{t=2}^{n} \hat W_t(\theta)\hat W_{t-1}(\theta)$. Then, for each $\theta$, the ranking of $Z_0$ among $\{Z_0, \dots, Z_{m-1}\}$ is equivalent to the ranking of $0$ (the constant zero function) among $\{0, Z_1 - Z_0, \dots, Z_{m-1} - Z_0\}$. Note that $Z_i - Z_0$ is a sum of the kind $\sum_{t=2}^{n} \alpha_t \hat W_t(\theta)\hat W_{t-1}(\theta)$, where $\alpha_t = 1 - \sigma_t^{(i)}\sigma_{t-1}^{(i)}$ is equal to 0 or 2 with equal probability: this is the random subsampling idea of [2].

Consistency results for LSCR are based on proving that, in the long run, sums like $\sum_{t=2}^{n} \alpha_t \hat W_t(\theta)\hat W_{t-1}(\theta)$, for every $\theta \ne \theta^*$, tend to become large in absolute value, and therefore every $\theta \ne \theta^*$ will eventually be excluded from the region. However, to get consistency results, focusing on one sum only is not enough. For example, for ARMA($n_a$, $n_w$) systems, the LSCR region is obtained by intersecting various regions $\hat\Theta_n^{(k)}$, each of which is constructed by considering a sum of the kind $\sum_{t=k+1}^{n} \hat W_t(\theta)\hat W_{t-k}(\theta)$ for a different value of $k$.

In some cases, using different kinds of correlations, such as input-error correlations or even higher-order correlations, is advisable [1], [6]. Note that if every region $\hat\Theta_n^{(k)}$ is guaranteed to include the true parameter $\theta^*$ with exact probability $p$, then, by the union bound, the intersection $\hat\Theta_n = \bigcap_{k=1}^{\bar k} \hat\Theta_n^{(k)}$ includes $\theta^*$ with probability at least $1 - \bar k(1 - p)$, which is a source of conservatism.

B. The SPS Method
Consider a system in linear regression form, $Y_t = \varphi_t^T \theta^* + W_t$, where $\varphi_t$ is a function of $U_1, \dots, U_t$ and $W_t$ is the symmetric noise. Given $n$ samples $Y_1, \dots, Y_n$ and the corresponding regressors $\varphi_1, \dots, \varphi_n$, the least-squares estimate $\hat\theta_{LS}$ is obtained by minimizing $L(\theta) = \sum_{t=1}^{n} \hat W_t^2(\theta)$, where $\hat W_t(\theta) = Y_t - \hat Y_t(\theta)$ and $\hat Y_t(\theta) \triangleq \varphi_t^T \theta$. $\hat\theta_{LS}$ is
the solution (unique, under some technical conditions) of $\nabla_\theta L(\theta) = -2\sum_{t=1}^{n} \varphi_t \hat W_t(\theta) = 0$.

1) SPS With Exogenous Regressors: In the prototypical SPS algorithm, under the assumption that the regressors $\{\varphi_t\}$ do not depend on outputs (i.e., the regressors are exogenous), a normed version of $\nabla_\theta L(\cdot)$ is chosen as the reference element, and thus $Z_0(\theta) = \|\sum_{t=1}^{n} \varphi_t \hat W_t(\theta)\|_R^2$, where $\|\cdot\|_R^2$ is a suitably rescaled Euclidean norm, and $Z_i(\theta)$ is obtained by replacing $\hat W_n(\theta)$ with $s_n^{(i)}[\hat W_n(\theta)]$. Note that, by construction, $Z_0(\hat\theta_{LS}) = 0 \le Z_i(\hat\theta_{LS})$, so that when $h_1 = 1$ the SPS region includes $\hat\theta_{LS}$. Moreover, the errors in all the components of $\theta$ are taken into account simultaneously by the norm. This idea will henceforth be referred to as the "norm trick".

2) SPS for ARX Systems: Some difficulties arise when $\varphi_t$ depends on past outputs, as is the case in autoregressive systems. Here, simply using $\varphi_t$ in both the reference $Z_0$ and the perturbed $Z_i$ functions is not a valid option, because it would invalidate the key symmetry argument behind Claim 1. In fact, through past outputs, $\varphi_t$ depends on noise terms, and these noise terms have to undergo the sign perturbation in the $Z_i$ functions. A solution to this problem is to "reconstruct" alternative output sequences based on the available information. Given any triplet of the kind $(\bar U_n, \bar W_n, \theta)$, the knowledge of $F$ can be used to define an alternative output $\tilde Y_n$ as $\tilde Y_n \triangleq F(\bar U_n, \bar W_n, I; \theta)$, see (1). Using $\tilde Y_n$, alternative regressors $\{\tilde\varphi_t\}$ can also be constructed that include elements of $\tilde Y_n$ instead of the actual output $Y_n$. Finally, the $Z$ function for a generic triplet $(\bar U_n, \bar W_n, \theta)$ is defined as
$$Z(\bar U_n, \bar W_n, \theta) \triangleq \Big\| \sum_{t=1}^{n} \tilde\varphi_t \bar W_t \Big\|_R^2.$$
Then, as usual, $Z_0(\theta) = Z(U_n, \hat W_n(\theta), \theta)$. In $Z_0$, the values of $\tilde\varphi_t$ and $\tilde Y_n$ are computed using $\theta$ and $(U_n, \hat W_n(\theta))$. Therefore, by (1) and the invertibility assumption, the values of $\tilde Y_n$ coincide with the observed output values $Y_n$ for every $\theta$, and $\tilde\varphi_t = \varphi_t$. On the other hand, the $Z_i$'s are obtained by replacing $\hat W_n(\theta)$ with $s_n^{(i)}[\hat W_n(\theta)]$, so that $\tilde\varphi_t$ and $\tilde Y_n$ are now reconstructed by using $s_n^{(i)}[\hat W_n(\theta)]$ instead of the actual error $\hat W_n(\theta)$. Thus, denoting by $\tilde Y_n^{(i)}(\theta)$ the $i$-th reconstructed alternative output sequence, that is,
$$\tilde Y_n^{(i)}(\theta) = F(U_n, s_n^{(i)}[\hat W_n(\theta)], I; \theta), \tag{3}$$
we have $\tilde Y_n^{(i)}(\theta) \ne Y_n$ in general. It can be proven that with this approach Claim 1 remains rigorously valid [4].

C. Perturbed Dataset Methods
PDMs form an interesting class of methods that leave many degrees of freedom to the user and also fit situations where the joint symmetry assumption is replaced by other conditions, such as arbitrary i.i.d. sequences. In these methods, the alternative output (3) plays the crucial role: a "perturbed dataset", in the terminology of [9], is any pair $(U_n, \tilde Y_n^{(i)}(\theta))$. We focus here on a stimulating idea briefly mentioned in [9].

1) Bootstrap-Style PDMs: Let the functions $Z_0$ and $\{Z_i\}$ be
$$Z_0(\theta) \triangleq \|\theta - \hat\theta_n(U_n, Y_n)\|_R^2, \qquad Z_i(\theta) \triangleq \|\theta - \hat\theta_n(U_n, \tilde Y_n^{(i)}(\theta))\|_R^2,$$
where $\hat\theta_n(\cdot)$ is a point estimator. Claim 1 applies to this context. Moreover, in $Z_0$, the function $\hat\theta_n(\cdot)$ computes an estimate of $\theta^*$ based on the original input-output dataset $(U_n, Y_n)$; hence, $Z_0(\theta^*) = \|\theta^* - \hat\theta_n(U_n, Y_n)\|_R^2$ tends to be small for large $n$. On the other hand, for each other $Z_i$ function, $\hat\theta_n(\cdot)$ computes an estimate based on the perturbed dataset $(U_n, \tilde Y_n^{(i)}(\theta))$; hence, $\hat\theta_n(U_n, \tilde Y_n^{(i)}(\theta))$ is an estimate of $\theta$, and $Z_i(\theta)$ tends to be small as well, whereas, for $\theta \ne \theta^*$, $Z_0(\theta) = \|\theta - \hat\theta_n(U_n, Y_n)\|_R^2$ does not converge to zero as $n \to \infty$. Hence, by selecting $h_1 = 1$, one singles out the true $\theta^*$ in the long run.

It can be proved that, for FIR and ARX systems, choosing $\hat\theta_n(\cdot)$ as the least-squares estimator makes the suggested method build the same region as SPS. This is not true in the case of general linear systems with the prediction error estimator. In that case, one difficulty of the bootstrap PDM is that it is computationally intensive. In fact, computing $Z_i(\theta)$, for $i = 1, \dots, m - 1$, for any fixed $\theta$, requires calculating $\hat\theta_n(U_n, \tilde Y_n^{(i)}(\theta))$. Consequently, for every $\theta$, one has to solve $m - 1$ non-convex optimization problems.⁵

⁵ An interesting direction of research about PDMs is whether the estimator $\hat\theta_n(\cdot)$ can be successfully replaced by an approximate estimator that is easy to compute.

IV. A NEW CORRELATION APPROACH
In this section, we introduce a new finite-sample identification method that combines some of the previous ideas into a new algorithm with improved properties.

A. Motivations
As we saw, LSCR is based on a correlation idea (combined with subsampling), which leads to a flexible and easy-to-implement algorithm. It is also computationally light since, unlike SPS and PDMs, LSCR does not require the generation of alternative, perturbed input-output datasets. However, the confidence bound resulting from intersecting individually exact regions makes LSCR conservative for high-dimensional parameters. SPS and PDMs evaluate the errors in all parameters simultaneously (norm trick) and construct confidence regions having exact confidence. Unfortunately, the generation of alternative input-output datasets is required to ensure exact confidence in the case of more general systems. As a consequence, these methods can become difficult to analyze and computationally expensive or even impractical, especially when they involve hard optimization steps, as is the case for bootstrap-style PDMs.

Here we aim at defining a new class of methods that exploits the correlation idea of LSCR, which makes the method computationally light, together with the norm trick of SPS, which makes the confidence of the constructed regions exact. One goal of this section is to stimulate further research in this direction.

B. Sign-Perturbed Correlation Regions
The main idea of the new finite-sample method, called Sign-Perturbed Correlation Regions (SPCR), is as follows. Instead of defining a different $Z$ function for each correlation and then intersecting the resulting regions as in LSCR, we stack the correlation sums into a vector and compute a single scalar "summary" of them by introducing a suitable norm.
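The difference between intersecting one exact test per correlation (as in LSCR) and stacking the correlations under a single norm (as in SPCR) can be checked numerically. The Python sketch below is only an illustration under simplifying assumptions of our own: it works directly at $\theta = \theta^*$, so the reconstructed noise equals the true noise, here taken as i.i.d. Laplacian; it uses the lagged products $W_t W_{t-k}$ for three lags as the correlations; and it replaces the scaling matrix of SPCR with a plain Euclidean norm, which is legitimate because Claim 1 holds for any choice of $Z$. The per-lag tests are a simplified stand-in for LSCR, not its exact formulation, and the helper names `lag_products`, `rank_of_reference` and `coverage` are hypothetical.

```python
import numpy as np


def lag_products(w, lags):
    """Correlation sums sum_{t>k} w_t * w_{t-k}: one column per lag k >= 1, one row per sequence."""
    w = np.atleast_2d(w)
    return np.stack([np.sum(w[:, k:] * w[:, :-k], axis=1) for k in lags], axis=1)


def rank_of_reference(z0, z_perturbed, rng):
    """R: position of z0 among (z0, z_1, ..., z_{m-1}) sorted increasingly; ties broken at random."""
    z = np.concatenate(([z0], z_perturbed))
    order = np.lexsort((rng.random(z.size), z))  # primary key z, secondary key random priority
    return int(np.nonzero(order == 0)[0][0]) + 1


def coverage(n=50, m=20, h2=19, lags=(1, 2, 3), trials=2000, seed=0):
    """Monte Carlo frequency with which theta* is accepted (h1 = 1) by both constructions."""
    rng = np.random.default_rng(seed)
    hits_stacked = 0
    hits_intersection = 0
    for _ in range(trials):
        w = rng.laplace(size=n)                           # W_n(theta*): i.i.d. symmetric noise
        signs = rng.choice([-1.0, 1.0], size=(m - 1, n))  # s_n^(1), ..., s_n^(m-1)
        c0 = lag_products(w, lags)[0]                     # reference correlations, shape (K,)
        ci = lag_products(signs * w, lags)                # perturbed correlations, shape (m-1, K)

        # Stacked vector + one squared Euclidean norm: a single test, exact level h2/m.
        r = rank_of_reference(c0 @ c0, np.sum(ci ** 2, axis=1), rng)
        hits_stacked += r <= h2

        # One exact test per lag; theta* is kept only if every test accepts it.
        accept_all = all(
            rank_of_reference(c0[j] ** 2, ci[:, j] ** 2, rng) <= h2
            for j in range(len(lags))
        )
        hits_intersection += accept_all
    return hits_stacked / trials, hits_intersection / trials


if __name__ == "__main__":
    stacked, intersection = coverage()
    print(f"stacked norm: {stacked:.3f}  (exact level h2/m = 0.95)")
    print(f"intersection: {intersection:.3f}  (union bound guarantees only 1 - 3 * 0.05 = 0.85)")
```

With $m = 20$ and $h_2 = 19$, each individual test, as well as the stacked one, has exact level $h_2/m = 0.95$, so the stacked estimate should come out close to 0.95. The intersection, by contrast, is only guaranteed through the union bound, and its actual coverage is not known in advance. Ties are broken by a random priority, one simple way to handle the tie issue mentioned in Section II-C.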


Here we will present the method for ARX systems with the instrumental variables. In this case, the previously introduced
notations used in Example 1. Besides Assumptions 1 and 2, IV-SPS [14] is a special case of SPCR. Other properties of
we also suppose that the system operates in open-loop, i.e., SPS and LSCR are expected to carry over to SPCR, see also
that the inputs {Ut } and the noises {Nt } are independent. Sections V and VI.
For a generic couple of input and noise vectors U n and

Wn , we introduce the correlation vectors defined for every
D. Simulation Example
t = 1, . . . , n as
Assume that the true system generating the output sequence
Ct (U n , W n )  (Wt Wt−1

, . . . , Wt Wt−k

, Wt Ut , . . . , Wt Ut−l+1

)T , {Yt } is a bilinear system [12] defined as

where k and l are user-chosen parameters, typically k+l ≥ na + 1


Yt  a∗ Yt−1 + b∗ Ut + Ut Nt + Nt ,
nb . We assume, for simplicity, that the given initial conditions 2
allow us to compute the correlation vector, Ct (U n , W n ), for
for t = 1, . . . , n, with a∗ = 0.7 and b∗ = 1, with zero initial
all t = 1, . . . , n.
conditions. Notice that this system has the structure
As we saw in Section II, the fundamental component of
such methods is the Z function, which for SPCR is
Yt  a∗ Yt−1 + b∗ Ut + Wt ,
 1 1 2
n
Z(U n , W n , θ )  Q− 2 (U n , W n ) Ct (U n , W n ) , with Wt = 12 Ut Nt + Nt . Sequence {Ut } is the measured input
n
t=1 generated by Ut  0.5 Ut−1 + Vt , with zero initial conditions,
where {Vt } is i.i.d. Gaussian with zero mean and unit variance.
where Q is a “scaling” matrix defined as The noise sequence {Nt } is i.i.d. Laplacian with zero mean and
unit variance, independent of {Ut }.
1
n
Q(U n , W n )  Ct (U n , W n )CTt (U n , W n ), Define
n
t=1

Yt (θ )  a Yt−1 + b Ut .
which is assumed to be invertible, for convenience. As in the
case of SPS, the “shaping” matrix Q has the role of balancing Assuming we have a sample of Y1 , . . . , Yn and U1 , . . . , Un ,
the action of the norm with respect to the variability of the and using the zero initial conditions, we have that the residuals
different components. Note that the so defined Z is a func- t (θ )  Yt − 
W Yt (θ ) are well-defined for all t ≤ n.
tion of U n , W n only, that is, the third argument (the system We apply SPCR with k = l = 2 and we assume that
parameter $\theta$) is not used for computing the value of $Z$, and we can omit it. Finally, we define $Z_0(\theta) = Z(U_n, \widehat{W}_n(\theta))$ and $Z_i(\theta) = Z(U_n, s_n^{(i)}[\widehat{W}_n(\theta)])$, which depend on $\theta$ only through the reconstructed noise $\widehat{W}_n(\theta)$.

The confidence region construction is the same as before, with $h_1 = 1$:

$$
\widehat{\Theta}_n \triangleq \{\, \theta \in \mathbb{R}^{n_a+n_b} : R(\theta) \le h_2 \,\}.
$$

Note that SPCR is a class of methods in which different constructions correspond to different choices of $(k, l)$. For more general (especially nonlinear) systems, it may be useful to also include higher-order correlations in $\{C_t\}$ [6].

C. Properties of SPCR Confidence Regions

It is easy to see that the SPCR methods fit into the framework of Section II and that Claim 1 holds. Therefore, the confidence regions constructed by SPCR are non-conservative, namely, their confidence probabilities are exactly $h_2/m$.

Another nice property of SPCR is the inclusion of certain point-estimates. Assume, for simplicity, that $l + k = n_a + n_b$. Then the correlation-type [10] point-estimate $\hat{\theta}$ satisfying

$$
\frac{1}{n} \sum_{t=1}^{n} C_t\big(U_n, \widehat{W}_n(\hat{\theta})\big) = 0
$$

is included in $\widehat{\Theta}_n$, since $Z_0(\hat{\theta}) = 0 \le Z_i(\hat{\theta})$ for all $i$. For example, if $k = 0$ and $l = n_a + n_b$, we can guarantee the inclusion of an instrumental variable estimate, if the inputs are chosen as

$n > 2$, for convenience, and leave out from the sum those vectors which surely contain some zero correlations. Thus, the reference ($i = 0$) and sign-perturbed ($i = 1, \dots, m-1$) functions are

$$
Z_i(\theta) \triangleq \left\| Q_i^{-\frac{1}{2}}(\theta)\, \frac{1}{n-2} \sum_{t=3}^{n} \begin{bmatrix} \sigma_{i,t-1}\widehat{W}_{t-1}(\theta) \\ \sigma_{i,t-2}\widehat{W}_{t-2}(\theta) \\ U_t \\ U_{t-1} \end{bmatrix} \sigma_{i,t}\widehat{W}_t(\theta) \right\|^2,
$$

where $\sigma_{0,t} = 1$ for all $t$, while, for $i \neq 0$, $\{\sigma_{i,t}\}$ are i.i.d. random signs, as before. Matrix $Q_i(\theta)$ is

$$
Q_i(\theta) \triangleq \frac{1}{n-2} \sum_{t=3}^{n} \begin{bmatrix} \sigma_{i,t-1}\widehat{W}_{t-1}(\theta) \\ \sigma_{i,t-2}\widehat{W}_{t-2}(\theta) \\ U_t \\ U_{t-1} \end{bmatrix} \begin{bmatrix} \sigma_{i,t-1}\widehat{W}_{t-1}(\theta) \\ \sigma_{i,t-2}\widehat{W}_{t-2}(\theta) \\ U_t \\ U_{t-1} \end{bmatrix}^{\mathrm{T}} \widehat{W}_t^{2}(\theta),
$$

and is almost surely invertible, for $i = 0, \dots, m-1$.

It is easy to check that the variables $\widehat{W}_t(\theta^*) = \frac{1}{2} U_t N_t + N_t$, $t = 1, \dots, n$, are jointly symmetric (use that $\{N_t\}$ are i.i.d. and symmetric, and $\{U_t\}$ is independent of $\{N_t\}$). Hence, the assumptions of Section II are satisfied, and SPCR delivers rigorously guaranteed confidence regions with an exact probability of containing the true parameter values $(a^*, b^*)$.

Figure 1 presents confidence regions built by SPCR for an increasing number of observations, $n = 50, 200, 400$. The regions were built with $p = 0.95$, $m = 100$, and $h_2 = 95$. The figure indicates that the SPCR regions are well-shaped and shrink around the true parameter as $n$ grows.
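The exactness claim can be made concrete for the settings used for Fig. 1. The derivation below assumes, as our reading of the Section II framework (not restated in this section), that $R(\theta)$ is the rank of $Z_0(\theta)$ among $Z_0(\theta), \dots, Z_{m-1}(\theta)$ in ascending order with ties broken at random, so that the joint symmetry of $\widehat{W}_t(\theta^*)$ makes $R(\theta^*)$ uniformly distributed on $\{1, \dots, m\}$. With $h_1 = 1$:

$$
\mathbb{P}\big(\theta^* \in \widehat{\Theta}_n\big) = \mathbb{P}\big(R(\theta^*) \le h_2\big) = \sum_{r=1}^{h_2} \frac{1}{m} = \frac{h_2}{m} = \frac{95}{100} = 0.95.
$$

Under the same reading, the point-estimate inclusion follows from the ranking as well: $Z_0(\hat{\theta}) = 0 \le Z_i(\hat{\theta})$ for all $i$ places $Z_0(\hat{\theta})$ at the bottom of the ordering, so $R(\hat{\theta}) = 1 \le h_2$ (ties aside).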

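A minimal NumPy sketch of the $k = 2$, $l = 2$ statistic and the rank test follows, offered as an illustration under stated assumptions rather than as the authors' implementation. Assumptions: the reconstructed noise $\widehat{W}_n(\theta)$ is passed in directly (computing it needs the model equations, which are not repeated here); $R(\theta)$ is the ascending rank of $Z_0(\theta)$ with random tie-breaking, as assumed above; the function names are hypothetical; and the Monte Carlo check draws i.i.d. Gaussian inputs and noise, a choice the text does not make (it only requires $\{N_t\}$ i.i.d. symmetric and independent of $\{U_t\}$). The check evaluates the rank test at $\theta^*$ using $\widehat{W}_t(\theta^*) = \frac{1}{2} U_t N_t + N_t$.

```python
import numpy as np


def spcr_z(w_hat, u, sigma):
    """Hypothetical helper: Z_i(theta) for the k = 2, l = 2 example.

    w_hat : reconstructed noise W_hat_1..W_hat_n (assumed given, n > 2)
    u     : inputs U_1..U_n
    sigma : signs sigma_{i,1}..sigma_{i,n}; all ones gives the reference Z_0
    Returns S^T Q^{-1} S, which equals ||Q^{-1/2} S||^2 for a symmetric
    positive definite Q, so no matrix square root is needed.
    """
    w_hat, u, sigma = (np.asarray(a, dtype=float) for a in (w_hat, u, sigma))
    sw = sigma * w_hat                      # sigma_{i,t} * W_hat_t
    # Row for t = 3..n is [sw_{t-1}, sw_{t-2}, U_t, U_{t-1}] (0-based shifts).
    phi = np.column_stack([sw[1:-1], sw[:-2], u[2:], u[1:-1]])
    n_terms = len(w_hat) - 2                # the sums run over n - 2 terms
    s = phi.T @ sw[2:] / n_terms            # averaged correlation vector
    q = (phi * (w_hat[2:] ** 2)[:, None]).T @ phi / n_terms  # Q_i(theta)
    return float(s @ np.linalg.solve(q, s))


def rank_of_reference(z, rng):
    """Ascending 1-based rank of z[0] among z, ties broken by random priority."""
    z = np.asarray(z, dtype=float)
    prio = rng.permutation(len(z))          # random tie-breaking order
    below = (z[1:] < z[0]) | ((z[1:] == z[0]) & (prio[1:] < prio[0]))
    return 1 + int(np.sum(below))


def in_region(w_hat, u, m, h2, rng):
    """Accept theta iff R(theta) <= h2 (h1 = 1); w_hat is W_hat_n(theta)."""
    n = len(w_hat)
    signs = rng.choice([-1.0, 1.0], size=(m - 1, n))  # i.i.d. signs, i = 1..m-1
    z = [spcr_z(w_hat, u, np.ones(n))] + [spcr_z(w_hat, u, s) for s in signs]
    return rank_of_reference(z, rng) <= h2


def coverage_at_true_theta(n=50, m=100, h2=95, trials=400, seed=0):
    """Monte Carlo estimate of P(theta* in region) under assumed Gaussian data."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        u = rng.standard_normal(n)          # assumed i.i.d. Gaussian inputs
        noise = rng.standard_normal(n)      # assumed i.i.d. Gaussian noise N_t
        w_star = 0.5 * u * noise + noise    # W_hat_t(theta*) = U_t N_t / 2 + N_t
        hits += in_region(w_star, u, m, h2, rng)
    return hits / trials
```

Calling `coverage_at_true_theta()` should return a value near $h_2/m = 0.95$, up to Monte Carlo error. Membership of a single candidate costs $m$ evaluations of a $4 \times 4$ linear solve, which is one way to see why SPCR is weakly computable.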
Authorized licensed use limited to: Univ of Calif Berkeley. Downloaded on August 14,2020 at 16:22:14 UTC from IEEE Xplore. Restrictions apply.
66 IEEE CONTROL SYSTEMS LETTERS, VOL. 2, NO. 1, JANUARY 2018

Fig. 1. 95% confidence regions built by SPCR with k = 2 and l = 2.

V. DESIRABLE PROPERTIES OF FINITE-SAMPLE METHODS

Now, we return to the general overview of finite-sample methods and list some of the most important properties that one wants to achieve by suitably designing the $Z$ function.

• Inclusion of a point-estimate: Confidence regions can help to assess the quality of point-estimates and, e.g., to determine how robust a design that is based on them should be. We know that SPS builds its confidence regions around the least-squares (LS) estimate, while SPCR can guarantee the inclusion of correlation-type estimates.
• Consistency: For any false parameter value, $\theta \neq \theta^*$, the probability of $\theta \in \widehat{\Theta}_n$ should decrease as the sample size, $n$, increases. Asymptotically, the coverage probability of any such false $\theta$ should be zero. Some consistency results are available for LSCR [1] and SPS [15], and can be easily obtained for some bootstrap-style PDMs. It is yet to be proven whether SPCR inherits this property.
• Favorable topology: The constructed confidence region, $\widehat{\Theta}_n$, should have good topological properties. We know, for example, that the SPS confidence regions are star-convex (and hence also connected) with the LS estimate as a star centre, assuming exogenous regressors.
• Weak computability: Deciding whether a candidate $\theta$ belongs to $\widehat{\Theta}_n$ should be computationally easy. LSCR, SPS and SPCR are all weakly computable in that sense, even for endogenous regressors; but this may not hold for bootstrap-style PDMs, for which evaluating the $Z$ function can quickly become too complex.
• Strong computability: Calculating a representation of $\widehat{\Theta}_n$, or an approximation of it, should be computationally feasible. An ellipsoidal outer-approximation for SPS with exogenous regressors can be constructed efficiently by solving convex optimization problems [5]. Inner- and outer-approximations can also be built using interval analysis; see [8] for LSCR and SPS.

VI. CONCLUSION

Finite-sample system identification methods are practically important, as they provide rigorously guaranteed results under mild statistical assumptions. This letter has been prepared to foster research in this important field by providing an easy access point for neophytes. First, the fundamental ideas behind finite-sample identification methods have been analyzed. Three existing approaches were revisited: LSCR, SPS and PDMs. Then, a new non-asymptotic identification algorithm, SPCR, was suggested based on the idea of combining LSCR and SPS. SPCR has the flexibility and computational advantages of LSCR combined with the exact confidence of SPS. Finally, some essential properties of the aforementioned finite-sample identification methods were discussed.

We believe that SPCR is promising for the identification of complex systems, including nonlinear ones. Many results that were previously proved in the context of LSCR [1], [6] and SPS [3], [5] can be used for analyzing and extending this new correlation-type method. For example, by virtue of [1], we can argue that the consistency of the method can be improved by suitably prefiltering the input signal.

REFERENCES

[1] M. C. Campi and E. Weyer, “Guaranteed non-asymptotic confidence regions in system identification,” Automatica, vol. 41, no. 10, pp. 1751–1764, 2005.
[2] M. C. Campi and E. Weyer, “Non-asymptotic confidence sets for the parameters of linear transfer functions,” IEEE Trans. Autom. Control, vol. 55, no. 12, pp. 2708–2720, Dec. 2010.
[3] A. Carè, B. Cs. Csáji, and M. C. Campi, “Sign-perturbed sums (SPS) with asymmetric noise: Robustness analysis and robustification techniques,” in Proc. 55th IEEE Conf. Decis. Control (CDC), Las Vegas, NV, USA, 2016, pp. 262–267.
[4] B. Cs. Csáji, M. C. Campi, and E. Weyer, “Sign-perturbed sums (SPS): A method for constructing exact finite-sample confidence regions for general linear systems,” in Proc. CDC, 2012, pp. 7321–7326.
[5] B. Cs. Csáji, M. C. Campi, and E. Weyer, “Sign-perturbed sums: A new system identification approach for constructing exact non-asymptotic confidence regions in linear regression models,” IEEE Trans. Signal Process., vol. 63, no. 1, pp. 169–181, Jan. 2015.
[6] M. Dalai, E. Weyer, and M. C. Campi, “Parameter identification for nonlinear systems: Guaranteed confidence regions through LSCR,” Automatica, vol. 43, no. 8, pp. 1418–1425, 2007.
[7] S. Garatti, M. C. Campi, and S. Bittanti, “Assessing the quality of identified models through the asymptotic theory—When is the result reliable,” Automatica, vol. 40, no. 8, pp. 1319–1332, 2004.
[8] M. Kieffer and E. Walter, “Guaranteed characterization of exact non-asymptotic confidence regions as defined by LSCR and SPS,” Automatica, vol. 50, no. 2, pp. 507–512, 2014.
[9] S. Kolumbán, I. Vajk, and J. Schoukens, “Perturbed datasets methods for hypothesis testing and structure of corresponding confidence sets,” Automatica, vol. 51, pp. 326–331, Jan. 2015.
[10] L. Ljung, System Identification: Theory for the User, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1999.
[11] M. Milanese, J. Norton, H. Piet-Lahanier, and É. Walter, Bounding Approaches to System Identification. New York, NY, USA: Springer, 2013, doi: 10.1007/978-1-4757-9545-5.
[12] R. R. Mohler, Bilinear Control Processes: With Applications to Engineering, Ecology and Medicine. New York, NY, USA: Academic Press, 1973.
[13] T. Söderström and P. Stoica, System Identification. Hertfordshire, U.K.: Prentice-Hall, 1989.
[14] V. Volpe, B. Cs. Csáji, A. Carè, E. Weyer, and M. C. Campi, “Sign-perturbed sums (SPS) with instrumental variables for the identification of ARX systems,” in Proc. 54th IEEE Conf. Decis. Control (CDC), Osaka, Japan, 2015, pp. 2115–2120.
[15] E. Weyer, M. C. Campi, and B. Cs. Csáji, “Asymptotic properties of SPS confidence regions,” Automatica, vol. 82, pp. 287–294, Aug. 2017.
