
Annual Reviews in Control 52 (2021) 42–64


Review article

Behavioral systems theory in data-driven analysis, signal processing, and control

Ivan Markovsky a,∗, Florian Dörfler b

a Department ELEC, Vrije Universiteit Brussel, Brussels, 1050, Belgium
b Automatic Control Laboratory (IfA), ETH Zürich, Zürich, 8092, Switzerland

ARTICLE INFO ABSTRACT

Keywords: The behavioral approach to systems theory, put forward 40 years ago by Jan C. Willems, takes a representation-
Behavioral systems theory free perspective of a dynamical system as a set of trajectories. Till recently, it was an unorthodox niche of
Data-driven control research but has gained renewed interest for the newly emerged data-driven paradigm, for which it is uniquely
Missing data estimation
suited due to the representation-free perspective paired with recently developed computational methods.
System identification
A result derived in the behavioral setting that became known as the fundamental lemma started a new
class of subspace-type data-driven methods. The fundamental lemma gives conditions for a non-parametric
representation of a linear time-invariant system by the image of a Hankel matrix constructed from raw time
series data. This paper reviews the fundamental lemma, its generalizations, and related data-driven analysis,
signal processing, and control methods. A prototypical signal processing problem, reviewed in the paper, is
missing data estimation. It includes simulation, state estimation, and output tracking control as special cases.
The direct data-driven control methods using the fundamental lemma and the non-parametric representation
are loosely classified as implicit and explicit approaches. Representative examples are data-enabled predictive
control (an implicit method) and data-driven linear quadratic regulation (an explicit method). These methods
are equally amenable to certainty-equivalence as well as to robust control. Emphasis is put on the robustness of
the methods under noise. The methods allow for theoretical certification, they are computationally tractable,
in comparison with machine learning methods require small amount of data, and are robustly implementable
in real-time on complex physical systems.

1. Introduction

The behavioral approach to system theory was put forward by Jan C. Willems in the early 1980s to resolve ‘‘many awkward things with input/output thinking’’ (Willems, 2007b, Section 8). In addition to enforcing ‘‘input/output thinking’’, conventional system theory approaches invariably associate a dynamical system with one of its representations, e.g., a convolution, transfer function, or state–space representation. The new perspective brought by the behavioral approach separates the system from its numerous representations by defining a system as a set of trajectories. This abstract set-theoretic perspective makes the ‘‘input/output thinking’’ a choice rather than a requirement.

In addition to making the input/output thinking optional, separation of the system from its representations has other far reaching consequences. It gives a geometric view of a linear time-invariant system as a (low-dimensional) shift-invariant subspace in a (high-dimensional) trajectory space. This geometric viewpoint is often simpler and more natural than the classical frequency-domain and state–space ones. It led to a ‘‘clear and rational foundation under the problem of obtaining models from time series’’ (Willems, 1986, 1987). In particular, the global total least squares (Roorda & Heij, 1995), deterministic subspace (Van Overschee & De Moor, 1996), and structured low-rank approximation (Markovsky, 2013, 2019) approaches to system identification are motivated by Willems (1986, 1987). More recently, the behavioral approach contributed key ideas and techniques for data-driven analysis, signal processing, and control. This paper reviews these ideas and techniques, presents some of the methods that originate from them, and outlines research directions for future work.

In the contemporary language of machine learning, the new techniques are unsupervised and non-parametric. The techniques are unsupervised in the sense that they use directly the raw time-series data without labeling or pre-processing inputs and outputs, which require human decision making. The methods are non-parametric in the sense that they do not involve a parametric model representation
∗ Corresponding author.
E-mail addresses: [email protected] (I. Markovsky), [email protected] (F. Dörfler).

https://doi.org/10.1016/j.arcontrol.2021.09.005
Received 15 July 2021; Received in revised form 26 September 2021; Accepted 27 September 2021
Available online 10 November 2021
1367-5788/© 2021 Elsevier Ltd. All rights reserved.

of the data-generating system nor of the solution, e.g., the filter or the controller.

The results reviewed in the paper originate from Markovsky (2017), Willems et al. (2005). The key result of Willems et al. (2005), which became known as the fundamental lemma, gives conditions for existence of a non-parametric representation of a discrete-time linear time-invariant system that is specified by a trajectory of the system. The second cornerstone for this paper is the idea put forward in Markovsky (2017) that various signal processing and control problems can be posed and solved as a single generic missing data estimation problem.

Recently, an ever-growing number of generalizations and applications of the fundamental lemma has appeared in the context of data-driven analysis and control. We provide a concise and comprehensive (albeit not exhaustive) review of the literature. A peculiar aspect of the literature centered around the fundamental lemma (and behavioral system theory in general) is that from its publication in 2005 till 2018 the result remained unnoticed—as evident from the citation record—before it blossomed in the wake of data-driven control methods.

In what follows, we provide a brief historical recap and contextualization of Willems et al. (2005). The fundamental lemma was originally conceived as a purely theoretical system identification result: it gives identifiability conditions for deterministic linear time-invariant systems. These conditions can be viewed also as input design guidelines. Under certain specified conditions on the input signal and the data-generating system, the collected data reveal the system dynamics. Informally, when the data is assembled in a Hankel matrix, it spans the set of all finite-length trajectories of the system, i.e., it represents the finite-time behavior of the system. The identifiability conditions given by the fundamental lemma are important; however, in retrospect what led to the renewed interest is the non-parametric representation that gave rise to a new computational approach for solving data-driven analysis and design problems.

The non-parametric representation was used originally as a tool to find alternative derivations of existing subspace identification methods (Markovsky et al., 2005). It revealed the system theoretic meaning of computational steps in the methods. For example, the oblique projection was shown to compute sequential free responses of the system. Later on the non-parametric representation was used for solving other data-driven problems.

As a byproduct of the fundamental lemma, new algorithms for system identification, data-driven simulation, and data-driven control were proposed by Markovsky and Rapisarda (2008), Markovsky et al. (2006). The algorithms for data-driven simulation and control of Markovsky and Rapisarda (2008), which are a follow-up of the ones in Markovsky et al. (2005), are noteworthy because of:

• simplicity—they require only solving a system of linear equations that involves no hyperparameters,
• generality—they apply to multivariable systems under arbitrary initial condition, and
• robustness—subsequent results show that with relatively minor modifications the algorithms ‘‘work’’ also for noisy data and nonlinear data-generating systems.

These features make the proposed data-driven simulation and control algorithms a viable and practical alternative to the conventional model-based approach that requires parametric model identification followed by model-based simulation and control.

Earlier precursors of the data-driven control methods originating from the fundamental lemma are the work of Favoreel et al. (1999) on subspace predictive control and the work of Fujisaki et al. (2004) and Ikeda et al. (2001), who proposed a conceptual framework for subspace-type data-driven control. In Section 5.1.2 we show the connection of the subspace predictive control to methods derived from the fundamental lemma. The framework of Fujisaki et al. (2004) and Ikeda et al. (2001) is also motivated by the behavioral perspective of a system as a set of trajectories and uses a non-parametric representation of the system’s behavior similar to the ones shown in Section 3.3.

Subsequent work (Coulson et al., 2019a) built on the data-driven control algorithms of Markovsky and Rapisarda (2008) and embedded them in a robustified version of receding horizon predictive control. The resulting procedure, called Data-EnablEd Predictive Control (DeePC), saw many extensions and successful implementations in different application areas. Since 2019, Berberich and Allgöwer (2020), De Persis and Tesi (2019) and van Waarde et al. (2020) and others used the core result of the fundamental lemma for solving various analysis and control design problems directly from input-state data. The foundations laid by the fundamental lemma as well as the numerous adoptions and implementations led to a blossoming literature in direct data-driven control, which is reviewed in this paper.

The remainder of this paper is organized as follows. Section 2 gives a self-contained introduction to the behavioral approach, which is sufficient to derive in Section 3 methods for data-driven representation of linear time-invariant systems. These methods are applied in Section 4 for the solution of a generic missing data estimation problem and in Section 5 for solving direct data-driven control problems via predictive control or explicit feedback policies. Section 6 gives conclusions and poses open research directions. Selected, self-contained, and educational proofs are given in the appendices.

2. Behavioral system theory

The idea of separating the notion of a dynamical system from its representations is one of the hallmarks of the behavioral approach to system theory. It also plays an important role in data-driven system theory, signal processing, and control. Section 2.1 defines a dynamical system and in particular a linear time-invariant system as a set of trajectories. The class of linear time-invariant systems is refined in Section 2.2 by defining the notion of model complexity and the subclass of bounded complexity linear time-invariant systems. Three representations of bounded complexity linear time-invariant systems—kernel, input/output, and input/state/output—are presented in Section 2.3. A representation-free notion of controllability is introduced and linked to the classical notion of state controllability in Section 2.4. The simulation problem and specification of initial condition in a representation-free manner are presented in Section 2.5. Our presentation is self-contained and focused on what is needed for the coming sections on the fundamental lemma and data-driven control. For a more in-depth introduction to the behavioral approach, we refer to Polderman and Willems (1998) and Willems (1986, 1987, 1991, 2007a).

2.1. Dynamical systems as sets of trajectories

The concept of a dynamical system is fundamental in system theory. System theory textbooks, however, rarely define it rigorously. In the special case of a linear system, the concept of a dynamical system is linked to the one of an input–output map, defined by convolution or a transfer function if the system is also time-invariant. More advanced textbooks identify a dynamical system with the ubiquitous state–space equations.

The behavioral approach to system theory starts with a profound, yet simple idea: a dynamical system is a set of trajectories. This set is referred to as the behavior. Properties of the system as well as problems involving the system are defined in terms of the behavior. For example, a system is linear if it is a subspace and time-invariant if it is invariant to the action of the shift operator. Thus, a linear time-invariant system is a shift-invariant subspace. There is no a priori reference to inputs, outputs, and initial condition. However, the definition has the input–output map as a special case and responses due to nonzero initial condition are included in the behavior.

The behavior is all that matters: two systems are identical if and only if their behaviors are equal. How the system is specified is an


Table 1
Summary of notation.

Notation:                                  Definition:
N ∶= { 1, 2, … }                           Set of natural numbers
𝑤 ∈ (R𝑞)N, 𝑤 ∶ N → R𝑞                      𝑞-variate real discrete-time signal with time axis N
𝑤|𝐿 ∶= (𝑤(1), … , 𝑤(𝐿))                    Restriction of 𝑤 to the interval [1, 𝐿]
𝑤 = 𝑤p ∧ 𝑤f                                Concatenation of trajectories
𝜎, (𝜎𝑤)(𝑡) ∶= 𝑤(𝑡 + 1)                     Unit shift operator
B ⊂ (R𝑞)N                                  Discrete-time dynamical system B with 𝑞 variables
B|𝐿 ∶= { 𝑤|𝐿 | 𝑤 ∈ B }                     Restriction of B to the interval [1, 𝐿]
L𝑞                                         Set of linear time-invariant systems with 𝑞 variables
𝐦(B)/𝐥(B)/𝐧(B)                             Number of inputs/lag/order of B
𝐜(B) ∶= (𝐦(B), 𝐥(B), 𝐧(B))                 Complexity of B
L𝑞𝑐 ∶= { B ∈ L𝑞 | 𝐜(B) ≤ 𝑐 }               Set of bounded complexity linear time-invariant systems
H𝐿(𝑤)                                      Hankel matrix with 𝐿 block rows constructed from 𝑤, see (9)
𝐴†                                         The pseudo-inverse of 𝐴

important but separate question. A specification of the system is called a representation. Thus, in the behavioral setting the concept of representation is decoupled from the one of a system. A system admits different representations. For example, convolution, transfer function, and state–space equations are representations of a linear time-invariant system and not the system itself. The system is the solution set of these equations. The view of the representations as incidental is a major departure point of the behavioral from the classical approach.

A valid criticism of the behavioral approach is that it is abstract and difficult to work with in practice. Relevant questions being asked are: What is the value of the set-theoretic formalism? What can be done with it that cannot be done in the classical setting by working with representations of the system? How can problems be solved in practice without using representations? Indeed, in practice representations are needed for specifying a system and for solving problems both analytically and computationally. The value of the set-theoretic formalism is on the higher level of defining properties and problems in a representation-free manner.

A representation-free problem formulation is important because it decouples the meaning and objectives of the problem from its potential solution methods. The choice of the representation should pertain to the solution method only. In addition to the problem statement–solution method decoupling, the behavioral approach led to new representations, analysis tools, and design methods that have no counterparts in the input–output setting. Till recently, using the behavioral approach had primarily conceptual and theoretical benefits, as evident from the few practical algorithms that came out of it. This has changed in the last 15 years, when algorithms, software, and applications inspired by or developed for the behavioral approach emerged. The change was catalyzed by research on data-driven methods, which are both using and contributing to the representation-free perspective of dynamical systems.

Notation: The notation used in the rest of the paper is summarized in Table 1. The space of 𝑞-variate one-side infinite time-series (discrete-time signals) is denoted by (R𝑞)N. Recall that a linear time-invariant (LTI) system B with 𝑞 variables is a shift-invariant subspace of (R𝑞)N. The set of 𝑞-variate linear time-invariant systems is denoted by L𝑞. In the next section, we elaborate on the structure of a linear time-invariant system, introducing the notion of system’s complexity and defining subsets of L𝑞 of systems with bounded complexity.

2.2. Bounded complexity linear time-invariant systems

Apart from being a shift-invariant subspace, a linear time-invariant system B has additional structure described by a set of integers, called integer invariants or structure indices (Willems, 1986, Section 7). The most important ones are the number of inputs (free variables) 𝐦(B), the lag 𝐥(B), and the order 𝐧(B). These structure indices are inherent properties of the system, i.e., they are independent of its representation. However, as shown in Section 2.3, the structure indices can be expressed also in terms of the parameters of an input/output, minimal kernel, and minimal state–space representations, where they are linked to familiar concepts.

The restriction B|𝐿 of the system B ∈ L𝑞 to the interval [1, 𝐿] is the set of 𝐿-samples long trajectories of B. By linearity of B, B|𝐿 is a subspace of R𝑞𝐿. Its dimension dim B|𝐿 is

    dim B|𝐿 = 𝐦(B)𝐿 + 𝐧(B), for 𝐿 ≥ 𝐥(B).    (1)

The dimension formula (1) is used often in the rest of the paper. An easily accessible proof based on the familiar input/state/output representation of B is given in Appendix A. (We refer to Markovsky and Dörfler (2020) for a state–space independent proof.) Intuitively, dim B|𝐿 is the degrees of freedom in choosing a trajectory 𝑤 ∈ B|𝐿 of the system. The term 𝐦(B)𝐿 in (1) corresponds to the degrees of freedom due to the choice of the input, and the term 𝐧(B) corresponds to the degrees of freedom due to the choice of the initial condition.

The notion of complexity of a dynamical system is related to the ‘‘size’’ of its behavior—the more complex the system is, the more trajectories it allows. In the case of linear time-invariant systems, (1) characterizes the ‘‘size’’ of B. The system is called bounded complexity if dim B|𝐿 < 𝑞𝐿 for sufficiently large 𝐿. Bounded complexity implies that not all signals in (R𝑞)N are trajectories of the system or, equivalently, that not all variables are inputs, i.e., 𝐦(B) < 𝑞, and the order 𝐧(B) is finite. Since

    𝐥(B) ≤ 𝐧(B) ≤ (𝑞 − 𝐦(B)) 𝐥(B),    (2)

the lag 𝐥(B) of a bounded complexity system B is also finite. Formulae (1) and (2) can be derived using the minimal kernel and input/state/output representations of the system, introduced in the next section. For details, see Willems (1986, Section 7).

Formally, we define the triple

    𝐜(B) ∶= (𝐦(B), 𝐥(B), 𝐧(B))

as the complexity of B. Then,

    L𝑞(𝑚,𝓁,𝑛) ∶= { B ∈ L𝑞 | 𝐦(B) ≤ 𝑚, 𝐥(B) ≤ 𝓁, 𝐧(B) ≤ 𝑛 }

is the set of bounded complexity linear time-invariant systems.

The restricted behavior B|𝐥(B)+1 of a bounded complexity linear time-invariant system B uniquely specifies the system (Markovsky & Dörfler, 2020, Lemma 12). Hence, if a certain property can be certified for the set of trajectories B|𝐿 of finite length 𝐿 > 𝐥(B), then it holds for all trajectories of B. A system B without inputs, i.e., 𝐦(B) = 0, is called autonomous. For a linear time-invariant autonomous system dim B|𝐿 = 𝐧(B) if and only if 𝐿 ≥ 𝐥(B).

2.3. Parametric representations of bounded complexity linear time-invariant systems

A bounded complexity linear time-invariant system B ∈ L𝑞(𝑚,𝓁,𝑛) admits different parametric representations. Next, we review (1) the kernel representation, which is a higher order difference equation involving the variables, i.e., an auto-regressive time-series model, (2) the input/output representation, which is a kernel representation with additional structure—partitioning of the variables into inputs and outputs—and (3) the input/state/output representation, which in addition to the input/output partitioning of the variables introduces an auxiliary (state) variable, a first order difference equation of the state, and a static relation among the output, input, and state.

• A kernel representation of the system B is defined by a linear constant-coefficients difference equation

    B = ker 𝑅(𝜎) ∶= { 𝑤 | 𝑅(𝜎)𝑤 = 0 },    (3)


  where 𝜎 is the unit shift operator, (𝜎𝑤)(𝑡) ∶= 𝑤(𝑡 + 1), and the operator 𝑅(𝜎) is defined by the polynomial matrix

    𝑅(𝑧) = 𝑅0 + 𝑅1𝑧 + ⋯ + 𝑅𝓁𝑧𝓁 = col(𝑅1(𝑧), … , 𝑅𝑘(𝑧)) ∈ R𝑘×𝑞[𝑧], where 𝑅𝑖(𝑧) = 𝑅𝑖0 + 𝑅𝑖1𝑧 + ⋯ + 𝑅𝑖𝓁𝑖 𝑧𝓁𝑖.

  The kernel representation (3) is called minimal if the number of equations 𝑘 is as small as possible over all kernel representations of B (Willems, 1991, Definition III). In a minimal kernel representation, 𝑘 = 𝑝 ∶= 𝑞 − 𝐦(B)—the number of outputs of B—and 𝓁 ∶= deg 𝑅 ∶= max𝑖 𝓁𝑖 is also minimized over all kernel representations of B. The minimal degree 𝓁 of 𝑅 is equal to the lag 𝐥(B) of the system. The minimal total degree 𝑛 ∶= 𝓁1 + ⋯ + 𝓁𝑘 of 𝑅 is equal to the order 𝐧(B) of the system.

• Input/output representation: For a permutation matrix 𝛱 ∈ R𝑞×𝑞 and an integer 𝑚 ∈ [1, 𝑞] define via

    col(𝑢, 𝑦) ∶= 𝛱−1𝑤    (4)

  a partitioning of the variables 𝑤(𝑡) ∈ R𝑞 into 𝑢(𝑡) ∈ R𝑚 and 𝑦(𝑡) ∈ R𝑝, where 𝑝 = 𝑞 − 𝑚. Let 𝛱𝑢 be the projection of 𝑤 on the variable 𝑢, i.e., 𝛱𝑢𝑤 ∶= 𝑢. Acting on a set, 𝛱𝑢 projects all elements in the set. The partitioning (4) is an input/output partitioning of B, i.e., 𝑢 is an input and 𝑦 is an output of the system, if (1) 𝛱𝑢B = (R𝑚)N, i.e., 𝑢 is a free variable, (2) the output 𝑦 is not anticipating the input 𝑢 (Willems, 1991, Definition VIII.4), and (3) the number of inputs 𝑚 ∶= dim 𝑢 is maximal over all partitionings (4) of B that satisfy properties 1 and 2.

  Let B = ker 𝑅(𝜎) be a minimal kernel representation. The partitioning (4) is an input/output partitioning of B if and only if, with [𝑄 −𝑃] ∶= 𝑅𝛱, the polynomial matrix 𝑃 ∈ R𝑝×𝑝[𝑧] is non-singular (Willems, 1991). The resulting input/output representation (also called polynomial matrix description (Antsaklis & Michel, 1997, Chapter 7.5)) is

    Bi/o(𝑃 , 𝑄, 𝛱) = { 𝛱 col(𝑢, 𝑦) | 𝑄(𝜎)𝑢 = 𝑃(𝜎)𝑦 }.

• The input/state/output representation is defined as

    B = Bss(𝐴, 𝐵, 𝐶, 𝐷, 𝛱) ∶= { 𝛱 col(𝑢, 𝑦) | there is 𝑥 ∈ (R𝑛)N, such that 𝜎𝑥 = 𝐴𝑥 + 𝐵𝑢, 𝑦 = 𝐶𝑥 + 𝐷𝑢 },    (5)

  where 𝛱 ∈ R𝑞×𝑞 is a permutation and [𝐴 𝐵; 𝐶 𝐷] ∈ R(𝑛+𝑝)×(𝑛+𝑚). The input/state/output representation (5) is called minimal if 𝑛 ∶= dim 𝐴 is as small as possible over all input/state/output representations of B (Willems, 1991, Definition VII.5). In a minimal input/state/output representation, the state dimension 𝑛 is equal to the order 𝐧(B) of the system. The lag 𝐥(B) also manifests itself in a minimal input/state/output representation of B. It is equal to the smallest 𝑘, for which the extended observability matrix

    O𝑘(𝐴, 𝐶) ∶= col(𝐶, 𝐶𝐴, … , 𝐶𝐴𝑘−1)    (6)

  reaches full column rank, i.e., rank O𝑘(𝐴, 𝐶) = 𝑛.

  In a minimal input/state/output representation (5), the pair (𝐴, 𝐶) is state observable, i.e., rank O𝐥(B)(𝐴, 𝐶) = 𝐧(B); however, the pair (𝐴, 𝐵) need not be state controllable, i.e., satisfy

    rank [𝐵 𝐴𝐵 ⋯ 𝐴𝐥(B)−1𝐵] = 𝐧(B).    (7)

  As shown in the next section, (𝐴, 𝐵) is state controllable if and only if the system B is controllable in a new representation-invariant sense of controllability.

Fig. 1. A system is controllable if any ‘‘past’’ trajectory 𝑤p can be concatenated with any ‘‘future’’ trajectory 𝑤f via a suitable ‘‘control’’ trajectory 𝑤c.

2.4. Controllability in the behavioral setting

As all system properties, in the behavioral setting, controllability is defined in terms of the behavior: B is controllable if for any 𝑤p, 𝑤f ∈ B and 𝑡0 ∈ N, there is 𝜏 ∈ N and 𝑤 ∈ B, such that 𝑤(𝑡) = 𝑤p(𝑡), for 𝑡 ∈ [1, 𝑡0], and 𝑤(𝑡) = 𝑤f(𝑡), for 𝑡 ≥ 𝑡0 + 𝜏 (Willems, 1991, Definition V.1). I.e., B is controllable if it is possible to ‘‘patch’’ any ‘‘past’’ trajectory 𝑤p to any ‘‘future’’ trajectory 𝑤f by including a control trajectory 𝑤c over a period of length 𝜏; see Fig. 1 for a visual illustration.

Note that the definition of controllability in the behavioral setting is not restricted to linear time-invariant systems: it applies to general dynamical systems. In Pillai and Willems (1999), it is used also in the context of multidimensional systems. When specialized to linear time-invariant systems, the notion of controllability in the behavioral setting is related but not equivalent to the classical notion of state controllability (7).

The state controllability is defined as a property of the pair of parameters (𝐴, 𝐵) in an input/state/output representation Bss(𝐴, 𝐵, 𝐶, 𝐷) of a linear time-invariant system B. It depends therefore on the choice of the representation (albeit it is invariant under similarity transformations) as well as on the properties of the system B. A particular pair (𝐴, 𝐵) may be uncontrollable because the input has no sufficient effect on the output or due to a ‘‘bad’’ choice of the state. Disentangling the two causes is an important benefit of using the behavioral setting. Controllability in the behavioral setting is a property of a system B only. Thus, it is independent of the choice of the representation.

In order to show the relation between the behavioral and the classical notions of controllability, consider a linear time-invariant system B ∈ L𝑞 with an input/state/output representation B = Bss(𝐴, 𝐵, 𝐶, 𝐷). Without loss of generality we assume that the parameters (𝐴, 𝐵, 𝐶, 𝐷) are in the Kalman decomposition form:

        ⎡𝐴𝑐𝑜̄   ⋆    ⋆    ⋆ ⎤         ⎡𝐵𝑐𝑜̄⎤
    𝐴 = ⎢ 0   𝐴𝑐𝑜   0    ⋆ ⎥ ,   𝐵 = ⎢𝐵𝑐𝑜⎥ ,
        ⎢ 0    0   𝐴𝑐̄𝑜̄   ⋆ ⎥         ⎢ 0 ⎥
        ⎣ 0    0    0   𝐴𝑐̄𝑜⎦         ⎣ 0 ⎦

    𝐶 = [0  𝐶𝑐𝑜  0  𝐶𝑐̄𝑜] .

The system B has a direct sum decomposition B = Bctr ⊕ Baut into a controllable subsystem Bctr ∈ L𝑞 and an autonomous subsystem Baut ∈ L𝑞, see Willems (1991, Proposition V.8). The controllable subsystem Bctr corresponds to the subsystem Bss(𝐴𝑐𝑜, 𝐵𝑐𝑜, 𝐶𝑐𝑜, 𝐷) in the Kalman decomposition, and the autonomous subsystem Baut corresponds to Bss(𝐴𝑐̄𝑜, 0, 𝐶𝑐̄𝑜, 0). Therefore, in order to preserve the behavior, unobservable states in a state–space representation can be removed, but uncontrollable states that are observable should not be removed. Indeed, the unobservable states have no contribution to B; however, the uncontrollable states that are observable define the autonomous subsystem Baut and removing them changes B.

The fact that uncontrollable–observable states cannot be removed clashes with the classical wisdom that ‘‘minimality’’ of a state–space


representation of a system implies both state observability and state controllability. A quadruple (𝐴, 𝐵, 𝐶, 𝐷) that is both state observable and state controllable is called a minimal realization of the system. The surprising fact brought by the behavioral approach is that some nonminimal realizations are in fact real; equivalently, there are ‘‘real-life’’ systems that do not admit a minimal state–space realization. The behavioral notion of controllability clarifies the folklore behind the ‘‘without loss of generality’’ assumption of existence of a minimal realization as well as the cancellation of common poles and zeros in a transfer function. In both questions—when a minimal realization exists and when a pole-zero cancellation is allowed—the answer is ‘‘when the system is controllable’’.

2.5. Simulation and specification of initial condition

Simulation of a dynamical system is one of the basic operations in the arsenal of system theory. It is defined for a system with an input/output partitioning of the variables as follows: finding the output of the system, given the system, the input, and the initial condition. From a mathematical point of view, simulation is the problem of solving an equation, e.g., (3) or (5), if the system B is given by one of the representations reviewed in Section 2.3. From the behavioral point of view, simulation is a particular way to parametrize a trajectory 𝑤 ∈ B of the system B, i.e., abstractly viewed, simulation is the problem of selecting an element of the behavior. A convenient way of achieving parametrization of 𝑤 ∈ B is to fix an input/output partitioning (4) of the variables, and take as parameters the input component 𝑢 of 𝑤 = 𝛱 col(𝑢, 𝑦) and the initial condition. As shown next, the initial condition can be specified also by a trajectory—a ‘‘prefix’’ trajectory 𝑤ini for 𝑤.

Formally, the simulation problem is defined as follows: Given a system B, an input/output partitioning (4), input 𝑢 ∈ (R𝑚)𝐿, and initial condition 𝑤ini ∈ (R𝑞)𝑇ini,

    find 𝑦 ∈ (R𝑝)𝐿, such that 𝑤ini ∧ 𝛱(𝑢, 𝑦) ∈ B|𝑇ini+𝐿,    (8)

see Fig. 2 for a graphical illustration.

The assumption that 𝑤ini is a trajectory of B guarantees existence of a solution to the simulation problem (8). However, in general, the solution may not be unique. In order to render the solution unique, 𝑤ini has to be ‘‘sufficiently’’ long. A sufficient condition for this is that 𝑇ini ≥ 𝐥(B).

Lemma 1 (Initial Condition Specification, Markovsky & Rapisarda, 2008). Let B ∈ L𝑞 admit an input/output partition 𝑤 = (𝑢, 𝑦). Then, for any given 𝑤ini ∈ B|𝑇ini with 𝑇ini ≥ 𝐥(B) and 𝑢 ∈ (R𝑚)𝐿, there is a unique 𝑦 ∈ (R𝑝)𝐿, such that 𝑤ini ∧ (𝑢, 𝑦) ∈ B|𝑇ini+𝐿.

Proof. Two proofs of Lemma 1 are given in Appendix B. The first one is based on an input/state/output representation of B. The second one

The trajectory 𝑤ini plays the role of the initial state in the state–space setting. It can be shown that for 𝑇ini ≥ 𝐥(B), the vector 𝑤ini of sequential samples is a state vector of the system (Rapisarda & Willems, 1997). Then, problems of estimation of 𝑤ini can also be understood as state estimation problems.

Fig. 2. Initial condition for a trajectory 𝑤 ∈ B is specified in the behavioral setting by a prefix trajectory 𝑤ini of length 𝑇ini ≥ 𝐥(B). The condition that 𝑤 is generated from the initial condition specified by 𝑤ini is then 𝑤ini ∧ 𝑤 ∈ B.

3. Data-driven non-parametric model representation

This section presents a non-parametric representation of linear time-invariant systems that is at the core of the subspace-type data-driven analysis, estimation, and control methods. The non-parametric representation is the image of a Hankel matrix constructed from trajectories of the system. It has been used implicitly since the 90’s in subspace identification, and related data matrices are extensively used in dictionary learning and motion primitives. A theoretical foundation for its use, however, was provided only later by the so-called fundamental lemma. Apart from providing a foundation for the non-parametric representation, the fundamental lemma, reviewed in Section 3.1, gives also identifiability conditions, i.e., conditions under which the data-generating system can be uniquely identified from the data. While the fundamental lemma provides sufficient conditions for identifiability and for the non-parametric representation from an input design perspective, Section 3.2 presents alternative necessary and sufficient conditions based on the rank of the Hankel matrix of the data. Section 3.3 previews two ways of using the non-parametric representation for solving data-driven analysis, signal processing, and control problems.

3.1. The fundamental lemma

Given one ‘‘long’’ trajectory 𝑤d ∈ (R𝑞)𝑇 of a linear time-invariant system B ∈ L𝑞, multiple ‘‘short’’ trajectories of B can be created exploiting time-invariance. In what follows, the subscript ‘‘d’’ stands for ‘‘data’’ and indicates one or more trajectories of a system that are used to implicitly specify it. Let 𝐿 ∈ { 1, … , 𝑇 } be the length of the ‘‘short’’ trajectories and define the cut operator

    𝑤|𝐿 ∶= (𝑤(1), … , 𝑤(𝐿)).

Sequential application of the shift 𝜎 and cut |𝐿 operators on 𝑤d results in 𝑁 = 𝑇 − 𝐿 + 1, 𝐿-samples-long trajectories

    (𝜎0𝑤d)|𝐿, (𝜎1𝑤d)|𝐿, … , (𝜎𝑇−𝐿𝑤d)|𝐿.

The (𝑞𝐿) × (𝑇 − 𝐿 + 1)-dimensional matrix

              ⎡ 𝑤d(1)   𝑤d(2)     ⋯  𝑤d(𝑇 − 𝐿 + 1) ⎤
    H𝐿(𝑤d) ∶= ⎢ 𝑤d(2)   𝑤d(3)     ⋯  𝑤d(𝑇 − 𝐿 + 2) ⎥ ,    (9)
              ⎢   ⋮       ⋮              ⋮         ⎥
              ⎣ 𝑤d(𝐿)   𝑤d(𝐿 + 1) ⋯  𝑤d(𝑇)         ⎦

formed by stacking these trajectories next to each other is called the Hankel matrix of 𝑤d (with depth 𝐿). Although H𝐿(𝑤d) is well defined for any 𝐿 ∈ { 1, … , 𝑇 }, we require H𝐿(𝑤d) to have more columns than rows, which implies

    𝐿 ≤ 𝐿max ∶= ⌊(𝑇 + 1)∕(𝑞 + 1)⌋,

where ⌊𝑎⌋ is the largest integer smaller than or equal to 𝑎.

A signal 𝑢d ∈ (R𝑚)𝑇 is called persistently exciting of order 𝐿 if H𝐿(𝑢d) is full row rank, i.e., rank H𝐿(𝑢d) = 𝑚𝐿. It follows from (1) that a persistently exciting signal 𝑢d of order 𝐿 cannot be modeled as a trajectory of an autonomous linear time-invariant system with lag less than 𝐿. For 𝑢d to be persistently exciting of order 𝐿, it must be sufficiently rich and long. In particular, it must have at least 𝑇min ∶= (𝑚 + 1)𝐿 − 1 samples. Persistency of excitation plays an important role in system identification and input design problems.

The following result, which became known as the fundamental lemma, gives both identifiability conditions as well as input (experi-
is based on a generic basis of B|𝑇ini +𝐿 . □ ment) design guidelines.
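The construction (9) and the persistency of excitation test translate directly into code. The following numpy sketch uses helper names of our own choosing (`hankel_blocks`, `is_persistently_exciting`); it is an illustration, not code from the paper or its software.

```python
import numpy as np

def hankel_blocks(w, L):
    """Block-Hankel matrix H_L(w) of (9) for a T x q signal w:
    column j stacks the L consecutive samples w(j), ..., w(j+L-1)."""
    T, q = w.shape
    N = T - L + 1
    return np.vstack([w[i:i + N, :].T for i in range(L)])

def is_persistently_exciting(u, L):
    """Persistency of excitation of order L: rank H_L(u) = mL."""
    m = u.shape[1]
    return np.linalg.matrix_rank(hankel_blocks(u, L)) == m * L

rng = np.random.default_rng(0)
u = rng.standard_normal((50, 1))   # T = 50 >= T_min = (m+1)L - 1 = 19 for L = 10
print(hankel_blocks(u, 10).shape)  # (10, 41): qL rows, T - L + 1 columns
print(is_persistently_exciting(u, 10))
```

A generic random input is persistently exciting of any order compatible with the length bound, which is why random excitation is a common default in the cited experiment design literature.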

46
I. Markovsky and F. Dörfler Annual Reviews in Control 52 (2021) 42–64

Lemma 2 (Fundamental Lemma, Willems et al., 2005). Consider a linear time-invariant system B ∈ L𝑞 with an input/output partition 𝑤 = (𝑢, 𝑦). Let

1. 𝑤d = (𝑢d, 𝑦d) ∈ B|𝑇 be a trajectory of B,
2. the system B be controllable, and
3. the input component 𝑢d of 𝑤d be persistently exciting of order 𝐿 + 𝐧(B).

Then, any 𝐿-samples long trajectory 𝑤 of B can be written as a linear combination of the columns of H𝐿(𝑤d) and any linear combination H𝐿(𝑤d)𝑔, for 𝑔 ∈ R^(𝑇−𝐿+1), is also a trajectory of B, i.e.,

    image H𝐿(𝑤d) = B|𝐿.    (10)

About the proof: While the inclusion image H𝐿(𝑤d) ⊆ B|𝐿 follows directly from the linearity and time-invariance properties of the system, the question when equality holds is not obvious. The original proof in Willems et al. (2005) is based on a kernel representation of the system and uses the notion of annihilators of the behavior. The tool used in this proof is abstract algebra. An alternative proof based on a state–space representation is given in van Waarde, De Persis et al. (2020). Both proofs are by contradiction. Quoting from Willems et al. (2005):

‘‘The interesting, and somewhat surprising, part of Theorem 1 is that persistency of excitation of order 𝐿 + 𝐧(B) is needed in order to be able to deduce that the observed sequences of length 𝐿 have the ‘correct’ annihilators and the ‘correct’ span. In other words, we have to assume a ‘deeper’ persistency of excitation on 𝑢d than the width of the windows of (𝑢d, 𝑦d) which are considered’’.

Currently there is no constructive proof that gives an intuition why the additional persistency of excitation is needed, nor how conservative the conditions are.

Lemma 2 states conditions on the input 𝑢d and the system B under which, independently of the initial condition corresponding to 𝑤d, the Hankel matrix H𝐿(𝑤d) spans the restricted behavior B|𝐿. Therefore, the image of the Hankel matrix H𝐿(𝑤d) is a representation of the system B, as far as trajectories of length 𝐿 are concerned. The resulting representation (10) is non-parametric, applies to controllable linear time-invariant systems, and depends on the given data 𝑤d only. Although (10) is a representation of the restricted behavior B|𝐿, the fact that it involves raw data only and no parameters distinguishes it from conventional parametric and non-parametric system representations that are traditionally called ‘‘models’’. In this sense, direct data-driven methods using only (10) are ‘‘model-free’’.

Over time, Lemma 2 became known as the fundamental lemma because of its foundational importance for system identification, data-driven analysis, signal processing, and control. Indeed, since B|𝐥(B)+1 completely specifies the system B, choosing 𝐿 = 𝐥(B) + 1 in Lemma 2, we obtain conditions under which B can be recovered from the data 𝑤d, i.e., identifiability conditions. Moreover, the result is interpretable and, as shown later on in the overview, algorithms derived from it are tractable and robust.

A special case of the fundamental lemma that derives identifiability conditions used in subspace identification (Van Overschee & De Moor, 1996; Verhaegen & Dewilde, 1992) is when the system B is given by an input/state representation, i.e., the output is equal to the state:

    B = { (𝑢, 𝑥) | 𝜎𝑥 = 𝐴𝑥 + 𝐵𝑢 }.    (11)

Note that in this case 𝐥(B) = 1. Applied to (11), Lemma 2 leads to the following result (Willems et al., 2005, Corollary 2).

Corollary 3. Consider a trajectory (𝑢d, 𝑥d) ∈ B|𝑇 of a system (11) with 𝑚 ∶= dim 𝑢 inputs and state dimension 𝑛 ∶= dim 𝑥. Assume that (𝐴, 𝐵) is controllable. Then,

1. if 𝑢d is persistently exciting of order 𝑛,

    rank [ 𝑥d(1) ⋯ 𝑥d(𝑇) ] = 𝑛,

2. if 𝑢d is persistently exciting of order 𝑛 + 1,

    rank ⎡ 𝑢d(1) ⋯ 𝑢d(𝑇) ⎤ = 𝑛 + 𝑚,
         ⎣ 𝑥d(1) ⋯ 𝑥d(𝑇) ⎦

3. if 𝑢d is persistently exciting of order 𝑛 + 𝐿,

    rank ⎡         H𝐿(𝑢d)          ⎤ = 𝑚𝐿 + 𝑛.
         ⎣ 𝑥d(1) ⋯ 𝑥d(𝑇 − 𝐿 + 1) ⎦

The rank conditions appearing in Corollary 3 are extensively used in the subspace identification literature; however, they are assumed, and there were no tests available to verify them from given input–output data 𝑤d = (𝑢d, 𝑦d) that is an arbitrary trajectory. Corollary 3 gives such a test. Corollary 3 is also extensively used in data-driven analysis and control, see Section 5.3.

The fundamental lemma has been generalized for uncontrollable systems (Mishra et al., 2020; Yu et al., 2021), data consisting of multiple trajectories (van Waarde, De Persis et al., 2020), other matrix structures (Coulson et al., 2020), as well as the following model classes: affine (Berberich et al., 2021c), linear parameter-varying (Verhoek et al., 2021), flat systems (Alsalti et al., 2021), finite impulse response Volterra (Rueda-Escobedo & Schiffer, 2020), and Wiener–Hammerstein (Berberich & Allgöwer, 2020). Other works extending the fundamental lemma to nonlinear systems use the Koopman operator (Lian & Jones, 2021a; Lian, Wang et al., 2021). Experiment design methods for the fundamental lemma, i.e., methods for choosing the trajectory 𝑤d, are considered in De Persis and Tesi (2021a), Iannelli et al. (2020) and van Waarde (2021).

Like the fundamental lemma, however, all above cited generalizations depend on an a priori given input/output partitioning of the variables and provide sufficient conditions in terms of persistency of excitation of the input. The following section presents an alternative result that relaxes the assumption of a given input/output partitioning. It expresses the persistency of excitation in terms of all variables, provides necessary and sufficient conditions not assuming controllability, and widens the class of data matrix structures.

3.2. Identifiability

More generally, instead of one trajectory 𝑤d, consider a set

    Wd ∶= { 𝑤d^1, …, 𝑤d^𝑁 },  𝑤d^𝑖 ∈ (R𝑞)𝑇𝑖    (12)

of 𝑁 trajectories of a dynamical system B, i.e.,

    𝑤d^𝑖 ∈ B|𝑇𝑖, for all 𝑖 = 1, …, 𝑁.    (13)

We refer to Wd as the data and to B as the data-generating system. The question ‘‘Can we recover the data-generating system B from the data Wd?’’ is called the identifiability question. In order to make the identifiability question well posed, it is necessary to know, in addition to the data Wd, a model class M to which the to-be-identified system B belongs. In this paper this is the linear time-invariant model class L𝑞 or L𝑞_𝑐, which uses prior knowledge of an upper bound 𝑐 on the complexity.

Definition 4 (Identifiability). The system B ∈ M is identifiable from the data (12)–(13) in the model class M if B is the only model in M that fits the data exactly, i.e.,

    B̂ ∈ M and Wd ⊂ B̂ ⟹ B̂ = B.

Identifiability gives conditions for well-posedness of the exact identification problem, which is the map Wd ↦ B̂ ∈ M from data to a model in the model class.
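The claim of Lemma 2 can be checked numerically on a small example. The sketch below assumes a specific controllable second-order single-input single-output system of our choosing; the rank value 𝑚𝐿 + 𝐧(B) that it verifies is the generalized persistency of excitation condition (14) discussed in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(1)

# A controllable example system with m = 1 input, p = 1 output, n = 2 states.
A = np.array([[0.5, 1.0], [0.0, 0.3]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.1]])

def simulate(u, x0):
    """q = 2 variate trajectory w = (u, y) of the state-space system above."""
    x, w = x0, []
    for ut in u:
        w.append(np.concatenate([ut, C @ x + D @ ut]))
        x = A @ x + B @ ut
    return np.array(w)

def hankel_blocks(w, L):
    T, q = w.shape
    return np.vstack([w[i:i + T - L + 1, :].T for i in range(L)])

T, L, m, n = 100, 5, 1, 2
wd = simulate(rng.standard_normal((T, 1)), np.zeros(2))  # random input: PE
H = hankel_blocks(wd, L)
print(np.linalg.matrix_rank(H), m * L + n)  # 7 7: the rank condition holds

# Any fresh length-L trajectory (new input and initial state) is H g for some g.
w = simulate(rng.standard_normal((L, 1)), np.array([1.0, -2.0])).reshape(-1)
g, *_ = np.linalg.lstsq(H, w, rcond=None)
print(np.linalg.norm(H @ g - w) < 1e-8)  # True: w lies in image H_L(wd)
```

Note that the fresh trajectory starts from a nonzero initial state not seen in the data; the lemma guarantees it is still spanned by the columns of the Hankel matrix.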


For a set of time series Wd, the Hankel matrix (9) is generalized to the mosaic-Hankel matrix (Heinig, 1995; Usevich & Markovsky, 2014)

    H𝐿(Wd) ∶= [ H𝐿(𝑤d^1) ⋯ H𝐿(𝑤d^𝑁) ].

In Markovsky and Dörfler (2020), the following identifiability condition for a linear time-invariant data-generating system B ∈ L𝑞 is proven: the system B is identifiable from the data (12)–(13) if and only if

    rank H𝐥(B)+1(Wd) = 𝐦(B)(𝐥(B) + 1) + 𝐧(B).

The following corollary of the identifiability condition provides a foundation for a non-parametric representation of the restricted behavior B|𝐿 of the data-generating system.

Corollary 5 (Corollary 19, Markovsky & Dörfler, 2020). If the data-generating system B is linear time-invariant,

    image H𝐿(Wd) ⊆ B|𝐿, for all 𝐿 ∈ { 𝐥(B) + 1, …, 𝐿max }.

Moreover, for 𝐿 ≥ 𝐥(B), image H𝐿(Wd) = B|𝐿 if and only if

    rank H𝐿(Wd) = 𝐦(B)𝐿 + 𝐧(B).    (14)

Proof. A representation-free proof is given in Appendix C. The key argument is showing that the dimension of the image of the Hankel matrix H𝐿(Wd) is equal to the dimension of B|𝐿. Then, (14) follows from the dimension formula (1). □

Corollary 5 is an alternative to the fundamental lemma. Like the fundamental lemma, it gives conditions under which the image of the Hankel matrix H𝐿(Wd) constructed from the data generates the restricted behavior B|𝐿. Unlike the fundamental lemma, however, Corollary 5 does not require a given input/output partitioning of the variables nor controllability of the data-generating system. Also, Corollary 5 gives a necessary and sufficient condition, while the fundamental lemma gives sufficient conditions only. Condition (14) is reminiscent of the persistency of excitation condition in the fundamental lemma. We refer to it as a generalized persistency of excitation. It is verifiable from data Wd and prior knowledge of the structure indices 𝐦(B), 𝐥(B), and 𝐧(B). An experiment design problem achieving the condition (14) with a minimal length input is addressed by van Waarde (2021).

Another way in which Corollary 5 generalizes the fundamental lemma is that it allows for multiple trajectories (12), as in van Waarde, De Persis et al. (2020). The mosaic-Hankel matrix in Corollary 5 includes as special cases other matrix structures, such as the Hankel matrix (9), the Page matrix, and the trajectory matrix. The trajectory matrix used in dictionary learning (Brunton et al., 2016) collects time series column-by-column as

    T𝐿(Wd) ∶= ⎡ 𝑤d^1(1)  𝑤d^2(1)  ⋯  𝑤d^𝑁(1) ⎤
              ⎢    ⋮        ⋮            ⋮   ⎥ ∈ R^(𝑞𝐿×𝑁)    (15)
              ⎣ 𝑤d^1(𝐿)  𝑤d^2(𝐿)  ⋯  𝑤d^𝑁(𝐿) ⎦

and is a special case of the mosaic-Hankel matrix when all time series 𝑤d^𝑖 have length 𝑇1 = ⋯ = 𝑇𝑁 = 𝐿, i.e., 𝑤d^𝑖 ∈ (R𝑞)𝐿 for all 𝑖 ∈ { 1, …, 𝑁 }. Coined by Damen et al. (1982), the Page matrix P𝐿(𝑤d) ∈ R^(𝑞𝐿×𝑇′) of the signal 𝑤d ∈ (R𝑞)𝑇 with 𝐿 block rows is a special trajectory matrix (and therefore a special mosaic-Hankel matrix) obtained by taking 𝑤d^𝑖 = (𝜎^((𝑖−1)𝐿) 𝑤d)|𝐿, for 𝑖 ∈ { 1, …, 𝑇′ }, where 𝑇′ ∶= ⌊𝑇∕𝐿⌋. Alternatively, the Page matrix P𝐿(𝑤d) can be obtained from the Hankel matrix H𝐿(𝑤d) by column selection:

    P𝐿(𝑤d) ∶= [ 𝑤d|𝐿  (𝜎^𝐿 𝑤d)|𝐿  ⋯  (𝜎^((𝑇′−1)𝐿) 𝑤d)|𝐿 ]

             = ⎡ 𝑤d(1)  𝑤d(𝐿 + 1)  ⋯  𝑤d((𝑇′ − 1)𝐿 + 1) ⎤
               ⎢   ⋮        ⋮               ⋮           ⎥ .    (16)
               ⎣ 𝑤d(𝐿)  𝑤d(2𝐿)     ⋯  𝑤d(𝑇′𝐿)          ⎦

Like the Hankel matrix H𝐿(𝑤d), the Page matrix P𝐿(𝑤d) also consists of 𝐿-samples long trajectories; however, unlike the Hankel matrix, the Page matrix has no repeated elements on the anti-diagonals. The Page matrix has been independently derived as a basis for the system behavior in Agarwal et al. (2018) and Coulson et al. (2020).

Some pros and cons of the different matrix structures are as follows: the Hankel matrix conditions the data on time-invariance, leading to larger dimensional data matrices, whereas the trajectory and Page matrices offer algorithmic advantages since they are unstructured. Clearly, the latter require more data. Markovsky and Dörfler (2020) report empirical results showing advantages of the Hankel matrix when used for system identification, while in Section 5 we show that the trajectory and Page matrices have advantages when used in direct data-driven control. We will touch upon these points in later sections and continue focusing on the mosaic-Hankel matrix H𝐿(Wd), keeping the special matrix structures in mind. The essential property of all data matrices is that every column of the matrix, viewed as a time series, is an 𝐿-samples long trajectory of the system.

3.3. Data-driven representation of the restricted behavior

The fundamental lemma and Corollary 5 provide a non-parametric representation of the restricted behavior B|𝐿 as the image of the mosaic-Hankel matrix H𝐿(Wd) of the data with depth 𝐿. Under the generalized persistency of excitation condition (14) (or conditions 1–3 of Lemma 2),

    B|𝐿 = image H𝐿(Wd).    (17)

For given 𝐿, the non-parametric data-driven representation (17) is completely specified by the data Wd. Under (14), it is valid for any multivariable linear time-invariant system. Note that the alternative conditions 1–3 of the fundamental lemma restrict the class of systems due to the controllability assumption.

Based on (17), two approaches for solving data-driven analysis, signal processing, and control problems were proposed:

1. solving a system of linear equations, and
2. solving a rank-constrained matrix approximation and completion problem.

The first approach, originally used for data-driven simulation and open-loop linear quadratic tracking control in Markovsky and Rapisarda (2008), expresses the constraint that 𝑤 ∈ (R𝑞)𝐿 is a trajectory of the system B as existence of a solution of a system of linear equations:

    𝑤 ∈ B|𝐿 ⟺ 𝑤 = H𝐿(Wd)𝑔 has a solution 𝑔.    (18)

The right-hand-side condition of (18) involves only the collected data, so that the system B need not be known. The approach using (18) requires basic linear algebra—the solution of a system of linear equations. For details see Section 4.3, where (18) is used for data-driven missing data estimation. Modifications of the approach for noisy data Wd are presented in Section 4.4 and in Section 5 for data-driven control.

The second approach expresses the constraint that 𝑤 ∈ (R𝑞)𝐿 is a trajectory of a bounded complexity linear time-invariant system B as a rank condition:

    𝑤 ∈ B|𝐿 ⟺ rank [ H𝛿(Wd)  H𝛿(𝑤) ] = rank H𝛿(Wd),
                      for any 𝛿 ∈ { 𝐥(B) + 1, …, 𝐿 }.    (19)

The right-hand-side of (19) involves only the data and the parameter 𝛿, so that again the system B need not be known. The condition is valid for any value of 𝛿 in the interval [𝐥(B) + 1, 𝐿]; however, different choices of 𝛿 lead to different methods. Of most interest are the extremes 𝛿 = 𝐿 and 𝛿 = 𝐥(B) + 1. The case 𝛿 = 𝐿 recovers the system of equations approach (18), while the case 𝛿 = 𝐥(B) + 1 leads to the mosaic-Hankel structured rank-constrained matrix approximation and completion approach of Markovsky (2008). In Section 4 we use (19) with 𝛿 = 𝐥(B) + 1 for data-driven missing data estimation and in Section 5 the rank constraint (19) is used for pre-processing.
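Both trajectory-membership tests reduce to a least-squares solve or a rank computation. The sketch below uses a hypothetical first-order system 𝑦(𝑡 + 1) = 𝑎𝑦(𝑡) + 𝑏𝑢(𝑡) (so 𝑚 = 𝑛 = 𝐥(B) = 1) and helper names of our own; it illustrates the conditions (18) and (19), not a particular implementation from the literature.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 0.8, 1.0   # y(t+1) = a y(t) + b u(t): m = 1 input, n = l(B) = 1

def trajectory(T, y0=0.0):
    u = rng.standard_normal(T)
    y = np.empty(T)
    y[0] = y0
    for t in range(T - 1):
        y[t + 1] = a * y[t] + b * u[t]
    return np.column_stack([u, y])           # q = 2 variate signal w = (u, y)

def hankel_blocks(w, L):
    T, q = w.shape
    return np.vstack([w[i:i + T - L + 1, :].T for i in range(L)])

wd = trajectory(60)                           # the given data W_d
L, delta = 8, 2                               # delta = l(B) + 1

def member_lineq(w):                          # test (18): linear equations
    H = hankel_blocks(wd, L)
    g, *_ = np.linalg.lstsq(H, w.reshape(-1), rcond=None)
    return np.linalg.norm(H @ g - w.reshape(-1)) < 1e-8

def member_rank(w):                           # test (19): rank must not grow
    Hd = hankel_blocks(wd, delta)
    Haug = np.hstack([Hd, hankel_blocks(w, delta)])
    return np.linalg.matrix_rank(Haug) == np.linalg.matrix_rank(Hd)

w_true = trajectory(L, y0=3.0)                # a genuine length-L trajectory
w_fake = rng.standard_normal((L, 2))          # generic noise: not a trajectory
print(member_lineq(w_true), member_rank(w_true))   # True True
print(member_lineq(w_fake), member_rank(w_fake))   # False False
```

Note that the rank test works with the small depth 𝛿 = 𝐥(B) + 1, while the linear-equations test needs the Hankel matrix at the full depth 𝐿, matching the discussion of the two extremes above.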


In (19), for 𝛿 = 𝐥(B) + 1, the generalized persistency of excitation condition coincides with the identifiability condition. In contrast, in (18), 𝐿 is the length of the signal 𝑤, which is given and is in general larger than 𝐥(B) + 1. The required generalized persistency of excitation for using the system of equations approach (18) is therefore more restrictive than the one for using the matrix completion approach (19) with 𝛿 = 𝐥(B) + 1. Also, growing 𝐿 implies growing dimension of the system of equations, which increases the computational cost.

Both deficiencies of (18)—the higher persistency of excitation and the higher computational cost—can be overcome by splitting 𝑤 into length-𝛿 pieces and computing each piece separately using (18), matching the initial condition of one piece with the final conditions of the previous piece, see Markovsky et al. (2005, Lemma 3). The resulting algorithm requires solving recursively a sequence of smaller dimensional systems of linear equations rather than one larger system. In the limit, the lengths of the pieces can be taken as 𝛿 = 𝐥(B) + 1, which ensures that the required persistency of excitation is the same as the one for identifiability of the data-generating system B. In this case, however, (18) is equivalent to a one-step-ahead predictor, which is essentially a model-based solution approach. The recursive algorithm outlined above is therefore a more flexible solution method that uses a 𝛿-steps-ahead data-driven prediction, where 𝛿 is a hyperparameter.

4. Data-driven missing data estimation

Apart from the data-driven representation that emerged from the fundamental lemma, the paper is based on another key idea, put forward in Markovsky (2017): a missing part of a generic trajectory of the system can be used to represent and compute the object that is aimed at, e.g., the predicted signal in forecasting problems and the input signal in control problems. The seeds for this idea can be traced back to Markovsky and Rapisarda (2008), where two seemingly different problems—simulation and control—are solved by minor variations of the same basic method. It gradually emerged that this similarity is not incidental but a manifestation of a more general principle: rank deficiency of a structured data matrix. Consequently, the problem is structured matrix low-rank approximation and completion (Markovsky, 2014).

The problem considered in Markovsky (2017) deals with noisy data and its solution requires local optimization methods. In contrast, the problems in Markovsky and Rapisarda (2008) are for exact data and analytical solutions are derived. This disconnect between the two approaches is unfortunate. Later research showed that the two approaches are complementary. In the context of data-driven control, the results in Markovsky (2017) were generalized for noisy data, using regularization methods. In Dörfler et al. (2021b) and Markovsky and Dörfler (2021) the low-rank approximation/completion approach and the regularization approaches are unified.

This section presents the missing data estimation problem. First, Section 4.1 shows how familiar model-based problems, such as simulation, Kalman smoothing, and output tracking control, can be viewed as missing data estimation. Then, Section 4.2 presents a generic formulation of the missing data estimation problem that fits the examples as special cases. The generic problem is data-driven, i.e., instead of a system, a set of trajectories Wd is given. Section 4.3 outlines two solution approaches assuming that the data Wd is exact. The first solution is based on a rank-constrained matrix approximation and completion reformulation of the problem that uses (19). The second solution is based on the system of linear equations reformulation of the problem that uses (18). Section 4.4 shows modifications of the methods for the case of inexact/noisy data Wd. Numerical case studies are shown in Section 4.5. Finally, Section 4.6 comments on application of the data-driven methods for linear time-invariant system analysis.

4.1. Conventional model-based problem formulations

In order to fit conventional problems that are defined in terms of inputs and outputs in the behavioral setting, in this section we partition the variables 𝑤 into inputs 𝑢 and outputs 𝑦. For simplicity, we assume that 𝑤 = [𝑢; 𝑦], i.e., in (4) 𝛱 = 𝐼. Also, in order to specify or estimate initial conditions in a representation-free manner, we split the time axis into ‘‘past’’ 𝑤ini—the first 𝑇ini samples—and ‘‘future’’ 𝑤f—the remaining 𝑇f samples (see Section 2.5). Then, by Lemma 1, we take 𝑇ini ≥ 𝐥(B).

Our goal is to show how conventional problems, such as simulation, smoothing, and output tracking control, are equivalent to corresponding missing data estimation problems. For example, the simulation problem defined in (8) was already posed as missing data estimation: find the unknown 𝑦f from the given initial condition 𝑤ini and input 𝑢f. The main message of this section is that the missing data estimation framework goes beyond simulation. Next, we show that it fits also two versions of state estimation—Kalman smoothing and errors-in-variables Kalman smoothing—as well as output tracking control.

• Kalman smoothing. The problem is defined as follows: given a linear time-invariant system B and a ‘‘noisy’’ trajectory 𝑤f, find the initial condition 𝑤ini. In the conventional Kalman smoothing problem the ‘‘noisy’’ trajectory 𝑤f is generated in the output error setup: the output is measured with additive noise 𝑦f = 𝑦̄f + 𝑦̃f, while the input is assumed exact, 𝑢f = 𝑢̄f. The true value 𝑤̄f of the trajectory 𝑤f is generated by the system B from some unknown true initial condition 𝑤̄ini, i.e., 𝑤̄ini ∧ 𝑤̄f ∈ B|𝑇ini+𝑇f. Assuming further that the measurement noise 𝑦̃f is zero mean, white, Gaussian with covariance matrix that is a multiple of the identity, the maximum-likelihood estimation problem for the initial condition 𝑤̄ini is given by

    minimize over 𝑤̂ini and 𝑦̂f   ‖𝑦f − 𝑦̂f‖₂
    subject to   𝑤̂ini ∧ (𝑢f, 𝑦̂f) ∈ B|𝑇ini+𝑇f.    (20)

A byproduct of estimating the initial condition 𝑤̂ini in (20) is an approximation 𝑦̂f of the output. The signal 𝑦̂f is the best estimate of the true output 𝑦̄f, given the model B and the prior knowledge about the measurement noise. Problem (20), which defines the conventional Kalman smoother (Kailath et al., 2000), is also a missing data estimation problem for 𝑤ini; however, the output 𝑦f is approximated rather than fitted exactly.

• Errors-in-variables (EIV) Kalman smoothing. The output error setup used in the conventional Kalman smoothing problem is asymmetric in the observed variables: the output is assumed noisy while the input is assumed exact. A symmetric setup, where all variables are treated on an equal footing as noisy, is called errors-in-variables. The errors-in-variables setup is consistent with the behavioral approach, where all variables are treated on an equal footing without splitting them into inputs and outputs. The state estimation problem in the errors-in-variables setup is again: given a linear time-invariant system B and a ‘‘noisy’’ trajectory 𝑤f, find the initial condition 𝑤ini; however, now the ‘‘noisy’’ trajectory 𝑤f is 𝑤f = 𝑤̄f + 𝑤̃f, where 𝑤̄ini ∧ 𝑤̄f ∈ B|𝑇ini+𝑇f for some 𝑤̄ini and a zero mean, white, Gaussian noise 𝑤̃f with covariance matrix that is a multiple of the identity. The maximum-likelihood estimation problem for the initial condition 𝑤̄ini is then:

    minimize over 𝑤̂ini and 𝑤̂f   ‖𝑤f − 𝑤̂f‖₂
    subject to   𝑤̂ini ∧ 𝑤̂f ∈ B|𝑇ini+𝑇f.    (21)

Problem (21) defines what is called the errors-in-variables Kalman smoother (Markovsky & De Moor, 2005). It is a missing data estimation problem for 𝑤ini, where the whole given trajectory 𝑤f is approximated.
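The simulation problem (8) is the simplest instance of missing data estimation and can be solved directly from data via (18): constrain 𝑔 so that the ‘‘past’’ rows of the Hankel matrix match 𝑤ini and the future input rows match 𝑢f, then read the future output off the remaining rows. A minimal numpy sketch, using a hypothetical first-order example system and function names of our own:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.8, 1.0   # example system y(t+1) = a y(t) + b u(t)

def trajectory(u, y0=0.0):
    y = np.empty(len(u))
    y[0] = y0
    for t in range(len(u) - 1):
        y[t + 1] = a * y[t] + b * u[t]
    return np.column_stack([u, y])

def hankel_blocks(w, L):
    T, q = w.shape
    return np.vstack([w[i:i + T - L + 1, :].T for i in range(L)])

def dd_simulate(wd, w_ini, u_f):
    """Solve (8) via (18): pick g matching the 'past' w_ini and the future
    input u_f, then read the future output off the remaining Hankel rows."""
    T_ini, T_f, q = len(w_ini), len(u_f), wd.shape[1]
    H = hankel_blocks(wd, T_ini + T_f)
    Hp = H[:q * T_ini, :]                       # rows for w_ini (u and y)
    Hf = H[q * T_ini:, :].reshape(T_f, q, -1)
    A_eq = np.vstack([Hp, Hf[:, 0, :]])         # past + future-input rows
    rhs = np.concatenate([w_ini.reshape(-1), u_f])
    g, *_ = np.linalg.lstsq(A_eq, rhs, rcond=None)
    return Hf[:, 1, :] @ g                      # future-output rows give y_f

wd = trajectory(rng.standard_normal(80))             # "long" data trajectory
w_ini = trajectory(rng.standard_normal(2), y0=1.0)   # T_ini = 2 >= l(B)
u_f = rng.standard_normal(10)
y_f = dd_simulate(wd, w_ini, u_f)

# Model-based check: continue the recursion from the end of w_ini.
y, u_prev, y_check = w_ini[-1, 1], w_ini[-1, 0], []
for ut in u_f:
    y = a * y + b * u_prev
    y_check.append(y)
    u_prev = ut
print(np.allclose(y_f, y_check))   # True: data-driven and model-based agree
```

By Lemma 1, the prefix 𝑤ini of length 𝑇ini ≥ 𝐥(B) makes the output unique, so any 𝑔 satisfying the constraints gives the same 𝑦f.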


Table 2
The examples considered can be viewed as estimation of a missing part of a trajectory (the question marks ‘‘?’’ in the table), where other parts of the trajectory are given as exact (E) or inexact/noisy (N).

    Example                 Reference   𝑤ini   𝑢f   𝑦f
    Simulation              (8)         E      E    ?
    Kalman smoothing        (20)        ?      E    N
    EIV Kalman smoothing    (21)        ?      N    N
    Output tracking         (22)        E      ?    N

Table 3
The information about exact, noisy, reference, and missing data elements 𝑤𝑖(𝑡) is encoded into the weights 𝑣𝑖(𝑡) of the element-wise weighted semi-norm ‖ ⋅ ‖𝑣.

    Weight             Used if                  To                   By
    𝑣𝑖(𝑡) = ∞          𝑤𝑖(𝑡) exact              Interpolate 𝑤𝑖(𝑡)    𝑒𝑖(𝑡) = 0
    𝑣𝑖(𝑡) ∈ (0, ∞)     𝑤𝑖(𝑡) noisy/reference    Approximate 𝑤𝑖(𝑡)    min ‖𝑣𝑖(𝑡)𝑒𝑖(𝑡)‖₂
    𝑣𝑖(𝑡) = 0          𝑤𝑖(𝑡) missing            Fill in 𝑤𝑖(𝑡)        𝑤̂ ∈ B|𝐿

• Output tracking. Finally, the least-squares output tracking problem is defined as follows: given an initial condition 𝑤ini and an output 𝑦f, find an input 𝑢̂f, such that

    minimize over 𝑢̂f and 𝑦̂f   ‖𝑦f − 𝑦̂f‖₂
    subject to   𝑤ini ∧ (𝑢̂f, 𝑦̂f) ∈ B|𝑇ini+𝑇f.    (22)

The signal 𝑢̂f is the open-loop optimal control signal. Problem (22) is a missing data estimation problem, where the missing data is the input. The given data is the reference signal 𝑦f, which is approximated in the least-squares sense by the output 𝑦̂f. In the special case when the reference output 𝑦f ∈ B|𝑇f, the problem is called output-matching. The output-matching problem is dual to the simulation problem, where the missing data is the output 𝑦f and the given data is 𝑤ini and the (exact) input 𝑢f. In Section 5.1, we address the data-driven control extensions of (22). Note that the errors-in-variables Kalman smoothing problem (21) (with a more general weighted 2-norm cost function) is equivalent to the linear–quadratic tracking control problem (36), defined in Section 5.1.

Table 2 summarizes the examples. The data-driven versions of these signal processing problems assume given data Wd of the system B. Consequently, the data-driven solution methods avoid identifying a parametric representation of B.

4.2. Generic missing data problem formulation

In all problems considered in Section 4.1 the goal is to minimize the error signal 𝑒 ∶= 𝑤 − 𝑤̂, where 𝑤 contains the given data (exact, noisy, or reference signal) as well as missing values and 𝑤̂ is a trajectory of the system. The information about exact, noisy, reference, and missing data is encoded in the weights 𝑣𝑖(𝑡) ≥ 0 of the element-wise weighted semi-norm (see Table 3)

    ‖𝑒‖𝑣 ∶= √( ∑_{𝑡=1}^{𝐿} ∑_{𝑖=1}^{𝑞} 𝑣𝑖(𝑡) 𝑒𝑖²(𝑡) ).

Note that the given noisy and reference data is treated in the same way, by approximating it in the weighted least squares sense. The difference is in the interpretation of the weights. In the noisy case, using the maximum-likelihood estimation principle, the weights are determined by the inverse of the noise variances, which are assumed a priori known. In the reference tracking case, the weights define the control objective, which is specified by the designer.

The examples considered are then special cases of the following generic missing data estimation problem

    minimize over 𝑤̂   ‖𝑤 − 𝑤̂‖𝑣   subject to   𝑤̂ ∈ B|𝐿    (23)

for suitable choices of the trajectory 𝑤 and the weights 𝑣. The formulation (23) not only generalizes the problems considered in Section 4.1 but can also be used to formulate other problems, such as simulation with terminal conditions, trajectory generation with way points, and estimation of missing data (Markovsky & Dörfler, 2021).

In order to solve (23) numerically, we reformulate it as an equality constrained least-squares minimization. Let Iexact be the vector of indices of the exact given elements and Itba be the vector of indices of the to-be-approximated (tba) given noisy or reference elements. We overload the notation 𝑤|𝐿: for a vector of indices I ∈ { 1, …, 𝑞𝐿 }^𝐾,

    𝑤|I ∶= [ 𝑤I₁ ⋯ 𝑤I𝐾 ]⊤ ∈ R^𝐾

is the subvector of 𝑤 ∈ R^(𝑞𝐿) with indices I. Similarly, for later use, H𝐿(𝑤d)|I is the submatrix of H𝐿(𝑤d) with row indices I. With this notation in place, the missing data estimation problem (23) becomes:

    minimize over 𝑤̂   ‖𝑤|Itba − 𝑤̂|Itba‖𝑣|Itba
    subject to   𝑤̂ ∈ B|𝐿  and  𝑤̂|Iexact = 𝑤|Iexact.    (24)

In the next section we present methods for solving (24) based on the data-driven representation (17) of the system B.

4.3. Solution methods with exact data Wd

As previewed in Section 3.3, there are two distinct approaches: one by solving a system of linear equations and one by solving a rank-constrained matrix approximation and completion problem. In this section, we use them for solving the missing data estimation problem (24). We begin with the rank-constrained matrix approximation and completion approach.

By means of Corollary 5 and (19), we obtain a data-driven version of the missing data estimation problem (24):

    minimize over 𝑤̂   ‖𝑤|Itba − 𝑤̂|Itba‖𝑣|Itba
    subject to   rank [ H𝛿(Wd)  H𝛿(𝑤̂) ] = rank H𝛿(Wd)    (25)
                 and   𝑤̂|Iexact = 𝑤|Iexact.

I.e., assuming that (14) holds, the missing data estimation problem (24) is equivalent to the mosaic-Hankel structured low-rank matrix approximation and completion problem (25) for any 𝛿 ∈ { 𝐥(B) + 1, …, 𝐿 }. The hyperparameter 𝛿 of (25) determines the shape of the Hankel matrix. In case of exact data, it does not affect the solution.

An independent yet similar data-driven approach to output tracking (22) was conceptually laid out by Ikeda et al. (2001). The authors also base their approach on the rank condition (14) to specify the rank of the Hankel matrix containing past data, the future output reference, and the future inputs to be designed.

Due to the rank constraint, (25) is a nonconvex optimization problem. A convex relaxation, based on the nuclear norm ‖ ⋅ ‖∗ regularization (see Fazel, 2002), is

    minimize over 𝑤̂   ‖𝑤|Itba − 𝑤̂|Itba‖𝑣|Itba + 𝛾 ‖[ H𝛿(Wd)  H𝛿(𝑤̂) ]‖∗    (26)
    subject to   𝑤̂|Iexact = 𝑤|Iexact,

where 𝛿 and 𝛾 are hyperparameters. The parameter 𝛾 controls the trade-off between the approximation error ‖𝑤|Itba − 𝑤̂|Itba‖𝑣|Itba and the nuclear norm of the Hankel matrix, which is a surrogate for the system’s complexity. Generally, 𝛾 should be chosen large enough in order to ensure the desired rank (19). In Dreesen and Markovsky (2019) it is suggested to consider a weighted data matrix [ 𝛼H𝛿(Wd)  H𝛿(𝑤̂) ], where 𝛼 ≥ 1. It is shown that for 𝛼 above a certain threshold the solution of (26) coincides with the solution of (25), i.e., the missing data is recovered exactly by solving (26).

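With exact data and uniform weights, the linear-equations representation (18) reduces the missing data problem (24) to ordinary least squares: deleted samples of a trajectory are recovered through a pseudo-inverse of the row-selected Hankel matrix. The sketch below (same hypothetical first-order example system as before; helper names are ours) recovers five deleted samples exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = 0.8, 1.0   # example system y(t+1) = a y(t) + b u(t)

def trajectory(u, y0=0.0):
    y = np.empty(len(u))
    y[0] = y0
    for t in range(len(u) - 1):
        y[t + 1] = a * y[t] + b * u[t]
    return np.column_stack([u, y])

def hankel_blocks(w, L):
    T, q = w.shape
    return np.vstack([w[i:i + T - L + 1, :].T for i in range(L)])

wd = trajectory(rng.standard_normal(80))             # exact data W_d
L = 12
w_true = trajectory(rng.standard_normal(L), y0=2.0).reshape(-1)

# Delete five entries; the remaining ones are given with uniform weights.
missing = np.array([3, 7, 10, 15, 20])
given = np.setdiff1d(np.arange(2 * L), missing)

H = hankel_blocks(wd, L)
g = np.linalg.pinv(H[given, :]) @ w_true[given]      # least-squares coefficient
w_hat = H @ g                                        # completed trajectory
print(np.allclose(w_hat[missing], w_true[missing]))  # True: exact recovery
```

Recovery is exact here because the given entries determine the length-𝐿 trajectory uniquely; for patterns with too many missing entries the completion is no longer unique and the pseudo-inverse returns only one candidate.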

The other approach for solving (24) is to use the linear equations representation (18). It leads to the equality-constrained least-squares problem (Markovsky & Dörfler, 2021)

    minimize over g and ŵ   ‖w|_Itba − ŵ|_Itba‖_{v|Itba}
    subject to   ŵ = H_L(Wd) g   and   ŵ|_Iexact = w|_Iexact,        (27)

which admits a closed-form solution (Golub & Van Loan, 1996, Chapter 12). Note also that (27) has no hyperparameters. In case of uniform weights (i.e., ν_i = constant, for all i ∈ Itba) and no exact data (i.e., Iexact = ∅), the solution of (27) is

    ŵ = H_L(Wd) (H_L(Wd)|_Itba)† w|_Itba,        (28)

where M† is the pseudo-inverse of M.

The approach (27) using the linear equations representation is at first glance superior to the one using the low-rank Hankel matrix approximation and completion (25), due to the simplicity of the solution (28) and due to its effective modifications for the case of inexact data presented in the next section. In case of inexact data, however, a modification of (25) also leads to methods for computing the statistically optimal maximum-likelihood estimator in the errors-in-variables setup (Markovsky, 2017).

4.4. Solution methods with inexact/noisy data Wd

In Section 4.3, we assumed that Wd is exact. In this section, we assume that Wd as well as w|_Itba are noisy and are generated in the errors-in-variables setup:

    wd^i = w̄d^i + w̃d^i, for i = 1, …, N,   and   w|_Itba = w̄|_Itba + w̃|_Itba,

where w̄d^i and w̄|_Itba are the true values of wd^i and w|_Itba, respectively, and w̃d^i, w̃|_Itba are the measurement noises, assumed to be zero mean and Gaussian with joint covariance matrix (diag(vd^1, …, vd^N, v|_Itba))^(−1). The true values of the signals are exact trajectories of a linear time-invariant system B̄ with complexity bounded by c = (m, ℓ, n), i.e.,

    w̄d^i ∈ B̄|_{T_i}, for i = 1, …, N,   w̄ ∈ B̄|_L,   and   B̄ ∈ L_c^q.        (29)

The maximum-likelihood estimation problem in the errors-in-variables setup (29) is (Markovsky, 2017)

    minimize over Ŵd, ŵ, and B̂   Σ_{i=1}^N ‖wd^i − ŵd^i‖²_{vd^i} + ‖w|_Itba − ŵ|_Itba‖²_{v|Itba}        (30)
    subject to   ŵd^i ∈ B̂|_{T_i}, for i = 1, …, N,   ŵ ∈ B̂|_L,   B̂ ∈ L_c^q,   and   ŵ|_Iexact = w|_Iexact.

By using the data-driven complexity characterization (19) and the fact that rank H_{ℓ+1}(W̄d) = (ℓ + 1)m + n, we restate the maximum-likelihood estimation problem (30) in a data-driven fashion as a mosaic-Hankel structured low-rank approximation and completion problem:

    minimize over Ŵd and ŵ   Σ_{i=1}^N ‖wd^i − ŵd^i‖²_{vd^i} + ‖w|_Itba − ŵ|_Itba‖²_{v|Itba}        (31)
    subject to   rank [H_{ℓ+1}(Ŵd)  H_{ℓ+1}(ŵ)] ≤ m(ℓ + 1) + n   and   ŵ|_Iexact = w|_Iexact.

The latter is a nonconvex optimization problem due to the rank constraint. Local optimization methods based on the variable projections are developed in Markovsky and Usevich (2013). Suboptimal solution methods presented later on are:

1. a sequential two-step model-based approach and
2. convex relaxations based on sparse regularization.

The approach based on the linear equations representation (18) leads to the problem

    minimize over g   ‖w|_Itba − H_L(Ŵd⋆)|_Itba g‖_{v|Itba}
    subject to   Ŵd⋆ ∈ arg min over Ŵd   Σ_{i=1}^N ‖wd^i − ŵd^i‖²_{vd^i}        (32)
                 subject to   rank H_{ℓ+1}(Ŵd) ≤ m(ℓ + 1) + n,

which is also non-convex due to the rank constraint in the inner optimization. Problem (32) is a bi-level program: the inner level is estimation of W̄d and the outer level is estimation of the missing data w|_Imissing using the estimate Ŵd⋆ of W̄d. Similar to (31), the bi-level problem (32) is amenable to either a sequential two-step procedure or a convex relaxation based on sparse regularization.

Two-step procedure: preprocessing of Wd

The bi-level optimization problem (32) generally cannot be separated in two independent problems. Indeed, the solution of the outer problem depends on the inner problem and, in general, Ŵd⋆ depends on the given data w|_Igiven, which includes the to-be-approximated w|_Itba and the exact w|_Iexact samples, as well as on Wd.

A heuristic two-step procedure estimates W̄d using Wd only:

1. preprocess Wd, aiming to remove the noise, and
2. using the ‘‘cleaned’’ signal Ŵd, find w|_Imissing.

Thus, the two-step procedure reduces the problem with inexact data to the already solved problem with exact data.

The maximum-likelihood estimation of W̄d from Wd and the prior knowledge (29) is

    minimize over Ŵd and B̂   Σ_{i=1}^N ‖wd^i − ŵd^i‖²_{vd^i}        (33)
    subject to   ŵd^i ∈ B̂|_{T_i}, for i = 1, …, N,   and   B̂ ∈ L_c^q.

The formulation (33), however, is still a nonconvex optimization problem. For its solution, we use the SLRA package (Usevich & Markovsky, 2014), which is based on local optimization and computes as a byproduct an estimate B̂ of the data generating system. Thus the two-step procedure becomes a model-based approach for solving (30): (1) using Wd, identify a model B̂; (2) using B̂ and w|_Igiven, do model-based estimation of w|_Imissing (problem (24), using B̂).

Although in general the two-step procedure is suboptimal, when dim Igiven ≤ mL + n the problem decouples and the two-step procedure is optimal, i.e., the solution of (30) coincides with the solution of (33) followed by (24).

A suboptimal heuristic for preprocessing Wd is to perform unstructured low-rank approximation of the Hankel matrix H_L(Wd) by truncation of the singular value decomposition (SVD). The resulting Algorithm 1 does not derive a parametric model B̂ of B̄ and thus may be referred to as data-driven. Algorithm 1 requires prior knowledge of the number of inputs m and the order n; however, it has no other hyperparameters.

Algorithm 1 Data-driven missing data estimation with low-rank approximation preprocessing.
Input: Wd, Igiven, w|_Igiven, m, and n.
1: Compute the SVD: H_L(Wd) = U Σ V⊤.
2: Let r := mL + n and let P ∈ R^{qL×r} be the submatrix of U consisting of its first r columns.
3: Compute ŵ := P (P|_Igiven)† w|_Igiven.
Output: ŵ.
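Algorithm 1 translates almost line-for-line into numpy. The sketch below is our own illustration for the scalar case (q = 1, a single data trajectory); the helper names and the toy second-order autonomous system are assumptions made for the demo. It builds the Hankel matrix H_L(Wd), truncates its SVD to rank r = mL + n, and fits the given samples through the truncated basis with a pseudo-inverse.

```python
import numpy as np

def hankel_matrix(w, L):
    """Hankel matrix H_L(w): the length-L windows of w as columns (scalar signal)."""
    return np.column_stack([w[j:j + L] for j in range(len(w) - L + 1)])

def missing_data_lra(wd, L, m, n, w_given, idx_given):
    """Sketch of Algorithm 1 for q = 1: SVD-truncation preprocessing, then a
    least-squares fit of the given samples through the rank-r basis."""
    U, _, _ = np.linalg.svd(hankel_matrix(wd, L), full_matrices=False)
    r = m * L + n                                    # rank of H_L(Wd) for exact LTI data
    P = U[:, :r]                                     # step 2: basis of the column span
    g = np.linalg.pinv(P[idx_given, :]) @ w_given    # step 3: fit the given samples
    return P @ g                                     # completed length-L window

# toy autonomous system (m = 0, n = 2): y(t) = 1.5 y(t-1) - 0.7 y(t-2)
def trajectory(y0, y1, T):
    y = [y0, y1]
    for _ in range(T - 2):
        y.append(1.5 * y[-1] - 0.7 * y[-2])
    return np.array(y)

wd = trajectory(1.0, 0.3, 30)        # "data" trajectory (exact, for the demo)
w_true = trajectory(0.5, -1.0, 6)    # window to be completed (L = 6)
w_hat = missing_data_lra(wd, L=6, m=0, n=2,
                         w_given=w_true[:3], idx_given=np.arange(3))
```

With exact data the columns of P span the restricted behavior, so the three missing samples of the new window are recovered exactly; with noisy data the same code returns the low-rank-preprocessed estimate.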


Regularized least-squares approaches

Other approximation methods for missing data estimation with inexact data Wd are the nuclear norm relaxation of (31)

    minimize over Ŵd and ŵ   Σ_{i=1}^N ‖wd^i − ŵd^i‖²_{vd^i} + ‖w|_Itba − ŵ|_Itba‖²_{v|Itba} + γ ‖[H_δ(Ŵd)  H_δ(ŵ)]‖_*        (34)
    subject to   ŵ|_Iexact = w|_Iexact

and the ℓ1 regularization of (32) (Markovsky & Dörfler, 2021)

    minimize over g   ‖w|_Itba − H_L(Wd)|_Itba g‖²_{v|Itba} + λ ‖g‖_1        (35)
    subject to   ŵ|_Iexact = w|_Iexact.

The solution of (34) depends on the choice of the hyperparameters δ and γ. Empirical evidence suggests that the optimal value for δ is the largest one, δ = L. The hyperparameter γ controls the fitting accuracy versus model complexity trade-off. Since larger γ implies larger approximation error ‖w|_Itba − ŵ|_Itba‖_{v|Itba}, the optimal choice is the smallest value for which the rank constraint is met. It can be found by bisection (Markovsky, 2012).

The ℓ1-norm regularization method (35) is based on the fact that, in case of exact data Wd, g can be chosen sparse. Namely, with ‖g‖_0 denoting the number of nonzero elements of g, ‖g‖_0 = mL + n. Then, the 1-norm ‖g‖_1 can be used as a convex relaxation of ‖g‖_0. This method is proposed in Dörfler et al. (2021a) for solving a related data-driven control problem and is used for data-driven interpolation in Markovsky and Dörfler (2021). For the numerical solution of (35) in the examples of Section 4.5, we use CVX (Grant & Boyd, 2008) and the ADMM method (Parikh & Boyd, 2014).

To summarize: problems (30)–(32) are equivalent and have the same hyperparameters (m, ℓ, n). They are the statement and data-driven reformulations of the underlying maximum-likelihood data-driven missing data estimation problem. Problem (33) is a reformulation of the inner problem in (32) as a maximum-likelihood model identification problem with the same hyperparameters (m, ℓ, n). Problems (30)–(33) are non-convex. Local optimization methods for solving them require a favorable initialization and are computationally expensive. In comparison, problems (34) and (35) are convex relaxations of the underlying problem (30). Their hyperparameters are the regularization coefficients γ and λ, and their solutions can be used as initializations for local methods solving (30). Nevertheless, the solutions of the convex relaxations (34)–(35) are also of interest in their own right and may result in favorable outcomes; see the case studies in Section 4.5. Also, there are efficient computational methods for (35) (Parikh & Boyd, 2014).

4.5. Numerical case studies

First, we compare the approximation methods for missing data estimation on a Monte-Carlo simulation example. Then, we compare the methods on real data.

Simulated data

The data generating system used in this section is the benchmark example of Landau et al. (1995). It is a 4th order single-input single-output system B̄ defined by a kernel representation (3) with parameter

    R(z) = [−0.5067  0.8864] z⁰ + [−0.2826  −1.3161] z¹ + [0  1.5894] z² + [0  −1.4183] z³ + [0  1] z⁴.

The trajectory wd is generated in the errors-in-variables setup, with w̄d ∈ B̄|_100 a random trajectory of B̄. The noise standard deviation is selected to match a desired noise-to-signal ratio of w̄d. In the experiments, the noise-to-signal ratio is varied in the interval [0, 0.1] (i.e., up to 10% noise). The to-be-interpolated trajectory w̄ is the step response of B̄ from an input u = w_1 to an output y = w_2. The interpolation horizon is L = 10. Note that in this setup the two-step model-based method, (33) followed by (24), is optimal, i.e., it solves the maximum-likelihood estimation problem (30).

As a performance metric, consider the estimation error

    e_missing := ( ‖w̄|_Imissing − ŵ|_Imissing‖ / ‖w̄|_Imissing‖ ) · 100%,

where ŵ is the computed solution. The error is averaged over 100 Monte-Carlo repetitions of the experiment with different noise realizations.

Fig. 3. The results of data-driven simulation using noisy data obtained in the errors-in-variables setting confirm empirically that the maximum-likelihood method (2s-ml) is statistically optimal. The performance of the two-step method with low-rank preprocessing (lra) is comparable with the one of the pseudo-inverse method (28) (pinv). Their performance is worse than the one of the maximum-likelihood method but better than the one of the ℓ1-norm regularization method (l1).

Fig. 3 shows that the two-step model-based method (2s-ml) achieves the smallest average estimation error. This is expected because it is the maximum-likelihood method for the specific simulation setup considered. The maximum-likelihood method requires nonlinear local optimization, but it sets a lower bound on the estimation error achievable by the other methods, which are cheaper to compute albeit suboptimal in the maximum-likelihood sense. The results show that Algorithm 1, i.e., the two-step method with low-rank approximation preprocessing (lra), is marginally better than the method based on the pseudo-inverse (28) (pinv). Note that the low-rank approximation preprocessing method requires knowledge of the model complexity in order to achieve the ‘‘right’’ complexity reduction, while the method based on the pseudo-inverse does not require any prior knowledge. Because of this, the similar performance of pinv and lra is surprising.

The ℓ1-norm regularization method (35) (l1) with optimal choice of the hyperparameter λ = 0.1 gives the worst results. As shown in the next section, however, this is not the case when real-life data is used. Empirical evidence by Huang, Coulson et al. (2021), Markovsky and Dörfler (2021) and Wegner et al. (2021) also confirms the good performance of the ℓ1-norm regularization for noisy data coming from nonlinear systems.

Real data: Air passengers data benchmark

The data set used in this section is a classic time-series forecasting benchmark of Box and Jenkins (1976). It consists of 144 samples that represent the monthly totals of international airline passengers (in thousands of passengers) between 01/1949 and 12/1960. We use the first 110 samples as the given trajectory wd and the remaining 34 samples as the to-be-interpolated trajectory w. From w, the first half is the given data w|_Itba and the second half w|_Imissing is missing (see Fig. 4, up). For Algorithm 1, we set the parameters m = 0 (no inputs) and n = 6 (the best value obtained by trial-and-error). The results in Table 4 (see also Fig. 4, down) show that the ℓ1-norm regularization method (35) (l1) with optimized value of λ achieves the best prediction. Second best is the solution based on the pseudo-inverse (28) (pinv).


Table 4
Performance of the data-driven missing data estimation methods on the Box–Jenkins airline passenger benchmark.

             e_given, %    e_missing, %
    pinv     0             3.9168
    lra      4.0384        5.2688
    l1       3.3664        3.3387
    2s-ml    4.0572        Fail

Fig. 4. Up: splitting of the data into wd (wd), w|_Itba (w(Itba)), and w|_Imissing (w(Imissing)). Down: predictions obtained by the methods. The ℓ1-norm regularization method (35) (l1) with optimized value of λ achieves the best prediction.

The two-step model-based method 2s-ml fails (relative error above 100%) and the low-rank preprocessing also does not improve the result of (28). The poor performance of 2s-ml and lra is attributed to the fact that wd does not satisfy a true linear time-invariant model; however, 2s-ml and lra use this as prior knowledge and enforce it in the preprocessing step. On the other hand, l1 and pinv are based on a non-parametric representation which does not impose an a priori given bound on the model's complexity.

4.6. Data-driven analysis

Further system analysis problems were addressed using the data-driven representation (17). van Waarde et al. (2020) proposed stability, controllability, and stabilizability tests (for details see Section 5.3). Koch et al. (2020), Maupong et al. (2017), Romer et al. (2019) and Rosa and Jayawardhana (2021) considered data-driven dissipativity analysis, Monshizadeh (2020) considered data-driven model reduction, and Markovsky (2015) considered estimation of the DC-gain from a finite number of samples of a step response. Data-driven analysis results for polynomial systems are presented in Martin and Allgöwer (2021a).

5. Data-driven control

Data-driven control methods can be loosely classified into indirect data-driven control approaches, consisting of sequential system identification and model-based control, as well as direct data-driven control approaches, seeking an optimal decision compatible with data recorded from the system. Both approaches have a rich history, and they have received renewed interest cross-fertilized by novel methods and widespread interest in machine learning. Representative recent surveys for indirect and direct approaches are by Chiuso and Pillonetto (2019), Hewing et al. (2020), Hjalmarsson (2005), Hou and Wang (2013), Recht (2019) and Pillonetto et al. (2014), respectively.

The pros and cons of both paradigms have often been elaborated on. Whereas the indirect approach is modular and well understood, modeling and identification is cumbersome, its results are often not useful for control (due to, e.g., incompatible uncertainty quantifications), and practitioners often prefer end-to-end approaches. Direct methods promise to resolve these problems by learning control policies directly from data. However, they are often analytically and computationally less tractable and rarely apply to real-time and safety-critical systems.

The methods reviewed in this article, based on the fundamental lemma, lend themselves both for direct as well as indirect approaches. Regarding the indirect approaches, the fundamental lemma in Willems et al. (2005) has historically been developed as a foundation for subspace system identification methods based on an experiment design perspective. We refer to Markovsky et al. (2006) for a discussion on how the fundamental lemma relates to the indirect approach (i.e., system identification) and focus on direct data-driven control here.

5.1. Open-loop data-driven linear quadratic tracking

Our exposition follows up on the approaches presented in Section 4, but we impose more structure in this section. As an extension to the tracking problem (22), consider the linear quadratic (LQ) optimal tracking control problem

    minimize over u_f, y_f   Σ_{t=1}^{T_f} ‖y_f(t) − y_r(t)‖²_Q + ‖u_f(t) − u_r(t)‖²_R        (36)
    subject to   (u_ini, y_ini) ∧ (u_f, y_f) ∈ B|_{T_ini + T_f}

on a finite horizon T_f > 0, where w_r = (u_r, y_r) ∈ R^{qT_f} is a user-defined reference trajectory (not necessarily in B|_{T_f}), w_f = (u_f, y_f) ∈ R^{qT_f} is the future trajectory of length T_f ≥ 1 to be designed, and w_ini = (u_ini, y_ini) is a given prefix trajectory of length T_ini ≥ ℓ setting the initial condition; see Lemma 1. Further, Q ⪰ 0 and R ≻ 0 are user-defined weighting matrices, where ≻ (⪰) and ≺ (⪯) denote positive and negative (semi)definiteness, respectively, and ‖e‖_Q = √(e⊤ Q e) is a (semi-)norm for Q ⪰ 0. We remark that a quadratic cost is convenient but not strictly necessary for many of the approaches reviewed in this section.

The LQ control problem (36) is an instance of errors-in-variables Kalman smoothing (21). Problem (36) is standard and can be solved by a variety of methods provided that a parametric model (typically in state–space representation) of B is available (Anderson & Moore, 2007). In what follows, we survey direct data-driven approaches related to the fundamental lemma and the data-driven image representation (17) of B|_{T_ini + T_f}.

For simplicity, this section considers only a single data trajectory wd and the Hankel matrix H_{T_ini + T_f}(wd). Extensions to multiple trajectories and mosaic-Hankel matrices are possible.

5.1.1. Data-driven approach to finite-time LQ control

Given data wd collected offline which is persistently exciting of sufficient order, the fundamental lemma implies that the concatenated initial and future trajectory w := w_ini ∧ w_f ∈ B|_{T_ini + T_f} lies in the image of H_{T_ini + T_f}(wd), that is, w = H_{T_ini + T_f}(wd) g for some g. According to


w_ini = (u_ini, y_ini) and w_f = (u_f, y_f), permute and partition the Hankel matrix as

    [w_ini; w_f] ∼ [u_ini; u_f; y_ini; y_f],   H_{T_ini + T_f}(wd) ∼ [U_p; U_f; Y_p; Y_f] = [H_{T_ini + T_f}(u_d); H_{T_ini + T_f}(y_d)],

where ∼ denotes similarity under a coordinate permutation. With this notation in place, the LQ tracking control problem (36) can be posed in the equivalent data-driven formulation

    minimize over u_f, y_f, g   ‖y_f − y_r‖²_Q + ‖u_f − u_r‖²_R
    subject to   [U_p; Y_p; U_f; Y_f] g = [u_ini; y_ini; u_f; y_f],        (37)

where (with slight abuse of notation) we redefined Q and R as blkdiag(Q, …, Q) and blkdiag(R, …, R), respectively.

The data-driven LQ control formulation (37) has first been presented and analyzed by Markovsky and Rapisarda (2008), and an explicit solution has been proposed. An earlier precursor and solution to data-driven LQ control based on the fundamental lemma is due to Fujisaki et al. (2004). Their approach is geometric, and the design is based on controllable and reachable subspaces which can be constructed from H_{T_ini + T_f}(wd).

If the underlying state is directly available, Q = 0, T_f ≥ n, and a terminal condition on y_r(T_f) is imposed, the LQ control formulation (36) reduces to classic minimum energy control, and a similar data-driven solution has been investigated by Baggio et al. (2019) and follow-up articles (Baggio et al., 2021; Baggio & Pasqualetti, 2020). Based on numerical case studies, Baggio et al. (2019) concluded that the direct data-driven approach displays superior performance (especially for large data size and state dimension) over the explicit (model-based) minimum energy control formula invoking the controllability Gramian.

If the initial conditions are not a priori given, the data-driven LQ control problem (37) entails both estimation of an initial prefix trajectory w_ini (equivalent to imposing the initial condition of a latent state variable; see Lemma 1) as well as prediction and optimization of the future system behavior w_f. It is clean, tractable, and theoretically insightful, albeit it is not immediately clear how to extend it beyond the setting of exact data wd or how to derive closed-form feedback control policies in the infinite-horizon setting. These questions will be further pursued in Sections 5.2 and 5.3. Before that, we briefly review a historic precursor to the LQ control formulation (36).

5.1.2. Subspace predictive control

Subspace predictive control (SPC), coined by Favoreel et al. (1999), is an early data-driven control approach originating from subspace system identification, which has seen plenty of theoretical developments and practical applications; see Huang and Kadali (2008) for a survey. Although SPC historically predates the fundamental lemma, it can be nicely introduced within the framework of the previous section. SPC seeks a linear relation, i.e., a matrix K, relating past and future inputs and outputs as

    y_f = [K_p  K_f] [u_ini; y_ini; u_f],   where K := [K_p  K_f].        (38)

The multi-step predictor K can be found from data by replacing the variables (u_ini, y_ini, u_f, y_f) in (38) by the Hankel matrix data (U_p, Y_p, U_f, Y_f) and solving for K approximately in the least-squares sense, that is (Huang & Kadali, 2008, Section 3.4),

    K = arg min over K̂   ‖Y_f − K̂ [U_p; Y_p; U_f]‖_F = Y_f [U_p; Y_p; U_f]†,        (39)

where ‖·‖_F denotes the Frobenius norm, and uniqueness holds under full rank conditions descending from, e.g., the fundamental lemma. For exact data, (38)–(39) is an ARX model with rank(K_p) = n assuring LTI behavior of desired complexity and a lower block-triangular zero pattern of K_f assuring causality. For inexact data, LTI behavior of desired complexity is promoted by low-rank approximation (typically, via singular-value thresholding of K_p) (Favoreel et al., 1999). By heuristically thresholding K_f towards a block-triangular zero pattern one aims to gain causality (Huang & Kadali, 2008, Remark 10.1).

These steps bring the linear relation (38) half-way towards an LTI model, though a model has further structure, e.g., K_f is Toeplitz, and the entries of K_p and K_f are coupled. Nevertheless, the linear relation (38)–(39) without further post-processing has demonstrated excellent performance as a data-driven predictor employed in receding-horizon predictive control across various case studies; see Huang and Kadali (2008), Lu et al. (2014), Vajpayee et al. (2017) and Zeng et al. (2010) for an overview.

Connections between SPC and data-driven LQ control

The close connections between the SPC predictor (38)–(39) and the direct data-driven LQ control problem (37) have been remarked upon a few times (Dörfler et al., 2021a; Fiedler & Lucia, 2021; Huang et al., 2019), and we summarize them below.

Observe that the variable g can be eliminated from the constraint of (37) as y_f = Y_f g, where g is any solution to the remaining constraint equations. Whereas the solution g is not necessarily unique, the resulting output y_f is unique (Markovsky & Rapisarda, 2008, Proposition 1). One choice is the associated least-norm solution, that is, y_f = Y_f g⋆, where

    g⋆ = [U_p; Y_p; U_f]† [u_ini; y_ini; u_f]
       = arg min over g   ‖g‖_2   subject to   [U_p; Y_p; U_f] g = [u_ini; y_ini; u_f].        (40)

With this reformulation and elimination of g the direct data-driven LQ control problem (37) reduces to

    minimize over u_f, y_f   ‖y_f − y_r‖²_Q + ‖u_f − u_r‖²_R
    subject to   y_f = Y_f [U_p; Y_p; U_f]† [u_ini; y_ini; u_f],        (41)

that is, we recover the multi-step SPC predictor (38)–(39).

Observe that the reformulation of the direct data-driven LQ control problem (37) towards SPC was only possible since g was unconstrained, unpenalized, and, in case of exact data, any solution results in the same output. In case of inexact data, the reformulation (40) suggests a regularization of the LQ problem (37) with ‖g‖_2 to filter out noise akin to least squares (39). We will further pursue this line of ideas in the next section.

5.2. Data-enabled predictive control

For deterministic LTI systems the direct data-driven LQ tracking control (37) can be implemented at face value in a receding-horizon predictive control fashion; see Yang and Li (2013) or the related SPC literature (Huang & Kadali, 2008). Indeed, in this case, it can be shown that the data-driven LQ problem (37) is equivalent to a model-based predictive control (MPC) formulation (Coulson et al., 2019a, 2020). When departing from deterministic LTI systems and exact data, it is tempting to opt for a certainty-equivalence implementation, that is, to implement the control as in (37) despite not satisfying the assumptions. However, the latter approach fails. This can be intuitively understood from the perspective of dictionary learning. The columns of the Hankel matrix H_{T_ini + T_f}(wd) serve as a library of trajectories, and the LQ problem (37) linearly combines these trajectories to synthesize the optimal control trajectory. However, a superposition of trajectories from


B|_{T_ini + T_f} is again a valid trajectory of B|_{T_ini + T_f} only for linear systems. Even in the linear stochastic case a superposition of trajectories does not generally preserve the noise statistics, e.g., a linear combination of Gaussian random variables with identical variance equals another Gaussian random variable though with a generally different variance. Even more detrimental: a Hankel matrix H_{T_ini + T_f}(wd) built from noisy data will likely have full rank, not reveal an LTI behavior of bounded complexity, and any optimal control trajectory w is feasible for (37), that is, the predicted optimal trajectory can be arbitrarily optimistic and non-realizable when applied to the real system.

Aside from the above issues related to the data wd collected offline, the data w_ini = (u_ini, y_ini) collected online (before implementing an instance of the optimal control (37)) is typically noise-corrupted as well, which leads to feasibility issues and further deterioration of the realized control performance.

For these reasons the certainty-equivalence approach has to be replaced by a robust one. Below we review Data-EnablEd Predictive Control (known by its acronym DeePC), coined by Coulson et al. (2019a) as a robustified receding-horizon implementation of the direct data-driven LQ tracking control (37).

5.2.1. Robustified formulation of the direct data-driven LQ optimal control problem & DeePC

As previously discussed, the need for robustification of the direct data-driven LQ problem (37) is two-fold. First, note that when implementing (37) in receding horizon, the data w_ini = (u_ini, y_ini) is measured and repeatedly updated online. In case of inexact data, due to measurement noise and input/output disturbances, the constraint equations U_p g = u_ini and Y_p g = y_ini determining the initial behavior may not be feasible. As a remedy, DeePC opts for a moving-horizon least-error estimation (Rawlings et al., 2017) and softens these constraints as

    [U_p; Y_p] g = [u_ini + σ_uini; y_ini + σ_yini],        (42)

where σ_uini and σ_yini are slack variables penalized in the cost.

Second, aside from the above additive uncertainty, the data-driven LQ problem (37) is also subject to multiplicative uncertainty, since the data matrices U_p, Y_p, U_f, and Y_f are also subject to noise. This noise can be mitigated offline by pre-processing the trajectory library (e.g., by seeking a low-rank approximation of H_{T_ini + T_f}(wd) as in Section 4.4), but in the spirit of direct data-driven control, seeking an online decision based on raw data, DeePC opts for a regularization of the LQ problem (37). In particular, a nonnegative term h(g) is added to the cost function. This regularization term will be justified later in Section 5.2.2, but the attentive reader may recall from Section 4.4 that h(g) = ‖g‖_1 corresponds to a convex relaxation of a low-rank approximation denoising scheme, and ‖g‖_2 is connected to a pre-conditioning of the predictor à la SPC in (40).

A third minor, yet practicably important, modification is to augment the data-driven LQ problem (37) with input and output constraints u_f ∈ U and y_f ∈ Y, respectively. These can account for, e.g., input saturation, operational limits, or terminal constraints needed for closed-loop stability of the predictive control scheme (Borrelli et al., 2017; Rawlings et al., 2017).

These three modifications give rise to the DeePC problem

    minimize over u_f, y_f, g, σ_uini, σ_yini   ‖y_f − y_r‖²_Q + ‖u_f − u_r‖²_R + λ_uini ‖σ_uini‖²_2 + λ_yini ‖σ_yini‖²_2 + λ_g · h(g)        (43)
    subject to   [U_p; Y_p; U_f; Y_f] g = [u_ini + σ_uini; y_ini + σ_yini; u_f; y_f]   and   (u_f, y_f) ∈ U × Y,

where λ_uini, λ_yini, and λ_g are nonnegative scalar regularization coefficients (hyperparameters). Many variations of the estimation penalty λ_uini ‖σ_uini‖²_2 + λ_yini ‖σ_yini‖²_2 are conceivable, e.g., choosing norms weighted by inverse noise covariances, disregarding the penalty on σ_uini in absence of input noise, or removing the squares in the spirit of exact penalization, i.e., for sufficiently large (λ_uini, λ_yini) the slack variables (σ_uini, σ_yini) take a non-zero value only if the constraints are infeasible.

Observe that the DeePC formulation (43) can be compactified by eliminating the variables u_f, y_f, σ_uini, σ_yini:

    minimize over g   ‖Y_f g − y_r‖²_Q + ‖U_f g − u_r‖²_R + λ_yini ‖Y_p g − y_ini‖²_2 + λ_uini ‖U_p g − u_ini‖²_2 + λ_g · h(g)        (44)
    subject to   (U_f g, Y_f g) ∈ U × Y.

In fact, in absence of constraints, (44) takes the form of a regularized regression problem

    minimize over g   ‖H_{T_ini + T_f}(wd) g − w_r,ini‖²_P + λ_g · h(g),        (45)

where P is the block-diagonal matrix blkdiag(λ_uini I, λ_yini I, R, Q) and w_r,ini = (u_ini, y_ini, u_r, y_r). The latter compact formulation does not only provide a regression perspective on the DeePC problem, but also motivates the use of Bayesian, non-parametric, or robust regression methods to approach and extend the DeePC problem formulation (45).

5.2.2. Robustification of DeePC by means of regularization

The regularization term h(g) in (43)–(45) is needed to robustify the optimal control design in case of inexact data wd arising from possibly non-deterministic and nonlinear processes. The regularizations have first been proposed heuristically by Coulson et al. (2019a) before being constructively derived. Different assumptions on the data lead to different regularizers. In what follows, we briefly review five different variations.

(1) Regularization derived from pre-processing

In case of inexact data, the matrix H_{T_ini + T_f}(wd) will generically not have the desired rank m(T_ini + T_f) + n and will not reveal an LTI behavior of desired complexity. As in Section 4.4, the noisy data matrix can be pre-processed via structured low-rank approximation. Formally, this can be posed as a bi-level optimization problem: namely, solve the optimal control problem subject to pre-processing of the data matrix as in (33):

    minimize over g   ‖H_{T_ini + T_f}(ŵd⋆) g − w_r,ini‖²_P
    subject to   ŵd⋆ ∈ arg min over ŵd and B̂   ‖wd − ŵd‖        (46)
                 subject to   ŵd ∈ B̂|_T and B̂ ∈ L_c^q.

This non-convex bi-level decision making problem can be formally reduced and convexified as in (35), leading to the direct DeePC formulation (45) with an ℓ1-norm regularization h(g) = ‖g‖_1; see Dörfler et al. (2021a, Theorem 4.6) for details.

(2) Regularization derived from least-square identification

As a second source of regularization, consider solving the optimal control problem (37) (neglecting constraints and noisy estimation for simplicity) with an ARX predictor as in SPC (38), where the multi-step predictor K is found by ordinary least squares as in (39). This procedure can be formally posed again as a non-convex bi-level decision making problem:

    minimize over u_f, y_f   ‖y_f − y_r‖²_Q + ‖u_f − u_r‖²_R
    subject to   y_f = K⋆ [u_ini; y_ini; u_f],
                 K⋆ = arg min over K = [K_p  K_f]   ‖Y_f − [K_p  K_f] [U_p; Y_p; U_f]‖_F        (47)
                 subject to   rank(K_p) = n and K_f lower-block triangular.


Here, the rank constraint on 𝐾p promotes an LTI behavior of desired complexity, and the lower-block triangular structure of 𝐾f assures causality, as discussed after (39). When dropping these constraints and re-parameterizing the least-square criterion by a least-norm problem as in (40), the DeePC formulation (43) can be derived as a convex relaxation of (47) with regularizer

  ℎ(𝑔) = ‖(𝐼 − col(𝑈p, 𝑌p, 𝑈f)† col(𝑈p, 𝑌p, 𝑈f)) 𝑔‖,      (48)

where ‖⋅‖ is any norm; see Dörfler et al. (2021a, Theorem 4.5) for details. This projection-based regularizer assures that a particular least-norm solution 𝑔 is singled out corresponding to the least-square criterion in (40). Finally, we note that the projection-based regularizer (48) is consistent, i.e., the solution of the LQ problem (37) also solves the regularized problem (43) in case of exact data 𝑤d. In comparison, mere norm-based regularizers ℎ(𝑔) = ‖𝑔‖ are not consistent and bias the solution.

Synopsis: The above two direct and regularized data-driven control approaches are due to reducing and convexifying the indirect ''first pre-process/identify and then control'' problems. In either case, the magnitude of the regularization coefficient 𝜆𝑔 determines to which extent the inner pre-processing/identification problems are (approximately) enforced. However, unlike (46) or (47), no projection on the class of LTI models of desired complexity is enforced. As a result, noise is not entirely removed (no variance reduction), but no erroneous model selection (no bias) is encountered. These bias–variance trade-off discussions give an intuition when indirect data-driven control approaches are inferior (respectively, superior) to a direct DeePC formulation; see Dörfler et al. (2021a) for a discussion. Further trade-offs between the direct and indirect approaches are discussed by Krishnan and Pasqualetti (2021), concluding that either approach can be superior depending on prediction horizon, state dimension, noise level, and size of the data set.

(3) Regularizations derived from robust optimization

An entirely different route towards regularization can be derived by robustifying the regression-based DeePC formulation (45) (without constraints and regularization) as in related robustified regression problems (Bertsimas & Copenhaver, 2018; El Ghaoui & Lebret, 1997; Xu et al., 2010)

  minimize over 𝑔   maximize over 𝑤̂d ∈ W(𝑤d)   ‖H𝑇ini+𝑇f(𝑤̂d) 𝑔 − 𝑤r,ini‖²_𝑃,      (49)

where W(𝑤d) is an uncertainty set typically centered at the collected offline data 𝑤d. Huang, Jianzhe et al. (2021) and Huang, Zhen et al. (2021) consider different structured and unstructured uncertainty sets ranging from mere norm balls, over interval-valued and column-wise uncertainties (relevant for a trajectory matrix structure), to uncertainties with Hankel structure. For each of these, Huang, Jianzhe et al. (2021) and Huang, Zhen et al. (2021) propose tractable reformulations, many of which take the form (45) with norm-based regularization terms ℎ(𝑔). Moreover, Huang, Jianzhe et al. (2021) also consider the case of robustified constraints and provide bounds on the realized system performance.

(4) Regularizations derived from distributional robustness

A similar (albeit stochastic) perspective leading to regularization is due to distributional robustness (Kuhn et al., 2019). Problem (44) (without regularization term) can be abstracted as

  minimize over 𝑔 ∈ G(𝑤d)   𝑓(𝑤d, 𝑔),      (50)

where G(𝑤d) and 𝑓(𝑤d, 𝑔) denote the constraint set and objective of (44), respectively. Since the data 𝑤d has arisen from a stochastic process, one may equivalently rewrite (50) as

  minimize over 𝑔 ∈ G(𝑤d)   E_{𝑤̂d∼P̂} [𝑓(𝑤̂d, 𝑔)],      (51)

where P̂ is the associated empirical distribution built using the measured data 𝑤d, i.e., the measure of 𝑤̂d concentrates on 𝑤d.

If the solution of the sample-average problem (51) is implemented on the real system, one suffers an out-of-sample loss since the true data-generating distribution P is due to some (possibly nonlinear, non-stationary, non-Gaussian) stochastic process that is only poorly represented by the samples P̂. To be robust against such processes, Coulson et al. (2019b, 2020) propose the distributionally robust DeePC formulation

  inf over 𝑔 ∈ G   sup over Q ∈ B𝑝𝜖(P̂)   E_{𝑤̂d∼Q} [𝑓(𝑤̂d, 𝑔)],      (52)

where the ambiguity set B𝑝𝜖(P̂) is a Wasserstein ball of radius 𝜖 > 0, centered at P̂, and with metric induced by the 𝓁𝑝-norm. One can show that, under integrability conditions and for a Lipschitz objective, the distributionally robust formulation (52) is equivalent to the regularized DeePC (44) with 𝜆𝑔 being 𝜖 times the Lipschitz constant of the cost and with regularizer ℎ(𝑔) = ‖𝑔‖𝑝⋆, where ‖⋅‖𝑝⋆ is the dual norm of the one used to construct the Wasserstein ball (Coulson et al., 2020, Theorem 4.1). For example, safeguarding against uncertainty in the 𝓁∞-norm in the space of trajectories is equivalent to 𝓁1-norm regularization.

The same methods can also be applied to distributionally robustify stochastic formulations of constraints (Coulson et al., 2020). Furthermore, data compression and sample-complexity results are in Coulson et al. (2020) and Fabiani and Goulart (2020).

(5) Regularization related to robust control

Xue and Matni (2020) propose a data-driven formulation of the system level synthesis (SLS) subspace constraint (Anderson et al., 2019) (parameterizing the admissible closed-loop responses) by means of the non-parametric representation (17) and assuming full state measurements. The resulting robust LQ formulation (with bounded adversarial disturbances on the data matrix) results again in a norm-based regularization (Xue & Matni, 2020, eq. (3.9)). Building on this connection to SLS, Lian and Jones (2021b) provide an extension to a class of uncertain LTI systems and a DeePC formulation with disturbance-affine feedback. Furthermore, Furieri et al. (2021a, 2021b) extend these works towards safety constraints, measurement and process noise, and partial observations (i.e., output feedback) by deriving a data-driven reformulation of the recently proposed input–output parametrization (Furieri et al., 2019). The authors also address robust and constrained linear quadratic Gaussian (LQG) design, provide a tractable upper bound, and a suboptimality certificate with respect to the ground-truth LQG control.

Aside from regularization, we also mention the following methods seeking a robust DeePC formulation. Xu et al. (2021) consider measurement noise within ellipsoidal uncertainty sets, characterize noise sequences consistent with the data, and transform the robust control problem to a semidefinite program via the S-Lemma. Yin et al. (2020a, 2020b) propose a maximum likelihood framework for DeePC under Gaussian noise: namely, starting from (37), a vector 𝑔 is sought that maximizes the likelihood of observing both the predicted and the measured output trajectories 𝑦f and 𝑦ini, respectively. The resulting formulation is amenable to a sequential quadratic programming approach.

Last, we remark that all of the above approaches towards robustified DeePC empirically show excellent closed-loop performance and may outperform each other depending on the specific problem scenario, noise characteristics, etc., as demonstrated in various case studies; see also Section 5.2.4. In fact, regularization is a key aspect in the formal closed-loop stability and robustness analysis by Berberich et al. (2021b) (see also Section 5.2.3) for noisy data or when interconnecting DeePC with a nonlinear system. Note that certain regularizers can be justified in multiple ways, e.g., the 𝓁1-norm regularizer might arise due to low-rank pre-processing, a robust regression formulation, or a distributionally robust formulation. Finally, both theoretic and empirical


results show that the performance of some robustifications is superior when choosing particular data matrix structures, e.g., Page and trajectory matrices (15)–(16) with independent columns instead of Hankel matrices (Coulson et al., 2020; Huang, Coulson et al., 2021; Huang, Jianzhe et al., 2021).
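To make these data matrix structures concrete, the following toy sketch contrasts a scalar-signal Hankel matrix (overlapping windows) with a Page matrix (non-overlapping windows). The simplified definitions and the example signal are our own illustration, not code from the paper:

```python
import numpy as np

def hankel_matrix(w, L):
    """Columns are all overlapping length-L windows of a scalar signal w."""
    T = len(w)
    return np.column_stack([w[j:j + L] for j in range(T - L + 1)])

def page_matrix(w, L):
    """Columns are consecutive non-overlapping length-L windows of w."""
    T = len(w)
    return np.column_stack([w[j * L:(j + 1) * L] for j in range(T // L)])

w = np.arange(12.0)
H = hankel_matrix(w, 4)   # shape (4, 9), adjacent columns share entries
P = page_matrix(w, 4)     # shape (4, 3), columns use disjoint samples, hence
                          # statistically independent for an i.i.d. excitation
print(H.shape, P.shape)   # (4, 9) (4, 3)
```

The Page matrix uses each sample only once, which is why its columns can be treated as independent in the robustness analyses cited above.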

5.2.3. Closed-loop receding-horizon and recursive implementations: certificates and extensions
The previous subsections have focused mostly on the optimization
formulation of DeePC with exception of Furieri et al. (2021a, 2021b),
Huang, Jianzhe et al. (2021) and Xue and Matni (2020) that also
certify the realized open-loop control performance optimizing over
input vectors and affine control policies, respectively.
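As a concrete reference point for the optimization formulations discussed in this section, here is a minimal noise-free sketch of the Hankel-matrix predictor underlying DeePC, with no regularization or constraints. The first-order SISO system, horizons, and seed are our own illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def hankel(w, L):
    """Columns are overlapping length-L windows of a scalar signal w."""
    return np.column_stack([w[j:j + L] for j in range(len(w) - L + 1)])

# toy SISO system x+ = 0.8 x + u, y = x (our own illustrative choice)
def simulate(x0, u):
    x, y = x0, []
    for ut in u:
        y.append(x)
        x = 0.8 * x + ut
    return np.array(y)

Tini, Tf, T = 2, 4, 40
ud = rng.standard_normal(T)            # generically persistently exciting input
yd = simulate(0.0, ud)                 # offline data trajectory

Hu, Hy = hankel(ud, Tini + Tf), hankel(yd, Tini + Tf)
Up, Uf = Hu[:Tini], Hu[Tini:]          # past/future input Hankel blocks
Yp, Yf = Hy[:Tini], Hy[Tini:]          # past/future output Hankel blocks

# fresh trajectory: the past (u_ini, y_ini) fixes the initial condition,
# u_f is the future input for which the output should be predicted
u_new = rng.standard_normal(Tini + Tf)
y_new = simulate(1.0, u_new)
u_ini, y_ini, u_f = u_new[:Tini], y_new[:Tini], u_new[Tini:]

# least-norm g consistent with the data equation; then y_f = Yf g
g = np.linalg.lstsq(np.vstack([Up, Yp, Uf]),
                    np.concatenate([u_ini, y_ini, u_f]), rcond=None)[0]
y_f = Yf @ g
print(np.allclose(y_f, y_new[Tini:]))  # True: exact prediction on exact data
```

On noisy data this plain least-norm predictor overfits, which is precisely where the regularizers and robustifications reviewed above come in.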
There have also been various approaches towards certifying the closed-loop behavior when implementing DeePC as receding-horizon control. Notable is the sequence of papers (Berberich et al., 2020a, 2020b, 2021c, 2021d; Bongard et al., 2021). The initial work by Berberich et al. (2021b) provides closed-loop stability and robustness guarantees (in the sense of practical exponential stability) of DeePC (43) with ridge regularizer ℎ(𝑔) = ‖𝑔‖²₂ and terminal equilibrium constraints. Later articles extend and complement this work towards robust output constraint satisfaction (Berberich et al., 2020b), time-varying references (Berberich et al., 2020a), less restrictive terminal ingredients (Berberich et al., 2021d), implementations without terminal ingredients (Bongard et al., 2021), and linear tracking control for nonlinear systems with online data adaptation (Berberich et al., 2021c). Independently, Berberich et al. (2021c) also provide a version of the fundamental lemma for affine systems resulting from linearizing nonlinear systems. In comparison to other nonlinear fundamental lemma extensions (briefly reviewed after Lemma 2) which gave rise to DeePC implementations, Berberich et al. (2021c) also certify the resulting nonlinear closed-loop properties.

Alpago et al. (2020) depart from the receding-horizon estimation (42) in DeePC towards a recursive Kalman filtering approach using the parametric solution of the optimization problem (45) to construct a hidden state. As an independent side note, the parametric solution takes the form of a piece-wise linear affine feedback policy (compared to standard linear feedback policies), which sheds further light on the remarkable performance of DeePC when applied to nonlinear systems.

Finally, recently Baros et al. (2020), Bianchin et al. (2021) and Nonhoff and Müller (2021) applied online feedback optimization (i.e., iterative algorithms in feedback with a system (Hauswirth et al., 2021)) to steer an LTI system characterized by the fundamental lemma. In a similar algorithmic spirit, Allibhoy and Cortés (2020) and Alexandru et al. (2021) consider networked and distributed DeePC implementations, respectively.

5.2.4. DeePC: implementations and tuning recommendations

At its core, DeePC is a method for deterministic LTI systems based on super-imposing trajectories from a library. It is due to the various robustifications reviewed in this section that DeePC ''works'' for nonlinear and stochastic systems, as spectacularly showcased by many experimental and numerical case studies. Below we summarize some practical validations and recommendations for tuning the DeePC hyperparameters.

Notable experimental and computational DeePC case studies

Coulson et al. (2019a, 2019b, 2020) demonstrate the DeePC method on an aerial robotics simulation case study. Elokda et al. (2019) experimentally implemented this case study and for the first time demonstrated the performance, robustness, and real-time implementability of DeePC. A video of a quadcopter successfully tracking step commands and a figure-8 trajectory can be found here: https://polybox.ethz.ch/index.php/s/ZHacWoJbxQlHDTz. Further within the realm of robotics, DeePC has also been experimentally implemented to swing up a laboratory pendulum (Tischhauser et al., 2019) as well as to control a 12 ton autonomous walking excavator (Wegner et al., 2021); see Fig. 5 for an illustration. These two case studies are strongly nonlinear. While DeePC succeeds in meeting the specifications, these two case studies also reveal the limitations of the method and suggest an adaptive DeePC method to provide tracking for strongly nonlinear systems.

Fig. 5. The highly customized Menzi Muck M545 12 ton autonomous walking excavator from the HEAP (Hydraulic Excavator for an Autonomous Purpose) project (Jud et al., 2021) served as a demonstration platform for DeePC.

On the power systems and electronics side, motivated by an initial numerical case study (Huang et al., 2019), DeePC has been successfully experimentally implemented on grid-connected power converters (Huang, Zhen et al., 2021) and synchronous motor drives (Carlet et al., 2021, 2020); see Fig. 6 for an illustration of the laboratory setup. Further, Huang, Coulson et al. (2021) provide a decentralized DeePC implementation for power system oscillation damping in a large-scale numerical case study, which has also been successfully replicated by R&D groups on industrial simulators. These studies showed that implementing DeePC on microcontrollers is feasible albeit computationally challenging. Further within the realm of energy, Lian, Shi et al. (2021) and Schwarz et al. (2019) study numerical and experimental implementations for building automation.

Fig. 6. Carlet et al. (2021) used a synchronous motor drive test bench at the EDLab Padova as a demonstration platform for DeePC and for comparisons to SPC and certainty-equivalence control based on an identified model.

Finally, Berberich et al. (2021a) successfully applied DeePC to a nonlinear laboratory four-tank process. We note that both Berberich et al. (2021a) and Lian, Shi et al. (2021) consider adaptive implementations updating the data online.

Many of the above case studies are safety-critical systems with complex dynamics for which constraint satisfaction, closed-loop stability,


and real-time computation are essential. Remarkably, the very same DeePC method with minor adjustments has succeeded in all case studies from different areas. Next we visit the crucial hyperparameters for DeePC tuning.

DeePC tuning recommendations

We now turn towards tuning recommendations for the DeePC hyperparameters. Most hyperparameters also occur in model-based MPC, such as the horizon 𝑇f, cost matrices 𝑄 and 𝑅, as well as transient and terminal constraints U and Y. We refer to Borrelli et al. (2017) and Rawlings et al. (2017) for standard tuning recommendations, such as a sufficiently long horizon 𝑇f for closed-loop stability. The hyperparameters (𝜆𝑢ini, 𝜆𝑦ini) are as in moving-horizon estimation (Rawlings et al., 2017) or general Kalman filtering and account for (assumed) noise covariances. Generally, (𝜆𝑢ini, 𝜆𝑦ini) should be chosen sufficiently large. This leaves us with the estimation horizon 𝑇ini, data length 𝑇, regularization function ℎ(𝑔), and coefficient 𝜆𝑔 as unique DeePC hyperparameters. While each case study is different, the following favorable tuning recommendations have emerged.

First, 𝑇ini controls the (presumed) model complexity. Namely, Lemma 1 requires the initial horizon 𝑇ini to be longer than the lag 𝐥(B), which by (2) is again bounded by the order 𝐧(B) – both of which are unknown in a data-driven setting. It proved useful to choose 𝑇ini simply sufficiently large: generally, the realized closed-loop performance monotonically improves but does increasingly less so after a certain threshold. For nonlinear systems, this threshold is larger than the state dimension, confirming the intuition that a higher-order LTI model can better explain the data. Second, the length 𝑇 of the data time series 𝑤d has to be sufficiently long to assure persistency of excitation; see the fundamental lemma. The analytic lower bound for 𝑇 depends on 𝐧(B), which is generally unknown. Similar to 𝑇ini, a sufficiently large 𝑇 proves beneficial. The second author has had good experiences with choosing 𝑇 so that the data matrix H𝑇ini+𝑇f(𝑤d) is square. Third and finally, all regularizers ℎ(𝑔) perform well after tuning 𝜆𝑔, and they can also be combined. As discussed previously, the projection-based regularizer (40) ensures consistency, whereas norm-based regularizers robustify the control at the cost of a bias. Concerning the coefficient 𝜆𝑔, most case studies approximately display a convex behavior: the realized closed-loop cost decreases when increasing 𝜆𝑔 up to a certain threshold, remains roughly constant over a large interval of 𝜆𝑔, and increases again beyond a second threshold. While the desired 𝜆𝑔 can often be theoretically characterized (e.g., by the size of the uncertainty set in (49) or (52)), it remains a design parameter, practically found by increasing 𝜆𝑔 logarithmically from a small value until the realized cost increases again.

5.2.5. Conceptually related approaches of relevance

We have throughout our review pointed to closely related data-driven predictive control formulations based on the fundamental lemma, such as SPC (Favoreel et al., 1999; Huang & Kadali, 2008). There are other less closely but certainly conceptually related approaches that we briefly discuss below.

One may take the vantage point that, given a single offline data trajectory 𝑤d, DeePC is able to synthesize all admissible future trajectories for predictive control. A similar perspective is taken by dynamic matrix control (DMC) (Cutler & Ramaker, 1980; Garcia et al., 1989), a historic precursor to MPC originating from industry. DMC is a predictive control method that designs future system trajectories based on a previously recorded zero-initial-condition step response. Although DMC has many limitations (Lundström et al., 1995), it motivates data-driven predictive control based on a single data trajectory.

Another perspective is that of dictionary learning. The connection to DeePC is most obvious when using the trajectory matrix structure (15) and looking at the regression formulation (45). DeePC linearly combines trajectories from this dictionary to synthesize the optimal control trajectory, and the role of regularization is to avoid overfitting. A conceptually related 𝓁1-regularized dictionary learning predictive control approach has been presented by Kaiser et al. (2018). Likewise, Salvador et al. (2019) use affine combinations of stored trajectories in order to achieve offset-free tracking. Further, in the field of robotics, the idea of combining or concatenating stored motion primitives (i.e., trajectories) has often been exploited for motion planning (Frazzoli et al., 2005) or predictive control (Gray et al., 2012).

Finally, the geometric approach to optimal control (Marro et al., 2002), relying on controlled and conditioned invariant subspaces, is by nature coordinate-free and can also be approached in a representation-free setting based on the fundamental lemma, as demonstrated by Fujisaki et al. (2004).

5.3. Data-driven design of explicit feedback control policies

The identifiability condition (17) and the fundamental lemma provide a non-parametric representation of the restricted behavior B|𝐿 based on raw data, which immediately lends itself as predictor and estimator for finite-horizon feedforward and receding-horizon control, as presented in Sections 5.1–5.2. By means of the weaving lemma (Markovsky et al., 2005, Lemma 3), this predictor can in certain instances be extended to an infinite-horizon setting, but it is more conceptual than practically useful in case of noisy data, and it is not immediately clear how to obtain a recursive model as well as an explicit infinite-horizon feedback control law, e.g., in the setting of the linear quadratic regulator (LQR). In a state–space setting, the two articles (De Persis & Tesi, 2019; van Waarde et al., 2020) provided equally simple as well as ingenuous approaches on how to parametrize an explicit state feedback design by means of state–space data matrices and subspace relations amongst them.

5.3.1. Prototypical LTI stabilization & LQR problems

Consider a controllable input/state system as in (11)

  {(𝑢, 𝑥) ∈ (R^(𝑚+𝑛))^N | 𝜎𝑥 = 𝐴𝑥 + 𝐵𝑢},      (53)

where 𝑢(𝑡) ∈ R^𝑚, and the state 𝑥(𝑡) ∈ R^𝑛 is explicitly available as measurement. We will later comment on possible extensions if only outputs are available.

To illustrate the utility of different approaches, we consider two prototypical control problems, namely using state feedback 𝑢 = 𝐾𝑥 for either model-based stabilization

  find over 𝐾   such that 𝐴 + 𝐵𝐾 is Schur stable

or infinite-horizon LQR optimal control

  minimize over 𝑢   ∑_{𝑡=1}^∞ ‖𝑥(𝑡)‖²_𝑄 + ‖𝑢(𝑡)‖²_𝑅
  subject to   𝜎𝑥 = 𝐴𝑥 + 𝐵𝑢,

where 𝑄 ⪰ 0, 𝑅 ≻ 0, and (𝑄^(1∕2), 𝐴) is detectable. By means of the familiar Lyapunov and Gramian matrices, the stabilization and LQR problems can be parameterized as

  find over 𝐾, 𝑃 ≻ 0   such that (𝐴 + 𝐵𝐾)𝑃(𝐴 + 𝐵𝐾)⊤ − 𝑃 ≺ 0      (54)

and, anticipating the solution 𝑢 = 𝐾𝑥, as

  minimize over 𝐾, 𝑃 ⪰ 𝐼   trace(𝑄𝑃 + 𝐾⊤𝑅𝐾𝑃)
  subject to   (𝐴 + 𝐵𝐾)𝑃(𝐴 + 𝐵𝐾)⊤ − 𝑃 + 𝐼 ⪯ 0,      (55)

respectively. The LQR problem (55) admits many different parameterizations, and the proposed form (55) can be turned into a convex semidefinite program after a change of variables 𝑌 = 𝐾𝑃; see Feron et al. (1992) for the continuous-time case.

Observe that both problems (54) and (55) are semidefinite optimization (respectively, feasibility) problems parameterized in terms of the


closed-loop matrix 𝐴 + 𝐵𝐾. Likewise, many other instances of robust and optimal control can be formulated and parameterized similarly; see, e.g., Scherer and Weiland (2000). The stabilization (54) and LQR (55) problems serve as running examples for the developments in this section.

5.3.2. Subspace relations in state–space data

Consider time series of length 𝐿 (respectively, 𝐿 + 1) of inputs and states recorded from the LTI system (53):

  𝑈 := [𝑢(0) 𝑢(1) … 𝑢(𝐿 − 1)],
  𝑋 := [𝑥(0) 𝑥(1) … 𝑥(𝐿)].

We partition the state data into predecessor and successor states:

  𝑋− := [𝑥(0) 𝑥(1) … 𝑥(𝐿 − 1)],
  𝑋+ := [𝑥(1) 𝑥(2) … 𝑥(𝐿)] = 𝜎𝑋−.

Since these time series satisfy the dynamics (53), we have that

  𝑋+ = [𝐵 𝐴] col(𝑈, 𝑋−).      (56)

Recall Corollary 3 of the fundamental lemma: namely, if (𝑢(0), 𝑢(1), …, 𝑢(𝐿 − 1)) is persistently exciting of order 𝑛 + 1, then the input-state data matrix has full row rank:

  rank col(𝑈, 𝑋−) = 𝑛 + 𝑚.      (57)

The rank condition (57) ensures that the pseudo-inverse of the input-state data matrix is a right inverse, that is, the measurement data Eq. (56) can be solved for (𝐵, 𝐴). Different approaches towards stabilization (54) and LQR (55) may now be pursued from the two subspace relations (56) and (57).

5.3.3. Least-square identification of a parametric state–space model and certainty-equivalence control

Recall that the conventional approach to data-driven control is indirect: first a parametric state–space model is identified from data, and later on controllers are synthesized based on this model. Regarding the identification of a state–space model: given the data (𝑈, 𝑋), we seek input and state matrices (𝐵̂, 𝐴̂) so that they (approximately in the noisy case) satisfy the linear measurement Eq. (56). These can be obtained, e.g., as solution to the ordinary least squares problem

  [𝐵̂ 𝐴̂] = arg min over 𝐵, 𝐴   ‖𝑋+ − [𝐵 𝐴] col(𝑈, 𝑋−)‖_F = 𝑋+ col(𝑈, 𝑋−)†,      (58)

where the solution is unique and given as above due to the identifiability condition (57).

Based on this identified pair of parameters (𝐵̂, 𝐴̂) from (58), certainty-equivalence controllers can be designed, that is, in the stabilization (54) and LQR (55) problems, the matrices (𝐵, 𝐴) are replaced by their estimates (𝐵̂, 𝐴̂). In case of noisy data, the uncertainty can be mitigated by robustifying the optimal control formulations. We refer to Dean et al. (2019), Mania et al. (2019), Treven et al. (2020) and Umenberger et al. (2019) for recent analysis, performance estimates for finite sample size, as well as various comparisons. Independently, this approach is also known as dynamic mode decomposition (DMD) in the nonlinear dynamics and fluids communities (Proctor et al., 2016).

In either the certainty-equivalent or the robust case, these approaches to data-driven control are indirect, since they rely on an intermediate identification of a parametric state–space model. In what follows, we review direct approaches descending from the fundamental lemma and the subspace relations (56)–(57).

5.3.4. Optimal control parametrization by data matrices

The direct approach laid out by De Persis and Tesi (2019) uses the subspace relations (56)–(57) to parametrize the stabilization (54) and LQR (55) problems by raw data matrices. Namely, due to the rank condition (57), for any control gain matrix 𝐾, there is a (non-unique) (𝐿 × 𝑛) matrix 𝐺 so that

  col(𝐾, 𝐼) = col(𝑈, 𝑋−) 𝐺,      (59)

and due to (56), the closed-loop matrix can be parameterized as

  𝐴 + 𝐵𝐾 = [𝐵 𝐴] col(𝐾, 𝐼) = [𝐵 𝐴] col(𝑈, 𝑋−) 𝐺 = 𝑋+ 𝐺,      (60)

where the second equality uses (59) and the third uses (56). This trick is as simple as it is ingenuous: it allows to replace the closed-loop matrix 𝐴 + 𝐵𝐾 by 𝑋+𝐺 subject to the additional constraint (59), and the control gain can be recovered as 𝐾 = 𝑈𝐺. Thus, the stabilization problem (54) can be posed as

  find over 𝐾, 𝐺, 𝑃 ≻ 0   such that (𝑋+𝐺)𝑃(𝑋+𝐺)⊤ − 𝑃 ≺ 0   and   col(𝐾, 𝐼) = col(𝑈, 𝑋−) 𝐺.      (61)

Note that for 𝐾 = 0, we obtain a data-driven stability test. After the change of variables 𝑄 = 𝐺𝑃, condition (61) can be posed as a semidefinite constraint (De Persis & Tesi, 2019, Theorem 3)

  [𝑋−𝑄   𝑋+𝑄; ⋆   𝑋−𝑄] ≻ 0,      (62)

and, for any 𝑄 satisfying (62), 𝐾 = 𝑈𝑄(𝑋−𝑄)⁻¹ is a stabilizing gain. The converse statement holds as well. Analogously, the LQR problem (55) can be parameterized by raw data matrices:

  minimize over 𝐾, 𝐺, 𝑃 ⪰ 𝐼   trace(𝑄𝑃 + 𝐾⊤𝑅𝐾𝑃)
  subject to   (𝑋+𝐺)𝑃(𝑋+𝐺)⊤ − 𝑃 + 𝐼 ⪯ 0   and   col(𝐾, 𝐼) = col(𝑈, 𝑋−) 𝐺.      (63)

After a substitution of variables, (63) can be solved as a convex optimization problem (De Persis & Tesi, 2019, Theorem 4). This program (albeit convex) features matrix variables of size 𝐿 × 𝑛, i.e., dependent on the data size, which may be computationally challenging for a large data set. It is noteworthy that this approach also extends to the multivariable output feedback case by leveraging past outputs and inputs as states in a non-minimal realization (De Persis & Tesi, 2019, Section VI).

Note that this approach never constructs the underlying system matrices (𝐵, 𝐴), as compared to identification (58). In fact, it uses only a data-based characterization of the closed-loop matrix 𝐴 + 𝐵𝐾 = 𝑋+𝐺, and even here 𝐺 solving (59) is not unique. This non-uniqueness provides an opportunity to further regularize the design in presence of noise. E.g., similar to (48), a least-norm solution minimizing ‖𝐺‖_F can be sought by adding to (59) the orthogonality constraint (Dörfler, Tesi et al., 2021)

  ‖(𝐼 − col(𝑈, 𝑋−)† col(𝑈, 𝑋−)) 𝐺‖ = 0.      (64)

Thus, the resulting closed-loop matrix from (60) takes the form

  [𝐵 𝐴] col(𝐾, 𝐼) = 𝑋+ 𝐺 = 𝑋+ col(𝑈, 𝑋−)† col(𝐾, 𝐼),

where the last equality uses (64) and (59). Hence, the implicitly used state–space matrices are [𝐵 𝐴] = 𝑋+ col(𝑈, 𝑋−)† and coincide with those in certainty-equivalence control, obtained from the least-squares estimate (58). Similar to Section 5.2.2, alternative regularizations for the direct data-driven control are conceivable (De Persis & Tesi, 2021b). We refer to Dörfler, Tesi et al. (2021) for an overview and comparison.


5.3.5. Data informativity

The innovative approach put forward by van Waarde et al. (2020) relies on data informativity: given the data (𝑈, 𝑋), all linear systems that explain the data, i.e., that are compatible with the measurement Eq. (56), are parameterized by

  𝛴 = {(𝐵, 𝐴) | 𝑋+ = [𝐵 𝐴] col(𝑈, 𝑋−)}.      (65)

The characterization of 𝛴 is particularly interesting in the small-data limit when the rank condition (57) does not hold, and we cannot solve for a unique pair (𝐵, 𝐴), i.e., 𝛴 is not a singleton.

van Waarde et al. (2020) define the data (𝑈, 𝑋) to be informative for stabilization by state feedback if there is a single feedback gain 𝐾 so that 𝐴 + 𝐵𝐾 is Schur stable for all (𝐵, 𝐴) ∈ 𝛴. If the rank condition (57) holds, then the data-informativity question may be approached as in Section 5.3.4. In absence of the rank condition (57), by studying the homogeneous solution set of the measurement Eq. (56), {(𝐵, 𝐴) | 0 = [𝐵 𝐴] col(𝑈, 𝑋−)}, the authors are able to cast this problem as a semidefiniteness condition (van Waarde et al., 2020, Theorem 17): namely, the data (𝑈, 𝑋) are informative for stabilization by state feedback if and only if there is a matrix 𝑄 so that 𝑋−𝑄 = (𝑋−𝑄)⊤ and (62) holds; the stabilizing controller is then obtained as 𝐾 = 𝑈𝑄(𝑋−𝑄)⁻¹.

Hence, similar to De Persis and Tesi (2019), van Waarde et al. (2020) arrive at the stabilization condition (62), though the derivation does not require the rank condition (57). In summary, the data informativity approach allows for stabilization even though the data is not sufficiently rich to identify the system. On the contrary, when studying data-informativity for the LQR problem (55), the rank condition (57) is required (van Waarde et al., 2020, Theorem 26), and a similar semidefinite program formulation as in (63) can be derived — though with optimization variables independent of the data size. van Waarde et al. (2020) also study the output feedback case and various system analysis questions from the viewpoint of data informativity.

Applied to the problem of system identification, the informativity framework by van Waarde et al. (2020) leads to a different identifiability definition and conditions than the ones presented in Section 3.2. The identifiability problem in van Waarde et al. (2020) concerns the special case of the input/state system (11), parameterized by the pair (𝐴, 𝐵). Adapting the problem formulation to general input/output systems, parameterized by the quadruple (𝐴, 𝐵, 𝐶, 𝐷), requires the notion of equivalence classes in the space of parameters (𝐴, 𝐵, 𝐶, 𝐷). A topic for future work is to extend the informativity framework to input/output systems and to study identifiability conditions.

5.3.6. Extensions beyond deterministic LTI systems

The articles by De Persis and Tesi (2019) and van Waarde et al. (2020) paved the way for manifold extensions to broader system classes, further analysis and design questions, as well as robustifications in case of inexact data. We will primarily review the latter here. In case the LTI dynamics (53) are subject to process disturbances 𝑤 (in what follows referred to as ''noise''),

  𝜎𝑥 = 𝐴𝑥 + 𝐵𝑢 + 𝑤,      (66)

then the set of all systems explaining the data is characterized by all pairs of matrices (𝐵, 𝐴) so that

  𝑋+ = [𝐵 𝐴] col(𝑈, 𝑋−) + 𝑊      (67)

for some realization 𝑊 = [𝑤(0) 𝑤(1) … 𝑤(𝐿 − 1)] of the unknown noise. In this case, by repeating the calculation (60), a data-driven parametrization of the closed-loop matrix is obtained as 𝐴 + 𝐵𝐾 = (𝑋+ − 𝑊)𝐺. From this point on, one may again pursue either a certainty-equivalence or a robust design.

De Persis and Tesi (2021b) pursue a certainty-equivalence approach, i.e., the control design is not robustified against noise. The authors provide quantitative bounds on the noise magnitude so that stabilization under the certainty-equivalence design can still be guaranteed. In a nutshell, the technical approach is as follows: in the semidefiniteness conditions (61) and (63), the matrix 𝑋+𝐺 (encoding 𝐴 + 𝐵𝐾 in the noiseless case) has to be replaced by (𝑋+ − 𝑊)𝐺 in the noisy case. The key question is then under which conditions feasibility of the semidefinite constraint without 𝑊 (i.e., the certainty-equivalence case) implies feasibility with 𝑊. Since noise affects the semidefinite constraint through a product of the terms 𝑊𝐺𝑃, this approach also suggests augmenting the LQR cost with a regularizer of the form trace(𝑊𝐺𝑃(𝑊𝐺)⊤) to mitigate the effect of noise; see De Persis and Tesi (2021b, Section 5).

In the robust approach, the design seeks a stabilizing or optimal control for all 𝑊 in a prescribed uncertainty set. A tractable and expressive uncertainty set is a quadratic matrix inequality proposed by Berberich, Koch, Scherer et al. (2020) and De Persis and Tesi (2019), where one assumes that all uncertainty realizations satisfy

  col(𝐼, 𝑊⊤)⊤ [𝛷11   𝛷12; ⋆   𝛷22] col(𝐼, 𝑊⊤) ⪰ 0,      (68)

where 𝛷11 = 𝛷11⊤ and 𝛷22 = 𝛷22⊤ ≺ 0. For example, for 𝛷22 = −𝐼 and 𝛷12 = 0, inequality (68) bounds the energy of 𝑊; or, with 1𝑛×𝑛 being the (𝑛 × 𝑛) matrix of unit entries, 𝛷22 = −(1∕(𝑛 − 1))(𝐼 − (1∕𝑛) 1𝑛×𝑛), and 𝛷12 = 0, (68) bounds the sample covariance of the noise (van Waarde, Camlibel et al., 2020). Different parameterizations of the uncertainty set (68) have been proposed; see van Waarde et al. (2021) for a discussion and conversions thereof. We also remark that Berberich, Scherer et al. (2020), Bisoffi et al. (2021), and Martin and Allgöwer (2021b) considered point-wise (in time) noise bounds to alleviate the potential conservatism of (68).

Given the data (𝑈, 𝑋), a robustified version of the stabilization problem (54) can then be posed as finding a feedback gain 𝐾 so that 𝐴 + 𝐵𝐾 is Schur stable for all (𝐵, 𝐴) compatible with (67) and (68). At this point, the data-driven robust design can be approached with established methods: De Persis and Tesi (2019) pose the problem as simultaneous satisfaction of a perturbed Lyapunov inequality with perturbation satisfying (68), and derive a robustly stabilizing controller for sufficiently large signal-to-noise ratio; Berberich, Koch, Scherer et al. (2020) recognize the problem setup as a linear fractional transformation, apply robust control methods, and later extend the approach to gray-box models including prior knowledge (Berberich, Scherer et al., 2020); alternatively, by inserting 𝑊 from (67) into (68), van Waarde and Camlibel (2021) and van Waarde, Camlibel et al. (2020) arrive at a quadratic matrix inequality, pose the robust data-driven design as a simultaneous satisfaction of quadratic matrix inequalities, and solve it via a matrix-valued version of Finsler's and the S-Lemma. Their conditions are non-conservative (necessary and sufficient) for the considered noise model and improve upon previous ones.

The above analysis has also been extended to stabilization of weakly nonlinear systems in the absolute stability setting, i.e., when system stabilization can be achieved by means of linear feedback and certified with a quadratic Lyapunov function. For example, to stabilize a Lur'e system 𝜎𝑥 = 𝐴𝑥 + 𝐵𝑢 + 𝐸𝜙(𝑥) with sector-bounded scalar nonlinearity 𝜙(𝑥)⊤(𝜙(𝑥) − 𝑥) ≤ 0, one can appeal to conceptually analogous methods as for a noisy system (67) subject to an ellipsoidal uncertainty set as in (68) (Luppi et al., 2021; van Waarde & Camlibel, 2021).

Also, problems of (robust) invariance (Bisoffi et al., 2020a), stabilization of polynomial systems aided by sum-of-squares methods (Guo et al., 2020), or control design for bilinear (Bisoffi et al., 2020b), delayed (Rueda-Escobedo et al., 2020), switched (Rotulo et al., 2021), or rational (Strässer et al., 2020) dynamical systems lead to similar linear matrix inequalities.

5.3.7. Discussion

Many other works have followed up on these ideas resulting in a vibrant research arena. We will not provide an exhaustive overview.
equivalence design is stabilizing or a particular LQR suboptimality can Rather we conclude with a few remarks.
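The consistency relation (67) and, for exact data (𝑊 = 0), the closed-loop parametrization 𝐴 + 𝐵𝐾 = 𝑋+𝐺 obtained from the calculation (60) can be illustrated with a small numerical sketch. The sketch below is not from the paper: the dimensions, the simulated system, and the feedback gain 𝐾 are arbitrary illustrative choices, and persistency of excitation is assumed to hold generically for the random data.

```python
import numpy as np

# Numerical sketch of the noisy-data relation (67) and, for exact data
# (W = 0), the closed-loop parametrization A + B K = X+ G of (60).
# System, dimensions, and gain K are hypothetical illustrative choices.
rng = np.random.default_rng(0)
n, m, L = 3, 2, 8                        # state dim, input dim, data length
A = 0.5 * rng.standard_normal((n, n))    # hypothetical "true" dynamics
B = rng.standard_normal((n, m))
U = rng.standard_normal((m, L))          # generic (persistently exciting) input
W = 0.01 * rng.standard_normal((n, L))   # one realization of the noise

# Simulate sigma x = A x + B u + w, eq. (66), and split the state data
X = np.zeros((n, L + 1))
for t in range(L):
    X[:, t + 1] = A @ X[:, t] + B @ U[:, t] + W[:, t]
Xm, Xp = X[:, :L], X[:, 1:]              # X_- and X_+

# The data satisfy (67): X+ = [B A] [U; X-] + W
assert np.allclose(Xp, np.hstack([B, A]) @ np.vstack([U, Xm]) + W)

# Exact-data case: re-simulate with W = 0 and parameterize the closed loop
X0 = np.zeros((n, L + 1))
for t in range(L):
    X0[:, t + 1] = A @ X0[:, t] + B @ U[:, t]
X0m, X0p = X0[:, :L], X0[:, 1:]
K = rng.standard_normal((m, n))          # an arbitrary feedback gain
# G solves [U; X-] G = [K; I]; a right inverse exists since the data
# matrix generically has full row rank m + n
G = np.linalg.pinv(np.vstack([U, X0m])) @ np.vstack([K, np.eye(n)])
assert np.allclose(X0p @ G, A + B @ K)   # A + B K = X+ G
```

The final assertion is exactly the model-free parametrization used by the certainty-equivalence designs discussed above: the closed-loop matrix is expressed through the data matrices and the decision variable 𝐺, without forming (𝐴, 𝐵) explicitly.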
I. Markovsky and F. Dörfler Annual Reviews in Control 52 (2021) 42–64
First, many other instances of robust and optimal control can be formulated and parameterized as semidefinite optimization (respectively, feasibility) problems in terms of the closed-loop matrix 𝐴 + 𝐵𝐾 (Scherer & Weiland, 2000). Conceptually, all of these admit data-driven counterparts in case of exact data, and similar robustification methods as in Section 5.3.6 can be applied in case of noisy data. A possible caveat, leading to computational challenges, is the size of the resulting semidefinite programs.

Second, the above approaches model "noise" as a norm-bounded disturbance rather than as a stochastic process. A bridge between stochastic and worst-case noise models can be built by averaging data sets and constructing high-confidence norm bounds on 𝑊 (De Persis & Tesi, 2021b, Section 6.2).

Third, the above approaches are all derived from the subspace relations (56)–(57), which again descend from the fundamental lemma. The subsequent results, though, are developed entirely in the state-space framework. Hence, most of the methods have been created under the dogma that a state-space representation is readily available with measurable states, and extensions to output feedback are often more conceptual than practically useful. However, in data-driven control design neither the state is available nor is its dimension a priori known, which provides a fruitful avenue for future research.

6. Concluding discussion and open problems

Behavioral systems theory defines a system as a set of trajectories—the behavior—and is thus intrinsically amenable to data-driven approaches. Particular system representations, input/output partitioning of the variables, zero initial conditions, and other assumptions are not imposed a priori. System properties and design problems are specified in terms of the behavior. These properties can then be checked, and the analysis and design problems solved, using data-driven methods.

The fundamental lemma (Lemma 2) and Corollary 5, reviewed in Section 3, give conditions for the existence of a non-parametric data-driven representation of a linear time-invariant system. The condition of Corollary 5 is verifiable from the data and prior knowledge of the number of inputs, the lag, and the order of the system. It is a refinement of the fundamental lemma, which provides alternative sufficient conditions, assuming in addition a given input/output partitioning of the variables and controllability of the system.

The data-driven representation allows approaching system theory, signal processing, and control problems using basic linear algebra. It leads to general, simple, and practical solution methods. This was illustrated in the paper by applying it to a problem of data-driven missing data estimation. The resulting method assumes only linear time-invariant system dynamics and has no hyperparameters. In case of noisy data, generated in the errors-in-variables setup, the maximum-likelihood estimator is obtained by a Hankel structured low-rank approximation and completion problem. The maximum-likelihood estimation problem is nonconvex; however, 𝓁1-norm regularization provides an effective convex relaxation.

The fundamental lemma has long served as a cornerstone of indirect data-driven control, that is, sequential system identification and control. Recently, multiple direct data-driven control formulations have sparked from the fundamental lemma and the associated non-parametric system representation. These can be loosely classified as implicit and explicit approaches, represented by the DeePC and data-driven LQR approaches, respectively. Both are equally amenable to certainty-equivalence and robust control implementations. Within the vast realm of data-driven control, the approaches based on the fundamental lemma are remarkable: they are amenable to theoretical analysis and certification, yet they are also computationally tractable, require only few data samples, and are robustly implementable in real-time and safety-critical physical control systems.

While the initial fundamental lemma dates back almost 20 years, it has recently given rise to a blossoming literature in the vibrant research arena of data-driven control. On top of the manifold open problems already pointed out earlier, we see the following promising and important avenues for future research.

On the theory side, the presented data-driven approaches are based on an inherent LTI model specification—the non-parametric model representation—and, by means of robustifying and adapting the optimization methods, they have been successfully applied to stochastic, nonlinear, and time-varying systems. What is missing, though, is a bottom-up approach extending behavioral systems theory and the non-parametric representation to the stochastic and nonlinear domains. Furthermore, most of the presented approaches rely on sequential exploration (data collection) and control (exploitation). The overall goal, though, should be direct, online, and adaptive data-driven approaches relying on partial and noisy output measurements.

On the computational side, data-driven methods are currently based on batch computation without exploitation of the special structure of the data matrices. Because of this, the computational complexity of the data-driven methods does not compare favorably with that of model-based methods. A topic for further research is therefore the development of efficient computational methods, as well as recursive methods that are suitable for online computation. Other important topics are sensitivity analysis of the algorithms in case of noisy data, automatic tuning of the hyperparameters, and selection of the matrix structure. In particular, we presented three different matrix structures: (mosaic) Hankel, Page, and trajectory. Preliminary empirical evidence suggests that they are suitable for different types of problems: the Hankel matrix in model identification problems, and the Page and trajectory matrices in data-driven control via DeePC. More experiments as well as theoretical analysis are needed in order to find definitive guidelines for which matrix structure to use under which assumptions.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors wish to thank Jeremy Coulson, Henk van Waarde, Julian Berberich, Andrea Martin, Eduardo Prieto, Claudio de Persis, Pietro Tesi, Paolo Rapisarda, Rodolphe Sepulchre, and John Lygeros for many discussions leading up to and improving this survey paper. The research leading to these results has received funding from the European Research Council (ERC) Grant agreement number 258581 "Structured low-rank approximation: Theory, algorithms, and applications"; the Fonds for Scientific Research Vlaanderen (FWO) projects G028015N "Decoupling multivariate polynomials in nonlinear system identification" and G090117N "Block-oriented nonlinear identification using Volterra series"; and the Fonds de la Recherche Scientifique (FNRS), Belgium – FWO under Excellence of Science (EOS) Project no 30468160 "Structured low-rank matrix/tensor approximation: numerical optimization-based algorithms and applications".

Appendix A. Proof of the dimension formula (1)

Let Bss(𝐴, 𝐵, 𝐶, 𝐷) be a minimal input/state/output representation of the system B (cf. Section 2.3). For any 𝑤 ∈ B|𝐿, there is 𝑥(1) = 𝑥ini ∈ R^𝑛, such that

𝑤(𝑡) = 𝛱 [𝑢(𝑡); 𝑦(𝑡)], 𝑥(𝑡 + 1) = 𝐴𝑥(𝑡) + 𝐵𝑢(𝑡), 𝑦(𝑡) = 𝐶𝑥(𝑡) + 𝐷𝑢(𝑡), for 𝑡 = 1, 2, …, 𝐿.
This system of equations can be written more compactly as

𝑤 = 𝛱𝐿 [0 𝐼; O𝐿 C𝐿] [𝑥(1); 𝑢] =: 𝑀𝐿 [𝑥(1); 𝑢], (A.1)

where 𝛱𝐿 ∈ R^(𝑞𝐿×𝑞𝐿) is a permutation matrix (determined by 𝛱 and the re-grouping of the variables in the left-hand side and right-hand side of (A.1)), O𝐿 ∈ R^(𝑝𝐿×𝑛) is the extended observability matrix with 𝐿 block rows, defined in (6), and C𝐿 ∈ R^(𝑝𝐿×𝑚𝐿) is the convolution matrix with 𝐿 block rows

C𝐿 := [ℎ(0) 0 ⋯ 0; ℎ(1) ℎ(0) ⋱ ⋮; ⋮ ⋱ ⋱ 0; ℎ(𝐿 − 1) ⋯ ℎ(1) ℎ(0)], (A.2)

constructed from the Markov parameters

ℎ(0) = 𝐷, ℎ(𝑘) = 𝐶𝐴^(𝑘−1)𝐵, for 𝑘 = 1, 2, …

of the system. From (A.1), it follows that

dim B|𝐿 = rank 𝑀𝐿.

Since the representation is minimal and 𝐿 ≥ 𝐥(B), the extended observability matrix O𝐿 has full column rank 𝑛. Then, due to the lower-triangular block structure of 𝑀𝐿 and the identity block, 𝑀𝐿 also has full column rank. Therefore,

dim B|𝐿 = rank 𝑀𝐿 = 𝑚𝐿 + 𝑛.

Appendix B. Proof of Lemma 1

We provide a state-space proof and a representation-free proof.

B.1. Using an input/state/output representation

Let Bss(𝐴, 𝐵, 𝐶, 𝐷) be a minimal input/state/output representation of the system B. Since 𝑤ini is a trajectory of B|𝑇ini, there is an 𝑥(1) = 𝑥ini ∈ R^𝑛, such that

𝑦ini = O𝑇ini 𝑥ini + C𝑇ini 𝑢ini, (B.1)

where O𝑇ini is defined in (6) and C𝑇ini is defined in (A.2). Moreover, the assumptions that Bss(𝐴, 𝐵, 𝐶, 𝐷) is a minimal representation and that 𝑇ini ≥ 𝐥(B) imply that the extended observability matrix O𝑇ini has full column rank. Therefore, the system of equations (B.1) has a unique solution 𝑥ini. The initial condition 𝑥(𝑇ini + 1) for the trajectory 𝑤f is uniquely determined by 𝑥ini and 𝑢ini:

𝑥(𝑇ini + 1) = 𝐴^(𝑇ini) 𝑥ini + [𝐴^(𝑇ini−1)𝐵 𝐴^(𝑇ini−2)𝐵 ⋯ 𝐴𝐵 𝐵] 𝑢ini.

The uniqueness of 𝑦f follows from the uniqueness of 𝑥(𝑇ini + 1).

B.2. A representation-free proof

Let 𝑛 := 𝐧(B). By the dimension formula (1), there is a full column rank matrix 𝐵 ∈ R^(𝑞(𝑇ini+𝐿)×(𝑚(𝑇ini+𝐿)+𝑛)), such that

𝑤ini ∧ (𝑢, 𝑦) = 𝐵𝑔, for some 𝑔 ∈ R^(𝑚(𝑇ini+𝐿)+𝑛),

i.e., the columns of 𝐵 form a basis for B|𝑇ini+𝐿. Denote by 𝐵′ ∈ R^((𝑞𝑇ini+𝑚𝐿)×(𝑚(𝑇ini+𝐿)+𝑛)) the submatrix of 𝐵 obtained by selecting the rows of 𝐵 corresponding to 𝑤ini and 𝑢. The simulation problem (8) has a unique solution if and only if the system of equations

[𝑤ini; 𝑢] = 𝐵′𝑔 (B.2)

has a unique solution 𝑔. A necessary and sufficient condition for uniqueness of the solution of (B.2) is that 𝐵′ has full column rank. By the assumption 𝑇ini ≥ 𝐥(B), using (1) and the fact that 𝑢 is a free variable, we have

rank 𝐵′ = dim { [𝑤ini; 𝑢] | 𝑤ini ∈ B|𝑇ini and 𝑢 ∈ R^(𝑚𝐿) } = 𝑚𝑇ini + 𝑚𝐿 + 𝑛 = col dim 𝐵′,

so that 𝐵′ indeed has full column rank.

Appendix C. Proof of Corollary 5

The fact that image H𝐿(Wd) ⊆ B|𝐿 follows from the linear time-invariance of B and the exactness of the data Wd. In order to prove that equality holds if and only if the generalized persistency of excitation condition (14) holds, note that

dim image H𝐿(Wd) = rank H𝐿(Wd) (C.1)

and recall the dimension formula (1). Since image H𝐿(Wd) is included in B|𝐿, the two subspaces are equal if and only if their dimensions are equal. The result follows from (C.1) and (1).

References

Agarwal, A., Amjad, M. J., Shah, D., & Shen, D. (2018). Model agnostic time series analysis via matrix estimation. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2(3), 1–39.
Alexandru, A. B., Tsiamis, A., & Pappas, G. J. (2021). Encrypted distributed lasso for sparse data predictive control. ArXiv preprint arXiv:2104.11632.
Allibhoy, A., & Cortés, J. (2020). Data-based receding horizon control of linear network systems. IEEE Control Systems Letters, 5(4), 1207–1212.
Alpago, D., Dörfler, F., & Lygeros, J. (2020). An extended Kalman filter for data-enabled predictive control. IEEE Control Systems Letters, 4(4), 994–999.
Alsalti, M., Berberich, J., Lopez, V. G., Allgöwer, F., & Müller, M. A. (2021). Data-based system analysis and control of flat nonlinear systems. ArXiv preprint arXiv:2103.02892.
Anderson, J., Doyle, J. C., Low, S. H., & Matni, N. (2019). System level synthesis. Annual Reviews in Control, 47, 364–393.
Anderson, B. D., & Moore, J. B. (2007). Optimal control: Linear quadratic methods. Courier Corporation.
Antsaklis, P. J., & Michel, A. (1997). Linear systems. McGraw-Hill.
Baggio, G., Bassett, D. S., & Pasqualetti, F. (2021). Data-driven control of complex networks. Nature Communications, 12(1), 1–13.
Baggio, G., Katewa, V., & Pasqualetti, F. (2019). Data-driven minimum-energy controls for linear systems. IEEE Control Systems Letters, 3(3), 589–594.
Baggio, G., & Pasqualetti, F. (2020). Learning minimum-energy controls from heterogeneous data. In 2020 American control conference (pp. 3991–3996). IEEE.
Baros, S., Chang, C.-Y., Colon-Reyes, G. E., & Bernstein, A. (2020). Online data-enabled predictive control. ArXiv preprint arXiv:2003.03866.
Berberich, J., & Allgöwer, F. (2020). A trajectory-based framework for data-driven system analysis and control. In European control conf. (pp. 1365–1370).
Berberich, J., Koch, A., Scherer, C. W., & Allgöwer, F. (2020). Robust data-driven state-feedback design. In 2020 American control conference (pp. 1532–1538). IEEE.
Berberich, J., Köhler, J., Müller, M. A., & Allgöwer, F. (2020a). Data-driven tracking MPC for changing setpoints. IFAC-PapersOnLine, 53(2), 6923–6930.
Berberich, J., Köhler, J., Müller, M. A., & Allgöwer, F. (2020b). Robust constraint satisfaction in data-driven MPC. In 2020 59th IEEE conference on decision and control (pp. 1260–1267).
Berberich, J., Köhler, J., Müller, M. A., & Allgöwer, F. (2021a). Data-driven model predictive control: closed-loop guarantees and experimental results. at-Automatisierungstechnik, 69(7), 608–618.
Berberich, J., Köhler, J., Müller, M. A., & Allgöwer, F. (2021b). Data-driven model predictive control with stability and robustness guarantees. IEEE Transactions on Automatic Control, 66(4), 1702–1717.
Berberich, J., Köhler, J., Müller, M. A., & Allgöwer, F. (2021c). Linear tracking MPC for nonlinear systems parts I & II. ArXiv preprints arXiv:2105.08560 and arXiv:2105.08567.
Berberich, J., Köhler, J., Müller, M. A., & Allgöwer, F. (2021d). On the design of terminal ingredients for data-driven MPC. ArXiv preprint arXiv:2101.05573.
Berberich, J., Scherer, C. W., & Allgöwer, F. (2020). Combining prior knowledge and data for robust controller design. ArXiv preprint arXiv:2009.05253.
Bertsimas, D., & Copenhaver, M. S. (2018). Characterization of the equivalence of robustification and regularization in linear and matrix regression. European Journal of Operational Research, 270(3), 931–942.
Bianchin, G., Vaquero, M., Cortes, J., & Dall'Anese, E. (2021). Data-driven synthesis of optimization-based controllers for regulation of unknown linear systems. ArXiv preprint arXiv:2103.16067.
Bisoffi, A., De Persis, C., & Tesi, P. (2020a). Controller design for robust invariance from noisy data. ArXiv preprint arXiv:2007.13181.
Bisoffi, A., De Persis, C., & Tesi, P. (2020b). Data-based stabilization of unknown bilinear systems with guaranteed basin of attraction. Systems & Control Letters, 145, Article 104788.
Bisoffi, A., De Persis, C., & Tesi, P. (2021). Trade-offs in learning controllers from noisy data. ArXiv preprint arXiv:2103.08629.
Bongard, J., Berberich, J., Köhler, J., & Allgöwer, F. (2021). Robust stability analysis of a simple data-driven model predictive control approach. ArXiv preprint arXiv:2103.00851.
Borrelli, F., Bemporad, A., & Morari, M. (2017). Predictive control for linear and hybrid systems. Cambridge University Press.
Box, G., & Jenkins, G. (1976). Time series analysis: Forecasting and control. Holden-Day.
Brunton, S., Proctor, J., & Kutz, N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113, 3932–3937.
Carlet, P. G., Favato, A., Bolognani, S., & Dörfler, F. (2021). Data-driven continuous-set predictive current control for synchronous motor drives. IEEE Transactions on Power Electronics, submitted for publication.
Carlet, P. G., Favato, A., Bolognani, S., & Dörfler, F. (2020). Data-driven predictive current control for synchronous motor drives. In 2020 IEEE energy conversion congress and exposition (pp. 5148–5154).
Chiuso, A., & Pillonetto, G. (2019). System identification: A machine learning perspective. Annual Review of Control, Robotics, and Autonomous Systems, 2, 281–304.
Coulson, J., Lygeros, J., & Dörfler, F. (2019a). Data-enabled predictive control: In the shallows of the DeePC. In European control conference (pp. 307–312).
Coulson, J., Lygeros, J., & Dörfler, F. (2019b). Regularized and distributionally robust data-enabled predictive control. In Proc. of IEEE conf. on decision and control (pp. 7165–7170).
Coulson, J., Lygeros, J., & Dörfler, F. (2020). Distributionally robust chance constrained data-enabled predictive control. In press. Available at https://arxiv.org/abs/2006.01702.
Cutler, C. R., & Ramaker, B. L. (1980). Dynamic matrix control – A computer control algorithm. In Joint automatic control conference (p. 72).
Damen, A., Van den Hof, P., & Hajdasinski, A. (1982). Approximate realization based upon an alternative to the Hankel matrix: the Page matrix. Systems & Control Letters, 2, 202–208.
De Persis, C., & Tesi, P. (2019). Formulas for data-driven control: Stabilization, optimality, and robustness. IEEE Transactions on Automatic Control, 65(3), 909–924.
De Persis, C., & Tesi, P. (2021a). Designing experiments for data-driven control of nonlinear systems. ArXiv preprint arXiv:2103.16509.
De Persis, C., & Tesi, P. (2021b). Low-complexity learning of linear quadratic regulators from noisy data. Automatica, 128, Article 109548.
Dean, S., Mania, H., Matni, N., Recht, B., & Tu, S. (2019). On the sample complexity of the linear quadratic regulator. Foundations of Computational Mathematics, 1–47.
Dörfler, F., Coulson, J., & Markovsky, I. (2021a). Bridging direct & indirect data-driven control formulations via regularizations and relaxations. Available at https://arxiv.org/abs/2101.01273.
Dörfler, F., Coulson, J., & Markovsky, I. (2021b). Bridging direct & indirect data-driven control formulations via regularizations and relaxations: Technical report, arXiv:2101.01273. URL: https://arxiv.org/abs/2101.01273.
Dörfler, F., Tesi, P., & De Persis, C. (2021). On the certainty-equivalence approach to direct data-driven LQR design. ArXiv preprint arXiv:2109.06643.
Dreesen, P., & Markovsky, I. (2019). Data-driven simulation using the nuclear norm heuristic. In Proceedings of the international conference on acoustics, speech, and signal processing. Brighton, UK.
El Ghaoui, L., & Lebret, H. (1997). Robust solutions to least-squares problems with uncertain data. SIAM Journal on Matrix Analysis and Applications, 18(4), 1035–1064.
Elokda, E., Coulson, J., Beuchat, P., Lygeros, J., & Dörfler, F. (2019). Data-enabled predictive control for quadcopters. International Journal of Robust and Nonlinear Control, http://dx.doi.org/10.1002/rnc.5686, in press.
Fabiani, F., & Goulart, P. J. (2020). The optimal transport paradigm enables data compression in data-driven robust control. ArXiv preprint arXiv:2005.09393.
Favoreel, W., De Moor, B., & Gevers, M. (1999). SPC: subspace predictive control. IFAC Proceedings Volumes, 32(2), 4004–4009.
Fazel, M. (2002). Matrix rank minimization with applications (Ph.D. thesis), Stanford University.
Feron, E., Balakrishnan, V., Boyd, S., & El Ghaoui, L. (1992). Numerical methods for 𝐻2 related problems. In 1992 American control conference (pp. 2921–2922). IEEE.
Fiedler, F., & Lucia, S. (2021). On the relationship between data-enabled predictive control and subspace predictive control. In European control conference.
Frazzoli, E., Dahleh, M. A., & Feron, E. (2005). Maneuver-based motion planning for nonlinear systems with symmetries. IEEE Transactions on Robotics, 21(6), 1077–1091.
Fujisaki, Y., Duan, Y., & Ikeda, M. (2004). System representation and optimal control in input-output data space. IFAC Proceedings Volumes, 37(11), 185–190.
Furieri, L., Guo, B., Martin, A., & Ferrari-Trecate, G. (2021a). A behavioral input-output parametrization of control policies with suboptimality guarantees. ArXiv preprint arXiv:2102.13338.
Furieri, L., Guo, B., Martin, A., & Ferrari-Trecate, G. (2021b). Near-optimal design of safe output feedback controllers from noisy data. ArXiv preprint arXiv:2105.10280.
Furieri, L., Zheng, Y., Papachristodoulou, A., & Kamgarpour, M. (2019). An input–output parametrization of stabilizing controllers: Amidst Youla and system level synthesis. IEEE Control Systems Letters, 3(4), 1014–1019.
Garcia, C. E., Prett, D. M., & Morari, M. (1989). Model predictive control: Theory and practice – A survey. Automatica, 25(3), 335–348.
Golub, G., & Van Loan, C. (1996). Matrix computations (3rd ed.). Johns Hopkins University Press.
Grant, M., & Boyd, S. (2008). CVX: Matlab software for disciplined convex programming. URL: stanford.edu/~boyd/cvx.
Gray, A., Gao, Y., Lin, T., Hedrick, J. K., Tseng, H. E., & Borrelli, F. (2012). Predictive control for agile semi-autonomous ground vehicles using motion primitives. In 2012 American control conference (pp. 4239–4244). IEEE.
Guo, M., De Persis, C., & Tesi, P. (2020). Data-driven stabilization of nonlinear polynomial systems with noisy data. ArXiv preprint arXiv:2011.07833.
Hauswirth, A., Bolognani, S., Hug, G., & Dörfler, F. (2021). Optimization algorithms as robust feedback controllers. Submitted for publication. Available at http://arxiv.org/abs/2103.11329.
Heinig, G. (1995). Generalized inverses of Hankel and Toeplitz mosaic matrices. Linear Algebra and its Applications, 216, 43–59.
Hewing, L., Wabersich, K., Menner, M., & Zeilinger, M. (2020). Learning-based model predictive control: Toward safe learning in control. Annual Review of Control, Robotics, and Autonomous Systems, 3, 269–296.
Hjalmarsson, H. (2005). From experiment design to closed-loop control. Automatica, 41(3), 393–438.
Hou, Z.-S., & Wang, Z. (2013). From model-based control to data-driven control: Survey, classification and perspective. Information Sciences, 235, 3–35.
Huang, L., Coulson, J., Lygeros, J., & Dörfler, F. (2019). Data-enabled predictive control for grid-connected power converters. In 2019 IEEE 58th conference on decision and control (CDC) (pp. 8130–8135). IEEE.
Huang, L., Coulson, J., Lygeros, J., & Dörfler, F. (2021). Decentralized data-enabled predictive control for power system oscillation damping. IEEE Transactions on Control Systems Technology, in press. Available at https://arxiv.org/abs/1911.12151.
Huang, L., Zhen, J., Lygeros, J., & Dörfler, F. (2021). Robust data-enabled predictive control: Tractable formulations and performance guarantees. ArXiv preprint arXiv:2105.07199.
Huang, B., & Kadali, R. (2008). Dynamic modeling, predictive control and performance monitoring: A data-driven subspace approach. Springer.
Huang, L., Zhen, J., Lygeros, J., & Dörfler, F. (2021). Quadratic regularization of data-enabled predictive control: Theory and application to power converter experiments. In IFAC symposium on system identification. In press. Available at https://arxiv.org/abs/2012.04434.
Iannelli, A., Yin, M., & Smith, R. S. (2020). Experiment design for impulse response identification with signal matrix models. ArXiv preprint arXiv:2012.08126.
Ikeda, M., Fujisaki, Y., & Hayashi, N. (2001). A model-less algorithm for tracking control based on input-output data. Nonlinear Analysis. Theory, Methods & Applications, 47(3), 1953–1960.
Jud, D., Kerscher, S., Wermelinger, M., Jelavic, E., Egli, P., Leemann, P., Hottiger, G., & Hutter, M. (2021). HEAP – the autonomous walking excavator. Automation in Construction, 129, Article 103783.
Kailath, T., Sayed, A. H., & Hassibi, B. (2000). Linear estimation. Prentice Hall.
Kaiser, E., Kutz, J. N., & Brunton, S. L. (2018). Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society of London, Series A (Mathematical and Physical Sciences), 474(2219), Article 20180335.
Koch, A., Berberich, J., Köhler, J., & Allgöwer, F. (2020). Determining optimal input-output properties: A data-driven approach. ArXiv preprint arXiv:2002.03882.
Krishnan, V., & Pasqualetti, F. (2021). On direct vs indirect data-driven predictive control. ArXiv preprint arXiv:2103.14936.
Kuhn, D., Esfahani, P., Nguyen, V., & Shafieezadeh-Abadeh, S. (2019). Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations research & management science in the age of analytics (pp. 130–166). INFORMS.
Landau, I., Rey, D., Karimi, A., Voda, A., & Franco, A. (1995). A flexible transmission system as a benchmark for robust digital control. European Journal of Control, 1(2), 77–96.
Lian, Y., & Jones, C. (2021a). Nonlinear data-enabled prediction and control. ArXiv preprint arXiv:2101.03187.
Lian, Y., & Jones, C. N. (2021b). From system level synthesis to robust closed-loop data-enabled predictive control. ArXiv preprint arXiv:2102.06553.
Lian, Y., Shi, J., Koch, M. P., & Jones, C. N. (2021). Adaptive robust data-driven building control via bi-level reformulation: an experimental result. ArXiv preprint arXiv:2106.05740.
Lian, Y., Wang, R., & Jones, C. (2021). Koopman based data-driven predictive control. ArXiv preprint arXiv:2102.05122.
Lu, X., Chen, H., Gao, B., Zhang, Z., & Jin, W. (2014). Data-driven predictive gearshift control for dual-clutch transmissions and FPGA implementation. IEEE Transactions on Industrial Electronics, 62(1), 599–610.
Lundström, P., Lee, J. H., Morari, M., & Skogestad, S. (1995). Limitations of dynamic matrix control. Computers & Chemical Engineering, 19(4), 409–421.
Luppi, A., De Persis, C., & Tesi, P. (2021). On data-driven stabilization of systems with quadratic nonlinearities. ArXiv preprint arXiv:2103.15631.
Mania, H., Tu, S., & Recht, B. (2019). Certainty equivalence is efficient for linear quadratic control. ArXiv preprint arXiv:1902.07826.
Markovsky, I. (2008). Structured low-rank approximation and its applications. Automatica, 44(4), 891–909.
Markovsky, I. (2012). How effective is the nuclear norm heuristic in solving data approximation problems? In Proc. of the 16th IFAC symposium on system identification (pp. 316–321). Brussels.
Markovsky, I. (2013). A software package for system identification in the behavioral setting. Control Engineering Practice, 21, 1422–1436.
Markovsky, I. (2014). Recent progress on variable projection methods for structured low-rank approximation. Signal Processing, 96PB, 406–419.
Markovsky, I. (2015). An application of system identification in metrology. Control Engineering Practice, 43, 85–93.
Markovsky, I. (2017). A missing data approach to data-driven filtering and control. IEEE Transactions on Automatic Control, 62, 1972–1978.
Markovsky, I. (2019). Low-rank approximation: Algorithms, implementation, applications (2nd ed.). Springer.
Markovsky, I., & De Moor, B. (2005). Linear dynamic filtering with noisy input and output. Automatica, 41(1), 167–171.
Markovsky, I., & Dörfler, F. (2020). Identifiability in the behavioral setting. Vrije Universiteit Brussel. URL: http://homepages.vub.ac.be/~imarkovs/publications/identifiability.pdf.
Markovsky, I., & Dörfler, F. (2021). Data-driven dynamic interpolation and approximation. Automatica.
Markovsky, I., & Rapisarda, P. (2008). Data-driven simulation and control. International Journal of Control, 81(12), 1946–1959.
Markovsky, I., & Usevich, K. (2013). Structured low-rank approximation with missing data. SIAM Journal on Matrix Analysis and Applications, 34(2), 814–830.
Markovsky, I., Willems, J. C., Rapisarda, P., & De Moor, B. (2005). Algorithms for deterministic balanced subspace identification. Automatica, 41(5), 755–766.
Markovsky, I., Willems, J. C., Van Huffel, S., & De Moor, B. (2006). Exact and approximate modeling of linear systems: A behavioral approach. SIAM.
Marro, G., Prattichizzo, D., & Zattoni, E. (2002). Geometric insight into discrete-time cheap and singular linear quadratic Riccati (LQR) problems. IEEE Transactions on Automatic Control, 47(1), 102–107.
Martin, T., & Allgöwer, F. (2021a). Data-driven inference on optimal input-output properties of polynomial systems with focus on nonlinearity measures. ArXiv preprint arXiv:2103.10306.
Martin, T., & Allgöwer, F. (2021b). Dissipativity verification with guarantees for polynomial systems from noisy input-state data. IEEE Control Systems Letters, 5(4), 1399–1404. http://dx.doi.org/10.1109/LCSYS.2020.3037842.
Maupong, T., Mayo-Maldonado, J. C., & Rapisarda, P. (2017). On Lyapunov functions and data-driven dissipativity. IFAC-PapersOnLine, 50(1), 7783–7788.
Mishra, V., Markovsky, I., & Grossmann, B. (2020). Data-driven tests for controllability. IEEE Control Systems Letters, 5, 517–522.
Monshizadeh, N. (2020). Amidst data-driven model reduction and control. IEEE Control Systems Letters, 4(4), 833–838.
Schwarz, J., Micheli, F., Hudoba de Badyn, M., & Smith, R. (2019). Data-driven control of buildings and energy hubs (Semester thesis), ETH Zurich. Available at https://www.research-collection.ethz.ch/.
Strässer, R., Berberich, J., & Allgöwer, F. (2020). Data-driven control of nonlinear systems: Beyond polynomial dynamics. ArXiv preprint arXiv:2011.11355.
Tischhauser, F., Egli, P., Coulson, J., Hutter, M., & Dörfler, F. (2019). Data-enabled predictive control of robotic systems (Semester thesis), ETH Zurich. Available at https://www.research-collection.ethz.ch/.
Treven, L., Curi, S., Mutny, M., & Krause, A. (2020). Learning controllers for unstable linear quadratic regulators from a single trajectory. ArXiv preprint arXiv:2006.11022.
Umenberger, J., Ferizbegovic, M., Schön, T., & Hjalmarsson, H. (2019). Robust exploration in linear quadratic reinforcement learning. In 33rd annual conference on neural information processing systems.
Usevich, K., & Markovsky, I. (2014). Variable projection for affinely structured low-rank approximation in weighted 2-norms. Journal of Computational and Applied Mathematics, 272, 430–448.
Vajpayee, V., Mukhopadhyay, S., & Tiwari, A. P. (2017). Data-driven subspace predictive control of a nuclear reactor. IEEE Transactions on Nuclear Science, 65(2), 666–679.
Van Overschee, P., & De Moor, B. (1996). Subspace identification for linear systems: Theory, implementation, applications. Boston: Kluwer.
van Waarde, H., Camlibel, K., & Mesbahi, M. (2020). From noisy data to feedback controllers: Non-conservative design via a matrix S-lemma. ArXiv preprint arXiv:2006.00870.
van Waarde, H. J., De Persis, C., Camlibel, M. K., & Tesi, P. (2020). Willems' fundamental lemma for state-space systems and its extension to multiple datasets. IEEE Control Systems Letters, 4, 602–607.
Verhaegen, M., & Dewilde, P. (1992). Subspace model identification, part 2: Analysis of the output-error state-space model identification algorithm. International Journal of Control, 56, 1187–1210.
Verhoek, C., Tóth, R., Haesaert, S., & Koch, A. (2021). Fundamental lemma for data-driven analysis of linear parameter-varying systems. ArXiv preprint arXiv:2103.16171.
van Waarde, H. J. (2021). Beyond persistency of excitation: Online experiment design for data-driven modeling and control. ArXiv preprint arXiv:2102.11193.
van Waarde, H. J., & Camlibel, M. K. (2021). A matrix Finsler's lemma with applications to data-driven control. ArXiv preprint arXiv:2103.13461.
van Waarde, H. J., Camlibel, M. K., Rapisarda, P., & Trentelman, H. L. (2021). Data-driven dissipativity analysis: application of the matrix S-lemma. IEEE Control Systems Magazine, submitted for publication.
Nonhoff, M., & Müller, M. A. (2021). Data-driven online convex optimization for control van Waarde, H. J., Eising, J., Trentelman, H. L., & Camlibel, M. K. (2020). Data infor-
of dynamical systems. ArXiv preprint arXiv:2103.09127. mativity: a new perspective on data-driven analysis and control. IEEE Transactions
Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and Trends in on Automatic Control, 65(11), 4753–4768.
Optimization, 1, 123–231. Wegner, F., Coulson, J., Hudoba de Badyn, M., Lygeros, J., & Trimpe, S. (2021). Data-
Pillai, H., & Willems, J. (1999). The behavioural approach to distributed systems. In enabled predictive control of a 12t excavator (Master Thesis), ETH Zurich, Available
Proc. 38th IEEE conference on decision and control, Vol. 1 (pp. 626–630). at https://www.research-collection.ethz.ch/.
Pillonetto, G., Dinuzzo, F., Chen, T., De Nicolao, G., & Ljung, L. (2014). Kernel Willems, J. C. (1986). From time series to linear system—Part I. Finite dimensional
methods in system identification, machine learning and function estimation: A linear time invariant systems. Automatica, 22(5), 561–580.
survey. Automatica, 50(3), 657–682. Willems, J. C. (1986, 1987). From time series to linear system—Part I. Finite dimen-
Polderman, J., & Willems, J. C. (1998). Introduction to mathematical systems theory. sional linear time invariant systems, part II. Exact modelling, part III. approximate
Springer-Verlag. modelling. Automatica, 22, 23, 561–580, 675–694, 87–115.
Proctor, J. L., Brunton, S. L., & Kutz, J. N. (2016). Dynamic mode decomposition with Willems, J. C. (1991). Paradigms and puzzles in the theory of dynamical systems. IEEE
control. SIAM Journal on Applied Dynamical Systems, 15(1), 142–161. Transactions on Automatic Control, 36(3), 259–294.
Rapisarda, P., & Willems, J. C. (1997). State maps for linear systems. SIAM Journal on Willems, J. C. (2007a). The behavioral approach to open and interconnected systems:
Control and Optimization, 35(3), 1053–1091. Modeling by tearing, zooming, and linking. Control Systems Magazine, 27, 46–99.
Rawlings, J. B., Mayne, D. Q., & Diehl, M. (2017). Model predictive control: theory, Willems, J. C. (2007b). In control, almost from the beginning until the day after
computation, and design, Vol. 2. Nob Hill Publishing Madison, WI. tomorrow. European Journal of Control, 13, 71–81.
Recht, B. (2019). A tour of reinforcement learning: The view from continuous control. Willems, J. C., Rapisarda, P., Markovsky, I., & De Moor, B. (2005). A note on persistency
Annual Review of Control, Robotics, and Autonomous Systems, 2, 253–279. of excitation. Control Letters, 54(4), 325–329.
Romer, A., Berberich, J., Köhler, J., & Allgöwer, F. (2019). One-shot verification of Xu, H., Caramanis, C., & Mannor, S. (2010). Robust regression and lasso. IEEE
dissipativity properties from input–output data. IEEE Control Systems Letters, 3(3), Transactions on Information Theory, 56(7), 3561–3574.
709–714. Xu, L., Turan, M. S., Guo, B., & Ferrari-Trecate, G. (2021). A data-driven convex
Roorda, B., & Heij, C. (1995). Global total least squares modeling of multivariate time programming approach to worst-case robust tracking controller design. ArXiv
series. IEEE Transactions on Automatic Control, 40(1), 50–63. preprint arXiv:2102.11918.
Rosa, T. E., & Jayawardhana, B. (2021). On the one-shot data-driven verification Xue, A., & Matni, N. (2020). Data-driven system level synthesis. arXiv:2011.10674.
of dissipativity of lti systems with general quadratic supply rate function. ArXiv Yang, H., & Li, S. (2013). A new method of direct data-driven predictive controller
preprint arXiv:2104.03108. design. In 2013 9th Asian control conference (ASCC) (pp. 1–6). IEEE.
Rotulo, M., De Persis, C., & Tesi, P. (2021). Online learning of data-driven controllers Yin, M., Iannelli, A., & Smith, R. (2020a). Maximum likelihood estimation in
for unknown switched linear systems. ArXiv preprint arXiv:2105.11523. data-driven modeling and control.
Rueda-Escobedo, J. G., Fridman, E., & Schiffer, J. (2020). Data-driven control for linear Yin, M., Iannelli, A., & Smith, R. S. (2020b). Maximum likelihood signal matrix model
discrete-time delay systems. ArXiv preprint arXiv:2010.02657. for data-driven predictive control. ArXiv preprint arXiv:2012.04678.
Rueda-Escobedo, J., & Schiffer, J. (2020). Data-driven internal model control of Yu, Y., Talebi, S., van Waarde, H., Topcu, U., Mesbahi, M., & Açıkmeşe, B. (2021).
second-order discrete Volterra systems. arXiv:2003.14158. On controllability and persistency of excitation in data-driven control: Extensions
Salvador, J. R., Ramírez, D. R., Alamo, T., de la Peña, D. M. n., & Garcia-Marin, G. of Willems’ fundamental lemma. arXiv:2102.02953v1.
(2019). Data driven control: an offset free approach. In 2019 18th European control Zeng, J.-s., Gao, C.-h., & Su, H.-Y. (2010). Data-driven predictive control for blast
conference (pp. 23–28). IEEE. furnace ironmaking process. Computers & Chemical Engineering, 34(11), 1854–1862.
Scherer, C., & Weiland, S. (2000). Linear matrix inequalities in control. Lecture Notes,
Dutch Institute for Systems and Control, Delft, the Netherlands, 3(2).