Slides

System Identification and Parameter Estimation
Course numbers: 191131700 (5 EC for MSc ME/S&C)
                201600356 (6 EC for PDEng)
Canvas: http://canvas.utwente.nl/
[Figure: measured input u [−] and output y [−] versus time t [s]; the right panels zoom in on the interval 0.5–0.505 s.]
This data will be used more often during the lectures and is available for download.
Click-and-play??
• And what is the use of these lectures?
An overview of the applied methods, the background of the algorithms, the
pitfalls and suggestions for use.
Y(s)/U(s) = G(s)  or  y(t)/u(t) = G(s)
• Input/output selection
• Experiment design
• Collection of data
• Choice of the model structure (set of possible solutions)
• Estimation of the parameters
• Validation of the obtained model (preferably with separate
“validation” data).
• MIMO-systems
• Identification in the frequency domain
• Identification of closed loop systems
• Non-linear optimisation
• Cases.
Course material:
Lecture notes / slides in PDF-format from Canvas site.
On-line MATLAB documentation of the (selected) toolboxes.
Examination (5 EC):
Part 1: Individual open book notebook exam at the end of the block. Questions
are similar to exercises 1–5 about the topics in chapters 1–6. The answers to
these exercises will be addressed during the lectures, see Activity plan.
The grade for this exam is 50% of the final grade and must be a pass (≥ 5.5).
Part 2: “Standard” assignments (with/without lab assignment) or your own plan
An oral exam may be scheduled to finalise the grade.
Planning: Can be completed at the end of block 2A or spread out over the two
blocks 2A and 2B; see Canvas for the grading schedule.
Questions?
Lecture breaks, reserved office hours (see Canvas), ...
A description of the system T should specify how the output signal(s) y depend
on the input signal(s) u.
It specifies for each frequency ω the input-output relation for harmonic signals
(after the transient behaviour vanished):
The output y(t) for an arbitrary input signal u(t) can be found by considering
all frequencies in the input signal: Fourier transform.
Procedure (in principle): Transform the input into the frequency domain, multiply
with the Bode plot or FRF, transform the result back to the time domain.
The input signal can be considered as a sequence of impulses with some (time
dependent) amplitude u(t). The outputs due to all impulses are added:
y(t) = (g ∗ u)(t) = ∫₀^∞ g(τ) u(t − τ) dτ

y(k) = (g ∗ u)(k) = Σ_{l=0}^{∞} g(l) u(k − l)    (k = 0, 1, 2, ...)
Note that the discrete-time g(l) approximately equals the continuous-time g(l·Ts) multiplied by Ts.
Signals are sampled at discrete time instants tk with sample time Ts:
Stability:
  Continuous time: poles in the LHP, Re(s) < 0.    Discrete time: poles inside the unit circle, |z| < 1.
Undamped poles:
  Continuous time: imaginary axis, s = iω.    Discrete time: unit circle, z = e^{iωTs}.
[Figure: time response versus t [s] and phase [deg] versus ω [rad/s] for a sampled system with Ts = 0.1 s.]
MATLAB’s identification toolbox ident works with time domain data. Even then,
the frequency domain will appear to be very important. Furthermore,
identification can also be applied in the frequency domain.
UN(ωl) with ωl = (l/N)·ωs = (l/N)·(2π/Ts), l = 0, ..., N − 1, is the discrete Fourier
transform (DFT) of the signal ud(tk) with tk = kTs, k = 0, ..., N − 1.
For N equal a power of 2, the Fast Fourier Transform (FFT) algorithm can be
applied.
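As a sketch (not part of the course material), the DFT and periodogram above can be computed with numpy's FFT; the values of N, Ts and the 5 Hz test sine are assumed for illustration:

```python
import numpy as np

# DFT of a sampled signal u_d(t_k), t_k = k*Ts, and its periodogram.
N, Ts = 1024, 0.01               # N a power of 2, so the FFT applies
t = np.arange(N) * Ts
u = np.sin(2 * np.pi * 5.0 * t)  # test signal at 5 Hz (assumed example)

U = np.fft.fft(u)                # U_N(omega_l) at omega_l = (l/N) * 2*pi/Ts
Phi = np.abs(U) ** 2 / N         # periodogram: (1/N) |U_N(omega)|^2

f = np.fft.fftfreq(N, Ts)        # frequency grid in Hz
peak_bin = int(np.argmax(Phi[:N // 2]))
print(f[peak_bin])               # near the 5 Hz excitation frequency
```

The peak of the periodogram lands in the DFT bin closest to the excitation frequency.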
[Figure: time signal y [−] versus t [s] (left) and its DFT magnitude |YN| [−] versus frequency f [Hz] on log-log axes (right).]
1
Power spectrum: Φu(ω) = lim_{T→∞} (1/T) |UT(ω)|²

Power: Pu = lim_{T→∞} (1/T) ∫₀^T u(t)² dt = (1/2π) ∫_{−∞}^{∞} Φu(ω) dω

[ UT(ω) is the Fourier transform of a continuous-time signal with a finite duration ]
Power spectrum: Φu(ω) = lim_{N→∞} (1/N) |UN(ω)|²    (“periodogram”)

Power: Pu = lim_{N→∞} (1/N) Σ_{k=0}^{N−1} ud(k)² = ∫_{ωs} Φu(ω) dω
[Figure: output signal y [−] versus t [s] (left) and the spectrum of output #1 on logarithmic axes (right).]
Continuous time: y(t) = (g ∗ u)(t) = ∫₀^∞ g(τ) u(t − τ) dτ

Discrete time: y(k) = (g ∗ u)(k) = Σ_{l=0}^{∞} g(l) u(k − l)    (k = 0, 1, 2, ...)
Example: u is the input and g(k), k = 0, 1, 2, ... is the impulse response of the
system, that is, the response to an input signal that equals 1 for t = 0 and
equals 0 elsewhere.
Then with the expressions above y(k) is the output of the system.
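This impulse-response statement can be checked numerically; a minimal sketch with an assumed first-order response g(k) = 0.5^k:

```python
import numpy as np

# Discrete convolution y(k) = sum_l g(l) u(k-l): for a unit pulse input
# the output reproduces the impulse response itself.
g = 0.5 ** np.arange(50)       # truncated impulse response (assumed example)
u = np.zeros(50)
u[0] = 1.0                     # input: 1 at t = 0, 0 elsewhere

y = np.convolve(g, u)[:50]     # y(k) = (g * u)(k)
print(np.allclose(y, g))
```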
Realisation of a signal x(t) is not only a function of time t, but depends also on
the ensemble behaviour.
White noise: e(t) is not correlated with signals e(t − τ) for any τ ≠ 0.
Consequence: Re(τ) = 0 for τ ≠ 0.
Power:
E(x(t) − E x(t))² = Rx(0) = (1/2π) ∫_{−∞}^{∞} Φx(ω) dω and
E(xd(t) − E xd(t))² = Rxd(0) = (T/2π) ∫_{ωs} Φxd(ω) dω
[Block diagram: input u enters G; the disturbance v is added to the output of G, giving y.]
Properties of these non-parametric estimates:
• They give models with “many” numbers; we do not obtain models with a “small”
number of parameters.
• The results are not “simple” mathematical relations.
• The results are often used to check the “simple” mathematical relations that
are found with (subsequent) parametric identification.
• Non-parametric identification is often the first step.
∗The IDENT commands impulse & step use a different approach that is related to the
parametric identification to be discussed later.
[Block diagram: u → G0 with disturbance v added at the output.]

y(t) = G0(z) u(t) + v(t)
y(t) = Σ_{k=0}^{∞} g0(k) u(t − k) + v(t)    (t = 0, 1, 2, ...)

So the transfer function can be written as: G0(z) = Σ_{k=0}^{∞} g0(k) z^{−k}

y(t) ≈ Σ_{k=0}^{M} ĝ(k) u(t − k)    (t = 0, 1, 2, ...)
Note: In an analysis the lower limit of the summation can be taken less than 0 (e.g. −m) to
verify the (non-)existence of a non-causal relation between u(t) and y(t).
Ryu(τ) = Σ_{k=0}^{∞} g0(k) Ru(τ − k)

For a white-noise input: g0(τ) = Ryu(τ)/σu²  and  ĝ(τ) = R̂yu(τ)/σu²
How do we compute the estimator for the cross covariance R̂yu (τ ) from N
measurements?
R̂ᴺyu(τ) = (1/N) Σ_{t=τ}^{N} y(t) u(t − τ)
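A numerical sketch of this correlation estimate (with an assumed white input and example impulse response; not the ident implementation):

```python
import numpy as np

# Correlation analysis: for a (prewhitened) input with variance sigma_u^2,
# g_hat(tau) = Ryu_hat(tau) / sigma_u^2 estimates the impulse response.
rng = np.random.default_rng(0)
N = 50000
u = rng.standard_normal(N)              # white input
g0 = 0.5 ** np.arange(10)               # assumed "true" impulse response
y = np.convolve(u, g0)[:N]              # noise-free output

def Ryu_hat(tau):
    # cross-covariance estimate (1/N) * sum_t y(t) u(t - tau)
    return np.dot(y[tau:], u[:N - tau]) / N

g_hat = np.array([Ryu_hat(tau) for tau in range(10)]) / np.var(u)
print(np.max(np.abs(g_hat - g0)))       # estimation error decays as 1/sqrt(N)
```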
and look for a “best fit” of the n parameters such that e.g.
(1/N) Σ_{k=1}^{N} uF(k)² is minimised.
⇒ Exercise 3.
So G0(q) = (q⁻¹ + 0.5 q⁻²)/(1 − 1.5 q⁻¹ + 0.7 q⁻²)  or  G0(z) = (z + 0.5)/(z² − 1.5 z + 0.7)

and H0(q) = (1 − q⁻¹ + 0.2 q⁻²)/(1 − 1.5 q⁻¹ + 0.7 q⁻²)  or  H0(z) = (z² − z + 0.2)/(z² − 1.5 z + 0.7)
Simulation: N = 4096
T = 1s
fs = 1 Hz
u(t) binary signal in frequency band 0..0.3fs
e(t) “white” noise (random signal) with variance 1
[Figure: estimated impulse response versus time [s] (0–40 s), together with the exact G0.]
Warning: All equations starting from y(t) = G0(z) u(t) + v(t) do not account
for offsets due to non-zero means in input and/or output. So detrend!
Ts = 33e-6;
u = piezo(:,2);
y = piezo(:,1);
piezo = iddata(y,u,Ts);
piezod = detrend(piezo);
plot(piezod);

[Figure: detrended output y1 and input u1 of the piezo data set versus time, 0–1 s.]
[ir,R,cl]=cra(piezod,200,10,2);

[Figure: cra output — covariance functions (top), the correlation from u to y (prewhitened) and the impulse response estimate (bottom).]

Upper right: u is indeed whitened.
Lower right: the impulse response is causal.
The horizontal axes count the time samples, so the values should be scaled with Ts = 33 µs.
[Block diagram: u → G0 with disturbance v added at the output.]

y(t) = G0(z) u(t) + v(t)

G0(e^{iωT}) = Y(ω)/U(ω).

Effect of v: ĜN(e^{iωT}) = G0(e^{iωT}) + VN(ω)/UN(ω).
Difficulty: For N → ∞ there is more data, but there are also estimators at more
(=N/2) frequencies, all with a finite variance.
Solutions:
1. Define a fixed period N0 and consider an increasing number of
measurements N = rN0 by r → ∞. Carry out the spectral analysis for each
period and compute the average to obtain a “good” estimator in N0/2
frequencies.
2. Smooth the spectrum in the f -domain.
[Figures: estimated input and output spectra on log-log axes (frequency in Hz); after prewhitening the input spectrum is “reasonably” white.]
Estimate the transfer function G with Fourier transforms ĜN(e^{iωT}) = YN(ω)/UN(ω)
in 128 (default) frequencies.
Smoothing is also applied and again depends on a parameter M , that sets the
width of the Hamming window of the applied filter.
• Not always straightforward to predict which method will perform best. Why
not try both?
• ETFE is preferred for systems with clear peaks in the spectrum.
• SPA also estimates the spectrum of the noise v(t) = y(t) − G0(z) u(t)
according to
   Φ̂v(ω) = Φ̂y(ω) − |Φ̂yu(ω)|²/Φ̂u(ω)
• Measure of signal-to-noise ratio with the coherence spectrum
   κ̂yu(ω) = √( |Φ̂yu(ω)|² / (Φ̂y(ω) Φ̂u(ω)) )
So G0(q) = (q⁻¹ + 0.5 q⁻²)/(1 − 1.5 q⁻¹ + 0.7 q⁻²)  or  G0(z) = (z + 0.5)/(z² − 1.5 z + 0.7)

and H0(q) = (1 − q⁻¹ + 0.2 q⁻²)/(1 − 1.5 q⁻¹ + 0.7 q⁻²)  or  H0(z) = (z² − z + 0.2)/(z² − 1.5 z + 0.7)
Simulation: N = 4096
T = 1s
fs = 1 Hz
u(t) binary signal in frequency band 0..0.3fs
e(t) “white” noise (random signal) with variance 1
[Figure: Bode amplitude and phase plots (frequency in Hz) of G0 and the estimates “ETFE, M = 30” and “SPA, M = 30”.]
[Figure: Bode amplitude and phase plots (frequency in Hz) for window parameters M = 15, 30, 60 (*), 90 and 120; w = 1, 2, 4.]
[Figure: Bode amplitude and phase plots (frequency in Hz) for window parameters M = 15, 30, 60, 90 (*) and 120.]
Regression:
• Problem: find function of the regressors g(ϕ) that minimises the difference
y − g(ϕ) in some sense.
So ŷ = g(ϕ) should be a good prediction of y.
• Linear fit y = ax + b.
[Figure: data points y [−] versus x [−] with a fitted line.]
Then g(ϕ) = ϕᵀθ with input vector ϕ = [x; 1]
and parameter vector θ = [a; b]. So: g(ϕ) = [x 1] [a; b].
• Quadratic fit y = c2 x² + c1 x + c0.
[Figure: data points y [−] versus x [−] with a fitted parabola.]
Then g(ϕ) = ϕᵀθ with input vector ϕ = [x²; x; 1]
and parameter vector θ = [c2; c1; c0]. So: g(ϕ) = [x² x 1] [c2; c1; c0].
• Minimise VN(θ) = (1/N) Σ_{t=1}^{N} [y(t) − g(ϕ(t))]².

• Linear case: VN(θ) = (1/N) Σ_{t=1}^{N} [y(t) − ϕᵀ(t)θ]².

• In the linear case the “cost” function VN(θ) = (1/N) Σ_{t=1}^{N} [y(t) − ϕᵀ(t)θ]²
is a quadratic function of θ.
• It can be minimised analytically: all partial derivatives ∂VN(θ)/∂θ have to be
zero in the minimum:
   (1/N) Σ_{t=1}^{N} 2 ϕ(t) [y(t) − ϕᵀ(t)θ] = 0
• A global minimum is found for θ̂N that satisfies a set of linear equations, the
normal equations
   [ (1/N) Σ_{t=1}^{N} ϕ(t)ϕᵀ(t) ] θ̂N = (1/N) Σ_{t=1}^{N} ϕ(t) y(t).
With ΦN = [ϕᵀ(1); ...; ϕᵀ(N)] and YN = [y(1); ...; y(N)]:

• Normal equations: ΦNᵀ ΦN θ̂N = ΦNᵀ YN.
• Estimate θ̂N = ΦN† YN
with the (Moore–Penrose) pseudoinverse of ΦN: ΦN† = [ΦNᵀ ΦN]⁻¹ ΦNᵀ.

Note: ΦN† ΦN = I.
In Matlab:
x = A\b; % Preferred
x = pinv(A)*b;
x = inv(A’*A)*A’*b;
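A Python analogue of these three MATLAB routes (a sketch; numpy.linalg.lstsq plays the role of A\b, and the example data are assumed):

```python
import numpy as np

# Three equivalent least-squares solutions for y = a*x + b.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 0.4 * x + 2.0 + 0.01 * rng.standard_normal(50)   # assumed example data

Phi = np.column_stack([x, np.ones_like(x)])          # regressors [x 1]
theta1 = np.linalg.lstsq(Phi, y, rcond=None)[0]      # preferred (like A\b)
theta2 = np.linalg.pinv(Phi) @ y                     # pseudoinverse route
theta3 = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)     # normal equations

print(theta1)                                        # close to [0.4, 2.0]
```

As in MATLAB, the explicit normal-equation route is numerically the least robust of the three.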
Cost function VN = (1/N) Σ (yi − a xi − b)².

1) “Manual” solution: ∂VN/∂a = 0 and ∂VN/∂b = 0, so

   Σ −xi (yi − a xi − b) = 0  and  Σ −(yi − a xi − b) = 0

   ⇔ [ Σ xi²  Σ xi ; Σ xi  Σ 1 ] [a; b] = [ Σ xi yi ; Σ yi ]

Parameter estimate: [â; b̂] = [ Σ xi²  Σ xi ; Σ xi  Σ 1 ]⁻¹ [ Σ xi yi ; Σ yi ]
Cost function VN = (1/N) Σ (yi − a xi − b)².

2) Matrix solution: YN = [y(1); ...; y(N)], ΦN = [x(1) 1; ...; x(N) 1] and θ = [a; b].

Cost function (in vector form): VN = (1/N) ||YN − ΦN θ||₂².

Estimate θ̂N = ΦN† YN = [ΦNᵀ ΦN]⁻¹ ΦNᵀ YN.
that filters a given signal u(t) with uF(k) = L(z) u(k), such that
VN = (1/N) Σ_{k=1}^{N} uF(k)² is minimised.

1) “Manual” solution: ∂VN/∂ai = 0 for all i = 1, 2, ..., n.
The partial derivatives ∂VN/∂ai can be computed, etc.
2) Vector form

VN = (1/N) || [ u(n+1) + a1 u(n)   + a2 u(n−1) + ... + an u(1) ;
                u(n+2) + a1 u(n+1) + a2 u(n)   + ... + an u(2) ;
                ... ;
                u(N)   + a1 u(N−1) + a2 u(N−2) + ... + an u(N−n) ] ||²

Recognise YN and ΦN, then compute the best fit θ̂N = ΦN† YN.
Matrix solution: collect θ = [a1; b1; b2], YN = [y(3); ...; y(N)] and

ΦN = [ −y(2)    u(2)    u(1) ;
       ...      ...     ... ;
       −y(N−1)  u(N−1)  u(N−2) ].

Estimate θ̂N = [ΦNᵀ ΦN]⁻¹ ΦNᵀ YN or PHIN\YN.
• Write the cost function as VN = (1/N) ||YN − ΦN θ||₂².
• Solution for the best fit: θ̂N = ΦN† YN,
with pseudo-inverse ΦN† = [ΦNᵀ ΦN]⁻¹ ΦNᵀ.
The condition number of matrix Φ is the ratio of the largest and smallest singular
values, σ1/σd.
• If all σi > 0 (i = 1, 2, ..., d), then the condition number is finite and
ΦᵀΦ is invertible: matrix Φ has full (column) rank.
• “Large” condition numbers indicate near-singularity of ΦNᵀΦN and will
lead to numerical inaccuracies.
Regression matrix ΦN = [x²(1) x(1) 1; ...; x²(N) x(N) 1] depends only on x(i).
[Figures: quadratic fits of y [−] versus x [−] for data on x ∈ [0, 10] (left) and x ∈ [0, 20] (right).]

Left:  svd(phi) = [160.3; 5.2; 1.2],  cond(phi) = 131,   phi\y = [0.43; −2.33; −2.25],  exact: [0.40; −2.00; −3.00]
Right: svd(phi) = [842.5; 9.1; 0.1],  cond(phi) = 6364,  phi\y = [0.43; −10.9; 63.8],   exact: [0.40; −10.0; 57.0]
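The effect can be reproduced in a few lines (a sketch with an assumed number of samples): widening the x-range makes the [x² x 1] regression matrix markedly worse conditioned.

```python
import numpy as np

# Condition number sigma_1/sigma_d of the quadratic regression matrix
# for two ranges of the regressor x.
def cond_of(xmax):
    x = np.linspace(0, xmax, 30)
    Phi = np.column_stack([x**2, x, np.ones_like(x)])
    s = np.linalg.svd(Phi, compute_uv=False)
    return s[0] / s[-1]        # ratio of largest and smallest singular value

c10, c20 = cond_of(10), cond_of(20)
print(c10, c20)                # the wider range is (much) worse conditioned
```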
Φ† = V1 Σ1⁻¹ U1ᵀ with Σ1⁻¹ = diag(1/σ1, 1/σ2, ..., 1/σr).

[u,s,v]=svd(phi);
r=1:2; % rank

[Figure: resulting truncated-SVD fit of y [−] versus x [−].]
What is left?
Observations:
• G(z) = Σ_{k=0}^{∞} g(k) z^{−k}
• G(z) = D + C(zI − A)⁻¹ B
• g(k) = C A^{k−1} B for k ≥ 1 and g(0) = D.

Then Hnr,nc = [C; CA; ...; CA^{nr−1}] · [B  AB  ···  A^{nc−1}B]
H = Un Σn Vnᵀ

Note that zero singular values are removed in the singular value decomposition.

B from the first column of H2 = Σn^{1/2} Vnᵀ and
C from the first row of H1 = Un Σn^{1/2}.

A is Σn^{−1/2} Unᵀ H← Vn Σn^{−1/2}, using the shifted Hankel matrix

H←nr,nc = [ g(2)     g(3)     g(4)     ···  g(nc+1) ;
            g(3)     g(4)     g(5)     ···  g(nc+2) ;
            g(4)     g(5)     g(6)     ···  g(nc+3) ;
            ...      ...      ...           ... ;
            g(nr+1)  g(nr+2)  g(nr+3)  ···  g(nr+nc) ]
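The realisation steps above can be sketched for a first-order example (assumed Markov parameters g(k) = 2·0.5^{k−1}, i.e. A = 0.5, B = 1, C = 2, D = 0):

```python
import numpy as np

# Ho-Kalman style realisation from the Hankel matrix of Markov parameters.
g = lambda k: 2.0 * 0.5 ** (k - 1)
nr = nc = 5
H  = np.array([[g(i + j + 1) for j in range(nc)] for i in range(nr)])
Hs = np.array([[g(i + j + 2) for j in range(nc)] for i in range(nr)])  # shifted

U, s, Vt = np.linalg.svd(H)
n = int(np.sum(s > 1e-8))                 # model order from the singular values
U1, s1, V1t = U[:, :n], s[:n], Vt[:n]
sq = np.sqrt(s1)

C_est = (U1 * sq)[0, :]                   # first row of U1 * Sigma^(1/2)
B_est = (sq[:, None] * V1t)[:, 0]         # first column of Sigma^(1/2) * V1^T
A_est = (U1 / sq).T @ Hs @ (V1t.T / sq)   # Sigma^(-1/2) U1^T Hs V1 Sigma^(-1/2)

print(n, A_est.item(), np.dot(C_est, B_est))
```

The SVD fixes B and C only up to a common sign, but A and the product C·B (and hence the Markov parameters) are recovered uniquely.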
SVD’s: 7.410, 3.602, 0.033, 0.033, 0.023, 0.023, 0.016, 0.015, 0.009

Conclusion: order n = 2.
As a transfer function: Ĝ(z) = (0.999 z + 0.496)/(z² − 1.502 z + 0.701)
Pole-zero plot:
[Figure: poles (×) and zeros (○) of G0 and Ĝ, inside the unit circle.]
+ No explicit model equations needed (specify only the order n, very well
suited for MIMO).
+ Mathematically elegant (robust, reliable) and efficient (optimisation with
linear equations).
− Not all mathematical issues of the optimisation are settled.
− The obtained solution is “sub-optimal”,
+ but is well suited as an initial guess to obtain better PE-models (these will
be discussed afterwards and may need non-linear iterative optimisation
algorithms).
Solution approach:

[Figures: poles and zeros of G0 and Ĝn4s2 (left); log of singular values versus model order (right).]
[Figures: log of singular values (left) and pole-zero map (right).]
Spectral analysis:
[Figure: Bode amplitude and phase plot (frequency in Hz) from spectral analysis.]
A model with the essential dynamics of a system with a finite (and limited)
number of parameters is wanted for
Noise model: v(t) with power spectrum Φv (ω) that can be written as
As H(z) has a stable inverse, also E_{t−1} is known, and then also
m(t − 1) = [1 − H⁻¹(z)] v(t).
Predictor models: “Tuning” of the transfer functions G(z) and H(z). When they
both are exactly equal to the “real” G0(z) and H0(z), the prediction error ε(t)
is a white noise signal.
In practice: “tune” the estimates to minimise the error ε(t) with a least squares
fit.
Model candidates from the model set M = {(G(z, θ), H(z, θ)) | θ ∈ Θ ⊂ Rd}.
Examples of parameterisations:
G(z, θ) = B(z⁻¹, θ)/A(z⁻¹, θ),   H(z, θ) = 1/A(z⁻¹, θ),

y(t) = [B(z⁻¹, θ)/A(z⁻¹, θ)] u(t) + [1/A(z⁻¹, θ)] e(t)
PE-methods: consider the prediction error ε(t, θ) = y(t) − ŷ(t|t − 1, θ) for all t
as a function of θ.
If
• the system (G0, H0) is in the chosen model set and
• Φu(ω) ≠ 0 in sufficient frequencies (sufficiently exciting)
then G(z, θN ) and H(z, θN ) are consistent estimators.
If
• the system G0 is in the chosen model set,
• G and H are parameterised independently (FIR, OE, BJ) and
• Φu(ω) ≠ 0 in sufficient frequencies (sufficiently exciting)
then G(z, θN) is a consistent estimator.
Note: The IV-method combined with an ARX model set can also provide a
consistent estimator for G0 even when the noise model is not in the chosen
model set.
[Figures: Bode amplitude and phase plots (frequency in rad/s), pole-zero map, noise spectrum and impulse response (time, 0–40) of the estimated model.]
For system that fit in the model set, it can be proven for n → ∞, N → ∞ and
n/N → 0 (n ≪ N ) that
var(ĜN(e^{iω})) ∼ (n/N) · Φv(ω)/Φu(ω),

var(ĤN(e^{iω})) ∼ (n/N) · Φv(ω)/σe² = (n/N) |H0(e^{iω})|².
Cost function: VN = (1/N) Σ_{t=1}^{N} ε(t, θ)².

With Parseval: V̄(θ) = (1/2π) ∫_{−∞}^{∞} Φε(ω, θ) dω

shows the limit θ* to which the estimator θN converges.
Two mechanisms:
y(t) = G0(z)u(t), with white noise input (Φu(ω) ≈ 1) and 4th order G0
Approximate models:
• 2nd order OE (oe221): y(t) = [(b1 z⁻¹ + b2 z⁻²)/(1 + f1 z⁻¹ + f2 z⁻²)] u(t) + e(t)
• 2nd order ARX (arx221):
(1 + a1 z⁻¹ + a2 z⁻²) y(t) = (b1 z⁻¹ + b2 z⁻²) u(t) + e(t)
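The ARX structure reduces to linear least squares; a first-order sketch (assumed true parameters a = −0.8, b = 1.0; not the ident implementation):

```python
import numpy as np

# Estimate a first-order ARX model  y(t) + a*y(t-1) = b*u(t-1) + e(t)
# by linear least squares on the regression y(t) = [-y(t-1) u(t-1)] @ theta.
rng = np.random.default_rng(3)
N = 5000
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)        # white equation-error noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.8 * y[t - 1] + 1.0 * u[t - 1] + e[t]   # i.e. a = -0.8, b = 1.0

Phi = np.column_stack([-y[:-1], u[:-1]])
a_hat, b_hat = np.linalg.lstsq(Phi, y[1:], rcond=None)[0]
print(a_hat, b_hat)                     # close to -0.8 and 1.0
```

With white equation-error noise, as here, the ARX least-squares estimate is consistent.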
[Figures: Bode amplitude plots (frequency in rad/s) of G0 and the OE and ARX estimates (left) and of |A| (right).]
Background: in the ARX estimation there is an extra filtering with the (a priori
unknown) function (right):
   1/|H(e^{iω}, θ)|² = |A(e^{iω}, θ)|²
With Parseval we obtain the limit θ* to which the estimator θN converges, now
by minimising
   (1/2π) ∫_{−∞}^{∞} |G0(e^{iω}) − G(e^{iω}, θ)|² · Φu(ω)/|H*(e^{iω})|² dω

Mechanism:
Find the least squares estimate G(e^{iω}, θ) of G0(e^{iω}) by applying a frequency
domain weighting function Φu(ω)/|H*(e^{iω})|².
New cost function: VN = (1/N) Σ_{t=1}^{N} εF(t, θ)² = (1/N) Σ_{t=1}^{N} (L(z) ε(t, θ))².

V̄F(θ) = (1/2π) ∫_{−∞}^{∞} { |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) } · |L(e^{iω})|²/|H(e^{iω}, θ)|² dω
[Figure: Bode amplitude plots (frequency in rad/s) of the estimates: ARX with L(z) = 1, OE with L(z) = L1(z), ARX with L(z) = L2(z).]
• Both the 2nd order OE model without prefilter and the 2nd order ARX
model with prefilter L2(z) give a good model estimate at low frequencies.
For the obtained ARX-model the Bode amplitude plots of the prefilter
|L2(eiω )| (dashed) and the ultimate weighting function |L2(eiω )A∗(eiω )|
(solid) are:
[Figure: frequency response — Bode amplitude plots (frequency in rad/s) of the prefilter (dashed) and the ultimate weighting function (solid).]
Note that ARX models (with and without a prefilter) can be computed
uniquely and quickly from a linear optimisation problem, whereas the
identification of OE models involves an iterative algorithm for a nonlinear
optimisation.
Impulse responses of: G0, arx120, arx230, arx340, oe210 and oe320:
[Figures: impulse responses versus time of G0 and the models arx120, arx230, arx340, oe210 and oe320, over two time ranges.]
What are the criteria to make these choices in order to obtain a good model for
a fair price?
Conflict:
• For a small bias a large model set is necessary (high order, flexible
model structure).
• A small variance is obtained easier with a small number of parameters.
• A priori considerations
• Analysis of the data
• A posteriori comparison of the models
• Validation of the models
Relation between the number of data points N and the number of parameters
to be estimated Nθ : General: N ≫ Nθ
Rule of thumb: N > 10Nθ
Note that the required number of points depends strongly on the signal-to-noise
ratio.
For a chosen model structure, compare the obtained results for the criterion
VN(θ̂N, Zᴺ) as a function of the parameter sets θ̂N of different model orders.
[Figure: loss function versus number of parameters for the estimation data (left) and the validation data (right).]

Identify θ̂N from Z⁽¹⁾ and evaluate the criterion for Z⁽²⁾ (right graph): there is a
minimum of VN near 5 parameters, but it is not very distinct.
[Figure: model order selection — loss function versus number of parameters for the estimation data set (left) and the validation data set (right).]
For ARX models automatic (and reliable) selection criteria exist (considering
ĒVN (θ̂N , Z N )) (ident manual page 3-70 ff, 4-183 ff):
[Figure: % unexplained output variance versus number of parameters.]
[Figures: Bode amplitude and phase plots (frequency in Hz) and pole-zero map of the selected model.]
[Figure: frequency response — Bode amplitude and phase plot (frequency in Hz).]
[Figures: Bode amplitude and phase plots (frequency in Hz) and pole-zero map.]
Validation gives the ultimate answer to the question whether the identified model is
“good enough”, and thereby also guides the choice of the model structure and order.
Techniques:
with Nα the confidence level for the confidence interval with probability α (E.g.
N95% = 1.96, N99% ≈ 3).
Method 2: Cross-covariance R̂ᴺεu(τ) = (1/N) Σ_{t=1}^{N−τ} ε(t + τ) u(t) for τ ≥ 0.

Requirement: |R̂ᴺεu(τ)| ≤ Nα √P̂ / √N

with estimator for P: P̂ᴺ = Σ_{k=−∞}^{∞} R̂ᴺε(k) R̂ᴺu(k)
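A sketch of the auto-covariance part of this residual test, applied to ideal (white) residuals; the confidence band follows the ±Nα/√N rule with N95% = 1.96:

```python
import numpy as np

# Whiteness test: for white residuals the normalised auto-covariance at
# lags tau != 0 should mostly stay inside +/- 1.96/sqrt(N).
rng = np.random.default_rng(4)
N = 4000
eps = rng.standard_normal(N)            # ideal residuals (assumed example)

def r_eps(tau):
    x = eps - eps.mean()
    return np.dot(x[tau:], x[:N - tau]) / (N * x.var())

band = 1.96 / np.sqrt(N)
inside = sum(abs(r_eps(tau)) <= band for tau in range(1, 21))
print(inside, "of 20 lags inside the 95% band")
```

For a validated model one expects roughly 95% of the lags inside the band; many excursions indicate remaining structure in the residuals.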
[Figure: residual tests — auto-covariance of ε and cross-covariance between ε and u versus lag (samples, −20 to 20), with confidence bounds.]
V̄F(θ) = (1/2π) ∫_{−∞}^{∞} { |G0(e^{iω}) − G(e^{iω}, θ)|² Φu(ω) + Φv(ω) } · |L(e^{iω})|²/|H(e^{iω}, θ)|² dω

• Minimisation of (1/2π) ∫_{−∞}^{∞} |G0(e^{iω}) − G(e^{iω}, θ)|² Q(ω, θ*) dω, with
Q(ω, θ) = Φu(ω) |L(e^{iω})|² / |H(e^{iω}, θ)|².
• Fitting of |H(e^{iω}, θ)|² with the error spectrum.

The limit θ* depends on the model set, the prefilter L(q) and the input
spectrum Φu(ω).
• The vertical axis of the Bode amplitude plot shows log |G|, which is
approximated better with a small relative error. We are minimising
(1/2π) ∫_{−∞}^{∞} |(G0(e^{iω}) − G(e^{iω}, θ))/G0(e^{iω})|² |G0(e^{iω})|² Q(ω, θ*) dω,
so Q(ω, θ*) should be large when G0(e^{iω}) becomes small.
• The horizontal axis shows log ω, which means that high frequencies
dominate. This can be compensated by looking at
|G0(e^{iω})|² Q(ω, θ*) dω = ω |G0(e^{iω})|² Q(ω, θ*) d(log ω). So the fit at
low frequencies improves if |G0(e^{iω})|² Q(ω, θ*) is larger than ω in that
frequency region.
Quite often a broadband input signal is desirable: sufficiently exciting and the
data contains information of the system in a large range of frequencies.
Matlab-tool: u = idinput(N,type,band,levels)
[Diagram: PRBS generator — a clocked shift register with states x1, x2, ..., xn; the output is s = xn, and feedback taps a1, ..., an are combined by XOR gates (⊕) into the first state. XOR: 0⊕0 = 1⊕1 = 0 and 0⊕1 = 1⊕0 = 1.]
• The binary signal s(t) can be transformed into a signal u(t) with amplitude
c and mean m with u(t) = m + c(−1 + 2 s(t)):
   Ē u(t) = m + c/M
The power spectrum can be modified by filtering of the signal with a (linear)
filter, but then the binary character is no longer guaranteed.
• The binary character is advantageous in the case of (presumed)
non-linearities as it has a maximum power for a limited amplitude.
Reduced clock frequency: uNc(t) = u(ent(t/Nc)), i.e. each value is held for Nc
clock periods.
The spectrum is no longer flat and even has some frequencies with Φu(ω) = 0.
[Figure: power spectra for Nc = 1, 3, 5 and 10.]
A random binary signal switches between ±c, according to
Pr(u(t) = u(t − 1)) = p.
The power spectral density depends on p: for p = 1/2 the spectrum is flat.
[Figure: power spectra for p = 0.75 and p = 0.50.]
Compute

u(t) = Σ_{k=1}^{r} αk sin(ωk t + ϕk)

with a user-defined set of excitation frequencies {ωk}k=1,...,r and associated
amplitudes {αk}k=1,...,r.
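A multisine sketch in Python (the frequencies, amplitudes and random phases are assumed for illustration):

```python
import numpy as np

# Multisine u(t) = sum_k alpha_k sin(omega_k t + phi_k) on an exact
# frequency grid, so all power lands in the chosen bins.
rng = np.random.default_rng(5)
Ts, N = 0.01, 1000                       # grid resolution 1/(N*Ts) = 0.1 Hz
t = np.arange(N) * Ts
freqs = np.array([1.0, 3.0, 7.0])        # excitation frequencies [Hz]
amps = np.array([1.0, 0.5, 0.25])
phases = rng.uniform(0, 2 * np.pi, freqs.size)   # random phases

u = sum(a * np.sin(2 * np.pi * f * t + p)
        for a, f, p in zip(amps, freqs, phases))

U = np.abs(np.fft.rfft(u))
f_axis = np.fft.rfftfreq(N, Ts)
top = np.sort(f_axis[np.argsort(U)[-3:]])  # bins holding the most energy
print(top)
```

Choosing the ωk on the DFT grid avoids leakage; randomising the phases keeps the peak amplitude of u(t) moderate.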
Rule of thumb:
• Upper limit Ts, lower limit ωs:
Nyquist frequency ωN = ωs/2 > highest relevant frequency.
For a first order system with bandwidth ωb: ωs ≥ 10ωb.
• Lower limit TN > 5–10 times the largest relevant time constant.
A continuous system with state space matrix Ac and ZOH discretisation has a
discrete state space matrix Ad = eAcTs .
For Ts → 0 it appears that Ad → I: all poles cluster near z = 1.
The physical length of the range of the difference equation becomes smaller.
Strategy:
Parameters:
• Link lengths (l1 and l2) are known.
• Gravity g is known
• Masses and inertias (m1, J1, m2, J2 and m3) are to be estimated.
• Measurements of the torques (T1 and T2).
• Angles known as function of time (φ1(t) and φ2(t)).
τ = Φ(q , q̇ , q̈ )p
with
• measurements vector τ = [T1, T2]T ,
• parameter vector p including m1, J1, m2, J2 and m3,
• regression matrix Φ(q , q̇ , q̈ ) depending only on known kinematic quantities
with q = [φ1, φ2]T .
with M̄ = [ m2 l1² + m3 (l1² + l2² + 2 l1 l2 cos φ2) + J1 + J2 ,   m3 (l2² + l1 l2 cos φ2) + J2 ;
           m3 (l2² + l1 l2 cos φ2) + J2 ,                        m3 l2² + J2 ]

⇔ [T1; T2] = Φ(q, q̇, q̈) [m1; J1; m2; J2; m3]   ???
• The other parameters are collected in p = [J1, J2, m2, m3]T . Then the
elements of Φ can be written as
Φ11 = φ̈1
Φ12 = φ̈1 + φ̈2
Φ13 = −l1 g sin φ1 + l1² φ̈1
Φ14 = l1 l2 ((2 φ̈1 + φ̈2) cos φ2 − (φ̇2² + 2 φ̇1 φ̇2) sin φ2)
      − l1 g sin φ1 − l2 g sin(φ1 + φ2) + l1² φ̈1 + l2² (φ̈1 + φ̈2)
Φ21 = 0
Φ22 = φ̈1 + φ̈2
Φ23 = 0
Φ24 = l1 l2 (φ̇1² sin φ2 + φ̈1 cos φ2) − l2 g sin(φ1 + φ2) + l2² (φ̈1 + φ̈2)
• So matrix Φ indeed depends only on known quantities and not on any of the
parameters in p: A parameter linear form can be obtained:
τ = Φp.
• An upper bound for the relative error of the estimated parameter vector is
given by the condition number
   cond(A) = σmax(A)/σmin(A),
with the largest and smallest singular values σmax(A) and σmin(A),
respectively.
The fundamental pulsation of the Fourier series ωf should match the total
measurement time.
• For each φi there are 2 × Ni + 1 parameters (ail, bil for l = 1..Ni and φi(0))
that can be optimised for optimal excitation while e.g. motion constraints are
satisfied, e.g. with the MATLAB command fmincon.
Φ11 = φ̈1
Φ12 = φ̈1 + φ̈2
Φ13 = l1² φ̈1
Φ14 = l1 l2 ((2 φ̈1 + φ̈2) cos φ2 − (φ̇2² + 2 φ̇1 φ̇2) sin φ2) + l1² φ̈1 + l2² (φ̈1 + φ̈2)
Φ21 = 0
Φ22 = φ̈1 + φ̈2
Φ23 = 0
Φ24 = l1 l2 (φ̇1² sin φ2 + φ̈1 cos φ2) + l2² (φ̈1 + φ̈2)
∂VN/∂p (p*, τ*) = 0  and  ∂VN/∂p (p* + δp, τ* + δτ) = 0

∂²VN/∂p² (p*, τ*) δp + ∂²VN/∂p∂τ (p*, τ*) δτ = 0

so

δp = − [∂²VN/∂p² (p*, τ*)]⁻¹ ∂²VN/∂p∂τ (p*, τ*) δτ
For least squares criterion VN(p, τ) = ½ ε(p, τ)ᵀ ε(p, τ):

First derivative: ∂VN/∂p = ∂/∂p [½ ε(p, τ)ᵀ ε(p, τ)] = ε(p, τ)ᵀ S(p, τ), with
sensitivity matrix S(p, τ) = ∂ε(p, τ)/∂p.

Second derivative: ∂²VN/∂p² = ε(p, τ)ᵀ ∂S(p, τ)/∂p + S(p, τ)ᵀ S(p, τ)
θ̂N⁽ⁱ⁺¹⁾ = θ̂N⁽ⁱ⁾ − μN⁽ⁱ⁾ [S(θ̂N⁽ⁱ⁾)ᵀ S(θ̂N⁽ⁱ⁾) + λI]⁻¹ S(θ̂N⁽ⁱ⁾)ᵀ ε(θ̂N⁽ⁱ⁾)
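A one-parameter sketch of this damped (Levenberg–Marquardt style) iteration, fitting y = e^{−pt} with an assumed true p = 1.5 and step size μ = 1:

```python
import numpy as np

# Damped Gauss-Newton update  p <- p - (S^T S + lam*I)^(-1) S^T eps.
t = np.linspace(0, 2, 50)
y = np.exp(-1.5 * t)                     # noise-free data, true p = 1.5

p, lam = 0.5, 1e-3                       # initial guess and damping factor
for _ in range(50):
    eps = np.exp(-p * t) - y             # residual vector
    S = (-t * np.exp(-p * t))[:, None]   # sensitivity d(eps)/dp
    step = np.linalg.solve(S.T @ S + lam * np.eye(1), S.T @ eps)
    p -= step.item()

print(p)                                 # converges to 1.5
```

At the minimum ε = 0, so the update vanishes regardless of λ; the damping only slows the steps far from the optimum, improving robustness.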
[Figure 1: Stäubli RX90B six-axis industrial robot. Courtesy of Stäubli, Faverges, France.]
[Figure 2: finite element model of the Stäubli RX90B, with links 1–6, beams 1–6, hinges 1–6, joints 1–6, a slider truss and the base (axes x, y, z).]
M̄ q̈ + DF^(x)T M D²F^(x)(q̇) q̇ − f + DF^(e,c)T σ^(c) = T,    (3)
• Joint torques T.
• Lumped masses: each link is described by a symmetric rotational inertia
matrix J^(k), a mass m^(k) and a vector s^(k) defining the center of gravity with
respect to the corresponding element node at which the body is lumped.
(Toon Hardeman, 2005)

For each link element a lumped parameter vector p^(l,k) is defined as
p^(l,k) = (m, msx′, msy′, msz′, Jx′x′, Jx′y′, Jx′z′, Jy′y′, Jy′z′, Jz′z′)^(k)
Tj^(f) = Tj^(C,0) + Tj^(s,0) exp(−(q̇j/q̇j^(s))^δj^(a)) + Tj^(v,0) q̇j
— the Coulomb, Stribeck and viscous terms, respectively.
[Figures: normalised friction torque T(f)/T(max) versus angular velocity q̇ [rad/s], for (a) the full velocity range (0–5 rad/s) and (b) the low velocity range (0–0.20 rad/s). Dots (•): experiments. Dashed (- - -): estimated in the full velocity range. Solid (—): estimated in the range from 0 to 0.5 rad/s.]

This friction model gives an accurate fit in the full velocity range with a minimal
parameter set and physically sound model structure (Rob Waiboer, 2005).
Tj^(f) = Tj^(a,0) exp(−(q̇j/q̇j^(s))^δj^(a)) + Tj^(v,0) q̇j^(1−δj^(v)).    (7)
• The motor inertias J^(m) are included in the new reduced mass matrix M̄^(m).
• The joint driving torques T^(m) are computed from the measured motor
currents.
• This acceleration linear form is well suited for simulations.
The parameter linear form is well suited for a linear least squares fit of the
model parameters p:
So in total 82 parameters!
• Move the robot along a trajectory for the joint angles q(t).

A = [ Φ1(q̈1, q̇1, q1) ; ... ; Φn(q̈n, q̇n, qn) ]  and  b = [ T1^(m) ; ... ; Tn^(m) ]    (15)

Ap = b + ρ.    (14)
A = U Σ V^T,    (21)
and accordingly

U = [U1  U2]  and  V = [V1  V2]    (24)

• Transform and partition parameter vector p with the right singular matrix V:

g = U^T b,    (30)
s² = var(ρ) = 1/(6n − r) · Σ_{i=1}^{6n} ρi²    (38)

var(α̂i) = s²/σi².    (39)
So for small singular values, the accompanying α̂i can not be estimated
accurately.
• To avoid degradation of the parameter accuracy: Do not take into account too
small singular values.
• In other words: Take only r singular values into account where r is smaller
than the rank of the regression matrix A.
This is the truncated or partial SVD method.
[Figure: residual norm ‖ρ‖₂² versus the number of singular values r (1 to 55).]

• About 20 parameters seems to be sufficient.
• How can we determine r more exactly? That depends on the error in the
parameter estimate that is accepted.
E.g. scaling with some nominal parameter set effectively changes this into relative
errors of all parameters.
[Figure: σ, |g| and the 10% threshold on a logarithmic scale. Dots (•): singular values σi.]
[Figure 8: the simulated and measured normalised joint torques along the trajectory as a function of time. The simulation has been carried out with model M1. Shown: the measured torque, the simulated torque and the residual torque.]
[Block diagram: controller and system in a closed loop.]
• impossibility to remove the control system,
• controller relevant identification.
Situation:
• Prerequisites: y(t) = G0(z) u(t) + H0(z) e(t)
u(t) = C(z) [r2(t) − y(t)] + r1(t)
The control system C(z) is known or not.
• Wanted: estimate Ĝ of system G0 from measurements of system output
y(t) and furthermore system input u(t) and/or feedforward r1(t) and/or
tracking reference r2(t).
• Problem: input u(t) is correlated with the disturbance v(t) =
H0(z) e(t) because of the controller C(z).
[Block diagram: reference inputs r1 and r2, controller C, system G0 and noise filter H0; u is the system input and y the output. (SISO: S0 = W0)]
When is it possible
(a) to identify a model [G(z, θ̂N ), H(z, θ̂N )] consistently.
(b) to identify a system model G(z, θ̂N ) consistently.
(c) to formulate an explicit approximation criterium that characterises the
asymptotic estimator G(z, θ ∗) independent of Φv (ω)
(d) to set a specific order for G0 in the model set.
(e) to identify an unstable process G0.
(h) to guarantee that the identified model for G0 will be stabilised by the
applied control system C.
Solution strategy depends also on starting point:
For most solution methods it is required that system G0 (and its model) have at
least one delay.
Then: y(t) = S0(z) [G0(z) r1(t) + (1/C(z)) z(t)]
      u(t) = S0(z) [r1(t) − z(t)]

Ĝ(e^{iω}) = G0(e^{iω})   or   Ĝ(e^{iω}) = −1/C(e^{iω})
Consider a first order ARX model with two parameters θ = (a, b)T :
In other words all models with the same â − b̂f predict an identical error ε and
can not be distinguished.
Methods (as listed): Dual Youla, Coprime factorisation, • Two-steps, Tailor-made,
• Joint input/output, • Indirect, • Direct, IV.

Consistency (Ĝ, Ĥ):      + + + + + + +
Consistency Ĝ:           + − + − + + +
Tunable bias:            − − + − + + + +
Fixed model order:       + + − − + + −
Unstable system:         + + + + − + +
(G(z, θ̂N), C) stable:    − − − + − − /+
C known:                 n n j n j n n j   (j = yes, n = no)
• in this course.
Idea: Neglect feedback and estimate model for G0 and/or H0 with common
open-loop techniques from measured {u(t), y(t)}.
(1/2π) ∫_{−π}^{π} { |G0(e^{iω}) − G(e^{iω}, θ)|² · (|S0(e^{iω})|²/|H(e^{iω}, θ)|²) · Φr(ω)
 + (|H0(e^{iω})|² |S0(e^{iω})|²)/(|H(e^{iω}, θ)|² |S(e^{iω}, θ)|²) · Φe(ω) } dω
Idea: (1) identify closed-loop system with common open-loop techniques and
measurements of {r(t), y(t)} and (2) next compute the open-loop system with
knowledge of the controller.
Idea:
(1) identify closed-loop system sensitivity S0(z) with common open-loop
techniques and measurements of {r(t), u(t)}
(2) use this estimate Ŝ(z) to create a noisefree estimate û(t) of u(t) and
estimate system G0(z) with an open-loop technique from this estimate û(t)
and the measured y(t).
Note that
is not an open-loop system due to the term with r(t), but this contribution
vanishes if the estimate from the first step is sufficiently accurate.
For this reason usually a high order estimate for S0(z) is considered.
Estimate Ĝ(z) is consistent, if both Ĝy (z) and Ĝu(z) are consistent estimates.
• data reduction
• simplicity of processing: filtering is a multiplication.
• evaluation of models in f -domain: aiming at application.
• choice discrete/continuous models.
Σ_{k=1}^{N} |Ğ(ωk) − G(e^{iωk}, θ)|² |W(ωk)|²

with G(e^{iωk}, θ) = B(e^{iωk}, θ)/A(e^{iωk}, θ) and W(ωk) a chosen weighting function.
“Tricks”:
• An ARX-like problem (linear-in-the-parameters) can be obtained by
rewriting the optimisation problem (implicit weighting with |A(eiωk , θ)|)
• An iterative linear least-squares solution can be found by weighting with
the |A(eiωk , θ)| found in the previous iteration.
→ Exercise
• Models are fitted using a non-linear weighted least squares fit, e.g. using the
accompanying lsfits toolbox.
Estimate:

Ĉyu(ω) = √( |Φ̂ᴺyu(ω)|² / (Φ̂ᴺy(ω) Φ̂ᴺu(ω)) )
with elements akj(q⁻¹) = δkj + akj¹ q⁻¹ + ... + akj^(na_kj) q^(−na_kj).
Note: this expression has a large number of parameters, especially when the
poles differ for each of the elements in the matrix. Then the denominator has to
contain all poles and in each element some of them need to be compensated
by zeros of bij (z).
Minimisation of (1/N) Σ_{t=1}^{N} Σ_{i=1}^{p} wi εi²(t, θ).

Σ_{j=1}^{p} Σ_{l=1}^{m} Σ_{k=1}^{N} |Ğjl(ωk) − Gjl(e^{iωk}, θ)|² |Wjl(ωk)|²