OCDM2223 Tutorial7solved
Tutorial 7
Problem 1: LQ Control
⇒ K(k) = − P(k+1) / (r + P(k+1)),    P(k) = ( r + (r+1) P(k+1) ) / ( r + P(k+1) ),    P(N) = S
With r = 1 and terminal condition P(N) = P(4) = S, the backward recursion gives:

k    |  4  |       3      |        2       |        1       |         0
P(k) |  S  | (1+2S)/(1+S) | (3+5S)/(2+3S)  | (8+13S)/(5+8S) | (21+34S)/(13+21S)
K(k) |     | −S/(1+S)     | −(1+2S)/(2+3S) | −(3+5S)/(5+8S) | −(8+13S)/(13+21S)
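As a numerical cross-check (my own addition, not part of the original solution), the backward recursion can be run directly in Python; the function name riccati_backward and the defaults r = 1, N = 4 are assumptions matching the table above.

```python
# Backward Riccati recursion for the scalar system x(k+1) = x(k) + u(k)
# with stage cost 0.5*(x(k)^2 + r*u(k)^2) and terminal cost 0.5*S*x(N)^2.
def riccati_backward(S, r=1.0, N=4):
    P = [0.0] * (N + 1)
    K = [0.0] * N
    P[N] = S                                              # terminal condition
    for k in range(N - 1, -1, -1):
        K[k] = -P[k + 1] / (r + P[k + 1])                 # feedback gain K(k)
        P[k] = (r + (r + 1) * P[k + 1]) / (r + P[k + 1])  # cost-to-go kernel P(k)
    return P, K

P, K = riccati_backward(S=0.0)
print(P)   # [21/13, 8/5, 3/2, 1, 0] -> [1.6154, 1.6, 1.5, 1.0, 0.0]
print(K)   # [-8/13, -3/5, -1/2, 0]  -> [-0.6154, -0.6, -0.5, 0.0]
```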
c) Compare and discuss the resulting trajectories of x(k) for S = 0 and S → ∞ for an initial value of
x(0) = 2.
For S = 0:
u(0) = K(0)x(0) = −(8/13)x(0),    u(1) = K(1)x(1) = −(3/5)x(1),
u(2) = K(2)x(2) = −(1/2)x(2),    u(3) = K(3)x(3) = 0
⇒ x(0) = 2,    x(1) = x(0) + u(0) = (5/13)x(0) = 0.769,
x(2) = x(1) + u(1) = 0.308,    x(3) = x(2) + u(2) = 0.154,    x(4) = x(3) + u(3) = 0.154
For S → ∞:
u(0) = K(0)x(0) = −(13/21)x(0),    u(1) = K(1)x(1) = −(5/8)x(1),
u(2) = K(2)x(2) = −(2/3)x(2),    u(3) = K(3)x(3) = −x(3)
⇒ x(0) = 2,    x(1) = x(0) + u(0) = (8/21)x(0) = 0.762,
x(2) = x(1) + u(1) = 0.286,    x(3) = x(2) + u(2) = 0.095,    x(4) = x(3) + u(3) = 0
The corresponding optimal costs are
S → ∞:  J* = (1/2) x(0)² P(0) = 2 × 34/21 = 3.238
S = 0:   J* = (1/2) x(0)² P(0) = 2 × 21/13 = 3.231
With S → ∞ the terminal state is driven exactly to zero, x(4) = 0, at the price of a slightly larger cost; with S = 0 the terminal state is not penalized, so u(3) = 0 and the residual state x(4) = 0.154 remains.
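The trajectories and costs above can be double-checked with a short, self-contained simulation (again my own sketch; S = 1e9 is used only as a numerical stand-in for S → ∞):

```python
# Closed-loop simulation for the scalar LQ problem (A = B = 1, state weight 1,
# input weight r = 1, N = 4, x(0) = 2); S is the terminal weight.
def simulate(S, r=1.0, N=4, x0=2.0):
    # backward Riccati recursion (same as in the table above)
    P = [0.0] * (N + 1)
    K = [0.0] * N
    P[N] = S
    for k in range(N - 1, -1, -1):
        K[k] = -P[k + 1] / (r + P[k + 1])
        P[k] = (r + (r + 1) * P[k + 1]) / (r + P[k + 1])

    x, J, traj = x0, 0.0, [x0]
    for k in range(N):
        u = K[k] * x
        J += 0.5 * (x**2 + r * u**2)        # stage cost
        x = x + u                            # dynamics x(k+1) = x(k) + u(k)
        traj.append(x)
    J += 0.5 * S * x**2                      # terminal cost
    return traj, J, 0.5 * x0**2 * P[0]       # trajectory, simulated cost, predicted cost

print(simulate(S=0.0))    # x(4) = 0.154, J = J* = 3.231
print(simulate(S=1e9))    # x(4) ~ 0,     J = J* ~ 3.238  (stand-in for S -> infinity)
```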
For the stationary (infinite-horizon) solution we set P(k) = P(k+1) = P∞ in the Riccati recursion:
P∞ = P∞ + 1 − P∞²/(r + P∞)   ⇒   P∞² − P∞ − r = 0   ⇒   P∞ = 1/2 + √(1/4 + r)
K∞ = −P∞/(r + P∞) = −(1 + √(1+4r)) / (2r + 1 + √(1+4r))
x(k+1) = x(k) + u(k) = (1 + K∞) x(k) = [ 2r / (2r + 1 + √(1+4r)) ] x(k)
e) Discuss the closed loop behavior for r = 0 and r → ∞ and interpret the cost function.
r = 0: dead-beat control: K∞ = −1, so the equilibrium x_eq = 0 is reached in one time step; the control effort is not penalized at all in the cost function.
r → ∞: K∞ = 0, so the closed-loop behavior equals the open-loop behavior x(k+1) = x(k), which is only critically (marginally) stable; the control effort is penalized infinitely heavily, so no control action is applied.
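A few lines of Python (my own sketch; the sample values of r are chosen only to illustrate the two limits) confirm this behaviour numerically:

```python
import math

# Stationary solution for the scalar problem (A = B = 1, state weight 1, input weight r):
# P_inf = 1/2 + sqrt(1/4 + r), K_inf = -P_inf/(r + P_inf),
# closed-loop pole 1 + K_inf = 2r/(2r + 1 + sqrt(1 + 4r)).
def stationary_lq(r):
    P = 0.5 + math.sqrt(0.25 + r)
    K = -P / (r + P)
    return P, K, 1.0 + K

for r in (1e-6, 1.0, 1e6):
    P, K, pole = stationary_lq(r)
    print(f"r = {r:g}: P_inf = {P:.4f}, K_inf = {K:.4f}, closed-loop pole = {pole:.6f}")
# r -> 0:   K_inf -> -1, pole -> 0  (dead-beat)
# r -> inf: K_inf -> 0,  pole -> 1  (open loop, marginally stable)
```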
[Adapted from F. L. Lewis, D. Vrabie, V. L. Syrmos, Optimal Control, 2012, Ex. 11.2-1 and 11.3-4]
Problem 2: LQR, Policy Iteration and Value Iteration
In this exercise we consider the discrete-time linear quadratic regulator problem in light of the Markov
Decision Process (MDP) theory. Consider a deterministic MDP with infinite and continuous state space
X = Rn and action space U = Rm and state transition equation
x(k + 1) = Ax(k) + Bu(k), (1)
where k is the discrete time index. For a fixed stabilizing stationary policy π that is defined by the control
law u(i) = µ(x(i)), i = k, . . . , ∞, and for positive definite matrices Q, R > 0, the associated value
function is
V^π(x(k)) = (1/2) Σ_{i=k}^{∞} [ x(i)^⊤ Q x(i) + u(i)^⊤ R u(i) ],   (2)
only dependent on the initial state x(k). The infinite sum (2) can be written as a difference equation,
yielding
V^π(x(k)) = (1/2) [ x(k)^⊤ Q x(k) + u(k)^⊤ R u(k) ] + V^π(x(k+1)).   (3)
We assume that the value function is quadratic in the state, V^π(x(k)) = (1/2) x(k)^⊤ P x(k), for some kernel
matrix P. Then, (3) boils down to:
(1/2) x(k)^⊤ P x(k) = (1/2) [ x(k)^⊤ Q x(k) + u(k)^⊤ R u(k) ] + (1/2) x(k+1)^⊤ P x(k+1).   (4)
Substituting the system dynamics (1), equation (4) is further simplified:
(1/2) x^⊤ P x = (1/2) [ x^⊤ Q x + u^⊤ R u + x^⊤ A^⊤ P A x + 2 x^⊤ A^⊤ P B u + u^⊤ B^⊤ P B u ],   (5)
where, for readability, we used the notation x = x(k) and u = u(k).
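As a quick numerical sanity check of (4)-(5) (my own addition, with an arbitrarily chosen two-dimensional example), one can compute the optimal policy and its quadratic value function via scipy.linalg.solve_discrete_are and verify that both sides of (5) coincide:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Arbitrary 2-D example (the matrices are illustrative only)
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_discrete_are(A, B, Q, R)                  # kernel of the optimal value function
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # optimal gain, u = K x

x = np.array([[2.0], [-1.0]])                       # some state x(k)
u = K @ x
x_next = A @ x + B @ u
lhs = 0.5 * x.T @ P @ x
rhs = 0.5 * (x.T @ Q @ x + u.T @ R @ u) + 0.5 * x_next.T @ P @ x_next
print(lhs.item(), rhs.item())                       # the two sides agree
```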
Consider the Policy Iteration algorithm. We assume an initial policy u = K^(0) x and then we perform the
Policy Evaluation step, that is, we calculate the value function for the initial policy. Precisely, substituting
the initial policy in (5), the goal is to compute the matrix P^(0) that solves the Lyapunov equation:
P^(0) = Q + (K^(0))^⊤ R K^(0) + A^⊤ P^(0) A + 2 A^⊤ P^(0) B K^(0) + (K^(0))^⊤ B^⊤ P^(0) B K^(0).   (6)
Having found the solution P^(0), we perform the Policy Improvement step based on the value function just
obtained, that is, we determine the next policy K^(1) as
K^(1) x = argmin_{u = Kx} (1/2) [ x^⊤ Q x + u^⊤ R u + x^⊤ A^⊤ P^(0) A x + 2 x^⊤ A^⊤ P^(0) B u + u^⊤ B^⊤ P^(0) B u ],
which, as shown during the lectures, yields the closed-form solution for the policy improvement step:
K^(1) = −( R + B^⊤ P^(0) B )^{−1} B^⊤ P^(0) A.
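For matrix-valued systems, one Policy Iteration step can be coded directly. In the sketch below (my own illustration, not part of the tutorial), (6), read as an equation for the symmetric kernel P^(0), is solved as the discrete Lyapunov equation of the closed loop A + BK^(0) via scipy.linalg.solve_discrete_lyapunov:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration_step(A, B, Q, R, K):
    # Policy Evaluation: (A + BK)^T P (A + BK) - P + Q + K^T R K = 0
    A_cl = A + B @ K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Policy Improvement: K_next = -(R + B^T P B)^{-1} B^T P A
    K_next = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K_next

# Scalar example from this tutorial: A = B = Q = R = 1, K^(0) = -0.1
A = B = Q = R = np.array([[1.0]])
K = np.array([[-0.1]])
for _ in range(5):
    P, K = policy_iteration_step(A, B, Q, R, K)
print(K, P)   # approaches K = -0.6180, P = 1.618
```

In this linear-quadratic setting the evaluation step is an n × n Lyapunov equation, so the cost grows polynomially with the state dimension; with nonlinear dynamics the quadratic value-function ansatz, and hence this closed form, no longer applies.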
a) What is the advantage of assuming that the value function is in the form V^π(x(k)) = (1/2) x(k)^⊤ P x(k)?
Fixing a “structure” for the value function V^π(x(k)) reduces the problem from an infinite-dimensional search
(a value for every state of the continuous state space) to finding the finite number of entries of the matrix P.
b) Considering the system dynamics and cost function for the previous problem, with r = 1, determine the
solution of the Lyapunov equation (6). Would this approach scale easily with a high-dimensional state?
And with nonlinear dynamics? Perform a few iterations of the Policy Iteration algorithm starting with the
stabilizing initial policy K^(0) = −0.1 and with the non-stabilizing initial policy K^(0) = 1. How many steps
are required for convergence?
With A = 1, B = 1, Q = 1, R = 1, we obtain P^(0) ∈ R as
P^(0) = 1 + (K^(0))² + P^(0) + 2 K^(0) P^(0) + (K^(0))² P^(0)   ⇒   P^(0) = − (1 + (K^(0))²) / (2 K^(0) + (K^(0))²).
With K^(0) = −0.1, the policy converges at the fifth iteration to K = −0.6180 (P = 1.618); with K^(0) = 1
the algorithm converges to the wrong value K = 1.6180 (P = −0.6180), since Policy Iteration requires a
stabilizing initial policy (the Lyapunov equation then returns a P that is not a valid value function, as its
negative sign indicates).
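These iterations are easy to reproduce (a sketch for the scalar case A = B = Q = R = 1; the function name is my own):

```python
# Policy Iteration for the scalar problem A = B = Q = R = 1.
def policy_iteration(K0, n_iter=8):
    K = K0
    for j in range(1, n_iter + 1):
        P = -(1 + K**2) / (2 * K + K**2)   # Policy Evaluation (scalar Lyapunov eq.)
        K = -P / (1 + P)                   # Policy Improvement
        print(f"j = {j}: P = {P:+.4f} -> K = {K:+.4f}")
    return K, P

policy_iteration(-0.1)   # converges to K = -0.6180 (P = +1.618)
policy_iteration(1.0)    # converges to the wrong fixed point K = +1.6180 (P = -0.618)
```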
Consider the Value Iteration algorithm. Assuming an initial policy u = K^(0) x and initializing the value
function to zero, P^(0) = 0, we perform the Value Update by simply evaluating the right-hand side of the
Lyapunov equation (6), obtaining
P^(1) = Q + (K^(0))^⊤ R K^(0) + A^⊤ P^(0) A + 2 A^⊤ P^(0) B K^(0) + (K^(0))^⊤ B^⊤ P^(0) B K^(0).   (7)
Then, also in this case, the policy improvement is based on the obtained value function and eventually we have:
K^(1) = −( R + B^⊤ P^(1) B )^{−1} B^⊤ P^(1) A.
c) Considering the system dynamics and cost function for the previous problem, with r = 1, perform a few
iterations of the Value Iteration algorithm starting with the stabilizing initial policy K^(0) = −0.1 and with
the non-stabilizing initial policy K^(0) = 1. How many steps are required for convergence? What happens
if the initial guess of the policy is K^(0) = 100? And which algorithm yields faster convergence, Policy
Iteration or Value Iteration?
With K^(0) = −0.1, the policy converges at the sixth iteration to K = −0.6180 (P = 1.618); with K^(0) = 1
the algorithm converges in nine iterations, and with K^(0) = 100 also in nine iterations: unlike Policy
Iteration, Value Iteration does not require a stabilizing initial policy. As Figures 1 and 2 show, Policy
Iteration needs fewer iterations than Value Iteration, but each of its iterations requires solving a Lyapunov
equation exactly, whereas Value Iteration only performs a one-step update.
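The Value Iteration experiment can be sketched analogously (again for the scalar case A = B = Q = R = 1, with P^(0) = 0):

```python
# Value Iteration for the scalar problem A = B = Q = R = 1, starting from P = 0.
def value_iteration(K0, n_iter=12):
    K, P = K0, 0.0
    for j in range(1, n_iter + 1):
        P = 1 + K**2 + P * (1 + K)**2      # Value Update: right-hand side of (6)/(7)
        K = -P / (1 + P)                   # Policy Improvement
        print(f"j = {j}: P = {P:+.4f} -> K = {K:+.4f}")
    return K, P

value_iteration(-0.1)    # K -> -0.6180 in about six iterations
value_iteration(1.0)     # converges as well: no stabilizing initial policy is needed
value_iteration(100.0)   # also converges in about nine iterations
```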
Figure 1: Comparison between Policy Iteration and Value Iteration, for K^(0) = −0.1. Controller gain K^(j)
(left) and matrix P^(j) of the value function (right) for the first 10 iterations j = 1, . . . , 10.
Figure 2: Comparison between Policy Iteration and Value Iteration, for K^(0) = 1. Controller gain K^(j)
(left) and matrix P^(j) of the value function (right) for the first 10 iterations j = 1, . . . , 10.
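For reference, the two comparison figures can be regenerated with a short script (a sketch; figure layout and styling are my own choices):

```python
import matplotlib.pyplot as plt

# Policy Iteration vs. Value Iteration for the scalar problem A = B = Q = R = 1.
def run(method, K0, n_iter=10):
    K, P = K0, 0.0
    Ks, Ps = [], []
    for _ in range(n_iter):
        if method == "PI":
            P = -(1 + K**2) / (2 * K + K**2)   # exact Policy Evaluation
        else:
            P = 1 + K**2 + P * (1 + K)**2      # one-step Value Update
        K = -P / (1 + P)                       # Policy Improvement
        Ks.append(K)
        Ps.append(P)
    return Ks, Ps

for K0 in (-0.1, 1.0):                         # reproduces Figure 1 and Figure 2
    fig, (ax_K, ax_P) = plt.subplots(1, 2, figsize=(9, 3))
    for method in ("PI", "VI"):
        Ks, Ps = run(method, K0)
        ax_K.plot(range(1, 11), Ks, marker="o", label=method)
        ax_P.plot(range(1, 11), Ps, marker="o", label=method)
    ax_K.set_xlabel("iteration j")
    ax_K.set_ylabel("controller gain K")
    ax_P.set_xlabel("iteration j")
    ax_P.set_ylabel("value-function kernel P")
    ax_K.legend()
    fig.suptitle(f"Initial policy K(0) = {K0}")
plt.show()
```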