
SF2863 Systems Engineering, 7.5 HP
Intro to Markov Decision Processes

Lecturer: Per Enqvist

Optimization and Systems Theory


Department of Mathematics
KTH Royal Institute of Technology



Systems Engineering (SF2863) 7.5 HP

1 Markov Decision Process Prototype example

2 Policy Improvement Algorithm, without discounting

3 Policy Improvement Algorithm, with discounting

4 The End



Markov Decision Process

We consider Markov chains whose transition probabilities can be changed by
making policy decisions. As in the first home assignment (the ferry), the
dynamics of the system depend on the repair policy, and it is possible to
change the steady state of the process.

Here we will derive an efficient algorithm for optimizing the decisions,
using policy iteration.

For more information see the course book, Introduction to Operations Research:

Edition   Sections
9th       19.1-19.5
10th      19.1-19.2 + Supplements 1 and 2



Prototype Example

A manufacturing unit has two states describing its condition:


State 1 : The unit is working well
State 2 : The unit is working poorly

When in state 1, it generates an income of 400 $ per week.


When in state 2, it generates an income of 250 $ per week.

The owner of the unit clearly has an incentive to keep it working well.
By performing maintenance the unit is more likely to work well, but there is
also a cost associated with the maintenance (described on the next page).

Question: Is it optimal to perform maintenance or not?



Transition probabilities, decisions and costs

Assume that the state of the unit is described by a Markov chain.

Without maintenance the transition probabilities are
    p11 = 0.6, p12 = 0.4,
    p21 = 0.2, p22 = 0.8.

If maintenance is performed on the unit when it is working well, the
transition probabilities are changed to
    p11 = 0.9, p12 = 0.1,
and the cost for this maintenance is 50 $ per week.

If maintenance is performed on the unit when it is working poorly, the
transition probabilities are changed to
    p21 = 0.7, p22 = 0.3,
and the cost for this maintenance is 200 $ per week.



Describing all possible Policies
In each state it can be decided to do maintenance or not.
Aim: Determine a decision policy R that determines an action that
only depends on the current state.
Let
    d(R) = [ d1(R) ; d2(R) ]
where
    di(R) = 1 if maintenance is done when in state i,
    di(R) = 2 if maintenance is not done when in state i.

There are 4 different policies for the problem:

    d(R1) = [1; 1],  d(R2) = [1; 2],  d(R3) = [2; 1],  d(R4) = [2; 2].



Transition matrices

Each policy determines its own one-step transition matrix:

    For R1 the transition matrix is P(1) = [ 0.9  0.1 ; 0.7  0.3 ].
    For R2 the transition matrix is P(2) = [ 0.9  0.1 ; 0.2  0.8 ].
    For R3 the transition matrix is P(3) = [ 0.6  0.4 ; 0.7  0.3 ].
    For R4 the transition matrix is P(4) = [ 0.6  0.4 ; 0.2  0.8 ].
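These matrices can also be assembled mechanically: row i of a policy's matrix is row i of the transition matrix for the action chosen in state i. Below is a minimal Python sketch of that construction (the dictionary p_action and the helper policy_matrix are my own names, not from the lecture); the per-action rows are the probabilities from the previous slide.

```python
import numpy as np

# Transition matrices per action (from the slides):
# action 1 = maintenance, action 2 = no maintenance.
p_action = {
    1: np.array([[0.9, 0.1],    # row for state 1 under maintenance
                 [0.7, 0.3]]),  # row for state 2 under maintenance
    2: np.array([[0.6, 0.4],    # row for state 1 without maintenance
                 [0.2, 0.8]]),  # row for state 2 without maintenance
}

def policy_matrix(d):
    """Build P(R) for a policy d = (d1, d2) with di in {1, 2}."""
    return np.array([p_action[d[i]][i, :] for i in range(len(d))])

print(policy_matrix((1, 1)))   # P(1): [[0.9, 0.1], [0.7, 0.3]]
print(policy_matrix((2, 1)))   # P(3): [[0.6, 0.4], [0.7, 0.3]]
```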



Steady state solutions

Each policy corresponds to a steady state solution.

From the steady state equations π(ℓ) = π(ℓ)P(ℓ) and π1(ℓ) + π2(ℓ) = 1
we get

    π(1) = [ 7/8   1/8  ],
    π(2) = [ 2/3   1/3  ],
    π(3) = [ 7/11  4/11 ],
    π(4) = [ 1/3   2/3  ].



Maintenance costs and incomes
Each week there is an income that depends on the state and a cost
depending on the decision.
Let Cik = expected value of the immediate cost incurred by making the
decision di = k when in state i.

Here:

C1,1 = {in state 1, maintenance } = −400 + 50 = −350

C1,2 = {in state 1, no maintenance } = −400 + 0 = −400

C2,1 = {in state 2, maintenance } = −250 + 200 = −50

C2,2 = {in state 2, no maintenance } = −250 + 0 = −250


Expected stationary costs
Determine the expected stationary costs per week using

    g(Rℓ) = Σ_{i=1}^{2} C_{i,di(Rℓ)} πi(ℓ),   ℓ = 1, 2, 3, 4.

Then

    g(R1) = C11 π1(1) + C21 π2(1) = (7/8)(−350) + (1/8)(−50)   = −312.5
    g(R2) = C11 π1(2) + C22 π2(2) = (2/3)(−350) + (1/3)(−250)  = −316.7
    g(R3) = C12 π1(3) + C21 π2(3) = (7/11)(−400) + (4/11)(−50) = −272.7
    g(R4) = C12 π1(4) + C22 π2(4) = (1/3)(−400) + (2/3)(−250)  = −300
So Policy R2 is the best.
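The table above can be reproduced with a short script. The following sketch (my own code, not part of the course material) builds each policy's transition matrix, solves the steady-state equations, and evaluates g(Rℓ); running it prints g ≈ −312.5, −316.7, −272.7 and −300, matching the slides.

```python
import numpy as np
from itertools import product

# Per-action transition rows and immediate costs (from the slides).
P = {1: np.array([[0.9, 0.1], [0.7, 0.3]]),   # action 1: maintenance
     2: np.array([[0.6, 0.4], [0.2, 0.8]])}   # action 2: no maintenance
C = {(1, 1): -350, (1, 2): -400, (2, 1): -50, (2, 2): -250}

def stationary(Pmat):
    """Solve pi = pi P together with the normalization sum(pi) = 1."""
    n = Pmat.shape[0]
    A = np.vstack([np.eye(n) - Pmat.T, np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    return np.linalg.lstsq(A, b, rcond=None)[0]

for d in product([1, 2], repeat=2):                   # the four policies
    Pd = np.array([P[d[i]][i, :] for i in range(2)])  # row i from action d_i
    pi = stationary(Pd)
    g = sum(C[(i + 1, d[i])] * pi[i] for i in range(2))
    print(f"d = {d}:  pi = {np.round(pi, 3)},  g = {g:.1f}")
```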
Optimal solution

Since g(R2 ) is smallest, policy R2 is the optimal one.

In this case it is best to only do maintenance when the unit is working well.
(The probability that the unit fixes itself without maintenance is perhaps
unrealistic.)

Note that we solved the problem by computing expected costs for all
possible policies.
This works well for small problems, but for larger problems a more
efficient method should be used.
Linear Programming can be used to solve the problem, but we will
consider a faster algorithm based on policy improvement.



Scope of the Policy Improvement Algorithm
Aim: Find the policy R that minimizes the expected average cost

    g(R) = Σ_{i=0}^{M} πi Cik

where Ci,k = C_{i,di(R)}, di(R) = k, and πi are the stationary state
probabilities under policy R.

HOW? We will consider the expected cost for n time steps and then take an
average.
Let vi^n(R) be the total expected cost of a system starting in state i and
evolving for n time steps.
Then, from stochastic dynamic programming we obtain the recursion

    vi^n(R) = Cik + Σ_{j=0}^{M} pij(k) vj^{n−1}(R),   where di(R) = k.
Policy Improvement Algorithm, brief derivation
Now, taking the average and letting n → ∞, we have that vi^n(R)/n goes to
g(R) (independently of i).

Then vi^n(R) = n g(R) + vi(R), where vi(R) is a transient effect depending
on the starting state i and the policy R.
Inserting this into the DynP recursion we obtain the Value Determination
Equation

    (VDE)   g(R) = Cik + Σ_{j=0}^{M} pij(k) vj(R) − vi(R),   i = 0, · · · , M.

We have M + 1 equations and M + 2 unknowns. Let vM(R) = 0, consider this as
a form of “grounding” one potential, and the equations can be solved for
{g(R), v0(R), · · · , vM−1(R)}.

This gives us g(R) for a particular policy R, but how do we optimize the
policy R?
Policy Improvement Algorithm, brief derivation
We want to minimize g(R) with respect to the policy R.
For each i, the VDE says that g(R) = Cik + Σ_{j=0}^{M} pij(k) vj(R) − vi(R).

Assume that we have a policy Rn, have solved the VDE for it, and want to
improve it. Using the values of {v0(Rn), · · · , vM(Rn)} for the policy Rn,
we can minimize the expression

    Cik + Σ_{j=0}^{M} pij(k) vj(Rn) − vi(Rn)

with respect to k.
Let k̂i be the minimizing value of k (for each i), and define the next
policy by

    di(Rn+1) = k̂i.
It can be shown that g(Rn+1 ) ≤ g(Rn ), and if Rn+1 = Rn , then the
policy is optimal.
Policy Improvement Algorithm
It is based on two steps, starting with some policy R0 , n = 0:
Step 1: Solve for g(Rn), v0(Rn), · · · , vM(Rn) (assuming vM(Rn) = 0)
from the Value Determination Equations (VDE), where k = di(Rn):

    g(Rn) = Cik + Σ_{j=0}^{M} pij(k) vj(Rn) − vi(Rn),   i = 0, 1, · · · , M.

Step 2: In the Policy Improvement step, the new policy Rn+1 is determined by

    min_{k=1,2,··· ,K} { Cik + Σ_{j=0}^{M} pij(k) vj(Rn) − vi(Rn) },   i = 0, 1, · · · , M.


These two steps are iterated until convergence is obtained.
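A compact implementation sketch of these two steps for the prototype problem is given below. It is my own code, not from the course; value_determination and policy_improvement are my own helper names, states are indexed 0 and 1, and the VDE is solved with the grounding of the last relative value at zero, exactly as described above. Starting from R0 = (1, 1), the loop prints g = −312.5 and then g = −316.7, and stops at d = (1, 2), which is the behaviour worked out on the following slides.

```python
import numpy as np

# Per-action transition rows and immediate costs (from the slides);
# states are indexed 0 (working well) and 1 (working poorly).
P = {1: np.array([[0.9, 0.1], [0.7, 0.3]]),   # action 1: maintenance
     2: np.array([[0.6, 0.4], [0.2, 0.8]])}   # action 2: no maintenance
C = {(0, 1): -350, (0, 2): -400, (1, 1): -50, (1, 2): -250}
M = 2                                          # number of states

def value_determination(d):
    """Solve the VDE for policy d, grounding the last value v_{M-1} = 0.
    Unknowns: (g, v_0, ..., v_{M-2})."""
    A = np.zeros((M, M))
    b = np.zeros(M)
    for i in range(M):
        k = d[i]
        A[i, 0] = 1.0                                  # coefficient of g
        for j in range(M - 1):                         # coefficients of v_j
            A[i, 1 + j] = (1.0 if i == j else 0.0) - P[k][i, j]
        b[i] = C[(i, k)]
    x = np.linalg.solve(A, b)
    return x[0], np.append(x[1:], 0.0)                 # g, (v_0, ..., v_{M-1})

def policy_improvement(v):
    """Pick, for each state i, the k minimizing C_ik + sum_j p_ij(k) v_j - v_i."""
    return [min((1, 2), key=lambda k: C[(i, k)] + P[k][i, :] @ v - v[i])
            for i in range(M)]

d = [1, 1]                                             # R_0: always maintenance
while True:
    g, v = value_determination(d)
    print(f"d = {d}:  g = {g:.1f},  v = {np.round(v, 1)}")
    d_new = policy_improvement(v)
    if d_new == d:
        break
    d = d_new
```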
Policy Improvement on the Prototype
 
Let R0 be the policy such that d(R0) = [1; 1] (always maintenance).

Value Determination Equations (VDE), where k = di(R0):

    g(R0) = Cik + Σ_{j=1}^{2} pij(k) vj(R0) − vi(R0),   i = 1, 2,

where

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ]

(here P(k) denotes the transition matrix under action k). This gives

    g(R0) = C11 + p11(1)v1(R0) + p12(1)v2(R0) − v1(R0),   (i = 1, k = d1(R0) = 1)
    g(R0) = C21 + p21(1)v1(R0) + p22(1)v2(R0) − v2(R0),   (i = 2, k = d2(R0) = 1)



Policy Improvement on the Prototype

   
Recall

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ],

and the VDE for R0:

    g(R0) = C11 + p11(1)v1(R0) + p12(1)v2(R0) − v1(R0),   (i = 1, k = 1)
    g(R0) = C21 + p21(1)v1(R0) + p22(1)v2(R0) − v2(R0),   (i = 2, k = 1)

Let v2(R0) = 0. Then

    g(R0) = −350 + 0.9 v1(R0) + 0.1 · 0 − v1(R0),
    g(R0) = −50 + 0.7 v1(R0) + 0.3 · 0 − 0,

which gives g(R0) = −312.5 and v1(R0) = −375.
Note that g(R0 ) is the same weekly cost obtained earlier for that policy.
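As a quick sanity check (my own notation, not from the slides), the two equations with v2(R0) = 0 form a 2×2 linear system in (g, v1) that can be solved numerically:

```python
import numpy as np

# g + 0.1*v1 = -350   (from  g = -350 + 0.9*v1 - v1)
# g - 0.7*v1 = -50    (from  g = -50  + 0.7*v1)
A = np.array([[1.0,  0.1],
              [1.0, -0.7]])
b = np.array([-350.0, -50.0])
g, v1 = np.linalg.solve(A, b)
print(g, v1)   # -312.5 -375.0
```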
Policy Improvement on the Prototype

And the Policy Improvement step:

    min_{k=1,2} { Cik + Σ_{j=1}^{2} pij(k) vj(R0) − vi(R0) },   i = 1, 2.

For i = 1:

    min_{k=1,2} { C1k + p11(k)v1(R0) + p12(k)v2(R0) − v1(R0) }

    = min { −350 + 0.9(−375) + 0.1 · 0 − (−375),  −400 + 0.6(−375) + 0.4 · 0 − (−375) }

    = min { −312.5 (k = 1),  −250 (k = 2) }.

Let d1(R1) = 1, since k = 1 gave the smallest value.



Policy Improvement on the Prototype

And the Policy Improvement step:

    min_{k=1,2} { Cik + Σ_{j=1}^{2} pij(k) vj(R0) − vi(R0) },   i = 1, 2.

For i = 2:

    min_{k=1,2} { C2k + p21(k)v1(R0) + p22(k)v2(R0) − v2(R0) }

    = min { −50 + 0.7(−375) + 0.3 · 0 − 0,  −250 + 0.2(−375) + 0.8 · 0 − 0 }

    = min { −312.5 (k = 1),  −325 (k = 2) }.

Let d2(R1) = 2, since k = 2 gave the smallest value.



Policy Improvement on the Prototype
 
The updated policy R1 has d(R1) = [1; 2] ≠ d(R0). Not converged!

Value Determination Equations (VDE), where k = di(R1):

    g(R1) = Cik + Σ_{j=1}^{2} pij(k) vj(R1) − vi(R1),   i = 1, 2,

where

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ].

This gives

    g(R1) = C11 + p11(1)v1(R1) + p12(1)v2(R1) − v1(R1),   (i = 1, k = d1(R1) = 1)
    g(R1) = C22 + p21(2)v1(R1) + p22(2)v2(R1) − v2(R1),   (i = 2, k = d2(R1) = 2)



Policy Improvement on the Prototype

   
Recall

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ],

and the VDE for R1:

    g(R1) = C11 + p11(1)v1(R1) + p12(1)v2(R1) − v1(R1),   (i = 1, k = 1)
    g(R1) = C22 + p21(2)v1(R1) + p22(2)v2(R1) − v2(R1),   (i = 2, k = 2)

Let v2(R1) = 0. Then

    g(R1) = −350 + 0.9 v1(R1) + 0.1 · 0 − v1(R1),
    g(R1) = −250 + 0.2 v1(R1) + 0.8 · 0 − 0,

which gives g(R1) = −316.7 and v1(R1) = −333.3.

Note that g(R1) < g(R0), and that g(R1) is the same cost as obtained earlier
for this policy.
Policy Improvement on the Prototype

And the Policy Improvement step:

    min_{k=1,2} { Cik + Σ_{j=1}^{2} pij(k) vj(R1) − vi(R1) },   i = 1, 2.

For i = 1:

    min_{k=1,2} { C1k + p11(k)v1(R1) + p12(k)v2(R1) − v1(R1) }

    = min { −350 + 0.9(−333.3) + 0.1 · 0 − (−333.3),  −400 + 0.6(−333.3) + 0.4 · 0 − (−333.3) }

    = min { −316.7 (k = 1),  −266.7 (k = 2) }.

Let d1(R2) = 1, since k = 1 gave the smallest value.



Policy Improvement on the Prototype

And the Policy Improvement step:

    min_{k=1,2} { Cik + Σ_{j=1}^{2} pij(k) vj(R1) − vi(R1) },   i = 1, 2.

For i = 2:

    min_{k=1,2} { C2k + p21(k)v1(R1) + p22(k)v2(R1) − v2(R1) }

    = min { −50 + 0.7(−333.3) + 0.3 · 0 − 0,  −250 + 0.2(−333.3) + 0.8 · 0 − 0 }

    = min { −283.3 (k = 1),  −316.7 (k = 2) }.

Let d2(R2) = 2, since k = 2 gave the smallest value.



Policy Improvement on the Prototype

 
The updated policy R2 is such that d(R2) = [1; 2] = d(R1).

The algorithm has converged, and the optimal policy is the same as
determined before: do maintenance only when the unit is working well.

The optimal cost is also as determined before, −316.7.

The algorithm needed one iteration to find the optimal policy and one
iteration to verify convergence.



Discounted costs

Why discounting?

The value of money varies over time; usually it decreases (inflation).

Risk-free interest can be accumulated over time.
The net present value is used to denote today's value.

Also, utility functions are used to measure the satisfaction a customer gets
from an experience. Having to wait for the experience usually decreases the
satisfaction, hence discounting is motivated.

A discount factor α is used to decrease the value for each time step further
into the future.



Policy Improvement Algorithm, with discounting
Let Vi^n(R) be the expected total discounted cost of the process when it
starts in state i and evolves for n time periods according to policy R.
A heuristic approach is to take the limit as n → ∞ of the DynP recursive
equation

    Vi^n(R) = Cik + α Σ_{j=0}^{M} pij(k) Vj^{n−1}(R)

to obtain the value determination equation (VDE)

    Vi(R) = Cik + α Σ_{j=0}^{M} pij(k) Vj(R),   i = 0, 1, · · · , M.

It is not clear that this will converge, and therefore an alternative


approach will be demonstrated at the lecture.
Policy Improvement Algorithm, with discounting
It is based on two steps, starting with some policy R0 , n = 0:

Step 1: Solve for V0(Rn), · · · , VM(Rn)
from the Value Determination Equations (VDE), where k = di(Rn):

    Vi(Rn) = Cik + α Σ_{j=0}^{M} pij(k) Vj(Rn),   i = 0, 1, · · · , M.

Step 2: In the Policy Improvement step, the new policy Rn+1 is determined by

    min_{k=1,2,··· ,K} { Cik + α Σ_{j=0}^{M} pij(k) Vj(Rn) },   i = 0, 1, · · · , M.

These two steps are iterated until convergence is obtained.
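A sketch of the discounted version for the prototype problem follows (my own code, not from the course; the helper names are assumptions). Compared with the average-cost case, the VDE is now the square linear system (I − α P_d) V = C_d, so there is no extra unknown g and no grounding of one value is needed. Starting from R0 = (1, 1), it reaches d = (1, 2) after one improvement step and stops there, with V ≈ (−3257, −2987), matching the slides up to rounding.

```python
import numpy as np

alpha = 0.9
P = {1: np.array([[0.9, 0.1], [0.7, 0.3]]),   # action 1: maintenance
     2: np.array([[0.6, 0.4], [0.2, 0.8]])}   # action 2: no maintenance
C = {(0, 1): -350, (0, 2): -400, (1, 1): -50, (1, 2): -250}
M = 2

def value_determination(d):
    """Solve V = C_d + alpha * P_d V for the fixed policy d."""
    Pd = np.array([P[d[i]][i, :] for i in range(M)])
    Cd = np.array([C[(i, d[i])] for i in range(M)], dtype=float)
    return np.linalg.solve(np.eye(M) - alpha * Pd, Cd)

def policy_improvement(V):
    """Pick, for each state i, the k minimizing C_ik + alpha * sum_j p_ij(k) V_j."""
    return [min((1, 2), key=lambda k: C[(i, k)] + alpha * (P[k][i, :] @ V))
            for i in range(M)]

d = [1, 1]                                    # R_0: always maintenance
while True:
    V = value_determination(d)
    print(f"d = {d}:  V = {np.round(V, 1)}")
    d_new = policy_improvement(V)
    if d_new == d:
        break
    d = d_new
```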


Policy Improvement, with discounting
 
Let α = 0.9, and R0 be the policy such that d(R0) = [1; 1].

Value Determination Equations (VDE), where k = di(R0):

    Vi(R0) = Cik + α Σ_{j=1}^{2} pij(k) Vj(R0),   i = 1, 2,

where

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ].

This gives

    V1(R0) = C11 + α [ p11(1)V1(R0) + p12(1)V2(R0) ],   (i = 1, k = d1(R0) = 1)
    V2(R0) = C21 + α [ p21(1)V1(R0) + p22(1)V2(R0) ],   (i = 2, k = d2(R0) = 1)



Policy Improvement, with discounting

   
Recall

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ],

and the VDE for R0:

    V1(R0) = C11 + 0.9 [ p11(1)V1(R0) + p12(1)V2(R0) ],   (i = 1, k = 1)
    V2(R0) = C21 + 0.9 [ p21(1)V1(R0) + p22(1)V2(R0) ],   (i = 2, k = 1)

Then

    V1(R0) = −350 + 0.9 [ 0.9 V1(R0) + 0.1 V2(R0) ],
    V2(R0) = −50 + 0.9 [ 0.7 V1(R0) + 0.3 V2(R0) ],

which gives V1(R0) = −3171 and V2(R0) = −2805.
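As a quick check (my own notation, not from the slides), these two numbers solve the 2×2 linear system (I − α P(1)) V = (C11, C21):

```python
import numpy as np

alpha = 0.9
P1 = np.array([[0.9, 0.1],
               [0.7, 0.3]])          # always maintenance (action 1 in both states)
c = np.array([-350.0, -50.0])        # (C11, C21)
V = np.linalg.solve(np.eye(2) - alpha * P1, c)
print(V)   # approximately [-3170.7, -2804.9]
```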



Policy Improvement, with discounting

And the Policy Improvement step:

    min_{k=1,2} { Cik + α Σ_{j=1}^{2} pij(k) Vj(R0) },   i = 1, 2.

For i = 1:

    min_{k=1,2} { C1k + α [ p11(k)V1(R0) + p12(k)V2(R0) ] }

    = min { −350 + 0.9[0.9(−3171) + 0.1(−2805)],  −400 + 0.9[0.6(−3171) + 0.4(−2805)] }

    = min { −3171 (k = 1),  −3122 (k = 2) }.

Let d1(R1) = 1, since k = 1 gave the smallest value.



Policy Improvement, with discounting

And the Policy Improvement step:

    min_{k=1,2} { Cik + α Σ_{j=1}^{2} pij(k) Vj(R0) },   i = 1, 2.

For i = 2:

    min_{k=1,2} { C2k + α [ p21(k)V1(R0) + p22(k)V2(R0) ] }

    = min { −50 + 0.9[0.7(−3171) + 0.3(−2805)],  −250 + 0.9[0.2(−3171) + 0.8(−2805)] }

    = min { −2805 (k = 1),  −2840 (k = 2) }.

Let d2(R1) = 2, since k = 2 gave the smallest value.



Policy Improvement, with discounting
 
The updated policy R1 has d(R1) = [1; 2] ≠ d(R0). Not converged!

Value Determination Equations (VDE), where k = di(R1):

    Vi(R1) = Cik + α Σ_{j=1}^{2} pij(k) Vj(R1),   i = 1, 2,

where

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ].

This gives

    V1(R1) = C11 + 0.9 [ p11(1)V1(R1) + p12(1)V2(R1) ],   (i = 1, k = d1(R1) = 1)
    V2(R1) = C22 + 0.9 [ p21(2)V1(R1) + p22(2)V2(R1) ],   (i = 2, k = d2(R1) = 2)



Policy Improvement, with discounting

   
Recall

    C = [ C11 C12 C21 C22 ] = [ −350 −400 −50 −250 ],

    P(1) = [ 0.9  0.1 ; 0.7  0.3 ],   P(2) = [ 0.6  0.4 ; 0.2  0.8 ],

and the VDE for R1:

    V1(R1) = C11 + 0.9 [ p11(1)V1(R1) + p12(1)V2(R1) ],   (i = 1, k = 1)
    V2(R1) = C22 + 0.9 [ p21(2)V1(R1) + p22(2)V2(R1) ],   (i = 2, k = 2)

Then

    V1(R1) = −350 + 0.9 [ 0.9 V1(R1) + 0.1 V2(R1) ],
    V2(R1) = −250 + 0.9 [ 0.2 V1(R1) + 0.8 V2(R1) ],

which gives V1(R1) = −3257 < −3171 = V1(R0) and
V2(R1) = −2986 < −2805 = V2(R0), so the policy has been improved.
Policy Improvement, with discounting

And the Policy Improvement step:

    min_{k=1,2} { Cik + α Σ_{j=1}^{2} pij(k) Vj(R1) },   i = 1, 2.

For i = 1:

    min_{k=1,2} { C1k + α [ p11(k)V1(R1) + p12(k)V2(R1) ] }

    = min { −350 + 0.9[0.9(−3257) + 0.1(−2986)],  −400 + 0.9[0.6(−3257) + 0.4(−2986)] }

    = min { −3257 (k = 1),  −3234 (k = 2) }.

Let d1(R2) = 1, since k = 1 gave the smallest value.



Policy Improvement, with discounting

And the Policy Improvement step:

    min_{k=1,2} { Cik + α Σ_{j=1}^{2} pij(k) Vj(R1) },   i = 1, 2.

For i = 2:

    min_{k=1,2} { C2k + α [ p21(k)V1(R1) + p22(k)V2(R1) ] }

    = min { −50 + 0.9[0.7(−3257) + 0.3(−2986)],  −250 + 0.9[0.2(−3257) + 0.8(−2986)] }

    = min { −2908 (k = 1),  −2986 (k = 2) }.

Let d2(R2) = 2, since k = 2 gave the smallest value.



Policy Improvement on the Prototype
 
The updated policy R2 is such that d(R2) = [1; 2] = d(R1).

The algorithm has converged, and the optimal policy is the same as
determined before: do maintenance only when the unit is working well.

The algorithm needed one iteration to find the optimal policy and one
iteration to verify convergence.

The optimal expected cost is −3257 if we start with the unit working well.
The optimal expected cost is −2986 if we start with the unit working poorly.

The costs are roughly 10 times larger than for the non-discounted
case. We consider the whole discounted future instead of just one time step.

Note that Σ_{k=0}^{∞} α^k = 1/(1 − α) = 10 for α = 0.9.



Choice of discount factor

For a discount factor α close to one it is quite likely that the optimal
policy will be the same as for the problem without discount.

The smaller the discount factor α is, the more likely it is that the
optimal policy will focus on minimizing the immediate cost at each
state, and not on the long term effects.



Rounding up
We have now considered some parts of Systems Engineering and
Operations Research, such as
Markov chains and Markov process theory
Queueing theory - M|M|s models and Jackson Networks
Spare parts optimization - One base, several LRU types, METRIC
model idea
Marginal Allocation - Separable integer convex functions,
Multiobjective optimization and efficient solutions
Inventory theory - Versions of the EOQ model, Newsvendor
problem, Deterministic Periodic review
Dynamic Programming - Deterministic and Stochastic versions
Markov Decision processes - Policy iteration algorithm, with and
without discounting
These elements constitute the building blocks for many mathematical
model descriptions of complex processes and systems and the
methods used to evaluate their performance.
Rounding up
Our hope is that, after this course, you will be able to identify system
problem formulations in your encounters with technical systems and recognize
the usefulness of the tools for analysis and for making management decisions.

Welcome to our project-based optimization and systems courses for more fun!
SF2812 Applied Linear Optimization
SF2822 Applied Nonlinear Optimization
SF2866 Applied Systems Engineering
SF2868 Systems Engineering, Business and Management
SF2842 Geometric Control Theory
SF2852 Optimal Control Theory

Thanks for your participation!


