SF2863 Systems Engineering, 7.5 HP - Intro to Markov Decision Processes
The owner of the unit clearly has an incentive to keep it working well.
By performing maintenance the unit is more likely to keep working well, but
there is also a cost associated with the maintenance (the costs and transition probabilities are given below).
The four possible policies are

d(R1) = (1, 1),   d(R2) = (1, 2),   d(R3) = (2, 1),   d(R4) = (2, 2),

where the first component is the decision in state 1 and the second the decision in state 2. The corresponding stationary distributions are

π(1) = (7/8, 1/8),   π(2) = (2/3, 1/3),   π(3) = (7/11, 4/11),   π(4) = (1/3, 2/3).

Then
g(R1) = C11 π1(1) + C21 π2(1) = (7/8)(−350) + (1/8)(−50) = −312.5
g(R2) = C11 π1(2) + C22 π2(2) = (2/3)(−350) + (1/3)(−250) ≈ −316.7
g(R3) = C12 π1(3) + C21 π2(3) = (7/11)(−400) + (4/11)(−50) ≈ −272.7
g(R4) = C12 π1(4) + C22 π2(4) = (1/3)(−400) + (2/3)(−250) = −300
So Policy R2 is the best.
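As a cross-check of this enumeration, here is a minimal Python sketch (not from the slides; the array names and the helper average_cost are my own) that computes the stationary distribution and the expected cost per time step g(R) for each of the four policies:

    import numpy as np

    # Transition matrices P(k) and costs C_ik from the slides:
    # state i = 1 (working well), 2 (working poorly); decision k = 1, 2.
    P = {1: np.array([[0.9, 0.1], [0.7, 0.3]]),
         2: np.array([[0.6, 0.4], [0.2, 0.8]])}
    C = {(1, 1): -350, (1, 2): -400, (2, 1): -50, (2, 2): -250}

    def average_cost(policy):
        """policy[i] = decision in state i+1; returns (stationary dist, g)."""
        # Build the transition matrix row by row according to the policy.
        P_R = np.vstack([P[policy[i]][i] for i in range(2)])
        # Stationary distribution: solve pi P = pi together with sum(pi) = 1.
        A = np.vstack([(P_R.T - np.eye(2))[:-1], np.ones(2)])
        pi = np.linalg.solve(A, np.array([0.0, 1.0]))
        g = sum(pi[i] * C[(i + 1, policy[i])] for i in range(2))
        return pi, g

    for R in [(1, 1), (1, 2), (2, 1), (2, 2)]:
        pi, g = average_cost(R)
        print(R, pi.round(3), round(g, 1))   # (1, 2) gives g = -316.7, the smallest value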
Optimal solution
Note that we solved the problem by computing expected costs for all
possible policies.
This works well for small problems, but for larger problems a more
efficient method should be used.
Linear Programming can be used to solve the problem, but we will
consider a faster algorithm based on policy improvement.
HOW? We will consider the expected cost for n time steps and then
take an average.
Let v_i^n(R) be the total expected cost of a system starting in state i and
evolving for n time steps.
Then, from stochastic dynamic programming we obtain the recursion

v_i^n(R) = C_ik + Σ_{j=0}^{M} p_ij(k) v_j^{n−1}(R),   where d_i(R) = k.
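As an illustration of the recursion (a sketch under the assumption v_i^0(R) = 0; the variable names are my own), one can iterate it numerically and watch v_i^n(R)/n settle at g(R):

    import numpy as np

    # Data from the slides; policy R2: decision 1 in state 1, decision 2 in state 2.
    P_R = np.array([[0.9, 0.1],   # row for state 1 under decision 1
                    [0.2, 0.8]])  # row for state 2 under decision 2
    C_R = np.array([-350.0, -250.0])  # C_11 and C_22

    v = np.zeros(2)          # v_i^0(R) = 0
    for n in range(1, 2001):
        v = C_R + P_R @ v    # v_i^n = C_ik + sum_j p_ij(k) v_j^{n-1}
    print(v / 2000)          # both components approach g(R2) = -316.7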
Policy Improvement Algorithm, brief derivation
Now taking the average and letting n → ∞ we have that v_i^n(R)/n goes
to g(R), independently of i.
For large n we can write v_i^n(R) ≈ n·g(R) + v_i(R). Inserting this in the
recursion gives the value determination equations below, and suggests
improving the current policy R_n by minimizing

C_ik + Σ_{j=0}^{M} p_ij(k) v_j(R_n) − v_i(R_n)

with respect to k.
Let k̂_i be the minimizing value of k (for each i), and define the next
policy by

d_i(R_{n+1}) = k̂_i.
It can be shown that g(Rn+1 ) ≤ g(Rn ), and if Rn+1 = Rn , then the
policy is optimal.
Policy Improvement Algorithm
It is based on two steps, starting with some policy R0, n = 0:

Value determination: Solve for g(R_n), v_0(R_n), ..., v_M(R_n) (assuming v_M(R_n) = 0)
from the Value Determination Equations (VDE), where k = d_i(R_n),

g(R_n) = C_ik + Σ_{j=0}^{M} p_ij(k) v_j(R_n) − v_i(R_n),   i = 0, 1, ..., M.

Policy improvement: For each i, solve

min_{k = 1, 2, ..., K}  C_ik + Σ_{j=0}^{M} p_ij(k) v_j(R_n) − v_i(R_n),   i = 0, 1, ..., M,

let d_i(R_{n+1}) = k̂_i be given by the minimizing k, and increase n by one.
If R_{n+1} = R_n, the policy is optimal and the algorithm stops.
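The following Python sketch implements these two steps for the maintenance example (my own code, not from the slides; states 1 and 2 are indexed 0 and 1, and v_2(R_n) = 0 is used as normalization):

    import numpy as np

    P = {1: np.array([[0.9, 0.1], [0.7, 0.3]]),
         2: np.array([[0.6, 0.4], [0.2, 0.8]])}
    C = {(0, 1): -350.0, (0, 2): -400.0, (1, 1): -50.0, (1, 2): -250.0}

    def value_determination(policy):
        """Solve g + v_i - sum_j p_ij(k) v_j = C_ik with the normalization v_1 = 0."""
        # Unknowns: (g, v_0); v_1 is fixed to 0 and drops out.
        A = np.zeros((2, 2)); b = np.zeros(2)
        for i in range(2):
            k = policy[i]
            A[i, 0] = 1.0                      # coefficient of g
            A[i, 1] = (i == 0) - P[k][i, 0]    # coefficient of v_0
            b[i] = C[(i, k)]
        g, v0 = np.linalg.solve(A, b)
        return g, np.array([v0, 0.0])

    def improve(v):
        """Pick the k minimizing C_ik + sum_j p_ij(k) v_j - v_i for each state i."""
        return tuple(min((1, 2), key=lambda k: C[(i, k)] + P[k][i] @ v - v[i])
                     for i in range(2))

    policy = (1, 1)                            # start with R0: decision 1 in both states
    while True:
        g, v = value_determination(policy)
        new_policy = improve(v)
        if new_policy == policy:
            break
        policy = new_policy
    print(policy, round(g, 1))                 # (1, 2) and g = -316.7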
Apply the algorithm to the maintenance example, starting with R0 where d_1(R0) = d_2(R0) = 1. The VDE are

g(R0) = C_ik + Σ_{j=1}^{2} p_ij(k) v_j(R0) − v_i(R0),   i = 1, 2,

where

C = (C11 C12 C21 C22) = (−350 −400 −50 −250),
P(1) = [0.9 0.1; 0.7 0.3],   P(2) = [0.6 0.4; 0.2 0.8].

This gives

g(R0) = C11 + p11(1) v1(R0) + p12(1) v2(R0) − v1(R0),   i = 1, k = d_1(R0) = 1,
g(R0) = C21 + p21(1) v1(R0) + p22(1) v2(R0) − v2(R0),   i = 2, k = d_2(R0) = 1.
With the normalization v2(R0) = 0 this gives

g(R0) = −350 − 0.1 v1(R0)   (i = 1),
g(R0) = −50 + 0.7 v1(R0)    (i = 2),

so v1(R0) = −375 and g(R0) = −312.5.

The improvement step minimizes C_ik + Σ_j p_ij(k) v_j(R0) − v_i(R0) over k:

For i = 1: k = 1 gives −312.5 and k = 2 gives −250, so k̂_1 = 1.
For i = 2: k = 1 gives −312.5 and k = 2 gives −325, so k̂_2 = 2.

Hence d_1(R1) = 1 and d_2(R1) = 2.
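The 2×2 system in the value determination step can also be checked directly, e.g. with a short numpy solve (an illustration, not part of the slides):

    import numpy as np
    # Unknowns (g, v1) with v2 = 0:  g + 0.1*v1 = -350  and  g - 0.7*v1 = -50
    A = np.array([[1.0, 0.1], [1.0, -0.7]])
    b = np.array([-350.0, -50.0])
    print(np.linalg.solve(A, b))   # [-312.5, -375.0]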
The next iteration uses R1 with d_1(R1) = 1 and d_2(R1) = 2. The VDE are

g(R1) = C_ik + Σ_{j=1}^{2} p_ij(k) v_j(R1) − v_i(R1),   i = 1, 2,

where

C = (C11 C12 C21 C22) = (−350 −400 −50 −250),
P(1) = [0.9 0.1; 0.7 0.3],   P(2) = [0.6 0.4; 0.2 0.8].

This gives

g(R1) = C11 + p11(1) v1(R1) + p12(1) v2(R1) − v1(R1),   i = 1, k = d_1(R1) = 1,
g(R1) = C22 + p21(2) v1(R1) + p22(2) v2(R1) − v2(R1),   i = 2, k = d_2(R1) = 2.
With v2(R1) = 0 this gives

g(R1) = −350 − 0.1 v1(R1)   (i = 1),
g(R1) = −250 + 0.2 v1(R1)   (i = 2),

so v1(R1) ≈ −333.3 and g(R1) ≈ −316.7.

The improvement step gives:

For i = 1: k = 1 gives −316.7 and k = 2 gives −266.7, so k̂_1 = 1.
For i = 2: k = 1 gives −283.3 and k = 2 gives −316.7, so k̂_2 = 2.
The updated policy R2 is such that d(R2) = (1, 2) = d(R1).
The algorithm has converged and the optimal policy is the same as
determined before: do maintenance only when the unit is working well.
The algorithm needed one iteration to find the optimal policy and one
iteration to verify convergence.
Why discounting?
A discount factor α is used to decrease the weight given to costs
incurred further into the future: each additional time step is weighted by an extra factor α.
With discounting, the value determination equations become (k = d_i(R_n))

V_i(R_n) = C_ik + α Σ_{j=0}^{M} p_ij(k) V_j(R_n),   i = 0, 1, ..., M.
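In matrix form (a standard rewriting, not spelled out on the slides), with P_R the transition matrix and C_R the cost vector under the policy R_n, these equations read V = C_R + α P_R V, i.e. V = (I − α P_R)^{−1} C_R, which has a unique solution whenever 0 ≤ α < 1.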
For the example, with R0 as before (d_1(R0) = d_2(R0) = 1):

V_i(R0) = C_ik + α Σ_{j=1}^{2} p_ij(k) V_j(R0),   i = 1, 2,

where

C = (C11 C12 C21 C22) = (−350 −400 −50 −250),
P(1) = [0.9 0.1; 0.7 0.3],   P(2) = [0.6 0.4; 0.2 0.8].

This gives

V1(R0) = C11 + α [p11(1) V1(R0) + p12(1) V2(R0)],   i = 1, k = d_1(R0) = 1,
V2(R0) = C21 + α [p21(1) V1(R0) + p22(1) V2(R0)],   i = 2, k = d_2(R0) = 1.
With α = 0.9 this becomes

V1(R0) = −350 + 0.9 [0.9 V1(R0) + 0.1 V2(R0)],
V2(R0) = −50 + 0.9 [0.7 V1(R0) + 0.3 V2(R0)].
Solving these two equations gives V1(R0) ≈ −3170.7 and V2(R0) ≈ −2804.9.

The improvement step minimizes C_ik + α Σ_j p_ij(k) V_j(R0) over k:

For i = 1: k = 1 gives ≈ −3170.7 and k = 2 gives ≈ −3122.0, so k̂_1 = 1.
For i = 2: k = 1 gives ≈ −2804.9 and k = 2 gives ≈ −2840.2, so k̂_2 = 2.

Hence d_1(R1) = 1 and d_2(R1) = 2.
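The linear system above can be checked with a short numpy solve (an illustration, not from the slides; values are rounded):

    import numpy as np
    # (I - alpha*P_R0) V = C_R0  for the policy R0 = (1, 1) and alpha = 0.9
    alpha, P_R0 = 0.9, np.array([[0.9, 0.1], [0.7, 0.3]])
    C_R0 = np.array([-350.0, -50.0])
    V = np.linalg.solve(np.eye(2) - alpha * P_R0, C_R0)
    print(V.round(1))   # approximately [-3170.7 -2804.9]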
For the next iteration, with R1 such that d_1(R1) = 1 and d_2(R1) = 2:

V_i(R1) = C_ik + α Σ_{j=1}^{2} p_ij(k) V_j(R1),   i = 1, 2,

where

C = (C11 C12 C21 C22) = (−350 −400 −50 −250),
P(1) = [0.9 0.1; 0.7 0.3],   P(2) = [0.6 0.4; 0.2 0.8].

With α = 0.9 this gives

V1(R1) = C11 + 0.9 [p11(1) V1(R1) + p12(1) V2(R1)],   i = 1, k = d_1(R1) = 1,
V2(R1) = C22 + 0.9 [p21(2) V1(R1) + p22(2) V2(R1)],   i = 2, k = d_2(R1) = 2.
Then

V1(R1) = −350 + 0.9 [0.9 V1(R1) + 0.1 V2(R1)],
V2(R1) = −250 + 0.9 [0.2 V1(R1) + 0.8 V2(R1)].
Solving these two equations gives V1(R1) ≈ −3257 and V2(R1) ≈ −2986.

The improvement step again gives k̂_1 = 1 and k̂_2 = 2, so R2 = R1 and the algorithm has converged.
The algorithm needed one iteration to find the optimal policy and one
iteration to verify convergence.
The optimal expected cost is −3257 if we start with the unit working well,
and −2986 if we start with the unit working poorly.
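A minimal Python sketch of the discounted policy-improvement iteration (my own code, assuming α = 0.9 and the data above; states 1 and 2 are indexed 0 and 1) reproduces these numbers:

    import numpy as np

    alpha = 0.9
    P = {1: np.array([[0.9, 0.1], [0.7, 0.3]]),
         2: np.array([[0.6, 0.4], [0.2, 0.8]])}
    C = {(0, 1): -350.0, (0, 2): -400.0, (1, 1): -50.0, (1, 2): -250.0}

    policy = (1, 1)                       # start with R0
    while True:
        # Value determination: solve V = C_R + alpha * P_R V
        P_R = np.vstack([P[policy[i]][i] for i in range(2)])
        C_R = np.array([C[(i, policy[i])] for i in range(2)])
        V = np.linalg.solve(np.eye(2) - alpha * P_R, C_R)
        # Policy improvement: minimize C_ik + alpha * sum_j p_ij(k) V_j
        new_policy = tuple(min((1, 2), key=lambda k: C[(i, k)] + alpha * P[k][i] @ V)
                           for i in range(2))
        if new_policy == policy:
            break
        policy = new_policy
    print(policy, V.round(0))             # (1, 2) and V approximately [-3257, -2986]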
The costs are roughly 10 times larger than in the non-discounted case,
since we now account for the whole discounted future instead of the cost per time step.

Note that Σ_{k=0}^{∞} α^k = 1/(1 − α) = 10 for α = 0.9.
For a discount factor α close to one it is quite likely that the optimal
policy will be the same as for the problem without discounting.
The smaller the discount factor α is, the more likely it is that the
optimal policy will focus on minimizing the immediate cost in each
state rather than the long-term effects.
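For example (an illustration, not worked out on the slides): in the extreme case α → 0 only the immediate cost matters, so the best decision in each state is the one minimizing C_ik, i.e. k = 2 in both states (−400 < −350 and −250 < −50). This myopic policy (2, 2) differs from the long-run optimal policy (1, 2).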