Towards Optimal District Heating Temperature Control

Figure 1: A district heating network. (Schematic labels: primary side towards the heat generation plant, heat exchanger, secondary side, supply temperature Ts, return temperature Tr.)

2 Approach
2.1 Control strategy

We apply a Reinforcement Learning (RL) paradigm [6], where an agent learns a control strategy
(policy) by interacting with the environment, here the set of rooms heated by the network. The
problem is modelled as a Markov Decision Process: the agent receives an observation of the state
of the environment, chooses an action and receives in return a reward from the environment. The
best control strategy maximizes the expected cumulative discounted reward over the lifetime of the
agent. Learning such a policy first requires deriving a model of the environment that predicts the indoor
temperatures from the commands and the weather conditions. This model is described in Section 2.2.
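As a minimal sketch of this interaction loop (the toy environment, its dynamics and the season length below are placeholder assumptions for illustration, not the authors' model):

```python
import random

class ToyHeatingEnv:
    """Placeholder stand-in for the identified environment model of Section 2.2."""
    def __init__(self, season_length=24 * 180):
        self.season_length = season_length  # hourly steps in one heating season (assumed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [20.0]  # placeholder observation (a single indoor temperature)

    def step(self, action):
        self.t += 1
        next_state = [20.0 + random.uniform(-1.0, 1.0)]  # placeholder dynamics
        reward = -abs(next_state[0] - 20.0)              # deviation from a 20 degC target
        done = self.t >= self.season_length
        return next_state, reward, done

def run_episode(env, policy, gamma=0.9):
    """Roll out one heating season and accumulate the discounted reward."""
    state, done = env.reset(), False
    discounted_return, discount = 0.0, 1.0
    while not done:
        action = policy(state)                 # choose the supply-temperature action
        state, reward, done = env.step(action)
        discounted_return += discount * reward
        discount *= gamma
    return discounted_return

print(run_episode(ToyHeatingEnv(), policy=lambda s: 0.0))
```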
At time t, the state is a vector s_t containing the outdoor temperature T_{o,t}, the supply water temperature
T_{s,t}, the time of the day and the indoor temperatures T^{(j)}_{in,t} for every room j ∈ {1, ..., N} in the network.
s_t contains both the present and the past n measurements of these quantities. At an hourly time step, a
history of 24 hours is used to form s_t. At that same hourly time step, the agent is asked to select an
action a_t. The flow rate being kept constant, the action is restricted to the supply temperature T_{s,t}.
Two discrete action spaces, with T_s (°C) ∈ {20, 21, ..., 50}, are considered. Agent 1 is the standard
strategy while Agent 2 is a fine-tuning of the baseline control strategy (cf. Section 2.3); a minimal sketch of both action encodings follows the list:
1. Agent 1: to enforce the smoothness of the control signal, the action is limited to the
increments a_t = T_{s,t} − T_{s,t−1}, where a_t ∈ A := {0, ±0.5, ±1, ±1.5, ..., ±3}.
2. Agent 2: the discrete action is the difference a_t = T_{s,t} − T^b_{s,t}, where a_t ∈ A and T^b_s is the
estimated baseline supply temperature.
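The sketch below illustrates the state stacking and the two action encodings; the function names, array shapes and the clipping of the supply temperature to [20, 50] °C are assumptions, not the authors' implementation:

```python
import numpy as np

# Hypothetical helpers illustrating the state vector and the two action encodings above.
ACTION_SET = np.array([0.0, 0.5, -0.5, 1.0, -1.0, 1.5, -1.5,
                       2.0, -2.0, 2.5, -2.5, 3.0, -3.0])  # the set A (degC)
TS_MIN, TS_MAX = 20.0, 50.0  # stated supply temperature range (clipping is an assumption)

def build_state(T_out_hist, T_s_hist, hour_hist, T_in_hist):
    """Stack the past 24 hourly measurements into the state vector s_t.

    T_in_hist has shape (24, N) for the N rooms; the other arrays have shape (24,).
    """
    return np.concatenate([T_out_hist, T_s_hist, hour_hist, T_in_hist.ravel()])

def agent1_supply_temperature(T_s_prev, action_idx):
    """Agent 1: the action is an increment on the previous supply temperature."""
    return float(np.clip(T_s_prev + ACTION_SET[action_idx], TS_MIN, TS_MAX))

def agent2_supply_temperature(T_s_baseline, action_idx):
    """Agent 2: the action is an offset from the estimated baseline supply temperature."""
    return float(np.clip(T_s_baseline + ACTION_SET[action_idx], TS_MIN, TS_MAX))
```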
Finally, the agent selects the action in order to maximize the expected cumulative discounted reward
function R = Σ_{t=0}^{T} γ^t r(s_t, a_t) over T time steps in the heating season. In the sequel, the discount
factor is set to γ = 0.9, which corresponds to an agent that adapts its behaviour to the expected
reward for the next 30 hours. The reward r penalizes deviations from a target temperature T:

r(a_t, s_t) = − Σ_{j=1}^{N} |T^{(j)}_{in,t} − T^{(j)}_t|.    (1)
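For concreteness, a minimal sketch of the reward in Equation (1); the function name and the example values are illustrative:

```python
import numpy as np

def reward(T_in, T_target):
    """Reward of Eq. (1): negative sum of absolute deviations from the targets.

    T_in and T_target are arrays of shape (N,) holding the indoor and target
    temperatures of the N rooms at time t.
    """
    return -float(np.sum(np.abs(T_in - T_target)))

# Example: three rooms at 20.5, 19.0 and 21.0 degC with a 20 degC target
# give r = -(0.5 + 1.0 + 1.0) = -2.5.
print(reward(np.array([20.5, 19.0, 21.0]), np.full(3, 20.0)))
```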

We use Deep Reinforcement Learning (DRL) to train the different agents. DRL has proven
successful in various domains such as games, robotics and demand response [7]. In
particular, we train Deep Q-Networks (DQNs) [8, 9]. For each training episode, a weather file is
randomly chosen from a set of 7 cities in China to avoid overfitting the local climate, and an entire
heating season is simulated. The weather measurements for testing the agents come from an eighth
city, Yuncheng. Some statistics summarizing the climate in these cities are gathered in Appendix A.
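A compressed sketch of this training procedure follows; the weather-file paths, the environment factory and the agent interface are assumptions, and any standard DQN implementation [8, 9] could fill the agent role:

```python
import random

# Hypothetical weather files for the 7 training cities; Yuncheng is held out for testing.
TRAIN_WEATHER_FILES = [f"weather/train_city_{i}.csv" for i in range(7)]

def train(agent, make_env, n_episodes=500):
    """Train a DQN agent, sampling a new training climate for every episode."""
    for _ in range(n_episodes):
        weather = random.choice(TRAIN_WEATHER_FILES)   # avoid overfitting one climate
        env = make_env(weather)                        # simulates one full heating season
        state, done = env.reset(), False
        while not done:
            action = agent.act(state)                  # e.g. epsilon-greedy over the discrete actions
            next_state, reward, done = env.step(action)
            agent.store(state, action, reward, next_state, done)
            agent.learn()                              # one gradient step on the Q-network
            state = next_state
```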

2.2 Model identification

Consider a six-story building with three apartments per level, facing either the Eastern, Southern or
Western direction. Heat is provided from a district heating substation and supplied to the apartments
