Towards Optimal District Heating Temperature Control
The good performance of the agents is in line with several other recently published papers applying
reinforcement learning to indoor temperature control, with smaller energy gains here due to the
low flexibility of the environment and the optimized baseline. Our approach nevertheless differs
from these contributions in several respects. First, whereas most studies focus on the thermal
behaviour of a single building, e.g. [16–19], our system is a whole substation, with the 11 apartments
originally tuned to represent the variety found in a district. District heating systems are studied
in [3–5, 20] with promising results in terms of control, but without any analysis of the effect of their
policies on indoor temperatures. Second, these references assume a simple rule-based baseline
strategy, or manual control [14]. For district heating, however, using water curves to set the
supply temperature is common practice, with parameters tuned by hand from a set of a dozen indoor
temperature sensors representative of the district's thermal behaviour. By optimizing these parameters,
our baseline strategy already improves on industry standards, allowing a better assessment of the
potential gain attributable to reinforcement learning.
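The water-curve baseline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the curve shape (linear between two outdoor-temperature breakpoints) and all parameter values are assumptions chosen for readability; in practice the parameters would be tuned against the indoor temperature sensors.

```python
def water_curve(t_out, t_sup_max=75.0, t_sup_min=35.0,
                t_out_cold=-15.0, t_out_warm=15.0):
    """Map outdoor temperature to supply temperature: colder outside,
    hotter supply water. Linear interpolation between two breakpoints;
    all parameter values here are illustrative assumptions."""
    if t_out <= t_out_cold:
        return t_sup_max
    if t_out >= t_out_warm:
        return t_sup_min
    frac = (t_out_warm - t_out) / (t_out_warm - t_out_cold)
    return t_sup_min + frac * (t_sup_max - t_sup_min)


# Example: at 0 °C outside, the curve sits halfway between the extremes.
print(water_curve(0.0))  # 55.0
```

Optimizing the baseline then amounts to fitting `t_sup_max`, `t_sup_min` and the breakpoints against the recorded indoor temperatures, rather than tuning them by hand.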
Another key aspect of applying reinforcement learning to real-world problems is the design of the
reward function. Most references build the reward as an explicit trade-off between energy cost and
thermal comfort, with careful weighting of the two contributions, see e.g. [14, 16–18]. In contrast,
our approach was to define a reward that depends solely on the target temperature specified by the
contract between the utility and its customers, and to evaluate whether the agents can lower the energy
cost as a side effect. Indeed, when an energy cost term is added in this low-flexibility environment, the
agents maintain a constant mean indoor temperature at the lowest possible level; the baseline can
achieve the same effect simply by lowering the target temperature. Moreover, we find in our experiments
that the reward function (1) is more stable and robust to different weather conditions, while still
maintaining an advantage in terms of both energy cost and thermal comfort.
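The two reward designs contrasted above can be sketched side by side. The exact form of the paper's equation (1) is not reproduced here; both functions below are illustrative assumptions, with the second one capturing only the spirit of a comfort-only reward built from the contractual target temperature.

```python
def tradeoff_reward(t_indoor, t_target, energy, w_comfort=1.0, w_energy=0.1):
    """Common formulation in the literature: weighted sum of a comfort
    penalty and an energy cost. The weights are illustrative and must be
    tuned carefully, which is the drawback discussed in the text."""
    return -w_comfort * abs(t_indoor - t_target) - w_energy * energy


def comfort_only_reward(t_indoor, t_target):
    """Comfort-only reward in the spirit of the paper's approach: penalize
    deviation from the contractual target temperature; any energy saving
    is a side effect of the policy, not an explicit objective."""
    return -abs(t_indoor - t_target)


# At the target temperature, only the energy term differentiates the two.
print(tradeoff_reward(18.0, 18.0, energy=10.0))  # -1.0
print(comfort_only_reward(18.0, 18.0))           # 0.0
```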
Overall, our results suggest that deep reinforcement learning, by capturing the dynamics of
the system, is a suitable tool for controlling district heating networks, maintaining thermal comfort
while reducing energy cost. To apply it to an actual network, the first step is to deploy outdoor and
indoor temperature sensors on site. Next, either the RNN model or a lightweight statistical
model (e.g. an equivalent RC electrical network) is fine-tuned on the operational data. Finally,
relying e.g. on a cloud infrastructure to store the measurements, the agents, whether DQN or more
recent algorithms such as DDPG [21], can be deployed to control the substation [20].
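A minimal sketch of the equivalent-RC thermal model mentioned above, assuming the simplest 1R1C configuration (a single thermal resistance and capacitance per building); the parameter values and the explicit Euler discretization are illustrative assumptions, not the paper's model.

```python
def rc_step(t_in, t_out, q_heat, R=2.0, C=10.0, dt=0.25):
    """One explicit Euler step of a 1R1C building thermal model:
        C * dT_in/dt = (T_out - T_in) / R + Q_heat
    R: thermal resistance [K/kW], C: thermal capacitance [kWh/K],
    q_heat: heating power [kW], dt: time step [h].
    All parameter values are illustrative."""
    dT = ((t_out - t_in) / R + q_heat) * dt / C
    return t_in + dT


# Steady state: heat input exactly balances losses, indoor temperature holds.
print(rc_step(20.0, 10.0, q_heat=5.0))  # 20.0
# Without heating, the building cools towards the outdoor temperature.
print(rc_step(20.0, 10.0, q_heat=0.0))  # 19.875
```

Fitting `R` and `C` to on-site measurements gives a lightweight alternative to the RNN when operational data are scarce.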
Table 1: Performance of the control strategies in the multi-apartment setting, for T ≡ 18 °C. MAE:
mean absolute error; std: standard deviation. Best performance is shown in bold.