
Applied Energy 355 (2024) 122349

Contents lists available at ScienceDirect

Applied Energy

journal homepage: www.elsevier.com/locate/apenergy

Energy management for demand response in networked greenhouses with multi-agent deep reinforcement learning

Akshay Ajagekar (a), Benjamin Decardi-Nelson (a), Fengqi You (a, b, *)

(a) Systems Engineering, Cornell University, Ithaca, NY 14853, USA
(b) Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853, USA

* Corresponding author at: Systems Engineering, Cornell University, Ithaca, NY 14853, USA. E-mail address: [email protected] (F. You).

https://doi.org/10.1016/j.apenergy.2023.122349
Received 9 October 2023; Received in revised form 8 November 2023; Accepted 13 November 2023; Available online 20 November 2023
0306-2619/© 2023 Elsevier Ltd. All rights reserved.
HIGHLIGHTS

• Operation of networked grid-interactive greenhouses is formulated as a Markov game.
• Battery energy storage and photovoltaic power generation are modeled in each greenhouse.
• A multi-agent deep reinforcement learning technique for demand response is developed.
• An attention mechanism is leveraged to promote coordination among greenhouses.
• Net load reduction and scalability are demonstrated with a case study in New York City.

ARTICLE INFO

Keywords:
Deep reinforcement learning
Multi-agent
Demand response
Urban agriculture
Greenhouse

ABSTRACT

Greenhouses are key to ensuring food security and realizing a sustainable future for agriculture. However, to ensure crop growth efficiency, greenhouses consume a significant amount of energy, primarily through climate control and artificial lighting systems. Owing to this high energy consumption, a network of greenhouses exhibits immense potential to participate in demand response programs for power grid stability. In this work, a multi-agent deep reinforcement learning (MADRL) control framework utilizing an actor-critic algorithm with a shared attention mechanism is proposed for energy management in networked greenhouses. A network of renewable energy integrated greenhouses is constructed to interact with the power grid, when necessary, to address the fluctuations associated with renewable energy generation and dynamic electricity prices. The viability and scalability of this multi-agent approach are demonstrated by evaluating its capabilities for a network of five greenhouses of varying capacities. The proposed MADRL-based control approach for demand-side energy management in networked greenhouses demonstrates efficiency in maintaining indoor climate in all greenhouses while ensuring a 28% reduction in net load demand as compared to well-known algorithms.

1. Introduction

The world population is projected to reach 9 billion by 2050, highlighting the urgent need for reforms in agriculture and food production to address the challenges posed by a growing population and to ensure food security [1]. Traditional agricultural practices suffer from issues like the shrinking availability of agricultural land and water scarcity, which consequently result in inadequate food and crop yields. Controlled environment agriculture (CEA) aids in tackling such issues by allowing growers to control aspects of the growing environment, facilitate year-round production with water conservation, reduce pesticide usage, and ensure fewer emissions associated with transportation and energy consumption [2,3]. CEA structures like greenhouses have the potential to satiate the food demands of growing populations without compromising the global climate goals [4]. As CEA technologies become more efficient and affordable, the adoption of greenhouses in various regions, including arid, urban, and disaster-prone areas, has seen a rapid increase [48]. The Food and Agriculture Organization of the United Nations estimates that the global area spanned by greenhouses will increase significantly by 2030. As agriculture is a substantial contributor to global energy consumption and greenhouse gas emissions, it is important to incorporate sustainable practices with CEA in greenhouses. Sophisticated greenhouse control techniques can efficiently regulate the indoor environmental conditions required to optimize crop growth and increase yields in an energy-efficient manner [5,6].
Real-time control of the actuators in greenhouses, like heating, ventilation, cooling, and artificial lighting, has also been demonstrated with such environmental control strategies [7,8], even with the increased penetration of renewable energy sources [9]. Although renewable energy sources like photovoltaic panels are capable of supplementing the overall energy consumption in sustainable greenhouses, the energy demand associated with heating and cooling systems, as well as supplemental lighting, remains non-trivial [10,11]. With the projected increase in greenhouse adoption along with its energy requirements, energy management in greenhouses presents an opportunity to participate in demand response programs to lower net demand by establishing microgrids with a network of grid-interactive greenhouses [12]. Efficient demand-responsive control techniques for energy management in such microgrids are key to realizing benefits like the grid's improved reliability and flexibility by reducing peak net loads and greenhouse gas emissions through increased renewable resource utility [49].

Various energy management techniques considering demand response to promote energy or cost savings have been developed for microgrids and can be broadly categorized into model-based and model-free approaches. Demand response formulated as an optimal control problem can be tackled with various optimization strategies, including mixed-integer linear programming [13,14]. Optimal control approaches like model predictive control leverage the system dynamics to construct an optimization problem over a finite horizon to optimize decisions involved with load curtailment [15]. Such optimization problems also allow formulations considering model uncertainties arising from various sources like renewable energy generation and dynamic electricity prices, and can be tackled with stochastic optimization approaches like stochastic programming [16] and robust optimization [17,18]. These model-based approaches for energy management considering demand response utilize a model of the microgrid operation derived from first principles, which often employs approximations that may not necessarily reflect its real-world counterpart. Model-free reinforcement learning techniques can overcome such shortcomings and have demonstrated performance efficiency for demand-side management in microgrids [19]. Reinforcement learning allows for obtaining feasible solutions to control problems formulated as a Markov decision process using stochastic simulations, overcoming the limitations associated with high dimensionality through function approximations [20]. Specifically, deep reinforcement learning (DRL)-based strategies leverage neural networks for function approximation to help capture complex nonlinear trends in data [21,22] and yield demand-responsive control policies that are capable of adapting to dynamic fluctuations in the microgrid.

Demand response in grid-interactive greenhouses is a non-stationary problem due to the presence of multiple consumers equipped with renewable energy and battery energy storage systems, which directly impact the load demand [19]. Deep reinforcement learning approaches for demand response typically employ a centralized controller strategy wherein a single agent controls all energy systems. Such centralized controllers use globally available information as the current state to obtain all actions that maximize a value function. However, such single-agent approaches may exhibit limitations like poor sample complexity and exponential growth of computational requirements as the demand response problem size increases [23]. To combat such issues in demand-side energy management, multi-agent deep reinforcement learning (MADRL) approaches have also been explored, which leverage a distributed response in contrast to the centralized controller [24,25]. MADRL offers a promising avenue to tackle the non-stationarity associated with demand response in networked greenhouses; however, there are several research challenges associated with realizing this. The first challenge lies in modeling the interaction of greenhouses with energy systems, including renewable energy sources, energy storage systems, and the electric grid. As approximations for modeling system dynamics leveraged for model-based approaches like model predictive control [12] may impact their applicability in a practical setting, it is important to model demand response in networked greenhouses as a sequential decision-making problem that reflects real-time operation subject to various uncertainties. Although multi-agent approaches offer better scalability as compared to single-agent DRL approaches, their reliance on local partial information to compute actions may lead to poor control performance [26]. Achieving efficient load reduction performance with a MADRL approach while ensuring its scalability as the problem size increases is another research challenge. A final challenge lies in the limitations of a MADRL approach in handling the credit assignment problem, wherein individual agents are unable to gauge their contribution to the overall load reduction [27]. The non-stationarity of networked greenhouses impacts each agent's performance as a consequence of adapting to other agents, further complicating the credit assignment.

In this work, we model the operation of greenhouses to regulate their indoor climate by utilizing energy drawn from battery storage, renewable energy sources, and the electric grid as a sequential decision-making problem. A network of greenhouses equipped with photovoltaic panels and a battery energy storage system is constructed to facilitate energy management in individual greenhouses for net load reduction. We cast the demand response problem in grid-interactive greenhouses as a Markov game to leverage a multi-agent approach for energy management by controlling the equipped energy systems. A MADRL approach is further proposed to enable performance-efficient and scalable demand response by assigning each agent to a greenhouse. The proposed MADRL agent utilizes local greenhouse information and corresponding climate data to compute actions in a decentralized manner and addresses the resulting control inefficiencies by leveraging the attention mechanism during the training process. We employ an attention-based neural network to estimate the value functions for each agent, wherein relevant information is shared to promote coordination among all greenhouses. The attention-based neural network allows for dynamically attending to the relevant greenhouse state and agents' policies at any time for efficient value estimation that would otherwise be complicated by the non-stationarity of the underlying stochastic environment. A soft actor-critic based method is used with stochastic policies to enable efficient exploration during the training process. Training the stochastic policy and attention-based value function networks in a centralized manner ensures reliance on global information to improve control performance while offering scalability due to decentralized control computation. The applicability and efficiency of the proposed MADRL approach for demand response in networked greenhouses are demonstrated with a case study comprising five greenhouses situated in New York City that draw power from the city's electric grid. We conduct simulations with the proposed multi-agent approach over the entire year using real-world climate data to illustrate the adaptability of the MADRL approach to varying environmental disturbances. We also provide detailed comparisons against DRL-based control approaches like deep deterministic policy gradient (DDPG) and rule-based control approaches for demand response. The main contributions of this work are summarized as follows:

• A novel Markov game formulation of the interaction between greenhouse operation and equipped energy systems of battery energy storage, photovoltaic renewable energy generation, and the electric grid.
• Demand-responsive energy management in the greenhouses performed with a novel multi-agent approach leveraging stochastic policies and the attention mechanism for information sharing and coordination among the greenhouses.
• The load reduction capabilities of the proposed multi-agent strategy for demand response, realized while adapting to varying environmental disturbances, are demonstrated for greenhouses situated in New York City. A comparative study with single-agent DRL and rule-based techniques for demand response is also presented.


The remainder of this paper is structured as follows. Modeling the greenhouse operation along with its interaction with energy systems comprising the battery energy storage system, photovoltaic panels, and the electric grid is described in Section 2. Demand-responsive control of energy systems equipped in a network of greenhouses realized with the proposed MADRL strategy is presented in Section 3. A case study with various demand response techniques for greenhouses located in New York City is presented in Section 4. Finally, conclusions are drawn in Section 5.

2. Energy modeling

In this section, we describe the modeling for a network of greenhouses connected to a power grid to simulate the real-time energy consumption in greenhouses and the associated electric demand. Fig. 1 depicts an overview of the constructed simulation along with the energy components within each greenhouse and the electric grid. As shown in Fig. 1, each greenhouse is equipped with a battery energy storage system and a photovoltaic panel as its renewable energy source. The net energy requirements of a greenhouse after adjusting for charging or discharging of the battery storage device and utilizing the renewable energy resource constitute the net load demand for the electric grid.

2.1. Greenhouse energy consumption

Each greenhouse is equipped with an actuator system to maintain the indoor microclimate for optimal crop growth. These actuator systems facilitate provisions for the addition or removal of heat, CO2 enrichment, humidification or dehumidification, and supplemental lighting, as shown in Fig. 1. Here, we consider a ventilated semi-closed greenhouse model wherein the indoor climate is affected by external weather disturbances like air humidity levels, ambient air temperature, wind speed, and amount of solar radiation. As maintaining indoor greenhouse temperature is a crucial factor for enriching crop growth [28], it is important to consider the flow of thermal energy between various greenhouse components and the outdoor environment [50]. Modeling the rate of change of indoor air temperature T_air requires the energy balance equation for this thermodynamic system in Eq. 1. The proportionality constant for the rate of change of air temperature T_air is linearly correlated with the inverse of the heat capacity of the air occupying the greenhouse. The indoor greenhouse temperature is influenced by convective and radiative heat flows, as well as the heat added to the greenhouse system provisioned by the actuator control system U_heat. The convective energy transfer comprises heat exchange via convection between the air and individual greenhouse components like floor mat, crop vegetation, greenhouse cover, and tray, as represented by Q^v_mat, Q^v_veg, Q^v_cover, and Q^v_tray, respectively. The indoor air also loses heat via convection to the outdoor air, denoted by Q^v_out. Although minuscule, the solar radiation absorbed by the air Q^r_solar also influences its temperature. The heat losses to the greenhouse internal air resulting from convection are modeled with Eq. 2, where H^L_i and A_i denote the convective heat transfer coefficient and the exposed surface area for each entity i. On the other hand, the absorbed radiative heat Q^r_solar is calculated with Eq. 3. This absorbed heat includes transmitted radiation due to both direct and diffuse solar radiation. The direct and diffuse solar radiation levels Q^r_direct and Q^r_dif consider absorption in the near-infrared and visible spectra, with only a fraction a_obs of the transmitted solar radiation hitting the corresponding obstruction.

dT_air/dt = Q^v_mat + Q^v_veg + Q^v_cover + Q^v_tray - Q^v_out + U_heat + Q^r_solar    (1)

Q^v_i = H^L_i A_i (T_i - T_air)    (2)

Q^r_solar = a_obs (Q^r_direct + Q^r_dif)    (3)

Controlling the air humidity within the greenhouse to be in optimal growth ranges serves various benefits, including enhanced crop yield by promoting plant growth and transpiration, as well as improved temperature control [29]. Indoor air humidity can be characterized by the density of water vapor ρ_w, with its rate of change over time modeled with a mass balance equation in Eq. 4. Water vapor levels are influenced by the amount of condensation on greenhouse components like mat, floor, and cover, denoted as W^c_mat, W^c_floor, and W^c_cover, respectively. The loss of vapor density resulting from condensation onto the component i is a function of the exposed area of the component A_i, the latent heat of condensation of water L_H, and the volumetric heat capacity of humid air C_v. The amount of condensation is also directly affected by the difference between the current water vapor density ρ_w and the saturated vapor density S_v(T_i), defined as the maximum density of water vapor in air at temperature T_i, as given in Eq. 5. The gain of water within the internal microclimate due to evapotranspiration from the soil and the crops, denoted as W_ET, contributes to the increase in water vapor density and is modeled by the Stanghellini model for stomatal resistance and leaf transpiration [30]. The amount of water vapor escaping to the outdoor air owing to ventilation and external wind, represented by W_out, is also considered here. The actuator responsible for humidification and dehumidification provisions U_vapor, indicating the controlled amount of water vapor added at any given time.

dρ_w/dt = W_ET + U_vapor - W^c_mat - W^c_floor - W^c_cover - W_out    (4)

W^c_i = f(A_i, L_H, C_v) · (ρ_w - S_v(T_i))    (5)

The CO2 levels within each greenhouse must be regulated to promote crop growth, as CO2 directly impacts the plant's photosynthesis and respiration abilities, which in turn affect plant growth and yield [31]. Even though CO2 in the outdoor air is accessible to greenhouse crops, it can be supplemented through the greenhouse CO2 enrichment actuator system, which is necessary to maintain sufficient CO2 concentration levels within the indoor microclimate. The rate of change of CO2 concentration within each greenhouse is modeled with a mass balance equation as shown in Eq. 6. The indoor density of CO2, denoted by ρ_C, is affected by the amount of CO2 released by the crops during growth and maintenance respiration C_crop and the CO2 absorbed by the plants during photosynthesis C_photo. The exchange of CO2 caused by respiration and photosynthesis is modeled with the Vanthoor photosynthetic model [32]. U_CO2 indicates the amount of CO2 added to the greenhouse with the control actuator for CO2 enrichment, while C_out refers to the CO2 escaping to the outdoor environment.

dρ_C/dt = C_crop - C_photo + U_CO2 - C_out    (6)

It should be noted that the artificial lighting system within the greenhouse emits near-infrared radiation [33]. As light plays a crucial role in regulating the transpiration rate of crops [34], supplemental lighting affects indoor humidity levels by directly influencing W_ET, which is the amount of water added due to evapotranspiration. Similarly, the rate of photosynthesis varies due to changes in the amount of radiation U_light sourced by artificial lighting lamps within the greenhouse and outdoor sunlight [35]. As a result, CO2 levels in the greenhouse may diminish due to an increase in absorbed CO2, thus affecting C_photo. To model the energy consumption of a greenhouse, the control actuators are mapped to their equivalent energy consumption. The amount of heat supplied to the greenhouse and the radiation from the artificial lighting lamps equipped in the greenhouse account for a substantial portion of the overall greenhouse energy consumption. On the other hand, the amount of energy required to pump CO2 into the greenhouse for CO2 enrichment, combined with the energy required to operate devices for humidification and dehumidification, constitutes a trivial fraction of the overall energy consumption at a given time. Translating the controls (U_heat, U_vapor, U_CO2, U_light) into their energy counterparts provides the net energy consumption E_GH of a greenhouse required to regulate indoor climate.
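Taken together, Eqs. 1, 4, and 6 define the right-hand side of the indoor climate dynamics. As a rough illustration only, the following Python sketch shows how these balances could be assembled for numerical integration; the dictionary keys, function names, and calling convention are placeholders standing in for the detailed component models (condensation, evapotranspiration, photosynthesis) described above, not the authors' implementation.

```python
def convective_flux(h_l, area, t_comp, t_air):
    # Eq. 2: convective heat exchange between a greenhouse component and indoor air
    return h_l * area * (t_comp - t_air)

def climate_derivatives(u, aux):
    """Illustrative right-hand sides of the balance equations (Eqs. 1, 4, and 6).

    `u` holds the actuator inputs (U_heat, U_vapor, U_CO2) and `aux` holds
    precomputed flux terms; both are placeholder dicts standing in for the
    detailed component models described in the text.
    """
    # Eq. 1: energy balance over convective, radiative, and actuator heat flows
    dT_dt = (aux["Q_mat"] + aux["Q_veg"] + aux["Q_cover"] + aux["Q_tray"]
             - aux["Q_out"] + u["U_heat"] + aux["Q_solar"])
    # Eq. 4: water vapor mass balance with evapotranspiration, condensation,
    # ventilation losses, and the (de)humidification actuator
    drho_w_dt = (aux["W_ET"] + u["U_vapor"] - aux["W_mat"] - aux["W_floor"]
                 - aux["W_cover"] - aux["W_out"])
    # Eq. 6: CO2 mass balance with crop respiration, photosynthesis,
    # enrichment actuation, and exchange with outdoor air
    drho_c_dt = aux["C_crop"] - aux["C_photo"] + u["U_CO2"] - aux["C_out"]
    return dT_dt, drho_w_dt, drho_c_dt
```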


Fig. 1. An overview of the energy components of photovoltaic generation and energy storage systems equipped in greenhouses along with their interaction with the
power grid.


2.2. Energy storage

Battery energy storage systems are ideal for reducing peak loads in buildings like greenhouses owing to their high instantaneous discharging capability [36]. They not only facilitate improved energy management but also enable seamless integration of renewable energy sources. We equip each greenhouse with such a battery energy storage system capable of charging during off-peak periods and discharging sufficient power to supplement the greenhouse energy consumption at any time. For each greenhouse, the energy storage system is characterized by its capacity C^B, nominal capacity C^B_nom, capacity loss coefficient η_C, and the charging/discharging efficiency factor ξ. The nominal capacity dictates the amount of power delivered by a fully charged battery under specified conditions, while the capacity loss coefficient is a measure of battery degradation over each charging/discharging cycle. For a rechargeable battery, the nominal capacity and the charging efficiency are typically piecewise linear functions of its current state of charge SOC [37]. These piecewise functions are termed the power capacity curve and the efficiency curve, represented by C^B_nom(SOC) and ξ(SOC), respectively.

SOC_{t+1} = min{C^B_t, SOC_t + E_BESS · √ξ}    if E_BESS ≥ 0
SOC_{t+1} = max{0, SOC_t + E_BESS · (1/√ξ)}    if E_BESS < 0    (7)

C^B_{t+1} = C^B_t - η_C C^B_0 · |SOC_t - SOC_0| / (2 C^B_t)    (8)

At the timestep t, the dynamics for charging or discharging a battery energy storage system with energy E_BESS are described by the transition to SOC_{t+1} and battery capacity C^B_{t+1} resulting from battery degradation. E_BESS is bounded by [-C^B_nom(SOC_t), C^B_nom(SOC_t)] to reinforce the maximum allowable charging/discharging levels. The charging or discharging dynamics of the storage device are provided in Eq. 7, wherein the efficiency factor ξ(SOC_t), dependent on its current state, dictates the amount by which the battery can be charged or discharged. The dynamics for battery charging in Eq. 7 ensure that the storage device does not exceed its current capacity while enforcing physical limits on its SOC during discharging [38]. The degradation of the battery after each charge or discharge is calculated with Eq. 8, wherein the amount of capacity degradation is proportional to the loss coefficient η_C and the initial battery capacity C^B_0. If the initial SOC_0 is assumed to be zero, the amount of degradation linearly correlates with the charged fraction of the current battery capacity.
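A minimal sketch of the storage transition, mirroring Eqs. 7 and 8, is given below; the function name and argument layout are illustrative, and the caller is assumed to have clipped E_BESS to the nominal-capacity bounds beforehand.

```python
import math

def battery_step(soc, cap, e_bess, xi, eta_c, cap_0, soc_0=0.0):
    """One charge/discharge transition per Eqs. 7-8 (energies in kWh).

    `xi` is the state-dependent efficiency xi(SOC); `e_bess` is assumed
    already bounded within [-C_nom(SOC), C_nom(SOC)].
    """
    if e_bess >= 0:  # charging: SOC cannot exceed the current capacity (Eq. 7)
        soc_next = min(cap, soc + e_bess * math.sqrt(xi))
    else:            # discharging: SOC is floored at zero (Eq. 7)
        soc_next = max(0.0, soc + e_bess / math.sqrt(xi))
    # Eq. 8: capacity fade proportional to the loss coefficient and the
    # initial capacity, scaled against the current capacity
    cap_next = cap - eta_c * cap_0 * abs(soc - soc_0) / (2.0 * cap)
    return soc_next, cap_next
```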
2.3. Solar energy

Photovoltaic (PV) panels that allow the generation of electricity from renewable energy sources can supplement energy requirements in a greenhouse and, in turn, reduce demand from the power grid. Here, we model PV modules equipped in each greenhouse such that the power generated by the panel is available for use to lower the energy consumption of the greenhouse or to facilitate charging of its battery energy storage system [39]. The energy produced by the PV panel depends on the area exposed to the solar radiation A_pv, the efficiency coefficient η_pv, and the packing factor P_f. The packing factor, or density of solar cells in a PV module, dictates the output power of the module and generally depends on the shape of the solar cells used. With a total solar irradiation of G_s, the power generated by the PV panels E_PV can be modeled as shown in Eq. 9.

E_PV = A_pv η_pv P_f · G_s    (9)

2.4. Electric power grid

The amount of electricity drawn from the power grid by each greenhouse is dictated by three factors: the energy consumption of the greenhouse to maintain its indoor microclimate E_GH, charging or discharging of the battery energy storage system E_BESS, and the PV panel output E_PV. Depending on the current energy storage levels, we allow provisions for the greenhouse energy consumption to be partially or entirely supplemented by its battery energy storage and the generated PV power. For either case, the energy drawn from the power grid by the i'th greenhouse in the network can be modeled as max(0, E^i_GH + E^i_BESS - E^i_PV). With this, the net electric demand E_net for the network of greenhouses can be aggregated as shown in Eq. 10. Here, we do not enforce maximum limits on the net load and assume flexibility of the main power grid.

E_net = Σ_i max{0, E^i_GH + E^i_BESS - E^i_PV}    (10)
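The PV model of Eq. 9 and the aggregation of Eq. 10 translate directly into code; the short sketch below is one way to express them, with argument names chosen for readability rather than taken from the paper.

```python
def pv_output(a_pv, eta_pv, p_f, g_s):
    # Eq. 9: PV energy from exposed area, efficiency, packing factor, irradiance
    return a_pv * eta_pv * p_f * g_s

def net_demand(e_gh, e_bess, e_pv):
    """Eq. 10: aggregate grid draw; arguments are per-greenhouse sequences."""
    return sum(max(0.0, g + b - p) for g, b, p in zip(e_gh, e_bess, e_pv))
```

For example, `net_demand([5.0, 3.2], [1.0, -0.5], [2.0, 4.0])` sums the positive residual loads of two greenhouses, clipping any greenhouse whose PV and storage fully cover its consumption to zero grid draw.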
3. Multi-agent deep reinforcement learning strategy for demand response

Demand response programs applied to a network of greenhouses can aid in lowering the net load demand to improve the reliability of the power grid. In this section, we cast the demand response problem in networked greenhouses as a multi-agent Markov decision process, or Markov game, and develop a multi-agent deep reinforcement learning (MADRL) approach to overcome the limitations posed by single-agent DRL approaches. It is important to note that a single-agent DRL approach exhibits performance efficiency at the expense of increased computational complexity [40] as the number of greenhouses within the network grows. The proposed multi-agent approach leverages the attention mechanism to promote coordination among the agents and sharing of relevant information to induce learning in a centralized manner while ensuring decentralized control computation. This hybrid approach combining centralized training with decentralized execution allows us to gain performance efficiency for demand response in networked greenhouses without compromising the scalability offered by multi-agent approaches.

3.1. Multi-agent Markov decision process formulation

The demand response problem in networked greenhouses is formulated as a Markov game [41], or multi-agent extension of the Markov decision process (MMDP), to enable the development of a multi-agent approach for solving such sequential decision-making problems. An MMDP with N agents is defined by a set of states S, the action space A_i for the i'th agent, the reward obtained by each agent R_i : S × A_1 × … × A_N → ℝ, the discount factor γ ∈ [0, 1], and the transition function, which describes the probability distribution over possible next states. We assign an agent to an individual greenhouse for participation in the demand response program by controlling the charging and discharging of the equipped battery energy storage device. So, the i'th agent receives an observation o_i, which is a subset of the global state s ∈ S, and learns the policy π_i(o_i, a_i) → [0, 1]. π_i is a stochastic policy that maps the agent's observation to a distribution over the possible actions a_i ∈ A_i.

The observation o_i for each agent constitutes the corresponding greenhouse environmental variables, the state of the equipped battery energy storage system, PV generation, greenhouse energy consumption, as well as external weather disturbances. At the timestep t, the greenhouse observations include the indoor air temperature T_air,t, water vapor density ρ_w,t, and the CO2 concentration ρ_C,t, while the weather disturbances of ambient air temperature T_ext,t, external relative humidity RH_ext,t, wind speed v_t, and solar radiation G_s,t are considered here.


Regulating the greenhouse environmental variables requires the observed energy consumption E_GH,t. In addition, the PV power output E_PV,t and the energy storage device's available charge SOC_t, capable of supplementing the greenhouse consumption, also comprise the observation space. The greenhouse environmental setpoints are dependent on the optimal microclimate conditions required for different crops and vary with time. To ensure that the agent's policy learns the temporal dependencies, we also consider the hour of the day t_hour and the day of the year t_day as the agent's observations. The observation space for each agent then spans o_i ∈ ℤ^2 × ℝ^9, while a_i ∈ ℝ forms the action space for each agent. The action a_i represents the amount of energy E_BESS utilized for charging, or the energy made available by discharging, to supplement the greenhouse energy requirements. As E_BESS is bounded by the nominal capacity of the battery storage, the action a_i is scaled between [0,1] to limit the action space. Under the multi-agent setting, the agents attempt to learn a policy that maximizes the expected discounted returns described by Eq. 11. Here, γ represents the discount factor, which specifies the favorability of the policies towards long-term or short-term rewards. For demand response in networked greenhouses, the objective is to improve the reliability of the power grid by lowering the net load. To realize this, we specify the reward function for each agent i as shown in Eq. 12, wherein the reward obtained by each agent at time t depends on the partial information o_i and the corresponding action a_i taken by the agent. Maximizing the cumulative discounted rewards for each agent, specified by R_it, is equivalent to minimizing the discounted greenhouse energy demand over an infinite horizon.

π_i* = argmax_{π_i} E_{a_i ∼ π_i} [ Σ_{t=0}^∞ γ^t R_it(o_it, a_it) ]    (11)

R_it(o_it, a_it) = -max{0, E^i_GH + E^i_BESS - E^i_PV}    (12)
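Concretely, the reward of Eq. 12 is the negative grid draw of the agent's greenhouse. The short sketch below illustrates this, together with one plausible mapping of the scaled policy output back onto the charging/discharging range; the affine mapping is an assumption for illustration, since the exact rescaling is not spelled out in the text.

```python
def scale_action(a, c_nom):
    # The policy output a lies in [0, 1]; mapping it affinely onto
    # [-C_nom(SOC), C_nom(SOC)] is one plausible choice (an assumption here,
    # not a detail given in the paper).
    return (2.0 * a - 1.0) * c_nom

def reward(e_gh, e_bess, e_pv):
    # Eq. 12: the per-agent reward is the negative grid draw, so maximizing
    # the discounted return minimizes the discounted net load demand
    return -max(0.0, e_gh + e_bess - e_pv)
```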
3.2. Policy learning for load reduction

We adopt neural network policies for each agent i that are parameterized by θ_i and represented by π_θi. To incorporate the stochastic nature of the policies, we use a Gaussian policy for which the mean μ_i and covariance σ_i associated with the corresponding actions are computed with the neural network. The parameterized Gaussian policies use the agent's observation o_i as input, denoted by [μ_i, σ_i] = π_θi(o_i). The actions for each agent a_i can be extracted from the relevant policies by either sampling from the normal distribution N(μ_i, σ_i) or using the mean action μ_i, depending on the exploration conditions used during training and evaluation. A centralized critic network parameterized by ϕ serves as the value function estimator for each agent. By definition, the centralized critic, denoted by Q_ϕ, uses the global state comprising all observations along with the selected actions as input to estimate the state-action values for all agents. The output of this network is an N-dimensional vector, with each entry indicating the estimated expected discounted rewards over an infinite horizon, as shown in Eq. 13.

Q^i_ϕ(o_1, …, o_N, a_1, …, a_N) = E_{a_i ∼ π_i} [ Σ_{t=0}^∞ γ^t R_it(o_it, a_it) ]    (13)
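A minimal PyTorch sketch of one agent's Gaussian policy follows; the single hidden layer and its width are assumptions for illustration, and the network simply exposes the sampled-versus-mean action choice described above.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Sketch of one agent's stochastic policy pi_theta_i; the architecture
    shown (one hidden layer) is an illustrative assumption."""

    def __init__(self, obs_dim, act_dim, hidden=400):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)         # mean mu_i
        self.log_sigma = nn.Linear(hidden, act_dim)  # log of std sigma_i

    def forward(self, obs, explore=True):
        h = self.body(obs)
        mu, sigma = self.mu(h), self.log_sigma(h).exp()
        if explore:  # training: sample from N(mu_i, sigma_i) for exploration
            return torch.distributions.Normal(mu, sigma).rsample()
        return mu    # evaluation: deterministic mean action
```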
In a typical multi-agent setting, separate critic networks use partial information to optimize the individual agent's performance, which may lead to the non-stationarity of the decision-making problem [42]. This may lead to the learning of suboptimal policies, as each agent tries to minimize the individual greenhouse's net load without considering how the other agents' performance affects the power drawn from the electric grid. To overcome this limitation, we construct a central critic network that leverages the attention mechanism to promote coordination among individual agents. The key idea behind attention is to select relevant information from the input sequence [43], which translates to the ability of the critic network to dynamically attend to specific agents at any time, allowing for the complex interactions among them to be captured effectively. First, the embeddings for an observation and action pair are computed for each agent with a feed-forward neural network. These embeddings e_i are represented by f^e_i(o_i, a_i), where f_i indicates the parametric linear network for each agent i. With the agents' embeddings, we generate the attention matrix with entries α_ij. The attention weights indicate a comparison between the embeddings e_i and e_j computed using a query-key system [43]. The parameter matrices ϕ_k and ϕ_q are used to transform the respective embeddings into a key and a query as shown in Eq. 14, wherein the softmax function S_f can be described as S_f(z)_i = e^{z_i} / Σ_j e^{z_j}. In addition, the attention weights are also scaled by a factor of 1/√d to prevent large magnitudes of the dot products and consequent vanishing gradients [43], where d represents the size of the key and query vectors. Finally, the value function for each agent is computed as shown in Eq. 15. f^q_i is a feed-forward neural network assigned to the i'th agent which uses the sum of the state-action embedding values, weighted by the attention weights, as its input. The parameter matrix ϕ_v transforms the state-action embeddings into the agents' values.

α_ij = S_f((ϕ_k e_j)^T (ϕ_q e_i))    (14)

Q^i_ϕ = f^q_i( Σ_{j≠i} α_ij · ϕ_v e_j )    (15)

The constructed policy and centralized critic networks are trained with a soft actor-critic based algorithm. An experience replay D serves as the replay buffer that records transitions from which data batches can be sampled to train the networks with the underlying algorithm. Soft actor-critic is a policy gradient method that uses an off-policy formulation along with an entropy maximization objective to ensure efficient exploration and stability [44]. A batch of transitions B ⊂ D sampled uniformly is used to fit the critic over its estimated state-action values and to perform a policy gradient step. The set of observations o ≡ (o_1, …, o_N), actions taken a ≡ (a_1, …, a_N), and the obtained rewards r ≡ (r_1, …, r_N), along with the recorded next set of observations o′ ≡ (o′_1, …, o′_N), constitute a transition. We also initialize copies of the policy and critic networks to serve as the target policies and critics, denoted by π̄_θi and Q̄_ϕ, respectively. These target networks help prevent the divergence in off-policy algorithms caused by the overestimation of the state-action values. With these, the central critic is updated to minimize the joint loss function in Eq. 16, where the target value estimates are provided in Eq. 17.

L_Q = E_{o,a,r,o′ ∼ B} [ (Q^i_ϕ(o, a) - y_i)^2 ]    (16)

y_i = r_i + γ · E_{a′ ∼ π̄_θ} [ Q̄^i_ϕ(o′, a′) - log π̄_θi(o′_i) ]    (17)

J_πi = E_{o,a,r,o′ ∼ B} [ r_i + β · H(π_θi(o_i)) ]    (18)

In contrast to typical policy gradient approaches, which maximize the expected sum of rewards, we consider a maximum entropy objective, which augments the objective with the expected entropy of the stochastic policies. A gradient ascent step is performed with the augmented objective in Eq. 18, where H indicates the entropy of the stochastic policy π_θi. To update the central critic, actions a′ are sampled from the Gaussian distribution N(μ_i, σ_i) for the target policy [μ_i, σ_i] = π̄_θi(o′_i). It should be noted that the policies can be trained independently from each other and require a temperature parameter β to determine the weightage of the entropy against the reward. During the training phase, exploration in agents is encouraged by sampling the actions from the distributions guided by the stochastic policies, while mean actions are selected for each agent during evaluation. These update rules facilitate the learning of the networks in a centralized manner while ensuring action computation with decentralized execution.
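To make the training machinery of Eqs. 14-17 concrete, a hedged sketch is given below. It implements the scaled query-key attention with the j ≠ i exclusion of Eq. 15 and the entropy-regularized critic targets of Eqs. 16-17; the parameter matrices are plain tensors here for brevity, and the per-agent heads f^q_i as well as the policy-gradient step are omitted.

```python
import math
import torch

def attended_inputs(e, phi_k, phi_q, phi_v):
    """Eqs. 14-15 core: e is (N, d_e) stacked agent embeddings; the phi_*
    arguments are (d, d_e) key/query/value parameter matrices."""
    keys, queries, values = e @ phi_k.T, e @ phi_q.T, e @ phi_v.T
    d = keys.shape[-1]
    # Eq. 14: scaled query-key comparison between agent embeddings
    logits = (queries @ keys.T) / math.sqrt(d)
    logits.fill_diagonal_(float("-inf"))   # the sum in Eq. 15 excludes j = i
    alpha = torch.softmax(logits, dim=-1)  # attention weights alpha_ij
    return alpha @ values                  # weighted values fed to each f_i^q

def critic_targets(r, next_q, next_logp, gamma=0.98):
    # Eq. 17: targets from the target critic and target policies, with the
    # entropy term entering through the log-probability of the next action
    return r + gamma * (next_q - next_logp)

def critic_loss(q, y):
    # Eq. 16: joint mean-squared error over all agents' value estimates
    return ((q - y.detach()) ** 2).mean()
```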


4. Computational results

In this section, we detail the results from our computational experiments, which underscore the scalability and efficiency of our proposed multi-agent DRL approach, designed for demand response within grid-responsive networked greenhouses. We have chosen New York City (NYC) – a highly urbanized location – as a representative case study. Our discussion begins with an overview of the simulation setup and the rationale behind selecting NYC as the case study location. We then introduce the baseline algorithms we utilized, specifically rule-based control (RBC) and the deep deterministic policy gradient (DDPG) algorithms. Subsequently, we present the training and simulation outcomes, contrasting the effectiveness of different techniques. Our findings highlight the superiority of the MADRL approach in enhancing the grid responsiveness of networked greenhouses and optimizing their electricity consumption from the main grid.

4.1. Experimental setup

There are five greenhouses, each located in one of NYC's five boroughs: Brooklyn, Manhattan, Queens, Staten Island, and The Bronx. We selected NYC for our study because of its highly urbanized nature and the growing importance of urban farming in such environments. Each greenhouse is equipped with a solar panel and battery storage, resulting in distinct local dynamics. Fig. 2 provides an overview of the locations and configurations of each greenhouse. Due to their geospatial distribution across different boroughs, each greenhouse experiences unique outdoor climate conditions. These differences in climate influence local crop growth and energy dynamics, adding complexity to the task of optimally managing energy resources in a coordinated way. The greenhouses are interconnected through a shared electricity grid. We tested three control algorithms – RBC, DDPG, and MADRL – for their effectiveness in reducing grid reliance. RBC and DDPG controllers operate in a centralized fashion, whereas MADRL functions in a distributed setup. The advantages of a distributed implementation will be elaborated on in the subsequent section. As previously highlighted, the DDPG algorithm operates centrally. This means that every piece of information is relayed to the controller at each timestep for optimal decision-making. While this ensures maximum data integration, it also demands more computational resources and data during training. To ensure a fair comparison between the methods, a consistent configuration is maintained across all experiments. Both DDPG and MADRL schemes use the same network structure and training hyperparameters. During training, the controllers start with random actions in the exploration phase. The network configurations selected for this study have dimensions of 400. Four attention heads are used in the MADRL architecture. Every DRL method applied here uses a discount factor γ = 0.98 and a Polyak averaging parameter ρ = 0.001. The networked greenhouse environment is constructed with the Python-based scientific libraries NumPy and SciPy [45]. Operation of the networked greenhouses interacting with the electric power grid is implemented as a Markov decision process with the OpenAI Gym environment [46], while the neural network construction and training are conducted with the PyTorch deep learning package.
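For reference, the shared settings reported above can be collected in a single configuration; the dictionary below records only what the text specifies, and the remaining training details (learning rates, batch size, buffer length) are not given in the paper and would have to be chosen by an implementer.

```python
# Shared experiment configuration as reported in the text.
config = {
    "hidden_dim": 400,     # network layer dimensions
    "attn_heads": 4,       # attention heads in the MADRL critic
    "gamma": 0.98,         # discount factor
    "polyak_rho": 0.001,   # Polyak averaging parameter for target networks
}
```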
Typical winter, spring, summer, and fall conditions were considered in this study. The simulation for each season starts with the first day in January, April, July, and October, respectively, and is conducted for seven days in each season. The DRL algorithms were trained on the 2020 climate data for a week in each season and were subsequently tested on the 2021 data for an equivalent duration. The climate data were obtained from the National Solar Radiation Data Base (NSRDB) [47]. The dynamic operation of the networked greenhouses is modeled using a sampling interval of 60 s. For our experiments, we use location-specific historical climate data recorded every 5 min. To accommodate the sampling interval mismatch, the weather data is kept constant over every five consecutive intervals to avoid the computational inefficiencies associated with the simulation of the individual greenhouse. For modeling the PV energy generation and the battery energy storage devices in individual greenhouses, we take into account various elements, including the local climate conditions and the population size of each New York City borough. However, it is important to note that variations in the capacities of the PV and energy storage systems are unlikely to influence the performance of the proposed energy management system. As the capacities of equipped devices do not constitute the state or control space of the formulated decision-making problem, the agents are expected to adapt their policies over dynamic PV generation and energy storage states irrespective of their capacities.
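One simple way to realize the 5-min-to-60-s alignment described above is to repeat each weather record five times along the time axis; in this sketch the file name is hypothetical and merely stands in for whatever NSRDB export is used.

```python
import numpy as np

# Hold each 5-min weather record constant over five consecutive 60-s steps.
weather_5min = np.loadtxt("nsrdb_nyc_2020.csv", delimiter=",")  # hypothetical file
weather_60s = np.repeat(weather_5min, 5, axis=0)
```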
4.2. Demand response performance

Each location has distinct outdoor conditions, impacting the internal dynamics and decision-making needed for optimal crop growth within the greenhouses. For instance, temperature and humidity directly influence internal ventilation, whereas higher solar radiation not only elevates the greenhouse's internal temperature but also boosts crop growth, given its integral role in photosynthesis.

Fig. 2. Overview of the greenhouse locations in NYC and their associated photovoltaic (PV) and battery energy storage system capacities.


Concurrently, increased solar radiation augments the power generated by the solar panels. This complex interplay between external factors and the internal climate of the greenhouse adds layers of complexity to climate control.

Fig. 3 illustrates the average hourly energy consumption for each greenhouse over a 7-day span across all four seasons. Notably, energy consumption varies for greenhouses depending on their location and the specific season. These differences can be attributed to the unique local dynamics each greenhouse experiences, influenced by local outdoor conditions and their respective renewable energy capacities. Nevertheless, within each season, the energy consumed by individual greenhouses remains relatively consistent. Among the seasons, winter records the highest energy consumption, whereas summer exhibits the lowest. This disparity can likely be attributed to the increased sunlight during the summer months, leading to enhanced electricity generation from solar PV systems compared to other seasons. The energy needed to meet the greenhouse energy needs presented in Fig. 3 may originate from the photovoltaic (PV) system, the battery, or the electricity grid. Using renewable energy or battery power comes without added electricity costs, whereas grid energy usage reflects the current electricity cost. Consequently, the primary aim of the energy management system is to curtail reliance on grid energy, thus allowing the urban food production system to participate in demand response. This approach not only reduces the greenhouse operational costs but also bolsters grid stability. By lessening greenhouse dependency on the grid, the electricity costs during peak hours are reduced for the greenhouse farmer.

Fig. 4 depicts the energy supplied to each greenhouse from the grid, as determined by the energy management systems across the different NYC greenhouse locations over a 7-day period for each growing season. For each season, the figure reveals a consistent pattern: both DDPG and MADRL controllers generally draw less energy from the grid than the RBC. The reason behind this is that the RBC operates based on a fixed set of heuristics, which might not always be optimal. In contrast, DDPG and MADRL, while showing occasional variations in performance, tend to operate more optimally. Notably, DDPG is a centralized energy management system, and one might expect it to outperform the distributed MAAC control system. However, the catch is that the centralized system demands more data, a larger neural network architecture, and an extended training period to achieve satisfactory results. In contrast, the proposed MADRL functions efficiently, requiring less data and a more streamlined neural network for each agent, coupled with a shorter training period. It is worth noting that both DDPG and MADRL utilized identical neural networks and training hyperparameters to illustrate the effectiveness of the MADRL-based energy management system.

To distinctly compare the performance of DDPG and MADRL, Fig. 5 is presented. This figure displays the hourly average total energy supplied to the greenhouses from the grid for the four seasons. A clear trend emerges from this information: the MADRL controller consistently requires less energy from the grid than both the RBC and DDPG. This underscores the capability of MADRL to learn effectively from limited data, ensuring grid stability without compromising the operational energy needs of each greenhouse. Consistent with the trends highlighted in both the greenhouse energy consumption (Fig. 3) and the grid energy supply per greenhouse (Fig. 4), the summer season shows a reduced total grid energy supply compared to other seasons. The rationale for this trend aligns with the reasons previously mentioned.

Finally, Table 1 provides an overview of the average energy demand achieved by the three controllers over the 7-day period for all the seasons.

Fig. 3. Hourly average greenhouse energy consumption for each location and season, namely winter, spring, summer, and fall in NYC. The seasons correspond to the
first week in January, April, July, and October, respectively.


Fig. 4. Hourly average grid energy supply for each location in NYC for the four (4) seasons, namely winter, spring, summer, and fall. The seasons correspond to the
first week in January, April, July, and October, respectively.

The data demonstrate that, with the exception of the spring season, the proposed MADRL consistently consumes less energy on average than both DDPG and RBC. This highlights the robustness, scalability, and rapid learning capacity of the MADRL system. A potential explanation for the spring anomaly is that the DDPG controller might have adeptly adapted to the unique challenges of the spring season, resulting in superior performance during that period. However, as indicated in the table, the performance of DDPG is inconsistent across the other seasons. Despite this variability, it is worth noting that the DDPG controller, due to its inherent optimality and centralized structure, is anticipated to outperform the other controllers, as mentioned earlier.

5. Conclusion

In this study, we proposed a multi-agent deep reinforcement learning framework tailored for interconnected greenhouses that leverage renewable energy sources, notably solar power. This approach adeptly navigates the intricacies of variable renewable energy generation and interacts seamlessly with a dynamic electricity tariff grid. By incorporating an actor-critic algorithm with a shared attention mechanism, our strategy showcases scalability, quick learning capability, and commendable performance. Using a New York City case study involving greenhouses in each of its boroughs – Brooklyn, Manhattan, Queens, Staten Island, and the Bronx – we demonstrated the efficacy of the proposed MADRL over the well-known RBC and DDPG algorithms across all seasons. The data reveal that the MADRL system can consistently curtail grid energy demand by at least 28% compared to RBC. On the other hand, the performance of DDPG tends to waver in relation to RBC, possibly due to its extensive data requirements during training. This study signifies a promising step towards efficient and sustainable energy management in urban greenhouses. A possible future direction of this research is to consider water regulation within each greenhouse to simultaneously tackle food production and water and energy consumption in urban agriculture. The proposed framework can be further extended to consider other forms of renewable energy, such as wind and geothermal energy, and to make more informed decisions within the greenhouse by considering its overall energy needs.

CRediT authorship contribution statement

Akshay Ajagekar: Writing – review & editing, Writing – original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Data curation. Benjamin Decardi-Nelson: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Formal analysis. Fengqi You: Writing – review & editing, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 5. Hourly average total grid energy supply to the greenhouses in NYC for the four (4) seasons, namely winter, spring, summer, and fall. The seasons correspond to the first week in January, April, July, and October, respectively.

Table 1
Average total energy demand from the grid for one week of operation in each of the four (4) seasons, namely winter, spring, summer, and fall. The seasons correspond to the first week in January, April, July, and October, respectively.

Controller   Average total grid energy supply (kWh)
             Winter    Spring    Summer    Fall
RBC          381.35    212.03    60.61     112.56
DDPG         231.05    91.88     129.37    164.61
MADRL        224.58    147.16    43.35     62.41

Data availability

Data will be made available on request.

Acknowledgment

The authors acknowledge support from the Cornell Institute for Digital Agriculture (CIDA) for resources utilized in this research, and partial support from the Specialty Crop Research Initiative [Award No. 2022-51181-38324] from the USDA National Institute of Food and Agriculture. The authors also acknowledge the contributions of Jiahan Xie to the initial conceptualization of the proposed methodology. B.D.-N. acknowledges the partial support from Schmidt Futures via an Eric and Wendy Schmidt AI in Science Postdoctoral Fellowship to Cornell University.

References

[1] Béné C, et al. Feeding 9 billion by 2050 – putting fish back on the menu. Food Secur 2015;7:261–74.
[2] Shamshiri RR, et al. Advances in greenhouse automation and controlled environment agriculture: a transition to plant factories and urban agriculture. 2018.
[3] Chen W-H, Mattson NS, You F. Intelligent control and energy optimization in controlled environment agriculture via nonlinear model predictive control of semi-closed greenhouse. Appl Energy 2022;320:119334. https://doi.org/10.1016/j.apenergy.2022.119334.
[4] Engler N, Krarti M. Review of energy efficiency in controlled environment agriculture. Renew Sustain Energy Rev 2021;141:110786. https://doi.org/10.1016/j.rser.2021.110786.
[5] van Beveren PJM, Bontsema J, van Straten G, van Henten EJ. Optimal control of greenhouse climate using minimal energy and grower defined bounds. Appl Energy 2015;159:509–19. https://doi.org/10.1016/j.apenergy.2015.09.012.
[6] Ajagekar A, Mattson NS, You F. Energy-efficient AI-based control of semi-closed greenhouses leveraging robust optimization in deep reinforcement learning. Adv Appl Energy 2023;9:100119. https://doi.org/10.1016/j.adapen.2022.100119.
[7] Chalabi ZS, Bailey BJ, Wilkinson DJ. A real-time optimal control algorithm for greenhouse heating. Comput Electron Agric 1996;15(1):1–13. https://doi.org/10.1016/0168-1699(95)00053-4.
[8] Chen J, Xu F, Tan D, Shen Z, Zhang L, Ai Q. A control method for agricultural greenhouses heating based on computational fluid dynamics and energy prediction model. Appl Energy 2015;141:106–18. https://doi.org/10.1016/j.apenergy.2014.12.026.
[9] Hu G, You F. Renewable energy-powered semi-closed greenhouse for sustainable crop production using model predictive control and machine learning for energy management. Renew Sustain Energy Rev 2022;168:112790. https://doi.org/10.1016/j.rser.2022.112790.
[10] Esen M, Yuksel T. Experimental evaluation of using various renewable energy sources for heating a greenhouse. Energ Buildings 2013;65:340–51. https://doi.org/10.1016/j.enbuild.2013.06.018.
[11] Singh D, Basu C, Meinhardt-Wollweber M, Roth B. LEDs for energy efficient greenhouse lighting. Renew Sustain Energy Rev 2015;49:139–47. https://doi.org/10.1016/j.rser.2015.04.117.
[12] Ouammi A, Achour Y, Zejli D, Dagdougui H. Supervisory model predictive control for optimal energy management of networked smart greenhouses integrated microgrid. IEEE Trans Autom Sci Eng 2020;17(1):117–28. https://doi.org/10.1109/TASE.2019.2910756.
[13] Yang S, Gao HO, You F. Model predictive control in phase-change-material-wallboard-enhanced building energy management considering electricity price dynamics. Appl Energy 2022;326:120023. https://doi.org/10.1016/j.apenergy.2022.120023.
[14] Babonneau F, Caramanis M, Haurie A. A linear programming model for power distribution with demand response and variable renewable energy. Appl Energy 2016;181:83–95. https://doi.org/10.1016/j.apenergy.2016.08.028.
[15] Farrokhifar M, Bahmani H, Faridpak B, Safari A, Pozo D, Aiello M. Model predictive control for demand side management in buildings: a survey. Sustain Cities Soc 2021;75:103381. https://doi.org/10.1016/j.scs.2021.103381.
[16] Garifi K, Baker K, Touri B, Christensen D. Stochastic model predictive control for demand response in a home energy management system. In: 2018 IEEE power & energy society general meeting (PESGM). IEEE; 2018. p. 1–5.
[17] Chen Z, Wu L, Fu Y. Real-time price-based demand response management for residential appliances via stochastic optimization and robust optimization. IEEE Trans Smart Grid 2012;3(4):1822–31. https://doi.org/10.1109/TSG.2012.2212729.
[18] Ebrahimi J, Abedini M. A two-stage framework for demand-side management and energy savings of various buildings in multi smart grid using robust optimization algorithms. J Build Eng 2022;53:104486. https://doi.org/10.1016/j.jobe.2022.104486.
[19] Vázquez-Canteli JR, Nagy Z. Reinforcement learning for demand response: a review of algorithms and modeling techniques. Appl Energy 2019;235:1072–89. https://doi.org/10.1016/j.apenergy.2018.11.002.
[20] Shin J, Badgwell TA, Liu K-H, Lee JH. Reinforcement learning – overview of recent progress and implications for process control. Comput Chem Eng 2019;127:282–94.
[21] Lu R, Hong SH. Incentive-based demand response for smart grid with reinforcement learning and deep neural network. Appl Energy 2019;236:937–49. https://doi.org/10.1016/j.apenergy.2018.12.061.
[22] Bahrami S, Chen YC, Wong VWS. Deep reinforcement learning for demand response in distribution networks. IEEE Trans Smart Grid 2021;12(2):1496–506. https://doi.org/10.1109/TSG.2020.3037066.
[23] Hao J, et al. Exploration in deep reinforcement learning: from single-agent to multiagent domain. IEEE Trans Neural Netw Learn Syst 2023:1–21. https://doi.org/10.1109/TNNLS.2023.3236361.
[24] Lu R, Li Y-C, Li Y, Jiang J, Ding Y. Multi-agent deep reinforcement learning based demand response for discrete manufacturing systems energy management. Appl Energy 2020;276:115473. https://doi.org/10.1016/j.apenergy.2020.115473.

[25] Xie J, Ajagekar A, You F. Multi-agent attention-based deep reinforcement learning for demand response in grid-responsive buildings. Appl Energy 2023;342:121162. https://doi.org/10.1016/j.apenergy.2023.121162.
[26] Wong A, Bäck T, Kononova AV, Plaat A. Deep multiagent reinforcement learning: challenges and directions. Artif Intell Rev 2023;56(6):5023–56.
[27] Hernandez-Leal P, Kartal B, Taylor ME. A survey and critique of multiagent deep reinforcement learning. Auton Agent Multi-Agent Syst 2019;33(6):750–97.
[28] Wheeler TR, Craufurd PQ, Ellis RH, Porter JR, Vara Prasad PV. Temperature variability and the yield of annual crops. Agric Ecosyst Environ 2000;82(1):159–67. https://doi.org/10.1016/S0167-8809(00)00224-3.
[29] Amani M, Foroushani S, Sultan M, Bahrami M. Comprehensive review on dehumidification strategies for agricultural greenhouse applications. Appl Therm Eng 2020;181:115979. https://doi.org/10.1016/j.applthermaleng.2020.115979.
[30] Yang X, Short TH, Fox RD, Bauerle WL. Transpiration, leaf temperature and stomatal resistance of a greenhouse cucumber crop. Agric For Meteorol 1990;51(3):197–209. https://doi.org/10.1016/0168-1923(90)90108-I.
[31] Mortensen LM. Review: CO2 enrichment in greenhouses. Crop responses. Sci Hortic 1987;33(1):1–25. https://doi.org/10.1016/0304-4238(87)90028-8.
[32] Vanthoor BHE, de Visser PHB, Stanghellini C, van Henten EJ. A methodology for model-based greenhouse design: part 2, description and validation of a tomato yield model. Biosyst Eng 2011;110(4):378–95. https://doi.org/10.1016/j.biosystemseng.2011.08.005.
[33] Katzin D, van Mourik S, Kempkes F, van Henten EJ. GreenLight – an open source model for greenhouses with supplemental lighting: evaluation of heat requirements under LED and HPS lamps. Biosyst Eng 2020;194:61–81. https://doi.org/10.1016/j.biosystemseng.2020.03.010.
[34] Mortensen LM, Strømme E. Effects of light quality on some greenhouse crops. Sci Hortic 1987;33(1):27–36. https://doi.org/10.1016/0304-4238(87)90029-X.
[35] Bantis F, Koukounaras A. Impact of light on horticultural crops, vol. 13. MDPI; 2023. p. 828.
[36] Niu J, Tian Z, Lu Y, Zhao H. Flexible dispatch of a building energy system using building thermal storage and battery energy storage. Appl Energy 2019;243:274–87. https://doi.org/10.1016/j.apenergy.2019.03.187.
[37] Rahimi-Eichi H, Chow M-Y. Adaptive parameter identification and state-of-charge estimation of lithium-ion batteries. In: IECON 2012 – 38th annual conference on IEEE industrial electronics society. IEEE; 2012. p. 4012–7.
[38] Vázquez-Canteli JR, Dey S, Henze G, Nagy Z. CityLearn: standardizing research in multi-agent reinforcement learning for demand response and urban energy management. arXiv preprint arXiv:2012.10504; 2020.
[39] Rezaei E, Dagdougui H, Ojand K. Hierarchical distributed energy management framework for multiple greenhouses considering demand response. IEEE Trans Sustain Energy 2023;14(1):453–64. https://doi.org/10.1109/TSTE.2022.3215686.
[40] Zhang Z, Zhang D, Qiu RC. Deep reinforcement learning for power system applications: an overview. CSEE J Power Energy Syst 2019;6(1):213–25.
[41] Littman ML. Markov games as a framework for multi-agent reinforcement learning. In: Machine learning proceedings 1994. Elsevier; 1994. p. 157–63.
[42] Foerster J, Assael IA, De Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. Adv Neural Inf Proces Syst 2016;29.
[43] Vaswani A, et al. Attention is all you need. In: Advances in neural information processing systems, vol. 30; 2017.
[44] Haarnoja T, Zhou A, Abbeel P, Levine S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: International conference on machine learning. PMLR; 2018. p. 1861–70.
[45] Virtanen P, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020;17(3):261–72.
[46] Brockman G, et al. OpenAI Gym. arXiv preprint arXiv:1606.01540; 2016.
[47] Sengupta M, Xie Y, Lopez A, Habte A, Maclaurin G, Shelby J. The national solar radiation data base (NSRDB). Renew Sustain Energy Rev 2018;89:51–60.
[48] Chen W, You F. Smart greenhouse control under harsh climate conditions based on data-driven robust model predictive control with principal component analysis and kernel density estimation. J Process Control 2021;107:103–13. https://doi.org/10.1016/j.jprocont.2021.10.004.
[49] Yang S, Gao HO, You F. Model predictive control for demand- and market-responsive building energy management by leveraging active latent heat storage. Appl Energy 2022;327:120054. https://doi.org/10.1016/j.apenergy.2022.120054.
[50] Chen W-H, You F. Semiclosed greenhouse climate control under uncertainty via machine learning and data-driven robust model predictive control. IEEE Trans Control Syst Technol 2022;30(3):1186–97. https://doi.org/10.1109/TCST.2021.3094999.
