IPSN 2022

DRLIC: Deep Reinforcement Learning for Irrigation Control

Xianzhong Ding, Wan Du
University of California, Merced
[email protected], [email protected]

ABSTRACT
Agricultural irrigation is a major consumer of freshwater. Current irrigation systems used in the field are not efficient, since they are mainly based on soil moisture sensors' measurements and growers' experience, but not future soil moisture loss. It is hard to predict soil moisture loss, as it depends on a variety of factors, such as soil texture, weather and plants' characteristics. To improve irrigation efficiency, this paper presents DRLIC, a deep reinforcement learning (DRL)-based irrigation system. DRLIC uses a neural network (called the DRL control agent) to learn an optimal control policy that takes both the current soil moisture measurement and the future soil moisture loss into account. We define an irrigation reward function that facilitates the control agent's learning from past experience. Sometimes, our DRL control agent may output an unsafe action (e.g., irrigating too much water or too little). To prevent any possible damage to plants' health, we adopt a safe mechanism that leverages a soil moisture predictor to estimate each action's performance. If it is unsafe, we will perform a relatively-conservative action instead. Finally, we develop a real-world irrigation system that is composed of sprinklers, sensing and control nodes, and a wireless network. We deploy DRLIC in our testbed composed of six almond trees. Through a 15-day in-field experiment, we find that DRLIC can save up to 9.52% of water over a widely-used irrigation scheme.

1 INTRODUCTION
Agriculture is a major consumer of ground and surface water in the United States, accounting for approximately 80% of the Nation's consumptive water use and over 90% in many Western states¹. California's 2019 almond acreage is estimated at 1,530,000 acres, and almond irrigation is estimated to consume roughly 195.26 billion gallons per year [4, 5]. With a historic drought afflicting the Western states, it is imperative to improve irrigation efficiency to save our limited freshwater reserve. This work is focused on the irrigation efficiency of almond orchards.

The primary goal of agricultural irrigation is to guarantee the trees' health and maximize production. To do so, the trees' soil moisture should be maintained within a range between the Field Capacity (FC) level and the Management Allowable Depletion (MAD) level. If the soil moisture is lower than the MAD level, the almond trees will turn brown or even die. If the soil moisture is higher than the FC level, excess water in the soil will reduce the movement of oxygen, impacting the ability of the tree to take in water and nutrients. Both FC and MAD levels can be determined by the type of plants and soil. For a specific orchard, we need to know the soil type. We can then find the FC and MAD levels for a specific soil type by referring to a manual [6].

To maintain the soil moisture between the MAD and FC range, the sprinklers need to be opened every day or every several days, depending on the soil moisture change. Due to the high evaporation loss in California, daily irrigation is recommended by the Almond Board of California [6] and used in some existing irrigation systems [7]. Current micro-sprinkler irrigation systems normally irrigate plants at night, since irrigating in the day causes higher evaporative water loss (14-19%) [8]. Therefore, the irrigation scheduling problem is to decide the irrigation water volume for each sprinkler to guarantee that the soil moisture will still be within the MAD and FC range at the next irrigation time. The decision is based on the current soil moisture level and the predicted soil moisture loss of the next day. The latter is determined by soil type, local weather, and plants' properties (e.g., the root's length and the number of leaves). The irrigation's goal is to irrigate the trees with a proper amount of water, so that the soil moisture will still be above the MAD level at the next irrigation time.

Optimal irrigation control strategies should model the soil moisture loss that will be experienced before the next irrigation time. If we have such a soil moisture prediction model, conventional Model Predictive Control (MPC) methods can be used to decide the optimal amount of water to irrigate. However, the performance of these methods relies highly on the accuracy of the soil moisture prediction model [9, 10]. It is hard to obtain an accurate model for an almond orchard, because the soil moisture is affected by many factors, including soil type, topography and surrounding environment (e.g., ambient temperature, humidity, and solar radiation intensity), and internal transpiration from plants [11]. In addition, customized soil moisture models are required for different orchards, limiting the scalability of MPC-based methods. Due to the above two limitations, MPC-based methods have not been used in orchards.

The irrigation systems currently used in orchards are ET-based or sensor-based control methods. Evapotranspiration (ET) is an estimate of moisture lost from soil, subject to weather factors such as wind, temperature, humidity, and solar irradiance. All these weather factors are measured by weather stations. The local ET value is also publicly available [12] and updated every hour. Based on the ET values since the last irrigation time, ET-based irrigation controllers start the sprinklers to compensate for the soil moisture loss. However, they do not consider the soil moisture loss of the next day before the next irrigation time. If the soil moisture loss in the last day does not equal the soil moisture loss that will happen in the next day, ET-based irrigation may under-irrigate or over-irrigate. In addition, a safe margin of water [13] is normally added, making ET-based methods over-irrigate in most cases [7].

¹ Irrigation Water Use: https://www.ers.usda.gov/

[Figure 1: The various levels of the soil water content [1].]
[Figure 2: How plant production (growth) is affected by soil water content [2].]
[Figure 3: Relationship between available water capacity and soil texture [3].]

With accurate soil moisture sensors, irrigation controllers can react directly to the soil moisture level [7]. The commonly-used controllers are "rule-based", in which a certain amount of water will be supplied once soil moisture deficiency is detected. However, the parameters for the time and the amount to irrigate are generally
tuned by growers by their experience. Without predicting how much water will be lost, sensor-based irrigation normally does not systematically take into account future weather information, such as rain and wind in the next day.

To solve the limitations of the above existing irrigation schemes, we develop DRLIC, a practical Deep Reinforcement Learning (DRL)-based irrigation system, which automatically learns an optimal irrigation control policy by exploring different control actions. In DRLIC, a control agent observes the state of the environment, and chooses an action based on a control policy. After applying the action, the environment transits to the next state and the agent receives a reward for its action. The goal of learning is to maximize the expected cumulative discounted reward. DRLIC's control agent uses a neural network to learn its control policy. The neural network maps "raw" observations to the irrigation decision for the next day. The state includes the weather information (e.g., ET and Precipitation) of today and the next day.

To minimize the irrigation water consumption while not impacting the trees' health, we design a reward function that considers three specific situations. If the soil moisture result is higher than the FC level or lower than the MAD level, we will give the control agent a negative reward. If the soil moisture result is within the MAD and FC range, we will give the control agent a positive reward inversely proportional to the water consumption.

Ideally, DRLIC's control agent should be trained in a real orchard of almond trees. However, due to the long irrigation interval (one day in our case), the control agent can only explore 365 control actions per year. It would take 384 years to train a converged control agent. Therefore, to speed up the training process, we train our control agent in a customized soil-water simulator. The simulator is calibrated by the 2-month soil moisture data of six almond trees and can generate sufficient training data for DRLIC using 10-year weather data.

Working as an irrigation controller in the field, the control agent may meet some states that it has not seen during training, especially for a control agent trained in a simulated environment. In this situation, the control agent may make a poor decision that violates plants' health, i.e., making the soil moisture level lower than the MAD level or higher than the FC level. To handle the gap between the simulated environment and the real orchard, we design a safe irrigation mechanism. If DRLIC's control agent outputs an unwise action, instead of executing that action, we use the ET-based method to generate another action. We use the soil moisture model of our soil-water simulator to verify whether an action is safe or not.

To evaluate the performance of DRLIC, we build an irrigation testbed with micro-sprinklers currently used in almond orchards. Six almond trees are planted in two raised beds. Each tree has a sensing and control node, composed of an independently-controllable micro-sprinkler and a soil moisture measurement set (two sensors deployed at different depths in the soil). Each node can send its sensing data to our server via IEEE 802.15.4 wireless transmission, and receive irrigation commands from the server.

We have deployed our testbed in the field and collected soil moisture data from six sensing and control nodes for more than three months. We use 2-month data to train our soil moisture simulator and 0.5-month data to validate its accuracy. After training DRLIC's control agent, we have deployed the controller in our testbed for 15 days. Experiment results demonstrate that DRLIC can reduce the water usage by 9.52% over the ET-based control method, without damaging the almond tree health.

We summarize the main contributions of this paper as follows:
• We design DRLIC, a DRL-based irrigation method for agricultural water usage saving.
• A set of techniques have been proposed to transform DRLIC into a practical irrigation system, including our customized design of DRL states and reward for optimal irrigation, a validated soil moisture simulator for fast DRL training, and a safe irrigation module.
• We build an irrigation testbed with customized sensing and actuation nodes, and six almond trees.
• Extensive experiments in our testbed show the effectiveness of DRLIC.

2 IRRIGATION PROBLEM
Soil Water Content Parameters. Soil is a plant's water reservoir. Water can fill up to 35% of the space in soil. Soil water content is the amount of water in the soil, which is often measured as a percentage of water by volume (%) or by inches of water per foot of root (in/ft). Soil moisture sensors are used to measure the soil water content (%) at one location in the soil. For a tree with a root of several feet, multiple soil moisture sensors may be deployed at different depths along the root. The root is divided into a certain number of pieces. A soil moisture sensor is deployed at the middle point of each piece. The soil water content of the tree can be calculated as 𝑉 = Σ_{𝑗=1}^{𝑀} 𝜑𝑗 ∗ 𝑑𝑗, where 𝑀 is the number of moisture
sensors installed at different depths (𝑀 is 2 in our experiments); 𝜑𝑗 is the reading measured by the 𝑗th soil moisture sensor; and 𝑑𝑗 is the depth that the 𝑗th moisture sensor covers. If such a set of soil moisture sensors is used to measure the soil water content of a region, it will be deployed under a typical tree that has similar soil water content to most of the trees in the region.

A healthy plant's root must be within a sufficient supply of water. Figure 1 shows two critical levels of soil water content for plants' health [1]. 1) If the soil water content is below the Permanent Wilting Point (PWP), plants cannot suck necessary moisture out of the soil. Keeping soil below the PWP level for an extended period of time will cause plants to wilt or eventually die. 2) If the soil water content of a tree is above Field Capacity (FC), the soil has an over-abundance of water, which will cause water waste and rotting of the root over time (impacting the trees' health). Therefore, the goal of irrigation systems is to maintain soil water content between the PWP level and the FC level.

For fruit trees like almond, production is the major goal of irrigation. To maximize production, we need to maintain the soil water content above the Management Allowable Depletion (MAD) level, instead of the PWP level. Figure 2 depicts the relationship between soil water content and plant production for almond trees [2]. The curve and the MAD level may be different for different fruits. From Figure 2, we can see that the MAD level for almond trees is the median value (50%) between the FC level and the PWP level. Therefore, almond trees can achieve their maximum production, as long as we maintain the soil water content above the MAD level.

How to Determine these Parameters in an Orchard? The soil water content range between the FC level and the PWP level is the Available Water holding Capacity (AWC) of the soil. As shown in Figure 3, different soil types have different AWCs [3]. The soil's AWC may be affected by its texture, presence and abundance of rock fragments, and its depth and layers. The soil's AWC increases as it becomes finer-textured from sands to loam [3], and the soil's AWC decreases as it contains more clay from loam to clay [3]. The AWC of a tree, 𝑉𝑎𝑤𝑐, can be calculated as 𝑉𝑎𝑤𝑐 = 𝜎𝑎𝑤𝑐 ∗ 𝐷𝑓𝑜𝑜𝑡, where 𝜎𝑎𝑤𝑐 is the soil's AWC and 𝐷𝑓𝑜𝑜𝑡 is the tree's root depth in the unit of feet. The AWC for different soil types, 𝜎𝑎𝑤𝑐, can be found in [3].

The PWP level for a soil type, 𝑉𝑝𝑤𝑝, can also be calculated as 𝑉𝑝𝑤𝑝 = 𝜑𝑝𝑤𝑝 ∗ 𝐷𝑖𝑛𝑐ℎ, where 𝜑𝑝𝑤𝑝 is the soil moisture content at the wilting point of that soil type and 𝐷𝑖𝑛𝑐ℎ is the root depth of the plant in the unit of inches. 𝜑𝑝𝑤𝑝 for a specific soil type can be found in [3].

Based on the above two parameters (𝑉𝑎𝑤𝑐 and 𝑉𝑝𝑤𝑝), we can also obtain the FC level as 𝑉𝑓𝑐 = 𝑉𝑎𝑤𝑐 + 𝑉𝑝𝑤𝑝, and the MAD level as 𝑉𝑚𝑎𝑑 = 𝛼 ∗ 𝑉𝑎𝑤𝑐 + 𝑉𝑝𝑤𝑝, where 𝛼 is set to 50% for almond trees.

How to Use these Parameters for Irrigation? The goal of irrigation is to maintain the soil water content of plants between the FC level and the MAD level. To correctly set an irrigation system, we need to know the soil's AWC in the orchard and the PWP level (𝑉𝑎𝑤𝑐 and 𝑉𝑝𝑤𝑝). We can determine these two parameters based on the above method, as long as we know the soil type. If the orchard is large, the soil type varies in space and these two parameters change too. We need to adapt the settings of these two parameters in the irrigation system accordingly.

How Many Valves to Control in an Orchard? Ideally, the sprinkler for each tree should be individually controlled, since the ET of each tree in an orchard varies from 0.12 to 0.20 inches [14]. Moreover, the soil type also varies spatially in an orchard [6]; e.g., there are 10 different soil types, with clay loam soil accounting for 45.6% to 54.7%, and 0 to 8 percent slopes, in a 60-acre orchard of California². However, since there are around 75-125 almond trees in one acre, it is costly to deploy a soil moisture sensor under each tree. Thus, an orchard is normally divided into several irrigation regions based on the similarity of soil texture. A valve is used in each irrigation region to control all the sprinklers. The irrigation problem of a large orchard is to control a number of valves. This paper is focused on irrigation scheduling, but not field partitioning. A simple way to partition an orchard into several irrigation regions is to survey the soil samples across the orchard using an auger. Growers normally conduct the survey for other purposes too, such as planning the density of trees and fertilizing the trees.

² Soil Map: https://casoilresource.lawr.ucdavis.edu/gmap/

[Figure 4: DRLIC System Architecture.]

3 DRLIC SYSTEM DESIGN
In this section, we first give an overview of DRLIC. We model the irrigation problem as a Markov decision process. We design a DRL-based irrigation scheme and a safe irrigation module.

3.1 Overview
Figure 4 shows the system architecture of DRLIC, which is composed of two key components, i.e., a wireless network of sensing and actuation sprinkler nodes, and a DRL-based control algorithm.

For an almond orchard, we install a sensing and actuation node for each irrigation region. One sensing and actuation node is equipped with a set of soil moisture sensors that are deployed at different depths in the soil. Sensing data is transmitted to the base station via an IEEE 802.15.4 network. The Base Station collects the data from DRLIC nodes and sends them to a local server using Wi-Fi. The sensing data collected from all DRLIC nodes create a "snapshot" of the soil moisture readings 𝜑𝑡 across the entire orchard.
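The FC and MAD thresholds that the controller targets can be computed from the Section 2 formulas. The sketch below uses the testbed values reported in Table 1 of the paper (𝜎𝑎𝑤𝑐 = 2.4 in./ft., 𝜑𝑝𝑤𝑝 = 10%, root depth 1.97 ft = 23.62 in., 𝛼 = 50%); the variable and function names are ours, for illustration only.

```python
# Soil water content thresholds from Section 2, using the paper's
# testbed values (Table 1); names are illustrative.
SIGMA_AWC = 2.4    # soil's AWC sigma_awc (in./ft.), from [3]
PHI_PWP = 0.10     # soil moisture at the wilting point phi_pwp (fraction)
D_FOOT = 1.97      # root depth D_foot (feet)
D_INCH = 23.62     # root depth D_inch (inches)
ALPHA = 0.50       # MAD fraction alpha for almond trees

def soil_water_content(phi, d):
    """V = sum_j phi_j * d_j over the M moisture sensors."""
    return sum(p * depth for p, depth in zip(phi, d))

V_awc = SIGMA_AWC * D_FOOT      # available water capacity of the tree
V_pwp = PHI_PWP * D_INCH        # permanent wilting point level
V_fc = V_awc + V_pwp            # field capacity level
V_mad = ALPHA * V_awc + V_pwp   # management allowable depletion level
```

With these values, 𝑉𝑓𝑐 ≈ 7.09 and 𝑉𝑚𝑎𝑑 ≈ 4.73 inches of water, the range the controller tries to stay within.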
On the server, the DRL-based irrigation control agent makes irrigation decisions based on the soil moisture sensors' readings, ET and weather data from local weather stations. It provides the optimal irrigation schedule for all DRLIC nodes. The objective of DRLIC is to minimize the total irrigation water consumption while meeting the requirement of almond health. The server will send the generated irrigation schedules 𝐴𝑡 to all DRLIC nodes. On receiving a command, a node may open its sprinkler by a latching solenoid with two relays. The implementation details of the nodes will be introduced in Section 4.

[Figure 5: Deep Reinforcement Learning in DRLIC.]

Why do we use DRL for irrigation control?
• DRL learns an optimal irrigation control policy directly from data, without using any pre-programmed control rules or explicit assumptions about the soil-water environment.
• DRL allows us to use domain knowledge to train an irrigation control agent (a neural network) without labeled data.
• The generalization ability of the neural network enables the control agent to better handle the dynamically-varying weather and ET data.

3.2 MDP and DRL for Irrigation
We adopt the daily irrigation scheme, i.e., the irrigation starts at 11 PM every day. Each time, the controller decides how long to open each sprinkler to guarantee that the soil water content will still be within the MAD and FC range tomorrow night. The future soil water content is determined by the current soil water content, the irrigated water volume, the trees' water absorption, and soil water loss (caused by runoff, percolation and ET). Such a sequential decision-making problem can be formulated as a Markov Decision Process (MDP), modeled as <S, A, T, R>, where
• S is a finite set of states, which includes the sensed moisture level from the orchard and weather data from the local station.
• A is a finite set of irrigation actions for all control valves.
• T is the state transition function defined as T : S × A → S. The soil water content at the next time step is determined by the current soil water content and the irrigation action.
• R is the reward function defined as S × A → R, which qualifies the performance of a control action.

Based on the above MDP-based irrigation problem formulation, we will find an optimal control policy 𝜋∗(𝑠) : S → A, which maximizes the accumulative reward R. We cannot apply conventional tools (e.g., dynamic programming) to search for the optimal control policy, because the state transition function is hard to analytically characterize. In this paper, we consider an RL-based approach to generating irrigation control algorithms. Unlike previous approaches that use pre-defined rules in heuristic algorithms, our approach will learn an irrigation policy from observations.

DRL is a data-driven learning method. It has been widely applied in many control applications [15–19]. DRL learns an optimal control policy through interacting with the environment. At each time step 𝑡, the control agent selects an action 𝐴𝑡 = 𝑎, given the current state 𝑆𝑡 = 𝑠, based on its policy 𝜋𝜃.

𝑎 ∼ 𝜋𝜃(𝑎|𝑠) = P(𝐴𝑡 = 𝑎 | 𝑆𝑡 = 𝑠; 𝜃)    (1)

In DRL, the control policy is approximated by a neural network parameterized by 𝜃 [20]. When the control agent takes the action 𝑎, a state transition 𝑆𝑡+1 = 𝑠′ occurs based on the system dynamics 𝑓𝜃 (Equation 2), and the control agent receives a reward 𝑅𝑡+1 = 𝑟.

𝑠′ ∼ 𝑓𝜃(𝑠, 𝑎) = P(𝑆𝑡+1 = 𝑠′ | 𝑆𝑡 = 𝑠, 𝐴𝑡 = 𝑎)    (2)

𝜃∗ = argmax_𝜃 E_{𝜋𝜃}[𝑟]    (3)

Due to the Markov property, both reward and state transition depend only on the previous state. DRL then finds a policy 𝜋𝜃 that maximizes the expected reward (Equation 3).

3.3 Deep Reinforcement Learning in DRLIC
Figure 5 summarizes the DRL architecture of DRLIC. The irrigation control policy (DRLIC Agent) is derived by training a neural network. The agent takes a set of information as input, including the current soil water content, today's weather data (e.g., ET and precipitation), and the predicted weather data of tomorrow. Based on the input, the agent outputs the best action, i.e., the amount of water to irrigate. At 11 PM the next day, the resulting soil water content is observed and passed back to the agent to calculate a reward. The agent uses the reward to update the parameters of the neural network for better irrigation control performance. Next, we introduce the design of each DRLIC component.

3.3.1 State in DRLIC. The state in our irrigation MDP model contains the information of three parts. (a) Sensed state, which is the soil water content measured by DRLIC nodes. (b) Weather-related state, which includes the current and predicted state variables from the weather station. (c) Time-related state, which is about date information.
Sensed State. The soil water content of each irrigation region, calculated by Equation 6 using the sensor readings 𝜑 from the DRLIC node.
Weather-related State. It is a vector containing the weather information of the current day and the next day: ET (inch), Precipitation (inch), maximum, average and minimum Temperature (°F), maximum, average and minimum Humidity (%), average Solar Radiation (Ly/day), average Wind Speed (mph), Predicted ET by Equation 16 (inch), and forecasted Precipitation (inch) from the local weather station.
Time-related State. Date information, including the month. The soil moisture may vary in different months.

3.3.2 Action in DRLIC. Based on the current state outlined above, our irrigation scheduling is to find the best amount of water to irrigate (inch), which can maintain plant health (or maximize production) with minimum water consumption. The action is a vector that contains the water amount to irrigate for each irrigation region in an orchard. When the agent outputs an action, we will convert the amount of irrigation water to the valve-open time duration (td).
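In code, this action-to-schedule conversion might look like the following sketch (the helper name is ours; the 0.018 inch/min rate is the testbed micro-sprinkler specification given in the paper):

```python
IRRIGATION_RATE = 0.018  # inch/min, per the testbed micro-sprinkler specification

def action_to_durations(action_inches, rate=IRRIGATION_RATE):
    """Convert the agent's per-region water amounts a_i (inches)
    into valve-open durations td_i = a_i / I (minutes)."""
    return [a / rate for a in action_inches]
```

For example, an action of 0.36 inch for a region maps to a 20-minute valve opening.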
The open time duration 𝑡𝑑𝑖 for the i𝑡ℎ micro-sprinkler is then calculated as 𝑡𝑑𝑖 = 𝑎𝑖/𝐼, where 𝐼 is the irrigation rate. We set 𝐼 to 0.018 inch/min according to the specifications of the micro-sprinklers used in our testbed.

3.3.3 Reward in DRLIC. We define the reward function to express our objective of achieving good plant health with minimum water consumption. Both plant health and water consumption should be incorporated in the reward function. As we know from Section 2, to achieve the maximum production of almond trees, we need to maintain the soil water content between the MAD level and the FC level. We use the soil water content deviation from these two levels as a proxy for plant health.

To minimize water consumption while not affecting the plant health, we consider three situations in the design of the reward, as shown in Equation 5. First, when the soil water content (𝑉𝑖) for the i𝑡ℎ irrigation region is higher than the FC (𝑉𝑓𝑐) level, the irrigated water is more than the plants' need. In this case, the plants' health is affected by over-irrigated water, and water consumption is too high. Second, when 𝑉𝑖 is between 𝑉𝑓𝑐 and 𝑉𝑚𝑎𝑑, the plants are in good health. In this case, we strive to maintain the 𝑉𝑖 close to 𝑉𝑚𝑎𝑑 to save water, so we give a reward inversely proportional to the water consumption. Third, when 𝑉𝑖 is lower than 𝑉𝑚𝑎𝑑, the plants are under water stress. The plants' health is significantly impacted, proportional to the distance between 𝑉𝑖 and 𝑉𝑚𝑎𝑑.

By considering the above three situations, our reward function is defined as follows:

𝑅 = − Σ_{𝑖=1}^{𝑁} 𝑅𝑖    (4)

𝑅𝑖 = 𝜆1 ∗ (𝑉𝑖 − 𝑉𝑓𝑐) + 𝜇1 ∗ 𝑎𝑖,   if 𝑉𝑖 > 𝑉𝑓𝑐
𝑅𝑖 = 𝜇2 ∗ 𝑎𝑖,                     if 𝑉𝑓𝑐 > 𝑉𝑖 > 𝑉𝑚𝑎𝑑    (5)
𝑅𝑖 = 𝜆3 ∗ (𝑉𝑚𝑎𝑑 − 𝑉𝑖) + 𝜇3 ∗ 𝑎𝑖,  if 𝑉𝑖 < 𝑉𝑚𝑎𝑑

𝑉 = Σ_{𝑗=1}^{𝑀} 𝜑𝑗 ∗ 𝑑𝑗    (6)

𝑉𝑚𝑎𝑑 = 𝛼 ∗ 𝑉𝑎𝑤𝑐 + 𝑉𝑝𝑤𝑝    (7)

𝑉𝑓𝑐 = 𝑉𝑎𝑤𝑐 + 𝑉𝑝𝑤𝑝    (8)

𝑉𝑝𝑤𝑝 = 𝜑𝑝𝑤𝑝 ∗ 𝐷𝑖𝑛𝑐ℎ    (9)

𝑉𝑎𝑤𝑐 = 𝜎𝑎𝑤𝑐 ∗ 𝐷𝑓𝑜𝑜𝑡    (10)

where 𝑁 is the number of irrigation regions in one orchard, and 𝑎𝑖 is the amount of water from the RL agent. 𝜎𝑎𝑤𝑐 and 𝜑𝑝𝑤𝑝 are set by referring to the manual of the California Almond Board [6] based on the specific soil type in our testbed. Equations 6, 7, 8, 9 and 10 have been introduced in Section 2.

In our current implementation, the parameters of our reward function are set to the values shown in Table 1, based on the specifications of our testbed. The parameters in Equation 5 (i.e., 𝜆1, 𝜇1, 𝜇2, 𝜆3 and 𝜇3) are set to the values that provide the best rewards during training. Their values are set by grid search, which will be introduced in detail in Section 5. The values of these parameters in Table 1 conform to our design goal of the reward function. First, when 𝑉𝑖 is larger than 𝑉𝑓𝑐, we give penalties due to both plants' health and water consumption (𝜆1 = 3, but 𝜇1 = 8). Second, when 𝑉𝑖 is lower than 𝑉𝑚𝑎𝑑, we give a higher penalty due to plants' health (𝜆3 = 10, but 𝜇3 = 1).

Table 1: Parameter Setting in Reward.
Parameter   Value     Parameter         Value
𝜆1          3         𝛼                 50 (%)
𝜇1          8         𝐷𝑖𝑛𝑐ℎ, 𝐷𝑓𝑜𝑜𝑡      23.62 (inches), 1.97 (feet)
𝜇2          3         𝑑                 11.81 (inches)
𝜆3          10        𝜑𝑝𝑤𝑝              10 (%)
𝜇3          1         𝜎𝑎𝑤𝑐              2.4 (in./ft.)

3.4 DRLIC Training
3.4.1 Policy Gradient Optimization. In the above DRL framework, a variety of policy gradient algorithms can be used to train the irrigation control agent. Policy gradient algorithms achieve the objective in Equation 3 by computing an estimate of the policy gradient and optimizing the objective through stochastic gradient ascent (Equation 11).

𝜃 ← 𝜃 + 𝛼 ▽𝜃 E_{𝜋𝜃}[𝑟]    (11)

In this work, we use proximal policy optimization (PPO) [21], which has been successfully applied in many applications such as navigation [22] and games [23]. PPO is known to be stable and robust to hyperparameters and network architectures [21].

PPO minimizes the loss function in Equation 12, which is equivalent to maximizing the Monte Carlo estimate of rewards with regularization. The advantage function 𝐴̂𝑡 given by Equation 13 is used to estimate the relative benefit of taking an action from a given state.

𝐿𝑃𝑃𝑂(𝜃) = −Ê𝑡[min(𝑤𝑡(𝜃)𝐴̂𝑡, clip(𝑤𝑡(𝜃), 1 − 𝜖, 1 + 𝜖)𝐴̂𝑡)]    (12)

𝐴̂𝑡 = Σ_{𝑖=0}^{∞} 𝛾^𝑖 𝑟_{𝑡+𝑖}    (13)

𝑤𝑡(𝜃) = 𝜋𝜃(𝑎𝑡|𝑠𝑡) / 𝜋𝜃𝑜𝑙𝑑(𝑎𝑡|𝑠𝑡)    (14)

In Equation 14, 𝜋𝜃(𝑎𝑡|𝑠𝑡) is the policy being updated with the loss function and 𝜋𝜃𝑜𝑙𝑑(𝑎𝑡|𝑠𝑡) is the policy that was used to collect data through environment interaction. As the data collection policy differs from the policy being updated, it introduces a distribution shift. The ratio 𝑤𝑡(𝜃) corrects for this drift using importance sampling. The ratio of two probabilities can blow up to large numbers and destabilize training, so the ratio is clipped with 𝜖.

3.4.2 Data Collection and Preprocessing. On day 𝑡, the DRLIC agent observes a state 𝑠𝑡 (e.g., moisture level), and then chooses an action 𝑎𝑡 (water amount). After applying the action, the soil-water environment's state transits to 𝑠𝑡+1 the next day and the agent receives a reward 𝑟𝑡. After that, a data pair (𝑠𝑡, 𝑎𝑡, 𝑟𝑡, 𝑠𝑡+1) can be collected. We conduct data normalization by subtracting the mean of the states/actions and dividing by the standard deviation. We use 10-year weather data (2010-2020) to generate the data pairs in our dataset, which will be used to train our DRLIC agent.

3.4.3 Training Process. Ideally, DRLIC's control agent should be trained in an orchard of almond trees. However, a well-trained DRL agent needs 384 years to converge due to the long control interval of irrigation systems. It is impossible to train the DRLIC agent in an orchard. A feasible solution is to resort to a high-fidelity simulator.
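The clipped objective of Equations 12-14, which the training algorithm below optimizes, can be made concrete with a small numeric sketch (plain Python, single-sample scalar case; the function names are ours and this is an illustration, not the paper's implementation):

```python
def discounted_return(rewards, gamma=0.99):
    """Monte Carlo advantage estimate of Equation 13: sum_i gamma^i * r_{t+i}."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

def ppo_loss_term(p_new, p_old, advantage, eps=0.2):
    """One term of the clipped loss in Equation 12, with the importance
    ratio w = pi_theta(a|s) / pi_theta_old(a|s) of Equation 14."""
    w = p_new / p_old                                  # Equation 14
    w_clipped = max(1.0 - eps, min(1.0 + eps, w))      # clip(w, 1-eps, 1+eps)
    return -min(w * advantage, w_clipped * advantage)  # one sample of Equation 12
```

With a positive advantage, the clipping caps how much a single update can profit from raising the action's probability: moving p_new from 0.6 to 0.75 (with p_old = 0.5) leaves the loss at the same clipped value.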
Algorithm 1: DRLIC Training Algorithm
Input: State s, Action a, Reward r, an initialized policy π_θ
Output: A trained irrigation control agent
1: for i = 0, ..., #Episodes do
2:     State ← Soil-water environment
3:     θ_old ← θ
4:     for t = 0, ..., Steps do
5:         â_t = π_θ(s_t)
6:         s_{t+1}, r_{t+1} = env.step(â_t)
7:         Compute Â_t
8:     With minibatch of size M:
9:         θ ← θ − α∇_θ L_PPO(θ)

However, there are no such simulators available in the soil-water domain, so we decided to leverage a data-driven simulator to speed up the training process. We employ the soil water content predictor introduced in Section 3.5 as our soil-water simulator. The simulator allows DRLIC to "experience" the weather of 10 years in several minutes.

The training procedure of DRLIC is outlined in Algorithm 1. We train DRLIC using 1000 episodes, defining the length of an episode as 30 days. For each episode, we collect 30 training data pairs (s_t, a_t, r_t, s_{t+1}) under different weather data and leverage Equation 12 to optimize the objective in Equation 3 through stochastic gradient ascent. The training ends once Algorithm 1 converges: at the end of each episode, the total reward obtained is compared with the previous total. If the current episode reward does not change by ±3%, we consider the policy to have converged. If the policy does not converge, the training continues up to a maximum of 1000 training iterations. After the training, we deploy the trained DRLIC agent into the real almond orchard.

When we are given a new environment (e.g., a new orchard), we first collect real-world irrigation data of the new environment with an existing controller (e.g., ET-based control) to build a soil water content predictor that describes the water balance in the root-zone soil. We then leverage the soil water content predictor to speed up the training process, and finally deploy the well-trained DRLIC agent in the new orchard.

3.5 Safe Mechanism for Irrigation
We design a safe mechanism that integrates the RL and ET controllers in a coupled closed loop. Figure 6 illustrates the workflow of the safe mechanism, with the following key elements. (i) Different from the pure RL framework, we introduce a safety moisture condition detector to evaluate whether the RL algorithm outputs a safe action. (ii) If so, the action goes to the RL agent, which is in charge of irrigation control. (iii) Otherwise, we use an ET-based controller to generate an action for that control cycle. (iv) DRLIC returns to the RL agent for future control cycles. We now introduce the soil water content predictor and the safety condition detector.

Figure 6: Reinforcement Learning with Safe Mechanism.

Soil Water Content Predictor. To enable early detection of an unsafe action, we design a soil water content predictor to predict the moisture trend after taking an action. We then design a safe condition detector to detect the almond health penalty p(t). The idea is to detect whether the damage metric for an almond tree is higher than a threshold. If so, the detector will command DRLIC to switch from the RL to the ET-based controller.

Table 2: Coefficients of Predictor for Each Tree.

            c1      c2      c3       b       R^2     NRMSE
    Tree1   0.973   0.288   -0.103   0.003   0.982   0.062
    Tree2   0.937   0.325   -0.121   0.013   0.985   0.071

We design a soil water content predictor to describe the water balance in the root-zone soil. The variations of water storage in the soil are caused by both inflows (irrigation and precipitation) and outflows (evapotranspiration). This leads to the following mathematical expression:

V_{i,t+1} = c1 * V_{i,t} + c2 * (A_{i,t} + P_t) + c3 * E_t + b    (15)

E_t = Γ_c * RA * TD^(1/2) * (T_t + 17.8°C)    (16)

where V_{i,t+1} denotes the predicted moisture level in the root zone of the i-th irrigation region after taking the action from RL, E_t and P_t are the plants' ET and the measured rainfall in time period t, and A_{i,t} is the irrigation amount for the i-th irrigation region. c1, c2 and c3 are coefficients. It is assumed in this work that runoff and water percolation are proportional to the soil moisture level [24–26] in Equation 15. All the coefficients can be determined by means of system identification techniques [27]. All variables are expressed in inches.

The weather data can be obtained from a local weather station. For ET, we adopt the simple calculation model established in [28] and shown in Equation 16, where Γ_c is a crop-specific parameter, RA stands for the extraterrestrial radiation (in the same unit as E_t), TD denotes the annual average daily temperature difference, which can be derived from local meteorological data, and T_t is the average outdoor temperature during the t-th time period.

Safety Condition Detector. We employ the difference between the predicted moisture level and the lower bound as a detector to estimate almond tree damage. As explained in Section 2, MAD is the lower bound. We then use Σ_{i=1}^{N} (V_mad − V_{i,t+1}) as the safety condition detector, where V_{i,t+1} denotes the predicted moisture level of the i-th irrigation region at timestep t+1 and V_mad is the water content lower bound. DRLIC invokes the ET-based controller once the safety condition detector detects a dangerous irrigation action.

Parameter Learning of our Soil Water Content Predictor. We leverage the designed testbed to collect the irrigation amount of the almond trees for 2 months. The ET value for each day is collected from a local weather station [12] and the moisture level for each
tree is collected by the designed DRLIC node. The linear least squares method was then applied to estimate the coefficients. R² is used to quantify the strength of the relationship between the moisture level and the related factors, and the normalized root-mean-square error (NRMSE) is used as a goodness-of-fit measure for the predictors. The results are shown in Table 2: R² close to 1 indicates that irrigation, ET and precipitation have a strong relationship with the soil water content of each tree, and an NRMSE below 0.1 means that the predictor achieves accurate predictions of the soil water content.

4 TESTBED AND HARDWARE

4.1 Testbed and Microsprinkler Description
Figure 7 shows our micro-sprinkler irrigation testbed. The micro-sprinkler irrigation system is installed so that each tree's setup is identical in hardware, micro-sprinkler coverage, etc. The irrigation system measures 290 cm x 160 cm, with micro-sprinklers arranged in a 3x2 grid, each 97 cm from the next. The micro-sprinklers chosen were 1/4", 360° pattern heads by Rainbird, which are currently considered state-of-the-art in micro-sprinkler technology. Six all-in-one young almond trees were planted in the testbed (three for each), with an average height of 2 meters. The soil, 2.7 m³ in volume, was collected from a local orchard; it is a typical loam soil and its plant-available water-holding capacity is 2.4 inches of water per foot.

Figure 7: Testbed and Microsprinkler Irrigation System.

4.2 DRLIC Node Development
The designed DRLIC node in Figure 4 consists of four main parts: sensors, actuator, power supply and transmission module.

Sensors: A node carries several moisture sensors at different depths. The moisture sensors vary in their sensitivity and the volume of soil they measure. A moisture sensor at 12-inch depth provides accurate quantitative soil moisture assessment following the Almond Board Irrigation Improvement Continuum [6]. We assign 2 moisture sensors to each DRLIC node, since the depth of the root zone of the almonds in our testbed is 24 inches.

A key feature of the DRLIC node is the ability to measure the volumetric water content in the surrounding soil. We opted to purchase research-quality Decagon EC-5 sensors (Decagon Devices, http://www.decagon.com/products/soils/), with a reported accuracy of ±3%. Raw sensor readings collected over a period of one day with a high sampling frequency can be seen in Figure 8. The sensors report the dielectric constant of the soil, which is an electrical property highly dependent on the volumetric water content (VWC).

Figure 8: Daily Soil Moisture Readings.

φ (m³/m³) = 9.92 × 10⁻⁴ × raw_reading − 0.45    (17)

The linear calibration function in Equation 17, provided by the sensor manufacturer, is used to convert the raw readings to VWC. The range of φ is between 0% and 100%; φ of saturated soils is generally 40% to 60%, depending on the soil type.

Figure 9: On and off Circuit Diagram for Latching Solenoid. (a) Positive Current Pulse; (b) Negative Current Pulse.

Actuator: The actuator consists of a latching solenoid with two relays. A standard solenoid requires constant power to allow water to flow, making it a poor choice for a battery-powered system: nine-volt all-purpose alkaline batteries can only power a standard 12V DC solenoid for 8 hours. To extend the DRLIC node lifetime, we chose a latching solenoid for micro-sprinkler actuation, requiring only a 25 ms pulse of positive (to open) or negative (to close) voltage. An h-bridge is usually used to produce bi-directional current to control the
latching solenoid [29]. However, a special design is needed to meet the different voltage requirements of the ESP32 and the latching solenoid. In order to control the latching solenoid, we design a circuit using two relays that operates with very little connection overhead. A relay is an electrically operated switch. Figure 9 shows the turn-on and turn-off circuit diagrams for the latching solenoid. When both relays are off, no current goes through the solenoid (S). Initially, both relays are in the normally closed (NC) position. To turn the solenoid on, Relay 1 is switched from NC to normally open (NO) for 25 ms, providing a positive current pulse through the solenoid. The current path, shown in Figure 9(a), is: VCC -> NC1 -> COM1 -> S -> COM2 -> NO2 -> GND. To turn the solenoid off, Relay 2 is switched from NC to NO for 25 ms, de-latching the solenoid to the closed position. The current path, shown in Figure 9(b), is: VCC -> NC2 -> COM2 -> S -> COM1 -> NO1 -> GND. To prevent over-irrigation in the event of a power failure, the power supply module continuously provides power.

Power Supply: The power supply consists of a 5 V, 1.2 W solar panel for energy harvesting and an 18650 Li-ion battery (3.7 V, 3000 mAh) for energy storage. The TP4056 lithium battery charger module comes with circuit protection and prevents battery over-voltage and reverse-polarity connection. All components (1 ESP32, 2 moisture sensors, 2 relays and 1 latching solenoid) are powered by this power supply module. It can provide continuous power to the actuator module to prevent over-irrigation in the event of a power failure.

Transmission Module: Transmission includes an uplink and a downlink. On the uplink path, the moisture sensor readings from the field are sampled by the ESP32, a low-cost, low-power system-on-a-chip (SoC) with Wi-Fi capability, and are then sent from the ESP32 to the base station as input for the optimal control. On the downlink path, the control commands calculated by the DRL agent are routed to all ESP32s to turn the solenoids on or off.

5 IMPLEMENTATION
In this section, we describe in detail the implementation of DRLIC and the tuning of its hyper-parameters.

DRLIC Implementation Details. We implement DRLIC in Python using widely available open-source frameworks, including Pandas, Scikit-learn and NumPy. The control scheme is implemented using the scalable reinforcement learning framework RLlib [30], which supports TensorFlow, TensorFlow Eager, and PyTorch. RLlib provides multiple ways for us to customize the training process: target environment modeling, neural network modeling, action set building and distribution, and optimal policy learning. The 10-year weather data (2010-2020) are collected for DRLIC, with 9 years used for training and the remaining 1 year used for testing. In our implementation of DRLIC, we use the Adam optimizer for gradient-based optimization with a learning rate of 0.01. The discount factor is 0.99. The neural network model has 2 hidden layers with 256 neurons each. The local server for training and running DRLIC has a 64-bit quad-core Intel Core i5-7400 CPU at 3.00 GHz and runs Ubuntu 18.04.

Training Details and Tuning Hyper-parameters. The performance of the DRLIC agent is sensitive to the hyperparameter values chosen. Unfortunately, there is no simple approach that allows the DRLIC agent to understand whether a specific value for a given parameter would improve the total reward. To address this issue and further increase DRLIC's performance, we leverage a tuning approach to optimize DRLIC's hyperparameters, such as the λ and μ values associated with rewards and penalties, and the learning rates. In particular, we employ grid search, which allows us to specify the range of values to be considered for each hyper-parameter. The grid search process constructs and evaluates our model using every combination of the hyper-parameters. Finally, we employ cross-validation to evaluate each learned model.

6 EVALUATION
In this section, we evaluate the performance of DRLIC in the field, deploying the DRLIC system for 15 days in the real world.

6.1 Experiment Setting
6.1.1 Baseline Strategy. We compare DRLIC to two state-of-the-art irrigation control schemes introduced in Section 7.

ET-Based Irrigation Control [6]. To implement an ET-based controller, we query a local weather station for the previous day's ET loss. To compensate for the loss, we use the sprinkler's irrigation rate provided by its datasheet to calculate how long the system should be activated for irrigation.

Sensor-based Irrigation Control [7]. The sensor-based controller has two thresholds, the lower and upper soil water content levels. The first is set at 4.96 inches, 10% higher than MAD, to avoid the under-irrigation that occurs before the wetting front arrives at the sensor depth. The latter is set to 6.97 inches, 5% below FC, to allow for some rainfall storage. We carefully set these two thresholds based on the soil environment of our testbed.

6.1.2 Performance Metrics. We evaluate the performance of DRLIC and the two baseline systems in terms of two performance metrics.

Quality of Service. Although the irrigation system has no control over solar exposure and soil nutrients, it has direct control over the moisture levels in the soil. For this reason, our primary metric for irrigation quality is the system's ability to maintain soil moisture above the MAD threshold at all times at all of our measured locations. By doing so, we guarantee that the plant has sufficient moisture to be healthy with no production loss. In this paper, we call this the quality of service of the irrigation system.

Water Consumption. As each sprinkler uses a water supply and we directly control the times at which each micro-sprinkler is active, we can monitor the amount of water consumed by the three systems at all times to determine the efficiency of each system. Thus the other metric is water consumption, which we would like to minimize subject to the quality-of-service constraints.

6.1.3 Experiments in our Testbed. We validate the DRLIC system against the baselines in a real-world deployment, in terms of plant health and water consumption, for 15 days. In the case study, we have six almond trees in our testbed, as shown in Figure 7. DRLIC, sensor-based control and ET-based control are used to irrigate the upper, middle and lower two trees separately, since there is no runoff between trees in our testbed. To allow the three irrigation systems to operate independently, every micro-sprinkler is controlled by
a DRLIC node. In this way, the only difference among the three systems is the schedules sent to the nodes.

Figure 10: Daily Soil Water Content of Different Irrigation Methods (15 Days). (a) ET-based Method; (b) Sensor-based Method; (c) DRLIC.

Figure 11: Daily Water Consumption.

Figure 12: Daily Soil Water Content with Safe Mechanism.

6.2 Experiment Results
6.2.1 Quality of Service. Irrigation systems are installed to maintain almond health with no production loss. Figures 10(a), 10(b) and 10(c) show the daily soil water content in the field for ET-based control, sensor-based control and DRLIC. The black horizontal line shows the MAD level; if the soil water content is below this line, tree health will be impacted. We can see that DRLIC and the ET system maintain the soil water content above the MAD threshold during the 15-day deployment and thus meet the requirement of almond health. However, the trees irrigated by the sensor-based method are in an under-irrigation period of 18 hours on four days (days 1, 4, 7 and 9), since the soil water content of the sensor-based method falls below the MAD. The reason is that the moisture level of the previous day is close to, but has not yet reached, the MAD, so the sensor-based method does not irrigate even though the moisture level is on an under-irrigation trend. The DRLIC system irrigates with what the trees need, based on the learned model of the water changes in the soil, and maintains the soil water content close to the MAD level.

All three irrigation systems begin with enough water content on the first day. We see that the soil water contents of the two trees in the ET control system are well above the FC threshold. In our deployment of DRLIC against the ET control strategy, in Figure 10(a) we see that the soil water content for these two trees is different and much higher than the MAD level. This emphasizes the limitations of ET and the core of our work: the irrigated regions don't receive moisture the same way, and most of the time the ET-based controller irrigates more water than the plants need.

6.2.2 Water Consumption. When a decision must be made to switch to a new almond irrigation control system, a primary concern is the efficiency of the proposed system. The system's ability to return its investment based on increased efficiency will often dictate the acceptance of the technology. In addition, the environmental benefits of reduced freshwater consumption are clear and help promote system adoption.

In our experimental setup, the water source provided by each micro-sprinkler is pressure-regulated to the industry standard, 30 psi. Each micro-sprinkler head distributing water uses a clearly-defined amount of water per unit time, as described in the almond irrigation manual [6]. By tracking exactly when each micro-sprinkler is actuated by the system, we can determine very accurately how much water has been consumed.

Figure 11 shows the daily irrigation amount of two trees actuated by ET-based control, sensor-based control and DRLIC in the 15-day deployment experiment. From this figure, we can see that DRLIC saves an average of 9.52% and 3.79% of the water compared with ET-based and sensor-based control, respectively. ET-based control is a centralized control method that irrigates all almond trees without considering their specific needs. Sensor-based control is water-efficient, monitoring the moisture and irrigating when the moisture level is lower than the MAD level; however, its thresholds are site-specific and not optimal. DRLIC can learn optimal irrigation control by interacting with the local weather and the soil-water dynamic environment.
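The under-irrigation episodes above follow directly from the sensor-based baseline's two-threshold "water on demand" logic (Section 6.1.1). A minimal sketch, with the thresholds from the paper but a function structure of our own:

```python
LOWER = 4.96  # inches, 10% above MAD: start irrigating below this level
UPPER = 6.97  # inches, 5% below FC: stop irrigating above this level

def sensor_based_action(soil_water, irrigating):
    """Two-threshold 'water on demand' control: start below LOWER,
    stop above UPPER, otherwise hold the current state (hysteresis)."""
    if soil_water < LOWER:
        return True
    if soil_water > UPPER:
        return False
    return irrigating
```

Because irrigation starts only after the reading actually crosses the lower threshold, a moisture level that is close to, but still above, it triggers nothing — which is exactly the under-irrigation trend that DRLIC avoids by predicting the next interval's moisture loss.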
Figure 13: Daily Soil Water Content (w/o Safe Mechanism).

Table 3: Micro-sprinkler Node Manufacture Cost.

    Component              Price    Component           Price
    Moisture Sensor x 2    $250     ESP32               $6.5
    18650 Li-ion battery   $3       Solar Panel         $4.3
    Latching Solenoid      $4       Switch Relay x 2    $5
    Waterproof Enclosure   $12      Maintenance Fee     $10
    Total                  $294.8

6.3 Effect of our Safe Irrigation Mechanism
In the 15-day deployment, we find that there are two days (Days 2 and 14 in Figure 10(c)) on which DRLIC triggers the ET-control method. This can also be validated from Figure 11: the water consumption of the ET method and DRLIC on days 2 and 14 is the same. We check the weather data to understand the reason and find that the wind speeds on days 2 and 14 are 7.2 and 11.9 mph respectively, much higher than the average of 2.8 mph on the other 13 days.

We now run DRLIC with and without the safe mechanism for a whole growing season in simulation, labeled as Robust-RL and RL-only, respectively. Figures 12 and 13 show the daily soil water content of Robust-RL and RL-only for the same growing season, 2020. From the almond's perspective, Robust-RL maintains health with 0 days below the MAD level, while the RL-only irrigation method has 21 days below the MAD level. The reason is that RL models trained on past weather data "misbehave" on the test weather data. While it may be possible to train on changing weather to obtain a robust policy, no offline training can ever cover all possible weather changes. The RL agent with DRLIC's safe mechanism, however, is robust to weather changes, because the safety condition detector detects the dangerous actions from the RL agent and the ET system takes control.

6.4 Effect of proposed Reward
In this section, we discuss the simulation results of DRLIC with different rewards for a whole growing season (March 1st to October 31st, 246 days).

In order to minimize the water consumption while not affecting plant health, we consider three situations in the reward: 1) the soil water content (V_i) is higher than the FC level (V_fc); 2) V_i is between V_fc and V_mad; 3) V_i is lower than V_mad. Only in the second situation are the plants in good health. To evaluate our reward function, we compare it with a simple reward (DRLIC_MAD) that only maintains V_i above V_mad, as commonly used in the sensor-based method [7]. This reward is defined as R = −Σ_{i=1}^{N} [λ3 · (V_mad − V_i) + μ3 · a_i] for V_i < V_mad. The function gives more penalty to plant health when V_i is lower than V_mad, since plants' health is then significantly impacted. All the parameters are the same as in Section 3.3.3.

Figure 17 shows the water consumption of DRLIC with our proposed reward (DRLIC) and the simple reward (DRLIC_MAD). DRLIC saves 2.04% more water than DRLIC_MAD, as the latter does not consider the case when V_i is higher than V_mad. DRLIC considers two more situations by giving different penalties to plants' health and water consumption. The first case is over-irrigation, where the water consumption is too high; therefore, the penalty for water consumption is higher than that for plant health. In the second case, the plants are in good health, and DRLIC strives to maintain V_i close to V_mad to save more water.

6.5 DRLIC Policy Convergence
Figure 14 shows the RL training process; the policy converges around the 500th training iteration. We define the length of an episode as 30 days. We randomly vary the soil water content of each tree between the FC (7.08 inches) and MAD (4.72 inches) at the beginning of each episode. By doing so, the policy is exposed to different soil water content conditions and learns to avoid water depletion below the MAD level during training. At the beginning of the experiment, the RL policy receives larger negative rewards, as it does not know a valid sequence of actions that maximizes the reward. The policy converges at the 500th training iteration. The whole training (i.e., 1000 training iterations) takes ∼4 hours using a 64-bit quad-core Intel Core i5-7400 CPU at 3.00 GHz.

6.6 Energy Consumption of Sensor Nodes
From a wireless sensor network standpoint, the ability of a system to operate for a long period of time without user intervention is fundamental. DRLIC nodes are no different, especially if they are meant to be put on the ground. For this reason, our hardware and software were designed to consume as little energy as possible. DRLIC nodes were fitted with a latching solenoid, allowing the flow of water to be turned on or off with a short pulse of power rather than a constant supply. For additional energy savings, the radio in each node is duty-cycled, activating for only a 10-second period every 1 minute. We need this high data frequency because the base station must be able to send an off command to DRLIC with one-minute granularity. In our devices, the peripherals that consume significant energy are the two moisture sensors, the solenoid, the two relays and the radio. To meet this energy demand, we design an energy-harvesting mechanism leveraging one 5/6V 1.2W solar panel.

Figure 15 shows the energy consumption of the different sensors. Each moisture sensor sample requires 10 mA for 10 ms, and each flip of the latching solenoid requires 380 mA for 30 ms. The ESP32 radio requires 180 mA for 50 ms when in transmitting mode. A relay requires 250 mA for 20 ms to switch on or off. In our system, to ensure we don't cut power too early, we add a safety band of 50% on the timing of both the sensor and the solenoid, triggering them for 15 ms and 45 ms, respectively. Overall, the solar-harvesting mechanism can meet the daily requirement of all the sensors in a DRLIC node.
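The per-peripheral numbers in Section 6.6 support a back-of-the-envelope daily energy budget. This is our estimate, not the paper's: we assume one sample per sensor per one-minute radio window and a single open/close solenoid cycle per day, and we ignore the MCU's own draw.

```python
def mah(current_ma, duration_s, events_per_day):
    """Charge (mAh) drawn by one peripheral per day."""
    return current_ma * duration_s * events_per_day / 3600.0

radio = mah(180, 10, 24 * 60)           # 10 s transmit window every minute
sensors = 2 * mah(10, 0.015, 24 * 60)   # two sensors, 15 ms with 50% band
solenoid = mah(380, 0.045, 2)           # one open + one close pulse per day
relays = mah(250, 0.020, 2)             # one switch per relay per day
total = radio + sensors + solenoid + relays  # about 720 mAh/day
```

Under these assumptions, the radio's duty-cycled draw (720 mAh/day) dwarfs everything else (well under 1 mAh combined), which is why duty-cycling the radio and latching the solenoid matter far more for node lifetime than the sensing itself.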
Figure 14: Reinforcement Learning Policy Convergence.

Figure 15: Energy Profile for Different Kinds of Sensors.

Figure 16: Battery Charging and Discharging Cycle.

Figure 17: Water Consumption for DRLIC with Different Rewards.

Figure 16 shows two days of the energy charging and discharging process. After a night of discharging, the 18650 battery level starts increasing at 9:15 am on May 3rd. It usually takes 2 hours to fully charge the battery (9:15-11:35 am). The battery level stays at 100% from 11:35 to 18:45; during this period, the energy harvested from the solar panel meets the energy requirement of all sensors in the DRLIC node. The battery then discharges from 100% at 18:45 on May 3rd to 90.7% at 8:45 am on May 4th, and the whole charging and discharging cycle repeats. The lowest battery level is 90% on average. In the two weeks' deployment, we find that even on cloudy days the battery can still be charged, taking one more hour to charge fully.

6.7 Return on Investment
A primary concern when purchasing or upgrading an irrigation control system is the return on investment, i.e., how long it takes to save enough money from reduced water consumption to cover the cost of the new irrigation system. To calculate the return on investment of DRLIC, we take into account the initial investment cost of the DRLIC system and the money saved by the lower water consumption provided by our increased irrigation efficiency.

We first calculate the cost to develop a single DRLIC node. All the components of a DRLIC node can be found in consumer electronics and home improvement stores. Table 3 lists the cost of all components. In total, a DRLIC sensing and actuation node costs $294.8. The largest portion of the budget is the cost of the two soil moisture sensors; we use two expensive soil moisture sensors that provide accurate measurements and a long lifetime.

The factors that most influence the payback of our system are the water price and the water volume saved by DRLIC. The water price varies considerably across irrigation districts and over time. This study assumed 100% ground water usage and availability. Each tree costs $11.3 for irrigation water per month. Based on our experiment results, DRLIC can save 9.52% of the water expense per month, corresponding to $1.08 per tree. Normally, almond orchards have 100 trees per acre, so DRLIC can save $108 per acre per month. Take a 60-acre almond orchard with 10 irrigation regions as an example: each irrigation region is six acres, so DRLIC can save $648 in each irrigation region per month.

In each irrigation region, we need to deploy one DRLIC node, which costs $294.8. The other irrigation components reuse the existing infrastructure, such as the pipelines and the micro-sprinklers under each tree. The cost of upgrading the existing irrigation system with our irrigation control system is thus $294.8 for one irrigation region, while every month our system saves $648 per region. Therefore, it only takes half a month for our irrigation system to return the investment.

7 RELATED WORK
ET-Based Irrigation Control. As weather is a primary water source or sink in an irrigated space, systems have been developed to use weather as an input for control. The simplest of these systems use standard fixed-schedule irrigation, but allow a precipitation sensor to override control to save water during rain. The more complicated systems, now industry standard, use evapotranspiration (ET), an estimate of the amount of water lost to evaporation and plant transpiration, to do efficient water-loss replacement [31]. Some providers boast an average 30% reduction in water consumption, but as with all industry irrigation systems, ET-based systems are limited by centralized control and cannot provide site-specific irrigation, reducing potential system efficiency and quality of control.

Sensor-based Irrigation Control. With the introduction of more accurate and efficient soil moisture sensors, work has been done to create irrigation controllers that react directly to moisture levels in the soil [7]. Moisture sensors buried in the root zone of trees accurately measure the moisture level in the soil and transmit this data to the controller. The controller then adjusts the pre-programmed watering schedule as needed. There are two types of soil-moisture-sensor-based systems. 1) Suspended-cycle irrigation systems use traditional timed controllers and automated watering schedules, with start times and durations; the difference is that the system will stop the next scheduled irrigation cycle when there is enough moisture in the
soil. 2) Water on demand irrigation requires no programming of 2020 Seed Fund Award from CITRIS and the Banatao Institute at
irrigation duration (only start times to water). This type maintains the University of California, and a 2022 Faculty Research Award
two soil moisture thresholds. The lower one to initiate watering, through the Academic Senate Faculty Research Program at the
and the upper one to terminate watering [7]. However, without University of California, Merced.
a model of the way water is lost, these thresholds are usually set
based on experience and are not optimal.
Model-based Irrigation Control. In [29], a mechanistic PDE model of moisture movement within the irrigated space is built. Using this model, an optimal watering schedule can be found that maintains a proper moisture level. However, the PDE model is not updated over time, and future weather is not taken into account. To tackle these two limitations, the same authors further improved the control system in [13]: the PDE model is eschewed in favor of an adaptive approach that uses models trained from sensor data. Long-term and short-term models are developed to describe the runoff relationship between sprinklers as water moves through the soil.

As the authors indicate [13], their system is designed for turf irrigation and is unlikely to provide benefit in shrubbery or tree irrigation. First, turf soil moisture is affected by water runoff on the soil surface and by the overlapping coverage of sprinklers, so the models in [13] focus on capturing the runoff relationship between sprinklers. For tree irrigation, however, there is little runoff because of the spacing between trees; a soil moisture model for tree irrigation instead needs to consider the soil-water relationship at different depths. Second, as shown in [32], the decay of volumetric water content derived from the long-term model of [13] is much quicker than in real-world scenarios, so the controller is bound to irrigate lightly and frequently, which has been found to be inefficient [33].

DRL-based Control. DRL has been applied in many applications, such as network planning [15], cellular data analytics [16], sensor energy management [34], mobile app prediction [35, 36] and building energy optimization [19, 37]. This paper, however, tackles some challenges unique to irrigation control. First, we define an irrigation reward function that considers three cases for tree irrigation, as introduced in Section 3.3.3. Second, to prevent any possible damage to plant health, we adopt a safe mechanism that replaces unwise actions generated by the DRL agent. Third, because DRL is data-inefficient, we leverage a data-driven simulator to speed up the training process.
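The safe mechanism can be illustrated as an action filter applied between the agent and the valves (a sketch under assumed names; the one-step loss/gain model and all constants are hypothetical stand-ins, not the paper's safe irrigation module):

```python
# Sketch of a safety layer that vets DRL irrigation actions before they
# reach the valves. Names, constants, and the one-step moisture model are
# hypothetical stand-ins, not the paper's safe irrigation module: if the
# predicted soil moisture after the agent's action would drop below the
# allowed level (MAD), the action is replaced by the smallest irrigation
# amount that keeps moisture at or above it.

MAD = 0.12            # minimum allowed soil moisture (hypothetical)
LOSS_PER_STEP = 0.03  # predicted per-step moisture loss, e.g. evapotranspiration
GAIN_PER_UNIT = 0.01  # moisture gained per unit of irrigation (hypothetical)

def safe_action(moisture: float, agent_action: float) -> float:
    """Pass the agent's action through, or override it if it is unsafe."""
    predicted = moisture - LOSS_PER_STEP + GAIN_PER_UNIT * agent_action
    if predicted >= MAD:
        return agent_action  # the agent's action keeps the tree safe
    # Smallest irrigation amount that restores the prediction to MAD.
    needed = (MAD - (moisture - LOSS_PER_STEP)) / GAIN_PER_UNIT
    return max(agent_action, needed)

kept = safe_action(0.30, 1.0)        # plenty of moisture: action kept as-is
overridden = safe_action(0.13, 0.0)  # would dip below MAD: raised to ~2 units
```

The filter only ever increases the irrigation amount, so it bounds the damage an untrained or exploring policy can do without otherwise constraining what the agent learns.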
8 CONCLUSIONS
We presented DRLIC, a DRL-based irrigation system that generates optimal irrigation control commands according to the current soil water content, current weather data and forecasted weather information. A set of techniques has been developed, including a customized design of DRL states and reward for optimal irrigation, a validated soil moisture simulator for fast DRL training, and a safe irrigation module. We designed the DRLIC irrigation node and built a testbed of six almond trees. Extensive experiments in the real world and in simulation show the efficiency of the DRLIC system.

9 ACKNOWLEDGMENTS
We would like to thank our anonymous shepherd and reviewers for their constructive comments. We also thank Danny Royer for helping us set up the testbed. This research is partially supported by the National Science Foundation under grant #CCF-2008837, a 2020 Seed Fund Award from CITRIS and the Banatao Institute at the University of California, and a 2022 Faculty Research Award through the Academic Senate Faculty Research Program at the University of California, Merced.

REFERENCES
[1] Field capacity. https://nrcca.cals.cornell.edu/soil/CA2/CA0212.1-3.php.
[2] R Troy Peters, Kefyalew G Desta, and Leigh Nelson. Practical use of soil moisture sensors and their data for irrigation scheduling. 2013.
[3] Soil quality indicators. https://www.nrcs.usda.gov/Internet/FSE_DOCUMENTS/nrcs142p2_053288.pdf.
[4] Julian Fulton, Michael Norton, and Fraser Shilling. Water-indexed benefits and impacts of california almonds. Ecological Indicators, 96:711–717, 2019.
[5] Almond Board of California. Water footprint for almonds, 2018. https://almonds.com/sites/default/files/2020-05/Water_footprint_plus_almonds.pdf.
[6] Almond Board of California. Almond irrigation improvement continuum. https://www.almonds.com/sites/default/files/2020-02/Almond-Irrigation-Improvement-Continuum.pdf.
[7] GL Grabow, IE Ghali, RL Huffman, et al. Water application efficiency and adequacy of et-based and soil moisture-based irrigation controllers for turfgrass irrigation. Journal of Irrigation and Drainage Engineering, 2013.
[8] Yenny Fernanda Urrego-Pereira, Antonio Martínez-Cob, and Jose Cavero. Relevance of sprinkler irrigation time and water losses on maize yield. Agronomy Journal, 2013.
[9] Dilini Delgoda, Hector Malano, Syed K Saleem, and Malka N Halgamuge. Irrigation control based on model predictive control (MPC): Formulation of theory and validation using weather forecast data and AquaCrop model. Environmental Modelling & Software, 2016.
[10] Camilo Lozoya, Carlos Mendoza, Leonardo Mejía, Jesús Quintana, Gilberto Mendoza, Manuel Bustillos, Octavio Arras, and Luis Solís. Model predictive control for closed-loop irrigation. IFAC Proceedings Volumes, 2014.
[11] Bruno Silva Ursulino, Suzana Maria Gico Lima Montenegro, Artur Paiva Coutinho, et al. Modelling soil water dynamics from soil hydraulic parameters estimated by an alternative method in a tropical experimental basin. Water, 2019.
[12] California Department of Water Resources. https://www.cimis.water.ca.gov/.
[13] Daniel A Winkler, Miguel Á Carreira-Perpiñán, and Alberto E Cerpa. Plug-and-play irrigation control at scale. In ACM/IEEE IPSN, 2018.
[14] Haoyu Niu, Dong Wang, and YangQuan Chen. Estimating actual crop evapotranspiration using deep stochastic configuration networks model and uav-based crop coefficients in a pomegranate orchard. In Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping V. International Society for Optics and Photonics, 2020.
[15] Hang Zhu, Varun Gupta, Satyajeet Singh Ahuja, Yuandong Tian, Ying Zhang, and Xin Jin. Network planning with deep reinforcement learning. In ACM SIGCOMM, 2021.
[16] Zhihao Shen, Wan Du, Xi Zhao, and Jianhua Zou. DMM: fast map matching for cellular data. In ACM MobiCom, 2020.
[17] Miaomiao Liu, Xianzhong Ding, and Wan Du. Continuous, real-time object detection on mobile devices without offloading. In IEEE ICDCS, 2020.
[18] Devanshu Kumar, Xianzhong Ding, Wan Du, and Alberto Cerpa. Building sensor fault detection and diagnostic system. In ACM BuildSys, pages 357–360, 2021.
[19] Xianzhong Ding, Wan Du, and Alberto Cerpa. Octopus: Deep reinforcement learning for holistic smart building control. In ACM BuildSys, 2019.
[20] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
[21] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[22] Artem Molchanov, Tao Chen, Wolfgang Hönig, James A Preiss, Nora Ayanian, and Gaurav S Sukhatme. Sim-to-(multi)-real: Transfer of low-level robust control policies to multiple quadrotors. In IEEE IROS, 2019.
[23] Christopher Berner, Greg Brockman, Brooke Chan, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
[24] Su Ki Ooi, Iven Mareels, Nicola Cooley, Greg Dunn, and Gavin Thoms. A systems engineering approach to viticulture on-farm irrigation. IFAC Proceedings Volumes, 2008.
[25] Guotao Cui and Jianting Zhu. Infiltration model based on traveling characteristics of wetting front. Soil Science Society of America Journal, 82(1):45–55, 2018.
[26] Guotao Cui and Jianting Zhu. Prediction of unsaturated flow and water backfill during infiltration in layered soils. Journal of Hydrology, 557:509–521, 2018.
[27] Dilini Delgoda, Syed K Saleem, Hector Malano, and Malka N Halgamuge. Root zone soil moisture prediction models based on system identification: Formulation of the theory and validation using field and aquacrop data. Agricultural Water Management, 2016.
[28] George H Hargreaves and Zohrab A Samani. Reference crop evapotranspiration from temperature. Applied Engineering in Agriculture, 1(2):96–99, 1985.
[29] Daniel A Winkler, Robert Wang, Francois Blanchette, et al. MAGIC: Model-based actuation for ground irrigation control. In ACM/IEEE IPSN, 2016.
[30] Eric Liang, Richard Liaw, Robert Nishihara, et al. RLlib: Abstractions for distributed reinforcement learning. In ICML. PMLR, 2018.
[31] Richard G Allen, Luis S Pereira, Dirk Raes, Martin Smith, et al. Crop evapotranspiration: Guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56. FAO, Rome, 1998.
[32] Akshay Murthy, Curtis Green, Radu Stoleru, Suman Bhunia, Charles Swanson, and Theodora Chaspari. Machine learning-based irrigation control optimization. In ACM BuildSys, 2019.
[33] Light and frequent irrigation. https://www.usga.org/course-care/water-resource-center/our-experts-explain--water/is-it-better-to-irrigate-light-and-frequent-or-deep-and-infreque.html.
[34] Francesco Fraternali, Bharathan Balaji, Dhiman Sengupta, Dezhi Hong, and Rajesh K Gupta. Ember: energy management of batteryless event detection sensors with deep reinforcement learning. In ACM SenSys, 2020.
[35] Zhihao Shen, Kang Yang, Zhao Xi, Jianhua Zou, and Wan Du. DeepAPP: a deep reinforcement learning framework for mobile application usage prediction. IEEE Transactions on Mobile Computing, 2021.
[36] Kang Yang, Xi Zhao, Jianhua Zou, and Wan Du. ATPP: A mobile app prediction system based on deep marked temporal point processes. In IEEE DCOSS, pages 83–91, 2021.
[37] Xianzhong Ding, Wan Du, and Alberto E Cerpa. MB2C: Model-based deep reinforcement learning for multi-zone building control. In ACM BuildSys, 2020.