
remote sensing

Article
Retrieval of TP Concentration from UAV Multispectral Images
Using IOA-ML Models in Small Inland Waterbodies
Wentong Hu, Jie Liu, He Wang, Donghao Miao, Dongguo Shao and Wenquan Gu *

State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University,
Wuhan 430072, China
* Correspondence: [email protected]

Abstract: Total phosphorus (TP) concentration is high in countless small inland waterbodies in Hubei province, middle China, which threatens the water environment. However, there are almost no ground-based water quality monitoring points in small inland waterbodies, because such monitoring is costly in time, labor, and money and does not meet the needs of spatiotemporal dynamic monitoring. Remote sensing provides an effective tool for monitoring TP concentration in space and time. However, monitoring the TP concentration of small inland waterbodies is challenging for satellite remote sensing because of inadequate spatial resolution. Recently, unmanned aerial vehicles (UAVs) have been applied to quantitatively retrieve the spatiotemporal distribution of TP concentration without the challenges of cloud cover and atmospheric effects. Although state-of-the-art TP retrieval algorithms have improved, specific models are only used for specific water quality parameters or regions, and there are currently no robust and reliable TP retrieval models for small inland waterbodies. To address this issue, six machine learning methods optimized by intelligent optimization algorithms (IOA-ML models) were developed to quantitatively retrieve TP concentration from the reflectance of the original bands and selected band combinations of UAV multispectral images. We evaluated the performances of the models in terms of the coefficient of determination (R²), root mean squared error (RMSE), and residual prediction deviation (RPD). The results showed that the R² values of the six IOA-ML models for the training, validation, and test sets were 0.8856–0.984, 0.8054–0.8929, and 0.7462–0.9045, respectively, indicating that the methods had high precision and transferability. Extreme gradient boosting optimized by a genetic algorithm (GA-XGB) performed best, with the highest precision for the validation and test sets. The spatial distributions of TP concentration for each flight derived from the different models had similar characteristics. This paper provides a reference for raising the level of intelligent and automatic water environment monitoring in small inland waterbodies.

Keywords: TP retrieval; IOA-ML models; UAV multispectral images; spatial distribution

Citation: Hu, W.; Liu, J.; Wang, H.; Miao, D.; Shao, D.; Gu, W. Retrieval of TP Concentration from UAV Multispectral Images Using IOA-ML Models in Small Inland Waterbodies. Remote Sens. 2023, 15, 1250. https://doi.org/10.3390/rs15051250

Academic Editors: Francisco Agüera-Vega, Fernando Carvajal-Ramírez and Patricio Martínez-Carricondo

Received: 16 January 2023
Revised: 16 February 2023
Accepted: 21 February 2023
Published: 24 February 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Inland waters are indispensable for agricultural, industrial, and recreational needs, such as aquaculture, transport, and energy production, and as a major source of drinking water and irrigation [1]. The deterioration of water quality has become one of the most important topics in environmental protection and the safe use of water. More than 60% of the world's large lakes (>10 km²) were considered eutrophic in the summer of 2012 [2]. According to a study by the OECD (Organisation for Economic Co-operation and Development), 80% of water eutrophication is attributable to phosphorus, and 10% is directly related to both phosphorus and nitrogen [3]. In China, eutrophication of rivers and lakes has become more severe in the middle reaches of the Yangtze River, where phosphorus is the primary limiting element [4,5]. Therefore, monitoring the spatiotemporal variability of total phosphorus (TP) concentration is of great significance for protecting the water environment.

Remote Sens. 2023, 15, 1250. https://doi.org/10.3390/rs15051250 https://www.mdpi.com/journal/remotesensing



Traditional water quality monitoring methods, mainly based on field sampling, laboratory analysis, or automated instruments, are highly precise [6]. However, these methods are labor-intensive, time-consuming, and costly, and do not meet the needs of spatiotemporal dynamic monitoring of water quality [1,7,8].
Over the last few decades, thanks to rapid growth in technologies and applications, the role of remote sensing in water quality retrieval has increased significantly because of its low cost, full coverage, and fine-scale dynamic characteristics. For many years, satellites equipped with various sensors have been used for water quality assessment [9,10]. However, owing to their long revisit periods, low spatial resolution, and susceptibility to cloud interference, satellites are poorly suited to real-time water quality monitoring of complex environments and small waterbodies, such as small ditches and ponds. Additionally, researchers have shown that small lakes are more vulnerable to eutrophication [11], and that chlorophyll-a (Chl-a) concentrations are inversely related to lake size in the middle and lower reaches of the Yangtze River [12].
Unmanned aerial vehicles (UAVs) have enabled innovative regional monitoring of inland surface water. Because they are flexible and insensitive to cloud interference, they have successfully compensated for the deficiencies of satellites in spatiotemporal resolution [6,7]. UAVs equipped with multiple sensors, especially multispectral sensors, have been used to monitor Chl-a, total suspended solids (TSS), total nitrogen (TN), total phosphorus (TP), chemical oxygen demand (COD), and the permanganate index (CODMn) in complex inland waterbodies [6,13]. For example, Su and Chou [14] used a multispectral sensor mounted on a UAV to map the trophic state of a small reservoir. Wang et al. [15] designed an acquisition scheme for water quality spectral elements suitable for complex aquaculture waterbodies, combining a ground wireless sensor network with UAV spectral remote sensing. Liu et al. [16] constructed inversion models of TP, TSS, and turbidity using a UAV-mounted multispectral sensor and achieved higher accuracy through feature selection.
Although remote sensing can facilitate the monitoring of TP concentration, the methodology involved is complicated because phosphorus is optically inactive and has no distinct spectral characteristics [17]. Therefore, the relationships between TP concentration and surface reflectance are nonlinear and complex [18]. Bands from the visible to the near-infrared have been used to estimate TP concentration [3,19]. Traditional retrieval models of water quality parameters based on statistical regression analysis, including linear regression, polynomial regression, ridge regression, and other methods, have poor inversion accuracy and weak generalization [20]. Machine learning (ML) methods are gaining momentum for water quality retrieval because they can capture the latent relationships between remote sensing images and TP concentration [21,22]. In recent years, many researchers have shown that TP concentration can be estimated by ML methods with UAV multispectral images. For example, TP concentration in urban rivers was monitored by ML methods with UAV multispectral images [6]. Zhang et al. [23] developed a hybrid feedback deep factorization machine model to retrieve the concentration of phosphorus and trace pollution sources in urban rivers. Chang et al. [24] directly explored TP spatiotemporal patterns with the aid of genetic programming models. UAV multispectral data were used to retrieve Chl-a, TN, and TP based on six ML models [25]. Based on spectral and spatial features, Zhou et al. [26] used an ensemble ML model to estimate TP concentration in Shanghai. Although previous studies have used ML methods for TP estimation in different regions, specific models are only used for specific regions, or even only within the range of conditions covered by the training data. In addition, almost all ML methods have limitations, such as complex model hyperparameters. Intelligent optimization algorithms (IOAs) can optimize the hyperparameters of ML methods thanks to their global search and adaptive characteristics, improving robustness and predictability [27]. This paper establishes TP retrieval models that combine the global search ability of IOAs with the high efficiency and flexibility of machine learning (IOA-ML) methods.
In this paper, six IOA-ML models were developed to retrieve TP concentration. Using UAV multispectral images, we aim to propose methods for monitoring small inland waterbodies with high reliability and transferability. The main objectives of this study are (1) to evaluate the TP retrieval performance of six IOA-ML models using paired in situ data and UAV multispectral images divided into training, validation, and test sets, (2) to conduct a statistical analysis of TP concentration at the pixel scale, and (3) to verify the transferability of the developed IOA-ML models. This study supports water quality monitoring in small inland waterbodies and provides technical support for the intelligent management of the water environment.

2. Materials and Methods
2.1. Study Area
In this paper, three typical small inland waterbodies in Hubei province, middle China, were selected as the research areas (Figure 1). Research areas A and B are located in Jingmen city, Hubei province, 12 km apart in a straight line. Research area A is a drainage ditch for crayfish–rice culture, located in Zhanghe town, Dongbao district, and research area B is a small reservoir, located in Tuanlinpu town, Duodao district. The water quality of both is mainly influenced by agriculture and aquaculture. Research area C is composed of six ponds, located in Shuangxiqiao town, Xian'an district, Xianning city. There is a domestic sewage collection and treatment facility at this site; the treated tailwater circulates among these ponds through engineered facilities and is then discharged downstream once it meets the discharge standard. Details of the research areas and sampling information are shown in Figure 1.

Figure 1. Locations and true color images from UAV multispectral images of the three research areas and geographical distribution of the sampling sites and times.

2.2. Data Processing


2.2.1. UAV Data and Preprocessing
The UAV platform used in our study is the DJI M300 RTK, manufactured by DJI Innovations, Shenzhen, Guangdong, China. The DJI M300 RTK is a four-rotor UAV that integrates binocular vision, a flight control system, and an FPV camera, with functions such as six-direction positioning, obstacle avoidance, and precise reshooting. It not only ensures flight safety but also provides functions suitable for battery inspection applications [28]. The parameters of this UAV are shown in Table 1. The multispectral imager mounted on the UAV is the RedEdge MX Dual, manufactured by MicaSense, United States. It acquires ten multispectral bands synchronously: in addition to the standard five bands of the RedEdge MX (first sensor in Table 2), a second five-band sensor is added (second sensor in Table 2). These additional bands make the dual system better suited to water environment monitoring. The ground spatial resolution of the multispectral images is 8 cm/pixel at a flight altitude of 120 m. The ranges of the ten bands are listed in Table 2.

Table 1. Parameters of DJI M300 RTK.

Item | Parameter
Diagonal wheelbase | 895 mm
Empty weight | 6.3 kg
Maximum takeoff weight | 9 kg
No-load endurance | 55 min
Maximum flight / ascending / descending speed | 23 m/s / 6 m/s / 5 m/s
Maximum wind resistance | 15 m/s

Table 2. Band ranges of RedEdge MX Dual.

Sensor | Band 1 | Band 2 | Band 3 | Band 4 | Band 5
RedEdge-MX | Blue475 | Green560 | Red668 | Red edge717 | Nir842
Wavelength range (nm) | 475 ± 16 | 560 ± 13.5 | 668 ± 7 | 717 ± 6 | 842 ± 28.5
RedEdge-MX Blue | Blue444 | Green531 | Red650 | Red edge705 | Red edge740
Wavelength range (nm) | 444 ± 14 | 531 ± 7 | 650 ± 8 | 705 ± 5 | 740 ± 9

There were five flights in total from July 2021 to September 2022; the flight details are shown in Table 3. No ground control points were used because the multispectral imager has an integrated GPS that geotags each image acquired by the UAV. The imager was also equipped with a Downwelling Light Sensor and a Calibrated Reflectance Panel (CRP) for radiometric calibration against ambient light changes during the flight. A picture of the CRP was taken before and after each flight to capture the lighting conditions. The operational altitude of the UAV was 50–200 m, at which atmospheric influence could be ignored. After each flight, the multispectral images were mosaicked and radiometrically corrected. The water surface was extracted using the normalized difference water index (NDWI), calculated by Equation (1), with a threshold of 0.2.

$$\mathrm{NDWI} = \frac{\mathrm{Green560} - \mathrm{Nir842}}{\mathrm{Green560} + \mathrm{Nir842}} \qquad (1)$$

where Green560 is the reflectance of the Green560 band and Nir842 is the reflectance of the Nir842 band.
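The following is a minimal NumPy sketch of the water-surface extraction step. It assumes the two bands are co-registered reflectance arrays of equal shape. It also assumes that pixels with NDWI above the 0.2 threshold count as water, because the text gives the threshold but not the direction of the comparison. The function names `ndwi` and `water_mask` are illustrative and do not come from the authors' workflow.

```python
import numpy as np


def ndwi(green560, nir842):
    """Equation (1): (Green560 - Nir842) / (Green560 + Nir842), NaN where undefined."""
    green = np.asarray(green560, dtype=float)
    nir = np.asarray(nir842, dtype=float)
    denom = green + nir
    out = np.full(green.shape, np.nan)
    # Only divide where the denominator is non-zero (e.g. skip empty no-data pixels).
    np.divide(green - nir, denom, out=out, where=denom != 0)
    return out


def water_mask(green560, nir842, threshold=0.2):
    """True for pixels treated as water (assumed rule: NDWI > threshold)."""
    index = ndwi(green560, nir842)
    # Comparisons with NaN are False, so undefined pixels are never classified as water.
    return index > threshold
```

Applied to the mosaicked Green560 and Nir842 layers, the boolean mask can then be used to set non-water pixels to NaN before any reflectance is extracted.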

Table 3. Flight and sampling information.

Time | Area | Flight height | Resolution | Number of photos | Number of samples
26 July 2021 | Research area A | 50 m | 3.67 cm | 1820 | 24
19 September 2022 | Research area B | 200 m | 14.4 cm | 3210 | 48
10 December 2021 | Research area C | 100 m | 7.7 cm | 6460 | 20 (all in pond 2)
27 May 2022 | Research area C | 200 m | 14.3 cm | 2760 | 10 (4 in pond 2, 2 in pond 3, 1 in pond 4, 1 in pond 5, 2 in pond 6)
27 September 2022 | Research area C | 200 m | 14.4 cm | 2560 | 19 (15 in pond 2, 1 in pond 3, 1 in pond 4, 1 in pond 5, 1 in pond 6)

2.2.2. Field Data and Preprocessing


Water samples were collected in situ (Figure 1), synchronously with the acquisition of the multispectral images. To avoid the influence of mixed pixels, sampling points were located 4–6 m from the banks, except in research area C, where they were evenly distributed over the water surface. A real-time global positioning system was used to record the coordinates of the sampling points. Following the technical guidance for water quality sampling, samples were taken 0.1–0.2 m below the water surface, because the water transparency was around 0.3 m, and 0.5 L of water was collected at each point. The bottles were kept shaded until the TP concentration was analyzed in the laboratory. TP concentration was determined within three days of collection by spectrophotometric analysis after potassium persulfate digestion, following the Chinese national and industry standards. Two or three parallel samples of each water sample were analyzed, and the mean value served as the final TP concentration.
A single pixel can easily be affected by specular reflection and water splash, so it may not reflect the spectral difference induced by an actual change in water quality at the sampling point [6]. To match the UAV multispectral images accurately with the sampling points, this study used a spatial window of 20 × 20 pixels, rather than a single pixel, to extract the reflectance of all bands at each sampling point. The pixel values were extracted with the "Regions of Interest (ROI)" function in ENVI 5.3.
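The window extraction can be sketched in NumPy as below. This does not replicate the ENVI ROI tool. The sketch makes several assumptions:

- The mosaic is a (bands, rows, cols) reflectance array.
- Each sampling point has already been converted to a pixel position.
- The window is approximately centered on that pixel.
- The window statistic is the mean, with non-water pixels set to NaN and ignored.

The text does not state which statistic was used, so the mean is an assumption. `window_reflectance` is a hypothetical helper name.

```python
import numpy as np


def window_reflectance(cube, row, col, size=20):
    """Mean reflectance of every band in a size x size window around (row, col).

    cube: array of shape (bands, rows, cols); non-water pixels may be NaN.
    For an even size the window covers row - size//2 .. row + size//2 - 1.
    """
    half = size // 2
    r0, c0 = max(row - half, 0), max(col - half, 0)   # clip at the image edge
    window = cube[:, r0:row - half + size, c0:col - half + size]
    # nanmean skips masked (NaN) pixels, e.g. land removed by the NDWI mask.
    return np.nanmean(window, axis=(1, 2))
```

Averaging over a window rather than reading one pixel dampens the effect of glint or splash on any single pixel. This is the motivation the text gives for the 20 × 20 window.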

2.3. Model Development


2.3.1. Modeling Sets Construction
In this study, outliers deviating more than three standard deviations from the mean TP concentration were excluded. The 121 paired TP concentration and reflectance values were then divided into training, validation, and test sets. The training and validation sets were used for model training and tuning. The test set was withheld from model building and used only to measure the accuracy and generalization ability of the established models. Twenty sampling points were randomly selected as the test set, and the remaining 101 sampling points were divided into training and validation sets at a ratio of 7:3. The model development process is shown in Figure 2.
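The following sketch shows the dataset construction under several assumptions:

- The outlier filter is a single pass of the three-sigma rule using the population standard deviation.
- A seeded random permutation is used for reproducibility.
- 0.7 × 101 is rounded to 71 training samples.

The seed and the rounding rule are hypothetical choices. The text specifies only the 20-sample test set and the 7:3 ratio.

```python
import numpy as np


def three_sigma_mask(tp):
    """True for samples within three standard deviations of the mean TP (single pass)."""
    tp = np.asarray(tp, dtype=float)
    return np.abs(tp - tp.mean()) <= 3 * tp.std()


def split_indices(n, n_test=20, train_frac=0.7, seed=0):
    """Hold out n_test random samples, then split the rest 7:3 into training/validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    test, rest = order[:n_test], order[n_test:]
    n_train = int(round(train_frac * len(rest)))
    return rest[:n_train], rest[n_train:], test
```

With 121 samples, this yields 71 training, 30 validation, and 20 test indices.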

2.3.2. Feature Selection


To reduce the interference of background information and extract effective spectral information, it is important to try a variety of band combinations. The band ratio is a semiempirical method for retrieving water quality parameters. It has been extensively researched and applied in monitoring inland waterbodies, with promising results [29]. In satellite remote sensing, this method is often used to weaken atmospheric effects. Because the UAV flies low enough for atmospheric effects to be ignored, combinations in this study were not restricted to ratios. Instead, any 2–4 bands were combined into a feature through the four basic arithmetic operations (addition, subtraction, multiplication, and division), producing 10,020 features in total.

Figure 2. Flowchart of model development.

However, some features may be redundant because the information they add is already contained in other features [30]. A good feature subset contains features that are highly correlated with the target variable but uncorrelated with each other. Feature selection helps avoid overfitting and therefore improves model generalization. To better exploit the complex interaction between TP concentration and the reflectance of the multispectral images, correlation-based feature selection (CFS) was used to select the feature subset for the IOA-ML models [31]. The feature subset selected by CFS, together with the original ten bands, was used as the input variables of all IOA-ML models. The steps of CFS are as follows:
Step 1: find the feature with the highest r (Pearson's correlation coefficient) with the target variable; this feature becomes the first variable of the feature subset.
Step 2: select the feature that maximizes Merit_S, calculated by Equation (2), and add it to the feature subset.
Step 3: repeat Step 2 until the value of Merit_S no longer increases.

$$\mathrm{Merit}_S = \frac{k\,\bar{r}_{cf}}{\sqrt{k + k(k-1)\,\bar{r}_{ff}}} \qquad (2)$$

where $\mathrm{Merit}_S$ is the heuristic "merit" of the feature subset S, $k$ is the number of features in the subset, $\bar{r}_{cf}$ is the average correlation between the target variable and the features in the subset, and $\bar{r}_{ff}$ is the average intercorrelation between pairs of features in the subset.
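Steps 1–3 and Equation (2) can be sketched as a greedy forward search. Two assumptions go beyond the text. First, correlations enter as absolute values, so strongly negative correlations also count. Second, ties are broken by taking the first candidate. `cfs_forward` and `merit` are illustrative names.

```python
import numpy as np


def merit(k, r_cf, r_ff):
    """Equation (2): k * r_cf / sqrt(k + k(k - 1) * r_ff)."""
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(X, y):
    """Greedy CFS following Steps 1-3; returns selected column indices of X."""
    n_feat = X.shape[1]
    corr = np.abs(np.corrcoef(np.column_stack([X, y]), rowvar=False))
    r_y = corr[:-1, -1]        # |r| of each feature with the target
    r_xx = corr[:-1, :-1]      # |r| between feature pairs

    def subset_merit(subset):
        k = len(subset)
        r_cf = r_y[subset].mean()
        if k == 1:
            return merit(1, r_cf, 0.0)
        block = r_xx[np.ix_(subset, subset)]
        r_ff = (block.sum() - np.trace(block)) / (k * (k - 1))   # mean off-diagonal |r|
        return merit(k, r_cf, r_ff)

    selected = [int(np.argmax(r_y))]           # Step 1
    best = subset_merit(selected)
    while len(selected) < n_feat:
        candidates = [j for j in range(n_feat) if j not in selected]
        scores = [subset_merit(selected + [j]) for j in candidates]
        i = int(np.argmax(scores))             # Step 2
        if scores[i] <= best:                  # Step 3: stop when merit stops increasing
            break
        selected.append(candidates[i])
        best = scores[i]
    return selected
```

The denominator of Equation (2) grows with feature intercorrelation. As a result, adding a feature that merely duplicates one already in the subset lowers the merit, which is the redundancy argument made above.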
Before model establishment, the feature subset and TP concentration were processed by Z-score normalization to reduce differences in feature ranges:

$$X_i' = \frac{X_i - \bar{X}}{X_{std}} \qquad (3)$$

where $X_i'$ denotes the normalized result; $X_i$ denotes the reflectance values or TP concentration of the training, validation, and test sets; $\bar{X}$ denotes the mean of the training and validation sets; and $X_{std}$ denotes the standard deviation of the training and validation sets.
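Because the mean and standard deviation come from the training and validation sets only, the same statistics must be reused for the test set. They are also needed to convert normalized predictions back to mg/L. Below is a minimal sketch, assuming the population standard deviation (ddof = 0), which the text does not specify. The `ZScore` class is hypothetical.

```python
import numpy as np


class ZScore:
    """Equation (3) with statistics estimated on the training + validation sets."""

    def fit(self, X_fit):
        self.mean_ = np.mean(X_fit, axis=0)
        self.std_ = np.std(X_fit, axis=0)     # population std (ddof=0): an assumption
        return self

    def transform(self, X):
        return (X - self.mean_) / self.std_

    def inverse_transform(self, Z):
        # Maps normalized values (e.g. model outputs) back to the original units.
        return Z * self.std_ + self.mean_
```

In use, one would call `fit` on the stacked training and validation rows, then `transform` the test rows with those same statistics, never refitting on the test set.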

2.3.3. IOA-ML Models


Hyperparameters are parameters set before the learning process, and their values directly affect the performance of ML models [31]. To improve the robustness and predictability of the ML models, IOAs were used to optimize their hyperparameters. Six IOA-ML models were developed:

- support vector regression optimized by particle swarm optimization (PSO-SVR),
- categorical boosting regression optimized by a genetic algorithm (GA-CBR),
- gradient boosting regression optimized by GA (GA-GBR),
- a deep neural network (DNN),
- extreme gradient boosting optimized by GA (GA-XGB), and
- random forest optimized by a grid search algorithm (GS-RF).

The details of the six IOA-ML models are as follows.
SVR can effectively deal with small-sample and nonlinear problems and is a useful ML regression method. The radial basis function was chosen as the kernel of SVR in this paper. The hyperparameters of SVR, including the penalty parameter C, gamma, and epsilon, were optimized by PSO [32].
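The PSO procedure is not described further in the text. As a hedged illustration, below is a minimal global-best PSO. The inertia weight (0.7), acceleration coefficients (1.5), swarm size, iteration count, and clipping at the bounds are all assumptions.

In the paper's setting, each particle might encode (log10 C, log10 gamma, log10 epsilon). The objective would then return the validation RMSE of an RBF-kernel SVR, for instance scikit-learn's `SVR(kernel="rbf", C=..., gamma=..., epsilon=...)`. That wiring and the log-scale encoding are hypothetical. The test uses a simple quadratic in place of the SVR objective.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimization within box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    p_best = pos.copy()                               # each particle's best position
    p_val = np.array([objective(p) for p in pos])
    g_best = p_best[np.argmin(p_val)].copy()          # swarm's best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward the personal best + pull toward the global best.
        vel = w * vel + c1 * r1 * (p_best - pos) + c2 * r2 * (g_best - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([objective(p) for p in pos])
        improved = vals < p_val
        p_best[improved], p_val[improved] = pos[improved], vals[improved]
        g_best = p_best[np.argmin(p_val)].copy()
    return g_best, p_val.min()
```

Searching C, gamma, and epsilon on a log scale is a common design choice, because useful values of these hyperparameters span several orders of magnitude.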
The CBR model is a recent gradient boosting decision tree algorithm that handles categorical features well. It processes categorical features during training rather than during preprocessing and allows the whole dataset to be used for training [33]. The hyperparameters of CBR, including the number of iterations, learning rate, maximum tree depth, and regularization, were optimized by GA in this paper.
The GBR model is based on boosting. At each iteration, the GBR algorithm fits a new learner along the negative gradient of the loss function evaluated at the previous iteration's predictions. Generally, the smaller the loss, the better the model performance. The hyperparameters of GBR, including the learning rate, number of estimators, subsample ratio, and maximum tree depth, were optimized by GA in this paper.
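To make the description of GBR concrete, below is a from-scratch sketch of squared-error gradient boosting. It is a simplification, not the model used in the paper. The base learners are depth-1 trees (stumps) rather than deeper trees, there is no subsampling, and `n_estimators` and `learning_rate` are fixed rather than tuned by GA. For squared loss, the negative gradient is simply the residual, which is what each stump is fitted to.

```python
import numpy as np


def fit_stump(X, r):
    """Depth-1 regression tree: the single threshold split that best fits residuals r."""
    best_sse, best = np.inf, (0, np.inf, r.mean(), r.mean())
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:            # candidate thresholds
            left = X[:, j] <= t
            lm, rm = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lm) ** 2).sum() + ((r[~left] - rm) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, lm, rm)
    return best


def predict_stump(stump, X):
    j, t, lm, rm = stump
    return np.where(X[:, j] <= t, lm, rm)


def gbr_fit(X, y, n_estimators=100, learning_rate=0.1):
    """Squared-loss gradient boosting with stumps as base learners."""
    f0 = y.mean()                                     # initial constant model
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residual = y - pred                           # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(X, residual)
        pred = pred + learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps


def gbr_predict(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(X.shape[0], f0)
    for stump in stumps:
        pred = pred + learning_rate * predict_stump(stump, X)
    return pred
```

The learning rate shrinks each step, so the training loss decreases gradually rather than in one jump. This is one reason it is usually tuned jointly with the number of estimators, as in the GA search described above.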
The XGB model, developed in 2016, is based on regression trees [34]. It uses second-order derivative information of the loss and adds a regularization term to the cost function. This improves the efficiency of the optimization process while reducing overfitting. In this study, the hyperparameters of the XGB model, including the learning rate, number of estimators, maximum tree depth, and minimum leaf weight, were optimized by GA.
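The text names GA as the optimizer for CBR, GBR, and XGB but does not describe its operators. Below is one minimal real-coded GA (tournament selection, arithmetic crossover, Gaussian mutation with a shrinking step, and elitism); every operator and setting is an assumption.

For XGB, an individual might encode learning rate, number of estimators, maximum depth, and minimum child weight. Integer genes would be rounded before building, for example, `xgboost.XGBRegressor(learning_rate=..., n_estimators=..., max_depth=..., min_child_weight=...)`, and the model would be scored on the validation set. That wiring is hypothetical. The test uses a quadratic stand-in objective.

```python
import numpy as np


def ga_minimize(objective, lower, upper, pop_size=30, n_gen=60,
                mutation_rate=0.2, seed=0):
    """Minimal real-coded GA minimizing `objective` inside box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, dim))
    fit = np.array([objective(p) for p in pop])
    for gen in range(n_gen):
        elite = pop[np.argmin(fit)].copy()
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fit[a] < fit[b])[:, None], pop[a], pop[b])
        # Arithmetic crossover with a shuffled partner (random weight per gene).
        partners = parents[rng.permutation(pop_size)]
        alpha = rng.random((pop_size, dim))
        children = alpha * parents + (1 - alpha) * partners
        # Gaussian mutation whose step shrinks linearly over the generations.
        scale = 0.1 * (upper - lower) * (1 - gen / n_gen)
        mutate = rng.random((pop_size, dim)) < mutation_rate
        children = children + mutate * rng.normal(size=(pop_size, dim)) * scale
        children = np.clip(children, lower, upper)
        children[0] = elite                 # elitism: the best solution always survives
        pop = children
        fit = np.array([objective(p) for p in pop])
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

Elitism guarantees that the best validation score never gets worse from one generation to the next. The shrinking mutation step moves the search from exploration early on toward refinement near the end.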
DNN is the basic form of deep learning and one of the most efficient and powerful tools for modeling complex nonlinear relationships. A DNN is a connectionist system with multiple hidden layers between the input and output layers [35]. In this study, LeakyReLU and Adam were used as the activation function and optimizer, respectively. The numbers of neural units in the layers of the DNN model were (64, 128, 256, 512, 512, 1024, 1024). Additionally, dropout, batch normalization, and early stopping were used in the model.
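Below is a NumPy forward-pass sketch of the reported architecture, showing how the listed widths stack. It rests on several assumptions:

- 15 inputs (the ten bands plus the five CFS-selected features).
- A single linear output unit.
- A LeakyReLU slope of 0.01.
- He-style initialization.

Batch normalization, the Adam optimizer, training, and early stopping are omitted, so this is not a trainable reproduction of the paper's DNN.

```python
import numpy as np

HIDDEN = (64, 128, 256, 512, 512, 1024, 1024)   # widths listed in the text


def leaky_relu(z, alpha=0.01):
    # alpha (negative slope) is an assumption; the text does not give it.
    return np.where(z > 0, z, alpha * z)


def init_mlp(n_inputs=15, hidden=HIDDEN, seed=0):
    """Weights and biases for n_inputs -> hidden layers -> 1 output (normalized TP)."""
    rng = np.random.default_rng(seed)
    sizes = (n_inputs, *hidden, 1)
    # He-style initialization, a common choice for (leaky) ReLU networks.
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x, dropout=0.0, rng=None):
    """Forward pass; inverted dropout on hidden layers when dropout > 0 (training mode)."""
    if rng is None:
        rng = np.random.default_rng()
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:             # hidden layers; the output layer stays linear
            h = leaky_relu(h)
            if dropout > 0:
                keep = rng.random(h.shape) >= dropout
                h = h * keep / (1.0 - dropout)
    return h
```

With only about a hundred training and validation samples, a network of this width has far more weights than observations. This helps explain why the text pairs it with dropout and early stopping.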

RF is an ensemble ML algorithm based on decision trees, developed in 2001 by Leo Breiman [36]. RF is widely used in regression analysis because of its high prediction accuracy. The GS algorithm evaluates every combination of the candidate values given for a list of hyperparameters and records the combination with the best performance according to the evaluation metrics. The hyperparameters of RF were optimized by GS. They included the number of estimators, the maximum number of features, the minimum number of samples required to split a node, the minimum number of samples required at a leaf node, and the maximum depth.
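Grid search is simple enough to write directly. In this sketch, `score` is a hypothetical callback. It might, for example, fit a random forest with the given hyperparameters on the training set and return its validation RMSE. The grid keys could mirror scikit-learn's `RandomForestRegressor` arguments (`n_estimators`, `max_features`, `min_samples_split`, `min_samples_leaf`, `max_depth`), but the paper does not give the candidate values.

```python
from itertools import product


def grid_search(param_grid, score):
    """Evaluate every combination in param_grid; return the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)                # e.g. validation RMSE for these hyperparameters
        if s < best_score:
            best_params, best_score = params, s
    return best_params, best_score
```

The cost grows as the product of the list lengths. Five RF hyperparameters with, say, four candidates each already means 4^5 = 1024 model fits, which is why GS suits a small dataset like this one.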

2.3.4. Model Accuracy Assessment


To verify the performances of the TP retrieval models, we adopted three evaluation indicators: the coefficient of determination (R²), root mean square error (RMSE), and residual prediction deviation (RPD). R² is the most commonly used indicator of regression model performance, and values closer to 1 indicate a better fit. For a reasonable model it lies in [0, 1], although Equation (4) becomes negative when predictions are worse than the mean of the measured values. The RMSE measures the deviation between predicted and measured values in the units of TP concentration; the closer it is to zero, the better the model fits. The RPD is the ratio of the standard deviation of the measured values to the RMSE. Models can be divided into three categories according to RPD [37]:

(1) RPD > 2 indicates that the model is stable and reliable.
(2) 1.4 < RPD < 2 indicates that the model is moderate and its reliability needs to be improved.
(3) RPD < 1.4 indicates poor model stability and unreliable retrieval.

The formulas of these evaluation indicators are as follows.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (4)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (5)$$

$$\mathrm{RPD} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}}{\mathrm{RMSE}} \qquad (6)$$

where $n$ is the number of data pairs, $y_i$ is the predicted TP concentration, $x_i$ is the measured TP concentration, and $\bar{x}$ is the mean of the measured TP concentration.
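Equations (4)–(6) translate directly into NumPy. Because all three share the same n, they are linked. Equation (4) equals 1 − RMSE²/SD², where SD is the numerator of Equation (6), so RPD = 1/√(1 − R²). The RPD = 2 boundary therefore corresponds to R² = 0.75, and RPD = 1.4 to R² ≈ 0.49. The `evaluate` and `rpd_class` names are illustrative.

```python
import numpy as np


def evaluate(measured, predicted):
    """R2, RMSE and RPD as in Equations (4)-(6); x = measured, y = predicted."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean((x - y) ** 2))
    r2 = 1 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2)
    sd = np.sqrt(np.mean((x - x.mean()) ** 2))     # population SD, as in Equation (6)
    return r2, rmse, sd / rmse


def rpd_class(rpd):
    """Reliability categories used in the text."""
    if rpd > 2:
        return "stable and reliable"
    if rpd > 1.4:
        return "moderate; reliability needs improvement"
    return "poor; retrieval unreliable"
```

For instance, measured values [1, 2, 3, 4] against predictions [1, 2, 3, 5] give RMSE = 0.5, R² = 0.8, and RPD ≈ 2.24.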

3. Results
3.1. Spectral Response to TP Concentration
The reflectance curves of the sampling points are illustrated in Figure 3. The spectral signature was characterized by strong absorption in blue444 and red668 and by reflection peaks in green560 and rededge705, similar to the characteristics reported by Lu et al. [29]. However, the relationship between TP concentration and reflectance differed between research areas A and B (Figure 3a) and research area C (Figure 3b). Overall, reflectance was positively correlated with TP concentration in research areas A and B but negatively correlated with it in research area C. This phenomenon may be explained by differences in water constituents and their concentrations among regions, such as TP and related optically active parameters. It is worth noting that the water in research areas A and B is mainly affected by agriculture and aquaculture, while that in research area C is mainly affected by domestic sewage. The relationships between TP concentration and reflectance in different regions have also varied in previous studies. Some studies have demonstrated a strong positive correlation between phosphorus concentration and spectral reflectance [7,38]. Others showed that spectral reflectance was inversely proportional to TP concentration in lakes and rivers [25,39].

Figure 3. Reflectance curves of sampling points: (a) research areas A and B, (b) research area C.

3.2. Selection of Band Combinations


Band combinations were used to identify sensitive bands and extract effective spectral information. The approach in Section 2.3.2 was used to create and select the best feature subset, based on the training and validation sets, as input to the IOA-ML models. The best feature subset selected by CFS, with the maximum Merit_S, is shown in Table 4. The five selected band-combination features, along with the ten original multispectral bands, were used as input variables in developing the six IOA-ML models for retrieving TP concentration.

Table 4. Band combinations selected by CFS.

Selection order | Feature | Merit_S
1 | rededge740 × rededge714 × rededge705 × nir842 | 0.8329
2 | (blue444 − red668 − nir842) ÷ rededge714 | 0.8796
3 | rededge740 × red668 × red650 × nir842 | 0.8952
4 | red668 ÷ green560 ÷ green531 ÷ rededge714 | 0.923
5 | rededge740 × rededge714 × red650 × nir842 | 0.9394
Note: × represents multiplication, ÷ represents division, and − represents subtraction.
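The five selected features can be computed from per-band reflectance as below, with the operators applied left to right as written in Table 4. Band keys are copied verbatim from Table 4, including `rededge714`. The dictionary layout and the function name are assumptions.

```python
import numpy as np


def cfs_features(b):
    """The five Table 4 band-combination features, stacked along the last axis.

    b maps band names (as written in Table 4) to reflectance arrays of equal shape.
    """
    return np.stack([
        b["rededge740"] * b["rededge714"] * b["rededge705"] * b["nir842"],
        (b["blue444"] - b["red668"] - b["nir842"]) / b["rededge714"],
        b["rededge740"] * b["red668"] * b["red650"] * b["nir842"],
        b["red668"] / b["green560"] / b["green531"] / b["rededge714"],
        b["rededge740"] * b["rededge714"] * b["red650"] * b["nir842"],
    ], axis=-1)
```

Concatenating these five columns with the ten original band reflectances gives the 15 input variables described above.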
Six IOA-ML models (PSO-SVR, GA-CBR, GA-GBR, DNN, GA-XGB, and GS-RF) were developed to retrieve TP concentration, with hyperparameters determined on the training sets and slightly adjusted according to the evaluation indicators on the validation sets. Figure 4 shows scatter plots of measured versus predicted TP concentrations on the training, validation, and test sets for the developed IOA-ML models. The points lay close to, and evenly on both sides of, the 1:1 line, indicating that the established models performed well. However, some problems remained. For example, some models (PSO-SVR, GA-GBR, DNN, GA-XGB, and GS-RF) slightly underestimated high TP concentrations above 0.7 mg/L, owing to the paucity of samples with high TP concentration.
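The PSO settings behind PSO-SVR are not reported in this section. As an illustration of the optimizer family only, the Python below is a generic global-best particle swarm optimizer. The inertia and acceleration constants (0.7298 and 1.49618, a commonly used constriction-derived choice), swarm size, iteration count, and the name `pso_minimize` are assumptions. In a PSO-SVR workflow, `f` could, hypothetically, return the validation RMSE of an SVR trained with hyperparameters decoded from the particle position.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=30, n_iter=200,
                 w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Global-best PSO: minimize f over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Random initial positions inside the box, zero initial velocities
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                          # best position of each particle
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()  # best position of the swarm
    gbest_val = pbest_val.min()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward swarm best
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < gbest_val:
            gbest_val = pbest_val.min()
            gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(gbest_val)
```

Clipping positions to the bounds is one simple way to keep particles inside the search box; reflecting or damping velocities at the boundary are common alternatives.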
Table 5 reports the performances of the six IOA-ML models in terms of R2, RMSE, and RPD, and the best results are shown in boldface. In general, all models showed high accuracies on the training, validation, and test sets. For the GA-CBR, GA-GBR, DNN, GA-XGB, and GS-RF models, the prediction accuracies on the validation and test sets were all lower than those on the training sets, whereas for the PSO-SVR model the accuracies on the three sets were similar. The RPD values of all six IOA-ML models exceeded 2 on all three sets, except for the test set of the GA-CBR model (1.9849), indicating that the six IOA-ML models are stable and reliable in predicting TP concentration.
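The metric definitions are given elsewhere in the paper, not in this excerpt. The Python sketch below assumes the common conventions R2 = 1 − SS_res/SS_tot and RPD = SD(measured)/RMSE with the sample standard deviation; if the authors used, for example, the squared Pearson r for R2, the values would differ. `evaluate` is a hypothetical helper.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return R2, RMSE, and RPD for measured vs predicted TP (mg/L).

    R2 = 1 - SS_res / SS_tot; RPD = SD(measured, ddof=1) / RMSE (assumed conventions).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmse
    return {"R2": float(r2), "RMSE": float(rmse), "RPD": float(rpd)}
```

Under these conventions, an RPD above 2, the threshold used above, means the RMSE is less than half the standard deviation of the measured TP concentrations.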

Figure 4. Comparison of measured and predicted TP concentrations of training sets, validation sets, and test sets.

Table 5. Performances of the six IOA-ML models.

Model  R2 (training / validation / test)  RMSE in mg/L (training / validation / test)  RPD (training / validation / test)
PSO-SVR  0.9015 / 0.8929 / 0.9045  0.0649 / 0.0583 / 0.0486  3.1859 / 3.0561 / 3.2351
GA-CBR  0.9506 / 0.8148 / 0.7462  0.0445 / 0.0742 / 0.0792  4.5 / 2.3234 / 1.9849
GA-GBR  0.984 / 0.8458 / 0.8281  0.0253 / 0.0677 / 0.0651  7.9153 / 2.5466 / 2.4125
DNN  0.8856 / 0.8054 / 0.8143  0.0699 / 0.0786 / 0.0677  2.9565 / 2.2667 / 2.3206
GA-XGB  0.9584 / 0.9082 / 0.9124  0.0422 / 0.054 / 0.047  4.906 / 3.3005 / 3.379
GS-RF  0.9534 / 0.8579 / 0.8624  0.0447 / 0.0672 / 0.0583  4.6304 / 2.6528 / 2.6962

In this study, the paired data were divided into three sets, and the model performances on the validation and test sets were used to select the best retrieval model. Overall, the GA-XGB model outperformed the other models, with the highest R2 and RPD and the lowest RMSE on the validation and test sets. Many studies have shown that the XGB model performs well in water quality retrieval because it can control model complexity and prevent overfitting [1]. The accuracy of a water quality parameter retrieval model based on the GA-XGB algorithm was also significantly higher than that of other methods [25]. As a second option, the PSO-SVR model also showed good generalization capability and transferability, with all three evaluation indicators on the validation and test sets almost equal to those on the training sets, demonstrating the good fitting properties of the SVR model, even for non-linear data [18]. Additionally, Yang et al. [39] concluded that the SVR model had the best performance for TP retrieval from Sentinel-2 imagery in lakes and rivers. The GA-CBR model presented the poorest performance on the validation and test sets despite a high training-set R2 (0.9506), indicating some degree of overfitting to the training sets.
3.4. Spatial Distribution of TP Concentration
The spatial distributions of TP concentration derived from the six established IOA-ML models with UAV multispectral images are mapped in Figures 5–7. Variability in TP concentration can be observed through the color changes in each figure, and the color bar ranges are kept consistent within each flight to facilitate comparison. The TP concentrations derived from the different IOA-ML models had similar spatial distribution characteristics. Statistics of the pixel-scale TP concentrations retrieved by the six IOA-ML models for each flight, together with the measured values, are shown in Figure 8.
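The pixel-scale statistics reported for each research area (mean ± standard deviation and quantiles) can be computed along the lines of the Python sketch below. It assumes each retrieved TP map is a 2-D array with non-water pixels set to NaN and uses the population standard deviation; both choices, and the name `pixel_stats`, are assumptions rather than the authors' processing chain.

```python
import numpy as np


def pixel_stats(tp_map, quantiles=(25, 50, 75)):
    """Mean, SD (ddof=0), and quantiles of a retrieved TP map over valid pixels.

    Non-water pixels are assumed to be NaN and are excluded.
    """
    values = np.asarray(tp_map, dtype=float)
    values = values[np.isfinite(values)]  # keep water pixels only
    return {
        "mean": float(values.mean()),
        "sd": float(values.std()),
        "quantiles": {q: float(np.percentile(values, q)) for q in quantiles},
    }
```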

Figure 5. Spatial distribution of TP concentration derived from the six IOA-ML models in research area A.
The spatial distributions of TP concentration in research area A derived from the six IOA-ML models were consistent to some extent (Figure 5). All retrieval models showed high TP concentration in the center of the southern part, most prominently the GA-CBR model. The measured TP concentration in the ditch of the crayfish–rice culture was 0.046 ± 0.012 mg/L (mean ± standard deviation, hereinafter the same). According to the statistics of the retrieval results over all pixels, the retrieved TP concentrations of the PSO-SVR, GA-CBR, GA-GBR, DNN, GA-XGB, and GS-RF models were 0.0633 ± 0.0336 mg/L, 0.0752 ± 0.0195 mg/L, 0.067 ± 0.0251 mg/L, 0.0401 ± 0.0217 mg/L, 0.0693 ± 0.0079 mg/L, and 0.067 ± 0.0192 mg/L, respectively. Most of the retrieval models' predicted values were greater than the measured values; only the DNN result was close to the measured value. The main reason was that the retrieval models overestimated the low TP concentrations in research area A.
The spatial distribution of TP concentration derived from the six IOA-ML models was highly consistent in most parts of research area B (Figure 6). The observed TP concentration and the TP concentrations retrieved by the PSO-SVR, GA-CBR, GA-GBR, DNN, GA-XGB, and GS-RF models were 0.415 ± 0.113 mg/L, 0.3971 ± 0.128 mg/L, 0.3617 ± 0.1104 mg/L, 0.35 ± 0.1051 mg/L, 0.3908 ± 0.1165 mg/L, 0.3546 ± 0.097 mg/L, and 0.374 ± 0.0954 mg/L, respectively. The quantiles and mean value of the measured TP concentration were slightly higher than all of the retrieved results (Figure 8b), with relative errors between the predicted and measured mean values ranging from −15.77% to −5.83%. This can be explained by the high TP concentration near the shore and the low concentration in the center of research area B (Figure 6), together with the fact that more sampling points were located near the shore than in the center (Figure 1). However, the TP concentrations retrieved by the six IOA-ML models along the bank of the northern part differed greatly: the PSO-SVR and DNN models gave very high values, whereas the GA-GBR and GA-XGB models gave very low values. We extracted the reflectance curves of this region and found that they were slightly higher than the reflectance curves associated with high TP concentration in Figure 3a. Therefore, we consider that the GA-GBR and GA-XGB retrievals along the northern bank of research area B may be inaccurate.

Figure 6. Spatial distribution of TP concentration derived from the six IOA-ML models in research area B.

Figure 7. Spatial distribution of TP concentration derived from the six IOA-ML models in research area C on 10 December 2021 (first row (a–f)), 27 May 2022 (second row (g–l)), and 27 September 2022 (third row (m–r)); the panels for each date share the legend shown in (a), (g), and (m), respectively.

Figure 8. Statistics of measured TP concentration and derived TP concentration from the six IOA-ML models at the pixel scale: (a) research area A, (b) research area B, (c) research area C on 10 December 2021, (d) research area C on 27 May 2022, (e) research area C on 27 September 2022.

The TP maps derived from the different IOA-ML models in research area C on 10 December 2021, 27 May 2022, and 27 September 2022 are presented in Figure 7. Visual inspection showed high correspondence between the different methods, though some differences were evident. On 10 December 2021, the retrieval results showed high TP concentration in the center of Pond 2 and Pond 6 and low concentration near the shore, whereas all the sampling points were near the shores of the ponds. Therefore, the mean of the measured values was expected to be lower than that of the retrieval results, which is consistent with Figure 8c. On 27 September 2022, all sampling points were located 4–6 m from the shores of the ponds, most of them in Pond 2. The retrieved TP concentration along the shore of Pond 2 was high (Figure 7m–r), and the measured values were also high (Figure 8e), which supports the reliability of the IOA-ML retrievals. Therefore, sampling points should be distributed evenly throughout the study area in water quality parameter retrieval.
In research area C, there were three flights in total. The TP concentrations of the first and second flights were 0.02–0.28 mg/L and 0.02–0.23 mg/L, respectively, which were significantly lower than those of the third flight on 27 September 2022. The main reason was that the Yangtze River basin experienced a prolonged drought in summer and autumn 2022, which greatly reduced the water volume in the ponds; the water surface of Pond 6 on 27 September 2022 was almost half that of the previous flights. Studies have shown that drought can increase water pollutant concentrations. For example, Chl-a concentration was negatively correlated with precipitation and water level during extreme drought events [40], and TP concentration in a drought summer was significantly higher than in other years [41].

4. Discussion
Feature engineering before ML model establishment is necessary, and previous studies
have confirmed that optimal input features can improve the performance of water quality
retrieval models [12,42]. For example, strong correlations generally exist between water quality parameters and band combinations of Landsat images [43]. However, such studies typically use only 4–7 multispectral bands, and feature selection is based only on the correlation between features and water quality parameters [6,10,16,42]. The multispectral imager used in this research has ten bands, and it was still unknown whether feature amplification and CFS could improve model performance for TP retrieval in small inland waterbodies. The performances of the six IOA-ML models on the validation and test sets improved with feature amplification and CFS compared with those without them (Table 6). For the validation sets, RMSE decreased by 5–18.91%, R2 improved by 0.49–6.62%, and RPD improved by 1.5–16%. For the test sets, RMSE decreased by 1.04–30.11%, R2 improved by 0.2–7.29%, and RPD improved by 0.95–23.15%.
Among the six IOA-ML models, the DNN improved the most on the validation sets and the second most on the test sets, although the r values between the five selected band combinations and the measured TP concentration were 0.83, 0.66, 0.82, −0.12, and 0.83, respectively. The robustness of the established IOA-ML models improved greatly with feature amplification and CFS. Previous research has shown that CFS-PSO feature selection can identify and remove irrelevant variables [44], and the superiority of the CFS procedure for detecting optimal wavelengths has also been demonstrated [45]. Therefore, choosing a suitable feature subset by CFS can effectively improve the accuracy of the TP retrieval models.

Table 6. Improvement of the six IOA-ML models with feature amplification and CFS.

Model  R2 (validation / test)  RMSE (validation / test)  RPD (validation / test)
PSO-SVR  1.18% / 0.20%  −8.77% / −1.04%  5.11% / 0.95%
GA-CBR  5.61% / 7.29%  −16.30% / −14.12%  14.07% / 12.32%
GA-GBR  4.84% / 6.07%  −17.74% / −20.33%  15.07% / 16.97%
DNN  6.62% / 6.65%  −18.91% / −20.46%  15.96% / 17.09%
GA-XGB  2.53% / 6.65%  −11.85% / −30.11%  10.57% / 23.15%
GS-RF  0.49% / 4.72%  −5.00% / −20.45%  1.50% / 16.93%
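The ranges quoted in the text can be checked against Table 6. The short Python sketch below assumes each entry is a relative change, (value with feature amplification and CFS − value without) / value without × 100; that formula and the name `relative_change` are assumptions, while the six validation-set RMSE changes are transcribed from Table 6.

```python
def relative_change(with_fs, without_fs):
    """Relative change (%) of a metric with vs without feature amplification and CFS."""
    return (with_fs - without_fs) / without_fs * 100.0


# Validation-set RMSE changes from Table 6 (%), in the order
# PSO-SVR, GA-CBR, GA-GBR, DNN, GA-XGB, GS-RF
val_rmse_change = [-8.77, -16.30, -17.74, -18.91, -11.85, -5.00]
decreases = [-c for c in val_rmse_change]
print(min(decreases), max(decreases))  # 5.0 18.91
```

The printed range matches the 5–18.91% RMSE decrease reported for the validation sets.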

One of the limitations of the ML-based TP retrieval model is that its transferability is
limited [42]. Many studies used cross validation or splitting data into three sets to verify
the feasibility of the ML-based water quality parameter retrieval models [17,25,26,46].
The paired TP concentration and reflectance values in this study were split into training, validation, and test sets. Model performances on the validation and test sets (R2 = 0.7462–0.9124, RMSE = 0.047–0.0792 mg/L, RPD = 1.9849–3.379) declined slightly, to different degrees, compared with those on the training sets (R2 = 0.8856–0.984, RMSE = 0.0253–0.0699 mg/L, RPD = 2.9565–7.9153), but the overall performances maintained a good balance. These results suggest that slight overfitting existed in the developed IOA-ML models but was well controlled. Remotely sensed TP
estimation is complex: Politi et al. [47] assessed 28 empirical algorithms from the peer-reviewed literature using new satellite remote sensing data to identify the best water quality parameter retrieval algorithms in terms of accuracy and transferability, and concluded that none of them was satisfactory. One study showed that the best TSS retrieval model developed with a local dataset was accurate when applied to other areas [48]. In addition, UAV multispectral images acquired over research area B on 12 May 2022, without concurrent water sampling, were used directly with the established models to retrieve TP concentration. The retrieved TP concentrations of the PSO-SVR, GA-CBR, GA-GBR, DNN, GA-XGB, and GS-RF models were 0.1273 ± 0.0346 mg/L, 0.1946 ± 0.0108 mg/L, 0.2356 ± 0.0117 mg/L, 0.151 ± 0.0273 mg/L, 0.224 ± 0.0226 mg/L, and 0.1432 ± 0.0111 mg/L, respectively. The pixel-scale r values between the TP concentrations derived from any two IOA-ML models were 0.4342–0.9083. Although the accuracy of the developed models may be mediocre when they are applied directly to other flights under different external conditions, such as temperature, sunlight intensity, and hydrological regimes [22], they can still estimate the TP concentration of each pixel in the entire
research area. In summary, the developed IOA-ML models have certain transferability
at the research areas and can be easily applied to other regions by retraining the models
with new data.
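Applying an established model to a new flight and comparing the resulting maps, as done for 12 May 2022, can be sketched in Python as follows. The sketch assumes per-pixel inputs stacked into an (H, W, n_features) array, a boolean water mask, and a model object with a scikit-learn-style `predict` method; `predict_map` and `pairwise_r` are hypothetical helpers. The pairwise r is computed only over pixels that are valid in both maps.

```python
import itertools

import numpy as np


def predict_map(model, cube, water_mask):
    """Apply a fitted model pixel by pixel; non-water pixels become NaN.

    cube: (H, W, n_features) inputs; water_mask: (H, W) boolean array.
    """
    out = np.full(water_mask.shape, np.nan)
    out[water_mask] = model.predict(cube[water_mask])  # (n_water, n_features) in
    return out


def pairwise_r(maps):
    """Pearson r between every pair of TP maps over pixels valid in both."""
    result = {}
    for (a, ma), (b, mb) in itertools.combinations(maps.items(), 2):
        ok = np.isfinite(ma) & np.isfinite(mb)
        result[(a, b)] = float(np.corrcoef(ma[ok], mb[ok])[0, 1])
    return result
```

Restricting the correlation to jointly valid pixels keeps masked land or shoreline pixels from entering the comparison between models.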
Although several approaches, including dividing the paired data into three sets and
applying established models to other multispectral images, had been adopted to verify the
transferability of the established models, the established models had not been verified using
datasets in a separate waterbody. In addition, we only sampled at the edge of the ponds in
research area C due to the constraints of financial support and time. For further research,
we should collect more samples from various waterbodies and ensure that the sampling
points are distributed as evenly as possible, and further establish more generalized and
adaptable models.

5. Conclusions
In this study, six IOA-ML models were developed to retrieve TP concentration with
UAV multispectral images in small inland waterbodies in Hubei province, middle China.
The paired in situ TP concentration and reflectance values were divided into training sets,
validation sets, and test sets. Feature selection was performed by CFS to find the most
suitable feature subset. The developed IOA-ML models with hyperparameters tuned with
training sets and slightly adjusted with validation sets achieved satisfactory performance
in terms of R2 (0.7462–0.984), RMSE (0.0253–0.0792 mg/L), and RPD (1.9849–7.9153). The GA-XGB and PSO-SVR models had the best performances according to the accuracies on the validation and test sets. The TP concentrations of each flight derived from the six IOA-ML models had similar spatial distributions, and the quantiles and mean values of the pixel-scale TP concentration retrieved by the six IOA-ML models were lower than the measured values when most water sampling points were located in the high-value areas of the retrieval models. Additionally, the developed IOA-ML models have certain
transferability at the research areas and can be easily applied to other regions by retraining
the models with new data. This study provides an efficient and practical way for TP
monitoring in small inland waterbodies.

Author Contributions: Conceptualization, W.H., W.G. and D.S.; methodology, W.H. and D.M.; soft-
ware, W.H.; validation, W.G.; investigation, H.W., J.L. and H.W.; writing—original draft preparation,
W.H.; writing—review and editing, W.G.; funding acquisition, D.S. All authors have read and agreed
to the published version of the manuscript.
Funding: This study was supported by the Chinese National Natural Science Foundation
(No. U21A20156) and the Strategic Priority Research Program of the Chinese Academy of Sciences
(No. XDA2004030102).
Data Availability Statement: The raw data supporting the conclusions of this article are available
from the authors upon request.
Acknowledgments: The authors would like to thank the reviewers and the editors for their valuable
suggestions and contributions, which significantly helped to improve this article.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Tian, S.; Guo, H.; Xu, W.; Zhu, X.; Wang, B.; Zeng, Q.; Mai, Y.; Huang, J.J. Remote sensing retrieval of inland water quality
parameters using sentinel-2 and multiple machine learning algorithms. Environ. Sci. Pollut. Res. 2022, 30, 18617–18630. [CrossRef]
2. Wang, S.; Li, J.; Zhang, B.; Spyrakos, E.; Tyler, A.N.; Shen, Q.; Zhang, F.; Kutser, T.; Lehmann, M.K.; Wu, Y.; et al. Trophic state
assessment of global inland waters using a modis-derived forel-ule index. Remote Sens. Environ. 2018, 217, 444–460. [CrossRef]
3. Du, C.; Wang, Q.; Li, Y.; Lyu, H.; Zhu, L.; Zheng, Z.; Wen, S.; Liu, G.; Guo, Y. Estimation of total phosphorus concentration using a
water classification method in inland water. Int. J. Appl. Earth Obs. Geoinf. 2018, 71, 29–42. [CrossRef]
4. Liu, J.; Gu, W.; Liu, Y.; Zhang, C.; Li, W.; Shao, D. Dynamic characteristics of net anthropogenic phosphorus input and
legacy phosphorus reserves under high human activity—A case study in the Jianghan plain. Sci. Total Environ. 2022,
836, 155287. [CrossRef] [PubMed]
5. Ji, P.; Xu, H.; Zhan, X.; Zhu, G.; Kang, L. Spatial-temporal variations and driving of nitrogen and phosphorus ratios in lakes in the
middle and lower reaches of Yangtze River. Environ. Sci. 2020, 41, 4030–4041.
6. Xiao, Y.; Guo, Y.; Yin, G.; Zhang, X.; Shi, Y.; Hao, F.; Fu, Y. UAV multispectral image-based urban river water quality moni-
toring using stacked ensemble machine learning algorithms—A case study of the Zhanghe river, China. Remote Sens. 2022,
14, 3272. [CrossRef]
7. Chawla, I.; Karthikeyan, L.; Mishra, A.K. A review of remote sensing applications for water security: Quantity, quality, and
extremes. J. Hydrol. 2020, 585, 124826. [CrossRef]
8. Fichot, C.G.; Downing, B.D.; Bergamaschi, B.A.; Windham-Myers, L.; Marvin-Dipasquale, M.; Thompson, D.R.; Gierach,
M.M. High-resolution remote sensing of water quality in the San Francisco bay–delta estuary. Environ. Sci. Technol. 2016,
50, 573–583. [CrossRef]
9. Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Ha, N. Seamless retrievals of
chlorophyll-a from sentinel-2 (MSI) and sentinel-3 (OLCI) in inland and coastal waters: A machine-learning approach. Remote
Sens. Environ. 2020, 240, 111604. [CrossRef]
10. Wang, S.; Shen, M.; Liu, W.; Ma, Y.; Shi, H.; Zhang, J.; Liu, D. Developing remote sensing methods for monitoring water quality of
alpine rivers on the Tibetan Plateau. GIScience Remote Sens. 2022, 59, 1384–1405. [CrossRef]
11. Hu, M.; Ma, R.; Xiong, J.; Wang, M.; Cao, Z.; Xue, K. Eutrophication state in the eastern China based on landsat 35-year
observations. Remote Sens. Environ. 2022, 277, 1130577. [CrossRef]
12. Cao, Z.; Ma, R.; Duan, H.; Pahlevan, N.; Melack, J.; Shen, M.; Xue, X. A machine learning approach to estimate chlorophyll-a from
landsat-8 measurements in inland lakes. Remote Sens. Environ. 2020, 248, 111974. [CrossRef]
13. Huang, Y.; Chen, X.; Liu, Y.; Sun, M.; Chen, H.; Wang, X. Inversion of river and lake water quality parameters by UAV
hyperspectral imaging technology. Yangtze River 2020, 51, 205–212.
14. Su, T.; Chou, H. Application of multispectral sensors carried on unmanned aerial vehicle (UAV) to trophic state mapping of small
reservoirs: A case study of Tain-pu reservoir in Kinmen, Taiwan. Remote Sens. 2015, 7, 10078–10097. [CrossRef]
15. Wang, L.; Yue, X.; Wang, H.; Ling, K.; Liu, Y.; Wang, J.; Hong, J.; Pen, W.; Song, H. Dynamic inversion of inland aquaculture water
quality based on UAVs-WSN spectral analysis. Remote Sens. 2020, 12, 402. [CrossRef]
16. Liu, Y.; Xia, K.; Feng, H.; Fang, Y. Inversion of water quality elements in small and micro-size water region using multispectral
image by UAV. Acta Sci. Circumstantiae 2019, 39, 1241–1249.
17. Xiong, J.; Lin, C.; Cao, Z.; Hu, M.; Xue, K.; Chen, X.; Ma, R. Development of remote sensing algorithm for total phosphorus
concentration in eutrophic lakes: Conventional or machine learning? Water Res. 2022, 215, 118213. [CrossRef]
18. Sagan, V.; Peterson, K.T.; Maimaitijiang, M.; Sidike, P.; Sloan, J.; Greeling, B.A.; Maalouf, S.; Adams, C. Monitoring inland water
quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud
computing. Earth Sci. Rev. 2020, 205, 103187. [CrossRef]
19. Du, C.; Li, Y.; Wang, Q.; Zhu, L.; Lü, H. Inversion model and daily variation of total phosphorus concentrations in Taihu Lake
based on GOCI data. Environ. Sci. 2016, 37, 862–872.
20. Zhang, B.; Li, J.; Shen, Q.; Chen, D. A bio-optical model based method of estimating total suspended matter of lake Taihu from
near-infrared remote sensing reflectance. Environ. Monit. Assess. 2008, 145, 339–347. [CrossRef]
21. Fan, D.; He, H.; Wang, R.; Zeng, Y.; Fu, B.; Xiong, Y.; Liu, L.; Xu, Y.; Gao, E. Chlnet: A novel hybrid 1D CNN-SVR algorithm for
estimating ocean surface chlorophyll-a. Front. Mar. Sci. 2022, 9, 1555. [CrossRef]
22. Zhang, H.; Xue, B.; Wang, G.; Zhang, X.; Zhang, Q. Deep learning-based water quality retrieval in an impounded lake using
landsat 8 imagery: An application in Dongping lake. Remote Sens. 2022, 14, 4505. [CrossRef]
23. Zhang, Y.; Wu, L.; Deng, L.; Ouyang, B. Retrieval of water quality parameters from hyperspectral images using a hybrid feedback
deep factorization machine model. Water Res. 2021, 204, 117618. [CrossRef] [PubMed]
24. Chang, N.; Xuan, Z.; Yang, Y. Exploring spatiotemporal patterns of phosphorus concentrations in a coastal bay with MODIS
images and machine learning models. Remote Sens. Environ. 2013, 134, 100–110. [CrossRef]
25. Chen, B.; Mu, X.; Chen, P.; Wang, B.; Choi, J.; Park, H.; Xu, S.; Wu, Y.; Yang, H. Machine learning-based inversion of water quality
parameters in typical reach of the urban river by UAV multispectral data. Ecol. Indic. 2021, 133, 108434. [CrossRef]
26. Zhou, X.; Liu, C.; Akbar, A.; Xue, Y.; Zhou, Y. Spectral and spatial feature integrated ensemble learning method for grading urban
river network water quality. Remote Sens. 2021, 13, 4591. [CrossRef]
27. Zhu, T.; Tao, C.; Cheng, H.; Cong, H. Versatile in silico modelling of microplastics adsorption capacity in aqueous environment
based on molecular descriptor and machine learning. Sci. Total Environ. 2022, 846, 157455. [CrossRef]
28. Brewer, K.; Clulow, A.; Sibanda, M.; Gokool, S.; Odindi, J.; Mutanga, O.; Naiken, V.; Vimbayi, G.P.C.; Mabhaudhi, T. Estimation of
maize foliar temperature and stomatal conductance as indicators of water stress based on optical and thermal imagery acquired
using an unmanned aerial vehicle (UAV) platform. Drones 2022, 6, 169. [CrossRef]
29. Lu, Q.; Si, W.; Wei, L.; Li, Z.; Xia, Z.; Ye, S.; Xia, Y. Retrieval of water quality from UAV-borne hyperspectral imagery: A
comparative study of machine learning algorithms. Remote Sens. 2021, 13, 3928. [CrossRef]
30. Farahani, G. Feature selection based on cross-correlation for the intrusion detection system. Secur. Commun. Netw. 2020,
2020, 8875404. [CrossRef]
31. Rafael, G.M.; Tomáš, H.; Ricardo, C.; Joaquin, V.; Andre, C. Hyper-parameter tuning of a decision tree induction algorithm. In
Proceedings of the Conference on Intelligent Systems (BRACIS 2016), Recife, Brazil, 9–12 October 2016.
32. Bazi, Y.; Melgani, F. Semisupervised PSO-SVM regression for biophysical parameter estimation. IEEE Trans. Geosci. Remote Sens.
2007, 45, 1887–1895. [CrossRef]
33. Huang, G.; Wu, L.; Ma, X.; Zhang, W.; Fan, J.; Yu, X.; Zeng, W.; Zhou, H. Evaluation of catboost method for prediction of reference
evapotranspiration in humid regions. J. Hydrol. 2019, 574, 1029–1041. [CrossRef]
34. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
35. Qian, J.; Liu, H.; Qian, L.; Bauer, J.; Xue, X.; Yu, G.; He, Q.; Zhou, Q.; Bi, Y.; Norra, S. Water quality monitoring and assessment
based on cruise monitoring, remote sensing, and deep learning: A case study of Qingcaosha reservoir. Front. Environ. Sci. 2022,
10, 979133. [CrossRef]
36. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
37. Chang, C.; Laird, D.A.; Mausbach, M.J.; Hurburgh, C.R. Near-infrared reflectance spectroscopy-principal components regression
analyses of soil properties. Soil Sci. Soc. Am. J. 2001, 65, 480–490. [CrossRef]
38. Song, M.; Li, E.; Chang, C.; Wang, Y.; Yu, C. Spectral characteristics of nitrogen and phosphorus in water. In Proceedings
of the 7th International Conference on Communications, Signal Processing, and Systems, Dalian, China, 14–16 July 2018;
Volume 516, pp. 569–578.
39. Yang, Z.; Gong, C.; Ji, T.; Hu, Y.; Li, L. Water quality retrieval from ZY1-02D hyperspectral imagery in urban water bodies and
comparison with Sentinel-2. Remote Sens. 2022, 14, 5029. [CrossRef]
40. Jang, M.T.G.; Alcantara, E.; Rodrigues, T.; Park, E.; Ogashawara, I.; Marengo, J.A. Increased chlorophyll-a concentration in Barra
Bonita reservoir during extreme drought periods. Sci. Total Environ. 2022, 843, 157106. [CrossRef] [PubMed]
41. Qiu, X.; Huang, T.; Zeng, M.; Shi, J.; Cao, Z.H.; Zhou, S. Abnormal increase of Mn and TP concentrations in a temperate reservoir
during fall overturn due to drought-induced drawdown. Sci. Total Environ. 2017, 575, 996–1004. [CrossRef]
42. Zhu, X.; Guo, H.; Huang, J.J.; Tian, S.; Xu, W.; Mai, Y. An ensemble machine learning model for water quality estimation in coastal
area based on remote sensing imagery. J. Environ. Manag. 2022, 323, 116187. [CrossRef]
43. Al-Shaibah, B.; Liu, X.; Zhang, J.; Tong, Z.; Zhang, M.; El-Zeiny, A.; Faichia, C.; Hussain, M.; Tayyab, M. Modeling water quality
parameters using Landsat multispectral images: A case study of Erlong lake, northeast China. Remote Sens. 2021, 13, 1603. [CrossRef]
44. Singh, S.; Singh, A.K. Web-spam features selection using CFS-PSO. Procedia Comput. Sci. 2018, 125, 568–575. [CrossRef]
45. Mireei, S.A.; Amini-Pozveh, S.; Nazeri, M. Selecting optimal wavelengths for detection of insect infested tomatoes based on
SIMCA-aided CFS algorithm. Postharvest Biol. Technol. 2017, 123, 22–32. [CrossRef]
46. Wen, Z.; Wang, Q.; Liu, G.; Jacinthe, P.; Wang, X.; Lyu, L.; Tao, H.; Ma, Y.; Duan, H.; Shang, Y. Remote sensing of total suspended
matter concentration in lakes across China using Landsat images and Google Earth Engine. ISPRS J. Photogramm. Remote Sens.
2022, 187, 61–78. [CrossRef]
47. Politi, E.; Cutler, M.E.J.; Rowan, J.S. Evaluating the spatial transferability and temporal repeatability of remote-sensing-
based lake water quality retrieval algorithms at the European scale: A meta-analysis approach. Int. J. Remote Sens. 2015,
36, 2995–3023. [CrossRef]
48. Jensen, D.; Simard, M.; Cavanaugh, K.; Sheng, Y.; Twilley, R. Improving the transferability of suspended solid estimation in
wetland and deltaic waters with an empirical hyperspectral approach. Remote Sens. 2019, 11, 1629. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.