Sustainability 16 03355
Article
Using Ensemble Learning for Remote Sensing Inversion of Water
Quality Parameters in Poyang Lake
Changchun Peng 1 , Zhijun Xie 1,2, * and Xing Jin 1
1 Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China
2 Zhejiang Engineering Research Center of Advanced Mass Spectrometry and Clinical Application,
Ningbo University, Ningbo 315211, China
* Correspondence: [email protected]
Abstract: Inland bodies of water, such as lakes, play a crucial role in sustaining life and supporting
ecosystems. However, with the rapid development of socio-economics, water resources are facing
serious pollution problems, such as the eutrophication of water bodies and degradation of wetlands.
Therefore, the monitoring, management, and protection of inland water resources are particularly
important. In past research, empirical models and machine learning models have been widely used
for the water quality assessment of inland lakes. Due to the complexity of the optical properties of
inland lake water bodies, the performance of these models is often limited. To overcome the limitations
of these models, this study uses in situ water quality data from 2017 to 2018 and multispectral (MS)
remote sensing data from Sentinel-2 to construct experimental samples of Poyang Lake. Based on
these experimental samples, we constructed a spatio-temporal ensemble model (STE) to evaluate
four common water quality parameters: chlorophyll-a (Chl-a), total phosphorus (TP), total nitrogen
(TN), and chemical oxygen demand (COD). The model adopts an ensemble learning strategy,
improving the model’s performance by merging multiple advanced machine learning algorithms.
We introduced several indices related to water quality parameters as auxiliary variables, such as
NDCI and Enhanced Three, and used band data and these auxiliary variables as predictive variables,
thereby greatly enhancing the predictive potential of the model. The results show that the inversion
accuracy of these four inversion models is high (R² of 0.94, 0.88, 0.92, and 0.93; RMSE of 1.15, 0.01,
0.02, and 0.02; MAE of 0.81, 0.01, 0.09, and 0.10), indicating that the STE model has good evaluation
accuracy. Meanwhile, we used the STE model to reveal the spatio-temporal distribution of Chl-a, TP,
TN, and COD from 2017 to 2018, and analyzed their seasonal and spatial variation rules. The results
of this study not only provide an effective and practical method for monitoring and managing water
quality parameters in inland lakes, but also provide water security for socio-economic and ecological
environmental safety.
Keywords: remote sensing inversion; water quality monitoring; inland water; machine learning;
ensemble learning; Poyang Lake
Citation: Peng, C.; Xie, Z.; Jin, X. Using Ensemble Learning for Remote Sensing Inversion of Water Quality
Parameters in Poyang Lake. Sustainability 2024, 16, 3355. https://doi.org/10.3390/su16083355
Academic Editors: Xianju Li and Pan Zhu
Received: 18 March 2024
Revised: 11 April 2024
Accepted: 12 April 2024
Published: 17 April 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Lakes, surrounded by land, are surface water bodies typically replenished by rivers,
glaciers, precipitation, or groundwater. Although lakes only account for 3.7% of the Earth’s
land, they are a crucial component of ecosystems, providing unique living conditions
and food chains for many flora and fauna. Simultaneously, lakes play a significant role
in hydrological cycles and regional climate regulation [1–4]. Furthermore, inland water
bodies are an essential part of the carbon cycle, contributing to global greenhouse gas
emissions [5]. Over the past few decades, due to global warming and human activities,
there have been significant changes in the quantity, storage, water surface, and area of
inland water resources. An increasing number of lakes are exhibiting environmental
problems such as deteriorating water quality and eutrophication [6]. Eutrophication is
widely considered one of the most severe threats to the health of inland lake ecosystems.
Existing research indicates that nutrients such as phosphorus and nitrogen are the main
factors affecting algal growth, leading to eutrophication [7]. The deterioration of the aquatic
environment poses a threat to human safety and biodiversity. Therefore, it is urgent to
strengthen water quality monitoring, protect the aquatic environment, and enhance the
capability of the rapid dynamic monitoring of water quality.
Lakes, as an essential part of inland water resources, are one of the most important
sources of drinking water globally, and their water quality safety is closely related to public
health [8–11]. Water environment management relies on accurate and timely water quality
assessment, which in turn depends on monitoring water quality indicators. Monitoring
methods are therefore particularly important in the evaluation and management
of water bodies. Survey techniques for monitoring nutrients in water are usually
both time-consuming and expensive. Most traditional water quality monitoring methods
are based on manual on-site sampling, laboratory sample analysis, or portable instrument
measurements. These methods not only consume large amounts of manpower, material
resources, and time but also suffer from data lag, making it difficult to achieve
dynamic water quality monitoring. In addition, traditional point-based sampling
methods cannot effectively monitor large-area distributions or long-term continuous
changes in water bodies [12]. There is an urgent need for a low-cost and effective
method for dynamically monitoring widely distributed nutrients. Remote sensing is an
effective technique for the continuous monitoring of surface water dynamics with greater
spatial coverage and higher temporal frequency. This effectively overcomes the limitations
of data collection in traditional water quality monitoring and helps to characterize lake
changes in different regions [13]. It has been applied to obtain continuously updated
aquatic environments and has successfully generated detailed and consistent datasets for
water quality analysis [14–16]. At present, multispectral and hyperspectral remote sensing
data serve as the primary data sources for monitoring water quality. Yet, multispectral data,
being more accessible, find broader applications. For example, Feng et al. [17] confirmed
that Landsat series data can be used to monitor water quality parameters in inland lakes
and achieve good inversion results.
Chl-a is a key indicator for measuring the biomass of algae in a body of water. Algae,
as the primary producers in aquatic ecosystems, have a direct impact on the ecological
balance and water quality conditions. TP and TN are two main indicators for assessing
the eutrophic status of a body of water. Nitrogen and phosphorus are key nutrients for the
growth of aquatic plants and algae. An increase in their concentrations is a major cause
of eutrophication and algal blooms. COD is an indicator used to measure the amount of
substance in a body of water that can be oxidized chemically, typically used to assess the
content of organic matter in the water. A high COD value indicates a high degree of organic
pollution in the water, which may affect the health of aquatic organisms [18,19].
Poyang Lake, the largest freshwater lake in China and the second-largest lake in the
country, is located in the Yangtze River basin and is an important seasonal lake in the
basin. Poyang Lake plays a significant role in regulating the water level of the Yangtze
River, nurturing water sources, improving the local climate, and maintaining the ecological
balance of the surrounding area. In recent years, inland waters have been severely affected
by flood disasters and intense human activities, making their optical properties very
complex. Therefore, sensors with a high signal-to-noise ratio and a high dynamic range
are needed to effectively measure water bodies with high reflectance [20]. Currently, the
Moderate Resolution Imaging Spectroradiometer (MODIS) collects images with a daily
time resolution and a spatial resolution of 250 m∼500 m, and has been widely used for
the rapid detection of surface water changes [21]. Landsat sensors collect images with
a 16-day time resolution and a high spatial resolution of 30 m, and have been widely
used for annual-scale surface water dynamics [22]. However, the low spatial resolution
of MODIS limits its application on smaller spatial scales, which may result in the loss
of many small water bodies in the results. Although the spatial resolution of Landsat
sensors is relatively high, the 16-day revisit period makes it difficult to obtain cloud-free
images on a monthly scale, leading to missing composite images, making lake monitoring
a considerable challenge. Sentinel-2 is an Earth observation mission of the European
Union’s Copernicus program, which provides optical images with a high revisit frequency
(5 days) and a high spatial resolution (10 m∼60 m). Due to its full spectral and radiometric
characteristics, it provides great convenience for the water quality monitoring of inland
waters. Present studies indicate that the Sentinel-2 MS Instrument sensor enhances not
just the mapping of water quality parameters in global inland waters, but also bolsters
environmental policies through prediction of specific water quality indicators [23].
Remote sensing technology-based methods for inverting optically active parameters
primarily fall into three categories: empirical, semi-empirical, and model analysis methods.
The empirical approach centers on forming a connection between the reflectance of remote
sensing imagery and optically active parameters like Chl-a [24]. However, the relationship
thus established may not always align with the actual correlation. Semi-empirical methods
involve applying appropriate mathematical methods to remote sensing data to estimate
water quality parameters. Although empirical methods are not suitable for regional use,
due to their simplicity of operation, they remain one of the main methods for the remote
sensing monitoring of water quality [25]. The model analysis method emphasizes the
relationship between the actual absorption coefficient and the backscattering coefficient of
remote sensing reflectance, constructing an inversion model between the reflected spectrum
and water body parameters [26], making the model conform to physical interpretation.
However, the complexity of its formulas requires a higher level of derivation and calculation.
Across these approaches, studies of non-optically active parameters are comparatively scarce [27,28].
This is because the relationship between surface reflectance and non-optically active pa-
rameters (such as COD) is indirect and nonlinear, and it is difficult to simulate through
traditional empirical models [29,30]. Therefore, it is necessary to explore their relationship
by utilizing the high correlation between non-optically active parameters and optically
active parameters.
As artificial intelligence technology evolves, machine learning methods are increasingly
being utilized in the inversion of water quality using remote sensing. Owing to its
adaptability, fault tolerance, and self-organization [26], machine learning can simulate
complex relationships, which suits the complex nonlinear relationships involved in remote
sensing water quality inversion [31]. The latest advancements in machine learning are expected
to improve the ability to analyze the complex nonlinear relationships between optically
active parameters, non-optically active parameters, and surface reflectance. Guo et al. [32]
compared the performance of multiple machine learning models in estimating TP and TN
and used the optimal model to draw a water quality distribution map of their research
area. Nguyen et al. [33] assessed the performance of three machine learning models,
including Random Forest (RF), in forecasting harmful cyanobacterial blooms in the Tri An
Reservoir. In a separate study, Guo et al. [34] utilized a machine learning model (Support
Vector Regression (SVR)) to map the spatial distribution of dissolved oxygen in Lake Huron
and examined the influence of climatic factors on long-term trends of dissolved oxygen.
Kim et al. [35] conducted an evaluation of several machine learning algorithms, including
Light Gradient Boosting Machine (LightGBM, [36]), for their effectiveness in estimating
Chl-a in various water bodies using Sentinel-2 imagery. They found that LightGBM demon-
strated high precision and consistency across diverse aquatic environments. Shi et al. [37]
proposed a machine learning model that is more reliable and accurate than empirical
models, revealing the spatiotemporal distribution of Chl-a concentration. Yuan et al. [38]
proposed a spatiotemporal ecological integrated model based on machine learning for
marine ecological environment monitoring. These studies demonstrate that
integrating machine learning algorithms with remote sensing technology enables the accurate
estimation of both optically active and non-optically active parameters.
In most previous studies, the common practice was to calibrate and evaluate various
empirical models or machine learning models, then select a single model with the best
overall accuracy and apply it to the entire body of water being studied. The reality is that
the optical properties of inland lakes are very complex and become even more complex
with spatial and temporal changes. Although the selected model has the best overall
estimation accuracy, its performance may not be ideal in some parts of the water body.
To improve the prediction accuracy of optically complex inland lake water bodies, this
paper proposes an STE model based on multiple machine learning methods. In this model,
each machine learning method is trained in different branches at the same time, and the
final evaluation result is jointly determined by the output results of all branches and the
overfitting avoidance algorithm. During model training,
we use as predictors both the individual bands and the band combinations that previous
research has shown to be related to water quality parameters. Compared with single models,
our results show that the STE model can substantially improve the
evaluation accuracy for inland lake water bodies. The main objectives of this study are
summarized as follows:
• We propose an STE model that combines advanced machine learning methods (Extreme
Gradient Boosting (XGBoost, [39]), LightGBM, and Categorical Boosting Machine
(CatBoost, [40])) using an ensemble strategy to enhance the robustness of
the model.
• Utilizing high spatio-temporal resolution Sentinel-2 imagery, lake water quality
parameters, and the STE model, we construct the spatio-temporal pattern of Chl-a,
TP, TN and COD in Poyang Lake from 2017 to 2018. We analyze the intra-annual
(monthly, seasonal) and spatial variation characteristics of Poyang Lake, aiming to
provide a scientific basis for the water quality monitoring of water sources through
the spatio-temporal distribution of different water quality parameters.
• We demonstrate the feasibility and advantages of the STE model based on Sentinel-2
images in water quality monitoring under multiple spatiotemporal scenarios.
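The branch-and-combine idea behind the STE model can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the text does not specify how the branch outputs are merged or what the overfitting avoidance step does, so the sketch assumes inverse-RMSE weighting on held-out validation data and leaves the overfitting step out. The function names and the stand-in prediction lists are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def inverse_rmse_weights(observed, branch_predictions):
    """Weight each branch by 1/RMSE on validation data (assumed rule, not from the paper)."""
    eps = 1e-12  # guards against division by zero for a perfect branch
    raw = {name: 1.0 / (rmse(observed, preds) + eps)
           for name, preds in branch_predictions.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def combine_branches(branch_predictions, weights):
    """Weighted average of the branch outputs, sample by sample."""
    names = list(branch_predictions)
    n = len(branch_predictions[names[0]])
    return [sum(weights[name] * branch_predictions[name][i] for name in names)
            for i in range(n)]


# Hypothetical validation outputs standing in for the three trained branches.
observed = [2.0, 4.0, 6.0]
validation = {
    "xgboost": [2.1, 3.9, 6.2],
    "lightgbm": [1.8, 4.3, 5.7],
    "catboost": [2.0, 4.1, 6.1],
}
weights = inverse_rmse_weights(observed, validation)
ensemble = combine_branches(validation, weights)
```

In this toy example the branch with the smallest validation error receives the largest weight; a plain average or a stacked meta-learner would be equally plausible choices.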
It is hoped that this study can provide a reference for further research on the water
environment of Poyang Lake, as well as guidance for water quality management and
improvement and for the protection of its aquatic ecology.
In this study, spring, autumn, and winter refer to March–May, September–November, and
December–February, respectively.
Fuhe, Xinjiang, Raohe, Xiuhe and Boyang River, Xihe, etc., and after regulation and storage,
it flows northward into the Yangtze River from the lake mouth, with an annual average
inflow of 146 billion cubic meters into the Yangtze River. The Poyang Lake water system
basin covers an area of 162,200 km², accounting for about 97% of the basin area of Jiangxi
Province and 9% of the Yangtze River basin area. Poyang Lake plays an important role in
regulating the water balance between the basin and the main stream of the Yangtze River,
and in various ecological functions such as flood storage and maintaining biodiversity.
Figure 2. Location of the study area. The left-hand part shows the geographical location and the
right-hand part shows the composition of the lake.
Table 1. The Sentinel-2 bands used in this study and their parameters.
Variable         Resolution (m)   Description
B2               10               Visible blue
B3               10               Visible green
B4               10               Visible red
B5               10               Near-infrared
B6               10               Near-infrared
B7               10               Near-infrared
B8               10               Near-infrared
B8A              10               Near-infrared
B11              10               Shortwave infrared
B12              10               Shortwave infrared
Enhanced Three   10               B6 − B5
NDCI             10               (B5 − B4)/(B5 + B4)
TPindex          10               B2/(B3 + B4 + B12)
TNindex          10               (B11 − B12)/(B5 + B8A)
CODindex         10               (B6 + B8A)/(B4 − B12)
Table 2. Sampling dates and mean values of sampling data for the study area.
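The index formulas listed above can be computed per pixel from the band reflectances. The following is a minimal sketch: the formulas are taken from the table rows, while the function name, the zero-denominator handling, and the reflectance values are hypothetical.

```python
def spectral_indices(b):
    """Compute the auxiliary indices from a dict of band reflectances."""
    def safe_div(num, den):
        # Return None instead of raising when a denominator is zero.
        return num / den if den != 0 else None

    return {
        "Enhanced Three": b["B6"] - b["B5"],
        "NDCI": safe_div(b["B5"] - b["B4"], b["B5"] + b["B4"]),
        "TPindex": safe_div(b["B2"], b["B3"] + b["B4"] + b["B12"]),
        "TNindex": safe_div(b["B11"] - b["B12"], b["B5"] + b["B8A"]),
        "CODindex": safe_div(b["B6"] + b["B8A"], b["B4"] - b["B12"]),
    }


# Hypothetical reflectance values for one water pixel.
pixel = {"B2": 0.04, "B3": 0.06, "B4": 0.05, "B5": 0.07, "B6": 0.03,
         "B8A": 0.02, "B11": 0.012, "B12": 0.01}
indices = spectral_indices(pixel)
```

CODindex has a difference in its denominator, so it is undefined wherever B4 equals B12; the sketch returns None for such pixels rather than guessing a value.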
\[
r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{1}
\]
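Equation (1) is the Pearson correlation coefficient, and it translates directly into code. The sketch below uses only the standard library; the function name and the error handling for constant inputs are illustrative choices.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r(x, y) as in Equation (1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Numerator: sum of products of deviations from the means.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    denom = math.sqrt(ss_x * ss_y)
    if denom == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / denom
```

For example, `pearson_r(band_values, chl_a_values)` would give the correlation between one predictor and one measured parameter, where both argument names are placeholders for equal-length lists.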
CatBoost is a machine learning algorithm based on gradient boosting decision trees (GBDT).
Its distinguishing feature is its ability to handle heterogeneous features, noisy data, and
complex dependencies. The algorithm employs a method based on target statistics to reduce
computational complexity and uses Bayesian optimization to reduce the risk of overfitting.
During model construction, CatBoost uses a greedy search strategy, progressively integrating weak
models to build a powerful predictive model [57]. CatBoost also introduces an
ordered boosting method that changes the gradient estimation used in classical algorithms.
This method effectively overcomes the prediction shift caused by gradient bias, thereby
further enhancing the generalization ability of the model [40].
pi and p̂i represent the observed and model-estimated concentrations of water quality
parameters for sample i, N represents the total number of observations, and p̄i represents
the mean of the observed values.
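The three evaluation metrics can be computed from these quantities as follows. The sketch assumes the standard definitions of R², RMSE, and MAE in terms of the observed values, the estimated values, N, and the observed mean; the function name and sample numbers are hypothetical.

```python
import math


def regression_metrics(observed, estimated):
    """Return R2, RMSE and MAE under their standard definitions."""
    n = len(observed)
    if n == 0 or n != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    residuals = [p - p_hat for p, p_hat in zip(observed, estimated)]
    ss_res = sum(e ** 2 for e in residuals)              # residual sum of squares
    ss_tot = sum((p - mean_obs) ** 2 for p in observed)  # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,  # fails if all observations are identical
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in residuals) / n,
    }


# Hypothetical observed and estimated concentrations.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Under these definitions MAE can never exceed RMSE for the same sample, which gives a quick consistency check on reported values.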
exists between Chl-a and TN, while the correlations among other water quality parameters
are relatively weak. Based on the strength of the correlation, it can be inferred that the
water quality of Poyang Lake is influenced by various parameters to some extent, leading
to its spatiotemporal variation.
[Figure 3 panels: (a) Chl-a; (b) TP; (c) TN; (d) COD]
Figure 3. Performances of models for water quality parameters (Chl-a, TP, TN, and COD).
Table 5. Comparison of model performance metrics (R², RMSE and MAE) between STE model and
other models.
Columns: Parameters | Metrics | Model (STE, XGBoost, CatBoost, LightGBM, RF, SVR)
[Figure 4 panels: (a) Chl-a; (b) TP; (c) TN; (d) COD]
Figure 4. Average Chl-a, TP, TN, and COD concentrations retrieved by the STE model between 2017
and 2018.
[Figure 5 panels: rows for spring, autumn, and winter; columns for Chl-a (mg/m³, colour scale 1.1 to 7.9),
TP (mg/L, 0.01 to 0.3), TN (mg/L, 1.1 to 2.5), and COD (mg/L, 1.2 to 4.05)]
Figure 5. Mapping of the seasonal variation distributions of Chl-a, TP, TN and COD.
relatively high TP concentration at the lake’s head might be associated with the widespread
presence of industrial parks and heightened human activity in that area. A large amount of
nutrients from urbanization and industrialization flow into the lake head, resulting in high
TP concentration.
[Figure panels for 2018.04: Chl-a (colour scale 2.54 to 14.11), TP (0.14 to 0.18), TN (2.26 to 2.70),
and COD (1.55 to 3.88)]
4. Discussion
In this study, we resampled Sentinel-2 images to enhance their spatial resolution to
10 m. This is because in the spectral indices and inversion models we constructed, the
initial resolution of most bands is either 10 m (e.g., bands 2, 3, 4, and 8) or 20 m (e.g., bands
5, 6, 7, 8A, 11, and 12). By resampling these coarse spatial resolution bands to a finer spatial
resolution, we can enrich the spatial information. Considering the complexity of the optical
environment of inland lakes and the spatial variability of water quality, in order to better
retain spatial details and display the spatial variation of water quality as much as possible,
we chose to resample the coarse spatial resolution bands to 10 m.
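Resampling a 20 m band onto the 10 m grid can be illustrated with a small sketch. The text does not name the interpolation method, so nearest-neighbour resampling is assumed here; the function name and the toy grid are hypothetical.

```python
def upsample_nearest(grid, factor=2):
    """Repeat each cell factor x factor times (nearest-neighbour resampling)."""
    out = []
    for row in grid:
        # Widen the row: each value is repeated `factor` times along the columns.
        expanded = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times along the rows.
        for _ in range(factor):
            out.append(list(expanded))
    return out


# A toy 2 x 2 grid of 20 m reflectance values becomes a 4 x 4 grid at 10 m.
band_20m = [[0.10, 0.20],
            [0.30, 0.40]]
band_10m = upsample_nearest(band_20m)
```

Nearest-neighbour resampling keeps the original reflectance values, so the 20 m bands line up cell for cell with the native 10 m bands when the indices are computed.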
Cloud contamination is one of the main factors limiting the application of remote
sensing images: where cloud obstructs the view, surface information cannot be observed.
We therefore performed cloud removal on the Sentinel-2 remote sensing images.
Cloud removal significantly reduces the extent to which cloud obscures the
surface and makes surface information clearer, which is particularly
important for the quantitative extraction of surface features, such as the
surface reflectance of water bodies in this study. We also shortened the time
window for selecting images, which helps to capture the images acquired closest to
the sampling time of the water quality parameters, reduces the error caused by
long intervals between sampling and acquisition, and improves the accuracy and reliability of the results.
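Matching each field sample to the nearest image within a time window might look like the sketch below. The window length is not stated in the text, so `max_gap_days` and all dates here are hypothetical, as is the function name.

```python
from datetime import date


def closest_image(sample_date, image_dates, max_gap_days):
    """Return the acquisition date nearest to sample_date, or None if none is within the window."""
    candidates = [d for d in image_dates
                  if abs((d - sample_date).days) <= max_gap_days]
    if not candidates:
        return None
    # Pick the candidate with the smallest absolute gap in days.
    return min(candidates, key=lambda d: abs((d - sample_date).days))


# Hypothetical sampling date and Sentinel-2 acquisition dates.
sampling = date(2018, 4, 15)
acquisitions = [date(2018, 4, 3), date(2018, 4, 13), date(2018, 4, 23)]
match = closest_image(sampling, acquisitions, max_gap_days=3)
```

A narrower window yields fewer matched pairs but smaller differences between the water state at sampling and at acquisition, which is the trade-off described above.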
This paper aims to enhance the accuracy and efficiency of remote sensing inversion
of water quality parameters using machine learning ensemble models. The theoretical
basis for the remote sensing quantitative inversion of water quality parameters is the
significant difference in reflectance within a certain range due to the difference in water
component content. This paper explores the potential of using various machine learning
ensemble models to retrieve water quality indices (both optically active and non-optically active parameters).
The machine learning ensemble model reflects the complex nonlinear relationship between
water quality parameters and spectral reflectance. Therefore, the changes in reflectance in
the study area are consistent with the changes in water quality parameter values. Typically,
the process of inverting water quality parameters using satellite imagery involves analyzing
the correlation between these parameters and remote sensing reflectance to construct a
remote sensing inversion model. In practical research, it is common to form a robust link
between point data from field sampling and surface data from remote sensing pixels of
varying spatial resolutions. Both remote sensing observation values and sampling point
measurement values need to be corrected based on ground reference points. The conversion
of the two will inevitably produce some errors, which is a kind of uncertainty in quantitative
remote sensing inversion. Therefore, we choose to use high spatial resolution images to
reduce the errors brought by scale effects in the inversion process, thereby improving the
accuracy and efficiency of water quality monitoring.
In addition, the proposed model still has potential room for improvement. First, the
accuracy of the model highly depends on the input data. The errors in in-situ measurement
data and Sentinel-2 satellite image processing may increase the uncertainty of the model,
especially considering that the data we use come from different laboratories and adopt
different processing methods and standards. In addition, although we performed
cloud removal on the Sentinel-2 data to reduce contamination from clouds and cloud
shadows and to retain more valid information, this step may itself introduce some errors.
Future lake water quality assessments must therefore take
these sources of variability into account. Finally, our STE model did not incorporate spatial
information and timestamps into model construction, which may affect the generalization
ability of the model. Therefore, in future work, we plan to develop a reasonable spatio-
temporal coding method to further improve the generalization ability of the model. This
will be the focus of our attention and improvements in the next step.
5. Conclusions
In this study, we established a high-precision water quality parameter estimation
model based on ensemble learning and used the 10 m high-resolution imagery of Sentinel-2
to monitor the seasonal changes of Poyang Lake from 2017 to 2018, and conducted a
preliminary analysis of the spatio-temporal distribution of water quality in Poyang Lake.
The conclusions of this study can be summarized as follows:
• We included multiple related indices, such as NDCI and Enhanced Three, as predictors.
These indices have previously been used for the inversion of water quality in inland
lakes, where they showed high correlation with multiple water quality parameters. They
strengthen the link between Sentinel-2 remote sensing data
and water quality parameters, thereby greatly enhancing the predictive potential of
the model.
• We proposed a new STE model, which combines advanced machine learning methods
and uses an ensemble strategy to enhance the robustness of the model. The
results show that the model achieves accurate predictions
(R² > 0.85). In addition, the water quality parameters predicted by the model
are very close to the field measurements, so the model is well suited to the inversion of
water quality parameters of medium-sized water bodies.
• We used the STE model to draw a distribution map of the seasonal and spatial changes
in the study area from 2017 to 2018, and found that the water quality parameter
values of Poyang Lake generally showed an upward trend and had certain seasonal
changes. From the figure, it can be seen that the concentrations of Chl-a and TN at
the tail of Poyang Lake are higher than those in the lake, and the TP concentration at
the head of the lake is relatively high. Overall, the water quality of Poyang Lake is
good, and corresponding water quality management measures should continue to
be implemented.
This research offers a practical and effective approach for the surveillance and man-
agement of water quality parameters in inland water areas. Future endeavors will involve
exploring the alterations in water quality parameters of inland waters based on the STE
model, contributing to the safety and administration of inland water quality. Furthermore,
the accuracy of the method could be improved by utilizing multiple data sources.
Author Contributions: Conceptualization, C.P.; methodology, C.P.; software, C.P.; validation, C.P.
and X.J.; formal analysis, C.P.; resources, X.J.; data curation, Z.X.; writing—original draft preparation,
C.P.; writing—review and editing, C.P.; visualization, Z.X.; supervision, X.J.; project administra-
tion, X.J.; funding acquisition, Z.X. All authors have read and agreed to the published version of
the manuscript.
Funding: This work was supported by National Natural Science Foundation of China (Grant No.
U20A20121); Ningbo public welfare project (Grant No. 202002N3109, 2022S094); Natural Science
Foundation of Zhejiang Province (Grant No. LY21F020006); The international cooperation project of
Ningbo (Grant No. 2023H012, 2023H007); Science and Technology Innovation 2025 Major Project of
Ningbo (Grant No. 2019B10125, 2019B10028, 2020Z016, 2021Z031, 2022Z074, 2022Z241, 2023Z132,
2023Z133, 2023Z216, 2023Z180); Ningbo Fenghua District industrial chain key core technology
“unveiled the commander” project (Grant No. 202106206).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to privacy reasons.
Acknowledgments: We are grateful for the following data providers: ESA for Sentinel-2 images and
He Liu et al. from the Nanjing Institute of Geography and Limnology, Chinese Academy of Sciences
for Water quality parameter data.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Duan, Z.; Bastiaanssen, W. Estimating water volume variations in lakes and reservoirs from four operational satellite altimetry
databases and satellite imagery data. Remote Sens. Environ. 2013, 134, 403–416.
2. Messager, M.L.; Ettinger, A.K.; Murphy-Williams, M.; Levin, P.S. Fine-scale assessment of inequities in inland flood vulnerability.
Appl. Geogr. 2021, 133, 102492.
3. Verpoorter, C.; Kutser, T.; Seekell, D.A.; Tranvik, L.J. A global inventory of lakes based on high-resolution satellite imagery.
Geophys. Res. Lett. 2014, 41, 6396–6402.
4. Yang, K.; Smith, L.C. Internally drained catchments dominate supraglacial hydrology of the southwest Greenland Ice Sheet.
J. Geophys. Res. Earth Surf. 2016, 121, 1891–1910.
5. Zhao, G.; Li, Y.; Zhou, L.; Gao, H. Evaporative water loss of 1.42 million global lakes. Nat. Commun. 2022, 13, 3686.
6. Ho, J.C.; Michalak, A.M.; Pahlevan, N. Widespread global increase in intense lake phytoplankton blooms since the 1980s. Nature
2019, 574, 667–670.
7. Zhang, S.; Yager, P.L.; Liang, C.; Shen, Z.; Xian, W. Distribution and spatial-temporal variation of organic matter along the Yangtze
River-ocean continuum. Elem. Sci. Anth. 2022, 10, 00034.
8. Alcântara, E.; Bernardo, N.; Rodrigues, T.; Watanabe, F. Modeling the spatio-temporal dissolved organic carbon concentration in
Barra Bonita reservoir using OLI/Landsat-8 images. Model. Earth Syst. Environ. 2017, 3, 11.
9. Watanabe, F.S.Y.; Alcântara, E.; Rodrigues, T.W.P.; Imai, N.N.; Barbosa, C.C.F.; Rotta, L.H.d.S. Estimation of chlorophyll-a
concentration and the trophic state of the Barra Bonita hydroelectric reservoir using OLI/Landsat-8 images. Int. J. Environ. Res.
Public Health 2015, 12, 10391–10417.
10. Li, Y.; Zhang, Y.; Shi, K.; Zhou, Y.; Zhang, Y.; Liu, X.; Guo, Y. Spatiotemporal dynamics of chlorophyll-a in a large reservoir as
derived from Landsat 8 OLI data: Understanding its driving and restrictive factors. Environ. Sci. Pollut. Res. 2018, 25, 1359–1374.
11. Xiao, H.; Krauss, M.; Floehr, T.; Yan, Y.; Bahlmann, A.; Eichbaum, K.; Brinkmann, M.; Zhang, X.; Yuan, X.; Brack, W.; et al.
Effect-directed analysis of aryl hydrocarbon receptor agonists in sediments from the Three Gorges Reservoir, China. Environ. Sci.
Technol. 2016, 50, 11319–11328.
12. Chawla, I.; Karthikeyan, L.; Mishra, A.K. A review of remote sensing applications for water security: Quantity, quality, and
extremes. J. Hydrol. 2020, 585, 124826.
13. Wang, J.; Song, C.; Reager, J.T.; Yao, F.; Famiglietti, J.S.; Sheng, Y.; MacDonald, G.M.; Brun, F.; Schmied, H.M.; Marston, R.A.; et al.
Recent global decline in endorheic basin water storages. Nat. Geosci. 2018, 11, 926–932.
14. Guo, S.; Sun, B.; Zhang, H.K.; Liu, J.; Chen, J.; Wang, J.; Jiang, X.; Yang, Y. MODIS ocean color product downscaling via
spatio-temporal fusion and regression: The case of chlorophyll-a in coastal waters. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 340–361.
15. He, J.; Chen, Y.; Wu, J.; Stow, D.A.; Christakos, G. Space-time chlorophyll-a retrieval in optically complex waters that accounts for
remote sensing and modeling uncertainties and improves remote estimation accuracy. Water Res. 2020, 171, 115403.
16. Tran, T.V.; Tran, D.X.; Myint, S.W.; Huang, C.Y.; Pham, H.V.; Luu, T.H.; Vo, T.M. Examining spatiotemporal salinity dynamics
in the Mekong River Delta using Landsat time series imagery and a spatial regression approach. Sci. Total Environ. 2019,
687, 1087–1097.
17. Feng, Q.; Cheng, X.; Shen, X.; Xiao, X.; Wang, L.; Zhang, W. Inland Riverine Turbidity Estimation for Hanjiang River with Landsat
8 OLI Imager. J. Wuhan Univ. (Inf. Sci. Ed.) 2017, 42, 643–647.
18. Dong, G.; Hu, Z.; Liu, X.; Fu, Y.; Zhang, W. Spatio-temporal variation of total nitrogen and ammonia nitrogen in the water source
of the middle route of the South-to-North Water Diversion Project. Water 2020, 12, 2615. [CrossRef]
19. Wang, Z.; Wei, L.; He, C.; Lu, Q. Ammonia nitrogen monitoring of urban rivers with UAV-borne hyperspectral remote sensing
imagery. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium,
11–16 July 2021; pp. 3713–3716.
20. Barnes, B.B.; Hu, C. Dependence of satellite ocean color data products on viewing angles: A comparison between SeaWiFS,
MODIS, and VIIRS. Remote Sens. Environ. 2016, 175, 120–129. [CrossRef]
21. Tellman, B.; Sullivan, J.A.; Kuhn, C.; Kettner, A.J.; Doyle, C.S.; Brakenridge, G.R.; Erickson, T.A.; Slayback, D.A. Satellite imaging
reveals increased proportion of population exposed to floods. Nature 2021, 596, 80–86. [CrossRef] [PubMed]
22. Olthof, I.; Rainville, T. Dynamic surface water maps of Canada from 1984 to 2019 Landsat satellite imagery. Remote Sens. Environ.
2022, 279, 113121. [CrossRef]
23. Achmad, A.R.; Syifa, M.; Park, S.J.; Lee, C.W. Geomorphological transition research for affecting the coastal environment due to
the volcanic eruption of Anak Krakatau by satellite imagery. J. Coast. Res. 2019, 90, 214–220. [CrossRef]
24. Jiang, Q. Study on the Effectiveness Evaluation Method of Satellite Remote Sensing in the Monitoring of Lake and Reservoir Water Quality:
Take GF-1 Satellite as an Example; Lanzhou Jiaotong University: Lanzhou, China, 2020. [CrossRef]
25. Barrett, D.C.; Frazier, A.E. Automated method for monitoring water quality using Landsat imagery. Water 2016, 8, 257. [CrossRef]
26. Wang, S.M.; Qin, B.Q. Research progress on remote sensing monitoring of lake water quality parameters. Huan Jing Ke Xue = Huanjing
Kexue 2023, 44, 1228–1243.
27. Sagan, V.; Peterson, K.T.; Maimaitijiang, M.; Sidike, P.; Sloan, J.; Greeling, B.A.; Maalouf, S.; Adams, C. Monitoring inland water
quality using remote sensing: Potential and limitations of spectral indices, bio-optical simulations, machine learning, and cloud
computing. Earth-Sci. Rev. 2020, 205, 103187. [CrossRef]
28. Xiong, Y.; Ran, Y.; Zhao, S.; Zhao, H.; Tian, Q. Remotely assessing and monitoring coastal and inland water quality in China:
Progress, challenges and outlook. Crit. Rev. Environ. Sci. Technol. 2020, 50, 1266–1302. [CrossRef]
29. Xiong, J.; Lin, C.; Cao, Z.; Hu, M.; Xue, K.; Chen, X.; Ma, R. Development of remote sensing algorithm for total phosphorus
concentration in eutrophic lakes: Conventional or machine learning? Water Res. 2022, 215, 118213. [CrossRef]
30. Yu, X.; Yi, H.; Liu, X.; Wang, Y.; Liu, X.; Zhang, H. Remote-sensing estimation of dissolved inorganic nitrogen concentration in the
Bohai Sea using band combinations derived from MODIS data. Int. J. Remote Sens. 2016, 37, 327–340. [CrossRef]
31. Cao, X.; Zhang, J.; Meng, H.; Lai, Y.; Xu, M. Remote sensing inversion of water quality parameters in the Yellow River Delta. Ecol.
Indic. 2023, 155, 110914. [CrossRef]
32. Guo, H.; Huang, J.J.; Chen, B.; Guo, X.; Singh, V.P. A machine learning-based strategy for estimating non-optically active water
quality parameters using Sentinel-2 imagery. Int. J. Remote Sens. 2021, 42, 1841–1866. [CrossRef]
33. Nguyen, H.Q.; Ha, N.T.; Pham, T.L. Inland harmful cyanobacterial bloom prediction in the eutrophic Tri An Reservoir using
satellite band ratio and machine learning approaches. Environ. Sci. Pollut. Res. 2020, 27, 9135–9151. [CrossRef] [PubMed]
34. Guo, H.; Huang, J.J.; Zhu, X.; Wang, B.; Tian, S.; Xu, W.; Mai, Y. A generalized machine learning approach for dissolved oxygen
estimation at multiple spatiotemporal scales using remote sensing. Environ. Pollut. 2021, 288, 117734. [CrossRef] [PubMed]
35. Kim, Y.W.; Kim, T.; Shin, J.; Lee, D.S.; Park, Y.S.; Kim, Y.; Cha, Y. Validity evaluation of a machine-learning model for chlorophyll
a retrieval using Sentinel-2 from inland and coastal waters. Ecol. Indic. 2022, 137, 108737. [CrossRef]
36. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision
tree. Adv. Neural Inf. Process. Syst. 2017, 9, 3149–3157.
37. Shi, X.; Gu, L.; Jiang, T.; Zheng, X.; Dong, W.; Tao, Z. Retrieval of chlorophyll-a concentrations using Sentinel-2 MSI imagery in
Lake Chagan based on assessments with machine learning models. Remote Sens. 2022, 14, 4924. [CrossRef]
38. Zhang, Y.; Shen, F.; Sun, X.; Tan, K. Marine big data-driven ensemble learning for estimating global phytoplankton group
composition over two decades (1997–2020). Remote Sens. Environ. 2023, 294, 113596. [CrossRef]
39. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
40. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv.
Neural Inf. Process. Syst. 2018, 11, 6639–6649.
41. Song, L.; Song, C.; Luo, S.; Chen, T.; Liu, K.; Li, Y.; Jing, H.; Xu, J. Refining and densifying the water inundation area and storage
estimates of Poyang Lake by integrating Sentinel-1/2 and bathymetry data. Int. J. Appl. Earth Obs. Geoinf. 2021, 105, 102601.
[CrossRef]
42. Salameh, E.; Frappart, F.; Turki, I.; Laignel, B. Intertidal topography mapping using the waterline method from Sentinel-1 &-2
images: The examples of Arcachon and Veys Bays in France. ISPRS J. Photogramm. Remote Sens. 2020, 163, 98–120.
43. Yang, K.; Smith, L.C.; Sole, A.; Livingstone, S.J.; Cheng, X.; Chen, Z.; Li, M. Supraglacial rivers on the northwest Greenland
Ice Sheet, Devon Ice Cap, and Barnes Ice Cap mapped using Sentinel-2 imagery. Int. J. Appl. Earth Obs. Geoinf. 2019, 78, 1–13.
[CrossRef]
44. Vanhellemont, Q. Adaptation of the dark spectrum fitting atmospheric correction for aquatic applications of the Landsat and
Sentinel-2 archives. Remote Sens. Environ. 2019, 225, 175–192. [CrossRef]
45. Saberioon, M.; Brom, J.; Nedbal, V.; Souček, P.; Císař, P. Chlorophyll-a and total suspended solids retrieval and mapping using
Sentinel-2A and machine learning for inland waters. Ecol. Indic. 2020, 113, 106236. [CrossRef]
46. Liu, H.; Zhang, Q.; Niu, Y.; Xu, L.; Hu, Y. A Dataset of Water Environment Survey in the Poyang Lake from 2013 to 2018; Science Data
Bank: Beijing, China, 2019.
47. Mishra, S.; Mishra, D.R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a
concentration in turbid productive waters. Remote Sens. Environ. 2012, 117, 394–406. [CrossRef]
48. Yang, W.; Matsushita, B.; Chen, J.; Fukushima, T.; Ma, R. An enhanced three-band index for estimating chlorophyll-a in turbid
case-II waters: Case studies of Lake Kasumigaura, Japan, and Lake Dianchi, China. IEEE Geosci. Remote Sens. Lett. 2010,
7, 655–659. [CrossRef]
49. Pena, M.; van den Dool, H. Consolidation of multimodel forecasts by ridge regression: Application to Pacific sea surface
temperature. J. Clim. 2008, 21, 6521–6538. [CrossRef]
50. Hosseiny, B.; Mahdianpari, M.; Brisco, B.; Mohammadimanesh, F.; Salehi, B. WetNet: A spatial–temporal ensemble deep learning
model for wetland classification using Sentinel-1 and Sentinel-2. IEEE Trans. Geosci. Remote Sens. 2021, 60, 3113856. [CrossRef]
51. Zhou, T.; Lu, H.; Yang, Z.; Qiu, S.; Huo, B.; Dong, Y. The ensemble deep learning model for novel COVID-19 on CT images. Appl.
Soft Comput. 2021, 98, 106885. [CrossRef] [PubMed]
52. Ganaie, M.A.; Hu, M.; Malik, A.; Tanveer, M.; Suganthan, P. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022,
115, 105151. [CrossRef]
53. Parsa, A.B.; Movahedi, A.; Taghipour, H.; Derrible, S.; Mohammadian, A.K. Toward safer highways, application of XGBoost and
SHAP for real-time accident detection and feature analysis. Accid. Anal. Prev. 2020, 136, 105405. [CrossRef] [PubMed]
54. Su, H.; Lu, X.; Chen, Z.; Zhang, H.; Lu, W.; Wu, W. Estimating coastal chlorophyll-a concentration from time-series OLCI data
based on machine learning. Remote Sens. 2021, 13, 576. [CrossRef]
55. Wang, N.; Zhang, G.; Pang, W.; Ren, L.; Wang, Y. Novel monitoring method for material removal rate considering quantitative
wear of abrasive belts based on LightGBM learning algorithm. Int. J. Adv. Manuf. Technol. 2021, 114, 3241–3253. [CrossRef]
56. Zhang, T.; Su, H.; Yang, X.; Yan, X. Remote sensing prediction of global subsurface thermohaline and the impact of longitude and
latitude based on LightGBM. J. Remote Sens. 2020, 24, 1255–1269. [CrossRef]
57. Zhang, Y.; Zhao, Z.; Zheng, J. CatBoost: A new approach for estimating daily reference crop evapotranspiration in arid and
semi-arid regions of Northern China. J. Hydrol. 2020, 588, 125087. [CrossRef]
58. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.