https://doi.org/10.3390/toxins15100608
Toxins
Article
Ensemble Machine Learning of Gradient Boosting (XGBoost,
LightGBM, CatBoost) and Attention-Based CNN-LSTM for
Harmful Algal Blooms Forecasting
Jung Min Ahn *, Jungwook Kim and Kyunghyun Kim
Water Quality Assessment Research Division, Water Environment Research Department, National Institute of
Environmental Research, Incheon 22689, Republic of Korea; [email protected] (J.K.); [email protected] (K.K.)
* Correspondence: [email protected]; Tel.: +82-32-560-7490; Fax: +82-32-568-2053
Abstract: Harmful algal blooms (HABs) are a serious threat to ecosystems and human health.
The accurate prediction of HABs is crucial for their proactive preparation and management. While
mechanism-based numerical modeling, such as the Environmental Fluid Dynamics Code (EFDC),
has been widely used in the past, the recent development of machine learning technology with
data-based processing capabilities has opened up new possibilities for HABs prediction. In this study,
we developed and evaluated two types of machine learning-based models for HABs prediction:
Gradient Boosting models (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM models.
We used Bayesian optimization for hyperparameter tuning and applied bagging and stacking
ensemble techniques to obtain the final prediction results, whose applicability to HABs
prediction was then evaluated. When predicting HABs with an ensemble technique, the overall
prediction performance can be improved by complementing the strengths of each model and
averaging out errors, such as the overfitting of individual models. Our study highlights the
potential of machine learning-based models for HABs prediction and emphasizes the need to
incorporate the latest technology into this important field.
Keywords: harmful algal blooms; Gradient Boosting; attention-based CNN-LSTM; Bayesian opti-
mization; ensemble techniques
Key Contribution: We developed the Gradient Boosting (XGBoost, LightGBM, CatBoost) series and
the attention-based CNN-LSTM model for HABs prediction.

1. Introduction

Various artificial environmental changes caused by continuous human activities, such as the
Four Major Rivers Restoration Project and global climate change, are altering the aquatic
environment and increasing the frequency of harmful algal blooms (HABs). Recently, in the
Republic of Korea, the problem of water source management has been raised because HABs occur
in the water source section every summer and cause extensive damage, such as the death of
aquatic organisms. As a result, the need to preemptively predict and respond to HABs is
emerging. Economic losses from HABs over the past 30 years have been estimated at
USD 121 million. The occurrence, duration, and frequency of HABs are increasing, posing a
serious threat to aquatic ecosystems.

The National Institute of Environmental Research (NIER) integrated the water quality
forecasting system and the algae warning system in 2020 as a system for managing HABs, and it
provides HABs forecast information to HABs management institutions and the general public so
that HABs can be managed preemptively through forecasting. Improving the accuracy of HABs
prediction by upgrading the underlying prediction technology is therefore very important.
Various studies are being conducted to predict HABs as a means of quickly preparing a policy
management plan before or when HABs are expected to occur. Previous studies have focused on
improving HABs monitoring technology and raising awareness, and mechanism-based numerical
modeling, such as the Environmental Fluid Dynamics Code (EFDC), has been considered an
approach to understanding and mitigating the effects of HABs. Recently, machine learning
technology with large data processing capability has been attracting attention and is used in
various fields such as voice recognition, image analysis, and biological mechanisms. Among
the various time-series machine learning algorithms, Gradient Boosting and deep learning
technologies are being advanced and applied to diverse topics. Artificial intelligence (AI)
methods contribute significantly to controlling a system, informing the decisions to be made
about it and future strategies, and increasing efficiency [1].
Gradient Boosting is generally known to achieve higher prediction performance than random
forest. Because an ensemble model is constructed from multiple decision trees, it shows high
prediction performance, and because each decision tree learns by predicting the residual
error of the previous tree, overfitting is mitigated. Representative implementations are
eXtreme Gradient Boosting (XGBoost) [2], Light Gradient Boosting Machine (LightGBM) [3], and
Categorical Boosting (CatBoost) [4].
XGBoost, LightGBM, and CatBoost are all machine learning libraries based on the Gradient
Boosting algorithm. XGBoost was released in 2014 and gained popularity by performing well on
large datasets and winning many data science competitions; it has since added features such
as GPU learning and distributed learning through version updates. LightGBM, developed by
Microsoft in 2017, is faster and uses less memory than XGBoost and is designed for high speed
on large data while maintaining high accuracy even on small data samples. CatBoost, developed
by Yandex in 2017, has strengths in handling categorical variables; it automatically applies
regularization to prevent overfitting and enables fast learning on both CPU and GPU.
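As a concrete illustration, the following minimal sketch (not from the paper; the data are
random placeholders) fits all three libraries on the same regression task through their
scikit-learn-style interfaces:

```python
# Minimal comparison of the three Gradient Boosting libraries on one
# regression task; the feature/target arrays are hypothetical placeholders.
import numpy as np
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 6))   # e.g., water temperature, pH, T-P, ... (illustrative)
y = rng.random(500)        # e.g., log-transformed HABs cell counts (illustrative)

models = {
    "XGBoost": XGBRegressor(n_estimators=200, random_state=0),
    "LightGBM": LGBMRegressor(n_estimators=200, random_state=0),
    "CatBoost": CatBoostRegressor(iterations=200, random_seed=0, verbose=0),
}
for name, model in models.items():
    model.fit(X[:400], y[:400])     # train on the first 400 rows
    preds = model.predict(X[400:])  # predict the held-out rows
    print(name, preds[:3])
```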
Research on deep learning technology began with the RNN (Recurrent Neural Network) model [5],
which is structured to calculate the current output value by considering previous input
values. The LSTM (Long Short-Term Memory) model [6] and the GRU (Gated Recurrent Unit)
model [7] were subsequently published; the GRU has a simpler structure than the LSTM and
improves on it by using gates to update the state of a memory cell. To address the problem
that the input sequence and the output sequence can differ in length, the Seq2Seq
(Sequence-to-Sequence) model [8], which uses two RNN models as an encoder and a decoder, was
introduced. To overcome the limitation of RNN models that all information in the input
sequence is weighted equally, Bahdanau et al. [9] developed the attention mechanism, a method
of extracting information by focusing only on the necessary parts of the input sequence when
calculating the output value. The transformer model, which develops the attention mechanism
further into a multi-head attention form, was introduced by Vaswani et al. [10]. The Temporal
Convolutional Network (TCN) model, which combines a 1D-CNN (Convolutional Neural Network)
with models such as the RNN, LSTM, and GRU, was proposed by Oord et al. [11]. Lim et al. [12]
proposed the Temporal Fusion Transformer (TFT) model, which applies the multi-head
attention-based transformer to time-series data prediction.
Recent research studies on predicting HABs using Gradient Boosting and deep learn-
ing techniques have become increasingly prevalent, particularly in the context of time-series
data analysis. HABs data, along with various weather and water-quality variables that
impact HABs, exhibit a time-series distribution. Kim et al. [13] improved the performance
of machine learning models for the early warning of HABs using an adaptive synthetic
sampling method. In a study utilizing the Gradient Boosting technique, García et al. [14]
employed gradient-boosted regression trees to predict cyanotoxin levels. There is also
ongoing research employing deep learning techniques, such as the LSTM method, which is
particularly effective for time-series analysis and has been widely used for predicting algae
(Hill et al. [15]; Liang et al. [16]; Zheng et al. [17]). Li et al. [18] enhanced HABs
prediction by combining ARIMA and LSTM techniques.
Previous studies utilized a single algorithm for HABs prediction. In this study, however, we
aimed to combine Gradient Boosting and deep learning techniques using ensemble methods.
Specifically, we developed models using the Gradient Boosting series (XGBoost, LightGBM,
CatBoost) and the deep learning series (an attention-based CNN-LSTM model) for HABs
prediction. Combining diverse models and refining predictions through ensemble techniques can
reduce the uncertainty associated with prediction outcomes. We integrated the Gradient
Boosting techniques using stacking ensemble methods and combined the stacked result with deep
learning using bagging methods, as sketched below. The final prediction results were
generated from these developed models using bagging and stacking ensemble techniques, and
their applicability to HABs was assessed.
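The following sketch illustrates this layout under stated assumptions: the three Gradient
Boosting models are combined by a stacking ensemble, and the stacked output is then averaged
with a hypothetical CNN-LSTM prediction as the bagging step. The function name, the linear
meta-learner, and the simple average are illustrative choices, not the paper's exact
configuration.

```python
# Sketch of the ensemble layout described above (names are illustrative).
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor

def ensemble_forecast(X_train, y_train, X_test, cnn_lstm_pred):
    """cnn_lstm_pred: prediction array from the attention-based CNN-LSTM (assumed given)."""
    stack = StackingRegressor(
        estimators=[
            ("xgb", XGBRegressor(random_state=0)),
            ("lgbm", LGBMRegressor(random_state=0)),
            ("cat", CatBoostRegressor(verbose=0, random_seed=0)),
        ],
        final_estimator=LinearRegression(),  # meta-learner over the base outputs
    )
    stack.fit(X_train, y_train)
    gb_pred = stack.predict(X_test)
    # Bagging step: average the stacked Gradient Boosting prediction with the
    # deep learning prediction to offset the errors of either model family.
    return (gb_pred + np.asarray(cnn_lstm_pred)) / 2.0
```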
To preprocess the data, scikit-learn's MinMaxScaler was used for normalization via its
transform method. In the case of the deep learning model, the data were converted to tensors
using the Variable function. The preprocessed data are structured as described in Appendix A,
consisting of sequences, targets, and goals, and learning and prediction are performed
accordingly; a minimal sketch follows.
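The sketch below assumes a NumPy array of shape (time steps, features) with the HABs series
in the last column; the sequence length of 7 and the use of torch.tensor (in place of the now
deprecated Variable wrapper) are illustrative choices, not the paper's exact settings.

```python
# Minimal sketch of the preprocessing described above.
import numpy as np
import torch
from sklearn.preprocessing import MinMaxScaler

def make_sequences(data, seq_len=7):
    scaler = MinMaxScaler()
    scaled = scaler.fit_transform(data)      # rescale each feature to [0, 1]
    xs, ys = [], []
    for i in range(len(scaled) - seq_len):
        xs.append(scaled[i:i + seq_len])     # input sequence window
        ys.append(scaled[i + seq_len, -1])   # next-step HABs target (assumed last column)
    # Convert to tensors for the deep learning model.
    return (torch.tensor(np.array(xs), dtype=torch.float32),
            torch.tensor(np.array(ys), dtype=torch.float32))
```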
Bayesian optimization searches for the optimal input value that generates the maximum or
minimum return value of a function whose objective function expression is not explicitly
known, through as few trials as possible. HyperOpt was used for optimization. HyperOpt
performs optimization according to the following procedure:
(1) Randomly sample hyperparameters and observe the performance results.
(2) Based on the observed values, the surrogate model estimates the optimal function and a
confidence interval (the deviation of the result error, i.e., the uncertainty of the
estimated function).
(3) Based on the estimated optimal function, the acquisition function calculates the next
hyperparameters to be observed and passes them to the surrogate model.
(4) The hyperparameters passed from the acquisition function are evaluated, and the surrogate
model is updated again based on the observed values.
By repeating steps 2 to 4, the uncertainty of the surrogate model can be reduced and an
accurate optimal function can be estimated progressively. HyperOpt is an optimization
technique in which Bayesian probability improves the posterior probability based on new data:
when new data are input, the optimal function is predicted and the posterior model is
improved to create an optimal function model. The maximum number of evaluations (max_evals)
was set to 50, and the final tuned hyperparameter results are shown in Table 2. A minimal
usage sketch follows.
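The sketch below shows the procedure for XGBoost with HyperOpt's TPE surrogate; the search
space bounds and the placeholder data are illustrative assumptions, not the values tuned in
this study (those are in Table 2).

```python
# Minimal HyperOpt sketch of the Bayesian optimization loop described above.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 6)), rng.random(200)  # placeholder data

def objective(params):
    model = XGBRegressor(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
    )
    # Negate the (negative) CV score so that fmin minimizes the RMSE.
    score = cross_val_score(model, X_train, y_train, cv=3,
                            scoring="neg_root_mean_squared_error").mean()
    return {"loss": -score, "status": STATUS_OK}

space = {
    "n_estimators": hp.quniform("n_estimators", 100, 1000, 50),
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "learning_rate": hp.loguniform("learning_rate", -5, 0),
}
best = fmin(fn=objective, space=space, algo=tpe.suggest,  # TPE surrogate model
            max_evals=50, trials=Trials())                # 50 evaluations, as above
print(best)
```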
To predict harmful algal blooms (HABs), we generated bootstrap samples for each XGBoost,
LightGBM, and CatBoost model by changing the seed (0, 1, 2, 3, 4, and 5) and using the tuned
hyperparameters for each model. The resulting predictions are shown in Figure 3a–c, and the
post-processed results using the bagging ensemble method and the stacking ensemble method are
shown in Figure 3d,e, respectively. The data used for tuning the hyperparameters were
observed from 1 January 2014 to 31 December 2021, and the ensemble prediction period covered
1 January 2022 to 31 December 2022.

Prediction accuracy was evaluated using the coefficient of determination (R²), the mean
absolute error (MAE), and the root mean square error (RMSE):

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}   (1)

MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n}   (2)

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}   (3)

where \hat{y}_i is the predicted value, y_i is the observed value, \bar{y} is the mean of the
observed values, and n is the number of data points.
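For reference, Equations (1)–(3) can be computed with scikit-learn as in this small sketch
(the arrays are placeholders, not the study's data):

```python
# The three metrics of Equations (1)-(3) on illustrative arrays.
import numpy as np
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

y_obs = np.array([2.1, 3.4, 5.0, 4.2, 3.8])   # observed values (illustrative)
y_hat = np.array([2.0, 3.6, 4.8, 4.4, 3.7])   # predicted values (illustrative)

r2 = r2_score(y_obs, y_hat)                        # Equation (1)
mae = mean_absolute_error(y_obs, y_hat)            # Equation (2)
rmse = np.sqrt(mean_squared_error(y_obs, y_hat))   # Equation (3)
print(f"R2={r2:.2f}, MAE={mae:.2f}, RMSE={rmse:.2f}")
```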
Figure 3. Results of HABs (cells/mL) predicted via the Gradient Boosting technique:
(a) XGBoost; (b) LightGBM; (c) CatBoost; (d) bagging ensemble; (e) stacking ensemble.
For XGBoost, the R² was 0.92, the MAE was 0.2, and the RMSE was 0.4. For LightGBM, the R² was
0.93, the MAE was 0.1, and the RMSE was 0.4. For CatBoost, the R² was 0.89, the MAE was 0.2,
and the RMSE was 0.5. All three models produced highly accurate prediction results for HABs.
When the bagging ensemble technique was applied to the results of the three models, the R²
was 0.92, the MAE was 0.2, and the RMSE was 0.4. When the stacking ensemble technique was
applied, the R² was 0.93, the MAE was 0.1, and the RMSE was 0.4. The error was relatively
large for CatBoost around June, and the bagging ensemble technique resulted in a smaller
overall error deviation than the stacking ensemble technique.
Although relatively poor prediction results were obtained for CatBoost compared to the other
models, the accuracy of the prediction results can vary depending on various factors such as
time, location, input data composition, and model. To reduce the uncertainty of the
prediction result, we suggest using multiple models with different seeds and performing
post-processing with the bagging and stacking ensemble techniques.
When the Gradient Boosting (XGBoost, LightGBM, CatBoost) series models and the
attention-based CNN-LSTM result were presented as the final predicted value, post-processed
with the result of Figure 3d using the bagging ensemble technique, R² was 0.93, MAE was 0.1,
and RMSE was 0.4 (Figure 5). The uncertainty of individual models for HABs prediction is
reduced when the final prediction results combine the Gradient Boosting (XGBoost, LightGBM,
CatBoost) series prediction results and the deep learning attention-based CNN-LSTM prediction
results with a bagging ensemble technique, leading to improved prediction accuracy (Table 4).
Figure 4. Attention-based CNN-LSTM training and prediction networks.
Gradient Boosting models (XGBoost, LightGBM, CatBoost) demonstrate strong prediction
performance for various datasets, with high accuracy and fast learning speed. Attention-based
CNN-LSTM models are applicable to both image and time-series data and are able to learn both
temporal and spatial patterns. Therefore, by using an attention-based CNN-LSTM model and the
Gradient Boosting models (XGBoost, LightGBM, CatBoost) in an ensemble for learning and
predicting HABs, the strengths of each model can complement each other, resulting in improved
prediction performance. Based on the prediction results from various perspectives, HABs
prediction can be performed by utilizing the strengths of each model, such as recognizing
overfitting problems that may occur in specific situations. The algorithm developed in this
study can be downloaded from the link in the Supplementary Materials.
3. Conclusions
In this study, an algorithm that can predict HABs was developed using an ensemble
technique of Gradient Boosting (XGBoost, LightGBM, CatBoost)-based prediction models
and deep learning attention-based CNN-LSTM models. The major findings of this study
are listed below:
(1) According to the correlation analysis of the learning data, water temperature had the
greatest correlation with HABs, and month, pH, and T-P showed positive correlations. Since
the deviation of the HABs data values was large, log values were substituted, and data
preprocessing was applied with MinMaxScaler normalization.
(2) XGBoost, LightGBM, CatBoost models, and attention-based CNN-LSTM models
were developed, and optimal hyperparameter results were presented by tuning
hyperparameters with Bayesian optimization techniques using observation data from
2014 to 2021.
(3) Applying the hyperparameters derived from the Bayesian optimization technique to predict
HABs in 2022, the bagging ensemble prediction of the Gradient Boosting (XGBoost, LightGBM,
CatBoost) models achieved an R² of 0.92, an MAE of 0.2, and an RMSE of 0.4, and the stacking
ensemble prediction achieved an R² of 0.93, an MAE of 0.1, and an RMSE of 0.3. Even when
predicting with individual methods, the worst results were an R² of 0.89, an MAE of 0.2, and
an RMSE of 0.5. Therefore, the overall prediction performance can be improved by offsetting
such errors across models.
(4) Although HABs observation has been performed on a weekly basis since 2014, not much data
have been accumulated. It was therefore initially expected that the prediction accuracy would
be low if data-based forecasting techniques were used. However, this study shows that a
fairly high prediction accuracy can be achieved by applying the ensemble technique. If future
data are accumulated and advanced algorithms are developed, the basis for predicting HABs in
advance and utilizing the predictions for policy purposes will be laid.
Figure 6. The study area (the red box marks the point where monitoring was performed).
The Nakdong River has a gentle slope and many bends, resulting in a slow flow rate
and an increase in water temperature every summer. This increase in temperature causes
a significant amount of damage from blue-green algae in the downstream area. Since the
water system of the Nakdong River directly collects river water and uses it as a water
supply source, there is great concern about drinking water quality due to the occurrence of
algae blooms.
Random forest is a representative bagging-based ensemble model. In contrast, XGBoost,
LightGBM, and CatBoost are algorithms based on the Gradient Boosting method. The Gradient
Boosting algorithm improves a predictive model by sequentially adding a series of decision
trees and reflecting the residual error of the previous model in the next model. Each
decision tree is built by predicting the difference (residual error) between the predicted
value and the actual value of the previous model, and the predictive model keeps improving by
reflecting information that the previous model did not capture; a minimal sketch follows.
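The from-scratch sketch below illustrates this residual-fitting idea on synthetic data; it is
only a pedagogical simplification, and the libraries above implement many refinements on top
of it.

```python
# Gradient Boosting in miniature: each new tree is fit to the residual error
# of the current ensemble prediction, and its correction is shrunk by a
# learning rate before being added.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((300, 4))
y = X[:, 0] * 3.0 + np.sin(X[:, 1] * 6.0) + rng.normal(0, 0.1, 300)

learning_rate, trees = 0.1, []
pred = np.full_like(y, y.mean())             # start from the mean prediction
for _ in range(100):
    residual = y - pred                      # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    pred += learning_rate * tree.predict(X)  # shrink each tree's correction
    trees.append(tree)

print("training RMSE:", np.sqrt(np.mean((y - pred) ** 2)))
```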
The advantages and disadvantages of XGBoost, LightGBM, and CatBoost are as follows. XGBoost
shows high prediction performance for high-dimensional and sparse data and works effectively
on large data with fast operation speed and good scalability; it also provides various loss
functions and flexible cross-validation functions. However, its memory usage is high, and it
can perform poorly on imbalanced data. LightGBM exhibits fast speed and efficient memory
usage and has excellent processing power for large amounts of data; it also shows high
prediction performance for various types of data (categorical variables, numerical variables,
and sparse data). However, it can perform poorly on high-dimensional data and lacks the
ability to handle such data automatically. CatBoost shows great performance in handling
categorical variables and provides its own cross-validation function to avoid overfitting
problems; it also handles outliers and missing values in the data well. However, model
training is slow, and performance can be poor on high-dimensional data.
Attention mechanisms were first proposed in the field of Neural Machine Translation (NMT).
Because the Seq2Seq encoder–decoder structure compresses all input sequence information into
a fixed-size vector before processing it, problems become more likely the longer the input
sequence. To solve this, Bahdanau proposed a mechanism that generates the output using
information from all points of the input sequence. The characteristics of Bahdanau Attention
are as follows: (1) It generates an output using information from all points in the input
sequence, learns the attention weights while generating the output, and identifies the
influence of each point in the input sequence on the output. (2) It learns to concentrate on
certain parts of the input sequence, which allows the model to focus on the important parts
of the input; a minimal sketch follows.
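The PyTorch sketch below shows additive (Bahdanau-style) attention in the simplified,
query-free form often used inside attention-based CNN-LSTM models: a weight is learned for
every time step of the encoder output, and the context vector is their weighted sum. The
class name, dimensions, and shapes are illustrative, not the paper's implementation.

```python
# Minimal additive attention over a sequence of hidden states.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim)  # projects encoder states
        self.v = nn.Linear(hidden_dim, 1)           # scores each time step

    def forward(self, encoder_outputs):
        # encoder_outputs: (batch, time steps, hidden_dim)
        scores = self.v(torch.tanh(self.W(encoder_outputs)))  # (batch, T, 1)
        weights = torch.softmax(scores, dim=1)                # attention weights
        context = (weights * encoder_outputs).sum(dim=1)      # weighted sum over T
        return context, weights

# Example: attend over 7 LSTM time steps with a hidden size of 32.
attn = AdditiveAttention(32)
context, weights = attn(torch.randn(8, 7, 32))
print(context.shape, weights.shape)  # torch.Size([8, 32]) torch.Size([8, 7, 1])
```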
References
1. Aksoy, N.; Genc, I. Predictive models development using gradient boosting based methods for solar power plants. J. Comput. Sci.
2023, 67, 101958. [CrossRef]
2. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [CrossRef]
3. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154. Available online: https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html (accessed on 4 April 2023).
4. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6638–6648. Available online: https://proceedings.neurips.cc/paper/2018/hash/83b2d666b98a3b304ce08d05729f3c4b-Abstract.html (accessed on 4 April 2023).
5. Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560. [CrossRef]
6. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
7. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations
using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
8. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2014, 27,
3104–3112.
9. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
11. Oord, A.V.D.; Kalchbrenner, N.; Kavukcuoglu, K. Pixel recurrent neural networks. arXiv 2016, arXiv:1601.06759.
12. Lim, B.; Arık, S.Ö.; Loeff, N.; Pfister, T. Temporal fusion transformers for interpretable multi-horizon time series forecasting. arXiv 2019, arXiv:1912.09363. [CrossRef]
13. Kim, J.H.; Shin, J.; Lee, H.; Lee, D.H.; Kang, J.; Cho, K.H.; Lee, Y.; Chon, K.; Baek, S.; Park, Y. Improving the performance of
machine learning models for early warning of harmful algal blooms using an adaptive synthetic sampling method. Water Res.
2021, 207, 117821. [CrossRef]
14. García Nieto, P.J.; García-Gonzalo, E.; Sánchez Lasheras, F.; Alonso Fernández, J.R.; Díaz Muñiz, C.; de Cos Juez, F.J. Cyanotoxin
level prediction in a reservoir using gradient boosted regression trees: A case study. Environ. Sci. Pollut. Res. Int. 2018, 25,
22658–22671. [CrossRef]
15. Hill, P.R.; Kumar, A.; Temimi, M.; Bull, D.R. HABNet: Machine Learning, Remote Sensing-Based Detection of Harmful Algal
Blooms. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3229–3239. [CrossRef]
16. Liang, Z.; Zou, R.; Chen, X.; Ren, T.; Su, H.; Liu, Y. Simulate the forecast capacity of a complicated water quality model using the
long short-term memory approach. J. Hydrol. 2020, 581, 124432. [CrossRef]
17. Zheng, L.; Wang, H.; Liu, C.; Zhang, S.; Ding, A.; Xie, E.; Li, J.; Wang, S. Prediction of harmful algal blooms in large water bodies
using the combined EFDC and LSTM models. J. Environ. Manag. 2021, 295, 113060. [CrossRef] [PubMed]
18. Li, H.; Qin, C.; He, W.; Sun, F.; Du, P. Improved predictive performance of cyanobacterial blooms using a hybrid statistical and deep-learning method. Environ. Res. Lett. 2021, 16, 124045. [CrossRef]
19. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Müller, A.; Nothman, J.; Louppe, G.;
et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
20. Nayak, J.; Naik, B.; Dash, P.B.; Vimal, S.; Kadry, S. Hybrid Bayesian optimization hypertuned catboost approach for malicious
access and anomaly detection in IoT nomaly framework. Sustain. Comput. Inform. Syst. 2022, 36, 100805. [CrossRef]
21. Su, J.; Wang, Y.; Niu, X.; Sha, S.; Yu, J. Prediction of ground surface settlement by shield tunneling using XGBoost and Bayesian
Optimization. Eng. Appl. Artif. Intell. 2022, 114, 105020. [CrossRef]
22. Dong, J.; Zeng, W.; Wu, L.; Huang, J.; Gaiser, T.; Srivastava, A.K. Enhancing short-term forecasting of daily precipitation using
numerical weather prediction bias correcting with XGBoost in different regions of China. Eng. Appl. Artif. Intell. 2023, 117, 105579.
[CrossRef]
23. Farzinpour, A.; Dehcheshmeh, E.M.; Broujerdian, V.; Esfahani, S.N.; Gandomi, A.H. Efficient boosting-based algorithms for shear
strength prediction of squat RC walls. Case Stud. Constr. Mater. 2023, 18, e01928. [CrossRef]
24. Garcia-Moreno, F.M.; Bermudez-Edo, M.; Rodríguez-Fórtiz, M.J.; Garrido, J.L. A CNN-LSTM Deep Learning Classifier for Motor
Imagery EEG Detection Using a Low-invasive and Low-Cost BCI Headband. In Proceedings of the 2020 16th International
Conference on Intelligent Environments (IE), Madrid, Spain, 20–23 July 2020; pp. 84–91. [CrossRef]
25. Xu, G.; Ren, T.; Chen, Y.; Che, W. A One-Dimensional CNN-LSTM Model for Epileptic Seizure Recognition Using EEG Signal
Analysis. Front. Neurosci. 2020, 14, 578126. [CrossRef] [PubMed]
26. Altunay, H.C.; Albayrak, Z. A hybrid CNN + LSTM-based intrusion detection system for industrial IoT networks. Eng. Sci.
Technol. 2023, 38, 101322. [CrossRef]
27. Liang, Y.; Lin, Y.; Lu, Q. Forecasting gold price using a novel hybrid model with ICEEMDAN and LSTM-CNN-CBAM. Expert
Syst. Appl. 2022, 206, 117847. [CrossRef]
28. Ahmed, M.R.; Islam, S.; Islam, A.K.M.M.; Shatabda, S. An ensemble 1D-CNN-LSTM-GRU model with data augmentation for
speech emotion recognition. Expert Syst. Appl. 2023, 218, 119633. [CrossRef]
29. Zhang, W.; Zhou, H.; Bao, X.; Cui, H. Outlet water temperature prediction of energy pile based on spatial-temporal feature
extraction through CNN–LSTM hybrid model. Energy 2023, 264, 126190. [CrossRef]
30. Hu, Y.; Zhang, Q. A hybrid CNN-LSTM machine learning model for rock mechanical parameters evaluation. Geoenergy Sci. Eng.
2023, 225, 211720. [CrossRef]
31. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
32. Trizoglou, P.; Liu, X.; Lin, Z. Fault detection by an ensemble framework of Extreme Gradient Boosting (XGBoost) in the operation
of offshore wind turbines. Renew. Energy 2021, 179, 945–962. [CrossRef]
33. Zhang, J.; Fu, P.; Meng, F.; Yang, X.; Xu, J.; Cui, Y. Estimation algorithm for chlorophyll-a concentrations in water from
hyperspectral images based on feature derivation and ensemble learning. Ecol. Inform. 2022, 71, 101783. [CrossRef]