
Article
Unlocking the Potential of Wastewater Treatment:
Machine Learning Based Energy Consumption Prediction
Yasminah Alali, Fouzi Harrou * and Ying Sun

Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia; [email protected] (Y.S.)
* Correspondence: [email protected]

Abstract: Wastewater treatment plants (WWTPs) are energy-intensive facilities that fulfill stringent
effluent quality norms. Energy consumption prediction in WWTPs is crucial for cost savings, process
optimization, compliance with regulations, and reducing the carbon footprint. This paper evaluates
and compares a set of 23 candidate machine-learning models to predict WWTP energy consumption
using actual data from the Melbourne WWTP. To this end, Bayesian optimization has been applied to
calibrate the investigated machine learning models. Random Forest and XGBoost (eXtreme Gradient
Boosting) were applied to assess how the incorporated features influenced the energy consumption
prediction. In addition, this study investigated the consideration of information from past data
in improving prediction accuracy by incorporating time-lagged measurements. Results showed
that the dynamic models using time-lagged data outperformed the static and reduced machine
learning models. The study shows that including lagged measurements in the model improves prediction accuracy, and the results indicate that the dynamic K-nearest neighbors model outperforms state-of-the-art methods by delivering promising energy consumption predictions.

Keywords: machine learning; consumption; wastewater treatment plants; data-based methods

Citation: Alali, Y.; Harrou, F.; Sun, Y. Unlocking the Potential of Wastewater Treatment: Machine Learning Based Energy Consumption Prediction. Water 2023, 15, 2349. https://doi.org/10.3390/w15132349
Academic Editor: Michael Maerker
Received: 1 June 2023; Revised: 18 June 2023; Accepted: 23 June 2023; Published: 25 June 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
Recycled water is a strategic alternative to mitigate water scarcity, particularly in arid regions. Notably, the treated water from wastewater treatment plants (WWTPs) can be used for different purposes, such as irrigation, aquariums, or discharged with a low level of pollution [1]. WWTPs are energy-intensive processes; therefore, improving their energy efficiency is needed from environmental and economic viewpoints [2]. It has been reported in refs. [3–5] that WWTPs' energy usage accounts for 4% of the national electricity in the United States and around 7% of the electrical energy worldwide [6]. One potential option for optimizing WWTPs and achieving energy savings is accurately predicting their energy consumption. By developing predictive models, operators and engineers can forecast the expected energy requirements of the treatment processes, enabling them to anticipate the energy demand associated with different operational scenarios and efficiently schedule equipment and processes. For example, by aligning energy-intensive activities, such as aeration or pumping, with periods of lower electricity rates or increased availability of renewable energy, WWTPs can strategically reduce their energy costs. This optimization approach ensures that energy-intensive processes are carried out when energy prices are more favorable or when renewable energy sources are abundant, leading to significant cost savings and a reduced environmental impact.
WWTPs currently operate conservatively, with high operating costs, and waste a large amount of energy. They are energy intensive because they require significant energy to perform the various treatment processes necessary to clean the wastewater. Some of the major energy-intensive processes in a WWTP include the aeration, mixing, and pumping

of water and solids for recirculation, filtration, and disinfection. Additionally, WWTPs also
require energy for processing biosolids, which may involve aerobic digestion, heat drying,
and dewatering. The energy demand for disinfection processes can vary depending on
the method employed, with chlorination being less energy-intensive compared to UV or
ozone disinfection. While sedimentation is a necessary process in wastewater treatment,
it is not considered highly energy-intensive. These energy-intensive activities and the
mechanical and electrical equipment used within the WWTP contribute significantly to
the overall energy consumption. The energy consumption of a WWTP can be reduced by
implementing energy-efficient technologies, such as using renewable energy sources, or by
optimizing the treatment process. Machine learning methods are being used in WWTPs to
improve their efficiency and reduce operational costs.
Enhancing the energy efficiency of WWTPs is essential to saving energy, reducing
economic expenses, and preserving resources and the environment [7–11]. Over the years,
numerous methods have been developed for modeling and predicting key characteristics
of WWTPs, including analytical and data-derived techniques [12–14]. Analytical-based
methods rely on a fundamental understanding of the physical process [15]. Developing
a precise physical model for complex, high-dimensional, and nonlinear systems is chal-
lenging, expensive, and time-consuming [16]. On the other hand, data-based methods
only rely on available historical data. Nowadays, data-driven methods, particularly ma-
chine learning methods, are more common in modeling and managing WWTP processes.
For example, ref. [14] investigated the application of machine learning-based approaches to
predict wastewater quality from WWTPs. They applied and compared six models, namely
RF, SVM, Gradient Tree Boosting (GTB), Adaptive Neuro-Fuzzy Inference System (ANFIS),
LSTM, and Seasonal Autoregressive Integrated Moving Average (SARIMAX). Hourly data
collected from three WWTPs has been used to assess the investigated models. Results
demonstrated that SARIMAX outperformed the other models in predicting wastewater
effluent quality and with acceptable time computation. Recently, Andreides et al. pre-
sented an overview of data-driven techniques applied for influent characteristics prediction
at WWTPs [17]. They showed that most reviewed works use machine learning-based
approaches, particularly neural networks. Some studies achieved comparable or better
outcomes using machine learning methods such as kNN and RF. This review concludes
that no one approach dominates all the others from the reviewed literature because they are
conducted using different datasets and settings, making the comparison difficult. Overall,
NNs and hybrid models exhibited satisfactory prediction performance. Guo et al. con-
sidered ANN and SVM models to predict the total nitrogen concentration in a WWTP in
Ulsan, Korea [18]. To this end, daily water quality and meteorological data were used as
input variables for the machine-learning models. The pattern search algorithm is adopted
to calibrate the ANN and SVM models. This study demonstrated the capability of these
two models in predicting water quality and the superior performance of the SVM in this
case study [18]. The study in [19] focused on predicting effluent water quality parameters
of the Tabriz WWTP through a supervised committee fuzzy logic approach. The study
in [2] evaluates the energy efficiency of a sample of Chilean wastewater treatment plants
(WWTPs) using a newly developed technique called stochastic non-parametric envelop-
ment of data (StoNED). This technique combines non-parametric and parametric methods,
and allows for an exploration of the influence of the operating environment on the energy
performance of WWTPs. The study found that the Chilean WWTPs were considerably
inefficient, with an average energy efficiency score of 0.433 and significant opportunities to
save energy (average savings were 203,413 MWh/year). The age of the facilities negatively
affected energy efficiency, and WWTPs using suspended-growth processes, such as conven-
tional activated sludge and extended aeration, had the lowest levels of energy efficiency.
The study suggests that this methodology could be used to support decision-making for
regulation and to plan the construction of new facilities, and the authors also suggest that
this methodology could be used to measure the energy efficiency of other stages of the
urban water cycle, such as drinking water treatment.
This study explores the potential of machine learning models for predicting energy con-
sumption in WWTPs. The following key points summarize the contributions of this paper:
• First, we considered all input variables, including hydraulic, wastewater characteris-
tics, weather, and time data, to predict energy consumption in a WWTP. This study
compared twenty-three machine learning models, including support vector regres-
sion with different kernels, GPR with different kernels, boosted trees, bagged trees,
decision trees, neural networks (NNs), RF, k-nearest neighbors (KNN), eXtreme Gradient Boosting (XGBoost), and LightGBM. Bayesian optimization has been applied to calibrate and fine-tune the investigated machine-learning models to develop efficient energy consumption predictions. In addition, a 5-fold cross-validation technique has been
used to construct these models based on training data. Five performance evaluation
metrics are employed to assess the goodness of predictions. Results revealed that
using all input variables to predict EC, the machine learning models did not provide
satisfactory predictions.
• Second, the aim is to construct reduced models by keeping only pertinent input
variables to predict EC in WWTP. To this end, Random Forest and XGBoost algorithms
were applied to identify important variables that considerably influence the prediction
capability of the considered models. Results showed that the reduced models obtained
a slightly improved prediction of EC compared to the full models.
• It is worthwhile to mention that the studied methods do not take into account the
time-dependent nature of energy consumption in the prediction process. Our final
contribution to addressing this limitation is constructing dynamic models by incorpo-
rating lagged measurements as inputs to enhance the ML models’ ability to perform
effectively. Results demonstrated that using lagged data contributes to improving the
prediction quality of the ML models and highlights the superior performance of the
dynamic GPR and KNN.
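The dynamic-model idea from the final contribution, augmenting the inputs with time-lagged measurements, can be sketched as follows. The data here is a synthetic autocorrelated series, not the Melbourne dataset, and the lag count of three is illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(5)
# Synthetic energy series with strong day-to-day persistence, so that
# lagged values carry real predictive information.
n = 1000
energy = np.zeros(n)
for t in range(1, n):
    energy[t] = 0.9 * energy[t - 1] + rng.normal(0, 1)

df = pd.DataFrame({"energy": energy})
# Dynamic model: time-lagged measurements become extra input features.
for lag in (1, 2, 3):
    df[f"energy_lag{lag}"] = df["energy"].shift(lag)
df = df.dropna()

X = df[[c for c in df.columns if c.startswith("energy_lag")]]
y = df["energy"]
# shuffle=False preserves the temporal ordering between train and test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)
r2 = r2_score(y_te, knn.predict(X_te))
```

On a persistent series like this, the lagged inputs alone already explain most of the variance, which is the intuition behind the dynamic models.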
The remainder of this study is organized as follows. Section 2 provides an overview
of related works on energy consumption prediction. Section 3 presents the data from the
Melbourne WWTP and the airport weather station, along with an exploratory data analysis.
Furthermore, the investigated machine learning models are briefly described. Section 4
describes the proposed machine learning-based prediction framework. Section 5 contains
the results and discussion of the machine learning algorithms within our datasets. Lastly,
Section 6 recapitulates the paper and gives future directions for potential enhancements.

2. Related Works
Recently, many studies have explored the concept of machine learning to control
WWTP by predicting how much energy they consume [20,21]. Machine learning-based
models are flexible and rely only on historical data from the inspected process. For in-
stance, ref. [22] used a Random Forest model to predict the energy consumption of WWTPs.
They assessed this machine-learning approach using 2387 records from the China Urban
Drainage Yearbook. Results indicate that Random Forest exhibited satisfactory prediction
performance with an R2 of 0.702. However, this study does not consider the effects of local
climate and technology in building the predictive model, which is very important. Ref. [23]
considered a logistic regression approach for predicting the energy consumption of a WWTP
in Romania. The input variables, including the flowrate and wastewater characteristics, are
used to predict energy consumption. Data were collected from a WWTP between 2015 and
2017 with 403 records to verify the efficiency of this approach. Results showed a reasonable
prediction quality with an accuracy of 80%. Nevertheless, all parameters that affect water
quality were not considered when constructing the logistic regression model. Ref. [24]
investigated the capacity of the Artificial Neural Network (ANN), K-Nearest Neighbors
(KNN), Support Vector Machine (SVM), and Linear Regression in predicting the energy
consumption of a WWTP located in Peninsular Malaysia. The energy consumption data
was collected from the Tenaga National Berhad (TNB) electrical bills from March 2011 to
February 2015. The wastewater characteristics are collected to construct the predictive
models. This study showed that the ANN model outperformed the other models. In
ref. [25], the purpose was to save energy at WWTP by performing a daily benchmark
analysis. Torregrossa et al. examined Support Vector Regression (SVR), ANN, and RF
algorithms on the Solingen-Burg WWTP dataset (designed for a connected population of
120,000 individuals). The RF was chosen as the most efficient algorithm based on an R2 of
0.72 in the validation and an R2 of 0.71 in the testing.
Furthermore, ref. [26] applied machine learning methods (ANN and RF) to predict
energy costs in WWTPs. This study is conducted based on 279 WWTPs located in north-
west Europe, including the Netherlands, France, Denmark, Belgium, Germany, Austria,
and Luxembourg. Regarding the average R2, RF reached 0.82, followed by the ANN with
0.81. ANN could be extensively used to investigate larger WWTP databases. Qiao et al.
proposed an approach to predict energy consumption and effluent quality based on a
density peaks-based adaptive fuzzy neural network (DP-AFNN) [27]. They showed that
this approach achieved high prediction accuracy compared to multiple linear regression,
the FNN-EBP (error backpropagation), and the Dynamic FNN. Ref. [5] focused on the mod-
eling and optimization of a wastewater pumping system to reduce energy consumption
using the ANN model. Specifically, they applied neural networks to model pump energy
consumption and wastewater flow rate. An artificial immune network algorithm is adopted
to solve the optimization problem by minimizing energy consumption and maximizing the
pumped wastewater flow rate. Results revealed that 6% to 14% of energy could be saved
while maintaining pumping performance. Ref. [28] investigated the capability of ANN, Gra-
dient Boosting Machine (GBM), RF, and the Long Short-Term Memory (LSTM) network in
predicting energy consumption records from a Melbourne WWTP. The prediction has been
performed by considering weather variables, wastewater characteristics, and hydraulic
variables. Feature extraction has been considered to select important variables to construct
the machine learning models. Results showed that the GBM model provided the best
prediction in this case study. However, with future changes in the test set data, a model’s
performance will degrade when applied to data from subsequent months. In [29], a neural
network model has been applied to predict pump energy consumption in a WWTP. This
will enable the generation of operational schedules for a pump system to decrease energy
consumption. The ANN model showed satisfactory prediction by reaching MAE (Mean
Absolute Error) and MAPE (Mean Absolute Percentage Error) of 0.78 and 0.02, respectively.
Table 1 shows some recent studies on energy consumption prediction in WWTPs using
various Machine Learning techniques.

Table 1. Summary of some studies on WWTP power consumption prediction.

Reference | Dataset | Algorithm | Result
Zhang et al. [22] | 2387 records from the China Urban Drainage Yearbook | RF model | Achieved an R2 of 0.702
Bagherzadeh et al. [28] | Melbourne water company between 2014 and 2019 | ANN, GBM, and RF | GBM reached the lowest RMSE and MAE values in the test phase, 33.9 and 26.9, respectively
Boncescu et al. [23] | Data collected between 2015 and 2017 with a total of 403 records | Logistic Regression | Accuracy around 80%
Ramli et al. [24] | EC dataset collected from TNB electrical bills from March 2011 to February 2015 | ANN, KNN, SVM, and Linear Regression | ANN reached an RMSE of 52,084
Torregrossa et al. [26] | WWTPs in Northwest Europe | ANN and RF | RF and ANN obtained an R2 of 0.82 and 0.81, respectively
Torregrossa et al. [25] | Solingen-Burg dataset | SVR, ANN, and RF algorithms | RF obtained an R2 of 0.71
Oulebsir et al. [30] | 317 WWTPs located in the north-west of Europe | ANN model | R2 ranged between 74.2 and 82.4
Zhang et al. [29] | Dataset covering July 2010 to January 2012 | ANN model | Reached 0.78 and 0.02 in MAE and MAPE
Das et al. [31] | 360 instances collected over one year | ANN, RNN, LSTM, and GRU | GRU achieved the lowest MAE of 0.43
Oliveira et al. [32] | Three datasets covering January 2016 to May 2020 | LSTM, CNN, and GRU | CNN models produced higher performance in both tests, achieving 690 kWh in the RMSE and 630 kWh in the MAE
Yusuf et al. [33] | A water district in southern California | ARIMA and LSTM models | RMSE of 20.96 and MAPE of 5.51
Filipe et al. [34] | 70,962 data points collected from September 2013 to June 2017 | Linear Quantile Regression and Gradient Boosting Trees (GBT) | GBT achieved MAE and RMSE scores of 2.43% and 3.31%, respectively

3. Materials and Methods


3.1. Data Description and Analysis
This study uses multivariate data from the Melbourne water treatment plant and
airport weather stations: https://data.mendeley.com/datasets/pprkvz3vbd/1, accessed
on 25 April 2023. The data contains 1382 records gathered over a period of five and a half years,
from January 2014 to June 2019, from nineteen variables listed in Table 1. The dataset
includes power consumption data as well as biological, hydraulic, and climate variables.
The data on water quality and biological characteristics are collected using sensors, while
weather data is collected using the Melbourne airport weather station located near the
water treatment plant. This dataset provides a comprehensive view of the various fac-
tors that can impact energy consumption at a wastewater treatment plant and allows the
researchers to build models that take into account the interplay between these factors.
As shown in Figure 1, the data also contains time-domain information that will be consid-
ered in this study to improve prediction performance. More details about this data can
be found in [28]. Additionally, we eliminated data points that had unusually low or high
energy consumption, which were considered outliers. Roughly 5% of the data points were
removed [1].
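The outlier-removal step can be approximated with a simple quantile rule. The series below is synthetic, and the 2.5%/97.5% cutoffs are an assumption chosen so that roughly 5% of the points are trimmed, matching the fraction reported above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Hypothetical daily energy-consumption records (kWh); the real dataset
# has 1382 records, this is only a stand-in with injected extremes.
energy = pd.Series(rng.normal(150_000, 15_000, 1382))
energy.iloc[::100] = rng.choice([5_000, 500_000], size=len(energy.iloc[::100]))

# Trim roughly the lowest and highest 2.5% of values (about 5% overall),
# mirroring the removal of unusually low or high readings.
lo, hi = energy.quantile([0.025, 0.975])
cleaned = energy[(energy >= lo) & (energy <= hi)]
removed_frac = 1 - len(cleaned) / len(energy)
```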
The evolution of energy consumption in wastewater treatment plants (WWTPs) can
be affected by several factors, including the influx of wastewater volume and composition,
weather conditions such as temperature, precipitation, and water flow, seasonality changes
(e.g., tourism season, temperature, precipitation, and water flow), treatment processes used,
and maintenance of the equipment. WWTP operators can use real-time data on the influx
of wastewater, weather conditions, and treatment processes to monitor the daily evolution
of energy consumption and adjust the treatment process as needed. This can help reduce
energy consumption and costs and improve the overall performance of the WWTP.
Before constructing predictive models, it is important to perform exploratory data
analysis. At first, we plot the time-series data to get a visual idea of the variation and
correlation in the data. Plotting time-series data is an important step in exploratory data
analysis, as it can help identify patterns and trends that can be useful in building predictive
models. Visualizing the data can also help identify any outliers or missing data, which can
be necessary for preprocessing the data before building the predictive models. Figure 2
displays the daily evolution of energy consumption, hydraulic and wastewater variables,
and some weather time series collected in 2018. As WWTPs should handle highly dynamic
influent, energy consumption and influent will vary accordingly (Figure 2).

Figure 1. Measured variables in this study.

Figure 2. Distribution of some of the used variables.
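As a sketch of this exploratory step, the snippet below builds synthetic stand-ins for the daily inflow and energy series (the real variables come from the Melbourne dataset) and computes a rolling trend and their correlation; calling `df.plot()` would render the time-series view described above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2018-01-01", "2018-12-31", freq="D")

# Synthetic stand-ins for two of the measured variables: daily inflow and
# energy consumption, with a shared seasonal component so the two co-vary.
season = 1 + 0.2 * np.sin(2 * np.pi * days.dayofyear / 365)
inflow = 350_000 * season + rng.normal(0, 10_000, len(days))   # m^3/day
energy = 0.45 * inflow + rng.normal(0, 5_000, len(days))       # kWh/day

df = pd.DataFrame({"inflow": inflow, "energy_kwh": energy}, index=days)

# A 7-day rolling mean smooths out day-to-day noise and exposes the
# seasonal trend; df.plot() would draw both series.
weekly_trend = df.rolling(7).mean()
corr = df["inflow"].corr(df["energy_kwh"])
```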


Figure 3 illustrates the total inflow and energy consumption for each year from 2014
to 2018. The data in Figure 3 is based on the available dataset, which covers the period
from January 2014 to June 2019. However, due to the limited data available for 2019
(only six months), we have excluded the 2019 data from Figure 3 to avoid any potential
confusion. The figure specifically focuses on the years from 2014 to 2018 to provide a clear
representation of the yearly variations in inflow and energy consumption. From Figure 3,
we observe a high similarity in the variation between the inflow and the consumed energy.
Essentially, the volume and composition of wastewater (inflow) that needs to be treated can
affect the energy consumption of a WWTP. An increase in the volume of wastewater can
increase the energy consumption of pumps and other equipment used to move the water
through the treatment process. Additionally, changes in the composition of the wastewater,
such as an increase in the amount of organic matter, can increase the energy consumption
of the biological treatment process.

Figure 3. Yearly sum of inflow (a) and energy consumption (b) from 2014 to 2018.
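The yearly totals behind a figure like Figure 3 reduce to a simple group-by; the numbers below are synthetic stand-ins, and only the six available months of 2019 are dropped, as described above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
days = pd.date_range("2014-01-01", "2019-06-30", freq="D")
df = pd.DataFrame(
    {
        "inflow": rng.normal(350_000, 30_000, len(days)),
        "energy_kwh": rng.normal(150_000, 12_000, len(days)),
    },
    index=days,
)

# Yearly totals of inflow and energy; 2019 covers only six months,
# so it is excluded before comparison, as done for Figure 3.
yearly = df.groupby(df.index.year).sum()
yearly = yearly.loc[yearly.index < 2019]
```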

Figure 4 depicts the distribution of the collected data, indicating that these datasets
are non-Gaussian distributed. Here, kernel density estimation (KDE) [35], a non-parametric
method, is applied to estimate the underlying distribution of the dataset. KDE is a powerful
and flexible method for estimating the distribution of a dataset; it does not make any
assumptions about the underlying distribution and can be used to estimate any kind of
distribution. Figure 4 allows us to see the shape of the distribution, the presence of outliers,
and the general characteristics of the data. It is an effective way of understanding the nature
of the data and the underlying patterns in it.
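A minimal KDE sketch with SciPy, on a synthetic bimodal (hence clearly non-Gaussian) sample:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
# A bimodal sample standing in for one of the measured variables.
sample = np.concatenate([rng.normal(100, 10, 500), rng.normal(160, 15, 300)])

# gaussian_kde makes no parametric assumption about the distribution;
# evaluating it on a grid gives a density curve like those in Figure 4.
kde = gaussian_kde(sample)
grid = np.linspace(sample.min(), sample.max(), 200)
density = kde(grid)

# The estimated density should integrate to roughly 1 over the grid.
step = grid[1] - grid[0]
area = float(density.sum() * step)
```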
To visualize the distribution of energy consumption over the years, Figure 5 displays
the boxplots of yearly WWTP power consumption during the studied period from 2014 to
2019. Note that data for 2019 is available for only six months, which is why the boxplot for
that year is more compact than the other years. From Figure 5, we observe that the annual
distribution of energy consumption in 2018 has slightly decreased in average values and
standard deviations compared to 2017. This decrease in energy consumption could be due
to the operator’s optimization and management of the WWTP.
Figure 4. Distribution of some of the used variables.

Figure 5. Distribution of annual energy consumption over the course of the study period.

The boxplots in Figure 6 display the monthly energy consumption patterns, and it is
observed that there is a significant increase in variance during the hot months of October,
November, and December. This could be attributed to the high demand for water during
these months and also to the increase in tourism. These months are typically considered
the hottest period in Australia. The weather during this period is generally sunny, warm,
and humid, which can lead to increased water usage for cooling and other purposes
and also cause an influx of tourists, leading to increased strain on the WWTPs. This
information could be used to improve energy consumption prediction for better resource
allocation and planning.
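The month-by-month spread seen in the boxplots can be quantified with a group-by standard deviation. The series below is synthetic, with inflated variance injected in October to December to mimic the described pattern:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
days = pd.date_range("2014-01-01", "2018-12-31", freq="D")

# Synthetic daily energy series whose spread is inflated in Oct-Dec,
# mimicking the variance increase visible in the monthly boxplots.
spread = np.where(np.isin(days.month, [10, 11, 12]), 25_000, 10_000)
energy = pd.Series(150_000 + rng.normal(0, spread), index=days)

# energy.groupby(...).describe() (or a boxplot grouped by month)
# summarizes the month-by-month distributions.
monthly_std = energy.groupby(energy.index.month).std()
```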
Figure 6. Distribution of monthly energy consumption over the course of the study period.

3.2. Methodology
This section briefly describes the considered machine learning models for energy
consumption prediction in this paper. As presented in Figure 7, twenty-three machine
learning models are considered, including GPR, SVR, kNN, and ensemble learning models
(RF, BT, BS, XGBoost, and LightGBM). Each model has its own strengths and weaknesses.
Next, we provide a brief description of these popular machine-learning models.

Figure 7. The investigated machine learning methods in this study.

3.2.1. SVR Model


SVR is a flexible data-based approach with good learning capacity via kernel tricks.
The key idea underlying SVR consists of mapping the training data to a higher-dimensional
space and conducting linear regression in that space. SVR can handle non-linear and
non-separable data using the kernel trick, handle datasets with large numbers of features,
and be robust to the presence of outliers. However, it requires a large amount of compu-
tational power and can be sensitive to the choice of kernel function and regularization
parameter [36,37]. Moreover, the relevant concept used in designing the SVR model lies
in structural risk minimization. It is demonstrated that SVR provides satisfactory per-
formance with limited samples [38]. Thus, SVR models have been broadly exploited in
various applications, such as solar irradiance prediction [39], wind power prediction [40],
and anomaly detection [41]. This study will use optimized SVR via Bayesian optimization
for comparison.
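A minimal SVR sketch with an RBF kernel on synthetic data; the hyperparameter values here are illustrative, not the Bayesian-optimized ones from the study:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import r2_score

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, (400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 400)

# The RBF kernel performs the implicit mapping to a higher-dimensional
# space; C and epsilon are the knobs typically tuned (in the paper,
# via Bayesian optimization).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
model.fit(X[:300], y[:300])
r2 = r2_score(y[300:], model.predict(X[300:]))
```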
3.2.2. GPR Model


GPR is a probabilistic machine learning technique used for both regression and classification
problems [42]. It models the target variable as a Gaussian process and assigns
a probability distribution to the predicted values [43]. GPR is non-parametric, flexible,
and able to capture complex relationships between the input variables and the target vari-
able [44]. A key feature of GPR is that it can also model the uncertainty in the predictions it
makes. This is achieved by assigning a probability distribution to the predicted values [42].
However, GPR is computationally expensive because it requires the calculation of the
covariance matrix between all the data points, which can become computationally intensive
as the number of data points increases. The computational complexity of GPR is typically
O(N^3), where N is the number of data points. Additionally, GPR also requires the inversion
of the covariance matrix and the calculation of the likelihood of the model given the data,
which can be computationally expensive as well.
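A short GPR sketch showing the predictive uncertainty mentioned above, on synthetic data; the kernel choice (RBF plus a white-noise term) is an assumption:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, (80, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 80)

# The WhiteKernel term models observation noise; predict(..., return_std=True)
# yields both the mean prediction and its standard deviation, i.e. the
# uncertainty estimate that distinguishes GPR from point predictors.
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X, y)
mean, std = gpr.predict(np.array([[5.0]]), return_std=True)
```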

3.2.3. K-Nearest Neighbor


The K-NN is a lazy machine-learning method that does not require training before
use. k-NN is a non-parametric method that can be used for regression by predicting the
output of a new data point based on the average of the k-nearest data points’ output [45].
The value of k is a hyperparameter that needs to be set before training the model, and it
affects the model’s sensitivity to outliers. However, the k-NN algorithm is sensitive to
the scale of the features, so it may not perform well if the features have different scales.
The k-NN algorithm may not perform well with high-dimensional data, as it becomes more
difficult to find the k-nearest neighbors in high-dimensional space.
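The scale sensitivity noted above is easy to demonstrate: with two informative features on very different scales, k-NN improves markedly once the features are standardized. The data and scales below are synthetic:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
# Two informative features on very different scales: without scaling,
# Euclidean distances are dominated by the second feature.
X = np.column_stack([rng.uniform(0, 1, 500), rng.uniform(0, 1000, 500)])
y = 3 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.05, 500)

raw = KNeighborsRegressor(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))

r2_raw = cross_val_score(raw, X, y, cv=5, scoring="r2").mean()
r2_scaled = cross_val_score(scaled, X, y, cv=5, scoring="r2").mean()
```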

3.2.4. ANN Models


Artificial Neural Networks (ANNs) are advanced models capable of recognizing
complex and non-linear relationships between inputs and outputs; however, they can be
sensitive to the way they are structured and initiated and the parameters used. They are
frequently used in studies related to wastewater treatment plants (WWTPs). Each layer
of an ANN contains multiple neurons connected to neurons in the next layer and created
to fulfill specific tasks. Different ANN models have been studied, such as Narrow, Wide,
Medium, Bilayered, and Trilayered Neural Networks. Narrow Neural Networks have
a small number of neurons in hidden layers, which makes them less complicated and
quicker to train, but they may not be able to recognize complex relationships. Wide Neural
Networks have many neurons in hidden layers, which makes them more complex and
able to recognize more complex relationships. However, they can be costly to compute
and susceptible to overfitting. Medium Neural Networks have an intermediate number of
neurons in hidden layers and provide a balance between narrow and wide neural networks.
Bilayered Neural Networks have two hidden layers, making them more complex than
single-layered neural networks. Trilayered Neural Networks have three hidden layers,
making them even more complex than Bilayered Neural Networks, but also more costly to
compute and prone to overfitting.
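A sketch of the narrow, wide, and bilayered variants using scikit-learn's MLPRegressor on synthetic data; the layer sizes are illustrative, not those used in the paper:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)
X = rng.uniform(-2, 2, (600, 3))
y = np.sin(X[:, 0]) * X[:, 1] + X[:, 2] ** 2 + rng.normal(0, 0.05, 600)

# Illustrative hidden-layer sizes for narrow, wide, and bilayered networks.
architectures = {"narrow": (10,), "wide": (100,), "bilayered": (50, 50)}
scores = {}
for name, layers in architectures.items():
    net = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=layers, solver="lbfgs",
                     max_iter=2000, random_state=0),
    )
    net.fit(X[:450], y[:450])
    scores[name] = net.score(X[450:], y[450:])
```

The held-out scores make the trade-off concrete: wider or deeper networks can capture more complex relationships, at higher training cost and overfitting risk.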

3.2.5. Decision Tree Regression


A decision tree, or regression tree, is a method used to predict a continuous outcome.
It uses a tree-like structure to divide the data into subsets and make predictions based on
the average value of the target variable in each subset. The simplicity of the algorithm
makes it easy to understand and implement [46]. However, the complexity increases as the
tree grows deeper, and the model can become overfit. To mitigate this, techniques such as
pruning, limiting tree depth, and using regularization can be used. The time complexity of
the algorithm is typically O(log(n)) on average, but can be as high as O(n) in the worst-case
scenario. The space complexity is O(n) because it stores the entire tree in memory.
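The overfitting-versus-pruning trade-off can be illustrated by capping the tree depth; the data and the depth limit of 4 below are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(10)
X = rng.uniform(0, 10, (800, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 800)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unrestricted tree grows until it memorizes the noise; capping the
# depth (a simple form of pruning) regularizes it.
deep = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)

r2_deep = deep.score(X_te, y_te)
r2_shallow = shallow.score(X_te, y_te)
```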
3.2.6. Ensemble Methods


Ensemble learning is a technique where multiple models are combined to make a
single, more accurate prediction [47]. There are several ensemble methods like Bagging,
Boosting, Random Forest, XGBoost, and LightGBM. Bagging, also known as Bootstrap
Aggregating, trains several models independently on different subsets of the data and then
combines the predictions by taking an average [48]. Boosting, on the other hand, trains
multiple models sequentially, where each model tries to correct the prediction errors of the
previous model [49]. Random Forest combines multiple decision trees, creating them on
different subsets of the data and taking the average or majority vote of the predictions from
the individual trees to make a final prediction [50]. XGBoost and LightGBM are open-source
libraries for gradient boosting. XGBoost is highly efficient, scalable, and flexible and is used
in various applications such as machine learning competitions, structured data, and time
series [51]. LightGBM is efficient and scalable, particularly suitable for large datasets and
high-dimensional data; it uses a histogram-based algorithm to build the trees, reducing
computation time and memory usage [52,53]. Ensemble methods are considered more
robust and accurate than single models; they reduce variance and bias in predictions by
combining the strengths of multiple models [39].
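The bootstrap-and-average idea behind Bagging can be illustrated with a toy sketch (pure Python, with single-split "stump" regressors as the base learners; this is an illustration of the principle, not the ensemble implementations evaluated in this study):

```python
import random

# Toy bagging (bootstrap aggregating): fit one single-split stump regressor
# per bootstrap resample of the training data, then average their predictions.
def fit_stump(X, y):
    # choose the threshold on feature 0 minimizing the total squared error
    best = None
    for t in sorted({x[0] for x in X}):
        lo = [yi for xi, yi in zip(X, y) if xi[0] <= t]
        hi = [yi for xi, yi in zip(X, y) if xi[0] > t]
        if not lo or not hi:
            continue
        mlo, mhi = sum(lo) / len(lo), sum(hi) / len(hi)
        err = sum((v - mlo) ** 2 for v in lo) + sum((v - mhi) ** 2 for v in hi)
        if best is None or err < best[0]:
            best = (err, t, mlo, mhi)
    if best is None:                      # degenerate sample: constant model
        m = sum(y) / len(y)
        return lambda x: m
    _, t, mlo, mhi = best
    return lambda x: mlo if x[0] <= t else mhi

def bagging_fit(X, y, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)
```

Random Forest follows the same recipe with full decision trees (plus random feature subsets per split), while Boosting instead fits each new model to the residual errors of the ensemble so far.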
When comparing the time complexity of Bagging, Boosting, Random Forest, XGBoost,
and LightGBM, it is important to note that it depends on the specific implementation and
the size of the dataset. Generally speaking, the time complexity of Bagging and Boosting
is O(mnT ), where m is the number of samples, n is the number of features, and T is the
number of estimators (models) used; the cost grows linearly with T. Random Forest has a
time complexity of O(mn log(m) · T ): each tree costs O(mn log(m)) to build, because the
samples must be sorted at every split, and the total again grows linearly with T. XGBoost
and LightGBM, which are based on gradient boosting, have a comparable per-tree cost of
roughly O(mn log(m)) with exact (pre-sorted) splits; LightGBM's histogram-based splitting
reduces this further by binning feature values. To sum up, all five methods scale linearly
with the number of estimators T; the dominant difference lies in the per-estimator cost,
where the log(m) sorting factor and histogram approximations determine how each method
scales with the number of samples. In practice, Random Forest is often considered the most
favorable in terms of training time because its trees are trained independently and can be
built in parallel, whereas boosting methods must train their estimators sequentially.

3.2.7. Models Calibration via Bayesian Optimization


Fine-tuning a machine learning model involves adjusting the hyperparameters of
the model in order to improve its performance on a specific task. Choosing the best
hyperparameter configuration for a given model has a direct effect on its performance.
There are several ways to fine-tune a machine learning model, including Grid Search,
Random Search, and Bayesian optimization [54]. Grid Search is an exhaustive method that
tries all possible combinations of hyperparameters within a predefined range [55]. Random
Search is similar, but it randomly samples sets of hyperparameters from a predefined
range [55]. Bayesian optimization, on the other hand, uses probabilistic models and
previous evaluations to guide the search, which makes it more efficient than the other
two methods [56,57]. Because it exploits the information gained from previous evaluations,
Bayesian optimization does not need to evaluate all possible combinations to find a good
set of hyperparameters. In this study, we adopted Bayesian optimization to calibrate the
models. For more details on Bayesian optimization, refer to refs. [57,58].
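The sequential idea — use past evaluations to decide where to evaluate next — can be illustrated with a simplified sketch. Note the hedge: a cheap surrogate (inverse-distance-weighted mean, with uncertainty taken as the distance to the nearest evaluated point) stands in here for the Gaussian process used by full Bayesian optimization, and the function f, grid, and parameter names are purely illustrative:

```python
# Simplified sequential model-based search in the spirit of Bayesian
# optimization. A lower-confidence-bound acquisition (mu - kappa * sigma)
# trades off exploitation (low predicted error) against exploration
# (high uncertainty far from evaluated points).
def surrogate(x, xs, ys):
    dists = [abs(x - xi) for xi in xs]
    w = [1.0 / d for d in dists]               # inverse-distance weights
    mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    return mu, min(dists)                      # predicted mean, crude uncertainty

def sequential_minimize(f, candidates, init, n_iter=15, kappa=1.0):
    xs = list(init)
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        best_c, best_score = None, float("inf")
        for c in candidates:
            if c in xs:
                continue                       # already evaluated
            mu, sigma = surrogate(c, xs, ys)
            score = mu - kappa * sigma         # lower confidence bound
            if score < best_score:
                best_c, best_score = c, score
        if best_c is None:
            break
        xs.append(best_c)
        ys.append(f(best_c))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]
```

In practice, f would be a cross-validated error as a function of a hyperparameter; real Bayesian optimization replaces the surrogate with a Gaussian process and the acquisition with, e.g., expected improvement.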

4. Machine Learning-Based EC Prediction Framework


Energy consumption (EC) prediction is essential to optimally designing and operating
sustainable energy-saving WWTPs. EC in WWTPs is influenced by diverse biological
and environmental factors, making it complicated and challenging to build soft sensors.
The techniques based on machine learning represent an appealing solution to predict the
EC in WWTPs. In this paper, we investigated the prediction performance of the commonly
used machine learning methods to predict the WWTP’s energy consumption. The general
framework of the adopted machine learning framework is given in Figure 8. After pre-
processing the data by eliminating outliers and imputing missing values, the data are
subdivided into training and test sets. Here, 75% of the data are used to train the machine
learning algorithms, and the trained models are then evaluated on the testing data
(i.e., the remaining 25% of the data). We used the k-fold cross-validation technique to train
the models. Cross-validation is a technique used to assess the performance of a model by
dividing the data into train and test sets, and training and testing the model multiple times
on different subsets of the data [59]. The benefit of using cross-validation is that it provides
a more accurate estimate of the model’s performance, reduces the risk of overfitting, and
can also be used to tune the hyperparameters of a model. Note that the considered models
are calibrated using Bayesian optimization during the training stage.
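The k-fold splitting described above can be sketched in a few lines of pure Python (illustrative; the fit and score callables stand in for any of the models and metrics used in this study):

```python
# Minimal k-fold cross-validation: split the indices into k folds, train on
# k-1 folds, score on the held-out fold, and average the k fold scores.
def kfold_indices(n, k):
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def cross_validate(fit, score, X, y, k=5):
    folds = kfold_indices(len(X), k)
    results = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        results.append(score(model, [X[j] for j in test_idx],
                             [y[j] for j in test_idx]))
    return sum(results) / k
```

Here fit returns a predictor and score returns an error (e.g., RMSE); the averaged fold error is the cross-validated estimate of generalization performance.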

Figure 8. Illustration of the general framework of machine learning-based prediction procedure.

Selecting the best model is an important step in machine learning; it can be done by
comparing the performance of different models using suitable metrics. Here, we employed
four commonly used criteria: root mean square error (RMSE), mean absolute error (MAE),
mean absolute percentage error (MAPE), and training time, which measures how long it
takes to train the model. In addition, we computed the J2 metric, which was recently
introduced in [28]. The model that performs best on these metrics is selected. In the
following, the measured and predicted energy consumption are denoted by yt and ŷt,
respectively, and n is the number of records.
• RMSE measures the differences between predicted and true values.

  RMSE(y, \hat{y}) = \sqrt{ \frac{1}{n} \sum_{i=0}^{n-1} (y_i - \hat{y}_i)^2 },  (1)

• MAE measures the average absolute difference between predicted and true values.

  MAE(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} | y_i - \hat{y}_i |,  (2)

• MAPE measures the average percentage difference between predicted and true values.

  MAPE(y, \hat{y}) = \frac{1}{n} \sum_{i=0}^{n-1} \frac{| y_i - \hat{y}_i |}{\max(\epsilon, | y_i |)},  (3)

• J2 is the ratio of the squared RMSE in testing to the squared RMSE in training [28].

  J^2 = \frac{RMSE_{test}^2}{RMSE_{train}^2}.  (4)

In summary, lower values of these metrics indicate predictions that are more accurate and
closer to the true values (and, for training time, a model that is faster to fit), which can be
interpreted as better precision and quality of prediction.
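The four scores of Eqs. (1)–(4) translate directly into plain Python (the eps guard mirrors the max(ε, |y_i|) term in Eq. (3)):

```python
import math

# RMSE, MAE, MAPE, and J2 exactly as defined in Eqs. (1)-(4).
def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat, eps=1e-8):
    # eps guards against division by zero when a true value is 0
    return sum(abs(a - b) / max(eps, abs(a)) for a, b in zip(y, yhat)) / len(y)

def j2(rmse_test, rmse_train):
    # ratio of squared testing RMSE to squared training RMSE; values near 1
    # suggest the model generalizes about as well as it fits
    return rmse_test ** 2 / rmse_train ** 2
```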

5. Results and Discussion


In this study, we performed three investigations to develop effective machine-learning
models to predict WWTP energy consumption. At first, all the available input variable data
is considered in predicting energy consumption via several machine learning models. In the
second experiment, we considered only the most important variables via variable selection
in predicting WWTP energy consumption. In the last experiment, we included lagged
data to improve the prediction accuracy of the considered machine learning models.

5.1. EC Prediction Using Full Models


This first experiment considers hydraulic, weather, wastewater characteristics, and time
variables for EC prediction. This experiment assesses the capacity of machine learning
methods for EC prediction based on all variables in Table 1. Here, we investigate
conventional static machine learning models, which are built on a fixed set of features and
do not account for changes or updates in the data over time; they are trained once and
make predictions from that single dataset.
As discussed, we used 75% of the data (from 1 January 2014 to 28 January 2018) for
training and the remaining 25% (from 29 January 2018 to 27 June 2019) for testing.
In this context, we refer to full prediction models that predict EC without considering
previous energy consumption. To train the investigated models, we used a 5-fold cross-
validation technique. As the prediction accuracy of machine learning methods relies on
the hyperparameters’ values, we adopted Bayesian Optimization herein to fine-tune the
investigated methods during the training.
Table 2 summarizes the prediction performance of the investigated machine learning
models based on testing data. Neural networks tend to be the fastest models to train.
We observe that the ensemble learning models (i.e., BST, BT, RF, XGBoost, and LightGBM)
predicted WWTP EC with the lowest RMSE and MAE. For instance, BST produced the best
results, with a training RMSE of 35.77 MWh and, on the test data, MAE = 31.97 MWh,
RMSE = 41.37 MWh, and MAPE = 11.25%. Furthermore, XGBoost also yielded good results,
with RMSE = 41.38 MWh/day, MAE = 32.23 MWh/day, and J2 = 0.95. This could be due to
the ability of ensemble methods to reduce prediction errors by combining several weak
regressors. Based on the MAPE criterion, the top six models are QSVR, GPRE, GPRRQ,
BT, BST, and XGBoost. Table 2 shows that the GPR models are the most time-consuming
to train, followed by the SVR models. Overall, no single approach dominates all the others
on every statistical score. The results in Table 2 indicate that there is still room for
improving WWTP energy consumption prediction.

Table 2. Prediction results of full machine learning models based on testing data.

Methods    RMSE (MWh)    MAE (MWh)    MAPE (%)    RMSE Train (MWh)    Train Time (s)    J2
LSVR 46.40 36.11 12.09 39.12 60.297 1.40
CSVR 41.96 33.56 12.30 45.58 197.89 0.84
GSVR 41.85 32.28 11.24 38.86 28.94 1.15
QSVR 43.44 33.68 11.81 40.66 97.052 1.14
GPRE 43.32 33.48 11.45 37.92 469.67 1.30
GPRM3/2 46.02 35.51 11.88 38.16 308.8 1.45
GPRRQ 42.44 32.62 11.24 37.97 805.05 1.24
GPRSE 46.23 35.65 11.92 38.15 482.8 1.46
GPRM5/2 46.00 35.48 11.88 38.18 527.09 1.45
BT 41.46 32.23 11.36 35.67 442.61 1.35
BST 41.37 31.97 11.25 35.77 28.376 1.33
ODT 42.57 33.57 11.87 39.06 19.345 1.18
NNN 60.05 47.08 14.58 44.06 13.64 1.85
MNN 93.77 67.85 18.51 55.90 11.86 2.81
WNN 194.31 131.10 27.91 145.03 16.16 1.79
BNN 64.54 51.00 15.65 45.90 10.11 1.97
TNN 67.59 49.76 15.69 51.95 12.14 1.69
ONN 47.09 36.55 12.21 39.40 157.72 1.42
XGBoost 41.38 32.23 12.07 42.36 375.00 0.95
RF 42.43 33.16 12.07 44.90 75.00 0.89
KNN 41.51 32.82 12.30 41.41 21.00 1.00
LightGBM 41.76 32.34 12.40 41.76 107.00 1.00

5.2. EC Prediction Using Reduced Models


Now, the aim is to build parsimonious predictive models for WWTP EC prediction.
Feature importance identification is the process of determining which input variables in a
dataset have the most impact on the outcome of a machine learning model. This knowledge
can help to understand the relationship between inputs and outputs, simplify the model,
and improve its performance. Importantly, non-informative and redundant input variables
will be ignored in building a predictive model to reduce the number of input variables.
Here, we used two popular machine learning algorithms, XGBoost and RF, to identify
important features in a dataset. By using two methods, one can obtain a more robust
understanding of feature importance and make more informed decisions about which
features to include or exclude in a model. Figure 9 displays the selection results based on
the RF and XGBoost methods. The larger the input amplitude for feature importance, the
more significant the influence of that variable on the WWTP EC prediction.
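As a complementary, model-agnostic check of such importance scores (not the built-in RF/XGBoost importances used in this study), permutation importance shuffles one input column at a time and records how much the model's error degrades; the toy model below is purely illustrative:

```python
import random

# Permutation importance: for each feature column, shuffle it, re-score the
# model, and report the average increase in RMSE. A larger increase means the
# model relies more on that feature.
def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    rng = random.Random(seed)

    def score_rmse(rows):
        return (sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)) ** 0.5

    base = score_rmse(X)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(score_rmse(Xp) - base)
        scores.append(sum(drops) / n_repeats)
    return scores
```

A feature the model ignores scores exactly zero, since shuffling it leaves every prediction unchanged.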
In Figure 9, results from the two feature selection methods indicate that the month is
the most important variable, followed by the year. The relationship between total energy
consumption and the month may be due to seasonal and weather changes. Including
the year as a predictor may not provide meaningful insights beyond capturing the overall
trend of energy consumption changing over time. Factors such as population growth,
infrastructure improvements, or policy changes occurring annually in the WWTP may
influence energy consumption broadly. However, the year variable alone may not offer
specific insights into the factors driving energy demand variations within a year. Therefore,
we have focused primarily on the month as the key predictor. Importantly, the month
captures the seasonal and weather-related fluctuations that have a more immediate and
direct impact on energy consumption patterns in WWTPs. The two hydraulic variables,
average daily inflow and outflow rates, correlate well with the target. As these hydraulic
variables are correlated, we selected only one of them (i.e., Qin ). We can see that in climate
variables, Tavg, Tmax, Tmin, and the average humidity have more effect on EC prediction.
As there is a correlation between the average humidity and Tavg of 0.55, Tmin and Tmax of
0.76, and between Tavg and Tmax of 0.92, we can select only a few of them to construct
machine learning models. For example, we can ignore two variables from Tavg, Tmax,
and Tmin. From Figure 9, the other climate variables, such as WSmax, atmospheric
pressure, and visibility, are not relevant and can be ignored. Figure 9 shows that for
wastewater variables, TN, COD, and BOD are correlated with the target [60]. As there is a
relatively strong correlation between COD and TN (0.68), COD can be ignored. Figure 9
also indicates that Ammonia appears slightly more impactful than BOD in predicting energy
consumption, which could be attributed to several factors. One possible explanation is
that Ammonia levels in the influent wastewater can indicate the presence of nitrogen-rich
compounds, which require additional energy for their removal during the treatment process.
The energy-intensive processes involved in nitrogen removals, such as nitrification and
denitrification, may contribute to the higher impact of Ammonia on energy consumption.
On the other hand, BOD represents the amount of biodegradable organic matter in the
wastewater. This could potentially explain the relatively lower impact of BOD compared
to Ammonia on energy consumption in the studied context. Unexpectedly, influent water
quality did not exhibit a stronger predictive power for energy consumption than other
factors, such as water quantity. This could be attributed to several factors. Firstly, it is
important to consider the specific characteristics of the dataset and the operational context
of the studied WWTP. The dataset used in this study may have had relatively stable
and controlled influent water quality conditions, with limited fluctuations or variations
that could have a pronounced impact on energy consumption. Furthermore, it is worth
noting that the energy consumption in WWTPs is influenced by a complex interplay of
various factors, including hydraulic conditions, treatment processes, operational strategies,
and system design. While water quality parameters are important in the overall treatment
process, their individual contribution to energy consumption may be relatively lower than
other influential factors.
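The correlation-based pruning applied above (dropping one feature from each highly correlated pair, as with Tavg/Tmax/Tmin and COD/TN) can be sketched as follows; the 0.7 threshold and the feature names in the example are illustrative assumptions, not values from the study:

```python
# Greedy correlation-based feature pruning: keep a feature only if its
# absolute Pearson correlation with every already-kept feature is below
# the threshold.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def prune_correlated(columns, names, threshold=0.7):
    keep_idx = []
    for i in range(len(names)):
        if all(abs(pearson(columns[i], columns[k])) <= threshold
               for k in keep_idx):
            keep_idx.append(i)
    return [names[i] for i in keep_idx]
```

The greedy order matters: features listed first (e.g., by importance score) are kept in preference to later, correlated ones.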

Figure 9. Feature importance identification using RF and XGBoost methods with all input variables.

Here, we investigate the performance of the reduced machine learning models that
are built using a subset of features from the original dataset. These models are trained on a
reduced dataset that contains only the relevant features, and make predictions based on
that reduced dataset. Table 3 summarizes the prediction results of the reduced models
using testing data. The comparison shows that KNN outperformed all the other models,
with the lowest RMSE, MAE, and MAPE at 37.33 MWh, 28.23 MWh, and 10.65%, respectively.
The reduced KNN outperforms its static counterpart across all criteria, and its training
time decreased from 21 s to 10 s. It was followed by GSVR, BT, RF, and LightGBM, which
had the next lowest RMSE values. In terms of the J2 criterion, KNN has the lowest value
(0.08), followed by WNN (0.38). Furthermore, the training times of the neural networks are
generally the shortest of all the models.

Table 3. Prediction results of the reduced models based on testing data.

Methods    RMSE (MWh)    MAE (MWh)    MAPE (%)    RMSE Train (MWh)    Train Time (s)    J2
LSVR 47.76 37.19 12.30 39.52 102.18 1.46
CSVR 41.96 33.56 12.30 45.68 178.57 0.84
GSVR 41.96 32.16 11.17 38.39 61.98 1.19
QSVR 45.31 35.14 11.90 40.03 264.51 1.28
GPRE 43.30 33.50 11.44 38.06 379.95 1.29
GPRM3/2 42.35 32.67 11.25 38.30 462.06 1.22
GPRRQ 42.43 32.62 11.23 37.91 977.71 1.25
GPRSE 46.21 35.67 11.92 38.39 467.66 1.45
GPRM5/2 42.22 32.58 11.24 38.42 420.51 1.21
BT 41.70 32.35 11.43 35.51 450.59 1.38
BST 43.67 34.26 12.56 40.01 30.56 1.04
ODT 42.71 33.26 11.60 38.73 35.81 1.22
NNN 60.88 48.96 15.29 42.79 10.27 2.02
MNN 101.34 70.46 18.85 78.04 9.85 1.69
WNN 140.76 92.65 26.12 227.77 13.33 0.38
BNN 60.04 46.36 14.50 59.87 10.30 1.01
TNN 62.43 49.07 15.06 46.86 11.64 1.78
ONN 43.44 33.74 11.53 41.22 298.74 1.11
XGBoost 41.77 32.80 12.20 42.91 376.00 0.95
RF 41.61 32.31 12.25 41.48 64.00 1.01
KNN 37.33 28.23 10.65 41.41 10.00 0.08
LightGBM 41.27 32.00 12.27 41.27 103.00 1.00

In terms of MAPE, the KNN model has the best prediction performance at 10.65%, followed
by the GSVR model with a MAPE of 11.17%, and then the GPR models with MAPEs between
11.23 and 11.25%. Moreover, based on the RMSE criterion, Table 3 shows that the six best
models are KNN, LightGBM, RF, BT, XGBoost, and GSVR. The KNN and LightGBM models
capture the fluctuations best, with RMSE values of 37.33 and 41.27 MWh, respectively,
followed by the RF, BT, XGBoost, and GSVR models with RMSEs between 41.6 and 41.9 MWh.
Overall, the forecasting improves, with shorter training times and fewer features than the
static models.
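Since KNN is the strongest reduced model here, a minimal k-nearest-neighbours regressor is worth sketching (illustrative pure Python, not the tuned implementation evaluated above):

```python
# Minimal k-nearest-neighbours regression: predict the mean target of the
# k training points closest (Euclidean distance) to the query point.
def knn_predict(X_train, y_train, x, k=3):
    d = [(sum((a - b) ** 2 for a, b in zip(row, x)) ** 0.5, yi)
         for row, yi in zip(X_train, y_train)]
    d.sort(key=lambda p: p[0])          # nearest first
    return sum(yi for _, yi in d[:k]) / k
```

Its low "training" cost in Tables 2 and 3 reflects that KNN is lazy: fitting merely stores the data, and all work happens at prediction time.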

5.3. EC Prediction Using Dynamic Models


The results obtained in the previous experiments are based on static models that do
not account for the previous day’s energy consumption. However, energy consumption
data often exhibits a dynamic nature. This experiment will examine how machine learning
models perform when past data is incorporated into their construction. Towards this end,
we introduce lagged data, the lag 1 energy consumption data, when building prediction
models to capture energy consumption’s dynamic and evolving nature. These dynamically
reduced models are trained on a reduced dataset that contains only the relevant features
but also incorporates time-lagged measurements. Figure 10 shows the results of variable
importance identification based on the RF and XGBoost algorithms. According to RF and
XGBoost, Figure 10 illustrates the effect of each feature on energy consumption. For both
RF and XGBoost, it shows that the lag 1 energy consumption data significantly impacts
the target with a large score of around 0.53. This confirms the need to take past data into
account when building predictive models for energy consumption prediction.
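The Lag-1 design used by the dynamic models amounts to a simple transformation of the data, sketched below: each day's feature row is augmented with the previous day's energy consumption, and the first row (which has no lag available) is dropped.

```python
# Build a lagged design matrix: append y[t - lag] as an extra feature for
# predicting y[t], dropping the first `lag` rows that have no history.
def add_lag_features(X, y, lag=1):
    X_dyn = [X[t] + [y[t - lag]] for t in range(lag, len(y))]
    y_dyn = [y[t] for t in range(lag, len(y))]
    return X_dyn, y_dyn
```

The same helper with lag=2 or lag=3 reproduces the higher-order lags that, as noted above, performed worse than Lag-1 in this study.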

Figure 10. Feature importance identification using RF and XGBoost methods with all original input
variables and Lag 1 Energy consumption.

Similarly to the reduced models’ experiment, we construct the machine learning mod-
els based on training data with selected variables. In dynamic models, such as autoregres-
sive models, it is common to use lagged data to capture the dynamic and time-dependent
nature of the process being modeled. In this study with dynamic machine learning models,
Lag 1 energy consumption is a key predictor because it reflects the immediate past energy
consumption, significantly influencing the current energy consumption in the wastewater
treatment process. Note that we investigated the inclusion of other Lag orders, such as
Lag 2 and Lag 3 data, and the analysis consistently showed that incorporating Lag 1 data
provided superior prediction results compared to the use of Lag 2 and Lag 3 data. This
suggests the strong influence of immediate past energy consumption on the current energy
consumption in the wastewater treatment process. After building the models with Lag 1
energy consumption data as an input variable, we tested the dynamic models using data
from 29 January 2018, and 27 June 2019. Table 4 lists the prediction results of the dynamic
machine learning models. Results show that XGBoost achieved the best prediction results
in terms of RMSE (37.14); it is followed by kNN, GPRE, LightGBM, and BT with RMSE
values of 37.33, 37.36, 37.38, and 37.56 MWh, respectively (Table 4). Moreover, the time
consumed by XGBoost in training (398 s) is nearly 30 times that of the second-best model,
kNN (13 s), for a difference of less than 1% in RMSE.
Overall, the time consumed by XGBoost in training is significantly higher than that of
kNN, with XGBoost taking approximately 30 times longer than kNN. However, the differ-
ence in prediction performance between the two models is relatively small, with XGBoost
having a slightly better performance in terms of RMSE. This suggests that while XGBoost
may have a higher time complexity, it may be more suitable for certain applications where a
higher level of accuracy is desired, while kNN may be more suitable for applications where
computational efficiency is a priority. Hence, a trade-off between prediction accuracy and
computational time may need to be considered when selecting a machine-learning model
for energy consumption prediction in WWTPs.

Table 4. Prediction results of the dynamic models based on testing data.

Methods    RMSE (MWh)    MAE (MWh)    MAPE (%)    RMSE Train (MWh)    Train Time (s)    J2
KNN 37.33 28.23 10.65 37.48 13.00 0.99
XGBoost 37.14 28.50 10.81 37.62 398.00 0.97
LightGBM 37.38 28.63 10.96 37.11 150.00 1.01
GPRRQ 37.45 28.65 10.04 34.17 936.09 1.20
RF 37.86 28.73 10.91 41.73 66.00 0.82
GPRE 37.36 28.74 10.05 33.88 549.21 1.22
BT 37.56 28.75 10.27 33.99 332.30 1.22
GPRM5/2 37.42 28.81 10.07 33.93 502.68 1.22
BST 37.83 28.86 10.25 34.92 58.89 1.17
ODT 38.48 28.87 10.45 35.70 22.87 1.16
GSVR 37.70 28.88 10.12 34.47 34.78 1.20
GPRSE 37.59 28.92 10.10 33.96 784.89 1.23
QSVR 38.62 29.30 10.24 34.38 210.03 1.26
ONN 38.85 30.02 10.46 35.45 423.01 1.20
CSVR 40.07 30.13 10.35 44.40 122.44 0.81
LSVR 39.22 30.29 10.43 34.15 49.95 1.32
BNN 46.96 36.31 12.43 39.83 11.41 1.39
TNN 46.96 36.31 12.43 38.72 12.80 1.47
NNN 53.83 39.83 12.55 57.08 10.52 0.89
MNN 90.18 65.37 17.77 60.08 11.16 2.25
WNN 152.43 111.08 26.80 210.20 14.63 0.53

Figure 11a,b shows the heatmap of the RMSE values of the twenty-three models
(static, reduced, and dynamic models). The dynamic reduced models that incorporate
lagged energy consumption data show better prediction results in terms of RMSE and
MAPE when compared to static and reduced models. This suggests that considering past
energy consumption data can lead to more accurate predictions of energy consumption
in WWTPs (Figure 11). The dynamic ensemble models, led by XGBoost, show good
prediction performance in terms of RMSE and MAPE compared to static and reduced
models. The use of lagged energy consumption data in the dynamic models improves
the accuracy of predictions. However, it is important to note that the time complexity of
XGBoost is considerably higher than that of other models such as kNN.

Figure 11. Heatmap of (a) MAPE and (b) RMSE values obtained using the twenty-three models.

In summary, machine learning is a powerful tool that can be used to predict energy
consumption in WWTPs. By analyzing the available input-output data, machine learning
models can identify patterns and relationships that can be used to make accurate predic-
tions about energy consumption. This can help WWTPs reduce energy costs by identifying
opportunities for energy efficiency and optimizing the treatment process in several ways.
By accurately forecasting energy consumption, operators can implement proactive measures
to optimize energy usage. For example, if the prediction model indicates a peak in energy
demand during a specific time period, operators can schedule the operation of
energy-intensive equipment during off-peak hours or consider alternative
energy sources to minimize costs. Furthermore, by analyzing the factors influencing energy
consumption, such as influent characteristics, operational parameters, and treatment pro-
cesses, WWTPs can identify specific areas where energy efficiency improvements can be
made. This analysis may reveal opportunities to optimize process parameters, retrofit equip-
ment with energy-saving technologies, or implement advanced control strategies to reduce
energy waste. Additionally, predicting energy consumption can support decision-making
in allocating resources and investments. WWTPs can prioritize projects and investments
based on the predicted energy demands and potential energy savings. This allows for
targeted interventions and resource allocation towards areas that yield the greatest energy
efficiency improvements, resulting in long-term cost reductions. Moreover, by continuously
monitoring and updating the predictive model, WWTPs can assess the effectiveness of
energy-saving initiatives over time and fine-tune their energy management strategies. This
iterative process enables the identification of further optimization opportunities and the
implementation of adaptive measures to achieve sustained energy efficiency gains. In the
context of energy consumption prediction in WWTPs, dynamic models would be more
suitable, as they can capture the temporal dynamics of the data and make more accurate
predictions about future energy consumption. It is important to note that while XGBoost
performed well in terms of RMSE, other models such as kNN and GPR performed well
in terms of other evaluation metrics. Additionally, the time complexity of the models
should also be taken into consideration. Therefore, it is recommended to use a combination
of different models and evaluation metrics to optimize energy consumption prediction
in WWTPs.

6. Conclusions
This study investigates the application of machine learning techniques for predicting
energy consumption in WWTPs. Real data from a WWTP in Melbourne is utilized, and a
range of machine learning models, including kernel-based methods, ensemble learning
methods, ANN models, decision trees, and k-nearest neighbors, are assessed. Feature
selection methods, such as Random Forest and XGBoost, are employed to enhance model
efficiency. The findings demonstrate that incorporating past data through dynamic models,
specifically time-lagged measurements, improves the accuracy of energy consumption
predictions. The dynamic K-nearest neighbors model emerges as the top-performing model.
It is important to highlight that while XGBoost excels in terms of RMSE, other models like
kNN and GPR exhibit strong performance in different evaluation metrics. Furthermore,
considering the time complexity of the models is crucial. To optimize energy consumption
prediction in WWTPs, it is recommended to employ a combination of diverse models and
evaluation metrics.
The analysis of the Melbourne East WWTP data demonstrated that multiple variables
significantly influenced EC. Among these variables, month, TN, ammonia, daily tempera-
ture, humidity, and influent flow showed the highest impact on EC in the WWTP. However,
our investigation also revealed that factors such as rainfall, atmospheric pressure, and wind
speed did not exhibit significant effects on EC in the WWTP. Furthermore, our findings
indicated that incorporating lag 1 EC data improved the predictive performance of the
models. These results provide valuable insights into the factors influencing EC in the
Melbourne East WWTP and highlight the potential benefits of considering these variables
and lagged energy consumption in future energy consumption prediction models.
The proposed framework presented in this study can be customized and implemented
in other WWTPs by incorporating plant-specific data and relevant variables. While the
specific conclusions drawn from our research may not directly translate to other plants due
to differences in operational conditions and data characteristics, the underlying principles,
methodologies, and insights gained from our study can serve as valuable references for
other pollution treatment plants. By adapting the framework to their specific context, other
WWTPs can leverage the knowledge and approaches developed in our study to enhance
their understanding and prediction of energy consumption in their respective systems.
There is still room for improvement in energy consumption prediction for WWTPs
using machine learning.
• Despite optimizing the prediction model through variable selection methods, it is
important to acknowledge that our model’s predictive capability could be influenced
by other variables that were not included due to data limitations. Future research
should focus on exploring and incorporating a broader range of variables to enhance
the accuracy and comprehensiveness of energy consumption prediction models in
WWTPs. This could involve considering additional variables related to process con-
ditions, influent characteristics, operational parameters, and external factors such as
climate and regulatory changes. By incorporating these variables, we can improve the
predictive power of the models and gain a more comprehensive understanding of the
factors impacting energy consumption in WWTPs.
• In future work, we will emphasize the need for additional studies that focus on
validating the feasibility and utility of these models in real-world scenarios. This
will involve considering factors such as computational requirements and operational
constraints commonly encountered in real WWTP settings.
• Deep learning models, known for their ability to handle time-series data, present
an intriguing avenue for further exploration in forecasting energy consumption in
WWTPs. These models, such as recurrent neural networks (RNNs) and long short-term
memory (LSTM) networks, have demonstrated promising capabilities in capturing
temporal dependencies and patterns within time-series data [61,62]. By leveraging
their strengths, deep learning models could improve the accuracy and precision of
energy consumption forecasts in WWTPs.
• Another possibility for improvement is integrating wavelet-based multiscale data
representation with machine learning models. This approach would take into account
the temporal and frequency characteristics of the data and could potentially improve
the accuracy of the prediction models. Wavelet-based multiscale representation can
also be used to extract relevant features and patterns from the data, which could be
used to improve the performance of the machine learning models. This approach
could potentially provide more accurate predictions and lead to further optimization
of energy consumption in WWTPs.

Author Contributions: Y.A., Conceptualization, formal analysis, investigation, methodology, soft-


ware, writing—original draft, and writing—review and editing. F.H., Conceptualization, formal anal-
ysis, investigation, methodology, software, supervision, writing—original draft, and writing—review
and editing. Y.S., Investigation, methodology, conceptualization, supervision, and writing—review
and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This publication is based upon work supported by King Abdullah University of Science
and Technology (KAUST) Research Funding (KRF) from the Climate and Livability Initiative (CLI)
under Award No. ORA-2022-5339.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.
