Article
Unlocking the Potential of Wastewater Treatment:
Machine Learning Based Energy Consumption Prediction
Yasminah Alali, Fouzi Harrou * and Ying Sun
Abstract: Wastewater treatment plants (WWTPs) are energy-intensive facilities that fulfill stringent
effluent quality norms. Energy consumption prediction in WWTPs is crucial for cost savings, process
optimization, compliance with regulations, and reducing the carbon footprint. This paper evaluates
and compares a set of 23 candidate machine-learning models to predict WWTP energy consumption
using actual data from the Melbourne WWTP. To this end, Bayesian optimization has been applied to
calibrate the investigated machine learning models. Random Forest and XGBoost (eXtreme Gradient
Boosting) were applied to assess how the incorporated features influenced the energy consumption
prediction. In addition, this study investigated the consideration of information from past data
in improving prediction accuracy by incorporating time-lagged measurements. Results showed
that the dynamic models using time-lagged data outperformed the static and reduced machine
learning models. The study shows that including lagged measurements in the model improves
prediction accuracy, and the results indicate that the dynamic K-nearest neighbors model outperforms
state-of-the-art methods, delivering promising energy consumption predictions.
1. Introduction
Recycled water is a strategic alternative to mitigate water scarcity, particularly in arid regions. Notably, the treated water from wastewater treatment plants (WWTPs) can be used for different purposes, such as irrigation and aquariums, or discharged with a low level of pollution [1]. WWTPs are energy-intensive processes; therefore, improving their energy efficiency is needed from environmental and economic viewpoints [2]. It has been reported in refs. [3–5] that WWTPs' energy usage accounts for 4% of the national electricity in the United States and around 7% of the electrical energy worldwide [6]. One potential option for optimizing WWTPs and achieving energy savings is accurately predicting their energy consumption. By developing predictive models, operators and engineers can forecast the expected energy requirements of the treatment processes, enabling them to anticipate the energy demand associated with different operational scenarios and efficiently schedule equipment and processes. For example, by aligning energy-intensive activities, such as aeration or pumping, with periods of lower electricity rates or increased availability of renewable energy, WWTPs can strategically reduce their energy costs. This optimization approach ensures that energy-intensive processes are carried out when energy prices are more favorable or when renewable energy sources are abundant, leading to significant cost savings and a reduced environmental impact.
WWTPs currently operate conservatively, with high operating costs, and waste a large amount of energy. They are energy intensive because they require significant energy to perform the various treatment processes necessary to clean the wastewater. Some of the major energy-intensive processes in a WWTP include the aeration, mixing, and pumping
of water and solids for recirculation, filtration, and disinfection. Additionally, WWTPs also
require energy for processing biosolids, which may involve aerobic digestion, heat drying,
and dewatering. The energy demand for disinfection processes can vary depending on
the method employed, with chlorination being less energy-intensive compared to UV or
ozone disinfection. While sedimentation is a necessary process in wastewater treatment,
it is not considered highly energy-intensive. These energy-intensive activities and the
mechanical and electrical equipment used within the WWTP contribute significantly to
the overall energy consumption. The energy consumption of a WWTP can be reduced by
implementing energy-efficient technologies, such as using renewable energy sources, or by
optimizing the treatment process. Machine learning methods are being used in WWTPs to
improve their efficiency and reduce operational costs.
Enhancing the energy efficiency of WWTPs is essential to saving energy, reducing
economic expenses, and preserving resources and the environment [7–11]. Over the years,
numerous methods have been developed for modeling and predicting key characteristics
of WWTPs, including analytical and data-derived techniques [12–14]. Analytical-based
methods rely on a fundamental understanding of the physical process [15]. Developing
a precise physical model for complex, high-dimensional, and nonlinear systems is chal-
lenging, expensive, and time-consuming [16]. On the other hand, data-based methods
only rely on available historical data. Nowadays, data-driven methods, particularly ma-
chine learning methods, are more common in modeling and managing WWTP processes.
For example, ref. [14] investigated the application of machine learning-based approaches to
predict wastewater quality from WWTPs. They applied and compared six models, namely
RF, SVM, Gradient Tree Boosting (GTB), Adaptive Neuro-Fuzzy Inference System (ANFIS),
LSTM, and Seasonal Autoregressive Integrated Moving Average (SARIMAX). Hourly data
collected from three WWTPs has been used to assess the investigated models. Results
demonstrated that SARIMAX outperformed the other models in predicting wastewater
effluent quality and with acceptable time computation. Recently, Andreides et al. pre-
sented an overview of data-driven techniques applied for influent characteristics prediction
at WWTPs [17]. They showed that most reviewed works use machine learning-based
approaches, particularly neural networks. Some studies achieved comparable or better
outcomes using machine learning methods such as kNN and RF. This review concludes
that no one approach dominates all the others from the reviewed literature because they are
conducted using different datasets and settings, making the comparison difficult. Overall,
NNs and hybrid models exhibited satisfactory prediction performance. Guo et al. con-
sidered ANN and SVM models to predict the total nitrogen concentration in a WWTP in
Ulsan, Korea [18]. To this end, daily water quality and meteorological data were used as
input variables for the machine-learning models. The pattern search algorithm is adopted
to calibrate the ANN and SVM models. This study demonstrated the capability of these
two models in predicting water quality and the superior performance of the SVM in this
case study [18]. The study in [19] focused on predicting effluent water quality parameters
of the Tabriz WWTP through a supervised committee fuzzy logic approach. The study
in [2] evaluates the energy efficiency of a sample of Chilean wastewater treatment plants
(WWTPs) using a newly developed technique called stochastic non-parametric envelop-
ment of data (StoNED). This technique combines non-parametric and parametric methods,
and allows for an exploration of the influence of the operating environment on the energy
performance of WWTPs. The study found that the Chilean WWTPs were considerably
inefficient, with an average energy efficiency score of 0.433 and significant opportunities to
save energy (average savings were 203,413 MWh/year). The age of the facilities negatively
affected energy efficiency, and WWTPs using suspended-growth processes, such as conven-
tional activated sludge and extended aeration, had the lowest levels of energy efficiency.
The study suggests that this methodology could be used to support decision-making for
regulation and to plan the construction of new facilities, and the authors also suggest that
this methodology could be used to measure the energy efficiency of other stages of the
urban water cycle, such as drinking water treatment.
This study explores the potential of machine learning models for predicting energy con-
sumption in WWTPs. The following key points summarize the contributions of this paper:
• First, we considered all input variables, including hydraulic, wastewater characteris-
tics, weather, and time data, to predict energy consumption in a WWTP. This study
compared twenty-three machine learning models, including support vector regres-
sion with different kernels, GPR with different kernels, boosted trees, bagged trees,
decision trees, neural networks (NNs), RF, k-nearest neighbors (KNN), eXtreme Gradient
Boosting (XGBoost), and LightGBM. Bayesian optimization has been applied to calibrate
and fine-tune the investigated machine-learning models to develop efficient energy
consumption predictions. In addition, a 5-fold cross-validation technique has been
used to construct these models based on training data. Five performance evaluation
metrics are employed to assess the goodness of predictions. Results revealed that
when using all input variables to predict energy consumption (EC), the machine learning
models did not provide satisfactory predictions.
• Second, the aim is to construct reduced models by keeping only pertinent input
variables to predict EC in WWTP. To this end, Random Forest and XGBoost algorithms
were applied to identify important variables that considerably influence the prediction
capability of the considered models. Results showed that the reduced models obtained
a slightly improved prediction of EC compared to the full models.
• It is worthwhile to mention that the studied methods do not take into account the
time-dependent nature of energy consumption in the prediction process. Our final
contribution to addressing this limitation is constructing dynamic models by incorpo-
rating lagged measurements as inputs to enhance the ML models’ ability to perform
effectively. Results demonstrated that using lagged data contributes to improving the
prediction quality of the ML models and highlights the superior performance of the
dynamic GPR and KNN.
The remainder of this study is organized as follows. Section 2 provides an overview
of related works on energy consumption prediction. Section 3 presents the data from the
Melbourne WWTP and the airport weather station, along with an exploratory data analysis.
Furthermore, the investigated machine learning models are briefly described. Section 4
describes the proposed machine learning-based prediction framework. Section 5 contains
the results and discussion of the machine learning algorithms within our datasets. Lastly,
Section 6 recapitulates the paper and gives future directions for potential enhancements.
2. Related Works
Recently, many studies have explored the use of machine learning to manage
WWTPs by predicting how much energy they consume [20,21]. Machine learning-based
models are flexible and rely only on historical data from the inspected process. For in-
stance, ref. [22] used a Random Forest model to predict the energy consumption of WWTPs.
They assessed this machine-learning approach using 2387 records from the China Urban
Drainage Yearbook. Results indicate that Random Forest exhibited satisfactory prediction
performance with an R2 of 0.702. However, this study does not consider the effects of local
climate and technology in building the predictive model, which is very important. Ref. [23]
considered a logistic regression approach for predicting the energy consumption of a WWTP
in Romania. The input variables, including the flowrate and wastewater characteristics, are
used to predict energy consumption. Data were collected from a WWTP between 2015 and
2017 with 403 records to verify the efficiency of this approach. Results showed a reasonable
prediction quality with an accuracy of 80%. Nevertheless, not all parameters that affect water
quality were considered when constructing the logistic regression model. Ref. [24]
investigated the capacity of the Artificial Neural Network (ANN), K-Nearest Neighbors
(KNN), Support Vector Machine (SVM), and Linear Regression in predicting the energy
consumption of a WWTP located in Peninsular Malaysia. The energy consumption data
was collected from the Tenaga National Berhad (TNB) electrical bills from March 2011 to
February 2015. The wastewater characteristics are collected to construct the predictive
models. This study showed that the ANN model outperformed the other models. In
ref. [25], the purpose was to save energy at WWTP by performing a daily benchmark
analysis. Torregrossa et al. examined Support Vector Regression (SVR), ANN, and RF
algorithms on the Solingen-Burg WWTP dataset (designed for a connected population of
120,000 individuals). The RF was chosen as the most efficient algorithm based on an R2 of
0.72 in the validation and an R2 of 0.71 in the testing.
Furthermore, ref. [26] applied machine learning methods (ANN and RF) to predict
energy costs in WWTPs. This study is conducted based on 279 WWTPs located in north-
west Europe, including the Netherlands, France, Denmark, Belgium, Germany, Austria,
and Luxembourg. Regarding average R2 , RF reached 0.82, followed by the ANN with
0.81. ANN could be extensively used to investigate larger WWTP databases. Qiao et al.
proposed an approach to predict energy consumption and effluent quality based on a
density peaks-based adaptive fuzzy neural network (DP-AFNN) [27]. They showed that
this approach achieved high prediction accuracy compared to multiple linear regression,
the FNN-EBP (error backpropagation), and the Dynamic FNN. Ref. [5] focused on the mod-
eling and optimization of a wastewater pumping system to reduce energy consumption
using the ANN model. Specifically, they applied neural networks to model pump energy
consumption and wastewater flow rate. An artificial immune network algorithm is adopted
to solve the optimization problem by minimizing energy consumption and maximizing the
pumped wastewater flow rate. Results revealed that 6% to 14% of energy could be saved
while maintaining pumping performance. Ref. [28] investigated the capability of ANN, Gra-
dient Boosting Machine (GBM), RF, and the Long Short-Term Memory (LSTM) network in
predicting energy consumption records from a Melbourne WWTP. The prediction has been
performed by considering weather variables, wastewater characteristics, and hydraulic
variables. Feature extraction has been considered to select important variables to construct
the machine learning models. Results showed that the GBM model provided the best
prediction in this case study. However, with future changes in the test set data, a model’s
performance will degrade when applied to data from subsequent months. In [29], a neural
network model has been applied to predict pump energy consumption in a WWTP. This
will enable the generation of operational schedules for a pump system to decrease energy
consumption. The ANN model showed satisfactory prediction by reaching MAE (Mean
Absolute Error) and MAPE (Mean Absolute Percentage Error) of 0.78 and 0.02, respectively.
Table 1 shows some recent studies on energy consumption prediction in WWTPs using
various Machine Learning techniques.
and some weather time series collected in 2018. As WWTPs should handle highly dynamic
influent, energy consumption and influent will vary accordingly (Figure 2).
Figure 3 illustrates the total inflow and energy consumption for each year from 2014
to 2018. The data in Figure 3 is based on the available dataset, which covers the period
from January 2014 to June 2019. However, due to the limited data available for 2019
(only six months), we have excluded the 2019 data from Figure 3 to avoid any potential
confusion. The figure specifically focuses on the years from 2014 to 2018 to provide a clear
representation of the yearly variations in inflow and energy consumption. From Figure 3,
we observe a high similarity in the variation between the inflow and the consumed energy.
Essentially, the volume and composition of wastewater (inflow) that needs to be treated can
affect the energy consumption of a WWTP. An increase in the volume of wastewater can
increase the energy consumption of pumps and other equipment used to move the water
through the treatment process. Additionally, changes in the composition of the wastewater,
such as an increase in the amount of organic matter, can increase the energy consumption
of the biological treatment process.
Figure 3. Yearly sum of inflow (a) and energy consumption (b) from 2014 to 2018.
Figure 4 depicts the distribution of the collected data, indicating that these datasets
are non-Gaussian distributed. Here, kernel density estimation (KDE) [35], a non-parametric
method, is applied to estimate the underlying distribution of the data. KDE is a powerful
and flexible method for estimating the distribution of a dataset; it does not make any
assumptions about the underlying distribution and can be used to estimate any kind of
distribution. Figure 4 allows us to see the shape of the distribution, the presence of outliers,
and the general characteristics of the data. It is an effective way of understanding the nature
of the data and the underlying patterns in it.
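To make the density-estimation step concrete, the following minimal sketch shows how a kernel density estimate can be obtained with SciPy's gaussian_kde. The synthetic energy series is a placeholder standing in for the actual Melbourne consumption data, and the bandwidth rule is SciPy's default rather than the setting used for Figure 4.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Placeholder series standing in for the daily energy-consumption data (MWh).
rng = np.random.default_rng(0)
energy = rng.gamma(shape=9.0, scale=40.0, size=2000)

# Gaussian-kernel density estimate; SciPy selects the bandwidth via Scott's rule.
kde = gaussian_kde(energy)
grid = np.linspace(energy.min(), energy.max(), 200)
density = kde(grid)  # estimated probability density evaluated on the grid
```

Plotting density against grid reproduces the kind of smoothed, non-Gaussian distribution shown in Figure 4.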
To visualize the distribution of energy consumption over the years, Figure 5 displays
the boxplots of yearly WWTP power consumption during the studied period from 2014 to
2019. Note that data for 2019 is available for only six months, which is why the boxplot for
that year is more compact than the other years. From Figure 5, we observe that the annual
distribution of energy consumption in 2018 has slightly decreased in average values and
standard deviations compared to 2017. This decrease in energy consumption could be due
to the operator’s optimization and management of the WWTP.
Figure 5. Distribution of annual energy consumption over the course of the study period.
The boxplots in Figure 6 display the monthly energy consumption patterns, and it is
observed that there is a significant increase in variance during the hot months of October,
November, and December. This could be attributed to the high demand for water during
these months and also to the increase in tourism. These months are typically considered
the hottest period in Australia. The weather during this period is generally sunny, warm,
and humid, which can lead to increased water usage for cooling and other purposes
and also cause an influx of tourists, leading to increased strain on the WWTPs. This
information could be used to improve energy consumption prediction for better resource
allocation and planning.
Figure 6. Distribution of monthly energy consumption over the course of the study period.
3.2. Methodology
This section briefly describes the considered machine learning models for energy
consumption prediction in this paper. As presented in Figure 7, twenty-three machine
learning models are considered, including GPR, SVR, kNN, and ensemble learning models
(RF, BT, BST, XGBoost, and LightGBM). Each model has its own strengths and weaknesses.
Next, we provide a brief description of these popular machine-learning models.
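Before the individual model descriptions, the sketch below illustrates how such a pool of candidate regressors can be compared under 5-fold cross-validation in scikit-learn. It is a hedged Python analogue with placeholder data, not the authors' exact implementation or the full list of twenty-three models.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Placeholder data; in the study, X holds the WWTP features and y the energy consumption.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

candidates = {
    "SVR (Gaussian kernel)": SVR(kernel="rbf"),
    "GPR": GaussianProcessRegressor(),
    "KNN": KNeighborsRegressor(n_neighbors=5),
    "Bagged trees (BT)": BaggingRegressor(random_state=0),
    "Boosted trees (BST)": GradientBoostingRegressor(random_state=0),
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
}

# 5-fold cross-validation, reporting RMSE for each candidate model.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in candidates.items():
    rmse = -cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    print(f"{name:22s} RMSE = {rmse.mean():.2f} ± {rmse.std():.2f}")
```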
Bayesian optimization uses a probabilistic surrogate model to guide the hyperparameter
search, so it does not need to evaluate all possible combinations. In this study, we adopted
Bayesian optimization to calibrate the models. For more details on Bayesian Optimization,
refer to refs. [57,58].
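As one possible realization of this calibration step, the hedged sketch below tunes an XGBoost regressor with Bayesian optimization via scikit-optimize's BayesSearchCV under 5-fold cross-validation. The search space, iteration budget, and the X_train/y_train placeholders are illustrative assumptions rather than the settings reported in the paper.

```python
from sklearn.datasets import make_regression
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBRegressor

# Placeholder training data; replace with the actual WWTP features and target.
X_train, y_train = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

search = BayesSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror", random_state=0),
    search_spaces={
        "n_estimators": Integer(100, 1000),
        "max_depth": Integer(2, 10),
        "learning_rate": Real(1e-3, 0.3, prior="log-uniform"),
        "subsample": Real(0.5, 1.0),
    },
    n_iter=30,                               # number of Bayesian-optimization evaluations
    cv=5,                                    # 5-fold cross-validation, as in the study
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_, -search.best_score_)
```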
Selecting the best model is an important step in machine learning; it can be done by comparing
the performance of the different models using suitable evaluation metrics. Here, we employed
four commonly used metrics: root mean square error (RMSE), mean absolute error (MAE),
mean absolute percentage error (MAPE), and training time. Training time measures how long
it takes to train the model. In addition, we computed the J2 metric, which has recently been
introduced in [28]. The model that performs best according to these metrics is selected. In the
following, the measured and predicted energy consumption at record i are denoted by y_i and
ŷ_i, respectively, and n is the number of records.
• RMSE measures the differences between predicted and true values.
\mathrm{RMSE}(y,\hat{y}) = \sqrt{\frac{1}{n}\sum_{i=0}^{n-1}\left(y_i - \hat{y}_i\right)^2}, \qquad (1)
• MAE measures the average absolute difference between predicted and true values.
\mathrm{MAE}(y,\hat{y}) = \frac{1}{n}\sum_{i=0}^{n-1}\left| y_i - \hat{y}_i \right|, \qquad (2)
• MAPE measures the average percentage difference between predicted and true values.
\mathrm{MAPE}(y,\hat{y}) = \frac{1}{n}\sum_{i=0}^{n-1}\frac{\left| y_i - \hat{y}_i \right|}{\max\left(\epsilon,\ \left| y_i \right|\right)}, \qquad (3)
where \epsilon is a small positive constant that avoids division by zero.
• J2 is the ratio of the squared RMSE in testing to the squared RMSE in training [28].
J_2 = \frac{\mathrm{RMSE}^2_{\mathrm{test}}}{\mathrm{RMSE}^2_{\mathrm{train}}}. \qquad (4)
In summary, lower values of RMSE, MAE, and MAPE indicate predictions that are closer to the
true values, and a lower training time indicates a model that is faster to train; together, these
reflect better precision and quality of prediction.
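For reference, the four error metrics and the J2 ratio in Equations (1)–(4) can be computed directly from the measured and predicted series, as in the short NumPy sketch below; the epsilon guard in MAPE is an assumed small constant.

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error, Equation (1)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Mean absolute error, Equation (2)
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred, eps=1e-8):
    # Mean absolute percentage error, Equation (3); eps avoids division by zero
    return np.mean(np.abs(y_true - y_pred) / np.maximum(eps, np.abs(y_true)))

def j2(rmse_test, rmse_train):
    # Ratio of squared testing RMSE to squared training RMSE, Equation (4)
    return rmse_test ** 2 / rmse_train ** 2
```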
Table 2. Prediction results of full machine learning models based on testing data.
Methods  RMSE (MWh)  MAE (MWh)  MAPE (%)  RMSE Train (MWh)  Train Time (s)  J2
LSVR 46.40 36.11 12.09 39.12 60.297 1.40
CSVR 41.96 33.56 12.30 45.58 197.89 0.84
GSVR 41.85 32.28 11.24 38.86 28.94 1.15
QSVR 43.44 33.68 11.81 40.66 97.052 1.14
GPRE 43.32 33.48 11.45 37.92 469.67 1.30
GPRM3/2 46.02 35.51 11.88 38.16 308.8 1.45
GPRRQ 42.44 32.62 11.24 37.97 805.05 1.24
GPRSE 46.23 35.65 11.92 38.15 482.8 1.46
GPRM5/2 46.00 35.48 11.88 38.18 527.09 1.45
BT 41.46 32.23 11.36 35.67 442.61 1.35
BST 41.37 31.97 11.25 35.77 28.376 1.33
ODT 42.57 33.57 11.87 39.06 19.345 1.18
NNN 60.05 47.08 14.58 44.06 13.64 1.85
MNN 93.77 67.85 18.51 55.90 11.86 2.81
WNN 194.31 131.10 27.91 145.03 16.16 1.79
BNN 64.54 51.00 15.65 45.90 10.11 1.97
TNN 67.59 49.76 15.69 51.95 12.14 1.69
ONN 47.09 36.55 12.21 39.40 157.72 1.42
XGBoost 41.38 32.23 12.07 42.36 375.00 0.95
RF 42.43 33.16 12.07 44.90 75.00 0.89
KNN 41.51 32.82 12.30 41.41 21.00 1.00
LightGBM 41.76 32.34 12.40 41.76 107.00 1.00
and Tmin. From Figure 9, the other climate variables, such as WSmax, atmospheric
pressure, and visibility, are not relevant and can be ignored. Figure 9 shows that for
wastewater variables, TN, COD, and BOD are correlated with the target [60]. As there is a
relatively strong correlation between COD and TN (0.68), COD can be ignored. Figure 9
also indicates that Ammonia appears slightly more impactful than BOD in predicting energy
consumption, which could be attributed to several factors. One possible explanation is
that Ammonia levels in the influent wastewater can indicate the presence of nitrogen-rich
compounds, which require additional energy for their removal during the treatment process.
The energy-intensive processes involved in nitrogen removal, such as nitrification and
denitrification, may contribute to the higher impact of Ammonia on energy consumption.
On the other hand, BOD represents the amount of biodegradable organic matter in the
wastewater. This could potentially explain the relatively lower impact of BOD compared
to Ammonia on energy consumption in the studied context. Unexpectedly, influent water
quality did not exhibit a stronger predictive power for energy consumption than other
factors, such as water quantity. This could be attributed to several factors. Firstly, it is
important to consider the specific characteristics of the dataset and the operational context
of the studied WWTP. The dataset used in this study may have had relatively stable
and controlled influent water quality conditions, with limited fluctuations or variations
that could have a pronounced impact on energy consumption. Furthermore, it is worth
noting that the energy consumption in WWTPs is influenced by a complex interplay of
various factors, including hydraulic conditions, treatment processes, operational strategies,
and system design. While water quality parameters are important in the overall treatment
process, their individual contribution to energy consumption may be relatively lower than
other influential factors.
Figure 9. Feature importance identification using RF and XGBoost methods with all input variables.
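The feature-ranking step in Figure 9 can be reproduced in spirit with the impurity-based importances exposed by scikit-learn's RandomForestRegressor and XGBoost, as in the sketch below; the feature names, selection threshold, and synthetic data are placeholders for the actual Melbourne inputs, not the study's exact configuration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

# Placeholder data; in practice, X holds the hydraulic, wastewater, weather, and
# time features, and y the measured energy consumption (MWh).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 5)),
                 columns=["inflow", "ammonia", "TN", "Tavg", "humidity"])
y = 300 + 2.0 * X["inflow"] + 1.5 * X["ammonia"] + rng.normal(size=500)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
xgb = XGBRegressor(objective="reg:squarederror", random_state=0).fit(X, y)

# Side-by-side impurity-based importances from both models.
importance = pd.DataFrame(
    {"RF": rf.feature_importances_, "XGBoost": xgb.feature_importances_},
    index=X.columns,
).sort_values("RF", ascending=False)

# Keep features whose importance exceeds an assumed threshold in either model.
selected = importance[(importance > 0.02).any(axis=1)].index.tolist()
print(importance, selected)
```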
Here, we investigate the performance of the reduced machine learning models that
are built using a subset of features from the original dataset. These models are trained on a
reduced dataset that contains only the relevant features, and make predictions based on
that reduced dataset. Table 3 summarizes the prediction results of the reduced models
using testing data. The comparison of the models' performance shows that KNN
outperformed all of them, with the lowest RMSE, MAE, and MAPE errors of 37.33 MWh,
28.23 MWh, and 10.65%, respectively. The reduced KNN outperforms the static model across
all criteria, and its training time decreased from 12 s to 10 s. It was followed by GSVR, BT, RF,
and LightGBM, which had the next lowest RMSE values. In terms of the J2 criterion, WNN has
the lowest score of 0.38. Furthermore, the training times of the neural networks are generally
the shortest of all models.
Table 3. Prediction results of reduced machine learning models based on testing data.
Methods  RMSE (MWh)  MAE (MWh)  MAPE (%)  RMSE Train (MWh)  Train Time (s)  J2
LSVR 47.76 37.19 12.30 39.52 102.18 1.46
CSVR 41.96 33.56 12.30 45.68 178.57 0.84
GSVR 41.96 32.16 11.17 38.39 61.98 1.19
QSVR 45.31 35.14 11.90 40.03 264.51 1.28
GPRE 43.30 33.50 11.44 38.06 379.95 1.29
GPRM3/2 42.35 32.67 11.25 38.30 462.06 1.22
GPRRQ 42.43 32.62 11.23 37.91 977.71 1.25
GPRSE 46.21 35.67 11.92 38.39 467.66 1.45
GPRM5/2 42.22 32.58 11.24 38.42 420.51 1.21
BT 41.70 32.35 11.43 35.51 450.59 1.38
BST 43.67 34.26 12.56 40.01 30.56 1.04
ODT 42.71 33.26 11.60 38.73 35.81 1.22
NNN 60.88 48.96 15.29 42.79 10.27 2.02
MNN 101.34 70.46 18.85 78.04 9.85 1.69
WNN 140.76 92.65 26.12 227.77 13.33 0.38
BNN 60.04 46.36 14.50 59.87 10.30 1.01
TNN 62.43 49.07 15.06 46.86 11.64 1.78
ONN 43.44 33.74 11.53 41.22 298.74 1.11
XGBoost 41.77 32.80 12.20 42.91 376.00 0.95
RF 41.61 32.31 12.25 41.48 64.00 1.01
KNN 37.33 28.23 10.65 41.41 10.00 0.08
LightGBM 41.27 32.00 12.27 41.27 103.00 1.00
In terms of MAPE, the KNN model has the best prediction performance with 10.65%,
followed by the GSVR model with a MAPE of 11.17%, and then the GPR models with MAPEs
between 11.23% and 11.25%. Moreover, based on the RMSE criterion, Table 3 shows the six
best models: KNN, LightGBM, RF, BT, XGBoost, and GSVR. The KNN and LightGBM models
achieve the lowest RMSE values of 37.33 and 41.27, respectively, followed by the RF, BT,
XGBoost, and GSVR models with RMSEs between 41.6 and 42.0. Overall, compared with the
static models, the predictions improve with fewer features and shorter training times.
Figure 10. Feature importance identification using RF and XGBoost methods with all original input
variables and Lag 1 Energy consumption.
Similarly to the reduced models’ experiment, we construct the machine learning mod-
els based on training data with selected variables. In dynamic models, such as autoregres-
sive models, it is common to use lagged data to capture the dynamic and time-dependent
nature of the process being modeled. In this study with dynamic machine learning models,
Lag 1 energy consumption is a key predictor because it reflects the immediate past energy
consumption, significantly influencing the current energy consumption in the wastewater
treatment process. Note that we investigated the inclusion of other Lag orders, such as
Lag 2 and Lag 3 data, and the analysis consistently showed that incorporating Lag 1 data
provided superior prediction results compared to the use of Lag 2 and Lag 3 data. This
suggests the strong influence of immediate past energy consumption on the current energy
consumption in the wastewater treatment process. After building the models with Lag 1
energy consumption data as an input variable, we tested the dynamic models using data
collected between 29 January 2018 and 27 June 2019. Table 4 lists the prediction results of the dynamic
machine learning models. Results show that XGBoost achieved the best prediction results
in terms of RMSE (37.14); it is followed by kNN, GPRE, LightGBM, and BT with RMSE
values of 37.33, 37.36, 37.38, and 37.56, respectively (Table 4). Moreover, the time consumed
by XGBoost in training (398 s) is nearly 30 times that of the second-best model, kNN (13 s),
for a gain of only about 1% in prediction performance with regard to RMSE.
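A minimal way to build such dynamic inputs is to append the lag-1 energy value to the feature matrix with a pandas shift and then refit the model, as sketched below. The DataFrame layout, column names, and KNN settings are illustrative assumptions rather than the exact pipeline used here.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Placeholder daily frame; in practice it holds the selected WWTP features
# (inflow, ammonia, TN, temperature, ...) and the measured energy (MWh).
idx = pd.date_range("2014-01-01", periods=400, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"inflow": rng.normal(400, 50, 400),
                   "energy": rng.normal(350, 40, 400)}, index=idx)

# Lag-1 energy consumption: yesterday's value becomes an input for today.
df["energy_lag1"] = df["energy"].shift(1)
df = df.dropna()                     # drop the first row, whose lag is undefined

X = df.drop(columns=["energy"])
y = df["energy"]

# Dynamic KNN model fitted on the lag-augmented features.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
```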
Overall, the time consumed by XGBoost in training is significantly higher than that of
kNN, with XGBoost taking approximately 30 times longer than kNN. However, the differ-
ence in prediction performance between the two models is relatively small, with XGBoost
having a slightly better performance in terms of RMSE. This suggests that while XGBoost
may have a higher time complexity, it may be more suitable for certain applications where a
higher level of accuracy is desired, while kNN may be more suitable for applications where
computational efficiency is a priority. Hence, a trade-off between prediction accuracy and
computational time may need to be considered when selecting a machine-learning model
for energy consumption prediction in WWTPs.
Table 4. Prediction results of dynamic machine learning models based on testing data.
Methods  RMSE (MWh)  MAE (MWh)  MAPE (%)  RMSE Train (MWh)  Train Time (s)  J2
KNN 37.33 28.23 10.65 37.48 13.00 0.99
XGBoost 37.14 28.50 10.81 37.62 398.00 0.97
LightGBM 37.38 28.63 10.96 37.11 150.00 1.01
GPRRQ 37.45 28.65 10.04 34.17 936.09 1.20
RF 37.86 28.73 10.91 41.73 66.00 0.82
GPRE 37.36 28.74 10.05 33.88 549.21 1.22
BT 37.56 28.75 10.27 33.99 332.30 1.22
GPRM5/2 37.42 28.81 10.07 33.93 502.68 1.22
BST 37.83 28.86 10.25 34.92 58.89 1.17
ODT 38.48 28.87 10.45 35.70 22.87 1.16
GSVR 37.70 28.88 10.12 34.47 34.78 1.20
GPRSE 37.59 28.92 10.10 33.96 784.89 1.23
QSVR 38.62 29.30 10.24 34.38 210.03 1.26
ONN 38.85 30.02 10.46 35.45 423.01 1.20
CSVR 40.07 30.13 10.35 44.40 122.44 0.81
LSVR 39.22 30.29 10.43 34.15 49.95 1.32
BNN 46.96 36.31 12.43 39.83 11.41 1.39
TNN 46.96 36.31 12.43 38.72 12.80 1.47
NNN 53.83 39.83 12.55 57.08 10.52 0.89
MNN 90.18 65.37 17.77 60.08 11.16 2.25
WNN 152.43 111.08 26.80 210.20 14.63 0.53
Figure 11a,b shows the heatmaps of the MAPE and RMSE values of the twenty-three models
(static, reduced, and dynamic models). The dynamic reduced models that incorporate
lagged energy consumption data show better prediction results in terms of RMSE and
MAPE when compared to static and reduced models. This suggests that considering past
energy consumption data can lead to more accurate predictions of energy consumption
in WWTPs (Figure 11). The dynamic ensemble models, led by XGBoost, show good
prediction performance in terms of RMSE and MAPE compared to static and reduced
models. The use of lagged energy consumption data in the dynamic models improves
the accuracy of predictions. However, it is important to note that the time complexity of
XGBoost is considerably higher than that of other models, such as kNN.
Figure 11. Heatmap of (a) MAPE and (b) RMSE values obtained using the twenty-three models.
In summary, machine learning is a powerful tool that can be used to predict energy
consumption in WWTPs. By analyzing the available input-output data, machine learning
models can identify patterns and relationships that can be used to make accurate predic-
tions about energy consumption. This can help WWTPs reduce energy costs by identifying
opportunities for energy efficiency and optimizing the treatment process in several ways.
By accurately forecasting energy consumption, operators can implement proactive mea-
sures to optimize energy usage. For example, if the prediction model indicates a peak in
energy demand during a specific time period, operators can schedule
the operation of energy-intensive equipment during off-peak hours or consider alternative
energy sources to minimize costs. Furthermore, by analyzing the factors influencing energy
consumption, such as influent characteristics, operational parameters, and treatment pro-
cesses, WWTPs can identify specific areas where energy efficiency improvements can be
made. This analysis may reveal opportunities to optimize process parameters, retrofit equip-
ment with energy-saving technologies, or implement advanced control strategies to reduce
energy waste. Additionally, predicting energy consumption can support decision-making
in allocating resources and investments. WWTPs can prioritize projects and investments
based on the predicted energy demands and potential energy savings. This allows for
targeted interventions and resource allocation towards areas that yield the greatest energy
efficiency improvements, resulting in long-term cost reductions. Moreover, by continuously
monitoring and updating the predictive model, WWTPs can assess the effectiveness of
energy-saving initiatives over time and fine-tune their energy management strategies. This
iterative process enables the identification of further optimization opportunities and the
implementation of adaptive measures to achieve sustained energy efficiency gains. In the
context of energy consumption prediction in WWTPs, dynamic models would be more
suitable, as they can capture the temporal dynamics of the data and make more accurate
predictions about future energy consumption. It is important to note that while XGBoost
performed well in terms of RMSE, other models such as kNN and GPR performed well
in terms of other evaluation metrics. Additionally, the time complexity of the models
should also be taken into consideration. Therefore, it is recommended to use a combination
of different models and evaluation metrics to optimize energy consumption prediction
in WWTPs.
6. Conclusions
This study investigates the application of machine learning techniques for predicting
energy consumption in WWTPs. Real data from a WWTP in Melbourne is utilized, and a
range of machine learning models, including kernel-based methods, ensemble learning
methods, ANN models, decision trees, and k-nearest neighbors, are assessed. Feature
selection methods, such as Random Forest and XGBoost, are employed to enhance model
efficiency. The findings demonstrate that incorporating past data through dynamic models,
specifically time-lagged measurements, improves the accuracy of energy consumption
predictions. The dynamic K-nearest neighbors model emerges as the top-performing model.
It is important to highlight that while XGBoost excels in terms of RMSE, other models like
kNN and GPR exhibit strong performance in different evaluation metrics. Furthermore,
considering the time complexity of the models is crucial. To optimize energy consumption
prediction in WWTPs, it is recommended to employ a combination of diverse models and
evaluation metrics.
The analysis of the Melbourne East WWTP data demonstrated that multiple variables
significantly influenced EC. Among these variables, month, TN, ammonia, daily tempera-
ture, humidity, and influent flow showed the highest impact on EC in the WWTP. However,
our investigation also revealed that factors such as rainfall, atmospheric pressure, and wind
speed did not exhibit significant effects on EC in the WWTP. Furthermore, our findings
indicated that incorporating lag 1 EC data improved the predictive performance of the
models. These results provide valuable insights into the factors influencing EC in the
Melbourne East WWTP and highlight the potential benefits of considering these variables
and lagged energy consumption in future energy consumption prediction models.
The proposed framework presented in this study can be customized and implemented
in other WWTPs by incorporating plant-specific data and relevant variables. While the
specific conclusions drawn from our research may not directly translate to other plants due
to differences in operational conditions and data characteristics, the underlying principles,
methodologies, and insights gained from our study can serve as valuable references for
other pollution treatment plants. By adapting the framework to their specific context, other
WWTPs can leverage the knowledge and approaches developed in our study to enhance
their understanding and prediction of energy consumption in their respective systems.
There is still room for improvement in energy consumption prediction for WWTPs
using machine learning:
• Despite optimizing the prediction model through variable selection methods, it is
important to acknowledge that our model’s predictive capability could be influenced
by other variables that were not included due to data limitations. Future research
should focus on exploring and incorporating a broader range of variables to enhance
the accuracy and comprehensiveness of energy consumption prediction models in
WWTPs. This could involve considering additional variables related to process con-
ditions, influent characteristics, operational parameters, and external factors such as
climate and regulatory changes. By incorporating these variables, we can improve the
predictive power of the models and gain a more comprehensive understanding of the
factors impacting energy consumption in WWTPs.
• In future work, we will emphasize the need for additional studies that focus on
validating the feasibility and utility of these models in real-world scenarios. This
will involve considering factors such as computational requirements and operational
constraints commonly encountered in real WWTP settings.
• Deep learning models, known for their ability to handle time-series data, present
an intriguing avenue for further exploration in forecasting energy consumption in
WWTPs. These models, such as recurrent neural networks (RNNs) and long short-term
memory (LSTM) networks, have demonstrated promising capabilities in capturing
temporal dependencies and patterns within time-series data [61,62]. By leveraging
their strengths, deep learning models could improve the accuracy and precision of
energy consumption forecasts in WWTPs.
• Another possibility for improvement is integrating wavelet-based multiscale data
representation with machine learning models. This approach would take into account
the temporal and frequency characteristics of the data and could potentially improve
the accuracy of the prediction models. Wavelet-based multiscale representation can
also be used to extract relevant features and patterns from the data, which could be
used to improve the performance of the machine learning models. This approach
could potentially provide more accurate predictions and lead to further optimization
of energy consumption in WWTPs.
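As a brief illustration of the wavelet-based idea in the last bullet, the sketch below decomposes a placeholder daily energy series into multiscale components with PyWavelets; the wavelet family, decomposition level, and synthetic data are assumptions, and the resulting coefficients (or reconstructed scale signals) would simply be appended to the models' feature set.

```python
import numpy as np
import pywt

# Placeholder daily energy series (MWh); replace with the measured consumption.
rng = np.random.default_rng(0)
t = np.arange(730)
energy = 350 + 20 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 10, t.size)

# Three-level discrete wavelet decomposition with a Daubechies-4 wavelet:
# coeffs[0] is the coarse approximation, coeffs[1:] are detail coefficients.
coeffs = pywt.wavedec(energy, "db4", level=3)

# Smooth trend obtained by zeroing the detail scales before reconstruction;
# it can serve as an additional multiscale feature for the ML models.
approx = pywt.waverec([coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]], "db4")
approx = approx[:energy.size]        # trim a possible one-sample length mismatch
```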
References
1. Gu, Y.; Li, Y.; Li, X.; Luo, P.; Wang, H.; Wang, X.; Wu, J.; Li, F. Energy self-sufficient wastewater treatment plants: Feasibilities and
challenges. Energy Procedia 2017, 105, 3741–3751. [CrossRef]
2. Molinos-Senante, M.; Maziotis, A. Evaluation of energy efficiency of wastewater treatment plants: The influence of the technology
and aging factors. Appl. Energy 2022, 310, 118535. [CrossRef]
3. Daw, J.; Hallett, K.; DeWolfe, J.; Venner, I. Energy Efficiency Strategies for Municipal Wastewater Treatment Facilities; Technical Report;
National Renewable Energy Lab. (NREL): Golden, CO, USA, 2012.
4. Goldstein, R.; Smith, W. Water & Sustainability: US Electricity Consumption for Water Supply & Treatment-the Next Half Century;
Electric Power Research Institute: Palo Alto, CA, USA, 2002; Volume 4.
5. Zhang, Z.; Kusiak, A.; Zeng, Y.; Wei, X. Modeling and optimization of a wastewater pumping system with data-mining methods.
Appl. Energy 2016, 164, 303–311. [CrossRef]
6. Plappally, A.; Lienhard V, J.H. Energy requirements for water production, treatment, end use, reclamation, and disposal. Renew.
Sustain. Energy Rev. 2012, 16, 4818–4848. [CrossRef]
7. Robescu, L.D.; Boncescu, C.; Bondrea, D.A.; Presura-Chirilescu, E. Impact of wastewater treatment plant technology on power
consumption and carbon footprint. In Proceedings of the 2019 International Conference on ENERGY and ENVIRONMENT
(CIEM), Timisoara, Romania, 17–18 October 2019; pp. 524–528.
8. Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. A review of the artificial neural network models for water quality prediction. Appl. Sci.
2020, 10, 5776. [CrossRef]
9. Harrou, F.; Cheng, T.; Sun, Y.; Leiknes, T.; Ghaffour, N. A data-driven soft sensor to forecast energy consumption in wastewater
treatment plants: A case study. IEEE Sens. J. 2020, 21, 4908–4917. [CrossRef]
10. Cheng, T.; Harrou, F.; Kadri, F.; Sun, Y.; Leiknes, T. Forecasting of wastewater treatment plant key features using deep
learning-based models: A case study. IEEE Access 2020, 8, 184475–184485. [CrossRef]
11. El-Rawy, M.; Abd-Ellah, M.K.; Fathi, H.; Ahmed, A.K.A. Forecasting effluent and performance of wastewater treatment plant
using different machine learning techniques. J. Water Process Eng. 2021, 44, 102380. [CrossRef]
12. Hilal, A.M.; Althobaiti, M.M.; Eisa, T.A.E.; Alabdan, R.; Hamza, M.A.; Motwakel, A.; Al Duhayyim, M.; Negm, N. An Intelligent
Carbon-Based Prediction of Wastewater Treatment Plants Using Machine Learning Algorithms. Adsorpt. Sci. Technol. 2022,
2022, 8448489. [CrossRef]
13. Safeer, S.; Pandey, R.P.; Rehman, B.; Safdar, T.; Ahmad, I.; Hasan, S.W.; Ullah, A. A review of artificial intelligence in water
purification and wastewater treatment: Recent advancements. J. Water Process Eng. 2022, 49, 102974. [CrossRef]
14. Ly, Q.V.; Truong, V.H.; Ji, B.; Nguyen, X.C.; Cho, K.H.; Ngo, H.H.; Zhang, Z. Exploring potential machine learning application
based on big data for prediction of wastewater quality from different full-scale wastewater treatment plants. Sci. Total Environ.
2022, 832, 154930. [CrossRef]
15. Cheng, T.; Harrou, F.; Sun, Y.; Leiknes, T. Monitoring influent measurements at water resource recovery facility using data-driven
soft sensor approach. IEEE Sens. J. 2018, 19, 342–352. [CrossRef]
16. Haimi, H.; Mulas, M.; Corona, F.; Vahala, R. Data-derived soft-sensors for biological wastewater treatment plants: An overview.
Environ. Model. Softw. 2013, 47, 88–107. [CrossRef]
17. Andreides, M.; Dolejš, P.; Bartáček, J. The prediction of WWTP influent characteristics: Good practices and challenges. J. Water
Process Eng. 2022, 49, 103009. [CrossRef]
18. Guo, H.; Jeong, K.; Lim, J.; Jo, J.; Kim, Y.M.; Park, J.p.; Kim, J.H.; Cho, K.H. Prediction of effluent concentration in a wastewater
treatment plant using machine learning models. J. Environ. Sci. 2015, 32, 90–101. [CrossRef] [PubMed]
19. Nadiri, A.A.; Shokri, S.; Tsai, F.T.C.; Moghaddam, A.A. Prediction of effluent quality parameters of a wastewater treatment plant
using a supervised committee fuzzy logic model. J. Clean. Prod. 2018, 180, 539–549. [CrossRef]
20. Hernández-del Olmo, F.; Gaudioso, E.; Duro, N.; Dormido, R. Machine learning weather soft-sensor for advanced control of
wastewater treatment plants. Sensors 2019, 19, 3139. [CrossRef] [PubMed]
21. Alali, Y.; Harrou, F.; Sun, Y. Predicting Energy Consumption in Wastewater Treatment Plants through Light Gradient Boosting
Machine: A Comparative Study. In Proceedings of the 2022 10th International Conference on Systems and Control (ICSC),
Marseille, France, 23–25 November 2022; pp. 137–142. [CrossRef]
22. Zhang, S.; Wang, H.; Keller, A.A. Novel Machine Learning-Based Energy Consumption Model of Wastewater Treatment Plants.
ACS ES&T Water 2021, 1, 2531–2540.
23. Boncescu, C.; Robescu, L.; Bondrea, D.; Măcinic, M. Study of energy consumption in a wastewater treatment plant using logistic
regression. In IOP Conference Series: Earth and Environmental Science, Proceedings of the 4th International Conference on Biosciences
(ICoBio 2021), Bogor, Indonesia, 11–12 August 2021; IOP Publishing: Bristol, UK, 2021; Volume 664, p. 012054.
24. Ramli, N.A.; Abdul Hamid, M.F. Data Based Modeling of a Wastewater Treatment Plant by using Machine Learning Methods.
J. Eng. Technol. 2019, 6, 14–21.
25. Torregrossa, D.; Schutz, G.; Cornelissen, A.; Hernández-Sancho, F.; Hansen, J. Energy saving in WWTP: Daily benchmarking
under uncertainty and data availability limitations. Environ. Res. 2016, 148, 330–337. [CrossRef]
26. Torregrossa, D.; Leopold, U.; Hernández-Sancho, F.; Hansen, J. Machine learning for energy cost modelling in wastewater
treatment plants. J. Environ. Manag. 2018, 223, 1061–1067. [CrossRef] [PubMed]
27. Qiao, J.; Zhou, H. Modeling of energy consumption and effluent quality using density peaks-based adaptive fuzzy neural
network. IEEE/CAA J. Autom. Sin. 2018, 5, 968–976. [CrossRef]
28. Bagherzadeh, F.; Nouri, A.S.; Mehrani, M.J.; Thennadil, S. Prediction of energy consumption and evaluation of affecting factors in
a full-scale WWTP using a machine learning approach. Process Saf. Environ. Prot. 2021, 154, 458–466. [CrossRef]
29. Zhang, Z.; Zeng, Y.; Kusiak, A. Minimizing pump energy in a wastewater processing plant. Energy 2012, 47, 505–514. [CrossRef]
30. Oulebsir, R.; Lefkir, A.; Safri, A.; Bermad, A. Optimization of the energy consumption in activated sludge process using deep
learning selective modeling. Biomass Bioenergy 2020, 132, 105420. [CrossRef]
31. Das, A.; Kumawat, P.K.; Chaturvedi, N.D. A Study to Target Energy Consumption in Wastewater Treatment Plant using
Machine Learning Algorithms. In Computer Aided Chemical Engineering; Elsevier: Amsterdam, The Netherlands, 2021; Volume 50,
pp. 1511–1516.
32. Oliveira, P.; Fernandes, B.; Analide, C.; Novais, P. Forecasting energy consumption of wastewater treatment plants with a transfer
learning approach for sustainable cities. Electronics 2021, 10, 1149. [CrossRef]
33. Yusuf, J.; Faruque, R.B.; Hasan, A.J.; Ula, S. Statistical and Deep Learning Methods for Electric Load Forecasting in Multiple
Water Utility Sites. In Proceedings of the 2019 IEEE Green Energy and Smart Systems Conference (IGESSC), Long Beach, CA,
USA, 4–5 November 2019; pp. 1–5.
34. Filipe, J.; Bessa, R.J.; Reis, M.; Alves, R.; Póvoa, P. Data-driven predictive energy optimization in a wastewater pumping station.
Appl. Energy 2019, 252, 113423. [CrossRef]
35. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076. [CrossRef]
36. Yu, P.S.; Chen, S.T.; Chang, I.F. Support vector regression for real-time flood stage forecasting. J. Hydrol. 2006, 328, 704–716.
[CrossRef]
37. Hong, W.C.; Dong, Y.; Chen, L.Y.; Wei, S.Y. SVR with hybrid chaotic genetic algorithms for tourism demand forecasting. Appl.
Soft Comput. 2011, 11, 1881–1890. [CrossRef]
38. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222. [CrossRef]
39. Lee, J.; Wang, W.; Harrou, F.; Sun, Y. Reliable solar irradiance prediction using ensemble learning-based models: A comparative
study. Energy Convers. Manag. 2020, 208, 112582. [CrossRef]
40. Lee, J.; Wang, W.; Harrou, F.; Sun, Y. Wind power prediction using ensemble learning-based models. IEEE Access 2020,
8, 61517–61527. [CrossRef]
41. Harrou, F.; Saidi, A.; Sun, Y.; Khadraoui, S. Monitoring of photovoltaic systems using improved kernel-based learning schemes.
IEEE J. Photovoltaics 2021, 11, 806–818. [CrossRef]
42. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
43. Williams, C.; Rasmussen, C. Gaussian processes for regression. Adv. Neural Inf. Process. Syst. 1995, 8, 514–520.
44. Tang, L.; Yu, L.; Wang, S.; Li, J.; Wang, S. A novel hybrid ensemble learning paradigm for nuclear energy consumption forecasting.
Appl. Energy 2012, 93, 432–443. [CrossRef]
45. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2013;
Volume 112.
46. Loh, W.Y. Classification and regression trees. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 14–23. [CrossRef]
47. Zhou, Z. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012; pp. 15–55.
48. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
49. Bühlmann, P.; Hothorn, T. Boosting algorithms: Regularization, prediction and model fitting. Stat. Sci. 2007, 22, 477–505.
50. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
51. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Zhou, T. Xgboost: Extreme gradient boosting.
R Package Version 0.4-2 2015, 1, 1–4. Available online: https://cran.r-project.org/web/packages/xgboost/vignettes/xgboost.pdf
(accessed on 17 June 2023).
52. Deng, H.; Yan, F.; Wang, H.; Fang, L.; Zhou, Z.; Zhang, F.; Xu, C.; Jiang, H. Electricity Price Prediction Based on LSTM and
LightGBM. In Proceedings of the 2021 IEEE 4th International Conference on Electronics and Communication Engineering
(ICECE), Xi’an, China, 17–19 December 2021; pp. 286–290.
53. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision
tree. Adv. Neural Inf. Process. Syst. 2017, 30, 52.
54. Bull, A.D. Convergence rates of efficient global optimization algorithms. J. Mach. Learn. Res. 2011, 12, 2879–2904.
55. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
56. Yang, L.; Shami, A. On hyperparameter optimization of machine learning algorithms: Theory and practice. Neurocomputing 2020,
415, 295–316. [CrossRef]
57. Snoek, J.; Larochelle, H.; Adams, R.P. Practical bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process.
Syst. 2012, 25, 1–12 .
58. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian
optimization. Proc. IEEE 2015, 104, 148–175. [CrossRef]
59. Hastie, T.; Tibshirani, R.; Friedman, J.H.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction;
Springer: Amsterdam, The Netherlands, 2009; Volume 2.
60. Wang, S.; Zou, L.; Li, H.; Zheng, K.; Wang, Y.; Zheng, G.; Li, J. Full-scale membrane bioreactor process WWTPs in East Taihu
basin: Wastewater characteristics, energy consumption and sustainability. Sci. Total Environ. 2020, 723, 137983. [CrossRef]
61. Rathnayake, N.; Rathnayake, U.; Dang, T.L.; Hoshino, Y. A Cascaded Adaptive Network-Based Fuzzy Inference System for
Hydropower Forecasting. Sensors 2022, 22, 2905. [CrossRef]
62. Harrou, F.; Dairi, A.; Kadri, F.; Sun, Y. Forecasting emergency department overcrowding: A deep learning framework. Chaos
Solitons Fractals 2020, 139, 110247. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.