sensors-24-02871
Article
Smart Water Quality Monitoring with IoT Wireless
Sensor Networks
Yurav Singh * and Tom Walingo *
Abstract: Traditional laboratory-based water quality monitoring and testing approaches are soon to be
outdated, mainly because of the need for real-time feedback and immediate responses to emergencies.
The more recent wireless sensor network (WSN)-based techniques are evolving to alleviate the
problems of monitoring, coverage, and energy management, among others. The inclusion of the
Internet of Things (IoT) in WSN techniques can further improve them, delivering effective and
efficient real-time water-monitoring systems that reap the benefits of IoT wireless systems.
However, they still suffer from the inability to deliver accurate real-time data, a
lack of reconfigurability, the need to be deployed in ad hoc harsh environments, and their limited
acceptability within industry. Electronic sensors are required for them to be effectively incorporated
into the IoT WSN water-quality-monitoring system. Very few electronic sensors exist for parameter
measurement. This necessitates the incorporation of artificial intelligence (AI) sensory techniques
into smart water-quality-monitoring systems, which estimate indicators that lack electronic sensors
by relating them to available sensor data. This approach is in its infancy and is not yet accepted
or standardized by the industry. This work presents a smart water-quality-monitoring framework
featuring an intelligent IoT WSN monitoring system. The system uses AI sensors for indicators
without electronic sensors, as the design of electronic sensors is lagging behind monitoring systems.
In particular, machine learning algorithms are used to predict E. coli concentrations in water. Six
different machine learning models (ridge regression, random forest regressor, stochastic gradient
boosting, support vector machine, k-nearest neighbors, and AdaBoost regressor) are used on a
sourced dataset. From the results, the best-performing model on average during testing was the
AdaBoost regressor (a MAE of 14.37 counts/100 mL), and the worst-performing model was stochastic
gradient boosting (a MAE of 42.27 counts/100 mL). The development and application of such a
system is not trivial. The best-performing water parameter set (Set A) contained pH, conductivity,
chloride, turbidity, nitrates, and chlorophyll.
Citation: Singh, Y.; Walingo, T. Smart Water Quality Monitoring with IoT Wireless Sensor Networks. Sensors 2024, 24, 2871. https://doi.org/10.3390/s24092871
Keywords: artificial intelligence; IoT; machine learning; water quality measurement; water quality
indicators; water quality sensors; wireless sensor networks
Academic Editor: Shimshon Belkin
Received: 28 November 2023
Revised: 3 January 2024
Accepted: 11 January 2024
Published: 30 April 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
The traditional approach to water quality measurement (WQM) requires manual water
sampling to determine the quality of water through analysis. This process involves the
collection of water samples by humans for in situ testing by lab technicians in laboratories.
While this process does not enable instantaneous WQM, it has been considered to be the
most feasible solution. Research has generally been focused on improving laboratory
techniques in analyzing water quality [1] and the introduction of sampling-site laboratories
near water bodies to make monitoring more efficient using existing techniques [2]. It
is apparent that the existing WQM techniques have shortcomings. These flaws can be
broadly classified as either human error in the collection of samples, during analysis, and
during the recording of data or improper lab equipment and its handling in the same
Table 1. Cont.
Clarity | • Secchi disk methods *. • Turbidity measurement methods *. | • Can use sensors for turbidity to determine the clarity based on the relationship between the two parameters.
Temperature | • Devices such as thermistors, thermocouples, and thermometers *. • Conductivity–Temperature–Depth (CTD) meter *. | • Analogue probes: YSI Xylem WQ101, OH, USA [25]. • Digital probes: DFROBOT SEN0511 DS18B20, Shanghai, China [26]. • Modbus industrial sensors: ComWinTop CWT-T01S, Shenzhen, China [27].
Sensors 2024, 24, 2871 4 of 22
There have been tremendous advances in WSN water quality monitoring. Yang
et al. developed and tested a wireless sensor network for monitoring an aqueous environ-
ment [37]. The system was developed due to the importance placed on developing network
sensor technology in aqueous environments. Ryecroft et al. noted the major developments
in the monitoring of air quality using IoT technology; however, water quality monitoring
is still dependent on manual sample collection [38]. Rosero-Montalvo et al. presented
an intelligent WSN system that has the ability to determine the quality of water using
machine learning (ML) algorithms [39]. The aim of that research was to determine the water
quality along the course of a river and to present the resulting data reports to users through
interactive interfaces. Adu-Manu et al. implemented a smart river monitoring system using wireless
sensor networks [40]. The focus of the system was to attain energy efficiency during the
monitoring and transmission of data. Murphy et al. developed a low-cost optical sensor
for water quality monitoring [41]. The development of the optical sensor was informed by
the challenges that wirelessly networked sensors currently have despite advancements in
water quality monitoring. O’Flynn et al. presented the “SmartCoast” multi-sensor system
for water quality monitoring [42]. The SmartCoast system creates a WSN with plug and
play sensors to facilitate communication with low power consumption. Seders et al. presented
LakeNet [43], a network of sensors that monitors water quality in lakes and wetlands for
the following parameters: temperature, pH, and DO. Chen et al.
developed a system with wireless transmission technology to transmit the water quality
parameters of a fish farm [44]. The parameters monitored were temperature, DO, pH levels,
level of the water, and the implemented sensors’ life expectancy. The system incorporates a
robotic arm with a programmable logic controller, wireless transmission, and an embedded
system designed to undertake automatic measurement and maintenance. Jáquez et al.
developed a prototype utilizing IoT technologies in a water-quality-monitoring system
(IoT-WQMS) [45]. The architecture of the system has a LoRa repeater and an anomaly
detection algorithm. The results of the study indicated that the prototype improved the
reliability of monitoring by promptly identifying sensor malfunctions and by increasing the
signal range of the LoRa link. Razman et al. created a water quality monitoring and filtration
system controlled by Arduino [46]. The system was developed to compare the water quality
of water from lake, river, and tap sources. The system monitored pH levels, turbidity, EC,
ORP, and temperature through the ThingSpeak platform. None of these works present
comprehensive underwater IoT WSNs that utilize ML algorithms to determine unknown
parameter data using the existing data captured by sensor nodes.
The current trend in WQM introduces artificial intelligence (AI) in determining water
quality. Whilst the integration of AI techniques to determine water quality is still a relatively
new approach, there have been systems that incorporate them. AI techniques provide large
water bodies, i.e., rivers, with greater monitoring efficiency. The substantial data collected
through sensor networks can be assessed through prediction by the AI techniques. Ubah
et al. implemented a system that forecasted water quality parameters using an artificial
neural network [47]. The river was tested at four points for parameters that include pH,
TDS, EC, and sodium. Khan and Islam presented machine-learning-based prediction and
classification models to predict and classify water quality status [48]. The parameters
predicted included pH, suspended solids, EC, TDS, turbidity, DO, alkalinity, chloride, and
chemical oxygen demand. Aldhyani et al. implemented a water quality prediction
system using AI algorithms [49]. In the system, advanced AI algorithms are developed
to predict the water quality index and water quality classification (WQC). Paepae et al.
reviewed the feasibility of utilizing virtual sensing for water quality assessment [50]. One of
the findings of the review was that random forest, artificial neural networks, and multiple
linear regression approaches dominated machine learning techniques in inferential model
development. Chen et al. proposed a hybrid model of machine learning and optimization
algorithms to predict water quality parameters [51]. The authors used the Adaptive
Evolutionary Artificial Bee Colony–Back Propagation Neural Network (AEABC-BPNN)
algorithm model, which was compared to the prediction results of support vector machine
(SVM), back propagation neural network (BPNN), genetic algorithm (GA)–BPNN, particle
swarm optimization (PSO)–BPNN, Artificial Bee Colony (ABC)–BPNN, and long short-
term memory (LSTM) models. AEABC-BPNN was found to increase the robustness of the
prediction models. The results from the testing process showed that the AEABC-BPNN
approach attained convergence in 14 generations and had a quicker convergence speed.
AEABC-BPNN had an optimal mean fitness of 0.0322 and obtained more accurate prediction
values after data anomalies were processed. None of these works provide a
comprehensive solution utilizing ML algorithms to determine the E. coli concentrations in
water using WSNs.
The work of Stocker et al. illustrated that high accuracy in predicting E. coli levels was
possible with just five core parameters, determined through recursive feature selection: pH,
DO, EC, temperature, and turbidity [52]. The work also outlined that the inclusion of more
parameters (8 or 12) only moderately increased the performance of the prediction model [52].
Stocker et al. determined that a random forest (RF) model consistently provided greater
performance in predicting E. coli levels when compared to other models [52]. Naloufi et al.
found that the prediction of microbial quality in surface waters still proves to be difficult;
thus, the concentrations could not be predicted in all contexts. Developing models to adapt
to environmental changes was determined to be necessary [53]. Whilst the works of Stocker
et al. and Naloufi et al. utilize ML algorithms to predict E. coli concentrations in water,
neither of these works utilizes WSNs in the study.
There have been substantial research efforts and advances in WQM, wireless water technologies,
hardware equipment development, and data analytics. Communication technologies
have acquired more rapid transmission rates, increased power efficiency, and greater network
support in remote areas, and have become more cost-efficient to implement in smart
water quality monitoring (SWQM). These advances bring impressive gains within SWQM.
Though they have shortcomings, such as power consumption, the benefits of adopting these
technologies outweigh the shortcomings and increase system performance in other areas. Advances
in the reliability of communication technology make remote monitoring more efficient and
achievable. The main areas of advances in hardware can be observed in sensor technology
and energy-harvesting technology. Utilizing and developing these new sensor technologies
creates larger sensor networks and enables the remote monitoring of more WQIs. The
greatest data analytical advance has been the introduction of AI in determining water
quality and WQI. AI techniques have provided a method of analyzing data collected
from SWQM systems and identifying the status of water quality. The inclusion of AI
techniques makes it possible to monitor different WQIs that do not have sensors available
to remotely monitor them based on parameters that were remotely monitored through
sensors. Algorithms and mathematical models have also been utilized to predict trends in
water quality, which allows for the prediction of future water quality changes. Whilst
all of the advancements listed have been implemented in systems, there has been a lack of
SWQM systems that can collectively utilize these advancements.
This work presents a smart water-quality-monitoring framework featuring an intelligent
IoT WSN monitoring system. The system uses AI sensors to account for indicators
without electronic sensors, as the design of electronic sensors is lagging behind monitoring
systems. Whilst the use of AI sensory techniques can be applied to different water quality
parameters, this study focuses on using AI sensory techniques to predict E. coli concentrations.
The work explores WSN-based WQM systems, WSNs, and wireless sensors, as
well as the constraints and challenges associated with these WSN-based systems, and the
deployment of ML algorithms in the prediction of E. coli concentrations using parameter
data from WSNs. The dataset for the developed AI technique in this work uses the samples
collected from four water-treatment plants in South Africa. The plants used for data sampling
were Vaalkop, Klipdrift, Wallmansthal, and Cullinan. The data were sampled over
a period of 7 years (July 2011–June 2018) [54]. The aim of this study is to determine the
effectiveness of using machine learning algorithms to determine E. coli concentrations in
water and the effectiveness of using different parameter sets in machine learning models to
predict E. coli concentrations. The chosen parameter sets were based on the cost of wireless
sensor procurement and local availability of wireless sensors.
2. Smart Water Quality Monitoring (SWQM)
2.1. Smart Water Quality Monitoring (SWQM) Framework
Generally, as illustrated in Figure 1, the commonly implemented SWQM models have,
at a minimum, three elements that together create a basic network to monitor water quality
remotely. These elements are the sensing system, the communication system, and the head
end system (see [55–61]).
Figure 1. Common underwater WSN-based SWQM.
2.1.1. The SWQM Sensing System
The WSN sensing system performs the collection, processing, and transfer of data.
The data collection process is supported by a network of sensing devices at different
locations in water bodies. This enables the sampling of water over large areas at consistent
time intervals. The sensing module consists of a sensor transducer that captures the
parameter and sends it to the processing unit for processing; the data are then sent through
a communication unit to the intermediate nodes or gateway. All these are powered by
the power supply unit. By implementing multiple sensors in various locations along the
water body to acquire samples at more frequent time intervals, the accuracy in determining
precise water quality levels increases. This can be attributed to more data being available
for analysis when determining water quality.
When the sensing module performs the filtering and processing of data, computational
devices are used to filter the data and apply algorithms to the measured parameters. Data
processing can be performed using two commonly used methods [62]: in-node processing
(InP) and collaborative task processing (CTP). InP involves the node using data collected
from its own sensors, whilst CTP involves the nodes that are near each other sharing data
with one another; thus, they use the data from different locations to perform the processing
stage. The majority of WSN-based WQM systems use both InP and CTP when processing
data. This allows nodes to process their own data and share their data with other nodes for
enhanced or additional processing. To determine the source of contamination, InP is useful
as it can provide a location based on processing at a node. To determine the general status
of the water quality of a water body, CTP is more useful, as it can give an average value
due to the processing of shared data.
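The difference between the two processing methods can be sketched in a few lines of Python. This is an illustrative sketch only: the node names, the turbidity readings, and the use of a simple mean as the "processing" step are assumptions made for the example, not details taken from the cited systems.

```python
from statistics import mean

# Hypothetical turbidity readings (NTU) from three neighbouring nodes.
readings = {
    "node_a": [4.1, 4.3, 4.0],
    "node_b": [4.2, 4.4, 4.1],
    "node_c": [9.8, 10.1, 9.9],  # elevated values near a possible source
}

def in_node_processing(samples):
    """InP: a node summarises only the data from its own sensors."""
    return mean(samples)

def collaborative_task_processing(shared):
    """CTP: neighbouring nodes pool their data and summarise it jointly."""
    pooled = [value for samples in shared.values() for value in samples]
    return mean(pooled)

per_node = {node: in_node_processing(s) for node, s in readings.items()}
water_body_average = collaborative_task_processing(readings)

# The node with the highest local mean hints at where contamination enters.
suspect_node = max(per_node, key=per_node.get)
```

Here the per-node results localize the anomaly, while the pooled result gives one figure for the water body as a whole, mirroring the two uses described above.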
nals, failure of devices, and propagation delay [72]. An effective underwater WSN-based
SWQM system can incorporate underwater communication techniques with terrestrial
communication techniques for the section of the system that utilizes terrestrial communication.
Electromagnetic and optical communications are more constrained than acoustic
communications [73]. Electromagnetic and optical transmission struggle in seawater [73]
due to its conducting nature. Optical waves also have limited transmission distances
because they are absorbed by seawater. Acoustic transmission, however, has a stronger
underwater communication ability than both electromagnetic and optical transmission
because it has lower attenuation in seawater [74]. Thus, acoustic communication should be
chosen for underwater transmission due to its stronger performance in seawater. Acoustic
communication does have implementation challenges such as path loss, noise, multi-path,
delay variance, and Doppler spread [73,75].
2.2.2. Topology
The design of underwater water quality networks presents numerous challenges. Their
non-static nature adds an additional layer of difficulty when designing network topologies
for underwater networks. Because underwater networks rely on acoustic communication
technologies, an efficient topology design would help to offset most of the shortcomings
of acoustic communication technologies. Network reliability increases with an efficiently
designed network topology [73]. Energy efficiency is usually the outcome of a well-designed
network topology; thus, the energy consumption issues surrounding underwater networks
can be controlled [71]. Marais et al. provided an extensive review on topologies used in
WSN applications [76]. The star, tree, and mesh topologies affect packet transmission and,
hence, packet loss.
2.2.3. Bandwidth
Efficient utilization of the accessible bandwidth in WSNs is essential for effective
sharing by all the nodes in the range of the wireless network. The number of sensor
nodes deployed and the depth of node deployment both influence the available bandwidth.
The bandwidth increases with the depth of deployment of the sensor
nodes [77]. The more sensor nodes there are accessing a wireless network, the lower the
bandwidth available. Thus, while greater node density creates the benefit of better multi-hop
routing, less bandwidth is available to the nodes as a result. To conserve energy, the
bandwidth made available to sensor nodes is also generally limited. Bandwidth therefore
requires a balance between a suitable network topology,
communication, and power consumption. An investigation into the effect of topology on
network bandwidth made several findings [78]. It was found that network bandwidth is
affected by the total number of nodes in the network and by the
number of inter-nodal links. Thus, it was recommended that the number of nodes in the
system be increased only if there is a large amount of traffic in the network, and that
there should be fewer inter-nodal links.
sources are rendered infeasible due to batteries being inaccessible or difficult to replace.
Furthermore, charging and recharging of the systems is constrained by their locations.
Varying the sampling rate is a favorable method of conserving energy in a monitoring system.
Other measures like energy harvesting have also been deployed.
WSN systems currently implemented with energy harvesting
mainly utilize solar panels to harvest energy from the sun. The harvested energy charges
lithium-ion batteries. Solar energy harvesting is a popular option for WSNs due to
solar energy’s power density compared to other currently implemented energy-harvesting
solutions [62].
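One simple way to vary the sampling rate is to lengthen the interval between samples as the battery drains. The sketch below illustrates this idea only; the function name, the battery thresholds, and the interval values are hypothetical choices for the example and do not come from the cited systems.

```python
def next_sampling_interval(battery_fraction, base_interval_s=60, max_interval_s=3600):
    """Return a sampling interval that grows as the battery drains.

    battery_fraction is the remaining charge in [0, 1]. All thresholds and
    intervals here are illustrative assumptions, not values from the source.
    """
    if not 0.0 <= battery_fraction <= 1.0:
        raise ValueError("battery_fraction must be between 0 and 1")
    if battery_fraction > 0.5:
        return base_interval_s        # plenty of charge: sample often
    if battery_fraction > 0.2:
        return base_interval_s * 5    # moderate charge: sample less often
    return max_interval_s             # low charge: sample rarely
```

A deployed node could equally adapt the interval to how quickly the measured parameter is changing; the battery-based rule is just the simplest case to show.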
2.2.5. Fabrication
Underwater SWQM systems that use WSNs face a unique constraint during monitoring.
The stress that an underwater environment exerts on sensor nodes makes them prone to
water ingress and structural failure at depths. The electronic components that collectively
enable the functioning of the sensor node are sensitive to liquid and will fail if they
make contact with water over an extended period. Thus, the enclosure that houses these
components must be waterproof and structurally resistant to the pressure exerted on them
at underwater depths. Yang et al. and Ryecroft et al. fabricated enclosures to alleviate some
of the challenges faced in using underwater WSNs (see [37,38]). Enclosures of sensor nodes
would need to be able to withstand the underwater forces and wireless sensors would need
to be waterproofed to ensure that water ingress does not damage their sensing components.
Wiring would need to be protected or placed internally in the sensor node, to be protected
from terrain, floating debris, and marine lifeforms. Placement of sensor nodes can create
issues in communication. The closer the sensor nodes are to the bed of the water body, the
greater the risk of the acoustic waves scattering or reflecting.
2.2.6. Security
Security in systems that utilize wireless communication networks is a concern due
to the inherent vulnerability present when there is an absence of a physical connection.
In WSNs, security is compromised more significantly than in other wireless communication
networks due to the limited nature of the supply of energy to nodes [80]. A greater
complexity in the security of a network results in greater energy usage. The data captured
by sensors should determine the priority level of the security in the system. If the data
captured contain sensitive information, then the compromise between energy and security
would need to be addressed accordingly. If inadequate security is implemented in the
system, hackers can intercept and alter data from the network. Malicious code can also be
inserted into the database. The addition of authentication and authorization
protocols to the system will heighten security. Whilst current and previous works on
implemented SWQM systems have very little security integration in their design, there
are a few examples of security integration in simulated systems. The low level of
security integration can be attributed to the increased power requirements of
security protocols. In a smart solution for water quality monitoring [81], security
protocols were integrated for data transmission: transport layer security (TLS) was
used for data encryption and JSON Web Algorithms (JWA, RFC 7518) were used for
authentication based on public/private keys. In the implemented multi-hop underwater
WSN system using the bowtie antenna [38], an AES encryption module was used to ensure
that all communications were encrypted.
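To make the idea of detecting altered sensor data concrete, the following sketch attaches a message authentication tag to a reading using only the Python standard library. It is a deliberately simplified stand-in: it uses a shared-secret HMAC-SHA256 tag, not the TLS, public/private-key, or AES schemes of [81] and [38], and it authenticates the payload without encrypting it. The key, the payload fields, and the function names are hypothetical.

```python
import hashlib
import hmac
import json

def sign_reading(reading, key):
    """Serialise a reading and attach an HMAC-SHA256 tag."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag

def verify_reading(payload, tag, key):
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

key = b"demo-key-not-for-production"  # hypothetical shared secret
payload, tag = sign_reading({"node": "n1", "pH": 7.2}, key)
```

A receiver that holds the same key can call `verify_reading(payload, tag, key)`; any change to the payload in transit makes the check fail. The extra hashing on each message is an example of the energy cost that security adds on a constrained node.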
theft [80]. To lower the risk of interference, deploying sensors discreetly
would attract less attention from wildlife. Using sensors that maintain high quality but
are cost-effective would lower replacement costs if failures were to occur.
2.2.9. Sensors
In SWQM, using WSNs, sensors are the most important component of the system as
they are responsible for the acquisition of parameter data. However, these sensors have con-
straints that can affect their functionality, effectiveness, and their implementation. Firstly,
the availability of sensors is a problem. There are wireless sensors for the measurement
of water quality parameters. However, not all parameters can currently be measured by
wireless sensors. This is a subject of ongoing testing, research, and development, and
the commercialization of many sensors for certain parameters has not yet occurred. A
solution is to integrate ML algorithms to determine the quantity of parameters. By using
other known parameters that have a wireless sensor available to measure the quantity, a
relationship between an unknown parameter quantity and a known parameter quantity
can be established; thus, an ML algorithm can be used to predict the unknown quantity
from the relationship.
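The idea of predicting an unmeasured quantity from measured ones can be sketched with a small k-nearest neighbors regressor, one of the model families evaluated later in this work. The sketch is written in plain Python so that it is self-contained; the training rows are invented for illustration and are not drawn from the dataset of [54], and the features are left unscaled for brevity.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict an unmeasured quantity as the mean target of the k nearest rows."""
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Made-up training rows: (pH, turbidity in NTU) -> E. coli in counts/100 mL.
train_x = [(7.0, 2.0), (7.1, 2.5), (6.8, 8.0), (6.7, 9.0), (7.2, 1.5)]
train_y = [5.0, 8.0, 40.0, 55.0, 3.0]

# Estimate E. coli for a new pair of sensor readings using the 2 nearest rows.
estimate = knn_predict(train_x, train_y, query=(6.9, 8.5), k=2)
```

In practice the inputs would usually be standardized first, since parameters measured on different scales would otherwise dominate the distance calculation.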
Secondly, sensor calibration poses issues. Sensors can have a temporal shift in response
when faced with sustained chemical and physical conditions [62]. This is known as sensor
drift. Damage to the sensors caused by water or ground water fluxes can create errors
in measurement by the sensors [82]. Sensor drift creates doubt in the credibility of data
obtained for monitoring over a period of time; thus, developing trends or datasets from
the sensor data may exhibit variation due to inconsistencies in the obtained data. Sensors
must be calibrated at specified intervals to achieve accurate readings. Calibration drift
occurs when the current sensor reading in a standard solution differs from the reading
obtained from the calibrated sensor. Calibration drift is an electronic drift, and
the sensor must be recalibrated to obtain accurate readings once again.
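A calibration drift check of this kind reduces to comparing two readings taken in the same standard solution. The sketch below assumes a hypothetical pH probe, a pH 7.00 buffer, and a tolerance of 0.10 pH units; none of these values come from the source.

```python
def calibration_drift(current_reading, calibrated_reading):
    """Difference between the current reading and the reading obtained just
    after calibration, both taken in the same standard solution."""
    return current_reading - calibrated_reading

def needs_recalibration(current_reading, calibrated_reading, tolerance):
    """Flag the sensor when the absolute drift exceeds the tolerance."""
    return abs(calibration_drift(current_reading, calibrated_reading)) > tolerance

# Hypothetical check of a pH probe in a pH 7.00 buffer, tolerance 0.10 pH units.
flag = needs_recalibration(7.18, 7.00, tolerance=0.10)
```

Running the same check at each calibration interval gives a simple record of how quickly a given sensor drifts.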
Thirdly, biofouling occurs on the surface of sensors due to their immersion in water for
long periods of time when capturing parameter data. Algae and bacteria cause fouling; thus,
there is a high possibility of biofouling occurring in SWQM systems using WSNs. There
are several factors that influence biofouling occurrence on the surface of sensors. These
factors can be chemical, physical, and biological [62]. The lifespan of a sensor decreases
when biofouling occurs on the surface of the sensor and there can be inconsistencies in the
data obtained from the sensor, thus reducing the accuracy of the sensor. Currently, many
sensor manufacturers are researching and applying new technologies to sensors, designed
to reduce the effect of biofouling. However, these sensors are not currently available for
commercial use.
2.3. Summary
Section 2 investigated the SWQM frameworks and the challenges associated with their
implementation. It was found that the SWQM framework comprises a sensing system,
a communication system, and an HES. The sensing system performs the collection of
data, the processing of data, and the transferring of data. The communication system is
responsible for relaying the sensed data to the HES. The HES provides a control center
for data acquisition, analysis, storage, and management and the control of the system.
The challenges associated with the implementation of SWQM were found to include
communication technology, power consumption, theft and damages, the underwater
environment, fabrication, security, topology, bandwidth, and sensors.
Set | Parameters
Set A | pH, conductivity, chloride, turbidity, nitrates, and chlorophyll
Set B | pH, ammonium, chloride, nitrites, iron, manganese, phosphate, and sulphate
error function is utilized in AdaBoost to boost weak classifier weights. The classifier can be
seen in Equation (2) [90]:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{i} a_t h_t \geq \text{threshold} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
where $h_t$ is the output of weak classifier $t$ and $a_t$ is the classifier weight that is assigned in
Equation (2) [90].
The decision trees of AdaBoost have one level of classification. The training dataset is
weighted at each instance and the initialization of weight is given in Equation (3) [90]:
$$\text{weight}(x_i) = \frac{1}{n} \qquad (3)$$
where $x_i$ and $n$ represent the training instance and the number of training instances,
respectively.
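The thresholded weighted vote of Equation (2) and the weight initialization can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names and example values are hypothetical, and the uniform starting weight of 1/n is the standard AdaBoost initialization, assumed here.

```python
def initial_weights(n):
    """Give each of the n training instances the same starting weight, 1/n."""
    return [1.0 / n] * n

def strong_classifier(weak_outputs, alphas, threshold):
    """Return 1 if the weighted sum of weak outputs reaches the threshold, else 0."""
    score = sum(a * h for a, h in zip(alphas, weak_outputs))
    return 1 if score >= threshold else 0

weights = initial_weights(4)

# Three weak classifiers vote 1, 0, 1 with (made-up) weights 0.5, 0.3, 0.4.
label = strong_classifier(weak_outputs=[1, 0, 1], alphas=[0.5, 0.3, 0.4], threshold=0.6)
```

The sketch covers only the final vote and the starting weights; the boosting loop that re-weights instances and computes each classifier weight is omitted.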
Stocker et al. determined that the random forest (RF) model had superior performance
in predicting E. coli levels in comparison to other models in their study [52]. Whilst the
findings of this study indicated that the AdaBoost regressor provided better performance
than the RF regressor, the RF regressor still performed well enough to be considered in the
implementation of the predictive model. The findings of this study indicated that the use of
core parameters provided high accuracy in predicting E. coli concentrations.
and
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - x)^2} \qquad (5)$$
where $n$ = number of samples, $x_i$ = the actual value, and $x$ = the predicted value in
Equations (4) and (5) [93].
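Both metrics are straightforward to compute. The sketch below uses the usual definitions of mean absolute error and root mean square error with made-up actual and predicted values; it is an illustration rather than the evaluation code used in this work.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up E. coli values in counts/100 mL.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

mae_value = mae(actual, predicted)
rmse_value = rmse(actual, predicted)
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE, which is why the two metrics can rank models differently.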
3.1.4. Method
All the models were implemented using Scikit-Learn and its libraries. Each model
underwent 10-fold cross-validation to achieve its best evaluation results on the dataset.
K-fold cross-validation splits the dataset into k subsets, or folds. Training and evaluation
of the model are completed k times, with a different fold serving as the validation set each time.
Figure 2 shows the system model for obtaining results for each ML model.
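The fold construction can be sketched without any library, which makes the mechanics explicit. The helper below is a hypothetical illustration written for this purpose; scikit-learn's `KFold` class in `sklearn.model_selection` provides equivalent splitting, and the sample count of 25 is arbitrary.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one validation fold, so every observation is used for validation once and for training in the remaining k − 1 rounds. The per-fold scores are then typically averaged to give one figure per model.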
From Table 3, it can be observed that all the models tested provided different levels
of accuracy in their prediction. On average, the random forest regressor, stochastic
gradient boosting, and ridge regression yielded the worst accuracy of the six models tested,
with stochastic gradient boosting being the least accurate of the aforementioned three. The
average difference in MAE between the worst three models was 4.81 (14.50%) and the average
difference in RMSE was 10.38 (16.16%). On average, the AdaBoost regressor, support vector
machine, and k-nearest neighbors showed the greatest accuracy of all six models, with the
AdaBoost regressor yielding the best accuracy on average. The average difference in MAE
between the best three models was 4.19 (26.81%) and the average difference in RMSE was 20.48
(83.24%). The large percentage differences in average MAE and RMSE between the best three
models are due to the difference between the best two models: the AdaBoost regressor and
k-nearest neighbors. The difference between the two models is 6.15 (42.80%) for MAE and 33.06
(152.07%) for RMSE.
Table 3. Evaluation scores (MAE and RMSE, counts/100 mL) for each model and parameter set.
Model | Parameter Set | MAE | RMSE
Ridge regression | Set A | 28.65 | 37.50
Ridge regression | Set B | 36.67 | 83.26
Random forest regressor | Set A | 36.63 | 58.14
Random forest regressor | Set B | 46.58 | 91.22
Stochastic gradient boosting | Set A | 45.36 | 73.26
Stochastic gradient boosting | Set B | 39.17 | 88.99
Support vector machine | Set A | 19.46 | 37.20
Support vector machine | Set B | 26.01 | 88.18
k-nearest neighbors | Set A | 12.08 | 19.04
k-nearest neighbors | Set B | 28.96 | 90.56
AdaBoost | Set A | 15.42 | 20.14
AdaBoost | Set B | 13.32 | 23.34
An observation noted was the consistency in model performance across all six models
with both parameter Sets A and B. The models that performed strongly for Set A had strong
performance in Set B; likewise, the weaker-performing models performed poorly across
both parameter Sets A and B. This would give confidence in choosing a suitable model for
implementation in WQM WSNs for E. coli prediction: a strong-performing model would
work for both sets of wireless parameter sensors.
Table 4 lists the average MAE and RMSE for each parameter set, obtained by averaging the
corresponding Table 3 results across the six models for parameter Sets A and B.
From Table 4, it was determined that parameter Set A outperformed parameter Set B on
average. The differences in MAE and RMSE between Set A and Set B were
5.52 (21.01%) and 36.71 (89.80%), respectively. This indicated that using more commonly
available wireless sensors in the initial deployment of the WQM WSN would benefit
the performance of the model in predicting the E. coli concentrations. However, the
performance difference noted was not significant; thus, the model would benefit from the
addition of Set B parameter sensors at later implementations of the WQM WSN to provide
more data for the model to improve its predictions.
Table 5 lists the average MAE and RMSE for each of the six models used in the study,
obtained by averaging each model's Table 3 results across parameter Sets A and B.
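The averaging behind Tables 4 and 5 can be reproduced from the Table 3 scores. The sketch below is illustrative rather than the authors' code; the scores are transcribed from Table 3, and the helper names `average_by_set` and `average_by_model` are hypothetical.

```python
# (MAE, RMSE) scores transcribed from Table 3, keyed by model and parameter set.
TABLE_3 = {
    "Ridge regression": {"A": (28.65, 37.50), "B": (36.67, 83.26)},
    "Random forest regressor": {"A": (36.63, 58.14), "B": (46.58, 91.22)},
    "Stochastic gradient boosting": {"A": (45.36, 73.26), "B": (39.17, 88.99)},
    "Support vector machine": {"A": (19.46, 37.20), "B": (26.01, 88.18)},
    "k-nearest neighbors": {"A": (12.08, 19.04), "B": (28.96, 90.56)},
    "AdaBoost": {"A": (15.42, 20.14), "B": (13.32, 23.34)},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def average_by_set(table, param_set):
    """Average (MAE, RMSE) over all models for one parameter set (Table 4)."""
    maes = [scores[param_set][0] for scores in table.values()]
    rmses = [scores[param_set][1] for scores in table.values()]
    return mean(maes), mean(rmses)


def average_by_model(table, model):
    """Average (MAE, RMSE) over both parameter sets for one model (Table 5)."""
    maes = [mae for mae, _ in table[model].values()]
    rmses = [rmse for _, rmse in table[model].values()]
    return mean(maes), mean(rmses)


mae_a, rmse_a = average_by_set(TABLE_3, "A")
mae_b, rmse_b = average_by_set(TABLE_3, "B")
print(f"MAE gap:  {mae_b - mae_a:.2f} ({100 * (mae_b - mae_a) / mae_a:.2f}%)")
print(f"RMSE gap: {rmse_b - rmse_a:.2f} ({100 * (rmse_b - rmse_a) / rmse_a:.2f}%)")

# Rank the models by their average MAE across both parameter sets.
best = min(TABLE_3, key=lambda m: average_by_model(TABLE_3, m)[0])
worst = max(TABLE_3, key=lambda m: average_by_model(TABLE_3, m)[0])
print(f"best: {best}, worst: {worst}")
```

The percentage gaps here are taken relative to the Set A average, which is the convention that matches the figures quoted in the text.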
From Table 5, it was determined that the best-performing model on average during
testing was the AdaBoost regressor; the worst-performing model was stochastic gradient
boosting. The AdaBoost regressor had an RMSE value 7.37 (51.29%) greater than the
MAE value. This relatively small range in difference between the MAE and RMSE values
indicates a small variance in the sample errors. The difference is relatively small when
compared to the difference between the MAE and RMSE values of stochastic gradient
boosting. Stochastic gradient boosting had an RMSE value that was 38.86 (91.93%) greater
than the MAE value. This large range in the differences between the MAE and RMSE
values indicates high variance in the sample errors. However, the largest variance can be
seen in the support vector machine, with a large difference in MAE and RMSE value of
39.95 (175.68%).
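The link between the RMSE and MAE gap and the spread of the errors can be illustrated with two hypothetical error sets that share the same MAE. The values below are invented for illustration and are not taken from the dataset.

```python
import math


def mae_from_errors(errors):
    """Mean absolute error computed directly from per-sample errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse_from_errors(errors):
    """Root mean square error computed directly from per-sample errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical per-sample errors in counts/100 mL.
uniform_errors = [10.0, 10.0, 10.0, 10.0]  # every prediction misses by the same amount
outlier_errors = [2.0, 2.0, 2.0, 34.0]     # same MAE, but one large miss

for name, errors in [("uniform", uniform_errors), ("outlier", outlier_errors)]:
    m, r = mae_from_errors(errors), rmse_from_errors(errors)
    print(f"{name}: MAE = {m:.2f}, RMSE = {r:.2f}, gap = {r - m:.2f}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE, and the gap widens as individual errors deviate from the typical error. A single extreme sample is enough to produce a large gap.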
Table 5. List of MAE and RMSE for each of the six models used in the study.
There are numerous reasons that can be cited for the large variation in MAE and
RMSE between the best- and worst-performing models. The main reason is the number of
samples and the data quality of the dataset [94]. If more high-quality data are available,
the difference between the best and worst models could be smaller. Outliers in the dataset
can affect the accuracy of models, especially with linear and boosting models [95]. With
boosting, outliers create issues with classifiers since they must correct previously found
errors, and outliers greatly affect linear models [96].
The graphs in Figures 3–7 illustrate the resulting data of the actual E. coli concentrations
from the test dataset versus the predicted E. coli concentrations using the best-performing
predictive model. Figure 3 illustrates the best-performing model for parameter Set A (k-
nearest neighbors), with the red and green lines representing the 95% confidence interval
upper and lower bounds for the dataset mean E. coli concentrations. Figure 4 illustrates the
best-performing model for parameter Set B (AdaBoost regression), with the red and green
lines representing the 95% confidence interval upper and lower bounds for the dataset mean
E. coli concentrations. Figure 5 illustrates the same graph as Figure 4; however, the highest
E. coli value (extreme outlier) from the graph in Figure 4 was excluded to improve the scale
of the graph and highlight the predicting performance for E. coli values situated near the
lower bound of the confidence interval. Figure 6 illustrates the best-performing model
overall for both parameter Sets A and B (AdaBoost regression), with the red and green lines
representing the 95% confidence interval upper and lower bounds for the dataset mean E.
coli concentrations. Figure 7 illustrates the same graph as Figure 6; however, the highest E.
coli value (extreme outlier) from the graph in Figure 6 was excluded to improve the scale
of the graph and highlight the predicting performance for E. coli values situated near the
lower bound of the confidence interval.
An encouraging observation that suits an early-warning system for E. coli concentration
is the ability of the best-performing models to identify extreme changes in E. coli
concentrations. From Figures 3 and 4, the greatest values of actual E. coli concentration for
each figure were noted to be 150 counts/100 mL and 425 counts/100 mL, respectively. The
largest predicted E. coli concentrations occurred at the same entries as these actual
maxima and were 95 counts/100 mL and 425 counts/100 mL, respectively. Whilst the
best-performing model for Set A did not predict the same concentration as the actual E. coli
level, it was still the highest predicted concentration of Set A. Thus, this would also instill
confidence in currently deploying ML algorithms as an early-warning system for E. coli
concentrations. It was observed that predicted concentrations within the 95% confidence
interval of the dataset mean for E. coli concentrations had higher levels of prediction inaccuracy
when compared to predictions that fell outside the bounds of the 95% confidence interval
for all models.
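The text does not state how the 95% confidence bounds for the dataset mean were computed. One common choice, shown here purely as an assumption, is the normal approximation: mean ± z·s/√n. The function name `mean_confidence_interval` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.95):
    """Return (lower, upper) bounds for the mean using a normal approximation.

    Assumption: this is an illustrative method, not necessarily the one
    used to draw the bounds in Figures 3-7.
    """
    n = len(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(samples) / math.sqrt(n)
    centre = mean(samples)
    return centre - half_width, centre + half_width


# Hypothetical E. coli concentrations in counts/100 mL (not from the dataset).
samples = [5.0, 12.0, 20.0, 35.0, 48.0, 60.0, 95.0, 150.0]
lower, upper = mean_confidence_interval(samples)
print(f"95% CI for the mean: [{lower:.1f}, {upper:.1f}] counts/100 mL")
```

For small samples, a Student's t quantile would give wider bounds than the normal quantile used in this sketch.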
Figure 3. Predicted vs. actual E. coli concentration (best-performing model for Set A).
Figure 4. Predicted vs. actual E. coli concentration (best-performing model for Set B).
Figure 5. Predicted vs. actual E. coli concentration in Figure 4 without highest E. coli value.
Figure 6. Combined Set A and Set B predicted vs. actual E. coli concentration (best-performing model overall for Sets A and B).
Figure 7. Combined Set A and Set B predicted vs. actual E. coli concentration in Figure 6 but without highest E. coli value.
4.2. Discussion
Whilst the findings of the study indicated a large difference in the accuracy of the best-
and worst-performing tested models, the E. coli concentrations were predicted in all models
with an accuracy that would suggest that the further development of models is feasible and
that larger datasets of high-quality data can improve the predicting accuracy of ML models.
They can become useful in early-warning detection systems for E. coli concentration levels.
The models can be considered feasible given that the stipulated acceptable
range of E. coli concentration is 0–130 counts/100 mL (Department of Water Affairs
and Forestry, 1996); the study yielded an MAE of 14.37 counts/100 mL for the best-performing
predictive model and an MAE of 42.27 counts/100 mL for the worst predictive model.
As stated, the results of the study would suggest that using ML models to predict E.
coli would be more suitable as an early-warning detection system. This was motivated
by the accuracy of prediction by the best-performing ML model. The error would not
instill the necessary confidence in deployers to solely rely on the prediction for their E. coli
levels. However, if used as an early-warning detection system, relevant action can be taken,
including a physical measurement to confirm and rectify the level.
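An early-warning rule of this kind could be sketched as follows. The three-level policy, the function name `early_warning`, and the use of the best model's MAE as a safety margin are assumptions made for illustration; only the 130 counts/100 mL limit and the 14.37 counts/100 mL MAE come from the text.

```python
ACCEPTABLE_MAX = 130.0  # counts/100 mL, upper end of the acceptable range quoted above
MODEL_MAE = 14.37       # counts/100 mL, MAE of the best-performing model in this study


def early_warning(predicted, limit=ACCEPTABLE_MAX, margin=MODEL_MAE):
    """Classify a predicted E. coli concentration (hypothetical policy).

    "alert"  - the prediction itself exceeds the limit
    "verify" - the prediction is within one MAE of the limit, so a physical
               measurement is advisable
    "ok"     - the prediction is comfortably below the limit
    """
    if predicted >= limit:
        return "alert"
    if predicted + margin >= limit:
        return "verify"
    return "ok"


# Hypothetical predicted concentrations in counts/100 mL.
for value in (50.0, 120.0, 140.0):
    print(f"{value:6.1f} counts/100 mL -> {early_warning(value)}")
```

Using the model error as a margin reflects the point made above: the prediction triggers a confirmatory measurement rather than replacing it.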
The results of this study warrant further research into ML algorithms to predict E.
coli levels; however, further work strongly depends on the availability of data to train ML
models. The acquisition of reliable data from multiple sources can greatly help train models
to adapt to external variations brought through ecosystems, seasonal changes, etc. It is only
through continuous training of models with data that a shift to a prediction-based solution
in E. coli detection can be the reference method of measurement. This would also apply to
the prediction of other measured parameters using ML algorithms.
5. Conclusions
The work provides an insight into the challenges and the current and future trends
of IoT WQM using WSNs. The work develops a generic IoT WQM-based framework
consisting of a sensing system, a communication system, and a head end system. The work
then focuses on one of the constraints of current IoT WQM systems, the unavailability of
electronic sensors, and develops an AI-based framework to predict the concentrations of E.
coli, for which electronic sensors are unavailable. Thereby, the researchers explored the possibility
of predicting water quality parameters with unavailable sensors. The ridge regression,
random forest, stochastic gradient boosting, support vector machine regression, k-nearest
neighbors regression, and AdaBoost regression models were used in the prediction of E. coli
concentrations. The results, based on the MAE and RMSE performance measures,
indicate that E. coli concentrations can be predicted by AI to a fairly accurate level; however,
given the current level of accuracy, the use of AI to predict E. coli concentrations would
be more beneficial as an early-warning system until further research and testing can be
completed. The results of the study indicated
that the AdaBoost regressor had performed the best in predicting E. coli concentrations
based on the performance evaluation (MAE and RMSE values). The parameters from Set
A had superior performance over the parameters of Set B in E. coli prediction based on
the performance evaluation (MAE and RMSE values). E. coli concentrations were able to
be predicted using machine learning algorithms with reasonable accuracy. The accuracy
of the predictions achieved showed that using ML algorithms would be more suitable
as an early-warning detection system. E. coli concentrations were also predicted with
parameters that can be procured with little difficulty and lower procurement costs, as seen
with the prediction accuracy when using Set A parameters. Thus, the aims of the paper
were achieved through the results. This work presents the possibility of developing AI
techniques further to complement parameters with non-existent sensors.
Author Contributions: Conceptualization, T.W. and Y.S.; methodology, T.W. and Y.S.; software, T.W.
and Y.S.; validation, T.W. and Y.S.; formal analysis, T.W. and Y.S.; investigation, T.W. and Y.S.; resources,
T.W.; data curation, T.W. and Y.S.; writing—original draft preparation, T.W. and Y.S.; writing—review
and editing, T.W. and Y.S.; visualization, T.W. and Y.S.; supervision, T.W.; project administration, T.W.;
funding acquisition, T.W. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by UKZN Research Funds.
Institutional Review Board Statement: The paper was extensively reviewed by the institution.
Informed Consent Statement: The details and processes of the paper are clear and understood.
Data Availability Statement: The dataset used in the study is available online: [54].
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Sanders, T. Design of Networks for Monitoring Water Quality; Water Resources Publications: Littleton, CO, USA, 1983.
2. Strobl, R.; Robillard, P. Network Design for Water Quality Monitoring of Surface Freshwaters: A Review. J. Environ. Manag. 2008,
87, 639–648. [CrossRef] [PubMed]
3. DFROBOT SEN0189 Turbidity Sensor. 2023. Available online: https://www.dfrobot.com/product-1394.html (accessed on 1 November 2023).
4. YSI WQ730 Turbidity Sensor. 2023. Available online: https://www.ysi.com/wq730 (accessed on 13 October 2023).
5. Aqualabo PF-CAP-C-00174 Turbidity Sensor. 2023. Available online: https://en.aqualabo.fr/turbidity-digital-sensor-bare-wires-7-m-cable-b3968.html (accessed on 5 October 2023).
6. Daviteq Modbus Turbidity Sensor. 2023. Available online: https://daviteq.com/en/manuals/books/product-data-sheet-for-modbus-output-sensors/page/process-turbidity-sensor-with-modbus-output-mbrtu-tbd (accessed on 8 October 2023).
7. DFROBOT SEN0244 TDS Sensor. 2023. Available online: https://www.dfrobot.com/product-1662.html (accessed on 1 November 2023).
8. Hanna Instruments HI-763133 TDS Sensor. 2023. Available online: https://www.hannainstruments.co.uk/electrodes-and-probes/2633-hi-763133-quick-connect-tds-conductivity-probe (accessed on 15 October 2023).
9. Wateranywhere TDS-3 TDS Sensor. 2023. Available online: https://wateranywhere.com/tds-meter-tests-0-9990-ppm-total-dissolved-solids-in-water-pocket-size-hm-digital/ (accessed on 3 October 2023).
10. Antratek 314990742 Modbus TDS and EC Sensor. 2023. Available online: https://www.antratek.com/industrial-ec-tds-sensor-modbus-rtu-rs485-0-2v (accessed on 7 October 2023).
11. Caballero, B.; Trugo, L.; Finglas, P.M. Encyclopedia of Food Sciences and Nutrition, 2nd ed.; Elsevier: Amsterdam, The Netherlands, 2003.
12. YSI WQ201 pH Sensor. 2023. Available online: https://www.ysi.com/wq201 (accessed on 18 October 2023).
13. DFROBOT SEN0169-V2 pH Sensor. 2023. Available online: https://www.dfrobot.com/product-2069.html (accessed on 1 November 2023).
14. Tetraponics SP-P5 pH Sensor Probe. 2023. Available online: https://www.tetraponics.com/products/replacement-ec-probe (accessed on 10 October 2023).
15. Eucatech 314990622 Modbus pH Sensor Probe. 2023. Available online: https://euca.co.za/products/sensecap-industrial-ph-sensor-nsc257 (accessed on 12 October 2023).
16. Belcher, R.; Macdonald, A.M.G.; Parry, E. On mohr’s method for the determination of chlorides. Anal. Chim. Acta 1957, 16,
524–529. [CrossRef]
17. YSI EXO Chloride Smart Sensor. 2023. Available online: https://www.ysi.com/product/id-599711/EXO-Chloride-Smart-Sensor (accessed on 20 October 2023).
18. Riverplus WS102-CL Modbus Sensor. 2023. Available online: https://iiot.riverplus.com/product/ws102-cl-modbus-water-quality-analysis-residual-chloride-ion-cl-sensor/ (accessed on 11 October 2023).
19. Libelium Proteus Water Sensor for Real-Time Detecting E. coli Bacteria. 2023. Available online: https://proteus-instruments.com/proteus-bod-multiparameter-water-quality-meter/ (accessed on 26 October 2023).
20. DFROBOT SEN0451 Conductivity Sensor. 2023. Available online: https://www.dfrobot.com/product-2565.html (accessed on 1 November 2023).
21. YSI WQ-COND Conductivity Sensor. 2023. Available online: https://www.ysi.com/wqc (accessed on 27 October 2023).
22. Endress+Hauser CLS54D Conductivity Sensor. 2023. Available online: https://www.endress.com/en/field-instruments-overview/liquid-analysis-product-overview/conductivity-toroidal-sensor-cls54d?t.tabId=product-overview (accessed on 1 October 2023).
23. YSI EXO Total Algae PC Smart Sensor. 2023. Available online: https://www.ysi.com/exo/talpc (accessed on 30 October 2023).
24. Apure BGA-206A Algae Sensor. 2023. Available online: https://apureinstrument.com/water-quality-analysis/blue-green-algae-sensor/bga-206a-blue-green-algae-sensor/ (accessed on 28 October 2023).
25. YSI WQ101 Temperature Sensor. 2023. Available online: https://www.ysi.com/wq101 (accessed on 19 October 2023).
26. DFROBOT DS18B20 SEN0511 Temperature Sensor. 2023. Available online: https://www.dfrobot.com/product-2481.html (accessed on 2 November 2023).
27. ComWinTop CWT-T01S Modbus Temperature Sensor. 2023. Available online: https://store.comwintop.com/products/rs485-modbus-water-proof-temperature-humidity-sensor-probe?variant=42249549054179 (accessed on 10 October 2023).
28. Standard Methods Committee of the American Public Health Association. American Water Works Association. Water Environment Federation. 4500-nh3 nitrogen (ammonia). In Standard Methods for the Examination of Water and Wastewater; Lipps, W.C., Baxter, T.E., Braun-Howland, E., Eds.; APHA Press: Washington, DC, USA, 2017. [CrossRef]
29. AQUAREAD Ammonia Sensor. 2023. Available online: https://www.aquaread.com/sensors/ammonium-ammonia (accessed on 30 October 2023).
30. Kacise KAN310 Modbus Ammonia Sensor. 2023. Available online: https://www.fluid-meter.com/sale-13682999-kan310-online-ammonia-nitrogen-sensor-rs485-modbus-convenient-to-connect-to-plc-dcs-patented-ammoniu.html (accessed on 7 October 2023).
31. Sea Bird Scientific SUNA V2 Nitrate Sensor. 2023. Available online: https://www.seabird.com/nutrient-sensors/suna-v2-nitrate-sensor/family?productCategoryId=54627869922 (accessed on 30 October 2023).
32. AQUAREAD Nitrate Sensor. 2023. Available online: https://www.aquaread.com/sensors/nitrate (accessed on 30 October 2023).
33. Xylem 107066 Modbus Nitrate Sensor. 2023. Available online: https://www.xylemanalytics.com/en/general-product/id-151/ise-combination-sensor-for-ammonium-and-nitrate---wtw (accessed on 2 October 2023).
34. DWAF Department of Water Affairs & Forestry. South African Water Quality Guidelines Volume 1 Domestic Water Use, 2nd ed.;
DWAF: Pretoria, South Africa, 1996.
35. DWAF Department of Water Affairs and Forestry. South African Water Quality Guidelines. Volume 2: Recreational Use, 2nd ed.;
DWAF: Pretoria, South Africa, 1996.
36. Republic of South Africa, Department of Environmental Affairs. South African Water Quality Guidelines for Coastal Marine
Waters—Natural Environment and Mariculture Use; Department of Environmental Affairs: Cape Town, South Africa, 2018.
37. Yang, X.; Ong, K.G.; Dreschel, W.R.; Zeng, K.; Mungle, C.S.; Grimes, C.A. Design of a Wireless Sensor Network for Long-term,
In-Situ Monitoring of an Aqueous Environment. Sensors 2002, 2, 455–472. [CrossRef]
38. Ryecroft, S.; Shaw, A.; Fergus, P.; Kot, P.; Hashim, K.S.; Tang, A.; Moody, A.; Conway, L. An Implementation of a Multi-Hop
Underwater Wireless Sensor Network using Bowtie Antenna. Karbala Int. J. Mod. Sci. 2021, 7, 3. [CrossRef]
39. Rosero-Montalvo, P.D.; López-Batista, V.F.; Riascos, J.A.; Peluffo-Ordóñez, D.H. Intelligent WSN System for Water Quality
Analysis Using Machine Learning Algorithms: A Case Study (Tahuando River from Ecuador). Remote Sens. 2020, 12, 1988.
[CrossRef]
40. Kofi Sarpong, A.-M.; Katsriku, F.A.; Abdulai, J.-A.; Engmann, F. Smart River Monitoring Using Wireless Sensor Networks. Wirel.
Commun. Mob. Comput. 2020, 2020, 8897126. [CrossRef]
41. Murphy, K.; Heery, B.; Sullivan, T.; Zhang, D.; Paludetti, L.; O’Connor, N.; Diamond, D.; Regan, F. A low-cost autonomous optical
sensor for water quality monitoring. Talanta 2014, 132, 520–527. [CrossRef]
42. O’Flynn, B.; Martínez-Català, R.; Harte, S.; O’Mathuna, C.; Cleary, J.; Slater, C.; Regan, F.; Diamond, D.; Murphy, H. SmartCoast:
A Wireless Sensor Network for Water Quality Monitoring. In Proceedings of the 32nd IEEE Conference on Local Computer
Networks (LCN 2007), Dublin, Ireland, 15–18 October 2007; pp. 815–816. [CrossRef]
43. Seders, L.; Butler, C.S.; Lemmon, M.; Talley, J.; Maurice, P.A. LakeNet: An integrated sensor network for environmental sensing in
lakes. Environ. Eng. Sci. 2007, 24, 183–191. [CrossRef]
44. Chen, C.-H.; Wu, Y.-C.; Zhang, J.-X.; Chen, Y.-H. IoT-Based Fish Farm Water Quality Monitoring System. Sensors 2022, 22, 6700.
[CrossRef]
45. Jáquez, A.D.B.; Herrera, M.T.A.; Celestino, A.E.M.; Ramírez, E.N.; Cruz, D.A.M. Extension of LoRa Coverage and Integration of
an Unsupervised Anomaly Detection Algorithm in an IoT Water Quality Monitoring System. Water 2023, 15, 1351. [CrossRef]
46. Razman, N.A.; Wan Ismail, W.Z.; Abd Razak, M.H.; Ismail, I.; Jamaludin, J. Design and analysis of water quality monitoring and
filtration system for different types of water in Malaysia. Int. J. Environ. Sci. Technol. 2023, 20, 3789–3800. [CrossRef] [PubMed]
47. Ubah, J.I.; Orakwe, L.C.; Ogbu, K.N.; Awu, J.I.; Ahaneku, I.E.; Chukwuma, E.C. Forecasting water quality parameters using
artificial neural network for irrigation purposes. Sci. Rep. 2021, 11, 24438. [CrossRef] [PubMed]
48. Khan, S.I.; Islam, S.; Nasir, M. Predicting Water Quality using WSN and Machine Learning. Bachelor’s Thesis, Mawlana Bhashani
Science and Technology University, Santosh, Tangail, Bangladesh, 2020.
49. Aldhyani, T.H.H.; Al-Yaari, M.; Alkahtani, H.; Maashi, M. Water Quality Prediction Using Artificial Intelligence Algorithms. Appl.
Bionics Biomech. 2020, 2020, 6659314. [CrossRef]
50. Paepae, T.; Bokoro, P.N.; Kyamakya, K. From Fully Physical to Virtual Sensing for Water Quality Assessment: A Comprehensive
Review of the Relevant State-of the-Art. Sensors 2021, 21, 6971. [CrossRef] [PubMed]
51. Chen, L.; Wu, T.; Wang, Z.; Lin, X.; Cai, X. A novel hybrid BPNN model based on adaptive evolutionary Artificial Bee Colony
Algorithm for water quality index prediction. Ecol. Indic. 2023, 146, 109882. [CrossRef]
52. Stocker, M.D.; Pachepsky, Y.A.; Hill, R.L. Prediction of E. coli Concentrations in Agricultural Pond Waters: Application and
Comparison of Machine Learning Algorithms. Front. Artif. Intell. 2022, 4, 768650. [CrossRef] [PubMed]
53. Naloufi, M.; Lucas, F.S.; Souihi, S.; Servais, P.; Janne, A.; Wanderley Matos De Abreu, T. Evaluating the Performance of Machine
Learning Approaches to Predict the Microbial Quality of Surface Waters and to Optimize the Sampling Effort. Water 2021, 13, 2457.
[CrossRef]
54. Masindi, V. Dataset on physicochemical and microbial properties of raw water in four drinking water treatment plants based in
South Africa. Data Brief 2020, 31, 105822. [CrossRef]
55. Yaroshenko, I.; Kirsanov, D.; Marjanovic, M.; Lieberzeit, P.A.; Korostynska, O.; Mason, A.; Frau, I.; Legin, A. Real-Time Water
Quality Monitoring with Chemical Sensors. Sensors 2020, 20, 3432. [CrossRef]
56. Pasika, S.; Gandla, S. Smart water quality monitoring system with cost-effective using IoT. Heliyon 2020, 6, e04096. [CrossRef]
57. Morón-López, J.; Rodríguez-Sánchez, M.C.; Carreño, F.; Vaquero, J.; Pompa-Pernía, Á.G.; Mateos-Fernández, M.; Aguilar, J.A.P.
Implementation of Smart Buoys and Satellite-Based Systems for the Remote Monitoring of Harmful Algae Bloom in Inland
Waters. IEEE Sens. J. 2021, 21, 6990–6997. [CrossRef]
58. Nguyen, D.; Phung, P.H. A Reliable and Efficient Wireless Sensor Network System for Water Quality Monitoring. In Proceedings
of the 2017 International Conference on Intelligent Environments (IE), Seoul, Republic of Korea, 21–25 August 2017; pp. 84–91.
[CrossRef]
59. Wang, X.; Zhang, F.; Ding, J. Evaluation of water quality based on a machine learning algorithm and water quality index for the
Ebinur Lake Watershed, China. Sci. Rep. 2017, 7, 12858. [CrossRef] [PubMed]
60. Milánkovich, Á.; Klincsek, K. Wireless Sensor Network for Water Quality Monitoring. In European Project Space on Information and
Communication Systems; SCITEPRESS: Setúbal, Portugal, 2015; pp. 28–47. [CrossRef]
61. Demetillo, A.; Japitana, M.; Taboada, E. A system for monitoring water quality in a large aquatic area using wireless sensor
network technology. Sustain. Environ. Res. 2019, 29, 12. [CrossRef]
62. Kofi, A.-M.; Tapparello, C.; Heinzelman, W.; Katsriku, F.; Abdulai, J.-D. Water Quality Monitoring Using Wireless Sensor
Networks: Current Trends and Future Research Directions. ACM Trans. Sens. Netw. 2017, 13, 1–41. [CrossRef]
63. Safaric, S.; Malaric, K. ZigBee wireless standard. In Proceedings of the ELMAR 2006, Zadar, Croatia, 7–9 June 2006; pp. 259–262.
[CrossRef]
64. IEEE Std 802.15.4-2015 (Revision of IEEE Std 802.15.4-2011); IEEE Standard for Low-Rate Wireless Networks. IEEE: Piscataway, NJ,
USA, 22 April 2016; pp. 1–709. [CrossRef]
65. IEEE Std 802.11-2020 (Revision of IEEE Std 802.11-2016); IEEE Standard for Information Technology–Telecommunications and
Information Exchange between Systems–Local and Metropolitan Area Networks–Specific Requirements–Part 11: Wireless LAN
Medium Access Control (MAC) and Physical Layer (PHY) Specifications–Redline. IEEE: Piscataway, NJ, USA, 26 February 2021;
pp. 1–7524.
66. IEEE Std 802.15.1-2005 (Revision of IEEE Std 802.15.1-2002); IEEE Standard for Information technology–Local and Metropolitan Area
Networks–Specific Requirements–Part 15.1a: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications
for Wireless Personal Area Networks (WPAN). IEEE: Piscataway, NJ, USA, 14 June 2005; pp. 1–700. [CrossRef]
67. Sigfox Whitepapers. Available online: https://www.sigfox.com/ (accessed on 14 August 2023).
68. Devalal, S.; Karthikeyan, A. LoRa Technology—An Overview. In Proceedings of the 2018 Second International Conference on
Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 29–31 March 2018; pp. 284–290. [CrossRef]
69. NB-IoT Whitepapers. Available online: https://www.narrowband.com/ (accessed on 25 August 2023).
70. Labdaoui, N.; Nouvel, F.; Dutertre, S. Energy-efficient IoT Communications: A Comparative Study of Long-Term Evolution for
Machines (LTE-M) and Narrowband Internet of Things (NB-IoT) Technologies. In Proceedings of the 2023 IEEE Symposium on
Computers and Communications (ISCC), Gammarth, Tunisia, 9–12 July 2023; pp. 823–830. [CrossRef]
71. Olatinwo, S.O.; Joubert, T.-H. Enabling Communication Networks for Water Quality Monitoring Applications: A Survey. IEEE
Access 2019, 7, 100332–100362. [CrossRef]
72. Suciu, G.; Suciu, V.; Dobre, C.; Chilipirea, C. Tele-Monitoring System for Water and Underwater Environments Using Cloud and
Big Data Systems. In Proceedings of the 2015 20th International Conference on Control Systems and Computer Science, Bucharest,
Romania, 27–29 May 2015; pp. 809–813. [CrossRef]
73. Awan, K.M.; Shah, P.A.; Iqbal, K.; Gillani, S.; Ahmad, W.; Nam, Y. Underwater Wireless Sensor Networks: A Review of Recent
Issues and Challenges. Wirel. Commun. Mob. Comput. 2019, 2019, 6470359. [CrossRef]
74. Myint, C.Z.; Gopal, L.; Aung, Y.L. Reconfigurable smart water quality monitoring system in IoT environment. In Proceedings of
the 2017 IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS), Wuhan, China, 24–26 May 2017;
pp. 435–440. [CrossRef]
75. Akyildiz, I.; Pompili, D.; Melodia, T. Challenges for efficient communication in underwater acoustic sensor networks. ACM
SIGBED Rev. 2004, 1, 3–8. [CrossRef]
76. Marais, J.; Malekian, R.; Ye, N.; Wang, R. A Review of the Topologies Used in Smart Water Meter Networks: A Wireless Sensor
Network Application. J. Sens. 2016, 2016, 9857568. [CrossRef]
77. Sehgal, A.; Tumar, I.; Schonwalder, J. Variability of available capacity due to the effects of depth and temperature in the underwater
acoustic communication channel. In Proceedings of the Oceans 2009-Europe, Bremen, Germany, 11–14 May 2009. [CrossRef]
78. Gallagher, M. Effect of topology on network bandwidth, Master of Engineering (Hons.). Master’s Thesis, Faculty of Informatics,
University of Wollongong, Wollongong, NSW, Australia, 1998. Available online: https://ro.uow.edu.au/theses/2539 (accessed
on 16 July 2023).
79. Pottie, G.J.; Kaiser, W.J. Wireless integrated network sensors. Commun. ACM 2000, 43, 51–58. [CrossRef]
80. Watt, A.J.; Phillips, M.R.; Campbell, C.E.; Wells, I.; Hole, S. Wireless Sensor Networks for monitoring underwater sediment
transport. Sci. Total Environ. 2019, 667, 160–165. [CrossRef]
81. Júnior, A.C.D.S.; Munoz, R.; Quezada, M.D.L.Á.; Neto, A.V.L.; Hassan, M.M.; De Albuquerque, V.H.C. Internet of Water Things:
A Remote Raw Water Monitoring and Control System. IEEE Access 2021, 9, 35790–35800. [CrossRef]
82. Luethi, R.; Phillips, M. Challenges and solutions for long-term permafrost borehole temperature monitoring and data
interpretation. Geogr. Helv. 2016, 71, 121–131. [CrossRef]
83. Rokem, A.; Kay, K. Fractional ridge regression: A fast, interpretable reparameterization of ridge regression. GigaScience 2020, 9,
giaa133. [CrossRef] [PubMed]
84. Ogutu, J.O.; Schulz-Streeck, T.; Piepho, H.P. Genomic selection using regularized linear regression models: Ridge regression,
lasso, elastic net and their extensions. BMC Proc. 2012, 6 (Suppl. 2), S10. [CrossRef] [PubMed]
85. Segal, M.R. Machine Learning Benchmarks and Random Forest Regression. Center for Bioinformatics and Molecular Biostatistics; University
of California: San Francisco, CA, USA, 2004. Available online: https://escholarship.org/uc/item/35x3v9t4 (accessed on
10 August 2023).
86. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [CrossRef]
87. Boswell, D. Introduction to Support Vector Machines; Department of Computer Science and Engineering, University of California:
San Diego, CA, USA, 2002.
88. Kramer, O. Unsupervised K-nearest neighbor regression. arXiv 2011, arXiv:1107.3600.
89. Xiao, F.; Wang, Y.; He, L.; Wang, H.; Li, W.; Liu, Z. Motion Estimation from Surface Electromyogram Using Adaboost Regression
and Average Feature Values. IEEE Access 2019, 7, 13121–13134. [CrossRef]
90. Koduri, S.; Gunisetti, L.; Ramesh, C.; Mutyalu, K.; Ganesh, D. Prediction of crop production using adaboost regression method.
J. Phys. Conf. Ser. 2019, 1228, 012005. [CrossRef]
91. Willmott, C.J.; Matsuura, K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing
Average Model Performance. Clim. Res. 2005, 30, 79–82. [CrossRef]
92. Robeson, S.M.; Willmott, C.J. Decomposition of the mean absolute error (MAE) into systematic and unsystematic components.
PLoS ONE 2023, 18, e0279774. [CrossRef]
93. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)?—Arguments against avoiding RMSE in the
literature. Geosci. Model Dev. 2014, 7, 1247–1250. [CrossRef]
94. Sessions, V.; Valtorta, M. The Effects of Data Quality on Machine Learning Algorithms. ICIQ 2006, 6, 485–498.
95. Arimie, C.O.; Biu, E.O.; Ijomah, M.A. Outlier Detection and Effects on Modeling. Open Access Libr. J. 2020, 7, e6619. [CrossRef]
96. Li, A.H.; Bradic, J. Boosting in the Presence of Outliers: Adaptive Classification with Nonconvex Loss Functions. J. Am. Stat.
Assoc. 2017, 113, 660–674. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.