2023-Comparative_analysis_of_machine_learning_and_analy
https://doi.org/10.1007/s12665-023-11237-y
ORIGINAL ARTICLE
Abstract
A proactive policy for tackling the global water crisis involves the application of artificial groundwater recharge (AGR). AGR
site selection is a complex challenge, particularly in large study areas. Considerable research attempted to locate AGR sites
using field data collection or conventional decision modeling techniques, such as analytic hierarchy process (AHP). However,
the present study utilizes machine learning (ML) techniques with geographic information system (GIS) and remote sensing
images to develop a high-efficiency AGR map for the United Arab Emirates. In this study, nine thematic layers were considered: precipitation, drainage density, total dissolved solids, groundwater level, geology, geomorphology, lineament density, elevation and distance from residences. The study applied three ML models, namely support vector machine (SVM), multilayer perceptron (MLP) and random forest (RF), to estimate the relative importance of each thematic layer through feature importance analysis. The AHP approach was also used for comparison; in AHP, the weight of each thematic layer was determined from a literature review and expert opinions. Results showed that the RF model performed best, with an overall prediction accuracy of 99%. The developed AGR maps were classified according to AGR suitability, with approximately 10% of the study area categorized as high. The AHP and RF results were broadly similar, indicating that the qualitative AHP approach was corroborated by the data-driven RF approach. The study presents a framework that can be applied in other climate regions where the required data are available. This framework can also help environmental agencies and practitioners understand the role of ML in AGR site selection. The results also demonstrate the effectiveness of combining GIS, remote sensing and ML techniques to produce high-efficiency AGR maps.
Keywords Artificial groundwater recharge · Machine learning · Analytical hierarchical process · UAE · GIS
a sustainable approach to managing groundwater resources in arid countries. Countries such as the United Arab Emirates (UAE) receive scanty rainfall, below 100 mm per year and mostly during winter (Al-Ruzouq et al. 2019a). This scenario motivates efforts to manage groundwater resources through AGR. In the UAE, treated wastewater has been recycled for the past decade to ensure a reliable water supply (Dawoud and Sallam 2012). Therefore, locating suitable recharge sites could substantially support the country's future water resource management.

The recharging process is governed by technical, environmental, spatial and topographical aspects, and information on potential recharge zones is often limited (Selvarani et al. 2017). Geo-environmental factors are therefore evaluated analytically to understand the groundwater recharge environment. Over the past decades, advances in remote sensing (RS) and geographic information systems (GISs) have allowed researchers to delineate AGR zones efficiently (Agarwal and Garg 2016). Notable studies on AGR (Ghayoumian et al. 2005; Agarwal and Garg 2016; Senanayake et al. 2016; Khan et al. 2020; Xu et al. 2021) used geology, geomorphology, precipitation, land use/land cover (LULC), hydrogeology, drainage, lineament, slope, elevation and soil datasets, among others, as key influential factors for identifying potential AGR zones.

Similarly, the analytic hierarchy process (AHP) has been widely used in the GIS environment to weigh influential factors against decision alternatives in both qualitative and quantitative terms. It is one of the key tools for spatial decision-making in many disciplines, such as groundwater monitoring (Chakraborty and Kumar 2016; Kim 2020; Verma and Patel 2021), movement (Ghosh et al. 2020), LULC studies (Zagade and Umrikar 2021) and other groundwater studies (Muralitharan and Palanivel 2015; Behera et al. 2019). Xu et al. (2021) identified AGR zones in the Daqing River catchment of China using AHP. Factors including distance to a canal, slope, infiltration rate, drainage density, groundwater depth, aquifer hydraulic conductivity, aquifer thickness, groundwater quality, soil quality and distance to sensitive areas were integrated with AHP. The study revealed that 15.8% of the total study area, mainly concentrated on alluvial plains, is suitable for AGR. Zghibi et al. (2020) applied AHP to the Korba aquifer in northeastern Tunisia, considering lithology, LULC, slope, geomorphology, lineament density (LD), rainfall, drainage density and soil type. The study was validated with discharge data from 20 groundwater wells, and the approach identified 69% of the total study area as potential AGR sites.

Recently, several machine learning (ML) models have been applied in many fields and have outperformed conventional statistical techniques (Khalil et al. 2019; Hamad et al. 2020a, b; Khalil and Fatmi 2022). With advances in RS and GIS, ML models have been used to augment the weighting in traditional AHP decision-making (Naghibi et al. 2016; Rahmati et al. 2016; Norouzi and Shahmohammadi-Kalalagh 2019; Avand et al. 2020). Classifiers such as random forest (RF), support vector machine (SVM), linear regression and deep learning are commonly used in recent studies for AGR identification (Naghibi et al. 2016; Rahmati et al. 2016; Norouzi and Shahmohammadi-Kalalagh 2019). Jaafarzadeh et al. (2021) identified potential AGR zones in the Marboreh watershed of Iran, a semiarid mountainous environment, using a hybrid of maximum entropy and frequency ratio models. Elevation, slope, aspect, drainage density, distance from the river, distance from the fault, plan curvature, profile curvature, LULC, rainfall, vegetation index, soil and lithology were considered. The hybrid model showed that 3.01% of the total study area is suitable for AGR. Huang et al. (2019) compared linear regression, multilayer perceptron (MLP) and long short-term memory (LSTM) models to predict groundwater recharge in the states of South Australia and Victoria (Aggarwal 2018). The LSTM model outperformed the other two, with a root mean square error of 0.12. That study used in situ estimates of water table fluctuation from 465 wells for 1970–2012, together with rainfall, actual evapotranspiration, temperature, wet-spell length and groundwater extraction. Several studies (Table 1) identified suitable zones before installation within a GIS framework by examining the prime factors associated with artificial groundwater recharge and applying AHP and ML approaches. Table 1 summarizes the most recent literature on AGR. The key parameters considered in most AGR studies are slope, elevation, lithology, geology, geomorphology, rainfall, LULC, drainage density, groundwater depth, aquifer thickness and distance from rivers/faults/urban regions. Most AGR studies used the traditional AHP approach, followed by ML integration for further validation.

Previous works have attempted to locate AGR sites using conventional linear regression or AHP. These studies also highlighted the importance of addressing water scarcity and its impact on nations with abundant water resources. The challenges of surface water management in arid and semiarid regions were also emphasized, and population growth was discussed as a significant factor contributing to increased water demand. However, these studies have limitations, including insufficient research on AGR in the UAE and a reliance on conventional AHP models.

The present study aims to address these limitations by focusing on the central-northern emirates of the UAE,
Mokarram et al. (2021) | Bushehr Province, Iran | Slope, lithology, LU/LC, drainage density, distance to faults, precipitation, elevation | AHP, ordered weighted averaging (OWA)
Jaafarzadeh et al. (2021) | Marboreh watershed, Iran | Elevation, slope, aspect, profile curvature, plan | Maximum entropy (ME), frequency ratio (FR)
580 Page 4 of 16 Environmental Earth Sciences (2023) 82:580
which faces an increasing demand for water resources because of its arid climate. First, the study considered a combination of multiple factors, namely precipitation, drainage density, geomorphology, geology, groundwater level (GWL), total dissolved solids (TDS), elevation, LD and distance from residences, to assess AGR suitability comprehensively. Second, the study employed ML algorithms to determine the influential weight of each thematic layer in a data-driven manner instead of relying solely on expert opinions and literature reviews, as is done in AHP.

In particular, the present study employed AHP aided by ML techniques to obtain an AGR map for the central-northern region of the UAE. The influential factors for determining potential AGR sites are rainfall, drainage density, TDS, GWL, geology, geomorphology, LD, elevation and distance from residences. The primary aim of the research is to identify and locate sites for constructing AGR facilities in the central-northern emirates using RS, GIS, AHP and ML. The outcomes of the different ML models were compared to determine the model with the highest accuracy. The objectives of this research are as follows:

(1) Explore key influential climatological, geological and residential factors to delineate AGR potential zones using ML techniques
(2) Investigate and map potential AGR sites with a view to managing water resources in the central-northern emirates
(3) Evaluate and validate the AHP and ML models to identify the AGR potential zones

Study area

The UAE is located in the Arabian Peninsula. It is bordered by Oman and the Gulf of Oman to the east, the Arabian Gulf to the north and Saudi Arabia to the west (Fig. 1). The country has an arid desert climate, with maximum temperatures reaching 45–50 °C in the desert and an annual rainfall of 110–150 mm (2020).

The population of the UAE has grown significantly over the past few decades, from approximately 350,000 in 1971 to more than 9 million in 2021. During this time, urbanization and industrial expansion have increased the demand for freshwater, placing considerable stress on water resources. Thus, the UAE has been trying to implement new methods of increasing water resources and tackling the global water crisis.

From 2008 to 2019, the country's water consumption increased from 1.47 billion cubic meters (bcm) to 1.69 bcm
because of rapid urbanization (Sherif et al. 2018). The total reserve of fresh groundwater in the UAE declined from 4% in 2005 to 1% in 2015 (Shanableh et al. 2018). The present research focuses on the central-northern emirates, including Sharjah, Ajman, Ras-al Khaimah, Umm-al Quwain, Fujairah and part of Oman.

The eastern part of the study area comprises the Al-Hajar Mountains, covering Fujairah and east Sharjah. In the west, the study area is mostly covered by sand. The aquifers near the mountains are mostly alluvial gravel and sand (Al-Ruzouq et al. 2019a; Murad et al. 2019). The aquifers in Ras-al Khaimah are mostly limestone and unconfined (Saif Al Matri 2008; Murad et al. 2019). The total study area is approximately 3905 km².

Materials and methods

Methodology

The methodology of the present research is divided into four stages, as illustrated in Fig. 2. The first stage involves the collection of historical records and satellite imagery data, including precipitation, salinity and GWL data from existing wells. A digital elevation model (DEM), Landsat 8 Enhanced Thematic Mapper Plus (ETM+) satellite imagery from USGS and residential built-up data from OpenStreetMap for the study area were also used. The data collection process is discussed in detail in a subsequent section.

The second stage involves the preparation of thematic layers by integrating the in situ data into ArcGIS Pro. This stage includes the development of thematic layers for precipitation, TDS, GWL, geology, geomorphology, LD and distance from residences. These thematic layers were used to identify potential AGR zones.

The third stage involves the reclassification and ranking of the thematic layers using two approaches: AHP and ML. AHP is a multicriteria decision-making method that assigns weights to the thematic layers based on literature review and expert opinions. In contrast, ML uses algorithms to estimate the weight of each thematic layer through feature importance. The weights from both methods are compared and reconciled to derive the most suitable weights for the AGR map. The final stage applies weighted overlay analysis to obtain the AGR map. The derived AGR map is validated using existing groundwater potential data from previous research (Al-Ruzouq et al. 2019a, b) and an existing functional AGR site at Nazwa, Sharjah.

In summary, the overall methodology adopted for identifying potential AGR zones in this research includes the following:

• Data collection and compilation of historical records, satellite imagery and other geospatial data
• Preparation of thematic layers and integration of data using GIS
• Reclassification and ranking of thematic layers using AHP and ML
• Application of weighted overlay analysis to obtain the AGR map
• Validation of the AGR map using the existing data and an existing functional site

[Fig. 2 Methodology flowchart. Labels: Data source (historical rainfall records, digital elevation model, in situ salinity records, in situ groundwater records, location of residential areas provided by OpenStreetMap, Landsat 8 ETM+); Drainage density; Artificial neural network; Validation: accuracy assessment (groundwater levels + Nazwa site); If not accurate, refine the weighted overlay process]

Data collection and preprocessing

The nine thematic layers for this study were selected based on an extensive review of relevant literature and previous research conducted in different regions, which identified them as important indicators of groundwater potential. Drawing on the established sources and methods used in previous studies, we collected as much data as possible to provide a comprehensive analysis. The nine thematic layers are precipitation, drainage density, TDS, GWL, geology, geomorphology, LD, elevation and distance from residences. Each of these layers plays a significant role in determining groundwater potential, and together they capture the interplay between the different variables affecting it.

Accordingly, several statistical techniques and algorithms were applied in ArcGIS Pro to develop each thematic layer from the source data. Each thematic layer is described as follows:

• Precipitation: precipitation plays a vital role in identifying AGR sites. High rainfall translates into a high probability of groundwater in a region, making the region suitable for AGR sites (Sherif et al. 2018). In situ (rain gauge) data for 2003–2017 were obtained from the UAE National Centre of Meteorology to prepare the thematic rainfall map, with rainfall measured in millimeters. The eastern part of the study area, covering parts of Fujairah, Sharjah and Ras-al-Khaimah, receives high rainfall. Figure 3a shows that the maximum rainfall amount is 103 mm.
• Drainage stream density (DSD): DSD describes how densely a catchment basin is drained by streams; it is the ratio of the total stream length to the total area of the catchment basin. A high DSD indicates low permeability (Farhadian et al. 2017; Senthilkumar et al. 2019). The water source originates from the northeastern part of the UAE. Because of the mountains in the northeast, most of the streams in the region accumulate at the junction of the mountains and the plains that cover the eastern area of Sharjah. The thematic layer was prepared from the DEM in ArcGIS Pro and is expressed in kilometers of channel per square kilometer (km/km²).
• GWL: the hydraulic gradient of an area can be determined by analyzing the water level, which depends on the pore pressure and the atmospheric pressure at the surface (Senthilkumar et al. 2019; Khan et al. 2020). In situ GWL data were compiled, and inverse distance weighting (IDW) interpolation was used to derive the GWL map. Values are given in meters above sea level (masl). GWL is inversely related to AGR suitability (Senthilkumar et al. 2019). The study area has high GWLs in the southeastern part of Sharjah and the western part of Ras-al-Khaimah. These areas lie in the foothills of the mountainous region and therefore receive more rainfall, which contributes to the high GWLs. The traditional method of calculating groundwater recharge, which multiplies a constant specific yield by the water table rise over a certain time interval, may be erroneous, particularly in shallow aquifers. The hydraulic approach, based on Darcy's equation, offers the most direct measurement of seepage rates and, hence, recharge. However, it is highly site specific, laborious and expensive, requiring specialized field equipment and personnel.
• Geology: Landsat 8 satellite imagery was used to prepare the geology layer, which is classified as alluvium, limestone, gabbro, metamorphic, sand and ophiolite. Supervised classification was applied to extract the layer. Alluvium has the highest water-holding capacity and is thus ideal for constructing and maintaining AGR sites. Figure 3e shows that most of the land is covered by sand.
[Fig. 3 (continued); panel title: Overlay analysis]
Sand is crucial in determining AGR site construction because of its high percolation.
• Geomorphology: Landsat 8 ETM+ satellite imagery was used to prepare the geomorphology layer. The geomorphology classes of the study area include vegetation, urban built-up, mountains and fan deposits. Fan deposits have a high water-holding capacity and are suitable for AGR zonation. Sand and high dunes also have high percolation, making them ideal for AGR zonation. Geomorphological categories help in understanding the permeability of a surface.
• TDS: TDS indicates the quality of any preexisting groundwater at the time of AGR zonation, as well as that of the injected water. TDS determines the coagulation properties of water (Kim 2020). Water with high TDS would be unfit for artificial injection because of pathogenic activities leading to a serious health hazard. The western shoreline of the study area, adjoining the Arabian Gulf, has higher TDS than the eastern shoreline adjoining the Gulf of Oman. This finding may indicate that the Arabian Gulf is more saline than the Gulf of Oman. The thematic layer is

ML

ML is a subset of artificial intelligence. ML uses computational algorithms and statistical techniques to learn patterns from training data based on the input parameters. The process aims to allow the computer to learn the structure of the dataset and make automatic decisions (Naghibi et al. 2016). ML is commonly divided into supervised learning (classification and regression) and unsupervised learning (e.g., clustering). Many researchers use ML to validate groundwater data and in related studies (Naghibi et al. 2016; Al-Ruzouq et al. 2019b; Ghosh et al. 2020; Lehr and Lischeid 2020; Pourghasemi et al. 2020). Three ML techniques, SVM, MLP and RF, are applied to the nine selected spatial parameters. The present study aims to generate a comprehensive dataset and develop ML models for the study area. Thus, 1000 random points were sampled, each carrying the values of the nine thematic layers: precipitation, drainage density, TDS, GWL, geology, geomorphology, LD, elevation and distance from residences. These points were carefully selected and used to extract information from the previously published groundwater potential map. The dataset was preprocessed and cleaned to ensure the accuracy
and quality of the data by eliminating null values, making it suitable for training ML models. To avoid overfitting the ML algorithms to the training data and to eliminate bias, the 1000-point dataset was split into 60% for training and 40% for testing. Evaluating on held-out data helps ensure that the model is unbiased and reliable, avoids overfitting or underfitting, and generalizes well to new, unseen data.

SVM

SVM is a supervised ML algorithm. A kernel function maps the nonlinear input into a higher-dimensional feature space, where a hyperplane separates the data into classes (Naghibi and Ahmadi 2017; Pourghasemi et al. 2020). The dataset is split into 60% and 40% for training and testing, respectively. The hyperplane is positioned so that the margin between the classes is as large as possible. The attribute weights are obtained using the dot (linear) kernel (Naghibi and Ahmadi 2017). The decision function used to construct the hyperplane (Ghosh and Das 2020) is given in Eq. 1:

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i\, y_i\, k(x, x_i) + \beta\right) \tag{1}$$

where $x$ is the input vector to be classified, $x_i$ are the $n$ training samples with class labels $y_i \in \{-1, +1\}$, $\alpha_i$ are the Lagrange multipliers, $\beta$ is the bias, and $k(\cdot,\cdot)$ is the kernel function.

MLP

An MLP network consists of three types of layers: input, hidden and output. There is no feedback from later layers to earlier ones, so the network is feedforward. All the neurons use nonlinear activation functions, and the hidden part can comprise multiple layers of neurons (Kim 2020). MLPs are trained with the supervised backpropagation technique. An MLP differs from linear regression in its multiple layers and nonlinear activation functions, which allow it to capture nonlinear relationships in the data.

The MLP learning algorithm proceeds as follows. First, the network is initialized with random weights between −1 and +1. Then, a training pattern is presented and the output is obtained. Finally, the network output is compared with the target output, and the error is propagated backwards. The output-layer weights are corrected using

$$\omega_{ho} \leftarrow \omega_{ho} + \eta\,\delta_o\,O_h \tag{2}$$

where $\omega_{ho}$ is the weight between hidden unit $h$ and output unit $o$, $\eta$ is the learning rate, and $O_h$ is the output of hidden unit $h$. The error term $\delta_o$ can be expressed as

$$\delta_o = O_o\,(1 - O_o)\,(t_o - O_o) \tag{3}$$

where $O_o$ is the output at output node $o$ and $t_o$ is the target output for that node. The input weights are corrected using

$$\omega_{ih} \leftarrow \omega_{ih} + \eta\,\delta_h\,O_i \tag{4}$$

where $\omega_{ih}$ is the weight connecting input node $i$ to hidden node $h$, $O_i$ is the input at node $i$, and $\eta$ is the learning rate. The hidden error term $\delta_h$ is

$$\delta_h = O_h\,(1 - O_h) \sum_{o} \delta_o\,\omega_{ho} \tag{5}$$

The error is the mean squared difference between the network output and the target output:

$$E = \frac{1}{p}\sum_{o=1}^{p} \left(t_o - O_o\right)^2 \tag{6}$$

where $p$ is the number of units in the output layer. One pass over all training patterns completes one epoch, and the training set is shuffled to prevent the order of the data from influencing training.

RF

RF combines multiple decision/regression trees with a bagging (bootstrap aggregating) technique to create an ensemble model. Each tree is built from a random subset of the training samples, and the size of each subset depends on the number of training samples. In the final step, the ensemble output is obtained by majority voting for classification and by averaging for regression problems (Naghibi and Ahmadi 2017; Al-Ruzouq et al. 2019b; Norouzi and Shahmohammadi-Kalalagh 2019; Arabameri et al. 2020). In general, three tuning parameters are determined: the number of trees, the number of selected features and the maximum depth of the trees. The RF model in this study is implemented for classification using the Gini index, which decides the splits at the nodes of each decision tree:

$$\mathrm{Gini} = 1 - \sum_{i=1}^{c} p_i^{2} \tag{7}$$
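As a minimal sketch of Eq. 7 (not the authors' implementation, which is not given as code), the Gini impurity of a node can be computed from the relative class frequencies of the samples that reach it. The helper names `gini_impurity` and `weighted_split_gini` and the example labels are hypothetical; the split helper only illustrates how a CART-style tree inside an RF ensemble compares candidate splits.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity of one node (Eq. 7): 1 - sum_i p_i**2.

    p_i is the observed relative frequency of class i among the samples
    at the node; the sum runs over the c distinct classes present.
    """
    n = len(labels)
    if n == 0:
        return 0.0  # an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def weighted_split_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Size-weighted impurity of a binary split (hypothetical helper).

    A CART-style tree keeps the candidate split with the lowest value.
    """
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical suitability labels at one node (illustrative, not study data).
node = ["high", "high", "low", "low", "low", "moderate"]
print(gini_impurity(node))  # about 0.611 (= 1 - 14/36)
print(weighted_split_gini(["high", "high"], ["low", "low", "low", "moderate"]))  # 0.25
```

Splitting the example node into a pure "high" child and a mostly "low" child lowers the impurity from about 0.611 to 0.25, which is why a tree would prefer that split. Libraries such as scikit-learn expose this criterion as `criterion="gini"` in `RandomForestClassifier`; whether the study used that library is not stated.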
Equation 7 estimates the Gini impurity of a node from the class probabilities of its branches, where $p_i$ denotes the observed relative frequency of class $i$ and $c$ is the total number of classes. The present study considers five trees with 100 iterations for the RF model, resulting in an accuracy of 99%.

AHP

AHP is a semiquantitative, multicriteria decision-making tool used by various researchers for decision analysis (Nasiri et al. 2013; Abbaspour 2014; Rahimi et al. 2014; Muralitharan and Palanivel 2015; Chakraborty and Kumar 2016; Behera et al. 2019; Rajasekhar et al. 2019; Ghosh et al. 2020; Kim 2020; Zagade and Umrikar 2021). In the AHP process, weights and ranks are assigned based on expert opinions and literature review (Agarwal and Garg 2016; Kazakis 2018; Salar et al. 2018; Al-Abadi et al. 2020; Mokarram et al. 2021). Then, the consistency ratio (CR) is calculated to validate the AHP model, as discussed in the next subsections.

Weighting and ranking

The pairwise weights of all the parameters are combined in a square matrix whose diagonal elements are all 1. The principal eigenvalue and its corresponding right eigenvector are used to determine the relative importance of each parameter for artificial recharge zonation (Zagade and Umrikar 2021). The criteria are ranked from 1 (equal importance) to 9 (extreme importance). This comparative assessment determines the influence of each factor that contributes to AGR zonation.

Pairwise comparison matrix

A pairwise comparison matrix with all diagonal values set to 1 is prepared to assess the consistency of the AHP ranking and weighting system. The consistency index (CI), random index (RI) and CR are calculated. CR measures the consistency of the pairwise matrix with its respective weights and should be below 10%. Equations 8, 9 and 10 are used to calculate CI, RI and CR, respectively (Al-Ruzouq et al. 2019a, b; Ghosh et al. 2020):

$$\mathrm{CI} = \frac{\lambda_{\max} - n}{n - 1} \tag{8}$$

$$\mathrm{RI} = \frac{1.98\,(n - 1)}{n} \tag{9}$$

$$\mathrm{CR} = \frac{\mathrm{CI}}{\mathrm{RI}} \tag{10}$$

where $\lambda_{\max}$ is the principal eigenvalue of the pairwise comparison matrix and $n$ is the number of criteria.

Results and discussion

Table 2 presents the accuracy, kappa statistic, mean absolute error (MAE) and root mean square error (RMSE) of the ML approaches. RF has an overall accuracy of 99% and the highest kappa statistic of the three techniques (0.98). SVM has an overall accuracy of 72% and a kappa value of 0.5, whereas MLP has an overall accuracy of 95% and a kappa value of 0.93. Because RF has the highest accuracy, it was selected for the weight evaluation process. In particular, the factor weights of the RF model were used for AGR mapping and for comparison with the AHP weights.

Table 2 AGR weights based on ML methods
ML models | Correctly classified instances (%) | Kappa statistics | Mean absolute error | Root mean squared error

It is important to note the contrasting observations from the MAE and RMSE metrics for MLP and RF. In terms of MAE, MLP appears to outperform RF, yet RMSE suggests the opposite. The difference stems from the definitions of the two metrics. MAE measures the average absolute difference between the predicted and actual values; because it takes the absolute value of the errors, it is less sensitive to outliers. RMSE, on the other hand, squares the errors before averaging and taking the square root, so it is more sensitive to outliers: larger errors contribute more to RMSE than to MAE.

For the AHP approach, Table 3 presents the pairwise comparison matrix of the parameters governing AGR zonation in this research. In this study, CR is 3%, which validates the AHP system (< 10%). In the pairwise comparison matrix system, the impact ratio of all the
factors should be estimated. The comparative ranks of the influence of each thematic layer on AGR zonation are given in Table 4.

Factor | Precipitation | DSD | Geomorphology | Geology | GWL | TDS | Elevation | LD | Residential ED
Precipitation | 1.000 | 0.900 | 0.500 | 0.500 | 0.900 | 0.900 | 2.000 | 2.000 | 0.900
DSD* | 1.111 | 1.000 | 0.500 | 0.500 | 0.900 | 0.900 | 2.000 | 2.000 | 0.900
Geomorphology | 2.000 | 2.000 | 1.000 | 0.900 | 2.000 | 2.000 | 4.000 | 4.000 | 2.000
Geology | 2.000 | 2.000 | 1.111 | 1.000 | 2.000 | 2.000 | 4.000 | 4.000 | 2.000
GWL** | 1.111 | 1.111 | 0.500 | 0.500 | 1.000 | 0.900 | 2.000 | 2.000 | 0.900
TDS+ | 1.111 | 1.111 | 0.500 | 0.500 | 1.111 | 1.000 | 2.000 | 2.000 | 0.900
Elevation | 0.500 | 0.500 | 0.250 | 0.250 | 0.500 | 0.500 | 1.000 | 0.900 | 0.500
LD++ | 0.500 | 0.500 | 0.250 | 0.250 | 0.500 | 0.500 | 1.111 | 1.000 | 0.500
Residential ED# | 1.111 | 1.111 | 0.500 | 0.500 | 1.111 | 1.111 | 2.000 | 2.000 | 1.000

Figure 4 compares the weights obtained from the two approaches: the traditional AHP method and the RF model. The AHP approach assigns considerable weights of 20% each to the geomorphology and geology layers. In comparison, the RF model assigns 16% weight to the groundwater layer, followed by 14% each to the geomorphology and geology layers.

Figure 5 presents the AGR maps from the AHP method and the RF approach. Each map is categorized into six classes of AGR suitability: very high, high, moderately high, moderately low, low and very low. The two maps show marginal differences in the moderately high, moderately low and low categories. Compared with the AHP-based AGR map, the RF-based AGR map assigns more of the area to the high and moderately high categories (3% and 7% more, respectively). Further details are given in Table 5. Accordingly, the following points are observed:

(1) Location 'A' lies in Fujairah Emirate and shows high potential for AGR. This is primarily due to fan deposits and the alluvial composition of the soil, which holds water up to its maximum capacity. The groundwater level is approximately 50 masl, making this location an ideal storage space for artificial recharge. Annual precipitation is 103 mm, which is high for a desert country like the UAE. Region 'A' is located in the Al-Hajar Mountains. The LD of the region is high, approximately 0.43.
(2) Location 'B' lies in Ras-al Khaimah Emirate. It has a low TDS of 2000 mg/l. The soil comprises fan deposits and alluvium, making it ideal for AGR storage.
(3) Location 'C' is in Fujairah Emirate and is not suitable for AGR storage. The analysis of the thematic layers for the decision criteria indicates that the region is mountainous, with gabbro deposits as its geology. The GWL is high, up to 185 masl, which decreases the storage capacity.
(4) Location 'D' is located in the northern part of Sharjah Emirate, near Location 'A'. The region comprises dune geomorphology and sand geology, which are ideal for AGR storage. The region's elevation is 49 m on the plains, which eases the construction of AGR storage.

According to both AGR maps, an existing AGR site (located in the Nazwa area, shown in Fig. 5) falls within the moderately low suitability class, validating the developed AGR maps. The local authorities in the study area informed us that Nazwa was selected because of its highly favorable geological and geomorphological structures of sand and sand dunes, respectively. In addition, the TDS of the region is low at approximately 2400 mg/l, making the region highly suitable for AGR, water injection and water extraction. Moreover, the distances from the water body and from the residential zones are moderate, balancing the overall economy of the project. This analysis indicates that both approaches (i.e., AHP-based and RF-based) are suitable for AGR zonation, and they show similar outcomes with minor differences. The advantage of utilizing the RF-based approach includes providing
Table 5 Percentage of AGR suitability from AHP and ML of the total area

AGR suitability     AHP (%)   ML (%)
Very high           1         1
High                9         12
Moderately high     41        48
Moderately low      39        34
Low                 10        5
Very low            1         0.05

Given sufficient data and expertise, the methodology developed in this research can be applied to other regions. However, overgeneralization of the conclusions must be avoided. Moreover, recommendations should be specific to the context of the relevant agencies in each region. Other key findings from the results, such as those on geomorphology, geology and drainage density, should be incorporated to strengthen the relevance of the findings and the practical actions.

In conclusion, this study makes a valuable contribution to groundwater resource sustainability and management. It provides important insights for planning AGR strategies in arid regions. The results and methodology may also be applicable to other regions facing scarce groundwater resources.

Acknowledgements The authors would like to thank Prof. Hamid Al-Naimy, Chancellor of University of Sharjah (UoS), and Eng. Saeed Sultan Al Suwaidi, the Director of Sharjah Electricity, Water, and Gas Authority (SEWA), for facilitating the study.

Author contributions RR, AS, MBG, and AY formulated the research idea. RR, AS, and SM designed the research methodology. SM and RR prepared all the machine learning models as well as the GIS thematic layers. RR and SM wrote the first draft of the manuscript. RR, AS, MAK, MBG, and RJ revised the concept and the modeling outcomes. RR and MAK proofread the manuscript. RR, AS, and NH handled the project administration. RR and AS acquired the research funding.

Funding The authors would like to acknowledge the funding received from the University of Sharjah (under grant number 1902041134-P), which helped to facilitate this research.

Data availability Not available.
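The six-class maps compared in Table 5 rest on combining the nine thematic layers into one suitability score per cell. A minimal Python sketch of such a weighted overlay is given below. It assumes a weighted linear combination of layer scores rescaled to a common 1 to 5 range, followed by equal-interval class breaks; neither assumption is confirmed in this section. Only the 20% weights for geomorphology and geology (AHP) come from the text; the remaining weights, the layer scores, the layer key names and the function names (`suitability_index`, `classify`, `class_percentages`) are hypothetical. Under the same assumptions, replacing `WEIGHTS` with normalized RF feature importances would give the RF-based counterpart.

```python
"""Minimal weighted-overlay sketch (hypothetical weights, scores and class breaks)."""
import math

# Six AGR suitability classes, ordered from least to most suitable.
CLASSES = ["Very low", "Low", "Moderately low", "Moderately high", "High", "Very high"]

# Hypothetical AHP-style weights (fractions that sum to 1). Only the 0.20 values for
# geology and geomorphology come from the text; all other values are placeholders.
WEIGHTS = {
    "precipitation": 0.10,
    "drainage_density": 0.08,
    "total_dissolved_solids": 0.10,
    "groundwater_level": 0.12,
    "geology": 0.20,
    "geomorphology": 0.20,
    "lineament_density": 0.08,
    "elevation": 0.06,
    "distance_from_residences": 0.06,
}


def suitability_index(cell_scores, weights):
    """Weighted linear combination of one cell's per-layer scores (assumed 1-5 scale)."""
    missing = set(weights) - set(cell_scores)
    if missing:
        raise ValueError(f"missing layer scores: {sorted(missing)}")
    return sum(weights[name] * cell_scores[name] for name in weights)


def classify(index, low=1.0, high=5.0):
    """Assign an index in [low, high] to one of six equal-interval classes."""
    position = int((index - low) * len(CLASSES) / (high - low))
    # Clamp so that index == high (or a tiny float overshoot) stays in range.
    return CLASSES[min(max(position, 0), len(CLASSES) - 1)]


def class_percentages(cells, weights):
    """Percentage of cells falling in each class (equal-area grid assumed)."""
    if not cells:
        raise ValueError("no cells to classify")
    counts = dict.fromkeys(CLASSES, 0)
    for cell in cells:
        counts[classify(suitability_index(cell, weights))] += 1
    return {name: 100.0 * n / len(cells) for name, n in counts.items()}


if __name__ == "__main__":
    assert math.isclose(sum(WEIGHTS.values()), 1.0)
    # Four hypothetical cells that score uniformly 1, 2, 4 and 5 on every layer.
    cells = [dict.fromkeys(WEIGHTS, s) for s in (1, 2, 4, 5)]
    for name, pct in class_percentages(cells, WEIGHTS).items():
        print(f"{name:<16} {pct:5.1f}%")
```

Because `class_percentages` counts cells, its shares equal area shares only on an equal-area grid, which is a further assumption of this sketch.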
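The 3% and 7% gains quoted for the high and moderately high classes are percentage-point differences between the two columns of Table 5. The short Python sketch below recomputes them from the tabulated values; the dictionary layout and the function names `differences` and `column_totals` are illustrative only.

```python
"""Recompute AHP vs. ML (RF) differences in class shares from Table 5."""

# Share of the study area (%) per AGR suitability class as (AHP, ML), from Table 5.
TABLE_5 = {
    "Very high": (1.0, 1.0),
    "High": (9.0, 12.0),
    "Moderately high": (41.0, 48.0),
    "Moderately low": (39.0, 34.0),
    "Low": (10.0, 5.0),
    "Very low": (1.0, 0.05),
}


def differences(table):
    """ML share minus AHP share for each class, in percentage points."""
    return {cls: round(ml - ahp, 2) for cls, (ahp, ml) in table.items()}


def column_totals(table):
    """Column sums; departures from 100 reflect rounding in the table."""
    ahp_total = sum(ahp for ahp, _ in table.values())
    ml_total = sum(ml for _, ml in table.values())
    return round(ahp_total, 2), round(ml_total, 2)


if __name__ == "__main__":
    for cls, diff in differences(TABLE_5).items():
        print(f"{cls:<16} {diff:+.2f} percentage points")
    print("Column totals (AHP, ML):", column_totals(TABLE_5))
```

The High and Moderately high rows give +3 and +7 percentage points, matching the text. The columns sum to 101% (AHP) and 100.05% (ML), presumably because of rounding in the published table.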