Water 15 03970 v2
Article
Flood Forecasting by Using Machine Learning: A Study
Leveraging Historic Climatic Records of Bangladesh
Adel Rajab 1 , Hira Farman 2,3 , Noman Islam 2,3 , Darakhshan Syed 4 , M. A. Elmagzoub 5 , Asadullah Shaikh 6 ,
Muhammad Akram 1 and Mesfer Alrizq 6, *
1 Department of Computer Science, College of Computer Science and Information Systems, Najran University,
Najran 61441, Saudi Arabia; [email protected] (A.R.); [email protected] (M.A.)
2 Computer Science Department, Iqra University, Karachi 75300, Pakistan; [email protected] (H.F.);
[email protected] (N.I.)
3 Department Computer Science, Karachi Institute of Economics and Technology, Karachi 74600, Pakistan
4 Computer Science Department, Bahria University Karachi Campus, Karachi 75300, Pakistan;
[email protected]
5 Department of Network and Communication Engineering, College of Computer Science and Information
Systems, Najran University, Najran 61441, Saudi Arabia; [email protected]
6 Department of Information Systems, College of Computer Science and Information Systems,
Najran University, Najran 61441, Saudi Arabia; [email protected]
* Correspondence: [email protected]
Abstract: Forecasting rainfall is crucial to the well-being of individuals and is significant everywhere
in the world. It contributes to reducing the disastrous effects of floods on agriculture, human life, and
socioeconomic systems. This study discusses the challenges of effectively forecasting rainfall and
floods and the necessity of combining data with flood channel mathematical modelling to forecast
floodwater levels and velocities. This research focuses on leveraging historical meteorological data to
find trends using machine learning and deep learning approaches to estimate rainfall. The Bangladesh
Meteorological Department provided the data for the study, which also uses eight machine learning
algorithms. The performance of the machine learning models is examined using evaluation measures
like the R² score, root mean squared error and validation loss. According to this research’s findings, polynomial regression, random forest regression, and long short-term memory (LSTM) had the highest performance levels. Random forest and polynomial regression have an R² value of 0.76, while LSTM has a loss value of 0.09.
Keywords: forecasting rainfall; RMSE; ANN; regression
Citation: Rajab, A.; Farman, H.; Islam, N.; Syed, D.; Elmagzoub, M.A.; Shaikh, A.; Akram, M.; Alrizq, M. Flood Forecasting by Using Machine Learning: A Study Leveraging Historic Climatic Records of Bangladesh. Water 2023, 15, 3970. https://doi.org/10.3390/w15223970
and increasing urbanisation, endangering more people’s lives, ecological systems, and
economic systems. For many years, flood management strategies have been built on
the foundation of conventional flood forecasting techniques based on hydrological and
meteorological models and have produced significant insights. A more sophisticated,
adaptive, and data-driven methodology must be investigated, though, due to the complex
interaction of components that cause floods and the ever-increasing amount of data that
are accessible [12]. The field of artificial intelligence concentrates on creating machines
that can process data, learn from it, and make judgements. The use of machine learning
is an appealing approach for flood forecasting because it holds the promise of revealing
intricate, complicated correlations within huge datasets. Its capacity to incorporate data
from numerous sources, including satellite images, river gauge data, and climate models,
offers chances to improve the precision, predictability, and lead time of flood forecasts.
5. Observation that the random forest regressor and k-nearest neighbours algorithms
achieved high accuracy, 96% and 99%, respectively.
The remaining sections of this work are organised as follows: Section 2 presents a detailed literature review of the materials and methods and partly examines the theoretical background of early flood prediction using deep learning and various machine learning techniques. Section 3 discusses the proposed methodology. Sections 4–6 present the results and a discussion comparing various machine learning and deep learning techniques for early flood forecasting based on a dataset. Section 7 provides a conclusion and future recommendations.
2. Literature Review
We structured this section as follows. First, the past studies are highlighted. A
discussion on the justification of this research follows this.
prediction models. This publication’s essential contribution is its discussion of the state
of ML models for flood prediction and recommendations for the most effective models to
use. To give a comprehensive picture of the many ML methodologies utilised in the area,
this study predominantly looks at the studies where ML models were assessed through
a qualitative review of reliability, accuracy, effectiveness, and performance. According to
Chen et al. [32], the area significance is divided into grids based on longitude and latitude,
and the data on precipitation and drainage collected at stations are combined into tensors
depending on station coordinates. Instead of a one-dimensional time sequence, the input
characteristic is a two-dimensional time series with spatial data.
According to Motta et al. [33], this effort will integrate machine learning classifiers with
Geographic Information Systems (GIS) methods to provide a flood prediction platform that
can be helpful for resiliency management. With the help of this approach, it is possible to
create realistic variables and risk indicators for the likelihood of flooding at the municipality
level, which may be used to create long-term plans for smart cities. According to a review
of past research articles, Maspo et al. [7] review the ML methods currently being utilised
for flood forecasting. This research tries to identify the most helpful flood forecast methods.
This research aims to list the critical variables and the most recent ML methods for flood
prediction. According to Sankaranarayanan et al. [13], the public and the government may
be able to plan both short- and long-term mitigation strategies, be ready for evacuation
and rescue operations, and provide relief for flood victims if they receive early warning
of a flood calamity. In this case, the location of the impacted areas and their respective
seriousness are two of the critical considerations in most flood mitigation methods. There
is still no reliable method for predicting floods in advance. Previous technologies usually
relied on prepared and manually entered data. Because the processes were time consuming,
making early and real-time projections was impossible. Innovative operational approaches
have been examined by Parag et al. [34]. The researchers observe and examine the current
trend regarding data-driven solutions for flood forecasting. Prediction tasks are becoming increasingly significant for machine learning algorithms developed using historical data for climatic variables. A review offered by Furquim et al. [5]
presents the use of data gathered from urban rivers to forecast floods in an effort to decrease
the damage caused by floods. After the involvement hypothesis had shown the mutual
dependence of the information, the artificial neural networks were reviewed to see how
accurate their forecasting algorithms were. WSNs have been set up whenever serious
flooding-related issues have arisen. Adnan et al.’s [35] approach was suggested to develop
risk-based plans for growth and enhance current warning systems for emergencies. To
forecast and determine potential flooding locations or flood-sensitive regions in the Teesta
River basin, Talukdar et al. [6] applied cutting-edge novel ensemble machine learning
strategies. A rainfall forecasting algorithm employing the effective machine learning
random forest was proposed by Adnan et al. [30] by giving a flood prediction utilising a
range of machine learning techniques. Gauhar et al. [36] employed a variety of coefficients
of association for feature selection and the k-NN technique to forecast a flood. It is widely
established that quantifying and lowering the uncertainty associated with hydrologic
prediction is crucial for predicting the risk of flooding and making educated decisions [37].
The current paper thoroughly examines the Bayesian forecasting methods used in flood
forecasting. According to Haque et al. [38], 180 individual models were produced using
five different machine learning techniques based on multiple combinations of temporal
lags for input data and lead times in prediction. The 5772 km2 Someshwari-Kangsa sub-
watershed in Bangladesh’s North Central hydrological region was the subject of modelling.
Using conventional machine learning approaches, it is challenging to predict when it will
rain [39]. However, several studies have been presented that forecast rainfall using various
computer algorithms. Osmani et al.’s [15] innovative method for predicting monthly dry
days (MDD) at six target stations in Bangladesh makes use of a variety of ML techniques.
The datasets for monthly days without precipitation and monthly days with rainfall were
produced using a range of rainfall limitations. Manandhar et al. [16] recommend employing
machine learning approaches to look into the long-term implications of flood prevention in
Bangladesh. Data from socioeconomic surveys and historical events (such as migration and
mortality) are available from 1983 to 2014. Yaseen et al.’s [40] emphasis is on the greater
necessity and duty of handling human-caused catastrophes. Humans invented an area of
science called artificial intelligence (AI) that could be applied in this situation.
Aakash et al. [41] thoroughly analyse and compare the many approaches and algo-
rithms that scholars have used to estimate precipitation. The core objective is to present
non-experts with access to the techniques and approaches used in rainfall forecasting. In
the present research, a flood vulnerability map for Iran is produced by utilising the concept
of the convolutional neural networks (CNN) method [42], one of the more current and
effective techniques in enormous datasets. In their discussion of several case studies, Sergey
et al. [43] use the example of ensemble-based storm surge simulation for forecasting floods
in St. Petersburg, Russia, to look at the opportunities presented by the established method-
ology. Mosavi et al.’s [31] main contribution is to show the current state of ML models for
flood prediction and to provide insight into the most appropriate models. India [11] has
some of the worst flood damage in the world right now, with the most recent disaster in
Kerala in August 2018 as a prime example. The problem is that no one has attempted to
forecast the likelihood of a flood using rainfall volume and temperature. Therefore, the
Neural Network has been utilised to forecast the likelihood of floods based on temperature
and rainfall intensity. Therefore, Gude et al. [3] propose flood prediction as one of the
primary topics to be researched in hydrology. Although many academics have studied
this problem using various approaches, such as physical models and image processing,
the accuracy and time steps still fall short for all applications. This study examines deep
learning techniques for gauging height and evaluates the associated uncertainty. Current
ML techniques for flood prediction are evaluated by Maspo et al. [7], and the parameters
utilised to predict floods are based on an analysis of past research publications. According
to Nevo et al. [44], the multidimensional model is a machine learning replacement for the
hydraulic modelling of flooding flows. Compared to past information, all models meet
expectations of performance that are high enough for use in operational situations. Mitra
et al. [8] offer an embedded system that utilises IoT-based machine learning to forecast
the possibility of floods in a river basin. The device uses a ZigBee connection to link the
WSN to a customisable mesh network, and then it uses a GPRS module to send data over
the internet.
Jeerana et al. [9] examine the possibility of using machine learning methods to fore-
cast flood occurrences in the Pattani River using open data. A probabilistic prediction
framework was created by Chen et al. [32] using several machine learning approaches.
Three techniques utilising multiple scenarios for decision-making were used along with
the ML based approaches to evaluate how well they were able to model the risk of flooding
in the Ningdu Catchment, which Khosravi et al. [45] address as one of China’s foremost
flood-prone geographic areas. El-Magd et al. [46] employed the extreme gradient boost and
KNN methods to produce flash flood prediction maps for the River El-Laquita in Egypt’s
eastern centre area. To predict floods on the shores of the rivers Daya and Bhargavi, which
flow across the Indian state of Odisha, Nayak et al. [47] use the Deep Belief Network (DBN).
In a comparative investigation, additional machine learning techniques are applied to fully
depict the effects of dam construction. Tayfur et al.’s [48] main concern is to present the
application of swarm-based optimisation, ant-colony optimiser, artificial neural network
(ANN) and genetic-algorithm-based approaches to flood hydrograph prediction.
To predict river flooding in the Barak River, Sahoo et al. [49] examined the corre-
sponding precision of radial basis functions. As reported by Qian et al. [50], there have
been considerable financial and human damages due to an increase in flash floods in
metropolitan areas. To precisely describe the specifics of flood improvement, the current
flood prediction techniques are either too sluggish or excessively straightforward. This
research uses deep neural networks to accelerate the mathematical calculation of a 2D
urban flood forecasting technique based on thermodynamics and controlled by the Shallow Water Equation (SWE). Researchers retrieve flood patterns from data generated via a
partial differential equation (PDE) generator using convolutional neural networks (CNN)
and conditional generative adversarial networks (cGANs). The four ML-based FSMs mentioned by Adnan et al. [35] are random forest (RF), KNN, multilayer perceptron (MLP), and a hybridised genetic algorithm–Gaussian radial basis function–support vector regression model (GA-RBF-SVR). Miau and Hung [51] also utilise a deep learning framework. Hossain et al. [52] describe an effort to create a system for analysing long-term seasonal rainfall trends in Western Australia using Levenberg–Marquardt multiple linear regression and artificial-based methodologies. RBFs, including linear and nonlinear kernel parameters,
perform better in the same catchment under different circumstances. The response to lighter
precipitation would be very different from that to heavier one, which is a handy way to
reveal the dynamics of an SVM classifier. The study also shows an unexpected outcome in
the SVM response to diverse rainstorm-related inputs.
According to Aswad et al. [10], forecasting flood status is challenging and calls for in-
depth research into the underlying causes of floods. This study recommends a TpoT-based
model to help anticipate when rivers may flood. The IoT-FSP concept uses the Internet of
Things framework to facilitate flood data collection and three approaches for ML for flood
forecasts. As Ighile et al. [53] reported, this study forecasted the flood-prone locations in
Nigeria using historical flood records from 1985 to 2020 and plenty of conditioned variables.
An exact flood prediction model can be made using various machine learning techniques,
according to Kunvergi et al.’s [54] investigation. The generalised additive model (GAM),
the boosted regression tree (BTR), and the multivariate adaptive regression splines (MARS)
are three novel machine learning methods that Dodangeh et al. [55] suggested. These
models were built using random subsampling (RS), bootstrapping (BT), and multi-time
resampling techniques. The province of Ardabil, where this approach was applied, is situated near the Caspian Sea’s coast and frequently endures severe flooding.
The study by Khairudin et al. [56] aims to investigate the impact of various time-series
scales of rainfall information from eight rainfall stations along the Kelantan River on the
accuracy of the water level forecasts at Kuala Krai station. To create a flood forecasting framework, Dtissibe et al. [57] employed a multilayer perceptron with flow as the input and output parameters. For this, a set of data corresponding to the estimates
of rainfall recorded in Australia’s major cities throughout the previous ten years was
provided to the primary machine learning methods (kNN, decision tree, random forest,
and neural networks). Sarasa-Cabezuelo’s study [58] outlines a qualitative investigation
of using machine learning to predict the probability of rain. The outcomes demonstrate
that neural networks are the best approach. This work compares the efficacy of rainfall
forecasting techniques based on modern machine learning techniques for forecasting hourly
volumes of rainfall using weather time-series data from cities across the United Kingdom.
Liyew and Melese [59] analyse the performance of these algorithms. The analytical
hierarchical technique, a multi-parameter modelling tool, is used by MC Aydin and Bir-
inciolu [17] to analyse the risk of flooding assessments in the Turkish province of Bitlis.
Table 1 contrasts the pros and cons of various methods, datasets, and resources used in the
literature for rainfall or weather forecasting.
Table 1. Cont.
• Several models perform poorly when applied elsewhere because they are overfitted to particular datasets or areas. Model portability between geographic areas remains difficult.
• Numerous ML models, especially deep learning models, behave as “black boxes”, making it challenging to comprehend how they make decisions. This makes it difficult to win the trust of stakeholders and end users.
• When ML models are combined with conventional meteorological models, which
have been widely used and relied upon for years, there is frequently a gap. The
physical processes that play a role in flood generation and propagation are not always
adequately taken into consideration in studies.
• For real-time prediction, the computational burden of some sophisticated ML models may be too high. Some models could be unusable in time-sensitive situations due to the time required for data collection, initial processing, and forecasting.
• Certain approaches might be practical in local watersheds or urban areas, but they might have difficulties when scaled up to large river basins.
Regardless of these challenges, there is a lot of scope for machine learning in flood
forecasting. To utilise machine learning’s maximum potential, it will be essential to carry
out ongoing research, collect data, engage stakeholders, and integrate ML with conventional
modelling techniques. This paper employs several machine learning and deep learning
models for rainfall prediction. We briefly discuss the rationale behind our choice.
ANNs were chosen because they can capture the dynamic non-linear correlations in data that are frequently present in hydrological systems. A sufficiently large ANN can, in theory, approximate any continuous function. Because of this, they are adaptable to various tasks, including flood forecasting. The architecture of an ANN (depth, width) can be changed to accommodate various datasets and prediction timeframes [57,66].
Similarly, the choice of LSTM follows from the nature of the problem: flood forecasting is a time-series challenge. LSTMs, a form of recurrent neural network (RNN), are designed to handle sequential information by preserving a “stored memory” of previous inputs in their internal cell states. The issue of vanishing gradients affects traditional RNNs, making it difficult for them to learn long-term dependencies. LSTMs are less prone to this issue because of their gating mechanisms, which enable them to learn and remember over lengthy periods [67]. Additionally, convolutional neural networks (CNNs) and other neural network architectures can be integrated with LSTMs to capture spatial and temporal patterns [68,69].
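To make the gating idea concrete, the following is a minimal NumPy sketch of a single LSTM time step in its standard textbook form. It is an illustration only, not the network used in this study: the function name `lstm_step`, the stacked weight layout, the dimensions, and the random input sequence are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used by the input, forget and output gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step (standard formulation, illustrative only).

    W has shape (4H, D), U has shape (4H, H) and b has shape (4H,).
    The four blocks are stacked in the order:
    input gate, forget gate, candidate values, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old memory to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate cell values
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory to expose
    c_t = f * c_prev + i * g       # additive update of the cell state
    h_t = o * np.tanh(c_t)         # hidden state passed to the next step
    return h_t, c_t


# Hypothetical setup: 3 input features per month, 4 hidden units.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h = np.zeros(H)
c = np.zeros(H)
sequence = rng.normal(size=(12, D))  # 12 made-up monthly feature vectors
for x_t in sequence:
    h, c = lstm_step(x_t, h, c, W, U, b)

print("final hidden state:", h)
```

The additive cell-state update `c_t = f * c_prev + i * g` is what lets information persist over many steps: when the forget gate is close to 1 and the input gate close to 0, the cell state is carried forward almost unchanged.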
The proposed study can forecast rainfall for every season and region in Bangladesh.
From Table 1, it is clear that most efforts are for various restricted regions and have some
significant flaws, like a relatively tiny dataset, a small feature set, and lower precision.
On the other hand, in this study, machine learning methods and a two-layer long short-term memory (LSTM) method have been utilised [1,7,21], and an artificial neural network (ANN) [33,58] has been developed for predicting rainfall in Bangladesh. It solves the backflow problem found in other works. It uses a larger dataset with 18 features, of which the 16 most important are used. The proposed study can forecast rainfall in Bangladesh for any season.
3. Proposed Methodology
This proposed research aims to determine whether, by utilising machine and deep learning algorithms, a higher accuracy rate can be attained while also reducing error. The dataset includes information on Bangladesh’s monthly and yearly rainfall (1949 to 2013) index as well as information on the number of times a year that floods occur close to 35 stations: Khulna, Dinajpur, Bogra, Srimangal, Satkhira, Mymensingh, Jessore, Comilla, Cox’s Bazar, Faridpur, Barisal, Chittagong (IAP-Patenga), Maijdee Court, Dhaka, Rangpur, Sylhet, Rangamati, Ishurdi, Rajshahi, Chandpur, Hatiya, Bhola, Sandwip, Patuakhali, Feni, Khepupara, Madaripur, Kutubdia, Sitakunda, Teknaf, Tangail, Mongla, Chuadanga, Syedpur and Chittagong (City-Ambagan). Furthermore, the data were preprocessed using feature engineering, data normalisation, and feature encoding. After splitting the dataset into training and testing portions in an 80:20 ratio, applying the machine learning model
is essential. For comparison, it is necessary to use models like the k-nearest neighbour, support vector machine, decision tree regressor, random forest model, AdaBoostRegressor, Stacking Regressor, and artificial neural network. Finally, the model that is best at predicting floods can be identified based on the RMSE and R² scores of the models used. Figure 2 shows the workflow of the methodology in detail.
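The split and the two evaluation measures named above can be sketched in a few lines of plain Python. This is a hedged illustration rather than the study's code: the helper names, the random shuffling with a fixed seed, and the toy values are assumptions, and for strictly chronological data a time-ordered split may be preferable to shuffling.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them (80:20 by default)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical record identifiers standing in for dataset rows.
records = list(range(100))
train, test = train_test_split(records)
print(len(train), len(test))

# Made-up rainfall values (mm) and model predictions.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]
print(rmse(y_true, y_pred), r2_score(y_true, y_pred))
```

A lower RMSE and an R² closer to 1 both indicate a better fit, which is how the candidate regressors are ranked against one another.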
Figure 3 represents the complete architecture of the proposed work. The only goal of
this research is to use deep learning and supervised learning to achieve maximum accuracy.
The points in the validation set are used to determine the accuracy of the regressor following
learning with the training data.
3.1. Dataset Description
The data are acquired from Bangladesh’s Weather Department in Dhaka, the main authority for tracking and making predictions for every natural catastrophe to reduce mortality. To deliver accurate forecasts for the weather, we want to learn what we can deduce about previous times and how it connects to present-day climate change and the general trends of the planet’s weather by using this dataset. From 1948 to 2013, the dataset includes comprehensive monthly averages for Bangladesh that are area-specific for maximum temperature, minimum temperature, rainfall, relative humidity, wind speed, cloud cover, and bright sunshine. Also included are the weather station numbers, X and Y coordinates, latitude, longitude, and altitude. This research develops and evaluates the top eight deep learning and machine learning models using our dataset. Figure 4 shows the snapshot of the dataset. The annual rainfall and the maximum and minimum monthly temperatures for Bangladesh are shown in Figures 5 and 6, respectively, throughout the entire 65-year period. As can be seen, the maximum temperature in Bangladesh is in April, March or May, whereas the lowest temperature occurs in January or December.
Figure 5. Yearly and monthly rainfall in Bangladesh: (a) yearly rainfall in Bangladesh, (b) monthly rainfall in Bangladesh.
These figures show the bar plot, monthly precipitation, and the time series of weather
forecasting. According to the figure, the middle months experience the highest frequency
of rainfall during a 12-month period, which steadily declines afterwards. Projecting future
values using past data is known as time series forecasting. The graphs indicate that the
peak of rainfall progressively rises after a few years. The highest temperature in Bangladesh
is depicted in Figure 6. The bar plot shows the variation in the peak temperature over
time. The varying patterns of the bar indicate periods when the maximum temperature
was abnormally high or low. The right graph displays the variations in the minimum
temperature over time. The variation in the count plot indicates the seasonal variations in
the minimum temperature.
Figure 6. Bar diagram for monthly minimum and maximum temperature in Bangladesh: (a) maximum temperature, (b) minimum temperature.
of rainfall in specific locations and weather conditions.
Table 2. Features used for training machine learning models.
Sr. No. | Attribute | Description | Type | Measurement
1 | ‘Unnamed: 0’ | This is likely an index or identifier column for the dataset. | integer | serial no
2 | ‘Station Names’ | The name of the city or station where the flood occurred. | string | categorical
3 | ‘Year’ | This column represents the year for which the weather data are recorded. | integer | numerical
4 | ‘Month’ | The month of the recorded data. | integer | numerical
5 | ‘Max Temp 0C’ | The maximum recorded temperature of a day. | float | degrees Celsius
6 | ‘Min Temp 0C’ | The minimum temperature experienced on a specific day (degrees Celsius). | float | degrees Celsius
7 | ‘Rainfall (mm)’ | The amount of rainfall recorded (millimetres). | float | millimetres
8 | relative humidity | This column represents the relative humidity recorded for a specific month and year. | float | percentage
9 | ‘Wind Speed’ | Wind speed in a particular direction at a given location and time. | float | metres per second
10 | Cloud Coverage | This column represents the cloud coverage or cloudiness level recorded for a specific month and year. | float | percentage
11 | ‘Bright Sunshine’ | This column contains the duration of bright sunshine recorded for a specific month and year. It represents when the sun is visible, or the sky is clear. | float | measured in hours
12 | ‘Station Number’ | In meteorology and weather monitoring, a station number is a unique identification declared to an individual weather station or monitoring place. | integer | numerical identifier
Table 2. Cont.
Sr. No. | Attribute | Description | Type | Measurement
16 | LONGITUDE | The longitude coordinate of the station. | float | longitude
To perform automated regression evaluations, this work uses machine learning and deep learning techniques. The information from the meteorological department is used to forecast rainfall in Bangladesh. The collection includes several characteristics. However, the target variable for the forecast is the attribute called “Rainfall”. The data collection’s overall histogram illustration is shown in Figure 8. Following the data collection, several preprocessing procedures are carried out, including verifying values, handling, scaling, and transforming some characteristics, such as station names. This will permit the supervised learning regression algorithms to provide more accurate predictions. Figure 8 represents the bar plot for rainfall in Bangladesh’s cities.
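One common way to transform a categorical attribute such as the station name into a numeric feature is label encoding. The sketch below assumes this technique purely for illustration, since the text does not state which encoding was applied; the helper name `fit_label_encoder` and the sample values are hypothetical.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code (alphabetical order)."""
    return {name: idx for idx, name in enumerate(sorted(set(values)))}


# Made-up sample of the 'Station Names' column.
stations = ["Dhaka", "Sylhet", "Khulna", "Dhaka", "Sylhet"]

mapping = fit_label_encoder(stations)
encoded = [mapping[name] for name in stations]

print(mapping)
print(encoded)
```

Integer codes impose an artificial ordering on the stations, which tree-based models tolerate well; for distance-based models such as k-nearest neighbours, one-hot encoding is a frequently used alternative.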
\[ x_{\text{normalized}} = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{1} \]
where $x$ represents the original value in the dataset, $\min(x)$ is the minimum value of the dataset, $\max(x)$ is the maximum value of the dataset, and $x_{\text{normalized}}$ is the normalised value of $x$ within the range [0, 1]. Equation (1) scales the values of the entire dataset to the range [0, 1].
\[ x_{\text{scaled}} = \frac{x - \mu}{\sigma} \tag{2} \]
where $x$ = original value of the feature, $\mu$ = mean (average) of the feature in the dataset, $\sigma$ = standard deviation of the feature in the dataset, and $x_{\text{scaled}}$ = scaled value of the feature.
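Equations (1) and (2) can be checked on a small example. The following plain-Python sketch applies both transformations to a made-up rainfall column; the function names are hypothetical, and the use of the population standard deviation (dividing by n) in Equation (2) is an assumption, since the text does not say which variant was used.

```python
import math


def min_max_normalise(values):
    """Equation (1): rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def standardise(values):
    """Equation (2): subtract the mean and divide by the standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # population std
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up monthly rainfall values in millimetres.
rainfall_mm = [0.0, 50.0, 100.0, 150.0, 200.0]

normalised = min_max_normalise(rainfall_mm)
scaled = standardise(rainfall_mm)

print(normalised)
print(scaled)
```

Min-max normalisation bounds every feature to the same interval, whereas standardisation centres each feature at zero with unit variance; in either case the statistics would normally be computed on the training portion only and then reused for the test portion.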
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + e \tag{4} \]
where $y$ represents the target variable (dependent variable), $x_1, x_2, \ldots, x_n$ denote the predictor variables (independent variables), $\beta_0, \beta_1, \ldots, \beta_n$ are the coefficients to be estimated, and $e$ is the error term. Figure 11 shows the results using multiple linear regression actual and predicted rainfall minimum temperature in terms of R² and RMSE score.
Figure 11. Results illustrating R2 and RMSE values for various machine learning algorithms.
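As a minimal sketch of how the coefficients in Equation (4) can be estimated, the following plain-Python example solves the least-squares normal equations. The helper names, the two temperature features and the synthetic targets are all hypothetical; the study reports using scikit-learn rather than code of this form.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_linear_regression(features, targets):
    """Return [beta_0, beta_1, ..., beta_n] minimising the squared error."""
    # A leading constant 1 makes beta_0 act as the intercept.
    design = [[1.0] + list(row) for row in features]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Evaluate Equation (4) without the error term."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Hypothetical (minimum, maximum) temperature pairs; targets follow y = 5 + 2*x1 + 3*x2.
X = [[20.0, 30.0], [22.0, 31.0], [25.0, 35.0], [18.0, 29.0], [24.0, 32.0]]
y = [5.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in X]
beta = fit_multiple_linear_regression(X, y)
print([round(b, 6) for b in beta])
```

Because the synthetic targets contain no noise, the recovered coefficients should match the generating values up to floating-point error; with real rainfall data the error term e would be non-zero.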
3.6.7. AdaBoostRegressor
AdaBoostRegressor is a supervised machine learning approach used for regression
problems. It is an adaptation of the AdaBoost (Adaptive Boosting) technique, which
combines a number of weak learners (regression models) to produce a robust ensemble
model.
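The idea of combining weak learners can be sketched as follows. The text does not state which boosting variant is used, so the example assumes AdaBoost.R2 with a linear loss, which is the algorithm behind scikit-learn's AdaBoostRegressor. The single-feature decision stumps, the direct use of sample weights when fitting (rather than weighted resampling), the function names and the data are all choices made for illustration.

```python
import math


def fit_stump(xs, ys, weights):
    """Weak learner: a weighted one-split regression stump on a single feature."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [(y, w) for x, y, w in zip(xs, ys, weights) if x <= threshold]
        right = [(y, w) for x, y, w in zip(xs, ys, weights) if x > threshold]
        lw = sum(w for _, w in left)
        rw = sum(w for _, w in right)
        lmean = sum(y * w for y, w in left) / lw
        rmean = sum(y * w for y, w in right) / rw
        err = (sum(w * (y - lmean) ** 2 for y, w in left)
               + sum(w * (y - rmean) ** 2 for y, w in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean


def adaboost_r2(xs, ys, n_rounds=10):
    """Fit up to n_rounds stumps, re-weighting the samples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    learners, learner_weights = [], []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, weights)
        abs_err = [abs(y - stump(x)) for x, y in zip(xs, ys)]
        max_err = max(abs_err)
        if max_err == 0:
            # A perfect weak learner: keep it and stop.
            learners.append(stump)
            learner_weights.append(1.0)
            break
        losses = [e / max_err for e in abs_err]  # linear loss in [0, 1]
        avg_loss = sum(w * l for w, l in zip(weights, losses))
        if avg_loss >= 0.5:
            # The learner is too weak; keep it only if nothing else was fitted.
            if not learners:
                learners.append(stump)
                learner_weights.append(1.0)
            break
        beta = avg_loss / (1.0 - avg_loss)
        learners.append(stump)
        learner_weights.append(math.log(1.0 / beta))
        # Well-predicted samples (small loss) are down-weighted the most.
        weights = [w * beta ** (1.0 - l) for w, l in zip(weights, losses)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return learners, learner_weights


def adaboost_predict(learners, learner_weights, x):
    """Weighted median of the weak learners' predictions."""
    preds = sorted(zip((f(x) for f in learners), learner_weights))
    half = 0.5 * sum(learner_weights)
    running = 0.0
    for value, w in preds:
        running += w
        if running >= half:
            return value
    return preds[-1][0]


# Hypothetical data with three rainfall levels; one stump can capture only two of them.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0, 90.0, 95.0]
learners, learner_weights = adaboost_r2(xs, ys)
fitted = [adaboost_predict(learners, learner_weights, x) for x in xs]
print(len(learners), [round(p, 1) for p in fitted])
```

Each round concentrates the sample weights on the observations that the previous stump predicted poorly, which is what makes the boosting adaptive.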
Figure 12. Various types of deep learning algorithms.
3.7.1. Artificial Neural Network (ANN)
A computational model called an artificial neural network (ANN) is motivated by the
organisation and operation of the neural networks in the human brain. It is a particular kind
of ML approach that can consider input data while making predictions or choices. The input
layer, hidden layer(s), and output layer are the three primary types of layers in an ANN (as
shown in Figure 13). The weights indicate the connections between neurons and specify
the strength or significance of the information transmitted between them. During the
training phase, the ANN updates the weights of the connections based on the given input
data and the needed output. The error between the projected and actual output is often
transmitted backwards across the network to update the weights in a procedure known
as backpropagation. The ANN learns and develops its prediction or decision-making
skills thanks to this iterative approach [8,57,61]. The net input for the general artificial
neural network model mentioned below can be computed by using Equations (7) and (8),
as follows:
$$Y_{in} = x_1 \cdot w_1 + x_2 \cdot w_2 + x_3 \cdot w_3 + \dots + x_m \cdot w_m, \quad \text{i.e.,} \quad Y_{in} = \sum_{i=1}^{m} x_i \cdot w_i \tag{7}$$
Figure 13. Artificial neural network.
Applying the activation function to the net input allows for the output estimation.
$$Y = F(Y_{in}) \tag{8}$$
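Equations (7) and (8) amount to a weighted sum followed by an activation function. The sketch below computes both for a single neuron. ReLU is used as F because it is the activation listed in the study's parameter table; the input and weight values, and the function names, are hypothetical.

```python
def net_input(inputs, weights):
    """Equation (7): Y_in is the sum of x_i * w_i over all inputs."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def relu(value):
    """Rectified linear unit: negative net inputs are clipped to zero."""
    return max(0.0, value)


def neuron_output(inputs, weights, activation=relu):
    """Equation (8): Y = F(Y_in)."""
    return activation(net_input(inputs, weights))


# Hypothetical inputs and weights for one neuron.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, 0.1]
y_in = net_input(x, w)   # 0.2 - 0.3 + 0.2
y = neuron_output(x, w)
print(round(y_in, 6), round(y, 6))
```

A full network repeats this computation for every neuron in a layer and feeds the outputs of one layer to the next as inputs.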
where t = time step, x(t) = input at ‘t’, and h(t) = hidden state at ‘t’. W_xh, W_hh, and W_hy are
weight matrices that control the flow of information, and b_h and b_y are bias vectors.
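The roles of these symbols can be illustrated with one step of a simple recurrent cell. The recurrence equations themselves are not reproduced in this part of the text, so the sketch assumes the commonly used form h(t) = tanh(W_xh x(t) + W_hh h(t-1) + b_h) and y(t) = W_hy h(t) + b_y. The tanh activation, the layer sizes and all numerical values are assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step: update the hidden state, then compute the output."""
    pre = [a + b + c for a, b, c in
           zip(matvec(W_xh, x_t), matvec(W_hh, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]  # tanh is assumed here
    y_t = [a + b for a, b in zip(matvec(W_hy, h_t), b_y)]
    return h_t, y_t


# Toy sizes: 1 input feature, 2 hidden units, 1 output (all values hypothetical).
W_xh = [[0.5], [-0.5]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
W_hy = [[1.0, -1.0]]
b_h = [0.0, 0.0]
b_y = [0.0]

h = [0.0, 0.0]  # initial hidden state
outputs = []
for x_t in ([1.0], [2.0], [3.0]):
    h, y_t = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)
    outputs.append(y_t[0])
print([round(o, 4) for o in outputs])
```

The hidden state h is carried from one step to the next, which is how earlier inputs influence later outputs.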
Figure 14. Illustration of actual and predicted rainfall using multiple linear regression.
Parameter | Values
Framework | scikit-learn, TensorFlow
Training, validation, testing | 60%, 20%, 20%
Number of epochs | 30
Stopping criterion | Early stopping
Activation functions | ReLU
Optimiser | Adam
Validation criterion | 3-fold cross validation
• The various machine learning models used in this study have been implemented using
scikit-learn.
• For deep learning models, TensorFlow has been used.
• For optimising and fine-tuning the various hyperparameters, k-fold cross-validation has
been performed.
• In addition, the EarlyStopping callback of TensorFlow has been used.
• The models are trained for 30 epochs.
• The validation split is 20%.
• The ReLU activation function has been used along with the Adam optimiser.
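The split and stopping settings listed above can be sketched without any framework. The example below shows a 60/20/20 partition and the logic of early stopping over at most 30 epochs. The patience value, the unshuffled ordering of the split and the validation-loss curve are assumptions, since the text does not specify them; in TensorFlow the same stopping behaviour is provided by the EarlyStopping callback.

```python
def split_60_20_20(rows):
    """Partition rows into training, validation and testing parts (no shuffling assumed)."""
    n = len(rows)
    train_end = n * 6 // 10
    val_end = n * 8 // 10
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def train_with_early_stopping(val_losses, max_epochs=30, patience=3):
    """Return (stopped_epoch, best_epoch, best_loss) for per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs, or when max_epochs is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch, best_loss
    return min(len(val_losses), max_epochs), best_epoch, best_loss


train, val, test = split_60_20_20(list(range(10)))

# Hypothetical validation-loss curve: it improves for three epochs, then degrades.
curve = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stopped_epoch, best_epoch, best_loss = train_with_early_stopping(curve)
print(len(train), len(val), len(test), stopped_epoch, best_epoch, best_loss)
```

With a patience of 3, training halts at epoch 6 and the weights from epoch 3 would be the ones to keep; the later improvement at epoch 7 is never reached, which is the usual trade-off of early stopping.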
As mentioned in the table, a resampling method called 3-fold cross-validation is used
to assess machine learning models on a small sample of data. The goal is to evaluate
how effectively a model’s output will transfer to a different collection of data. Preventing
overfitting, a situation in which a model learns the training data too well—including its
noise and outliers—and hence performs badly on unknown data, is one of the main goals
of cross-validation [71].
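A minimal sketch of 3-fold cross-validation follows. Each sample is used for validation exactly once and for training in the other two folds. The contiguous, unshuffled folds, the mean-predictor model and the helper names are assumptions made to keep the example short.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_validate(xs, ys, fit, score, k=3):
    """Average the validation score over the k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model,
                            [xs[i] for i in val_idx],
                            [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the training mean."""
    return sum(ys) / len(ys)


def mse(model, xs, ys):
    """Mean squared error of the constant prediction `model`."""
    return sum((y - model) ** 2 for y in ys) / len(ys)


folds = list(k_fold_indices(7, k=3))
cv_score = cross_validate(list(range(6)), [2.0] * 6, fit_mean, mse, k=3)
print([val for _, val in folds], cv_score)
```

Averaging the score over folds gives an estimate of how the model performs on data it was not fitted on, which is what guards against the overfitting described above.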
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \tag{18}$$
where N is the number of data points and Σ represents the sum of squared differences
across all data points.
where Σ represents the sum of squared differences between the predicted and actual values
and y_mean is the mean of the actual values.
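Both evaluation measures can be computed in a few lines. The RMSE function follows Equation (18). The R2 function assumes the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares around y_mean, because the R2 equation itself is not reproduced in this part of the text. The rainfall values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Equation (18): root of the mean squared difference."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def r2_score(actual, predicted):
    """Coefficient of determination (standard definition, assumed here)."""
    y_mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - y_mean) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted rainfall values in mm.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(rmse(actual, predicted), round(r2_score(actual, predicted), 3))
```

RMSE is expressed in the units of the target, whereas R2 is unitless, so the two measures are usually reported together.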
5. Results
Figures 14–22 compare the actual rainfall with the predicted rainfall based on temperature.
The graphs are plotted for both training data and testing data and show the distribution
of actual and predicted values in each case. The best results are obtained using polynomial
regression and random forest, with testing R2 values of approximately 0.76 and 0.77,
respectively. The RMSE values are also low for these machine learning models. Also, the
graphs for these models (Figures 15 and 19) have similar distributions of actual and
predicted values during testing, whereas for other models, the distribution varies a lot
between actual and predicted values.
The results describe the actual vs. predicted results of the models. The precipitation
recorded or witnessed at particular locations in Bangladesh over a specified period is
called actual rainfall. It is based on information from satellite imaging, weather and
precipitation monitoring stations, and other reliable data sources.
The ground truth, or actual rainfall, is used to compare predictions and is essential for
evaluating the model’s accuracy. The predicted rainfall is the estimate or forecast of future
rainfall produced using a hydrological model, weather forecasting system, or predictive
model. These forecasts are often based on historical weather data, atmospheric
conditions, and mathematical models that simulate precipitation patterns. In this work,
the variation in rainfall based on the feature minimum or maximum temperature is measured
using multiple linear regression, polynomial regression, decision tree regression,
k-nearest neighbours, support vector machine, random forest, AdaBoostRegressor, Stacking
Regressor, and artificial neural network. Table 5 shows the results of model implementation.
Figure 15. Illustration of actual and predicted rainfall using polynomial regression.
Figure 16. Illustration of actual and predicted rainfall using decision tree.
Figure 17. Illustration of actual and predicted rainfall using k-nearest neighbour.
Figure 18. Illustration of actual and predicted rainfall using support vector machine.
Figure 19. Illustration of actual and predicted rainfall using random forest.
Figure 21. Illustration of actual and predicted rainfall using the Stacking Regressor model.
Machine and Deep Learning Model | Evaluation Metrics R2 and RMSE
S.No. | ML Model | R2 Score Training | R2 Score Testing | RMSE Score Training | RMSE Testing
1. | Multiple linear regression | 0.6643 | 0.6687 | 118.3231 | 118.279217
2. | Polynomial regression | 0.773177 | 0.7642164 | 99.12397 | 99.844
3. | Decision tree model | 0.75 | 0.72 | 101.195 | 123.27715
4. | k-nearest neighbours | 0.9992 | 0.74723 | 5.5840 | 103.31968
5. | Support vector machine | 0.654139 | 0.6583 | 120.108182 | 120.12110
6. | Random forest | 0.96417 | 0.768234 | 38.656 | 99.5790
7. | AdaBoostRegressor | 0.7047 | 0.710915 | 110.9689 | 110.49437
8. | Stacking Regressor | 0.74631 | 0.738501 | 102.88535 | 106.1608
9. | Artificial neural network | 0.763247 | 0.75847 | 100.911 | 100.77041
Table 6 shows the results obtained via LSTM and RNN. Various statistics, such as loss,
validation loss, RMSE and testing, are also shown. It can be seen that the loss values for
LSTM are significantly better than RNN.
Table 6. Results obtained using various deep learning models (LSTM and RNN).
S. No | Model | Architecture | Parameters | Value
1. | LSTM. In order to address and overcome the shortcomings of conventional RNNs, the LSTM approach was specifically developed for learning long-term dependencies. | Refer to Figure 23 | Loss | 0.0904
 | | | RMSE | 0.3007
 | | | Val_loss | 0.0906
 | | | Testing set loss | 93,260.7188
2. | RNN. The main feature of an RNN is its ability to maintain a hidden state or memory, which is revised at each time step and passed as input to the next, allowing the network to consider previous information while processing the current input. | Refer to Figure 24 | Loss | 126.5478
 | | | mean_absolute_error | 126.5478
 | | | Val_loss | 124.1010
 | | | Val_mean_absolute_error | 124.1010
Before concluding the discussion, let us explore the results in depth. Table 5 shows the
results of the various implemented machine learning models. Their R2 and RMSE values are
shown for both training and testing. The primary concern is the values of these metrics
during testing. A higher R2 value indicates a better-fitting model. Polynomial regression
and random forest provide the highest testing R2 values (0.764 and 0.768, respectively).
Similarly, a lower error value (RMSE) also shows that a model is performing well; these two
models also have the lowest testing RMSE values (99.844 and 99.579, respectively). The
other models, such as multiple linear regression, decision tree, k-nearest neighbour and
support vector machine, have testing R2 values of less than 0.76. Figures 14–22 show the
actual and predicted rainfall with respect to temperature during both training and testing.
It can be seen that the plot for polynomial regression is similar for training and testing,
whereas for other models, the plot differs significantly between training and testing.
Hence, we can say that polynomial regression provides a better modelling of rainfall.
Table 6 shows the results obtained via deep learning models. The loss and RMSE values for
LSTM are significantly better than those obtained for RNN because LSTM captures long-term
dependencies, whereas RNN suffers from vanishing and exploding gradient problems.
Therefore, LSTM has produced better results.
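The comparison of training and testing scores described above can be checked directly from the Table 5 values. The sketch below ranks the models by testing R2 and computes the gap between training and testing R2 as a simple indicator of overfitting. The numbers are transcribed from Table 5; the gap measure and the variable names are illustrative choices, not part of the study.

```python
# (model, R2 training, R2 testing, RMSE training, RMSE testing), transcribed from Table 5.
TABLE_5 = [
    ("Multiple linear regression", 0.6643, 0.6687, 118.3231, 118.279217),
    ("Polynomial regression", 0.773177, 0.7642164, 99.12397, 99.844),
    ("Decision tree", 0.75, 0.72, 101.195, 123.27715),
    ("k-nearest neighbours", 0.9992, 0.74723, 5.5840, 103.31968),
    ("Support vector machine", 0.654139, 0.6583, 120.108182, 120.12110),
    ("Random forest", 0.96417, 0.768234, 38.656, 99.5790),
    ("AdaBoostRegressor", 0.7047, 0.710915, 110.9689, 110.49437),
    ("Stacking Regressor", 0.74631, 0.738501, 102.88535, 106.1608),
    ("Artificial neural network", 0.763247, 0.75847, 100.911, 100.77041),
]


def generalisation_gap(row):
    """Training R2 minus testing R2; a large positive gap suggests overfitting."""
    _, r2_train, r2_test, _, _ = row
    return r2_train - r2_test


by_test_r2 = sorted(TABLE_5, key=lambda row: row[2], reverse=True)
by_gap = sorted(TABLE_5, key=generalisation_gap, reverse=True)

for name, r2_train, r2_test, _, _ in by_test_r2:
    print(f"{name:28s} test R2 = {r2_test:.4f}   gap = {r2_train - r2_test:+.4f}")
```

A model whose gap is close to zero, as is the case for polynomial regression, behaves similarly on training and testing data, which is the property the discussion relies on.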
There are certain limitations in this work. The dataset was limited, and we have only
tested a few deep learning models. More work can be carried out to employ pre-trained
models and transfer learning to improve performance. We have only evaluated this model
for specific parameters. Other parameters can also be considered for extensive evaluations.
concentrate on creating a complete end-to-end early warning system that combines predic-
tion with communication channels to notify locals in flood-prone areas, building on the
predictive skills.
Author Contributions: Conceptualisation, A.R., H.F., N.I. and D.S.; methodology, M.A.E., A.S., M.A.
(Muhammad Akram) and M.A. (Mesfer Alrizq); software, A.R., H.F., N.I. and D.S.; validation, M.A.E.,
A.S., M.A. (Muhammad Akram) and M.A. (Mesfer Alrizq); formal analysis, A.R., H.F., N.I. and D.S.;
investigation, M.A.E., A.S., M.A. (Muhammad Akram) and M.A. (Mesfer Alrizq); resources, H.F.
and N.I.; data curation, M.A.E. and M.A. (Mesfer Alrizq); writing—original draft preparation, A.R.,
H.F., N.I. and D.S.; writing—review and editing, M.A.E., A.S., M.A. (Muhammad Akram) and M.A.
(Mesfer Alrizq); visualisation, M.A. (Muhammad Akram); supervision, A.S.; project administration,
A.R. and N.I.; funding acquisition, M.A.E. All authors have read and agreed to the published version
of the manuscript.
Funding: The authors would like to acknowledge the support of the Deputy for Research and
Innovation, Ministry of Education, Kingdom of Saudi Arabia, for funding this research through a
grant (NU/IFC/2/SERC/-/48) under the Institutional Funding Committee at Najran University,
Kingdom of Saudi Arabia.
Data Availability Statement: The data are available in a publicly accessible repository on Kaggle and
can be found at the following link:
https://www.kaggle.com/datasets/emonreza/65-years-of-weather-data-bangladesh-preprocessed,
accessed on 20 October 2023.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Syeed, M.M.A.; Farzana, M.; Namir, I.; Ishrar, I.; Nushra, M.H.; Rahman, T. Flood prediction using machine learning models.
In Proceedings of the 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications
(HORA), Ankara, Turkey, 9–11 June 2022; IEEE: New York, NY, USA, 2022.
2. Kumar, V.; Azamathulla, H.M.; Sharma, K.V.; Mehta, D.J.; Maharaj, K.T. The state of the art in deep learning applications,
challenges, and future prospects: A comprehensive review of flood forecasting and management. Sustainability 2023, 15, 10543.
[CrossRef]
3. Gude, V.; Corns, S.; Long, S. Flood prediction and uncertainty estimation using deep learning. Water 2020, 12, 884. [CrossRef]
4. Nguyen, D.T.; Chen, S.-T. Real-time probabilistic flood forecasting using multiple machine learning methods. Water 2020, 12, 787.
[CrossRef]
5. Furquim, G.; Pessin, G.; Faiçal, B.S.; Mendiondo, E.M.; Ueyama, J. Improving the accuracy of a flood forecasting model by means
of machine learning and chaos theory: A case study involving a real wireless sensor network deployment in brazil. Neural
Comput. Appl. 2016, 27, 1129–1141. [CrossRef]
6. Talukdar, S.; Ghose, B.; Shahfahad; Salam, R.; Mahato, S.; Pham, Q.B.; Linh, N.T.T.; Costache, R.; Avand, M. Flood susceptibility
modeling in Teesta River basin, Bangladesh using novel ensembles of bagging algorithms. Stoch. Environ. Res. Risk Assess. 2020,
34, 2277–2300. [CrossRef]
7. Maspo, N.-A.; Bin Harun, A.N.; Goto, M.; Cheros, F.; Haron, N.A.; Nawi, M.N.M. Evaluation of Machine Learning approach in
flood prediction scenarios and its input parameters: A systematic review. In IOP Conference Series: Earth and Environmental Science;
IOP Publishing: Bristol, UK, 2020.
8. Mitra, P.; Ray, R.; Chatterjee, R.; Basu, R.; Saha, P.; Raha, S.; Barman, R.; Patra, S.; Biswas, S.S.; Saha, S. Flood forecasting
using Internet of things and artificial neural networks. In Proceedings of the 2016 IEEE 7th Annual Information Technology,
Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 13–15 October 2016; IEEE: New York,
NY, USA, 2016.
9. Noymanee, J.; Nikitin, N.O.; Kalyuzhnaya, A.V. Urban pluvial flood forecasting using open data with machine learning techniques
in pattani basin. Procedia Comput. Sci. 2017, 119, 288–297. [CrossRef]
10. Aswad, F.M.; Kareem, A.N.; Khudhur, A.M.; Khalaf, B.A.; Mostafa, S.A. Tree-based machine learning algorithms in the Internet of
Things environment for multivariate flood status prediction. J. Intell. Syst. 2021, 31, 1–14. [CrossRef]
11. Sankaranarayanan, S.; Prabhakar, M.; Satish, S.; Jain, P.; Ramprasad, A.; Krishnan, A. Flood prediction based on weather
parameters using deep learning. J. Water Clim. Change 2020, 11, 1766–1783. [CrossRef]
12. Wang, G.; Yang, J.; Hu, Y.; Li, J.; Yin, Z. Application of a novel artificial neural network model in flood forecasting. Environ. Monit.
Assess. 2022, 194, 125. [CrossRef]
13. Puttinaovarat, S.; Horkaew, P. Flood forecasting system based on integrated big and crowdsource data by using machine learning
techniques. IEEE Access 2020, 8, 5885–5905. [CrossRef]
14. Ria, N.J.; Ani, J.F.; Islam, M.; Masum, A.K.M. Standardization Of Rainfall Prediction In Bangladesh Using Machine Learning
Approach. In Proceedings of the 2021 12th International Conference on Computing Communication and Networking Technologies
(ICCCNT), Kharagpur, India, 6–8 July 2021; IEEE: New York, NY, USA, 2021.
15. Osmani, S.A.; Kim, J.-S.; Jun, C.; Sumon, W.; Baik, J.; Lee, J. Prediction of monthly dry days with machine learning algorithms: A
case study in Northern Bangladesh. Sci. Rep. 2022, 12, 19717. [CrossRef] [PubMed]
16. Manandhar, A.; Fischer, A.; Bradley, D.J.; Salehin, M.; Islam, M.S.; Hope, R.; Clifton, D.A. Machine learning to evaluate impacts of
flood protection in Bangladesh, 1983–2014. Water 2020, 12, 483. [CrossRef]
17. Aydin, M.C.; Sevgi Birincioğlu, E. Flood risk analysis using gis-based analytical hierarchy process: A case study of Bitlis Province.
Appl. Water Sci. 2022, 12, 122. [CrossRef]
18. Msabi, M.M.; Makonyo, M. Flood susceptibility mapping using GIS and multi-criteria decision analysis: A case of Dodoma
region, central Tanzania. Remote Sens. Appl. Soc. Environ. 2021, 21, 100445. [CrossRef]
19. Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of
machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11. [CrossRef]
20. Elmagzoub, M.; Syed, D.; Shaikh, A.; Islam, N.; Alghamdi, A.; Rizwan, S. A survey of swarm intelligence based load balancing
techniques in cloud computing environment. Electronics 2021, 10, 2718. [CrossRef]
21. Al Reshan, M.S.; Syed, D.; Islam, N.; Shaikh, A.; Hamdi, M.; Elmagzoub, M.A.; Muhammad, G.; Talpur, K.H. A Fast Converging
and Globally Optimized Approach for Load Balancing in Cloud Computing. IEEE Access 2023, 11, 11390–11404. [CrossRef]
22. Islam, N.; Raza, E.; Mohsin, S.; Ansari, A.; Shuja, R.; Syed, D. Forecasting on COVID-19 Data Using ARIMAX Model. In Data
Science with Semantic Technologies; CRC Press: Boca Raton, FL, USA, 2023; pp. 95–113.
23. Islam, N.; Khan, S.K.; Rehman, A.; Aftab, U.; Syed, D. Stock Prediction for ARGAAM Companies Dataset. KIET J. Comput. Inf. Sci.
2023, 6, 1–13. [CrossRef]
24. Bui, D.T.; Pradhan, B.; Nampak, H.; Bui, Q.-T.; Tran, Q.-A.; Nguyen, Q.-P. Hybrid artificial intelligence approach based on neural
fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone
area using GIS. J. Hydrol. 2016, 540, 317–330.
25. Chatterjee, S.; Datta, B.; Sen, S.; Dey, N.; Debnath, N.C. Rainfall prediction using hybrid neural network approach. In Proceed-
ings of the 2018 2nd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing
(SigTelCom), Ho Chi Minh, Vietnam, 29–31 January 2018; IEEE: New York, NY, USA, 2018.
26. Islam, M.N.; van Amstel, A.; Ghosh, B.K.; Sarker, K.R. Climate Change and Living with Floods: An Empirical Case from the
Saghata Union of Gaibandha District, Bangladesh. In Bangladesh II: Climate Change Impacts, Mitigation and Adaptation in Developing
Countries; Springer: Cham, Switzerland, 2021; pp. 459–478.
27. Luo, T.; Maddocks, A.; Iceland, C.; Ward, P.; Winsemius, H. World’s 15 Countries with the Most People Exposed to River Floods; World
Resources Institute: Washington, DC, USA, 2015.
28. Kumari, S.; Tripathy, K.K.; Kumbhar, V. Data Science and Analytics; Emerald Publishing Limited: Bingley, UK, 2020.
29. Thirumalai, C.; Harsha, K.S.; Deepak, M.L.; Krishna, K.C. Heuristic prediction of rainfall using machine learning techniques. In
Proceedings of the 2017 International Conference on Trends in Electronics and Informatics (ICEI), Tirunelveli, India, 11–12 May
2017; IEEE: New York, NY, USA, 2017.
30. Adnan, R.; Zain, Z.M.; Ruslan, F.A. 5 hours flood prediction modeling using improved NNARX structure: Case study Kuala
Lumpur. In Proceedings of the 2014 IEEE 4th International Conference on System Engineering and Technology (ICSET), Bandung,
Indonesia, 24–25 November 2014; IEEE: New York, NY, USA, 2014.
31. Mosavi, A.; Ozturk, P.; Chau, K.-W. Flood prediction using machine learning models: Literature review. Water 2018, 10, 1536.
[CrossRef]
32. Chen, C.; Jiang, J.; Liao, Z.; Zhou, Y.; Wang, H.; Pei, Q. A short-term flood prediction based on spatial deep learning network: A
case study for Xi County, China. J. Hydrol. 2022, 607, 127535. [CrossRef]
33. Motta, M.; de Castro Neto, M.; Sarmento, P. A mixed approach for urban flood prediction using Machine Learning and GIS. Int. J.
Disaster Risk Reduct. 2021, 56, 102154. [CrossRef]
34. Ghorpade, P.; Gadge, A.; Lende, A.; Chordiya, H.; Gosavi, G.; Mishra, A.; Hooli, B.; Ingle, Y.S.; Shaikh, N. Flood forecasting using
machine learning: A review. In Proceedings of the 2021 8th International Conference on Smart Computing and Communications
(ICSCC), Kerala, India, 1–3 July 2021; IEEE: New York, NY, USA, 2021.
35. Adnan, M.S.G.; Siam, Z.S.; Kabir, I.; Kabir, Z.; Ahmed, M.R.; Hassan, Q.K.; Rahman, R.M.; Dewan, A. A novel framework
for addressing uncertainties in machine learning-based geospatial approaches for flood prediction. J. Environ. Manag. 2023,
326, 116813. [CrossRef] [PubMed]
36. Gauhar, N.; Das, S.; Moury, K.S. Prediction of flood in Bangladesh using K-nearest neighbors algorithm. In Proceedings of the
2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 5–7
January 2021; IEEE: New York, NY, USA, 2021.
37. Han, S.; Coulibaly, P. Bayesian flood forecasting methods: A review. J. Hydrol. 2017, 551, 340–351. [CrossRef]
38. Hamidul Haque, M.; Sadia, M.; Mustaq, M. Development of Flood Forecasting System for Someshwari-Kangsa Sub-watershed of
Bangladesh-India Using Different Machine Learning Techniques. EGU General Assembly Conference Abstracts; EGU: Virtual, 2021.
Available online: https://ui.adsabs.harvard.edu/abs/2021EGUGA..2315294H/abstract (accessed on 20 October 2023).
39. Billah, M.; Adnan, N.; Akhond, M.R.; Ema, R.R.; Hossain, A.; Galib, S.M. Rainfall prediction system for Bangladesh using long
short-term memory. Open Comput. Sci. 2022, 12, 323–331. [CrossRef]
40. Yaseen, M.W.; Awais, M.; Riaz, K.; Rasheed, M.B.; Waqar, M.; Rasheed, S. Artificial Intelligence Based Flood Forecasting for River
Hunza at Danyor Station in Pakistan. Arch. Hydro-Eng. Environ. Mech. 2022, 69, 59–77. [CrossRef]
41. Parmar, A.; Mistree, K.; Sompura, M. Machine learning techniques for rainfall prediction: A review. In Proceedings of the
International Conference on Innovations in Information Embedded and Communication Systems, Coimbatore, India, 17–18
March 2017.
42. Khosravi, K.; Panahi, M.; Golkarian, A.; Keesstra, S.D.; Saco, P.M.; Bui, D.T.; Lee, S. Convolutional neural network approach for
spatial prediction of flood hazard at national scale of Iran. J. Hydrol. 2020, 591, 125552. [CrossRef]
43. Kovalchuk, S.V.; Krikunov, A.V.; Knyazkov, K.V.; Boukhanovsky, A.V. Classification issues within ensemble-based simulation:
Application to surge floods forecasting. Stoch. Environ. Res. Risk Assess. 2017, 31, 1183–1197. [CrossRef]
44. Nevo, S.; Morin, E.; Rosenthal, A.G.; Metzger, A.; Barshai, C.; Weitzner, D.; Voloshin, D.; Kratzert, F.; Elidan, G.; Dror, G.; et al.
Flood forecasting with machine learning models in an operational framework. arXiv 2021, arXiv:2111.02780. [CrossRef]
45. Khosravi, K.; Shahabi, H.; Pham, B.T.; Adamowski, J.; Shirzadi, A.; Pradhan, B.; Dou, J.; Ly, H.-B.; Gróf, G.; Ho, H.L.; et al. A
comparative assessment of flood susceptibility modeling using multi-criteria decision-making analysis and machine learning
methods. J. Hydrol. 2019, 573, 311–323. [CrossRef]
46. El-Magd, S.A.A.; Pradhan, B.; Alamri, A. Machine learning algorithm for flash flood prediction mapping in Wadi El-Laqeita and
surroundings, Central Eastern Desert, Egypt. Arab. J. Geosci. 2021, 14, 323. [CrossRef]
47. Nayak, M.; Das, S.; Senapati, M.R. Improving Flood Prediction with Deep Learning Methods. J. Inst. Eng. Ser. B 2022, 103,
1189–1205. [CrossRef]
48. Tayfur, G.; Singh, V.P.; Moramarco, T.; Barbetta, S. Flood hydrograph prediction using machine learning methods. Water 2018,
10, 968. [CrossRef]
49. Sahoo, A.; Samantaray, S.; Ghose, D.K. Prediction of flood in Barak River using hybrid machine learning approaches: A case
study. J. Geol. Soc. India 2021, 97, 186–198. [CrossRef]
50. Qian, K.; Mohamed, A.; Claudel, C. Physics informed data driven model for flood prediction: Application of deep learning in
prediction of urban flood development. arXiv 2019, arXiv:1908.10312.
51. Miau, S.; Hung, W.-H. River flooding forecasting and anomaly detection based on deep learning. IEEE Access 2020, 8,
198384–198402. [CrossRef]
52. Hossain, I.; Rasel, H.M.; Alam Imteaz, M.; Mekanik, F. Long-term seasonal rainfall forecasting using linear and non-linear
modelling approaches: A case study for Western Australia. Meteorol. Atmos. Phys. 2020, 132, 131–141. [CrossRef]
53. Ighile, E.H.; Shirakawa, H.; Tanikawa, H. Application of GIS and machine learning to predict flood areas in Nigeria. Sustainability
2022, 14, 5039. [CrossRef]
54. Kunverji, K.; Shah, K.; Shah, N. A flood prediction system developed using various machine learning algorithms. In Proceedings
of the 4th International Conference on Advances in Science & Technology (ICAST2021), Mumbai, India, 7 May 2021.
55. Dodangeh, E.; Choubin, B.; Eigdir, A.N.; Nabipour, N.; Panahi, M.; Shamshirband, S.; Mosavi, A. Integrated machine learning
methods with resampling algorithms for flood susceptibility prediction. Sci. Total Environ. 2020, 705, 135983. [CrossRef]
56. Khairudin, N.M.; Mustapha, N.O.; Aris, T.N.; Zolkepli, M.A. A study to investigate the effect of different time-series scales
towards flood forecasting using machine learning. J. Theor. Appl. Inform. Technol. 2021, 99, 5687–5699.
57. Dtissibe, F.Y.; Ari, A.A.A.; Titouna, C.; Thiare, O.; Gueroui, A.M. Flood forecasting based on an artificial neural network scheme.
Nat. Hazards 2020, 104, 1211–1237. [CrossRef]
58. Sarasa-Cabezuelo, A. Prediction of rainfall in Australia using machine learning. Information 2022, 13, 163. [CrossRef]
59. Liyew, C.M.; Melese, H.A. Machine learning techniques to predict daily rainfall amount. J. Big Data 2021, 8, 153. [CrossRef]
60. Singh, P. Indian summer monsoon rainfall (ISMR) forecasting using time series data: A fuzzy-entropy-neuro based expert system.
Geosci. Front. 2018, 9, 1243–1257. [CrossRef]
61. Mishra, N.; Soni, H.K.; Sharma, S.; Upadhyay, A.K. Development and analysis of artificial neural network models for rainfall
prediction by using time-series data. Int. J. Intell. Syst. Appl. 2018, 12, 16. [CrossRef]
62. Chitwatkulsiri, D.; Miyamoto, H. Real-Time Urban Flood Forecasting Systems for Southeast Asia—A Review of Present Modelling
and Its Future Prospects. Water 2023, 15, 178. [CrossRef]
63. Kumar, V.; Sharma, K.V.; Caloiero, T.; Mehta, D.J.; Singh, K. Comprehensive overview of flood modeling approaches: A review of
recent advances. Hydrology 2023, 10, 141. [CrossRef]
64. Mosaffa, H.; Sadeghi, M.; Mallakpour, I.; Jahromi, M.N.; Pourghasemi, H.R. Application of Machine Learning Algorithms in
Hydrology. In Computers in Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2022; pp. 585–591.
65. Jehanzaib, M.; Ajmal, M.; Achite, M.; Kim, T.-W. Comprehensive review: Advancements in rainfall-runoff modelling for flood
mitigation. Climate 2022, 10, 147. [CrossRef]
66. Mistry, S.; Parekh, F. Flood Forecasting Using Artificial Neural Network. In IOP Conference Series: Earth and Environmental Science;
IOP Publishing: Bristol, UK, 2022.
67. Xu, Y.; Hu, C.; Wu, Q.; Jian, S.; Li, Z.; Chen, Y.; Zhang, G.; Zhang, Z.; Wang, S. Research on particle swarm optimization in LSTM
neural networks for rainfall-runoff simulation. J. Hydrol. 2022, 608, 127553. [CrossRef]
68. Cho, M.; Kim, C.; Jung, K.; Jung, H. Water level prediction model applying a long short-term memory (lstm)–gated recurrent unit
(gru) method for flood prediction. Water 2022, 14, 2221. [CrossRef]
69. Qadeer, K.; Rehman, W.U.; Sheri, A.M.; Park, I.; Kim, H.K.; Jeon, M. A long short-term memory (LSTM) network for hourly
estimation of PM2.5 concentration in two cities of South Korea. Appl. Sci. 2020, 10, 3984. [CrossRef]
70. Available online: https://www.kaggle.com/datasets/emonreza/65-years-of-weather-data-bangladesh-preprocessed (accessed
on 20 October 2023).
71. Wong, T.-T.; Yeh, P.-Y. Reliable accuracy estimates from k-fold cross validation. IEEE Trans. Knowl. Data Eng. 2019, 32, 1586–1594.
[CrossRef]
72. Rahman, M.; Chen, N.; Elbeltagi, A.; Islam, M.M.; Alam, M.; Pourghasemi, H.R.; Tao, W.; Zhang, J.; Shufeng, T.; Faiz, H.; et al.
Application of stacking hybrid machine learning algorithms in delineating multi-type flooding in Bangladesh. J. Environ. Manag.
2021, 295, 113086. [CrossRef] [PubMed]