
RAINFALL PREDICTION

Abstract

Rain forecasting is one of the most challenging and uncertain tasks, and it has a profound effect on human society. Timely and accurate forecasts can help reduce costs and financial losses. This study presents a set of experiments involving the use of conventional machine learning techniques to create models that predict whether it will rain tomorrow, based on the weather data recorded for that particular day in major Australian cities. This comparative study was conducted focusing on three aspects: modeling inputs, modeling methods, and pre-screening techniques. The results provide a comparison of the various testing metrics for these machine learning strategies and their reliability in predicting rainfall from weather data. Rainfall has become a major source of concern in recent weeks. For the past ten years, weather patterns have been shifting, and as a result precipitation patterns have drastically changed. Temperature, humidity, wind speed, pressure, and precipitation are all characteristics that have an impact on rainfall; they are the fundamental determinants of precipitation. It is critical to examine rainfall patterns in relation to the factors that influence them, since we cannot forecast rainfall correctly until we do. Machine learning simplifies this task: a variety of supervised and unsupervised methods are available, and machine learning algorithms for rainfall prediction may employ regression.
Introduction
Rain forecasting remains an important problem, attracting the attention of governments, businesses, risk-management groups, and scientists. Rainfall is a climatic element that has an impact on a variety of human activities, including agriculture, construction, power generation, forestry, and tourism, to name a few. Rainfall predictions are critical because changes in rainfall are linked to natural calamities including landslides, floods, mass displacement, and avalanches, events that have a long-term impact on communities. As a result, an accurate rain forecast system makes it possible to take preventative and mitigation steps against these natural occurrences. We employed machine learning techniques and models to provide accurate and timely predictions and resolve this uncertainty. The goal of this study is to present a complete machine learning life cycle, from pre-processing to model application and testing. Missing-value handling, feature conversion, encoding of categorical features, feature ranking, and feature selection are some of the steps in the pre-processing stage. Logistic Regression, Decision Tree, K-Nearest Neighbors, Rule-Based, and Ensemble models are some of the models we created. Accuracy, Precision, Recall, F-Score, and Area Under the Curve were employed as test measures for this study. We used weather data gathered at several Australian weather stations to train our classifiers. In this project, I will utilise the Rain Dataset to build a prediction model that estimates whether it will rain in Australia tomorrow.
Rainfall maps provide essential information about the intensity, timing, and spatial position of rainfall, which is essential in water resource management. Historical rainfall map data can help different sectors, such as agriculture, make informed decisions about water supply management strategies that better exploit the occurrence of precipitation events. Historical data can be used most effectively by developing prediction models, such as machine learning models, that capture historical rainfall patterns. In previous literature, prediction models based on rainfall maps may be grouped into two main categories. The first type applies deep learning to a sequence of images as an input to predict future frames; generally, the images used for this type of prediction are separated by fairly small time intervals. The second type consists of single-output regression or classification-based models. These models predict rainfall on an hourly, daily, or yearly basis using previous rainfall maps. Regression-based models may use a single frame or a batch of frames as an input to give a numerical rainfall prediction. In contrast, classification-based models classify rainfall into two or more separate classes and predict the classes of future precipitation events based on a single frame or a batch of frames. Both regression and classification models can be used to predict entire images one pixel at a time. For purposes of comparison, the authors of one study used a single image to produce a binary classification (rain/no-rain) three days ahead in Thailand. Using an Inception-V3-based CNN, the authors achieved up to 54.84% classification accuracy for the three-days-ahead prediction. The study also concluded that including neighbouring countries in the images increases the effectiveness of the model compared to cropping the image to concentrate only on Thailand. Even though most previous studies used deep-learning-based methods, such as Convolutional LSTM, to capture rainfall patterns in historical images, in some cases the deep learning approach has significant downsides: in particular, it frequently overfits when the training set is fairly small, and those models have numerous hyperparameters that need to be optimized.
Background/Related work
A machine learning method such as linear regression can be used to forecast rain from crucial meteorological information by detecting the relationship between rainfall variability and rain. According to a related study, solar radiation, precipitable water vapor, and diurnal cycles are critical characteristics for forecasting daily rainfall with a data-driven machine learning algorithm. Future studies, according to Manandhar et al., should look at the impact of various geographical characteristics on large-scale data collection. That study examines the relationship between independent and dependent components to see what influences rainfall and rainfall patterns; however, the amount of daily rainfall was not addressed, which might have an impact on the system's effectiveness. Harun et al. performed a comparative study of statistical models and regression-based methods (SVM, RF, and DT) for predicting rainfall using natural features. According to the results of the study, the regression-based rainfall prediction methods were more effective than statistical modeling, and test results showed that the RF model predicted more accurately than SVM and DT. Rainfall forecasts therefore show higher performance with machine learning models than with standard models. That study used different machine learning strategies rather than mathematical methods to predict daily rainfall values. A study by Arnav Garg and Kanchipuram evaluated three machine learning algorithms, support vector machines (SVM), support vector regression (SVR), and K-nearest neighbors (KNN), using annual rainfall patterns.

The SVM algorithm works very well among the three machine learning algorithms, but that study did not show which environmental factors affect rainfall intensity. This paper identifies natural features that have a positive or negative effect on rain and predicts the amount of daily rainfall using those factors. Experts, for example, confirmed that a sequential regression learning algorithm better predicts rainfall from climate variables such as temperature, humidity, and wind speed, and showed that the performance of rain prediction improves when deep learning models are used. According to Sarker, a comparison of performance between deep learning and other machine learning algorithms is shown in Fig. 1, where the performance of the deep learning model increases as the size of the data is increased. Due to the large amount of data used in this study, machine learning methods are appropriate. Researchers have also studied deep learning algorithms for predicting rainfall using different climate-dependent variables. To provide an accurate forecast of rainfall, prediction models are developed and tested using machine learning techniques. Many researchers did not produce daily rainfall predictions but instead researched whether rainfall would occur at all, and predicting the annual average of daily rainfall remains a challenging task; not all of the environmental factors that matter in rainfall forecasting are used. This paper examined machine learning algorithms using data collected from a single meteorological station, selected appropriate natural characteristics positively or negatively associated with rainfall, and assessed the daily-rainfall performance of the machine learning algorithms using MAE and RMSE.
Social Impact

We previously utilized PWV and its second derivative to construct a model for the tropical region that predicted a rain event with a lead time of 5 minutes based on data collected 30 minutes earlier [35]. Seasons were used to split the model into three pieces (NE-, SW-, and Inter-monsoons). We use the same rain prediction scenario in this paper, in which (1) we divide our feature database into 30-minute segments, (2) we check whether a rain event occurs after a lead time of 5 minutes for each 30-minute segment, and (3) all rainfall within a 30-minute window or less is considered a single rain event [40]. In New South Wales, Australia, one of the worst bushfires has occurred, and drought and lack of water have always been issues. A machine learning model with an acceptable degree of forecast accuracy might assist in ensuring that enough resources are available for rainwater gathering.
Scope

This research aims to investigate precipitation forecasting on a dataset from the National Center for Environmental Prediction (NCEP) using SVMs. Our investigation has three aspects: i) determine the effect of image sequence input length on class prediction accuracy, ii) assess the effect of image size on class prediction accuracy, and iii) compare the accuracy of rainfall class predictions for three selected locations from a 5 × 5 grid covering the map area, for up to 30 days ahead. This paper presents the methodology, including a discussion of the datasets and their preparation as well as the SVM specification and training. It also presents the results in tabular and graphical form, provides analysis and discussion, and summarizes the conclusions.

Tasks and Methods

Data Source

The dataset is taken from Kaggle and contains about 10 years of daily weather observations from many locations across Australia.

Dataset Description:

Number of columns: 23
Number of rows: 145460
Number of Independent Columns: 22
Number of Dependent Columns: 1

Preprocessing

Real-world data is often chaotic, unstructured, inconsistent, redundant, and riddled with erroneous values. As a result, extracting insights from raw data without the use of data preprocessing steps is almost impossible.

What precisely is Data Preprocessing?

The translation of raw data into a format that may be utilized to generate insights is known as data preparation. It is the first and most significant step of the data science life cycle. Preprocessing data ensures that it is clean, organized, and ready to be processed by the machine learning model.

Missing Values

As part of our EDA process, we discovered that we had a small number of instances with null values. As a result, this becomes a crucial stage. To impute the missing data, we will group our instances by location and date, and then replace the null entries with their mean values.

Expansion of the Date feature: The Date feature may be expanded into Day, Month, and Year, and these newly formed features can then be utilized in further preprocessing phases, as sketched below.
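The report does not show the expansion code itself; a minimal pandas sketch of this step, assuming df is the weatherAUS DataFrame loaded as in the Experimental Section below, could be:

df['Date'] = pd.to_datetime(df['Date'])   # parse the Date column first
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month         # 1-12
df['day'] = df['Date'].dt.day             # 1-31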
Cardinality check for Categorical features

A classifier's accuracy and performance are determined not just by the model we apply, but also by how we preprocess data and the type of data we give it to learn from. Because many machine learning methods, such as linear regression, logistic regression, and k-nearest neighbors, can only handle numerical data, categorical data must be converted to numeric form. Check the cardinality of each categorical feature before you start encoding.

Cardinality

The number of unique values in a categorical feature is known as its cardinality. A feature with a high number of distinct/unique values is a high-cardinality feature; a categorical feature with hundreds of zip codes is the best example. When a high-cardinality feature is encoded, it increases the number of dimensions of the data, which is a severe concern and is not conducive to the model's success. There are a couple of approaches for dealing with high cardinality: one is feature engineering (such as feature hashing, described below), and the other is to simply remove the feature if it adds no value to the model. A cardinality check is sketched below.
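A quick way to run this check, assuming df is the weatherAUS DataFrame loaded as in the Experimental Section below, is to count the unique values in each object-typed column:

# Cardinality of each categorical (object-typed) column
categorical_cols = df.select_dtypes(include='object').columns
print(df[categorical_cols].nunique().sort_values(ascending=False))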
Handling Missing Values

Missing values are a challenge for machine learning algorithms since they cannot handle them directly, so they must first be addressed. Missing values can be identified and imputed in a variety of ways. When a dataset with missing values is loaded using pandas, the missing entries are represented as NaN (Not a Number) values. These NaN values may be recognized using the isna() or isnull() methods, and they can be imputed with fillna(). Missing Data Imputation is the term for this technique; a sketch follows.
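A minimal sketch of this imputation, assuming df is the loaded weatherAUS DataFrame; filling numeric gaps with the per-location mean follows the Missing Values note above, while filling categorical gaps with the column mode is an added assumption:

# Share of missing values per column
print(df.isna().mean().sort_values(ascending=False))

# Numeric columns: fill with the per-location mean
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = df.groupby('Location')[num_cols].transform(lambda s: s.fillna(s.mean()))

# Categorical columns: fill with the most frequent value (assumption)
cat_cols = df.select_dtypes(include='object').columns
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])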
Feature Hashing

Another effective feature engineering approach for dealing with high-cardinality categorical features is feature hashing. A hash function is commonly employed in this technique, with the number of encoded features pre-set (as a vector of pre-defined length), so that the hashed values of the features are used as indices in this pre-defined vector and the values are updated accordingly. A sketch with scikit-learn follows.
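Scikit-learn provides a FeatureHasher class (cited in reference 6); a minimal sketch, where the 8-dimensional output size is an arbitrary illustrative choice rather than a value from the report:

from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=8, input_type='string')
# Each sample must be an iterable of strings, so wrap every Location value in a list
hashed = hasher.transform([[loc] for loc in df['Location'].astype(str)])
print(hashed.shape)   # (n_rows, 8) hashed representation of Location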
Feature Scaling

The features in our data collection have a wide variety of magnitudes and ranges. This is a concern because many machine learning algorithms compute the Euclidean distance between two data points, and features with high magnitudes will be weighed far more heavily in distance computations than those with low magnitudes. To counteract this effect, we must bring the magnitudes of all features to a common scale, which is accomplished by scaling.
Choosing Features

The process of selecting the characteristics that contribute the most to our prediction variable or output, either automatically or manually, is known as feature selection. Irrelevant characteristics in the data can lower model accuracy and cause the model to train on noise. Choosing features helps to prevent overfitting, enhance accuracy, and shorten training time. This task was completed using two methods, both of which yielded similar results. Statistical tests can be used to pick the attributes that have the strongest link with the output variable. The SelectKBest class in the scikit-learn package may be used with a variety of statistical tests to choose a certain number of features. We chose the top 5 features from our data set using the chi-squared statistical test for non-negative characteristics, roughly as sketched below.
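A minimal SelectKBest sketch consistent with that description; the candidate column list here is illustrative, chi-squared needs non-negative inputs, and rows with a missing RainTomorrow label are assumed to be dropped first (step 5 of the Experimental Section):

from sklearn.feature_selection import SelectKBest, chi2

data = df.dropna(subset=['RainTomorrow'])
X = data[['Humidity3pm', 'Rainfall', 'WindGustSpeed', 'Humidity9am', 'Pressure9am', 'Cloud3pm']].fillna(0)
y = data['RainTomorrow']
selector = SelectKBest(score_func=chi2, k=5).fit(X, y)
print(X.columns[selector.get_support()])   # the 5 best-scoring features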
Outliers' detection and treatment

An outlier is a value that deviates abnormally from the rest of the data in a sample. Visualization (such as box plots and scatter plots), the Z-score, statistical and probabilistic algorithms, and other methods can be used to find them. An IQR-based check is sketched below.
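A minimal sketch of an IQR-based check on one numeric column; the 1.5 × IQR fence is the usual box-plot convention, not a value taken from the report:

q1, q3 = df['Rainfall'].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df['Rainfall'] < lower) | (df['Rainfall'] > upper)]
print(len(outliers), 'potential outliers in Rainfall')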
Encoding of Categorical Features

Most machine learning algorithms, such as Logistic Regression, Support Vector Machines, and K-Nearest Neighbors, are unable to deal with categorical data directly. As a result, categorical data must be transformed into numerical data before modeling, a process known as feature encoding. One-hot encoding and label encoding are two common feature encoding approaches; in this project, however, I will use the replace() function to convert categorical data to numerical data, as sketched below.
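A minimal sketch of the replace()-based encoding for the two yes/no columns, mirroring step 13 of the Experimental Section; the same idea extends to the other categorical columns:

# Map the binary categorical columns to integers
df['RainToday'] = df['RainToday'].replace({'Yes': 1, 'No': 0})
df['RainTomorrow'] = df['RainTomorrow'].replace({'Yes': 1, 'No': 0})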
Correlation

Correlation is a statistic that may be used to determine the strength of the link between two characteristics; it is employed in bivariate analysis. The corr() method in pandas may be used to calculate the correlation.

• Heatmap and Correlation Matrix

Correlation describes the relationship between the characteristics and the target variable. A positive correlation means that an increase in one feature's value increases the value of the target variable, whereas a negative correlation means that an increase in one feature's value reduces the value of the target variable. We used the seaborn library to create a heatmap of correlated characteristics, which makes it easier to see which attributes are most connected to the target variable; a sketch follows.
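A minimal correlation-heatmap sketch consistent with that description; the figure size and colour map are arbitrary choices:

import matplotlib.pyplot as plt
import seaborn as sns

corr = df.select_dtypes(include='number').corr()
plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap='coolwarm')
plt.title('Correlation between numeric features')
plt.show()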
• Taking Care of Class Inequality

Our data set is significantly imbalanced, as we discovered during the EDA process. Because the algorithm does not learn much about the minority class, imbalanced data leads to skewed findings. We ran two tests, one with oversampled data and the other with under-sampled data.

Under-sampling: To exclude instances of the majority class, we utilized Imblearn's random under-sampler library. This elimination is based on the distance, to ensure that the least amount of data is lost.

Oversampling: To produce synthetic instances for the minority class, we employed Imblearn's SMOTE approach: a subset of data from the minority class is taken, and new synthetic cases are constructed from it. Both steps are sketched below.
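A minimal sketch of both resampling strategies with imbalanced-learn (cited in reference 10), assuming X and y hold the encoded feature matrix and RainTomorrow target:

from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

# Under-sample the majority (no-rain) class
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X, y)

# Over-sample the minority (rain) class with synthetic examples
X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)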
Feature Importance

The features used to train a machine learning model affect its performance. The importance of a characteristic to the creation of a model is described by its relevance: feature importance is the practice of assigning a score to input attributes based on how useful they are at predicting the target variable. The usefulness of a feature can help you decide which ones to use. The ExtraTreesRegressor class will be used to determine the relevance of features. This class implements a meta-estimator that fits a number of randomized decision trees on distinct sub-samples of the dataset and uses averaging to boost predictive accuracy and control over-fitting. A sketch follows.
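A minimal feature-importance sketch; the report names ExtraTreesRegressor, but since the target here is the binary RainTomorrow label, the classifier variant is shown instead as an assumption, with X and y taken to be the imputed, encoded feature matrix and target:

from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
for name, score in sorted(zip(X.columns, model.feature_importances_), key=lambda t: -t[1]):
    print(f'{name}: {score:.3f}')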

Feature Scaling

Feature scaling is an approach for scaling, normalizing, and standardizing data into a common range such as (0, 1). When the columns of a dataset contain values on very different scales, bringing the data in each column to a common level makes modeling easier. StandardScaler is one class that implements feature scaling; a sketch follows.
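A minimal scaling sketch; StandardScaler standardizes to zero mean and unit variance, while MinMaxScaler (cited in reference 7) would map values into the (0, 1) range mentioned above:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # X: the numeric feature matrix prepared earlier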

Model Building

In this project, I will use a Logistic Regression method to create a prediction model that forecasts whether it will rain in Australia tomorrow.

Logistic Regression

This approach is based on statistics and is used to solve classification problems. Its core is the logit, or sigmoid, function, which allows us to estimate the probability that an input belongs to a specific category. Logistic regression, according to the data science community, can address 60% of present classification problems.

The Logistic Regression model accuracy score is 0.84, so the model does a very good job of predicting. The model shows no sign of underfitting or overfitting, which means it generalizes well to unseen data. The mean accuracy score from cross-validation is almost the same as the original model accuracy score, so the accuracy of the model may not be improved further by cross-validation. A sketch of the training and evaluation procedure is given below.
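A minimal sketch of how such a model could be trained and checked with cross-validation, assuming X and y are the fully preprocessed feature matrix and encoded target; the split ratio and max_iter value are illustrative choices:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

print('test accuracy:', accuracy_score(y_test, clf.predict(X_test)))
print('mean CV accuracy:', cross_val_score(clf, X_train, y_train, cv=5).mean())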
Experimental Section

1. Importing Libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")


2. Loading Data Set.

df = pd.read_csv('weatherAUS.csv')

3. The datatype of Date is an object, so I will change it to datetime for easy handling of dates.

df['Date'] = pd.to_datetime(df['Date'])

4. Checking for the missing values in the target variable.

missing values = 3267

5. Dropping the missing values.

df = df.dropna(subset = ['RainTomorrow'])

6. Checking Target Variable.

7. Transformed features. The expanded date features (month, month_sin, month_cos, day, day_sin, day_cos) appear in the missing-value listing of step 9; a sketch of one way to derive them is given below.
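The transformation code is not shown in the report; a common way to obtain the month_sin/month_cos and day_sin/day_cos columns that appear in the step 9 listing, offered here as an assumption, is a sine/cosine encoding of the date components:

import numpy as np

df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day
df['month_sin'] = np.sin(2 * np.pi * df['month'] / 12)
df['month_cos'] = np.cos(2 * np.pi * df['month'] / 12)
df['day_sin'] = np.sin(2 * np.pi * df['day'] / 31)
df['day_cos'] = np.cos(2 * np.pi * df['day'] / 31)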
8. Handling missing values for categorical data.

Location 0.000000
WindGustDir 6.515815
WindDir9am 6.982612
WindDir3pm 2.632874
RainToday 0.988976
dtype: float64

9. Missing values for numeric data.

MinTemp 0.443061
MaxTemp 0.215377
Rainfall 0.988976
Evaporation 42.788825
Sunshine 47.626457
WindGustSpeed 6.474498
WindSpeed9am 0.941505
WindSpeed3pm 1.826749
Humidity9am 1.266769
Humidity3pm 2.530900
Pressure9am 9.853719
Pressure3pm 9.830863
Cloud9am 37.734058
Cloud3pm 40.174411
Temp9am 0.638219
Temp3pm 1.910262
month 0.000000
month_sin 0.000000
month_cos 0.000000
day 0.000000
day_sin 0.000000
day_cos 0.000000
dtype: float64

10. Features having high missing values. From the listing above, Evaporation, Sunshine, Cloud9am, and Cloud3pm each have more than about 35% of their values missing.

11. Checking correlation between numeric variables. Strong correlations exist between:

Temp3pm and MaxTemp
Pressure3pm and Pressure9am
Temp9am and MinTemp
Temp9am and MaxTemp
Temp3pm and Temp9am

We will remove one of the features in each pair to avoid multicollinearity.

12. Analyzing numeric features. Many numeric features have data points beyond the IQR. I am considering a threshold of 5 percentiles for outlier removal, i.e., any point beyond the 95th percentile or below the 5th percentile is considered an outlier and will be removed. The 5-percentile threshold is chosen somewhat arbitrarily; other values could also be considered. A sketch of this step is given below.
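A minimal sketch of the percentile treatment described in step 12; capping values at the 5th and 95th percentiles (rather than dropping whole rows) is one reasonable reading of the step:

num_cols = df.select_dtypes(include='number').columns
lower = df[num_cols].quantile(0.05)
upper = df[num_cols].quantile(0.95)
# Cap every numeric column at its 5th and 95th percentiles
df[num_cols] = df[num_cols].clip(lower=lower, upper=upper, axis=1)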

13. Converting 'Yes' and 'No' to '1' and '0' respectively.

Yes = 1
No = 0

14. Training and validation accuracy plotting.

15. Model Evaluation.

16. Classification Report.

              precision    recall  f1-score   support

           0       0.90      0.91      0.90      7005
           1       0.43      0.39      0.41      1200

    accuracy                           0.84      8205
   macro avg       0.66      0.65      0.66      8205
weighted avg       0.83      0.84      0.83      8205

Accuracy is around 84%. A sketch of how this report and the accuracy comparison can be produced follows.
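A minimal sketch covering steps 14-16, reusing the clf, X_train/X_test, y_train/y_test names from the Logistic Regression sketch above; epoch-by-epoch accuracy curves would normally come from a neural-network training history, so a simple train-versus-validation accuracy comparison is shown here instead as an assumption:

from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt

train_acc = accuracy_score(y_train, clf.predict(X_train))
val_acc = accuracy_score(y_test, clf.predict(X_test))

plt.bar(['train', 'validation'], [train_acc, val_acc])
plt.ylabel('accuracy')
plt.title('Training vs validation accuracy')
plt.show()

print(classification_report(y_test, clf.predict(X_test)))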
Result Analysis

The accuracy of the Logistic Regression model is 0.84, so the model does a good job of predicting. The model does not show underfitting or overfitting, which means that it performs well on unseen data. The mean cross-validation accuracy score is almost the same as the accuracy of the original model, so model accuracy may not be improved further using cross-validation. We also investigate the combined effect of applying several meteorological factors in conjunction with PWV in rainfall prediction. Instead of utilizing distinct seasonal models, we merged the seasonal and diurnal elements into a single model. The difference between training and testing error is used to determine whether or not our testing dataset is reliable: the outcome is negative if the testing RMSE value is larger than the training RMSE value, which indicates that the model fits the training dataset better than the testing dataset. The R2-score tells us how accurate each model is. The most accurate method is the Random Forest regression algorithm, and the difference between predicted and correct values is very small; compared to other algorithms, the Random Forest algorithm learns very effectively. This rainfall forecast model is built using the approaches mentioned in Section III. The assessment metrics are presented after the model has been trained and evaluated using data from the NTUS station from 2012 to 2015 and the SNUS station from 2016. After training, SVM has an accuracy of 80%. Because of the categorical values included in the dataset, the accuracy is good but not as good as that of other techniques; classification algorithms are best suited to numerical data, and this has resulted in a modest reduction in SVM accuracy.
Discussion of findings

The environmental factors used in this study, which were collected by measuring instruments at a weather station, were analyzed for their correlation with rainfall, and appropriate features were selected based on Pearson correlation test results for daily rainfall prediction, as shown in Table 2. This study looked at rainfall forecasts using natural characteristics with a correlation coefficient larger than 0.2. Similarly, Manandhar et al. use the degree of interaction between each element to identify five significant environmental parameters: temperature, relative humidity, dew point, solar radiation, and precipitable water vapor. Temperature and relative humidity have a high correlation coefficient of -0.9, according to that study's experimental data. Gnanasankaran and Ramaraj did not illustrate the influence of environmental elements on rainfall, instead utilizing monthly and yearly rainfall data, while Prabakaran et al. utilized the year, temperature, cloud cover, and annual rainfall attributes for yearly rainfall prediction without examining the interaction between natural characteristics. This study used appropriate environmental features to train and test three machine learning models, RF, MLR, and XGBoost, to estimate daily rainfall. The performance of these machine learning models is measured using MAE and RMSE: the MAE for RF, MLR, and XGBoost is 4.49, 4.97, and 3.58, and the RMSE is 8.82, 8.61, and 7.85, respectively. Similarly, Manandhar et al. utilized machine learning algorithms to predict annual rainfall using appropriate natural features and recorded a total accuracy of 79.6%; that work considered the average temperature, cloud cover, and annual rainfall as attributes, the correlations between attributes were not examined, and the average percentage error of the annual rainfall forecast using linear regression was 7%. Gnanasankaran and Ramaraj [14] did not show any effect of environmental factors on rainfall; their study took monthly and annual rainfall to predict rainfall and measured performance using an RMSE of 0.1069 and an MAE of 0.0833 with multiple linear regression. Therefore, this study examined the impact of environmental factors on daily rainfall intensity using Pearson's correlation and selected appropriate environmental variables. The selected features are used as input to the daily rainfall predictor models, and model performance is measured using MAE and RMSE.
Conclusion

Rainfall prediction is a data science and machine learning subject that uses algorithms to forecast the weather. Predicting rainfall is essential for making better use of water resources and increasing crop yields, as well as for lowering mortality from flooding and rain-related illnesses. This study examines a number of machine learning algorithms for forecasting rain. Three machine learning algorithms, MLR, RF, and XGBoost, were presented and evaluated using data from an Australian weather station. The Pearson correlation coefficient was used to find appropriate natural rainfall features, and the selected features were used as input to the machine learning models in this work. The results of a comparison of the three algorithms (MLR, RF, and XGBoost) revealed that XGBoost was the superior machine learning system for forecasting daily rainfall using the chosen natural features. If sensor data were incorporated in the study, the accuracy of rainfall estimates might improve; however, sensor data were not taken into account in this investigation. Sensor and weather databases with additional environmental factors can help enhance rainfall accuracy. If sensor and meteorological data are utilized for everyday rainfall, large-scale data analysis can be used to predict rainfall in the future.

References

1. World Health Organization: Climate Change and Human Health: Risks and Responses. World Health Organization, January 2003.
2. Alcántara-Ayala, I.: Geomorphology, natural hazards, vulnerability and prevention of natural disasters in developing countries. Geomorphology 47(2-4), 107-124 (2002).
3. Nicholls, N.: Atmospheric and climatic hazards: Improved monitoring and prediction for disaster mitigation. Natural Hazards 23(2-3), 137-155 (2001).
4. [Online] InDataLabs, Exploratory Data Analysis: the Best way to Start a Data Science Project. Available: https://medium.com/@InDataLabs/why-start-a-data-science-project-with-exploratory-data-analysis-f90c0efcbe49
5. [Online] Pandas Documentation. Available: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html
6. [Online] Scikit-Learn Documentation. Available: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html
7. [Online] Scikit-Learn Documentation. Available: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
8. [Online] Scikit-Learn Documentation. Available: https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectKBest.html
9. [Online] Raheel Shaikh, Feature Selection Techniques in Machine Learning with Python. Available: https://towardsdatascience.com/feature-selection-techniques-in-machine-learning-with-python-f24e7da3f36e
10. [Online] Imbalanced-learn Documentation. Available: https://imbalanced-learn.readthedocs.io/en/stable/introduction.html
11. V. Veeralakshmi and D. Ramyachitra, Ripple Down Rule learner (RIDOR) Classifier for IRIS Dataset. Issues, vol. 1, pp. 79-85.
12. [Online] Aditya Mishra, Metrics to Evaluate your Machine Learning Algorithm. Available: https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234
13. Nikhil Sethi, Dr. Kanwal Garg, "Exploring Data Mining Technique for Rainfall Prediction", Vol. 5(3), 2014, ISSN: 0975-9646.
14. Bushra Praveen, Swapan Talukdar, Shahfahad, Susanta Mahato, Jayanta Mondal, Pritee Sharma, Abu Reza Md. Towfiqul Islam, Atiqur Rahman, "Analyzing Trend and Forecasting of rainfall changes in India using nonparametrical and machine learning approaches", Scientific Reports, 2020.
15. Aakash Parmar, Kinjal Mistree, Mithila Sompura, "Machine Learning Techniques for Rainfall Prediction: A Review", International Conference on Innovations in Information Embedded and Communication Systems (ICIIECS), March 2017.
16. Shreekanth Parashar, Tanveer Hurra, "A Study of Rainfall Using different Data Mining Techniques", ResearchGate, Article, May 2020.
17. Deepak Ranjan Nayak, Amitav Mahapatra, Pranati Mishra, "A Survey on Rainfall Prediction using Artificial Neural Networks", International Journal of Computer Applications, volume 72, No. 16, June 2013.
