DATA ANALYSIS ON
WEATHER FORECASTING
Prepared by,
Trupti Shingala
Introduction: Dataset
• We used the weather forecasting dataset from the rattle package in R, which contains 366 daily observations.
• We used the following variables from the dataset: MinTemp, MaxTemp, WindSpeed9am, WindSpeed3pm, Pressure3pm, Humidity9am, Humidity3pm, RainToday, and RainTomorrow (with RainTomorrow serving as the main prediction target).
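A minimal sketch of loading these fields, assuming the rattle package's weather dataset (366 daily observations):

    library(rattle)   # provides the `weather` dataset
    data(weather)
    vars <- c("MinTemp", "MaxTemp", "WindSpeed9am", "WindSpeed3pm",
              "Pressure3pm", "Humidity9am", "Humidity3pm",
              "RainToday", "RainTomorrow")
    df <- weather[, vars]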
Data Cleaning and Goals
• Replaced missing values with the field (column) mean for numerical data, as sketched below.
• Implemented various algorithms on the data to help draw conclusions about the classification and clustering of the data.
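A minimal sketch of the mean-imputation step under this approach:

    # Replace NAs in each numeric column with that column's mean
    num_cols <- sapply(df, is.numeric)
    df[num_cols] <- lapply(df[num_cols], function(x) {
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      x
    })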
Algorithms Used
Classification:
• K-nearest neighbors
• NaĆÆve Bayes
• Decision tree (rpart)
Clustering:
• K-means clustering
Classification and Regression Tree (CART)
• The decision trees produced by CART are strictly binary, containing exactly two branches for each decision node.
• CART recursively partitions the records in the training dataset into subsets of records with similar values for the target attribute.
• The CART algorithm grows the tree by conducting, for each decision node, an exhaustive search of all available variables and all possible splitting values.
• Formula: RainTomorrow ~ MinTemp + MaxTemp + WindSpeed9am + WindSpeed3pm + Humidity3pm + Pressure3pm (fitted as sketched below)
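A minimal sketch of fitting this model with rpart (the df prepared above is an assumption about how the data were arranged):

    library(rpart)
    fit <- rpart(RainTomorrow ~ MinTemp + MaxTemp + WindSpeed9am +
                   WindSpeed3pm + Humidity3pm + Pressure3pm,
                 data = df, method = "class")
    printcp(fit)  # shows the cptable: CP, nsplit, rel error, xerror, xstd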
Decision Tree
• To determine whether the tree is appropriate, or whether some of the branches need to be pruned, we can use the cptable element of the rpart object.
• The xerror column contains estimates of the cross-validated prediction error for different numbers of splits (nsplit). The best tree has three splits.
• Now we can prune back the large initial tree using the CP value that minimizes xerror.
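A sketch of that pruning step, assuming the fit object from the rpart sketch above:

    # Prune at the CP value with the lowest cross-validated error (xerror)
    best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
    pruned  <- prune(fit, cp = best_cp)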
The error rate of the decision tree after pruning is 16%.
K-MEANS CLUSTERING
• K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
• The goal of the K-means algorithm is to find the best division of n entities into k groups, so that the total distance between each group's members and its corresponding centroid (the representative of the group) is minimized.
• Formally, the goal is to partition the n entities into k sets Si, i = 1, 2, ..., k, in order to minimize the within-cluster sum of squares (WCSS), defined as:
  WCSS = Σᵢ₌₁ᵏ Σ_{x ∈ Sᵢ} ‖x āˆ’ μᵢ‖², where μᵢ is the mean of the points in Sᵢ.
K-means Algorithm Step #1
A typical run of the K-means algorithm proceeds in the following steps:
1. Initial cluster seeds are chosen (at random).
   – These represent the ā€œtemporaryā€ means of the clusters.
   – Imagine our random seed values were 60 for group 1 and 70 for group 2.
K-means Algorithm Step #2
2. The squared Euclidean distance from each object to each cluster is computed, and each object is assigned to the closest cluster.
K-means Algorithm Step #3
3. For each cluster, the new centroid is computed, and each seed value is replaced by the respective cluster centroid.
   • The new mean for cluster 1 is 62.3.
   • The new mean for cluster 2 is 68.9.
K-means Algorithm Steps #4 – #6
4. The squared Euclidean distance from each object to each cluster is computed, and the object is assigned to the cluster with the smallest squared Euclidean distance.
5. The cluster centroids are recalculated based on the new membership assignment.
6. Steps 4 and 5 are repeated until no object moves between clusters.
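A minimal sketch of running K-means on the numeric weather fields (scaling the variables and the choice of seed are assumptions, not from the slides):

    set.seed(42)  # assumed seed, for reproducibility only
    km <- kmeans(scale(df[num_cols]), centers = 2, nstart = 25)
    table(Cluster = km$cluster, RainTomorrow = df$RainTomorrow)  # frequency table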
Applications
• market segmentation
• computer vision
• geostatistics
• astronomy
• agriculture
• K-means is also often used as a preprocessing step for other algorithms, for example to find a starting configuration.
FREQUENCY TABLE
• for k = 2
• for k = 3
PLOTTING THE CLUSTERS
• for k = 2
• for k = 3
NaĆÆve Bayes Classifier
• Computes the conditional posterior probabilities of a categorical class variable, given independent predictor variables, using Bayes' rule.
NaĆÆve Bayes Classifier (cont.)
• NaĆÆve Bayes classifiers assume that the effect of a variable's value on a given class is independent of the values of the other variables. This assumption is called class conditional independence (formalized below).
• An advantage of the naĆÆve Bayes classifier is that it requires only a small amount of training data to estimate the parameters necessary for classification.
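Under this assumption, the posterior used for classification factorizes as (a standard statement of the rule, not taken from the slides):

  P(C | x₁, ..., xₙ) ∝ P(C) Ā· P(x₁ | C) Ā· ... Ā· P(xₙ | C)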
NaĆÆve Bayes Classifier (cont.)
• Here, we applied naĆÆve Bayes to the RainToday and RainTomorrow attributes, using the predictor attributes MinTemp, MaxTemp, Temp9am, Temp3pm, Pressure9am, and Pressure3pm.
NaĆÆve Bayes Classifier (cont.)
• NaĆÆve Bayes is performed on categorical class data. In the predict call, if type = "raw", the conditional posterior probabilities for each class are returned.
• Otherwise (type = "class"), the class with the maximum probability is returned.
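A minimal sketch with e1071's naiveBayes (the slides do not name the package, so e1071 is an assumption; Temp9am, Temp3pm, and Pressure9am come from the full weather data frame):

    library(e1071)
    nb <- naiveBayes(RainTomorrow ~ MinTemp + MaxTemp + Temp9am + Temp3pm +
                       Pressure9am + Pressure3pm, data = weather)
    head(predict(nb, weather, type = "raw"))       # posterior probabilities per class
    preds <- predict(nb, weather, type = "class")  # most probable class
    table(Pred = preds, Actual = weather$RainTomorrow)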
NaĆÆve Bayes Classifier (cont.)
• Output (confusion table; rows are predicted, columns are actual):

  Pred    No   Yes
  No     300    66
  Yes      0     0

• We also perform naĆÆve Bayes using Laplace smoothing, a technique used to smooth categorical data.
• The default value of the laplace argument (0) disables Laplace smoothing.
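The same model with Laplace smoothing enabled (laplace = 1 is an assumed, typical choice):

    nb_smooth <- naiveBayes(RainTomorrow ~ MinTemp + MaxTemp + Temp9am + Temp3pm +
                              Pressure9am + Pressure3pm,
                            data = weather, laplace = 1)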
NaĆÆve Bayes Classifier (cont.)
• RainToday:

  Pred    No   Yes
  No     258    34
  Yes     42    32

• RainTomorrow:

  Pred    No   Yes
  No     271    38
  Yes     29    28
K-Nearest Neighbor
• It is a lazy learning algorithm.
• Whenever we have a new point to classify, we find its k nearest neighbors in the training data.
• It defers generalizing from past training examples until a new query is encountered.
• k-NN uses a distance function to calculate the distance between the query point and the training points.
• Our goal is to determine the value of k for which the weather classification is most accurate.
• Given a query instance xq to be classified, let x1, x2, ..., xk denote the k instances from the training examples that are nearest to xq.
• Return the class that represents the majority of the k instances.
• For example, if we take k = 5, the query xq will be classified as negative, since 3 of its 5 nearest neighbors are classified as negative.
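A minimal sketch with class::knn, assuming a simple random 80/20 train/test split (the original split procedure is not described):

    library(class)
    set.seed(1)                                   # assumed seed
    num <- sapply(df, is.numeric)
    X   <- scale(df[, num])                       # standardize numeric predictors
    idx <- sample(nrow(df), floor(0.8 * nrow(df)))
    pred <- knn(train = X[idx, ], test = X[-idx, ],
                cl = df$RainTomorrow[idx], k = 5)
    mean(pred != df$RainTomorrow[-idx])           # test error rate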
K-Nearest Neighbor – Transitional Conclusions
• For k = 1 we have the following table result and error rate for RainTomorrow.
• For k = 2 we have the following table result and error rate for RainTomorrow.
• For k = 5 we have the following table result and error rate for RainTomorrow.
• For k = 10 we have the following table result and error rate for RainTomorrow.
K-Nearest Neighbor – Conclusions and Error Rate
• The error rate changes on every run, since the training and test datasets are resampled randomly each time.
• The observed error rate is 21%.
Comparison of Algorithms
The accuracies of the algorithms are:
1. KNN – 79%
2. K-means – 80.5%
3. Decision tree – 84%