Weather Forecasting Using Incremental K-Means Clustering
Abstract – Clustering is a powerful tool which has been used in several forecasting works, such as time series forecasting, real time storm detection, flood forecasting and so on. In this paper, a generic methodology for weather forecasting is proposed with the help of the incremental K-means clustering algorithm. Weather forecasting plays an important role in day to day applications. Weather forecasting in this paper is based on the incremental air pollution database of West Bengal for the years 2009 and 2010. This paper uses typical K-means clustering on the main air pollution database, and a list of weather categories is developed based on the maximum mean values of the clusters. When new data arrive, incremental K-means is used to group those data into the clusters whose weather category has already been defined. Thus it builds up a strategy to predict the weather of the upcoming days. This forecasting database is totally based on the weather of West Bengal, and this forecasting methodology is developed to mitigate the impacts of air pollution and launch focused modeling computations for prediction and forecasts of weather events. The accuracy of this approach is also measured.

Keywords – Clustering, Forecasting, Incremental, K-means.

I. INTRODUCTION

Forecasting is very important for the prediction of future events. Science and computer technology together have made significant advances over the past several years, and with those advanced technologies and a few past patterns, the ability to predict the future grows. Weather forecasting is directly dependent on the characteristics of the particulate matter present in the air.

Weather forecasting ∝ max (effects of NO2 + SO2 + CO2 + RPM + ...)    (1)

This paper presents a methodology for forecasting the weather of 'West Bengal' through clustering. This methodology uses an air pollution database which is described in a later section. The paper is organized into six sections. Section two consists of work done previously in the same direction; a brief background history is covered in this section. The methodology is explained elaborately in section three. The proposed model is explained in section four; here the various stages of weather forecasting using incremental K-means clustering are explained. Section five is the simulation result; here the proposed technique is applied over the test dataset and the results are captured. The last section, section six, is the conclusion and future scope of the proposed work.

II. RELATED WORK

Several approaches have been used for weather prediction. In some cases, advanced numerical analysis has been used for weather prediction, but in most situations clustering techniques are used for different types of prediction, whether weather prediction or natural disaster prediction. All of this research helps the world survive destructive natural events. Weather forecasting can also be done by using an artificial neural network [9].

This paper is based on the incremental approach of the K-means clustering algorithm, which has already been developed and discussed [1][2]. Based on that incremental algorithm concept, weather prediction of 'West Bengal' is done in this paper. There also exist several approaches which provide some modifications of this algorithm [3][4][5][6]. One approach is proposed on a case study of time series forecasting through clustering. In this approach, a generic methodology for time series forecasting is proposed. This methodology first searches for useful patterns in the form of curves and then facilitates forecasting through linear regression by matching each time series that has to be predicted to its closest pattern. This approach is applied on the Kddcup 2003 dataset [7]. Some work is done on real time storm detection through data mining. That work presents a model and algorithms for bridging the gap between the physical environment and the cyber infrastructure framework by means of an events processing approach to responding to anomalous behavior and sophisticated data mining algorithms that apply classification techniques to the detection of severe storm patterns. The above ideas have been implemented in the LEAD-CI prototype [8]. There exists one approach which presents the data mining activity that was employed to mine weather data. The self-organizing data mining approach employed is the enhanced Group Method of Data Handling (e-GMDH). The weather data used for the DM research include daily temperature, daily pressure and monthly rainfall. Experimental results indicate that the above approach is a useful data mining technique for forecasting weather data [10].

III. METHODOLOGY

This analysis is based on observation of the air pollution data collected from the “West Bengal Air Pollution Control Board”; the URL is “http://www.wbpcb.gov.in/html/airqualitynxt.php”. This database consists of four air-pollution elements or attributes: Carbon dioxide (CO2), Respirable particulate matter (RPM), Sulphur dioxide (SO2) and Oxides of Nitrogen (NOx). The air pollution data of each day are collected and stored as a record in an .arff (Attribute-Relation File Format) file. The detailed database format is shown in 'Table I'.

Table I. Original air-pollution Database
1/1/2009    85    183    12    95
6/1/2009    78    149     7    93
3/2/2009    98    154     8    96
4/2/2009    90    195     8    93
...         ...   ...    ...   ...

The above database is a dynamic database where data are updated frequently. The main approach of this paper is to first apply the K-means clustering algorithm [Chakraborty and Nagwani, 2011] on the above original database (assuming an initial number of clusters). Then the means of each cluster are computed based on their air pollution attributes in the database. Then a list of weather categories is developed based on the maximum mean value of each cluster. When the new data (data of upcoming days) are inserted into the old database, the incremental K-means clustering algorithm is applied. Based on the behaviour of the incremental K-means clustering algorithm, the minimum means of the new cluster data can be computed, the cluster to which the new means belong can be easily identified, and their weather category can be defined based on that particular cluster's weather category. Finally, the accuracy of this method is also measured and discussed.

A. Effects of air-pollution data on weather

The database used in the experiments of this paper consists of four air-pollution attributes (CO2, RPM, SO2, NOX). These four attributes have very important roles in weather and climate change. They not only have an impact on the climate but can also be harmful to humans and plants. These air pollutants are directly emitted from several different sources, such as ash from a volcanic eruption, gas from motor vehicle exhaust, releases from various industrial processes, high temperature combustion and so on.

The following 'Fig.1' shows how the air pollution data affect human health, plant health and, moreover, the environment [11].

CO2: A colourless, odorless, non-toxic greenhouse gas associated with ocean acidification, emitted from sources such as combustion, cement production, and respiration. It is one of the main pollutants causing 'global warming'. Due to the 'greenhouse effect', the temperature of the environment is increased and, most importantly, seasonal change is caused by the increase of CO2 [11].

SO2: SO2 is produced by volcanoes and in various industrial processes. Since coal and petroleum often contain sulphur compounds, their combustion generates sulphur dioxide. Further oxidation of SO2, usually in the presence of a catalyst such as NO2, forms H2SO4, and thus acid rain. SO2 also forms smog (smoke + fog), which creates visibility problems [11][12][13].

NOX: Nitrogen dioxide is emitted from high temperature combustion. It can be seen as the brown haze dome above or plume downwind of cities. NO2 is one of the most prominent air pollutants. Just like CO2, NO2 is responsible for increasing temperature and it also creates smog [11][12][13].

RPM: Particulates, alternatively referred to as particulate matter (PM) or fine particles, are tiny particles of solid or liquid suspended in a gas. They create dust, smoke, fumes, mist, fog, aerosols, fly ash and so on. Increased levels of fine particles in the air are linked to health hazards such as heart disease, altered lung function and lung cancer [11][12][13].

The result of the second iteration is the same as the above. From the above four clusters, the effect of those clusters on climate change can be assessed.

From the resultant data of Cluster1 (C1), it can be said that the effect of RPM (14 is the maximum) is greater than that of the other pollutants. So, as per their effects on weather (discussed above), the weather of those particular days was smoggy in nature, with a lot of dust and fly ash in the air.

As per the nature of Cluster2 (C2), the weather of those particular days was hot, dry and smoggy in nature due to the effect of NOx.
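
The cluster-labelling rule of the methodology, in which each cluster takes its weather category from the pollutant with the largest mean, can be illustrated with a minimal Python sketch. The attribute order, the helper names (`cluster_means`, `dominant_attribute`, `label_clusters`) and the sample values are assumptions made for illustration and are not taken from the paper's tables. The category wording follows the cluster descriptions in the text, except the SO2 entry, which is an assumed label derived from the description of SO2 above.

```python
from statistics import mean

# Attribute order is an assumption for this sketch.
ATTRIBUTES = ("CO2", "RPM", "SO2", "NOx")

# Category wording follows the cluster descriptions in the text;
# the SO2 entry is an assumed label, not one stated by the paper.
CATEGORY = {
    "CO2": "hot, smoggy and humid",
    "RPM": "dusty, fly ash, smoggy, fog, mist",
    "SO2": "smog, acid rain (assumed)",
    "NOx": "hot, dry and smoggy",
}

def cluster_means(records):
    """Per-attribute mean of one cluster (a list of 4-value records)."""
    return [mean(column) for column in zip(*records)]

def dominant_attribute(means):
    """Name of the attribute whose mean is the largest."""
    best = max(range(len(means)), key=means.__getitem__)
    return ATTRIBUTES[best]

def label_clusters(clusters):
    """Map each cluster id to (dominant attribute, weather category)."""
    labels = {}
    for cid, records in clusters.items():
        attr = dominant_attribute(cluster_means(records))
        labels[cid] = (attr, CATEGORY[attr])
    return labels

# Hypothetical clusters; the values are illustrative only.
clusters = {
    "C1": [(10, 14, 3, 9), (8, 15, 2, 7)],
    "C2": [(20, 12, 4, 30), (22, 10, 5, 28)],
}
labels = label_clusters(clusters)
```

With these illustrative values, `C1` is dominated by RPM and `C2` by NOx, which mirrors the way Cluster1 and Cluster2 are interpreted in the text.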
Figure 2. Proposed model of weather forecasting using incremental K-means clustering. (Diagram: new data are passed to incremental K-means clustering; new data are clustered into the existing clusters using incremental K-means clustering, and weather prediction can be performed based on the previously calculated nature of those existing clusters.)

Based on the above means, five clusters are produced, and the nature of each cluster depends upon the maximum value of the mean attribute. For example, for the first cluster the mean value of CO2 is the maximum; if the data of an upcoming day are inserted into the first cluster, the weather of that particular day is hot, smoggy and humid due to the effect of CO2.

Suppose the data (10 months) from September 2009 to June 2010 are as shown in 'Table V'.

Table IV. Weather category according to cluster data
Cluster3    75.983607    dusty, fly ash, smoggy, fog, mist
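
The prediction step, in which a new day's record inherits the weather category of the cluster it joins, can be sketched as follows. This is a simplified stand-in and not the incremental K-means algorithm of [1][2], whose details are not reproduced in this section: it assumes nearest-centroid assignment by Euclidean distance followed by a running-mean update of that centroid. The function names, the centroid values and the category strings are hypothetical.

```python
import math

def nearest_cluster(record, centroids):
    """Return the id of the centroid closest to record (Euclidean distance)."""
    return min(centroids, key=lambda cid: math.dist(record, centroids[cid]))

def add_record(record, centroids, counts):
    """Assign record to its nearest cluster and update that cluster's mean."""
    cid = nearest_cluster(record, centroids)
    counts[cid] += 1
    n = counts[cid]
    # Running-mean update: new_mean = old_mean + (x - old_mean) / n
    centroids[cid] = [m + (x - m) / n for m, x in zip(centroids[cid], record)]
    return cid

# Hypothetical state left by the initial clustering (illustrative values).
centroids = {"C1": [9.0, 14.5, 2.5, 8.0], "C2": [21.0, 11.0, 4.5, 29.0]}
counts = {"C1": 2, "C2": 2}
category = {"C1": "dusty, fly ash, smoggy", "C2": "hot, dry and smoggy"}

# A new day's reading is placed in an existing cluster and the
# forecast is read off that cluster's predefined category.
new_day = (19, 12, 5, 26)
cid = add_record(new_day, centroids, counts)
forecast = category[cid]
```

Only the centroid of the receiving cluster changes, so existing clusters and their weather categories do not have to be recomputed from the full database each time a day is added.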
[8] Xiang L., Beth P., Nithya V., Rahul R., Sara G. and Helen C., “Real-Time Storm Detection and Weather Forecast Activation through Data Mining and Events Processing”, Vol. 1, No. 2, May 2008, pp. 49-57.
[13] Hans E., Markus A., Jelle G. van M., “The ETC-ACC framework for base-line scenario development in the context of integrated assessment for air pollution and climate change”, European Topic Centre on Air and Climate Change (ETC/ACC), 2002.