A Logistic Regression Model For Predicting The Occurrence of Intense
1 author:
N. Srivastava
Physical Research Laboratory
All content following this page was uploaded by N. Srivastava on 19 May 2014.
Abstract. A logistic regression model is implemented for predicting the occurrence of intense/super-intense geomagnetic storms. A binary dependent variable, indicating the occurrence of intense/super-intense geomagnetic storms, is regressed against a series of independent model variables that define a number of solar and interplanetary properties of geo-effective CMEs. The model parameters (regression coefficients) are estimated from a training dataset which was extracted from a dataset of 64 geo-effective CMEs observed during 1996–2002. The trained model is validated by predicting the occurrence of geomagnetic storms from a validation dataset, also extracted from the same dataset of 64 geo-effective CMEs, recorded during 1996–2002, but not used for training the model. The model predicts 78% of the geomagnetic storms from the validation dataset. In addition, the model predicts 85% of the geomagnetic storms from the training dataset. These results indicate that logistic regression models can be effectively used for predicting the occurrence of intense geomagnetic storms from a set of solar and interplanetary factors.

Keywords. Solar physics, astrophysics, and astronomy (Flares and mass ejections) – Magnetospheric physics (Solar wind-magnetosphere interaction)

Correspondence to: N. Srivastava ([email protected])

1 Introduction

The study of terrestrial consequences of earthward-directed strong CMEs is known as space weather study. It is now well established that most of the strong geomagnetic storms have their sources in fast-moving halo CMEs (Gosling et al., 1990; Srivastava and Venkatakrishnan, 2002). However, there are exceptional cases where some intense storms cannot be traced back to a single or a full halo CME, as observed by Cane and Richardson (2003); Zhang et al. (2003); Zhao and Webb (2003); Schwenn et al. (2005). Although space-weather prediction has been facilitated by the launch of SoHO (Fleck et al., 1995) and furthermore by ACE (Hovestadt et al., 1995), it is still difficult to forecast the occurrence and intensity of geomagnetic storms. Also, available prediction techniques often result in false alarms, while missing some of the super-intense geomagnetic storms. Most of the available prediction schemes for the strength of the geomagnetic storms rely on inputs from the interplanetary sources of the storms and are based on the work of Burton et al. (1975). They found that the intensity of the storms largely depends on two main parameters of the solar wind, namely the solar wind speed and the southward component of the IMF. Other prediction schemes are, in fact, real-time prediction schemes (Feldstein, 1992; Lundstedt, 1992; Wu and Lundstedt, 1996; Fenrich, 1998; O’Brien and McPherron, 2000) which use the original formula of Burton et al. (1975). While these schemes yield generally accurate predictions, their prior warnings are only a few hours in advance of the actual occurrence of the geomagnetic storm, mainly because they rely on in-situ properties of the solar wind that can only be measured close to the Earth. For example, the ACE measurements, which are made upstream at the L1 point, give 30–60 min of warning time. Greater lead or warning time requires a solar wind monitor further upstream (McPherron et al., 2004). For practical applications, it is necessary to forecast space weather well in advance, so that precautionary measures can be put in place (Feynman and Gabriel, 2000). This involves prediction of a) the strength of a geomagnetic storm soon after the launch of a CME from the Sun, and b) the arrival time of the CME at the Earth. This type of advance forecasting requires the identification of key solar parameters that determine the geo-effectiveness of a CME. Amongst several characteristic features, for example, southward Bz, duration, wind speed and density, Chen et al. (1996, 1997) chose the duration and the magnitude of Bz as important quantities for predicting geo-effectiveness. However, the characteristics of the solar sources of intense/super-intense geomagnetic storms that influence their interplanetary properties and the mode of their
2970 N. Srivastava: Logistic regression model for predicting geomagnetic storms
influence are not yet well understood. Recently, a number of studies on solar sources of severe geomagnetic storms have been carried out to understand the solar-terrestrial relationship (Feynman and Gabriel, 2000; Plunkett et al., 2001; Wang et al., 2002; Zhang et al., 2003; Vilmer et al., 2003; Schwenn et al., 2005; Srivastava, 2005). Such understanding can give us important parameters for forecasting the occurrence of intense/super-intense geomagnetic storms well in advance. Srivastava and Venkatakrishnan (2004) investigated the relationship between the solar and interplanetary parameters of the geo-effective CMEs that were responsible for producing 64 major geomagnetic storms (Dst < −100 nT) at the Earth, recorded during 1996–2002. Their investigations reveal that the intensity of geomagnetic storms is strongly related to the southward component of the interplanetary magnetic field Bz, followed by initial speeds and the ram pressure of the geo-effective CME. They obtained a high correlation coefficient (0.66) between the Dst index and the initial speeds of the CMEs, which indicates that the initial speed of a CME can be a useful parameter for predicting the intensity of a strong geomagnetic storm. A similarly high correlation coefficient (0.64) between the ram pressure values and the Dst indices indicates that the ram pressure plays an important role in the occurrence of an intense/super-intense geomagnetic storm. They also found that, although an interplanetary shock is a good predictor for the arrival of ejecta at the Earth, the shock speeds are not very reliable for predicting the resulting storm intensity. The study also suggests that the strength of a geomagnetic storm can be better predicted soon after the launch of a CME, if one can predict the ram pressure, because high ram pressure compresses the magnetic field of the magnetic cloud and intensifies the southward component Bz, which is a good predictor of geomagnetic storms. This paper builds on the work of Srivastava and Venkatakrishnan (2004) to implement a statistical model for predicting the occurrence of intense/super-intense geomagnetic storms (Dst < −100 nT) based on the observations of 64 geo-effective events recorded during 1996–2002. Unlike previous models, which used only interplanetary measured variables for predicting the occurrence of a geomagnetic storm, the present model incorporates both solar and interplanetary variables, which were identified largely from the study of Srivastava and Venkatakrishnan (2004), to predict the occurrence of major storms.

Statistical models use large databases to establish a mathematical relation between solar and interplanetary parameters and the geomagnetic index (Dst index in the present study). The choice of parameters used in the prediction scheme is based on our current knowledge or understanding of physical processes. In the present study, solar parameters are introduced in the prediction scheme in an attempt to improve the “medium-term” forecasting (McPherron et al., 2004). Also, by using statistical models, one is able to ascertain the relative contribution of various parameters used in the predictive scheme.

In the following sections, we first describe, in brief, the theoretical background of the statistical model, namely a logistic regression model, used in this study. The criteria for developing a viable statistical model are then described. In subsequent sections, the model equation is obtained from both solar and interplanetary observed variables. Lastly, the testing and validation of the model are discussed.

2 Statistical modeling for predicting geomagnetic storms

A statistical model for predicting geomagnetic storms can be defined as a highly simplified, numeric representation of the relation between solar and interplanetary variables and the occurrence of geomagnetic storms. A generalized statistical model can be empirically represented as below:

\[ GMS = [f(SP_i, IP_j), P_k], \tag{1} \]

where GMS represents the occurrence of a geomagnetic storm, SP_i (i = 1 to I) is the ith solar variable, IP_j (j = 1 to J) is the jth interplanetary variable, P_k (k = 1 to K) is the kth parameter of the function f that relates the solar and interplanetary variables to the occurrence of a geomagnetic storm, and I, J and K are the total numbers of solar variables, interplanetary variables and function parameters, respectively. Based on whether the relationship is hypothesized to be linear or nonlinear, a variety of linear and nonlinear functions can be used to approximate the relationship between the variables and the occurrence of a geomagnetic storm (Agresti, 1990; Finney, 1971). Linear/nonlinear dependence of the magnetospheric dynamics has been studied by several workers, for example, Johnson and Wing (2004) and references therein. While models based on machine learning (for example, artificial neural networks and Bayesian network classifiers) use nonlinear functions, several other statistical models, like linear regression and naïve Bayesian models, use linear functions. Nonlinear models generally fit the data more efficiently, but many nonlinear models (for example, neural network-based models) have a black-box type implementation, which means that the model parameters are difficult to interpret for gaining meaningful insights into the data. However, logistic regression offers a nonlinear model with the additional benefit that its parameters can be interpreted for gaining insights into the data, especially about the relative importance of various variables in predicting the occurrence of geomagnetic storms. This can help in understanding the factors that influence the geo-effectiveness of CMEs. Moreover, logistic regression is generally considered suitable for predictive modeling of dichotomous events that can be represented by a binary-state variable (Hosmer and Lemeshow, 2000). Such events generally include the presence/absence or the occurrence/non-occurrence type of events. In view of the interpretability of the model parameters and also because the objective in the present study is to predict the occurrence of an intense/super-intense geomagnetic storm, a logistic regression model for modeling the relation in Eq. (1) was selected.
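The interpretability argument can be made concrete with a short worked derivation. This is a standard property of logistic models in general, not a result specific to this study; the symbols \beta_0, \beta_m and x_m (intercept, coefficient and value of the m-th independent variable) are notation assumed here for illustration.

```latex
% Logistic model with a linear predictor Z (generic notation, assumed here)
\[
  P = \frac{1}{1 + \exp(-Z)}, \qquad Z = \beta_0 + \sum_{m} \beta_m x_m .
\]
% Since 1 - P = \exp(-Z) / (1 + \exp(-Z)), the odds and log-odds are
\[
  \frac{P}{1 - P} = \exp(Z), \qquad \ln\frac{P}{1 - P} = \beta_0 + \sum_{m} \beta_m x_m .
\]
% A unit increase in x_m, with all other variables held fixed, therefore
% multiplies the odds of the event by \exp(\beta_m).
```

The probability P is thus a nonlinear function of the variables, while the log-odds remain linear in the coefficients, which is why the sign and size of each coefficient can be read directly as the direction and strength of a variable's influence.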
Table 1. Coded values of dependent and independent variables of the logistic regression model.

                                                                              Encoding
Names of variables               Type of variable         Measured parameter  Value                                   Code
Dst index                        Binary and dependent     Dst                 −200 to −100 nT                         0
                                                                              <−200 nT                                1
Halos                            Binary and independent   Full halos          360° angular span                       1
                                                          Partial             >140° angular span                      0
                                                          None                                                        −1
Location                         Binary and independent   Location-bin        Within ±40° latitude, ±40° longitude    1
                                                                              Outside ±40° latitude, ±40° longitude   0
Association with other activity  Binary and independent   Flare-bin           Flares                                  1
                                                                              EPs                                     0
Initial speeds                   Numeric and independent  Vi                  Value in km s−1                         −
Southward IMF                    Numeric and independent  Bz                  Value in nT                             −
Total IMF                        Numeric and independent  BT                  Value in nT                             −
Ram pressure                     Numeric and independent  PR                  Value in dynes cm−2                     −
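The coding rules of Table 1 can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in the study: the function and dictionary names are hypothetical, the sample event uses made-up values, and the handling of boundary values (Dst exactly equal to −200 nT, or a source exactly at 40°) is an assumption, since the table does not specify it.

```python
# Hypothetical helpers that mirror the coding rules of Table 1.
HALO_CODES = {"full": 1, "partial": 0, "none": -1}
ACTIVITY_CODES = {"flare": 1, "ep": 0}  # "ep" stands for eruptive prominence


def encode_dst(dst_nt):
    """Return 1 for a super-intense storm (Dst < -200 nT), 0 for an intense one."""
    if dst_nt < -200.0:
        return 1
    if dst_nt < -100.0:
        # Assumption: Dst exactly equal to -200 nT is treated as intense.
        return 0
    raise ValueError("Table 1 only covers storms with Dst < -100 nT")


def encode_location(latitude_deg, longitude_deg):
    """Return 1 if the source lies within +/-40 deg in latitude and longitude."""
    # Assumption: the +/-40 degree boundary itself counts as inside.
    inside = abs(latitude_deg) <= 40.0 and abs(longitude_deg) <= 40.0
    return 1 if inside else 0


def encode_event(dst_nt, halo, latitude_deg, longitude_deg, activity):
    """Build the coded (non-numeric) variables for one event."""
    return {
        "dst_bin": encode_dst(dst_nt),
        "halo_bin": HALO_CODES[halo],
        "location_bin": encode_location(latitude_deg, longitude_deg),
        "flare_bin": ACTIVITY_CODES[activity],
    }


# Illustrative event with made-up values (not taken from the dataset).
example = encode_event(-250.0, "full", 18.0, -25.0, "flare")
print(example)
```

The numeric variables (Vi, Bz, BT and PR) need no coding and would be passed to the regression unchanged.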
Table 2. Estimates of the parameters of the model (maximum likelihood).

Variable      Estimates      Std. dev.    χ2      Pr. > χ2
Intercept     −4.575         1.871        5.977   0.014
Halo-bin      0.489          0.786        0.387   0.534
Flare-bin     0.506          1.008        0.252   0.615
Location-bin  0.305          1.170        0.068   0.795
Vi            0.001          0.001        0.922   0.337
BT            −0.102         0.092        1.213   0.271
Bz            −0.243         0.109        4.937   0.026
PR            2 394 103.2    5 614 666    0.182   0.670

events were missing. The remaining 55 events were divided into two sets, namely, a training set and a validation set. The training dataset, comprising 46 events, which included 16 super-intense storms and 30 intense geomagnetic storms, was used to train a logistic regression model. The trained model was validated on the remaining 9 events, which included 4 super-intense and 5 intense geomagnetic storms.

2.3 Model variables

The intensity of the geomagnetic storm associated with each of the 64 CMEs in the dataset is represented by the Dst index, which is used as the dependent variable in logistic regression. However, it was converted into a binary variable, using a Dst index value of −200 nT as the threshold. The geomagnetic storms with Dst < −200 nT were considered super-intense and coded as 1, while the storms with −200 nT < Dst < −100 nT were considered intense and coded as 0 (Table 1). The solar properties of the geo-effective CMEs in the dataset are defined by the following variables: 1) latitude of the origin of the CME, 2) longitude of the origin of the CME, 3) flare/prominence association, 4) association with full/partial/no halo CME, and 5) initial speeds of the CME (Vi). In order to reduce redundancy, the two locational variables, namely, latitude and longitude, were combined to generate a single binary variable termed source location. Similarly, two other solar variables, viz., flare/prominence association and association with full/partial/no halo CME, were also converted into binary variables. Our study (Srivastava and Venkatakrishnan, 2004) has shown that a large number of the intense storms are associated more with flares than with prominences (75% to 25%). This implies that flares do play an important role as source regions of intense geomagnetic storms. This is also supported by the result that flares generally occur in active regions and that the magnetic energy of the source active region dictates, to some extent, the speed of the ensuing halo CMEs (Venkatakrishnan and Ravindra, 2003). Thus, the knowledge of the association of the CME with a flare or eruptive prominence indirectly suggests the magnetic energy involved, as well as the speeds of the CMEs. The rules used for generating the binary variables are given in Table 1. The fourth solar variable, viz., initial speeds of the CME (Vi), was used as such. In all, four solar variables were used as independent variables in the logistic regression.

The interplanetary properties of the geo-effective CMEs in the dataset are described by the following measured and derived variables: 1) shock speeds (VSH), 2) ram pressure (PR), 3) total value of the IMF (BT), 4) southward component of the IMF (Bz), 5) solar wind speeds before and after the shock (V1 and V2), 6) densities before and after the shock (n1 and n2), and 7) solar wind-magnetospheric coupling parameter (VBz). Out of the above interplanetary variables, only ram pressure (PR), total value of the IMF (BT) and southward component of the IMF (Bz) were used as independent variables in the logistic regression. Shock speed (VSH) was not used because it has a poor Pearson’s correlation coefficient (0.28) with the Dst index (Srivastava and Venkatakrishnan, 2004). Shock speed (VSH) is dependent on the solar wind speeds (V1 and V2) and also densities n1 and n2 before and after the shock and therefore much of the information contained in the variables V1, V2, n1 and n2 may be considered redundant. Similarly, ram pressure, which is dependent on the solar wind speed and density after the shock and shows a good correlation coefficient with the Dst index, was included in the regression analysis. The solar wind-magnetospheric coupling parameter (VBz) is a function of the southward component of the IMF (Bz), and therefore was considered redundant for regression analysis.

2.4 Model training

The logistic regression was trained on the training dataset using XLSTAT software (http://www.xlstat.com). Training comprised estimation of the regression coefficients using an iterative maximum likelihood method. The regression equation obtained after training the model is given below:

\[ P = \frac{1}{1 + \exp(-Z)}, \tag{6} \]

where

\[ Z = -4.57 + 0.488 \times \text{Halo-bin} + 0.51 \times \text{Flare-bin} + 0.30 \times \text{Location-bin} + 7.44 \times 10^{-4} \times V_i - 0.24 \times B_z - 0.10 \times B_T + 2394103.2 \times P_R. \tag{7} \]

The details of the estimates of regression coefficients and a number of statistical properties of the model variables are given in Table 2.

2.5 Model validation

The logistic regression model was validated by using the regression equation (Eq. (4)) to predict the occurrence of intense/super-intense geomagnetic storms for the validation dataset. A threshold value of 0.500 was used to make a classification: if the predicted value is more than 0.500, the event
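Equations (6) and (7), together with the 0.500 threshold, can be evaluated for a single event as sketched below. The coefficients are copied from Eq. (7); the dictionary keys, function names and the two sample events are hypothetical, with made-up values rather than events from the dataset. The sketch also assumes that a probability above the threshold corresponds to code 1 of Table 1 (a super-intense storm).

```python
import math

# Coefficients copied from Eq. (7); the key names are illustrative only.
INTERCEPT = -4.57
COEFFS = {
    "halo_bin": 0.488,
    "flare_bin": 0.51,
    "location_bin": 0.30,
    "v_i": 7.44e-4,     # initial CME speed, km/s
    "b_z": -0.24,       # southward IMF, nT
    "b_t": -0.10,       # total IMF, nT
    "p_r": 2394103.2,   # ram pressure, dynes cm^-2
}


def linear_predictor(event):
    """Z of Eq. (7): intercept plus the weighted sum of the model variables."""
    return INTERCEPT + sum(COEFFS[name] * event[name] for name in COEFFS)


def storm_probability(event):
    """P of Eq. (6): logistic transform of Z."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(event)))


def classify(event, threshold=0.5):
    """Assumption: P above the threshold is read as code 1 (super-intense)."""
    return 1 if storm_probability(event) > threshold else 0


# Two made-up events for illustration (not taken from the dataset).
strong = {"halo_bin": 1, "flare_bin": 1, "location_bin": 1,
          "v_i": 1500.0, "b_z": -30.0, "b_t": 35.0, "p_r": 1.0e-7}
weak = {"halo_bin": 0, "flare_bin": 0, "location_bin": 0,
        "v_i": 500.0, "b_z": -10.0, "b_t": 15.0, "p_r": 1.0e-8}

for label, event in (("strong", strong), ("weak", weak)):
    print(label, round(storm_probability(event), 3), classify(event))
```

Because the coefficient of Bz is negative, a more strongly southward (more negative) Bz raises Z and hence the predicted probability, in line with Bz being the only variable in Table 2 whose coefficient has Pr. > χ2 below 0.05.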
Fenrich, F. R. and Luhmann, J. G.: Geomagnetic response to magnetic clouds of different polarity, Geophys. Res. Lett., 25, 2999–3002, 1998.
Feynman, J. and Gabriel, S. B.: On space weather consequences and predictions, J. Geophys. Res., 105, 10 543–10 564, 2000.
Finney, D. J.: Probit Analysis, 3rd Ed., Cambridge, London and New-York, 1971.
Fleck, B., Domingo, V., and Poland, A.: The SoHO Mission, Solar Phys., 162, 1–2, 1995.
Gosling, J. T., Bame, S. J., McComas, D. J., and Phillips, J. L.: Coronal mass ejections and large geomagnetic storms, Geophys. Res. Lett., 17, 901–904, 1990.
Hosmer, D. W., Jr. and Lemeshow, S.: Applied Logistic Regression, Wiley, 2000, ISBN: 0-471-35632-8.
Hovestadt, D., Hilchenbach, M., and Bürgi, A., et al.: CELIAS Charge Element and Isotope analysis system for SoHO, Solar Phys., 162, 441–481, 1995.
Johnson, R. and Wing, S.: A solar cycle dependence of nonlinearity in magnetospheric activity, J. Geophys. Res., 110, doi:10.1029/2004JA010638, 2004.
Lundstedt, H.: Neural networks and predictions of solar-terrestrial effects, Planet. Space. Sci., 40, 457–464, 1992.
McPherron, R. L., Siscoe, G., and Arge, N.: Probabilistic Forecasting of the 3h ap Index, IEEE transactions on Plasma Science, 32, 1425–1438, 2004.
O’Brien, T. Paul and McPherron, R. L.: An empirical phase space analysis of ring current dynamics: Solar wind control of injection and decay, J. Geophys. Res., 105, 7707–7720, 2000.
Plunkett, S. P., Thompson, B. J., St. Cyr, O. C., and Howard, R. A.: Source regions of coronal mass ejections and their geomagnetic effects, J. Atmos. Sol. Terr. Phys., 63, 389–402, 2001.
Schwenn, R., Dal Lago, A., Huttunen, E., and Gonzalez, W. D.: The association of coronal mass ejections with their effects near the earth, Ann. Geophys., 23, 1033–1059, 2005, SRef-ID: 1432-0576/ag/2005-23-1033.
Srivastava, N. and Venkatakrishnan, P.: Relation between CME speed and geomagnetic storm intensity, Geophys. Res. Lett., 29, 1287–1291, 2002.
Srivastava, N. and Venkatakrishnan, P.: Solar and interplanetary sources of major geomagnetic storms during 1996–2002, J. Geophys. Res., 109(A010103), doi:10.1029/2003JA010175, 2004.
Srivastava, N.: Predicting the occurrence of super-storms, Ann. Geophys., 23, 2989–2995, 2005.
Stone, E. C., Frandsen, A. M., Mewaldt, R. A., Christian, E. R., Margolies, D., Ormes, J. F., and Snow, F.: The Advanced Composition Explorer, Space Sci. Rev., 86, 1/4, 357–408, 1998.
Venkatakrishnan, P. and Ravindra, B.: Relationship between CME speeds and magnetic energy of active regions, Geophys. Res. Lett., 30, 2181–2184, 2003.
Vilmer, N., Pick, M., Schwenn, R., Ballatore, P., and Villain, J. P.: On the solar origin of interplanetary disturbances observed in the vicinity of the earth, Ann. Geophys., 21, 847–862, 2003, SRef-ID: 1432-0576/ag/2003-21-847.
Wang, Y. M., Ye, P. Z., Wang, S., Zhou, G. P., and Wang, J. X.: A statistical study on the geo-effectiveness of earth-directed coronal mass ejections from March 1997 to December 2000, J. Geophys. Res., 107, 1340, doi:10.1029/2002JA009244, 2002.
Wu, J.-G. and Lundstedt, H.: Prediction of geomagnetic storms from solar wind data using Elman recurrent neural networks, Geophys. Res. Lett., 23, 319–322, 1996.
Zhang, J., Dere, K. P., Howard, R. A., and Bothmer, V.: Identification of solar sources of major geomagnetic storms between 1996 and 2000, Astrophys. J., 582, 520–533, 2003.
Zhao, X. P. and Webb, D. F.: Source regions and storm effectiveness of frontside full halo coronal mass ejections, J. Geophys. Res., 108, 1234, doi:10.1029/2002JA009606, 2003.