Fast Counterfactual Explanation for Solar Flare Prediction
Peiyu Li, Omar Bahri, Soukaïna Filali Boubrahimi, Shah Muhammad Hamdi
Department of Computer Science, Utah State University, Logan, UT 84322, USA
Emails: {peiyu.li, omar.bahri, soukaina.boubrahimi, s.hamdi}@usu.edu
I. INTRODUCTION
A solar flare is an intense burst of radiation coming from the release of magnetic energy associated with sunspots [1]. X-rays and UV radiation emitted by solar flares can cause electromagnetic disturbances on Earth, such as disruptions to radio frequency communications and power line transmissions [2].

Fig. 1. NASA's Solar Dynamics Observatory captured this image of an X2.0-class solar flare bursting off the lower right side of the Sun on Oct. 27, 2014. The image shows a blend of extreme ultraviolet light with wavelengths of 131 and 171 Angstroms. Image Credit: NASA/SDO
In recent years, the success of supervised machine learning (ML) methods, especially deep neural networks, for solar flare prediction has been verified by experts in the space weather domain [3]–[7]. However, the interpretability of the decision-making process behind solar flare prediction cannot be guaranteed: some ML models exhibit high performance but are opaque in terms of explainability. Some AI researchers argue that explanation is not essential for all AI applications, since it is too difficult to achieve and unnecessary in certain applications [8]. However, for critical applications in the space weather domain such as solar flare prediction, it is vital for human beings to understand, trust, and apply these AI systems. Interpretability is therefore of crucial importance when predicting solar flare events with potentially hazardous impacts [1].

The X- and M-classes of solar flares are the intense classes most often targeted in solar flare prediction. As most flares occur in the Active Regions of the Sun, flare prediction can be modeled as a supervised machine learning problem, specifically the binary classification between flaring and non-flaring Active Regions (AR), where flaring Active Regions are considered the positive class and non-flaring Active Regions are considered the negative class [2]. In this work, as positive class examples, we consider the Active Regions that produce one or more M-class or X-class flares during their crossing of the observable solar disk. The Active Regions that never flare during the disk crossing (not even with C-class flares) are considered negative class examples. For this binary classification between flaring and non-flaring Active Regions, we provide post-hoc counterfactual explanations of the solar flare prediction. In particular, a counterfactual instance is defined as a synthetic instance for which a trained machine learning model predicts the desired output, which is different from the prediction made on the query instance [9]. In the case of solar flare prediction, if a given instance is predicted by a classifier as a flaring Active Region, what changes can be made to that instance to obtain a different prediction, i.e., a non-flaring Active Region? We define the given instance as the query instance, the instance that has been changed to obtain a different prediction as the counterfactual instance, the label of the query instance as the query label, and the label of the counterfactual instance as the desired label. To generate the counterfactual instance
B. Desirable properties of a counterfactual instance

According to [19], to provide a useful, plausible alternative for the query instance, a counterfactual instance should obey the following desirable properties:
1) Validity: The prediction of the to-be-explained model f on the counterfactual instance X′ needs to be different from the prediction of the to-be-explained model f on the query instance X (i.e., if f(X) = ci and f(X′) = cj, then ci ≠ cj).
2) Proximity: The to-be-explained query needs to be close to the generated counterfactual instance, which means the distance between X′ and X should be minimal.
3) Sparsity: The perturbation δ changing the query instance X into X′ = X + δ should be sparse, which means that changing fewer data points to obtain the counterfactual explanation is preferred.
4) Contiguity: The counterfactual instance X′ = X + δ needs to be perturbed in a single contiguous segment, which makes the solution semantically meaningful.
5) Interpretability: The counterfactual X′ needs to be in-distribution. We consider an instance X′ interpretable if it lies close to the model's training data distribution. X′ should be an inlier with respect to the training dataset and an inlier with respect to the counterfactual class.
6) Model-agnosticism: The counterfactual explanation model should produce a solution independent of the classification model f; high-quality counterfactuals should be generated without prior knowledge of gradient values derived from optimization-based classification models.

In our experimental evaluation, we take these properties into consideration to verify the superiority of our method over existing state-of-the-art explainability methods.
IV. FAST COUNTERFACTUAL EXPLANATION (FAST-CF) FOR SOLAR FLARE PREDICTION

In this section, we describe our proposed fast counterfactual explanation method for solar flare prediction in detail. The method includes two main steps: 1) retrieve the nearest unlike neighbor, and 2) adapt the nearest unlike neighbor to generate counterfactual instances. The process of FAST-CF generation is shown in Figure 2, and the algorithm is shown in Algorithm 1.

A. Retrieve the nearest unlike neighbor

Given a query instance X, we find a counterfactual instance candidate Xc that exists in the training dataset. An example of one such instance is the query's nearest unlike neighbor. Because this nearest unlike neighbor comes from the training dataset and its label is the desired label, it guarantees the explanation's interpretability property, as it is, by definition, within the distribution. However, such instances are not guaranteed to satisfy the proximity, sparsity, and contiguity properties. Therefore, an adaptation step is necessary to satisfy the remaining properties.

B. Adapt the nearest unlike neighbor to generate counterfactual instances

To generate a counterfactual instance that satisfies the validity, proximity, sparsity, and contiguity properties, we find the most important top-k dimensions by comparing the per-dimension distance between the query instance and its nearest unlike neighbor. If the distance between the query instance and its nearest unlike neighbor is relatively large for a specific dimension, we consider that dimension an important dimension to focus on. We then substitute the top-k dimensions from the nearest unlike neighbor such that the classification label changes to the desired class. To guarantee the proximity and validity properties at the same time, we treat k as a parameter and set it to the minimum value that makes the counterfactual instance be classified into the desired class.

In addition, substituting the top-k dimensions from the nearest unlike neighbor guarantees that the counterfactual instance we generate is perturbed in a few contiguous segments and is sparse, instead of changing the whole time series, as follows:

X = <x1, x2, x3, x4, x5, ..., xd>   s.t. f(X) = c      (1)

X′ = <x′1, x2, x′3, x4, x′5, ..., xd>   s.t. f(X′) = c′      (2)

Algorithm 1 Fast Counterfactual Explanation for Solar Flare Prediction
Input: Training set T, query set samples, prediction model for solar flare time series data f, the number of dimensions of the query instance d
Output: CF, counterfactual instances for the query instances
 1: CF = ∅
 2: for X ← samples do
 3:   Xc = nearest unlike neighbor from T
 4:   Dists = []
 5:   for dimension i from 1 to d do
 6:     Dist = np.sum(X[i, :] - Xc[i, :])
 7:     Dists.append(Dist)
 8:   k = 0, X′ = X                ▷ k is initialized to 0 and the counterfactual instance is initialized as X
 9:   while f(X′) ≠ c′ do
10:     k = k + 1
11:     idx = np.argpartition(Dists, -k)[-k:]
12:     indices = idx[np.argsort((-np.array(Dists))[idx])]   ▷ find the k dimensions with the largest distances
13:     for index ← indices do
14:       X′[index, :] = Xc[index, :]                         ▷ replace the top k dimensions
15:   CF.add(X′)
16: return CF
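To make the procedure concrete, the following is a minimal NumPy sketch of the two FAST-CF steps. It mirrors Algorithm 1 rather than reproducing the authors' code: the classifier f is assumed to expose a predict method that returns a class label for a single multivariate series, the Euclidean distance used to retrieve the nearest unlike neighbor is an illustrative choice (the paper does not fix a metric), and the k < d guard is an added safety stop.

import numpy as np

def nearest_unlike_neighbor(X, train_X, train_y, desired_label):
    # Candidate pool: training instances carrying the desired (unlike) label.
    candidates = train_X[train_y == desired_label]
    # Euclidean distance on the flattened series (illustrative metric choice).
    dist = np.linalg.norm(candidates.reshape(len(candidates), -1) - X.reshape(1, -1), axis=1)
    return candidates[np.argmin(dist)]

def fast_cf(X, X_c, f, desired_label):
    # X and X_c have shape (d, t): d dimensions (parameters), t time steps.
    d = X.shape[0]
    # Per-dimension distance between query and nearest unlike neighbor (Algorithm 1, line 6).
    dists = np.array([np.sum(X[i, :] - X_c[i, :]) for i in range(d)])
    k, X_cf = 0, X.copy()
    # k < d is a safety stop added here; Algorithm 1 loops until the label flips.
    while f.predict(X_cf) != desired_label and k < d:
        k += 1
        # Indices of the k dimensions with the largest distances, in descending order.
        idx = np.argpartition(dists, -k)[-k:]
        indices = idx[np.argsort((-dists)[idx])]
        # Substitute the selected dimensions from the nearest unlike neighbor.
        X_cf[indices, :] = X_c[indices, :]
    return X_cf

A query would then be explained with X_cf = fast_cf(X, nearest_unlike_neighbor(X, train_X, train_y, desired_label), f, desired_label), where train_X, train_y, and the predict interface are placeholders for the user's own data and model.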
Fig. 2. The process of FAST-CF generation

V. EXPERIMENTAL SETTING

A. Data sets description

In our experiments, we use a solar flare prediction dataset published by the Data Mining Lab of Georgia State University [1]. Each sample in this dataset is a multivariate time series in which each parameter has 60 values, representing observations taken at 12-minute intervals. The first value in the array is the observation taken at the point in time furthest from the prediction period, and the last value is the observation taken at the point in time closest to the prediction period. The period of observations represented by each labeled time series is a 12-hour window sliced from a longer time series. Each multivariate time series includes 33 solar magnetic field parameters. The dataset consists of 5 classes, namely X, M, C, B, and Q, where Q represents flare-quiet regions in which no flare was detected within the observation period. To conduct a binary counterfactual explanation study, the X- and M-class samples are considered the positive class, representing the flaring active regions, while the C-, B-, and Q-class samples are considered the negative class, representing the non-flaring regions. The class distribution of the original dataset is imbalanced; to address this issue, we use undersampling to generate a balanced dataset.
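As an illustration of this preprocessing, here is a small sketch of the label binarization and undersampling step, assuming the samples are loaded as a NumPy array of shape (n_samples, 33, 60) with string class labels; the exact undersampling strategy (random, without replacement) is an assumption, since the paper only states that undersampling is used.

import numpy as np

def binarize_and_balance(X, labels, seed=0):
    # Flaring (X- and M-class) -> positive class 1; C-, B-, Q-class -> negative class 0.
    y = np.isin(labels, ['X', 'M']).astype(int)
    rng = np.random.default_rng(seed)
    pos_idx, neg_idx = np.where(y == 1)[0], np.where(y == 0)[0]
    # Undersample the majority class so both classes end up the same size.
    n = min(len(pos_idx), len(neg_idx))
    keep = np.concatenate([rng.choice(pos_idx, n, replace=False),
                           rng.choice(neg_idx, n, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]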
B. Baseline methods

We evaluated our proposed method against two other baselines, Alibi [20] and the Native guide counterfactual [15].
• Alibi Counterfactual (Alibi): Alibi generates counterfactual explanations by optimizing an objective function,

L = Lpred + λ·Ldist,      (3)

where the first loss term Lpred guides the search towards points X′ that would change the model prediction and the second term Ldist ensures that X′ stays close to X. This form of loss has a single hyperparameter λ weighing the contributions of the two competing terms (a sketch of this objective is given after this list).
• Native guide counterfactual (NG-CF): NG-CF is another baseline against which we compare our proposed FAST-CF method. NG-CF uses Dynamic Barycenter Averaging (DBA) of the query time series x and the nearest unlike neighbor from another class to generate the counterfactual example [15].
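The following is a minimal sketch of the kind of gradient-based search that the objective in Eq. (3) describes, written with TensorFlow's GradientTape for a Keras model. It is not the Alibi library's implementation (Alibi's API and additional loss terms are not reproduced here), and the squared-probability form of Lpred and the L1 form of Ldist are illustrative assumptions.

import tensorflow as tf

def gradient_cf_search(model, x, target_class, lam=0.1, steps=500, lr=0.01):
    # x: query instance with a batch dimension, shape (1, time_steps, dimensions).
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_cf = tf.Variable(x)
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            p_target = model(x_cf, training=False)[:, target_class]
            l_pred = tf.reduce_mean((1.0 - p_target) ** 2)  # push the prediction toward the target class
            l_dist = tf.reduce_sum(tf.abs(x_cf - x))        # keep the counterfactual close to the query
            loss = l_pred + lam * l_dist                    # Eq. (3): L = Lpred + lambda * Ldist
        grads = tape.gradient(loss, [x_cf])
        opt.apply_gradients(zip(grads, [x_cf]))
    return x_cf.numpy()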
C. Prediction Model Details

For fairness, we evaluated all the aforementioned counterfactual baselines on the same predictive model f. In particular, we used a convolutional neural network model that consists of two convolutional layers with 128 and 64 one-dimensional filters, respectively, and ReLU activations. Each convolutional layer is followed by a max-pooling layer. Dropout with a fraction of 30% is applied during training. The output of the second pooling layer is flattened and fed into a fully connected layer of size 256 with ReLU activation and 50% dropout. This dense layer is followed by a softmax output layer over the number of classes. The model is trained using an Adam optimizer with a batch size of 32.
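A Keras sketch of a model matching this description is shown below; kernel sizes, pooling sizes, the placement of the 30% dropout, and the number of training epochs are not reported in the paper, so the values used here are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_flare_cnn(n_steps=60, n_dims=33, n_classes=2):
    model = keras.Sequential([
        layers.Input(shape=(n_steps, n_dims)),
        layers.Conv1D(128, kernel_size=3, activation="relu"),  # first 1-D conv layer
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(64, kernel_size=3, activation="relu"),   # second 1-D conv layer
        layers.MaxPooling1D(pool_size=2),
        layers.Dropout(0.3),                                   # 30% dropout (placement assumed)
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),                                   # 50% dropout
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described in the text: model.fit(X_train, y_train, batch_size=32, epochs=...)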
D. Experimental result

In this section, we use different evaluation metrics to compare FAST-CF with the other two baselines with respect to the desirable properties of counterfactual instances discussed in Section III-B. Since the data used in our experiments are multivariate time series, for ease of understanding we flatten the multivariate time series into one-dimensional data and then apply the evaluation metrics on the flattened data. In addition, for each evaluation metric, the result we report is the average
value among the whole dataset. The details of each evaluation metric are given below.

TABLE I. Comparing the performances of the NG-CF, Alibi, and FAST-CF models in terms of L1 distance and the target probability (the winner is bolded).

Method      L1 distance (Mean / Std)      Target probability (Mean / Std)
NG-CF       1.68e+12 / 3.22e+13           1.0 / 0
ALIBI       495.16 / 376.69               0.56 / 0.11
FAST-CF     69.03 / 69.79                 1.0 / 0

L1 distance measures the distance between the counterfactual instance and the query instance; a smaller L1 distance is desired. Table I shows that our proposed FAST-CF method achieves the minimum L1 distance compared with the other two baselines.

The sparsity level indicates the level of time series perturbation. A sparsity level approaching 100% is desirable, meaning that the perturbation made to X to achieve X′ is minimal. We computed the sparsity level using Equations 4-5. From the blue dashed-line plot in Figure 3, we can see that our proposed FAST-CF performs best in terms of sparsity level compared with the other two baselines.

sparsity = 1 − (Σ_{i=0}^{len(X)} g(X′_i, X_i)) / len(X)      (4)

g(x, y) = 1 if x ≠ y, 0 otherwise      (5)

The number of independent non-contiguous segments is also investigated to assess contiguity; the lower the number of independent non-contiguous segments, the better. From the red bar plot in Figure 3, we can see that our proposed FAST-CF method results in the minimum number of independent non-contiguous segments.

In addition, we define the validity metric by comparing the target class probability of the prediction for the counterfactual explanation result; the closer the target class probability is to 1, the better. From Table I, we can see that FAST-CF achieves a target class probability of 1.0, which is much larger than the target class probability achieved by ALIBI.
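A short sketch of how these metrics can be computed on flattened series is given below, assuming NumPy arrays; the exact-equality test follows g(x, y) in Eq. 5, and the segment-counting logic is one straightforward reading of "independent non-contiguous segments" rather than the authors' exact implementation.

import numpy as np

def l1_distance(x, x_cf):
    # L1 distance between the flattened query and counterfactual (smaller is better).
    return np.sum(np.abs(x.ravel() - x_cf.ravel()))

def sparsity(x, x_cf):
    # Eqs. 4-5: fraction of unchanged data points (closer to 100% is better).
    x, x_cf = x.ravel(), x_cf.ravel()
    return 1.0 - np.sum(x != x_cf) / len(x)

def num_segments(x, x_cf):
    # Number of independent non-contiguous perturbed segments (fewer is better).
    changed = (x.ravel() != x_cf.ravel()).astype(int)
    # A segment starts at index 0 if that point is changed, or wherever a 0 -> 1 transition occurs.
    return int(changed[0] + np.sum((changed[1:] == 1) & (changed[:-1] == 0)))

def target_probability(class_probs, desired_class):
    # Validity: the predicted probability of the desired class for the counterfactual.
    return float(class_probs[desired_class])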
To assess interpretability, we show the transformed time series vector representations of the original query set in Figure 4, and the transformed time series vector representations of the original query set together with the counterfactual explanations generated by FAST-CF in Figure 5. In Figure 5, the red stars show the two-dimensional embeddings of the generated counterfactual instances, while the circle markers show the original query set data points. By comparing the two figures, we can see that the original query set data points and the generated counterfactual instances overlap heavily, which means that the counterfactual instances generated by FAST-CF lie within the distribution of the original query set.
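A sketch of how such a projection can be produced is shown below. The paper cites two-dimensional PCA [21] and Figure 5 is labeled a 2-D PCA projection; standard PCA on the flattened series (via scikit-learn) is used here as an approximation, which is an assumption about the exact projection method.

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_projection(query_set, counterfactuals):
    # Flatten each multivariate series to a single vector before projecting.
    Q = query_set.reshape(len(query_set), -1)
    C = counterfactuals.reshape(len(counterfactuals), -1)
    pca = PCA(n_components=2).fit(Q)            # fit on the original query set
    q2d, c2d = pca.transform(Q), pca.transform(C)
    plt.scatter(q2d[:, 0], q2d[:, 1], marker="o", label="query set")
    plt.scatter(c2d[:, 0], c2d[:, 1], marker="*", color="red", label="FAST-CF counterfactuals")
    plt.legend()
    plt.show()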
Fig. 5. 2-D PCA projection of the original query set (training data) and the counterfactual explanations. The red stars represent the counterfactual instances generated by FAST-CF.

Fig. 6. An example of an original query instance and its counterfactual instance (flattened to one dimension).

VI. CONCLUSION

Existing work that focuses on generating counterfactual explanations for tabular data cannot be applied directly to multivariate time series data. We address the high-dimension challenge by proposing FAST-CF, which incorporates the nearest unlike neighbor to guide the counterfactual search. By focusing on a small set of important dimension substitutions, FAST-CF guides the perturbations of the query solar flare prediction data, resulting in significantly sparser and more contiguous explanations than the other baseline methods. To the best of our knowledge, this is the first effort to focus on a small set of dimension substitutions while generating counterfactual explanations for multivariate TSC. There is room to extend our work on counterfactual explanations for solar flare prediction with high dimensions. As a future direction, we would like to apply our method to multivariate time series data from other domains with different dimension complexity.

REFERENCES

[1] R. A. Angryk, P. C. Martens, B. Aydin, D. Kempton, S. S. Mahajan, S. Basodi, A. Ahmadzadeh, X. Cai, S. Filali Boubrahimi, S. M. Hamdi et al., "Multivariate time series dataset for space weather data analytics," Scientific Data, vol. 7, no. 1, pp. 1–13, 2020.
[2] S. M. Hamdi, D. Kempton, R. Ma, S. F. Boubrahimi, and R. A. Angryk, "A time series classification-based approach for solar flare prediction," in 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017, pp. 2543–2551.
[3] X. Huang, H. Wang, L. Xu, J. Liu, R. Li, and X. Dai, "Deep learning based solar flare forecasting model. I. Results for line-of-sight magnetograms," The Astrophysical Journal, vol. 856, no. 1, p. 7, 2018.
[4] N. Nishizuka, K. Sugiura, Y. Kubo, M. Den, and M. Ishii, "Deep Flare Net (DeFN) model for solar flare prediction," The Astrophysical Journal, vol. 858, no. 2, p. 113, 2018.
[5] E. Park, Y.-J. Moon, S. Shin, K. Yi, D. Lim, H. Lee, and G. Shin, "Application of the deep convolutional neural network to the forecast of solar flare occurrence using full-disk solar magnetograms," The Astrophysical Journal, vol. 869, no. 2, p. 91, 2018.
[6] Y. Chen, W. B. Manchester, A. O. Hero, G. Toth, B. DuFumier, T. Zhou, X. Wang, H. Zhu, Z. Sun, and T. I. Gombosi, "Identifying solar flare precursors using time series of SDO/HMI images and SHARP parameters," Space Weather, vol. 17, no. 10, pp. 1404–1426, 2019.
[7] K. Domijan, D. S. Bloomfield, and F. Pitié, "Solar flare forecasting from magnetic feature properties generated by the Solar Monitor Active Region Tracker," Solar Physics, vol. 294, no. 1, pp. 1–19, 2019.
[8] D. Gunning, M. Stefik, J. Choi, T. Miller, S. Stumpf, and G.-Z. Yang, "XAI—explainable artificial intelligence," Science Robotics, vol. 4, no. 37, p. eaay7120, 2019.
[9] A. Van Looveren, J. Klaise, G. Vacanti, and O. Cobb, "Conditional generative models for counterfactual explanations," arXiv preprint arXiv:2101.10123, 2021.
[10] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why should I trust you?" Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
[11] R. Guidotti, A. Monreale, S. Ruggieri, D. Pedreschi, F. Turini, and F. Giannotti, "Local rule-based explanations of black box decision systems," arXiv preprint arXiv:1805.10820, 2018.
[12] S. M. Lundberg and S.-I. Lee, "A unified approach to interpreting model predictions," Advances in Neural Information Processing Systems, vol. 30, 2017.
[13] M. Schleich, Z. Geng, Y. Zhang, and D. Suciu, "GeCo: Quality counterfactual explanations in real time," arXiv preprint arXiv:2101.01292, 2021.
[14] S. Wachter, B. Mittelstadt, and C. Russell, "Counterfactual explanations without opening the black box: Automated decisions and the GDPR," Harv. JL & Tech., vol. 31, p. 841, 2017.
[15] E. Delaney, D. Greene, and M. T. Keane, "Instance-based counterfactual explanations for time series classification," in International Conference on Case-Based Reasoning. Springer, 2021, pp. 32–47.
[16] A. Mahendran and A. Vedaldi, "Understanding deep image representations by inverting them," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 5188–5196.
[17] K. Yi, Y.-J. Moon, D. Lim, E. Park, and H. Lee, "Visual explanation of a deep learning solar flare forecast model and its relationship to physical parameters," The Astrophysical Journal, vol. 910, no. 1, p. 8, 2021.
[18] E. Ates, B. Aksar, V. J. Leung, and A. K. Coskun, "Counterfactual explanations for multivariate time series," in 2021 International Conference on Applied Artificial Intelligence (ICAPAI). IEEE, 2021, pp. 1–8.
[19] A. V. Looveren and J. Klaise, "Interpretable counterfactual explanations guided by prototypes," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2021, pp. 650–665.
[20] J. Klaise, A. V. Looveren, G. Vacanti, and A. Coca, "Alibi Explain: Algorithms for explaining machine learning models," Journal of Machine Learning Research, vol. 22, no. 181, pp. 1–7, 2021. [Online]. Available: http://jmlr.org/papers/v22/21-0017.html
[21] J. Yang, D. Zhang, A. F. Frangi, and J.-y. Yang, "Two-dimensional PCA: A new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.