Graph Sample and Aggregate-Attention Network For Hyperspectral Image Classification
Abstract: Graph convolutional networks (GCNs) have shown potential in hyperspectral image (HSI) classification. However, GCN is a transductive learning method, which makes it difficult to aggregate new nodes. In addition, the available GCN-based methods fail to capture the global and contextual information of the graph. To address these deficiencies, a novel semisupervised network based on graph sample and aggregate-attention (SAGE-A) is proposed for HSI classification. Unlike GCN-based methods, SAGE-A adopts a multilevel graph sample and aggregate (graphSAGE) network, which can flexibly aggregate new neighbor nodes among arbitrarily structured non-Euclidean data and capture long-range contextual relations. Inspired by the self-attention mechanism of convolutional neural networks (CNNs), the proposed network uses a graph attention mechanism to characterize the importance among spatially neighboring regions, so that the deep contextual and global information of the graph can be learned automatically by focusing on important spatial targets. Extensive experimental results on different real hyperspectral data sets demonstrate the performance of the proposed method in comparison with state-of-the-art methods.

Index Terms: Global and contextual information, graph convolution neural network, hyperspectral image (HSI) classification.

I. INTRODUCTION

Hyperspectral images (HSIs) provide detailed spectral information through hundreds of narrow spectral channels, which can be used to accurately classify diverse materials of interest [1], [2]. However, the increased dimensionality of such data poses a challenge to conventional techniques, and hyperspectral classification therefore has great research value.

In the past few decades, significant efforts have been devoted to HSI classification. They can be summarized into two categories: traditional methods and neural network methods. Traditional methods have focused on exploring more discriminative feature representations, such as morphological features and texture features [3]. Apart from these, subspace learning, sparse learning algorithms, and machine learning methods, such as random forest and support vector machine (SVM) [4], [5], have received great attention in the community. However, traditional methods extract incomplete features and may suffer from overfitting because of the shortage of training samples.

Deep learning has achieved great success in many applications, and it has also greatly promoted the technological progress of HSI classification. For example, in [6] and [7], the HSIs were classified using convolutions of different dimensions; Liu et al. [8] introduced a spectral–spatial feature extraction method based on the long short-term memory (LSTM) network. Zhang et al. [9] used image semantic context to classify HSIs. He et al. [10] adopted residual networks to learn the spatial and spectral characteristics of the image to improve the classification rate. In [11], an unsupervised spectral–spatial feature extraction network was proposed. However, the convolution neural network (CNN) requires a large number of training labels and a large amount of computation. In addition, the CNN only performs convolution on regular regions. Furthermore, the size of the CNN convolution kernel is fixed, which leads to a loss of edge information in the process of feature extraction [12].

To ameliorate these issues, extensive research has been conducted on classification using graph convolution networks (GCNs). The GCN conducts semisupervised learning on graph-structured data and can operate on graph signals directly via a variant of CNNs. Sha et al. [13] applied the graph attention network to hyperspectral classification. Mou et al. [14] proposed a nonlocal graph convolution network, which constructs a graph by calculating the relationship between nonadjacent pixels to improve the classification accuracy. Hong et al. [15] proposed a graph convolution classification method combining GCN and CNN to increase the classification accuracy. Wan et al. [16] used a multiscale graph convolutional network to extract multiscale graph features. Wan et al. [17] adopted a context-aware mechanism to learn the local contextual information of the graph. All of these methods are GCN-based. However, GCN is a transductive learning method that trains on the whole graph, which makes it difficult to aggregate new nodes and incurs a huge amount of computation.

The main contributions of this letter are as follows: 1) incorporation of sample and aggregate (SAGE), for the first time, for extracting contextual relations among superpixels; 2) utilization of a multilevel graph projection and flexible reprojection framework for extracting long-range contextual relations and producing truthful local-region features; and 3) adoption of attention-mechanism graph refinement for characterizing global and contextual relations and accurately finding precise region representations.

II. RELATED WORK

Many researchers have published methods to classify HSIs. In this part, we mainly introduce the graph neural network (GNN) method, which is closely related to our work.

Manuscript received February 8, 2021; accepted February 22, 2021. Date of publication March 15, 2021; date of current version December 28, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 41404022 and in part by the National Natural Science Foundation of Shanxi Province under Grant 2015JM4128. (Xiaofeng Zhao is co-first author.) (Corresponding author: Xiaofeng Zhao.)
The authors are with the Xi’an Research Institute of High Technology, Xi’an 710000, China (e-mail: [email protected]).
Digital Object Identifier 10.1109/LGRS.2021.3062944
1558-0571 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Southeast University. Downloaded on May 16,2023 at 06:45:21 UTC from IEEE Xplore. Restrictions apply.
5504205 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 19, 2022
DING et al.: GRAPH SAGE-A NETWORK FOR HSI CLASSIFICATION 5504205
Fig. 2. Schematic of the graph attention mechanism [see (9)]. $l$ denotes the $l$th layer of the network. Different colors of the nodes represent different land-cover type features.

a further layer, $h_u$ is the feature vector of the node $u$, $\{h_u^{k-1}, \forall u \in N(v)\}$ denotes the embeddings of the neighbors $u$ of the node $v$ at layer $k-1$, and $h_v^{k}$ represents the aggregated feature of all neighbors of node $v$ at layer $k$. In this letter, the aggregator function (AGG) can be expressed as $\mathrm{AGG} = \sum_{u \in N(v)} h_u^{k-1} / |N(v)|$. From Algorithm 1, for each iteration or search depth, the nodes collect information from their local neighbors, and as the process iterates, the nodes gradually gain more and more information from farther reaches of the graph. Thus, long-range contextual relations are extracted.

C. Graph Attention

In the experiment, we find that the association degree between different nodes is different. To better extract the global and contextual information, a graph attention mechanism is added to the network so that important node information receives greater weight. The graph attention mechanism can obtain the global geometric features by calculating the relationship between any two nodes in the graph. To obtain the transformation between the input and the output, the output features must be computed from the input features by at least one linear transformation. A weight matrix is trained for all nodes: $W \in \mathbb{R}^{F' \times F}$, which relates the $F$ input features to the $F'$ output features. Node-to-node correlation can be learned through the network layer

$$ e_{ij} = \mathrm{LeakyReLU}\left(a^{T}\left[W h_i \,\|\, W h_j\right]\right). \tag{7} $$

Equation (7) gives the importance of node $j$ to node $i$, $a^{T} \in \mathbb{R}^{2F'}$ is the parameter vector of the network, $\|$ denotes the concatenation operation, and LeakyReLU(·) is a nonlinear activation.

Then, $e_{ij}$ is normalized and converted to a probability output $a_{ij}$ through a softmax function

$$ a_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W h_i \,\|\, W h_j\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^{T}\left[W h_i \,\|\, W h_k\right]\right)\right)}. \tag{8} $$

Therefore, the graph convolution output of each node can be expressed as follows:

$$ h_i^{l} = \sigma\left(\sum_{j \in N_i} a_{ij} \cdot W h_j^{l-1}\right) \tag{9} $$

where $\sigma$ denotes the activation function, and $a_{ij}$ is the learned attention weight.

Fig. 2 shows the working process of the graph attention mechanism in SAGE-A. By learning the importance weight of each node to the classified node, the graph attention mechanism gives the important nodes greater weight, and hence, global and contextual information can be learned from the graph via an attention mechanism.

D. Region-to-Pixel Assignment

Multiscale information has been widely shown to be very useful for HSI classification [18], [19]. Ground objects have different geometric features, and multilevel feature extraction can fully learn the contextual information of the image. The network uses multilayer graphSAGE to learn the relationship between superpixels of different scales, as in Algorithm 1. Fig. 3 demonstrates the 1-hop and 2-hop neighbors of a central example A to illustrate the multilevel design. Then, the receptive field of A at the scale $s$ is formed as

$$ H^{s}(x_i) = H^{1}\left(H^{s-1}(x_i), x^{s-1}\right) \tag{10} $$

where $H^{0}(x_i) = x_i$, and $H^{1}(x_i)$ is the new node embedding of the 1-hop neighbors of $x_i$. Considering that the information association degree of different nodes is different, we use (10) to analyze the association degree of the learned information. The network output is expressed as follows:

$$ O = A\left(H^{s}(x_i)\right) \tag{11} $$

where $A$ is the attention mechanism, and $O$ is the output of SAGE-A. In our network, the cross-entropy error is adopted to penalize the difference between the network output and the labels of the original labeled examples, namely

$$ L = -\sum_{z \in y_G} \sum_{f=1}^{C} Y_{zf} \ln O_{zf} \tag{12} $$

where $y_G$ is the set of labeled examples, $C$ denotes the number of classes, and $Y_{zf}$ is the label matrix. The details of our SAGE-A are shown in Algorithm 2. The input features of SAGE-A are the average spectral signatures of the graph nodes, which enables the network to process the spectral information of HSIs. At the same time, the SAGE method is adopted to process the spatial relationship of the nodes in the graph network, so that the model can learn the long-range spatial information of the HSIs, and the graph attention mechanism is used to process the overall information of the graph to learn the global and contextual information of each node in the graph.

IV. EXPERIMENTAL RESULTS

A. Data Set Description and Implementation

Two real hyperspectral data sets, Pavia University (PU) and Houston 2013, are adopted to verify the classification performance of our proposed method. The first data set, PU, contains 610 × 340 pixels and 103 bands. It includes a large number of background pixels, and 42 776 pixels can be used for classification. The whole map contains nine kinds of features. The second data set, Houston 2013, was used in the 2013 Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest. The Houston 2013 scene is composed of 144 spectral bands and 349 × 1905 pixels. These pixels are divided into 15 categories.

B. Experimental Setting

For the two HSI data sets described in Section IV-A, 30 labeled pixels in each class are randomly selected for
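The mean aggregator AGG and the hop-by-hop propagation described in Section III-B can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `mean_aggregate` and `propagate` are hypothetical, and the neighbor sampling, learned weight matrices, and nonlinearity of graphSAGE are deliberately omitted so that only the averaging step remains. Every node is assumed to have at least one neighbor.

```python
from typing import Dict, List

Vector = List[float]


def mean_aggregate(neighbor_feats: List[Vector]) -> Vector:
    """AGG = sum_{u in N(v)} h_u^{k-1} / |N(v)|, computed element-wise."""
    count = len(neighbor_feats)
    dim = len(neighbor_feats[0])
    return [sum(h[d] for h in neighbor_feats) / count for d in range(dim)]


def propagate(feats: Dict[str, Vector],
              neighbors: Dict[str, List[str]],
              depth: int) -> Dict[str, Vector]:
    """Apply `depth` rounds of mean aggregation over a fixed graph.

    Simplifying assumption: each node's state is replaced by the mean of
    its neighbors' states; no weights or activation are applied.
    """
    current = feats
    for _ in range(depth):
        # Each round uses only the states produced by the previous round.
        current = {v: mean_aggregate([current[u] for u in neighbors[v]])
                   for v in neighbors}
    return current


if __name__ == "__main__":
    # Path graph a - b - c; only node a carries a nonzero signal.
    nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    feats = {"a": [1.0], "b": [0.0], "c": [0.0]}
    print(propagate(feats, nbrs, 1))
    print(propagate(feats, nbrs, 2))
```

On this three-node path, node c is unaffected by node a after one round and receives part of its signal after two rounds, which is the sense in which deeper search depths reach farther parts of the graph.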
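The attention computation of Section III-C, that is, the scores in (7), their softmax normalization in (8), and the weighted neighbor sum in (9), can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the LeakyReLU slope of 0.2 is an assumed value, $\sigma$ is taken to be ReLU because the letter does not specify it, and (9) is read as a sum over the neighbor features $h_j$.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, h: Vector) -> Vector:
    """Linear transformation W h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    # The slope is an assumed value; the letter does not state it.
    return x if x > 0 else slope * x


def attention_weights(W: Matrix, a: Vector, h: List[Vector],
                      i: int, neighbors_i: List[int]) -> List[float]:
    """Scores e_ij as in (7), normalized over N_i by softmax as in (8)."""
    wh_i = matvec(W, h[i])
    scores = []
    for j in neighbors_i:
        concat = wh_i + matvec(W, h[j])  # [W h_i || W h_j]
        scores.append(leaky_relu(sum(p * q for p, q in zip(a, concat))))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(W: Matrix, a: Vector, h: List[Vector],
           i: int, neighbors_i: List[int]) -> Vector:
    """Weighted sum of transformed neighbor features, following (9)."""
    alpha = attention_weights(W, a, h, i, neighbors_i)
    out = [0.0] * len(W)
    for weight, j in zip(alpha, neighbors_i):
        out = [o + weight * x for o, x in zip(out, matvec(W, h[j]))]
    return [max(0.0, o) for o in out]  # sigma assumed to be ReLU


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention_weights(W, [0.3, -0.1, 0.7, 0.2], h, 0, [1, 2]))
```

With an all-zero parameter vector `a`, every neighbor receives the same weight and the layer reduces to the mean aggregator; a trained `a` lets some neighbors contribute more than others.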
Fig. 4. OAs of various methods under different numbers of labeled examples per class. (a) University of Pavia data set. (b) Houston 2013 data set.

Fig. 5. Parametric sensitivity of $l$ and $S$. (a) PU data set. (b) Houston 2013 data set.

TABLE III
OA, AA (%), AND KAPPA COEFFICIENT ACHIEVED BY DIFFERENT DATA SETS. MODEL SETTINGS ON PU AND HOUSTON 2013 DATA SETS

on classified land cover, which is more robust than using a precomputed fixed graph.

The impact of the number of convolution layers $l$ and the segment scale $S$ on the two data sets is shown in Fig. 5. We can conclude that both $l$ and $S$ have a significant impact on the classification accuracies. The best result is usually reached with three convolution layers. A multilevel design is able to learn more spatial information at a larger scale. However, the characteristics learned at greater iteration or search depths have an inhibitory effect on the classification because of their low correlation. For $S$, the classification accuracies increase as the segment scale increases. However, the amount of computation also increases exponentially, which may be unacceptable under limited experimental conditions. In our proposed method, the segment scale $S$ is 30 000, which reaches the limit of our computing resources.

E. Ablation Study

In this experiment, we investigate the ablative effect of the SAGE-based attention mechanism. For the sake of comparison, we record the classification results produced without using an attention mechanism, and the simplified model is denoted as “SAGE.” The experimental setting is kept identical to Section IV-B. The comparative results are given in Table III. As shown in the table, the SAGE-based attention mechanism plays an important role in the improvement of learning efficiency.

V. CONCLUSION

In this letter, a novel SAGE-A for HSI classification is proposed. To extract long-range contextual relations, we go

REFERENCES

[1] B. Rasti et al., “Feature extraction for hyperspectral imagery: The evolution from shallow to deep: Overview and toolbox,” IEEE Geosci. Remote Sens. Mag., vol. 8, no. 4, pp. 60–88, Dec. 2020, doi: 10.1109/MGRS.2020.2979764.
[2] P. Zhong, Z. Gong, and J. Shan, “Multiple instance learning for multiple diverse hyperspectral target characterizations,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 1, pp. 246–258, Jan. 2020.
[3] K. Djerriri, A. Safia, R. Adjoudj, and M. S. Karoui, “Improving hyperspectral image classification by combining spectral and multiband compact texture features,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2019, pp. 465–468.
[4] C. Bo, H. Lu, and D. Wang, “Hyperspectral image classification via JCR and SVM models with decision fusion,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 2, pp. 177–181, Feb. 2016.
[5] L. Wang, S. Hao, Q. Wang, and Y. Wang, “Semi-supervised classification for hyperspectral imagery based on spatial-spectral label propagation,” ISPRS J. Photogramm. Remote Sens., vol. 97, pp. 123–137, Nov. 2014.
[6] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, “Deep convolutional neural networks for hyperspectral image classification,” J. Sensors, vol. 2015, pp. 1–12, Jul. 2015.
[7] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, “Deep supervised learning for hyperspectral data classification through convolutional neural networks,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2015, pp. 4959–4962.
[8] Q. Liu, F. Zhou, R. Hang, and X. Yuan, “Bidirectional-convolutional LSTM based spectral-spatial feature learning for hyperspectral image classification,” Remote Sens., vol. 9, no. 12, p. 1330, Dec. 2017.
[9] M. Zhang, W. Li, and Q. Du, “Diverse region-based CNN for hyperspectral image classification,” IEEE Trans. Image Process., vol. 27, no. 6, pp. 2623–2634, Jun. 2018.
[10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1–9.
[11] R. Kemker and C. Kanan, “Self-taught feature learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2693–2705, May 2017, doi: 10.1109/TGRS.2017.2651639.
[12] D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, “An augmented linear mixing model to address spectral variability for hyperspectral unmixing,” IEEE Trans. Image Process., vol. 28, no. 4, pp. 1923–1938, Apr. 2019.
[13] A. Sha, B. Wang, X. Wu, and L. Zhang, “Semisupervised classification for hyperspectral images using graph attention networks,” IEEE Geosci. Remote Sens. Lett., vol. 18, no. 1, pp. 157–161, Jan. 2021.
[14] L. Mou, X. Lu, X. Li, and X. X. Zhu, “Nonlocal graph convolutional networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 8246–8257, Dec. 2020, doi: 10.1109/TGRS.2020.2973363.
[15] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, “Graph convolutional networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., early access, Aug. 18, 2020, doi: 10.1109/TGRS.2020.3015157.
[16] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, “Multiscale dynamic graph convolutional network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3162–3177, May 2020.
[17] S. Wan, C. Gong, P. Zhong, S. Pan, G. Li, and J. Yang, “Hyperspectral image classification with context-aware dynamic graph convolutional network,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 597–612, Jan. 2021, doi: 10.1109/TGRS.2020.2994205.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, Jun. 2014.
[19] S. Zhang and S. Li, “Spectral-spatial classification of hyperspectral images via multiscale superpixels based sparse representation,” in Proc. IEEE IGARSS, Jul. 2016, pp. 2423–2426.
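The three scores reported in Table III, namely overall accuracy (OA), average accuracy (AA), and the kappa coefficient, are not defined in the letter. The sketch below assumes their standard definitions from a confusion matrix: OA is the fraction of correctly classified samples, AA is the mean of the per-class accuracies, and kappa corrects OA for chance agreement. The function name `classification_scores` is hypothetical.

```python
from typing import List, Tuple


def classification_scores(y_true: List[int], y_pred: List[int],
                          num_classes: int) -> Tuple[float, float, float]:
    """Return (OA, AA, kappa) computed from a confusion matrix."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1  # rows: true class, columns: predicted class

    n = len(y_true)
    row = [sum(cm[c]) for c in range(num_classes)]
    col = [sum(cm[r][c] for r in range(num_classes))
           for c in range(num_classes)]

    oa = sum(cm[c][c] for c in range(num_classes)) / n
    # Classes that do not occur in y_true are skipped in the average.
    per_class = [cm[c][c] / row[c] for c in range(num_classes) if row[c] > 0]
    aa = sum(per_class) / len(per_class)
    # Expected agreement by chance; kappa is undefined when it equals 1.
    pe = sum(row[c] * col[c] for c in range(num_classes)) / (n * n)
    kappa = (oa - pe) / (1.0 - pe)
    return oa, aa, kappa


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 2]
    pred = [0, 0, 1, 1, 1, 2]
    print(classification_scores(truth, pred, 3))
```

AA weights every class equally, so it is more sensitive than OA to errors on classes with few samples, which is why the two scores are usually reported together.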