
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 60, 2022 5504316

Hyperspectral Image Classification Based on Deep Attention Graph Convolutional Network

Jing Bai, Senior Member, IEEE, Bixiu Ding, Member, IEEE, Zhu Xiao, Senior Member, IEEE, Licheng Jiao, Fellow, IEEE, Hongyang Chen, Senior Member, IEEE, and Amelia C. Regan, Member, IEEE

Abstract— Hyperspectral images (HSIs) have gained high spectral resolution due to recent advances in spectral imaging technologies. This incurs problems, such as an increased data scale and an increased number of bands for HSIs, which result in complex correlations between different bands. In the applications of remote sensing and earth observation, the ground objects represented by each HSI pixel are composed of physical and chemical non-Euclidean structures, and HSI classification (HIC) is becoming a more challenging task. To solve the above problems, we propose a framework based on a deep attention graph convolutional network (DAGCN). Specifically, we first integrate an attention mechanism into the spectral similarity measurement to aggregate similar spectra. We then propose a new similarity measurement method, i.e., the mixed measurement of a kernel spectral angle mapper and spectral information divergence (KSAM-SID), to aggregate similar spectra. Considering the non-Euclidean structural characteristics of HSIs, we design deep graph convolutional networks (DeepGCNs) as a feature extraction method to extract deep abstract features and explore the internal relationships within HSI data. Finally, we dynamically update the attention graph adjacency matrix to adapt to the changes in each feature graph. Experiments on three standard HSI data sets, namely, the Indian Pines, Pavia University, and Salinas data sets, demonstrate that the DAGCN outperforms the baselines in terms of various evaluation criteria. For example, on the Indian Pines data set, the overall accuracy of the proposed method reaches 98.61% when the training sample ratio is 10%.

Index Terms— Attention mechanism, graph convolution network (GCN), hyperspectral image classification (HIC), similarity measurement.

I. INTRODUCTION

HYPERSPECTRAL images (HSIs) have drawn much attention in recent decades due to their rich data and high spectral resolution [1], [2]. By leveraging such rich hyperspectral information, HSIs have been successfully applied to a wide range of applications, including urban development, land monitoring, scene interpretation, and resource exploration [3]–[6]. Among these applications, HSI classification (HIC) is a common technique that enables analysis of ground feature characteristics, thereby facilitating HSI applications.

A. Related Works

Many alternative methods have been developed for implementing HIC. Early methods used pixels as the basic unit to classify HSIs and compared the similarity between pixels. Among these methods, the k-nearest neighbor (KNN) [7] method is widely used because of its simplicity and comprehensibility. However, this simple method has high computational complexity and cannot mine the features of hyperspectral data. To extract features from HSIs, pattern recognition methods, such as the support vector machine (SVM) [8], have been proposed to adapt to the nonlinear structure of hyperspectral data, although the SVM cannot directly solve multiclass problems. Some attempts have been made to capture spatial features from the original data by extracting attribute profiles [9] and texture features [10], but the features extracted from 2-D images are not necessarily the spatial features of 3-D images.

With the development of artificial intelligence and neural networks, new learning-based methods, such as active learning [11], semisupervised learning [12], and artificial neural networks [13], have been exploited for classification tasks. Among these, convolutional neural networks (CNNs) [14] and recurrent neural networks (RNNs) [15] are widely used in HIC. Deep convolutional networks have benefited from the characteristics of convolution operators, including the use of local connections to reduce parameters and maintain the translation invariance of data and the high abstraction of deep networks;

Manuscript received December 31, 2020; revised February 21, 2021; accepted March 13, 2021. Date of publication March 25, 2021; date of current version December 13, 2021. This work was supported in part by the State Key Program of National Natural Science of China under Grant 61836009, in part by the National Natural Science Foundation of China under Grant 61772401, in part by the Fundamental Research Funds for the Central Universities under Grant RW180177, in part by the Open Fund of Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, Xidian University, under Grant IPIU2019007, in part by the Natural Resources Scientific Research Project of the Department of the Natural Resources of Hunan Province under Grant 201910, in part by the Funding Projects of Zhejiang Lab under Grant 2021LC0AB05 and Grant 2020LC0PI01, and in part by the Open Fund of State Key Laboratory of Geoinformation Engineering under Grant SKLGIE2018-M-4-3. (Jing Bai and Zhu Xiao contributed equally to this work.) (Corresponding author: Zhu Xiao.)

Jing Bai, Bixiu Ding, and Licheng Jiao are with the Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi'an 710071, China (e-mail: [email protected]; [email protected]; [email protected]).

Zhu Xiao is with the College of Computer Science and Electronics Engineering, Hunan University, Changsha 410082, China (e-mail: [email protected]).

Hongyang Chen is with the Zhejiang Laboratory, Hangzhou 311121, China (e-mail: [email protected]).

Amelia C. Regan is with the Department of Computer Science, University of California at Irvine, Irvine, CA 92697 USA, and also with the Institute of Transportation Studies, University of California at Irvine, Irvine, CA 92697 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TGRS.2021.3066485
1558-0644 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Heriot-Watt University. Downloaded on April 13,2024 at 12:14:43 UTC from IEEE Xplore. Restrictions apply.
these networks have demonstrated a strong feature extraction ability on hyperspectral data and have achieved very competitive classification performance. In 2015, Hu et al. [16] and Mei et al. [17] first introduced 1-D-CNNs into HIC to extract spectral features. 2-D-CNNs [18]–[20] and 3-D-CNNs [21], [22] were then gradually proposed to adapt to HSI features. Considering that HSIs are sequence-based in nature, to avoid losing sequence information, Mou et al. [23] used an RNN to classify HSIs. Hang et al. [24] proposed a cascaded RNN model to improve the feature recognition ability. Zhang et al. [25] introduced the local space sequence (LSS) method into RNNs to extract local and semantic information from HSIs.

Although the HIC methods based on convolutional networks achieve good classification accuracy, they face the following problem: convolutional networks use a fixed-shape kernel around the central pixel to extract features, which blurs the classification boundary and results in poor performance at class boundaries in HIC, as shown in Fig. 1(a). To overcome this shortcoming, the latest graph convolution networks (GCNs) are devoted to redefining the convolution operator. From the spatial domain, as shown in Fig. 1(b), the graph convolution operator can provide local feature abstraction, as a convolution kernel does. The form of the graph convolution is more flexible and has greater potential for pixel-level HSI classification. In [26], the GCN was first applied to HIC, explicitly using adjacent nodes in the graph to approximate convolution. After that, superpixel segmentation [27], spatial similarity [28], and dynamic graphs [29] were applied in GCNs for HIC. Motivated by convolutional networks, Li et al. [30] integrated the successful aspects of deep CNNs into GCNs, namely, residual connections [31], dense connections [32], and dilated convolutions [33], showing that GCNs can extract multiscale, deep-level abstract features.

Fig. 1. Comparison of convolution. (a) 2-D-CNN convolution. (b) Graph convolution.

B. Motivations and Contributions

In recent years, with the rapid development of spectral imaging technology, the resolution of HSIs has improved, more information is stored, and the number of bands has increased considerably [34], [35]. This generates more complex spectral structure information, which, therefore, increases the pressure to achieve accurate classification of HSIs. Recent advances in deep learning-based solutions have provided powerful tools for HIC, while the performance of these methods is limited when encountering boundary problems and complex data [36]. Unfortunately, this is often the case considering the large number of HSI bands and the spectral data redundancy. Currently, the spectral resolution of imaging spectrometers has reached the nanoscale. Moreover, the method for splitting light increases the complexity; in other words, the spectroscopic model is highly diversified, which brings a range of spectral and spatial resolutions to HSIs.

The improved resolution creates high-quality HSIs, while it inevitably introduces two problems for HIC: the complex and redundant correlation between bands and the increased information storage. For the first problem, the improvement in spectral resolution results in a larger number of bands, and the correlation between bands is enhanced. Intuitively, this hinders the effective interpretation of HSIs [37]. For example, when the vegetation reflection characteristics in the visible light band are low and the similarity in the visible spectrum region is large, the influence of terrain shadow on the solar reflection bands is similar. As a result, the solar reflection region bands are severely interrelated. In these cases, it is challenging to compare the spectral correlation between HSI pixels when a large number of bands are present. As such, it is challenging to classify pixels with high correlation into the same category. For the second problem, the improvement in resolution leads to more complex information being stored in each unit area, which significantly increases the difficulty of HSI feature extraction. For example, the spectral characteristics of rocks are affected by their composition, structural color, and surface state, while the spectral characteristics of soil are mainly affected by the water content, organic matter, and other factors. The increased data indicate that more features are included; thus, feature extraction becomes more difficult. In summary, in response to the improved resolution of hyperspectral imagery, performing accurate and reliable HIC faces the following challenging issues.

1) How to distinguish similar bands in high-dimensional spectral space when HSIs have considerable redundant information and high correlation between bands. HSI bands are narrow, and the number of bands reaches hundreds. In addition, there is a high level of redundancy between the bands. For each surface feature, the causes of band correlation are manifold and involve the influence of light, moisture, and organic matter. Traditional methods, for example, distance-, projection-, and information measure-based similarity measurement methods, focus on only one aspect (e.g., spectral shape or amplitude). In addition to the characteristics of the spectra, the spatial distance will also affect the similarity between the spectra.

2) How to focus on the spectral information that has the most influence on classification, given HSI data with considerable redundancy. There is considerable redundancy between HSI spectra, and the spectra interfere with each other. Due to the influence of the surrounding environment, such as different illumination angles, water content differences, and interference from radioactive substances, the corresponding spectral curves of the same surface feature may be different, and different surface features may present the same spectral curve. The same spectral curve features corresponding


to these different features are part of the spectral redundancy, which will affect the classification effect to a certain extent.

3) How to extract non-Euclidean features (e.g., physical and chemical structures) from hyperspectral data more efficiently. Each pixel of an HSI represents a surface feature, and the ground features, such as the physical structure and chemical composition, are non-Euclidean structures. At present, most deep learning methods, such as CNNs, are based on Euclidean structures and use fixed convolution operators to extract features. Some researchers classify HSIs based on the advantages of GCNs for non-Euclidean structures. However, these studies only use a shallow GCN model to extract features and cannot mine the deep features of HSIs.

To address the abovementioned challenges, we propose the deep attention graph convolutional network (DAGCN) to deal with HIC by focusing on the problems stemming from the increasing resolution of HSIs. Specifically, considering that there are many bands in hyperspectral data and that the correlation between bands is complex, we design a new similarity measurement method, the kernel spectral angle mapper and spectral information divergence (KSAM-SID), for the proposed DAGCN to effectively cluster similar spectra. The standard KSAM [38] mainly concerns the difference in spectral shape in high-dimensional space. Given the considerable redundancy in the many HSI bands, if only KSAM is used to calculate spectral similarity, redundant spectral information will be included, which will degrade the efficiency of HIC. To solve this problem, we use KSAM to weight SID [39] nonlinearly and constrain it spatially. In this way, similar spectra are aggregated, and the differences between dissimilar spectra become more obvious, thereby enabling reliable HIC. Second, considering that the redundancy between spectra disturbs HIC, we design an attention mechanism [40] to focus on the spectral information that has the greatest impact on classification. Finally, due to the complexity of hyperspectral data, much of the data have non-Euclidean structures. The proposed DAGCN, therefore, introduces deep graph convolutional networks (DeepGCNs) to mine the internal relationships within HSI data, with the aim of extracting non-Euclidean features from hyperspectral data more efficiently.

The main contributions of this article are outlined as follows.

1) We design a new similarity measurement method, KSAM-SID, by exploring the correlation between the spectra of HSIs, which offers a more suitable method for capturing spectral characteristics and accurately aggregating HSI features.

2) We propose an appropriate attention graph structure generation method for the problem of graph convolution in HIC. Within the proposed attention graph structure, the nodes in the graph are most likely to belong to the same category, which can inhibit the effect of samples outside the category and, therefore, enhance the influence of samples within the category.

3) We design DeepGCNs to mine HSI features, which provides a new method for feature extraction in HIC. We adapt dense connections and dilated convolutions into the DeepGCNs to solve the oversmoothing and multiscale problems. In addition, we design a dynamic update mechanism that dynamically changes the neighbor nodes to learn more effective graph structure features. A number of experiments based on well-known data sets validate the effectiveness of the proposed DAGCN method.

The remainder of this article is organized as follows. Section II presents the preliminary work. Section III details the proposed method. In Section IV, the data sets are first introduced; then, the experimental setup, comparisons, and analysis results are discussed. Next, the performance of the proposed method is statistically evaluated. Finally, Section V concludes this article.

II. PRELIMINARIES

First, we define the basic symbols used in this article. Given the input HSI H^band_{r×c} (where r, c, and band are the width, length, and number of image bands, respectively), we reshape the image H^band_{r×c} to H^band_N (hereinafter referred to as H), where N = r × c represents the number of pixels. We define each spectral vector in H as h_i = {h_i^1, h_i^2, …, h_i^band} (i ∈ [1, N]).

A. Kernel Spectral Angle Mapper

The kernel spectral angle mapper (KSAM) [38] introduces the kernel technique, which implicitly maps the input to a high-dimensional feature space and can, therefore, better solve nonlinear classification problems.

The basic idea is to map the spectra h_i and h_j to the high-dimensional feature space, calculate the angle between the two spectral vectors in that space, and finally measure the similarity d_KSAM according to the angle

d_KSAM(h_i, h_j) = cos⁻¹( κ(h_i, h_j) / √(κ(h_i, h_i) · κ(h_j, h_j)) )   (1)

κ(h_i, h_j) = exp(−‖h_i − h_j‖² / (2δ²))   (2)

where d_KSAM ∈ [0, π/2]. The larger the value of d_KSAM, the greater the gap between the two spectral vectors and the smaller the similarity.

B. Spectral Information Divergence

The spectral information divergence (SID) [39] regards each spectral vector as information entropy with probability and statistical characteristics and treats the redundancy between the two spectral vector probabilities as the similarity.

Given a spectral vector h_i, its statistical probability vector can be expressed as

p_i = (p_i^1, p_i^2, …, p_i^band) = (h_i^1/h_i^sum, h_i^2/h_i^sum, …, h_i^band/h_i^sum)   (3)

where h_i^sum = Σ_{j=1}^{band} h_i^j represents the sum of the spectral features.

The relative entropy (also known as the KL divergence) of the probability statistics vectors of h_i and h_j can be calculated as

d(h_i‖h_j) = Σ_{k=1}^{band} h_i^k ln(h_i^k / h_j^k).   (4)

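As a concrete illustration, the two measures in (1)–(4) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: the kernel width `delta` and the toy spectra are arbitrary choices, and, following (3), the relative entropy is applied to the band-probability vectors.

```python
import numpy as np

def ksam(hi, hj, delta=1.0):
    """Kernel spectral angle, (1)-(2): angle between spectra in RBF feature space."""
    k = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2 * delta ** 2))
    cos_angle = k(hi, hj) / np.sqrt(k(hi, hi) * k(hj, hj))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))  # lies in [0, pi/2]

def kl(hi, hj):
    """Relative entropy, (4), computed on the probability vectors of (3)."""
    pi, pj = hi / hi.sum(), hj / hj.sum()
    return float(np.sum(pi * np.log(pi / pj)))

def sid(hi, hj):
    """SID as the symmetric sum of the two relative entropies."""
    return kl(hi, hj) + kl(hj, hi)

# toy 4-band spectra (illustrative values only)
h1 = np.array([0.2, 0.5, 0.3, 0.9])
h2 = np.array([0.1, 0.6, 0.2, 0.8])
```

Note that `ksam` is zero for identical spectra and grows with spectral dissimilarity, while `sid` is symmetric by construction, which matches the properties the text relies on later.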

The relative entropy between spectra does not satisfy symmetry, that is, d(h_i‖h_j) ≠ d(h_j‖h_i). From the two cross entropies between the spectra, the SID can be obtained as

d_SID(h_i, h_j) = d(h_i‖h_j) + d(h_j‖h_i).   (5)

The larger the value of d_SID, the lower the similarity between the two spectra.

C. Graph Convolution of Hyperspectral Data

In graph signal processing, the Laplace operator reflects local smoothness and can describe the signal difference between a central node and its neighboring nodes. By normalizing the Laplacian matrix, a more robust graph-structured data representation can be obtained, which has the property of being real, symmetric, and positive semidefinite

L = I_N − D^{−1/2} A D^{−1/2},   D_ii = Σ_j A_ij   (6)

where A is the adjacency matrix of the graph and D is the node degree matrix that records the degree of each node.

Given the input HSI H, the spectral convolution [41] is defined as follows:

g_ω ⋆ H = U g_ω U^T H   (7)

where g_ω is the filter with parameter ω, U is the matrix composed of the eigenvectors of the normalized graph Laplacian L = U Λ U^T, and Λ is the diagonal matrix composed of the eigenvalues of L. We can think of g_ω as a function of the eigenvalues of L, that is, g_ω(Λ). To avoid the high cost of the eigendecomposition, Hammond et al. [42] approximated g_ω(Λ) using Chebyshev polynomials

g_ω(Λ) ≈ Σ_{k=0}^{K} ω_k T_k(Λ̃)   (8)

where Λ̃ = (2/λ_max)Λ − I_N; since λ_max is approximately 2, Λ̃ = Λ − I_N. Accordingly

g_ω ⋆ H ≈ Σ_{k=0}^{K} ω_k T_k(L̃) H   (9)

where L̃ ≈ L − I_N. We set K = 1, as proposed by Kipf and Welling [41]; then

g_ω ⋆ H ≈ ω_0 H + ω_1 (L − I_N) H = ω_0 H − ω_1 D^{−1/2} A D^{−1/2} H.   (10)

Let Θ = [ω_0, −ω_1]; then

g_ω ⋆ H ≈ (I_N + D^{−1/2} A D^{−1/2}) H Θ.   (11)

The eigenvalues of I_N + D^{−1/2} A D^{−1/2} in (11) are in the range [0, 2]. The repeated use of this operator in deep neural networks leads to numerical instability and gradient explosion/vanishing problems. To solve these problems, the operator is improved to D̃^{−1/2} Ã D̃^{−1/2}, where Ã = A + I_N. This not only alleviates the network training problem but also accounts for the node self-loops

g_ω ⋆ H = D̃^{−1/2} Ã D̃^{−1/2} H Θ.   (12)

Thus, the space GCN is expressed as

H^i = F^i(H^{i−1}, Θ^{i−1}) = σ(D̃^{−1/2} Ã D̃^{−1/2} H^{i−1} Θ^{i−1})   (13)

where H^i represents the characteristic matrix of the ith layer, F represents the graph convolution operation, Θ^{i−1} represents the weight matrix of the ith layer, and σ represents the activation function, such as ReLU.

III. PROPOSED MODEL

In this section, we introduce the HIC process in detail; its framework is shown in Fig. 2. Given the original image, we use the KSAM-SID similarity measurement method to obtain the similarity matrix. Then, we construct the attention KSAM-SID graph structure and use the generated graph structure for graph convolution. In this way, the features of the image are obtained using the deep attention GCN (DAGCN), and the graph structure of each layer is dynamically updated to adapt to the changes in the middle layers of the GCN. The network has ten layers, including one layer of graph convolution and three identical dense blocks. Finally, we use a fully connected layer to classify the features of the DAGCN output.

This section includes the following contents: the new similarity measurement method KSAM-SID (see Section III-A), the construction of the attention KSAM-SID graph structure (see Section III-B), and the structure of the DeepGCNs (see Section III-C).

A. KSAM-SID Similarity Measurement

KSAM focuses on the difference in spectral shape in high-dimensional space. However, some bands of HSIs are redundant; if only KSAM is used to calculate spectral similarity, redundant spectral information will be included. Therefore, it is necessary to combine the information entropy of SID to constrain KSAM.

In this article, KSAM is used to weight SID nonlinearly [43]. This takes full advantage of the strengths of SID and KSAM in spectral similarity calculations so that similar spectra become more similar and spectral differences become more obvious. The mixed measure is given as follows:

d_mix(h_i, h_j) = d_SID(h_i‖h_j) × sin(d_KSAM(h_i, h_j))   (14)

where sin(d_KSAM(h_i, h_j)) constrains d_KSAM to [0, 1], and d_SID adapts the measure to the undirected graph convolution structure.

In addition, because spectra that are spatially close to each other most likely belong to the same class, whereas spectra that are far apart generally do not, it is necessary to introduce a spatial constraint. The Chebyshev distance is used to constrain the spectral mixing measure

d̃_mix(h_i, h_j) = log(d_ij) × d_mix(h_i, h_j)   (15)

where d_ij = max(|i_1 − i_2|, |j_1 − j_2|), and (·)_1 and (·)_2, respectively, represent the horizontal and vertical coordinates of the pixel.
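A compact sketch of the mixed measure in (14) and (15) is shown below. The helper names are hypothetical and the toy spectra and pixel coordinates are illustrative; the KSAM and SID routines follow (1)–(5).

```python
import numpy as np

def ksam(hi, hj, delta=1.0):
    """Kernel spectral angle per (1)-(2)."""
    k = lambda a, b: np.exp(-np.sum((a - b) ** 2) / (2 * delta ** 2))
    return np.arccos(np.clip(k(hi, hj) / np.sqrt(k(hi, hi) * k(hj, hj)), -1, 1))

def sid(hi, hj):
    """Symmetric SID per (3)-(5), on band-probability vectors."""
    pi, pj = hi / hi.sum(), hj / hj.sum()
    return float(np.sum(pi * np.log(pi / pj)) + np.sum(pj * np.log(pj / pi)))

def d_mix(hi, hj):
    # (14): SID weighted nonlinearly by KSAM; sin() maps the angle into [0, 1]
    return sid(hi, hj) * np.sin(ksam(hi, hj))

def d_mix_spatial(hi, hj, pos_i, pos_j):
    # (15): Chebyshev distance between pixel coordinates as a spatial factor.
    # Note that log(1) = 0 zeroes the measure for immediately adjacent pixels,
    # exactly as (15) is written.
    d_cheb = max(abs(pos_i[0] - pos_j[0]), abs(pos_i[1] - pos_j[1]))
    return np.log(d_cheb) * d_mix(hi, hj)

h1 = np.array([0.2, 0.5, 0.3, 0.9])
h2 = np.array([0.1, 0.6, 0.2, 0.8])
```

The sketch makes the two design choices visible: the angular term only scales the divergence, and the spatial term vanishes for neighboring pixels, so the measure favors spatially close, spectrally similar pairs.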


Fig. 2. Classification framework of HSIs based on ten layers of the DAGCN. First, we use the KSAM-SID similarity measurement method to obtain the
similarity matrix. Then, we construct the attention KSAM-SID graph structure and use the generated graph structure for graph convolution. In this article,
we use DeepGCNs to extract features. The number of layers of the network is 10, including one layer of graph convolution and three layers of the same
dense block. Finally, the classification is carried out through the fully connected layer.
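For reference, the renormalized operator D̃^{−1/2}ÃD̃^{−1/2} of (12) and one propagation step of (13) can be formed directly in NumPy. This is a sketch on a toy random graph, not the HSI graph; ReLU stands in for σ and the weight matrix is random.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6
A = (rng.random((N, N)) < 0.4).astype(float)
A = np.triu(A, 1)
A = A + A.T                                     # symmetric adjacency, no self-loops

A_tilde = A + np.eye(N)                         # add self-loops: A~ = A + I_N
deg = A_tilde.sum(axis=1)                       # D~ diagonal (>= 1, so no zero division)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
P = D_inv_sqrt @ A_tilde @ D_inv_sqrt           # renormalized operator of (12)

H = rng.standard_normal((N, 4))
Theta = rng.standard_normal((4, 3))
H_out = np.maximum(P @ H @ Theta, 0.0)          # one GCN layer as in (13), ReLU as sigma
```

The eigenvalues of `P` stay within [-1, 1], which is precisely the numerical-stability property that motivates replacing the operator of (11) with that of (12).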

B. Attention KSAM-SID Graph Structure

Current graph convolutional networks do not have a suitable method for generating graph structures on HSIs. We interpret HSIs as graph structures based on spectral similarity and design the attention KSAM-SID graph adjacency matrix to construct the graph structure.

The construction of the attention KSAM-SID graph adjacency matrix is mainly divided into three stages, as shown in Fig. 3.

1) First Stage: Given a spectral vector h_i, we calculate the KSAM-SID similarity between it and all spectra H

d̃_mix(h_i, H) = [d̃_mix(h_i, h_1), d̃_mix(h_i, h_2), …, d̃_mix(h_i, h_N)].   (16)

The similarity matrix S of the whole HSI can be calculated according to the following formula:

S = [ d̃_mix(h_1, h_1)   d̃_mix(h_1, h_2)   ⋯   d̃_mix(h_1, h_N)
      d̃_mix(h_2, h_1)   d̃_mix(h_2, h_2)   ⋯   d̃_mix(h_2, h_N)
      ⋮                  ⋮                  ⋱   ⋮
      d̃_mix(h_N, h_1)   d̃_mix(h_N, h_2)   ⋯   d̃_mix(h_N, h_N) ].   (17)

For each spectral vector, its own similarity distance is 0, that is, d̃_mix(h_i, h_i) = 0, and S is a symmetric matrix.

The nearest neighbors of each sample are determined according to the similarity values. To avoid the influence of distant samples on the sample characteristics during propagation, the KNN algorithm is used to select the nearest neighbors of each sample. For each spectral vector h_i, we choose the k most similar spectra as neighbors Ω_i = {h_{t_1}, h_{t_2}, …, h_{t_k}}, t_1, t_2, …, t_k ∈ [1, N]. We keep the entries S(h_i, Ω_i) and set the others to 0.

2) Second Stage: Note that the more similar a spectral vector is, the smaller the similarity distance (the similarity distance to itself is 0) and the smaller its influence on feature aggregation. For better feature aggregation, we transform the similarity matrix as follows:

Ŝ_{i·} = { min(S_{i·}) · (1 / S_{i·}),   S_{i·} ≠ 0
           1,                            S_{i·} = 0.     (18)

Because the similarity distances between a spectral vector and the other spectral vectors often differ by orders of magnitude, to prevent the values from being too large or too small, we add the min(S_{i·}) coefficient to normalize the similarity values.

The similarity matrix Ŝ is then normalized by row

S̈_i = softmax(Ŝ_i).   (19)

3) Third Stage: The purpose of this article is to use the similarity between neighborhood nodes to aggregate them, but the current S̈ matrix does not contain the self-similarity of each sample. We add appropriate values to the diagonal elements of the similarity matrix, and then softmax normalization is carried out

S̃ = softmax(S̈ + I_N),   Σ_{j=1}^{N} S̃_{ij} = 1,  i ∈ [1, N].   (20)

After the above operations, the graph is a directed graph. We need to adjust the graph by setting the weights between some nodes to 0 so that the graph becomes an undirected graph. The similarity matrix S̃ is regarded as the attention graph adjacency matrix, which is different from the ordinary adjacency matrix. Since Σ_{j=1}^{N} S̃_{ij} = 1 and D_ii = Σ_j A_ij, the space GCN [see (13)]
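Under one literal reading of (16)–(20), the three stages can be run end-to-end on a toy pixel set. This is a sketch with an arbitrary k and random spectra: a plain Euclidean distance stands in for the KSAM-SID measure, and zero distance entries (the pixel itself and the pruned distant samples) are mapped to 1 as in (18).

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((6, 8)) + 0.1               # 6 toy "pixels" with 8 bands each
N, k = H.shape[0], 3

# Stage 1: pairwise distance matrix as in (17), then KNN pruning:
# keep self + k nearest per row, zero the rest.
S = np.array([[np.linalg.norm(hi - hj) for hj in H] for hi in H])
for i in range(N):
    far = np.argsort(S[i])[k + 1:]
    S[i, far] = 0.0

# Stage 2: invert distances so similar pixels get large weights, (18),
# scaling by the row minimum over nonzero entries, then row softmax, (19).
S_hat = np.zeros_like(S)
for i in range(N):
    nz = S[i] != 0
    S_hat[i, nz] = S[i, nz].min() / S[i, nz]
    S_hat[i, ~nz] = 1.0
S_ddot = np.exp(S_hat) / np.exp(S_hat).sum(axis=1, keepdims=True)

# Stage 3: add self-similarity on the diagonal and renormalize, (20).
S_tilde = np.exp(S_ddot + np.eye(N))
S_tilde /= S_tilde.sum(axis=1, keepdims=True)
```

The end product `S_tilde` is row-stochastic, which is the property that lets it replace the normalized adjacency term in the graph convolution that follows.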


Fig. 3. Schematic of the attention KSAM-SID graph adjacency matrix. Key is the spectral vector to be compared, the query is all spectral vectors, and
S(H, K i ) is the KSAM-SID measurement result. Si is the similarity vector after the KNN operation. B is the result of the attention KSAM-SID graph
adjacency matrix.

can be simplified to

H^i = σ(S̃ H^{i−1} Θ^{i−1}).   (21)
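With the attention adjacency matrix in hand, one layer of (21) is a single matrix product. In this sketch, `S_tilde` is any row-stochastic attention matrix standing in for S̃, and `Theta` is a random weight matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
N, f_in, f_out = 5, 8, 4

H_prev = rng.standard_normal((N, f_in))       # features H^{i-1}
Theta = rng.standard_normal((f_in, f_out))    # weights Theta^{i-1}

# Row-stochastic attention matrix standing in for S~ from (20)
S_tilde = rng.random((N, N))
S_tilde /= S_tilde.sum(axis=1, keepdims=True)

relu = lambda x: np.maximum(x, 0.0)
H_next = relu(S_tilde @ H_prev @ Theta)       # (21): H^i = sigma(S~ H^{i-1} Theta^{i-1})
```

Because the rows of S̃ already sum to one, no separate degree normalization is needed, which is exactly why (13) simplifies to (21).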

C. DeepGCNs

The graph convolution operation can smooth the signal and, therefore, helps to mine the inner relationship between the neighborhood nodes. Note that, as the number of GCN layers increases, the node representation vectors tend to become consistent, and the node discrimination becomes increasingly worse [44], [45]. As a result, the characteristics of the graph vertices converge to the same value, which inevitably deteriorates the performance of the node classification task. As a solution, motivated by Li et al. [30], we utilize dilated convolutions and dense connections to solve this problem. In addition, after a series of graph convolution operations, the graph features are not immutable, so we design a dynamic graph structure update module to adapt to the changing graph features.

1) Dense Connections Module: According to DenseNet [32], the efficient utilization of the interlayer information flow in the GCN is shown as follows:

H^{i+1} = F^{i+1}(H^i, Θ^i)
        = T(F^i(H^i, Θ^i), H^i)
        = T(F^i(H^i, Θ^i), …, F^0(H^0, Θ^0), H^0).   (22)

The operator T is a point-level cascade function that fuses the output characteristic graph H^0 with the intermediate outputs of all GCN layers.

As the number of layers increases, the feature dimension of H^i grows, which results in a large increase in the runtime and memory requirements. Therefore, we design the dense block module to reduce the computing cost. The dense block module is shown in Fig. 4.

Fig. 4. Dense block. (a) Dense block with two layers. (b) Dense block with three layers.

2) Dilated Convolutions Module: In the field of remote sensing, image structures differ, the image scale span is large, and the homogeneous regions within the same image differ as well. Combining dilated convolution with DenseGCNs can effectively extract both high-order abstract features and multiscale features.

In this study, after each dense block, dilated convolution is used to find the extended neighborhood and establish the DilatedGCN. Specifically, for the DilatedGCN graph G = (V, E) with dilation (hole) ratio d, a sample is taken of every dth neighbor in the k × d neighborhood

Nei^{(d)}(v) = {v_1, v_{1+d}, v_{1+2d}, …, v_{1+(k−1)d}}   (23)

where Nei^{(d)}(v) denotes the d-dilated neighborhood and represents the result of sampling in the k × d neighborhood {v_1, v_2, …, v_{k×d}} of the vertex. By applying the KNNs, a suitable distance measure standard is generated. The proposed DilatedGCN method is, therefore, suitable for the spectral similarity used in hyperspectral data. Fig. 5 shows the schematic of the DilatedGCN when d is 1, 2, and 3.

3) Dynamic Graph Representation: Most GCNs have a fixed graph structure and only update the vertex features in each
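Both modules reduce to very small operations. A minimal NumPy sketch follows, with hypothetical helper names: the neighbor ordering would come from the KSAM-SID ranking, and the cascade operator T is read here as feature concatenation, in the spirit of DenseNet.

```python
import numpy as np

def dilated_neighbors(sorted_neighbors, k, d):
    """(23): from the k*d nearest neighbors, keep every d-th one:
    v_1, v_{1+d}, v_{1+2d}, ..., v_{1+(k-1)d}."""
    return sorted_neighbors[:k * d:d]

def dense_cascade(features):
    """(22): point-level cascade T, fusing all intermediate GCN outputs
    along the feature dimension."""
    return np.concatenate(features, axis=1)

# toy: 12 neighbor ids already sorted by similarity (most similar first)
nbrs = np.arange(12)
```

For example, with k = 4 the sketch keeps neighbors {0, 1, 2, 3} when d = 1 and the strided set {0, 3, 6, 9} when d = 3, so a larger d widens the receptive field without increasing k.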


3) Dynamic Graph Representation: Most GCNs have a fixed graph structure and only update vertex features in each iteration. Using dynamic graph convolution to change the graph structure at each layer and dynamically change the neighbor nodes can effectively alleviate the oversmoothing problem and generate a larger receptive field [46], [47]; this allows the network to learn more effective graph features. For these reasons, we recalculate the graph adjacency matrix according to the output characteristics after each dense block. In (21), the size of S̃ is [N, N], the size of H^(i−1) is [N, h_1], and the size of Θ^(i−1) is [h_1, h_2], where h_1 and h_2 are the numbers of hidden units. After calculation, the output size is [N, h_2]. Therefore, the size of the graph adjacency matrix is still [N, N]. We only need to update the graph adjacency matrix on the output H^i.

We use the cosine distance to update the graph structure dynamically. Since we normalize the GCN of each layer with L2 normalization, the cosine similarity is the inner product of the graph features; that is, H^l(H^l)^T. The calculation formula of the similarity matrix S̃^l of the l-th layer is

S̃^l = softmax(H^l(H^l)^T + I_N)   (24)

where I_N adds the self-similarity of each node, which can effectively prevent large fluctuations during training. Then, we use the KNN algorithm to obtain neighbor nodes according to the hole ratio d and normalize the values between the neighbors. Finally, we set the other values in the graph adjacency matrix to 0.

Fig. 5. Schematic of DilatedGCN.

IV. EXPERIMENTS AND RESULTS

A. Data Sets

In this section, we conduct experiments based on three classical hyperspectral data sets, namely, Indian Pines, Pavia University, and Salinas, and a challenging data set, Houston, to evaluate the performance of the proposed method.

AVIRIS is an airborne visible/infrared imaging spectrometer belonging to the U.S. Jet Propulsion Laboratory. The AVIRIS imaging spectrometer has an imaging wavelength range of 0.4–2.5 μm and can produce images of ground objects in 224 continuous wavebands. In June 1992, AVIRIS imaged an Indian pine forest in Indiana with a spatial resolution of 20 m/pixel. A range of 145 × 145 pixels is intercepted to obtain the Indian Pines image. Finally, the 200 bands of the Indian Pines data set are obtained by removing 24 bands covering the water absorption area. In addition to the background, there are 10 249 marked pixels in the data set, including 16 categories.

ROSIS is a reflective optics system imaging spectrometer from the German Aerospace Center. The wavelength range is 0.43–0.86 μm. It can continuously produce images of ground objects in 115 continuous wavebands. In 2003, ROSIS imaged the city of Pavia in Italy with a spatial resolution of 1.3 m/pixel and used a pixel size of 610 × 340 to obtain the image of Pavia University. The result is a 103-band Pavia University data set obtained by removing 12 bands covering the water absorption area. In addition to the background, there are 42 776 marked pixels and a total of nine categories.

Salinas was captured by the 224-band AVIRIS sensor over Salinas Valley, CA, USA, with a spatial resolution of 3.7 m/pixel and a captured pixel size of 512 × 217. Similar to the Indian Pines scene, 20 water-absorption bands were discarded, and 204 bands were retained. In addition to the background, there are 54 129 marked pixels in the data set, including 16 categories.

The Houston data set was collected by NCALM at the University of Houston (UH) in June 2012, covering the UH campus. The data were provided by the 2013 IEEE GRSS Data Fusion Competition. The data size is 349 × 1905, including 144 spectral bands ranging from 364 to 1046 nm, with a total of 15 categories.

B. Experimental Setup

Our experiment is carried out on a computer with a 3.20-GHz Intel Core i5-6500 CPU with 8 GB of RAM and a GTX 1070 Ti GPU.

We use the general model evaluation criteria, i.e., overall accuracy (OA), average accuracy (AA), and the kappa coefficient (Kappa), on the standard data sets to verify the effectiveness of the proposed method. Approximately 10% of the samples are randomly selected as the training set for Indian Pines and Houston, 6% of the samples are randomly selected as the training set for Pavia University, and 5% of the samples are randomly selected as the training set for Salinas. For categories with fewer samples, at least ten samples are randomly selected for the training set, while the other samples are used as the test set.

The numbers of training and test samples for each category in the four hyperspectral data sets are shown in Tables I–IV, respectively.

1) AttentionGCN: We name the two-layer attention GCN as AttentionGCN.

In order to verify the effectiveness of our proposed KSAM-SID method, we compare the common similarity measurement methods, L2 distance, SAM [48], SID [39], and KSAM [38], with our method, and the parameters of each experiment are kept consistent.

The graph structure of graph convolution greatly influences feature extraction for HSIs, and the graph structure is determined by node neighbors. To obtain the best results, we use different numbers of k-nearest spatial neighbors to compare the classification results of different graph structures for AttentionGCN. We gradually increase the value of the KNNs to conduct experiments. The values for the KNNs are 8, 16, 24, 32, 40, 48, 56, 64, 72, and 80 for comparison.

The hidden units of the two-layer GCN network are [300, 200]. The learning rate of the Adam optimization algorithm is set to 0.01, the number of iterations is 500, attenuation rates


TABLE I
NUMBER OF TRAINING AND TEST SAMPLES ON THE INDIAN PINES DATA SET

TABLE II
NUMBER OF TRAINING AND TEST SAMPLES ON THE PAVIA UNIVERSITY DATA SET

TABLE III
NUMBER OF TRAINING AND TEST SAMPLES ON THE SALINAS DATA SET

TABLE IV
NUMBER OF TRAINING AND TEST SAMPLES ON THE HOUSTON DATA SET

λ are set to 5e-6, and dropout parameters are set to 0.5 (to prevent overfitting).

2) Comparative Experiment: We selected several advanced methods for comparison, including SVM [8], SAE-LR [49], EPF [50], SVM-3DG [51], 2-D-CNN [21], 3-D-CNN [22], GCN [41], FuNet-C [52], MDGCN [27], and our proposed DAGCN.

All the experimental results are averaged after five runs. The experiments presented in this study have achieved competitive results under the premise of ensuring stability.

The process and parameters of DAGCN are set as follows. First, we use KSAM-SID to obtain the similarity matrix, and the parameter δ of KSAM is 0.55. Then, we build the graph structure. On the Indian Pines and Pavia University data sets, k is set to 24. On the Salinas data set, k is set to 16. After that, we use DeepGCNs to extract features. If the layer number of the dense block is 2, the hole ratio d is 1. If the layer number of the dense block is 3, the hole ratio d is 2. The hole ratio d increases with the number of layers of the dense block. Finally, we use two fully connected layers [256, nclass] to obtain the classification results, where nclass is the class number of the image.

Here, DAGCN layers are set to 10. First, a GCN with 64 hidden layer units is implemented, and then, three dense blocks are used. The number of layers of the dense block is 3, and the number of hidden layer units is [64, 128, 192]. The network is updated iteratively with the Adam optimizer at a learning rate of 0.001, and the maximum number of iterations is 500. After each layer, batch normalization is used. The dropout of the fully connected layer is set to 0.3 to prevent overfitting.

C. Results of AttentionGCN

1) Results of Different Similarity Measurement Methods: We conduct experiments on different similarity measurement methods on the Indian Pines data set. The results are shown in Table V. We find that, although the simple similarity measurement methods have less computational overhead, they do not take full account of the spectral characteristics, resulting


TABLE V
COMPARISON OF DIFFERENT SIMILARITY MEASURES ON INDIAN PINES DATA SET

TABLE VI
CLASSIFICATION RESULTS OF DIFFERENT k'S

in poor classification results. In contrast, our method not only considers the shape features of the spectrum and the features in high-dimensional space but also constrains the spatial position; hence, the OA has been significantly improved.

2) Visualization of Graph Structure: We visualize the graph structure generated under different k on the Indian Pines and Pavia University data sets, as shown in Fig. 6.

Fig. 6. Graph structure generated with different k's. (a) Graph structure of Indian Pines. (b) Graph structure of Pavia University.

When the value of k is small, such as k = 8, the graph structure contains many independent subgraphs. With increasing k, the number of subgraphs decreases or even disappears. When k = 16, there are only two subgraphs, and there are no subgraphs after k = 24. When k reaches a certain value (k = 24), the shape of the graph structure remains unchanged but becomes dense by increasing the value of k.

3) Training Network: We compared the accuracy and loss of training under different k's, as shown in Fig. 7. On the Indian Pines data set, we can clearly see that, with the increase in k, the network convergence is slower. For example, comparing the red and blue lines, the training accuracy of the red line increases faster, and the loss decreases faster. On the Pavia University data set, there is no significant difference in training accuracy and loss. On the Salinas data set, with the increase in k, the convergence value of the training accuracy decreases, and the training loss value increases.

4) Classification Results: We compare the results of OA, AA, and Kappa under different k conditions, as shown in Table VI. We call the black part of Table VI the critical value k̃. When k gradually increases to k̃, OA, AA, and Kappa show an upward trend. When k reaches k̃, OA, AA, and Kappa achieve the best effect. When k gradually increases after k̃, OA, AA, and Kappa show a downward trend. Interestingly, we find that the critical value of the graph structure in Fig. 6 is consistent with that in Table VI, which means that the best result is obtained when the shape of the graph structure is no longer changed.

D. Layer Setting for the DAGCN

To study the influence of the network depth on the classification results, we conducted experiments on the Indian Pines data set every two layers. The total number of iterations is 500. The experimental results are shown in Fig. 8.

Based on the results from Fig. 8, we see that the accuracy rate increases gradually before layer 8, and then, the accuracy rate basically remains unchanged. As the number of layers increases, the computational complexity increases, and the process becomes increasingly time-consuming. In the case that the experimental results are not affected, we choose 10 as the number of layers.

E. Results of Comparative Experiments

1) Indian Pines Data Set: The classification results of each algorithm on the Indian Pines data set are shown in Fig. 9. It can be observed that our method has fewer spots in the classification map, and the classification is more accurate in the boundary area of different categories.

The specific classification results for each algorithm are shown in Table VII. We see that our method achieves the best classification effect in three evaluation indexes, namely, OA, AA, and Kappa. When the training samples are small, the accuracy of our method reaches 100% in categories 1, 7, 9, and 16, which proves that the flexible convolution form of the graph convolution is more conducive to the distinction of boundaries. Compared with 2-D-CNN and 3-D-CNN, our method improves the OA by 5.93%, which shows the advantages of GCN in HIC. Compared with the MDGCN, which has the best effect in comparison, our method improves the OA by 1.75%, which indicates the effectiveness of our KSAM-SID similarity measurement method and DeepGCNs.

Fig. 12 illustrates the high-dimensional features of hyperspectral data by T-SNE. The results in Fig. 12(a)–(e) denote the T-SNE feature visualization of the Indian Pines data set.
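The k-dependent graph construction studied above, combined with the layer-wise update of (24), can be sketched in NumPy as follows. This is a simplified illustration (without the dilated sampling and per-neighbor renormalization); the function names are ours, not the authors'.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable row-wise softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_adjacency(H, k):
    """Sketch of the dynamic graph update of (24): with L2-normalized
    features, H @ H.T is the cosine similarity; self-loops are added
    via I_N, rows are softmax-normalized, and only the k largest
    entries per row are kept (all others are set to 0)."""
    H = H / np.linalg.norm(H, axis=1, keepdims=True)   # L2-normalize rows
    S = softmax(H @ H.T + np.eye(len(H)), axis=1)      # Eq. (24)
    keep = np.argsort(S, axis=1)[:, -k:]               # top-k neighbors per row
    A = np.zeros_like(S)
    np.put_along_axis(A, keep, np.take_along_axis(S, keep, axis=1), axis=1)
    return A

rng = np.random.default_rng(1)
A = dynamic_adjacency(rng.random((8, 5)), k=3)   # 8 nodes, 3 neighbors each
```

Increasing k here densifies every row of A, which mirrors how the visualized graph structures become denser once the critical value is passed.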


Fig. 7. Training accuracy and loss under different k's. (a) Indian Pines. (b) Pavia University. (c) Salinas.

TABLE VII
COMPARISON OF EXPERIMENTAL CLASSIFICATION RESULTS ON INDIAN PINES

In categories 2 and 11, the overlap is particularly evident in Figs. 12(a) and (d). It can be seen in Fig. 12(e) that our method can map samples evenly to a high-dimensional feature space to facilitate classification.

2) Pavia University Data Set: The classification results of each algorithm on the Pavia University data set are shown in Fig. 10. In Fig. 10(l), we can see the advantage of our method with irregular shapes from the correct classification of category 6 and category 7. From Fig. 10(g) and (h), the classification effect on category 7 is poor because the size of the convolution kernel based on CNN is fixed, and hence, this method cannot adapt to the situation of changeable shapes.

Fig. 8. Comparison of the results of different layers.

The specific classification results for each algorithm are presented in Table VIII. Our method achieves the best classification results on OA, AA, and Kappa, and the OA reaches 99.44%. Compared with SVM-3DG, our method is 2.01% higher in AA. SVM-3DG transforms spectral information and ignores the relationship between spectra. However, our method focuses on the spectrum information, which shows the superiority of our KSAM-SID.

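For reference, the two classical measures that KSAM-SID builds on can be sketched as follows. The kernelized combination proposed in the paper is not reproduced here; `sam` and `sid` are our illustrative names for the standard definitions of SAM [48] and SID [39].

```python
import numpy as np

def sam(x, y):
    """Spectral angle mapper: the angle between two spectra."""
    c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def sid(x, y, eps=1e-12):
    """Spectral information divergence: symmetric KL divergence between
    the band-probability profiles of two spectra."""
    p = x / x.sum() + eps
    q = y / y.sum() + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

a = np.array([0.2, 0.4, 0.4])
b = np.array([0.2, 0.4, 0.4])
# identical spectra: sam(a, b) and sid(a, b) are both (numerically) 0
```

SAM captures only the shape (direction) of a spectrum, and SID only its band-probability profile, which is why combining them with a spatial constraint, as KSAM-SID does, discriminates better than either measure alone.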

Fig. 9. Comparison of experimental classification results on Indian Pines data set. (a) Source image. (b) Ground truth. (c) SVM. (d) SAE-LR. (e) EPF.
(f) SVM-3DG. (g) 2-D-CNN. (h) 3-D-CNN. (i) GCN. (j) FuNet-C. (k) MDGCN. (l) DAGCN.

Fig. 10. Comparison of experimental classification results on Pavia University. (a) Source image. (b) Ground truth. (c) SVM. (d) SAE-LR. (e) EPF.
(f) SVM-3DG. (g) 2-D-CNN. (h) 3-D-CNN. (i) GCN. (j) FuNet-C. (k) MDGCN. (l) DAGCN.

The second line of Fig. 12 is the T-SNE feature visualization of the Pavia University data set. In Fig. 12(f)–(h), category 2 and category 6 overlap seriously. Categories 1, 2, and 3 in Fig. 12(i) are mixed together. In contrast, the overlap of the T-SNE features of our method is less.

3) Salinas Data Set: The classification results of each algorithm on the Salinas data set are shown in Fig. 11. It is obvious that, compared with other methods, our method has the best effect in categories 8 and 15. We find that, in categories 8 and 11, all the methods are not very effective, indicating that the two categories are difficult to distinguish. However, our method reduces errors by the KSAM-SID similarity measurement.

The specific classification results for each algorithm are shown in Table IX. Our method achieves the best results in the OA, AA, and Kappa evaluation indexes. Among all the comparison methods, SVM-3DG has the best classification effect, followed by MDGCN, while the commonly used 2-D-CNN and 3-D-CNN are not very good. It can be seen that simple feature extraction using convolution cannot achieve


TABLE VIII
COMPARISON OF EXPERIMENTAL CLASSIFICATION RESULTS ON PAVIA UNIVERSITY

Fig. 11. Comparison of experimental classification results on the Salinas data set. (a) Source image. (b) Ground truth. (c) SVM. (d) SAE-LR. (e) EPF.
(f) SVM-3DG. (g) 2-D-CNN. (h) 3-D-CNN. (i) GCN. (j) FuNet-C. (k) MDGCN. (l) DAGCN.

good results. Our similarity measurement method provides an effective guarantee for accurate classification, which can be seen from the OA rate of 99.04%.

The third line of Fig. 12, namely, Fig. 12(k)–(o), presents the T-SNE feature visualization of the Salinas data set. Looking at the T-SNE features, it is obvious that the overlap between categories 8 and 15 is the most serious. In Fig. 12(o), although our method still has a little overlap in categories 8 and 15, it largely distinguishes the two categories.

4) Houston Data Set: The classification results on the Houston data set are shown in Fig. 13. By observing Fig. 13(b) and (c), we find that the classification result of our method is basically consistent with the ground truth.

The specific classification results for each algorithm are shown in Table X. It can be seen that our method achieves the best results on OA, AA, and Kappa. Compared with EPF, which has the best results among the other methods, our OA is improved by 1.93%, AA by 1.73%, and Kappa by 2.09%, respectively.
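The three criteria reported throughout (OA, AA, and Kappa) follow directly from the confusion matrix; a minimal sketch (the helper name and the toy matrix are ours):

```python
import numpy as np

def metrics(conf):
    """OA, AA, and Kappa from a confusion matrix (rows: truth, cols: prediction)."""
    conf = np.asarray(conf, dtype=float)
    n = conf.sum()
    oa = np.trace(conf) / n                           # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))    # mean per-class accuracy
    pe = np.sum(conf.sum(axis=0) * conf.sum(axis=1)) / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

conf = [[45, 5],
        [10, 40]]
oa, aa, kappa = metrics(conf)   # → (0.85, 0.85, 0.7)
```

Unlike OA, AA weights every class equally, which is why it is reported alongside OA on data sets with strongly imbalanced categories, and Kappa discounts agreement expected by chance.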


TABLE IX
COMPARISON OF EXPERIMENTAL CLASSIFICATION RESULTS ON SALINAS

Fig. 12. T-SNE feature visualization of the hyperspectral data sets. (a)–(e) denote the T-SNE features of the Indian Pines data set, (f)–(j) of the Pavia University data set, and (k)–(o) of the Salinas data set. From left to right, each row is obtained based on the methods 2-D-CNN, 3-D-CNN, GCN, FuNet-C, and DAGCN.
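Fig. 12-style feature inspection is typically produced with t-SNE, e.g., `sklearn.manifold.TSNE(n_components=2).fit_transform(H)` when scikit-learn is available. As a dependency-free sketch of the same step, i.e., project the learned features to 2-D and scatter-plot them per class, a PCA projection works (the function name is ours):

```python
import numpy as np

def project_2d(H):
    """Project high-dimensional features to 2-D for scatter-plot inspection.
    (A PCA stand-in; the figure itself uses t-SNE.)"""
    Hc = H - H.mean(axis=0)                        # center the features
    _, _, vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ vt[:2].T                           # top-2 principal coordinates

rng = np.random.default_rng(2)
Y = project_2d(rng.random((30, 16)))   # 30 samples of 16-D features -> (30, 2)
```

Coloring Y by class label then makes overlap between categories, such as that between categories 8 and 15 above, directly visible.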

In summary, the proposed DAGCN achieves the best performance for the OA, AA, and Kappa coefficients in the three data sets, which proves the effectiveness and superiority of the proposed method. Experiments show that, when our training samples are small, even if there are only ten samples, the classification results are good. In the result figures for HIC, we see that DAGCN can solve the problem that irregular shapes are difficult to distinguish and that the interference between different categories is large. In addition, the classification results are more accurate in narrow regions and at category boundaries. This shows that our GCN structure based on spectral similarity can adapt to the non-Euclidean structure of HSIs. From the perspective of the T-SNE features, our method can


TABLE X
COMPARISON OF EXPERIMENTAL CLASSIFICATION RESULTS ON HOUSTON

Fig. 13. Experimental classification result on the Houston data set. (a) Source image. (b) Ground truth. (c) DAGCN.

uniformly map hyperspectral data to a high-dimensional feature space, there is less overlap between samples, and the same category has a certain aggregation. This implies that our DeepGCN structure can better map hyperspectral data into high-dimensional space, which is more suitable for nonlinear classification.

V. CONCLUSION

In this article, we developed an HIC method by designing the DAGCN. Specifically, a new KSAM-SID similarity measurement method is used to construct an adjacency matrix that considers spectral and spatial location information. Then, according to the attention learning mechanism, the samples are given different characteristics of aggregate influence. From the perspective of network design, to solve the problem that increasing the number of convolution layers results in the indivisibility of features and gradient disappearance, a deep graph CNN is designed by utilizing dense connections and dilated convolution design techniques. Furthermore, the graph adjacency matrix is updated dynamically while the network is trained. Experiments show that the proposed method can effectively extract the hyperspectral data features and improve the classification accuracy.

Our findings show that arbitrary-shape convolution is more suitable for HIC at the pixel level. Due to the new similarity measurement method, sparse matrix storage can be used to save space although it uses all spectral information. In addition, we realize that the computational complexity is still high,


especially when the hyperspectral data set is large. Hence, end-to-end learning is a great challenge. Seeking solutions in future work, we will investigate design methods for automatic end-to-end representation learning and implement large-scale data learning.

REFERENCES

[1] X. Jia, B.-C. Kuo, and M. M. Crawford, “Feature mining for hyperspectral image classification,” Proc. IEEE, vol. 101, no. 3, pp. 676–697, Mar. 2013.
[2] P. Ghamisi et al., “Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art,” IEEE Geosci. Remote Sens. Mag., vol. 5, no. 4, pp. 37–78, Dec. 2017.
[3] S. Roessner, K. Segl, U. Heiden, and H. Kaufmann, “Automated differentiation of urban surfaces based on airborne hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1525–1532, Jul. 2001.
[4] G. Camps-Valls, D. Tuia, L. Bruzzone, and J. A. Benediktsson, “Advances in hyperspectral image classification: Earth monitoring with statistical learning methods,” IEEE Signal Process. Mag., vol. 31, no. 1, pp. 45–54, Jan. 2014.
[5] J. Li, I. Dopido, P. Gamba, and A. Plaza, “Complementarity of discriminative classifiers and spectral unmixing techniques for the interpretation of hyperspectral images,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2899–2912, May 2015.
[6] F. F. Sabins, “Remote sensing for mineral exploration,” Ore Geol. Rev., vol. 14, nos. 3–4, pp. 157–183, Sep. 1999.
[7] L. Ma, M. M. Crawford, and J. Tian, “Local manifold learning-based k-nearest-neighbor for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4099–4109, Nov. 2010.
[8] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[9] E. Aptoula, M. Dalla Mura, and S. Lefevre, “Vector attribute profiles for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3208–3220, Jun. 2016.
[10] G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, “Texture feature analysis using a Gauss-Markov model in hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 7, pp. 1543–1551, Jul. 2004.
[11] D. Tuia, F. Ratle, F. Pacifici, M. F. Kanevski, and W. J. Emery, “Active learning methods for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2218–2232, Jul. 2009.
[12] I. Dópido, J. Li, P. Reddy Marpu, A. Plaza, J. M. Bioucas Dias, and J. Atli Benediktsson, “Semisupervised self-learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 7, pp. 4032–4044, Jul. 2013.
[13] A. K. Jain, J. Mao, and K. M. Mohiuddin, “Artificial neural networks: A tutorial,” Computer, vol. 29, no. 3, pp. 31–44, Mar. 1996.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2012, pp. 1097–1105.
[15] Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural networks for sequence learning,” 2015, arXiv:1506.00019.
[16] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, “Deep convolutional neural networks for hyperspectral image classification,” J. Sensors, vol. 2015, Jul. 2015, Art. no. 258619.
[17] S. Mei, J. Ji, Q. Bi, J. Hou, Q. Du, and W. Li, “Integrating spectral and spatial information into deep convolutional neural networks for hyperspectral classification,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Beijing, China, Jul. 2016, pp. 5067–5070.
[18] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, “Deep supervised learning for hyperspectral data classification through convolutional neural networks,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Milan, Italy, Jul. 2015, pp. 4959–4962.
[19] E. Aptoula, M. C. Ozdemir, and B. Yanikoglu, “Deep learning with attribute profiles for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 12, pp. 1970–1974, Dec. 2016.
[20] W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4544–4554, Aug. 2016.
[21] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.
[22] Y. Li, H. Zhang, and Q. Shen, “Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network,” Remote Sens., vol. 9, no. 67, pp. 1–21, 2017.
[23] L. Mou, P. Ghamisi, and X. X. Zhu, “Deep recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3639–3655, Jul. 2017.
[24] R. Hang, Q. Liu, D. Hong, and P. Ghamisi, “Cascaded recurrent neural networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5384–5394, Aug. 2019.
[25] X. Zhang, Y. Sun, K. Jiang, C. Li, L. Jiao, and H. Zhou, “Spatial sequential recurrent neural network for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 11, pp. 4141–4155, Nov. 2018.
[26] A. Qin, Z. Shang, J. Tian, Y. Wang, T. Zhang, and Y. Y. Tang, “Spectral–spatial graph convolutional networks for semisupervised hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 16, no. 2, pp. 241–245, Feb. 2019.
[27] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, “Multiscale dynamic graph convolutional network for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3162–3177, May 2020.
[28] L. Mou, X. Lu, X. Li, and X. X. Zhu, “Nonlocal graph convolutional networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 8246–8257, Dec. 2020, doi: 10.1109/TGRS.2020.2973363.
[29] S. Wan, C. Gong, P. Zhong, S. Pan, G. Li, and J. Yang, “Hyperspectral image classification with context-aware dynamic graph convolutional network,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 597–612, Jan. 2021, doi: 10.1109/TGRS.2020.2994205.
[30] G. Li, M. Muller, A. Thabet, and B. Ghanem, “DeepGCNs: Can GCNs go as deep as CNNs?” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 9267–9276.
[31] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[32] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[33] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” 2015, arXiv:1511.07122.
[34] Q. Tong, Y. Xue, and L. Zhang, “Progress in hyperspectral remote sensing science and technology in China over the past three decades,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 1, pp. 70–91, Jan. 2014.
[35] J. Bai et al., “Class incremental learning with few-shots based on linear programming for hyperspectral image classification,” IEEE Trans. Cybern., early access, Nov. 24, 2020, doi: 10.1109/TCYB.2020.3032958.
[36] S. Li, W. Song, L. Fang, Y. Chen, P. Ghamisi, and J. Atli Benediktsson, “Deep learning for hyperspectral image classification: An overview,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 9, pp. 6690–6709, Sep. 2019.
[37] Z. Wang, J. S. Tyo, and M. M. Hayat, “Data interpretation for spectral sensors with correlated bands,” J. Opt. Soc. Amer. A, vol. 24, no. 9, pp. 2864–2870, 2007.
[38] X. Liu and C. Yang, “A kernel spectral angle mapper algorithm for remote sensing image classification,” in Proc. 6th Int. Congr. Image Signal Process. (CISP), vol. 2, Dec. 2013, pp. 814–818.
[39] C. Chang, “Spectral information divergence for hyperspectral image analysis,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., Hamburg, Germany, vol. 1, Jun. 1999, pp. 509–511.
[40] A. Vaswani et al., “Attention is all you need,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[41] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” 2016, arXiv:1609.02907.
[42] D. K. Hammond, P. Vandergheynst, and R. Gribonval, “Wavelets on graphs via spectral graph theory,” Appl. Comput. Harmon. Anal., vol. 30, no. 2, pp. 129–150, Mar. 2011.


[43] Y. Du, C.-I. Chang, H. Ren, C.-C. Chang, J. O. Jensen, and F. M. D’Amico, “New hyperspectral discrimination measure for spectral characterization,” Opt. Eng., vol. 43, no. 8, pp. 1777–1787, 2004.
[44] Q. Li, Z. Han, and X.-M. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” 2018, arXiv:1801.07606.
[45] K. Xu, C. Li, Y. Tian, T. Sonobe, K.-I. Kawarabayashi, and S. Jegelka, “Representation learning on graphs with jumping knowledge networks,” 2018, arXiv:1806.03536.
[46] K. K. Thekumparampil, C. Wang, S. Oh, and L.-J. Li, “Attention-based graph neural network for semi-supervised learning,” 2018, arXiv:1803.03735.
[47] M. Simonovsky and N. Komodakis, “Dynamic edge-conditioned filters in convolutional neural networks on graphs,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3693–3702.
[48] F. A. Kruse et al., “The spectral image processing system (SIPS)—Interactive visualization and analysis of imaging spectrometer data,” Remote Sens. Environ., vol. 44, nos. 2–3, pp. 145–163, 1993.
[49] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.
[50] X. Kang, S. Li, and J. Atli Benediktsson, “Spectral–spatial hyperspectral image classification with edge-preserving filtering,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 5, pp. 2666–2677, May 2014.
[51] X. Cao, L. Xu, D. Meng, Q. Zhao, and Z. Xu, “Integration of 3-dimensional discrete wavelet transform and Markov random field for hyperspectral image classification,” Neurocomputing, vol. 226, pp. 90–100, Feb. 2017.
[52] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, “Graph convolutional networks for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., early access, Aug. 18, 2020, doi: 10.1109/TGRS.2020.3015157.

Jing Bai (Senior Member, IEEE) received the B.S. degree in electronic and information engineering from Zhengzhou University, Zhengzhou, China, in 2004, and the Ph.D. degree in pattern recognition and intelligent systems from Xidian University, Xi’an, China, in 2009.
She is an Associate Professor with Xidian University. Her research interests include image processing, machine learning, and intelligent information processing.

Licheng Jiao (Fellow, IEEE) received the B.S. degree from Shanghai Jiao Tong University, Shanghai, China, in 1982, and the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 1984 and 1990, respectively.
Since 1992, he has been a Professor with the School of Electronic Engineering, Xidian University, Xi’an, where he is the Director of the Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, International Research Center of Intelligent Perception and Computation. His research interests include intelligent information processing, image processing, machine learning, and pattern recognition.
Prof. Jiao is also a member of the IEEE Xi’an Section Execution Committee, the President of the Computational Intelligence Chapter, the IEEE Xi’an Section, and the IET Xi’an Network, the Chairman of the Awards and Recognition Committee, the Vice Board Chairperson of the Chinese Association of Artificial Intelligence, a Councilor of the Chinese Institute of Electronics, a Committee Member of the Chinese Committee of Neural Networks, and an Expert of the Academic Degrees Committee of the State Council.

Hongyang Chen (Senior Member, IEEE) received the B.S. degree from Southwest Jiaotong University, Chengdu, China, in 2003, the M.S. degree from the Institute of Mobile Communications, Southwest Jiaotong University, in 2006, and the Ph.D. degree from The University of Tokyo, Tokyo, Japan, in 2011.
From 2004 to 2006, he was a Research Assistant with the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. In 2009, he was a Visiting Researcher in the UCLA Adaptive Systems Laboratory, University of California Los Angeles, Los Angeles, CA, USA. From April 2011 to June 2020, he was a Researcher with Fujitsu Ltd., Tokyo. He is a Principal Investigator/Senior Research Scientist with the Zhejiang Laboratory, Hangzhou, China. He has published 80 refereed journal articles and conference papers in the ACM Transactions on Sensor Networks, the IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE Military Communications Confer-
ence (MILCOM), IEEE Global Communications Conference (GLOBECOM),
IEEE International Conference on Communications (ICCs), and so on. He has
Bixiu Ding (Member, IEEE) received the B.S. granted/filled more than 50 PCT patents. His research interests include IoT,
degree in electronic information science and technol- data-driven intelligent networking and systems, machine learning, localization
ogy from Zhejiang Sci-Tech University, Hangzhou, and location-based big data, beyond the fifth generation (B5G), and statistical
China, in 2019. She is pursuing the M.S. degree in signal processing.
computer technology with Xidian University, Xi’an, Dr. Chen has served as an Editor for the IEEE T RANSACTIONS ON
China. W IRELESS C OMMUNICATIONS and an Associate Editor for IEEE C OMMU -
She is interested in image processing, deep learn- NICATIONS L ETTERS .
ing, and intelligent information processing.

Zhu Xiao (Senior Member, IEEE) received the M.S.


and Ph.D. degrees in communication and informa- Amelia C. Regan (Member, IEEE) received the
tion systems from Xidian University, Xi’an, China, M.S. degree in mathematics from Johns Hopkins
in 2007 and 2009, respectively. University, Baltimore, MD, USA, and the M.S.E.
From 2010 to 2012, he was a Research Fellow with and Ph.D. degrees from The University of Texas at
the Department of Computer Science and Technol- Austin, Austin, TX, USA.
ogy, University of Bedfordshire, Luton, U.K. He is She is a Professor with the Donald Bren School of
an Associate Professor with the College of Computer Information and Computer Sciences, University of
Science and Electronic Engineering, Hunan Univer- California at Irvine, Irvine, CA, USA. Her research
sity, Changsha, China. His research interests include interests include cyber–physical transportation sys-
wireless localization, the Internet of Vehicles, and tems, dynamic and stochastic network optimization,
intelligent transportation systems. Please see https://zhuxiao-hnu.github.io/en/ and machine learning tools for temporal–spatial data
for more details. analysis.