
Deep Learning Ensemble for Hyperspectral Image Classification

Yushi Chen, Member, IEEE, Ying Wang, Yanfeng Gu, Senior Member, IEEE, Xin He, Pedram Ghamisi, Senior Member, IEEE, and Xiuping Jia, Senior Member, IEEE

Abstract—Deep learning models, especially deep convolutional neural networks (CNNs), have been intensively investigated for hyperspectral image (HSI) classification due to their powerful feature extraction ability. In the same manner, ensemble-based learning systems have demonstrated high potential to effectively perform supervised classification. In order to boost the performance of deep learning-based HSI classification, the idea of a deep learning ensemble framework is proposed here, which is loosely based on the integration of a deep learning model and random subspace-based ensemble learning. Specifically, two deep learning ensemble-based classification methods (i.e., CNN ensemble and deep residual network ensemble) are proposed. CNNs or deep residual networks are used as individual classifiers, and random subspaces contribute to diversify the ensemble system in a simple yet effective manner. Moreover, to further improve the classification accuracy, transfer learning is investigated in this study to transfer the learnt weights from one individual classifier to another (i.e., between CNNs). This mechanism speeds up the learning stage. Experimental results with widely used hyperspectral datasets indicate that the proposed deep learning ensemble system provides competitive results compared with state-of-the-art methods in terms of classification accuracy. The combination of deep learning and ensemble learning provides significant potential for reliable HSI classification.

Index Terms—Convolutional neural network (CNN), deep learning, ensemble, hyperspectral imagery classification, random subspace.

Manuscript received November 24, 2018; revised March 14, 2019; accepted April 30, 2019. Date of publication May 22, 2019; date of current version July 17, 2019. This work was supported in part by the Natural Science Foundation of China under Grants 61771171 and 61871157, in part by the Open Fund of the State Key Laboratory of Frozen Soil Engineering under Grant SKLFSE201614, and in part by the "High Potential Program" of Helmholtz-Zentrum Dresden-Rossendorf. (Corresponding author: Yushi Chen.)

Y. Chen, Y. Gu, and X. He are with the School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin 150001, China (e-mail: [email protected]; [email protected]; [email protected]).

Y. Wang is with the Higher Education Key Lab for Measure & Control Technology and Instrumentations of Heilongjiang, Harbin University of Science and Technology, Harbin 150080, China (e-mail: [email protected]).

P. Ghamisi is with the Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz Institute Freiberg for Resource Technology, Freiberg 09599, Germany (e-mail: [email protected]).

X. Jia is with the School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2600, Australia (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTARS.2019.2915259

I. INTRODUCTION

HYPERSPECTRAL image (HSI) classification is a task that tries to assign a label to each pixel in the scene [1]. HSI classification is a fundamental technique in many applications and is recognized as one of the most vibrant topics in the remote sensing community [2].

HSI usually contains hundreds of spectral channels, and the obtained spectral information is a valuable source for classification. In the last two decades, many spectral classifiers have been proposed for HSI classification, including k-nearest neighbors, maximum likelihood, support vector machine (SVM), logistic regression, neural network, and random forest [3]. Among the aforementioned methods, an SVM obtains relatively high classification accuracy compared to other widely used pattern recognition techniques [4]. In [5], several well-known spectral classifiers were critically compared against each other for the classification of hyperspectral remote sensing data.

HSI contains abundant spectral and spatial information, and there is a rich literature on spectral-spatial HSI classification. Morphological profiles (MPs) can efficiently extract spatial features, and the combination of an MP and an SVM formulates an accurate spectral-spatial classifier [6]. Other spatial feature extraction methods, including Markov random fields and graphical models, have also contributed to the spectral-spatial classification of HSI [7], [8]. Sparse representation was also introduced to extract the spectral-spatial features of HSI [9]. Moreover, sparse models have been combined with other feature extraction methods to further improve the classification performance [10]. Very recently, more than 20 well-known spectral-spatial classification approaches were critically evaluated in [11].

Deep learning based methods have shown their advantages in many research areas, such as image classification [10], natural language processing [12], speech recognition [13], and remote sensing [14]. In recent years, many deep learning models, including the stacked autoencoder [15], deep belief network [16], convolutional neural network (CNN) [17], recurrent neural network [18], and deep dictionary learning [19], have been explored for HSI classification.

Among the deep learning models for HSI classification, CNN-based methods have attracted a lot of attention in recent years [20]. In general, there are two types of CNN-based methods: spectral and spectral-spatial classifiers. In [21], a CNN was used to extract the spectral features of HSIs, and the well-designed CNN achieved better classification performance than an SVM and a conventional deep neural network. After that, in [22], a novel CNN architecture was proposed as a deep spectral classifier to extract the pixel-pair features of HSIs. Most of the CNN-based methods have focused on spectral-spatial HSI classification.


In [18], a framework that combines principal component analysis, a deep CNN, and logistic regression was used for HSI spectral-spatial classification. Due to the fact that HSIs are inherently three-dimensional (3-D) data, it is reasonable to use 3-D CNNs to classify HSI [23], [24]. Some new deep 3-D CNN models, including residual networks, have also been investigated for HSI classification [25], [26].

Furthermore, CNNs can be combined with other techniques, such as sparse representation and MPs, to further improve the classification performance. In [27], a CNN was combined with sparse representation to refine the learned features. Very recently, a method based on the combination of MPs and a CNN was introduced to extract the features of HSIs, which leads to a performance improvement [28].

Although deep learning based methods, in particular deep CNNs, have improved the classification performance of HSI, some disadvantages still remain. First of all, due to the large number of parameters that need to be tuned, training samples are often inadequate. A limited number of training samples is a common issue in real-world remote sensing applications. Deep learning models, therefore, face a problem named overfitting, which means that a deep model can obtain good performance on the training data but relatively poor performance on the test data.

In order to improve the HSI classification accuracy of deep CNN-based methods, ensemble learning combined with a deep CNN is investigated in this study. In ensemble learning, several "weak" classifiers are combined based on a proper strategy to obtain better performance than any individual classifier [29]. Several ensemble methods have been proposed, including boosting [30], AdaBoost [31], and random forest [32]. An ensemble classifier, even though it eventually outputs a single hypothesis, is more flexible in the functions it can represent. In theory and in practice, ensemble methods tend to improve classification performance [33].

In the context of hyperspectral remote sensing, many ensemble-based methods have been proposed for HSI classification. For example, in [34], a multiple classifier system, which combines an SVM and random feature selection (RFS), was proposed to explore the potential of ensemble learning for HSI classification.

In this study, to make use of the advantages of both deep learning and the ensemble method, a deep learning ensemble method is proposed for HSI classification. There are two core factors that define a good ensemble-based classification system: the accuracy of the individual classifiers and the diversity among the classifiers [35]. Because of the relatively good feature extraction and classification ability of a deep CNN, we use it here as the individual classifier. Moreover, due to the simple yet effective performance of RFS, we use it to diversify the component classifiers. By the integration of a deep CNN and RFS, the ensemble learning system can be expected to offer better performance.

Furthermore, transfer learning is used to further improve the classification accuracy. There are several individual classifiers in an ensemble classifier. Here, we let the learned parameters of the previous classifier (i.e., CNN) be transferred to the current CNN as its initialization parameters. This mechanism takes advantage of the available information.

The main contributions of this paper are summarized as follows.

1) To the best of our knowledge, the idea of combining deep learning and ensemble methods for HSI classification is proposed for the first time in this paper.

2) Two deep learning ensemble-based classification methods [i.e., CNN ensemble and deep residual network (ResNet) ensemble] are proposed for HSI classification. The proposed method utilizes deep learning models to extract robust and discriminative features of HSI, RFS to formulate the diversity of the ensemble system, and majority voting to obtain the final classification results.

3) Furthermore, a transferring deep learning ensemble is proposed to make full use of the learned weights of the CNNs, which improves the eventual classification accuracy.

The rest of this paper is organized as follows. Section II describes the deep CNN-based ensemble framework for HSI classification, and Section III presents the deep CNN ensemble with transfer learning. Experimental results with three hyperspectral datasets are shown in Section IV. Section V summarizes the observations and concludes the paper by pointing out some possible future works.

II. DEEP CNN ENSEMBLE FOR HYPERSPECTRAL IMAGERY CLASSIFICATION

In this section, the deep learning ensemble framework is discussed. Fig. 1 illustrates the workflow of the proposed method. It can be seen that the proposed method consists of two core parts: RFS and CNNs. Three CNNs are illustrated as individual classifiers, and RFS is used to formulate an effective multiple classifier system. Finally, the final classification result is obtained by applying majority voting to the results of the individual classifiers.

Fig. 1. Framework of deep learning ensemble for HSI classification.

A. Deep Learning and CNN

Deep learning models try to formulate a neural network with several layers, typically deeper than three layers, aiming at extracting discriminative features and achieving accurate classification [36]. There are several popular deep learning architectures, including the stacked autoencoder, deep belief network, deep recurrent neural network, and deep CNN. In recent years, CNNs have outperformed other deep learning models in classification [37], detection [38], and natural language processing [39]. Specifically, in the remote sensing community, CNNs have delivered promising classification results. In this study, well-designed CNNs are used as individual classifiers.

Compared to other deep learning models, CNNs have two unique factors in their architecture design: local connections and shared weights. A complete CNN stage contains a convolution layer, a nonlinearity mapping layer, and a pooling operation layer. A deep CNN is constructed by stacking several convolution layers and pooling layers to form a deep architecture [40].


The convolution layer uses different filters to extract features from the input data, and the pooling operation layer makes the obtained features more abstract and robust:

$$x_j^l = \sum_{i=1}^{M} x_i^{l-1} \ast k_{ij}^l + b_j^l \qquad (1)$$

Equation (1) describes how the convolutional layer calculates the output feature map. In (1), $x_j^l$ is the $j$th feature map of the $l$th layer, $x_i^{l-1}$ is the $i$th feature map of the $(l-1)$th layer, and $M$ is the number of input feature maps. The trainable parameters $k_{ij}^l$ and $b_j^l$ are randomly initialized and set to zero, respectively, and $\ast$ is the convolution operation.

Pooling can offer invariance by reducing the resolution of the feature maps. Each pooling layer corresponds to the previous convolutional layer. A neuron in the pooling layer combines a small $N \times N$ patch of the convolution layer, and max pooling is an operation that takes the maximum value of the small patch.

The nonlinearity layer calculates the output feature map, which is defined as follows:

$$a^s = f(z^s) \qquad (2)$$

where $f(\cdot)$ is the rectified linear unit (ReLU) [i.e., $f(x) = \max(0, x)$] in this paper.

To improve the performance of the networks, batch normalization is also adopted. Batch normalization has two main advantages: First, it leads to a faster learning procedure, since the learning rate can be increased compared to the non-batch-normalized version. Second, the flexibility in the mean and variance values for each dimension in every hidden layer provides higher learning ability, which consequently increases the accuracy of the network [41].

The learnable parameters in the deep CNN model are trained using a mini-batch-based backpropagation algorithm.

A deep CNN works well for HSI classification under the condition of sufficient training samples. Unfortunately, limited training samples are a common issue in the remote sensing community. Therefore, a deep CNN is a relatively "weak" HSI classifier. In order to obtain an accurate HSI classifier, ensemble learning with a deep CNN is investigated in this paper.
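To make the stage structure above concrete, the following is a minimal PyTorch sketch of a deep CNN built from convolution, batch normalization, ReLU, and max-pooling stages. The channel widths and the 3 × 3 windows are illustrative assumptions, not the exact configuration later reported in Table IV.

```python
import torch.nn as nn

class DeepCNN(nn.Module):
    """Three conv-BN-ReLU-pool stages and a 1 x nClass head, mirroring the
    complete CNN stage described above (layer widths are illustrative)."""
    def __init__(self, in_bands: int, n_classes: int):
        super().__init__()
        def stage(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # convolution, cf. (1)
                nn.BatchNorm2d(c_out),                             # batch normalization
                nn.ReLU(),                                         # nonlinearity, cf. (2)
                nn.MaxPool2d(2))                                   # max pooling
        self.features = nn.Sequential(stage(in_bands, 32), stage(32, 64), stage(64, 128))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, n_classes))

    def forward(self, x):                    # x: (batch, bands, height, width)
        return self.head(self.features(x))  # logits; softmax is applied in the loss
```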
B. Deep CNN Ensemble Based on RFS

In the deep CNN ensemble, a feature subspace is constructed by randomly selecting features (i.e., bands) from the original feature space, and each individual deep CNN classifier is trained and tested within its corresponding feature subspace [42].

Let $X = \{(x_i, c_i) \mid 1 \le i \le N\}$ represent the original $M$-dimensional dataset, which is composed of $N$ training samples, where $x_i \in \mathbb{R}^M$ is a training sample with the corresponding class label $c_i \in C = \{1, 2, \ldots, L\}$, and $L$ is the total number of classes. In the RFS-based deep CNN ensemble, each classifier within the ensemble system is trained in a random feature subspace with dimensionality $m$ ($m \ll M$). The random feature subspace is generated by the random selection of $m$ bands from the original $M$ bands, so that $X$ turns into $\tilde{X} = \{(\tilde{x}_i, c_i) \mid 1 \le i \le N\}$, where $\tilde{x}_i \in \mathbb{R}^m$. $\tilde{X}$ serves as the input to a single deep CNN, which outputs a classifier $h = \mathrm{DCNN}(\tilde{X})$. This process is repeated $Z$ times to obtain the combination of random subspaces $\{\tilde{X}_1, \tilde{X}_2, \ldots, \tilde{X}_Z\}$ and to construct the individual classifiers $H = \{h_1, h_2, \ldots, h_Z\}$, where $Z$ is the size of the ensemble.
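A compact sketch of this construction follows; a scikit-learn MLP stands in for the deep CNN so that the snippet stays self-contained (in practice, each classifier would be a CNN like the one above, trained on its own band subset).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_rfs_ensemble(X, y, m, Z, seed=0):
    """Train Z individual classifiers, each on a random m-band subspace of the
    N x M training matrix X; returns the models and their band subsets."""
    rng = np.random.default_rng(seed)
    models, subsets = [], []
    for _ in range(Z):
        bands = rng.choice(X.shape[1], size=m, replace=False)  # random feature subspace
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(X[:, bands], y)          # h = DCNN(X~) in the notation above
        models.append(clf)
        subsets.append(bands)
    return models, subsets
```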


In the classification procedure, each individual deep CNN is applied within the corresponding feature subspace of the test data $\tilde{Y} = \{y_j \mid 1 \le j \le P\}$, $y_j \in \mathbb{R}^m$, where $P$ is the number of test samples. After applying the individual classification step, the label of a test sample is decided using majority voting on the results of the individual classifiers [43]. In majority voting, each weak classifier assigns a class label to a given test sample based on the corresponding hypothesis defined in the associated feature subspace [44]. There is no need to assume any prior knowledge about the performance of the weak classifiers. Finally, the class that receives the highest number of votes is appointed as the final decision [45].

Besides accuracy, the diversity in ensemble learning is also an important criterion for evaluating classification performance [45], [46]. The ambiguous relationship between diversity and accuracy discourages optimizing the diversity directly; in general, higher diversity indicates better generalization performance [47].

The diversity between the outputs (correct/incorrect) of two classifiers $(h_i, h_j)$ is measured by

$$\mathrm{Div}(i, j) = \frac{N_{\mathrm{diff}}}{N} \qquad (3)$$

where $N_{\mathrm{diff}}$ is the number of samples on which the two classifiers produce different results and $N$ is the number of test samples. Then, the diversity of the ensemble is the average value over all pairs $\mathrm{Div}(i, j)$, estimated by the following formula:

$$\mathrm{DIV}(\mathrm{Ensemble}) = \frac{\sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{Div}(i, j)}{K}, \quad i \ne j \qquad (4)$$

where $K$ is the number of individual classifiers.
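As a small illustration of the voting rule and of (3) and (4), the following sketch operates on the (Z, P) matrix of per-classifier predictions; it assumes integer-coded class labels and averages the pairwise disagreement over all pairs with i ≠ j.

```python
import numpy as np

def majority_vote(models, subsets, X_test):
    """Each classifier predicts within its own band subset; the most-voted label wins."""
    votes = np.stack([clf.predict(X_test[:, bands])
                      for clf, bands in zip(models, subsets)])   # shape (Z, P)
    return votes, np.array([np.bincount(col).argmax() for col in votes.T])

def ensemble_diversity(votes):
    """Mean pairwise disagreement: Div(i, j) = N_diff / N as in (3), averaged
    over all classifier pairs with i != j in the spirit of (4)."""
    Z = votes.shape[0]
    divs = [np.mean(votes[i] != votes[j])
            for i in range(Z) for j in range(Z) if i != j]
    return float(np.mean(divs))
```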
the ResNet ensemble. To do so, first, some individual ResNets
C. ResNet and ResNet Ensemble for HSI Classification

On the one hand, the stacking of layers in a CNN causes a degradation problem [48]. A new building block (i.e., residual learning) can be used to formulate a new type of deep learning model (i.e., the ResNet). A ResNet is easy to train and obtains better performance compared with a CNN in terms of training accuracy. On the other hand, the proposed deep learning ensemble framework for HSI classification is flexible: instead of a CNN, a ResNet with RFS can be used to formulate a new deep learning ensemble-based classifier for HSI classification.

The number of layers in recent networks keeps increasing to deal with more complicated scenarios and achieve higher accuracy for the problem at hand. However, when the network becomes deeper, the classification accuracy of deep learning models gets saturated and even decreases (known as the degradation problem). To address this problem, instead of simply stacking more plain CNN layers, the deep residual learning framework lets the layers fit a residual mapping by connecting shortcut (skip) layers and adding them to the outputs of the stacked layers. Such a network can also be realized as a feedforward neural network with shortcut connections.

The formulation is shown in Fig. 2 [48]: the original mapping is $H(x)$, which is recast as $H(x) = F(x) + x$ and realized by a shortcut connection. Shortcut connections add neither extra parameters nor computational complexity. In Fig. 2, the weight layer is composed of convolution layers, a batch normalization layer, and a ReLU layer.

Fig. 2. Residual learning: A building block.

The ResNet ensemble is similar to the deep CNN ensemble: the only difference is that a ResNet is substituted for the deep CNN in the ensemble-based method of Section II-B. To do so, first, some individual ResNets with randomly selected bands are designed and trained. Second, the classification result of the ResNet ensemble is obtained by majority voting.
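A minimal PyTorch sketch of one such building block follows; the use of two 3 × 3 convolutions with a fixed channel count is an illustrative assumption in the spirit of Fig. 2.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The stacked weight layers learn the residual F(x); the identity shortcut
    adds x back, so the block outputs H(x) = F(x) + x as in Fig. 2."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels))
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)  # the shortcut adds no parameters
```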


III. DEEP CNN ENSEMBLE WITH TRANSFER LEARNING FOR HSI CLASSIFICATION

On one hand, an ensemble classification system usually obtains better classification accuracy than an individual classifier. On the other hand, an ensemble classification system always takes a much longer processing time compared with a single classifier. In order to build an ensemble system with a short training time, a deep CNN ensemble with transfer learning is proposed for HSI classification. The core idea behind this method is that the learned weights (i.e., connections) of one individual CNN can be reused for another individual CNN.

A. Transfer Learning

Transfer learning aims at transferring knowledge between a "source" space and a "task" space, which is inspired by human learning behavior [49]. In general, it is easier for us to learn a task by using our previous knowledge. Similarly, previously learned knowledge is also useful for machine learning algorithms.

The power of deep learning models, including CNNs, depends on the weights (i.e., connections) of the network. Therefore, the core issue of training a deep model is to find a set of proper weights. In the context of transfer learning based deep models, we want to transfer the weights from the "source" space to the "task" space. Transfer learning tries to boost the performance on the target domain by taking full advantage of the source domain.

Transfer learning is widely used in the machine learning community, including in deep neural network based methods. In [50], the convolutional layers of CNNs are pretrained on a large-scale supervised task. The architecture of a CNN often has millions of parameters, so directly learning such a large number of parameters from only a few training images is problematic. The key idea of transferred CNN learning is that the convolutional layers of a CNN can act as a "generic extractor" of image representations.

B. Deep CNN Ensemble With Transfer Learning for HSI Classification

The deep CNN ensemble with the transfer learning framework is shown in Fig. 3. As can be seen, the core part, RFS combined with a CNN, is similar to what we presented for the deep learning ensemble method in Section II. The special part of the deep CNN ensemble with transfer learning is the weight transfer between different CNNs. To do so, first, a CNN with all bands is designed and trained. Second, some individual CNNs with randomly selected bands are designed. Instead of initializing the weights of the individual CNNs in a random manner, the weights of the first convolutional layers of the individual CNNs are transferred from the CNN with all bands; the weights of the remaining layers are randomly initialized. Then, the individual CNNs are trained through backpropagation. At last, the classification result of the deep CNN ensemble with transfer learning is obtained by majority voting.

Fig. 3. Framework of deep learning ensemble with transfer learning for hyperspectral imagery classification.

It should be noted that the training process of the individual CNNs then demands fewer training epochs to converge. Furthermore, the proper weight initialization may avoid local minima, which always lead to poor generalization performance.
domain. To evaluate the performance of the proposed deep CNN en-
Transfer learning is widely used in the machine learning com- semble methods, two public benchmark HSI datasets are used
munity, including deep neural network based methods. In [50], in the experiments.
the convolutional layers of CNNs are pretrained on a large-scale The first dataset, Salinas, is composed of 512 × 217 pixels
supervised task. The architecture of CNNs often has millions with a spatial resolution of 3.7 m and 204 bands after removing
of parameters, so the direct learning of such a large number 20 water absorption bands (bands: [108–112], [154–167], and
of parameters using only a few training images is problem- 224). In total 16 different land-cover classes are defined for the
atic. The key idea of transfer CNN learning is that the convolu- site. The false color composite image is shown in Fig. 4. The
tional layers of CNNs can act as a “generic extractor” of image number of samples in each class is listed in Table I.
representation. The second dataset, Indian Pines, is with the size of 145 × 145
pixels and 224 spectral reflectance bands in the wavelength range
of 0.4–2.5 μm. As with the Salinas scene, 20 water absorption
B. Deep CNN Ensemble With Transfer Learning for HSI
bands have been removed (bands: [104–108], [150–163], and
Classification
220). The false color composite image is shown in Fig. 5. The
The deep CNN ensemble with the transfer learning framework number of samples in each class is given in Table II.
is shown in Fig. 3. As can be seen, the core part of the RFS and The third dataset, Pavia University, is with the size of 610 ×
a CNN is similar to what we presented for the previous deep 340 pixels and 115 bands in the 0.43–0.86-μm range, which was
learning ensemble method in Section II. The special part of the gathered by the Reflective Optics System Imaging Spectrome-
deep CNN ensemble with transfer learning is the weight transfer ter (ROSIS-3). The spatial resolution is 1.3 m per pixel, and
between different CNNs. To do so, first, a CNN with all bands nine land-cover classes are selected. In the experiment, noisy
is designed and trained. Second, some individual CNNs with bands were removed and the remaining 103 channels were used
randomly selected bands are designed. Instead of initializing the for classification. The false color composite image is shown in
weights of individual CNNs in a random manner, the weights of Fig. 6. The samples are listed in Table III.
the first convolutional layers of individual CNNs are transferred For the number of training samples, 300 labeled samples of
from the CNN with all bands. The weights of the remaining Salinas, Indian Pines, and Pavia University datasets are ran-
layers are randomly initialized. Then, the individual CNNs are domly selected in total as training samples. The rest of the la-
trained through backpropagation. At last, the classification result beled samples are used as test samples. In this way, we only use
of the deep CNN ensemble with transfer learning is obtained by a very limited number of training samples for the classification
majority voting. (i.e., there are 16 classes in Salinas and Indian Pines datasets,


For the training samples, 300 labeled samples in total are randomly selected from each of the Salinas, Indian Pines, and Pavia University datasets, and the rest of the labeled samples are used as test samples. In this way, we only use a very limited number of training samples for classification (i.e., there are 16 classes in the Salinas and Indian Pines datasets, so there are about 19 training samples per class; there are nine classes in the Pavia University dataset, so there are about 33 training samples per class).
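A minimal sketch of this random split, assuming the labeled pixels are indexed 0 ... n_labeled − 1:

```python
import numpy as np

def split_train_test(n_labeled, n_train=300, seed=0):
    """Randomly draw n_train labeled pixels in total (not per class) for
    training; the remaining labeled pixels form the test set."""
    idx = np.random.default_rng(seed).permutation(n_labeled)
    return idx[:n_train], idx[n_train:]
```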

TABLE I
LAND COVER CLASSES AND THE NUMBER OF LABELED PIXELS FOR THE SALINAS DATASET

TABLE II
LAND COVER CLASSES AND THE NUMBERS OF LABELED PIXELS FOR THE INDIAN PINES DATASET

TABLE III
LAND COVER CLASSES AND THE NUMBERS OF LABELED PIXELS FOR THE PAVIA UNIVERSITY DATASET

TABLE IV
ARCHITECTURE OF THE DEEP CNN DESIGN FOR SALINAS, INDIAN PINES, AND PAVIA UNIVERSITY

Fig. 5. Indian Pines dataset. (Left) False color composite image (bands 28, 19, and 10) and (right) ground truth.

Fig. 6. Pavia University dataset. (Left) False color composite (bands 10, 27, and 46) and (right) ground truth.

B. Experimental Setup

In this paper, we use a deep CNN as the individual classifier. The numbers of nodes in the hidden layers, the learning rate, and the window size are determined by trial and error.


Fig. 7. Classification results of individual classifiers with various algorithms while the feature size is 30. (a) Salinas. (b) Indian Pines. (c) Pavia University.

The architectures of the individual deep CNNs for the three datasets are shown in Table IV. There are three convolutional (Conv.) layers, three ReLU layers, and three pooling layers. The BN layer is used to accelerate convergence and to optimize the whole network. For the Salinas dataset, we add a dropout layer to the second convolutional layer; the dropout ratio is set to 0.5. In Table IV, the structure of a Conv. layer is given as a × a, which means that a × a is the window size. The softmax loss layer is the output layer of the neural network, which produces the corresponding labels. The structure of the output layer is 1 × nClass, where nClass is the number of land-cover classes. The size of the training set and the settings of the learning rate and training epochs for the three datasets are shown in Table V.

TABLE V
HYPERPARAMETERS OF THE DEEP CNN ENSEMBLE TRAINING

For the ResNet architectures on the three datasets, there are three building blocks [48] per classifier. Each building block contains two convolutional layers, two ReLU layers, and two BN layers.

In our experiment, two other important parameters are worth explaining: the feature size F [the number of bands (features) in a subset] and the ensemble size E (the number of individual classifiers) [51]. To adjust these empirical parameters, the accuracy of each classifier is analyzed under their influence; we evaluate F ∈ {10, 20, 30, 50} and E ∈ {5, 10, 30, 50}.

Another vital point is the adoption of transfer learning. Unlike the random initialization procedure in a normal CNN, to accelerate convergence and improve the classification performance within a small number of epochs, we load the pretrained weight values of the network as the initialization parameters; that is, the weights of the first convolutional layers of the individual CNNs are transferred from the CNN with all bands, and the weights of the remaining layers are randomly initialized.

In addition, to evaluate the performance of the proposed deep CNN ensemble method, four other methods, namely the radial basis function (RBF) SVM, extended MP SVM (EMP-SVM), CNN, and RBF-SVM with ensemble, are used for comparison. To obtain the best SVM parameters C and γ, we use a 2-D grid search over a wide range (C = 2^{-5}, 2^{-4}, ..., 2^{19}; γ = 2^{-15}, 2^{-14}, ..., 2^{4}), as sketched below.
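A scikit-learn sketch of this 2-D grid search follows; the 5-fold cross-validation is an assumption, since the paper does not state its model-selection protocol.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# C = 2^-5 ... 2^19 and gamma = 2^-15 ... 2^4, matching the ranges above
param_grid = {"C": 2.0 ** np.arange(-5, 20),
              "gamma": 2.0 ** np.arange(-15, 5)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train); the best C and gamma are in search.best_params_
```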
To quantitatively compare the performance of the proposed models, the overall accuracy (OA), the Kappa coefficient (K), and the diversity (DIV) are used.
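OA and K can be computed with standard scikit-learn metrics (DIV follows from the disagreement measure sketched in Section II-B); a minimal example:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate(y_true, y_pred):
    """Return the overall accuracy (OA) and Cohen's kappa coefficient (K)."""
    return accuracy_score(y_true, y_pred), cohen_kappa_score(y_true, y_pred)
```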

C. Deep CNN Ensemble

First of all, we present the classification results of the individual classifiers. Fig. 7 shows the HSI classification results of ten individual classifiers with RBF-SVM and CNN based on RFS, where the number of bands (features) in an individual classifier is 30. From Fig. 7, one can see that the CNN-based method usually obtains much better classification accuracy than the RBF-SVM-based classifier on the Salinas, Indian Pines, and Pavia University datasets.

Then, we present the classification results of the ensemble-based classifiers. Table VI shows the classification performance of the deep CNN ensemble with different feature sizes and different ensemble sizes. The best accuracy is highlighted in bold.

For the Salinas dataset, the highest accuracy is 96.05% (K × 100 = 95.93) among all the values of feature size and ensemble size. For the Indian Pines dataset, the highest accuracy is 92.54% (K × 100 = 90.94). For the Pavia University dataset, the highest accuracy is 94.98% (K × 100 = 92.04). In Table VI, the classification accuracy at first increases with the feature size, and then remains at a certain level or even slightly decreases. From Table VI, we can also see that when the feature size is fewer than ten bands in the Indian Pines dataset, the experimental results are the worst of all, because the spectral information is insufficient. Table VI also shows the results without using RFS: the accuracy is slightly higher than that of a single deep CNN, but not higher than the accuracy using RFS. This proves that the CNN ensemble with RFS is an effective strategy. Through the experimental results, one can see that if we choose proper feature and ensemble sizes, we can obtain very accurate classification results.


TABLE VI
CLASSIFICATION RESULTS AND THE DIVERSITY IN DEEP CNN ENSEMBLE

TABLE VII
CLASSIFICATION RESULTS OBTAINED BY DIFFERENT METHODS ON THE SALINAS DATASET

In Table VI, the diversity of the ensemble classifiers with various parameters is also presented. The diversity values float up and down within a small range. It is clear that when the values of the feature size and the ensemble size are low, i.e., (F = 10, E = 5) in the Salinas dataset, (F = 10, E = 10) in the Indian Pines dataset, and (F = 5, E = 5) in the Pavia University dataset, the diversity is high. However, at the same time, the classification performance is poor; the diversity is inversely related to the classification accuracy. The final classification result depends on both the diversity value (DIV coefficient) and the individual classification performance [52], [53]. It is necessary to find proper values of the two parameters simultaneously, such as (F = 10, E = 30), to achieve accurate classification results from the ensemble system.

The classification accuracy of each class on the Salinas, Indian Pines, and Pavia University datasets is listed in Tables VII–IX. From Table VII, it can be seen that the proposed methods outperform other state-of-the-art methods. In Table VII, the deep CNN ensemble obtains the highest OA, AA, and Kappa, with improvements of 2.7%, 6.62%, and 0.0353 over the CNN method, respectively. For the Indian Pines and Pavia University datasets, the same trend can be observed.


TABLE VIII
CLASSIFICATION RESULTS OBTAINED BY DIFFERENT METHODS ON THE INDIAN PINES DATASET

TABLE IX
CLASSIFICATION RESULTS OBTAINED BY DIFFERENT METHODS ON THE PAVIA UNIVERSITY DATASET

Fig. 8. Classification results of the individual classifiers with various algorithms while the feature size is 30. (a) Salinas. (b) Indian Pines. (c) Pavia University.

D. Deep CNN Ensemble With Transfer

In this section, we present the classification results of the deep CNN ensemble with transfer learning. The classification results of the individual classifiers are shown in Fig. 8. We randomly selected 30 bands (features), which means that the feature size is 30 for an individual classifier. In Fig. 8, the classification results of RBF-SVM, deep CNN, and deep CNN with transfer learning based on RFS are presented.


TABLE X
CLASSIFICATION RESULTS AND THE DIVERSITY IN DEEP CNN ENSEMBLE AND DEEP CNN ENSEMBLE WITH TRANSFER

The green line represents the deep CNN method with transfer learning, whose performance is usually much better than that of the other two methods (i.e., RBF-SVM and CNN) on the Salinas, Indian Pines, and Pavia University datasets. Due to the usage of the learned parameters, a deep CNN with transfer learning usually obtains better classification performance than a CNN.

The results of the deep CNN ensemble with transfer learning are shown in Table X. For the Salinas dataset, the highest accuracy is 97.23% (K × 100 = 95.88) among all the values of feature size and ensemble size. For the Indian Pines dataset, the highest accuracy is 94.71% (K × 100 = 93.61). For the Pavia University dataset, the highest accuracy is 95.96% (K × 100 = 94.84). In the experiment with the deep CNN ensemble, the highest classification result is at (F = 30, E = 10); however, in the deep CNN ensemble with transfer, the highest classification result is at (F = 50, E = 10) on the Salinas dataset. Compared with the deep CNN ensemble method, the deep CNN ensemble method with transfer learning achieves an improvement of 2.23% (F = 50, E = 30) on the Indian Pines dataset and of 1.92% (F = 50, E = 10) on the Pavia University dataset. This proves that the deep CNN ensemble with transfer learning considerably improves the accuracy.

Table X also shows the results without using RFS. It can be seen that in most cases, when the feature size and ensemble size are larger, the accuracy using RFS is higher than that of a deep CNN. Except for one situation (i.e., F = 10, E = 5), the result of the deep CNN ensemble is slightly higher than the result without using RFS for the Pavia University dataset. This again confirms that the CNN ensemble with RFS is an effective strategy.

The classification performance, including OA, AA, and Kappa, of the deep CNN ensemble and the deep CNN ensemble with transfer learning on the Salinas, Indian Pines, and Pavia University datasets is listed in Table XI. The deep CNN ensemble with transfer learning obtained the best performance compared with the deep CNN ensemble on two of the datasets.


TABLE XI
CLASSIFICATION PERFORMANCE ON THE SALINAS, INDIAN PINES, AND PAVIA UNIVERSITY DATASETS WITH AND WITHOUT TRANSFER LEARNING

Fig. 9. Classification results of Salinas, Indian Pines, and Pavia University with various algorithms.

Finally, the detailed classification results on the Salinas, Indian Pines, and Pavia University datasets with different methods are shown in Fig. 9. The other studied methods are RBF-SVM, EMP-SVM, RBF-SVM with ensemble, deep CNN, deep CNN ensemble, and deep CNN ensemble with transfer learning. The deep CNN ensemble with transfer achieves the highest accuracy and is more accurate than RBF-SVM by 12.82% on the Salinas dataset. For the Indian Pines and Pavia University datasets, the deep CNN ensemble with transfer learning is also the most accurate algorithm compared to the other state-of-the-art methods.

E. ResNet Ensemble

The results of a single ResNet applied to the Salinas, Indian Pines, and Pavia University datasets are 96.88%, 92.30%, and 93.89%, respectively. The results of the ResNet ensemble are shown in Table XII. For the Salinas dataset, the highest accuracy among all the values of feature size and ensemble size is 99.45% (K × 100 = 97.99). For the Indian Pines dataset, the highest accuracy is 95.36% (K × 100 = 94.78). For the Pavia University dataset, the highest accuracy is 96.29% (K × 100 = 94.59).

F. Computing Time

To compare the computational complexity of the different algorithms, their computing times are listed in Tables XIII–XV. The experiments were run on a 3.2-GHz CPU with a GTX 1060 GPU card.

From Tables XIII and XIV, one can see that the training and test times increase as the ensemble size and feature size increase. For the Indian Pines dataset, the deep learning ensemble methods demand a long training time, but they are fast at the testing stage. However, for the Salinas and Pavia University datasets, they cost more time to test, because the test sets are bigger than that of the Indian Pines dataset.

In Table XIII, the training time is reduced to 1.47 min with (F = 50, E = 50) on the Indian Pines dataset for the deep CNN ensemble with transfer compared with the deep CNN ensemble. For the Salinas and Pavia University datasets, the training time is also reduced, since the learning process starts from the learned parameters. From Table XIII, as the ensemble size and feature size increase, the test time also increases. The test times of the CNN ensemble with and without transfer are almost the same.

Table XIV shows the training and test times of the ResNet ensemble; as the feature size and ensemble size increase, the time also increases. For example, when the feature size increases to 20, the time increases by 0.11, 0.09, and 0.06 min on the Salinas, Indian Pines, and Pavia University datasets, respectively. When the ensemble size and feature size are low (i.e., F = 10, E = 5), the time is the shortest for all three datasets.

As can be seen in Table XV, the training time of a deep CNN with all bands is 2.98 min for the Salinas dataset, 2.83 min for Indian Pines, and 2.85 min for Pavia University. The training time of the ResNet with all bands is 4.58 min for the Salinas dataset, 3.53 min for the Indian Pines dataset, and 2.79 min for the Pavia University dataset.

G. Classification Map

Figs. 10–12 show the classification maps obtained by RBF-SVM, EMP-SVM, RBF-SVM with ensemble, CNN, deep CNN ensemble, deep CNN ensemble with transfer learning, ResNet, and ResNet ensemble. There are noisy scattered points in the classification maps, mostly because a limited number of training samples is used for classification. The deep CNN ensemble, deep CNN ensemble with transfer learning, ResNet, and ResNet ensemble produce accurate classification maps.


TABLE XII
CLASSIFICATION RESULTS AND THE DIVERSITY OF THE RESNET ENSEMBLE ON THE SALINAS, INDIAN PINES, AND PAVIA UNIVERSITY DATASETS

Fig. 10. (a) Ground truth of the Salinas dataset. Classification maps obtained by different methods on Salinas. (b) RBF-SVM. (c) EMP-SVM. (d) RBF-SVM with ensemble. (e) Deep CNN. (f) Deep CNN ensemble. (g) Deep CNN ensemble with transfer. (h) ResNet. (i) ResNet ensemble.

Fig. 11. (a) Ground truth of the Indian Pines dataset. Classification maps obtained by different methods. (b) RBF-SVM. (c) EMP-SVM. (d) RBF-SVM with ensemble. (e) Deep CNN. (f) Deep CNN ensemble. (g) Deep CNN ensemble with transfer. (h) ResNet. (i) ResNet ensemble.


TABLE XIII
TRAINING AND TEST TIME (MIN.) IN DEEP CNN ENSEMBLE AND DEEP CNN ENSEMBLE WITH TRANSFER

TABLE XIV
TRAINING AND TEST TIME (MIN.) IN RESNET ENSEMBLE


TABLE XV
TRAINING AND TEST TIME OF VARIOUS CLASSIFICATION METHODS

Fig. 12. (a) Ground truth of Pavia University dataset. Classification maps obtained by different methods. (b) RBF-SVM. (c) EMP-SVM. (d) RBF-SVM with
ensemble. (e) Deep CNN. (f) Deep CNN ensemble. (g) Deep CNN ensemble with transfer. (h) ResNet. (i) ResNet ensemble.

V. DISCUSSION AND CONCLUSION

In this paper, the idea of a deep learning ensemble for HSI classification is investigated for the first time, and two deep learning ensemble-based classification methods (i.e., CNN ensemble and ResNet ensemble) for HSI classification are proposed. In the proposed deep learning ensemble, deep learning models are used as individual classifiers, and RFS is used to formulate an ensemble system with diversity. At last, majority voting is used to obtain the final HSI classification result. With the help of the RFS-based ensemble learning system, the deep CNN ensemble achieves better performance in terms of classification accuracy than the traditional CNN and SVM-based ensemble methods. The proposed classification framework is simple yet effective.

In order to further boost the performance of HSI classification, a deep CNN ensemble system with transfer learning is designed, which is a modification of the method proposed in Section II. The learnt weights are used to initialize the weights of the subsequent individual classifiers, to accelerate the training procedure and improve the classification accuracy simultaneously.

The aforementioned deep CNN ensemble systems have shown their potential for accurate HSI classification. The combination of deep learning and ensemble learning opens a new window for HSI classification. Deep learning with other ensemble systems, including random forest, can be explored for HSI classification in the near future.

REFERENCES

[1] F. Melgani and L. Bruzzone, "Classification of hyperspectral remote sensing images with support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[2] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, "Advances in spectral-spatial classification of hyperspectral images," Proc. IEEE, vol. 101, no. 3, pp. 652–675, Mar. 2013.
[3] J. Ham, Y. Chen, M. M. Crawford, and J. Ghosh, "Investigation of the random forest framework for classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 492–501, Mar. 2005.
[4] G. Wang, D. Hoiem, and D. Forsyth, "Learning image similarity from Flickr groups using fast kernel machines," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2177–2188, Nov. 2012.
[5] P. Ghamisi, J. Plaza, Y. Chen, J. Li, and A. J. Plaza, "Advanced spectral classifiers for hyperspectral images: A review," IEEE Geosci. Remote Sens. Mag., vol. 5, no. 1, pp. 8–32, Mar. 2017.
[6] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, "Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.
[7] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, "SVM- and MRF-based method for accurate classification of hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010.
[8] U. Srinivas, Y. Chen, V. Monga, N. M. Nasrabadi, and T. D. Tran, "Exploiting sparsity in hyperspectral image classification via graphical models," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 505–509, May 2013.
[9] L. He, Y. Li, X. Li, and W. Wu, "Spectral–spatial classification of hyperspectral images via spatial translation-invariant wavelet-based sparse representation," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2696–2712, May 2015.
[10] Y. Chen, N. M. Nasrabadi, and T. D. Tran, "Hyperspectral image classification using dictionary-based sparse representation," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.
[11] P. Ghamisi et al., "New frontiers in spectral-spatial hyperspectral image classification: The latest advances based on mathematical morphology, Markov random fields, segmentation, sparse representation, and deep learning," IEEE Geosci. Remote Sens. Mag., vol. 6, no. 3, pp. 10–43, Sep. 2018.
[12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[13] R. Sarikaya, G. E. Hinton, and B. Ramabhadran, "Deep belief nets for natural language call-routing," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2011, pp. 5680–5683.
[14] L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: An overview," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Vancouver, BC, Canada, May 2013, pp. 8599–8603.
[15] L. Zhang, L. Zhang, and B. Du, "Deep learning for remote sensing data: A technical tutorial on the state of the art," IEEE Geosci. Remote Sens. Mag., vol. 4, no. 2, pp. 22–40, Jun. 2016.


[16] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, "Deep learning-based classification of hyperspectral data," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.
[17] P. Zhong, Z. Gong, S. Li, and C.-B. Schönlieb, "Learning to diversify deep belief networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 6, pp. 3516–3530, Jun. 2017.
[18] J. Yue, W. Zhao, S. Mao, and H. Liu, "Spectral–spatial classification of hyperspectral images using deep convolutional neural networks," Remote Sens. Lett., vol. 6, no. 6, pp. 468–477, 2015.
[19] L. Mou, P. Ghamisi, and X. X. Zhu, "Deep recurrent neural networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3639–3655, Jul. 2017.
[20] V. Singhal, H. Aggarwal, S. Tariyal, and A. Majumdar, "Discriminative robust deep dictionary learning for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 9, pp. 5274–5283, Sep. 2017.
[21] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," J. Sensors, vol. 2015, Jan. 2015, Art. no. 258619.
[22] W. Li, G. Wu, F. Zhang, and Q. Du, "Hyperspectral image classification using deep pixel-pair features," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 844–853, Feb. 2017.
[23] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, "Deep feature extraction and classification of hyperspectral images based on convolutional neural networks," IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.
[24] Y. Li, H. Zhang, and Q. Shen, "Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network," Remote Sens., vol. 9, no. 1, 2017, Art. no. 67.
[25] Z. Zhong, J. Li, Z. Luo, and M. Chapman, "Spectral–spatial residual network for hyperspectral image classification—A 3-D deep learning framework," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 2, pp. 847–858, Feb. 2018.
[26] M. E. Paoletti, J. M. Haut, R. F. Beltran, A. J. Plaza, and F. Pla, "Deep pyramidal residual networks for spectral-spatial hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 2, pp. 740–754, Feb. 2019.
[27] H. Liang and Q. Li, "Hyperspectral imagery classification using sparse representations of convolutional neural network features," Remote Sens., vol. 8, no. 2, 2016, Art. no. 99.
[28] Y. Chen, L. Zhu, P. Ghamisi, X. Jia, G. Li, and L. Tang, "Hyperspectral images classification with Gabor filtering and convolutional neural network," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 12, pp. 2355–2359, Dec. 2017.
[29] L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
[30] M. Galar, A. Fernandez, E. Barrenechea, H. Bustince, and F. Herrera, "A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 42, no. 4, pp. 463–484, Jul. 2012.
[31] A. Takemura, A. Shimizu, and K. Hamamoto, "Discrimination of breast tumors in ultrasonic images using an ensemble classifier based on the AdaBoost algorithm with feature selection," IEEE Trans. Med. Imag., vol. 29, no. 3, pp. 598–609, Mar. 2010.
[32] T. Huynh et al., "Estimating CT image from MRI data using structured random forest and auto-context model," IEEE Trans. Med. Imag., vol. 35, no. 1, pp. 174–183, Jan. 2016.
[33] Y. Qian, M. Ye, and J. Zhou, "Hyperspectral image classification based on structured sparse logistic regression and three-dimensional wavelet texture features," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4, pp. 2276–2291, Apr. 2013.
[34] G. M. Foody and A. Mathur, "A relative evaluation of multiclass image classification by support vector machines," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004.
[35] F. A. Faria et al., "A framework for selection and fusion of pattern classifiers in multimedia recognition," Pattern Recognit. Lett., vol. 39, pp. 52–64, Apr. 2014.
[36] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, May 2015.
[37] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learn. Representations, San Diego, CA, USA, 2015, pp. 1–14.
[38] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., Columbus, OH, USA, 2014, pp. 580–587.
[39] Y. Kim, "Convolutional neural networks for sentence classification," in Proc. Conf. Empirical Methods Natural Lang. Process., Oct. 2014, pp. 1746–1751.
[40] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[41] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 448–456.
[42] J. Xia, M. Mura, J. Chanussot, P. Du, and X. He, "Random subspace ensemble for hyperspectral image classification with extended morphological attribute profiles," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 9, pp. 4768–4786, Sep. 2015.
[43] U. Bhattacharya and B. B. Chaudhuri, "A majority voting scheme for multiresolution recognition of handprinted numerals," in Proc. 7th Int. Conf. Document Anal. Recognit., Aug. 2003, vol. 1, pp. 16–20.
[44] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, "Classification of hyperspectral data from urban areas based on extended morphological profiles," IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 480–491, Mar. 2005.
[45] J. Xia, P. Ghamisi, N. Yokoya, and A. Iwasaki, "Random forest ensembles and extended multiextinction profiles for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 56, no. 1, pp. 202–216, Jan. 2018.
[46] Y. Bi, "The impact of diversity on the accuracy of evidential classifier ensembles," Int. J. Approx. Reason., vol. 53, no. 4, pp. 584–607, 2012.
[47] C. A. Shipp and L. I. Kuncheva, "Relationships between combination methods and measures of diversity in combining classifiers," Inf. Fusion, vol. 3, no. 2, pp. 135–148, 2002.
[48] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2016, pp. 770–778.
[49] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[50] H.-C. Shin et al., "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016.
[51] J. Xia, J. Chanussot, P. Du, and X. He, "Spectral–spatial classification for hyperspectral data using rotation forests with local feature extraction and Markov random fields," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 5, pp. 2532–2546, May 2015.
[52] Y. Chen, X. Zhao, and Z. Lin, "Optimizing subspace SVM ensemble for hyperspectral imagery classification," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 4, pp. 1295–1305, Apr. 2014.
[53] J. Xia, M. D. Mura, J. Chanussot, P. Du, and X. He, "Random subspace ensembles for hyperspectral image classification with extended morphological attribute profiles," IEEE Trans. Geosci. Remote Sens., vol. 53, no. 9, pp. 4768–4786, Sep. 2015.

Yushi Chen (M'11) received the B.S., M.S., and Ph.D. degrees from the Harbin Institute of Technology, Harbin, China, in 2001, 2003, and 2008, respectively. He is currently an Associate Professor with the School of Electronics and Information Engineering, Harbin Institute of Technology. He has authored or coauthored more than 40 peer-reviewed papers. His research interests include remote sensing data processing and machine learning.

Ying Wang is currently working toward the master's degree with the Higher Education Key Lab for Measure & Control Technology and Instrumentations of Heilongjiang, Harbin University of Science and Technology, Harbin, China. Her research interests include hyperspectral image classification, machine learning, and deep learning.


Yanfeng Gu (M'06–SM'16) received the Ph.D. degree in information and communication engineering from the Harbin Institute of Technology (HIT), Harbin, China, in 2005. He was a Lecturer with the School of Electronics and Information Engineering, HIT, where he was appointed as an Associate Professor in 2006 and enrolled in the first Outstanding Young Teacher Training Program. From 2011 to 2012, he was a Visiting Scholar with the Department of Electrical Engineering and Computer Science, University of California at Berkeley, Berkeley, CA, USA. He is currently a Professor with the Department of Information Engineering, HIT. He has authored more than 60 peer-reviewed papers and four book chapters, and he is the Inventor or Co-Inventor of seven patents. His research interests include image processing in remote sensing, machine learning and pattern analysis, and multiscale geometric analysis. Dr. Gu is currently an Associate Editor for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING and Neurocomputing. He is also a Peer Reviewer of several international journals, such as the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, IEEE TRANSACTIONS ON IMAGE PROCESSING, and Remote Sensing of Environment.

Xin He received the M.S. degree from the Harbin University of Science and Technology, Harbin, China, in 2019. She is currently working toward the Ph.D. degree with the School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China. Her research interests include remote sensing image processing based on deep learning methods.

Pedram Ghamisi (S'12–M'15–SM'18) received the B.Sc. degree in civil (survey) engineering from Islamic Azad University, South Tehran Branch, Tehran, Iran, the M.Sc. (Hons.) degree in remote sensing from the K. N. Toosi University of Technology, Tehran, in 2012, and the Ph.D. degree in electrical and computer engineering from the University of Iceland, Reykjavik, Iceland, in 2015. He is currently the Head of the Machine Learning Group, Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany. His research interests include interdisciplinary research on remote sensing and machine (deep) learning, image and signal processing, and multisensory data fusion.

Xiuping Jia (M'93–SM'03) received the B.Eng. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 1982, and the Ph.D. degree in electrical engineering from The University of New South Wales, Canberra, ACT, Australia, in 1996. Since 1988, she has been with the School of Information Technology and Electrical Engineering, The University of New South Wales, where she is currently a Senior Lecturer. She is also a Guest Professor with Harbin Engineering University, Harbin, China, and an Adjunct Researcher with the China National Engineering Research Center for Information Technology in Agriculture. She is the co-author of the remote sensing textbook Remote Sensing Digital Image Analysis [Springer-Verlag, 1999 (3rd edition) and 2006 (4th edition)]. Her research interests include remote sensing and image data analysis. Dr. Jia was the Inaugural Chair of the IEEE ACT&NSW Section GRSS Chapter from 2010 to 2013. She is currently an Associate Editor for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING.
