
Multiple Kernel Learning for Hyperspectral Image Classification: A Review

Yanfeng Gu, Senior Member, IEEE, Jocelyn Chanussot, Fellow, IEEE, Xiuping Jia, Senior Member, IEEE, and Jón Atli Benediktsson, Fellow, IEEE

Abstract— With the rapid development of spectral imaging techniques, classification of hyperspectral images (HSIs) has attracted great attention in various applications such as land survey and resource monitoring in the field of remote sensing. A key challenge in HSI classification is how to explore effective approaches to fully use the spatial–spectral information provided by the data cube. Multiple kernel learning (MKL) has been successfully applied to HSI classification due to its capacity to handle the heterogeneous fusion of both spectral and spatial features. This approach can generate an adaptive kernel as an optimally weighted sum of a few fixed kernels to model a nonlinear data structure. In this way, the difficulty of kernel selection and the limitation of a fixed kernel can be alleviated. Various MKL algorithms have been developed in recent years, such as the general MKL, the subspace MKL, the nonlinear MKL, the sparse MKL, and the ensemble MKL. The goal of this paper is to provide a systematic review of MKL methods applied to HSI classification. We also analyze and evaluate different MKL algorithms and their respective characteristics in different cases of HSI classification. Finally, we discuss future directions and trends of research in this area.

Index Terms— Classification, heterogeneous features, hyperspectral images (HSIs), multiple kernel learning (MKL), remote sensing.

I. INTRODUCTION

A. Hyperspectral Image Data and Classification

IT IS well known that hyperspectral remote sensing has become an important means of earth observation and even space exploration. It extends the number of spectral bands from several or dozens to hundreds, so that each pixel in the scene contains a continuous spectrum that is used to identify the materials present in the pixel by their reflectance or emissivity [1], [2]. Hyperspectral sensors record the collected information in a series of images, which provide the spatial distribution of the reflected solar radiation from the scene of observation [3]. These images are arranged into a 3-D hyperspectral data cube for subsequent analysis and processing. The hyperspectral images (HSIs), which contain a significant amount of detailed information on land covers and the environmental state, can be used for various thematic applications, such as ecological science [4]–[6], hydrological science [7], geological science [8], precision agriculture [9], [10], and military applications [11], [12]. The success of these applications, however, relies heavily on appropriate data processing approaches and techniques, including unmixing [13], [14], target detection [15]–[17], physical or chemical parameter retrieval [18]–[20], and classification [21], [22]. Among these approaches, supervised classification is fundamental. Supervised classification aims at assigning each pixel in a scene to one of the defined thematic classes [23]. An illustration of HSI supervised classification is shown in Fig. 1.

Fig. 1. Illustration of HSI supervised classification.

B. Review of Methodology

A wide range of pixel-level processing techniques for the classification of HSIs has been developed, which can be divided into kernel methods and methods without kernelization. There are a large number of algorithms without kernelization for HSI classification. Among those methods, the k-nearest neighbors, the minimum distance classifier, the maximum likelihood, and the Bayesian estimation methods [24], [25] are conventional statistical approaches. Recently, more machine-learning methods have been gradually introduced, e.g., neural networks, deep neural networks (DNNs), representation-based learning (RL), and ensemble learning methods. Among these methods, DNN [26]–[30], derived from neural networks [31]–[37], has been successfully developed in computer vision [38]–[42] and has recently attracted more attention for HSI classification [43]–[45]. Several deep architecture models were exploited for HSI classification, such as the stacked autoencoder [44], the deep belief network [46], [47], convolutional neural networks [48], and some variants [49]–[53]. In terms of classification performance, ensemble learning [54]–[57] is a generic framework based on constructing an ensemble of individual classifiers.

According to the way of generating base classifiers, ensemble learning methods include resampling of the training set (such as bagging [58] and boosting [59]), manipulation of the input variables (such as random subspaces [60] and rotation forests [61]), and the introduction of randomness (such as random forests [62]–[68]); these have been successfully applied to HSI classification [69]–[71].

Kernel methods have been successfully applied to HSI classification [72] while providing an elegant way to deal with nonlinear problems [73]. The main idea of kernel methods is to map the input data from the original space to a convenient feature space by a nonlinear mapping function. Inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping function explicitly. Then, the nonlinear problems in the input space can be processed by building linear algorithms in the feature space [74]. The kernel support vector machine (SVM) is the most popular approach applied to HSI classification among the various kernel methods [21], [74]–[77]. SVM is based on the margin maximization principle, which does not require an estimation of the statistical distributions of the classes. To address the curse of dimensionality in HSI classification, some improved methods based on SVM have been proposed, such as the multiple classifier system based on adaptive boosting [78], the rotation-based SVM ensemble [79], the particle swarm optimization SVM [80], and the subspace-based SVM [81]. To enhance the ability of similarity measurement using the kernel trick, a region-kernel-based SVM was proposed [82]. Considering the tensor data structure of HSI, the multiclass support tensor machine was specifically developed for HSI classification [83]. However, the standard SVM classifier can only use the labeled samples to provide predicted classes for new samples. In order to take the data structure into account during the classification process, some clustering algorithms have been used [84], such as the hierarchical semisupervised SVM [85] and the spatial–spectral Laplacian SVM [86].

There are some other families of kernel methods for HSI classification, such as Gaussian processes (GPs) and kernel-based representation. GPs provide a Bayesian nonparametric approach to the considered classification problem [87]–[89]. GPs assume that the probability of an input sample belonging to a class label is monotonically related to the value of some latent function at that sample. In a GP, the covariance kernel represents the prior assumption, which characterizes the correlation between samples in the training data. Kernel-based representation was derived from RL to solve nonlinear problems in HSI; it assumes that a test pixel can be linearly represented by training samples in the feature space. RL has already been applied to HSI classification [90]–[109], including sparse representation-based classification (SRC) [110], [111], collaborative representation-based classification (CRC) [112], and their extensions [92], [102], [103], [108]. For example, to exploit the spatial contexts of HSI, Chen et al. [90] proposed a joint SRC (JSRC) method under the assumption of a joint sparsity model [113]. These RL methods can be kernelized as kernel SRC [92], kernelized JSRC [114], kernel nonlocal joint CRC [102], and kernel CRC [106], [107].

Furthermore, multiple kernel learning (MKL) methods have been proposed for HSI classification, as a single kernel has only a very limited ability to fit complex data structures. MKL methods aim at constructing a composite kernel by combining a set of predefined base kernels [115]. A framework of composite kernel machines was presented to enhance the classification of HSIs [116], which opened a wide field of subsequent developments for integrating spatial and spectral information [117], [118], such as the spatial–spectral composite kernel of superpixels [119], [120], the extreme learning machine with a spatial–spectral composite kernel [121], spatial–spectral composite kernel discriminant analysis [122], and the locality-preserving composite kernel [123]. In addition, MKL methods generally focus on determining the key kernels to be preserved and their significance in the optimal kernel combination. Some typical MKL methods have been gradually proposed for HSI classification, such as the subspace MKL methods [124]–[127], SimpleMKL [128], class-specific sparse MKL (CS-SMKL) [129], and nonlinear MKL (NMKL) [130], [131].

This paper presents a survey of the existing papers related to MKL, with special emphasis on remote sensing image classification. The rest of this survey is organized as follows. The general MKL framework is discussed in Section II. Several typical MKL methods are then introduced in Section III, which is divided into five parts: subspace MKL methods and the NMKL method for spatial–spectral joint classification of HSI, sparse MKL methods for feature interpretation in HSI classification, MK-boosting for ensemble learning, and heterogeneous feature fusion with MKL. In Section IV, several typical examples of MKL for HSI classification are demonstrated. Conclusions are drawn in Section V, followed by some remarks on future work in Section VI. For easy reference, Table I lists the notations of all the symbols used in this paper.

Fig. 2. Illustration of nonlinear kernel mapping.

TABLE I
SUMMARY OF THE NOTATIONS

II. LEARNING FROM MULTIPLE KERNELS

Given a labeled training data set with N samples, X = {x_i | i = 1, 2, ..., N}, x_i ∈ R^D, and Y = {y_i | i = 1, 2, ..., N}, where x_i is a pixel vector of dimension D, y_i is the class label, and D is the number of hyperspectral bands. The classes in the original feature space are often linearly inseparable, as shown in Fig. 2. The kernel method then maps these classes to a higher dimensional feature space, denoted Q, via a nonlinear mapping function Φ, that is

$$\Phi : \mathbb{R}^D \to \mathcal{Q}, \quad X \mapsto \Phi(X). \tag{1}$$
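As a concrete illustration of the mapping in (1), the following snippet, a minimal example added here for intuition rather than material from the paper, verifies numerically that a degree-2 polynomial kernel equals an inner product in an explicitly mapped feature space; this is exactly the computation the kernel trick avoids performing explicitly.

```python
# Numeric check of the kernel trick behind (1), with an explicit map Phi for
# the degree-2 polynomial kernel k(x, z) = (x . z)^2 in two dimensions.
# Illustrative example only; it is not part of the original paper.
import numpy as np

def phi(x):
    # Explicit feature map whose inner product reproduces (x . z)^2.
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print((x @ z) ** 2)        # kernel evaluation in the input space: 1.0
print(phi(x) @ phi(z))     # inner product in the mapped space Q: 1.0
```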

A. General MKL

Compared with using a single kernel, MKL provides a more flexible framework with which to mine information more effectively. In MKL, a flexible combined kernel is generated by a linear or nonlinear combination of a series of base kernels and is used to replace the single kernel in a learning model to achieve better learning ability. Each base kernel may exploit the full set of features or a subset of features [128]. Fig. 3 provides an illustration comparing the multiple kernel trick with the single-kernel case. The dual problem of general linearly combined MKL is expressed as follows:

$$\min_{\eta} \max_{\alpha} \left\{ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \sum_{m=1}^{M} \eta_m K_m(x_i, x_j) \right\} \quad \text{s.t.} \;\; \eta_m \ge 0 \;\text{ and }\; \sum_{m=1}^{M} \eta_m = 1 \tag{2}$$

where M is the number of candidate base kernels for combination and η_m is the weight of the mth base kernel.

Fig. 3. Comparison of the multiple kernel trick and the single-kernel method.

All the weighting coefficients are nonnegative and sum to one in order to ensure that the combined kernel fulfills the positive semidefinite (PSD) condition and retains the normalization of the base kernels. The MKL problem is designed to optimize both the combining weights η_m and the solutions to the original learning problem, i.e., the solutions of α_i and α_j for the SVM in (2).

Learning from multiple kernels can provide better similarity measuring ability, for example, with multiscale kernels, which are RBF kernels with multiple scale parameters σ (i.e., bandwidths) [124]. Fig. 4 shows the multiscale kernel matrices. As the visual display of the kernel matrices in Fig. 4 indicates, kernelized similarity measurement exhibits multiscale characteristics. A kernel with a small scale is sensitive to variations in similarity, but may result in a highly diagonal kernel matrix, which loses generalization capability. On the contrary, with a large scale, the kernel becomes insensitive to small variations in similarity. Therefore, by learning multiscale kernels, an optimal kernel with the best discriminative ability can be achieved.

Fig. 4. Multiscale kernel matrices.
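The sketch below, a minimal illustration rather than any specific algorithm from the literature, builds the multiscale Gaussian base kernels described above and feeds their convex combination to a precomputed-kernel SVM. The uniform weights stand in for a learned η (in the spirit of the mean rule of RBMKL), and the helper names are ours.

```python
# Minimal sketch of the multiple kernel trick: a convex combination of
# multiscale Gaussian base kernels fed to a precomputed-kernel SVM.
# Helper names are illustrative; uniform weights stand in for learned eta.
import numpy as np
from sklearn.svm import SVC

def build_base_kernels(XA, XB, scales):
    """One RBF Gram matrix K_m(XA, XB) per scale sigma."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(-1)
    return [np.exp(-d2 / (2.0 * s ** 2)) for s in scales]

def combine_kernels(kernels, eta):
    """Weighted sum; eta >= 0 with sum(eta) = 1 keeps the result PSD."""
    assert np.all(eta >= 0) and np.isclose(eta.sum(), 1.0)
    return sum(w * K for w, K in zip(eta, kernels))

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 20)), rng.integers(0, 2, 60)
X_test = rng.normal(size=(10, 20))

scales = np.linspace(0.05, 2.0, 40)            # 40 scales in [0.05, 2]
eta = np.full(len(scales), 1.0 / len(scales))  # uniform (mean-rule) weights

K_train = combine_kernels(build_base_kernels(X_train, X_train, scales), eta)
K_test = combine_kernels(build_base_kernels(X_test, X_train, scales), eta)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
print(clf.predict(K_test))
```

Learning η, instead of fixing it uniformly as here, is exactly where the MKL algorithms reviewed in Section III differ from one another.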
For various real-world applications, there are plenty of heterogeneous data or features [132]. In terms of remote sensing, the features can be spectra, spatial distribution, digital elevation model or height, and temporal information, which need to be learned not with a single kernel but with multiple kernels, where each base kernel corresponds to one type of feature.

B. Strategies for MKL

The strategies for determining the kernel combination can be divided into three major categories [115], [133].

1) Criterion-Based Approaches: These use a criterion function to obtain the kernel or the kernel weights. For example, kernel alignment selects the kernel most similar to the ideal kernel. Representative MKL (RMKL) obtains the kernel weights by performing principal component analysis (PCA) on the base kernels [124]. Sparse MKL acquires the kernel by robust sparse PCA [134]. Nonnegative matrix factorization (NMF) MKL and kernel NMF (KNMF) MKL [125] find the kernel weights by NMF and KNMF, respectively. Rule-based MKL (RBMKL) generates the kernel via summation or multiplication of the base kernels. The spatial–spectral composite kernel assigns fixed values as the kernel weights [116], [119], [121]–[123].

2) Optimization Approaches: These obtain the base kernel weights and the decision function of classification simultaneously by solving an optimization problem. For instance, CS-SMKL [129], SimpleMKL [128], and discriminative MKL (DMKL) [127] are determined using this approach.

3) Ensemble Approaches: These use the idea of ensemble learning. New base kernels are added iteratively until the cost function reaches its minimum or the classification performance is optimal. An example is MK-Boosting [135], which adopts boosting to determine the base kernels and the corresponding weights.

C. Basic Training for MKL

In terms of training manner, the existing MKL algorithms can be partitioned into two categories, as follows.

1) One-Stage Methods: These solve both the classifier parameters and the base kernel weights by simultaneously optimizing a target function based on the risk function of the classifier. One-stage MKL algorithms can be further split into the two subcategories of direct and wrapper methods, according to the order in which the classifier parameters and base kernel weights are solved. The direct methods solve the base kernel weights and the classifier parameters simultaneously [115]. The wrapper methods solve the two kinds of parameters separately and alternately at a given iteration: first, they optimize the base kernel weights with the classifier parameters fixed, and then they optimize the classifier parameters with the base kernel weights fixed [128], [129], [136].

2) Two-Stage Methods: These solve the base kernel weights independently of the classifier [124], [125], [127]. Usually, they solve the base kernel weights first and then take them as known conditions when solving the parameters of the classifier.

The computational time of one-stage and two-stage MKL depends on two factors: the number of considered kernels and the number of available training samples. The one-stage algorithms are usually faster than the two-stage algorithms when both the number and the size of the base kernels are small. The two-stage algorithms are generally faster than the one-stage algorithms when the number of base kernels is high or the number of training samples used for kernel construction is large.
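To make the one-stage/two-stage distinction concrete, here is a sketch of a two-stage pipeline: stage one chooses the weights without touching the classifier, and stage two trains an SVM with the weights held fixed. Kernel-target alignment is used as a stand-in stage-one criterion; the subspace methods reviewed in Section III solve this stage differently.

```python
# Sketch of a two-stage MKL pipeline: stage 1 picks base-kernel weights
# without touching the classifier; stage 2 trains an SVM on the result.
# Stage 1 here uses simple kernel-target alignment as a stand-in criterion.
import numpy as np
from sklearn.svm import SVC

def rbf(XA, XB, s):
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s ** 2))

rng = np.random.default_rng(1)
X, y = rng.normal(size=(80, 10)), rng.choice([-1, 1], 80)
scales = [0.1, 0.5, 1.0, 2.0]
kernels = [rbf(X, X, s) for s in scales]

# Stage 1: weight each base kernel by its alignment with the ideal kernel yy^T.
Y = np.outer(y, y)
align = np.array([(K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))
                  for K in kernels])
eta = np.clip(align, 1e-12, None)
eta /= eta.sum()                      # nonnegative weights summing to one

# Stage 2: solve the classifier with the weights held fixed.
K_comb = sum(w * K for w, K in zip(eta, kernels))
clf = SVC(kernel="precomputed").fit(K_comb, y)
print("weights:", np.round(eta, 3), "train acc:", clf.score(K_comb, y))
```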

III. MKL ALGORITHMS

A. Subspace MKL

Recently, some effective MKL algorithms, called subspace MKL, have been proposed for HSI classification; they use subspace methods to obtain the weights of the base kernels in the linear combination. These algorithms include RMKL [124], NMF-MKL, KNMF-MKL [125], and DMKL [127]. Given M base kernel matrices {K_m, m = 1, 2, ..., M, K_m ∈ R^{N×N}}, which compose a 3-D data cube of size N × N × M. In order to facilitate the subsequent operations, the 3-D data cube of kernel matrices is converted into a 2-D matrix with the help of a vectorization operator, where each kernel matrix is separately converted into a column vector k_m = vec(K_m). After the vectorization, a new form of the base kernels is denoted as Q = [k_1, k_2, ..., k_M]^T ∈ R^{M×N²}. Subspace MKL algorithms build a loss function as follows:

$$\mathcal{L}(\mathbf{K}, \eta) = \left\| \mathbf{Q} - \mathbf{D}\mathbf{K} \right\|_F^2 \tag{3}$$

where D ∈ R^{M×l} is the projection matrix whose columns {η_r}, r = 1, ..., l, are the bases of an l-dimensional linear subspace, K ∈ R^{l×N²} is the matrix projected onto the linear subspace spanned by D, and ‖·‖_F is the Frobenius norm of a matrix. Adopting different optimization criteria to solve D and K yields different subspace MKL methods.

Fig. 5. Illustration of subspace MKL methods. The square and circle, respectively, denote training samples from two classes. The combination weights of subspace MKL methods can be obtained by projecting the base kernels onto a few projection directions.

TABLE II
SUMMARY OF SUBSPACE MKL METHODS

A visual illustration of subspace MKL methods is shown in Fig. 5. Table II summarizes the three subspace MKL methods with their different ways of solving the combination weights. RMKL determines the optimal kernel combination weights by projecting onto the maximum-variance direction. In NMF-MKL and KNMF-MKL, NMF and KNMF are used to solve for the weights and the optimal combined kernel, owing to the nonnegativity of both the matrix and the combination weights. Moreover, the core idea of DMKL is to learn an optimally combined kernel from the predefined base kernels by maximizing separability in a reproducing kernel Hilbert space, which leads to minimum within-class scatter and maximum between-class scatter.
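The following sketch illustrates the RMKL idea as described in the text: vectorize the M base kernels into Q, take the leading principal direction, and use it as the combination weights. It is a simplified reading of RMKL [124], with nonnegativity enforced by taking absolute values (our assumption), not a faithful reimplementation.

```python
# Sketch of the RMKL idea: vectorize the M base kernels into Q in R^{M x N^2},
# take the leading principal direction of Q, and use it (made nonnegative and
# normalized) as combination weights. Simplified illustration of [124].
import numpy as np

def rmkl_weights(kernels):
    Q = np.stack([K.ravel() for K in kernels])      # M x N^2
    Qc = Q - Q.mean(axis=0, keepdims=True)          # center before PCA
    # Leading left singular vector = coordinates along the max-variance direction.
    U, _, _ = np.linalg.svd(Qc, full_matrices=False)
    eta = np.abs(U[:, 0])                           # nonnegativity (assumption)
    return eta / eta.sum()

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 8))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-d2 / (2 * s * s)) for s in (0.2, 0.5, 1.0, 2.0)]

eta = rmkl_weights(kernels)
K_opt = sum(w * K for w, K in zip(eta, kernels))
print("weights:", np.round(eta, 3), "combined kernel shape:", K_opt.shape)
```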


B. Nonlinear MKL

NMKL is motivated by the justifiable assumption that a nonlinear combination of different linear kernels can improve classification performance [115]. In [131], an NMKL is introduced to learn an optimally combined kernel from the predefined base kernels for HSI classification. The NMKL method can fully exploit the mutual discriminability of the inter-base kernels corresponding to the spatial–spectral features, and a corresponding improvement in classification performance can be expected.

The framework of NMKL is shown in Fig. 6. First, M spatial–spectral feature sets are extracted from the HSI data cube. Each feature set is associated with one base kernel, defined as K_m(x_i, x_j) = η_m⟨x_i, x_j⟩, m = 1, 2, ..., M. Therefore, η = [η_1, η_2, ..., η_M] is the vector of kernel weights associated with the base kernels, as shown in Fig. 6. Then, a nonlinear combined kernel is computed from the original kernels: M² new kernel matrices are given by the Hadamard products of any two base kernels, and the final kernel matrix is the weighted sum of these new kernel matrices, as follows:

$$\mathbf{K}_\eta(x_i, x_j) = \sum_{m=1}^{M} \sum_{h=1}^{M} \eta_m \eta_h \, \mathbf{K}_m(x_i, x_j) \odot \mathbf{K}_h(x_i, x_j). \tag{4}$$

Fig. 6. Illustration of the kernel construction in NMKL.

Applying K_η(x_i, x_j) to the SVM, the related problem of learning the kernel K_η can be concomitantly formulated as the following min–max optimization problem:

$$\min_{\eta \in \Delta} \; \max_{\alpha \in \mathbb{R}^N} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K_\eta(x_i, x_j) \tag{5}$$

where Δ = {η | η ≥ 0 ∧ ‖η − η₀‖₂ ≤ Λ} is a positive, bounded, and convex set. A positive η ensures that the combined kernel function is PSD, and the regularization by the bound Λ controls the norm of η. The definition includes an offset parameter η₀ for the weight η; natural choices are η₀ = 0 or an η₀ with ‖η₀‖ = 1.

A projection-based gradient-descent algorithm can be used to solve this min–max optimization problem. At each iteration, α is obtained by solving a kernel ridge regression problem with the current kernel matrix, and η is updated with the gradients calculated using α while considering the bound constraints on η due to Δ.
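A short sketch of the combined kernel in (4) follows: all M² Hadamard products of base-kernel pairs, weighted by η_m η_h and summed. The data and weights are synthetic and purely illustrative.

```python
# Sketch of the nonlinear combined kernel in (4): M^2 Hadamard products of
# base kernels, weighted by eta_m * eta_h and summed. Illustrative only.
import numpy as np

def nmkl_kernel(kernels, eta):
    """K_eta = sum_{m,h} eta_m * eta_h * (K_m o K_h), o = Hadamard product."""
    N = kernels[0].shape[0]
    K = np.zeros((N, N))
    for m, Km in enumerate(kernels):
        for h, Kh in enumerate(kernels):
            K += eta[m] * eta[h] * (Km * Kh)   # elementwise (Hadamard) product
    return K

rng = np.random.default_rng(3)
feats = [rng.normal(size=(40, d)) for d in (5, 8, 3)]   # M feature sets
kernels = [F @ F.T for F in feats]                      # linear base kernels
eta = np.array([0.5, 0.3, 0.2])                         # example kernel weights

K_eta = nmkl_kernel(kernels, eta)
print(K_eta.shape)
```

Note that the double sum in (4) factorizes as (Σ_m η_m K_m) ⊙ (Σ_h η_h K_h), the Hadamard square of the weighted sum, so the combined kernel can also be formed with only M scalar multiplications and one elementwise product.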
C. Sparsity-Constrained MKL

1) Sparse MKL: There is redundancy among the multiple base kernels, especially among kernels with similar scales (shown in Fig. 7).

Fig. 7. Illustration of sparse MKL.

In [134], a sparse MKL framework was proposed to achieve good classification performance by using a linear combination of only a few kernels from the multiple base kernels. In sparse MKL, learning with multiple base kernels from hyperspectral data is carried out in two stages. The first stage learns an optimally sparse combined kernel from all base kernels, and the second stage performs the standard SVM optimization with the optimal kernel. In the first stage, a sparsity constraint is introduced to control the number of nonzero weights and improve the interpretability of the base kernels in classification. The learning model of the first stage can be written as the following optimization problem:

$$\max_{\eta} \; \eta^{\mathsf{T}} \boldsymbol{\Sigma} \, \eta - \rho \, \mathrm{Card}(\eta) \quad \text{s.t.} \;\; \eta^{\mathsf{T}} \eta = 1 \tag{6}$$

where Σ denotes the covariance matrix of the vectorized base kernels.

TABLE III
SUMMARY OF SPARSE MKL METHODS

Here, Card(η) is the cardinality of η and corresponds to the number of nonzero weights, and ρ is a parameter to control sparsity.

The maximization in (6) can be interpreted as a robust maximum eigenvalue problem and solved with a first-order algorithm, given as

$$\max_{\mathbf{Z}} \; \mathrm{Tr}(\boldsymbol{\Sigma} \mathbf{Z}) - \rho \, \mathbf{1}^{\mathsf{T}} |\mathbf{Z}| \mathbf{1} \quad \text{s.t.} \;\; \mathrm{Tr}(\mathbf{Z}) = 1, \; \mathbf{Z} \succeq 0. \tag{7}$$
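As an illustration of the first stage, the sketch below selects a sparse weight vector with a simple truncated power iteration, a stand-in for the robust sparse PCA solver of [134]; taking Σ as the Gram matrix of the vectorized base kernels is an assumption made for the example.

```python
# Sketch of the first stage of sparse MKL: find a sparse eta maximizing
# eta^T Sigma eta with ||eta||_2 = 1 and few nonzeros. A simple truncated
# power iteration stands in for the robust sparse PCA solver of [134].
import numpy as np

def sparse_weights(kernels, n_keep, iters=100):
    Q = np.stack([K.ravel() for K in kernels])   # M x N^2
    Sigma = Q @ Q.T                              # M x M similarity of base kernels
    eta = np.ones(len(kernels)) / np.sqrt(len(kernels))
    for _ in range(iters):
        v = Sigma @ eta
        # Truncation step: keep only the n_keep largest entries (sparsity).
        v[np.argsort(np.abs(v))[:-n_keep]] = 0.0
        eta = v / np.linalg.norm(v)
    return eta

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 6))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
kernels = [np.exp(-d2 / (2 * s * s)) for s in np.linspace(0.1, 2.0, 10)]

eta = sparse_weights(kernels, n_keep=3)
print("selected base kernels:", np.flatnonzero(eta))
```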
2) Class-Specific MKL: A CS-SMKL framework has been proposed for spatial–spectral classification of HSIs, which can effectively utilize multiple features with multiple scales [129]. CS-SMKL classifies the HSIs by simultaneously learning class-specific significant features and selecting class-specific weights.

The framework of CS-SMKL is illustrated in Fig. 8. First, feature extraction is performed on the original data set, and M feature sets are obtained. Then, M base kernels associated with the M feature sets are constructed. At the kernel learning stage, a class-specific strategy via one-versus-one learning is used to select the class-specific weights for the different feature sets and to remove the redundancy of those features when classifying any two categories. As shown in Fig. 8, when classifying one class pair (take, e.g., class 2 and class 5), we first find their position coordinates according to the labels of the training samples; then the associated class-specific kernel κ_m, m = 1, 2, ..., M, is extracted from the base kernels via the corresponding locations. After that, the optimal kernel is obtained by a linear combination of these class-specific kernels. The weights of the linear combination are constrained by the criteria Σ_{m=1}^{M} η_m = 1, η_m ≥ 0. These criteria can enforce sparsity at the group/feature level and automatically learn a compact feature set for classification purposes.

Fig. 8. Illustration of class-specific kernel learning (taking classes 2 and 5 as an example). The combined kernel is embedded into an SVM to complete the final classification.

In the CS-SMKL approach, an efficient optimization method has been adopted by using the equivalence between MKL and group lasso [137]. The MKL optimization problem is equivalent to the optimization problem

$$\min_{\eta \in \Delta} \; \min_{\{f_m \in \mathcal{H}_m\}_{m=1}^{M}} \; \frac{1}{2} \sum_{m=1}^{M} \eta_m \| f_m \|_{\mathcal{H}_m}^{2} + \max_{\alpha \in [0,C]^N} \sum_{i=1}^{N} \alpha_i \left( 1 - y_i \sum_{m=1}^{M} \eta_m f_m(x_i) \right). \tag{8}$$

The main differences among the three sparse MKL methods are summarized in Table III.
D. Ensemble MKL

An ensemble learning strategy can be applied to the MKL framework to select more effective training samples. As a main approach to ensemble learning, boosting was proposed in [138] and improved in [139]. The idea is to iteratively select training samples, sequentially paying more attention to easily misclassified samples when training the base classifiers. The idea of using boosting techniques to learn kernel-based classifiers was introduced in [140]. Recently, boosting has been integrated into MKL with extended morphological profile (EMP) features in [135] for HSI classification.

Let T be the number of boosting trials. The base classifiers are constructed by SVM classifiers with the complete set of multiple features as input. The method screens samples by a probability distribution W_t ⊂ W, t = 1, 2, ..., T, which indicates the importance of the training samples for designing a classifier. The incorrectly classified samples have a much higher probability of being chosen as screened samples in the next iteration. In this way, MK-boosting provides a strategy to select more effective training samples for HSI classification. The SVM classifier is used as a weak classifier in this case. In each iteration, the base classifier f_t is obtained from M weak classifiers

$$f_t = \arg\min_{f_t^m, \; m \in \{1, \dots, M\}} \gamma_t^m = \arg\min_{f_t^m, \; m \in \{1, \dots, M\}} \gamma(f_t^m) \tag{9}$$

where γ measures the misclassification performance of the weak classifiers.

Fig. 9. Illustration of the sample screening process during boosting trials, taking two classes as a simple example. (a) Training sample set: the triangle and square, respectively, denote training samples from two classes, and samples marked in red are "hard" samples, which are easily misclassified. (b) Subsequently screened samples: the screened samples (sample ratio = 0.2) are marked in purple during boosting trials; the screened samples focus on the "hard" samples shown in (a).

In each iteration, the weights of the distribution are adjusted by increasing the values of incorrectly classified samples and decreasing the values of correctly classified samples, in order to make the classifier focus on the "hard" samples in the training set, as shown in Fig. 9.

Taking the morphological profile (MP) as an example, the architecture of this method is shown in Fig. 10. The features are, respectively, the input to an SVM; the classifier with the best performance is then selected as a base classifier, and the resulting T base classifiers are combined as the final classifier. Furthermore, the coefficients are determined by the classification accuracy of the base classifiers during the boosting trials.

Fig. 10. Architecture of the MK-boosting method.
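A compact sketch of the boosting loop described above: at each trial, samples are screened according to the current distribution W_t, one SVM is trained per feature set, the weak classifier with the smallest weighted error is kept, and the distribution is reweighted AdaBoost-style. All names and the synthetic data are illustrative.

```python
# Sketch of the MK-boosting loop: screen samples by W_t, train one SVM per
# feature set, keep the best weak classifier, and reweight AdaBoost-style.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X, y = rng.normal(size=(120, 16)), rng.choice([-1, 1], 120)
feature_sets = [X[:, :8], X[:, 8:], X]       # stand-ins for MP/spectral features
T, ratio = 5, 0.5
W = np.full(len(y), 1.0 / len(y))            # sample distribution W_t
ensemble = []

for t in range(T):
    # Screened samples: drawn with probability proportional to W_t.
    idx = rng.choice(len(y), size=int(ratio * len(y)), replace=False, p=W)
    best = None
    for F in feature_sets:
        clf = SVC(kernel="rbf").fit(F[idx], y[idx])
        err = W[clf.predict(F) != y].sum()   # weighted error on all samples
        if best is None or err < best[0]:
            best = (err, clf, F)
    err, clf, F = best
    err = min(max(err, 1e-10), 0.499)
    alpha = 0.5 * np.log((1 - err) / err)    # coefficient from weak-classifier accuracy
    W *= np.exp(-alpha * y * clf.predict(F)) # raise weights of misclassified samples
    W /= W.sum()
    ensemble.append((alpha, clf, F))

score = sum(a * c.predict(F) for a, c, F in ensemble)
print("train acc:", (np.sign(score) == y).mean())
```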
E. Heterogeneous Feature Fusion With MKL

Fig. 11. Illustration of heterogeneous feature fusion with MKL.

This section introduces a heterogeneous feature fusion framework with MKL, as shown in Fig. 11. There are two levels of MKL, in column and row, respectively. First, different kernel functions are used to measure the similarity of samples on each feature subset. This is the "column" MKL

$$K_{\text{Col}}^{(m)}\big(x_i^{(m)}, x_j^{(m)}\big) = \sum_{s=1}^{S} h_s^{(m)} K_s^{(m)}\big(x_i^{(m)}, x_j^{(m)}\big).$$

In this way, the discriminative ability of each feature subset is exploited by different kernels and integrated to generate an optimally combined kernel for each feature subset. Then, the multiple combined kernels resulting from MKL on each feature subset are integrated using a linear combination. This is the "row" MKL

$$K_{\text{Row}}(x_i, x_j) = \sum_{m=1}^{M} d_m K_{\text{Col}}^{(m)}\big(x_i^{(m)}, x_j^{(m)}\big).$$

As a result, the information contained in the different feature subsets is mined and integrated into the final classification kernel. In this framework, the weights of the base kernels can be determined by any MKL algorithm, such as RMKL, NMF-MKL, or DMKL. It is worth noting that sparse MKL can be carried out both within each feature subset and between feature subsets, for base kernel and feature interpretation, respectively.
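The two-level fusion can be sketched as follows: a "column" combination of S scale kernels inside each feature subset, followed by a "row" combination across the M subsets. Uniform weights stand in for the h_s^(m) and d_m that an MKL algorithm such as RMKL would normally supply.

```python
# Sketch of the two-level fusion: "column" MKL combines S kernels within
# each feature subset; "row" MKL combines the M per-subset kernels.
# Uniform weights stand in for learned h_s^(m) and d_m.
import numpy as np

def rbf(F, s):
    d2 = ((F[:, None, :] - F[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

rng = np.random.default_rng(7)
spectral = rng.normal(size=(50, 30))    # example heterogeneous feature subsets
spatial = rng.normal(size=(50, 10))
elevation = rng.normal(size=(50, 1))
subsets, scales = [spectral, spatial, elevation], [0.5, 1.0, 2.0]

# Column MKL: combine the S scale kernels inside each feature subset.
h = np.full(len(scales), 1.0 / len(scales))
K_col = [sum(w * rbf(F, s) for w, s in zip(h, scales)) for F in subsets]

# Row MKL: linearly combine the per-subset kernels into the final kernel.
d = np.full(len(subsets), 1.0 / len(subsets))
K_row = sum(w * K for w, K in zip(d, K_col))
print(K_row.shape)
```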
IV. MKL FOR HSI CLASSIFICATION

A. Hyperspectral Data Sets

Five data sets are used in this paper. Three of them are HSIs, which were used to validate classification. The fourth and fifth data sets consist of two parts, i.e., MSI and LiDAR, which are used to perform multisource classification. The first two HSIs are from cropland scenes acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. The AVIRIS sensor acquires 224 bands of 10-nm width with center wavelengths from 400 to 2500 nm. The third HSI was acquired with the Reflective Optics System Imaging Spectrometer (ROSIS-03) optical sensor over an urban area [141]. The flight over the city of Pavia, Italy, was operated by the Deutsches Zentrum für Luft- und Raumfahrt (DLR, the German Aerospace Agency) within the context of the HySens project, managed and sponsored by the European Union. The ROSIS-03 sensor provides 115 bands with a spectral coverage ranging from 430 to 860 nm. The spatial resolution is 1.3 m per pixel.

Fig. 12. Ground reference maps for the five data sets. (a) Indian Pines. (b) Salinas. (c) Pavia University. (d) Bayview Park. (e) Recology.

1) Indian Pines Data Set: This HSI was acquired over the agricultural Indian Pines test site in Northwestern Indiana. It has a spatial size of 145 × 145 pixels with a spatial resolution of 20 m per pixel. Twenty water absorption bands were removed, and a 200-band image was used for the experiments. The data set contains 10 366 labeled pixels and 16 ground reference classes, most of which are different types of crops. A false color image and the reference map are presented in Fig. 12(a).

2) Salinas Data Set: This HSI was acquired in Southern California [142]. It has a spatial size of 512 × 217 pixels with a spatial resolution of 3.7 m per pixel.
Twenty water absorption bands were removed, and a 200-band image was used for the experiments. The ground reference map is composed of 54 129 pixels and 16 land-cover classes. Fig. 12(b) shows a false color image and information on the labeled classes.

3) Pavia University Area: This HSI, with 610 × 340 pixels, was collected near the Engineering School of the University of Pavia, Pavia, Italy. Twelve channels were removed due to noise [116]. The remaining 103 spectral channels were processed. There are 43 923 labeled samples in total and nine classes of interest. Fig. 12(c) presents a false color image of this data set.

4) Bayview Park: This data set is from the 2012 IEEE GRSS Data Fusion Contest and is one of the subregions of a whole scene around the downtown area of San Francisco, CA, USA. It contains multispectral images with eight bands acquired by WorldView-2 on October 9, 2011, and corresponding LiDAR data acquired in June 2010. It has a spatial size of 300 × 200 pixels with a spatial resolution of 1.8 m per pixel. There are 19 537 labeled pixels and 7 classes. The false color image and ground reference map are shown in Fig. 12(d).

5) Recology: The source of this data set is the same as Bayview Park; it is another subregion of the whole scene. It has 200 × 250 pixels, with 11 811 labeled pixels and 11 classes. Fig. 12(e) shows the false color image and ground reference map.

More details about these data sets are listed in Table IV.

TABLE IV
INFORMATION OF THE FIVE DATA SETS
B. Experimental Settings and Evaluation

TABLE V
EXPERIMENTAL METHODS AND SETTING

To evaluate the performance of the various MKL methods for the classification task, the MKL methods and typical comparison methods are shown in Table V. The single-kernel method represents the best performance attainable by a standard SVM, which can be used as a baseline to evaluate whether an MKL method is effective or not. The number of training samples per class was varied (n = {1%, 2%, 3%} or n = {10, 20, 30}). The overall accuracy (OA [%]) and computation time were measured. Average results over ten realizations are shown. To guarantee generality, all the experiments were conducted on typical HSI data sets.

In the first experiment, on spectral classification, all spectral bands are stacked into a feature vector as input features. The feature vector was input into Gaussian kernels with different scales. For all of the classifiers, the range of the Gaussian kernel scale was set to [0.05, 2], and uniform sampling with a fixed step size of 0.05 was used to select 40 scales within the given range.

In the second experiment, on spatial–spectral classification, all the data sets were processed first by PCA and then by mathematical morphology. The eigenvalues were arranged in descending order, and the first p PCs accounting for 99% of the total variation in terms of eigenvalues were retained. Hence, the construction of the MP was based on the PCs, and a stacked vector was built with the MP on each PC. Here, three kinds of SEs were used to obtain the MP features: diamond, square, and disk SEs. For each kind of SE, a step size of 1 was used, and ten closings and ten openings were computed for each PC. Each structure of MPs, with its ten closings and ten openings, and the original spectral features were, respectively, stacked as the input vector of each base kernel for the MKL algorithms. The base kernels were four Gaussian kernels, with scale values {0.1, 1, 1.5, 2}, corresponding to the three kinds of MP structures and the original spectral features, respectively; this gives 20 base kernels for the MKL methods. The exception is NMKL, which uses three Gaussian kernels with scale values {1, 1.5, 2} for NMKL-Gaussian and four linear base kernel functions for NMKL-linear.

In the third experiment, heterogeneous features were used, including spectral features, elevation features, the normalized digital surface model (nDSM) from LiDAR data, and spatial features of MPs. The MP features are extracted from the original multispectral bands and the nDSM using the diamond structuring element with sizes {3, 5, 7, 9, 11, 13, 15, 17, 19, 21}. The heterogeneous features are stacked as a single vector of features to serve as the input of the fusion methods.

The summary of the experimental setup is listed in Table VI.

TABLE VI
SUMMARY OF THE EXPERIMENTAL SETUP FOR SECTION IV
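As a sketch of the MP construction described above (assuming scikit-image is available; only the disk SE is shown, and synthetic data stands in for an HSI cube): PCA retains the components explaining 99% of the variance, and ten openings and ten closings with growing SEs are stacked per PC.

```python
# Sketch of morphological profile (MP) construction: PCA keeps 99% of the
# variance, then ten openings and ten closings with growing disk SEs are
# stacked per PC. Synthetic cube; diamond/square SEs omitted for brevity.
import numpy as np
from sklearn.decomposition import PCA
from skimage.morphology import opening, closing, disk

rng = np.random.default_rng(8)
cube = rng.random((32, 32, 50))                 # H x W x bands (synthetic)
flat = cube.reshape(-1, cube.shape[-1])

pca = PCA(n_components=0.99).fit(flat)          # keep 99% of total variance
pcs = pca.transform(flat).reshape(32, 32, -1)

profile = []
for b in range(pcs.shape[-1]):
    img = pcs[..., b]
    for r in range(1, 11):                      # ten openings and ten closings
        se = disk(r)
        profile += [opening(img, se), closing(img, se)]
mp = np.stack(profile, axis=-1)
print("MP feature bands:", mp.shape[-1])
```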

C. Spectral Classification

TABLE VII
OA (%) OF MKL METHODS UNDER MULTIPLE-SCALE BASE KERNEL CONSTRUCTION

The numerical classification results of the different MKL methods on the different data sets are given in Table VII. The performance of MKL methods is mainly determined by the way the base kernels are constructed and by the solutions for the base kernel weights. The base kernel matrices resulting from the different construction schemes contain all the information that will be used for the subsequent classification task. The weights of the base kernels learned by the different MKL methods represent how this information is combined, with the objective of strengthening information extraction and curbing useless information for classification.

Observing the results on the three data sets, some conclusions can be drawn. 1) In some cases, the classification performance of an MKL method is not as good, in terms of classification accuracy, as that of the single-kernel method. This reveals that MKL methods need good learning algorithms to ensure performance. 2) On the three typical HSIs, the best classification performance in terms of accuracy is achieved by MKL methods. This proves that using multiple kernels instead of a single one can improve performance for HSI classification, and the key is to choose a suitable learning algorithm. 3) In most cases, the subspace MKL methods are superior to the comparison MKL methods and the single-kernel method in terms of OA.
D. Spatial–Spectral Classification

TABLE VIII
OA (%) OF MKL METHODS UNDER MPs BASE KERNEL CONSTRUCTION

The classification results of all the compared methods on the three data sets are shown in Table VIII, and the overall training and test time on the Pavia University data set with 1% training samples is shown in Fig. 13.

Fig. 13. Overall time of the training and testing process for all the methods.

Several conclusions can be derived. First, as the number of training samples increases, accuracy increases. Second, the MK-boosting method has the best classification accuracy, at the cost of computation time. It is also important to note that there is not a large difference between the methods in terms of classification accuracy. This can be explained by the fact that MPs mine information well for classification by way of MKL, so the difference among the MKL algorithms mainly concentrates on the complexity and sparsity of the solution. This conclusion is consistent with [115]. SimpleMKL shows the worst classification performance in terms of accuracy under the multiple-scale construction in the first experiment, but is comparable to the other methods in terms of classification accuracy in this experiment. The example of SimpleMKL illustrates that it is difficult for an MKL method to guarantee the best classification accuracy in all cases. Feature extraction and classification are both important steps. If the information extraction via features is successful, the classifier design can be easy in terms of complexity and sparsity, and vice versa. The subspace MKL algorithms, as two-stage methods, have a lower complexity than one-stage methods such as SimpleMKL and CS-SMKL.

It can be noted that NMKL with linear kernels demonstrates slightly lower accuracy than the subspace MKL algorithms with Gaussian kernels. NMKL with Gaussian kernels obtains classification accuracy comparable to NMKL with linear kernels on the Pavia University and Salinas data sets, but lower accuracy on the Indian Pines data set. In general, using a linear combination of Gaussian kernels is more promising than a nonlinear combination of linear kernels. However, nonlinear combinations of Gaussian kernels need to be researched further. The feature combination and the scale of the Gaussian kernels have a big influence on the accuracy of NMKL with Gaussian kernels, and the NMKL method also demonstrates different performance trends on different data sets. In this experiment, several configurations were tried, and the results are relatively better than those of other approaches in some situations. More theoretical analysis needs to be done in this area.

Among all the sparse methods, CS-SMKL demonstrated comparable classification accuracies on the Indian Pines and Salinas data sets. For the Pavia data set, as the number of training samples grows, the classification performance of CS-SMKL increases significantly and also reaches a comparable accuracy.

In order to visualize the contribution of each feature type and the corresponding base kernels in these MKL methods, we plot the kernel weights of the base kernels for RMKL, DMKL, SimpleMKL, sparse MKL, and CS-SMKL in Fig. 14. For simplicity, only three one-against-one classifiers of the Pavia University data set (painted metal sheets versus bare soil, painted metal sheets versus bitumen, and painted metal sheets versus self-blocking bricks) are listed.

Fig. 14. Weights η determined for each base kernel and the corresponding feature type. (a)–(d) Fixed sets of kernel weights selected by RMKL, DMKL, SimpleMKL, and sparse MKL, respectively. (e) Kernel weights selected for three different class pairs by CS-SMKL.

RMKL, DMKL, SimpleMKL, and sparse MKL used the same kernel weights, as shown in Fig. 14(a)–(d), for all the class pairs. From Fig. 14(e), it is easy to see that CS-SMKL selected different sparse base kernel sets for different class pairs, and the spectral features are important for these three class pairs. CS-SMKL selected very few base kernels for classification purposes, while its kernel weight for the spectral features is very high. By contrast, the corresponding kernel weights in RMKL and DMKL are much lower, and sparse MKL did not select any kernel related to the spectral features. SimpleMKL selects the first three kernels related to the spectral features, but the corresponding kernel weights are obviously lower than those related to the EMP feature obtained by the square SE. This is an example showing that CS-SMKL provides more flexibility in selecting kernels (features) for improving classification.

E. Classification With Heterogeneous Features

TABLE IX
OA (%) OF DIFFERENT MKL METHODS ON TWO DATA SETS

This section shows the performance of the fusion framework of heterogeneous features with MKL (denoted as HF-MKL) under realistic ill-posed situations, with the results compared with other MKL methods. In the HF-MKL fusion framework, RMKL was adopted to determine the weights of the base kernels at both the column and row levels of MKL. Joint classification with the spectral, elevation, and spatial features was carried out, and the classification results for the two data sets are shown in Table IX. SK represents a natural and simple strategy to fuse heterogeneous features, and it can be used as a baseline to evaluate the effectiveness of different fusion strategies for heterogeneous features. Against this baseline, CKL performs poorly. The performance of CKL is affected by the weights of the spectral, spatial, and elevation kernels.
All the MKL methods outperform the stacked-vector approach. This reveals that features from different sources obviously have different meanings and statistical significance. Therefore, they may play different roles in classification. Consequently, the stacked-vector approach is not a good choice for joint classification. However, MKL is an effective fusion strategy for heterogeneous features, and the further HF-MKL framework is a good choice.

V. CONCLUSION

In general, the MKL methods can improve classification performance in most cases compared with the single-kernel method. For spectral classification of HSI, the subspace MKL methods using a trained, weighted combination on average outperform the untrained, unweighted sum, namely, RBMKL (mean MKL), and have a significant advantage in accuracy and computational efficiency compared with the SimpleMKL method. The ensemble MKL method (MK-boosting) has higher classification accuracy but an additional cost in computation time. It is also important to note that there is not a large difference in classification accuracy between the different kinds of MKL methods. If we can extract effective spatial–spectral features for HSI classification, the choice of MKL algorithm mainly concerns the complexity and sparsity of the solution. In general, a linear combination of Gaussian kernels promises better accuracy than a nonlinear combination of linear kernels. However, nonlinear combinations with more complex Gaussian kernels need further research. This is still an open problem, which is affected by many factors, such as the way the features are combined and the scale of the Gaussian kernels.

Currently, with the improvement of HSI quality, we can extract more and more accurate features for the classification task. These features can be multiscale, multiattribute, multidimensional, and multicomponent. Since MKL provides a very effective means of learning, it is natural to consider utilizing these features within an MKL framework. By expanding the feature space with a number of information diversities, these multiple features provide excellent classification performance. However, the information is highly redundant, and each kind of feature contributes differently to the classification task. As a solution, sparse MKL methods have been developed. The sparse MKL framework allows a variety of characteristics to be embedded in the classifier and removes the redundancy of multiple features effectively, learning a compact set of features and selecting the weights of the corresponding base kernels, which leads to remarkable discriminability. The experimental results on three different hyperspectral data sets, corresponding to different contexts (urban, agricultural) and different spectral and spatial resolutions, demonstrate that the sparse methods offer good performance.

Heterogeneous features from different sources have different meanings, dimension units, and statistical significance. Therefore, they may play different roles in classification and should be treated differently. MKL performs heterogeneous feature fusion in an implicit high-dimensional feature representation. Utilizing different heterogeneous features to construct different base kernels can distinguish those different roles and fuse the complementary information contained in the heterogeneous features. Consequently, MKL is a more reasonable choice than the stacked-vector approach, and our experimental results also demonstrate this point. Further, the two-stage MKL framework is a good choice in terms of OA.

VI. FUTURE LINES OF RESEARCH

A. Deep Kernel and Multiple Kernel Learning

Compared with a DNN, MKL is a low-dimensional network structure with only one hidden layer. Motivated by deep learning, the deep kernel network is a new research direction. Some work has been done to apply MKL to deep learning. There are mainly two approaches: one is to fuse hierarchical features from a DNN; the other is to use the kernel trick to optimize the weights and speed up the learning rate, namely, the kernel deep convex network. Furthermore, additional work should focus on building a deep kernel network that can not only optimize features for HSI classification but also be trained by optimizing MKL learning problems. Some other factors should also be considered in the future, such as how to find a set of globally optimal parameters, and theoretical guidance for adaptive regularization in the deep kernel network.

B. Superpixel MKL

MKL provides a very effective means of learning, and a variety of characteristics can conveniently be embedded in it. Therefore, it is critical to apply MKL to effective features. Recently, the superpixel approach has been applied to HSI classification as an effective means of spatial feature extraction. Each superpixel is a local region whose size and shape can be adaptively adjusted according to local structures, and the pixels in the same superpixel are assumed to have very similar spectral characteristics, which means that superpixels can provide more accurate spatial information. Utilizing the features explored by superpixels, the salt-and-pepper phenomenon appearing in classification results will be reduced. In consequence, superpixel MKL will lead to better classification performance.

C. MKL for Multimodal Classification

MKL provides a flexible framework for fusing different sources of information in a very natural way. Multimodal classification of remote sensing data is a typical problem of multisource information mining and utilization. The complementary and relevant information contained in multisource remote sensing data can be fused and utilized by taking into account the base kernel construction and optimizing the configuration in MKL. Consequently, MKL will contribute to the development of multimodal remote sensing, such as multitemporal classification, multisensor fusion and classification, and multiangular image fusion and classification.

REFERENCES
GU et al.: MKL FOR HSI CLASSIFICATION: REVIEW 6561

[2] D. Landgrebe, “Hyperspectral image data analysis,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 17–28, Jan. 2002.
[3] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Advances in spectral-spatial classification of hyperspectral images,” Proc. IEEE, vol. 101, no. 3, pp. 652–675, Mar. 2013.
[4] M. A. Cochrane, “Using vegetation reflectance variability for species level classification of hyperspectral data,” Int. J. Remote Sens., vol. 21, no. 10, pp. 2075–2087, 2000.
[5] A. Ghiyamat and H. Z. M. Shafri, “A review on hyperspectral remote sensing for homogeneous and heterogeneous forest biodiversity assessment,” Int. J. Remote Sens., vol. 31, no. 7, pp. 1837–1856, 2010.
[6] J. Pontius, M. Martin, L. Plourde, and R. Hallett, “Ash decline assessment in emerald ash borer-infested regions: A test of tree-level, hyperspectral technologies,” Remote Sens. Environ., vol. 112, no. 5, pp. 2665–2676, 2008.
[7] T. Schmid, M. Koch, and J. Gumuzzio, “Multisensor approach to determine changes of wetland characteristics in semiarid environments (central Spain),” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 11, pp. 2516–2525, Nov. 2005.
[8] E. A. Cloutis, “Hyperspectral geological remote sensing: Evaluation of analytical techniques,” Int. J. Remote Sens., vol. 17, no. 12, pp. 2215–2242, 1996.
[9] Y. Lanthier, A. Bannari, D. Haboudane, J. R. Miller, and N. Tremblay, “Hyperspectral data segmentation and classification in precision agriculture: A multi-scale analysis,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2008, pp. II-585–II-588.
[10] J. L. Boggs, T. D. Tsegaye, T. L. Coleman, K. C. Reddy, and A. Fahsi, “Relationship between hyperspectral reflectance, soil nitrate-nitrogen, cotton leaf chlorophyll, and cotton yield: A step toward precision agriculture,” J. Sustain. Agricult., vol. 22, no. 3, pp. 5–16, 2003.
[11] D. Manolakis, D. Marden, and G. A. Shaw, “Hyperspectral image processing for automatic target detection applications,” Lincoln Lab. J., vol. 14, no. 1, pp. 79–116, 2003.
[12] X. Briottet et al., “Military applications of hyperspectral imagery,” in Proc. SPIE, vol. 6239, W. R. Watkins and D. Clement, Eds. Bellingham, WA, USA: SPIE, 2006, p. 62390B.
[13] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, Jan. 2002.
[14] J. M. Bioucas-Dias et al., “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 5, no. 2, pp. 354–379, Apr. 2012.
[15] B. Du and L. Zhang, “A discriminative metric learning based anomaly detection method,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11, pp. 6844–6857, Nov. 2014.
[16] D. Manolakis and G. S. Shaw, “Detection algorithms for hyperspectral imaging applications,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 29–43, Jan. 2002.
[17] D. W. J. Stein, S. G. Beaven, L. E. Hoff, E. M. Winter, A. P. Schaum, and A. D. Stocker, “Anomaly detection from hyperspectral imagery,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 58–69, Jan. 2002.
[18] C. D. Rodgers, Inverse Methods for Atmospheric Sounding: Theory and Practice (Series on Atmospheric, Oceanic and Planetary Physics), vol. 2. Singapore: World Scientific, 2000.
[19] S. Liang, Quantitative Remote Sensing of Land Surfaces. Hoboken, NJ, USA: Wiley, 2003.
[20] F. Baret and S. Buis, “Estimating canopy characteristics from remote sensing observations: Review of methods and associated problems,” in Advances in Land Remote Sensing: System Modeling, Inversion and Application. Dordrecht, The Netherlands: Springer, 2008, pp. 172–301.
[21] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[22] J. C. Harsanyi and C.-I. Chang, “Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection approach,” IEEE Trans. Geosci. Remote Sens., vol. 32, no. 4, pp. 779–785, Jul. 1994.
[23] J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. M. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geosci. Remote Sens. Mag., vol. 1, no. 2, pp. 6–36, Jun. 2013.
[24] J. A. Richards, Remote Sensing Digital Image Analysis. Berlin, Germany: Springer, 1986.
[25] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. Hoboken, NJ, USA: Wiley, 2003.
[26] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[27] R. Salakhutdinov and G. Hinton, “Deep Boltzmann machines,” in Proc. 12th Int. Conf. Artif. Intell. Statist. (AISTATS), 2009, pp. 448–455.
[28] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, “Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations,” in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 609–616.
[29] L. Deng, D. Yu, and J. Platt, “Scalable stacking and learning for building deep architectures,” in Proc. Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 2133–2136.
[30] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[31] F. Ratle, G. Camps-Valls, and J. Weston, “Semisupervised neural networks for efficient hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 5, pp. 2271–2282, May 2010.
[32] L. Zhang, Y. Zhong, B. Huang, and P. Li, “A resource limited artificial immune system algorithm for supervised classification of multi/hyper-spectral remote sensing imagery,” Int. J. Remote Sens., vol. 28, no. 7, pp. 1665–1686, 2007.
[33] Y. Zhong and L. Zhang, “An adaptive artificial immune network for supervised classification of multi-/hyperspectral remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 3, pp. 894–909, Mar. 2012.
[34] E. Merényi, W. H. Farrand, J. V. Taranik, and T. B. Minor, “Classification of hyperspectral imagery with neural networks: Comparison to conventional tools,” EURASIP J. Adv. Signal Process., vol. 1, no. 71, pp. 1–19, Dec. 2014.
[35] P. K. Goel, S. O. Prasher, R. M. Patel, J. A. Landry, R. B. Bonnell, and A. A. Viau, “Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn,” Comput. Electron. Agricult., vol. 39, no. 2, pp. 67–93, 2003.
[36] J. L. Crespo, R. J. Duro, and F. L. Pena, “Gaussian synapse ANNs in multi- and hyperspectral image data analysis,” IEEE Trans. Instrum. Meas., vol. 52, no. 3, pp. 724–732, Jun. 2003.
[37] S. K. Meher, “Knowledge-encoded granular neural networks for hyperspectral remote sensing image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2439–2446, Jun. 2015.
[38] Y. LeCun, K. Kavukcuoglu, and C. Farabet, “Convolutional networks and applications in vision,” in Proc. IEEE Int. Symp. Circuits Syst., May/Jun. 2010, pp. 253–256.
[39] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[40] C. Farabet, C. Couprie, L. Najman, and Y. LeCun, “Learning hierarchical features for scene labeling,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1915–1929, Aug. 2013.
[41] Y. LeCun et al., “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[42] G. Hinton et al., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82–97, Nov. 2012.
[43] X. Chen, S. Xiang, C.-L. Liu, and C.-H. Pan, “Vehicle detection in satellite images by hybrid deep convolutional neural networks,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 10, pp. 1797–1801, Oct. 2014.
[44] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.
[45] C. Vaduva, I. Gavat, and M. Datcu, “Deep learning in very high resolution remote sensing image information mining communication concept,” in Proc. 20th Eur. Signal Process. Conf. (EUSIPCO), Aug. 2012, pp. 2506–2510.
[46] Y. Chen, X. Zhao, and X. Jia, “Spectral–spatial classification of hyperspectral data based on deep belief network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2381–2392, Jun. 2015.
[47] P. Liu, H. Zhang, and K. B. Eom, “Active deep learning for classification of hyperspectral images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 2, pp. 712–724, Feb. 2016.
[48] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.


[49] B. Pan, Z. Shi, N. Zhang, and S. Xie, “Hyperspectral image classification based on nonlinear spectral–spatial network,” IEEE Geosci. Remote Sens. Lett., vol. 13, no. 12, pp. 1782–1786, Dec. 2016.
[50] W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 8, pp. 4544–4554, Aug. 2016.
[51] A. Romero, C. Gatta, and G. Camps-Valls, “Unsupervised deep feature extraction for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1349–1362, Mar. 2016.
[52] H. Liang and Q. Li, “Hyperspectral imagery classification using sparse representations of convolutional neural network features,” Remote Sens., vol. 8, no. 2, p. 99, 2016.
[53] W. Zhao, Z. Guo, J. Yue, X. Zhang, and L. Luo, “On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery,” Int. J. Remote Sens., vol. 36, no. 13, pp. 3368–3379, 2015.
[54] L. Breiman, “Arcing classifiers,” Ann. Statist., vol. 26, no. 3, pp. 801–824, 1998.
[55] L. Rokach, “Ensemble-based classifiers,” Artif. Intell. Rev., vol. 33, nos. 1–2, pp. 1–39, 2010.
[56] E. Bauer and R. Kohavi, “An empirical comparison of voting classification algorithms: Bagging, boosting, and variants,” Mach. Learn., vol. 36, nos. 1–2, pp. 105–139, 1999.
[57] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits Syst. Mag., vol. 6, no. 3, pp. 21–45, Sep. 2006.
[58] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
[59] R. Schapire, “The boosting approach to machine learning: An overview,” in Proc. MSRI Workshop Nonlinear Estimation Classification, 2001.
[60] T. K. Ho, “The random subspace method for constructing decision forests,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 8, pp. 832–844, Aug. 1998.
[61] J. J. Rodriguez, L. I. Kuncheva, and C. J. Alonso, “Rotation forest: A new classifier ensemble method,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1619–1630, Oct. 2006.
[62] L. Breiman, “Random forests,” Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[63] M. Dalponte, H. O. Orka, T. Gobakken, D. Gianelle, and E. Naesset, “Tree species classification in boreal forests with hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5, pp. 2632–2645, May 2013.
[64] J. C.-W. Chan and D. Paelinckx, “Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery,” Remote Sens. Environ., vol. 112, no. 6, pp. 2999–3011, 2008.
[65] E. M. Adam, O. Mutanga, D. Rugege, and R. Ismail, “Discriminating the papyrus vegetation (Cyperus papyrus L.) and its co-existent species using random forest and hyperspectral data resampled to HYMAP,” Int. J. Remote Sens., vol. 33, no. 2, pp. 552–569, 2012.
[66] N. B. Mishra and K. A. Crews, “Mapping vegetation morphology types in a dry savanna ecosystem: Integrating hierarchical object-based image analysis with Random Forest,” Int. J. Remote Sens., vol. 35, no. 3, pp. 1175–1198, 2014.
[67] J. Xia, W. Liao, J. Chanussot, P. Du, G. Song, and W. Philips, “Improving random forest with ensemble of features and semisupervised feature extraction,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 7, pp. 1471–1475, Jul. 2015.
[68] K. Y. Peerbhay, O. Mutanga, and R. Ismail, “Random forests unsupervised classification: The detection and mapping of Solanum mauritianum infestations in plantation forestry using hyperspectral data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 3107–3122, Jun. 2015.
[69] J. Ham, Y. Chen, M. M. Crawford, and J. Ghosh, “Investigation of the random forest framework for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 492–501, Mar. 2005.
[70] J. Xia, M. Dalla Mura, J. Chanussot, P. Du, and X. He, “Random subspace ensembles for hyperspectral image classification with extended morphological attribute profiles,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 9, pp. 4768–4786, Sep. 2015.
[71] A. Samat, P. Du, S. Liu, J. Li, and L. Cheng, “E²LMs: Ensemble extreme learning machines for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 4, pp. 1060–1069, Apr. 2014.
[72] B. Demir and S. Erturk, “Empirical mode decomposition of hyperspectral images for support vector machine classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4071–4084, Nov. 2010.
[73] G. Hughes, “On the mean accuracy of statistical pattern recognizers,” IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968.
[74] B. C. Kuo, C. H. Li, and J. M. Yang, “Kernel nonparametric weighted feature extraction for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1139–1155, Apr. 2009.
[75] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.
[76] B.-C. Kuo, H.-H. Ho, C.-H. Li, C.-C. Hung, and J.-S. Taur, “A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 1, pp. 317–326, Jan. 2014.
[77] P. V. Gehler and B. Schölkopf, “An introduction to kernel learning algorithms,” in Kernel Methods for Remote Sensing Data Analysis, G. Camps-Valls and L. Bruzzone, Eds. Chichester, U.K.: Wiley, 2009, pp. 25–48.
[78] P. Ramzi, F. Samadzadegan, and P. Reinartz, “Classification of hyperspectral data using an AdaBoostSVM technique applied on band clusters,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2066–2079, Jun. 2014.
[79] J. Xia, J. Chanussot, P. Du, and X. He, “Rotation-based support vector machine ensemble in classification of hyperspectral data with limited training samples,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1519–1531, Mar. 2016.
[80] Z. Xue, P. Du, and H. Su, “Harmonic analysis for hyperspectral image classification integrated with PSO optimized SVM,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2131–2146, Jun. 2014.
[81] L. Gao et al., “Subspace-based support vector machines for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 2, pp. 349–353, Feb. 2015.
[82] J. Peng, Y. Zhou, and C. L. P. Chen, “Region-kernel-based support vector machines for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 9, pp. 4810–4824, Sep. 2015.
[83] X. Guo, X. Huang, L. Zhang, L. Zhang, A. Plaza, and J. A. Benediktsson, “Support tensor machines for classification of hyperspectral remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3248–3264, Jun. 2016.
[84] C. L. Stork and M. R. Keenan, “Advantages of clustering in the phase classification of hyperspectral materials images,” Microscopy Microanal., vol. 16, no. 6, pp. 810–820, 2010.
[85] Z. Shao, L. Zhang, X. Zhou, and L. Ding, “A novel hierarchical semisupervised SVM for classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 9, pp. 1609–1613, Sep. 2014.
[86] L. Yang, S. Yang, P. Jin, and R. Zhang, “Semi-supervised hyperspectral image classification using spatio-spectral Laplacian support vector machine,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 3, pp. 651–655, Mar. 2014.
[87] Y. Bazi and F. Melgani, “Classification of hyperspectral remote sensing images using Gaussian processes,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., Jul. 2008, pp. II-1013–II-1016.
[88] Y. Bazi and F. Melgani, “Gaussian process approach to remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 1, pp. 186–197, Jan. 2010.
[89] W. Liao, J. Tang, B. Rosenhahn, and M. Y. Yang, “Integration of Gaussian process and MRF for hyperspectral image classification,” in Proc. IEEE Urban Remote Sens. Event, Mar./Apr. 2015, pp. 1–4.
[90] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.
[91] J. Liu, Z. Wu, Z. Wei, L. Xiao, and L. Sun, “Spatial-spectral kernel sparse representation for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 6, pp. 2462–2471, Dec. 2013.
[92] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification via kernel sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 217–231, Jan. 2013.


[93] U. Srinivas, Y. Chen, V. Monga, N. M. Nasrabadi, and T. D. Tran, “Exploiting sparsity in hyperspectral image classification via graphical models,” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 3, pp. 505–509, May 2013.
[94] H. Zhang, J. Li, Y. Huang, and L. Zhang, “A nonlocal weighted joint sparse representation classification method for hyperspectral imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2056–2065, Jun. 2014.
[95] L. Fang, S. Li, X. Kang, and J. A. Benediktsson, “Spectral–spatial hyperspectral image classification via multiscale adaptive sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 12, pp. 7738–7749, Dec. 2014.
[96] J. Li, H. Zhang, and L. Zhang, “Efficient superpixel-level multitask joint sparse representation for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5338–5351, Oct. 2015.
[97] Q. S. U. Haq, L. Tao, F. Sun, and S. Yang, “A fast and robust sparse approach for hyperspectral data classification using a few labeled samples,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 6, pp. 2287–2302, Jun. 2012.
[98] S. Yang, H. Jin, M. Wang, Y. Ren, and L. Jiao, “Data-driven compressive sampling and learning sparse coding for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 2, pp. 479–483, Feb. 2014.
[99] Y. Qian, M. Ye, and J. Zhou, “Hyperspectral image classification based on structured sparse logistic regression and three-dimensional wavelet texture features,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 4, pp. 2276–2291, Apr. 2013.
[100] Y. Y. Tang, H. Yuan, and L. Li, “Manifold-based sparse representation for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 12, pp. 7606–7618, Dec. 2014.
[101] H. Yuan, Y. Y. Tang, Y. Lu, L. Yang, and H. Luo, “Hyperspectral image classification based on regularized sparse representation,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2174–2182, Jun. 2014.
[102] J. Li, H. Zhang, and L. Zhang, “Column-generation kernel nonlocal joint collaborative representation for hyperspectral image classification,” ISPRS J. Photogramm. Remote Sens., vol. 94, pp. 25–36, Aug. 2014.
[103] J. Li, H. Zhang, Y. Huang, and L. Zhang, “Hyperspectral image classification by nonlocal joint collaborative representation with a locally adaptive dictionary,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 6, pp. 3707–3719, Jun. 2014.
[104] J. Li, H. Zhang, L. Zhang, X. Huang, and L. Zhang, “Joint collaborative representation with multitask learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 9, pp. 5923–5936, Sep. 2014.
[105] W. Li and Q. Du, “Joint within-class collaborative representation for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2200–2208, Jun. 2014.
[106] W. Li, Q. Du, and M. Xiong, “Kernel collaborative representation with Tikhonov regularization for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 1, pp. 48–52, Jan. 2015.
[107] J. Liu, Z. Wu, J. Li, A. Plaza, and Y. Yuan, “Probabilistic-kernel collaborative representation for spatial–spectral hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 4, pp. 2371–2384, Apr. 2016.
[108] Z. He, Q. Wang, Y. Shen, and M. Sun, “Kernel sparse multitask learning for hyperspectral image classification with empirical mode decomposition and morphological wavelet-based features,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 8, pp. 5150–5163, Aug. 2014.
[109] M. Xiong, Q. Ran, W. Li, J. Zou, and Q. Du, “Hyperspectral image classification using weighted joint collaborative representation,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 6, pp. 1209–1213, Jun. 2015.
[110] J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proc. IEEE, vol. 98, no. 6, pp. 1031–1044, Jun. 2010.
[111] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[112] L. Zhang, M. Yang, X. Feng, Y. Ma, and D. Zhang. (2012). “Collaborative representation based classification for face recognition.” [Online]. Available: https://arxiv.org/abs/1204.2358
[113] D. Baron, M. F. Duarte, M. B. Wakin, S. Sarvotham, and R. G. Baraniuk. (2009). “Distributed compressive sensing.” [Online]. Available: https://arxiv.org/abs/0901.3403
[114] Y. Gu, Q. Wang, and B. Xie, “Multiple kernel sparse representation for airborne LiDAR data classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 1085–1105, Feb. 2017.
[115] M. Gönen and E. Alpaydın, “Multiple kernel learning algorithms,” J. Mach. Learn. Res., vol. 12, pp. 2211–2268, Jul. 2011.
[116] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. Vila-Frances, and J. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[117] J. Wang, L. Jiao, S. Wang, B. Hou, and F. Liu, “Adaptive nonlocal spatial–spectral kernel for hyperspectral imagery classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp. 4086–4101, Sep. 2016.
[118] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. L. Rojo-Alvarez, and M. Martinez-Ramon, “Kernel-based framework for multitemporal and multisource remote sensing data classification and change detection,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1822–1835, Jun. 2008.
[119] L. Fang, S. Li, W. Duan, J. Ren, and J. A. Benediktsson, “Classification of hyperspectral images by exploiting spectral–spatial information of superpixel via multiple kernels,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 12, pp. 6663–6674, Dec. 2015.
[120] S. Valero, P. Salembier, and J. Chanussot, “Hyperspectral image representation and processing with binary partition trees,” IEEE Trans. Image Process., vol. 22, no. 4, pp. 1430–1443, Apr. 2013.
[121] Y. Zhou, J. Peng, and C. L. P. Chen, “Extreme learning machine with composite kernels for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2351–2360, Jun. 2015.
[122] H. Li, Z. Ye, and G. Xiao, “Hyperspectral image classification using spectral–spatial composite kernels discriminant analysis,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2341–2350, Jun. 2015.
[123] Y. Zhang and S. Prasad, “Locality preserving composite kernel feature extraction for multi-source geospatial image analysis,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 3, pp. 1385–1392, Mar. 2015.
[124] Y. Gu, C. Wang, D. You, Y. Zhang, S. Wang, and Y. Zhang, “Representative multiple kernel learning for classification in hyperspectral imagery,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 7, pp. 2852–2865, Jul. 2012.
[125] Y. Gu, Q. Wang, H. Wang, D. You, and Y. Zhang, “Multiple kernel learning via low-rank nonnegative matrix factorization for classification of hyperspectral imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2739–2751, Jun. 2015.
[126] Y. Gu, Q. Wang, X. Jia, and J. A. Benediktsson, “A novel MKL model of integrating LiDAR data and MSI for urban area classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 10, pp. 5312–5326, Oct. 2015.
[127] Q. Wang, Y. Gu, and D. Tuia, “Discriminative multiple kernel learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 3912–3927, Jul. 2016.
[128] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet, “SimpleMKL,” J. Mach. Learn. Res., vol. 9, pp. 2491–2521, Nov. 2008.
[129] T. Liu, Y. Gu, X. Jia, J. A. Benediktsson, and J. Chanussot, “Class-specific sparse multiple kernel learning for spectral–spatial hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 12, pp. 7351–7365, Dec. 2016.
[130] L. Wang, S. Hao, Q. Wang, and P. M. Atkinson, “A multiple-mapping kernel for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 5, pp. 978–982, May 2015.
[131] Y. Gu, T. Liu, X. Jia, J. A. Benediktsson, and J. Chanussot, “Nonlinear multiple kernel learning with multiple-structure-element extended morphological profiles for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 6, pp. 3235–3247, Jun. 2016.
[132] T. T. H. Do, “A unified framework for support vector machines, multiple kernel learning and metric learning,” M.S. thesis, Faculté Sci., Univ. Geneva, Geneva, Switzerland, 2012.
[133] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York, NY, USA: Cambridge Univ. Press, 1999.
[134] Y. Gu, G. Gao, D. Zuo, and D. You, “Model selection and classification with multiple kernel learning for hyperspectral images via sparsity,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2119–2130, Jun. 2014.


[135] Y. Gu and H. Liu, “Sample-screening MKL method via boosting strategy for hyperspectral image classification,” Neurocomputing, vol. 173, pp. 1630–1639, Jan. 2016.
[136] C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear combinations of kernels,” in Proc. Adv. Neural Inf. Process. Syst., 2009, pp. 396–404.
[137] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, “Simple and efficient multiple kernel learning by group lasso,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 1175–1182.
[138] R. E. Schapire, “The strength of weak learnability,” Mach. Learn., vol. 5, no. 2, pp. 197–227, 1990.
[139] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[140] H. Xia and S. C. H. Hoi, “MKBoost: A framework of multiple kernel boosting,” IEEE Trans. Knowl. Data Eng., vol. 25, no. 7, pp. 1574–1586, Jul. 2013.
[141] P. Gamba, “A collection of data for urban area characterization,” in Proc. IEEE Int. Symp. Geosci. Remote Sens. (IGARSS), Sep. 2004, pp. 69–72.
[142] S. Jia, Z. Zhu, L. Shen, and Q. Li, “A two-stage feature selection framework for hyperspectral image classification using few labeled samples,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 4, pp. 1023–1035, Apr. 2014.
[143] N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor, “On kernel-target alignment,” in Advances in Neural Information Processing Systems, vol. 14. Cambridge, MA, USA: MIT Press, 2002, pp. 367–373.
[144] W. Li and Q. Du, “A survey on representation-based classification and detection in hyperspectral remote sensing imagery,” Pattern Recognit. Lett., vol. 83, pp. 115–123, Nov. 2016.
[145] J. Li et al., “Multiple feature learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 3, pp. 1592–1606, Mar. 2015.

Yanfeng Gu (M’06–SM’16) received the Ph.D. degree in information and communication engineering from the Harbin Institute of Technology (HIT), Harbin, China, in 2005.
He was a Lecturer with the School of Electronics and Information Engineering, HIT, where he became an Associate Professor in 2006; meanwhile, he was enrolled in the first Outstanding Young Teacher Training Program of HIT. From 2011 to 2012, he was a Visiting Scholar with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA. He is currently a Professor with the Department of Information Engineering, HIT. He has authored more than 60 peer-reviewed papers and four book chapters, and he is the inventor or co-inventor of seven patents. His research interests include image processing in remote sensing, machine learning and pattern analysis, and multiscale geometric analysis.
Dr. Gu is a peer reviewer for several international journals, such as the IEEE Transactions on Geoscience and Remote Sensing, the IEEE Transactions on Instrumentation and Measurement, the IEEE Geoscience and Remote Sensing Letters, and IET Electronics Letters.

Jocelyn Chanussot (M’04–SM’04–F’12) received the M.Sc. degree in electrical engineering from the Grenoble Institute of Technology (Grenoble INP), Grenoble, France, in 1995, and the Ph.D. degree from Savoie University, Annecy, France, in 1998.
In 1999, he joined the Geography Imagery Perception Laboratory, Delegation Generale de l’Armement, Arcueil, France (DGA—French National Defense Department). From 1999 to 2005, he was an Assistant Professor with Grenoble INP, where he was an Associate Professor from 2005 to 2007. He has been a Visiting Scholar at Stanford University, Stanford, CA, USA; KTH, Stockholm, Sweden; and NUS, Singapore. Since 2013, he has been an Adjunct Professor with the University of Iceland, Reykjavik, Iceland. From 2014 to 2015, he was a Visiting Professor at the University of California, Los Angeles, CA, USA. He is currently a Professor of signal and image processing with Grenoble INP, and he conducts his research at the Grenoble Images Speech Signals and Automatics Laboratory (GIPSA-Lab). His research interests include image analysis, multicomponent image processing, nonlinear filtering, and data fusion in remote sensing.
Dr. Chanussot was a member of the IEEE Geoscience and Remote Sensing Society (GRSS) Administrative Committee during 2009–2010, in charge of membership development. He is a member of the Institut Universitaire de France (2012–2017). He was a co-recipient of the NORSIG 2006 Best Student Paper Award, the IEEE GRSS 2011 and 2015 Symposium Best Paper Award, the IEEE GRSS 2012 Transactions Prize Paper Award, and the IEEE GRSS 2013 Highest Impact Paper Award. He was the founding President of the IEEE Geoscience and Remote Sensing French Chapter during 2007–2010, which received the 2010 IEEE GRSS Chapter Excellence Award. He was the General Chair of the first IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing. During 2009–2011, he was the Chair of the GRSS Data Fusion Technical Committee, where he was the Co-Chair during 2005–2008. He was a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society during 2006–2008 and the Program Chair of the IEEE International Workshop on Machine Learning for Signal Processing in 2009. He was an Associate Editor of the IEEE Geoscience and Remote Sensing Letters during 2005–2007 and of Pattern Recognition during 2006–2008. Since 2007, he has been an Associate Editor of the IEEE Transactions on Geoscience and Remote Sensing. Since 2011, he has been the Editor-in-Chief of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. He was a Guest Editor of the Proceedings of the IEEE in 2013 and of the IEEE Signal Processing Magazine in 2014.

Xiuping Jia (M’93–SM’03) received the B.Eng. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 1982, and the Ph.D. degree in electrical engineering from the University of New South Wales, Kensington, NSW, Australia, in 1996.
Since 1988, she has been with the School of Information Technology and Electrical Engineering, University of New South Wales, Canberra, ACT, Australia, where she is currently a Senior Lecturer. She is also a Guest Professor with Harbin Engineering University, Harbin, China, and an Adjunct Researcher with the China National Engineering Research Center for Information Technology in Agriculture, Beijing. She has co-authored the remote sensing textbook Remote Sensing Digital Image Analysis [Springer-Verlag, 3rd (1999) and 4th eds. (2006)]. Her research interests include remote sensing, image processing, and spatial data analysis.
Dr. Jia is an Editor of the Annals of GIS for the remote sensing topic, a Subject Editor of the Journal of Soils and Sediments, and an Associate Editor of the IEEE Transactions on Geoscience and Remote Sensing.


Jón Atli Benediktsson (M’90–SM’99–F’04) received the Cand.Sci. degree in electrical engineering from the University of Iceland, Reykjavik, Iceland, in 1984, and the M.S.E.E. and Ph.D. degrees from Purdue University, West Lafayette, IN, USA, in 1987 and 1990, respectively.
In 2015, he became the Rector of the University of Iceland, where he was the Pro Rector of science and academic affairs and a Professor of Electrical and Computer Engineering from 2009 to 2015. He is a Co-Founder of Oxymap. He has published extensively in his research fields, which include remote sensing, biomedical analysis of signals, pattern recognition, image processing, and signal processing.
Prof. Benediktsson is a member of the Association of Chartered Engineers in Iceland (VFI), Societas Scientiarum Islandica, and Tau Beta Pi. He is a fellow of SPIE. He received the Stevan J. Kristof Award from Purdue University in 1991 as an outstanding graduate student in remote sensing, the Yearly Research Award from the Engineering Research Institute of the University of Iceland in 2006, and the Outstanding Service Award from the IEEE Geoscience and Remote Sensing Society (GRSS) in 2007. He was a recipient of the Icelandic Research Council’s Outstanding Young Researcher Award in 1997 and the IEEE Third Millennium Medal in 2000. He was a co-recipient of the University of Iceland’s Technology Innovation Award in 2004, the 2012 IEEE Transactions on Geoscience and Remote Sensing Paper Award, the IEEE GRSS Highest Impact Paper Award and the IEEE/VFI Electrical Engineer of the Year Award in 2013, and the International Journal of Image and Data Fusion Best Paper Award in 2014. He is a member of the 2014 IEEE Fellow Committee. He was the 2011–2012 President of the IEEE GRSS and has been with the GRSS Administrative Committee since 2000. He was the Editor-in-Chief of the IEEE Transactions on Geoscience and Remote Sensing (TGRS) from 2003 to 2008, and he has been an Associate Editor of TGRS since 1999, of the IEEE Geoscience and Remote Sensing Letters since 2003, and of IEEE Access since 2013. He is on the Editorial Board of the Proceedings of the IEEE and the International Editorial Board of the International Journal of Image and Data Fusion. He was the Chairman of the Steering Committee of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing during 2007–2010.
