SVM Active Learning Approach For Image Classification Using Spatial Information
Reprinted from:
Pasolli, E.; Melgani, F.; Tuia, D.; Pacifici, F.; Emery, W.J., "SVM Active Learning Approach for Image Classification Using Spatial Information," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 4, pp. 2217-2233, April 2014
doi: 10.1109/TGRS.2013.2258676
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6531640&isnumber=6695798
Abstract— In the last few years, active learning has been gaining growing interest in the remote sensing community in optimizing the process of training sample collection for supervised image classification. Current strategies formulate the active learning problem in the spectral domain only. However, remote sensing images are intrinsically defined both in the spectral and spatial domains. In this paper, we explore this fact by proposing a new active learning approach for support vector machine classification. In particular, we suggest combining spectral and spatial information directly in the iterative process of sample selection. For this purpose, three criteria are proposed to favor the selection of samples distant from the samples already composing the current training set. In the first strategy, the Euclidean distances in the spatial domain from the training samples are explicitly computed, whereas the second one is based on the Parzen window method in the spatial domain. Finally, the last criterion involves the concept of spatial entropy. Experiments on two very high resolution images show the effectiveness of regularization in spatial domain for active learning purposes.

Index Terms— Active learning, image classification, spatial information, support vector machine (SVM), very high resolution (VHR).

Manuscript received December 18, 2012; revised March 13, 2013; accepted April 8, 2013. Date of publication June 13, 2013; date of current version December 24, 2013.
E. Pasolli is with the Computational and Information Sciences and Technology Office, NASA Goddard Space Flight Center, Greenbelt, MD 20771 USA (e-mail: [email protected]).
F. Melgani is with the Department of Information Engineering and Computer Science, University of Trento, Trento 38123, Italy.
D. Tuia is with the Laboratoire des Systèmes d’Information Géographique, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland.
F. Pacifici is with the DigitalGlobe, Inc., Longmont, CO 80503 USA.
W. J. Emery is with the Department of Aerospace Engineering, University of Colorado, Boulder, CO 80309 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2013.2258676
0196-2892 © 2013 IEEE

I. INTRODUCTION

ONE of the most important tasks in the remote sensing field is represented by image classification and segmentation, in which the purpose is to identify objects in an image captured by an acquisition system. Because of the technological developments of acquisition systems, in the last few years there was a drastic increase of imaging systems characterized by growing spectral, spatial and temporal resolutions that can be exploited to improve the classification process. In addition to sensor developments, opportune processing techniques have to be implemented to automatically analyze such data for accurate image classification [1].

From a methodological point of view, two main approaches for image classification are proposed in the literature: unsupervised and supervised. Unsupervised methods investigate data statistics to subdivide the image into clusters of pixels (samples) with similar characteristics. They do not require labeled information provided by the user, but this implies a lack of correspondence between the clusters found and the classes desired by the user. To overcome this problem, supervised techniques are characterized by an explicit link between samples and classes. They have shown very promising accuracies in terms of image classification, but they depend strongly on the labeled training samples that are used to construct the model of classification. Training samples have to be representative of the statistical distribution of the data. However, the process of training sample collection is not obvious because it is performed manually by the user and thus it is characterized by errors and costs. This acquisition process can be performed only on a limited portion of the available data given the constraints in terms of time and money. For this reason, in the last few years there was a growing interest in developing strategies for the semiautomatic selection of the training samples. In the machine learning field, an interesting solution to address this problem is represented by the active learning approach. Considering a small and suboptimal initial training set, few additional samples are selected from a large amount of unlabeled samples (learning set) through an iterative process. The aim of active learning is to rank the learning set according to a criterion that allows us to select the most useful samples to improve the model, thus minimizing the number of training samples necessary to maintain discrimination capabilities as high as possible. In the last few years, different solutions were proposed and applied successfully in different research areas [2]–[5] (see [6] for a survey on active learning methods) and in different remote sensing application fields, such as detection of buried objects [7], [8], image retrieval [9], estimation of biophysical parameters [10], and classification of multispectral and hyperspectral images [11]–[21].

Focusing on image classification, the strategies proposed in the literature can be divided into four main categories. The first category is specific for margin-based classification approaches, such as support vector machine (SVM). In this context, margin sampling (MS) represents a good base method, in which the samples closest to the separating hyperplane are selected [11], [23]. In [12], the diversity of the selected samples is enforced by constraining the MS criterion to samples associated with different closest support vectors (SVs). In this way, a certain degree of spectral diversification is guaranteed, by dividing the margin in the feature space as a function of the geometrical distribution of the SVs. In [13], instead of using the distance to the hyperplane as selection measure, the original classification problem is reformulated into a new binary problem where it is needed to discriminate between significant and nonsignificant samples, according to a concept of significance proper to the SVM theory based on the SV
2218 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 52, NO. 4, APRIL 2014
coefficients. In [14], clustering in the feature space induced by the kernel function of the SVM is proposed to prevent redundancy when selecting samples. The second family of methods quantifies the uncertainty of a sample by considering a committee of classifiers. After training each classifier by exploiting different hypotheses about the classification problem, the samples showing maximal disagreement between the different classification models are selected. Committees can be realized by varying the sample members [12] or by considering subsets of the feature space [15]. The third category uses the estimation of posterior probabilities of class membership to select samples. In [16], using a maximum-likelihood classifier, the sample whose inclusion in the training set maximizes the changes in the posterior distribution is selected. In [17], samples are selected based on the entropy of the corresponding class label. Another strategy is represented by the breaking ties (BT) criterion [18], [25], in which the difference between the two highest posterior probabilities is considered. A modified version of the BT algorithm is proposed in [19] to have unbiased sampling among classes. The last family, the cluster-based, aims at pruning a hierarchical clustering tree until the resulting clusters are consistent with the labels provided by the user [20], [21]. Therefore, it relies on an unsupervised model, rather than on a predictive model. We refer the reader to [22] for a complete survey on active learning strategies for remote sensing image classification.

It is important to note that the active learning methods proposed in the remote sensing literature for image classification exploit only the spectral information of the involved samples. However, remote sensing images are intrinsically defined in both the spectral and spatial domains. Additionally, it has recently been shown that the common assumption that data are homogeneous throughout the image, i.e., class statistics remain constant over the image, is unrealistic [18]. This means that a shift between the distributions of the training samples and the samples to classify is observed, especially when the training set only covers small regions of the image. This leads to an incompatibility of the model optimized on the training set when used to estimate the unlabeled samples. For this reason, in this paper we propose to include explicitly the spatial information in the active learning process to select samples in regions where the classifier could be suboptimal because of a potential dataset shift. This is done by favoring the selection of samples in areas of the image poorly covered by training samples.

We observe that the use of spatial information in conjunction with the spectral one is investigated in different works in the literature to solve problems in different remote sensing contexts. For example, the improvement of the classification of hyperspectral or very high resolution (VHR) images is obtained by adopting different approaches, such as composite kernels [26], [27], morphological operators [28], [29], textural metrics [30], and Markov random field regularization [31]. In this paper, the integration of the spatial information in the active learning approach is done using a completely different approach based on the spatial position of the samples in the image under analysis.

The objective of this paper is to propose a new active learning approach for classification of remote sensing images in which spatial and spectral information are combined to improve the process of training sample collection. In particular, we suggest: 1) to evaluate the unlabeled samples of the learning set using two different criteria, the first one spectral-based and the second one spatial-based, and 2) to combine the two criteria through the concept of nondominated sorting. The approach is proposed specifically for classification problems based on SVM. While in terms of spectral criterion we use, for their simplicity and effectiveness, the MS and BT strategies, the spatial information is incorporated by three different criteria that we propose in this paper. The criteria are based on the hypothesis that uncertain samples that are spatially far from current SVs are relevant both in terms of observed spectra and of potential dataset shift. In the first strategy, the Euclidean distances in the spatial domain from the training samples are explicitly computed, whereas the second one is based on the Parzen window method in the spatial domain. Finally, the last criterion involves the concept of spatial entropy. To investigate the performance of the proposed approach and to compare the three spatial criteria, we conducted an experimental study based on two VHR images acquired by QuickBird. Statistically significant advantages in terms of classification accuracy and classification reliability are evaluated with respect to strategies that do not exploit spatial information.

The rest of the paper is organized as follows. In Section II, the proposed approach for the integration of spatial and spectral information and the three related spatial criteria are described. Section III presents the datasets used in the experimental analysis and the corresponding results. Finally, conclusions are drawn in Section IV.

II. PROPOSED METHOD

A. Proposed Active Learning Framework

Let us consider a training set composed initially of n labeled samples $L = \{\mathbf{x}_i, y_i\}_{i=1}^{n}$, where $\mathbf{x}_i$ represents the features of interest (spectral bands and/or contextual features extracted from them) and $y_i$ is a discrete label defined among T possible classes. We also consider a learning set composed of m unlabeled samples $U = \{\mathbf{x}_j\}_{j=n+1}^{n+m}$, with $m \gg n$, usually the rest of the image I (i.e., $U = I - L$). To increase the training set L with a series of samples chosen from the learning set U to be labeled manually by the expert, an active learning algorithm has the task of choosing them properly so as to maximize the accuracy of the classification process while minimizing the number of learning samples to label (i.e., number of interactions with the expert).

In Fig. 1, we show the flowchart of the active learning approach proposed in this paper, in which we propose to combine spectral and spatial information in the active selection process of the training samples. The method is proposed specifically for classification problems based on state-of-the-art SVM. We refer the reader to [32] and [33] for more details about SVM. Starting from the small and suboptimal training set L, a multiclass SVM classifier is trained. The resulting classification model is used to evaluate the unlabeled samples of the learning set U. In particular, each sample is evaluated by two different criteria $f_1$ and $f_2$, which incorporate spectral and spatial information, respectively. The spectral criterion $f_1$, as done by active learning strategies proposed in the literature, considers sample uncertainty in the feature domain, represented by the spectral bands and/or the contextual features extracted from them. Additionally, the spatial criterion $f_2$ that we propose in this paper is based on the spatial position of the sample in the considered image and in particular on the distance from the current SVs. At this point, the two criteria are combined by sorting the samples using the nondominated
PASOLLI et al.: SVM ACTIVE LEARNING APPROACH FOR IMAGE CLASSIFICATION USING SPATIAL INFORMATION 2219
sorting algorithm, which we will describe in more detail afterward. In this way the combined criterion represents a tradeoff between spectral and spatial information, in which uncertain samples that are spatially far from current SVs and relevant both in terms of observed spectra and potential dataset shift are detected. Finally, from the sorted samples $U_s$, $N_s$ samples are selected from the learning set U, where $N_s$ is the number of samples to be added to the training set L. Subsequently, the selected samples $U_s^*$ are labeled by the human expert and added to the training set L. The entire process is iterated until the predefined convergence condition is satisfied, e.g., the total number of samples to add to the training set is reached. Algorithm 1 summarizes the proposed active learning strategy.

Fig. 1. Flowchart of the active learning method proposed for integration of spatial and spectral information.

Algorithm 1 Proposed Active Learning Framework
Inputs:
L: initial training set, composed of n labeled samples.
U: learning set, composed of m ($m \gg n$) unlabeled samples.
$N_s$: number of samples to add at every iteration of the active learning process.
Output:
L: final training set.
Repeat
1. Train the SVM classifier with the current training set L, while estimating its free parameters by crossvalidation.
2. Compute the criterion $f_1$ based on spectral information for each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set U.
3. Analogously to the previous step, compute the criterion $f_2$ based on spatial information for each sample of the learning set U.
4. Sort the samples of the learning set U as a function of the spectral $f_1$ and spatial $f_2$ criteria by the nondominated sorting algorithm. The samples are now ranked in the set $U_s$.
5. Select the first $N_s$ samples from $U_s$.
6. Label the selected samples $U_s^*$.
7. Add the labeled samples $L_s^*$ to the training set L and remove them from U.
Until the predefined convergence condition is satisfied (e.g., the total number of samples to add to the training set is reached).

In the following, we describe in detail the main ingredients of the proposed approach, namely spectral and spatial criteria and nondominated sorting.

B. Spectral Selection Criteria

The objective of this paper is not to propose new selection methods based on spectral information, but to consider some criteria already introduced in the literature. In particular, we adopt two state-of-the-art strategies, namely MS and BT, which have shown their simplicity and effectiveness in the remote sensing field [22].

1) Margin Sampling: The first spectral criterion is represented by the MS strategy [23], which is proposed specifically for classification problems based on SVM. Considering a simple binary case with linearly separable classes, SVs are the samples of the training set L closest to the hyperplane that describes the decision boundary. If we consider the learning set U, we can assume that the samples closest to the decision boundary are the most interesting samples because they have a larger probability to become SVs when added to the training set. Therefore, the samples selected by MS are the ones showing the minimum absolute values of the discriminant function. The same reasoning is applied in case of nonlinearly separable classes.

SVMs are intrinsically binary classifiers. However, the classification of remote sensing images often involves the simultaneous discrimination of numerous information classes. For this purpose, a number of multiclass classification strategies are proposed in the literature. The most popular ones are the one-against-all (OAA) and the one-against-one (OAO) strategies. Similarly, the MS method introduced for the binary case in the previous paragraph can be extended to the multiclass classification problem. Adopting an OAA SVM classifier, the solution proposed in [23] is applicable in the multiclass context. For each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$), the minimum value among the discriminant functions provided by the T binary classifiers is exploited as a sample indicator, where T is the number of different classes. Then, the samples with the minimum indicator values are selected, to select the samples closest to the decision boundary. In this work, we adopt OAO classification, which has been shown to be more suitable for practical use than OAA [24], and consequently the MS strategy has to be adapted to this multiclass classification strategy. In this context, $T(T-1)/2$ different binary classifiers are used. However, considering the minimum value among
the entire set of involved classifiers is not appropriate as it could refer to a pair of classes that is not relevant for the classification problem. To solve this problem, first for each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) the number of votes of each class $\mathbf{v}_j = [v_{j,1}, v_{j,2}, \ldots, v_{j,T}] \in \mathbb{N}^T$ is calculated and the class $\omega_{\mathrm{MAX},j}$ with the largest number of votes $v_{\mathrm{MAX},j}$ is identified. In this way, the winner class is identified. Then, considering the $T-1$ classifiers associated with the class $\omega_{\mathrm{MAX},j}$, the minimum absolute value of the discriminant function $f_{\mathrm{MIN},j}$ is calculated. Finally, the samples characterized by the minimum values of $f_{\mathrm{MIN},j}$ are selected.

Algorithm 2 summarizes the spectral criterion based on the MS strategy.

Algorithm 2 Spectral Selection Criterion—Margin Sampling
1. Compute the number of votes of each class $\mathbf{v}_j = [v_{j,1}, v_{j,2}, \ldots, v_{j,T}] \in \mathbb{N}^T$ for each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set U.
2. Identify the class $\omega_{\mathrm{MAX},j}$ with the largest number of votes $v_{\mathrm{MAX},j}$.
3. Calculate the minimum absolute value of the discriminant function $f_{\mathrm{MIN},j}$ by considering the $T-1$ classifiers associated with the class $\omega_{\mathrm{MAX},j}$.
4. Set $f_1(j) = f_{\mathrm{MIN},j}$.

2) Breaking Ties: The second criterion is given by the BT approach [25], which is based on the posterior probabilities of associating a sample to a given class. Considering a binary case, we can guess that the most interesting samples to select from the learning set U are those characterized by posterior probabilities close to 0.5 for both classes, since their decision uncertainties are maximum. Equivalently, this can be viewed as selecting the samples for which the differences between the two posterior probabilities are minimum, as suggested by the BT formulation. In a multiclass context, this selection strategy remains valid, independently of the number of classes T, as the difference between the two highest probabilities is indicative of the way a sample is handled by the classifier. When the two highest probabilities are close, the classifier confidence is low.

SVMs are not a probabilistic approach and thus they do not directly output probabilistic quantities. However, in the literature, some solutions are proposed to infer posterior probability estimates from discriminant function values provided by SVMs. In this paper, we adopt Platt's estimation [34].

Algorithm 3 summarizes the spectral criterion based on the BT strategy.

Algorithm 3 Spectral Selection Criterion—BT
1. Compute the posterior probability of each class $\mathbf{p}_j = [p_{j,1}, p_{j,2}, \ldots, p_{j,T}] \in \mathbb{R}^T$ for each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set U using Platt's estimation.
2. Identify the two classes $\omega_{\mathrm{MAX},j}$, $\omega_{\mathrm{MAX2},j}$ with the two highest posterior probabilities.
3. Calculate the difference $p_{\mathrm{MAX},j}$ between the two posterior probabilities associated with the classes $\omega_{\mathrm{MAX},j}$ and $\omega_{\mathrm{MAX2},j}$.
4. Set $f_1(j) = p_{\mathrm{MAX},j}$.

C. Spatial Selection Criteria

In this paper, three different criteria are proposed to integrate spatial information in the process of training sample collection. As introduced previously, such criteria consider the spatial position of the samples in the considered image and in particular the distances from the subset of SVs identified by the classification model on the training set L. In the following, we define $S_n$ as the number of SVs identified.

1) Spatial Distance From the Closest SV (Sp1): The first criterion, named Sp1 in the rest of the paper, consists of calculating for each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) the spatial Euclidean distances from the SVs $\mathbf{d}_j = [d_{j,1}, d_{j,2}, \ldots, d_{j,S_n}] \in \mathbb{R}^{S_n}$

$$d_{j,i} = \lVert \mathbf{p}_j - \mathbf{p}_i \rVert \tag{1}$$

where $\mathbf{p}$ is the 2-D vector containing the position of the considered sample in the spatial domain of the image. After that, the nearest SV $s_{\mathrm{MIN},j}$ is identified and the corresponding distance $d_{\mathrm{MIN},j}$ is considered for the spatial criterion. In particular, the negative value is adopted to convert the maximization problem into a minimization one. In this way, we favor the selection of samples placed in areas of the image not covered by SVs and in which a potential dataset shift could occur. To avoid an exhaustive search of the nearest SV for each sample of the learning set, we note that appropriate techniques such as the cover tree data structure [35] can be adopted. Algorithm 4 summarizes the proposed spatial criterion based on the distance from the closest SV.

Algorithm 4 Spatial Selection Criterion—Spatial Distance from the Closest SV
1. Compute the spatial Euclidean distances from the $S_n$ different SVs $\mathbf{d}_j = [d_{j,1}, d_{j,2}, \ldots, d_{j,S_n}] \in \mathbb{R}^{S_n}$ for each sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set U.
2. Identify the SV $s_{\mathrm{MIN},j}$ nearest to the sample.
3. Consider the distance $d_{\mathrm{MIN},j}$ associated with the SV $s_{\mathrm{MIN},j}$.
4. Set $f_2(j) = -d_{\mathrm{MIN},j}$.

2) Parzen Window Method in the Spatial Domain (Sp2): In the strategy Sp1 introduced in the previous subsection, after calculating for each sample of the learning set the distances from SVs, the selection criterion considers only the value associated with the nearest SV. Here, we propose an alternative strategy (Sp2), in which all distance values are suitably combined. For this purpose, we recall the Parzen window method, which represents a standard way to estimate the probability density function of a random variable [36]. In particular, given some samples drawn from a distribution with an unknown probability density function, this one is estimated by weighting the samples with a kernel function defined a priori. In a similar way, in the Sp2 strategy that we propose, the distance function is estimated in the spatial domain by combining distances with respect to SVs using a kernel function. The spatial criterion $d_{\mathrm{KER},j}$ for the sample $\mathbf{x}_j$ ($j = n+1, n+2, \ldots, n+m$) is given by the following formulation:

$$d_{\mathrm{KER},j} = \sum_{i=1}^{S_n} K(\mathbf{p}_j, \mathbf{p}_i) \tag{2}$$

where $K(\cdot)$ is the kernel function that has to be fixed a priori. In this paper, we adopt the common Gaussian kernel function, which allows us to put more weight on the SVs spatially close to the considered sample

$$K(\mathbf{p}_j, \mathbf{p}_i) = \exp\left(-\frac{\lVert \mathbf{p}_j - \mathbf{p}_i \rVert^2}{\lambda^2}\right). \tag{3}$$
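As an illustration of the two spatial criteria defined so far, the following NumPy sketch evaluates the Sp1 value of Algorithm 4 and the kernel sum of (2) and (3) for a set of pixel positions. It is only a minimal sketch under stated assumptions: the function name `spatial_criteria`, the array layout, and the toy coordinates and kernel width are hypothetical and not taken from the paper, and the dense distance matrix is the exhaustive search that the text suggests avoiding for large learning sets. The function returns $d_{\mathrm{KER},j}$ itself rather than a final $f_2$ value for Sp2.

```python
import numpy as np


def spatial_criteria(sample_pos, sv_pos, lam):
    """Evaluate Sp1 and the Sp2 kernel sum for every unlabeled sample.

    sample_pos : (m, 2) pixel coordinates of the learning-set samples
    sv_pos     : (S_n, 2) pixel coordinates of the current SVs
    lam        : width lambda of the Gaussian kernel in Eq. (3)
    (All names are illustrative, not the authors' implementation.)
    """
    sample_pos = np.asarray(sample_pos, dtype=float)
    sv_pos = np.asarray(sv_pos, dtype=float)

    # Eq. (1): d[j, i] = ||p_j - p_i|| for every sample j and SV i.
    diff = sample_pos[:, None, :] - sv_pos[None, :, :]
    d = np.linalg.norm(diff, axis=2)

    # Sp1 (Algorithm 4): f2(j) = -d_MIN,j, the negated distance
    # to the nearest SV, so that smaller values are preferred.
    f2_sp1 = -d.min(axis=1)

    # Sp2, Eqs. (2)-(3): sum of Gaussian kernel values over all SVs.
    d_ker = np.exp(-(d ** 2) / lam ** 2).sum(axis=1)
    return f2_sp1, d_ker


# Toy data (assumed): two SVs and two unlabeled samples,
# one close to the SVs and one far away from both.
svs = [[0, 0], [0, 10]]
samples = [[0, 1], [30, 40]]
f2_sp1, d_ker = spatial_criteria(samples, svs, lam=5.0)
print(f2_sp1, d_ker)
```

In this toy case the distant sample receives the smaller Sp1 value and the smaller kernel sum, which matches the stated intent of favoring samples in areas not covered by SVs.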
TABLE I
CHARACTERISTICS OF THE IMAGES USED FOR THE EXPERIMENTS

TABLE II
NUMBER OF LEARNING AND TEST SAMPLES FOR (a) LAS VEGAS AND (b) ROME DATASETS

(a)
Class                  # Learning Samples   # Test Samples
Bare soil                    4276               48 908
Commercial buildings         1831               20 938
Drainage channel             1149               13 138
Highways                     2851               32 594
Parking lots                 2269               25 939
Residential houses           7044               80 546
Roads                        6130               70 088
Short vegetation             1803               20 611
Soil                         1480               16 918
Trees                        1049               11 989
Water                         118                 1354
Total                      30 000              343 023

(b)
Class                  # Learning Samples   # Test Samples
Apartment blocks             7081              102 735
Bare soil                    5241               76 031
Buildings                  11 688              169 568
Railway                      1036               15 024
Roads                      10 545              152 992
Short vegetation             4489               65 128
Soil                          971               14 086
Tower                        3089               44 827
Trees                        5860               85 020
Total                      50 000              725 411

are recognized, including roads, trees, short vegetation, soil, bare soil, and the peculiar railway for a total of nine classes. Differently from the previous case, in this scene shadows occupy a larger portion of the image.

The ground-truths are shown in Fig. 4(b) and (d) for the Las Vegas and Rome datasets, respectively. They are obtained by careful visual inspection of separate data sources, including aerial images, cadastral maps and in situ inspections (for the Rome scene only). An additional consideration regards objects within shadows that reflect little radiance because the incident illumination is occluded. These surfaces are assigned to one of the corresponding classes of interest described above. When classifying images at submeter spatial resolution, many of the errors may occur in the boundaries between objects. On the other hand, often it is not possible to correctly identify an edge. To limit this effect, we defined the two ground-truths by not including boundary areas.

We note that for both datasets several classes have very similar spectral signatures. To differentiate them, we applied contextual filters based on mathematical morphology [40] to the original images; such filters were shown to have desirable properties when applied to urban VHR classification problems [28], [29]. In particular, four very common morphological filters are considered: opening (O), closing (C), opening by reconstruction (OR), and closing by reconstruction (CR). For each of these filters, we used a structuring element (SE) whose dimensions increased from 9 to 25 pixels with steps of 2 pixels, resulting in nine morphological features. The size of the SEs is chosen according to the image resolution. For the Las Vegas dataset, a square SE is used to consider the major directions of the objects on the image, which are 0° and 90°. For the Rome dataset, being characterized by an overall 45° angle in the disposition of the objects, a diamond-shaped SE is used instead. This shape allows a better reconstruction of the borders of the objects in the case of O and C features. The process of reconstruction for OR and CR operators is performed using a small (3-pixel diameter) SE. The entire process of morphological filtering increases the dimensionality of the datasets from four to 40 features.

B. Experimental Setup

In all the following experiments, for both datasets, all the available samples are split in two sets, corresponding to learning set U and test set. The detailed numbers of learning and test samples are shown in Table II. The initial training samples are selected randomly from the learning set U. For the first dataset, starting from 55 samples, i.e., five samples per class, the active learning algorithm is run until the number of training samples is equal to 7995, adding 20 samples at each iteration. Analogously, for the second dataset, starting from 36 samples, i.e., four samples for each class, 20 samples are added at each iteration up to 11 996 samples. The entire active learning process is run ten times, each time with a different initial training set to yield statistically reliable results. At each run, the initial training samples are chosen in a completely random way.

Classification performances are evaluated in terms of several measures: 1) the overall accuracy (OA), which is the percentage of correctly classified samples among all the considered samples, independently of the classes they belong to; 2) the Kappa statistic [41]; 3) the classification accuracies obtained for the different classes; 4) the average accuracy (AA), which is the average over the classification accuracies obtained for the different classes; and 5) the standard deviations (σ) of OA, Kappa and AA, to evaluate the stability of the active learning method.

An SVM classifier is also trained on the entire learning set to have a reference-training scenario, called full training. On one hand, the classification results obtained in this way represent an upper bound for the accuracies. On the other hand, we expect that the lower accuracy bound will be given by the completely random selection strategy (R). We recall that the purpose of any active learning strategy is to converge to the performance of the full training scenario faster than the R method. In addition, the proposed approach for spectral-spatial information integration is compared with the performances given by the MS and BT strategies based on spectral information only.

C. Experimental Results

Considering the Las Vegas dataset, the OA for the full classifier is equal to 95.47%. In Fig. 5(a)–(f) we show the results in terms of (a) and (b) OA, (c) and (d) Kappa, and (e) and (f) AA as a function of the number of training samples for the proposed active learning approach, the MS and BT strategies, and the random selection. First, it is evident how the active selection of the training samples allows a faster convergence to the full accuracy with respect to the random strategy. Comparing the different active learning strategies, we note that the integration of the spatial information is useful
(a) (b)
(c) (d)
(e) (f)
Fig. 5. Performances achieved on the Las Vegas dataset in terms of (a) and (b) OA, (c) and (d) Kappa, and (e) and (f) AA. Each graph shows the results as a function of the number of training samples, averaged over ten runs of the algorithm, each with a different initial set. Shaded areas represent the standard deviation over the ten runs. R: random. MS: margin sampling. BT: breaking ties. Sp1–Sp3: spatial criteria. Full: full SVM.
in the process of training sample collection. In particular, the strategies based on the criteria Sp1 and Sp2 converge to the full accuracy using about 5000 training samples, which represent about 17% of the entire learning set. In contrast, about 6000 and 7000 samples are necessary for the methods based on the criterion Sp3 and on the spectral information only, respectively. In addition, we note that, before convergence, the proposed strategies give an improvement with respect to the traditional MS and BT criteria. This means that similar accuracies can be obtained using a smaller number of training samples, which implies a reduction of the manual work for sample labeling and a decrease in the computational time necessary to train the classifier.
The obtained results are shown in greater detail in Table III. In particular, we considered the performances obtained after (a) 50 and (b) 100 iterations of the iterative process, which correspond to (a) 1035 and (b) 2035 samples used to train the classifier, respectively. We report the values of OA, Kappa, AA, the standard deviations associated with the accuracies (σOA, σKAPPA, and σAA), and the class accuracies. As can be seen, the proposed strategies are characterized by a better performance with respect to the MS and BT criteria from different points
TABLE III
OA, KAPPA, AA, CLASS ACCURACIES, AND STANDARD DEVIATIONS (σ) ACHIEVED ON THE LAS VEGAS DATASET AFTER (a) 50 AND (b) 100 ITERATIONS OF THE ACTIVE LEARNING PROCESS
(a)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 30 000 55 1035 1035 1035 1035 1035 1035 1035 1035 1035
OA 95.47 58.98 84.89 88.09 89.73 90.25 88.83 87.87 89.38 90.01 88.48
σOA − 5.74 0.56 0.40 0.24 0.25 0.40 0.28 0.16 0.17 0.26
Kappa 0.947 0.533 0.823 0.860 0.880 0.886 0.870 0.856 0.876 0.883 0.866
σKAPPA − 0.060 0.007 0.005 0.002 0.002 0.004 0.003 0.002 0.002 0.003
AA 93.35 59.33 79.22 83.15 84.86 85.37 83.98 82.54 84.40 85.05 83.47
σAA − 4.10 1.47 0.80 0.38 0.42 0.76 0.87 0.31 0.33 0.80
Bare soil 99.53 65.74 98.09 98.30 99.32 99.41 99.10 98.11 98.69 99.02 98.12
Commercial buildings 98.22 72.86 88.95 93.36 94.09 94.61 93.28 91.78 92.25 92.70 91.46
Drainage channel 99.43 58.61 91.82 96.42 97.14 97.78 96.04 96.05 95.44 96.78 94.39
Highways 97.22 45.50 86.68 90.77 93.77 94.23 91.85 90.81 94.36 95.42 93.36
Parking lots 86.73 52.67 63.14 66.11 68.96 69.83 67.52 66.36 70.32 70.82 69.43
Residential houses 97.76 61.27 91.25 94.11 96.41 96.83 95.69 94.49 96.00 96.53 94.97
Roads 96.26 59.98 87.99 91.00 91.64 92.37 90.67 91.10 91.45 92.19 90.57
Short vegetation 91.06 60.03 75.56 81.52 83.72 83.96 82.67 81.53 83.96 84.35 82.94
Soil 88.26 42.66 51.59 57.31 57.67 58.71 56.82 53.22 55.20 56.14 54.50
Trees 82.39 55.96 62.03 66.74 69.49 70.00 68.80 66.26 69.40 69.93 68.03
Water 90.03 77.32 74.33 78.99 81.21 81.37 81.31 78.21 81.29 81.65 80.39
(b)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 30 000 55 2035 2035 2035 2035 2035 2035 2035 2035 2035
OA 95.47 58.98 87.18 90.54 92.13 92.61 91.21 90.58 92.18 92.64 91.25
σOA − 5.74 0.42 0.89 0.19 0.19 0.35 0.32 0.14 0.14 0.22
Kappa 0.947 0.533 0.850 0.889 0.908 0.914 0.897 0.890 0.909 0.914 0.898
σKAPPA − 0.060 0.005 0.010 0.001 0.001 0.003 0.004 0.001 0.001 0.002
AA 93.35 59.33 82.00 86.55 88.39 88.93 87.46 86.58 88.53 88.97 87.50
σAA − 4.10 1.00 1.07 0.27 0.28 0.61 0.43 0.20 0.21 0.31
Bare soil 99.53 65.74 98.37 97.45 98.39 98.42 98.36 98.81 99.32 99.43 98.76
Commercial buildings 98.22 72.86 91.10 95.52 96.31 96.85 95.38 94.46 95.02 95.54 94.04
Drainage channel 99.43 58.61 95.03 97.48 98.18 98.53 97.29 98.01 97.87 98.71 96.99
Highways 97.22 45.50 90.65 93.85 96.06 96.55 94.34 93.53 96.55 96.91 95.72
Parking lots 86.73 52.67 64.83 73.68 76.77 78.04 75.03 74.31 78.53 79.10 77.25
Residential houses 97.76 61.27 93.21 95.31 97.38 97.54 96.66 95.66 97.20 97.47 96.36
Roads 96.26 59.98 89.56 92.88 93.61 94.27 92.59 92.80 93.57 94.30 92.53
Short vegetation 91.06 60.03 79.21 83.91 86.49 87.03 85.35 84.47 87.19 87.30 85.93
Soil 88.26 42.66 58.82 69.23 69.15 70.37 68.02 63.32 65.20 66.16 63.71
Trees 82.39 55.96 65.48 70.07 74.13 74.78 73.18 71.63 74.93 75.67 74.11
Water 90.03 77.32 75.70 82.70 85.81 85.85 85.88 85.34 88.49 88.10 87.11
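The three summary figures reported in Table III (OA, AA, and Kappa) can all be derived from a confusion matrix. The following is a minimal sketch, assuming a square confusion matrix with reference classes on the rows and predicted classes on the columns; the function name `accuracy_metrics` and the toy matrix are hypothetical and are not taken from the experiments. Kappa follows the usual chance-corrected agreement definition [41].

```python
import numpy as np

def accuracy_metrics(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix.

    Rows are assumed to be reference classes and columns predicted
    classes; this orientation is an assumption of the sketch.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    # Overall accuracy: fraction of correctly classified samples.
    oa = np.trace(cm) / total
    # Average accuracy: mean of the per-class accuracies.
    class_acc = np.diag(cm) / cm.sum(axis=1)
    aa = class_acc.mean()
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa

# Hypothetical two-class confusion matrix, for illustration only.
toy = [[50, 10],
       [5, 35]]
oa, aa, kappa = accuracy_metrics(toy)
print(f"OA={oa:.4f}  AA={aa:.4f}  Kappa={kappa:.4f}")
```

AA weights every class equally, so it is more sensitive than OA to poorly classified small classes, which is one reason to read the three figures together.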
of view. First, generally better accuracies (OA, Kappa, AA, and class accuracies) are obtained using the same number of training samples. Second, smaller standard deviations are associated with the accuracies. Indeed, smaller values of standard deviation mean that the proposed strategies exhibit a greater level of stability with respect to the random selection of the initial training set.
Further considerations can be drawn from Table IV. To evaluate the statistical significance of the obtained performances, we report the z value related to the two-tailed McNemar’s test. In particular, we compare the proposed strategies with the criteria (a) MS and (b) BT that do not incorporate spatial information. We recall that, considering the standard α level of 5%, a statistically significant difference is obtained if |z| > 1.96. Therefore, we note a significant superiority of the proposed methods with respect to the criteria in which spatial information is not considered.
To better understand the proposed strategies, a set of maps is shown in Fig. 6. In Fig. 6(a), we report the gray-level representation of the remote sensing image with an example of a training sample set. In particular, for this analysis we considered the initial training set that gives the value of OA closest to the mean value obtained at the first iteration, i.e., 58.98% as shown in Table III. The training samples
Fig. 6. Maps for the Las Vegas dataset in terms of (a) set of initial training samples, (b) spectral criterion based on MS and (c) relative histogram, (d) spectral criterion based on BT and (e) relative histogram, (f) spatial criterion based on the distance from the closest SV (Sp1) and (g) relative histogram, (h) combined criterion (MS+Sp1), (i) combined criterion (BT+Sp1), (j) spatial criterion based on the Parzen window method (Sp2) and (k) relative histogram, (l) combined criterion (MS+Sp2), (m) combined criterion (BT+Sp2), (n) spatial criterion based on the entropy variation (Sp3) and (o) relative histogram, (p) combined criterion (MS+Sp3), and (q) combined criterion (BT+Sp3).
are depicted with circles colored with the corresponding class colors. In Fig. 6(b), we report the map of the discriminant function value, which is given for each sample j of the image by the minimum absolute value of the discriminant function $f_{\mathrm{MIN},j}$ obtained by training the classifier on the considered set of training samples. In this map, SVs are highlighted with white circles. The correspondence between the discriminant function value and the color is shown in Fig. 6(c), in which the histogram of the map is reported. In particular, for completeness, the histogram does not refer to the considered single map only, but is obtained by averaging the results over the ten different experiment runs. It is evident that many samples are associated with very low values of the discriminant function, which are depicted in dark blue, and are therefore located in the proximity of the boundary between different classes. This map corresponds to the map associated with the MS criterion, in which the samples closer to the boundary are selected. Similarly, in Fig. 6(d) we report the map related to the BT criterion, in which we report for each sample j of the image the difference $p_{\mathrm{MAX},j}$ between the two highest
Fig. 7. Maps for the Las Vegas dataset in terms of (a) discriminant function value and (b) relative histogram for the full SVM.
TABLE IV
MCNEMAR’S TEST ACHIEVED ON THE LAS VEGAS DATASET WITH RESPECT TO (a) MS AND (b) BT CRITERIA
(a)
Method MS+Sp1 MS+Sp2 MS+Sp3 MS+Sp1 MS+Sp2 MS+Sp3
#Training samples 1035 1035 1035 2035 2035 2035
Z value 6.97 7.23 2.79 6.66 6.94 2.54
(b)
Method BT+Sp1 BT+Sp2 BT+Sp3 BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 1035 1035 1035 2035 2035 2035
Z value 6.85 7.01 2.61 6.80 6.90 2.58
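As an illustration of how a z value such as those in Table IV can be obtained, the sketch below computes the McNemar statistic for two classifiers evaluated on the same test samples. It assumes the commonly used form z = (f_ab − f_ba) / sqrt(f_ab + f_ba), where f_ab counts the samples classified correctly by the first classifier only and f_ba the samples classified correctly by the second only; this formula is not quoted from this section, and the function name `mcnemar_z` and the toy data are hypothetical.

```python
import math

def mcnemar_z(correct_a, correct_b):
    """z statistic of McNemar's test for two classifiers on the same samples.

    correct_a, correct_b: sequences of booleans, True where the respective
    classifier labels the sample correctly.
    """
    # f_ab: correct for A but wrong for B; f_ba: the reverse.
    f_ab = sum(a and not b for a, b in zip(correct_a, correct_b))
    f_ba = sum(b and not a for a, b in zip(correct_a, correct_b))
    if f_ab + f_ba == 0:
        return 0.0  # the two classifiers never disagree
    return (f_ab - f_ba) / math.sqrt(f_ab + f_ba)

# Hypothetical outcome vectors: 12 samples only A gets right,
# 3 samples only B gets right, 5 samples both get right.
correct_a = [True] * 12 + [False] * 3 + [True] * 5
correct_b = [False] * 12 + [True] * 3 + [True] * 5
z = mcnemar_z(correct_a, correct_b)
print(f"z = {z:.3f}, significant at the 5% level: {abs(z) > 1.96}")
```

Only the samples on which the two classifiers disagree enter the statistic, so two classifiers with very different accuracies on a small disagreement set can still yield a low |z|.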
TABLE V
MEAN VALUE AND STANDARD DEVIATION OF THE DISCRIMINANT FUNCTION ACHIEVED ON THE LAS VEGAS DATASET
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 R MS MS+Sp1 MS+Sp2 MS+Sp3
#Training samples 30 000 55 1035 1035 1035 1035 1035 2035 2035 2035 2035 2035
Mean value 1.03 0.46 1.02 1.23 1.07 1.07 1.16 1.02 1.11 1.05 1.05 1.08
Standard deviation 0.06 0.45 0.25 0.25 0.16 0.17 0.22 0.22 0.18 0.16 0.15 0.16
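The mean and standard deviation in Table V summarize the discriminant function map, in which each sample is described by the minimum absolute value of the discriminant function. The sketch below shows one way such statistics could be computed, assuming the signed outputs of the binary SVMs are available as an array of shape (n_samples, n_binary_classifiers). The array layout, the function name `confidence_map_stats`, and the use of the population standard deviation are assumptions of this example.

```python
import numpy as np

def confidence_map_stats(decision_values):
    """Mean and standard deviation of f_MIN over all samples.

    decision_values: array of shape (n_samples, n_binary_classifiers)
    with the signed SVM discriminant values (hypothetical layout).
    """
    f = np.asarray(decision_values, dtype=float)
    # f_MIN,j: smallest absolute discriminant value of sample j.
    f_min = np.abs(f).min(axis=1)
    # Population standard deviation (ddof=0) is assumed here.
    return f_min.mean(), f_min.std()

# Hypothetical discriminant values for three samples and three binary SVMs.
toy = np.array([[ 1.2, -0.3,  2.0],
                [-1.5,  1.1, -0.9],
                [ 0.1, -2.2,  1.7]])
mean_f, std_f = confidence_map_stats(toy)
print(f"mean={mean_f:.3f}  std={std_f:.3f}")
```

A higher mean together with a lower standard deviation corresponds to a map in which samples lie far from the decision boundaries and do so uniformly.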
Fig. 9. Classification maps achieved on the Las Vegas dataset. Top row: (a) initial, (b) full, and (c) random criterion (R). Middle row: (d) spectral criterion based on MS, (e) MS in combination with spatial criterion based on the distance from the closest SV (Sp1), (f) MS in combination with spatial criterion based on the Parzen window method (Sp2), and (g) MS in combination with spatial criterion based on the entropy variation (Sp3). Bottom row: (h) spectral criterion based on BT, (i) BT in combination with Sp1, (j) BT in combination with Sp2, and (k) BT in combination with Sp3. For maps (c)–(k), 1035 training samples are used.
the standard deviation implies that the confidence map tends to be inhomogeneous, i.e., some samples are classified with low reliability and others with high confidence. This result is not desirable, because the classification process tends to classify the samples with different levels of reliability. Instead, using the full training set, we have an increase of the mean value and a substantial decrease of the standard deviation value, which comes from a more homogeneous confidence map. This scenario tends to the ideal case, in which all the samples are classified with high reliability and the associated confidence map is approximately homogeneous. Considering the different selection strategies, we note that active learning methods are able to increase the discriminant function mean value faster than the random selection. In addition, a faster convergence to the full result is obtained using the two proposed strategies that combine the MS criterion with the Sp1 and Sp2 criteria. These two criteria also give quicker decreases of the standard deviation value, which are observed from the first iterations. Adopting the MS criterion only, an improvement with respect to the random strategy is obtained only when about 1500 samples are added to the training set. These results show that the integration of the spatial information in the active learning process allows us to obtain better performance not only in terms of accuracy, but also in terms of classification reliability, which can be estimated by analyzing the discriminant function map. The mean and standard deviation values are shown in Table V, in which the results obtained after 50 and 100 iterations of the iterative process are reported. Similar considerations can also be made for the BT strategy, which we do not report so as not to overload the paper.
To conclude the analysis on the Las Vegas dataset, we show in Fig. 9 the classification maps obtained with the different selection criteria. In particular, the results after 50 iterations of the iterative process are considered. Although the visual inspection of the images is not easy, we note a slight decrease of noisy classification patterns when the spatial criterion is integrated in the selection process.
Concerning the Rome dataset, the results confirm the observations made for the Las Vegas one. The graphs with the accuracies as a function of the number of training samples are shown in Fig. 10(a)–(f). For the full classifier the OA is equal to 88.89%. Also for this set of experiments, the proposed active learning strategies give a faster convergence to the full
Fig. 10. Performances achieved on the Rome dataset in terms of (a) and (b) OA, (c) and (d) Kappa, and (e) and (f) AA. Each graph shows the results as a function of the number of training samples, averaged over ten runs of the algorithm, each with a different initial set. Shaded areas: standard deviation over the ten runs. R: random. MS: margin sampling. BT: breaking ties. Sp1–Sp3: spatial criteria. Full: full SVM.
accuracy and better performances before convergence with respect to the random, the MS, and the BT methods. The criteria Sp1 and Sp2 allow convergence to the full accuracy using about 9000 training samples, whereas about 11 000 samples are necessary for the methods based on the criterion Sp3 and on the spectral information only. The results are shown in Table VI, in which we report the accuracies obtained after (a) 100 and (b) 200 iterations of the active learning process, which correspond to (a) 2016 and (b) 4016 training samples, respectively. Statistically significant differences evaluated by McNemar’s test are shown in Table VII. Also for this dataset, in most cases the proposed strategies exhibit a significant superiority with respect to criteria that do not exploit spatial information; the exceptions are the Sp3-based strategies at 2016 training samples, for which |z| < 1.96.
In terms of discriminant function value, the results as a function of the number of training samples up to 190 iterations of the iterative process are shown in Fig. 11. The criteria Sp1 and Sp2 confirm the best performance in terms of both increasing the mean value and decreasing the standard deviation value. In particular, the mean and standard deviation values obtained after 100 and 200 iterations are shown in Table VIII.
TABLE VI
OA, KAPPA, AA, CLASS ACCURACIES, AND STANDARD DEVIATIONS (σ) ACHIEVED ON THE ROME DATASET AFTER (a) 100 AND (b) 200 ITERATIONS OF THE ACTIVE LEARNING PROCESS
(a)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 50 000 36 2016 2016 2016 2016 2016 2016 2016 2016 2016
OA 88.89 40.49 75.91 77.77 78.95 79.73 77.80 77.57 79.26 80.21 78.16
σOA − 4.50 0.27 0.50 0.19 0.21 0.42 0.62 0.22 0.21 0.49
Kappa 0.868 0.322 0.713 0.735 0.753 0.762 0.739 0.732 0.756 0.767 0.743
σKAPPA − 0.043 0.003 0.006 0.002 0.002 0.005 0.007 0.002 0.002 0.005
AA 88.74 45.50 72.89 75.47 76.52 77.28 75.46 75.14 76.42 77.35 75.37
σAA − 2.98 0.76 0.64 0.31 0.30 0.58 0.69 0.30 0.29 0.55
Apartment blocks 82.18 15.97 59.90 63.38 65.00 65.82 63.16 61.86 60.23 61.62 58.60
Bare soil 95.82 75.86 91.93 91.73 93.94 94.05 93.48 92.28 92.00 92.74 91.17
Buildings 88.00 24.02 73.88 75.87 75.63 76.77 73.92 77.74 80.80 81.75 79.41
Railway 96.61 78.59 91.02 90.15 93.12 93.44 92.27 91.84 91.02 91.98 90.38
Roads 92.82 57.20 89.21 89.14 92.04 92.42 91.33 88.51 91.72 92.37 91.12
Short vegetation 87.70 42.72 70.42 75.51 72.64 74.28 71.56 75.53 77.30 78.87 75.94
Soil 88.45 51.36 57.99 66.53 65.68 66.39 64.67 67.06 69.67 70.11 68.48
Tower 74.18 24.07 34.11 39.02 40.32 41.44 39.09 34.66 35.75 36.49 34.99
Trees 92.89 39.73 87.57 87.89 90.34 90.94 89.67 86.78 89.28 90.27 88.25
(b)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 50 000 36 4016 4016 4016 4016 4016 4016 4016 4016 4016
OA 88.89 40.49 78.54 80.50 82.30 83.03 81.04 80.65 82.82 83.59 81.70
σOA − 4.50 0.33 0.42 0.08 0.09 0.24 0.41
Kappa 0.868 0.322 0.744 0.768 0.792 0.800 0.777 0.769 0.798 0.807 0.785
σKAPPA − 0.043 0.004 0.005 0.001 0.001 0.002 0.005
AA 88.74 45.50 76.50 78.80 80.47 81.21 79.25 78.86 80.64 81.48 79.49
σAA − 2.98 0.72 0.39 0.10 0.09 0.25 0.42
Apartment blocks 82.18 15.97 64.69 68.35 71.04 71.98 68.92 67.91 66.58 68.10 64.45
Bare soil 95.82 75.86 92.85 92.81 95.26 95.41 94.67 93.01 93.27 93.84 92.60
Buildings 88.00 24.02 75.88 79.86 79.71 80.94 77.90 80.98 84.39 85.27 82.95
Railway 96.61 78.59 92.99 91.64 94.50 94.85 93.55 92.61 92.92 93.57 92.15
Roads 92.82 57.20 89.86 88.66 92.54 92.63 91.93 89.15 92.88 93.03 92.56
Short vegetation 87.70 42.72 74.87 79.62 77.69 79.22 76.25 79.80 81.82 83.26 80.34
Soil 88.45 51.36 65.64 72.44 73.07 73.70 71.79 74.81 77.36 78.54 76.07
Tower 74.18 24.07 43.38 47.81 48.67 50.34 47.26 43.16 44.52 45.37 42.87
Trees 92.89 39.73 88.35 88.04 91.71 91.79 90.97 88.32 91.98 92.29 91.40
Fig. 11. Results achieved on the Rome dataset in terms of (a) mean value of the discriminant function and (b) standard deviation of the discriminant function. Each graph shows the results as a function of the number of training samples, averaged over ten runs of the algorithm, each with a different initial set. R: random. MS: margin sampling. Sp1–Sp3: spatial criteria. Full: full SVM.
TABLE VII
MCNEMAR’S TEST ACHIEVED ON THE ROME DATASET WITH RESPECT TO (a) MS AND (b) BT CRITERIA
(a)
Method MS+Sp1 MS+Sp2 MS+Sp3 MS+Sp1 MS+Sp2 MS+Sp3
#Training samples 2016 2016 2016 4016 4016 4016
Z value 5.89 6.42 1.62 6.54 7.49 2.32
(b)
Method BT+Sp1 BT+Sp2 BT+Sp3 BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 2016 2016 2016 4016 4016 4016
Z value 6.01 6.53 1.74 7.02 7.52 2.78
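To make the |z| > 1.96 rule concrete, the short sketch below converts a z value into a two-tailed p-value under the standard normal distribution and applies the 5% threshold to three z values copied from Table VII(a) at 2016 training samples. The helper name `two_tailed_p` is hypothetical; the normal approximation is the usual assumption behind the 1.96 threshold.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value of a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z values copied from Table VII(a), 2016 training samples.
z_values = {"MS+Sp1": 5.89, "MS+Sp2": 6.42, "MS+Sp3": 1.62}

# A difference is significant at the 5% level when |z| > 1.96.
significant = {name: abs(z) > 1.96 for name, z in z_values.items()}

for name, z in z_values.items():
    print(f"{name}: z={z:.2f}, p={two_tailed_p(z):.2e}, significant={significant[name]}")
```

With these inputs only the Sp3-based strategy falls below the threshold, which matches the statement that the proposed strategies are significantly superior in most, but not all, cases.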
TABLE VIII
MEAN VALUE AND STANDARD DEVIATION OF THE DISCRIMINANT FUNCTION ACHIEVED ON THE ROME DATASET
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 R MS MS+Sp1 MS+Sp2 MS+Sp3
#Training samples 50 000 36 2016 2016 2016 2016 2016 4016 4016 4016 4016 4016
Mean value 1.38 0.40 1.00 1.34 1.41 1.46 1.29 1.09 1.37 1.37 1.42 1.47
Standard deviation 0.33 0.59 0.42 0.44 0.35 0.35 0.38 0.40 0.38 0.33 0.32 0.35
image and, in particular, on the distance from the current SVs. For this purpose, three different criteria were proposed, based on Euclidean distance, Parzen window method, and entropy variation, respectively. The combination of spectral and spatial criteria was done by adopting nondominated sorting.
To validate the proposed solution, we conducted experiments on two VHR remote sensing images acquired by QuickBird. The obtained results showed good capabilities of the proposed approach for the selection of relevant samples. In particular, statistically significant advantages in terms of classification accuracy and classification reliability were observed with respect to strategies that do not exploit spatial information. Therefore, the integration of spatial information proved worthwhile for reducing the manual sample labeling work and decreasing the computational time necessary to train the classifier.
While in this paper we considered, for their simplicity and effectiveness, the state-of-the-art MS and BT strategies as spectral criteria, the proposed approach can in general be applied in conjunction with any other active learning criterion that exploits the samples in the spectral domain. In addition, nondominated sorting can be easily extended to combine more than two criteria. However, experimental results not reported in this paper showed that the combination of just one spectral and one spatial criterion was sufficient to obtain a good improvement of the learning process.
Finally, although active learning is reaching a good level of maturity from a methodological viewpoint, its implementation in real applications has been scarcely investigated in the literature. In particular, the labeling of hundreds of pixels by the user makes current active learning strategies hardly practical in a real user-machine interaction scenario. This represents an important direction to which future research efforts should be devoted. First attempts to address this problem were proposed recently in the remote sensing community [42], [43].
ACKNOWLEDGMENT
The authors would like to thank DigitalGlobe for providing the data used in this paper.
REFERENCES
[1] G. Camps-Valls, D. Tuia, L. Gomez-Chova, S. Jiménez, and J. Malo, Remote Sensing Image Processing. San Mateo, CA, USA: Morgan, 2011.
[2] P. Mitra, C. A. Murthy, and S. K. Pal, “A probabilistic active support vector learning algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 3, pp. 413–418, Mar. 2004.
[3] M. Li and I. K. Sethi, “Confidence-based active learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1251–1261, Aug. 2006.
[4] S.-S. Ho and H. Wechsler, “Query by transduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, pp. 1557–1571, Sep. 2008.
[5] E. Pasolli and F. Melgani, “Active learning methods for electrocardiographic signal classification,” IEEE Trans. Inf. Technol. Biomed., vol. 14, no. 6, pp. 1405–1416, Nov. 2010.
[6] B. Settles, “Active learning literature survey,” Dept. Comput. Sci., Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep. 1648, 2009.
[7] Y. Zhang, X. Liao, and L. Carin, “Detection of buried targets via active selection of labeled data: Application to sensing subsurface UXO,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 11, pp. 2535–2543, Nov. 2004.
[8] Q. Liu, X. Liao, and L. Carin, “Detection of unexploded ordnance via efficient semisupervised and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 9, pp. 2558–2567, Sep. 2008.
[9] M. Ferecatu and N. Boujemaa, “Interactive remote-sensing image retrieval using active relevance feedback,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 818–826, Apr. 2007.
[10] E. Pasolli, F. Melgani, N. Alajlan, and Y. Bazi, “Active learning methods for biophysical parameter estimation,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 4071–4084, Oct. 2012.
[11] P. Mitra, B. Uma Shankar, and S. Pal, “Segmentation of multispectral remote sensing images using active support vector machines,” Pattern Recognit. Lett., vol. 25, no. 9, pp. 1067–1074, Jul. 2004.
[12] D. Tuia, F. Ratle, F. Pacifici, M. Kanevski, and W. Emery, “Active learning methods for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2218–2232, Jul. 2009.
[13] E. Pasolli, F. Melgani, and Y. Bazi, “SVM active learning through significance space construction,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 431–435, May 2011.
[14] M. Volpi, D. Tuia, and M. Kanevski, “Memory-based cluster sampling for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3096–3106, Aug. 2012.
[15] W. Di and M. M. Crawford, “View generation for multiview maximum disagreement based active learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1942–1954, May 2012.
[16] S. Rajan, J. Ghosh, and M. Crawford, “An active learning approach to hyperspectral data classification,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 4, pp. 1231–1242, Apr. 2008.
[17] J. Li, J. Bioucas-Dias, and A. Plaza, “Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4085–4098, Nov. 2010.
[18] D. Tuia, E. Pasolli, and W. J. Emery, “Using active learning to adapt remote sensing image classifiers,” Remote Sens. Environ., vol. 115, no. 9, pp. 2232–2242, Sep. 2011.
[19] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral image segmentation using a new Bayesian approach with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3947–3960, Oct. 2011.
[20] D. Tuia, J. Muñoz-Marí, and G. Camps-Valls, “Remote sensing image segmentation by active queries,” Pattern Recognit., vol. 45, no. 6, pp. 2180–2192, Jun. 2012.
[21] J. Muñoz-Marí, D. Tuia, and G. Camps-Valls, “Semisupervised classification of remote sensing images with active queries,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 3751–3763, Oct. 2012.
[22] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Muñoz-Marí, “A survey of active learning algorithms for supervised remote sensing image classification,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 606–617, Jun. 2011.
[23] G. Schohn and D. Cohn, “Less is more: Active learning with support vector machines,” in Proc. 17th Int. Conf. Mach. Learn., 2000, pp. 839–846.
[24] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multi-class support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[25] T. Luo, K. Kramer, D. B. Goldgof, L. O. Hall, S. Samson, A. Remsen, and T. Hopkins, “Active learning to recognize multiple types of plankton,” J. Mach. Learn. Res., vol. 6, no. 4, pp. 589–613, 2005.
[26] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and G. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[27] D. Tuia, F. Ratle, A. Pozdnoukhov, and G. Camps-Valls, “Multisource composite kernels for urban-image classification,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 88–92, Jan. 2010.
[28] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.
[29] D. Tuia, F. Pacifici, M. Kanevski, and W. J. Emery, “Classification of very high spatial resolution imagery using mathematical morphology and support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 11, pp. 3866–3879, Nov. 2009.
[30] F. Pacifici, M. Chini, and W. J. Emery, “A neural network approach using multi-scale textural metrics from very high-resolution panchromatic imagery for urban land-use classification,” Remote Sens. Environ., vol. 113, no. 6, pp. 1276–1292, Jun. 2009.
[31] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, “SVM- and MRF-based method for accurate classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010.
[32] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proc. Annu. ACM Workshop Comput. Learn. Theory, Jul. 1992, pp. 144–152.
[33] V. N. Vapnik, Statistical Learning Theory. New York, NY, USA: Wiley, 1998.
[34] T. T.-F. Wu, C.-J. Lin, and R. C. Weng, “Probability estimates for multi-class classification by pairwise coupling,” J. Mach. Learn. Res., vol. 5, pp. 975–1005, Aug. 2004.
[35] A. Beygelzimer, S. Kakade, and J. Langford, “Cover trees for nearest neighbor,” in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 97–104.
[36] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, NY, USA: Wiley, 2001.
[37] N. Srinivas and K. Deb, “Multiobjective function optimization using nondominated sorting genetic algorithms,” Evol. Comput., vol. 2, no. 3, pp. 221–248, 1995.
[38] N. Ghoggali, F. Melgani, and Y. Bazi, “A multiobjective genetic SVM approach for classification problems with limited training samples,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 6, pp. 1707–1718, Jun. 2009.
[39] E. Pasolli, F. Melgani, and M. Donelli, “Automatic analysis of GPR images: A pattern-recognition approach,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2206–2217, Jul. 2009.
[40] P. Soille, Morphological Image Analysis. New York, NY, USA: Springer-Verlag, 2004.
[41] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, no. 1, pp. 37–46, 1960.
[42] D. Tuia and J. Muñoz-Marí, “Learning user’s confidence for active learning,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 872–880, Feb. 2013.
[43] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, “Optical image classification: A ground-truth design framework,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 6, pp. 3580–3597, Jun. 2013.
Edoardo Pasolli (S’08–M’13) received the M.Sc. degree in telecommunications engineering and the Ph.D. degree in information and communication technologies from the University of Trento, Trento, Italy, in 2008 and 2011, respectively.
He was a Post-Doctoral Fellow with the Intelligent Information Processing Laboratory, Department of Information Engineering and Computer Science, University of Trento, from December 2011 to September 2012. He is currently a Post-Doctoral Fellow with the Computational and Information Sciences and Technology Office, NASA Goddard Space Flight Center, Greenbelt, MD, USA. His current research interests include processing and recognition techniques applied to remote-sensing images and biomedical signals (classification, regression, and machine learning).
Farid Melgani (M’04–SM’06) received the State Engineer degree in electronics from the University of Batna, Batna, Algeria, in 1994, the M.Sc. degree in electrical engineering from the University of Baghdad, Baghdad, Iraq, in 1999, and the Ph.D. degree in electronic and computer engineering from the University of Genoa, Genoa, Italy, in 2003.
He cooperated with the Signal Processing and Telecommunications Group, Department of Biophysical and Electronic Engineering, University of Genoa, from 1999 to 2002. Since 2002, he has been an Assistant Professor and then an Associate Professor of telecommunications with the University of Trento, Trento, Italy, where he has taught pattern recognition, machine learning, radar remote-sensing systems, and digital transmission. He is the Head of the Signal Processing and Recognition Laboratory, Department of Information Engineering and Computer Science, University of Trento. His current research interests include processing, pattern recognition and machine learning techniques applied to remote sensing and biomedical signals/images (classification, regression, multitemporal analysis, and data fusion). He has co-authored more than 130 scientific publications and is a referee for numerous international journals.
Dr. Melgani has served on the scientific committees of several international conferences and is an Associate Editor of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS.
Devis Tuia (S’07–M’09) was born in Mendrisio, Switzerland, in 1980. He received the Diploma in geography from the University of Lausanne (UNIL), Lausanne, Switzerland, in 2004, the Master of Advanced Studies in environmental engineering from the Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, in 2005, and the Ph.D. degree in environmental sciences at UNIL in 2009.
He was a Post-Doctoral Researcher with the University of València, València, Spain, and the University of Colorado at Boulder, CO, USA, under a Swiss National Foundation program. He is currently a Senior Research Associate with the LaSIG laboratory, EPFL. His current research interests include the development of algorithms for information extraction and classification of very high resolution remote sensing images using machine learning algorithms.