SVM Active Learning Approach For Image Classification Using Spatial Information


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/259235007

SVM Active Learning Approach for Image Classification Using Spatial Information

Article in IEEE Transactions on Geoscience and Remote Sensing · April 2014

DOI: 10.1109/TGRS.2013.2258676


5 authors, including:

Edoardo Pasolli, University of Naples Federico II
Farid Melgani, Università degli Studi di Trento
Devis Tuia, École Polytechnique Fédérale de Lausanne
Fabio Pacifici, DigitalGlobe


All content following this page was uploaded by Fabio Pacifici on 27 January 2014.



Copyright © 2014 IEEE

Reprinted from:

Pasolli, E.; Melgani, F.; Tuia, D.; Pacifici, F.; Emery, W.J., "SVM Active Learning Approach for Image Classification Using Spatial Information," Geoscience and Remote Sensing, IEEE Transactions on, vol. 52, no. 4, pp. 2217-2233, April 2014
doi: 10.1109/TGRS.2013.2258676
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6531640&isnumber=6695798

This material is posted here with permission of the IEEE.


Such permission of the IEEE does not in any way imply IEEE
endorsement of any products or services.
Internal or personal use of this material is permitted.
However, permission to reprint/republish this material for advertising or
promotional purposes or for creating new collective works for resale or
redistribution must be obtained from the IEEE by writing to:

[email protected]

By choosing to view this document, you agree to all. 


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 52, NO. 4, APRIL 2014 2217

SVM Active Learning Approach for Image Classification Using Spatial Information
Edoardo Pasolli, Member, IEEE, Farid Melgani, Senior Member, IEEE, Devis Tuia, Member, IEEE,
Fabio Pacifici, Senior Member, IEEE, and William J. Emery, Fellow, IEEE

Abstract: In the last few years, active learning has been gaining growing interest in the remote sensing community in optimizing the process of training sample collection for supervised image classification. Current strategies formulate the active learning problem in the spectral domain only. However, remote sensing images are intrinsically defined both in the spectral and spatial domains. In this paper, we explore this fact by proposing a new active learning approach for support vector machine classification. In particular, we suggest combining spectral and spatial information directly in the iterative process of sample selection. For this purpose, three criteria are proposed to favor the selection of samples distant from the samples already composing the current training set. In the first strategy, the Euclidean distances in the spatial domain from the training samples are explicitly computed, whereas the second one is based on the Parzen window method in the spatial domain. Finally, the last criterion involves the concept of spatial entropy. Experiments on two very high resolution images show the effectiveness of regularization in the spatial domain for active learning purposes.

Index Terms: Active learning, image classification, spatial information, support vector machine (SVM), very high resolution (VHR).

Manuscript received December 18, 2012; revised March 13, 2013; accepted April 8, 2013. Date of publication June 13, 2013; date of current version December 24, 2013.
E. Pasolli is with the Computational and Information Sciences and Technology Office, NASA Goddard Space Flight Center, Greenbelt, MD 20771 USA (e-mail: [email protected]).
F. Melgani is with the Department of Information Engineering and Computer Science, University of Trento, Trento 38123, Italy.
D. Tuia is with the Laboratoire des Systèmes d’Information Géographique, Ecole Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland.
F. Pacifici is with DigitalGlobe, Inc., Longmont, CO 80503 USA.
W. J. Emery is with the Department of Aerospace Engineering, University of Colorado, Boulder, CO 80309 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TGRS.2013.2258676
0196-2892 © 2013 IEEE

I. INTRODUCTION

One of the most important tasks in the remote sensing field is represented by image classification and segmentation, in which the purpose is to identify objects in an image captured by an acquisition system. Because of the technological developments of acquisition systems, in the last few years there has been a drastic increase of imaging systems characterized by growing spectral, spatial and temporal resolutions that can be exploited to improve the classification process. In addition to sensor developments, appropriate processing techniques have to be implemented to automatically analyze such data for accurate image classification [1].

From a methodological point of view, two main approaches for image classification are proposed in the literature: unsupervised and supervised. Unsupervised methods investigate data statistics to subdivide the image into clusters of pixels (samples) with similar characteristics. They do not require labeled information provided by the user, but this implies a lack of correspondence between the clusters found and the classes desired by the user. To overcome this problem, supervised techniques are characterized by an explicit link between samples and classes. They have shown very promising accuracies in terms of image classification, but they depend strongly on the labeled training samples that are used to construct the model of classification. Training samples have to be representative of the statistical distribution of the data. However, the process of training sample collection is not obvious because it is performed manually by the user and thus it is characterized by errors and costs. This acquisition process can be performed only on a limited portion of the available data given the constraints in terms of time and money. For this reason, in the last few years there has been a growing interest in developing strategies for the semiautomatic selection of the training samples. In the machine learning field, an interesting solution to address this problem is represented by the active learning approach. Considering a small and suboptimal initial training set, a few additional samples are selected from a large amount of unlabeled samples (learning set) through an iterative process. The aim of active learning is to rank the learning set according to a criterion that allows us to select the most useful samples to improve the model, thus minimizing the number of training samples necessary to maintain discrimination capabilities as high as possible. In the last few years, different solutions were proposed and applied successfully in different research areas [2]–[5] (see [6] for a survey on active learning methods) and in different remote sensing application fields, such as detection of buried objects [7], [8], image retrieval [9], estimation of biophysical parameters [10], and classification of multispectral and hyperspectral images [11]–[21].

Focusing on image classification, the strategies proposed in the literature can be divided into four main categories. The first category is specific for margin-based classification approaches, such as support vector machine (SVM). In this context, margin sampling (MS) represents a good base method, in which the samples closest to the separating hyperplane are selected [11], [23]. In [12], the diversity of the selected samples is enforced by constraining the MS criterion to samples associated with different closest support vectors (SVs). In this way, a certain degree of spectral diversification is guaranteed, by dividing the margin in the feature space as a function of the geometrical distribution of the SVs. In [13], instead of using the distance to the hyperplane as selection measure, the original classification problem is reformulated into a new binary problem where it is needed to discriminate between significant and nonsignificant samples, according to a concept of significance proper to the SVM theory based on the SV

coefficients. In [14], clustering in the feature space induced by the kernel function of the SVM is proposed to prevent redundancy when selecting samples. The second family of methods quantifies the uncertainty of a sample by considering a committee of classifiers. After training each classifier by exploiting different hypotheses about the classification problem, the samples showing maximal disagreement between the different classification models are selected. Committees can be realized by varying the samples members [12] or by considering subsets of the feature space [15]. The third category uses the estimation of posterior probabilities of class membership to select samples. In [16], using a maximum-likelihood classifier, the sample whose inclusion in the training set maximizes the changes in the posterior distribution is selected. In [17], samples are selected based on the entropy of the corresponding class label. Another strategy is represented by the breaking ties (BT) criterion [18], [25], in which the difference between the two highest posterior probabilities is considered. A modified version of the BT algorithm is proposed in [19] to have unbiased sampling among classes. The last family, the cluster-based one, aims at pruning a hierarchical clustering tree until the resulting clusters are consistent with the labels provided by the user [20], [21]. Therefore, it relies on an unsupervised model, rather than on a predictive model. We refer the reader to [22] for a complete survey on active learning strategies for remote sensing image classification.

It is important to note that the active learning methods proposed in the remote sensing literature for image classification exploit only the spectral information of the involved samples. However, remote sensing images are intrinsically defined in both the spectral and spatial domains. Additionally, it has recently been shown that the common assumption that data are homogeneous throughout the image, i.e., class statistics remain constant over the image, is unrealistic [18]. This means that a shift occurs between the distributions of the training samples and the samples to classify, especially when the training set only covers small regions of the image. This leads to an incompatibility of the model optimized on the training set when used to estimate the unlabeled samples. For this reason, in this paper we propose to include explicitly the spatial information in the active learning process to select samples in regions where the classifier could be suboptimal because of a potential dataset shift. This is done by favoring the selection of samples in areas of the image poorly covered by training samples.

We observe that the use of spatial information in conjunction with the spectral one is investigated in different works in the literature to solve problems in different remote sensing contexts. For example, the improvement of the classification of hyperspectral or very high resolution (VHR) images is obtained by adopting different approaches, such as composite kernels [26], [27], morphological operators [28], [29], textural metrics [30], and Markov random field regularization [31]. In this paper, the integration of the spatial information in the active learning approach is done using a completely different approach based on the spatial position of the samples in the image under analysis.

The objective of this paper is to propose a new active learning approach for classification of remote sensing images in which spatial and spectral information are combined to improve the process of training sample collection. In particular, we suggest: 1) to evaluate the unlabeled samples of the learning set using two different criteria, the first one spectral-based and the second one spatial-based, and 2) to combine the two criteria through the concept of nondominated sorting. The approach is proposed specifically for classification problems based on SVM. While in terms of spectral criterion we use, for their simplicity and effectiveness, the MS and BT strategies, the spatial information is incorporated by three different criteria that we propose in this paper. The criteria are based on the hypothesis that uncertain samples that are spatially far from current SVs are relevant both in terms of observed spectra and of potential dataset shift. In the first strategy, the Euclidean distances in the spatial domain from the training samples are explicitly computed, whereas the second one is based on the Parzen window method in the spatial domain. Finally, the last criterion involves the concept of spatial entropy. To investigate the performance of the proposed approach and to compare the three spatial criteria, we conducted an experimental study based on two VHR images acquired by QuickBird. Statistically significant advantages in terms of classification accuracy and classification reliability are evaluated with respect to strategies that do not exploit spatial information.

The rest of the paper is organized as follows. In Section II, the proposed approach for the integration of spatial and spectral information and the three related spatial criteria are described. Section III presents the datasets used in the experimental analysis and the corresponding results. Finally, conclusions are drawn in Section IV.

II. PROPOSED METHOD

A. Proposed Active Learning Framework

Let us consider a training set composed initially of $n$ labeled samples $L = \{x_i, y_i\}_{i=1}^{n}$, where $x_i$ represents the features of interest (spectral bands and/or contextual features extracted from them) and $y_i$ is a discrete label defined among $T$ possible classes. We also consider a learning set composed of $m$ unlabeled samples $U = \{x_j\}_{j=n+1}^{n+m}$, with $m \gg n$, usually the rest of the image $I$ (i.e., $U = I - L$). To increase the training set $L$ with a series of samples chosen from the learning set $U$ to be labeled manually by the expert, an active learning algorithm has the task of choosing them properly so as to maximize the accuracy of the classification process while minimizing the number of learning samples to label (i.e., the number of interactions with the expert).

In Fig. 1, we show the flowchart of the active learning approach proposed in this paper, in which we propose to combine spectral and spatial information in the active selection process of the training samples. The method is proposed specifically for classification problems based on state-of-the-art SVM. We refer the reader to [32] and [33] for more details about SVM. Starting from the small and suboptimal training set $L$, a multiclass SVM classifier is trained. The resulting classification model is used to evaluate the unlabeled samples of the learning set $U$. In particular, each sample is evaluated by two different criteria $f_1$ and $f_2$, which incorporate spectral and spatial information, respectively. The spectral criterion $f_1$, as done by active learning strategies proposed in the literature, considers sample uncertainty in the feature domain, represented by the spectral bands and/or the contextual features extracted from them. Additionally, the spatial criterion $f_2$ that we propose in this paper is based on the spatial position of the sample in the considered image and in particular on the distance from the current SVs. At this point, the two criteria are combined by sorting the samples using the nondominated

[Figure 1: flowchart. Block labels: L: Training set; SVM training; SVs: Support vectors; U: Learning set; Spectral selection criterion; Spatial selection criterion; Nondominated sorting; Us: Sorted samples; Selection; U*s: Selected samples; Labeling (human expert); L*s: Labeled samples; Insertion in training set.]

Fig. 1. Flowchart of the active learning method proposed for integration of spatial and spectral information.

sorting algorithm, which we will describe in more detail afterward. In this way the combined criterion represents a tradeoff between spectral and spatial information, which detects uncertain samples that are spatially far from current SVs and are therefore relevant both in terms of observed spectra and potential dataset shift. Finally, from the sorted samples $U_s$, $N_s$ samples are selected from the learning set $U$, where $N_s$ is the number of samples to be added to the training set $L$. Subsequently, the selected samples $U_s^*$ are labeled by the human expert and added to the training set $L$. The entire process is iterated until the predefined convergence condition is satisfied, e.g., the total number of samples to add to the training set has been reached. Algorithm 1 summarizes the proposed active learning strategy.

Algorithm 1: Proposed Active Learning Framework
Inputs:
  $L$: initial training set, composed of $n$ labeled samples.
  $U$: learning set, composed of $m$ ($m \gg n$) unlabeled samples.
  $N_s$: number of samples to add at every iteration of the active learning process.
Output:
  $L$: final training set.
Repeat
  1. Train the SVM classifier with the current training set $L$, while estimating its free parameters by crossvalidation.
  2. Compute the criterion $f_1$ based on spectral information for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set $U$.
  3. Analogously to the previous step, compute the criterion $f_2$ based on spatial information for each sample of the learning set $U$.
  4. Sort the samples of the learning set $U$ as a function of the spectral $f_1$ and spatial $f_2$ criteria by the nondominated sorting algorithm. The samples are now ranked in the set $U_s$.
  5. Select the first $N_s$ samples from $U_s$.
  6. Label the selected samples $U_s^*$.
  7. Add the labeled samples $L_s^*$ to the training set $L$ and remove them from $U$.
Until the predefined convergence condition is satisfied (e.g., the total number of samples to add to the training set has been reached).

In the following, we describe in detail the main ingredients of the proposed approach, namely spectral and spatial criteria and nondominated sorting.

B. Spectral Selection Criteria

The objective of this paper is not to propose new selection methods based on spectral information, but to consider some criteria already introduced in the literature. In particular, we adopt two state-of-the-art strategies, namely MS and BT, which have shown their simplicity and effectiveness in the remote sensing field [22].

1) Margin Sampling: The first spectral criterion is represented by the MS strategy [23], which is proposed specifically for classification problems based on SVM. Considering a simple binary case with linearly separable classes, SVs are the samples of the training set $L$ closest to the hyperplane that describes the decision boundary. If we consider the learning set $U$, we can assume that the samples closest to the decision boundary are the most interesting samples because they have a larger probability to become SVs when added to the training set. Therefore, the samples selected by MS are the ones showing the minimum absolute values of the discriminant function. The same reasoning is applied in case of nonlinearly separable classes.

SVMs are intrinsically binary classifiers. However, the classification of remote sensing images often involves the simultaneous discrimination of numerous information classes. For this purpose, a number of multiclass classification strategies are proposed in the literature. The most popular ones are the one-against-all (OAA) and the one-against-one (OAO) strategies. Similarly, the MS method introduced for the binary case in the previous paragraph can be extended to the multiclass classification problem. Adopting an OAA SVM classifier, the solution proposed in [23] is applicable in the multiclass context. For each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$), the minimum value among the discriminant functions provided by the $T$ binary classifiers is exploited as a sample indicator, where $T$ is the number of different classes. Then, the samples with the minimum indicator values are selected, so as to select the samples closest to the decision boundary. In this work, we adopt OAO classification, which has been shown to be more suitable for practical use than OAA [24], and consequently the MS strategy has to be adapted to this multiclass classification strategy. In this context, $T \cdot (T-1)/2$ different binary classifiers are used. However, considering the minimum value among

the entire set of involved classifiers is not appropriate as it could be associated with a pair of classes that is not relevant for the classification problem. To solve this problem, first for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) the number of votes of each class $v_j \in \mathbb{N}^T = [v_{j,1}, v_{j,2}, \ldots, v_{j,T}]$ is calculated and the class $\omega_{\mathrm{MAX},j}$ with the largest number of votes $v_{\mathrm{MAX},j}$ is identified. In this way, the winner class is identified. Then, considering the $T-1$ classifiers associated with the class $\omega_{\mathrm{MAX},j}$, the minimum absolute value of the discriminant function $f_{\mathrm{MIN},j}$ is calculated. Finally, the samples characterized by the minimum values of $f_{\mathrm{MIN},j}$ are selected.

Algorithm 2 summarizes the spectral criterion based on the MS strategy.

Algorithm 2: Spectral Selection Criterion (Margin Sampling)
  1. Compute the number of votes of each class $v_j \in \mathbb{N}^T = [v_{j,1}, v_{j,2}, \ldots, v_{j,T}]$ for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set $U$.
  2. Identify the class $\omega_{\mathrm{MAX},j}$ with the largest number of votes $v_{\mathrm{MAX},j}$.
  3. Calculate the minimum absolute value of the discriminant function $f_{\mathrm{MIN},j}$ by considering the $T-1$ classifiers associated with the class $\omega_{\mathrm{MAX},j}$.
  4. Set $f_1(j) = f_{\mathrm{MIN},j}$.

2) Breaking Ties: The second criterion is given by the BT approach [25], which is based on the posterior probabilities of associating a sample to a given class. Considering a binary case, we can expect that the most interesting samples to select from the learning set $U$ are those characterized by posterior probabilities close to 0.5 for both classes, since their decision uncertainties are maximum. Equivalently, this can be viewed as selecting the samples for which the differences between the two posterior probabilities are minimum, as suggested by the BT formulation. In a multiclass context, this selection strategy remains valid, independently of the number of classes $T$, as the difference between the two highest probabilities is indicative of the way a sample is handled by the classifier. When the two highest probabilities are close, the classifier confidence is low.

SVMs are not a probabilistic approach and thus they do not directly output probabilistic quantities. However, in the literature, some solutions are proposed to infer posterior probability estimates from discriminant function values provided by SVMs. In this paper, we adopt Platt's estimation [34].

Algorithm 3 summarizes the spectral criterion based on the BT strategy.

Algorithm 3: Spectral Selection Criterion (BT)
  1. Compute the posterior probability of each class $p_j \in \mathbb{R}^T = [p_{j,1}, p_{j,2}, \ldots, p_{j,T}]$ for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set $U$ using Platt's estimation.
  2. Identify the two classes $\omega_{\mathrm{MAX},j}$, $\omega_{\mathrm{MAX2},j}$ with the two highest posterior probabilities.
  3. Calculate the difference $p_{\mathrm{MAX},j}$ between the two posterior probabilities associated with the classes $\omega_{\mathrm{MAX},j}$ and $\omega_{\mathrm{MAX2},j}$.
  4. Set $f_1(j) = p_{\mathrm{MAX},j}$.

C. Spatial Selection Criteria

In this paper, three different criteria are proposed to integrate spatial information in the process of training sample collection. As introduced previously, such criteria consider the spatial position of the samples in the considered image and in particular the distances from the subset of SVs identified by the classification model on the training set $L$. In the following, we define $S_n$ as the number of SVs identified.

1) Spatial Distance From the Closest SV (Sp1): The first criterion, named Sp1 in the rest of the paper, consists of calculating for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) the spatial Euclidean distances from the SVs $d_j \in \mathbb{R}^{S_n} = [d_{j,1}, d_{j,2}, \ldots, d_{j,S_n}]$

$$d_{j,i} = \lVert \mathbf{p}_j - \mathbf{p}_i \rVert \tag{1}$$

where $\mathbf{p}$ is the 2-D vector containing the position of the considered sample in the spatial domain of the image. After that, the nearest SV $s_{\mathrm{MIN},j}$ is identified and the corresponding distance $d_{\mathrm{MIN},j}$ is considered for the spatial criterion. In particular, the negative value is adopted to convert the maximization problem into a minimization one. In this way, we favor the selection of samples placed in areas of the image not covered by SVs and in which a potential dataset shift could occur. To avoid an exhaustive search of the nearest SV for each sample of the learning set, we note that appropriate techniques such as the cover tree data structure [35] can be adopted. Algorithm 4 summarizes the proposed spatial criterion based on the distance from the closest SV.

Algorithm 4: Spatial Selection Criterion (Spatial Distance from the Closest SV)
  1. Compute the spatial Euclidean distances from the $S_n$ different SVs $d_j \in \mathbb{R}^{S_n} = [d_{j,1}, d_{j,2}, \ldots, d_{j,S_n}]$ for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set $U$.
  2. Identify the SV $s_{\mathrm{MIN},j}$ nearest to the sample.
  3. Consider the distance $d_{\mathrm{MIN},j}$ associated with the SV $s_{\mathrm{MIN},j}$.
  4. Set $f_2(j) = -d_{\mathrm{MIN},j}$.

2) Parzen Window Method in the Spatial Domain (Sp2): In the strategy Sp1 introduced in the previous subsection, after calculating for each sample of the learning set the distances from SVs, the selection criterion considers only the value associated with the nearest SV. Here, we propose an alternative strategy (Sp2), in which all distance values are suitably combined. For this purpose, we recall the Parzen window method, which represents a standard way to estimate the probability density function of a random variable [36]. In particular, given some samples drawn from a distribution with an unknown probability density function, this one is estimated by weighting the samples with a kernel function defined a priori. In a similar way, in the Sp2 strategy that we propose, the distance function is estimated in the spatial domain by combining distances with respect to SVs using a kernel function. The spatial criterion $d_{\mathrm{KER},j}$ for the sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) is given by the following formulation:

$$d_{\mathrm{KER},j} = \sum_{i=1}^{S_n} K(\mathbf{p}_j, \mathbf{p}_i) \tag{2}$$

where $K(\cdot)$ is the kernel function that has to be fixed a priori. In this paper, we adopt the common Gaussian kernel function, which allows us to put more weight on the SVs spatially close to the considered sample

$$K(\mathbf{p}_j, \mathbf{p}_i) = \exp\left(-\frac{\lVert \mathbf{p}_j - \mathbf{p}_i \rVert^2}{\lambda^2}\right). \tag{3}$$
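The two spatial criteria defined in (1) to (3) can be illustrated with a minimal sketch. It uses only the Python standard library; the function names (`spatial_distances`, `sp1`, `sp2`), the toy coordinates, and the fixed kernel width `lam` are assumptions made for illustration and do not come from the paper. Both functions return a value to be minimized, so a candidate far from every SV scores lower than one close to an SV.

```python
import math

def spatial_distances(p, sv_positions):
    """Eq. (1): Euclidean distances from pixel position p to every SV position."""
    return [math.dist(p, s) for s in sv_positions]

def sp1(p, sv_positions):
    """Sp1 criterion: f2 = -d_MIN, the negated distance to the nearest SV."""
    return -min(spatial_distances(p, sv_positions))

def sp2(p, sv_positions, lam):
    """Sp2 criterion: f2 = sum over SVs of exp(-d^2 / lam^2), Eqs. (2)-(3)."""
    return sum(math.exp(-(d * d) / (lam * lam))
               for d in spatial_distances(p, sv_positions))

# Toy data (hypothetical): two SVs and two candidate pixels, as (row, col).
sv_positions = [(0.0, 0.0), (3.0, 4.0)]
near, far = (0.0, 1.0), (10.0, 10.0)
lam = 5.0  # kernel width, fixed by hand here purely for the example

for name, p in (("near", near), ("far", far)):
    print(name, sp1(p, sv_positions), sp2(p, sv_positions, lam))
```

The exhaustive distance computation is quadratic in the number of samples and SVs; for a full image a spatial index would be the natural replacement, as the text suggests for the nearest-SV search.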

Note that the kernel function is not applied in the feature domain but in the spatial one. The parameter $\lambda$ is related to the width of the kernel and has to be estimated. In this paper, we suggest to set it automatically as follows:

$$\lambda = \frac{1}{m} \sum_{j=n+1}^{n+m} d_{\mathrm{MIN},j}. \tag{4}$$

In this way, the parameter $\lambda$ is adaptive and is modified throughout the iterations according to the distances observed. One can reasonably expect that it tends to become smaller, as through the iterations the SVs tend to better cover the entire image and therefore the distance values $d_{\mathrm{MIN},j}$ tend to decrease. An example is shown in Fig. 2, in which the evolution of the parameter $\lambda$ as a function of the number of training samples is shown for the Las Vegas dataset (additional details on this dataset will be given in Section III).

Fig. 2. Evolution of the parameter λ defined in (4) achieved on the Las Vegas dataset.

Algorithm 5 summarizes the proposed spatial criterion based on the Parzen window method in the spatial domain.

Algorithm 5: Spatial Selection Criterion (Parzen Window Method in the Spatial Domain)
  1. Compute the spatial Euclidean distances from the $S_n$ different SVs $d_j \in \mathbb{R}^{S_n} = [d_{j,1}, d_{j,2}, \ldots, d_{j,S_n}]$ for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set $U$.
  2. Identify the SV $s_{\mathrm{MIN},j}$ nearest to the sample.
  3. Consider the distance $d_{\mathrm{MIN},j}$ associated with the SV $s_{\mathrm{MIN},j}$.
  4. Compute the parameter $\lambda$ using (4).
  5. Compute the kernel distance $d_{\mathrm{KER},j}$ using (2).
  6. Set $f_2(j) = d_{\mathrm{KER},j}$.

3) Spatial Entropy Variation (Sp3): The last strategy (Sp3) proposed involves the concept of spatial entropy. In information theory, for a discrete random variable $Z$ with possible values $\{z_1, z_2, \ldots, z_v\}$, the entropy $H(Z)$ can be written as follows:

$$H(Z) = -\sum_{k=1}^{v} p(z_k) \log_b p(z_k) \tag{5}$$

where $p(\cdot)$ is the probability function of $Z$ and $b$ is the base of the logarithm used. The entropy value is maximized when $p(\cdot)$ assumes a uniform distribution. In our active learning problem, to obtain a statistically significant spatial distribution, we subdivide (quantize) the entire image in $h$ different regions. In particular, we consider a value of $h$ equal to

$$h = \left\lfloor \sqrt{S_n} \right\rfloor^2 \tag{6}$$

where $\lfloor \cdot \rfloor$ is the floor operator, and subdivide the image in both the horizontal and vertical directions into $\sqrt{h}$ uniform intervals to obtain $h$ rectangles of equal area. At this point, the probability value for each region is computed as the number of SVs present in that region divided by the total number of SVs $S_n$. First, the entropy value $H_L$ is calculated by considering the SVs associated with the current training set $L$ only. Then, for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) we calculate the corresponding entropy value $H_j$ by adding it to the set of SVs. Therefore, we derive the spatial entropy variation $H_{V,j}$, which is defined as the difference between $H_j$ and $H_L$, i.e., the values of entropy with and without the insertion of the sample in the set of SVs. The variation of entropy is maximum for those samples that are placed in the regions of the image that are covered by the minimum number of SVs. For this reason, the objective is to maximize the spatial entropy variation value so as to spread the training samples spatially as much as possible.

Algorithm 6 summarizes the proposed spatial criterion based on the spatial entropy variation.

Algorithm 6: Spatial Selection Criterion (Spatial Entropy Variation)
  1. Compute the parameter $h$ using (6) and subdivide the image in $h$ rectangular regions of equal area.
  2. Compute the spatial entropy value $H_L$ by considering the training set $L$ using (5).
  3. Compute the spatial entropy $H_j$ for each sample $x_j$ ($j = n+1, n+2, \ldots, n+m$) of the learning set $U$.
  4. Compute the spatial entropy variation $H_{V,j} = H_j - H_L$.
  5. Set $f_2(j) = -H_{V,j}$.

D. Nondominated Sorting

At this point, the samples of the learning set $U$ have to be sorted as a function of the spectral $f_1$ and spatial $f_2$ criteria introduced in the previous sections. This operation is not trivial since multiple (i.e., two in our case) measures of competing objectives (criteria) have to be simultaneously considered. This can be viewed as a multiobjective optimization problem.

In general terms, from a mathematical point of view, multiobjective optimization consists of finding the solution that optimizes the ensemble of $Q$ objective functions

$$\mathbf{f}(\mathbf{p}) = [f_i(\mathbf{p}),\ i = 1, 2, \ldots, Q] \tag{7}$$

where $\mathbf{p}$ is a solution to the considered optimization problem. In our case, this consists of finding the sample that minimizes the two (i.e., $Q = 2$) criteria $f_1$ and $f_2$. This problem can be addressed in two main ways. The first consists of summing the different criteria to obtain a single objective function. This is done usually by a linear weighted combination, in which weights have to be fixed a priori by the user. An alternative strategy resides in the concept of nondominated sorting. It is introduced in the literature in the context of multiobjective optimization using genetic algorithms [37] and is applied successfully to different remote sensing problems [38], [39]. It is based on the concept of dominance, which states that a solution $\mathbf{p}_i$ is said to dominate another solution $\mathbf{p}_j$ if and only if

$$\forall k \in \{1, 2, \ldots, Q\},\ f_k(\mathbf{p}_i) \le f_k(\mathbf{p}_j) \ \wedge\ \exists k \in \{1, 2, \ldots, Q\} : f_k(\mathbf{p}_i) < f_k(\mathbf{p}_j). \tag{8}$$

This concept leads to the definition of Pareto optimality: a solution $\mathbf{p}_i^* \in \Omega$ ($\Omega$ is the solution space) is said to be Pareto
Fig. 3. Illustration of nondominated sorting.
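
As a rough illustration of the dominance test in (8) and of the median selection on the Pareto front depicted in Fig. 3, a minimal Python sketch is given here. The function names are hypothetical, both criteria are assumed to be minimized, and two choices are assumptions of this sketch rather than details taken from the paper: the front is ordered along $f_1$ to locate its median, and "closest to the median" is measured by rank along the front.

```python
def dominates(a, b):
    """True if criteria tuple a dominates b (all criteria minimized), as in (8)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Indices of the nondominated points among a list of (f1, f2) tuples."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


def select_from_front(points, n_select=1):
    """Pick the median of the front, then its nearest neighbors by rank.

    Assumption: the front is ordered along f1 and closeness to the median
    is measured by rank along the front, not by a metric distance.
    """
    front = sorted(pareto_front(points), key=lambda i: points[i][0])
    median_pos = (len(front) - 1) // 2
    by_closeness = sorted(range(len(front)), key=lambda k: abs(k - median_pos))
    return [front[k] for k in by_closeness[:n_select]]


# Toy (f1, f2) values for six candidate samples.
criteria = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5), (2.5, 2)]
front = pareto_front(criteria)
selected = select_from_front(criteria)
```

Other ways of picking a single solution from the front (for example, the point closest to the ideal corner) would fit the same skeleton by replacing `select_from_front`.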

TABLE I
CHARACTERISTICS OF THE IMAGES USED FOR THE EXPERIMENTS

Site Information                            Image Information
Location              Dimension [pixels]    Satellite    Acquisition Date    Spatial Resolution [m]
Las Vegas (Nevada)    756 × 723             QuickBird    May 10, 2002        0.6
Rome (Italy)          1188 × 973            QuickBird    July 19, 2004       0.6

Fig. 4. Datasets used for the experiments. False color image for (a) the Las Vegas and (c) the Rome datasets. Ground truth for (b) the Las Vegas and (d) the Rome datasets.

optimal if and only if there exists no other solution $p_j$ in the solution space that dominates $p_i^*$. The latter is said to be nondominated, and the set of all nondominated solutions forms the Pareto front of optimal solutions.

Once the Pareto front is identified, a single solution has to be selected from the set of nondominated solutions. Although, in general, different strategies can be adopted, in this paper we used the simple median solution to maintain a tradeoff between the two different criteria. If $N_s$ different solutions have to be extracted simultaneously, the $N_s$ solutions of the Pareto front closest to the median one are considered.

An example of nondominated sorting is shown in Fig. 3, in which solutions (samples) are represented as a function of the spectral $f_1$ and spatial $f_2$ criteria. The nondominated samples (in red) constitute the Pareto front, which represents the set of optimal solutions. From this set, the selected sample is simply the median one (in green). Dominated solutions are drawn with black crosses.

III. EXPERIMENTS

A. Dataset Description

To validate the proposed active learning approach, experiments are conducted on two multispectral pan-sharpened VHR remote sensing images acquired by QuickBird with 0.6-m resolution. The datasets reflect different types of urban settlements at different levels of complexity. Details of sites and images are shown in Table I.

The first dataset was acquired in 2002 and refers to a portion of the city of Las Vegas (Nevada). The scene, shown in Fig. 4(a), contains regular criss-crossed roads and examples of buildings with similar heights (about one or two floors) but different dimensions, from small residential houses to large commercial structures. It represents a common American suburban landscape, including small houses and large roads, which is different from the European style of old cities built with more complex structures.

To consider this second situation, a second test area, acquired in 2004 and shown in Fig. 4(c), is used. It includes a suburban scene of Rome (Italy) composed of a more complex urban structure with buildings showing a variety of heights (from four to 12 floors), dimensions, and shapes, including apartment blocks and towers. In particular, the Rome image has two completely different urban architectures separated by a railway. The area located in the upper right part of the scene was built during the 1960s; buildings are very close to each other and have a maximum of five floors, while roads are narrow. The other side of the railway was developed during the 1980s and 1990s; buildings have a variety of architectures, from apartment blocks (eight floors) to towers (12 floors), while roads are wider than those on the other side of the railroad tracks.

Several different surfaces of interest are identified, many of which are particular to the specific scene. For the Las Vegas dataset, one goal was to distinguish the different uses of the asphalt surfaces, which included roads (i.e., roads that link different residential houses), highways (i.e., roads with more than two lanes), and parking lots. An unusual structure within the scene was a drainage channel located in the upper part of the image. A further discrimination is made between residential houses and commercial buildings, and between bare soil (terrain with no use) and soil (generally, backyards with no vegetation cover). Finally, more traditional classes, such as trees, short vegetation, and water, are added for a total of eleven land-use classes. The areas of shadow are very limited in the scene owing to the modest heights of the buildings and the sun elevation (65.9°).

Because of the dual nature of the architecture of the Rome test case, the selection of the classes is made to investigate the potential of discriminating between structures with different heights, including buildings (structures with a maximum of five floors), apartment blocks (rectangular structures with a maximum of eight floors), and towers (more than eight floors). As for the previous case, other surfaces of interest
are recognized, including roads, trees, short vegetation, soil, bare soil, and the peculiar railway, for a total of nine classes. Unlike the previous case, in this scene shadows occupy a larger portion of the image.

The ground truths are shown in Fig. 4(b) and (d) for the Las Vegas and Rome datasets, respectively. They are obtained by careful visual inspection of separate data sources, including aerial images, cadastral maps, and in situ inspections (for the Rome scene only). An additional consideration regards objects within shadows, which reflect little radiance because the incident illumination is occluded. These surfaces are assigned to one of the corresponding classes of interest described above. When classifying images at submeter spatial resolution, many of the errors may occur at the boundaries between objects. On the other hand, it is often not possible to correctly identify an edge. To limit this effect, we defined the two ground truths by not including boundary areas.

We note that for both datasets several classes have very similar spectral signatures. To differentiate them, we applied to the original images contextual filters based on mathematical morphology [40], which were shown to have desirable properties when applied to urban VHR classification problems [28], [29]. In particular, four very common morphological filters are considered: opening (O), closing (C), opening by reconstruction (OR), and closing by reconstruction (CR). For each of these filters, we used a structuring element (SE) whose dimensions increased from 9 to 25 pixels with steps of 2 pixels, resulting in nine morphological features. The size of the SEs is chosen according to the image resolution. For the Las Vegas dataset, a square SE is used to consider the major directions of the objects in the image, which are 0° and 90°. For the Rome dataset, which is characterized by an overall 45° angle in the disposition of the objects, a diamond-shaped SE is used instead. This shape allows a better reconstruction of the borders of the objects in the case of O and C features. The process of reconstruction for OR and CR operators is performed using a small (3-pixel diameter) SE. The entire process of morphological filtering increases the dimensionality of the datasets from four to 40 features (the four original features plus 4 × 9 = 36 morphological ones).

B. Experimental Setup

In all the following experiments, for both datasets, all the available samples are split into two sets, corresponding to the learning set $U$ and the test set. The detailed numbers of learning and test samples are shown in Table II. The initial training samples are selected randomly from the learning set $U$. For the first dataset, starting from 55 samples, i.e., five samples per class, the active learning algorithm is run until the number of training samples is equal to 7995, adding 20 samples at each iteration. Analogously, for the second dataset, starting from 36 samples, i.e., four samples for each class, 20 samples are added at each iteration up to 11 996 samples. The entire active learning process is run ten times, each time with a different initial training set, to yield statistically reliable results. At each run, the initial training samples are chosen in a completely random way.

TABLE II
NUMBER OF LEARNING AND TEST SAMPLES FOR (a) LAS VEGAS AND (b) ROME DATASETS

(a)
Class                   # Learning Samples    # Test Samples
Bare soil               4276                  48 908
Commercial buildings    1831                  20 938
Drainage channel        1149                  13 138
Highways                2851                  32 594
Parking lots            2269                  25 939
Residential houses      7044                  80 546
Roads                   6130                  70 088
Short vegetation        1803                  20 611
Soil                    1480                  16 918
Trees                   1049                  11 989
Water                   118                   1354
Total                   30 000                343 023

(b)
Class                   # Learning Samples    # Test Samples
Apartment blocks        7081                  102 735
Bare soil               5241                  76 031
Buildings               11 688                169 568
Railway                 1036                  15 024
Roads                   10 545                152 992
Short vegetation        4489                  65 128
Soil                    971                   14 086
Tower                   3089                  44 827
Trees                   5860                  85 020
Total                   50 000                725 411

Classification performances are evaluated in terms of several measures: 1) the overall accuracy (OA), which is the percentage of correctly classified samples among all the considered samples, independently of the classes they belong to; 2) the Kappa statistic [41]; 3) the classification accuracies obtained for the different classes; 4) the average accuracy (AA), which is the average over the classification accuracies obtained for the different classes; and 5) the standard deviations (σ) of OA, Kappa, and AA, to evaluate the stability of the active learning method.

An SVM classifier is also trained on the entire learning set to have a reference training scenario, called full training. On one hand, the classification results obtained in this way represent an upper bound for the accuracies. On the other hand, we expect that the lower accuracy bound will be given by the completely random selection strategy (R). We recall that the purpose of any active learning strategy is to converge to the performance of the full training scenario faster than the R method. In addition, the proposed approach for spectral-spatial information integration is compared with the performances given by the MS and BT strategies based on spectral information only.

C. Experimental Results

Considering the Las Vegas dataset, the OA for the full classifier is equal to 95.47%. In Fig. 5(a)–(f) we show the results in terms of (a) and (b) OA, (c) and (d) Kappa, and (e) and (f) AA as a function of the number of training samples for the proposed active learning approach, the MS and BT strategies, and the random selection. First, it is evident that the active selection of the training samples allows a faster convergence to the full accuracy with respect to the random strategy. Comparing the different active learning strategies, we note that the integration of the spatial information is useful
Fig. 5. Performances achieved on the Las Vegas dataset in terms of (a) and (b) OA, (c) and (d) Kappa, and (e) and (f) AA. Each graph shows the results
in function of the number of training samples and averaged over ten runs of the algorithm, each with a different initial set. Shaded areas represent standard
deviation over the ten considered runs. R: random. MS: margin sampling. BT: breaking ties. Sp1–Sp3: spatial criterion. Full: full SVM.

in the process of training sample collection. In particular, the strategies based on the criteria Sp1 and Sp2 converge to the full accuracy using about 5000 training samples, which represent about 17% of the entire learning set. Instead, about 6000 and 7000 samples are necessary for the methods based on the criterion Sp3 and on the spectral information only, respectively. In addition, we note that, before convergence, the proposed strategies give an improvement with respect to the traditional MS and BT criteria. This means that similar accuracy values can be obtained using a smaller number of training samples, which implies a reduction of the manual work for sample labeling and a decrease in the computational time necessary to train the classifier.

The obtained results are shown in greater detail in Table III. In particular, we considered the performances obtained after (a) 50 and (b) 100 iterations of the iterative process, which correspond to (a) 1035 and (b) 2035 samples used to train the classifier, respectively. We report the values of OA, Kappa, AA, the standard deviations associated with the accuracies ($\sigma_{\mathrm{OA}}$, $\sigma_{\mathrm{KAPPA}}$, and $\sigma_{\mathrm{AA}}$), and the class accuracies. As can be seen, the proposed strategies are characterized by a better performance with respect to the MS and BT criteria from different points
TABLE III
OA, KAPPA, AA, CLASS ACCURACIES, AND STANDARD DEVIATIONS (σ) ACHIEVED ON THE LAS VEGAS DATASET AFTER (a) 50 AND (b) 100 ITERATIONS OF THE ACTIVE LEARNING PROCESS

(a)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 30 000 55 1035
OA 95.47 58.98 84.89 88.09 89.73 90.25 88.83 87.87 89.38 90.01 88.48
σOA − 5.74 0.56 0.40 0.24 0.25 0.40 0.28 0.16 0.17 0.26
Kappa 0.947 0.533 0.823 0.860 0.880 0.886 0.870 0.856 0.876 0.883 0.866
σKAPPA − 0.060 0.007 0.005 0.002 0.002 0.004 0.003 0.002 0.002 0.003
AA 93.35 59.33 79.22 83.15 84.86 85.37 83.98 82.54 84.40 85.05 83.47
σAA − 4.10 1.47 0.80 0.38 0.42 0.76 0.87 0.31 0.33 0.80
Bare soil 99.53 65.74 98.09 98.30 99.32 99.41 99.10 98.11 98.69 99.02 98.12
Commercial buildings 98.22 72.86 88.95 93.36 94.09 94.61 93.28 91.78 92.25 92.70 91.46
Drainage channel 99.43 58.61 91.82 96.42 97.14 97.78 96.04 96.05 95.44 96.78 94.39
Highways 97.22 45.50 86.68 90.77 93.77 94.23 91.85 90.81 94.36 95.42 93.36
Parking lots 86.73 52.67 63.14 66.11 68.96 69.83 67.52 66.36 70.32 70.82 69.43
Residential houses 97.76 61.27 91.25 94.11 96.41 96.83 95.69 94.49 96.00 96.53 94.97
Roads 96.26 59.98 87.99 91.00 91.64 92.37 90.67 91.10 91.45 92.19 90.57
Short vegetation 91.06 60.03 75.56 81.52 83.72 83.96 82.67 81.53 83.96 84.35 82.94
Soil 88.26 42.66 51.59 57.31 57.67 58.71 56.82 53.22 55.20 56.14 54.50
Trees 82.39 55.96 62.03 66.74 69.49 70.00 68.80 66.26 69.40 69.93 68.03
Water 90.03 77.32 74.33 78.99 81.21 81.37 81.31 78.21 81.29 81.65 80.39

(b)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 30 000 55 2035
OA 95.47 58.98 87.18 90.54 92.13 92.61 91.21 90.58 92.18 92.64 91.25
σOA − 5.74 0.42 0.89 0.19 0.19 0.35 0.32 0.14 0.14 0.22
Kappa 0.947 0.533 0.850 0.889 0.908 0.914 0.897 0.890 0.909 0.914 0.898
σKAPPA − 0.060 0.005 0.010 0.001 0.001 0.003 0.004 0.001 0.001 0.002
AA 93.35 59.33 82.00 86.55 88.39 88.93 87.46 86.58 88.53 88.97 87.50
σAA − 4.10 1.00 1.07 0.27 0.28 0.61 0.43 0.20 0.21 0.31
Bare soil 99.53 65.74 98.37 97.45 98.39 98.42 98.36 98.81 99.32 99.43 98.76
Commercial buildings 98.22 72.86 91.10 95.52 96.31 96.85 95.38 94.46 95.02 95.54 94.04
Drainage channel 99.43 58.61 95.03 97.48 98.18 98.53 97.29 98.01 97.87 98.71 96.99
Highways 97.22 45.50 90.65 93.85 96.06 96.55 94.34 93.53 96.55 96.91 95.72
Parking lots 86.73 52.67 64.83 73.68 76.77 78.04 75.03 74.31 78.53 79.10 77.25
Residential houses 97.76 61.27 93.21 95.31 97.38 97.54 96.66 95.66 97.20 97.47 96.36
Roads 96.26 59.98 89.56 92.88 93.61 94.27 92.59 92.80 93.57 94.30 92.53
Short vegetation 91.06 60.03 79.21 83.91 86.49 87.03 85.35 84.47 87.19 87.30 85.93
Soil 88.26 42.66 58.82 69.23 69.15 70.37 68.02 63.32 65.20 66.16 63.71
Trees 82.39 55.96 65.48 70.07 74.13 74.78 73.18 71.63 74.93 75.67 74.11
Water 90.03 77.32 75.70 82.70 85.81 85.85 85.88 85.34 88.49 88.10 87.11
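
The accuracy measures reported in Table III can be derived from a confusion matrix. The sketch below shows one conventional way of doing so in plain Python. It assumes that Kappa is Cohen's kappa and that each class accuracy is the fraction of test samples of that class that are labeled correctly; the function name and the matrix layout are hypothetical and not taken from the paper.

```python
def accuracy_measures(confusion):
    """Compute OA, Kappa, AA, and class accuracies from a confusion matrix.

    Assumed layout: confusion[i][j] is the number of samples of true
    class i assigned to class j. Every class is assumed to be present.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n_classes))
                for j in range(n_classes)]

    oa = correct / total
    # Agreement expected by chance, as used in Cohen's kappa.
    chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - chance) / (1 - chance)
    class_acc = [confusion[i][i] / row_sums[i] for i in range(n_classes)]
    aa = sum(class_acc) / n_classes
    return oa, kappa, aa, class_acc


# Toy two-class example with 20 test samples.
oa, kappa, aa, class_acc = accuracy_measures([[8, 2], [1, 9]])
```

In the tables, OA, AA, and the class accuracies are expressed as percentages, so the fractions returned here would be multiplied by 100.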

of view. First, better accuracy values (OA, Kappa, AA, class accuracies) are obtained using the same number of training samples. Second, smaller standard deviations associated with the accuracies are observed. Indeed, smaller values of standard deviation mean that the proposed strategies exhibit a greater level of stability with respect to the random selection of the initial training set.

Further considerations can be drawn from Table IV. To evaluate the statistical significance of the obtained performances, we report the z value related to the two-tailed McNemar's test. In particular, we compare the proposed strategies with the criteria (a) MS and (b) BT that do not incorporate spatial information. We recall that, considering the standard α level of 5%, a statistically significant difference is obtained if |z| > 1.96. Therefore, we note a significant superiority of the proposed methods with respect to the criteria in which spatial information is not considered.

To better understand the proposed strategies, a set of maps is shown in Fig. 6. In Fig. 6(a), we report the gray-level representation of the remote sensing image with an example of a training sample set. In particular, for this analysis we considered the initial training set that gives the value of OA closest to the mean value obtained at the first iteration, i.e., equal to 58.98 as shown in Table III. The training samples
Fig. 6. Maps for the Las Vegas dataset in terms of (a) set of initial training samples, (b) spectral criterion based on MS and (c) relative
histogram, (d) spectral criterion based on BT and (e) relative histogram, (f) spatial criterion based on the distance from the closest SV (Sp1) and (g) relative
histogram, (h) combined criterion (MS+Sp1), (i) combined criterion (BT+Sp1), (j) spatial criterion based on the Parzen window method, (k) relative histogram,
(l) combined criterion (MS+Sp2), (m) combined criterion (BT+Sp2), (n) spatial criterion based on the entropy variation (Sp3), (o) relative
histogram, (p) combined criterion (MS+Sp3), and (q) combined criterion (BT+Sp3).
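
The two spectral criteria mapped in Fig. 6(b) and (d) can be sketched in a few lines of Python. The function names are hypothetical. The sketch assumes that, for each sample, the classifier provides one discriminant value per binary SVM (for MS) and one posterior probability per class (for BT); in both cases a lower score marks a more uncertain sample.

```python
def margin_sampling_score(decision_values):
    """MS: smallest absolute discriminant value among the binary SVM outputs."""
    return min(abs(v) for v in decision_values)


def breaking_ties_score(posteriors):
    """BT: difference between the two highest class posterior probabilities."""
    best, second = sorted(posteriors, reverse=True)[:2]
    return best - second


def most_uncertain(scores, n_select):
    """Indices of the n_select samples with the lowest scores."""
    return sorted(range(len(scores)), key=lambda j: scores[j])[:n_select]


ms = margin_sampling_score([-0.2, 1.5, -0.9])
bt = breaking_ties_score([0.5, 0.3, 0.2])
picked = most_uncertain([0.9, 0.1, 0.4], n_select=2)
```

In the proposed approach these scores are not used alone: they play the role of the spectral criterion $f_1$ that is combined with a spatial criterion $f_2$.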

are depicted with circles colored with the corresponding class colors. In Fig. 6(b), we report the map of the discriminant function value, which is given for each sample $j$ of the image by the minimum absolute value of the discriminant function $f_{\mathrm{MIN},j}$ obtained by training the classifier on the considered set of training samples. In this map, SVs are highlighted with white circles. The correspondence between the discriminant function value and the color is shown in Fig. 6(c), in which the histogram of the map is reported. In particular, for completeness, the histogram does not refer to the considered single map only, but is obtained by averaging the results over the ten different experiment runs. It is evident that many samples are associated with very low values of the discriminant function, which are depicted in dark blue, and are therefore placed in the proximity of the boundary between different classes. This map corresponds to the map associated with the MS criterion, in which the samples closer to the boundary are selected. Similarly, in Fig. 6(d) we report the map related to the BT criterion, in which we report for each sample $j$ of the image the difference $p_{\mathrm{MAX},j}$ between the two highest
Fig. 7. Maps for the Las Vegas dataset in terms of (a) discriminant function value and (b) relative histogram for the full SVM.

TABLE IV
MCNEMAR'S TEST ACHIEVED ON THE LAS VEGAS DATASET WITH RESPECT TO: (a) MS AND (b) BT CRITERIA

(a)
#Training samples    1035      1035      1035      2035      2035      2035
Method               MS+Sp1    MS+Sp2    MS+Sp3    MS+Sp1    MS+Sp2    MS+Sp3
Z value              6.97      7.23      2.79      6.66      6.94      2.54

(b)
#Training samples    1035      1035      1035      2035      2035      2035
Method               BT+Sp1    BT+Sp2    BT+Sp3    BT+Sp1    BT+Sp2    BT+Sp3
Z value              6.85      7.01      2.61      6.80      6.90      2.58
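
For readers who want to reproduce a z value of the kind listed in Table IV, the sketch below uses the form of McNemar's statistic commonly adopted for comparing two classification maps on the same test samples, $z = (f_{AB} - f_{BA}) / \sqrt{f_{AB} + f_{BA}}$. The paper does not spell out the formula in this section, so this form, the function name, and the toy data are assumptions.

```python
import math


def mcnemar_z(y_true, pred_a, pred_b):
    """z statistic of McNemar's test for two classifiers on the same samples.

    f_ab counts samples that A labels correctly and B labels wrongly;
    f_ba counts the opposite case. A positive z favors classifier A.
    """
    f_ab = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    f_ba = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if f_ab + f_ba == 0:
        return 0.0  # the two classifiers never disagree on correctness
    return (f_ab - f_ba) / math.sqrt(f_ab + f_ba)


# Toy example with 20 test samples, all of true class 1.
y_true = [1] * 20
pred_a = [1] * 16 + [0] * 4   # A is wrong on the last four samples
pred_b = [0] * 9 + [1] * 11   # B is wrong on the first nine samples
z = mcnemar_z(y_true, pred_a, pred_b)
significant = abs(z) > 1.96   # two-tailed test at the 5% level
```

With this convention, all entries of Table IV exceed the 1.96 threshold.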

Fig. 8. Results achieved on the Las Vegas dataset in terms of (a) mean value of discriminant function and (b) standard deviation of discriminant function. Each graph shows the results as a function of the number of training samples and averaged over ten runs of the algorithm, each with a different initial set. R: random. MS: margin sampling. Sp1–Sp3: spatial criterion. Full: full SVM.

posterior probabilities. The corresponding histogram is shown in Fig. 6(e). Also in this case, many samples are characterized by low values of the criterion, which implies that the classifier confidence is poor. Additionally, in the other figures we report the maps associated with the spatial information. In particular, the map and the relative histogram associated with the criterion based on the spatial distance from the closest SV (Sp1) are shown in Fig. 6(f) and (g), respectively. The dark blue samples lie in areas of the image not covered by SVs. In Fig. 6(h) and (i), we show the final maps obtained by combining the spatial Sp1 with the spectral (h) MS and (i) BT criteria. Again, we show in dark blue the samples to select, i.e., in this case, the samples belonging to the set of nondominated solutions. Similarly, in Fig. 6(j) and (k), we report the map and the relative histogram associated with the criterion based on the Parzen window method in the spatial domain (Sp2). The maps that combine the Sp2 with the MS and BT criteria are shown in Fig. 6(l) and (m), respectively. We note that these maps are similar to those shown in Fig. 6(f) and (g). This explains the similar performances of the two different criteria Sp1 and Sp2 described previously. Finally, the maps and the histogram related to the Sp3 strategy, in which the maximization of the spatial entropy variation is desired, are shown in Fig. 6(n)–(q).

The analysis of the discriminant function value allows us to draw further considerations on the compared strategies. In particular, the discriminant function value carries information related to the reliability of the class estimation given by the classification process. Considering Fig. 6(b) and (c), in which the map of the discriminant function and the relative histogram associated with the initial training set are depicted, we highlighted previously that most of the samples are characterized by low values of the discriminant function. This means that they are estimated with poor levels of confidence. This aspect is consistent with the fact that very few training samples are used to construct the classification model. We note that the histogram is approximately monotonically decreasing. In Fig. 7(a) and (b), we show the map of the discriminant function and the relative histogram using the full training set. In this case, in which a high number of training samples is considered, the samples are characterized by high discriminant function values and thus high confidence. The histogram is completely different from that shown in Fig. 6(c). It is not monotonically decreasing, but exhibits a peak in correspondence of a relatively high discriminant function value. To perform a more detailed analysis, we consider the following two measures: 1) the mean value of the discriminant function calculated on the entire map and 2) the standard deviation of the discriminant function value normalized with respect to the corresponding mean value. In Fig. 8, we report the results as a function of the number of training samples for the proposed active learning strategies, the MS, and the random ones. For a better visualization, we show the results until 200 iterations of the iterative process. Considering the initial training set, which corresponds to the starting point of the curves, it is confirmed that the discriminant function has a low mean value and a high standard deviation. In particular, a high value of
TABLE V
MEAN VALUE AND STANDARD DEVIATION OF THE DISCRIMINANT FUNCTION ACHIEVED ON THE LAS VEGAS DATASET

#Training samples     30 000   55        1035   1035   1035     1035     1035     2035   2035   2035     2035     2035
Method                Full     Initial   R      MS     MS+Sp1   MS+Sp2   MS+Sp3   R      MS     MS+Sp1   MS+Sp2   MS+Sp3
Mean value            1.03     0.46      1.02   1.23   1.07     1.07     1.16     1.02   1.11   1.05     1.05     1.08
Standard deviation    0.06     0.45      0.25   0.25   0.16     0.17     0.22     0.22   0.18   0.16     0.15     0.16

Fig. 9. Classification maps achieved on the Las Vegas dataset. Top row: (a) initial, (b) full, and (c) random criterion (R). Middle row: (d) spectral criterion based on MS, (e) MS in combination with spatial criterion based on the distance from the closest SV (Sp1), (f) MS in combination with spatial criterion based on the Parzen window method (Sp2), and (g) MS in combination with spatial criterion based on the entropy variation (Sp3). Bottom row: (h) spectral criterion based on BT, (i) BT in combination with Sp1, (j) BT in combination with Sp2, and (k) BT in combination with Sp3. For maps (c)–(k), 1035 training samples are used.
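
The training set sizes quoted together with iteration numbers (for example, 1035 samples after 50 iterations) are consistent with counting the initial training set as iteration 1 and adding 20 samples at every later iteration. The short check below makes this bookkeeping explicit; the function name is hypothetical, and the counting convention is inferred from the reported numbers rather than stated in the paper.

```python
def training_set_size(iteration, n_initial, batch=20):
    """Number of training samples at a given iteration (iteration 1 = initial set)."""
    return n_initial + batch * (iteration - 1)


las_vegas_50 = training_set_size(50, n_initial=55)     # Las Vegas, 50 iterations
las_vegas_100 = training_set_size(100, n_initial=55)   # Las Vegas, 100 iterations
rome_100 = training_set_size(100, n_initial=36)        # Rome, 100 iterations
rome_200 = training_set_size(200, n_initial=36)        # Rome, 200 iterations
```

Under the same convention, the final sizes of 7995 (Las Vegas) and 11 996 (Rome) samples are reached at iterations 398 and 599, respectively.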

the standard deviation implies that the confidence map tends to be inhomogeneous, i.e., some samples are classified with low reliability and others with high confidence. This result is not desirable, because the classification process tends to classify the samples with different levels of reliability. Instead, using the full training set, we have an increase of the mean value and a substantial decrease of the standard deviation value, which comes from a more homogeneous confidence map. This scenario tends to the ideal case, in which all the samples are classified with high reliability and the associated confidence map is approximately homogeneous. Considering the different selection strategies, we note that active learning methods are able to increase the discriminant function mean value faster than the random selection. In addition, a faster convergence to the full result is obtained using the two proposed strategies that combine the MS criterion with the Sp1 and Sp2 criteria. These two criteria also yield quicker decreases of the standard deviation value, which are observed from the first iterations. Adopting the MS criterion only, an improvement with respect to the random strategy is obtained only when about 1500 samples are added to the training set. These results show how the integration of the spatial information in the active learning process allows us to obtain better performance not only in terms of accuracy, but also in terms of classification reliability, which can be estimated by analyzing the discriminant function map. The mean and standard deviation values are shown in Table V, in which the results obtained after 50 and 100 iterations of the iterative process are reported. Similar considerations can also be made for the BT strategy, which we do not report so as not to overload the paper.

To conclude the analysis on the Las Vegas dataset, we show in Fig. 9 the classification maps obtained with the different selection criteria. In particular, the results after 50 iterations of the iterative process are considered. Although the visual inspection of the images is not easy, we note a slight decrease of noisy classification patterns when the spatial criterion is integrated in the selection process.

Concerning the Rome dataset, the results confirm the observations made for the Las Vegas one. The graphs with the accuracies as a function of the number of training samples are shown in Fig. 10(a)–(f). For the full classifier the OA is equal to 88.89%. Also for this set of experiments, the proposed active learning strategies give a faster convergence to the full
Fig. 10. Performances achieved on the Rome dataset in terms of (a) and (b) OA, (c) and (d) Kappa, (e) and (f) AA. Each graph shows the results in function
of the number of training samples and averaged over ten runs of the algorithm, each with a different initial set. Shaded areas: standard deviation over the ten
considered runs. R: random. MS: margin sampling. BT: breaking ties. Sp1–Sp3: spatial criterion. Full: full SVM.

accuracy and better performances before convergence with respect to the random, the MS, and the BT methods. The criteria Sp1 and Sp2 allow convergence to the full accuracy using about 9000 training samples, whereas about 11 000 samples are necessary for the methods based on the criterion Sp3 and on the spectral information only. The results are shown in Table VI, in which we report the accuracies obtained after (a) 100 and (b) 200 iterations of the active learning process, which correspond to (a) 2016 and (b) 4016 training samples, respectively. Statistically significant differences evaluated by McNemar's test are shown in Table VII. Also for this dataset, in most of the cases the proposed strategies exhibit significant superiority with respect to criteria that do not exploit spatial information.

In terms of discriminant function value, the results as a function of the number of training samples until 190 iterations of the iterative process are shown in Fig. 11. The criteria Sp1 and Sp2 confirm the best performance both in terms of increase of the mean value and decrease of the standard deviation value. In particular, the mean and standard deviation values obtained after 100 and 200 iterations are shown in Table VIII.
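
The two reliability measures used in this analysis, i.e., the mean of the discriminant function over the map and its standard deviation normalized by that mean, can be computed as in the following sketch. The function name is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the paper does not specify it.

```python
import statistics


def confidence_summary(discriminant_values):
    """Return (mean, std / mean) of the per-sample discriminant values of a map."""
    mean = statistics.fmean(discriminant_values)
    if mean == 0:
        raise ValueError("mean discriminant value is zero; cannot normalize")
    # Assumption: population standard deviation over all samples of the map.
    std = statistics.pstdev(discriminant_values)
    return mean, std / mean


# Toy maps: a homogeneous confidence map and an inhomogeneous one.
homogeneous = confidence_summary([1.0, 1.0, 1.0, 1.0])
inhomogeneous = confidence_summary([0.5, 1.5])
```

A higher mean together with a lower normalized standard deviation corresponds to the more homogeneous, more reliable confidence map that the text describes as desirable.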
TABLE VI
OA, KAPPA, AA, CLASS ACCURACIES, AND STANDARD DEVIATIONS (σ) ACHIEVED ON THE ROME DATASET AFTER (a) 100 AND (b) 200 ITERATIONS OF THE ACTIVE LEARNING PROCESS

(a)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 50 000 36 2016
OA 88.89 40.49 75.91 77.77 78.95 79.73 77.80 77.57 79.26 80.21 78.16
σOA − 4.50 0.27 0.50 0.19 0.21 0.42 0.62 0.22 0.21 0.49
Kappa 0.868 0.322 0.713 0.735 0.753 0.762 0.739 0.732 0.756 0.767 0.743
σKAPPA − 0.043 0.003 0.006 0.002 0.002 0.005 0.007 0.002 0.002 0.005
AA 88.74 45.50 72.89 75.47 76.52 77.28 75.46 75.14 76.42 77.35 75.37
σAA − 2.98 0.76 0.64 0.31 0.30 0.58 0.69 0.30 0.29 0.55
Apartment blocks 82.18 15.97 59.90 63.38 65.00 65.82 63.16 61.86 60.23 61.62 58.60
Bare soil 95.82 75.86 91.93 91.73 93.94 94.05 93.48 92.28 92.00 92.74 91.17
Buildings 88.00 24.02 73.88 75.87 75.63 76.77 73.92 77.74 80.80 81.75 79.41
Railway 96.61 78.59 91.02 90.15 93.12 93.44 92.27 91.84 91.02 91.98 90.38
Roads 92.82 57.20 89.21 89.14 92.04 92.42 91.33 88.51 91.72 92.37 91.12
Short vegetation 87.70 42.72 70.42 75.51 72.64 74.28 71.56 75.53 77.30 78.87 75.94
Soil 88.45 51.36 57.99 66.53 65.68 66.39 64.67 67.06 69.67 70.11 68.48
Tower 74.18 24.07 34.11 39.02 40.32 41.44 39.09 34.66 35.75 36.49 34.99
Trees 92.89 39.73 87.57 87.89 90.34 90.94 89.67 86.78 89.28 90.27 88.25
(b)
Method Full Initial R MS MS+Sp1 MS+Sp2 MS+Sp3 BT BT+Sp1 BT+Sp2 BT+Sp3
#Training samples 50 000 36 4016
OA 88.89 40.49 78.54 80.50 82.30 83.03 81.04 80.65 82.82 83.59 81.70
σOA − 4.50 0.33 0.42 0.08 0.09 0.24 0.41
Kappa 0.868 0.322 0.744 0.768 0.792 0.800 0.777 0.769 0.798 0.807 0.785
σKAPPA − 0.043 0.004 0.005 0.001 0.001 0.002 0.005
AA 88.74 45.50 76.50 78.80 80.47 81.21 79.25 78.86 80.64 81.48 79.49
σAA − 2.98 0.72 0.39 0.10 0.09 0.25 0.42
Apartment blocks 82.18 15.97 64.69 68.35 71.04 71.98 68.92 67.91 66.58 68.10 64.45
Bare soil 95.82 75.86 92.85 92.81 95.26 95.41 94.67 93.01 93.27 93.84 92.60
Buildings 88.00 24.02 75.88 79.86 79.71 80.94 77.90 80.98 84.39 85.27 82.95
Railway 96.61 78.59 92.99 91.64 94.50 94.85 93.55 92.61 92.92 93.57 92.15
Roads 92.82 57.20 89.86 88.66 92.54 92.63 91.93 89.15 92.88 93.03 92.56
Short vegetation 87.70 42.72 74.87 79.62 77.69 79.22 76.25 79.80 81.82 83.26 80.34
Soil 88.45 51.36 65.64 72.44 73.07 73.70 71.79 74.81 77.36 78.54 76.07
Tower 74.18 24.07 43.38 47.81 48.67 50.34 47.26 43.16 44.52 45.37 42.87
Trees 92.89 39.73 88.35 88.04 91.71 91.79 90.97 88.32 91.98 92.29 91.40
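
For reference, the summary measures reported in Table VI can be computed from a confusion matrix as in the following Python sketch. It assumes that OA is the fraction of correctly classified test samples, that each class accuracy is the fraction of samples of that class that are correctly labeled, that AA is the unweighted mean of the class accuracies, and that Kappa is Cohen's coefficient of agreement [41]. The function name and the toy confusion matrix are hypothetical and are not taken from the experiments.

```python
def accuracy_metrics(confusion):
    """Return (OA, AA, kappa) as fractions for a square confusion matrix.

    confusion[i][j] is the number of test samples of true class i that were
    assigned to class j. Every class is assumed to have at least one sample.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall accuracy: share of correctly classified samples.
    oa = correct / total

    # Average accuracy: unweighted mean of the per-class accuracies.
    class_acc = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    aa = sum(class_acc) / n_classes

    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(n_classes))
               for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Toy two-class example (illustrative numbers only).
oa, aa, kappa = accuracy_metrics([[45, 5], [10, 40]])
print(f"OA={oa:.3f}  AA={aa:.3f}  Kappa={kappa:.3f}")
```

Because AA weights all classes equally, it is more sensitive than OA to poorly recognized minority classes, such as Tower in Table VI.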

Fig. 11. Results achieved on the Rome dataset in terms of (a) mean value of discriminant function and (b) standard deviation of discriminant function. Each graph shows the results as a function of the number of training samples, averaged over ten runs of the algorithm, each with a different initial set. R: random. MS: margin sampling. Sp1–Sp3: spatial criteria. Full: full SVM.
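
The MS+Sp strategies compared in these results pair a spectral criterion with a spatial one and merge the two by nondominated sorting. The Python sketch below illustrates the idea under several assumptions that are not confirmed by the text: the spectral score is the smallest absolute SVM discriminant value of a candidate (lower is more uncertain), the spatial score is the Euclidean distance in pixel coordinates to the nearest current SV (larger is assumed to be preferable), and only the first nondominated front is extracted. All function names, positions, and discriminant values are hypothetical.

```python
import math


def margin_score(decision_values):
    # Spectral criterion (MS): smallest |f(x)| over the binary SVM outputs.
    return min(abs(v) for v in decision_values)


def spatial_score(position, sv_positions):
    # Spatial criterion (Euclidean variant): distance, in pixel coordinates,
    # from the candidate to the nearest current support vector.
    return min(math.dist(position, sv) for sv in sv_positions)


def nondominated_front(scores):
    """Return the indices of the candidates that no other candidate dominates.

    Each score is (uncertainty, distance). Candidate j dominates candidate i
    when it is no worse on both criteria (lower or equal uncertainty, larger
    or equal distance) and strictly better on at least one of them.
    """
    front = []
    for i, (u_i, d_i) in enumerate(scores):
        dominated = any(
            u_j <= u_i and d_j >= d_i and (u_j < u_i or d_j > d_i)
            for j, (u_j, d_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical pool of unlabeled pixels: image position and SVM outputs.
sv_positions = [(10, 10), (40, 12)]
pool = [
    {"pos": (11, 10), "f": [0.1, -1.2]},
    {"pos": (30, 30), "f": [0.4, -0.9]},
    {"pos": (12, 11), "f": [0.8, 1.5]},
    {"pos": (60, 60), "f": [1.1, -2.0]},
]
scores = [(margin_score(p["f"]), spatial_score(p["pos"], sv_positions))
          for p in pool]
selected = nondominated_front(scores)
print(selected)
```

In this toy pool the third candidate is discarded because the second one is both more uncertain and farther from the SVs. How a batch of fixed size is drawn from the successive fronts is a separate design choice that this sketch does not cover.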

IV. CONCLUSION

In this paper, we proposed a new approach to active learning for remote sensing image classification. While current strategies work in the spectral domain only, we suggested combining spectral and spatial information in the iterative process of active sample selection. Spatial information was based on the spatial position of the sample in the considered

TABLE VII
MCNEMAR’S TEST ACHIEVED ON THE ROME DATASET WITH RESPECT TO (a) MS AND (b) BT CRITERIA

(a)
Method             MS+Sp1   MS+Sp2   MS+Sp3   MS+Sp1   MS+Sp2   MS+Sp3
#Training samples  2016     2016     2016     4016     4016     4016
Z value            5.89     6.42     1.62     6.54     7.49     2.32

(b)
Method             MS+Sp1   MS+Sp2   MS+Sp3   MS+Sp1   MS+Sp2   MS+Sp3
#Training samples  2016     2016     2016     4016     4016     4016
Z value            6.01     6.53     1.74     7.02     7.52     2.78
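
As a reminder of how such Z values are commonly obtained, the Python sketch below implements the standard form of McNemar's test for two classifiers evaluated on the same test set, Z = (f12 − f21) / sqrt(f12 + f21), where f12 counts the samples classified correctly by the first classifier only and f21 those classified correctly by the second only. It is assumed here, and not confirmed by the text, that this is the variant used for Table VII. Under the usual two-sided 5% significance level, |Z| > 1.96 indicates a statistically significant difference. Function and variable names are illustrative.

```python
import math


def mcnemar_z(y_true, pred_a, pred_b):
    """McNemar's Z statistic comparing classifier A with classifier B.

    A positive value means that A is correct where B is wrong more often
    than the opposite case.
    """
    # Samples that only A classifies correctly.
    f_ab = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
               if a == t and b != t)
    # Samples that only B classifies correctly.
    f_ba = sum(1 for t, a, b in zip(y_true, pred_a, pred_b)
               if a != t and b == t)
    if f_ab + f_ba == 0:
        return 0.0  # The two classifiers never disagree on correctness.
    return (f_ab - f_ba) / math.sqrt(f_ab + f_ba)


# Toy labels and predictions (illustrative only).
y_true = [1] * 10
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
pred_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
z = mcnemar_z(y_true, pred_a, pred_b)
print(f"Z = {z:.2f}, significant at 5%: {abs(z) > 1.96}")
```

Only the samples on which the two classifiers disagree contribute to the statistic, which is why it is well suited to comparing strategies evaluated on a common test set.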

TABLE VIII
MEAN VALUE AND STANDARD DEVIATION OF THE DISCRIMINANT FUNCTION ACHIEVED ON THE ROME DATASET

Method              Full     Initial  R        MS       MS+Sp1   MS+Sp2   MS+Sp3   R        MS       MS+Sp1   MS+Sp2   MS+Sp3
#Training samples   50 000   36       2016     2016     2016     2016     2016     4016     4016     4016     4016     4016
Mean value          1.38     0.40     1.00     1.34     1.41     1.46     1.29     1.09     1.37     1.37     1.42     1.47
Standard deviation  0.33     0.59     0.42     0.44     0.35     0.35     0.38     0.40     0.38     0.33     0.32     0.35

image and, in particular, on the distance from the current SVs. For this purpose, three different criteria were proposed, based on Euclidean distance, Parzen window method, and entropy variation, respectively. Combination of spectral and spatial criteria was done by adopting nondominated sorting.

To validate the proposed solution, we conducted experiments on two VHR remote sensing images acquired by QuickBird. The obtained results showed good capabilities of the proposed approach for the selection of relevant samples. In particular, statistically significant advantages in terms of classification accuracy and classification reliability were observed with respect to strategies that do not exploit spatial information. Therefore, the integration of spatial information proved worthwhile for reducing the manual sample labeling work and decreasing the computational time necessary to train the classifier.

While in this paper we considered, for their simplicity and effectiveness, the state-of-the-art MS and BT strategies as spectral criteria, the proposed approach can in general be applied in conjunction with any other active learning criterion that exploits the samples in the spectral domain. In addition, nondominated sorting can be easily extended to combine more than two criteria. However, experimental results not reported in this paper showed that the combination of just one spectral and one spatial criterion was sufficient to obtain a good improvement of the learning process.

Finally, although active learning is reaching a good level of maturity from a methodological viewpoint, its implementation in real applications has been scarcely investigated in the literature. In particular, the labeling of hundreds of pixels by the user makes current active learning strategies hardly practical in a real user-machine interaction scenario. This represents an important direction for future research efforts. First attempts to address this problem were proposed recently in the remote sensing community [42], [43].

ACKNOWLEDGMENT

The authors would like to thank DigitalGlobe for providing the data used in this paper.

REFERENCES

[1] G. Camps-Valls, D. Tuia, L. Gomez-Chova, S. Jiménez, and J. Malo, Remote Sensing Image Processing. San Mateo, CA, USA: Morgan, 2011.
[2] P. Mitra, C. A. Murthy, and S. K. Pal, “A probabilistic active support vector learning algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 3, pp. 413–418, Mar. 2004.
[3] M. Li and I. K. Sethi, “Confidence-based active learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 8, pp. 1251–1261, Aug. 2006.
[4] S.-S. Ho and H. Wechsler, “Query by transduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, pp. 1557–1571, Sep. 2008.
[5] E. Pasolli and F. Melgani, “Active learning methods for electrocardiographic signal classification,” IEEE Trans. Inf. Technol. Biomed., vol. 14, no. 6, pp. 1405–1416, Nov. 2010.
[6] B. Settles, “Active learning literature survey,” Dept. Comput. Sci., Univ. Wisconsin-Madison, Madison, WI, USA, Tech. Rep. 1648, 2009.
[7] Y. Zhang, X. Liao, and L. Carin, “Detection of buried targets via active selection of labeled data: Application to sensing subsurface UXO,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 11, pp. 2535–2543, Nov. 2004.
[8] Q. Liu, X. Liao, and L. Carin, “Detection of unexploded ordnance via efficient semisupervised and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 9, pp. 2558–2567, Sep. 2008.
[9] M. Ferecatu and N. Boujemaa, “Interactive remote-sensing image retrieval using active relevance feedback,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 818–826, Apr. 2007.
[10] E. Pasolli, F. Melgani, N. Alajlan, and Y. Bazi, “Active learning methods for biophysical parameter estimation,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 4071–4084, Oct. 2012.
[11] P. Mitra, B. Uma Shankar, and S. Pal, “Segmentation of multispectral remote sensing images using active support vector machines,” Pattern Recognit. Lett., vol. 25, no. 9, pp. 1067–1074, Jul. 2004.
[12] D. Tuia, F. Ratle, F. Pacifici, M. Kanevski, and W. Emery, “Active learning methods for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2218–2232, Jul. 2009.
[13] E. Pasolli, F. Melgani, and Y. Bazi, “SVM active learning through significance space construction,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 3, pp. 431–435, May 2011.
[14] M. Volpi, D. Tuia, and M. Kanevski, “Memory-based cluster sampling for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3096–3106, Aug. 2012.
[15] W. Di and M. M. Crawford, “View generation for multiview maximum disagreement based active learning for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 5, pp. 1942–1954, May 2012.
[16] S. Rajan, J. Ghosh, and M. Crawford, “An active learning approach to hyperspectral data classification,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 4, pp. 1231–1242, Apr. 2008.

[17] J. Li, J. Bioucas-Dias, and A. Plaza, “Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 11, pp. 4085–4098, Nov. 2010.
[18] D. Tuia, E. Pasolli, and W. J. Emery, “Using active learning to adapt remote sensing image classifiers,” Remote Sens. Environ., vol. 115, no. 9, pp. 2232–2242, Sep. 2011.
[19] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral image segmentation using a new Bayesian approach with active learning,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3947–3960, Oct. 2011.
[20] D. Tuia, J. Muñoz-Marí, and G. Camps-Valls, “Remote sensing image segmentation by active queries,” Pattern Recognit., vol. 45, no. 6, pp. 2180–2192, Jun. 2012.
[21] J. Muñoz-Marí, D. Tuia, and G. Camps-Valls, “Semisupervised classification of remote sensing images with active queries,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 3751–3763, Oct. 2012.
[22] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Muñoz-Marí, “A survey of active learning algorithms for supervised remote sensing image classification,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 3, pp. 606–617, Jun. 2011.
[23] G. Schohn and D. Cohn, “Less is more: Active learning with support vector machines,” in Proc. 17th Int. Conf. Mach. Learn., 2000, pp. 839–846.
[24] C.-W. Hsu and C.-J. Lin, “A comparison of methods for multi-class support vector machines,” IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415–425, Mar. 2002.
[25] T. Luo, K. Kramer, D. B. Goldgof, L. O. Hall, S. Samson, A. Remsen, and T. Hopkins, “Active learning to recognize multiple types of plankton,” J. Mach. Learn. Res., vol. 6, no. 4, pp. 589–613, 2005.
[26] G. Camps-Valls, L. Gomez-Chova, J. Muñoz-Marí, J. Vila-Francés, and G. Calpe-Maravilla, “Composite kernels for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 1, pp. 93–97, Jan. 2006.
[27] D. Tuia, F. Ratle, A. Pozdnoukhov, and G. Camps-Valls, “Multisource composite kernels for urban-image classification,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 1, pp. 88–92, Jan. 2010.
[28] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.
[29] D. Tuia, F. Pacifici, M. Kanevski, and W. J. Emery, “Classification of very high spatial resolution imagery using mathematical morphology and support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 11, pp. 3866–3879, Nov. 2009.
[30] F. Pacifici, M. Chini, and W. J. Emery, “A neural network approach using multi-scale textural metrics from very high-resolution panchromatic imagery for urban land-use classification,” Remote Sens. Environ., vol. 113, no. 6, pp. 1276–1292, Jun. 2009.
[31] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. A. Benediktsson, “SVM- and MRF-based method for accurate classification of hyperspectral images,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 736–740, Oct. 2010.
[32] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proc. Annu. ACM Workshop Comput. Learn. Theory, Jul. 1992, pp. 144–152.
[33] V. N. Vapnik, Statistical Learning Theory. New York, NY, USA: Wiley, 1998.
[34] T.-F. Wu, C.-J. Lin, and R. C. Weng, “Probability estimates for multi-class classification by pairwise coupling,” J. Mach. Learn. Res., vol. 5, pp. 975–1005, Aug. 2004.
[35] A. Beygelzimer, S. Kakade, and J. Langford, “Cover trees for nearest neighbor,” in Proc. 23rd Int. Conf. Mach. Learn., 2006, pp. 97–104.
[36] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York, NY, USA: Wiley, 2001.
[37] N. Srinivas and K. Deb, “Multiobjective function optimization using nondominated sorting genetic algorithms,” Evol. Comput., vol. 2, no. 3, pp. 221–248, 1995.
[38] N. Ghoggali, F. Melgani, and Y. Bazi, “A multiobjective genetic SVM approach for classification problems with limited training samples,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 6, pp. 1707–1718, Jun. 2009.
[39] E. Pasolli, F. Melgani, and M. Donelli, “Automatic analysis of GPR images: A pattern-recognition approach,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2206–2217, Jul. 2009.
[40] P. Soille, Morphological Image Analysis. New York, NY, USA: Springer-Verlag, 2004.
[41] J. Cohen, “A coefficient of agreement for nominal scales,” Educ. Psychol. Meas., vol. 20, no. 1, pp. 37–46, 1960.
[42] D. Tuia and J. Muñoz-Marí, “Learning user’s confidence for active learning,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 872–880, Feb. 2013.
[43] E. Pasolli, F. Melgani, N. Alajlan, and N. Conci, “Optical image classification: A ground-truth design framework,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 6, pp. 3580–3597, Jun. 2013.

Edoardo Pasolli (S’08–M’13) received the M.Sc. degree in telecommunications engineering and the Ph.D. degree in information and communication technologies from the University of Trento, Trento, Italy, in 2008 and 2011, respectively.
He was a Post-Doctoral Fellow with the Intelligent Information Processing Laboratory, Department of Information Engineering and Computer Science, University of Trento, from December 2011 to September 2012. He is currently a Post-Doctoral Fellow with the Computational and Information Sciences and Technology Office, NASA Goddard Space Flight Center, Greenbelt, MD, USA. His current research interests include processing and recognition techniques applied to remote-sensing images and biomedical signals (classification, regression, and machine learning).

Farid Melgani (M’04–SM’06) received the State Engineer degree in electronics from the University of Batna, Batna, Algeria, in 1994, the M.Sc. degree in electrical engineering from the University of Baghdad, Baghdad, Iraq, in 1999, and the Ph.D. degree in electronic and computer engineering from the University of Genoa, Genoa, Italy, in 2003.
He cooperated with the Signal Processing and Telecommunications Group, Department of Biophysical and Electronic Engineering, University of Genoa, from 1999 to 2002. Since 2002, he has been an Assistant Professor and then an Associate Professor of telecommunications with the University of Trento, Trento, Italy, where he has taught pattern recognition, machine learning, radar remote-sensing systems, and digital transmission. He is the Head of the Signal Processing and Recognition Laboratory, Department of Information Engineering and Computer Science, University of Trento. His current research interests include processing, pattern recognition, and machine learning techniques applied to remote sensing and biomedical signals/images (classification, regression, multitemporal analysis, and data fusion). He has co-authored more than 130 scientific publications and is a referee for numerous international journals.
Dr. Melgani has served on the scientific committees of several international conferences and is an Associate Editor of the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS.

Devis Tuia (S’07–M’09) was born in Mendrisio, Switzerland, in 1980. He received the Diploma in geography from the University of Lausanne (UNIL), Lausanne, Switzerland, in 2004, the Master of Advanced Studies in environmental engineering from the Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, in 2005, and the Ph.D. degree in environmental sciences at UNIL in 2009.
He was a Post-Doctoral Researcher with the University of València, València, Spain, and the University of Colorado at Boulder, CO, USA, under a Swiss National Foundation program. He is currently a Senior Research Associate with the LaSIG laboratory, EPFL. His current research interests include the development of algorithms for information extraction and classification of very high resolution remote sensing images using machine learning algorithms.

Fabio Pacifici (S’03–M’10–SM’13) received the Laurea (B.Sc.) (cum laude) degree and the Laurea Specialistica (M.Sc.) (cum laude) in telecommunication engineering, and the Ph.D. degree in geoinformation from Tor Vergata University, Rome, Italy, in 2003, 2006, and 2010, respectively.
He has been with DigitalGlobe since 2009. From 2005 to 2009, he was a Visiting Scientist with the Department of Aerospace Engineering Sciences, University of Colorado, Boulder, CO, USA. He has been involved in various remote sensing projects commissioned by the European Space Agency. His research activities are focused on processing of remote sensing images, data fusion, feature extraction, pattern recognition, and analysis of multi-angular and multi-temporal data. His current research interests include classification and change detection techniques for urban remote sensing applications using very high spatial resolution optical and/or synthetic aperture radar imagery, with special emphasis on machine learning.
Dr. Pacifici is the current Chair of the IEEE Geoscience and Remote Sensing Society Data Fusion Technical Committee and serves as an Associate Editor for the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (JSTARS). He was the recipient of the 2011 Best Reviewer Award from IEEE Geoscience and Remote Sensing Society for his service to IEEE JSTARS and the Best Student Paper Award at the 2009 IEEE Joint Urban Remote Sensing Event. He also ranked first at the 2007, 2008, and 2009-2010 IEEE Geoscience and Remote Sensing Society Data Fusion Contest.

William J. Emery (M’90–SM’01–F’03) received the Ph.D. degree in physical oceanography from the University of Hawaii, Honolulu, HI, USA, in 1975.
He was with Texas A&M University, College Station, TX, USA. He was with the University of British Columbia, Vancouver, BC, Canada, in 1978, where he created a satellite oceanography facility and education/research program. He was a Professor in aerospace engineering sciences with the University of Colorado, Boulder, CO, USA, in 1987. He is an Adjunct Professor of informatics with Tor Vergata University, Rome, Italy. He has authored over 175 refereed publications and two textbooks.
Dr. Emery is the Vice President for Publications of the IEEE Geoscience and Remote Sensing Society (GRSS). He was the recipient of the GRSS Educational Award in 2004 and its Outstanding Service Award in 2009. He has been a Fellow of the American Meteorological Society since 2010 and the American Astronautical Society since 2011. He was also elected a Fellow of the American Geophysical Union in 2012.
