
ISPRS Journal of Photogrammetry and Remote Sensing 87 (2014) 152–165

Contents lists available at ScienceDirect

ISPRS Journal of Photogrammetry and Remote Sensing

journal homepage: www.elsevier.com/locate/isprsjprs

Contextual classification of lidar data and building object detection in urban areas

Joachim Niemeyer a,*, Franz Rottensteiner a, Uwe Soergel b

a Institute of Photogrammetry and GeoInformation, Leibniz Universität Hannover, Nienburger Str. 1, D-30167 Hannover, Germany
b Institute of Geodesy, Remote Sensing and Image Analysis, TU Darmstadt, Franziska-Braun-Str. 7, D-64287 Darmstadt, Germany
Article history:
Received 8 July 2013
Received in revised form 4 November 2013
Accepted 4 November 2013
Available online 7 December 2013

Keywords:
LIDAR
Point cloud
Classification
Urban
Contextual
Building
Detection

Abstract

In this work we address the task of the contextual classification of an airborne LiDAR point cloud. For that purpose, we integrate a Random Forest classifier into a Conditional Random Field (CRF) framework. It is a flexible approach for obtaining a reliable classification result even in complex urban scenes. In this way, we benefit from the consideration of context on the one hand and from the opportunity to use a large amount of features on the other hand. Considering the interactions in our experiments increases the overall accuracy by 2%, though a larger improvement becomes apparent in the completeness and correctness of some of the seven classes discerned in our experiments. We compare the Random Forest approach to linear models for the computation of unary and pairwise potentials of the CRF, and investigate the relevance of different features for the LiDAR points as well as for the interaction of neighbouring points. In a second step, building objects are detected based on the classified point cloud. For that purpose, the CRF probabilities for the classes are plugged into a Markov Random Field as unary potentials, in which the pairwise potentials are based on a Potts model. The 2D binary building object masks are extracted and evaluated by the benchmark ISPRS Test Project on Urban Classification and 3D Building Reconstruction. The evaluation shows that the main buildings (larger than 50 m²) can be detected very reliably with a correctness larger than 96% and a completeness of 100%.

© 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
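The processing chain summarised in the abstract — per-point class probabilities from a classifier, followed by label smoothing with a Potts-type pairwise term and extraction of a building mask — can be sketched with a toy example. This is an illustrative sketch only, not the authors' implementation: the graph, the probabilities, the smoothing weight, and the use of ICM (iterated conditional modes) as the energy minimiser are all assumptions made for the sake of a runnable example.

```python
import math

# Toy two-stage sketch (NOT the authors' code): stage 1 is assumed to have
# produced per-point class probabilities (e.g. from a Random Forest); stage 2
# smooths the labels with a Potts pairwise term via ICM. All numbers are made up.

CLASSES = ["ground", "building", "tree"]
BETA = 0.8  # illustrative Potts smoothing weight

# Per-point class probabilities from stage 1.
probs = [
    [0.6, 0.3, 0.1],   # point 0
    [0.5, 0.4, 0.1],   # point 1
    [0.1, 0.5, 0.4],   # point 2, weak 'building' evidence
    [0.1, 0.8, 0.1],   # point 3
    [0.2, 0.7, 0.1],   # point 4
]
# Neighbourhood graph (indices of adjacent points), e.g. from a kNN search.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def icm(probs, neighbours, beta, sweeps=10):
    """Greedy energy minimisation: unary = -log p, pairwise = Potts penalty."""
    labels = [max(range(len(p)), key=lambda c: p[c]) for p in probs]
    for _ in range(sweeps):
        changed = False
        for i, p in enumerate(probs):
            def energy(c):
                unary = -math.log(p[c])
                pairwise = beta * sum(1 for j in neighbours[i] if labels[j] != c)
                return unary + pairwise
            best = min(range(len(p)), key=energy)
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
    return labels

labels = icm(probs, neighbours, BETA)
building_mask = [CLASSES[l] == "building" for l in labels]
```

In the paper the unary probabilities come from the CRF stage and the optimisation runs over a full point cloud or raster; ICM is used here only because it is the simplest energy minimiser that fits in a few lines.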

1. Introduction

Automated urban object extraction from remotely sensed data is a very challenging task due to the complexity of urban scenes. There are different types of objects, such as buildings, low vegetation, trees, fences, and cars, that can be found in a small local neighbourhood, which makes it difficult to extract them reliably. In order to handle this problem, research often focuses on the extraction of a single object type, e.g. buildings, roads, or trees; for overviews, cf. Mayer (2008) and Rottensteiner et al. (2012). Airborne LiDAR (Light Detection And Ranging) is a particularly useful technology for the acquisition of elevation data, with applications such as the generation of digital terrain models (DTM) (Kraus and Pfeifer, 1998), data acquisition for forestry (Reitberger et al., 2009), or power line monitoring (McLaughlin, 2006). LiDAR data are also well-suited for automated object detection for the generation of 3D city models. Building extraction is a prominent application in this context; two recent examples are Huang et al. (2013) and Liu et al. (2013).

For many applications a basic step in LiDAR processing is a classification of the point cloud: each 3D point in the irregularly distributed point cloud is assigned to a semantic object class. Due to the complexity of urban scenes this task is also difficult. It is the goal of this paper to present an approach for the classification of a LiDAR point cloud in urban areas without the use of image data providing spectral information. The only radiometric signal feature we have access to is the so-called intensity, which is a function of the amount of photons collected by the scanning device. After the classification, 2D building outlines are derived from the labelled point cloud.

1.1. Related work

In recent years research has mainly focused on the use of supervised statistical methods for classification in remote sensing, because they are more flexible in handling variations in the appearance of the objects to be extracted than model-based approaches. Besides generative classifiers modelling the joint distribution of the data and labels (Bishop, 2006), modern discriminative methods such as AdaBoost (Chan and Paelinckx, 2008), Support Vector Machines (SVM) (Mountrakis et al., 2011),

* Corresponding author. Tel.: +49 511 762 19387; fax: +49 511 762 2483.
E-mail addresses: [email protected] (J. Niemeyer), [email protected] (F. Rottensteiner), [email protected] (U. Soergel).

0924-2716/$ - see front matter © 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.isprsjprs.2013.11.001

and Random Forests (RF) (Breiman, 2001; Gislason et al., 2006) are used. They usually lead to simpler models and need fewer training data in relation to generative models. These classifiers are also applied to LiDAR processing tasks. For instance, Mallet (2010) used a point-based multi-class SVM for the classification of full-waveform (FW) LiDAR data, whereas Chehata et al. (2009) applied RF for that purpose. However, both approaches classify each point independently without considering the labels of its neighbourhood. This is a drawback leading to inhomogeneous results in complex scenes such as urban areas, as demonstrated for example in Niemeyer et al. (2011). The reason is the diversity of objects' appearances even within a single scene. Especially in urban areas, roofs of different shapes and other challenging objects with many details occur, leading to overlapping distributions of features within each class. Shadows caused by other objects, missing data due to the objects' properties, and random errors in the sensor data aggravate this effect. As a consequence, purely local decisions become uncertain.

An improvement can be achieved by incorporating contextual information, which is an important cue for the classification of objects in complex scenes. Spatial dependencies between the object classes can be trained to improve the results, because some object classes are more likely to occur next to each other than others; for instance, it is more probable that cars are situated on a street than on grassland. A sound statistical model of context leads to undirected graphical models (Bishop, 2006) such as Markov Random Fields (MRF) (Geman and Geman, 1984). In an MRF, the class label of an object is statistically dependent on its neighbours, whereas the data of different objects are assumed to be conditionally independent (Li, 2009). Conditional Random Fields (CRF) (Kumar and Hebert, 2006) offer a more general model. They drop the assumption of conditional independence of the data of different objects, expressed in the model of the unary potentials linking the class labels to the observations, and the interaction between neighbouring objects is modelled to depend on both the labels and the data in the pairwise potentials. CRFs have become a standard technique for considering context in classification processes, in particular for image classification (Kumar and Hebert, 2006; Schindler, 2012). They are also becoming more and more popular in the fields of photogrammetry and remote sensing. Some exemplary applications are multi-temporal image classification (Hoberg et al., 2012), building detection in radar images (Wegner et al., 2011), and classification of façade images (Yang and Förstner, 2011).

Applications of the CRF framework differ in the way they model the potentials and in the definition of the graph structure. For the unary potentials, the probabilistic output of a discriminative classifier is frequently used. Examples include linear models (Kumar and Hebert, 2006) and RF (Schindler, 2012). For the pairwise potentials, most approaches use relatively simple models favouring identical labels at neighbouring sites by penalising label changes, such as the Potts model. The contrast-sensitive Potts model (Boykov and Jolly, 2001) has the same effect, but adapts the degree of penalisation to the Euclidean distance of the feature vectors. Schindler (2012) carried out a comparison of these smoothing models applied to high resolution images. Although both methods perform rather well in the comparison, these simple models tend to over-smooth the results. Thus, a more complex model might improve the results at the cost of higher computational effort in training and of having to provide fully labelled training images. In Niemeyer et al. (2011) this was shown for the classification of LiDAR data of urban areas. In this case, linear models were used for both the unary and the pairwise potentials. In the latter case they were based on a multi-class model for the joint probability of the class labels at neighbouring sites rather than on a binary model for the probability of the two labels being equal. Nowozin et al. (2011) use RF classifiers for both types of potentials, also using a multi-class model for the interactions. In their examples, the random field is constructed over a (radiometric or depth) image grid. The neighbourhood system on which the edges of the graphical model are defined may vary with the application, but the interactions are restricted to pairs of nodes. Lucchi et al. (2012) use a CRF based on structured SVM (SSVM), which includes an SVM model for the pairwise terms. In their case, the graphical model is built on segments (superpixels), which reduces the computational complexity compared to a pixel-based classification.

Lucchi et al. (2011) have doubted the contribution of CRF-like models for classification, showing that methods for classifying superpixels and applying global features can achieve a similar performance as CRF-based models in the classification of standard data sets. Their discussion is limited to images and to CRF-based models involving neighbourhood terms that depend on the relative alignment of objects in an image. They also show the effects of global constraints based on the co-occurrence statistics of objects in a scene. We think that the type of geometrical pairwise model used in Lucchi et al. (2011) ("sky should appear above grass") is not applicable to remote sensing images, because it requires the definition of an absolute reference direction (e.g. the vertical in images having a horizontal viewing direction). Of course, height differences are important features in the context of point cloud classification, but the relative alignment in planimetry follows a similar structure as in aerial images. The benefits of using global energy terms such as those based on co-occurrence statistics, also proposed in Ladický et al. (2013), would also seem to be doubtful for the classification of remotely sensed images. In the urban remote sensing case, we usually have a small set of objects which always occur in a scene together (e.g., roads, buildings, trees and cars), so that the global information about their co-occurrence would not seem to carry much discriminative power.

The first research on context-based point cloud labelling was carried out in the fields of robotics and mobile terrestrial laser scanning. Anguelov et al. (2005) proposed a classification of a terrestrial point cloud into four object classes with Associated Markov Networks (AMN), a subclass of MRF. Neighbouring points are assumed to belong to the same object class with high probability, which leads to an adaptive smoothing of the classification results. In order to reduce the number of graph nodes, ground points are eliminated based on thresholds before the actual classification. Munoz et al. (2008) also used point-based AMNs, but they extended the original isotropic model to an anisotropic one, in order to emphasise certain orientations of edges. This directional information enables a more accurate classification of objects like power lines. Rusu et al. (2009) were interested in labelling an indoor robot environment described by point clouds. For object detection, points are classified using CRFs according to the geometric surface they belong to, such as cylinders or planes. They applied a point-wise classification method, representing every point as a node of the graphical model. Compared to our application they deal with few points (80,000), and they even reduce this data set by about 70% before the classification, based on some restrictions concerning the objects' positions. Shapovalov et al. (2013) also classified point clouds in indoor scenes, building a graphical model on point cloud segments. They consider long-range dependencies by so-called structural links, also based on special directions such as the vertical, the direction to the sensor or the direction to the nearest wall. In an indoor scenario, walls can be detected using heuristics (Shapovalov et al., 2013). However, in the airborne case, the number of points on walls is usually relatively low. The classification of points on walls might be one of the problems one would like to solve by a CRF-based model, and the vertical and the direction to the sensor are nearly coincident. CRFs were also used by Lim and Suter (2007) for the point-wise classification of terrestrial LiDAR data. They coped with the computational

complexity by adaptive point reduction. The authors improved their approach (Lim and Suter, 2009) by segmenting the points in a first step and classifying the resulting superpixels. They also considered both a local and a regional neighbourhood. Introducing multiple scales into a CRF represented by long-range links between superpixels, Lim and Suter (2009) improved the classification accuracy by 5–10%. This shows the importance of considering larger regions instead of only a very local neighbourhood of each 3D point for a correct classification. An alternative to long-range edges, which might lead to a huge computational burden if points are to be classified individually, is the computation of multi-scale features. They enable a better classification of points with locally similar features. Although such points belong to different objects, the variation of the regional neighbourhood can support the discrimination between the object types, and hence lead to a correct labelling. An approach utilising CRFs for remotely sensed LiDAR point clouds is presented by Lu et al. (2009). A DTM is derived from a digital surface model (DSM) by applying a hybrid CRF classifying the points into ground and non-ground points. At the same time, terrain heights are estimated. The work of Shapovalov et al. (2010) focuses on the classification of airborne LiDAR points, discerning the five object classes ground, building, tree, low vegetation, and car. The authors addressed the drawbacks of AMN by applying a non-associative Markov Network, which is able to model all class relations instead of only preferring the same label for both linked nodes. First, the data are over-segmented, and then a segment-wise CRF classification is performed. Whereas this aspect helps to cope with noise and computational complexity, the result heavily depends on the segmentation. Small objects with sub-segment size cannot be detected, and important object details might be lost, which is, of course, a drawback of all segment-based algorithms. Shapovalov et al. (2010) show that using a segmented point cloud will lead to a loss of 1–3% in overall accuracy due to segmentation errors and due to the fact that classes having few samples, such as cars, might be merged with the background. Whereas this does not seem to be much, it may become relevant if the classes of interest are the ones most affected by these problems. Lafarge and Mallet (2012) use an MRF for the classification of a point cloud. As their main interest is in buildings, they set up a simple heuristic model for the unary potentials that requires no training, whereas the Potts model is used for the pairwise potentials. The smoothing parameter of the Potts model is also tuned manually. This may be sufficient for the particular application in this paper, but it would seem to be more problematic if the number of classes to be discerned is increased. Xiong et al. (2011) show how point-based and region-based classification of LiDAR data can interact. They propose a hierarchical sequence of relatively simple classifiers applied to segments and points. Starting either with an independent classification of points or segments, in subsequent steps the output of the previous step is used to define context features that help to improve the classification results. In each classification stage, the results of the previous stage are taken for granted, and unlike with CRF, no global optimum of the posterior distribution of all labels is searched (Boykov and Jolly, 2001; Kumar and Hebert, 2006).

Point cloud labelling is only a first step for the extraction of objects such as buildings from a point cloud. The second step comprises the transition from the point cloud to contiguous objects, e.g. represented by boundary polygons. Sampath and Shan (2007) derived 2D boundary polygons by applying a modified convex hull algorithm directly to segments of points classified as buildings. However, such an approach requires parameters related to the mean point distance and may be sensitive to irregular point distributions. Dorninger and Pfeifer (2008) use alpha-shapes (Edelsbrunner and Mücke, 1994), which also require a careful tuning of the parameter α. Lafarge and Mallet (2012) determine coarse building outlines by detecting 3D line segments in the subset of the point cloud classified as buildings. Again, this requires the selection of a threshold. These initial boundaries are improved after a planar segmentation of the point cloud, which is required because the final goal of Lafarge and Mallet (2012) is the 3D reconstruction of buildings. The planar segmentation also makes use of an MRF to obtain a geometrically consistent subdivision of the point cloud, but this MRF-based approach is only applied to the part of the point cloud classified as belonging to buildings. Poullis (2013) relies on the clustering of point cloud segments to define individual objects. This procedure results in a binary building mask in which the building boundaries are not represented very accurately. Holes in the data are removed by an initial interpolation (though no details are explained in the paper). The building outlines are improved using a graphical model which classifies boundary points according to the alignment of the local boundary with previously determined dominating orientations.

1.2. Contribution

It is the first goal of this paper to present a probabilistic approach for the contextual classification of point clouds in urban areas. For that purpose we apply a CRF framework with a complex interaction model that is also capable of modelling the local spatial structure of the data. The proposed supervised classifier is able to learn context in order to find the most likely label configuration. Following the discussion in Section 1.1 and in Niemeyer et al. (2011), we apply a point-based classification to preserve even small objects. Going beyond our previous work, the pairwise potentials are based on RF, but unlike in Nowozin et al. (2011), our graph is based on points and is, thus, irregular. We compare the new model with two variants of linear models in order to determine their effectiveness with respect to computational time and classification accuracy. Moreover, we analyse the influence of individual features. Whereas Chehata et al. (2009) use the RF variable importance to investigate the relevance of features for each class, we also take into account interaction classes between neighbouring points. In this context an experiment with only the most important features is carried out, and the results are compared to the classification based on all features.

The second goal of this paper is the detection of building objects based on the classified point cloud. In our previous work (Niemeyer et al., 2013) we tried to achieve this goal using a 2D raster-based analysis, filling the pixels of a 2D building mask using the results of the point-based classification. Gaps in the data were closed by performing a morphological closing. In this paper, we present a more sophisticated post-processing technique, using the posterior probabilities of the CRF-based classification in an MRF to generate a 2D multi-label image that is consistent with the labelled point cloud. MRF were also used by Lafarge and Mallet (2012) for this purpose, but in a more complex scenario in order to achieve the goal of 3D building reconstruction. This would seem to be a large overhead for the goal we want to achieve here. In addition, the MRF-based label propagation method proposed by Lafarge and Mallet (2012) is specific to buildings and could not be applied to other object classes.

The performance of the proposed method is demonstrated and evaluated on a benchmark data set with three complex urban scenes. For the 3D classification we discern seven classes and carry out a quantitative evaluation based on 3D reference data generated manually. The 2D building objects are evaluated in the context of the ISPRS test project on urban classification and 3D building reconstruction (Rottensteiner et al., 2012).

This paper is organised as follows. The next section presents our methodology, including a brief description of CRF and the generation of 2D building objects. After that, Section 3 comprises the

evaluation of the point cloud classification and the extraction of objects. The paper concludes with Section 4.

2. Methodology

It is the goal of point cloud classification to assign an object class label $y_i$ to each 3D point $i$. Common approaches such as RF and SVM usually consider solely the features of each point separately and, thus, classify it independently of its neighbourhood. We use a CRF, which is able to incorporate context in the label assignment step. As a consequence, all points are labelled simultaneously. Typical class relations are learned and improve the results. The following sections describe the CRF framework and the generation of 2D building objects based on the classification results.

2.1. Conditional Random Fields

CRF belong to the family of undirected graphical models with an underlying graph $G(\mathbf{n}, \mathbf{e})$ consisting of nodes $\mathbf{n}$ and edges $\mathbf{e}$. In our case, each node $n_i \in \mathbf{n}$ corresponds to a 3D point. We assign class labels $y_i$ to all points simultaneously based on observed data $\mathbf{x}$. The vector $\mathbf{y} \in \Omega$ contains the labels $y_i$ for all nodes, and hence has the same number of elements as $\mathbf{n}$. The graph edges $e_{ij}$ are used to model the relations between pairs of adjacent nodes $n_i$ and $n_j$, and thus enable representing contextual relations. For that purpose, each point $n_i$ is linked to other points ($n_j \in N_i$) by edges. CRF are discriminative classifiers that model the posterior distribution $p(\mathbf{y} \mid \mathbf{x})$ directly (Kumar and Hebert, 2006):

$$p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \left( \prod_{i \in \mathbf{n}} \phi_i(\mathbf{x}, y_i) \prod_{i \in \mathbf{n}} \prod_{j \in N_i} \psi_{ij}(\mathbf{x}, y_i, y_j) \right). \quad (1)$$

In Eq. (1), $N_i$ is the neighbourhood of node $n_i$, corresponding to the edges linked to this particular node. The two terms $\phi_i(\mathbf{x}, y_i)$ and $\psi_{ij}(\mathbf{x}, y_i, y_j)$ are called the unary and pairwise potentials, respectively; they are explained in the next sections. The partition function $Z(\mathbf{x})$ acts as a normalisation constant, turning potentials into probabilities.

2.1.1. Definition of the graph

Compared to images, a point cloud is more complex because points are irregularly distributed in 3D space. For points there is no straightforward definition of the neighbourhood that can be used to define the edges of the graph. In contrast, images are arranged in a lattice, and each pixel has a defined number of neighbours (usually four or eight). In our case each point is linked by edges to its $k$ nearest neighbours in 2D, which corresponds to a cylindrical neighbourhood. In contrast to a spherical neighbourhood, important edges with height differences, for instance from the canopy to the ground, are more likely to occur in such a graph. They might give valuable hints for the local configuration of classes (Niemeyer et al., 2011).

2.1.2. Unary potential

In the unary potentials the data are represented by node feature vectors $\mathbf{h}_i(\mathbf{x})$. For each node $n_i$ such a vector is determined taking into account not only the data $x_i$ observed at the corresponding point, but also at the points in a certain neighbourhood. The particular definition of the node features depends on the data sets; the features we used in our experiments are described in Section 3.2.1. Using these node feature vectors $\mathbf{h}_i(\mathbf{x})$, the unary potential $\phi_i(\mathbf{x}, y_i)$ linking the data to the class labels determines the most probable label for a single node given its site-wise features. It is modelled to be proportional to the probability for $y_i$ given the data:

$$\phi_i(\mathbf{x}, y_i) \propto p(y_i \mid \mathbf{h}_i(\mathbf{x})). \quad (2)$$

This is a very general formulation which allows the use of any discriminative classifier with a probabilistic output for the unary potential (Kumar and Hebert, 2006). In this work we chose a linear model and an RF classifier for the computation of the unary potential to enable a comparison. Linear models were used for the definition of the unary potentials for instance by Kumar and Hebert (2006). RF have been shown to be well suited for the (non-contextual) classification of LiDAR data in Chehata et al. (2009).

In case of the linear model, the unary potential is defined according to Eq. (3):

$$\phi_{i,LM}(\mathbf{x}, y_i = l) = \exp\left( \mathbf{w}_l^T \cdot \mathbf{h}_i(\mathbf{x}) \right). \quad (3)$$

In Eq. (3), the feature vectors $\mathbf{h}_i(\mathbf{x})$, including an additional bias feature that always takes the value 1 (Bishop, 2006), are multiplied by a weight vector $\mathbf{w}_l$. There is one such vector $\mathbf{w}_l$ for each class $l$, and these vectors are determined in the training stage.

As in Niemeyer et al. (2011) we restrict ourselves to the assumption that the classes are linearly separable. For many data sets this assumption is not valid, and a feature space mapping based on a quadratic feature expansion (Kumar and Hebert, 2006) leads to better results. Such a generalised linear model (GLM) improved the classification (Niemeyer et al., 2012), but it comes along with significantly higher computational costs, as the number of parameters increases considerably. Thus, this method is only applicable for a small number of features and is not able to handle many features as in our application. For this reason we focus on linear models and do not use a GLM in this study.

The other classification method used for the definition of the unary potentials in this paper is RF. Based on its design, this classifier is directly appropriate for discerning multiple object classes, and it can handle many features (Gislason et al., 2006). RF do not require any assumptions about the distribution of the data. An RF is a bootstrap ensemble classifier based on decision trees. It consists of a number $n_T$ of trees grown in a training step. Each internal node of any tree contains a test to find the best feature and a corresponding threshold splitting the data into two parts. The combination optimising a criterion, for example the Gini gain (Breiman, 2001), is chosen. We used an RF implementation for MATLAB (Abhishek, 2009). In this implementation, the depth of a tree depends on the separability by these features. A random subset of $m$ features is evaluated at each node, and the thresholds are randomly tested. Each tree is grown until there is only one sample in each leaf node. The classification is performed by presenting the features of an unknown sample to all the trees. Each tree $t_i$ of the $n_T$ trees casts a vote for the most likely class. If the number of votes cast for a class $l$ is $N_l$, the unary potential is defined by

$$\phi_{i,RF}(\mathbf{x}, y_i = l) = \exp(N_l / n_T). \quad (4)$$

The exponential of the vote ratio is used to avoid potentials of value 0 for unlikely classes.

It is an advantage of RF that a value for the feature importance can easily be obtained. For the importance measurement of a feature, its values are randomly permuted. In this way, the absence of this feature can be modelled. Then, the number of correctly classified points before and after permuting the feature is compared. In case of a large difference between both results, the importance of this feature is high for the classification task. The reader is referred to Breiman (2001) for more details. This method is used to determine the feature importance in our application in Section 3.2.4.

2.1.3. Pairwise potential

The second term in Eq. (1) represents the pairwise (or binary) potential $\psi_{ij}(\mathbf{x}, y_i, y_j)$ and incorporates the contextual relations explicitly in the classification. It models the dependencies of a node

n_i from its adjacent node n_j by comparing both node labels and considering the observed data x. In the pairwise potentials, the data are represented by interaction feature vectors μ_ij(x), which are computed for each edge e_ij. In our case, each point is linked to the points in its direct neighbourhood. Due to the similarity of the features at neighbouring nodes, a difference of both node feature vectors, μ_ij(x) = h_i(x) − h_j(x), would lead to a vector with most elements close to zero. Experiments have shown that this may work when only a few classes have to be discerned and only a small number of features is used, but it is not sufficient to distinguish several classes of interactions using a large number of features. Hence, we improve on our previous work (Niemeyer et al., 2013) and obtain μ_ij(x) by concatenating the features of both node feature vectors h_i(x) and h_j(x) and computing some differences, as explained in Section 3.2.1.

In many CRF-based applications, the pairwise potentials are based on the probability of two neighbouring node labels y_i and y_j being identical given μ_ij(x) (Kumar and Hebert, 2006):

ψ_ij(x, y_i, y_j) ∝ p(y_i = y_j | μ_ij(x))    (5)

The contrast-sensitive Potts model, which penalises a class change unless indicated by a change in the data (Schindler, 2012), belongs to this group of models. More complex models can be based on the joint posterior probability of the two node labels y_i and y_j given μ_ij(x):

ψ_ij(x, y_i, y_j) ∝ p(y_i, y_j | μ_ij(x))    (6)

These models allow learning that certain class relations may be more likely than others given the data. This information is used to improve the quality of the classification, with the drawback of more parameters that have to be determined. Again, we apply a linear model and an RF classifier to obtain the probabilities for the interactions. Similarly to Eq. (3), the linear model for the pairwise potential is designed as

ψ_ij,LM(x, y_i = l, y_j = k) = exp(v_{l,k}^T · μ_ij(x)),    (7)

with one edge weight vector v_{l,k} for each label configuration l and k of adjacent nodes n_i and n_j. The pairwise potential based on RF is defined by

ψ_ij,RF(x, y_i = l, y_j = k) = exp(N_{l,k} / n_T).    (8)

In Eq. (8), N_{l,k} is the number of votes per interaction for the class labels l and k. In both cases, if c classes have to be discerned for the nodes of the graph, there are c² local configurations of classes involving two neighbouring nodes. Thus, the models for the pairwise potentials correspond to probabilistic classifiers having to discern c² classes (each corresponding to a local configuration of classes).

To sum up, two types of CRFs are used in this work. In CRF_LM, the potentials φ_i,LM and ψ_ij,LM are based on linear models, whereas CRF_RF is modelled using RF for both potentials (φ_i,RF, ψ_ij,RF). A comparison of these approaches is carried out in Section 3.2.2. In each case the unary and pairwise potentials are weighted equally. A relative weighting factor can be trained in future work using cross validation, as proposed by Shotton et al. (2009).

2.1.4. Training and inference
In the context of graphical models, inference is the task of determining the optimal label configuration by maximising p(y|x) for given parameters. For the large graph with cycles in our application, exact inference is computationally intractable, and approximate methods have to be applied. We use the standard message passing algorithm Loopy Belief Propagation (LBP) (Frey and MacKay, 1998) as implemented by Schmidt (2012). Although this technique does not ensure convergence to the global optimum, it has been shown to provide good results (Vishwanathan et al., 2006).

In the training of the linear models, the weight vectors have to be determined. We compare two versions for training the linear models in our experiments. For the first one, CRF_LM,separate, the weights are concatenated in the two parameter vectors θ_unary = [w_1, …, w_c]^T and θ_pairwise = [v_{1,1}, …, v_{c,c}]^T, with c classes. We assume a Gaussian prior for the parameters with zero mean, thus θ ∼ N(0, σ·I), with standard deviation σ = 1. Using this prior, we perform a Bayesian estimation of the parameter vectors by minimising the objective functions in Eqs. (9) and (10) separately:

f_unary = −log(p(θ_unary | x, y) · p(θ_unary))    (9)
f_pairwise = −log(p(θ_pairwise | x, y) · p(θ_pairwise))    (10)

In order to minimise these functions, we use the L-BFGS (limited memory Broyden–Fletcher–Goldfarb–Shanno) optimisation method. It is a quasi-Newton approach that approximates the inverse of the Hessian matrix (Liu and Nocedal, 1989). In this case we obtain the best weights for each single potential, but these do not necessarily match the best combination of both potentials. The advantage of this method is that samples for each class and class relation, respectively, can be drawn from the fully labelled training area, leading to a speed-up of the learning process.

The other version, denoted as CRF_LM,full, is also based on linear models, but it determines the parameters simultaneously by concatenating all weights w_l and v_{l,k} in a single parameter vector θ_full = [w_1, …, w_c, v_{1,1}, …, v_{c,c}]^T, which is determined by optimising

f_full = −log(p(θ_full | x, y) · p(θ_full))    (11)

For each iteration, the value of f_full, its gradient, as well as an estimate of the partition function Z(x) are required. We used the method described in Vishwanathan et al. (2006) based on L-BFGS combined with LBP for inference. On the one hand, a better classification result might be expected in this case because this approach delivers the optimal feature weights for the given combination of both potentials. However, training then requires inference on the graphical model (Vishwanathan et al., 2006) and, thus, a connected part of the training data representing all points and interactions. Usually, more data have to be considered for parameter estimation in the training process compared to CRF_LM,separate.

In the case of CRF_RF, two independent RFs have to be trained for the unary and pairwise potentials due to the different numbers of classes. In this study, the RF implementation considers the Gini gain for training the trees of CRF_RF. The size m of the random feature subset is set to the square root of the number of input features, following Gislason et al. (2006). As RFs optimise the overall error rate, a class with many samples might lead to a bias in the training step. Thus, the training set is balanced by randomly selecting the same number of samples for each class, applying downsampling or oversampling depending on the actual number of training samples available for each class (Chen et al., 2004).

2.2. Generation of 2D objects

As pointed out in Section 1.2, one of the goals of our work is to detect buildings in the scene. The result of the previous step is a labelled point cloud. Here we want to derive a 2D representation in the form of a binary building mask, which can be used to derive polygons describing the building outlines. For that purpose, a 2D grid aligned with the XY plane of the object coordinate system is defined, and all points are projected to this grid. However, due to the irregular distribution of the points, some pixels remain empty. In order to deliver accurate object masks or boundaries, these holes in the image data must be closed. A simple morphological closing
operation as used in our previous work (Niemeyer et al., 2013) is not sufficient: not only the holes were closed, but also some spaces between two buildings, resulting in false positives. On the other hand, single wrongly classified points resulted in errors in the object mask, which we tried to remove by morphological opening in Niemeyer et al. (2012). However, this also removed objects with a small spatial extent. Thus, it turned out to be difficult to find a post-processing step maintaining the correct information while at the same time eliminating outliers. A better approach is needed.

We build another graphical model to solve this problem. We use a grid-based solution, looking for a 2D building mask as the basis for deriving the building outlines. In this case, the pixels of the image grid correspond to the nodes of the graphical model. The edges link each pixel to its four direct neighbours on the grid. We use the normalised beliefs p_CRF for each node obtained by LBP in the original CRF-based classification to define the unary potentials in the second (the MRF-based) classification. To be precise, in each pixel i of the grid, the averages of the CRF beliefs of all points falling into this pixel (N_p) are computed for each class l, and these average beliefs are used to define the unary potentials for these pixels in the MRF. For pixels not containing a single LiDAR point, we assume all classes to be equally probable (Eq. (12)):

log φ_i,MRF(x, y_i = l) = (1/‖N_p‖) · Σ_{p ∈ N_p} p_CRF(y_p = l | h_p(x))   if ‖N_p‖ > 0
log φ_i,MRF(x, y_i = l) = 1/c                                              if ‖N_p‖ = 0    (12)

In Eq. (12), ‖N_p‖ is the number of points falling into a pixel and c is the number of classes to be discerned. Note that compared to the original classification of the point cloud, we might reduce the number of classes to be distinguished in the MRF. For instance, walls would always appear beneath building roofs and would thus disappear in a 2.5D analysis. Hence, some classes are not considered in the MRF. For the excluded classes, the beliefs from the original point cloud classification are simply not considered. As a consequence, the values used for the unary potentials, while still being consistent with the general requirements of potentials, are not probabilities because they do not necessarily sum to 1. We choose a multi-class setting rather than a binary classification because in the future we want to expand this method to other objects, e.g. trees.

The pairwise potential is represented by a Potts model (Eq. (13)), favouring neighbouring pixels i and j to have the same labels:

log ψ_ij,MRF(y_i = l, y_j = k) = λ   if l = k
log ψ_ij,MRF(y_i = l, y_j = k) = 0   if l ≠ k    (13)

In Eq. (13), the parameter λ expresses the relative weighting of both potentials (and hence the degree of smoothing). It is set manually in our experiments. The smoothing effect of the MRF closes holes of pixels without corresponding LiDAR points. As the unary potentials of these 'empty' pixels are initialised on the assumption of an equal distribution of the class labels, the Potts model can infer the information of the neighbouring pixels for such a pixel without LiDAR points. As the parameter of the Potts model is chosen manually and because we use the outcomes of the first classification to define the unary potentials, we do not need an additional training step for the MRF. For a CRF, a more complex model considering the interaction features would be required, which usually must be trained. This is the reason why an MRF is applied for this task. Again, we use LBP for obtaining the optimal configuration of class labels based on the definition of the potentials according to Eqs. (12) and (13). From the final multi-label image, the binary building masks are derived by considering only the building class. In a post-processing step, only building pixels which are classified reliably are maintained in order to obtain only reliable objects. For that purpose, we compute the difference between the maximum and the second largest belief and only maintain building pixels with a difference in belief larger than a user-defined threshold.

3. Evaluation

This section presents the experiments to evaluate the performance of our approach. In Section 3.1 we describe the data set which is used for evaluation. Section 3.2 is dedicated to the evaluation of our CRF-based method for point cloud classification, whereas Section 3.3 presents the evaluation of our 2D building detection approach.

3.1. Study area

The performance of our method is evaluated on the LiDAR data set of Vaihingen, Germany (Cramer, 2010), in the context of the ISPRS Test Project on Urban Classification and 3D Building Reconstruction (Rottensteiner et al., 2012). It was acquired in August 2008 by a Leica ALS50 system with a mean flying height of 500 m above ground. The point density in the test areas is approximately 8 points/m². Multiple echoes and intensities were recorded. However, only very few points (2.3%) are multiple returns, as the acquisition took place in summertime under leaf-on conditions. Hence, the vertical point distribution within trees is such that most points describe only the canopy.

For the benchmark, three test sites with different scenes are considered (Fig. 1). Area 1 is situated in the centre of the city of Vaihingen. Dense, complex buildings and some trees characterise this test site. Area 2 consists of a few high-rising residential buildings surrounded by trees. In contrast, Area 3 is a purely residential neighbourhood with small, detached houses.

As the benchmark only provides reference data for 2D objects, we manually labelled the point cloud of the three test areas to enable an evaluation of the 3D classification results. The combined point cloud consists of 780,879 points. We discern the following seven object classes: grassland (22.6%), road (27.6%), building with gable roof (15.3%), low vegetation (6.4%), façade (4.2%), building with flat roof (6.3%), and tree (17.6%), where the numbers in brackets give the distribution of the object classes in the combined reference point cloud. The class low vegetation also contains the cars. In order to train the CRFs, a training area consisting of 263,368 labelled points to the south east of Area 1 is used. All experiments are performed with Matlab on a computer with an Intel Core i7 2.80 GHz CPU and 16 GB RAM.

3.2. 3D point cloud classification

In Section 3.2.1 the definition of the site-wise feature vectors is given. This is followed by a comparison of different versions of our CRF-based approach in Section 3.2.2. The results of the 3D point cloud classification are presented in Section 3.2.3. In Section 3.2.4 we analyse the importance of the features used for the nodes and interactions.

3.2.1. Features
We adapted some of the LiDAR features presented in Chehata et al. (2009) and also used some additional ones which we consider to be well suited to this particular classification task. Note that only LiDAR features are utilised for the classification of the point cloud. We do not take into account features obtained from the optical images which are also available for this area. The following features are used:
Fig. 1. Test sites of scene Vaihingen: 'Inner City' (Area 1, left), 'High-Riser' (Area 2, middle) and 'Residential' (Area 3, right) (Rottensteiner et al., 2012).

1. intensity;
2. ratio of the echo number per point and the number of echoes in the waveform;
3. height above DTM;
4. approximated plane (points in a spherical neighbourhood of radius r are considered): sum, mean and standard deviation of the residuals, direction and variance of the normal vector;
5. variance of point elevations in a cylinder and in a sphere of radius r;
6. ratio of the point densities in a cylinder and a sphere of radius r;
7. eigenvalue-based features in a sphere of radius r: the 3 eigenvalues (λ1, λ2, λ3), omnivariance, planarity, anisotropy, sphericity, eigenentropy, scatter (λ1/λ3) (Chehata et al., 2009);
8. point density in a sphere of radius r;
9. principal curvatures k1 and k2, mean and Gaussian curvature in a sphere of radius r;
10. variation of intensity, omnivariance, planarity, anisotropy, sphericity, point density, number of returns, k1, k2, mean curvature, and Gaussian curvature in a sphere of radius r.

The DTM for the feature height above DTM is generated using robust filtering (Kraus and Pfeifer, 1998) as implemented in the commercial software package SCOP++¹. The features considering the local point distribution within a sphere or a cylinder are computed for multiple scales with radii r = 1, 2, 3, and 5 m. The number of scales was chosen empirically; using more scales did not improve the classification results. In total, the feature vector h_i(x) for node n_i used for the 3D classification consists of 131 entries. Note that for some experiments (Section 3.2.2) only a subset of these features is used.

For the interactions, a feature vector μ_ij(x) is required for each edge e_ij. As discussed in Section 2.1.3, the difference of the two node feature vectors h_i(x) and h_j(x) is not promising in this case. The height difference is an important piece of information, but in addition the actual height above ground is needed, for instance, to distinguish between a relation of points on a roof and one on the road level. Hence we concatenate the original feature vectors obtained for scale r = 1 m of both nodes, and additionally compute the differences of the elevation and intensity values. Both points have very similar local neighbourhoods, so considering the other scales in addition would not contribute significant information to support the classification. As a consequence, each interaction feature vector μ_ij(x) consists of 72 elements.

3.2.2. Comparison of linear models and Random Forests
In this section we compare the three versions of our CRF-based classifier introduced in Section 2.1. We classified the three test areas independently using each version of our classifier. In all experiments, each point was linked to its three nearest neighbours in 2D in the graphical model. In all cases we used only the 35 features with the scale r = 1 m, because the versions based on the linear model for the potentials require very long computation times when the number of features is large. For the versions based on the linear model, we normalised the features for all points by subtracting the mean values and dividing by the standard deviations.

The first CRF version based on a linear model, CRF_LM,separate, trains the unary and pairwise potentials independently from each other. This allows us to select only a subset of samples from the training set. For this test we consider 2000 randomly drawn samples per class and class relation, respectively. These samples do not have to be neighbouring in this case. Training results in the best weights for each single potential, but these weights do not necessarily correspond to the best combination of both potentials in the graphical model.

In the second CRF version based on a linear model, CRF_LM,full, the weights for both potentials are determined simultaneously. In this case, a complete part of the training data must be used. We utilised a part of the entire training point cloud consisting of 156,667 points, which resulted in a longer training time than the one required for CRF_LM,separate. This approach was also applied in Niemeyer et al. (2011).

The third CRF version to be compared in this section is based on RF (CRF_RF). As described in Section 2.1.4, we make use of two independent RF classifiers for the unary and the pairwise potentials. They are trained separately, so that in this respect this approach is comparable to CRF_LM,separate. This is why we use the same 2000 randomly drawn samples per class and class interaction for the training of the RF classifiers as we use for training in the version

Table 1
Comparison of the three versions of our CRF-based classifier (CRF_LM,separate, CRF_LM,full, CRF_RF). For the 7 classes, the completeness/correctness rates are presented. CRF_RF outperforms the other versions in nearly all values.

                        CRF_LM,separate   CRF_LM,full   CRF_RF
Overall accuracy        75.7%             76.5%         80.6%
Kappa index             0.70              0.71          0.76
Grassland               76.3/79.4%        75.6/77.0%    82.2/81.0%
Road                    88.3/86.6%        87.1/84.8%    88.1/91.1%
Building (gable)        83.1/85.6%        90.2/80.2%    91.1/91.2%
Low vegetation          69.3/47.1%        63.8/46.9%    77.2/49.6%
Façade                  47.3/43.5%        39.0/63.5%    52.9/52.8%
Building (flat)         91.5/52.0%        78.2/58.3%    90.3/63.4%
Tree                    52.0/90.0%        61.6/87.1%    61.7/91.3%
Training time           60.1 min          459.4 min     20.0 min
Classification time     81.3 min          75.3 min      3.4 min

¹ http://www.trimble.com/imaging/inpho/geo-modeling.aspx?dtID=SCOP
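The interaction feature vector described in Section 3.2.1 — the node feature vectors of both edge points at scale r = 1 m concatenated, plus the elevation and intensity differences — can be sketched as follows (illustrative Python; the function and variable names are ours, not from the paper, and the 35 node features are assumed to be given as plain lists):

```python
def interaction_features(h_i, h_j, z_i, z_j, intensity_i, intensity_j):
    """Interaction feature vector mu_ij for one edge e_ij: concatenate the
    node feature vectors of both points (assumed here to be the 35 features
    at scale r = 1 m) and append the elevation and intensity differences."""
    assert len(h_i) == len(h_j)
    return list(h_i) + list(h_j) + [z_i - z_j, intensity_i - intensity_j]

# With 35 node features per point this yields 2 * 35 + 2 = 72 elements,
# matching the size stated in Section 3.2.1.
mu_ij = interaction_features([0.0] * 35, [1.0] * 35,
                             z_i=271.3, z_j=268.9,
                             intensity_i=120.0, intensity_j=95.0)
print(len(mu_ij))  # 72
```

The concatenation (rather than a plain difference) keeps the absolute heights of both points available, which the text identifies as necessary to separate roof-level from road-level relations.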
CRF_LM,separate. We use RFs consisting of 300 trees for both potentials, a value that was found empirically.

The results of the comparison for all three study areas can be seen in Table 1. The table shows the overall accuracies (OA), kappa indices, completeness and correctness rates for all classes, as well as the time required for training and classification. The experiment reveals that the approach CRF_LM,separate achieves only slightly worse results than CRF_LM,full, in which both potentials were optimised simultaneously. The difference in OA is only 0.8% (75.7% vs. 76.5%). CRF_RF outperforms both versions based on a linear model with an OA of 80.6% (+4.1% compared to CRF_LM,full). The variations of the kappa indices are in a similar range. In most cases the completeness and correctness values of the seven classes are best for RF. Especially buildings with gable roofs, grassland and low vegetation benefit from the RF approach. Concerning the time needed for training and classification, RF is much faster than both linear model methods (in total 23.4 min compared to 141.4 min (CRF_LM,separate) and 534.7 min (CRF_LM,full), respectively). In particular, the classification time is significantly longer for the variants based on a linear model for the potentials: the computation of the potentials as well as LBP require more time. The latter aspect might indicate that the classes cannot be separated well by a linear decision surface in this case.

To sum up, CRF_RF is the most accurate and fastest method in our comparison. This is why we use this version in the experiments in all subsequent sections. As an RF can cope with a large number of features, from now on we are able to incorporate multi-scale features in the way described in Section 3.2.1. Concerning the variants based on a linear model for the potentials, CRF_LM,separate seems to be a better choice than CRF_LM,full because it is faster, works with independent training samples, and leads to comparable results.

3.2.3. Classification results
The result of the CRF classification is the assignment of an object class label to each LiDAR point. For the experiments described in this section, we use the CRF based on RF (CRF_RF). Again, the graph is built by linking each point to its three nearest neighbours in 2D. For both the unary and the pairwise potentials we use RFs consisting of 300 trees, a number that was found empirically. In contrast to the experiment in Section 3.2.2, a larger number of features and more samples are used. Following the guidelines outlined in Section 2.1.4, a random subset of 11 features is used for the tests in the tree nodes. Accordingly, this subset has size 8 for the pairwise potential with 72 features in total. For the training of the unary potentials we used 3000 training samples per class. Another set of 3000 samples per class relation was used to train the pairwise potential, which has to discriminate 49 different class interactions. The results of the 3D point cloud classification are depicted in Fig. 2.

A quantitative evaluation of the results based on the reference generated by manual labelling shows that the method CRF_RF achieves a mean OA of 83.4% and a mean kappa index of 0.80 (0.75, 0.82, and 0.79 for Areas 1, 2, and 3, respectively) for the three test areas. Keeping in mind that we differentiate a larger number of classes than comparable studies, e.g. Chehata et al. (2009), we consider these results to be rather good, given the challenging environment and the fact that the LiDAR data were captured under leaf-on conditions. If we differentiate only the three classes building (merging gable roof, flat roof and façade), ground (merging grass and road) and vegetation (merging tree and low vegetation), we achieve an OA of 93.3%. Area 1 is the most challenging scene with 80.3% OA; the best result is obtained in Area 2 with 85.7%. The confusion matrix in Table 2 also presents the completeness and correctness values for each class. The classes with many points, such as grassland, roads, buildings and trees, are detected relatively well. Especially the class gable roof obtains very good completeness and correctness values of 93.7%. More difficult to differentiate are the classes low vegetation and façade, which are less dominant in the three scenes. Class low vegetation has a rather low correctness of 50.9% due to a relatively large number of tree and grassland points being incorrectly assigned to this class. However, in this case even the generation of the reference data was difficult for a human operator, so the error might be partly explained by errors in the reference. The low correctness of façades is a consequence of vertically distributed points belonging to the boundaries of trees being wrongly labelled as façades due to similar features. Another eye-catching example of the challenges is the confusion between trees and gable roofs. This occurs because many points on trees are located on the canopy and show similar features to building roofs: the data set was acquired under leaf-on conditions. Almost the complete laser energy was reflected from the canopy, and multiple pulses within trees were recorded only very rarely (about 2.3%). Most of our features consider the point distribution within a local neighbourhood. For tall trees with large diameters, the points on the canopies define a very smooth surface; the deviations from a local plane are not larger than those of a gable roof. An example of this effect is presented in Fig. 3. Some significant confusion of road and grassland is also observed; it explains about one third of all errors. We think that intensity is important to distinguish these two classes, though this cannot be underlined by the feature importance analysis (which, if carried out on a per-class level as in Chehata et al. (2009), can only show which features are most suitable to differentiate a class from all other classes). Intensity is sensitive to the incidence angle of the laser beam. We did not carry out any correction of the raw intensities delivered with the ISPRS benchmark data set; assessing the impact of such a correction could be a part of our future work. In addition, distinguishing these classes was not even clear when digital orthophotos were used along with the LiDAR data for generating the reference. However, the results for both classes (≥80.9%) obtained only from LiDAR data are quite good taking into account the absence of multi-spectral information. The relatively large number of grass points classified as low vegetation may also be partly explained by inaccurate reference data. Nevertheless, we consider our results to be rather good, with most completeness or correctness values better than 80% or even 90%. Even small details, for instance some garages and pavilions in Areas 2 and 3, are detected accurately, and most of the car points are correctly labelled as low vegetation (corresponding to the class definition). This is an advantage of applying the point-based contextual classification approach. In this experiment, training took 66 min, and the computation time for inference was fast with approx. 0.8, 1.2 and 1.6 min for the three test areas, respectively.

We also assess the influence of integrating contextual relations into the classification process. For this purpose, we compare the results of the classification using our method CRF_RF with a classification solely using the unary potential. The latter corresponds to a standard RF classification of the point cloud in which each point is labelled independently of its neighbours. The interactions increase the OA of the three areas by 2.0% on average (1.8%, 2.7%, and 1.6% for Areas 1, 2, and 3, respectively). This does not look like very much at first glance, which to a certain degree can be attributed to the facts that RF is per se a strong classifier and that context is implicitly considered in the RF-based classification by using multi-scale features. However, at a second glance, the pairwise potentials do have a non-negligible effect, namely in improving the quality of the results for some of the classes. The differences of the completeness and correctness rates for all classes are presented in brackets in Table 2. Positive values indicate a better result of the CRF considering the pairwise terms. It can be seen that most rates are improved. Especially for the ground classes grassland and road as well as for low vegetation, completeness and correctness increase by up to 5.3%. The best improvement of 10.3% can be
Fig. 2. 3D view of the classification results for version CRF_RF for the three study areas with seven classes: grassland (khaki), road (grey), building with gable roof (purple), low vegetation (light green), façade (dark purple), building with flat roof (orange), and tree (green).
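The graph underlying the classification shown in Fig. 2 links each LiDAR point to its three nearest neighbours in 2D (Section 3.2.3). The following sketch illustrates this edge construction (our own illustrative Python; brute force is used here for clarity, whereas a k-d tree would be used for real point clouds):

```python
def knn_edges_2d(points, k=3):
    """Link each point to its k nearest neighbours in the XY plane.
    points: list of (x, y) tuples; returns a set of undirected edges
    (i, j) with i < j, as used for the nodes/edges of the CRF graph."""
    edges = set()
    for i, (xi, yi) in enumerate(points):
        dists = sorted(
            ((xj - xi) ** 2 + (yj - yi) ** 2, j)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

# Small synthetic example (two clusters of points):
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(len(knn_edges_2d(pts, k=3)))  # 9
```

Because the k-nearest-neighbour relation is not symmetric, storing edges as unordered pairs means some nodes end up with more than k incident edges.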
Table 2
Confusion matrix obtained by the 3D point cloud classification of the three areas, with correctness and completeness values in (%), discerning the classes grassland (Grass), road (Road), building with gable roof (GR), low vegetation (LV), façade (Faç), building with flat roof (FR), and tree (Tree). The numbers in brackets show the changes obtained by considering the interactions, compared to the classification based on the unary potentials solely. Positive values represent improvements by context. Due to the interactions, the OA increases from 81.4% to 83.4%. Compared to Table 1, more features and more training samples are considered in this experiment, which is the reason why the completeness, correctness and OA values do not match.
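The completeness and correctness values reported in Table 2 correspond to per-class recall and precision computed from the confusion matrix; the OA and kappa index follow from the same counts. An illustrative computation (our own sketch with a toy 2-class matrix, not data from the paper):

```python
def classification_metrics(conf):
    """conf[i][j] = number of points of reference class i predicted as class j.
    Returns overall accuracy, kappa index, and per-class
    completeness (recall) and correctness (precision)."""
    n = sum(sum(row) for row in conf)
    c = len(conf)
    diag = sum(conf[i][i] for i in range(c))
    oa = diag / n
    # Expected chance agreement for kappa, from the row/column marginals.
    row = [sum(conf[i]) for i in range(c)]
    col = [sum(conf[i][j] for i in range(c)) for j in range(c)]
    pe = sum(row[i] * col[i] for i in range(c)) / (n * n)
    kappa = (oa - pe) / (1 - pe)
    completeness = [conf[i][i] / row[i] for i in range(c)]
    correctness = [conf[i][i] / col[i] for i in range(c)]
    return oa, kappa, completeness, correctness

# Toy 2-class example:
oa, kappa, comp, corr = classification_metrics([[80, 20], [10, 90]])
print(round(oa, 2), round(kappa, 2), round(comp[0], 2), round(corr[0], 2))
```

This is the standard definition; the per-class differences in brackets in Table 2 are then simply these rates with and without the pairwise terms.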
obtained for the correctness of façade, which benefits significantly from incorporating context represented by the interactions of points. To sum up, modelling the interactions is useful to obtain a more reliable classification compared to points being labelled individually.

3.2.4. Feature importance
Based on a permutation importance measure (Breiman, 2001), the relevance of the features can easily be obtained by RF aside from classification. This kind of analysis has for instance been performed
Table 4
The 14 most important node features, ordered by their rank according to feature importance.

Rank  Feature
1     Height above DTM
2     Variance of point elevations (cylinder, r = 1 m)
3     Direction of normal vectors (sphere, r = 1 m)
4     Direction of normal vectors (sphere, r = 2 m)
5     Ratio of point density (sphere, r = 1 m)
6     Variance of intensity (sphere, r = 1 m)
7     Variance of point elevations (sphere, r = 1 m)
8     Variance of point density (sphere, r = 1 m)
9     Intensity
10    Variance of point elevations (sphere, r = 2 m)
11    Direction of normal vectors (sphere, r = 3 m)
12    Variance of omnivariance (sphere, r = 1 m)
13    Variance of principal curvature k2 (sphere, r = 1 m)
14    Variance of normal vectors (sphere, r = 1 m)

Fig. 3. Tree points are wrongly classified as gable roof.
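The permutation importance underlying Table 4 compares the accuracy achieved with the correct feature values to the accuracy after randomly permuting the values of one feature (Breiman, 2001). The principle can be sketched as follows (our own toy example with a simple threshold classifier standing in for the RF; all names are ours):

```python
import random

def permutation_importance(predict, X, y, feature, trials=20, seed=0):
    """Mean drop in accuracy after randomly permuting one feature column.
    predict: function mapping a sample (list of feature values) to a label."""
    rng = random.Random(seed)
    base = sum(predict(x) == t for x, t in zip(X, y)) / len(y)
    drops = []
    for _ in range(trials):
        col = [x[feature] for x in X]
        rng.shuffle(col)  # break the association between feature and label
        Xp = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
        acc = sum(predict(x) == t for x, t in zip(Xp, y)) / len(y)
        drops.append(base - acc)
    return sum(drops) / trials

# Toy data: the label depends only on feature 0; feature 1 is pure noise.
X = [[float(i), float(i % 2)] for i in range(40)]
y = [1 if x[0] >= 20 else 0 for x in X]
predict = lambda x: 1 if x[0] >= 20 else 0
print(permutation_importance(predict, X, y, feature=0) >
      permutation_importance(predict, X, y, feature=1))  # True
```

Permuting the informative feature destroys the predictions while permuting the noise feature changes nothing, which is exactly the contrast used to rank the features in Table 4.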
by Chehata et al. (2009), who investigated the relevance of features for the classification of a FW point cloud. In our case, we additionally learn interactions between neighbouring points, and thus we are able to analyse and compare the feature importance for both the nodes and the edges of our CRF. Feature importance can be given for each variable and for each class. However, the RF classifier that is the basis for the pairwise terms has to distinguish 49 different class relations, so that a presentation of feature importance values per class would become very confusing. For the purpose of clarity, we mainly focus on the overall importance values per feature over the entire forest, which can be computed as the sum over all trees t_i of the differences between the accuracy achieved with the correct feature values and the accuracy obtained after permuting the values of that feature.

The 10 most relevant features based on the overall importance values (in percent) for the nodes and interactions are presented in Table 3. It can easily be seen that the height above DTM is by far the most important one. It is the strongest and best discerning feature for all classes and relations. All the other ones are less important. Note that only the scale r = 1 m was used for the computation of the interaction features. Both node feature vectors are concatenated, and each corresponding feature is found with a similar importance value. In addition to the absolute elevation values, the difference of heights is also important, but the difference of intensity does not seem to contribute much information (rank 40, with an importance value of 1.2%).

Moreover, the relevant features are nearly the same for nodes and interactions: for both, the intensity values and their variations, the direction of the normal vectors, and the variation of height values in a local neighbourhood are relevant. In contrast, the features derived from echo number and number of echoes, such as the variation of returns and the echo ratio, are hardly important for the classification of these three areas. The reason might be that most of the points are single returns and only very few points are multiple returns.

As we have seen that the height is the most important feature in general (Table 3), we carried out an experiment with all features except the height above DTM. The OA is still surprisingly high at 74.4%, and thus 9% worse than the classification with the height features. The reasons are flat building roofs (e.g. the high-rise building in Area 2) being erroneously assigned to road, and vice versa. Without considering the height, both classes have very similar features (for example, the same direction of the normal vector) and cannot be separated correctly any more. Flat roofs have a correctness of 49.7% and a completeness of 48.5%. The other classes are mostly identified correctly. This shows that the feature height above DTM can be neglected if no discrimination between the classes flat roof and road is required. It is relatively expensive to compute, as the raw point cloud has to be filtered first to derive a DTM. An alternative would be to approximate the DTM by the height of the lowest point within a cylinder of a large radius centred at a given point (Mallet, 2010). However, this is only suitable for flat terrain. In particular, the terrain of Areas 1 and 2 is characterised by several ground levels with different elevations, which is the reason why we used a standard filtering method to derive the DTM.

In another experiment we performed a 3D classification only with the most important features. For that purpose, the node and interaction features were sorted by their importance values. Note that for the edge features, the importance values for the start and end points were summed up to determine the correct order in this case. After ordering the features, we repeatedly classified the point cloud, using the n_F most important features according

Table 3
Overview of the ten most important features for the classification of nodes and interactions, respectively, ordered by their overall importance value (mean decrease in Gini index) obtained by RF. SP corresponds to the start point and EP to the end point of an edge.

Rank   Nodes                                  Edges
       Imp. (%)  Feature                      Imp. (%)  Feature
1      10.31     Height above DTM             5.07      Height above DTM (SP)
2      3.81      Normal (1 m)                 5.06      Height above DTM (EP)
3      3.38      Normal (2 m)                 4.83      Height difference
4      2.46      Var. Z in sphere (1 m)       2.74      Var. Z in cylinder (SP)
5      2.26      Point density ratio (1 m)    2.63      Var. Z in cylinder (EP)
6      2.15      Var. Z in sphere (2 m)       2.21      Point density ratio (SP)
7      2.08      Normal (3 m)                 2.14      Point density ratio (EP)
8      1.97      Intensity                    2.14      Var. intensity (SP)
9      1.92      Var. Z in cylinder (1 m)     2.12      Var. intensity (EP)
10     1.70      Normal (5 m)                 2.09      Var. normal vector (SP)

Fig. 4. Overall accuracy and computation time for training as a function of the number n_F of most important features used for classification.
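The permutation-based importance measure described above (the drop in accuracy after permuting a feature's values, accumulated over the trees of the forest) can be sketched as follows. This is a minimal illustration only: a hand-made threshold classifier stands in for the trained RF, and all function names and data are ours, not from the paper's implementation.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, rng=None):
    """Importance of each feature as the mean drop in accuracy after
    randomly permuting that feature's column (cf. Breiman, 2001)."""
    rng = np.random.default_rng(rng)
    base_acc = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # destroy feature j
            drops.append(base_acc - np.mean(predict(Xp) == y))
        importances[j] = np.mean(drops)
    return importances

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
X = np.column_stack([y + 0.1 * rng.standard_normal(400),
                     rng.standard_normal(400)])
predict = lambda data: (data[:, 0] > 0.5).astype(int)  # stand-in for the RF

imp = permutation_importance(predict, X, y, rng=0)
assert imp[0] > imp[1]  # the informative feature ranks first
```

Permuting the informative column roughly halves the accuracy, while permuting the noise column (which the stand-in classifier ignores) changes nothing, so its importance is exactly zero.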
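The experiment of repeatedly re-classifying with only the n_F top-ranked features can be sketched like this. The evaluation function and its accuracy numbers are illustrative stand-ins; only the importance values are taken from Table 3.

```python
import numpy as np

def oa_with_top_features(train_and_eval, importances, n_f):
    """Re-run the classifier using only the n_f highest-ranked features."""
    order = np.argsort(importances)[::-1]  # best feature first
    return train_and_eval(order[:n_f])

# Node importances of the top five features (Table 3).
importances = np.array([10.31, 3.81, 3.38, 2.46, 2.26])

# Stand-in evaluation: OA saturates once the informative features are in,
# mimicking the saturation reported for n_F of about 10-14.
def train_and_eval(feature_idx):
    gains = {0: 0.50, 1: 0.15, 2: 0.10, 3: 0.05, 4: 0.02}
    return sum(gains[i] for i in feature_idx)

curve = [oa_with_top_features(train_and_eval, importances, n)
         for n in range(1, 6)]
assert curve[0] == 0.50          # the single best feature alone: ~50% OA
assert curve == sorted(curve)    # adding features never hurts here
```

In the paper, the curve of OA over n_F (Fig. 4) motivates stopping at the 14 most important features as a trade-off between accuracy and training time.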

to the sorted list and varying n_F from 1 to 29. Fig. 4 shows the OA of the classification and the computation times as a function of n_F.

Table 5
Difference Δ in completeness and correctness values for the classification with all features compared to the classification taking into account only the 14 best features. Positive values indicate improvements by using all features. By considering only the 14 most important features, the OA decreases by 0.5% (from 83.4% to 82.9%).

Classes          Δ Completeness (%)   Δ Correctness (%)
Grassland        5.2                  2.6
Road             3.0                  4.0
Gable roof       4.0                  2.0
Low vegetation   2.1                  2.1
Façade           4.8                  −14.1
Flat roof        0.9                  0.2
Tree             3.3                  3.9

Using only the most important feature (height above the terrain) resulted in an OA of about 50%. Adding new features initially led to a sharp increase in OA, but after including about 10–14 features, a saturation effect can be observed in Fig. 4. We found that using the 14 most important features for the unary potential (cf. Table 4) was a good trade-off between accuracy and computation time. Correspondingly, the interaction feature vector l_ij(x) consists of 23 features (2 × 11 node features at scale r = 1 m, plus the height difference). In this case the OA is 82.9%. Compared to the classification exploiting all features, this is a slight decrease of 0.5%, and using additional features only results in a very slow increase in OA. An interesting observation is that no feature from scale r = 5 m is contained in the list in Table 4. A comparison of the completeness and correctness per object class to the values that can be achieved using all 131 node features is given in Table 5. Positive values indicate a better performance when using all features. Eight of the 14 values benefit from utilising all features. Particularly the completeness and correctness of the class gable roof require information from the features neglected in this experiment; the values slightly decrease in accuracy by up to 0.9% with the smaller feature subset. On the other hand, the correctness of façades is improved by 14.1% with fewer features, whereas the corresponding completeness rate decreases by 4.8%. For most classes, the differences in Table 5 indicate a trade-off between completeness and correctness. The exceptions are grassland and gable roof, with about 2% improvement, and façades, with about 2% decrease, in quality as a combination of completeness and correctness (cf. Eq. (14)). Using a smaller amount of features leads to a shorter training time. Considering only 14 features, training took 15.4 min, which is less than 1/4 of the time required for training if all features are used (cf. Section 3.2.3). Taking

Fig. 5. Label images obtained by MRF based on the 3D classification beliefs. Four classes are discerned in 2D: grassland (khaki), road (grey), building (purple), vegetation
(green).
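The smoothing that produces label images like those in Fig. 5 (a Potts prior on a grid with equally weighted unary and pairwise terms) can be illustrated with a simplified sketch. Note that the paper performs LBP inference on class beliefs; this stand-in instead runs a few ICM passes on hard labels, with a unary cost of 1 for changing the observed label and the weighting factor set to 1 as in the paper's Eq. (12). All names are ours.

```python
import numpy as np

def potts_smooth(labels, n_classes, lam=1.0, n_iter=5):
    """ICM smoothing with a Potts pairwise term on a 4-connected grid."""
    cur = labels.copy()
    h, w = labels.shape
    for _ in range(n_iter):
        for i in range(h):
            for j in range(w):
                costs = np.zeros(n_classes)
                for c in range(n_classes):
                    costs[c] = float(c != labels[i, j])  # unary (data) term
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            # Potts term: penalise disagreeing neighbours
                            costs[c] += lam * (c != cur[ni, nj])
                cur[i, j] = np.argmin(costs)
    return cur

noisy = np.zeros((5, 5), dtype=int)
noisy[2, 2] = 1                       # isolated 'building' pixel in 'road'
smoothed = potts_smooth(noisy, n_classes=2, lam=1.0)
assert smoothed[2, 2] == 0            # the isolated label is smoothed away
```

With four disagreeing neighbours, keeping the isolated label costs 4, while switching it costs only the unary penalty of 1, so the pixel is relabelled; this is the border-smoothing and hole-filling effect described in Section 3.3.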

Fig. 6. Pixel-wise result of class buildings (yellow = TP, red = FP, blue = FN).
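The pixel-wise evaluation behind Fig. 6 (TP/FP/FN counts) and the Quality measure of Eq. (14) can be sketched as follows. The toy masks and function names are ours; the cross-check against a per-area value of Table 6 agrees only up to the rounding of the printed percentages.

```python
import numpy as np

def evaluate_masks(detected, reference):
    """Pixel-wise completeness, correctness and Quality (Eq. (14))."""
    tp = np.sum(detected & reference)
    fp = np.sum(detected & ~reference)
    fn = np.sum(~detected & reference)
    completeness = tp / (tp + fn)
    correctness = tp / (tp + fp)
    quality = 1.0 / (1.0 / completeness + 1.0 / correctness - 1.0)
    return completeness, correctness, quality

ref = np.zeros((4, 4), dtype=bool); ref[:2, :] = True        # 8 reference px
det = np.zeros((4, 4), dtype=bool); det[:2, :2] = True
det[2, 0] = True                                             # one FP pixel
comp, corr, qual = evaluate_masks(det, ref)
assert comp == 0.5        # TP = 4 of 8 reference pixels
assert corr == 0.8        # 4 of 5 detected pixels are correct

# Cross-check: Area 2 per-area values from Table 6 (91.4% / 96.4%)
q_a2 = 1.0 / (1.0 / 0.914 + 1.0 / 0.964 - 1.0)
assert abs(q_a2 - 0.884) < 1e-3   # matches the tabulated 88.4% quality
```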

Fig. 7. The point density on the right building roof is very low due to the reflectance of the roof material. Hence the building object is challenging to detect in 2D because of the many empty pixels.
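The projection of per-point class beliefs onto the 0.5 m grid described in Section 3.3, including the empty pixels visible in Fig. 7, can be sketched as follows. Function and variable names are ours, and the tiny grid and belief values are invented for illustration; in the paper, empty cells are later compensated by the MRF smoothing.

```python
import numpy as np

def beliefs_to_grid(xy, beliefs, cell=0.5, shape=(4, 4)):
    """Average per-point class beliefs within each grid cell; cells
    without any projected point stay NaN ('empty' pixels)."""
    n_classes = beliefs.shape[1]
    acc = np.zeros(shape + (n_classes,))
    cnt = np.zeros(shape)
    for (x, y), b in zip(xy, beliefs):
        i, j = int(y // cell), int(x // cell)  # row/col of the 0.5 m cell
        acc[i, j] += b
        cnt[i, j] += 1
    grid = np.full(shape + (n_classes,), np.nan)
    mask = cnt > 0
    grid[mask] = acc[mask] / cnt[mask][:, None]
    return grid, mask

pts = np.array([[0.1, 0.1], [0.3, 0.2], [1.9, 1.9]])   # x, y in metres
bel = np.array([[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]])   # two-class beliefs
grid, mask = beliefs_to_grid(pts, bel)
assert np.allclose(grid[0, 0], [0.8, 0.2])  # mean of two points in cell (0,0)
assert not mask[1, 1]                       # empty pixel: no point projected
```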

into account all features improves the result only slightly, but this gain comes along with a significantly higher computational cost.

3.3. 2D building objects

The results of the 3D classification with all features (Section 3.2.3) serve as input for the generation of 2D building masks. For that purpose, a grid with a pixel size of 0.5 m is defined. The LiDAR points are projected to the xy-plane in order to determine the pixel they correspond to. Using a relatively large pixel size compared to the point density of about 8 points/m2 reduces the number of 'empty' pixels. Within each pixel, the averages of all beliefs per class are computed in the way described in Section 2.2. For the 2D representation, the class façade, containing vertical objects, is neglected. Moreover, the class low vegetation is aggregated with trees, and gable roofs with flat roofs, by adding the corresponding beliefs. Thus, we distinguish between road, grassland, vegetation, and buildings. The beliefs of façades are not considered any more. Using a multi-class approach makes it easy to extract other object classes, such as trees. However, in this investigation we focus on the buildings. A smoothing of the object borders and a filling of the holes without LiDAR points is performed using a Potts model (cf. Section 2.2). Both the unary and pairwise potentials are equally weighted; hence the weighting factor in Eq. (12) is set to k = 1 manually. This makes it possible to compensate for the less meaningful unary potentials in data gaps by smoothing these areas. Inference by LBP takes only a few seconds.

From the resulting label image consisting of four classes, we derived the binary object masks by considering only the building class. In a post-processing step, for each building candidate pixel, the difference to the second largest belief is determined. Candidates with a difference smaller than 50% are eliminated in order to obtain only reliable objects. Fig. 5 shows the multi-label images for the three test areas. These label images are evaluated in the context of the ISPRS Test Project based on a 2D reference, using the method described by Rutzinger et al. (2009). The evaluation results are depicted in Fig. 6.

Table 6
Evaluation results (%): completeness, correctness, quality.

Building        Object           Object ≥ 50 m2    Per area
Area            A1/A2/A3         A1/A2/A3          A1/A2/A3
Completeness    86.5/85.7/83.9   100/100/100       90.8/91.4/91.6
Correctness     89.2/63.2/94.1   96.6/100/100      94.5/96.4/96.7
Quality         78.3/57.1/79.7   96.6/100/100      86.3/88.4/88.9

In Fig. 6, yellow areas represent the true positives (TP), blue areas correspond to false negatives (FN), and red areas to false positive detections (FP). It becomes evident that the majority of the buildings were detected correctly. Thus, the proposed approach works well for the detection of buildings. Most of the FPs are caused by tree areas wrongly classified as building. This is due to the similar features, as mentioned before: the LiDAR points covering trees are mainly distributed on the canopy and not within the trees, which leads to a nearly horizontal and planar point distribution. The relatively large FN area of the building situated in the north of Area 3 is covered by only very few points due to the properties of the roof material, as can be seen in Fig. 7. As a consequence, it is challenging to recover the building outlines based only on the LiDAR point cloud. Most of the corresponding pixels of the binary image are empty, because no 3D point is projected to these pixels. This effect can partly be compensated by the MRF (Fig. 5(c)), but the beliefs for class building are too low for the threshold used in the post-processing step (difference >50%) to eliminate unlikely object pixels. However, our approach is nevertheless able to detect the largest part of this building. Looking at the quantitative evaluation results in Table 6, we see that the area-based completeness and correctness values for buildings are between 90.8% and 96.7%. The Quality, which is defined as (Rutzinger et al., 2009)

    Quality = 1 / (1/Completeness + 1/Correctness − 1),    (14)

takes values from 86.3% to 88.9%. The object-based metrics, counting a building as a TP if at least 50% of its area is contained in the reference, can also be seen in Table 6. A look at the object-based evaluation results reveals that the buildings in Areas 1 and 3 were detected reliably, with completeness and correctness values between 83.9% and 94.1%. The objects in Area 2 suffer from a relatively poor correctness value of 63.2%, whereas the completeness was 85.7%. As already mentioned, the FPs were caused by some small misclassifications of trees labelled as building. This leads to 5 FPs compared to only 14 reference building objects in the scene. The low number of objects quickly affects the correctness value. Considering only building objects with areas larger than 50 m2, all objects in Areas 2 and 3 were detected correctly, with 100% completeness and correctness. Only in Area 1 is there one larger FP area (again two neighbouring trees labelled as building), which results in 96.6% quality. We conclude that buildings, especially the larger ones, can be identified reliably by the proposed method.

4. Conclusions

In this paper, we have presented a context-based CRF classifier for urban LiDAR point clouds. The result of our classification is a labelled 3D point cloud; each point is assigned to one of seven object classes. No segmentation is performed. The point cloud is represented by a graphical model, making use of a complex model for the interaction potentials in which prominent relations between object classes and the data are learned in a training step. They support the classification process and improve the results. Our exper-
iment revealed that the overall accuracy increased from 81.4% to 83.4% by considering these interactions, compared to an independent classification of single points. Even small objects such as garages and pavilions are detected correctly. A comparison of three different versions of a CRF-based classifier has shown that Random Forests are well suited for the computation of the unary and pairwise potentials needed for CRFs. They are faster, more accurate, and able to handle a larger number of features than the versions based on linear models. An analysis of the feature importance values delivered by RF was carried out both for the node features and for the interaction features. In both groups the relevant features are nearly the same. The most important one is the height above DTM feature. As shown by an additional experiment, the use of a larger number of (multi-scale) features increases the accuracy only slightly, by 0.5%, compared to a classification based on the 14 most important features, and this gain comes along with a significantly higher computational effort. In summary, it can be stated that CRFs provide a high potential for urban scene classification.

A second stage of the workflow uses the CRF beliefs for each point in a Markov Random Field to derive a 2D multi-label image, which is used to define building objects. Evaluation is performed in the context of the ISPRS Test Project on Urban Classification and 3D Building Reconstruction hosted by ISPRS WG III/4 (Rottensteiner et al., 2012). It can be seen that very good per-area quality values (completeness and correctness >90%) are obtained. On a per-object level, especially the large buildings are detected very reliably. Considering all objects, some false positives have a negative impact on the correctness of buildings in Area 2. The buildings in the other two areas are reliably detected, with completeness and correctness rates of >83.9%.

In future work we want to set up a hierarchical CRF for the 3D classification. The points should be aggregated to objects, on which the high-level CRF is applied to model the interaction between these objects. Both levels should interact and may influence the decision on a single point's classification. Moreover, there are still some confusion errors of tree points wrongly classified as roof. To cope with this problem, better discriminating features as well as an optimisation of the graph structure will be investigated. One strategy to be pursued to achieve a better discrimination of grassland and road may be to apply a radiometric calibration based on the incidence angle to the intensity feature. In order to improve the building outlines in the 2D building object images, an incorporation of the 3D façade points as an additional hint might be helpful. Finally, as generating training data is a tedious process, we intend to carry out tests concerning the amount of training data required for a good classification performance, in order to see whether one could do with less training data than was used in our experiments.

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments, which certainly helped to improve this paper. The Vaihingen data set was provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) (Cramer, 2010): http://www.ifp.uni-stuttgart.de/dgpf/DKEP-Allg.html.

References

Abhishek, J., 2009. Classification and Regression by Randomforest-matlab <http://code.google.com/p/randomforest-matlab>.
Anguelov, D., Taskar, B., Chatalbashev, V., Koller, D., Gupta, D., Heitz, G., Ng, A., 2005. Discriminative learning of markov random fields for segmentation of 3d scan data. In: Proceedings of the 2005 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, San Diego, USA, pp. 169–176.
Bishop, C.M., 2006. Pattern Recognition and Machine Learning, vol. 1. Springer, New York.
Boykov, Y.Y., Jolly, M.P., 2001. Interactive graph cuts for optimal boundary & region segmentation of objects in nd images. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV), 2001. IEEE, Vancouver, Canada, pp. 105–112.
Breiman, L., 2001. Random forests. Machine Learning 45, 5–32.
Chan, J.C.W., Paelinckx, D., 2008. Evaluation of random forest and adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sensing of Environment 112, 2999–3011.
Chehata, N., Guo, L., Mallet, C., 2009. Airborne lidar feature selection for urban classification using random forests. In: International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences. Paris, France, pp. 207–212.
Chen, C., Liaw, A., Breiman, L., 2004. Using Random Forest to Learn Imbalanced Data. Technical Report. University of California, Berkeley.
Cramer, M., 2010. The DGPF-test on digital airborne camera evaluation – overview and test design. Photogrammetrie-Fernerkundung-Geoinformation 2010, 73–82.
Dorninger, P., Pfeifer, N., 2008. A comprehensive automated 3d approach for building extraction, reconstruction, and regularization from airborne laser scanning point clouds. Sensors 8, 7323–7343.
Edelsbrunner, H., Mücke, E.P., 1994. Three-dimensional alpha shapes. ACM Transactions on Graphics 13, 43–72.
Frey, B., MacKay, D., 1998. A revolution: belief propagation in graphs with cycles. In: Advances in Neural Information Processing Systems, 1–6 December 1997, vol. 10. MIT Press, Denver, USA, pp. 479–485.
Geman, S., Geman, D., 1984. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721–741.
Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R., 2006. Random forests for land cover classification. Pattern Recognition Letters 27, 294–300.
Hoberg, T., Rottensteiner, F., Heipke, C., 2012. Context models for CRF-based classification of multitemporal remote sensing data. In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 25 August–1 September. Melbourne, Australia, pp. 128–134.
Huang, H., Brenner, C., Sester, M., 2013. A generative statistical approach to automatic 3d building roof reconstruction from laser scanning data. ISPRS Journal of Photogrammetry and Remote Sensing 79, 29–43.
Kraus, K., Pfeifer, N., 1998. Determination of terrain models in wooded areas with airborne laser scanner data. ISPRS Journal of Photogrammetry and Remote Sensing 53, 193–203.
Kumar, S., Hebert, M., 2006. Discriminative random fields. International Journal of Computer Vision 68, 179–201.
Ladický, L., Russell, C., Kohli, P., Torr, P.H., 2013. Inference methods for crfs with co-occurrence statistics. International Journal of Computer Vision 103, 213–225.
Lafarge, F., Mallet, C., 2012. Creating large-scale city models from 3d-point clouds: a robust approach with hybrid representation. International Journal of Computer Vision 99, 69–85.
Li, S.Z., 2009. Markov Random Field Modeling in Image Analysis. Springer.
Lim, E., Suter, D., 2007. Conditional random field for 3d point clouds with adaptive data reduction. In: International Conference on Cyberworlds, 24–26 October. Hannover, Germany, pp. 404–408.
Lim, E., Suter, D., 2009. 3d terrestrial LIDAR classifications with super-voxels and multi-scale conditional random fields. Computer Aided Design 41, 701–710.
Liu, D., Nocedal, J., 1989. On the limited memory BFGS method for large scale optimization. Mathematical Programming 45, 503–528.
Liu, C., Shi, B., Yang, X., Li, N., Wu, H., 2013. Automatic buildings extraction from LiDAR data in urban area by neural oscillator network of visual cortex. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 6, 2008–2019.
Lucchi, A., Li, Y., Boix, X., Smith, K., Fua, P., 2011. Are spatial and global constraints really necessary for segmentation? In: IEEE International Conference on Computer Vision (ICCV) 2011. IEEE, Barcelona, Spain, pp. 9–16.
Lucchi, A., Li, Y., Smith, K., Fua, P., 2012. Structured image segmentation using kernelized features. In: 12th European Conference on Computer Vision (ECCV 2012). Springer, Florence, Italy, pp. 400–413.
Lu, W., Murphy, K., Little, J., Sheffer, A., Hongbo, F., 2009. A hybrid conditional random field for estimating the underlying ground surface from airborne LiDAR data. IEEE Transactions on Geoscience and Remote Sensing 47, 2913–2922.
Mallet, C., 2010. Analysis of Full-Waveform Lidar Data for Urban Area Mapping. Ph.D. thesis. Télécom ParisTech.
Mayer, H., 2008. Object extraction in photogrammetric computer vision. ISPRS Journal of Photogrammetry and Remote Sensing 63, 213–222.
McLaughlin, R.A., 2006. Extracting transmission lines from airborne LIDAR data. IEEE Geoscience and Remote Sensing Letters 3, 222–226.
Mountrakis, G., Im, J., Ogole, C., 2011. Support vector machines in remote sensing: a review. ISPRS Journal of Photogrammetry and Remote Sensing 66, 247–259.
Munoz, D., Vandapel, N., Hebert, M., 2008. Directional associative markov network for 3-D point cloud classification. In: International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), 18–20 June. Atlanta, USA, pp. 1–8.
Niemeyer, J., Wegner, J., Mallet, C., Rottensteiner, F., Soergel, U., 2011. Conditional random fields for urban scene classification with full waveform LiDAR data. In: Photogrammetric Image Analysis (PIA). Springer, Munich, Germany, pp. 233–244.

Niemeyer, J., Rottensteiner, F., Soergel, U., 2012. Conditional random fields for lidar point cloud classification in complex urban areas. In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Proceedings XXII ISPRS Congress (TC III), 25 August–1 September. Melbourne, Australia, pp. 263–268.
Niemeyer, J., Rottensteiner, F., Soergel, U., 2013. Classification of urban LiDAR data using conditional random field and random forests. In: IEEE Proceedings of the Joint Urban Remote Sensing Event (JURSE), 21–23 April. São Paulo, Brasil, pp. 139–142.
Nowozin, S., Rother, C., Bagon, S., Sharp, T., Yao, B., Kohli, P., 2011. Decision tree fields. In: IEEE International Conference on Computer Vision (ICCV), 2011. IEEE, Barcelona, Spain, pp. 1668–1675.
Poullis, C., 2013. A framework for automatic modeling from point cloud data. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 2563–2574.
Reitberger, J., Schnoerr, C., Krzystek, P., Stilla, U., 2009. 3D segmentation of single trees exploiting full waveform LIDAR data. ISPRS Journal of Photogrammetry and Remote Sensing 64, 561–574.
Rottensteiner, F., Sohn, G., Jung, J., Gerke, M., Baillard, C., Benitez, S., Breitkopf, U., 2012. The ISPRS benchmark on urban object classification and 3D building reconstruction. In: ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, 25 August–1 September. Melbourne, Australia, pp. 293–298.
Rusu, R., Holzbach, A., Blodow, N., Beetz, M., 2009. Fast geometric point labeling using conditional random fields. In: IEEE International Conference on Intelligent Robots and Systems, 11–15 October, 2009. St. Louis, USA, pp. 7–12.
Rutzinger, M., Rottensteiner, F., Pfeifer, N., 2009. A comparison of evaluation techniques for building extraction from airborne laser scanning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2, 11–20.
Sampath, A., Shan, J., 2007. Building boundary tracing and regularization from airborne Lidar point clouds. Photogrammetric Engineering and Remote Sensing 73, 805–812.
Schindler, K., 2012. An overview and comparison of smooth labeling methods for land-cover classification. IEEE Transactions on Geoscience and Remote Sensing 50, 4534–4545.
Schmidt, M., 2012. UGM: a Matlab toolbox for probabilistic undirected graphical models <http://www.di.ens.fr/mschmidt/Software/code.html>.
Shapovalov, R., Velizhev, A., Barinova, O., 2010. Non-associative markov networks for 3D point cloud classification. In: Proceedings of the ISPRS Commission III Symposium – PCV 2010. ISPRS, Saint-Mandé, France, pp. 103–108.
Shapovalov, R., Vetrov, D., Kohli, P., 2013. Spatial inference machines. In: IEEE Conference on Computer Vision and Pattern Recognition, 23–28 June. Portland, USA, pp. 1–8.
Shotton, J., Winn, J., Rother, C., Criminisi, A., 2009. Textonboost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. International Journal of Computer Vision 81, 2–23.
Vishwanathan, S., Schraudolph, N., Schmidt, M., Murphy, K., 2006. Accelerated training of conditional random fields with stochastic gradient methods. In: 23rd International Conference on Machine Learning, 25–29 June, 2006. Pittsburgh, USA, pp. 969–976.
Wegner, J.D., Hansch, R., Thiele, A., Soergel, U., 2011. Building detection from one orthophoto and high-resolution InSAR data using conditional random fields. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 4, 83–91.
Xiong, X., Munoz, D., Bagnell, J.A., Hebert, M., 2011. 3-D scene analysis via sequenced predictions over points and regions. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA11), 9–13 May. Shanghai, China, pp. 2609–2616.
Yang, M.Y., Förstner, W., 2011. A hierarchical conditional random field model for labeling and classifying images of man-made scenes. In: IEEE International Conference on Computer Vision Workshops (ICCV Workshops), IEEE, 6–13 November. Barcelona, Spain, pp. 196–203.
