
Learning to Count Buildings in Diverse Aerial Scenes

Jiangye Yuan ([email protected])
Anil M. Cheriyadat ([email protected])
Computational Sciences & Engineering Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee 37831

ABSTRACT
Determining the number of buildings in aerial images is an important problem because the information greatly benefits applications such as population estimation, change detection, and urbanization monitoring. In this paper, we address this problem by learning the relationship between low-level image features and building counts. Building footprints from public cartographic databases are used as labeled data. We first extract straight line segments from images. A classifier is then trained to identify line segments corresponding to building edges. Although there exist mismatches between the resulting line segments and building edges, we observe a strong linear relationship between building numbers and line numbers for similar types of buildings. Based on this observation, we predict the building count for a given image as follows: we find the top k images with the most similar appearances among the training samples and learn a linear regression model from this image set, from which the building count is computed. Our method avoids the difficulty of building detection and produces reliable results on large, diverse datasets.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—complexity measures, performance measures

General Terms
Algorithms, Experimentation

Keywords
Building count, Aerial images, Straight line extraction

1. INTRODUCTION
The number of buildings in an area is highly desirable information in many geospatial applications ranging from disaster management to urban planning. An effective and economical data source for acquiring such information is aerial imagery, including satellite and airborne images. A human interpreter can conveniently count buildings in images, but it is a tedious and time-consuming task. To automate this process, one option is to employ building detection methods that are designed to detect individual buildings in aerial images. Unfortunately, despite decades of research, reliably identifying individual buildings in diverse aerial scenes is still challenging [2]. The main reason is that building appearances vary significantly, not only due to different roof materials, building designs, and lighting conditions, but also due to occlusions by shadows and other surrounding objects. Published work on building detection generally establishes prior criteria for building appearances and identifies objects that satisfy those criteria [19, 17, 7, 13]. Although showing promising performance on certain sample images, such approaches have not been shown to work on large datasets containing diverse scenes. Note that many building detection methods utilize LiDAR data, which provides 3D information and can achieve much more reliable performance [15, 21, 1]. However, LiDAR data are considerably more expensive than images. In this work, we rely only on optical images.

We approach the problem of counting buildings from a new perspective. Instead of identifying buildings in images, we propose to learn the relationship between building counts and low-level features and to infer building counts directly from those features. In particular, we use straight line segments as the low-level features because buildings are typically characterized by straight edges formed by the contrast between building roofs and other objects. By using building footprints from public cartographic resources as labeled data, we adaptively learn a linear regression model to predict building counts. Our strategy has two major advantages. First, low-level features in images are much easier and more reliable to extract than high-level information such as individual buildings [9]. Second, we can leverage a massive amount of ancillary data to apply our method to very large-scale datasets. Unlike many machine learning tasks that suffer from insufficient labeled data, public cartographic databases provide abundant human-labeled building footprints that are easily accessible. Exploiting such data to enhance image understanding capabilities has cultivated a number of recent research efforts [10, 12, 20].

(c) 2014 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
SIGSPATIAL'14, November 04-07, 2014, Dallas/Fort Worth, TX, USA
Copyright 2014 ACM 978-1-4503-3131-9/14/11 ... $15.00
http://dx.doi.org/10.1145/2666310.2666389

The contributions of this work can be summarized as follows.

• We propose an improved line segment extraction method. The method is computationally efficient and produces lines well aligned with edges even when an edge has low contrast and appears noisy.

• After collecting a large number of samples, we make an important observation: the number of buildings is linearly correlated with the number of extracted line segments when buildings have similar appearances.

• We design a classification method that identifies line segments corresponding to building edges based on the image appearance of surrounding areas.

• We develop a method to predict building counts by learning a linear regression model from similar images. The method counts buildings accurately for images containing diverse types of buildings.

Figure 1: Misalignment correction. (a) Building footprints overlaid with the corresponding images. (b) The result after correction.
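The misalignment correction illustrated in Figure 1 (described in Section 2) amounts to a small search over integer translations. A minimal numpy sketch under that reading; the function name, search radius, and binary-mask representation of the footprints are illustrative, not the paper's actual implementation:

```python
import numpy as np

def align_footprints(grad_mag, footprint_mask, max_shift=10):
    """Search integer translations of the rasterized footprint mask and
    return the (dy, dx) shift that maximizes its cross-correlation with
    the image gradient magnitude."""
    best_score, best_shift = -np.inf, (0, 0)
    mask = footprint_mask.astype(float)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Correlation score between the shifted mask and the gradients
            shifted = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
            score = float(np.sum(shifted * grad_mag))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

A brute-force search suffices here because the residual offsets between OSM footprints and orthoimages are only a few pixels in a local window.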

The rest of the paper is organized as follows. Section 2 describes the data sources used in this work. Section 3 presents our line extraction method. The method for estimating building counts is discussed in detail in Section 4. In Section 5 we conduct experiments on large datasets and provide quantitative evaluation. We conclude in Section 6.

2. DATA SOURCES AND PREPROCESSING
In our work, we use geo-referenced orthorectified images with 3 color bands. Although more spectral bands can potentially improve results, in this work we focus on RGB color images. In order to develop a learning method to count buildings, we need labeled data for training and testing. OpenStreetMap (OSM)¹ provides an ideal data source for such a purpose. OSM maps are publicly available and have detailed building footprints for many cities around the world. Moreover, as a volunteered geographic information platform, OSM has over one million contributors creating and editing geographic data [5], and therefore the map coverage will keep expanding.

Because OSM maps are generated from data sources different from our images, there may exist inconsistencies between maps and images. One type of inconsistency is mismatched features. For example, a map shows a building which is not in the corresponding image, or vice versa. This issue is mostly caused by the time difference between maps and images. Such inconsistencies are often limited in properly selected datasets.

Another type of inconsistency is misalignment between maps and images, which results from different projections and accuracy levels among data sources. Figure 1(a) shows an example of building footprints overlaid with the corresponding image. There are noticeable misalignments between building footprints and the image. Such misalignments lead to inaccurate training samples for line classification and building count estimation and need to be corrected.

We apply a simple preprocessing step to reduce the inconsistencies. We assume that in a local neighborhood the building footprints can be aligned with image content through a translation. Despite the lack of theoretical justification, this assumption leads to satisfactory results in practice. For an image window containing closely located buildings, we compute the image gradient and perform a cross-correlation between building footprints and gradient magnitude. If the building footprints and images are correctly aligned, the correlation coefficient should reach its maximum. The correction result for the data in the example can be seen in Figure 1(b).

¹http://www.openstreetmap.org/

3. STRAIGHT LINE EXTRACTION
It is a common practice to rely on low-level image features, such as corners and edges, for finding buildings [11, 14]. In this work, we use straight line segments because a major discriminative characteristic of buildings from an aerial view is straight edges. For line segment extraction, Burns et al. proposed an influential method based on line support regions [3], where each connected region with similar gradient orientations is segmented and line parameters are estimated based on the region. In this paper, we follow that framework and design a new approach to estimate line parameters, which generates accurate results with enhanced efficiency.

We use a 7 × 7 derivative of Gaussian filter with σ equal to 1.2 to compute derivatives in the horizontal and vertical directions, which provide the gradient direction and magnitude at each pixel. For pixels with gradient magnitude larger than a threshold, the gradient directions are quantized into 8 equally divided bins between 0° and 360°. Each connected region containing pixels with the same direction forms a line support region (i.e., a region containing a line segment). The direction quantization may cause a line to be broken. To address this issue, the directions are also quantized into another 8 bins between 22.5° and (360 + 22.5)°, and a different set of line support regions is produced from this quantization. The lines extracted from the two sets of line support regions are integrated through a voting scheme.

Given a line support region, we need to determine the location, length, and orientation of a line segment. In Burns' method, line orientations are estimated by fitting planes to pixel intensities in line support regions, and locations and lengths are obtained by intersecting a horizontal plane with the fitted planes. This method gives accurate results but is
computationally expensive. In order to improve efficiency, a number of studies estimate line parameters based on the boundary shapes of line support regions [16, 18]. However, region boundaries do not always reflect the actual orientations and locations of lines. For example, a line support region can be elongated perpendicularly to the actual line in the region when the edge is short and blurred. To overcome these drawbacks while keeping a low computational cost, we exploit the technique of the Harris edge and corner detector [6] to determine line orientations. For a line support region, if we shift the region and compute the pixel difference, the largest difference occurs when the shift is perpendicular to the main edge in the region, and the smallest difference occurs when it is along the edge, which corresponds to the line orientation. We construct a structure tensor

A = | Σ_W Ix²    Σ_W Ix Iy |
    | Σ_W Ix Iy  Σ_W Iy²   |,   (1)

where Ix and Iy are the derivatives in the horizontal and vertical directions in a line support region W. The shift vector resulting in the smallest difference, which indicates the line orientation, is the eigenvector corresponding to the smaller eigenvalue of A. Such a line orientation is derived from the gradients of all pixels in the region and thus is more robust to noise.

After obtaining the orientation, we need to locate the line segment such that it is best aligned with the edge in the line support region. Here we examine the overall gradient magnitude that a line passes through and choose the line that gives the maximum value. We use a fast implementation based on the Hough transform. A line is represented as r = x cos θ + y sin θ, where θ can be calculated from the line orientation. Each pixel location (x, y) in the line support region is plugged into the equation to obtain an r value, which is assigned to a quantization bin with a weight equal to the pixel's gradient magnitude. The bin with the maximum value gives the desired r value, which together with the orientation defines a unique line. The part of the line overlapping with the line support region determines the length of the line segment.

Figure 2: Comparison of line segment extraction results on two images. (a) Input images. (b) The results from [16]. (c) Our results.

Figure 3: Scatter plot of line and building numbers.
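The two estimation steps just described (structure-tensor orientation, then a gradient-weighted vote over r) can be sketched in a few lines of numpy. This is a minimal illustration, assuming the line support region is given as a boolean mask; the function name and bin count are illustrative, and details such as the exact binning may differ from the paper's implementation:

```python
import numpy as np

def line_from_region(Ix, Iy, mask, n_bins=32):
    """Estimate line parameters for one line support region: orientation
    from the structure tensor (eigenvector of the smaller eigenvalue),
    location from a gradient-magnitude-weighted vote over r."""
    ys, xs = np.nonzero(mask)
    ix, iy = Ix[ys, xs], Iy[ys, xs]
    # Structure tensor A = [[sum Ix^2, sum Ix*Iy], [sum Ix*Iy, sum Iy^2]]
    A = np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                  [np.sum(ix * iy), np.sum(iy * iy)]])
    w, v = np.linalg.eigh(A)            # eigenvalues in ascending order
    d = v[:, 0]                         # direction along the line
    theta = np.arctan2(d[0], -d[1])     # angle of the line's normal
    # Vote for r = x cos(theta) + y sin(theta), weighted by gradient magnitude
    r = xs * np.cos(theta) + ys * np.sin(theta)
    mag = np.hypot(ix, iy)
    hist, edges = np.histogram(r, bins=n_bins, weights=mag)
    k = int(np.argmax(hist))
    r_best = 0.5 * (edges[k] + edges[k + 1])
    return theta, r_best
```

On a synthetic vertical step edge, the recovered normal is horizontal and r sits on the column of the step, which matches the behavior described above.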

Figure 2 shows the lines extracted using our method and the method in [16], which is based on line support regions and applies a Fourier transform to region boundaries for line parameter estimation. It can be seen that the lines in our results are better aligned with the edges, especially when the edges are blurred and noisy. The two methods have comparable computation time.

4. LEARNING TO COUNT BUILDINGS
Our goal is to learn the relationship between line segments and buildings, which is utilized to estimate building counts. It is supervised learning with OSM building footprints used as labeled data. We now describe the method in detail.

4.1 Line-building relationship
To investigate the relationship between lines and buildings, we conduct the following experiment. We collect over 2000 building images of 0.6 meter resolution from different places in the world. The size of each image is 150×150 pixels, corresponding to a 60m×60m area. Such an area generally contains multiple buildings with approximately homogeneous structures. For each image, we extract line segments and manually count buildings. We plot the number of line segments and the number of buildings for all images, as shown in Figure 3. There is no clear relationship between the two variables.

Next, we perform the same analysis on images with similar buildings. We select a number of exemplar images with different building appearances and assign the other images to their most similar exemplars. To measure image similarity, we use spectral histograms as image descriptors, which consist of histograms of different filter responses [8]. Spectral histograms have been shown to be capable of differentiating image appearances with properly selected filters. We use the RGB color bands and the filter responses of three Laplacian of Gaussian filters to compute spectral histograms, and Euclidean distance as a distance metric. As a result, images are grouped based on appearance. We have also experimented with bag-of-words representations built on SIFT features [4], but the results of spectral histograms are more visually meaningful. We plot the number of line segments and the number of buildings for images in each group. Figure 4 shows the plots for three groups. Four example images from each group are displayed below each plot.

Figure 4: Scatter plots of line and building numbers for different image groups.
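The appearance comparison used for grouping can be sketched as follows. This is a minimal version: it uses a single discrete Laplacian in place of the paper's three Laplacian of Gaussian filters, and the function names and bin count are illustrative:

```python
import numpy as np

def laplacian(img):
    """4-neighbor discrete Laplacian with zero padding (a stand-in for
    the Laplacian of Gaussian filters used in the paper)."""
    p = np.pad(img, 1)
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * img

def spectral_histogram(rgb, n_bins=11):
    """Concatenate normalized histograms of several bands: R, G, B, and
    a filter response of the grayscale image."""
    gray = rgb.mean(axis=2)
    bands = [rgb[..., 0], rgb[..., 1], rgb[..., 2], laplacian(gray)]
    feats = []
    for b in bands:
        lo, hi = float(b.min()), float(b.max())
        # Guard against constant bands, whose histogram range collapses
        h, _ = np.histogram(b, bins=n_bins, range=(lo, hi if hi > lo else lo + 1.0))
        feats.append(h / h.sum())
    return np.concatenate(feats)

def appearance_distance(a, b):
    """Euclidean distance between spectral histograms of two images."""
    return float(np.linalg.norm(spectral_histogram(a) - spectral_histogram(b)))
```

Because each band is reduced to a normalized histogram, the descriptor compares image content regardless of window size, which is the property exploited again for line classification in Section 4.2.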

A striking observation from Figure 4 is that there is a strong linear relationship between line and building numbers. We calculate the Pearson correlation coefficient, which measures the strength of the linear relationship between two variables and equals 1 in the case of a perfect linear relationship. The Pearson correlation coefficients for the three groups are 0.85, 0.91, and 0.86, respectively. Linear relationships are also observed for the other groups. The main reason for such a line-building relationship is that buildings with similar structures tend to exhibit similar numbers of edges from an aerial view. Although extracted line segments do not perfectly match building edges, the mismatches appear consistent and do not severely affect the linear relationship. There are a few images that noticeably deviate from the linear relationship. We find that in those images many non-building line segments are counted, which often correspond to roads and trees. A stronger linear relationship can be expected if non-building line segments are filtered out.

Based on this observation, we use a simple linear regression model to associate building numbers with line segment numbers, y = βx, where x is the line segment number, y the building count, and β the regression coefficient. This model provides an effective solution for counting buildings with similar appearances. We only need to select several small areas in which to manually count buildings and extract straight line segments, which are used to estimate β through the least squares approach. The building number in the entire area is then the number of extracted line segments multiplied by β.

Another observation from Figure 4 is that the line-to-building ratio differs across groups. That is, the linear regression models may have different regression coefficients for different types of buildings. For example, for the leftmost group in the figure the building number increases slowly as the line number increases because each building corresponds to more lines in that group. Therefore, we cannot apply a single model to all types of buildings.

4.2 Line segment classification
Line segments from non-building areas should not contribute to any building counts. Removing those line segments can strengthen the linear relationship between line segment numbers and building counts. Here we aim to identify line segments corresponding to building edges. We train a multi-layer perceptron (MLP) to classify line segments based on surrounding image appearances.

Based on the line segments extracted from images and the corresponding building footprints (with alignments corrected as described in Section 2), we label each line segment as 1 if its maximum distance to a building edge is smaller than 3 meters and half its length, and 0 otherwise. The feature used for classification is the spectral histogram. Note that spectral histograms can be used to compare image content regardless of region sizes. For each line segment, the feature is computed from the region within a certain distance of the line segment. A distinctive attribute of building edges is the co-occurrence of perpendicular edges. To encode such information in the classifier, we convert RGB values to grayscale and apply two derivative of Gaussian filters, one with the same orientation as the line segment and the other perpendicular to it. The two filter responses together with the RGB color bands are used to compute spectral histograms, where each band is represented by a histogram with 11 equally divided bins. We use two neighborhood sizes to capture information at multiple scales. The MLP has 110 input nodes to take all feature dimensions, one hidden layer with 70 nodes, and 1 output node. Since building lines are
often much fewer than non-building lines, the errors during training are weighted by the size ratio between the two classes so that the result is not biased toward the larger class. After training, the MLP classifier gives the posterior probability of a line segment belonging to a building edge.

Figure 5(a) illustrates the result of line segment classification on an image. Line segments in non-building areas are greatly reduced. Figure 5(b) and (c) show two scatter plots of line and building numbers for one of the image groups mentioned in Section 4.1, where the number of filtered line segments has a higher degree of linear dependence on the building number. The Pearson correlation coefficient increases from 0.89 to 0.92 by filtering line segments.

Figure 5: Line segment classification. (a) Left: extracted line segments. Right: line segments classified as building lines. (b) and (c) Scatter plots of line and building numbers before and after line segment classification.

4.3 Building count estimation
As discussed earlier, a single linear regression model cannot apply to different types of buildings. To deal with this issue, we propose to select images similar to the input image from the training samples and establish a linear regression model based on those similar images to estimate building counts. Training samples comprise the images in the training set, the corresponding building counts obtained from building footprints, and the line segments extracted from the images. To measure image similarity, we use spectral histograms as image descriptors and Euclidean distance as a distance metric. To compute spectral histograms, we use the RGB color bands and the filter responses of three Laplacian of Gaussian filters with different σ values. After obtaining the K most similar images from the training pool, their line segment numbers and building counts are taken to estimate the regression coefficient using the least-squares approach. Line segments are extracted from the input image and filtered by the trained MLP. The building number of the input image is then immediately obtained from the regression coefficient and the line segment number.

We use a method based on K-nearest-neighbor (K-NN) search to adaptively learn a line-building relationship because such a method is well suited to our task. Since there are potentially infinitely many types of buildings, learning a model for each type is intractable. K-NN can naturally deal with a very large number of classes. Moreover, new training samples can be easily added without the need for retraining.

The complete procedure of our method can be described as the following three steps.

1. Compile a training set that includes images and the corresponding building footprints. The building count of each image is determined based on the building footprints.

2. Extract line segments for the images in the training set. Label each line segment based on whether it is aligned with edges in the building footprints. Use spectral histograms as features to train an MLP for line segment classification. Record the number of line segments filtered by the MLP.

3. Given an input image, extract line segments and count those classified as building edges by the trained MLP. Find the K most similar images in the training set and use their line numbers and building counts to derive a linear regression model, which produces the building count based on the line segment number.

5. EXPERIMENTS
We conduct experiments on two datasets, which will be referred to as Dataset I and Dataset II. The two datasets correspond to very different geographic areas.

Dataset I covers the urban areas of San Francisco, CA. We collect two 5000 × 5000 image tiles with a spatial resolution of 0.3 meters. We randomly select 400 images of size 250 × 250 within each image tile. The two sets of images are used for training and testing respectively. The OSM building footprints for the corresponding areas are quite complete. When counting buildings on maps, we count a partial building as one if the part contains more than half the area of the entire building or an area larger than 50 square meters. According to the map data, the number of buildings in these images ranges
from 0 to 31, and the average is 12. In the experiments, pixels with gradient magnitude larger than 40 are selected to identify line support regions, and line segments shorter than 3 meters are removed in order to reduce noise. For line segment classification, we use the MATLAB neural network toolbox to construct and train an MLP. For searching the K most similar images, K is set to 5.

We show some example images from the test set and their building counts from the map data and from our method in Figure 6(a). The counts from our method are rounded to integers. As can be seen, building appearances vary to a large extent. Moreover, many buildings are adjacent to each other, where individual buildings are very difficult to detect. The counts from our method are very close to those from the maps. To better show the diversity of scenes, we apply the method to two areas corresponding to highly different city blocks. Each area is divided into 250 × 250 image windows for processing, and the total count is obtained by aggregating the results. Line segments are extracted for the entire area. For each image window, we only count the line segments with centroids inside the window so that large buildings with long line segments are not double counted. The results are shown in Figure 7.

To quantitatively measure the results, we calculate the count error by comparing the counts from our method and from the maps. The average count error is 2.51. To provide a more detailed measurement, we compute the percentage of correctly counted images at different levels of error tolerance (the maximum allowable deviation from the count based on maps), which is reported in Table 1. Our method produces correct counts for 66.1% of images with an error tolerance of 2. The accuracy rate reaches 92.9% with an error tolerance of 5. We also calculate the average count errors using different K values in the K-NN search (see Table 2). We can see that the results are not overly sensitive to this parameter value.

Table 1: Percentage of correctly counted images with different error tolerance for Dataset I
Error tolerance:  2      3      4      5
Accuracy:         66.1%  79.0%  88.6%  92.9%

Table 2: Average count error with different K values on Dataset I
K:            3     4     5     6
Count error:  3.08  2.78  2.51  2.51

Dataset II covers the small city of Kissidougou in southern Guinea. The spatial resolution of the images is 0.6 meters. We use a 4500 × 2550 image tile corresponding to the southern part of the city for training and a 4500 × 3900 image tile corresponding to the northern part for testing. We randomly select 510 images from the training image tile and 780 images from the test image tile, where each image is of size 150×150 pixels. We use the same parameter settings as for Dataset I except that the gradient magnitude threshold is adjusted to 20 because of the different image resolution and quality. For this dataset, our counting result has an average count error of 1.73. Example results are presented in Figure 6(b). We also select two different areas and give the total counts, as shown in Figure 8.

We did not find any previous work that explicitly aims at counting buildings from images. However, this task is closely related to building detection. If buildings in an image are detected, counting buildings is trivial. On the other hand, if we apply our method to each small window of an image so that the counts are localized at a fine scale, the result is close to that of building detection. For comparison, we select a leading building detection method proposed by Sirmacek and Unsalan [13], which will be referred to as the SU method. The method extracts SIFT keypoints and constructs graphs based on the keypoints. Buildings are identified through subgraph matching, which can handle occluded buildings. We use the code distributed by the authors.

Since the SU method cannot detect buildings that are closely spaced, it fails to produce reasonable results for Dataset I, which contains dense buildings. We apply the SU method to Dataset II and calculate the percentage of correctly counted images as described earlier. Figure 9 presents the accuracy rates for both methods. As can be seen, our method outperforms the SU method by a significant margin. By examining the results, we find that the SU method tends to miss buildings with a low contrast to the surrounding areas because there is often no SIFT feature extracted for those buildings. In our method, line segments can be extracted for those buildings and they contribute to the final count.

Figure 9: Accuracy rates of the SU and proposed methods.

6. CONCLUSIONS
We have presented a method that automatically counts buildings in aerial images. We observe that the number of buildings in images is linearly correlated with the line segment number. By using building footprints from public cartographic databases as labeled data, we adaptively learn a linear regression model to estimate building counts in a given image. We test the method on two large datasets containing diverse building scenes and obtain very promising results.
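Pulling Sections 4.1–4.3 together, the count-estimation core (K-NN retrieval plus a one-parameter least-squares fit) can be sketched in a few lines of numpy. All names are illustrative, and the appearance descriptors and per-image line counts are assumed to be precomputed:

```python
import numpy as np

def estimate_count(query_desc, query_lines, train_descs, train_lines,
                   train_counts, K=5):
    """Predict a building count for one image: retrieve the K most similar
    training images (Euclidean distance on appearance descriptors), fit
    y = beta * x by least squares on their (line number, building count)
    pairs, and apply the model to the query's line count."""
    dists = np.linalg.norm(train_descs - query_desc, axis=1)
    nn = np.argsort(dists)[:K]
    x = train_lines[nn].astype(float)
    y = train_counts[nn].astype(float)
    beta = x.dot(y) / x.dot(x)   # least-squares slope of y = beta * x
    return int(round(beta * query_lines))
```

Because the slope is refit for every query from its own neighborhood, the same code handles building types with very different line-to-building ratios, which is exactly why a single global model is avoided.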
Figure 6: Example building count results for individual image windows. (a) Dataset I. (b) Dataset II. Building counts from maps and our automatic method are shown below each image; M stands for maps, and A for our automatic method.

Figure 7: Building count results for different city blocks in Dataset I (M: 112, A: 121; M: 47, A: 55).

Figure 8: Building count results for different areas in Dataset II (M: 153, A: 166; M: 185, A: 201).
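The tolerance-based evaluation behind Table 1 and Figure 9 is simple to reproduce; a minimal sketch, with an illustrative function name:

```python
import numpy as np

def accuracy_at_tolerance(predicted, map_counts, tolerances):
    """For each error tolerance t, return the fraction of images whose
    count error |predicted - map count| is at most t."""
    err = np.abs(np.asarray(predicted) - np.asarray(map_counts))
    return {t: float(np.mean(err <= t)) for t in tolerances}
```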


There are several directions for future work. First, based on the experiments we find that many incorrect counts come from images containing multiple types of buildings, where the learned model cannot correctly describe the line-building relationship. To reduce such errors, a plausible solution is to first segment the image based on texture information so that similar buildings form a segment and then estimate building counts for each segment. The choice of segmentation methods needs to be investigated. Second, the output of the current method is the number of buildings. In future studies, we plan to derive more information for buildings based on low-level features. For example, it appears feasible to estimate building sizes based on spatial distributions of line segments.

7. ACKNOWLEDGMENTS
This manuscript has been authored by employees of UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy. Accordingly, the United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The authors would like to acknowledge the financial support for this research from the US Government for the development of the LandScan USA model and database.

8. REFERENCES
[1] M. Awrangjeb, M. Ravanbakhsh, and C. S. Fraser. Automatic detection of residential buildings using LiDAR data and multispectral imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 65(5):457–467, 2010.
[2] T. Blaschke. Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 65(1):2–16, 2010.
[3] J. B. Burns, A. R. Hanson, and E. M. Riseman. Extracting straight lines. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-8(4):425–455, 1986.
[4] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV Workshop on Statistical Learning in Computer Vision, 2004.
[5] M. Haklay and P. Weber. OpenStreetMap: User-generated street maps. IEEE Pervasive Computing, 7(4):12–18, 2008.
[6] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, 1988.
[7] K. Karantzalos and N. Paragios. Recognition-driven two-dimensional competing priors toward automatic and accurate building detection. IEEE Transactions on Geoscience and Remote Sensing, 47(1):133–144, 2009.
[8] X. Liu and D. Wang. A spectral histogram model for texton modeling and texture discrimination. Vision Research, 42(23):2617–2634, 2002.
[9] H. Mayer. Object extraction in photogrammetric computer vision. ISPRS Journal of Photogrammetry and Remote Sensing, 63(2):213–222, 2008.
[10] V. Mnih and G. E. Hinton. Learning to detect roads in high-resolution aerial images. In Proceedings of European Conference on Computer Vision, 2010.
[11] M. Molinier, J. Laaksonen, and T. Hame. Detecting man-made structures and changes in satellite imagery with a content-based information retrieval system built on self-organizing maps. IEEE Transactions on Geoscience and Remote Sensing, 45(4):861–874, 2007.
[12] Y.-W. Seo, C. Urmson, and D. Wettergreen. Exploiting publicly available cartographic resources for aerial image analysis. In Proceedings of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2012.
[13] B. Sirmacek and C. Unsalan. Urban-area and building detection using SIFT keypoints and graph theory. IEEE Transactions on Geoscience and Remote Sensing, 47(4):1156–1167, 2009.
[14] B. Sirmacek and C. Unsalan. A probabilistic framework to detect buildings in aerial and satellite images. IEEE Transactions on Geoscience and Remote Sensing, 49(1):211–221, 2011.
[15] G. Sohn and I. Dowman. Data fusion of high-resolution satellite imagery and LiDAR data for automatic building extraction. ISPRS Journal of Photogrammetry and Remote Sensing, 62(1):43–63, 2007.
[16] C. Unsalan and K. L. Boyer. Classifying land development in high-resolution panchromatic satellite images using straight-line statistics. IEEE Transactions on Geoscience and Remote Sensing, 42(4):907–919, 2004.
[17] C. Ünsalan and K. L. Boyer. A system to detect houses and residential street networks in multispectral satellite images. Computer Vision and Image Understanding, 98(3):423–461, 2005.
[18] R. von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall. LSD: A fast line segment detector with a false detection control. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4):722–732, 2010.
[19] Y. Wei, Z. Zhao, and J. Song. Urban building extraction from high-resolution satellite panchromatic image using clustering and edge detection. In Proceedings of IEEE International Geoscience and Remote Sensing Symposium, 2004.
[20] J. Yuan and A. M. Cheriyadat. Road segmentation in aerial images by exploiting road vector data. In Proceedings of International Conference on Computing for Geospatial Research and Application, 2013.
[21] Q.-Y. Zhou and U. Neumann. Fast and extensible building modeling from airborne LiDAR data. In Proceedings of ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2008.
