The rest of the paper is organized as follows. Section 2 describes the data sources used in this work. Section 3 presents our line extraction method. The method for estimating building counts is discussed in detail in Section 4. In Section 5 we conduct experiments on large datasets and provide quantitative evaluation. We conclude in Section 6.

2. DATA SOURCES AND PREPROCESSING
In our work, we use geo-referenced orthorectified images with 3 color bands. Although more spectral bands can potentially improve results, in this work we focus on RGB color images. In order to develop a learning method to count buildings, we need labeled data for training and testing. OpenStreetMap (OSM)¹ provides an ideal data source for such a purpose. OSM maps are publicly available and have detailed building footprints for many cities around the world. Moreover, as a volunteered geographic information platform, OSM has over one million contributors who create and edit geographic data [5], and therefore the map coverage will keep expanding.

¹ http://www.openstreetmap.org/

Because OSM maps are generated using data sources different from our images, there may exist inconsistencies between maps and images. One type of inconsistency is mismatched features. For example, a map shows a building which is not in the corresponding image, or vice versa. This issue is mostly caused by the time difference between maps and images. Such inconsistencies are often limited in properly selected datasets.

Another type of inconsistency is misalignment between maps and images, which results from different projections and accuracy levels among data sources. Figure 1(a) shows an example of building footprints overlaid on the corresponding image. There are noticeable misalignments between the building footprints and the image. Such misalignments lead to inaccurate training samples for line classification and building count estimation and need to be corrected.

We apply a simple preprocessing step to reduce these inconsistencies. We assume that in a local neighborhood the building footprints can be aligned with the image content through a translation. Despite the lack of theoretical justification, this assumption leads to satisfactory results in practice. For an image window containing closely located buildings, we compute the image gradient and perform a cross-correlation between the building footprints and the gradient magnitude. If the building footprints and images are correctly aligned, the correlation coefficient should reach its maximum. The correction result for the data in the example can be seen in Figure 1(b).
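For concreteness, a minimal Python/NumPy sketch of this correction step is given below. It is not taken from the implementation described above: the Sobel gradient, the search radius, and the wrap-around shift are simplifying assumptions. It assumes a grayscale image window and a rasterized binary mask of the footprint edges, and it exhaustively searches integer translations for the one that maximizes the correlation with the gradient magnitude.

import numpy as np
from scipy import ndimage

def align_footprints(gray_window, footprint_mask, max_shift=20):
    # Gradient magnitude of the image window.
    gy = ndimage.sobel(gray_window, axis=0, mode="nearest")
    gx = ndimage.sobel(gray_window, axis=1, mode="nearest")
    grad_mag = np.hypot(gx, gy)

    best_shift, best_corr = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps around the border; acceptable for a small search range.
            shifted = np.roll(footprint_mask.astype(float), (dy, dx), axis=(0, 1))
            # Correlation coefficient between shifted footprint mask and gradient magnitude.
            corr = np.corrcoef(shifted.ravel(), grad_mag.ravel())[0, 1]
            if corr > best_corr:
                best_corr, best_shift = corr, (dy, dx)
    return best_shift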
3. STRAIGHT LINE EXTRACTION
It is common practice to rely on low-level image features, such as corners and edges, for finding buildings [11, 14]. In this work, we use straight line segments because a major discriminative characteristic of buildings from an aerial view is their straight edges. For line segment extraction, Burns et al. proposed an important method based on line support regions [3], where each connected region with similar gradient orientations is segmented and line parameters are estimated based on the region. In this paper, we follow this framework and design a new approach to estimate line parameters, which generates accurate results with enhanced efficiency.

We use a 7 × 7 derivative-of-Gaussian filter with σ equal to 1.2 to compute derivatives in the horizontal and vertical directions, which provide the gradient direction and magnitude at each pixel. For pixels with gradient magnitude larger than a threshold, the gradient directions are quantized into 8 equally divided bins between 0° and 360°. Each connected region containing pixels with the same quantized direction forms a line support region (i.e., a region containing a line segment). The direction quantization may cause a line to be broken. To address this issue, the directions are also quantized into another 8 bins between 22.5° and (360 + 22.5)°, and a different set of line support regions is produced based on this quantization. The lines extracted from the two sets of line support regions are integrated through a voting scheme.
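As an illustration of this step, here is a hedged Python/SciPy sketch (not the authors' code). The Gaussian derivative filtering only approximates the 7 × 7 kernel, and the magnitude threshold below is an assumed value. Calling the function a second time with a 22.5° offset yields the second set of regions, which would then be merged by the voting scheme described above (not shown).

import numpy as np
from scipy import ndimage

def line_support_regions(gray, sigma=1.2, mag_thresh=30.0, offset_deg=0.0):
    # gray: float image. Derivative-of-Gaussian gradients.
    ix = ndimage.gaussian_filter(gray, sigma, order=(0, 1))  # d/dx
    iy = ndimage.gaussian_filter(gray, sigma, order=(1, 0))  # d/dy
    mag = np.hypot(ix, iy)
    ang = (np.degrees(np.arctan2(iy, ix)) - offset_deg) % 360.0

    strong = mag > mag_thresh
    bins = (ang // 45).astype(int)  # 8 equally divided orientation bins
    regions = []
    for b in range(8):
        # Connected pixels sharing the same quantized direction form one region.
        labeled, n = ndimage.label(strong & (bins == b))
        regions += [np.argwhere(labeled == i) for i in range(1, n + 1)]
    return regions, ix, iy, mag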
Given a line support region, we need to determine the location, length, and orientation of a line segment. In Burns' method, line orientations are estimated by fitting planes to pixel intensities in line support regions, and locations and lengths are obtained by intersecting a horizontal plane with the fitted planes. This method gives accurate results but is
computationally expensive. In order to improve efficiency, a
number of studies estimate line parameters based on bound-
ary shapes of line support regions [16, 18]. However, region
boundaries do not always reflect the actual orientations and
locations of lines. For example, a line support region can
be elongated perpendicularly to the actual line in the re-
gion when the edge is short and blurred. To overcome the
drawbacks while keeping a low computational cost, we ex-
ploit the technique of Harris edge and corner detector [6] to
determine line orientations. For a line support region, if we
shift the region and compute the pixel difference, the largest
difference occurs when the shift is perpendicular to the main
edge in the region, and the smallest difference occurs when
it is along the edge, which corresponds to the line orientation.
We construct a structure tensor

A = \begin{pmatrix} \sum_{W} I_x^2 & \sum_{W} I_x I_y \\ \sum_{W} I_x I_y & \sum_{W} I_y^2 \end{pmatrix}, \qquad (1)

where the sums are taken over the line support region W, and I_x and I_y denote the horizontal and vertical derivatives. The line orientation is given by the eigenvector of A associated with the smaller eigenvalue. Given the orientation, we then determine the location of the line
segment such that it is best aligned with the edge in the line
support region. Here we examine the overall gradient magnitude that a line passes through and choose the line giving the maximum value. We use a fast implementation based on the Hough transform. A line is represented as r = x cos θ + y sin θ, where θ can be calculated from the line orientation. Each pixel location (x, y) in the line support region is plugged into the equation to obtain an r value, which is assigned to a quantization bin with a weight equal to its gradient magnitude. The bin with the maximum value gives the desired r value, which together with the orientation defines a unique line. The part of the line overlapping with the line support region determines the length of the line segment.
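A compact sketch of the parameter estimation for one region is given below, assuming the region's pixel coordinates and the derivative images from the previous step. The bin width and sign conventions are choices made in this sketch, not specifics taken from the text above.

import numpy as np

def line_from_region(region_pixels, ix, iy, mag, r_bin=1.0):
    # region_pixels: (K, 2) array of (row, col) coordinates of one line support region.
    ys, xs = region_pixels[:, 0], region_pixels[:, 1]
    gx, gy = ix[ys, xs], iy[ys, xs]

    # Structure tensor of Eq. (1), with sums taken over the region W.
    a11, a12, a22 = np.sum(gx * gx), np.sum(gx * gy), np.sum(gy * gy)
    A = np.array([[a11, a12], [a12, a22]])
    evals, evecs = np.linalg.eigh(A)        # eigenvalues in ascending order
    ex, ey = evecs[:, 0]                    # smaller eigenvalue: direction along the edge
    theta = np.arctan2(ey, ex) + np.pi / 2  # angle of the line normal

    # Each pixel votes for r = x cos(theta) + y sin(theta), weighted by gradient magnitude.
    r = xs * np.cos(theta) + ys * np.sin(theta)
    r_min = r.min()
    bins = np.round((r - r_min) / r_bin).astype(int)
    votes = np.bincount(bins, weights=mag[ys, xs])
    r_best = r_min + np.argmax(votes) * r_bin
    return theta, r_best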
Figure 3: Scatter plot of line and building numbers.

Figure 4: Scatter plots of line and building numbers for different image groups.
A striking observation from Figure 4 is that there is a strong linear relationship between line and building numbers. We calculate the Pearson correlation coefficient, which measures the strength of the linear relationship between two variables and equals 1 in the case of a perfect linear relationship. The Pearson correlation coefficients for the three groups are 0.85, 0.91, and 0.86, respectively. Linear relationships are also observed for the other groups. The main reason for such a line-building relationship is that buildings with similar structures tend to exhibit similar numbers of edges from an aerial view. Although extracted line segments do not perfectly match building edges, the mismatches appear consistent and do not severely affect the linear relationship. There are a few images that noticeably deviate from the linear relationship. We find that in those images many non-building line segments are counted, which often correspond to roads and trees. A stronger linear relationship can be expected if non-building line segments are filtered out.
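For reference, the coefficient for a group of images with line numbers x_i and building counts y_i is the standard quantity

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}},

where \bar{x} and \bar{y} denote the means over the group.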
Based on this observation, we use a simple linear regression model to associate building numbers with line segment numbers, y = βx, where x is the line segment number, y the building count, and β the regression coefficient. This model provides an effective solution for counting buildings with similar appearances. We only need to select several small areas, manually count the buildings there, and extract straight line segments, which are used to estimate β through the least-squares approach. The building number in the entire area is then equal to the number of extracted line segments multiplied by β.
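Since the model has no intercept, the least-squares estimate mentioned above has a simple closed form,

\hat{\beta} = \frac{\sum_i x_i y_i}{\sum_i x_i^2},

where the sums run over the manually counted sample areas.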
Another observation from Figure 4 is that the line-to-building ratio differs across groups. That is, the linear regression models may have different regression coefficients for different types of buildings. For example, for the leftmost group in the figure, the building number increases slowly as the line number increases because each building corresponds to more lines in that group. Therefore, we cannot apply a single model to all types of buildings.

4.2 Line segment classification
Line segments from non-building areas should not contribute to building counts. Removing those line segments can strengthen the linear relationship between line segment numbers and building counts. Here we aim to identify line segments corresponding to building edges. We train a multi-layer perceptron (MLP) to classify line segments based on the surrounding image appearance.

Based on the line segments extracted from images and the corresponding building footprints (with alignments corrected as described in Section 2), we label each line segment as 1 if its maximum distance to a building edge is smaller than 3 meters and half its length, and 0 otherwise. The feature used for classification is the spectral histogram. Note that spectral histograms can be used to compare image content regardless of region sizes. For each line segment, the feature is computed from the region within a certain distance to the line segment. A distinctive attribute of building edges is the co-occurrence of perpendicular edges. To encode such information in the classifier, we convert RGB values to grayscale and apply two derivative-of-Gaussian filters, one with the same orientation as the line segment and the other perpendicular to it. The two filter responses together with the RGB color bands are used to compute spectral histograms, where each band is represented by a histogram with 11 equally divided bins. We use two neighborhood sizes to capture information at multiple scales. The MLP has 110 input nodes to take all feature dimensions, one hidden layer with 70 nodes, and 1 output node.
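As one possible realization of this feature, the sketch below builds a 110-dimensional descriptor from 11-bin histograms of five bands (R, G, B, and two oriented derivative responses) over two neighborhood sizes. The distance map, the neighborhood widths, and the normalization are choices of this sketch rather than specifications from the text above.

import numpy as np
from scipy import ndimage

def segment_feature(rgb, seg_dist, theta, widths=(5, 15), n_bins=11):
    # rgb: (H, W, 3) float image; seg_dist: per-pixel distance to the line segment;
    # theta: orientation angle of the segment.
    gray = rgb.mean(axis=2)
    gx = ndimage.gaussian_filter(gray, 1.2, order=(0, 1))
    gy = ndimage.gaussian_filter(gray, 1.2, order=(1, 0))
    # Derivative responses along and perpendicular to the segment direction.
    along = np.cos(theta) * gx + np.sin(theta) * gy
    perp = -np.sin(theta) * gx + np.cos(theta) * gy

    bands = [rgb[..., 0], rgb[..., 1], rgb[..., 2], along, perp]
    feat = []
    for w in widths:                      # two neighborhood sizes
        region = seg_dist <= w
        for band in bands:
            vals = band[region]
            hist, _ = np.histogram(vals, bins=n_bins, range=(band.min(), band.max()))
            feat.append(hist / max(vals.size, 1))  # normalized 11-bin histogram
    return np.concatenate(feat)           # 2 sizes x 5 bands x 11 bins = 110 dimensions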
Since building lines are typically far fewer than non-building lines, the errors during training are weighted based on the size ratio between the two classes so that the result is not biased toward the larger class. After training, the MLP classifier gives the posterior probability of a line segment belonging to building edges.
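The text above does not name a training framework; as one way to realize the class-size weighting, the PyTorch sketch below weights errors on the rarer building-line class by the negative-to-positive ratio.

import torch
import torch.nn as nn

def train_line_classifier(X, y, epochs=200, lr=1e-3):
    # X: (N, 110) float tensor of features; y: (N, 1) float tensor, 1 for building lines.
    model = nn.Sequential(nn.Linear(110, 70), nn.Tanh(), nn.Linear(70, 1))
    # Weight errors on the rarer class by the class-size ratio.
    n_pos = float((y == 1).sum())
    n_neg = float((y == 0).sum())
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_neg / n_pos]))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    return model  # torch.sigmoid(model(x)) gives the posterior probability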
Figure 5(a) illustrates the result of line segment classification on an image. Line segments in non-building areas are greatly reduced. Figure 5(b) and (c) show two scatter plots of line and building numbers for one of the image groups mentioned in Section 4.1, where the number of filtered line segments has a higher degree of linear dependence on the building number. The Pearson correlation coefficient increases from 0.89 to 0.92 by filtering line segments.

Figure 5: Line segment classification. (a) Left: extracted line segments. Right: line segments classified as building lines. (b) and (c) Scatter plots of line and building numbers before and after line segment classification.

4.3 Building count estimation
As discussed earlier, a single linear regression model cannot be applied to different types of buildings. To deal with this issue, we propose to select images similar to the input image from the training samples and establish a linear regression model based on these similar images to estimate the building count. Training samples comprise images in the training set, the corresponding building counts obtained from building footprints, and line segments extracted from the images. To measure image similarity, we use spectral histograms as image descriptors and the Euclidean distance as the distance metric. To compute the spectral histograms, we use the RGB color bands and the filter responses of three Laplacian of Gaussian filters with different σ values. After obtaining the K most similar images from the training pool, their line segment numbers and building counts are taken to estimate the regression coefficient. The complete procedure is summarized as follows:

1. Compile a training set that includes images and the corresponding building footprints. The building count of each image is determined based on the building footprints.

2. Extract line segments for images in the training set. Label each line segment based on whether it is aligned with edges in building footprints. Use spectral histograms as features to train an MLP for line segment classification. Record the number of line segments filtered by the MLP.

3. Given an input image, extract line segments and count those classified as building edges by the trained MLP. Find the K most similar images from the training set and use their line numbers and building counts to derive a linear regression model, which produces the building count based on the line segment number.
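A minimal sketch of step 3, under the assumption that descriptors, line segment counts, and building counts for the training images are available as NumPy arrays:

import numpy as np

def estimate_count(query_desc, query_lines, train_descs, train_lines, train_counts, k=5):
    # Find the K most similar training images by Euclidean distance between descriptors.
    dists = np.linalg.norm(train_descs - query_desc, axis=1)
    nearest = np.argsort(dists)[:k]
    x = train_lines[nearest].astype(float)
    y = train_counts[nearest].astype(float)
    # Least-squares slope for the no-intercept model y = beta * x.
    beta = np.sum(x * y) / np.sum(x * x)
    return beta * query_lines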
5. EXPERIMENTS
We conduct experiments on two datasets, which will be referred to as Dataset I and Dataset II. The two datasets correspond to very different geographic areas.

Dataset I covers the urban areas of San Francisco, CA. We collect two 5000 × 5000 image tiles with a spatial resolution of 0.3 meters. We randomly select 400 images of size 250 × 250 within each image tile. The two sets of images are used for training and testing, respectively. The OSM building footprints for the corresponding areas are quite complete. When counting buildings on maps, we count a partial building as one if the part contains more than half the area of the entire building or an area larger than 50 square meters. According to the map data, the number of buildings in these images ranges
Table 1: Percentage of correctly counted images with different error tolerance for Dataset I.
Error tolerance:   2       3       4       5
Accuracy:          66.1%   79.0%   88.6%   92.9%

Table 2: Average count error with different K values on Dataset I.
K:             3      4      5      6
Count error:   3.08   2.78   2.51   2.51

[Figure: accuracy rate versus error tolerance for the SU method and the proposed method.]
To quantitatively measure the results, we calculate the count error by comparing the counts from our method and from the maps. The average count error is 2.51. To provide a more detailed measurement, we compute the percentage of correctly counted images at different levels of error tolerance (the maximum allowable deviation from the map-based count), which is reported in Table 1. Our method produces correct counts for 66.1% of the images with an error tolerance of 2. The accuracy rate reaches 92.9% with an error tolerance of 5. We also calculate the average count errors using different K values in the K-NN search (see Table 2). We can see that the results are not overly sensitive to this parameter value.
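The two reported measures can be computed as follows (a small sketch; array inputs are assumed):

import numpy as np

def evaluate(pred_counts, map_counts, tolerances=(2, 3, 4, 5)):
    # Absolute count error per image, its mean, and accuracy at each error tolerance.
    err = np.abs(np.asarray(pred_counts) - np.asarray(map_counts))
    avg_error = err.mean()
    accuracy = {t: float((err <= t).mean()) for t in tolerances}
    return avg_error, accuracy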
Dataset II covers the small city of Kissidougou in southern Guinea. The spatial resolution of the images is 0.6 meters. We use a 4500 × 2550 image tile corresponding to the south part of the city for training and a 4500 × 3900 image tile corresponding to the north part for testing. We randomly select 510 images from the training image tile and 780 images from the test image tile, where each image is of size 150 × 150 pixels. We use the same parameter settings as for Dataset I except for adjusting the gradient magnitude threshold to 20 because of the different image resolution and quality. For this dataset, our counting result has an average count error

Since the SU method cannot detect buildings that are closely spaced, it fails to produce reasonable results for Dataset I, which contains dense buildings. We apply the SU method to Dataset II and calculate the percentage of correctly counted images as described earlier. Figure 9 presents the accuracy rates for both methods. As can be seen, our method outperforms the SU method by a significant margin. By examining the results, we find that the SU method tends to miss buildings with a low contrast to the surrounding areas because there is often no SIFT feature extracted for those buildings. In our method, line segments can be extracted for those buildings and they contribute to the final count.

6. CONCLUSIONS
We have presented a method that automatically counts buildings in aerial images. We observe that the number of buildings in images is linearly correlated with the line segment number. By using building footprints from public cartographic databases as labeled data, we adaptively learn a linear regression model to estimate building counts in a given image. We test the method on two large datasets containing diverse building scenes and obtain very promising results.
Figure 6: Example building count results for individual image windows. (a) Dataset I. (b) Dataset II. Building counts from maps and from our automatic method are shown below each image. M stands for maps, and A for our automatic method.