

SCIENCE CHINA
Information Sciences Manuscript
. RESEARCH PAPER . xxxxxx 2014, Vol. 57 xxxxxx:1–xxxxxx:15
doi: xxxxxxxxxxxxxx

Detection of collapsed buildings with the aerial images captured from UAV

Hua ChunSheng1, Qi JunTong1*, Shang Hong2, Hu WeiJian2 & Han JianDa1

1State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, ShenYang 110079, China;
2National Earthquake Response Support Service, Beijing 100049, China

*Corresponding author (email: [email protected])

Received ; accepted

Abstract In this paper, we present a method for detecting collapsed buildings in aerial images captured by an unmanned aerial vehicle (UAV) for postseismic evaluation. Different from conventional methods that apply satellite images or a high-altitude UAV for coarse disaster evaluation over a large area, the purpose of this work is to achieve accurate detection of collapsed buildings in a small area from low altitude. By combining the motion and appearance features of collapsed buildings extracted from successive aerial images, each pixel in the input image is measured by a statistical method in which background pixels are penalized and pixels of collapsed buildings are assigned high values. Candidates of collapsed buildings are established by integrating the extracted feature points into local groups with an online clustering algorithm. To reduce the false alarms caused by complex background noise, each predicted candidate is further verified by a temporal tracking framework in which both the trajectory and the appearance of the candidate are measured. A candidate of collapsed buildings that survives for a long time is considered a true positive, and otherwise is rejected as a false alarm. Through extensive experiments, the efficiency and effectiveness of the proposed algorithm have been demonstrated.
Keywords Collapsed Buildings, Aerial Images, UAV, Online Detection, Temporal Tracking

Citation Hua Chunsheng. Title citation. Sci China Inf Sci, 2014, 57: xxxxxx(15), doi: xxxxxxxxxxxxxx

1 Introduction
After an earthquake, collapsed buildings are among the most serious threats to human lives and property, so the ability to automatically detect them is very important for the subsequent rescue process: it helps people rapidly evaluate the possible losses and efficiently assign the limited rescue resources to those who need help most. In this paper, we focus on the task of detecting collapsed buildings from low-altitude aerial images captured by an unmanned aerial vehicle (UAV) over a small area. Here, we define a "collapsed building" as a building that is completely destroyed, or one of which a part is broken so that its shape has changed. Buildings whose roofs or windows are broken, or whose walls have cracked, are called damaged buildings.
According to their working mechanisms, the traditional postseismic evaluation methods for collapsed buildings can be categorized into three types: (I) predicting the number and coarse distribution of collapsed buildings according to the earthquake strength and the local geological structure; (II) coarsely evaluating the collapsed buildings using remote sensing images captured by satellites or high-altitude fixed-wing UAVs; and (III) accurately evaluating individual buildings from the reports of rescue teams on the ground.
Method (I) is the fastest: it can predict the distribution and approximate number of collapsed buildings according to the earthquake parameters (such as the earthquake intensity and hypocentral depth) as well as the geological structure around the hypocenter. Since this method can only provide approximate information about collapsed buildings, further accurate evaluation of individual buildings still relies on the other methods. As for Method (II), although it can achieve more accurate detection of collapsed buildings over a wide area, its success rests on the assumption that preseismic and postseismic geographic data of the evaluated area are available (a detailed discussion can be found in Section 2). Method (III) is the most accurate postseismic evaluation because the number and positions of collapsed buildings are checked one by one by humans; it is therefore laborious and time-consuming.
As shown in Fig. 1, the difficulties of detecting collapsed buildings from low-altitude aerial images include: (1) the appearance variation of background objects (including normal buildings and other objects) is huge; and (2) the appearance of a collapsed building is unpredictable due to camera viewpoint, illumination, camera motion, etc. Because of these problems, conventional learning-based object detection algorithms [18, 19, 20, 21, 12] are unsuitable for this detection task. The binary object detector derived from a learning-based algorithm is usually obtained from an offline training process, where massive positive and negative training samples are required to cover the possible variations of target and background conditions. In the case of detecting collapsed buildings from a low-altitude UAV, the mobile aerial camera further increases the complexity and quantity of the required training data. Such complex, huge training data sets lead to the overtraining problem, which means that, in an unknown scene, there is no guarantee of the performance of a learning-based detection algorithm.

Figure 1 The huge appearance variation of normal and collapsed buildings makes collapse detection a difficult task for learning-based algorithms. (a) Huge variation in the appearance of normal buildings; (b) unpredictable appearance variation of the collapsed buildings.

In this paper, for disaster evaluation, we present an online detection framework for collapsed buildings that uses only the postseismic aerial images captured from a UAV. Based on the appearance and motion features extracted from successive aerial images, the probability that an image pixel belongs to part of a collapsed building is measured by a statistical formulation, where pixels of collapsed buildings gain high values while background pixels are penalized. Through this measurement, a collapsed building is represented by a group of feature points with high similarity values, and candidates of collapsed buildings are produced by integrating those points into local regions with an online clustering algorithm. To reduce false alarms caused by random background noise, the produced candidates are further verified by a temporal tracking process, where a candidate that survives for a long time is considered a truly collapsed building, and otherwise a false alarm. The contributions of this paper include: (1) a new method for describing the appearance of collapsed buildings; (2) a novel energy formulation for measuring the probability that an image pixel belongs to part of a collapsed building according to motion, appearance, and color information; (3) verification of collapsed-building candidates with a temporal tracking algorithm; and (4) almost real-time detection of collapsed buildings under complex conditions. Through various experiments under complex conditions, the efficiency and effectiveness of the proposed algorithm have been demonstrated.

2 Related works

To detect collapsed buildings automatically, great efforts have been made, such as using high-resolution satellite imagery or images captured from high-altitude UAVs to estimate the number and positions of collapsed buildings. Generally, these methods can be categorized into two types: detecting the collapsed buildings (1) by measuring the difference between the preseismic and postseismic digital surface models (hereafter called DSMs) created from aerial images of the target area, and (2) by checking changes (including height or shadow) in ground 3D models created by a light detection and ranging (LIDAR) system.
Tong et al. [1, 2] proposed an accurate detection algorithm for collapsed buildings that compares the changes between preseismic and postseismic digital elevation models (hereafter called DEMs) created from high-resolution IKONOS satellite stereo images. In [1], they detected collapsed buildings by comparing the height changes in the DEM before and after the earthquake, where large height changes correspond to a collapsed building. In [2], if the shadow of a building changes greatly before and after the earthquake, the building is indicated as collapsed.
Similar ideas were applied in [6], where the DSMs of all buildings are automatically created from a pair of aerial images, and collapsed buildings are detected by checking the changes in the DSM before and after the earthquake. In [7], a binary detector of collapsed buildings was obtained from a support vector machine (SVM), where the training data include the height changes in the DSM and manually selected feature points of buildings. The collapsed buildings are detected by running such a detector over the test image at all positions and scales. The success of the aforementioned methods rests on the assumption that the preseismic and postseismic DSMs (or DEMs) are available. However, this assumption may not hold, especially in the first days after an earthquake, because it takes days to acquire the high-resolution satellite imagery and create the DSM (or DEM) from the captured images.
Other efforts [4, 5, 17] have been made to detect collapsed buildings using only postseismic images. In [4], the feature points of collapsed buildings are extracted from Airborne Laser Scanner (ALS) data with the Hough transform, and a binary detector of collapsed buildings is obtained by training on the feature points of collapsed buildings and background components with a maximum entropy modeling method. In [5], principal component analysis and linear discriminant analysis were applied to reduce the dimensionality of the training ALS data while keeping the detection accuracy unchanged. Since the appearance variations of collapsed buildings and their surrounding background are huge (as in Fig. 1), such learning-based detection algorithms usually require huge training data sets to cover all possible appearance variations. However, too many training samples may lead to the overtraining problem, which means that, in an unknown scene, the performance of such detection algorithms cannot be guaranteed.
Besides the learning-based detection algorithms, online collapse detection methods have also been well studied. Li et al. [15] combined Laplacian gradients with morphological textures to describe the appearance of collapsed buildings in one spectral band. The collapsed buildings are extracted as groups of pixels whose appearance value exceeds a predefined threshold. Despite the good results reported in that work, its performance depends heavily on manually selected thresholds, which means it has to operate under human supervision. In [17], the features of damaged buildings are described by the gray image and a gradient orientation histogram. When the variation of the histogram exceeds a threshold, an object is considered a collapsed building, otherwise uncollapsed. To reduce the computational complexity, all test samples in that work are image patches manually cropped from the full images and divided into two pure groups: collapsed buildings and background. The complexity of the test image patches is therefore determined by humans, and without such human prior knowledge the performance of this method on a full aerial image is questionable. Since covering all research on detecting collapsed buildings is beyond the scope of this paper, more detailed surveys can be found in [1, 2].

3 Proposed system

3.1 System setup and overview

To detect collapsed buildings from a low-altitude UAV, we propose an online bottom-up detection method that isolates a collapsed building from its surrounding background by combining the motion and appearance features of image pixels. Fig. 2 shows the hardware setup of our system, which is composed of a rotor-wing UAV [8, 9, 11] and a pan-tilt active camera platform. The aerial images captured by the active camera are transferred to a ground PC, where our detection algorithm processes them to find the collapsed buildings. The flowchart of the proposed system is also shown in Fig. 2.
Figure 2 The hardware setup and flowchart of the proposed collapse detection system. Hardware: a rotor-wing unmanned aerial vehicle with an active camera platform. Flowchart: input successive aerial images; motion analysis and appearance analysis (shape and color); similarity measurement; estimating candidates of collapsed buildings; verification of the estimated candidates; outputs.

3.1.1 Motion analysis of aerial images


Since the ruins of a collapsed building often fall onto the ground in arbitrary directions, their shape contains random gradients without a particular orientation. Based on this consideration, in successive aerial images the random shape of the ruins creates special motion features that help us discriminate them from uncollapsed buildings. To extract the motion feature of the ruins of collapsed buildings, at a given pixel (x, y) we compute its motion vector ∇I = {Ix, Iy, It} as the first-order derivatives of the image intensity in the spatial and temporal directions from three successive frames:

Ix = I(x + 1, y, t) − I(x − 1, y, t), (1)

Iy = I(x, y + 1, t) − I(x, y − 1, t), (2)

It = I(x, y, t + 1) − I(x, y, t − 1), (3)

||∇I|| = √(Ix² + Iy² + It²). (4)

Here, Ix and Iy are the spatial gradients at (x, y), It represents its temporal movement, and ||∇I|| is the magnitude of the vector ∇I. Through the motion analysis, each pixel in the image is assigned a binary label M(x, y) as

M(x, y) = { 1 if ||∇I|| − θ > 0; 0 otherwise }.

Here, θ is a predefined threshold for filtering out unnecessary noise (here, we set θ = 20).
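As a concrete illustration, the following is a minimal NumPy sketch of Eqs. (1)–(4) and the thresholding above, assuming three successive grayscale frames of equal size; the function and variable names are ours, not the authors' code.

```python
import numpy as np

def motion_mask(prev_frame, cur_frame, next_frame, theta=20.0):
    """Binary motion label M(x, y) from three successive grayscale frames (Eqs. (1)-(4))."""
    I = cur_frame.astype(np.float32)
    Ix = np.zeros_like(I)
    Iy = np.zeros_like(I)
    # Spatial gradients by central differences on the current frame (Eqs. (1)-(2)).
    Ix[:, 1:-1] = I[:, 2:] - I[:, :-2]
    Iy[1:-1, :] = I[2:, :] - I[:-2, :]
    # Temporal gradient from the previous and next frames (Eq. (3)).
    It = next_frame.astype(np.float32) - prev_frame.astype(np.float32)
    # Gradient magnitude ||∇I|| (Eq. (4)) thresholded by θ.
    mag = np.sqrt(Ix ** 2 + Iy ** 2 + It ** 2)
    return (mag - theta > 0).astype(np.uint8)
```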

3.1.2 Appearance analysis

To avoid the overtraining problem that learning-based detection algorithms suffer from, in this paper we select an online bottom-up algorithm for detecting the collapsed buildings. Since the ruins of a collapsed building usually appear as a small region containing strong random gradients in almost all directions, we select the histogram of oriented gradients (HOG) feature [19] to describe the appearance of the ruins. Centered at pixel (x, y), the computed gradients of all pixels within an n × n cell are binned into an m-bin histogram, where each bin corresponds to one orientation. After L1 normalization, the variation of this histogram is calculated as

H(x, y) = √( (1/m) Σ_{i=1}^{m} ||b_i − b̄||² ). (5)
Here, b_i represents the bin value of each orientation and b̄ refers to the mean value of all bins. As shown in Fig. 3, since a normal building always contains well-organized gradients in parallel directions, the variation of its histogram bins tends to be large (in other words, the output of Eq. (5) is large). For a collapsed building, the bin variation of the HOG feature is quite small. This property helps us discriminate a normal building from a collapsed one without a prior training process.
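For illustration, here is a small sketch of Eq. (5) for one cell, assuming a square grayscale cell and a 9-bin unsigned orientation histogram; the bin count, magnitude weighting, and helper name are illustrative choices rather than the exact settings of the paper.

```python
import numpy as np

def hog_bin_variation(cell, m=9):
    """Bin variation H(x, y) of an L1-normalized orientation histogram (Eq. (5))."""
    cell = cell.astype(np.float32)
    # Gradients and orientations within the n x n cell.
    gx = np.gradient(cell, axis=1)
    gy = np.gradient(cell, axis=0)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation in [0, pi)
    # Magnitude-weighted m-bin orientation histogram, then L1 normalization.
    hist, _ = np.histogram(ang, bins=m, range=(0.0, np.pi), weights=mag)
    hist = hist / (hist.sum() + 1e-6)
    # Standard deviation over the bins: large for parallel gradients, small for random ruins.
    return float(np.sqrt(np.mean((hist - hist.mean()) ** 2)))
```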

Figure 3 The appearance variations of normal and collapsed buildings with the HOG feature. Top: a normal building, its extracted HOG feature, and the resulting orientation histogram, which has a few dominant bins. Bottom: a collapsed building, its extracted HOG feature, and the resulting orientation histogram, which is nearly uniform over the 9 bins.
Besides the HOG feature, a further color similarity measurement is also applied in this work. Through the analysis of aerial images captured over the countryside in the Lushan area, we noticed that the collapsed buildings have colors similar to each other. That is because, in the same town, buildings are usually composed of similar materials (concrete and wood) and painted in similar colors, which is quite common in the countryside. Based on this observation, a predefined color (Rp, Gp, Bp) is selected online by the UAV operator to describe the representative color of collapsed buildings (which means this predefined color may change from scene to scene). At each image pixel (x, y), whose color is represented as (Rx,y, Gx,y, Bx,y), a Gaussian kernel mask C(x, y) is used to describe the color similarity between (Rp, Gp, Bp) and (Rx,y, Gx,y, Bx,y). To deal with the color variation caused by different weather, illumination, and scenes, the bandwidth of C(x, y) is set to a large value (30 in each color channel) to accommodate such variation. The output of C(x, y) is a value that varies continuously from 0 to 1.
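A minimal sketch of the Gaussian color mask C(x, y) might look as follows, assuming an RGB image stored as an H×W×3 NumPy array; the exact kernel form is our assumption, since the paper only states that a Gaussian kernel with a per-channel bandwidth of about 30 is used.

```python
import numpy as np

def color_similarity(image_rgb, ref_color, sigma=30.0):
    """Gaussian color similarity C(x, y) in [0, 1] against an operator-chosen reference color."""
    diff = image_rgb.astype(np.float32) - np.asarray(ref_color, dtype=np.float32)
    # Per-channel Gaussian with bandwidth sigma, combined over the R, G, B channels.
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
```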

3.1.3 Similarity measurement

The similarity that a pixel can be considered part of a collapsed building is computed as the product of its motion, appearance, and color similarity:

S(x, y) = M(x, y) ∗ (1 − H(x, y)) ∗ C(x, y). (6)

Here, S(x, y) is a similarity function that describes the probability that a pixel (x, y) is part of a collapsed building. Pixels whose S(x, y) exceeds a predefined threshold (here, the threshold is set to 0.52) are considered part of a collapsed building. Fig. 4 (a) shows an example of the pixels of a collapsed building detected by our similarity measurement, where each detected pixel is illustrated as a 5×5 rectangle.
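Putting the three cues together, a rough end-to-end sketch of Eq. (6) could look like the following; it reuses the hypothetical motion_mask, hog_bin_variation, and color_similarity helpers sketched above, evaluates the HOG term on a coarse grid of cells for speed, and should be read as an illustration of the formulation rather than the authors' implementation.

```python
import numpy as np

def collapse_similarity(prev_f, cur_f, next_f, cur_rgb, ref_color,
                        theta=20.0, cell=16, bins=9, thresh=0.52):
    """Per-pixel similarity S(x, y) of Eq. (6) and the feature points above the threshold."""
    M = motion_mask(prev_f, cur_f, next_f, theta)      # motion cue, Eqs. (1)-(4)
    C = color_similarity(cur_rgb, ref_color)           # color cue, Gaussian mask in [0, 1]
    H = np.zeros_like(C)
    h, w = cur_f.shape
    # Appearance cue: HOG bin variation evaluated per cell (Eq. (5)).
    for y in range(0, h - cell, cell):
        for x in range(0, w - cell, cell):
            H[y:y + cell, x:x + cell] = hog_bin_variation(cur_f[y:y + cell, x:x + cell], bins)
    # (1 - H) rewards flat histograms; depending on normalization, H may need rescaling to [0, 1].
    S = M * (1.0 - H) * C
    ys, xs = np.nonzero(S > thresh)                    # feature points of collapsed buildings
    return S, np.stack([xs, ys], axis=1)
```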

Figure 4 The detection and integration of feature points from collapsed buildings. (a) The detected feature points of collapsed buildings from Eq. (6); since a single feature point is too small to be shown clearly, each feature point is represented by a 5×5 rectangle. (b) The candidates of collapsed buildings produced by the variable mean shift clustering [16].

3.1.4 Producing candidate of collapsed building


As shown in Fig. 4, since a collapsed building usually appears as a region with a high density of detected feature points, producing the candidates of collapsed buildings becomes a classic clustering problem: automatically integrating the distributed pixels into reasonable clusters. For this reason, we chose the variable mean shift clustering [16] to group the detected feature points into candidates of collapsed buildings. In [16], since the kernel window width of each initial cluster seed is the distance from the seed to its k-NN data point (here, we chose k = 5), the integration adapts to the complex distribution of the detected feature points. Hereafter, we refer to the output of this integration as a "candidate of collapsed building."
Part (b) of Fig. 4 shows an example of the candidates of collapsed buildings produced by the adaptive mean shift clustering algorithm. The candidates are naturally produced by following the distribution of the detected feature points.
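As an illustration of this grouping step, the following sketch clusters the detected feature points into candidate bounding boxes; for simplicity it uses scikit-learn's fixed-bandwidth MeanShift as a stand-in for the variable-bandwidth mean shift of [16], so the adaptive k-NN bandwidth of the paper is not reproduced and the bandwidth value is illustrative.

```python
import numpy as np
from sklearn.cluster import MeanShift

def candidates_from_points(points, bandwidth=40.0, min_points=5):
    """Group detected feature points (N x 2 pixel coordinates) into candidate bounding boxes."""
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(points.astype(np.float64))
    boxes = []
    for label in np.unique(ms.labels_):
        members = points[ms.labels_ == label]
        if len(members) < min_points:          # drop sparse clusters as noise
            continue
        x0, y0 = members.min(axis=0)
        x1, y1 = members.max(axis=0)
        boxes.append((int(x0), int(y0), int(x1), int(y1)))
    return boxes
```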

3.1.5 Candidate verification by temporal tracking


Despite applying multiple features to create the candidates of collapsed buildings, many false alarms still arise due to the complex background, viewpoint changes, and variation of object appearance. To reduce these unreasonable false alarms, we use a temporal tracking algorithm to verify whether the created candidates are true or not. Based on the fact that a false alarm arises only occasionally while a true candidate of a collapsed building survives for a long time, each created candidate is verified by the temporal tracking algorithm over five successive frames in the time direction. Fig. 5 illustrates the use of temporal tracking to verify the detection results.
Here, in the temporal tracking process, we verify a candidate of collapsed building by measuring its
appearance, trajectory, and color feature as follows:

Figure 5 Illustration of temporal tracking (true positive detections, missing detections, and detection trajectories over time). Case (a): a true positive detection appears frequently through successive frames and its trajectory is continuous; Case (b): a false alarm appears randomly and its trajectory is not continuous.

App(t, i) = P_App(t, i | t + 1, i), (7)

Traj(t, i) = P_Traj(t, i | t + 1, i), (8)

Col(t, i) = P_Col(t, i | t + 1, i), (9)

F(t, i) = App(t, i) ∗ Traj(t, i) ∗ Col(t, i), (10)

Jud(t, i) = { 1 if F(t, i) − α > 0; 0 otherwise }, (11)

TT(i) = Σ_{t=1}^{5} Jud(t, i) ∗ F(t, i). (12)

Here, for a candidate i at time t, after finding its nearest neighbor at time t + 1, App(t, i) denotes the appearance matching score P_App between the two candidates computed by normalized cross-correlation (NCC) template matching, while Traj(t, i) is the trajectory matching result between the two adjacent candidates, where overlapping candidates gain a high value and separated ones are penalized. Col(t, i) measures the similarity of the color histograms extracted from the two candidates with the Bhattacharyya distance. When the similarity F(t, i) exceeds the predefined threshold α (here, α = 0.6), we consider that candidate i has a true positive neighbor. If the value of TT(i) exceeds the threshold 1.8, candidate i is finally regarded as a true positive detection result, otherwise a false alarm.
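The following is a rough OpenCV/NumPy sketch of the verification of Eqs. (7)–(12), assuming that for each candidate we have its BGR image patch and bounding box in several successive frames; NCC, box overlap, and a Bhattacharyya-based color score stand in for P_App, P_Traj, and P_Col, and the helper names are ours.

```python
import cv2
import numpy as np

def appearance_score(p0, p1):
    """App(t, i): NCC template matching between two candidate patches (Eq. (7))."""
    g0 = cv2.cvtColor(p0, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g1 = cv2.cvtColor(cv2.resize(p1, (p0.shape[1], p0.shape[0])), cv2.COLOR_BGR2GRAY).astype(np.float32)
    return float(cv2.matchTemplate(g0, g1, cv2.TM_CCOEFF_NORMED)[0, 0])

def trajectory_score(b0, b1):
    """Traj(t, i): overlap ratio of boxes (x0, y0, x1, y1); separated boxes score 0 (Eq. (8))."""
    ix = max(0, min(b0[2], b1[2]) - max(b0[0], b1[0]))
    iy = max(0, min(b0[3], b1[3]) - max(b0[1], b1[1]))
    inter = ix * iy
    union = (b0[2] - b0[0]) * (b0[3] - b0[1]) + (b1[2] - b1[0]) * (b1[3] - b1[1]) - inter
    return inter / union if union > 0 else 0.0

def color_score(p0, p1):
    """Col(t, i): 1 - Bhattacharyya distance between color histograms (Eq. (9))."""
    h0 = cv2.calcHist([p0], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    h1 = cv2.calcHist([p1], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    cv2.normalize(h0, h0)
    cv2.normalize(h1, h1)
    return 1.0 - cv2.compareHist(h0, h1, cv2.HISTCMP_BHATTACHARYYA)

def verify_candidate(track, alpha=0.6, tt_thresh=1.8):
    """track: list of (patch, box) for one candidate over five successive frames (Eqs. (10)-(12))."""
    tt = 0.0
    for (p0, b0), (p1, b1) in zip(track[:-1], track[1:]):
        f = appearance_score(p0, p1) * trajectory_score(b0, b1) * color_score(p0, p1)  # Eq. (10)
        if f > alpha:          # Jud(t, i), Eq. (11)
            tt += f            # accumulate TT(i), Eq. (12)
    return tt > tt_thresh      # true positive vs. false alarm
```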
Figure 6 Temporal tracking could efficiently remove the random false alarms (a magnified window of the result is shown).

Figure 6 shows an example of applying the temporal tracking method to verify the produced candidates. Here, by measuring the appearance, color, and trajectory similarity among the produced candidates through five successive frames, the random false alarms are efficiently removed and only the true positive detection results survive.

4 Experiment and discussion


4.1 Data set and benchmarks

Although public aerial photo data sets such as the KOBE or Bam data sets are available, the aerial images in those data sets were mainly captured by UAVs or satellites from high altitude. Therefore, the image resolution of the collapsed buildings is too small to be processed by the proposed algorithm. We prepared a new aerial image data set of collapsed buildings by collecting images from the internet, where all the images were captured by UAVs at low altitude. In this data set, the resolution of the collapsed buildings is large enough to be processed by our algorithm.
Our low-altitude aerial image data set includes 102 collapsed buildings, of which 42 are clearly visible and the remaining 60 are blurred. The images were taken by UAVs from various altitudes and viewpoints. They record earthquakes that happened at different places and in different years, including Italy (2009), Haiti (2010), and Wenchuan (2008) and Yaan (2013) in China.
Since the test images were captured over unknown areas, the ground truth of the collapsed buildings was manually annotated on the images. Each predicted detection of a collapsed building is compared with the ground truth: when the overlap rate between a prediction and the ground truth is over 50%, it is counted as a true positive detection, otherwise as a false alarm.
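As a sketch of this evaluation rule, the snippet below counts a prediction as a true positive when its overlap with some ground-truth box exceeds 50%; reading the "overlap rate" as intersection-over-union of axis-aligned boxes is our assumption, since the paper does not define it precisely.

```python
def overlap_rate(pred, gt):
    """Intersection-over-union of two boxes (x0, y0, x1, y1); one possible reading of 'overlap rate'."""
    ix = max(0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    iy = max(0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = ix * iy
    union = (pred[2] - pred[0]) * (pred[3] - pred[1]) + (gt[2] - gt[0]) * (gt[3] - gt[1]) - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred, ground_truths, thresh=0.5):
    """A prediction counts as a true positive when it overlaps some ground-truth box by more than 50%."""
    return any(overlap_rate(pred, gt) > thresh for gt in ground_truths)
```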
Besides the static low-altitude test data set, we also prepared three other data sets (Lushan-01, -02, and -03) composed of successive aerial images captured at different places shortly after the 2013 Yaan earthquake. In total, these data sets contain 408 successive images and 985 collapsed buildings. In detail: (1) the Lushan-01 data set (captured over Renjia Village) contains 161 successive frames in which 345 collapsed buildings can be found; (2) the Lushan-02 data set (captured over Hongxing Village) contains 478 collapsed buildings in 166 frames; and (3) the Lushan-03 data set (captured over Zhongli Village) is composed of 81 frames with 162 collapsed buildings. To evaluate our system accurately, all the experiments in Sections 4.2 and 4.3 were performed on the whole data sets, not on selected frames.

4.2 Experiment

To obtain the necessary motion information from our static low-altitude test data set, we produced, for each test image, neighbor images by slightly rotating it about its center by ±3°. In this way, we can produce the necessary motion information from a static image without destroying its spatial features.
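A minimal OpenCV sketch of this augmentation is shown below, producing a ±3° rotated pair that can serve as the previous and next frames for the motion analysis; the interpolation and border handling are illustrative.

```python
import cv2

def rotated_neighbors(image, angle_deg=3.0):
    """Synthesize a previous/next frame pair by rotating a static image about its center by ±angle_deg."""
    h, w = image.shape[:2]
    frames = []
    for a in (-angle_deg, +angle_deg):
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), a, 1.0)
        frames.append(cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR))
    return frames  # [previous-like frame, next-like frame]
```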
Table 1 shows the detailed detection results of the proposed algorithm on the test data set, where Recall and Precision are calculated as

Recall = Σ_{i=1}^{n} TP_i / Σ_{j=1}^{m} GT_j,   Precision = Σ_{i=1}^{n} TP_i / Σ_{k=1}^{s} DT_k, (13)

where TP_i represents the ith true positive detection, GT_j the jth ground truth, and DT_k the kth detection result.
Table 1 shows that the recall of the proposed system on clear images is over 80% with a precision of 94.4%, while its performance degrades to 58% recall and 72% precision when the test images are blurred. This phenomenon is quite similar to human vision: it is much harder to identify objects in blurred images than in clear ones.

Table 1 Detection results of the proposed system on the static low-altitude test data set.

                  Number of collapsed buildings   Recall    Precision   False alarm
Clear images                   42                 80.1%      94.4%         5.6%
Blurred images                 60                 58.1%      72%           28%
All images                    102                 67.3%      81.4%        18.6%

Figure 7 The detection results of our method on our low-altitude static test data set. Row (a): magnified detection results of clear aerial images; Row (b): detection results on clear aerial images; Row (c): detection results on blurred aerial images; Row (d): magnified detection results of blurred images. Red rectangle: true positive detection; blue rectangle: false alarm. The proposed method correctly finds the collapsed buildings with quite few false alarms.

Figure 7 shows some detection results of the proposed algorithm on our test data set, where true positive detections are shown by red rectangles and false alarms by blue ones. Row (a) shows some magnified image patches of the detection results of our method on clear aerial images. Row (b) shows the corresponding detection results on clear aerial images, where one false alarm was produced: that building happened to contain strong texture whose gradient orientations were randomly distributed, so, according to our definition of a collapsed building, a false alarm was produced over that area.
Row (c) shows the performance of our algorithm on blurred aerial images. Several false alarms were produced in the tree area of the left image because the camera was focused on the nearby building; compared with their blurred surroundings, the trees therefore produce strong motion and random gradients, and false alarms were assigned to them because the gradient orientations appeared arbitrary. For the same reason, the proposed system produced false alarms in the middle-right and right images of the bottom row. Magnified image patches of the detection results in Row (c) are shown in Row (d).

4.3 Comparative experiments

In addition to the experiments on city-scene images, we also evaluated the proposed algorithm on aerial images captured over a rural scene, where the earthquake happened in Yaan, China, in April 2013. The tested image sequence was captured by a Sony CX460 DV camera mounted on a UAV, with an image resolution of 640 × 480 pixels.

We chose the Haar-AdaBoost [18] and HOG-SVM [19] detection algorithms for the comparative experiment with our method. The Haar-AdaBoost algorithm is selected for its good performance in detecting rigid objects, and the HOG-SVM detection algorithm is well known for its powerful ability to detect both rigid and nonrigid objects (like the human body). To train the Haar-feature-based object detector, the whole image sequence was divided into test and training data sets. We trained a 20-layer Haar-based detector with the AdaBoost training algorithm, where 2400 positive samples and 2900 negative samples were manually selected from the training data set. To make the comparison meaningful, the HOG-based collapsed-building detector was obtained from a linear SVM trained with the same training samples as the Haar-AdaBoost algorithm.
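For reference, a compact sketch of such a HOG-SVM baseline along the lines of [19] is given below, using scikit-image and scikit-learn; the patch size, HOG parameters, and classifier settings are illustrative and are not the exact configuration used in the paper, and the sliding-window scanning over the full image is omitted.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def train_hog_svm(patches, labels):
    """Train a linear SVM on HOG descriptors of fixed-size grayscale patches (1 = collapsed, 0 = background)."""
    feats = [hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2)) for p in patches]
    return LinearSVC(C=1.0).fit(np.asarray(feats), np.asarray(labels))

def classify_patch(clf, patch):
    """Apply the trained detector to one candidate patch."""
    feat = hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return int(clf.predict(feat.reshape(1, -1))[0])
```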

Figure 8 Comparative experimental results among the proposed algorithm, Haar-AdaBoost [18], and HOG-SVM [19] on the Lushan-01 data set (frames 2677, 2879, 3065, and 3322). Image patches of real collapsed buildings are magnified and shown at the top of the figure. Top row: detection results of the proposed algorithm; middle row: detection results of the Haar-AdaBoost detection algorithm; bottom row: detection results of the HOG-SVM detection algorithm. This experiment shows that our method is much superior to the conventional learning-based detection algorithms when the appearance of the collapsed buildings becomes unpredictable.

Table 2 Detection results of the proposed system, Haar-AdaBoost [18], and HOG-SVM [19] detection methods on the Lushan-01 data set of Fig. 8.

                 Recall    Precision   False alarm
Our method       78.3%       89.3%        10.7%
Haar-AdaBoost    58.1%        8.4%        91.6%
HOG-SVM          63.7%        4.2%        95.8%

Figure 8 shows the comparative experimental results among our method, the Haar-AdaBoost, and the HOG-SVM detection algorithms. Since the appearance variation of collapsed buildings is huge and usually depends on the test scene, the learning-based Haar-AdaBoost detector produced many false alarms on the background regions, because it cannot distinguish the background from really collapsed buildings. The HOG-SVM detector achieved similar detection results to the Haar-AdaBoost method because the gradient orientation of collapsed buildings is usually unpredictable. As for the proposed method, since it computes the appearance and motion features to predict possible collapsed buildings and verifies those predictions with the temporal tracking method, it can efficiently detect the really collapsed buildings and reduce the unreasonable false alarms. Table 2 1) shows the detailed results of our method, Haar-AdaBoost, and HOG-SVM detection algorithms.
Compared with the experimental results on the static clear images, in this experiment the false alarm rate of the proposed algorithm increased due to the image blur caused by camera movements such as rapid translation, pitch, or rotation. Such degradation is considered reasonable because humans also tend to produce more false alarms when the input images become blurred.

Figure 9 Comparative experimental results among the proposed algorithm, Haar-AdaBoost [18], and HOG-SVM [19] on the Lushan-02 data set (frames 510, 557, 592, and 611). Image patches of real collapsed buildings are magnified and shown at the top of the figure. Top row: detection results of the proposed algorithm; middle row: detection results of the Haar-AdaBoost detection algorithm; bottom row: detection results of the HOG-SVM detection algorithm. The proposed method achieved the highest recall rate with the fewest false alarms. The performances of the Haar-AdaBoost and HOG-SVM detection algorithms are quite similar to each other, producing many false alarms on the cluttered background regions.

Table 3 Detection results of the proposed system, Haar-AdaBoost [18], and HOG-SVM [19] detection methods on the Lushan-02 data set of Fig. 9.

                 Recall    Precision   False alarm
Our method       91.2%       87.9%        12.1%
Haar-AdaBoost    55.4%        6.8%        93.2%
HOG-SVM          84.7%       10.7%        89.3%

Figure 9 shows the comparative experimental results of the three detection algorithms on our Lushan-02 test data set. The magnified image patches of really collapsed buildings are shown at the top of the figure. Among all the compared detection algorithms, our method produced the most accurate detection results with the fewest false alarms. As for the Haar-AdaBoost and HOG-SVM detection algorithms, since they both require the offline training data sets to contain all possible appearance variations of the target objects, they achieved quite similar detection results, producing many false alarms on trees or grasses. That is because such objects are not included in our training data set, and if we keep enlarging the training data sets with new samples to solve this problem, it will finally lead to the overtraining problem.

1) In this data set, 345 collapsed buildings are included.
Table 3 2) shows the detailed detection results of Fig. 9. The recall rate of our method is over 91% while keeping the false alarm rate at 12%. The HOG-SVM detection algorithm achieved a high recall rate of 84.7% at the cost of a high false alarm rate of 89.3%. The Haar-AdaBoost detection algorithm lost this experiment, producing the lowest recall rate and the highest false alarm rate.

Figure 10 Comparative experimental results among the proposed algorithm, Haar-AdaBoost [18], and HOG-SVM [19] on the Lushan-03 data set (frames 664, 676, 688, and 702). Image patches of real collapsed buildings, together with image patches detected by the Haar and HOG detectors, are magnified and shown at the top of the figure; the magnified patches detected by the Haar and HOG methods are not really collapsed buildings. Top row: detection results of the proposed algorithm; middle row: detection results of the Haar-AdaBoost detection algorithm; bottom row: detection results of the HOG-SVM detection algorithm. The recall rate and false alarm rate of our method are 51.2% and 32%, respectively. Both the Haar-AdaBoost and HOG-SVM methods have false alarm rates over 95%, which indicates that they are unsuitable for this test scene.

Table 4 Detection results of the proposed system, Haar-AdaBoost [18], and HOG-SVM [19] detection methods on the Lushan-03 data set of Fig. 10.

                 Recall    Precision   False alarm
Our method       51.2%       68%          32%
Haar-AdaBoost     7.4%        0.9%        99.1%
HOG-SVM          63%          4.1%        95.9%

Figure 10 shows the performances of our method, Haar-AdaBoost, and HOG-SVM on the Lushan-03 test data set. This data set includes many complex background components, such as cluttered buildings, a river, and trees. Although these difficulties reduced the recall rates of all compared algorithms, our method still achieved the best performance in this test, ranking second in recall rate and first in false alarm rate. HOG-SVM achieved the highest recall rate at the cost of a false alarm rate of up to 95.9%. The high false alarm rates of both the Haar-AdaBoost and HOG-SVM detectors indicate that they are not applicable to such a difficult test scene. Details of this test can be found in Table 4 3).

2) In this data set, 478 collapsed buildings are included.
Through Figs. 8, 9, and 10, the proposed algorithm won all the experiments with high recall rates and the fewest false alarms. Both the HOG-SVM and Haar-AdaBoost methods created many unreasonable false alarms, with HOG-SVM clearly superior to Haar-AdaBoost in recall rate. Such a stable ranking indicates that our experiments are fair and that the relative performances of all compared algorithms are independent of the test scenes.

4.4 Effect of parameters

It is well known that the performance of an object detector may vary across test scenes. In the proposed algorithm, however, the effect of some other important parameters should also be investigated. Figure 11 shows the effect of θ from Section 3.1.1. The top row shows the spatio-temporal gradient images as the value of θ is increased, and the bottom row shows the corresponding detection results when all the other parameters are fixed. Since increasing θ filters out weak gradients, only objects with strong gradients can survive; therefore, the proposed algorithm tends to erase detections with weaker gradients and focus only on objects with strong gradients. In this paper, we set θ to 20 to keep a high detection rate with a low false alarm rate.
Figure 11 The effect of the threshold θ in our system (θ = 0, 50, 80, and 120, yielding 4, 4, 3, and 1 detection results, respectively). For the same test image, increasing θ leaves fewer spatio-temporal gradients, and only objects with strong gradients survive; therefore it reduces the number of detection results. Top row: the spatio-temporal gradient images as θ changes from 0 to 120; bottom row: the final detection results corresponding to the different values of θ.

Another important parameter that should be investigated is the threshold on S(x, y) in Eq. (6). The top row of Fig. 12 shows that increasing the threshold on S(x, y) within a small range does not affect the final detection results. However, as the threshold keeps increasing, the number of detection results is gradually reduced, and when the threshold is too high (e.g., 0.9), all the detection results vanish.
The bottom row of Fig. 12 shows the effect of increasing the value of α used during temporal tracking in Eqs. (11) and (12). These experiments show that increasing the value of α erases detections that have low similarity between adjacent frames. In this way, increasing α helps remove false alarms; however, too high a value of α may also remove correct detections.
The experiments were run on a desktop PC with an Intel Xeon E5 3.6 GHz CPU and 16 GB of memory. The resolution of the test images is 640 × 480 pixels, and the processing speed is 10–15 frames per second.

3) In this data set, 162 collapsed buildings are included.



Figure 12 The effect of different parameters in this work. Top row: the effect of the threshold on S(x, y); for thresholds of 0.35, 0.52, 0.75, and 0.90, the numbers of detections (each with 0 false alarms) are 3, 3, 1, and 0, respectively. Within a small range, increasing the threshold does not affect the detection results, but as it keeps increasing, all detections are gradually erased. Bottom row: the effect of the similarity threshold α used in the temporal tracking (Eqs. (11) and (12)); for α = 0.3, 0.6, 0.7, and 0.9, the results are 4 true positives with 1 false alarm, 2 true positives with 1 false alarm, 2 true positives with 1 false alarm, and 2 true positives with 0 false alarms, respectively. Although increasing α can efficiently remove false alarms, it also removes some true positive detections; in the temporal tracking process, the trade-off between true positives and false alarms should be further investigated.
4.5 Future work

The experiments in Figs. 11 and 12 make it clear that, although changing the aforementioned parameters within a small range does not affect the final detection results, their optimal values should be adjusted according to the test scene, where many factors may affect our method (such as weather, illumination, and season). The automatic selection of optimal values for these parameters should be investigated in our future work. In addition, a single color similarity measurement is far from sufficient when the UAV works in urban conditions, where the color appearance of normal and collapsed buildings is more complex than in the countryside. Therefore, further work is needed to extend our system from single-color to multiple-color similarity measurements to improve its performance when the targets contain complex colors.

5 Conclusion

In this paper, we presented a detection algorithm for collapsed buildings that uses only the postseismic aerial images for post-disaster evaluation. By combining the appearance and motion features extracted from successive images, the possibility that an image pixel belongs to part of a collapsed building is measured by a statistical formulation, where feature points of collapsed buildings are assigned high values and background ones are penalized. The candidates of collapsed buildings are produced by grouping the remaining feature points into reasonable clusters with an online clustering algorithm. To reduce the false alarms caused by the complex background, a temporal tracking algorithm is applied to verify whether a produced candidate region is correct or not. Through extensive experiments, the effectiveness and efficiency of the proposed algorithm have been confirmed.

Acknowledgements

This work has been supported by the National Key Technology Research and Development Program of China
(2013BAK03B01) and the program of “One Hundred Talented People” of the Chinese Academy of Sciences
(Y3F11001).

References
1 Tong X H, Hong Z H, et al. Building-damage detection using pre- and post-seismic high-resolution satellite stereo imagery: a case study of the May 2008 Wenchuan earthquake. ISPRS Journal of Photogrammetry and Remote Sensing, 2012, Vol. 68, pp. 13-27
2 Tong X H, Lin X F, et al. Use of shadows for detection of earthquake-induced collapsed buildings in high-resolution satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 2013, Vol. 79, pp. 53-67
3 Rezaeian M. Automatic classification of collapsed buildings using stereo aerial images. International Journal of Computer Applications, May 2012, Vol. 46, No. 21, pp. 35-42
4 Elberink S O, Shoko M, Fathi S A, et al. Detection of collapsed buildings by classifying segmented airborne laser scanner data. In Proceedings of the International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Calgary, Canada, 2011, Vol. XXXVIII-5/W12, pp. 307-312
5 Khoshelham K, Elberink S O. Role of dimensionality reduction in segment-based classification of damaged building roofs in airborne laser scanning data. In Proceedings of the 4th International Conference on Geographic Object-Based Image Analysis, Rio de Janeiro, Brazil, May 7-9, 2012, pp. 372-377
6 Murayama Y, Tashiro T, Yamazaki F. Detection of collapsed buildings after the 2007 Niigata Chuetsu-oki earthquake based on digital surface models constructed from aerial images. In Proceedings of the Second International Symposium on Advances in Urban Safety, 2010, pp. 319-324
7 Rezaeian M. Automatic classification of collapsed buildings using stereo aerial images. International Journal of Computer Applications, May 2012, Vol. 46, No. 21, pp. 35-42
8 Dai L, Qi J T, Wu C, Han J D. Magnetic compass error analysis and calibration for rotorcraft flying robot. Robot, July 2012, Vol. 34(4), pp. 418-424
9 Qi J T, Han J D. Application of wavelets transform to fault detection in rotorcraft UAV sensor failure. Journal of Bionic Engineering, 2007, Vol. 4(4), pp. 265-270
10 Zhao J, Feng C, Shao F, Zhang X. Moving object detection and segmentation based on adaptive frame difference and level set. Information and Control, April 2012, Vol. 41(2), pp. 153-158
11 Qi J T, Song D, Han J D, et al. KF-based adaptive UKF algorithm and its application for rotorcraft UAV actuator failure estimation. International Journal of Advanced Robotic Systems, 2012, Vol. 9, pp. 1-9
12 Hua C S, Makihara Y, Yagi Y. Pedestrian detection by using a spatio-temporal histogram of oriented gradients. IEICE Transactions on Information and Systems, 2013, Vol. E96-D, No. 6, pp. 1376-1386
13 Turker M. Automatic detection of earthquake-damaged buildings using DEMs created from pre- and post-earthquake stereo aerial photographs. International Journal of Remote Sensing, 2005, Vol. 26, No. 4, pp. 823-833
14 Turker M, Sumer W. Building-based damage detection due to earthquake using the watershed segmentation of post-event aerial images. International Journal of Remote Sensing, 2008, Vol. 29, No. 11, pp. 3073-3089
15 Li L, Zhang B, Wu Y. Fusing spectral and texture information for collapsed buildings detection in airborne image. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 2012, pp. 186-189
16 Comaniciu D, Ramesh V, Meer P. The variable bandwidth mean shift and data-driven scale selection. In Proceedings of the IEEE International Conference on Computer Vision, 2001, pp. 438-445
17 Suner E, Turker M. Building damage detection from post-earthquake aerial imagery using building grey-value and gradient orientation analysis. In Proceedings of the 2nd International Conference on Recent Advances in Space Technologies, 2005, pp. 577-582
18 Viola P, Jones M. Robust real-time object detection. International Journal of Computer Vision, 2004, Vol. 57, Iss. 2, pp. 137-154
19 Dalal N, Triggs B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, pp. 886-893
20 Felzenszwalb P F, Girshick R B, McAllester D, et al. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, Vol. 32, No. 9, pp. 1627-1645
21 Felzenszwalb P F, Girshick R B, McAllester D. Cascade object detection with deformable part models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2241-2248
