Damage Assessment Using Deep Learning
Abstract—The frequency of natural disasters is growing globally. Every year, 350 million people are affected and billions of dollars in damage are incurred. Providing timely and appropriate humanitarian interventions, such as shelters, medical aid, and food, to affected communities is a challenging problem. AI frameworks can help support existing efforts to solve these problems in various ways. In this study, we propose using high-resolution satellite imagery from before and after disasters to develop a convolutional neural network model that localizes buildings and scores their damage level. We categorize damage to buildings into four levels, spanning from not damaged to destroyed, based on the xView2 dataset's scale. Due to the emergency nature of disaster response efforts, the value of automating damage assessment lies primarily in inference speed rather than accuracy. We show that our proposed solution works three times faster than the fastest xView2 challenge winning solution and over 50 times faster than the slowest first-place solution, which indicates a significant improvement from an operational viewpoint. Our proposed model achieves a pixel-wise F1 score of 0.74 for building localization and a pixel-wise harmonic F1 score of 0.6 for damage classification, and it uses a simpler architecture than other studies. Additionally, we develop a web-based visualizer that can display the before and after imagery along with the model's building damage predictions on a custom map. This study has been conducted collaboratively to empower a humanitarian organization, the stakeholder, which plans to deploy and assess the model, along with the visualizer, in its disaster response efforts in the field.

Index Terms—satellite imagery datasets, neural networks, image segmentation, building damage classification, natural disasters, humanitarian action

I. INTRODUCTION

Natural disasters affect 350 million people each year, cause billions of dollars in damage, and were the main driver of hunger for 29 million people in 2021 [1]. Providing timely humanitarian aid to affected communities is increasingly challenging due to the growing frequency and severity of such events [2]. Impact assessment of natural disasters in a short time frame is a crucial step in emergency response efforts, as it helps first responders allocate resources effectively. For example, dispatching aid, sending shelters, and allocating building material for reconstruction can be more efficient with estimates of where damaged buildings are and how badly damaged they are.

Microsoft AI for Good/Humanitarian Action has collaborated with the Netherlands Red Cross to use high-resolution satellite imagery from before and after natural disasters, delineated in the publicly available xBD dataset, to develop an end-to-end Siamese convolutional neural network that can localize buildings and score their damage level. Such a model is trained on historical disaster data and then applied on demand to identify damaged buildings during future disasters. Such AI and data-driven decision-aid tools can empower humanitarian organizations to take more informed actions at the time of disaster and allocate their resources more strategically during their field deployments. Throughout the course of our collaboration, extensive deployment experience shared by field experts, and their valuable perspective as a stakeholder, were instrumental in informing our empirical analysis of the model pipeline and will be vital in future assessments of the model's performance in the field when actual disasters happen.

In 2019, the xView2 challenge and the xBD dataset were announced at the Computer Vision for Global Challenges Workshop at the Conference on Computer Vision and Pattern Recognition to benchmark automated computer vision capabilities for localizing and scoring the degree of damage to buildings after natural disasters [3]. In this challenge, participants had to train their model offline and upload their predictions for evaluation and display on the public leaderboard, based on a single unlabeled test dataset that they could download. While this challenge provided a great opportunity for AI researchers to weigh in on damage assessment tasks, it assumed no constraints on the computational resources available to participants for model training, and it did not strictly prevent potential hand-labeling and use of the test datasets in the training phase. The winning solutions used large ensembles of models, and although they perform well on the test set, they were not optimized for inference runtime and require a prohibitively large amount of compute to be run on large amounts of satellite imagery on demand during disaster events. For example, the first-place winner proposed an ensemble of four different models, requiring 24 inference passes for each input.
damaged they are. In this study, we propose a single model which predicts
both building edges and damage levels and that can be run detection were introduced in [4] and were proposed by other
efficiently on large amounts of input imagery. The proposed studies as well [10], [11].
multitask model includes a building segmentation module In [4], convolutional Siamese networks are trained end-to-
and a damage classification module. We use a similar model end from scratch using only the available change detection
architecture proposed by previous studies on building damage datasets. The authors proposed fully convolutional encoder-
assessment [4], [5]; however, we use a simpler encoder and do decoder networks that use the skip connection concept.
not include attention layers. We evaluate the performance of [12] presented an improved UNet++ model with dense skip
our model extensively for several different splits of the dataset connections to learn multiscale and different semantic levels
to assess its robustness to unseen disaster scenarios. From an of visual feature representations. Attention layers have been
operational perspective, the model’s runtime is of paramount proposed for general change detection networks [13] as well
importance. Thus, we benchmark the inference speed of our as building damage assessment tasks as presented in [5]. Also,
model against the winning solutions in the xView2 competition [14] proposes an attention-based two-stream high-resolution
and the existing models deployed by our stakeholder. We show network to unify the building localization and classification
that our model works three times faster than the fastest xView2 tasks into an end-to-end model via replacing the residual
challenge winning solution and over 50 times faster than the blocks in HRNet [15] with attention-based residual blocks
slowest first place solution. The baseline solution available to to improve the model’s performance. RescueNet, an end-
our stakeholder consists of two separate models for building to-end model that handles both segmentation and damage
segmentation and damage classification [6]. We were able to classification tasks was proposed in [16]. It was trained using
show that our proposed approach works 20% faster than the a localization aware loss function, that consists of a binary
baseline model available to the stakeholder and also conducts cross-entropy loss and dice loss for building segmentation and
the task in an end-to-end and more automated way, which can a foreground-only selective categorical cross-entropy loss for
improve their field operations and deployment. damage classification. [6] explored the applicability of CNN-
Finally, we develop a web-based visualizer that can display based models under scenarios similar to operational emergency
the before and after imagery along with the model’s building conditions with unseen data and the existence of time
damage predictions on a custom map. This is an important constraints. [17] proposed a dual-task Siamese transformer
step in deploying a model for real-world use cases. Even model to capture non-local features. Their model adopts
a perfect building damage assessment model will not be transformers as the backbone rather than a convolutional
practically useful if there is not a mechanism for running neural network and relies on a lightweight decoder for the
that model on new imagery and communicating the results downstream tasks.
to decision-makers that are responding to live events. A web- Graph-based models have been explored in [18] for building
based visualizer allows anyone to see both the imagery and damage detection solutions to capture similarities between
predictions without GIS software for any type of disaster. neighboring buildings for predicting the damage. They used
the xBD dataset for cross-disaster generalization. While their
II. R ELATED W ORK proposed approach showed some advantages in terms of
accuracy, it did not consistently outperform the Siamese
Convolutional neural networks (CNN) have been used CNN model in terms of F1 score, which would be a more
for change detection tasks in satellite imagery for disaster appropriate metric for imbalanced datasets. Furthermore, [19]
response and other domains including but not limited to proposed BLDNet based on a Siamese CNN combined with
changes in infrastructures. [7] proposed using pre-trained a graph node classification approach to be trained in a semi-
CNN features extracted through different convolutional layers supervised manner to reduce the number of labeled samples
and concatenation of feature maps for pre- and post-event needed to obtain new predictions. They benchmarked their
images. The authors used pixel-wise Euclidean distance approach with a semi-supervised multiresolution autoencoder
to compute change maps and thresholding methods to and showed performance improvements. The extremely
conduct classification. [8] leverages hurricane Harvey data, in imbalanced distributions of the building damages are
particular, to train CNNs to classify images as damaged and addressed in [20] by supplementing the architecture with a new
undamaged. While they report very high accuracy numbers, learning strategy comprising normality-imposed data-subset
they did not focus on detecting building edges and used a generation and incremental training. However, they propose
binary damage scale at the image-frame level. A Siamese a two-step solution approach for building localization and
CNN approach was proposed in [9] to extract features directly damage classification. Self-supervised comparative learning
from the images, pixel by pixel. To reduce the influence of approach has been studied in [21] to address the task without
imbalance between changed and unchanged pixels, the authors the requirement of labeled data. Their proposed approach is an
used weighted contrastive loss. The unique property of the asymmetric twin network architecture evaluated on the xBD
extracted features was that the feature vectors associated with dataset.
changed pixel pairs were far away from each other in the In this study, we propose a Siamese approach inspired by
feature space, whereas the ones of unchanged pixel pairs [4], [5] where UNet architecture is used for the building
were close. Fully convolutional Siamese networks for change segmentation task and UNet’s encoders with shared parameters
for pre-disaster and post-disaster imagery, are used to
score building damage levels via an end-to-end approach.
Furthermore, we also evaluate the performance of our model
in various scenarios that resemble operational emergency
conditions. Web visualizer tools have been developed for
other specific domains like data-driven wildfire modeling
[22] and fire inspection prioritization [23] in the past. Our
developed web visualizer allows imagery and prediction layers
visualization for any disasters where before and after disaster
satellite images are available.
III. DATA

In this study, we use the xBD dataset, introduced in [24] as a new large-scale dataset for the advancement of change detection and building damage assessment for humanitarian assistance and disaster recovery research. This dataset has been sourced from the Maxar/DigitalGlobe Open Data Program. It covers 19 different disasters from around the world for which there exists high-resolution (<0.8 m/px) imagery. The disaster types include flood, wind, fire, earthquake, tsunami, and volcano. The entire dataset contains 22,068 image tiles of 1024×1024 pixels that cover a total of 45,361.79 sq. km. There are 850,736 building polygons available, along with a damage level label that indicates: no-damage, minor-damage, major-damage, and destroyed. The breakdown of the number of polygons for pre-disaster images across different disasters is shown in Table I. Figure 1 and Figure 2 show some examples of pre- and post-disaster image frames from the xBD dataset. See Figure 5 for the legend.

Fig. 1. Imagery samples from different disasters from DigitalGlobe.

Fig. 2. Imagery samples with polygons showing building edges and colors showing damage level. (a) Pre-disaster; (b) Post-disaster; (c) Ground Truth.
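The polygon labels above can be converted into the per-pixel masks that a segmentation model consumes. Below is a minimal sketch of that conversion, assuming the public xBD label layout (a features "xy" list of WKT polygons in pixel coordinates, with a damage "subtype" property on post-disaster labels); the helper name and file name are illustrative.

```python
import json

import numpy as np
import rasterio.features
from shapely import wkt

# Map xBD damage subtypes to the mask values used in this paper:
# 0 = background, 1 = no-damage, ..., 4 = destroyed.
DAMAGE_VALUES = {"no-damage": 1, "minor-damage": 2,
                 "major-damage": 3, "destroyed": 4}

def xbd_label_to_mask(label_path: str, size: int = 1024) -> np.ndarray:
    """Rasterize the building polygons of one xBD tile into a damage mask."""
    with open(label_path) as f:
        label = json.load(f)

    shapes = []
    for feature in label["features"]["xy"]:  # polygons in pixel coordinates
        # Pre-disaster labels carry no damage subtype; default to no-damage.
        subtype = feature["properties"].get("subtype", "no-damage")
        shapes.append((wkt.loads(feature["wkt"]), DAMAGE_VALUES.get(subtype, 1)))

    if not shapes:  # tile without any buildings
        return np.zeros((size, size), dtype=np.uint8)
    return rasterio.features.rasterize(shapes, out_shape=(size, size),
                                       fill=0, dtype="uint8")

# e.g. mask = xbd_label_to_mask("hurricane-harvey_00000001_post_disaster.json")
```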
TABLE I. DISASTER EVENTS IN THE XBD DATASET.

Name/Location              Type                  # of polygons
Palu, Indonesia            Earthquake/Tsunami    55,789
Mexico City, Mexico        Earthquake/Tsunami    51,473
Nepal                      Flood                 43,265
Hurricane Harvey, USA      Flood                 37,955
Hurricane Michael, USA     Wind                  35,501
Hurricane Matthew, USA     Wind                  23,964
Portugal                   Wildfire              23,413
Moore, OK                  Wind                  22,958
Santa Rosa, CA             Wildfire              21,955
SoCal, CA                  Wildfire              18,969
Sunda Strait, Indonesia    Earthquake/Tsunami    16,847
Joplin, MO                 Wind                  15,352
Tuscaloosa, AL             Wind                  15,006
Midwest USA                Flood                 13,896
Hurricane Florence, USA    Flood                 11,548
Woolsey, CA                Wildfire              7,015
Pinery, Australia          Wildfire              5,961
Lower Puna, HI             Volcanic eruption     3,410
Guatemala                  Volcanic eruption     991

IV. MODEL ARCHITECTURE

We propose a deep learning model that conducts both building segmentation and damage classification tasks via a single pipeline. Our approach has some similarities to the method proposed in [5]; however, our architecture is less complex, as we do not incorporate any attention layers in the model. One module of our model is based on the UNet architecture proposed in [25], which obtains the building segmentation mask. A single image frame is fed to the fully convolutional UNet model, where local information is captured via encoder-decoder structures and global information is captured via several skip connections. In the damage assessment scenario, we have a pair of pre- and post-disaster image frames, which are given as inputs separately to the UNet module of our proposed method using shared weights. We use the embedding layers from the encoder part of the UNet architecture for the pre- and post-disaster images to learn about the changes. In other words, the second module of our model is a separate decoder that conducts the damage classification task on the subtracted embedding layers using several convolutional layers. This idea is based on the approach proposed in [4]. Figure 3 demonstrates the overall schema of the architecture. Our UNet architecture has five convolution blocks for the encoder part and four convolution blocks for the decoder. Each downsampling block consists of convolution, batch normalization, ReLU, and max-pooling layers. Each upsampling block consists of upsampling with bilinear interpolation, convolution, batch normalization, and ReLU layers. For the damage decoder, the same upsampling blocks apply to the subtracted and concatenated representations at each step. The details of the layers can be found in our code repository, which is publicly available¹. The output of the damage classification mask has five channels: four damage levels and one background label. We use a weighted binary cross-entropy loss for building segmentation and a multi-label cross-entropy loss for damage classification.

¹ https://github.com/microsoft/building-damage-assessment-cnn-siamese
Fig. 3. We use a Siamese U-Net model architecture where the pre- and post-disaster imagery are fed into an encoder-decoder style segmentation model (U-Net) with shared weights (blocks with the same color in the figure share weights). The features generated by the segmentation encoder from both inputs are subtracted and passed to an additional damage classification decoder that generates per-pixel damage level predictions. The weights of the damage classification decoder can be fine-tuned for specific disaster types, while relying on building segmentation output from the building decoder.

In the building segmentation loss function shown in equation (1), $\omega_{s,1}$ and $\omega_{s,0}$ denote the weights on building pixels and background pixels, respectively. Subscript $s$ denotes the segmentation task, $y_s$ is the ground truth label for each pixel, and $p_s$ is the predicted probability. For both pre- and post-disaster image frames, the loss functions $L_{s_{pre}}$ and $L_{s_{post}}$ are defined similarly, and the UNet model has shared weights across these two components.

$$L_{s_{pre}} = L_{s_{post}} = -\left(\omega_{s,1}\, y_s \log p_s + \omega_{s,0}\,(1 - y_s)\log(1 - p_s)\right) \tag{1}$$

In equation (2), $\omega_{d,c}$ denotes the weight on each damage class $c$. We use subscript $d$ to denote the damage classification task, $y_d$ is the damage ground truth label for each pixel, and $p_d$ is the predicted probability. The damage loss, $L_{dmg}$, is calculated only for pixels predicted as the building class, i.e., $\hat{y}_s = 1$.

$$L_{dmg} = -\sum_{c=0}^{4} \omega_{d,c}\, y_d(c)\, \log p_d(c), \quad \text{if } \hat{y}_s = 1 \tag{2}$$

Equation (3) indicates the combined weighted loss function for the tasks along with their corresponding weights.

$$L_{total} = \omega_{s_{pre}} L_{s_{pre}} + \omega_{s_{post}} L_{s_{post}} + \omega_{d} L_{dmg} \tag{3}$$
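The following sketch expresses equations (1)–(3) in code. It assumes logits of shape (N, 1, H, W) for the segmentation heads and (N, 5, H, W) for the damage head, with integer damage targets; gating the damage loss on the post-disaster predicted mask is an assumption of this sketch, not a statement of the exact implementation.

```python
import torch
import torch.nn.functional as F

def combined_loss(pre_logit, post_logit, dmg_logit, pre_y, post_y, dmg_y,
                  w_task=(0.5, 0.5, 0.0), w_bld=(1.0, 15.0),
                  w_dmg=(1.0, 35.0, 70.0, 150.0, 120.0)):
    """Sketch of equations (1)-(3): weighted BCE for the segmentation heads
    and a building-masked, class-weighted CE for the damage head."""

    def seg_loss(logit, y):  # equation (1), with y in {0, 1}
        p = torch.sigmoid(logit).clamp(1e-6, 1 - 1e-6)
        return -(w_bld[1] * y * p.log()
                 + w_bld[0] * (1 - y) * (1 - p).log()).mean()

    l_pre, l_post = seg_loss(pre_logit, pre_y), seg_loss(post_logit, post_y)

    # Equation (2): cross-entropy over the five damage channels, counted only
    # where buildings are predicted (hat{y}_s == 1).
    building = (torch.sigmoid(post_logit) > 0.5).squeeze(1).float()
    ce = F.cross_entropy(dmg_logit, dmg_y, reduction="none",
                         weight=torch.tensor(w_dmg, device=dmg_logit.device))
    l_dmg = (ce * building).sum() / building.sum().clamp(min=1)

    # Equation (3); training proceeds in two stages as described below:
    # w_task = (0.5, 0.5, 0) for segmentation, then (0, 0, 1) for damage.
    return w_task[0] * l_pre + w_task[1] * l_post + w_task[2] * l_dmg
```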
We also apply random horizontal and vertical flipping during training to reduce overfitting.

We observed that it is quite challenging to train the entire model from scratch for both tasks simultaneously, as the performance of the building segmentation step significantly impacts the performance of the damage classification task. As such, we train the model sequentially based on two different sets of weights. First, we train the building segmentation module by setting the weight for damage classification to zero and setting the weights for the UNet in the loss function equal to 0.5 for both the pre-disaster and post-disaster building segmentation tasks. We also set the weights for building pixels equal to 15 and for background pixels equal to 1, as there is a significant imbalance between the number of pixels across these two classes. In other words, $[\omega_{s_{pre}}, \omega_{s_{post}}, \omega_d] = [0.5, 0.5, 0]$ and $[\omega_{s,c=0}, \omega_{s,c=1}] = [1, 15]$. Label $c = 0$ denotes background pixels and label $c = 1$ denotes building pixels.

Once we get reasonable performance on the validation set for the first task, we freeze the parameters of the UNet and start training the model for the second task, i.e., damage classification. Thus, we set the weights in the loss function for the pre-disaster and post-disaster segmentation tasks to zero and set the weight for the damage classification task equal to 1. Due to the high imbalance across the different damage classes, we assign higher weights to the major-damage class (label = 3) and the destroyed class (label = 4). In other words, $[\omega_{d,c=0}, \omega_{d,c=1}, \omega_{d,c=2}, \omega_{d,c=3}, \omega_{d,c=4}] = [1, 35, 70, 150, 120]$ for the damage classification task and $[\omega_{s_{pre}}, \omega_{s_{post}}, \omega_d] = [0, 0, 1]$ for building segmentation. Label $d = 0$ denotes background pixels, and labels $d = 1$ to $d = 4$ denote damage levels scaled from not-damaged to destroyed.

Since our model handles two tasks, we present performance results separately for each task. The performance results for the tile-wise random split are shown in the first row of Table III, where the model is evaluated on both the validation set and the test set. The columns of the table are named BLD-1, DMG-0, DMG-1, DMG-2, DMG-3, and DMG-mean. The BLD-1 column denotes the F1 score for class 1, which indicates building pixels. DMG-0, DMG-1, DMG-2, and DMG-3 indicate pixel-wise F1 scores for the no-damage, minor-damage, major-damage, and destroyed classes, respectively. DMG-mean denotes the harmonic F1 score across all damage levels, i.e., the harmonic mean of the four per-class F1 scores.
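The harmonic mean in question is the standard one, which we take to match the xView2 metric; a minimal sketch:

```python
def harmonic_f1(per_class_f1):
    """DMG-mean: harmonic mean of the four damage-level F1 scores,
    n / sum(1 / F1_c). A small epsilon guards against zero scores."""
    return len(per_class_f1) / sum(1.0 / max(f, 1e-6) for f in per_class_f1)

# e.g. harmonic_f1([f1_no, f1_minor, f1_major, f1_destroyed])
```

Unlike the arithmetic mean, this aggregation is dominated by the worst-performing class, which is the desirable behavior on heavily imbalanced damage distributions.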
• The same Python virtual environment for all experiments, to remove the effect of different packages on the performance.

Additionally, the inference times reported in Table IV include the file I/O, model loading, pre-processing, and post-processing costs associated with each approach, and therefore represent an upper bound on the time taken to process any given 1024×1024 input (i.e., when running such approaches over large amounts of input, the models would only need to be loaded from disk a single time).
As previously discussed, the xView2 challenge² encouraged participants to optimize for leaderboard performance instead of throughput. As such, many of the top-placed solutions used techniques such as ensembling and test-time augmentation, as well as larger, more complex models, in order to improve their performance at the cost of inference speed. The top-performing solution, for instance, consists of an ensemble of 12 models that are run 4 times for each input (test-time augmentation with 4 rotations). These solutions are prohibitively costly to run on large inputs. For example, the Maxar Open Data program released ∼20,000 sq. km of pre- and post-disaster imagery covering areas impacted by Hurricane Ida in 2021. Assuming the inference times from Table IV, a 0.3 m/px spatial resolution of the input imagery, and a $0.9/hr cost of running a Tesla K80 (based on current Azure pricing), the first-place solution would cost $6,500 to run, while our solution would only cost $100. In this case, our solution would generate results for the area affected by Hurricane Ida in 4.7 days, while the first-place solution would take up to 301.4 days using a single NVIDIA Tesla K80 GPU.
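These figures follow from a straightforward unit conversion: a 1024×1024 tile at 0.3 m/px covers about 0.094 sq. km. The sketch below reproduces the reported numbers, assuming the ∼20,000 sq. km release corresponds to roughly 10,000 sq. km of ground area once each pre/post pair is counted once (our reading of the figures, not a number stated in the release).

```python
# Back-of-the-envelope reproduction of the throughput and cost figures.
TILE_KM2 = (1024 * 0.3 / 1000) ** 2   # ~0.0944 sq. km per 1024x1024 tile

def sq_km_per_hour(seconds_per_tile):
    return 3600 / seconds_per_tile * TILE_KM2

area_km2 = 20_000 / 2    # pre+post pairs cover ~10,000 sq. km (assumed)
gpu_cost_per_hr = 0.9    # Tesla K80, Azure pricing used in the text

for name, secs in [("xView2 1st place", 245.75), ("Our method", 3.8)]:
    rate = sq_km_per_hour(secs)   # closely reproduces 1.38 / 89.35 sq. km/hr
    hours = area_km2 / rate
    print(f"{name}: {rate:.2f} sq.km/hr, {hours/24:.1f} days, "
          f"${hours * gpu_cost_per_hr:,.0f}")
# -> ~301 days / ~$6,500 for the first-place solution; ~4.7 days / ~$100 for ours
```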
Finally, we benchmark our proposed solution in a setting that is optimized compared to the above: we load data with a parallel data loader (vs. loading a single tile on the main thread), we run pre- and post-processing steps on the GPU, we maximize the amount of imagery that is run through the model at once (vs. running on a single 1024×1024 tile of imagery), and we use the most recent version of all relevant packages (vs. the earliest version pinned in the environments from the xView2 solution repositories). Here, we find that our model is able to process 612.29 square kilometers per hour, compared to 89.35 square kilometers per hour under the same assumptions in the previous setup, despite using the same hardware. In this case, our model could process the Hurricane Ida imagery in 2 days at a cost of $14.70. The stakeholder's baseline solution's speed is 1,000 square kilometers per hour on an Azure NC12 GPU. We project our runtime to be 20% faster than their baseline solution on a similar GPU.

TABLE IV. COMPARISON OF BUILDING DAMAGE MODEL INFERENCE TIMES ON A SINGLE 1024×1024 PIXEL TILE FOR DIFFERENT METHODS USING A SINGLE TESLA K80 GPU (ON AN AZURE NC6 MACHINE). Times are in seconds and are averaged over three runs, with the standard deviation in parentheses. The results for the winning xView2 solutions are reproduced through the official GitHub repositories published for each, where the only modification to the original code was to enable GPU processing in each inference script. The rightmost column shows the inference speed in (sq. km)/hr, assuming a 0.3 m/pixel input spatial resolution.

Method              Inference time (s)    sq. km/hr
xView2 1st place    245.75 (0.73)         1.38
xView2 2nd place    121.03 (0.36)         2.81
xView2 3rd place    108.21 (0.6)          3.14
xView2 4th place    not reproducible      not reproducible
xView2 5th place    10.94 (0.06)          31.07
Our method          3.8 (0.02)            89.35

² Most machine learning competitions follow a similar format, whereby participant solutions are only ranked in terms of held-out test set performance.

VIII. WEB VISUALIZER TOOL

In contrast to standard vision applications, semantic segmentation models that operate over satellite imagery need to be applied over arbitrarily large scenes at inference time. As such, distributing the imagery and predictions made by such models is non-trivial. First, high-resolution satellite imagery scenes can be many gigabytes in size, difficult to visualize (e.g., requiring GIS software and normalization steps), and may require pre-processing to correctly align temporal samples. Second, the predictions from a building damage model are strongly coupled to the imagery itself. In other words, only distributing georeferenced polygons of where damaged buildings are predicted to be is not useful in a disaster response setting; the corresponding imagery is necessary to interpret and perform quality assessment on the predictions.

Considering these difficulties, we implement a web-based visualizer to distribute the predictions made by our model over satellite image scenes. This approach bypasses the need for any specialized GIS software, allowing any modern web browser
to view the imagery and predictions, and it does not require users to have any formal GIS experience, as all imagery is pre-rendered. Specifically, users can:

1) Toggle back and forth between the pre- and post-disaster imagery to easily see the differences;
2) Change the visibility of the damage predictions to see the extent of the damage;
3) Show standard layers (e.g., OpenStreetMap or Esri World Imagery) for additional spatial context.

This is implemented with open-source tools, including GDAL³, leaflet⁴, and Docker⁵; a sketch of the pre-rendering step is shown below.
Fig. 6. Screenshot of the building damage visualizer instance for the August 2021 Haiti Earthquakes. The left side of the map interface shows the pre-disaster imagery, while the right side shows the post-disaster imagery. The slider in the middle of the interface allows a user to switch between the pre- and post-disaster layers to quickly see the difference in the imagery. Finally, the building damage predictions are shown as polygons with varying shades of red corresponding to increasing damage. The visibility of these predictions can be toggled in the interface so that a user can see the underlying imagery.

An instance of our visualizer is shown in Figure 6 for a scene from Jeremie, Haiti after the Haiti Earthquake in August 2021. The tower of the Cathedral of Saint Louis Roi of France (middle of the scene) is classified as damaged by the model and can be seen to be destroyed. The code for running inference with our final building damage model, as well as for setting up an instance of the building damage visualizer tool, is publicly available⁶.

Fig. 7. Full screenshots of the pre- and post-disaster images shown partially in the building damage visualizer instance in Figure 6, for better visibility. 2021 Haiti Earthquakes.

³ https://gdal.org/programs/index.html
⁴ https://leafletjs.com/
⁵ https://www.docker.com/
⁶ https://github.com/microsoft/Nonprofits/

IX. DEPLOYMENT AND APPLICATIONS

Automating building detection and damage assessment has the potential to tremendously speed up disaster response processes [6], [27], which are of critical importance for humanitarian organizations to estimate the geographical extent and severity of a disaster and plan accordingly [28]. To ensure that such an assessment can be delivered in time, this study's stakeholder is implementing the proposed model within a scalable, distributed computing system. Within this system, satellite images are divided among many identical instances of the model, which process them in parallel; this guarantees a fixed computation time for any number and size of input satellite images (a sketch of this fan-out pattern follows at the end of this section). The model's output is then shared with the wider humanitarian network in three ways: the aforementioned web visualizer, the open data-sharing platform "Humanitarian Data Exchange", and man-made maps (in digital or printed format), which can be directly sent to and used by first responders in the field. This ensures the rapid diffusion of information among all stakeholders involved in disaster response management.

It is worth noting that our stakeholder's experience with applying such tools in humanitarian settings, and discussions with practitioners, have highlighted the importance of two aspects. First, the value of automating damage assessment lies in speed, rather than accuracy. Regardless of visible damage, detailed ground-level inspections by trained personnel are still needed to assess the structural integrity of a building [29], and it is unlikely that remote sensing technology will replace that in the near future. For this reason, the focus of satellite-based damage assessments should be to provide broad numerical estimates as fast as possible, rather than building-level prescriptions. Second, while the immediate response is primarily informed by disaster impact (which can be quantified by the number of damaged buildings, among other metrics), long-term shelter recovery programs must take into account several other contextual factors, such as the socio-economic conditions of affected people and land ownership [30]. Because of this, information on building damage often needs to be combined with other data to be useful. Providing raw data, including geo-referenced building footprint masks and corresponding damage levels, to the humanitarian community is necessary to enable these analyses. Furthermore, as we discussed in Section VI, models trained with the proposed approach might not be robust to significant distribution shifts across different geographies; as such, domain adaptation techniques need to be explored to address data bias issues [31]. In this context, active learning and human-machine collaboration approaches have been discussed in [32] and [33].
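A minimal sketch of the fan-out pattern described at the start of this section, assuming per-tile inference can run in independent processes; run_model_on_tile is a hypothetical placeholder for the actual inference call, not the stakeholder's production code.

```python
from concurrent.futures import ProcessPoolExecutor

def run_model_on_tile(tile_path: str) -> str:
    """Hypothetical per-tile worker: load a pre/post tile pair, run the
    Siamese model, and write the prediction raster; returns the output path."""
    ...

def assess_scene(tile_paths, workers=8):
    # Identical, independent workers over fixed-size tiles: throughput scales
    # with the number of workers, so wall-clock time can be held near-constant
    # by provisioning more instances as the input grows.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_model_on_tile, tile_paths))
```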
X. CONCLUSION

The frequency of natural disasters is growing; thus, the impact of such events on communities continues to increase. The strategic response of humanitarian organizations to allocate resources and save lives after disasters can be improved by using AI tools. We propose a convolutional neural network model that uses satellite images from before and after natural disasters to localize buildings using the UNet model and score their damage level on a scale of 1 (not-damaged) to 4 (destroyed) using a multi-class classifier. We showed that, while our proposed model demonstrates decent performance, it also works three times faster than the fastest xView2 challenge winning solution and over 50 times faster than the slowest first-place solution, which indicates a significant improvement from an operational perspective. We also developed a web-based visualizer that can display the before and after imagery along with the model's building damage predictions on a custom map, to allow better inspection of the impacted areas by decision-makers. This paper outlines the results of a collaboration between Microsoft AI for Good/Humanitarian Action and 510, an initiative of the Netherlands Red Cross, to help inform field deployments using satellite imagery and AI technologies. Our solution significantly outperforms the stakeholder's current baseline model in terms of inference speed and segmentation accuracy. This study's stakeholder is planning to deploy and assess our proposed solution at the time of actual disasters.

REFERENCES

[1] OCHA-GHO, "Global humanitarian overview," https://reliefweb.int/sites/reliefweb.int/files/resources/GHO2019.pdf, 2019.
[2] WMO, "Climate and weather related disasters surge five-fold over 50 years, but early warnings save lives - WMO report," https://news.un.org/en/story/2021/09/1098662, 2021.
[3] Defense Innovation Unit, "xView2: Assess building damage," https://xview2.org/challenge, 2019.
[4] R. C. Daudt, B. Le Saux, and A. Boulch, "Fully convolutional siamese networks for change detection," in 2018 25th IEEE International Conference on Image Processing (ICIP). IEEE, 2018, pp. 4063–4067.
[5] H. Hao, S. Baireddy, E. R. Bartusiak, L. Konz, K. LaTourette, M. Gribbons, M. Chan, E. J. Delp, and M. L. Comer, "An attention-based system for damage assessment using satellite imagery," in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, 2021, pp. 4396–4399.
[6] T. Valentijn, J. Margutti, M. van den Homberg, and J. Laaksonen, "Multi-hazard and spatial transferability of a CNN for automated building damage assessment," Remote Sensing, vol. 12, no. 17, p. 2839, 2020.
[7] A. M. El Amin, Q. Liu, and Y. Wang, "Convolutional neural network features based change detection in satellite images," in First International Workshop on Pattern Recognition, vol. 10011. International Society for Optics and Photonics, 2016, p. 100110W.
[8] S. Kaur, S. Gupta, S. Singh, D. Koundal, and A. Zaguia, "Convolutional neural network based hurricane damage detection using satellite images," Soft Computing, pp. 1–15, 2022.
[9] Y. Zhan, K. Fu, M. Yan, X. Sun, H. Wang, and X. Qiu, "Change detection based on deep siamese convolutional network for optical aerial images," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 10, pp. 1845–1849, 2017.
[10] F. Rahman, B. Vasu, J. Van Cor, J. Kerekes, and A. Savakis, "Siamese network with multi-level features for patch-based change detection in satellite imagery," in 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2018, pp. 958–962.
[11] H. Chen, C. Wu, B. Du, and L. Zhang, "Change detection in multi-temporal VHR images based on deep siamese multi-scale convolutional networks," arXiv preprint arXiv:1906.11479, 2019.
[12] D. Peng, Y. Zhang, and H. Guan, "End-to-end change detection for high resolution satellite images using improved UNet++," Remote Sensing, vol. 11, no. 11, p. 1382, 2019.
[13] J. Chen, Z. Yuan, J. Peng, L. Chen, H. Huang, J. Zhu, Y. Liu, and H. Li, "DASNet: Dual attentive fully convolutional siamese networks for change detection in high-resolution satellite images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 1194–1206, 2020.
[14] V. Oludare, L. Kezebou, O. Jinadu, K. Panetta, and S. Agaian, "Attention-based two-stream high-resolution networks for building damage assessment from satellite imagery," in Multimodal Image Exploitation and Learning 2022, vol. 12100. SPIE, 2022, pp. 224–239.
[15] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang et al., "Deep high-resolution representation learning for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3349–3364, 2020.
[16] R. Gupta and M. Shah, "RescueNet: Joint building segmentation and damage assessment from satellite imagery," in 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 4405–4411.
[17] H. Chen, E. Nemni, S. Vallecorsa, X. Li, C. Wu, and L. Bromley, "Dual-tasks siamese transformer framework for building damage assessment," arXiv preprint arXiv:2201.10953, 2022.
[18] A. Ismail and M. Awad, "Towards cross-disaster building damage assessment with graph convolutional networks," arXiv preprint arXiv:2201.10395, 2022.
[19] ——, "BLDNet: A semi-supervised change detection building damage framework using graph convolutional networks and urban domain knowledge," arXiv preprint arXiv:2201.10389, 2022.
[20] Y. Wang, A. W. Z. Chew, and L. Zhang, "Building damage detection from satellite images after natural disasters on extremely imbalanced datasets," Automation in Construction, vol. 140, p. 104328, 2022.
[21] Z. Xia, Z. Li, Y. Bai, J. Yu, and B. Adriano, "Self-supervised learning for building damage assessment from large-scale xBD satellite imagery benchmark datasets," arXiv preprint arXiv:2205.15688, 2022.
[22] J. Block, D. Crawl, T. Artes, C. Cowart, R. de Callafon, T. DeFanti, J. Graham, L. Smarr, T. Srivas, and I. Altintas, "Firemap: A web tool for dynamic data-driven predictive wildfire modeling powered by the WIFIRE cyberinfrastructure," in AGU Fall Meeting Abstracts, vol. 2016, 2016, pp. PA23B–2234.
[23] M. Madaio, S.-T. Chen, O. L. Haimson, W. Zhang, X. Cheng, M. Hinds-Aldrich, D. H. Chau, and B. Dilkina, "Firebird: Predicting fire risk and prioritizing fire inspections in Atlanta," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 185–194.
[24] R. Gupta, B. Goodman, N. Patel, R. Hosfelt, S. Sajeev, E. Heim, J. Doshi, K. Lucas, H. Choset, and M. Gaston, "Creating xBD: A dataset for assessing building damage from satellite imagery," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2019.
[25] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[26] R. Gupta, R. Hosfelt, S. Sajeev, N. Patel, B. Goodman, J. Doshi, E. Heim, H. Choset, and M. Gaston, "xBD: A dataset for assessing building damage from satellite imagery," arXiv preprint arXiv:1911.09296, 2019.
[27] A. Elia, S. Balbo, and P. Boccardo, "A quality comparison between professional and crowdsourced data in emergency mapping for potential cooperation of the services," European Journal of Remote Sensing, vol. 51, no. 1, pp. 572–586, 2018.
[28] D. P. Coppola, Introduction to International Disaster Management. Elsevier, 2006.
[29] GFDRR, "Post-disaster needs assessments guidelines, volume B: Housing," 2017.
[30] H. Shelter, "Settlements guidelines," 2017.
[31] S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan, "A theory of learning from different domains," Machine Learning, vol. 79, no. 1, pp. 151–175, 2010.
[32] D. Tuia, M. Volpi, L. Copa, M. Kanevski, and J. Munoz-Mari, "A survey of active learning algorithms for supervised remote sensing image classification," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 606–617, 2011.
[33] N. Jojic, N. Malkin, C. Robinson, and A. Ortiz, "From local algorithms to global results: Human-machine collaboration for robust analysis of geographically diverse imagery," in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, 2021, pp. 270–273.