III. EXPERIMENTS
A. Description of Data Sets
In this letter, we chose two study areas in Germany: Munich and Berlin. We used PlanetScope satellite
imagery with three bands (R, G, and B) and a spatial resolution
of 3 m to test our proposed method. The corresponding build-
ing footprints were downloaded from OpenStreetMap (OSM).
We processed the imagery using a 256 × 256 sliding window
with a stride of 75 pixels to produce around 3000 sample
patches. The sample patches were divided into two parts, where 70% were used to train the network and 30% were used to validate the trained model.

Fig. 3. Comparison of results generated by the U-Net structure with different depths. (a) Depth d = 5. (b) Depth d = 8. (c) Ground truth.
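For illustration, the patch preparation described above (a 256 × 256 window slid with a stride of 75 pixels, followed by a 70/30 train/validation split) can be sketched as follows. This is a minimal sketch; the function names, array layout, and random split are our assumptions rather than the original preprocessing code:

```python
import numpy as np

def extract_patches(image, mask, patch_size=256, stride=75):
    """Slide a patch_size x patch_size window over an image (H x W x 3)
    and its building-footprint mask (H x W), yielding aligned pairs."""
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_size + 1, stride):
        for left in range(0, w - patch_size + 1, stride):
            img_patch = image[top:top + patch_size, left:left + patch_size]
            mask_patch = mask[top:top + patch_size, left:left + patch_size]
            patches.append((img_patch, mask_patch))
    return patches

def split_patches(patches, train_frac=0.7, seed=0):
    """Randomly divide the sample patches 70/30 into training
    and validation sets, as described in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(patches))
    n_train = int(train_frac * len(patches))
    train = [patches[i] for i in idx[:n_train]]
    val = [patches[i] for i in idx[n_train:]]
    return train, val
```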
B. Experimental Setup

The number of filters in the first convolutional layer of both the generator and the discriminator was 64. The downsampling factor was 2 in both the discriminator and the encoder of the generator. In the decoder of the generator, deconvolutions were performed with an upsampling factor of 2. All convolutions and deconvolutions had a kernel size of 4 × 4, a stride of 2, and a padding size of 1. An Adam solver with a learning rate of 0.0002 was adopted as the optimizer for both networks. Furthermore, we used a batch size of one for each network and trained for 200 epochs.
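As an illustration of these settings, the building blocks and optimizers can be sketched in PyTorch as follows. This is a minimal sketch assuming a pix2pix-style layer layout; the full encoder-decoder composition is omitted, and the module compositions shown are placeholders, not the original implementation:

```python
import torch
import torch.nn as nn

def down_block(in_ch, out_ch):
    # Encoder step: a 4 x 4 convolution with stride 2 and padding 1
    # halves the spatial resolution (downsampling factor of 2).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

def up_block(in_ch, out_ch):
    # Decoder step: a 4 x 4 deconvolution with stride 2 and padding 1
    # doubles the spatial resolution (upsampling factor of 2).
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# Both networks start from 64 filters in their first convolutional layer.
# The single-block models below are placeholders for the full networks.
generator = nn.Sequential(down_block(3, 64), up_block(64, 1))
discriminator = nn.Sequential(down_block(4, 64))  # 3 image + 1 mask channel

# Adam with a learning rate of 0.0002 for both networks;
# training used a batch size of one for 200 epochs.
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
```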
The clipping parameter in CWGAN was 0.01. For the conditional Wasserstein generative adversarial network with gradient penalty (CWGAN-GP), the gradient penalty coefficient λ1 was set to 10, as recommended in [12]. Our networks were implemented in the PyTorch framework and trained on an NVIDIA TITAN X GPU with 12 GB of memory. Building footprint generation methods based on CGAN, U-Net, and ResNet-DUC in [14] were taken as the algorithms of comparison.
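The two Wasserstein stabilization schemes, together with the λ2-weighted L1 term examined in Section III-C, can be sketched as follows. The conditional critic signature critic(x, condition) and the variable names are our assumptions for illustration, not the original code:

```python
import torch
import torch.nn.functional as F

CLIP = 0.01      # weight-clipping bound for CWGAN
LAMBDA_1 = 10.0  # gradient penalty coefficient for CWGAN-GP [12]

def clip_weights(critic, clip=CLIP):
    # CWGAN: clamp every critic weight into [-clip, clip] after each update.
    for p in critic.parameters():
        p.data.clamp_(-clip, clip)

def gradient_penalty(critic, real, fake, condition, lambda_1=LAMBDA_1):
    # CWGAN-GP: penalize deviations of the critic's gradient norm from 1
    # on random interpolations between real and generated samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(interp, condition)
    grads = torch.autograd.grad(
        outputs=score, inputs=interp,
        grad_outputs=torch.ones_like(score),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    return lambda_1 * ((grad_norm - 1.0) ** 2).mean()

def generator_loss(critic_score_fake, fake, target, lambda_2=100.0):
    # Wasserstein adversarial term plus the lambda_2-weighted L1 distance
    # between the generated map and the ground truth (lambda_2 = 1 or 100).
    return -critic_score_fake.mean() + lambda_2 * F.l1_loss(fake, target)
```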
C. Results and Analysis

In this letter, we evaluated the inference performance using the following metrics for a quantitative comparison: overall accuracy (OA), F1 score, and intersection over union (IoU). Specifically, the F1 and IoU metrics are defined as follows:

F1 = (2 × precision × recall) / (precision + recall)    (7)

IoU = TP / (TP + FP + FN)    (8)

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.

The impact of hyperparameters has been investigated for our proposed methods. First, the influence of different depths d of the U-Net structure was explored. Fig. 3 shows the visual results for one patch at different depths, compared to the ground truth. As can be seen from Fig. 3, a large number of roofs are omitted by the network with d = 8 but are identified at depth d = 5. Similar phenomena have been reported in [15]: as network depth increases, accuracy saturates and then degrades rapidly, since adding more layers to a suitably deep model leads to a higher training error. Note that the optimal depth of the network should be comparable with the size of useful features in the imagery in order to achieve high accuracy.

Second, we chose different coefficients (λ2 = 1, 100) of the L1 loss with the CGAN and CWGAN-GP. The quantitative results are compared in Table I.
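Equations (7) and (8) translate directly into code. The following minimal NumPy sketch computes both scores for a predicted probability map against a binary ground-truth mask; the 0.5 binarization threshold is our assumption:

```python
import numpy as np

def f1_and_iou(pred, truth, threshold=0.5):
    """Compute F1 (7) and IoU (8) for a predicted probability map
    against a binary ground-truth building mask."""
    p = pred >= threshold
    t = truth.astype(bool)
    tp = np.logical_and(p, t).sum()   # true positives
    fp = np.logical_and(p, ~t).sum()  # false positives
    fn = np.logical_and(~p, t).sum()  # false negatives
    precision = tp / (tp + fp + 1e-12)
    recall = tp / (tp + fn + 1e-12)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    iou = tp / (tp + fp + fn + 1e-12)
    return f1, iou
```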
Fig. 4. Visualized comparison of different networks and coefficients λ2 of the L1 loss. (a) CGAN (λ2 = 1). (b) CGAN (λ2 = 100). (c) ResNet-DUC. (d) U-Net. (e) CWGAN (λ2 = 100). (f) CWGAN-GP (λ2 = 1). (g) CWGAN-GP (λ2 = 100). (h) Ground truth.
TABLE I
COMPARISON OF DIFFERENT NETWORKS ON THE TEST DATA SETS
Fig. 5. Training and inference time of different methods. (a) Training time (in seconds). (b) Inference time (in milliseconds).
Fig. 6. Section of the entire Munich test area. Red: building footprints generated by the proposed method, overlaid on an optical image.