A Comparative Analysis of GAN-Based Methods for SAR-to-Optical Image Translation
Abstract— Unlike optical sensors, synthetic aperture radar (SAR) sensors acquire images of the Earth's surface with all-weather and all-time capabilities, which is vital in situations such as disaster assessment. However, SAR sensors do not offer as rich visual information as optical sensors. SAR-to-optical image-to-image translation generates optical images from SAR images to benefit from what both imaging modalities have to offer. It also enables multisensor image analysis of the same scene for applications such as heterogeneous change detection. Various architectures of generative adversarial networks (GANs) have achieved remarkable image-to-image translation results in different domains. Still, their performances in SAR-to-optical image translation have not been analyzed in the remote-sensing domain. This letter compares and analyzes state-of-the-art GAN-based translation methods with open-source implementations for SAR-to-optical image translation. The results show that GAN-based SAR-to-optical image translation methods achieve satisfactory results; however, their performances depend on the structural complexity of the observed scene and the spatial resolution of the data. We also introduce a new dataset with a higher resolution than the existing SAR-to-optical image datasets and release implementations of the GAN-based methods considered in this letter to support reproducible research in remote sensing.

Index Terms— Generative adversarial network (GAN), image-to-image translation, multisensor images, optical, remote sensing, SAR-to-optical image translation, synthetic aperture radar (SAR).

Manuscript received April 18, 2022; accepted May 18, 2022. Date of publication May 23, 2022; date of current version June 2, 2022. This work was supported in part by the Sichuan Provincial Science and Technology Projects under Grant 2019JDJQ0023 and in part by the National Key Research and Development Program of China under Grant 2020YFB0505704. (Corresponding author: Turgay Celik.)

Yitao Zhao, Nanqing Liu, and Heng-Chao Li are with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610032, China (e-mail: [email protected]; [email protected]; [email protected]).

Turgay Celik is with the School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610032, China, also with the School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg 2000, South Africa, and also with the Faculty of Engineering and Science, University of Agder, 4604 Kristiansand, Norway (e-mail: [email protected]).

Digital Object Identifier 10.1109/LGRS.2022.3177001

I. INTRODUCTION

Earth observation platforms acquire multisensor images (or heterogeneous images) of the land surface at different times, revealing complementary multitemporal information on the land surface properties. The sensor characteristics determine the type of information and the all-weather and all-time capabilities of the imaging platforms. Optical and synthetic aperture radar (SAR) sensors are among the most commonly used imaging sensors with different imaging capabilities. For instance, in disaster-affected areas covered by clouds, optical sensors lose the ability to acquire images with good area coverage. On the other hand, SAR sensors, with their all-weather and all-time capabilities, can acquire images with minimal effects of clouds or weather conditions, which enables practitioners to assess the disaster impact. Yet, SAR sensors do not offer as rich visual information as optical sensors, making it difficult for remote-sensing practitioners to interpret the acquired SAR images. Combining the all-weather and all-time capabilities of SAR sensors with the high-resolution, detailed visual information of optical sensors can improve the practitioner's interpretation of SAR images and further guide downstream tasks such as heterogeneous change detection. However, the significant diversity of feature expression in SAR and optical images hinders joint data analysis, which remains a prevalent challenge for subsequent image-processing pipelines.

A feasible solution for this challenge is establishing a mapping enabling the conversion between the source (i.e., SAR) and target (i.e., optical) domains. Through cross-domain mapping, images in the source domain obtain the target domain's characteristics, reducing the nonlinear difference between heterogeneous images. Since the cross-mapping process is similar to translation in natural language processing, this method is named image-to-image translation (I2IT). Several remarkable I2IT methods have achieved excellent results on natural images, among which the most popular methods are based on the generative adversarial network (GAN) [1]. According to the optimization strategy and the setup of the source–target datasets, the existing GAN-based I2IT (GAN-I2IT) methods fall under two main categories: paired and unpaired methods [2].

The paired methods operate on a dataset of corresponding source and target images to achieve image translation; that is, for each source (SAR) image, there must be a target (optical) image. Among the existing paired GAN-I2IT methods, Pix2Pix [3] is a milestone that uses paired samples as the input conditions and inherits the convolutional GAN [4] to process image data. Many GAN-I2IT methods are inspired by this method [5], [6]. Building a paired dataset requires careful selection of coregistered source and target images. However, most collected image data are unpaired, which means a lack of constraining conditions such as the label images or instance images used in paired translation methods. Therefore, the research interest in recent years has gradually shifted from paired to unpaired methods.
1) [...] letter is the first comprehensive analysis of GAN-I2IT methods for SAR-to-optical image translation, to the best of our knowledge.

2) We collected SAR-to-optical image pairs from TerraSAR-X and other optical satellite image resources and introduced a high-resolution dataset named SAR2Opt comprising more than 4000 pairs of samples.

3) We perform an extensive experimental analysis of GAN-I2IT methods on the SEN1-2 dataset [12] and the newly proposed SAR2Opt dataset to establish a baseline for future research. We provide several recommendations for selecting GAN-I2IT methods for SAR-to-optical image translation and some possible future research directions.

4) To support reproducible research on SAR-to-optical image translation, we released the implementations of the GAN-I2IT methods considered in this letter and the newly introduced SAR2Opt dataset at https://github.com/MarsZhaoYT/Sar2Opt-Heterogeneous-Dataset.

[...] E_{x∼p_d(x)}[log(1 − D(G(x)))], where the generator G tries to minimize the objective function, while the discriminator D tries to maximize it. Benefiting from the adversarial loss function, the images generated by a GAN become increasingly indistinguishable from real images.
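For illustration only, the following minimal PyTorch-style sketch shows how such an adversarial objective (in the standard form min_G max_D E_y[log D(y)] + E_x[log(1 − D(G(x)))] from [1]) is typically optimized by alternating discriminator and generator updates. The toy networks, tensor shapes, and learning rates are placeholders and are not taken from any of the reviewed implementations.

import torch
import torch.nn as nn

# Toy generator and discriminator; real GAN-I2IT models use much deeper networks.
G = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1), nn.Tanh())                 # SAR (1 ch) -> optical (3 ch)
D = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.AdaptiveAvgPool2d(1))   # real/fake score

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

sar = torch.randn(4, 1, 64, 64)      # stand-in for a SAR batch
optical = torch.randn(4, 3, 64, 64)  # stand-in for the corresponding optical batch

# Discriminator step: maximize log D(y) + log(1 - D(G(x))).
fake = G(sar).detach()
d_loss = bce(D(optical).flatten(1), torch.ones(4, 1)) + \
         bce(D(fake).flatten(1), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator (non-saturating form of minimizing log(1 - D(G(x)))).
g_loss = bce(D(G(sar)).flatten(1), torch.ones(4, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()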
In this study, we select seven state-of-the-art paired and unpaired GAN-I2IT methods for SAR-to-optical image translation, as demonstrated in Fig. 1. We considered only the methods with open-source implementations for fair comparisons, modified only their data-loading code to accommodate the datasets, and made no other changes to the original implementations.

A. Paired Methods

1) Pix2Pix [3]: Pix2Pix is a milestone method for applying GANs to image-to-image translation. Pix2Pix pioneers the use of the conditional GAN (cGAN), in which paired images are considered as input conditions to guide the generation procedure. cGAN significantly improves on the drawback that the original GAN merely generates random images from the input noise vector, over which artificial control is difficult to exert.
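As a rough sketch of this conditioning idea (illustrative helper functions, not the authors' code), a Pix2Pix-style objective combines a conditional adversarial term, in which the discriminator sees the SAR input concatenated with either the real or the generated optical image, with an L1 reconstruction term. The weight of 100 on the L1 term is a common default, not a value reported in this letter.

import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()
lambda_l1 = 100.0  # common Pix2Pix-style weighting of the L1 term (assumed default)

def pix2pix_g_loss(G, D, sar, optical):
    """Generator loss: fool the conditional discriminator and stay close to the target."""
    fake = G(sar)
    pred_fake = D(torch.cat([sar, fake], dim=1))      # condition D on the SAR input
    adv = bce(pred_fake, torch.ones_like(pred_fake))
    return adv + lambda_l1 * l1(fake, optical), fake

def pix2pix_d_loss(D, sar, optical, fake):
    """Discriminator loss: separate real (SAR, optical) pairs from (SAR, generated) pairs."""
    pred_real = D(torch.cat([sar, optical], dim=1))
    pred_fake = D(torch.cat([sar, fake.detach()], dim=1))
    return bce(pred_real, torch.ones_like(pred_real)) + \
           bce(pred_fake, torch.zeros_like(pred_fake))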
2) BicycleGAN [5]: Since the paired translation strategy of Pix2Pix can only generate a single image for a given input sample, BicycleGAN remedies this shortcoming in the diversity of sample generation. BicycleGAN adds the coding information of the target domain contained in the hidden layer and introduces random noise following a Gaussian distribution to broaden the diversity of image generation.
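A minimal, purely illustrative sketch of this mechanism follows: the generator receives a Gaussian latent code alongside the SAR input, so sampling different codes yields different plausible optical renderings of the same scene. The tiny network below is a stand-in, not the BicycleGAN architecture.

import torch
import torch.nn as nn

class LatentConditionedGenerator(nn.Module):
    """Toy generator that injects a latent code z by tiling it over the spatial grid."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(nn.Conv2d(1 + z_dim, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())

    def forward(self, sar, z):
        z_map = z.view(z.size(0), self.z_dim, 1, 1).expand(-1, -1, *sar.shape[2:])
        return self.net(torch.cat([sar, z_map], dim=1))

G = LatentConditionedGenerator()
sar = torch.randn(2, 1, 64, 64)
# Two Gaussian latent codes -> two different translations of the same SAR patch.
out_a = G(sar, torch.randn(2, 8))
out_b = G(sar, torch.randn(2, 8))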
B. Unpaired Methods

1) CycleGAN [7]: Corresponding to Pix2Pix in paired translation, CycleGAN is a representative method of unpaired translation. CycleGAN uses unpaired datasets and introduces a cycle-consistency loss function: it converts samples from the source domain into the target domain and then translates them back to calculate the cycle consistency.
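The cycle-consistency term can be sketched as follows (toy generators standing in for the SAR-to-optical and optical-to-SAR networks; the weight of 10 is a common default rather than a value from this letter): an image translated to the other domain and back should reproduce the original.

import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G_s2o, G_o2s, sar, optical, lam=10.0):
    """SAR -> optical -> SAR and optical -> SAR -> optical reconstruction penalties."""
    sar_rec = G_o2s(G_s2o(sar))          # forward cycle
    optical_rec = G_s2o(G_o2s(optical))  # backward cycle
    return lam * (l1(sar_rec, sar) + l1(optical_rec, optical))

# Toy generators with matching channel counts so that both cycles are well defined.
G_s2o = nn.Sequential(nn.Conv2d(1, 3, 3, padding=1), nn.Tanh())
G_o2s = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Tanh())
loss = cycle_consistency_loss(G_s2o, G_o2s, torch.randn(2, 1, 64, 64), torch.randn(2, 3, 64, 64))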
2) MUNIT [13]: Compared with CycleGAN, which extracts features directly, multimodal unsupervised image-to-image translation (MUNIT) decomposes an image into a content space and an attribute (style) space, encodes the global image content and style separately, and then exchanges the attribute codes to realize translation. Specifically, content codes from the source domain are combined with different style codes from the target domain for reconstruction, which realizes diversity in the generation.
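This decomposition can be sketched as follows (toy encoders and decoder, not the MUNIT architecture): a content code extracted from the SAR image is combined with a style code taken from an optical image, so swapping style codes changes the appearance while preserving the scene content.

import torch
import torch.nn as nn

content_enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())              # SAR -> content map
style_enc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 4))  # optical -> style vector
decoder = nn.Sequential(nn.Conv2d(8 + 4, 3, 3, padding=1), nn.Tanh())

sar = torch.randn(2, 1, 64, 64)
optical = torch.randn(2, 3, 64, 64)

content = content_enc(sar)                                     # (2, 8, 64, 64) content code
style = style_enc(optical)                                      # (2, 4) style code from the target domain
style_map = style.view(2, 4, 1, 1).expand(-1, -1, 64, 64)
translated = decoder(torch.cat([content, style_map], dim=1))    # content from SAR, style from optical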
3) NICE-GAN [8]: NICE-GAN is an improvement over the structure of CycleGAN. A typical GAN-based image translation method requires alternate training of the generator G and the discriminator D. In NICE-GAN, there is no independent encoder in the generator: after the discriminator training, part of the early layers of the discriminator is retained and reused as the encoder for generation, which reduces the complexity of the model structure.
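The weight-sharing idea can be sketched as follows (toy layers, not the NICE-GAN architecture): the first part of the discriminator doubles as the encoder whose features feed the decoder, so no separate encoder is trained for the generator.

import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())  # shared early layers
disc_head = nn.Sequential(nn.Conv2d(16, 1, 3, padding=1))           # discriminator-specific head
decoder = nn.Sequential(nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())  # generator-specific decoder

sar = torch.randn(2, 1, 64, 64)
features = encoder(sar)         # the same features serve both roles
realness = disc_head(features)  # discriminator branch: real/fake scores
translated = decoder(features)  # generator branch: translated optical image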
4) CUT [2]: CUT introduces contrastive learning into GAN-based image generation for the first time. Traditional contrastive learning uses both internal and external patches during sampling; however, only internal patches are used in CUT. This strategy eliminates the strict "bijection" restriction between two image domains required by cycle consistency. In addition, in terms of model structure, CUT streamlines the structure of CycleGAN, retaining only a single generator and discriminator.
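A compact sketch of the patchwise contrastive (InfoNCE-style) loss behind this idea follows (illustrative feature tensors; the temperature and patch count are arbitrary): each feature of a generated patch is pulled toward the feature of the corresponding input patch and pushed away from features of other patches of the same image, i.e., only internal negatives are used.

import torch
import torch.nn.functional as F

def patch_nce_loss(feat_src, feat_gen, tau=0.07):
    """Patchwise InfoNCE: each generated-patch feature should match the source-patch
    feature at the same location and repel features from other locations of the
    same image (internal negatives only)."""
    # feat_*: (num_patches, dim) features from corresponding spatial locations.
    feat_src = F.normalize(feat_src, dim=1)
    feat_gen = F.normalize(feat_gen, dim=1)
    logits = feat_gen @ feat_src.t() / tau          # (num_patches, num_patches) similarity matrix
    targets = torch.arange(feat_src.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# 256 patch features of dimension 64 sampled from the input and the translated image.
loss = patch_nce_loss(torch.randn(256, 64), torch.randn(256, 64))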
5) Attn-CycleGAN [14]: To deal with critical problems such as mode collapse and diminished gradients in practical applications of GANs, Attn-CycleGAN incorporates a spatial attention mechanism to calculate an attention map and guide the translation process of the generator.
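A minimal sketch of such a spatial attention block follows (a toy module, not the Attn-CycleGAN design): a single-channel map in [0, 1] is predicted from intermediate features and reweights them so that the generator concentrates on the highlighted regions.

import torch
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """Toy spatial attention: a single-channel map in [0, 1] reweights the feature grid."""
    def __init__(self, channels=16):
        super().__init__()
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, feat):
        a = self.attn(feat)          # (N, 1, H, W) attention map
        return feat * a, a           # attended features plus the map itself for inspection

features = torch.randn(2, 16, 64, 64)            # intermediate generator features
attended, attn_map = SpatialAttentionBlock()(features)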
III. EXPERIMENTS

A. Datasets

We used the SEN1-2 dataset [12] and our newly proposed SAR2Opt dataset to evaluate the image translation performances of the different GAN-based methods. Fig. 2 shows some examples from the datasets, and the relevant characteristics of the datasets are summarized as follows.

1) SEN1-2 Dataset: SEN1-2 [12] is an extensive public dataset released by TUM, which contains 282 384 pairs of SAR images, at a maximum spatial resolution of 5 m, and optical images acquired by the Sentinel-1 and Sentinel-2 satellites, respectively. The dataset contains four subdatasets (SEN1-2-Spring, SEN1-2-Summer, SEN1-2-Fall, and SEN1-2-Winter) with imaging times in different seasons. The images in the SEN1-2 dataset are 256 × 256 pixels.

2) SAR2Opt Dataset: We used TerraSAR-X to collect SAR images at a spatial resolution of 1 m over ten cities in Asia, North America, Oceania, and Europe from 2007 to 2013. The corresponding optical images collected from Google Earth Engine are coregistered to the SAR images by manual selection of control points. We extracted image patches of size 600 × 600 pixels from the coregistered SAR-to-optical image pairs to form the SAR2Opt dataset.

Note that the SEN1-2 and SAR2Opt datasets contain multiple paired SAR-to-optical image samples. We used the original paired datasets to train the paired methods. However, we randomly shuffled the source and target images of the datasets for the unpaired methods to ensure that the datasets no longer contained paired samples. After that, we trained and tested the paired and unpaired GAN-I2IT methods on the paired and unpaired datasets, respectively.
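As an illustration of this protocol (hypothetical file names; the actual loaders live in the released repository), the pairing can be broken for the unpaired setting by shuffling one modality independently after the 80%/20% train/test split described in Section III-C.

import random

# Hypothetical lists of co-registered patch identifiers; the names are illustrative only.
pairs = [(f"sar_{i:05d}.png", f"opt_{i:05d}.png") for i in range(4000)]

random.seed(0)
random.shuffle(pairs)
split = int(0.8 * len(pairs))
train_pairs, test_pairs = pairs[:split], pairs[split:]    # 80% / 20% split

# For the unpaired setting, break the correspondence by shuffling one side independently.
train_sar = [s for s, _ in train_pairs]
train_opt = [o for _, o in train_pairs]
random.shuffle(train_opt)                                  # SAR patch i no longer matches optical patch i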
B. Metrics

We employed the full-reference image quality metrics peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) [15] to measure the quality of SAR-to-optical image translation. The PSNR is a simple-to-compute metric, but it may not align well with the perceived distortion. On the other hand, the SSIM metric measures perceptual similarity based on local image structure, luminance, and contrast, and it aligns well with the perceived distortion. The higher the PSNR and SSIM scores, the better the translation.
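Both metrics can be computed with off-the-shelf tools; the sketch below uses scikit-image on stand-in 8-bit RGB arrays (PSNR = 10 log10(MAX^2 / MSE); the channel_axis argument assumes scikit-image 0.19 or newer).

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Stand-ins for a translated optical image and its ground-truth reference (8-bit RGB).
rng = np.random.default_rng(0)
translated = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)
reference = rng.integers(0, 256, (256, 256, 3), dtype=np.uint8)

psnr = peak_signal_noise_ratio(reference, translated, data_range=255)
ssim = structural_similarity(reference, translated, data_range=255, channel_axis=-1)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")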
C. Model Training and Testing

The training and inference processes are implemented on an NVIDIA RTX 2080Ti GPU with 11 GB of memory. Each of the selected methods is trained for 100 epochs. The initial learning rate of each method is 0.001, and the batch size is set to 1. The learning rate decreases gradually after 50 epochs. Considering that the SEN1-2 dataset [12] is too large for our computing resources, we randomly sampled a subset comprising 20% of the original SEN1-2 dataset. The SAR2Opt dataset was employed as a whole given its moderate size. For all experiments, we use 80% of the data for training and the remaining 20% for testing. The detailed amount of samples used for training and testing is listed in Table I. Furthermore, we did not use data augmentation during the training process of the methods considered in this study.

TABLE I: AMOUNT OF TRAINING AND TESTING PAIRS
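A sketch of such a schedule is shown below, assuming a linear decay to zero over the last 50 epochs (a common default in public GAN-I2IT codebases); the exact decay rule of each original implementation may differ.

import torch
import torch.nn as nn

model = nn.Conv2d(1, 3, 3, padding=1)                       # stand-in for a generator
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # initial learning rate 0.001

total_epochs, decay_start = 100, 50

def lr_lambda(epoch):
    # Constant for the first 50 epochs, then linear decay towards zero.
    if epoch < decay_start:
        return 1.0
    return 1.0 - (epoch - decay_start) / float(total_epochs - decay_start)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one training epoch with batch size 1 would run here ...
    scheduler.step()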
TABLE II: EXPERIMENTAL RESULTS ON THE SEN1-2 AND SAR2OPT DATASETS
Fig. 2. Sample SAR-to-optical translation results on the SEN1-2 (top row) and SAR2Opt (bottom row) datasets.
D. Experimental Results

1) Quantitative Analysis: Table II(a)–(d) shows the performances of the GAN-I2IT methods on the SEN1-2 dataset, demonstrating that Pix2Pix and CycleGAN, on average, achieve the best performances among the paired and unpaired methods, respectively. Table II(a) and (b) shows that the unpaired methods achieve comparable results to the paired methods on the SEN1-2-Spring and SEN1-2-Summer subdatasets. However, the paired methods outperform the unpaired methods on the SEN1-2-Fall and SEN1-2-Winter subdatasets, as shown in Table II(c) and (d). By analyzing the corresponding datasets, we discovered that the land surface properties of the SEN1-2-Spring and SEN1-2-Summer subdatasets, covering coastal areas, towns, and farmlands, are more complex than those of the SEN1-2-Fall and SEN1-2-Winter subdatasets. Therefore, the unpairing strategy in CycleGAN achieves the best results on the first two subdatasets containing complex scenes. Table II(c) and (d) indicates that Pix2Pix and CycleGAN maintain the best performance on the SEN1-2-Fall and SEN1-2-Winter subdatasets. Since these two subdatasets possess relatively simple scenes and weak texture, the paired methods achieve better performance on them.

Table II(e) shows the results of the GAN-I2IT methods on the proposed SAR2Opt dataset, indicating that the paired methods, on average, achieve better performance than the unpaired methods, among which CycleGAN outperforms the other methods. The metric results on the SAR2Opt dataset follow a similar trend to those on the SEN1-2 dataset, confirming the applicability of the proposed dataset. The overall performance of the GAN-I2IT methods on the SAR2Opt dataset is better than their performance on the SEN1-2 dataset. Specifically, Pix2Pix and CycleGAN maintain the best performance on SAR2Opt. In addition, we note that Attn-CycleGAN achieves performance comparable to CycleGAN on the SEN1-2-Spring and SAR2Opt datasets. Since the SAR2Opt dataset contains higher-resolution SAR-to-optical image pairs with richer image textures than the image pairs in the SEN1-2 dataset, the GAN-I2IT methods, on average, yield better performances on this dataset.

2) Qualitative Analysis: Fig. 2 shows the SAR-to-optical image translation results of the GAN-I2IT methods on sample test images from the SEN1-2 and SAR2Opt datasets. Among the paired methods, Pix2Pix generates visually better results than BicycleGAN. BicycleGAN aims to improve the diversity of the generation process, which resulted in additional surface artifacts and style changes in the translated images. CycleGAN's results are closest to the ground-truth images among the unpaired methods for both the low-resolution and high-resolution samples from the SEN1-2 and SAR2Opt datasets. The translation results of Attn-CycleGAN, CUT, and NICE-GAN are comparable to CycleGAN's results. Due to its attention mechanism, Attn-CycleGAN produces results with more detailed texture information in the translated images than CUT and NICE-GAN. One can also observe patching artifacts in the generated images of NICE-GAN. Among all methods, MUNIT produced results with severe visual artifacts.

3) Computational Stability and Efficiency: We test the stability and convergence speed of the GAN-I2IT methods during the training process on the SAR2Opt dataset based on the Fréchet inception distance (FID) score [16], as shown in Fig. 3. The FID score measures the distance between the feature vectors of synthetic and real images. Lower FID scores indicate that the generated images are statistically closer to the real images.
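Concretely, the FID compares the Gaussian statistics (mean and covariance) of Inception-style features of real and generated images. The sketch below computes the closed-form distance ||mu_r − mu_f||^2 + Tr(C_r + C_f − 2(C_r C_f)^(1/2)) from already extracted feature arrays; the random features stand in for real Inception activations, and this is not the evaluation code used in this letter.

import numpy as np
from scipy.linalg import sqrtm

def fid_from_features(feat_real, feat_fake):
    """FID between two sets of (already extracted) Inception-style feature vectors."""
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    c_r = np.cov(feat_real, rowvar=False)
    c_f = np.cov(feat_fake, rowvar=False)
    covmean = sqrtm(c_r @ c_f)
    if np.iscomplexobj(covmean):          # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(c_r + c_f - 2.0 * covmean))

rng = np.random.default_rng(0)
score = fid_from_features(rng.normal(size=(500, 64)), rng.normal(size=(500, 64)))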