Abstract—Active fire segmentation in satellite imagery is a critical remote sensing task, providing essential support for planning, decision-making, and policy development. Several techniques have been proposed for this problem over the years, generally based on specific equations and thresholds, which are sometimes empirically chosen. Some satellites, such as MODIS and Landsat-8, have consolidated algorithms for this task. However, for other important satellites, such as Sentinel-2, this is still an open problem. In this paper, we explore the possibility of using transfer learning to train convolutional and transformer-based deep architectures (U-Net, DeepLabV3+ and SegFormer) for active fire segmentation. We pre-train these architectures on Landsat-8 images and automatically labeled samples, and fine-tune them to Sentinel-2 images. Experiments show that the proposed method achieves F1-scores of up to 88.4% for Sentinel-2 images, outperforming three threshold-based algorithms by at least 19%, while maintaining a low demand for manually labeled samples. We also address detection over seam-line regions, which present a particular challenge for existing methods. The source code and trained models are available on GitHub¹.

Index Terms—active fire segmentation, transfer learning, fine-tuning, Sentinel-2 imagery, seam-lines.

Andre M. Fusioka, Gabriel H. de A. Pereira, Bogdan T. Nassu and Rodrigo Minetto are with the Federal University of Technology - Parana (UTFPR), Brazil. E-mails: [email protected], [email protected], {rminetto,bogdan}@utfpr.edu.br. We would like to thank CNPq (Grant 312815/2023-9), CAPES, FAPESP and Fundação Araucária.
Manuscript received ? ?, ?; revised ? ?, ?.
¹https://github.com/Minoro/l8tos2-transf-seamlines

I. INTRODUCTION

The automatic segmentation of active fire in satellite imagery is fundamental for environmental monitoring, providing invaluable data for firefighters, researchers, and policymakers to assess the extent and intensity of wildfires, thus contributing to the sustainable development goals defined by the United Nations, such as responsible production (as fire is often used to prepare land for agriculture), climate action, and life on land (impact on biodiversity). This has led to a number of techniques being proposed for this problem, usually based on specific equations and thresholds derived from statistical properties observed in some satellite bands and how they relate. Some sensors, such as MODIS (onboard the Aqua and Terra satellites), VIIRS (onboard the NPP and NOAA-20 satellites), and OLI (onboard the Landsat-8 and Landsat-9 satellites) [1], [2], [3], have consolidated algorithms, which achieve very high-quality results. However, for other important satellites, such as Sentinel-2, this is still an open problem, with several candidate solutions being proposed [2], [4], [5].

A recent trend in the field is attempting to improve fire segmentation results by exploring computational models based on deep networks. The problem has been addressed in several manners, from binary classification (i.e., presence or absence of fire in an image) [6], [7], to the semantic segmentation of individual pixels [8], [9]. Remote sensing images from a number of sensors have been used, such as GOES-16, Himawari-8, VIIRS, CBERS, Landsat-8 and Sentinel-2, also including multisensor approaches. In general, authors report superior results when using machine learning approaches over traditional threshold-based methods for active fire segmentation.

Regarding the Sentinel-2 satellite, Zhang et al. [10] introduced a framework for the acquisition and segmentation of active fire images. The study was limited to the United States and Australia. They utilized the equations proposed by [11], adjusted for the collected images, to generate masks for training a deep network. The authors of [9] segment both active fires and burnt areas, combining data from the Sentinel-1 and 2 satellites, as well as MODIS fire products, Google Earth images, and field observation data to establish the ground truth masks. The large number of burnt area pixels in these masks, compared to active fire pixels, may favor metrics focused on the former, while hiding a worse performance on the latter. That study focused on the Mozambique region.

One of the major challenges when training deep networks is the need for a large amount of labeled data — for active fire segmentation, in the form of many multispectral images with their corresponding fire masks. The usual way of obtaining such data would involve the effort of human specialists analyzing and annotating by hand each fire pixel on thousands of images, a costly and time-consuming task. This problem was addressed by Pereira et al. [8] by creating masks (which were later made public) based on combinations of the segmentation results produced by several Landsat-8 algorithms. However, that was only possible due to the maturity and the general nature of these algorithms. For Sentinel-2, the lack of algorithms with the same qualities prevents one from using the same approach to obtain high-quality, massive labeled data.

The great demand for labeled data for training deep networks is a known issue that may affect applications from many domains. One very popular way of dealing with it is transfer learning: taking a model pre-trained on a large benchmark dataset, such as ImageNet or COCO, and fine-tuning it on a smaller, task-specific dataset. Although this approach is effective for many types of data, satellite imagery poses one additional challenge: each satellite produces multispectral images with distinct wavelengths and spatial resolutions for each of its bands.
Fig. 1. Representation of the similarity of wavelengths observed by each band of Sentinel-2 and Landsat-8. Adapted from: USGS [16].

In summary, the main contributions of this paper are: (i) we evaluate three popular deep network architectures for semantic segmentation (U-Net [12], DeepLabV3+ [13] and SegFormer-B0 [15]), taking models pre-trained for active fire segmentation on Landsat-8 images and fine-tuning them for Sentinel-2 images; (ii) we show that the described method can successfully produce active fire segmentation models for Sentinel-2 with all three architectures, without the need for massive hand-labeled data, with reduced training time, and with results that surpass those obtained by three traditional threshold-based methods; and (iii) we point out how seam-lines are less prone to affect fine-tuned deep learning models than non-fine-tuned models or thresholding techniques.

II. METHODOLOGY

The methodology for active fire segmentation on Sentinel-2 images involves fine-tuning models trained on Landsat-8 images through a transfer learning process. The scheme for transfer learning is summarized in Fig. 2. The left and right pipeline flows refer, respectively, to the Landsat-8 pre-training and the Sentinel-2 transfer learning stages, and are further detailed below. We also describe the dataset and the threshold-based algorithms considered in our comparison. All implementations were in Python, using the TensorFlow library.

A. Landsat-8 pre-training

For the Landsat-8 pre-training, shown on the left side of Fig. 2, we used 22,097 image patches of 256 × 256 pixels from the dataset compiled by Pereira et al. [8]. We selected non-overlapping patches covering the entire globe, including many large and small wildfire events in areas such as the Amazon region, Africa, Australia and the United States. Each patch has an associated fire mask, obtained by combining the masks produced by three well-known sets of conditions: Schroeder et al. [1], Murphy et al. [2], and Kumar-Roy et al. [3]. The masks were combined through a pixelwise majority voting process — a pixel is set in the active fire mask if it is set by at least two condition sets.
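The voting rule can be stated directly in code. Below is a minimal NumPy sketch of the pixelwise majority vote described above; the function name and the assumption that the three masks are co-registered binary arrays are ours, not from the paper.

```python
import numpy as np

def majority_vote_mask(schroeder, murphy, kumar_roy):
    """Combine three binary fire masks (H x W arrays with values 0/1)
    by pixelwise majority voting: a pixel is marked as active fire if
    at least two of the three condition sets marked it."""
    votes = (schroeder.astype(np.uint8)
             + murphy.astype(np.uint8)
             + kumar_roy.astype(np.uint8))
    return (votes >= 2).astype(np.uint8)
```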
We performed pre-training on the Landsat-8 images using three networks: besides the U-Net [12], used by [8], we also trained DeepLabV3+ [13] (with a ResNet50 backbone) and SegFormer-B0 [15], allowing us to compare different architectures for transfer learning. Furthermore, differently from [8], who trained the models on Landsat-8 bands 7, 6 and 2, we trained our models on bands 7, 6 and 5, which roughly correspond to Sentinel-2 bands 12, 11 and 8A. Each network was trained for up to 50 epochs, halting the training if the validation loss did not improve for 5 epochs, with a learning rate of 0.001, using the Adam optimizer, applying vertical and horizontal flips for data augmentation, and using 8 images per batch. Note that we did not evaluate the performance of the trained models on Landsat-8 images, since our aim is to fine-tune them for Sentinel-2 images through a transfer learning procedure.
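As a rough illustration of this setup, the sketch below wires the stated hyperparameters (Adam with learning rate 0.001, batch size 8, up to 50 epochs, early stopping with patience 5, and random flips) into a Keras training loop. The stand-in model and the zero-filled placeholder datasets are ours; they only mark where the real architecture (U-Net, DeepLabV3+ or SegFormer-B0) and the Landsat-8 patch pipeline would go.

```python
import tensorflow as tf

def build_model():
    # Stand-in for U-Net / DeepLabV3+ / SegFormer-B0 (binary fire mask output).
    inputs = tf.keras.Input(shape=(256, 256, 3))  # Landsat-8 bands 7, 6, 5
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

def augment(image, mask):
    # Random vertical and horizontal flips, applied jointly to image and mask.
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_left_right(image), tf.image.flip_left_right(mask)
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_up_down(image), tf.image.flip_up_down(mask)
    return image, mask

# Placeholder datasets: replace with the real (patch, fire mask) pipelines.
images, masks = tf.zeros([16, 256, 256, 3]), tf.zeros([16, 256, 256, 1])
train_ds = tf.data.Dataset.from_tensor_slices((images, masks)).map(augment).batch(8)
val_ds = tf.data.Dataset.from_tensor_slices((images, masks)).batch(8)

model = build_model()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy")
model.fit(train_ds, validation_data=val_ds, epochs=50,
          callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                      patience=5)])
```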
Fig. 3. Sentinel-2 fine-tuning: the pre-trained weights from Landsat-8, used as a starting point, are further refined by using similar wavelength bands and reference masks. The numbers indicate the order of events. The reference masks were fully annotated by a remote sensing specialist, with part of this set being used for training and the remaining for testing.
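The fine-tuning stage summarized in Fig. 3 amounts to reloading the pre-trained Landsat-8 weights and continuing training on the small hand-annotated Sentinel-2 set. A minimal sketch follows; the file name, the placeholder datasets, and the reuse of the pre-training hyperparameters are assumptions for illustration, not details from the paper.

```python
import tensorflow as tf

# Assumption: the pre-trained model was saved with model.save() after the
# Landsat-8 stage. Sentinel-2 bands 12, 11 and 8A roughly match the
# Landsat-8 bands 7, 6 and 5 used in pre-training, so the 256 x 256 x 3
# input layout is unchanged and all layer weights can be reused directly.
model = tf.keras.models.load_model("landsat8_pretrained.keras")
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy")

# Placeholder splits: replace with the annotated Sentinel-2 (patch, mask) data.
images, masks = tf.zeros([8, 256, 256, 3]), tf.zeros([8, 256, 256, 1])
s2_train_ds = tf.data.Dataset.from_tensor_slices((images, masks)).batch(8)
s2_val_ds = tf.data.Dataset.from_tensor_slices((images, masks)).batch(8)

model.fit(s2_train_ds, validation_data=s2_val_ds, epochs=50,
          callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                      patience=5)])
```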
TABLE I
Summary of the bands used by the threshold-based active fire segmentation methods. The Unambiguous Fire row presents the bands used to identify active fire; the Potential Fire row presents the bands used to identify pixels not captured as unambiguous fires, generally applied to the neighborhood of these pixels; the False Alarm Control row shows the bands used to discard common sources of confusion, such as water bodies and shadows.

                      Kato-Nakamura    Liu et al.       Murphy et al.
Unambiguous fire      b12, b11, b8A    b12, b11, b8A    b12, b11, b8A
Potential fire        —                b12, b11         b12, b11, b8A
False alarm control   —                b12, b8A         —
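Table I lists only which bands each method inspects; the actual equations and threshold values are given in [2], [4] and [5]. The sketch below shows the generic three-stage structure these methods share, with purely illustrative placeholder thresholds, not the published values.

```python
import numpy as np

# T1, T2 and T3 are placeholder constants for illustration only; each method
# defines its own tests and thresholds over reflectances of b12, b11 and b8A.
T1, T2, T3 = 1.2, 0.8, 0.5

def threshold_fire_mask(b12, b11, b8a):
    eps = 1e-6
    # Stage 1: unambiguous fire pixels, e.g. from strong SWIR responses.
    unambiguous = (b12 / np.maximum(b11, eps) > T1) & (b12 > T2)
    # Stage 2: potential fire pixels, typically tested near unambiguous ones.
    potential = b12 / np.maximum(b8a, eps) > T3
    # Stage 3: false alarm control, discarding water bodies, shadows, etc.
    not_false_alarm = b8a > 0.0  # placeholder test
    return (unambiguous | potential) & not_false_alarm
```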
C. Sentinel-2 dataset

Fig. 4 shows the world locations of the Sentinel-2 image samples used in our tests. Our Sentinel-2 dataset is divided into three subsets: one with patches with the presence of active fire (387 patches), another one without active fire spots (10,577 patches), and the last one with 1,278 image patches, all with the presence of seam-lines — artifacts caused by the composition of multiple captures, a known issue that results in misaligned bands visible in regions with clouds, and that affects active fire recognition [17]. It is important to stress that the images with and without fire pixels also contain samples with seam-lines in a small proportion, which is exactly what led us to identify this problem and create a specific subset with just this type of image for later analysis.

For our experiments, we used 5-fold cross-validation, with 3 folds being used for training, 1 for validation, and 1 for testing (training and test patches do not overlap).
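A simple way to realize this 3/1/1 rotation is sketched below; the paper does not state how the folds are enumerated, so the rotation scheme is an assumption.

```python
def fold_split(fold, n_folds=5):
    """Rotate the folds: 3 for training, 1 for validation, 1 for testing."""
    order = [(fold + i) % n_folds for i in range(n_folds)]
    return order[:3], order[3], order[4]

for k in range(5):
    train_folds, val_fold, test_fold = fold_split(k)
    # e.g. k = 0 -> train on folds [0, 1, 2], validate on 3, test on 4
```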
III. RESULTS AND DISCUSSION

In this section, we report performances according to the well-known metrics precision (P = tp/(tp + fp)) and recall (R = tp/(tp + fn)), where the numbers of true positive (tp), false positive (fp), and false negative (fn) pixels were accumulated over all images, and the metrics were computed for the entire test set. We also report the F1-score (F1 = 2/(1/P + 1/R)) and IoU metrics, which are commonly applied for segmentation problems.
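The sketch below mirrors these definitions: pixel counts are accumulated over the whole test set before the metrics are computed, and the fire-class IoU is derived from the same counts (IoU = tp/(tp + fp + fn), a standard identity not spelled out in the text). It assumes nonzero counts, as in any test set containing fire pixels.

```python
import numpy as np

def accumulate_counts(pred_masks, true_masks):
    """Accumulate tp, fp and fn pixel counts over all test images."""
    tp = fp = fn = 0
    for pred, true in zip(pred_masks, true_masks):
        pred, true = pred.astype(bool), true.astype(bool)
        tp += np.sum(pred & true)
        fp += np.sum(pred & ~true)
        fn += np.sum(~pred & true)
    return tp, fp, fn

def metrics(tp, fp, fn):
    p = tp / (tp + fp)                # precision
    r = tp / (tp + fn)                # recall
    f1 = 2 / (1 / p + 1 / r)          # harmonic mean, as in the text
    iou = tp / (tp + fp + fn)         # IoU for the fire class
    return p, r, f1, iou
```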
TABLE II
Active fire segmentation performances considering three handcrafted threshold-based algorithms for Sentinel-2 images (Murphy et al., Kato-Nakamura and Liu et al.), and three deep architectures (DeepLabV3+, SegFormer-B0 and U-Net), with transfer learning but no fine-tuning (TL, no FT), and with transfer learning and fine-tuning (TL + FT). The results are the mean and standard deviation over 5 folds. Boldface values correspond to the best performance in each column.

Fig. 5. Active fire segmentation examples for the threshold-based methods and for the SegFormer, DeepLab and U-Net deep models, with transfer learning without fine-tuning (TL w/o FT) and with fine-tuning (TL w/ FT).
Among the threshold-based algorithms, the Kato-Nakamura method [4] displayed an under-segmentation behavior (restrictiveness), resulting in the best precision (≈ 97%) but the worst recall (≈ 24%). Fig. 5 shows that this method failed to identify several active fire regions. Conversely, the Liu et al. [17] and Murphy et al. [2] methods have an over-segmentation behavior (as can be seen in Fig. 5), with a high recall rate (above 81%) but a low precision (at or below 59%), and proved especially sensitive to the seam-lines, which we observed tend to be detected as false positives. The Kato-Nakamura method was not affected the same way, as it is already very restrictive.

Models without fine-tuning also show a significant performance drop, as they are often confused by these artifacts, which do not appear in the Landsat-8 images they were trained on. Fine-tuned models were still affected, but on a much smaller scale. A few patches with seam-lines are present in the training set, and these were seemingly enough to allow these models to avoid the problem in most cases.
Fig. 6. Presence of seam-lines in a Sentinel-2 image patch, as indicated by the footprint metadata; false positive errors for Kato-Nakamura (1 pixel), Liu et al. (92 pixels), Murphy et al. (88 pixels); and networks without fine-tuning: DeepLab (51 pixels), SegFormer (94 pixels) and U-Net (83 pixels). After fine-tuning, the three architectures did not report any errors (0 pixels).

Fig. 7. Performance drop in F1-score when removing most patches containing seam-lines from the training set, and adding them to all the test folds. Fine-tuning, even on a very small number of patches containing seam-lines, made the models more robust to their presence.

IV. CONCLUSIONS

The similarities between the Landsat-8 and Sentinel-2 bands allowed transfer learning from the former to the latter using a few labeled samples for three different types of deep architectures (SegFormer, DeepLab and U-Net). Not only that, but the fine-tuned models outperform the thresholding techniques available for Sentinel-2. The proposed models were also able to reduce the number of false positives generated by the segmentation of seam-lines, which are common in Sentinel-2 images. Moreover, since only a few images were used, fine-tuning the models to the new satellite was fast: while the base Landsat-8 networks took up to 2.5 hours per epoch to train on an Nvidia Titan Xp (12 GB), fine-tuning took less than 3 minutes per epoch for DeepLab, 6 minutes for U-Net, and 7 minutes for SegFormer (918 batches per epoch).

Training a network from scratch on Sentinel-2 using masks built by the thresholding algorithms, the same way as done for Landsat-8, would carry over the bias of those masks. This means that the segmentation of seam-lines could occur, since it is a problem for the Liu et al. and Murphy et al. methods. On the other hand, the Kato-Nakamura method is very restrictive, which would lead to many omission errors. In this context, the proposed approach seems to be viable, since it relies on the large source of data provided for Landsat-8 and its consolidated fire segmentation algorithms. In future work, other satellites with similar wavelengths will be studied, along with the use of satellite-specific metadata to enhance results.

REFERENCES

[1] W. Schroeder, P. Oliva, L. Giglio, B. Quayle, E. Lorenz, and F. Morelli, "Active fire detection using Landsat-8/OLI data," Remote Sensing of Environment, vol. 185, pp. 210–220, Nov. 2016.
[2] S. W. Murphy, C. R. de Souza Filho, R. Wright, G. Sabatino, and R. Correa Pabon, "HOTMAP: Global hot target detection at moderate spatial resolution," Remote Sensing of Environment, vol. 177, pp. 78–88, 2016.
[3] S. S. Kumar and D. P. Roy, "Global operational land imager Landsat-8 reflectance-based active fire detection algorithm," International Journal of Digital Earth, vol. 11, no. 2, pp. 154–178, 2018.
[4] S. Kato and R. Nakamura, "Detection of thermal anomaly using Sentinel-2A data," in IEEE International Geoscience and Remote Sensing Symposium, Jul. 2017, pp. 831–833.
[5] Y. Liu, W. Zhi, B. Xu, W. Xu, and W. Wu, "High-temperature anomalies from Sentinel-2 MSI images," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 177, pp. 174–193, 2021.
[6] Y. Kang, T. Sung, and J. Im, "Toward an adaptable deep-learning model for satellite-based wildfire monitoring with consideration of environmental conditions," Remote Sensing of Environment, vol. 298, 2023.
[8] G. H. de A. Pereira, A. M. Fusioka, B. T. Nassu, and R. Minetto, "Active fire detection in Landsat-8 imagery: A large-scale dataset and a deep-learning study," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 178, pp. 171–186, 2021.
[10] Zhang et al., "Towards a deep-learning-based framework of Sentinel-2 imagery for automated active fire detection," Remote Sensing, vol. 13, 2021.
[11] X. Hu, Y. Ban, and A. Nascetti, "Sentinel-2 MSI data for active fire detection in major fire-prone biomes: A multi-criteria approach," International Journal of Applied Earth Observation and Geoinformation, vol. 101, 2021.
[12] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
[13] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," 2017. [Online]. Available: http://arxiv.org/abs/1706.05587
[14] Z. Zhou, F. Zhang, H. Xiao, F. Wang, X. Hong, K. Wu, and J. Zhang, "A novel ground-based cloud image segmentation method by using deep transfer learning," IEEE Geoscience and Remote Sensing Letters, vol. 19, 2022.
[15] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," 2021. [Online]. Available: http://arxiv.org/abs/2105.15203
[16] US Geological Survey, "USGS EROS Archive, Comparison of Sentinel-2 and Landsat," 2023. [Online]. Available: https://www.usgs.gov
[17] Y. Liu, B. Xu, W. Zhi, C. Hu, Y. Dong, S. Jin, Y. Lu, T. Chen, W. Xu, Y. Liu, B. Zhao, and W. Lu, "Space eye on flying aircraft: From Sentinel-2 MSI parallax to hybrid computing," Remote Sensing of Environment, vol. 246, Sep. 2020.