CNN - PolSAR_Image_Classification_Based_on_Deep_Convolutional_Neural_Networks_Using_Wavelet_Transformation

This document presents a deep convolutional neural network (CNN) model that utilizes Haar wavelet transformation for improved classification of polarimetric synthetic aperture radar (PolSAR) imagery. The proposed model outperforms shallower CNNs, achieving an accuracy of 93.3% in classifying PolSAR images, which is significantly higher than the accuracies of existing models. The research highlights the effectiveness of deep learning in extracting hierarchical features and reducing noise in classification maps for applications such as wetland mapping.

Uploaded by

Hsk Tran

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

2 views

CNN - PolSAR_Image_Classification_Based_on_Deep_Convolutional_Neural_Networks_Using_Wavelet_Transformation

Uploaded by

Hsk Tran

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL.

19, 2022 4510105

PolSAR Image Classification Based on Deep

Convolutional Neural Networks Using
Wavelet Transformation
Ali Jamali , Masoud Mahdianpari , Senior Member, IEEE, Fariba Mohammadimanesh , Member, IEEE,
Avik Bhattacharya , Senior Member, IEEE, and Saeid Homayouni , Senior Member, IEEE

Abstract— Shallow convolutional neural networks (CNNs) have hand, deep learning algorithms have been widely used in
successfully been used to classify polarimetric synthetic aperture image processing and analysis as a hierarchical feature learn-
radar (PolSAR) imagery. However, one drawback of the existing ing approach [6]. Unlike conventional methods that rely on
deep CNN-based techniques is that the input PolSAR training
data are often insufficient due to their need for a significant artificial features [7], CNN can automatically extract efficient
number of training data compared to shallow CNN models discriminative features for a given task. Furthermore, deep
utilized in PolSAR image classification. In this letter, we propose learning-based models have the unique ability to encode
using the Haar wavelet transform in deep CNNs for effective fea- spatial feature information hierarchically. PolSAR images are
ture extraction to improve the classification accuracy of PolSAR extensively affected by speckle noises as a result of imaging
imagery. Based on the results, the proposed deep CNN model
obtained better average accuracy in the San Francisco region techniques, making accurate classifications more challenging.
with an accuracy of 93.3% and produced more homogeneous Extracting discriminative features has significantly enhanced
classification maps with less noise compared to the two much classification performance [6]. Furthermore, compelling hand-
shallower CNN models of AlexNet (87.8%) and a 2-D CNN crafted features necessitate prior knowledge and experience,
network (91%). The proposed algorithm is efficient and may both of which are difficult and uncommon to achieve, specif-
be applied over large areas to support regional wetland mapping
and monitoring activities using PolSAR imagery. The codes are ically for complex PolSAR data classification [8]. Convo-
available at (https://github.com/aj1365/DeepCNN_Polsar). lutional neural networks (CNNs) can automatically extract
hierarchical features and perform an end-to-end classification
Index Terms— Convolutional neural network (CNN), deep
learning, Haar transform, land cover mapping, polarimetric than a manual feature extractor. This fact is one of the most
synthetic aperture radar (PolSAR). important reasons for the success of CNN models in PolSAR
image classification. One should note that although going
I. I NTRODUCTION deep with convolutional layers obtains better generalization
capability and a higher level of accuracy, it would require
P OLARIMETRIC synthetic aperture radar (PolSAR), one
of the most widely used data in microwave remote
sensing, has made significant advances in recent years. The
more training data for convergence compared to shallower
CNN models. As a result, shallower CNNs often obtain better
PolSAR image classification task aims to classify images PolSAR classification accuracies than deep CNN models with
into several terrain classes, such as water, vegetation, and the existing PolSAR data benchmarks [4]. This letter assumes
urban area. These are helpful for geological investigation, that CNN networks with less than five convolutional layers
city planning, ocean monitoring, and determining plant growth are connected to shallow networks.
status [1]. Many PolSAR image classification and segmenta- Moreover, the implementation of CNN is additionally lim-
tion techniques have been developed [2]–[5]. On the other ited by the lack of polarimetric prior knowledge. Furthermore,
another drawback of the existing CNN-based technique is
Manuscript received 23 April 2022; revised 15 June 2022; accepted that the input PolSAR training data is still insufficient [6].
16 June 2022. Date of publication 21 June 2022; date of current version
6 July 2022. The work of Masoud Mahdianpari was supported by the Natural On the other hand, wavelet transform has been reported as
Sciences and Engineering Research Council (NSERC) under Grant RGPIN- an effective feature extractor for hyperspectral image (HSI)
2022-04766. (Corresponding author: Masoud Mahdianpari.) classification [9]. It was shown that by combining the wavelet
Ali Jamali is with the Civil Engineering Department, Faculty of
Engineering, University of Karabük, Karabük 78050, Turkey (e-mail: transform with a 2-D CNN model, the spectral and spatial
[email protected]). properties of HSI were successfully explored [9], [10]. We pro-
Masoud Mahdianpari is with the C-CORE, St. John’s, NL A1B 3X5, pose using the Haar transform to extract a valuable feature to
Canada, and also with the Department of Electrical and Computer Engineer-
ing, Memorial University of Newfoundland, St. John’s, NL A1B 3X5, Canada improve the classification accuracy of PolSAR imagery using
(e-mail: [email protected]). deep CNN models. With wavelet transform, deep CNNs can
Fariba Mohammadimanesh is with the Canada Centre for Mapping and be effectively trained with the existing PolSAR benchmark
Earth Observation, Ottawa, ON K1S 5K2, Canada.
Avik Bhattacharya is with the Microwave Remote Sensing Laboratory, datasets, achieving better classification results than shallower
Centre of Studies in Resources Engineering, Indian Institute of Technology CNNs. As such, a three-branch deep CNN network that
Bombay, Mumbai 400076, India. utilizes the capability and advantage of both deep CNNs and
Saeid Homayouni is with the Institut National de la Recherche Scientifique,
Centre Eau Terre Environment, Quebec City, QC G1K 9A9, Canada. wavelet transform is proposed for accurate PolSAR image
Digital Object Identifier 10.1109/LGRS.2022.3185118 classification. The motivation of this letter is to improve
1558-0571 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: University of Ulsan. Downloaded on September 24,2022 at 03:25:04 UTC from IEEE Xplore. Restrictions apply.
4510105 IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 19, 2022

Fig. 1. Architecture of the proposed deep CNN model.

the PolSAR image classification accuracy, where CNNs have concatenate layer, followed by two convolutional and batch
shown to achieve better classification accuracies compared to normalization layers, respectively. Afterward, the outputs of
conventional classifiers in remote sensing. the first and second CNN branches are combined with the
On the other hand, deeper architecture is required to achieve third branch with a concatenate layer. It is followed by a
better classification results from CNN models. However, cur- convolutional, a batch normalization, a 2-D average pooling,
rent training data are insufficient for utilizing deep and very and a flattened layer. Finally, there are two dense and dropout
deep CNN models. Besides, most PolSAR classification algo- layers, followed by a softmax layer (see Fig. 1) in the proposed
rithms result in noisy classification maps. The main contribu- deep CNN model.
tions of this letter are given as follows.
1) Develop and propose a deep CNN network capable of A. PolSAR Input Features
outperforming much shallower CNN networks in terms
To determine the scattering characteristics of ground
of PolSAR image classification accuracies.
objects, PolSAR uses scattering matrices. Each PolSAR image
2) Decrease current PolSAR image classification noises by
pixel can be defined by a 2 × 2 complex scattering coefficients
using and proposing the Haar wavelet transform.
matrix, which is composed of the horizontal and vertical polar-
ization states of transmitted and received signals, expressed as:
II. M ETHODS
SHH SHV
The architecture of the proposed deep CNN network is S= . (1)
SVH SVV
shown in Fig. 1. As shown in Fig. 1, three branches in
the proposed deep CNN model utilize different concepts and Due to the presence of speckle noise, the scattering charac-
advantages of CNNs. Downsampling and feature extraction teristics of PolSAR data have been described by the statistical
were done in the first network using the Haar wavelet trans- coherence matrix T3 . The coherence matrix can be formulated
form. In the second network, we employed a deeper CNN as follows, in the case of reciprocity assumption (SHV = SVH ),
architecture with more parameters to be fine-tuned, while in expressed as:
the third network, the concepts of residual networks were ⎡ ⎤
T11 T12 T13
utilized [11]. It should be noted that we fed 12 PolSAR
T3 = K p K pH = ⎣ T21 T22 T23 ⎦. (2)
image features with image sizes of 12 × 12 (12at12 × 12)
T31 T32 T33
in all three CNN branches. There are three, five, and four
convolutional layers in the first (blue box), second (red box), The Pauli scattering vector is expressed by
T
and third (green box) branches. In the first branch, the first K p = 1/(2)1/2 [ SHH + Svv SHH − Svv 2SHV ] . In this
layer is a Haar wavelet layer [10], and the last layer is a batch research, we used six elements of T3 described by
normalization layer. We utilized a max-pooling layer in the T3 = [ T11 T12 T13 T22 T23 T33 ]. In addition, to improve
second branch after the first convolutional layer. In the third the classification accuracy of PolSAR images, six descriptor
branch of the proposed CNN model, the outputs of the first features were extracted from the coherency matrix T3 as
and fourth convolutional layers are combined by an additional summarized in Table I, in which SPAN = (T11 +T22 +T33 ) [12].
layer, followed by a max-pooling layer. Then, the outputs From the literature, it has been suggested that using
of the first and second CNN branches are combined by a feature descriptors will result in an improved PolSAR

Authorized licensed use limited to: University of Ulsan. Downloaded on September 24,2022 at 03:25:04 UTC from IEEE Xplore. Restrictions apply.
JAMALI et al.: PolSAR IMAGE CLASSIFICATION BASED ON DEEP CNNs USING WAVELET TRANSFORMATION 4510105

TABLE I
P OLARIMETRIC D ESCRIPTOR F EATURES E XTRACTED
F ROM THE C OHERENCY M ATRIX (T3 )

classification accuracy [4]. As such, we selected features

that have been reported to increase PolSAR classification
accuracy.

B. Wavelet Transform
Given PolSAR image patch x, a 2-D discrete wavelet with
four convolutional layers of a low-pass filter f LL and high-
pass filters of fLH , f HL , and f HH were utilized to decompose
x into four subband images of x LL , x LH , x HL , and x HH . In the
case of the Haar wavelet, f LL , fLH , fHL , and f HH are defined
as: Fig. 2. Classification accuracy changes with the parameters on the Flevoland

11 −1 −1 dataset. (a) Square neighborhood size of the local patch. (b) Training sample
f LL = , f LH = rate.
11 1 1

−1 1 1 −1
f HL = , f HH = . (3)
−1 1 −1 1 AlexNet [14] with around 172 million parameters and a 2-D
CNN network with much fewer parameters of approximately
As such, 2-D discrete wavelet transform operation is
0.3 million. The developed 2-D CNN network has only three
expressed as x LL = ( fLL Oper ator [U + 2 A02]x ↓2 ), x LH =
convolutional layers with filter sizes of 16, 32, and 64 with
( f LH Oper ator [U + 2 A02]x ↓2 ), x HL = ( f HL Oper ator [U +
kernel sizes of (7 × 7), (5 × 5), and (3 × 3), respectively.
2 A02]x ↓2 ), and x HH = ( f HH Oper ator [U + 2 A02]x ↓2 ),
We tested the developed model on two PolSAR benchmark
where convolutional operation is defined by ⊗ and ↓2 presents
datasets. The first data were acquired by NASA/JPL AIR-
the standard downsampling with a factor of 2. In other words,
SAR on August 16, 1989, for the Flevoland region in The
four fixed convolution filters with stride two are utilized in
Netherlands. The image is 750 × 1024 pixels in size. Water,
the 2-D discrete wavelet transform to perform the downsam-
barley, peas, stem bean, beet, forest, bare soil, grass, rapeseed,
pling mathematically. Considering a PolSAR image patch x,
lucerne, wheat 1, wheat 2, wheat 3, potato, and building are
x LL (i, j ), x LH (i, j ), x HL (i, j ), and x HH (i, j ) after applying a
the 15 ground-truth labels of this dataset. The second dataset
2-D Haar transform are expressed as [13]:
⎧ is a four-look NASA/JPL AIRSAR L-band data of the San
⎪
⎪ x LL (i, j ) = x(2i −1, 2 j − 1) + x(2i −1, 2 j ) Francisco, collected by NASA/JPL. The image is 900 × 1024
⎪
⎪
⎪ +x(2i , 2 j − 1) + x(2i , 2 j )
⎪ pixels in size. Ground-truth labels of San Francisco include
⎪
⎪
⎪ x LH (i, j ) = −x(2i −1, 2 j − 1) − x(2i −1, 2 j )
⎪ bare soil, mountain, water, building, and vegetation. It should
⎨
+x(2i , 2 j − 1) + x(2i , 2 j ) be noted that PolSAR data were processed in PolSARpro
(4)
⎪ x HL (i, j ) = −x(2i −1, 2 j − 1) + x(2i −1, 2 j )
⎪ V.6.0.3. software where PolSAR features were extracted as
⎪
⎪
⎪ −x(2i , 2 j − 1) + x(2i , 2 j )
⎪ bmp images. MATLAB programming language was used to
⎪
⎪
⎪ x HH (i, j ) = x(2i −1, 2 j − 1) − x(2i −1, 2 j )
⎪ extract the polarimetric descriptors from the T3 matrix and
⎩
−x(2i , 2 j − 1) + x(2i , 2 j ) stack them with the six elements of T3 matrix. The size of the
input feature patch fed into all implemented CNN networks
was 12at12 × 12. The F-1 score, overall accuracy (OA),
III. E XPERIMENTS
average accuracy (AA), and Kappa coefficient (Kappa) as
A. Dataset Description four commonly used evaluation metrics for evaluating PolSAR
The performance of the proposed deep CNN model with classification performance were adopted. The experiments
approximately 6 million parameters is evaluated against two were carried out with an NVIDIA RTX 2070 Max-Q using
models with significantly shallower architectures, including Python 3.7 in TensorFlow frameworks.

Fig. 3. Classification results of San Francisco. (a) Ground-truth map. (b) 2-D CNN. (c) AlexNet. (d) Proposed model.

Fig. 4. Classification results of Flevoland. (a) Ground-truth map. (b) 2-D CNN. (c) AlexNet. (d) Proposed model.

B. Results and Analysis TABLE II

C LASSIFICATION A CCURACY OF S AN F RANCISCO (OA = OVERALL A CCU -
Fig. 2 shows the classification accuracies of the Flevoland RACY, AA = AVERAGE A CCURACY, AND KI = K APPA I NDEX )

dataset by applying the proposed deep learning model for

different square neighborhood sizes and training sample rates.
Based on the results, the training sample ratio significantly
affected the PolSAR classification accuracies with training
sample ratios from 0.5% to 5% (AA improved by about
15%). In comparison, the change was not substantial from a
5% to 10% training sample ratio (AA improved by slightly
more than 2%), as shown in Fig. 2(a). Moreover, when
the square neighborhood size is set to 24, the developed
deep model achieved the best AA (99.93%), as shown in
Fig. 2(b). We obtained the least OA by setting the window
size to 4 (88.42%). It is clear that the spatial information
is insufficient when the size is too small. On the other
hand, the classification becomes slow when the window
size is too large, and the computing complexity significantly
rises. compared to the classification results of AlexNet and 2-
Therefore, we set the training sample ratio as 10%, while D CNN models (see Fig. 3). For instance, compared with
the window size was set to 12. As the number of class labels the results shown in Fig. 3(b) and (c), the most noticeable
in the ground-truth data is much higher for the San Francisco improvement of the proposed model shown in Fig. 3(d)
benchmark dataset than the Flevoland dataset, we only used in the PolSAR classifications is the better homogeneity of
5% of class labels in San Francisco. As shown in Table II, bare soil and mountain regions obtained from the proposed
the proposed model achieved the highest AA (93.28%) for model.
the classification of the San Francisco benchmark dataset To further validate the effectiveness of the proposed method
compared to AlexNet (87.77%) and the developed 2-D CNN in PolSAR image classification, the classification results of
(90.99%). The proposed model obtained F-1 scores of 0.81, Flevoland data are shown in Fig. 4(b)–(d). Because of the
0.91, 0.95, 0.98, and 0.99 for the recognition of vegetation, influence of speckle noise in Fig. 4(b) and (c), it is evident
bare soil, mountain, building, and water classes, respectively, that Fig. 4(b) and (c) is not optimal, whereas Fig. 4(d) had
outperforming both models of AlexNet and 2-D CNN. More- improved the map homogeneity by our model.
over, using the Haar wavelet significantly resulted in con- The proposed deep CNN model outperformed the AlexNet
siderably lower speckle noises in the San Francisco dataset (98.1%) and 2-D CNN (98.69%) networks in terms of OA with

TABLE III to two shallow CNN networks of AlexNet and a 2-D CNN.
C LASSIFICATION A CCURACY OF F LEVOLAND (OA = OVERALL Experiments validate the effectiveness and superiority of the
A CCURACY, AA = AVERAGE A CCURACY,
AND KI = K APPA I NDEX )
proposed deep CNN technique, demonstrating that it can
provide better noise immunity and provide smoother homo-
geneous areas in the PolSAR image classification. Moreover,
based on the results in terms of statistical metrics, the proposed
deep CNN model resulted in better classification accuracies in
San Francisco with an AA of 93.28% compared to AlexNet
(87.77%) and 2-D CNN (90.99%) models. In contrast, all
models showed comparable statistical results in the Flevoland
region.
R EFERENCES
[1] J.-S. Lee, M. R. Grunes, T. L. Ainsworth, L.-J. Du, D. L. Schuler, and
S. R. Cloude, “Unsupervised classification using polarimetric decompo-
sition and the complex Wishart classifier,” IEEE Trans. Geosci. Remote
Sens., vol. 37, no. 5, pp. 2249–2258, Sep. 1999, doi: 10.1109/36.789621.
[2] R. Garg, A. Kumar, N. Bansal, M. Prateek, and S. Kumar, “Seman-
tic segmentation of PolSAR image data using advanced deep learn-
ing model,” Sci. Rep., vol. 11, no. 1, p. 15365, Jul. 2021, doi:
10.1038/s41598-021-94422-y.
[3] J. Fan and J. Wang, “A two-phase fuzzy clustering algorithm based
on neurodynamic optimization with its application for PolSAR image
segmentation,” IEEE Trans. Fuzzy Syst., vol. 26, no. 1, pp. 72–83,
Feb. 2018, doi: 10.1109/TFUZZ.2016.2637373.
[4] H. X. Bi, J. Sun, and Z. B. Xu, “A graph-based semisupervised
deep learning model for PolSAR image classification,” IEEE Trans.
Geosci. Remote Sens., vol. 57, no. 4, pp. 2116–2132, Apr. 2019, doi:
10.1109/TGRS.2018.2871504.
[5] H. Bi, F. Xu, Z. Wei, Y. Xue, and Z. Xu, “An active deep learning
approach for minimally supervised POLSAR image classification,” IEEE
Trans. Geosci. Remote Sens., vol. 57, no. 11, pp. 9378–9395, Nov. 2019,
doi: 10.1109/TGRS.2019.2926434.
[6] Z. Fang, G. Zhang, Q. Dai, and B. Xue, “PolSAR image classification
based on complex-valued convolutional long short-term memory net-
a value of 99.25%, as seen in Table III. Due to Haar wavelet’s work,” IEEE Geosci. Remote Sens. Lett., vol. 19, pp. 1–5, 2022, doi:
10.1109/LGRS.2022.3146928.
feature extraction, the proposed model’s classification results [7] Z. Tirandaz, G. Akbarizadeh, and H. Kaabi, “PolSAR image segmenta-
exhibited higher coherence and smoother homogeneous areas. tion based on feature extraction and data compression using weighted
It can be reasonably concluded that the developed model neighborhood filter bank and hidden Markov random field-expectation
maximization,” Measurement, vol. 153, Mar. 2020, Art. no. 107432, doi:
outperformed both shallow models of AlexNet and 2-D CNN 10.1016/j.measurement.2019.107432.
in terms of visual interpretation in the PolSAR image classifi- [8] Y. Hu, J. C. Fan, and J. Wang, “Classification of PolSAR images
cation. Smoother classification maps with less noise would be based on adaptive nonlocal stacked sparse autoencoder,” IEEE Geosci.
Remote Sens. Lett., vol. 15, no. 7, pp. 1050–1054, Jul. 2018, doi:
necessary for PolSAR image classification due to their inherent 10.1109/LGRS.2018.2829182.
speckle noises. [9] T. V. N. Prabhakar and P. Geetha, “Two-dimensional empirical wavelet
transform based supervised hyperspectral image classification,” ISPRS
J. Photogramm. Remote Sens., vol. 133, pp. 37–45, Nov. 2017, doi:
IV. C ONCLUSION 10.1016/j.isprsjprs.2017.09.003.
[10] T. Chakraborty and U. Trehan, “SpectralNET: Exploring spatial–
Shallow CNNs are currently effectively employed to inter- spectral WaveletCNN for hyperspectral image classification,” 2021,
pret PolSAR data over large areas. However, it has been arXiv:2104.00341.
reported that the current PolSAR training data input is [11] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
insufficient for accurate classification using deep CNN net- (CVPR), Jun. 2016, pp. 770–778.
works. To improve the PolSAR image classification accuracy, [12] S. R. Cloude and E. Pottier, “A review of target decomposition theorems
we developed and proposed a three-branch deep CNN network in radar polarimetry,” IEEE Trans. Geosci. Remote Sens., vol. 34, no. 2,
pp. 498–518, Mar. 1996, doi: 10.1109/36.485127.
that utilizes the Haar wavelet transform as an effective feature [13] S. G. Mallat, “A theory for multiresolution signal decomposition:
extraction technique. It was found that the proposed technique The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell.,
produced much more homogenous and smoother classification vol. 11, no. 7, pp. 674–693, Jul. 1989, doi: 10.1109/34.192463.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification
maps for two PolSAR benchmark datasets of the Flevoland and with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6,
San Francisco with considerably less speckle noises compared pp. 84–90, May 2017, doi: 10.1145/3065386.