This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1

Advancing Radar Nowcasting Through Deep Transfer Learning

Lei Han, Yangyang Zhao, Haonan Chen, Member, IEEE, and V. Chandrasekar, Fellow, IEEE

Manuscript received September 12, 2020; revised November 21, 2020 and December 28, 2020; accepted January 29, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 41875049, and in part by the National Key Research and Development Program of China under Grant 2018YFC1507504-6. (Corresponding author: Haonan Chen.)
Lei Han and Yangyang Zhao are with the College of Information Science and Engineering, Ocean University of China, Qingdao 266100, China.
Haonan Chen and V. Chandrasekar are with the Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, CO 80523 USA (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TGRS.2021.3056470.
Digital Object Identifier 10.1109/TGRS.2021.3056470

Abstract— Deep learning is emerging as a powerful tool in scientific applications, such as radar-based convective storm nowcasting. However, it is still a challenge to extend the application of a well-trained deep learning nowcasting model, because doing so requires transferring the knowledge learned at one location to other locations characterized by different precipitation features. This article designs a transfer learning framework to tackle this problem. A convolutional neural network (CNN)-based nowcasting method is utilized as the benchmark, based on which two transfer learning models are constructed through fine-tuning and maximum mean discrepancy (MMD) minimization. The base CNN model is trained using radar data in the source study domain near Beijing, China, whereas the transferred models are applied to the target domain near Guangzhou, China, with only a small amount of data in the target area. The influence of a varying number of target data samples on the nowcasting performance is quantified. The experimental results demonstrate that the deep transfer learning models can improve the nowcasting skills.

Index Terms— Convective storm nowcasting, deep learning, transfer learning, weather radar.

I. INTRODUCTION

Doppler weather radars have been the most important remote sensing instrument for observing clouds and precipitation, serving as cornerstones of applications ranging from severe weather warnings to long-term climate monitoring. The 3-D radar data are also essential for nowcasting (i.e., short-term forecasting) of convective storms at high spatial–temporal resolutions [1]–[4]. In recent years, deep learning has been emerging as a powerful tool in scientific applications across all areas of geoscience, including atmospheric research [5]–[7]. Deep learning techniques have proved to be effective for radar nowcasting through case studies in the literature. For example, Shi et al. [8] introduced a time series-based deep learning framework called ConvLSTM and demonstrated that ConvLSTM was superior to conventional radar nowcasting approaches. Han et al. [9] developed a deep convolutional neural network (CNN) model for convective storm nowcasting, which could extract predictive information from radar data without making any physical assumptions as the conventional nowcasting technique does.

Nevertheless, almost all the deep learning-based models require massive historical data for training. This leads to a common question: can the knowledge learned from a deep learning model trained for one region be applied to other regions? Due to the high computational cost, it is not operationally practical to collect a tremendous amount of radar data to retrain a nowcasting model for a different region of interest. In some scenarios, such as when a new radar has just been deployed, it is impossible to obtain a long-term data set for retraining the deep learning nowcasting model. On the other hand, if the model is not retrained to incorporate local precipitation characteristics, it may result in significant uncertainties in the radar nowcasting product. For example, a deep learning model trained for the Beijing or Denver area may not perform as well in Guangzhou or New Orleans, due to the location dependence of precipitation characteristics. Therefore, it is often challenging to effectively take advantage of deep learning techniques in operational applications.

This study aims to advance deep learning-based radar nowcasting through knowledge transfer, which is also referred to as transfer learning. The fundamental idea of transfer learning is to extract the knowledge from a previous or source task and apply the extracted knowledge to a new/target task [10]–[14]. A conceptual metaphor is that it will be easier for a child to learn how to recognize peaches if he/she has already learned how to recognize apples and pears. A transfer learning framework is constructed in this article to illustrate how to extract the knowledge stored in a well-trained deep learning model using massive radar data at one location (i.e., Beijing) and then transfer the extracted information to another location (i.e., Guangzhou). In this way, the deep learning-based radar nowcasting model can easily be adapted for applications in other domains.

For demonstration purposes, the CNN-based nowcasting concept in [9] is utilized as a benchmark, which is first trained using extensive radar data collected in the Beijing area. Then, two transfer learning models, respectively, based on fine-tune (FT) and maximum mean discrepancy (MMD) minimization approaches are developed to transfer the learned knowledge to the Guangzhou area, which has different precipitation characteristics compared to Beijing [15], [16]. Two models are presented here mainly to illustrate that the capability of the transfer
0196-2892 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: COLORADO STATE UNIVERSITY. Downloaded on March 06,2021 at 02:57:07 UTC from IEEE Xplore. Restrictions apply.

Fig. 1. Demonstration study domains: (a) map of China, (b) source study domain (red rectangle) in the Beijing area, and (c) target study domain (blue rectangle) in the Guangzhou area. Both domains are about 250 km × 250 km. The yellow lines in (b) and (c) are provincial borders, and the white lines indicate coastlines.

learning framework can be reached via multiple models. In the transferring process, only a small amount of local radar data in Guangzhou is used. Three experiments are conducted to demonstrate the skills of the transferred CNN models, and the influence of a varying number of target data samples (i.e., local radar data in Guangzhou) on the nowcasting performance is quantified to shed light on how much data would be required when transferring a deep learning model.

The rest of this article is organized as follows. Section II briefly introduces the demonstration study domains and data set. Section III details the architecture of the proposed transfer learning models. The nowcasting performance is presented in Section IV, and the main findings of this study are summarized in Section V.

II. STUDY DOMAINS AND DATA SET

Fig. 1 shows the source and target study domains, which represent Beijing and Guangzhou areas, respectively. The two areas are located 2000 km apart and distinguished by midlatitude and subtropical climates. Hence, their precipitation characteristics are notably different, especially during convective rain events [15], [16], which is favorable for demonstration of the applicability of a deep learning model.

The radar reflectivity mosaic data collected in these two areas are utilized in this article. In particular, the radar data from June to August in 2015 in the Beijing area are used to train a base CNN model. Five days (April 26–30, 2019) of data in the Guangzhou area are used to execute the designed transfer learning model, including training and validation, and the data from three other days (April 11, 20, and 22, 2019) are used for testing. All these eight days are characterized by convective features with significant rainfall. This is a typical scenario for transfer learning where there is a large amount of data for the source task, but only a small amount of data for the target/new task. In addition, all the reflectivity data are interpolated onto a Cartesian latitude–longitude grid with a horizontal resolution of 0.01°. There are 20 vertical layers with a vertical resolution of 1 km, and the temporal update rate is 6 min. For training purposes, the reflectivity value at each grid point is scaled to [−1, 1] through min-max normalization.

III. METHODOLOGY

In order to take full advantage of deep learning, which has achieved tremendous success in image classification and computer vision [17], [18], this research first transforms the nowcasting problem into a classification problem. We divide the study domain into many position-fixed small boxes with a size of 0.06° × 0.06°. Then, for each box, the nowcasting problem turns into a typical binary classification problem: will the radar reflectivity be higher than $T_z$ in a box in N minutes? Here, $T_z$ is the threshold on radar reflectivity (i.e., 35 dBZ in this study). N is the nowcasting lead time. For particular operational applications, N can be set to different values by users. In this article, N is set to 30 min for illustration purposes.

Before being used for nowcasting, the classifier needs to be trained using long-term historical radar data in offline mode.
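The scaling to [−1, 1] and the per-box binary label described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reflectivity bounds of 0 and 70 dBZ, the helper names `minmax_scale` and `box_label`, and the use of the box maximum with a "greater than or equal to" test are hypothetical choices, since the article does not specify them at this point.

```python
# Illustrative sketch only: the dBZ bounds and helper names are assumptions.
TZ_DBZ = 35.0               # reflectivity threshold Tz stated in the text
R_MIN, R_MAX = 0.0, 70.0    # hypothetical bounds for min-max scaling


def minmax_scale(value, lo=R_MIN, hi=R_MAX):
    """Map a reflectivity value from [lo, hi] to [-1, 1], clipping outliers."""
    clipped = min(max(value, lo), hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0


def box_label(future_box, threshold=TZ_DBZ):
    """Return 1 if any grid point in the box reaches the threshold, else 0.

    future_box is a nested list [level][row][col] of reflectivity in dBZ,
    observed N minutes after the nowcast is issued.
    """
    peak = max(v for level in future_box for row in level for v in row)
    return 1 if peak >= threshold else 0


if __name__ == "__main__":
    box = [[[12.0, 28.5], [36.2, 19.0]]]   # one level, 2 x 2 grid points
    print(minmax_scale(36.2), box_label(box))
```

Clipping before scaling keeps every input inside [−1, 1] even when a grid point exceeds the assumed bounds; a different choice of bounds would only change the constants.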


HAN et al.: ADVANCING RADAR NOWCASTING THROUGH DEEP TRANSFER LEARNING 3

Fig. 2. Deep transfer learning model for radar nowcasting. (a) Architecture of the base CNN. (b) FT transfer model. (c) MMD transfer model. 40@18×18 in the first layer in (a) means that R and dR data are stacked (each has 20 vertical levels, and the size of each level is 18×18). 200@18×18 in the second layer indicates 200 feature maps.
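The 40@18×18 input named in the caption can be illustrated with a short Python sketch that cuts the 18 × 18 window (one 6 × 6 box plus its eight neighbours) out of the reflectivity field R and the trend field dR and stacks the 20 + 20 levels. The function name `extract_sample`, the channel order (R levels first, then dR levels), and the decision to reject boxes at the domain edge are assumptions made for this example.

```python
# Illustrative sketch only: names, channel order and edge handling are assumptions.
BOX = 6             # grid points per box side (0.06 deg box / 0.01 deg grid)
WINDOW = 3 * BOX    # 18 grid points: the box plus its eight neighbours


def extract_sample(refl, trend, box_row, box_col):
    """Build one 40 x 18 x 18 sample centred on box (box_row, box_col).

    refl and trend are nested lists indexed [level][row][col] with 20 levels
    each. Boxes without a full ring of neighbours raise ValueError.
    """
    r0 = (box_row - 1) * BOX
    c0 = (box_col - 1) * BOX
    n_rows, n_cols = len(refl[0]), len(refl[0][0])
    if r0 < 0 or c0 < 0 or r0 + WINDOW > n_rows or c0 + WINDOW > n_cols:
        raise ValueError("box needs a full ring of eight neighbouring boxes")
    channels = []
    for field in (refl, trend):          # R levels first, then dR levels
        for level in field:
            channels.append([row[c0:c0 + WINDOW] for row in level[r0:r0 + WINDOW]])
    return channels
```

With 20 levels in each field, the returned list has 40 channels of 18 × 18 values, matching the first layer in Fig. 2(a).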

Once a well-trained classifier is obtained, one can use this classifier to make forecasts in real time for a given set of new radar data.

The transfer learning used in this study is formulated as follows. Let $D_S$ and $T_S$ denote the source domain (Beijing) and corresponding learning task in $D_S$; $D_T$ and $T_T$ denote the target domain (Guangzhou) and corresponding learning task in $D_T$, respectively. The source and target domains can be expressed as

$$D_S = \{(x_i^S, y_i^S)\}_{i=1}^{M} \tag{1a}$$

$$D_T = \{(x_j^T, y_j^T)\}_{j=1}^{N} \tag{1b}$$

where M and N are the numbers of labeled samples in $D_S$ and $D_T$, respectively; $x_i^S \in X^S$, $y_i^S \in [0, 1]$, $x_j^T \in X^T$, $y_j^T \in [0, 1]$, and $N \ll M$. $X^S$ and $X^T$ are the feature spaces (i.e., reflectivity images) of source and target domains, respectively. $x_i$ is the data instance and $y_i$ is the corresponding class label. $T_S$ represents that we want to train a nowcasting CNN model in Beijing, while $T_T$ is to learn a model in Guangzhou. The goal of transfer learning is to help improve the learning of the target predictive function $f_T(\cdot)$ in $D_T$ using the knowledge obtained from $D_S$ and $T_S$. In the following, the base CNN nowcasting model and two transfer learning models are detailed.

A. Base CNN

Fig. 2(a) shows the architecture of the base CNN model. It is an end-to-end deep learning nowcasting model without the use of handcrafted feature engineering (i.e., the process of using domain knowledge to manually design features). It has three convolutional layers and three pooling layers followed by two fully connected (FC) layers. The convolution operation is used to extract features from the input data and can preserve the spatial relationship between pixels by learning features from small subsets of input data. A kernel or filter is used to connect each neuron of a convolutional layer to a small region in the input data. Moving a filter over the input image yields a feature map, which will be the input of the following pooling layer. The kernel size is 5 × 5 for the first two convolutional layers, while 3 × 3 is used for the last convolutional layer. The pooling operation is used to reduce the dimensionality of each feature map while keeping critical information. This CNN model has 2.32 million parameters and is well trained using a large amount of radar data in the


Beijing area. A stochastic gradient descent algorithm is used for learning with the learning rate of 0.001 [19]. The readers are also referred to [9] for more details about the base CNN model, which demonstrated the decent performance of this model compared with traditional machine learning methods that rely on handcrafted feature engineering.

During the training period in offline mode, the input to the base model is 3-D radar reflectivity data samples in each box. For training purposes, each data sample needs to be labeled as “0” or “1” according to the radar reflectivity value at the verification time in that box. Once the training is finished, the trained classifier can be used for predictions on new radar data. If the output of the classifier is “1,” it is deemed that there will be a convective storm in 30 min in this box. Otherwise, if the output is “0,” there will be no convective storm within the next 30 min. The details of how to establish the training data set will be described in Section III-D.

B. Transfer Learning Model: CNN-FT

Previous research such as [11] indicates that the first couple of layers in a CNN mainly output general features that are suitable to be transferred, and with the deepening of the network, the features learned by the network become more specialized. Features transition from general to specific mainly in the last FC layers in a CNN model. Therefore, this article first trains a base CNN using a large amount of source-domain data and then copies its first n layers to a target CNN. Only the remaining layers of the target network need to be retrained using a small amount of target-domain data, which is also called FT.

Fig. 2(b) shows the overall architecture of the designed FT model. Since the benchmark CNN model used in this study has three convolutional layers followed by two FC layers and the last FC layer only has two neurons with very few parameters (not enough to fine-tune this layer), the first three convolutional layers in the base CNN trained using Beijing data were frozen. The last two FC layers were randomly initialized and retrained using data from the Guangzhou area. Similar to the base CNN model, the input to this transfer learning model is also 3-D radar reflectivity data samples. The nowcasting output at each box is “0” or “1.” Similar to the base CNN model, the stochastic gradient descent algorithm with the learning rate of 0.001 is utilized in the optimization process. Therein, the cross entropy used as the model loss function is defined as

$$\text{Loss} = \frac{1}{n} \sum_{i=1}^{n} \left[ -y_i \log \hat{y}_i - (1 - y_i) \log(1 - \hat{y}_i) \right] \tag{2}$$

where n is the batch size (256 in this study), $y_i$ is the label of the ith data sample, and $\hat{y}_i$ stands for the model output.

C. Transfer Learning Model: CNN-MMD

MMD is an effective metric for comparing the distributions of two data sets. By minimizing MMD, a domain-invariant representation can be obtained to guide the transfer of knowledge learned from source domain to target domain with minimal loss in accuracy [20], [21]. Considering the structure of the base CNN nowcasting model, this article constructs a two-stream learning model, as shown in Fig. 2(c). The first stream is used to process the source-domain data (Beijing); the second stream is used to process the target-domain data (Guangzhou). Both streams share parameters and are trained jointly. The parameters trained on the Beijing data are used to initialize the two streams.

The feature representations at the last layer of these two streams are utilized to calculate the MMD loss, which is defined as

$$\text{Loss} = L(X, y) + \lambda\, \text{MMD}^2(D_S, D_T) \tag{3}$$

where $L(X, y)$ denotes the classification loss on both source (Beijing) and target (Guangzhou) data X and y is the ground-truth label. The hyperparameter $\lambda$ determines how much to confuse the domains and was set to 0.25 in this study. $\text{MMD}^2(D_S, D_T)$ stands for the distance between the source and target data

$$\text{MMD}^2(D_S, D_T) = \left\| \frac{1}{M} \sum_{i=1}^{M} \phi(X_i^S) - \frac{1}{N} \sum_{j=1}^{N} \phi(X_j^T) \right\|_{H}^{2} \tag{4}$$

where $X_i^S$ and $X_j^T$ represent source and target reflectivity data, respectively; M and N are the numbers of source and target data samples, respectively; $\phi(\cdot)$ is referred to as the feature space map; $D_S$ and $D_T$ represent the source domain and target domain, respectively; and H denotes a reproducing kernel Hilbert space.

D. Establishing the Training Data Set

As the nowcasting problem is transformed into a classification problem and the classification is performed on each box, the training data set was constructed based on boxes. The raw 3-D radar data in a box are considered one training data sample. Since the radar data have 20 vertical layers, one data sample has a window of 6 × 6 × 20 pixels. To establish the training data set, the window is moved across the 3-D radar image sequentially, shifting location one box at a time. Then, a label is assigned for each collected data sample: if there is a radar echo ≥ $T_z$ in N minutes in a box, this box is labeled “1”; otherwise, it is labeled “0.”

Since the precipitation system is rather continuous in space, the neighboring eight boxes are also incorporated to represent spatial variability. Hence, we have nine boxes for one data sample, which is of size 18 × 18 × 20. In the source domain of Beijing area, the total number of data samples is 3 737 332, including both training and validation data sets. In the target domain of Guangzhou area, 300 000 data samples are collected, i.e., about 10% of the source data samples.

In addition, this study uses the temporal trend of radar reflectivity as another input to the deep convolutional network, mainly because the temporal evolution also plays an important role in convective nowcasting [22]. The temporal trend is simply calculated as the point-to-point difference of radar reflectivity in a box

$$dR = \frac{R_t - R_{t-1}}{\Delta t} \tag{5}$$

where R stands for radar reflectivity value and $\Delta t$ stands for the time interval of two adjacent radar volume scans.

TABLE I
SKILL SCORES OF THE THREE EXPERIMENTS
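As a minimal sketch of the temporal trend in (5), the Python function below takes the point-to-point difference of two consecutive reflectivity volumes and divides by the scan interval. The function name `temporal_trend`, the nested-list layout, and the default interval of 6 min (taken from the update rate quoted in Section II) are assumptions for illustration; the article does not state whether the trend is computed before or after normalization.

```python
# Illustrative sketch only: name, data layout and default interval are assumptions.
def temporal_trend(current, previous, dt_minutes=6.0):
    """Return (R_t - R_{t-1}) / dt for nested lists indexed [level][row][col]."""
    return [
        [
            [(c - p) / dt_minutes for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(cur_level, prev_level)
        ]
        for cur_level, prev_level in zip(current, previous)
    ]


if __name__ == "__main__":
    r_now = [[[30.0, 12.0]]]     # one level, one row, two grid points
    r_prev = [[[24.0, 18.0]]]
    print(temporal_trend(r_now, r_prev))
```

A positive value marks a grid point where the echo strengthened between the two scans, and a negative value marks weakening.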
Three experiments, CNN, CNN-FT, and CNN-MMD, are conducted to demonstrate the nowcasting performance of the deep transfer learning models, including the following.
1) CNN: Baseline experiment using the CNN nowcasting model trained using Beijing data.
2) CNN-FT: Using the FT method to transfer the base CNN model to CNN-FT.
3) CNN-MMD: Using the MMD method to transfer the base CNN model to CNN-MMD.

IV. EXPERIMENTS AND RESULTS

A. Evaluating Metrics

This study uses the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI) to quantitatively evaluate the nowcasting performance of the transfer learning models. The evaluation metrics are, respectively, defined as follows:

$$\text{POD} = \frac{S}{S + F} \tag{6a}$$

$$\text{FAR} = \frac{A}{S + A} \tag{6b}$$

$$\text{CSI} = \frac{S}{S + F + A}. \tag{6c}$$

In each box, a success (S) occurs when both the truth and nowcast are “1”; a failure (F) occurs when the truth is “1,” while the nowcast is “0”; and a false alarm (A) occurs when the truth is “0,” while the nowcast is “1.” Generally speaking, a better method should have higher POD, lower FAR, and higher CSI.

This study also uses the receiver operating characteristic (ROC) curve and area under curve (AUC) for evaluation. The ROC curve is a graph showing the performance of a classification model when using varying thresholds, and AUC (the higher the better) measures the 2-D area underneath the ROC curve, which is considered an effective metric for verifying the performance of the binary classification model.

B. Results

First, the base CNN model trained using Beijing data is directly applied to Guangzhou data in order to examine its nowcasting skills. Fig. 3(a) shows the ROC curves of 30-min nowcasts from the base CNN model for both Beijing and Guangzhou areas. It is observed that the AUC value of the base CNN model is only 0.881 for the Guangzhou area, compared to 0.925 for the Beijing area, indicating that the model trained using Beijing data is not well suited for Guangzhou due to different precipitation characteristics in these two regions.

Then, the designed CNN-FT and CNN-MMD models are utilized to retrain the revised model using a small amount of Guangzhou data in order to incorporate local precipitation features in Guangzhou. The ROC curves of CNN-FT, CNN-MMD, as well as the base CNN models are shown in Fig. 3(b). It is observed that after the transfer learning, CNN-MMD increases the AUC value from 0.881 to 0.913, whereas CNN-FT increases AUC from 0.881 to 0.933. Both transfer learning methods have improved the nowcasting performance compared to the base CNN model trained using Beijing data, and CNN-FT performs the best. Table I shows the overall POD, FAR, and CSI values of the CNN, CNN-MMD, and CNN-FT models. Obviously, when the original CNN model trained using the Beijing data is applied to Guangzhou, the FAR value is high, which leads to the degradation of the CSI value. Both CNN-MMD and CNN-FT models can reduce the FAR value (from 0.521 to 0.363 and 0.349, respectively), resulting in a significant increase of the CSI value (from 0.421 to 0.513 and 0.534, respectively). This indicates that the transfer learning models can incorporate both precipitation features learned from the original CNN model (based on Beijing data) and the added information from the incremental Guangzhou data.

In addition, the computational cost of the CNN-MMD and CNN-FT models is very low. Compared with the base CNN model that takes several days to finish the training, it takes less than 1 h to train the transferred models. This makes it possible to adjust model parameters in real time.

To further address how much data one may need to successfully transfer a deep learning-based nowcasting model, this article uses the CNN-FT model to quantify the influence of a varying number of target data samples on the nowcasting performance. Different amounts of data samples from the Guangzhou data set are randomly selected to form the subdata sets, and in total, nine subdata sets are constructed. The corresponding target data sample sizes are 5000, 10 000, 20 000, 30 000, 40 000, 50 000, 100 000, 200 000, and 300 000. Then, the subdata sets are used to retrain the transferred CNN model (i.e., CNN-FT) with the same hyperparameters. Fig. 3(c) shows the curve of CSI values as a function of training data samples for transfer learning. It can be seen that when we used 200 000 training samples, the CSI value has reached its maximum. If we continue to increase the training data samples, CSI will be stable. It is generally assumed that the larger the training data set is, the better performance a deep learning model can achieve. However, in the FT model, only the parameters on the last two layers are adjusted, whereas other layers are frozen. In other words, the transferred model is not greedy for training data, and more training data will not guarantee an improved performance of the transferred model. It is also worth noting that when the number of training samples is 10 000, the transferred model has a CSI value of about 0.52, which is already promising compared to the results in the literature [9], [23].

Fig. 4 shows the example nowcasting products in the Guangzhou area at the lead time of 30 min, before and


Fig. 3. Evaluation results of the nowcasting performance. (a) ROC curves of the base CNN nowcasting model applied to Beijing and Guangzhou. The base
CNN model was trained using Beijing data, and the nowcasting performance is degraded when applied to Guangzhou. (b) ROC curves of the base CNN
model and two transfer models applied to Guangzhou. Both transfer models improved the nowcasting skills, while CNN-FT showed the best performance.
(c) CSI values as a function of training data samples for transfer learning.
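The skill scores in (6a)–(6c) and the AUC summarised in Fig. 3 can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used in the study: the function names are hypothetical, and the AUC is computed with the pairwise-ranking identity (the fraction of positive/negative pairs in which the positive box receives the higher score, ties counting one half), which equals the area under the ROC curve but costs quadratic time.

```python
# Illustrative sketch only: function names and the pairwise AUC are assumptions.
def contingency(truth, nowcast):
    """Count successes S, failures F and false alarms A over all boxes."""
    pairs = list(zip(truth, nowcast))
    s = sum(1 for t, p in pairs if t == 1 and p == 1)
    f = sum(1 for t, p in pairs if t == 1 and p == 0)
    a = sum(1 for t, p in pairs if t == 0 and p == 1)
    return s, f, a


def skill_scores(truth, nowcast):
    """Return (POD, FAR, CSI); a score is NaN when its denominator is zero."""
    s, f, a = contingency(truth, nowcast)
    nan = float("nan")
    pod = s / (s + f) if s + f else nan
    far = a / (s + a) if s + a else nan
    csi = s / (s + f + a) if s + f + a else nan
    return pod, far, csi


def auc(truth, scores):
    """Area under the ROC curve via pairwise ranking of classifier scores."""
    pos = [sc for t, sc in zip(truth, scores) if t == 1]
    neg = [sc for t, sc in zip(truth, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative box")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For data sets of the size used here, a sort-based AUC would be preferable; the pairwise form is kept only because it makes the definition easy to check by hand.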

after applying transfer learning. In particular, Fig. 4(a) shows the nowcasting results on April 11, 2019 using the base CNN model trained on Beijing data, which have many false alarms indicated by the white arrows. Fig. 4(b) shows the corresponding results based on the transferred CNN-FT model. Obviously, the false alarms are significantly reduced in Fig. 4(b) compared with Fig. 4(a). The yellow arrows in Fig. 4(b) signify that the CNN-FT model can predict a convective storm better than the original CNN model. Similarly, Fig. 4(c) and (d) show the nowcasting results for the convective event on April 20, 2019, which again demonstrate that false alarms can be mitigated to a large extent (indicated by white arrows) after applying the transfer learning model.

C. Discussion

Utilizing the ability of knowledge transfer from a source task to a target task, the proposed transfer learning methods successfully reuse the precipitation information learned from one region in another region. The encouraging results indicate that when transferring a trained deep learning model to different regions, it is feasible to only use a small amount of local radar data to retrain the model. In order to further illustrate the differences of convective precipitation characteristics in these two areas, the probability density functions (PDFs) of radar reflectivity in Beijing and Guangzhou data sets are calculated and shown in Fig. 5. Obviously, the reflectivity distributions in Guangzhou and Beijing are quite different, especially in the convective storm regions where the reflectivity is higher than 35 dBZ. Precipitation echoes in Guangzhou are generally stronger than those in Beijing, and this difference was learned by the proposed models to improve the nowcasting performance in the target area. Nevertheless, it should be noted that this article has not fully addressed the issue “how big the difference in two PDFs will yield negative transfer?” In fact, there is still no universal solution to such a problem in transfer learning, although various distance measurements have been developed, such as MMD, Jensen–Shannon, and Kullback–Leibler divergences. It is important to devote more effort to this problem in future studies.

In addition, the designed transfer learning models are primarily based on CNN. Transfer learning techniques based on time series deep learning methods such as ConvLSTM have


Fig. 4. Example nowcasting products (superposed over the ground truth, i.e., radar reflectivity) at a lead time of 30 min in the Guangzhou area at (a) and (b) 0618 UTC, April 11 and (c) and (d) 0124 UTC, April 20, 2019. (a) and (c) are derived from the base CNN model, whereas (b) and (d) are based on the transferred CNN-FT model. The red boxes represent the 30-min nowcasts, i.e., there will be radar echoes higher than 35 dBZ in these boxes. The white lines indicate provincial borders. The black curves represent the contours of 35-dBZ reflectivity. The white arrows signify where false alarms are mitigated by CNN-FT, and the yellow arrows signify where CNN-FT improves the forecast.

not yet been thoroughly investigated. Although it would be more difficult to transfer the time series information, the accuracy of a time series deep learning model could potentially be improved if the model is fine-tuned from a pretrained neural network instead of training it from scratch. Furthermore, this article mainly focuses on the nowcasting of severe convective precipitation, which has gained more attention and is harder to predict compared to light–moderate stratiform rain. Although the designed framework is flexible and can also be applied to the same region but different seasons, it is suggested that the threshold $T_z$ should be changed when applying this model to different seasons, especially for winter precipitation. Generic applications to cover a complete picture of local precipitation characteristics should be investigated in a future study. Moreover, this study only uses radar reflectivity data. A possible extension of the designed models is to incorporate more data sources such as polarimetric radar measurements, which can provide richer information about the precipitation microphysics for improved nowcasting.

Fig. 5. PDFs of radar reflectivity in Beijing and Guangzhou data sets.

V. CONCLUSION

Short-term prediction (nowcasting) of extreme weather events such as heavy rain can lead to significant improvement


in severe weather warnings and emergency management decision-making. Nowcasting using weather radar reflectivity data has been shown to be particularly useful. However, conventional nowcasting approaches rely on extrapolation and/or the availability of multisource data, and their performance is often hindered by underlying physical assumptions that may not be sufficient to represent the varying atmospheric state. Deep learning is expected to enhance radar nowcasting by extracting the complex spatial–temporal features of precipitation and associated microphysics, and it has proved effective for regional applications.

Nevertheless, a deep learning model trained using radar data in one region may not be suitable for other regions with different environmental and precipitation characteristics. It is not practically feasible to collect long-term radar data to retrain a model for each region. This research proposes a transfer learning concept, which reuses the knowledge learned from one precipitation region in different regions. A CNN-based nowcasting model trained using radar data in the Beijing area is used as a benchmark, based on which two transfer learning nowcasting models are developed, namely, CNN-FT and CNN-MMD. The CNN-FT model uses the fine-tuning (FT) technique, whereas the CNN-MMD model uses the maximum mean discrepancy (MMD) metric to transfer the CNN model. Demonstration results show that CNN-FT and CNN-MMD improve the nowcasting skill after transferring the knowledge learned from Beijing to the Guangzhou area, which is about 2000 km away from Beijing. The AUC value is improved from 0.881 (CNN) to 0.913 (CNN-MMD) and 0.933 (CNN-FT). The CSI value is improved from 0.421 (CNN) to 0.513 (CNN-MMD) and 0.534 (CNN-FT).
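
As an illustration of the MMD metric mentioned above, the following is a minimal sketch of a biased estimate of the squared MMD between two sets of feature vectors. The Gaussian kernel, the bandwidth `sigma`, the function names, and the random arrays standing in for CNN features are all assumptions made for this example; this section does not state which kernel, estimator, or network layer the CNN-MMD model uses.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distance between every pair of rows.
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two feature samples."""
    k_ss = gaussian_kernel(source, source, sigma)
    k_tt = gaussian_kernel(target, target, sigma)
    k_st = gaussian_kernel(source, target, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


rng = np.random.default_rng(0)
# Hypothetical stand-ins for features extracted from the two regions.
source_feats = rng.normal(0.0, 1.0, size=(200, 8))
target_same = rng.normal(0.0, 1.0, size=(200, 8))     # same distribution
target_shifted = rng.normal(1.0, 1.0, size=(200, 8))  # shifted distribution

mmd_same = mmd2(source_feats, target_same)
mmd_shifted = mmd2(source_feats, target_shifted)
```

In this toy setting the estimate is larger for the shifted sample than for the sample drawn from the same distribution, which is the property that makes MMD usable as a penalty when aligning source and target features.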

In addition, nine sub data sets were constructed to explore the influence of a varying number of target data samples on the nowcasting performance, based on the CNN-FT model. The experimental results show that the performance of the transferred model becomes stable when the number of training samples is increased to 200 000, while a fairly good transferred model is already available with 10 000 training samples. This sheds light on how much data would be required when retraining a deep learning model for applications in different climate regimes. Also, although the current study concentrates on radar data, the proposed transfer learning framework has the potential to be applied to many other hydrometeorological data sets. However, it should be noted that this research was carried out in offline mode. Future work should focus on operational implementation of the designed transfer learning nowcasting models, especially on how to incorporate real-time observations into the learning schemes.
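
One way to set up such a sample-size experiment is sketched below. It draws nested index subsets from a single shuffle of the target data so that each larger subset contains the smaller ones. The nesting, the function name `nested_subsets`, the total of 250 000 samples, and the list of sizes are assumptions for illustration; the text quotes only the 10 000 and 200 000 sample points and does not say how the nine sub data sets were drawn.

```python
import numpy as np


def nested_subsets(n_total, sizes, seed=0):
    """Return {size: index array}; each smaller subset is contained in every larger one."""
    if max(sizes) > n_total:
        raise ValueError("a requested subset is larger than the data set")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_total)  # one shuffle shared by all subsets
    return {size: order[:size] for size in sorted(sizes)}


# Hypothetical sizes; only 10 000 and 200 000 are quoted in the text.
sizes = [1_000, 10_000, 50_000, 200_000]
subsets = nested_subsets(n_total=250_000, sizes=sizes)
```

Sharing one shuffle across sizes keeps the comparison between subset sizes from being confounded by which samples happen to be drawn.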

ACKNOWLEDGMENT

The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla GPU used for this research. The radar data used in this study are provided by the China Meteorological Administration (CMA) and can be acquired from http://data.cma.cn/en/. The intermediate products, such as the precipitation features, in the training of the transfer learning models are available upon request.


Lei Han received the B.Sc. degree in engineering mechanics and the M.Sc. degree in automatic control from Harbin Engineering University, Harbin, China, in 1998 and 2001, respectively, and the Ph.D. degree in atmospheric remote sensing from the Beijing University of Technology, Beijing, China, in 2008.
From 2001 to 2004, he was a Software Engineer with Lucent Research and Development, Qingdao, China. From 2013 to 2014, he was a Visiting Scholar with the Earth Observation Laboratory (EOL), National Center for Atmospheric Research (NCAR), Boulder, CO, USA. He is an Associate Professor with the College of Information Science and Engineering, Ocean University of China, Qingdao. His research interests include radar and satellite meteorology and machine learning.

Yangyang Zhao received the B.E. degree in electronics and information engineering from Ningxia University, Yinchuan, China, in 2019. He is pursuing the M.S. degree with the College of Information Science and Engineering, Ocean University of China, Qingdao, China.
His research interests include deep learning and transfer learning.

Haonan Chen (Member, IEEE) received the bachelor's degree in electrical engineering from the Chongqing University of Posts and Telecommunications, Chongqing, China, in 2010, and the M.S. and Ph.D. degrees in electrical engineering from Colorado State University (CSU), Fort Collins, CO, USA, in 2013 and 2017, respectively.
He worked with the NOAA Physical Sciences Laboratory, Boulder, CO, USA, from 2012 to 2020, first as a Research Student and then as a National Research Council Research Associate and a Radar, Satellite, and Precipitation Scientist. He is an Assistant Professor with CSU. His research interests span a broad range of remote sensing and multidisciplinary data science, including radar and satellite observations of natural disasters, polarimetric radar systems and networking, big data analytics, multiscale hydrometeorological data fusion, and precipitation classification, estimation, and prediction using deep learning techniques.
Dr. Chen serves as an Associate Editor for the Journal of Atmospheric and Oceanic Technology and URSI Radio Science Bulletin and a Guest Editor for Remote Sensing and IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.

V. Chandrasekar (Fellow, IEEE) received the bachelor's degree from IIT Kharagpur, Kharagpur, India, in 1981, and the Ph.D. degree from Colorado State University (CSU), Fort Collins, CO, USA, in 1986.
He has been a Visiting Professor with the National Research Council of Italy, Rome, Italy; the University of Helsinki, Helsinki, Finland; the Finnish Meteorological Institute, Helsinki; Tsinghua University, Beijing, China; and IIT Kharagpur; an Affiliate Scientist with the NASA Jet Propulsion Laboratory, Pasadena, CA, USA; a Distinguished Visiting Scientist at the NASA Goddard Space Flight Center, Greenbelt, MD, USA; and a Distinguished Professor of Finland (FiDiPro). He has been the Director of the Research Experiences for Undergraduate Program for over 25 years, promoting research in the undergraduate curriculum. He is a University Distinguished Professor with CSU. He is also the Associate Dean of the College of Engineering for promoting international research collaboration. He has been actively involved with the research and development of weather radar systems for over 35 years. He has played a key role in developing the CSU-CHILL National Radar Facility as one of the most advanced meteorological radar systems available for research and continues to work actively with the CSU-CHILL radar, supporting its research and education mission. He is also the Research Director of the National Science Foundation Engineering Research Center for Collaborative Adaptive Sensing of the Atmosphere. He is an avid experimentalist conducting special experiments to collect in situ observations to verify the new techniques and technologies. He is an author of two textbooks, five general books, and over 280 peer-reviewed journal articles. He has served as an Academic Advisor for over 70 graduate students.
Dr. Chandrasekar is a fellow of the American Meteorological Society, URSI, the National Oceanic and Atmospheric Administration (NOAA) Cooperative Institute for Research in the Atmosphere (CIRA), and the National Academy of Inventors. He has served as a member of the National Academy of Sciences Committee that wrote the books Weather Radar Technology Beyond NEXRAD and Flash Flood Forecasting in Complex Terrain. He was a recipient of numerous awards, including being knighted by the Government of Finland, the NASA Technical Contribution Award, the NASA Group Achievement Award, the NASA Robert H. Goddard Exceptional Achievement Award, the Outstanding Advisor Award, the CSU Innovations Award, the IEEE GRSS Education Award, the NOAA/NWS Directors Medal of Excellence, and the IEEE GRSS Distinguished Achievement Award. He has served as the General Chair for the IEEE IGARSS'06 Symposium. He serves as the Chair for the Commission F, International Union of Radio Science (URSI). He also served as the Chief Editor for the Journal of Atmospheric and Oceanic Technology and a Guest Editor for IEEE Transactions on Geoscience and Remote Sensing and IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
