Prediction of Sea Surface Temperature Using Long Short-Term Memory
Abstract— This letter adopts long short-term memory (LSTM) to predict sea surface temperature (SST), and makes short-term prediction, including one day and three days, and long-term prediction, including weekly mean and monthly mean. The SST prediction problem is formulated as a time series regression problem. The proposed network architecture is composed of two kinds of layers: an LSTM layer and a full-connected dense layer. The LSTM layer is utilized to model the time series relationship, and the full-connected layer is utilized to map the output of the LSTM layer to a final prediction. The optimal setting of this architecture is explored by experiments, and the prediction accuracy on the coastal seas of China is reported to confirm the effectiveness of the proposed method. The prediction accuracy is also tested on the SST anomaly data. In addition, the model's online update characteristics are presented.

Index Terms— Long short-term memory (LSTM), prediction, recurrent neural network (RNN), sea surface temperature (SST), SST anomaly.
I. INTRODUCTION

Sea surface temperature (SST) is an important parameter in the energy balance system of the earth's surface, and it is also a critical indicator for measuring the heat of sea water. It plays an important role in the interaction between the earth's surface and the atmosphere. The sea occupies three quarters of the global surface area; therefore, SST has an inestimable influence on the global climate and on biological systems. The prediction of SST is also important and fundamental in many application domains, such as ocean weather and climate forecasting, offshore activities like fishing and mining, ocean environment protection, ocean military affairs, and so on. Predicting the temporal and spatial distribution of SST accurately is therefore significant for both scientific research and applications. However, the accuracy of SST prediction remains low due to many uncertain factors. This problem is especially obvious in coastal seas.

Many methods have been published to predict SST. These methods can be generally classified into two categories [1]. One is based on physics and is also known as the numerical model. The other is based on data and is also called the data-driven model. The former tries to utilize a series of differential equations to describe the variation of SST, which is usually sophisticated and demands considerable computational effort and time; in addition, the numerical model differs across sea areas. The latter, instead, tries to learn the model from data. Several learning methods have been used for this purpose, such as linear regression [2], support vector machines [3], neural networks [1], and so on.

This letter takes the data-driven approach and uses long short-term memory (LSTM) to model the time series of SST data. Long short-term memory is a special kind of recurrent neural network (RNN), a class of artificial neural networks in which connections between units form a directed cycle. This creates an internal state that allows the network to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs [4]. However, the vanilla RNN suffers from the vanishing or exploding gradient problem, cannot handle long-term dependences, and is very difficult to train. LSTM instead introduces a gate mechanism to prevent back-propagated errors from vanishing or exploding, which has been proved to be more effective than conventional RNNs [5].
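Concretely, the gate mechanism of a standard LSTM cell [6] can be written as follows (one common formulation; the letter does not spell out its exact variant). Here σ is the logistic sigmoid, ⊙ denotes elementwise multiplication, x_t is the input, h_t the hidden state, and c_t the cell state:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &\quad&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Because the cell state is updated additively and the gates control what is written, kept, and read, back-propagated errors need not shrink or blow up at every step as they do in a vanilla RNN.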
In this letter, an LSTM-based method is proposed to predict SST. There are two main contributions. First, an LSTM-based network is properly designed with a full-connected layer to form a regression model for SST prediction: the LSTM layer is utilized to model the temporal relationship among the SST time series data, and the full-connected layer is applied to map the output of the LSTM layer to the final prediction result. Second, SST change is relatively stable in the open ocean, while it fluctuates more strongly in coastal seas. Therefore, the SST values of the Bohai coastal seas are adopted in the experiments, and prediction results that surpass those of the existing methods are reported to confirm the effectiveness of the proposed method.

The remainder of this letter is organized as follows. Section II gives the problem formulation and describes the proposed method in detail. Experimental results on the Bohai SST data set, which is chosen from the NOAA OI SST V2 High-Resolution data set, are reported in Section III. Finally, Section IV concludes this letter.

II. METHODOLOGY

A. Problem Formulation

Usually, the sea surface can be divided into grids according to the latitude and longitude. Each grid will have a value at each time step, so the SST values of one grid form a time series.
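As an illustration of the resulting regression problem, the following minimal sketch (our own illustration; function and variable names are hypothetical, NumPy assumed) turns one grid's series into supervised pairs that map the previous k values to the next l values:

```python
import numpy as np

def make_samples(sst, k, l):
    """Slide a window over one grid's SST series:
    k previous values as input, the next l values as target."""
    X, Y = [], []
    for t in range(len(sst) - k - l + 1):
        X.append(sst[t:t + k])          # k previous time steps
        Y.append(sst[t + k:t + k + l])  # next l time steps to predict
    # LSTM layers expect input shaped (samples, time steps, features)
    return np.asarray(X)[..., np.newaxis], np.asarray(Y)

# Example: 15 previous days to predict the next 3 days (a Section III setting)
# X, Y = make_samples(daily_sst_series, k=15, l=3)
```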
TABLE IV: Prediction Results (Area Average RMSE) on the Bohai SST Data Set
TABLE II: Prediction Results (RMSE) on Five Locations With Different lr Values
The smaller the RMSE is, the better the performance is. Here, RMSE can be regarded as an absolute error. For area prediction, the area average RMSE is used.
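For completeness, with true values y_i and predictions ŷ_i over n test samples, RMSE is the usual

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}}
```

and, on our reading of the text, the area average RMSE is this quantity computed per grid and then averaged over all grids of the area.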
C. Determination of Parameters

We randomly choose five locations in the Bohai daily mean SST data set, denoted as p1, p2, ..., p5, and predict three days' SST values from a half-month (15 days) length of previous sequence. First, we fix lr and lfc as 1 and units_fc as 3, and choose a proper value for units_r from {1, 2, 3, 4, 5, 6}. Table I shows the results at the five locations with different values of units_r. The boldface items in the table represent the best performance, i.e., the smallest RMSE. In this experiment, the best performance occurs when units_r = 5 at four locations (p1, p2, p3, and p5), while at p4 the best performance occurs when units_r = 6. We can see that the difference in RMSE is not significant. So, in the following experiments, we set units_r as 5.

Then, we also use the SST sequences from the same five locations to choose a proper value for lr from {1, 2, 3}. The other two parameters are set as units_r = 5 and lfc = 1. Table II shows the results at the five locations with different values of lr. The boldface items in the table represent the best performance. It can be seen from the results that the best performance occurs when lr = 1. The reason may be the growing number of weights with increasing recurrent LSTM layers: in this case, the training data are not sufficient to learn so many weights. Indeed, experience from previous studies shows that more recurrent LSTM layers are not always better, and during the experiments we found that the more LSTM layers there are, the more likely the results are to be unstable and the more training time is needed. Therefore, in the following experiments, we set lr as 1.

Finally, we still use the SST sequences from the same five locations to choose a proper value for lfc from {1, 2}. Table III shows the results with different values of lfc. The numbers in square brackets stand for the number of hidden units. The boldface items in the table represent the best performance. It can be seen from the results that the best performance is achieved when lfc = 1. The reason may be the same: more layers mean more weights to be trained and more computation. Therefore, in the following experiments, we set lfc as 1, and the number of its hidden units is set to the same value as the prediction length.

To summarize, the numbers of LSTM layers and full-connected layers are both set to 1. The number of neurons in the full-connected layer is set the same as the prediction length l, and the number of hidden units in the LSTM layer is chosen in an empirical range [l/2, 2l]. More hidden units require more computational time; thus, the number needs to be balanced in the application.
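Under this summarized setting (one LSTM layer with units_r hidden units, one full-connected layer with l neurons), the network could be assembled with Keras [15] roughly as follows. This is a sketch under our assumptions (including the Adagrad optimizer, which the letter cites [11], and MSE loss), not the authors' released code:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_model(k, l, units_r):
    """One LSTM layer followed by one full-connected (Dense) layer."""
    model = Sequential()
    # The LSTM layer models the temporal relationship of the k-step input
    model.add(LSTM(units_r, input_shape=(k, 1)))
    # The full-connected layer maps the LSTM output to the l-step prediction
    model.add(Dense(l))
    # Optimizer and loss are our assumptions; Adagrad [11] seems plausible here
    model.compile(optimizer='adagrad', loss='mse')
    return model

# Example: three-day prediction from a 15-day window with 5 hidden units
# model = build_model(k=15, l=3, units_r=5)
# model.fit(X, Y, epochs=50, batch_size=32)   # training schedule not specified
```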
D. Results and Discussion

We use the Bohai SST data set for this experiment and compare the proposed method with two classical regression methods, SVR and MLPR. Specifically, the Bohai SST daily mean data set and the daily anomaly data set are used for one-day and three-day short-term prediction, and the Bohai SST weekly mean and monthly mean data sets are used for one-week and one-month long-term prediction. The settings are as follows. For the short-term prediction of the LSTM network, we set k = 10, 15, 30, 120 for l = 1, 3, 7, 30, respectively, and lr = 1, units_r = 6, lfc = 1. For the long-term prediction of the LSTM network, we set k = 10, units_r = 3, lr = 1, and lfc = 1. For SVR, we use the RBF kernel and set the kernel width σ = 1.6, which is chosen by fivefold cross validation on the validation set. For MLPR, we use a three-layer perceptron network, which includes one hidden layer; the number of hidden units is the same as in the LSTM network setting for a fair comparison.

Table IV shows the results of daily short-term prediction and of weekly and monthly long-term prediction. The boldface items in the table represent the best performance, i.e., the smallest area average RMSE. We also test the prediction performance with respect to the SST daily anomalies, shown in Table V. It can be seen from the results that the LSTM network achieves the best prediction performance. In addition, Fig. 5 shows the prediction results at one location using the different methods. In order to see the results clearly, we only show the prediction results for one year, from January 1, 2013 to December 31, 2013, which is the first year of the test set.
In Fig. 5, the green solid line represents the true values; the red dotted line represents the prediction results of the LSTM network; the blue dashed line represents the prediction results of SVR with the RBF kernel; and the cyan dashed-dotted line represents the prediction results of the MLPR.
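For the baseline settings above, a hedged scikit-learn [16] sketch would look like the following; the mapping from the stated kernel width σ = 1.6 to scikit-learn's gamma parameter, the multioutput wrapper, and the hidden size of 6 are our assumptions:

```python
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.multioutput import MultiOutputRegressor

# RBF kernel exp(-||x - x'||^2 / (2 * sigma^2)), i.e., gamma = 1 / (2 * sigma^2)
sigma = 1.6
svr = MultiOutputRegressor(SVR(kernel='rbf', gamma=1.0 / (2.0 * sigma ** 2)))

# Three-layer perceptron: one hidden layer, size matching the LSTM setting
mlpr = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000)

# Both baselines take the flattened k-step window as input
# X_flat = X.reshape(len(X), -1)
# svr.fit(X_flat, Y); mlpr.fit(X_flat, Y)
```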
E. Online Model Update

In this experiment, we want to show the online characteristics of the proposed method. We have SST values for 328 days of 2016. We refer to the above-trained model as the original model, and use it to predict the SST values of 2016. Based on the original model, we continue to train the model by adding three years' SST observations (2013, 2014, and 2015), and obtain a new model called the updated model. Table VI shows the results of SST prediction for 2016 using these two different models. As expected, the updated model performs better.

This shows a kind of online characteristic of the proposed method: performing prediction, collecting true observations, feeding the true observations back into the model to update it, and so on. However, other regression models, like SVR, do not have such a characteristic: when new observations are collected, the model can only be retrained from scratch, which wastes additional computing resources.
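In implementation terms, this online update is simply continued training of the existing network on the newly collected observations, e.g., in Keras (a sketch reusing the hypothetical make_samples helper above; the update schedule is our assumption):

```python
# Original model: already trained on earlier data (see Section II)
X_new, Y_new = make_samples(sst_2013_to_2015, k=15, l=3)

# Continue training from the current weights instead of rebuilding the model
model.fit(X_new, Y_new, epochs=10, batch_size=32)

# The updated model then predicts the 328 observed days of 2016
X_test, Y_test = make_samples(sst_2016, k=15, l=3)
pred = model.predict(X_test)
```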
IV. CONCLUSION

In this letter, the prediction of SST is formulated as a time series regression problem, and an LSTM-based network is proposed to model the temporal relationship of SST and predict future values. The proposed network utilizes the LSTM layer to model the time series data and a full-connected layer to map the output of the LSTM layer to the final prediction. The optimal setting of this architecture is explored through experiments, and the prediction accuracy on the coastal seas of China is reported to confirm the effectiveness of the proposed method.

REFERENCES

[1] K. Patil, M. C. Deo, and M. Ravichandran, “Prediction of sea surface temperature by combining numerical and neural techniques,” J. Atmos. Ocean. Technol., vol. 33, no. 8, pp. 1715–1726, 2016.
[2] J.-S. Kug, I.-S. Kang, J.-Y. Lee, and J.-G. Jhun, “A statistical approach to Indian Ocean sea surface temperature prediction using a dynamical ENSO prediction,” Geophys. Res. Lett., vol. 31, no. 9, pp. 399–420, 2004.
[3] I. D. Lins et al., “Sea surface temperature prediction via support vector machines combined with particle swarm optimization,” in Proc. 10th Int. Probab. Safety Assessment Manage. Conf., vol. 10, Seattle, WA, USA, Jun. 2010.
[4] Wikipedia. Recurrent Neural Network. Accessed: Feb. 15, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Recurrent_neural_network
[5] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[7] A. Graves. (2013). “Generating sequences with recurrent neural networks.” [Online]. Available: https://arxiv.org/abs/1308.0850
[8] N. Kalchbrenner, I. Danihelka, and A. Graves. (2015). “Grid long short-term memory.” [Online]. Available: https://arxiv.org/abs/1507.01526
[9] NOAA ESRL. NOAA OI SST V2 High Resolution Dataset. Accessed: Feb. 15, 2017. [Online]. Available: http://www.esrl.noaa.gov/psd/data/gridded/data.noaa.oisst.v2.highres.html
[10] Wikipedia. Bohai Sea. Accessed: Feb. 15, 2017. [Online]. Available: https://en.wikipedia.org/wiki/Bohai_Sea
[11] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011.
[12] J. Dean et al., “Large scale distributed deep networks,” in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1232–1240.
[13] D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,” Neural Inf. Process. Lett. Rev., vol. 11, no. 10, pp. 203–224, 2007.
[14] D. Rumelhart and J. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Foundations. Cambridge, MA, USA: MIT Press, 1986.
[15] F. Chollet et al. (2015). Keras. [Online]. Available: https://github.com/fchollet/keras
[16] F. Pedregosa et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.