Forecasting Series-based Stock Price Data Using Direct Reinforcement Learning
Abstract-A significant amount of work has been done in the area of price series forecasting using soft computing techniques, most of which are based upon supervised learning. Unfortunately, there has been evidence that such models suffer from fundamental drawbacks. Given that the short-term performance of a financial forecasting architecture can be immediately measured, it is possible to integrate reinforcement learning into such applications. In this paper, we present a novel hybrid view for a financial series and a critic adaptation stock price forecasting architecture using direct reinforcement. A new utility function called the policies-matching ratio is also proposed. The need for the common tweaking work of supervised learning is reduced, and empirical results using real financial data illustrate the effectiveness of such a learning framework.

I. INTRODUCTION

Forecasting series-based stock price data via soft computing techniques has in fact already taken shape in the past decade. Although many academics and practitioners have tended to regard such applications with a high degree of skepticism, there has been accumulating evidence that the markets are not fully efficient and that Artificial Intelligence-based models can outperform the benchmark models (e.g. the Random Walk model). We anticipate that more Internet financial service providers will incorporate AI techniques in their services to address current industry trends, such as cheap real-time information, financial market and institutional deregulation, and global capitalization.

Until now, most related research has been based on traditional supervised learning techniques. Usually, artificial neural network (ANN) models are adopted as the core engine of the forecasting architecture. It is believed that non-linear relationships exist between financial variables and stock returns. Often, hybrid architectures are proposed in order to obtain better predictions than simple ANN models currently provide. Such developments include the synthesis of genetic algorithms and ANNs [1], neuro-fuzzy architectures [2], the combination of qualitative and quantitative data [3], and ARIMA-based ANNs [4]. Unfortunately, much of the published literature is still somewhat academic. The results are case sensitive and hard to apply. In essence, all efforts under the supervised learning framework will be subject to the fundamental limitation of explaining why historical patterns of the financial series should be expected to repeat. The strength of such models in interpolation cannot promise the exploration ability that is obviously crucial for this application.

Real-time financial performance depends upon sequences of interdependent decisions and is thus path-dependent. In other words, a series of actions must be taken in sequence, and the quality of these actions usually cannot be determined until the end of the sequence. This is a much harder problem than supervised learning algorithms often face. In this sense, many financial applications fall into the reinforcement learning domain. Classical dynamic programming methods have already been applied to asset allocation [5], portfolio optimization [6], and derivatives pricing applications [7]. Recently, Moody et al. [8] proposed a recurrent reinforcement-learning method to learn trading policies. In terms of forecasting financial series data, it is difficult to integrate reinforcement learning techniques directly, and few published research studies cover this issue. In [9] we proposed a new hybrid view for financial series. The architecture offers the basis for further analysis using reinforcement learning. The Q-Learning approach is also adopted there to forecast future price trends purely from historical stock price data. The empirical results reported in that paper show the effectiveness of this new thought. Contrary to the claims of the Random Walk Theory, historical financial series may provide indications to predict future trends.

However, the Neuro-Q forecasting architecture in [9] is likely to suffer from several limitations due to the nature of value-function reinforcement learning. For a discrete-time dynamic system that is suitable for most financial series prediction environments, the "Bellman curse of dimensionality" is unavoidable. Moreover, as pointed out by Brown [10], the policies produced by Q-learning tend to be brittle because of the noisy financial data. The recurrent reinforcement learning (RRL) algorithm presented by [8] is a type of direct policy learning that eliminates the intermediate value-function estimation procedure. Inspired by its strength in problem representation and computational efficiency, this paper proposes a novel critic adaptation forecasting architecture to predict series-based stock prices. RRL is utilized as the learning algorithm for the critic network. We demonstrate how learning can be implemented under the [...]
[...] of the form:

    J_T = \sum_{k=1}^{T} \gamma^k u_k

    a_t = A(a_{t-1}, \theta_t, R_t)    (4)

where \theta_t denotes the adjustable architecture parameters at time t and R_t is the observable of the system. For a decision function A(\theta_t), the learning process needs to be implemented so that \theta can be updated to maximize J_T. The RRL algorithm [8] is adopted for direct reinforcement.
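As a concrete illustration of this direct-reinforcement update, the sketch below performs gradient ascent on J_T through the recurrent decision function. It is a minimal sketch, not the paper's implementation: the tanh form of A, the profit-style utility u_k, and the finite-difference gradient (standing in for RRL's analytic recurrent gradient [8]) are all assumptions.

```python
import numpy as np

def decision(theta, a_prev, R_t):
    """Decision function a_t = A(a_{t-1}, theta, R_t): a bounded tanh
    unit over the previous action and the current observable
    (an assumed, minimal form of A)."""
    w_a, w_r, b = theta
    return np.tanh(w_a * a_prev + w_r * R_t + b)

def total_utility(theta, R, gamma=0.95):
    """J_T = sum_{k=1}^{T} gamma^k u_k, with u_k taken here as the
    action times the next price change (an illustrative utility)."""
    a, J = 0.0, 0.0
    for k in range(len(R) - 1):
        a = decision(theta, a, R[k])
        J += gamma ** (k + 1) * a * (R[k + 1] - R[k])
    return J

def rrl_step(theta, R, lr=0.01, eps=1e-4):
    """One gradient-ascent step on J_T; a central finite difference
    stands in for the analytic recurrent gradient used by RRL."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        d = np.zeros_like(theta)
        d[i] = eps
        grad[i] = (total_utility(theta + d, R) -
                   total_utility(theta - d, R)) / (2 * eps)
    return theta + lr * grad

# Rolling adaptation: repeat rrl_step at the end of each day on the
# latest observables until the policy approximately converges.
```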
Figure 1 presents the architecture for the whole forecast learning process. Basically, the complete adaptive learning process for the proposed architecture consists of two groups of parameter updating, i.e., \theta for the critic network and the weights W for the ANN model. Obviously, the most recent "unexpected" investment policies keep changing over time. Moreover, the unobserved market inertia is also likely to change frequently. To account for this assumption, a rolling-training process is applied to both supervised and reinforcement learning at the end of each time interval t.

For the ANN model trained by the backpropagation algorithm, a perfect fit of the training set can easily be obtained if sufficient lagged values are included and if the number of neurons in the hidden layer is also large. However, the optimal training balance is hard to reach, and ANNs are well known for their tendency to over-fit. To overcome the over-training phenomenon common in supervised learning, a cross-validation procedure is included.
[Fig. 1. Critic Adaptation Forecasting Architecture for Stock Price: a block diagram in which the ANN forecasting model is coupled to the CRITIC network through the reinforcement signal.]

Note that here the exploration ability of the architecture to search unknown investment strategies (which is crucial) is promised not only by the adopted on-line stochastic optimization algorithm, but also by a characteristic of financial series: they are intrinsically noisy and uncertain. In light of the above reasons, a noise variable is not incorporated into the reinforcement learning.

IV. EMPIRICAL RESULTS

[...] the individual market inertia for each index. As mentioned in section 3, ANN models are prone to over-fitting. In order to construct relatively accurate ANN models without adding too much extra work, each case experimented with different values of k (the number of lagged values) ranging from 3 to 10 with an increment of 1, along with n (the number of hidden neurons) ranging from 2 to 8 with an increment of 2. Such a combination resulted in 32 candidate models. The method for choosing k and n employed standard cross-validation verification. A small group of training data (20 daily closing prices for the S&P 500 starting from 11-Feb-98 and 20 daily closing prices for the Nasdaq starting from 1-Jun-98) is available at the beginning of each experiment. In order to let the ANN model reflect the market inertia's changes as timely as possible, rolling-training and data resampling were implemented, i.e., 10 input-target pairs \{((p_{t-1}, p_{t-2}, \ldots, p_{t-k}), p_t)\} are sequentially picked up each time as the training data for the construction of the network models. As such, the 10 data pairs were divided into a training set (7 pairs) and a validation set (3 pairs). After the ANN model was set up and the prediction for the next day was completed, the new actual value of the stock closing price is added to the training data once it becomes available. The oldest input-target pair is eliminated, and the training process is restarted. Throughout the experiment, most of the time the selected structure for the S&P 500 had 7 hidden layer units and 3 lagged values, while the structure for the Nasdaq had 2 hidden layer units and 3 lagged values.
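To make this model-selection and rolling-training loop concrete, here is a minimal sketch assuming scikit-learn's MLPRegressor as a stand-in for the backpropagation-trained ANN. The k and n grids, the 10-pair window, and the 7/3 train/validation split follow the text; the synthetic price series and helper names are illustrative only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_pairs(prices, k, window=10):
    """Build the most recent `window` input-target pairs
    ((p_{t-1}, ..., p_{t-k}), p_t) from a price series."""
    X, y = [], []
    for t in range(len(prices) - window, len(prices)):
        X.append(prices[t - k:t])
        y.append(prices[t])
    return np.array(X), np.array(y)

def select_and_train(prices):
    """Grid over k = 3..10 lagged values and n in {2, 4, 6, 8} hidden
    neurons (32 candidates), each scored on a 7/3 train/validation
    split of the most recent 10 input-target pairs."""
    best_model, best_k, best_err = None, None, np.inf
    for k in range(3, 11):
        X, y = make_pairs(prices, k)
        for n in range(2, 9, 2):
            model = MLPRegressor(hidden_layer_sizes=(n,),
                                 max_iter=2000, random_state=0)
            model.fit(X[:7], y[:7])  # 7 training pairs
            err = np.mean((model.predict(X[7:]) - y[7:]) ** 2)  # 3 validation pairs
            if err < best_err:
                best_model, best_k, best_err = model, k, err
    return best_model, best_k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prices = 1000 + np.cumsum(rng.normal(0, 5, size=20))  # stand-in for 20 closes
    model, k = select_and_train(prices)
    # One-day-ahead forecast; once the actual close arrives, append it,
    # drop the oldest pair, and retrain (rolling training).
    print("next-day forecast:", model.predict(prices[-k:].reshape(1, -1))[0])
```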
Meanwhile, the adaptation of the parameter \theta of the decision function A keeps repeating at the end of each day based on the new reinforcement signal. The learning episodes are increased gradually until approximate policy convergence is reached.

Figure 2 shows the results for the S&P 500 within a certain time window. For the policies-matching ratio, \mu_1 = 0.9 and
\mu_2 = 0.1 are used for this experiment, so the preference in this case is to minimize the Mean Square Error. Other settings include \alpha = 0.01, \gamma = 0.95, and a number of episodes equal to 500.
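The closed form of the policies-matching ratio does not survive in this excerpt. Purely to illustrate how \mu_1 and \mu_2 encode the preference between error magnitude and trend matching, the sketch below assumes a weighted combination of a negated mean-squared-error term and a direction-agreement term; the paper's actual definition may differ.

```python
import numpy as np

def policies_matching_utility(pred, actual, mu1=0.9, mu2=0.1):
    """Assumed illustrative form: mu1 weights squared-error reduction,
    mu2 weights agreement between predicted and actual daily directions.
    With mu1 = 0.9 the objective is dominated by MSE; with mu2 = 0.9 it
    is dominated by trend matching, mirroring the two experiments."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    mse = np.mean((pred - actual) ** 2)
    match = np.mean(np.sign(np.diff(pred)) == np.sign(np.diff(actual)))
    # Negate the error term so that larger utility is always better.
    return -mu1 * mse + mu2 * match
```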
Assume investors are more interested in the market trend in terms of the NAS/NMS Composite (\mu_1 = 0.1 and \mu_2 = 0.9 are used to address this assumption). Figure 3 presents the related simulation results. Here, \alpha = 0.005, \gamma = 0.75, and the number of episodes needed to obtain a stable policy is 450.

The extra investment decisions, other than just following the market inertia, evolve over time. The same evolution occurs for their implicit representation \theta. Figure 4 uses v_0, a component of \theta, as an example to demonstrate this evolution. Figure 4 also shows the difference between the unstable learning policy (400 learning episodes) and the convergent learning policy (500 learning episodes).

[Fig. 4. Example of \theta's Evolution within a Certain Timeframe.]
More precise comparison results are reported in Table I, in which five performance measures are used for a total of 1,268 predicted daily prices. These measures include the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), and the Direction Accuracy indicators DA1 and DA2 (the correct percentage forecast for an up and a down market, respectively). Note that the test goal for the S&P 500 is RMSE reduction and for the [...]

[TABLE I: Comparison of Performance for Different Forecasting Architectures. Only fragments of the table survive extraction, e.g. a DA1 row of 50.75% / 51.76% versus 44.78% / 61.22%, and one entry of 1815.1 / 2109.9.]
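For reference, the five measures can be computed as follows, treating DA1 and DA2 as the percentage of correctly forecast up moves and down moves, respectively, per the description above.

```python
import numpy as np

def forecast_metrics(pred, actual):
    """RMSE, MAE, MAPE, and direction accuracies DA1 (up) / DA2 (down)."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    err = pred - actual
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / actual)) * 100.0
    d_pred, d_act = np.diff(pred), np.diff(actual)
    up, down = d_act > 0, d_act < 0
    da1 = np.mean(d_pred[up] > 0) * 100.0 if up.any() else np.nan
    da2 = np.mean(d_pred[down] < 0) * 100.0 if down.any() else np.nan
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "DA1": da1, "DA2": da2}
```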
REFERENCES
[4] Jung-Hua Wang and Jia-Yann Leu, "Stock market trend prediction using ARIMA-based neural networks," IEEE International Conference on Neural Networks, 1996, vol. 4, pp. 2160-2165.
[5] M. J. Brennan, E. S. Schwartz and R. Lagnado, "Strategic asset allocation," J. Economic Dynamics Contr., vol. 21, pp. 1377-1403, 1997.
[6] R. C. Merton, Continuous-Time Finance. Oxford, U.K.: Blackwell, 1990.
[7] J. C. Cox, S. A. Ross and M. Rubinstein, "Option pricing: A simplified approach," J. Financial Economics, vol. 7, pp. 229-263, Oct. 1979.
[8] J. Moody, L. Wu, Y. Liao and M. Saffell, "Performance functions and reinforcement learning for trading systems and portfolios," J. Forecasting, vol. 17, pp. 441-470, 1998.
[9] Hailin Li and Cihan H. Dagli, "Synthesis of reinforcement learning and artificial neural networks applied to forecast real time financial series," 24th ASEM Annual Conference, St. Louis, USA, 2003, pp. 493-499.
[10] T. X. Brown, "Policy vs. value function learning with variable discount factors," Proc. NIPS 2000 Workshop on Reinforcement Learning: Learn the Policy or Learn the Value Function?, Dec. 2000.
[11] W. T. Miller, R. S. Sutton and P. J. Werbos, Neural Networks for Control. Cambridge, MA: MIT Press, 1990.
[12] D. Prokhorov, R. Santiago and D. Wunsch, "Adaptive critic designs: A case study for neurocontrol," Neural Networks, vol. 8, pp. 1367-1372, 1995.