
FX Trading via Recurrent Reinforcement Learning

Carl Gold
Computation and Neural Systems
California Institute of Technology, 139-74
Pasadena, CA 91125
Email: [email protected]
January 12, 2003

Abstract:
This study investigates high frequency currency trading with neural networks trained via Recurrent Reinforcement Learning (RRL). We compare the performance of single layer networks with networks having a hidden layer, and examine the impact of the fixed system parameters on performance. In general, we conclude that the trading systems may be effective, but the performance varies widely for different currency markets and this variability cannot be explained by simple statistics of the markets. Also we find that the single layer network outperforms the two layer network in this application.

1 INTRODUCTION

Moody and Wu introduced Recurrent Reinforcement Learning for neural network trading systems in 1996 [1], and Moody and Saffell first published results for using such trading systems to trade in a currency market in 1999 [3]. The goal of this study is to extend the results of [3] by giving detailed consideration to the impact of the fixed parameters of the trading system on performance, and by testing on a larger number of currency markets.

Section 2.1 introduces the use of neural networks for trading systems, while sections 2.2 and 2.3 review the performance and training algorithms developed in [1] and [3]. Section 2.4 details the application of these methods to trading FX markets with a bid/ask spread, while section 3.1 begins to discuss the test data and experimental methods used. Finally, sections 3.2, 3.3, and 3.4 compare results for different markets and for variations of the network and training algorithm parameters respectively.

2 TRADING WITH NEURAL NETWORKS

2.1 Neural Network Trading Functions

We begin by reviewing the use of recurrent neural networks to make trading decisions on a single price series, following the presentation given in [4] with additional details where appropriate. The input to the neural network is only the recent price history and the previous position taken. Because the previous output is fed back to the network as part of the input, the neural networks are called "recurrent". The output of the network at time t is the position (long/short) to take at that time. Neutral positions are not allowed so the trader is always in the market, also known as a "reversal system". For a single layer neural network (also known as a perceptron) the trading function is

    F_t = \mathrm{sign}\Big( \sum_{i=0}^{m} w_i r_{t-i} + w F_{t-1} + v \Big)

where the w_i, w and v are the weights and threshold of the neural network, and r_t is the "price returns" at time t. For trading where a fixed amount is invested in every trade ("Trading Returns"), the price returns are given by

    r_t = p_t - p_{t-1}    (1)

In practice the sign function is replaced with tanh so that derivatives can be taken with respect to the decision function for training, as described in section 2.3. The tanh output is then thresholded to produce the position.

A more complex and in theory more powerful trading rule can be made from a neural network with two layers. The second layer of neurons is also known as the "hidden" layer. In this case the trading rule is:

    F_t = \mathrm{sign}\Big( \sum_j u_j y_j^t + v \Big)

    y_j^t = \tanh\Big( \sum_i w_{ij} r_{t-i} + w_j F_{t-1} + v_j \Big)

where the y_j^t are the outputs of the neurons in the first layer, and the u_j, v_j, w_{ij}, and v are the weights and thresholds for the first and second layers of the neural network respectively.

2.2 Returns and Performance

For FX trading it is standard to invest a fixed amount in each trade. Consequently the profit at time T is given by:

    P_T = \sum_{t=1}^{T} R_t    (2)

    R_t = \mu \big( F_{t-1} r_t - \delta \, |F_t - F_{t-1}| \big)    (3)

where \mu is the number of shares traded, and \delta is the transaction cost rate per share traded. For test purposes we invest the same fixed amount in every market so the results in different currency markets are easily comparable as percentage gains and losses.

We use the Sharpe Ratio to evaluate the performance of the system for training. The Sharpe Ratio is given by:

    S_T = \frac{\mathrm{Average}(R_t)}{\mathrm{StandardDeviation}(R_t)}    (4)

The standard deviation in the denominator penalizes variability in the returns. Calculating the exact Sharpe Ratio at every point in time results in an O(T^2) algorithm, so in order to evaluate the trader's performance we use the Differential Sharpe Ratio [4]. The Differential Sharpe Ratio is derived by considering a moving average version of the simple Sharpe Ratio (4):

    S_t = \frac{A_t}{\sqrt{B_t - A_t^2}}, \qquad A_t = A_{t-1} + \eta (R_t - A_{t-1}), \qquad B_t = B_{t-1} + \eta (R_t^2 - B_{t-1})    (5)

where A_t and B_t are exponential moving estimates of the first and second moments of R_t respectively. The Differential Sharpe Ratio is derived by expanding the moving average to first order in the adaptation parameter \eta, and using the first derivative term as the instantaneous performance measure:

    D_t \equiv \frac{dS_t}{d\eta} = \frac{B_{t-1}\,\Delta A_t - \tfrac{1}{2} A_{t-1}\,\Delta B_t}{(B_{t-1} - A_{t-1}^2)^{3/2}}

where \Delta A_t = R_t - A_{t-1} and \Delta B_t = R_t^2 - B_{t-1}. The Differential Sharpe Ratio provides a convenient assessment of the trader's performance for use in Recurrent Reinforcement Learning, described in the next section.

2.3 Recurrent Reinforcement Learning

The goal of Recurrent Reinforcement Learning is to update the weights in a recurrent neural network trader via gradient ascent in the performance function:

    w_t = w_{t-1} + \rho \frac{dU_t}{dw}    (6)

where w_t is any weight or threshold of the network at time t, U_t is some measure of the trading system's performance, and \rho is an adjustable learning rate.

We also tested using the "weight decay" variant of the gradient ascent learning algorithm. Using weight decay, (6) becomes:

    w_t = w_{t-1} + \rho \frac{dU_t}{dw} - \gamma w_{t-1}

where \gamma is the coefficient of weight decay. In theory, weight decay improves neural network performance because smaller weights will have less tendency to over-fit the noise in the data [5].

Due to the path dependence of trading, the exact calculation of dU_T/dw after T trading periods is:

    \frac{dU_T}{dw} = \sum_{t=1}^{T} \frac{dU_T}{dR_t} \left( \frac{dR_t}{dF_t}\frac{dF_t}{dw} + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{dw} \right)    (7)

Note that due to the transaction cost for switching position, the return at time t is a function of both the current position and the previous position. As described in [4] we can approximate the exact batch learning procedure described by (7) with an approximate on-line update:

    \frac{dU_t}{dw} \approx \frac{dU_t}{dR_t} \left( \frac{dR_t}{dF_t}\frac{dF_t}{dw} + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{dw} \right)    (8)

For the Differential Sharpe Ratio the derivative of the performance function with respect to the returns is:

    \frac{dD_t}{dR_t} = \frac{B_{t-1} - A_{t-1} R_t}{(B_{t-1} - A_{t-1}^2)^{3/2}}

For trading returns, the derivatives of the return function are:

    \frac{dR_t}{dF_t} = -\mu \delta \, \mathrm{sign}(F_t - F_{t-1})

    \frac{dR_t}{dF_{t-1}} = \mu r_t + \mu \delta \, \mathrm{sign}(F_t - F_{t-1})

The neural network trading function is recurrent, so the derivatives for on-line training are calculated in a manner similar to back-propagation through time [4] [6]:

    \frac{dF_t}{dw} = \frac{\partial F_t}{\partial w} + \frac{\partial F_t}{\partial F_{t-1}} \frac{dF_{t-1}}{dw}

The derivative of the output function with respect to any weight in the neural network can be calculated trivially for the single layer network, and by using a standard back-propagation algorithm for the two layer neural network (see e.g. [5]). Thus all of the derivatives needed for the weight update given by (6) and (8) are readily available.

The algorithm used for training and trading is as follows: train the neural network in an initial training period of length L_train. The trades and performance during the training period are used to update the network weights but are then discarded so that they do not contribute to the final performance. The training period may be repeated for any number of epochs, n_e. The training period is then followed by an out of sample trading period of length L_trade. The trades made during the trading period are the actual trades for the period, and update of the network weights also continues during the trading period. After the trading period, the start of the training period is advanced by L_trade and the process is repeated for the entire sequence of data.
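The pieces above (decision function, trading return, differential Sharpe ratio, and on-line update (8)) fit together into a short simulation. The sketch below is an illustrative reconstruction, not the authors' code: the synthetic price series, all parameter values, and the use of the continuous tanh output directly as the position (rather than thresholding it) are assumptions made for the example.

```python
import numpy as np

# Hypothetical sketch of a single layer RRL trader. Symbols follow the text:
# w are the weights (including recurrent weight and threshold), F_t the
# position, eta the Sharpe adaptation rate, rho the learning rate.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 0.01, 2000)) + 100.0  # synthetic mid prices
r = np.diff(prices)                                    # price returns (eq. 1)

m = 4                    # number of price-return inputs (the tuned value)
mu, delta = 1.0, 0.001   # position size and per-trade transaction cost
rho, eta = 0.1, 0.01     # learning rate and Sharpe adaptation rate

w = rng.normal(0, 0.1, m + 2)   # m return weights + recurrent weight + bias
A, B = 0.0, 0.01                # moving first/second moments of R_t (eq. 5)
F_prev = 0.0
dF_prev = np.zeros_like(w)
wealth = 0.0

for t in range(m, len(r)):
    x = np.concatenate([r[t - m:t], [F_prev, 1.0]])  # inputs, recurrence, bias
    F = np.tanh(w @ x)                               # decision function
    R = mu * (F_prev * r[t] - delta * abs(F - F_prev))   # trading return (eq. 3)
    wealth += R

    # derivative of the differential Sharpe ratio w.r.t. the return
    dDdR = (B - A * R) / max(B - A * A, 1e-12) ** 1.5

    # derivatives of the return w.r.t. current and previous position
    dRdF = -mu * delta * np.sign(F - F_prev)
    dRdFp = mu * r[t] + mu * delta * np.sign(F - F_prev)

    # recurrent derivative, similar to back-propagation through time
    dF = (1 - F ** 2) * (x + w[m] * dF_prev)
    w += rho * dDdR * (dRdF * dF + dRdFp * dF_prev)  # on-line update (eq. 8)

    A += eta * (R - A)           # moment updates (eq. 5)
    B += eta * (R * R - B)
    F_prev, dF_prev = F, dF
```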
2.4 Bid/Ask Trading with RRL

As in [3], when RRL is used for a currency series with bid/ask prices the mid price is used to calculate returns and the bid/ask spread is accounted for as the transaction cost of trading. That is, for a bid/ask price series the price returns input to the trader in (1) are calculated in terms of the mid-price, p_t = (p_t^{bid} + p_t^{ask})/2, where p_t^{bid} and p_t^{ask} are the bid and ask price at time t respectively, and an equivalent transaction cost rate is applied in (3) to reflect the loss from position changes in bid/ask trading. For trading returns the equivalent transaction cost is simply the spread divided by two:

    \delta_t = \frac{p_t^{ask} - p_t^{bid}}{2}    (9)

where the factor of 1/2 reflects the fact that the transaction cost is applied for a single change in position, while in reality the bid/ask spread is lost for a change in position from neutral to long and back to neutral (or neutral-short-neutral). Because currency trading is typically commission free no further transaction cost is applied. For all experiments trading and performance is calculated in this way, but the final profit was calculated both by the approximate method described by using (2), (3) and (9), and also by the exact method of applying all trades at the exact bid and ask prices. The disagreement between the two methods for calculating profits was found to be insignificant.

3 EXPERIMENTS WITH CURRENCY TRADING

3.1 Data and Methods

The RRL traders were tested on the High Frequency Data in Finance (HFDF) 1996 Currency Market price series.¹ The series give prices for 25 different markets, including both major and minor currencies. The HFDF data set includes half hourly bid and ask prices for the entire year, a total of 17568 samples (note that 1996 was a leap year). The markets were divided into a tuning set consisting of 10 markets on which the fixed parameters of the algorithm were tuned to maximize profits and the Sharpe Ratio, and a test set consisting of the remaining 15 markets in which out of sample performance was evaluated. The major currency markets were split equally into the tuning and test sets, but otherwise the two sets were created at random.

¹ Olsen & Associates HFDF96 Data Set, obtainable by contacting http://www.olsen.ch

Parameters of the algorithm include the number of neurons in the two layers of the network, \rho, the learning rate, \gamma, the coefficient of weight decay, L_train and L_trade, the size of the training and test windows, and the number of epochs of training, n_e. The large number of parameters made it quite impossible to systematically test all but a small number of parameter combinations. The parameters were tuned by systematically varying each parameter while holding the other parameters fixed. After all parameters had been tuned, the previous values would be re-checked to see if their optimum value changed due to the change in the other parameters. (This technique is commonly known as the "greedy graduate student" algorithm.)

In order to assure that the inputs to the neural networks were all in a reasonable range regardless of the magnitude of the prices in the different markets, all price returns were normalized before being input to the neural networks. In order to achieve this, the mean and variance of the price returns were calculated over the first training period (before beginning training) and then all price returns were normalized to zero mean and unit variance with respect to these values before being input to the neural network. Also the data was filtered to remove non-continuous price changes (i.e. "outliers") from the data.

3.2 Comparison of Results for Different Markets

Tables 1 and 2 give the profit and Sharpe Ratio achieved for each of the currency markets in the tuning set and test set respectively. The results shown are for the final parameter values chosen to optimize performance on the tuning data sets. Overall, both the one layer and the two layer neural networks trade profitably in most of the currency markets. However, the final profit level varies considerably across the different markets, from -80% to 120%. Sharpe Ratios range from being small or even negative in some markets, to surprisingly high Sharpe Ratios of 7 or 8 in other markets.

It turned out that the performance in the currency markets chosen for testing was rather better than on the markets used for parameter tuning. This suggests that in the future this work would benefit from a cross-validation approach to determining the optimal parameters. The variability in the final results for each market is low considering that the random initialization of the network weights means that successive trials will never lead to identical position sequences, and that the final results are path dependent.

Overall it is apparent that some of these results seem rather too good to be true. The explanation for the improbably high performance is most likely the simple price model used in the simulations: a single price quote is used at each half hour and the trader is guaranteed to transact at that price. In reality, FX price quotes are noisy and the tick to tick price returns actually have a negative correlation [2]. Consequently many of the prices at which the neural networks trade in simulation are probably not trade-able in real time, and performance in real trading can be expected to be worse than the results shown here. Performance with a less forgiving pricing model is currently under study.

Figure 1 shows single trials of a 1 layer neural network trading in the Pound-Dollar (GBP-USD), Dollar-Swiss Franc (USD-CHF) and Dollar-Finnish Markka (USD-FIM) markets. Only in the GBP-USD market is the trader successful throughout almost the entire year. The USD-CHF market shows an example of a market in which the trader makes profits and losses with approximately equal frequency. The USD-FIM is an example of a market where the trader loses money much more than it makes money, most likely because in this market the price movement is small compared to the spread (see below).
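The bid/ask accounting of section 2.4 — mid-price returns (1), trading returns (3), and the half-spread transaction cost (9) — can be made concrete with a small worked example. The quotes, position sequence, and invested amount below are made-up values for illustration, not data from the paper.

```python
# Illustrative sketch (not the authors' code) of trading returns computed
# from bid/ask quotes: returns come from the mid price, and the half-spread
# delta_t is charged for each unit of position change.
bid = [1.5000, 1.5004, 1.4998, 1.5010]
ask = [1.5006, 1.5010, 1.5004, 1.5016]

mid = [(b + a) / 2 for b, a in zip(bid, ask)]                 # mid prices
returns = [mid[t] - mid[t - 1] for t in range(1, len(mid))]   # eq. (1)
half_spread = [(a - b) / 2 for b, a in zip(bid, ask)]         # delta_t, eq. (9)

mu = 100.0                   # fixed amount invested per trade (assumed value)
positions = [1, 1, -1, -1]   # example long/short sequence (reversal system)

profit = 0.0
for t in range(1, len(mid)):
    # R_t = mu * (F_{t-1} * r_t - delta_t * |F_t - F_{t-1}|), eq. (3)
    R_t = mu * (positions[t - 1] * returns[t - 1]
                - half_spread[t] * abs(positions[t] - positions[t - 1]))
    profit += R_t
```

With these numbers the single position reversal at t = 2 costs a full spread (two half-spreads), which is exactly the double-counting rationale given for the factor of 1/2 in (9).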
[Figure 1: three columns of panels (GBP-USD, USD-CHF, USD-FIM), each showing the price series, Wealth (%), and the moving Sharpe Ratio against time in 1/2 hour periods.]

Figure 1: Examples of single trials of a 1 layer neural network trading in the Pound-Dollar (GBP-USD), Dollar-Swiss Franc (USD-CHF) and Dollar-Finnish Markka (USD-FIM) markets. The Sharpe Ratio shown in the bottom plot is the moving average Sharpe (5).

What is worth noting here is that success and failure for the traders is not absolute, even at a very coarse time scale: in the markets where the trader is successful there are sizeable periods without profits, and in the markets where the trader loses money overall there are periods where the trader does make a profit. The positions taken are not shown; in these examples the average holding time ranged from 4 to 8 hours and at this scale individual trades would not be visible.

Basic statistics were calculated for the price series in the tuning markets in an attempt to determine what factors may influence the ability of the neural networks to profit. Statistics calculated from the moments of the price returns, such as the mean, variance, skew and kurtosis, all showed no correlation with the ability of the neural networks to profit. However, a measure that did partly explain the profitability in the different markets was the average ratio of the absolute price movement to the spread over small windows of time. As noted above, the neural network traders hold a position for approximately five hours only. Because the bid/ask spread is lost when changing positions it is only reasonable to expect that if the movement of the prices is small compared to the spread then it will not be possible to trade profitably. This intuitive idea can be easily quantified with the Movement/Spread ratio, M/S, given by:

    M/S = \frac{1}{T - w} \sum_{t=1}^{T-w} \frac{|\Delta p_t^w|}{\bar{s}_t^w}    (10)

where w is the window size for which the ratio is calculated, \bar{s}_t^w is the average spread over a window of length w beginning at time t, and \Delta p_t^w is the change in the mid price over the same window, i.e. \Delta p_t^w = p_{t+w} - p_t.

Tables 1 and 2 show the average movement of the mid-price divided by the bid/ask spread for windows of w = 10, i.e. 5 hours, in each currency series (in the column labeled "M/S"). For the tuning data set the ratio is calculated for the entire data series, while for the test data set the ratio is only calculated for the first training period (in order to preserve the integrity of the data as a true out of sample performance test). Looking at the tuning set, neither the single nor the two layer neural networks make a profit on any currency where this ratio is close to or below 1. Consequently, markets where the M/S ratio was below 1.5 were deemed "un-tradeable" and were ruled out for consideration of the parameter tuning and for final results in the test set. The results for the un-tradeable markets are shown in tables 1 and 2 for purpose of comparison. It is a reasonable question for further research to determine whether these markets might be tradeable at a lower frequency so that the movement of the prices would compensate for the spread within a number of inputs that the neural network can adapt to. Attempts to adapt traders to low movement markets by using a large number of price return inputs at the same data frequency failed.

However, while a low M/S ratio makes profit unlikely, a high M/S ratio by no means guarantees profitability. For example, the USD-CHF market has one of the highest M/S ratios calculated, yet the neural networks lost money in this market for all combinations of parameters that were attempted.
Table 1: Tuning Market Results. Results are shown for the final fixed parameters chosen for the 1 layer and 2 layer Neural Networks as described in section 3. Averages and Standard Deviations are calculated for 50 trials of each type of neural network in each currency market. The Sharpe Ratio is the Annualized Sharpe Ratio, profits are the exact profits described in section 2.4, and M/S is the movement/spread ratio described in (10).

                              1 layer NN                    2 layer NN
Market       M/S    Profit (%)      Sharpe         Profit (%)      Sharpe
AUD-USD      1.96   17.7 ± 0.8      2.2 ± 0.09     18.1 ± 2.0      2.3 ± 0.26
DEM-ESP      1.17   -7.5 ± 11.0     -1.7 ± 2.37    -4.7 ± 4.4      -1.1 ± 1.05
DEM-FRF      2.05   31.3 ± 1.3      6.2 ± 0.1      38.9 ± 0.8      7.7 ± 0.16
DEM-JPY      2.47   17.2 ± 1.2      2.1 ± 0.15     10.5 ± 1.9      1.3 ± 0.23
GBP-USD      2.29   22.6 ± 0.3      3.2 ± 0.05     25.3 ± 1.1      3.6 ± 0.16
USD-CHF      2.68   -4.2 ± 0.6      -0.4 ± 0.06    -15.7 ± 2.1     -1.5 ± 0.21
USD-FIM      0.90   -24.9 ± 0.3     -2.4 ± 0.15    -65.5 ± 3.2     -6.1 ± 0.30
USD-FRF      2.81   49.3 ± 1.5      5.9 ± 0.04     40.1 ± 2.0      4.8 ± 0.24
USD-NLG      2.43   22.1 ± 0.7      2.4 ± 0.09     7.1 ± 3.4       0.8 ± 0.37
USD-ZAR      1.16   -82.1 ± 4.4     -8.1 ± 0.41    -76.8 ± 7.4     -8.0 ± 0.74
Average, all markets    4.2 ± 2.2   0.9 ± 0.35     -2.3 ± 2.8      0.4 ± 0.37
Average, M/S > 1.5      22.3 ± 0.74 3.1 ± 0.08     17.8 ± 2.7      2.7 ± 0.23
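The M/S statistic tabulated above can be sketched in code. This is a hedged reading of eq. (10), averaging the per-window movement-to-spread ratios over the whole series; it is not the authors' implementation, and the example quotes below are synthetic.

```python
# Movement/Spread ratio of eq. (10): absolute mid-price movement over a
# window of w periods, divided by the average spread over that window,
# averaged over all window start times.
def movement_spread_ratio(bid, ask, w):
    mid = [(b + a) / 2 for b, a in zip(bid, ask)]
    spread = [a - b for b, a in zip(bid, ask)]
    ratios = []
    for t in range(len(mid) - w):
        avg_spread = sum(spread[t:t + w]) / w      # s-bar_t^w
        ratios.append(abs(mid[t + w] - mid[t]) / avg_spread)  # |delta p_t^w| / s-bar
    return sum(ratios) / len(ratios)

# Synthetic example: a steady trend of 0.001 per period with a 0.0002 spread
# moves 0.01 per 10-period (5 hour) window, i.e. M/S = 50.
bid = [1.0 + 0.001 * i for i in range(20)]
ask = [b + 0.0002 for b in bid]
ms = movement_spread_ratio(bid, ask, 10)
```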

Linear regression was performed for the M/S ratio vs. the Sharpe Ratios for all the markets: the correlation coefficient was .17 for the single layer neural network and .09 for the two layer neural network. Precisely what characteristics of a currency market make it possible to trade profitably with an RRL trained neural network is one of the most important issues for further research in this area.

Table 3 gives some simple per trade statistics for the 1 layer neural network in the tuning markets, including the holding time and profit per trade. Note that the M/S ratio has a negative correlation to the mean holding time: the price series with low M/S ratios tend to have a longer mean holding time. Linear regression of the holding time to the M/S ratio on all markets gave a correlation coefficient of -.48. This is consistent with the results in [1] and [4], which showed that RRL training adapts traders to higher transaction cost by reducing the trading frequency. In the case of FX trading a lower M/S ratio means that the spread is a relatively higher equivalent transaction cost and we should expect trade frequency to be reduced.

3.3 Effect of Network Parameters

The parameters that must be chosen for the neural networks are the number of layers in the network and the number of neurons in each layer. Note that the number of neurons in the input layer is the choice for the number of price return inputs given to the network (minus one for the recurrent input). Examining tables 1 and 2, one of the most striking results of this study is that a neural network with a single layer outperforms a neural network with two layers. This may seem like a surprise because 2 layer networks are generally a more powerful learning model; a single layer neural network can only make decisions that are linearly separable in the space of the inputs [5]. However, the high level of noise in financial data may make the learning ability of the two layer network a liability if it memorizes noise in the input data. On the other hand, this result seems to imply that reasonably good trading decisions for FX markets are in some sense linearly separable in the space of the recent price returns and therefore not as complex as we may believe.

Figure 2 shows the effect of the number of price return inputs on the performance of a single layer neural network trader. The results shown for three currency markets display the difficulty of choosing fixed parameters for neural network currency traders. While the trader performs best in the Australian Dollar - US Dollar (AUD-USD) market with fewer inputs, its performance in the US Dollar - Dutch Guilder (USD-NLG) market is best with a larger number of inputs. In the GBP-USD market performance is best at fewer inputs, but the worst performance is for traders with an intermediate number of inputs. This seems to defy common sense, yet the result is a genuine quirk of the market. The final number of price inputs that was chosen to optimize the average performance in the tuning markets was 4. This result is also rather surprising because it means that the neural network traders only need the price returns from the most recent two hours in order to make effective trading decisions.

Comparing the profit with the Sharpe Ratio for the currency markets in figure 2 shows that there does not appear to be a tradeoff between profit and stability of returns when choosing the optimal values for the fixed parameters of the model. The number of inputs that has the highest profit has the highest Sharpe Ratio as well. This was true for nearly all of the fixed parameters and currency markets.

For the two layer neural networks the dependence on the number of price return inputs was similar, and the final value chosen for optimal performance in the tuning markets was also 4.
Table 2: Test Market Results. Fixed parameters for the Neural Networks and column headings are the same as described in table 1.

                              1 layer NN                    2 layer NN
Market       M/S    Profit (%)      Sharpe         Profit (%)      Sharpe
CAD-USD      1.6    7.6 ± 0.5       1.9 ± 0.13     -1.3 ± 1.0      -0.3 ± 0.25
DEM-FIM      0.84   12.1 ± 0.9      1.8 ± 0.14     23.7 ± 1.3      3.6 ± 0.19
DEM-ITL      1.97   68.5 ± 1.5      8.0 ± 0.19     71.9 ± 2.9      8.4 ± 0.35
DEM-SEK      1.92   48.8 ± 0.6      5.9 ± 0.08     42.4 ± 2.4      5.1 ± 0.31
GBP-DEM      1.64   -20.7 ± 0.7     -3.1 ± 0.10    -25.7 ± 2.7     -3.8 ± 0.41
USD-BEF      2.89   55.5 ± 4.4      4.4 ± 0.35     45.5 ± 4.9      3.6 ± 0.38
USD-DEM      3.39   14.4 ± 1.3      1.9 ± 0.17     12.3 ± 2.8      1.6 ± 0.36
USD-DKK      2.17   6.6 ± 0.7       0.8 ± 0.09     -8.3 ± 1.9      -1.0 ± 0.22
USD-ESP      1.25   9.4 ± 7.1       0.7 ± 0.46     -6.1 ± 7.1      -0.4 ± 0.46
USD-ITL      1.74   118.9 ± 2.2     12.0 ± 0.24    88.3 ± 1.8      9.0 ± 0.19
USD-JPY      2.69   13.2 ± 0.9      1.7 ± 0.11     8.3 ± 3.6       1.1 ± 0.44
USD-MYR      1.50   -1.9 ± 0.6      -0.6 ± 0.20    -0.7 ± 1.4      -0.3 ± 0.48
USD-SEK      1.58   35.9 ± 1.2      3.2 ± 0.11     17.2 ± 2.9      1.5 ± 0.27
USD-SGD      0.82   -1.7 ± 0.3      -0.4 ± 0.06    -8.0 ± 0.7      -1.8 ± 0.17
USD-XEU      2.31   58.4 ± 0.5      7.3 ± 0.06     46.6 ± 2.6      5.9 ± 0.34
Average, All Markets    28.3 ± 1.6  3.0 ± 0.16     20.4 ± 2.7      2.1 ± 0.36
Average, M/S > 1.5      37.0 ± 1.3  4.0 ± 0.15     27.0 ± 2.7      2.8 ± 0.37
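The Annualized Sharpe Ratios in tables 1 and 2 can be computed from per-period returns as sketched below. The annualization convention, scaling the per-period ratio of (4) by the square root of the number of half-hour periods per year, is an assumption; the paper does not state its convention explicitly.

```python
import math

def annualized_sharpe(returns, periods_per_year=17520):
    """Annualized Sharpe ratio of per-period returns: eq. (4) scaled by
    sqrt(periods_per_year). 17520 half-hour periods per (non-leap) year is
    an assumed value matching the half-hourly data of section 3.1."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((x - mean) ** 2 for x in returns) / n
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)
```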

The dependence of the result on the number of neurons in the hidden layer was less significant than on the number of price return inputs. For the more profitable markets in the tuning set (GBP-USD, DEM-FRF, USD-FRF) there was no significant dependence on the number of hidden units. For the less profitable markets in the tuning set (AUD-USD, USD-JPY, USD-NLG) there was a slight preference for more units in the hidden layer, and in the end the number of hidden units that optimized overall performance in the tuning markets was 16. While this seems to suggest that the less profitable markets are in some sense more complex and that is why additional hidden layer neurons improve performance, the fact that the single layer neural network outperforms the two layer neural network even in these markets makes this explanation seem untenable.

3.4 Effect of Training Parameters

The parameters of the training algorithm include the learning rate \rho and the coefficient of weight decay \gamma, the length of the training and trading windows, L_train and L_trade, and the number of epochs for training, n_e. In general, choosing optimal values for these parameters suffers from the same difficulties as choosing the network parameters: optimal values for one market may be sub-optimal for another. Figure 4 illustrates this in the case of the size of the training window for the two layer neural network trader. While trading in the AUD-USD market performance is best with a longer training window, trading USD-FRF the performance is best with a shorter training window. Trading the DEM-FRF has the best performance at an intermediate value. Note that the USD-FRF performance with respect to training window size has the unusual property that as the profit declines the Sharpe Ratio increases slightly. This is the only case noted where the Sharpe Ratio was not strongly correlated with the profit.

While the parameters defining the neural network were relatively independent of each other (i.e. the optimal number of price return inputs did not impact the optimal number of hidden layer neurons or whether the neural network performed better with one or two layers), the parameters of the training algorithm showed a complex interdependence. Figure 3 illustrates this in the case of the number of training epochs n_e and the learning rate \rho for a two layer neural network in the USD-FRF market. For higher learning rates fewer training epochs is best, while for lower learning rates more training epochs are needed. The optimal results overall were found with a balance between the two, as simply using a high learning rate with a single training epoch generally gave worse performance than an intermediate learning rate and number of training epochs.

In the case of weight decay it is worth noting that a small amount of weight decay gave some benefit to the two layer neural networks. However, weight decay never helped the single layer neural network for any combination of parameters tested. This result is not surprising since weight decay is theoretically a technique for simplifying the rule learned by the neural network and preventing the neural network from memorizing noise in the data; the single layer network is already about as simple a learning rule as possible, so it is not surprising that further simplification gives no benefit. From a different perspective, the performance of 2 layer neural networks generally suffers when large weights result in the hidden layer neurons operating far from the linear range of the tanh function. Because the 1 layer neural network uses only a single thresholded tanh unit it is in fact a simple linear function, and the function is completely unchanged when all the weights are multiplied by an arbitrary constant.

Table 3: Statistics of trades for the 1 Layer Neural Network in the markets used for tuning: H = average holding time in hours, P = average % profit per trade, %Win = overall percent of trades which are profitable. Holding times and profits are given overall and separately for long positions, short positions, winning positions, and losing positions.

Market    H(all) H(long) H(short) H(win) H(lose)   P(all)  P(long) P(short) P(win)  P(lose)   %Win
AUD-USD   5.9    6.9     4.9      4.4    8.2       0.015   0.017   0.012    0.121   -0.146    60.1
DEM-ESP   7.1    7.8     6.3      5.4    9.1       -0.006  -0.006  -0.007   0.055   -0.076    53.2
DEM-FRF   3.1    2.3     3.9      2.2    4.7       0.015   0.016   0.013    0.057   -0.052    61.2
DEM-JPY   3.8    3.5     4.2      2.9    5.3       0.010   0.013   0.007    0.110   -0.143    60.3
GBP-USD   3.7    4.1     3.3      2.6    5.6       0.012   0.019   0.005    0.081   -0.115    64.9
USD-CHF   4.1    3.7     4.5      2.7    6.1       -0.002  0.003   -0.008   0.111   -0.162    58.3
USD-FRF   3.8    3.5     4.2      2.8    5.8       0.027   0.029   0.026    0.111   -0.126    64.7
USD-FIM   7.8    8.8     6.7      5.5    10.4      -0.026  0.004   -0.056   0.173   -0.261    54.1
USD-NLG   4.8    6.8     2.8      4.1    5.8       0.015   0.021   0.009    0.137   -0.150    57.6
USD-ZAR   11.0   13.4    8.6      9.0    12.2      -0.122  -0.104  -0.140   0.162   -0.296    38.1
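The per-trade statistics of table 3 can be computed from a position sequence as sketched below. This is an illustrative reconstruction, not the authors' code: a trade is assumed to be a maximal run of constant position, held over half-hour periods, and a win is a trade whose accumulated return is positive.

```python
# Hypothetical per-trade statistics: mean holding time (hours) and percent
# of profitable trades, from aligned position and return sequences.
def trade_stats(positions, returns):
    """positions[t] is the position held over return returns[t]."""
    trades = []   # list of (holding_hours, trade_profit)
    start = 0
    for t in range(1, len(positions) + 1):
        if t == len(positions) or positions[t] != positions[start]:
            profit = sum(positions[i] * returns[i] for i in range(start, t))
            trades.append(((t - start) * 0.5, profit))  # 0.5 h per period
            start = t
    hold = sum(h for h, _ in trades) / len(trades)
    pct_win = 100.0 * sum(p > 0 for _, p in trades) / len(trades)
    return hold, pct_win
```

For example, `trade_stats([1, 1, -1, -1], [0.01, 0.01, -0.01, -0.01])` describes one long and one short trade, each held one hour and each profitable.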

[Figure 2: three rows of panels (AUD-USD, GBP-USD, USD-NLG), Profit (%) vs. number of inputs on the left and Sharpe vs. number of inputs on the right.]

Figure 2: The effect of the number of price return inputs on the profit and Sharpe Ratio for single layer neural networks for three markets in the tuning data set. In all plots the x axis shows the number of inputs and the y axis shows the profit (left column) or the annualized Sharpe Ratio (right column). The neural networks also have an additional recurrent input not counted in this plot. All other system parameters are fixed at values given in table 1. Results shown are averages and standard deviation for 25 trials.

[Figure 3: three rows of panels for learning rates \rho = .35, .15 and .05, Profit (%) vs. number of training epochs on the left and Sharpe vs. number of training epochs on the right.]

Figure 3: The relationship between Training Epochs and Learning Rate for a two layer neural network in the USD-FRF market. In all plots the x axis shows the number of training epochs, n_e, and the y axis shows the profit (left column) or the annualized Sharpe Ratio (right column). All other system parameters are fixed at values given in table 1. Results shown are averages and standard deviation for 25 trials.
4 CONCLUSIONS

The results presented here suggest that neural networks trained with Recurrent Reinforcement Learning can make effective traders in currency markets with a bid/ask spread. However, further testing with a more realistic and less forgiving model of transaction prices is needed. Initial experiments suggest that performance is substantially decreased and the dependence on the fixed parameters is altered when the traders cannot automatically transact at any price which appears in the quote series. These experiments are now underway and will be reported when complete.

Regardless of the price model used, the RRL method seems to suffer from a problem that is common to gradient ascent training of neural networks: there are a large number of fixed parameters that can only be tuned by trial and error. Despite extensive experiments we cannot claim that we have found the optimal fixed parameters for currency trading in general, as the possible combinations of parameters is very large and many of the parameters have a complex interdependence. A problem that is specific to the currency trading
[Plot panels for Figure 4: Profit (%) vs Training Length and Sharpe vs Training Length for the AUD-USD, DEM-FRF, and USD-FRF markets.]

Figure 4: The effect of the size of the training window on the performance of a 2 layer neural network trader. In all plots the x axis shows the size of the training window, and the y axis shows the profit (left column) or the annualized Sharpe Ratio (right column). All other system parameters are fixed at values given in table 1. Results shown are averages and standard deviations for 25 trials.

REFERENCES

[1] J Moody, L Wu, Optimization of trading systems and portfolios. In Neural Networks in the Capital Markets (NNCM*96) Conference Record, Caltech, Pasadena, 1996

[2] J Moody, L Wu, High Frequency Foreign Exchange Rates: Price Behavior Analysis and "True Price" Models. In Nonlinear Modeling of High Frequency Financial Time Series, C Dunis and B Zhou, editors, Chap. 2, Wiley & Sons, 1998

[3] J Moody, M Saffell, Minimizing downside risk via stochastic dynamic programming. In Computational Finance 1999, Yaser S. Abu-Mostafa, Blake LeBaron, Andrew W. Lo, and Andreas S. Weigend, editors, pp. 403-415, MIT Press

[4] J Moody, M Saffell, Learning to Trade via Direct Reinforcement, IEEE Transactions on Neural Networks, Vol 12, No 4, July 2001

[5] C Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995

[6] P Werbos, Backpropagation Through Time: What It Does and How to Do It, Proceedings of the IEEE, Vol 78, No 10, October 1990
applications is that performance depends heavily on characteristics of the currency markets which are understood only poorly at this time. We can rule out trading in sluggish markets with a low ratio of absolute price movement to bid/ask spread, but this criterion does not have predictive value for the performance in markets with adequate price movement.
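The "sluggish market" criterion just described can be illustrated with a short computation over a quote series. The quote values and the fixed 4-pip spread below are hypothetical, and the paper does not specify exactly how the ratio is measured; this is one plausible construction.

```python
import numpy as np

# Illustrative check of the sluggish-market criterion: compare the
# typical absolute mid-price movement per quote to the bid/ask spread.
# All numbers here are hypothetical.
bid = np.array([1.1000, 1.1002, 1.1001, 1.1005, 1.0998])
ask = bid + 0.0004                        # assume a fixed 4-pip spread

mid = (bid + ask) / 2.0
movement = np.mean(np.abs(np.diff(mid)))  # mean absolute mid-price move
spread = np.mean(ask - bid)               # mean bid/ask spread

ratio = movement / spread
# a low ratio suggests the market may be too sluggish to trade profitably
```

A market where this ratio is well below one moves less per quote than the cost of crossing the spread, which is consistent with the paper's observation that such markets are untradeable.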
These conclusions point to a few avenues for further research. Probably the most important need is a more in-depth analysis of the properties of the different currency markets that lead to the widely varying performance of the neural network traders. Another interesting question is how performance might benefit from giving the traders data other than the recent price series, such as interest rates or other information which has an impact on currency markets. Finally, a more open-ended goal is to achieve a greater theoretical understanding of how and why Recurrent Reinforcement Learning works, which may answer questions such as why some markets are tradeable and others are not, whether the performance of the neural networks can be improved further, and whether the principle of the RRL method can be adapted to other learning models, such as Radial Basis Functions or Support Vector Machines, that do not rely on gradient ascent for parameter tuning.
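The profit figures reported throughout are produced by simulated trading with transaction costs. A minimal sketch of the additive profit model standard in RRL trading, following Moody and Saffell [4], is given below; the function name and the example values are illustrative, not the paper's code.

```python
import numpy as np

# Additive profit model for RRL trading (after Moody and Saffell [4]):
# R_t = F_{t-1} * r_t - delta * |F_t - F_{t-1}|
# where F_t is the position in {-1, +1}, r_t the price return, and
# delta the transaction cost per unit position change.
def trading_returns(positions, price_returns, delta):
    """Per-period trading returns for a sequence of long/short positions."""
    F = np.asarray(positions, dtype=float)
    r = np.asarray(price_returns, dtype=float)
    F_prev = np.concatenate(([0.0], F[:-1]))  # flat before the first trade
    return F_prev * r - delta * np.abs(F - F_prev)

# hypothetical positions and price returns
pos = [1, 1, -1, -1, 1]
rets = [0.002, -0.001, 0.003, 0.001, -0.002]
R = trading_returns(pos, rets, delta=0.0005)
```

Returns of this form are what the trader's performance function (profit or Sharpe Ratio) is computed from, and the |F_t - F_{t-1}| term is what makes the bid/ask spread matter: frequent position changes are penalized in proportion to the spread.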

ACKNOWLEDGEMENTS

I would like to thank Yaser Abu-Mostafa, John Moody,


and Matthew Saffell for their direction, feedback and insight
throughout this work.
