
Using Volume Weighted Support Vector Machines With Walk Forward PDF

This document discusses using a modified version of support vector machines called Volume Weighted Support Vector Machines (VW-SVM) to create stock trading strategies. VW-SVM incorporates trading volume into the penalty function of an SVM classifier to improve accuracy. The study aims to develop a trading strategy using VW-SVM combined with feature selection and technical indicators to make accurate predictions of future stock trends. Walk-forward testing is used to evaluate the strategy on a subset of stocks, analyzing more data points than previous similar studies. Results show the combined approach of VW-SVM, feature selection, and technical indicators significantly improves trading strategy performance.


Expert Systems with Applications 42 (2015) 1797–1805

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Using Volume Weighted Support Vector Machines with walk forward testing and feature selection for the purpose of creating stock trading strategy

Kamil Żbikowski
Institute of Computer Science, Faculty of Electronics and Information Technology, Warsaw University of Technology, Warsaw, Poland

Article info
Article history: Available online 22 October 2014
Keywords: Support Vector Machines; Trend forecasting; Walk-forward testing; Stock trading

Abstract
This study aims to verify whether a modified Support Vector Machine (SVM) classifier can be successfully applied to forecasting short-term trends on the stock market. Several technical indicators and statistical measures are selected as the input. To conduct an appropriate verification, a dedicated system capable of walk-forward testing was designed and developed. In conjunction with the modified SVM classifier, we use Fisher's method for feature selection. The outcome shows that using example weighting combined with feature selection significantly improves the results of a sample trading strategy, in terms of both the overall rate of return and the maximum drawdown during the trading period.
© 2014 Elsevier Ltd. All rights reserved.

E-mail address: [email protected]
http://dx.doi.org/10.1016/j.eswa.2014.10.001
0957-4174/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Researchers, analysts and investors are familiar with various methods of analyzing financial time series data. Generally, these methods can be divided into two categories. The first is the group of techniques that rely on the assumption that future prices of assets, their levels or just their trends can be forecast by analyzing their past performance. It includes econometric and mathematical modeling as well as technical analysis. The second category consists of methods which analyze a particular asset from a wider perspective, such as the economic environment, investor sentiment and other factors that cannot always be easily quantified and measured.

For the purpose of enhancing investment performance, there are numerous examples of combining data mining algorithms with traditional investment strategies. One of the earliest works in the field is Hammerbacher and Yager (1981), where the theory of fuzzy subsets was applied to the multiple-objective decision problem of stock selection. In Wong (1991), a neural network approach to time series forecasting was described. The most encouraging feature of the proposed model is that it does not require any assumption about the underlying function or model that should be used. This is a significant advantage over traditional methods such as regression and the Box–Jenkins model.

In many further studies, attempts were made to increase model performance. Refenes, Bentz, Bunn, Burgess, and Zapranis (1997) proposed a modification of the least-squares cost function used in error backpropagation. The obtained estimator was biased towards the more recent observations, and the bias was determined by the decay parameter in the sigmoid function. Further improvements can be made by applying a feature selection procedure. In particular, Hsu (2011) used a feature selection procedure based on backward simulation. Many researchers applied other techniques for attribute selection (Dai, Shao, & Lu, 2012; Lee, 2009; Hsu, 2011; Kara, Acar Boyacioglu, & Baykan, 2011; Huang, 2012; Ng, Liang, Li, Yeung, & Chan, 2014; Zhang et al., 2014). In recent years, however, applications of machine learning algorithms to financial time series prediction and trading have focused on combining the knowledge gained from financial markets with well-defined models. Chavarnakul and Enke (2008) emphasized the role of trading volume for understanding stock price movements and incorporated the Adjusted Moving Average and Ease of Movement indicators into a generalized regression neural network model, which was applied to past S&P 500 index data. Chu, Chen, Cheng, and Huang (2009) also stressed the role of transaction volume. They used a type-2 fuzzy time-series model in combination with a volume technical indicator for forecasting stock indexes.

Recent studies tend to hybridize SVM with all of these techniques, namely robust feature selection, incorporation of transaction volume and technical analysis factors. Kara et al. (2011) conducted an experiment in which neural network and plain SVM models were compared for the prediction of stock price index movement with the extensive use of several technical indicators. Rosillo, Giner, and Fuente (2013) used the Volatility Index and technical analysis in
order to forecast the weekly change in the S&P 500. Dai et al. (2012) used MARS splines for attribute selection, and the selected attributes then serve as the input to a Support Vector Regression model.

Existing studies showed that separately applying feature selection, transaction volume dependency and technical analysis can lead to a significant improvement in model performance. Combining all of them with an SVM classifier is therefore a natural extension of current research. In this paper, we focus on a modified version of Support Vector Machines (SVM), the Volume-Weighted SVM. Volume dependency is not provided as another predictor but as a structural modification of the SVM classifier. The extension assumes that incorporating volume-based weighting into the penalty function can lead to a significant improvement in classifier accuracy (Żbikowski, 2014). The main goal of the presented experiment was to develop a trading strategy which uses VW-SVM in combination with F-Score feature selection and several technical indicators to make the most accurate predictions about future trends of a particular stock. This study focuses on a selected subset of stocks instead of a stock index. This approach makes it possible to examine far more data points than in previous studies.

The paper is organized as follows. We briefly introduce the concept of VW-SVM in Section 2. Then, in Section 3, we describe in detail the system that was developed for testing different variants of trading strategies. In Section 4, we analyze the experiment results. Finally, in Section 5, we make some concluding remarks.

2. Volume Weighted Support Vector Machines

The primary objective of the SVM algorithm is to maximize the margin of the separating hyperplane in the n-dimensional feature space, which is equivalent to minimizing $\|w\|$. It applies the structural risk minimization principle, which is described in detail in Cortes and Vapnik (1995). In Żbikowski (2014), the concept of volume-based example weighting was introduced. The penalty function for this Volume Weighted SVM (VW-SVM) is defined as follows:

$$ \min_{w \in H,\; b \in \mathbb{R}} \; \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} C_i \xi_i, \tag{1} $$

subject to:

$$ y_i (w \cdot x_i - b) \ge 1 - \xi_i, \tag{2} $$

$$ \xi_i \ge 0, \tag{3} $$

where $\xi_i$ are slack variables, $C_i$ is the penalty parameter for a particular input $x_i$ and $y_i$ is the corresponding target value. For each input vector, the penalty term is defined as follows:

$$ C_i = v_i C, \tag{4} $$

where

$$ v_t = \frac{\sum_{k=0}^{d} V_{t-k}}{\sum_{i=0}^{m} W_{t-i}}, \tag{5} $$

where $V_{t-k}$ denotes the real transactional volume for the moment t with a delay of k periods and d is the length of data over which a particular feature is calculated. Problem (1) can be reformulated using Lagrange multipliers $\alpha_i$ and $\mu_i$ as follows:

$$ L(w, b, \alpha, \mu) = \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} C_i \xi_i - \sum_{i=1}^{m} \alpha_i \left( y_i (w \cdot x_i - b) - 1 + \xi_i \right) - \sum_{i=1}^{m} \mu_i \xi_i. \tag{6} $$

To represent (6) as a dual problem, the following partial derivatives need to be calculated:

$$ \frac{\partial L(w, b, \alpha, \mu)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i, \tag{7} $$

$$ \frac{\partial L(w, b, \alpha, \mu)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0, \tag{8} $$

$$ \frac{\partial L(w, b, \alpha, \mu)}{\partial \xi_i} = 0 \;\Rightarrow\; C_i = \mu_i + \alpha_i. \tag{9} $$

Substituting conditions (7)–(9) into Eq. (6), the dual problem takes the following form:

$$ Q(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \varphi(x_i) \cdot \varphi(x_j), \tag{10} $$

where $\varphi$ denotes the mapping into the feature space. $Q(\alpha)$ has to be maximized subject to the following conditions:

$$ \sum_{i=1}^{m} \alpha_i y_i = 0, \tag{11} $$

$$ \alpha_i \ge 0. \tag{12} $$

As shown in Żbikowski (2014), optimization problem (10) has a form similar to that of the base SVM and can be solved with the Sequential Minimal Optimization algorithm.

3. System design

3.1. Architecture

To verify the capabilities of VW-SVM for trading on the stock market, we developed a dedicated system whose architecture is presented in Fig. 1. Its main objective is to make the most accurate possible predictions of future short-term trends. It consists of several modules, each responsible for a specific task during the backtesting procedure. Both stock quotations and trading results are stored in a relational database.

Fig. 1. Backtesting system architecture.
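The volume-weighted penalty of Section 2 (Eqs. (1)–(5)) is the classifier component that this backtesting system exercises. It can be illustrated with a short Python sketch that uses only the standard library. This is not the author's implementation. It minimizes the primal objective (1)–(3) directly by full-batch subgradient descent, rather than solving the dual (10) with SMO. The text does not define $W$ in Eq. (5), so the sketch assumes (hypothetically) that $W_{t-i}$ is the same $(d+1)$-period volume sum as the numerator, evaluated at $t-i$. The function names `volume_weights`, `train_vw_svm` and `predict` are illustrative.

```python
from typing import List, Sequence, Tuple


def volume_weights(volume: Sequence[float], d: int, m: int) -> List[float]:
    """Volume weights v_t in the spirit of Eq. (5).

    ASSUMPTION (hypothetical): W_{t-i} is the same (d+1)-period volume sum
    as the numerator, evaluated at t-i. Only indices t >= d + m, which have
    a full history, receive a weight.
    """
    # S_t = V_t + V_{t-1} + ... + V_{t-d}
    sums = {t: sum(volume[t - d:t + 1]) for t in range(d, len(volume))}
    weights = []
    for t in range(d + m, len(volume)):
        denominator = sum(sums[t - i] for i in range(m + 1))
        weights.append(sums[t] / denominator)
    return weights


def train_vw_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                 c: Sequence[float], epochs: int = 200,
                 lr: float = 0.05) -> Tuple[List[float], float]:
    """Minimize 0.5*||w||^2 + sum_i c_i * max(0, 1 - y_i * (w.x_i - b)).

    c_i = v_i * C is the per-example penalty of Eq. (4). Uses full-batch
    subgradient descent on the primal problem, not SMO.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)            # gradient of 0.5*||w||^2
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, c):
            score = sum(wj * xj for wj, xj in zip(w, xi)) - b
            if yi * score < 1:      # hinge active: slack 1 - y_i*score > 0
                for j in range(n_features):
                    grad_w[j] -= ci * yi * xi[j]
                grad_b += ci * yi   # d/db of -c_i*y_i*(w.x_i - b)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Class +1 (upward) or -1 (downward) from the sign of w.x - b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - b >= 0 else -1
```

With a base penalty `C`, the per-example penalties are `[v * C for v in volume_weights(volume, d, m)]`, aligned with the training rows. Examples recorded on heavier-volume days then cost more to misclassify.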


The Simulation Engine (SE) is responsible for iterating over the quotations of the stocks defined in the Trading Scenario Definition (TSD). It ensures that the walk-forward procedure, described in detail in Ladyzynski, Żbikowski, and Grzegorzewski (2013), is conducted so as to avoid look-ahead bias. Basically, it goes through the entire data set with a moving window of length m, on which the model is optimized. Next, it enables trading on the subsequent l samples. When the end of this subset is reached, the optimization window is shifted forward by l.

The SE supplies the Algorithm Sandbox (AS) with quotations in the order in which they occur during the walk-forward procedure. An algorithm is injected into the AS, where it performs the necessary computations and makes decisions about opening or closing a position in a particular asset. Those orders are then sent to the Transaction Broker (TB) module, where they are processed, stored and forwarded for analysis to the next module, the Assets Portfolio Manager (APM). When the SE triggers a recalculation of the APM, several things happen. Firstly, the APM looks for new orders received from the TB and opens the corresponding positions. Secondly, for each position, the APM recalculates the current profits, margins and equity held by each instance of the algorithm. These data are available to the algorithm when the next iteration occurs.

3.2. Selected technical indicators

In the previous section, we briefly described the foundations of technical analysis. In the following paragraphs, we provide more in-depth information about the indicators used in the presented experiment. All of them are commonly used together with trading rules derived from the knowledge and experience of traders. When artificial intelligence algorithms are used, those rules are extracted automatically by the particular models and are not the subject of further analysis.

All indicators are derived from OHLC data, where $P^O_t$, $P^H_t$, $P^L_t$ and $P^C_t$ denote, respectively, the open, high, low and close prices for a particular moment t, which spans the time period $(t-1, t]$.

3.2.1. Average True Range

Average True Range measures the volatility of price changes over a specified period of time. It is built on an auxiliary indicator, True Range, which measures the maximum difference between subsequent prices:

$$ \mathrm{TR}_t = \max\{ P^H_t - P^L_t,\; P^H_t - P^C_{t-1},\; P^L_t - P^C_{t-1} \}. \tag{13} $$

Average True Range is the n-day arithmetic mean of TR computed over a predefined period:

$$ \mathrm{ATR}^n_t = \frac{1}{n} \sum_{i=0}^{n} \mathrm{TR}_{t-i}. \tag{14} $$

ATR is said to be superior to the standard deviation of closing prices for quantifying volatility, as it also uses intraday price fluctuations (Gustafson, 2001).

3.2.2. Vortex Indicator

The Vortex Indicator is a directional movement indicator. The main idea behind it is that individual relations between successive quotations provide information about trend directions (Botes & Siepman, 2010). The Vortex Indicator consists of two values, $V^{(+)}$ and $V^{(-)}$, which denote positive and negative trend movements and are defined as follows:

$$ V^{n(+)}_t = \frac{\sum_{k=0}^{d} \left( P^H_{t-k} - P^L_{t-k} \right)}{\sum_{k=0}^{d} \mathrm{TR}_{t-k}}, \tag{15} $$

$$ V^{n(-)}_t = \frac{\sum_{k=0}^{d} \left( P^L_{t-k} - P^H_{t-k} \right)}{\sum_{k=0}^{d} \mathrm{TR}_{t-k}}, \tag{16} $$

where n is the length of the period over which the indicator is computed.

3.2.3. On-Balance Volume

Davies (2004) described the On-Balance Volume indicator as a momentum indicator that relates price change to volume. It is defined as follows:

$$ \mathrm{OBV}_t = \begin{cases} \mathrm{OBV}_{t-1} + V_t & \text{if } P_t > P_{t-1}, \\ \mathrm{OBV}_{t-1} - V_t & \text{if } P_t < P_{t-1}, \\ \mathrm{OBV}_{t-1} & \text{if } P_t = P_{t-1}, \end{cases} \tag{17} $$

where $V_t$ is the transactional volume for the moment t. The idea behind this indicator is straightforward. When OBV rises (declines), it indicates not only that the price of the asset increases (decreases), but also that serious investors and large capital are involved in this movement. Moreover, price changes, even significant ones, that are not confirmed by volume have no influence on the OBV value.

3.2.4. Williams Oscillator

The Williams Oscillator expresses the deviation of the price series from its maximal value over a given period of time. It indicates moments when the market of an asset is overbought or oversold. It is defined as follows:

$$ \%R^n_t = 100 \, \frac{\mathrm{MaxP}^n_t - P_t}{\mathrm{MaxP}^n_t - \mathrm{MinP}^n_t}, \tag{18} $$

where

$$ \mathrm{MaxP}^n_t = \max\{ P_t, P_{t-1}, \ldots, P_{t-n} \} $$

and

$$ \mathrm{MinP}^n_t = \min\{ P_t, P_{t-1}, \ldots, P_{t-n} \}. $$

3.2.5. Relative Strength Index

The Relative Strength Index is a momentum indicator that can be used for buying stock shares near the bottom of a trend (Faber, 1994). It is defined as:

$$ \mathrm{RSI}^n_t = 100 - \frac{100}{1 + \mathrm{RS}^n_t}, \tag{19} $$

where $\mathrm{RS}^n_t$ is the ratio of positive to negative closes, defined by:

$$ \mathrm{RS}^n_t = \frac{\sum_{t=0}^{n} U_t}{\sum_{t=0}^{n} D_t}. \tag{20} $$

$U_t$ and $D_t$ are, respectively, the upward and downward trend indicators, defined as follows:

$$ U_t = \begin{cases} P^C_t - P^C_{t-1} & \text{if } P^C_t \ge P^C_{t-1}, \\ 0 & \text{otherwise}, \end{cases} \tag{21} $$

$$ D_t = \begin{cases} P^C_{t-1} - P^C_t & \text{if } P^C_t < P^C_{t-1}, \\ 0 & \text{otherwise}. \end{cases} \tag{22} $$

3.2.6. Standard deviation and rate of return

Because the presented model has a feature selection capability, additional simple measures are provided. If they are irrelevant, we expect that they will not be included in the training stage. The common measures of investment profitability and risk used here are the simple n-day rate of return $R^n_t$ and the standard deviation of those quantities over the specified period of time, $\sigma^n_t$.
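The indicators of Sections 3.2.1–3.2.6 can be computed from plain lists of prices. The Python sketch below is only illustrative, and several of its choices are assumptions rather than details taken from the text:

- True Range uses the common absolute-value form.
- ATR averages the last n True Range values.
- The Vortex Indicator follows the Botes and Siepman (2010) definition, which compares the current high with the previous low and vice versa.
- Williams %R and RSI use closing prices with the 0–100 scaling of Eqs. (18)–(20).
- σ is taken to be the sample standard deviation of the last n one-day returns.

Index t needs at least n earlier observations.

```python
import statistics
from typing import List, Sequence, Tuple


def true_range(high: Sequence[float], low: Sequence[float],
               close: Sequence[float], t: int) -> float:
    """True Range at t (t >= 1), common absolute-value form."""
    return max(high[t] - low[t],
               abs(high[t] - close[t - 1]),
               abs(low[t] - close[t - 1]))


def atr(high, low, close, t: int, n: int) -> float:
    """n-day arithmetic mean of True Range ending at t."""
    return sum(true_range(high, low, close, t - i) for i in range(n)) / n


def vortex(high, low, close, t: int, n: int) -> Tuple[float, float]:
    """(V+, V-) over the last n periods, Botes and Siepman (2010) style."""
    tr_sum = sum(true_range(high, low, close, t - k) for k in range(n))
    vm_plus = sum(abs(high[t - k] - low[t - k - 1]) for k in range(n))
    vm_minus = sum(abs(low[t - k] - high[t - k - 1]) for k in range(n))
    return vm_plus / tr_sum, vm_minus / tr_sum


def obv(close: Sequence[float], volume: Sequence[float]) -> List[float]:
    """On-Balance Volume series, Eq. (17), starting from OBV_0 = 0."""
    series = [0.0]
    for t in range(1, len(close)):
        if close[t] > close[t - 1]:
            series.append(series[-1] + volume[t])
        elif close[t] < close[t - 1]:
            series.append(series[-1] - volume[t])
        else:
            series.append(series[-1])
    return series


def williams_r(close: Sequence[float], t: int, n: int) -> float:
    """%R of Eq. (18) over P_t, ..., P_{t-n}; 0 means at the period high."""
    window = close[t - n:t + 1]
    hi, lo = max(window), min(window)
    # A flat window has no range; 0.0 is an arbitrary choice for that case.
    return 100.0 * (hi - close[t]) / (hi - lo) if hi > lo else 0.0


def rsi(close: Sequence[float], t: int, n: int) -> float:
    """RSI of Eqs. (19)-(22) from plain sums of the last n close changes."""
    changes = [close[t - i] - close[t - i - 1] for i in range(n)]
    up = sum(c for c in changes if c >= 0)
    down = sum(-c for c in changes if c < 0)
    if down == 0:
        return 100.0          # no losses: RS is unbounded
    return 100.0 - 100.0 / (1.0 + up / down)


def rate_of_return(close: Sequence[float], t: int, n: int) -> float:
    """Simple n-day rate of return R^n_t = P_t / P_{t-n} - 1."""
    return close[t] / close[t - n] - 1.0


def return_std(close: Sequence[float], t: int, n: int) -> float:
    """Sample std of the last n one-day returns (assumed meaning of sigma)."""
    daily = [close[t - i] / close[t - i - 1] - 1.0 for i in range(n)]
    return statistics.stdev(daily)
```

Each function returns the value for a single index t. In the experiment, these values are recomputed every day and then arranged into the delayed feature blocks of Section 3.3.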

3.3. Data preparation

When determining the best investment opportunities, technical indicators are usually analyzed as sequences of successive values. This is not always reflected in the design of trading strategies that use data mining algorithms (Teixeira, 2010). Wen, Yang, Song, and Jia (2010) presented the concept of introducing delays for each indicator, and the experiment conducted there showed interesting results. We applied this approach to each input vector. For a particular indicator

$$ f \in \{ \mathrm{ATR}^n_t,\; V^{n(+)}_t,\; V^{n(-)}_t,\; \mathrm{OBV}_t,\; \mathrm{RSI}^n_t,\; \%R^n_t,\; R^n_t,\; \sigma^n_t \}, \tag{23} $$

the training subset is defined as the following matrix:

$$ X_f = \begin{bmatrix}
x_{t-m,1} & x_{t-m-1,2} & \cdots & x_{t-m-p_f,p_f} \\
x_{t-m+1,1} & x_{t-m,2} & \cdots & x_{t-m-p_f+1,p_f} \\
\vdots & \vdots & \ddots & \vdots \\
x_{t,1} & x_{t-1,2} & \cdots & x_{t-p_f,p_f}
\end{bmatrix}, \tag{24} $$

where m denotes the optimization window length and $p_f$ is the delay defined for the feature f. To provide a unified training set for each training phase, the matrices $X_f$ are merged according to the following equation:

$$ X = \left[ X_{\mathrm{ATR}^n_t},\; X_{V^{n(+)}_t},\; X_{V^{n(-)}_t},\; X_{\mathrm{OBV}_t},\; X_{\mathrm{RSI}^n_t},\; X_{\%R^n_t},\; X_{R^n_t},\; X_{\sigma^n_t} \right]. \tag{25} $$

Because this is a classification problem, the target classes need to be properly defined. In our study, a particular example is assigned to the upward class when the d-day rate of return is greater than or equal to 0, and to the downward class otherwise. Therefore, for a particular moment t, we are trying to forecast the rate of return for the period from t + 1 to t + d. The corresponding target vector is defined as:

$$ Y = \begin{bmatrix} y_{t-m+d} \\ y_{t-m+d+1} \\ \vdots \\ y_{t+d} \end{bmatrix}, \tag{26} $$

where the i-th row of Y is the target value assigned to the i-th row of the training set X and is computed as follows:

$$ y_t = \begin{cases} 1, & \text{if } R^d_t \ge 0, \\ -1, & \text{otherwise}, \end{cases} \tag{27} $$

where d denotes the prediction horizon in days.

In each iteration, the algorithm receives trend predictions and, based on them, makes investment decisions for the particular asset. The partial input vector of the VW-SVM model used for prediction is defined similarly to Eq. (24):

$$ I_f = \left[ x_{t,1} \;\; x_{t-1,2} \;\; \cdots \;\; x_{t-p_f,p_f} \right]. \tag{28} $$

In accordance with Eq. (25), the aggregated vector over all features f, which is directly used for prediction, is defined as follows:

$$ I = \left[ I_{\mathrm{ATR}^n_t},\; I_{V^{n(+)}_t},\; I_{V^{n(-)}_t},\; I_{\mathrm{OBV}_t},\; I_{\mathrm{RSI}^n_t},\; I_{\%R^n_t},\; I_{R^n_t},\; I_{\sigma^n_t} \right]. \tag{29} $$

Fig. 2. Trading strategy algorithm based on VW-SVM classifier.

3.4. Fisher score for feature selection

Many researchers emphasize the importance of feature selection for automated stock trading algorithms (Dai et al., 2012; Lee, 2009; Hsu, 2011; Kara et al., 2011). For the presented experiment, we chose the Fisher score, a feature-ranking method whose ranking function is defined as:

$$ F^f(t, j) = \frac{S_B^{f(U)}(t, j) + S_B^{f(D)}(t, j)}{S_W^{f(U)}(t, j) + S_W^{f(D)}(t, j)}, \tag{30} $$

where

$$ S_B^{f(T)}(t, j) = \left( \bar{x}^{f(T)}_{t,j} - \bar{x}^{f}_{t,j} \right)^2, \tag{31} $$

$$ S_W^{f(T)}(t, j) = \frac{1}{m_T - 1} \sum_{k=1}^{m_T} \left( x^{f(T)}_{k,j} - \bar{x}^{f(T)}_{t,j} \right)^2, \tag{32} $$

where $T \in \{U, D\}$ denotes the target class, either the upward or the downward trend. The values $\bar{x}^{f(T)}_{t,j}$ and $\bar{x}^{f}_{t,j}$ are the averages of indicator f with a delay of j periods over all m periods for the moment t. The first is computed over the examples of class T only, and the second over all examples within the optimization window. $m_T$ is the number of examples within $X_f$ that belong to class T.

The numerator in Eq. (30) measures the discrimination between the two classes, and the denominator measures the variability within each class. According to Chen and Lin (2006), the main disadvantage of the Fisher score is that it does not take into account mutual information between features.
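The delayed feature block $X_f$ (Eq. (24)), the class labels of Eq. (27) and the Fisher score of Eqs. (30)–(32) can be sketched in a few lines of Python. This is an illustrative reading, not the author's code. Column $j$ is assumed to hold the indicator delayed by $j-1$ periods, so $p_f = 1$ means no delay. The label of row $s$ is the sign of the $d$-day return from $s$ to $s+d$. The helper names `delayed_block`, `direction_labels` and `fisher_score` are hypothetical.

```python
import statistics
from typing import List, Sequence


def delayed_block(x: Sequence[float], p_f: int, start: int,
                  end: int) -> List[List[float]]:
    """Rows s = start..end of X_f: [x_s, x_{s-1}, ..., x_{s-p_f+1}]."""
    return [[x[s - j] for j in range(p_f)] for s in range(start, end + 1)]


def direction_labels(close: Sequence[float], start: int, end: int,
                     d: int) -> List[int]:
    """+1 if the d-day return from s to s+d is >= 0, else -1 (cf. Eq. (27))."""
    return [1 if close[s + d] / close[s] - 1.0 >= 0 else -1
            for s in range(start, end + 1)]


def fisher_score(column: Sequence[float], labels: Sequence[int]) -> float:
    """Fisher score of one feature column for classes U (+1) and D (-1)."""
    up = [v for v, lab in zip(column, labels) if lab == 1]
    down = [v for v, lab in zip(column, labels) if lab == -1]
    mean_all = statistics.fmean(column)
    # Numerator of Eq. (30): S_B^U + S_B^D, Eq. (31).
    between = sum((statistics.fmean(g) - mean_all) ** 2 for g in (up, down))
    # Denominator: S_W^U + S_W^D; variance() divides by (m_T - 1), Eq. (32).
    within = statistics.variance(up) + statistics.variance(down)
    return between / within
```

Features can then be ranked by this score, and only those above a chosen threshold are kept for training. Each class needs at least two examples for the sample variance to be defined.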

To reduce the risk of choosing a wrong set of attributes, a slightly modified version of the procedure described in Chen and Lin (2006) was applied. Each training set X was split into a further training set and a validation set. Then, ten random thresholds were chosen for the previously computed Fisher score values of the features. For each threshold, a new predictor was trained on the training subset using the corresponding feature subset. Finally, the predictor with the lowest validation error was selected for further analysis.

Table 1
Testing configurations. EW and FS denote example weighting and feature selection, respectively; "+" indicates that EW or FS is activated, and "–" that it is not.

Configuration   p_f   m     l    EW   FS
Conf. 1         1     100   10   –    –
Conf. 2         1     100   10   –    +
Conf. 3         1     100   10   +    –
Conf. 4         1     100   10   +    +
Conf. 5         1     500   50   –    –
Conf. 6         1     500   50   –    +
Conf. 7         1     500   50   +    –
Conf. 8         1     500   50   +    +
Conf. 9         5     500   50   +    +

3.5. Scaling procedure

The RSI and Williams %R indicators both take values in the range 0–100, whereas R, being a plain rate of return, usually varies from −5% to 5%. This range is dynamic, however, and may differ between walk-forward iterations. According to Theodoridis and Koutroumbas (2008) and Hsu, Chang, and Lin (2010), scaling the input data increases the accuracy of an SVM classifier. It prevents attributes with smaller ranges from being dominated by those with larger values. For each feature $x_i$ from the input vector, the following scaling to the range $[L, U]$, where $L, U \in \mathbb{R}$ and $L < U$, is applied:

$$ x'^{f}_{tj} = L + (U - L) \, \frac{x^{f}_{tj} - m^{f}_{j}}{M^{f}_{j} - m^{f}_{j}}, \tag{33} $$

Table 2
Backtesting results for Configurations 1–4.

Stock Conf. 1 Conf. 2 Conf. 3 Conf. 4 Ref


R [%] DD [%] R [%] DD [%] R [%] DD [%] R [%] DD [%] R [%] DD [%]
ACT 6.17 61.48 95.59 48.76 32.16 57.81 43.50 64.94 295.15 57.73
AVB 48.75 57.82 48.26 69.01 23.73 48.35 35.85 56.12 219.74 72.07
CCI 86.08 46.49 177.07 56.45 310.93 25.33 336.83 47.05 882.37 76.16
GE 79.92 86.60 75.55 82.11 54.76 72.68 35.88 53.76 7.67 84.19
JPM 43.00 56.54 61.95 83.02 39.20 55.45 60.05 79.65 69.33 70.11
L 46.61 52.66 62.86 33.29 16.17 42.56 93.31 30.41 220.79 66.01
MCD 44.34 45.83 37.51 29.39 38.29 49.65 118.64 23.19 432.30 22.88
QCOM 8.85 58.17 15.57 68.83 73.31 51.96 37.93 53.48 332.92 48.18
TER 10.83 79.02 53.66 80.87 52.47 79.84 40.58 67.90 5.40 90.32
USB 42.19 61.00 7.45 61.51 32.11 52.67 37.07 68.32 63.69 76.78
ALL 79.23 86.17 46.13 78.91 78.73 88.13 52.82 81.80 52.46 78.58
BMS 21.47 53.57 33.58 58.07 16.84 52.31 25.85 49.84 71.45 52.83
EBAY 34.81 75.34 14.65 71.71 12.58 74.13 7.21 76.40 101.61 82.56
HOT 72.68 77.02 57.17 79.54 54.13 68.03 33.45 68.50 212.67 87.32
LNC 94.43 97.04 76.96 95.17 89.02 94.76 37.69 85.11 36.93 93.27
MAC 84.19 95.64 45.33 88.40 81.05 95.66 62.96 81.08 80.08 94.39
NWL 79.23 88.59 45.62 66.01 59.43 79.96 24.29 65.12 1.40 85.79
SYK 17.65 63.13 71.15 39.39 56.07 61.22 22.23 56.56 119.70 59.22
TSO 136.36 90.55 713.46 62.45 464.85 85.15 669.16 71.38 1246.63 89.45
YHOO 63.68 82.50 2.36 68.74 55.35 76.90 43.11 83.89 127.29 79.39
Mean 23.07 70.76 30.37 66.08 20.15 65.63 51.12 63.22 228.21 73.36

Table 3
Backtesting results for Configurations 5–8.

Stock Conf. 5 Conf. 6 Conf. 7 Conf. 8 Ref


R [%] DD [%] R [%] DD [%] R [%] DD [%] R [%] DD [%] R [%] DD [%]
ACT 29.76 56.85 233.09 46.10 174.14 52.75 125.76 57.96 355.99 43.29
AVB 10.89 51.21 56.40 54.16 154.07 28.17 206.98 26.57 84.08 72.07
CCI 0.47 75.10 202.31 77.57 69.31 68.07 171.67 55.54 357.20 76.16
GE 57.97 74.66 36.42 61.42 15.73 38.52 10.72 30.14 28.52 84.19
JPM 15.28 55.45 38.92 66.49 2.89 48.83 29.73 51.57 39.12 70.11
L 23.25 61.23 59.67 54.10 150.63 26.12 151.98 26.51 111.04 66.01
MCD 16.42 25.08 89.59 22.85 102.36 21.81 137.99 22.86 194.21 22.88
QCOM 53.14 37.15 187.89 36.83 152.40 36.77 207.79 36.89 58.31 48.18
TER 13.16 71.16 81.58 61.95 26.48 57.81 13.09 58.60 3.48 83.77
USB 49.48 76.19 0.64 58.37 3.54 37.02 52.73 29.41 21.71 76.78
ALL 41.39 75.50 31.31 67.49 18.72 63.80 22.71 68.87 5.84 78.58
BMS 33.17 52.99 30.85 58.75 36.72 52.39 5.52 47.67 36.06 52.83
EBAY 64.65 79.96 24.66 64.95 49.24 76.81 22.42 59.52 7.96 82.56
HOT 65.96 83.85 51.74 78.07 16.93 59.28 66.64 43.53 48.68 87.32
LNC 53.92 88.67 36.84 88.45 15.28 61.86 62.72 55.66 2.51 93.27
MAC 73.72 89.31 16.65 52.83 65.41 35.92 101.76 28.31 1.52 94.39
NWL 28.14 70.86 18.23 45.09 56.54 48.11 42.15 48.77 21.26 85.79
SYK 21.72 52.07 32.82 62.85 0.02 54.89 6.96 43.13 55.40 59.22
TSO 29.34 69.31 50.74 77.67 112.30 65.66 451.12 54.18 215.81 89.45
YHOO 52.70 75.99 2.52 59.67 8.92 59.40 97.91 41.75 9.75 79.39
Mean 15.60 66.13 35.79 59.78 51.16 49.70 92.93 44.37 78.05 72.31

where $M^f_j$ and $m^f_j$ are, respectively, the j-th elements of the maximum and minimum vectors computed over the training set:

$$ M^f = \left[ \max_i\left(x^f_{i1}\right) \;\; \max_i\left(x^f_{i2}\right) \;\; \cdots \;\; \max_i\left(x^f_{ip_f}\right) \right] \tag{34} $$

and

$$ m^f = \left[ \min_i\left(x^f_{i1}\right) \;\; \min_i\left(x^f_{i2}\right) \;\; \cdots \;\; \min_i\left(x^f_{ip_f}\right) \right]. \tag{35} $$

3.6. Trading strategy

To verify whether the presented VW-SVM classifier can successfully forecast future trends on the stock market, we constructed the trading strategy presented in Fig. 2. It uses the symbols defined in Section 3.3. The main loop iterates over all days, simulating the behavior of the SE module described in Section 3.1. In every iteration, a new value of each indicator f is computed and added at the beginning of the input vector $I_f$. To keep the vector's size constant, its last element is removed. The new value is also added as the first element, with the appropriate shift, to the matrix $X_f$. Every l quotations, the classifier is retrained on the current training data stored in matrix X. During this iteration and the next $l - 1$ iterations, this model is used to predict the forthcoming trend.

4. Empirical study

For the presented experiment, twenty stocks were randomly selected to evaluate the performance of the trading strategy presented in Section 3.6. The market data consisted of daily quotations from January 1, 2003 to October 21, 2013.

Several configurations were tested; all are presented in Table 1. To investigate the influence of both example weighting (EW) and feature selection (FS), we ran tests in all possible configurations, using one, both or neither of them. The impact of extending the optimization and trading windows was also examined. For the first four configurations, we set them to 100 and 10 days, respectively, and for the next four to 500 and 50 days. Finally, we also conducted a separate test to verify the effect of the delay factor $p_f$ introduced in Section 3.3.
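The walk-forward schedule and the min-max scaling of Eqs. (33)–(35) fit together naturally. In the schedule, the model is optimized on m samples, trades the next l samples, and both windows then shift forward by l. The scaling bounds come from the training window only. The following Python sketch is a hypothetical illustration of both steps. The function names and the default range $[L, U] = [-1, 1]$ are assumptions, not values given in the text.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward(n_samples: int, m: int, l: int) -> Iterator[Tuple[range, range]]:
    """Yield (optimization window, trading window) index ranges.

    Each model is fitted on m consecutive samples and trades the next l
    samples; both windows then shift forward by l. Every traded index lies
    after the whole optimization window used for that model.
    """
    start = 0
    while start + m < n_samples:
        train = range(start, start + m)
        trade = range(start + m, min(start + m + l, n_samples))
        yield train, trade
        start += l


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Column-wise minima m^f and maxima M^f over training rows (Eqs. (34)-(35))."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale(row: Sequence[float], mins: Sequence[float], maxs: Sequence[float],
          lower: float = -1.0, upper: float = 1.0) -> List[float]:
    """Map values to [lower, upper] with training bounds, as in Eq. (33).

    Values from the trading window may fall outside [lower, upper]; they are
    not clipped here.
    """
    out = []
    for x, lo, hi in zip(row, mins, maxs):
        span = hi - lo
        if span:
            out.append(lower + (upper - lower) * (x - lo) / span)
        else:
            # A constant training column carries no information: use the midpoint.
            out.append((lower + upper) / 2)
    return out
```

With m = 100 and l = 10 (Configurations 1–4), each retraining fits `fit_min_max` on the 100 training rows only. Those bounds are then reused to scale the 10 rows that are traded.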

[Plots: Equity vs. year (2006–2014) for GE, MAC, SYK and YHOO]
Fig. 3. Trading results sample for Conf. 4. Black – plain stock performance, gray – strategy performance.
Fig. 4. Trading results sample for Conf. 8. Black – plain stock performance, gray – strategy performance.

Overall, 360 different experiments were conducted on a data set consisting of more than 50,000 quotations.

The results for Configurations 1–4 are presented in Table 2, and those for Configurations 5–8 in Table 3. Two measures were used to evaluate the performance of each experiment: the rate of return (R) over the whole data set and the maximum drawdown (DD) of a particular strategy. Investors often use maximum drawdown as a risk measure. It is preferred over other measures mainly because, as long as it stays within a certain a priori defined range, it indicates that the strategy is still reliable. Otherwise, it may be a serious signal of strategy deterioration, for example as a result of a market regime switch (Magdon-Ismail, Atiya, Pratap, & Abu-Mostafa, 2003). Both tables also report the performance of a plain investment in the particular stock (column Ref). This reference assumes buying the stock at the beginning of the testing period and selling it at the end.

The results in Table 2 show that the plain SVM classifier (Conf. 1) is not able to forecast short-term trends. In this scenario, the trading strategy lost 23.07% on average, with a drawdown of 70.76%. From an investor's perspective, this is an unacceptable result, which could lead to bankruptcy. The strategy performs much better in Configurations 2 and 3, where the mean rate of return over all stocks is greater than 0. The average drawdown remains significant but is slightly better than for Configuration 1. Moreover, these two experiments showed that each extension to the basic classifier, applied on its own, improves the performance of the proposed strategy. Separately, adding example weighting raises the average rate of return by 43.22%, and applying feature selection with the Fisher score raises it by 53.44%. When the two are combined (Conf. 4), the overall improvement in the rate of return is 75.19%, and maximum drawdown decreases on average by 7.74%. Sample results for this case are presented in Fig. 3. These results clearly indicated that the proposed model has some potential, though we were not convinced that it could be used as a successful trading strategy.

After analyzing the classification accuracy for Configurations 1–4, we decided to extend the optimization moving window to 500 days. Accordingly, we increased the trading moving window to 50 days. This second change was made primarily because each optimization phase took much longer, as it operated on a data set five times larger than before. It also preserves the ratio between the trading and optimization window sizes, so that we could investigate solely the effect of enlarging the training data set. The relation between the lengths of the two windows, and its influence on strategy performance, is certainly worth examining, but it is not covered in this work.

The results presented in Table 3 for Configuration 5 show only a minor increase in the rate of return compared to Configuration 1, and the return remained negative. Given the findings on the influence of example weighting and feature selection, we decided to check their performance with the enlarged training data sets. As expected, the experiment showed that each of them, applied separately, increases the average rate of return. Moreover, applying them together (Conf. 8) produced a synergy effect: a significant increase in the rate of return and a corresponding decrease in the average maximum drawdown. Sample results are presented in Fig. 4. This configuration reveals an important property of the proposed model. For three out of four samples, the strategy decided to stay out of the market at the beginning of a downward trend in the underlying asset. As a result, it obtained much better results than the buy-and-hold strategy, in terms of both the rate of return and maximum drawdown.

Table 4
Backtesting results for Configuration 9.

Stock    R [%]      DD [%]
ACT      423.89     40.64
AVB      134.37     24.00
CCI      497.46     28.58
GE       1.01       31.64
JPM      13.62      41.18
L        160.72     26.10
MCD      140.20     22.97
QCOM     68.94      49.14
TER      201.91     53.01
USB      49.20      31.80
ALL      16.59      46.74
BMS      30.89      28.56
EBAY     18.74      54.02
HOT      54.19      43.14
LNC      70.76      57.10
MAC      48.58      31.09
NWL      −8.98      54.05
SYK      77.29      42.25
TSO      1299.46    36.90
YHOO     75.45      42.69
Mean     168.71     39.28

[Plots: Equity vs. year (2006–2014) for GE, MAC, SYK and YHOO]
Fig. 5. Trading results sample for Conf. 9. Black – plain stock performance, gray – strategy performance.
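The two evaluation measures used in Section 4 are the rate of return R over the whole test and the maximum drawdown DD. Both can be computed from an equity curve as in the sketch below. It uses the usual definitions, with drawdown measured as the percentage drop from the running equity peak. The text does not give the exact formulas used in the backtesting system, so treat these as assumptions. The buy-and-hold reference (Ref) corresponds to passing the stock's price series instead of the strategy's equity.

```python
from typing import Sequence


def total_return_pct(equity: Sequence[float]) -> float:
    """Rate of return in percent between the first and last equity value."""
    return 100.0 * (equity[-1] / equity[0] - 1.0)


def max_drawdown_pct(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, in percent of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)          # highest equity seen so far
        worst = max(worst, (peak - value) / peak)
    return 100.0 * worst
```

For instance, the equity path 100, 120, 90, 130, 65 has R = −35% and DD = 50%, measured from the peak of 130 down to 65. The earlier dip from 120 to 90 (25%) is smaller.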
1804 _
K. Zbikowski / Expert Systems with Applications 42 (2015) 1797–1805

much better results in terms of both the rate of return and maximum drawdown than the buy-and-hold strategy.

Outcomes of Configurations 1 to 8 lead to the conclusion that enlarging the training sample, combined with feature selection and with incorporating volume into the SVM penalty function, results in a significant performance improvement of the proposed trading strategy. For this reason, we conducted another experiment in which we introduced feature delays, as described in Section 3.3. Of all the presented attribute subsets, the most promising is Configuration 9. In this configuration, we set the delay parameter $p_f$ to 5 days for every feature $f$ in (23), and the remaining parameters were the same as in Configuration 8. Results for Configuration 9 are given in Table 4, and corresponding samples of trading simulations are presented in Fig. 5. They show a further improvement in the performance of the proposed strategy. The average rate of return is 168.71%. In only one of the twenty simulations did the strategy fail to obtain a positive outcome. Overall return still remains worse than for the buy-and-hold strategy, but there is a significant decrease in maximum drawdown, which is 34.08 better. The proposed trading algorithm achieves better results in terms of the return-to-risk ratio. Compared to previous results, the strategy in this configuration forecasts downward trends even better. The presented strategy reaches a considerable rate of return and keeps risk at an acceptable level.

It should be noted that applying the walk-forward procedure results in reusing some subsets of previously seen data in subsequent optimizations. This is quite a computationally expensive task. The presented attribute subsets should be treated as a general direction for further research, and additional extensive tests should be conducted in order to determine the precise values that maximize the expected rate of return.

5. Conclusion

The aim of this study was to determine whether a modified SVM classifier with volume-based example weighting can be successfully applied to predicting short-term trends on the stock market, which can lead to constructing a profitable trading strategy. We also examined the influence of applying Fisher's feature selection method. Further, in order to boost trading results, we extended the training vector by introducing delays of the calculated technical indicators. All tests were conducted in accordance with the walk-forward methodology for analyzing financial time series.

Overall, the study showed that a plain SVM classifier has limited capability for accurate trend detection and does not perform well with the proposed trading strategy. However, the proposed improvements, such as feature selection and example weighting, significantly enhance trading results. Moreover, enlarging the training data set and introducing delays for technical indicators led to results that were better than for the buy-and-hold strategy in terms of both rate of return and maximum drawdown. The proposed strategy has the ability to identify downward trends and, as a consequence, to suspend trading until a trend reversal occurs.

This study was inspired by Tay and Cao (2002), where the loss function was modified for SVM regression problems. We assumed that the problem can be reformulated for the purpose of classification. Moreover, significant improvements, inspired by the knowledge gained from financial market analysis, can be made in its formulation.

Two things are new in the presented study. Firstly, a novel approach, VW-SVM, which incorporates volume information in its loss function, is applied to constructing a trading strategy. Secondly, several robust methods were used together for the first time: Fisher's feature selection, VW-SVM, input vector delays and technical indicators were employed in combination with a walk-forward optimization procedure. This study also produces a novel architectural framework for the purpose of back-testing trading strategies.

Despite the performance of the proposed model, further research is needed. Further experiments should be conducted in order to verify whether other feature selection methods can improve strategy performance. One possible research direction is to apply Random Forests for the purpose of determining feature importance (Breiman, 2001). An example of such an application is described by Chen and Lin (2006).

Another area worth further exploration is the form of the weighting function. It should be investigated whether there exists any other weighting function superior to the one presented in this study. An approach in which volatility, instead of plain volume information, would be incorporated as the predictive factor is one of the possible extensions.

From the perspective of applications, it is worth examining whether the proposed algorithm can be deployed on other markets as well as with different financial instruments. It would also be interesting to expand the training data set beyond technical analysis indicators and statistical measures and to include factors from the economic surroundings of a particular stock.

Last but not least, a similar family of weighting functions can be proposed for other machine learning algorithms, such as neural networks or fuzzy sets. It would be appealing to examine their capabilities to predict financial trends and compare them with VW-SVM.

References

Botes, E., & Siepman, D. (2010). The vortex indicator. Technical Analysis of Stocks & Commodities, 28, 20–30.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Chavarnakul, T., & Enke, D. (2008). Intelligent technical analysis based equivolume charting for stock trading using neural networks. Expert Systems with Applications, 34, 1004–1017.
Chen, Y.-W., & Lin, C.-J. (2006). Combining SVMs with various feature selection strategies. In Feature extraction (pp. 315–324). <http://www.csie.ntu.edu.tw/~cjlin/papers/features.pdf>.
Chu, H.-H., Chen, T.-L., Cheng, C.-H., & Huang, C.-C. (2009). Fuzzy dual-factor time-series for stock index forecasting. Expert Systems with Applications, 36, 165–171. http://dx.doi.org/10.1016/j.eswa.2007.09.037.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Dai, W., Shao, Y. E., & Lu, C.-J. (2012). Incorporating feature selection method into support vector regression for stock index forecasting. Neural Computing and Applications, 23, 1551–1561.
Davies, D. (2004). Daytrading with on-balance volume. Stocks & Commodities, 22, 20–25.
Faber, B. (1994). The relative strength index. Stocks & Commodities, 12, 381–384.
Gustafson, G. (2001). Which volatility measure? Stocks & Commodities, 19, 46–50.
Hammerbacher, I. M., & Yager, R. R. (1981). The personalization of security selection: An application of fuzzy set theory. Fuzzy Sets and Systems, 5, 1–9.
Hsu, C.-M. (2011). A hybrid procedure with feature selection for resolving stock/futures price forecasting problems. Neural Computing and Applications, 22, 651–671.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2010). A practical guide to support vector classification (pp. 1–16). <https://www.cs.sfu.ca/people/Faculty/teaching/726/spring11/svmguide.pdf>.
Huang, C.-F. (2012). A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing, 12, 807–818. http://dx.doi.org/10.1016/j.asoc.2011.10.009.
Kara, Y., Acar Boyacioglu, M., & Baykan, O. K. (2011). Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange. Expert Systems with Applications, 38, 5311–5319.
Ladyzynski, P., Zbikowski, K., & Grzegorzewski, P. (2013). Stock trading with random forests, trend detection tests and force index volume indicators. Artificial Intelligence and Soft Computing, Lecture Notes in Computer Science, 7895, 441–452.
Lee, M.-C. (2009). Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, 36, 10896–10904.
Magdon-Ismail, M., Atiya, A., Pratap, A., & Abu-Mostafa, Y. (2003). The maximum drawdown of the Brownian motion. In 2003 IEEE international conference on computational intelligence for financial engineering, proceedings.
Ng, W. W., Liang, X.-L., Li, J., Yeung, D. S., & Chan, P. P. (2014). LG-Trader: Stock trading decision support based on feature selection by weighted localized generalization error model. Neurocomputing, 146, 104–112. http://dx.doi.org/10.1016/j.neucom.2014.04.066.
Refenes, A. N., Bentz, Y., Bunn, D. W., Burgess, A. N., & Zapranis, A. D. (1997). Financial time series modelling with discounted least squares backpropagation. Neurocomputing, 14, 123–138.
Rosillo, R., Giner, J., & Fuente, D. (2013). The effectiveness of the combined use of VIX and support vector machines on the prediction of S&P 500. Neural Computing and Applications, 25, 321–332. http://dx.doi.org/10.1007/s00521-013-1487-7.
Tay, F. E. H., & Cao, L. J. (2002). Modified support vector machines in financial time series forecasting. Neurocomputing, 48, 847–861.
Teixeira, L. A. (2010). A method for automatic stock trading combining technical analysis and nearest neighbor classification. Expert Systems with Applications, 37, 6885–6890.
Theodoridis, S., & Koutroumbas, K. (2008). Pattern recognition. Academic Press.
Wen, Q., Yang, Z., Song, Y., & Jia, P. (2010). Automatic stock decision support system based on box theory and SVM algorithm. Expert Systems with Applications, 37, 1015–1022.
Wong, F. S. (1991). Time series forecasting using backpropagation neural networks. Neurocomputing, 2, 147–159.
Zbikowski, K. (2014). Time series forecasting with volume weighted support vector machines. Communications in Computer and Information Science, 424, 250–258.
Zhang, X., Hu, Y., Xie, K., Wang, S., Ngai, E., & Liu, M. (2014). A causal feature selection algorithm for stock prediction modeling. Neurocomputing, 142, 48–59. http://dx.doi.org/10.1016/j.neucom.2014.01.057.
