Data in Brief
Data Article
http://dx.doi.org/10.1016/j.dib.2016.12.044
2352-3409/© 2017 The Author. Published by Elsevier Inc. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
570 R. Bruni / Data in Brief 10 (2017) 569–575
Specifications Table
The datasets are taken from major real-world financial markets and are very recent, ranging from 20th
April 2010 to 12th July 2016.
The datasets contain a vast selection of financial indicators regarded as highly trend indicative by
technical analysis.
The datasets are filtered and cleaned to remove data errors and missing values.
These datasets can be used as benchmarks by researchers wishing to test trading algorithms on recent
real-world data.
These datasets can also be used as benchmarks to test classification strategies on difficult, publicly
available data.
1. Data
We provide daily time series for two major indices belonging to two different stock markets. The
first one is the Standard & Poor's 500 (S&P 500), which is an American stock market index based on
the market capitalizations of 500 large companies having common stock listed on the NYSE or
NASDAQ. This is one of the most commonly followed equity indices, and many consider it one of the
best representations of the U.S. stock market. The second is the Financial Times Stock Exchange
Milano Indice di Borsa (FTSE MIB), which is the primary benchmark index for the Italian equity
markets. It consists of the 40 most-traded stock classes on the exchange, and captures approximately
80% of the domestic market capitalization. For these indices, for each trading day ranging from 20th
April 2010 to 12th July 2016, we provide the opening price, closing price, maximum, minimum, and a
number of indicators regarded as highly trend indicative by technical analysis (see, e.g., [1–3] and
references therein), as described in more detail in the next section. Each data record corresponds to
one day.
We also provide a binary classification for each day: the class is 1 if the subsequent time period is
favorable for day trading and 0 otherwise. Data are filtered to detect and correct missing or inaccurate
values. Indicators computed from the $n$ past observations are available only from the $(n+1)$-th
record of the dataset, and the class is not available for the last record. These missing data are
encoded as ‘N’; no other missing data appear in the dataset. Data cleaning is indeed an important
issue for data of this kind (see, e.g., [4] for references on this widespread problem). The data provided can
be used to test the effectiveness of technical analysis in predicting the trend, or to test the accuracy of
classification algorithms.
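A minimal sketch of reading one of the CSV files in Python while mapping the ‘N’ placeholder to `None`, using only the standard library. The comma delimiter and the column names in the usage example (`Date`, `Close`, `EMA12`) are hypothetical and should be checked against the actual files.

```python
import csv
import io
from typing import Dict, List, Union

Value = Union[str, float, None]


def parse_value(field: str) -> Value:
    """Convert one CSV field: 'N' marks missing data, numbers become floats."""
    field = field.strip()
    if field == "N":
        return None
    try:
        return float(field)
    except ValueError:
        # Non-numeric fields (e.g. the date) are kept as strings.
        return field


def load_records(text: str, delimiter: str = ",") -> List[Dict[str, Value]]:
    """Parse CSV text into one dict per daily record (the delimiter is an assumption)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{key: parse_value(val) for key, val in row.items()} for row in reader]


# Hypothetical excerpt: the first EMA value is not yet available.
sample = "Date,Close,EMA12\n2010-04-20,100.5,N\n2010-04-21,101.0,100.9\n"
records = load_records(sample)
print(records[0])  # {'Date': '2010-04-20', 'Close': 100.5, 'EMA12': None}
```

For a real file, the text can be read with `open(path, newline="")` and passed to `load_records`.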
Each data record refers to a single trading day, indicated by the subscript $t \in \{1, \dots, m\}$.
Each data record is identified by its date and contains the following values:
After this, we compute the indicators described below. For each of them, the current value is
denoted by the subscript $t$, the previous one by $t-1$, and so on.
2.1. Momentum
2.2. EMA
Moving averages are widely used for the analysis of time series. A simple moving average (SMA) is
the unweighted mean of the previous $n$ values of the price series, most often the closing price.
A weighted moving average (WMA) applies multiplying factors to give different weights to different
prices; usually, recent prices receive more importance than older ones. In particular, an exponential
moving average (EMA) applies weighting factors that decrease exponentially into the past without
ever reaching zero. $\mathrm{EMA}_t(n)$ is computed from the current closing price $c_t$ and the EMA of the
previous day, $\mathrm{EMA}_{t-1}(n)$:
$$\mathrm{EMA}_t(n) = \frac{2}{n+1}\,c_t + \left(1 - \frac{2}{n+1}\right)\mathrm{EMA}_{t-1}(n) = \frac{2}{n+1}\left(c_t - \mathrm{EMA}_{t-1}(n)\right) + \mathrm{EMA}_{t-1}(n)$$
In our case we use $n = 12$ and also $n = 26$.
Range: EMA has the same range as the price of the asset; in general, it can take any positive real value.
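The recursion above can be sketched in Python as follows. The text does not say how the first EMA value is initialized; seeding it with the simple average of the first $n$ available values, as done here, is a common convention and an assumption of this sketch. Entries that are not yet available are returned as `None`, mirroring the ‘N’ placeholder.

```python
from typing import List, Optional, Sequence


def ema(values: Sequence[Optional[float]], n: int) -> List[Optional[float]]:
    """EMA with smoothing factor 2/(n+1); None marks unavailable entries.

    Leading None values are skipped, and no gaps are expected afterwards.
    The seed (simple average of the first n available values) is an
    assumption, not taken from the dataset description.
    """
    out: List[Optional[float]] = [None] * len(values)
    alpha = 2.0 / (n + 1)
    start = next((i for i, v in enumerate(values) if v is not None), len(values))
    seed = start + n - 1
    if seed >= len(values):
        return out  # not enough data to produce any value
    prev = sum(values[start:seed + 1]) / n
    out[seed] = prev
    for t in range(seed + 1, len(values)):
        # EMA_t(n) = 2/(n+1) * (c_t - EMA_{t-1}(n)) + EMA_{t-1}(n)
        prev = alpha * (values[t] - prev) + prev
        out[t] = prev
    return out


print(ema([1.0, 2.0, 3.0, 4.0, 5.0], 3))  # [None, None, 2.0, 3.0, 4.0]
```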
2.3. MACD
Moving Average Convergence/Divergence (MACD) is an oscillator that should reveal changes in the
strength, direction, momentum, and duration of a trend in a stock's price. The simplest version of
MACD is the difference between two moving averages, one over a shorter period n and one over a
longer period m.
$$\mathrm{MACD}_t(n, m) = \mathrm{EMA}_t(n) - \mathrm{EMA}_t(m)$$
Further insight can be obtained by using a third moving average, of $\mathrm{MACD}(n, m)$ itself over a
period $s$, called the "signal line" $\mathrm{SL}(s)$. When $\mathrm{MACD}(n, m)$ increases and crosses the signal line, it is a bullish
signal; when it decreases and crosses the signal line, it is a bearish signal.
$$\mathrm{MACD}_t(n, m, s) = \left(\mathrm{EMA}_t(n) - \mathrm{EMA}_t(m)\right) - \mathrm{SL}_t(s)$$
In our case we use $n = 12$, $m = 26$ and $s = 9$.
Range: MACD can take any real value, either positive or negative. Positive values denote that the
index trend is increasing, and vice versa.
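A sketch of $\mathrm{MACD}(n, m)$ and of the signal-line difference $\mathrm{MACD}(n, m, s)$, reusing the EMA recursion of Section 2.2 (again seeded, by assumption, with a simple average). The three returned lists are the MACD line, the signal line $\mathrm{SL}(s)$, and their difference; the helper names are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Series = List[Optional[float]]


def ema(values: Sequence[Optional[float]], n: int) -> Series:
    """EMA with factor 2/(n+1), seeded with a simple average (assumption)."""
    out: Series = [None] * len(values)
    start = next((i for i, v in enumerate(values) if v is not None), len(values))
    seed = start + n - 1
    if seed >= len(values):
        return out
    prev = sum(values[start:seed + 1]) / n
    out[seed] = prev
    for t in range(seed + 1, len(values)):
        prev = 2.0 / (n + 1) * (values[t] - prev) + prev
        out[t] = prev
    return out


def diff(a: Series, b: Series) -> Series:
    """Element-wise a - b, None where either operand is unavailable."""
    return [x - y if x is not None and y is not None else None for x, y in zip(a, b)]


def macd(closes: Sequence[float], n: int = 12, m: int = 26,
         s: int = 9) -> Tuple[Series, Series, Series]:
    line = diff(ema(closes, n), ema(closes, m))  # MACD_t(n, m)
    signal = ema(line, s)                         # SL_t(s): EMA of the MACD line
    return line, signal, diff(line, signal)       # last item: MACD_t(n, m, s)


# On a steadily rising series c_t = t, both EMAs lag by a constant amount.
line, signal, hist = macd([float(t) for t in range(60)])
print(line[-1], hist[-1])  # about 7.0 and about 0.0
```

The linear example illustrates the bullish reading: the fast EMA stays above the slow one, so the MACD line is positive, while the MACD line itself is flat and therefore coincides with its signal line.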
2.4. ROI
Return on Investment (ROI) is one way of considering profits in relation to capital invested. Usually
it is the ratio between return and invested capital. In our case, we use the average return over the last
$n$ days, denoted by $\mathrm{aver}(r_t, r_{t-1}, \dots, r_{t-n+1})$, and the current closing value:

$$\mathrm{ROI}_t(n) = \frac{\mathrm{aver}(r_t, r_{t-1}, \dots, r_{t-n+1})}{c_t}$$
In our case we use $n = 10$, 20 and 30.
Range: ROI can take any real value, either positive or negative. Positive values denote income,
negative ones denote loss.
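A direct transcription of the ROI formula. Since the daily return $r_t$ is not defined in this section, the sketch simply takes the returns as an input list alongside the closing prices; the function and parameter names are illustrative.

```python
from typing import List, Optional, Sequence


def roi(returns: Sequence[Optional[float]], closes: Sequence[float],
        n: int) -> List[Optional[float]]:
    """ROI_t(n) = aver(r_t, ..., r_{t-n+1}) / c_t; None where the window is incomplete."""
    out: List[Optional[float]] = [None] * len(closes)
    for t in range(n - 1, len(closes)):
        window = returns[t - n + 1:t + 1]
        if any(r is None for r in window):
            continue  # some return in the window is unavailable
        out[t] = (sum(window) / n) / closes[t]
    return out


print(roi([1.0, 2.0, 3.0], [10.0, 10.0, 20.0], 2))  # [None, 0.15, 0.125]
```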
2.5. RSI
Relative Strength index (RSI) is a momentum oscillator that compares the magnitude of recent
gains and losses over a specified time period to measure speed and change of price movements of a
security. By defining the upward change as $u_t = c_t - c_{t-1}$ if $c_t > c_{t-1}$ and 0 otherwise, and the
downward change as $d_t = c_{t-1} - c_t$ if $c_t < c_{t-1}$ and 0 otherwise, the relative strength $\mathrm{RS}(n)$ is the
average of the last $n$ upward changes divided by the average of the last $n$ downward changes:

$$\mathrm{RS}_t(n) = \frac{\mathrm{aver}(u_t, u_{t-1}, \dots, u_{t-n+1})}{\mathrm{aver}(d_t, d_{t-1}, \dots, d_{t-n+1})}$$
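The text above defines the relative strength $\mathrm{RS}(n)$ with simple averages. The sketch below computes it and then converts it to the usual 0 to 100 scale via $\mathrm{RSI} = 100 - 100/(1 + \mathrm{RS})$; that conversion is the standard definition of RSI and is assumed here, as are the conventions of returning 100 when the window has no losses and 50 when it is completely flat. The first value appears at the $(n+1)$-th record, since $n$ price changes are needed.

```python
from typing import List, Optional, Sequence


def rsi(closes: Sequence[float], n: int) -> List[Optional[float]]:
    """RSI from simple averages of the last n upward/downward changes."""
    size = len(closes)
    up = [0.0] * size    # u_t = c_t - c_{t-1} if c_t > c_{t-1}, else 0
    down = [0.0] * size  # d_t = c_{t-1} - c_t if c_t < c_{t-1}, else 0
    for t in range(1, size):
        change = closes[t] - closes[t - 1]
        up[t] = change if change > 0 else 0.0
        down[t] = -change if change < 0 else 0.0
    out: List[Optional[float]] = [None] * size
    for t in range(n, size):  # needs n changes, i.e. n + 1 closes
        avg_up = sum(up[t - n + 1:t + 1]) / n
        avg_down = sum(down[t - n + 1:t + 1]) / n
        if avg_down == 0.0:
            # Assumed conventions: flat window -> 50, only gains -> 100.
            out[t] = 50.0 if avg_up == 0.0 else 100.0
        else:
            rs = avg_up / avg_down  # RS_t(n)
            out[t] = 100.0 - 100.0 / (1.0 + rs)
    return out


print(rsi([1.0, 2.0, 3.0, 2.0, 3.0], 2))  # [None, None, 100.0, 50.0, 50.0]
```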
2.6. STOCHRSI
Stochastic oscillators attempt to predict price turning points by comparing the closing price of a
security to its price range. This concept can be applied to the RSI itself, obtaining the Stochastic RSI
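(SRSI). By computing the RSI range from its minimum in the last $n$ periods, $\min\{\mathrm{RSI}_t, \mathrm{RSI}_{t-1}, \dots, \mathrm{RSI}_{t-n+1}\}$,
and its maximum in the last $n$ periods, $\max\{\mathrm{RSI}_t, \mathrm{RSI}_{t-1}, \dots, \mathrm{RSI}_{t-n+1}\}$, the SRSI
is defined as follows.

$$\mathrm{SRSI}_t(n) = \frac{\mathrm{RSI}_t(n) - \min\{\mathrm{RSI}_t, \mathrm{RSI}_{t-1}, \dots, \mathrm{RSI}_{t-n+1}\}}{\max\{\mathrm{RSI}_t, \mathrm{RSI}_{t-1}, \dots, \mathrm{RSI}_{t-n+1}\} - \min\{\mathrm{RSI}_t, \mathrm{RSI}_{t-1}, \dots, \mathrm{RSI}_{t-n+1}\}}$$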
In our case we use $n = 10$, 14 and 30.
Range: SRSI ranges between 0 and 1.
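A sketch of the Stochastic RSI applied to a precomputed RSI series (which, per the formula, should be computed with the same $n$). Returning `None` when the RSI has been constant over the window, where the formula would divide by zero, is an assumption of this sketch.

```python
from typing import List, Optional, Sequence


def stoch_rsi(rsi_values: Sequence[Optional[float]], n: int) -> List[Optional[float]]:
    """SRSI_t(n) = (RSI_t - min of last n RSI) / (max - min of last n RSI)."""
    out: List[Optional[float]] = [None] * len(rsi_values)
    for t in range(n - 1, len(rsi_values)):
        window = rsi_values[t - n + 1:t + 1]
        if any(v is None for v in window):
            continue  # RSI not yet available for the whole window
        low, high = min(window), max(window)
        if high > low:
            out[t] = (rsi_values[t] - low) / (high - low)
    return out


print(stoch_rsi([None, 10.0, 20.0, 30.0, 20.0], 3))  # [None, None, None, 1.0, 0.0]
```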
2.7. ATR
Average True Range (ATR) measures the degree of price volatility. The range of a price is simply
defined as $\mathrm{max}_t - \mathrm{min}_t$; the True Range (TR) extends it to yesterday's closing price if that price was outside
today's range:

$$\mathrm{TR}_t = \max\{\mathrm{max}_t, c_{t-1}\} - \min\{\mathrm{min}_t, c_{t-1}\}$$

Now, denoting by $\mathrm{EMA}_t(n, X)$ the exponential moving average of a generic quantity $X$ over the last $n$
periods, ATR is the exponential moving average of TR:

$$\mathrm{ATR}_t(n) = \mathrm{EMA}_t(n, \mathrm{TR})$$
In our case we use $n = 14$.
Range: ATR can take any non-negative value.
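A sketch of TR and ATR built on the EMA recursion of Section 2.2 (seeded with a simple average, which is an assumption). The first TR value needs the previous close, so it is undefined on the first day.

```python
from typing import List, Optional, Sequence

Series = List[Optional[float]]


def ema(values: Sequence[Optional[float]], n: int) -> Series:
    """EMA with factor 2/(n+1), seeded with a simple average (assumption)."""
    out: Series = [None] * len(values)
    start = next((i for i, v in enumerate(values) if v is not None), len(values))
    seed = start + n - 1
    if seed >= len(values):
        return out
    prev = sum(values[start:seed + 1]) / n
    out[seed] = prev
    for t in range(seed + 1, len(values)):
        prev = 2.0 / (n + 1) * (values[t] - prev) + prev
        out[t] = prev
    return out


def true_range(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float]) -> Series:
    """TR_t = max{max_t, c_{t-1}} - min{min_t, c_{t-1}}; undefined on the first day."""
    tr: Series = [None] * len(closes)
    for t in range(1, len(closes)):
        tr[t] = max(highs[t], closes[t - 1]) - min(lows[t], closes[t - 1])
    return tr


def atr(highs: Sequence[float], lows: Sequence[float],
        closes: Sequence[float], n: int = 14) -> Series:
    """ATR_t(n) = EMA_t(n, TR)."""
    return ema(true_range(highs, lows, closes), n)


print(atr([11.0] * 5, [9.0] * 5, [10.0] * 5, 2))  # [None, None, 2.0, 2.0, 2.0]
```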
2.8. ADX
Average Directional Index (ADX) does not indicate trend direction or momentum, only trend
strength. It is computed using the positive directional indicator (+DI), the negative directional
indicator (−DI), and the Average True Range (ATR).
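By defining the upmove as $\mathrm{up}_t = \mathrm{max}_t - \mathrm{max}_{t-1}$ and the downmove as $\mathrm{dw}_t = \mathrm{min}_{t-1} - \mathrm{min}_t$,
if $\mathrm{up}_t > \mathrm{dw}_t$ and $\mathrm{up}_t > 0$ then $+\mathrm{DM}_t = \mathrm{up}_t$; otherwise $+\mathrm{DM}_t = 0$;
if $\mathrm{dw}_t > \mathrm{up}_t$ and $\mathrm{dw}_t > 0$ then $-\mathrm{DM}_t = \mathrm{dw}_t$; otherwise $-\mathrm{DM}_t = 0$.
Now, recalling that $\mathrm{EMA}_t(n, X)$ denotes the exponential moving average of $X$ over the last $n$ periods,
we compute

$$+\mathrm{DI}_t(n) = \frac{100\,\mathrm{EMA}_t(n, +\mathrm{DM})}{\mathrm{ATR}_t(n)} \qquad -\mathrm{DI}_t(n) = \frac{100\,\mathrm{EMA}_t(n, -\mathrm{DM})}{\mathrm{ATR}_t(n)}$$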
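ADX is finally computed as follows, with $\mathrm{abs}(\cdot)$ denoting the absolute value:

$$\mathrm{ADX}_t(n) = 100\,\frac{\mathrm{EMA}_t\left(n, \mathrm{abs}\left(+\mathrm{DI} - (-\mathrm{DI})\right)\right)}{+\mathrm{DI} + (-\mathrm{DI})}$$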
In our case we use $n = 14$.
Range: it ranges between 0 and 100. Generally, ADX values below 20 indicate trend weakness, and
values above 40 indicate trend strength.
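A sketch that follows the ADX formula as written above: the EMA is applied to the absolute difference of the directional indicators, and the result is divided by the current $+\mathrm{DI} + (-\mathrm{DI})$. Many charting packages instead smooth the whole ratio (often called DX); that variant is not what is implemented here. The EMA seeding and the handling of a zero ATR or a zero denominator are assumptions of this sketch.

```python
from typing import List, Optional, Sequence

Series = List[Optional[float]]


def ema(values: Sequence[Optional[float]], n: int) -> Series:
    """EMA with factor 2/(n+1), seeded with a simple average (assumption)."""
    out: Series = [None] * len(values)
    start = next((i for i, v in enumerate(values) if v is not None), len(values))
    seed = start + n - 1
    if seed >= len(values):
        return out
    prev = sum(values[start:seed + 1]) / n
    out[seed] = prev
    for t in range(seed + 1, len(values)):
        prev = 2.0 / (n + 1) * (values[t] - prev) + prev
        out[t] = prev
    return out


def adx(highs: Sequence[float], lows: Sequence[float],
        closes: Sequence[float], n: int = 14) -> Series:
    size = len(closes)
    tr: Series = [None] * size
    plus_dm: Series = [None] * size
    minus_dm: Series = [None] * size
    for t in range(1, size):
        tr[t] = max(highs[t], closes[t - 1]) - min(lows[t], closes[t - 1])
        up = highs[t] - highs[t - 1]  # upmove
        dw = lows[t - 1] - lows[t]    # downmove
        plus_dm[t] = up if up > dw and up > 0 else 0.0
        minus_dm[t] = dw if dw > up and dw > 0 else 0.0
    atr, ema_plus, ema_minus = ema(tr, n), ema(plus_dm, n), ema(minus_dm, n)

    plus_di: Series = [None] * size
    minus_di: Series = [None] * size
    abs_diff: Series = [None] * size
    for t in range(size):
        if atr[t] is None:
            continue
        if atr[t] == 0.0:
            plus_di[t] = minus_di[t] = 0.0  # assumption: no range, no direction
        else:
            plus_di[t] = 100.0 * ema_plus[t] / atr[t]
            minus_di[t] = 100.0 * ema_minus[t] / atr[t]
        abs_diff[t] = abs(plus_di[t] - minus_di[t])

    smoothed = ema(abs_diff, n)  # EMA_t(n, abs(+DI - (-DI)))
    out: Series = [None] * size
    for t in range(size):
        if smoothed[t] is None:
            continue
        denom = plus_di[t] + minus_di[t]
        if denom > 0.0:
            out[t] = 100.0 * smoothed[t] / denom
    return out


# Steadily rising prices: every day is an upmove, so -DI stays at zero.
highs = [11.0 + i for i in range(20)]
lows = [9.0 + i for i in range(20)]
closes = [10.0 + i for i in range(20)]
print(adx(highs, lows, closes, n=3)[-1])  # 100.0
```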
2.9. Williams %R
Williams %R is an oscillator that analyzes whether a stock or commodity market is trading near the
high or the low, or somewhere in between, of its recent trading range.
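$$\%R_t(n) = 100\,\frac{\max\{\mathrm{max}_t, \mathrm{max}_{t-1}, \dots, \mathrm{max}_{t-n+1}\} - c_t}{\max\{\mathrm{max}_t, \mathrm{max}_{t-1}, \dots, \mathrm{max}_{t-n+1}\} - \min\{\mathrm{min}_t, \mathrm{min}_{t-1}, \dots, \mathrm{min}_{t-n+1}\}}$$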
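A sketch of the Williams %R formula as given above, which yields values between 0 and 100. Many charting tools multiply by −100 instead, giving values between −100 and 0, so the sign convention should be checked against the data files. The window length $n$ is left as a parameter because this section does not state it, and a window with no price range returns `None` by assumption.

```python
from typing import List, Optional, Sequence


def williams_r(highs: Sequence[float], lows: Sequence[float],
               closes: Sequence[float], n: int) -> List[Optional[float]]:
    """%R_t(n) = 100 * (highest high - c_t) / (highest high - lowest low), last n days."""
    out: List[Optional[float]] = [None] * len(closes)
    for t in range(n - 1, len(closes)):
        highest = max(highs[t - n + 1:t + 1])
        lowest = min(lows[t - n + 1:t + 1])
        if highest > lowest:
            out[t] = 100.0 * (highest - closes[t]) / (highest - lowest)
    return out


print(williams_r([10.0, 12.0, 11.0], [8.0, 9.0, 7.0], [9.0, 11.0, 8.0], 3))  # [None, None, 80.0]
```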
2.10. CCI
Commodity Channel Index (CCI) is used to identify cyclical trends not only in commodities but
also in equities and currencies. The Typical Price (TP) is defined as follows.
$$\mathrm{TP}_t = \frac{\mathrm{max}_t + \mathrm{min}_t + c_t}{3}$$
By computing the simple average over the last n periods of the typical price and its standard
deviation, CCI is defined as follows.
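$$\mathrm{CCI}_t(n) = \frac{\mathrm{TP}_t - \mathrm{aver}(\mathrm{TP}_t, \mathrm{TP}_{t-1}, \dots, \mathrm{TP}_{t-n+1})}{0.015\,\mathrm{dev}(\mathrm{TP}_t, \mathrm{TP}_{t-1}, \dots, \mathrm{TP}_{t-n+1})}$$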
In our case we use $n = 20$.
Range: CCI fluctuates above and below zero. The constant 0.015 should ensure that approximately
70 to 80% of CCI values lie between −100 and +100.
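A sketch of CCI as defined above, using the population standard deviation (`statistics.pstdev`) for $\mathrm{dev}(\cdot)$; whether the dataset uses the population or the sample version is not stated, so that choice is an assumption. Lambert's original CCI uses the mean absolute deviation instead, which would be a one-line change here. Windows with zero deviation return `None`.

```python
from statistics import pstdev
from typing import List, Optional, Sequence


def cci(highs: Sequence[float], lows: Sequence[float],
        closes: Sequence[float], n: int = 20) -> List[Optional[float]]:
    # Typical Price TP_t = (max_t + min_t + c_t) / 3
    tp = [(high + low + close) / 3.0 for high, low, close in zip(highs, lows, closes)]
    out: List[Optional[float]] = [None] * len(tp)
    for t in range(n - 1, len(tp)):
        window = tp[t - n + 1:t + 1]
        dev = pstdev(window)  # population standard deviation (assumption)
        if dev > 0.0:
            out[t] = (tp[t] - sum(window) / n) / (0.015 * dev)
    return out


prices = [1.0, 2.0, 3.0]
print(cci(prices, prices, prices, n=3))  # last value is about 81.65
```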
2.11. UO
Ultimate Oscillator (UO) uses buying or selling "pressure", represented by where the daily closing
price falls within the daily true range. The Buying Pressure (BP) and the True Range (TR) are computed
as follows.
$$\mathrm{BP}_t = c_t - \min\{\mathrm{min}_t, c_{t-1}\} \qquad \mathrm{TR}_t = \max\{\mathrm{max}_t, c_{t-1}\} - \min\{\mathrm{min}_t, c_{t-1}\}$$
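Then, the ratio of the total buying pressure to the total true range over the past $n$ days is computed as follows.

$$\mathrm{AVG}_t(n) = \frac{\mathrm{BP}_t + \mathrm{BP}_{t-1} + \dots + \mathrm{BP}_{t-n+1}}{\mathrm{TR}_t + \mathrm{TR}_{t-1} + \dots + \mathrm{TR}_{t-n+1}}$$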
This ratio is computed for a short, an intermediate and a long time interval, and the UO is their
weighted combination:

$$\mathrm{UO}_t(n, m, s) = 100\,\frac{4\,\mathrm{AVG}_t(n) + 2\,\mathrm{AVG}_t(m) + \mathrm{AVG}_t(s)}{7}$$
In our case we use $n = 7$, $m = 14$ and $s = 28$.
Range: it ranges between 0 and 100.
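A sketch of the Ultimate Oscillator with the periods 7, 14 and 28 as defaults. BP and TR need the previous close, so both start on the second day; returning `None` when the true range sums to zero over a window is an assumption of this sketch.

```python
from typing import List, Optional, Sequence


def ultimate_oscillator(highs: Sequence[float], lows: Sequence[float],
                        closes: Sequence[float], n: int = 7, m: int = 14,
                        s: int = 28) -> List[Optional[float]]:
    size = len(closes)
    bp = [0.0] * size  # index 0 is never used (no previous close)
    tr = [0.0] * size
    for t in range(1, size):
        low_ref = min(lows[t], closes[t - 1])
        bp[t] = closes[t] - low_ref                     # buying pressure
        tr[t] = max(highs[t], closes[t - 1]) - low_ref  # true range

    def avg(t: int, k: int) -> Optional[float]:
        """AVG_t(k): summed BP over summed TR for days t-k+1..t."""
        if t - k + 1 < 1:
            return None  # window would reach the first day, where BP/TR are undefined
        total_tr = sum(tr[t - k + 1:t + 1])
        if total_tr == 0.0:
            return None
        return sum(bp[t - k + 1:t + 1]) / total_tr

    out: List[Optional[float]] = [None] * size
    for t in range(size):
        a_n, a_m, a_s = avg(t, n), avg(t, m), avg(t, s)
        if a_n is not None and a_m is not None and a_s is not None:
            out[t] = 100.0 * (4 * a_n + 2 * a_m + a_s) / 7.0
    return out


# Closing every day at the top of the true range gives the maximum value.
c = [10.0 + i for i in range(10)]
lo = [9.0 + i for i in range(10)]
print(ultimate_oscillator(c, lo, c, n=2, m=3, s=4)[-1])  # 100.0
```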
2.12. Class
The problem of data classification is the attribution of labels to records according to a criterion
automatically learned from a training set, that is, a set of records whose class is already known.
Classification is a very important data mining task (see also [5]), and many classification algorithms are
available today (e.g., [6]). We assign the class to each record, so that any portion of the dataset can be
used as a training set. After this learning phase, the classification algorithm will be able to predict the
class for the rest of the records. The accuracy of such a prediction can be computed by comparing it
with the real class, which is the one given in the dataset.
The class that we assign to each daily record is 1 if the subsequent day is favorable for intra-day
trading and 0 otherwise. Favorable for intra-day trading means that the increase between the opening
price and the closing price of the same day is large enough for obtaining a profit by buying at the
opening price and selling at the closing price. A threshold must be selected to define “large enough”;
we select the value 0.3%, which should provide a reasonable opportunity for profit. Predicting the
class would allow intra-day trading on the following day, as described above, and could possibly also
be used to define inter-day trading strategies. The class is therefore defined as follows.
$$\mathrm{Class}_t = \begin{cases} 1 & \text{if } 100\,(c_{t+1} - o_{t+1})/o_{t+1} \ge 0.3 \\ 0 & \text{otherwise} \end{cases}$$
Note that the class of a given day is clearly not computable from the data available up to that day.
However, we assigned it for the whole dataset by simply looking, for each day, at the following day.
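A sketch of this labelling rule: each day's class is computed from the next day's opening and closing prices, so the last record gets `None`, matching the ‘N’ placeholder described in Section 1. The threshold defaults to the 0.3% chosen above; the function name is illustrative.

```python
from typing import List, Optional, Sequence


def day_trading_class(opens: Sequence[float], closes: Sequence[float],
                      threshold: float = 0.3) -> List[Optional[int]]:
    """Class_t = 1 if 100 * (c_{t+1} - o_{t+1}) / o_{t+1} >= threshold, else 0."""
    out: List[Optional[int]] = [None] * len(closes)
    for t in range(len(closes) - 1):
        gain_pct = 100.0 * (closes[t + 1] - opens[t + 1]) / opens[t + 1]
        out[t] = 1 if gain_pct >= threshold else 0
    return out


print(day_trading_class([10.0, 10.0, 10.0], [10.0, 10.05, 10.01]))  # [1, 0, None]
```

Because `Class_t` looks one day ahead, any backtest should only use indicators up to day t as inputs when predicting it.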
According to technical analysis, there should be some kind of relation between the above-described
indicators at day $t$ and the market evolution at day $t+1$, which determines the class of day $t$
(see, for example, [7]). The classification algorithm aims at discovering such a relation by predicting
the class using the above-described indicators. For an analysis of the existence of profit opportunities
with respect to the market index, see also [8].
The datasets are provided in CSV format, which can be opened with MS Excel or as a text file.
Transparency data associated with this paper can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.12.044.
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.12.044.
References
[1] R.W. Colby, The Encyclopedia of Technical Market Indicators, 2nd ed., McGraw Hill, New York, 2003.
[2] C.D. Kirkpatrick, J.R. Dahlquist, Technical Analysis: The Complete Resource for Financial Market Technicians, 3rd ed.,
Financial Times Press, Old Tappan, New Jersey, 2006.
[3] J.J. Murphy, Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications, New
York Institute of Finance, Paramus, New Jersey, 1999.
[4] R. Bruni, Error Correction for Massive Data Sets, Optim. Methods Softw. 20 (2005) 295–314.
[5] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer,
New York, 2001.
[6] R. Bruni, G. Bianchi, Effective Classification using Binarization and Statistical Analysis, IEEE Trans. Knowl. Data Eng. 27
(2015) 2349–2361.
[7] A.W. Lo, J. Hasanhodzic, The Evolution of Technical Analysis: Financial Prediction from Babylonian Tablets to Bloomberg
Terminals, Bloomberg Press, New York, 2010.
[8] R. Bruni, F. Cesarone, A. Scozzari, F. Tardella, A Linear Risk-Return Model for Enhanced Indexation in Portfolio Optimization,
Oper. Res. Spectr. 37 (2015) 735–759.