Financial Technical Indicator and Algorithmic Trading Strategy
Based on Machine Learning and Alternative Data
Andrea Frattini 1, Ilaria Bianchini 1, Alessio Garzonio 1 and Lorenzo Mercuri 2,*
Abstract: The aim of this paper is to introduce a two-step trading algorithm, named TI-SiSS. In
the first step, using some technical analysis indicators and the two NLP-based metrics (namely
Sentiment and Popularity) provided by FinScience and based on relevant news spread on social
media, we construct a new index, named Trend Indicator. We exploit two well-known supervised
machine learning methods for the newly introduced index: Extreme Gradient Boosting and Light
Gradient Boosting Machine. The Trend Indicator, computed for each stock in our dataset, is able to
distinguish three trend directions (upward/neutral/downward). Combining the Trend Indicator
with other technical analysis indexes, we determine automated rules for buy/sell signals. We test
our procedure on a dataset composed of 527 stocks, belonging to American and European markets,
that are adequately covered in the news.
1. Introduction

Automated stock picking/trading algorithms have recently gained increasing relevance in the financial industry. The possibility of processing information from different conventional and unconventional sources gives economic agents the opportunity to make great profits by combining such information with classical financial indicators. After the advent of Big Data, powerful new computers and technologies, artificial intelligence (AI) and data science (DS) have increased their roles not only in finance but also in many other disciplines, such as cybersecurity, marketing, and economics.

Frequently used algorithms for trading strategies privilege the usage of historical prices from stock markets as a tool to make decisions (buy and sell) in financial markets. Among these studies, we mention academic papers that combine supervised/unsupervised machine learning methods with information inferred directly from financial time series. For instance, Allen and Karjalainen (1999) proposed a genetic algorithm to learn technical trading rules that are strictly connected to the level of returns and volatility. More precisely, according to this work, the investor should take a long position when positive returns and low daily volatility occur and stay out of the market in the presence of negative returns and high volatility. Although Allen and Karjalainen (1999) suggested the possibility of extending their approach by including other types of information such as fundamental and macro-economic data, the main ingredients for obtaining a trading signal are the volatility and the returns. Applying the support vector machine, Lee (2009) proposed an algorithm for the identification of the direction of change in the daily NASDAQ index based on a set composed of closing prices of 20 futures contracts and nine spot indexes. Moreover, Chong et al. (2017) analyzed three unsupervised feature extraction methods (i.e., principal component analysis, autoencoder, and the restricted Boltzmann machine) to predict future market behavior using exclusively high-frequency intraday stock returns as input data. Starting from the hypothesis that financial time series contain all private/public
information, Barucci et al. (2021) recently developed a trading algorithm based on financial
indicators that are identified as outliers of the following series: returns, trading volume
growth, bid–ask spread, volatility, and serial correlation between returns and trading
volumes. These indicators have been used to identify a market signal (buy, neutral, or
sell) for each security and a corresponding trading strategy (for completeness, we refer to
Ballings et al. (2015) for a comparison among different machine learning methods widely
applied in finance for stock price direction prediction).
However, due to the presence of different sources, a challenge for an automated trading
algorithm is to take into account the investors’ opinions that some funds’ managers could
underestimate by following strategies based exclusively on financial market data. For this
reason, a new paradigm for constructing a trading algorithm is rising among academics
and practitioners. Indeed, despite the efficient market hypothesis (Fama 1970), many authors have documented not only the possibility of predicting future movements based on past financial prices but also of identifying a market trend using the amount of data available in unconventional sources such as social networks, blogs, thematic forums, online newspapers, and many others. These data, hereafter named Alternative Data, refer to qualitative and quantitative information and represent how a particular company (or financial instrument) is perceived by the market and how popular it is among investors. For instance, Jaquart et al.
(2020) analyzed the literature on Bitcoin prediction through machine learning and identified
four groups of predictors: technical (e.g., returns, volatility, volume, etc.), blockchain-
based (e.g., number of bitcoin transactions), sentiment-/interest-based (e.g., bitcoin Twitter
sentiment), and asset-based (e.g., returns of a connected market index) features. Among many authors, we mention, for example, Bollen et al. (2011), who investigated whether public mood, assessed through daily Twitter posts, predicts the stock market. Starting
from the observation that the increasing digitization of textual information, news, and
social media have become major resources for gathering information on important financial
events, Yang et al. (2017) developed a trading strategy using tweets sentiment and genetic
programming optimization. Recently, Duz and Tas (2021) confirmed that firm-specific
Twitter sentiment contains information for predicting stock returns and this predictive
power remains significant after controlling news sentiment. Their research leads to the
possibility of exploiting the social media sentiment in a trading strategy.
The aim of our work is twofold. Firstly, we contribute to the literature that studies the
possibility of identifying a trend for a stock using Alternative Data and standard financial
indicators. Indeed, similarly to Barucci et al. (2021), we propose a procedure based on
the identification of the market trend for a specific stock. However, our classification
(upward/neutral/downward) includes financial indicators and two metrics that quantify
all information contained in Alternative Data. Secondly, we develop a stock picking/trading
algorithm based on the results of the classification procedure.
The main instruments for our two-step trading (classification) strategy are Sentiment
and Popularity metrics. These two quantities, published daily by FinScience1 , are obtained
by searching over 1.5 million web pages, extrapolating, interpreting, and analyzing their
contents in order to identify valuable information. Sentiment, which takes values from −1
to 1, assesses the perception of the company, while Popularity, which assumes only positive
values, measures investors’ interest in a topic.
The remainder of the paper is organized as follows. Section 2 reviews technical analysis
market indicators used as inputs for our analysis with a discussion of the extreme gradient
boosting and light gradient boosted machine used in the classification step. Section 3
introduces the Sentiment and Popularity metrics developed by FinScience; then, we formulate a
supervised classification problem and explain how the labels are created. Afterwards, we build a
new financial technical indicator, based on the FinScience metrics, that aims to predict the price
trend one day ahead. Finally,
we conduct a comparison with an alternative trading strategy developed by FinScience and
we also provide the gross/net performances for the most/least capitalized companies in a
dataset. Section 4 concludes.
2.2. Extreme Gradient Boosting and Light Gradient Boosted Machine Algorithms
In this section, we present the basic theory behind the selected models: the XGBoost and LightGBM algorithms. Both numerical procedures come from the same macro-family of decision-tree-based methods, but they are among the most sophisticated and best-performing models, since they represent the most advanced level of boosting.
In both algorithms, the prediction for the i-th sample is the sum of the outputs of K decision trees,

\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)    (1)

where each f_k is the prediction of the k-th decision tree. With this model construction, the training phase is carried out by minimizing a regularized loss function between the observed and predicted outputs. For a multiclass classification problem, which is our case, the multiclass logistic loss function (mlogloss) can be used. Let the true labels for a set of N samples be encoded as a 1-of-J binary indicator matrix Y, i.e., y_{i,j} = 1 if sample i has label j taken from a set of J labels. Let P be a matrix of probability estimates, with p_{i,j} = Pr(y_{i,j} = 1). Then, the mlogloss L is defined as:

L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{J} y_{i,j} \log(p_{i,j}).    (2)
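As an illustration, the following minimal Python sketch (with hypothetical toy inputs) evaluates the mlogloss in (2) for integer-encoded labels; it mirrors the metric that XGBoost and LightGBM report for multiclass problems.

import numpy as np

def mlogloss(y_true, proba):
    # Multiclass log loss: y_true holds integer class labels, proba holds one
    # row of class probabilities per sample (each row sums to one).
    n = len(y_true)
    p_true = proba[np.arange(n), y_true]          # probability assigned to the true class
    return -np.mean(np.log(np.clip(p_true, 1e-15, 1.0)))

# toy check with the three trend classes (0 = downward, 1 = neutral, 2 = upward)
y = np.array([2, 0, 1])
p = np.array([[0.1, 0.2, 0.7],
              [0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3]])
print(mlogloss(y, p))  # approximately 0.52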
Moreover, another important aspect is the regularization phase, in which the model controls its complexity to prevent overfitting. The XGBoost algorithm uses the following regularization function:

\Omega = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2    (3)

where T is the number of leaves of a tree, \omega_j is the score on the j-th leaf of that tree (\omega \in R^T being the vector of leaf scores), and \gamma and \lambda are the parameters used for controlling overfitting: \gamma penalizes the number of leaves, acting as a minimum gain threshold for a split, while \lambda sets the degree of L2 regularization on the leaf scores. Combining (2) and (3), we obtain the objective function used in the minimization problem:

Obj = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k)    (4)
where the first sum controls the predictive power; indeed, l(\hat{y}_i, y_i) is a differentiable convex loss function that measures the difference between the prediction \hat{y}_i and the target y_i, while the remaining term in (4) controls the complexity of the model itself. The XGBoost procedure exploits the gradient descent algorithm to minimize the quantity in (4): it is an iterative technique that, at each iteration, computes the gradient

\frac{\partial Obj(y, \hat{y})}{\partial \hat{y}}

Then, the prediction \hat{y} is improved along the direction of the gradient in order to minimize the objective (actually, to make XGBoost converge faster, it also takes into consideration the second-order gradient through a Taylor approximation of the objective function). Therefore, in the end, removing all the constant terms, the resulting objective function at step t is
Obj^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)    (5)

where g_i and h_i denote, respectively, the first- and second-order derivatives of the loss l(y_i, \hat{y}_i^{(t-1)}) with respect to the prediction of the previous step.
To express (5) in terms of the tree structure, the tree f_t is written through its leaf scores:

f_t(x) = \omega_{q(x)}    (6)

where q : R^m \to \{1, \dots, T\} is a “directing” function which assigns every data point to a leaf, so that the space of trees is F = \{ f(x) = \omega_{q(x)} \}. Therefore, it is possible to describe the prediction process as follows:
•  Assign the data point x to a leaf by means of the function q.
•  Assign the corresponding score \omega_{q(x)} of the q(x)-th leaf to the data point.
Then, it is necessary to define the set that contains the indices of the data points assigned to the j-th leaf:

I_j = \{ i \mid q(x_i) = j \}

so that the objective in (5) can be rewritten leaf by leaf:

Obj^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2
          = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^2 \right] + \gamma T    (7)
In (7), the first expression has a summation over the data points, whereas in the second one the summation is performed leaf by leaf over all T leaves. Since (7) is a quadratic function of \omega_j, for a fixed structure q(x) the optimal value is

\omega_j^* = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}

and, therefore, simply substituting, the corresponding value of the objective function is

Obj^{(t)} = - \frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T    (8)

where the optimal leaf score \omega_j^* depends only on the first- and second-order gradients g and h and on the regularization parameter \lambda. This is how the score associated with a leaf can be found, assuming the structure of the tree is known.
Now, we move back to the first question: how can we find a good tree structure? Since
this is a difficult question to answer, a good strategy is to split it into two sub-questions:
1. How do we choose the feature split?
2. When do we stop the split?
Starting from question number one, the first thing to say is that in any split the goal is,
of course, to find the best split-point that will optimize the objective function; therefore, for
each feature it is crucial to first sort its values, then scan for the best split-point, and finally
choose the best feature.
Every time that a split is performed, a leaf is transformed into an internal node having
leaves with different scores than the initial one.
Clearly, the principal aim is to calculate the gain (or the eventual loss) obtained
from such a split. Usually, in other tree-based algorithms, the computation of this gain
is generally made through the Gini index or entropy metric, but in the XGBoost this
calculation is based on the objective function. In particular, XGBoost exploits the set of
indices I of data points assigned to the node, where IL and IR are the subsets of indices
of data points assigned to the two new leaves. Now, recalling that the best value of the
objective function on the j-th leaf is (8) without the first summation and the T value in the
last term, the gain of the split is:
" #
1 ( ∑ i ∈ I L gi ) 2 ( ∑ i ∈ I R gi ) 2 ( ∑ i ∈ I gi ) 2
gain = + − −γ
2 ∑ i ∈ I L h i + λ ∑ i ∈ IR h i + λ ∑ i ∈ I h i + λ
where:
•  \left( \sum_{i \in I_L} g_i \right)^2 / \left( \sum_{i \in I_L} h_i + \lambda \right) is the value of the left leaf;
•  \left( \sum_{i \in I_R} g_i \right)^2 / \left( \sum_{i \in I_R} h_i + \lambda \right) is the value of the right leaf;
•  \left( \sum_{i \in I} g_i \right)^2 / \left( \sum_{i \in I} h_i + \lambda \right) is the objective of the previous (unsplit) leaf;
•  \gamma is the parameter which controls the number of leaves (i.e., the complexity of the algorithm).
To understand whether transforming one leaf into two new leaves improves the objective, it is enough to look at the sign of this gain. In conclusion, to build a tree, the XGBoost algorithm first finds the best split-points recursively until the maximum depth (specifiable by the user) is reached, and then it prunes out the nodes with negative gain in a bottom-up order.
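For intuition only, the following minimal sketch (not the library's internal implementation) evaluates the gain of a candidate split from per-sample gradients g_i and Hessians h_i, as in the formula above; the numbers in the toy example are hypothetical.

import numpy as np

def leaf_value(g, h, lam):
    # Contribution (sum g)^2 / (sum h + lambda) of the samples placed on one leaf.
    return g.sum() ** 2 / (h.sum() + lam)

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    # Gain of splitting a node into a left and a right leaf, following the XGBoost objective.
    left = leaf_value(g[left_mask], h[left_mask], lam)
    right = leaf_value(g[~left_mask], h[~left_mask], lam)
    parent = leaf_value(g, h, lam)
    return 0.5 * (left + right - parent) - gamma

# toy example: gradients/Hessians of four samples, candidate split {0, 1} vs {2, 3}
g = np.array([-1.0, -0.8, 0.9, 1.1])
h = np.array([0.5, 0.6, 0.4, 0.5])
mask = np.array([True, True, False, False])
print(split_gain(g, h, mask, lam=1.0, gamma=0.1))  # positive value: the split is worth keeping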
LightGBM is a fast, distributed, high-performance tree-based gradient boosting frame-
work developed in Ke et al. (2017). The most important features of this algorithm that
differentiate it from XGBoost are the faster training speed and the fact that it supports
parallel, distributed, and GPU learning and that it is capable of handling large-scale data.
Another main difference between LightGBM and XGBoost is the way in which they grow
a tree: the first one uses leaf-wise tree growth, expanding those leaves that bring a real
benefit to the model, while the second uses level-wise tree growth, expanding the tree one
level at a time and then cutting off the unnecessary branches at the end of the process.
The first thing that makes LightGBM faster in the training phase is the way it handles feature values: the algorithm takes the inputs and groups them into bins, greatly reducing the computational effort needed to test all the possible split points. This process, called histogram (or bin-based) splitting, makes computations much faster than those of XGBoost. The second improving characteristic of this algorithm is called exclusive feature
bundling (EFB), which reduces the dimension of the features that are mutually exclusive.
For example, if there are two features, green and red, which correspond to the color of a
financial candle, taking either value 1 or 0 based on the corresponding candlestick’s color,
then these features are mutually exclusive since a candle cannot be green and red at the
same time. Thus, this process creates a new bundled feature with a lower dimension, using
new values for identifying the two cases, which in this situation are number 11 for the
green candle and number 10 for the red one. Therefore, by reducing the dimensionality of
some of the input features, the algorithm is able to run faster, since it has fewer features
to evaluate. The third characteristic that differentiates LightGBM from XGBoost is the so-called gradient-based one-side sampling (GOSS), which helps the LightGBM algorithm to iteratively choose the sample of observations used in the computations. Suppose that the dataset contains 100 training instances; then the algorithm computes 100 gradients G_1, G_2, ..., G_100 and sorts them in descending order of magnitude, for example, G_73, G_24, ..., G_8. The first 20% of these records is retained, and an additional 10% is randomly drawn from the remaining 80% of gradient records. In other words, since the gradients are ordered in descending order, the algorithm keeps the 20% of instances on which it performs poorly (a high gradient means a high error), and thus on which it still has a lot to learn, together with a random 10% of the instances on which it already performs well. Afterwards, these two groups are combined, creating the sample on which the LGBM trains, recalculates the gradients, and applies GOSS again in an iterative way.
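A rough sketch of the GOSS idea follows (a simplification of the sampling described in Ke et al. (2017); the 20%/10% fractions reproduce the example above, and the re-weighting of the randomly sampled instances used by the actual algorithm is omitted).

import numpy as np

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, rng=None):
    # Return the indices of the instances kept by gradient-based one-side sampling:
    # all the instances with the largest |gradient| plus a random share of the rest.
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))     # sort instances by decreasing |gradient|
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top_idx = order[:n_top]                    # instances the model still gets most wrong
    rest_idx = rng.choice(order[n_top:], size=n_other, replace=False)
    return np.concatenate([top_idx, rest_idx])

grads = np.random.default_rng(1).normal(size=100)
kept = goss_sample(grads)
print(len(kept))  # 30 instances out of 100 used for the next boosting step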
3.1. Dataset
In our analysis, we consider a dataset containing the daily market closing prices for
527 companies. The period of observation ranges from 25 September 2019 to 18 February
2022 (see Note 2). Figure 2 shows the distribution of companies across industrial sectors and
geographical areas. The most represented sector is Information Technology, with 117 companies,
while USA companies account for 91% of the dataset.
Figure 2. Classification of the companies that comprise the dataset based on geographical areas
(upper) and industrial sectors (lower).
Table 1 reports the main statistics for the five most/least market capitalized stocks; we observe that the daily log-returns generally show negative skewness and high kurtosis, denoting a departure from the normality assumption.
Table 1. Main information about the top 5 most/least capitalized stocks in the dataset. MDD stands
for maximum drawdown.
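For reference, a minimal sketch (column names are hypothetical) of how the log-return statistics and the maximum drawdown reported in Table 1 can be computed with pandas:

import numpy as np
import pandas as pd

def summary_stats(prices: pd.Series) -> pd.Series:
    # Daily log-return statistics and maximum drawdown for one price series.
    r = np.log(prices).diff().dropna()
    running_max = prices.cummax()
    mdd = ((prices - running_max) / running_max).min()   # most negative drawdown
    return pd.Series({"mean": r.mean(), "std": r.std(),
                      "skewness": r.skew(),
                      "kurtosis": r.kurt(),               # excess kurtosis
                      "MDD": mdd})

# usage on a dataframe with one column of closing prices per stock:
# stats = close_prices.apply(summary_stats)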
The FinScience metrics are built from news and other content spread on the web. The DBpedia ontology is used, and its named entity recognition (NER)—the automatic annotation of entities in a text—is able to associate pieces of text with members of the abovementioned ontology. The ontology is based on the DBpedia Knowledge Graph, which encompasses cleansed data from Wikipedia in all languages. Over the years, this ontology has evolved into a successful crowd-sourcing effort, with the DBpedia community continuously contributing to the ontology schema. It now contains about 800 classes and numerous properties and is cross-domain. Based on this ontology, the entities mentioned in online content are identified and turned into structured data that serve as building blocks of the time series associated with what FinScience calls “signals” (which can be companies, people, locations, topics, or a combination of them). This makes it possible to identify companies and trends that are spreading in the digital ecosystem, as well as to analyze how they relate to each other.
In particular, what is possible to learn from Alternative Data is conveyed and summa-
rized into several metrics, but for the purposes of this research, only the following are taken
into consideration:
• Company Digital Popularity measures the level of diffusion of a digital signal on the
web. It is obtained by aggregating the diffusion metrics of the news mentioning the
company and can take only positive values. The diffusion metric of a news article
is quantified by taking into account the number of times the link is shared on social
media and also the relevance of the company inside the text. It basically measures
how popular a company/stock is among investors.
• Company Sentiment measures the user’s perception concerning the company and can
take values in the interval [−1, 1], boundaries included. The sentiment metric of the
company on a specific date is represented by a weighted average of the sentiment
scores of all news mentioning the company, where the weight is the popularity of
the articles.
It is through the use of these two metrics that an attempt will be made to construct a
new technical financial indicator.
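For concreteness, a small sketch (field names are hypothetical) of the popularity-weighted aggregation described above for the daily Company Sentiment; the plain sum used here for Company Popularity is our simplifying assumption.

import pandas as pd

# hypothetical per-article records for one company on one date
news = pd.DataFrame({
    "sentiment": [0.6, -0.2, 0.1],      # per-article sentiment in [-1, 1]
    "popularity": [120.0, 30.0, 50.0],  # per-article diffusion metric (positive)
})

# daily Company Sentiment: popularity-weighted average of the article sentiments
company_sentiment = (news["sentiment"] * news["popularity"]).sum() / news["popularity"].sum()
# daily Company Popularity: aggregation of the article diffusion metrics
company_popularity = news["popularity"].sum()
print(round(company_sentiment, 3), company_popularity)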
Popularity and Sentiment Analysis for the Most and the Least Capitalized Companies
In this section, we analyze some features of Sentiment and Popularity metrics for Apple
and Jones Soda (simply Jones from here on). These two stocks are the most and the least
capitalized companies in the dataset. Our aim is to provide a brief statistical analysis for the
time series of these two metrics. Similar results were obtained for each stock in the dataset
and are available upon request. As expected, Table 2 confirms that a more capitalized company is more popular than a less capitalized one among market operators (i.e., agents talk more about Apple than about Jones). Moreover, the higher expected value of Sentiment for Apple denotes that the most capitalized company had a good reputation in the considered period (i.e., the Apple news generally improved its public image). It is worth noting that the negative sign of the skewness for Sentiment denotes a small number of strongly negative news items, even though the majority of them were positive, leading to a positive mean.
Table 2. Main statistics for Popularity and Sentiment for Apple and Jones.
From Figures 3 and 4, the paths of Sentiment and Popularity seem to suggest a stationary
behavior in all cases. To further investigate this fact, we consider an augmented Dickey–Fuller
test. The null hypothesis is the unit-root against a trend-stationary alternative hypothesis. The
p-value for both metrics and both stocks is strictly smaller than 5%, suggesting a stationary
behavior. The highest p-value (0.000485) is obtained for the Jones Sentiment.
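A minimal sketch of such a test with statsmodels (the series name is hypothetical); adfuller returns, among other outputs, the test statistic and the p-value.

from statsmodels.tsa.stattools import adfuller

def is_stationary(series, alpha=0.05):
    # Augmented Dickey-Fuller test: H0 = unit root. A small p-value rejects H0
    # in favour of a (trend-)stationary alternative.
    stat, pvalue, *_ = adfuller(series.dropna(), regression="ct")  # "ct" = constant + trend
    return pvalue < alpha, pvalue

# usage: stationary, p = is_stationary(apple["sentiment"])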
Figure 3. Sentiment and Popularity indexes of Apple ranging from 25 September 2019 to 18 February
2022 (upper part) and the corresponding autocorrelation function (lower part).
Figure 4. Popularity and Sentiment indexes of Jones ranging from 25 September 2019 to 18 February
2022 (upper part) and the corresponding autocorrelation function (lower part).
Both figures also report the autocorrelation functions. At least one lag is statistically significant, denoting an autoregressive dependence structure for these two metrics. The first lag is significant in all cases, except for the Jones Popularity, which instead shows a significant nonzero second lag.
However, our goal is to create predictive models and, therefore, the data we are really interested in are not the label at time t, but rather the label at time t + 1. For this reason, we create a new column, called Target: this column is nothing more than the label column shifted back by one period. In this way, at each time instant we have all the data corresponding to that day (which make up the set of explanatory variables) and the label of the next day (which is our response variable, i.e., the output). Thus, in the training phase, the model takes the set of explanatory variables as input, tries to predict tomorrow's label with them, checks the result, and, in case of discrepancy, adjusts its parameters.
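In pandas terms, this is a one-period backward shift of the label column (column names are hypothetical), which is also what the code in Appendix A does.

import pandas as pd

df = pd.DataFrame({"Label": [1, 2, 2, 0, 1]})
# today's explanatory variables must be paired with tomorrow's label
df["Target"] = df["Label"].shift(-1)
df = df.dropna(subset=["Target"])  # the last day has no next-day label
print(df)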
Besides hyperparameter tuning, it is also important to perform another kind of tuning, referred to as feature selection. In fact, to improve the ability of XGB or LGBM to make predictions, to speed up their running time, and to avoid overfitting, it is crucial to feed them only those attributes that actually provide gain, excluding the others, so that the models do not “waste time” evaluating unnecessary and useless features. Therefore, the attributes used for creating the technical indicators (e.g., adjusted high, low, and open prices) are excluded a priori, and then several combinations of inputs are tested for both XGB and LGBM over the sample dataset (a sketch of this comparison loop is given after the list below), namely:
•  Model 1: model with all the inputs: closing price, sentiment and its variations, popularity and its variations, SMA(7) and its slope, SMA(80) and its slope, SMA(160) and its slope, ATR(7) and its slope, RSI(7) and its slope, Labels_t.
•  Model 2: model with all the inputs except for the point value of the simple moving averages (but maintaining their slopes): closing price, sentiment and its variations, popularity and its variations, SMA(7)'s slope, SMA(80)'s slope, SMA(160)'s slope, ATR(7) and its slope, RSI(7) and its slope, Labels_t.
•  Model 3: model with all the inputs except for the point value of the simple moving averages (but maintaining their slopes) and the variations of sentiment and popularity: closing price, sentiment, popularity, SMA(7)'s slope, SMA(80)'s slope, SMA(160)'s slope, ATR(7) and its slope, RSI(7) and its slope, Labels_t.
•  Model 4: model with all the inputs except for the variations of sentiment/popularity: closing price, sentiment, popularity, SMA(7) and its slope, SMA(80) and its slope, SMA(160) and its slope, ATR(7) and its slope, RSI(7) and its slope, Labels_t.
•  Model 5: model with all the inputs except for any value related to the metrics: closing price, SMA(7) and its slope, SMA(80) and its slope, SMA(160) and its slope, ATR(7) and its slope, RSI(7) and its slope, Labels_t.
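A schematic of how such a comparison can be run is given below (feature and column names are hypothetical, and the actual tuning performed by the authors may differ in its details).

from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score

FEATURE_SETS = {
    "model_4": ["adj_close", "sentiment", "popularity",
                "sma7", "sma7_slope", "sma80", "sma80_slope", "sma160", "sma160_slope",
                "atr7", "atr7_slope", "rsi7", "rsi7_slope", "label_t"],
    "model_5": ["adj_close",
                "sma7", "sma7_slope", "sma80", "sma80_slope", "sma160", "sma160_slope",
                "atr7", "atr7_slope", "rsi7", "rsi7_slope", "label_t"],
}

def compare_feature_sets(train, test, target="target"):
    # Fit one LightGBM classifier per candidate feature set and report its test accuracy.
    scores = {}
    for name, cols in FEATURE_SETS.items():
        clf = LGBMClassifier(objective="multiclass", random_state=123)
        clf.fit(train[cols], train[target])
        scores[name] = accuracy_score(test[target], clf.predict(test[cols]))
    return scores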
Looking at the average accuracies obtained by each model over the test sets, the best input combination turned out to be the one represented by Model 4, namely, the one with all inputs except the daily variations of sentiment and popularity (but keeping their point values). Therefore, both XGBoost and LightGBM with this specific input combination were tested over the entire dataset through a powerful virtual machine provided by Google Cloud Platform (GCP), obtaining the following results.
As can be assessed from Table 3, both models performed better than a random gambler (who has a 33% probability of guessing the price trend, since the three possible scenarios are upward movement, downward movement, or sideways phase) in about 518 out of 527 stocks. Moreover, both XGBoost and LightGBM have an average accuracy well above the minimum threshold of 33%, both being between 53.5% and 54%. Therefore, our two models have similar performances in terms of accuracy, but not in terms of speed: in fact, LightGBM took almost half of XGBoost's time to make the same computations while maintaining the same accuracy. Moreover, looking at the following explanatory examples, we can also make further considerations on the prediction capability of these models.
Table 3. Prediction accuracy of XGBoost and LightGBM over the entire dataset.

                          XGBoost      LightGBM
Tot. stocks                   527           527
N. stocks < 33%                 8            10
N. stocks > 33%               519           517
Average accuracy           53.95%        53.64%
Running time               6:15 h        3:30 h
Figures 6 and 7 represent the predicted outcome (green) versus the real one (blue): the bullish price trend has label 2, the sideways phase has label 1, while the downward movement is labeled as 0. In this way, the graph is as easy to read as possible: when the line is down, prices are falling; when it is in the middle, prices are moving sideways; and when it is up, prices are rising. However, we know that, when using a predictive model, there are always errors, but it is also true that not all errors have the same severity. In this situation, there are two main types of error:
1. One, representing the case in which the model predicts an ongoing price trend (either
positive or negative) and the actual value is a sideways phase (and vice versa);
2. Another, in which the model predicts an ongoing price trend (either positive or
negative) and the actual value is the exact opposite situation.
Moreover, another aspect should also be taken into account: how long a model persists in the error. Unfortunately, XGBoost, when it makes errors, persists in them longer than LightGBM, as can be seen from Figures 6 and 7: looking at the period between time steps 35 and 45 of both, XGBoost predicts the wrong situation for a longer time than LightGBM, while the latter immediately corrects the prediction and then anticipates the next bullish trend. XGBoost corrects its errors more slowly than LightGBM over the entire dataset.
Therefore, combining all the considerations made so far, we can declare LightGBM the better model of the two.
However, what about the metrics? Did they really have an impact within the models? First of all, let us focus only on the best selected model and, to answer these questions, let us see how the inputs affected the model. Indeed, Figure 8 represents the ranked variables of six different stocks.
Again, these examples of feature importance plots can be extended to the entire dataset and, more generally, at least one metric between Sentiment and Popularity always ranks in the first half of the attributes by utility in the model. Therefore, the digital metrics actually played an important role in the construction of our predictive model.
Summarizing, now we have a new financial technical indicator, called Trend Indicator,
that has the following characteristics:
• Created through the machine learning model LightGBM;
• Embedded within a rolling algorithm;
• Capable of reaching an average accuracy of 53.64% (much greater than the threshold
of 33%);
•  Strongly dependent on Alternative Data.
Now it is necessary to understand how to use this indicator in a trading strategy.
Table 4. Stock-selection component for SiSS and TI-SiSS.

        SiSS                       TI-SiSS
Slope SMA(160) > 0          Slope SMA(160) > 0
Slope ADX Line > 0          Trend Indicator ≥ 1
Momentum > 0
Both strategies require an upward-sloping 160-period moving average because both of them want to trade in the direction of the long-term trend and, since the strategies only open long positions, that long-term trend must be bullish. Then, the TI-SiSS exploits the Trend Indicator, while the SiSS uses ADX and momentum together, since the combination of these two indicators provides the same information as the Trend Indicator, even if our indicator should provide this information one day in advance.
As shown in Table 5, the timing component, instead, is the same for both strategies and provides objective rules for buying and selling a stock.
Table 5. Timing component for both TI-SiSS and SiSS. DPV_t is the value of Popularity at time t; MA_DPV_t(7) denotes the sample mean of the Popularity over the interval [t − 6, t].

            Buy                                 Sell
DPV_t > MA_DPV_t(7)                  DPV_t > MA_DPV_t(7)
DPV_t / DPV_{t−1} − 1 ≥ 100%         DPV_t / DPV_{t−1} − 1 ≥ 100%
Sentiment_t > 0.05                   Sentiment_t < −0.05
Therefore, both strategies buy when the current value of Popularity is greater than its
moving average at seven periods, the change in popularity with respect to the previous
available data is at least 100%, and the sentiment is positive. On the other hand, they sell
when the same conditions occur, but with a negative sentiment.
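As an illustration, a sketch (column names are hypothetical) of the timing rules in Table 5 expressed as boolean conditions on a daily dataframe:

import pandas as pd

def timing_signals(df: pd.DataFrame) -> pd.DataFrame:
    # Buy/sell timing rules of Table 5 applied to the columns 'popularity' and 'sentiment'.
    ma7 = df["popularity"].rolling(7).mean()          # MA_DPV_t(7)
    pct = df["popularity"].pct_change()               # DPV_t / DPV_{t-1} - 1
    spike = (df["popularity"] > ma7) & (pct >= 1.0)   # popularity above its MA and at least doubled
    out = df.copy()
    out["buy"] = spike & (df["sentiment"] > 0.05)
    out["sell"] = spike & (df["sentiment"] < -0.05)
    return out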
Hence, both strategies have the same aim: to exploit bullish price movements through long positions. Moreover, by construction, TI-SiSS and SiSS are easy to compare with each other since they act in the same way, and the only difference between them is the way they select the stocks they trade on.
Unfortunately, comparing these two strategies is not as straightforward as it could
seem. Indeed, while the SiSS has only static computations, making it easily testable in back-
test, the TI-SiSS has a structural problem: since the Trend Indicator is based on a machine
learning model, it needs as large a history as possible to produce accurate predictions. As
a result, TI-SiSS in the first part of the back-test will be neither reliable nor truthful due
to the limited training set, while it will perform with the accuracy previously described
once the training set is about 80% of the history of each stock. For this reason, it was
decided to compare the two strategies by looking only at the returns of the last 20% of
observation dates for each stock in our dataset. Clearly, considering only 20 percent of each stock's history means taking a part for the whole; this is not, and does not claim to be, a complete yardstick, even though it provides a good starting point for making some general assessments on TI-SiSS and SiSS.
Hence, testing the strategies only on the last 20% of observing dates for each stock
(test set), we obtain the results in Table 6.
Table 6. Results of SiSS and TI-SiSS on the last 20% of observation dates (test set) for each stock.

                                      SiSS       TI-SiSS
Tot. stocks                            527           527
N. stocks with trades                  134           309
N. stocks with positive returns        104           207
N. stocks with negative returns         30           102
Variance of returns                  55.3%        84.02%
Average return                       1.98%         3.12%
Therefore, out of 527 stocks, SiSS and TI-SiSS opened trades on 134 and 309 stocks, respectively. Looking at the results, the SiSS had 104 positive and 30 negative returns, while TI-SiSS had 207 positive and 102 negative results. Therefore, we assess that the TI-SiSS is riskier than the SiSS, as also confirmed by the higher variance of its returns, but this risk is repaid by a higher average gain, rebalancing the whole scenario.
Finally, it is interesting to analyze how the two strategies behave on the same stock; see Figures 9 and 10. For this comparison, we cannot use the aforementioned five most/least capitalized stocks since the SiSS performed no operations on them in the last 20% of the time period. For this reason, we conduct this comparison on Pioneer Natural Resources Co. (Irving, TX, USA), which has an average value of market capitalization (56 billion) among the 527 companies in the considered dataset. In this case, TI-SiSS performed better, anticipating the market with the opening of the first trade (the first green triangle) and then achieving a better final return.
Figure 9. Operations of SiSS over the test period of Pioneer Natural Resources Co.
Figure 10. Operations of TI-SiSS over the test period of Pioneer Natural Resources Co.
Figure 11. TI-SiSS results for Apple (upper part) and for Jones (lower part).
Using the buying/selling signals for each stock in Table 7, we construct the corre-
sponding TI-SiSS strategy with EUR 100 as initial wealth. If we consider the five most
capitalized companies, the TI-SiSS strategy returns a positive performance in all cases
except for United Health asset. The highest gain is achieved for Microsoft (more than 10%
of the initial wealth). Table 7 reports the maximum drawdown (MDD) for the performances
and we note that the TI-SiSS is able to provide a gain even when there are some strong
downward movements in the stock price behavior (for instance, see Microsoft among the
most capitalized). For the five least capitalized companies, we have a loss only in two cases,
with the highest gain for Community Health, and when comparing the performances with
the MDD we observe that, in all cases, our strategy gives much better results.
Table 8 analyzes the performances of the TI-SiSS strategy, introducing a multiplicative
transaction cost that is 1% of wealth. The transaction cost is applied for each buying/selling
signal. The conclusions in Table 7 are confirmed.
Table 7. Gross performances for a TI-SiSS trading strategy for the most/least capitalized companies.
Table 8. Net performances for a TI-SiSS trading strategy for the most/least capitalized companies.
To determine the net performance, we consider 1% of wealth as a multiplicative transaction cost for
each buying/selling operation.
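A sketch of how such a multiplicative cost can be applied to the wealth path (a simplification; the returns and trade flags below are hypothetical):

def net_wealth(gross_returns, trade_flags, initial_wealth=100.0, cost=0.01):
    # Apply a multiplicative transaction cost to the wealth on every buy/sell signal.
    wealth = initial_wealth
    for r, traded in zip(gross_returns, trade_flags):
        wealth *= (1.0 + r)        # daily gross return of the strategy
        if traded:                 # a buy or sell operation is executed on this day
            wealth *= (1.0 - cost)
    return wealth

# e.g., two operations over five days
print(net_wealth([0.01, 0.0, -0.005, 0.02, 0.0], [True, False, False, False, True]))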
4. Conclusions
Using Alternative Data, we were able to construct a new financial technical indicator,
named Trend Indicator, for each stock in the dataset. Alternative Data refers to unique
information content not previously known to financial markets in the investment process.
Investors, vendors, and research firms use the term to refer to information that is different
from the usual government- or company-provided data on the economy, earnings, and
other traditional metrics. Alternative Data are big and often unstructured data coming
from different sources, mainly digital (i.e., blogs, forums, social or e-commerce platforms,
maps, etc.), and whose analysis implies an algorithmic approach, unique expertise, and
cutting-edge technology.
Using Alternative Data, FinScience daily publishes two alternative data metrics ob-
tained by applying NLP algorithms such as entity extraction and classification: Sentiment
and Popularity. These two quantities were used for the construction of our “Trend Indicator”
(for each stock) that leads to the TI-SiSS strategy. To check the profitability of the TI-SiSS,
we compared it to the SiSS strategy. The latter provided positive results and is the natural
competitor of our strategy. Moreover, we conducted a gross/net performance analysis
of the TI-SiSS for the five most/least capitalized companies in the dataset. It is worth
noting that TI-SiSS is able to provide a gain even though there are some strong downward
movements in the stock price behavior (see, for instance, Microsoft, for which we observed
the lowest value of the maximum drawdown while the corresponding TI-SiSS’s gain was
the largest one among the five most capitalized stocks). A possible drawback of our trading strategy lies in the choice of the underlying news: in some cases, the news is not necessarily related to financial topics. Moreover, a possible bias comes from fake news that might be present on social media.
Further investigation is needed to improve the strategy by including other sources or
by testing our strategy on intraday stock data and considering different markets.
Author Contributions: Conceptualization, A.F.; methodology, A.F.; software, A.F.; validation, I.B. and
A.F.; formal analysis, A.F.; investigation, I.B.; data curation, A.F.; writing—original draft preparation,
A.F.; writing—review and editing, L.M.; visualization, A.F.; supervision, I.B., A.G. and L.M.; project
administration, A.F. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: The data and the software are available upon request.
Acknowledgments: We would like to thank FinScience, an Italian firm specialized in artificial
intelligence algorithms and Alternative Data analysis for the investment industry, for supporting
this paper.
Conflicts of Interest: The authors declare no conflicts of interest.
Appendix A

Algorithm A1 Trend Indicator.

import pandas as pd
import pandas_ta as ta
from lightgbm import LGBMClassifier

MANDATORY_COLUMNS = ['adj_close', 'adj_high', 'adj_low']


def labeling(x):
    # Map the forward Donchian change into the three trend classes:
    # 2 = upward, 1 = neutral/sideways, 0 = downward.
    out = None
    if (x is not None) and (x > 0):
        out = 2
    elif (x is not None) and (x < 0):
        out = 0
    elif (x is not None) and (x == 0):
        out = 1
    return out


def check_columns_name(dataset: pd.DataFrame) -> bool:
    # The input dataframe must contain the adjusted close/high/low prices.
    return set(MANDATORY_COLUMNS).issubset(dataset.columns)


def trend_indicator(dataset: pd.DataFrame, Donchian_periods: int, test_percentage: float):
    dataset = dataset.copy()
    if not check_columns_name(dataset=dataset):
        raise Exception(f"Dataset is not correct! It must contain the columns: {MANDATORY_COLUMNS}")

    # Donchian channel used to build the labels.
    donchian = ta.donchian(dataset['adj_low'], dataset['adj_high'],
                           lower_length=Donchian_periods,
                           upper_length=Donchian_periods).dropna()
    donchian_up = donchian[f'DCU_{Donchian_periods}_{Donchian_periods}']
    donchian_pct = donchian_up.pct_change()

    # Pad with None at the end so that each row is paired with the Donchian value
    # computed Donchian_periods - 1 steps ahead (forward-looking label construction).
    dataset['Donchian channel'] = donchian_up.tolist() + [None] * (Donchian_periods - 1)
    dataset['Donchian change'] = donchian_pct.tolist() + [None] * (Donchian_periods - 1)

    Labels = dataset['Donchian change'].apply(labeling).dropna()

    dataset = dataset.dropna()
    del dataset['adj_high']
    del dataset['adj_low']
    del dataset['Donchian channel']
    del dataset['Donchian change']

    LGBM_model = LGBMClassifier(objective='softmax', n_estimators=300, learning_rate=0.3,
                                max_depth=15, subsample_for_bin=200000,
                                min_split_gain=0, random_state=123)
    test_size = int(len(dataset['adj_close']) * test_percentage)
    Target = Labels.shift(-1)   # tomorrow's label is today's target
    Y = Target
    X = dataset
    test_predictions = []

    # Walk-forward (rolling) predictions over the test window.
    for i in range(test_size):
        x_train = X[:(-test_size + i)]
        y_train = Y[:(-test_size + i)]
        x_test = X[(-test_size + i):]
        LGBM_model.fit(x_train, y_train)
        pred_test = LGBM_model.predict(x_test)
        test_predictions.append(pred_test[0])

    array_of_predictions = [None] * len(X[:(-test_size)])
    array_of_predictions.extend(test_predictions)
    dataset['Trend_Predictions'] = array_of_predictions

    return dataset, LGBM_model
Algorithm A3 TI-SiSS.
Input: adjusted close price, sentiment, popularity, company names and Trend Indicator's predictions
Output: Dataframe with the security names and the corresponding returns
Required Packages: pandas, numpy, talib, linregress

import pandas as pd
import numpy as np
import talib
from scipy.stats import linregress
Algorithm A4 SiSS.
Input: adjusted close price, adjusted high price, adjusted low price, sentiment, popularity and company names
Output: Dataframe with the security names and the corresponding returns
Required Packages: pandas, numpy, talib, linregress

import pandas as pd
import numpy as np
import talib
from scipy.stats import linregress

SISS_MANDATORY_COLUMNS = ['adj_close', 'adj_high', 'adj_low']  # required price columns, as listed in the inputs above


def siss_check_columns_name(dataset: pd.DataFrame) -> bool:
    # Check that the input dataframe contains the required price columns.
    return set(SISS_MANDATORY_COLUMNS).issubset(dataset.columns)
Notes
1 Italian firm specialized in the usage of AI in the financial world. Website: https://finscience.com/it/ (accessed on 1 January 2022).
2 Sentiment and Popularity were first published on 25 September 2019 and, for this reason, we are not able to take into account
previous time-frames.
3 Some assets were included in the dataset after 25 September 2019.
References
Achelis, Steven B. 2001. Technical Analysis from A to Z, 1st ed.; New York: McGraw Hill.
Allen, Franklin, and Risto Karjalainen. 1999. Using genetic algorithms to find technical trading rules. Journal of Financial Economics
51: 245–71. [CrossRef]
Ballings, Michel, Dirk Van den Poel, Nathalie Hespeels, and Ruben Gryp. 2015. Evaluating multiple classifiers for stock price direction
prediction. Expert Systems with Applications 42: 7046–56. [CrossRef]
Barucci, Emilio, Michele Bonollo, Federico Poli, and Edit Rroji. 2021. A machine learning algorithm for stock picking built on
information based outliers. Expert Systems with Applications 184: 115497. [CrossRef]
Bollen, Johan, Huina Mao, and Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2: 1–8.
[CrossRef]
Chen, Tianqi, and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. Paper presented at the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17.
Chong, Eunsuk, Chulwoo Han, and Frank C. Park. 2017. Deep learning networks for stock market analysis and prediction: Methodol-
ogy, data representations, and case studies. Expert Systems with Applications 83: 187–205. [CrossRef]
Duz Tan, Selin, and Oktay Tas. 2021. Social media sentiment in international stock returns and trading activity. Journal of Behavioral
Finance 22: 221–34. [CrossRef]
Ellis, Craig A., and Simon A. Parbery. 2005. Is smarter better? A comparison of adaptive, and simple moving average trading strategies.
Research in International Business and Finance 19: 399–411. [CrossRef]
Fama, Eugene F. 1970. Efficient capital markets: A review of theory and empirical work. Journal of Finance 25: 383–417. [CrossRef]
Jaquart, Patrick, David Dann, and Carl Martin. 2020. Machine learning for bitcoin pricing—A structured literature review. Paper
presented at WI 2020 Proceedings, Potsdam, Germany, March 8–11; Berlin: GITO Verlag; pp. 174–88.
Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. Lightgbm: A highly
efficient gradient boosting decision tree. Paper presented at 31st Conference on Neural Information Processing Systems (NIPS
2017), Long Beach, CA, USA, December 4–9.
Lee, Ming-Chi. 2009. Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert
Systems with Applications 36: 10896–904. [CrossRef]
Levy, Robert A. 1967. Relative strength as a criterion for investment selection. The Journal of Finance 22: 595–610. [CrossRef]
LightGBM. 2022. Python Package. Available online: https://lightgbm.readthedocs.io/en/v3.3.2/ (accessed on 10 October 2022).
Thomsett, Michael C. 2019. Momentum Oscillators: Duration and Speed of a Trend. In Practical Trend Analysis: Applying Signals and
Indicators to Improve Trade Timing, 2nd ed.; Berlin: De Gruyter.
XGBoost. 2022. Python Package. Available online: https://xgboost.readthedocs.io/en/stable/python/index.html (accessed on 3 November 2022).
Yang, Steve Y., Sheung Yin Kevin Mo, Anqi Liu, and Andrei A. Kirilenko. 2017. Genetic programming optimization for a sentiment
feedback strength based trading strategy. Neurocomputing 264: 29–41. [CrossRef]