(A) - A Conditional Fuzzy Inference Approach in Forecasting
The material cannot be used for any other purpose without further
permission of the publisher and is for private use only.
There may be differences between this version and the published version.
You are advised to consult the publisher’s version if you wish to cite from
it.
http://eprints.gla.ac.uk/202957/
This study introduces a Conditional Fuzzy inference (CF) approach in forecasting. The proposed approach is able
to deduce Fuzzy Rules (FRs) conditional on a set of restrictions. This conditional rule selection discards weak rules
and the generated forecasts are based only on the most powerful ones. Through this process, it is capable of
achieving higher forecasting performance and improving the interpretability of the underlying system. The CF
concept is applied in a series of forecasting exercises on stocks and football games datasets. Its performance is
benchmarked against a Relevance Vector Machine (RVM), an Adaptive Neuro-Fuzzy Inference System (ANFIS),
an Ordered Probit (OP), a Multilayer Perceptron Neural Network (MLP), a k-Nearest Neighbour (k-NN), a
Decision Tree (DT) and a Support Vector Machine (SVM) model. The results demonstrate that the CF provides
higher statistical accuracy than its benchmarks.
* School of Management, University of Bath, Wessex House, Bath BA2 7AY, United Kingdom
([email protected])
** University of Glasgow Business School, University of Glasgow, Gilbert Scott Building, Glasgow G12 8QQ, United
Kingdom ([email protected]), Corresponding Author
*** University of Glasgow Business School, University of Glasgow, Gilbert Scott Building, Glasgow G12 8QQ, United
Kingdom ([email protected])
**** Essex Business School, University of Essex, SO4 3SQ, United Kingdom
([email protected])
1. Introduction
In this research a Conditional Fuzzy inference (CF) approach for forecasting is introduced. In CF a set of Fuzzy
Rules (FRs) is generated in the in-sample and a power is assigned to each rule based on the rule’s frequency and
accuracy. Thereafter all rules are ranked and the strongest are applied in the out-of-sample. For each out-of-
sample data point, membership functions are calculated and if certain conditions are met, a weighted average
of the strongest FRs is estimated. These conditions are based on the in-sample and ensure that the rule is strong
enough for out-of-sample estimation. If the conditions for a data point are not satisfied and no strong rule is
close, then no forecast is generated for that point. Our CF process avoids weak rules and ensures that the
generated forecasts are based on a weighted average of the most powerful FRs.
The proposed methodology is advantageous in several respects. Firstly, it is useful in problems where the
practitioner or the researcher is interested only in strong signals and the risk of having a poor forecast is greater
than having no forecast. For example, in financial trading, betting on a sport or in any other sensitive decision-
making process, poor forecasts lead to financial losses. In these environments the underlying series are volatile
and decision makers are risk averse; abstaining from the market is better than making decisions under
uncertainty. Secondly, CF can improve the out-of-sample accuracy of the underlying system and offer
transparency at the same time. This is beneficial in problems where complex models (such as machine learning)
are necessary. Thirdly, the generated rules can be easily applied by non-experts as the number of rules applied
is small and they are easily replicated. Lastly, the chosen rules do not suffer from over-fitting or under-fitting
problems. In machine learning and complex models, it is common for the performance to be driven by extensive
experimentation. In these cases, the generated forecasts can be due to over-specification and have no
generalisation value. In CF the weak signals are dropped and the noise within the model is reduced.
An extensive empirical study is designed and implemented in order to test the merits of the proposed
methodology. CF is applied to two of the most popular forecasting exercises, namely stock and football
game result prediction. Both forecasting exercises are characterized by their high return/risk nature and
popularity amongst forecasters. Thus, they seem well suited to our CF approach. More specifically, the
proposed model forecasts the result and the number of goals in football games in the three biggest
football championships (the English Premier League, Italian Serie A and Spanish La Liga) and the direction of the
Pfizer Inc. (PFE) and Goldman Sachs Group (GS) stocks. The forecasts are evaluated through a realistic betting
exercise based on the Betbrain average odds for the case of football games and through a simple trading exercise
for the case of stocks. For the case of the football games, CF is combined with a Relevance Vector Machine
(RVM) that reduces the dimension of the input vector and a series of CF rules are extracted. For the stocks’
prediction exercise, CF is applied directly to the dataset as the input vector is already low-dimensional. Our
CF forecasts are benchmarked against those generated by an Adaptive Neuro-Fuzzy Inference System (ANFIS)
model (the most popular fuzzy extraction structure), a single RVM model, an Ordered Probit (OP) model,
a Multilayer Perceptron (MLP) neural network, a k-Nearest Neighbour (k-NN) model, a Decision Tree (DT) and a
Support Vector Machine (SVM) model. The comparison between CF and ANFIS reveals the benefits of the
proposed framework compared to an unconditional (ordinary) Fuzzy Inference System (FIS), whilst
benchmarking the results with the single RVM will validate whether CF can increase the accuracy of the
underlying system. OP, MLP, k-NN, DT and SVM are some of the most popular techniques in football games and
stocks forecasting. Thus, the comparison with CF offers a benchmark close to the relevant literature.
Fuzzy Logic (FL) was introduced by Zadeh (1965). Its motivation is driven by the work and functioning of the
human mind. Even though a tremendous amount of information presents itself to a human in any given situation
– an amount that would ‘choke’ a typical computer – the human mind has the ability to discard the irrelevant
elements and to concentrate only on the information that is relevant. The ability of the human mind to deal only
with the part of the information that is relevant is connected with its ability to process fuzzy information
(Zadeh, 1983). In that way the amount of information the brain has to deal with is reduced to a manageable
level and the decision process is faster, simpler and more effective. FL can be described as multi-value logic that
allows for intermediate values instead of the conventional evaluations like yes/no, up/down, on/off. These
linguistic expressions are called Fuzzy Rules (FRs). FRs are conditional statements in the “if-then” form. They are
widely used in studies with systems whose actions are incomprehensible (Bellman and Zadeh, 1970). In these
studies, researchers apply an FIS to the in-sample to generate a set of FRs that maps the interactions between
variables (see amongst others Hruschka, 1988; Teodorović, 1994; Piramuthu, 1999; Kuo, 2001; Chang et al.,
2008; and Gradojevic and Gençay, 2013). This set of FRs reveals the structure of the system and how inputs
relate to the outputs. However, this structure is rarely retained to the same extent in the out-of-sample.
Several conditional approaches have been proposed in the literature aiming to improve the performance of FL.
These modifications focus on different stages of FIS development. Earlier studies focus on the ‘inference’ stage
and try to improve the quality of a trained FIS (see among others Fukami et al., 1980; Mizumoto, 1982; Sugeno,
1985). These techniques can be categorically termed as ‘conditional inference’ methods. The ANFIS is motivated
by these works and provides a generalized double-pass training algorithm which is shown to be superior to its
predecessors (Jang, 1993). An alternative conditional approach to FL is to amend the
implementation (i.e. defuzzification) phase. The motivation here is to improve the performance of FL in practice
rather than improving the quality of the FRs. In this case the inference phase is not affected, and the conditional
approach is applied to an initially trained FIS. Therefore, any training/inference technique (either conditional or
unconditional) can be used to generate an FIS. The algebra for such modifications is firstly introduced in
Singpurwalla and Booker (2004). Kadri (2014) introduces an adaptive approach to the conditional defuzzification of FRs
in signal processing. The CF introduced in this paper is motivated by the previous works on conditional
defuzzification and provides a generalized straightforward procedure to improve the performance of any
underlying FIS. In particular, CF proposes a conditional framework for implementing an FIS in a robust way based
on a training dataset and the environment in which the dataset is placed. This allows us to adjust an FIS based on the
application by looking at endogenous factors (training dataset) and exogenous ones (environment). These
factors are introduced to an underlying FIS which could be trained conditionally or unconditionally. The aim of
CF is to improve the performance of the underlying model no matter how the original model is trained. In terms
of the FIS structure, the part affected by CF is the defuzzification stage. The proposed changes are essential in
problems with dynamic data (such as finance or economic series or football games results) where the ‘input-
output’ relationship varies through time and the generated FRs in the in-sample cover only a part of the out-of-
sample. This leads to a reduction of accuracy and to systems that are interpretable but inaccurate. Another
problem is the strength of the generated in-sample FRs. Different rules have different strengths and different
degrees of accuracy. Applying weak FRs in the out-of-sample leads to poor forecasts.
There are a few studies that apply FL to the context of sports and stocks forecasting. Rotshtein et al. (2005)
propose a model for predicting the result of a football match from the previous results of both teams. Although
they suggest that their model accounts for nonlinear dependencies through fuzzy knowledge, they are focusing
on a very illiquid football betting market, the one of Finland, and they do not explore the betting profitability of
their forecasts. Trawinski (2010) proposes a fuzzy model for extracting FRs in order to predict basketball game
results. The author compares ten FR learning algorithms against a standard OLS, but does not present robust
empirical results or any betting application to football. Finally, Bastos and da Rosa (2013) propose a static and a
dynamic Poisson-Gamma model to predict the outcome of World Cup football results based on the number of
goals scored by each team. In their application, a fuzzy C-means algorithm is used for clustering. Nonetheless,
they do not offer any betting application, while their forecasting exercise is limited to a football event occurring
only every four years. Wang (2002) proposes the use of a fuzzy grey model for intraday stock price forecasting.
The author suggests that the model is user-friendly and efficient in terms of statistical accuracy, but its
shortcoming is that it is not adaptive. An FL approach for forecasting stock market trends in the ASE and NYSE stock
exchanges is also presented by Atsalakis and Valavanis (2009). Their empirical findings demonstrate the
superiority of the proposed neuro-fuzzy approach compared to thirteen other models from the literature, while
they challenge the weak form of the Efficient Market Hypothesis (EMH). Sun et al. (2015) forecast the Chinese
stock index future prices using fuzzy sets and a multivariate fuzzy time series method. The authors combine
traditional fuzzy time series models and a rough set method to extract strong rules and make superior
predictions. Oztekin et al. (2016) apply ANFIS for forecasting stock price returns of the Istanbul Stock Exchange.
Its performance is benchmarked with Support Vector Machines (SVMs) and Neural Networks (NNs). Although
ANFIS is not outperforming the other models, the authors suggest that the combination of the three approaches
is beneficial for such forecasting exercises. The merits of ANFIS are also validated by the recent work of Atsalakis
et al. (2019), who successfully apply neuro-fuzzy techniques in predicting Bitcoin prices.
The majority of researchers in football games apply OP, probabilistic or machine learning methods. OP
applications in football forecasting are common due to the ordered nature of the football result variable (Away
team win, draw and Home team win). Kuypers (2000) applies OP in order to test how the betting market
participants utilize available information and suggests that betting arbitrage is possible. Goddard and
Asimakopoulos (2004) and Goddard (2005) apply OP to forecast English football outcomes based on teams’
quality and past performance indicators. Their results indicate that this approach, which is followed also in our
study, is robust and provides high forecasting performance. Other studies utilize SVM frameworks. Vlastakis et
al. (2008) show that SVM performs better than a Poisson model when applied to the task of predicting European
football match scores. Gomes et al. (2016) and Martins et al. (2017) also utilize SVMs to forecast the number of
football corners, goals and outcomes of Premier League and several other championships respectively. Recently,
Baboota and Kaur (2018) show that SVM with radial basis kernel is efficient in predicting the rank probability
scores for the English Premier League using teams’ past performance indicators as inputs. In terms of
probabilistic approaches, Dixon and Coles (1997) forecast the number of goals and the results of football games
with forecasting systems based on the Poisson distribution. Crowder et al. (2002) estimate the football game
results’ probabilities with refinements of the independent Poisson model. Koopman and Lit (2015) and Angelini
and De Angelis (2017) refine traditional Poisson models and improve upon the model of Dixon and
Coles (1997). NNs have also been used in similar forecasting exercises. For example, Constantinou et al. (2012)
suggest a pi-Bayesian Network (BN) for English Premier league match forecasting. Their model applies time-
dependent data with weighted degrees of uncertainty and exhibits high statistical accuracy along with
profitability over the publicly available odds. A similar approach is applied by Owramipur et al. (2013). Huang
and Chen (2011) apply MLPs to forecast the match outcomes of 2006 World Cup, while Şahin and Erol (2018)
suggest that MLP is superior in forecasting football matches’ attendance compared to ANFIS.
Similar approaches are applied for stock market predictions. Hausman et al. (1992) show that OP is superior in
capturing the stock price changes compared to simple linear regression approaches, while Purda (2007) utilizes
OP to analyse stock market reactions originating from anticipated and surprising bond rating changes. Yang and
Parwada (2012) also incorporate OP to predict the direction of price movements of nine Australian stocks. Their
average proportion of correct forecasts exceeds seventy percent, which demonstrates the success of OP in their
forecasting application. Tay and Cao (2002) modify SVM to select fewer support vectors and their results show
that SVM forecasting accuracy is improved when applied in futures contract price prediction. Huang and Wu
(2008) apply a Bayesian SVM in order to forecast returns of four major stock indices and they clearly exhibit the
RVM success over other traditional SVM approaches. Gong et al. (2016) apply a hybrid SVM to predict stock
return patterns, while Chen and Hao (2017) combine SVM and k-NN algorithms for stock index price forecasting.
Both studies show the benefits of using SVM classifiers for financial forecasting. Guresen et al. (2011) apply a
series of NNs to the task of forecasting NASDAQ stock prices and they find that classical MLP structures are
superior in terms of statistical accuracy. Wang et al. (2011) also illustrate the merits of back-propagation NNs
for a similar financial forecasting exercise. Ticknor (2013) suggests that the Bayesian regularization of NNs is
beneficial for obtaining more accurate forecasts of two US stocks. On the other hand, as shown by Patel et al.
(2015) Bayesian classifiers struggle to outperform MLP and SVM structures when it comes to forecasting stocks’
directional movements through technical indicators.
With the previous background in mind, our proposed methodology is benchmarked against some of the most
popular models used in football games and stocks’ prediction. None of the previous studies applies a conditional
fuzzy framework similar to the one introduced in our study. The remainder of this paper is organised as follows:
Section 2 introduces the concept of CF and describes in detail the methodology and the benchmarks applied.
Section 3 describes the dataset used and the empirical application to football betting. Section 4 presents the
conclusion. Section 5 includes some remarks on the practical implications and future work, while in the
Appendix several technical details are explored. Finally, the mathematical background on ANFIS, RVM and OP
and extra empirical results are placed in an online Appendix.
2. Methodology
This section summarizes all the relevant information regarding the proposed methodology. Initially, the CF
method is motivated. Then, its mathematical and technical description is explained in detail. Finally, short
descriptions of the benchmarks are provided.
FL frameworks are traditionally applied in decision processes, where the level of uncertainty is high and a
complete problem formulation is difficult. In FL, an FIS is applied and a set of FRs are extracted for prediction
and decision-making (Sugeno, 1985)1. The most popular FIS is the ANFIS of Jang (1993). The mathematical
formulation of ANFIS is presented in the online Appendix S.1.
The methods of defuzzification vary in the literature. For example, Jang (1993) classifies possible defuzzification
functions in three main categories. In the first category a crisp output for each rule is estimated and then a
weighted average of each rule’s output forms the aggregate output as the fuzzy model’s output. The applied
weights are based on the firing strength of the rules and the output membership function. The second category
of defuzzifiers involves applying a “max” operator to qualified fuzzy outputs that meet some criteria (e.g.
minimum firing strength) and then the aggregate output is determined by a function such as the mean of
maxima, the maximum criterion and so forth. The third category is the Sugeno approach (adopted in ANFIS),
namely for each rule’s output, a general linear model based on inputs is estimated and the aggregate output is
the weighted average of all rules involved.
The conditional approach in the second category that is overlooked in the ANFIS structure can be promising. If
the FRs extracted through the training process are not all of the same quality, the user might prefer to use only a subset of the FRs
rather than all of them. The ANFIS adopts the double-pass algorithm to optimise the rule specifications, but this
may lead to overfitting and to poor performance on previously unobserved points. In other words,
the problem is that in the out-of-sample the membership grade may not be strong enough for some or all rules.
This eventually leads to bad forecasts. This problem is common in machine learning where the out-of-sample
performance is either considerably worse than the in-sample’s one (over-fitting), or the model is inadequately
trained (under-fitting).
1 For the identification and parameterization of an FIS, the interested reader should refer to studies such as Bezdek et al. (1984) and Jang
(1993).
The proposed methodology in this study combines the merits of the second and the third defuzzification
approaches. A crisp output based on a simple linear model is highly favourable in the case of large-scale problems.
On the other hand, when a dataset is considerably noisy, a selective procedure for the aggregation of FRs
can control the uncertainty. The next section explains the algorithm for the CF.
2.1.2 Algorithm
$$\acute{𝓌}_i^{*} = \prod_{k=1}^{K} \acute{\mu}_{ik}(x_k^{*}, mf) \qquad (1)$$
where $\acute{\mu}_{\cdot}(\cdot)$ is the membership grade calculated based on a given membership function 𝑚𝑓 and a 𝐾-dimensional
test data point with input vector $x^{*} = [x_1^{*}, \dots, x_K^{*}]$. The choice of the membership function can be central to the
model’s fit and its predictive ability. Based on Guillaume (2001), for most inputs of a fuzzy system, it can be
intuitively accepted that the farther an observation lies from the centre of a rule, the lower the associated membership
grade. Thus, it can be assumed that the membership function should be bell-shaped. The Gaussian
membership function offers such a smooth shape around the centres of clusters and is therefore a fitting choice for
our case. Such choice is suitable for forecasting applications like ours (see amongst others Wang and Mendel,
1992; Jang, 1993; Kuncheva, 2000; Cheng and Lee, 2001; and Akkoç, 2012). For football forecasting specifically,
Vlastakis et al. (2008), Igiri (2015) and Baboota and Kaur (2018), who apply SVM to football match prediction,
also employ the Gaussian kernel in their models. We also experimented in the in-sample with the
univariate/multivariate sigmoid, triangular and trapezoid membership functions. In all cases, the in-sample
accuracy was worse than the one acquired with the Gaussian membership function. Under a Gaussian
distribution assumption, the membership grade for the 𝑘-th element of rule 𝑖 is given by:
$$\acute{\mu}_{ik}(x_k^{*}, mf) = \exp\left\{ -\left( \frac{x_k^{*} - c_{ik}}{a_{ik}} \right)^{2} \right\} \qquad (2)$$
where 𝑎𝑖𝑘 and 𝑐𝑖𝑘 are specified for the FRs through the training process.
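As an illustration of equations (1) and (2), the following minimal Python sketch computes the Gaussian membership grades and the firing strength of a single rule for one test point. The function names and the numerical values of the centres and widths are hypothetical and chosen only for illustration; in the study these parameters are specified through the training of the underlying FIS.

```python
import math

def gaussian_membership(x, c, a):
    # Equation (2): membership grade of input x for a rule element
    # with centre c and width a.
    return math.exp(-((x - c) / a) ** 2)

def firing_strength(x_star, centres, widths):
    # Equation (1): product of the K membership grades of one rule
    # evaluated at the test point x_star.
    grades = [gaussian_membership(x, c, a)
              for x, c, a in zip(x_star, centres, widths)]
    return math.prod(grades)

# Hypothetical rule with K = 2 inputs (illustrative values only).
centres = [0.0, 1.0]
widths = [1.0, 2.0]

on_centre = firing_strength([0.0, 1.0], centres, widths)
off_centre = firing_strength([1.0, 3.0], centres, widths)
print(on_centre, off_centre)
```

A point that coincides with the rule centre obtains a firing strength of one, while the second point lies one width away from the centre in each dimension, so its firing strength is exp(-1) multiplied by exp(-1).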
The CF proposes eligibility criteria for the defuzzifier function to discard rules with unsatisfactory membership
grades from the set of rules that form the fuzzy output. This process involves two components: the criteria set-
up and a satisfactory threshold (Θ). The criteria set-up determines the evaluation function for each rule and can
take one of the forms offered in equations (3) and (4) below:
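$$C_i^{*} = \begin{cases} 1, & \left(\acute{𝓌}_i^{*}\right)^{1/K} \geq \Theta \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
or alternatively
$$C_i^{*} = \begin{cases} 1, & \min_k\left(\acute{\mu}_{ik}(x_k^{*}, mf)\right) \geq \Theta \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$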
The criteria set-up can be defined as an average of membership grades for each rule compared to Θ (equation
(3)) or as the minimum membership grade compared with Θ (equation (4)). Both procedures generate an
indicator as to whether the rule should be applied or not. In the equations (3) and (4), 𝐶𝑖∗ is the indicator to
determine whether the rule 𝑖 is qualified (𝐶𝑖∗ = 1) as an eligible rule for the specific test observation to be
applied. Alternatively, the rule is considered as weak if 𝐶𝑖∗ = 0. The other component to shape the eligibility
criteria is the threshold parameter. Θ determines the average power required for a rule to be considered as a
strong rule. We propose defining the Θ as:
$$\Theta = \max\left( 𝓌_\lambda^{1/K},\ \Lambda \right) \qquad (5)$$
Under this setting, the threshold level originates from two sources. Firstly, from the training dataset reflected in
$𝓌_\lambda^{1/K}$ and secondly, from a general value allocated to the membership grade that represents a strong rule (Λ). Θ is
simply the larger of the two quantities. Alternative settings for Θ depending on the nature of the application might
be used as well. In equation (5), $𝓌_\lambda^{1/K}$ is the 𝐾-th root of the average firing strength of rule 𝜆, which is the minimum
acceptable level of firing strength based on the training dataset. In other words, $𝓌_\lambda^{1/K}$ is the endogenous
threshold based on the in-sample. To set 𝜆 the following procedure is pursued:
Firstly, the arithmetic average membership grade for each rule 𝑖 and input element 𝑘 is computed over the in-
sample dataset 𝜇̅ 𝑖𝑘 (. ). Secondly, the average firing strength 𝓌𝑖 is given by
$$𝓌_i = \prod_{k=1}^{K} \bar{\mu}_{ik}(\cdot) \qquad (6)$$
Thirdly, the FRs are sorted in descending order of average firing strength (sorted list). In the sorted list, the
rules at the top are the ones whose centres the data points lie closest to. An arbitrary percentile level
for the strongest rules is selected based on the complexity of the problem and the aims of the researcher
(e.g. the top ten percent). Then, the rule at the corresponding percentile of the sorted list (the 90th percentile in this example) is selected. The index (row) of the selected
rule on the original list of rules is 𝜆.
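The three steps above, together with the statement that Θ is the larger of the endogenous and the exogenous quantity, can be sketched as follows. This is a minimal illustration under stated assumptions: the in-sample average membership grades are hypothetical, the function names are ours, and the percentile step is read as "take the weakest rule that still belongs to the chosen top share of the sorted list", which is only one possible interpretation of the selection described in the text.

```python
import math

def average_firing_strengths(avg_grades):
    # Equation (6): product over the K inputs of the in-sample
    # average membership grades, one value per rule.
    return [math.prod(rule) for rule in avg_grades]

def select_lambda(strengths, top_percent=10):
    # Sort rule indices by descending average firing strength (sorted list).
    order = sorted(range(len(strengths)),
                   key=lambda i: strengths[i], reverse=True)
    # Assumption: lambda is the weakest rule inside the top `top_percent`.
    n_top = max(1, (top_percent * len(order) + 99) // 100)  # integer ceiling
    return order[n_top - 1]  # index on the original list of rules

def threshold(avg_grades, big_lambda, top_percent=10):
    # Theta is the larger of the endogenous K-th root and the exogenous Lambda.
    strengths = average_firing_strengths(avg_grades)
    lam = select_lambda(strengths, top_percent)
    k = len(avg_grades[lam])
    endogenous = strengths[lam] ** (1.0 / k)
    return max(endogenous, big_lambda)

# Hypothetical in-sample average grades for M = 5 rules and K = 2 inputs.
avg_grades = [[0.90, 0.90], [0.50, 0.60], [0.95, 0.80],
              [0.40, 0.40], [0.70, 0.70]]
theta = threshold(avg_grades, big_lambda=0.80, top_percent=40)
print(round(theta, 4))
```

In this toy setting the endogenous component exceeds Λ = 0.80 and therefore determines Θ; raising Λ to 0.90 would make the exogenous component binding instead.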
Unlike 𝜆 and $𝓌_\lambda^{1/K}$, which are endogenous, Λ is a fixed quantity that determines the general quality of a strong
rule. The membership grade ranges from zero to one. The closer the data-point gets to the centre of a rule, the
higher will be the membership grade. Ideally, if the data-point and the rule centre overlap, the membership
grade is equal to one. The further away the data-point gets, the lower the membership grade becomes. This
implies that the FR is getting weaker. The choice of Λ depends on the problem under study and the practitioners’
needs. In problems where uncertainty is high, and a wrong forecast can have a considerable impact, Λ should
be set high (over 0.8). This will reduce on the one hand the uncertainty but on the other hand, it will also reduce
the number of CF forecasts generated. In problems where wrong forecasts have a small effect on the utility
function of the practitioner, Λ can be set lower. This will lead to more CF forecasts that retain some level of risk.
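The eligibility indicators of equations (3) and (4) and the effect of a stricter threshold can be illustrated with a short Python sketch. The membership grades and the threshold values below are hypothetical, and the function names are ours. Since the minimum of a set of grades never exceeds their geometric mean, the criterion of equation (4) is at least as strict as that of equation (3) for the same Θ.

```python
import math

def eligible_geometric_mean(grades, theta):
    # Equation (3): the rule qualifies if the K-th root of its
    # firing strength is at least theta.
    k = len(grades)
    return 1 if math.prod(grades) ** (1.0 / k) >= theta else 0

def eligible_minimum(grades, theta):
    # Equation (4): the rule qualifies if its smallest membership
    # grade is at least theta.
    return 1 if min(grades) >= theta else 0

# Hypothetical membership grades of one rule for one test point (K = 2).
grades = [0.95, 0.80]
for theta in (0.80, 0.85, 0.90):
    print(theta,
          eligible_geometric_mean(grades, theta),
          eligible_minimum(grades, theta))
```

With these illustrative grades the rule qualifies under both criteria at the lowest threshold, only under equation (3) at the intermediate one, and under neither at the highest one, which mirrors the trade-off between uncertainty and the number of forecasts generated.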
The combination of the endogenous and the exogenous threshold ensures that the applied rules for forecasting
are correctly fitted. Over-fitting and under-fitting are among the most significant challenges in machine learning. If an
FIS is over-fitted, the average membership grades, the average firing strengths and accordingly $𝓌_\lambda^{1/K}$ are
high. As the model is overly specified to match the training samples, once the out-of-sample data-points are fed in,
the average membership grades for the new observations will fall below $𝓌_\lambda^{1/K}$. In the CF, the model drops
these rules and looks for any remaining rule that can satisfy the $𝓌_\lambda^{1/K}$ threshold. This ensures that the model
forecasts properly even if the original FIS is over-fit. If the CF is unable to identify for a single point any relevant
strong rule, then no forecast is generated. Similarly, in case of an under-fit model, the in-sample measure $𝓌_\lambda^{1/K}$
is low. However, there might still be some rules that are strong and fit enough for certain points in the out-of-
sample. Λ ensures that under-fit rules with low $𝓌_\lambda^{1/K}$ are not applied.
The proposed CF modifies the last layer. The modification is able to grasp the strongest possible FRs and drop
the mediocre and poor ones. In the CF model, the final layer outcome can be presented as:
As the equation implies the difference in CF compared to ANFIS is in the defuzzifier module, where the aggregate
output is modified to include the eligibility criteria. The CF output 𝑂́∗ is computed through a node function 𝑂́5
given the input vector 𝑥∗, the ANFIS specification, and the threshold parameters. The ANFIS specification includes
the training nodes (𝑂1, …, 𝑂4), the premise parameter set {𝐴, 𝐶} and the consequent parameter set {𝑃, 𝑄, 𝑅}. The CF
threshold parameters are {𝜆, Λ}. We propose two defuzzification functions: the first calculates a weighted
average of the outputs of the qualified FRs, and the second returns the output of the strongest qualified rule, as represented in
equations (8) and (9) below:
$$O'_5 = \frac{\sum_{i=1}^{M} \acute{𝓌}_i C_i f_i}{\sum_{i=1}^{M} \acute{𝓌}_i C_i} \qquad (8)$$
or alternatively
$$O'_5 = f_v(\cdot), \quad v = \arg\max_i\left(\acute{𝓌}_i\right) \qquad (9)$$
In equations (8) and (9), 𝑂′5 is the node function for the last layer of CF. For each rule 𝑖 the firing strength 𝓌́𝑖 is
estimated by equation (1), while the conditional indicator 𝐶𝑖 is given by equation (3) or (4) based on threshold
parameters estimated in (5) and (6). 𝑓𝑖 is the regression output estimated by the FR (traditionally as with any
Sugeno fuzzy approach). Finally, in (9) 𝑣 is the index of the rule with the maximum firing strength 𝓌́𝑖.
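A minimal Python sketch of the two defuzzification functions in equations (8) and (9) is given below. The firing strengths, indicators and rule outputs are hypothetical inputs, and the function name and the use of `None` to signal that no forecast is generated are our own conventions. For equation (9) the sketch restricts the maximum to the qualified rules, following the description of the second function as the output of the strongest qualified rule; this restriction is an assumption about how the equation is meant to be applied.

```python
def cf_output(strengths, indicators, rule_outputs, method="weighted"):
    # Keep only the rules flagged as eligible (indicator equal to 1).
    qualified = [i for i, c in enumerate(indicators) if c == 1]
    if not qualified:
        # No strong rule is close to the point: no forecast is generated.
        return None
    if method == "weighted":
        # Equation (8): firing-strength-weighted average of qualified outputs.
        num = sum(strengths[i] * rule_outputs[i] for i in qualified)
        den = sum(strengths[i] for i in qualified)
        return num / den
    # Equation (9): output of the qualified rule with the largest firing strength.
    best = max(qualified, key=lambda i: strengths[i])
    return rule_outputs[best]

# Hypothetical values for M = 3 rules at one out-of-sample point.
strengths = [0.8, 0.5, 0.9]
indicators = [1, 0, 1]
rule_outputs = [1.0, -1.0, 2.0]

print(cf_output(strengths, indicators, rule_outputs))
print(cf_output(strengths, indicators, rule_outputs, method="strongest"))
print(cf_output(strengths, [0, 0, 0], rule_outputs))
```

Returning `None` rather than a default value keeps the abstention explicit, so that a downstream betting or trading routine can distinguish "no signal" from a genuine forecast.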
The above-mentioned modification is crucial, as it can provide predictions for points (observations) where strong
rules are nearby and at the same time satisfy endogenous and exogenous threshold specifications. The final CF
outcome is a weighted average only of the strongest rules. This attribute is innovative in FL, as it is able to offer
interpretability of the final result, protection against substantial forecast errors and under- or over-fitting in the
underlying decision-making system. To further demonstrate the novelty of the CF method, a comparative graph
of the rules selected in the case of CF and ANFIS is presented in Figure 1.
Note: The distribution shows the selection of fuzzy rules with respect to the distance from centres (𝜇) in
units of standard deviation (𝜎) and the corresponding strength. In ANFIS (whole distribution) all fuzzy rules
are accepted irrespective of the strength, but in CF (dark peak of the distribution) only the strongest rules
are selected for the defuzzification stage.
From Figure 1, it is clear that CF selects rules that are at the peak of the distribution of the ANFIS
selection and within small distances from the centre.
The flowchart for the modelling section of the paper is presented in Figure 2.
Figure 2: Modelling flowchart of the study
Note: The figure presents the steps through which the empirical analysis of this paper is conducted. The solid-line
arrows show the regular link between different blocks while the dashed-line arrows reflect the route for
generating combined forecasts through information fusion-based sensitivity analysis.
Figure 2 shows that in the first stage, the system inputs and target series are generated from the raw datasets.
The next step is to evaluate whether a dimension reduction is needed in the inputs’ matrix. In case that a
technique is applied, the output of the pre-processing stage is an optimal subset of inputs which explains best
the dynamics in the target series. The RVM is used in the feature selection stage to extract the most informative
input vectors (see Section S.2). Alternatively, if no dimension reduction technique is needed, the feature space
remains unaffected. In the following stage (i.e. model fitting) the CF and the benchmarks are trained to make
forecasts as individual models. Finally, the CF and the benchmark models are used to generate combined
forecasts through the information fusion-based sensitivity analysis (IF) as discussed in Section 2.3.
In our application, we apply CF following the evaluation function of equation (3), while the CF output is
calculated based on equation (8)2. Here it should be noted that the CF model synthesis is based on the algebra
offered by Zadeh (1965), Jang (1993) and Singpurwalla and Booker (2004). The alternative definitions of the
2 The results of our empirical application are very similar under both evaluation functions (equations (3) and (4)) and both defuzzification
functions (equations (8) and (9)).
indicator in equations (3) and (4) are based on AND and PRODUCT operators in the FL context, whereas the
defuzzification methods in equations (8) and (9) are based on the discussion of Jang (1993). The theoretical
analysis of this algebra is provided in Singpurwalla and Booker (2004).
In terms of applying CF, football betting is a high risk-high return exercise. A wrong bet usually results in the total
loss of the invested capital. Thus, Λ is set to a high value of 0.90. This results in a stricter selection for CF and resembles
the real-world practice where bettors are highly selective regarding the games on which they will bet. The setting
for Λ is more lenient in stock market application, since the price changes are within a more limited range. In this
case the Λ is chosen between 0.5, 0.75, and 0.9 based on number of rules generated. In the out-of-sample, for
each point the membership grades for the FRs are estimated and based on the eligibility criteria, a signal might
be generated. In case a signal is generated the relevant bet is placed. If a signal is not generated, we abstain
from the market. A detailed illustrative example of the CF application is provided in Appendix A.
This approach is benchmarked against a basket of forecasting algorithms, namely an RVM, an ANFIS, an OP, an
MLP, a k-NN, a DT and an SVM model. RVM is a sparse kernel model designed to tackle the problem of large-scale
data-processing. The Bayesian learning framework used in RVM can generate precise forecasts while reducing the
feature space to the most important (“relevant”) vectors. RVM’s solid probabilistic approach allows the inference
of the optimal hyperparameters and vectors’ weights from the data. In our proposed approach, RVM serves
for both feature selection and prediction in the football exercise and for prediction only in the stocks exercise. The
selected features, the Relevance Vectors (RVs), are applied to ANFIS and to our CF approach. For the mathematical
details of ANFIS and RVM, we refer the reader to the online Appendix (S.1 and S.2 respectively).
For the football games exercises, we reduce the dimension of the inputs vector with both an RVM and an IF
approach. The performance of all our models is better when the RVs are selected as their inputs instead of the
ones generated by the IF. Thus, in our application the inputs of the benchmark models (ANFIS, OP, MLP, k-NN,
DT and SVM) and of our CF approach are the RVs selected by the RVM. Nevertheless, for the sake of
comparison, we also present the performance of CF with the IF generated inputs vector (CF-IF)³. For the stocks’
prediction exercise, the input vector is small enough to be fed directly to CF. In this case, all benchmark models
under study (ANFIS, OP, MLP, k-NN, DT and SVM) use the same set of inputs as CF.
The flowchart of Figure 2 makes apparent that our CF model needs to be benchmarked against ANFIS. With this
comparison, we are able to quantify whether the modification of the last layer of ANFIS is successful and whether
the conditional approach is beneficial. RVM can optimize the global parameters that affect the input variable
space. Its Bayesian probabilistic approach is beneficial in two ways: firstly, it produces sparse solutions able to
reduce the input space for other models and, secondly, it yields optimal parameters that allow the RVM to forecast
efficiently. This is very important, as traditional methods such as cross-validation are not able to achieve this
(Tipping, 2001). OP originates from Logistic Regression (LR) models and tries to estimate the probability of each
outcome for a dependent variable. Depending on the number and nature of the possible choices that the dependent
variable can take, the LR type can be binomial, multinomial or ordinal. Given that the outcomes of football
matches can be ordered as Home team winning, draw and Away team winning, and the stocks’ direction as higher
or lower, the OP model is suitable for our application. The mathematical details of OP (as earlier with ANFIS and
RVM) are provided in the online Appendix S.4.
3 In the case of IF, we calculate the variable sensitivity scores by means of information-fusion as described in Online Appendix S.3. We
also explored the utility of Principal Component Analysis (PCA) on the same task. It provides worse results than those
obtained by RVM or IF. For the sake of space, they are not presented in-text but are available upon request.
The MLPs are the most popular and simple architecture of feed-forward Neural Networks. MLPs consist of three
different types of layers: the input, the hidden and the output layer. The input layer includes the input
nodes, where each node corresponds to a different explanatory variable. The output layer is formed with one
node and represents the output of the NN. The hidden layers separate the input from the output layers and
define the amount of complexity the model is capable of fitting. Following the guidelines of Zhang et al. (1999)
and the characteristics of our datasets, our MLP consists of three layers, with one hidden layer of ten
hidden nodes.
k-NN is based on the idea that pieces of a time series in the past have patterns that might resemble pieces in the
future. It locates similar patterns in terms of nearest neighbours and the Euclidean distance and uses these
patterns to predict the immediate future. The algorithm uses local information to forecast and makes no attempt
to fit a model to the whole time series at once. Our algorithm is constructed based on the guidelines of Guégan
and Huck (2005) and the characteristics of our datasets.
DT is a machine learning technique that builds classification models in the form of tree structures. The dataset
is broken down into smaller subsets while at the same time a decision tree is incrementally developed. In our
application, we apply the CART procedure of Breiman et al. (1984). DTs are easy to understand and interpret and
any nonlinear relationships between their parameters do not affect the algorithm performance.
SVM is a class of supervised machine learning model that has a functional form identical to that of RVM but does
not use Bayesian inference. Compared to RVM, SVM always converges to global optima and requires less training
time. However, the classification accuracy of RVM is higher than that of SVM (Rafi and Saikh, 2013). In this study,
we design our SVM model based on the 10-fold cross validation approach of Oztekin et al. (2016).
For the underlying mathematics of the MLP and the k-NN we refer the reader to Zhang et al. (1999) and Guégan and
Huck (2005) respectively. A technical description of DT can be found in Breiman et al. (1984) and of SVM in Vapnik (1995).
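The nearest-neighbour idea described above can be illustrated with a minimal Python sketch. It is not the specification used in this study (which follows Guégan and Huck, 2005); the function name `knn_forecast`, the window length and the number of neighbours `k` are hypothetical choices made only for illustration.

```python
import math


def knn_forecast(series, window=3, k=2):
    """Forecast the next value from the k past windows closest to the latest one.

    Illustrative sketch only: `window` and `k` are assumed tuning choices.
    """
    query = series[-window:]
    candidates = []
    # Slide over every historical window that has a known successor.
    for start in range(len(series) - window):
        past = series[start:start + window]
        distance = math.dist(past, query)  # Euclidean distance
        successor = series[start + window]
        candidates.append((distance, successor))
    # Keep the k nearest windows and average what followed them.
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    return sum(successor for _, successor in nearest) / len(nearest)
```

For example, `knn_forecast([1, 2, 3, 1, 2, 3, 1, 2], window=2, k=2)` looks for earlier occurrences of the pattern `[1, 2]` and averages the values that followed them. Only local information is used; no global model is fitted.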
For researchers that delve into intensive data mining tasks, the performance of different sets of models is
evaluated over different datasets. Therefore, identifying the best approach in terms of data mining methods is
crucial, but at the same time it proves to be a challenge. One solution is to apply ‘‘composite forecasts’’ from
multiple models in order to improve accuracy and obtain better results from a set of data mining models (Sevim
et al., 2014). Information fusion is a technique that takes the above approach one step further, as it is able to
account for and prioritize information derived from multiple forecasting models. This technique is able to
reduce the bias and uncertainty that can be attributed to individual techniques (Oztekin et al., 2013; Oztekin et al.,
2016). Additionally, it offers a viable solution for coping with dimensionality issues common in classification
studies (Bogaert et al., 2018). The latter is achieved by applying IF for variable selection, creating a rank order
of the importance of the input variables through the forecasting models involved.
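Assuming the dependent variable $\breve{Y}$ and a set of independent variables $\breve{X} = \{\breve{x}_1, \breve{x}_2, \ldots, \breve{x}_{\breve{n}}\}$, an individual
predictive model $\breve{i}$ can be expressed as:
$$\breve{Y}_{\breve{i}} = \breve{F}_{\breve{i}}(\breve{x}_1, \breve{x}_2, \ldots, \breve{x}_{\breve{n}}) = \breve{F}_{\breve{i}}(\breve{X}) \tag{10}$$
It should be noted that $\breve{F}$ is not constrained to any functional form. Considering $\breve{t}$ individual models and a linear
combination as the fusion operation $\breve{\Gamma}$, the fusion model can be expressed as:
$$\breve{Y}_{fusion} = \breve{\Gamma}(\breve{Y}_1, \breve{Y}_2, \ldots, \breve{Y}_{\breve{t}}) = \sum_{\breve{i}=1}^{\breve{t}} \breve{\alpha}_{\breve{i}} \breve{F}_{\breve{i}}(\breve{X}) = \breve{\alpha}_1 \breve{F}_1(\breve{X}) + \breve{\alpha}_2 \breve{F}_2(\breve{X}) + \ldots + \breve{\alpha}_{\breve{t}} \breve{F}_{\breve{t}}(\breve{X}) \tag{11}$$
where $\breve{\alpha}_{\breve{i}}$ is the weighting coefficient for each individual model and $\sum_{\breve{i}=1}^{\breve{t}} \breve{\alpha}_{\breve{i}} = 1$.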
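Equation (11) amounts to a weighted average of the individual forecasts. The short Python sketch below illustrates this under one assumption that is ours and not stated in the text: raw performance scores (for instance AUC values) are rescaled so that the weights sum to one. The function name `fuse_forecasts` is hypothetical.

```python
def fuse_forecasts(forecasts, scores):
    """Linear fusion of individual model forecasts, in the spirit of equation (11).

    forecasts: one forecast per model for the same observation.
    scores:    one non-negative performance score per model (assumed input).
    """
    total = sum(scores)
    # Assumption: weights are the scores rescaled to sum to one.
    weights = [score / total for score in scores]
    fused = sum(w * f for w, f in zip(weights, forecasts))
    return fused, weights
```

With two models forecasting 0.2 and 0.6 and scores 0.6 and 0.9, the better-scoring model receives the larger weight and pulls the fused forecast towards its own value.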
According to Oztekin et al. (2013), the weights are proportional to the forecasting performance of each model.
For that reason, it is expected that information extracted from well performing classifiers should be found more
important than information obtained from those that do not perform well. In our case, we follow Díez-Pastor et
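al. (2015) to calculate Area Under Curve (AUC) estimates that will be used as our $\breve{\alpha}_{\breve{i}}$. In order to calculate the
AUC estimates we need the entries of a confusion matrix, as in the figure below: the true positives ($T^+$), false
positives ($F^+$), true negatives ($T^-$) and false negatives ($F^-$).
$$AUC = \int_0^1 \frac{T^+}{T^+ + F^-} \, d\left(\frac{F^+}{F^+ + T^-}\right), \quad 0.5 \le AUC \le 1 \tag{12}$$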
The boundaries of AUC indicate performances ranging from random classifiers to perfect prediction models.
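As a rough illustration of equation (12), the integral of the true positive rate over the false positive rate can be approximated with the trapezoidal rule. The sketch below is a generic textbook computation, not the procedure of Díez-Pastor et al. (2015); the function name `roc_auc` and the 0/1 label coding are assumptions.

```python
def roc_auc(labels, scores):
    """Trapezoidal area under the ROC curve for 0/1 labels and real-valued scores."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Visit observations from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Treat tied scores as a single threshold step.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr = tp / positives  # T+ / (T+ + F-)
        fpr = fp / negatives  # F+ / (F+ + T-)
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return area
```

The function assumes that both classes are present in `labels`; otherwise one of the two rates is undefined.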
Different performance metrics can be used to approximate the weights. Most of those metrics are based on a
pre-set threshold for the positive and negative classes. Given that such threshold setting is subjective, aggregate
measures, such as the AUC, are more appropriate. We follow the guidelines of Bogaert et al. (2018) and apply
five times two-fold cross validation (5x2cv) as proposed by Alpaydin (1998). In that way, we obtain the median 5x2cv
AUC values used as our $\breve{\alpha}_{\breve{i}}$. For the case of variable selection through IF, Bogaert et al. (2018) suggest that these
estimates are appropriate to extract the rank order of the input variables by means of AUC-based permutation
variable importance measures. The reason for this is two-fold. Firstly, this technique allows using
information extracted from all forecasting models, while models with higher AUC obtain a higher weight in the
fusion operator. Secondly, the use of the median 5x2cv AUC allows the sensitivity of a model to be better
associated with changes in specific variables. In other words, the more a variable is related to the response,
the higher its mean decrease in AUC is expected to be. Rearranging equation (10) leads to the IF measure for
the input $\breve{n}$ with $\breve{t}$ models expressed as:
$$\breve{V}_{\breve{n}}(fusion) = \sum_{\breve{i}=1}^{\breve{t}} \breve{\alpha}_{\breve{i}} \breve{V}_{\breve{i}\breve{n}} = \breve{\alpha}_1 \breve{V}_{1\breve{n}} + \breve{\alpha}_2 \breve{V}_{2\breve{n}} + \ldots + \breve{\alpha}_{\breve{t}} \breve{V}_{\breve{t}\breve{n}} \tag{13}$$
where $\breve{V}_{\breve{i}\breve{n}}$ stands for the variable importance measure for input $\breve{n}$ in model $\breve{i}$. In our case there are six models,
namely RVM, OP, MLP, k-NN, DT and SVM⁴.
As a last step towards the construction of our final input features, we keep the variables that account for 90%
of the cumulative sensitivity as described by the pseudo-Pareto analysis of Oztekin et al. (2013) and Bogaert et
al. (2018). These input features are shown in Online Appendix S.3 (Tables S.1-S.9).
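The fused importance measure of equation (13) and the 90% cumulative cut-off can be sketched in a few lines of Python. This is a simplified illustration: the function names are hypothetical, and the exact stopping rule (keep adding variables, in descending order of fused importance, until the running share reaches 90%) is our assumption about the pseudo-Pareto analysis rather than a detail given in the text.

```python
def fused_importance(importances, weights):
    """Weighted sum of per-model variable importances, as in equation (13).

    importances[i][n] is the importance of variable n in model i;
    weights[i] is the weight of model i.
    """
    n_vars = len(importances[0])
    return [
        sum(w * model_imp[n] for w, model_imp in zip(weights, importances))
        for n in range(n_vars)
    ]


def pareto_select(names, fused, share=0.90):
    """Keep the top-ranked variables until `share` of total importance is covered."""
    total = sum(fused)
    ranked = sorted(zip(names, fused), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for name, score in ranked:
        if cumulative >= share * total:
            break  # assumed stopping rule: threshold already reached
        kept.append(name)
        cumulative += score
    return kept
```

For instance, with two equally weighted models and importances `[0.8, 0.2, 0.0]` and `[0.6, 0.3, 0.1]`, the fused scores are roughly 0.70, 0.25 and 0.05, so the first two variables already cover more than 90% and the third is dropped.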
In this section, the empirical applications are presented. Their purpose is to demonstrate the merits of the
introduced CF concept and examine the statistical accuracy of the derived forecasts. For this purpose, CF is
applied in a series of football games and stocks’ datasets. Sections 3.1 and 3.2 present the characteristics of
the datasets and Sections 3.3 and 3.4 report the empirical performance of CF and its benchmarks.
The first empirical application is on football games forecasting. All models are applied to forecast the results and
the number of goals of football games in the English Premier League, Italian Serie A and Spanish La Liga from
2005 to 2016. The dataset of this study is publicly available at http://www.football-data.co.uk. In football game
result forecasting, there are three different outcomes (Home win, draw and Away win). A second football
forecasting exercise is based on the number of goals, in which gamblers bet on whether the total number of goals
in a game will or will not exceed 2.5⁵. The models under study require a series of inputs that are bookmakers’
odds and past teams’ performance indicators. The BetBrain average odds are used in this study, which are the
average odds from around 138 bookmakers and betting exchanges from across the globe. These inputs are
summarized in Table 1.
Table 1: Inputs Series
Points of H team – Points of A team before the start of the game
Points of H in the last 1, 2, 3 games and from the start of the season
Points of A in the last 1, 2, 3 games and from the start of the season
Points of H in the last 1, 2, 3 games and from the start of the season when H plays at home
Points of A in the last 1, 2, 3 games and from the start of the season when A plays away
Number of goals of H team in the last 1, 2, 3 games and from the start of the season
Number of goals of A team in the last 1, 2, 3 games and from the start of the season
Number of goals of H team in the last 1, 2, 3 games and from the start of the season when H team plays at home
Number of goals of A team in the last 1, 2, 3 games and from the start of the season when A team plays away
Number of shots on target of H team in the last 1, 2, 3 games and from the start of the season
Number of shots on target of H team in the last 1, 2, 3 games and from the start of the season
Number of shots on target of A team in the last 1, 2, 3 games and from the start of the season
Number of shots on target of H team in the last 1, 2, 3 games and from the start of the season when H team plays at home
Number of shots on target of H team in the last 1, 2, 3 games and from the start of the season when A team plays away
Number of corner kicks of H team in the last 1, 2, 3 games and from the start of the season
Number of corner kicks of A team in the last 1, 2, 3 games and from the start of the season
Number of corner kicks of H team in the last 1, 2, 3 games and from the start of the season when H team plays at home
Number of corner kicks of A team in the last 1, 2, 3 games and from the start of the season when A team plays away
H team booking points in the last 1, 2, 3 games and from the start of the season
A team booking points in the last 1, 2, 3 games and from the start of the season
BetBrain average home win odds
BetBrain average draw odds
BetBrain average away win odds
BetBrain average over 2.5 goals odds
BetBrain average under 2.5 goals odds
BetBrain size of handicap (home team)
BetBrain average Asian handicap home team odds
BetBrain average Asian handicap away team odds
Note: There are six main categories of predictors. The first one is based on the odds offered by the bookies and the other five originate
from the performance of the teams over the past games. The inputs categories are: (i) odds for match outcome, number of goals and
Asian handicap size bets (home team winning, draw, away team winning, over 2.5 goals, under 2.5 goals, home team win, and away
team win Asian handicaps, 8 in total), (ii) Points achieved, (iii) Goals scored, (iv) Corner kicks, (v) Shots on target, (vi) Booking points. For
the last five categories, 15 different types of measures are introduced to make sure all types of developments are considered. Team H
is the home team and team A is the away team. Booking points are 25 points per red card and 10 per yellow card in a game. The total
number of inputs is 83.
4 The reason for discarding the ANFIS and CF forecasts from the IF procedure is that developing a fuzzy model with our entire
set of 83 inputs could increase the number of fuzzy rules created up to $2^{83}$, which is computationally infeasible
to process.
5 An alternative approach based on the Asian handicap is also explored in online Appendix S.5.
The previous literature on football forecasting employs only sub-sets of Table 1. This pool consists of 83 potential
inputs. Feeding all these inputs to any predictive model increases the training time and causes overfitting. Let us
consider a simple case of having just high/low outcomes for the 83 inputs. The number of fuzzy rules created may
increase up to $2^{83}$ (roughly $9.7 \times 10^{24}$), which is impractical for both understanding the input-output
interactions and forecasting. To deal with the dimensionality problem we use a probabilistic framework, namely RVM,
to select the most informative subset of potential predictors (see online Appendix S.2). The RVM allows us to identify
the best inputs and serves as the variable selection component in the pre-processing stage of Figure 2. Thus, it is not
necessary to restrict our inputs to smaller sets or apply additional techniques to reduce the dimensions of our
predictors’ pool⁶.
The implemented forecasting exercises span the period of 2005 to 2016. The forecasts from the models are
evaluated in terms of accuracy through the relevant BetBrain average odds, the average profit per bet, the
proportional cumulative annualized return and the Kelly Criterion. The average profit per bet is defined as:
$$\frac{\sum_{q} x_q b_q - |Q|}{|Q|}, \quad q \in Q \tag{14}$$
where $Q$ is the set of games in the season on which a bet is placed, $x_q$ takes the value of 1 if the bet on game $q$ is
won based on the relevant forecast and 0 otherwise, $b_q$ is the relevant BetBrain average odd and $|Q|$ is the
cardinality of set $Q$. The proportional cumulative annualized return is estimated simply by always betting 5% of the
total capital, which initially is 100 units, on each game. For each subsequent game, we continue to bet
5% of the total remaining pot. The proportional cumulative annualized return is the accumulated return at the
end of the season. This practice resembles the reality where gamblers bet a proportion of their wealth.
6 For example, Dixon and Robinson (1998), Oberstone (2009) and Angelini and De Angelis (2017) use the number of goals scored in a
match to improve forecasting accuracy of the final football outcome. Other studies consider the odds of home win/ draw/ away for
football game predictions (see amongst others, Dixon and Coles, 1997; Crowder et al., 2002; Dobson and Goddard, 2003; Constantinou
et al., 2012; and Boshnakov et al., 2017). The number of corner kicks implies offensive pressure and is considered a good proxy for higher
scoring probability. Thus, Andersson et al. (2009) apply this variable in their football betting models. Oberstone (2011) and Martins et al.
(2017) apply shots on target as another proxy of the offensive capacity of a team. Most of the inputs in Table 1 are based on the
performance of the home/away team during the last three games. Incorporating teams’ recent performance indicators (points of last
three games, goals scored in the last three games, etc.) is crucial for the adaptiveness of the CF process and their utility is supported by
Goddard and Asimakopoulos (2004), Goddard (2005) and Rotshtein et al. (2005). Here it should be noted that in the football betting
literature it is well-accepted practice to use bookmaker odds for modelling and forecasting football outcomes (Goddard and
Asimakopoulos, 2004; Štrumbelj, 2014; and Schumaker et al., 2016). Additionally, using fixed-odds, like ours from BetBrain, is considered
advantageous as bettors know the final odds at the time of betting (Feess et al., 2016). This is why fixed-odds betting applications are
widely found in the respective literature (see amongst others Forrest and Simmons, 2008; and Constantinou et al., 2012).
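The two betting measures above can be sketched in Python as follows. The sketch assumes decimal odds (the payout per unit staked includes the stake itself), which is consistent with subtracting $|Q|$ in equation (14); the function names and the example figures are hypothetical.

```python
def average_profit_per_bet(won, odds):
    """Average profit per unit bet, following equation (14).

    won:  1 if the bet on a game was won, 0 otherwise.
    odds: decimal odds of the corresponding bets (assumed convention).
    """
    n_bets = len(won)
    return (sum(x * b for x, b in zip(won, odds)) - n_bets) / n_bets


def proportional_final_capital(won, odds, capital=100.0, stake_share=0.05):
    """Capital after staking a fixed share of the current pot on every bet."""
    for x, b in zip(won, odds):
        stake = stake_share * capital
        if x:
            capital += stake * (b - 1.0)  # net winnings on a successful bet
        else:
            capital -= stake  # the stake is lost
    return capital
```

The cumulative return over the season then follows as the final capital divided by the initial 100 units, minus one.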
In any investment (such as in football betting) there are three main areas to cover: the investment strategy, the
timing of the investment (whether to invest or not), and the size of the invested capital. The investment strategy
can be guided by statistical models (RVM, ANFIS, CF, and OP in our case). The timing of the investment can also
be guided by a conditional procedure like CF or the investor’s preference. The optimal size of the investment or
football bet can be determined through the Kelly criterion. Consider the case of having initial capital 𝒳0. The
goal is to maximise the expected value of capital after 𝓃 trials (𝒳𝓃 ). Now suppose that a gambler is interested in
a bet with win (loss) probability 𝓅 (𝓆) and payoff 𝒷 for every unit wager. The purpose is to maximise:
$$𝑔(𝒻) = 𝓅 \ln(1 + 𝒷𝒻) + 𝓆 \ln(1 - 𝒻) \tag{15}$$
where 𝒻 denotes the fraction invested in the bet and 𝑔(𝒻) is the growth based on the fraction invested in each
bet. The optimal fraction ($𝒻^{opt}$) based on Kelly (1956) and Thorp (2008) is given by:
$$𝒻^{opt} = \frac{𝒷𝓅 - 𝓆}{𝒷} \tag{16}$$
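A small numerical check can make equation (16) more tangible. The sketch below assumes the standard log-growth objective of Kelly (1956) and Thorp (2008), with the payoff interpreted as the net gain per unit wagered; the function names are hypothetical.

```python
import math


def growth_rate(fraction, p, payoff):
    """Expected log-growth per bet for a given invested fraction (assumed objective)."""
    q = 1.0 - p
    return p * math.log(1.0 + payoff * fraction) + q * math.log(1.0 - fraction)


def kelly_fraction(p, payoff):
    """Optimal fraction of equation (16): (b * p - q) / b."""
    q = 1.0 - p
    return (payoff * p - q) / payoff
```

For a win probability of 0.55 and an even-money payoff, `kelly_fraction(0.55, 1.0)` is about 0.10, and `growth_rate` evaluated at that fraction exceeds its value at nearby fractions such as 0.05 or 0.15.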
In order to apply the Kelly criterion, we need the corresponding probabilities 𝓅. These are readily available in
the OP framework, as the estimated conditional probabilities of each outcome. RVM and CF have a different
structure and the winning probabilities are not readily provided. However, the CF framework allows us to
mimic the Kelly criterion and follow a similar procedure to extract the optimal fractions $𝒻^{opt}$⁷. The firing
strength can determine how probable it is to use the specific rule $i$ for bet 𝓀. By aggregating this measure and
normalising it for the rules with similar output we provide an approximated conditional probability for each
ordinal outcome.
More specifically, we associate the winning probability with the conditional probability of each outcome
happening. For a bet 𝓀 to be placed, the normalised conditional firing strength $\tilde{𝓌}_{i𝓀}$ for all the eligible FRs is
estimated by:
$$\tilde{𝓌}_{i𝓀} = \frac{C_{i𝓀} \acute{𝓌}_{i𝓀}}{\sum_{i=1}^{5} C_{i𝓀} \acute{𝓌}_{i𝓀}}, \quad \forall i \tag{17}$$
In the next step the $\tilde{𝓌}_{i𝓀}$ are aggregated for the rules that fall under the same ordinal outcome:
$$\hat{\pi}_{𝒶𝓀} = \sum_{𝓈} \tilde{𝓌}_{𝓈𝓀} \mid f_{𝓈𝓀}(\cdot) \approx 𝒶 \tag{18}$$
where $f_{𝓈𝓀}(\cdot)$ represents the predicted output of the rule 𝓈. The procedure to interpret the regression output
$f_{𝓈𝓀}(\cdot)$ as a class label 𝒶 is the same as in OP. By setting $𝓅 = \hat{\pi}_{𝒶𝓀}$ in equation (16), the optimal fraction for each
outcome of the bet is given. It should be noted that when the CF does not find any eligible rule for the bet, the
winning probability is zero and $𝒻^{opt}$ becomes negative. A bet is only placed in case of having a positive $𝒻^{opt}$.
7 Singpurwalla and Booker (2004) demonstrate that probability theory has a sufficiently rich structure for incorporating fuzzy sets within
its framework.
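The chain from firing strengths to a stake, as described by equations (16) to (18), can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: each rule is represented by a hypothetical tuple, the 0/1 eligibility flag is our reading of the term $C$ in equation (17), and the payoff is taken as the net gain per unit staked.

```python
def outcome_probabilities(rules):
    """Aggregate normalised firing strengths per predicted outcome.

    Each rule is a tuple (eligible, firing_strength, outcome), where `eligible`
    is an assumed 0/1 indicator playing the role of C in equation (17).
    """
    total = sum(c * w for c, w, _ in rules)
    if total == 0:
        return {}  # no eligible rule: every outcome gets probability zero
    probs = {}
    for c, w, outcome in rules:
        # Equation (17) normalises; equation (18) sums within each outcome.
        probs[outcome] = probs.get(outcome, 0.0) + c * w / total
    return probs


def stake_fraction(probs, outcome, net_payoff):
    """Kelly fraction of equation (16) with p taken from the aggregated probabilities."""
    p = probs.get(outcome, 0.0)
    q = 1.0 - p
    return (net_payoff * p - q) / net_payoff
```

When no rule is eligible, `outcome_probabilities` returns an empty mapping, the probability defaults to zero and `stake_fraction` is negative, so no bet is placed, in line with the text above.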
The in-sample consists of three football seasons and the out-of-sample of the following two seasons (i.e. in the
first forecasting exercise for the Premiership, the football seasons 2006-2007, 2007-2008 and 2008-2009 act as
in-sample and the seasons 2009-2010 and 2010-2011 act as out-of-sample). The estimation is not rolled forward
from the first (2009-2010) to the second season (2010-2011) of the out-of-sample. Thus, the second
out-of-sample season acts as a robustness check for the models. In all seasons the first three Home and the first
three Away games of a team are discarded from the exercise. In other words, if team A plays against team B and one
of the two teams has played fewer than three Home and three Away games, this game is excluded from the exercise
(both as in-sample and out-of-sample data)⁸.
The dataset used in the second application is simpler, as it involves the returns of Pfizer Inc. (PFE) and Goldman
Sachs Group (GS) stocks. The prices of both stocks are publicly available at https://finance.yahoo.com. We investigate
the stocks over the period of 10 years (2007-2016). In each case, five years and one year of weekly data are used
as in-sample and out-of-sample period respectively. The target series is the direction of stock price change at
the end of each week. As inputs, we employ the weekly returns of the last two months (8 weeks). In this case,
there is no dimensionality issue that might impede our approach and thus the inputs vector is fed directly to CF.
Our forecasts in this exercise are examined through two simple measures. The directional accuracy for each
study period $t$ is denoted by $ACC_t$ and defined as:
$$ACC_t = \frac{1}{|Q_t|} \sum_{q_1 \in Q_t} I_{q_1}(\hat{S}_{q_1}, S_{q_1})$$
where $\hat{S}_{q_1}$ and $S_{q_1}$ are the predicted and actual direction of change in case a prediction is made for the period
$q_1$. In addition, $I_{q_1}$ is the indicator equal to one if the prediction is correct and zero otherwise. The set of forecasted
periods in each study period $t$ is denoted by $Q_t$ and $|Q_t|$ is the number of times a prediction is made over the period $t$.
We also evaluate our models and their forecasts through a simple trading application. We go or stay ‘long’ (buy)
the underlying stock when the model’s forecasted direction is high. Similarly, we go or stay ‘short’ (sell) the
underlying stock when the model’s forecasted direction is low. Based on the derived trading signals we
estimate the annualized return ($RET_t$) for each model, which is defined as:
where $r_{q_1}$ is the affiliated logarithmic return over the period $q_1$. The $N_t$ is the number of new observations
available in each study period $t$.
8 This is done for two reasons. Firstly, it is well-known amongst football enthusiasts that the behaviour of teams at the start of the season
is volatile. This happens either due to changes to the roster of the team during the summer or due to the different training during that
period (for example, a team that has qualification games for the European Championships is forced to start its preparation earlier than
the rest). Secondly, this process ensures that all series are smooth. For example, in the first game of the season the input series Points of
H team – Points of A team would have been equal to 0 and the rest of the inputs would have to be drawn from the previous season.
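The two evaluation measures for the stocks exercise can be illustrated with a short Python sketch. The directional accuracy follows the verbal definition above. For the trading return, the annualisation shown here (mean signed weekly log return multiplied by 52) is purely our assumption, since the exact formula is not reproduced in this text; the function names and the signal coding (+1 long, -1 short, 0 abstain) are likewise hypothetical.

```python
def directional_accuracy(predicted, actual):
    """Share of correct direction calls among the periods with a prediction.

    `predicted` holds +1 / -1, or None when the model abstains.
    """
    pairs = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    if not pairs:
        return None  # no prediction was made in this study period
    return sum(1 for p, a in pairs if p == a) / len(pairs)


def annualised_log_return(signals, log_returns, periods_per_year=52):
    """Signed log returns averaged over all observations and scaled to a year.

    Assumption: weekly data and simple scaling by `periods_per_year`.
    """
    total = sum(s * r for s, r in zip(signals, log_returns))
    return total * periods_per_year / len(log_returns)
```

Abstaining periods are excluded from the accuracy ratio but still count as observations when the return is averaged, which mirrors the distinction between $|Q_t|$ and $N_t$ in the text.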
In Section 3.3.1 we present the results of our CF procedure and its benchmarks applied to the task of forecasting
the game result and the number of goals. For the sake of space, the results of our models for the case of the
Asian handicap are available in the online Appendix S.5. In Section 3.3.2, we present the performance of all our
models applied to the task of forecasting the direction of the PFE stock. The performance of our models for the
GS stock is reported in Appendix S.7.
The performance of our models in terms of accuracy ratio for the game result and the number of goals is
presented in Tables 2-3. Additional measures of classification are provided in Tables S.19-S.20 of Section S.8. The
number of games to which CF is applied is presented in the online Appendix S.6.1. The Pesaran-Timmermann (1992)
test for the aggregate performance of the models is also estimated. The null hypothesis of the test is that the
relevant model is unable to classify correctly the underlying series⁹.
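For readers unfamiliar with the test, the two-class form of the Pesaran-Timmermann (1992) statistic can be sketched as below. This is only an illustration: the football game result has three outcomes and requires the generalised version of the test, which is not shown here, and the function name and 0/1 coding are assumptions.

```python
import math


def pesaran_timmermann(actual, predicted):
    """Two-class Pesaran-Timmermann statistic for 0/1 sequences.

    Under the null of no predictive ability the statistic is asymptotically
    standard normal. It is undefined when the variance difference is not
    positive, for example when all predictions fall in one class.
    """
    n = len(actual)
    p_hat = sum(1 for a, f in zip(actual, predicted) if a == f) / n
    p_y = sum(actual) / n      # share of actual "up" outcomes
    p_x = sum(predicted) / n   # share of predicted "up" outcomes
    # Expected hit rate if forecasts and outcomes were independent.
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)
    var_hat = p_star * (1 - p_star) / n
    var_star = (
        (2 * p_y - 1) ** 2 * p_x * (1 - p_x) / n
        + (2 * p_x - 1) ** 2 * p_y * (1 - p_y) / n
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n ** 2
    )
    return (p_hat - p_star) / math.sqrt(var_hat - var_star)
```

In a one-sided test, values above roughly 1.645 and 2.326 would reject the null at the 95% and 99% levels respectively.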
9 We did not estimate the Pesaran-Timmermann (1992) test for each separate year for two reasons. Firstly, we are interested in the
average performance of our models and not their individual accuracy for a specific year and championship. Secondly and more
importantly, the CF approach selects a subsample of games and generates forecasts only for them. In some cases, the number of CF
forecasts is too small for the test and given the test’s assumptions this can cripple the test’s efficiency (Pesaran and Timmermann, 1992).
                                      In-sample
Model    Championship  Out-of-sample  2006-2009  2007-2010  2008-2011  2009-2012  2010-2013  2011-2014  Average
RVM-CF   Premiership   1              68.75%     75.00%     66.67%     61.54%     72.73%     66.65%     68.56%**
RVM-CF   Premiership   2              63.60%     50.03%     62.50%     57.14%     67.41%     50.00%     58.45%**
RVM-CF   La Liga       1              60.00%     48.33%     57.69%     50.04%     55.81%     46.55%     53.07%**
RVM-CF   La Liga       2              63.60%     43.50%     56.68%     43.47%     44.68%     41.60%     48.92%**
RVM-CF   Serie A       1              53.33%     54.50%     61.53%     60.00%     57.45%     61.50%     58.05%**
RVM-CF   Serie A       2              53.06%     52.63%     54.56%     66.10%     66.70%     53.57%     57.77%**
RVM      Premiership   1              46.05%     37.84%     38.25%     44.33%     35.02%     52.02%     42.25%**
RVM      Premiership   2              41.22%     37.58%     42.61%     40.23%     34.66%     46.32%     41.59%**
RVM      La Liga       1              40.33%     36.00%     39.38%     39.33%     35.67%     41.33%     38.68%**
RVM      La Liga       2              33.67%     36.99%     43.33%     35.33%     41.00%     37.00%     37.89%*
RVM      Serie A       1              51.35%     50.34%     45.92%     44.96%     40.60%     39.08%     45.38%**
RVM      Serie A       2              47.97%     44.22%     46.22%     34.90%     39.44%     40.27%     42.17%**
ANFIS    Premiership   1              45.70%     40.20%     35.91%     39.18%     40.67%     39.33%     40.17%**
ANFIS    Premiership   2              35.81%     39.93%     37.46%     37.00%     36.67%     41.33%     38.03%**
ANFIS    La Liga       1              40.00%     36.67%     41.10%     35.33%     34.33%     40.33%     37.96%*
ANFIS    La Liga       2              38.33%     34.59%     41.00%     36.67%     40.00%     36.03%     37.77%*
ANFIS    Serie A       1              42.91%     35.47%     42.18%     42.44%     40.27%     40.14%     40.57%**
ANFIS    Serie A       2              34.80%     41.16%     43.70%     36.24%     38.38%     41.61%     39.31%**
OP       Premiership   1              46.70%     44.93%     43.60%     38.49%     46.67%     47.00%     46.69%**
OP       Premiership   2              45.61%     47.65%     41.90%     43.33%     39.66%     39.32%     41.53%**
OP       La Liga       1              47.67%     52.67%     52.40%     48.67%     42.67%     51.67%     49.29%**
OP       La Liga       2              42.60%     49.66%     49.00%     47.33%     54.33%     44.33%     47.88%**
OP       Serie A       1              47.30%     44.59%     48.30%     45.80%     46.64%     37.68%     45.05%**
OP       Serie A       2              47.97%     44.30%     46.72%     45.30%     41.55%     42.28%     44.69%**
MLP      Premiership   1              37.80%     35.14%     30.54%     36.77%     22.67%     36.67%     33.26%**
MLP      Premiership   2              35.14%     31.88%     34.36%     30.00%     39.00%     30.67%     33.51%**
MLP      La Liga       1              39.67%     37.67%     41.44%     41.67%     36.67%     39.67%     39.46%**
MLP      La Liga       2              36.00%     38.01%     44.33%     39.67%     32.33%     35.33%     37.61%**
MLP      Serie A       1              36.15%     37.84%     36.39%     36.55%     41.95%     36.62%     37.58%**
MLP      Serie A       2              40.20%     42.52%     44.12%     35.57%     36.97%     34.23%     38.93%**
k-NN     Premiership   1              50.52%     42.23%     44.63%     41.58%     47.00%     49.33%     45.88%**
k-NN     Premiership   2              40.88%     48.99%     49.83%     42.67%     39.67%     48.00%     45.01%**
k-NN     La Liga       1              49.67%     52.33%     45.21%     50.67%     49.67%     51.67%     49.87%**
k-NN     La Liga       2              48.00%     48.63%     50.67%     47.33%     50.33%     44.67%     48.27%**
k-NN     Serie A       1              42.57%     43.58%     46.60%     50.84%     41.95%     36.97%     43.75%**
k-NN     Serie A       2              45.61%     41.84%     43.70%     48.32%     34.86%     33.56%     41.31%
CF-IF    Premiership   1              34.78%     50.00%     30.77%     25.29%     48.57%     39.62%     38.17%
CF-IF    Premiership   2              32.56%     60.00%     34.00%     34.52%     36.07%     32.65%     38.30%
CF-IF    La Liga       1              30.77%     36.29%     40.91%     32.31%     52.17%     26.79%     36.54%
CF-IF    La Liga       2              52.86%     34.09%     40.00%     43.90%     47.06%     47.47%     44.23%
CF-IF    Serie A       1              41.07%     35.96%     30.88%     52.00%     36.11%     40.00%     39.34%
CF-IF    Serie A       2              35.90%     18.52%     41.18%     47.71%     30.77%     42.86%     36.15%
DT       Premiership   1              38.83%     43.58%     39.93%     45.70%     38.00%     44.00%     41.68%**
DT       Premiership   2              41.89%     40.27%     38.83%     39.67%     37.33%     41.33%     39.89%**
DT       La Liga       1              43.33%     44.33%     39.38%     46.33%     47.67%     44.67%     44.29%**
DT       La Liga       2              42.00%     46.58%     43.33%     45.00%     48.67%     43.00%     44.76%**
DT       Serie A       1              44.59%     45.61%     44.22%     44.54%     42.62%     40.85%     43.74%**
DT       Serie A       2              42.91%     41.16%     47.06%     41.61%     39.08%     44.97%     42.80%**
SVM      Premiership   1              45.02%     38.85%     39.60%     35.40%     47.00%     39.00%     40.81%**
SVM      Premiership   2              42.57%     44.97%     36.43%     34.00%     43.00%     34.33%     39.22%**
SVM      La Liga       1              44.00%     33.00%     42.81%     40.67%     41.67%     37.67%     39.97%**
SVM      La Liga       2              44.67%     36.99%     45.33%     46.33%     46.00%     38.33%     42.94%**
SVM      Serie A       1              33.78%     35.47%     46.94%     38.66%     39.60%     36.62%     38.51%**
SVM      Serie A       2              31.08%     39.12%     44.12%     40.27%     38.03%     42.28%     39.15%**
Note: The values in the table represent the out-of-sample accuracy ratios. Row 1 corresponds to the first year of the out-of-sample and row 2 to the
second year of the out-of-sample. For example, for the first cell on the left corner 68.75% is the out-of-sample accuracy of CF for the 2009-2010
Premiership season and the 63.60% is the performance of the exact same model (same specification and rules) for the 2010-2011 season of the same
championship. A random classifier provides a 33.33% accuracy ratio in this example. ** and * indicate that, according to the Pesaran-Timmermann
(1992) test, the forecasts are statistically accurate in classifying the football game result at the 99% and 95% level respectively. Online Appendix S.8
presents extra measures for the fitness of different models.
                                      In-sample
Model    Championship  Out-of-sample  2006-2009  2007-2010  2008-2011  2009-2012  2010-2013  2011-2014  Average
RVM-CF   Premiership   1              78.57%     73.68%     72.70%     62.50%     75.02%     69.23%     71.95%**
RVM-CF   Premiership   2              75.71%     66.67%     71.43%     65.21%     70.59%     56.75%     67.73%**
RVM-CF   La Liga       1              66.66%     65.93%     72.41%     72.02%     71.19%     62.50%     68.45%**
RVM-CF   La Liga       2              60.87%     62.50%     62.61%     70.03%     72.70%     58.51%     64.54%**
RVM-CF   Serie A       1              85.69%     83.33%     70.00%     64.29%     71.43%     72.97%     74.62%**
RVM-CF   Serie A       2              60.00%     52.17%     57.41%     58.62%     75.00%     59.46%     60.44%**
RVM      Premiership   1              50.17%     50.00%     55.37%     52.23%     51.67%     57.00%     52.74%**
RVM      Premiership   2              48.99%     55.03%     51.20%     51.67%     51.00%     53.00%     51.82%*
RVM      La Liga       1              57.00%     53.33%     53.42%     54.67%     51.00%     60.33%     54.96%**
RVM      La Liga       2              50.67%     52.05%     52.34%     53.30%     48.33%     54.67%     51.89%
RVM      Serie A       1              53.04%     58.45%     53.74%     48.32%     48.32%     56.34%     53.04%*
RVM      Serie A       2              55.74%     51.36%     54.62%     47.65%     54.58%     50.34%     52.38%
ANFIS    Premiership   1              53.26%     50.14%     51.68%     47.08%     49.00%     53.00%     50.69%
ANFIS    Premiership   2              46.96%     50.34%     48.80%     55.33%     43.67%     53.67%     49.80%
ANFIS    La Liga       1              52.33%     52.00%     54.45%     50.33%     58.00%     49.33%     52.74%*
ANFIS    La Liga       2              52.00%     47.33%     53.67%     45.33%     52.05%     47.00%     49.56%
ANFIS    Serie A       1              50.34%     53.04%     51.70%     52.94%     54.03%     61.62%     53.95%**
ANFIS    Serie A       2              47.30%     53.74%     44.96%     51.68%     56.34%     49.33%     50.56%
OP       Premiership   1              49.83%     49.66%     46.98%     49.48%     49.33%     55.33%     50.10%
OP       Premiership   2              48.99%     52.68%     46.39%     51.33%     57.33%     51.00%     51.29%
OP       La Liga       1              53.33%     54.67%     54.79%     52.67%     56.00%     56.10%     54.59%**
OP       La Liga       2              48.00%     53.42%     53.35%     56.67%     57.00%     55.67%     54.02%**
OP       Serie A       1              46.62%     56.76%     50.68%     55.46%     47.32%     51.06%     51.32%
OP       Serie A       2              45.13%     49.12%     47.28%     45.30%     50.70%     50.67%     48.03%
MLP      Premiership   1              49.48%     46.96%     51.01%     52.58%     51.00%     53.67%     50.78%
MLP      Premiership   2              48.99%     51.34%     45.36%     55.00%     49.67%     53.67%     50.67%
MLP      La Liga       1              47.67%     49.67%     51.03%     55.67%     57.00%     53.00%     52.34%**
MLP      La Liga       2              50.67%     55.82%     47.67%     56.00%     51.33%     57.67%     53.19%
MLP      Serie A       1              46.96%     58.11%     54.08%     52.52%     54.70%     52.11%     53.08%*
MLP      Serie A       2              52.36%     50.00%     55.46%     47.32%     53.87%     52.01%     51.84%
k-NN     Premiership   1              49.83%     43.92%     48.99%     52.58%     50.67%     53.00%     49.83%
k-NN     Premiership   2              43.92%     45.97%     51.20%     51.00%     53.33%     55.67%     50.18%
k-NN     La Liga       1              51.67%     52.33%     49.66%     48.67%     54.67%     51.67%     51.44%
k-NN     La Liga       2              52.33%     53.42%     51.33%     54.33%     52.67%     49.33%     52.24%*
k-NN     Serie A       1              47.64%     51.69%     51.02%     49.16%     56.38%     55.28%     51.86%
k-NN     Serie A       2              57.09%     50.34%     46.22%     53.02%     51.76%     52.01%     51.74%
CF-IF    Premiership   1              71.43%     48.98%     42.05%     59.52%     55.17%     51.92%     54.85%
CF-IF    Premiership   2              40.00%     51.72%     48.78%     49.38%     57.14%     38.46%     47.58%
CF-IF    La Liga       1              31.25%     50.00%     47.73%     55.17%     55.00%     42.62%     46.96%
CF-IF    La Liga       2              48.15%     42.08%     40.00%     47.62%     38.71%     51.43%     44.66%
CF-IF    Serie A       1              56.41%     52.94%     54.55%     49.40%     57.89%     50.63%     53.64%
CF-IF    Serie A       2              44.44%     41.38%     49.38%     53.33%     52.38%     54.17%     49.18%
DT       Premiership   1              54.98%     51.01%     55.70%     43.30%     55.33%     47.67%     51.33%
DT       Premiership   2              50.68%     53.36%     52.92%     55.00%     45.33%     54.33%     51.94%
DT       La Liga       1              55.00%     51.00%     53.42%     52.33%     58.00%     52.00%     53.63%**
DT       La Liga       2              48.67%     48.63%     53.67%     51.33%     54.33%     51.33%     51.33%
DT       Serie A       1              53.04%     55.74%     45.92%     52.10%     50.67%     56.34%     52.30%
DT       Serie A       2              48.65%     52.72%     47.48%     51.01%     52.11%     53.02%     50.83%
SVM      Premiership   1              48.80%     46.62%     57.72%     49.14%     50.00%     50.00%     50.38%
SVM      Premiership   2              52.70%     55.37%     48.80%     54.33%     48.00%     44.67%     50.64%
SVM      La Liga       1              51.33%     50.33%     55.48%     52.33%     57.33%     49.33%     52.69%*
SVM      La Liga       2              50.33%     55.14%     54.00%     52.33%     53.00%     53.67%     53.08%*
SVM      Serie A       1              49.66%     57.77%     46.60%     55.46%     52.01%     48.59%     51.68%
SVM      Serie A       2              43.92%     50.68%     51.26%     52.01%     54.23%     53.36%     50.91%
Note: The values in the table represent the out-of-sample accuracy ratios. Row 1 corresponds to the first year of the out-of-sample and row 2 to the
second year of the out-of-sample. For example, for the first cell on the left corner, 78.57% is the out-of-sample accuracy of RVM-CF for the 2009-2010
Premiership season and the 75.71% is the performance of the exact same model (same specification and rules) for the 2010-2011 season of the same
championship. A random classifier provides a 50.00% accuracy ratio in this example. ** and * indicate that, according to the Pesaran-Timmermann (1992)
test, the forecasts are statistically accurate in classifying the football game result at the 99% and 95% level respectively. Online Appendix S.8 presents
extra measures for the fitness of different models.
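The significance stars in the note above rest on the Pesaran-Timmermann (1992) test. As an illustration only, the sketch below implements the standard textbook form of that statistic for a two-class target coded as 0/1; the function name `pesaran_timmermann` and the binary coding are assumptions made here, not details taken from the study.

```python
import math


def pesaran_timmermann(actual, predicted):
    """Directional-accuracy test for two binary (0/1) sequences.

    Returns (statistic, one_sided_p_value). Assumes a two-class target.
    """
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    hit_rate = sum(a == p for a, p in zip(actual, predicted)) / n
    p_y = sum(actual) / n       # share of realised class-1 outcomes
    p_x = sum(predicted) / n    # share of predicted class-1 outcomes

    # Hit rate expected if forecasts and outcomes were independent.
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)

    var_hit = p_star * (1 - p_star) / n
    var_star = (
        ((2 * p_y - 1) ** 2 * p_x * (1 - p_x)
         + (2 * p_x - 1) ** 2 * p_y * (1 - p_y)) / n
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n ** 2
    )
    spread = var_hit - var_star
    if spread <= 0:
        raise ValueError("statistic undefined, e.g. when all forecasts are identical")

    stat = (hit_rate - p_star) / math.sqrt(spread)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))  # upper tail of N(0, 1)
    return stat, p_value
```

Under the null of no predictive power the statistic is asymptotically standard normal, so one-sided rejection at the 95% and 99% levels corresponds to values above roughly 1.645 and 2.326.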
From the tables above, we note that the RVM-CF approach clearly improves the accuracy of the underlying
system (RVM). It generates forecasts based on the strongest rules, which in turn leads to improved
predictability. In most exercises, RVM-CF is up to 10% more accurate than RVM. On the other hand, ANFIS
leads to slightly poorer accuracy ratios than those obtained by RVM, a finding consistent with the related
literature (see Martens et al., 2007). The MLP presents the worst performance in game result and accuracies
close to those of ANFIS and RVM for over-under. The DT and SVM benchmarks also do not manage to
outperform CF in either application. The SVM, as expected, provides less accurate results than RVM.
Concerning the CF-IF approach, we note that its performance is significantly lower than the one generated by
CF with the RVs as inputs. It seems that, in our exercise, the Bayesian non-linear properties of RVM are superior
to variable selection based on information-fusion sensitivity scores.
The PT test reveals that only the RVM-CF is capable of accurately classifying the underlying series even two
seasons ahead in all exercises. Its benchmarks seem to work well in game result but lose power in the over-under
exercise. We also note that the accuracies are higher for the Premiership. This might imply that the English
championship has less noise or, in other words, is easier to forecast. The previously discussed accuracy ratios
might seem promising, but they do not guarantee profitability. The risk in football gambling is almost always
that the bookmakers’ odds differ between seasons and championships. Betting agencies naturally possess
superior information and modify the odds by exploiting the bettors’ cognitive biases in a way that mitigates their
risk and increases their profitability (Forrest and Simmons, 2008; Newall, 2017). Tables 4 and 5 present the
average profit per bet for the models, championships and seasons under study.
                     2        -6.70%    0.51%     4.45%     -0.85%    -2.38%    0.64%     -0.72%
SVM     Premiership  1        -8.54%    -14.43%   -2.01%    -14.20%   5.61%     -3.40%    -6.16%
SVM     Premiership  2        -6.93%    0.95%     -6.68%    -15.45%   -3.01%    -17.57%   -8.12%
SVM     La Liga      1        1.23%     -21.57%   -3.92%    -10.21%   -9.98%    -0.03%    -7.41%
SVM     La Liga      2        -0.97%    -7.25%    0.25%     -2.66%    -2.74%    -10.83%   -4.04%
SVM     Serie A      1        -18.66%   -16.82%   -1.24%    -16.46%   -1.44%    -11.87%   -11.08%
SVM     Serie A      2        -16.89%   -1.26%    -6.48%    -6.32%    8.58%     -10.47%   -5.47%
Note: All values in the Table represent the average profit per bet in each season. Row 1 corresponds to the first year of the out-of-sample and row 2 to the second
year of the out-of-sample. For example, for the first cell on the left corner, 17.71% is the average profit per bet of RVM-CF for the 2009-2010 Premiership
season and the 3.59% is the performance of the exact same model (same specification and rules) for the 2010-2011 season of the same championship.
                     Out-of-  In-sample period
Model   Championship sample   2006-2009 2007-2010 2008-2011 2009-2012 2010-2013 2011-2014 Average
RVM-CF  Premiership  1        17.44%    16.25%    9.14%     11.64%    28.78%    17.62%    16.81%
RVM-CF  Premiership  2        2.80%     -1.08%    -0.46%    2.83%     7.07%     4.77%     2.66%
RVM-CF  La Liga      1        13.41%    15.44%    19.49%    14.14%    14.65%    9.87%     14.50%
RVM-CF  La Liga      2        0.42%     2.15%     5.14%     0.28%     3.33%     0.83%     2.03%
RVM-CF  Serie A      1        17.22%    10.28%    6.45%     20.93%    14.32%    19.03%    14.71%
RVM-CF  Serie A      2        1.20%     -0.96%    -0.25%    6.97%     2.44%     3.27%     2.11%
RVM     Premiership  1        1.77%     0.14%     -0.42%    -3.76%    -4.80%    -8.84%    -2.65%
RVM     Premiership  2        0.95%     -1.20%    -4.88%    -4.03%    -12.51%   -15.76%   -6.24%
RVM     La Liga      1        1.70%     -1.96%    -0.95%    -3.00%    4.11%     -3.06%    -0.53%
RVM     La Liga      2        -6.16%    -5.54%    -2.28%    -5.45%    -5.78%    -6.49%    -5.28%
RVM     Serie A      1        -1.47%    -0.51%    -3.46%    -6.13%    -0.02%    -1.35%    -2.16%
RVM     Serie A      2        -1.53%    -2.79%    -6.56%    -9.97%    -1.58%    -6.40%    -4.81%
ANFIS   Premiership  1        -6.57%    -5.43%    0.12%     -3.22%    -5.20%    -3.38%    -3.95%
ANFIS   Premiership  2        -9.84%    -6.64%    -8.55%    -6.35%    -5.95%    -4.18%    -6.92%
ANFIS   La Liga      1        -4.20%    -2.12%    0.78%     -2.92%    0.12%     -3.66%    -2.00%
ANFIS   La Liga      2        -7.13%    -3.56%    -4.02%    -5.26%    -3.92%    -7.46%    -5.23%
ANFIS   Serie A      1        -2.84%    -2.48%    -4.16%    -2.42%    -2.60%    -0.32%    -2.47%
ANFIS   Serie A      2        -4.61%    -5.07%    -10.49%   -6.93%    -3.30%    -5.02%    -5.90%
OP      Premiership  1        -7.95%    -11.28%   -0.16%    -3.86%    -1.96%    3.15%     -3.68%
OP      Premiership  2        -7.80%    0.43%     -15.66%   -6.35%    5.78%     -9.50%    -5.52%
OP      La Liga      1        -0.80%    -8.90%    -0.18%    -10.24%   -1.81%    -5.10%    -4.51%
OP      La Liga      2        -8.75%    -5.94%    -10.17%   5.07%     -12.75%   -4.20%    -6.12%
OP      Serie A      1        -13.14%   -0.66%    -4.76%    0.03%     -13.08%   -10.74%   -7.06%
OP      Serie A      2        4.59%     -9.44%    2.22%     -14.16%   -1.59%    -4.86%    -3.87%
MLP     Premiership  1        -8.03%    -13.85%   -6.47%    -1.95%    -7.31%    -1.10%    -6.45%
MLP     Premiership  2        -7.31%    -6.33%    -18.37%   1.28%     -8.31%    -0.63%    -6.61%
MLP     La Liga      1        -13.34%   -10.16%   -8.95%    -1.53%    0.61%     -8.59%    -6.99%
MLP     La Liga      2        -7.59%    -2.79%    -15.21%   -1.63%    -9.69%    -0.58%    -6.25%
MLP     Serie A      1        -16.32%   1.64%     -1.41%    -1.70%    1.45%     -5.56%    -3.65%
MLP     Serie A      2        -0.90%    -9.56%    0.30%     -7.76%    -2.83%    -6.76%    -4.58%
k-NN    Premiership  1        -9.24%    -15.35%   -7.52%    -0.43%    -6.01%    -3.26%    -6.97%
k-NN    Premiership  2        -16.73%   -10.49%   -0.27%    -6.39%    0.15%     3.12%     -5.10%
k-NN    La Liga      1        -6.50%    -1.67%    -9.52%    -12.46%   -0.49%    -5.94%    -6.10%
k-NN    La Liga      2        -5.42%    -3.36%    -4.50%    -2.56%    -2.42%    -10.63%   -4.81%
k-NN    Serie A      1        -12.54%   -4.79%    -6.19%    -9.18%    4.44%     1.07%     -4.53%
k-NN    Serie A      2        2.81%     -4.51%    -15.24%   -0.44%    -3.87%    -6.21%    -4.58%
CF-IF   Premiership  1        30.83%    -7.35%    -22.47%   17.14%    -1.83%    -2.37%    2.33%
CF-IF   Premiership  2        -20.00%   -3.38%    -8.49%    -5.90%    6.10%     -27.33%   -9.83%
CF-IF   La Liga      1        -39.38%   -2.03%    -13.57%   -0.83%    6.90%     -17.36%   -11.04%
CF-IF   La Liga      2        -6.72%    1.44%     -17.70%   -12.90%   -28.19%   -4.30%    -11.40%
CF-IF   Serie A      1        3.74%     0.53%     4.09%     -5.36%    10.05%    -2.18%    1.81%
CF-IF   Serie A      2        -15.19%   -19.07%   -4.15%    0.02%     -4.10%    4.04%     -6.41%
DT      Premiership  1        1.44%     -6.23%    3.79%     -16.88%   4.27%     -10.35%   -3.99%
DT      Premiership  2        -5.11%    -2.84%    -0.37%    0.10%     -14.99%   1.21%     -3.67%
DT      La Liga      1        -0.29%    -5.93%    -3.03%    -6.92%    3.76%     -4.93%    -2.89%
DT      La Liga      2        -9.90%    -12.08%   -2.99%    -7.23%    -1.53%    -8.45%    -7.03%
DT      Serie A      1        -2.34%    3.90%     -14.51%   -3.55%    -4.69%    3.86%     -2.89%
DT      Serie A      2        -11.83%   -2.17%    -12.08%   -5.06%    -3.14%    -3.75%    -6.34%
SVM     Premiership  1        -8.68%    -14.95%   5.77%     -2.53%    -7.06%    -6.67%    -5.69%
SVM     Premiership  2        -0.92%    2.35%     -10.88%   0.27%     -9.99%    -16.56%   -5.95%
SVM     La Liga      1        -7.02%    -7.53%    3.30%     -2.77%    -2.17%    -10.61%   -4.47%
SVM     La Liga      2        -8.66%    -0.08%    -0.33%    -7.08%    -7.55%    -2.75%    -4.41%
SVM     Serie A      1        -3.80%    1.91%     -7.68%    2.58%     -3.18%    -11.29%   -3.58%
SVM     Serie A      2        -13.38%   -9.26%    -1.85%    -3.82%    -2.34%    -3.31%    -5.66%
Note: All values in the Table represent the average profit per bet in each season. Row 1 corresponds to the first year of the out-of-sample and row 2 to the second
year of the out-of-sample. For example, for the first cell on the left corner, 17.44% is the average profit per bet of RVM-CF for the 2009-2010 Premiership
season and the 2.80% is the performance of the exact same model (same specification and rules) for the 2010-2011 season of the same championship.
The RVM-CF approach clearly outperforms its benchmarks and offers impressive profits. The average
profit per bet of RVM-CF exceeds 10% on average in the first year of the out-of-sample in all cases. This profit is
substantially reduced in the second year of the out-of-sample but remains positive on average. On the other hand, all
other models under study present negative profits per bet in most cases, even in the first year of
the out-of-sample. The CF-IF profit results remain far below those of RVM-CF. Thus, the superiority of RVM variable
selection is validated in terms of profitability too. If we compare Tables 4 and 5 with the associated accuracy
results (Tables 2 and 3), we note that statistical accuracy is not synonymous with betting profitability. There are
cases where the PT test indicates that the underlying model classifies the underlying series accurately and yet the
related average profit per bet is negative.
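The gap between accuracy and profitability can be seen in a small worked example. The sketch below assumes unit stakes and decimal odds; the helper names `average_profit_per_bet` and `accuracy` and the three-bet season are hypothetical and serve only to illustrate the point made above.

```python
def average_profit_per_bet(bets):
    """Mean net profit per unit staked.

    `bets` is a list of (decimal_odds, won) pairs; every bet stakes one unit
    (an assumption made for this illustration).
    """
    profits = [(odds - 1.0) if won else -1.0 for odds, won in bets]
    if not profits:
        raise ValueError("no bets placed")
    return sum(profits) / len(profits)


def accuracy(bets):
    """Share of bets that were won."""
    outcomes = [won for _, won in bets]
    if not outcomes:
        raise ValueError("no bets placed")
    return sum(outcomes) / len(outcomes)


# Hypothetical season: two of three forecasts are correct, but at short odds.
season = [(1.40, True), (1.40, True), (1.40, False)]
print(f"accuracy: {accuracy(season):.2%}")
print(f"average profit per bet: {average_profit_per_bet(season):.2%}")
```

Here the forecaster is right two times out of three, yet the two winning bets do not cover the single loss, so the average profit per bet is negative.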
The average profit per bet assumes that the forecaster invests the same amount of capital in all games
irrespective of their previous performance. In reality, forecasters will probably modify their invested capital
based on their previous record and follow a more adaptive strategy. Thus, in Tables 6 and 7 we present the
proportional cumulative annualized return. This measure assumes that the forecaster always bets 5% of their
total pot. So, for example, in the first game in any given exercise the gambler will bet 5% of their total pot:
if it is 100 units, this will be 5 units. If the forecaster wins the bet and their earnings are 4 units, their total pot
would then be 104 units. Thus, in the next game the forecaster will bet 5.2 units. The proportional cumulative
annualized return offers a more realistic approach to betting where participants modify the size of their bets
based on their previous record.
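The 5% staking rule described above can be traced with a few lines of code. This is a minimal sketch: the function name `proportional_staking` and the representation of each bet by its net return per unit staked are assumptions, and the annualization step is left out because it is not specified in this passage.

```python
def proportional_staking(net_returns, fraction=0.05, start_pot=100.0):
    """Track stakes and pot when a fixed fraction of the pot is bet each game.

    `net_returns` holds the net return per unit staked for each bet
    (e.g. 0.8 for a win at decimal odds 1.8, -1.0 for a lost bet).
    Returns a list of (stake, pot_after_bet) tuples.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    pot = start_pot
    path = []
    for r in net_returns:
        stake = fraction * pot   # always 5% of the current pot by default
        pot += stake * r         # winnings (or the lost stake) update the pot
        path.append((stake, pot))
    return path


# The example from the text: a 5-unit stake earns 4 units, then the next bet is lost.
for stake, pot in proportional_staking([0.8, -1.0]):
    print(f"stake {stake:.2f}, pot {pot:.2f}")
```

With a starting pot of 100 units, the first stake is 5 units and the pot rises to 104, so the second stake is 5.2 units, matching the numbers in the paragraph above.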
                     Out-of-  In-sample period
Model   Championship sample   2006-2009 2007-2010 2008-2011 2009-2012 2010-2013 2011-2014 Average
RVM-CF  Premiership  1        40.26%    7.42%     10.89%    11.22%    19.58%    6.91%     16.05%
RVM-CF  Premiership  2        38.08%    1.35%     1.52%     -4.63%    11.11%    -5.70%    6.96%
RVM-CF  La Liga      1        -0.14%    14.58%    22.39%    18.14%    29.46%    10.04%    15.75%
RVM-CF  La Liga      2        -2.52%    3.25%     11.81%    -0.48%    -9.60%    -5.85%    -0.57%
RVM-CF  Serie A      1        15.34%    10.95%    28.64%    32.30%    32.01%    10.17%    21.57%
RVM-CF  Serie A      2        -1.63%    -32.41%   2.01%     -5.02%    -17.08%   -6.29%    -10.07%
RVM     Premiership  1        -15.14%   -15.74%   -20.17%   -29.53%   -12.47%   -17.24%   -18.38%
RVM     Premiership  2        -27.12%   -45.78%   -54.28%   -35.52%   -60.05%   -58.30%   -46.84%
RVM     La Liga      1        -33.45%   -20.64%   -19.25%   -28.65%   -17.53%   -21.02%   -23.42%
RVM     La Liga      2        -55.26%   -57.36%   -38.27%   -47.20%   -42.08%   -47.36%   -47.92%
RVM     Serie A      1        0.15%     -14.21%   -7.50%    -12.37%   -32.55%   -47.32%   -18.97%
RVM     Serie A      2        -25.13%   -60.93%   -16.06%   -61.27%   -59.77%   -62.83%   -47.67%
ANFIS   Premiership  1        -5.26%    -10.25%   -24.07%   -30.17%   -9.52%    -16.29%   -15.93%
ANFIS   Premiership  2        -29.16%   -15.57%   -36.49%   -33.60%   -23.41%   -23.47%   -26.95%
ANFIS   La Liga      1        -32.55%   -33.24%   -32.24%   -16.47%   -10.26%   -45.07%   -28.31%
ANFIS   La Liga      2        -68.26%   -38.56%   -69.41%   -23.54%   -15.05%   -52.14%   -44.49%
ANFIS   Serie A      1        -55.85%   -26.37%   -6.84%    -47.29%   -14.54%   -13.25%   -27.36%
ANFIS   Serie A      2        -60.24%   -48.70%   -21.58%   -55.28%   -67.78%   -33.88%   -47.91%
OP      Premiership  1        -52.66%   -23.85%   -36.44%   -25.68%   -15.22%   -40.28%   -32.36%
OP      Premiership  2        -64.66%   -32.10%   -52.13%   -43.69%   -19.65%   -56.09%   -44.72%
OP      La Liga      1        -15.14%   -36.60%   -35.06%   -59.07%   -22.18%   -28.46%   -32.75%
OP      La Liga      2        -18.44%   -68.12%   -62.38%   60.57%    -56.87%   -55.41%   -33.44%
OP      Serie A      1        -62.48%   -18.93%   -50.38%   -37.57%   -37.55%   -40.16%   -41.18%
OP      Serie A      2        -66.64%   -46.18%   -55.37%   -66.88%   -59.85%   -58.74%   -58.94%
MLP     Premiership  1        -74.49%   -87.65%   -94.98%   -17.70%   -99.08%   -79.39%   -75.55%
MLP     Premiership  2        -82.26%   -83.75%   -59.58%   -98.43%   -80.21%   -42.43%   -74.44%
MLP     La Liga      1        -56.07%   -87.30%   -76.05%   -62.53%   -96.09%   -88.16%   -77.70%
MLP     La Liga      2        -91.90%   -92.74%   -52.73%   -92.22%   -88.23%   -77.60%   -82.57%
MLP     Serie A      1        -93.52%   -93.37%   -88.56%   -78.34%   -75.31%   -44.88%   -79.00%
MLP     Serie A      2        -91.96%   -55.98%   -2.59%    -91.16%   -39.58%   -95.93%   -62.87%
k-NN    Premiership  1        -42.65%   -81.98%   -79.02%   -84.99%   -69.15%   -66.74%   -70.75%
k-NN    Premiership  2        -87.78%   -47.25%   -16.42%   -45.13%   -91.86%   24.71%    -43.96%
k-NN    La Liga      1        -17.47%   -12.96%   -94.67%   -33.41%   -68.68%   -73.52%   -50.12%
k-NN    La Liga      2        -52.11%   -67.81%   -28.81%   -86.72%   -75.91%   -93.66%   -67.50%
k-NN    Serie A      1        -88.87%   -62.71%   -19.19%   -6.31%    -41.15%   -45.91%   -44.02%
k-NN    Serie A      2        -38.15%   -69.63%   -88.30%   -48.46%   -93.58%   -97.40%   -72.58%
CF-IF   Premiership  1        -55.59%   -2.33%    -44.24%   -85.83%   14.63%    -38.35%   -35.29%
CF-IF   Premiership  2        4.89%     -36.39%   -51.09%   -65.86%   -0.35%    -61.33%   -35.02%
CF-IF   La Liga      1        -70.06%   -29.06%   -2.62%    -72.88%   38.37%    -47.00%   -30.54%
CF-IF   La Liga      2        -23.77%   -25.99%   -33.64%   -32.11%   -23.60%   -15.58%   -27.82%
CF-IF   Serie A      1        -22.80%   -44.32%   -58.41%   10.81%    -20.44%   -6.18%    -23.56%
CF-IF   Serie A      2        -33.00%   -47.84%   -19.20%   -19.70%   -6.23%    -7.53%    -22.25%
DT      Premiership  1        -90.28%   -58.37%   -81.30%   47.17%    -85.51%   -54.17%   -53.74%
DT      Premiership  2        -66.72%   -94.07%   -84.28%   -85.72%   -87.47%   -53.89%   -78.69%
DT      La Liga      1        -71.98%   -86.90%   -95.17%   -66.26%   -63.23%   -93.25%   -79.46%
DT      La Liga      2        -84.40%   -40.31%   -80.50%   -44.35%   -68.26%   -92.04%   -68.31%
DT      Serie A      1        -62.28%   -50.25%   -52.55%   -46.48%   -74.21%   -66.36%   -58.69%
DT      Serie A      2        -77.97%   -48.24%   6.32%     -54.14%   -62.55%   -40.03%   -46.10%
SVM     Premiership  1        -82.41%   -92.97%   -64.35%   -93.62%   18.17%    -71.67%   -64.47%
SVM     Premiership  2        -78.53%   -37.90%   -81.67%   -95.39%   -64.81%   -96.12%   -75.74%
SVM     La Liga      1        -34.78%   -97.61%   -71.02%   -88.41%   -88.54%   -74.74%   -75.85%
SVM     La Liga      2        -50.70%   -84.09%   -45.31%   -62.05%   -62.66%   -91.07%   -65.98%
SVM     Serie A      1        -95.91%   -95.45%   -49.32%   -90.75%   -58.43%   -89.89%   -79.96%
SVM     Serie A      2        -96.02%   -60.51%   -69.43%   -78.64%   52.94%    -87.66%   -56.55%
Note: All values in the Table represent the proportional cumulative annualized return per season. Row 1 corresponds to the first year of the
out-of-sample and row 2 to the second year of the out-of-sample. For example, for the first cell on the left corner, 40.26% is the proportional
cumulative annualized return of RVM-CF for the 2009-2010 Premiership season and the 38.08% is the performance of the exact same model
(same specification and rules) for the 2010-2011 season of the same championship.
Table 7: Proportional Cumulative Annualized Return (Number of Goals)
                     Out-of-  In-sample period
Model   Championship sample   2006-2009 2007-2010 2008-2011 2009-2012 2010-2013 2011-2014 Average
RVM-CF  Premiership  1        7.29%     24.58%    19.03%    6.68%     3.81%     33.46%    15.81%
RVM-CF  Premiership  2        0.85%     8.04%     -1.69%    -3.42%    -2.03%    -6.21%    -0.74%
RVM-CF  La Liga      1        10.24%    12.48%    5.87%     33.61%    9.56%     12.57%    14.06%
RVM-CF  La Liga      2        -9.22%    4.65%     -8.56%    4.18%     2.55%     0.87%     -0.92%
RVM-CF  Serie A      1        9.17%     10.66%    11.54%    14.03%    25.19%    11.86%    13.74%
RVM-CF  Serie A      2        3.63%     0.57%     3.12%     7.35%     -1.25%    2.63%     2.68%
        Premiership  1        -21.14%   -15.58%   -31.61%   -58.47%   -14.87%   -18.40%   -26.68%
        Premiership  2        -56.24%   -41.28%   -59.35%   -65.55%   -40.16%   -51.81%   -52.40%
                     2        -49.56%   -51.17%   -49.52%   -49.08%   -70.35%   -49.61%   -53.22%
        La Liga      1        -61.63%   -60.78%   -19.22%   -40.79%   -23.95%   -42.25%   -41.44%
        La Liga      2        -63.28%   -61.36%   -53.24%   -67.59%   -59.31%   -52.22%   -59.50%
        Serie A      1        -52.78%   -55.90%   -56.32%   -50.63%   -57.05%   -54.00%   -54.45%
        Serie A      2        -59.55%   -58.37%   -60.24%   -62.47%   -59.88%   -60.12%   -60.11%
        Premiership  1        -39.45%   -49.56%   -55.32%   -58.48%   -47.22%   -52.17%   -50.37%
        Premiership  2        -60.17%   -58.16%   -59.42%   -62.21%   -55.66%   -63.14%   -59.79%
We note that the pattern of the RVM-CF’s profitability is similar to the one obtained by the previous metric. This
profitability varies throughout the seasons and the championships but remains positive on average in the first year of the
out-of-sample. Conversely, now the average profitability of RVM-CF in the second year of the out-of-sample is
not always positive. The other models present a consistently negative performance in all exercises. In particular, the
traditional MLP, k-NN and DT appear to perform worse than the rest of the benchmarks, while SVM performance
is far below that of RVM, especially in the case of game result. In this metric, the information-fusion benchmark
outperforms the other benchmarks, but it does not manage to achieve positive returns as RVM-CF does.
The Kelly criterion allows us to bet based on the probability of a favourable outcome. The proportion of capital dedicated
to each game is based on the probability of winning the bet according to RVM-CF or OP. Tables 8 and 9 present the
average profit per bet of the RVM-CF (see footnote 10) and the OP based on the Kelly criterion respectively.
From Tables 8 and 9 we observe an increase in the profitability in most exercises of RVM-CF and OP with the
Kelly criterion. This increase is obvious in the cases where our models previously presented negative results,
namely the second out-of-sample year of RVM-CF and all OP exercises. In the first year of the out-of-sample for
RVM-CF, where RVM-CF already had a positive profit per bet, the results remain similar.
10 Given the superior performance of RVM-CF over CF-IF, we apply the Kelly criterion only to RVM-CF.
As discussed earlier, we argue that bookmakers’ odds are biased to exploit the bettors’ behaviour. The numerator
of the Kelly fraction (equation 12) is the so-called “edge” or expected return of the bet. The edge considers both
the correct prediction probability and the odds. When the odds are biased the edge decreases. Thorp (2008)
argues that the Kelly criterion may need millions of trials to dominate other strategies when the edge is low.
On the other hand, Maclean et al. (2010) find the Kelly criterion to be very risky in the short term and argue that
despite its promising long-run growth properties, it may lead to poor return outcomes. In our exercise, Kelly
improves the betting performance when there are enough trials, such as in OP where a bet is placed for each
game. Similarly, when RVM-CF loses power and the forecasts deteriorate (as in the second year of the
out-of-sample), Kelly can reduce the exposure to risk and control the size of losses. On the other hand, for RVM-CF
in the first year of the out-of-sample, where the model already produces significant profits, the Kelly criterion seems
to have no effect on our results. This happens because the number of games where RVM-CF is applied is small
and our model produces significant positive results that are not affected by the odds’ biases. We also note that
in most cases, our models are more accurate and profitable in the English Premiership. We assume that the
English championship has less noise or, in other words, is easier to forecast (see footnote 11). Finally, we demonstrate
the benefits of the Kelly criterion and how it can be translated within the RVM-CF context. Its probabilistic nature
seems highly beneficial in football betting in cases where the underlying model has low power.
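The role of the edge can be illustrated with the textbook Kelly fraction for a single bet at decimal odds. Equation 12 is not reproduced in this part of the text, so the formula below is the standard form and may differ in notation from the paper's; the function name `kelly_fraction` and the example probabilities and odds are hypothetical.

```python
def kelly_fraction(p_win, decimal_odds):
    """Textbook Kelly stake as a fraction of the bankroll (assumed form).

    The numerator is the edge, i.e. the expected net return per unit staked;
    the denominator is the net odds received on a winning bet.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability")
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must exceed 1")
    net_odds = decimal_odds - 1.0
    edge = p_win * decimal_odds - 1.0
    # A non-positive edge means no bet is placed.
    return max(edge / net_odds, 0.0)


# Same forecast probability, fair-looking odds versus shortened (biased) odds.
print(kelly_fraction(0.55, 2.00))
print(kelly_fraction(0.55, 1.80))
```

With a 55% win probability, even-money odds give a positive edge and a stake of about a tenth of the bankroll, whereas shortening the odds to 1.80 turns the edge negative and the suggested stake drops to zero. This is the mechanism by which biased odds reduce exposure under the Kelly rule.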
The performance of our models in terms of accuracy ratio and annualized excess return for the PFE stock is
presented in Table 10. Additional measures of classification are provided in Table S.22 of Online Appendix S.8.
Online Appendix S.6.2 presents the number of forecasts obtained by CF. The Pesaran-Timmermann (1992) test
for the aggregate performance of the models is also estimated.
11 It is well known amongst football fans that competition between teams is higher in the English Premiership. The number of teams that fight
for the championship or to avoid relegation is higher than in La Liga and Serie A. Also, some of the biggest bookmakers originate
from the UK, while football betting on the English Premiership is widely prevalent in Europe and Asia.
The results of the finance application further validate the success of CF. We observe that CF achieves on
average the highest accuracies and profitability. CF presents more than 5% and 6% higher excess returns than
the second (OP) and third (RVM) best models respectively. MLP and k-NN present on average negative
profitability over the ten-year period, while k-NN and SVM do not manage to exceed the 50% accuracy
benchmark in this case. Following the pattern of previous results, SVM’s profitability falls far behind that of RVM.
Although ANFIS achieves high profitability in specific years, on average its annualized excess returns are
around 5% lower than those of OP and RVM. Finally, DT exceeds the accuracy benchmark, but its profitability is the
lowest after the negative ones of MLP and k-NN. The relevant GS stock results confirm the previous results and
are presented in Online Appendix S.7.
4. Conclusions
This study introduces a CF concept in forecasting and demonstrates its utility. CF generates a set of FRs in the
in-sample and estimates their average firing strength. These rules are ranked and applied in the out-of-sample.
CF generates forecasting signals at points where strong rules are nearby and satisfy an endogenous and an
exogenous threshold. The forecasting signal is a weighted average of the strongest rules. CF is useful in OR
problems where uncertainty is high and poor forecasts are associated with substantial losses. It can offer
transparency and protection against under- and over-fitting while at the same time improving the forecasting
accuracy of the underlying system.
In order to examine the CF performance, two forecasting exercises are designed. The first exercise is based on
the game result, Asian handicap and number of goals of football games in the Premiership, La Liga and Serie A
championships. CF is combined with RVM and generates forecasts in six consecutive exercises. These forecasts
are evaluated in terms of statistical accuracy and betting profitability. CF presents higher statistical accuracy and
betting profitability than an RVM, ANFIS, OP, MLP, k-NN, DT, SVM and a CF-IF model where the CF inputs vector
is based on an Information Fusion-based approach. We note that although accurately forecasting the game result
and number of goals does not seem a strenuous task with our models, profitability is only obtained through CF.
Our procedure not only outperforms the most popular fuzzy approach (ANFIS) but also improves the
predictability of the underlying system. This contrasts with the common dilemma in developing transparent
decision systems that sacrifice accuracy for interpretability (Martens et al., 2007). The introduction of the Kelly criterion
can further improve the profitability of CF and limit the losses of OP in most cases. The second application aims
to forecast the direction of stock price change of two stocks. In this exercise, the inputs vector is small enough
to be fed directly to CF. Our results validate the ones of the previous exercise, with CF providing the most
accurate forecasts and the highest annualized excess returns on average.
Both empirical applications presented above have demonstrated the merits of CF, since CF’s generated forecasts
are accurate and profitable. These forecasts are generated from a transparent process, which also reveals the
factors that determine the target series. This is important given that we applied CF to two independent
forecasting exercises of different nature. The first set of results contributes to the literature on football
betting market efficiency, since we note that it is possible to generate consistent profits with CF. In the stock
market case, some benchmarks (OP, RVM and ANFIS) have a time-varying profitability. Nonetheless, CF is the
only model that is able to consistently translate forecasting accuracy into profitability across all periods.
CF can be a useful tool for practitioners and academics who deal with OR problems that demand complex non-
linear techniques. In areas such as Medicine, Finance and Economics it can improve researchers’ understanding
of the underlying nature of the series. Where the general public is concerned, in areas such as financial trading
or football betting, it can offer decision rules that are simple and interpretable. It is an attractive alternative to
the numerous unconditional fuzzy inference approaches that dominate the literature.
The current study introduces CF as a robust forecasting technique and evaluates its performance through two
applications. Despite the demonstrated merits, room exists for further improvements in forecast accuracy.
Future research in CF could enhance its methodology as well as its application scope. One possible pathway is
applying the conditional framework proposed for CF to other machine learning techniques such as SVM and
MLP. This line of inquiry has much relevance for the evaluation of cost-sensitive datasets for which errors result
in severe losses (e.g. football betting). Another area to explore is using alternative forecast combination
techniques, as was done with the IF technique in this study, to select the best performing model dynamically.
A Bayesian treatment such as dynamic model averaging (Raftery et al., 2010) could be explored in further
research as an alternative to the IF. In terms of application, future studies could deploy the CF in other fields of
practice to study how it could assist decision-making processes.
Acknowledgments
We would like to thank the three anonymous reviewers, Jason Laws, Frank McGroarty, Hans-Jörg von Mettenheim, Neil
Kellard, Hans-Georg Zimmermann, Philip Newall, as well as the participants at the Operations Research seminar of the
University of Oviedo (2014), the 27th European Conference on Operational Research (2015), the 23rd Forecasting Financial
Markets conference (2015), the Computational Operations Research workshop (2016), and the 3rd Quantitative Finance
and Risk Analysis conference (2017) for their helpful comments.
REFERENCES
Akkoç, S., 2012. An empirical comparison of conventional techniques, neural networks and the three stage
hybrid Adaptive Neuro Fuzzy Inference System (ANFIS) model for credit scoring analysis: The case of Turkish
credit card data. European Journal of Operational Research, 222(1), pp.168-178.
Alpaydın, E., 1999. Combined 5×2 cv F test for comparing supervised classification learning algorithms. Neural
Computation, 11(8), pp.1885-1892.
Andersson, P., Memmert, D. and Popowicz, E., 2009. Forecasting outcomes of the World Cup 2006 in football:
Performance and confidence of bettors and laypeople. Psychology of Sport and Exercise, 10(1), pp.116-123.
Angelini, G. and De Angelis, L., 2017. PARX model for football match predictions. Journal of Forecasting, 36(7),
pp.795-807.
Atsalakis, G.S., Atsalaki, I.G., Pasiouras, F. and Zopounidis, C., 2019. Bitcoin price forecasting with neuro-fuzzy
techniques. European Journal of Operational Research (forthcoming).
Atsalakis, G.S. and Valavanis, K.P., 2009. Forecasting stock market short-term trends using a neuro-fuzzy based
methodology. Expert Systems with Applications, 36(7), pp.10696-10707.
Baboota, R. and Kaur, H., 2018. Predictive analysis and modelling football results using machine learning
approach for English Premier League. International Journal of Forecasting (forthcoming).
Bastos, L.S. and da Rosa, J.M.C., 2013. Predicting probabilities for the 2010 FIFA World Cup games using a
Poisson-Gamma model. Journal of Applied Statistics, 40(7), pp.1533-1544.
Bellman, R.E. and Zadeh, L.A., 1970. Decision-making in a fuzzy environment. Management Science, 17(4), pp.
B141-B164.
Bezdek, J.C., Ehrlich, R. and Full, W., 1984. FCM: The fuzzy c-means clustering algorithm. Computers &
Geosciences, 10(2-3), pp.191-203.
Bogaert, M., Ballings, M. and Van den Poel, D., 2018. Evaluating the importance of different communication
types in romantic tie prediction on social media. Annals of Operations Research, 263(1-2), pp.501-527.
Boshnakov, G., Kharrat, T. and McHale, I.G., 2017. A bivariate Weibull count model for forecasting association
football scores. International Journal of Forecasting, 33(2), pp.458-466.
Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J., 1984. Classification and regression trees. Wadsworth
& Brooks/Cole Advanced Books & Software, Monterey, CA.
Chang, P.C., Liu, C.H. and Lai, R.K., 2008. A fuzzy case-based reasoning model for sales forecasting in print circuit
board industries. Expert Systems with Applications, 34(3), pp.2049-2058.
Chen, Y. and Hao, Y., 2017. A feature weighted support vector machine and K-nearest neighbor algorithm for
stock market indices prediction. Expert Systems with Applications, 80, pp.340-355.
Cheng, C.B. and Lee, E.S., 2001. Switching regression analysis by fuzzy adaptive network. European Journal of
Operational Research, 128(3), pp.647-663.
Constantinou, A.C., Fenton, N.E. and Neil, M., 2012. pi-football: A Bayesian network model for forecasting
Association Football match outcomes. Knowledge-Based Systems, 36, pp.322-339.
Crowder, M., Dixon, M., Ledford, A. and Robinson, M., 2002. Dynamic modelling and prediction of English
Football League matches for betting. Journal of the Royal Statistical Society: Series D (The Statistician), 51(2),
pp.157-168.
Dixon, M.J. and Coles, S.G., 1997. Modelling association football scores and inefficiencies in the football betting
market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46(2), pp.265-280.
Dixon, M. and Robinson, M., 1998. A birth process model for association football matches. Journal of the Royal
Statistical Society: Series D (The Statistician), 47(3), pp.523-538.
Díez-Pastor, J.F., Rodríguez, J.J., García-Osorio, C. and Kuncheva, L.I., 2015. Random balance: ensembles of
variable priors classifiers for imbalanced data. Knowledge-Based Systems, 85, pp.96-111.
Dobson, S. and Goddard, J., 2003. Persistence in sequences of football match results: A Monte Carlo
analysis. European Journal of Operational Research, 148(2), pp.247-256.
Feess, E., Müller, H. and Schumacher, C., 2016. Estimating risk preferences of bettors with different bet sizes.
European Journal of Operational Research, 249(3), pp.1102-1112.
Forrest, D. and Simmons, R., 2008. Sentiment in the betting market on Spanish football. Applied
Economics, 40(1), pp.119-126.
Fukami, S., Mizumoto, M. and Tanaka, K., 1980. Some considerations on fuzzy conditional inference. Fuzzy sets
and Systems, 4(3), pp.243-273.
Goddard, J. and Asimakopoulos, I., 2004. Forecasting football results and the efficiency of fixed‐odds
betting. Journal of Forecasting, 23(1), pp.51-66.
Goddard, J., 2005. Regression models for forecasting goals and match results in association
football. International Journal of forecasting, 21(2), pp.331-340.
Gomes, J., Portela, F. and Santos, M.F., 2016. Pervasive Decision Support to predict football corners and goals
by means of data mining. In New Advances in Information Systems and Technologies (pp. 547-556). Springer,
Cham.
Gong, X., Si, Y.W., Fong, S. and Biuk-Aghai, R.P., 2016. Financial time series pattern matching with extended UCR
Suite and Support Vector Machine. Expert Systems with Applications, 55, pp.284-296.
Gradojevic, N. and Gençay, R., 2013. Fuzzy logic, trading uncertainty and technical trading. Journal of Banking &
Finance, 37(2), pp.578-586.
Guégan, D. and Huck, N., 2005. On the use of nearest neighbors in finance. Finance, 26(2), pp.67-86.
Guresen, E., Kayakutlu, G. and Daim, T.U., 2011. Using artificial neural network models in stock market index
prediction. Expert Systems with Applications, 38(8), pp.10389-10397.
Guillaume, S., 2001. Designing fuzzy inference systems from data: An interpretability-oriented review. IEEE
transactions on fuzzy systems, 9(3), pp.426-443.
Hausman, J.A., Lo, A.W. and MacKinlay, A.C., 1992. An ordered probit analysis of transaction stock prices. Journal
of Financial Economics, 31(3), pp.319-379.
Hruschka, H., 1988. Use of fuzzy relations in rule-based decision support systems for business planning
problems. European journal of operational research, 34(3), pp.326-335.
Huang, K.Y. and Chen, K.J., 2011. Multilayer perceptron for prediction of 2006 world cup football game. Advances
in Artificial Neural Systems, 2011(11), pp.1-8.
Huang, S.C. and Wu, T.K., 2008. Combining wavelet‐based feature extractions with relevance vector machines
for stock index forecasting. Expert Systems, 25(2), pp.133-149.
Igiri, C. P. 2015. Support Vector Machine—Based Prediction System for a Football Match Result. IOSR Journal of
Computer Engineering (IOSR-JCE), 17(3), pp. 21-26.
Jang, J.S., 1993. ANFIS: adaptive-network-based fuzzy inference system. IEEE transactions on systems, man, and
cybernetics, 23(3), pp.665-685.
Kadri, M.B., 2014. Disturbance rejection using fuzzy model free adaptive control (FMFAC) with adaptive
conditional defuzzification threshold. Journal of the Franklin Institute, 351(5), pp.3013-3031.
Kelly, J.L., 1956. A new interpretation of information rate. Bell Labs Technical Journal, 35(4), pp.917-926.
Koopman, S.J. and Lit, R., 2015. A dynamic bivariate Poisson model for analysing and forecasting match results
in the English Premier League. Journal of the Royal Statistical Society: Series A (Statistics in Society), 178(1),
pp.167-186.
Kuncheva, L.I., 2000. How good are fuzzy if-then classifiers?. IEEE Transactions on Systems, Man, and
Cybernetics, Part B (Cybernetics), 30(4), pp.501-509.
Kuo, R.J., 2001. A sales forecasting system based on fuzzy neural network with initial weights generated by
genetic algorithm. European Journal of Operational Research, 129(3), pp.496-517.
Kuypers, T., 2000. Information and efficiency: an empirical study of a fixed odds betting market. Applied
Economics, 32(11), pp.1353-1363.
Maclean, L.C., Thorp, E.O. and Ziemba, W.T., 2010. Long-term capital growth: the good and bad properties of
the Kelly and fractional Kelly capital growth criteria. Quantitative Finance, 10(7), pp.681-687.
Martens, D., Baesens, B., Van Gestel, T. and Vanthienen, J., 2007. Comprehensible credit scoring models using
rule extraction from support vector machines. European journal of operational research, 183(3), pp.1466-1476.
Martins, R. G., Martins, A. S., Neves, L. A., Lima, L. V., Flores, E. L., and do Nascimento, M. Z. 2017. Exploring
polynomial classifier to predict match results in football championships. Expert Systems with Applications, 83,
79-93.
Min, B., Kim, J., Choe, C., Eom, H. and McKay, R.B., 2008. A compound framework for sports results prediction:
A football case study. Knowledge-Based Systems, 21(7), pp.551-562.
Mizumoto, M. and Zimmermann, H.J., 1982. Comparison of fuzzy reasoning methods. Fuzzy sets and systems,
8(3), pp.253-283.
Newall, P.W., 2017. Behavioral complexity of British gambling advertising. Addiction Research & Theory, 25(6),
pp.505-511.
Oberstone, J. 2009. Differentiating the top English premier league football clubs from the rest of the pack:
Identifying the keys to success. Journal of Quantitative Analysis in Sports, 5(3).
Oberstone, J. 2011. Comparing team performance of the English premier league, Serie A, and La Liga for the
2008-2009 season. Journal of Quantitative Analysis in Sports, 7(1).
Oztekin, A., Delen, D., Turkyilmaz, A. and Zaim, S., 2013. A machine learning-based usability evaluation method
for eLearning systems. Decision Support Systems, 56, pp.63-73.
Oztekin, A., Kizilaslan, R., Freund, S. and Iseri, A., 2016. A data analytic approach to forecasting daily stock returns
in an emerging market. European Journal of Operational Research, 253(3), pp.697-710.
Owramipur, F., Eskandarian, P. and Mozneb, F.S., 2013. Football result prediction with Bayesian network in
Spanish League-Barcelona team. International Journal of Computer Theory and Engineering, 5(5), p.812.
Patel, J., Shah, S., Thakkar, P. and Kotecha, K., 2015. Predicting stock and stock price index movement using
trend deterministic data preparation and machine learning techniques. Expert Systems with Applications, 42(1),
pp.259-268.
Pesaran, M.H. and Timmermann, A., 1992. A simple nonparametric test of predictive performance. Journal of
Business & Economic Statistics, 10(4), pp.461-465.
Piramuthu, S., 1999. Financial credit-risk evaluation with neural and neurofuzzy systems. European Journal of
Operational Research, 112(2), pp.310-321.
Purda, L.D., 2007. Stock market reaction to anticipated versus surprise rating changes. Journal of Financial
Research, 30(2), pp.301-320.
Rafi, M. and Shaikh, M. S., 2013. A comparison of SVM and RVM for Document Classification. arXiv preprint
arXiv:1301.2785.
Raftery, A.E., Kárný, M. and Ettler, P., 2010. Online prediction under model uncertainty via dynamic model
averaging: Application to a cold rolling mill. Technometrics, 52(1), pp.52-66.
Rotshtein, A. P., Posner, M., and Rakityanskaya, A. B. 2005. Football predictions based on a fuzzy model with
genetic and neural tuning. Cybernetics and Systems Analysis, 41(4), 619-630.
Şahin, M. and Erol, R., 2018. Prediction of Attendance Demand in European Football Games: Comparison of
ANFIS, Fuzzy Logic, and ANN. Computational intelligence and neuroscience, 2018.
Schumaker, R. P., Jarmoszko, A. T. and Labedz Jr, C. S. 2016. Predicting wins and spread in the Premier League
using a sentiment analysis of twitter. Decision Support Systems, 88, 76-84.
Sevim, C., Oztekin, A., Bali, O., Gumus, S. and Guresen, E., 2014. Developing an early warning system to predict
currency crises. European Journal of Operational Research, 237(3), pp.1095
Singpurwalla, N.D. and Booker, J.M., 2004. Membership functions and probability measures of fuzzy sets. Journal
of the American Statistical Association, 99(467), pp.867-877.
Štrumbelj, E. 2014. On determining probability forecasts from betting odds. International journal of forecasting,
30(4), 934-943.
Sugeno, M., 1985. Industrial applications of fuzzy control. Elsevier Science, New York, NY, USA.
Sun, B., Guo, H., Karimi, H.R., Ge, Y. and Xiong, S., 2015. Prediction of stock index futures prices based on fuzzy
sets and multivariate fuzzy time series. Neurocomputing, 151(3), pp.1528-1536.
Tay, F.E. and Cao, L.J., 2002. Modified support vector machines in financial time series forecasting.
Neurocomputing, 48(1-4), pp.847-861.
Teodorović, D., 1994. Fuzzy sets theory applications in traffic and transportation. European Journal of
Operational Research, 74(3), pp.379-390.
Thorp, E. O. (2008). The Kelly criterion in blackjack sports betting, and the stock market. In Zenios, S.A and
Ziemba, W.T. (Eds) Handbook of asset and liability management, 385-428.
Ticknor, J.L., 2013. A Bayesian regularized artificial neural network for stock market forecasting. Expert Systems
with Applications, 40(14), pp.5501-5506.
Tipping, M.E., 2001. Sparse Bayesian learning and the relevance vector machine. Journal of machine learning
research, 1(Jun), pp.211-244.
Trawinski, K., 2010. A fuzzy classification system for prediction of the results of the basketball games. In IEEE
International Conference on Fuzzy Systems (FUZZ), 2010 (pp. 1-7). IEEE.
Vapnik, V. N. 1995. The nature of statistical learning theory. Springer, New York, NY, USA.
Vlastakis, N., Dotsis, G. and Markellos, R.N., 2008. Nonlinear modelling of European football scores using support
vector machines. Applied Economics, 40(1), pp.111-118.
Wang, Y.F., 2002. Predicting stock price using fuzzy grey prediction system. Expert systems with applications,
22(1), pp.33-38.
Wang, L.X. and Mendel, J.M., 1992. Fuzzy basis functions, universal approximation, and orthogonal least-squares
learning. IEEE Transactions on Neural Networks, 3(5), pp.807-814.
Wang, J.Z., Wang, J.J., Zhang, Z.G. and Guo, S.P., 2011. Forecasting stock indices with back propagation neural
network. Expert Systems with Applications, 38(11), pp.14346-14355.
Yang, J.W. and Parwada, J., 2012. Predicting stock price movements: an ordered probit analysis on the Australian
Securities Exchange. Quantitative Finance, 12(5), pp.791-804.
Zadeh, L.A., 1965. Fuzzy sets. Information and control, 8(3), pp.338-353.
Zadeh, L.A., 1983. The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy sets and
systems, 11(1), pp.199-227.
Zhang, G., Hu, M.Y., Patuwo, B.E. and Indro, D.C., 1999. Artificial neural networks in bankruptcy prediction:
General framework and cross-validation analysis. European journal of operational research, 116(1), pp.16-32.
Zhong, X. and Enke, D., 2017. A comprehensive cluster and classification mining procedure for daily stock market
return forecasting. Neurocomputing, 267, pp.152-168.
Appendix
A. CF illustration example
In this appendix, we present a detailed CF example for the Premiership in the first forecasting exercise, with the game
result as the target. The in-sample period covers the seasons 2006 to 2007, 2007 to 2008 and 2008 to 2009, and the
out-of-sample period is the season 2009 to 2010.
As mentioned above, the first step is to collect the RVs generated when all the inputs of Table 1 are fed into the
RVM. In this case, a set of 12 RVs is selected and, based on these, the RVM generates a series of in-sample
forecasts. The RV set is presented below in Table A.1.
Table A.1: Selected RVs
1. Betbrain average over 2.5 goals odds (BbAv>2.5)
2. Betbrain size of handicap (home team) (BbAHh)
3. Betbrain average Asian handicap home team odds (BbAvAHH)
4. Betbrain average Asian handicap away team odds (BbAvAHA)
5. Points of H in the last 3 games when H plays at home (PtH3H)
6. Points of A in the last 3 games when A plays away (PtA3A)
7. Number of shots on target of H team in the last game (StH1)
8. Number of shots on target of A team in the last game (StA1)
9. Number of corner kicks on target of H team in the last 3 games (CkH3)
10. Number of corner kicks on target of H team in the last 2 games (CkH2)
11. Number of corner kicks on target of H team in the last game when H plays at home (CkH1H)
12. Number of corner kicks on target of A team in the last 2 games when A plays away (CkA2A)
Note: Team H is the home team and team A is the away team. The abbreviation of each RV is given in parentheses. In total,
12 RVs are selected.
Based on these forecasts, a set of FRs is derived. In this exercise, 13 rules are generated which cover the whole
in-sample. The rules have the form:
If (BbAv>2.5 is cluster $A_{i,1}$) and (BbAHh is cluster $A_{i,2}$) and (BbAvAHH is cluster $A_{i,3}$) and (BbAvAHA
is cluster $A_{i,4}$) and (PtH3H is cluster $A_{i,5}$) and (PtA3A is cluster $A_{i,6}$) and (StH1 is cluster $A_{i,7}$) and
(StA1 is cluster $A_{i,8}$) and (CkH3 is cluster $A_{i,9}$) and (CkH2 is cluster $A_{i,10}$) and (CkH1H is cluster $A_{i,11}$)
and (CkA2A is cluster $A_{i,12}$), then the output is the result of regression $\delta$.
where $A_{i,k}$ is the cluster specified for input element $k$ of the $i$-th rule. The FRs are specified by the premise and
consequent parameters. The premise parameters determine the cluster specification: the centres ($c_i$) and the
standard deviations ($\sigma_i$) for each rule. These are presented in Table A.2.
Table A.2: Cluster characteristics for the generated rules
Input variable / Rule   1 2 3 4 5 6 7 8 9 10 11 12 13
BbAv>2.5 2.14 2.12 1.96 2.03 2.00 2.00 2.09 1.75 2.02 1.77 2.01 2.03 2.13
(0.17) (0.13) (0.12) (0.15) (0.17) (0.17) (0.15) (0.16) (0.15) (0.16) (0.14) (0.20) (0.10)
BbAHh 0.00 -0.50 0.01 0.00 -0.01 0.01 -0.01 -1.50 -0.01 -1.50 -0.50 -0.75 -0.25
(0.80) (0.79) (0.81) (0.80) (0.82) (0.80) (0.79) (0.79) (0.78) (0.79) (0.80) (0.80) (0.81)
BbAvAHH 1.83 1.97 1.82 1.94 1.52 2.05 1.58 1.79 1.70 2.03 1.81 1.84 1.88
(1.42) (1.42) (1.42) (1.42) (1.42) (1.42) (1.42) (1.42) (1.42) (1.43) (1.42) (1.42) (1.42)
BbAvAHA 1.99 1.89 1.96 1.81 2.36 1.74 2.29 2.08 2.10 1.83 2.08 2.04 1.98
(1.72) (1.71) (1.67) (1.72) (1.71) (1.72) (1.73) (1.72) (1.70) (1.74) (1.72) (1.66) (1.69)
PtH3H 4.28 6.23 3.02 4.66 0.99 6.99 3.32 6.44 6.68 6.23 8.99 3.99 4.00
(1.59) (1.59) (1.60) (1.59) (1.59) (1.59) (1.61) (1.62) (1.66) (1.59) (1.53) (1.54) (1.59)
PtA3A 2.99 1.00 1.99 6.00 1.01 5.01 5.00 3.00 2.99 0.01 0.99 0.01 3.01
(1.58) (1.58) (1.59) (1.59) (1.59) (1.59) (1.59) (1.58) (1.58) (1.59) (1.59) (1.59) (1.59)
StH1 4.00 2.99 6.00 4.22 6.11 6.57 4.99 8.12 7.32 9.23 3.99 4.32 4.82
(3.88) (3.87) (3.88) (3.89) (3.89) (3.89) (3.89) (3.90) (3.88) (3.85) (3.88) (3.89) (3.89)
StA1 5.99 6.00 3.00 6.01 8.23 9.01 6.02 10.11 3.00 5.03 7.00 4.01 5.01
(3.71) (3.71) (3.71) (3.71) (3.71) (3.72) (3.71) (3.71) (3.71) (3.71) (3.71) (3.71) (3.72)
CkH3 13.00 15.12 19.02 13.48 13.20 17.81 19.82 16.94 16.29 19.31 9.40 9.54 21.39
(6.19) (6.19) (6.19) (6.19) (6.19) (6.19) (6.19) (6.19) (6.19) (6.19) (6.18) (6.19) (6.19)
CkH2 7.74 11.69 14.39 10.12 8.79 9.10 13.21 12.31 11.59 10.30 5.30 5.99 16
(4.72) (4.27) (4.35) (4.71) (4.68) (4.70) (4.77) (4.72) (4.71) (4.78) (4.90) (4.72) (4.75)
CkH1H 3.99 6.59 6.34 6.03 5.29 4.99 10.01 8.32 6.03 6.53 2.99 4.42 12.23
(3.36) (3.37) (3.32) (3.33) (3.35) (3.34) (3.36) (3.36) (3.36) (3.36) (3.39) (3.36) (3.37)
CkA2A 7.99 11.01 10.20 7.01 9.11 10.39 12.90 8.39 8.20 8.87 9.68 7.62 4.07
(4.06) (4.07) (4.06) (4.07) (4.06) (4.05) (4.04) (4.07) (4.08) (4.07) (4.05) (4.04) (4.05)
Note: The values in the table are the centre (standard deviation) of the relevant cluster. For instance, consider the first input
(BbAv>2.5); to determine which rule an observation belongs to, the membership grade is calculated for each rule. For rule 1, the
membership function is $\exp\left(-(x_{*,1} - 2.14)^2 / (2 \times 0.17^2)\right)$, where $x_{*,1}$ is the given Betbrain average odds for over 2.5 goals
for the match. The firing strength (weight) of each rule is the product of the membership grades for all inputs.
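The calculation described in the note can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the centres and standard deviations are the rule 1 values copied from Table A.2, while the function names and the observation `x_obs` are hypothetical and chosen only for the example.

```python
import math

# Premise parameters of rule 1, copied from the first column of Table A.2,
# in the input order BbAv>2.5, BbAHh, BbAvAHH, BbAvAHA, PtH3H, PtA3A,
# StH1, StA1, CkH3, CkH2, CkH1H, CkA2A.
RULE1_CENTRES = [2.14, 0.00, 1.83, 1.99, 4.28, 2.99,
                 4.00, 5.99, 13.00, 7.74, 3.99, 7.99]
RULE1_SIGMAS = [0.17, 0.80, 1.42, 1.72, 1.59, 1.58,
                3.88, 3.71, 6.19, 4.72, 3.36, 4.06]


def gaussian_membership(x, centre, sigma):
    """Gaussian membership grade: exp(-(x - centre)^2 / (2 * sigma^2))."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


def membership_grades(x, centres, sigmas):
    """Membership grade of every input of one observation for one rule."""
    return [gaussian_membership(xk, c, s)
            for xk, c, s in zip(x, centres, sigmas)]


def firing_strength(x, centres, sigmas):
    """Firing strength (weight): product of the grades over all inputs."""
    return math.prod(membership_grades(x, centres, sigmas))


# Hypothetical observation (not from the paper): it sits on every rule 1
# centre except the first input, which is one standard deviation above it.
x_obs = list(RULE1_CENTRES)
x_obs[0] = 2.14 + 0.17
w1 = firing_strength(x_obs, RULE1_CENTRES, RULE1_SIGMAS)
print(round(w1, 4))
```

An observation lying exactly on all twelve centres has a firing strength of 1, and the strength decays towards 0 as any input moves away from its cluster centre.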
The output of each rule is associated with a regression 𝛿 specified with consequent parameters. The parameters
of these regressions for each rule are presented in Table A.3.
Table A.3: Regression coefficients for the generated rules
Coefficient / Rule   1 2 3 4 5 6 7 8 9 10 11 12 13
Intercept -2.71 -9.44 10.12 2.95 -10.84 0.16 -4.96 2.46 10.57 -8.08 -12.30 15.94 3.46
BbAv>2.5 1.04 1.68 -2.44 -0.18 0.86 -0.31 0.68 -0.99 0.26 1.12 -0.82 -0.45 4.49
BbAHh -0.76 -0.20 -0.50 -0.26 -1.06 -0.74 -0.39 -0.43 -0.72 -0.70 -0.40 -1.76 -3.10
BbAvAHH 1.07 2.36 -2.39 -0.96 2.64 -0.25 0.46 -0.19 -4.48 1.55 4.10 -6.55 -3.70
BbAvAHA 0.48 1.06 -1.06 -1.10 1.54 0.12 1.2 -0.35 -0.65 0.95 1.54 -2.01 -1.61
PtH3H -0.04 -0.14 0.16 0.07 0.04 -0.09 -0.02 -0.09 -0.19 0.07 0.42 0.04 -0.34
PtA3A 0.21 -0.44 0.11 0.09 0.14 0.05 0.23 0.16 0.02 0.17 0.37 -0.74 0.10
StH1 -0.14 0.08 0.05 -0.01 0.02 -0.02 0.01 0.01 -0.00 0.01 -0.14 0.06 -0.12
StA1 -0.09 0.12 -0.16 0.01 -0.04 0.03 -0.03 0.08 0.16 0.06 -0.08 0.21 -0.11
CkH3 -0.20 0.05 0.14 0.02 0.08 0.03 -0.02 -0.07 0.01 -0.06 -0.04 0.03 -0.08
CkH2 -0.02 -0.03 -0.13 0.04 0.03 -0.06 0.04 0.11 0.04 -0.02 0.02 -0.06 0.03
CkH1H 0.15 -0.01 -0.06 -0.08 0.05 0.07 0.01 -0.04 -0.15 0.16 0.15 -0.12 0.06
CkA2A 0.07 -0.09 0.05 -0.03 -0.02 0.04 -0.05 -0.03 0.03 0.03 0.06 0.08 -0.005
From the table, it is easy to extract the regression specification for the first rule, $\delta_1^*$:
$$
\begin{aligned}
\delta_1^* = {} & -2.71 + 1.04x_{1,1}^* - 0.76x_{1,2}^* + 1.07x_{1,3}^* + 0.48x_{1,4}^* - 0.04x_{1,5}^* + 0.21x_{1,6}^* \\
& - 0.14x_{1,7}^* - 0.09x_{1,8}^* - 0.20x_{1,9}^* - 0.02x_{1,10}^* + 0.15x_{1,11}^* + 0.07x_{1,12}^*
\end{aligned}
\qquad \text{(A.1)}
$$
where $x^* = [x_1^*, \ldots, x_{12}^*]$ is the vector of observed values for $K = 12$ RVs.
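As an illustration of equation (A.1), the consequent of rule 1 is a plain linear regression on the 12 RVs. The sketch below uses the rule 1 coefficients from Table A.3; the function name `rule_output` and the two test vectors are assumptions made only for this example.

```python
# Consequent parameters of rule 1, copied from the first column of Table A.3,
# in the input order BbAv>2.5, BbAHh, BbAvAHH, BbAvAHA, PtH3H, PtA3A,
# StH1, StA1, CkH3, CkH2, CkH1H, CkA2A.
RULE1_INTERCEPT = -2.71
RULE1_COEFS = [1.04, -0.76, 1.07, 0.48, -0.04, 0.21,
               -0.14, -0.09, -0.20, -0.02, 0.15, 0.07]


def rule_output(x, intercept, coefs):
    """Linear consequent of one rule: intercept + sum_k coef_k * x_k."""
    if len(x) != len(coefs):
        raise ValueError("observation and coefficient vector differ in length")
    return intercept + sum(b * xk for b, xk in zip(coefs, x))


# Hypothetical inputs used only to check the arithmetic of equation (A.1).
x_zero = [0.0] * 12                 # leaves only the intercept
x_first = [1.0] + [0.0] * 11        # switches on the BbAv>2.5 term only
delta_zero = rule_output(x_zero, RULE1_INTERCEPT, RULE1_COEFS)
delta_first = rule_output(x_first, RULE1_INTERCEPT, RULE1_COEFS)
```

With all inputs at zero the output reduces to the intercept of -2.71, and switching on only the first input adds its coefficient of 1.04.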
The next step is to evaluate the strength of the generated rules. To determine the CF threshold, the algorithm
of Section 2.2.2 is followed. First, all rules are evaluated based on the Gaussian membership function. For each
of these rules, the average firing strength ($w^{1/12}$) is estimated. The FRs are then sorted by their average firing
strength over the in-sample in descending order (the sorted list). The 90th percentile of the sorted list
determines the endogenous threshold. In this case, rule number one is the 90th percentile of the sorted list
($\lambda = 1$). Thus, the endogenous threshold level is set to $w_1^{1/12} = 0.85$.
As the general threshold level (0.90) is greater than the endogenous one (0.85), the effective threshold (Θ) is set
to 0.90. Given the specifications of the FRs and the effective threshold, the performance of the CF model can
now be evaluated. At each test point, the average membership grade for each rule is calculated by plugging in the
premise parameters of Table A.2. Comparing this membership grade with Θ in the evaluation function of
equation (3) determines the signal. Based on equation (A.1), the evaluation function becomes:
$$
C_i^* = \begin{cases} 1, & (\acute{w}_i^{*})^{1/12} \ge 0.9 \\ 0, & \text{otherwise} \end{cases} \qquad \text{(A.2)}
$$
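The thresholding step can be sketched as follows. This is an illustrative sketch under stated assumptions: taking the larger of the general and endogenous thresholds is inferred from this example only (the full selection algorithm is the one in Section 2.2.2), and the function names and membership grades are hypothetical.

```python
import math


def average_firing_strength(grades):
    """Geometric mean of the membership grades, i.e. w ** (1 / K)."""
    return math.prod(grades) ** (1.0 / len(grades))


def effective_threshold(general, endogenous):
    """Assumption: the effective threshold is the larger of the two levels,
    which matches the example (general 0.90, endogenous 0.85)."""
    return max(general, endogenous)


def rule_qualifies(grades, theta):
    """Evaluation function of equation (A.2): 1 if the rule is strong."""
    return 1 if average_firing_strength(grades) >= theta else 0


theta = effective_threshold(general=0.90, endogenous=0.85)

# Hypothetical membership grades for two rules over the 12 inputs.
strong_rule = [0.95] * 12
weak_rule = [0.95] * 11 + [0.10]    # one poorly matched input

c_strong = rule_qualifies(strong_rule, theta)
c_weak = rule_qualifies(weak_rule, theta)
```

Because the geometric mean is used, a single poorly matched input is enough to pull a rule below the threshold, as the second hypothetical rule shows.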
If the geometric mean of the membership grades of all the inputs (12 elements) is greater than or equal to 0.9, the rule is
considered strong and qualifies for decision-making, and a decision is made based on the CF model. The weighted
average of the regressions for the qualified rules determines the CF output (equation (8)). Based on equation
(8), we have:
$$
\acute{O}^* = \frac{\sum_{i=1}^{13} \acute{w}_i C_i^* f_i}{\sum_{i=1}^{13} \acute{w}_i C_i^*} \qquad \text{(A.3)}
$$
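Equation (A.3) amounts to a weighted average over the rules that pass the threshold. The sketch below is a minimal illustration: the function name `cf_output`, the choice to return `None` when no rule qualifies, and the numeric inputs are assumptions for the example and are not taken from the paper.

```python
def cf_output(weights, indicators, rule_outputs):
    """Weighted average of rule outputs over qualified rules, as in (A.3).

    weights      -- firing strength of each rule
    indicators   -- 1 if the rule qualifies under (A.2), else 0
    rule_outputs -- regression output f_i of each rule
    """
    numerator = sum(w * c * f
                    for w, c, f in zip(weights, indicators, rule_outputs))
    denominator = sum(w * c for w, c in zip(weights, indicators))
    if denominator == 0:
        # Assumption: with no qualified rule, no CF forecast is produced.
        return None
    return numerator / denominator


# Hypothetical values for three rules; the third rule does not qualify,
# so its large output has no influence on the result.
weights = [0.50, 0.25, 0.10]
indicators = [1, 1, 0]
rule_outputs = [1.0, -1.0, 5.0]

forecast = cf_output(weights, indicators, rule_outputs)
no_forecast = cf_output(weights, [0, 0, 0], rule_outputs)
```

Here the forecast is (0.50 × 1.0 + 0.25 × (−1.0)) / (0.50 + 0.25) = 1/3, so discarded rules affect neither the numerator nor the normalization.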
In the game result forecasting, the match outcome ($\acute{O}^*$) is a weighted average of the regressions specified for
each rule (see Table A.3). The weight for each rule's regression is the normalized value of $\acute{w}_i C_i^*$, as in equation
(12). Finally, for betting purposes the estimated value must be interpreted and classified ($\acute{O}^*_{adj}$) under one of three
labels: "win (+1)", "draw (0)" or "lose (-1)". The regression outputs are classified according to the following
transfer function: