79-81%, with precision of 60-80% and recall rates of 40-88%.

Alfredo and Isa conducted an experiment on predicting football matches using multiple tree-based algorithms, including C5.0, Random Forest, and Extreme Gradient Boosting [7]. They used ten seasons of EPL matches, from 2007/2008 to 2016/2017, with a total of 14 independent features, and applied 10-fold cross-validation in the training process. The accuracies of the three models were 64.87%, 68.55%, and 67.89%, respectively. However, they still used in-game match statistics as input features, which contributes little to predicting future matches.

Kumar performed a thorough analysis of football match prediction [8]. They worked purely on actual match statistics, excluding video-game ratings. First, they used match statistics, such as the number of successful passes, the number of red cards given, and even the number of goals, to predict each player's rating. Secondly, they predicted post-match results using all statistical data occurring in the match. Finally, they combined the two models to predict upcoming match results. They used several algorithms and models, including Sequential Minimal Optimization (SMO), Support Vector Machine (SVM), Bagging with Functional Trees, and AdaBoost with Functional Trees. The best result was obtained from SMO with the past seven matches of 27 input features each, reaching 53.3875% accuracy for predicting three classes: home win, home loss, and draw.

According to the mentioned studies, there are several limitations in different areas. First, the models with higher accuracy tended to rely on in-game features, which are not available before a match and therefore contribute little to future match prediction.

For the current match features, player ratings, goalkeeper ratings, and whole team ratings were gathered for each side of all matches. In terms of player ratings, six features are used for every player. Additionally, five features are included for each goalkeeper. In terms of team features, nine features are used to represent each team. The current match features are detailed in Table I.

Secondly, recent match features are calculated to help with match prediction. They can be divided into three groups: the three most recent results of the home team against any other team, the three most recent results of the away team against any other team, and the most recent result of the home team against the away team. Within each group, the numbers of wins, draws, and losses, together with goals scored and goals conceded, were averaged. At most three games were considered for the first two groups, and one game for the last group. All gathered training features were normalized using min-max normalization, so that all feature values lie between 0 and 1. The recent match features are summarized in Table II.

Note that we tested the impact of the recent match features, which are believed to be beneficial for football match prediction. The experiments are discussed in the next section.

TABLE I. CURRENT MATCH FEATURES

Feature group     Quantity     Parameter name   Data type
Player Features   22 players   overall_rating   float
Player Features   22 players   potential        float
Player Features   22 players   sprint_speed     float
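As an illustration of this feature construction, the sketch below computes the recent-form averages and applies min-max normalization. The match-record format and the helper names (recent_form, min_max_normalize) are our own assumptions, not taken from the paper.

    import numpy as np

    def recent_form(matches, team, n=3):
        # Hypothetical match-record format: each match is a dict with
        # keys "home", "away", "hg" (home goals), "ag" (away goals).
        past = [m for m in matches if team in (m["home"], m["away"])][-n:]
        wins = draws = losses = scored = conceded = 0
        for m in past:
            gs, gc = (m["hg"], m["ag"]) if m["home"] == team else (m["ag"], m["hg"])
            wins += gs > gc
            draws += gs == gc
            losses += gs < gc
            scored += gs
            conceded += gc
        k = max(len(past), 1)  # average over at most n recent games
        return [wins / k, draws / k, losses / k, scored / k, conceded / k]

    def min_max_normalize(X):
        # Rescale every feature column to lie in [0, 1].
        X = np.asarray(X, dtype=float)
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / np.where(hi > lo, hi - lo, 1.0)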
C. Data Partition
In terms of data partitioning, EPL season 2015/2016 was separated from the other seasons. This particular season, called the final season, was reserved for the final test of the model, to see how the model behaves under many unpredictable matches. The data from the other seasons were split using five-fold stratified cross-validation, producing five data folds. Each fold contained 20% of the data and was used for testing, while the remaining 80%, obtained by combining the other folds, was used for training the model. The stratified version was applied for consistency in measuring accuracy, keeping the numbers of wins, draws, and losses balanced across all five data folds.

Our proposed models and the other comparative models were tested using all five seasons, the three latest seasons, and the two latest seasons, from season 2010/2011 to season 2014/2015. This setup is useful for determining how many seasons are required to achieve the optimal model.
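For illustration, this split can be reproduced with scikit-learn's StratifiedKFold; a minimal sketch, with placeholder arrays standing in for the processed features and labels:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # Placeholder data standing in for the processed match features/labels.
    X = np.random.rand(1000, 20)
    y = np.random.choice(["win", "draw", "loss"], size=1000)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        # Each fold preserves the win/draw/loss proportions; 80% of the
        # data trains the model and the held-out 20% tests it.
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]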
D. Classification Models
In this research, six renowned classification models were selected for comparative purposes. These models were used to obtain baseline accuracies, and they also contributed to the proposed models. The models were as follows.

1) Multi-Layer Perceptron (MLP) is a feedforward neural network learning algorithm. As the name suggests, an MLP consists of at least three layers. It can have one or more non-linear layers, called hidden layers, between the input layer and the output layer. Since it consists of multiple layers and can use non-linear activation functions, it can handle data that are not linearly separable.

2) Support Vector Machine (SVM) uses the idea of a hyperplane to classify data in another dimensional space. This model finds the best decision boundary that separates the input data according to their classes; the optimal boundary produces the largest margin between the data classes. Typically, dense data are mapped to a higher-dimensional space by a kernel function to find that boundary.

3) Gaussian Naive Bayes (GNB) is a classification model based on the probability concept of Bayes' theorem. The Gaussian function reflects the assumption that the input data are drawn from a Gaussian, or normal, distribution.

4) K-Nearest Neighbors (KNN) classifies each data point based on its K nearest data points. Basically, it uses the majority vote among the K nearest data points to reach the final classification result.

5) Random Forest (RF) is an ensemble of decision trees, each trained on a bootstrapped sample of the data, whose votes are aggregated for the final classification.

6) Gradient Boosting (GB) builds an ensemble of weak learners sequentially and combines their predictions at the end. In other words, residuals or wrong samples of each iteration are used to improve the next iteration of the whole model.
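A baseline comparison in this spirit can be sketched with scikit-learn, reusing the placeholder X and y from the previous sketch; with cv=5, cross_val_score stratifies the folds for classification targets:

    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    models = {
        "MLP": MLPClassifier(max_iter=1000),
        "SVM": SVC(),
        "GNB": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "RF":  RandomForestClassifier(),
        "GB":  GradientBoostingClassifier(),
    }
    for name, model in models.items():
        # Five stratified folds, as in the data-partition step.
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")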
IV. PROPOSED METHODOLOGY
To enhance prediction performance, the concept of fusion of classifiers was proposed and implemented. The overall idea was to combine two or more classifiers to accomplish a complicated task, in this research the prediction of football match results. Two fusion concepts, i.e., the hierarchical model and the ensemble model, were studied in this paper, as described in the following subsections.

A. Hierarchical Model
In a hierarchical model, instead of predicting the class with one simple model, multiple models are constructed and connected in a hierarchical fashion. These individual models can be trained with similar or different data sets and perform different classification tasks, but are combined at the end to achieve the result according to the main objective. Two different hierarchical models were designed and introduced in this paper, called the hierarchical model based on three classifiers and the hierarchical model based on two classifiers.

The first hierarchical model was composed of three classifiers, A, B, and C, as illustrated in Fig. 1. These classifiers were used to predict whether the match result is win/not win, lose/not lose, and win/lose, respectively. Classifiers A and B were trained with all available data, while 'draw' data points were excluded from classifier C's training set. The architecture was constructed under the assumption that models should be better at predicting matches that tend to be won or lost when trained on matches without draw results. Thus, if classifiers A and B predicted win and lose for the same match, that particular match was passed to the specialized classifier C to make the final decision.
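The decision logic of this three-classifier design could be implemented as below. This is our own sketch, not the authors' code, and treating the case where A predicts not win and B predicts not lose as a draw is our assumption, since the text does not state it explicitly.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    class ThreeClassifierHierarchical:
        # A: win / not win, B: lose / not lose, C: win/lose tie-breaker.
        def __init__(self, make_clf=GaussianNB):
            self.a, self.b, self.c = make_clf(), make_clf(), make_clf()

        def fit(self, X, y):  # y entries in {"win", "draw", "lose"}
            X, y = np.asarray(X), np.asarray(y)
            self.a.fit(X, y == "win")    # trained on all data
            self.b.fit(X, y == "lose")   # trained on all data
            m = y != "draw"              # C never sees draw matches
            self.c.fit(X[m], y[m])
            return self

        def predict(self, X):
            X = np.asarray(X)
            win, lose = self.a.predict(X), self.b.predict(X)
            # Assumed fallback: neither win nor lose is read as a draw.
            out = np.where(win & ~lose, "win",
                           np.where(lose & ~win, "lose", "draw")).astype(object)
            conflict = win & lose        # A says win AND B says lose
            if conflict.any():
                out[conflict] = self.c.predict(X[conflict])
            return out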
The second hierarchical model contained two classifiers, A and B. Classifier A was trained with all processed data points to predict whether each match is a draw or not a draw, while classifier B focused on the data set without draw outcomes. In this hierarchical model, a new data point was first submitted to classifier A. If classifier A predicted the match result to be a draw, the process terminated with draw as the final prediction. Otherwise, the data point was passed to classifier B to obtain the final classification result, as shown in Fig. 2. The hypothesis behind this model is that splitting the 3-class classification task into simpler models, draw or not draw and then win or lose, should be beneficial for football match prediction.
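A corresponding sketch of the two-classifier model, again our own illustrative implementation rather than the authors' code, with KNN as the default base classifier (the algorithm that performs best for this model in Section V):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    class TwoClassifierHierarchical:
        # A: draw / not draw on all data; B: win / lose on non-draw data.
        def __init__(self, make_clf=KNeighborsClassifier):
            self.a, self.b = make_clf(), make_clf()

        def fit(self, X, y):  # y entries in {"win", "draw", "lose"}
            X, y = np.asarray(X), np.asarray(y)
            self.a.fit(X, y == "draw")
            m = y != "draw"              # B never sees draw matches
            self.b.fit(X[m], y[m])
            return self

        def predict(self, X):
            X = np.asarray(X)
            is_draw = self.a.predict(X)
            out = np.full(len(X), "draw", dtype=object)
            if (~is_draw).any():
                # Non-draw predictions fall through to classifier B.
                out[~is_draw] = self.b.predict(X[~is_draw])
            return out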
B. Ensemble Model
Each ensemble model's combination of classifiers was different from the others in terms of the sub-model algorithms composing the whole model. We tried a total of three different sub-model combinations. The selection of each sub-model was based on the performance of the comparative models and the hierarchical models, and is discussed further in the next section.

V. EXPERIMENTS
Our experiments were conducted in two scenarios. The first scenario was designed to test each comparative model based on the feature types, the number of seasons, and the number of match-result classes used for training. The results of the first scenario were taken into account for the experimental setup of the second scenario, in which the fusion-based models were tested.
A. Scenario 1
The original data were processed by the procedure described above. Thus, three data sets, covering five seasons, three seasons, and two seasons, were used in this scenario. For each data set, there were two types of features for training the models. The first type contained only recent match results, while the other combined recent match results with current match features. Both feature types were fed to each of the mentioned models, to yield each comparative model's performance on all data sets.
Fig. 1. Architecture of hierarchical model based on three classifiers.

The results of all comparative models using only recent match features, and using all processed features, are shown in Table III and Table IV, respectively. On average, the accuracies obtained with all features were greater than those obtained with only recent match features. For three-class classification, using fewer seasons tended to give slightly higher accuracy on the test set, while using three or five seasons was better for predicting the final-season results. The two-class classification showed a similar trend, but with a larger gap between the test set and the final season. Based on these results, our proposed models were trained with all features and constructed for three-class classification only, but still with various numbers of seasons in the training phase.
B. Scenario 2
Three fusion-based models, two hierarchical models and one ensemble model, were evaluated in this scenario. The first hierarchical model was constructed using three classifiers, as shown in Fig. 1. All of the classification models from the comparative experiment were applied to this model. All three classifiers used the same classification algorithm so that the performance of each classification algorithm could be compared.
Fig. 2. Architecture of hierarchical model based on two classifiers.
Instead of using three classifiers, the second hierarchical model produced its classification results with only two classifiers, as illustrated in Fig. 2. The accuracies of these models were evaluated for comparison with the other existing models.
As shown in Table V, the hierarchical model based on three classifiers achieved its highest test-set accuracies with the GNB classification method. The accuracies on the two-season, three-season, and five-season data sets were 52.267%, 51.947%, and 49.357%, respectively. However, this proposed model showed no significant improvement over the comparative models.
In Table VI, accuracies from the second hierarchical model increased noticeably, especially on the test sets. All algorithms except GNB exceeded 50% accuracy on the two-season and three-season test sets. For the final season, the overall performance was also relatively better compared with the first hierarchical model and the comparative models. The peak accuracies came from using KNN classifiers, with 56.533% testing
accuracy on the two-season data set and 44.24% accuracy on the final-season data set.

Fig. 3. Architecture of ensemble model.

From the prior results, three combinations for the ensemble model experiment were selected as follows. In the first combination, the first sub-model was the hierarchical model based on two KNN classifiers, while the second sub-model used the same hierarchical architecture but with MLP classifiers; the third sub-model was RF. The second combination used the same sub-model architecture as the first combination, but with RF, SVM, and GNB as the sub-model classifiers, respectively. The third combination used the two-classifier hierarchical model for all sub-models, with RF, KNN, and SVM as the sub-model classifiers.

Table VII shows the accuracies of all proposed ensemble combination models. The peak performance came from the first combination, at 56.800% accuracy for the test set and 43.714% accuracy for the final-season data set.
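Combining the three fitted sub-models can be sketched as follows; the majority-vote rule is our assumption, since the excerpt does not specify how the ensemble in Fig. 3 merges the sub-model outputs.

    import numpy as np
    from collections import Counter

    def ensemble_predict(sub_models, X):
        # Majority vote across the fitted sub-models; with three voters
        # and three classes a three-way tie falls to the first vote.
        preds = np.array([m.predict(X) for m in sub_models], dtype=object)
        return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])

    # e.g., combination 1: two-classifier hierarchical KNN and MLP
    # variants plus a Random Forest, all fitted on the same training data.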
TABLE III. ACCURACIES FROM COMPARATIVE MODELS USING RECENT MATCH FEATURES

Accuracy (%), 3-class classification (W/D/L); each cell is Test Set / Final Season:

Classifier   Five seasons      Three seasons     Two seasons
MLP          45.279 / 41.326   46.991 / 41.167   49.200 / 40.053
SVM          44.744 / 41.910   46.549 / 40.106   48.000 / 40.796
GNB          46.299 / 40.477   47.079 / 40.000   49.333 / 39.629
KNN          46.031 / 40.584   48.142 / 39.576   51.333 / 40.424
RF           39.647 / 40.424   42.832 / 41.379   44.267 / 39.522
GB           45.762 / 41.804   46.195 / 41.804   47.333 / 40.637

Accuracy (%), 2-class classification (W/L); each cell is Test Set / Final Season:

Classifier   Five seasons      Three seasons     Two seasons
MLP          63.473 / 57.565   65.258 / 56.974   65.345 / 55.720
SVM          61.243 / 57.417   62.675 / 56.531   64.828 / 55.277
GNB          61.890 / 55.277   62.912 / 56.753   65.172 / 57.196
KNN          63.186 / 55.498   64.786 / 55.867   66.896 / 56.309
RF           58.792 / 55.351   61.386 / 58.303   62.241 / 53.432
GB           61.961 / 58.007   63.028 / 57.491   62.931 / 55.350
TABLE IV. ACCURACIES FROM COMPARATIVE MODELS USING RECENT MATCH FEATURES AND CURRENT MATCH FEATURES
TABLE VII. ACCURACIES OF ENSEMBLE MODEL

Accuracy (%), 3-class classification (W/D/L); each cell is Test Set / Final Season:

Combination   Five seasons      Three seasons     Two seasons
1             51.609 / 44.191   54.248 / 43.873   56.800 / 43.714
2             51.074 / 43.767   54.248 / 43.289   54.533 / 42.812
3             52.790 / 44.297   54.867 / 44.191   56.133 / 43.926