
Predicting Football Match Result Using Fusion-based Classification Models


Chananyu Pipatchatchawal and Suphakant Phimoltares
2021 18th International Joint Conference on Computer Science and Software Engineering (JCSSE) | 978-1-6654-3831-5/21/$31.00 ©2021 IEEE | DOI: 10.1109/JCSSE53117.2021.9493837

Advanced Virtual and Intelligent Computing (AVIC) Research Center


Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University
Pathumwan, Bangkok, Thailand
Email: [email protected], [email protected]

Abstract— In recent decades, many researchers have attempted to predict football match outcomes. To forecast future match results, most papers relied on in-game match statistics, such as the number of shots on target, yellow cards, and red cards. In this paper, a fusion-based classification model was constructed for future matches without using any in-game statistics. Instead, the model used video games' ratings of players and teams to assist in prediction. Two types of fusion-based models, a hierarchical model and an ensemble model, are proposed. In the experiments, the proposed models were compared with several simple classification models in terms of accuracy, using a dataset of English Premier League (EPL) seasons 2010/2011 to 2014/2015. Additionally, each model was also tested on the whole 2015/2016 EPL season, as that season contains several unexpected results. The two proposed models yielded accuracy rates of 56.5332% and 56.8002%, higher than those of the other models.

Keywords— football match result prediction; fusion-based classification; hierarchical model; ensemble model

I. INTRODUCTION

From the nineteenth century to the present, football has been one of, if not the, most popular sports in the world by number of spectators and participants. In fact, football had a total of four billion viewers around the world in 2020 [1]. This increases the value of broadcasting football matches, which in turn leads to higher prize pools in each competition. Winning as many matches as possible is therefore important, both for the fans and for the finances of a football club. At a higher level, winning a football competition can also benefit a country: winning a high-level competition such as the World Cup can bring a nation both reputation and economic advantages [2]. The winning nation receives a large share of the prize pool as well as growth in international tourism. Hence, winning as many games as possible benefits both clubs and countries.

Many researchers and data scientists have put a great deal of effort into predicting football matches beforehand. Such predictions are also desired, or even funded, by football teams, as they may enable different tactics and starting players for each individual opponent. Prior works use different models and methods, producing results that still leave room for improvement. The methodologies range from simple logistic regression to complicated neural networks. One important remark is that more recent papers tend to use numeric data gathered from FIFA, a renowned football video game, as input features to assist in predicting outcomes.

II. RELATED WORK

Several studies have addressed the task of predicting football match results. They can be categorized into various groups based on their constraints and objectives. For instance, one model might aim to forecast future matches, while another might aim to correctly label finished matches based on their statistics. The following papers were studied for conducting and improving this research.

In the research of Prasetio and Harlili, the main reference for this paper, the authors tried to predict the English Premier League (EPL) season 2015/2016 using logistic regression [3]. They used data from 2010 to 2014 of the same competition to train the model. The model was, however, built only for predicting which team will win, without considering the possible 'draw' outcome. Only four features were used: attacking and defending ratings for the home team and the away team, gathered from FIFA video games. They claimed to reach 69.5% accuracy with their best model.

Igiri and Nwachukwu also used logistic regression, with the addition of an Artificial Neural Network, in their work [4]. They used EPL season 2014/2015 as the whole dataset, and the research produced an astounding performance of 95% accuracy. However, it is important to note that they used post-match features, such as shots, corners, fouls, and even betting odds, to predict the very match those features came from. Hence, their work attempted to classify which team should have won based on how events happened during the match, rather than predicting outcomes of future matches.

Snyder did slightly different research by considering both outcome prediction and betting strategy [5]. Multiple features were used, ranging from non-football factors, such as stadium capacity and distance traveled by the away team, to statistical data on players and their manager. A logistic regression model was trained on EPL season 2010/2011 data to predict the next season's results, reaching 51.06% accuracy. It was also noted that the two previous matches and player evaluation are the most important features.

Pugsee and Pattawong compared a random forest classifier and a multilayer perceptron model to predict results of EPL season 2017/2018 (220 matches) [6]. They used the three prior seasons of the same competition (1140 matches) for training. Instead of predicting three outcomes directly, they developed three classification models: one for 'home win' prediction, one for 'draw' prediction, and one for 'away win' prediction. The random forest classifier outperformed in all three models. The accuracy rates were around


Authorized licensed use limited to: PES University Bengaluru. Downloaded on November 30,2024 at 14:52:06 UTC from IEEE Xplore. Restrictions apply.
79-81%, with precision of 60-80% and recall rates of 40-88%.

Alfredo and Isa experimented with predicting football matches using multiple tree-based algorithms, including C5.0, Random Forest, and Extreme Gradient Boosting [7]. They used ten seasons of EPL matches, from 2007/2008 to 2016/2017, with a total of 14 independent features, and applied 10-fold cross-validation in the training process. The accuracies of the three models were 64.87%, 68.55%, and 67.89%, respectively. However, they still used in-game match statistics as input features, so the work contributes little to future match prediction.

Kumar performed a thorough analysis of football match prediction [8]. They worked purely on actual match statistical data, excluding video games' ratings. First, they used match statistics, such as the number of successful passes, the number of red cards given, and even the number of goals, to predict each player's rating. Secondly, they predicted post-match results using all statistical data occurring in the match. Finally, they combined the two models to predict upcoming match results. They used several algorithms and models, including Sequential Minimal Optimization (SMO), Support Vector Machine (SVM), Bagging with Functional Trees, and AdaBoost with Functional Trees. The best result was obtained from SMO with the past seven matches of 27 input features each, reaching up to 53.3875% accuracy for predicting three classes: home win, home loss, and draw.

According to the mentioned studies, there are several limitations in different areas. First, higher-accuracy models tended to result from using in-game features, which are not available when predicting future matches. Secondly, several studies focused only on 'win' and 'loss' results, leaving out the 'draw' possibility. Although this may yield higher accuracy and better model performance, it is neither reasonable nor applicable for predicting actual match results. This paper aims to improve performance in predicting football matches while keeping the task realistic, with three possible classification outputs.

III. RESEARCH METHODOLOGY

A. Data Collection

In this research, the data set originated from one of Kaggle's datasets, the European Soccer Database [9]. From this large data set, six seasons of EPL matches were selected, from season 2010/2011 to 2015/2016. Each season contains 380 matches from a total of 20 football teams. The original dataset contains 115 match features, 42 player features, and 25 team features, ranging from in-game match statistics, such as the number of shots on target, to pre-match features such as betting odds. Moreover, player and team ratings obtained from FIFA video games were also used in this paper. These data were further processed to suit our experimental purpose.

B. Preprocessing

All features obtained from the mentioned data set can be categorized into two main types: current match features and recent match history features.

First, current match features will be used for predicting match results. These include the match number, individual player ratings, and whole-team ratings for each side of all matches. In terms of player ratings, six features are used for all players, and five additional features are included for each goalkeeper. In terms of team features, nine features are used to represent each team. The full set of current match features is detailed in Table I.

Secondly, recent match features are calculated to help with match prediction. They can be divided into three groups: the three most recent results of the home team against any other team, the three most recent results of the away team against any other team, and the most recent result of the home team against the away team. Within each group, the numbers of wins, draws, losses, goals scored, and goals conceded were averaged. At most three games were considered for the first two groups, and one game for the last. All gathered training features were normalized using min-max normalization, so that all features lie between 0 and 1. The recent match features are summarized in Table II.

Note that we tested the impact of the recent match features, which were believed to be beneficial for football match prediction. The experiments are discussed in the next section.

TABLE I. CURRENT MATCH FEATURES

Type      Quantity    Parameter name               Data type
Player    22 players  overall_rating               float
features  22 players  potential                    float
          22 players  sprint_speed                 float
          22 players  reactions                    float
          22 players  strength                     float
          22 players  jumping                      float
          2 players   gk_diving                    float
          2 players   gk_handling                  float
          2 players   gk_kicking                   float
          2 players   gk_positioning               float
          2 players   gk_reflexes                  float
Team      2 teams     is_home_side                 int (0/1)
features  2 teams     build_up_play_speed          float
          2 teams     chance_creation_passing      float
          2 teams     chance_creation_crossing     float
          2 teams     chance_creation_shooting     float
          2 teams     defence_pressure             float
          2 teams     defence_aggression           float
          2 teams     defence_team_width           float
          2 teams     defence_defenderline_class   int (0/1)

TABLE II. RECENT MATCH FEATURES

Match type        Number of matches  Parameter names
Home vs Any team  3                  Average numbers of wins, draws, losses,
                                     goals scored, goals conceded
Away vs Any team  3                  Average numbers of wins, draws, losses,
                                     goals scored, goals conceded
Home vs Away      1                  Numbers of wins, draws, losses,
                                     goals scored, goals conceded
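The min-max normalization used in this preprocessing step can be sketched as follows. This is an illustrative helper, not code from the paper, and the function name `min_max_normalize` is an assumption; how the paper handles a constant column is not stated, so mapping it to zeros is a choice made here.

```python
def min_max_normalize(columns):
    """Rescale each feature column to the [0, 1] range, as the paper's
    preprocessing does before training.

    `columns` is a list of feature columns (each a list of numbers).
    A constant column is mapped to all zeros to avoid dividing by zero.
    """
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            normalized.append([0.0 for _ in col])
        else:
            normalized.append([(v - lo) / (hi - lo) for v in col])
    return normalized
```

For example, a column of values 2.0, 4.0, 6.0 is mapped to 0.0, 0.5, 1.0, so every feature contributes on a comparable scale regardless of its original units.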

C. Data Partition

In terms of data partitioning, EPL season 2015/2016 was separated from the other seasons. This particular season, called the final season, was used for the final test of the model, to see how the model behaves under many unpredictable matches. The data from the other seasons were split using five-fold stratified cross-validation, producing five folds. Each fold contained 20% of the data and was used for testing, whereas the remaining 80%, obtained by combining the other folds, was used for training the model. The stratified version was applied for consistency in measuring accuracy, keeping equal proportions of wins, draws, and losses across all five folds.

Our proposed models and the other comparative models were tested using all five seasons, the three latest seasons, and the two latest seasons from season 2010/2011 to season 2014/2015. This setup helps determine how many seasons are required to achieve the optimal model.

D. Classification Models

In this research, six renowned classification models were selected for comparative purposes. These models were used to obtain baseline accuracies, as well as to serve as components of the proposed models. The included models were as follows.

1) Multi-Layer Perceptron (MLP) is a feedforward neural network learning algorithm. As the name suggests, an MLP consists of at least three layers. It can have one or more non-linear layers, called hidden layers, between the input layer and the output layer. Since it consists of multiple layers and can use non-linear activation functions, it can handle data that are not linearly separable.

2) Support Vector Machine (SVM) uses the idea of a hyperplane to classify data, possibly in another dimensional space. The model finds the decision boundary that best separates the input data according to their classes, producing the largest margin between the classes. Dense data are usually mapped to a higher-dimensional space by a kernel function to find that boundary.

3) Gaussian Naive Bayes (GNB) is a classification model based on the probability concept of Bayes' theorem. The Gaussian function reflects the assumption that the input data are drawn from a Gaussian, or normal, distribution.

4) K-Nearest Neighbors (KNN) classifies each data point based on its K nearest data points, using a majority vote among those neighbors to reach the final classification result.

5) Random Forest (RF) is a combination of numerous decision trees. Each decision tree produces a predicted class, and the result is the mode of the predicted classes across all trees, i.e., the class with the most votes.

6) Gradient Boosting (GB) is similar to RF in that both consist of weak prediction models. In fact, if decision trees are used as sub-models, it can be called a Gradient Boosting Tree (GBT). The difference is that sub-models in gradient boosting are built sequentially rather than combined only at the end. In other words, the residuals, or wrongly predicted samples, of each iteration are used to improve the next iteration of the whole model.

IV. PROPOSED METHODOLOGY

To enhance prediction performance, the concept of fusion of classifiers was proposed and implemented. The overall idea is to combine two or more classifiers to accomplish a complicated task, in this research the prediction of football match results. Two fusion concepts, a hierarchical model and an ensemble model, were studied in this paper, as described in the following subsections.

A. Hierarchical Model

In a hierarchical model, instead of predicting the class with one simple model, multiple models are constructed and connected in a hierarchical fashion. These individual models can be trained on similar or different data sets and perform different classification tasks, but they are combined at the end to achieve the main objective. Two hierarchical models were designed and introduced in this paper: a hierarchical model based on three classifiers, and a hierarchical model based on two classifiers.

The first hierarchical model is composed of three classifiers, A, B, and C, as illustrated in Fig. 1. These classifiers predict whether the match result is win/not win, lose/not lose, and win/lose, respectively. Classifiers A and B were trained on all available data, while 'draw' data points were excluded from classifier C's training set. The architecture was designed under the assumption that a model should be better at predicting matches that tend to be won or lost if it is trained on matches without the draw result. Thus, if classifiers A and B predicted both win and lose for the same match, that match was passed to the specialized classifier C to make the final decision.

The second hierarchical model contains two classifiers, A and B. Classifier A was trained on all processed data points to predict whether each match is a draw or not. Classifier B, on the other hand, was trained on the data set without the draw outcome. In this hierarchical model, a new data point is first submitted to classifier A. If classifier A predicts a draw, the process terminates with draw as the final prediction. Otherwise, the data point is passed to classifier B to obtain the final classification result, as shown in Fig. 2. The hypothesis behind this model is that splitting the 3-class classification task into simpler models, draw or not draw and then win or lose, should benefit football match prediction.

B. Ensemble Model

The ensemble model takes the concept of a voting system into consideration. In this scheme, multiple models are trained and make their predictions independently. The final prediction is based on the majority vote among all models, and each sub-model algorithm differs from the others.

This paper's proposed ensemble model consists of three sub-models, as shown in Fig. 3. Each sub-model has equal weight in the final result. Each ensemble

model's combination of classifiers differed from the others in terms of the sub-model algorithms composing the whole model. We tried a total of three different sub-model combinations. The selection of each sub-model was based on the performance of the comparative models and the hierarchical models, and is discussed further in the next section.

Fig. 1. Architecture of the hierarchical model based on three classifiers.

Fig. 2. Architecture of the hierarchical model based on two classifiers.

Fig. 3. Architecture of the ensemble model.

V. EXPERIMENTS

Our experiments were conducted in two scenarios. The first scenario was designed to test each comparative model with respect to the feature types, the number of seasons, and the number of match result classes used for training. The results of the first scenario informed the experimental setup of the second scenario, in which the fusion-based models were tested.

A. Scenario 1

The original data were processed by the procedure described above. Three sets of data, covering five seasons, three seasons, and two seasons, were used in this scenario. For each data set, there were two types of features for training models: the first contained only recent match results, while the second combined recent match results with current match features. Both feature types were fed to each of the mentioned models, yielding each comparative model's performance on all data sets.

The results of all comparative models with only recent match features, and with all processed features, are shown in Table III and Table IV. On average, the accuracies using all features were greater than those using only recent match features. For three-class classification, using fewer seasons tended to give slightly higher accuracy on the test set, while using three or five seasons was more optimal for predicting the final-season results. The two-class classification showed a similar trend, but with a larger difference between the test set and the final season. Based on these results, our proposed models were trained with all features and constructed for three-class classification only, but still with various numbers of seasons in the training phase.

B. Scenario 2

Three fusion-based models, two hierarchical models and one ensemble model, were tested in this scenario. The first hierarchical model was constructed using three classifiers, as shown in Fig. 1. All classification models from the comparative experiment were used in this model, and all three classifiers used the same classification algorithm so that we could compare the performance of each algorithm.

Instead of using three classifiers, the second hierarchical model yields its classification results with only two classifiers, as illustrated in Fig. 2. The accuracies of these models were evaluated for comparison with the other existing models.

As shown in Table V, the hierarchical model based on three classifiers achieved its highest test-set accuracies with the GNB classification method. The accuracies on the two-season, three-season, and five-season data sets were 52.267%, 51.947%, and 49.357%, respectively. This proposed model showed no significant improvement over the comparative models.

In Table VI, accuracies from the second hierarchical model increased noticeably, especially on the test sets. All algorithms except GNB passed 50% accuracy on the two-season and three-season test sets. For the final season, the overall performance was also relatively better compared to the first hierarchical and comparative models. The peak accuracies came from using KNN classifiers, with 56.533% testing accuracy on the two-season data set and 44.24% accuracy on the final-season data set.

From these prior results, three combinations were selected for the ensemble model experiments. In the first combination, the first sub-model was the hierarchical model based on two KNN classifiers, the second sub-model was the same hierarchical model but with MLP classifiers, and the third sub-model was RF. The second combination used the same sub-model architecture as the first combination, but with RF, SVM, and GNB as the sub-model classifiers, respectively. The third combination used the two-classifier hierarchical model for all sub-models, with RF, KNN, and SVM as the sub-model classifiers.

Table VII shows the accuracies of all proposed ensemble combinations. The peak performance came from the first combination, at 56.800% accuracy on the test set, and at 43.714% accuracy on the final-season data set.
TABLE III. ACCURACIES FROM COMPARATIVE MODELS USING RECENT MATCH FEATURES

Accuracy (%); "Test" = cross-validation test set, "Final" = final (2015/2016) season.

3-class classification (W/D/L):
Classifier   Five seasons      Three seasons     Two seasons
             Test     Final    Test     Final    Test     Final
MLP          45.279   41.326   46.991   41.167   49.2     40.053
SVM          44.744   41.910   46.549   40.106   48.0     40.796
GNB          46.299   40.477   47.079   40.0     49.333   39.629
KNN          46.031   40.584   48.142   39.576   51.333   40.424
RF           39.647   40.424   42.832   41.379   44.267   39.522
GB           45.762   41.804   46.195   41.804   47.333   40.637

2-class classification (W/L):
Classifier   Five seasons      Three seasons     Two seasons
             Test     Final    Test     Final    Test     Final
MLP          63.473   57.565   65.258   56.974   65.345   55.720
SVM          61.243   57.417   62.675   56.531   64.828   55.277
GNB          61.890   55.277   62.912   56.753   65.172   57.196
KNN          63.186   55.498   64.786   55.867   66.896   56.309
RF           58.792   55.351   61.386   58.303   62.241   53.432
GB           61.961   58.007   63.028   57.491   62.931   55.350
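The "Test" columns in these tables come from the five-fold stratified split described in Section III-C. A dependency-free sketch of such a split follows; the function name and the round-robin fold assignment are illustrative choices, not the paper's exact procedure.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Partition sample indices into k folds whose class proportions
    mirror the full label list, as in five-fold stratified
    cross-validation over W/D/L match results."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)              # random order within each class
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)      # deal indices out round-robin
    return folds
```

Each fold then serves once as the 20% test split while the remaining four folds form the 80% training split, keeping wins, draws, and losses equally represented in every fold.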

TABLE IV. ACCURACIES FROM COMPARATIVE MODELS USING RECENT MATCH FEATURES AND CURRENT MATCH FEATURES

Accuracy (%); "Test" = cross-validation test set, "Final" = final (2015/2016) season.

3-class classification (W/D/L):
Classifier   Five seasons      Three seasons     Two seasons
             Test     Final    Test     Final    Test     Final
MLP          48.981   43.767   48.850   42.918   46.8     41.326
SVM          47.801   41.751   50.796   42.600   50.8     42.228
GNB          45.117   38.355   46.106   39.841   47.333   39.576
KNN          51.985   44.032   54.336   43.873   55.733   43.873
RF           49.303   43.077   51.593   42.599   52.8     42.759
GB           48.821   43.820   52.212   42.865   51.467   42.546

2-class classification (W/L):
Classifier   Five seasons      Three seasons     Two seasons
             Test     Final    Test     Final    Test     Final
MLP          68.014   60.147   69.480   58.229   64.310   51.513
SVM          67.869   60.590   69.130   60.148   66.724   59.114
GNB          67.653   60.443   70.659   60.886   69.828   60.369
KNN          70.679   60.590   72.651   61.033   73.103   61.697
RF           69.526   61.550   70.071   60.664   68.276   60.590
GB           68.446   61.255   69.957   61.476   68.448   60.886

TABLE V. ACCURACIES FROM HIERARCHICAL MODELS BASED ON THREE CLASSIFIERS

3-class classification (W/D/L), accuracy (%); "Test" = cross-validation test set, "Final" = final (2015/2016) season.

Classifier   Five seasons      Three seasons     Two seasons
             Test     Final    Test     Final    Test     Final
MLP          44.261   38.196   47.522   39.523   47.600   39.416
SVM          43.830   38.515   46.195   38.780   46.0     37.719
GNB          49.357   43.130   51.947   44.350   52.267   42.600
KNN          41.632   38.143   44.956   38.833   46.933   38.409
RF           45.226   38.568   46.195   39.682   45.600   38.621
GB           43.619   40.212   45.929   39.257   45.067   39.682
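The model behind Table V (Fig. 1) reduces to a small decision rule over its three classifiers. The sketch below is an interpretation of Section IV-A: the paper states only that a win-vs-lose conflict between classifiers A and B is sent to classifier C, so reading "not win and not lose" as a draw is an assumption here, as is the function name.

```python
def route_three_classifiers(a_says_win, b_says_lose, classifier_c):
    """Combine the three classifiers of the first hierarchical model:
    A predicts win/not-win, B predicts lose/not-lose, and C (trained
    without draws) settles a win-vs-lose conflict.
    `classifier_c` is a callable returning 'W' or 'L'."""
    if a_says_win and b_says_lose:
        return classifier_c()   # conflicting predictions: defer to specialist C
    if a_says_win:
        return 'W'
    if b_says_lose:
        return 'L'
    return 'D'                  # neither win nor lose predicted (assumed draw)
```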

TABLE VI. ACCURACIES FROM HIERARCHICAL MODELS BASED ON TWO CLASSIFIERS

3-class classification (W/D/L), accuracy (%); "Test" = cross-validation test set, "Final" = final (2015/2016) season.

Classifier   Five seasons      Three seasons     Two seasons
             Test     Final    Test     Final    Test     Final
MLP          52.092   44.244   54.513   43.926   54.800   43.501
SVM          49.196   43.395   51.947   43.661   51.467   42.865
GNB          43.830   37.878   44.690   39.416   45.467   39.788
KNN          52.200   43.660   54.690   44.032   56.533   44.244
RF           49.197   44.138   53.186   43.077   52.267   43.342
GB           49.197   44.138   50.442   42.918   52.267   43.342
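Table VI's best setting uses the two-classifier hierarchy of Fig. 2: classifier A filters draws, then classifier B, trained only on decisive matches, picks win or lose. A minimal sketch of that routing, assuming sub-classifiers with scikit-learn-style fit/predict methods; the class name is hypothetical and not from the paper.

```python
class DrawThenWinLoseModel:
    """Two-stage hierarchy of Fig. 2: stage A answers draw/not-draw on
    all matches; stage B answers win/lose, trained only on non-draws."""

    def __init__(self, clf_a, clf_b):
        self.clf_a = clf_a  # any object with fit(X, y) and predict(X)
        self.clf_b = clf_b

    def fit(self, X, y):
        # Stage A sees every match, relabelled draw ('D') vs. not-draw ('ND').
        self.clf_a.fit(X, ['D' if label == 'D' else 'ND' for label in y])
        # Stage B sees only decisive matches with their original W/L labels.
        decisive = [(x, label) for x, label in zip(X, y) if label != 'D']
        self.clf_b.fit([x for x, _ in decisive], [label for _, label in decisive])
        return self

    def predict(self, X):
        stage_a = self.clf_a.predict(X)
        return ['D' if a == 'D' else self.clf_b.predict([x])[0]
                for x, a in zip(X, stage_a)]
```

Any of the six baseline algorithms from Section III-D could be plugged in as `clf_a` and `clf_b`; the paper's best results use KNN for both stages.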

TABLE VII. ACCURACIES OF ENSEMBLE MODEL

3-class classification (W/D/L), accuracy (%); "Test" = cross-validation test set, "Final" = final (2015/2016) season.

Combination   Five seasons      Three seasons     Two seasons
              Test     Final    Test     Final    Test     Final
1             51.609   44.191   54.248   43.873   56.800   43.714
2             51.074   43.767   54.248   43.289   54.533   42.812
3             52.790   44.297   54.867   44.191   56.133   43.926
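The combinations in Table VII are fused by the equal-weight majority vote of Section IV-B. A sketch of that vote follows; the paper does not say how a three-way tie among the three sub-models is broken, so deferring to the first sub-model is an assumption made here.

```python
from collections import Counter

def majority_vote(per_model_predictions):
    """Fuse predictions from equally weighted sub-models.

    `per_model_predictions` is a list of prediction lists, one per
    sub-model, aligned by match. A tie (possible with three models and
    three classes) falls back to the first sub-model's prediction."""
    fused = []
    for votes in zip(*per_model_predictions):
        label, count = Counter(votes).most_common(1)[0]
        fused.append(label if count > 1 else votes[0])
    return fused
```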

VI. CONCLUSIONS

From all experiments conducted in this paper, on average, using current match features, including player and team ratings, had a positive impact on predicting football match results. For three-class classification without current match features, accuracy ranged between 39.647% and 51.333% on the test set, and between 39.523% and 41.910% on the final season. In contrast, with the inclusion of current match features, accuracy improved to between 45.117% and 55.733% on the test set, and between 38.355% and 44.032% on the final season.

Among the proposed models, while the first hierarchical model did not show much improvement, the second model produced better accuracy on average across all classifiers, reaching a highest accuracy of 56.533% on the test set and 44.244% on the final season. The best performance was obtained from the first combination of the ensemble model, at 56.800% accuracy on the test set, and from the third combination, at 43.714% accuracy on the final-season data set.

In summary, two out of the three models proposed in this paper showed better performance, measured by accuracy, than the existing models mentioned here. Predictions were more reliable when using both current match features and recent match results of each competing team. Additionally, the three-season and two-season data sets tended to outperform all five seasons. This might be because most football teams cannot keep their performance, whether good or bad, consistent over several years.

Even though the proposed models improved football match prediction, they still do not achieve high accuracy. This might be due to the feature limitation in this experiment, in which only pre-match attributes were accepted. Hence, the proposed models outperformed related models with similar settings, but did not reach the higher accuracy obtained by models that used all available attributes.

Furthermore, as a future improvement, feature analysis and selection could be included to enhance the performance of football match prediction models. Reducing the number of training features while keeping or increasing prediction accuracy would also speed up the training process.

It is also worth mentioning that only a few prediction models are published nowadays. This might be because most of the work done for football clubs is kept secret as a team asset [10], to prevent the benefits of the models and team selection from leaking to competing teams.

REFERENCES

[1] J. Shvili, "The Most Popular Sports In The World," WorldAtlas, 2021. [Online]. Available: https://www.worldatlas.com/articles/what-are-the-most-popular-sports-in-the-world.html
[2] J. Staughton, "What Benefits Does A Country Get After Winning The World Cup," ScienceABC, 2020. [Online]. Available: https://www.scienceabc.com/sports/what-benefits-does-a-country-get-after-winning-the-world-cup.html
[3] D. Prasetio and D. Harlili, "Predicting football match results with logistic regression," in Proceedings of the 2016 International Conference on Advanced Informatics: Concepts, Theory and Application (ICAICTA), Penang, Malaysia, 2016, pp. 1-5. doi: 10.1109/ICAICTA.2016.7803111.
[4] I. Chinwe Peace, "An Improved Prediction System for Football a Match Result," IOSR Journal of Engineering, vol. 04, no. 12, pp. 12-20, 2014. doi: 10.9790/3021-04124012020.
[5] J. Snyder, "What Actually Wins Soccer Matches: Prediction of the 2011-2012 Premier League for Fun and Profit," 2013.
[6] P. Pugsee and P. Pattawong, "Football Match Result Prediction Using the Random Forest Classifier," in Proceedings of the 2nd International Conference on Big Data Technologies (ICBDT 2019), New York, NY, USA, 2019, pp. 154-158. doi: 10.1145/3358528.3358593.
[7] Y. Alfredo and S. Isa, "Football Match Prediction with Tree Based Model Classification," International Journal of Intelligent Systems and Applications, vol. 11, no. 7, pp. 20-28, 2019. doi: 10.5815/ijisa.2019.07.03.
[8] G. Kumar, "Machine Learning for Soccer Analytics," 2013. doi: 10.13140/RG.2.1.4628.3761.
[9] H. Mathien, "European Soccer Database," Kaggle, 23-Oct-2016. [Online]. Available: https://www.kaggle.com/hugomathien/soccer
[10] J. Harper, "Data experts are becoming football's best signings," BBC News, 2021. [Online]. Available: https://www.bbc.com/news/business-56164159

