79-81%, with precision of 60-80% and recall rates of 40-88%.

Alfredo and Isa conducted an experiment on predicting football matches using multiple tree-based algorithms, including C5.0, Random Forest, and Extreme Gradient Boosting [7]. They used ten seasons of EPL matches, from 2007/2008 to 2016/2017, with a total of 14 independent features, and applied 10-fold cross-validation in the training process. The accuracies of the three models were 64.87%, 68.55%, and 67.89%, respectively. However, they still used in-game match statistics as input features, which contributes little to predicting future matches.

Kumar performed a thorough analysis of football match prediction [8]. They worked purely on actual match statistics, excluding video-game ratings. First, they used match statistics, such as the number of successful passes, the number of red cards given, and even the number of goals, to predict each player's rating. Secondly, they predicted post-match results using all statistical data occurring in the match. Finally, they combined the two models to predict upcoming match results. They used several algorithms and models, including Sequential Minimal Optimization (SMO), Support Vector Machine (SVM), Bagging with Functional Trees, and AdaBoost with Functional Trees. The best result was obtained from SMO with the past seven matches of 27 input features each, reaching 53.3875% accuracy for predicting three classes: home win, home loss, and draw.

According to the mentioned studies, there are several limitations in different areas. First, the models with higher accuracy tended to rely on in-game features, which are not available before a match and therefore contribute little to future match prediction.

For the current match features, player ratings, goalkeeper ratings, and whole team ratings were gathered for each side of all matches. In terms of player ratings, six features are used for every player. Additionally, five features are included for each goalkeeper. In terms of team features, nine features are used to represent each team. The current match features are detailed in Table I.

Secondly, recent match features are calculated to help with match prediction. They can be divided into three groups: the three most recent results of the home team against any other team, the three most recent results of the away team against any other team, and the most recent result of the home team against the away team. Within each group, the numbers of wins, draws, and losses, together with goals scored and goals conceded, were averaged. At most three games were considered for the first two groups, and one game for the last group. All gathered training features were normalized using min-max normalization, so that all feature values lie between 0 and 1. The recent match features are summarized in Table II.

Note that we tested the impact of the recent match features, which are believed to be beneficial for football match prediction. The experiments are discussed in the next section.

TABLE I. CURRENT MATCH FEATURES

Feature group     Quantity     Parameter name   Data type
Player Features   22 players   overall_rating   float
Player Features   22 players   potential        float
Player Features   22 players   sprint_speed     float
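As an illustration of this feature construction, the sketch below computes the recent-form averages and applies min-max normalization. The match-record format and the helper names (recent_form, min_max_normalize) are our own assumptions, not taken from the paper.

    import numpy as np

    def recent_form(matches, team, n=3):
        # Hypothetical match-record format: each match is a dict with
        # keys "home", "away", "hg" (home goals), "ag" (away goals).
        past = [m for m in matches if team in (m["home"], m["away"])][-n:]
        wins = draws = losses = scored = conceded = 0
        for m in past:
            gs, gc = (m["hg"], m["ag"]) if m["home"] == team else (m["ag"], m["hg"])
            wins += gs > gc
            draws += gs == gc
            losses += gs < gc
            scored += gs
            conceded += gc
        k = max(len(past), 1)  # average over at most n recent games
        return [wins / k, draws / k, losses / k, scored / k, conceded / k]

    def min_max_normalize(X):
        # Rescale every feature column to lie in [0, 1].
        X = np.asarray(X, dtype=float)
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / np.where(hi > lo, hi - lo, 1.0)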
C. Data Partition
In terms of data partitioning, EPL season 2015/2016 was separated from the other seasons. This particular season, called the final season, was reserved for the final test of the model, to see how the model behaves under many unpredictable matches. The data from the other seasons were split using five-fold stratified cross-validation, producing five data folds. Each fold contained 20% of the data and was used for testing, while the remaining 80%, obtained by combining the other folds, was used for training the model. The stratified version was applied for consistency in measuring accuracy, keeping the numbers of wins, draws, and losses balanced across all five data folds.

Our proposed models and the other comparative models were tested using all five seasons, the three latest seasons, and the two latest seasons, from season 2010/2011 to season 2014/2015. This setup is useful for determining how many seasons are required to achieve the optimal model.
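For illustration, this split can be reproduced with scikit-learn's StratifiedKFold; a minimal sketch, with placeholder arrays standing in for the processed features and labels:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # Placeholder data standing in for the processed match features/labels.
    X = np.random.rand(1000, 20)
    y = np.random.choice(["win", "draw", "loss"], size=1000)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        # Each fold preserves the win/draw/loss proportions; 80% of the
        # data trains the model and the held-out 20% tests it.
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]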
D. Classification Models
In this research, six renowned classification models were selected for comparative purposes. These models were used to obtain baseline accuracies, and they also contributed to the proposed models. The models were as follows.

1) Multi-Layer Perceptron (MLP) is a feedforward neural network learning algorithm. As the name suggests, an MLP consists of at least three layers. It can have one or more non-linear layers, called hidden layers, between the input layer and the output layer. Since it consists of multiple layers and can use non-linear activation functions, it can handle data that are not linearly separable.

2) Support Vector Machine (SVM) uses the idea of a hyperplane to classify data in another dimensional space. This model finds the best decision boundary that separates the input data according to their classes; the optimal boundary produces the largest margin between the data classes. Typically, dense data are mapped to a higher-dimensional space by a kernel function to find that boundary.

3) Gaussian Naive Bayes (GNB) is a classification model based on the probability concept of Bayes' theorem. The Gaussian function reflects the assumption that the input data are drawn from a Gaussian, or normal, distribution.

4) K-Nearest Neighbors (KNN) classifies each data point based on its K nearest data points. Basically, it uses the majority vote among the K nearest data points to reach the final classification result.

5) Random Forest (RF) is an ensemble of decision trees, each trained on a bootstrapped sample of the data, whose votes are aggregated for the final classification.

6) Gradient Boosting (GB) builds an ensemble of weak learners sequentially and combines their predictions at the end. In other words, residuals or wrong samples of each iteration are used to improve the next iteration of the whole model.
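A baseline comparison in this spirit can be sketched with scikit-learn, reusing the placeholder X and y from the previous sketch; with cv=5, cross_val_score stratifies the folds for classification targets:

    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

    models = {
        "MLP": MLPClassifier(max_iter=1000),
        "SVM": SVC(),
        "GNB": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "RF":  RandomForestClassifier(),
        "GB":  GradientBoostingClassifier(),
    }
    for name, model in models.items():
        # Five stratified folds, as in the data-partition step.
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")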
IV. PROPOSED METHODOLOGY
To enhance prediction performance, the concept of fusion of classifiers was proposed and implemented. The overall idea was to combine two or more classifiers to accomplish a complicated task, in this research the prediction of football match results. Two fusion concepts, i.e., the hierarchical model and the ensemble model, were studied in this paper, as described in the following subsections.

A. Hierarchical Model
In a hierarchical model, instead of predicting the class with one simple model, multiple models are constructed and connected in a hierarchical fashion. These individual models can be trained with similar or different data sets and perform different classification tasks, but are combined at the end to achieve the result according to the main objective. Two different hierarchical models were designed and introduced in this paper, called the hierarchical model based on three classifiers and the hierarchical model based on two classifiers.

The first hierarchical model was composed of three classifiers, A, B, and C, as illustrated in Fig. 1. These classifiers were used to predict whether the match result is win/not win, lose/not lose, and win/lose, respectively. Classifiers A and B were trained with all available data, while 'draw' data points were excluded from classifier C's training set. The architecture was constructed under the assumption that models should be better at predicting matches that tend to be won or lost when trained on matches without draw results. Thus, if classifiers A and B predicted win and lose for the same match, that particular match was passed to the specialized classifier C to make the final decision.
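The decision logic of this three-classifier design could be implemented as below. This is our own sketch, not the authors' code, and treating the case where A predicts not win and B predicts not lose as a draw is our assumption, since the text does not state it explicitly.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    class ThreeClassifierHierarchical:
        # A: win / not win, B: lose / not lose, C: win/lose tie-breaker.
        def __init__(self, make_clf=GaussianNB):
            self.a, self.b, self.c = make_clf(), make_clf(), make_clf()

        def fit(self, X, y):  # y entries in {"win", "draw", "lose"}
            X, y = np.asarray(X), np.asarray(y)
            self.a.fit(X, y == "win")    # trained on all data
            self.b.fit(X, y == "lose")   # trained on all data
            m = y != "draw"              # C never sees draw matches
            self.c.fit(X[m], y[m])
            return self

        def predict(self, X):
            X = np.asarray(X)
            win, lose = self.a.predict(X), self.b.predict(X)
            # Assumed fallback: neither win nor lose is read as a draw.
            out = np.where(win & ~lose, "win",
                           np.where(lose & ~win, "lose", "draw")).astype(object)
            conflict = win & lose        # A says win AND B says lose
            if conflict.any():
                out[conflict] = self.c.predict(X[conflict])
            return out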
The second hierarchical model contained two classifiers, A and B. Classifier A was trained with all processed data points to predict whether each match is a draw or not a draw, while classifier B focused on the data set without draw outcomes. In this hierarchical model, a new data point was first submitted to classifier A. If classifier A predicted the match result to be a draw, the process terminated with draw as the final prediction. Otherwise, the data point was passed to classifier B to obtain the final classification result, as shown in Fig. 2. The hypothesis behind this model is that splitting the 3-class classification task into simpler models, draw or not draw and then win or lose, should be beneficial for football match prediction.
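A corresponding sketch of the two-classifier model, again our own illustrative implementation rather than the authors' code, with KNN as the default base classifier (the algorithm that performs best for this model in Section V):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    class TwoClassifierHierarchical:
        # A: draw / not draw on all data; B: win / lose on non-draw data.
        def __init__(self, make_clf=KNeighborsClassifier):
            self.a, self.b = make_clf(), make_clf()

        def fit(self, X, y):  # y entries in {"win", "draw", "lose"}
            X, y = np.asarray(X), np.asarray(y)
            self.a.fit(X, y == "draw")
            m = y != "draw"              # B never sees draw matches
            self.b.fit(X[m], y[m])
            return self

        def predict(self, X):
            X = np.asarray(X)
            is_draw = self.a.predict(X)
            out = np.full(len(X), "draw", dtype=object)
            if (~is_draw).any():
                # Non-draw predictions fall through to classifier B.
                out[~is_draw] = self.b.predict(X[~is_draw])
            return out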
B. Ensemble Model
Each ensemble model's combination of classifiers was different from the others in terms of the sub-model algorithms composing the whole model. We tried a total of three different sub-model combinations. The selection of each sub-model was based on the performance of the comparative models and the hierarchical models, and is discussed further in the next section.

V. EXPERIMENTS
Our experiments were conducted in two scenarios. The first scenario was designed to test each comparative model based on the feature types, the number of seasons, and the number of match-result classes used for training. The results of the first scenario were taken into account for the experimental setup of the second scenario, in which the fusion-based models were tested.
A. Scenario 1
The original data were processed by the procedure described above. Thus, three data sets, covering five seasons, three seasons, and two seasons, were used in this scenario. For each data set, there were two types of features for training the models. The first type contained only recent match results, while the other combined recent match results with current match features. Both feature types were fed to each of the mentioned models, to yield each comparative model's performance on all data sets.
Fig. 1. Architecture of hierarchical model based on three classifiers.

The results of all comparative models using only recent match features, and using all processed features, are shown in Table III and Table IV, respectively. On average, the accuracies obtained with all features were greater than those obtained with only recent match features. For three-class classification, using fewer seasons tended to give slightly higher accuracy on the test set, while using three or five seasons was better for predicting the final-season results. The two-class classification showed a similar trend, but with a larger gap between the test set and the final season. Based on these results, our proposed models were trained with all features and constructed for three-class classification only, but still with various numbers of seasons in the training phase.
B. Scenario 2
Three fusion-based models, two hierarchical models and one ensemble model, were evaluated in this scenario. The first hierarchical model was constructed using three classifiers, as shown in Fig. 1. All of the classification models from the comparative experiment were applied to this model. All three classifiers used the same classification algorithm so that the performance of each classification algorithm could be compared.
Fig. 2. Architecture of hierarchical model based on two classifiers.
Instead of using three classifiers, the second hierarchical model produced its classification results with only two classifiers, as illustrated in Fig. 2. The accuracies of these models were evaluated for comparison with the other existing models.
As shown in Table V, the hierarchical model based on three classifiers achieved its highest test-set accuracies with the GNB classification method. The accuracies on the two-season, three-season, and five-season data sets were 52.267%, 51.947%, and 49.357%, respectively. However, this proposed model showed no significant improvement over the comparative models.
In Table VI, accuracies from the second hierarchical model increased noticeably, especially on the test sets. All algorithms except GNB exceeded 50% accuracy on the two-season and three-season test sets. For the final season, the overall performance was also relatively better compared with the first hierarchical model and the comparative models. The peak accuracies came from using KNN classifiers, with 56.533% testing
accuracy on the two-season data set and 44.24% accuracy on the final-season data set.

Fig. 3. Architecture of ensemble model.

From the prior results, three combinations for the ensemble model experiment were selected as follows. In the first combination, the first sub-model was the hierarchical model based on two KNN classifiers, while the second sub-model used the same hierarchical architecture but with MLP classifiers; the third sub-model was RF. The second combination used the same sub-model architecture as the first combination, but with RF, SVM, and GNB as the sub-model classifiers, respectively. The third combination used the two-classifier hierarchical model for all sub-models, with RF, KNN, and SVM as the sub-model classifiers.

Table VII shows the accuracies of all proposed ensemble combination models. The peak performance came from the first combination, at 56.800% accuracy for the test set and 43.714% accuracy for the final-season data set.
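Combining the three fitted sub-models can be sketched as follows; the majority-vote rule is our assumption, since the excerpt does not specify how the ensemble in Fig. 3 merges the sub-model outputs.

    import numpy as np
    from collections import Counter

    def ensemble_predict(sub_models, X):
        # Majority vote across the fitted sub-models; with three voters
        # and three classes a three-way tie falls to the first vote.
        preds = np.array([m.predict(X) for m in sub_models], dtype=object)
        return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])

    # e.g., combination 1: two-classifier hierarchical KNN and MLP
    # variants plus a Random Forest, all fitted on the same training data.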
TABLE III. ACCURACIES FROM COMPARATIVE MODELS USING RECENT MATCH FEATURES

Accuracy (%), 3-class classification (W/D/L); each cell is Test Set / Final Season:

Classifier   Five seasons      Three seasons     Two seasons
MLP          45.279 / 41.326   46.991 / 41.167   49.200 / 40.053
SVM          44.744 / 41.910   46.549 / 40.106   48.000 / 40.796
GNB          46.299 / 40.477   47.079 / 40.000   49.333 / 39.629
KNN          46.031 / 40.584   48.142 / 39.576   51.333 / 40.424
RF           39.647 / 40.424   42.832 / 41.379   44.267 / 39.522
GB           45.762 / 41.804   46.195 / 41.804   47.333 / 40.637

Accuracy (%), 2-class classification (W/L); each cell is Test Set / Final Season:

Classifier   Five seasons      Three seasons     Two seasons
MLP          63.473 / 57.565   65.258 / 56.974   65.345 / 55.720
SVM          61.243 / 57.417   62.675 / 56.531   64.828 / 55.277
GNB          61.890 / 55.277   62.912 / 56.753   65.172 / 57.196
KNN          63.186 / 55.498   64.786 / 55.867   66.896 / 56.309
RF           58.792 / 55.351   61.386 / 58.303   62.241 / 53.432
GB           61.961 / 58.007   63.028 / 57.491   62.931 / 55.350
TABLE IV. ACCURACIES FROM COMPARATIVE MODELS USING RECENT MATCH FEATURES AND CURRENT MATCH FEATURES
TABLE VII. ACCURACIES OF ENSEMBLE MODEL

Accuracy (%), 3-class classification (W/D/L); each cell is Test Set / Final Season:

Combination   Five seasons      Three seasons     Two seasons
1             51.609 / 44.191   54.248 / 43.873   56.800 / 43.714
2             51.074 / 43.767   54.248 / 43.289   54.533 / 42.812
3             52.790 / 44.297   54.867 / 44.191   56.133 / 43.926