Football Player Transfer Value Prediction Using Advanced Statistics and FIFA 22 Data

V B Jishnu, P V Hari Narayanan, Surya Aanand
Department of Computer Engineering
Govt. Model Engineering College
Kochi, India

Dr. Preetha Theresa Joy
Professor, Department of Computer Engineering
Govt. Model Engineering College
Kochi, India

2022 IEEE 19th India Council International Conference (INDICON) | 978-1-6654-7350-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/INDICON56171.2022.10040117

Abstract—In football, transfers are one of the most exciting events for fans. A transfer involves a player moving from one club to another, with the buying club paying a large sum of money to the selling club to acquire the player's services. Because these sums are so large, avoiding overpayment is crucial to a club's financial sustainability. In this study, we use FIFA 22 game data and information from football data websites such as FBref and Transfermarkt to accurately predict what a player is worth, using regression machine learning models. Besides player skills, we also consider important factors such as age and remaining contract length, which have a significant impact on transfer value. Gradient Boosting and eXtreme Gradient Boosting were found to be the best-performing algorithms. This work is beneficial for professional teams as well as football websites.

Index Terms—Football Analytics, Transfer Value, Machine Learning, Regression

I. INTRODUCTION

Football is one of the most popular sports in the world, with millions of followers. Most countries have their own football associations and leagues, and each league usually has around 10-24 teams, all striving for continuous improvement. The most direct route to improvement is player recruitment: by recruiting new and exciting players, a club can strengthen its roster and have a better chance of winning titles, which in turn improves its relationship with fans.

A player can either be a free agent or be contracted to another club (the parent club). A free agent can be signed directly, without any negotiation with the club he used to play for. For a contracted player, however, the buying club has to pay a compensation fee to acquire his or her services. This compensation is usually monetary (termed the transfer value) and can be extremely expensive, often running into millions of pounds. It is therefore of immense importance that the right amount is paid for the right player; otherwise, the long-term plan of the buying club suffers. Smart recruitment is therefore crucial.

Football has developed to the point where advanced statistics are readily available to fans, and these are often utilised by teams and analysts. These statistics can paint an extremely detailed picture of every aspect of a player: not just how well he plays but also the way in which he plays. This study aims to predict the transfer value of professional football players based on their ratings in the video game FIFA 22 by EA Sports, the above-mentioned statistics, and a few other factors such as years remaining on the contract. FIFA 22 gives insight into the skills of a player and is a good measuring stick for how good a player is in real life. The transfer values of players are maintained by the website Transfermarkt.com, and we used the data from Transfermarkt to train our model. Statistics are obtained from FBref, a website that provides a vast amount of data on player performance. Through our system, we aim to provide a statistical method for estimating transfer value.

II. LITERATURE REVIEW

Behravan and Razavi (2020) sought to address the drawbacks of the Transfermarkt market value by predicting market values with the help of the FIFA 20 dataset. Players in the dataset were divided into four clusters (Goalkeepers, Defenders, Midfielders, Attackers) using an automatic clustering method. Particle swarm optimization (PSO) was then used for automatic feature selection, and support vector regression (SVR) was used for regression within each cluster. The value of a player in the FIFA dataset was considered the true value.

Yiğit et al. (2020) attempted to predict the transfer value of football players from major leagues. 5316 players from 11 major leagues across Europe and South America were considered. Data from the Football Manager simulation game was collected and merged with the transfer values from Transfermarkt, and a logarithmic transformation was applied to obtain a better-behaved distribution. The KPMG valuation model was employed for training. Goalkeepers were excluded from the dataset because they have different features. A variety of regression models like


Authorized licensed use limited to: PES University Bengaluru. Downloaded on November 30,2024 at 14:41:05 UTC from IEEE Xplore. Restrictions apply.
cross-validation, ridge regression, lasso regression, principal component regression, and partial least squares were tried, and most of the resulting values were in accordance with current market values.

Singh and Lamba (2019) aimed to predict transfer value using FIFA 18 stats, Transfermarkt values, statistical data, fantasy league data, popularity measures, and personal information. The feature set was then reduced by selecting the dominant features. In general, players from clubs in the upper echelon were observed to have higher transfer values. Various algorithms, including Linear Regression, Ridge Regression, Decision Tree, Random Forest, and Gradient Boosting, were used.

Kirschstein and Liebscher (2018) analysed the relationship between the playing skills and the market values of players in the German 1. and 2. Bundesliga. The data was collected from fifaindex.com and compiled into a dataset of 493 players. Goalkeepers were excluded, and a total of 28 attributes were considered for each player. The market value of each player was extracted from Transfermarkt. Outliers were removed after standardization, and the PCA method was employed for training.

Müller et al. (2017) used a data-driven method to overcome the limitations of crowdsourcing for estimating the market value of football players. A total of 4217 players from the top five European leagues over a period of six years were considered for training, and the applicability of data analytics for estimating market value was measured. Goalkeepers were removed from the dataset because their performance is measured differently, and only players with more than 90 minutes on the pitch in a season were taken into consideration. Data from Transfermarkt was merged with data about each player's characteristics, performance, and popularity. The results obtained were in line with crowdsourced estimates.

Stanojevic and Gyarmati (2016) approached transfer value prediction statistically. Statistical data was obtained from the sports analytics company InStat and merged with Transfermarkt's market values. The Transfermarkt value was considered the true value, but with a noise factor. In addition to statistical features, the average transfer value of teammates was also considered. Several models were evaluated, and gradient boosting trees regression (GBT) performed best.

Yuan He (2015) attempted to identify the factors that affect the transfer value of a player. Data was collected from Transfermarkt and the Wikipedia page of each player. A total of 357 players were studied, with 17 attributes for each player. These attributes were divided into three groups: personal info, performance info, and ratios of predictors (e.g., goals to appearances). The ordinary least squares (OLS) method was employed, and apart from a couple of outliers, the attributes were all found to be significant for prediction. The final result showed the weights and priorities of each attribute and the correlation between the attributes and a player's transfer value.

III. DATA DESCRIPTION

We used data from the video game FIFA 22 by EA Sports to analyze the transfer value of players. FIFA 22 is a football simulation game developed by Electronic Arts and is the 29th installment in the series. EA Sports employs real-life scouts and data reviewers who analyze games and give their opinions on players. This data is collected and passed through an algorithm that produces the rating for each attribute of each player, which makes it a good measure of a player's performance. This game data is publicly available, and we obtained it for our system. There are over 30 attributes for each player, which can be categorized as physical (strength, speed, height metrics, etc.), technical (various passing, shooting, and defending metrics, etc.), and mental (aggression, vision, composure, awareness, etc.).

This data is combined with data from Transfermarkt, which was obtained by web scraping with Python scripts. Transfermarkt values are calculated on the basis of a list of factors including age, future prospects, international performance, current performance, and the quality of the league the player plays in. Along with this value, Transfermarkt also provides data on player age, contract expiry date, etc., which we considered important. This data is updated twice a season.

FBref is a website that tracks statistics of football players and teams from around the world. It was created by Sports Reference, a team of individuals who also created popular statistics websites such as Baseball Reference and Basketball Reference. The data we collected from FBref includes prominent statistics such as goals and assists, as well as detailed metrics such as xG, xA, shot-creating actions, and blocks made. A detailed list of the statistics used is given in the appendix.

The above graph shows that, in general, attackers and midfielders have higher transfer values than defenders. Goalkeepers have comparatively lower transfer values.
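The combination of FIFA 22 ratings with Transfermarkt values and FBref statistics described in this section could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each source is already loaded into a pandas DataFrame, that players can be matched on a shared, hypothetical `name` column (real data would need name normalization), and that Transfermarkt's contract expiry date is turned into a "Contract Remaining" feature in years relative to a hypothetical reference date.

```python
import pandas as pd


def merge_sources(fifa, transfermarkt, fbref, reference_date="2021-10-01"):
    """Join FIFA 22 ratings, Transfermarkt values and FBref stats on a
    hypothetical shared 'name' column and derive 'contract_remaining'."""
    merged = (
        fifa.merge(transfermarkt, on="name", how="inner")  # need a market value to train on
            .merge(fbref, on="name", how="left")           # FBref stats may be missing
    )
    expiry = pd.to_datetime(merged["contract_expiry"])
    # Years left on the contract, clipped at zero for already-expired deals.
    years_left = (expiry - pd.Timestamp(reference_date)).dt.days / 365.25
    merged["contract_remaining"] = years_left.clip(lower=0)
    return merged


# Toy example with made-up players and values (in millions of pounds).
fifa = pd.DataFrame({"name": ["Player A", "Player B", "Player C"],
                     "overall": [85, 78, 70]})
tm = pd.DataFrame({"name": ["Player A", "Player B"],
                   "value_m": [60.0, 15.0],
                   "contract_expiry": ["2024-06-30", "2022-06-30"]})
fbref = pd.DataFrame({"name": ["Player A"], "Gls": [20]})

df = merge_sources(fifa, tm, fbref)
print(df[["name", "value_m", "Gls", "contract_remaining"]])
```

An inner join on the Transfermarkt table drops players without a market value (Player C here), while a left join on FBref keeps players whose detailed statistics are missing, which would matter for goalkeepers, whom the paper models on FIFA data only.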

The above graph shows that a higher FIFA 22 overall rating correlates directly with a higher transfer value.

Similarly, a higher FIFA 22 potential rating also correlates directly with transfer value.

The above graph illustrates that transfer value decreases once a player is past his prime age; young players usually have high values.

The above graph is an example of how statistical measures affect transfer value: Goal Creating Actions for attackers directly influence their transfer value.

From the above graph, we see how a player's value increases if he has more years remaining on his contract.

IV. METHODOLOGY

The transfer value of a player depends on the player's position on the pitch, and FIFA attributes are also closely related to where a player plays. Therefore, transfer values cannot be modelled uniformly across all players. To address this, we split the dataset by position into four groups, namely Goalkeepers, Defenders, Midfielders, and Attackers, based on the attribute "Best Position" in the FIFA dataset. The procedure for each position is identical except for the players and features selected for model training.

For each position, the data was split into training and testing sets (80% training, 20% testing). The R² value (coefficient of determination) was used as the primary measure of accuracy, and 10-fold cross-validation was performed to check that the result generalizes. The following algorithms performed best on the data: eXtreme Gradient Boosting (XGB), Gradient Boosting, Light Gradient Boosting Machine (LGBM), and Random Forest.

eXtreme Gradient Boosting (XGB) - A tree-based algorithm that can be used for both classification and regression. It uses its own method to build trees, in which candidate splits are evaluated using a similarity score.

Gradient Boosting - A machine learning technique used for regression and classification tasks. It builds an ensemble of decision trees sequentially, with each new tree fitted to correct the errors of the trees before it, and it often outperforms random forest.

Light Gradient Boosting Machine (LGBM) - A free and open-source distributed gradient boosting framework for machine learning. It is based on decision tree algorithms and is used for ranking, classification, regression, and other machine learning tasks.

Random Forest - An ensemble learning method for classification, regression, and other tasks that operates by constructing multiple decision trees at training time. For classification, the output of the random forest is the class selected by most trees, whereas for regression, the mean prediction of the individual trees is returned.
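The per-position procedure above (split by best position, 80/20 train/test split, 10-fold cross-validation, R² scoring, comparison of tree ensembles) might look roughly like the following scikit-learn sketch. It is an illustration, not the authors' implementation. The position-code mapping is hypothetical, inferred from the position descriptions in Section V; the choice to run cross-validation on the training portion is an assumption, since the paper does not say which portion was used; XGB and LGBM are added through their scikit-learn-compatible `XGBRegressor` and `LGBMRegressor` classes only if those packages are installed; and the demo data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Hypothetical mapping from FIFA "Best Position" codes to the four groups.
POSITION_GROUP = {
    "GK": "Goalkeepers",
    "CB": "Defenders", "LB": "Defenders", "RB": "Defenders",
    "LWB": "Defenders", "RWB": "Defenders",
    "CM": "Midfielders", "CDM": "Midfielders",
    "ST": "Attackers", "CF": "Attackers", "CAM": "Attackers",
    "LW": "Attackers", "RW": "Attackers",
    "LM": "Attackers", "RM": "Attackers",  # assumption: wide midfielders count as wingers
}


def candidate_models(seed=0):
    """Regressors to compare; XGB and LGBM are used only if installed."""
    models = {
        "Gradient Boosting": GradientBoostingRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(random_state=seed),
    }
    try:
        from xgboost import XGBRegressor
        models["XGB"] = XGBRegressor(random_state=seed)
    except ImportError:
        pass
    try:
        from lightgbm import LGBMRegressor
        models["LGBM"] = LGBMRegressor(random_state=seed, verbose=-1)
    except ImportError:
        pass
    return models


def evaluate_by_position(df, feature_cols, target_col="value_m", seed=0):
    """For each position group: 80/20 split, mean 10-fold CV R² on the
    training part, then R² of the refitted model on the 20% test part."""
    results = {}
    groups = df["best_position"].map(POSITION_GROUP)  # unmapped codes -> NaN, skipped
    for group, part in df.groupby(groups):
        X, y = part[feature_cols], part[target_col]
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        cv = KFold(n_splits=10, shuffle=True, random_state=seed)
        for name, model in candidate_models(seed).items():
            cv_r2 = cross_val_score(model, X_tr, y_tr, cv=cv, scoring="r2").mean()
            model.fit(X_tr, y_tr)
            test_r2 = r2_score(y_te, model.predict(X_te))
            results[(group, name)] = (cv_r2, test_r2)
    return results


# Synthetic demo data only; it does not reproduce the paper's results.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "best_position": rng.choice(["GK", "CB", "CM", "ST"], size=n),
    "overall": rng.integers(60, 90, size=n),
    "age": rng.integers(18, 36, size=n),
    "contract_remaining": rng.uniform(0, 5, size=n),
})
# Toy target: rises with rating and contract length, falls with age.
demo["value_m"] = (0.8 * (demo["overall"] - 60) + 2 * demo["contract_remaining"]
                   - 0.5 * (demo["age"] - 18)
                   + rng.normal(0, 2, size=n)).clip(lower=0.1)

scores = evaluate_by_position(demo, ["overall", "age", "contract_remaining"])
for (group, name), (cv_r2, test_r2) in sorted(scores.items()):
    print(f"{group:12s} {name:18s} CV R2={cv_r2:.3f}  test R2={test_r2:.3f}")
```

Keeping one model per position group mirrors the paper's reasoning that attributes such as GK Reflexes or Finishing only matter for some positions; in practice, each group would also receive its own `feature_cols` list, as in Section V.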

V. RESULTS

A. Goalkeepers

Goalkeepers usually have lower transfer values than players in other positions. We found that the features with the most impact on the transfer value of goalkeepers were Overall, Potential, Age, Contract Remaining, GK Reflexes, GK Diving, GK Handling, GK Kicking, and GK Positioning. In football teams, there is usually only one goalkeeper who plays the majority of the games. As a result, fewer advanced statistics are available for goalkeepers, and those that exist are often skewed. Therefore, we considered only FIFA data for goalkeepers, and included goalkeepers from around the world in the FIFA 22 dataset. The best performing algorithm was the Gradient Boosting Regressor, with an R² value of 0.82.

The table below illustrates the performance of the models used. MAE values for this and all other positions are in millions of pounds.

Goalkeepers
Model               R²      MAE (£m)   MSE ((£m)²)
XGB                 0.805   0.736      4.59
Gradient Boosting   0.813   0.73       4.41
LGBM                0.658   0.97       8.08
Random Forest       0.783   0.78       5.70

The goalkeepers in FIFA 22 have quite low transfer values, hence the irregularity in the graph below, which shows the performance of the best model on test data.

B. Defenders

For this category, we considered center backs, full backs, and wing backs. The FIFA data we used included Overall, Potential, Composure, Slide Tackle, Stand Tackle, Aggression, Interceptions, and Defensive Awareness, which were found to have a good impact on transfer value. The statistical data comprised Age, Contract Remaining, Blocks, Int, Clr, Err, Recov, Fls, and CrdY; these factors reflect how good a player is defensively. In addition, to capture the offensive contribution of defenders, we included Gls, Ast, SCA, and CrsPA. The best performing model was XGB, with an R² value of 0.805. The table below indicates the performance of the models.

Defenders
Model               R²      MAE (£m)   MSE ((£m)²)
XGB                 0.805   3.0        23.75
Gradient Boosting   0.788   3.13       25.76
LGBM                0.755   3.28       29.00
Random Forest       0.78    3.24       30.90

The performance of the best model on test data is shown below.

C. Midfielders

We considered central midfielders and central defensive midfielders in this category. The FIFA data that contributed to transfer value were Overall, Potential, Short Passing, Long Passing, Stamina, Aggression, Vision, Slide Tackle, and Ball Control. The statistical data used included Age, Contract Remaining, Gls, SoT, CrdY, Recov, PPA, CK, Press, SCA, Int, Err, Targ, and Fls. The best performing model was XGB, with an R² value of 0.81 and the lowest MAE and MSE of the models tested. The model performance is shown below.

Midfielders
Model               R²      MAE (£m)   MSE ((£m)²)
XGB                 0.81    3.17       26.73
Gradient Boosting   0.811   3.30       29.62
LGBM                0.783   3.65       37.19
Random Forest       0.751   3.74       34.98

The graphic below shows the performance of the best model on test data.
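The three metrics reported in the tables follow their standard definitions: R² = 1 - SS_res / SS_tot, MAE is the mean absolute error, and MSE is the mean squared error. Because MAE is in millions of pounds, MSE is in squared millions of pounds. The plain-Python sketch below computes all three; the transfer values in the example are made up for illustration.

```python
def regression_metrics(y_true, y_pred):
    """Return (R², MAE, MSE) for paired sequences of actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = ss_res / n                               # mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    r2 = 1 - ss_res / ss_tot
    return r2, mae, mse


# Illustrative (made-up) transfer values in millions of pounds.
actual = [10.0, 25.0, 40.0, 5.0]
predicted = [12.0, 22.0, 41.0, 5.0]
r2, mae, mse = regression_metrics(actual, predicted)
print(f"R2={r2:.3f} MAE={mae:.2f} MSE={mse:.2f}")  # R2=0.981 MAE=1.50 MSE=3.50
```

Since MSE squares each error, a few large misses on very expensive players raise it much more than they raise MAE; taking its square root (RMSE) puts it back on the same £m scale as MAE for comparison.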

D. Attackers

For attackers, we considered strikers, attacking midfielders, and wingers. The FIFA data we used included parameters such as Overall, Potential, Finishing, Crossing, and Dribbling. The statistical data comprised Age, Contract Remaining, Gls, Ast, xG, xA, SoT, Sh/90, SCA, GCA, Press, Att Pen, Min, CrdY, DSucc, and CPA. The best results were obtained with the Gradient Boosting model, with an R² value of 0.86. The table below shows the performance of the various models used.

Attackers
Model               R²      MAE (£m)   MSE ((£m)²)
XGB                 0.856   3.57       37.03
Gradient Boosting   0.860   3.56       36.51
LGBM                0.795   4.07       57.84
Random Forest       0.783   3.95       58.87

The performance of the model on test data is given below. The statistics available today are the most clear-cut for attackers, and this is reflected in the model performance.

VI. CONCLUSION

In this paper, we predict the transfer value of footballers using their FIFA 22 data and statistical data. Unlike previous works, we use extremely detailed statistical measures and include features such as a player's contract and age, which also significantly affect transfer value. FIFA data was publicly available online, and the statistical data was extracted from FBref. We considered the true transfer value of a player to be his transfermarkt.co.uk value. After the data were collected, they were merged and then split by player position. Several regression models were built for each position, among which the best performing were the Gradient Boosting and eXtreme Gradient Boosting algorithms. This approach returned satisfactory accuracy. Our work benefits the football community in two ways. First, it allows football websites to provide information about a player's transfer value (based on statistics and FIFA data) on his profile page. Second, professional teams can use this method to estimate the transfer value of players, which helps them identify bargains in the market and set a limit on how much to bid for a particular player.

VII. FUTURE SCOPE

The project has considerable potential. The amount and variety of available statistics are ever increasing, which should have a substantial positive impact on the project. At present, finely detailed statistics are available only for the biggest and most popular football competitions. This restricts the number of players we have access to and the degree of specialization we can give to each position on the pitch. Once this changes, we will have many more players to learn from, and the accuracy of the model and the information it can extract should increase. The video game market is hyper-competitive, with each game franchise trying to be the dominant player. As a result, a future in which video games have much more detailed metrics than EA Sports' FIFA 22 is easy to envision. This means that projects like ours will be able to use such data for innovative purposes such as transfer value prediction and other analyses.

REFERENCES

[1] Iman Behravan and Seyed Mohammad Razavi, "A novel machine learning method for estimating football players' value in the transfer market," 2020.
[2] Ahmet Talha Yiğit, Barış Samak, and Tolga Kaya, "Football Player Value Assessment Using Machine Learning Techniques," 2020.
[3] Prabhnoor Singh and Puneet Singh Lamba, "Influence of crowdsourcing, popularity and previous year statistics in market value estimation of football players," 2019.
[4] T. Kirschstein and Steffen Liebscher, "Assessing the market values of soccer players – a robust analysis of data from German 1. and 2. Bundesliga," 2019.
[5] Oliver Müller, Alexander Simons, and Markus Weinmann, "Beyond crowd judgments: Data-driven estimation of market value in association football," 2017.
[6] Rade Stanojevic and Laszlo Gyarmati, "Towards data-driven football player assessment," 2016.
[7] Yuan He, "Predicting Market Value of Soccer Players Using Linear Modeling Techniques," 2015.

APPENDIX

The table below lists all the statistical features we considered. The data was obtained from FBref (https://fbref.com/en/).

Statistical Features Used

Feature Name   Explanation
Age            Player age
Min            Minutes played
Gls            Goals scored
Ast            Assists provided
CrdY           Number of yellow cards received
CrdR           Number of red cards received
xG             Expected goals
xA             Expected assists
SoT            Shots on target
Sh/90          Shots per 90 minutes
KP             Key passes
PPA            Completed passes into the 18-yard box
CrsPA          Completed crosses into the 18-yard box
Prog           Progressive passes made
CK             Corner kicks taken
SCA            Shot-creating actions
GCA            Goal-creating actions
TklW           Tackles won
Press          Pressures made
PSucc          Successful pressures made
Blocks         Blocks made
Int            Interceptions made
Clr            Clearances made
Err            Mistakes leading to an opponent's shot
Att Pen        Touches in the attacking penalty area
DSucc          Successful dribbles completed
CPA            Carries into the 18-yard box
Targ           Number of times a player was the target of an attempted pass
Fls            Fouls committed
Fld            Fouls drawn
PKwon          Penalty kicks won
PKcon          Penalty kicks conceded
OG             Own goals
Recov          Number of loose balls recovered
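Rate statistics such as Sh/90 normalize a raw count by playing time (Min). The sketch below assumes the usual per-90 definition, count × 90 / minutes played, and adds a minimum-minutes threshold so that players with very little game time do not get inflated rates; the threshold and its default value are hypothetical, not taken from the paper.

```python
def per_90(count, minutes, min_minutes=90):
    """Scale a raw count (e.g. shots) to a per-90-minutes rate.

    Returns None when the player has too little playing time for the rate
    to be meaningful; min_minutes is a hypothetical threshold.
    """
    if minutes <= 0 or minutes < min_minutes:
        return None
    return count * 90 / minutes


print(per_90(60, 2700))  # 60 shots in 2700 minutes -> 2.0
print(per_90(2, 45))     # below the threshold -> None
```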
