Event detection in football: Improving the reliability of match analysis
RESEARCH ARTICLE

Abstract

With recent technological advancements, quantitative analysis has become an increasingly important area within professional sports. However, the manual process of collecting data on relevant match events like passes, goals and tackles comes with considerable costs and limited consistency across providers, affecting both research and practice. In football, while automatic detection of events from positional data of the players and the ball could alleviate these issues, it is not entirely clear what accuracy current state-of-the-art methods realistically achieve because there is a lack of high-quality validations on realistic and diverse data sets. This paper adds context to existing research by validating a two-step rule-based pass and shot detection algorithm on four different data sets using a comprehensive validation routine that accounts for the temporal, hierarchical and imbalanced nature of the task. Our evaluation shows that pass and shot detection performance is highly dependent on the specifics of the data set. In accordance with previous studies, we achieve F-scores of up to 0.92 for passes, but only when there is an inherent dependency between event and positional data. We find a significantly lower accuracy with F-scores of 0.71 for passes and 0.65 for shots if event and positional data are independent. This result, together with a critical evaluation of existing methodologies, suggests that the accuracy of current football event detection algorithms operating on positional data is currently overestimated. Further analysis reveals that the temporal extraction of passes and shots from positional data poses the main challenge for rule-based approaches. Our results further indicate that the classification of plays into shots and passes is a relatively straightforward task, achieving F-scores between 0.83 and 0.91 for rule-based classifiers and up to 0.95 for machine learning classifiers. We show that there exist simple classifiers that accurately differentiate shots from passes in different data sets using a low number of human-understandable rules. Operating on basic spatial features, our classifiers provide a simple, objective event definition that can be used as a foundation for more reliable event-based match analysis.

OPEN ACCESS

Citation: Bischofberger J, Baca A, Schikuta E (2024) Event detection in football: Improving the reliability of match analysis. PLoS ONE 19(4): e0298107. https://doi.org/10.1371/journal.pone.0298107

Editor: Ersan Arslan, Tokat Gaziosmanpasa University, TURKEY

Received: August 30, 2023
Accepted: January 13, 2024
Published: April 18, 2024

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0298107

Copyright: © 2024 Bischofberger et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: …commercial sources and we are not allowed to share it. It can be requested from the Austrian Football Federation (https://www.oefb.at/), via the "Abteilung für Wissenschaft, Analyse und Entwicklung", or directly from the various data providers/rights holders, i.e. Stats Perform (https://www.statsperform.com/), the UEFA (https://www.uefa.com/), and Subsequent (https://subsequent.ai/).

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

…the overall capabilities as well as strengths and weaknesses of teams and athletes. Furthermore, objective, quantitative analysis has the power to reduce the impact of individual and societal biases, which could ultimately lead to more truthful and healthy relationships between athletes, coaches, and the public.

With growing sophistication and decreasing costs, technologies such as video-based systems, electronic tracking systems and match analysis software become more and more widespread, leading to an increasingly important role of quantitative analysis in sports. In football, tactical and technical performance analysis traditionally focuses on player actions such as shots, passes and dribbles [1]. The data for these analyses is collected through manual tagging of events, which is a time-consuming and cost-intensive process. Additionally, while the reliability of manual event detection systems can be ensured by extensive training of the data collectors [2], their validity is harder to guarantee. Data providers are not required to publish
accurate and detailed definitions of the events they annotate. Definitions also vary across pro-
viders, because most football concepts are not prescribed by the rules of the game but emerged
empirically, which makes their definitions subject to opinion. For example, the term recovery
is used for very different sets of actions, ranging from a player picking up a loose ball [3] to any
action by which a player wins the ball [4]. Even foundational actions such as passes, dribbles
and shots are typically ambiguous: For example, the provider Wyscout treats crosses as a subset
of passes whereas Stats Perform does not. Stats Perform also requires a pass to be intentional, which is hardly an objective qualification.
Without universal definitions, different studies which seemingly use the same categories of
events are not necessarily comparable. Also, the event definition that is required or expected
by an analyst or researcher might differ from the definition used by the data collector. For that
reason, an automated and customizable data collection process would increase the validity of both scientific and practical sports performance analysis, provided such a process is sufficiently accurate.
So far, various methods to automatically extract events from raw video footage or posi-
tional data of the players and the ball have been proposed, using either machine learning or
rule-based detection routines. A rule-based algorithm operating on positional data would be
particularly well suited to not only alleviate the burden of manual data collection, but also
provide a simple, objective definition of events as a foundation for further analysis. Multiple
machine-learning- and rule-based methods have been proposed to detect events in positional
data, reporting promising accuracies of 90% and above [5–8]. However, most studies did not
evaluate their algorithms across multiple data sets, so it is not guaranteed that these algo-
rithms pick up the underlying structure of the football game rather than the error profile or
other specifics of the respective data set. Also, and more importantly, the data sets that were
used for validation are typically not independent from the positional data, as they both come
from a common intermediate source or are partially derived from each other. Using such
data for the evaluation of an algorithm inevitably leads to an inflation of its estimated perfor-
mance, since information from the reference data spills over into the input data for the
model.
This article complements and enriches those previous findings by providing a strong vali-
dation of a simple rule-based algorithm for the detection of passes and shots as two of the most
important events in football from positional data. We propose a highly robust validation rou-
tine and use it to evaluate the algorithm across four different data sets, where one data set
includes independent positional and event data. We also compare different algorithms to fur-
ther distinguish passes and shots, including both rule-based and machine-learning classifiers
to determine whether there exists a simple, human-understandable set of rules which accu-
rately distinguishes shots from passes.
Designing a proper validation routine for this problem is technically challenging, because it
involves detecting composite events from a continuous stream of positional data. It is a tempo-
ral, hierarchical and imbalanced classification task with unreliable reference data. The sug-
gested validation routine is therefore relevant beyond the scope of football event detection for
problems with a similar structure, such as object detection from videos [9] or sentiment analy-
sis from streams of social media posts [10].
Overall, the main novel contributions of this paper are:
• The presentation of a reliable validation routine for football event detection as a temporal,
hierarchical, and imbalanced classification problem.
• A reliable estimate of the performance of different pass and shot detection algorithms based
on positional data across four diverse data sets.
• A quantification and exploration of the difference in performance between independent and
dependent reference data, which adds important context to existing findings.
• An accurate pass and shot classifier that can be used as an adjustable foundation for event-
based match analysis.
The remaining paper is structured as follows: Section 2 reviews existing approaches to auto-
matic event detection in football. Section 3 elaborates the pass and shot detection algorithms
evaluated in this paper. Section 4 describes the data sets used and lays out the design of the val-
idation procedure. Section 5 presents the validation results. Section 6 provides a discussion of
the results. Section 7 summarizes the paper and proposes directions for future research.
generated from a football simulation engine. Khaustov and Mozgovoy [7] applied another
rule-based algorithm to positional data from 5 football matches and achieved an F-score of
0.93 for successful passes, 0.86 for unsuccessful passes, and 0.95 for shots on goal. Even higher
values were achieved, such as 0.998 for successful passes, on another data set containing several
short sequences of play. However, they also generated their gold standard by hand by watching
the game “in a 2D soccer simulator”, i.e. likely using the same positional data that also under-
lies the event detection process.
Richly, Moritz and Schwarz [16] used a three-layer artificial neural network to detect events
in positional data and achieved an average F-score of 0.89 for kicks and receptions. However,
they also used positional data to assist the manual creation of their gold standard, specifically
the acceleration data of the ball. They also used a very small data set with only 194 events in
total. Vidal-Codina et al. [5] proposed a two-step rule-based detector and evaluated it on a
very heterogeneous data set, however with no discussion of possible data dependencies and
differences of the algorithm’s performance between the various included providers. Among
other events, they achieved a total F-score of 0.92 for passes and 0.63 for shots.
Overall, while the achieved F-scores beyond 90% for passes and shots appear promising, the
currently available evaluation results don’t necessarily reflect a practical setting where manual
event data is supposed to be replaced and is therefore not available to pre-process positional
data. It is likely that the existing studies tend to overestimate the accuracy of their algorithms
due to information from the reference data leaking into the input data. For that reason, it is
currently not clear which merit rule-based classification routines hold concerning event detec-
tion in football and if their accuracy is sufficient for industrial and research purposes. Also,
there is a lack of agreed-upon standards regarding the validation of a given algorithm, for example the specific conditions under which a detected event can be considered to match a reference event. Other problems like low sample sizes, a lack of variety in the evaluated data sets
and the use of synthetic data further emphasize the need for new perspectives on the topic.
defined as the closest player to the ball during the hit. With the acceleration a_Ball(f) of the ball in frame f and the minimal ball-player distance over all players d_closest(f), the occurrence of a hit in frame f is defined as:

Hit(f) :⇔ a_Ball(f) > min_acc and d_closest(f) < min_vicinity    (1)
A play is then defined as either two subsequent hits by different players (which corresponds
to a pass or shot followed by a reception, deflection or goalkeeper save) or a hit followed by a
game interruption (e.g. a shot that misses the target or a misplaced pass that crosses the
sideline).
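Definition (1) amounts to a simple per-frame check. A minimal sketch in Python (the threshold values shown are illustrative placeholders, not the parameters fitted in the paper):

```python
import numpy as np

def detect_hits(ball_acc, closest_dist, min_acc=40.0, min_vicinity=2.0):
    """Return a boolean mask of frames that qualify as hits (Eq. 1).

    ball_acc     -- ball acceleration magnitude per frame (m/s^2)
    closest_dist -- distance of the closest player to the ball per frame (m)
    Thresholds are illustrative; the paper fits them per data set.
    """
    ball_acc = np.asarray(ball_acc, dtype=float)
    closest_dist = np.asarray(closest_dist, dtype=float)
    return (ball_acc > min_acc) & (closest_dist < min_vicinity)
```

Consecutive hits can then be paired into plays by scanning this mask along the frame axis.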
This definition of a play is broad and captures all “pass-like” events such as crosses, deflec-
tions, clearances, misplaced touches and throw-ins which may or may not be considered
passes in different contexts. If one wants to further subdivide the pass event into categories,
this can be done explicitly through additional rules. For example, a cross could be defined as a
pass that originates in a specified zone lateral to the penalty box, is received in the penalty box,
and reaches a certain height during its trajectory.
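Such an explicit cross rule could be written as a predicate over basic play features. In this sketch, all zone bounds, the height threshold and the coordinate convention (origin at the attacking team's own goal line and left touchline, attacking left to right) are hypothetical choices for illustration, not values from the paper:

```python
def is_cross(start_x, start_y, end_x, end_y, max_height,
             pitch_length=105.0, pitch_width=68.0):
    """Hypothetical cross rule: the pass starts in a lateral zone next to
    the penalty box, ends inside the penalty box, and travels high enough.
    All thresholds below are illustrative, not fitted values."""
    # Penalty box of the attacked goal: 16.5 m deep, 40.32 m wide.
    in_box = end_x > pitch_length - 16.5 and abs(end_y - pitch_width / 2) < 20.16
    # Lateral origin zone: final quarter of the pitch, outside the box width.
    lateral_zone = (start_x > pitch_length - 25.0
                    and abs(start_y - pitch_width / 2) > 20.16)
    return lateral_zone and in_box and max_height > 2.0
```

The same pattern extends to other pass subtypes (e.g. clearances or through balls) by swapping in different spatial conditions.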
Since crosses, clearances and throw-ins are commonly recorded in football event data, we
include those events as passes in the evaluation. Deflections and misplaced touches on the
other hand are not always recorded, so they should be excluded algorithmically. Misplaced
touches are difficult to detect because essentially, misplaced touches differ from passes by the
intention of the player rather than directly observable parameters of the play. While the same
is true for deflections, deflections appear to have the more distinct kinematic features. Intui-
tively, deflections can be thought of as plays that directly follow another play when the deflect-
ing player has not had sufficient time to make a conscious play. We therefore use the following
rule to classify plays as deflections and exclude those from the final set of detected plays.
Deflection(play) :⇔ play.frame = previous_play.target_frame and previous_play.duration < max_deflection_time    (2)
Algorithm 1 describes the procedure programmatically. Given that all calculations per
frame run in constant time, its time complexity is O(n) where n is the number of frames in the
positional data. The space complexity is also O(n).
Algorithm 1 Play Detection Algorithm
1: function ISDEFLECTION(play, previousPlay)
2:   return play.frame = previousPlay.target_frame and previousPlay.duration < max_deflection_time
3:
4: function ISHIT(f, min_acc, min_vicinity)
5:   return aBall(f) > min_acc and dclosest(f) < min_vicinity
6:
7: function DETECTPLAYS(game, min_acc, min_vicinity, max_deflection_time)
8:   plays ← []
9:   startFrame ← −1
10:  firstHit ← −1
11:  previousClosestPlayer ← −1
12:  previousPlay ← −1
13:  for each frame f in game do
14:    closestPlayer ← CLOSESTPLAYERTOBALL(f)
15:    if ISHIT(f, min_acc, min_vicinity) and previousClosestPlayer ≠ closestPlayer then
16:      if firstHit = −1 then
17:        firstHit ← f
18:      else if not ISDEFLECTION(play, previousPlay) then
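Since the listing above is cut off, here is a self-contained Python sketch of the same procedure. The part after the truncation (closing a play at the second hit and excluding deflections via rule (2)) is a plausible reconstruction under the definitions given earlier, not the authors' code; game interruptions, which also terminate plays, are omitted for brevity:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

@dataclass
class Play:
    frame: int          # frame of the initial hit
    target_frame: int   # frame of the receiving hit
    player: int         # player who performed the initial hit

    @property
    def duration(self) -> int:
        return self.target_frame - self.frame

def detect_plays(frames: Iterable[int],
                 is_hit: Callable[[int], bool],
                 closest_player: Callable[[int], int],
                 max_deflection_time: int) -> List[Play]:
    """Single linear pass over the frames, as in Algorithm 1 (sketch)."""
    def is_deflection(play: Play, previous: Optional[Play]) -> bool:
        # Rule (2): a play starting right where a very short play ended.
        return (previous is not None
                and play.frame == previous.target_frame
                and previous.duration < max_deflection_time)

    plays: List[Play] = []
    first_hit: Optional[int] = None
    first_player: Optional[int] = None
    previous_player: Optional[int] = None
    previous_play: Optional[Play] = None

    for f in frames:
        player = closest_player(f)
        # A hit by a new closest player either opens or closes a play.
        if is_hit(f) and player != previous_player:
            if first_hit is None:
                first_hit, first_player = f, player
            else:
                play = Play(first_hit, f, first_player)
                if not is_deflection(play, previous_play):
                    plays.append(play)      # keep as a pass/shot candidate
                previous_play = play        # deflections still chain on
                first_hit, first_player = f, player
        previous_player = player
    return plays
```

Each frame is processed in constant time, so the sketch preserves the O(n) time complexity stated above.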
goalkeeper play.initial_speed > min_speed_gk: A shot must either end at the goalline or
must be kicked forcefully enough. The required speed threshold differs depending on
whether the ball hits an outfield player or the opposition goalkeeper.
Machine learning. To automatically learn rulesets of varying complexity, we also evaluate decision trees with a fixed number of leaves. The structure of the decision trees is optimized by fitting the remaining hyperparameters to data.
Additionally, we use different black box machine learning models to estimate whether and
how much additional structure in the data can be uncovered when human-understandable
rules are not required. These models are a random forest, an SVM and AdaBoost with decision trees as base classifiers.
Baseline. Baseline performance is measured using a dummy predictor that always predicts
the most frequent class, i.e. “Pass” in the training data.
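A sketch of this model suite using scikit-learn; the synthetic features, labels and all hyperparameter values are illustrative only and are not the configurations fitted in the paper:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Toy stand-in for play features (e.g. distance to goal, opening angle):
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.2).astype(int)  # minority class ~ "shot"

models = {
    # Baseline: always predicts the majority class ("pass").
    "baseline": DummyClassifier(strategy="most_frequent"),
    # Interpretable ruleset with a fixed number of leaves.
    "tree": DecisionTreeClassifier(max_leaf_nodes=4, random_state=0),
    # Black-box models; AdaBoost uses decision stumps by default.
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "svm": SVC(),
    "adaboost": AdaBoostClassifier(random_state=0),
}
for model in models.values():
    model.fit(X, y)
```

In practice each model would be evaluated with the macro-averaged metrics introduced below rather than plain accuracy.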
4. Evaluation
Data sets
We use positional and event data from four different providers for the evaluation.
• Metrica [17]: Anonymized sample data published by Metrica Sports consisting of 3 games
with synchronized positional and event data.
• Stats [18]: Synchronized positional and event data of consecutive games of a professional
men’s national team in various competitions, provided by Stats Perform, 14 games.
• Euro [19]: Positional data from the men’s European Championship 2021, provided by Tra-
cab, complemented with independent Hudl Sportscode event data, 4 games.
• Subsequent [20]: Synchronized positional and event data of consecutive games of a pro-
fessional men’s national team in various competitions, provided by Subsequent. 6 games.
The positional data from all four providers was collected using optical tracking. Tracab and
Stats Perform use in-venue camera systems whereas Metrica and Subsequent generate posi-
tional data from a single video recording and are therefore expected to be of lower quality. The
positional data contains the x-y coordinates of the players and the ball during the match, cap-
tured at 25 Hz (Metrica, Euro, Subsequent) and 10 Hz (Stats) respectively. Due to
the nature of the data, the event information contained in the four data sets is heterogeneous.
Nevertheless, all four data sets do record passes and shots, including a timestamp which can be
used to synchronize the respective action with the positional data. Additional information that
is included in all data sets is the identity of the player who performed the pass or shot. An indi-
cation about the success of the pass as well as the identity of the pass receiver and the location
at which the pass or shot starts and ends is not present in all data sets. The success of a pass
and the identity of its receiver can however be deduced from the information given about the
next ball-related event after the pass.
From qualitative inspection, it is obvious that the bundled positional and event data have
not been generated independently from each other. For example, in the Metrica data set,
the position of the ball is typically exactly equal to the position of the player who is currently in
possession of the ball—a phenomenon that has also been observed in previous studies on data
from other providers [5]. This observation strongly suggests that the position of the ball has
been partly or even entirely reconstructed from manually annotated events. To a lesser degree,
such artifacts are also apparent in the Stats and Subsequent datasets, but not in the in-
venue positional data from Tracab within the Euro data set.
In the Euro dataset, the event data was obtained from the online platform Wyscout. How-
ever, the timestamps of the events were not accurately aligned with the actual events. For that
reason, we corrected the timestamps of all passes, shots and game interruptions manually
using broadcast footage of the games. This process also involved some minor corrections to
the data, for example for events that were clearly missing or duplicate. Around 3 percent of
events have been added or removed for such reasons. No positional data was used during this
process.
Game segmentation
Since detected events have to be matched with reference events, the validation routine needs to
operate on contiguous segments of play in which to search for matching events. Since our
models contain parameters to be fitted, we need at least two such segments in order to obtain a
training set and a test set. Naturally, the data could be divided into games or halves. But since our smallest data set contains only 6 halves, a subdivision along halves would be too coarse to obtain a representative test set.
More blended data sets are obtained by instead dividing the game at any sufficiently long
interruption. The game interruptions must be long enough so that a detected event and its
true corresponding reference event almost certainly cannot end up in different segments. A
higher interruption length therefore minimizes the risk of unwanted separations while a
shorter interruption length increases the number of available segments. We found a minimum
interruption time of 2 seconds to be a good compromise.
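The segmentation step can be sketched as follows, assuming a per-frame in-play flag (how ball-in-play status is derived from the data is left open here):

```python
def segment_game(frame_times, in_play, min_interruption=2.0):
    """Split a match into contiguous segments of play, cutting at every
    interruption of at least `min_interruption` seconds (the paper uses 2 s).
    `in_play[i]` indicates whether the ball is in play at frame i."""
    segments = []
    start = None        # start time of the current segment
    gap_start = None    # start time of the current interruption, if any
    for t, active in zip(frame_times, in_play):
        if active:
            if start is None:
                start = t
            elif gap_start is not None and t - gap_start >= min_interruption:
                # Long interruption: close the segment, open a new one.
                segments.append((start, gap_start))
                start = t
            gap_start = None
        elif gap_start is None:
            gap_start = t
    if start is not None:
        # Close the final segment at the last in-play time.
        segments.append((start, frame_times[-1] if gap_start is None else gap_start))
    return segments
```

Short stoppages below the threshold are absorbed into the surrounding segment, which matches the trade-off described above.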
Temporal matching
To determine whether a detected event matches a reference event, they have to be temporally
matched. Since we treat passes and shots as composed of two atomic events which are modeled
without a duration (hits and game interruptions), it is sensible to match two plays by individually matching their two constituent events. A play is matched if both of its constituent atomic events match a detected event. The atomic events match if they are no further than a certain
time span (matching window) apart from each other.
The choice of the optimal matching window involves the following trade-off: If the match-
ing window is too small, it mistakenly misses actually matching events and underestimates the
performance of the algorithm. If it is too large, it could mistakenly match unrelated plays and
overestimate the performance of the algorithm. Therefore, additional information like the
player and the location of the play should be used to establish truthful matching conditions.
The shots and passes in our data sets share only one additional variable: the player who took
the play. Therefore, we further require that this player must be equal for two events to be
matched.
The dependency of detection performance on the choice of matching window is depicted
in Fig 1. We qualitatively estimate the optimal matching window as 500 milliseconds for
Stats [18], Metrica [17], and Subsequent [20], and 1000 milliseconds for Euro [19].
This is roughly where the scores begin to increase much slower than before, which indicates
that the majority of actually corresponding events have been matched.
Any ambiguities where an event matches multiple candidates are resolved by finding a
maximum cardinality matching for each segment using the Hopcroft-Karp algorithm.
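The matching step can be sketched as follows. For simplicity this sketch uses Kuhn's augmenting-path algorithm, which finds the same maximum-cardinality matching as Hopcroft-Karp, only with worse asymptotic complexity; events are reduced here to (time, player) pairs:

```python
def match_events(detected, reference, window, same_player=True):
    """Match detected to reference events within one segment (sketch).

    Each event is a (time, player) tuple. Two events are compatible if they
    are at most `window` seconds apart and (optionally) share the same
    player. Ambiguities are resolved by a maximum-cardinality bipartite
    matching via Kuhn's augmenting-path algorithm."""
    def compatible(d, r):
        return abs(d[0] - r[0]) <= window and (not same_player or d[1] == r[1])

    adj = [[j for j, r in enumerate(reference) if compatible(d, r)]
           for d in detected]
    match_ref = [None] * len(reference)  # reference index -> detected index

    def try_assign(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                # Free reference event, or its partner can be reassigned.
                if match_ref[j] is None or try_assign(match_ref[j], seen):
                    match_ref[j] = i
                    return True
        return False

    matched = sum(try_assign(i, set()) for i in range(len(detected)))
    return matched, match_ref
```

Running this per segment yields the true-positive counts from which the precision and recall below are computed.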
Fig 1. Relationship between matching window and F-score in the training data.
https://doi.org/10.1371/journal.pone.0298107.g001
Passes and shots form a heavily imbalanced classification problem, as passes are about 40 times more common
than shots in football. Since different categories of events are typically of separate interest in
analysis rather than being mixed together, it is most appropriate to assign equal importance to
passes and shots as categories, i.e. to assign more weight to an individual shot than an individ-
ual pass. This way, the algorithm will be optimized in a way that allows it to be used for the
analysis of both types of events rather than being optimized to recognize mostly passes.
Based on that line of reasoning, we compute the macro-averaged recall R_play:

R_play = (R_play,pass + R_play,shot) / 2    (3)

R_play is then used to compute the F1-score F_play that serves as the optimization target to balance overall recall and precision:

F_play = 2 · R_play · P_play / (R_play + P_play) = 2 · P_play · (R_play,pass + R_play,shot) / (2 · P_play + R_play,shot + R_play,pass)    (4)
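As a quick sanity check, the two algebraically equivalent forms of Eq (4) can be computed side by side with a small helper:

```python
def f_play(r_pass, r_shot, precision):
    """Macro-averaged play-detection F-score, Eqs. (3) and (4)."""
    r_play = (r_pass + r_shot) / 2                        # Eq. (3)
    return 2 * r_play * precision / (r_play + precision)  # Eq. (4), 1st form
```

Substituting Eq. (3) into the harmonic mean reproduces the expanded second form of Eq. (4).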
Pass/shot classification. The classification into passes and shots can be evaluated inde-
pendently of the preceding play detection step using the precision and recall of passes and
shots relative to the set of successfully matched plays.
• Shot Precision P_shot = (# classified shots matched with a reference shot) / (# classified shots matched with a reference shot or pass)
Again, the optimization target must account for class imbalance. In this case, since preci-
sion and recall are available for both classes, we can use the macro-average of the two regular
F1-scores to obtain our optimization target Favg.
• F_shot = 2 · R_shot · P_shot / (R_shot + P_shot)
• F_pass = 2 · R_pass · P_pass / (R_pass + P_pass)
• F_avg = (F_shot + F_pass) / 2
To quantify the overall performance of the classifier, we also report variants of the above
metrics relative to the total number of reference and detected events, respectively.
• P'_shot = (# correctly classified shots) / (# classified shots, including among falsely detected plays)
• F'_shot = 2 · P'_shot · R'_shot / (P'_shot + R'_shot)
• F'_pass = 2 · P'_pass · R'_pass / (P'_pass + R'_pass)
• F'_avg = (F'_shot + F'_pass) / 2
Parameter optimization
Each data set is split into a training set and a test set with a 65-35 ratio of game segments. The
resulting number of shots and passes is shown in Table 2.
The parameters of the play detector are fitted on the entire training set using 300 iterations
of Bayesian optimization within the following bounds.
min_vicinity ∈ [0.01 m, 10 m]
min_acc ∈ [0 m/s², 120 m/s²]
Similarly, the 8 parameters of the manual rules classifier are fitted using Bayesian optimiza-
tion with 120 iterations and the following bounds.
min_progression ∈ [−100 m, 50 m]
The hyperparameters of the machine learning models are fitted using a 10 times repeated
10-fold stratified cross-validation on the training set using 250 iterations of Bayesian parameter
search. For the decision trees, the parameter max_leaves is instead fixed to various values.
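A sketch of this fitting procedure with scikit-learn; RandomizedSearchCV stands in for the Bayesian parameter search used in the paper (a drop-in Bayesian alternative with the same interface would be scikit-optimize's BayesSearchCV), and the data, parameter ranges and fold/repeat counts are reduced illustrative stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, RepeatedStratifiedKFold

rng = np.random.default_rng(0)
# Toy stand-in for the play-classification training set.
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 1.0).astype(int)

# The paper uses 10 times repeated 10-fold stratified CV; reduced here.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [10, 25],
                         "min_samples_leaf": [1, 5]},
    n_iter=3, cv=cv, scoring="f1_macro", random_state=0)
search.fit(X, y)
```

Macro-averaged F1 (`f1_macro`) is chosen as the scoring function to mirror the class-imbalance reasoning above.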
5. Results
Play detection
As shown in Table 3, play detection performs well for the Stats, Subsequent and
Metrica data sets, achieving macro-averaged F-scores of 0.87, 0.88, and 0.83 respectively.
The more realistic Euro data set, where positional and event data are decoupled, achieves a
significantly weaker score of 0.70. Shots display a lower class-specific recall than passes across
all data sets.
As can be seen from Table 4, the optimized values for min_acc, min_vicinity, and
max_deflection_time vary significantly between the data sets.
Pass/shot classification
The results of the pass and shot classifier are shown in Fig 2.
Due to the strong imbalance of the data, the baseline model, which always predicts the
majority class, yields a macro average F-score of roughly 0.5. All classifiers easily outperform
this baseline.
AdaBoost and Random Forest show the strongest performance with F-scores Favg ranging
from 0.93 to 0.95 for Stats, Euro and Metrica, and 0.85 to 0.87 for Subsequent. The
performance of the rule-based classifiers is almost as strong with F-scores between 0.83 and 0.91.
Fig 3 shows the performance of the decision trees depending on the fixed maximum number of leaves. The performance converges after only 3-6 leaves; beyond that, the possibility to add more splitting rules does not lead to a clear performance improvement.
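The fixed-leaf experiment can be sketched as follows. The two synthetic features loosely mimic the spatial features discussed below, with an artificial shot rule, so the learned thresholds are meaningless beyond illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Illustrative spatial features (names are assumptions, not the paper's).
dist_end_goalline = rng.uniform(0, 60, size=500)
opening_angle = rng.uniform(0, 40, size=500)
X = np.column_stack([dist_end_goalline, opening_angle])
# Artificial "shot" rule used only to generate labels for the sketch.
y = ((dist_end_goalline < 3.0) & (opening_angle > 12.0)).astype(int)

for n_leaves in (2, 3, 4):
    tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=0)
    tree.fit(X, y)
    print(n_leaves, "leaves, training accuracy:", round(tree.score(X, y), 3))

# Print the learned ruleset in human-readable form (cf. Table 5).
print(export_text(tree, feature_names=["dist_end_goalline", "opening_angle"]))
```

`export_text` makes the learned thresholds directly comparable to the hand-crafted rules, which is how the rule tables in this paper can be reproduced.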
General insights into feature importance are drawn by inspecting the decision trees and the
random forest. As shown in Table 5, the 2-leaf trees either use the rule “Distance End
Position to Goalline < X” where X is around 2-5 meters away from the goal line or
“Distance Start Position to Goal < X” where X is around 20-30 meters to iden-
tify shots. There are only two other rules used in the decision trees up to 4 leaves, namely
“Opening angle < 12.4˚” and “Lateral end position (projected) < X” with
X between 5 and 10m. Beginning with the third split, the decision trees begin to learn redun-
dant splits, assigning the same class to both child nodes. Inspecting the impurity-based feature
importance of the random forest (Fig 4) confirms the paramount role of these four features in
classification, while some of the remaining features such as the initial speed of the ball, the dis-
tance of the closest attacker to the goal and the progressed distance towards the goal also
appear to be relevant.
Table 5. The logical rules learned by the first three decision trees to classify a play as a shot, for each data set.
Dstart,goal: Distance from play origin to goal. Dend,gl: Distance from play end position to goal-line. Aopen: Opening
angle of the goal from play origin. Yend*: End position of the play, projected onto the goal-line.
Data set 2 Leaves 3 Leaves 4 Leaves
Euro [19] Dstart,goal < 30.1m Aopen > 12.4˚ and Yend* < 9.48m Dend,gl < 2.14m and Aopen > 12.1˚
Stats [18] Dend,gl < 3.08m Dend,gl < 3.08m and Yend* < 8.15m Aopen > 12.6˚ and Yend* < 7.16m
Metrica [17] Dend,gl < 2.46m Dend,gl < 4.18m and Aopen > 12.9˚ Dend,gl < 2.46m and Aopen > 10.9˚
Subsequent [20] Dend,gl < 3.91m Dend,gl < 3.91m and Aopen > 10.4˚ Dend,gl < 3.91m and Aopen > 10.4˚
https://doi.org/10.1371/journal.pone.0298107.t005
The combined performance of the event detection routine is shown in Table 6, using Ada-
Boost for shot classification. The total macro-averaged F-score for detecting passes and shots
range from 0.67 to 0.82, depending on the data set. As is also evident from the evaluation of
the detector alone (Table 3), shots achieve much lower scores than passes. Passes are detected
with an overall F-score of around 0.9, except for the Euro dataset which achieves a lower
score of 0.71.
6. Discussion
Our results show that the performance of pass and shot detection is heavily dependent on the
characteristics of the data set. Regarding the data sets Subsequent, Metrica and Stats,
our study reproduces the previously observed F-scores in pass detection, while using a mini-
malistic detection algorithm. Our scores between 0.87 and 0.92 for those data sets are in line
with the results from Morra et al. (0.89) [6], Khaustov and Mozgovoy (0.86 unsuccessful
passes, 0.93 successful passes) [7], Richly et al. (0.89) [16], and Vidal-Codina et al. (0.92) [5].
The large differences of the optimal parameter values (Table 4) indicate that the utilized
data sets are heterogeneous. The large difference in the optimal acceleration threshold stems
from the acceleration being manually computed as the second derivative of the position.
Therefore, the particularly high optimal threshold for the Euro data set indicates that its posi-
tional data is indeed the most raw among the four providers.
In contrast, Subsequent, Stats and Metrica likely used event data to post-process
their positional data. Therefore, the performance of the pass and shot detection algorithm on
these data sets is likely an overestimation of its true ability to identify these plays in raw posi-
tional data. Its performance on the Euro data set (0.71 for passes, 0.62 for shots) is a more
truthful reflection of its capabilities as the positional and event data within this data set are
independent and the positional data appears to contain few event artifacts.
Given these results and assuming that other algorithms would experience a similar drop in
performance when evaluated on independent data (see the results of Vidal-Codina et al. [5] for
a rough impression), the current state of the art in detecting events from positional data seems
unsatisfying. One third of the detected passes or shots would not appear in the manual event
data that analysts are used to, and conversely, around one third of the manually collected
events would be missing. Even when factoring in the inherent subjectivity of manual event
data, this appears to be a troubling deficit in accuracy.
Our two-step event detection pipeline exposes play detection rather than the subsequent
classification as the primary issue. Qualitative post-hoc inspection of the detector’s mistakes
on the Euro data set reveals the following causes of suboptimal performance:
• Inaccuracies of the positional data: For example, the ball position in the data from Tracab
comes with small artifacts where the ball sometimes changes its velocity abruptly during a
pass. This is falsely recognized as a hit if some player happens to be close by, for example
when a pass goes slightly past or over a player. The algorithm can account for that by reduc-
ing the required player-ball-distance to determine a hit. However, the required player-ball-
distance also needs to be large enough to account for the reach of the player and noise in the
ball and player positions. The algorithm cannot account for both at the same time.
• The algorithm struggles when many players are located around the ball and the ball is accel-
erated, either through dribbling or through artifacts. In these situations, the player closest to the
ball can change frequently without an actual change of possession. This effect is much less prevalent
in the other data sets because there the ball “sticks” to the player currently in possession, as presum-
ably determined from manually collected event information.
• The lack of ball height data makes it difficult to identify irrelevant accelerations in the x-y
plane caused by bouncing.
• Errors in the reference data, in particular missing events.
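The player-ball-distance trade-off discussed above can be illustrated with a rough sketch of a distance-based hit detector. The function name, array layout and thresholds here are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def detect_hits(ball_xy, players_xy, max_dist=1.5, accel_thresh=8.0, fps=25):
    """Flag frames where the ball accelerates sharply while a player is nearby.

    ball_xy: (T, 2) array of ball positions in metres.
    players_xy: (T, P, 2) array of player positions.
    The trade-off from the text: max_dist must be small enough to reject
    pass-by artifacts, yet large enough to cover a player's reach and
    positional noise -- a single threshold cannot satisfy both.
    """
    vel = np.diff(ball_xy, axis=0) * fps                         # (T-1, 2) m/s
    accel = np.linalg.norm(np.diff(vel, axis=0), axis=1) * fps   # (T-2,) m/s^2
    hits = []
    for t in np.nonzero(accel > accel_thresh)[0]:
        # Acceleration between velocity samples t and t+1 is located at frame t+1.
        dists = np.linalg.norm(players_xy[t + 1] - ball_xy[t + 1], axis=1)
        if dists.min() <= max_dist:                  # some player is close enough
            hits.append((t + 1, int(dists.argmin())))  # (frame, player index)
    return hits
```

With a tight `max_dist`, abrupt artifact-induced velocity changes near a bypassed player are rejected, but genuine touches at the edge of a player's reach are missed as well.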
Shot classification, on the other hand, performs well across all data sets. Given the quick con-
vergence of the decision trees, it seems that for most data sets one to three human-under-
standable rules are already sufficient to differentiate shots from passes with an accuracy of
around 90%. These rules primarily operate on the start and end position of the play relative to the
opponent’s goal. At least a small additional boost in accuracy can be achieved using machine
learning. A small set of rules is therefore sufficient to differentiate shots from passes and
can serve as a more objective definition of this kind of event.
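A rule set of the kind learned by the decision trees might look as follows. The thresholds and the pitch coordinate system are illustrative assumptions, not the exact rules learned from the data:

```python
import math

def classify_play(start_xy, end_xy, goal_xy=(105.0, 34.0)):
    """Classify a play as 'shot' or 'pass' from a handful of positional rules.

    Coordinates in metres on a 105 x 68 pitch, attacking the goal at x = 105.
    As in the learned trees, the rules split mainly on the start and end
    position of the play relative to the opponent's goal; thresholds are
    made up for illustration.
    """
    end_to_goal = math.dist(end_xy, goal_xy)
    start_to_goal = math.dist(start_xy, goal_xy)
    # Rule 1: the ball ends up very close to (or inside) the goal.
    if end_to_goal < 4.0:
        return "shot"
    # Rule 2: launched from shooting range and travelling clearly goalwards.
    if start_to_goal < 25.0 and end_to_goal < start_to_goal - 5.0:
        return "shot"
    return "pass"
```

Two or three such splits already capture most shots, which matches the quick convergence of the trees reported above.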
Moreover, like other rule-based methods, the proposed algorithm runs in linear time, which
makes it suitable for real-time applications, an essential requirement in an industry
where data must be streamed to clients during matches.
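The linear-time property follows from the fact that each incoming frame requires only constant work over a small sliding window. A minimal streaming-style sketch, with illustrative class and parameter names not taken from the paper:

```python
from collections import deque

class StreamingDetector:
    """Process tracking frames one at a time with O(1) work per frame, so a
    full match costs O(T) total -- the property that makes rule-based
    detection viable for live streaming. Window size and thresholds are
    illustrative assumptions."""

    def __init__(self, accel_thresh=8.0, fps=25):
        self.window = deque(maxlen=3)   # last three ball positions
        self.accel_thresh = accel_thresh
        self.fps = fps
        self.events = []                # candidate hit frames
        self.frame = 0

    def push(self, ball_x, ball_y):
        self.window.append((ball_x, ball_y))
        if len(self.window) == 3:
            (x0, y0), (x1, y1), (x2, y2) = self.window
            # Second-order finite difference approximates acceleration.
            ax = (x2 - 2 * x1 + x0) * self.fps ** 2
            ay = (y2 - 2 * y1 + y0) * self.fps ** 2
            if (ax * ax + ay * ay) ** 0.5 > self.accel_thresh:
                self.events.append(self.frame - 1)  # centre frame of the window
        self.frame += 1
```

Because the window has fixed size, memory use is constant and detections can be emitted with a delay of a single frame, which is compatible with live data feeds.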
Acknowledgments
Thanks to Philipp Schmid for his diligent assistance with the timestamp correction.
Author Contributions
Conceptualization: Jonas Bischofberger.
Formal analysis: Jonas Bischofberger.
Methodology: Jonas Bischofberger.
Project administration: Jonas Bischofberger.
Software: Jonas Bischofberger.
References
1. Sarmento H, Marcelino R, Campanico J, Matos N, Leitão J. Match analysis in football: a systematic
review. Journal of sports sciences. 2014; 32:1831–1843. https://doi.org/10.1080/02640414.2014.
898852 PMID: 24787442
2. Liu H, Hopkins W, Gómez AM, Molinuevo SJ. Inter-operator reliability of live football match statistics
from OPTA Sportsdata. International Journal of Performance Analysis in Sport. 2013; 13(3):803–821.
https://doi.org/10.1080/24748668.2013.11868690
3. StatsBomb Data Specification v1.1; 2019. Available from: https://github.com/statsbomb/open-data/
blob/master/doc/StatsBomb%20Open%20Data%20Specification%20v1.1.pdf.
4. Wyscout Glossary; n.d. Available from: https://dataglossary.wyscout.com/recovery/.
5. Vidal-Codina F, Evans N, El Fakir B, Billingham J. Automatic event detection in football using tracking
data. Sports Engineering. 2022; 25(1):18. https://doi.org/10.1007/s12283-022-00381-6
6. Morra L, Manigrasso F, Canto G, Gianfrate C, Guarino E, Lamberti F. Slicing and dicing soccer: auto-
matic detection of complex events from spatio-temporal data. In: Image Analysis and Recognition: 17th
International Conference, ICIAR 2020, Póvoa de Varzim, Portugal, June 24–26, 2020, Proceedings,
Part I 17. Springer; 2020. p. 107–121.
7. Khaustov V, Mozgovoy M. Recognizing events in spatiotemporal soccer data. Applied Sciences. 2020;
10(22):8046. https://doi.org/10.3390/app10228046
8. Tovinkere V, Qian RJ. Detecting semantic events in soccer games: Towards a complete solution. In:
IEEE International Conference on Multimedia and Expo, 2001. ICME 2001. IEEE Computer Society;
2001. p. 212–212.
9. Nascimento JC, Marques JS. Performance evaluation of object detection algorithms for video surveil-
lance. IEEE Transactions on Multimedia. 2006; 8(4):761–774. https://doi.org/10.1109/TMM.2006.
876287
10. Xu QA, Chang V, Jayne C. A systematic review of social media-based sentiment analysis: Emerging
trends and challenges. Decision Analytics Journal. 2022; 3:100073. https://doi.org/10.1016/j.dajour.
2022.100073
11. Brechot M, Flepp R. Dealing with randomness in match outcomes: how to rethink performance evalua-
tion in European club football using expected goals. Journal of Sports Economics. 2020; 21(4):335–
362. https://doi.org/10.1177/1527002519897962
12. Pena JL, Touchette H. A network theory analysis of football strategies. arXiv preprint arXiv:1206.
6904. 2012.
13. Sorano D, Carrara F, Cintia P, Falchi F, Pappalardo L. Automatic pass annotation from soccer video
streams based on object detection and lstm. In: Machine Learning and Knowledge Discovery in Data-
bases. Applied Data Science and Demo Track: European Conference, ECML PKDD 2020, Ghent, Bel-
gium, September 14–18, 2020, Proceedings, Part V. Springer; 2021. p. 475–490.
14. Khan A, Lazzerini B, Calabrese G, Serafini L. Soccer Event Detection. In: 4th International Conference
on Image Processing and Pattern Recognition; 2018. p. 119–129.
15. Chen SC, Shyu ML, Chen M, Zhang C. A decision tree-based multimodal data mining framework for
soccer goal detection. In: 2004 IEEE International Conference on Multimedia and Expo (ICME)(IEEE
Cat. No. 04TH8763). vol. 1. IEEE; 2004. p. 265–268.
16. Richly K, Moritz F, Schwarz C. Utilizing Artificial Neural Networks to Detect Compound Events in Spa-
tio-Temporal Soccer Data. In: 3rd SIGKDD Workshop on Mining and Learning from Time Series; 2017.
17. Dagnino B. Metrica Sports Sample Data; 2021. GitHub. Available from: https://github.com/metrica-
sports/sample-data/commit/e706dd506b360d69d9d123d5b8026e7294b13996.
18. Stats Perform. Proprietary data set; 2021.
19. ChyronHego; Wyscout. Proprietary data set; 2021.
20. Subsequent. Proprietary data set; 2022.