Computer Science in Sport
Modeling, Simulation, Data Analysis and Visualization of Sports-Related Data
Daniel Memmert
Editor
Institute of Exercise Training and Sport Informatics
German Sport University
Köln, Nordrhein-Westfalen, Germany
Translation from the German language edition: “Sportinformatik” by Daniel Memmert,
© Springer-Verlag GmbH 2023. Published by Springer. All Rights Reserved.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer-Verlag GmbH,
DE, part of Springer Nature 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or
information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with
regard to jurisdictional claims in published maps and institutional affiliations.
internal peer review process, and for their constant desire to improve. Many thanks
for the always good and friendly cooperation.
I would also like to thank Ms. Erika Graf for her constant and careful supervision
of the book, and for her many comments and advice. In addition, I would like to
thank our student assistants Klara Rinne, Tara Coulson, and David Brinkjans for
taking a critical student perspective, for their constructive feedback to the authors,
and for numerous contributions to smooth out rough edges.
Finally, my sincere thanks go to the staff of Springer-Verlag for their constant, very
friendly, and always extremely competent support of our book project. First and
foremost, I would like to mention Ken Kissinger (Program Planning), who put
a lot of time, commitment, and energy into the book. Neither this nor his speed in
the process can be taken for granted. Without his expertise, the book would not have
come about in this way, and for that, I thank him most sincerely. On the part of
Springer-Verlag, I would also like to thank Meike Barth and Anja Herzer (project
management), who accompanied the book project very successfully up to the
production handover. I would also like to thank everyone else involved in production
(copy-editing, typesetting) for their professional cooperation during the production
process.
I hope you enjoy reading this book and that you gain a great deal of knowledge
from it.
Daniel Memmert
Cologne, Germany
Contents
I History
1 History
Martin Lames
1.1 Introduction
1.2 The Institutional Constitution of Sports Informatics
1.2.1 The Pre-institutional Phase (Before 1995)
1.2.2 The Phase of the dvs Section Sports Informatics (1995–2003)
1.2.3 The Phase of IACSS (2003–2019)
1.2.4 The Institutional Integration Phase of Informatics Working Groups (from 2019)
References
II Data
2 Artificial Data
Fabian Wunderlich
2.1 Example Sport
2.2 Background
2.2.1 Limits of Real-World Data
2.2.2 The Idea of Artificial Data
2.2.3 Random Numbers and Monte Carlo Simulation
2.2.4 Advantages and Disadvantages of Artificial Data Sets
2.3 Applications
References
3 Text Data
Otto Kolbinger
3.1 Introduction
3.2 Applications
3.2.1 Evaluation of Technological Officiating Aids
3.2.2 Match Predictions
3.2.3 Talent Scouting
References
4 Video Data
Eric Müller-Budack, Wolfgang Gritz, and Ralph Ewerth
4.1 Example Sport
4.2 Background
4.3 Basics and Definition
4.4 Applications
References
5 Event Data
Marc Garnica Caparrós
5.1 Example Sport
5.2 Background
5.3 Application
5.3.1 Event Data to Extend Box Score Statistics
5.3.2 Event Data to Value in-Game Actions and Player Impact
5.3.3 Event Data to Understand Player Interactions
References
6 Position Data
Daniel Memmert
6.1 Example Sport
6.2 Background
6.3 Applications
References
7 Online Data
Christoph Breuer
7.1 Example Sport
7.2 Background
7.3 Application
References
III Modeling
8 Modeling
Jürgen Perl and Daniel Memmert
8.1 Example Sport
8.2 Background
8.3 Application
References
9 Predictive Models
Fabian Wunderlich
9.1 Example Sport
9.2 Background
9.2.1 Looking into the Future
9.2.2 Predictive Models in Sports
9.2.3 Creation of Predictive Models
9.2.4 Exemplary Methods
9.3 Applications
References
10 Physiological Modeling
Manuel Bassek
10.1 Example Sport
10.2 Background
10.3 Applications
References
IV Simulation
11 Simulation
Jürgen Perl and Daniel Memmert
11.1 Example Sport
11.2 Background
11.3 Applications
References
12 Metabolic Simulation
Dietmar Saupe
12.1 Example Sport
12.2 Background
12.3 Applications
References
V Programming Languages
14 An Introduction to the Programming Language R for Beginners
Robert Rein
14.1 History and Philosophy
14.2 Concept and Programming Paradigms
14.3 Resources on R
14.4 R Community and Packages
14.5 Introduction to Working with R
14.6 An Example Workflow in R
References
15 Python
Maximilian Klemp
15.1 Example Sport
15.2 Background
15.3 Applications
References
VI Data Analysis
16 Logistic Regression
Ashwin Phatak
16.1 Example Sport
16.2 Background
16.3 Application
References
18 Process Mining
Marc Garnica Caparrós
18.1 Example Sport
18.2 Background
18.3 Application
18.3.1 Process Mining in Healthcare
18.3.2 Process Mining in Education
18.3.3 Process Mining in Soccer
References
19 Networks Centrality
João Paulo Ramos, Rui Jorge Lopes, Duarte Araújo, and Pedro Passos
19.1 Network Science in Football
A
19.2 Background
19.3 Applications
References
23 Transfer Learning
Henrik Biermann
23.1 Example Sport
23.2 Background
23.3 Applications
References
24 Random Forest
Justus Schlenger
24.1 Example Sport
24.2 Background
24.3 Applications
References
26 Open-Set Recognition
Ricardo da Silva Torres
26.1 Example Sport
26.2 Background
26.3 Applications
References
VII Visualization
27 Visualization: Basics and Concepts
Daniel Link
27.1 Example Sport
27.2 Background
27.3 Applications
References
VIII Outlook
28 Outlook
Arnold Baca
28.1 Trends
28.2 Sensors
28.3 Wearables and Intelligent Systems
28.4 Big Data and Cloud
28.5 Machine Learning and Computer Vision
28.6 Virtual and Augmented Reality and Robotics
28.7 Data Protection and Data Misuse
References
Supplementary Information
Appendix. Third-Party Funds Competitively Acquired by German Sports Scientists from the German Research Foundation (DFG) in the Review Board for Computer Science
Index
Contributors
Arnold Baca Centre for Sport Science and University Sports, University of Vienna,
Vienna, Austria
Manuel Bassek Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Henrik Biermann Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Christoph Breuer German Sport University Cologne, Institute of Sport Economics and
Sport Management, Cologne, Germany
Marc Garnica Caparrós Institute of Exercise Training and Sport Informatics, German
Sport University Cologne, Cologne, Germany
Andreas Groll Department of Statistics, Statistical Methods for Big Data, TU Dortmund
University, Dortmund, Germany
Maximilian Klemp Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Otto Kolbinger TUM School of Medicine and Health, Technical University of Munich,
Munich, Germany
Rumena Komitova Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Martin Lames Faculty of Sport and Health Sciences, Technical University of Munich,
Munich, Germany
Daniel Memmert Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Jürgen Perl Institute of Computer Science, FB 08, University of Mainz, Mainz, Germany
Ashwin Phatak Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Robert Rein Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Justus Schlenger Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
Markus Tilp Institute of Human Movement Science, Sport and Health, University
of Graz, Graz, Austria
Fabian Wunderlich Institute of Exercise Training and Sport Informatics, German Sport
University Cologne, Cologne, Germany
I History
1 History
Martin Lames
This chapter was translated by Erika Graf and given final approval by Martin Lames.
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_1
Key Messages
• Although rather late compared to other hyphenated computer sciences, sports
informatics can show steady progress toward institutionalization as a scientific
discipline.
• Stages of this path are free working groups, the dvs section sports informatics,
the IACSS, and finally organized bi-directional cooperation in the sense of the
sports informatics definition of Link and Lames.
• The future of the discipline depends on whether it will be possible to generate
win-win cooperations based on which sustainable interdisciplinary projects can
be realized.
1.1 Introduction
To trace the development of sports informatics as a scientific discipline, one is
well advised to first clarify its subject matter. A generally accepted definition
comes from Link and Lames (2015):
Definition
The scientific discipline of “sports informatics” is understood to be multi- and
interdisciplinary research programs that contain sports science and informatics
components. Its subject is the application of tools, methods and paradigms from
computer science to questions of sports science as well as the integration of sports
science knowledge into computer science.
1.2 The Institutional Constitution of Sports Informatics
Since its origins, which can be equated with the advent of electronic calculating
machines in the middle of the last century, computer science has been characterized
by the fact that, besides the core areas of theoretical and technical computer
science, the application of these new methods immediately triggered a whole range
of scientific activities. One has to differentiate between applied computer science,
which established itself within computer science, and the so-called “hyphenated
computer sciences”, which (not always to the delight of the “core computer
scientists”) are at home in the respective applying science, such as medical
computer science or business computer science.
These institutionalizations, which in part brought about “real” interdisciplinary
sciences, i.e. those that stand “inter”, that is, between the two original sciences
(Heckhausen, 1986; Willimczik, 1985), were a long time coming in sports science.
Until 1995, there was “only” cooperation in terms of content in research programs
with informatics and sports science components. There were, however, quite a number
of such programs, since the ability of informatics to contribute to questions of
sports science is, as in many other areas, readily apparent.
It is therefore somewhat curious that the term “sports informatics” was coined
at a very early stage: in 1976, a congress volume entitled “Creative Sports
Informatics” was published by Recla and Timmer (1976), reporting on a conference
of the “International Association for Sports Information (IASI)” held in Graz
in 1975. Here, the main interest was in the capabilities of informatics tools to
capture, store and flexibly retrieve information from sports, such as information
about Olympic participants, which was made available to the press online for the
first time in Munich in 1972.
Very early uses of computer science in sports are the computerized game observation
systems of the sports educators (sic!) Hagedorn et al. (1980a) in basketball
and Brettschneider (Allendorf & Brettschneider, 1976) in volleyball. Here,
conceptually and technologically high-quality work was presented, which unfortunately
was not noticed in the English-speaking world. Also of interest are the studies of
computer scientists who either searched for fields of application in sports, e.g.
the diploma thesis of Elisabeth André, who later received the Leibniz Prize, on
automatic annotation in soccer (André et al., 1988), or created computer applications
out of their enthusiasm for sports, e.g. the TOTO system by Bolch and Cerny (1990),
which is based on the Elo system in chess.
1 The phases of the institutional constitution of sports informatics proposed here are by no means
identical with the existence and significance of the individual institutions named, but rather
represent significant stages on the path of the institutionalization of sports informatics.
The work of Jürgen Perl was to become of decisive importance for the develop-
ment of the discipline. Together with his colleagues Wolf Miethling and Günter
Hagedorn from Paderborn, Perl had already made numerous contributions to
sports informatics (Miethling & Perl, 1981; Hagedorn et al., 1980b). At the
University of Mainz, he put a scientific focus on sports informatics starting in
1985. Significant contributions of the Mainz group were competition monitoring
systems in various sports, each of which was on the cutting edge of information
technology. Important on the way to the institutionalization of sports informatics
was a workshop series “Sport & Informatik”, which took place regularly from
1989 on.
The German Association for Sports Science (dvs) is the scientific organization for
academic German sports science. It is organized in sections corresponding to
sports science disciplines such as sports education or training science, and commis-
sions representing cross-sectional topics of enduring scientific interest such as
sports science contributions to individual sports. Thus, the idea of institutionaliz-
ing sports informatics as a section within the framework of the dvs was obvious.
However, achieving this goal was by no means trivial, as there were fears
within the dvs that the organization would be overstretched by the establishment of
many subunits. These fears were in line with the hitherto extremely influential
position of Grupe’s “Integrative Sports Science” (Krüger, 2015). In addition,
some groups, such as the working group “Media in Sport”, saw their claim to
representation in terms of content threatened by a sports informatics section.
In addition to numerous discussions with dvs functionaries and colleagues with
an affinity for the subject, and of course, the reference to an existing, and indeed
interdisciplinary, community that had manifested itself at the regular workshops,
an article was placed in the journal “Leistungssport” (Competitive Sports) in which
the subject area was presented to the German public under the title
“Sportinformatik: Gegenstandsbereich und Perspektiven einer sportwissenschaftlichen
Teildisziplin” (Perl & Lames, 1995). There, the potential of sports informatics
for sports science was explained, especially concerning its connectivity to topics of
other sections and its ability to contribute to the support of elite sports.
Since the approval of the general meeting at the dvs-Hochschultag 1995 in
Frankfurt to found the section, the “Workshop Sport & Informatik” now operated
as a meeting of the “dvs section Sportinformatik”. The section has successfully
asserted itself in the market of scientific organizations and, with the speakers Perl
(1996–2002), Wiemeyer (2002–2012), Lames (2012–2018), and Link (from 2018),
organizes its section conferences in even years at the different centers of the
discipline in Germany (2014: Vienna). The 2018 conference in Munich was held for the
first time with a section on “Sports Informatics and Sports Technology”, thus
future-proofing the corresponding development in the two sciences.
For Jürgen Perl, it was clear very early on that the institutionalization of sports
informatics could not stop at a national sports science section, but that the
international stage also had to be “played”. Strategically, the same path was followed
as on the national level. The first international congresses in Cologne (1997), Vienna
(1999), and Cardiff (2001) gathered a critical mass of international scientists,
which then made it possible to launch the IACSS (International Association of
Computer Science in Sport) at the conference in Barcelona (2003).
Subsequently, the effort was not to limit the IACSS to Europe, which would not
have corresponded to the claim of an international scientific organization. Meetings
in Canada, China, Australia, and Brazil testify to the success of these efforts. The
establishment of national associations for sports informatics, with the IACSS as
their umbrella organization, was only successful in a few countries (e.g. Germany,
Austria, Turkey, Russia, and China). The main reason is probably that initiatives to
found a national association did not prove to be sustainable if they were essentially
based only on the initiative of individual research personalities. The recognition of
the IACSS as a member of the ICSSPE (International Council for Sport Science
and Physical Education), the umbrella organization for sports science associations,
was an important step towards institutional consolidation.
At a very early stage, under the editorship of Arnold Baca (from 2002), a jour-
nal, the IJCSS (International Journal of Computer Science in Sport) was estab-
lished, which can be seen as another important characteristic of the degree of
institutionalization of science. Since 2016, the journal has been published as an
open-access journal together with DeGruyter Verlag.
The respective IACSS presidents Perl (2003–2007), Baca (2007–2013), Lames
(2013–2022), and Zhang (from 2022) have succeeded in establishing a globally rec-
ognized and active organization with regular meetings and publication activities.
At the IACSS General Assembly in Moscow in 2019, it was decided that the IACSS
should move more in the direction of computer science and integrate existing
working groups there that deal with the topic of sports. A successful step in this
direction so far has been the contact with the MLSA (Machine Learning in Sports
Analytics) group around Jesse Davis, Jan van Haaren, Albrecht Zimmermann, and
Ulf Brefeld. This group has been organizing either its own workshops or satellite
workshops at major computer science conferences since 2013, most recently in
Grenoble in 2022; before that, a virtual workshop was organized in 2021 and—just
like the 7 workshops before it—documented in proceedings (Brefeld et al., 2022).
Collaboration with other, comparable groups, for example from the field of com-
puter vision, is still pending.
An important instrument for the integration of computer scientists and sports
scientists are seminars in Schloss Dagstuhl, an international conference center for
computer science. Of the five seminars on sports science topics hosted with IACSS
participation to date, the most recent was held jointly with MLSA in October 2021
on “Machine Learning in Sports” (organizers: Brefeld, Davis, Lames, Little). In
the future, attendance at each other’s meetings will be required, and ideally, we will
succeed in establishing project groups that also represent sports science and com-
puter science in their ranks in terms of personnel.
Institutional integration is all the more important because the integration of
two disciplines must not be viewed naively. For example, sports scientists often rely
uncritically on the answers provided by computer science without being able to
question its basic methodological assumptions. On the other hand, computer
scientists often use the good data situation that now exists in the attractive
application field of sports (professional soccer) only as a showcase for their original
basic scientific questions. Suitably designed institutionalized cooperation could at
least reduce these problems and thus give the enormous potential of sports
informatics room to develop.
References
Allendorf, O. & Brettschneider, W.-D. (1976). Leistungsdatenerfassung und -auswertung im Sportspiel
mit Hilfe des computergesteuerten optischen Lesestifts. In R. Andresen & G. Hagedorn (Hrsg.),
Zur Sportspielforschung (Band 1: Theorie und Praxis der Sportspiele; S. 106–116). Bartels & Wer-
nitz.
André, E., Herzog, G., & Rist, Th. (1988). On the simultaneous interpretation of real world image
sequences and their natural language description: The system SOCCER. Proceedings of the
eighth conference on artificial intelligence, Munich, August 1-5, 1988 (pp. 449–454).
Bolch, G. & Cerny, M. (1990). TOTO: Ein Programmsystem zur Analyse und Prognose der Ergeb-
nisse von Mannschaftsspielen mit Hilfe eines IBM PC. In J. Perl (Hrsg.), Sport und Informatik.
1. Workshop Sport & Informatik, Hochheim, 26.-27.4.89 (S. 25–35). Hofmann.
Brefeld, U., Davis, J., Van Haaren, J., & Zimmermann, A. (Eds.). (2022). Machine learning and data
mining for sports analytics—8th International workshop MLSA 2021, virtual event, September 13,
2021 (Springer Conference Proceedings, Communications in Computer and Information Science,
Vol. 1571). Springer.
Hagedorn, G., Ehrich, D., & Schmidt, G. (1980a). Computerunterstützte Spielanalyse im Basketball.
Leistungssport, 10(5), 363–372.
Hagedorn, G., Lorenz, H., & Meseck, U. (1980b). Die Verteilung spieltypischer Aktivitäten im Bas-
ketball. Leistungssport, 11(6), 442–449.
Heckhausen, H. (1986). Interdisziplinäre Forschung zwischen Intra-, Multi- und Chimären-Diszipli-
narität. In Zentrum für interdisziplinäre Forschung der Universität Bielefeld (ZIF) (Hrsg.),
Jahresbericht 1985/86 (S. 29–40). ZIF.
Krüger, M. (2015). Ommo Grupe und seine Vision des Sports. Sportwissenschaft, 45, 55–56.
Link, D., & Lames, M. (2015). An introduction to sport informatics. In A. Baca (Ed.), Computer Sci-
ence in Sport – Research and Practice (pp. 1–17). Routledge.
Miethling, W.-D., & Perl, J. (1981). Computerunterstützte Sportspielanalyse. Czwalina.
Perl, J., & Lames, M. (1995). Sportinformatik: Gegenstandsbereich und Perspektiven einer
sportwissenschaftlichen Teildisziplin. Leistungssport, 25(3), 26–30.
Recla, J., & Timmer, R. (Eds.). (1976). Kreative Sportinformatik. Hofmann.
Willimczik, K. (1985). Interdisziplinäre Sportwissenschaft—Forderungen an ein erstarrtes Konzept.
Sportwissenschaft, 15, 9–32.
II Data
Artificial Data
Fabian Wunderlich
Contents
2.2 Background
2.2.1 Limits of Real-World Data
2.2.2 The Idea of Artificial Data
2.2.3 Random Numbers and Monte Carlo Simulation
2.2.4 Advantages and Disadvantages of Artificial Data Sets
2.3 Applications
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_2
Key Messages
• The availability of real-world data sets from the sports domain can be severely
limited, especially with regard to aspects like size, consistency, or quality.
• So-called random number generators can be used to simulate random numbers
that satisfy certain predefined characteristics.
• By means of these random numbers, complex processes can be replicated, their
outcomes can be simulated and artificial data sets can be generated without
actually having to observe these processes in reality.
• Some advantages of artificial datasets are the almost unlimited availability and
the precise knowledge about the underlying processes.
• The main disadvantage is that the value of artificial data is highly dependent on
whether the assumptions made during its creation accurately represent reality.
A large part of this book discusses the analysis of real-world data sets. Using these
data sets is the natural approach, as they represent the results of the complex
real-world processes that we want to analyse by adopting methods from computer
science. However, it should be kept in mind that even real data sets can be subject to
severe problems. In particular, a major challenge is that real-world data may be
limited concerning various aspects such as size, consistency, and competition rules.
In many applications, the sample size does not present a problem, for example,
when analysing a large number of match results (Angelini & de Angelis, 2019;
Kovalchik, 2016) or goals (Wunderlich et al., 2021a). However, natural limits exist
due to the rarity of certain events, such as World Cup matches (Armatas et al.,
2007; Delgado-Bordonau et al., 2013). Moreover, for data with a high granularity
such as positional data, data availability is still significantly limited and it is still
common that studies are based on 50 or fewer matches (Clemente et al., 2014; Klemp
et al., 2021).
The consistency of the data can be affected by internal and external factors, as
professional sports can be subject to inconsistent rules, rule changes, or social
influences. Examples include male tennis players competing in a best-of-five-sets
format in Grand Slam tournaments while playing best of three sets in ATP Tour matches
(see Clarke & Dyte, 2000), the awarding of three instead of two points for a win in
soccer (Riedl et al., 2015), the adjustment of the three-point line in basketball
(Strumbelj et al., 2013), or possible effects of ghost games during the COVID-19
pandemic on home advantage (Wunderlich et al., 2021b). Further data-limiting
issues include the completeness and veracity of available data.
The idea of artificial data, in a way, can be compared to the idea of hypothesis test-
ing in statistics. When performing such tests, we assume some null hypothesis to be
true, and based on this, the distribution of the possible outcomes of an experiment
is calculated (based on the hypothesis being true and taking randomness into
account). Once the experiment has been performed, this theoretical distribution
helps to understand how well the experimental data match the given hypothesis.
Put simply, it helps to decide whether the observed data tend to support the
null hypothesis or speak against it.
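The logic of such a test can be sketched in a few lines of Python. The example is purely illustrative: it assumes a hypothetical sample of 100 decided matches with 60 home wins and the null hypothesis that home and away wins are equally likely. Neither the numbers nor the function names are taken from a study cited in this chapter.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials if the null hypothesis holds."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(observed: int, n: int, p: float = 0.5) -> float:
    """Probability of at least `observed` successes under the null hypothesis."""
    return sum(binomial_pmf(k, n, p) for k in range(observed, n + 1))


# Hypothetical data (assumption for illustration): 60 home wins in 100 decided matches.
p_value = one_sided_p_value(observed=60, n=100, p=0.5)
print(f"P(at least 60 home wins | no home advantage) = {p_value:.4f}")
```

Here the distribution under the null hypothesis is a binomial distribution and can be written down explicitly, so no simulation is needed. A small p-value indicates that the observed data are hard to reconcile with the null hypothesis.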
While probability distributions in hypothesis testing can be stated explicitly,
artificial data are usually applied in situations that are too complex to be described
explicitly through mathematical formulas with reasonable effort. Instead, in a first
step, the underlying process is modelled mathematically by describing the systematic
characteristics of the process. Using random numbers and so-called Monte
Carlo simulation, the random aspects can then be added in a second step. Thus, the
probability of intermediate results or outcomes of the process can be estimated.
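The two steps can be illustrated with a minimal sketch for a single soccer match. The systematic part is assumed to be captured by an expected number of goals per team, and the random part is added by drawing goal counts from independent Poisson distributions. Both the Poisson assumption and the rates of 1.6 and 1.1 goals are illustrative choices and are not taken from this chapter.

```python
import math
import random


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_match_outcomes(home_rate, away_rate, n_runs=20_000, seed=1):
    """Step 2: add randomness to the systematic model and count the outcomes."""
    rng = random.Random(seed)
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_runs):
        home_goals = poisson_sample(rng, home_rate)
        away_goals = poisson_sample(rng, away_rate)
        if home_goals > away_goals:
            counts["home"] += 1
        elif home_goals < away_goals:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {outcome: count / n_runs for outcome, count in counts.items()}


# Step 1: systematic characteristics, here assumed expected goals per team.
probabilities = simulate_match_outcomes(home_rate=1.6, away_rate=1.1)
print(probabilities)
```

The relative frequencies over many simulation runs serve as estimates of the outcome probabilities. Increasing `n_runs` reduces the random error of these estimates at the cost of computing time.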
2.2.3 Random Numbers and Monte Carlo Simulation
2.3 Applications
► Example 1
The first domain of application focuses on simulation of the outcomes of sports events
based on certain assumptions. Usually, both the estimation of the systematic strength of
teams or players as well as a modelling of the course of play in the respective sport are
required. Under these assumptions, matches or entire competitions can be repeatedly
simulated. Based on a large number of such simulation runs, the probability of each of
the possible outcomes is obtained. The study by Newton and Aslam (2009) represents an
exemplary case of this approach. Using real data from professional tennis, the authors
estimate the systematic strength of tennis players, measured as the probability of
winning a point on their serve or return. In addition, they estimate the variation in
players’ performance, measured as the variability of these probabilities. Using a model
that builds upon the rules of tennis involving points, games and sets, the probabilities
for different match outcomes can be derived from the player characteristics via Monte Carlo
simulation. Even entire tournaments can be simulated by randomly drawing the winners
of each match under the given probabilities. Artificial data on tournament outcomes
can then be analysed to determine the most likely winners and each player’s chances of
reaching a particular round. ◄
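A strongly simplified sketch of this idea is shown below. It simulates only a single service game with a constant probability of winning a point on serve, and compares the simulated result with the known closed-form solution for this simple case. It is not the model of Newton and Aslam (2009), which additionally covers sets, matches and varying player performance. The value 0.62 is an assumed point-win probability.

```python
import random


def simulate_game(rng: random.Random, p_serve: float) -> bool:
    """Play one service game point by point; return True if the server wins."""
    server, returner = 0, 0
    while True:
        if rng.random() < p_serve:
            server += 1
        else:
            returner += 1
        # A game is won with at least four points and a two-point lead.
        if server >= 4 and server - returner >= 2:
            return True
        if returner >= 4 and returner - server >= 2:
            return False


def estimate_hold_probability(p_serve, n_runs=50_000, seed=7):
    """Monte Carlo estimate of the probability that the server wins the game."""
    rng = random.Random(seed)
    wins = sum(simulate_game(rng, p_serve) for _ in range(n_runs))
    return wins / n_runs


def exact_hold_probability(p: float) -> float:
    """Closed-form probability of winning a game for constant point probability p."""
    q = 1 - p
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return win_before_deuce + reach_deuce * win_from_deuce


simulated = estimate_hold_probability(0.62)
exact = exact_hold_probability(0.62)
print(f"simulated: {simulated:.3f}, exact: {exact:.3f}")
```

For a single game the simulation is not strictly necessary, because the exact value is available. The comparison shows, however, that the simulated frequency approaches the exact probability, which justifies using the same technique for sets, matches and tournaments where explicit formulas become unwieldy.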
► Example 2
The second domain of application aims to better understand the impact and interaction
of different influencing factors. In this regard, simulation and artificial data can be used
to understand the impact of individual variables and their interaction on the outcome of
a process and, above all, to avoid incorrect conclusions. An exemplary case is the study by
Bornn et al. (2019), which addresses the question of whether workload, measured as the
so-called acute chronic workload ratio (ACWR), is a predictor of injury risk. Numerous
existing studies had previously suggested that this is true, while the authors of the afore-
mentioned study suspect that other influencing factors may confound this relationship.
For this reason, artificial data were simulated under the assumption that injuries depend
only on the load of the current training session and not on the ACWR. However, when
analysing the resulting data, significant correlations between ACWR and injuries can
still be found. Based solely on theoretical considerations and artificial data, the authors
were able to show that the load of the current training session can confound the results,
and thus that results previously found using real-world data may have been inaccurate or
even misinterpreted. ◄
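The confounding mechanism can be reproduced with a small simulation. The following sketch is not the simulation of Bornn et al. (2019) but a simplified illustration under stated assumptions: daily loads are drawn independently from an exponential distribution, the ACWR is computed as the mean load of the last 7 days divided by the mean load of the last 28 days (both including the current day), and the injury probability depends only on the load of the current day.

```python
import random


def simulate_acwr_confounding(n_days=20_000, seed=3):
    """Return injury rates on days with low and with high ACWR."""
    rng = random.Random(seed)
    # Assumed load model: independent daily loads with mean 1.0 (arbitrary units).
    loads = [rng.expovariate(1.0) for _ in range(n_days)]
    records = []
    for day in range(27, n_days):
        acute = sum(loads[day - 6 : day + 1]) / 7      # mean load of the last 7 days
        chronic = sum(loads[day - 27 : day + 1]) / 28  # mean load of the last 28 days
        acwr = acute / chronic
        # By construction, injuries depend ONLY on the load of the current day.
        injury_probability = min(1.0, 0.05 * loads[day])
        injured = rng.random() < injury_probability
        records.append((acwr, injured))
    records.sort(key=lambda record: record[0])
    half = len(records) // 2
    low, high = records[:half], records[half:]

    def injury_rate(group):
        return sum(injured for _, injured in group) / len(group)

    return injury_rate(low), injury_rate(high)


low_rate, high_rate = simulate_acwr_confounding()
print(f"injury rate, low ACWR: {low_rate:.3f}, high ACWR: {high_rate:.3f}")
```

Although the ACWR has no causal influence in this artificial data set, days with a high ACWR show a higher injury rate, because the load of the current day enters the acute load. Since the data-generating process is known exactly, the apparent relationship can be identified as an artefact.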
► Example 3
The third use case applies artificial data to overcome two challenges of real complex data
sets from sports. First, the specific situation to be analysed is rarely available due to the
generally limited number of matches for which complex data is available. Second,
experimental manipulations, which would allow the deliberate creation of such situations,
are not possible in professional sports events. Artificial data can enable researchers
to conduct meaningful analysis, despite these issues. An exemplary case is a study
by Memmert et al. (2021), in which artificial data were used to extend the analysis of
positional data in soccer. Through a simulation approach, it is possible to systematically
investigate various combinations of formation flexibility of the attacking and defensive
team. In particular, it is possible to find out which tactical flexibility of the teams is most
promising. Although real-world positional data is used as the basis for the simulation,
a pure analysis of positional data from real matches is not sufficient, as it would not be
guaranteed that all combinations to be analysed are actually present in these matches. ◄
References
Newton, P. K., & Aslam, K. (2009). Monte Carlo tennis: A stochastic Markov chain model. Journal
of Quantitative Analysis in Sports, 5(3). https://doi.org/10.2202/1559-0410.1169
Riedl, D., Heuer, A., & Strauss, B. (2015). Why the three-point rule failed to sufficiently reduce the
number of draws in soccer: An application of prospect theory. Journal of Sport & Exercise
Psychology, 37(3), 316–326. https://doi.org/10.1123/jsep.2015-0018
Štrumbelj, E., & Vračar, P. (2012). Simulating a basketball match with a homogeneous Markov model
and forecasting the outcome. International Journal of Forecasting, 28(2), 532–542.
https://doi.org/10.1016/j.ijforecast.2011.01.004
Strumbelj, E., Vračar, P., Robnik-Šikonja, M., Dežman, B., & Erčulj, F. (2013). A decade of euroleague
basketball: An analysis of trends and recent rule change effects. Journal of Human Kinetics,
38, 183–189. https://doi.org/10.2478/hukin-2013-0058
Wunderlich, F., & Memmert, D. (2020). Are betting returns a useful measure of accuracy in (sports)
forecasting? International Journal of Forecasting, 36(2), 713–722.
https://doi.org/10.1016/j.ijforecast.2019.08.009
Wunderlich, F., Seck, A., & Memmert, D. (2021a). The influence of randomness on goals in football
decreases over time. An empirical analysis of randomness involved in goal scoring in the English
Premier League. Journal of Sports Sciences, 39(20), 2322–2337.
https://doi.org/10.1080/02640414.2021.1930685
Wunderlich, F., Weigelt, M., Rein, R., & Memmert, D. (2021b). How does spectator presence affect
football? Home advantage remains in European top-class football matches played without
spectators during the COVID-19 pandemic. PLoS One, 16(3), e0248590.
https://doi.org/10.1371/journal.pone.0248590
Zhang, X. (2018). Application of discrete event simulation in health care: A systematic review. BMC
Health Services Research, 18(1), 687. https://doi.org/10.1186/s12913-018-3456-4
3 Text Data
Otto Kolbinger
Contents
3.1 Introduction
3.2 Applications
3.2.1 Evaluation of Technological Officiating Aids
3.2.2 Match Predictions
3.2.3 Talent Scouting
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_3
Key Messages
• Text data can contain information about athletes, competitions, and their impact
on society that is not contained in other data.
• Computer-aided text mining methods allow economic processing and analysis of
large (text) datasets.
• Text data are currently mainly used to address sociological and economic issues
related to sport.
• First studies show potential for the systematic analysis of text data also in areas
that are primarily concerned with the performance of athletes, such as exercise
science and performance analysis.
3.1 Introduction
For a long time, research that addressed questions from sports and sports science
with computer science methods dealt almost exclusively with numerical data, such as
action or position data. However, knowledge about athletes, competitions,
and their effects is often also available in textual form, such as countless scout-
ing reports in junior academies of clubs and federations. Nowadays, advances
in text recognition and machine learning allow the efficient analysis of large
text datasets. Accordingly, so-called “text mining” methods are increasingly
used in theory and practice, especially in disciplines that traditionally work a
lot with data in text form such as open-ended questionnaires or standardized
interviews. In sports science, for example, studies from the field of sports soci-
ology dominate. Several studies have already examined how fans react to polit-
ical statements by athletes, for example, Frederick et al. (2020) regarding
political statements by Megan Rapinoe or Schmidt et al. (2019) regarding pro-
tests during the national anthem. Both of the studies listed used social media
posts as their data source—another trend in research based on textual data.
Accordingly, the exemplary applications in this chapter show two studies that
are based on social media data.
One study deals with the influence of the video assistant referee in soccer, the
so-called VAR, on the mood of fans of the English Premier League on Twitter
(Kolbinger & Knopp, 2020). Here, text data can be used to systematically investigate
the impact of technical innovations on stakeholders, an aspect that has been
neglected in the introduction of technological officiating aids for referees. How
posts from social media can further be used to predict game outcomes is also dem-
onstrated in this chapter using an example from American football (Schumaker et
al., 2017). The fact that this is not the only textual data that can contain valuable
information for match prediction is discussed using a study by Beal et al. (2021).
Finally, as a last exemplary application area, two papers will demonstrate how text
data can be used in talent scouting (Maymin, 2021; Seppa et al., 2017).
Definition
All structured and unstructured text bodies can in principle serve as text data. In
the context of sports informatics, this includes, but is by no means limited to,
social media posts, interviews, and expert assessments in text form.
3.2 Applications
3.2.1 Evaluation of Technological Officiating Aids
Over the last decades, more and more sports have introduced so-called Technological
Officiating Aids to support referees (Kolbinger, 2018). While sports practice and
academic publications have focused primarily on the technology itself and its
impact on decision quality, the influence of these interventions on stakeholders
such as fans has been neglected (Kolbinger & Lames, 2017). Here, textual data in
the form of social media posts offer an easily accessible way to study precisely this
influence. Over 3 billion people worldwide use platforms such as Twitter, Facebook,
or Instagram to make their opinions and emotions known on specific topics or
events (Kozinets, 2020). This of course includes polarizing topics in sports, and the
video assistant in soccer can undoubtedly be seen as such (see also Kolbinger, 2020).
As a data basis for the study on the influence of the VAR on the sentiment of
soccer fans on Twitter, Kolbinger and Knopp (2020) used all tweets of 129 matches
of the Premier League season 2019/20 in which the official match hashtag was used
(i.e., for example, #LIVMUN for Liverpool FC vs. Manchester United FC). Of
these total 643,251 tweets, 58,264, or 9.1%, dealt with the video referee. For these
tweets, as well as the rest of the sample, the authors also performed so-called senti-
ment analysis, to evaluate whether a post expressed an overall rather negative, neu-
tral, or positive sentiment. What was striking was that 76.2% of the tweets about
the VAR expressed negative emotions and only 12.3% expressed positive emotions.
In contrast, for all other tweets during the soccer matches considered, 39.4% of the
posts were positive and only 31.3% were negative. In addition, Kolbinger and
Knopp (2020) looked at how video referee interventions affected average sentiment
during soccer matches and were able to show that on average these interventions
led to a significant drop in sentiment that lasted over 20 min.
Both for deciding whether a tweet referred to the VAR and for assessing sentiment,
the authors developed and used automatic text classifiers. This means that an
algorithm was trained to automatically classify the content of posts into the
aforementioned categories. This is a common procedure for analyzing large amounts of text
data, and it was also used in many of the following studies. For all these studies, the
quality of the text classifier is of paramount importance, a circumstance that, as
with other applications of machine learning methods, is unfortunately often
neglected. Accordingly, it is important for readers of studies with text data that not
only the origin and type of the data are comprehensible, but also the applied
classification procedure and its quality (Kolbinger, 2022).
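How the quality of a text classifier can be reported is sketched below. The example does not reproduce the classifier of Kolbinger and Knopp (2020), which was trained on data. It uses a deliberately simple keyword rule as a stand-in, and both the keyword list and the eight hand-labelled tweets are invented for illustration.

```python
import re


def mentions_var(tweet: str) -> bool:
    """Toy stand-in classifier: flags a tweet as VAR-related via assumed keywords."""
    text = tweet.lower()
    tokens = re.findall(r"[a-z]+", text)
    return "var" in tokens or "video referee" in text


def evaluate(classifier, labelled_tweets):
    """Compare predictions with manual labels and return precision and recall."""
    tp = fp = fn = 0
    for tweet, is_var in labelled_tweets:
        predicted = classifier(tweet)
        if predicted and is_var:
            tp += 1
        elif predicted and not is_var:
            fp += 1
        elif not predicted and is_var:
            fn += 1
    return {"precision": tp / (tp + fp), "recall": tp / (tp + fn)}


# Invented, hand-labelled examples (True = tweet is about the video assistant referee).
LABELLED_TWEETS = [
    ("VAR ruins the game again", True),
    ("Great goal by the striker!", False),
    ("The video referee took three minutes for that", True),
    ("Offside check took forever, unbelievable", True),
    ("So many variables in this match", False),
    ("Vardy scores again", False),
    ("VAR? More like a lottery", True),
    ("Learning JavaScript: var vs let", False),
]

quality = evaluate(mentions_var, LABELLED_TWEETS)
print(quality)
```

Precision describes how many of the posts flagged by the classifier actually belong to the category, while recall describes how many of the relevant posts were found. Reporting such measures on manually labelled data allows readers to judge how far results based on the classifier can be trusted.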
3.2.2 Match Predictions
Another area of application in which the sentiment of social media posts has
already been used in a promising way is the prediction of match results. An
American research group led by Robert P. Schumaker demonstrated this in one
study each on the English Premier League (Schumaker et al., 2016) and the
National Football League (NFL—American Football; Schumaker et al., 2017). It
is more or less an attempt to use the so-called “Wisdom of the Crowd” to predict
the outcome of games. In this subsection, we focus on the NFL study, in which the
authors used a very interesting approach. For each game, all tweets about one of
the participating teams in the 96 h (4 days) before kickoff served as the data basis.
Schumaker et al. (2017) compared how the average sentiment of these posts
changed on the last day before the game compared to the 3 days before. They sim-
ply predicted the team for which there was a more positive change in sentiment (or
a less negative one) as the winner of the game. With this simple method, the authors
achieved the same prediction rate as sports betting providers, and they were par-
ticularly good at predicting wins of underdogs.
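The decision rule described above can be written down in a few lines. The sketch assumes that each post has already been assigned a sentiment score (here on an assumed scale from -1 to 1) and a time stamp in hours before kickoff. The data, the function names and the handling of ties are illustrative assumptions and are not taken from Schumaker et al. (2017).

```python
from statistics import mean


def sentiment_change(posts):
    """Mean sentiment on the last day minus mean sentiment on the 3 days before.

    posts: list of (hours_before_kickoff, sentiment_score) within 96 h before kickoff.
    Assumes that both time windows contain at least one post.
    """
    last_day = [score for hours, score in posts if 0 <= hours < 24]
    earlier = [score for hours, score in posts if 24 <= hours <= 96]
    return mean(last_day) - mean(earlier)


def predict_winner(team_a, posts_a, team_b, posts_b):
    """Predict the team with the more positive (or less negative) change in sentiment."""
    if sentiment_change(posts_a) > sentiment_change(posts_b):
        return team_a
    return team_b  # ties are resolved in favour of team_b (arbitrary assumption)


# Invented example data: (hours before kickoff, sentiment score).
posts_home = [(90, 0.1), (60, 0.2), (30, 0.0), (20, 0.4), (5, 0.6)]
posts_away = [(80, 0.5), (50, 0.3), (26, 0.4), (12, 0.3), (2, 0.1)]

prediction = predict_winner("Home Team", posts_home, "Away Team", posts_away)
print(prediction)
```

In this invented example, the sentiment towards the home team improves on the last day while the sentiment towards the away team declines, so the home team is predicted as the winner. Note that the rule uses the change in sentiment, not its absolute level.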
The same pattern is found in a paper by Beal et al. (2021), but using a funda-
mentally different type of text data. Instead of using many nonspecific short text
corpora hoping for swarm intelligence, as Schumaker et al. (2017) did, the text data
of Beal et al. (2021) are each single, elaborate texts on a predefined topic, namely
pre-match reports on Premier League matches from an English daily newspaper.
Here, in contrast to the previous studies, no classifications of the text (except
for the assignment of each sentence to a team) were made. Again, this approach
specifically predicted unexpected outcomes better than models based on numerical
data. Thus, it seems that there is information contained in the text that cannot (at
least not yet) be represented by numerical data like previous results or so-called key
performance indicators.
3.2.3 Talent Scouting
“[Kobe knows] the game of basketball and what needs to be done to win”—this
excerpt from Jason Sean Fuiman’s scouting report on Kobe Bryant, who sadly
passed away much too soon, proved to be very true (Sumsky, 2020). Interestingly,
Fuiman refers here to performance characteristics of Kobe Bryant that have not
been, and may never be, mapped via numerical data. Such scouting reports exist
in countless clubs and associations for a countless number of athletes. Text-mining
techniques can harness them in an economical way, as demonstrated by two
studies which each attempted to predict expected performance in a professional
league based on commercial scouting reports. Both used very different approaches
to do so.
Seppa et al. (2017) combined sentiment analysis with a so-called lexicon-based
categorization of text data to evaluate scouting reports in ice hockey. This means
that they tried to assign each sentence or paragraph of a scouting report to a
category via certain keywords or word strings. For example, a sentence was
assigned to the category “effort” if words such as “effort” itself or “lazy” or word
chains such as “needs to compete harder” appeared. In conjunction with an anal-
ysis of the sentiment of the corresponding passage, they then tried to classify the
players for each of these categories, for example as players with “poor effort” or
“good puck skills”. In this way, Seppa et al. (2017) were able to predict the assist
and goal rates of players in the professional leagues better compared to models
based on the assists and goals scored in youth leagues. Predictions were even
better when the two data types were combined. It must be noted that the
prediction quality of both methods and of their combination was not very
high. Nevertheless, the pattern that the reports were able to improve the prediction
quality is very interesting.
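The combination of lexicon-based categorization and sentiment analysis can be illustrated with a minimal sketch. Only the keywords for the category “effort” are taken from the description above. The keywords for “puck skills”, the two small sentiment word lists and the example report are assumptions made for illustration and do not reproduce the lexicon of Seppa et al. (2017).

```python
import re

CATEGORY_LEXICON = {
    "effort": ["effort", "lazy", "needs to compete harder"],  # keywords named in the text
    "puck skills": ["puck", "stickhandling"],                 # assumed keywords
}
POSITIVE_WORDS = {"good", "great", "excellent", "strong"}     # assumed sentiment lexicon
NEGATIVE_WORDS = {"poor", "lazy", "weak", "needs"}            # assumed sentiment lexicon


def label_report(report: str) -> dict:
    """Assign each sentence to categories via keywords and attach its sentiment."""
    labels = {}
    for sentence in re.split(r"[.!?]+", report.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        score = len(tokens & POSITIVE_WORDS) - len(tokens & NEGATIVE_WORDS)
        polarity = "good" if score > 0 else "poor" if score < 0 else "neutral"
        for category, keywords in CATEGORY_LEXICON.items():
            if any(keyword in sentence for keyword in keywords):
                labels[category] = polarity
    return labels


# Invented scouting report for illustration.
report = (
    "Shows great stickhandling in tight spaces. "
    "Can look lazy on the backcheck and needs to compete harder."
)
player_labels = label_report(report)
print(player_labels)
```

The resulting labels, such as “good puck skills” or “poor effort”, can then be used as categorical predictors in a statistical model. A lexicon of this kind is transparent and easy to adapt, but it depends heavily on how completely the keyword lists cover the language actually used by the scouts.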
In a study on basketball, Maymin (2021) describes that a model he devel-
oped would have outperformed the draft performance of 29 of the 30 NBA
teams. In addition to game statistics, the model included scouting reports,
which contained scores for specific skills as well as pre-structured text data.
The text bodies were divided into “Strengths,” “Weaknesses,” “Overall,” and
“Notes.” Because Maymin (2021) was primarily concerned with comparing his
model to the draft performance of NBA teams, he did not go into detail about
the contributions of the individual components of his model. However, it is
already apparent from the summary overview of the importance of the com-
ponents that the content and, in particular, both length and sentiment of the
individual categories made a predictive contribution that was similar to that
of individual match statistics.
Accordingly, the state of research on the usability of text data for talent diag-
nostics is by no means satisfactory, especially since the few studies to date have
focused on commercial scouting reports and exclusively on the sports system in
North America. However, the initial results can certainly be considered promising
and suggest that text data represent information that is not (yet) available in other
data.
References
Beal, R., Middleton, S. E., Norman, T. J., & Ramchurn, S. D. (2021). Combining machine learning
and human experts to predict match outcomes in football: A baseline model. Proceedings of the
AAAI Conference on Artificial Intelligence, 35(17), 15447–15451.
Frederick, E. L., Pegoraro, A., & Schmidt, S. (2020). “I’m not going to the f***ing white house”:
Twitter users react to Donald Trump and Megan Rapinoe. Communication & Sport, 10,
1210–1228. https://doi.org/10.1177/2167479520950778
Kolbinger, O. (2018). Innovative technische Hilfsmittel zur Unterstützung von Schiedsrichtern in
Spielsportarten als Gegenstand von Evaluationsforschung [Innovative Technological Officiating
Aids as object of Evaluative Research]. Doctoral dissertation, Technical University of Munich.
Kolbinger, O. (2020). VAR experiments in the Bundesliga. In M. Armenteros, A. J. Benítez, & M. A.
Betancor (Eds.), The use of video technologies in refereeing football and other sports (pp. 228–245).
Routledge.
Kolbinger, O. (2022). Text mining and performance analysis. In: International conference on security,
privacy, and anonymity in computation, communication, and storage (pp. 3–8). Springer, Cham.
Kolbinger, O., & Knopp, M. (2020). Video kills the sentiment—Exploring fans’ reception of the
video assistant referee in the English premier league using twitter data. PLoS One, 15(12),
e0242728. https://doi.org/10.1371/journal.pone.0242728
Kolbinger, O., & Lames, M. (2017). Scientific approaches to technological officiating aids in game
sports. Current Issues in Sport Science, 2, 001. https://doi.org/10.15203/CISS_2017.001
Kozinets, R. V. (2020). Netnography: The essential guide to qualitative social media research. Sage.
Maymin, P. (2021). Using scouting reports text to predict NCAA→NBA performance. Journal of
Business Analytics, 4(1), 40–54. https://doi.org/10.1080/2573234X.2021.1873077
Schmidt, S. H., Frederick, E. L., Pegoraro, A., & Spencer, T. C. (2019). An analysis of Colin
Kaepernick, Megan Rapinoe, and the national anthem protests. Communication & Sport, 7(5),
653–677. https://doi.org/10.1177/2167479518793625
Schumaker, R. P., Jarmoszko, A. T., & Labedz, C. S., Jr. (2016). Predicting wins and spread in the
Premier League using a sentiment analysis of twitter. Decision Support Systems, 88, 76–84.
https://doi.org/10.1016/j.dss.2016.05.010
Schumaker, R. P., Labedz, C. S., Jr., Jarmoszko, A. T., & Brown, L. L. (2017). Prediction from
regional angst–a study of NFL sentiment in Twitter using technical stock market charting.
Decision Support Systems, 98, 80–88. https://doi.org/10.1016/j.dss.2017.04.010
Seppa, T., Schuckers, M. E., & Rovito, M. (2017). Text mining of scouting reports as a novel data
source for improving NHL draft analytics. In Ottawa hockey analytics conference (pp. 1–11).
Sumsky, A. (2020, July 2). Kobe Bryant’s Scouting Report is Worth the Read. Basketball forever.
https://basketballfor-ever.com/2020/07/02/kobe-bryants-scouting-report-worth-read.
4 Video Data
Eric Müller-Budack, Wolfgang Gritz, and Ralph Ewerth
Contents
4.2 Background
4.4 Applications
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_4
Key Messages
• Video data capture actions and poses of athletes as well as movements.
• With the help of AI-based approaches, video recordings can be automatically
analyzed to obtain time-accurate information about movements, actions, and
poses, available for further analysis.
• Using information extracted from video recordings, both video and position
data can be enriched with additional metadata.
• Using computational methods for sports field registration, position data can be
extracted from videos.
• In the future, real-time approaches may help to evaluate live actions in individual
and team sports.
4.2 Background
On the one hand, position data help to objectively evaluate games relatively
quickly by means of an abstract representation, but on the other hand, lots of
details are naturally lost in the process. This includes information about actions
(header, sliding tackle, etc.), pose, head and gaze direction, as well as movement
details. To capture such information, computer vision approaches can be applied
to video data, usually based on deep learning models (see Chaps. 20 and 21).
In recent years, several methods have been presented to detect actions in sports
videos with exact time points (Biermann et al., 2021; Deliège et al., 2021; Giancola
& Ghanem, 2021). Body pose estimation approaches (Cao et al., 2021; Kreiss
et al., 2019) detect key points, e.g., shoulders, hips, knees, joints, etc. for the
depicted subjects to represent the pose. These approaches are trained with videos
from the domains of team and individual sports, among others, and have been
successfully applied to different sports, for example, to evaluate possible pass
options (Sangüesa et al., 2020) or to analyze penalty kicks (de Sousa Pinheiro
et al., 2022).
Position data enable various other applications. Therefore, more and more
approaches have recently been proposed for sports field registration (Chen & Little,
2019; Sha et al., 2020; Theiner & Ewerth, 2023), which is important for the extraction of
position data from videos. Methods for sports field registration transform the visible
part of the pitch in the image or video frame into a 2D model of the sports
field. In combination with approaches for object detection (Zhou et al., 2020),
Theiner et al. (2022) have presented a first system for extracting position data from
television and scouting feed recordings of soccer matches.
In addition to the previously mentioned research topics, there are more research
fields around AI-based sports video analysis. These include, for example, the auto-
matic generation of highlight videos (Decroos et al., 2017) and the tracking and
(re)identification (even across shots taken from different cameras) of athletes or
game equipment (Cioppa et al., 2022; Habel et al., 2022; Rematas et al., 2018).
Furthermore, researchers have explored software tools (e.g., SportSense by Probst
et al., 2018) and information visualizations (Fischer et al., 2019) for effective analy-
sis processes using sports videos. Besides the aforementioned research approaches,
various commercial tools, such as Skillcorner (https://skillcorner.com/) and
Stats Perform (https://www.statsperform.com), also provide solutions for the
analysis of sports videos.
Theiner et al. (2022) presented a first system (Fig. 4.5.1) that combines
state-of-the-art computer vision techniques to automatically estimate position
data in television broadcasts and scouting feed recordings of soccer matches.
For this purpose, the sports field and field markings in the video frame are
segmented and compared to a reference database of synthetic images of the
sports field with known camera parameters using a deep learning approach
(Chen & Little, 2019). A homography matrix is determined based on the camera
parameters of the most similar reference image. Subsequently, the homography
matrix can be applied to transform the video frame into a 2D sports field model.
This so-called sports field registration was further optimized by Theiner and
Ewerth (2023) by estimating camera parameters via an iterative optimization of
the reprojection errors of geometric primitives (line segments of the sports
field) to the 2D sports field model. Finally, the players are detected in the
video images using a deep learning approach (Zhou et al., 2020) and transformed
to the 2D sports field model through the homography matrix to estimate position
data. The system has achieved very good results and provides an initial basis
for performing various analyses based on position data (see Chap. 6), such as
automatic classification of soccer formations (Müller-Budack et al., 2019),
spatial control (Memmert et al., 2019), and other key performance indicators
(KPIs).
Fig. 4.5.1 Processing pipeline for position data extraction from videos (Theiner et al., 2022)
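The final step of the pipeline, mapping detected players from the video frame to the 2D sports field model through the homography matrix, can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the implementation of Theiner et al. (2022): the matrix `H`, the pixel coordinates and the function name `project_to_pitch` are hypothetical, and in practice `H` would come from the sports field registration step.

```python
import numpy as np


def project_to_pitch(H, pixel_points):
    """Map image points (u, v) to pitch coordinates (x, y) with a 3x3 homography."""
    pts = np.asarray(pixel_points, dtype=float)
    # Append a 1 to every point to obtain homogeneous coordinates (u, v, 1).
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homogeneous @ H.T
    # Divide by the third component to return to Cartesian coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical homography (assumption): a simple scaling and shift.
# A matrix estimated from a real broadcast frame would also contain
# perspective terms in its last row.
H = np.array([[0.1, 0.0, -10.0],
              [0.0, 0.1, -5.0],
              [0.0, 0.0, 1.0]])

# Hypothetical foot points of two detected players, in pixels.
foot_points = [(100.0, 50.0), (600.0, 400.0)]
positions = project_to_pitch(H, foot_points)
```

The foot point of a detection (for example the bottom centre of a bounding box) is the usual choice because it lies on the pitch plane, which is the plane the homography describes.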
4.4 Applications
Giancola & Ghanem, 2021) already achieve promising results in spotting actions in
videos. Using such approaches, it is possible to efficiently search for specific actions in
videos and large video collections. For example, standard situations or situations that
led to a goal could be analyzed in more detail for selected teams, matches, etc. On the
other hand, starting from a selected scene, it is possible to find similar scenes in terms
of number and type of actions in the same or in other videos. This allows game analysts
to search for specific tactical patterns. An existing limitation of the SoccerNet dataset is
that important actions such as passes as well as their attributes (e.g., whether a flat or high
pass) are not annotated. A corresponding taxonomy for a more complete coverage of
actions in invasion sports has been presented by Biermann et al. (2021). With the help
of annotated training data according to this taxonomy, current methods for action spot-
ting (e.g., Giancola & Ghanem, 2021) can be finetuned and extended with these classes
in the future. ◄
Sangüesa, A. A., Martín, A., Fernández, J., Ballester, C., & Haro, G. (2020). Using Player’s Body-
Orientation to Model Pass Feasibility in Soccer. IEEE/CVF Conference on Computer Vision and
Pattern Recognition, CVPR Workshops 2020, Seattle, WA, USA, 2020 (pp. 3875–3884). IEEE/
CVF.
Sha, L., Hobbs, J. A., Felsen, P., Wei, X., Lucey, P., & Ganguly, S. (2020). End-to-end camera calibra-
tion for broadcast videos. IEEE/CVF Conference on Computer Vision and Pattern Recognition,
CVPR 2020, Seattle, WA, USA, 2020 (pp. 13624–13633). IEEE.
Theiner, J. & Ewerth, R. (2023). TVCalib: Camera Calibration for Sports Field Registration in
Soccer. IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023,
Waikoloa, HI, USA, 2023 (pp. 1166–1175). IEEE/CVF.
Theiner, J., Gritz, W., Müller-Budack, E., Rein, R., Memmert, D., & Ewerth, R. (2022). Extraction
of positional player data from broadcast soccer videos. IEEE/CVF Winter Conference on
Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, 2022 (pp. 1463–1473). IEEE/
CVF.
Zhou, X., Koltun, V., & Krähenbühl, P. (2020). Tracking objects as points. European Conference on
Computer Vision, ECCV 2020, Glasgow, UK, 2020 (pp. 474–490). Springer.
5 Event Data
Marc Garnica Caparrós
Contents
5.2 Background
5.3 Application
5.3.1 Event Data to Extend Box Score Statistics
5.3.2 Event Data to Value in-Game Actions and Player Impact
5.3.3 Event Data to Understand Player Interactions
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_5
Key Messages
• Event data is the time-ordered collection of all actions occurring in an invasion
sports game such as soccer or basketball.
• Event data not only includes time and sport-specific attributes but also locates
all events on the field and can contain custom attributes to enrich the data
analysis.
• Invasion sports are a complex interaction of several players. Event data can help
dissect the sequences of a game and understand the outcome.
• Event data allows computing advanced statistics of a game that can contextualise
the activity of certain players in certain situations.
• Several probabilistic models have been presented in recent years that aim to analyse the
chain of events leading to a certain team objective, such as the Expected Goals
metric.
• Not all goals are created equal: event data contains crucial information for
understanding what happens, where and how.
• Correct management and modelling of event data can contribute to a better
understanding of team tactics and player contribution.
Not that long ago, soccer players were mainly evaluated only by their scoring abil-
ity. Player awards were then related to the players with higher offensive activity and
efficiency. Prior to the explosion of interest in performance statistics and key indi-
cators of players’ contribution, the assist (i.e., the pass that enables another player
to score) and the passing ability overall started receiving more attention. Nowadays
and through the most prolific years of analytics in soccer and other sports, players
are evaluated on both the offensive and defensive ends at high granularity. The short
sequence between a goal and its preceding pass has been extended substantially:
How was the attacking play started? Who moved the ball to the attacking zone
where the assist was performed? The coaching staff no longer attributes player
performance only to the last movements before a player scores, but to a larger and
more meaningful sequence of events leading to it. This sequence of XY-located
events on the pitch is the basis of event data, a time-based log of all actions occurring
during the game. The analysis of event data has become a crucial task for any
professional soccer team, with the main applications stemming from pattern mining,
sequence analysis and association rules. In soccer, event data led to the emergence
of advanced metrics such as Expected Goals (xG) (Caley, 2015). Extensions exist
not only in soccer but also in sports like basketball, where event data is
also one of the data sources most used in daily operations, with expected metrics
such as Expected Possession Value (Cervone et al., 2014).
5.2 Background
The emergence of fine-grained data is one of the main drivers of the big
data analytics revolution that the sports industry has experienced in the last 10 years.
For invasion sports (Hughes & Bartlett, 2002), i.e., sports that share common
characteristics with soccer, there are several options to recreate the
game through data. Box-score statistics, often called match sheet data, provide a
very intuitive picture of the actions that occurred during the game, for example, the
number of passes of a certain team in a basketball game (Oliver, 2004) or the num-
ber of shots in a soccer game. However, this data shows a discrete summary neglect-
ing the interactions, time order and distribution of the game. The temporal
information, i.e., when the events are happening and in which order, was added
into the so-called play-by-play data, an ordered textual collection of all the actions
performed by both teams during a game. This type of data source not only pro-
vided a more exact description of the game but also allowed for sequence-based
analysis of events (Carling et al., 2008), moving the attention towards the chain of
events rather than the appearance of an individual event.
Play-by-play data collection motivated the study of temporal interactions in
invasion sports. In some cases, processing of this textual log could generate ad-hoc
box scores with temporal criteria. For instance, the distribution of the events in
time could give information about the sports structure and characteristics (Alberti
et al., 2013). Despite the increased information present in this data source, the tex-
tual information of each action occurring in the game was often insufficient, limit-
ing its application and studies. Thanks to the advances in data capture technologies
and computer-vision systems (Gudmundsson & Horton, 2018), play-by-play data
evolved into the so-called event data. Event data allows for a better understanding
of the invasion sports game as it includes spatiotemporal information of all actions
occurring during the game as well as several context markers. The appearance of
spatiotemporal event data in invasion sports is often related to another data source,
tracking data or positional data (Bourbousson et al., 2010; Goes et al., 2020).
Positional data is a highly granular data source collected by optical tracking sys-
tems or sensor-based technologies. This data source includes the locations of all
players and the ball at a high frequency. Positional data includes more information
than event data but is often harder to analyse efficiently. Event data is commonly
used for coaching, scouting or performance analysis purposes and has become a
core component of any data-driven sports organization, namely clubs and federations.
Although most of the event data in use is currently collected by a
mixture of manual annotation procedures and automatic systems, advances in
the automated notation of sports games are expanding this data source to all
leagues and academies (Biermann et al., 2021).
Definition
Event data is defined as the time-ordered collection of all actions (events) that are
occurring in an invasion sports game. The event information includes but is not
limited to systematic information such as the timestamp when the event occurred,
the key actor of the event (e.g., the player performing the pass), the team, the
spatial features of the event (i.e., x and y coordinates of the event on the
playing field) and the outcome of the event (e.g., whether the pass was accurate or
not); and also sport-specific attributes. For instance, in soccer, events could be
enriched with the part of the body used to perform the action (left foot, right foot,
head), the type of event (e.g., diagonal passes, through passes, chipped passes), or
the difficulty associated with the event (e.g., the number of defenders in front, the
position of the goalkeeper when shooting, etc.). Recent advances in event data in
soccer include, for each event, the locations of all players and the ball as a key
context attribute (STATSBOMB, 2021). Event data is present in many invasion
sports such as soccer, basketball, handball, hockey and rugby.
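The definition above maps naturally onto a simple record type. The following Python sketch shows one possible representation of an event and how a box score restricted to a time window can be derived from the time-stamped log. The field names, the sample events and the function `box_score` are assumptions made for illustration and do not follow any specific data provider's format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Event:
    """One entry of an event log (field names are illustrative assumptions)."""
    timestamp: float          # seconds since kick-off
    team: str
    player: str               # key actor of the event
    event_type: str           # e.g. "pass", "shot", "interception"
    x: float                  # location on the playing field
    y: float
    success: bool             # outcome of the event
    attributes: dict = field(default_factory=dict)  # sport-specific extras


# Hypothetical sample log.
events = [
    Event(12.4, "A", "A7", "pass", 35.0, 20.0, True, {"body_part": "right foot"}),
    Event(15.1, "A", "A10", "pass", 52.0, 31.0, False, {"pass_type": "through"}),
    Event(16.0, "B", "B4", "interception", 60.0, 33.0, True),
    Event(2710.9, "A", "A9", "shot", 92.0, 36.0, True, {"body_part": "head"}),
]


def box_score(events, start=0.0, end=float("inf")):
    """Count events per (team, event type) within the window [start, end)."""
    counts = Counter()
    for event in events:
        if start <= event.timestamp < end:
            counts[(event.team, event.event_type)] += 1
    return counts


# Box score for the first 45 minutes only.
first_half = box_score(events, 0, 45 * 60)
```

Because every event keeps its timestamp and location, the same log can be aggregated by period, pitch zone or player without collecting new data, which is what distinguishes it from a plain match sheet.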
5.3 Application
► Example 1
The rise of women’s soccer has in recent years become a guide for many
other sports organizations seeking to promote equality in sports. From a performance
point of view, a recent study compared the technical and tactical differences between
men’s and women’s soccer using event data (Garnica-Caparrós & Memmert, 2021). In
order to extract the most detailed statistics to summarize a soccer game and enrich the
comparison, the study made use of event data from two competitions (51 games of the
2016 UEFA Men’s European Championship and 31 games of the 2017 UEFA Women’s
European Championship). A total of 33 discrete features were created from over 100,000
events, broken down by the period of the game and player position. A subjective comparison
methodology was presented using machine learning interpretability tools. Overall, the
study showcased pivotal factors that differentiate each gender’s performance as well as
patterns involving several indicators. ◄
► Example 2
In order to extend the approaches led by the Expected Goals (xG) metric, the full poten-
tial of event data was used with the goal to measure the impact of every action in a
soccer game (Decroos et al., 2020). The VAEP framework tries to assign a contribution
to every single event by measuring how the probabilities of scoring and conceding a goal
change from before to after the event. In doing so, this approach aims to improve on
existing methods that rely only on rare events, such as shots, to evaluate a player’s or
team’s performance. Overall, the VAEP framework can be used to quantify a player’s or
team’s offensive and defensive contributions. ◄
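The core arithmetic of this idea can be sketched as follows: an action is credited with the increase in the probability of scoring minus the increase in the probability of conceding that it brings about. The Python sketch below assumes that these probabilities are already available (in the VAEP framework they are estimated by trained models); the numbers, player labels and function names are hypothetical.

```python
from collections import defaultdict


def action_value(p_score_before, p_score_after, p_concede_before, p_concede_after):
    """Value of one action: gain in scoring chance minus gain in conceding chance."""
    offensive = p_score_after - p_score_before
    defensive = -(p_concede_after - p_concede_before)
    return offensive + defensive


def player_ratings(valued_actions):
    """Sum the action values per player."""
    ratings = defaultdict(float)
    for player, value in valued_actions:
        ratings[player] += value
    return dict(ratings)


# Hypothetical probabilities before and after a through pass by player A10.
value = action_value(p_score_before=0.02, p_score_after=0.08,
                     p_concede_before=0.010, p_concede_after=0.015)

# Hypothetical, already valued actions of two players.
ratings = player_ratings([("A10", value), ("A6", 0.01), ("A10", -0.02)])
```

In this toy case the pass raises the scoring probability by 0.06 and the conceding probability by 0.005, so it is credited with about 0.055. Summing such values over all actions of a player yields a rating that does not depend on rare events such as shots alone.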
► Example 3
The information contained in event data makes it possible to analyse a team’s behaviour as a
complex system of interactions. Duch et al. (2010) highlight the power of network
analysis to understand the interaction between players in a team. Passing networks
were presented as a visual representation of teamwork and used to quantify the contribution of
individuals and team performance. A passing network can be built from simple event
data sources: the nodes of the network represent the players of a team and the edges
represent their connections during the game (i.e., passes). Weighted edges provide an
overview of the most common interactions and of the players pivotal to team tactics. If
available, nodes can also be placed at the average position of the players on the field,
providing an overall picture of the team’s arrangement on the field and an XY-based overview of
their tactics. ◄
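A passing network of this kind can be assembled from pass events with a few lines of Python. The sketch below is an illustration under simplifying assumptions: the pass list is hypothetical, a node is placed at the average location from which a player passed, and a player's involvement is simply the number of passes played or received.

```python
from collections import Counter, defaultdict

# Hypothetical pass events: (passer, receiver, x, y of the pass origin).
passes = [
    ("A4", "A6", 30.0, 34.0),
    ("A6", "A10", 48.0, 30.0),
    ("A4", "A6", 28.0, 40.0),
    ("A10", "A9", 70.0, 36.0),
    ("A6", "A10", 52.0, 38.0),
    ("A6", "A4", 44.0, 34.0),
]


def build_passing_network(passes):
    """Return node positions (average pass origin) and weighted directed edges."""
    edges = Counter()
    origins = defaultdict(list)
    for passer, receiver, x, y in passes:
        edges[(passer, receiver)] += 1
        origins[passer].append((x, y))
    nodes = {
        player: (sum(x for x, _ in locs) / len(locs),
                 sum(y for _, y in locs) / len(locs))
        for player, locs in origins.items()
    }
    return nodes, edges


nodes, edges = build_passing_network(passes)

# Simple involvement measure: passes played plus passes received.
involvement = Counter()
for (passer, receiver), weight in edges.items():
    involvement[passer] += weight
    involvement[receiver] += weight
```

Players who only receive passes (here A9) obtain no node position in this sketch, so a fuller version would also use reception locations. More refined centrality measures can be computed on the same edge weights.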
Study Box
Current research on the use of event data sources to understand invasion sports
is driven by the probabilistic models extending the work of the xG metric and
the VAEP framework. However, its predominant use in sports organizations forms a
need for a more democratic, interpretable and customizable application of
sequence-based analysis of event data. A recent study (Kröckel & Bodendorf,
2020) proposes a generalizable framework for players’ contribution analysis,
team tactics and sequence analysis in soccer that could be extrapolated to the
rest of invasion sports. Process Mining as a tool could serve as an entry point
to unfold the potential of detailed event data sources beyond computing advanced
box score statistics. Process-based management could also allow the
reproducibility of sequences in experimental approaches through simulation.
Oversampling and artificial modelling of event sequences could expand the
existing knowledge in expected metrics algorithms and refine their
interpretation.
References
Alberti, G., Iaia, F. M., Arcelli, E., Cavaggioni, L., & Rampinini, E. (2013). Goal scoring patterns in
major European soccer leagues. Sport Sciences for Health, 9, 151–153. https://doi.org/10.1007/
s11332-013-0154-9
Biermann, H., Theiner, J., Bassek, M., Raabe, D., Memmert, D., & Ewerth, R. (2021). A unified
taxonomy and multimodal dataset for events in invasion games. https://doi.org/10.48550/
ARXIV.2108.11149.
Bourbousson, J., Sève, C., & McGarry, T. (2010). Space–time coordination dynamics in basketball:
Part 2. The interaction between the two teams. Journal of Sports Sciences, 28, 349–358.
https://doi.org/10.1080/02640410903503640
Caley, M. (2015). EPL projections and expected goals method: Spurs are good! Retrieved from
https://cartilagefreecaptain.sbnation.com/2015/10/19/9295905/premier-league-projections-and-new-expected-goals
Carling, C., Bloomfield, J., Nelsen, L., & Reilly, T. (2008). The role of motion analysis in elite soccer.
Sports Medicine, 38, 839–862. https://doi.org/10.2165/00007256-200838100-00004
Cervone, D., D’Amour, A., Bornn, L., & Goldsberry, K. (2014). A multiresolution stochastic process
model for predicting basketball possession outcomes. https://doi.org/10.48550/ARXIV.1408.0777
Decroos, T., Bransen, L., Haaren, J. V., & Davis, J. (2020). VAEP: An objective approach to valuing
on-the-ball actions in soccer (extended abstract). Proceedings of the Twenty-Ninth International
Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial
Intelligence Organization. https://doi.org/10.24963/ijcai.2020/648.
Duch, J., Waitzman, J. S., & Amaral, L. A. (2010). Quantifying the Performance of Individual Players
in a Team Activity. (E. Scalas, Ed.). PLoS One, 5, e10937. https://doi.org/10.1371/journal.
pone.0010937
Garnica-Caparrós, M., & Memmert, D. (2021). Understanding gender differences in professional
European football through machine learning interpretability and match actions data. Scientific
Reports, 11. https://doi.org/10.1038/s41598-021-90264-w
Goes, F. R., Meerhoff, L. A., Bueno, M. J., Rodrigues, D. M., Moura, F. A., Brink, M. S., et al.
(2020). Unlocking the potential of big data to support tactical performance analysis in profes-
sional soccer: A systematic review. European Journal of Sport Science, 21, 481–496. https://doi.
org/10.1080/17461391.2020.1747552
Gudmundsson, J., & Horton, M. (2018). Spatio-temporal analysis of team sports. ACM Computing
Surveys, 50, 1–34. https://doi.org/10.1145/3054132
Hughes, M. D., & Bartlett, R. M. (2002). The use of performance indicators in performance analysis.
Journal of Sports Sciences, 20, 739–754. https://doi.org/10.1080/026404102320675602
Kröckel, P., & Bodendorf, F. (2020). Process mining of football event data: A novel approach for
tactical insights into the game. Frontiers in Artificial Intelligence, 3. https://doi.org/10.3389/
frai.2020.00047
Oliver, D. (2004). Basketball on Paper. Potomac Books Inc.
STATSBOMB. (2021). STATSBOMB. Retrieved from http://www.statsbomb.com/
6 Position Data
Daniel Memmert
Contents
6.2 Background
6.3 Applications
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_6
Key Messages
• Position data describe the positions/movements of athletes and game equipment
in terms of X–Y coordinates
• Position data reflect the complex reality, are reliable, can be evaluated objectively
and extremely quickly
• Performance-relevant parameters from training and competition can be analyzed
on the basis of position data
• Experimental approaches can help develop and test theories in various areas of
sports science and sports informatics in the future
Three examples from soccer will be used to illustrate the questions
that game analysts, but also sports computer scientists, have in a wide variety of
team sports and that can be answered on the basis of position data (Garnica-
Caparros & Memmert, 2021; Rein et al., 2017b). First, it is important to know
which spaces on the soccer field are controlled by which players and when. In addition
to such so-called space control values, it is also of interest how control shares
shift with a team’s passing game: How large is the space gain in build-up play or with
through passes in front of the opponent’s goal? Second, when it comes to a
team’s transition behavior, it is important to know how quickly and where a team’s
players press the opponent after losing the ball. In this way, not only the success
rate in pressing can be quantified, but also the aggressiveness with which a team
switches after losing the ball. Third, one can determine how many opponents can
still defend against a player with the ball, both before and after he has played a pass. The
two values already provide information about how many players a team puts
behind the ball when defending (rest defense), but the difference is just as interesting:
it shows how many opponents a pass ultimately outplays and thus takes out
of play. This value proves to be a good measure for evaluating vertical passes and
can be refined as desired by, for example, integrating the pressure that the opponents
apply to the passer and receiver at the time of delivery and reception in addition
to the outplayed opponents.
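The third example can be made concrete with a deliberately simplified Python sketch. It assumes that the attacking team plays toward increasing x and that an opponent “can still defend” as long as he is positioned closer to his own goal line than the ball; both the criterion and the coordinates are assumptions made for illustration, not the definition used in the cited studies.

```python
def defenders_behind_ball(ball_x, opponent_xs):
    """Count opponents positioned between the ball and their own goal line."""
    return sum(1 for x in opponent_xs if x > ball_x)


def outplayed_opponents(pass_start_x, pass_end_x, opponent_xs):
    """Difference between defenders behind the ball before and after a pass."""
    before = defenders_behind_ball(pass_start_x, opponent_xs)
    after = defenders_behind_ball(pass_end_x, opponent_xs)
    return before - after


# Hypothetical x-coordinates (in metres) of the eleven opponents.
opponent_xs = [45.0, 52.0, 58.0, 61.0, 70.0, 74.0, 78.0, 80.0, 85.0, 88.0, 100.0]

# A vertical pass played from x = 50 m to x = 72 m.
before = defenders_behind_ball(50.0, opponent_xs)
after = defenders_behind_ball(72.0, opponent_xs)
outplayed = outplayed_opponents(50.0, 72.0, opponent_xs)
```

For simplicity the opponents' positions are frozen during the pass; with real position data one would use their locations at the moment of delivery for the first count and at the moment of reception for the second.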
6.2 Background
To capture the performance parameters presented above, it would take many hours
to evaluate video recordings. On the basis of so-called position data, however, this
is possible in seconds (Memmert & Raabe, 2019). The precise recording of the
positions of each player and, where applicable, the playing equipment thus enables
significantly more complex analyses with new types of performance indicators. Meanwhile,
position data are collected in various sports. While in (beach) volleyball (Link,
2014), field hockey (Stöckl & Morgan, 2013), handball (Hassan et al., 2017), tennis
(Kovalchik & Reid, 2018; van Meurs et al., 2021), badminton (Rojas-Valverde et
al., 2020) or basketball (Kempe et al., 2015) this development is only at the beginning,
in soccer position data (Theiner et al., 2022) are already generated by default
(Biermann et al., 2023; cf. Chap. 5). There are several performance measures,
known as key performance indicators (KPIs), that have been used for analysis to
date (see Low et al., 2019; Memmert et al., 2017).
With the help of process, longitudinal and cross-sectional analyses, it is also
possible to investigate dynamic relationships on the basis of position data, for
example, by taking greater account of situational and thus context-specific refer-
ences and interindividual differences (cf. Rein & Memmert, 2016). For this pur-
pose, training and competition data can also be more strongly linked in the future.
In order to interpret positional data, sound theories or models are mandatory
(Memmert et al., 2019; Rein et al., 2017a). Due to the ever-increasing importance
of positional data in sports informatics, experimental approaches that establish
theories and test them empirically are currently becoming visible (Low et al., 2022a,
2022b; Memmert et al., 2019).
Definition
Position data describe the positions and movements of athletes and playing equip-
ment in the form of X–Y coordinates. In sports games, they consist of the positions
of all players and the ball in the form of X–Y coordinates (for the ball, sometimes also
Z components) (Memmert & Raabe, 2019). They are recorded either by special cam-
era systems in the stadium or by mobile devices worn by the players under their clothes
(Memmert, D. 2021. Match Analysis. Abingdon: Routledge).
Study Box
The field experiment by Memmert et al. (2019) is the first to investigate the
effects of different playing systems (here 4-2-3-1 vs. 3-5-2) on tactical KPIs
using positional data in an 11 vs. 11 soccer match setup. The KPIs were measured
using dynamic KPIs such as “Effective Playing Area,” “Length-to-Width Ratio,”
space control, and passing efficiency under pressure. Within the experimental
positional data analysis paradigm, both team formations showed no differences in
effective playing space or space control. Consistent with the hypothesis, a
3-5-2 playing system with 5 levels (3-1-2-2) outperformed the 4-2-3-1 playing
system with 4 levels for the “length-width quotient” and passing efficiency
under pressure, because the former had one more (player) level in the midfield.
The experimental paradigm for positional data analysis represents a useful
approach to advance the development and validation of theory-based models in
sports game performance analysis.
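Two of the KPIs named above can be computed directly from a single frame of position data. The Python sketch below uses one common reading of these measures, which is an assumption here and not necessarily the exact definition of Memmert et al. (2019): the effective playing area is taken as the area of the convex hull around the players, and the length-to-width ratio as the extent along the pitch length divided by the extent across its width. The player coordinates are hypothetical.

```python
def convex_hull(points):
    """Convex hull via Andrew's monotone chain, vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def effective_playing_area(points):
    return polygon_area(convex_hull(points))


def length_width_ratio(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


# Hypothetical positions (x along the pitch length, y across its width), in metres.
players = [(20.0, 14.0), (20.0, 54.0), (50.0, 14.0), (50.0, 54.0), (35.0, 30.0)]
area = effective_playing_area(players)
ratio = length_width_ratio(players)
```

Evaluating both functions frame by frame turns them into the dynamic KPIs mentioned in the study box, i.e., time series that can be compared between playing systems.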
6.3 Applications
► Example 1
In a Big Data field study, a total of 50 matches of the men’s soccer Bundesliga from the
2014/15 season (2 teams, 2 half-times, 200 data sets) were automatically evaluated and
validated based on positional data with different KPIs (Memmert et al., 2016, 2017).
The focus was on the self-developed analysis tool SOCCER (Perl et al., 2013), which
combines conventional data analysis, dynamic state-event modeling, and artificial neural
networks (cf. Chap. 20). The winning teams showed significantly higher space control
shares and space gains in their own build-up play, and on average they also outplayed
more opponents in this phase. They also gained considerable space in front of the
opponent’s goal when attacking. In addition, the teams in the top and bottom thirds of
the final league table were compared over the entire season. Here, too, space control
proved to be a major difference between top clubs and relegation candidates: whether in
build-up or in attacking play, spatial dominance in the critical zones of the pitch
differed significantly in almost all areas, in favor of the teams from the top third.
Comparing winning and losing teams shows that the winning team outplays more opponents
in the build-up over the 90 min. Winners also face fewer opponents on average when in
possession than the losers, even on vertical passes in the attacking area. Nevertheless,
the losing teams made more effort in the transition game. ◄
► Example 2
In order to mask different physiological and anatomical characteristics between women
and men and to avoid gender bias in the assessment of soccer matches at the highest
level of play, tactical performance of both genders was assessed based on positional data
(Garnica-Caparros & Memmert, 2021). Artificial neural networks (see Chap. 20) were
used as objective KPIs, among others. The analysis of pass pressure efficiency measures,
different pressing indices as well as different space control parameters reveals that women
and men show comparable values in all tactical variables. In summary, it was shown that,
in contrast to previous video-based analyses, no significant differences in soccer-specific
tactical performance between women and men in high-performance soccer are detected
when using “blinded” positional data, where no inferences can be made about gender,
and objective KPIs. The results can be used to provide objective conclusions about the
training of players, contribute to the further development and professionalization of
women’s soccer in the area of tactics, and help to promote the public perception and
attractiveness of women’s soccer on the basis of objective evaluation criteria. ◄
► Example 3
Including new context information during match phases, Klemp et al. (2022) investi-
gated the relationship between running performance and goal scoring in professional
soccer. In a sample of 302 matches of the first Bundesliga, the first goal was modeled
as a function of running performance, based on positional data, and team strength of
the teams using logistic regression. The best model showed a median accuracy of 77%,
reflecting a strong relationship between running distance and the probability of scoring
the first goal. This relationship was strongest for total running distance compared to
sprinting distance or running distance with own ball possession. The authors propose
two different possible mechanisms to explain the relationship between running perfor-
mance and scoring success found in the present study. On the one hand, better fulfill-
ment of the players’ tactical goals could be responsible for this, on the other hand, the
increasing fatigue of the opposing players may also play a role. ◄
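As an illustration of how running performance can be derived from positional data, the sketch below sums the distances between consecutive positions of one player and separately accumulates the distance covered above a speed threshold. It is not the procedure used in the study: the function name, the sampling rate, and the sprint threshold of 7 m/s are assumptions made for this example.

```python
import math
from typing import List, Tuple


def running_distances(
    track: List[Tuple[float, float]],
    hz: float,
    sprint_threshold: float = 7.0,  # m/s, assumed threshold for "sprinting"
) -> Tuple[float, float]:
    """Return (total distance, sprint distance) in metres.

    track holds one player's (x, y) positions in metres, sampled hz times
    per second.
    """
    dt = 1.0 / hz
    total = 0.0
    sprint = 0.0
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        total += step
        if step / dt >= sprint_threshold:
            sprint += step
    return total, sprint


# Illustrative track sampled once per second (not real match data).
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (11.0, 10.0)]
total, sprint = running_distances(track, hz=1.0)
print(total, sprint)
```

Aggregated per team and match phase, such distances could serve as predictors in a logistic regression of the kind described above. Real tracking data would usually be smoothed first, since measurement noise inflates summed distances.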
► Example 4
Guerrero-Calderón et al. (2021) analyzed the physical performance of professional soc-
cer players during training, taking into account the contextual factors of match loca-
tion, season duration, and opponent quality, in order to build predictive models for the
performance delivered during training sessions. Training data were generated from 30
professional soccer players of the Spanish La Liga based on positional data (N = 1365
performances). During the training weeks prior to home matches, reduced effort was
shown in terms of various strength, speed, and endurance parameters. The quality of
the opponent also affected the training load. The proposed predictive model represents
an innovative approach to quantify training load in professional soccer considering
novel contextual factors. ◄
References
Biermann, H., Komitova, R., Raabe, D., Müller-Budack, E., Ewerth, R., & Memmert, D. (2023).
Synchronization of passes in event and spatiotemporal soccer data. Scientific Reports, 13(1),
15878.
Online Data
Christoph Breuer
Contents
7.2 Background
7.3 Application
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_7
Key Messages
• Through digitalization, more and more sports data and other data relevant to
sports science are available as online data.
• Web scraping or web crawling is a method to make these data systematically
available for sports science research.
• The use of online data yields significant insights in sports science.
Digital sports data hold enormous potential for a wide variety of applications,
ranging from (1) enriched sports media content and (2) economic pricing
strategies to (3) individual training data. For example, sports media have seen
an increase in online sports reporting and countless second-screen offers.
In addition, more and more TV broadcasters and streaming providers
offer a tactical feed parallel to their classic sports broadcast, while sports leagues
and betting providers give free access to performance statistics. Digitalization
enables sporting goods manufacturers to apply dynamic pricing strategies via their
online stores, while countless training, health, and running apps provide
individual performance measurements.
Accordingly, this opens up a myriad of possibilities for data-based sports
research. Pioneering work has been done by Kemper and Breuer (2016b), who
quantified the potential of dynamic pricing using secondary-market prices of FC
Bayern München home tickets. The authors showed that FC Bayern
München would benefit from introducing a dynamic pricing system of the kind widely
used in North American major league sports.
7.2 Background
Although online data vary considerably in content, technically they are commonly
available via online platforms. To use such data for academic research,
researchers may either copy the data manually (manual scraping) or use web scraping,
where data extraction, copying, storing, and reuse are performed automatically.
In this context, the term web crawling is also used in the academic literature.
With secondary data analysis, copyright and similar legal issues must be con-
sidered. Thus, data’s copyright status should always be verified before researchers
start the scraping. As a rule of thumb, all information accessed with a username
and password is considered private and should not be analyzed (Bradley & James,
2019).
Initially, basic programming skills (e.g., in Python) were necessary to apply web
scraping. Modern web scraping tools such as Octoparse, Parsehub, Scraper API, or
Scrape Simple allow users to create web scrapers even without programming
knowledge. In addition, recent versions of statistical software such as GNU R
provide packages to perform web scraping within the program’s interface. Still,
Python-based automated information extraction outperforms any of these web
scraping tools. JavaScript can make web scraping more difficult, but there are
software solutions for this too, such as PhantomJS.
Naturally, web scraping is not always welcome. Some websites block web scraping,
since massive web scraping significantly affects website performance.
However, the amount of legally usable online data is increasing.
Technically, the web scraping process follows four steps: First, the algorithm
obtains the website’s URL. Second, the web scraper retrieves and stores the
website’s HTML code. Third, the retrieved HTML code is used to identify the
elements of interest, which are stored in a table or database. Finally, the extraction
command is adapted in accordance with the elements of interest.
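The four steps can be sketched with Python’s standard library alone. The sketch below is an illustration under assumptions rather than a production scraper: `fetch_html` covers steps one and two but is not called here, the HTML snippet and its column names are invented for the example, and the parser simply collects every table cell so that the records can be stored as rows.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch_html(url: str) -> str:
    """Steps 1 and 2: take a URL and retrieve the page's HTML code."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class TableExtractor(HTMLParser):
    """Step 3: collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html: str):
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page standing in for a retrieved ticket price list.
SAMPLE_HTML = """
<table>
  <tr><th>Match</th><th>Price</th></tr>
  <tr><td>Home vs. Away</td><td>25.00</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
header, *records = rows
# Step 4: tailor the extraction to the elements of interest (here: named columns).
prices = [dict(zip(header, record)) for record in records]
print(prices)
```

In practice, the terms of use and the copyright status of the target website should be checked before `fetch_html` is pointed at a real URL.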
Unfortunately, providing data in a structured, machine-readable, and secure format
via an Application Programming Interface (API) is not common, at least for online
sports data. Such an API represents a technically and legally more secure procedure
and will gain importance in the future. A role model in this field is the
National Basketball Association (NBA), which provides APIs to support
data usage on its website nba.com. As a result, statistical software such as the
previously mentioned R offers user-written packages to analyze the data
without web scraping detours (e.g., nbastatR; Bresler, 2021). Similarly, numerous
API applications exist in fantasy sports.
Definition
Generally, online data are text data published on the internet covering a wide
variety of content. Using web scraping, such online data can be automatically
extracted, copied, stored, and used. In this context, APIs are a legally
more secure alternative but are less widely available in sports.
Study Box
In 2009, the San Francisco Giants of American Major League Baseball were the first
sports club to introduce a dynamic pricing system. Since then, many North American
professional sports teams have followed this example. However, no German sports club
has yet introduced a dynamic pricing system. The pioneering work by Kemper and Breuer
(2016b) provides a solid foundation for discussion. The authors collected secondary
ticket prices from eBay’s online auction website during the second half of the 2013/14
Bundesliga season. Excluding VIP seats and categorized customer groups (students,
seniors, and disabled persons) yielded a total of 6510 eBay auctions in which 11,637
tickets were sold. Since many auctions sold multiple tickets at once, the total price
paid was divided by the number of tickets sold. Matching these eBay auctions with a
variety of additional data (date and time from the official Bundesliga homepage, match
results and table positions from ergebnisselive.com, derby information from
derbysieg.com, spectator figures from fussball-daten.sport.de, population and capita
data from MB-Research (2013), weather from dwd.de) produced an extensive data set of
6510 auctions, which was analyzed using a two-stage least squares regression. The
results showed that tickets resold on the secondary market almost doubled the initial
price of the tickets. Sports managers may further develop their current variable ticket
pricing strategy to implement a more sophisticated dynamic pricing approach.
7.3 Application
► Example 1
Traditionally, empirical studies on dynamic pricing in sports suffered from a limited
number of observations. Using online data and web scraping enabled Kemper and
Breuer (2016a) to collect ticket prices daily over the entire sales period of a football
club. Specifically, daily ticket prices of the English football club Derby County during
the 2013/2014 season were recorded. Tracking ticket prices from the earliest possible
purchase date yielded 5862 price points for 11 home games, covering the adult, senior,
and U18 age groups. Obtaining prices directly from Derby County’s official website
(http://www.tickets.wearederby.com/match-tickets/buy-tickets/#) and the number of tickets
sold from worldfootball.net (http://www.worldfootball.net) enabled a hedonic price
regression, which revealed a significant time effect on dynamic ticket prices. In more
detail, the study found a monotone price-time relationship, which differs from pricing
models in the aviation or hotel industry. Sports managers may apply these findings
to devise a more sophisticated pricing concept. ◄
► Example 2
In a sponsorship-related study, Breuer et al. (2021) quantified the impact mechanism
and moderating effects of a live match on TV viewers and the associated benefits for
sponsors. To this end, the spectators’ physiological (gaze hits on advertising boards,
electrodermal activity) and psychological (facial expressions) data were collected.
Supplemented by live-betting odds as an indicator of outcome uncertainty, these data
allowed the authors to show for the first time how TV spectators’ emotions vary over the
course of a game and how the perception of the advertiser’s message is affected. ◄
► Example 3
In a similar study, Herold et al. (2021) examined the effect of so-called ghost games
on TV spectators and sponsors (the COVID-19 pandemic required the Bundesliga to
play football matches without spectators in the stadium). In this context, Herold et al.
(2021) used web-scraped live-betting odds to control for the game’s course
and its uncertainty of outcome. The authors also obtained match characteristics that
indicate tension, such as goals and the associated match time. Web scraping techniques
were used to record the accessible bookmakers’ betting odds immediately after they were
published online. Matching the betting odds with the participants’ physiological data
showed that ghost matches result in lower utility for TV spectators, albeit in already
decided matches. Conversely, ghost matches increase sponsors’ benefit, since spectators
often glance at advertising media instead of stadium spectators’ reactions. ◄
► Example 4
Steinfeldt et al. (2022) investigated whether spectators affect the score difference between
the teams. The authors analyzed n = 12,500 NBA games from 11 seasons from 2010/11
to 2020/21. COVID-19 spectator restrictions provided the opportunity to investigate a
potential spectator effect. Using the R package nbastatR (Bresler, 2021), game-level
and regular-season data were collected. The data set includes information on match
results, location, teams’ records, and a range of more sophisticated basketball
statistics, matched with spectator data scraped from basketball-reference.com and betting
odds from oddsportal.com. The authors demonstrated that games played with
limited spectators were more likely to be won by margins of 15, 20, or 25 points than
games with unrestricted crowds. Given that the effect was most severe for games played
on a weaker team’s home court, Steinfeldt et al. (2022) conclude that predominantly
weaker teams suffer from limited crowd support. ◄
References
Bradley, A., & James, R. J. E. (2019). Web scraping using R. Advances in Methods and Practices in
Psychological Science, 2(3), 264–270.
Bresler, A. (2021). nbastatR: R’s interface to NBA data. R package version 0.1.1505. Retrieved
November 24, 2021, from https://github.com/abresler/nbastatR
Breuer, C., Rumpf, C., & Boronczyk, F. (2021). Sponsor message processing in live broadcasts—A
pilot study on the role of game outcome uncertainty and emotions. Psychology & Marketing,
38(5), 896–907.
Herold, E., Boronczyk, F., & Breuer, C. (2021). Professional clubs as platforms in multi-sided mar-
kets: The role of spectators and atmosphere in live football. Sustainability, 13, 2312.
Kemper, C., & Breuer, C. (2016a). Dynamic ticket pricing and the impact of time—An analysis of
price paths of the English soccer club Derby County. European Sport Management Quarterly,
16(2), 233–253.
Kemper, C., & Breuer, C. (2016b). How efficient is dynamic pricing for sport events? Designing a
dynamic pricing model for Bayern Munich. International Journal of Sport Finance, 11(1), 4–25.
Steinfeldt, H., Dallmeyer, S., & Breuer, C. (2022). The silence of the fans—The impact of restricted
crowds in the margin of victory in the NBA. International Journal of Sport Finance, 17, 165–177.
Part III
Modeling
Contents
Chapter 8 Modeling
Jürgen Perl and Daniel Memmert
Modeling
Jürgen Perl and Daniel Memmert
Contents
8.2 Background
8.3 Application
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_8
Key Messages
The idea of modeling in sport is to map complex dynamic systems to their essential
structures, data, and interactions in order to perform descriptive, prognostic, or
planning analyses and calculations.
The four essential steps for developing and applying a model are:
• The reduction of the real system to its essential components and dynamics.
• The mapping of the problem situation to (informatically) manageable objects
(numbers, functions, graphs, etc.) taking into account a corresponding back-mapping
of the obtained results.
• The analysis/calculation is a process of question-oriented information extraction,
or in short: the transformation of the problem or input data into result or
output data.
• The visualization of the results on a system-oriented level (e.g. graphical
representation).
Fig. 8.2 (a) Result of the model calculation. (b) Comparison with reality
with their directions of movement at the beginning of the process as well as veloc-
ity values for player and ball from the video data).
The task of the model would now be to answer the input question of reachabil-
ity from these data by appropriate calculation. The result of this model calculation
(or simulation) as an answer to the reachability question would be “yes” or “no” in
the simplest case. In both cases, however, the answer remains unsatisfactory because
it does not convey where ball reachability occurs or why it does not occur. Thus, as
the fourth step of modeling, adequate visualization of the results is necessary.
The calculated path graph in Fig. 8.2a shows that the player could theoretically
have reached the ball, but not in practice: as the video recordings show, the
ball was kicked away by an opponent before the calculated contact time (cf.
Fig. 8.2b). But this opponent was not part of the model, i.e. the model had
reduced reality too much! And this brings us back to the first and decisive aspect
of modeling—the reduction: The “reduction of the real system …” mentioned
above under (1) is necessary to be able to calculate a result at all and with reason-
able effort. But it must not be too narrow, in order not to leave out essential objects
and dynamics, which influence the result.
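A strongly reduced reachability model of this kind can be sketched in a few lines. The sketch assumes that the ball moves with constant velocity, that the player runs in a straight line at a fixed maximum speed from the first moment, and that no other players exist. These simplifications and the function name are assumptions made for illustration, not the model used by the authors. The function answers the reachability question and, where the answer is yes, also reports when and where contact occurs.

```python
import math
from typing import Optional, Tuple

Vec = Tuple[float, float]


def earliest_interception(
    player: Vec, max_speed: float, ball: Vec, ball_velocity: Vec
) -> Optional[Tuple[float, Vec]]:
    """Earliest time and place at which the player can reach the ball.

    Solves |ball + v*t - player| = max_speed * t for the smallest t >= 0.
    Returns None if the ball cannot be reached under these assumptions.
    """
    dx, dy = ball[0] - player[0], ball[1] - player[1]
    vx, vy = ball_velocity
    a = vx * vx + vy * vy - max_speed ** 2
    b = 2.0 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy
    if c == 0.0:
        return 0.0, ball  # player is already at the ball
    if abs(a) < 1e-12:
        # Ball and player are equally fast: the equation becomes linear.
        if b >= 0.0:
            return None
        t = -c / b
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0.0:
            return None
        root = math.sqrt(disc)
        candidates = [
            t for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)) if t >= 0.0
        ]
        if not candidates:
            return None
        t = min(candidates)
    return t, (ball[0] + vx * t, ball[1] + vy * t)


# Resting ball 10 m away, player running at 5 m/s: contact after 2 s.
print(earliest_interception((0.0, 0.0), 5.0, (10.0, 0.0), (0.0, 0.0)))
# Ball rolling away faster than the player can run: not reachable.
print(earliest_interception((0.0, 0.0), 5.0, (10.0, 0.0), (6.0, 0.0)))
```

Because opponents are not part of this sketch either, it reproduces the weakness discussed above: a "reachable" answer only holds if nobody else gets to the ball first.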
8.2 Background
The essential role of reduction for the mode of action and usability of a model
can be seen for the soccer example in Fig. 8.2b (Perl & Memmert, 2019): The
ball has undergone an abrupt change of motion in its course, which cannot be
explained from the model, but is immediately understandable for the observer: the
intervention of an opposing player. This opposing player was not part of the model
because the reduction was too strong, and therefore his possible effect on the motion
dynamics to be modeled could not be recognized and calculated.
Figure 8.3a shows the typical modeling of such a player-opponent situation:
Assuming that both players move with the same speed, the dividing line between
the blue and the yellow area shows all points reached by the blue and the yellow
player simultaneously. The blue player reaches all points of the blue area faster,
and his yellow opponent reaches all points of the yellow area faster. These areas of
faster reachability are also called the player’s Voronoi cell after its “discoverer”.
The analysis of the reachability of the ball is thus sharpened to the question (e.g.,
Rein et al., 2017): “How does the ball pass through the Voronoi cells of the two
players?”
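Under the same equal-speed assumption, a team’s share of space control can be approximated by laying a grid over the pitch and assigning each grid cell to the team of the nearest player. The following sketch is an illustration only: the pitch dimensions of 105 m by 68 m, the cell size, the function name, and the player positions are assumptions, and cells that are equally close to both teams are split evenly.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def space_control_share(
    team_a: List[Point],
    team_b: List[Point],
    pitch_length: float = 105.0,  # metres, assumed pitch size
    pitch_width: float = 68.0,
    cell: float = 1.0,            # grid resolution in metres
) -> float:
    """Fraction of the pitch that lies in Voronoi cells of team A's players."""
    nx = int(pitch_length / cell)
    ny = int(pitch_width / cell)
    controlled = 0.0
    for i in range(nx):
        x = (i + 0.5) * cell  # centre of the grid cell
        for j in range(ny):
            y = (j + 0.5) * cell
            dist_a = min(math.hypot(x - px, y - py) for px, py in team_a)
            dist_b = min(math.hypot(x - px, y - py) for px, py in team_b)
            if dist_a < dist_b:
                controlled += 1.0
            elif dist_a == dist_b:
                controlled += 0.5  # point on the dividing line
    return controlled / (nx * ny)


# One player per team, placed symmetrically about the halfway line.
share = space_control_share([(26.25, 34.0)], [(78.75, 34.0)])
print(share)
```

With one player per team placed symmetrically, each side controls half of the pitch. With full teams, the same function yields the space control proportions used in formation analyses, and speed differences could be added by comparing arrival times instead of distances.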
Figure 8.3b shows that even if the blue player had moved in the optimal direction,
he would not have had a chance to prevent the yellow opponent’s action: he
could not reach the ball before his opponent. Game analyses based on Voronoi
cells are now standard in soccer and are used to analyze the effectiveness of tactical
formations in terms of space control (Memmert & Raabe, 2018; Perl & Memmert,
2015) (Fig. 8.4).
Fig. 8.4 Soccer field with the Voronoi cells of the players of “A-yellow” and “B-blue” (Memmert
& Raabe, 2018)
Having thus shown that less reduction can also improve the accuracy of modeling,
the question arises: should even more aspects of reality, such as speed differences
and changes or changes in movement directions, be incorporated into the
model? This question, which is central in modeling, cannot be answered with a
blanket yes or no. The answer depends in each case on the available data, the still
justifiable effort, and the expected benefit of modeling and calculation. For example,
the aforementioned additions provide the possibility of a technical visualization
of the game event parallel to the video presentation, but only if the data are
available in sufficient scope and precision. Otherwise, the modeling visualizes the
data deficits rather than the gameplay. In summary: the central art of modeling is an
adequate reduction that preserves essential dynamics without getting lost in gimmicks
(Perl, 2015).
Definition
The model is an abstract representation of a system. It is used to diagnose the
system state and predict the system behavior (Perl & Uthmann, 1997).
The 4 essential steps of modeling are (soccer example in parentheses):
• System reduction (capturing and representing the player-ball situation)
• Problem mapping (setting of position and velocity data)
• Analysis/calculation (calculation of the running paths and, if necessary, the
intersection)
• Result visualization (representation of the player-ball situation as a graph)
Study Box
The goal of key performance indicators (KPI; Memmert et al., 2017, Low et al.,
2019) is to map complex system behavior to single values in order to scale, score, and
rank systems or system components. However, very often this mapping only reduces
important information about tactical behavior or game dynamics without replacing
it with more meaningful information. Perl and Memmert (2017) used a two-step
approach to bridge the gap between complex dynamics and numerical metrics in
offensive play in soccer. First, they developed a model that visualizes offensive action
in a process-oriented manner by using KPIs to represent offensive performance.
Second, this model has been organized in terms of time intervals, allowing effective-
ness to be measured both for an entire half and for intervals of arbitrary length. In
doing so, Perl and Memmert (2017) have shown that the attack efficiency profile is a
dynamic indicator of a team’s match success. In Fig. 8.5, the red profiles show how
the attack efficiency values of “A-yellow” and “B-blue” for the correlation interval
length IL = 300 s evolve over the half. The efficiency values (OS A, OS B) at
second I0 = 1721 are plotted in the gray box. In the graph, the green profiles show
the respective space control proportions in the opponent’s 30-m zone; the purple
markers show the ball control time points.
Fig. 8.5 Progressions of the attack efficiencies for the interval length IL = 300 s
8.3 Application
► Example 1
Physiological models for the optimization of stress-performance interactions are used
to simulate
• short-term effects of competition load on performance and fatigue;
• long-term effects of training load on performance and recovery requirements (Chap.
13).
The central idea of modeling is to reduce the complex physiological interactions to
the essential aspects of stress and performance. In this context, the delays with which
load and recovery take effect are the focus of attention: the shorter the recovery delay
compared to the load delay, the more developed performance and capability are. Based
on these analysis data, training and competition can be improved in their effect (Tampier
et al., 2012). If the data expected from the analysis does not match the measured data,
this may indicate an irregular training situation such as an unrecognized illness or illicit
aids (e.g., doping). ◄
► Example 2
Tactical-strategic models to represent and analyze player behavior in team games have
been used in soccer:
Formations: The player distribution of a team or its tactical arrangements can be
analyzed by artificial neural networks and thus reduced to a few prototypical formations
(Grunz et al., 2012; Perl et al., 2013). With the help of a simulative dynamics analy-
sis of formation changes in specific game situations, tactical behavior patterns can be
identified and then, for example, optimized, avoided, or disrupted (the opponent) (Perl
& Memmert, 2017). In this context, the modeling of creativity or creative solutions in
sports play is also successful (Memmert & Perl, 2009a, 2009b).
Voronoi cells: As shown above, Voronoi cells help to analyze the spatial control of
players, teams, or tactical groups. Together with ball control, which can be analyzed
from the position and movement data of players and ball, one can thus develop models
that calculate the efficiency of attacking behavior from the coincidence of space and ball
control relative to the players’ action effort (Perl & Memmert, 2015). ◄
References
Grunz, A., Memmert, D., & Perl, J. (2012). Tactical pattern recognition in soccer games by means of
special self-organizing maps. Human Movement Science, 31, 334–343.
Low, B., Coutinho, D., Gonçalves, B., Rein, R., Memmert, D., & Sampaio, J. (2019). A systematic
review of collective tactical behaviours in football using positional data. Sports Medicine, 50,
343–385.
Memmert, D. (Ed.). (2021). Match analysis. Routledge.
Memmert, D., Lemmink, K., & Sampaio, J. (2017). Current approaches to tactical performance anal-
yses in soccer using position data. Sports Medicine, 47, 1–10.
Memmert, D., & Perl, J. (2009a). Analysis and simulation of creativity learning by means of artificial
neural networks. Human Movement Science, 28, 263–282.
Memmert, D., & Perl, J. (2009b). Game creativity analysis by means of neural networks. Journal of
Sport Science, 27, 139–149.
Memmert, D., & Raabe, D. (2018). Data analytics in football. Positional data collection, modelling and
analysis. Routledge.
Perl, J. (2015). Modelling and simulation. In A. Baca (Ed.), Computer science in sport (pp. 110–153).
Routledge.
Perl, J., Grunz, A., & Memmert, D. (2013). Tactics in soccer: An advanced approach. International
Journal of Computer Science in Sport, 12, 33–44.
Perl, J., & Memmert, D. (2015). Analysis of process dynamics in soccer by means of artificial neural
networks and Voronoi-cells. In A. Baca & M. Stöckl (Eds.), Schriften der Deutschen Vereinigung
für Sportwissenschaft, Band 244 (pp. 130–135). Czwalina.
Perl, J., & Memmert, D. (2017). A pilot study on offensive success in soccer based on space and ball
control–key performance indicators and key to understand game dynamics. International Journal
of Computer Science in Sport, 16(1), 65–75.
Perl, J., & Memmert, D. (2019). Soccer: Process and interaction. In A. Baca & J. Perl (Eds.), Model-
ling and simulation in sport and exercise (pp. 73–94). Routledge.
Perl, J., & Uthmann, T. (1997). Modellbildung. In J. Perl, M. Lames, & W.-D. Miethling (Eds.),
Informatik im Sport. Ein Handbuch (pp. 65–80). Schorndorf.
Rein, R., Raabe, D., & Memmert, D. (2017). “Which pass is better?” Novel approaches to assess
passing effectiveness in elite soccer. Human Movement Science, 55, 172–181.
Tampier, M., Endler, S., Novatchkov, H., Baca, A., & Perl, J. (2012). Development of an intelligent
real-time feedback system. International Journal of Computer Science in Sport, 11(3).
Predictive Models
Fabian Wunderlich
Contents
9.2 Background
9.2.1 Looking into the Future
9.2.2 Predictive Models in Sports
9.2.3 Creation of Predictive Models
9.2.4 Exemplary Methods
9.3 Applications
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_9
Key Messages
• Predictive models are relevant in almost all areas of science and society, including
forecasts on weather and climate, politics and demography, as well as stock
prices or economic growth.
• Driven by public and media interest, high data availability, and financial incentives,
the domain of sports is a predestined environment for predictive models.
• Predictive models pursue a clearly defined goal, are based on suitable and meaningful
data, use appropriate methodological approaches to statistical modelling,
as well as suitable criteria to assess predictive quality.
• Methodologically, statistical models or computer science models (such as
machine learning models) are suitable approaches. Among other things, the
selection of methods depends on the exact problem definition and the data
complexity.
Predictive thinking is a common approach in sports: Who will win the Super Bowl?
Which youth player will have a successful career? What will be the results of the
next Premier League matchday? Does the national team have a chance to win the
Basketball World Cup? Which tactical formation is most promising in the upcom-
ing match? How can injuries be avoided by adjusting training load? And who will
win Wimbledon? Such questions are not only popular topics of conversation for
the media and sports fans, but also form the basis of the bookmakers’ business
model and are important aspects in ensuring the sporting and financial success of
sports teams. Thus, predictive models play a significant role in sports, in particular
with regard to the interdisciplinary combination of computer science, mathemat-
ics, and sports science.
9.2 Background
9.2.1 Looking into the Future
Looking into the future seems to be part of normal human behaviour, especially in
our modern society. We want to ensure survival by anticipating natural disasters or
developments such as climate change. We want to live safely by anticipating threats
from everyday crime to terrorist attacks. We want to secure our financial success by
correctly anticipating stock market developments, economic trends, or consumer
behaviour. And finally, we simply want to predict whether it will rain tomorrow so
that we will have an umbrella with us when needed.
Forecasting is therefore a topic that is receiving attention in almost all areas of
science and society. The domains where predictive models play an important role
include economics (Timmermann, 2000), weather (Taylor & Buizza, 2004), climate
(Green et al., 2009), political elections (Wolfers & Leigh, 2002), political conflicts
(Brandt et al., 2014), crime (Gorr et al., 2003), demography (Booth, 2006), or
energy demand (Hong et al., 2016).
9.2.2 Predictive Models in Sports
Sports is another popular application field for predictive models (Horvat & Job,
2020; McHale & Swartz, 2019; Vaughan Williams & Stekler, 2010; Wunderlich &
Memmert, 2020), whose relevance is supported by the specific characteristics of
sport.
Due to high media and public interest, there is a large amount of available data,
allowing predictive models to draw on datasets with large sample sizes and/or a
high level of detail (Angelini & de Angelis, 2019; Klemp et al., 2021; Koopman &
Lit, 2019; Lessmann et al., 2010; Štrumbelj & Vračar, 2012).
In addition, there are several incentives for good predictive models. The sports
betting market offers strong financial incentives for profitable forecasts of game
outcomes, both on the side of bookmakers and professional bettors (Boshnakov et
al., 2017; Constantinou et al., 2012; Hubáček et al., 2019). Moreover, the sports
business itself offers high sporting and financial incentives, e.g., to adequately
model spectator interest (Mueller, 2020; Van Reeth, 2019), optimal tactical move-
ment behaviour on the field (Dick & Brefeld, 2019; Le et al., 2017), or risk of injury
(Rossi et al., 2018).
Last but not least, there is a scientific interest as predictive models in sport can
help to investigate general scientific theories and concepts such as market efficiency
(Angelini & de Angelis, 2019; Bernardo et al., 2019; Direr, 2011; Goddard &
Asimakopoulos, 2004) or crowd wisdom, i.e. collaborative human judgement
(Peeters, 2018; Spann & Skiera, 2009).
F. Wunderlich
In this section, the necessary steps to create a predictive model are explained and
methods from mathematics and computer science are highlighted based on two
exemplary models.
Step 1: Goal
First of all, each predictive model is supposed to address a specific goal. This refers
to solving one of the numerous application examples already mentioned in this
chapter. Moreover, several further questions concerning the characteristics of the
model have to be answered, e.g.
• Is a binary (yes/no) prediction or a percentage forecast needed?
• Should the model be simple and intuitive to understand?
• Does computation time influence the value of the model?
• Does the model aim at high accuracy or high profitability?
Step 2: Data
Data is one of the most important and often limiting aspects of model selection. In
particular, it is important to assess which data are available, whether they can be
used freely and, if so, in what quantity (sample size) the data are available. It is also
important to consider that for meaningful predictive models, the data set should be
divided into a sufficiently large training and test data set (in-sample and out-of-
sample data). Furthermore, the application of a model only makes sense if data
quality and information content are sufficient. Even the most sophisticated model
will not be able to provide satisfactory results if the underlying data are incomplete,
erroneous, or do not contain the required information.
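The division into in-sample and out-of-sample data can be sketched as follows. This is only a minimal illustration, assuming that every match record carries a date and that the split is made chronologically, so that the model is never tested on matches played before its training data. The function name `chronological_split`, the record layout, and the 80/20 ratio are hypothetical choices, not taken from the chapter.

```python
from datetime import date


def chronological_split(matches, train_fraction=0.8):
    """Split match records into training (in-sample) and test (out-of-sample) sets by date."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    ordered = sorted(matches, key=lambda m: m["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical toy data: ten matches on consecutive days.
matches = [
    {"date": date(2023, 1, day), "home_goals": day % 3, "away_goals": (day + 1) % 2}
    for day in range(1, 11)
]
train, test = chronological_split(matches)
```

A random split would also yield two data sets, but for forecasting tasks a chronological split is closer to the real use case, in which only past matches are known at prediction time.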
The following two predictive models are intended to illustrate exemplary methods.
Both are related to the outcome of sports events, and take methodologically differ-
ent approaches borrowed from statistics as well as computer science.
Study Box
ity (prediction accuracy, calibration, log-loss and discrimination), she found that
good models can predict the winner of a match in more than two-thirds of all cases.
Regression models and ELO rating-based models performed best, while none of
these mathematical models outperformed the predictive quality of betting odds.
Moreover, the author reported that all models were more successful in forecasting
matches of top players than matches of lower-ranked players.
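Two of the quality measures named in this box, prediction accuracy and log-loss, can be computed from probabilistic forecasts as in the following sketch. The function names and the toy forecasts are hypothetical; `probs` is assumed to hold the forecast probability that the first-named player wins, and `outcomes` holds 1 if that player actually won and 0 otherwise.

```python
import math


def accuracy(probs, outcomes):
    """Share of matches in which the player forecast as favourite (p > 0.5) won."""
    hits = sum((p > 0.5) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(outcomes)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes (lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)


# Hypothetical forecasts for four matches.
probs = [0.8, 0.6, 0.3, 0.9]
outcomes = [1, 0, 0, 1]
print(accuracy(probs, outcomes), log_loss(probs, outcomes))
```

Accuracy only checks whether the favourite won, whereas log-loss also penalizes forecasts that were confident and wrong.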
9.3 Applications
► Example 1
This application area focuses on probabilistic forecasts for the outcomes of sports events. Using
soccer as an example, this includes the final result in terms of home win, draw, away win
(Hvattum & Arntzen, 2010), the exact number of goals scored by both teams (Karlis &
Ntzoufras, 2003), or the total number of goals in the match (Wheatcroft, 2020). This application
example gains particular relevance from the possibility to bet on all these outcomes in
the sports betting market. The data basis is usually prior results (Hvattum & Arntzen, 2010;
Koopman & Lit, 2019) and/or additional team- or player-specific match statistics (Hubáček
et al., 2019; Štrumbelj & Vračar, 2012). With regard to methodological approaches, classical
statistical methods such as adaptive ratings and probability models are often used. ◄
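An adaptive rating of the kind mentioned above can be illustrated with the classical Elo update. The sketch below uses the standard Elo formulas; the update factor `k=20`, the starting ratings, and the function names are illustrative assumptions, and refinements often used for soccer (home advantage, goal difference) are deliberately left out.

```python
def expected_score(rating_a, rating_b):
    """Expected score of team A against team B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=20.0):
    """Return both updated ratings; score_a is 1 for a win, 0.5 for a draw, 0 for a loss of A."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated teams; team A wins.
new_a, new_b = update_elo(1500.0, 1500.0, 1.0)
```

The rating difference before a match can then serve as the input variable of a probability model for home win, draw, and away win.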
► Example 2
This application area deals with predictive approaches for modelling the tactical behav-
iour of teams on the playing field. It is, therefore, more likely to be assigned to the areas
of performance analysis or game analysis. Specifically, it attempts, for example, to fore-
cast the collective movement behaviour of teams through so-called “ghosting” (Le et al.,
2017; Seidl et al., 2018) or to analyse the dangerousness of game situations and actions
on the field (Dick & Brefeld, 2019; Link et al., 2016; Lucey et al., 2014; Wei et al., 2013).
Such approaches usually draw on extensive data sets of positional data and event data,
which supports the use of machine learning models. ◄
► Example 3
A relatively recent application example is predictive approaches for injury prevention
(Rossi et al., 2018) based on motion data, which establish a link to the fields of load
control and sports medicine. The data basis can be, e.g., GPS motion data, further physical
load data, and, if available, additional personal and medical data (Ehrmann et al., 2016;
Rossi et al., 2018). Again, due to the multitude and complexity of data sources, machine
learning models may be the most suitable approach (Claudino et al., 2019). ◄
Koopman, S. J., & Lit, R. (2019). Forecasting football match results in national league competitions
using score-driven time series models. International Journal of Forecasting, 35(2), 797–809.
https://doi.org/10.1016/j.ijforecast.2018.10.011
Kovalchik, S. A. (2016). Searching for the GOAT of tennis win prediction. Journal of Quantitative
Analysis in Sports, 12(3), 127–138. https://doi.org/10.1515/jqas-2015-0059
Le, H., Carr, P., Yue, Y., & Lucey, P. (2017). Data-driven ghosting using deep imitation learning. In
Proceedings of the 11th annual MIT Sloan sports analytics conference 2017. Boston, MA.
Lessmann, S., Sung, M.-C., & Johnson, J. E. (2010). Alternative methods of predicting competitive
events: An application in horserace betting markets. International Journal of Forecasting, 26(3),
518–536. https://doi.org/10.1016/j.ijforecast.2009.12.013
Link, D., Lang, S., & Seidenschwarz, P. (2016). Real time quantification of dangerousity in football
using spatiotemporal tracking data. PLoS One, 11(12), e0168768.
https://doi.org/10.1371/journal.pone.0168768
Lucey, P., Bialkowski, A., Carr, P., Yue, Y., & Matthews, I. (2014). How to get an open shot: Analyzing
team movement in basketball using tracking data. In Proceedings of the 8th annual MIT
Sloan sports analytics conference. Symposium conducted at the meeting of Citeseer.
McHale, I., & Swartz, T. (2019). Editorial: Forecasting in sports. International Journal of Forecasting,
35(2), 710–711. https://doi.org/10.1016/j.ijforecast.2019.01.002
Mueller, S. Q. (2020). Pre- and within-season attendance forecasting in Major League Baseball: A
random forest approach. Applied Economics, 52(41), 4512–4528.
https://doi.org/10.1080/00036846.2020.1736502
Peeters, T. (2018). Testing the wisdom of crowds in the field: Transfermarkt valuations and international
soccer results. International Journal of Forecasting, 34(1), 17–29.
https://doi.org/10.1016/j.ijforecast.2017.08.002
Rossi, A., Pappalardo, L., Cintia, P., Iaia, F. M., Fernàndez, J., & Medina, D. (2018). Effective injury
forecasting in soccer with GPS training data and machine learning. PLoS One, 13(7), e0201264.
https://doi.org/10.1371/journal.pone.0201264
Seidl, T., Cherukumudi, A., Hartnett, A., Carr, P., & Lucey, P. (2018). Bhostgusters: Realtime interactive
play sketching with synthesized nba defenses. In 12th Annual MIT Sloan Sports Analytics
Conference.
Spann, M., & Skiera, B. (2009). Sports forecasting: A comparison of the forecast accuracy of prediction
markets, betting odds and tipsters. Journal of Forecasting, 28(1), 55–72.
https://doi.org/10.1002/for.1091
Štrumbelj, E., & Vračar, P. (2012). Simulating a basketball match with a homogeneous Markov model
and forecasting the outcome. International Journal of Forecasting, 28(2), 532–542.
https://doi.org/10.1016/j.ijforecast.2011.01.004
Taylor, J. W., & Buizza, R. (2004). A comparison of temperature density forecasts from GARCH and
atmospheric models. Journal of Forecasting, 23(5), 337–355. https://doi.org/10.1002/for.917
Timmermann, A. (2000). Density forecasting in economics and finance. Journal of Forecasting, 19(4),
231–234.
Van Reeth, D. (2019). Forecasting Tour de France TV audiences: A multi-country analysis. International
Journal of Forecasting, 35(2), 810–821. https://doi.org/10.1016/j.ijforecast.2018.06.003
Vaughan Williams, L., & Stekler, H. O. (2010). Sports forecasting. International Journal of Forecasting,
26(3), 445–447. https://doi.org/10.1016/j.ijforecast.2009.12.005
Wei, X., Lucey, P., Morgan, S., & Sridharan, S. (2013). Sweet-spot: Using spatiotemporal data to
discover and predict shots in tennis. In 7th Annual MIT Sloan sports analytics conference, Boston,
MA.
Wheatcroft, E. (2020). A profitable model for predicting the over/under market in football. International
Journal of Forecasting, 36(3), 916–932. https://doi.org/10.1016/j.ijforecast.2019.11.001
Wolfers, J., & Leigh, A. (2002). Three tools for forecasting federal elections: Lessons from 2001. Australian
Journal of Political Science, 37(2), 223–240. https://doi.org/10.1080/10361140220148115
Wunderlich, F., & Memmert, D. (2020). Forecasting the outcomes of sports events: A review.
European Journal of Sport Science, 21(7), 944–957.
https://doi.org/10.1080/17461391.2020.1793002
Physiological Modeling
Manuel Bassek
Contents
10.2 Background
10.3 Applications
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_10
Key Messages
• Physiological models are used to describe the demands of physical activity and
analyze the load induced on athletes in their sport.
• The collection of position data gives the opportunity to calculate the objective
external load precisely.
• Physiological models can be used to extrapolate from external load to internal
load.
• Different physiological models are suited for different activities, like constant
speed running in track and field or non-linear movement in handball.
Definition
Physiological modeling of player load describes the extraction of physiological
parameters from non-physiological data. It can be used to monitor athletes' workloads
during training and competition to optimize training results and prevent
injuries (Akenhead & Nassis, 2016; Bourdon et al., 2017). In professional team
sports, the collected position data can be processed for that purpose.
Study Box
Bassek et al. (2023) analyzed the player load of elite handball players during 77
matches of the German Handball Bundesliga. They reported benchmark values for player
load in elite handball matches such as distances in six speed zones, Metabolic power,
Metabolic work, Equivalent distance and Equivalent distance index. Additionally, they
statistically compared the influence of the physiological model on the measured player
load. For that purpose, they calculated the distance covered and the Equivalent distance
for every player. The differences between them were then compared between the positions
of wings, backcourts and pivots with an ANOVA. The results show a significant interaction
effect for the difference between distance and Equivalent distance and the player
positions. Wings had a larger difference between distance and Equivalent distance than
backcourts and pivots. This means that the game of wings is characterized more strongly
by frequent accelerations and decelerations. The results are in line with other studies
that report that wings were more frequently involved in counter attacks, which require
maximum accelerations. The choice of the right physiological model is therefore crucial
for the analysis of player load. For sports that are characterized by accelerations and
decelerations, models that implement them should be used.
10.3 Applications
► Example 1
Training impulse (TRIMP). Banister (1991) suggested the TRIMP to model internal
load during endurance training. The TRIMP is calculated as the product of training
duration and intensity. The intensity is derived from heart rate data and modeled with an
exponential function to account for the non-linear relationship between intensity and
load, as seen in blood lactate curves (for the detailed formula, see Borresen & Lambert,
2009). It further includes the resting and maximum heart rate as representations of
the athlete’s individual condition. Even individual lactate curves can be included in the
model (Manzi et al., 2014). The TRIMP allows for ecologically valid measurement of player
load as it only requires heart rate measurements. It can be used to monitor the intensity of
individual sessions of prolonged endurance training. In team sports, it can be applied
during (pre-)season conditioning and regeneration. However, it is limited to constant-speed
sessions and cannot be used during interval training or sport-specific drills. ◄
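The structure of the TRIMP (duration multiplied by an exponentially weighted heart rate intensity) can be sketched as follows. The weighting constants 0.64/1.92 (men) and 0.86/1.67 (women) are the values commonly quoted for Banister's formulation and are an assumption here, since the chapter itself refers to Borresen and Lambert (2009) for the detailed formula; the function name and the example values are hypothetical.

```python
import math


def banister_trimp(duration_min, hr_exercise, hr_rest, hr_max, sex="male"):
    """TRIMP = duration x heart rate reserve fraction x exponential weighting."""
    if hr_max <= hr_rest:
        raise ValueError("hr_max must be greater than hr_rest")
    # Fraction of the heart rate reserve used during the session (0 = rest, 1 = maximum).
    hr_ratio = (hr_exercise - hr_rest) / (hr_max - hr_rest)
    # Assumed weighting constants (commonly cited; verify against the primary source).
    a, b = (0.64, 1.92) if sex == "male" else (0.86, 1.67)
    return duration_min * hr_ratio * a * math.exp(b * hr_ratio)


# Hypothetical session: 60 min at a mean heart rate of 150 bpm.
trimp = banister_trimp(60, 150, hr_rest=50, hr_max=190)
```

Because a single mean heart rate enters the calculation, the sketch also shows why the measure is best suited to sessions of roughly constant intensity.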
► Example 2
Speed zones. Multidirectional team sports, like handball, are characterized by the
non-linear movement behavior of players, which means that players are constantly changing
their speed (Karcher & Buchheit, 2014). One approach to measure player load in team
sports is to divide the distance covered by players into speed zones. Over time, many
models with different numbers of zones and cut-off speeds have been used (see Miguel
et al., 2021 for a detailed review). For example, Aslan and Aç (2012) used 8 zones: (1)
walking: 0–6 km/h, (2) jogging: 6.1–8 km/h, (3) low-intensity running: 8.1–12 km/h, (4)
moderate-intensity running: 12.1–15 km/h, (5) high-intensity running: 15.1–18 km/h,
(6) low-intensity sprint: 18.1–21 km/h, (7) moderate-intensity sprint: 21.1–24 km/h, (8)
high-intensity sprint: >24 km/h; whereas Clemente et al. (2019) identified 4 zones: (1)
walking: 0–6.9 km/h, (2) jogging: 7–13.9 km/h, (3) running: 14–20 km/h, (4) sprinting:
>20 km/h. Other approaches individualize the speed zones based on athletes' capacities
measured by lactate thresholds (Aslan & Aç, 2012) or in percentages of the maximum
speed (Bacon & Mauger, 2017).
The distance covered in different speed zones is a simple approach to modeling player
load when the activity is not linear in nature. It can be especially useful to compare the
distance-per-speed-zone profiles of athletes with desirable benchmark values. Such values
can be the athlete's own performance prior to an injury, for comparison during
rehabilitation, or the average professional player, to identify talents. However, the
different definitions of speed zones make it difficult to compare models used by different
researchers and practitioners (Bradley & Ade, 2018). ◄
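The speed zone approach can be sketched in a few lines. The example uses the four zones attributed above to Clemente et al. (2019); the sampling interval, the handling of the small gaps between the published zone limits (e.g. between 6.9 and 7 km/h), and all names are illustrative assumptions.

```python
ZONES = ("walking", "jogging", "running", "sprinting")


def zone_of(speed_kmh):
    """Assign a speed in km/h to one of four zones (limits 7, 14 and 20 km/h)."""
    if speed_kmh < 7.0:
        return "walking"
    if speed_kmh < 14.0:
        return "jogging"
    if speed_kmh <= 20.0:
        return "running"
    return "sprinting"


def distance_per_zone(speeds_kmh, dt_s):
    """Sum the distance (in metres) covered in each zone for equally spaced speed samples."""
    totals = {zone: 0.0 for zone in ZONES}
    for speed in speeds_kmh:
        totals[zone_of(speed)] += speed / 3.6 * dt_s  # km/h -> m/s, times sample duration
    return totals


# Hypothetical speed samples, one per second.
profile = distance_per_zone([3.6, 10.8, 18.0, 25.2], dt_s=1.0)
```

Changing the three cut-off values is enough to switch to another zone definition, which also illustrates why results based on different definitions are hard to compare.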
► Example 3
Metabolic power. Neither of the approaches described previously incorporates
accelerations and decelerations in its modeling. A possible way to include them is the concept
of metabolic power. Metabolic power is defined as the energy expenditure per unit of
time necessary to move at a certain speed, and is calculated as the product of the energy
cost of transport per unit body mass and distance (J·kg⁻¹·m⁻¹) and the velocity (m·s⁻¹)
(di Prampero & Osgnach, 2018). The measure was first introduced by di Prampero et
al. (2005), who used the biomechanical equivalence of accelerated (or decelerated) running
on flat terrain and constant running uphill (or downhill) to estimate the energy
requirement for a specific displacement. Since then it has been used in several studies to
characterize player load (Miguel et al., 2021).
The metabolic power model can provide several parameters of player load: (1)
Metabolic power: the instantaneous power needed to perform the current locomotion,
(2) Metabolic work: the energy needed to perform the locomotion in a time window, (3)
Equivalent distance: the distance someone could have covered with the same energy if
they had not performed any accelerations or decelerations, (4) Equivalent distance index:
the ratio of equivalent distance to actual distance covered. The equivalent distance
index can be used as an indicator of how much an activity was characterized by
accelerations and decelerations. These measures give a more detailed view into the internal
load based on the combined analysis of distance, velocity and acceleration. Additionally,
they provide comprehensive and comparable values of player load (Polglaze & Hoppe,
2019). ◄
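The four parameters listed above can be derived from a sampled speed profile roughly as follows. This is a sketch under stated assumptions: the polynomial for the energy cost as a function of the equivalent slope and the flat-terrain cost of 3.6 J·kg⁻¹·m⁻¹ are the values commonly quoted for this approach and should be checked against di Prampero et al. (2005); terrain correction factors are omitted, and all function and variable names are hypothetical.

```python
import math

G = 9.81        # gravitational acceleration (m/s^2)
EC_FLAT = 3.6   # assumed energy cost of constant-speed running on flat terrain (J/kg/m)


def energy_cost(acc):
    """Energy cost of transport (J/kg/m) for a forward acceleration acc (m/s^2)."""
    es = acc / G                            # equivalent slope
    em = math.sqrt(acc ** 2 + G ** 2) / G   # equivalent body mass factor
    poly = (155.4 * es ** 5 - 30.4 * es ** 4 - 43.3 * es ** 3
            + 46.3 * es ** 2 + 19.5 * es + EC_FLAT)
    return poly * em


def metabolic_summary(speeds, dt):
    """Metabolic work (J/kg), distance, equivalent distance (m) and equivalent distance index."""
    work = 0.0
    distance = 0.0
    for v0, v1 in zip(speeds, speeds[1:]):
        acc = (v1 - v0) / dt           # finite-difference acceleration
        v = 0.5 * (v0 + v1)            # mean speed in the interval
        power = energy_cost(acc) * v   # metabolic power (W/kg)
        work += power * dt
        distance += v * dt
    if distance == 0.0:
        raise ValueError("no distance covered")
    equivalent_distance = work / EC_FLAT
    return {
        "work": work,
        "distance": distance,
        "equivalent_distance": equivalent_distance,
        "equivalent_distance_index": equivalent_distance / distance,
    }


# Hypothetical speed samples in m/s, one per second.
steady = metabolic_summary([3.0, 3.0, 3.0, 3.0, 3.0], dt=1.0)
stop_and_go = metabolic_summary([2.0, 4.0, 2.0, 4.0, 2.0], dt=1.0)
```

Both profiles cover the same distance, but only the constant run has an equivalent distance index of 1, whereas the profile with repeated accelerations and decelerations yields a clearly higher value.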
References
Akenhead, R., & Nassis, G. P. (2016). Training load and player monitoring in high-level football:
Current practice and perceptions. International Journal of Sports Physiology and Performance,
11(5), 587–593. https://doi.org/10.1123/ijspp.2015-0331
Aslan, A., & Aç, C. (2012). Metabolic demands of match performance in young soccer players. Journal
of Sports Science & Medicine, 11, 170–179.
Bacon, C. S., & Mauger, A. R. (2017). Prediction of overuse injuries in professional U18-U21 footballers
using metrics of training distance and intensity. Journal of Strength and Conditioning
Research, 31(11), 3067–3076. https://doi.org/10.1519/JSC.0000000000001744
Banister, E. W. (1991). Modeling elite athletic performance. In D. MacDougall, H. A. Wenger, & H. J.
Green (Eds.), Physiological testing of the high-performance athlete (2nd ed.). Human Kinetics
Books.
Bassek, M., Raabe, D., Memmert, D., & Rein, R. (2023). Analysis of motion characteristics and
metabolic power in elite male handball players. Journal of Sports Science and Medicine, 310–316.
https://doi.org/10.52082/jssm.2023.310
Borresen, J., & Lambert, M. I. (2009). The quantification of training load, the training response and
the effect on performance. Sports Medicine, 39(9), 779–795.
Bourdon, P. C., Cardinale, M., Murray, A., Gastin, P., Kellmann, M., Varley, M. C., Gabbett, T. J.,
Coutts, A. J., Burgess, D. J., Gregson, W., & Cable, N. T. (2017). Monitoring athlete training
loads: Consensus statement. International Journal of Sports Physiology and Performance, 12(s2),
S2-161–S2-170. https://doi.org/10.1123/IJSPP.2017-0208
Bradley, P. S., & Ade, J. D. (2018). Are current physical match performance metrics in elite soccer fit
for purpose or is the adoption of an integrated approach needed? International Journal of Sports
Physiology and Performance, 13(5), 656–664. https://doi.org/10.1123/ijspp.2017-0433
Clemente, F. M., Owen, A., Serra-Olivares, J., & Nikolaidis, P. T. (2019). Characterization of the
weekly external load profile of professional soccer teams from Portugal and The Netherlands.
Journal of Human Kinetics, 66, 155–164. https://doi.org/10.2478/hukin-2018-0054
di Prampero, P. E., Fusi, S., Sepulcri, L., Morin, J. B., Belli, A., & Antonutto, G. (2005). Sprint running:
A new energetic approach. Journal of Experimental Biology, 208(14), 2809–2816.
https://doi.org/10.1242/jeb.01700
di Prampero, P. E., & Osgnach, C. (2018). Metabolic power in team sports—part 1: An update. International
Journal of Sports Medicine, 39(08), 581–587. https://doi.org/10.1055/a-0592-7660
Karcher, C., & Buchheit, M. (2014). On-court demands of elite handball, with special reference to
playing positions. Sports Medicine, 44(6), 797–814. https://doi.org/10.1007/s40279-014-0164-z
Lutz, J., Memmert, D., Raabe, D., Dornberger, R., & Donath, L. (2019). Wearables for integrative
performance and tactic analyses: Opportunities, challenges, and future directions. International
Journal of Environmental Research and Public Health, 17(1), 1–26.
https://doi.org/10.3390/ijerph17010059
Manzi, V., Impellizzeri, F., & Castagna, C. (2014). Aerobic fitness ecological validity in elite soccer
players: A metabolic power approach. Journal of Strength and Conditioning Research, 28(4),
6–919.
Miguel, M., Oliveira, R., Loureiro, N., García-Rubio, J., & Ibáñez, S. J. (2021). Load measures in
training/match monitoring in soccer: A systematic review. International Journal of Environmental
Research and Public Health, 18(5), 2721. https://doi.org/10.3390/ijerph18052721
Polglaze, T., & Hoppe, M. W. (2019). Metabolic power: A step in the right direction for team sports.
International Journal of Sports Physiology and Performance, 14(3), 407–411.
https://doi.org/10.1123/ijspp.2018-0661
Scott, M. T. U., Scott, T. J., & Kelly, V. G. (2016). The validity and reliability of global positioning
systems in team sport: A brief review. Journal of Strength and Conditioning Research, 30(5),
1470–1490. https://doi.org/10.1519/JSC.0000000000001221
IV
Simulation
Contents
Chapter 11 Simulation
Jürgen Perl and Daniel Memmert
Simulation
Jürgen Perl and Daniel Memmert
Contents
11.2 Background
11.3 Applications
References
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_11
Key Messages
System—Model—Simulation:
• System behavior becomes computable by models;
• Model calculations simulate system behavior and make it transparent;
• Simulations help to understand, predict and influence system behavior.
Figure 11.2 shows how the calculated heart rates (green, directly proportional)
and the reserve (gray, inversely proportional) track the velocity profile with
slight delays. These delays and thus the overall course are essentially caused by the
recovery delay DR: The smaller DR, i.e. the shorter the recovery delay, the lower
and smoother the heart rate course, and the higher the reserve values.
Two essential aspects and applications for power simulation are:
1. the detection of inadequate load patterns (overload, underload) and the simu-
lative optimization of load-power dynamics.
To analyze athletic performance, e.g., in terms of potential for improvement,
one can vary the delay parameters, in the following specifically the recovery
delay DR, and thus identify the potential for an increase in performance in the
simulation (cf. Perl, 2003). Figure 11.3 shows the heart rate curve (green) and
the reserve curve (gray) of the athlete from Fig. 11.2 with a recovery delay of
DR = 7.2. Additionally simulated as a training target are the desired profiles
(heart rate: gray, significantly lower and smoother; reserve: black, significantly
higher) of a better-trained athlete, which can, however, only be achieved with a
faster recovery, DR = 5.8.
The simulation thus makes it possible to test and compare load profiles without
overloading the athlete with excessive training loads in real tests. In the example
above, the result of the simulation would be the question of whether the DR
value for the athlete under consideration could be reduced to below 6 at all by
training. To answer this question, it would be useful to accompany the training
sessions with appropriate simulations during the training process to detect and
avoid possible overload situations in time.
Study Box
In individual sports, as described above, the focus is on simulating physiological
performance. In contrast, in racket or team sports, the optimization of tactical
behavior is the subject of simulation, as illustrated below using soccer as an
example. The key to success lies in finding the perfect mix of changing tactical
patterns, which depend significantly on the behavior of the opposing team
(Memmert & Raabe, 2018). Memmert et al. (2021) studied professional soccer
matches according to the specific tactical team behavior “attack vs. defense” based
on a simulation approach. The formation patterns of all matches (40 positional
datasets) are categorized by SOCCER© (Perl & Memmert, 2011) for defense and
offense. Monte Carlo simulation can evaluate the mathematically optimal
strategy. The interaction simulation between offense and defense shows optimal
flexibility values for both tactical groups. The results showed that both
offense and defense have optimal planning rates to be more successful. The
more complex the success indicator, the more successful attacking player groups
become. The results also show that defensive player groups always succeed
against attacking groups below a certain planning rate value. Simulation-based
positional data analysis reveals successful strategic behavior patterns for attack and
defense. Attacking player groups need very high flexibility (for creativity, see
Memmert & Perl, 2009a, 2009b) to be successful (keep possession of the ball).
Defensive player groups, on the other hand, only need to be below a defined
flexibility level to guarantee more success.
11.3 Applications
► Example 1
Detection of possible overload phases in racket sports like tennis is possible
(unpublished analysis for the EU project https://matchpoint.bgtennis.bg/). Rallies
in racket sports can be long and energy-consuming due to sprints to the
receiving points. Figure 11.5 shows on the left the motion profile of a tennis player
and on the right the corresponding progressions of
• Load: running speed with short pauses after own strokes;
• Power: heart rate, initially relaxed (green), then rising into the critical range (light blue,
red) and finally ending in the collapse range (black);
• Reserve: increasing strongly in the first resting phase, then, in the overload phase at
the end, decreasing into the negative collapse range.
These progressions can be diagnosed ex post on the basis of recorded data or, so that
overload can be avoided, predicted ex ante with the help of load-performance simulation. ◄
► Example 2
Recognition of implausible performance increases at the time of competition is possible
(Perl, 2004). Peak loads in competition cannot be maintained continuously; they are
prepared by a training program that builds up accordingly, with an appropriate lead time. ◄
tion date and thus, as described above, significantly improves the ability to imple-
ment the load. Such a reduction of the recovery delay is, among other things, the
goal of training, but usually works over weeks to months. A significant improve-
ment within 3 days, on the other hand, is highly conspicuous and should be cause
for a more detailed analysis of the measures taken.
► Example 3
Recognizing and optimizing strategies in team games where several players act “inde-
pendently” of each other is difficult (Memmert, 2021). An innovative approach is to
divide teams into a small number of tactical groups and analyze the interaction of these
groups. The positions of players in tactical groups in soccer can then be mapped to for-
mation patterns, reflecting strategic behavior and interaction (Perl & Memmert, 2019).
Based on this information, Monte Carlo simulation allows generating tactical strategies
that are optimal—at least from a mathematical point of view. In practice, behavior can
be guided by these optimal strategies, but usually changes depending on the activities of
the opposing team. Analyzing the game from the perspective of such simulated strategies
can show how strictly or flexibly (cf. Memmert, 2015) a team varies strategic patterns. To
optimize such team behavior of tactical group interactions in professional soccer, Perl et
al. (2021) conducted a simulation and validation study based on 40 positional datasets
from professional soccer using the SOCCER© software (Grunz et al., 2012; Perl et al.,
2013; Perl & Memmert, 2011). After the validation study confirmed the applicability
of the defined tactical model, the simulation study showed that offensive player groups
need less tactical flexibility to successfully gain possession of the ball, while defensive
player groups need more tactical flexibility to do so. Offensive players should thus play
with a more flexible tactical orientation to maintain possession, while defensive players
should play with a more planned orientation to be successful. ◄
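The basic idea of evaluating tactical strategies by Monte Carlo simulation can be illustrated with a toy example. It is not the SOCCER© model and does not reproduce the study: the success matrix, the two offensive and three defensive patterns, and the mixing probabilities are invented purely for illustration. The sketch draws pattern pairs at random and counts how often the offense keeps the ball, then compares the estimate with the exact expectation.

```python
import random

# Hypothetical success probabilities: SUCCESS[i][j] is the chance that the offense keeps
# possession when it plays pattern i against defensive pattern j (invented numbers).
SUCCESS = [
    [0.70, 0.40, 0.55],
    [0.45, 0.65, 0.50],
]


def exact_success(p_off, p_def):
    """Exact expected success rate of the offense for two mixed strategies."""
    return sum(po * pd * SUCCESS[i][j]
               for i, po in enumerate(p_off)
               for j, pd in enumerate(p_def))


def simulate_success(p_off, p_def, n_trials=20000, seed=1):
    """Monte Carlo estimate of the same quantity from randomly drawn pattern pairs."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        i = rng.choices(range(len(p_off)), weights=p_off)[0]
        j = rng.choices(range(len(p_def)), weights=p_def)[0]
        if rng.random() < SUCCESS[i][j]:
            wins += 1
    return wins / n_trials


offense_mix = [0.5, 0.5]        # how often the offense uses each of its patterns
defense_mix = [0.2, 0.3, 0.5]   # how often the defense uses each of its patterns
estimate = simulate_success(offense_mix, defense_mix)
```

In this small example the exact value is easy to compute; simulation becomes useful when the interaction between the groups is too complex for a closed-form evaluation, and repeating the estimate for many candidate mixes gives a simple way to search for favorable strategies.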
References
Grunz, A., Memmert, D., & Perl, J. (2012). Tactical pattern recognition in soccer games by means of
special self-organizing maps. Human Movement Science, 31, 334–343.
Memmert, D. (2015). Teaching tactical creativity in sport: Research and practice. Routledge.
Memmert, D. (Ed.). (2021). Match analysis. Routledge.
Memmert, D., Imkamp, J., & Perl, J. (2021). Flexible defends succeeds creative attacks!—A simula-
tion approach based on position data in professional football. Journal of Software Engineering
Metabolic Simulation
Dietmar Saupe
Contents
12.2 Background
12.3 Applications
References
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_12
Key Messages
• Mathematical modeling captures the functional relationship between the measurable
metabolic output variables of lactate, oxygen uptake, carbon dioxide output, and
heart rate, and the power demanded.
• Energy provision from the system of energy-rich phosphates and oxidative
phosphorylation can also be described by more complex meta-modeling, as a system
of coupled processes simulated on the computer with suitable algorithms.
• The critical power (CP) model describes the maximum achievable duration of a
workload on the ergometer at a given power level.
• Models allow the estimation of interpretable parameters based on performance
tests, the analysis of performance in training and competition, the monitoring of
rehabilitation measures, and the planning of strategies for the optimal use of
performance potential.
Two methodologies will be used to discuss how the dynamics and limits of meta-
bolic energy provision can be described quantitatively through modeling, simula-
tion, and analysis in sports informatics and used for practical applications. The
examples are primarily related to road cycling and cycling ergometer tests, but can
be adapted to other endurance sports such as running, swimming or rowing by
appropriate modifications. In performance diagnostics, exercise science and sports
medicine, methods are used to model and predict the effects of variable exercise
demand on measurable indicators such as heart rate, oxygen uptake and lactate
production. For athletes, this can yield valuable conclusions about fitness parame-
ters and training success. In competition, it is important to use the individually
available energy supply in the best possible way. In every phase of a race, athletes
must be able to estimate how much power they can produce without exhausting
themselves prematurely, but also without arriving at the finish line with unused
energy reserves. In road cycling, they can do this by using their measured power in
watts and their heart rate as a guide, based on their years of training and competi-
tion experience. However, it is not enough to simply set a suitable power and main-
tain it consistently. Research using a theoretical approach has shown that on
courses with variable gradient profiles or changing wind conditions, a variable
power distribution is advantageous over a constant one. Mathematical modeling
and simulation make it possible to develop appropriate adaptive pacing strategies.
12.2 Background
Definition
Linear ordinary differential equations (ODE) with asymptotically constant solu-
tions describe the physiological adaptation of oxygen uptake and heart rate to a
constant power demand.
The critical power (CP) model describes the maximum time for which a given
constant power can be sustained on an ergometer until complete exhaustion.
Both models can be generalized for the case of a variable power demand.
12.3 Applications
Fig. 12.1 Step test on the ergometer until exhaustion and resulting blood lactate values from
measurements and modeling, from Schulte et al. (1999)
This approach to modeling has the advantage that the variables and parameters
have direct metabolic equivalents, although in many cases their sports science
interpretation is not as obvious as with lactate. The disadvantages are the very high
complexity and the expert knowledge required in biochemistry and in the mathematical
methods for setting up and numerically solving the resulting differential-algebraic
equation system.
2. The much more common method of modeling directly sets up equations for the
dynamics of the output variables oxygen uptake (V̇O2), lactate concentration ([La]),
or heart rate (HR). Linear differential equations are preferred. These have exponen-
tial solutions that can well represent the adaptation of the metabolic system to a
performance demand. The variables and parameters are easily interpretable, typically
as amplitudes of adaptation responses and their associated time constants.
Fundamental to the dynamics of V̇O2 is first its value in equilibrium (steady-
state), i.e. after completion of the adaptation response to a constant load set on the
ergometer. This gives an individual monotonically increasing function of V̇O2 over a
power interval from P = 0 (W) to a maximum and sustained critical power P = Pc at
which maximal oxygen uptake V̇O2max is reached, see Jones and Poole (2013). The
steady-state value of V̇O2 is essentially composed of three components:
(a) a baseline value that is slightly above the resting-state V̇O2, (b) a component
A1 that increases linearly with P, and (c) a smaller, so-called slow component A2 that
is added only above a certain power threshold, see Fig. 12.2.
For both components A1 and A2, the adaptation of V̇O2 to an incipient con-
stant load P can be described very well by an exponential function with three
parameters A, T and τ,
$$A\left(1 - \exp\left(-\frac{t - T}{\tau}\right)\right)$$
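As a minimal sketch, the exponential response can be evaluated numerically in R. The parameter values below (amplitude A, time delay, time constant tau) are assumptions chosen only for illustration, as is the convention that the component is zero before the delay has elapsed.

```r
# Exponential adaptation A * (1 - exp(-(t - delay) / tau)); the component is
# assumed to be 0 before the time delay (called T in the text) has elapsed.
# All parameter values are hypothetical and not measured data.
vo2_component <- function(t, A, delay, tau) {
  ifelse(t < delay, 0, A * (1 - exp(-(t - delay) / tau)))
}

t    <- seq(0, 300, by = 30)   # time since load onset in seconds
resp <- vo2_component(t, A = 2.0, delay = 15, tau = 30)
round(resp, 3)                 # rises monotonically towards the amplitude A
```

After one time constant beyond the delay, the response has reached a fraction of 1 − exp(−1) of its amplitude, which is why tau is read as the speed of adaptation.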
Fig. 12.2 Steady-state model for oxygen uptake V̇O2 as a function of load, a constant power
P. Powers greater than the critical power Pc cannot be sustained. Figure from Artiga Gonzalez et al.
(2019)
Fig. 12.3 Hydraulic representation of three physiological models. On the left, the classical model
of Monod and Scherrer (1965); in the middle, the model of Wilkie (1981) with adaptive recovery rate;
and on the right, that of Morton (1986) and Margaria with an additional vessel for anaerobic-lac-
tacid energy provision. Figure from Wolf (2019)
The simple model for maximum sustainable power has worked well in practice.
However, to determine the critical power Pc for an individual, one needs several
points (P,T) on the graph of the hyperbolic function of T. For each of these points,
an ergometer test must be performed to exhaustion, followed by a sufficiently long
recovery period. This is hardly feasible in practice, and therefore simpler tests have
been developed for this purpose, see the review in Lipková et al. (2022).
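How Pc can be estimated from several such points can be sketched in R. The sketch assumes the usual two-parameter form of the hyperbola, T = W'/(P − Pc), where W' (a symbol introduced here) denotes a fixed work capacity above Pc. The test data are hypothetical and generated from assumed values rather than measured.

```r
# Hypothetical exhaustion tests: power P (W) and time to exhaustion T_lim (s),
# generated here from assumed values Pc = 250 W and W' = 20000 J.
P     <- c(300, 350, 400, 450)
T_lim <- 20000 / (P - 250)

# Hyperbola T_lim = W' / (P - Pc)  <=>  work P * T_lim = W' + Pc * T_lim,
# so a straight-line fit of work against time recovers both parameters.
work <- P * T_lim
fit  <- lm(work ~ T_lim)

W_prime <- unname(coef(fit)[1])   # intercept: work capacity above Pc in J
Pc      <- unname(coef(fit)[2])   # slope: critical power in W
c(Pc = Pc, W_prime = W_prime)
```

With real test data the points scatter around the line, so the number and spacing of the exhaustion tests determine how reliable the two estimates are.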
The 2-parameter CP model of Monod and Scherrer provided the foundation
for a number of refinements that address minor shortcomings. It has been criticized
that the model allows arbitrarily high power, at least for short periods, and that
directly updating the model gives overly optimistic estimates for recovery below
critical power (P < Pc), as discussed above. In the 3-parameter model of Morton (1996),
power is limited, and in Skiba et al. (2014) and Wolf (2019), the recovery rate is
damped. Two other modifications by Wilkie, Margaria, and Morton are shown in
Fig. 12.3.
The physiological models combined with a mechanical model of required phys-
ical power during cycling on a track with a given elevation profile can be used to
calculate best pacing strategies with numerical optimization, see Fayazi et al.
(2013), Sundström and Bäckström (2017), and Wolf et al. (2019).
Study Box
Simulation of Physiological
Adaptation Processes
Mark Pfeiffer and Stefan Endler
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_13
Key Messages
• Training and competition performance are strongly related to physiological
adaptation processes.
• Higher level (macroscopic) models represent the overall effect of many individual
processes (not exclusively physiological) on the body.
• The parameters of these metamodels can be individually adjusted (calibrated)
based on measurable characteristics (data).
• Calibrated models allow short-term simulations (single training sessions or
competitions) or long-term simulations (progress through training programs).
13.2 Background
[Figure: Approaches for Modelling the Relationship between Training (input) and Performance
(output), compared by time dependency ("Performance is a time-varying status"), physiological
assumptions, and number of inputs (univariate to multivariate)]
Fig. 13.1 Overview of selected models and their physiological assumptions, mathematical basis,
and number of input parameters. (After Rasche & Pfeiffer, 2019)
13.3 Applications
► Example 1
The PerPot meta-model (Perl, 2001) was adapted by Endler (2013) specifically for endurance-
oriented running (PerPot-run). On the one hand, this involved adapting the model itself by
simplifying model equations that represent individual processes in the general metamodel
that do not occur in this form in the application scenario of running (see Fig. 13.2).
On the other hand, properties important for endurance sports, such as long-term atrophy,
i.e. fatigue over time, were integrated into PerPot-run. Based on the step test of the usual
performance diagnostics, Endler (2013) developed and implemented a special calibration
run to determine the individual model parameters. With the individually adapted
(calibrated) models, training sessions and competitions can be simulated. Thus, the optimal
speed can be determined by repeatedly simulating a specific competition course with
different (constant) speeds. The simulation is based on the assumption that the athlete has
used up virtually all of his or her “energy” by the end of the competition. The finishing
time calculated in this way was compared with the actual finishing times in a study of
33 marathons and half-marathons. The average deviation was 3.62%, and in more than
one-third of the races the finishing time could be predicted exactly. ◄
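The search procedure described in Example 1 (repeated simulation of a course with different constant speeds) can be sketched in R. The model used below is explicitly not PerPot-run, whose equations are not given here. It is a toy stand-in in which running faster than an assumed critical speed drains a finite reserve, and all names and values are hypothetical.

```r
# Toy stand-in model (NOT PerPot-run): running faster than a critical speed cs
# drains a finite distance reserve; the race fails if the reserve runs out early.
finishes_race <- function(v, course = 10000, cs = 4, reserve = 200, dt = 1) {
  covered <- 0
  while (covered < course) {
    reserve <- reserve - max(v - cs, 0) * dt   # depletion above critical speed
    if (reserve < 0) return(FALSE)             # exhausted before the finish
    covered <- covered + v * dt
  }
  TRUE
}

speeds   <- seq(3.5, 5, by = 0.01)             # candidate constant speeds in m/s
feasible <- vapply(speeds, finishes_race, logical(1))
best     <- max(speeds[feasible])              # fastest speed that still finishes
c(speed = best, finish_time_s = 10000 / best)
```

The fastest feasible speed is the one at which the reserve is almost used up at the finish, which mirrors the assumption stated in the example.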
Another possibility to use the simulation already during training is the optimization
of interval training units. The question often arises as to how many intervals
should be performed and at what speed. The length of the breaks between the
intervals can also be chosen differently. With the help of simulations, these aspects
can be examined and individually optimal interval training units can be identified.
[Fig. 13.2: simulation plot over Time Steps in Min. (0 to 190) with the series Performance,
Load, Reserve and Altitude profile; vertical axes: Speed in km/h, Heart Rate, Reserve]
► Example 2
In high-performance sports, the goal of training is to improve athletic performance,
maintain it, or counteract a decline (e.g., due to age). To achieve this, it is necessary
to repeatedly arrange training-effective stimulus constellations at the limit of the indi-
vidual adaptation capacity, which causes performance-increasing or -maintaining adap-
tation reactions (adaptation-effective load). Targeted phases of recovery are necessary
(Halson & Jeukendrup, 2004). If there is a permanent mismatch between training load
and recovery, there is a risk of overtraining, i.e., athletic performance (ability) may
stagnate or even temporarily decrease (non-functional overreaching); in the worst case, the
stressed physiological functional systems only recover after months and performance
permanently decreases (overtraining syndrome). This condition can cause pathological
maladaptations in the various biological regulatory mechanisms and lead to an increase
in susceptibility to injury or disease (Schwellnus et al., 2016; Soligard et al., 2016).
To design the training load and stress optimally in terms of training goal achieve-
ment and to avoid training-induced overloads, the individual training and performance
data are examined for their correlation. These so-called training effect analyses are par-
ticularly helpful when the effects of training interventions on athletic performance (abil-
ity) can not only be modeled retrospectively but also simulated prognostically. This
approach is particularly useful in sports or disciplines with less complex training and
performance structures. Studies on cycling and swimming have shown that antagonistic
models (cf. Fig. 13.1) can be used to model performance retrospectively (model fit) and
to simulate future performance development satisfactorily for given training data, espe-
cially for shorter periods (forecast quality) (Pfeiffer & Hohmann, 2012; Pfeiffer, 2008;
Fuhrmann et al., 2014). In recent years, a large number of model-comparative studies on
model fit and prognostic goodness of fit have appeared, ranging from statistical
approaches to machine-learning methods (including Imbach et al., 2022; Matabuena &
Rodríguez-López, 2019). ◄
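One widely used antagonistic model is the fitness-fatigue model, in which every training load raises two opposing components that decay at different rates. The following is a minimal sketch under that assumption. The function name, the parameter values and the training loads are hypothetical and uncalibrated.

```r
# Fitness-fatigue sketch: performance = baseline + fitness effect - fatigue effect.
# Both effects are sums of past loads, discounted exponentially with the time lag.
fitness_fatigue <- function(load, p0, k1, tau1, k2, tau2) {
  n    <- length(load)
  perf <- numeric(n)
  for (t in seq_len(n)) {
    past    <- seq_len(t - 1)                  # training days before day t
    lag     <- t - past
    fitness <- sum(load[past] * exp(-lag / tau1))
    fatigue <- sum(load[past] * exp(-lag / tau2))
    perf[t] <- p0 + k1 * fitness - k2 * fatigue
  }
  perf
}

load <- c(rep(100, 14), rep(0, 14))            # two weeks of training, two of rest
perf <- fitness_fatigue(load, p0 = 500, k1 = 1, tau1 = 42, k2 = 2, tau2 = 7)
round(perf[c(1, 2, 28)], 1)                    # baseline, early dip, after recovery
```

With these assumed values, fatigue dominates directly after training and decays faster than fitness, so simulated performance first drops and then rises above baseline during recovery.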
Study Box
… determined using a step test as part of a performance diagnostic test. However,
these tests are also time-consuming and cost-intensive, as medically trained personnel
are required to perform them (blood sampling).
With PerPot-Run, a simulative determination of IAS was developed. In a study by
Endler et al. (2017), this was compared with the most common performance diagnostic
methods for determining IAS. For this purpose, 13 male handball players (age: 23.2 ± 2.3;
weight in kg: 88.3 ± 11.4) completed a classic performance-diagnostic step test on a
treadmill with a 3 min step length and a 2 km/h increase per step at a starting speed of
6 km/h. Between the steps, capillary blood was taken from the earlobe to determine
lactate concentration. The values were evaluated with different calculation models to
determine IAS, including the most commonly used method by Dickhuth et al. (1999). Heart
rate and speed data from the same step test were used to individualize the parameters of
the PerPot-Run informatic model to the athletes. By simulating the previously described
gold standard with several endurance runs, the IAS can then be calculated. A comparison
showed a very high correlation between the heart rates determined with the Dickhuth
method and the PerPot-Run at the IAS (ICC: 0.916; r: 0.889). It could thus be shown that
the simulatively determined thresholds represent a cost- and resource-saving alternative
or supplement to classical lactate-based performance diagnostics.
References
Balagué, N., Hristovski, R., Almarcha, M. D. C., Garcia-Retortillo, S., & Ivanov, P. C. (2020).
Network physiology of exercise: Vision and perspectives. Frontiers in Physiology, 11, 1607.
Borresen, J., & Lambert, M. (2009). The quantification of training load, the training response and the
effect on performance. Sports Medicine (Auckland, N.Z.), 39(9), 779–795.
Dickhuth, H.-H., Yin, L., Niess, A., Rocker, K., Mayer, F., Heitkamp, H. C., & Horstmann, T.
(1999). Ventilatory, lactate-derived and catecholamine thresholds during incremental treadmill
running: Relationship and reproducibility. International Journal of Sports Medicine, 20(2), 122–
127.
Endler, S. (2013). Anpassung des Metamodells PerPot an den ausdauerorientierten Laufsport zur
Trainings- und Wettkampfoptimierung. Dissertation, Johannes Gutenberg-Universität Mainz.
https://doi.org/10.25358/openscience-3652.
Endler, S., Hoffmann, S., Sterzing, B., Simon, P., & Pfeiffer, M. (2017). The PerPot simulated anaero-
bic threshold: A comparison to typical lactate-based thresholds. International journal of human
movement and sports sciences, 5(1), 9–15. https://doi.org/10.13189/saj.2017.050102
Fuhrmann, S., Pfeiffer, M., & Hohmann, A. (2014). Modellierung von Trainingsprozessen im
Schwimmsport. In M. Witt (Ed.), DVS-Schwimmsport-Symposium 2011 (pp. 91–98). Deutsche
Schwimmtrainer-Vereinigung.
Halson, S. L., & Jeukendrup, A. E. (2004). Does overtraining exist? An analysis of overreaching and
overtraining research. Sports Medicine, 34(14), 967–981.
Hemingway, B., Burgess, K., Elyan, E., & Swinton, P. (2020). The effects of measurement error and
testing frequency on the fitness-fatigue model applied to resistance training: A simulation
approach. International Journal of Sports Science & Coaching, 15(1), 60–71.
Hohmann, A., Lames, M., Letzelter, M., & Pfeiffer, M. (2020). Einführung in die Trainingswissenschaft.
Limpert.
Imbach, F., Perrey, S., Chailan, R., Meline, T., & Candau, R. (2022). Training load responses model-
ling and model generalisation in elite sports. Scientific Reports, 12(1), 1586.
Matabuena, M., & Rodríguez-López, R. (2019). An improved version of the classical banister model
to predict changes in physical condition. Bulletin of Mathematical Biology, 2019(81), 1867–1884.
Meeusen, R., Duclos, M., Foster, C., Fry, A., Gleeson, M., Nieman, D., et al. (2013). Prevention,
diagnosis and treatment of the overtraining syndrome: Joint consensus statement of the European
College of Sport Science (ECSS) and the American College of Sports Medicine (ACSM).
European Journal of Sport Science, 13(1), 1–24.
Perl, J. (2001). PerPot: A metamodel for simulation of load performance interaction. Electronic
Journal of Sport Science, 1(2).
Perl, J. (2002). Adaptation, antagonism and system dynamics. In G. Ghent, D. Kluka, & D. Jones
(Eds.), Perspectives—The multidisciplinary series of physical education and sport science (Vol. 4th
ed, pp. 105–125). Meyer & Meyer Sport.
Pfeiffer, M. (2008). Modeling the relationship between training and performance: A comparison of
two antagonistic concepts. International journal of computer science in sport, 7(2), 13–32.
Pfeiffer, M., & Hohmann, A. (2012). Applications of neural networks in training science. Human
Movement Science, 31(2), 344–359.
Pol, R., Balagué, N., Ric, A., Torrents, C., Kiely, J., & Hristovski, R. (2020). Training or synergizing?
Complex systems principles change the understanding of sport processes. Sports Medicine -
Open, 6(1), 28.
Rasche, C., & Pfeiffer, M. (2019). Training. In A. Baca (Ed.), Modelling and simulation in sport and
exercise (pp. 187–207). Routledge.
Schwellnus, M., Soligard, T., Alonso, J. M., Bahr, R., Clarsen, B., Dijkstra, H. P., et al. (2016). How
much is too much? (Part 2) International Olympic Committee consensus statement on load in
sport and risk of illness. British Journal of Sports Medicine, 50(17), 1043–1052.
Soligard, T., Schwellnus, M., Alonso, J. M., Bahr, R., Clarsen, B., Dijkstra, H. P., et al. (2016). How
much is too much? (Part 1) International Olympic Committee consensus statement on load in
sport and risk of injury. British Journal of Sports Medicine, 50(17), 1030–1041.
Vermeire, K., Ghijs, M., Bourgois, J. G., & Boone, J. (2022). The Fitness–Fatigue model: What’s in the
numbers? International Journal of Sports Physiology and Performance, 2022(17), 810–813.
Programming
Languages
An Introduction
to the Programming
Language R for Beginners
Robert Rein
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_14
Key Messages
• The R programming language has been specifically designed for data analysis
• R allows an easy entry into data analytics without much previous programming
experience
• R is ideal as a first programming language
• Through the use of packages, the functionality of R is constantly being extended
so that a wide variety of data analyses can be easily performed
• Since R is open source, its usage is free of charge and future-proof
• R is available for all common operating systems and can be used largely
independently of them
• With additional packages, R fully supports a literate programming approach and
allows the easy creation of even complex reports in a wide variety of formats
14.3 Resources on R
While it was difficult to get help with problems in the early days of R, this has
changed dramatically in the last 10–15 years. For example, because the name R is
somewhat unfortunate for search engines, searching for solutions to problems
was initially not very successful. This has changed completely for the better.
On the internet there are countless extremely active communities around
R, with detailed blogs, podcasts, YouTube collections and programming help for
specific questions (e.g. Stack Overflow). In addition, the number of books about R
has exploded in recent years. These range from dedicated series on data analysis with
R (Springer Use R!, CRC The R-Series) and countless other scientific books with
at least code examples in R to freely available collections of high-quality
scientific books on R (bookdown.org). In addition, with the introduction of
ChatGPT and similar LLMs it is really easy to obtain individual support.
Therefore, getting started in data analysis with R is easier than with any other
programming language. In addition, within the scientific community in the field of
statistics, R is the dominant language for implementing and testing new procedures.
As a result, the latest statistical procedures are usually provided directly in R
through packages developed by the respective scientists themselves and are thus
available at an early stage.
As has been emphasised several times, working with R consists of having R execute
specific commands. Thus, in the simplest case, R can be seen as an oversized
calculator. For example, entering the command 2 + 2 on the R command line,
followed by ENTER, leads to the following sequence:
> 2 + 2
[1] 4
Here > is the prompt of the command line, and [1] indicates that the output line
starts with the first element of the result. The command line works according to the
principle of a so-called REPL, an abbreviation for read-eval-print loop. The input is
read in by R (R) and evaluated within the programming language (E), the result is
displayed (P), and subsequently the command line goes back to the initial state (L).
However, the 4 calculated in the example is no longer available for further
processing, since R has completed the REPL cycle and the output is not automatically
saved. In order to use calculated values further, they must be assigned an identifier
(name). This entails the concept of a variable. Experience has shown that this concept
represents a first major hurdle in the transition from, for example, spreadsheet
programmes, where the calculations seem to take place directly on the visible data.
To assign a name to an expression or its return value in R, the assignment operator
<- is used. For example, to give the result of the “complex” calculation 2 + 2 * 4
a name, it would look like this:
> x <- 2 + 2 * 4
In this case, R does not print a result, but has internally given the result
of 2 + 2 * 4 the identifier x. Entering x on the command line then leads to:
> x
[1] 10
That is, the value 10 is stored in the internal memory of R and can be called up
or output via the identifier x. This is a fundamental difference in the way of
working compared to spreadsheet programmes. In R, the results of calculations, i.e. the
return values of expressions, are assigned identifiers and can then be called again in
later steps. Conversely, if intermediate results do not have an identifier, they cannot
be reused.
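A minimal sketch of how named intermediate results are reused in later steps. The identifiers and values are chosen freely for illustration.

```r
# Each intermediate result receives an identifier and can be reused later.
distance_km <- 42.195                      # marathon distance
speed_kmh   <- 12                          # assumed constant running speed
time_h      <- distance_km / speed_kmh     # reuses both identifiers
time_min    <- time_h * 60                 # reuses the intermediate result
time_min
```

Had the division been typed without an assignment, its result would only have been printed and the conversion to minutes would have required retyping the calculation.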
Two further explanations of the previous examples are necessary. In the previous
expressions, spaces have been placed between the individual parts of the
expressions. These spaces are only for readability and have no influence on the
evaluation of the expression by R. Therefore, the expressions 2 + 2 * 4 and 2 + 2*4
are equivalent and lead to the same result. When outputting the value, you probably
also noticed that R did not calculate the value 16, which would result if the
expression were evaluated strictly from left to right. Instead, R has applied the
mathematical rule of multiplication before addition and has therefore arrived at the
mathematically correct result of 10.
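If a different order of evaluation is intended, it can be forced with round brackets, as this short sketch shows (the identifiers are arbitrary):

```r
x <- 2 + 2 * 4      # multiplication first: 2 + 8
y <- (2 + 2) * 4    # brackets force the addition first: 4 * 4
x                   # 10
y                   # 16
```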
When further processing identifiers in R, note that R distinguishes between
upper and lower case. Therefore, calling the identifier X (upper case) results in an
error:
> X
Error: object 'X' not found
The occurrence of errors often leads to great confusion for newcomers to R, but it is
a completely normal occurrence in everyday programming and should therefore not upset
anyone. In this case, R only complains that it cannot find the identifier X and
therefore does not know how to proceed.
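Whether an identifier exists can also be checked before it is used, and an error can be caught instead of stopping the work. A minimal sketch using the base R functions exists() and tryCatch(), assuming that no object named X has been defined in the session:

```r
x <- 10

exists("x")   # TRUE: lower-case x was defined above
exists("X")   # FALSE: upper-case X is a different identifier

# Catch the error and keep its message instead of aborting.
msg <- tryCatch(
  X,                                    # fails because X does not exist
  error = function(e) conditionMessage(e)
)
msg
```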
Working with R is largely based on the application of functions to values. In R,
functions are used according to the pattern <NAME>(<PAR1>,<PAR2>,…,<PARk>)
(the characters <> mark placeholders for arbitrary identifiers). That is, as soon as a
pair of round brackets follows an identifier, R assumes that a function is to be
called. Via <PAR1>,<PAR2>,…,<PARk>, comma-separated parameters can be passed to
the function. The number of parameters depends on the definition of the function.
A simple example is the application of the square root to a numerical value.
> y <- 9
> sqrt(y)
[1] 3
The great advantage of R is that you can easily define your own functions. For
example, a function that returns the minimum and maximum of a vector as a vec-
tor with two entries can be defined as follows.
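A minimal sketch of such a function. The name min_max is an assumption chosen here for illustration.

```r
# Return the minimum and maximum of a vector as a vector with two entries.
min_max <- function(x) {
  c(min(x), max(x))
}

min_max(c(3, 1, 4, 1, 5))   # 1 5
```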
Here the keyword function tells R that a new function is being defined.
The keyword is followed by a pair of round brackets containing the required
parameters. In the example, only one parameter is required, which is given the
identifier x. The naming is completely arbitrary and only has to be used consistently
in the following function body, which is delimited by the curly brackets {}. When
the function is called, the parameter in the function body is replaced by the value
passed in the brackets. When such a definition is typed over several lines at the
command line, R prefixes the continuation lines with +; these are just formatting
symbols of the R command line and should not be typed in by the programmer.
If a function from an R package is required, the package must first be installed
in the R environment on the local computer, if this has not already been done in a
previous session. Again, a function is used for this purpose. For example, to
generate interactive maps with R, the package leaflet is necessary. The following
command installs the package in the R system:
> install.packages("leaflet")
R contacts the CRAN server in the background and downloads the corre-
sponding package and required dependencies. The functionality of the package is
then not yet directly available, but the package must first be loaded into the cur-
rently active working environment with another command.
> library(leaflet)
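Both steps can be combined so that a package is only installed when it is actually missing. The helper use_package() below is a hypothetical convenience function, not part of R. It relies only on the base functions requireNamespace(), install.packages() and library().

```r
# Hypothetical helper (not part of R): install a package only if it is missing,
# then load it into the current session.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # contacts CRAN, as described above
  }
  library(pkg, character.only = TRUE)   # pkg is a string, hence character.only
}

use_package("stats")                    # ships with R, so nothing is installed
```

Calling use_package("leaflet") would behave in the same way, but download the package first if it is not yet available locally.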
These examples only serve to give a very first overview of working with R and
to introduce its first concepts. As can be seen from these simple examples, one of
the challenges is to learn the necessary commands and functions. Getting started
with R is therefore similar to learning a new (very simple) language. Nowadays, this
initial hurdle is made much lower by the significantly improved search functions of
internet search engines. A search that starts with R followed by the problem is
usually sufficient to find suggested solutions. In this way, rapid productivity in R
can be achieved without the need for in-depth programming knowledge. Therefore, R is
very well suited as a first programming language in which more advanced programming
concepts can be gradually developed as needed. Good sources for getting started are
Chambers (2008), Dalgaard (2020), Peng (2016), and Wickham and Grolemund (2016).
The following is a somewhat more extensive example with only brief explana-
tions. Detailed explanations of the commands used can be obtained in R through
the help documentation. To do this, simply place a ? in front of the function name
and R opens the corresponding help file.
> ?mean
Consider the data set in Table 14.1. In two independent groups A and B, the body
fat content was determined, and it is now to be examined whether there is a
statistically significant difference between the two groups. Of course, this is only
a synthetic example that serves as an illustration and should not be carried out in
this form in an actual scientific paper.
In order to carry out a data analysis, the data must first be loaded into R. In its
raw form (Supplementary Material: bfp_data.txt), the data are stored as a text
file. The first column of the file shows the group membership, while the second
column contains the respective fat content. The columns are separated by a comma
and a dot is used as decimal separator, following international conventions. To load
the data into R, the read_csv() function from the readr package is used. First,
however, the current working directory of R must be set to the folder in which the
file is stored. The function setwd() (short for set working directory) is used for
this. Since relative paths in R are resolved relative to the working directory, this
step makes further work easier because no long file paths have to be specified.
> setwd("<PATH>")
Table 14.1
A      B
13.3   22.0
6.0    16.0
20.0   21.7
8.0    210.0
14.0   30.0
19.0   26.0
–      30.0
Now the package readr is loaded and the file is read in using the function
read_csv().
> library(readr)
> bfp <- read_csv(file = 'bfp_data.txt')
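To try the following steps without the supplementary file, the values from Table 14.1 can also be entered inline. The sketch below uses base R's read.csv() instead of readr. The header row with the column names Group and BFP is an assumption based on the summary() output shown below.

```r
# Inline copy of the values in Table 14.1 (comma-separated, dot as decimal mark).
raw <- "Group,BFP
A,13.3
A,6.0
A,20.0
A,8.0
A,14.0
A,19.0
B,22.0
B,16.0
B,21.7
B,210.0
B,30.0
B,26.0
B,30.0"

bfp <- read.csv(text = raw)   # base R reader; returns a data.frame
str(bfp)
```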
The data is now available in R under the identifier bfp. The name is chosen
arbitrarily and kept short for the sake of simplicity.
The data is stored in a so-called data.frame object (or the newer version tibble)
and can now be processed further. For example, an overview of descriptive statis-
tics of the data can be generated using the function summary().
> summary(bfp)
Group BFP
Length:13 Min. : 6.00
Class :character 1st Qu.: 14.00
Mode :character Median : 20.00
Mean : 33.54
3rd Qu.: 26.00
Max. :210.00
Here you can already see that one of the data points is probably incorrect
because the value is >100, which is not possible for a percentage body fat. In the
next step, the data is displayed graphically using a boxplot. R provides numerous
functions for simple graphical representation. However, we will use the package
ggplot2 here, which enables the creation of modern publication-quality graphs
(Healy, 2018; Wickham, 2016). Again, the package must first be loaded before its
functionality can be accessed.
> library(ggplot2)
> ggplot(bfp, aes(Group, BFP)) + geom_boxplot()
Fig. 14.1 Illustration of the sample data using a boxplot with the problematic data point
In the boxplot in Fig. 14.1, the problematic data point is even more clearly
visible, and it also prevents a meaningful analysis of the data. Since we have
no further information, we exclude the data point. To do this, we use the filter
function from the dplyr package.
> library(dplyr)
> bfp_clean <- filter(bfp, BFP <= 100)
> bfp_clean
# A tibble: 12 x 2
Group BFP
<chr> <dbl>
1 A 13.3
2 A 6
3 A 20
4 A 8
5 A 14
6 A 19
7 B 22
8 B 16
9 B 21.7
10 B 30
11 B 26
12 B 30
> ggplot(bfp_clean, aes(Group, BFP)) + geom_boxplot()
The boxplot is now much more informative (see Fig. 14.2). Without going
into the statistical prerequisites any further, we now conduct an independent t-test
for groups with different variances. For this we again use a function from R.
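A minimal sketch of this step, using the t.test() call that also appears in the script further below. To keep the sketch self-contained, the cleaned data are entered inline as a data.frame. By default, t.test() does not assume equal variances (var.equal = FALSE), which corresponds to Welch's test.

```r
# Cleaned sample data (outlier removed), entered inline for a self-contained run.
bfp_clean <- data.frame(
  Group = rep(c("A", "B"), each = 6),
  BFP   = c(13.3, 6, 20, 8, 14, 19, 22, 16, 21.7, 30, 26, 30)
)

# Formula interface: compare BFP between the two levels of Group.
res <- t.test(BFP ~ Group, data = bfp_clean)
res            # prints test statistic, degrees of freedom, p-value, group means
res$p.value    # individual components can be extracted for further processing
```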
As this example shows, a data analysis can be realised in R with just a few com-
mands. The developers of R have taken care that the naming of functions is as
close as possible to the desired activity, so that the English term usually allows one
to quickly deduce the function.
Fig. 14.2 Representation of the sample data by means of a boxplot after exclusion of the outlier
In the example, we entered all commands directly on the command line and
analysed the data interactively. In an actual analysis, the data analysis will consist
of a combination of interactive work and permanent scripts. For example, those
final commands to be applied to the data would be written to a script file so that
the analysis can be revisited or retraced at a later time. In this way, the workflow
shown could lead to the following script:
setwd("<PATH>")
# Required packages
library(readr)
library(ggplot2)
library(dplyr)
# Read in data
bfp <- read_csv(file = 'bfp_data.txt')
# Process data
bfp_clean <- filter(bfp, BFP <= 100)
# Descriptive analysis
summary(bfp_clean)
# Graphics
ggplot(bfp_clean, aes(Group, BFP)) + geom_boxplot()
# Analysis
t.test(BFP~Group, data = bfp_clean)
The commands used are even more comprehensible through the use of com-
ments, which are signaled in R with a #.
Since the traceability of data analyses, not only for oneself but also for external
parties in the context of scientific reproducibility, is becoming increasingly
important in scientific practice, R offers extensive functionality for documenting and
publishing data analyses (Xie, 2016, 2017). Conceptually, this area falls under
so-called literate programming. In 2002, a special package, Sweave, was published
which allows the combination of R and LaTeX code in a single document. Building
on this, a whole standard has since developed in the form of RMarkdown documents,
which generate reports, scientific articles, books, web blogs, dashboards, interactive
documents, presentations and much more directly from R. A wide variety of
output formats (e.g., text, graphics) are available for this purpose, including for
example docx, pdf, odf, pptx, html.
At their core, many of these processes are based on the Markdown standard,
which makes it possible to create even complex documents easily.
Markdown is a simplified markup language that can be used to create documents
without complicated word processing systems, and where the final formatting of
the documents is left to an external system (the pandoc software in the case of R).
The RMarkdown format enriches the Markdown format with R-specific elements. A
simple example document based on our previous script would be:
---
author: "Robert Rein"
title: "RMarkdown example"
output: pdf_document
---
# Required packages
```{r}
library(readr)
library(ggplot2)
library(dplyr)
```
# Read in data
```{r}
bfp <- read_csv(file = 'bfp_data.txt')
```
# Process data
```{r}
bfp_clean <- filter(bfp, BFP <= 100)
```
# Descriptive analysis
```{r}
summary(bfp_clean)
```
# Graphics
```{r}
#| fig.cap="Boxplot of data",
#| fig.height=3
ggplot(bfp_clean, aes(Group, BFP)) + geom_boxplot()
```
# Analysis
```{r}
t.test(BFP~Group, data = bfp_clean)
```
The RMarkdown document begins with a so-called YAML header containing
metadata. This is followed by the actual document. In Markdown notation, a hash (#)
marks a first-order heading. The areas between ```{r} and ``` denote an R code area
called a chunk. The document can be translated into a PDF report using the knitr
package. R first executes the respective chunks, and each chunk is then replaced
by the corresponding expressions and their values. Generated graphics are also
inserted directly into the document. Recently, an updated version of RMarkdown
called Quarto was introduced, which further improves the combination of code and
text and also allows different programming languages to be combined in a single
document.
Overall, the R environment therefore covers the complete life cycle of a data
analysis, from the initial processing to the publication of the data, within a single
environment. This allows recurring workflows to be automated and reporting
systems to be created relatively easily for a wide range of use cases.
Study Box
Data analysis is rarely a straightforward task in which all processing steps are fixed
from the beginning. Therefore, every data analysis is characterized by interactive
phases in which the data must first be examined and processed. Following data pro-
cessing, the actual data analysis usually takes place. These two steps can be run
through several times in iterative cycles before a final process pipeline is created with
which the data are finally analyzed and published. A software environment for data
analysis must therefore optimally support these steps. The R programming language
was designed precisely with the aim of enabling these phases and therefore offers
possibilities for interactive as well as more structured ways of working. Through the
possibility of integrating additional packages, the functionality of R is constantly
being expanded, and for practically every use case there is a complete or at
least a partial ready-made solution. R offers extensive support for the complete
life cycle of a data analysis. Since R is a complete programming language, all
analysis steps can be adapted to one’s own needs. R can be applied at different
levels of complexity as needed. Over time, a new user with no programming
experience can thus develop into a fully fledged programmer.
> y <- 3
> z <- 3 * y + 4
> z
References
Abelson, H., & Sussman, G. J. (1996). Structure and interpretation of computer programs. The MIT
Press.
Chambers, J. M. (2008). Software for data analysis: Programming with R (Vol. 2). Springer.
Dalgaard, P. (2020). Introductory statistics with R (2nd ed.). Springer.
Healy, K. (2018). Data visualization: A practical introduction. Princeton University Press.
Ihaka, R. (1998). R: Past and future history. Computing Science and Statistics, 30, 392–396.
Peng, R. D. (2016). R programming for data science. Leanpub.
Sussman, G. J., & Steele, G. L. (1998). Scheme: An interpreter for extended lambda calculus. Higher-
Order and Symbolic Computation, 11(4), 405–439.
Venables, W., & Ripley, B. D. (2000). S programming. Springer Science & Business Media.
Wickham, H. (2016). Programming with ggplot2. Springer.
Wickham, H. (2019). Advanced R. CRC Press.
Wickham, H. (2021). Mastering shiny. O'Reilly Media, Inc.
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and
model data. O’Reilly Media, Inc.
Xie, Y. (2016). Bookdown: Authoring books and technical documents with R markdown. Chapman and
Hall/CRC.
Xie, Y. (2017). Dynamic documents with R and knitr. Chapman and Hall/CRC.
15 Python
Maximilian Klemp
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_15
Key Messages
• Python is a dynamic, object-oriented, high-level, open-source programming
language and one of the most popular languages for Data Science tasks.
• The large volume of data generated in sports requires efficient and flexible
analysis pipelines, spanning from large-scale pre-processing of data (such as
position and event data) to modelling by means of Machine Learning or Artificial
Intelligence.
• Python offers a wide range of functionalities through its various libraries,
providing solutions for every processing step occurring in sports analytics.
• Multiple state-of-the-art frameworks frequently used in sports analytics are
implemented in Python and accessible to researchers, teachers and students via
open-source libraries.
• The numerous use cases show the importance and significance of Python in sports
analytics.
15.2 Background
The structure and volume of data discussed previously give rise to specific
requirements for data processing in sports analytics. For example, raw data (e.g.
position or event data) are often available for multiple matches, training sessions
or sequences, and have to be pre-processed before the actual analysis steps can be
performed. This pre-processing might include filtering, grouping and aggregation
operations as well as creating variables by transforming or combining existing
variables. Sometimes it is necessary to join data from different sources on a
common index. For example, position and event data might have to be synchronized
on the level of a timecode, or meta-information of matches might have to be
combined with variables calculated from raw data (following the terminology of
the database query language SQL, these operations are referred to as “joins”).
This breadth of possible processing steps alone makes the development of
efficient and robust pre-processing pipelines indispensable. Furthermore, in
processing and analysing sports data, the same steps often have to be performed
repeatedly. The reason for this might be the collection of new data, which have to
be processed immediately, or the vast volume of the analysed data. Position data
for one football match take around 500 MB of memory in Python, so loading a whole
season (which is the sample size that analyses should ideally be based upon) at
once is generally not feasible. This circumstance is the primary justification for
reproducible analysis routines. Finally, after structured datasets have been
created through the steps outlined above, these data have to be modelled using
methods of Machine Learning or Artificial Intelligence in order to find
associations between indicators or to solve classification problems in the domain
of computer science in sports (such as the generation of position data from
broadcasting videos or the detection of match phases).
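The join of position and event data on a common timecode described above can be sketched as follows. This is only a minimal illustration using the Python standard library; the field names (frame, x, y, event) and all values are hypothetical and do not follow any particular data provider. In practice, such joins are typically carried out with a data-frame library such as pandas.

```python
# Hypothetical position samples for one player: one record per frame (timecode).
positions = [
    {"frame": 0, "x": 10.0, "y": 5.0},
    {"frame": 1, "x": 10.5, "y": 5.2},
    {"frame": 2, "x": 11.1, "y": 5.3},
    {"frame": 3, "x": 11.8, "y": 5.1},
]

# Hypothetical event records, annotated with the frame at which they occurred.
events = [
    {"frame": 1, "event": "pass"},
    {"frame": 3, "event": "shot"},
]


def left_join_on_frame(positions, events):
    """Attach the event (if any) to every position record with the same frame."""
    event_by_frame = {e["frame"]: e["event"] for e in events}
    return [
        {**p, "event": event_by_frame.get(p["frame"])}  # None if no event
        for p in positions
    ]


joined = left_join_on_frame(positions, events)
for row in joined:
    print(row)
```

Because every position record is kept and events are attached where the frame matches, this corresponds to a left join in SQL terminology. Running such a routine match by match, and storing only the joined or aggregated result, is one way to avoid holding a whole season in memory at once.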
Python offers solutions for all of the challenges named above and is, due to this
and other features, well-suited for data processing and analysis in the sport context.
Python is highly popular in the community of data scientists in general and sports
analysts in particular because it is an open-source, dynamic, object-oriented,
high-level programming language, which provides highly flexible and up-to-date
functionalities through its available modules and libraries. With respect to the
► Example 1
The above-mentioned pre-processing steps (filtering, grouping, aggregation) are needed
in many analyses of large data volumes. For instance, Anzer and Bauer
(2021) created an Expected Goals Model using information from position and event
data simultaneously. To this end, complex processing steps for the synchronization of
both data types were necessary. Klemp et al. (2021) collected multiple performance
indicators separately for both halves of football matches and filtered variables with
respect to ball possession or running velocities. Subsequently, the indicators calculated
from raw position data had to be combined with those from raw event data and from
another data source for betting odds. The operations performed in these two research
articles consisted of several pre-processing steps, which were performed using the
functionalities of pandas and NumPy. ◄
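The three operations named in this example (filtering, grouping, aggregation) can be illustrated with a small sketch. It uses only the standard library instead of pandas or NumPy, and the records, field names and values are invented for illustration; it is not the processing used in the cited studies.

```python
from collections import defaultdict

# Hypothetical per-frame records: half of the match, whether the team is in
# possession of the ball, and the player's running velocity in m/s.
frames = [
    {"half": 1, "in_possession": True, "velocity": 3.0},
    {"half": 1, "in_possession": False, "velocity": 5.0},
    {"half": 1, "in_possession": True, "velocity": 4.0},
    {"half": 2, "in_possession": True, "velocity": 2.0},
    {"half": 2, "in_possession": True, "velocity": 6.0},
]


def mean_velocity_per_half(frames, in_possession=True):
    """Filter by ball possession, group by half, and average the velocity."""
    grouped = defaultdict(list)
    for f in frames:
        if f["in_possession"] == in_possession:       # filtering
            grouped[f["half"]].append(f["velocity"])  # grouping
    # aggregation: one mean value per group
    return {half: sum(v) / len(v) for half, v in grouped.items()}


result = mean_velocity_per_half(frames)
print(result)
```

With a data-frame library, the same logic would usually be written as a filter followed by a group-by and a mean; the explicit loop is used here only to make the three steps visible.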
► Example 2
Another prominent use case within the field of tactical analysis comprises the
calculation of the so-called variables of collective behaviour. These variables model
the relationships and interactions among players and teams as geometric or algebraic
entities, making the library SciPy a popular choice for these calculations. Examples of
this can be found in the comparison of team formations by Memmert et al. (2019) or in
the examination of substitutions in football by Lorenzo-Martínez et al. (2022). ◄
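As a hedged illustration of what such a geometric variable can look like, the sketch below computes a team centroid and a stretch index (the mean distance of the players from the centroid) for one frame. These two variables are chosen here only as examples and are not necessarily those used in the cited studies; the coordinates are invented, and the standard library is used instead of SciPy to keep the sketch self-contained.

```python
import math


def centroid(points):
    """Mean position of all players given as (x, y) tuples."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def stretch_index(points):
    """Mean Euclidean distance of the players from the team centroid."""
    cx, cy = centroid(points)
    return sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)


# Hypothetical positions (in metres) of four players in a single frame.
team = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

print(centroid(team))
print(stretch_index(team))
```

Applied frame by frame to position data, such a function turns the raw coordinates of a team into a single time series describing how compact or stretched the team is.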
► Example 3
Finally, Python enables statistical modelling both in the sense of null hypothesis
significance testing (e.g. Bassek et al., 2022) and for implementing complex models
from the area of Machine Learning. Probably one of the best-known applications can
be found in the work of Decroos et al. (2019), who built a model using event data to
quantify the game state for any given time instance. The game state thereby comprises
the probability of a team scoring or conceding a goal within a defined time interval in
the future. Building upon this, all player actions can be valued based on their effect on
the game state. The approach also contains data processing and visualization techniques.
It deserves particular mention because the algorithms developed in the course of the
study have been published within the Python library socceraction
(https://github.com/ML-KULeuven/socceraction), so that interested programmers can
reproduce the analyses using data from different providers. Finally, in this respect, the
publication of a large-scale data set of event data by Pappalardo et al. (2019) should be
mentioned, for which the authors also provide functions for the import and analysis of
the data
(https://github.com/Friends-of-Tracking-Data-FoTD/mapping-match-events-in-Python). ◄
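The idea of valuing an action by its effect on the game state can be sketched in a few lines. The sketch assumes that the probabilities of scoring and conceding before and after an action are already available from some model; the function name and the numbers are hypothetical, and the code does not use the socceraction API.

```python
def action_value(p_score_before, p_concede_before, p_score_after, p_concede_after):
    """Value of an action as the change in game state it causes.

    An action is rewarded for increasing the probability of scoring and for
    decreasing the probability of conceding.
    """
    offensive = p_score_after - p_score_before
    defensive = -(p_concede_after - p_concede_before)
    return offensive + defensive


# Hypothetical example: a pass raises the scoring probability from 2% to 5%
# and leaves the conceding probability unchanged at 1%.
value = action_value(0.02, 0.01, 0.05, 0.01)
print(value)
```

Summing such values over all actions of a player gives a simple rating; how the underlying probabilities are estimated is the actual modelling task and is not shown here.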
Study Box
As pointed out in the previous sections, data analysts in sports are confronted
with recurring tasks, which in specific cases might still require customized solutions.
To account for this and provide a variety of generalized solutions, the
Python library floodlight was published by Raabe et al. (2022). It contains different
pre-processing and analysis routines for the investigation of sports data, at this point
in time specifically football and handball data. The functionalities of floodlight
include import functions for different data sources of position, event and meta data,
visualization methods, and specific data models from different disciplines such as
exercise physiology (Metabolic Power Model, di Prampero & Osgnach, 2018),
dynamical systems (Approximate Entropy, Pincus, 1991) or collective behaviour
(Bourbousson et al., 2010). The aim of floodlight is to standardize or externalize
recurring processes of import and pre-processing so that researchers can focus more
specifically on the analysis itself.
References
Anzer, G., & Bauer, P. (2021). A goal scoring probability model for shots based on synchronized
positional and event data in football (soccer). Frontiers in Sports and Active Living, 3, 624475.
Barrett, P., Hunter, J., Miller, J. T., Hsu, J.-C., & Greenfield, P. (2005). matplotlib—A portable python
plotting package. In Astronomical data analysis software and systems XIV.
Bassek, M., Raabe, D., Memmert, D., & Rein, R. (2022). Analysing motion characteristics and meta-
bolic power in elite male handball players. Journal of Sports Science and Medicine, 22(2), 310–
316.
Bourbousson, J., Sève, C., & McGarry, T. (2010). Space–time coordination dynamics in basketball:
Part 2. The interaction between the two teams. Journal of Sports Sciences, 28(3), 349–358.
Decroos, T., Bransen, L., Van Haaren, J., & Davis, J. (2019). Actions speak louder than goals: Valuing
player actions in soccer. In Proceedings of the 25th ACM SIGKDD international conference on
knowledge discovery & data mining.
Di Prampero, P. E., & Osgnach, C. (2018). Metabolic power in team sports—Part 1: An update.
International Journal of Sports Medicine, 39(08), 581–587.
Klemp, M., Wunderlich, F., & Memmert, D. (2021). In-play forecasting in football using event and
positional data. Scientific Reports, 11(1), 1–10.
Lorenzo-Martínez, M., Rein, R., Garnica-Caparrós, M., Memmert, D., & Rey, E. (2022). The effect
of substitutions on team tactical behavior in professional soccer. Research Quarterly for Exercise
and Sport, 93(2), 301–309.
McKinney, W. (2010). Data structures for statistical computing in python. In Proceedings of the 9th
python in science conference.
McKinney, W. (2011). pandas: A foundational python library for data analysis and statistics. Python
for High Performance and Scientific Computing, 14(9), 1–9.
Memmert, D., & Raabe, D. (2018). Data analytics in football: Positional data collection, modelling and
analysis. Routledge.
Memmert, D., Raabe, D., Schwab, S., & Rein, R. (2019). A tactical comparison of the 4-2-3-1 and
3-5-2 formation in soccer: A theory-oriented, experimental approach based on positional data in
an 11 vs. 11 game set-up. PLoS One, 14(1), e0210191.
Oliphant, T. E. (2006). A guide to NumPy (Vol. 1). Trelgol Publishing USA.
Pappalardo, L., Cintia, P., Rossi, A., Massucco, E., Ferragina, P., Pedreschi, D., & Giannotti, F.
(2019). A public data set of spatio-temporal match events in soccer competitions. Scientific Data,
6(1), 1–15.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M.,
Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M.,
Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in python. The Journal of
Machine Learning Research, 12, 2825–2830.
Pincus, S. M. (1991). Approximate entropy as a measure of system complexity. Proceedings of the
National Academy of Sciences, 88(6), 2297–2301.
Raabe, D., Biermann, H., Bassek, M., Wohlan, M., Komitova, R., Rein, R., Groot, T. K., &
Memmert, D. (2022). floodlight—A high-level, data-driven sports analytics framework. Journal
of Open Source Software, 7(76), 4588.
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with python. In
Proceedings of the 9th python in science conference.
VI Data Analysis
Logistic Regression
Ashwin Phatak
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_16
Key Messages
• Logistic regression is one of the basic statistical classification models used for
modeling binary outcomes.
• With a large amount of notational/event data available in sports, logistic
regression (LR) is one of the simplest tools to investigate a wide array of binary
problems relevant in sports.
• Logistic regression can handle data imbalances, is highly interpretable and is
computationally inexpensive.
16.2 Background
The rise of ‘Big Data’ has brought a whole new approach to sports performance
analysis. Data collection mechanisms and their delivery to the sports industry
have given rise to a wide potential for the use of available data across sports and
their respective sub-domains (Rein & Memmert, 2016). Statistical analysis and machine
learning are the tools that help experts mine and interpret data in the
decision-making and knowledge-discovery process (Phatak et al., 2021).
Classification and regression methods are the primary methods used for knowledge
discovery. In classification, Logistic Regression is one of the simplest algorithms
used for modeling binary problems. It is computationally inexpensive,
interpretable, and robust against data imbalances. Programs like SPSS and Excel
and languages like Python and R offer built-in functions or libraries to implement
Logistic Regression, making it simple and accessible to data analysts with different
levels of technical skill (Persson, 2022).
Definition
In statistics, the logistic model is a statistical model that models the probability of
an event. This is done by taking the log odds of the event to be a linear
combination of one or more independent variables (Wright, 1995). In essence, it is
a linear regression compressed between 0 and 1 using the logistic function, the
inverse of the logit transform (see equation below). So, the ceiling value is fixed
at 1 and the floor at 0.
$$F(x) = \frac{1}{1 + e^{-(b_0 + \dots + b_n x_n)}}$$
Visually, it can be interpreted as fitting an S-shaped curve (see Fig. 16.1) to the
given data to predict a binary dependent variable. Logistic regression and its
variations, such as multiple logistic regression and logistic mixed models, can take
multiple independent variables as input and can also model interaction
effects.
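The relationship between the linear combination, the logistic function and the log odds can be checked numerically with a short sketch. The function names and the coefficient values are chosen for illustration only.

```python
import math


def logistic(x, coefficients, intercept):
    """Probability F(x) for inputs x, coefficients b_1..b_n and intercept b_0."""
    z = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Logit transform: maps a probability back to the linear scale."""
    return math.log(p / (1.0 - p))


# Hypothetical model with one independent variable: b_0 = -0.3, b_1 = 0.5.
p = logistic([2.0], coefficients=[0.5], intercept=-0.3)
print(p)            # a probability strictly between 0 and 1
print(log_odds(p))  # recovers the linear combination -0.3 + 0.5 * 2.0
```

The second printed value equals the linear combination up to rounding, which is exactly the statement that the log odds are linear in the independent variables.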
16.3 Application
► Example 1
Let’s consider a situation in an invasion game like football (soccer). There has been
great debate about the Video Assistant Referee (VAR) (Tovar, 2021). We can potentially
model an automated system like VAR using Logistic Regression. In that case, the
parameters would be as follows:
• Dependent variable: Offside = 1 and onside = 0
• Independent variables, at the moment when the pass is played:
  – Vertical location of the furthermost body part of the second-last player of the
    defending team.
  – Vertical location of the furthermost body part of the most advanced player
    (involved in the game) of the attacking team.
We can train the logistic regression algorithm by giving it a label based on whether the
given situation is onside or offside. We can then use the model as a rudimentary VAR
system to detect offsides. ◄
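A minimal sketch of such a training procedure is given below. It fits a logistic regression with plain gradient descent on a small synthetic data set; the positions, the labelling rule (attacker beyond the second-last defender), the scaling constant and the learning settings are all assumptions made for illustration. A real system would use a tested library implementation and real match data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Synthetic, hypothetical training data: positions in metres along the pitch,
# larger values are closer to the defending team's goal line.
samples = []
for defender in (20.0, 30.0, 40.0, 50.0):
    for gap in (-6.0, -4.0, -2.0, 2.0, 4.0, 6.0):
        attacker = defender + gap
        label = 1 if attacker > defender else 0  # 1 = offside, 0 = onside
        samples.append(((defender, attacker), label))

# Centre and scale the inputs so that plain gradient descent behaves well.
n = len(samples)
means = [sum(x[j] for x, _ in samples) / n for j in range(2)]
SCALE = 10.0


def transform(x):
    return [(x[j] - means[j]) / SCALE for j in range(2)]


def fit(samples, learning_rate=0.5, n_iter=5000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    weights, bias = [0.0, 0.0], 0.0
    data = [(transform(x), y) for x, y in samples]
    for _ in range(n_iter):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


weights, bias = fit(samples)


def p_offside(defender, attacker):
    """Predicted probability that the situation is offside."""
    x = transform((defender, attacker))
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


accuracy = sum((p_offside(*x) > 0.5) == bool(y) for x, y in samples) / n
print(accuracy)
print(p_offside(30.0, 36.0), p_offside(30.0, 24.0))
```

The fitted weights have opposite signs for the two positions, so the model effectively learns the difference between attacker and defender position, which is the quantity the offside decision depends on in this simplified setting.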
► Example 2
Win/loss probability can also be modeled using Logistic Regression in multiple sports
based on betting odds (Wunderlich & Memmert, 2016, 2018). A model can be built to
improve on betting odds by adding the results of the past 3 games. For any sport in
which team A is playing team B, the input and output parameters would be as follows:
• Dependent variable: Win = 1, Loss = 0
• Independent variables:
  – Betting odds of team A winning
  – Betting odds of team B winning
  – Result of the last 3 games for team A
  – Result of the last 3 games for team B
Note: Such a model can be built in theory, but there are nuances. In our case, we have
assumed that betting odds don’t already account for the last three games. If they do,
we would have to deal with multicollinearity in the independent variables, which is a
whole new topic by itself. You can look into it further if you are interested. ◄
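A first, rough screen for the multicollinearity mentioned in the note is to look at the correlation between two independent variables. The sketch below computes a Pearson correlation by hand; the odds, the encoding of recent results as points from the last three games, and all values are hypothetical. Pairwise correlation is only a simple indicator and does not replace a full collinearity diagnosis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical predictors for six matches of team A.
odds_a_win = [1.5, 1.8, 2.2, 2.9, 3.5, 4.0]  # decimal betting odds
points_last3_a = [9, 7, 6, 4, 3, 1]          # points from the last three games

r = pearson(odds_a_win, points_last3_a)
print(r)
```

In this invented data the two predictors are strongly negatively correlated, which would suggest that the odds already reflect recent form and that the two variables carry largely the same information.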
► Example 3
Every season in the NBA, there is a draft of players from the NCAA. A model using
logistic regression can potentially be designed to decide whether to pick a player or
not (Liu et al., 2018). The parameters of such a model would be as follows:
• Dependent variable: Pick = 1, Not to pick = 0
• Independent variables:
  – Attacking statistics
    – Field goals
    – 3-point shots
    – 2-point shots
    – etc.
  – Defending statistics
    – Rebounds
    – Steals
    – Blocks
    – etc.
  – Physiological statistics
    – Height
    – Jump height
    – etc. ◄
Study Box
A study conducted on elite and sub-elite goalkeepers (GKs) in soccer analyzed which
set of Key Performance Indicators (KPIs) distinguishes Champions League-level GKs (CL)
from non-Champions League-level (NCL) ones (Jamil et al., 2021). The analysis involved
the use of logistic regression, where the CL GKs were encoded as 1 and the NCL GKs
were encoded as 0 in the binary dependent variable. After some preprocessing steps,
a set of 20 GK performance statistics (independent variables) was used to model
the differences. It was observed that CL GKs were better at short distribution with
their feet than the NCL GKs. There seemed to be no differences in shot-stopping
ability.
This idea can be used to analyze any position in football (soccer), and the encoding
of the players as 1/0 (success criterion) depends on the research question.
A question such as what the difference is between the KPIs of relegation-level
midfielders and those of non-relegation-level midfielders can also be answered using
the same data and logistic regression by simply changing the encoding criterion of the
players.
References
Jamil, M., Phatak, A., Mehta, S., Beato, M., Memmert, D., & Connor, M. (2021). Using multiple
machine learning algorithms to classify elite and sub-elite goalkeepers in professional men’s foot-
ball. Scientific Reports, 11(1), 1–7.
Liu, Y., Schulte, O., & Li, C. (2018). Model trees for identifying exceptional players in the NHL and
NBA drafts. In International workshop on machine learning and data mining for sports analytics
(pp. 93–105). Springer.
Mattera, R. (2021). Forecasting binary outcomes in soccer. Annals of Operations Research, 1–20.
Persson, I. (2022). Review of applied univariate, bivariate, and multivariate statistics using python by
Daniel Denis. Structural Equation Modeling: A Multidisciplinary Journal, 29, 321–325.
Phatak, A. A., Wieland, F. G., Vempala, K., Volkmar, F., & Memmert, D. (2021). Artificial
intelligence-based body sensor network framework–narrative review: Proposing an end-to-end
framework using wearable sensors, real-time location systems and artificial intelligence/machine
learning algorithms for data collection, data mining and knowledge discovery in sports and
healthcare. Sports Medicine-Open, 7(1), 1–15.
Rein, R., & Memmert, D. (2016). Big data and tactical analysis in elite soccer: Future challenges and
opportunities for sports science. Springerplus, 5(1), 1–13.
Tovar, J. (2021). The debate of VAR. In On fairness, justice, and VAR (pp. 29–39). Palgrave Macmillan.
Wright, R. E. (1995). Logistic regression.
Wunderlich, F., & Memmert, D. (2016). Analysis of the predictive qualities of betting odds and FIFA
world ranking: Evidence from the 2006, 2010 and 2014 football world cups. Journal of Sports
Sciences, 34(24), 2176–2184.
Wunderlich, F., & Memmert, D. (2018). The betting odds rating system: Using soccer forecasts to
forecast soccer. PLoS One, 13(6), e0198668.
17 Time Series Data Mining
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_17
R. Komitova and D. Memmert
Key Messages
• Time series data are common in the sport domain and often require particular
analysis techniques.
• Extracting information about the behavior of time series data is potentially very
important and useful for the analysis of sport data.
• Time series data mining techniques are good candidates for application to
sports data in order to discover unknown knowledge.
• Time series data mining techniques are useful for decision making in the sport
domain by looking for the existence of patterns, discovering motifs, or detecting
anomalies.
17.2 Background
Data mining is a field of computer science concerned with discovering and extracting
useful information from data. Although the analysis of sport data has traditionally
been based on expert knowledge and statistical analysis, data mining techniques have
become increasingly popular in the field (Stein et al., 2017). The evolution of
tracking systems offers researchers in the sport domain a further opportunity to
extract new knowledge related to player performance and movement patterns, among
others. Given the amount of information in the sport domain, the opportunities for
data mining are extremely widespread, and the benefits from the results can be
enormous (Bonidia et al., 2018; Ofoghi et al., 2013). For example, recurring and
surprising events extracted from sport data could be helpful for sport activity
recognition, and patterns could be helpful in forecasting future events, such as the
prediction of performance, coaching, and strategy planning.
There exists a large number of data mining applications and domains where
information is recorded over a period of time, leading to sequences of temporal
data (also called samples or observations). For example, in the sport domain, one
major source of temporal data is sensor data, which can be
represented as time series. The application of time series techniques in data mining
is called time series data mining (TSDM) (Esling & Agon, 2012; Fu, 2011; Mitsa,
2010; Komitova et al., 2023). Traditional time series analysis (Box et al., 2016)
contains methods to analyze time series data in order to extract temporal rules from
the structure of time series, such as trends, changes in value, seasonality,
periodicity, or other characteristics of the data, to generate an accurate forecast.
TSDM, in contrast, deals with much larger amounts of time series data and a much
higher number of time series. The focus of TSDM is less on the analysis of the
statistical properties of time series data and more on the discovery of
hidden relations between time series and the extraction of potentially useful and
meaningful information from them, where the terms “useful” and “meaningful” depend
on the application. TSDM addresses tasks such as classification and clustering of time
series, anomaly detection, and motif discovery in time series, among others.
While the sport domain is barely mentioned in this context, TSDM tasks could
also be applied more widely to sport. State-of-the-art approaches used in medicine and
individual sports are hardly applicable. For example, multiple types of human
motion can occur within a recording session of physical activity (Minnen et al.,
2006; Tanaka et al., 2005). TSDM tasks such as anomaly detection and motif
discovery for human motion rely on the similarity between (single or multidimensional)
time series sequences of activities in the form of motifs. Similar activities are
characterized by similar sets of actions that appear frequently in sports, too.
The activities of a player can be broadly classified into simple activities (or
events) and complex activities. Simple activities do not depend on the context, i.e.
they can exist by themselves (e.g. specific actions made by individual players).
Complex activities, on the other hand, are composed of a set of simple activities and
may focus on understanding the relationship between the events, or the interactions
with other players, and analyzing them as a sequence of events. For example,
time series data from the position data of a player during a match can be broken
down into events such as passes or shots, but even more complex activities such as
dribbling can be regarded as a combination of multiple simple activities. Activity
recognition (or event detection) in sports by considering positional data in the form
of time series can thus provide valuable knowledge and context about the actions of a
player. However, such a task is not simple because there is no standard
taxonomy of player activities. Additionally, it is not easy to model complex
activities or context categories precisely and generally enough.
Definition
A time series T is an ordered sequence of real numbers, i.e. $T = [t_1, \dots, t_n]$, where
$t_i \in \mathbb{R}$, $i = 1, \dots, n$, denotes the i-th element of the time series T. A time
series can be a collection of observations from one source, i.e. one sensor. A
multidimensional time series is a set of single time series. With regard to sensor data,
the term multidimensional implies the exploration of multiple time series (signals) in
parallel. Time series data mining (TSDM) is a field in data science that aims to
discover and extract useful information from time series data.
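In code, these definitions map onto very simple data structures. The sketch below represents a time series as a list of numbers and a multidimensional time series as a dictionary of such lists, and extracts all subsequences of a given length with a sliding window, a building block that many TSDM tasks rely on. The sensor names and values are hypothetical.

```python
def subsequences(series, length):
    """All contiguous subsequences (sliding windows) of the given length."""
    return [series[i:i + length] for i in range(len(series) - length + 1)]


# A single time series T = [t_1, ..., t_n].
T = [0.0, 1.0, 0.5, 2.0, 1.5]

# A multidimensional time series: several signals recorded in parallel
# (hypothetical accelerometer axes).
multi = {
    "acc_x": [0.1, 0.3, 0.2, 0.4, 0.1],
    "acc_y": [1.0, 0.9, 1.1, 1.0, 0.8],
}

windows = subsequences(T, 3)
print(windows)
```

A series of length n has n - m + 1 subsequences of length m, so the number of windows grows with the length of the recording.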
17.3 Applications
► Example 1
Time series classification and time series clustering are important and challenging
problems in time series data mining (Esling & Agon, 2012; Liao, 2005; Mitsa, 2010).
Time series classification seeks to assign a label to each time series of a set. Given
an unlabeled time series, the goal of time series classification is to assign it to one
of a given number of predefined classes. Clustering is a method of creating natural
groups, so-called clusters, in a dataset. The input of the clustering can be a set of
time series (multidimensional time series) from different sources, such as sensors, or
the set of subsequences of a single large time series data source. Considering a set of
time series, the main idea of clustering is to find groups of time series that are
similar inside the cluster but relatively different from the time series of other
clusters. ◄
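One very simple way to perform time series classification is a nearest-neighbour rule: an unlabeled series receives the label of the most similar labelled series. The sketch below uses the Euclidean distance on series of equal length. Both the choice of method and the example data (including the class names) are assumptions made for illustration, not a method prescribed by the cited literature.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two time series of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(query, labelled_series):
    """Return the label of the labelled series closest to the query."""
    _, nearest_label = min(
        labelled_series, key=lambda item: euclidean(query, item[0])
    )
    return nearest_label


# Hypothetical labelled speed profiles (m/s) of equal length.
training = [
    ([0.0, 0.1, 0.0, 0.1], "standing"),
    ([0.0, 1.0, 0.0, 1.0], "walking"),
    ([0.0, 3.0, 0.0, 3.0], "sprinting"),
]

label = classify_1nn([0.1, 2.6, 0.0, 2.9], training)
print(label)
```

The same distance function can serve as the similarity measure for clustering, where series are grouped without predefined labels.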
► Example 2
Time series anomalies can be defined as unexpected or unusual patterns in time series
data that do not conform to a well-defined notion of the expected (normal) behavior
(Zolhavarieh et al., 2014). In other words, anomalies appear when the underlying
process deviates from its normal behavior. The problem of finding them is referred to
as anomaly detection. Anomalies can be broadly classified into three general categories
(Chandola et al., 2009). A point anomaly is a point that deviates significantly from all
the other points in the dataset (Braei & Wagner, 2020). Contextual anomalies are data
points whose values are anomalous with respect to a specific context, but not otherwise.
That is, a given behavior might be “normal” in one context but abnormal in another.
Finally, a collective anomaly refers to a collection of related data instances that
individually may not be anomalies, but whose collective appearance is anomalous. Knowing
a priori which type of anomaly the time series data might contain helps the data analyst
to choose the appropriate detection method. ◄
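For point anomalies, a simple baseline is to flag values that lie far from the mean of the series, measured in standard deviations (z-score). The sketch below is such a baseline under the assumption that normal values are roughly stable; the threshold of 3 and the signal values are illustrative choices. Contextual and collective anomalies require other methods and are not covered by this rule.

```python
import statistics


def point_anomalies(series, threshold=3.0):
    """Indices of values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return []  # constant series: nothing deviates
    return [i for i, v in enumerate(series) if abs(v - mean) / std > threshold]


# Hypothetical sensor signal with one obvious spike at index 8.
signal = [62, 64, 63, 61, 65, 63, 62, 64, 140, 63, 62, 64]

anomalies = point_anomalies(signal)
print(anomalies)
```

Because the spike itself inflates the standard deviation, such a global rule becomes less sensitive when several anomalies are present, which is one reason why more robust detection methods exist.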
► Example 3
A common problem in the time series data mining and machine learning community
is the finding of previously unknown, frequently occurring subsequences of single (or
multiple) time series, also called motifs (Chiu et al., 2003; Lin et al., 2002). Motif dis-
covery is the technique to find them and is a fundamental problem for time series data
mining (Mueen, 2014; Torkamani & Lohweg, 2017; Tanaka et al., 2005). Examples for
motifs can be peaks (e.g. local minima or maxima), changes of noise characteristics of
time series, or a variation of time or spectral components, which repeatedly occur in a
time series. However, finding motifs is a difficult task, even when they have the same or
very similar general characteristics, because in most cases the number of occurrences of
motifs, their shape and length, and duration of occurrences may be unknown (Mitsa,
2010). ◄
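To make the idea concrete, here is a hedged brute-force sketch that finds the closest pair of non-overlapping subsequences of a fixed length m under z-normalized Euclidean distance. Fixing m in advance is an assumption made for illustration (as noted above, the length is often unknown in practice), and the function names and toy series are hypothetical; the cited works use considerably more efficient algorithms.

```python
import math

def znorm(window):
    """Z-normalize a window so that only its shape matters."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]

def best_motif_pair(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping
    length-m subsequences starting at positions i < j."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(windows)):
        # Start at i + m to skip trivial matches with overlapping windows.
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy series in which the shape 0, 2, 5, 2 occurs twice.
series = [0, 1, 0, 0, 2, 5, 2, 0, 1, 0, 1, 0, 2, 5, 2, 0, 0, 1]
distance, first, second = best_motif_pair(series, m=4)
print(distance, first, second)
```

The quadratic number of comparisons makes this approach suitable only for short series; it is meant to show what a motif pair is, not how to find one at scale.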
► Example 4
Anomaly detection and motif discovery in the medical domain are critical problems
that require a high degree of accuracy (Lin & Li, 2010; Liu et al., 2015; Sivaraks &
Ratanamahatana, 2015). An electrocardiogram (ECG) is a time series that
consists of the electrical impulses from the heart. Anomaly detection in an ECG can help
to detect abnormal heartbeats before the diagnosis, and motif discovery can help to
locate highly similar and rapid beats in the ECG. For example, Wankhedkar and Jain
(2021) proposed a method to detect anomalies present in ECG data. The beats that are similar
to each other are the motifs, whereas the beats that have high values and are dissimilar are the
anomalies. ◄
discovery and classical machine learning algorithms like support vector machine, classification,
and Naïve Bayes. Peaks in the accelerometer data (motifs) were detected
from the left and right shoes of soccer players, and the peaks were classified as
shot, pass, or other. The detection of atomic pass events, defined as the moment in
which the ball leaves the foot of the player, was also conducted (Sanford et al., 2020).
The authors proposed machine learning algorithms that are capable of detecting passes,
along with other group activities, from either the video or the position data. ◄
► Example 6
Yeh et al. (2017), for example, used an algorithm to discover the correct (multidimensional)
motif location of a boxer’s punch (as repeated behavior). The algorithm matches a simple
cross with the cross in a one-two combo across three dimensions: right upper arm,
right forearm, and left upper leg. Two behaviors of the boxer’s dominant hand are almost
identical but are in different positions within different occurrences of the motif. ◄
Study Box
Biermann et al. (2023) used an algorithm to automatically detect events such as passes
in the positional data of soccer matches using time series data mining techniques. To
detect pass events, the authors proposed a motif detection method based on the player-ball
distance and the ball acceleration, both obtained as time series from the position data.
Motifs occur when the time series show a specific form or change at the same
time (see Fig. 17.1). The authors therefore suggested a method that used both the time
series from the position data and the event data. Initially, they compared passes from the
event data with the position data. To this end, they instructed an expert to label passes in
the position data. They found that passes in the event data tend to be slightly
delayed in comparison to the position data. To compensate for this delay, they first
performed a time series motif detection to construct a model that detects passes
in the position data. To detect events in the time series data, the
authors used an appropriate feature space representation of the time series.
Subsequently, they took the (delayed) passes from the event data as a template and
refined them within a certain window given the pass detection model. They report that this
algorithm substantially improves the synchronization between position and event data.
Fig. 17.1 An exemplary sequence of player-ball distance (cyan curve) and ball acceleration
(orange curve). Here, k denotes the time point at which a pass (an event) occurs
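The refinement step can be illustrated with a deliberately simplified sketch. It is not the algorithm of Biermann et al. (2023): it assumes that some pass detection model has already produced a per-frame score (the hypothetical list pass_score), and it simply moves each delayed event to the best-scoring frame within a backward-looking window. The window size and all values are invented for illustration.

```python
def refine_event_frames(event_frames, pass_score, window=10):
    """Shift each (delayed) event frame to the frame with the highest
    detection score among the `window` frames preceding it (inclusive)."""
    last = len(pass_score) - 1
    refined = []
    for frame in event_frames:
        stop = min(frame, last)          # clamp to the available frames
        start = max(0, stop - window)    # look backwards only: events are delayed
        candidates = range(start, stop + 1)
        refined.append(max(candidates, key=lambda f: pass_score[f]))
    return refined

# Hypothetical per-frame scores with two clear pass candidates.
pass_score = [0.0] * 30
pass_score[5] = 0.9
pass_score[18] = 0.8

# Event data reports the passes a few frames too late.
refined = refine_event_frames([9, 22], pass_score, window=10)
print(refined)
```

Searching only backwards encodes the observation that event timestamps tend to lag behind the position data; a symmetric window would be the natural choice if the direction of the offset were unknown.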
References
Biermann, H., Komitova, R., Raabe, D., Müller-Budack, E., Ewerth, R., & Memmert, D. (2023).
Synchronization of passes in event and spatiotemporal soccer data. Scientific Reports, 13, 15878.
Bonidia, R., Rodriges, L., Avila-Santos, A. P., Sanches, D., & Brancher, J. (2018). Computational
intelligence in sports: A systematic literature review. Advances in Human-Computer Interaction,
2018, 1–13.
Box, G., Jenkins, G., & Reinsel, G. (2016). Time series analysis: Forecasting and control (5th ed.).
Wiley.
Braei, M., & Wagner, S. (2020). Anomaly detection in univariate time-series: A survey on the
state-of-the-art.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing
Surveys, 41(3), 1–58.
Chiu, B., Keogh, E., & Lonardi, S. (2003). Probabilistic discovery of time series motifs. In Proceedings
of the 9th international conference on knowledge discovery and data mining (KDD) (pp. 493–498).
Esling, P., & Agon, C. (2012). Time series data mining. ACM Computing Surveys (CSUR), 45(1),
1–34.
Fu, T.-C. (2011). A review on time series data mining. Engineering Applications of Artificial
Intelligence, 24(1), 164–181.
Komitova, R., Raabe, D., Rein, R., & Memmert, D. (2023). Time series data mining for sport data: A
review. International Journal of Computer Science in Sport, 21(2), 17–31.
Liao, T. (2005). Clustering of time series data—A survey. Pattern Recognition, 38(11), 1857–1874.
Lin, J., Keogh, E., Lonardi, S., & Patel, S. (2002). Finding motifs in time series. In Proceedings of the
eighth ACM SIGKDD international conference on knowledge discovery and data mining, 2nd workshop
on temporal data mining (pp. 53–68).
Lin, J., & Li, Y. (2010). Finding approximate frequent patterns in streaming medical data. In IEEE
23rd international symposium on computer-based medical systems (CBMS), IEEE (pp. 13–18).
Liu, B., Li, J., Chen, C., Tan, W., Chen, Q., & Zhou, M. (2015). Efficient motif discovery for large-
scale time series in healthcare. IEEE Transactions on Industrial Informatics, 11(3), 583–590.
Minnen, D., Starner, T., Essa, I., & Isbell, C. (2006). Discovering characteristic actions from on-body
sensor data. In Wearable computers, 2006 10th IEEE international symposium on wearable com-
puters, IEEE (pp. 11–18).
Mitsa, T. (2010). Temporal data mining. Chapman and Hall/CRC.
Mueen, A. (2014). Time series motif discovery: Dimensions and applications. Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery, 4(2), 152–159.
Ofoghi, B., Zeleznikow, J., MacMahon, C., & Raab, M. (2013). Data mining in elite sports: A review
and a framework. Measurement in Physical Education and Exercise Science, 17(3), 171–186.
Sanford, R., Gorji, S., Hafemann, L. G., Pourbabaee, B., & Javan, M. (2020). Group activity detection
from trajectory and video data in soccer. Proceedings of the IEEE/CVF conference on computer
vision, graphics and image processing (pp. 1–7).
Schuldhaus, D., Zwick, C., Körger, H., Dorschky, E., Kirk, R., & Eskofier, B. M. (2015). Inertial
sensor-based approach for shot/pass classification during a soccer match. In KDD workshop on
large-scale sports analytics (pp. 1–4).
Sivaraks, H., & Ratanamahatana, C. (2015). Robust and accurate anomaly detection in ECG arti-
facts using time series motif discovery. Computational and Mathematical Methods in Medicine,
2015, 1–20.
Stein, M., Jenezko, D., Seebacher, D., Jäger, A., Negel, J., Hölsch, M., Kosub, S., Schreck, T., Kleim,
D., & Grossniklaus, M. (2017). How to make sense of team sport data: From acquisition to data
modeling and research aspects. Data, 2(1), 2.
Tanaka, Y., Iwamoto, K., & Uehara, K. (2005). Discovery of time series motif from multidimen-
sional data based on MDL principle. Machine Learning, 58(2), 269–300.
Torkamani, S., & Lohweg, V. (2017). Survey on time series motif discovery. Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery, 7(2), e1199.
Wankhedkar, R., & Jain, S. K. (2021). Motif discovery and anomaly detection in an ECG using matrix
profile. Progress in advanced computing and intelligent engineering (pp. 88–95). Springer.
Yeh, C., Kavantzas, N., & Keogh, E. (2017). Matrix profile VI: Meaningful multidimensional motif
discovery. In IEEE international conference on data mining (ICDM), IEEE (pp. 565–574).
Zolhavarieh, S., Aghabozorgi, S., & Teh, Y. (2014). A review of subsequence time series clustering.
The Scientific World Journal, 2014, 312521.
Process Mining
Marc Garnica Caparrós
Contents
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_18
Key Messages
• Process Mining is a discipline that derives from Data Mining and focuses on the
analysis of end-to-end processes usually represented by execution logs.
• Execution logs are a time-ordered collection of events occurring in any system,
for instance, the procedures a patient needs to undergo in a hospital.
• Process Discovery is the set of algorithms that conceptually define and visualize
a process from collected data (observed behaviour).
• Conformance checking measures the alignment between the process design
(theoretical behaviour) and the actual execution observed in the data.
• Process enhancement studies the process and its conceptual definition with the
aim of improving the overall system.
• Process Mining is highly applicable to any field and opens several opportunities for
sports science, for instance, using event and positional data from invasion sports.
Basketball is a team sport, although the media tries to individualise certain actions
or players. Basketball teams train weekly to practice their offensive and defensive
tactics. Often, these tactics vary depending on the opponent. Analytics based
on positional data (the trajectories of the players and the ball on the court) and
event data (the actions performed by the players) give tactics coaches an overview
of how a team is behaving on the court. In attack, teams usually have predefined
offensive plays; thus, players know how to start the attack, move the ball and look
for scoring opportunities. These predefined attack systems can be considered a
theoretical process with a start, a set of activities and resources and several end
options. These processes are also present in the collected data: on-ball actions are
present in the event data, while off-ball actions are included in the positional data.
Process Mining provides the toolset to discover these processes in the collected
event and positional data, compare the observed team behaviour to the theoretical
design done by the coaches and finally try to improve these processes, for instance,
by identifying ignored scoring opportunities in the collected data. The implementation
of Process Mining could be beneficial for sports performance and support
decision-making in sports tactics.
18.2 Background
The field of Process Mining (PM) is derived from data mining, a research discipline
involving mathematics, statistics and computer science for knowledge extraction
from datasets. Often, these datasets deal with big data challenges such as large
volume, variety in their syntax and semantics or high speed of generation. In the
1990s, numerous IT systems were being implemented in all
areas of modern industry, revolutionizing every business. The evolution of
business digitalization brought together software systems such as Enterprise
Resource Planning (ERP) and Business Process Management (BPM) tools and
methodologies. In these early days, business process definition was part of the
design phase of any enterprise: experts would identify the requirements and design
the most efficient process to conduct a certain task involving numerous resources,
activities and agents (Tiwari et al., 2008).
Along with data mining, several other approaches started surfacing as a way to
deal with such complex implementations. Data warehousing, database management
systems and advances in hardware allowed the massive collection of all the core
aspects of any business process, annotating every step and every resource used during
the process in the so-called execution log. Motivated by this massive collection of
business processes, van der Aalst et al. (2004) presented the idea of
creating data-driven process models or workflows that can be observed, analysed and
improved directly from the collected data. Therefore, Process Mining makes use of
the execution logs of any system to reconstruct the actual business processes.
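As a rough illustration of how a process can be reconstructed from an execution log, the sketch below groups events by case, orders them by time and counts which activity directly follows which. Such a directly-follows graph is a common starting point for discovery techniques, but the code is not the workflow mining algorithm of van der Aalst et al. (2004); the log format (case identifier, timestamp, activity) and the hospital activities are assumptions made for this example.

```python
from collections import Counter, defaultdict

def directly_follows(event_log):
    """Count how often activity b directly follows activity a within a case.

    event_log: iterable of (case_id, timestamp, activity) tuples.
    """
    traces = defaultdict(list)
    # Order events by case and time so that each trace is time-ordered.
    for case_id, _timestamp, activity in sorted(event_log, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)

    dfg = Counter()
    for activities in traces.values():
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg

# Hypothetical execution log of two patients, recorded in interleaved order.
log = [
    ("p1", 1, "registration"), ("p2", 2, "registration"),
    ("p1", 3, "triage"),       ("p2", 4, "triage"),
    ("p1", 5, "treatment"),    ("p2", 6, "discharge"),
    ("p1", 7, "discharge"),
]
dfg = directly_follows(log)
for (a, b), count in sorted(dfg.items()):
    print(f"{a} -> {b}: {count}")
```

The resulting edge counts can be drawn as a graph, which gives a first data-driven picture of the process without any prior model.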
Although these execution logs, often called event logs, are omnipresent in so
many industries, organizations lack a good definition and understanding of their
actual processes. Thus, PM untangles the differences between event logs (observed
behaviour) and process models (either theoretical models on how systems should
work, or data-driven discovered models of system behaviour). The applications of
such analysis include but are not limited to analysing the treatment of patients in
hospitals, improving user experience on e-commerce websites, analysing baggage
management in an airport system or controlling the automated production of a car
manufacturer. In fact, while most work on PM still belongs to academic
research, some businesses have started applying this analytical technique to fully
understand and control their processes.
The Process Mining Manifesto (van der Aalst et al., 2012) was not presented to the
research community until 2011. In this document, PM is defined as a technique
to extract knowledge from event logs generated by information systems with the end
goal of discovering, monitoring and improving processes in various business domains. The
manifesto also states the three main PM types: discovery, conformance and enhancement.
Process discovery is the most research-prolific type. Discovery techniques cover
the generation of processes and visual representations directly from event logs without
additional information (van der Aalst, 2016), which allows for an automated definition of
a business process from the collected data. Conformance checking techniques analyse
the deviations between the data-driven, automatically discovered models and the
theoretical process (how a process should work) (Rozinat & van der Aalst, 2008; Bergami
et al., 2021). Finally, process enhancement methodologies aim to improve the existing
processes using the information from the event logs by modifying the activities or
reorganizing the resources involved (de Leoni, 2022).
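A very reduced form of conformance checking can be sketched as follows. The theoretical process is assumed to be given as a start activity, a set of end activities and a set of allowed transitions, and the fitness is the share of observed traces that respect it. This is a simplification for illustration only; established techniques such as token replay or alignments also quantify how strongly a trace deviates. The basketball activities are hypothetical.

```python
def trace_conforms(trace, allowed, start, ends):
    """Check whether a trace starts and ends as designed and only uses
    transitions that the theoretical process allows."""
    if not trace or trace[0] != start or trace[-1] not in ends:
        return False
    return all(step in allowed for step in zip(trace, trace[1:]))

def fitness(traces, allowed, start, ends):
    """Fraction of observed traces that conform to the theoretical process."""
    conforming = sum(trace_conforms(t, allowed, start, ends) for t in traces)
    return conforming / len(traces)

# Hypothetical offensive play as designed by the coaching staff.
allowed = {
    ("bring_up", "screen"),
    ("screen", "pass"),
    ("screen", "shot"),
    ("pass", "shot"),
}

# Hypothetical observed attacks extracted from event data.
observed = [
    ["bring_up", "screen", "pass", "shot"],          # as designed
    ["bring_up", "screen", "shot"],                  # as designed
    ["bring_up", "pass", "turnover"],                # skips the screen, ends badly
    ["bring_up", "screen", "pass", "pass", "shot"],  # extra pass not in the design
]
score = fitness(observed, allowed, start="bring_up", ends={"shot"})
print(score)
```

Traces that fail the check are often the most interesting output, since they point to where the observed behaviour departs from the design.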
Event logs need not always be the consequence of an IT infrastructure in place.
In some cases, event logs are generated from sensor
data observing a natural system (e.g., weather forecast stations) and the process is
actually a conceptual definition trying to describe the observed behaviour. In these
cases, the end goal of PM can still be pursued. In sports, several data collection techniques
are being applied at all kinds of levels. Professional sports teams and federations
monitor their athletes’ heart rate, distance covered or average fatigue levels
thanks to sensor devices. In sports included under the umbrella of invasion sports
(Hughes & Bartlett, 2002), games are also monitored, generating two main data
sources: event data and positional data. Event data was originally used to generate
game statistics, counting the number of actions of a certain player, reporting the
accuracy of passes, etc. However, event data is by nature the execution log of a team
in the game: the time-ordered sequence of all actions occurring in the game.
Similarly, positional data collects the position of all players and the ball at a high
frequency throughout the game. While this data is by definition a spatiotemporal
series of positions, discretization methods could produce a sequence of positional phases,
trajectories or movement patterns that could fit under the definition of a process. In
both cases, PM could consume these data sources to methodically define the
process underlying a team’s performance in a game. Research in this field could
elaborate on how to use PM techniques to define team tactics or player functions.
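One possible discretization, given here only as a sketch under stated assumptions, maps each ball position to a zone of the pitch and keeps a new entry only when the zone changes. The pitch length of 105 m, the division into thirds and the sample coordinates are all hypothetical choices; the resulting zone sequence is the kind of discrete trace a process mining algorithm could consume.

```python
def zone_sequence(positions, pitch_length=105.0,
                  zones=("defensive", "middle", "attacking")):
    """Turn (x, y) positions into a sequence of pitch zones along the x-axis,
    collapsing consecutive samples that fall into the same zone."""
    sequence = []
    for x, _y in positions:
        index = int(x / pitch_length * len(zones))
        index = max(0, min(index, len(zones) - 1))  # clamp samples on or off the boundary
        zone = zones[index]
        if not sequence or sequence[-1] != zone:
            sequence.append(zone)
    return sequence

# Hypothetical ball positions (in metres) sampled during one attack.
ball_positions = [(10, 30), (20, 28), (40, 35), (50, 40), (80, 20), (100, 34), (60, 30)]
phases = zone_sequence(ball_positions)
print(phases)
```

Finer grids or tactically motivated zones would work in the same way; the granularity determines how detailed the discovered process can be.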
Definition
Process mining (PM) is a relatively young research discipline at the intersection of
data mining and business management that conducts discovery, conformance and
enhancement of business processes. Process-aware analysis manages operational
studies assuming that the data is based on a dynamic behaviour process. Roles and
actors interact towards a certain goal or function in time and execute certain patterns,
orders, and workflows. Some data mining techniques, such as sequence or
episode mining, can model data as a sequence but do not consider end-to-end processes.
PM identifies process models and provides a visual representation of structured
or unstructured event logs (Diamantini et al., 2016). Process discovery involves
the identification of process models and visual representations of structured or
unstructured process data originating from any environment. Conformance checking
provides a measure of alignment between the strategy of a process (the theoretical
behaviour) and the actual execution. Finally, process enhancement extends or
improves the actual process models using knowledge extracted from the analysis.
18.3 Application
PM is applicable to a wide range of domains, has proven insightful in many of them,
and has become a core aspect of many system implementations.
► Example 2
Another great application of PM in healthcare was showcased in a hospital behaviour
analysis research work (Arnolds & Gartner, 2017). In this study, the clinical pathways,
the set of procedures that a patient undergoes in the hospital, were the centre of the
process-aware analysis. The prediction of certain clinical pathways such as diagnostics,
surgery or therapy and the transitions between them is of great importance for hospital
layout planning. Through the analysis, the authors achieved a reduction in the distances
travelled by the patients. ◄
► Example 3
Since 2020, online communication and collaboration have become an essential part of
any team or organization. Indeed, even educational systems are moving towards a more
flexible service model combining online resources with face-to-face classes. In a study
from 2016 (Alvarez et al., 2016), PM was used to measure the alignment between teacher
goals and student activities in an e-learning environment. The authors made use of the
logs generated by the students when doing their activities and the teacher’s course plan
(observed vs. theoretical behaviour). ◄
► Example 4
PM has already been tested in sports, in this case using soccer event data (Kröckel
& Bodendorf, 2020). In this exploratory study, the opportunities of PM were analysed as
a tool for the tactical analysis of soccer games. PM shows promising potential to evaluate
team and player performance in soccer and provide a standard definition of team
tactics by detecting the typical behaviour collected in the event data. PM also allows
investigating the organizational perspective of a team’s style of playing, thus extracting
metrics on how players are interacting with each other towards a common goal. ◄
Study Box
Invasion sports are sports disciplines sharing core concepts in their way of playing
and in how the confrontation between the two teams is executed. In sports like soccer,
basketball, handball or rugby, teams share the purpose of invading the opponent’s territory
and scoring points while keeping the opponent’s score as low as possible in a
defined time period. Despite the similarities, research on invasion sports is mainly
presented in one-dimensional studies that analyse a single sport. The reason for this
separation of the studies might be related to data availability or an easier formulation
of the study requirements. In this scenario, Process Mining arises as a candidate
to offer a sport-independent analysis where process models could be
generalized to any sport, for instance, by abstracting the common characteristics
between a soccer goal and a rugby try. Understanding general concepts of invasion
sports in terms of location-based control of the field or the collaboration between
players towards common tasks could open up new tactics in the respective sports
and substantially contribute to sports science. In fact, recent studies have noted the
positive effect of multi-sport practices on young athletes for their potential long-term
development and specialization (Barth & Güllich, 2020).
References
Alvarez, P., Fabra, J., Hernandez, S., & Ezpeleta, J. (2016, September). Alignment of teacher’s plan
and students’ use of LMS resources. Analysis of Moodle logs. In 2016 15th international conference
on information technology based higher education and training (ITHET). IEEE.
https://doi.org/10.1109/ithet.2016.7760720
Arnolds, I. V., & Gartner, D. (2017). Improving hospital layout planning through clinical pathway
mining. Annals of Operations Research, 263, 453–477. https://doi.org/10.1007/s10479-017-2485-4
Barth, M., & Güllich, A. (2020). Non-linear association of efficiency of practice of adult elite athletes
with their youth multi-sport practice. Journal of Sports Sciences, 39, 915–925.
https://doi.org/10.1080/02640414.2020.1851900
Bergami, G., Maggi, F. M., Marrella, A., & Montali, M. (2021). Aligning data-aware declarative
process models and event logs. In Lecture notes in computer science (pp. 235–251). Springer
International Publishing. https://doi.org/10.1007/978-3-030-85469-0_16
de Leoni, M. (2022). Foundations of process enhancement. In Lecture notes in business information
processing (pp. 243–273). Springer International Publishing.
https://doi.org/10.1007/978-3-031-08848-3_8
Diamantini, C., Genga, L., & Potena, D. (2016). Behavioral process mining for unstructured processes.
Journal of Intelligent Information Systems, 47, 5–32. https://doi.org/10.1007/s10844-016-0394-7
Hughes, M. D., & Bartlett, R. M. (2002). The use of performance indicators in performance analysis.
Journal of Sports Sciences, 20, 739–754. https://doi.org/10.1080/026404102320675602
Kröckel, P., & Bodendorf, F. (2020). Process mining of football event data: A novel approach for
tactical insights into the game. Frontiers in Artificial Intelligence, 3, 47.
https://doi.org/10.3389/frai.2020.00047
Rojas, E., Cifuentes, A., Burattin, A., Munoz-Gama, J., Sepúlveda, M., & Capurro, D. (2019).
Performance analysis of emergency room episodes through process mining. International Journal
of Environmental Research and Public Health, 16, 1274. https://doi.org/10.3390/ijerph16071274
Rozinat, A., & van der Aalst, W. M. (2008). Conformance checking of processes based on monitoring
real behavior. Information Systems, 33, 64–95. https://doi.org/10.1016/j.is.2007.07.001
Tiwari, A., Turner, C. J., & Majeed, B. (2008). A review of business process mining: State-of-the-art
and future trends. Business Process Management Journal, 14, 5–22.
https://doi.org/10.1108/14637150810849373
van der Aalst, W. (2016). Process mining. Springer Berlin Heidelberg.
https://doi.org/10.1007/978-3-662-49851-4
van der Aalst, W., Weijters, T., & Maruster, L. (2004). Workflow mining: Discovering process models
from event logs. IEEE Transactions on Knowledge and Data Engineering, 16, 1128–1142.
https://doi.org/10.1109/tkde.2004.47
van der Aalst, W. M. P., Adriansyah, A., Alves De Medeiros, A. K., Arcieri, F., Baier, T., Blickle, T.,
Bose, J. C., van den Brand, P. C. W., Brandtjen, R., Buijs, J. C. A. M., Burattin, A., Carmona, J.,
Castellanos, M., Claes, J., Cook, J., Costantini, N., Curbera, F., Damiani, E., de Leoni, M., et al.
(2012). Process mining manifesto. In Business process management workshops (pp. 169–194).
Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-28108-2_19
Networks Centrality
João Paulo Ramos, Rui Jorge Lopes, Duarte Araújo,
and Pedro Passos
Contents
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_19
Key Messages
• Performance-relevant parameters from training and competition can be captured
by network science.
• Network Analysis (NA) helps to understand performance outcomes.
• There is a distinction between dynamics of the networks vs. dynamics on the
networks (structure vs. processes).
• There is a distinction between match-level and play-level metrics.
The dynamics of players’ interactive behavior implies that the relevant moments of
a match evolve over time. The interactions among teammates and
opponents form a complex process that goes beyond passes, positioning and distances
between players. To capture such complexity, the hypernetworks approach is a
promising tool. Previous work on performance analysis (PA) in team ball sports
has focused on centrality. In the last two decades, dynamical systems theory (Davids
et al., 2003, 2005; Araujo et al., 2004; Reilly et al., 2005) and, later, ecological
dynamics have been used as a background to describe and explain interactive
behaviors in team sports (Vilar et al., 2012). Consequently, questions about cooperative
interactions between players from the same team emerged, for instance:
who is the player that interacts the most with their teammates? Additionally, the
question of centrality within cooperative and competitive interactions gains
relevance, leading to more specific research questions such as: Considering players’
relative positioning and distance, what are the sets of players (1 vs. 1; 2 vs. 1; 1 vs.
2; 2 vs. 2; etc.) that are more frequent (central) in the matches? Where do they
occur? Who are the players involved? What are the processes used by the players
and teams to promote local dominance (e.g. 2 vs. 1) in the different areas of the
pitch or gain advantage in goal scoring opportunities?
19.2 Background
The complex systems approach to team sports performance via network science considers
and describes teams as having their parts (groups and players) interconnected
(Bar-Yam, 1997). Therefore, the level of system complexity can be raised
through structure, by increasing the number of parts, and through functionality, by
increasing their interactions (Bar-Yam, 2004). In team sports, the number of players
(parts) in a match is regulated, which highlights the complexity of the amount
and variability of interactions between groups and players (Bar-Yam, 2003).
With the aim of studying interactions, an increasing number of studies (Duch
et al., 2010; Dutt-Mazumder et al., 2011; Passos et al., 2011; Duarte et al., 2012;
Grund, 2012; Clemente et al., 2014, 2015a, 2015b, 2015c; Ramos et al., 2017a, 2017b)
have used Social Network Analysis (SNA) as a suitable method to address the interdependencies
in team sports. The approach consisted of modeling the cooperative interaction
between dyads of players from the same team or their actions with the ball. The
communicative power of SNA visualizations, which represent the passing data between
dyads aggregated across a football match (Ramos et al., 2018), encouraged the
use of these metrics in performance analysis, modeling intra-team coordination
as the frequent passing interaction between players in team sports (Duch et al.,
2010; Passos et al., 2011; Brandes et al., 2012; Grund, 2012).
The analysis of these passing networks reveals some collective properties (e.g.,
patterns) of team performance as well as the underlying individual contribution to
those properties (Ramos, 2019). However, for practitioners it is still unclear in what
way social network analysis contributes to an effective improvement of individual and
team performance. Some uncertainty remains concerning how individual differences
and roles (influential players) are combined to enhance team performance
outputs (Duch et al., 2010). A chief principle present in every network analysis of
a system or process assumes that there is a structure, i.e., a network, in which these
interactions occur (Brandes et al., 2013).
We now turn to the structural properties of networks, i.e., to the properties that
are common to all networks. Given a particular system of study, there are proper-
ties that are expressed via network science. First and foremost, the system is com-
posed of interacting elements, i.e., a network is a “collection of vertices joined by
edges” (Newman, 2010) as represented in Fig. 19.1. In the many different
approaches to study team ball sports, players and passes are represented respec-
tively by the vertices (actors) and edges (interactions) of the network. In the sim-
plest network models (simple networks or simple graphs) there is at most one edge
between any pair of vertices, whereas more sophisticated models (multigraphs)
may have multiple edges between pairs of nodes (Ramos, 2019). The most basic
global structural properties of a network are order and size, i.e., its numbers
of vertices and edges, respectively. The global intensity level of interactions on the network
can be gauged via the density property, computed as the ratio of the network
size (number of passes or other actions with the ball, i.e., edges) to the maximum
number of edges that could possibly exist within the network (Guillaume &
Latapy, 2006). High density in the network tends to be associated with better team
performance (Grund, 2012; Ramos, 2019).
Fig. 19.1 Germany vs. Ghana, FIFA World Cup 2014. Each circle represents a player in his relative
position; the radius and colour of each circle represent the number of players that a player
interacts with and his pass precision (red: more precision; yellow: less precision), respectively; the
arrows represent the direction of the passes between players; the width and shade of each arrow
represent the number of interactions (passes) between players (lighter arrows indicate fewer passes,
darker arrows indicate more passes). The numbers in brackets represent the minutes played by each
player (e.g., −90′, played 90 min) or the moment in the match when a player started to play
(e.g., +78′, entered the match at minute 78); “golo 54′” means that the player scored a goal at minute 54
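The density property can be made concrete with a short sketch. It assumes a simple directed passing network, in which an edge from A to B exists once A has passed to B at least once, so that the maximum number of edges among n players is n(n - 1). The player labels and passes are hypothetical, and other conventions (for example, weighting edges by the number of passes) would give different values.

```python
def density(passes, players):
    """Density of a simple directed network: distinct edges divided by the
    maximum number of directed edges among the given players."""
    edges = {(a, b) for a, b in passes if a != b}  # repeated passes count once
    n = len(players)
    if n < 2:
        return 0.0
    return len(edges) / (n * (n - 1))

# Hypothetical passes (passer, receiver) among four players.
players = ["A", "B", "C", "D"]
passes = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
pass_density = density(passes, players)
print(round(pass_density, 3))
```

Here four of the twelve possible directed connections are used, so a third of the potential passing links were realized.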
These metrics assess the dynamics on the network, focusing on flows
across the network structure, e.g., ball passes between players. In this way, an entire
match is represented by the aggregate of all the passes that occurred, as static flip
books (cumulative snapshots of the network as a function of time) of each player's
actions, where each player's position remains constant but interactions accumulate over
time (Moody et al., 2005; Ramos et al., 2018). The structural properties studied in the
literature were mainly network centrality and the density of interactions between
team members, in relation to performance outcomes (Katz et al., 2004; Balkundi &
Kilduff, 2006; Grund, 2012). By counting the number of edges (e.g., passes) connected
to a vertex (e.g., player), the degree of a vertex is obtained, which is also
called degree centrality (Newman, 2010). Studies on team ball sports have
mostly looked at the "ball flux" (e.g., ball passes), which forms directed networks
from one player to another. In this sort of network, the vertices (players) have two
degrees: the in-degree (e.g., the number of incoming passes to that player, or
interceptions made) and the out-degree (e.g., the number of outgoing passes or
interactions conceded) (Newman, 2010). If a team has few players with high centrality
values, those players are responsible for most of the passes performed
and the team tends to be highly dependent on them, with more predictable behavior,
which is associated with less efficient team performance (Grund, 2012; Ramos, 2019).
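The in-degree and out-degree described above can be sketched in a few lines of Python. This is only an illustration: the pass log and the player labels are hypothetical, and every pass is counted as one edge.

```python
from collections import Counter

# Hypothetical pass log: (passer, receiver) pairs; labels are illustrative only.
passes = [
    ("GK", "CB"), ("CB", "DM"), ("DM", "CM"), ("CM", "DM"),
    ("DM", "FW"), ("CB", "DM"), ("CM", "FW"),
]

def degree_centrality(pass_log):
    """Return (in_degree, out_degree) counters for a directed pass network."""
    out_degree = Counter(passer for passer, _ in pass_log)
    in_degree = Counter(receiver for _, receiver in pass_log)
    return in_degree, out_degree

in_deg, out_deg = degree_centrality(passes)
for player in sorted(set(in_deg) | set(out_deg)):
    print(player, "in-degree:", in_deg[player], "out-degree:", out_deg[player])
```

In this toy log, the defensive midfielder receives the most passes, which is the kind of dependency on a single player that the paragraph above describes.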
By counting the number of edges and the number of vertices, the intensity of
the network is computed. Moreover, the density of the network is calculated as the
ratio of the network size (the number of passes or other actions with the ball, i.e.,
edges) to the number of edges that could possibly exist within the network
(Guillaume & Latapy, 2006). High density in the network tends to be associated
with better team performance (Grund, 2012; Ramos, 2019; Pina et al., 2017). These
metrics assess the dynamics on the network, focusing on flows across the
network structure, e.g., ball passes between players. Centrality studies could help
practitioners improve team sports performance by answering questions like:
“who is the most interactive player?” or “which players have an intermediary role?”
or “how central is a player?” or “how does each player contribute to the
performance of the others?” (Ramos et al., 2018).
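A minimal sketch of the density calculation follows. It assumes a directed network without self-loops, so that n players allow at most n(n − 1) links, and it counts each distinct passer-receiver pair once; both choices are assumptions for illustration, and the pass log is hypothetical.

```python
# Hypothetical pass log: (passer, receiver) pairs.
passes = [
    ("GK", "CB"), ("CB", "DM"), ("DM", "CM"), ("CM", "DM"),
    ("DM", "FW"), ("CB", "DM"), ("CM", "FW"),
]

def network_density(pass_log, n_players):
    """Density of a directed network: distinct links / possible links."""
    links = {(a, b) for a, b in pass_log if a != b}   # repeated passes count once
    max_links = n_players * (n_players - 1)           # assumption: no self-loops
    return len(links) / max_links

density = network_density(passes, n_players=5)
print(f"density = {density:.2f}")
```

If repeated passes should contribute to the network size, the set of links would be replaced by the raw pass count, which changes the interpretation of the ratio.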
To tackle the question about the most interactive player, the focus is on the local
structural analysis of the interactions. That is, the interactions/passes between
the player and the adjacent players are given by the in-degree or the out-degree. To
tackle the other questions, we have to turn to the global analysis of network
structure, because it considers not only the adjacent vertices but also the second, third,
and/or further steps that occurred in those sequences of passes. One limitation of using
these metrics, which are imported from the network analysis of other complex systems
such as Internet (social) networks, is that some of them are based on the concept of
the shortest path or geodesic path. This concept refers to a path between two vertices
for which no shorter path exists in the network; its length is the geodesic distance,
or shortest network distance, between those two vertices (Newman, 2010). When
analyzing soccer matches, we have the aggregate of all passes/actions for the entire
match, or the aggregation during relevant time spans or between significant events
(e.g., goals or goal-scoring opportunities). In these cases, we need to consider a more
global network analysis, which includes all the connections that occurred during the
course of a match or time span. This aggregating feature of team ball sports does not
follow the shortest path concept (typical of Internet social interactions) but instead
the concept of random walks (a walk that takes random steps across the network, e.g.,
one player can be part of a sequence of passes more than once). This concept better
represents the events during a team sport match (Ramos et al., 2017a, 2017b),
such as ball passing (Newman, 2010). Betweenness centrality expresses the degree
to which one vertex lies on the shortest paths between two other vertices, and
closeness centrality considers the length of the shortest paths between the focal vertex
and all the other vertices. These metrics can help answer questions about the
intermediary and central players, respectively, but they assume that interactions must
happen through the shortest paths (Ramos, 2019).
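To make the shortest-path assumption concrete, the following sketch computes closeness centrality with a breadth-first search. The small network and its labels are hypothetical, and closeness is taken here as the number of reachable vertices divided by the sum of their geodesic distances, which is one common convention among several.

```python
from collections import deque

# Hypothetical undirected interaction network (a simple chain A-B-C-D).
network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}

def geodesic_distances(adj, source):
    """Breadth-first search: shortest path length from source to each vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def closeness(adj, source):
    """Reachable vertices divided by the sum of their geodesic distances."""
    dist = geodesic_distances(adj, source)
    total = sum(d for node, d in dist.items() if node != source)
    return (len(dist) - 1) / total if total > 0 else 0.0

for player in network:
    print(player, round(closeness(network, player), 3))
```

The measure only looks at the shortest routes between players, which is exactly the limitation discussed above for aggregated passing data.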
Definition
Network Analysis (NA) starts with a theory about the system (Brandes); the existence
of an underlying network in the system is a chief assumption of this theory
(e.g., a social theory in the case of Social Network Analysis). NA is the computation
of network or node properties (such as network order or node degrees), which
is only one part of the broader process of network science.
19.3 Applications
► Example 1
Whereas most of the studies with SNA in sports focused on the attacking patterns,
Sasaki and colleagues (Sasaki et al., 2017) aimed to clarify the networks created within
the defensive patterns that play a decisive role during a rugby match. The rationale was
that the cooperation that links structural entities (e.g., defenders) is always dynamic.
Thus, a different collaborative format indicates different characteristics of an entity.
Again, it was expected that the more interactive structure (higher complexity) is less
predictable and more adaptable. ◄
► Example 2
A few studies have investigated cooperation through direct physical contact, e.g.,
tackling in rugby (Koh et al., 2013). The authors calculated the absolute frequency of
tackling per position during a rugby match, which contributed to defensive turnover
performance and consequently neutralized the offensive activity. The purpose of this
study was to explain the centrality of a defensive squad in rugby. Whenever a
multiplayer defensive act occurred, a single network was created. The vertices represented the
positions of the players, and the edges represented cooperation between teammates in
the course of a match (double tackling), i.e., when a tackle by two players led to a turnover
in play. The edges were weighted by the number of repeated cooperative actions
aggregated over the entire match. ◄
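The construction of such a weighted network can be sketched as follows. The event list and the position names are hypothetical and are not taken from Koh et al. (2013); the sketch only shows how repeated double tackles accumulate into edge weights.

```python
from collections import Counter

# Hypothetical double-tackle events that led to a turnover: pairs of positions.
double_tackles = [
    ("flanker", "lock"), ("lock", "flanker"), ("centre", "fly-half"),
    ("flanker", "number 8"), ("flanker", "lock"),
]

def build_weighted_network(events):
    """Undirected edges weighted by the number of repeated cooperations."""
    weights = Counter()
    for a, b in events:
        weights[tuple(sorted((a, b)))] += 1   # (a, b) and (b, a) are the same edge
    return weights

def weighted_degree(weights):
    """Sum of the edge weights attached to each vertex (position)."""
    strength = Counter()
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    return strength

edges = build_weighted_network(double_tackles)
strength = weighted_degree(edges)
print(edges)
print(strength.most_common())
```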
► Example 3
Closing the gap that still exists between SNA and performance outcomes would foster the
practical impact of the approach. Current temporal approaches do not consider the
actual sequence of ball passing to detect players that are in fact connecting their team
members through passing. This implies that passing sequences should be evaluated
separately instead of examining the aggregated passing data across a match (Ramos et al.,
2018). Thus, the interplay in each ball possession needs to be analyzed separately instead
of evaluating an aggregated passing matrix at match level. This type of approach tackles
the dynamics of the network by assessing the changes in the network structure itself, such as:
(1) the relational space (i.e., interactions considered in a geographical space); (2) the time
structure (i.e., rate of change, order or sequence, or simultaneity of interactions); and (3)
the relations with different types of vertices (i.e., teammates or opponents), thus
considering both cooperative and competitive interactions (Moody et al., 2005; Ramos et al.,
2018). This includes the opportunity for a pass between the ball carrier and her/his teammates,
assessed via the possibility that such a pass may or may not be intercepted by an opponent
player. This landscape of passing affordances can be represented as edges between
players and/or spatial locations on the pitch (Passos et al., 2020). ◄
► Example 4
Complex networks like temporal and bipartite networks, such as the so-called hypernetworks
(Johnson, 2006), represent interactions and relations that occur during the course of a
team sports match. In a hypernetwork, a hyperedge can connect more than two nodes
(e.g., two players from one team and one from the other team: 2 vs. 1), directly
representing n-ary relations as sets, σ (Johnson, 2006, 2008, 2013, 2016; Criado et al., 2010;
Boccaletti et al., 2014; Ramos et al., 2017a, 2017b). ◄
.. Fig. 19.2 Hypernetwork construction based on proximity between players, forming sets of
players connected through edges (polygons) and represented at their positions on the soccer pitch
at a given instant of time (two panels, 1 and 2, showing players b1, b2, r1, r2, and r3 in pitch
coordinates)
2. Whether there are central sets of each type (e.g., 1 vs. 1; 2 vs. 1; 1 vs. 2; 2 vs. 2) that
occur in the match or in certain time spans (e.g., the 10 s before a goal-scoring
opportunity);
3. Who the central players in the sets are and whether there is a specific central set
that occurs more often than the others;
4. Whether there is a specific central region where the sets occur, represented in
histograms such as heat maps;
5. How sets disaggregate or aggregate (e.g., through velocity changes in players' movements)
into new ones and which players are central in promoting those transformations of
the sets.
6. The dynamics of these centralities can be assessed by hypernetworks at different
scales or levels of analysis, such as players individually, specific sets (groups),
and sets of sets. Some studies have used this multilevel hypernetworks approach,
considering the complex systems link between micro, meso, and macro levels (Ramos
et al., 2017a, 2017b; Ramos, 2019; Ribeiro et al., 2019).
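How proximity-based sets such as "2 vs. 1" might be extracted from positional data can be sketched as follows. This is a simplified illustration under stated assumptions: players are grouped when they are linked by a chain of distances below a fixed radius, and the positions, the team labels, and the radius of 3 m are hypothetical. The cited hypernetwork studies may use a different proximity criterion.

```python
from math import dist

# Hypothetical positions (team, (x, y) in metres) at one instant of time.
players = {
    "b1": ("blue", (-5.0, 10.0)),
    "b2": ("blue", (-3.0, 12.0)),
    "r1": ("red", (-4.0, 9.0)),
    "r2": ("red", (20.0, -15.0)),
    "b3": ("blue", (22.0, -14.0)),
}

def proximity_sets(players, radius):
    """Group players that are chained together by distances <= radius."""
    unvisited = set(players)
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group, stack = {seed}, [seed]
        while stack:
            current = stack.pop()
            near = {p for p in unvisited
                    if dist(players[current][1], players[p][1]) <= radius}
            unvisited -= near
            group |= near
            stack.extend(near)
        groups.append(group)
    return groups

def label(group, players, team_a="blue", team_b="red"):
    """Describe a set as 'a vs. b' from the point of view of team_a."""
    a = sum(1 for p in group if players[p][0] == team_a)
    b = sum(1 for p in group if players[p][0] == team_b)
    return f"{a} vs. {b}"

groups = proximity_sets(players, radius=3.0)
labels = sorted(label(g, players) for g in groups)
print(labels)
```

Counting these labels over all time frames would give frequencies of set types, and tracking which players enter or leave a set between frames would point to the players promoting its transformation.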
► Example 5
Previous research revealed statistically significant differences between playing positions in
successful and unsuccessful plays in football regarding flow centrality and flow betweenness
(Korte et al., 2019). Defenders and defensive midfielders function as bridging players in
70–75% of all plays they are involved in, whereas the shares for goalkeepers and forwards
are only 40–50%. ◄
► Example 6
The match-level metrics measure a player's share of the team's total passing, while the
play-level metric evaluates the player's prevalence in plays across a match. For instance,
flow betweenness detects how often a player is actually in between two other players during
a play and is thus in fact acting as an intermediary player. ◄
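The difference between a match-level and a play-level view can be illustrated with a simplified proxy. The sketch below is not the flow betweenness formula of Korte et al. (2019); it only counts, for hypothetical plays, how often a player both receives and passes on the ball within a play.

```python
# Hypothetical plays: ordered sequence of players on the ball in one possession.
plays = [
    ["CB", "DM", "CM", "FW"],
    ["GK", "CB", "DM"],
    ["DM", "FW"],
    ["CB", "CM", "DM", "FW"],
]

def match_level_share(plays, player):
    """Share of all passes in the match in which the player is involved."""
    total = involved = 0
    for play in plays:
        for passer, receiver in zip(play, play[1:]):
            total += 1
            if player in (passer, receiver):
                involved += 1
    return involved / total if total else 0.0

def play_level_bridging(plays, player):
    """Share of the player's plays in which he/she receives and passes on."""
    involved = [play for play in plays if player in play]
    bridging = [play for play in involved if player in play[1:-1]]
    return len(bridging) / len(involved) if involved else 0.0

print("DM match-level share:", round(match_level_share(plays, "DM"), 2))
print("DM play-level bridging:", play_level_bridging(plays, "DM"))
print("FW play-level bridging:", play_level_bridging(plays, "FW"))
```

In this toy data the forward is involved in three plays but never acts as an intermediary, a distinction that an aggregated passing matrix at match level would not show.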
Study Box
The hypernetworks approach to PA in team ball sports has been tackling centralities
in both cooperative and competitive interactions (Ramos et al., 2017a, 2017b;
Ramos, 2019; Ribeiro et al., 2019, 2020). The promising results were mainly based on
distance interactions between players and allowed some centralities to be identified:
1. The most common sets of players formed by proximity to each other were 1 vs.
1 (25%), followed by 1 vs. 2 (10.31%), 2 vs. 1 (8.8%), and 2 vs. 2 (6.81%);
2. The positioning of the sets' interactions (e.g., shown through heat maps) tends to
reflect the strategy (design) for those players' roles in the match;
3. The synchronization processes between team players and opponents that
emerged in the matches;
4. The dynamical changes in the sets were promoted by changes in the velocity of the
players' running lines.
► Example 7
The quantification of network centrality within a team or between teams provides
an assessment of each player’s mechanism of contribution. More specifically,
eigenvector centrality reflects the specific network structures of a vertex's neighboring
vertices. Sasaki and colleagues did not use betweenness centrality but rather
eigenvector centrality, which, according to the authors, strongly reflected vertex centrality
(Sasaki et al., 2017). ◄
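Eigenvector centrality can be approximated with a simple power iteration, sketched below for a small weighted, undirected network. The network and its weights are hypothetical and do not reproduce the data of Sasaki et al. (2017); the added identity shift (x ← x + Ax) is a common device to keep the iteration from oscillating and does not change the resulting ranking.

```python
# Hypothetical weighted, undirected cooperation network (symmetric weights).
network = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2},
    "D": {"B": 1},
}

def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I); returns scores normalised to unit length."""
    nodes = sorted(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A vertex scores highly when its neighbours score highly.
        new = {n: score[n] + sum(w * score[m] for m, w in adj[n].items())
               for n in nodes}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score

scores = eigenvector_centrality(network)
for node, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, round(value, 3))
```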
References
Araujo, D., Davids, K., Bennett, S., Button, C., & Chapman, G. (2004). Emergence of sport skills
under constraints. In Skill acquisition in sport (pp. 409–434). Routledge, Taylor & Francis
e-Library.
Balkundi, P., & Kilduff, M. (2006). The ties that lead: A social network approach to leadership. The
Leadership Quarterly, 17(4), 419–439.
Bar-Yam, Y. (1997). Dynamics of complex systems. Addison-Wesley.
Bar-Yam, Y. (2003). Complex systems and sports: Complex systems insights to building effective teams.
NECSI.
Bar-Yam, Y. (2004). Making things work: Solving complex problems in a complex world. Knowledge
Industry.
Boccaletti, S., Bianconi, G., Criado, R., Del Genio, C. I., Gómez-Gardeñes, J., Romance, M.,
Sendiña-Nadal, I., Wang, Z., & Zanin, M. (2014). The structure and dynamics of multilayer
networks. Physics Reports, 544(1), 1–122.
Brandes, U., Freeman, L. C., & Wagner, D. (2012). Social networks. In R. Tamassia (Ed.), Handbook
of graph drawing and visualization. CRC Press.
Brandes, U., Robins, G., McCranie, A., & Wasserman, S. (2013). What is network science? Network
Science, 1(1), 1–15. https://doi.org/10.1017/nws.2013.2
Clemente, F. M., Couceiro, M. S., Martins, F. M. L., & Mendes, R. S. (2014). Using network metrics
to investigate football team players’ connections: A pilot study. Motriz: Revista de Educação
Física, 20(3), 262–271.
Clemente, F. M., Couceiro, M. S., Martins, F. M. L., & Mendes, R. S. (2015a). Using network metrics
in soccer: A macro-analysis. Journal of Human Kinetics, 45(1), 123–134.
Clemente, F. M., Martins, F. M. L., Kalamaras, D., Wong, P. D., & Mendes, R. S. (2015b). General
network analysis of national soccer teams in FIFA World Cup 2014. International Journal of
Performance Analysis in Sport, 15(1), 80–96.
Clemente, F. M., Martins, F. M. L., Wong, P. D., Kalamaras, D., & Mendes, R. S. (2015c). Midfielder
as the prominent participant in the building attack: A network analysis of national teams in
FIFA World Cup 2014. International Journal of Performance Analysis in Sport, 15(2), 704–722.
Criado, R., Romance, M., & Vela-Pérez, M. (2010). Hyperstructures, a new approach to complex
systems. International Journal of Bifurcation and Chaos, 20(03), 877–883.
Davids, K., Glazier, P., Araújo, D., & Bartlett, R. (2003). Movement systems as dynamical systems.
Sports Medicine, 33(4), 245–260.
Davids, K., Araújo, D., & Shuttleworth, R. (2005). Applications of dynamical systems theory to
football. In Science and Football V (pp. 537–550).
Duarte, R., Araújo, D., Correia, V., & Davids, K. (2012). Sports teams as superorganisms: Implications
of sociobiological models of behaviour for research and practice in team sports performance
analysis. Sports Medicine, 42(8), 633–642.
Duch, J., Waitzman, J. S., & Amaral, L. A. N. (2010). Quantifying the performance of individual
players in a team activity. PLoS One, 5(6), e10937.
Dutt-Mazumder, A., Button, C., Robins, A., & Bartlett, R. (2011). Neural network modelling and
dynamical system theory. Sports Medicine, 41(12), 1003–1017.
Fewell, J. H., Armbruster, D., Ingraham, J., Petersen, A., & Waters, J. S. (2012). Basketball teams as
strategic networks. PLoS One, 7(11), e47445.
Freeman, L. C. (2000). Visualizing social networks. Journal of Social Structure, 1(1), 4.
Grund, T. U. (2012). Network structure and team performance: The case of English premier league
soccer teams. Social Networks, 34(4), 682–690.
Guillaume, J.-L., & Latapy, M. (2006). Bipartite graphs as models of complex networks. Physica A:
Statistical Mechanics and its Applications, 371(2), 795–813.
Johnson, J. (2006). Hypernetworks for reconstructing the dynamics of multilevel systems.
Johnson, J. (2008). Multidimensional events in multilevel systems. In S. Albeverio, D. Andrey,
P. Giordano, & A. Vancheri (Eds.), The dynamics of complex urban systems: An interdisciplinary
approach (pp. 311–334). Heidelberg, Physica-Verlag HD.
Johnson, J. (2013). Hypernetworks in the science of complex systems. Imperial College Press London.
Johnson, J. H. (2016). Hypernetworks: Multidimensional relationships in multilevel systems. The
European Physical Journal Special Topics, 225(6), 1037–1052.
Katz, N., Lazer, D., Arrow, H., & Contractor, N. (2004). Network theory and small groups. Small
Group Research, 35(3), 307–332.
Koh, S., Yamamoto, T., Murakami, J., & Ueno, Y. (2013). Defence performance analysis of Rugby
Union in Rugby World Cup 2011: Network analysis of the turnover contributors. In Performance
Analysis of Sport IX (pp. 120–125). Routledge.
Korte, F., Link, D., Groll, J., & Lames, M. (2019). Play-by-play network analysis in football. Frontiers
in Psychology, 10, 1738.
Moody, J., McFarland, D., & Bender-deMoll, S. (2005). Dynamic network visualization. American
Journal of Sociology, 110(4), 1206–1241.
Newman, M. (2005). A measure of betweenness centrality based on random walks. Social Networks,
27(1), 39–54.
Newman, M. E. J. (2010). Networks: An introduction. Oxford University Press.
Passos, P., Silva, R. A. E., Gomez-Jordana, L., & Davids, K. (2020). Developing a two-dimensional
landscape model of opportunities for penetrative passing in association football—Stage I.
Journal of Sports Sciences, 38(21), 2407–2414.
Passos, P., Davids, K., Araújo, D., Paz, N., Minguéns, J., & Mendes, J. (2011). Networks as a novel
tool for studying team ball sports as complex social systems. Journal of Science and Medicine in
Sport, 14(2), 170–176.
Pina, T. J., Paulo, A., & Araújo, D. (2017). Network characteristics of successful performance in
association football. A study on the UEFA champions league. Frontiers in Psychology, 8, 1173.
https://doi.org/10.3389/fpsyg.2017.01173
Ramos, J., Lopes, R. J., Marques, P., & Araújo, D. (2017a). Hypernetworks reveal compound vari-
ables that capture cooperative and competitive interactions in a soccer match. Frontiers in
Psychology, 8, 1379.
Ramos, J., Lopes, R. J., Marques, P., & Araújo, D. (2017b). Hypernetworks: Capturing the multilay-
ers of cooperative and competitive interactions in soccer. International Congress Complex
Systems in Sport, Frontiers.
Ramos, J., Lopes, R. J., & Araújo, D. (2018). What’s next in complex networks? Capturing the con-
cept of attacking play in invasive team sports. Sports Medicine, 48(1), 17–28.
Ramos, J. (2019). Complex networks analysis in team sports performance: Multilevel Hypernetworks
approach to soccer matches. ISCTE-Instituto Universitario de Lisboa (Portugal).
Reilly, T., Cabri, J., & Araújo, D. (2005). Applications of dynamical systems theory to football. In
Science and Football V (pp. 570–572). Routledge.
Ribeiro, J., Davids, K., Araújo, D., Silva, P., Ramos, J., Lopes, R., & Garganta, J. (2019). The role of
hypernetworks as a multilevel methodology for modelling and understanding dynamics of team
sports performance. Sports Medicine, 49, 1337–1344.
Ribeiro, J., Lopes, R., Silva, P., Araújo, D., Barreira, D., Davids, K., Ramos, J., Maia, J., & Garganta,
J. (2020). A multilevel hypernetworks approach to capture meso-level synchronisation processes
in football. Journal of Sports Sciences, 38(5), 494–502.
Sasaki, K., Yamamoto, T., Miyao, M., Katsuta, T., & Kono, I. (2017). Network centrality analysis to
determine the tactical leader of a sports team. International Journal of Performance Analysis in
Sport, 17(6), 822–831.
Vilar, L., Araújo, D., Davids, K., & Button, C. (2012). The role of ecological dynamics in analysing
performance in team sports. Sports Medicine, 42(1), 1–10.
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_20
170 M. Tilp
Key Messages
• Artificial neural networks are inspired by the biological nervous system in their
structure and function.
• They consist of so-called neurons, which are arranged in several layers. Connections
exist between the neurons of the individual layers, which transmit signals
between the neurons and thereby excite or inhibit them.
• Artificial neural networks convert input information into an output signal without
the need to know the specific context.
• There are different ways in which artificial neural networks work: e.g., one
distinguishes between networks that learn contexts supervised or unsupervised (i.e.,
completely independently).
• It has been shown that artificial neural networks can be used in sports, e.g., to
identify patterns or to make predictions.
• It is expected that artificial neural networks will be increasingly and successfully
used in various fields of sports science in the future.
The biological nervous system adapts when a movement is learned or when a tacti-
cal situation is repeatedly observed. In the first case, if a complex movement is
repeated many times by activating muscles which are controlled by a great number
of nerve cells, the nervous system remembers the interaction of the corresponding
nerve cells (Des Marèes, 2003). During this process, the nerve cells receive informa-
tion via the so-called dendrites and, if a threshold value is reached that activates
the nerve cell, they in turn pass on information to the next cell via their axon. These
structures and the synapses, i.e., the connections between the cells, strengthen if
they are often used or atrophy if they are not used. After a training phase, the
movement thus becomes increasingly smooth and economical. In the second
case, if certain game situations occur again and again, the sensory nervous
system will always perceive them similarly. Over time, the nervous system learns to
identify the patterns that occur and can recognize the situations, e.g., a feint in a
sports game.
Artificial neural networks work in a similar way. Their neurons receive informa-
tion from other neurons and pass it on. While the biological synapses adapt over
time to improve the signal transmission and thereby excite or inhibit it, artificial
neural networks adapt the computational rules between two neurons. Similar to
the biological process, the phase of this adaptation is also called training. If a neu-
ral network has trained/learned sufficiently by using input data, it can, for example,
recognize patterns of play in a sports game (cf. Grunz et al., 2012; Memmert &
Perl, 2009a, 2009b; Perl et al., 2013).
20.2 Background
There are various types of network structures, of which the simplest will be
illustrated here. If, on the one hand, information is always passed on in one direction only
(from an input layer towards an output layer), this is referred to as a feedforward
network. If, on the other hand, the network can fall back on earlier computations
by allowing feedback, it is called a feedback (recurrent) network. This is not to be
confused with backpropagation, which is a learning procedure for adjusting the weights
rather than a network structure.
The basic structure of a feedforward neural network consists of an input layer,
which in turn consists of several neurons, one or more hidden layers, and an output
layer. The individual neurons of one layer are connected to all neurons of the next
layer (see Fig. 20.1). Conversely, each neuron receives information from all neurons
in the previous layer. The connections determine to what extent the value of one
neuron is passed on to the next. This is called the weight of the connection. For
example, the weight of the connection between neuron 2 and neuron 5 of the next layer
in Fig. 20.1 is denoted by w25. The input value of a neuron results from the weighted
values of all upstream neurons. These values can be combined by linear or nonlinear
calculation rules. This calculation rule is called the propagation function and is, in
the simplest case, the sum of all weighted values of the incoming neurons (see the
calculation in Example 1).
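In its simplest form, the propagation function described above can be written as a weighted sum. The symbols net_j (input value of neuron j) and o_i (value passed on by upstream neuron i) are notation assumed here for illustration; w_ij is the weight of the connection from neuron i to neuron j, as in w25.

```latex
\mathrm{net}_j = \sum_{i} w_{ij}\, o_i
\qquad\text{e.g.}\qquad
\mathrm{net}_5 = w_{15}\, o_1 + w_{25}\, o_2 + w_{35}\, o_3
```

The second expression applies the sum to neuron 5 of the network in Fig. 20.1, which receives the values of the three input neurons.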
.. Fig. 20.1 Basic structure of a neural network with input layer (neurons 1–3), hidden layer (neu-
rons 4 and 5), and output layer (neuron 6)
The input value of a neuron from the propagation function is now further
processed via a so-called activation function (within the receiving neuron). Similar to
the way a neuron becomes active in nature only when a certain membrane potential
is exceeded, there are computational rules that calculate the activity value of the
neuron from the input value. There are different types of activation functions (e.g.,
a step function, which produces a constant value above a certain input threshold).
The resulting activity value can then be forwarded to the next layer or, in the case
of an output layer neuron, represents the output value.
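The interplay of propagation and activation function can be sketched as a forward pass through the network of Fig. 20.1 (three input neurons, two hidden neurons, one output neuron). The weights, the input values, and the threshold of 0.5 are hypothetical and chosen only for illustration.

```python
def propagate(values, weights):
    """Propagation function: sum of the weighted incoming values."""
    return sum(w * v for w, v in zip(weights, values))

def step(value, threshold=0.5):
    """Step activation: 1 at or above the threshold, otherwise 0."""
    return 1 if value >= threshold else 0

def forward(inputs, hidden_weights, output_weights):
    """Feedforward pass: input layer -> hidden layer -> output neuron."""
    hidden = [step(propagate(inputs, w)) for w in hidden_weights]
    return step(propagate(hidden, output_weights))

# Hypothetical weights for the 3-2-1 network of Fig. 20.1.
hidden_weights = [
    [0.5, 0.4, -0.1],   # w14, w24, w34: connections into hidden neuron 4
    [-0.3, 0.1, 0.5],   # w15, w25, w35: connections into hidden neuron 5
]
output_weights = [0.6, 0.3]   # w46, w56: connections into output neuron 6

print(forward([1.0, 1.0, 1.0], hidden_weights, output_weights))
print(forward([0.0, 0.0, 0.0], hidden_weights, output_weights))
```

With these values, only hidden neuron 4 exceeds the threshold for the first input, and its weighted value is enough to activate the output neuron.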
The weights between the neurons are variable and can adapt to the data. The
phase of these adaptations, in which a set of input data is provided to the neural
network, is called the training or learning phase. During this phase, the weights
between the neurons are changed according to predefined learning rules. A distinction
is made between supervised and unsupervised learning. In supervised learning,
the network is provided with both input data and the corresponding output
data, just as a teacher tells the student the task and the solution. During the learning
process, the network compares the calculated and the given output values and
continuously adjusts the weights based on the differences until the differences are
below a given threshold. The way the differences are calculated is specified in a
so-called error function. Network types for supervised learning are, e.g., the
perceptron, multi-layer perceptron (MLP), or radial basis function (RBF) networks.
During unsupervised learning, only input data is provided. The network then tries
to map the input data to neighboring neurons based on their similarity. Hence,
similar input data are assigned to similar output data after the training phase. One
network type for unsupervised learning is, for example, a Kohonen feature map
(KFM). Thus, the choice of network type depends on the task that it is supposed
to solve. For exact descriptions of the different network types, learning rules,
propagation, activation, and error functions, the reader is referred to further literature
(e.g., Backhaus et al., 2006; Sanderson, 2017).
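The idea of unsupervised learning, where similar inputs end up at the same or neighboring neurons, can be sketched with a strongly simplified one-dimensional Kohonen-type map. The data, the chain of three neurons, the learning rate, and the rule that direct neighbors learn at half the rate are all assumptions for illustration; real Kohonen feature maps typically use larger grids and decaying parameters.

```python
def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to the input x."""
    return min(range(len(weights)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)))

def train_map(data, weights, epochs=50, rate=0.3, radius=1):
    """Move the winning neuron and its chain neighbours towards each input."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in data:
            winner = best_matching_unit(weights, x)
            for j in range(len(weights)):
                if abs(j - winner) <= radius:
                    influence = rate if j == winner else rate / 2
                    weights[j] = [w + influence * (v - w)
                                  for w, v in zip(weights[j], x)]
    return weights

# Hypothetical two-dimensional inputs forming two groups of similar situations.
data = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8)]
initial = [(0.0, 0.0), (3.0, 3.0), (6.0, 6.0)]   # chain of three neurons

trained = train_map(data, initial)
assignment = [best_matching_unit(trained, x) for x in data]
print(assignment)
```

No target outputs are given at any point; the grouping emerges only from the similarity of the inputs.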
Definition
Artificial neural networks are information-processing systems that are modeled
based on the networking of nerves in living organisms. They consist of so-called
neurons, which are arranged in individual layers. The connections between the
neurons link the input information with output information. Artificial neural net-
works are a sub-area of artificial intelligence and are used for classification, fore-
casting, and optimization tasks.
20.3 Applications
► Calculation Example 1
Here, we create an artificial neural network for beach volleyball to predict whether an
opponent will play a hard attacking spike or a targeted shot. We assume that the playing
positions during reception, setting, and attacking affect this decision. Therefore, the
distance of the attacker’s receiving position from the net (in m), the distance of the
setting position from the net (in m), and the distance of the attacking position from the net
(in m) serve as the input information for the network. Since this is a supervised learning
process, the network is also informed about the type of attack (0 = spike, 1 = shot) as output
information. In this case, the artificial neural network consists of an input layer with three
neurons (the distances of 1, the reception position, 2, the setting position, and 3, the
attack position from the volleyball net, respectively) and an output layer with one neuron
(type of attack). The weights between the neurons are randomly selected at the beginning
and adjusted during the learning process. As a learning rule, the weights could,
for example, always be increased or decreased by 5% of the input values of the current
situation (learning rate), depending on whether the result was too low or too high. In
contrast, if the calculation leads to the correct result, the weights remain unchanged
(Fig. 20.2).
Table 20.1 shows an example data set with data collected from a game. The
calculations with these data illustrate how an artificial neural network works. If we
take the data of the first row of the example data set (Table 20.1), the input data
and the corresponding weights w14 = −0.1, w24 = 0.3, and w34 = 0.2 result in a value of
0.6 (= 4 × (−0.1) + 3 × 0.3 + 0.5 × 0.2) (see Table 20.2). This value is then further
processed by an activation function. In the example, a so-called threshold function would
be appropriate as activation function, e.g., taking the value 0 (spike) for values <0.5
and the value 1 (shot) for values ≥0.5. Applying the activation function would result in
the value 1 (shot) in our calculation since 0.6 is greater than 0.5. However, since the
player played a spike (value 0) in the real situation (see Table 20.1), the weights
would be adjusted before calculating the second game situation.
.. Fig. 20.2 Calculation example 1: Left: General structure of the type of attack prediction
network. Right: Calculation of the input value of the output neuron based on the input data and the
weights wij in situation 1 from the example data set
.. Table 20.2 Input data (ei), weights (wij), weighted output values, and sums (= input values)
from the example data set. The weights always increase or decrease by 5% of the input data
Reception = e1   Setting = e2   Attack = e3   w14   w24   w34   e1 w14   e2 w24   e3 w34   Sum
In this case, the weights would be reduced by 5% of the input values from situation 1.
The new weights would be w14 = −0.1 − 0.05 × 4 = −0.3, w24 = 0.3 − 0.05 × 3 = 0.15, and
w34 = 0.2 − 0.05 × 0.5 = 0.175. Now, using the new input data from situation 2 (reception
position = 6 m, setting position = 1 m, attacking position = 1 m) and the new weights,
the new input value is calculated. This results in a value of −1.475
(= 6 × (−0.3) + 1 × 0.15 + 1 × 0.175). Since this value is <0.5, the activation function
would yield the value 0 (spike). This value again does not match the observed behavior
(shot), so the weights are adjusted again. This process is repeated with as large an
amount of input data as possible until the predictions reach a given sufficient accuracy.
Then the network has learned the context and can predict whether the player will spike
the ball or play a shot based on data from a (new)
input data set (reception position, setting position, attacking position). ◄
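The two learning steps of Calculation Example 1 can be retraced with a short script. The function and variable names are chosen here for illustration; the weights, input values, threshold, and learning rate of 5% are those given in the example.

```python
def threshold(value, limit=0.5):
    """Threshold activation: 0 (spike) below the limit, 1 (shot) otherwise."""
    return 0 if value < limit else 1

def train_step(weights, inputs, observed, rate=0.05):
    """One learning step: predict, then shift the weights by rate * input."""
    net = sum(w * e for w, e in zip(weights, inputs))
    predicted = threshold(net)
    if predicted > observed:      # result too high: reduce the weights
        weights = [w - rate * e for w, e in zip(weights, inputs)]
    elif predicted < observed:    # result too low: increase the weights
        weights = [w + rate * e for w, e in zip(weights, inputs)]
    return weights, net, predicted

weights = [-0.1, 0.3, 0.2]        # w14, w24, w34 from the example
situations = [
    ((4, 3, 0.5), 0),             # situation 1: observed spike
    ((6, 1, 1), 1),               # situation 2: observed shot
]

history = []
for inputs, observed in situations:
    weights, net, predicted = train_step(weights, inputs, observed)
    history.append((net, predicted, list(weights)))
    print(f"input value {net:.3f}, predicted {predicted}, observed {observed}, "
          f"new weights {[round(w, 3) for w in weights]}")
```

The first step reproduces the input value of 0.6 and the reduced weights −0.3, 0.15, and 0.175; the second step reproduces the input value of −1.475 and then raises the weights again because the prediction was too low.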
Study Box
only limited information of input data compared to the experts, who saw all players
in the video, the predictions of the attacking position were about equally good
(68.1% vs. 65.3% correct predictions) and those of the setting time even significantly
better (79.2% vs. 64.6% correct predictions) than those of the experts. The result shows
that neural networks are capable of predicting actions in sports games with success
rates similar to those of experts. It should be noted that the results could probably be
improved further by varying and optimizing the input data. By pointing out the essential
prediction parameters identified by the neural network, these results could then be
used in the context of perception training. A corresponding anticipation training is
also possible for athletes in other sports such as handball (Hassan et al., 2017).
References
Backhaus, K., Erichson, B., Plinke, W., & Weiber, R. (2006). Neuronale Netze. In Multivariate
Analysemethoden (11th ed., pp. 750–806). Springer.
Des Marèes, H. (2003). Sportphysiologie (9th ed.). Sportverlag Strauß.
Grunz, A., Memmert, D., & Perl, J. (2012). Tactical pattern recognition in soccer games by means of
special self-organizing maps. Human Movement Science, 31, 334–343.
Hassan, A., Schrapf, N., & Tilp, M. (2017). The prediction of action positions in team handball by
non-linear hybrid neural networks. International Journal of Performance Analysis in Sport, 17,
293–302.
Komaris, D. S., Pérez-Valero, E., Jordan, L., Barton, J., Hennessy, L., O’Flynn, B., & Tedesco, S.
(2019). Predicting three-dimensional ground reaction forces in running by using artificial neural
networks and lower body kinematics. IEEE Access, 7, 156779–156786.
Memmert, D., & Perl, J. (2009a). Analysis and simulation of creativity learning by means of artificial
neural networks. Human Movement Science, 28, 263–282.
Memmert, D., & Perl, J. (2009b). Game creativity analysis by means of neural networks. Journal of
Sport Science, 27, 139–149.
Perl, J., Grunz, A., & Memmert, D. (2013). Tactics in soccer: An advanced approach. International
Journal of Computer Science in Sport, 12, 33–44.
Sanderson, G. (2017, August 1). Neural networks. [Video]. YouTube. https://www.youtube.com/watc
h?v=aircAruvnKk&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi.
Schrapf, N., Hassan, A., Wiesmeyr, S., & Tilp, M. (2022). An artificial neural network predicts setter’s
setting behavior in volleyball similar or better than experts. IFAC-PapersOnLine, 55(20), 612–617.
Schrapf, N., & Tilp, M. (2013). Action sequence analysis in team handball. Journal of Human Sport
and Exercise, 8(3), 615–621.
21 Deep Neural Networks
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_21
D. Raabe
Key Messages
• Deep neural networks are neural networks with multiple layers and potentially
millions of parameters and potentially billions of links.
• They are successfully used for particularly complex learning problems in research
and industrial applications in diverse fields such as image recognition or language
processing.
• The ability to master complex learning problems is usually accompanied by high
training data requirements, intensive computation times, and low transparency
of the trained model.
• In sports, there are countless applications for deep neural networks, from data
acquisition to semantic event extraction to analysis of complex tactical patterns.
Deep neural networks are an extension and further development of classical neural
networks (see Chap. 20) in the field of artificial intelligence. While neural networks
in the early years of their development often consisted of only a few neurons and
at most one hidden layer, modern “deep” neural networks can consist of several
hundred million neurons, arranged in a multitude of hidden layers and joined by
billions of connections. Accordingly, these networks differ from classical networks
primarily in the number of layers and neurons. The term “deep learning” is,
however, nowadays sometimes applied loosely to other methods as well in order to
emphasize the particular complexity of a model. Closely related to the size of these
networks is their leap in performance: deep neural networks are now clearly ahead
of their smaller ancestors on numerous problems on standardized datasets
(so-called benchmark datasets).
Due to this performance, deep neural networks are extremely popular today
and are used for application and research problems in a wide variety of domains
such as image recognition and processing (so-called computer vision), speech
recognition, text processing, medicine, bioinformatics, and climate research, but also in
sports. For example, only the advances in computer vision have made it possible
to develop optical (i.e., camera-based) tracking methods that acquire
positional data with acceptable accuracy. With these methods, the players and their
positions are detected and extracted from the camera image with the help of deep
neural networks. The latest methods can even extract the positions from a
television broadcast or estimate the exact silhouette of the players (see Chap. 4).
In the area of sports data analysis, too, more and more methods can
be classified as deep learning.
21.2 Background
However, deep neural networks also have some drawbacks associated with their
sheer size. These networks consist of hundreds to billions of “trainable” parameters
(e.g., the weights of the individual propagation functions), which are adjusted
during the learning process. This causes both practical and theoretical problems.
To adjust the individual parameters, i.e., to “learn” the solution, these methods
typically require a lot of training data (in supervised learning), long computation
times, and associated resources such as hardware or energy. One of the largest
networks constructed to date, GPT-3 in the text processing domain, has 175
billion parameters (Brown et al., 2020). The training of this model used a total of
45 terabytes of compressed text material (about 400 billion characters) and ran on
a specifically constructed supercomputer with 10,000 processing units for about a
month. The estimated cost is in the range of several million US dollars, and the
runtime, extrapolated to a single computing unit, is several hundred years. Although
this is an extreme example, it shows that feasibility is limited both for problems
with smaller data sets (some of which have to be created by hand) and by financial
constraints.
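To give a feeling for how quickly the number of trainable parameters grows with depth and width, the following minimal sketch counts the weights and biases of a fully connected feed-forward network. The layer sizes are purely illustrative assumptions and do not correspond to any model discussed in this chapter.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between two layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Hypothetical architectures: 64 inputs, 2 outputs.
shallow = count_parameters([64, 16, 2])                # one small hidden layer
deep = count_parameters([64, 512, 512, 512, 512, 2])   # four wide hidden layers

print(shallow)  # 1074
print(deep)     # 822274
```

Even this modest "deep" variant has several hundred times more parameters than the shallow one, which illustrates why data, computation time, and energy requirements rise so sharply.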
Furthermore, due to the complexity of these models, it is not possible to com-
prehend the final predictions of a trained network. Due to the high number of
parameters, these models are usually a “black box” with opaque inner computa-
tional mechanisms. Why a network makes a certain prediction remains, for the
most part, its secret. Apart from that, from a scientific perspective, simple and
transparent models are preferable to more complex and non-transparent models
according to Occam’s razor. Because of these disadvantages, research today is
intensively focused on models that learn on the basis of very little training data
(e.g. so-called few-shot learning) and that deliver predictions that are comprehen-
sible to humans (so-called explainable artificial intelligence, abbreviated “xAI”).
Definition
Deep neural networks and deep learning refer to a group of machine learning
methods, and in particular neural networks, that are characterized by their com-
plexity and size. Such large network architectures consist of several submodules
with different structure and functionality, which are modularly linked and coupled
with each other. This approach follows the assumption that the concept to be
learned in the real world is composed of several hierarchically structured sub-con-
cepts. In a deep learning method, these are to be learned sequentially up to the final
concept through the different network layers—hence the term “deep” learning.
21.3 Applications
A classic example of the use of deep neural networks in sports is the
generation of position data via so-called optical, i.e., camera-based, tracking systems.
These systems are non-invasive, since the players
do not have to wear additional sensors on their bodies. With the help of neural
networks, the players can be detected in the camera image, and using up to 20
cameras with different viewing angles, their exact positions can be inferred. While
these cameras are usually fixed and calibrated in the stadium infrastructure, the
technology is now so advanced that a dynamic camera image alone (from a TV
broadcast, for example) can be used to extract player positions. However, this
requires several steps besides the identification of the players to obtain accurate
positions. Theiner et al. (2022) show an example of an entire pipeline that can solve
this task (see Fig. 21.1). This pipeline consists of steps to identify the individual
scene cuts, select all sequences that are filmed from the main camera, recognize the
playing field and estimate the field calibration, recognize the players, extract the
positions, and assign identities. In almost all of these steps, deep learning methods
are used and show higher precision than conventional methods.
Fig. 21.1 Schematic structure of an exemplary processing pipeline for extracting position data
from a television broadcast, with modules for shot boundary detection, field mask estimation,
player detection, and team assignment. (Adapted from Theiner et al. 2022)
The study by Wagenaar et al. (2017), on the other hand, uses already processed
position data and addresses the question of whether goal chances can be predicted
using deep neural networks (the CNNs described above). For this purpose, several
network architectures of increasing complexity were compared in an experiment.
From positional data of soccer matches, the authors extracted short sequences of
10 s in length, which either ended in a goal-scoring opportunity or not. Based on
this dataset, a binary classification problem was formulated: given a sequence, the
networks had to decide whether it ended in a goal-scoring opportunity or not.
Since, as described, CNNs originate from image processing, the raw position
data were first transformed into 256 × 256 pixel images that schematically
represent the respective game scene in a two-dimensional pixel graphic (see Chap. 23).
This representation of the raw data was used as input for the different networks.
The networks compared were a comparatively small and simple CNN and
GoogLeNet, which was among the best-performing architectures at the time; a
k-nearest neighbor approach served as the baseline.
Compared to the baseline with an average classification accuracy of 57.3%, the
deep neural networks achieved significantly better results: the best
GoogLeNet variant achieved an average accuracy of 67.1%.
Study Box
The study described above is one of the first to attempt to predict tactical match
events such as goal-scoring opportunities based on positional data and using deep learning
(Wagenaar et al., 2017). However, the question arises to what extent a representation
of position data as pixel graphics is appropriate for the specific problem. More
generally, the question arises which data representation and which network modules or
architectures are particularly suitable for processing position data in deep learning
models. This choice has a decisive influence on the performance in sports-specific
learning problems, as well as on the associated model complexity and the
required sample sizes in the learning process. In a similar study, Raabe et al. (2022)
therefore propose a graph-based data representation and a matching deep neural
network architecture consisting of several submodules (see Fig. 21.2). Here, the
raw positions of the players are represented as nodes in a graph, which are linked by
their respective interactions. This representation is intended to exploit
domain-specific characteristics of sports and thus improve the performance of the network.
Furthermore, graphs have useful mathematical properties (so-called invariances)
that reduce the need for training data. Graph-based neural networks are a
recent development in the field of deep learning that enables the processing of
non-Euclidean input data.
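The idea of representing players as nodes linked by their interactions can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction used by Raabe et al. (2022): the interaction strength is assumed to be the inverse of the Euclidean distance, and the graph is sparsified by keeping only each player's k strongest links. The function name `build_graph` and the parameters `k` and `eps` are hypothetical.

```python
import numpy as np


def build_graph(positions, k=3, eps=1e-6):
    """Return a weighted adjacency matrix for one frame.

    positions: array of shape (n_players, 2) with x/y coordinates in metres.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)       # pairwise distances
    weights = 1.0 / (dist + eps)               # closer players, stronger link
    np.fill_diagonal(weights, 0.0)             # no self-loops

    # Sparsify: keep only the k strongest links of every player.
    adjacency = np.zeros_like(weights)
    strongest = np.argsort(-weights, axis=1)[:, :k]
    rows = np.arange(len(positions))[:, None]
    adjacency[rows, strongest] = weights[rows, strongest]
    return np.maximum(adjacency, adjacency.T)  # make the graph undirected


rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(6, 2))   # six hypothetical players
graph = build_graph(positions, k=2)

# Relabelling the players only relabels the nodes; the graph itself is unchanged.
order = rng.permutation(6)
relabelled = build_graph(positions[order], k=2)
print(np.allclose(relabelled, graph[np.ix_(order, order)]))
```

The final check illustrates one of the invariances mentioned above: the graph does not depend on the arbitrary order in which players are listed in the raw data, so a network operating on it does not have to learn this from examples.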
In an experimental comparison, this neural network architecture was then tested
against other architectures. As a learning problem, a binary classification task was
constructed in which short sequences of positional data were to be evaluated with
respect to the question of whether or not the sequence resulted in a ball win for the
defending team. For comparison, a simple baseline and four different data
representations were used. One was a purely feature-based approach, in which numerous
common key performance indicators (KPIs) were calculated from the raw positional
data and subsequently used as predictors in a logistic regression. Furthermore,
deep neural networks were trained on the raw position data, i.e., a long vector with the
player positions, and on a representation as pixel images, as in the study
by Wagenaar et al. (2017). Finally, the proposed graph-based architecture
was also evaluated.
Fig. 21.2 The graph-based deep neural network architecture used by Raabe et al. (2022);
modules shown: feature construction, distance inversion, TG construction, sparsification,
GCN, GRU, FC, BatchNorm, max pooling, output
The results of this comparison are summarized in the following table:
Model                 Data representation   Parameters   Classification result (%)
Baseline              –                     1            74.5
Logistic regression   KPIs                  10           76.8
SVGRU                 Raw data              499,970      71.6
CNN                   Pictures              437,506      61.4
GoogLeNet             Pictures              9,936,038    80.4
TGNet                 Graphs                109,212      80.5
In summary, both the most complex model, GoogLeNet with almost ten million
parameters, and the proposed graph-based model TGNet achieved by far the best
classification results. However, TGNet is much leaner: it requires only about 1% of
GoogLeNet's parameters (109,212 vs. 9,936,038), and its training, inference, and
adaptation times are much
lower (Raabe et al., 2022). Thus, it can be concluded that the development and
selection of appropriate deep neural network architectures is a key task in sports
informatics in order to exploit the maximum potential of deep learning for sport-specific
problems. However, to enable this development, other requirements are essential. In
particular, the further development of deep learning in the field of sports requires
suitable, publicly accessible benchmark data sets as well as transparent descriptions
of the architectures used, both of which are far less established in this domain
than in those fields where deep learning has already achieved groundbreaking
success (Raabe et al., 2022).
References
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam,
P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R.,
Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are
few-shot learners (arXiv:2005.14165). arXiv. http://arxiv.org/abs/2005.14165
Ciresan, D., Meier, U., & Schmidhuber, J. (2012). Multi-column deep neural networks for image
classification. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3642–3649).
https://doi.org/10.1109/CVPR.2012.6248110
Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen,
P., Sainath, T., & Kingsbury, B. (2012). Deep neural networks for acoustic modeling in speech
recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 29(6),
82–97. https://doi.org/10.1109/MSP.2012.2205597
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep
convolutional neural networks. Communications of the ACM, 60(6), 84–90.
https://doi.org/10.1145/3065386
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
https://doi.org/10.1038/nature14539
Raabe, D., Biermann, H., Bassek, M., Wohlan, M., Komitova, R., Rein, R., Groot, T. K., &
Memmert, D. (2022). Floodlight—A high-level, data-driven sports analytics framework. Journal
of Open Source Software, 7(76), 4588. https://doi.org/10.21105/joss.04588
Raabe, D., Nabben, R., & Memmert, D. (2022). Graph representations for the analysis of multi-agent
spatiotemporal sports data. Applied Intelligence, 53, 3783–3803.
https://doi.org/10.1007/s10489-022-03631-z
Theiner, J., Gritz, W., Müller-Budack, E., Rein, R., Memmert, D., & Ewerth, R. (2022). Extraction
of positional player data from broadcast soccer videos. In Proceedings of the IEEE/CVF winter
conference on applications of computer vision (WACV) (pp. 823–833).
Wagenaar, M., Okafor, E., Frencken, W., & Wiering, M. A. (2017). Using deep convolutional neural
networks to predict goal-scoring opportunities in soccer. In International conference on pattern
recognition applications and methods. https://doi.org/10.5220/0006194804480455
22 Convolutional Neural Networks
Yannick Rudolph and Ulf Brefeld
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_22
Key Messages
• Convolutional neural networks (CNNs) are important machine learning models
for images or data with comparable grid structure
• For example, in supervised learning, CNNs may extract features for classification
and object detection
• CNNs are characterized by parameter sharing and by their properties regarding
shifted inputs
• CNNs are useful for learning features of spatial situations given by positional
data from team sports
22.2 Background
Usually, several convolutional layers are combined in CNNs to process the data. In
a convolutional layer, several convolutional filters with learnable parameters operate on
the data. These operations can be thought of as moving the filters (in the
two-dimensional case: a matrix with learnable entries) over the input and computing an output
for each position of the filter by summing up the results of a point-wise multiplication
with the underlying input. Because a convolutional filter is applied multiple times to
an input, convolutional layers have comparatively few parameters compared to traditional
fully-connected layers. In this context, we also speak of parameter sharing.
To visualize a filter operation, see Fig. 22.1, which shows an input, a convolutional
filter, and the output resulting from the application of the filter. In the figure, the
input values used to calculate the upper left output value are highlighted (the
calculation is given in the figure caption). If the input has more than two dimensions (for
example, color images have multiple color channels in addition to the height and
width dimensions), the input and filter also have more dimensions. The output of
a filter operation remains a matrix. If there are multiple convolutional filters in a
layer, the respective outputs (also called channels or feature maps) are combined
and form a multidimensional tensor.
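The filter operation described above can be written out in a few lines. The following is a minimal sketch without padding and with a step size of one; the 3 × 3 input is hypothetical and was chosen only so that its upper left 2 × 2 patch reproduces the calculation 1 · 2 + 7 · 0 + 8 · 1 + 3 · 3 = 19 from the caption of Fig. 22.1 (assuming that the first factor of each product is the input value and the second the filter entry).

```python
import numpy as np


def conv2d(x, kernel):
    """Slide `kernel` over `x` and sum the point-wise products at each position."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical input; only the upper left 2 x 2 patch follows the figure caption.
x = np.array([[1, 7, 2],
              [8, 3, 5],
              [4, 6, 9]])
kernel = np.array([[2, 0],
                   [1, 3]])

out = conv2d(x, kernel)
print(out[0, 0])                       # 19.0
print(kernel.size, x.size * out.size)  # 4 36
```

The last line illustrates parameter sharing: the filter needs only 4 learnable entries for all output values, whereas a fully-connected layer mapping the 9 input values to the 4 output values would need 36 weights.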
The example calculation in Fig. 22.1 is useful to illustrate a property of
CNNs: If we shift the input one column to the right (padding it with zeros on the
left, for example) and apply the same convolutional filter again, we will do the same
calculation for the top right output value. That is: the output values will also shift
to the right. This property of CNNs to generate an equally shifted output for
shifted input is often described as translation equivariance, or as translation
invariance with respect to the final representation. Intuitively, translation invariance means
that a CNN that has learned to classify balls in a particular region of an image can
also recognize balls in any other region of an image. The network can thus general-
ize, i.e., learn valid relationships and successfully apply them to new data. For true
translation equivariance or invariance, however, special conditions must be present
(see also Kayhan & van Gemert, 2020).
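The shift property can be checked numerically. The sketch below uses random, purely illustrative input and filter values, shifts the input one column to the right with zero padding on the left, and compares the two outputs. It relies on NumPy's `sliding_window_view` to collect all filter positions at once.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d(x, kernel):
    """Apply `kernel` at every position of `x` (no padding, step size one)."""
    windows = sliding_window_view(x, kernel.shape)  # (out_h, out_w, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(5, 6))        # hypothetical input
kernel = rng.integers(-2, 3, size=(2, 2))   # hypothetical filter

# Shift the input one column to the right, padding with zeros on the left.
shifted = np.zeros_like(x)
shifted[:, 1:] = x[:, :-1]

out = conv2d(x, kernel)
out_shifted = conv2d(shifted, kernel)

# Apart from the first column, the new output is the old output moved one column right.
print(np.array_equal(out_shifted[:, 1:], out[:, :-1]))  # True
```

The first output column is excluded from the comparison because it is affected by the zero padding, which is one of the boundary effects that prevent exact equivariance in practice.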
Usually, CNN architectures use multiple convolutional layers, of which the first
layer operates on the input and the subsequent layers operate on the respective
outputs of earlier layers. Thereby, features in layers that are deeper (that is: closer
to the final output) are usually influenced by a larger region of the input. Following
the biology of the human eye, we also call this region the receptive field, since it is
the area of the input that the feature can see. Because the receptive field grows with
the depth of a layer, the further the input is propagated through the network, the
more complex features CNNs can extract.
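How the receptive field grows with depth can be computed with a short recurrence. The following sketch assumes square filters and no dilation; the function name and the example layer configurations are illustrative assumptions.

```python
def receptive_field(layers):
    """Side length of the input region that influences one output feature.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    size, jump = 1, 1
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump  # each layer widens the visible region
        jump *= stride                    # strides enlarge the step between features
    return size


for depth in range(1, 5):
    print(depth, receptive_field([(3, 1)] * depth))
# 1 3
# 2 5
# 3 7
# 4 9
```

With 3 × 3 filters and a step size of one, every additional layer enlarges the receptive field by two input positions per dimension, so deeper features summarize increasingly large parts of the input.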
Fig. 22.1 Exemplary representation of a filter operation; the sum of the point-wise multiplication
(in color) is calculated as follows: 1 · 2 + 7 · 0 + 8 · 1 + 3 · 3 = 19
Fig. 22.2 The LeNet-5 CNN architecture for classifying handwritten digits (LeCun et al., 1998)
22.3 Applications
► Example 1
Even early CNNs were able to successfully classify (handwritten) digits (see
background). Gerke et al. (2015) showed that CNNs are also suitable for the automatic
classification of numbers on player jerseys. To do so, the authors created a dataset of more
than 8000 images of soccer players with legible jersey numbers and compared the
classification using a CNN architecture to a baseline classification trained with features from
classical image processing. The study concludes that a relatively shallow CNN with three
convolutional layers classifies numbers much better than the baseline method: while the
baseline classified only 40% of numbers in unseen test images correctly, the CNN
classified numbers correctly in 83% of the test images. However, in Gerke et al. (2015), the
detection of players in the images was still implemented using methods from classical
image processing. ◄
► Example 2
CNNs can also be used for object detection in images. A study by Wei et al. (2016) illus-
trates how CNNs for object detection might be relevant for sports analytics: In the study,
the authors propose so-called convolutional pose machines, which use CNNs to improve the
detection of human postures in images (known as pose prediction). Pose prediction
is performed by detecting specific body parts that collectively describe the pose. The
detection of body parts as proposed by Wei et al. (2016) is performed by the sequential
application of CNN models. While the first CNN in this process operates only on the
image data, subsequent CNNs also consider the output of previous models. The idea
is that information on relatively easy-to-recognize body parts (for example, head and
shoulders) can be extracted using the first CNN models. Passing this information into
subsequent CNNs can then enable the detection of more difficult-to-detect body parts
(such as elbows). On a dataset of postures from eight sports (Johnson & Everingham,
2011), the authors were able to achieve the best result in terms of correctly localized
body parts at the time. ◄
► Example 3
Regarding positional data in team sports, CNNs can be part of more complex mod-
els. An example of this is provided in a study by Fassmeyer et al. (2021), in which the
authors investigate the automatic classification of game situations in soccer using posi-
tional data. The study considers positional data with 25 positions per second, where
each position is converted into an image-like grid structure following the data transfor-
mation in Dick and Brefeld (2019). The models proposed by Fassmeyer et al. (2021) pro-
cess both the spatial and temporal dimensions of the data: For each time step, features
of the respective positions are extracted by a CNN and fed to a recurrent neural network
(RNN). RNNs are models that are suited for sequential data. To make use of positions
without a class label during training (keyword: semi-supervised learning), the models
learn a representation of the data, from which the data itself can be reconstructed. For
this purpose, the authors propose an autoencoder architecture. Empirically, Fassmeyer
et al. (2021) report very good results for the classification of corner kicks and crosses based on
the obtained representations. ◄
Study Box
Positional data (see Chap. 6) is well suited for processing by CNNs. Dick
and Brefeld (2019), for example, propose a model that automatically evaluates
positions in professional soccer using a CNN architecture which operates on a spatial
representation of positional data. Specifically, the representation is based on the idea
of dividing the soccer field into a two-dimensional grid in which the positional data
of players and ball are encoded. The data is thus transformed to an image-like grid
structure. However, this data structure is not limited by the properties of images.
Instead of working with three RGB color channels, the authors use separate
channels to encode different information. Specifically, Dick and Brefeld (2019) propose
nine channels, all initialized with zeros. See also the example in Fig. 22.3.
In the first channel, all player locations of the first team are set to a value of “1”.
Similarly, the locations of the opposing team and the ball are recorded in the second
and third channels. In channels four to nine, the velocities of all players and of the ball
are recorded by entering the speed in both longitudinal and transverse directions at
the respective spatial locations.
The resulting representation is then processed by a CNN with three layers. The
output of the CNN is fed to a fully-connected layer, which estimates the value of
each position (as an additional input, the fully-connected layer receives the
information of ball possession). Comparing these estimated values to actual game results,
the authors conclude that the learned evaluation is useful for soccer analysis.
Among other things, the spatial representation helps to counter a permutation
problem: If the data were processed directly with a fully-connected neural network,
the order of players in the data would affect the evaluation of a position. Representing
the information of players of a team in only one channel counteracts this problem.
As an alternative to the approach described here, representing positional data as
graphs and processing positional data with permutation equivariant models, such as
graph neural networks (e.g., Yeh et al., 2019) or transformers (cf. Rudolph & Brefeld,
2022), appears promising.
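The nine-channel grid described in this study box can be sketched as follows. The grid resolution, the pitch dimensions, the exact order of the velocity channels, and the function names are assumptions made for illustration and are not taken from Dick and Brefeld (2019); coordinates are assumed to lie within the pitch, and velocities of players sharing a cell are simply added.

```python
import numpy as np

PITCH = (105.0, 68.0)  # assumed pitch length and width in metres
GRID = (34, 21)        # hypothetical number of cells along length and width


def cell_index(x, y):
    """Map a pitch coordinate in metres to a (row, col) grid cell."""
    col = min(int(x / PITCH[0] * GRID[0]), GRID[0] - 1)
    row = min(int(y / PITCH[1] * GRID[1]), GRID[1] - 1)
    return row, col


def encode_frame(team_a, team_b, ball):
    """Encode one frame as an array of shape (9, rows, cols).

    team_a, team_b: arrays of shape (n_players, 4) holding x, y, vx, vy.
    ball: array of shape (4,) holding x, y, vx, vy.
    """
    grid = np.zeros((9, GRID[1], GRID[0]))
    groups = [team_a, team_b, np.asarray(ball)[None, :]]
    for g, entities in enumerate(groups):
        for x, y, vx, vy in entities:
            row, col = cell_index(x, y)
            grid[g, row, col] = 1.0           # channels 0-2: locations
            grid[3 + 2 * g, row, col] += vx   # longitudinal velocity
            grid[4 + 2 * g, row, col] += vy   # transverse velocity
    return grid


rng = np.random.default_rng(1)


def random_team(n=11):
    """Generate hypothetical positions and velocities for one team."""
    xy = rng.uniform([0.0, 0.0], PITCH, size=(n, 2))
    v = rng.normal(0.0, 2.0, size=(n, 2))
    return np.hstack([xy, v])


team_a, team_b = random_team(), random_team()
ball = np.array([52.5, 34.0, 3.0, -1.0])

frame = encode_frame(team_a, team_b, ball)
reordered = encode_frame(team_a[::-1], team_b, ball)  # same players, other order
print(frame.shape, np.allclose(frame, reordered))
```

Because all players of a team are written into the same channels, listing them in a different order produces the same grid, which is how this representation sidesteps the permutation problem.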
Fig. 22.3 The data representation of positional soccer data suitable for processing with a
CNN as proposed by Dick and Brefeld (2019)
References
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierar-
chical image database. In IEEE conference on computer vision and pattern recognition (pp. 248–
255).
Dick, U., & Brefeld, U. (2019). Learning to rate player positioning in soccer. Big Data, 7, 71–82.
Fassmeyer, D., Anzer, G., Bauer, P., & Brefeld, U. (2021). Toward automatically labeling situations in
soccer. Frontiers in Sports and Active Living, 3, 725431.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of
pattern recognition unaffected by shift in position. Biological Cybernetics, 36, 193–202.
Gerke, S., Müller, K., & Schäfer, R. (2015). Soccer jersey number recognition using convolutional
neural networks. In IEEE international conference on computer vision workshop.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In IEEE
conference on computer vision and pattern recognition (pp. 770–778).
Johnson, S., & Everingham, M. (2011). Clustered pose and nonlinear appearance models for human
pose estimation. In IEEE conference on computer vision and pattern recognition.
Kayhan, O. S., & van Gemert, J. C. (2020). On translation invariance in CNNs: Convolutional layers
can exploit absolute spatial location. In IEEE conference on computer vision and pattern recogni-
tion.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolu-
tional neural networks. In Advances in neural information processing systems.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document
recognition. Proceedings of the IEEE, 86, 2278–2324.
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021).
Zero-shot text-to-image generation. In International conference on machine learning.
Rudolph, Y., & Brefeld, U. (2022). Modeling conditional dependencies in multiagent trajectories. In
International conference on artificial intelligence and statistics.
Wei, S.-E., Ramakrishna, V., Kanade, T., & Sheikh, Y. (2016). Convolutional pose machines. In IEEE
conference on computer vision and pattern recognition.
Yeh, R. A., Schwing, A. G., Huang, J., & Murphy, K. (2019). Diverse generation for multi-agent
sports games. In IEEE conference on computer vision and pattern recognition.
23 Transfer Learning
Henrik Biermann
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_23
Key Messages
• Traditional machine learning (e.g., using deep neural networks, as discussed in
Chap. 21) generally requires large amounts of data (Big Data) and powerful
computing systems (computational clusters).
• Transfer learning allows the utilization of pre-trained models (e.g., from image
processing) for solving new problems.
• This may require a transformation of the data representation (domain).
• Transfer learning can lead to new insights into problem-solving.
23.2 Background
In recent years, various fields have seen an increasing adoption of machine learning
methods, where models with a large number of parameters (several millions) are
tailored to specific tasks. This process involves training the models for the given
task, which requires a substantial amount of data (Big Data) to ensure their
applicability to unseen data, as well as powerful computational resources (computing
clusters) to conduct the training within a practical timeframe. These conditions are
usually only met by large companies. For smaller groups (research teams) that
often lack access to a significant amount of task-specific data and powerful
computing resources, training such large models becomes impractical. However, while
the data and computational capabilities of large companies are not made publicly
available, there is an opportunity to download pre-trained models from the
internet. An example is a deep neural network (see Chap. 10) pre-trained on the
ImageNet dataset (Russakovsky et al., 2015), which comprises more than one million
images; such a network can reliably recognize over 1000 object categories.
This is where transfer learning comes into play. Based on this principle, a model
that has already been designed (and trained) for a specific task can be used for a
new task. The layered structure of the models, consisting of different independent
layers, can be exploited. In the case of a network trained on ImageNet, the first layer
receives an image (e.g., of an orange), processes it, and then passes the result to the
next layer of the network. This process continues until the final layer outputs an
object class. Consequently, in the final layer of the network, the object class is
determined based on the output of the penultimate layer. The output of the
penultimate layer can be considered as a (high-dimensional) representation (feature
vector) of the original input image (orange), containing all necessary information.
Since an individual layer of a deep neural network can be roughly compared
to a logistic regression (Dreiseitl & Ohno-Machado, 2002), where data points in a
(high-dimensional) space are divided into different subgroups, it can be assumed that the
features created by the initial layers are domain-independent. For the example of
ImageNet, this means that the first layers perform image processing operations
comparable to “classical” image processing (edge detection, convolution, etc.).
Therefore, the pre-trained neural network can not only reliably recognize the 1000
object categories but also (domain-independently) extract relevant information
from an image.
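The principle of reusing the early layers as a fixed feature extractor and training only a new final layer can be sketched as follows. This is a deliberately simplified illustration: the "pre-trained" weights are random stand-ins (in practice they would be loaded from a published model), the target task is synthetic, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the early layers of a pre-trained network. These weights are
# random for illustration only; they are frozen and never updated below.
W_pretrained = rng.normal(size=(20, 8)) / np.sqrt(20)


def extract_features(x):
    """Frozen layers: map raw inputs to a feature vector."""
    return np.tanh(x @ W_pretrained)


def predict(features, w, b):
    """New output layer: logistic regression on the extracted features."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))


def log_loss(features, labels, w, b):
    p = np.clip(predict(features, w, b), 1e-12, 1 - 1e-12)
    return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))


def train_head(features, labels, lr=0.1, steps=500):
    """Train only the new output layer with gradient descent."""
    w, b = np.zeros(features.shape[1]), 0.0
    for _ in range(steps):
        error = predict(features, w, b) - labels
        w -= lr * features.T @ error / len(labels)
        b -= lr * error.mean()
    return w, b


# Small synthetic target task with only 60 labelled examples.
x_new = rng.normal(size=(60, 20))
y_new = (x_new[:, 0] > 0).astype(float)

frozen_copy = W_pretrained.copy()
features = extract_features(x_new)
loss_before = log_loss(features, y_new, np.zeros(8), 0.0)
w, b = train_head(features, y_new)
loss_after = log_loss(features, y_new, w, b)

print(loss_before > loss_after)                    # the new layer has learned
print(np.array_equal(W_pretrained, frozen_copy))   # the reused layers are unchanged
```

Only the 9 parameters of the new output layer are trained here, which is why a small dataset and modest hardware can suffice even when the reused layers are very large.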
As a result, in addition to the cost-effective use of very large models, transfer
learning also offers the opportunity to gain profound insights into the structure of
a problem through domain transfer. The transformation of data representation
plays a crucial role in this process. This is particularly interesting for complex
contexts, such as those encountered in sports game analysis (Fig. 23.1).
Fig. 23.1 Overview of the idea of transfer learning
Definition
Transfer learning describes the process of repurposing a machine-trained model
for a new task. This enables the utilization of large and powerful models for spe-
cific tasks, even in the absence of powerful computational systems and large data-
sets for the given task.
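The idea can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not the setup of any study cited in this chapter: `pretrained_features` is a hypothetical stand-in for the frozen layers of a pre-trained network, and only a small new output layer (a logistic regression) is trained on made-up data for the new task.

```python
import math


def pretrained_features(x):
    """Hypothetical frozen feature extractor (stands in for pre-trained layers)."""
    a, b = x
    return [a + b, a - b, a * b]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=500, lr=0.1):
    """Train only the new output layer on top of the frozen features."""
    feats = [pretrained_features(x) for x in samples]
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the pre-activation
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(x, w, b):
    f = pretrained_features(x)
    return 1 if sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5 else 0


# Made-up data for the new task: label 1 if the first value exceeds the second.
samples = [(2.0, 1.0), (1.0, 2.0), (3.0, 0.5), (0.5, 3.0), (2.5, 1.0), (1.0, 2.5)]
labels = [1, 0, 1, 0, 1, 0]
w, b = train_head(samples, labels)
```

Because the feature extractor is never updated, only four numbers (three weights and one bias) have to be learned here, which is why a handful of examples can suffice in this toy setting.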
23.3 Applications
► Example 1
As previously demonstrated, deep neural networks from image processing are com-
monly used as the base model. Thus, there are examples where transfer learning is used
to automatically recognize the content of visual data. In a study by Russo et al. (2019),
a pre-trained network from the ImageNet dataset was employed for the automatic rec-
ognition of sports videos. The authors compiled a video dataset comprising television
images from a total of 15 sports (football, rugby, table tennis, volleyball, basketball,
cricket, etc.). Using this dataset, they trained a model to automatically assign the correct
sport to the videos. The study’s results show that transfer learning from the pre-trained
model improves the previous gold standard accuracy of 96% to a perfect accuracy of
100%, showcasing the powerful image-processing elements of the pre-trained model and
the advantages of transfer learning. ◄
► Example 2
Another application of transfer learning in video images comes from a study by De
Campos et al. (2013). The authors present a model capable of automatically recogniz-
ing events in videos. The model’s hierarchical structure allows abstraction at different
semantic levels. The authors demonstrate this with an application to television images of
tennis matches. In the lowest (low-level) layer, individual shots are initially recognized.
This is followed by field markings, player detection, and ball recognition in upper layers.
Finally, the top hierarchical layer detects events, enabling automatic annotation of
events such as serves, shots, or ball hits in tennis, and even automatic recognition of the
current score. This particular model structure can also be applied for transfer learn-
ing. The authors define anomalies that can be detected at each hierarchical level. An
anomaly in one of the lower hierarchical levels could indicate that the shown video is
not from a tennis match, while an anomaly in an upper level could suggest that too many
players are detected on the field. If an anomaly (different sport) is detected at the lowest
level, linear transformations are automatically triggered to adapt the model to the new
environment. By doing this, the authors successfully transfer the model to the sport of
badminton. ◄
► Example 3
Various transfer learning methods deal with tactical analysis in sports. As a new method
of graphical representation of positional data in football, Visual Rhythms (Rodrigues
et al., 2017) were introduced. This graph-based approach creates graphs from positional
data and extracts specific features from them. As these features vary over time (see time
series, Chap. 26), they can be transformed into visual representations. These Visual
Rhythms can be imagined as sequences of pixels, where the image’s length encodes the
duration of a situation. The feature’s value can be represented by changing colors (simi-
lar to a heatmap). Furthermore, different Visual Rhythms for different features can be
stacked to create a two-dimensional image reflecting the evolution of various features
over time. This powerful and concise analysis tool provides a visual representation of
positional data, enabling detailed evaluation of various tactical concepts in football.
However, the specific application of Visual Rhythms for machine-based football analysis
has not been thoroughly investigated. ◄
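The stacking step described in this example can be sketched as follows. This is a simplified illustration, not the graph-based procedure of Rodrigues et al. (2017): the feature names and values are hypothetical, and each feature is simply rescaled to pixel intensities between 0 and 255.

```python
def normalise(series):
    """Scale a feature time series to integer pixel intensities in 0..255."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0 for _ in series]
    return [round(255 * (v - lo) / (hi - lo)) for v in series]


def visual_rhythm(features):
    """Stack one pixel row per feature; the image width is the number of time steps."""
    return [normalise(series) for series in features.values()]


# Hypothetical features derived from positional data, one value per time step.
features = {
    "team_spread_m": [20.0, 22.0, 25.0, 30.0, 28.0],
    "centroid_x_m": [50.0, 52.0, 55.0, 53.0, 50.0],
}
image = visual_rhythm(features)  # 2 rows (features) x 5 columns (time steps)
```

Each row of `image` corresponds to one feature and each column to one time step, so a longer situation produces a wider image.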
► Example 4
Another example involves an adaptation of the well-known AlphaGo algorithm by
Google DeepMind (Silver et al., 2016). Its victory against professional Go player Lee
Sedol was perceived as highly significant by the community at the time. However, just
2 years later, a variation called the AlphaZero algorithm (Zhang & Yu, 2020) was introduced.
This algorithm, based on the original AlphaGo, can play chess and shogi (a
Japanese chess variant) in addition to Go. Another distinctive feature of AlphaZero is that
it is trained solely by playing against itself and does not require external data (aside
from the rules of the games). In a showcase match against the then-gold-standard chess
engine Stockfish 8, AlphaZero scored 64 out of 100 points (28 wins, 72 draws, and no
losses), demonstrating the adaptability of the original algorithm and the similarity
between Go and chess. ◄
► Example 5
Transfer learning is also increasingly applied in other domains, such as in medicine,
where networks from the ImageNet Challenge (Russakovsky et al., 2015) can be trans-
ferred to the medical domain. An example is the “Medical Segmentation Decathlon”
challenge (Antonelli et al., 2021), in which a medical image dataset with correspond-
ing labels was released. By applying transfer learning in this challenge, medical disease
detection and prevention are significantly improved. For example, the model can decide
for an image whether a healthy vessel or a harmful tumor is depicted. Challenges in this
problem include the relatively low amount of data and the relative similarity between
the target classes. ◄
Study Box
Fig. 23.2 Visual representation of positional data following the example of Wagenaar
et al. (2017)
References
Antonelli, M., Reinke, A., Bakas, S., Farahani, K., Landman, B. A., Litjens, G., …, & Cardoso, M. J.
(2021). The medical segmentation decathlon. arXiv preprint arXiv:2106.05735.
De Campos, T. E., Khan, A., Yan, F., FarajiDavar, N., Windridge, D., Kittler, J., & Christmas, W.
(2013). A framework for automatic sports video annotation with anomaly detection and transfer
learning. Machine learning and cognitive science, collocated with EUCOGIII.
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classifica-
tion models: A methodology review. Journal of Biomedical Informatics, 35(5–6), 352–359.
Rodrigues, D. C. U. M., Moura, F. A., Cunha, S. A., & Torres, R. D. S. (2017, February). Visualizing
temporal graphs using visual rhythms-a case study in soccer match analysis. In International
conference on information visualization theory and applications (Vol. 4, pp. 96–107). SciTePress.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale
visual recognition challenge. International Journal of Computer Vision, 115, 211–252.
Russo, M. A., Kurnianggoro, L., & Jo, K. H. (2019, February). Classification of sports videos with
combination of deep learning models and transfer learning. In 2019 international conference on
electrical, computer and communication engineering (ECCE) (pp. 1–5). IEEE.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016).
Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
Wagenaar, M., Okafor, E., Frencken, W., & Wiering, M. A. (2017, February). Using deep convolu-
tional neural networks to predict goal-scoring opportunities in soccer. In International conference
on pattern recognition applications and methods (Vol. 2, pp. 448–455). SciTePress.
Zhang, H., & Yu, T. (2020). AlphaZero. In Deep reinforcement learning (pp. 391–415). Springer.
Random Forest
Justus Schlenger
Contents
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_24
Key Messages
• Random forest is a machine learning algorithm based on the use of a large number
of decision trees for predictions
• The method can be used without much fine-tuning of settings (out of the box)
• The Random Forest combines the simplicity and intuitiveness of decision trees
with the complexity and flexibility of ensemble methods
• The binary decision character of the trees lends itself particularly well to the
study of performance indicators in sports
24.2 Background
However, as it turns out, a single decision tree has very low flexibility towards
unknown data and tends to incorporate the random variance of the data into its decision
making (Biau & Scornet, 2016). This phenomenon is called “overfitting” and
results in a model that is not able to detect the actually relevant patterns in the data. For
this reason, the Random Forest makes use of an arbitrarily large ensemble of decision
trees, all of which have different and random subsets of the data and parameters
at their disposal. When such a training process is completed, the resulting
forest can be used just like a single tree to make predictions about unknown data.
Taking a classification problem as an example, the prediction of the Random
Forest algorithm corresponds to the class that received the most “votes” of the
individual trees. Thus, the Random Forest algorithm combines the intuitive and
simple structure of decision trees with the flexible and robust nature of ensemble
methods, which are used in many state-of-the-art machine learning methods (Hastie
et al., 2009). Last but not least, after training, those parameters can be
identified that contributed most to the prediction overall.
Definition
The Random Forest is an ensemble machine learning method dedicated to classi-
fication or regression tasks by constructing numerous slightly different and uncor-
related decision trees.
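The construction of such an ensemble can be sketched in plain Python. The sketch makes strong simplifying assumptions: every "tree" is only a single split (a decision stump), the data are invented, and all function names are chosen here for illustration. It does show the two sources of randomness, a bootstrap sample of the data and a random subset of the parameters for each tree, followed by a majority vote.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Find the single-threshold split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            for left_label in (0, 1):
                preds = [left_label if r[f] <= t else 1 - left_label for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, left_label)
    _, f, t, left_label = best
    return f, t, left_label


def predict_stump(stump, row):
    f, t, left_label = stump
    return left_label if row[f] <= t else 1 - left_label


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)   # random feature subset
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Invented data: (ball possession in %, share of running distance in %), 1 = win.
rows = [(62, 55), (58, 52), (65, 51), (55, 56), (45, 48), (40, 47), (48, 45), (42, 49)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = train_forest(rows, labels)
```

A call such as `predict_forest(forest, (61, 54))` then returns the class chosen by most of the 25 stumps.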
24.3 Applications
Fig. 24.2 Exemplary representation of a decision tree using the example of score prediction in
basketball. The parameters ball possession (BB) and running distance (LD) are used as decision
support for the prediction of victory (S) and defeat (N)
Fig. 24.3 Random forest with 7 decision trees using the basketball game as an example. The
trained RF is used to predict the outcome for the team “Boston Celtics”, which shows an average
possession of 61% and an average percentage of total running distance per game of 54%
responding data from past games are collected and injected into the model. This results
in the decision tree shown in Fig. 24.2.
When a team is considered, the classification starts at the root, following either the
left branch if the average possession is above 50% or the right branch if the average
possession is below 50%. Assuming our considered team has an average possession of
60%, another node follows which has a threshold at 75%. If our team is below this value
(False), a win is predicted. If this classification instead takes place with an already trained
random forest, each examined unit is passed through all decision trees in parallel.
In Fig. 24.3, hypothetical values have been chosen for the Boston Celtics team.
This results in a prediction of win (W) or loss (L) for the upcoming game in each of the
different decision trees.
We then check which of the two classes was assigned more frequently and
use the more frequent class as the final prediction by majority rule. ◄
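The traversal and the majority rule can be sketched as follows. The three trees are hypothetical and are not the trees shown in the figures; in particular, the assumption that the second node of the first tree again tests possession is made only for illustration.

```python
from collections import Counter


def node(feature, threshold, if_above, if_below):
    """A binary test; each branch is either another node or a class label."""
    return {"feature": feature, "threshold": threshold,
            "above": if_above, "below": if_below}


def classify(tree, team):
    """Follow the branches from the root until a leaf (class label) is reached."""
    current = tree
    while isinstance(current, dict):
        above = team[current["feature"]] > current["threshold"]
        current = current["above"] if above else current["below"]
    return current


# Hypothetical trees; structure and thresholds are illustrative only.
tree_a = node("possession", 50, node("possession", 75, "L", "W"), "L")
tree_b = node("distance_share", 50, "W", "L")
tree_c = node("possession", 65, "W", node("distance_share", 55, "W", "L"))
forest = [tree_a, tree_b, tree_c]

celtics = {"possession": 61, "distance_share": 54}
votes = Counter(classify(tree, celtics) for tree in forest)
prediction = votes.most_common(1)[0][0]  # class with the most votes
```

With these made-up trees, two trees vote for a win and one for a loss, so the forest predicts a win.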
► Example 2 RF
In this example, we consider a study by Smithies et al. (2021), which aims to find per-
formance indicators of the e-sport “Rocket League” that predict expertise and success
of e-sports players. In the video game “Rocket League,” each player controls a rocket-
powered sports car and attempts to launch a large ball into the opponent’s goal, similar
to the partially familiar “autoball.” In the process, the sports car can also drive along
walls and fly through the air by means of the rocket propulsion. The influence of metrics
such as “average speed”, “number of shots on goal” and “time spent in the air” on the
match result (success) and player rank (expertise) is tested. Here, the analysis of 20,000
matches revealed that “shots on goal taken”, “preventing opponent’s shots on goal” and
“ball saves” best predicted the final outcome of the match. On the other hand, for the
prediction of player rank (league system: bronze, silver, gold…), the metrics “time spent
on the ground” and “time in high speed”, for example, provided the highest accuracy. ◄
Study Box
In a study on soccer by Jamil et al. (2021), a set of key performance indicators (KPIs)
was used to try to distinguish Champions League goalkeepers from non-Champions
League goalkeepers. In this study, the Random Forest algorithm was used for classification,
among other methods, on a dataset of n = 14,671 match scenarios.
This binary classification shows that
the main difference lies not so much in the number of shots saved as in short passing with
the foot. The Random Forest algorithm achieved an accuracy of 0.66, which means that
66% of all predictions were correct. This is well above the 50% threshold
(base rate) that would be achieved by simple guessing. Predicting known player levels
to identify key performance indicators is one of the most common applications of
Random Forest in sports science. This framework can be adapted to different ques-
tions without any problems. Only the (input) parameters (independent variables)
and the coding of the target variable (dependent variable) have to be modified.
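The comparison between accuracy and base rate can be made concrete with a short sketch. The labels below are invented and only illustrate the computation; note that the base rate equals 50% only when both classes are equally frequent.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def base_rate(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# Invented labels: 1 = Champions League goalkeeper, 0 = other goalkeeper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

acc = accuracy(y_true, y_pred)
baseline = base_rate(y_true)
```

A model is only informative if `acc` clearly exceeds `baseline`, which is the comparison made in the study box.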
References
Biau, G., & Scornet, E. (2016). A random forest guided tour. TEST, 25(2), 197–227.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees.
Wadsworth & Brooks.
Groll, A., Ley, C., Schauberger, G., & Van Eetvelde, H. (2019). A hybrid random forest to predict
soccer matches in international tournaments. Journal of Quantitative Analysis in Sports, 15(4),
271–287. https://doi.org/10.1515/jqas-2018-0060.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Random forests. In The elements of statistical learn-
ing (pp. 587–604). Springer.
Jamil, M., Phatak, A., Mehta, S., Beato, M., Memmert, D., & Connor, M. (2021). Using multiple
machine learning algorithms to classify elite and sub-elite goalkeepers in professional men’s foot-
ball. Scientific Reports, 11(1), 1–7.
Jauhiainen, S., Kauppi, J. P., Leppänen, M., Pasanen, K., Parkkari, J., Vasankari, T., et al. (2021).
New machine learning approach for detection of injury risk factors in young team sport athletes.
International Journal of Sports Medicine, 42(2), 175–182.
Koenigstorfer, J., & Wemmer, F. (2019). What makes sports clubs successful at recruiting and retain-
ing members from the perspective of managers? Results from a random forest analysis. Journal of
Global Sport Management, 7, 644–663.
Myles, A. J., Feudale, R. N., Liu, Y., Woody, N. A., & Brown, S. D. (2004). An introduction to deci-
sion tree modeling. Journal of Chemometrics, 18(6), 275–285.
Smithies, T. D., Campbell, M. J., Ramsbottom, N., & Toth, A. J. (2021). A Random Forest approach
to identify metrics that best predict match outcome and player ranking in the esport Rocket
League. Scientific Reports, 11(1), 19285.
Statistical Learning
for the Modeling of Soccer
Matches
Gunther Schauberger and Andreas Groll
Contents
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_25
Key Messages
• For the modeling of soccer matches, the response/target variable can be defined
in different ways. Depending on the choice of the response variable, different
approaches of statistical learning or machine learning are suitable for the modeling
and prediction of soccer matches
• While at the beginning mainly classical regression methods were used, in recent
years machine learning methods such as extreme gradient boosting or random
forests have been applied more frequently
• Machine learning methods could in particular improve the prediction quality for
new matches in the future, but are also more complex and more difficult to interpret
In sports, and especially in soccer, by now a large amount of data is collected and
analyzed. An important goal of this evaluation is the modeling or prediction of
individual matches with the help of so-called statistical learning. One is often inter-
ested in the match outcome, which can be captured mathematically in different
ways, and how it depends on different covariates (features). For example, the match
outcome can be considered as the precise result in goals or as an ordinal variable
with the three categories “win team A”, “draw” and “win team B”. Typically, the
covariates are characteristics of the two competing teams, such as their respective
market values, standings, or previous performances. Furthermore, the general conditions
of the respective match (e.g., weather, home advantage, competitive vs. friendly
match, …) can also be included in the modeling. In principle, the modeling
approaches described here can also be applied to other sports, although there are
always special modeling requirements depending on the sport. For example, the
outcomes of tennis matches can simply be considered as binary variables (win
player A or win player B). The modeling of basketball matches, on the other hand,
differs from the modeling of soccer matches by the much higher numbers of scores
or points.
25.2 Background
The situation described above, where a certain response variable (in our case the
outcome of the match) is to be modeled with the help of covariates, represents a clas-
sic case of statistical learning, or more precisely so-called supervised statistical learn-
ing. The term “supervised” means here that for the individual observations (usually
the matches) not only covariates are available, but also the corresponding response.
In contrast, in the case of unsupervised learning, the response is unknown or not
available. Typical approaches from supervised learning are (linear) regression and
classification; typical examples of unsupervised learning are clustering methods. A
comprehensive introduction to statistical learning can be found in James et al. (2021).
In most cases, the basic question for the choice of the response variable is whether
the number of goals or the ordinal match outcome should be chosen. The more com-
mon choice is to model the number of goals. Since we are dealing with discrete count
data (0, 1, 2, 3, …), modeling using the so-called Poisson distribution is a good choice
instead of the commonly used normal distribution. For simplicity, the two numbers
of goals belonging to one match (i.e., the goals scored by Team A and Team B) are
often assumed to be (stochastically) independent (Groll et al., 2015). However, this
independence should only be understood as conditional independence (given the
information in the different covariates). The consequence of this assumption is that
the two goal counts of a match can be treated as two (conditionally) independent
observations in the data set. For example, a dataset of 100 matches would therefore
contain 200 observations. The alternative is to directly use the tuple of the two num-
bers of goals of a match as the response, which requires bivariate modeling or distri-
butional assumptions (Karlis & Ntzoufras, 2003; Groll et al., 2018). For example,
this can be solved using so-called copula regression (van der Wurp et al., 2020).
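The consequences of the independence assumption can be sketched in Python. The goal rates and matches below are invented, and the function names are chosen here for illustration: the first function shows how one match becomes two observations, the others derive outcome probabilities from two conditionally independent Poisson distributions.

```python
import math


def to_long_format(matches):
    """Turn each match into two observations, one per team's goal count."""
    rows = []
    for team_a, team_b, goals_a, goals_b in matches:
        rows.append({"team": team_a, "opponent": team_b, "goals": goals_a})
        rows.append({"team": team_b, "opponent": team_a, "goals": goals_b})
    return rows


def poisson_pmf(k, lam):
    """Probability of exactly k goals for an expected number of goals lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """P(win A), P(draw), P(win B) for independent Poisson goal counts."""
    win_a = draw = win_b = 0.0
    for i in range(max_goals + 1):
        for j in range(max_goals + 1):
            p = poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
            if i > j:
                win_a += p
            elif i == j:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


matches = [("A", "B", 2, 1), ("C", "D", 0, 0)]   # invented results
observations = to_long_format(matches)            # 2 matches -> 4 observations
p_win_a, p_draw, p_win_b = outcome_probabilities(1.6, 1.1)  # invented goal rates
```

In a fitted model, the two rates would depend on the covariates of both teams; here they are fixed numbers so that the arithmetic stays visible.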
An ordinal match result would usually only contain the information “win team
A”, “draw” and “win team B” or “defeat team A”, respectively, which can be
encoded with the values 1, 2 and 3. Where necessary, the target variable may also
reflect the margin of the victory (or defeat), resulting in more than three categories.
In this context, a match represents a single observation. The response variable then
is an ordinal variable with the possible values 1, 2 or 3. In a regression context, so-
called ordinal regression is used for modeling (Schauberger et al., 2018).
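The coding of the ordinal response can be written down directly. The three-category coding follows the values 1, 2 and 3 given above; the five-category variant and its two-goal cut-off are assumptions made here purely for illustration.

```python
def ordinal_outcome(goals_a, goals_b):
    """1 = win team A, 2 = draw, 3 = win team B."""
    if goals_a > goals_b:
        return 1
    if goals_a == goals_b:
        return 2
    return 3


def ordinal_outcome_with_margin(goals_a, goals_b, big_margin=2):
    """Five categories from clear win of A (1) to clear win of B (5).

    The cut-off `big_margin` is a hypothetical choice, not taken from the literature.
    """
    diff = goals_a - goals_b
    if diff >= big_margin:
        return 1
    if diff > 0:
        return 2
    if diff == 0:
        return 3
    if diff > -big_margin:
        return 4
    return 5
```

For instance, a 3:0 result is coded as 1 in both schemes, while a 1:2 result is coded as 3 in the first and as 4 in the second.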
In the simplest case, all of the possibilities mentioned above can be modeled by
a linear regression model, where the response is modeled as the sum of the linear
effects of the individual covariates. Usually, one unknown regression parameter
(effect) is estimated per covariate (see Chap. 16). Sometimes a large number of
(potential) covariates can be involved in the modeling so that a very large number
of parameters would have to be estimated, with the consequence that ordinary
methods would be unstable or even unfeasible. In this situation, so-called regular-
ization methods, in particular variable selection techniques such as Lasso
(Tibshirani, 1996) or boosting (Friedman, 2001), can be helpful.
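For the Poisson case, the combination of a linear predictor and a Lasso penalty can be written as follows. The notation is an assumption chosen here for illustration and is not taken from the cited papers: $y_i$ is the number of goals in observation $i$, $x_{ij}$ the value of covariate $j$, $\beta_j$ the corresponding effect, and $\xi \ge 0$ a tuning parameter that controls the strength of the penalty.

```latex
\begin{align}
  y_i &\sim \mathrm{Poisson}(\lambda_i), \qquad
  \log \lambda_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\,\beta_j, \\
  (\hat{\beta}_0, \hat{\beta}) &= \arg\max_{\beta_0,\,\beta}
  \; \sum_{i=1}^{n} \bigl( y_i \log \lambda_i - \lambda_i \bigr)
  \; - \; \xi \sum_{j=1}^{p} \lvert \beta_j \rvert .
\end{align}
```

The first sum is the Poisson log-likelihood up to a constant. For larger values of $\xi$, more effects $\beta_j$ are set exactly to zero, which is how the penalty performs variable selection.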
Definition
Statistical learning (Hastie et al., 2009) covers a large number of methods designed
to extract information from data. In so-called supervised learning, there is a
response variable that is to be explained and predicted using various covariates.
The relationship between the target variable and the covariates is observed on a
learning data set and, if necessary, is transferred to new observations to predict
their responses.
25.3 Applications
► Example 1
For the prediction of the 2014 FIFA World Cup (WC), Groll et al. (2015) considered
the modeling of matches from previous FIFA WCs, based on a learning dataset cov-
ering all matches of the WCs 2002–2010. Following the model of Dixon and Coles
(1997), a Poisson model was estimated in which the numbers of goals of a match were
assumed to be conditionally independent. However, their original (rather simple) model
was extended to include team-specific covariates. To handle the large number of param-
eters involved, a Lasso regularization was used to estimate the model. Variations of this
model were used to model and predict the 2019 IHF Handball World Cup (Groll et al.,
2020). For the prediction of the 2018 World Cup, the Poisson model was replaced by
a so-called random forest (a special machine learning model, see Groll et al., 2019; see
also Study Box). Further extensions of this model were then used in the prediction of
the 2019 FIFA World Cup in women’s soccer (Groll et al., 2019) and the 2020 UEFA
European Championship (EURO) in men’s soccer (Groll et al., 2021). In the latter work,
the prediction performance of the random forest approach was compared to so-called
extreme gradient boosting (xgboost, see Chen and Guestrin, 2016). ◄
► Example 2
In the context of modeling and predicting the UEFA EURO 2016, Groll et al. (2018)
investigated the validity of the assumption of conditional independence of the two num-
bers of goals of a match. For this purpose, they used a regression model based on the
bivariate Poisson distribution, which was first introduced by Karlis and Ntzoufras (2003)
for modeling soccer matches. Estimation of this model was performed via boosting.
The bivariate modeling did not show any advantages over the simpler (independence-based)
modeling here.
In addition to the specific modeling assumption of a bivariate Poisson distribution,
copula models can also be used, where the marginal distributions are based on univariate
Poisson distributions. Such approaches were applied in the modeling of FIFA World
Cup matches by van der Wurp et al. (2020) and Van der Wurp and Groll (2021). ◄
► Example 3
In Schauberger et al. (2018), the influence of so-called match-specific variables on the
outcome of matches in the German soccer Bundesliga was investigated. Examples of
such match-specific covariates are the ball possession percentage or the running distance
of the two teams. For this purpose, an ordinal variable with five categories was used as
the response variable in order to be able to differentiate between higher and lower vic-
tory margins in addition to draws.
In Schauberger and Groll (2018), a special ordinal variant of the random forest was
used to model FIFA World Cup matches, where the match outcome was considered as
an ordinal variable with the categories “win team A,” “draw,” and “win team B” and
contrasted with different models for univariate goal modeling. ◄
Study Box
For the prediction of the FIFA World Cup 2018, Groll et al. (2019) created a training
dataset including all matches of the previous World Cups 2002–2014. Here, the indi-
vidual numbers of goals were considered as (conditionally) independent observa-
tions of the response variable, i.e., resulting in two observations per match (cf.
situation of Example 1). Various variables were collected for the competing teams,
namely economic factors (such as the GDP of the respective country), sporting factors
(such as the respective position in the FIFA world rankings), variables representing
the home advantage (e.g., a dummy variable for whether one of the two
teams in a match is the home team, i.e., the host country of the World Cup), variables
related to the team structure (such as the average age of the respective team),
and variables related to the respective coach (e.g., the length of his tenure to date).
Furthermore, another sporting variable was added, which has a special role, since it
was not directly available, but had to be estimated by a separate statistical model.
The variable represents ability parameters which, for each team, reflect their current
strength (at the time of the start of the respective World Cup) and were estimated
based on historical matches (all matches from the last 6 years with a time weighting
factor).
A hybrid random forest model was then fitted on this training data set. The
model is referred to as hybrid because team-specific ability parameters, which themselves
were derived from another statistical model, were added to the set of covariates.
Based on the estimated model fit and the teams’ covariates for the upcoming
2018 World Cup, the entire course of the tournament was then simulated 100,000
times. Thus, winning probabilities for all 32 participating teams could be determined.
In retrospect, the prediction performance of this hybrid random forest model was
compared on all 64 matches of the 2018 World Cup to various other statistical mod-
eling approaches, in particular also an ordinary random forest (i.e., without the
hybrid team-specific abilities) as well as the bookmakers’ betting odds. The hybrid
random forest model performed best and achieved very satisfying results overall.
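The simulation step can be sketched in a strongly simplified form. This is not the procedure of Groll et al. (2019): the sketch assumes a four-team knockout bracket instead of the full World Cup format, invented goal rates that do not depend on the opponent, and a coin flip to decide drawn matches.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def play_knockout(team_a, team_b, strength, rng):
    """Simulate one knockout match; a draw is decided by a coin flip."""
    goals_a = sample_poisson(strength[team_a], rng)
    goals_b = sample_poisson(strength[team_b], rng)
    if goals_a == goals_b:
        return rng.choice([team_a, team_b])
    return team_a if goals_a > goals_b else team_b


def simulate_tournament(teams, strength, rng):
    """Single-elimination bracket; the number of teams must be a power of two."""
    while len(teams) > 1:
        teams = [play_knockout(teams[i], teams[i + 1], strength, rng)
                 for i in range(0, len(teams), 2)]
    return teams[0]


def winning_probabilities(teams, strength, n_sim=10_000, seed=1):
    """Estimate each team's title probability as its share of simulated titles."""
    rng = random.Random(seed)
    wins = Counter(simulate_tournament(list(teams), strength, rng)
                   for _ in range(n_sim))
    return {team: wins[team] / n_sim for team in teams}


# Hypothetical expected goals per match for four teams.
strength = {"A": 2.0, "B": 1.2, "C": 1.0, "D": 0.8}
probs = winning_probabilities(["A", "B", "C", "D"], strength)
```

The relative frequency with which a team wins the simulated tournament serves as the estimate of its winning probability; more simulation runs reduce the random error of this estimate.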
References
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
(pp. 785–794).
Dixon, M. J., & Coles, S. G. (1997). Modelling association football scores and inefficiencies in the
football betting market. Journal of the Royal Statistical Society: Series C (Applied Statistics), 46,
265–280.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of
Statistics, 29, 337–407.
Groll, A., Heiner, J., Schauberger, G., & Uhrmeister, J. (2020). Prediction of the 2019 IHF world
men’s handball championship—A sparse Gaussian approximation model. Journal of Sports
Analytics, 6(3), 187–197.
Groll, A., Hvattum, L. M., Ley, C., Popp, F., Schauberger, G., Van Eetvelde, H., & Zeileis, A. (2021).
Hybrid machine learning forecasts for the UEFA EURO 2020. arXiv preprint arXiv:2106.05799.
Groll, A., Kneib, T., Mayr, A., & Schauberger, G. (2018). On the dependency of soccer scores—A
sparse bivariate Poisson model for the UEFA European football championship 2016. Journal of
Quantitative Analysis in Sports, 14(2), 65–79.
Groll, A., Ley, C., Schauberger, G., & Van Eetvelde, H. (2019). A hybrid random forest to predict
soccer matches in international tournaments. Journal of Quantitative Analysis in Sports, 15(4),
271–287.
Groll, A., Ley, C., Schauberger, G., Van Eetvelde, H., & Zeileis, A. (2019). Hybrid machine learning
forecasts for the FIFA women’s world cup 2019. arXiv preprint arXiv:1906.01131.
Groll, A., Schauberger, G., & Tutz, G. (2015). Prediction of major international soccer tournaments
based on team-specific regularized Poisson regression: An application to the FIFA world cup
2014. Journal of Quantitative Analysis in Sports, 11, 115–197.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining,
inference, and prediction (Vol. 2). Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning.
Springer.
Karlis, D., & Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal
of the Royal Statistical Society: Series D (The Statistician), 5, 381–393.
Schauberger, G., & Groll, A. (2018). Predicting matches in international football tournaments with
random forests. Statistical Modelling, 18(5–6), 460–482.
Schauberger, G., Groll, A., & Tutz, G. (2018). Analysis of the importance of on-field covariates in the
German Bundesliga. Journal of Applied Statistics, 45(9), 1561–1578.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statisti-
cal Society Series B: Statistical Methodology, 58(1), 267–288.
Van der Wurp, H., & Groll, A. (2021). Introducing LASSO-type penalisation to generalised joint
regression modelling for count data. AStA Advances in Statistical Analysis, 107, 127–151.
van der Wurp, H., Groll, A., Kneib, T., Marra, G., & Radice, R. (2020). Generalized joint regres-
sion for count data: A penalty extension for competitive settings. Statistics and Computing, 30,
1419–1432.
Open-Set Recognition
Ricardo da Silva Torres
Contents
References
© The Author(s), under exclusive license to Springer-Verlag GmbH, DE, part of Springer Nature 2024
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_26
Key Messages
• In several practical pattern recognition problems, machine learning solutions
should be able to inform if a sample does not belong to any of the classes considered
for training
• In Sports Science, several practical problems should be modelled as open-set
recognition tasks
• The use of open-set recognition solutions in Sports Science applications is a
problem overlooked in the literature
Computer Vision is one of the areas of Computer Science dedicated to the interpre-
tation and understanding of multimedia data. Several Computer Vision approaches
rely on the use of machine learning methods. Naik et al. (2022) provide an overview of recent
literature in the area focusing on Sports Science. In several of the surveyed applications,
open-set recognition (OSR) algorithms could be investigated. Two examples
will be used to illustrate the use of an OSR formulation. The first one refers to the
automatic classification of tactics employed by soccer teams, while the second refers
to the problem of recognizing relevant players’ actions in soccer videos.
26.2 Background
Huge collections of sport-related data have been created due to technological inno-
vations related to the development of monitoring systems and the availability of
low-cost powerful storage and processing computer systems. The adequate analysis
of the available data has been recognized as a valuable asset in supporting sports
analysis aiming at better-informed decision-making (Goes et al., 2021; Rein &
Memmert, 2016).
One relevant trend in the area of Sports Science refers to the use of data-driven
methods for supporting knowledge discovery. Machine learning methods are among the most
widely and successfully used of these approaches. Machine learning (ML) is
a subset of Artificial Intelligence technologies used to learn from data, aiming to
support prediction and inference tasks or the identification of associations among
data (Enholm et al., 2021). Existing ML methods are often grouped into four dif-
ferent categories: supervised, unsupervised, semi-supervised, and reinforcement
learning approaches. Supervised approaches (e.g., classifiers) assume the existence
of labels (class or category) associated with samples used in the training process. In
this case, models trained using a labeled collection are expected to generalize to
unseen samples at the test phase, i.e., a classifier is considered effective or accurate
if its use leads to correct predictions of the labels associated with test samples.
What if the testing sample does not belong to any of the categories considered
in the training phase? In this case, the ML method should not assign labels
considered in the training set; it should somehow inform that the testing sample
belongs to an unknown class. In fact, in several practical problems, the number of
classes or categories to be considered in the design and implementation of ML
methods cannot (or should not) be defined in advance.
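The idea of informing that a sample belongs to an unknown class can be sketched with a simple rejection rule. The following is a minimal illustration, not a method taken from the cited literature: it assumes a classifier that outputs one raw score per known class, and it rejects a sample when the highest softmax probability falls below a threshold. The class labels, the scores, and the threshold of 0.7 are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def predict_open_set(scores, labels, threshold=0.7):
    """Return the most probable known label, or "unknown" if confidence is too low.

    The threshold is an assumed value; in practice it would be tuned on held-out data.
    """
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else "unknown"

labels = ["pass", "shot", "tackle"]  # hypothetical known action classes
print(predict_open_set([4.0, 0.5, 0.2], labels))  # one class clearly dominates
print(predict_open_set([1.0, 0.9, 1.1], labels))  # no class dominates, so the sample is rejected
```

Thresholding the output probability is only the simplest possible option; the approaches discussed in the following Definition box replace it with more principled mechanisms.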
Definition
In machine learning, most of the time, we do not need, do not have access to, or are not
aware of all possible classes to consider at training time (de Oliveira Werneck et al., 2019;
Mendes Júnior et al., 2017; Neira et al., 2018). For instance, when classifying whether or
not a video contains a particular action of an athlete or a referee, we might have training
examples of only positive cases, i.e., videos associated with a predefined set of possible
actions (e.g., Naik et al., 2022). Open-set recognition (OSR) refers to the problem of
identifying the unknown classes during testing while maintaining performance on the
known classes (Oza & Patel, 2019).
The first initiatives towards the definition of OSR algorithms relied on the extension of
consolidated classification methods, e.g., Support Vector Machines as explored by Scheirer
et al. (2012), and class proximity information as investigated by Mendes Júnior et al. (2017),
and Cardoso et al. (2017). More recently, deep learning approaches became a trend. Earlier
initiatives focused on predicting unknown samples in the final layer of proposed
architectures (Bendale & Boult, 2016; Ge et al., 2017; Liang et al., 2017). Liang et al. (2017), for
example, presented ODIN, an approach that explores temperature scaling and small
perturbations in the detection of out-of-distribution samples. Bendale and Boult (2016)
introduced a reweighting scheme to redefine the output probabilities to detect unknown samples.
The proposed formulation, known as OpenMax, was extended by Ge et al. (2017), who
incorporated training procedures involving synthetic images generated by a Generative
Adversarial Network (GAN). In another research avenue, studies have focused on
generative OSR algorithms (Geng et al., 2020), which incorporate input reconstruction errors into
the deep neural network training process to support the classification of samples (Oza &
Patel, 2019; Sun et al., 2020; Yoshihashi et al., 2019). Yoshihashi et al. (2019) introduced a
framework for Classification-Reconstruction learning for OSR. Oza and Patel (2019), in
turn, proposed class-conditioned auto-encoders for the OSR problem, while Sun et al.
(2020) focused on a scheme based on Conditional Gaussian Distribution Learning.
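To make the use of class proximity information more concrete, the sketch below implements a simplified distance-ratio rule in the spirit of nearest-neighbor open-set classifiers. It is an illustration under stated assumptions, not the published algorithm: a sample is accepted only if it is much closer to its nearest training sample than to the nearest training sample of any other class. The two-dimensional feature vectors, the labels, and the threshold of 0.5 are hypothetical, and at least two known classes are required.

```python
import math

def distance_ratio_predict(sample, training, threshold=0.5):
    """Classify `sample` by its nearest neighbor, or reject it as "unknown".

    `training` is a list of (feature_vector, label) pairs with at least two
    distinct labels. `threshold` is an assumed value that would normally be
    tuned on validation data.
    """
    # Nearest training sample overall.
    nearest_vec, nearest_label = min(
        training, key=lambda item: math.dist(sample, item[0])
    )
    d_nearest = math.dist(sample, nearest_vec)
    # Nearest training sample that carries a different label.
    d_other = min(
        math.dist(sample, vec) for vec, label in training if label != nearest_label
    )
    ratio = d_nearest / d_other if d_other > 0 else float("inf")
    return nearest_label if ratio <= threshold else "unknown"

# Hypothetical training data: two known tactic classes in a 2-D feature space.
training = [
    ((0.0, 0.0), "defensive"), ((1.0, 0.0), "defensive"),
    ((10.0, 0.0), "offensive"), ((9.0, 0.0), "offensive"),
]
print(distance_ratio_predict((0.5, 0.0), training))  # close to one class only
print(distance_ratio_predict((5.5, 0.0), training))  # between both classes, so rejected
```

A ratio close to 1 means the sample is almost equally far from two classes, which is treated here as evidence that it may belong to neither.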
26.3 Applications
Study Box
To the best of our knowledge, the investigation of OSR solutions in the context of
the Sports Science domain is still an overlooked problem in the literature (refer to
Boult et al. (2019) for a list of applications of OSR). Recent initiatives have
investigated open-set approaches in the context of Sports data analysis. Yoon et al. (2019)
investigated the use of a deep learning architecture for open-set recognition that
explores spatiotemporal representation based on the learning of motion and appear-
ance patterns. The study investigated the action recognition problem using datasets
composed of actions such as running, walking, climbing, jumping, ball kicking, etc.
In another study, Burns et al. (2022) investigated the use of deep triplet embeddings
for personalized activity recognition using data obtained by inertial sensors. In their
formulation, a deep learning model is trained by minimizing the distances of sam-
ples belonging to the same class and maximizing the distances of those of different
categories. The validation included datasets comprising not only daily activities but also
exercise and physiotherapy activities.
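The training objective described above, pulling samples of the same class together and pushing samples of different classes apart, is commonly expressed as a triplet loss. The sketch below shows the standard margin-based form on plain coordinate tuples; it is an illustration only, and the exact loss, the margin of 1.0, and the embedding values used by the cited study are not stated here and are assumptions.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on embedding vectors.

    The loss is zero once the negative sample is at least `margin` farther
    from the anchor than the positive sample; otherwise it is positive.
    """
    d_pos = math.dist(anchor, positive)  # distance to a sample of the same class
    d_neg = math.dist(anchor, negative)  # distance to a sample of another class
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 2-D embeddings of three activity recordings.
print(triplet_loss((0.0, 0.0), (1.0, 0.0), (4.0, 0.0)))  # well separated
print(triplet_loss((0.0, 0.0), (3.0, 0.0), (1.0, 0.0)))  # negative is closer than positive
```

In an open-set setting, the distances in such a learned embedding space can then be used to decide whether a new sample is close enough to any known class.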
References
Bendale, A., & Boult, T. E. (2016). Towards open set deep networks. In Proceedings of the IEEE
conference on computer vision and pattern recognition (pp. 1563–1572).
Boult, T. E., Cruz, S., Dhamija, A. R., Gunther, M., Henrydoss, J., & Scheirer, W. J. (2019, July).
Learning and the unknown: Surveying steps toward open world recognition. In Proceedings of
the AAAI conference on artificial intelligence (vol. 33, no. 1, pp. 9801–9807).
Burns, D., Boyer, P., Arrowsmith, C., & Whyne, C. (2022). Personalized activity recognition with deep
triplet embeddings. Sensors, 22(14), 5222.
Cardoso, D. O., Gama, J., & França, F. M. (2017). Weightless neural networks for open set recogni-
tion. Machine Learning, 106(9), 1547–1567.
de Oliveira Werneck, R., Raveaux, R., Tabbone, S., & da Silva Torres, R. (2019). Learning cost func-
tion for graph classification with open-set methods. Pattern Recognition Letters, 128, 8–15.
Enholm, I. M., Papagiannidis, E., Mikalef, P., & Krogstie, J. (2021). Artificial intelligence and busi-
ness value: A literature review. Information Systems Frontiers, 1–26.
Ganesh, Y., Sri Teja, A., Munnangi, S. K., & Rama Murthy, G. (2019, June). A novel framework for
fine grained action recognition in soccer. In International work-conference on artificial neural net-
works (pp. 137–150). Springer.
Ge, Z., Demyanov, S., Chen, Z., & Garnavi, R. (2017). Generative openmax for multi-class open set
classification. arXiv preprint arXiv:1707.07418.
Geng, C., Huang, S. J., & Chen, S. (2020). Recent advances in open set recognition: A survey. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 43(10), 3614–3631.
Goes, F. R., Meerhoff, L. A., Bueno, M. J. O., Rodrigues, D. M., Moura, F. A., Brink, M. S., et al.
(2021). Unlocking the potential of big data to support tactical performance analysis in profes-
sional soccer: A systematic review. European Journal of Sport Science, 21(4), 481–496.
Liang, S., Li, Y., & Srikant, R. (2017). Enhancing the reliability of out-of-distribution image detec-
tion in neural networks. arXiv preprint arXiv:1706.02690.
Mendes Júnior, P. R., De Souza, R. M., Werneck, R. D. O., Stein, B. V., Pazinato, D. V., de Almeida,
W. R., et al. (2017). Nearest neighbors distance ratio open-set classifier. Machine Learning,
106(3), 359–386.
Naik, B. T., Hashmi, M. F., & Bokde, N. D. (2022). A comprehensive review of computer vision in
sports: Open issues, future trends and research directions. Applied Sciences, 12(9), 4429.
Neira, M. A. C., Júnior, P. R. M., Rocha, A., & Torres, R. D. S. (2018). Data-fusion techniques for
open-set recognition problems. IEEE Access, 6, 21242–21265.
Oza, P., & Patel, V. M. (2019). C2ae: Class conditioned auto-encoder for open-set recognition. In
Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2307–
2316).
Rein, R., & Memmert, D. (2016). Big data and tactical analysis in elite soccer: Future challenges and
opportunities for sports science. Springerplus, 5(1), 1410.
Scheirer, W. J., de Rezende Rocha, A., Sapkota, A., & Boult, T. E. (2012). Toward open set recogni-
tion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7), 1757–1772.
Sun, X., Yang, Z., Zhang, C., Ling, K. V., & Peng, G. (2020). Conditional gaussian distribution learn-
ing for open set recognition. In Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition (pp. 13480–13489).
Suzuki, G., Takahashi, S., Ogawa, T., & Haseyama, M. (2018, October). Team tactics estimation in
soccer videos via deep extreme learning machine based on players formation. In 2018 IEEE 7th
global conference on consumer electronics (GCCE) (pp. 116–117). IEEE.
Wu, L., Yang, Z., Wang, Q., Jian, M., Zhao, B., Yan, J., & Chen, C. W. (2020). Fusing motion pat-
terns and key visual information for semantic event recognition in basketball videos.
Neurocomputing, 413, 217–229.
Yoon, Y., Yu, J., & Jeon, M. (2019). Spatio-temporal representation matching-based open-set action
recognition by joint learning of motion and appearance. IEEE Access, 7, 165997–166010.
Yoshihashi, R., Shao, W., Kawakami, R., You, S., Iida, M., & Naemura, T. (2019). Classification-
reconstruction learning for open-set recognition. In Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition (pp. 4016–4025).
VII
Visualization
Contents
Visualization: Basics and Concepts
Daniel Link
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_27
Key Messages
• Visualizations in sports serve to reveal patterns, trends, or outliers in sports-related
information, such as physiological, biochemical, or technical-tactical performance data.
• The goal is to enable quick answers to performance diagnostic questions, for
example regarding the tactical behavior of the opponent, strength and weakness
profiles, or correlations between training input and performance development.
• In addition to general graphical elements such as network diagrams or bar
charts, it is often worthwhile to use sport-specific visualization methods that are
tailored to the respective question of interest.
27.2 Background
27.3 Applications
► Example 1
In the German Volleyball Association, visualizations of the set structure are used for
opponent and self-analyses in beach volleyball. After the raw data has been
collected by analysts, it is loaded into dedicated analysis software (BeachViewer) (Link
& Ahmann, 2013) and a graph of the match progress is generated (Fig. 27.1). The
visualization is axisymmetric along the horizontal center line, which contains the score.
The vertical lines separate the individual rallies and visualize timeouts.
For the first attack sequence after the serve (sideout), the temporal action sequence
of a team is represented symbolically from the inside to the outside. For the serve and
reception, the quality levels are indicated (−, o, +, ++). The attack technique is
represented by a straight (spike), curved (shot), or dashed (drive) arrow, colored green
(point in the first attack), yellow (rally was not ended by the first attack), red (attack
error or kill block), or orange (special cases). The filled circle (rally success) indicates
whether the rally was won as a sideout or as a break: successful sideouts are marked in
green and breaks in orange. The point difference gives a quick overview of whether and by
how much a team is ahead or behind. This visualization allows analysts to identify phases of
strength and weakness, tactical changes, or correlations between context variables (e.g.,
after errors, after timeouts) and player actions. The graphic is not static: individual
elements or even entire rallies can be hidden via filter settings. When an element is
clicked, the corresponding rally is played back in the video so that the situation can be
analyzed qualitatively. ◄
► Example 2
In beach volleyball, as in other sports, it is advisable to consider actions in their
temporal context (Link, 2022). A relevant question in the context of strategy development is
how a player acts after his own misses in the attack. Specialized representations adapted
to this question make it possible to answer it. Figure 27.2 shows, for one player in five
games, the number of attacks by field zone (line, center, diagonal) for the techniques hard
hit (left) and shot (right) that were played after a hard diagonal hit into the block.
However, the frequencies alone have little diagnostic value; their value only arises when
they are presented relative to a norm. For this purpose, a color coding of the target zones
is used: green coloring of a zone codes a positive deviation from the norm, red coloring a
negative one. The norm here is the distribution of the attack direction of all hard shots
in the sample (i.e., not only those after an error).
Fig. 27.2 Representation of the spatial distribution of attacks in beach volleyball after misses in
the previous rally by court zone. The colors code the deviation in relation to the spatial distribution
of all attacks
In this example, it can be seen that after a diagonal spike into the block, the player
shows a tendency not to spike diagonally again, but rather tries to hit the middle area of
the field in case of another hard spike. If the attack is performed as a shot, no systematic
deviation from the norm can be detected visually. A possible practical consequence of
this analysis could be not to make another diagonal block against this player after he has
spiked diagonally into the block, but at least to try to cover the middle with one hand. ◄
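The color coding relative to a norm used in this example can be sketched in a few lines. The code below is an illustration only: the attack counts are invented, and the tolerance band of five percentage points (within which a zone stays neutral) is an assumption that does not come from the text.

```python
def zone_colors(after_error, norm, tolerance=0.05):
    """Color each target zone by how its share of attacks deviates from the norm.

    `after_error` and `norm` map zone names to attack counts. A zone is green
    if its share after an error exceeds the norm share by more than
    `tolerance`, red if it falls below by more than `tolerance`, else neutral.
    """
    total_after = sum(after_error.values())
    total_norm = sum(norm.values())
    colors = {}
    for zone in norm:
        deviation = after_error.get(zone, 0) / total_after - norm[zone] / total_norm
        if deviation > tolerance:
            colors[zone] = "green"
        elif deviation < -tolerance:
            colors[zone] = "red"
        else:
            colors[zone] = "neutral"
    return colors

# Hypothetical counts: hard hits after a blocked diagonal hit vs. all hard hits.
after_error = {"line": 6, "center": 10, "diagonal": 4}
norm = {"line": 30, "center": 20, "diagonal": 50}
print(zone_colors(after_error, norm))
```

With these invented numbers, the center zone is played more often than the norm (green) and the diagonal zone less often (red), which mirrors the kind of pattern discussed in the example.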
► Example 3
In soccer game analyses, coaches and video analysts sometimes refer, explicitly or
implicitly, to the concept of availability. This tactical construct describes the
probability of success with which a player can pass the ball to a chosen teammate in a game
situation. Availability is related to pass risk (Power et al., 2017), but it does not
take a retrospective view of the risk of a pass that was actually played; rather, it asks
whether there was any possible pass at all that a teammate could reach with an acceptable
chance of success and in a space that was worthwhile from a tactical perspective.
Figure 27.3 shows a visualization of availability for a moment in a soccer game.
The black circle marks the player in possession of the ball, and red areas show the
availability of a player, with the degree of transparency indicating the probability of a
successful pass to that location. The players’ labels show the accumulated probability
value for a successful pass. The calculation is based on spatiotemporal data (see Chap.
10) and a physical model calibrated via machine learning methods using ~100,000 real
passes from the soccer Bundesliga. The model is described in detail by Dick et al. (2022)
and uses the time each player needs to reach an interception location on the ball
trajectory, based on a player motion model. The calculation is done for a variety of ball
trajectories, ball velocities, ball heights, and interception points.
Such a visualization is mainly useful for supporting the evaluation of individual game
situations in qualitative game analyses. For example, situations can be extracted
automatically in which only a few players were available for a pass after winning the ball
or in which poor passing decisions were made. For the passer, the visualization can be
used to show possible passing options or to make risk-benefit trade-offs transparent. For
the pass receiver, it can answer whether players make themselves available in free spaces,
and how long a player needs to do so. Both may provide valuable information for training
design. ◄
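The core idea of comparing the ball's travel time with the time players need to reach an interception location can be illustrated with a deliberately simplified sketch. This is not the model of Dick et al. (2022): it assumes a straight ground pass at constant speed, players who run at constant speed without reaction time or acceleration, and it returns a single player rather than a probability. All speeds, positions, and names are hypothetical.

```python
import math

def first_to_reach(passer, target, players, ball_speed=15.0, player_speed=7.0, steps=50):
    """Return the first player who can reach the ball trajectory in time.

    The pass is sampled at `steps` points between `passer` and `target`. At
    each point, a player can intercept if running there takes no longer than
    the ball needs to arrive. Returns None if nobody reaches the ball.
    """
    for i in range(1, steps + 1):
        frac = i / steps
        point = (
            passer[0] + frac * (target[0] - passer[0]),
            passer[1] + frac * (target[1] - passer[1]),
        )
        ball_time = math.dist(passer, point) / ball_speed
        for name, position in players.items():
            if math.dist(position, point) / player_speed <= ball_time:
                return name
    return None

# Hypothetical positions in meters; the passer is at the origin.
blocked = {"opponent": (10.0, 1.0), "teammate": (20.0, 0.0)}
open_lane = {"opponent": (10.0, 12.0), "teammate": (20.0, 0.0)}
print(first_to_reach((0.0, 0.0), (20.0, 0.0), blocked))    # opponent stands in the passing lane
print(first_to_reach((0.0, 0.0), (20.0, 0.0), open_lane))  # opponent is too far away
```

Repeating such a check over many ball trajectories, velocities, and heights, and replacing the hard yes/no decision with calibrated probabilities, leads toward the kind of availability map shown in the figure.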
Study Box
In the study “A Topography of Free Kicks in Soccer” (Link et al., 2016), the effects
of the execution location on characteristic performance variables were investigated
on the basis of a sample of 1833 free kicks from the Bundesliga soccer league.
Instead of simple heatmaps, so-called isomaps (Stöckl et al., 2012) were used, which
continuously (and not discretely) represent the mean value of a variable on a
two-dimensional surface via color gradients (Fig. 27.4). Using this visualization, it
was possible to show, for example, how centrality and proximity to the goal influence
the type of execution of the free kick (Fig. 27.4, left). The visualization of this
relationship provides a kind of norm for the decision between a shot on goal and a
cross or pass in professional soccer. Similarly, it was shown that crosses from the right
tended to be more successful than those from the left (Fig. 27.4, right). The reason for
this is that crosses from the right tend to be played by right-footed players with a
trajectory away from the goal, and crosses from the left with a trajectory towards the
goal. Since more players are right-footed, it can be argued that the trajectory away from
the goal was more successful, as the balls may have been less likely to be intercepted by
the goalkeeper. Likewise, the figure shows that crosses from behind the defense near the
goal line were particularly successful. This may be due to the fact that offensive players
were able to initiate running movements to the ball more quickly due to the lack of
offside danger.
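The difference between a discrete heatmap and a continuous representation of a mean value can be illustrated with a kernel-weighted average. The sketch below is one generic way to obtain such a smooth surface and is not claimed to be the isomap procedure of Stöckl et al. (2012); the Gaussian kernel, the bandwidth of 5 m, and the sample data are assumptions.

```python
import math

def smoothed_mean(x, y, samples, bandwidth=5.0):
    """Gaussian-weighted mean of the values observed around the point (x, y).

    `samples` is a list of (x, y, value) tuples, e.g., value = 1.0 for a
    successful cross and 0.0 otherwise. Nearby samples get higher weights.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        squared_distance = (x - sx) ** 2 + (y - sy) ** 2
        weight = math.exp(-squared_distance / (2 * bandwidth ** 2))
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else float("nan")

# Hypothetical free kicks: (x position, y position, success flag).
samples = [(0.0, 0.0, 1.0), (10.0, 0.0, 0.0)]
print(smoothed_mean(0.0, 0.0, samples))  # dominated by the successful kick nearby
print(smoothed_mean(5.0, 0.0, samples))  # halfway between both kicks
```

Evaluating such a function on a fine grid of pitch positions and mapping the result to a color gradient yields a surface without the hard cell borders of a binned heatmap.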
Fig. 27.4 Spatial distribution of the values of performance variables of free kicks
shown via isomaps. The left graph color-codes the proportion of free kicks
played as a shot on goal (rather than as a cross or pass). The right graph represents the
proportion of successful crosses (first subsequent ball contact by a teammate)
References
Chen, C. H., Härdle, W. K., & Unwin, A. (Eds.). (2007). Handbook of data visualization. Springer Science & Business Media.
Dick, U., Link, D., & Brefeld, U. (2022). Who can receive the pass?—a computational model for quantifying availability in soccer. Data Mining and Knowledge Discovery, 36(3), 987–1014. https://doi.org/10.1007/s10618-022-00827-2
Link, D. (2018). Sports analytics—how (commercial) sports data create new opportunities for sports science. German Journal of Exercise and Sport Research, 48(1), 13–26. https://doi.org/10.1007/s12662-017-0487-7
Link, D. (2022). Spielanalyse in der Praxis: Beachvolleyball. In D. Memmert (Ed.), Spielanalyse im Sportspiel (pp. 43–51). Springer Spektrum Berlin, Heidelberg.
Link, D., & Ahmann, J. (2013). Moderne Spielbeobachtung auf Basis von Positionsdaten. Sportwissenschaft, 43(1), 1–11. https://doi.org/10.1007/s12662-013-0282-z
Link, D., Kolbinger, O., Weber, H., & Stöckl, M. (2016). A topography of free kicks in soccer. Journal of Sports Sciences, 34(24), 2312–2320. https://doi.org/10.1080/02640414.2016.1232487
Power, P., Ruiz, H., Wei, X., & Lucey, P. (2017). Not all passes are created equal. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, Halifax (Canada) 2017 (pp. 1605–1613). ACM. https://doi.org/10.1145/3097983.3098051
Stöckl, M., Lamb, P. F., & Lames, M. (2012). A model for visualizing difficulty in golf and subsequent performance rankings on the PGA tour. International Journal of Golf Science, 1(1), 10–24. https://doi.org/10.1123/ijgs.1.1.10
VIII
Outlook
Contents
Outlook
Arnold Baca
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2_28
Key Messages
• Novel sensors will make a significant contribution to online exercise monitoring
or estimation of internal body processes and external influences during physical
activities.
• Data sets obtained via sensor data fusion allow the derivation of relevant
physiological and tactical information, but also important conclusions about injury
risks.
• The variety of technological possibilities will also lead to changes in the way
sports are practiced.
• Data must be collected under clearly defined, standardized conditions so that
studies can be compared.
28.1 Trends
28.2 Sensors
In several market studies, wearables were and are seen as a central trend in the devel-
opment of sports. Areas of application include monitoring activity, tracking, pro-
moting motivation to exercise, assessing fitness (Passos et al., 2021), and providing
feedback. Recent developments (cf. Nithya & Nallavan, 2021; Lutz et al., 2019) sug-
gest the potential of holistic approaches, where different parameter values of single
or multiple exercising individuals are considered in their interplay and interaction,
and of sensor data fusion, where collected data from multiple sensors are linked. It
can be expected that relevant information on physiological processes, movement
execution, and tactics can be derived from these objectively collected data, and that
important conclusions about injury risks can also be drawn. Challenges
(Lutz et al., 2019; Rana & Mittal, 2021; Zhang et al., 2019) lie in still existing deficits
in the longer-term power supply of the systems used, the availability of sensors to
capture effective biomarkers to assess internal physiological processes or responses,
and real-time feedback of biomechanical parameters. Future developments will
increasingly focus on aspects of presentation and the way information is rendered—
an optical display worn on the arm is not always the ideal solution.
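As a small illustration of what linking data from multiple sensors can mean in practice, the sketch below aligns two sensor streams by timestamp. It is an assumption-laden example, not a method from the cited reviews: each stream is a time-sorted list of (time in seconds, value) pairs, each sample of the primary stream is paired with the temporally closest sample of the secondary stream, and pairs more than 0.5 s apart are discarded. The sensor names and values are hypothetical.

```python
from bisect import bisect_left

def fuse_by_time(primary, secondary, max_gap=0.5):
    """Pair each primary sample with the secondary sample closest in time.

    Both inputs are lists of (timestamp, value) tuples sorted by timestamp.
    Primary samples without a secondary sample within `max_gap` seconds are
    dropped. Returns a list of (timestamp, primary_value, secondary_value).
    """
    times = [t for t, _ in secondary]
    fused = []
    for t, value in primary:
        i = bisect_left(times, t)
        # Only the neighbors just before and just after t can be the closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t))
        if abs(times[j] - t) <= max_gap:
            fused.append((t, value, secondary[j][1]))
    return fused

# Hypothetical streams: heart rate (bpm) and running speed (m/s).
heart_rate = [(0.0, 120), (1.0, 125), (2.0, 130)]
speed = [(0.1, 3.0), (0.9, 3.5), (3.0, 4.0)]
print(fuse_by_time(heart_rate, speed))
```

Real systems additionally have to deal with clock drift, different sampling rates, and missing data, which is part of why sensor data fusion is listed among the open challenges.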
Mencarini et al. (2019), based on a literature review, considered which aspects
of human–computer interaction (HCI) are worth considering when developing
wearables in sport. In particular, they suggest the following:
• Consider alternatives to the wristwatch (shape and positioning) and to
simultaneous feedback; here, for example, head-up displays (HUD), where
information is presented in such a way that no change in viewing direction is required,
could become more important
• Place special focus on individual strengths and weaknesses and integrate
support functions for groups
• Increasingly consider cognitive and emotional aspects
It can be assumed that the variety of technological possibilities will also lead to
changes in the practice of sports.
Intelligent systems that shape their behavior autonomously to a certain extent
and manage this depending on environmental conditions and functionality will
play an even greater role in the future. Examples are the Mobile Motion Advisor
(Preuschl et al., 2010), which provides feedback depending on the current perfor-
mance and capacity, or intelligent strength training devices, which provide recom-
mendations for further movement execution based on information collected during
the sporting activity.
The availability of big data and the potential of machine learning methods have
also led to the rapidly increasing use of data-based approaches for different ques-
tions in sports and sports science (Bai & Bai, 2021). For team sports, for example,
the following application areas can be distinguished (cf. Baca, 2021):
• Result prediction (Horvat & Job, 2020)
• Injury prevention (Van Eetvelde et al., 2021) and health monitoring (Wu et al.,
2021)
• Analysis of the performance of players
• Recognition of movement patterns
• Tactics and strategy analysis
This development will continue in the coming years. This will be particularly true
for the prediction of match and competition results, as large sums of money are
involved in the betting sector. Furthermore, an increased use for enriching sports
broadcasts with additional information is to be expected.
However, machine learning methods have also been used for a long time in
the field of computer vision, i.e., for understanding and interpreting digital
images and videos. With the development of deep learning-based methods,
impressive results can be achieved here that also open up previously unknown
possibilities for sports. By automatically identifying two- or three-dimensional
structures and features from individual two-dimensional images or series of
such images taken by a single camera or by multiple cameras, information can
be extracted both on movements of individual persons or their body segments
and on interacting teams.
To support training and expand the presentation options for viewers, further
impressive developments can also be expected in the field of Virtual Reality (VR) and
Augmented Reality (AR; also called enhanced reality). Whereas in VR people act
interactively in a virtual environment generated in real time, in AR the real environment
remains visible and is supplemented with virtual content, often displayed
three-dimensionally.
In training, this particularly concerns virtual and augmented environments for
interaction with virtual characters as preparation for competitions (Petri et al.,
2018). In this way, for example, reaction skills can be trained in the sport of karate.
In the future, in addition to optical sensory impressions, acoustic (e.g., the immer-
sion of a rowing blade in water) and haptic (e.g., touching sports equipment) feed-
back will increasingly be simulated.
In the media sector, the aim is to create opportunities to experience sporting
events directly from the point of view of the player in question and thus really be
in the middle of the action.
Probably the best-known example of the use of robots in sports is robot soccer.
This field of application will probably continue to be a test scenario for the use of
new methods of artificial intelligence and sensor technologies, since the goal in
RoboCup is still to beat the soccer world champion by the year 2050.
However, other possible applications are also being advanced. For example,
humanoid robots could also be used to guide sports exercises and support (fitness)
trainers (Griffiths et al., 2021).
Many of the methods and technologies addressed in this section collect, process or
store confidential, personal data. Protection of this data is not always guaranteed.
Fitness data collected via apps and wearables, for example, is often uploaded
directly to clouds or transmitted to providers. This can certainly be associated with
risks of which one should be aware.
References
Baca, A. (2015). Data acquisition and processing. In A. Baca (Ed.), Computer science in sport: Research and practice (pp. 46–81). Routledge.
Baca, A. (2021). Machine learning. In J. Pino-Ortega & M. Rico-Gonzalez (Eds.), The use of applied technology in team sport (pp. 230–241). Routledge.
Bai, Z., & Bai, X. (2021). Sports big data: Management, analysis, applications, and challenges. Complexity, 2021, 6676297. https://doi.org/10.1155/2021/6676297
Blobel, T., & Lames, M. (2020). A concept for club information systems (CIS)—an example for applied sports informatics. International Journal of Computer Science in Sport, 19(1), 102–122. https://doi.org/10.2478/ijcss-2020-0006
Blobel, T., Rumo, M., & Lames, M. (2021). Sports information systems: A systematic review. International Journal of Computer Science in Sport, 20(1), 1–22. https://doi.org/10.2478/ijcss-2021-0001
Goes, F. R., Meerhoff, L. A., Bueno, M. J. O., Rodrigues, D. M., Moura, F. A., Brink, M. S., Elferink-Gemser, M. T., Knobbe, A. J., Cunha, S. A., Torres, R. S., & Lemmink, K. A. P. M. (2021). Unlocking the potential of big data to support tactical performance analysis in professional soccer: A systematic review. European Journal of Sport Science, 21, 481–496. https://doi.org/10.1080/17461391.2020.1747552
Griffiths, S., Alpay, T., Sutherland, A., Kerzel, M., Eppe, M., Strahl, E., & Wermter, S. (2021). Exercise with social robots: Companion or coach? arXiv:2103.12940 [cs]. https://arxiv.org/abs/2103.12940v1
Horvat, T., & Job, J. (2020). The use of machine learning in sport outcome prediction: A review. WIREs Data Mining and Knowledge Discovery, 10(5), e1380. https://doi.org/10.1002/widm.1380
Outlook
241 28
Lutz, J., Memmert, D., Raabe, D., Dornberger, R., & Donath, L. (2019). Wearables for integrative
performance and tactic analyses: Opportunities, challenges, and future directions. International
Journal of Environmental Research and Public Health, 17(1), 59. https://doi.org/10.3390/
ijerph17010059
Mencarini, E., Rapp, A., Tirabeni, L., & Zancanaro, M. (2019). Designing Wearable Systems for
Sports: A Review of Trends and Opportunities in Human–Computer Interaction. IEEE
Transactions on Human-Machine Systems, 49(4), 314−325. https://doi.org/10.1109/
THMS.2019.2919702.
Nithya, N., & Nallavan, G. (2021). Role of wearables in sports based on activity recognition and
biometric parameters: A survey. In 2021 international conference on artificial intelligence and
smart systems (ICAIS). https://doi.org/10.1109/icais50930.2021.9395761.
Passos, J., Lopes, S. I., Clemente, F. M., Moreira, P. M., Rico-González, M., Bezerra, P., & Rodrigues,
L. P. (2021). Wearables and internet of things (IoT) technologies for fitness assessment: a system-
atic review. Sensors (Basel), 21(16), 5418. https://doi.org/10.3390/s21165418
Petri, K., Bandow, N., & Witte, K. (2018). Using several types of virtual characters in sports—a lit-
erature survey. International Journal of Computer Science in Sport, 17(1), 1–48. https://doi.
org/10.2478/ijcss-2018-0001
Phatak, A. A., Wieland, F.-G., Vempala, K., Volkmar, F., & Memmert, D. (2021). Artificial intelli-
gence based body sensor network framework—narrative review: Proposing an end-to-end frame-
work using wearable sensors, real-time location systems and artificial intelligence/machine
learning algorithms for data collection, data mining and knowledge discovery in sports and
healthcare. Sports Medicine—Open, 7(1), 79. https://doi.org/10.1186/s40798-021-00372-0
Preuschl, E., Baca, A., Novatchkov, H., Kornfeind, P., Bichler, S., & Boecskoer, M. (2010). Mobile
motion advisor—A feedback system for physical exercise in schools. Procedia Engineering, 2(2),
2741–2747. https://doi.org/10.1016/j.proeng.2010.04.060
Rana, M., & Mittal, V. (2021). Wearable sensors for real-time kinematics analysis in sports: A review.
IEEE Sensors Journal, 21(2), 1187–1207. https://doi.org/10.1109/jsen.2020.3019016
Richter, C., O’Reilly, M., & Delahunt, E. (2021). Machine learning in sports science: Challenges and
opportunities. Sports Biomechanics, 81, 1–11. https://doi.org/10.1080/14763141.2021.1910334
Van Eetvelde, H., Mendonça, L. D., Ley, C., Seil, R., & Tischer, T. (2021). Machine learning methods
in sport injury prediction and prevention: A systematic review. Journal of Experimental
Orthopaedics, 8(1), 27. https://doi.org/10.1186/s40634-021-00346-x
Wu, X., Liu, C., Wang, L., & Bilal, M. (2021). Internet of things-enabled real-time health monitoring
system using deep learning. Neural Computing and Applications, 35, 14565–14576. https://doi.
org/10.1007/s00521-021-06440-6
Zhang, X., Shan, G., Wang, Y., Wan, B., & Li, H. (2019). Wearables, biomechanical feedback, and
human motor-skills’ learning & optimization. Applied Sciences, 9(2), 226. https://doi.org/10.3390/
app9020226
Zrenner, M., Heyde, C., Duemler, B., Dykman, S., Roecker, K., & Eskofier, B. M. (2021). Retrospective
analysis of training and its response in Marathon finishers based on fitness app data. Frontiers in
Physiology, 12, 669884. https://doi.org/10.3389/fphys.2021.669884
Supplementary Information
Appendix. Third-Party Funds Competitively Acquired by German Sports Scientists from the German Research Foundation (DFG) in the Review Board for Computer Science – 244
Index – 247
D. Memmert (ed.), Computer Science in Sport, https://doi.org/10.1007/978-3-662-68313-2
Appendix. Third-Party Funds Competitively Acquired by German Sports Scientists from the German Research Foundation (DFG) in the Review Board for Computer Science
Index
A
Acceleration, 77
Actions, 28
Activation functions, 172
AI-based approaches, 28
Analysis routines, 127
Antagonism, 101
Antagonistic models, 104
Application Programming Interface (API), 51
Artificial data, 16
Artificial intelligence, 202
Artificial neural networks, 170, 172
Athletic performance, 100
Augmented reality, 239–240
Automated notation, 38
Automatic text classifiers, 24
B
Beach volleyball, 226
Benchmark datasets, 178, 179, 183
Big Data, 137, 238
Binary outcome, 136
Biomarker sensors, 237
C
Calculation, 58
Carbon dioxide output, 90
Classification algorithms, 136
Cloud storage, 238
Clustering, 144
Coaching, 143
Command line, 113
Complex dynamic systems, 58
Complex processes, 14
Complex system, 39
Computer science, 194
Computer vision, 8, 178, 179, 218, 239
Computing systems, 194
Conformance checking, 152
Context information, 47
Contextual factors, 47
Convolutional neural networks (CNNs), 179, 181, 186
CRAN, 112
Critical power, 92
D
Data analysis, 110
Data availability, 15, 66
Data fusion, 236
Data imbalances, 137
Data mining, 142
Data processing pipelines, 127
Data representation, 194
Decision making, 137, 142
Decision tree, 202
Deep learning, 29, 180
Definition, 4
Differential equations, 92
Distance, 77
Domain-independent, 195
Dropout, 179
dvs section Sportinformatik, 7
Dynamic pricing, 50
E
Efficiency, 100
ELO rating, 69
Energy provision, 90
Ensemble methods, 202
Error function, 172
Event data, 36, 142
Expected goals, 37, 186
External load, 74
F
Fatigue, 100
Feature vector, 195
Features, 179, 182, 187
Field registration, 28
Forecast, 175
Fully-connected neural networks, 187
G
Genders, 46
Gradient boosting, 210
Graph-based neural networks, 182
H
HTML, 51
Hydraulic model, 95
I
IACSS, 7
Image processing, 194
Indicators, 45
Individual case, 101
Information, 195
Institutionalization, 4
Interactive behavior, 158
Interactive programming, 128
Internal load, 74
K
Key performance indicators (KPIs), 45, 182, 203
K-nearest neighbor, 181
Knowledge-discovery process, 137
L
Labels, 203
Lactate, 90
Lasso, 211
Learning rules, 172
Lexicon-based categorization of text data, 25
Libraries, 128
Literate programming, 121
Live-betting odds, 53
Load, 100
Logistic regression model, 69
M
Machine learning, 8, 66, 218, 229, 239
Manual scraping, 51
Mapping, 58
Markdown, 121
Market efficiency, 67
Medicine, 145
Metabolic power, 77
Metadata, 28
Modeling, 58, 82, 210
Model parameters, 102
Monte Carlo Simulation, 16
Movements, 28
Multimedia data, 218
N
Network analysis, 158
Network science, 158
Neural networks, 46, 187
Norm, 229
O
Online data, 50
Open-set recognition, 218
Open-source, 128
Oxygen uptake, 90
P
Packages, 112
Passing networks, 39
Pattern recognition, 174, 218
Performance analysis, 82, 158
PerPot model, 83
Physiological models, 74
Physiological performance, 85
Positional data, 44, 142, 186
Poisson distribution, 211
Prediction, 210
Prediction of performance, 143
Predictive models, 66
Predictive quality, 66
Process discovery, 151
Process enhancement, 152
Process models, 151
Profiles, 84
Programming languages, 126
Propagation function, 171
R
Random forests, 69, 202, 210
Random numbers, 14
Recovery, 100
Recurrent neural networks, 179
Regression, 210
Regression analysis, 94
Representation, 187