Data Science for Sports Analytics

The thesis 'Data Science for Sports Analytics' by Vangelis Sarlis explores the application of data science and machine learning in professional basketball, focusing on the NBA. It aims to enhance understanding of player performance, injury patterns, and economic impacts through comprehensive data analysis, proposing new algorithmic models and examining the relationship between injuries and performance. The research provides actionable insights for sports management, coaches, and healthcare professionals, contributing to the fields of sports analytics and data science.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/382048910

Data Science For Sports Analytics

Thesis · February 2024


DOI: 10.13140/RG.2.2.33853.06882


School of Science & Technology
International Hellenic University of Greece

Data Science for Sports Analytics

Vangelis Sarlis

Submitted to the
School of Science and Technology for the degree of

Doctor of Philosophy

January 2024

Dedications

This thesis is wholeheartedly dedicated to my parents,

Giorgos and Eleni,

whose unwavering support and encouragement have been


my strength throughout these years.

Try hard to reach the Sun...

If you fail, you will find yourself among the stars.

Giorgos Sarlis

Acknowledgements

I extend my deepest appreciation to my mentor and supervisor, Professor
Christos Tjortjis, for their unwavering guidance, support, and confidence in my
capabilities. Their willingness to grant me the autonomy necessary for my research has
been instrumental in identifying significant scholarly gaps through a fruitful
collaboration over the past five years. I am also privileged to have been a part of the
Data Mining and Analytics (DAMA) research group, offering special assistance to the
Greek National Health System during the challenging times of the pandemic.
Furthermore, my sincere gratitude goes to my esteemed colleagues and coauthors,
whose contributions have been indispensable to the advancement of my study and
research papers.

Declaration

I declare that I am the owner of the Thesis titled “Data Science for Sports Analytics”
and the presented research study. No portion of this work has been submitted in support
of an application for another degree at this or any other university or institution.

The only known conflict of interest is with staff working at International Hellenic
University.

The author of the aforementioned thesis, with the inclusion of appendices, owns the
original copyright (© Copyright), and only International Hellenic University has the
right to use part or all of the research for educational, teaching, and promotional
purposes.

Abstract

Data Science (DS) and Sports Analytics are becoming pivotal in shaping the future of
professional sports. This thesis presents a
comprehensive exploration of their application within professional basketball,
particularly focusing on the National Basketball Association (NBA). The primary aim
of this research is to employ Machine Learning (ML) and Data Mining (DM) techniques
to deepen the understanding of player performance, injury patterns, and economic
impacts, thereby enabling more informed decision-making in sports management.

The objectives of this research are manifold. First, it seeks to benchmark existing
performance analytics and propose advanced algorithmic models to enhance the
predictive power and understanding of player and team performance metrics in
basketball. This involves a detailed analysis of NBA data spanning from 1996 to 2023,
providing a longitudinal perspective on the evolution of the game and its players.
Second, the study aims to quantify the relationship between player injuries and
performance by examining how demographic factors such as age and position, as well
as socioeconomic aspects, affect this dynamic. A particular focus is placed on
musculoskeletal injuries, their prevalence, and the implications for player career
trajectories and team strategies.

An additional objective is to analyse the economic ramifications of injuries, identifying
the costliest types and their impacts on team finances and player income. This research
aims to provide actionable insights for coaches, managers, and healthcare professionals
to optimize player care, team composition, and performance strategies. This study
employs a variety of ML and DS techniques, including feature selection, clustering, and
classification methods, to achieve these aims.

Through comprehensive data analysis, existing performance analytics were benchmarked and
new algorithmic models were proposed to enhance the understanding and prediction of
performance metrics. This study delves into the correlation of injuries with players' age,
position, and performance, aiming to identify patterns and quantify their financial and
strategic impacts. We explore the socioeconomic and demographic factors influencing sports,
offering insights into the most prevalent injuries and their implications for the game.

Another aim is to focus on understanding injury patterns in the NBA and how these
injuries impact player performance. Utilizing a unique dataset, it identifies prevalent
injuries, the anatomical areas most affected, and explores the influence of these injuries
on players' performance post-recovery. The study stands out for its integrative method,
merging injury data with performance and salary information to shed light on the
interconnections between injuries, economic impacts, and on-court performance. It also
looks into the timing and seasonal nature of injuries to find patterns related to time and
external factors, as well as the specific effects of injuries on players' game-by-game
performance metrics. This research is aimed at aiding coaches, sports medicine
professionals, and team management by providing insights for injury prevention, player
rotation optimization, and targeted rehabilitation strategies.

The results of this research contribute to a more nuanced understanding of the
multifaceted nature of sports performance, the strategic importance of injury
management and prevention, and the significant economic considerations involved. By
revealing the intricate interplay between these elements, this thesis offers a valuable
resource for sports professionals, decision makers, and researchers, paving the way for
more effective strategies in team management and player development. This study also
sets a foundation for future work in this area, suggesting new avenues for research and
application in the burgeoning field of sports analytics.

The findings indicate a nuanced relationship between injuries, player performance, and
economic aspects, shedding light on the critical age range for peak performance and the
substantial financial burden injuries pose to teams. This research contributes to the fields
of sports analytics and data science by providing a deeper understanding of game
dynamics and presenting strategies for injury prevention and management, team
composition, and performance optimization. This thesis not only aids decision makers
in the sports industry but also sets the stage for future research in advanced sports
analytics and injury management strategies.

Keywords

Basketball Analytics; Business Intelligence; Data Analysis; Data Science
(DS); Injury Analytics; Machine Learning; Sports Analytics; Sports Data
Mining (DM); Musculoskeletal Injuries; Sports Economics;
Sports Injuries; Statistics; Text Analytics; Text Mining

Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet the
requirements for an award at this or any other higher education institution. To the best
of my knowledge and belief, this thesis contains no material previously published or
written by another person except myself or where due reference is made.

Signature: _________________________

Date: _________________________

Copyright

The author of this thesis owns copyrights according to the “Copyright Statement”
section and has given International Hellenic University the right to use it for any
administrative, promotional, educational, or teaching purposes. Copies of this thesis,
either in full or partially, may be extracted in accordance with the regulations of the
Library and Information Centre of the International Hellenic University. Details of these
regulations may be obtained from the Librarian. This page must form part of any copies
made. The ownership of any patents, designs, trademarks, or any/all other intellectual
property labelled “Intellectual Property Rights” and any other reproductions, for
example, figures and tables labelled “Reproductions”, which may be described in this
thesis, may not be owned by the author, and may be owned by third parties. Such
“Intellectual Property Rights” and “Reproductions” copyrights cannot and must not be
made available for use without prior written permission from the owner(s). Any further
information regarding the conditions under which disclosure, publication and
exploitation of this thesis occurred, the “Copyright Statement” and any “Intellectual
Property Rights” or “Reproductions” described in this section are available from the
Dean of the School of Science and Technology.

© 2024 – VANGELIS SARLIS

The Data Mining & Analytics Research Group

School of Science and Technology,

International Hellenic University

14th km Thessaloniki – Moudania, 57001 Thermi, Greece

ALL RIGHTS RESERVED

Consulting Committee

SUPERVISOR: Dr. Christos Tjortjis (Associate Professor, School of Science & Technology, International Hellenic University)

MEMBERS: Dr. Georgios Evangelidis (Professor, Department of Applied Informatics, School of Information Sciences, University of Macedonia)

Dr. Aristidis Likas (Professor, Department of Computer Science and Engineering, University of Ioannina)

Table of Contents

Dedications
Acknowledgements
Declaration
Abstract
Keywords
Statement of Original Authorship
Copyright
Consulting Committee
Table of Contents
List of Tables
List of Figures
List of Research Questions

1. Introduction

2. Background
2.1 Sports Analytics
2.2 Data Mining and Machine Learning Used in Sports
2.3 Implementations of Machine Learning and Data Mining Techniques in Sports
2.4 Player Injuries in Sports and Data Analysis Techniques
2.5 Major injuries that influence performance in the NBA League
2.6 Socioeconomic Influence
2.7 Sports Analytics and Text Mining NBA Data to Assess Injury Recovery and Economic Impact
2.8 Injury Patterns and Impact on Performance in the NBA League Using Sports Analytics

3. Data and Methods
3.1 Research Questions
3.2 Aim and Objectives
3.3 Methodology
3.3.1 Data Collection
3.3.2 Data Engineering
3.3.3 Data Analysis

4. Results
4.1 Sports Analytics – Evaluation of Basketball Players and Team Performance
4.2 Sport injuries in the NBA League for 2010-2020
4.3 Results through Data Mining and Machine Learning methods used for Sports Injury Analytics
4.4 Findings on the Economic and Performance Impact of Injuries, Age and Position on NBA Players
4.5 Results in Recovery from Injuries and Their Economic Impact
4.6 Results of Injury Patterns and Impact on Performance in the NBA League Using Sports Analytics

5. Discussion & Implications
5.1 Discussion of Basketball Performance Evaluation
5.2 Basketball Performance Evaluation - Case Study
5.3 Aggregated Performance Indicator - Forecasting Scenario
5.4 Health and Injury Analytics Discussion
5.5 Discussion of Socioeconomic and Health Analytics
5.6 Injury Recovery and Economic Impact
5.7 Discussion on Injury Patterns and Impact on Performance in the NBA League Using Sports Analytics

6. Conclusions
6.1 Evaluation of Basketball Players and Team Performance
6.2 Impact of Injuries on Basketball Player and Team Performance
6.3 Economic and Performance Impact of Injuries, Age and Position on NBA Players
6.4 Economic Impact of Injuries and Recovery Assessment
6.5 Injury Patterns and Impact on Performance in the NBA League Using Sports Analytics

7. Future Work
7.1 Advanced predictive and prescriptive Sports Analytics approaches
7.2 GPS, biometric, and wearable sensor data analysis
7.3 Tactics, Strategy and Technical analysis
7.4 Health, nutrition, and injury implications
7.5 Video data analysis
7.6 Trips, workload, sleep, and fatigue correlation
7.7 Social network analysis
7.8 Budget control, investments, Risk Management, and forecasting analysis
7.9 Leadership and clutch skills

Appendices
Abbreviations
References

List of Tables
Table 1.1: Revenues of Sports Analytics in correlation with Sports Market spending from 2015 to 2022 and Forecasting up to 2030.
Table 2.1: Datasets based on player performance metrics (regular and playoff season) and injury and salary data for the period from 2000 to 2023.
Table 3.1: Players’ performance analytics (regular and playoffs), injury analytics, and salary data.
Table 4.1: Injury analysis categorization based on health problems, organ systems and anatomical areas/sub-areas.
Table 4.2: Classification of distinct injury/pathology reasons for player absence during the NBA period 2010-20.
Table 4.3: DS and ML techniques used in injury association with basketball analytics.
Table 4.4: Performance and Injury Analytics Based on the Accuracy of Absence Reasons.
Table 4.5: Team performance in relation to rest management in the period 2010-20.
Table 4.6: 2 games before/after the injury.
Table 4.7: 5 games before and after the injury.
Table 4.8: 10 games before and after the injury.
Table 4.9: 2 games before/after the injury – Basketball Performance Analytics.
Table 4.10: 5 games before/after the injury – Basketball Performance Analytics.
Table 4.11: 10 games before/after the injury – Basketball Performance Analytics.
Table 4.12: Team Recovery Time Correlated with the Sum of Losses in the Period 2000 to 2023.
Table 4.13: Relationships of anatomical subareas with average recovery time and economic losses.
Table 4.14: Anatomical sub-areas injuries in comparison to Categorization (Defensive, Misc, Offensive and Rating) of performance analytics based on the Significance and Effect size for the ±2 Game Series.
Table 4.15: Anatomical sub-areas injuries in comparison to Categorization (Defensive, Misc, Offensive and Rating) of performance analytics based on the Significance and Effect size for the ±5 Game Series.
Table 4.16: Anatomical sub-areas injuries in comparison to Categorization (Defensive, Misc, Offensive and Rating) of performance analytics based on the Significance and Effect size for the ±10 Game Series.
Table 4.17: Number of grouped health problems and percentage allocation.
Table 4.18: Number of anatomical sub-areas of musculoskeletal injuries and percentage allocation split.
Table 5.1: Comparison matrix for basketball performance analytics.
Table 5.2: Data Mining Algorithms and Techniques Used in Sports Analytics.
Table 5.3: MVP, Best Defender, Top Scorer, Top in Assists, Top in Steals, Top in Rebounds and 3 best teams of the year for two seasons, 2017-18 and 2018-19.
Table 5.4: API forecast for 2017-18.
Table 5.5: API forecast for 2018-19.
Table 5.6: API forecast for 2019-20.
Table 5.7: DPI forecast for 2017-18.
Table 5.8: DPI forecast for 2018-19.
Table 5.9: DPI forecast for 2019-20.
Table 5.10: Top 30 most volatile players in the NBA championship for the period 2010–2020.
Table 5.11: Top 30 players with the least important absences in the NBA championship for the period 2010–2020.
Table 5.12: NBA performance analytics for Derrick Rose.
Table 5.13: “Out of season” and “Out indefinitely” injury analyses for Derrick Rose.
Table 5.14: NBA performance analytics for Giannis Antetokounmpo.
Table 5.15: Financial losses of NBA teams associated with player health pathologies and injuries (2010-11 up to 2019-20).
Table 5.16: Team injury issues in the period 2010-20 categorized per season.
Table 5.17: NBA Player average salary per position over 24 Seasons (1996-2020).
Table 5.18: Average annual NBA player salaries compared with those in the previous year over 24 seasons (1996-2020).
Table 5.19: NBA player age clustering in correlation with advanced basketball analytics and average inflated salaries over 24 seasons (1996–2020).
Table 5.20: Eight age clusters correlated with Player percentage, Salary adjusted for inflation and Performance Estimate Rating (PER) over 24 NBA seasons (1996-2020).
Table 5.21: Comparison matrix of Anatomical sub-areas for all the game series (2, 5 and 10).
Table 5.22: Comparison matrix of the performance metrics for all the game series (2, 5 and 10).
Table 5.23: Comparison matrix of more significant and effect size injuries in different game series.
Table 0.1: Advanced Rating KPIs.
Table 0.2: Defensive criteria - Advanced basketball statistics.
Table 0.3: Offensive criteria - Advanced basketball statistics.
Table 0.4: Overall Performance criteria - Advanced basketball statistics.
Table 0.5: Pearson hypothesis correlation with performance and injury analytics.
Table 0.6: Musculoskeletal injury analysis results for the NBA period from 2010–20.
Table 0.7: Player age clustering (ages 18-43) in correlation with basketball advanced analytics and average wages in dollars ($) for 1996-2020.
Table 0.8: Annual NBA league team financial losses in $ due to player health pathologies and injuries (from 2010-11 up to 2019-20).
Table 0.9: Annual inflation rates adjusted for 2019-20 as a baseline.
Table 0.10: Eight age clusters segmented across 3 playing Positions, in correlation with Player percentage, Salary adjusted for inflation, and advanced basketball analytics over 24 NBA seasons (1996-2020).
Table 0.11: Rating, Misc, Offensive and Defensive categorization of the Basketball Performance Analytics.
Table 0.1: Abbreviations for basketball analytics.

List of Figures

Figure 1.1: Sports industry market revenue evolution from 2015 to 2022. Projection analysis from 2023 to 2030.
Figure 2.1: Sports Analytics Skills vs. Impact correlation.
Figure 2.2: Average body profile of professional players in various sports compared with that of an average man.
Figure 2.3: Structure and flow of the methodological approach.
Figure 4.1: Correlation matrix between NBA performance and injury analytics.
Figure 4.2: Pareto chart of pathology/injury events in the NBA teams from 2010–2020.
Figure 4.3: Musculoskeletal anatomical sub-areas’ statistical significance and salary correlation.
Figure 4.4: Tornado diagram that analyzes the percentage variance in basketball performance analytics in lesser/greater post-injury cases.
Figure 5.1: Radial charts of percentage values and logarithmic normalization.
Figure 5.2: Average NBA player salary per team over 24 seasons (1996-2020).
Figure 5.3: Left side: Radial chart distribution of average NBA player salary per team over 24 Seasons (1996–2020). Right side: Team name and abbreviation.
Figure 5.4: Tornado funnel diagram for all NBA players over 24 seasons (1996-2020) regarding Inflated Salary versus Age.
Figure 5.5: Comparison of the median percentage change in anatomical subareas in Game Series 2, 5, and 10.
Figure 5.6: Comparison of Median %Changes in Performance Metrics for Game Series 2, 5, and 10.

List of Research Questions

• Research Question 1 – RQ1 in pages: 54, 58, 71, 73, 145

• Research Question 2 – RQ2 in pages: 54, 58, 124

• Research Question 3 – RQ3 in pages: 54, 58, 118, 145, 168

• Research Question 4 – RQ4 in pages: 54, 59, 118, 121

• Research Question 5 – RQ5 in pages: 55, 60, 76, 81, 152

• Research Question 6 – RQ6 in pages: 55, 60, 76, 83, 126, 127, 129, 152

• Research Question 7 – RQ7 in pages: 55, 76, 81, 126, 127, 129, 152, 154

• Research Question 8 – RQ8 in pages: 55, 84, 153

• Research Question 9 – RQ9 in pages: 55, 88, 130, 154

• Research Question 10 – RQ10 in pages: 55, 88, 131, 134, 154, 155

• Research Question 11 – RQ11 in pages: 55, 88, 89, 133, 134, 155

• Research Question 12 – RQ12 in pages: 56, 66, 89, 139, 140, 156, 157

• Research Question 13 – RQ13 in pages: 56, 93, 99, 143, 144, 156, 157

• Research Question 14 – RQ14 in pages: 56, 99, 100, 143, 156, 157

• Research Question 15 – RQ15 in pages: 56, 97, 99, 143, 144, 156, 157

• Research Question 16 – RQ16 in pages: 57, 61, 104, 105, 147, 158, 159

• Research Question 17 – RQ17 in pages: 57, 61, 106, 148, 158, 159

• Research Question 18 – RQ18 in pages: 57, 61, 108, 148, 158, 159

1. Introduction

Analytics are well established in sports [1]. NBA player performance is negatively
affected by sports injuries. Despite improvements in prevention and rehabilitation
strategies, injury rates remain high. Therefore, senior team management must
estimate and reduce injury risks while optimizing team tactics and strategies regarding
workload and rest days [2]. Performance forecasting is particularly valuable to betting
companies and sports clubs, both financially and as a means of improvement.
Sports Analytics therefore support domain experts in anticipating adverse
events, with the aim of reducing costs and increasing team or player
performance [3].

Predicting player performance from current and past data has gained attention,
particularly in basketball [4], [5]. Sports analytics and forecasting based on these data
constitute a rapidly growing field, with many methods that can be applied from
different perspectives in each situation [6]. Within a team, and especially for technical
staff and coaches, knowing each player's strengths and weaknesses adds value to
roster composition, new transfers, changes in rhythm during a match, and other
vital qualitative and quantitative factors [7]. Such Performance Analytics help a
team minimize costs, maximize team value, and improve processes at every level
and stage of its operations [8].

In addition, many teams and countries have invested substantial amounts of money
in training athletes who can win competitions, Olympic Games, etc. Over the last few
years, basketball analytics has gained traction, seeking to analyse games
in greater depth through advanced analytics that optimize team and player performance
[9]. New technologies make it possible to collect additional data, which in turn
requires new methods of analysis. These new analysis methods could
generate added value by characterizing basketball players’ behaviour and helping
technical staff and coaches make better decisions [10].

Athletes are extremely popular not only within the sports industry but also in every
social activity they engage with. There is a holistic perspective for each social,
economic, and marketing activity. Player performance tied to team success increases the
social and financial status of the player and the team, as well as the competition in
general [11].

Sports clubs, technical staff, and owners organize their roster and strategy based on the
team budget. However, defeats negatively influence not only player performance but
also teams. It is important for a team to be proactive in terms of potentially upcoming
health issues. An injury or health problem not only causes player absence, impacting
team performance, but also, most importantly, causes economic loss. Regrettably, health
issues and pathologies are uncertain events that are difficult to predict [12].

Injury avoidance implies lower costs and more stability in team rosters. Therefore,
appropriate team selection by technical staff, involving monitoring and tracking of load
and rest management during matches and training, can lead to player and team
performance optimization. Biometric characteristics (such as speed, agility, height,
weight, and body mass), demographic information (such as age, country, and college),
nutritional status, training pace, psychology, marketing strategy and social life are
significant in terms of performance and salary expectations. Uncertainty minimization
in every sport can lead to cost reduction and efficiency and productivity optimization
[13].

Sports clubs are businesses that focus on profits either directly from competition wins
or indirectly from advertisements, court attendance, player reputations and other
monetary collaborations. Comprehensive cost‒benefit analyses of businesses have
focused on assessing the economic factors that influence players, teams, and
competitions. For example, research explores the optimal ticket price based on city, team
reputation and fan level of interest. Thus, the sports industry is attempting to combine
sports analytics with business analytics to find the critical path between wins and cost
reduction [14], [15].

The global sports industry market size was valued at 501 billion U.S. dollars in 2022.
The sports market grew at a compound annual growth rate (CAGR) of 41.3% between
2021 and 2022. Furthermore, the sports ecosystem is expected to be worth more than
700 billion U.S. dollars by 2026. The CAGR up to 2018 was 3.4%. The decline in 2020
and 2021 was mainly due to the economic slowdown across countries caused by the
COVID-19 pandemic (Figure 1.1).

Table 1.1 presents the evolution of sports analytics revenues in relation to sports
market spending from 2015 to 2022, together with a projection that reaches 21.9 billion
U.S. dollars by 2030 at a CAGR of 27.3%. Emerging AI and ML technologies and
applications increase the traction for new investments to assist the ecosystem and help
key stakeholders in decision-making [16], [17].

Figure 1.1: Sports industry market revenue evolution from 2015 to 2022. Projection
analysis from 2023 to 2030.

Table 1.1: Revenues of Sports Analytics in correlation with Sports Market spending
from 2015 to 2022 and Forecasting up to 2030.

Year    Revenues               Sports Analytics Revenues    % Difference
        (in Billion dollars)   (in Billion dollars)
2015    $ 327                  $ 0.37                       0.11%
2016    $ 338                  $ 0.51                       0.15%
2017    $ 350                  $ 0.70                       0.20%
2018    $ 362                  $ 0.96                       0.27%
2019    $ 459                  $ 1.32                       0.29%
2020    $ 388                  $ 1.82                       0.47%
2021    $ 355                  $ 2.50                       0.70%
2022    $ 501                  $ 3.18                       0.63%
2023    $ 548                  $ 4.05                       0.74%
2024    $ 597                  $ 5.16                       0.86%
2025    $ 651                  $ 6.57                       1.01%
2026    $ 710                  $ 8.36                       1.18%
2027    $ 774                  $ 10.64                      1.38%
2028    $ 843                  $ 13.54                      1.61%
2029    $ 919                  $ 17.24                      1.88%
2030    $ 1,002                $ 21.95                      2.19%
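
The growth rates quoted in the text and the last column of Table 1.1 can be checked with a short calculation. The following is a minimal sketch in Python. The function names `cagr` and `share_percent` are illustrative and not part of the thesis. It also assumes that the "% Difference" column expresses sports analytics revenues as a percentage of sports market revenues, which is what the tabulated values suggest.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def share_percent(part, whole):
    """`part` expressed as a percentage of `whole`."""
    return 100 * part / whole


# Values taken from Table 1.1 (billion U.S. dollars).
market_cagr_2015_2018 = cagr(327, 362, 3)          # roughly 3.4% per year
analytics_cagr_2022_2030 = cagr(3.18, 21.95, 8)    # roughly 27.3% per year
analytics_share_2015 = share_percent(0.37, 327)    # roughly 0.11%
analytics_share_2030 = share_percent(21.95, 1002)  # roughly 2.19%
```

Under these assumptions the sketch reproduces the 3.4% market CAGR up to 2018, the 27.3% CAGR of sports analytics revenues from 2022 to 2030, and the first and last entries of the percentage column.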

Generally, sports data are irregular and sparse. They are sparse because the majority
of the players do not have long careers and do not remain in the same league and/or
team for many years. They are irregular because the careers of the players belong
to different chronological periods [18]. A large variety of sports data, such as shots
attempted, fouls committed, defence metrics during the game, the kilometres players
run, and many other parameters of a game, can be tracked with the use of SportVU
cameras. However, it is significantly more difficult to identify the dominant
performance analytics of each team/player than to distinguish them from the opponents’
performance. There are outlier factors, such as the psychological or physical condition
of each player/team, that can be analysed to provide additional valuable information
for decision making [19]. Recognized electronic devices, named Electronic
Performance and Tracking Systems (EPTS), can measure these additional data
through gyroscope, magnetometer and accelerometer sensors, providing
opportunities to explore all these significant aspects in more depth [20].

This study examines in detail basketball health data collected over the last decade
(2010-20) through data mining (DM) in correlation with player and team performance. In
addition, commonalities were identified by performing classification according to the
injury and pathology criteria. Furthermore, this study provides important insights into
the impact of team injuries associated with advanced basketball analytics. Finally, the
performance impact of specific longevity injury types was assessed through DM
techniques. Injuries constitute the greatest concern for teams, management, and fans,
especially for the best players, because they can dramatically affect overall team
performance [21].

The increased ambiguity in injury prediction depends on many parameters, which are
difficult to recognize and quantify. Sports participation is extremely popular worldwide
and has physical, psychological, and social benefits. Regular engagement in sports has
been found to enhance (i) the musculoskeletal system by increasing muscular strength,
endurance and power and contributing to bone mineral content and density; (ii)
cardiorespiratory function, e.g., reducing the risk of coronary heart disease; and (iii)
mental health by promoting self-esteem and generally by improving quality of life. A
healthy and sports-active lifestyle for an athlete can also reduce any association with
injury in preparation before or during the game [11]. Despite the health benefits, taking
part in sports exposes athletes to high injury risks. The occurrence of injuries in
recreational and competitive athletes is affected by multiple factors, such as age, sex,
sport type (contact or noncontact), training workload, movement patterns that each sport
includes and other key factors, which were analysed further through this study [22],
[23], [24], [25], [26], [27], [28], [29], [30] and [31].

Basketball is widespread worldwide, ranging from recreational to professional play.
Its popularity is mostly due to NBA competition and promotion [2], as it is recognized
as the top basketball competition in the world [4]. The performance of NBA players
increases the popularity and socioeconomic condition of the player, the club, and the
whole league [32]. Basketball is considered generally a noncontact game with increased
physical demands and high-intensity and high-speed moving patterns ([2] and [33]).
However, basketball has evolved over time to become an increasingly physical game in
which contact is accepted and expected from players ([34] and [35]).

Big Data collection and information management are emerging sectors for trainers,
physicians, doctors, and health domain experts in sports, and they could help in injury
risk factor identification [36]. Hence, the use of biometric tracking technologies, such
as accelerometers, radiofrequency identification (RFID), heart rate (HR), and global
positioning system (GPS) sensors and wearables, can help individuals understand player
or team disadvantages, optimize performance, reduce potential injuries, and enhance the
recovery process [37].

This research attempts to gather the analytics used in sports as state-of-the-art
performance indicators to support decision making for basketball games, teams, and
players. Data mining involves identifying unknown structures and performing data
analytics [38], [39]. Hence, this approach could support decision-making and help
predict uncertain outcomes [40]. In conclusion, sports analytics could be extremely
helpful for educating the next generation of players, technical staff, and managers.
Teams can take advantage of these tools for the future prediction of roster composition,
optimization of tactics and avoidance of unexpected circumstances [41], [42].

This research study attempts to apply Data Mining (DM) techniques and methods
through Sports Analytics on socioeconomic, demographic and injury basketball
attributes in relation to advanced basketball performance analytics. Extracting and
analysing sophisticated sports metrics provides valuable insights, followed by critical
result evaluation to support decision makers in structuring their strategy, planning,
budgeting, tactics, and training to minimize risks and financial losses [43].

Big data analysis in basketball has indicated that economics and game theory can be
utilized to make predictions about social, economic, and political topics. Player and
team momentum are called in sports jargon “hot/cold hand” and “hot/cold streak” in
basketball and baseball, respectively. This momentum is derived from psychological,
social, economic, and formative aspects. Nevertheless, studies have shown that it is
difficult to accurately project these correlations [15], [44], [45]. Another study has
shown that athletes with high social rankings and reputations designate more “winners”
and “losers” characteristics in their playing attitudes [46].

An athlete with an injury needs proper medical and recovery assistance to return to the
same or better level as before. Therefore, sports clubs must know that all parts of the
service chain (doctors, physiotherapists, performance coaches, trainers, coaches, etc.)
are required for each issue in correlation with appropriate financial analysis at the
management level. Hence, injury impact should be seriously considered regarding team
strategy, load, rest, and recovery management. Noteworthy sports and other combined
analytics provided by microtechnology (such as training process, type of training, injury
history, stress tolerance, recovery adaptation, psychological status, and leadership skills)
have financial implications [46], [28], [47].

Health and medical businesses started focusing on sports by the 1990s, seeking vast
compensation packages for athletes due to medical care. These increasing economic
returns on investment range from college level up to the multimillion-dollar contracts that
performance coaches, physical trainers, physicians, and others taking care of athletes to
address issues affecting their performance and lead them into success [46], [48].

Several studies have shown that effective sports data analysis can lead to performance
improvement and that it has a remarkable positive economic influence. The sports
ecosystems of domain experts, such as scouts, technical staff, managers, sports
journalists, and sports data analysts, use data with the purpose of identifying valuable
insights. In addition, media and betting companies increasingly tend to use sports
analytics and advanced statistics through metrics, infographics and visualizations for
evaluation and prediction purposes [49], [50], [51], [52], [53].

Resource management is a crucial factor in the sports industry. Hence, each sport club’s
human resource team should hire the best staff (players, technical team, medical experts,
etc.) at the lowest price. Therefore, appropriate data analysis applied for talent
management is the most difficult part of maintaining the right balance [54], [55].

The objectives and corresponding contributions of this PhD thesis can be summarized
as follows:

i) To provide a systematic review of the literature on the topics of sports analytics and
the combinations of sports and the data science ecosystem.

ii) To enhance feature selection and mitigate data noise, suitable data preprocessing
techniques were employed. Consequently, these processed data can
aid decision-makers by offering a refined comprehension of the categorized
attributes pertaining to injury expenses in the NBA League.

iii) To support budget allocation, player selection, and rotation management and to
structure a clear vision for sport clubs, real peak performance was recognized
according to clustering based on age and player position in correlation with
advanced basketball performance analytics.

iv) To illustrate health pathologies and injuries with the most financial expenses and
help decision makers create strategies for game selection, training, load
management and recovery.

The structure of this PhD thesis includes the following sections:

2. Background: This section describes several important concepts and terminologies
of sports analytics and reviews the related literature, with emphasis on sports
analytics concepts correlated with players’ and teams’ health data. It also considers
how Data Science, DM and Sports Analytics have been associated with social,
economic, and demographic attributes in the literature.

3. Methodology – Research Design: This section covers the problem definition,
important remarks, existing sports analytics, and the applied research algorithms,
including the noteworthy characteristics and the injury and pathology analytics
that influence team and player performance. Furthermore, it sets out the research
questions, aims and objectives, and proposes a methodology, including remarkable
attributes, important advanced basketball analytics and DM techniques.

4. Results: A comparison of existing advanced metrics for both team and player
performance is conducted, together with a comprehensive data analysis of
basketball injury analytics in relation to team and player performance.
Additionally, this section presents the results of the data analysis and
visualizes the impact of NBA basketball analytics on performance, offering insights
correlated with performance and players' injuries spanning over two decades.

5. Discussion & Implications: This section presents and discusses the results
and observations. Additionally, a clear comparison of the existing and historical
basketball analytics is made. Furthermore, a case study is provided to explain in a
more systematic way the basketball analytics presented in the previous sections. In
the forecasting scenario section, two prediction formulas are introduced for the
MVP and defender of the year in basketball. In addition, this section provides a clear
experimental analysis and evaluation of existing and past basketball injury analyses,
together with insights, critical evaluation, and analysis of the results.

6. Conclusion: This section evaluates the association between basketball performance
analytics and injury analytics for teams and players in order to determine how such
an approach can be applied to different sports domains. In addition, it offers a
conclusive evaluation along with potential applications in different domains or
sports.

7. Future Work: Based on the conclusions, directions for future work are proposed.
Further work is suggested for the exploitation of these research opportunities.

Appendices: Categorization of important basketball analytics based on several factors
of advanced basketball statistics. Feature selection, classification and segmentation
of important basketball injury analytics based on different characteristics of advanced
basketball analytics.

2. Background
The "Background" section of the thesis offers a comprehensive examination of
sports analytics, emphasizing its transformative impact on basketball. This study
underscores how the field merges information technology and scientific principles to
revolutionize decision-making in sports. By analysing historical and real-time data,
sports analytics enables stakeholders to predict future outcomes and make informed
decisions. This section outlines the various methodologies integral to this field, from
data collection and management to the use of advanced computational methods and
forecasting models. These tools and techniques are crucial for deriving meaningful
insights from vast quantities of sports data, which in turn support strategic decisions at
critical moments in the game.

Performance metrics receive significant attention in this section, highlighting their
crucial role in evaluating team composition, player performance, and overall strategic
planning. This background illustrates the growing prominence of sports analytics, not
only within the sports industry but also in broader business sectors, due to its ability to
provide deep insights into performance and strategy. The increasing trend toward the
adoption of sports analytics is attributed to its comprehensive approach, which considers
a variety of factors, including training, psychology, and player conditioning.

Furthermore, the section delves into the challenges and complexities of integrating
scientific research with practical sports experience. Growing attention has been given to
Data Mining (DM) and Machine Learning (ML), demonstrating how these technologies
automate and refine processes such as classification, segmentation, and forecasting. This
automation is pivotal in enhancing both team and individual player performance. A
critical component of the background is the discussion on sports injuries, their economic
ramifications, and the application of ML techniques in predicting and managing these
risks. This section provides a nuanced understanding of how injuries affect player
performance, team dynamics, and financial outcomes, emphasizing the need for long-
term, comprehensive strategies for injury management.

2.1 Sports Analytics


Sports analytics is an upcoming and promising segment of analytics that combines
information technology (IT) and science. Its purpose is to explore athlete performance
based on past and current data and provide projections with the purpose of improving
business decisions [56]. Sports analytics includes the processes of data collection and
management, computational methods, and forecasting modelling to utilize knowledge
extracted from sports data and support decision making [3].

Domain experts, managers, technical staff, owners, and players pay attention to
advanced analytics, with serious consideration given to key decisions based on data,
advanced metrics, Artificial Intelligence (AI) and technology. Sports analytics has
attracted considerable attention not only from the sports industry but also from the
business world in general because data analysis, machine learning (ML)
algorithms and technological methodologies serve as good examples of applying AI
approaches tailored to specific needs [14] and [57]. Sports industry
stakeholders can gain significant insights into team and player performance through
advanced analytics with the purpose of helping them decide faster and more accurately
during tough game/season moments. Consequently, a team can increase its chances of
winning. Team or player performance is the outcome of many parameters, such as
training and condition, psychology, body and mind preparation and other crucial factors.
Appropriate statistical analysis and critical review of findings with the assistance of
state-of-the-art technological tools and methods are important for success [58] and [59].

The term “Sports Analytics”, also referred to as “Statistics in Sports” in the literature,
comprises data collection and management, predictive modelling, and
computational methods to find valuable information for sport-related decision making
[60].

Alternatively, Sports Analytics is a scientific field that deals with the collection and
analysis of past and current sports data [61]. This collection combines and applies
methods that can give added value to a player or a team. Through this gathering and
investigation, these metrics can provide qualitative analysis to owners, players, coaches,
and team staff to help them predict future situations or make suitable decisions.

Sports data can be either qualitative or quantitative and come from various sources, such
as box scores, videos, and demographic, medical, and scouting reports. The data collected
should be standardized, integrated, and analysed through different basketball analyses
to enable decision makers to make critical decisions [15], [62].

One of the most significant topics in sports analytics is the identification of performance
analytics in teams and players. By analysing them, there is a direct impact on team
composition and on player evaluation and decision making by subject matter experts,
coaches, and technical staff [63].

Recently, there has been a highly increasing trend in sports analytics, recognized as one
of the hottest topics of analytics in general. There are many web pages and blog articles
but also a scarcity of credible peer-reviewed research articles. The challenge of Sports
Analytics is that domain experts need to combine their scientific research experience
with their sports professional career (as a player or coach) and understand how to
critically analyse important sophisticated basketball analytics [64].

The combination and application of appropriate tactics can provide added value to a
player or team to implement this critical initiative to make the right decision at the right
time. Through proper data gathering and data analysis, basketball analysts can provide
qualitative analysis to team owners, players, coaches, and technical staff to help them
predict future situations and make the right decisions to improve their performance [65].

The required skillset and influence in each sport include not only describing and
providing the data but also diagnosing, forecasting, and making critical reasoning.
Figure 2.1 illustrates the correlation between the influence of sports analytics and the
skills needed. There is a recent trend in sports analytics to move from descriptive (sports
reports) to prescriptive analytics with the purpose of understanding the game in more
depth. By using data modelling techniques and optimizations, the technical team and
subject matter experts can obtain valuable insights and recommendations [14]. Sports analytics
are also increasingly adopted by business ecosystems in a plethora of companies, sports
segments, teams, technical staff, and athletes [66].

Figure 2.1: Sports Analytics Skills vs. Impact correlation.

Generally, in sports, there are many features that determine charismatic talent, including
training time, past performance, sports IQ, and basic skills; these can be objectively
measured, such as reaction time, height, hand length, wingspan, body weight and many
other anthropometric or game-related characteristics. For instance, studies have shown
that players with long arms and better agility are selected more often in NBA draft
lotteries [46].

Sports businesses invest substantial amounts of money; therefore, every single data
column is important in decision making involving ticket pricing, roster selection,
opponent analysis, and many match day decisions [9]. Therefore, this field is a sport
science field that manages data retrieval as well as the analysis of past and current
advanced statistics [61]. The combination of Data Science (DS) and ML methods can
be a competitive advantage for a team and player in targeted milestones. Sports analytics
can provide quantitative and qualitative analyses to team owners, technical team staff
and players with the purpose of understanding in more detail the past, making proper
decisions in the current situation, and predicting future circumstances for maximizing
the accuracy of the goal [63].

Sports data can be retrieved from multiple sources (quantitative and/or qualitative). The
variety of sports data can include box scores, health, injury, videos, biometrics, and other
important metrics. The retrieved data can be integrated, standardized, and
analysed through diverse advanced sports statistics with the purpose of supporting
decision-making in critical situations [15], [62].

There is high complexity in the effort to correlate and analyse several types of sports
data resources, including analysis of tactics, video tracking, roster formation, injury
prediction, and physical performance measurement through Electronic Performance and
Tracking Systems (EPTS) [67].

2.2 Data Mining and Machine Learning Used in Sports

Data mining involves the discovery of patterns or rules from substantial amounts of data
and the process of searching for valuable information in the data [38]. Therefore, one or
more techniques can be used for automated analysis and knowledge extraction.
Furthermore, the purpose of the data analysis process is to solve the described
problems in the examined datasets. Sports teams use data mining methodologies for
interpretation or segmentation purposes, which will ultimately help them in decision
making. Assembling DM techniques and valuable information can boost a team and
provide a competitive advantage.

The classification of individual players or teams based on their performance can reveal
different perceptions or ways of play. Once the preferences in each position have been
decided and gameplay is set, the managers and coaching staff can ultimately understand,
drill down and analyse insights to choose the best option for each situation. Through
these advanced basketball analytics, this sophisticated approach can be personalized for
each team/player preference or performance [68]. With the implementation of these
concepts, procedures for optimized classification, segmentation, and forecasting
could be automated.

Some sophisticated algorithms that can be used in sports analytics are listed below:

1) Random Forest models are classification and regression models [69], [67] and [70].

2) AdaBoost is a classification algorithm that can be extended for regression [71], [72],
[73], [13] and [74].

3) A Multilayer Perceptron (MLP) is a network of perceptrons, which are neurons with
multiple inputs and one output [38], [13] and [71].

4) Radial Basis Function (RBF) Networks are built on a class of functions whose value
increases/decreases based on the distance from a central point [75], [76], [77] and
[78].

5) Association rule-based models are algorithms that extract hidden relationships
between items across different datasets. These studies are
discussed in [40], [71], [79], [80], [75], [81] and [82].
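
To make the building blocks behind items 3 and 4 more concrete, the following minimal Python sketch shows a single perceptron (several inputs, one output) and a Gaussian radial basis function whose value decreases with distance from a centre. The function names, the step activation, the Gaussian form and the example weights are illustrative assumptions and are not taken from the cited studies.

```python
import math


def perceptron(inputs, weights, bias):
    """Single neuron: several inputs, one binary output (step activation)."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0


def gaussian_rbf(x, centre, width=1.0):
    """Gaussian radial basis function: decays as x moves away from the centre."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-squared_distance / (2 * width ** 2))


# Hypothetical weights: the neuron fires when at least one input is active.
fires = perceptron([1, 0], [0.6, 0.6], -0.5)
silent = perceptron([0, 0], [0.6, 0.6], -0.5)

# The RBF response is largest at the centre and shrinks with distance.
at_centre = gaussian_rbf((0.0, 0.0), (0.0, 0.0))
far_away = gaussian_rbf((3.0, 4.0), (0.0, 0.0))
```

An MLP stacks many such neurons in layers, usually with smooth activations instead of the step used here, while an RBF network combines several radial units with different centres.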

At this point, multiple modelling approaches were evaluated and combined to provide the
optimum performance rating accuracy for players and technical staff. Past research has
used classification methods, clustering, or both. The state-of-the-art algorithms used in
sports analytics based on the literature include the following:

1) Neural Networks can be used for both classification and prediction purposes
(deep learning, dropout), as stated in [69], [72], [83], [75], [68] and [71].

2) Decision Trees are predictive models, as described in [84], [85], [78], [73], [9], [14],
[19], [86] and [87].

3) Bayesian Networks are probabilistic classifiers (Markov Blanket), as explained in
[72], [13], [60], [79], [88] and [74].

4) Support Vector Machines (SVM) are classification and regression analysis algorithms,
as described in [78], [89], [75], [13], [90] and [72].

5) Linear and Logistic Regression are models for team and player forecasting [91],
[60], [92], [93], [18] and [19].

6) Unsupervised learning through clustering is an approach for partitioning where the
centre of each cluster is represented by the mean value of its objects [63], [94], [95],
[96], [68], [56], [97], [98] and [99].
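
The clustering idea in item 6, where each cluster centre is the mean of its members, corresponds to the k-means procedure. The sketch below is a minimal pure-Python illustration under stated assumptions: the player data (age, points per game) are hypothetical, the initial centres are simply the first k points, and no feature scaling is applied. It is not the implementation used in the cited studies.

```python
from statistics import mean


def kmeans(points, k, iterations=20):
    """Partition points into k clusters; each centre is the mean of its members."""
    centres = [points[i] for i in range(k)]  # naive deterministic initialisation
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Update step: move each centre to the mean of its assigned points.
        new_centres = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centres.append(tuple(mean(dim) for dim in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged: centres no longer move
        centres = new_centres
    return labels, centres


# Hypothetical players described by (age, points per game).
players = [(21.0, 8.0), (34.0, 9.0), (22.0, 9.5), (33.0, 7.5), (23.0, 10.0), (35.0, 8.5)]
labels, centres = kmeans(players, k=2)
```

With these toy values the younger and older players end up in separate clusters. In practice, features measured on different scales would normally be standardized before clustering, and several random initialisations would be compared.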

2.3 Implementations of Machine Learning and Data Mining Techniques in Sports

Sports analytics is an emerging field that tends to be an essential tool in the sports
industry. It combines Data Science and Information Technology (IT). Current and
historical data are used to measure performance and forecast performance, and these
data provide a competitive advantage for decision making [100]. Sports teams and
Sports Data Scientists use DM methodologies for interpretation or segmentation
purposes in various fields, such as tactics, strategy and financial/budgeting management
for the team and roster selection in the short or long-term according to the team’s vision
[3].

Sports industry experts, including players, performance coaches, scouts, club owners,
managers, health staff, and technical staff, seriously consider any advanced statistics
supporting their analysis. Especially at the top level, every detail can have a
positive effect on player and team performance. Sports analytics uses DM and machine
learning (ML) techniques and algorithms to understand data in more depth, create
self-explanatory visualizations and help senior management in decision making and
storytelling behind raw data [14], [57], [101].

Sports data analysts should also validate and perform data preprocessing to address
noise [102]. Hence, the purpose of advanced analytics is to identify gaps and limitations
and to focus on performance improvements that lead to greater chances of victories. In
addition, Sports Analytics helps players and staff identify the critical path of a game or
game series for pace increment and decision accuracy during or after a match.
Performance estimation is a multivariate task that depends on physical, mental, social,
economic, psychological and shape variables and contains important levels of
uncertainty in correlations; this hinders forecasting. However, the aim of analytics is to
minimize the risk and bias of player and team performance. Data Science requires good
statistical analysis and data interpretation with the aid of technology to lead to success
[58], [59].

Each sport demands specific body characteristics and talent charisma, but the most
significant ingredients are hard training, commitment to the game and sports IQ
(Intelligence Quotient). Technical staff searches and measures biometric connections
related to height, weight, agility, speed, hand length, stamina and other important
characteristics that can maximize the performance of each player and have a greater
impact on the game, which could drive a team to more wins and achievements. Previous
studies have shown that some of the higher-ranking characteristics that influence rookie
selection in NBA league draft lotteries are height, agility, and long arms. The taller a
player is (with suitable talent of course), the greater the chances of being selected in
drafts are. The 1st player pick on the rookie scale earns over $8 million in salary for the
first season, according to 2020-21 earnings [46], [103], [104], [105], [106].

Studies in different sports (such as NFL and tennis) have tried to identify the average
athlete body profile in each sport, as illustrated in Figure 2.2. The average NBA player
is 201 cm tall, weighs 101 kg and has a Body Mass Index (BMI) of 24.85, while
the average healthy man is 175 cm tall, weighs 82.5 kg and has a BMI of 26.66
[107], [108].
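
BMI is defined as body mass in kilograms divided by the square of height in metres. The minimal Python sketch below applies this definition to the average height and weight quoted above. The function name `bmi` is illustrative. The results are close to, but not identical with, the reported BMI figures, presumably because an average of individual BMIs differs from the BMI of average measurements. This explanation is an assumption and is not stated in the cited sources.

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


nba_player = bmi(101, 201)     # about 25.0, versus the reported 24.85
average_man = bmi(82.5, 175)   # about 26.9, versus the reported 26.66
```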

Figure 2.2: Average body profile of professional players in various sports compared
with that of an average man.

The sports industry recently focused on Big Data, not only for box-score statistics cards
but also in segments, such as opponents’ analysis, scouting analysis, ticket pricing,
roster management, training data, health data, and social network analytics, in addition
to advanced performance analytics that support decision making [9].

Sports analytics assists businesses with the following activities: i) retrieval, ii)
collection, iii) transformation and cleansing, iv) analysis, v) visualization and vi)
forecasting, using past and current statistics together with statistical methods and/or
Data Science and ML algorithms. Sports data scientists process this qualitative and
quantitative information to provide owners and technical staff with tools that offer a
competitive advantage and optimize performance, tactics, and strategy [15], [61],
[62], [63], [67].
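
The six activities listed above can be read as a simple processing chain. The sketch
below is one hypothetical way to express that chain; the records, the field names and
the moving-average forecast are illustrative assumptions, not the methods used in the
cited studies.

```python
from statistics import mean

# i-ii) Retrieval and collection: hypothetical raw box-score rows.
raw_games = [
    {"player": "A", "points": "21"},
    {"player": "A", "points": "18"},
    {"player": "A", "points": None},  # missing value to be cleansed
    {"player": "A", "points": "27"},
]


def clean(rows):
    """iii) Transformation and cleansing: drop missing values, cast to int."""
    return [int(r["points"]) for r in rows if r["points"] is not None]


def analyse(points):
    """iv) Analysis: simple descriptive statistics."""
    return {"games": len(points), "mean": mean(points), "max": max(points)}


def visualise(points):
    """v) Visualization: one text bar per game (stand-in for a real chart)."""
    return ["#" * (p // 3) for p in points]


def forecast(points, window=2):
    """vi) Forecasting: naive moving average over the last `window` games."""
    return mean(points[-window:])


points = clean(raw_games)
summary = analyse(points)
bars = visualise(points)
next_game = forecast(points)
print(summary, next_game)
```

In practice each stage would be far richer (multiple sources, validated schemas, trained
models); the point here is only the order of the stages.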

Health pathologies and injuries are events with an important level of bias due to the
complexity and interdependencies that exist in this multivariate model. One of the most
famous methodologies for performance forecasting in basketball advanced analytics is
the CARMELO method, which combines regression statistics and data analysis with
input Box Plus Minus (BPM) and Real Plus Minus (RPM) statistics to estimate future
player and team performance. However, a limitation of this methodology is that it does
not include attribute correlation when modelling aggregated/combined injuries; it
includes psychological, biometrical, training, video analysis, and ergometric analytics
[3], [109], [110], [111].

Our recent study indicated a high correlation between injury and performance
analytics; one of its aims was to correlate qualitative and
quantitative data (for instance, box-score statistics and health and injury analytics) from
multiple sports data sources [109]. Sports analytics comprises the tools and methods used
for performance optimization on and off the court, with the purpose of identifying the
efficiency of teams and players. Furthermore, through these approaches, decision
makers can propose the ideal team line-up or the pairs of players who are most productive
[112].

2.4 Player Injuries in Sports and Data Analysis Techniques

Sports injuries have been one of the biggest challenges in athletics for many years. The
identification and risk assessment of this uncertainty factor is crucial not only for
team selection but also, most importantly, for economic reasons, because injuries are
costly for sports organizations. Therefore, avoiding, limiting, or even accurately
predicting player injuries can lead to large cost savings and help teams improve their
performance. This literature review examines the benefits of sports analytics for injury
analytics associated with performance, the effect of sports injuries and the most crucial
injuries that influenced basketball performance in the NBA over the last decade
(2010-2020).

One of the most crucial factors directly related to player and team performance is
injury. Health and injury analytics are vital for understanding the past and
also for further analysis, which can lead to team success. Hence, sports analytics
can incorporate player health data with the purpose
of avoiding illness and injury in similar future circumstances [14].

Data Mining (DM) involves the discovery of rules and patterns from substantial
amounts of data, as well as the process of searching for valuable information within the
data [38]. Therefore, one or more techniques can be used for automated analysis and
knowledge extraction from data. Furthermore, the process of analysing the examined
datasets with the purpose of solving the described problems of performance and injury
association is described in the findings and discussion section. Sports teams use DM
methodologies for interpretation or segmentation purposes, which will ultimately help
them in decision making. Assembling DM techniques with valuable information can
boost a team and provide a competitive advantage against an opponent who does not
use these technological and scientific methodologies ([5] and [39]).

The classification of individual players or teams based on their performance can reveal
different perceptions or ways of play. Once the preferences in each position have been
determined and gameplay has been established, the managers and coaching staff can
ultimately understand, drill down and analyse insights and choose the best option for
each situation. Through these advanced basketball analytics, this sophisticated approach
can be customized and personalized for each team/player preference and/or performance
[68]. Hence, the implementation of these concepts can automate such procedures for
optimized classification, segmentation, and forecasting.

Soccer is an innovator in predicting the likelihood of injury through biomedical
tracking and monitoring technologies, with a focus on workout
performance, injury rate, injury history and odds of injury. Other sports
(such as American football, cricket and basketball) still require significant investment
in this area, because it is a cost-saving factor over the long run of a team or a
player's career [113].

Sports business innovation can be applied to monitoring and tracking player health,
which leads to a reduction in injuries. Hence, athletes can prolong their careers
at maximum efficiency and performance. The best players on a team,
especially All-Stars, invest enormous amounts of money in body recovery before and
after matches, and even more when they suffer an injury. The career longevity of a
player at that level is crucial, and he or she needs special care in terms of training,
physiotherapy, diet, nutrition, and everything relevant to body care. For example, one
of the greatest players in the NBA, LeBron James, spends $1.5 million per
year on taking care of his body, including training, massage, therapists, gyms,
appliances, chefs, and other relevant costs [114].

2.5 Major injuries that influence performance in the NBA League


In the last two decades, many studies have been conducted with the purpose of analysing
injuries or general health pathologies that require physician referral, medication,
emergency care or both and that cause games or practices to be missed. The majority of
the studies examined all types of injuries, as well as general health pathologies ([115],
[33] and [34]), whereas some studies examined only one type of injury, such as meniscal
injuries ([116]), pelvis-hip-thigh injuries ([35] and [32]), facial injuries ([117]) and
ankle injuries ([118]). Additionally, most of the studies analysed league injuries among
all teams and players ([32], [33], [34], [115], [116] and [35]), but only one study
included 53 NBA players [117].

In general, the data were collected through the National Basketball Athletic Trainers'
Association (NBATA) online database [119], [33], [34], [116], [35] and [120], while some
authors used publicly available records [115] and [32]. The number of seasons
analysed in the literature varies widely: injuries were examined over one season [115],
3 seasons [120], 10 seasons [33] and [32], 17 seasons [34], 21 seasons [116],
24 seasons [35] and 34 seasons [117]. In addition, the whole
NBA season (preseason, regular season and playoffs) was analysed by [120].

The injury rates were examined according to age and anthropometric characteristics
(height, weight and body mass), the player’s position, the exposure against an opponent,
the body region, the mechanism of injury (contact or non-contact), the phase of the season,
the days missed, and the years of participation in the league, as well as the NBA experience
according to the seasons [115], [33], [32], [34], [116] and [35]. Similarly, the influence
of injuries on player performance was assessed in terms of game availability, career
longevity and box-score statistics (points, rebounds, assists, steals, etc.) across the
playing periods of the season [32], [117] and [120].

According to the results of previous studies, players older than 30 years
sustained more injuries throughout the season [115] and [32]. Regarding
player position and injury prevalence, the literature showed that “Forwards”
presented a greater number of injuries per 1000 players, while “Centers” missed the
most playing time [115]. However, for adductor injuries, “Guards” were injured more
frequently than “Forwards” or “Centers” (49% vs 25% vs 25%, respectively) [32].

The lower extremity was the most commonly injured body area, in particular the
knee, ankle, and foot joints [115] and [33]. Moreover, patellofemoral inflammation was
the type of injury that caused the most days missed, followed by ankle injuries and knee
sprains [33] and [34].

Regarding the studies that exclusively examined one region of the human body, they
showed that in the years 1988-1989 through 2011-2012, the quadriceps group (hip
region) was the most commonly injured structure and had a significantly greater rate of
game-related injury than the other structures. Furthermore, strains were most common
in the first month of the season, and the greatest risk of strains occurred during the eighth
year of the league, as shown in a previous study [35]. Another study [32] indicated
that following adductor injuries, NBA players returned to play after an average of
16 to 17 days, or 7 to 8 missed games. Correspondingly, such injuries did not affect
player performance, game availability or career longevity. Similarly, facial fractures do
not cause a significant decline in performance regardless of whether they involve
operative or nonoperative management [117]. Finally, ankle sprains affected 26% of the
NBA players on average during each season, which is the reason for the considerable
number of missed NBA games on aggregate. Most ankle sprains occurred during games
(n = 565, 71.0%), and they involved contact mechanisms of injury [118]. Several studies
have investigated the application of advanced ML techniques to predict player injuries
based on box-score basketball analysis, SportVU camera data, player physio metrics
and workload management [121].

Long periods of missed games have a negative impact on players’ and teams’ performance.
Therefore, an early warning to a team’s decision makers about an upcoming injury could be
extremely valuable in both medical and economic terms. Unfortunately, injuries are caused
by various random events (exogenous or endogenous), and their prediction involves a high
level of uncertainty [16]. Medical and injury analytics depend on
multivariate factors, which makes prediction difficult. According to physiologists and
sports data analysts, injury forecasting is uncertain because it can be associated with
factors such as sleep duration and psychological, social, and nutritional status, in
addition to training and physical activity [14].

Athlete injuries are one of the greatest challenges in sports. Risk identification and
assessment of this uncertainty are crucial not only for the sports industry regarding
rotation and team selection management but also, most importantly, because of the
economic impact of injuries. Injury avoidance, limitation, or even
accurate prediction can lead to large cost savings and improved performance.
Therefore, this work examines the influence of sports analytics and DM on finance and
performance in basketball with respect to age, player position and injury status. The
literature was reviewed to examine the benefits of sports injury analytics
associated with performance, the effect of sports injuries and the identification of crucial
injuries that influenced performance in the NBA over the last decade, 2010–2020.

2.6 Socioeconomic Influence

The sports industry has grown intensely in recent years. The estimated worldwide
market size is more than $1.3 trillion, with more than 1 billion fans supporting their
favourite sport and/or team. This audience derives from a variety of sources, such as
betting and entertainment from TV, internet broadcasting or live games in stadia [122].

Soccer, basketball, and baseball are the leaders in sports analytics, providing innovative
technological solutions for understanding the game in more depth and for forecasting
purposes. Currently, the sports industry uses wearables, SportVU cameras, drones,
sensors, and other technologies to monitor every step and process,
such as training, injury, recovery, workout, and game spot checks. Therefore, sports
organizations allocate a significant share of their budget to sports analytics with the
purpose of improving their strategy, understanding their limitations, better scouting their
opponents and, ultimately, avoiding extra costs [113], [6], [123].

Top players take care of any small detail before and after each game. Hence, training,
body recovery, nutrition, injury physiotherapy, strategy, tactics, and skill development
are some traits for career improvement, attracting large investments from players. One
of the best NBA players in the last decade, LeBron James, pays $1.5 million per year for
his body preparation, recovery, nutrition, training and all the appropriate steps needed
to stay at the top level of performance. Therefore, athletes use technology and the
benefits of sports analytics to minimize any upcoming risk (such as injury or health
problems) with the purpose of being efficient and healthy and extending their
professional career as long as possible [114], [124].

Basketball is a contact team sport, and all sports of this type are prone to injuries.
Injuries are not only costly for club performance, implying a higher probability of losing
games, but also increase financial expenses and may cause problems in teamwork. For
instance, in the 2016-17 season of the English Premier League (football), injured players
cost their teams more than £175 million. Artificial Intelligence
(AI), DM and ML techniques and algorithms are used for prediction, including
the estimation of performance and financial benefit [125].

For more than 20 years, sports experts have used sophisticated statistics to explain the
results and performance of these methods. Sports analytics gained a reputation with the
story/movie of Moneyball ([126], [127]) by changing the mode of player evaluation in
terms of game impact but also helping to enhance the strategy and tactics in defence and
offense before, during and after the game. The purpose behind this approach is to
retrieve valuable information through combined analytics and let club owners, managers
and players make tough decisions, which include small but vital details, and try at the
same time to minimize any upcoming risk [19].

As the salaries of players and team budgets increase over time, there is a need to gather
additional data and analyse them properly under different conditions to support effective
decision making and cost reduction. The right choice of players and good team
management can alleviate additional costs and provide stability to players and team
performance [54].

Annual salaries in the NBA can range between a few thousand dollars and tens of
millions of dollars. This depends on a variety of considerations, such as performance,
team, marketing, age, career peak and other factors. However, most professional NBA
players earned a salary of almost $5 million in the 2019-20 season. The salary
of a player can fluctuate, as players on crossover agreements with the D-League (the
subsidiary competition of the NBA) earn $5,000 depending on the matches that they play.
The salary cap for NBA players is close to $100 million, even though currently nobody
earns anywhere near that amount. The top players in the NBA earn nearly $40
million. The team salary cap for the 2019–20 season was approximately $109 million.
When teams exceed this amount, they have to pay luxury tax [128],
[129], [119].

2.7 Sports Analytics and Text Mining NBA Data to Assess Injury
Recovery and Economic Impact

Sports analytics have found a strong position in the realm of sports, particularly in
predicting potential setbacks and optimizing team and player performance. This is
evident in diverse applications, from aiding team management in estimating and
mitigating injury risks to assisting betting companies in forecasting player and team
performance. The overarching goal remains consistent: reduce associated costs and
elevate the performance metrics, be it of an individual player or the entire team [3],
[130], [131], [132]. Engaging in sports has numerous advantages but also leads to
injuries. Several parameters, such as age, sex, sport nature (contact or noncontact), and
training intensity, govern the susceptibility of athletes to injuries [133], [46].

According to the National Basketball Association (NBA), injuries are a significant concern,
as they can potentially detrimentally impact player performance. While advancements
have been made in prevention and rehabilitation, injury rates remain alarmingly high.
This unpredictability in injury occurrence can be attributed to several factors, many of
which are challenging to pinpoint and measure. Furthermore, the repercussions of such
injuries are not only felt by players but also by management, fans, teams, and their
overall dynamics [134], [135].

Basketball's universal appeal, much of which can be credited to the NBA's prominence,
spans from casual games to professional tournaments. The influence of the NBA is
undeniable, impacting not only the sport's popularity but also the socioeconomic
narratives of players, teams, and the league itself [136], [137].

The proliferation of big data has revolutionized how sports professionals, from trainers
to healthcare experts, approach injury management. With the advent of biometric
tracking tools such as accelerometers, RFID devices, HR monitors, and GPS wearables,
the ability to diagnose team or player vulnerabilities has improved. These technologies
play pivotal roles in refining performance, curtailing injury risks, and accelerating the
recovery trajectory [109], [37], [138].

An injury can have a significant impact on the economic data of a team. A key player
sustaining an injury can cause a decline in team performance and ultimately affect
revenues and profitability, as it can lead to a decrease in ticket and merchandise sales.
If a player is not able to fully recover from an injury, this can cause a decline in their
performance on the court and ultimately affect the team's overall performance [139],
[140], [141], [142].

Another study reviewed in-depth injury analysis from the 2017-2021 NBA seasons.
There was a rise in injuries and games missed, with the highest injury rates for guards
and athletes with 6-15 years of experience. This study underscores the necessity of
further research to understand and reduce injury risks in professional basketball [143].

An injury to a key player can also lead to a decline in revenue from sponsorships and advertising.
Additionally, injuries can also affect a team's salary cap, which is the amount of money
that the team has available to spend on player salaries. If a player sustains an injury,
they may not be able to perform at the same level as they did before, which can affect
their future salary and contract negotiations. This can also affect the team's ability to
sign new players, as they may have less money available to spend on player salaries.
Overall, an injury can have a significant impact on the economic data of a team. While
the immediate effect may be a decrease in attendance, merchandise sales and revenue
from sponsorship, the long-term effect can have a negative impact on a team's overall
performance, attendance, merchandise sales, revenue from sponsorship, salary cap and
ability to sign new players [144], [58], [90].

Researchers have investigated the use of a new ML approach to detect injury risk factors
in young team sport athletes [136]. The results showed that the novel approach was
effective at detecting injury risk factors and could be a useful tool in injury prevention
and risk management.

Several researchers have investigated the knee movement patterns of injured and
uninjured adolescent basketball players when landing from a jump. The authors
conducted a case-control study and reported that injured players had different knee
movement patterns than uninjured players when landing. The results suggest that certain
knee movement patterns may contribute to an increased risk of injury in adolescent
basketball players [145], [146].

In a study, researchers investigated the use of ML in classifying the integrity of articular
cartilage in the knee joint. They used near-infrared spectroscopy to gather data on the
biochemical composition of cartilage samples. Afterwards, they applied ML algorithms
to the data to classify the cartilage samples into categories of different levels of integrity.
The results showed that the ML approach was able to accurately classify the samples,
suggesting that the ML approach could be a useful tool for detecting and monitoring
cartilage degradation in the knee joint [147], [148], [149].

The research [150] aims to explore the multidimensional impact of injuries in
professional basketball, with a focus on player performance, team dynamics, and
economic outcomes. By utilizing text mining techniques on NBA data, this study
examined the complex relationship between injury and performance metrics, revealing
the significance of certain anatomical subareas. The analysis also revealed the
substantial economic burden that injuries impose on teams, highlighting the need for
comprehensive, long-term strategies for injury management. The broader objectives of
this research are to provide valuable insights into the distribution of injuries and their
varied effects, which are crucial for developing effective prevention strategies and
economic approaches in basketball. By illuminating the influence of injuries on
performance and recovery dynamics, this study offers comprehensive insights that are
beneficial to NBA teams, healthcare professionals, medical staff, and trainers. This
contributes to enhancing player care and optimizing performance strategies. This
manuscript methodically explores the connections between injuries and player
performance, the economic repercussions for teams, and the efficacy of modern data
science methodologies in addressing these challenges. By synthesizing data-driven
analysis with practical considerations, this paper aims to contribute a comprehensive
perspective on the complexities and nuances of managing sports injuries in the high-
stakes environment of professional basketball.
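
As a rough illustration of how free-text absence notes might be mapped to anatomical
subareas, the sketch below uses simple keyword matching. The note strings, the keyword
dictionary and the function name `classify_note` are hypothetical and are not taken from
the datasets of this research; the actual text mining procedure may differ substantially.

```python
import re
from collections import Counter

# Hypothetical keyword map from anatomical subarea to trigger words.
BODY_AREAS = {
    "knee": ["knee", "acl", "meniscus", "patella"],
    "ankle": ["ankle"],
    "foot": ["foot", "toe", "heel"],
    "hip/thigh": ["hip", "groin", "hamstring", "quadriceps", "adductor"],
}


def classify_note(note: str) -> str:
    """Return the first matching anatomical subarea, or a fallback label."""
    # Whole-word tokens avoid false hits such as "hip" inside "championship".
    tokens = set(re.findall(r"[a-z]+", note.lower()))
    if {"coach", "decision"} <= tokens:
        return "coach decision"
    for area, keywords in BODY_AREAS.items():
        if tokens & set(keywords):
            return area
    return "other/unknown"


# Illustrative notes, invented for this sketch.
notes = [
    "Left knee soreness",
    "DNP - Coach's Decision",
    "Sprained right ankle",
    "Torn ACL, out for season",
    "Rest",
]

counts = Counter(classify_note(n) for n in notes)
print(counts)
```

Counts of this kind could then be joined with games missed and salary records to estimate
the cost associated with each subarea, which is the type of relationship the paragraph
above describes.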

Figure 2.3 presents the methodology followed in this research. Each step in
the process is clearly delineated, providing a concise overview of the sequence from the
data collection to the final analysis phases, including injury- and salary-data
transformation.

Figure 2.3: Structure and flow of the methodological approach.

Additionally, the datasets included comment-based metadata explaining player
absences, including coaching decisions and injuries. A schema summary of the datasets
used is given in Table 2.1.

Table 2.1: Datasets based on player performance metrics (regular and playoff season)
and injury and salary data for the period from 2000 to 2023.
Name                            Type              # Records   # Features
Player performance statistics   Regular           733,193     132
Player performance statistics   Playoffs          48,213      132
Injury data                     On and off game   58,151      4
Salaries data                   Per season        15,365      4

2.8 Injury Patterns and Impact on Performance in the NBA League
Using Sports Analytics

In the domain of professional sports, particularly in the National Basketball Association
(NBA), the advent of Sports Analytics (SA), Data Science, Machine Learning (ML),
and Data Mining has revolutionized the understanding and prevention of
musculoskeletal injuries. These innovative methodologies, which involve the
comprehensive collection and analysis of data pertaining to athletes' movements,
training patterns, and injury histories, are instrumental in identifying risk factors for
knee injuries. This, in turn, facilitates the development of targeted interventions to
mitigate such occurrences, a critical advancement in athlete health and performance [3],
[109], [151], [152], [153], [154].

The musculoskeletal system is a complex structure that plays a vital role in human movement
by carrying a significant amount of body weight, allowing for a wide range of motion
in six degrees of freedom [155]. Recent advancements in AI and ML algorithms have
made significant strides in predicting and diagnosing injuries, marking a paradigm shift
in how sports science approaches injury prevention and management. These
advancements not only aid in better understanding injury mechanisms but also play a
pivotal role in enhancing athlete outcomes [136], [156], [157], [158], [159].

Overall, the use of SA can lead to a better understanding of the causes of
musculoskeletal injuries and help to develop more effective strategies for preventing
these injuries. By identifying risk factors and protective strategies, it is possible to
reduce the incidence of musculoskeletal complications and ultimately improve athletes’
health and performance [160], [150], [161].

The NBA, a league where physical prowess and high-intensity play are paramount,
presents unique challenges in terms of injury prevention and management. Injury risks
in the NBA are multifaceted, stemming from factors such as improper technique during
physical activities, muscle imbalances, poor flexibility, and weak knee stability.
Addressing these risks requires a comprehensive approach combining physical
evaluations, strength and flexibility assessments, and a review of injury history. Such
detailed assessments pave the way for specific interventions that reduce injury risk and
enhance overall performance [162].

In particular, targeted strength training and technique refinement are crucial in
improving knee stability and reducing injury risks. Incorporating stretching and
flexibility exercises into athletes' routines can further bolster this preventive strategy.
Proactively addressing these risk factors is not merely about mitigating injury risks; it is
about enhancing overall performance and career longevity in the demanding world of
professional basketball [136], [142], [109].

Athlete injuries are a major concern in the world of sports, as they can significantly
impact an athlete's performance and career. One of the most common types of
sports-related knee injury is a tear in a ligament that connects the thigh
bone to the shin bone and provides stability to the knee joint. Injury risk is
influenced by several factors, including improper technique during physical activity,
muscle imbalances, poor flexibility, and weak knee stability. These factors can
increase stress on the body and the likelihood of an injury occurring. To
minimize the risk of injuries, it is essential to identify and assess the associated
risk factors. This can be done through a combination of physical
evaluations, strength and flexibility assessments, and injury history review. An
epidemiological study examined basketball injuries during one competitive season in
professional and amateur Spanish basketball. The authors aimed to evaluate the
incidence and types of injuries in Spanish basketball players and to compare injury rates
between professional and amateur athletes. The study found that ankle sprains were the
most common injury, followed by knee injuries. Professional players had a higher
overall injury rate than amateur players, but amateur players had a higher rate of knee
injuries [160].

One study presents a deep learning approach to forecasting injuries in the NBA,
highlighting the challenges and nuances of dealing with imbalanced injury datasets. This
study emphasizes the importance of correctly splitting data to avoid overfitting and
proposes a novel model, METIC (Multiple bidirectional Encoder Transformers for
Injury Classification), for assessing injury classification. The model is designed to
process sequences of data related to past injuries and games, offering insights into risk
factors and the potential for predicting future injuries based on player history and game
loads [130].
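
One common way to split injury data without leaking information between training and
test sets is to keep all records of a given player on the same side of the split. The
sketch below shows such a player-grouped split on hypothetical records. It is an
assumption about what correct splitting could look like, not the procedure of the cited
study, and the function and field names are invented for illustration.

```python
import random


def grouped_split(records, group_key, test_fraction=0.25, seed=0):
    """Split records so that no group (e.g. player) appears in both sets."""
    groups = sorted({r[group_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in records if r[group_key] not in test_groups]
    test = [r for r in records if r[group_key] in test_groups]
    return train, test


# Hypothetical, heavily imbalanced game-level records (injuries are rare).
records = [
    {"player": f"P{p}", "game": g, "injured": int(p == 3 and g == 9)}
    for p in range(8)
    for g in range(10)
]

train, test = grouped_split(records, "player")
train_players = {r["player"] for r in train}
test_players = {r["player"] for r in test}
print(len(train), len(test), train_players & test_players)
```

With labels this imbalanced, one would additionally check that both sets contain injured
cases; this sketch does not enforce that.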

The knee joint, with its complex anatomy and biomechanical functions, epitomizes the
challenges faced in sports medicine. Supporting a significant portion of body weight, it
enables a wide range of motion and is crucial for the high-level performance expected
of NBA athletes. Understanding the knee's intricate structure—including bones, joint
capsules, muscles, tendons, and ligaments—is vital in comprehending injury risks and
devising effective prevention strategies [155].

Pivoting to epidemiological insights, studies have revealed intriguing patterns in NBA
injuries. Players who sustain injuries tend to be taller and heavier, often playing in
forward or center positions. This correlation between physical attributes and injury risks
underscores the complexity of sports injuries, necessitating ongoing research for a
comprehensive understanding [163], [46], [164], [165]. There is some research that
suggests there may be a connection between musculoskeletal injury and certain
characteristics of NBA players. One study found that NBA players who sustained an
injury were more likely to be taller and heavier than players who did not sustain an
injury. Additionally, the study found that players who sustained an injury were more
likely to play the forward or center positions, which are positions that involve more
jumping and pivoting movements [166]. Further studies have highlighted the
multifactorial nature of injuries in NBA players, emphasizing the need for continued
research to unravel the intricate web of contributing factors [167], [140], [168].

One study conducted an epidemiological retrospective analysis of the NBA
seasons from 2017–18 to 2020–21, detailing injury frequency, characteristics, and their
impact on performance, including the effects of the COVID-19 pandemic on the league's
schedule and injury rates. This study utilized publicly available data from the NBA's
official website, focusing on the official game box scores to track inactive players and
injuries. The research aimed to provide new insights into injury trends and their
implications for player performance and team strategies [143].

Another study identified the incidence of musculoskeletal injuries in the NBA from the
2006-2007 to the 2011-2012 seasons. The study found a total of 24 knee
injuries during this period, with most injuries occurring during games. The study
also found that players who sustained an injury were more likely to be taller and heavier
than players who did not sustain an injury. These studies and others have
shown that injuries tend to occur more frequently during games than during
practices or other team activities. However, it is hard to conclude whether the majority
of injuries occur during the fourth quarter specifically, because the sample sizes of the
studies are relatively small and it is not clear whether they report that information.
Overall, knee injuries in NBA players tend to occur most commonly during games and
during the competitive season, possibly with a higher incidence in the fourth quarter.
While the exact cause of injuries is still not fully understood, it is believed that
factors such as fatigue, overuse, and the intense physical demands of the sport may all
contribute to the high incidence of these injuries [169].

A pivotal study examining NBA player performance markers before and after severe
lower extremity injuries—including ankle, knee, and hip injuries—between 2008 and
2019, provides critical insights. The study found a notable decline in performance levels
in less than half of the players within two years post-injury, signifying the profound
impact such injuries can have on a player's career trajectory [133].

Other experimental designs have combined 3D motion analysis with physical data such as
sex, body mass index, hamstring flexibility, knee joint laxity, medial knee
displacement, height, ankle plantar flexion at initial contact, leg press one-repetition
max, and knee valgus at initial contact [136]. Such approaches contribute further to the
understanding of injury mechanisms. Data from inertial sensors worn by rugby players,
for instance, have been instrumental in differentiating between healthy and post-ACL
injury states [156]. Additionally, the application of ML to diagnosing anterior cruciate
ligament (ACL) tears from magnetic resonance imaging (MRI) represents a significant leap
in injury diagnostics [157], [158], [170].

A systematic review focusing on the epidemiology of sports injuries in basketball has
shed light on injury prevalence and characteristics. This comprehensive analysis
revealed that ankle and knee injuries are the most common, with a higher incidence
observed in male players. The predominance of injuries during games as opposed to
practice sessions highlights the urgent need for effective injury prevention strategies
tailored to the high-stakes environment of competitive play [171]. Interestingly, post-
ACL surgery performance analysis revealed that while there was an initial decline in
performance, many players were able to regain their pre-injury levels, suggesting that
ACL reconstruction does not significantly impede NBA players' career longevity or
performance [172].

Further emphasizing the importance of epidemiological studies, research tracking
basketball players over a season has offered valuable insights into the frequency and
types of injuries, with a focus on ankle sprains and overuse knee injuries. Such findings
are instrumental in informing targeted prevention efforts [23], [173].

In summary, the study on the return to performance after severe ankle, knee, and hip
injuries in NBA players, utilizing NBA data to analyse player performance pre- and
post-injury, underscores the significant impact of severe injuries on players'
performance and career paths. This research, along with ongoing studies, continues to
shape the understanding of injury patterns and prevention strategies, ultimately aiming
to safeguard the health and longevity of athletes in the high-stakes world of professional
basketball.

This study examines injury patterns in the NBA and their effects on player performance.
Utilizing a unique dataset, it identifies common injuries and their impact on players
post-recovery. The study's innovation lies in its integration of injury data with
performance metrics and salary information, offering new insights into how injuries
influence both economic and on-court performance. This approach not only uncovers
patterns in injury periodicity and seasonality but also examines the specific impacts of
injuries on players' per-game performance. The findings provide valuable contributions
to sports analytics, guiding injury prevention strategies and enhancing player welfare in
the NBA.

3. Data and Methods

The "Data and Methods" section of this thesis provides a foundational framework for
understanding the complexities and nuances of sports analytics within the context of
professional basketball. The survey was structured around a series of research questions
designed to dissect various aspects of basketball performance evaluation, the
optimization of rating techniques, the understanding of the impact of analytics, and the
identification of dominant attributes for predicting awards such as the Most Valuable
Player (MVP) and Defender of the Year. This research extends to exploring the
prevalence of health pathologies and injuries in the NBA, their impact on player and
team performance, the cost of health issues, and the potential for text mining to reveal
patterns related to injuries.

The primary aim of this study is to critically evaluate and enhance the existing
performance analytics used in European and NBA basketball, thereby allowing teams
to gain a competitive advantage. This involves reviewing sophisticated performance
metrics and comparing them to identify trends and patterns that may have been
overlooked. Another critical aim is to quantify player performance attributes to increase
forecasting accuracy, especially in light of the unpredictable nature of sports where luck
and skill intertwine.

Addressing the complexity of sports and the vast amounts of unstructured data it
produces, this section outlines the methodologies employed to analyse and interpret
these data effectively. The study emphasizes the importance of understanding team and
player performance to make appropriate decisions, with a focus on reviewing and
comparing basketball performance analytics worldwide. The research also seeks to
associate basketball performance with advanced analytics on demographics, economics,
and injuries to find valuable insights for players, teams, and decision-makers.

The data for this comprehensive study were meticulously collected from various
accredited basketball online sources and subjected to rigorous preprocessing methods to
ensure quality and relevance. The methodology encompasses data collection,
preprocessing, analysis, and result evaluation. Advanced data analytics and text mining
techniques were applied to study the influence of sports injuries on athlete performance,
recovery dynamics, and economic implications for basketball teams.

3.1 Research Questions

Throughout this PhD thesis, a comprehensive set of research questions was addressed,
examining various aspects of data science in sports analytics, with a particular focus
on basketball performance and injuries. These questions span a wide range of
inquiries, from understanding the evaluation and optimization of player and team
performance metrics to exploring the economic implications of injuries on basketball
teams. The following questions guide my study.

Through RQs 1-4, the study investigated the background and advanced basketball metrics
used in National Basketball Association (NBA) and Euroleague games. The purpose is to
benchmark the performance analytics used in the literature for evaluating teams and
players. Understanding basketball in depth requires enumerating the full set of
parameters, so that strategy and decisions can be analysed while minimizing
unpredictability. This provides valuable information on team and player performance
analytics that supports a better understanding of the game.

Furthermore, these analytics can be used for team composition and athlete career
improvement, and to assess how these could inform future predictions. Hence, critical
analysis of these metrics is a valuable tool for domain experts and decision makers to
understand strengths and weaknesses in the game, to better evaluate opponent teams, to
optimize performance indicators, to use them for team and player forecasting and,
finally, to make better choices for team composition [3].

1. How does performance evaluation happen for basketball players and teams?
(RQ1)

2. How can these ratings, techniques and methodologies be optimized? (RQ2)

3. How can the impact of basketball performance analytics be understood, and how
can the correlations between them be identified? (RQ3)

4. How can we identify the dominant attributes for the prediction of Most Valuable
Player (MVP) and Defender of the year? (RQ4)

For RQs 5-8, the literature was reviewed to identify important attributes correlated
with injuries and to quantify their impact on player and team performance, using
analytics from the National Basketball Association (NBA) from 2010 to 2020. The review
also provides an overview of the Machine Learning (ML) and Data Science (DS) techniques
and algorithms used to study injuries. Additionally, it provides information for
coaches, sports and health scientists, managers, and decision makers to recognize the
most common injuries and investigate possible injury patterns during competitions. The
study identified the teams and players who suffered the most, and the types of injuries
requiring more attention. Injuries and pathologies were also found to have a high
impact on performance; musculoskeletal impairments are the most common ones that lead
to decreased performance [109].

5. Which were the most common types of general health pathologies and injuries in
the NBA League during the last decade? (RQ5)

6. Which teams and players suffer from the most injuries? (RQ6)

7. How did injuries affect NBA players’ team performance from 2010 to 2020?
(RQ7)

8. Which are the key Data Science methods used for injury analysis? (RQ8)

The aim of RQs 9-11 is to identify key attributes and metrics that affect basketball
player performance and income. The focus was on player age, position and injuries,
using analytics for the National Basketball Association (NBA) league from 1996 to 2020.
The proposed approach uses feature selection methods to estimate injury costs, and
clustering on age and player position in association with advanced basketball
performance analytics. Players demonstrate peak performance between 27 and 29 years of
age but receive the highest salary between 29 and 34. According to the findings, half
of the financial costs in the NBA league caused by health problems and pathologies stem
from musculoskeletal injuries. The segmentation, clustering and classification methods
and techniques used in this study can help sports industry professionals to formulate
strategy, make better decisions, and reduce costs [174].

9. What is the cost of NBA league player health pathologies and injuries? (RQ9)

10. At which age do NBA league players peak in terms of performance? (RQ10)

11. At which age do NBA league players reach their maximum annual salary? (RQ11)

RQs 12-15 aimed to explore the multi-dimensional impact of injuries in professional
basketball, focusing on player performance, team dynamics, and economic outcomes from
the 2000 to the 2023 season. Using advanced machine learning and text mining techniques
on suitably preprocessed NBA data, the intricate interplay between injury and
performance metrics was examined. The findings reveal that specific anatomical
sub-areas, notably knees, ankles, and thighs, are crucial for athletic performance and
injury prevention. The analysis also revealed the significant economic burden that
certain injuries impose on teams, necessitating comprehensive long-term strategies for
injury management [150].

12. Is it possible, using text mining, to distinguish patterns and relationships between
references to injuries of anatomical sub-areas and changes in advanced
performance indicators across different series, such as 2-game, 5-game, and 10-
game series? (RQ12)

13. Which types of injuries take longer for athletes to recover from before they can
return to action? (RQ13)

14. How do the different injury types vary in correlation with advanced basketball
performance metrics? (RQ14)

15. How does an injury affect the economic data of a team based on basketball
analytics? (RQ15)

Addressing these research questions with sports analytics could support the sports
industry in structuring its strategy, making better investment decisions, minimizing
costs, and winning championships and reputation.

For RQs 16-18, the research focused on three critical aspects of injury in
professional basketball: the identification of the most common injuries and the
anatomical regions they affect, the investigation of potential patterns in injury
occurrence, and the analysis of how injuries influence players' statistical performance
and overall contribution to their teams. By examining these facets, the study aims to
provide a comprehensive overview of injury epidemiology within the NBA. The
findings are anticipated to offer valuable insights into injury prevention, management,
and rehabilitation strategies, thereby supporting player welfare and enhancing the sport's
quality. This research represents a step forward in understanding the intricate
relationship between athletic performance and injury, emphasizing the need for
advanced strategies to mitigate the impact of injuries in professional basketball.

16. What are the most common injuries in the NBA, and which anatomical regions
are most frequently affected? (RQ16)

17. Is there a tendency for certain injuries to occur before or after another injury?
(RQ17)

18. Do players' per-match statistics and performance change significantly after
returning from specific injuries? (RQ18)

This study is centered around three key questions to understand injury dynamics in the
NBA. Firstly, it aims to identify the most common injuries and the anatomical regions
they predominantly affect. Secondly, the study explores whether there is a pattern in the
occurrence of injuries, particularly in relation to the timing of one injury following
another. Lastly, it investigates the extent to which specific injuries impact players'
per-match statistics and overall performance post-recovery. These questions are essential in
hypothesizing the relationship between injury patterns and their consequences on
playing and economic performance in basketball.

The answers to the above research questions are crucial for club owners, team staff and
domain experts in decision making, roster formation and budget control and/or
investment, with the purpose of being more proactive through the association of
advanced basketball and injury analytics. In addition, they can aid them in making
decisions and determining performance and future player career trajectories more
accurately [68].

3.2 Aim and Objectives

Based on comprehensive analysis, one of the main aims of this study is to evaluate the
existing performance analytics used in European and NBA basketball. Therefore, all
sophisticated game-related performance analytics in the literature that allow us to
distinguish defense, offense, overall, miscellaneous and performance ratings are
reviewed. In addition, a comparison matrix is provided for these basketball analytics,
highlighting trends and patterns that may have been overlooked in the current
literature (RQ1-3).

Basketball is a high-intensity, physically demanding sport. This can lead to injuries,
which can have a significant impact on a player's performance and career. To mitigate
the risk of musculoskeletal injuries and improve overall performance, it is important to
understand the relationship between basketball performance and advanced analytics
[175]. Advanced analytics have become an increasingly important tool in the field of
sports performance and injury prevention [176].

The aforementioned analytics could add value to a team and can be treated as a
competitive advantage [38]. In general, sports outcomes involve two important
variables: luck and skill. Luck is random and cannot be predicted. Its share differs
according to sport, league competition and country. For example, in the NBA, the
prevalence of luck is approximately 35%, which is high [113]. Therefore, an objective
of this research is to compare basketball performance analytics with the purpose of
increasing the understanding of important insights and minimizing the uncertainty of
current or future events [177]. Hence, it is crucial to understand, analyse and
forecast the aforementioned statistics to enable meaningful analytics and statements.
Another aim of this research is to quantify player performance attributes to increase
forecasting accuracy [177].

However, due to the complexity of sports and the large amount of unstructured data
retrieved, the data lack specificity and context; analytics and proper analyses can
address this and yield more in-depth, valuable information [6]. Performance forecasting
for teams and players is a common practice in the sports industry and among betting
companies. It gathers data from different perspectives (training, matches, injury,
psychology, etc.) and is used for short-, mid- or long-term predictions [178].

It is crucial for sports teams to be able to understand team/player performance and,
subsequently, to make proper decisions [112]. An objective of this research is to review
basketball performance analytics used worldwide [18]. The research also analyses and
compares the Euroleague and NBA basketball leagues to find useful insights at the
microlevel during a game and to show how teams can use this information to support
critical statements [179].

For sports teams (especially for managers and coaches), the team roster selection
criterion is highly important because it provides insight and a high-level estimation of
how players selected for the upcoming season roster will perform [6]. Forecasting models
were applied to NBA basketball analytics with the purpose of identifying the major
player performance attributes that predict the future MVP and Defender of the Year. In
basketball analytics, the algorithms used until recently have been clearly analysed, and
one of the aims was to verify whether they could be optimized. One of the objectives of
this work was to predict the NBA MVP and the Defender of the Year based on existing
basketball statistics (RQ4).

This section introduces the key research questions, states the aims and objectives, and
outlines the applied methodology. This study focused on associating injury statistics
with advanced basketball analytics. DM techniques were used to classify various
nonhomogeneous attributes resulting from data collection into distinct and well-
explained categories. In addition, the manuscript aimed to identify insightful health and
injury analytics in terms of performance over a period of 10 years.

The main aim of this research study was to obtain insights based on the correlation of
basketball performance analytics with injury analytics. Due to the complexity of sports
and the substantial amounts of unstructured data retrieved, the data lack specificity
and context, and analytics can help extract valuable in-depth information from them [6].
Team and player performance forecasting, which gathers data from different perspectives
(training, matches, injury, psychological), is a common practice in the sports industry
and among betting companies and can be used for short-, mid- or long-term predictions
[178].

It is crucial for sports teams to be able to understand team/player performance and make
appropriate decisions [112]. An objective of this research is to review the basketball
performance analytics used worldwide [18] from the perspective of injuries and their
influence. The manuscript also strives to analyse NBA data to find useful insights at
the micro level for players and to show how this information can be optimized to
provide insights into the game and help decision making [179].

Based on the retrieved advanced basketball performance analytics in combination with
injury analytics, this research aims to investigate the main and most common injuries
that affect the performance of players and teams. In addition, the study revealed which
injury types were common and which teams and players suffered the most injuries (RQ5
and RQ6). Finally, the recovery time needed is also significant for the team.

Another purpose of this research, therefore, is to establish a relationship between
basketball performance and advanced analytics in musculoskeletal injuries. The aim is
to use data-driven methods to analyse performance and injury data and to uncover
valuable insights into the factors that contribute to injuries in basketball players.
Hence, this research paper can provide recommendations to decision makers on how to
reduce the risk of injuries and improve performance (RQ16-18).

One of the goals is also to examine advanced basketball performance (Advanced,
4Factors, Miscellaneous, Scoring, Traditional, Usage, and Player Track), injury and
salary analytics. This requires the use of a variety of advanced analytical tools, including
text mining and statistical modelling, to extract meaningful insights from the data.

The aim of this study is to associate basketball performance with advanced analytics on
demographics, economics, and injuries to find valuable insights for players and teams
and provide guidelines for decision makers.

Sports analytics can be applied to unstructured data collected from various data sources,
even without clear associations; these data can be combined and structured in the quest
to support domain experts. It is common for teams and players to make important
decisions quickly before, during and after a match [6], [178].

Currently, there is an upward trend in player salaries and team budgets. Club owners
try to minimize risk in player management and selection processes (e.g., by avoiding
injuries or selecting players who are not injury-prone) to reduce potential extra
costs. In addition, an objective of this research is to uncover valuable information
based on the proposed approach validated in the literature [112], [18].

Another objective of this study is to gain useful insights into performance analytics
in correlation with player wages and team budgets, as well as the extra costs that arise
from injuries or pathologies. Furthermore, microlevel data analysis and DM mechanisms
were used to find further insights that could be applied in a comparable manner to
sports other than basketball [179], [180].

One aim of the study is to apply advanced data analytics and text mining techniques to
study the influence of sports injuries on athlete performance, recovery dynamics, and
economic implications for basketball teams. To this end, text mining was used to find
patterns between specific anatomical injury sub-areas and shifts in advanced
performance indicators across series (2-game, 5-game, and 10-game). Subsequent analyses
examined the recovery trajectories of players with different injury types, elucidating
the factors influencing recovery duration. Economic ramifications were assessed by
examining the changes in pertinent basketball analytics after injury events.
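
The series comparison described above can be illustrated with a minimal sketch. It
assumes a chronological list of one per-game metric for a single player and a known
index for the first game after the return; the function name, the data layout and the
sample values are hypothetical and are not taken from the thesis code.

```python
from statistics import mean


def window_change(values, return_index, k):
    """Mean of the k games from the return onward minus the mean of the k games before it.

    values: per-game metric in chronological order (games actually played).
    return_index: index of the first game played after the injury absence.
    Returns None when either window holds fewer than k games.
    """
    before = values[max(0, return_index - k):return_index]
    after = values[return_index:return_index + k]
    if len(before) < k or len(after) < k:
        return None
    return mean(after) - mean(before)


# Made-up ratings for illustration; the player returns at index 10.
ratings = [12, 14, 13, 15, 16, 14, 15, 13, 16, 14,
           10, 11, 12, 13, 14, 15, 14, 15, 16, 15]
changes = {k: window_change(ratings, 10, k) for k in (2, 5, 10)}
```

In this toy series the drop is largest in the 2-game window and shrinks as the window
widens, which is the kind of pattern the 2-, 5- and 10-game comparison is meant to expose.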

In conclusion, one of the purposes is to establish a relationship between basketball
performance and advanced analytics in the anatomical subareas of musculoskeletal
injuries, with the aim of uncovering valuable insights for players and teams and
furnishing recommendations to decision makers. By using data-driven methods to
analyse performance and injury data, the research hopes to contribute to the broader
effort to reduce the risk of injuries in basketball and improve overall performance.

3.3 Methodology

The data were primarily retrieved and scraped from various accredited basketball online
sources using Python [128], [129], [119]. A variety of unstructured, structured and
hybrid data formats with qualitative and quantitative attributes were used. These data
were combined, transformed, and analysed. The data mining techniques and
state-of-the-art machine learning algorithms were executed via the KNIME Analytics
Platform, Python ML/DM libraries and MS Excel. The code for data scraping and the
relevant KNIME workflows and Excel files for the data analysis can be accessed on
GitHub at https://github.com/vsarlis/nbainjuryanalytics.

Our methodology comprised analyses of NBA player performance, injuries, and salaries.
It encompasses three key areas: data collection, where extensive player data from the
2000–01 to 2022–23 seasons were gathered; data engineering, which refined and
integrated these data for clarity and consistency; and data analysis and statistical
methodology, which applied descriptive analytics to assess the impact of injuries on
player performance, together with statistical tests on the impact of the findings and
on players' salary changes. As outlined in the following sub-sections, the methodology
involved the stages of data collection, pre-processing, analysis, and result evaluation
[181].

Specifically, for the period from 2010 to March 2020, advanced performance analytics
were collected for 1298 players, along with 11225 records of daily injury reports for
all players in the same period. Each player's data include position, team and
characteristics (height, weight, age, origin, etc.), as well as rating or performance
indicators such as Net Rating (NetRtg), True Shooting percentage (TS%), Assists
percentage (AST%), Steals percentage (STL%), Blocks percentage (BLK%), Turnover
percentage (TOV%), Player Efficiency Rating (PER), Win Shares (WS), Plus Minus
(Plus/Minus or +/-) and Value Over Replacement Player (VORP), which are used in the
literature as the most appropriate rating metrics [3], [91], [13], [179], [182].
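
To make one of these indicators concrete, the sketch below computes True Shooting
percentage using its commonly published definition, TS% = PTS / (2 * (FGA + 0.44 * FTA)).
The function name and the sample box-score values are illustrative assumptions; the
thesis obtains these metrics from its data sources rather than recomputing them this way.

```python
def true_shooting_pct(points, fga, fta):
    """True Shooting percentage: points per two 'true' shooting attempts.

    fga: field goal attempts; fta: free throw attempts.
    The 0.44 weight is the conventional estimate of possessions used per free throw.
    """
    attempts = 2 * (fga + 0.44 * fta)
    if attempts == 0:
        return None  # undefined when the player attempted no shots
    return points / attempts


# Hypothetical stat line: 30 points on 20 field goal and 10 free throw attempts.
ts = true_shooting_pct(points=30, fga=20, fta=10)
```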

For the injury data, the situation was more difficult because the data were retrieved
in a raw, unstructured text format with generic descriptions, medical jargon without
explanation, and time series reports for basketball games. During data preprocessing,
cleansing, feature selection and feature engineering methods were applied. This
approach improved both the understanding and the quality of the data, with the purpose
of building comprehensive classifications using a homogeneous architecture based on
performance and injury criteria.

This research also presents the methodology followed, as well as the key research
questions along with the aims and objectives. The aim is to support domain experts in
understanding the relationships and correlations among socioeconomic and
demographic factors and advanced basketball performance analytics for players and
teams.

The study utilized data from multiple sources [128], [129], [119] to ensure
comprehensive information for a robust analysis, applying extensive data scraping and
preprocessing techniques. Data on NBA players' performance, injuries, and salaries from
the 2000-01 to 2022-23 seasons were collected using the nba_api to extract data from
the NBA's official website [119]. They were consolidated into one dataset through
cross-validation, filling of missing information and addition of features from the
other data sources [128], [129]. Data retrieval and preprocessing were challenging and
involved consolidating the data into a supervised data model while prioritizing data
quality. Our methodology encompasses two dimensions, Data Collection and Data Analysis,
as detailed in the following subsections, and includes data collection, preprocessing,
analysis, and result evaluation [181]. Figure 2.3 presents the methodology outlined for
this manuscript. Each step in the process is clearly delineated, providing a concise
overview of the sequence from the data collection to the final analysis phases,
including injury and salary data transformation. The preprocessing included removing
irrelevant data, employing text mining to extract detailed information from textual
descriptions of injuries and player contracts, and standardizing salary figures with
inflation rates. The final dataset, containing 749,631 records across 158 columns, was
prepared for analysis by sorting and dividing the records into subsets focused on
performance and injury data, ensuring a thorough and detailed approach to data analysis.
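
The salary standardization step can be sketched as a rescaling by a price index. The
text does not state which index or base year was used, so the function, the index
values and the base year below are hypothetical placeholders.

```python
def to_base_year_dollars(nominal, season_year, price_index, base_year):
    """Rescale a nominal salary to base-year dollars using a price index."""
    return nominal * price_index[base_year] / price_index[season_year]


# Hypothetical index values, for illustration only (not real inflation figures).
price_index = {2001: 100.0, 2012: 125.0, 2023: 160.0}

# A 10M salary paid in 2012, expressed in 2023 dollars under the toy index.
adjusted = to_base_year_dollars(10_000_000, 2012, price_index, base_year=2023)
```

Expressing all salaries in the same base-year dollars makes contracts signed in
different seasons directly comparable.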

3.3.1 Data Collection

The data were primarily scraped and collected from different available data sources
[128], [129], [119] to retrieve the maximum amount of information and so improve the
research outcomes of the data analysis. Data retrieval and preprocessing, which
aggregate all the data into a supervised data model with a focus on data quality, are
challenging processes. The approach followed included data collection, preprocessing,
analysis, and result evaluation. For that reason, the proposed methodology is divided
into two dimensions, Data Collection and Data Analysis, as described in the following
subsections.

The retrieved data were unstructured and heterogeneous, so DM techniques were used for
feature selection, segmentation, and classification to understand the data and gain
valuable insights [183], [184]. The data contain both qualitative and quantitative
information, such as box scores, player positions, ages, and health issue analytics.
Regarding the retrieval of advanced basketball performance analytics, the challenge was
to combine them with demographic and financial information. The difficulty lay in
checking the validity of the datasets from the various data sources and in following
the preprocessing strategy. For that reason, only data that existed in more than one
source were kept, allowing a financial data bias of less than 15% to avoid large
discrepancies.
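
One possible reading of this rule is sketched below: a salary record is kept only if it
appears in both sources and the two values differ by less than 15% of the larger one.
How the bias was actually measured is not specified, so the relative-difference
definition, the averaging step, the function name and the sample records are all
assumptions.

```python
def reconcile_salaries(source_a, source_b, max_rel_diff=0.15):
    """Keep (player, season) keys found in both sources whose values roughly agree.

    Agreement is measured as |a - b| relative to the larger of the two values.
    """
    kept = {}
    for key in source_a.keys() & source_b.keys():
        a, b = source_a[key], source_b[key]
        largest = max(abs(a), abs(b))
        if largest == 0 or abs(a - b) / largest < max_rel_diff:
            kept[key] = (a + b) / 2  # averaging is an illustrative choice
    return kept


# Made-up records keyed by (player, season).
site_one = {("p1", 2019): 1_000_000, ("p2", 2019): 2_000_000, ("p3", 2019): 500_000}
site_two = {("p1", 2019): 1_100_000, ("p2", 2019): 3_000_000}
validated = reconcile_salaries(site_one, site_two)
```

Here only p1 survives: p2 disagrees by about a third between the sources, and p3 appears
in one source only.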

In this research, preprocessing was performed on the retrieved and combined data. The
data were collected from various accredited online data sources and cross-validated
[128], [129], [119]. The Extract, Transform and Load (ETL) procedure was performed
on the collected data to clean them and achieve homogeneity. ETL removes inconsistencies
and improves data quality [71].

The data collection involved handling disorganized and diverse data. The methodology
included gathering, text mining, preprocessing, analysis, and result evaluation. Python
scripts and KNIME Analytics Platform flows were used for data acquisition.
Preprocessing included identifying and removing missing and irrelevant data. The data
underwent Extract, Transform, and Load (ETL) using the NBA API, PostgreSQL, and Python
scripts.

The study acquired and analysed a comprehensive dataset covering player performance,
injuries, and salaries for NBA players over two timespans: the 2000-01 to 2022-23
seasons and 1996 to 2020. This section details the data sources, types, and dataset
shapes. Player performance and injury data were obtained from the nba_api through the
extraction of data from the NBA's official website and database. The web scraping
covered the 2000-01 to 2022-23 seasons and involved 2296 players across those seasons
[119].

Two scrape runs were conducted: one for regular-season per game_date player
performance and another for the playoffs. Nine distinct datasets were compiled
(leaguegamelog, players, boxscoreadvanced, boxscorefourfactors, boxscoremisc,
boxscoreplayertrack, boxscorescoring, boxscoretraditional, and boxscoreusage). These
datasets were merged using player_id and game_date, creating a complete dataset stored
in the PostgreSQL database.
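
The merge on player_id and game_date can be pictured with the following sketch, which
combines rows from several tables that share those two keys. It uses plain dictionaries
rather than the PostgreSQL setup of the study, treats the merge as an outer join (the
join type is not stated), and the column names other than the two keys are invented.

```python
def merge_on_keys(tables, keys=("player_id", "game_date")):
    """Outer-join lists of row dicts on the given key columns."""
    merged = {}
    for rows in tables:
        for row in rows:
            key = tuple(row[k] for k in keys)
            # Rows sharing a key are folded into one record.
            merged.setdefault(key, {}).update(row)
    return sorted(merged.values(), key=lambda r: tuple(r[k] for k in keys))


# Toy rows standing in for two of the nine box-score datasets.
traditional = [
    {"player_id": 1, "game_date": "2020-01-01", "pts": 20},
    {"player_id": 2, "game_date": "2020-01-01", "pts": 8},
]
advanced = [
    {"player_id": 1, "game_date": "2020-01-01", "net_rating": 5.5},
]
rows = merge_on_keys([traditional, advanced])
```

Player 2 has no advanced row in this toy input, so the merged record simply lacks that
column, which mirrors how some KPIs are missing for earlier seasons.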

Preprocessing Player Performance Data


The player performance dataset included a comprehensive array of advanced player
statistics on a per-game-date basis, covering a range of key performance indicators
(KPIs): general player information, advanced box score stats, Four Factors,
miscellaneous metrics, player tracking, traditional scoring, and usage statistics. The
"Four Factors" is a strategic framework focusing on key elements that significantly
influence game outcomes. These factors are efficient shooting, characterized by high
field goal percentages; ball control, emphasizing minimal turnovers; rebounding
prowess, both offensive and defensive; and frequent, effective trips to the free-throw
line. Dean Oliver's methodology highlights the importance of these aspects in
determining a team's success on the court, offering a comprehensive blueprint for
aspiring players and coaches to enhance their performance and strategies [185], [186].
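
As a worked illustration of the Four Factors, the sketch below computes them from box-score totals using the commonly published definitions (effective field goal percentage, turnover percentage, offensive rebound percentage, and free-throw rate). The function name, argument names, and sample totals are assumptions for illustration; the thesis data may use slightly different variants (for example, FTA/FGA instead of FT/FGA for the free-throw factor).

```python
def four_factors(fgm, fg3m, fga, ftm, fta, tov, orb, opp_drb):
    """Return the Four Factors for one team from box-score totals."""
    return {
        # Shooting: a made three counts as 1.5 made field goals.
        "efg_pct": (fgm + 0.5 * fg3m) / fga,
        # Ball control: turnovers per estimated play.
        "tov_pct": tov / (fga + 0.44 * fta + tov),
        # Rebounding: share of available offensive rebounds secured.
        "orb_pct": orb / (orb + opp_drb),
        # Free throws: made free throws per field goal attempt
        # (assumed variant; FTA/FGA is also in common use).
        "ft_rate": ftm / fga,
    }


# Hypothetical single-game team totals.
factors = four_factors(fgm=40, fg3m=10, fga=85, ftm=20, fta=25,
                       tov=14, orb=10, opp_drb=30)
```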
Although not all KPIs were consistently available throughout the entire data scraping
period (e.g., specific advanced performance statistics were absent in earlier years, as in
2000), it was decided to retain as much relevant information as possible for the
comprehensive analysis. Duplicate records, particularly those related to primary player
reference information such as player_name, were identified and removed.

Injuries Text Mining and Categorization (RQ12)


The injury dataset obtained via nba_api was integrated with the performance dataset.
This integration involved adding a comment column within each performance
sub-dataset containing textual descriptions of player injuries. Because these comments
lacked a standardized format, text mining techniques were applied to extract the
injury information.
Then, an initial exploratory analysis of the injury text records was conducted using
word frequency counts to identify the prominent terms (258 unique terms) within the
injury descriptions. Common stop words, such as "and", "in", and "on", as well as
positional words such as "left" or "right", were first filtered out to highlight the
pertinent terms.
Subsequently, an n-gram analysis was performed to account for injuries whose
descriptions require n consecutive words for clarity, e.g., "strain knee". Together, the
single-word and n-gram analyses (e.g., 4451 bigram terms) identified the key aspects of
the injury descriptions.
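
The word-frequency and n-gram steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library: the stop-word list is a small assumed subset ("with" is added for the example), and the sample comments are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"and", "in", "on", "with"}   # assumed subset for illustration
POSITIONAL_WORDS = {"left", "right"}


def tokenize(comment):
    """Lowercase, keep alphabetic tokens, drop stop and positional words."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [t for t in tokens if t not in STOP_WORDS | POSITIONAL_WORDS]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical injury comments.
comments = [
    "placed on IL with sprained left ankle",
    "sprained right ankle",
    "strain in left knee",
]

word_counts = Counter()
bigram_counts = Counter()
for comment in comments:
    tokens = tokenize(comment)
    word_counts.update(tokens)
    bigram_counts.update(ngrams(tokens, 2))
```

Because positional words are removed before the bigrams are built, "strain in left knee" yields the bigram "strain knee", matching the example in the text.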
Furthermore, a qualitative examination was crucial. By manually inspecting the results,
key terms and phrases were identified that accurately represented the nature and body
areas of the injuries. Using these keywords, a categorization system was formulated in
which each injury description was mapped to a specific category or injury area. As a
result, a customized dictionary was developed to classify injuries into predefined
categories (244 unique terms).
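
A dictionary-based categorization of this kind can be sketched as below. The two dictionary entries and the function name are hypothetical stand-ins; the actual dictionary of 244 terms is not reproduced in the text.

```python
import re

POSITIONAL_WORDS = {"left", "right"}

# Hypothetical entries: keyword phrase -> (injury category, anatomical sub-area).
INJURY_DICTIONARY = {
    "sprained ankle": ("sprained ankle", "Ankle"),
    "torn acl": ("Torn ACL", "Knee"),
}


def categorize(comment):
    """Map a free-text injury comment to (category, sub-area), or (None, None)."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    normalized = " ".join(t for t in tokens if t not in POSITIONAL_WORDS)
    for phrase, label in INJURY_DICTIONARY.items():
        if phrase in normalized:
            return label
    return (None, None)


example = categorize("placed on IL with sprained left ankle/right ankle")
```

Comments that match no dictionary entry return (None, None) and would need manual review or a new dictionary term.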
In the final stage of the mapping, a predefined map of injuries based on body regions
[109] was used, culminating in detailed anatomical sub-areas. For instance, a comment
such as "placed on IL with sprained left ankle/right ankle" was categorized as "sprained
ankle" for the injury, with "Ankle" as the anatomical sub-area.
Moreover, the dataset included multiple duplicates arising from cases where a player
missed more than one game due to the same injury. To address this, only the initial
instance and the date of the initial injury occurrence were retained, based on specific
conditions. Consequently, after mapping the textual descriptions, records were flagged
as 'duplicate = TRUE' if they referred to the same type of injury for the same player
within a 15-day window from the last reported occurrence of that specific injury type.
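
The 15-day duplicate rule can be sketched as follows. The field names and sample records are hypothetical, and treating the window as inclusive (a gap of exactly 15 days still counts as a duplicate) is an assumption, since the text does not specify the boundary.

```python
from datetime import date, timedelta


def flag_duplicates(records, window_days=15):
    """Flag records repeating the same player/injury within the window
    measured from the last reported occurrence of that injury type."""
    last_seen = {}
    flagged = []
    ordered = sorted(records, key=lambda r: (r["player_id"], r["injury"], r["date"]))
    for rec in ordered:
        key = (rec["player_id"], rec["injury"])
        previous = last_seen.get(key)
        is_duplicate = (
            previous is not None
            and rec["date"] - previous <= timedelta(days=window_days)
        )
        flagged.append({**rec, "duplicate": is_duplicate})
        # Always move the reference date forward to the latest report.
        last_seen[key] = rec["date"]
    return flagged


# Hypothetical reports for one player.
reports = [
    {"player_id": 1, "injury": "sprained ankle", "date": date(2020, 1, 1)},
    {"player_id": 1, "injury": "sprained ankle", "date": date(2020, 1, 3)},
    {"player_id": 1, "injury": "sprained ankle", "date": date(2020, 1, 10)},
    {"player_id": 1, "injury": "sprained ankle", "date": date(2020, 3, 1)},
]
flags = [r["duplicate"] for r in flag_duplicates(reports)]
```

In this example the first and last reports are kept as distinct injury events, while the two reports in between are flagged as repeats of the first.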

Salaries Data Transformation


To convert the contract dataset into a more analytically valuable salary dataset, text
mining techniques were employed once again. This involved extracting contract
durations and monetary values from the textual descriptions within the scraped data. For
example, a contract described as a "signed restricted free agent (from Clippers) to a
6-year, $51 M contract" was parsed to determine its length (6 years) and amount ($51
million). Additionally, inflation rate data from [187] and [188] were incorporated to
standardize the salary figures. This adjustment allowed for more meaningful
comparisons of player salaries, considering the year the contract was signed and the
economic context in the U.S., where the NBA operates.
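
The contract parsing can be illustrated with a short regular-expression sketch. The pattern below is an assumption that covers only descriptions of the form "N-year, $X M contract" quoted above; the real scraped data likely contains other formats that would need additional patterns.

```python
import re

# Assumed pattern: "<years>-year, $<amount> M", with an optional space before M.
CONTRACT_PATTERN = re.compile(r"(\d+)-year,\s*\$(\d+(?:\.\d+)?)\s*M", re.IGNORECASE)


def parse_contract(text):
    """Return (years, amount_in_usd) or None when the text does not match."""
    match = CONTRACT_PATTERN.search(text)
    if match is None:
        return None
    years = int(match.group(1))
    amount = float(match.group(2)) * 1_000_000
    return years, amount


parsed = parse_contract(
    "signed restricted free agent (from Clippers) to a 6-year, $51M contract"
)
```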

Table 3.1: Players’ performance analytics (regular and playoffs), injury analytics, and
salary data.

Name (type)                                      # Records    # Columns
Player performance statistics (regular)          749,631      158
Injury data (on and off game)                    58,151       4
Contracts/Salaries data (signed over seasons)    7,257        6

A summary of the schemas for these datasets is provided in Table 3.1. These diverse
datasets were merged on their shared player_id and game_date fields, enabling their
integration into a complete dataset, which was stored directly in a PostgreSQL
database. Additionally, the datasets contained comment-based metadata explaining why
a player was absent from a game; the most common reasons were coaching decisions
and injuries.

3.3.2 Data Engineering

After raw data acquisition, the data engineering phase encompassed the cleansing,
structuring, and enrichment of the collected datasets to enable in-depth analysis.

The player performance dataset provided a comprehensive overview of advanced player
statistics on a per-game-date basis. It encompassed various key performance indicators
(KPIs), including general player information, advanced box score statistics, four
factors, miscellaneous metrics, player tracking, traditional scoring, and usage
statistics. While not all performance KPIs were consistently available throughout the
entire scraping time frame (for example, specific advanced performance statistics were
not recorded in earlier years like 2000), the decision was made to retain as much
pertinent information as possible for the comprehensive analysis. Duplicate records,
primarily pertaining to primary player reference information such as player_name, were
identified and subsequently removed.

The injuries dataset was acquired through nba_api and integrated with the performance
dataset. This involved incorporating a comment column within each type of
performance sub-dataset, which provided textual descriptions of players’ injuries. Due
to the lack of standardized formatting in these comments, text-mining techniques were
applied to extract the necessary injury information. Subsequently, a customized
dictionary was developed to classify injuries into predefined categories. For example, a
comment like “torn ACL in left knee (out for season)” would be categorized under “Torn
ACL”.
Additionally, the dataset contained multiple duplicates stemming from cases where a
player missed more than one game due to the same injury. To address this, only the
initial instance was retained, along with the date of the initial injury occurrence,
applying specific conditions. Consequently, after the textual descriptions were mapped,
records were flagged as “duplicate = TRUE” if they referred to the same type of injury
for the same player within a 15-day window from the last reported occurrence of that
particular injury type.

The contracts dataset required transformation into a more analytically valuable salaries
dataset. Once again, text-mining techniques were applied to extract contract lengths
and amounts from the textual descriptions within the scraped data. For instance, a
contract described as “signed restricted free agent (from Clippers) to a 6-year, $51M
contract” was parsed to discern its length (6 years) and amount (USD 51 million).

Furthermore, inflation rate data obtained from [150] were used to standardize salary
figures. This adjustment enabled more meaningful comparisons of player salaries,
considering the year the contract was signed and the economic context in the U.S., where
the NBA operates.

3.3.3 Data Analysis

The data analysis stage focused on advanced basketball analytics selected from the
1996-97 to 2022-23 seasons. Data cleansing methods were applied to improve the
quality of the final aggregated dataset and of the subsequent analysis.

The final dataset used for this study was derived from a complex process of data
integration, which included advanced performance metrics, injury records, and salary
data spanning regular and playoff seasons. This comprehensive dataset consisted
of 749,631 records across 158 columns, capturing diverse aspects such as player
performance, injury history, and financial information.

The dataset underwent a meticulous two-phase preprocessing operation prior to the
commencement of analytical procedures. In the first phase, records were organized in
ascending order, first by the “PLAYER_NAME” attribute and then by
“GAME_DATE”. This organization ensured a chronological sequence of games for
each player, facilitating a more structured analysis. Subsequently, the dataset was
divided into two distinct subsets. The “Performance” dataset included instances with
non-null game dates and corresponding performance metrics. Conversely, the “Injury”
dataset contained instances marked with non-null injury dates, focusing specifically on
the players’ injury histories.
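
The two preprocessing phases can be sketched in plain Python as follows. PLAYER_NAME and GAME_DATE are taken from the text, whereas the INJURY_DATE column name and the sample rows are assumptions; dates are ISO strings so that lexical order equals chronological order.

```python
def sort_and_split(records):
    """Sort by player and game date, then split into performance and injury subsets."""
    ordered = sorted(
        records,
        # Rows without a game date sort first within each player.
        key=lambda r: (r["PLAYER_NAME"], r.get("GAME_DATE") or ""),
    )
    performance = [r for r in ordered if r.get("GAME_DATE") is not None]
    injury = [r for r in ordered if r.get("INJURY_DATE") is not None]
    return performance, injury


# Hypothetical rows from the merged dataset.
rows = [
    {"PLAYER_NAME": "B", "GAME_DATE": "2020-01-05", "INJURY_DATE": None},
    {"PLAYER_NAME": "A", "GAME_DATE": "2020-01-03", "INJURY_DATE": None},
    {"PLAYER_NAME": "A", "GAME_DATE": None, "INJURY_DATE": "2020-01-02"},
    {"PLAYER_NAME": "A", "GAME_DATE": "2020-01-01", "INJURY_DATE": None},
]
performance, injury = sort_and_split(rows)
```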

The statistical analysis comprised a principal test applied to the aggregated performance
data. A paired sample t-test was conducted to statistically compare the means of players’
salaries before and after injuries. This test generated a t-statistic and an associated
p-value for each anatomical sub-area injury category, based on the players’ pre- and
post-injury salaries.

The results of these analyses were systematically stored in a PostgreSQL database. This
organization into unique categories was designed to simplify further exploratory data
analyses and hypothesis testing, thus ensuring a robust and thorough examination of the
dataset.

The consolidated dataset includes 749,631 rows, one per player per game, for 2296
players from 35 teams (Table 3.1) and includes absence analytics (injuries, health
problems, suspensions and non-selection reasons for the game), demographic
information (age, country, college, origin, etc.), financial information (player’s salary,
team’s salary cap, etc.) and advanced basketball performance analytics. The basketball
analysis metrics used in this study included Points (PTS), Rebounds (REBS), Assists
(AST), Total Rebounds Percentage (TRB%), Turnover Percentage (TOV%), Usage
Percentage (USG%), True Shooting Percentage (TS%), Assists Percentage (AST%),
Steals Percentage (STL%), Blocks Percentage (BLK%), Net Rating (NetRtg), Player
Efficiency Rating (PER), Win Shares (WS), Win Shares per 48 mins (WS per_48), Box
Plus Minus (BPM or Plus/Minus or +/-) and Value over Replacement (VORP). The data
were filtered to retain only players with an average playing time of more than 10
minutes per game (mpg) who played at least 19 games (gp) in each of the 24 regular
seasons.
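
The playing-time filter can be sketched as below. The sketch applies the thresholds per player-season, which is one reading of the criterion above; the column names and sample rows are hypothetical.

```python
from collections import defaultdict


def eligible_player_seasons(game_rows, min_mpg=10.0, min_gp=19):
    """Return the (player_id, season) pairs that satisfy both thresholds."""
    minutes = defaultdict(list)
    for row in game_rows:
        minutes[(row["player_id"], row["season"])].append(row["min"])
    return {
        key
        for key, played in minutes.items()
        if len(played) >= min_gp and sum(played) / len(played) > min_mpg
    }


# Hypothetical game logs: player 1 qualifies, player 2 averages too few
# minutes, and player 3 played too few games.
game_rows = (
    [{"player_id": 1, "season": "2019-20", "min": 25.0} for _ in range(20)]
    + [{"player_id": 2, "season": "2019-20", "min": 8.0} for _ in range(20)]
    + [{"player_id": 3, "season": "2019-20", "min": 30.0} for _ in range(5)]
)
eligible = eligible_player_seasons(game_rows)
```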

The aforementioned outliers were excluded from the consolidated dataset to reinforce
data quality and improve insights, leaving 490,584 rows as the final working dataset
[189]. In Table 0.9, the inflation rates were calculated with 2019–2020 as the baseline
dollar value. Considering the time value of money, the salaries were adjusted to reflect
the yearly changes in the purchasing power of money [190].
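
The inflation adjustment can be illustrated with a minimal sketch that rescales a nominal salary to baseline-season dollars through a price index. The index values below are hypothetical placeholders, not the figures from the cited sources, and the index-ratio approach is an assumption about how the adjustment was implemented.

```python
# Hypothetical price index per season (baseline season set to 100.0).
PRICE_INDEX = {
    "2009-10": 80.0,
    "2019-20": 100.0,
}


def adjust_salary(nominal, season, baseline="2019-20", index=PRICE_INDEX):
    """Express a nominal salary in baseline-season dollars."""
    return nominal * index[baseline] / index[season]


adjusted = adjust_salary(10_000_000, "2009-10")
```

With these placeholder values, a salary of USD 10 million signed in 2009-10 corresponds to USD 12.5 million in 2019-20 dollars.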

The final dataset for this study resulted from complex data integration, incorporating
advanced performance metrics, injury records, and salary data from the 2000-01 to
2022-23 regular and playoff seasons. This comprehensive dataset comprises 749,631
records across 158 columns, encompassing player performance, injury history, and
financials.

Primary Data Structure


Before the analytical procedures, this merged dataset underwent two additional
preprocessing steps. First, records were sorted by PLAYER_NAME and GAME_DATE
to ensure a sequential arrangement for each player. The dataset was subsequently
divided into two datasets:
● Performance: Data with non-null game dates and performance metrics.
● Injury: Data with non-null injury dates.

Statistical Evaluations
Two statistical tests were applied to the performance data:
● Paired Sample t-test: Pre- and post-injury performance means were compared,
providing t-statistics and p-values for each performance metric.
● Effect Size Estimation: Cohen's d was used to measure the magnitude of the
differences between pre- and post-injury performance metrics.
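
The two evaluations can be sketched with the standard library as follows. The paired-samples form of Cohen's d (mean difference divided by the standard deviation of the differences) is an assumption, since the text does not state which variant was used, and the sample values are hypothetical. The p-value would follow from a t-distribution with n - 1 degrees of freedom, for example via scipy.stats.ttest_rel where SciPy is available.

```python
import math
import statistics


def paired_t_and_cohens_d(pre, post):
    """Return (t statistic, Cohen's d, degrees of freedom) for paired samples."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    cohens_d = mean_diff / sd_diff             # assumed paired-samples variant
    return t_stat, cohens_d, n - 1


# Hypothetical pre- and post-injury values of one performance metric.
pre_injury = [20, 18, 25, 22, 30]
post_injury = [18, 17, 22, 21, 26]
t_stat, cohens_d, dof = paired_t_and_cohens_d(pre_injury, post_injury)
```

A negative t statistic here indicates that the post-injury mean is lower than the pre-injury mean.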

Summary of Statistics
Finally, comprehensive summary statistics were compiled:
● Unique Players: Total unique players and those meeting the analysis criteria.
● Non-NA Records: Counts of non-null data points for pre- and post-injury performance
metrics.
● Injury incidence and criteria metrics: Total injuries, calculated for those meeting
specific criteria.
The results were stored in a PostgreSQL database and organized for subsequent
exploratory data analysis and hypothesis testing.

4. Results
The "Results" section of the thesis is a comprehensive analysis of various
performance metrics and their impact on evaluating basketball players and teams. This
chapter begins by addressing the inherent uncertainties in basketball, acknowledging the
limitations of past data and forecasting tools. The study explores a wide array of
performance metrics, such as Plus/Minus, Real Plus Minus, and Player Impact Estimate,
each contributing uniquely to understanding player and team performance.

This research aims to satisfy the growing demand for advanced analytics and new
techniques in sports, particularly focusing on players' performance prediction and team
lineup evaluations.

One of the central themes of this section is the examination of sports injuries in
the NBA from 1996-2023. This research meticulously categorizes and analyses injuries,
revealing that musculoskeletal injuries are the most prevalent. It then delves into the
significant economic and performance impacts these injuries have, utilizing various
statistical methods and machine learning techniques to understand and predict their
effects.

The results highlight the complex interplay between injuries, player performance,
and team strategies, revealing that while injuries are a contributing factor to
performance, they are just one of many. The findings suggest that proper rest and load
management are crucial for optimal performance and that teams that do not rest their
key players often exhibit lower performance.

By summarizing the economic and performance impacts of injuries, age, and
position on NBA players, this study provides insights into the multifaceted nature of
basketball and the several factors that contribute to success and failure. Considering a
wide range of criteria, such as age, nutrition, training, and psychological status, is
important in strategic planning and player care.

Overall, the "Results" section offers an in-depth look at the critical role of
analytics in basketball, providing valuable insights for players, coaches, and technical
staff in the NBA. The chapter lays out the findings of the research in a detailed and
organized manner, setting the stage for discussions on implications, future research, and
the continued evolution of sports analytics.

4.1 Sports Analytics – Evaluation of Basketball Players and Team
Performance

Basketball is a sport that involves considerable uncertainty [10]. Although there are
plenty of past data, there are no advanced tools available for forecasting players’
performance. Some useful metrics provided for each basketball match are the following
(RQ1):

● Plus/Minus (+/- or PM): measures the impact of a player in a game (quality and
contribution). It is calculated as the difference between the points a team scores and
the points its opponents score while the player is on the court [91]. The drawback of
this metric is that it does not account for the match-ups between players.

● Adjusted Plus Minus (Adj +/- or APM): a player rating statistic that estimates the
influence of a player by comparing the team line-up with and without his presence
[191]. In the NBA, the APM is one of the dominant evaluation indicators [192].

● Real Plus Minus (Real +/- or RPM or RAPM): includes the Real Plus Minus wins
(RPM Wins) and possession-count metrics. The RPM is the net value
of ORPM (Offensive Real Plus Minus) and DRPM (Defensive Real Plus Minus) for
the estimated on-court influence on team performance.

● Player Impact Plus Minus (PIPM): Another version of the plus-minus metric that
adjusts the box-score value with luck-adjusted plus-minus data [193].

● Player Impact Estimate (PIE): calculates a player’s overall contribution against
the total statistics in the games that they played [194], [59].

● CARMELO focuses on win forecasting based on player statistics and ELO
ratings. This model takes into account the personal status of wins and losses [91].

● Expected Possession Value (EPV): quantifies the value of the decisions a player
makes during the game [195], [196], [197].
● Wins Above Replacement (WAR): reflects a combination of a player’s projected
playing time and his projected productivity while on the court. It is computed using
the BPM variable [179]. This approach is the same as that for the WORP.

● The Performance Index Rating (PIR) is used in European basketball leagues to
determine players’ total performance [66].

● Game Score (GmSc) pays attention to any statistical detail of the player’s box score
[18].

● Net Rating (NetRtg) is used in the NBA for counting a team's point differential per
100 possessions [91].

● The Pythagorean Win Percentage is an estimation that shows a team’s win
percentage based on their points for and against [7], [198].

● Player Efficiency Rating (PER) is a per-minute rating. PER sums up all the positive
actions of players, deducts the negative events, and returns a per-minute rating of a
player's performance [182].

● Value over Replacement Player (VORP) is a box score estimate of the points per
100 team possessions that a player contributed above a replacement-level (-2.0)
player; this value is translated to an average team and prorated to an 82-game season.
Multiplied by 2.70, it converts to wins over replacement (WORP). VORP is a
real-valued metric that can be positive or negative [179].

● Win Shares (WS): an estimate of the number of wins that each player
contributed to his team over the season [13].

● Tendex: a statistical model for determining the efficiency of basketball players. It
is considered the first rating formula to use linear weights [199].
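
Three of the metrics listed above reduce to simple arithmetic and can be sketched as follows. Net Rating and the VORP-to-WORP factor of 2.70 follow the definitions in the list; the Pythagorean exponent of 14.0 is an assumed default, as the exponent is not given in the text and published values for basketball vary.

```python
def net_rating(points_for, points_against, possessions):
    """Point differential per 100 possessions."""
    return 100.0 * (points_for - points_against) / possessions


def pythagorean_win_pct(points_for, points_against, exponent=14.0):
    """Expected win percentage from points for and against (assumed exponent)."""
    scored = points_for ** exponent
    return scored / (scored + points_against ** exponent)


def vorp_to_worp(vorp, factor=2.70):
    """Convert Value over Replacement Player to Wins over Replacement."""
    return vorp * factor


# Hypothetical inputs.
example_net = net_rating(points_for=110, points_against=105, possessions=100)
example_pyth = pythagorean_win_pct(points_for=9000, points_against=8800)
example_worp = vorp_to_worp(2.0)
```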

Nevertheless, most of these metrics are similar in what they measure, as they all
intend to provide an overall perspective on a player’s statistical performance [79]. A
valuable player will achieve high values in most of these metrics. In contrast,
average players will have low values in all of them [68].

The aim of this research is to satisfy the increasing demand for new techniques and
provide significant insights and advanced analytics for teams, technical staff, and
players. Player performance prediction depends on many variables, such as psychology,
injury risk [19], bad shots in the starting minutes of the match, and opponent match-ups,
which can have important impacts on future performance but are very difficult to define
and quantify [7]. Nevertheless, technical staff and data analysts need to evaluate these
metrics, monitor, and track performance and ultimately make important decisions for
future acquisitions or selections within a team [200].

The team’s line-up evaluation and final choice are critical during the game but also
before the game when structuring the strategy of the team against the opponents. LinNet
is a calibrated network embedding model for line-up evaluation [191]. The
quantification of each player rating on a team roster and the profile building based on
the player style of the current season and the number of possessions can drive the
construction of the line-up [201].

In conclusion, this work could also be combined in the future with data-driven sensor
methodologies for each athlete and the SportVU camera data introduced in recent
years [202]. An integrated solution could then be developed in the near future that
includes advanced metrics with proper visualizations, heatmaps, and player tendencies
[97]. Such a solution would quantify not only game statistics but also
important behavioural metrics that depend on physical conditions. This information is
important for optimizing performance and can lead a team to win more games [41].

The rating KPIs (Table 0.1) are quite important in data analysis but can be reinforced
by analysing the unexplored moves or decisions that precede them. Closing this gap is
important for understanding how players contribute to final results [196].

Previous studies of the four factors of basketball analytics, with different weights
assigned (eFG%, FTr, REB and TOV), show that their evaluation has a significant
impact on team and player performance [19], [192], [9].

4.2 Sport injuries in the NBA League for 2010-2020

Using the proposed methodology, the first subsection presents results addressing
RQ5-7 on the injury data analysis side. The second subsection compares the
state-of-the-art ML and DM techniques used in the bibliography.

During the data collection and feature selection, 11,225 records were identified
indicating that players missed NBA league games during the last decade (2010 to 2020).
Further review of the retrieved data revealed records related (n = 8667) or unrelated (n
= 2558) to a specific injury or pathology. The latter group included records
corresponding to decisions regarding player injury status or readiness for
competition (e.g., “activated from or placed on injury list (IL) for rest”, “returned to line
up”, “do not play (DNP)”, “did not dress (DND)”, “day to day” (DTD), “rest” or
“conditioning”). Of all the records indicating an injury or pathology, 3753
referred to repeated recordings of the same injury suffered by a player and were
therefore excluded from further data analysis. Eventually, only 5414 records
corresponded to an actual injury or pathology, 596 of which were related to general
health pathologies, 185 to head injuries and 4112 to musculoskeletal injuries.

Pathology or injury was either undisclosed or unspecified in 521 records (Table 4.1).
The general health-related pathologies, such as appendicitis, flu, or gastroenteritis, were
classified according to the main organ system affected, i.e., the respiratory and digestive
systems. Other pathologies (n=19) related to the circulatory system (n=7), the
integumentary system (n=2) and the reproductive system (n=1), as well as oral/tooth
(n=8) and ear problems (n=1), were reported collectively in a separate group. Several
pathologies remained unclassified, as there was no indication of the affected organ
system (e.g., illness, general soreness, viruses, allergic reactions). Head-related injuries
were also classified according to the organ system affected, which included the
integumentary (e.g., lacerations), the nervous (e.g., concussions) and the
musculoskeletal system (e.g., facial bone fractures, bruises). Finally, musculoskeletal
injuries were classified according to four major anatomical areas, i.e., the neck, the
trunk, and the upper and lower extremities.

A separate group of records (n=21) consisted of musculoskeletal injuries that occurred
in multiple distinct major anatomical areas (e.g., ankle and abdominal area, upper and
lower limb). Further classification of the musculoskeletal injuries into anatomical
sub-areas was performed for the trunk (chest, abdominal, thoracolumbar and pelvis
areas, including the sacral, pubic and buttock sub-area), the upper extremity (shoulder,
upper arm and forearm, elbow, and hand, thumb and fingers sub-area) and the lower
extremity (hip, thigh, knee, calf, fibular, shin, ankle, heel, foot, and toes sub-area) with
the last major area being involved in several cases of injuries in multiple distinct
anatomical sub-areas (Table 4.1).

All injuries were classified according to the major area or sub-area in which they
occurred, based on the direct or indirect reference of a record to an injury. Direct
references to a specific
injury included (i) strains of a particular muscle, (ii) sprains of a specific ligament, (iii)
cartilaginous damage (bulging or herniated discs, meniscal tears), (iv) subluxations,
dislocations, or hyperextensions of a joint, and (v) tendinopathies or (vi) bone fractures
located in any of the major anatomical areas or sub-areas specified in this study. Injuries
were also classified in a specific anatomical area when a record referred to an injury
indirectly. Such records mainly concerned symptoms such as pain, soreness,
inflammation, muscle spasm, stiffness, tightness, bruising or swelling that occurred in a
particular major anatomical area or sub-area or a method performed for specific
treatment purposes (e.g., surgery to repair a herniated disk, arthroscopic surgery
performed in a specific anatomical area, or ultrasonic treatment to remove scar tissue).

Table 4.1: Injury analysis categorization based on health problems, organ systems and
anatomical areas/sub-areas.

Type of Injuries/Pathologies            Total Counts    % of Grand Total
General health problems                 596             11.01%
  Digestive system                      103             1.90%
  Other system                          19              0.35%
  Respiratory system                    139             2.57%
  Unclassified                          335             6.19%
Head injuries                           185             3.42%
  Integumentary system                  6               0.11%
  Musculoskeletal system                42              0.78%
    Head                                42              0.78%
      Eye area                          10              0.18%
      Nose area                         19              0.35%
      Other facial areas                13              0.24%
  Nervous system                        137             2.53%
    Head                                137             2.53%
      Cranial area                      112             2.07%
      Eye area                          25              0.46%
Musculoskeletal Injuries                4112            75.95%
  Musculoskeletal system                4112            75.95%
    Lower extremity                     2871            53.03%
      Ankle area                        801             14.79%
      Calf area                         133             2.46%
      Fibular area                      11              0.20%
      Foot area                         242             4.47%
      Heel area                         126             2.33%
      Hip area                          180             3.32%
      Knee area                         849             15.68%
      Multiple anatomical areas         23              0.42%
      Shin area                         80              1.48%
      Thigh area                        350             6.46%
      Toes area                         76              1.40%
    Multiple anatomical areas           21              0.39%
    Neck                                40              0.74%
    Trunk                               594             10.97%
      Abdominal area                    168             3.10%
      Chest area                        51              0.94%
      Pelvic area                       13              0.24%
      Thoracolumbar area                362             6.69%
    Upper extremity                     586             10.82%
      Elbow area                        66              1.22%
      Hand, Thumb & Fingers area        203             3.75%
      Shoulder area                     204             3.77%
      Upper arm and Forearm area        14              0.26%
      Wrist area                        99              1.83%
Undisclosed or unspecified              521             9.62%
Total unique reasons of player absence  5414            100.00%

Table 4.1 shows that musculoskeletal injuries were the most common (75.95%),
followed by general health problems (11.01%), unspecified injuries (9.62%) and head
injuries (3.42%).

Injury analytics classified distinct injury/pathology reasons for player absence into four
different categories: health problems, organ systems, major anatomical areas, and
anatomical sub-areas (Table 4.2). Based on the injury classifications applied to the
collected data, the injuries were transformed so that they could be associated with
basketball performance analytics. The resulting correlation indicates a negative
impact on player and team performance as injuries occur more often.

Table 4.2: Classification of distinct injury/pathology reasons for player absence during
the NBA period 2010-20.

Health problems                             Organ systems             Major anatomical areas    Anatomical sub-areas
General health-related pathologies (596)    Digestive (103)
                                            Respiratory (139)
                                            Others (19)
                                            Unclassified (335)
Head injuries (185)                         Musculoskeletal (42)                                Eye (10)
                                                                                                Nose (19)
                                                                                                Other facial sub-areas (13)
                                            Nervous (137)                                       Eye (25)
                                                                                                Cranial (112)
                                            Integumentary (6)                                   Eye (3)
                                                                                                Mouth (2)
                                                                                                Forehead (1)
Musculoskeletal injuries (4112)             Musculoskeletal (4112)    Neck (40)
                                                                      Trunk (594)               Abdominal (168)
                                                                                                Thoracolumbar (362)
                                                                                                Chest (51)
                                                                                                Pelvis (13)
                                                                      Upper extremity (586)     Shoulder (204)
                                                                                                Upper arm & Forearm (14)
                                                                                                Elbow (66)
                                                                                                Wrist (99)
                                                                                                Hand, Thumb & Fingers (203)
                                                                      Lower extremity (2871)    Hip (180)
                                                                                                Thigh (350)
                                                                                                Knee (849)
                                                                                                Calf (133)
                                                                                                Fibular (11)
                                                                                                Shin (80)
                                                                                                Heel (126)
                                                                                                Ankle (801)
                                                                                                Foot (242)
                                                                                                Toes (76)
                                                                                                Multiple sub-areas (23)
                                                                      Multiple major areas (21)
Undisclosed or unspecified (521)
5414 unique reasons of absence
This study examined in more detail the major anatomical areas of musculoskeletal
injuries, as these injuries account for the majority (75.95%) of health problems in the
NBA. As shown in Table 0.6, “lower extremity” was the most common type of
injury, accounting for 69.82% of the total musculoskeletal injuries. “Trunk” and “upper
extremity” injuries were next most common, with 14.45% and 14.25%, respectively,
while “neck” (0.97%) and “multiple anatomical areas” (0.51%) demonstrated the lowest
injury frequency. These results answer RQ5 (Which were the most
common types of general health pathologies and injuries in the NBA League during the
last decade?).
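
The quoted shares follow directly from the counts in Table 4.1, as the short check below shows; only the variable names are introduced here.

```python
# Musculoskeletal injury counts by major anatomical area (Table 4.1).
msk_counts = {
    "Lower extremity": 2871,
    "Trunk": 594,
    "Upper extremity": 586,
    "Neck": 40,
    "Multiple anatomical areas": 21,
}

msk_total = sum(msk_counts.values())
msk_shares = {
    area: round(100 * count / msk_total, 2) for area, count in msk_counts.items()
}

# Share of musculoskeletal injuries among all 5414 distinct absence reasons.
msk_share_of_all = round(100 * msk_total / 5414, 2)
```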

Additionally, null hypothesis significance testing was performed to validate the
hypothesis of RQ7 that injuries affect NBA players’ and team performance, by
correlating injury and performance analytics. The injury analysis data were categorical,
while the performance analysis data were numerical. Therefore, the former were
transformed into numerical form to test the null hypothesis using the Pearson
correlation. Table 0.5 presents the correlation and p values between injury and
performance analytics. According to these results, all the performance variables can be
considered to have a weak positive relationship with injury analytics. The correlations
were statistically significant (p value less than 0.05), which rejects the null hypothesis;
therefore, it can be accepted that certain injury analysis data are correlated with
basketball performance analytics. However, it cannot be concluded that injuries are the
major factors influencing performance. For example, in the case of the “Value Over
Replacement” (VORP) performance metric in relation to health problems, a minor
positive correlation was observed (correlation value = 0.2025 and p value = 0), whereas
the “Box Plus-Minus” (BPM) performance metric in relation to anatomical subareas
showed no relationship (correlation value = -0.03358 and p value = 0.0066). Figure 4.1
illustrates the correlation matrix based on the coloured correlations of injury and
performance analytics.
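
The correlation step can be sketched as follows. Integer label encoding is an assumption, because the text does not state how the categorical injury data were converted to numbers, and the sample values are hypothetical. Only the coefficient is computed here; where SciPy is available, scipy.stats.pearsonr returns the coefficient together with its p-value.

```python
import math


def encode_categories(values):
    """Map each distinct label to an integer code (assumed encoding)."""
    codes = {label: i for i, label in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical health-problem labels and VORP values for four players.
health_problem = [
    "Head injuries",
    "Musculoskeletal injuries",
    "General health problems",
    "Musculoskeletal injuries",
]
vorp = [0.5, 1.2, 0.1, 1.0]

r = pearson_r(encode_categories(health_problem), vorp)
```

Note that with label encoding the sign and size of the coefficient depend on the arbitrary ordering of the codes, which is one reason to interpret such correlations cautiously.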

Figure 4.1: Correlation matrix between NBA performance and injury analytics

According to the results, the hypothesis test provides strong evidence of a statistically
significant relationship between performance and injury analytics, but the correlation
is only weakly positive or close to zero because the performance of players is a
multivariate phenomenon. Hence, performance can be influenced by injuries, but
injuries are just one factor out of several. Based on the aforementioned statements, it
could be hypothesized that team performance also has a weak relationship with injuries
because players are part of a team.

The following Pareto chart (Figure 4.2) plots the distribution of pathology and injury
events among the teams in the period 2010–20 in descending order of frequency. The
cumulative line on the secondary axis is shown as a percentage of the total events. Figure
4.2 thus illustrates the cumulative injuries and health problems from 2010 to 2020.
The San Antonio Spurs (SAS) had the most injuries and pathologies (241),
while the Oklahoma City Thunder (OKC) had the fewest distinct impairments (116),
partly answering RQ6 (Which teams and players suffered from the most injuries?).
Table 4.2 shows the occurrences of player injuries based on the proposed classification
derived from day-to-day injury reports. For instance, the “Anatomical sub-areas”
column shows that the most common musculoskeletal injuries in this period were in the
“Knee area” and “Ankle area”, with 849 and 801 occurrences, respectively.
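
The data behind a Pareto chart of this kind can be derived as sketched below: teams are sorted by descending event count, and a running cumulative percentage is attached. The SAS and OKC counts are taken from the text, while TEAM_X and its count are a hypothetical placeholder for the remaining teams.

```python
def pareto_table(counts):
    """Return (team, count, cumulative %) rows in descending order of count."""
    total = sum(counts.values())
    running = 0
    rows = []
    for team, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((team, count, 100.0 * running / total))
    return rows


# SAS and OKC are from the text; TEAM_X is a hypothetical placeholder.
events_per_team = {"OKC": 116, "SAS": 241, "TEAM_X": 180}
pareto_rows = pareto_table(events_per_team)
```

The bars of the chart correspond to the counts and the line on the secondary axis to the cumulative percentages.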

Figure 4.2: Pareto chart of pathology/injury events in the NBA teams from 2010–2020.

4.3 Results through Data Mining and Machine Learning methods
used for Sports Injury Analytics

There is currently an increasing demand for new techniques in sports that provide
significant insights and advanced analytics for teams, technical staff, and players.
Player performance prediction depends on many variables, such as psychology, injury
risk [19], bad shots in the opening minutes of the match, good condition and opponent
match-ups. These can influence performance but are difficult to explain and quantify
due to the high level of uncertainty [7]. Nevertheless, technical staff and data analysts
need to evaluate these metrics, monitor and track performance, and make important
decisions on team selection or future acquisitions through scouting [200].

This part of the research provides an overview of injury analytics and their correlations
with performance using basketball analytics. In addition, the state-of-the-art DM and ML
techniques and methodologies reported in the literature for performance evaluation and
prediction via injury associations were reviewed.

Injuries are difficult to predict because they present a high level of ambiguity and
uncertainty. The CARMELO methodology involves a combination of analytics
forecasting methods in which regression models input Real Plus Minus (RPM) and Box
Plus Minus (BPM) statistics to estimate performance. The method does not consider
attributes such as psychology, injuries or ergometric statistics [3], [110].

A comparative analysis of the DS and ML techniques reported in the literature over the
last 10 years as state-of-the-art methods for analysing and predicting injuries is
presented in Table 4.3.

The literature on ML and DM techniques used for player and team performance
evaluation in correlation with injury analytics was reviewed (RQ8).

Table 4.3: DS and ML techniques used in injury association with basketball analytics.

Data Mining\Machine Learning technique | Teams and Players injury relation
1) Random Forest predictive model with autoregressive application with logistic regression [21] | 1) Based on that RF model teams can avoid players injuries with strategic decision for proper resting.
2) Random Forest algorithm for injuries forecasting [67] | 2) Injury analysis for prediction purposes based on time series data. It requires enough data, difficult to find due to high complexity.
3) Linear and nonlinear regression models [67], [13] | 3) Difficult in practise to apply feature selection methods with the use of r-squared to intercorrelate and interpret the results.
4) Multiple regression models [203]. | 4) Quantify the correlation between various game parameters with the probability to win. Optimal players allocation to optimize planning and avoid injuries.
5) Pattern recognition methods to identify optimal teams’ substitutions over the season [113] | 5) The performance impact based on players loss of game(s) due to injury status
6) Least Absolute Shrinkage and Selection Operator (LASSO) and Ridge regression methods [157] | 6) Optimization of prediction accuracy in risk assessment and performance forecasting by applying variable selection and regularization methods. In case of useless variables LASSO performs better, in other cases it is better to use Ridge regression
7) Deep Learning and Artificial Neural Networks (ANN) for health and injury forecasting [13] | 7) ANN is a nonlinear ML method based on the complex correlation between inputs and outputs, trying to identify patterns
8) Support Vector Machine (SVM) with nonlinear classifiers [21], [13] | 8) Within per game statistics and injury information used for basketball outcome prediction
9) Naïve Bayes networks probabilistic models' classifiers [13], [157] | 9) Diagnosis of sports injuries used for performance evaluation
10) Decision trees used for classification purposes using models with class labels and branches [13] | 10) The output of that process is to help decision making with the criterion of which option is acting well based on the trained algorithm.
Unsupervised methods:
11) Fuzzy clustering [157] | 11) Alternative nonlinear method based on the possibility that one point belongs to more than one classes.
12) K-means clustering [157] | 12) Data clustered according to feature similarity. Clustering is performed based on centroids.
13) K-Nearest Neighbour [157], [204] | 13) The neighbours are selected from a set of attributes in k-NN class or the k-NN regression value
14) Markov Blanket classifier [157] | 14) Used for injury risk identification and further investigation in sport performance

In paper [109], the application of several data mining and machine learning methods
and/or algorithms for performance and injury analysis was reviewed based on the
Absence decision (Rest, DTD, DNP, Out indefinitely, and Out for season), and their
accuracy was benchmarked. For each technique, it was assessed whether absence values
need to be transformed from categorical into numerical values in preparation for
execution. In addition, the joined aggregated Performance and Injury dataset was
partitioned according to the 80/20 Pareto rule [205]; that is, 80% of the dataset was used
for training, and 20% was used for testing. The results of this comparative analysis are
depicted in Table 4.4. According to the Performance and Injury Analyses, the XGBoost
Tree Ensemble, XGBoost Linear Ensemble (Regression) and Linear Regression methods
were more accurate at predicting missing signals.
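
The two preparation steps described above (turning the categorical Absence decision into numbers and splitting the data 80/20) can be illustrated with a minimal standard-library sketch. The record layout, the function names and the integer codes are assumptions made for illustration only; the thesis does not state the encoding or the tooling that was used.

```python
import random

# Absence categories named in the text; the integer codes are an arbitrary assumption.
ABSENCE_LABELS = ["Rest", "DTD", "DNP", "Out indefinitely", "Out for season"]
LABEL_TO_CODE = {label: code for code, label in enumerate(ABSENCE_LABELS)}


def encode_absence(records):
    """Return copies of the records with a numeric 'absence_code' field added."""
    return [dict(r, absence_code=LABEL_TO_CODE[r["absence"]]) for r in records]


def train_test_split_80_20(records, seed=42):
    """Shuffle reproducibly, then keep 80% for training and 20% for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical records standing in for the joined Performance and Injury dataset.
records = [{"player_id": i, "absence": ABSENCE_LABELS[i % 5]} for i in range(50)]
encoded = encode_absence(records)
train, test = train_test_split_80_20(encoded)
print(len(train), len(test))
```

Techniques that accept categorical targets directly would skip the encoding step, which is why the text assesses the need for it per technique.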

To gain insight into the load and rest management of teams, an attempt was made to
explain and quantify the impact of “Rest” on team performance. For that reason, and
for the purposes of further data analysis, players with playing times of less than 20
minutes per game in each season were excluded from the initial dataset. This is
justified because only players with at least 20 minutes per game are expected to
significantly influence team performance through Rest/Load management. Performance
analytics were then clustered into three groups based on Rest Management, that is, days
of absence due to Rest.

The results of this analysis revealed that teams that do not use most of their resources,
in terms of time and players, demonstrate the lowest performance. The bottom of Table
4.5 comprises the 10 teams with the worst performance: 15.21 for PER and 3.21 for WS,
with a BPM of -0.19 and a VORP of 0.88. The best outcomes, with respect to
performance analytics, were achieved by teams with balanced rest management, as
indicated by the middle group of 10 teams in Table 4.5.

Table 4.4: Performance and Injury Analytics Based on the Accuracy of Absence
Reasons.

Data Mining/Machine Learning algorithms & Techniques Accuracy


XGBoost Tree Ensemble 99.98%
XGBoost Linear Ensemble - Regression 99.76%
Linear Regression 99.75%
Naïve Bayes 96.30%
Tree Ensemble – Regression 90.64%
Random Forest – Regression 90.06%
AutoRegressive Integrated Moving Average (ARIMA) 83.30%
Fuzzy Rule 79.75%
k-nearest Neighbor (k=2) 78.38%
Gradient Boosted Trees – Regression 77.16%
Polynomial Regression 55.58%
Support Vector Machine (SVM) 31.60%

Table 4.5: Team performance in relation to rest management in the period 2010-20.
Teams | per_avg | ws_avg | bpm_avg | vorp_avg | REST
SAS | 16.87 | 4.81 | 1.61 | 1.64 | 67
CLE | 15.43 | 3.27 | 0.38 | 1.12 | 31
DAL | 15.63 | 3.33 | 0.41 | 1.00 | 27
GSW | 15.26 | 4.13 | 0.92 | 1.42 | 25
PHI | 15.24 | 2.42 | -0.19 | 0.67 | 21
SAC | 15.82 | 3.30 | 0.08 | 1.03 | 21
ATL | 15.29 | 4.40 | 0.34 | 1.29 | 19
LAC | 16.10 | 4.73 | 1.21 | 1.67 | 17
BKN - NJN | 14.73 | 2.52 | -0.55 | 0.64 | 16
LAL | 14.72 | 2.38 | -0.75 | 0.70 | 13
Group values (per_class, ws_class, bpm_class, vorp_class) | 15.51 | 3.53 | 0.35 | 1.12 |
TOR | 14.99 | 3.37 | 0.08 | 0.97 | 13
HOU | 15.69 | 4.83 | 0.76 | 1.47 | 12
MIA | 17.74 | 5.43 | 1.67 | 1.91 | 12
MEM | 16.08 | 4.20 | 1.06 | 1.46 | 12
BOS | 15.41 | 3.84 | 0.46 | 1.17 | 11
MIN | 16.50 | 2.90 | 0.15 | 0.90 | 10
OKC | 15.72 | 4.04 | 0.83 | 1.35 | 10
CHI | 15.31 | 4.03 | 0.65 | 1.27 | 9
DEN | 15.45 | 3.92 | 0.42 | 1.16 | 9
POR | 16.32 | 4.33 | 0.67 | 1.38 | 9
Group values (per_class, ws_class, bpm_class, vorp_class) | 15.92 | 4.09 | 0.67 | 1.3 |
NYK | 16.60 | 4.06 | 0.37 | 1.15 | 9
IND | 13.97 | 3.73 | -0.15 | 0.99 | 8
NOH - NOP | 17.29 | 3.32 | 0.70 | 1.19 | 7
WAS | 15.31 | 2.65 | -0.11 | 0.73 | 7
DET | 14.93 | 2.90 | -0.28 | 0.76 | 7
MIL | 14.58 | 2.81 | -0.60 | 0.67 | 6
PHX | 15.30 | 3.21 | -0.04 | 0.93 | 6
ORL | 15.01 | 3.29 | -0.51 | 0.74 | 0
UTA | 15.08 | 3.96 | 0.23 | 1.16 | 0
CHA | 14.01 | 2.20 | -1.46 | 0.45 | 0
Group values (per_class, ws_class, bpm_class, vorp_class) | 15.21 | 3.21 | -0.19 | 0.88 |
Grand Total | 15.57 | 3.53 | 0.25 | 1.08 | 414
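
As a hedged illustration of how the three rest-management groups in Table 4.5 can be summarised, the sketch below ranks teams by REST days, cuts the ranking into groups of 10 and averages PER within each group. The per_avg and REST values are copied from Table 4.5; the grouping rule (equal-sized groups in table order) and the name `group_means` are assumptions, not a statement of the method used in the thesis.

```python
from statistics import mean

# (team, per_avg, rest_days) copied from Table 4.5, already sorted by REST descending.
TEAM_ROWS = [
    ("SAS", 16.87, 67), ("CLE", 15.43, 31), ("DAL", 15.63, 27),
    ("GSW", 15.26, 25), ("PHI", 15.24, 21), ("SAC", 15.82, 21),
    ("ATL", 15.29, 19), ("LAC", 16.10, 17), ("BKN - NJN", 14.73, 16),
    ("LAL", 14.72, 13), ("TOR", 14.99, 13), ("HOU", 15.69, 12),
    ("MIA", 17.74, 12), ("MEM", 16.08, 12), ("BOS", 15.41, 11),
    ("MIN", 16.50, 10), ("OKC", 15.72, 10), ("CHI", 15.31, 9),
    ("DEN", 15.45, 9), ("POR", 16.32, 9), ("NYK", 16.60, 9),
    ("IND", 13.97, 8), ("NOH - NOP", 17.29, 7), ("WAS", 15.31, 7),
    ("DET", 14.93, 7), ("MIL", 14.58, 6), ("PHX", 15.30, 6),
    ("ORL", 15.01, 0), ("UTA", 15.08, 0), ("CHA", 14.01, 0),
]


def group_means(rows, group_size=10):
    """Average per_avg within consecutive groups of teams ranked by REST days."""
    # sorted() is stable, so teams tied on REST keep their table order.
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    groups = [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]
    return [round(mean(r[1] for r in g), 2) for g in groups]


per_class = group_means(TEAM_ROWS)
print(per_class)
```

Under these assumptions the three group means agree with the PER group values reported in the table, with the middle (balanced rest) group highest and the bottom group lowest.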

4.4 Findings on the Economic and Performance Impact of Injuries,
Age and Position on NBA Players

Basketball is a sport with a high level of ambiguity and interrelated parameters that
follow a multivariate model. Here, findings are presented from clustering in terms of
age (RQ10) and from feature selection for injury cost relationships (RQ9), according to
state-of-the-art techniques. For the age and position criteria for high earnings in the
NBA League, clustering was performed and associations between demographic
information and financial analytics were demonstrated (RQ11).

Important criteria, such as age, nutrition, training status, history and load monitoring,
psychological status, social status and lifestyle, stress tolerance and proper recovery
procedures, need to be considered to avoid the extra costs that can result from an injury
or health pathology [46].

During their careers, players can change their playing position for various reasons, such
as coaching staff decisions, opponent adaptation, specific skills targeting them, body
transformation, and age criteria. For consistency purposes, each player was assigned to
his most common position (RQ11).
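
Assigning a player to his most common position amounts to taking the mode of his recorded positions. The following is a minimal sketch under that reading; the function name and the example career are hypothetical, and the tie-breaking rule (first position seen wins) is an assumption the thesis does not specify.

```python
from collections import Counter


def most_common_position(season_positions):
    """Return the position that appears most often in a player's records."""
    counts = Counter(season_positions)
    # most_common() orders ties by first occurrence (Python 3.7+).
    return counts.most_common(1)[0][0]


# Hypothetical career: one position label per season.
career = ["SG", "SG", "SF", "SG", "SF"]
assigned = most_common_position(career)
print(assigned)
```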

A basketball team comprises 12 active players; these players are eligible to play at any
time during a game. Therefore, proper player selection on the roster can enhance team
performance, and each small good decision at any moment of the game can help a team
take the lead or win the match. Over the past 10 years, the trending coaching style has
involved distributing playing time to all players on the roster while keeping or changing
the game tempo according to the team's strategy. For that reason, the recognition of the
sixth man (6th player) in the NBA League is a good sign that technical staff are giving
extra attention to all players on the roster. Therefore, appropriate roster selection, team
management, role assignment and rotation are significant factors for achieving better
results and can provide added value to a team [206], [207], [208], [209], [210], [36].

During the period studied, notable salary decreases occurred in 2011-12 and 2012-13
due to the NBA lockout decision, which shortened the season from 82 to 66 games per
team, with total average annual salary differences of -3.38% and -9.52%, respectively,
compared with the previous year’s average salary [178], [211], [212]. The worldwide
recession of the global economy between 2007 and 2009 also influenced NBA player
salaries, resulting in an 8% average decrease in the 2009-10 season [213], [214], [215]
(RQ11).

4.5 Results in Recovery from Injuries and Their Economic Impact

In this section, the results show the effects of injuries on basketball players' performance
by comparing a series of performance metrics captured during the 2, 5, and 10 games
before and after injury events. Combining Cohen's d with the t test provides a more
comprehensive understanding of both the statistical significance and the practical
importance of the data when evaluating effect sizes. Cohen’s d thresholds of 0.2-0.5 for
small effects, 0.5-0.8 for medium effects and greater than 0.8 for large effects are
standard benchmarks for interpreting the magnitude of an effect size, regardless of the
direction (positive or negative) of the effect. In this research, t tests were used to
identify metrics that showed significant differences pre- and postinjury, and Cohen's d
was then used to categorize these differences into small, medium, or large effects. This
combination of tests makes it possible not only to report which differences are
statistically significant but also to address the practical implications of these differences
based on their magnitude. By using statistical methods to assess the significance and
magnitude of changes in player efficiency, the aim was to reveal actionable insights into
the impact of injuries [216], [217], [218]. This study not only quantifies the immediate
and short-term implications of injuries but also contributes to the strategic planning of
player recovery and game strategy, offering a valuable resource for coaches, medical
staff, and sports analysts.
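
To make the combination of the t test and Cohen's d concrete, the sketch below computes both for a pre-injury and a post-injury sample and labels the effect using the thresholds quoted above. It is an illustration under stated assumptions: a two-sample Student's t statistic with pooled variance is used (the thesis does not state which t test variant was applied), the sign conventions are chosen so that a decline gives a negative d and a positive t, the handling of exact boundary values is arbitrary, and all names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(before, after):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(before), len(after)
    s1, s2 = stdev(before), stdev(after)
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def cohens_d(before, after):
    """Effect size as (post - pre) / pooled SD; negative means a decline."""
    return (mean(after) - mean(before)) / pooled_sd(before, after)


def t_statistic(before, after):
    """Pooled-variance t statistic as (pre - post); positive means a decline."""
    n1, n2 = len(before), len(after)
    return (mean(before) - mean(after)) / (pooled_sd(before, after) * sqrt(1 / n1 + 1 / n2))


def effect_label(d):
    """Classify |d| with the 0.2 / 0.5 / 0.8 thresholds, ignoring direction."""
    size = abs(d)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical efficiency values for 5 games before and 5 games after an injury.
before = [20, 22, 19, 21, 23]
after = [18, 20, 17, 19, 21]
d = cohens_d(before, after)
t = t_statistic(before, after)
print(round(d, 3), round(t, 3), effect_label(d))
```

The p value would then be obtained from the t distribution with n1 + n2 - 2 degrees of freedom, which is omitted here to keep the sketch within the standard library.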

Table 4.6, Table 4.7 and Table 4.8 show the results of analysing anatomical sub-areas
in correlation with the significance of the dataset's performance metrics, as stated in
Table 0.11, the effect size, and the average percentage of change in 2, 5 and 10 game
series (RQ12).
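
The 2, 5 and 10 game windows and the average percentage change can be illustrated with a short sketch. The function name, the per-game values and the position of the injury are hypothetical, and the percentage change is assumed to be computed relative to the pre-injury mean.

```python
from statistics import mean


def window_change(values, injury_index, k):
    """Percentage change of the mean over k games after vs k games before an injury.

    values[injury_index] is the first game played after the injury.
    Returns None when a window is empty or the pre-injury mean is zero.
    """
    before = values[max(0, injury_index - k):injury_index]
    after = values[injury_index:injury_index + k]
    if not before or not after:
        return None
    base = mean(before)
    if base == 0:
        return None
    return 100.0 * (mean(after) - base) / base


# Hypothetical per-game metric; the injury occurs between index 9 and index 10.
games = [20, 22, 21, 19, 23, 20, 21, 22, 20, 20,
         15, 17, 18, 19, 20, 21, 20, 22, 21, 20]

for k in (2, 5, 10):
    print(k, round(window_change(games, 10, k), 2))
```

In this made-up series the drop is largest in the 2-game window and shrinks as the window widens, which is the kind of pattern the three tables allow the reader to compare.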

2 Games before/after the injury
Areas with Significant Impact:
• The anatomical sub-areas that had statistically significant impacts (p values less
than 0.05) included the ankle area, knee area, thigh area, abdominal area, and many
others, totalling 18 different areas.
• Cohen’s d and t-stat:
• Large Effect (>0.8): No areas with a large effect size were identified.
• Medium Effect (0.5-0.8): No areas with a medium effect size were identified.
• Small Effect (0.2-0.5): The upper arm and forearm area was the only area with a
small effect size.
Percentage change:
• Most Impacted: The abdominal area experienced the most significant average
percentage change, suggesting a considerable decrease in performance metrics
postinjury.
• Least Impacted: The Chest area is the least impacted based on the average
percentage change, indicating a less substantial decrease in performance.
Areas of Concern:
• The areas of concern due to significant p values but smaller effect sizes included
the ankle area, knee area, thigh area, and several others, highlighting the need for
careful consideration of both statistical significance and effect size.

Table 4.6: 2 Games before/after the injury.


Anatomical subareas | Avg P Value | Median P Value | Avg T-Stat | Median T-Stat | Avg Cohen's d | Median Cohen's d | Average Percentage Change | Median Avg Percentage Change
Ankle area 1.03E-28 1.11E-68 18.77 19.18 -0.40 -0.41 -18.73 -21.97
Knee area 2.78E-39 1.33E-45 16.29 14.39 -0.32 -0.28 -14.00 -15.47
Thigh area 7.85E-20 4.45E-31 12.45 12.41 -0.38 -0.40 -15.60 -17.80
Abdominal area 8.68E-06 1.82E-28 10.80 11.68 -0.54 -0.58 -28.09 -29.06
Foot area 3.67E-17 4.80E-25 11.12 10.62 -0.41 -0.39 -23.16 -24.21
Thoracolumbar area 5.98E-14 9.26E-21 10.53 9.48 -0.30 -0.28 -11.32 -10.69
Hand, Thumb & Fingers area 5.73E-06 2.67E-18 8.55 9.03 -0.38 -0.40 -14.90 -16.58
Hip area 8.72E-11 2.19E-17 8.30 8.75 -0.37 -0.40 -13.88 -16.43
Shoulder area 2.25E-08 7.90E-16 8.49 8.78 -0.37 -0.42 -17.46 -16.97
Calf area 2.17E-08 5.48E-13 7.56 7.41 -0.36 -0.40 -16.73 -18.81
Wrist area 5.63E-05 4.93E-08 6.06 5.58 -0.34 -0.26 -9.96 -22.79
Heel area 2.97E-04 1.78E-07 5.69 5.32 -0.32 -0.25 -14.11 -19.22

Elbow area 2.64E-03 1.07E-05 2.97 4.51 -0.23 -0.31 1.73 -13.62
Chest area 1.96E-02 1.47E-02 -0.23 -2.10 0.00 0.22 17.21 22.96
Pelvic area 3.33E-02 2.17E-02 1.05 2.15 -0.16 -0.32 -1.18 -23.34
Fibular area 3.38E-02 2.45E-02 1.63 2.31 -0.28 -0.39 -5.57 -15.27
Shin area 3.27E-02 3.04E-02 1.10 2.16 -0.17 -0.30 -7.61 -22.61
Upper arm and Forearm area 5.14E-02 5.01E-02 -1.74 -2.05 0.35 0.43 1.09 40.49

5 Games before/after the injury


Areas with significant impact: Very significant impacts on performance are observed
in various areas, such as the ankle, knee, thigh, thoracolumbar, and foot areas.
• Cohen’s d and t-stat:
• Large Effect (>0.8): No areas fall under this category.
• Medium Effect (0.5-0.8): No areas with a medium effect size are identified.
• Small Effect (0.2-0.5): No areas with a small effect size are identified.
• Percentage change:
• Most impacted: The upper arm and forearm areas were identified as the most
impacted areas based on the average percentage change, indicating a
considerable decrease in performance postinjury.
• Least impacted: The Shin area is identified as the least impacted area,
suggesting a relatively smaller decrease in performance.
• Areas of Concern:
• Areas of concern with significant p values but without a strong effect size
included the ankle, knee, thigh, thoracolumbar, foot, and other areas. These areas
may require further attention despite showing statistical significance.

Table 4.7: 5 games before and after the injury.


Anatomical sub-areas | Avg P Value | Median P Value | Avg T-Stat | Median T-Stat | Avg Cohen’s d | Median Cohen’s d | Average % Change | Median of % Change
Ankle area 1.78E-30 7.46E-74 18.04 18.78 -0.35 -0.35 -4.88 -7.39
Knee area 6.39E-33 6.94E-52 16.20 15.43 -0.28 -0.26 -1.56 -4.12
Thigh area 4.32E-22 9.77E-35 12.73 12.70 -0.35 -0.34 -9.91 -12.09
Thoracolumbar area 3.68E-19 2.37E-25 11.63 10.93 -0.30 -0.28 -3.27 -2.16
Foot area 5.60E-15 3.04E-24 10.58 10.42 -0.35 -0.34 -3.33 -6.83
Abdominal area 6.03E-09 2.79E-20 9.48 9.59 -0.42 -0.43 -8.54 -10.39
Shoulder area 3.15E-09 1.49E-19 8.77 9.36 -0.36 -0.36 -4.78 -5.79

Hip area 2.30E-10 4.40E-17 8.37 8.66 -0.35 -0.35 -3.70 -4.24
Calf area 8.50E-11 5.00E-14 7.70 7.76 -0.34 -0.34 -7.96 -9.26
Hand, Thumb & Fingers area 1.33E-07 8.27E-13 8.16 7.30 -0.35 -0.29 -5.57 -5.41
Wrist area 2.60E-06 8.93E-12 7.01 7.07 -0.42 -0.43 -10.10 -9.99
Heel area 6.18E-06 3.00E-10 6.41 6.48 -0.35 -0.35 -7.05 -8.92
Toes area 5.40E-05 6.23E-09 6.03 6.01 -0.35 -0.33 -4.87 -2.27
Elbow area 1.68E-04 1.51E-05 3.87 4.43 -0.28 -0.34 -3.50 -8.18
Chest area 2.29E-02 4.40E-03 2.95 2.92 -0.29 -0.33 -3.32 -5.24
Shin area 2.29E-02 6.91E-03 1.34 2.44 -0.18 -0.27 4.89 -1.10
Pelvic area 3.07E-02 2.10E-02 1.89 2.42 -0.26 -0.29 -7.34 -16.70
Upper arm and Forearm area 4.53E-02 4.15E-02 0.86 1.95 -0.10 -0.28 -10.80 -4.25
Fibular area 4.47E-02 4.54E-02 1.69 2.08 -0.28 -0.35 0.90 4.41

10 Games before/after the injury


Areas with significant impact: Very significant impacts on performance are noted in
various areas, such as the ankle, knee, thigh, thoracolumbar, and foot areas.
• Cohen's d and t-stat:
• Large Effect (>0.8): No areas fall under this category.
• Medium Effect (0.5-0.8): No areas with a medium effect size are identified in
the latest dataset.
• Small Effect (0.2-0.5): No areas with a small effect size are identified. This
finding suggested that either the effect sizes are less than 0.2 or that the criteria
for categorization might need adjustment based on the dataset specifics.
• Percentage change:
• Most impacted area: The chest area was identified as the most impacted area
based on the average percentage change, suggesting a notable decrease in
performance metrics postinjury.
• Least impacted: The Upper Arm and Forearm Area are identified as the least
impacted areas, which may suggest a relatively smaller change in performance.
• Areas of Concern:
• Areas of concern with significant p values but without a corresponding large or
medium effect size included the ankle, knee, thigh, thoracolumbar, foot, and
several other areas.

Table 4.8: 10 games before and after the injury.
Anatomical subareas | Avg P Value | Median P Value | Avg T-Stat | Median T-Stat | Avg Cohen’s d | Median Cohen’s d | Average of % Change | Median % Change
Ankle area 2.05E-41 2.65E-57 16.19 16.34 -0.28 -0.26 -0.76 -3.83
Knee area 8.79E-42 2.35E-45 15.91 14.37 -0.26 -0.22 3.17 3.43
Thigh area 1.02E-26 2.86E-29 11.94 11.54 -0.31 -0.32 -2.42 -3.81
Thoracolumbar area 2.35E-26 7.47E-29 12.01 11.39 -0.27 -0.27 3.05 2.45
Foot area 2.34E-15 6.41E-24 10.49 10.36 -0.31 -0.31 1.18 -1.29
Abdominal area 7.17E-09 6.73E-18 8.83 9.02 -0.35 -0.36 -3.63 -5.08
Shoulder area 5.64E-14 5.82E-14 7.84 7.69 -0.29 -0.27 0.57 -1.47
Hip area 3.97E-11 1.52E-13 7.58 7.56 -0.29 -0.27 5.20 1.55
Calf area 3.12E-10 1.17E-11 7.19 6.95 -0.29 -0.25 0.57 0.54
Heel area 2.07E-07 8.96E-11 6.85 6.69 -0.33 -0.33 -5.43 -5.25
Hand, Thumb & Fingers area 1.10E-08 3.50E-10 7.11 6.67 -0.28 -0.26 1.49 0.68
Wrist area 3.40E-05 1.43E-09 5.67 6.23 -0.33 -0.35 1.49 -2.68
Toes area 1.33E-05 4.26E-08 5.72 5.76 -0.30 -0.27 -4.02 -5.09
Elbow area 2.19E-03 2.15E-06 4.04 4.87 -0.26 -0.28 -1.98 -5.40
Chest area 1.87E-02 1.02E-03 3.16 3.49 -0.32 -0.31 -49.15 -6.78
Pelvic area 2.30E-02 1.39E-02 2.36 2.60 -0.26 -0.25 -8.06 -10.80
Fibular area 3.22E-02 2.22E-02 2.37 2.42 -0.34 -0.36 -5.66 -6.96
Shin area 3.13E-02 3.25E-02 2.22 2.18 -0.24 -0.23 4.53 2.28
Upper arm and Forearm area 4.17E-02 4.12E-02 0.87 1.98 -0.11 -0.27 7.94 6.24

Table 4.9, Table 4.10 and Table 4.11 show the statistically significant results of the
advanced performance metrics in correlation with the significance of the dataset, the
effect size, and the average percentage of change in 2, 5 and 10 game series (RQ13).

2 Games before/after the injury


Notable Observations
• Metrics such as POSS_ADVANCED, OFF_RATING_ADVANCED,
DEF_RATING_ADVANCED, and others had average p values less than 0.05,
indicating significant impacts in these areas.
• Cohen's d and t-stat:
• Large Effect: No metrics displayed a large effect size (Cohen's d > 0.8).

• Medium Effect: Metrics such as PLUS_MINUS_TRADITIONAL,
PCT_BLK_USAGE, and BLK_TRADITIONAL have medium effect
sizes (Cohen's d between 0.5 and 0.8).
• Small Effect: PCT_TOV_USAGE is the only metric with a small effect
size (Cohen's d between 0.2 and 0.5).
• Percentage change:
• Most Impacted: DFGM_PLAYER_TRACK is the metric with the most
significant average percentage change, indicating a considerable decrease
in performance metrics postinjury.
• Least Impacted: BLK_TRADITIONAL is the metric with the least
average percentage change, suggesting a smaller change in performance.
• Areas of Concern: Metrics such as POSS_ADVANCED,
OFF_RATING_ADVANCED, and DEF_RATING_ADVANCED have
significant p values with Cohen's d <= 0.2, which may indicate areas of concern
despite their statistical significance.

Table 4.9: 2 games before/after the injury – Basketball Performance Analytics.


Performance Metric | Median P Value | Avg P Value | Avg T-Stat | Median T-Stat | Median Cohen's d | Avg Cohen's d | Median % of Change | Average % Change
POSS_ADVANCED 4.24E-32 4.24E-32 12.55 12.55 -0.53 -0.53 2.40 2.40
FG_PCT_PLAYER_TRACK 1.51E-14 1.51E-14 8.05 8.05 -0.54 -0.54 -15.97 -15.97
OFF_RATING_ADVANCED 2.32E-07 3.00E-04 7.13 6.82 -0.61 -0.62 -15.85 -15.51
DEF_RATING_ADVANCED 7.02E-10 6.34E-04 7.01 7.09 -0.58 -0.59 -14.09 -15.22
E_DEF_RATING_ADVANCED 1.04E-02 1.04E-02 2.79 2.79 -0.66 -0.66 -5.49 -5.49
OPP_FTA_RATE_FOUR_FACTORS 3.56E-02 3.56E-02 -2.24 -2.24 0.55 0.55 27.00 27.00
PCT_PF_USAGE 4.10E-02 4.10E-02 -2.18 -2.18 0.61 0.61 61.32 61.32
E_NET_RATING_ADVANCED 4.15E-02 4.15E-02 -2.16 -2.16 0.60 0.60 -183.09 -183.09
OPP_TOV_PCT_FOUR_FACTORS 4.95E-02 4.95E-02 -2.07 -2.07 0.65 0.65 27.79 27.79

5 Games before/after the injury


Metrics with significant impact: All the provided metrics have average p values less
than 0.05, indicating statistically significant impacts.
• Cohen's d and t-stat:
o Large Effect: None.

o Medium Effect (0.5-0.8): Metrics such as
'OPP_FTA_RATE_FOUR_FACTORS', 'PCT_PF_USAGE',
'E_NET_RATING_ADVANCED', and
'OPP_TOV_PCT_FOUR_FACTORS' fall into this category.
o Small Effect (0.2-0.5): There are no metrics falling into the small effect
category.
• Percentage change:
o Most impacted: 'E_NET_RATING_ADVANCED' is the most impacted
metric with the highest median average percentage change.
o Notable positive changes: Metrics including “POSS_ADVANCED”,
“OPP_FTA_RATE_FOUR_FACTORS”, “PCT_PF_USAGE”, and
“OPP_TOV_PCT_FOUR_FACTORS” exhibited positive percentage changes.
o Metrics of Concern: Metrics such as 'POSS_ADVANCED',
'FG_PCT_PLAYER_TRACK', 'OFF_RATING_ADVANCED',
'DEF_RATING_ADVANCED', and 'E_DEF_RATING_ADVANCED' are
of concern due to their negative Cohen's d values.

Table 4.10: 5 games before/after the injury – Basketball Performance Analytics.


Performance Metric | Median P Value | Avg P Value | Avg T-Stat | Median T-Stat | Median Cohen's d | Avg Cohen's d | Median % of Change | Average % Change
POSS_ADVANCED 4.24E-32 4.24E-32 12.55 12.55 -0.53 -0.53 2.40 2.40
FG_PCT_PLAYER_TRACK 1.51E-14 1.51E-14 8.05 8.05 -0.54 -0.54 -15.97 -15.97
OFF_RATING_ADVANCED 2.32E-07 3.00E-04 7.13 6.82 -0.61 -0.62 -15.85 -15.51
DEF_RATING_ADVANCED 7.02E-10 6.34E-04 7.01 7.09 -0.58 -0.59 -14.09 -15.22
E_DEF_RATING_ADVANCED 1.04E-02 1.04E-02 2.79 2.79 -0.66 -0.66 -5.49 -5.49
OPP_FTA_RATE_FOUR_FACTORS 3.56E-02 3.56E-02 -2.24 -2.24 0.55 0.55 27.00 27.00
PCT_PF_USAGE 4.10E-02 4.10E-02 -2.18 -2.18 0.61 0.61 61.32 61.32
E_NET_RATING_ADVANCED 4.15E-02 4.15E-02 -2.16 -2.16 0.60 0.60 -183.09 -183.09
OPP_TOV_PCT_FOUR_FACTORS 4.95E-02 4.95E-02 -2.07 -2.07 0.65 0.65 27.79 27.79

10 Games before/after the injury


Metrics with significant impact: All the provided metrics have average p values less
than 0.05, indicating statistically significant impacts.
• Cohen's d and t-stat:
• Large Effect (>0.8): None.

• Medium Effect (0.5-0.8): 'OPP_TOV_PCT_FOUR_FACTORS' and
'E_NET_RATING_ADVANCED' are identified in this category.
• Small Effect (0.2-0.5): Most of the other metrics fall into this category.
• Percentage change:
• Most impacted: 'OPP_TOV_PCT_FOUR_FACTORS' and
'E_NET_RATING_ADVANCED' were the most impacted metrics.
• Notably, 'OPP_TOV_PCT_FOUR_FACTORS' and
'E_NET_RATING_ADVANCED' showed positive changes.
• Metrics of Concern: Metrics such as 'OPP_PTS_FB_MISC',
'DEF_RATING_ADVANCED', 'OFF_RATING_ADVANCED',
'E_DEF_RATING_ADVANCED', 'POSS_ADVANCED', and
'OPP_EFG_PCT_FOUR_FACTORS' are of concern due to their negative
Cohen's d values.

Table 4.11: 10 games before/after the injury – Basketball Performance Analytics.


Performance Metric | Median P Value | Avg P Value | Avg T-Stat | Median T-Stat | Median Cohen's d | Avg Cohen's d | Median % of Change | Average % Change
OPP_PTS_FB_MISC 0.0005 0.0005 3.965 3.965 -0.650 -0.650 -11.830 -11.830
DEF_RATING_ADVANCED 0.0037 0.0034 5.640 3.111 -0.603 -0.647 -11.010 -11.610
OFF_RATING_ADVANCED 0.0051 0.0051 2.990 2.990 -0.538 -0.538 -10.590 -10.590
E_DEF_RATING_ADVANCED 0.0051 0.0051 3.092 3.092 -0.706 -0.706 -4.710 -4.710
POSS_ADVANCED 0.0091 0.0091 2.853 2.853 -0.543 -0.543 -11.230 -11.230
OPP_EFG_PCT_FOUR_FACTORS 0.0247 0.0247 2.404 2.404 -0.526 -0.526 -4.090 -4.090
OPP_TOV_PCT_FOUR_FACTORS 0.0264 0.0264 -2.372 -2.372 0.697 0.697 17.440 17.440
E_NET_RATING_ADVANCED 0.0446 0.0446 -2.125 -2.125 0.616 0.616 44.240 44.240

In sports, and more specifically in basketball, which was examined in this research,
understanding the intricate relationship between player injuries, their recovery time, and
the resulting economic impact on teams is crucial. This understanding aids not only in
better injury management but also in planning finances and team dynamics. To examine
this aspect more closely, the study presents two key tables, Table 4.12 and Table 4.13,
each serving a distinct yet interrelated purpose. The data revealed a significant
correlation between the duration of player recovery and the financial implications for
NBA teams, underscoring the critical nature of injury management and prevention in
professional basketball.

The dataset analysed in Table 4.12 encompasses a total of 30 unique teams and is used
to assess the average recovery time from injuries and to quantify the total economic
losses incurred by these teams because of these injuries. This table is instrumental in
highlighting the broader impact of injuries across the League, offering insights into the
average duration players take to recover and how this downtime translates into financial
terms for their respective teams. The average recovery time for each team was
calculated by summing the recovery periods for all injuries incurred by players on each
team over the years and then dividing this sum by the total number of injuries. To
determine the total financial losses, the per-game salary of each player was multiplied
by the number of games missed due to injury, and these totals were then aggregated for
each team, corresponding to the teams to which the respective players belonged over
the years.
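
The two calculations described above can be sketched as follows. The record layout, the team codes and the salary figures are hypothetical, and deriving the per-game salary by dividing the season salary by 82 games is an assumption; the thesis does not state how the per-game salary was obtained.

```python
from collections import defaultdict

GAMES_PER_SEASON = 82  # assumption used only to derive a per-game salary


def team_summary(injuries):
    """Average recovery days and summed salary losses per team."""
    days = defaultdict(list)
    losses = defaultdict(float)
    for inj in injuries:
        per_game_salary = inj["season_salary"] / GAMES_PER_SEASON
        days[inj["team"]].append(inj["recovery_days"])
        losses[inj["team"]] += per_game_salary * inj["games_missed"]
    return {
        team: {
            "avg_recovery_days": sum(d) / len(d),
            "sum_of_losses": losses[team],
        }
        for team, d in days.items()
    }


# Hypothetical injury records (one row per injury event).
injuries = [
    {"team": "AAA", "season_salary": 8_200_000, "recovery_days": 10, "games_missed": 4},
    {"team": "AAA", "season_salary": 16_400_000, "recovery_days": 30, "games_missed": 10},
    {"team": "BBB", "season_salary": 4_100_000, "recovery_days": 21, "games_missed": 8},
]
summary = team_summary(injuries)
for team, stats in summary.items():
    print(team, stats["avg_recovery_days"], round(stats["sum_of_losses"], 1))
```

Grouping by an anatomical sub-area field instead of the team would give the same two quantities per injury location.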

The average recovery time across all teams is approximately 35.98 days, with the total
sum of team losses reaching $21,208,828,385.5. The team with the highest average
recovery time was NOH & NOK & NOP (the New Orleans Hornets and New Orleans
Pelicans are the same franchise after rebranding or relocation), at 48.19 days, which was
associated with a sum of team losses of $850,911,070.3. Conversely, the team with the
lowest average recovery time was SAS (San Antonio Spurs), at 21.36 days, associated
with a sum of team losses of $484,022,964.1 (RQ15).

Table 4.12: Team Recovery Time Correlated with the Sum of Losses in the Period 2000
to 2023.

Teams | Average Recovery Time (2000 - 2023) | Sum of Losses (2000 - 2023)
GSW 38.33 $980,043,613.6
DEN 33.96 $898,608,370.5
WAS 42.38 $896,393,189.0
CLE 38.91 $884,262,820.6
NYK 43.09 $882,353,294.8

NOH & NOK & NOP1 47.13 $850,911,070.3
HOU 35.51 $831,909,820.5
LAL 43.89 $829,088,874.5
MEM & VAN2 37.27 $818,408,844.2
MIL 33.00 $803,077,829.5
BKN & NJN3 35.94 $787,297,893.7
POR 42.67 $725,597,473.1
TOR 37.21 $719,025,631.9
DAL 31.81 $717,319,559.4
IND 28.26 $707,021,308.5
MIA 27.46 $696,390,943.7
MIN 39.37 $687,081,943.9
CHA & CHH4 40.73 $672,569,410.7
LAC 32.64 $669,659,380.7
ORL 34.44 $650,059,655.1
PHI 29.59 $633,630,120.4
ATL 35.52 $629,062,649.4
CHI 41.94 $620,958,846.2
SAC 37.62 $593,901,503.1
PHX 37.92 $553,901,960.3
UTA 32.62 $552,960,349.6
DET 33.35 $494,764,653.5
SAS 21.36 $484,022,964.1
BOS 37.31 $470,742,686.1
OKC & SEA 29.40 $467,801,724.6
Grand Total 35.98 $ 21,208,828,385.5

1 New Orleans Pelicans (previously the New Orleans Hornets, and before that, the Charlotte Hornets)
2 Memphis Grizzlies (the team started in Vancouver and moved to Memphis in 2001)
3 Brooklyn Nets (previously known as the New Jersey Nets until 2012)
4 Charlotte Hornets (the team was previously known as the Charlotte Bobcats)

Table 4.13, on the other hand, takes a more granular approach by breaking down injuries
into specific anatomical sub-areas. It examines the average recovery time in days and the
associated economic losses for each type of injury, categorized by the location on the
body, within a sport context. This detailed analysis is pivotal for understanding which
injuries are most detrimental in terms of recovery time and economic burden, thereby
guiding teams and healthcare professionals in prioritizing injury prevention and
treatment strategies based on the anatomical area affected (RQ13-15).

Table 4.13: Relationships of anatomical subareas with average recovery time and
economic losses.

Anatomical sub-areas Avg Recovery Time Sum of Team Losses


Knee area 44.47 $4,223,672,393.1
Unclassified 30.32 $3,923,783,660.9
Ankle area 32.67 $2,509,238,498.5
Thigh area 33.75 $1,544,221,395.7
Thoracolumbar area 30.02 $1,345,058,412.9
Foot area 43.91 $1,216,344,145.1
Hand, Thumb & Fingers area 51.29 $1,025,316,589.5
Heel area 45.94 $ 718,869,461.3
Shoulder area 47.04 $ 691,206,599.1
Abdominal area 35.21 $ 630,757,817.8
Calf area 36.75 $ 595,174,649.9
Hip area 28.63 $ 480,523,199.1
Wrist area 46.31 $ 412,959,249.0
Cranial area 32.92 $ 269,506,780.6
Toes area 40.45 $ 225,354,084.5
Elbow area 48.64 $ 220,358,271.9
Rest 11.59 $ 190,999,755.3
Other facial areas 64.23 $ 143,326,062.1
Neck 20.94 $ 138,930,281.6
Shin area 44.00 $ 115,452,588.8
Digestive 13.02 $ 107,701,146.1
Mouth area 25.98 $ 90,712,536.9
Eye area 36.47 $ 79,136,024.3
Fibular area 75.04 $ 64,233,781.7
Nose area 21.17 $ 54,995,090.5
Pelvic area 30.78 $ 53,377,027.9
Chest area 16.39 $ 42,115,533.9
COVID-19 76.44 $ 32,549,095.0
Upper arm and Forearm area 43.45 $ 32,083,939.4
Respiratory 19.80 $ 30,870,312.7
Grand Total $ 21,208,828,385.1

The data encompass a range of anatomical sub-areas with varying average recovery
times and sums of team losses. Among the categories with losses above $1 billion, the
'Hand, Thumb & Fingers area' has the longest average recovery time, at 51.29 days,
coinciding with a financial impact of $1,025,316,589.5; across all categories in Table
4.13, 'COVID-19' (76.44 days) and the 'Fibular area' (75.04 days) are longer still. In
contrast, apart from 'Rest' (11.59 days), the 'Digestive' category had the shortest
recovery time at 13.02 days, with associated losses of $107,701,146.1.

The 'Knee area' had the highest financial burden at $4,223,672,393.1, together with a
substantial average recovery time of 44.47 days. 'Respiratory' issues combine a low
average recovery time of 19.80 days with the smallest financial impact in the table,
$30,870,312.7. The total sum of team losses across all anatomical sub-areas is
substantial, amounting to $21,208,828,385.1 for the period from 2000 to 2023.
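
The rankings and the total quoted from Table 4.13 can be re-derived directly from the table rows. The following is a minimal sketch, not the pipeline used in this thesis: the values are copied from Table 4.13, while the helper name `summarize` and the dictionary keys are illustrative assumptions.

```python
from decimal import Decimal

# (sub-area, average recovery time in days, summed team losses in USD),
# copied row by row from Table 4.13. Losses are kept as strings so that
# Decimal arithmetic reproduces the printed total exactly.
TABLE_4_13 = [
    ("Knee area", 44.47, "4223672393.1"),
    ("Unclassified", 30.32, "3923783660.9"),
    ("Ankle area", 32.67, "2509238498.5"),
    ("Thigh area", 33.75, "1544221395.7"),
    ("Thoracolumbar area", 30.02, "1345058412.9"),
    ("Foot area", 43.91, "1216344145.1"),
    ("Hand, Thumb & Fingers area", 51.29, "1025316589.5"),
    ("Heel area", 45.94, "718869461.3"),
    ("Shoulder area", 47.04, "691206599.1"),
    ("Abdominal area", 35.21, "630757817.8"),
    ("Calf area", 36.75, "595174649.9"),
    ("Hip area", 28.63, "480523199.1"),
    ("Wrist area", 46.31, "412959249.0"),
    ("Cranial area", 32.92, "269506780.6"),
    ("Toes area", 40.45, "225354084.5"),
    ("Elbow area", 48.64, "220358271.9"),
    ("Rest", 11.59, "190999755.3"),
    ("Other facial areas", 64.23, "143326062.1"),
    ("Neck", 20.94, "138930281.6"),
    ("Shin area", 44.00, "115452588.8"),
    ("Digestive", 13.02, "107701146.1"),
    ("Mouth area", 25.98, "90712536.9"),
    ("Eye area", 36.47, "79136024.3"),
    ("Fibular area", 75.04, "64233781.7"),
    ("Nose area", 21.17, "54995090.5"),
    ("Pelvic area", 30.78, "53377027.9"),
    ("Chest area", 16.39, "42115533.9"),
    ("COVID-19", 76.44, "32549095.0"),
    ("Upper arm and Forearm area", 43.45, "32083939.4"),
    ("Respiratory", 19.80, "30870312.7"),
]


def summarize(rows):
    """Return the total losses and the extreme rows of the table."""
    losses = {name: Decimal(loss) for name, _, loss in rows}
    days = {name: recovery for name, recovery, _ in rows}
    return {
        "total_losses": sum(losses.values()),
        "costliest": max(losses, key=losses.get),
        "cheapest": min(losses, key=losses.get),
        "longest_recovery": max(days, key=days.get),
        "shortest_recovery": min(days, key=days.get),
    }


summary = summarize(TABLE_4_13)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Keeping the losses as `Decimal` avoids floating-point drift when thirty values in the billions are added together.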

Table 4.14, Table 4.15 and Table 4.16 provide a breakdown of injury effects by
anatomical sub-area, further classified into the Defensive, Miscellaneous (Misc),
Offensive, and Rating categories. The Grand Total is the sum of these categories for
each anatomical sub-area. The three tables are based on the categorization of
basketball analytics in Table 0.11 (Defensive, Misc, Offensive and Rating), which
allows a more focused analysis. Our filtered data analysis revealed a distribution of
injuries across various anatomical sub-areas, with an emphasis on the effect size and
significance of each injury. Rating metrics provide a high-level view of a player's
overall impact. Offensive and defensive metrics break down how points are scored and
prevented and what efficiencies exist in various facets of the game. The miscellaneous
category offers additional context and insights into the nuances of gameplay, such as
fast break effectiveness or how players indirectly contribute to scoring (RQ14).

Table 4.14 examines anatomical sub-area injuries across the performance categories
(Rating, Misc, Offensive, and Defensive) for the ±2 game series, and includes health
problems related to COVID-19. The pelvic area exhibited the greatest number of
statistically significant effects on performance (11), spread across the Rating (4),
Misc (3), Defensive (3), and Offensive (1) categories. COVID-19-related issues were
also notable, with a total of 9 significant effects, indicating a considerable impact
on player availability. The abdominal and wrist areas followed, with 8 and 7
significant effects, respectively. Overall, the Defensive category was associated with
the highest number of significant effects (24), followed by the Rating (20),
Offensive (16), and Misc (8) categories.

Table 4.14: Anatomical sub-areas injuries in comparison to Categorization (Defensive,
Misc, Offensive and Rating) of performance analytics based on the Significance and
Effect size for the ±2 Game Series.

Anatomical Sub-areas (2d) Rating Misc Offensive Defensive Grand Total
Abdominal area 4 1 1 2 8
Ankle area 1 1 2
Calf area 1 1 2
Chest area 1 1 2
COVID-19 4 2 1 2 9
Cranial area 1 1 2
Elbow area 1 1
Eye area 1 1
Fibular area 3 1 4
Foot area 1 1
Hand, Thumb & Fingers area 1 1 2
Heel area 1 1 2
Hip area 1 1 2
Mouth area 2 2
Pelvic area 4 3 1 3 11
Shin area 1 1 2
Shoulder area 1 1 2
Thigh area 1 1 2
Toes area 1 1
Upper arm and Forearm area 1 1 1 3
Wrist area 4 1 1 1 7
Grand Total 20 8 16 24 68

The analysis of anatomical sub-area injuries for the ±5 game series (Table 4.15)
indicated that a total of 22 statistically significant effects were associated with
performance variation. The Defensive category accounted for the highest number of
effects (11), suggesting that defensive performance is the most sensitive to injury.
The upper arm and forearm area had the highest number of effects (5), followed by the
abdominal and wrist areas (3 each). COVID-19-related effects were also recorded (2),
but their impact was less pronounced than that of the leading sub-areas. Notably, only
5 of the 22 effects fell under the Rating category.

Table 4.15: Anatomical sub-areas injuries in comparison to Categorization (Defensive,
Misc, Offensive and Rating) of performance analytics based on the Significance and
Effect size for the ±5 Game Series.

Anatomical sub-areas (5d) Rating Misc Offensive Defensive Grand Total


Abdominal area 1 1 1 3
Chest area 1 1 2
COVID-19 1 1 2
Elbow area 1 1
Hand, Thumb & Fingers area 1 1
Heel area 1 1
Other facial areas 1 1
Pelvic area 1 1 2
Toes area 1 1
Upper arm and Forearm area 3 2 5
Wrist area 1 1 1 3
Grand Total 5 2 4 11 22

As shown in Table 4.16, the ±10 game series revealed a total of 11 statistically
significant effects on performance across four anatomical sub-areas. The upper arm and
forearm area had the highest count, with 6 effects, 4 of which fell under the Rating
category. Outside the Rating category (5), the Defensive category again accounted for
the most effects, with a total of 4. The other sub-areas appeared less frequently,
with the fibular and pelvic areas each showing 2 effects and the abdominal area 1.
Only one effect fell under the Offensive category, and one under Misc.

Table 4.16: Anatomical sub-areas injuries in comparison to Categorization (Defensive,
Misc, Offensive and Rating) of performance analytics based on the Significance and
Effect size for the ±10 Game Series.

Anatomical sub-areas (10d) Rating Offensive Misc Defensive Grand Total
Upper arm and Forearm area 4 2 6
Fibular area 1 1 2
Abdominal area 1 1
Pelvic area 1 1 2
Grand Total 5 1 1 4 11

4.6 Results of Injury Patterns and Impact on Performance in the NBA
League Using Sports Analytics

Basketball involves substantial uncertainty and interdependence within a multivariate
framework. Musculoskeletal injuries make up a significant portion of the health
problems in basketball, with a total count of 15,500, which is 65.54% of all reported
problems, as presented in Table 4.17 (RQ16).

Table 4.17: Number of grouped health problems and percentage allocation.

Health problems # Health problems % Health problems


General health problems 7532 31.85%
Head injuries 618 2.61%
Musculoskeletal Injuries 15,500 65.54%
Total 23,650 100.00%
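
The percentage allocation in Table 4.17 is each group's count divided by the overall total. A short sketch of that arithmetic follows; the counts are copied from Table 4.17, and the function name `percentage_allocation` is an illustrative assumption.

```python
# Counts copied from Table 4.17.
health_problems = {
    "General health problems": 7532,
    "Head injuries": 618,
    "Musculoskeletal injuries": 15500,
}


def percentage_allocation(counts):
    """Return each group's share of the total, in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


shares = percentage_allocation(health_problems)
for group, pct in shares.items():
    print(f"{group}: {pct:.2f}%")
```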

Based on previous research [109], detailed musculoskeletal injuries span multiple major
anatomical regions (e.g., ankle, abdomen, and upper and lower limbs). Injuries were
further categorized into the anatomical sub-areas listed in Table 4.18, which cover the
trunk (including the chest, abdomen, and thoracolumbar area), the upper extremity
(encompassing the shoulder to fingers), and the lower extremity (covering the hips to
toes) [219].

Table 4.18: Number of anatomical sub-areas of musculoskeletal injuries and percentage
allocation split.

Anatomical sub-areas # Anatomical sub-areas % of Anatomical sub-areas


Knee area 3664 23.64%
Ankle area 2939 18.96%
Thoracolumbar area 1635 10.55%
Thigh area 1253 8.08%
Foot area 1137 7.34%
Hand, thumb, and fingers area 743 4.79%
Shoulder area 655 4.23%
Hip area 635 4.10%

Abdominal area 603 3.89%
Calf area 516 3.33%
Heel area 385 2.48%
Wrist area 361 2.33%
Toes area 268 1.73%
Elbow area 227 1.46%
Neck 168 1.08%
Shin area 114 0.74%
Chest area 98 0.63%
Pelvic area 39 0.25%
Fibular area 33 0.21%
Upper arm and forearm area 27 0.17%
Total 15,500 100.00%

Basketball places significant demands on the musculoskeletal system, especially the
lower extremities. The data suggest that preventive measures, training modifications,
and targeted therapies could be especially beneficial for the knee, ankle, and thigh
regions. Furthermore, given the high incidence of musculoskeletal injuries, strength and
conditioning programs that focus on the entire musculoskeletal system, as well as
proprioception and balance training, may help in reducing the prevalence of these
injuries among basketball players.

The knee area stands out as the most injury-prone sub-area, accounting for 23.64% of
musculoskeletal injuries. The ankle area (18.96%), thoracolumbar area (10.55%), and
thigh area (8.08%) are also significant. In contrast, areas such as the toes (1.73%),
fibular area (0.21%), and upper arm and forearm (0.17%) had lower incidences of
injuries (RQ16).

The lower extremities (knee, ankle, and thigh) are the areas most prone to injury,
likely due to the physical demands of basketball. These areas are involved in
weight-bearing, balance, and propulsion, making them susceptible to injuries.
Musculoskeletal injuries are the predominant health problem, reinforcing the physical
nature of the sport. It would be worth focusing on strengthening, conditioning, and
preventive measures for these areas to reduce injury risks. Although the neck area
accounts for a small percentage of injuries (1.08%), any injury in this region can be
severe, so preventive measures and safety precautions are crucial. The data provide
valuable insights for healthcare professionals, trainers, and athletes to prioritize
preventive strategies and interventions to minimize injuries in the most affected areas.
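
The dominance of the lower extremities can be quantified by aggregating the counts of Table 4.18 into the three regions described earlier in this section (trunk, upper extremity, lower extremity). The sketch below is illustrative only: the mapping of sub-areas to regions is an assumption based on that description, and the neck and pelvic areas, which the description does not place explicitly, are left in a separate "Unassigned" group.

```python
# Injury counts copied from Table 4.18.
SUB_AREA_COUNTS = {
    "Knee area": 3664, "Ankle area": 2939, "Thoracolumbar area": 1635,
    "Thigh area": 1253, "Foot area": 1137, "Hand, thumb, and fingers area": 743,
    "Shoulder area": 655, "Hip area": 635, "Abdominal area": 603,
    "Calf area": 516, "Heel area": 385, "Wrist area": 361, "Toes area": 268,
    "Elbow area": 227, "Neck": 168, "Shin area": 114, "Chest area": 98,
    "Pelvic area": 39, "Fibular area": 33, "Upper arm and forearm area": 27,
}

# Assumed grouping (hips to toes, shoulder to fingers, trunk); not taken
# verbatim from the thesis.
REGIONS = {
    "Lower extremity": [
        "Hip area", "Thigh area", "Knee area", "Calf area", "Shin area",
        "Fibular area", "Ankle area", "Heel area", "Foot area", "Toes area",
    ],
    "Upper extremity": [
        "Shoulder area", "Upper arm and forearm area", "Elbow area",
        "Wrist area", "Hand, thumb, and fingers area",
    ],
    "Trunk": ["Chest area", "Abdominal area", "Thoracolumbar area"],
    "Unassigned": ["Neck", "Pelvic area"],
}


def region_totals(counts, regions):
    """Return {region: (injury count, share of all injuries in percent)}."""
    total = sum(counts.values())
    result = {}
    for region, areas in regions.items():
        n = sum(counts[area] for area in areas)
        result[region] = (n, round(100 * n / total, 2))
    return result


totals = region_totals(SUB_AREA_COUNTS, REGIONS)
for region, (n, pct) in totals.items():
    print(f"{region}: {n} injuries ({pct}%)")
```

Under this assumed grouping, the lower extremity accounts for roughly seven in ten musculoskeletal injuries in the dataset.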

Figure 4.3 provides data on different anatomical sub-areas, covering metrics such as
statistical significance (t-stat and p-value), average salary percentage difference post
injury, average recovery time, number of players, number of injuries, positive and
negative salary changes, average salary change, and correlation with salary change
(RQ17).

Figure 4.3: Musculoskeletal anatomical sub-areas’ statistical significance and salary
correlation.

Most of the anatomical sub-areas show “Significant” results based on their p-values,
indicating that injuries in these areas have a significant impact on the metrics under
consideration. However, areas such as the thigh, thoracolumbar area, toes, upper arm
and forearm area, and wrist area are deemed “Not Significant”.

Injuries in the chest area result in the largest average salary reduction (−25.6%).
Surprisingly, some areas, such as the pelvic area and upper arm and forearm area show
a positive average salary change after injury (+13.9% and +12.6%, respectively). This
might be due to contracts, insurance, or other external factors not detailed in the table.

Ankle area injuries have the shortest average recovery time (41.9 days), while injuries
in the hand, thumb, and fingers area have the longest average recovery time (70.5 days).
In terms of actual salary amounts, the upper arm and forearm area sees the highest
increase (USD 1,077,624.8), whereas the chest area observes the most substantial
decrease (USD −155,840.0).

Most of the anatomical sub-areas exhibit a negative correlation with salary change post
injury. The upper arm and forearm area had the highest positive correlation (+12.6%),
while the abdominal area had the most negative correlation (−8.2%).

The Tornado diagram in Figure 4.4 visually represents the variance in basketball
performance metrics post injury. Each row signifies a distinct performance metric. The
left side shows the percentages corresponding to “Lesser Post-Injury” performances,
while the right side displays the percentages for “Greater Post-Injury” performances.
Metrics such as possession, defensive and offensive ratings, and usage percentage
showed the highest variance, indicating changes in players’ performance post injury.

Figure 4.4 is a comparative bar chart with two sets of bars, shown in purple and
green. The purple bars represent the percentage of players whose value for a given
performance metric is lower post injury (Lesser Post injury), while the green bars
represent the percentage whose value is greater post injury (Greater Post injury).

The metrics POSS_ADVANCED, DEF_RATING_ADVANCED, OFF_RATING_ADVANCED,
USG_PCT_ADVANCED, and E_USG_PCT_USAGE have the highest variance post injury, with
nearly 10% of players showing either lower or greater performances in these areas. The
basketball performance analytics terminologies are explained in a previous study [150].
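
The two sides of a tornado diagram of this kind can be derived from paired pre- and post-injury values per player. The sketch below is a simplified illustration with hypothetical data: it uses a plain comparison of the two values, whereas the thesis counts are based on significance and effect size, and the function name `post_injury_shares` is an assumption.

```python
def post_injury_shares(pre, post):
    """Return (% of players lower post injury, % of players higher post injury)."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and of equal length")
    n = len(pre)
    lesser = sum(1 for before, after in zip(pre, post) if after < before)
    greater = sum(1 for before, after in zip(pre, post) if after > before)
    return 100 * lesser / n, 100 * greater / n


# Hypothetical usage percentages for five players, before and after injury.
pre_usg = [24.0, 18.5, 30.1, 21.0, 15.2]
post_usg = [22.5, 18.5, 31.0, 19.8, 15.2]

lesser_pct, greater_pct = post_injury_shares(pre_usg, post_usg)
print(f"Lesser post injury: {lesser_pct:.1f}%, greater post injury: {greater_pct:.1f}%")
```

Players whose value is unchanged contribute to neither bar, which is why the two percentages need not sum to 100.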

Figure 4.4: Tornado diagram that analyzes the percentage variance in basketball
performance analytics in lesser/greater post-injury cases.

This study reveals that musculoskeletal injuries are prevalent in basketball, constituting
65.54% of all health problems (Table 4.17). The major anatomical regions affected
include the knee, ankle, and thigh. Knee injuries are the most common, accounting for
23.64% of musculoskeletal injuries (Table 4.18). The data also show significant impacts
of injuries on players’ salaries (Figure 4.3), with the largest average reduction
observed for injuries in the chest area (−25.6%) and some areas, such as the pelvic
area and the upper arm and forearm area, showing a positive average salary change post
injury. Recovery times varied across injuries, with ankle injuries having the shortest
average recovery time (41.9 days) (RQ18).

5. Discussion & Implications
The "Discussion and Implications" section of the thesis offers an insightful analysis and
interpretation of the research findings, focusing on basketball performance evaluation
and the wide-ranging impact of injuries and analytics. This section starts with a detailed
examination of crucial basketball analytics, discussing their role in categorizing player
and team characteristics and guiding strategic decisions in the sport. It presents a
breakdown of various performance metrics, such as usage percentage, net rating, and
win shares, and discusses their effectiveness in capturing the offensive and defensive
aspects of the game.

The discussion extends to defensive criteria, highlighting how metrics such as steal
and block percentages are not only about preventing scores but also about creating
opportunities for fast break offenses. It evaluates overall basketball analytics,
emphasizing the importance of turnover rates and assist-to-turnover ratios in
understanding player efficiency and team performance.

A large portion of the section is dedicated to a case study analysis of top NBA players
from the 2018-19 season, illustrating the application of these analytics in real-world
scenarios. The study highlights how advanced performance metrics can provide a
comprehensive evaluation of players such as Giannis Antetokounmpo, James Harden,
and others, offering insights into their contributions to the game beyond traditional box
score statistics.

The discussion also delves into the implications of injuries in the NBA, exploring their
financial and performance-related impacts. It presents a nuanced view of how
injuries, particularly musculoskeletal injuries, are a significant concern for players and
teams, affecting not only the individual player's performance but also the team's strategic
decisions and financial health.

Additionally, this section discusses the innovative use of Data Mining algorithms and
techniques in sports analytics, presenting a comprehensive review of various methods
and their applications. It examines how these techniques can optimize performance and
forecast future trends with greater accuracy, highlighting how deeply this emerging
scientific field has penetrated the sports industry.

In summary, the "Discussion and Implications" section provides a deep discussion of
the complexities of basketball performance analytics, the multifaceted impact of
injuries, and the advanced methods used to analyse and interpret sports data. It offers
critical insights and implications for players, coaches, technical staff, and the broader
sports analytics community, encouraging a more informed and strategic approach to
understanding and enhancing the game of basketball.

5.1 Discussion of Basketball Performance Evaluation

Table 0.1, Table 0.2, Table 0.3 and Table 0.4 in the Appendix section illustrate important
basketball analytics that dominate the game. They were segmented in this thesis with
the objective of categorizing player or team characteristics in basketball and
providing guidance on each subject area.

Table 0.1 shows the crucial performance aspects of basketball analytics. USG%
represents the percentage of team plays used by a player while he is on the court.
NetRtg is the number of points scored per 100 possessions minus the number of points
allowed per 100 possessions while the player is on the court. Win Shares (WS) is a
five-part formula that examines offensive play in a very precise way but does not
explain defensive play according to all important criteria. In terms of shooting
efficiency, the effective field goal percentage (eFG%) and true shooting percentage
(TS%) were calculated. Both explain shooting ability better than the plain field goal
percentage. The weighted formula for eFG% gives a made three-point field goal 1.5 times
the weight of a made two-point field goal, while the weighted formula for TS% considers
all shooting categories, including free throws. REB% calculates the percentage of
available rebounds that a player takes when he is on court.
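
The shooting-efficiency and rating metrics above reduce to short formulas. The following sketch uses the commonly published definitions (the 0.44 free-throw factor is the one referred to in Table 5.1); the function and parameter names are illustrative assumptions, and the exact variants in Table 0.1 may differ in detail.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """eFG% = (FGM + 0.5 * 3PM) / FGA; a made three counts 1.5 times a made two."""
    return (fgm + 0.5 * fg3m) / fga


def true_shooting_pct(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)); includes free throws."""
    return pts / (2 * (fga + 0.44 * fta))


def net_rating(points_for, points_against, possessions):
    """NetRtg = OffRtg - DefRtg, both expressed per 100 possessions."""
    off_rtg = 100 * points_for / possessions
    def_rtg = 100 * points_against / possessions
    return off_rtg - def_rtg


# Hypothetical single-game line: 10 of 20 from the field, 4 of them threes,
# 10 free throw attempts, 30 points.
print(f"eFG%: {effective_fg_pct(10, 4, 20):.3f}")
print(f"TS%:  {true_shooting_pct(30, 20, 10):.3f}")
print(f"NetRtg: {net_rating(110, 104, 100):+.1f}")
```

The example player shoots 50% from the field, but eFG% credits the four made threes and rises to 60%, which is the adjustment the text describes.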

The defensive criteria (Table 0.2) include steals and blocks as basic metrics. DefRtg is
an advanced metric that expresses the points allowed per 100 possessions while a player
or team is on the court. Deflections and Def Loose Balls Recovered are important hustle
metrics that greatly influence defensive play. In addition, the influence on opponent
shots is crucial for adjusting tactics. The aforementioned examples are known as
“truly big plays”: they are the actions that can inspire or ignite a team, providing an
extra boost that changes the momentum and final result of the game. STL% and BLK%
represent the ability of a player to steal or block, respectively, relative to the
opponent's opportunities while he is on the court. A successful steal or block not only
prevents the opponent from scoring but also provides the opportunity for a fast break
offense. BLK% relates a player's blocks to the opponent's field goal attempts (FGAs)
to explain blocking ability.
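
As an illustration of how STL% and BLK% normalize raw steals and blocks, the sketch below follows the widely used public definitions, in which steals are related to opponent possessions and blocks to opponent two-point attempts, both scaled by the share of team minutes the player was on the court. These formulas are an assumption here; the definitions in Table 0.2 may differ, and all names are illustrative.

```python
def steal_pct(stl, minutes, team_minutes, opp_possessions):
    """Estimated % of opponent possessions ending in a steal by the player."""
    return 100 * (stl * (team_minutes / 5)) / (minutes * opp_possessions)


def block_pct(blk, minutes, team_minutes, opp_fga, opp_fg3a):
    """Estimated % of opponent two-point attempts blocked by the player."""
    return 100 * (blk * (team_minutes / 5)) / (minutes * (opp_fga - opp_fg3a))


# Hypothetical game: 36 minutes played out of 240 team minutes.
stl_rate = steal_pct(stl=2, minutes=36, team_minutes=240, opp_possessions=100)
blk_rate = block_pct(blk=3, minutes=36, team_minutes=240, opp_fga=85, opp_fg3a=25)
print(f"STL%: {stl_rate:.2f}, BLK%: {blk_rate:.2f}")
```

Dividing team minutes by five converts them into the minutes available to one court position, so a player is only charged with the opponent opportunities that occurred while he was playing.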

For offense (Table 0.3), the basic analysis includes points, rebounds, assists and
shooting percentages at each distance. The Usage (USG) rate captures the possessions
finished by a player, but it does not distinguish players who were assisted on most of
their field goals from those who created their own shots. Additionally, a good screen
for a teammate can lead to an easy basket (Screen Assists metric and PTS). An offensive
loose ball recovery or an offensive rebound is important because it can give the team
an extra possession. AST% is important because it captures the share of teammates'
field goals that a player assisted while on the court, that is, playmaking volume
adjusted for the time played. Points Per Possession (PPP) explains the scoring
efficiency when a player has the ball. The number of three-point field goal (3P)
attempts has increased dramatically in the last decade due to better defence, tactics,
and athletic abilities of the players. 3P capability can be recognized as a game
changer because this offensive skill can also be trained and evaluated for the big men
on a team. Benchmarking FGA/Poss, OR/Poss and TOV/Poss against each other helps teams
target more offensive rebounds, increase FGA and offset turnovers. PTS/Poss has an
elasticity point (to borrow a term from economics), a critical point between tempo and
scoring at which satisfactory results are achieved on average each time a player
touches the ball on offense.

Table 0.4 shows the overall categorization of basketball analytics. TOV% is a rate
metric that focuses on the percentage of a player's possessions that end in a turnover
while he is on the court. The assist-to-turnover ratio (AST/TOV) measures efficiency
well since it relates assists to turnovers on a per-possession basis and is more
representative than the averages of assists and/or turnovers. AST/Poss and AST/FGM are
important metrics because they show how often a possession produces an assist and what
share of made field goals were assisted. STL/DP and OR/Poss should have higher values
than TOV/Poss for better performance results. Teams that play at a high tempo might
have more TOV/G but fewer TOV/Poss than other teams.
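
The difference between per-game and per-possession rates can be made concrete with two hypothetical teams. The sketch below estimates possessions with a common box-score approximation (FGA + 0.44 × FTA − ORB + TOV); this estimator is an assumption, since the thesis notes elsewhere that different possession estimates give different results, and the team figures are invented for illustration.

```python
def estimate_possessions(fga, fta, orb, tov):
    """Common box-score estimate of possessions (an assumed approximation)."""
    return fga + 0.44 * fta - orb + tov


def turnovers_per_possession(fga, fta, orb, tov):
    """TOV/Poss: share of estimated possessions that end in a turnover."""
    return tov / estimate_possessions(fga, fta, orb, tov)


# Hypothetical per-game averages for a fast-paced and a slow-paced team.
fast_team = {"fga": 95, "fta": 25, "orb": 11, "tov": 15}
slow_team = {"fga": 82, "fta": 20, "orb": 9, "tov": 13}

fast_rate = turnovers_per_possession(**fast_team)
slow_rate = turnovers_per_possession(**slow_team)

print(f"Fast team: {fast_team['tov']} TOV/G, {fast_rate:.4f} TOV/Poss")
print(f"Slow team: {slow_team['tov']} TOV/G, {slow_rate:.4f} TOV/Poss")
```

In this example the fast-paced team commits more turnovers per game yet fewer per possession, which is the situation the paragraph above describes.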

Table 5.1 presents the comparison matrix of advanced basketball analytics. NetRtg is
the difference between the offensive and defensive ratings (OffRtg and DefRtg). Usually,
the team or player with the higher value is the winner, but if a sports analyst uses only
this criterion, the evaluation of performance is not robust.

Table 5.1: Comparison matrix for basketball performance analytics.

Rating type: +/- or BPM or PM
Type: Linear regression model
Advantages:
- Can be used for either a single game or a season
- Shows the impact of a player while he is on the court
- Can be tied with VORP and USG% for better player performance estimation
Disadvantages:
- Does not show the sole impact of a player on the scoring
- Poor handling of offense outliers
- Poor handling of blocked shots
- Not a good defense rating
- Overvalues players with high values of USG and REB

Rating type: Adj. +/- or ABPM or APM
Type: Multiple regression model
Advantages:
- Shows the efficiency for both opponents and teammates on the court
Disadvantages:
- Does not show the specific player ability as an individual apart from team impact. Some coaches also select some player duos or trios frequently
- High variance even with the use of regression with different roles, coaching tactics, teammates, and matchups
- The increment of data does not decrease the statistical significance
- Does not have extra information for player tracking or play-by-play stats
- Bloated variances due to a noninvertible distribution of players

Rating type: Real Plus Minus (Real +/- or RPM or RAPM)
Type: Linear regression model with a weight placed on the square of the coefficients
Advantages:
- Based on the development of APM and uses aging curves and Bayesian priors in more detail
- One of the most important basketball indicators until now
- It is a simple and understandable analytic
Disadvantages:
- Based only on the scoring factor and the impact of each player compared to a league average of players per 100 possessions

Rating type: EFF
Type: Linear formula
Advantages:
- Can be used for either a single game or a season
Disadvantages:
- Focuses on box score data of a game
- Does not have any specific weight for a statistic category

Rating type: ELO Rating
Type: Linear regression model with weights
Advantages:
- Each team is correlated with a ranking based on expected wins
- It is an effective way to rate teams and use the rating for future projections
Disadvantages:
- The ELO calculation focuses only on team ratings and cannot count player ratings

Rating type: FP
Type: Linear formula with specific weights
Advantages:
- It started as a fantasy sport analytic for performance evaluation
- It is ideal for competitions with a long regular season because the statistics can be normalized
Disadvantages:
- The weights that are set can sometimes give an advantage in some categories and a disadvantage in other statistics

Rating type: GmSc
Type: Linear formula with specific weights
Advantages:
- All statistics are weighted differently based on the frequency with which they occur
- Positive and negative coefficients according to the contribution
Disadvantages:
- Does not apply to seasonal statistics
- Does not show the specific player ability as an individual apart from team impact
- In shooting categories the player should have 57% to break even
- Focuses on player efficiency

Rating type: NetRtg
Type: Linear formula with specific weights
Advantages:
- A normalized metric of defensive and offensive performance over 100 possessions that can account for the pace of teams or players
Disadvantages:
- The estimation of the number of possessions can give different results or forecasts. Sometimes there is a bias in the presented results, and this can lead to undesirable comparisons of players or teams
- Overestimation of possessions can lead to underestimation of ratings

Rating type: PACE
Type: Linear formula with specific weights
Advantages:
- Shows how controlled a team or a player is, since a faster pace can give more possessions/opportunities through a game
Disadvantages:
- Does not pay attention to multivariate factors for better player or team performance estimation

Rating type: PER (Player Efficiency Rating)
Type: Multiple regression model
Advantages:
- Performance rating calculated from positive and negative accomplishments per minute
- Accounts for the team's pace compared to the league average
Disadvantages:
- Does not count any other parameter other than steals and blocks
- Rewards inefficient shooting. A two-point field goal made is worth 1.65 points and a three-point field goal made is worth 2.65, while a miss costs 0.72 points. Hence, the shooting value breaks even at 30% for two-point shots and 21% for three-point shots
- A player who shoots more with the aforementioned results can gain a better PER
- Illogical phenomenon of extremely high PERs in extremely limited minutes

Rating type: PIE
Type: Linear formula with specific weights
Advantages:
- PIE is quite similar to the PER logic, calculating the per-minute offense production and defense categories
Disadvantages:
- It captures many parameters but cannot explain in depth how well a player or a team performed
- There is arbitrage in PIR weights calculations

Rating type: PIR
Type: Linear formula with specific weights
Advantages:
- PER representing the performance with a focus on per-minute contribution and pace adjustment. Hence, normalized performances can easily be compared between teams and players
Disadvantages:
- The logic is similar to PIE, with different weights under consideration
- PER fails to identify the most accurate value because the weights are arbitrarily calculated

Rating type: Pythagorean Win Percentage
Type: Linear formula
Advantages:
- Estimation of Win Percentage based on Points For and Points Against
Disadvantages:
- Simple estimation without any weight factor or regression model

Rating type: Tendex
Type: Linear formula with specific weights
Advantages:
- A weighted basketball analytic that counts the performance. It gives extra positive value mainly to assists and steals and negative value to missed shots or turnovers
Disadvantages:
- Like all weighted formulas, it focuses on specific criteria of the game. This approach can boost some teams or players and underestimate others

Rating type: TS%
Type: Linear formula with specific weights
Advantages:
- Measures the equivalent of FTA with FGA
- An adjustment of the factor 0.44 based on season statistics could give precise results
Disadvantages:
- TS% is biased in terms of FTs and underestimates the number of scored points per possession
- A proposed formula for precise calculations could be PTS/POSS with an FG/FTA

Rating type: eFG%
Type: Linear formula with specific weights
Advantages:
- Measures the impact of shooting efficiency with the added value of three points
Disadvantages:
- Measures only the shooting performance of players without adding other important factors

Rating type: USG%
Type: Linear formula with specific weights
Advantages:
- Interprets the player usage while he was on the floor
Disadvantages:
- A player who likes to pass more than to shoot does not necessarily have a lower impact on the game

Rating type: VORP
Type: Linear regression model
Advantages:
- Based on BPM as an enhanced version, converted through the calculation into an estimate of overall contribution
Disadvantages:
- Correlated with the replacement player factor

Rating type: PIPM
Type: Multiple regression model
Advantages:
- It is a differentiation of Plus Minus metrics that measures the influence of possessions
Disadvantages:
- Extra or fewer possessions through the games can impact the game result

Rating type: WS or WS/48
Type: Multiple regression model
Advantages:
- The comprehensive evaluation of the offensive play of a single player
- Better evaluation of a single player due to the division by the minutes played
- It offers a model of marginal offense per marginal points per win as a contribution result for the victory
- Based on expected Pythagorean Win Percentage rather than actual wins
Disadvantages:
- Being part of a good team implies a better score in WS
- Not an overall good evaluation calculation
- Players who have a large amount of time on the court receive a better evaluation

Rating type: EPV (Expected Possession Value)
Type: Forecasting methods based on regression models
Advantages:
- It is a forecasting analytic in a continuous manner that helps in decision making
- It is a framework for basketball analytics that can overcome conventional box-score metrics
Disadvantages:
- Focuses on microanalytics for sports, related to how many possessions a team took

Rating type: Four Factors
Type: Linear formula with specific weights
Advantages:
- Shows the importance of each factor and the way in which each player or team acts
- All these metrics are associated with team success
- Based on studies, the accuracy level is 94% on average
Disadvantages:
- These basketball analytics do not capture the winning tendencies of players and teams
- Simple logic of major factors that impact the game: score when possible and take more possessions when it is not

Rating type: CARMELO
Type: Forecasting methods based on regression models
Advantages:
- Makes projections from past and current data based on RPM and BPM, and for defense by adding the DRAYMOND metric
- It is a blend of the latest analytics used for team and player performance
Disadvantages:
- The CARMELO methodology cannot be replicated and cannot validate the accurate results of BPM and WS
- Does not account for factors such as injuries, psychology, and work ethic

Rating type: WAR (Wins Above Replacement)
Type: Linear formula with specific weights
Advantages:
- Players can be rated on a per-minute basis accounting for winning value
- The replacement level metric evaluates the performance based on minutes played
Disadvantages:
- Only box-score evaluation (does not take into account other contributions)
- Assumptions in replacement level, efficiency, USG

Rating type: VA
Type: Linear formula with specific weights
Advantages:
- Normalizes the sum of positive and negative contributions by introducing the factor of RPL and focusing on the PER analytic factor
Disadvantages:
- The estimate of the 'replacement player' influences the result
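
The break-even shooting percentages quoted for PER in Table 5.1 follow from a one-line calculation: a shot adds value on average as long as the expected gain from a make outweighs the expected cost of a miss. The sketch below reproduces that arithmetic from the values in the table (1.65 and 2.65 points for a made two- and three-point field goal, 0.72 points for a miss); the function name `break_even_pct` is an illustrative assumption.

```python
def break_even_pct(make_value, miss_cost):
    """Shooting percentage p at which p * make_value == (1 - p) * miss_cost."""
    return miss_cost / (make_value + miss_cost)


# Values quoted in the PER row of Table 5.1.
MISS_COST = 0.72
two_point = break_even_pct(make_value=1.65, miss_cost=MISS_COST)
three_point = break_even_pct(make_value=2.65, miss_cost=MISS_COST)

print(f"Two-point break-even:   {100 * two_point:.1f}%")
print(f"Three-point break-even: {100 * three_point:.1f}%")
```

Because both thresholds lie far below typical NBA shooting percentages, almost any additional shot attempt raises PER, which is the "rewarding inefficient shooting" criticism listed in the table.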

Table 5.2 presents a classification matrix of the sports analytics literature that
used DM algorithms and techniques, organized by purpose. The literature confirms that
Sports Analytics is an emerging scientific field that is penetrating the sports
industry ever more deeply by using DS, ML and DM techniques to optimize performance
and forecast with greater accuracy.

Table 5.2: Data Mining Algorithms and Techniques Used in Sports Analytics.

Data mining method: K-Means clustering [96]
Type of method used: N nearest trajectory embedding
Purpose: Suitable for large datasets and not as sensitive to outliers as other clustering techniques [57]
Accuracy: Captures a diverse and comprehensive set of player movements

Data mining method: Archetype Analysis (AA) and Archetypoid Analysis (ADA) [63]
Type of method used:
- FADA for sparse time series data
- ADA with h-plot for dissimilar data
Purpose: Obtain outstanding players (positively and negatively). Archetypes are data-driven extreme points
Accuracy:
- ADA can be used for better performance understanding
- AA shows the extreme cases through data

Data mining method: Functional data analysis (FDA) [63]
Type of method used:
- Simple linear models
- ANOVA
- Generalized linear models
- PCA (Principal Component Analysis)
- Clustering
- Classification
Purpose: Modern branch of statistics that analyses data that are drawn from continuous underlying processes
Accuracy: FDA results are consistent with domain experts of sports analytics

Data mining method: Neural Networks (NN) and Recurrent Neural Network (RNN) [83]
Type of method used: (not specified)
Purpose: Variant of neural networks that can deal with sequential data of variable length. It was used for strategy classification in basketball through data from SportVU cameras
Accuracy:
- NN achieved 54.7% accuracy
- RNN scored 65.6% accuracy after better understanding of the data

Data mining method: Latent Dirichlet Allocation (LDA) [220]
Type of method used: LDA is a latent factor (similar to components analysis)
Purpose: Organize offensive structure in possessions of basketball
Accuracy: Finds repeated patterns in the offensive structure of basketball teams

Data mining method: Randomization Inference for Leader Effects (RIFLE) [221]
Type of method used: (not specified)
Purpose: How much basketball coaches influence wins
Accuracy: Additional Monte Carlo simulations that there is no coach effect in a team win. 20% or 30% influence of teams’ success

Data mining method: Regularization – Bayesian [192]
Type of method used: Ridge regression (Tikhonov regularization)
Purpose: Cross-validation (CV) technique. CV was used to determine the optimal limit of minutes played for the standard APM linear regression technique
Accuracy: Reasonable ranges with the proper parameters. Due to overfitting, the model's ability to forecast performance degraded

Data mining method: Deep Learning [96]
Type of method used:
- K-means clustering with a descriptive label in each cluster to interpret large datasets, with no sensitivity to outliers
Purpose: The comparison of patterns for individual player movements on offense strategy
Accuracy: With a similar-positions finder, data analysts can easily find the proper positions from different seasons. The results showed accuracy from 50% to 75% between the clusters

Data mining method: Markov Modelling [222], [112]
Type of method used:
- Markov model used for expected point calculation
- Entropy is used to quantify the unpredictability
- Gibbs sampler (Markov chain Monte Carlo, MCMC)
Purpose: The analysis of ball movement and effective unpredictability in basketball offense
Accuracy:
- The complex correlation of ball movement is shown to be significantly important by the results and verified by the game theoretic tactics
- Use of Gibbs sampler to predict the full posterior distributions of unknown parameters

Data mining method: Markov Model transition to Poisson point processes [223]
Type of method used: Modelled two-dimensional data
Purpose: With this transition, the model was extended from spatial statistics into flexible nonparametric methods, which allow complicated patterns
Accuracy: In most sports the data are two-dimensional, so in general the assumption will not be violated most of the time

Data mining method: Network Analysis (NA), Neural Networks (NN) and Bootstrapping technique [112]
Type of method used:
- Used the Latent Pathway Identification Analysis (LPIA) algorithm
- Used eigenvector centrality to measure the impact (centrality or importance) of a node in the network
Purpose:
- NA used to find the optimal path of game plays that generate the most points
- NN used to forecast the results of NBA games
- To count the statistical significance of players' performance central scores
Accuracy: For better accuracy, adjusted p values were used to classify the outliers (over/under performers) with a threshold of 10% to avoid bias. Some interesting results are the low importance scores correlated with small p values

5.2 Basketball Performance Evaluation - Case Study

In basketball, as in sports in general, many analytics can inform important decisions
during or after the game and serve as lessons learned for improvement. To understand
in more depth the basketball analytics referred to in Table 0.1, Table 0.2, Table 0.3
and Table 0.4, a comprehensive analysis is provided of case studies of the top 5 NBA
basketball players in the 2018-19 season. According to the most notable analytics,
these players are Giannis Antetokounmpo (MVP of the year), James Harden (top scorer of
the year), Paul George (key player in many categories), Stephen Curry (most efficient
shooter) and Rudy Gobert (Defensive Player of the Year) [119]. This research shows the
most remarkable achievements of each player in the season, with the purpose of
benchmarking them across the most significant basketball analytics (RQ3-4).

▪ Giannis Antetokounmpo: He scored highly in the majority of basketball analytics
and led in WS/48 (0.292), PIPM (7.8), PIE (21.8), EFF (35.3), PER (30.9), PACE
(105.27), AST ratio (19) and PFD (7.7). Based on his high performance in these
categories, he was awarded the MVP title for 2018-19.

▪ James Harden: He excelled in scoring and performance analytics such as PTS
(36.1), AST (7.5), AST% (39.4), td3 (7), WS (15.2), BPM (11.7), GmSc (16.9), PRA%
(64.36), FP (58.7), VORP (9.9) and USG% (39.6). He was the main competitor for the
MVP title.

▪ Paul George: He was a key player for his team, leading the NBA in performance
analytics categories such as Deflections (3.8), Loose Balls Recovered (2.1), STL%,
RPM (7.63) and WINS (19.9).

▪ Stephen Curry: He was the offensive focal point of his team thanks to his
efficient shooting. He was notable in analytics categories such as AST/TO (1.88),
Wins Added (18.8), NetRtg (13.7), TOV% (11.6), eFG% (60.4), TS% (64.1) and PIPM (7.4).

▪ Rudy Gobert: He was the Defensive Player of the Year and showed great hustle
against every opponent. He led in REB% (19.4), Screen Assists (6), Screen Assists PTS
(13.8), dd2 (66), BLK (2.3), TOV (1.6), DWS (5.7), DBPM (5.1), DRPM (4.4), eFG%
(66.9) and TS% (68.2).

Following this analysis, a normalized radial chart (Figure 5.1) plots these performance
analytics for the top 5 players. In addition, Table 1 indicates the ranges of average
values for each metric to benchmark high and low values.

To conclude, the player nominations for the 2018-19 season were validated either in
terms of overall basketball analytics or in specific categories, based on the analysis
above.

[Figure 5.1: two radial charts comparing Giannis Antetokounmpo (PF, MIL), James Harden
(PG, HOU), Paul George (SF, OKC), Stephen Curry (PG, GSW) and Rudy Gobert (C, UTA).
The first chart has radial ticks at 5%, 10%, 20%, 40% and 80%; the second has radial
ticks from 0% to 100% in steps of 10%. Axes: %PIPM, %WinAdded, %PACE, %PRA, %Deflect,
%LoosBallRec, %ScreenAssPTS, %PER, %PIE, %NetRtg, %FP, %EFF, %VORP, %BPM, %WS/48, %RPM,
%WINS, %AST/TO, %eFG, %TS, %USG, %ELO.]

Figure 5.1: Radial charts of percentage values and logarithmic normalization

5.3 Aggregated Performance Indicator - Forecasting Scenario

In recent years, the research community and betting companies have focused on
forecasting team wins rather than on player impact on the game, as well as on
identifying the attributes that matter most for forecasting purposes.

The forecasting scenario covers three NBA seasons from 2017 to 2020 (2017-18, 2018-19
and 2019-20). The data were retrieved from various sources ([119], [128], [129]) and
aggregated into a single dataset. Data cleansing was then performed to obtain data
ready for analysis and forecasting. Each season (82 basketball games = Q1-Q4) was split
into four groups. The first group represents the first quarter of the season (~20
games = Q1), the second group relates to basketball analytics for half the season (~40
games = Q1-Q2), the third group relates to statistics from approximately 60 games
(Q1-Q3), and the fourth group covers the full season (Q1-Q4). Based on these
evaluations, the analysis provided predictions for the MVP and Top Defender nominee.
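
The cumulative grouping described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `cumulative_windows` and the integer-division boundaries are assumptions, since the text only gives approximate group sizes (~20, ~40, ~60 and 82 games).

```python
from typing import Dict


def cumulative_windows(n_games: int = 82, n_parts: int = 4) -> Dict[str, range]:
    """Split a season into cumulative windows Q1, Q1-Q2, ..., Q1-Qn.

    Boundaries use integer division (an assumption); the thesis only
    states approximate sizes for each group.
    """
    windows: Dict[str, range] = {}
    for k in range(1, n_parts + 1):
        last_game = n_games * k // n_parts  # 1-based index of the last game
        label = "Q1" if k == 1 else f"Q1-Q{k}"
        windows[label] = range(1, last_game + 1)
    return windows


windows = cumulative_windows()
for label, games in windows.items():
    print(label, len(games))
```

Each window starts at game 1, so later windows contain the earlier ones; this matches the Q1, Q1-Q2, Q1-Q3 and Q1-Q4 labels used in the forecast tables.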

Hence, 20 NBA players were selected who participated in at least 30 games per season
and averaged at least 15 minutes of playing time per game over the whole season. An
additional condition was that they achieved nominations in different statistical
categories in the 2017-18 and 2018-19 seasons, as shown in Table 5.3. This table lists
the awards for MVP, Best Defender, Top in Assists, Points and Rebounds and other
important nominations. The final season (2019-20) had not yet been completed at the
time of the analysis; thus, this research proposes a prediction for the MVP and
Defender of the Year. To predict these two awards, two formulas were introduced and
validated (the data were analysed and normalized on a scale of 100) (RQ4):

Table 5.3: MVP, Best Defender, Top Scorer, Top in Assists, Top in Steals, Top in
Rebounds and 3 best teams of the year for two seasons, 2017-18 and 2018-19.

Players 2018-2019 2017-2018


Andre Drummond Top Rebounder Top Rebounder
Anthony Davis Top Blocker - 1st Team
Ben Simmons Rookie of the Year
Damian Lillard 2nd Team 1st Team
Giannis Antetokounmpo MVP - 1st Team 2nd Team
James Harden Top Scorer - 1st Team MVP - Top Scorer - 1st Team
Jimmy Butler 3rd Team
Joel Embiid 2nd Team 2nd Team
Karl-Anthony Towns 3rd Team
Kawhi Leonard 2nd Team GAP YEAR
Kevin Durant 2nd Team 1st Team
Kyrie Irving 2nd Team
LaMarcus Aldridge 2nd Team
LeBron James 3rd Team 1st Team
Luka Doncic Rookie of the Year GAP YEAR
Nikola Jokić 1st Team
Paul George Top Steals - 1st Team 3rd Team
Rudy Gobert Best Defender - 3rd Team Best Defender - 3rd Team
Russell Westbrook Top Assist - 3rd Team Top Assist - 2nd Team
Stephen Curry 1st Team 3rd Team

For the first formula, box score statistics and important rating basketball analytics
were selected as the variables of an Aggregated Performance Indicator (API), defined
as follows:

API = [RPM (+/-) + %PER + %PIE + %4Factors + %NETRTG + %EFF + %PIR + %Tendex + %BPM
+ %PIPM + %GmSc + %FP + %WS/48 + %TeamELO + %EFG% + %TS% + %VORP + %WinsRPM + %WAR
+ %EWA + %Deflections + %PACE + %USG% + %AST/TO + %ScreenAssistsPTS + %PRA + %REB%
+ %LooseBallsRecovered + %PPP + %ASTRatio] / 30
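
Read literally, the API formula is the unweighted mean of 30 terms. The sketch below illustrates that reading under the assumption that every term has already been normalized to a 0-100 scale; the function name `api_score` and the input values are hypothetical and are not taken from the study's dataset.

```python
from typing import List, Mapping

# The 30 terms of the API formula, in the order given in the text.
API_TERMS: List[str] = [
    "RPM", "PER", "PIE", "4Factors", "NETRTG", "EFF", "PIR", "Tendex",
    "BPM", "PIPM", "GmSc", "FP", "WS/48", "TeamELO", "EFG%", "TS%",
    "VORP", "WinsRPM", "WAR", "EWA", "Deflections", "PACE", "USG%",
    "AST/TO", "ScreenAssistsPTS", "PRA", "REB%", "LooseBallsRecovered",
    "PPP", "ASTRatio",
]


def api_score(normalized: Mapping[str, float]) -> float:
    """Sum the 30 normalized terms and divide by 30, as in the formula."""
    missing = [term for term in API_TERMS if term not in normalized]
    if missing:
        raise KeyError(f"missing API terms: {missing}")
    return sum(normalized[term] for term in API_TERMS) / len(API_TERMS)


# Hypothetical player: every term at 50, except PER at 80.
player = {term: 50.0 for term in API_TERMS}
player["PER"] = 80.0
print(api_score(player))  # 51.0
```

Because all terms carry equal weight, a 30-point gain in a single metric moves the indicator by exactly one point.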

Table 5.4: API forecast for 2017-18. Table 5.5: API forecast for 2018-19. Table 5.6: API forecast for 2019-20.
PLAYER Quarter API PLAYER Quarter API PLAYER Quarter API
J. Harden Q1 75.1% A. Davis Q1 76.7% G. Antetokounmpo Q1-Q3 77.8%
J. Harden Q1-Q3 73.1% G. Antetokounmpo Q1 76.6% G. Antetokounmpo Q1-Q2 77.2%
J. Harden Q1-Q2 72.0% G. Antetokounmpo Q1-Q4 76.5% G. Antetokounmpo Q1 77.0%
J. Harden Q1-Q4 71.9% G. Antetokounmpo Q1-Q3 75.2% J. Harden Q1 72.2%
L. James Q1 71.3% A. Davis Q1-Q2 74.8% J. Harden Q1-Q2 71.5%
L. James Q1-Q2 70.5% G. Antetokounmpo Q1-Q2 73.2% L. Doncic Q1 68.1%
S. Curry Q1-Q2 70.0% J. Harden Q1-Q4 72.4% J. Harden Q1-Q3 67.4%
L. James Q1-Q4 68.2% A. Davis Q1-Q3 72.1% L. Doncic Q1-Q2 65.5%
S. Curry Q1 67.5% J. Harden Q1-Q3 71.0% L. Doncic Q1-Q3 61.8%
S. Curry Q1-Q3 67.5% A. Davis Q1-Q4 69.7%
S. Curry Q1-Q4 66.7% J. Harden Q1-Q2 66.7%
L. James Q1-Q3 66.5% J. Harden Q1 64.4%

Table 5.4 and Table 5.5 show the forecasts, according to the API, for the MVP for the
seasons 2017-18 and 2018-19, respectively, which can be verified and cross-referenced
with Table 5.3. Table 5.6 illustrates the forecast for the MVP in 2019-20.

As shown in Table 5.4 (Season 2017-18), James Harden had the best performance during
the whole season (Q1-Q4) according to the API formula, in line with the MVP award. As
shown in Table 5.5 (Season 2018-19), Giannis Antetokounmpo was voted MVP, and his
full-season performance (76.5%) on the API scale verifies this. For Season 2019-20
(in progress at the time of the analysis), the API formula predicts that Giannis
Antetokounmpo has a significant advantage for the MVP award over the second-placed
James Harden (77.8% versus 67.4%, respectively) from Q1 up to Q3 of the regular season.
The second formula focuses on the defensive criteria for the selection of the Defensive
Player of the Year. The selected basketball analytics give the following equation for
the Defensive Performance Indicator (DPI):

DPI = BLK - BLKA + PFD - PF + STL + Deflections + LooseBallsRecovered - TOV
+ ScreenAssistsPTS + AST/TO
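
The DPI is a signed sum: defensive contributions are added, and blocks against, personal fouls and turnovers are subtracted. A minimal sketch of that sum is given below; the name `dpi_raw` and the input values are hypothetical, and the rescaling to a percentage shown in Tables 5.7 to 5.9 is not reproduced because the text does not specify it.

```python
from typing import Dict, Mapping

# Sign of each term in the DPI equation (+1 added, -1 subtracted).
DPI_SIGNS: Dict[str, int] = {
    "BLK": 1, "BLKA": -1, "PFD": 1, "PF": -1, "STL": 1,
    "Deflections": 1, "LooseBallsRecovered": 1, "TOV": -1,
    "ScreenAssistsPTS": 1, "AST/TO": 1,
}


def dpi_raw(stats: Mapping[str, float]) -> float:
    """Signed sum of the ten DPI terms, before any rescaling."""
    return sum(sign * stats[name] for name, sign in DPI_SIGNS.items())


# Hypothetical per-game values, for illustration only.
example = {
    "BLK": 2.0, "BLKA": 1.0, "PFD": 4.0, "PF": 3.0, "STL": 1.0,
    "Deflections": 2.0, "LooseBallsRecovered": 1.0, "TOV": 2.0,
    "ScreenAssistsPTS": 10.0, "AST/TO": 1.0,
}
print(dpi_raw(example))  # 15.0
```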

Table 5.7 and Table 5.8 verify the forecasts against the actual results in Table 5.3,
while Table 5.9 shows the expected results for 2019-20. In Table 5.7, Gobert and Davis
had the same highest scores on the DPI scale, but Gobert was voted Best Defender for
2017-18. As shown in Table 5.8 (Season 2018-19), the DPI formula verified that Gobert
was the Best Defender. In Table 5.9 (2019-20 from Q1 to Q3), the DPI predicts that
Gobert, with 92.8%, is the leading candidate for this award for the regular season
(RQ2).

Table 5.7: DPI forecast for 2017-18. Table 5.8: DPI forecast for 2018-19. Table 5.9: DPI forecast for 2019-20.
PLAYER Quarter DPI PLAYER Quarter DPI PLAYER Quarter DPI
R. Gobert Q1 87.8% A. Davis Q1 99.4% R. Gobert Q1 100%
R. Gobert Q1-Q3 87.3% A. Davis Q1-Q2 94.3% R. Gobert Q1-Q2 99.9%
R. Gobert Q1-Q4 86.8% R. Gobert Q1-Q3 91.1% R. Gobert Q1-Q3 100.0%
A. Davis Q1-Q4 86.8% R. Gobert Q1-Q4 91.0% A. Drummond Q1 92.4%
A. Davis Q1-Q3 85.8% R. Gobert Q1-Q2 89.9% A. Drummond Q1-Q2 91.8%
A. Davis Q1 84.9% A. Davis Q1-Q3 88.7% A. Drummond Q1-Q3 89.5%
R. Gobert Q1-Q2 73.5% R. Gobert Q1 87.0% G. Antetokounmpo Q1-Q3 78.2%
A. Drummond Q1-Q2 73.2% A. Davis Q1-Q4 83.3% G. Antetokounmpo Q1 77.1%
A. Drummond Q1-Q3 73.0% J. Embiid Q1 66.8% G. Antetokounmpo Q1-Q2 75.0%
A. Drummond Q1-Q4 72.1% J. Embiid Q1-Q4 65.0%
A. Drummond Q1 67.5% J. Embiid Q1-Q3 62.9%
A. Davis Q1-Q2 66.7% J. Embiid Q1-Q2 61.5%

Data scraping was performed with Python packages. All the data were retrieved from
various NBA sports sources ([119], [128] and [129]) and aggregated in an Excel file,
followed by data cleansing. In addition, the final data were normalized so that the
suggested formulas (API and DPI) could be applied. The code and the corresponding
Excel file used for data analysis can be found on GitHub at the following link:
https://github.com/vsarlis/nbastats
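
The text does not state which normalization was applied before the formulas were used. As one plausible option, the sketch below rescales a metric to the 0-100 range with min-max scaling; both the choice of method and the name `min_max_100` are assumptions made for illustration.

```python
from typing import List, Sequence


def min_max_100(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale (arbitrary choice of 0).
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


print(min_max_100([10, 15, 20]))  # [0.0, 50.0, 100.0]
```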
Several studies have attempted to correlate players’ salaries with their performance to
predict their “real” salaries, using PIE and WinsRPM (Pythagorean Win Estimation) as
basketball performance analytics in a regression analysis [224]. The MVP award is a
multivariate selection that weighs players’ advanced performance analytics against
their teams’ worth in the league [225]. The Total Performance Index (TPI) was
introduced as a proposed basketball performance metric [226] and compared with PIR
analytics.

Previous research has shown that the TPI yields better results than the PIR (64.6% vs
32.7%), but the TPI focuses on quantitative box score metrics rather than qualitative
ones. Cumulative Individual Accolades (CIA) was suggested as a formula, but it also
failed to predict the MVP accurately between 2015 and 2017-2018, when J. Harden was
proposed for 2nd place [227]. The most accurate forecasting was performed with Back
Propagation (BP) neural networks trained on data from the NBA seasons 2010-11 to
2017-18 in correlation with the PER basketball analytic. To avoid overfitting and
overtraining the model, the authors adopted the L2 regularization method. The
predictions of the BP neural network yield accurate results for MVP awards [228].

In this research, the API was suggested as a formula for MVP prediction based on
qualitative and quantitative advanced analytics accepted by the basketball community.
The API was used as an aggregated model of the selected analytics, incorporating the
statistical data in a way that yielded 100% accuracy for the seasons 2017-18 and
2018-19. For 2019-20, the forecast MVP nominee is G. Antetokounmpo according to the
API, and the forecast Defensive Player of the Year is R. Gobert according to the DPI.

5.4 Health and Injury Analytics Discussion

Digital platforms quantify muscular soreness, nutritional quality, sleep quality and other
Key Performance Indicators (KPIs) for biometrics or physical conditions; these tools
can be used with limited effort and cost. Hence, these platforms are data repositories
that can be used to identify valuable information for technical staff and players through
ML.

Preprocessing, cleansing, and aggregation methods were applied to consolidate sports
data into a single dataset containing advanced basketball analytics and injury
statistics ready for analysis. Based on the results, there is a strong association
between performance analytics and injury attributes.

This research combined data from various sources with different meanings and values;
therefore, feature selection and data transformation were deemed essential in the
process. Consequently, important insights were identified through the analysed data
for team and player analytics (RQ6 and RQ7).

Table 5.10: Top 30 most volatile players in the NBA championship for the period 2010–
2020.
Top30 - Injury volatile | DNP | DTD | Out for season | Out indefinitely | Rest | Total
Derrick Rose 4 19 4 2 3 32
Anthony Davis 7 21 3 3 1 35
Chandler Parsons 8 8 3 1 10 30
Andrew Bogut 9 10 3 1 2 25
Wesley Matthews 2 7 3 1 4 17
Darren Collison 2 4 3 1 2 12
Blake Griffin 2 3 2 5 4 16
Tyreke Evans 15 12 2 4 0 33
Kevin Love 9 19 2 3 4 37
Chris Paul 5 11 2 3 5 26
Nene Hilario 15 15 2 2 0 34
Gordon Hayward 4 5 2 2 0 13
Kobe Bryant 1 3 2 2 4 12
Devin Booker 1 6 2 2 0 11
Rudy Gay 6 12 2 1 2 23
Joel Embiid 1 9 2 1 10 23
Jerryd Bayless 6 10 2 1 0 19
DeMarcus Cousins 4 7 2 1 5 19
Michael Carter-Williams 3 11 2 1 0 17
Dirk Nowitzki 2 5 2 1 7 17
Nikola Pekovic 8 3 2 1 0 14
Mario Chalmers 5 5 2 1 0 13
Andrea Bargnani 7 2 2 1 0 12
Lance Stephenson 2 6 2 1 1 12
Kyrie Irving 4 11 1 6 4 26
Rajon Rondo 10 6 1 4 11 32
Danilo Gallinari 3 13 1 4 1 22
Joakim Noah 9 5 1 3 0 18
Patrick Beverley 2 11 1 3 0 17
John Wall 4 4 1 3 3 15

Table 5.10 and Table 5.11 list the 30 most volatile players and the 30 “strongest”
players, respectively, in the period 2010-20, focusing on the “Out for season” and “Out
indefinitely” criteria. These criteria were assigned criticality weights. The primary
weight was assigned to the “Out for season” attribute, as it takes players longer to
recover and prepare for action. The secondary weight was assigned to the “Out
indefinitely” attribute because the return to the line-up was uncertain. Next comes the
“Did Not Play” (DNP) attribute as an indicator of a game not played. The “Day-to-Day”
(DTD) attribute refers to questionable player participation in a match. The least
weighted attribute is “Rest”, as it is not clearly associated with an injury or health
pathology of the athlete (RQ6 and RQ7).
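
The numeric weights are not given in the text. The sketch below therefore uses a lexicographic ordering as a stand-in for them: players are ranked by “Out for season” first, then “Out indefinitely”, then total absences. This ordering is an assumption that happens to reproduce the row order of the sample rows taken from Table 5.10; the class and variable names are hypothetical.

```python
from typing import List, NamedTuple


class AbsenceRecord(NamedTuple):
    player: str
    dnp: int
    dtd: int
    out_for_season: int
    out_indefinitely: int
    rest: int

    @property
    def total(self) -> int:
        return (self.dnp + self.dtd + self.out_for_season
                + self.out_indefinitely + self.rest)


# Sample rows copied from Table 5.10, deliberately shuffled.
records: List[AbsenceRecord] = [
    AbsenceRecord("John Wall", 4, 4, 1, 3, 3),
    AbsenceRecord("Chandler Parsons", 8, 8, 3, 1, 10),
    AbsenceRecord("Derrick Rose", 4, 19, 4, 2, 3),
    AbsenceRecord("Blake Griffin", 2, 3, 2, 5, 4),
    AbsenceRecord("Anthony Davis", 7, 21, 3, 3, 1),
]

# Most severe criterion first; ties fall through to the next criterion.
ranked = sorted(
    records,
    key=lambda r: (r.out_for_season, r.out_indefinitely, r.total),
    reverse=True,
)
for r in ranked:
    print(r.player, r.out_for_season, r.out_indefinitely, r.total)
```

A weighted sum with explicit weights would be closer to the wording of the text, but it would require values that the source does not provide.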

Table 5.11: Top 30 players with the least important absences in the NBA championship
for the period 2010–2020.
Top30 with least absences | DNP | DTD | Out for season | Out indefinitely | Rest | Total
Domantas Sabonis 0 1 0 0 0 1
Bam Adebayo 0 1 0 0 0 1
Jayson Tatum 0 2 0 0 0 2
Nikola Jokic 1 3 0 0 0 4
Donovan Mitchell 0 4 0 0 0 4
Damian Lillard 0 4 0 0 1 5
DeAndre Jordan 0 3 0 0 3 6
Giannis Antetokounmpo 1 6 0 0 1 8
Jeff Teague 3 7 0 0 2 12
James Harden 8 4 0 0 1 13
Jimmy Butler 3 10 0 0 0 13
Zach Randolph 6 7 0 0 3 16
Draymond Green 1 14 0 0 2 17
Steven Adams 0 4 0 1 0 5
Otto Porter 1 4 0 1 0 6
Rudy Gobert 0 6 0 1 0 7
Khris Middleton 1 5 0 1 0 7
C.J. McCollum 0 5 0 1 1 7
Serge Ibaka 2 5 0 1 1 9
Bradley Beal 3 6 0 1 1 11
Russell Westbrook 2 5 0 1 4 12
DeMar DeRozan 3 5 0 1 3 12
Paul Millsap 6 5 0 1 2 14
Myles Turner 0 4 0 2 0 6
Paul George 1 6 0 2 2 11
Kawhi Leonard 3 12 0 3 4 22
Karl-Anthony Towns 0 1 1 0 0 2
Ben Simmons 0 2 1 0 0 3
Dwight Howard 5 6 1 0 6 18
LeBron James 3 7 1 0 15 26

Table 5.12 and Table 5.13 show that injuries are associated with a serious impact on
top-class players’ performance. For example, Derrick Rose (2010-11 NBA MVP) showed
declining performance (Table 5.12) after suffering serious injuries from the end of
the 2011-12 regular season onwards (Table 5.13).

Table 5.12: NBA performance analytics for Derrick Rose.


Seasons Avg_mpg Sum_gp avg_pts avg_reb avg_ast avg_ts_pct avg_net_rating avg_per avg_ws avg_bpm avg_vorp
Derrick Rose 28.11 411 16.67 2.87 4.66 0.52 -0.84 16.66 3.07 -0.08 1.32
2010-11 37.36 81 25.00 4.10 7.70 0.55 8.30 23.50 13.10 6.80 6.70
2011-12 35.26 39 21.80 3.40 7.90 0.53 10.60 23.00 6.00 6.40 2.90
2013-14 31.10 10 15.90 3.20 4.30 0.45 -3.30 9.70 -0.20 -3.10 -0.10
2015-16 31.77 66 16.40 3.40 4.70 0.48 -4.20 13.40 0.40 -2.50 -0.30
2016-17 32.53 64 18.00 3.80 4.40 0.53 -3.90 17.00 3.00 -1.00 0.50
2017-18 15.85 50 8.40 1.40 1.50 0.51 -6.20 11.40 -0.10 -5.45 -0.20
2018-19 27.29 51 18.00 2.70 4.30 0.56 0.70 19.50 3.00 1.40 1.20
2019-20 25.96 50 18.10 2.40 5.60 0.56 -3.40 21.00 2.50 2.20 1.40

Table 5.13: “Out for season” and “Out indefinitely” injury analyses for Derrick Rose.
Date | Player | Season | TEAM_Abbr | Health problems | Organ systems | Major anatomical areas | Anatomical sub-areas | Decision | Notes
28/04/2012 | Derrick Rose | 2011-12 | CHI | Musculoskeletal Injuries | Musculoskeletal system | Lower extremity | Knee area | Out for season | torn ACL in left knee (out for season)
23/11/2013 | Derrick Rose | 2013-14 | CHI | Musculoskeletal Injuries | Musculoskeletal system | Lower extremity | Knee area | Out for season | torn meniscus in right knee (out for season)
24/02/2015 | Derrick Rose | 2014-15 | CHI | Musculoskeletal Injuries | Musculoskeletal system | Lower extremity | Knee area | Out indefinitely | torn medial meniscus in right knee (out indefinitely)
30/09/2015 | Derrick Rose | 2015-16 | CHI | Head injuries | Musculoskeletal system | Head | Eye area | Out indefinitely | surgery to repair fractured left orbital bone (out indefinitely)
05/04/2017 | Derrick Rose | 2016-17 | NYK | Musculoskeletal Injuries | Musculoskeletal system | Lower extremity | Knee area | Out for season | surgery on left knee to repair torn meniscus (out for season)
12/03/2019 | Derrick Rose | 2018-19 | MIN | Musculoskeletal Injuries | Musculoskeletal system | Upper extremity | Elbow area | Out for season | sore right elbow (out for season)

Table 5.12 and Table 5.14 show selected important performance rating basketball
analytics ([3], [229] and [14]) with the purpose of benchmarking the association
between injury/pathology and performance. These attributes are “avg_mpg” (average
minutes per game), “sum_gp” (sum of games played), “avg_pts” (average points scored),
“avg_reb” (average rebounds), “avg_ast” (average assists), “avg_ts_pct” (average true
shooting percentage), “avg_net_rating” (average Net Rating), “avg_per” (average Player
Efficiency Rating), “avg_ws” (average Win Shares), “avg_bpm” (average Box Plus/Minus)
and “avg_vorp” (average Value Over Replacement Player).

On the other hand, players with few critical incidents of injuries (Table 5.11), such as
Giannis Antetokounmpo, demonstrated increasing performance, which was not
negatively correlated with the injury factor, as presented by the advanced performance
analytics in Table 5.14 (RQ6 and RQ7).

Table 5.14: NBA performance analytics for Giannis Antetokounmpo.

Seasons Avg_mpg Sum_gp avg_pts avg_reb avg_ast avg_ts_pct avg_net_rating avg_per avg_ws avg_bpm avg_vorp
Giannis Antetokounmpo 32.40 522 20.50 9.11 4.39 0.58 3.86 22.94 9.19 5.00 4.33
2013-14 24.64 77 6.80 4.40 1.90 0.52 -4.40 10.80 1.20 -2.50 -0.20
2014-15 31.37 81 12.70 6.70 2.60 0.55 1.10 14.80 6.20 0.00 1.20
2015-16 35.29 80 16.90 7.70 4.30 0.57 -2.60 18.80 7.10 2.10 2.90
2016-17 35.56 80 22.90 8.80 5.40 0.60 1.50 26.10 12.40 7.30 6.70
2017-18 36.75 75 26.90 10.00 4.80 0.60 2.80 27.30 11.90 6.20 5.70
2018-19 32.75 72 27.70 12.50 5.90 0.64 12.50 30.90 14.40 10.40 7.40
2019-20 30.43 57 29.60 13.70 5.80 0.61 16.10 31.90 11.10 11.50 6.60

5.5 Discussion of Socioeconomic and Health Analytics

Table 5.15, Table 5.16, Table 5.17, Table 5.18, Table 5.19 and Table 5.20, together
with Figure 5.2 and Figure 5.3, present the aggregated data containing advanced
basketball performance analytics, demographic information, and financial information.
Additionally, the combined data uncover relationships and influences between
performance, demographics and economic analytics for teams and players.

Using the preprocessed aggregated data, the relationship between player health
pathologies/injuries and the financial costs incurred in the NBA competition was
examined. Table 5.15 distinguishes three groups, namely, general health problems, head
injuries and musculoskeletal injuries, in the NBA League for 10 seasons, from 2010-11
to 2019-20. The results indicate that club owners, as well as medical and technical
staff, should organize and structure their strategy in terms of training, game
selection, load management, and recovery procedures so that players can return active
and healthy to the team roster, avoiding any additional complications (RQ9).

Table 5.15 also indicates that musculoskeletal injuries account for approximately half
(47.2%) of the total cost of health pathologies and injuries to NBA league teams, while
head injuries account for 27.4% and general health issues for 25.4% of the total
financial losses incurred while players are injured and out of the line-up but still
earn a salary. Data over 10 years were analysed, and the total cost of health
pathologies and injuries for NBA league players was found to be more than 150 million
dollars. The annual evolution of this cost, which started at 12 million dollars in
2010-11 and rose to 16 million dollars in 2019-20, is illustrated in Table 0.6. The
high-level breakdown by organ system and major anatomical area is presented in Table
5.15 (RQ9).

Table 5.15: Financial losses of NBA teams associated with player health pathologies
and injuries (2010-11 up to 2019-20).

Organ systems & Major Anatomical Areas | Salary earned while injured (over 10 years)
General Health Problems (group total) | $ 38,178,058
  Digestive system | $ 13,861,631
  Respiratory system | $ 10,030,557
  Unclassified | $ 14,285,870
Head Injuries (group total) | $ 41,296,496
  Musculoskeletal system | $ 31,797,146
  Nervous system | $ 8,770,438
  Integumentary system | $ 728,912
Musculoskeletal injuries (group total) | $ 71,110,255
  Neck | $ 15,022,631
  Trunk | $ 15,933,073
  Upper extremity | $ 16,050,820
  Lower extremity | $ 15,600,006
  Multiple anatomical areas | $ 8,503,725
Total (for 10 years) | $ 150,584,809
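
The percentage shares quoted for Table 5.15 follow directly from the three group totals. The short check below recomputes them; the variable names are illustrative, and the dollar figures are copied from the table.

```python
# Group totals from Table 5.15 (salary earned while injured, 2010-11 to 2019-20).
group_costs = {
    "General health problems": 38_178_058,
    "Head injuries": 41_296_496,
    "Musculoskeletal injuries": 71_110_255,
}

total_cost = sum(group_costs.values())

# Share of each group in the 10-year total, as a percentage with one decimal.
shares = {
    group: round(100 * cost / total_cost, 1)
    for group, cost in group_costs.items()
}

print(total_cost)
for group, share in shares.items():
    print(f"{group}: {share}%")
```

The three totals add up to the table's grand total of $150,584,809, and the shares round to 25.4%, 27.4% and 47.2%.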

Additionally, in this study, data for 24 NBA league seasons (from 1996-97 to 2019-20)
were analysed with the purpose of identifying the player position earning the highest
salary per year. To that end, 1577 players were classified into three classes, namely,
Centers (“C”), Forwards (“F”) and Guards (“G”). A “Forward” is a player used in the
Power Forward or Small Forward position, while a “Guard” is used as a Shooting Guard
and/or Point Guard. A “Center” plays as a Center most of the time (RQ11). Table 5.16
shows the detailed injury categorization per team and season during 2010-20.

Table 5.16: Team injury issues in the period 2010-20 categorized per season.
Teams Injury Issues | Team Abbrev. | 2010-11 | 2011-12 | 2012-13 | 2013-14 | 2014-15 | 2015-16 | 2016-17 | 2017-18 | 2018-19 | 2019-20 | Total
Atlanta Hawks ATL 8 21 29 21 21 11 26 25 18 13 193
Boston Celtics BOS 33 21 23 21 9 14 18 20 16 17 192
Cleveland Cavaliers CLE 17 13 25 19 18 27 28 21 20 11 199
New Orleans Hornets - Pelicans (footnote 5) NOH - NOP 10 27 22 22 17 37 20 20 19 22 216
Chicago Bulls CHI 7 20 15 16 9 28 21 17 17 14 164
Dallas Mavericks DAL 9 14 14 10 20 29 32 12 23 6 169
Denver Nuggets DEN 13 20 13 18 35 22 35 10 8 7 181
Golden State Warriors GSW 11 13 17 19 14 27 23 18 20 27 189
Houston Rockets HOU 10 11 16 33 20 22 19 43 18 12 204
Los Angeles Clippers LAC 8 14 15 16 9 31 17 17 11 14 152
Los Angeles Lakers LAL 6 11 24 26 27 10 19 11 18 6 158
Miami Heat MIA 14 18 12 19 31 29 37 20 27 23 230
Milwaukee Bucks MIL 30 28 6 37 24 25 19 14 25 13 221
Minnesota Timberwolves MIN 14 20 26 22 29 5 13 10 12 13 164
New Jersey Nets - Brooklyn Nets (footnote 6) NJN - BKN 15 29 12 30 20 21 25 23 23 11 209
New York Knicks NYK 13 20 25 28 27 24 22 14 24 14 211
Orlando Magic ORL 23 12 28 17 16 14 11 21 9 10 161
Indiana Pacers IND 14 12 13 20 26 15 10 10 11 131
Philadelphia 76ers PHI 4 15 13 15 34 33 42 26 17 11 210
Phoenix Suns PHX 9 11 15 11 9 23 23 19 15 20 155
Portland Blazers POR 23 20 9 6 21 7 23 12 11 18 150
Sacramento Kings SAC 14 13 12 11 18 47 34 14 10 8 181
San Antonio Spurs SAS 15 27 30 28 25 30 34 30 16 6 241
Oklahoma City Thunder OKC 5 9 15 9 18 15 10 9 15 11 116
Toronto Raptors TOR 20 15 16 16 8 22 18 8 21 16 160
Utah Jazz UTA 30 20 17 7 18 14 24 26 13 10 179
Memphis Grizzlies MEM 12 13 18 14 23 22 29 43 25 12 211
Washington Wizards WAS 21 12 25 17 15 24 15 15 10 17 171
Detroit Pistons DET 20 10 10 12 7 23 13 12 15 22 144
Charlotte Hornets CHA 14 19 13 14 20 19 20 17 9 7 152
Total 428 510 527 547 582 681 685 557 495 402 5414

Table 5.17 and Table 5.18 present these three classes in relation to the average salary
of each class. According to these tables, Centers are the best paid players in the
League, not only for their height but also for their superior performance [230], [231],
[232], [233].

Under the law of supply and demand, the salary that teams offer to a “Center” reflects
the scarcity of players in that class. It is difficult to find a player of such height
and of the highest quality and performance at this competitive level [234]. According
to the data analysis of 1577 players, Centers accounted for 21.3% of the total, with an
average salary of $6.84 million, while Guards accounted for 38% of the total, with the
lowest average annual salary of $5.37 million, over 24 NBA seasons (RQ11).

5 The New Orleans Hornets changed their name at the end of the 2012-13 season. From the
2013-14 season they play under the name New Orleans Pelicans.
6 The New Jersey Nets changed their name at the end of the 2011-12 season. From the
2012-13 season they play under the name Brooklyn Nets.

Table 5.17: NBA Player average salary per position over 24 Seasons (1996-2020).

Player Positions # of Players % of Players Average Inflated Salary


C (Centers) 336 21.3% $ 6,842,627
F (Forwards) 642 40.7% $ 6,159,377
G (Guards) 599 38.0% $ 5,365,431
All Players 1577 100% $ 5,960,735

Table 5.18 shows the NBA annual salary drill-down analysis per position. From 1996-97
to 2019-20, average salaries increased from $3.88 to $9.44 million for “Centers” (143%
increase), from $3.54 to $7.60 million for “Forwards” (115% increase) and from $3.26
to $8.05 million for “Guards” (147% increase), while the average salary for all players
increased from $3.49 to $8.17 million (134% increase) (RQ11). Based on the results, it
seems that the game is changing, and all players are eligible and capable of playing in
every position needed during matches.
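
The quoted increases can be recomputed from the first (1996-97) and last (2019-20) rows of Table 5.18. The snippet below is a plain arithmetic check; the helper name `pct_increase` is illustrative, and the salary figures are copied from the table.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100 * (new - old) / old


# Average inflated salaries from Table 5.18: (1996-97, 2019-20).
salaries = {
    "C": (3_881_290, 9_435_243),
    "F": (3_541_931, 7_600_759),
    "G": (3_258_706, 8_053_177),
    "All": (3_486_174, 8_167_815),
}

increases = {
    position: round(pct_increase(first, last))
    for position, (first, last) in salaries.items()
}
print(increases)  # {'C': 143, 'F': 115, 'G': 147, 'All': 134}
```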

Undoubtedly, the age criterion is vital for players regarding their career. Therefore, sport
clubs invest in player careers, create strategies, structure their team roster, and finally
make important decisions about their budgets. Age influences performance, as
professional basketball players older than 30 years have lower speed and/or jumping
ability than younger players, which means that performance declines severely by that
age [235].

Table 5.18: Average annual NBA player salaries compared with those in the previous
year over 24 seasons (1996-2020).

Season | C Average Inflated Salary | C % Diff since previous year | F Average Inflated Salary | F % Diff since previous year | G Average Inflated Salary | G % Diff since previous year | Global Average Inflated salary | Global % Diff since previous year
1996-97 $3,881,290 $3,541,931 $3,258,706 $3,486,174
1997-98 $5,082,727 34.02% $3,743,718 8.17% $3,669,553 15.24% $3,949,293 15.94%
1998-99 $6,537,947 30.63% $4,372,168 18.60% $3,529,786 -2.32% $4,449,951 14.42%
1999-00 $7,360,855 15.05% $4,742,341 10.84% $4,211,420 21.92% $4,932,863 13.28%
2000-01 $6,966,160 -2.16% $5,802,620 26.49% $4,703,662 15.46% $5,584,412 17.04%
2001-02 $5,783,889 -14.62% $6,470,697 14.67% $5,051,722 10.44% $5,748,630 5.85%
2002-03 $6,442,727 13.16% $6,163,238 -3.24% $5,470,850 10.02% $5,950,310 5.15%
2003-04 $5,878,544 -6.69% $6,872,175 14.03% $5,780,625 8.06% $6,254,761 7.50%
2004-05 $6,064,786 5.93% $6,802,390 1.64% $4,926,732 -12.49% $5,887,708 -3.35%
2005-06 $6,226,495 6.15% $6,654,906 1.15% $5,683,056 19.26% $6,196,194 8.81%
2006-07 $7,160,742 18.73% $6,490,252 0.69% $5,743,822 4.34% $6,281,333 4.66%
2007-08 $7,927,003 13.86% $6,672,018 5.73% $5,715,278 2.34% $6,511,636 6.62%
2008-09 $7,416,343 -2.84% $6,666,141 3.76% $5,949,866 8.11% $6,527,961 4.11%
2009-10 $7,050,769 -5.25% $6,270,419 -6.26% $5,357,993 -10.25% $6,028,036 -7.97%
2010-11 $5,975,398 -13.86% $6,123,897 -0.73% $5,934,997 12.59% $6,018,818 1.49%
2011-12 $6,488,697 12.02% $5,881,954 -0.92% $4,994,752 -13.18% $5,637,403 -3.38%
2012-13 $5,979,019 -5.95% $5,383,306 -6.58% $4,197,080 -14.23% $4,997,238 -9.52%
2013-14 $6,119,719 3.86% $5,626,318 6.05% $4,358,206 5.37% $5,207,155 5.73%
2014-15 $6,112,329 1.50% $5,399,645 -2.47% $4,665,740 8.79% $5,195,075 1.39%
2015-16 $6,486,326 6.25% $6,958,399 29.02% $4,955,850 6.35% $6,013,757 15.90%
2016-17 $8,533,234 33.21% $7,152,360 4.08% $6,731,495 37.54% $7,253,602 22.14%
2017-18 $9,141,014 9.40% $7,650,239 9.24% $7,121,412 8.05% $7,710,912 8.57%
2018-19 $8,476,060 -5.01% $8,221,339 10.09% $7,032,836 1.17% $7,736,542 2.78%
2019-20 $9,435,243 13.33% $7,600,759 -5.87% $8,053,177 16.58% $8,167,815 7.49%
Grand Total $6,842,627 $6,159,377 $5,365,431 $5,960,735

In addition, this research study aimed to correlate NBA player age with salary and
advanced basketball performance analytics. Table 5.19 presents a matrix with the
attributes of age, salary and advanced basketball performance analytics, which include
PTS, REBS, AST, TRB%, TOV%, USG%, TS%, AST%, STL%, BLK%, NetRtg, PER, WS, WS per_48,
BPM (Box Plus/Minus) and VORP, which were used as the most appropriate rating
basketball analytics [13], [3], [91], [179], [182] (RQ10 and RQ11).

As illustrated in Table 5.19, the green highlighted cells are those that demonstrate
players reaching their peak performance within each age group. Studying 24 seasons of
the NBA League showed that players aged between 27 and 29 years are at the peak of
their careers regarding average performance in the provided most accredited basketball
advanced rating analytics. However, their maximum earnings on average are achieved
when they are between 29 and 34 years old (RQ10 and RQ11), with the maximum
average salary for an NBA league player achieved at age 34.

Table 5.19: NBA player age clustering in correlation with advanced basketball
analytics and average inflated salaries over 24 seasons (1996–2020).
Age Criterion  Average Inflated Salary  PTS  REBS  AST  TRB%  TOV%  USG%  TS%  AST%  STL%  BLK%  NetRtg  PER  WS  WS per_48  BPM  VORP

18 $ 1,732,401 3.65 2.20 0.55 12.47 13.82 19.56 45.44 8.87 1.45 2.89 -4.67 11.10 0.59 4.02 -4.32 -0.02

19 $ 3,218,622 7.77 3.56 1.45 10.47 14.10 19.31 49.89 10.91 1.45 2.15 -4.68 12.06 1.49 4.63 -2.82 0.02

20 $ 2,877,569 8.19 3.73 1.68 10.49 14.65 19.19 50.97 12.14 1.61 2.08 -3.84 12.77 2.00 6.04 -2.14 0.26

21 $ 2,819,124 8.75 3.79 1.77 10.48 14.01 19.57 51.39 12.74 1.61 1.95 -3.41 13.20 2.32 6.66 -1.86 0.44

22 $ 2,576,071 8.22 3.61 1.67 10.20 13.97 19.16 51.69 12.78 1.62 1.81 -2.87 13.07 2.27 6.92 -1.75 0.46

23 $ 3,099,609 8.09 3.49 1.68 10.01 14.01 18.94 51.47 12.70 1.69 1.65 -2.77 13.00 2.26 7.14 -1.71 0.49

24 $ 3,967,551 8.37 3.62 1.78 10.03 14.24 18.82 51.90 13.17 1.63 1.69 -2.59 13.02 2.44 7.41 -1.59 0.56

25 $ 5,237,968 9.28 3.91 1.93 10.16 13.53 19.21 52.49 13.39 1.63 1.62 -1.57 13.88 2.91 8.61 -1.03 0.73

26 $ 6,267,616 9.80 4.01 2.12 9.99 13.53 19.28 52.62 13.92 1.65 1.66 -1.57 13.97 2.91 8.68 -0.88 0.74

27 $ 6,923,518 9.64 4.01 2.10 10.00 13.67 19.07 52.84 13.81 1.61 1.60 -1.47 13.92 2.98 8.88 -0.83 0.75

28 $ 7,677,535 9.66 3.95 2.16 9.76 13.54 18.83 52.72 13.87 1.59 1.47 -1.15 13.65 3.00 8.81 -0.83 0.78

29 $ 8,201,621 9.65 4.04 2.16 9.87 13.76 18.84 52.64 13.84 1.61 1.42 -0.40 13.68 3.01 8.96 -0.76 0.76

30 $ 8,107,519 8.88 3.93 2.02 10.10 13.67 18.21 52.24 13.38 1.56 1.56 -0.84 13.15 2.85 8.41 -1.10 0.70

31 $ 8,077,180 8.33 3.55 2.10 9.35 14.09 17.76 51.69 14.14 1.56 1.36 -0.98 12.53 2.62 7.95 -1.30 0.61

32 $ 8,210,213 8.30 3.69 2.10 9.70 13.91 17.73 51.90 14.11 1.55 1.35 -0.46 12.70 2.72 8.39 -1.27 0.62

33 $ 8,270,049 7.90 3.63 2.12 9.60 14.64 17.32 51.69 14.81 1.58 1.25 -0.93 12.40 2.62 7.94 -1.20 0.62

34 $ 8,611,010 7.85 3.65 2.19 9.39 14.36 17.06 51.21 14.95 1.60 1.36 -0.34 12.42 2.64 8.29 -1.13 0.66

35 $ 7,332,750 7.23 3.75 1.97 10.11 14.40 16.51 51.01 13.94 1.60 1.53 0.37 12.20 2.54 8.37 -1.11 0.62

36 $ 6,756,893 7.26 3.73 1.96 10.34 14.80 16.76 52.31 14.18 1.47 1.64 0.51 12.66 2.64 9.10 -0.86 0.62

37 $ 6,538,329 6.88 3.47 1.91 10.04 14.84 16.95 51.29 14.48 1.60 1.55 0.01 12.72 2.28 8.72 -0.85 0.55

38 $ 5,759,942 6.74 3.40 1.87 10.43 14.73 17.60 50.40 15.37 1.64 1.52 -1.40 12.49 1.99 7.89 -1.18 0.44

39 $ 4,898,280 6.32 3.14 1.60 10.40 14.96 16.35 52.72 12.79 1.77 1.45 1.99 12.31 2.51 9.02 -0.95 0.73

40 $ 3,190,658 5.63 2.99 1.66 10.02 14.94 15.03 53.27 12.26 1.45 1.88 -0.92 11.44 2.22 8.44 -1.49 0.48

41 $ 4,923,851 3.93 3.23 0.50 13.67 13.80 15.80 53.50 5.67 2.00 3.30 -0.20 13.40 1.47 11.50 -0.60 0.23

42 $ 2,098,713 5.20 2.60 0.70 10.30 10.90 15.75 49.95 7.00 1.35 1.70 -10.90 9.90 0.95 3.70 -3.65 0.05

43 $ 2,564,753 4.30 2.10 0.65 10.00 12.30 18.20 48.90 8.20 1.00 2.95 3.15 10.90 0.35 4.50 -3.50 -0.30
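The gap between the performance peak and the salary peak can be checked directly against the table. The following is a minimal sketch, assuming only a subset of Table 5.19 (ages 25 to 34 and the salary, PER, WS and VORP columns); the names `ROWS`, `COLUMNS` and `peak_age` are illustrative and are not part of the original analysis.

```python
# Rows copied from Table 5.19 for ages 25-34:
# (age, average inflated salary in USD, PER, WS, VORP)
ROWS = [
    (25, 5_237_968, 13.88, 2.91, 0.73),
    (26, 6_267_616, 13.97, 2.91, 0.74),
    (27, 6_923_518, 13.92, 2.98, 0.75),
    (28, 7_677_535, 13.65, 3.00, 0.78),
    (29, 8_201_621, 13.68, 3.01, 0.76),
    (30, 8_107_519, 13.15, 2.85, 0.70),
    (31, 8_077_180, 12.53, 2.62, 0.61),
    (32, 8_210_213, 12.70, 2.72, 0.62),
    (33, 8_270_049, 12.40, 2.62, 0.62),
    (34, 8_611_010, 12.42, 2.64, 0.66),
]

# Column positions inside each row tuple (illustrative helper mapping).
COLUMNS = {"salary": 1, "PER": 2, "WS": 3, "VORP": 4}


def peak_age(rows, col):
    """Return the age whose row holds the largest value in column `col`."""
    return max(rows, key=lambda row: row[col])[0]


peaks = {name: peak_age(ROWS, idx) for name, idx in COLUMNS.items()}
print(peaks)
```

In this subset the three performance columns peak between ages 26 and 29, whereas the average inflated salary peaks at age 34.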

Figure 5.2 and Figure 5.3 summarize the average annual salary paid by each team
across the 24 NBA seasons from 1996-97 to 2019-20. The New Orleans Pelicans (NOP),
Miami Heat (MIA) and New York Knicks (NYK) are the top 3 clubs with the highest
paid salaries: $6 M, $5.74 M, and $5.72 M, respectively. On the other hand, the
Atlanta Hawks (ATL), Charlotte Hornets (CHA) and Philadelphia 76ers (PHI) have the
lowest salaries, $4.04 M, $4.03 M, and $3.97 M, respectively. The bottom 3 clubs,
SEA, NJN and VAN, were excluded from this comparison because these teams ceased
to exist many years ago, when lower salaries were the norm. Regarding performance
evaluation, correlation analysis between tactical formations, SportVU video-tracking
analytics, physical performance measurements and psychometric/biometric analytics
is highly complex and subject to limitations.

Based on the possible implications proposed, it was considered that the research study
can provide valuable insights for decision makers.
[Bar chart of average salary per team; axis label: Average Salary. Bar values range from $3,169,732 to $7,100,433.]
Figure 5.2: Average NBA player salary per team over 24 seasons (1996-2020).
[Radial chart of average salary per team; radial axis from $3,100,000 to $7,100,000, one spoke per team abbreviation.]
Figure 5.3: Left side: Radial chart distribution of average NBA player
salary per team over 24 Seasons (1996–2020). Right side: Team name
and abbreviation.

Teams Abbr.
Atlanta Hawks ATL
Boston Celtics BOS
Brooklyn Nets⁷ BKN
Charlotte Hornets CHA
Chicago Bulls CHI
Cleveland Cavaliers CLE
Dallas Mavericks DAL
Detroit Pistons DET
Denver Nuggets DEN
Golden State Warriors GSW
Houston Rockets HOU
Indiana Pacers IND
Los Angeles Clippers LAC
Los Angeles Lakers LAL
Memphis Grizzlies MEM
Miami Heat MIA
Milwaukee Bucks MIL
Minnesota Timberwolves MIN
New Jersey Nets³ NJN
New Orleans Hornets⁴ NOH
New Orleans Pelicans⁸ NOP
New York Knicks NYK
Oklahoma City Thunder OKC
Orlando Magic ORL
Philadelphia 76ers PHI
Phoenix Suns PHX
Portland Trail Blazers POR
Sacramento Kings SAC
San Antonio Spurs SAS
Seattle SuperSonics⁹ SEA
Toronto Raptors TOR
Utah Jazz UTA
Vancouver Grizzlies¹⁰ VAN
Washington Wizards WAS

7 The New Jersey Nets changed their name at the end of the 2011-12 season; from the 2012-13 season onward they use
the name Brooklyn Nets.
8 The New Orleans Hornets changed their name at the end of the 2012-13 season; from the 2013-14 season onward they
use the name New Orleans Pelicans.
9 The Seattle SuperSonics played their last NBA season in 2007-08.
10 The Vancouver Grizzlies played for 6 seasons, from 1995 to 2001.

By performing clustering through dimensionality reduction, based on feature extraction
(Principal Component Analysis) and feature selection (Wrapper, Filter, and Embedded
methodologies), eight age clusters were identified, as presented in Table 5.20 and in
more detail in Table 0.7. Approximately 50% of players in the NBA are aged 23
to 28 years. Players aged between 29 and 33 years receive the highest salaries,
approximately 8.2 million dollars on average.

Guards (G) show the best playing performance when aged 29-30, and in the same
cluster, they receive the highest salary on average, ~7.5 million dollars. In addition,
Forwards (F) play better when aged 27-28 years, with the highest salary of ~8.6 million
dollars occurring when they are 31-33 years old. On the other hand, Centers (C) are
paid the most when aged 31-33 (9.4 million dollars), whereas their highest
performance occurs when aged 25-26.

Table 5.20: Eight age clusters correlated with player percentage, salary adjusted for
inflation and Player Efficiency Rating (PER) over 24 NBA seasons (1996-2020).

Age clusters % of Players Salary Inflated PER


18 - 20 3.4% $ 2,953,999 13.11
21 - 22 10.6% $ 2,669,485 13.80
23 - 24 17.7% $ 3,558,466 13.79
25 - 26 17.2% $ 5,762,331 14.41
27 - 28 16.2% $ 7,286,295 14.15
29 - 30 13.2% $ 8,155,019 13.74
31 - 33 13.4% $ 8,171,796 13.10
>= 34 8.3% $ 7,304,144 12.94

The Tornado funnel diagram (Figure 5.4) illustrates the correlation between age and
Inflated Salary for all NBA players in descending Salary order. The percentage in
brackets expresses each value as a percentage of the first (highest) one, to allow a clear
comparison. Based on the results, it was concluded that players in the age group 29–34
receive the greatest salaries (Table 5.20). Hence, teams appear to select experienced
professionals to use them as mentors for younger players and to achieve the best mix
for winning more games and championships.
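The cluster figures quoted above can be reproduced from Table 5.20 with a few lines of arithmetic. The sketch below is illustrative only: it copies the eight rows of Table 5.20 and applies the same "percentage of the first" idea as the funnel diagram to the clusters rather than to single ages, which is an assumption made here for brevity. The names `CLUSTERS`, `share_23_28` and `relative_salary` are hypothetical.

```python
# Rows copied from Table 5.20:
# (age cluster, % of players, inflation-adjusted salary in USD, PER)
CLUSTERS = [
    ("18-20", 3.4, 2_953_999, 13.11),
    ("21-22", 10.6, 2_669_485, 13.80),
    ("23-24", 17.7, 3_558_466, 13.79),
    ("25-26", 17.2, 5_762_331, 14.41),
    ("27-28", 16.2, 7_286_295, 14.15),
    ("29-30", 13.2, 8_155_019, 13.74),
    ("31-33", 13.4, 8_171_796, 13.10),
    (">=34", 8.3, 7_304_144, 12.94),
]

# Share of players aged 23 to 28 (three adjacent clusters).
share_23_28 = round(
    sum(pct for label, pct, _, _ in CLUSTERS
        if label in {"23-24", "25-26", "27-28"}),
    1,
)

# Cluster with the highest salary and cluster with the highest PER.
top_salary = max(CLUSTERS, key=lambda c: c[2])
top_per = max(CLUSTERS, key=lambda c: c[3])

# Each cluster's salary as a percentage of the highest cluster salary,
# ordered from highest to lowest as in a funnel chart.
relative_salary = sorted(
    ((label, round(100 * salary / top_salary[2], 1))
     for label, _, salary, _ in CLUSTERS),
    key=lambda item: item[1],
    reverse=True,
)

print(share_23_28, top_salary[0], top_per[0])
print(relative_salary)
```

With these rows, the 23 to 28 clusters sum to 51.1% of players, the best-paid cluster is 31-33, and the highest PER falls in the 25-26 cluster.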

Figure 5.4: Tornado funnel diagram for all NBA players over 24 seasons (1996-2020)
regarding Inflated Salary versus Age.

5.6 Injury Recovery and Economic Impact

Cohen's d provides a standardized metric of the difference between two means in terms
of standard deviations, making it easier to understand the magnitude of the difference.
On the other hand, t tests determine whether the difference is statistically significant. A
large t-statistic often indicates a large effect size (although the relationship is not strictly
linear due to the square root in the denominator) (RQ12).
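The link between the two statistics can be made concrete with a small numerical sketch. It assumes two independent samples of equal variance and uses the pooled standard deviation; the exact variant used in the thesis is not restated in this section, and the sample values below are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two independent samples."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)


def cohens_d(before, after):
    """Effect size: mean difference in units of the pooled standard deviation."""
    return (mean(after) - mean(before)) / math.sqrt(pooled_variance(before, after))


def students_t(before, after):
    """Student's t statistic for two independent samples (equal variances)."""
    n1, n2 = len(before), len(after)
    standard_error = math.sqrt(pooled_variance(before, after) * (1 / n1 + 1 / n2))
    return (mean(after) - mean(before)) / standard_error


# Hypothetical per-game values of one metric before and after an injury.
before = [18.0, 20.0, 22.0, 19.0, 21.0]
after = [15.0, 17.0, 16.0, 18.0, 14.0]

d = cohens_d(before, after)
t = students_t(before, after)
n1, n2 = len(before), len(after)

# Under these assumptions t equals d scaled by a sample-size factor.
scale = math.sqrt(n1 * n2 / (n1 + n2))
print(round(d, 4), round(t, 4), round(d * scale, 4))
```

Under these assumptions, t = d · sqrt(n1·n2 / (n1 + n2)), so the same effect size yields a larger t statistic as the samples grow. This is why a significant p value does not by itself imply a large effect.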

As set out in the Results section, one of the aims of this work is to identify the patterns
and relationships between anatomical sub-areas and advanced performance metrics
over 2-game, 5-game, and 10-game series before and after an injury. In each case, the
analysis has shown the following:

For Games 2 before/after the injury

Inconclusive areas:
• The inconclusive areas, such as the ankle, knee, and thigh areas, were those
with significant p values that were neither the most nor the least impacted.
Notable Observations
• Negative Cohen's d: Many areas, including the ankle, knee, and thigh areas,
had negative Cohen's d values, suggesting a decrease in performance
postinjury across these areas.
• Contrast areas: No area combines a positive average Cohen's d with a
significant p value, which suggests that all significant areas have a negative
impact on performance.
• Percentage change insight: Many areas show a negative average percentage
change, indicating a decrease in performance postinjury. The significant areas
with a negative change in performance cover a broad range, from the ankle area
to the respiratory area, suggesting widespread impacts of injuries.

For Games 5 before/after the injury


Inconclusive areas:
• The inconclusive areas with significant p values that were neither most nor least
impacted included the ankle, knee, thigh, thoracolumbar, foot, abdominal, and
other areas. These areas exhibit significant statistical findings but do not show
the extremes of impact, warranting a more nuanced interpretation.
Notable Observations
• Negative Cohen’s d: Several areas show a negative Cohen's d, suggesting that
injuries in these areas, such as the ankle, knee, thigh, and several others,
generally lead to a decrease in performance postinjury.
• Percentage change insight: The analysis showed that most of the significant
areas had a negative average percentage change, indicating a general decrease in
performance postinjury across various anatomical sub-areas.

For Games 10 before/after the injury


Inconclusive areas:
• The inconclusive areas with significant p values that were neither most nor least
impacted included the ankle, knee, thigh, thoracolumbar, foot, abdominal, and
other areas. The impact in these areas is significant but does not show extremes,
which could warrant further investigation or a more detailed analysis.
Notable Observations
• Negative Cohen’s d: Several areas exhibit a negative Cohen's d, indicating a
general trend toward decreased performance postinjury. These included the
ankle, knee, thigh, thoracolumbar, and foot regions and additional areas.
• Percentage change insight: A range of areas shows a negative average
percentage change, denoting a decrease in performance postinjury. These
included the ankle, thigh, abdominal, heel, toe, elbow, chest, pelvic, and fibular
areas.

Table 5.21 and Figure 5.5 (a bar chart comparing the percentage change across the
anatomical sub-areas) below offer a comparative analysis of the performance impact
before and after injury events across the 2-, 5-, and 10-game series. The
findings are structured around key metrics: areas with significant impact, effect size
based on Cohen's d, percentage change in performance, and highlighted areas of
concern. This approach provides a comprehensive view of how injuries affect player
performance across the different game series, making it possible to pinpoint specific
areas that require attention and possible intervention (RQ12).
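The before/after comparison over a window of k games can be sketched as follows. This is a minimal illustration under stated assumptions: the game log contains only games actually played, `injury_idx` is the index of the first game after the return, and the values are made up. The function name `window_change` is hypothetical and is not the thesis's implementation.

```python
from statistics import mean


def window_change(values, injury_idx, k):
    """Percentage change of the mean over the k games after an injury
    relative to the mean over the k games before it."""
    before = values[max(0, injury_idx - k):injury_idx]
    after = values[injury_idx:injury_idx + k]
    if not before or not after:
        raise ValueError("not enough games on one side of the injury")
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("percentage change is undefined for a zero baseline")
    return 100.0 * (mean(after) - baseline) / abs(baseline)


# Hypothetical net-rating log; the injury falls between index 4 and index 5.
net_rating = [2.0, 1.0, 3.0, 2.0, 2.0, -1.0, -2.0, 0.0, -3.0, 1.0]

for k in (2, 5):
    print(k, window_change(net_rating, 5, k))
```

Because a metric such as net rating can be negative, the change can fall below -100% when the post-injury mean changes sign. This is one plausible reading of large values such as the -183.09% reported for E_NET_RATING_ADVANCED.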

Figure 5.5: Comparison of the median percentage change in anatomical subareas in
Game Series 2, 5, and 10.

Table 5.21: Comparison matrix of Anatomical sub-areas for all the game series (2, 5
and 10).

Metrics - Dimensions | Games 2 | Games 5 | Games 10
Areas with Significant Impact on Concern | Abdominal area, Pelvic area | Foot area, Pelvic area, Abdominal area | Thigh area, Pelvic area, Chest area, Ankle area
% Change | Most Impacted: Pelvic area (-10.80%); Least Impacted: Upper arm and Forearm area (6.24%) | Most Impacted: Abdominal area (-29.06%); Least Impacted: Upper Forearm area (40.49%) | Most Impacted: Pelvic area (-16.70%); Least Impacted: Fibular area (4.41%)

For Games 2 before/after the injury – Basketball Performance Analytics
Notable Observations
• Negative Cohen's d: Not explicitly listed; however, metrics with a negative
Cohen's d indicate a decrease in performance postinjury.
• Percentage change insight: The largest negative change was -71.23%, in
DFGM_PLAYER_TRACK, and the largest positive change was +65.15%, in
BLK_TRADITIONAL. These are the metrics whose values shifted the most in
each direction.

For Games 5 before/after the injury – Basketball Performance Analytics


Notable Observations
• Negative Cohen's d: A large number of performance metrics have negative
Cohen's d values, indicating that, on average, postinjury performance tends
to be lower than preinjury performance.
• Percentage Change Insight: 'PCT_PF_USAGE' stands out as the most
impacted metric. The highest positive change observed was 61.32% (in
'PCT_PF_USAGE'). The most significant negative change is -183.09% (in
'E_NET_RATING_ADVANCED').

For Games 10 before/after the injury – Basketball Performance Analytics


Notable Observations
• Negative Cohen's d: The same metrics listed as 'Metrics of Concern' also feature
here, indicating their potential negative impact.
• Percentage change insight: 'E_NET_RATING_ADVANCED' was the most
impacted metric, with the highest positive change of 44.24%. The most
significant negative change was -11.83% in 'OPP_PTS_FB_MISC'.

Table 5.22 and Figure 5.6 (a bar chart comparing the median percentage change of the
performance metrics for the 2-, 5- and 10-game series) consolidate the information
from the previous analyses. The "Most Impacted" column focuses on the largest
negative impact, while the "Least Impacted/Notably Positive" column emphasizes
metrics that either showed minor declines or demonstrated potential positive changes
postinjury. The "Metrics of Concern" column underscores the metrics that exhibited
notable declines in performance, which might be areas to prioritize in future analyses or
interventions.

Figure 5.6: Comparison of Median %Changes in Performance Metrics for Game Series
2, 5, and 10.

The data present the relationship between recovery time and team losses. Several
hypotheses could be posited as follows (RQ13-15):

1. Team Performance: Longer recovery times could either indicate a more thorough
recovery protocol, potentially leading to better long-term team performance, or
could be a sign of more severe injuries. The data alone does not clarify this
relationship.
2. Economic Impact: The economic impact (sum of team losses) is not related to
recovery time. This could be due to a multitude of factors not accounted for in this
dataset, such as the nature of the sport, insurance policies, revenue streams of each
team, or the impact of star players' injuries.
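A claim of the form "not related" is usually checked with a correlation coefficient. The sketch below computes Pearson's r from first principles; the two lists are made-up values chosen for illustration and are not taken from the thesis data, and the name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Made-up values for illustration only.
recovery_days = [10, 20, 30, 40, 50]
team_losses_musd = [3.0, 1.0, 4.0, 1.0, 3.0]

r = pearson_r(recovery_days, team_losses_musd)
print(round(r, 6))
```

A coefficient near zero, as in this toy example, indicates no linear relationship. It does not rule out a non-linear one, which is one reason the hypotheses above are stated cautiously.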

Table 5.22: Comparison matrix of the performance metrics for all the game series (2, 5
and 10).

Metric/Dimension | Game 2 | Game 5 | Game 10
Significant Impact | All metrics have p values below 0.05 | All metrics have p values below 0.05 | All metrics have p values below 0.05
Cohen's d | PCT_TOV_USAGE, PLUS_MINUS_TRADITIONAL, PCT_BLK_USAGE, BLK_TRADITIONAL, BLK_MISC, PCT_PTS_FT_SCORING, OPP_TOV_PCT_FOUR_FACTORS, OREB_PCT_FOUR_FACTORS | OPP_FTA_RATE_FOUR_FACTORS, PCT_PF_USAGE, E_NET_RATING_ADVANCED, OPP_TOV_PCT_FOUR_FACTORS | OPP_TOV_PCT_FOUR_FACTORS, E_NET_RATING_ADVANCED
% Change | Most impacted: +65.15% in BLK_TRADITIONAL; Least impacted: -71.23% in DFGM_PLAYER_TRACK | Most impacted: +61.32% in PCT_PF_USAGE; Least impacted: -183.09% in E_NET_RATING_ADVANCED | Most impacted: +44.24% in E_NET_RATING_ADVANCED; Least impacted: -11.83% in OPP_PTS_FB_MISC
Areas of Concern | Multiple metrics with both positive and negative changes; care needed in interpretation | DEF_RATING_ADVANCED with the highest average % decline | DEF_RATING_ADVANCED showing a continued decline

The variation in recovery times and financial losses across anatomical sub-areas could
be indicative of several underlying factors (RQ13 and RQ15):

3. Frequency of Injuries: Certain areas, such as the 'Hand, Thumb & Fingers' and
'Knee', may be more prone to injuries that are both severe and frequent, leading to
longer recovery times and higher costs.
4. Economic Impact: This variable is not strictly proportional to recovery time, which
suggests that some injuries, even those that are less frequent or have shorter recovery
times, may still incur significant costs. This could be due to factors such as the player's
position, the importance of the player to the team, or medical expenses specific to
certain types of injuries.
5. Implications for Player Health: Respiratory issues, while having a shorter
average recovery time, have a high financial impact, due to the COVID-19
pandemic's effects, as shown by the separate listing of COVID-19 with a very high
recovery time (76.44 days) and associated costs.

The data indicated a variance in injury occurrence according to anatomical sub-area,
with some regions being more prone to injury in certain contexts (Defensive, Offensive)
than others (RQ3).

6. Injury distribution: A greater frequency of injuries in the 'Pelvic' and 'Fibular'
areas might indicate that these regions are either more vulnerable during play or
that the nature of the sport involves more activities that put these areas at risk.
7. Contextual Classification: The predominance of injuries in the 'Misc' category
could suggest either a wide variety of other non-classifiable injuries or a potential
issue with the classification system itself.
8. Offensive vs. Defensive Injuries: The data might reflect the different types of
stresses placed on the body during offensive and defensive plays, with certain
areas being more affected by offensive manoeuvres and others by defensive
actions.
9. Implications for Prevention and Treatment: Understanding which anatomical
areas are most at risk and in what context can help in developing targeted injury
prevention and rehabilitation programs.

Table 5.23 synthesizes the results of Table 5.21 and Table 5.22 across the three
game series (2, 5 and 10) to highlight trends in physical injuries within different
anatomical sub-areas and types of play. This synthesis offers insights into the recurrent
nature of certain injuries and informs targeted preventive strategies (RQ1).

Table 5.23: Comparison matrix of more significant and effect size injuries in different
game series.

Aspect | Game Series 2 | Game Series 5 | Game Series 10
Injury Prone Areas | Pelvic and wrist areas had notable injury incidences. | Shift in high incidence of injuries to the upper arm and forearm areas. | Upper arm and forearm areas remained most prone to injuries.
Play Type Risks | Defensive plays associated with the highest number of injuries. | Defensive injuries continued to be significant, reflective of the physical demands of the sport. | Defensive play injuries persisted, emphasizing the need for focused preventive strategies.
Injury Ratings | Some injuries rated, but without a high severity in any specific area. | Ratings not extensively reported, suggesting a need for more detailed injury impact assessments. | Ratings more prevalent, especially in the upper arm and forearm area, indicating higher risk.

5.7 Discussion on Injury Patterns and Impact on Performance in the
NBA League Using Sports Analytics

Basketball, as a sport, is characterized by a high degree of uncertainty and the presence
of numerous inter-related parameters within a multivariate framework.

The most prominent injuries are localized to areas that bear significant weight or are
heavily involved in movement and stability, namely, the knee, ankle, thigh, and back.
While some areas, such as the neck and fibular region, have lower incidences, they
should not be ignored, as such injuries can be particularly debilitating. Preventive
measures, such as strengthening exercises, stretching, and protective gear, might be
beneficial, especially for highly impacted areas (RQ16).

Teams may use these data to invest in specialized medical care and training routines to
prevent injuries in high-risk anatomical areas. Player agents might be interested in these
data to negotiate contracts, especially ensuring protection and clauses related to injuries
in areas that show significant salary implications. Medical staff can prioritize and tailor
recovery plans based on average recovery times for different injury types.

Since the highest number of injuries is reported in the knee area, teams should consider
specialized training or protective gear to minimize such injuries. Players and agents can
use these data during contract negotiations, ensuring protective clauses, especially for
areas with significant negative salary implications. The reasons behind the positive
salary changes associated with certain injury areas should be explored.

The low variance in metrics at the bottom of the graph may indicate that these areas of
performance are less affected by injuries. For instance, metrics related to player
tracking, such as distance covered, might not be as influenced by injuries as other
performance indicators.

The data suggest that injuries predominantly affect metrics related to possession,
defensive and offensive ratings, and usage percentage. This can be attributed to players
perhaps being more cautious post injury, leading to a decrease in their on-court activity
in these domains.
Interestingly, while some players show a decline in performance post injury in certain
metrics, there is an almost equivalent set of players who demonstrate enhanced
performances in those same areas. This could indicate a compensatory mechanism
where players adapt their playing style post injury, focusing on areas they find more
comfortable or less taxing (RQ17).

This research underscores the physical demands of basketball, particularly on the lower
extremities, and the consequent injury patterns. The findings suggest a need for focused
preventive measures and training modifications, especially for high-risk areas like the
knee, ankle, and thigh.

The impact of injuries on salary and performance metrics highlights the economic and
professional implications of these injuries. Teams and player agents can use these data
for contract negotiations and tailored injury prevention strategies. The variance in post-
injury performance suggests that some players may adapt their playing style to
compensate for physical limitations post injury, indicating a potential area for further
research.

These results contribute valuable insights into injury patterns and their implications in
professional basketball, aiding in the development of more effective injury prevention
and management strategies (RQ18).

6. Conclusions
The "Conclusions" section of the thesis encapsulates the findings and reflections derived
from the comprehensive analysis of basketball players and team performance, the
economic and performance impacts of injuries, age, and player position. This chapter
begins by affirming the importance of analytics in distinguishing efficient players and
optimal team combinations, highlighting that basketball, as a team sport, benefits
significantly from analytical insights. The role of team rotation, the value of bench
players, and the evolution of coaching strategies are discussed, emphasizing their
contributions to team success.

This section then delves into the quantification of uncertainty and luck in sports,
underscoring the challenges in predicting performance outcomes, especially in high-
pressure situations. It critiques and compares various performance indicators, such as
APM and PM, addressing their limitations and the complexities involved in forecasting
player and team performance. The analysis suggested the sophisticated integration of
diverse data sources, including statistical, biomechanical, and wearable metrics, to
minimize biases and enhance the understanding of real performance.

Injuries are given particular attention because of their substantial impact on
performance and finances. The research indicates a weak positive correlation between
injuries and performance, with musculoskeletal injuries being most prevalent. The study
explored the economic implications of injuries, the importance of recovery and rest
management, and the use of various machine learning techniques to optimize prediction
and understanding.

In summary, the "Conclusions" section reaffirms the critical role of Sports
Analytics in basketball, providing valuable insights for decision makers to improve
strategic, financial, and tactical decisions. This approach emphasizes the need for
continuous monitoring, targeted injury prevention strategies, and the integration of
comprehensive data to navigate the intricate landscape of sports performance and
economics. This thesis concludes by acknowledging the multifaceted nature of sports
injuries and their consequences and proposing a nuanced approach to injury
management and player welfare optimization.
6.1 Evaluation of Basketball Players and Team Performance

Basketball is a team sport, which means that the significance of analytics is not only to
distinguish the most efficient players and teams but also to determine the optimal
combination of players and teams for optimizing performance on the court [112].

Team rotation is also important in player selection. Hence, a team has a roster of 12
players who are ready to be productive and efficient for each minute that they play. The
new coaching trend shows that the technical staff desires 12 eligible players to be ready
for each match. In recent years, there have been awards for the 6th player, which means
that the bench players can make enormous differences during the game. Therefore, the
proper balance of team rotation and role distribution is a key factor for team success,
and there is a large difference in that approach compared with the previous decade [206].

The quantification of uncertainty or luck should not be underestimated. The purpose
of this research is to evaluate the most important rating parameters in basketball and
minimize uncertainty. In addition, the clutch factor refers to the ability of a player to
make correct decisions during critical moments or under pressure in the last seconds of
a game. Hence, it is an analytic approach with a large percentage of bias and is difficult
to use for prediction [225]. Specifically, the exploration of different sports analytics and
subsequent evaluation can boost and give extra value to decision-making to understand
each sport in greater depth.

Currently, APM and PM are the most efficient performance indicators. In particular,
APM uses a regression model to calculate the impact of teammates and opponents while
they are on court, but these models do not consider the matchup between players and
their opponents [192]. In addition, neither technique (PM nor APM) addresses
overfitting; therefore, neither can properly compare players who appear very
frequently on the floor with those who appear very rarely.
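The regression idea behind APM, and the way a ridge penalty (as in RAPM) limits overfitting for rarely used players, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the formulation of the cited works: each stint is encoded with +1 for home players and -1 for away players, the target is the home-minus-away margin, and the stints and margins below are made up. The helper names `solve_linear` and `ridge_apm` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ridge_apm(stints, n_players, lam):
    """Ridge-regularized plus/minus ratings.

    stints: list of (home_ids, away_ids, margin), where margin is the
    home-minus-away scoring margin while those players shared the court.
    """
    X, y = [], []
    for home, away, margin in stints:
        row = [0.0] * n_players
        for p in home:
            row[p] = 1.0   # on court for the home side
        for p in away:
            row[p] = -1.0  # on court for the away side
        X.append(row)
        y.append(margin)
    # Normal equations of ridge regression: (X^T X + lam * I) beta = X^T y
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(n_players)] for i in range(n_players)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n_players)]
    return solve_linear(A, b)


# Hypothetical 2-on-2 stints among six players (ids 0-5); margins are made up.
STINTS = [
    ((0, 1), (3, 4), 6.0),
    ((0, 2), (3, 5), 2.0),
    ((1, 2), (4, 5), -4.0),
    ((0, 1), (4, 5), 8.0),
    ((1, 2), (3, 4), -2.0),
]

light = ridge_apm(STINTS, 6, lam=0.1)
heavy = ridge_apm(STINTS, 6, lam=10.0)
print([round(v, 2) for v in light])
print([round(v, 2) for v in heavy])
```

A larger penalty `lam` pulls all ratings toward zero, so players with few stints cannot receive extreme ratings from a handful of possessions. Without the penalty, the +1/-1 encoding makes the unregularized system rank-deficient in this toy example.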

The line-up roster selection and five-player basic roster of a basketball team cannot be
based only on analysed metrics such as the APM, RAPM, LinNet method and other
referred analytics. This analysis is difficult to use for accurate forecasting because there
are many qualitative indicators and player skillsets that cannot be easily specified. In
addition, the prediction of specific matchups can take place with remarkable accuracy
for a few minutes but cannot remain at these high levels for a whole match [191]. The
optimal choice of a line-up is very complex due to the different combinations of players
on the court and the difficulty in finding the best performance indicator in each position,
time, play and opponent [14].

Based on the background, research results illustrate that an aggregation of player/team
statistics, statistical modelling, visual metrics (SportVU camera analysis, heatmaps,
etc.), commentary or social network metrics, biomechanics statistics, training/gym
statistics and wearable metrics in an optimized performance formula could be the most
sophisticated solution for calculating the real performance of teams and players by
minimizing bias as much as possible [93].

The enormous number of varied sources and different sets of data increase the
complexity of interpreting them and of developing patterns for better understanding.
Most of the time, domain experts and technical staff are ex-athletes who try to interpret
games and circumstances. Hence, the critical explanations offered by sports analytics
can help them turn this knowledge into a competitive advantage [41].

In the case study section, the performance of the top 5 players in selected advanced
basketball analytics was analysed, and the yearly nominees for the awards were
validated. Basketball is a complex team game related to several factors, such as playing,
coaching decisions, team chemistry, psychology, sociology, training, marketing, and
health; therefore, it is difficult to estimate the greatest NBA player for each season.

6.2 Impact of Injuries on Basketball Player and Team Performance

The sports industry, and basketball in particular, pays extra attention to small details with
the purpose of avoiding extra costs and maximizing the efficiency of teams and players.
The objective of this work was to perform an in-depth analysis of the relationship
between performance and injury. The statistical analysis of the data indicated a weak
positive correlation between injuries and performance. This implies that injury is just
one of several important variables affecting team and player performance.

This study showed that musculoskeletal system injuries are the most common injuries
in the NBA. Specifically, for the 2010–2020 period, in terms of anatomical areas, lower
extremity issues and trunk and upper extremity injuries were the most frequent and
critical injuries. These findings provided insight into the cause of injury, such as regular
or critical injuries in the ankle or knee area, which deserves further investigation (RQ5).

The authors also examined how much teams and players suffered, as demonstrated by
Figure 4.2 and Table A1, by Table 5.10, Table 5.11, Table 5.12, Table 5.13 and
Table 5.14, and by Table 4.2. Analysing these data leads to the conclusion
that even though performance is evidently negatively impacted by injuries, it is difficult
to quantify this impact, as performance is affected by several other intertwined factors.
Further in-depth analysis examining specific injuries or pathologies on a
player-by-player basis is proposed to achieve better estimations. For example, in the
case of Derrick Rose, a 57.8% decline in performance, as assessed by the PER (avg_per)
analysis, was identified from 2011-12 to 2013-14 after his injury. Overall, injuries and
pathologies are important parameters involved in the performance multivariate model,
and a comprehensive approach is needed for each player and/or team (RQ6).

Hence, there are ways to avoid health issues: understanding bodily and training
thresholds, using rest days appropriately, optimizing and managing load, and
understanding in more depth all the parameters that affect the efficiency of player
and team performance (RQ7). The results of this research indicated that teams with
balanced rest/load management achieved better performance based on basketball
analytics (Table 4.4).

A comparative review of ML and DM techniques and algorithms revealed that the
LASSO and ridge regression methods can be used to optimize prediction. Random
forests need big data to achieve accurate predictions with the purpose of gaining insights
into avoiding injuries. Neural networks and pattern recognition methods identify
complex correlations and patterns through injury analytics. SVMs are used for game
outcome prediction based on injury information. These methods could be combined or
aggregated to produce better insights and optimized accuracy (RQ8). This research
also examined performance analytics based on player absence from games. The
results indicated that XGBoost (for Tree Ensemble and Linear Ensemble), linear
regression and naïve Bayes had high accuracy (Table 5.2, Data Mining Algorithms and
Techniques Used in Sports Analytics, and Table 4.3).
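
To illustrate how ridge and LASSO penalties change a prediction model, the following minimal sketch fits a single centred predictor, for which both estimators have closed forms. The data, the variable names and the penalty value are hypothetical; the thesis models use many predictors and are not reproduced here.

```python
def shrinkage_fits(x, y, lam):
    """Slope estimates for one centred predictor under three objectives.

    OLS   minimizes 0.5 * sum((y - b*x)^2)
    ridge adds the penalty 0.5 * lam * b^2
    LASSO adds the penalty lam * |b|
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    xc = [v - mean_x for v in x]
    yc = [v - mean_y for v in y]
    sxx = sum(v * v for v in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))

    ols = sxy / sxx
    ridge = sxy / (sxx + lam)  # shrinks the slope, never exactly to zero
    # Soft-thresholding: shrinks the slope and sets it to zero when |sxy| <= lam
    soft = max(abs(sxy) - lam, 0.0) * (1 if sxy > 0 else -1)
    lasso = soft / sxx
    return {"ols": ols, "ridge": ridge, "lasso": lasso}


# Hypothetical data: games missed through injury vs. a performance rating.
games_missed = [1, 2, 3, 4, 5]
rating = [10.0, 8.0, 7.0, 5.0, 5.0]

fit = shrinkage_fits(games_missed, rating, lam=2.5)
strong_penalty = shrinkage_fits(games_missed, rating, lam=20.0)
print(fit)
print(strong_penalty)
```

With a large penalty the LASSO slope becomes exactly zero, which is why LASSO is often used for variable selection, whereas ridge only shrinks the coefficient.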

Basketball analytics are used to optimize performance on court by identifying the most
efficient players and teams and by identifying the optimal combination of team pairs
[112]. Team rotation also affects player selection. Hence, a team has a roster of 12
players who are ready to be productive and efficient for every minute they play. The
new coaching trend shows that the technical staff desires 12 eligible players to be ready
for each match instead of the old-fashioned coaching style, which uses 5 to 7 players for
most of the game. In recent years, there has been an award for the 6th player in basketball
associations, which means that bench players can make an enormous impact. Therefore,
the proper balance of team rotation, injury prevention and role distribution is a key
factor for team success, and there is a significant difference between this approach and
perceptions of the previous decade [206].

Injuries negatively impact a team’s finances. Proper player usage management,
workload tracking and monitoring, rest management and recovery based on training
history/status, age, history of injuries, psychology, and stress acceptance can affect
performance. Business understanding can be achieved with the proper analysis of data
requirements and, subsequently, suitable data transformation with the purpose of
modelling valuable information. These data models can provide a competitive
advantage to players, technical staff, scouts, and team owners and can lead them one
step ahead in terms of strategy and tactics management in comparison with opponents’
choices [113].

Game results and championship titles are influenced by player and team management
and use. The small details considerably impact the results. Therefore, basketball and
sports in general blend various uncertainties. Injuries are an important factor because
they are associated with playing and substitutions but also impact club finances (RQ7).
Reducing this ambiguity can lead to process optimization and improved player and team
efficiency [13].

6.3 Economic and Performance Impact of Injuries, Age and Position on NBA Players

The aim of this work was to help decision makers assemble their vision and strategy,
make appropriate budget allocations and investments, and win championships with the
minimum cost in terms of money and time. Age and position criteria are factored into
player wage estimation based on advanced basketball analytics. In addition, injury
history and health problem patterns can help technical staff structure their training, load
management, psychological encouragement, recovery procedures, tactics, and nutrition
plans.

The avoidance of injuries is significant in sports. According to the findings,
musculoskeletal injuries account for 47.2% of the total health problems and pathologies
in the NBA League (RQ9). Based on the age criterion, clubs can form their strategy and
allocate their budget accordingly (RQ10).

One of the main objectives was to establish relationships among economic metrics,
demographics, injuries, and pathologies using advanced basketball
statistics. Sports performance is a multivariate model that is difficult to predict because
of its extensive uncertainty. Some of the parameters involved include physical shape,
psychology, social background, lifestyle, mental status, playing usage and monetary
traits [236]. There are many interdependencies and correlations among these
parameters. Therefore, the purpose of analytics is to minimize risk and forecast
uncertainties with the purpose of maximizing efficiency [3].

The player position is important not only for starting line-up selection but also for
rotation management and for financial reasons (RQ11). The performance of NBA
players aged between 27 and 29 years peaked based on advanced basketball analytics,
while they obtained the highest average salary between 29 and 30 years (RQ10 and
RQ11). Additionally, there are fewer NBA top-level players in the “Center” position
(21.3% of the total), but they earn more ($5.69 million) than players in the “Forwards”
($5.1 million) and “Guards” ($4.48 million) positions. Therefore, “Centers” dominate the
NBA League in terms of highest annual earnings, but it is difficult to discover “Centers”
combining quality and performance characteristics (RQ11).

Basketball is a team sport that focuses on minor details prior to and during a match. For
example, good or bad choices, effective game tempo, conditions in the first minutes
of a match, and efficient offensive and defensive matchups can give a club
advantages or disadvantages against opponents. There are also
players who influence a match with their energy, motivation, and psychological
support to the whole team. These are characteristics that data analysts cannot quantify
and allocate to statistical categories but that need to be combined and aggregated into a
sophisticated model. However, domain experts, with the help of data scientists, can
recognize important patterns through the game to increase productivity and minimize
any upcoming bias [7]. Nonetheless, data experts and technical staff can use advanced
analytics for the evaluation, tracking, monitoring, and forecasting of team and player
performance. Based on these analyses, technical staff can make significant decisions
about roster management, team culture fit, strategy, vision, scouting, future transfers,
and new talent acquisitions [200].

6.4 Economic Impact of Injuries and Recovery Assessment

The anatomical impact assessment showed consistent involvement of the knee, ankle,
and thigh areas across all the game series, suggesting that these may be critical areas for
athletic performance or injury prevention. In terms of the performance metrics, all the
selected metrics consistently demonstrated statistical significance, with p values less
than 0.05, indicating that these metrics were significantly different across games.
Notably, the percentage changes in metrics, such as 'BLK_TRADITIONAL' in Game 2
and 'PCT_PF_USAGE' in Game 5, exhibited substantial positive changes, while
'E_NET_RATING_ADVANCED' demonstrated the most significant negative change
in Game 5 and most positive change in Game 10, indicating a possible area of volatility
or targeted improvement. Interestingly, 'DEF_RATING_ADVANCED' exhibited a
persistent decrease, warranting further investigation. These findings
underscore the importance of continuous monitoring and analysis of both anatomical
and performance metrics to optimize athlete performance and well-being over time
(RQ12).
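
The comparison of a metric before and after an injury, together with a p value, can be sketched as follows. The thesis does not state in this section which statistical test produced its p values, so the exact sign-flip permutation test below is a stand-in chosen for illustration, and all metric values are hypothetical.

```python
from itertools import product


def mean_pct_change(before, after):
    """Percentage change of the group total (equivalently, the mean)."""
    return (sum(after) - sum(before)) / sum(before) * 100.0


def sign_flip_p_value(before, after):
    """Exact two-sided paired permutation test on the sum of differences.

    Under the null hypothesis each paired difference is equally likely
    to be positive or negative, so every sign assignment is enumerated.
    """
    diffs = [a - b for a, b in zip(after, before)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-player values of one metric before and after an injury.
before = [100.0, 102.0, 98.0, 105.0, 101.0, 99.0, 103.0, 100.0]
after = [101.0, 104.0, 99.5, 105.5, 103.5, 100.0, 106.0, 102.0]

p_value = sign_flip_p_value(before, after)
pct = mean_pct_change(before, after)
print(f"change: {pct:.2f}%, p = {p_value:.4f}")
```

The enumeration grows as 2^n, so this exact form is only practical for small groups; for larger samples a random subset of sign assignments would normally be drawn instead.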

While the range of recovery times and the total sum of team losses were observed, the
analysis does not offer a straightforward interpretation of the relationship between these
variables. Future research should aim to incorporate additional data points, such as the
severity of injuries, the number of players affected, the financial structure of each team,
the particular sport in question, and any insurance compensation received.

Moreover, understanding the context of the losses - whether they pertain to missed
games, decreased ticket sales, or other factors - is crucial for a more comprehensive
analysis. Finally, longitudinal studies tracking these variables over multiple seasons
could provide insights into whether the observed patterns are consistent and whether
recovery times have a direct or indirect impact on the financial outcomes of sports
teams. It would also be beneficial to include post-recovery performance metrics to
evaluate the effectiveness of recovery time on player performance (RQ12-15).

The analysis revealed significant disparities in recovery times and financial impacts
across different anatomical sub-areas. While some injuries may require longer healing,
others may have a greater economic impact regardless of recovery duration. Recovery
time does not necessarily correlate directly with financial impact, highlighting the
complex nature of sports injuries and their consequences. These data underscore the
importance of targeted injury prevention and management strategies for different
anatomical sub-areas. Financial implications of injuries extend beyond direct healthcare
costs and can significantly affect a team’s finances. For a more comprehensive
understanding, further research should incorporate additional variables, such as the
frequency of injuries per anatomical area, the average contract value of injured players,
and the timeline of injuries in relation to the sports season. Additionally, a more in-depth
analysis might consider the role of player insurance policies and the effect of player
absence on team performance and revenue streams (RQ13-15).

The analysis of the injury distribution across anatomical sub-areas reveals significant
differences in the frequency of injuries associated with defensive and offensive actions,
as well as other Non-specified (Misc) activities. Certain anatomical areas, particularly
the 'Pelvic' and 'Fibular' regions, are more susceptible to injury, emphasizing the need
for focused prevention strategies. The large number of 'Misc' injuries suggests a
diversity of incidents that occur outside of standard defensive or offensive plays,
highlighting the multifaceted nature of sports injuries. The 'Rating' column's data are
not directly interpretable without further context, but a high score might be indicative
of more severe or costly injuries. To further this research, it would be beneficial to
integrate these data with other datasets that provide additional context, such as the type
of sport, level of play, condition of the player, and other potential risk factors.
Additionally, qualitative data regarding the circumstances of each injury could provide
insights into the causal factors and help tailor preventative measures more effectively
(RQ14).

Throughout the three examined game series, the data consistently highlighted the upper
arm and forearm areas as the areas most prone to injury, suggesting a critical focus for
prevention and protection efforts. The persistently high risk associated with defensive
plays calls for specialized training and potentially revised play strategies to mitigate
injury risks. The evolving understanding of injury severity through more frequent
ratings by the 10th game series provides valuable insights into the areas requiring the
most immediate attention. Overall, the conclusions emphasize the necessity of
continuous improvement in player safety measures, which should be informed by an
ever-growing and precise body of performance analytics data (RQ12 and RQ14).

Sports injuries, beyond their immediate physical toll, also affect
performance and economics in basketball. Recognizing these nuances can empower
stakeholders, from medical staff to team management, to strategize injury
management, optimize player welfare, and formulate robust team strategies.

6.5 Injury Patterns and Impact on Performance in the NBA League
Using Sports Analytics

This study represents a significant advancement in sports analytics, highlighting the
criticality of injuries in basketball, particularly in high-risk areas such as the knee, ankle,
and foot. It underscores the importance of preventive measures and the strategic
management of player injuries (RQ16).

A notable aspect of the analysis revealed a direct correlation between injuries in specific
anatomical regions and significant variations in players’ salaries, underscoring the
profound economic implications these injuries bear. A particular focus was placed on
their prevalence, the anatomical regions most affected, and their consequential impact
on player performance metrics. It was revealed that injuries predominantly occur in the
knee, ankle, and thigh areas, reflecting the intense physical demands placed on lower
extremities in basketball (RQ17).

In particular, the “DEF_RATING_ADVANCED” and “OFF_RATING_ADVANCED”
metrics are significant, suggesting that they are crucial performance indicators. The
novelty of this research lies in its comprehensive analysis that integrates injury data with
player performance metrics, offering new insights into the impact of injuries on
performance (RQ18).

While exploring the sequence of injuries, the study did not identify a consistent pattern
that would suggest a predisposition for subsequent injuries in specific anatomical
regions. However, this observation calls for further research with a more extensive
dataset to dive deeper into the interconnections of injuries.

The data also revealed intriguing variations in players’ performance metrics post injury.
While some players exhibited a decline in certain metrics, an equal number
demonstrated enhanced performance metrics in the same areas. This indicates a
potential adaptive mechanism, wherein players modify their playing style post injury,
potentially shifting focus to less physically taxing elements of the game.

The findings significantly contributed to the expanding field of sports analytics, offering
valuable insights into the patterns and consequences of injuries in professional
basketball. These insights can assist teams, medical professionals, and players in
formulating more effective injury prevention strategies and targeted rehabilitation
programs and making informed decisions about player health and performance.
Furthermore, this study lays the groundwork for future research, especially in exploring
the long-term effects of injuries on players’ careers and the sport’s economic dynamics
(RQ16-18).

This study provides vital information for teams and player agents, informing enhanced
medical care, injury prevention strategies, and contract negotiations that consider the
potential financial effects of injuries. Particularly, the findings about the knee area,
which experiences the most injuries with significant salary implications, demand further
exploration to understand underlying factors, potentially including insurance and post-
injury performance.

In conclusion, this research emphasizes the need for a data-informed approach in
basketball for injury prevention, strategic contract discussions, and tailored recovery
plans, paving the way for future studies to explore the real-world implications of these
findings.

7. Future Work
Sports Analytics can be applied in innumerable forms, such as social engagement,
performance biomechanics analysis, psychological and physical metrics, and the
aforementioned critical analysis of advanced sports statistics, so that technical staff and
domain experts can better understand games and improve processes and methodologies
[19], [14].

Predictive analytics can be applied for forecasting purposes through different factors to
understand teams’ and opponents’ strengths and weaknesses. Beyond these
previous studies, further work can address the physical,
psychological or injury aspects of the available metrics for player and team prediction.
For the technical team and coaches, it is a great opportunity to anticipate adverse
situations that could harm team performance. One study focused on NBA player
psychology and behaviour, measuring athletes’ social network activity and
correlating it with players’ performance in future games. Sentiment analysis was performed
on those online social posts to understand the thoughts and behaviour of the players and
to summarize the conclusions in useful reports.
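
As a rough illustration of sentiment analysis on social posts, the sketch below scores a post with a tiny word lexicon. The word lists, the scoring rule and the example posts are assumptions made for this example; the study cited above does not necessarily use a lexicon-based method.

```python
import re

# Hypothetical mini-lexicons; a real study would use a validated resource.
POSITIVE = {"great", "win", "happy", "strong", "confident"}
NEGATIVE = {"tired", "loss", "injured", "frustrated", "sad"}


def sentiment_score(post: str) -> float:
    """Score in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z']+", post.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # no lexicon words found: treat the post as neutral
    return (pos - neg) / (pos + neg)


posts = [
    "Great win tonight, feeling strong!",
    "Tired and frustrated after the loss",
    "Practice at noon",
]
scores = [sentiment_score(p) for p in posts]
print(scores)
```

Scores computed per post could then be averaged over the days before a game and compared with the player's performance in that game.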

Motion capture technologies are already popular; they involve tremendous data collection
and the ability to track every team or player movement on the court in milliseconds.
Game statistics, sensor data from wearable devices, and computer vision data from SportVU
cameras can be aggregated using DM and ML techniques to produce useful
findings that could affect sports decisions at a
noteworthy level [237].

Basketball is a sport of decisions: team selection, training, a player’s psychological
state, a possession, a deflection, a pass or a shot can have a serious impact on
performance and can directly affect the result of a game [196]. In addition, potential
future work could include the analysis and calculation of expected possession value
(EPV), which rates and evaluates each decision made during the basketball game.
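
In a simplified form, the expected value of a possession at a given moment is the probability-weighted average of the points expected after each available action. The sketch below shows only that weighting. The action names, probabilities and point values are hypothetical, and a full EPV model would estimate them from tracking data.

```python
def expected_possession_value(options):
    """Probability-weighted expected points over the available actions.

    `options` maps an action name to (probability, expected_points).
    """
    total_prob = sum(p for p, _ in options.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("action probabilities must sum to 1")
    return sum(p * points for p, points in options.values())


# Hypothetical situation: ball handler at the top of the key.
options = {
    "shoot": (0.3, 1.05),
    "pass": (0.5, 0.98),
    "drive": (0.2, 1.10),
}

epv = expected_possession_value(options)
# Value added by choosing one action relative to the moment's overall EPV.
drive_added_value = options["drive"][1] - epv
print(f"EPV = {epv:.3f}, drive adds {drive_added_value:+.3f}")
```

Comparing the expected points of the chosen action with the overall value is one way to rate an individual decision.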

According to the current research and the two suggested formulas (API and DPI), further
optimization based on empirical results can apply specific weights to each algorithmic
parameter and produce relevant results.
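
Applying weights to the parameters of a composite rating can be sketched as a normalized weighted sum. The component names, values and weights below are purely hypothetical and do not reproduce the API or DPI formulas proposed in this thesis.

```python
def weighted_index(metrics, weights):
    """Weighted average of metric values; weights need not sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(metrics[name] * w for name, w in weights.items()) / total_weight


# Hypothetical components already scaled to the range [0, 1].
player_metrics = {"scoring": 0.8, "rebounding": 0.6, "defence": 0.4}
weights = {"scoring": 2.0, "rebounding": 1.0, "defence": 1.0}

index = weighted_index(player_metrics, weights)
print(f"composite index: {index:.2f}")
```

In an empirical setting the weights would be tuned, for example by checking how well the resulting index predicts a held-out outcome.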

Finally, sports analytics in different domains can directly feed betting intelligence
systems that forecast player and team results to maximize profits and accuracy [211],
[178], [60] and [238]. Future research will focus on the construction of a formula with
combined weighted basketball ratings based on play-by-play data with the purpose of
optimizing not only performance evaluation but also forecasting accuracy.

Accurate injury prediction is difficult. This uncertainty merits detailed examination in
future projections. It involves many parameters for research, in terms of cost savings
due to injuries but, more importantly, in terms of maintaining a healthy and stable team roster [3].

Further work could seek an “injury factor” based on a structured form, with an ontology
for injuries involving related hierarchies and taxonomy, that will help data entry, as well
as injury identification and classification. These findings and ontologies will help data
scientists and domain experts identify injury patterns, reproducibility and reasoning and
provide important insights into new injuries.

Physiological characteristics, in terms of physical and training improvement, can be
assessed through video data analysis that evaluates the movements of athletes with pattern
recognition. In addition, GPS sensors and wearables can be used to monitor and track
activities in more specialized and focused ways than other teams and
players do. Metrics can be improved using advanced basketball analytics, sleep patterns,
health analytics and other important biometric analytics. Therefore, limiting injuries
could optimize athlete performance and maximize player and team efficiency [14].

Future work can focus on the available psychological, physical, or additional injury
metrics for prediction. The technical team and coaches could be assisted in anticipating
situations that could affect team performance. One study on NBA player psychology
and behaviour measured athletes’ social network activity and correlated it with
performance in future games [19]. Sentiment analysis can also be conducted on social
network posts to understand the psychological status and behaviour of players. This
work showed that injuries are events with high uncertainty and unpleasant moments but
can also be used in the future for pattern recognition [239].

In addition to the current research, further work can be applied to a variety of Sports
Analytics Predictive or Descriptive segments. Possible areas of future focus include the
impact of psychology on player and team performance, medical and health issues,
recovery, tactics, biometrics, social factors, nutrition and leadership. These can be studied not
only for basketball but also for other team sports. Examining sports data analysis for
individual sports is an appealing alternative opportunity to identify patterns and
insights.

For club owners, management, and technical staff, accurate predictions of uncertain
situations that could be costly, either financially or in terms of performance, are vital.
Psychological and behavioural analysis of athletes is also vital for mentally preparing
them. Currently, the engagement of players and teams through social networks is
extensive, motivating sports data scientists to focus on them with the purpose of
retrieving valuable information and insights. Studying social networks can facilitate
behavioural analysis, persona segmentation, and pattern recognition. These events
can be associated with fan and team activities for descriptive analysis but also with
associations and predictions for the future for a variety of attributes, as discussed
previously.

There are many platforms and applications that use data to quantify biometrics,
muscular soreness, sleep quality, physical condition, nutritional status, fan engagement,
training, marketing analytics, and social analytics. The purpose is not only for data
visualization but also for sports businesses to focus on avoiding extra costs. In addition,
the convergence of science and technology can reduce extra investments and optimize
the productivity and effectiveness of teams’ and players’ daily routines.

The COVID-19 pandemic created new standards worldwide and dramatically
influenced every sport. For that reason, the sports industry should adjust training,
fan engagement, and contact conditions and focus on every detail that could affect any
participant on each team. Data science and analytics can similarly help to minimize any
upcoming risk among sports members and aid with a wide range of physical and mental
issues that could provoke economic, mental and health problems [240].

Additionally, injury factors were identified via prescriptive analysis. Big data analysis
can reveal which game period of a sport causes the most injuries and whether fatigue
plays a significant role in this health problem. Furthermore, it is useful to understand how
much player fatigue can impact the last minutes of a game and whether the leadership and
mental skills of a player can alleviate this impact. Finally, quality of sleep,
long trips and number of trips can influence whether players and teams perform well.
Analysis of this topic can help technical staff in rotation and load management with the
purpose of keeping the team fresh. In summary, an aggregation of the following
segments could provide perspectives and areas for future focus to maximize
performance.

7.1 Advanced predictive and prescriptive Sports Analytics approaches

Advanced predictive and prescriptive Sports Analytics based on box-score statistics
were used to analyse current and past performance. The predictive models in Sports
Analytics are based on assumptions drawn from a large and diverse data pool. In
Data Science in general, and in sports more specifically, there is a move towards
predictive and prescriptive analytics instead of the descriptive analysis that is most
common in sports ecosystems [241].

Most related studies on sports in the bibliography are descriptive. Hence, they analyse
and evaluate the past and do not provide any guidance for the future. Prescriptive and
predictive analytics can be used to model and propose optimizations,
recommendations, and forecasts [14], [42].

7.2 GPS, biometric, and wearable sensor data analysis

GPS and wearable sensor data analysis through sophisticated algorithms. Artificial
Intelligence methodologies and techniques are increasingly used in the sports industry.
Wearable sensors are used for pattern recognition, heart defect detection, auditing and
monitoring of performance, biometrics, psychological stress factors and other training
attributes. Furthermore, wearable wristbands can also predict fatigue and other
important attributes, such as sleep quality, activity, action, and effectiveness [157], [58].

Biometric analysis for training and health consistency optimization. Biometric
information is collected from various machines and technologies, such as wearable
devices. Physicians and medical staff can analyse these data to better understand both
physical and physiological athlete traits. In addition, they can monitor and track athletes
in detail, quantify risk and avoid unexpected circumstances [37].

7.3 Tactics, Strategy and Technical analysis

Tactics analysis using state-of-the-art algorithms for decision making. As the
complexity of tactical analysis has increased, state-of-the-art algorithms have been used
for specific fields of interest. Tactical and technical analyses validate the importance of
these methods for identifying the strengths and weaknesses of teams and opponents
[157].

Tactical analysis of offensive and defensive plays helps domain experts script
athletes’ behaviour both on and off the court. Some athletes
have the ability to adapt easily to tactical strategies directed by technical staff.
Different sports also have different types of strategies. For example, the distractions that exist
in football can be different from those in tennis. A timeout taken by the coach during
a playing phase can change the rhythm, the level of control or the strategy, all of which
are correlated with performance [46], [242] and [243].

7.4 Health, nutrition, and injury implications

Health and injury data analysis to avoid future mishaps and speed up recovery. Injury
forecasting and the identification of upcoming risks are performed with AI techniques such as
Neural Networks, SVM, Markov processes and Decision trees. Cricket, soccer,
basketball, volleyball, baseball, and handball are the sports in which such AI
applications are most common [157].

A specialized nutritional approach is used based on the needs, potential and position of
each player. In sports, training and match activities, psychology, sleep, and nutrition are
key elements of injury prevention [25].

7.5 Video data analysis

Video (SportVU) data analysis for tactics, scouting, player, and team optimization.
SportVU cameras were launched in 2008 [207] in the professional league of the NBA
to retrieve all on-court data. The purpose of this approach is to help improve
insights into player performance, tactical analysis, and basketball strategy. Through this
technology, domain experts can know at every moment where
the ball is located and how each player moves, which allows them to structure hot zones and
produce visualizations that explain the data. According to these indicators,
decision makers can quantify the impact of each player and team and aid in game
preparation for short- and long-term assessments [244].
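
Structuring hot zones from tracking data amounts to binning court positions into grid cells and counting how often each cell is occupied. The sketch below assumes positions are given as (x, y) coordinates in feet on a 94 by 50 foot court; the sample positions and the cell size are hypothetical and do not come from SportVU data.

```python
from collections import Counter


def hot_zones(positions, cell=10.0):
    """Count tracked positions per square grid cell of side `cell` feet."""
    return Counter((int(x // cell), int(y // cell)) for x, y in positions)


# Hypothetical (x, y) positions of one player, in feet.
positions = [
    (5.0, 5.0), (7.0, 3.0),
    (25.0, 40.0), (26.0, 41.0), (28.0, 44.0),
    (90.0, 10.0),
]

zones = hot_zones(positions)
busiest_cell, visits = zones.most_common(1)[0]
print(f"busiest cell: {busiest_cell} with {visits} samples")
```

The resulting counts can be drawn as a heat map over the court to visualize where a player spends time or takes shots.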

Therefore, these tracking systems with high frame rates generate
information that plays a key role in the formulation of semantics. Through this approach,
the systems can retrieve sports plays in the magnitude requested and apply them to
a wide range of behavioural, statistical and strategy territories [95].

7.6 Trips, workload, sleep, and fatigue correlation

Trip duration, games and training workload and fatigue correlation. Players and teams
at the highest level need to travel quite often, spending a lot of time on trips from one
game to another. According to previous studies, playing back-to-back games during a season
can increase the frequency of injuries [209].

Sleep analysis to recognize sleep patterns and optimize sleep quality. The correlations
between quality of sleep, psychology and game results can provide important
insights. Sleeping habits, lifestyles, and fatigue are related to a weakened immune
system [139].

Hence, an adequate mix of trips, training workloads and fatigue can maximize
performance and can be a subject of further investigation. According to previous studies,
with heavy workloads and many matches, a player can reach 120
games per year (~1 game every 3 days). The aforementioned risks can lead to serious
injuries if the technical staff does not handle them appropriately [209]. The trip
schedule, different time intervals, jet lag, rest management, diet, hormone release, heart
rate, and levels of anxiety and fatigue can be interrelated in many aspects [139].
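
Schedule density can be quantified directly from game dates: 120 games in a 365-day year correspond to about one game every 3.04 days, and games on consecutive days can be counted from the gaps between dates. The sketch below uses a hypothetical list of dates, and the function name is an assumption made for this example.

```python
from datetime import date


def schedule_load(game_dates):
    """Return (games on the day after a previous game, mean gap in days)."""
    days = sorted(game_dates)
    gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
    consecutive_day_games = sum(1 for gap in gaps if gap == 1)
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return consecutive_day_games, mean_gap


# Hypothetical stretch of a schedule.
games = [
    date(2024, 1, 1), date(2024, 1, 2),
    date(2024, 1, 5), date(2024, 1, 6),
    date(2024, 1, 9),
]

consecutive, mean_gap = schedule_load(games)
days_per_game_at_120 = 365 / 120
print(consecutive, mean_gap, round(days_per_game_at_120, 2))
```

Both figures could be joined with travel distance and injury records to study how schedule density relates to injury frequency.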

7.7 Social network analysis

Social network analysis to understand social behaviour and psychological status.
Recent trends in data science, sports analytics and social
networks show a strong need to apply machine learning and data mining techniques
and to examine any potential influence of social networks [75], [245].

An attempt can be made to predict, through data mining and machine learning
algorithms, game results and player/athlete performances. The goal is to create an
integrated methodology that will encapsulate all the above, introducing a sufficient
degree of innovation in such a way that it is possible to apply it to other fields beyond
sports. More specifically, basketball will be researched by evaluating and applying
algorithms from the respective literature. These algorithms use metadata to measure the
performance of teams and players and attempt to produce an estimate of how this
performance is affected by extensive exposure to social networks (Facebook, Instagram,
Twitter, etc.) [246], [247].

In its final stage, the proposed methodology will provide valuable information on the
performance of teams and players so that they can be used as advanced evaluations to
predict the best team composition and the careers of athletes [248].

7.8 Budget control, investments, Risk Management, and forecasting
analysis

Budget control, investments, and forecasting analysis. The comprehensive measurement
of player efficiency and the potential for a high-performance career affect the financial
aspects of contracts, budgets, and final decisions in the sports industry [90], [245].
Therefore, sophisticated data analysis can formulate forecasts of these investments and
allocate the provided budget in a conscious way.

Risk management and mitigation actions. The uncertainty that exists in sports is a high
risk that needs to be analysed and specified accordingly. With a better understanding of
all required parameters in each sport, better mitigation actions can be organized
according to the risk management approach [176], [25], [209]. Measuring risk factors
such as psychology or injuries can help individuals avoid unforeseen bad situations and
improve decision-making.

7.9 Leadership and clutch skills

Identification of leadership and clutch skills for teams and players. The clutch trait is
difficult to quantify. Usually, the superstars of each team take the responsibility of
leading the last possessions or taking the big shots of a match. Some studies
show that, in some cases, superstars are not capable of taking crucial shots
in the last minutes of games. Being a leader or even a clutch player requires
psychological preparation in addition to exceptional athletic capabilities or leadership
skills [36], [1], [225].

The quantitative and qualitative analysis of clutch performance in such an
anxiety-inducing setting is difficult. “Clutch” is defined as a substantial successful
performance in a very high-pressure circumstance [249], [46], [250]. Hence, identifying
among all athletes on a team those characteristics that can lead to more wins is
important for domain experts.

Appendices
We include here various tables with basketball metrics for Rating KPIs (Table 0.1),
Defensive criteria (Table 0.2), Offensive criteria (Table 0.3), Overall Performance
criteria (Table 0.4), a Comparison matrix for basketball performance analytics (Table
5.1), and DM algorithms used in Sports Analytics (Table 5.2), classified and categorized
from multiple sources such as [119], [128], [206], [251], [252], [129], [253] and
[254]. Table 5.3 shows the MVP, Best Defender, Top Scorer, Top in Assists, Top in
Steals, Top in Rebounds and the 3 best teams of the year for two seasons (2017-18 and
2018-19). We also include two radial charts on percentage values and logarithmic
normalization (Figure 5.1) for the top 5 performers of the 2018-19 season and a list
of abbreviations used for basketball analytics (Table 8) (RQ3).

Table 0.1: Advanced Rating KPIs

Glossary | Description | Metric Type | Formula | Explanation
eFG% | Effective Field Goal Percentage (Values of avg ranges from 0 to 70) | Player, Team | (FG + (0.5 * 3P FG)) / FGA | One of the recognized Four Factors. Measures field goal percentage adjusting for made 3-point field goals being 1.5 times more valuable than made 2-point field goals. (EFF FG%)
+/- (PM) | Plus Minus (Values of avg ranges up to 12) | Player | TPOC – OPOC (Team Points on Court vs Opponent Points on Court) | The point differential when a player or team is on the floor. A box-score estimate of the points per 100 possessions a player contributed above a league-average player, translated to an average team.
Adj. +/- (APM) | Adjusted Plus Minus (Values of avg ranges up to 12) | Player | TPOC48 – TPOC48 (Points on Court per 48 minutes versus Points off Court per 48 minutes) | The prediction is the difference in efficiency of the home team against the opponent in points per 100 possessions
AR | Assist Ratio (Values of avg ranges from 0 to 35) | Player, Team | (Assists x 100) divided by [FGA + (FTA x 0.44) + Assists + Turnovers] | The percentage of a player's possessions that end in an assist.
EFF | Efficiency (Values of avg ranges from 0 to 36) | Player, Team | (PTS + REB + AST + STL + BLK − Missed FG − Missed FT − TO)/GP | Composite efficiency statistic covering offensive and defensive contribution
EWA | Estimated Wins Added (Values of avg ranges from 0 to 31) | Player, Team | Value Added divided by 30 | The estimated number of wins a player adds to a team’s season total above what a 'replacement player' would produce.
FP | Fantasy Points (Values of avg ranges from 0 to 62) | Player, Team | (1 * PTS) + (1.2 * TRB) + (1.5 * AST) + (3 * STL) + (3 * BLK) - (1 * TO) | The number of fantasy points a player accumulates
GmSc | Game Score (Values of avg ranges from 0 to 21) | Player, Team | PTS + (0.4 x FG) – (0.7 x FGA) – (0.4 x (FTA – FT)) + (0.7 x OREB) + (0.3 x DREB) + STL + (0.7 x AST) + (0.7 x BLK) – (0.4 x PF) – TOV | It is intended to give a “total perspective” on a player’s statistical performance in a basketball game, considering every statistic listed on a player’s box score.
NetRtg | Net Rating (Values of avg ranges up to 17) | Player, Team | OFFRTG – DEFRTG = (100*PTS/(Team FGA + Team TOV + (0.44*Team FTA) – Team OREB)) – (100*Opp PTS/(Opponent FGA + Opponent TOV + (0.44*Opponent FTA) – Opponent OREB)) | Net Rating (NetRtg): calculates a team's point differential per 100 possessions. On player level this statistic is the team's point differential per 100 possessions while he is on court.
PER | Player Efficiency Rating (Values of avg ranges from 0 to 33) | Player, Team | Step 1, uPER calculation: uPER=(1/M)*(TP+(2/3)*A+(2-factor*(TA/TFG)*FG)+FT*0.5*(1+1-(TA/TFG)+(2/3)*(TA/TFG))-VOP*T-VOP*DRB%*(FGA-FG)-VOP*0.44*(0.44+(0.56*DRB%))*(FTA-FT)+VOP*(1-DRB%)*(TRB-OREB)+VOP*DRB%*OREB+VOP*S+VOP*DRB%*B-PF*((LFT/LPF)-0.44*(LFTA/LPF)*VOP)). Step 2, Factor and VOP calculation: Factor = (2/3) - (0.5*(LA/LFG))/(2*(LFG/LFT)); VOP=LP/(LFGA-LOREB+LT+0.44*LFTA). Step 3, Pace and league adjustment for PER: PER=(uPER*(LPace/TPace))*(15/LuPER) | PER calculates all positive and negative accomplishments in a per-minute rating of player and team performance. Player Efficiency Rating is the overall rating of a player's per-minute statistical production. The league average is 15.00 every season. Pace adjustment = lg_Pace/team_Pace; estimated pace adjustment = 2 * lg_PPG/(team_PPG + opp_PPG); aPER = (pace adjustment) * uPER
PIE | Player Impact Estimate (Values of avg ranges from 0 to 25) | Player, Team | (PTS + FGM + FTM - FGA - FTA + DREB + (.5 * OREB) + AST + STL + (.5 * BLK) - PF - TO)/(GmPTS + GmFGM + GmFTM - GmFGA - GmFTA + GmDREB + (.5 * GmOREB) + GmAST + GmSTL + (.5 * GmBLK) - GmPF - GmTO) | PIE measures a player's overall statistical contribution against the total statistics in games they play in. PIE yields results that are comparable to other advanced statistics (e.g., PER) using a simple formula.
PIR | Performance Index Rating (Values of avg ranges from 0 to 40) | Player, Team | (PTS + REB + AST + STL + BLK + PFD) – (Missed FG + Missed FT + TOV + BLKA + PF) | It is a metric primarily used in European leagues that attempts to calculate player or team performance.
Pythagorean Win Percentage | Pythagorean Win Percentage | Team | Winning Percentage = GP * (PTS Scored)^16.5 / [(PTS Scored)^16.5 + (PTS Allowed)^16.5]. Daryl Morey's exponent is set to 13.91; John Hollinger's exponent is set to 16.5 | Pythagorean Win Percentage is an estimate of a team's win percentage based on its points for and points against
Real +/- (RPM) | Real Plus Minus (Values of avg ranges up to 12) | Player, Team | It is the player’s or team’s average influence on net point differential per X (100) offensive and defensive possessions. | Player's estimated on-court impact on team performance, measured in net point differential per 100 offensive and defensive possessions. RPM considers teammates, opponents, and additional factors
PIPM | Player Impact Plus Minus (Values of avg ranges up to 9) | Player, Team | {[(ORtg + DRtg) + (AvgORtg + AvgDRtg)] * (Min^2)}/(G*Min), where G*Min = 82*48 for the NBA | PIPM is a plus-minus metric that adjusts the luck part with box-score data. Its 3 components are: box-score prior, luck-adjusted on-off data, and luck-adjusted net rating.
Tendex | Tendex (Values of average ranges from 0 to 0.4) | Player, Team | Standard Tendex Rating (raw statistical formula): (PTS) + (REB) + (AST) + (STL) + (BLK) - (Missed FG) - 0.5 * (Missed FT) - (TOV) - (PF)/(MP)/(Game Pace). Modified Tendex Rating (weighted average statistical formula): {(PTS) + (REB) + 1.25 * (AST) + 1.25 * (STL) + (BLK) - 1.25 * (TOV) - (Missed FG) - (Missed FT/2) - (PF/2)/(MP)}/(Game Pace). | It is one of the first formulas for performance calculation. Tendex is the summary of positive and negative efforts
USG% | Usage Rate (Values of avg ranges from 0 to 40) | Player | {[FGA + (FT Attempts x 0.44) + (AST x 0.33) + TOV] x Total MIN x League Pace} divided by (MIN x Team Pace) | The number of possessions a player is finishing per game.
VA | Value Added | Player, Team | [MIN * (PER - PRL)]/67, where PRL = 11.5 for power forwards, 11.0 for point guards, 10.6 for centers, 10.5 for shooting guards and small forwards | The estimated number of points a player adds to a team’s season total above what a 'replacement player' (for instance, the 12th man on the roster) would produce.
VORP | Value over Replacement Player (Values of avg ranges from 0 to 10) | Player | [BPM – (-2.0)] * (% of MP) * (GP/82) | A box-score estimate of the points per 100 TEAM possessions that a player contributed above a replacement-level player. (RL is -2.0 for an avg team over an 82-game season)
Wins Added | Wins Added | Player, Team | {[Avg(onORtg, onDRtg) + (PIPM/2) ^13.91] Avg(onORtg, onDRtg) + (PIPM/2) ^13.91) + Avg(onORtg, onDRtg) - (PIPM/2) ^13.91)] – RPL} * [MP/Min] | It is the combination of on-court Rtg and the PIPM above a replacement level (RPL)
WS | Win Shares (Values of avg ranges from 0 to 0.3) | Player, Team | (PP-0.92 * LPPP * (FGA+0.44*FTA+TO))/(0.32*LPPG*(TP/LP)) + (MP/TMP * TDP * (1.08 * LPPP-DefRtg/100)/(0.32*LPPG*(TP/LP) | Win Shares is the sum of offensive and defensive Win Shares
WS/48 or 40 | Win Shares (Per 48 or 40 Minutes) | Player, Team | Win Shares divided by minutes played (league average is approximately .100) | An estimate of the number of wins contributed by a player per 48 minutes
PACE | Pace (avg values from 95 to 110) | Player, Team | MP * ((Tm Poss + Opp Poss)/(2 * (Tm MP/5))) | The number of possessions a team uses per game. Pace factor is an estimate of the number of possessions per Minutes Played (MP) by a team.
USG% | Usage Percentage (ranges of avg values from 5 to 40) | Player | (FGA + POSS Ending FTA + TO)/POSS | The % of team plays a player used while on the floor
TS% | True Shooting Percentage (Values from 40 to 75) | Player | PTS / [2 * (FGA + (0.44 * FTA))] | It captures shooting percentage once free throws and 3-pointers are accounted for. The factor 0.44 can be adjusted based on a linear model of past seasons.
TOR | Turnover Ratio | Player, Team | (TOV x 100)/[FGA + (FTA x 0.44) + AST + TOV] | The percentage of possessions that end in a turnover
POSS | Possessions | Player, Team | 0.96 * (FGA) + (TO) + (0.44 * (FTA) - (OREB)) | The number of possessions played by a player or team.
POSS/G | Possession per Game | Player, Team | Total FGA + (0.475*FTA) + TOV – OREB | The number of possessions played divided by the number of games played
REB% or REBr | Rebound Rate (Values of avg ranges from 0 to 25) | Player, Team | (100 x (REBs x Team MIN))/[Player MIN x (Team REBs + Opponent REBs)] | One of the recognized Four Factors. The % of missed shots that a player rebounds. (Rebounding Percentage)
PRA/G | Points, Rebound and Assist (Values of avg ranges from 0 to 50) | Player, Team | (Points + Rebounds + Assists)/Game | The player's average points, rebounds, and assists per game
WAR or WARP | Wins Above Replacement (Values of avg ranges from 0 to 20) | Player | (Win% - RL) * (Min/48), where Win% = TmOffRat^14/(TmOffRat^14 + TmDefRat^14) | The evaluation of a lineup of the player and four average teammates compared with an opposing lineup of four average players and a replacement-level player.
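
As an illustration of how some of the simpler entries of Table 0.1 translate into code, the following minimal Python sketch implements eFG%, TS%, Game Score and the Pythagorean win percentage as they are written in the table. The function and parameter names are assumptions introduced here for illustration and are not taken from the thesis implementation. The Pythagorean function returns the percentage only; multiplying it by the games played (GP) gives an expected number of wins.

```python
def effective_fg_pct(fg: float, fg3: float, fga: float) -> float:
    """eFG% = (FG + 0.5 * 3P FG) / FGA, returned as a fraction (0.6 = 60%)."""
    return (fg + 0.5 * fg3) / fga


def true_shooting_pct(pts: float, fga: float, fta: float,
                      ft_weight: float = 0.44) -> float:
    """TS% = PTS / (2 * (FGA + 0.44 * FTA)), returned as a fraction."""
    return pts / (2.0 * (fga + ft_weight * fta))


def game_score(pts, fg, fga, ft, fta, oreb, dreb, stl, ast, blk, pf, tov):
    """Game Score (GmSc) with the weights listed in Table 0.1."""
    return (pts + 0.4 * fg - 0.7 * fga - 0.4 * (fta - ft)
            + 0.7 * oreb + 0.3 * dreb + stl + 0.7 * ast
            + 0.7 * blk - 0.4 * pf - tov)


def pythagorean_win_pct(pts_scored: float, pts_allowed: float,
                        exponent: float = 16.5) -> float:
    """Pythagorean win percentage.

    exponent = 16.5 follows John Hollinger, 13.91 follows Daryl Morey.
    """
    scored = pts_scored ** exponent
    allowed = pts_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical box-score line, used only to show the calls.
example_ts = true_shooting_pct(pts=30, fga=20, fta=5)
example_gmsc = game_score(pts=30, fg=10, fga=20, ft=4, fta=5, oreb=2,
                          dreb=6, stl=1, ast=5, blk=1, pf=3, tov=2)
```

Returning fractions rather than percentages keeps the functions composable; a caller can multiply by 100 when the value ranges quoted in the table are needed.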

Table 0.2: Defensive criteria - Advanced basketball statistics.
Glossary | Description | Metric Type | Explanation
BLK% | Block Percentage or Block Rate | Player | The % of opponent two-point field goal attempts blocked while he was on the floor. 100*(BLK*(TMP/5))/(MP*(OFGA-O3PA))
DREB% | Defensive Rebound Percentage | Player | The % of available defensive rebounds a player grabbed while he was on the floor.
PF% | Percent of Team's Personal Fouls | Player | The % of a team's personal fouls that a player has while on the court
STL% | Steal Percentage | Player, Team | The % of opponent possessions that end with a steal by the player on the floor. 100*(STL*(TMP/5))/(MP*OP)
BLK | Blocks | Player, Team | A block occurs when a defensive player tips the ball, denying the shooter's chance to score
Deflections | Deflections (Values of avg ranges from 0 to 5) | Player, Team | The number of times the defense gets a hand on the ball on a non-shot attempt
DREB | Defensive Rebounds | Player, Team | The number of rebounds a player or team has collected while they were on defense
STL | Steals | Player, Team | The number of times a defender takes the ball from a player on offense, causing a turnover
DefRtg | Defensive Rating | Player, Team | The number of points allowed per 100 possessions by a team. For a player, it is the number of points per 100 possessions that the team allows while that individual player is on the court. The formula is: 100*((Opp Points)/(Opp POSS)).
DBPM | Defensive Plus/Minus | Player, Team | A box-score estimate of the defensive points per 100 possessions a player contributed above a league-average player, translated to an average team.
DEF EFF | Defensive Efficiency | Team | The number of points a team allows per 100 possessions. The formula is: (100*Opp Points/(Opponent FGA + Opponent TOV + (0.44*Opponent FTA) – Opponent OREB))
DPR | Defensive Player Rating | Player | The formula is: (Player spg + Player bpg/team minutes played) - (times blown by*Pace of Players Era) * Total Average of Possessions + (Players DRTG*Team Pace)/Total number of years played
DRPM | Defensive Real Plus Minus | Player, Team | Player's estimated on-court impact on team defensive performance, measured in points allowed per 100 defensive possessions.
DWS | Defensive Win Shares | Player | The number of wins contributed by a player due to his defense.
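
The closed-form defensive entries of Table 0.2 (BLK%, STL% and DefRtg) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the parameter names simply spell out the abbreviations of the table (TMP = team minutes played, MP = player minutes played, OFGA and O3PA = opponent field goal and 3-point attempts, OP = opponent possessions), and the input values are hypothetical.

```python
def block_pct(blk: float, tmp: float, mp: float,
              ofga: float, o3pa: float) -> float:
    """BLK% = 100 * (BLK * (TMP / 5)) / (MP * (OFGA - O3PA))."""
    return 100.0 * (blk * (tmp / 5.0)) / (mp * (ofga - o3pa))


def steal_pct(stl: float, tmp: float, mp: float, opp_poss: float) -> float:
    """STL% = 100 * (STL * (TMP / 5)) / (MP * OP)."""
    return 100.0 * (stl * (tmp / 5.0)) / (mp * opp_poss)


def defensive_rating(opp_points: float, opp_poss: float) -> float:
    """DefRtg = 100 * Opp Points / Opp POSS (points allowed per 100 possessions)."""
    return 100.0 * opp_points / opp_poss


# Hypothetical single-game line: 36 minutes played in a 240-minute team game.
example_blk_pct = block_pct(blk=2, tmp=240, mp=36, ofga=85, o3pa=25)
example_stl_pct = steal_pct(stl=2, tmp=240, mp=36, opp_poss=100)
example_def_rtg = defensive_rating(opp_points=110, opp_poss=100)
```

The factor TMP/5 converts team minutes into the minutes available to a single roster spot, so both percentages are scaled to the share of the game the player was actually on the floor.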

Table 0.3: Offensive criteria - Advanced basketball statistics.
Glossary | Description | Metric Type | Explanation
3PA% | 3 Point Field Goals Percentage | Player, Team | The % of 3-point field goals attempted while on the court
3PM% | 3 Point Field Goals Percentage Made | Player, Team | The % of 3-point field goals made while on the court
BLKA% | Percent Blocked Field Goal Attempts | Player, Team | The % of own field goal attempts blocked while on the court
FGA% | Field Goal Percentage Attempted | Player, Team | The % of field goals attempted while on the court
FGM% | Field Goal Percentage Made | Player, Team | The % of field goals made while on the court
FTA% | Free Throw Percentage Attempted | Player, Team | The % of free throws attempted while on the court
FTM% | Free Throw Percentage Made | Player, Team | The % of free throws made while on the court
FTr | Free Throw Factor | Player, Team | One of the Four Factors. How often a team goes to the line and how often it makes the free throws. FT/FGA
FTM/FTA% | Percent of Team's Free Throws Made | Player, Team | Team free throw attempts made per field goal attempt
OREB% | Offensive Rebound Percentage | Player | The % of available offensive rebounds a player took while on the floor
PFD% | % of Team's Personal Fouls Drawn | Player, Team | The % of a team's personal fouls drawn by a player while on the court
PTS% | % of Team's Points | Player, Team | The % of a team's points that a player has while on the court
PTS 2PT% | % of Points (2-Point Field Goals) | Player, Team | The % of points scored by a player or team from 2-point field goals
PTS 3PT% | % of Points (3-Point Field Goals) | Player, Team | The % of points scored by a player or team from 3-point field goals
PTS FBPS% | % of Points (Fast Break Points) | Player, Team | The % of points scored by a player or team from fast break opportunities
PTS FT% | Percent of Points (Free Throws) | Player, Team | The % of points scored by a player or team from free throws
2nd PTS | Second Chance Points | Player, Team | The % of isolation plays that shoot free throws of a shooting foul
3PA | 3 Point Field Goals Attempted | Player, Team | The number of 3-point field goals that a player or team has attempted
3PM | 3 Point Field Goals Made | Player, Team | The % of a team's 3-point field goals made while on the court
FBPS | Fast Break Points | Player, Team | The number of points scored by a player or team while on a fast break
FGA | Field Goals Attempted | Player, Team | The number of 2-point field goals attempted

FGA/Poss | FGA/Possession | Player, Team | The shot attempts in each possession.
FGM | Field Goals Made | Player, Team | The number of 2-point field goals made
FTA | Free Throws Attempted | Player, Team | The number of free throws attempted
FTM | Free Throws Made | Player, Team | The number of free throws made
OR/P | Offensive Rebounds/Possession | Player, Team | Offensive rebounds per completed possession
OREB | Offensive Rebounds | Player, Team | The number of rebounds gathered while they were on offense
PFD | Personal Fouls Drawn | Player, Team | The number of personal fouls that are drawn by a player or team
PITP | Points in the Paint | Player, Team | The number of points scored by a player or team in the paint
PTS/Poss | Points/Possession | Player, Team | The points made each time a player or team touches the ball.
PTS | Points | Player, Team | The number of scored points.
PTS Off Tov | Points Off Turnovers | Player, Team | The number of points scored following an opponent's turnover.
OBPM | Offensive Plus/Minus | Player | A box-score estimate of the offensive points per 100 possessions a player contributed above a league-average player, translated to an average team.
OFF EFF | Offensive Efficiency | Team | The number of points a team scores per 100 possessions. The formula is: (100*Points/(Team FGA + Team TOV + (0.44*Team FTA) – Team OREB))
OffRtg | Offensive Rating | Player, Team | Measures a team's or player's (on court) points scored per 100 possessions. (100*Points/(Team FGA + Team TOV + (0.44*Team FTA) – Team OREB))
ORPM | Offensive RPM | Player, Team | Player's on-court impact on team offensive performance in points scored per 100 offensive possessions
OWS | Offensive Win Shares | Player | The number of wins contributed by a player due to offense.
PPP | Points Per Possession (Values of avg ranges up to 1.8) | Player, Team | The number of points a player or team scores per possession. PTS/(FGA+0.44*FTA+TOV)
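
The per-100-possession ratings in Tables 0.1 and 0.3 share the same possession estimate, FGA + TOV + 0.44*FTA – OREB. The sketch below shows how OffRtg, DefRtg, NetRtg and PPP follow from it. The `TeamBox` container and its field names are hypothetical helpers introduced only for this example, and the numbers are illustrative, not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class TeamBox:
    """Team box-score totals needed for the rating formulas (hypothetical)."""
    pts: float
    fga: float
    tov: float
    fta: float
    oreb: float


def possessions(box: TeamBox) -> float:
    """Possession estimate used in the rating formulas: FGA + TOV + 0.44*FTA - OREB."""
    return box.fga + box.tov + 0.44 * box.fta - box.oreb


def rating(box: TeamBox) -> float:
    """Points per 100 possessions for the given side."""
    return 100.0 * box.pts / possessions(box)


def net_rating(team: TeamBox, opponent: TeamBox) -> float:
    """NetRtg = OffRtg - DefRtg, where DefRtg is the opponent's rating."""
    return rating(team) - rating(opponent)


def points_per_possession(box: TeamBox) -> float:
    """PPP = PTS / (FGA + 0.44*FTA + TOV), as listed in Table 0.3."""
    return box.pts / (box.fga + 0.44 * box.fta + box.tov)


# Illustrative totals for one game.
home = TeamBox(pts=110, fga=88, tov=14, fta=25, oreb=10)
away = TeamBox(pts=100, fga=85, tov=15, fta=20, oreb=9)
example_net = net_rating(home, away)
```

Note that PPP, as written in the table, does not subtract offensive rebounds, so it is not simply OffRtg divided by 100.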

Table 0.4: Overall Performance criteria - Advanced basketball statistics.
Glossary | Description | Metric Type | Explanation
AST% | Assist Percentage | Player, Team | The % of teammates' FG that a player assists while on the floor. 100*AST/(((MP/(TMP/5))*TFG)-FG). AST=Assists, MP=Minutes Played, TMP=Team Minutes Played, TFG=Team Field Goals, FG=Field Goals
TOV% | Turnover Percentage | Player, Team | The number of turnovers committed per 100 possessions. One of the recognized Four Factors. 100*TOV/(FGA+0.44*FTA+TOV)
TRB% | Total Rebound Percentage | Player | 100*(TRB*(TMP/5))/(MP*(TTRB+OTRB)). It is a weighted average of total rebounds a player took while he was on the floor.
AST/Poss | Assists per Possession | Player, Team | It shows how well the ball was passed throughout the game.
AST | Assists | Player, Team | The number of passes that lead to a made basket
BLKA | Blocks against | Player, Team | The number of shots attempted and blocked by a defender
DD2 | Double doubles | Player | The number of games in which a player reaches double-digit totals in two of the five statistical categories
TD3 | Triple doubles | Player | The number of games in which a player reaches double-digit totals in three of the five statistical categories
FT/Poss | FT/Possession | Player, Team | The free throw attempts per possession.
FTA RATE | Free Throw Attempt | Player, Team | The number of free throw attempts in comparison to the number of field goal attempts
GP | Games Played | Player, Team | The number of games a team or player played
L | Losses | Player, Team | The number of games lost by a team or player
MPG/MIN | Minutes Played | Player | The number of minutes played by a team or player
PF | Personal Fouls | Player, Team | The number of personal fouls a player or team committed
PRL | Position Replacement Level | Player, Team | PRL = 11.5 for power forwards, 11.0 for point guards, 10.6 for centers, 10.5 for shooting guards and small forwards
REB | Rebounds | Player, Team | The number of total rebounds a team or player has collected on either offense or defense
STL/DP | Steals/Defensive Possession | Player, Team | How many steals a defense gets for each of the opponent’s offensive possessions.
TOV/Poss | Turnovers/Possession | Player, Team | How often a team or player commits a turnover each time it touches the ball.

TOV | Turnovers | Player, Team | A turnover occurs when the player or team on offense loses the ball to the defense
W | Wins | Player, Team | The number of games won by a team or player
%WIN | Win Percentage | Player, Team | W/GP. The percentage of games played that a player or team has won
Loose Ball Rec | Loose Ball Recovered (Values of avg ranges from 0 to 12) | Player, Team | The defensive or offensive actions while trying to secure a loose ball
AST/TOV | Assist to Turnover Ratio (Values of avg ranges from 0 to 3) | Player, Team | The number of assists for a player or team compared to the number of turnovers they have committed
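
A short Python sketch of the percentage formulas in Table 0.4 (AST%, TOV%, TRB%) and of the double-double and triple-double counts is given below. The function names are assumptions made for this example. Counting a triple-double game as a double-double as well is also an assumption, since the table does not state whether DD2 excludes TD3.

```python
def assist_pct(ast: float, mp: float, tmp: float, tfg: float, fg: float) -> float:
    """AST% = 100 * AST / (((MP / (TMP / 5)) * TFG) - FG)."""
    return 100.0 * ast / (((mp / (tmp / 5.0)) * tfg) - fg)


def turnover_pct(tov: float, fga: float, fta: float) -> float:
    """TOV% = 100 * TOV / (FGA + 0.44 * FTA + TOV)."""
    return 100.0 * tov / (fga + 0.44 * fta + tov)


def total_rebound_pct(trb: float, tmp: float, mp: float,
                      team_trb: float, opp_trb: float) -> float:
    """TRB% = 100 * (TRB * (TMP / 5)) / (MP * (TTRB + OTRB))."""
    return 100.0 * (trb * (tmp / 5.0)) / (mp * (team_trb + opp_trb))


def double_digit_categories(pts: int, reb: int, ast: int, stl: int, blk: int) -> int:
    """Number of the five statistical categories with a double-digit total."""
    return sum(1 for value in (pts, reb, ast, stl, blk) if value >= 10)


def is_double_double(pts: int, reb: int, ast: int, stl: int, blk: int) -> bool:
    """True when at least two categories are in double digits (assumption: TD3 counts)."""
    return double_digit_categories(pts, reb, ast, stl, blk) >= 2


def is_triple_double(pts: int, reb: int, ast: int, stl: int, blk: int) -> bool:
    """True when at least three categories are in double digits."""
    return double_digit_categories(pts, reb, ast, stl, blk) >= 3


# Hypothetical game line used to show the calls.
example_ast_pct = assist_pct(ast=8, mp=36, tmp=240, tfg=40, fg=8)
example_tov_pct = turnover_pct(tov=3, fga=15, fta=5)
```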

Table 0.5: Pearson hypothesis correlation with performance and injury analytics.

Performance Analytics Injury and Pathology Analytics Correlation value p value


dws Health problems (to number) 0.203733647 0
vorp Health problems (to number) 0.20250302 0
ws Health problems (to number) 0.198798205 0
pts Health problems (to number) 0.17777791 0
vorp Decision (to number) 0.175652937 0
mpg Health problems (to number) 0.174924883 0
mp Health problems (to number) 0.168113505 0
ows Health problems (to number) 0.167714983 0
age Health problems (to number) 0.163223241 0
ws Decision (to number) 0.158759013 0
dws Decision (to number) 0.157230572 0
per Health problems (to number) 0.155746406 0
bpm Health problems (to number) 0.154361048 0
usg_pct Health problems (to number) 0.147925275 0
mpg Decision (to number) 0.147874715 0
reb Health problems (to number) 0.144807622 0
ast Health problems (to number) 0.143563138 0
pts Decision (to number) 0.141766282 0
age Decision (to number) 0.139955705 0
ows Decision (to number) 0.136814476 0
net_rating Health problems (to number) 0.136644511 0
ws_per_48 Health problems (to number) 0.136627497 0
ast Decision (to number) 0.136620921 0
obpm Health problems (to number) 0.1311837 0
usg_pct Decision (to number) 0.13117092 0
bpm Decision (to number) 0.130693744 0
per Decision (to number) 0.1285853 0
reb Decision (to number) 0.117934248 0
mp Decision (to number) 0.117664308 0
net_rating Decision (to number) 0.113934376 0
g Health problems (to number) 0.110995604 0
ast_pct Decision (to number) 0.110866792 0

ws_per_48 Decision (to number) 0.110397258 0
obpm Decision (to number) 0.106864139 0
dws Organ systems (to number) 0.106758716 0
ast_pct Health problems (to number) 0.105522419 0
dbpm Health problems (to number) 0.102594894 0
vorp Organ systems (to number) 0.101908744 2.22045E-16
ws Organ systems (to number) 0.097689685 2.44249E-15
dbpm Decision (to number) 0.097167679 3.55271E-15
age Organ systems (to number) 0.09015643 2.84217E-13
age Anatomical sub-areas (to number) -0.089672837 3.80448E-13
dreb_pct Health problems (to number) 0.083457486 1.40734E-11
reb Organ systems (to number) 0.082683879 2.16711E-11
fg3a_per_fga_pct Decision (to number) -0.080625669 6.7052E-11
dreb_pct Decision (to number) 0.080128018 8.77431E-11
ows Organ systems (to number) 0.078831405 1.75485E-10
mp Organ systems (to number) 0.074845304 1.38008E-09
ts_pct Health problems (to number) 0.070277343 1.292E-08
per Organ systems (to number) 0.069363988 1.98816E-08
bpm Organ systems (to number) 0.069342822 2.00799E-08
pts Organ systems (to number) 0.06750502 4.7014E-08
dbpm Organ systems (to number) 0.065649338 1.08569E-07
drb_pct Health problems (to number) 0.062810931 3.74188E-07
dws Anatomical sub-areas (to number) -0.062264943 4.71941E-07
mpg Organ systems (to number) 0.061443927 6.66656E-07
dreb_pct Organ systems (to number) 0.058226514 2.47608E-06
ws_per_48 Organ systems (to number) 0.057772346 2.96415E-06
usg_pct Organ systems (to number) 0.056301228 5.26113E-06
fg3a_per_fga_pct Health problems (to number) -0.056259761 5.34584E-06
g Organ systems (to number) 0.053937845 1.28501E-05
stl_pct Health problems (to number) 0.05287222 1.9E-05
g Decision (to number) 0.052080337 2.52903E-05
net_rating Organ systems (to number) 0.052062684 2.54509E-05
dreb_pct Anatomical sub-areas (to number) -0.051530406 3.07747E-05
obpm Organ systems (to number) 0.050901092 3.84348E-05
age Major anatomical areas (to number) -0.050143444 5.00617E-05
vorp Anatomical sub-areas (to number) -0.049117925 7.11823E-05
drb_pct Organ systems (to number) 0.04825817 9.51309E-05
player_weight Decision (to number) 0.047511161 0.000121933
drb_pct Decision (to number) 0.047027887 0.000142908
player_weight Anatomical sub-areas (to number) -0.046443881 0.000172788
stl_pct Decision (to number) 0.046167381 0.000188899
usg_pct Major anatomical areas (to number) -0.044375222 0.000332783
ast Organ systems (to number) 0.043646037 0.000416623
drb_pct Anatomical sub-areas (to number) -0.040134267 0.001174033
fta_per_fga_pct Decision (to number) 0.039983352 0.001225394
dbpm Anatomical sub-areas (to number) -0.039786753 0.001295415
ws Anatomical sub-areas (to number) -0.038080969 0.002077187
trb_pct Health problems (to number) 0.037807769 0.002236661
player_weight Health problems (to number) 0.037737029 0.002279743
orb_pct Major anatomical areas (to number) 0.037007513 0.002770512
usg_pct Anatomical sub-areas (to number) -0.036567991 0.003110959
ast_pct Organ systems (to number) 0.036027463 0.003581753
net_rating Anatomical sub-areas (to number) -0.035587916 0.004011387
player_height Anatomical sub-areas (to number) -0.035275804 0.004344304
oreb_pct Major anatomical areas (to number) 0.034547813 0.005220198
reb Anatomical sub-areas (to number) -0.033888492 0.006147913
stl_pct Organ systems (to number) 0.033854165 0.00620005
bpm Anatomical sub-areas (to number) -0.033585879 0.006621433
trb_pct Organ systems (to number) 0.032572231 0.008455642
player_weight Major anatomical areas (to number) -0.031821649 0.01009376
ts_pct Decision (to number) 0.031363578 0.011227093
player_weight Organ systems (to number) 0.030653644 0.013207344
ts_pct Organ systems (to number) 0.030340712 0.014174171
per Anatomical sub-areas (to number) -0.028859666 0.019646245
blk_pct Organ systems (to number) 0.028507416 0.02119188
stl_pct Anatomical sub-areas (to number) -0.028247593 0.022398827
fta_per_fga_pct Health problems (to number) 0.027434308 0.026570882
trb_pct Decision (to number) 0.02587134 0.036494712
vorp Major anatomical areas (to number) -0.025155282 0.042005134
ws_per_48 Anatomical sub-areas (to number) -0.025124245 0.042259092
player_height Organ systems (to number) 0.023887064 0.053492849
trb_pct Anatomical sub-areas (to number) -0.023593594 0.056495811
fg3a_per_fga_pct Organ systems (to number) -0.023592211 0.056510297
blk_pct Anatomical sub-areas (to number) -0.023253739 0.060147393
orb_pct Decision (to number) -0.022456837 0.069481249
player_height Decision (to number) 0.022280403 0.071700952
mp Anatomical sub-areas (to number) -0.022103126 0.073989431
dws Major anatomical areas (to number) -0.021561294 0.081356304
obpm Anatomical sub-areas (to number) -0.021244279 0.085935534
orb_pct Health problems (to number) -0.01991998 0.107360779
ows Anatomical sub-areas (to number) -0.019283029 0.119073516
pts Major anatomical areas (to number) -0.018035752 0.144885187
fg3a_per_fga_pct Major anatomical areas (to number) -0.017405252 0.159466936
tov_pct Decision (to number) 0.01722846 0.163748543
pts Anatomical sub-areas (to number) -0.016157157 0.191562974
g Anatomical sub-areas (to number) -0.015912807 0.198367601
trb_pct Major anatomical areas (to number) 0.01583842 0.200473785
ast_pct Major anatomical areas (to number) -0.012802184 0.300778859
orb_pct Anatomical sub-areas (to number) 0.012120232 0.327263754
fta_per_fga_pct Anatomical sub-areas (to number) -0.011610022 0.348039546
tov_pct Health problems (to number) -0.010742624 0.385237965
mpg Anatomical sub-areas (to number) -0.010702631 0.387009723
ws Major anatomical areas (to number) -0.01052102 0.395117611
net_rating Major anatomical areas (to number) -0.010057219 0.416284564
fta_per_fga_pct Organ systems (to number) 0.010054429 0.416413865
ws_per_48 Major anatomical areas (to number) 0.009682129 0.433882255
reb Major anatomical areas (to number) 0.008950359 0.469423575
player_height Health problems (to number) 0.008913841 0.471238395
tov_pct Major anatomical areas (to number) -0.0071492 0.563376572
blk_pct Major anatomical areas (to number) -0.006774381 0.58400911
ts_pct Major anatomical areas (to number) 0.006182223 0.617303226
mp Major anatomical areas (to number) -0.005917625 0.632444104
oreb_pct Anatomical sub-areas (to number) 0.005870525 0.635155718
player_height Major anatomical areas (to number) -0.005809271 0.638689565
fta_per_fga_pct Major anatomical areas (to number) -0.005295322 0.668657325
dreb_pct Major anatomical areas (to number) -0.005114549 0.67932794
blk_pct Health problems (to number) 0.004989319 0.686757938
tov_pct Anatomical sub-areas (to number) 0.004912462 0.691333032
fg3a_per_fga_pct Anatomical sub-areas (to number) 0.004499526 0.716103672
ast Anatomical sub-areas (to number) 0.004483195 0.717089723
orb_pct Organ systems (to number) -0.004218233 0.733152137
ts_pct Anatomical sub-areas (to number) 0.004096462 0.740573945
oreb_pct Decision (to number) 0.00391622 0.751603804
ast Major anatomical areas (to number) -0.00345627 0.779976227
g Major anatomical areas (to number) 0.003394718 0.783796398
ows Major anatomical areas (to number) -0.003162445 0.798258395
blk_pct Decision (to number) 0.00314294 0.799476069
oreb_pct Health problems (to number) -0.002845255 0.818118933
drb_pct Major anatomical areas (to number) 0.002448377 0.843132958
stl_pct Major anatomical areas (to number) 0.002328131 0.850744305
obpm Major anatomical areas (to number) 0.002219088 0.857658523
mpg Major anatomical areas (to number) 0.001650596 0.893870996
bpm Major anatomical areas (to number) 0.001384868 0.910878567
dbpm Major anatomical areas (to number) -0.001236421 0.920397998
oreb_pct Organ systems (to number) -0.000330038 0.978718989
per Major anatomical areas (to number) 0.00018176 0.988279064
ast_pct Anatomical sub-areas (to number) -0.000128081 0.991740451
tov_pct Organ systems (to number) 2.62675E-05 0.99830606
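
To make the two numeric columns of Table 0.5 concrete, the following sketch computes a Pearson correlation coefficient and an approximate two-sided p value using only the Python standard library. It is not the procedure used to produce the table. The p value relies on a normal approximation to the t statistic, which is an assumption that is only reasonable for large samples, and the sample data are invented for the example.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r: float, n: int) -> float:
    """Approximate two-sided p value for H0: no correlation.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal
    (large-sample assumption), so p = erfc(|t| / sqrt(2)).
    """
    if abs(r) >= 1.0:
        return 0.0
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


# Invented toy data: a numeric metric and a categorical attribute already
# encoded as numbers (the "(to number)" step is assumed to be done upstream).
metric = [1.0, 2.0, 3.0, 4.0, 5.0]
encoded = [2.0, 1.0, 4.0, 3.0, 5.0]
r_value = pearson_r(metric, encoded)
p_value = approx_two_sided_p(r_value, len(metric))
```

With several thousand observations even a correlation of about 0.2 yields a p value far below machine-printable precision, which is presumably why the strongest rows of the table report a p value of 0.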

The following tables detail injury, demographic, social and economic attributes categorized according to basketball analytics.

Table 0.6: Musculoskeletal injury analysis results for the NBA period from 2010–20.
Columns, left to right: Type of Injury/Pathology; Total Counts of Health Problems per season (10-11 to 19-20); % of Health Problems of Grand Total per season (10-11 to 19-20); Total Health Problems; Total % of Health Problems of Grand Total.
Type of Injury/Pathology 10-11 11-12 12-13 13-14 14-15 15-16 16-17 17-18 18-19 19-20 10-11 11-12 12-13 13-14 14-15 15-16 16-17 17-18 18-19 19-20 Total Total %
Musculoskeletal Injuries 341 354 422 421 418 469 478 444 426 339 8.29 8.61 10.26 10.24 10.17 11.41 11.62 10.80 10.36 8.24 4112 100
Musculoskeletal system 341 354 422 421 418 469 478 444 426 339 8.29 8.61 10.26 10.24 10.17 11.41 11.62 10.80 10.36 8.24 4112 100
Lower extremity 237 241 290 290 273 324 350 317 315 234 5.76 5.86 7.05 7.05 6.64 7.88 8.51 7.71 7.66 5.69 2871 69.82
Ankle area 61 65 93 82 86 79 89 97 82 67 1.48 1.58 2.26 1.99 2.09 1.92 2.16 2.36 1.99 1.63 801 19.48
Calf area 11 12 14 18 15 14 19 10 9 11 0.27 0.29 0.34 0.44 0.36 0.34 0.46 0.24 0.22 0.27 133 3.23
Fibular area 1 0 1 1 1 1 3 2 0 1 0.02 0.00 0.02 0.02 0.02 0.02 0.07 0.05 0.00 0.02 11 0.27
Foot area 28 22 27 29 13 27 35 24 23 14 0.68 0.54 0.66 0.71 0.32 0.66 0.85 0.58 0.56 0.34 242 5.89
Heel area 7 13 8 9 16 14 14 14 18 13 0.17 0.32 0.19 0.22 0.39 0.34 0.34 0.34 0.44 0.32 126 3.06
Hip area 14 9 17 12 20 24 24 18 19 23 0.34 0.22 0.41 0.29 0.49 0.58 0.58 0.44 0.46 0.56 180 4.38
Knee area 77 69 89 97 71 105 99 95 92 55 1.87 1.68 2.16 2.36 1.73 2.55 2.41 2.31 2.24 1.34 849 20.65
Multiple anatomical areas 5 4 3 2 2 0 2 5 0 0 0.12 0.10 0.07 0.05 0.05 0.00 0.05 0.12 0.00 0.00 23 0.56
Shin area 6 5 8 6 12 10 8 10 9 6 0.15 0.12 0.19 0.15 0.29 0.24 0.19 0.24 0.22 0.15 80 1.95
Thigh area 22 33 28 28 29 41 46 34 49 40 0.54 0.80 0.68 0.68 0.71 1.00 1.12 0.83 1.19 0.97 350 8.51
Toes area 5 9 2 6 8 9 11 8 14 4 0.12 0.22 0.05 0.15 0.19 0.22 0.27 0.19 0.34 0.10 76 1.85
Multiple anatomical areas 0 0 6 4 2 2 6 0 1 0 0.00 0.00 0.15 0.10 0.05 0.05 0.15 0.00 0.02 0.00 21 0.51
Neck 2 0 8 6 3 2 2 4 7 6 0.05 0.00 0.19 0.15 0.07 0.05 0.05 0.10 0.17 0.15 40 0.97
Trunk 57 64 57 68 70 73 47 50 62 46 1.39 1.56 1.39 1.65 1.70 1.78 1.14 1.22 1.51 1.12 594 14.45
Abdominal area 15 24 21 14 17 17 17 12 15 16 0.36 0.58 0.51 0.34 0.41 0.41 0.41 0.29 0.36 0.39 168 4.09
Chest area 3 7 3 8 8 8 3 5 4 2 0.07 0.17 0.07 0.19 0.19 0.19 0.07 0.12 0.10 0.05 51 1.24
Pelvic area 2 0 0 2 1 0 0 2 3 3 0.05 0.00 0.00 0.05 0.02 0.00 0.00 0.05 0.07 0.07 13 0.32
Thoracolumbar area 37 33 33 44 44 48 27 31 40 25 0.90 0.80 0.80 1.07 1.07 1.17 0.66 0.75 0.97 0.61 362 8.80
Upper extremity 45 49 61 53 70 68 73 73 41 53 1.09 1.19 1.48 1.29 1.70 1.65 1.78 1.78 1.00 1.29 586 14.25
Elbow area 6 5 8 5 9 5 7 10 6 5 0.15 0.12 0.19 0.12 0.22 0.12 0.17 0.24 0.15 0.12 66 1.61
Hand, Thumb & Fingers area 15 14 25 19 25 19 26 19 19 22 0.36 0.34 0.61 0.46 0.61 0.46 0.63 0.46 0.46 0.54 203 4.94
Shoulder area 14 21 19 18 23 32 21 27 11 18 0.34 0.51 0.46 0.44 0.56 0.78 0.51 0.66 0.27 0.44 204 4.96
Upper arm and Forearm area 1 0 4 1 0 1 3 2 0 2 0.02 0.00 0.10 0.02 0.00 0.02 0.07 0.05 0.00 0.05 14 0.34
Wrist area 9 9 5 10 13 11 16 15 5 6 0.22 0.22 0.12 0.24 0.32 0.27 0.39 0.36 0.12 0.15 99 2.41
Total Unique reasons of Absence 341 354 422 421 418 469 478 444 426 339 8.29 8.61 10.26 10.24 10.17 11.41 11.62 10.80 10.36 8.24 4112 100
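
The column headers of this table fall outside the present excerpt, so the following reading is an inference rather than a statement from the thesis: the decimal columns appear to express each seasonal count as a percentage of the 4112 total unique reasons of absence. A minimal sketch that reproduces the knee-area row under that assumption (the names `knee_counts`, `GRAND_TOTAL`, and `share_pct` are illustrative only):

```python
# Knee-area absence counts per season (2010-11 to 2019-20), copied from the table row.
knee_counts = [77, 69, 89, 97, 71, 105, 99, 95, 92, 55]

# Grand total of unique reasons of absence, taken from the table's final row.
GRAND_TOTAL = 4112


def share_pct(count, total=GRAND_TOTAL):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Assumed meaning of the decimal columns: share of all absences, per season.
knee_season_shares = [share_pct(c) for c in knee_counts]

# Assumed meaning of the last column: share of all absences over the decade.
knee_total_share = share_pct(sum(knee_counts))
```

Under this reading the computed shares match the printed knee-area row, which supports the interpretation without confirming it.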

Table 0.7: Player age clustering (ages 18-43) in correlation with basketball advanced analytics and average wages in dollars ($) for 1996-2020.
Advanced Analytics by Age Criterion
Age 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
Salary Inflated ($) 1,732,401 3,218,622 2,877,569 2,819,124 2,576,071 3,099,609 3,967,551 5,237,968 6,267,616 6,923,518 7,677,535 8,201,621 8,107,519 8,077,180 8,210,213 8,270,049 8,611,010 7,332,750 6,756,893 6,538,329 5,759,942 4,898,280 3,190,658 4,923,851 2,098,713 2,564,753
Points (PTS) 3.65 7.77 8.19 8.75 8.22 8.09 8.37 9.28 9.80 9.64 9.66 9.65 8.88 8.33 8.30 7.90 7.85 7.23 7.26 6.88 6.74 6.32 5.63 3.93 5.20 4.30
Rebounds (REBS) 2.20 3.56 3.73 3.79 3.61 3.49 3.62 3.91 4.01 4.01 3.95 4.04 3.93 3.55 3.69 3.63 3.65 3.75 3.73 3.47 3.40 3.14 2.99 3.23 2.60 2.10
Assists (AST) 0.55 1.45 1.68 1.77 1.67 1.68 1.78 1.93 2.12 2.10 2.16 2.16 2.02 2.10 2.10 2.12 2.19 1.97 1.96 1.91 1.87 1.60 1.66 0.50 0.70 0.65
Total Rebounds % (TRB%) 12.47 10.47 10.49 10.48 10.20 10.01 10.03 10.16 9.99 10.00 9.76 9.87 10.10 9.35 9.70 9.60 9.39 10.11 10.34 10.04 10.43 10.40 10.02 13.67 10.30 10.00
Turnover % (TOV%) 13.82 14.10 14.65 14.01 13.97 14.01 14.24 13.53 13.53 13.67 13.54 13.76 13.67 14.09 13.91 14.64 14.36 14.40 14.80 14.84 14.73 14.96 14.94 13.80 10.90 12.30
Usage % (USG%) 19.56 19.31 19.19 19.57 19.16 18.94 18.82 19.21 19.28 19.07 18.83 18.84 18.21 17.76 17.73 17.32 17.06 16.51 16.76 16.95 17.60 16.35 15.03 15.80 15.75 18.20
True Shooting % (TS%) 45.44 49.89 50.97 51.39 51.69 51.47 51.90 52.49 52.62 52.84 52.72 52.64 52.24 51.69 51.90 51.69 51.21 51.01 52.31 51.29 50.40 52.72 53.27 53.50 49.95 48.90
Assists % (AST%) 8.87 10.91 12.14 12.74 12.78 12.70 13.17 13.39 13.92 13.81 13.87 13.84 13.38 14.14 14.11 14.81 14.95 13.94 14.18 14.48 15.37 12.79 12.26 5.67 7.00 8.20
Steals % (STL%) 1.45 1.45 1.61 1.61 1.62 1.69 1.63 1.63 1.65 1.61 1.59 1.61 1.56 1.56 1.55 1.58 1.60 1.60 1.47 1.60 1.64 1.77 1.45 2.00 1.35 1.00
Block % (BLK%) 2.89 2.15 2.08 1.95 1.81 1.65 1.69 1.62 1.66 1.60 1.47 1.42 1.56 1.36 1.35 1.25 1.36 1.53 1.64 1.55 1.52 1.45 1.88 3.30 1.70 2.95
Net Rating -4.67 -4.68 -3.84 -3.41 -2.87 -2.77 -2.59 -1.57 -1.57 -1.47 -1.15 -0.40 -0.84 -0.98 -0.46 -0.93 -0.34 0.37 0.51 0.01 -1.40 1.99 -0.92 -0.20 -10.90 3.15
Player Efficiency Rating (PER) 11.10 12.06 12.77 13.20 13.07 13.00 13.02 13.88 13.97 13.92 13.65 13.68 13.15 12.53 12.70 12.40 12.42 12.20 12.66 12.72 12.49 12.31 11.44 13.40 9.90 10.90
Win Shares (WS) 0.59 1.49 2.00 2.32 2.27 2.26 2.44 2.91 2.91 2.98 3.00 3.01 2.85 2.62 2.72 2.62 2.64 2.54 2.64 2.28 1.99 2.51 2.22 1.47 0.95 0.35
Win Shares per 48 (WS_per_48) 4.02 4.63 6.04 6.66 6.92 7.14 7.41 8.61 8.68 8.88 8.81 8.96 8.41 7.95 8.39 7.94 8.29 8.37 9.10 8.72 7.89 9.02 8.44 11.50 3.70 4.50
Box Plus-Minus (BPM) -4.32 -2.82 -2.14 -1.86 -1.75 -1.71 -1.59 -1.03 -0.88 -0.83 -0.83 -0.76 -1.10 -1.30 -1.27 -1.20 -1.13 -1.11 -0.86 -0.85 -1.18 -0.95 -1.49 -0.60 -3.65 -3.50
Value Over Replacement Player (VORP) -0.02 0.02 0.26 0.44 0.46 0.49 0.56 0.73 0.74 0.75 0.78 0.76 0.70 0.61 0.62 0.62 0.66 0.62 0.62 0.55 0.44 0.73 0.48 0.23 0.05 -0.30

Table 0.8: Annual NBA league team financial losses in $ due to player health pathologies and injuries (from 2010-11 up to 2019-20).

Major Anatomical Areas 2010-11 2011-12 2012-13 2013-14 2014-15 2015-16 2016-17 2017-18 2018-19 2019-20 Grand Total
Digestive system 1,465,153 1,093,806 551,841 1,194,776 476,808 1,436,636 1,819,506 1,260,875 2,709,375 1,852,856 13,861,631
Ear - 277,721 - - - - - - - - 277,721

Head 1,091,592 884,145 1,469,144 648,376 1,127,630 1,104,802 1,276,155 1,795,726 2,693,372 702,653 12,793,594

Integumentary system - 261,217 - 467,696 - - - - - - 728,912

Lower extremity 1,202,585 1,253,626 1,396,152 1,253,724 1,176,429 1,489,473 1,783,167 1,853,490 2,060,566 2,130,795 15,600,006

Mouth - 357,077 - 401,166 225,969 820,486 1,195,950 - - - 3,000,648

Multiple anatomical areas - - 1,250,122 848,056 1,371,933 2,018,750 3,014,863 - - - 8,503,725

Musculoskeletal system 1,139,670 1,532,274 1,509,550 1,061,958 1,488,962 1,430,075 1,787,418 1,838,266 2,041,867 1,895,143 15,725,183

Neck 1,620,295 - 2,389,320 1,432,776 448,163 4,250,000 850,000 1,875,056 1,064,406 1,092,616 15,022,631

Nervous system 785,676 819,499 1,498,369 924,462 4,104,932 637,500 - - - - 8,770,438

Respiratory system 1,529,907 696,041 1,171,688 1,146,337 1,291,661 570,952 992,422 1,660,805 508,701 462,043 10,030,557

Trunk 1,212,023 1,193,589 1,368,456 1,349,778 1,056,388 1,330,892 1,602,264 1,894,970 1,841,684 3,083,029 15,933,073

Unclassified 784,883 1,213,733 1,208,549 1,498,635 1,159,440 1,472,583 1,893,394 1,614,937 1,766,682 1,673,033 14,285,870
Upper extremity 1,062,279 1,538,406 1,078,032 1,153,449 1,238,949 1,464,252 1,789,024 1,438,391 2,133,647 3,154,392 16,050,820
ANNUAL TOTAL 11,894,064 11,121,133 14,891,222 13,381,190 15,167,264 18,026,401 18,004,164 15,232,515 16,820,299 16,046,558 150,584,809

Table 0.9: Annual inflation rates adjusted for 2019-20 as a baseline.

Year Annual Inflation adjusted using 2019-20 as a baseline
1996-97 0.613
1997-98 0.628
1998-99 0.637
1999-00 0.651
2000-01 0.673
2001-02 0.692
2002-03 0.703
2003-04 0.719
2004-05 0.739
2005-06 0.764
2006-07 0.788
2007-08 0.811
2008-09 0.842
2009-10 0.839
2010-11 0.853
2011-12 0.880
2012-13 0.898
2013-14 0.911
2014-15 0.926
2015-16 0.927
2016-17 0.939
2017-18 0.959
2018-19 0.982
2019-20 1.000
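
As an illustration of how the factors in Table 0.9 could be applied, the sketch below converts a nominal salary into 2019-20 dollars. It assumes that each factor is the price level of that season relative to 2019-20, so that the adjustment is a division by the factor; the thesis excerpt does not state the formula, and the names `INFLATION_INDEX` and `to_2019_20_dollars` are hypothetical.

```python
# Season -> inflation factor relative to the 2019-20 baseline (values from Table 0.9).
INFLATION_INDEX = {
    "1996-97": 0.613, "1997-98": 0.628, "1998-99": 0.637, "1999-00": 0.651,
    "2000-01": 0.673, "2001-02": 0.692, "2002-03": 0.703, "2003-04": 0.719,
    "2004-05": 0.739, "2005-06": 0.764, "2006-07": 0.788, "2007-08": 0.811,
    "2008-09": 0.842, "2009-10": 0.839, "2010-11": 0.853, "2011-12": 0.880,
    "2012-13": 0.898, "2013-14": 0.911, "2014-15": 0.926, "2015-16": 0.927,
    "2016-17": 0.939, "2017-18": 0.959, "2018-19": 0.982, "2019-20": 1.000,
}


def to_2019_20_dollars(nominal_salary, season):
    """Express a nominal salary in 2019-20 dollars.

    Assumption (not confirmed by the source): the factor is the season's
    price level relative to 2019-20, so the adjustment divides by it.
    """
    return nominal_salary / INFLATION_INDEX[season]


# Example: a nominal 5,000,000 USD salary from 2005-06 in 2019-20 terms.
adjusted = to_2019_20_dollars(5_000_000, "2005-06")
```

If the thesis instead multiplies by a different index, only the body of `to_2019_20_dollars` would change; the lookup table stays the same.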
Table 0.10: Eight age clusters segmented across 3 playing Positions, in correlation with Player percentage, Salary adjusted for inflation, and
advanced basketball analytics over 24 NBA seasons (1996-2020).

Age clusters Pos Segmentation Player % Inflated Salary PER Net Rating VORP WS per 48 TS % USG % MPG PTS AST REB
>= 34 C 1.9% $ 9,109,364 14.1 0.71 0.67 0.110 52.41% 16.15% 21.1 7.1 1.17 5.73
31 - 33 C 2.3% $ 9,452,372 13.9 -0.21 0.66 0.105 53.33% 16.68% 21.3 7.5 1.16 5.74
29 - 30 C 2.6% $ 9,221,345 14.2 -0.48 0.54 0.101 53.57% 17.04% 22.0 8.1 1.13 5.95
27 - 28 C 3.0% $ 7,763,829 14.8 -0.99 0.58 0.109 54.00% 17.32% 21.5 8.1 1.06 5.89
25 - 26 C 3.5% $ 6,297,997 15.7 -1.75 0.64 0.113 54.49% 17.97% 22.2 8.9 1.09 6.30
23 - 24 C 3.2% $ 4,301,598 15.4 -1.39 0.55 0.112 54.45% 17.80% 20.9 8.2 0.96 5.75
21 - 22 C 1.8% $ 3,097,907 16.3 -2.01 0.58 0.116 55.84% 18.89% 21.5 9.1 1.02 6.06
18 - 20 C 0.6% $ 2,975,363 15.2 -4.30 0.29 0.096 54.83% 18.09% 20.9 8.2 0.87 5.63
>= 34 F 3.1% $ 6,808,314 12.2 -0.17 0.58 0.085 51.19% 16.83% 22.4 7.8 1.61 4.28
31 - 33 F 5.2% $ 8,618,154 13.0 0.09 0.65 0.090 52.40% 17.61% 23.3 8.8 1.62 4.41
29 - 30 F 5.5% $ 8,271,112 13.4 -0.45 0.72 0.089 52.62% 18.50% 24.4 9.8 1.61 4.69
27 - 28 F 6.5% $ 7,809,199 14.3 -0.71 0.91 0.098 53.73% 19.12% 25.1 10.7 1.66 4.87
25 - 26 F 6.5% $ 6,037,888 14.2 -0.92 0.88 0.093 53.11% 19.15% 25.1 10.6 1.64 4.85
23 - 24 F 7.0% $ 3,819,302 13.9 -2.10 0.72 0.088 52.56% 18.87% 23.0 9.5 1.43 4.57
21 - 22 F 4.4% $ 2,710,571 13.9 -2.57 0.57 0.083 52.53% 19.05% 22.7 9.5 1.31 4.68
18 - 20 F 1.5% $ 3,028,907 13.3 -3.18 0.33 0.070 51.58% 18.97% 22.2 8.9 1.22 4.50
>= 34 G 3.3% $ 6,728,231 13.0 0.95 0.81 0.084 52.15% 17.88% 22.6 8.4 3.26 2.39
31 - 33 G 5.9% $ 7,264,115 12.9 -0.56 0.75 0.078 51.84% 18.64% 24.2 9.3 3.26 2.55
29 - 30 G 5.1% $ 7,490,587 13.9 -0.09 1.01 0.088 52.65% 19.84% 25.5 10.7 3.41 2.72
27 - 28 G 6.7% $ 6,562,890 13.8 -1.16 0.90 0.080 52.18% 20.08% 25.2 10.8 3.39 2.78
25 - 26 G 7.3% $ 5,258,335 14.0 -1.20 0.88 0.082 52.15% 20.42% 24.9 10.8 3.23 2.73
23 - 24 G 7.5% $ 2,996,631 13.0 -2.31 0.64 0.065 51.43% 20.05% 23.1 9.6 2.90 2.61
21 - 22 G 4.4% $ 2,449,103 12.6 -2.78 0.52 0.056 50.64% 20.24% 22.8 9.5 2.88 2.64
18 - 20 G 1.3% $ 2,854,992 11.8 -4.61 0.16 0.037 49.40% 20.40% 23.0 9.5 2.95 2.69

Table 0.11 categorizes basketball performance metrics into four main
areas: Rating, Misc, Offensive, and Defensive. Each area groups related
performance analytics for players and teams. In more detail, the
categories cover:

1. Rating: Measure overall player efficiency and impact on the game. Advanced
metrics that normalize performance to account for pace and playing time, such as
NET_RATING_ADVANCED and PACE_PER40_ADVANCED, are included.
Usage percentages such as USG_PCT highlight a player's involvement in game
play.
2. Offensive: Quantify the scoring, playmaking, and efficiency of the offense.
Traditional stats such as FGM and FGA track shot making, while advanced stats
such as OFF_RATING_ADVANCED measure offensive efficiency. The scoring
percentages associated with specific play types, such as
PCT_PTS_2PT_SCORING or PCT_PTS_3PT_SCORING, indicate where a
player or team excels in scoring.
3. Defensive: Evaluate a player's or team's defensive effectiveness.
DREB_PCT_ADVANCED could indicate a player’s ability to rebound on the
defensive end. Steal-related statistics such as STL_TRADITIONAL and
PCT_STL_USAGE measure defensive playmaking.
4. Miscellaneous: Capture diverse aspects of the game not strictly classified as
offensive or defensive. PTS_FB_MISC and PTS_PAINT_MISC provide insight
into how teams score in transition and in the paint. Player tracking data, as
indicated by metrics with "_PLAYER_TRACK", offer a detailed look into player
movements and actions. PFD_MISC and SAST_PLAYER_TRACK can indicate
a player's influence on the game beyond primary scoring and assists.
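
To make the efficiency metrics named above more concrete, the following sketch computes two of them from box-score counts using their widely published definitions: effective field goal percentage, (FGM + 0.5 * 3PM) / FGA, and true shooting percentage, PTS / (2 * (FGA + 0.44 * FTA)). The function names and the sample stat line are illustrative, and the data source used in the thesis may compute these values with slight variations.

```python
def effective_fg_pct(fgm, fg3m, fga):
    """Effective field goal percentage: weights made threes by 1.5."""
    if fga == 0:
        return 0.0
    return (fgm + 0.5 * fg3m) / fga


def true_shooting_pct(pts, fga, fta):
    """True shooting percentage: points per scoring attempt, free throws included.

    The 0.44 coefficient is the commonly used estimate of the share of
    free throw attempts that end a possession.
    """
    attempts = 2 * (fga + 0.44 * fta)
    if attempts == 0:
        return 0.0
    return pts / attempts


# Hypothetical stat line: 8 of 16 from the field, 2 made threes, 5 free throw attempts, 22 points.
efg = effective_fg_pct(fgm=8, fg3m=2, fga=16)
ts = true_shooting_pct(pts=22, fga=16, fta=5)
```

Both functions return fractions; multiplying by 100 gives percentages on the same scale as the TS% columns in the tables of this appendix.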

Table 0.11: Rating, Misc, Offensive and Defensive categorization of the Basketball
Performance Analytics
Rating
AST_PCT_ADVANCED: Percentage of team field goals a player assisted while on court
AST_RATIO_ADVANCED: Assists per 100 possessions used by a player
AST_TOV_ADVANCED: Ratio of a player's assists to turnovers
E_NET_RATING_ADVANCED: Player's net impact on team's offensive and defensive efficiency
E_PACE_ADVANCED: Estimate of the pace at which a player plays (possessions per 48 minutes)
E_USG_PCT_ADVANCED: Usage rate, measuring the percentage of team plays involving a player while on court
EFG_PCT_ADVANCED: Effective field goal percentage, accounting for 3-point field goals
EFG_PCT_FOUR_FACTORS: Part of the 'Four Factors' of basketball success, measuring effective shooting efficiency
FTA_RATE_FOUR_FACTORS: Free throw attempt rate in the context of the 'Four Factors'
FTA_TRADITIONAL: Traditional count of a player's free throw attempts
FTM_TRADITIONAL: Traditional count of a player's made free throws
MIN_ADVANCED: Minutes played, an advanced metric considering various factors
NET_RATING_ADVANCED: Team's point differential per 100 possessions while the player is on the court
OPP_EFG_PCT_FOUR_FACTORS: Opponent's effective field goal percentage, a defensive metric
OPP_FTA_RATE_FOUR_FACTORS: Opponent's free throw attempt rate, indicating defensive efficiency
OPP_OREB_PCT_FOUR_FACTORS: Opponent's offensive rebound percentage, a defensive metric
OPP_TOV_PCT_FOUR_FACTORS: Opponent's turnover percentage, a part of defensive metrics
PACE_ADVANCED: The pace factor, estimating the number of possessions per 48 minutes
PACE_PER40_ADVANCED: Similar to pace factor but calculated per 40 minutes
PCT_AST_USAGE: Percentage of team's assists a player account for while playing
PCT_BLK_USAGE: Percentage of team's blocks a player accounts for while playing
PCT_BLKA_USAGE: Percentage of a player's shots that are blocked by opponents
PCT_DREB_USAGE: Percentage of available defensive rebounds a player gets
PCT_FG3A_USAGE: Percentage of a team's three-point attempts taken by a player
PCT_FG3M_USAGE: Percentage of a team's three-point makes attributed to a player
PCT_FGA_USAGE: Percentage of team's field goal attempts taken by a player
PCT_FGM_USAGE: Percentage of team's field goals made by a player
PCT_FTA_USAGE: Percentage of team's free throw attempts taken by a player
PCT_FTM_USAGE: Percentage of team's free throws made by a player
PCT_REB_USAGE: Percentage of team's total rebounds grabbed by a player
PIE_ADVANCED: Player Impact Estimate, a measure of a player's overall statistical contribution
PLUS_MINUS_TRADITIONAL: The point differential when the player is on and off the court
POSS_ADVANCED: The number of possessions a player is involved in
TM_TOV_PCT_ADVANCED: Team's turnover percentage while a player is on the court
TM_TOV_PCT_FOUR_FACTORS: Team's turnover percentage as a part of the 'Four Factors'
TS_PCT_ADVANCED: True shooting percentage, measuring shooting efficiency (including free throws)
USG_PCT_ADVANCED: Usage percentage, indicating the proportion of team plays used by a player
USG_PCT_USAGE: Similar to usage percentage, a measure of how involved a player is in team plays

Misc
BLK_MISC: Miscellaneous block stats not fitting traditional or advanced categories
BLKA_MISC: Miscellaneous stats for shots blocked by opponents
OPP_PTS_2ND_CHANCE_MISC: Points scored by opponents on second-chance opportunities
OPP_PTS_FB_MISC: Points scored by opponents on fast breaks
OPP_PTS_OFF_TOV_MISC: Points scored by opponents off turnovers
OPP_PTS_PAINT_MISC: Points scored by opponents in the paint
PF_MISC: Personal fouls count in miscellaneous situations
PFD_MISC: Count of personal fouls drawn in various scenarios
PTS_2ND_CHANCE_MISC: Points scored on second-chance opportunities
PTS_FB_MISC: Points scored on fast breaks
PTS_OFF_TOV_MISC: Points scored off turnovers
PTS_PAINT_MISC: Points scored in the paint
AST_PLAYER_TRACK: Assists tracked in specific player tracking scenarios
CFG_PCT_PLAYER_TRACK: Player's catch-and-shoot field goal percentage in player tracking
CFGA_PLAYER_TRACK: Player's catch-and-shoot field goal attempts in player tracking
CFGM_PLAYER_TRACK: Player's catch-and-shoot field goals made in player tracking
DFG_PCT_PLAYER_TRACK: Player's defense against field goal percentage in player tracking
DFGA_PLAYER_TRACK: Player's defense against field goal attempts in player tracking
DFGM_PLAYER_TRACK: Player's defense against field goals made in player tracking
FTAST_PLAYER_TRACK: Free throw assists tracked in player tracking scenarios
ORBC_PLAYER_TRACK: Offensive rebounds captured in player tracking
PASS_PLAYER_TRACK: Passes made tracked in player tracking scenarios
RBC_PLAYER_TRACK: Rebounds captured in player tracking
SAST_PLAYER_TRACK: Secondary assists tracked in player tracking scenarios
SPD_PLAYER_TRACK: Speed of the player during play, tracked in player tracking scenarios
TCHS_PLAYER_TRACK: Touches of the ball by the player tracked in player tracking
UFG_PCT_PLAYER_TRACK: Player's unguarded field goal percentage in player tracking
UFGA_PLAYER_TRACK: Player's unguarded field goal attempts in player tracking
UFGM_PLAYER_TRACK: Player's unguarded field goals made in player tracking
PCT_TOV_USAGE: Percentage of a player's turnovers relative to their usage rate
DIST_PLAYER_TRACK: Distance covered by the player during play, tracked in player tracking
DRBC_PLAYER_TRACK: Defensive rebounds captured in player tracking
FG_PCT_PLAYER_TRACK: Player's overall field goal percentage in player tracking
PF_TRADITIONAL: Traditional count of personal fouls committed by a player
TO_TRADITIONAL: Traditional count of turnovers committed by a player
PCT_PF_USAGE: Percentage of a player's personal fouls relative to their usage rate
PCT_PFD_USAGE: Percentage of a player's personal fouls drawn relative to their usage rate

Offensive
AST_TRADITIONAL: Traditional count of assists made by a player.
E_OFF_RATING_ADVANCED: Advanced metric evaluating a player or team's offensive efficiency.
FG_PCT_TRADITIONAL: Traditional field goal percentage, measuring overall shooting success.
FG3_PCT_TRADITIONAL: Traditional three-point field goal percentage.
FG3A_TRADITIONAL: Traditional count of three-point field goal attempts by a player.
FG3M_TRADITIONAL: Traditional count of three-point field goals made.
FGA_TRADITIONAL: Traditional count of total field goal attempts.
FGM_TRADITIONAL: Traditional count of total field goals made.
FT_PCT_TRADITIONAL: Traditional free throw percentage.
OFF_RATING_ADVANCED: Advanced metric for assessing a player's or team's offensive performance per 100 possessions.
OREB_PCT_ADVANCED: Advanced metric measuring the percentage of available offensive rebounds a player grabbed while on the court.
OREB_PCT_FOUR_FACTORS: Offensive rebound percentage as part of the 'Four Factors' in basketball analysis.
OREB_TRADITIONAL: Traditional count of offensive rebounds grabbed by a player.
PCT_AST_2PM_SCORING: Percentage of two-point field goals made that were assisted.
PCT_AST_3PM_SCORING: Percentage of three-point field goals made that were assisted.
PCT_AST_FGM_SCORING: Overall percentage of field goals made that were assisted.
PCT_PTS_2PT_MR_SCORING: Percentage of points scored from mid-range two-point shots.
PCT_PTS_2PT_SCORING: Percentage of points scored from all two-point field goals.
PCT_PTS_3PT_SCORING: Percentage of points scored from three-point field goals.
PCT_PTS_PAINT_SCORING: Percentage of points scored in the paint.
PCT_UAST_2PM_SCORING: Percentage of two-point field goals made without an assist.
PCT_UAST_3PM_SCORING: Percentage of three-point field goals made without an assist.
PCT_UAST_FGM_SCORING: Overall percentage of field goals made without an assist.
PCT_FGA_2PT_SCORING: Percentage of total field goal attempts that are two-point shots.
PCT_FGA_3PT_SCORING: Percentage of total field goal attempts that are three-point shots.
PTS_TRADITIONAL: Traditional count of total points scored by a player.
PCT_OREB_USAGE: Percentage of team's offensive rebounds a player accounts for while on the court.
PCT_PTS_USAGE: Percentage of team's points a player accounts for while on the court.
PCT_PTS_FB_SCORING: Percentage of points scored from fast breaks.
PCT_PTS_FT_SCORING: Percentage of points scored from free throws.
PCT_PTS_OFF_TOV_SCORING: Percentage of points scored off turnovers.

Defensive
DREB_PCT_ADVANCED: Advanced metric measuring the percentage of available defensive rebounds a player grabbed while on the court.
DEF_RATING_ADVANCED: An advanced metric that estimates a player's overall defensive impact per 100 possessions.
DREB_TRADITIONAL: Traditional count of defensive rebounds grabbed by a player.
E_DEF_RATING_ADVANCED: Enhanced defensive rating, offering a more comprehensive view of a player's defensive efficiency.
BLK_TRADITIONAL: Traditional count of shots blocked by a player.
REB_PCT_ADVANCED: Advanced metric measuring the percentage of total rebounds (offensive and defensive) a player grabbed while on the court.
REB_TRADITIONAL: Traditional count of total rebounds (both offensive and defensive) grabbed by a player.
STL_TRADITIONAL: Traditional count of steals made by a player.
PCT_STL_USAGE: Percentage of a player's steals relative to their overall on-court engagement and usage.

Abbreviations
Table 0.1: Abbreviations for basketball analytics.

Basketball Analytics Abbreviations: Explanation
%WIN: Win Percentage
2nd PTS: Second Chance Points
3P or TP: 3 Points
3PA: 3 Point Field Goals Attempted
3PA%: 3 Point Field Goals Percentage
3PM: 3 Point Field Goals Made
3PM%: 3 Point Field Goals Percentage Made
Adj. +/- (APM): Adjusted Plus Minus
AGE or A: Age
AI: Artificial Intelligence
AR: Assist Ratio
AST: Assists
AST RATIO: Assists Ratio
AST%: Assists Percentage
AST/A: Assists
AST/Possession: Assists per Possession
AST/TO: Assists per Turnover Ratio
BA: Basketball Analytics
BLK or B: Blocks
BLK%: Blocks Percentage
BLKA: Blocks against
BLKA%: Percent Blocked Field Goal Attempts
BPM: Box Plus Minus
DBPM: Defensive Box Plus Minus
dd2: Double double
DEF EFF: Defensive Efficiency
Defl: Deflections
DEFRTG: Defensive Rating
DM: Data Mining
DP: Defensive Possession
DPR: Defensive Player Rating
DRB%: Defensive Rebound Percentage
DREB: Defensive Rebounds
DREB%: Defensive Rebounds Percentage
DRPM: Defensive Real Plus Minus
DS: Data Science
DWS: Defensive Win Shares
EFF: Efficiency
eFG%: Effective Field Goal Percentage
ELO: Team ELO
EPTS: Electronic Performance and Tracking Systems
EWA: Estimated Wins Added
FBPS: Fast Break Points
FG: Field Goals
FGA: Field-Goal Attempts
FGA%: Field Goal Percentage Attempted
FGA/Possession: FGA/Possession
FGM: Field Goals Made
FGM%: Field Goal Percentage Made
FP: Fantasy Points
FT: Free Throws
FT/Possession: FT/Possession
FT/FTA: Free throw attempts per field goal attempt
FTA: Free-Throw Attempts
FTA%: Free Throw Percentage Attempted
FTM: Free Throws Made
FTM%: Free Throw Percentage Made
FTM/FTA%: Percent of Team's Free Throws Made
FTr: Free Throw Factor
GmSc: Game Score
GP: Games Played
L: Losses
LA: League Assists
LFG: League Field Goals
LFGA: League Field-Goal Attempts
LFT: League Free Throws
LFTA: League Free-Throw Attempts
Loose Ball Rec: Loose Ball Recovered
Loose Balls Recovered: Loose Balls Recovered
LORB: League Offensive Rebounds
LPace or LP: League Pace
LPF: League Personal Fouls
LPPG: League Points Per Game
LPPP: League Points Per Possession
LPTS: League Points
LTOV: League Turnovers
LuPER: League uPER
M or Min: Minutes
ML: Machine Learning
MP: Minutes played
MPG/MIN: Minutes Played
MVP: Most Valuable Player
NetRtg: Net Rating
OBPM: Offensive Box Plus Minus
OFF EFF: Offensive Efficiency
OffRtg: Offensive Rating
OPOC: Opponent Points on Court
OR/P: Offensive Rebounds/Possession
OREB: Offensive Rebounds
OREB%: Offensive Rebounds Percentage
ORPM: Offensive Real Plus Minus
ORPM: Offensive RPM
OWS: Offensive Win Shares
PACE: Pace
PER: Player Efficiency Rating
PF: Personal Fouls
PFD: Personal Fouls Drawn
PFD%: % of Team's Personal Fouls Drawn
PIE: Player Impact Estimate
PIPM: Player Impact Plus Minus
PIR: Performance Index Rating
PITP: Points in the Paint
PL or P: Player
PM (+/-): Plus Minus
Pos: Position
Poss: Possession
PP: Points Produced
PPP: Points Per Possession
PRA%: Points Rebounds Assists Percentage
PRL: Position Replacement Level
PTS: Points
PTS 2PT%: % of Points (2-Point Field Goals)
PTS 3PT%: % of Points (3-Point Field Goals)
PTS FBPS%: % of Points (Fast Break Points)
PTS FT%: Percent of Points (Free Throws)
PTS OFF TO: Points Off Turnovers
PTS%: % of Team's Points
PTS/Possession: Points/Possession
Real +/- (RPM): Real Plus Minus
REB or TREB: Total Rebounds
REB%: Total Rebounds Percentage
RPM: Real Plus Minus
SA: Sports Analytics
Screen Assists: Screen Assists
Screen Assists PTS: Screen Assists to Points
SDM: Sports Data Mining
STL or S: Steals
STL%: Steals Percentage
TA: Team Assists
td3: Triple Double
TDP: Team Defensive Possessions
TFG: Team Field Goals
Tm or T: Team
TMP: Team Minutes Played
TO or TOV: Turnovers
TO RATIO: Turnovers Ratio
TOV%: Turnovers Percentage
TP or TPace: Team Pace
TPOC: Team Points on Court
TPOffC48: Points off Court per 48 minutes
TPOnC48: Points on Court per 48 minutes
TRB: Total Rebounds
TRB%: Total Rebound Percentage
TS%: True Shooting Percentage
USG%: Usage Percentage
VORP: Value Over Replacement
W: Wins
WAR or WARP: Wins Above Replacement
WINS or RPM WINS: Real Plus Minus WINs
WS: Win Shares
WS/48: Win Shares per 48 minutes
AST/FGM: Assist per Field Goal Made

References

[1] R. B. Minton, Sports Math: An Introductory Course in the Mathematics of Sports


Science and Sports Analytics. Chapman and Hall/CRC, 2017. doi:
10.1201/9781315371603.
[2] S. Kaplan et al., “The Economic Impact of NBA Superstars: Evidence from
Missed Games using Ticket Microdata from a Secondary Marketplace,” MIT
Sloan Sport. Anal. Conf., pp. 1–29, 2019, [Online]. Available:
http://www.sloansportsconference.com/wp-content/uploads/2019/02/Economic-
Impact-of-NBA-Superstars.pdf
[3] V. Sarlis and C. Tjortjis, “Sports analytics — Evaluation of basketball players
and team performance,” Inf. Syst., vol. 19, no. May, p. 19, 2020, doi:
10.1016/j.is.2020.101562.
[4] E. Turban, R. Sharda, and D. Delen, Decision Support and Business Intelligence
Systems, 9th ed., vol. 9. Pearson, 2011. doi: 658.4030285-dc22.
[5] A. Senderovich, A. Shleyfman, M. Weidlich, and A. Gal, “To aggregate or to
eliminate ? Optimal model simplification for improved process performance
prediction,” Inf. Syst., vol. 0, pp. 1–16, 2018, doi: 10.1016/j.is.2018.04.003.
[6] B. Gerrard, “Moneyball and the Role of Sports Analytics: A Decision-Theoretic
Perspective,” in North American Society for Sport Management Conference
(NASSM 2016), 2016, no. Nassm, pp. 2010–2012. doi:
10.1177/2167479513480355.Gibbs.
[7] K. Pelechrinis and E. Papalexakis, “Athlytics: Winning in Sports with Data,”
Proc. Elev. ACM Int. Conf. Web Search Data Min. - WSDM ’18, pp. 787–788,
2018, doi: 10.1145/3159652.3162005.
[8] O. Kostakis, N. Tatti, and A. Gionis, Discovering recurring activity in temporal
networks, vol. 31, no. 6. Springer US, 2017. doi: 10.1007/s10618-017-0515-0.
[9] W. Tichy, “Changing the Game: Dr. Dave Schrader on sports analytics,”
Ubiquity, no. May, pp. 1–10, May 2016, doi: 10.1145/2933230.
[10] V. C. P. Chen, S. Bum, K. Asil, and D. Sundaramoorthi, “Preface : Data mining
and analytics,” Springer Sci. Media, 2018, doi: 10.1007/s10479-018-2787-1.
[11] K. Lankhorst et al., “Sports participation related to injuries and illnesses among
ambulatory youth with chronic diseases: Results of the health in adapted youth
sports study,” BMC Sports Sci. Med. Rehabil., vol. 11, no. 1, pp. 1–12, 2019, doi:
191
10.1186/s13102-019-0145-5.
[12] T. B. Swartz, “Research directions in cricket,” Handb. Stat. Methods Anal. Sport.,
pp. 445–460, 2017, doi: 10.1201/9781315166070.
[13] C. Cao, “Sports Data Mining Technology Used in Basketball Outcome
Prediction,” 2012. [Online]. Available:
http://arrow.dit.ie/scschcomdis/39%5Cnhttp://arrow.dit.ie/scschcomdis/39/
[14] T. H. Davenport, “Analytics in Sports: The New Science of Winning,” Int. Inst.
Anal., vol. 2, no. February, pp. 1–28, 2014.
[15] E. Morgulev, O. H. Azar, and R. Lidor, “Sports analytics and the big-data era,”
Int. J. Data Sci. Anal., vol. 5, no. 4, pp. 213–222, 2018, doi: 10.1007/s41060-
017-0093-7.
[16] B. R. Humphreys and C. Johnson, “The Effect of Superstars on Game
Attendance: Evidence From the NBA,” J. Sports Econom., vol. 21, no. 2, pp.
152–175, 2020, doi: 10.1177/1527002519885441.
[17] Statista, “Statista,” 2023. https://www.statista.com/statistics/370560/worldwide-
sports-market-revenue/
[18] G. Vinue and I. Epifanio, “Forecasting basketball players’ performance using
sparse functional data,” Stat. Anal. Data Min., no. 1, pp. 1–26, 2018.
[19] P. O. Donoghue, Research Methods for Sports Performance Analysis. 2009. doi:
10.4324/9780203878309.
[20] S. G. H. Vıctor Blanco, Roman Salmeron, “A multicriteria selection system based
on player performance. Case study: The Spanish ACB Basketball League,” pp.
1–17, 2018.
[21] S. Hisham, Talukder, Thomas, Vincent Geoff, Foster Camden, Hu Juan, Huerta
Aparna, Kumar Mark, Malazarte Diego, Saldana Shawn, “Preventing in‐game
injuries for NBA players,” MIT Sloan Sport. Anal. Conf., vol. 2015, pp. 1–13,
2016.
[22] N. Maffulli, U. G. Longo, N. Gougoulias, D. Caine, and V. Denaro, “Sport
injuries: A review of outcomes,” Br. Med. Bull., vol. 97, no. 1, pp. 47–80, 2011,
doi: 10.1093/bmb/ldq026.
[23] E. Cumps, E. Verhagen, and R. Meeusen, “Prospective epidemiological study of
basketball injuries during one competitive season: Ankle sprains and overuse
knee injuries,” J. Sport. Sci. Med., vol. 6, no. 2, pp. 204–211, 2007.
[24] K. D. Barber Foss, G. D. Myer, and T. E. Hewett, “Epidemiology of basketball,

192
soccer, and volleyball injuries in middle-school female athletes,” Phys.
Sportsmed., vol. 42, no. 2, pp. 146–153, 2014, doi: 10.3810/psm.2014.05.2066.
[25] M. K. Drew, J. Cook, and C. F. Finch, “Sports-related workload and injury risk:
Simply knowing the risks will not prevent injuries: Narrative review,” Br. J.
Sports Med., vol. 50, no. 21, pp. 1306–1308, 2016, doi: 10.1136/bjsports-2015-
095871.
[26] S. Habelt, C. C. Hasler, K. Steinbrück, and M. Majewski, “Sport injuries in
adolescents,” Orthop. Rev. (Pavia)., vol. 3, no. 2, p. 18, 2011, doi:
10.4081/or.2011.e18.
[27] L. Abernethy and C. Bleakley, “Strategies to prevent injury in adolescent sport:
A systematic review,” Br. J. Sports Med., vol. 41, no. 10, pp. 627–638, 2007, doi:
10.1136/bjsm.2007.035691.
[28] C. A. Emery, “Injury prevention and future research,” Med. Sport Sci., vol. 48,
pp. 179–200, 2005, doi: 10.1159/000084289.
[29] K. Herman, C. Barton, P. Malliaras, and D. Morrissey, “The effectiveness of
neuromuscular warm-up strategies, that require no additional equipment, for
preventing lower limb injuries during sports participation: a systematic review,”
BMC Med., vol. 10, pp. 1–12, 2012, doi: 10.1186/1741-7015-10-75.
[30] M. Klügl et al., “The prevention of sport injury: An analysis of 12 000 published
manuscripts,” Clin. J. Sport Med., vol. 20, no. 6, pp. 407–412, 2010, doi:
10.1097/JSM.0b013e3181f4a99c.
[31] N. Maffulli, U. G. Longo, F. Spiezia, and V. Denaro, “Sports injuries in young
athletes: Long-term outcome and prevention strategies,” Phys. Sportsmed., vol.
38, no. 2, pp. 29–34, 2010, doi: 10.3810/psm.2010.06.1780.
[32] B. H. Patel et al., “Adductor injuries in the National Basketball Association: an
analysis of return to play and player performance from 2010 to 2019,” Phys.
Sportsmed., vol. 00, no. 00, pp. 1–8, 2020, doi:
10.1080/00913847.2020.1746978.
[33] C. Starkey, “Injuries and Illnesses in the National Basketball Association: A 10-
Year Perspective,” J. Athl. Train., vol. 35, no. 2, pp. 161–167, 2000.
[34] M. C. Drakos, B. Domb, C. Starkey, L. Callahan, and A. A. Allen, “Injury in the
National Basketball Association: A 17-year overview,” Sports Health, vol. 2, no.
4, pp. 284–290, 2010, doi: 10.1177/1941738109357303.
[35] T. J. Jackson, C. Starkey, D. McElhiney, and B. G. Domb, “Epidemiology of hip

193
injuries in the national basketball association: A 24-year overview,” Orthop. J.
Sport. Med., vol. 1, no. 3, pp. 1–7, 2013, doi: 10.1177/2325967113499130.
[36] E. Malinowski, Betaball: How Silicon Valley and Science Built One of the
Greatest Basketball Teams in History. Atria books, 2017. [Online]. Available:
http://repositorio.unan.edu.ni/2986/1/5624.pdf
[37] R. S. Sikka, M. Baer, A. Raja, M. Stuart, and M. Tompkins, “Analytics in sports
medicine: Implications and responsibilities that accompany the era of big data,”
J. Bone Jt. Surg. - Am. Vol., vol. 101, no. 3, pp. 276–283, 2019, doi:
10.2106/JBJS.17.01601.
[38] M. Dunham, Data Mining - Introductory and Advanced Topics. Prentice Hall,
2003.
[39] I. Katakis et al., “Mining Urban Data ( Part B ),” Inf. Syst., vol. 57, pp. 75–76,
2016, doi: 10.1016/j.is.2016.01.001.
[40] E. Siegel, Predictive Analytics. Wiley, 2013. [Online]. Available:
http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/978-1-
5225-1837-2.ch004
[41] J. F. Drazan, A. K. Loya, B. D. Horne, and R. Eglash, “From Sports to Science :
Using Basketball Analytics to Broaden the Appeal of Math and Science Among
Youth,” 2017 MIT-Sloan Sport. Anal. Conf., no. March, pp. 1–16, 2017.
[42] R. P. Schumaker, A. T. Jarmoszko, and C. S. Labedz, “Predicting wins and spread
in the Premier League using a sentiment analysis of twitter,” Decis. Support Syst.,
vol. 88, pp. 76–84, 2016, doi: 10.1016/j.dss.2016.05.010.
[43] A. Adhikari, A. Majumdar, G. Gupta, and A. Bisi, “An innovative super-
efficiency data envelopment analysis, semi-variance, and Shannon-entropy-
based methodology for player selection: evidence from cricket,” Ann. Oper. Res.,
vol. 284, no. 1, 2020, doi: 10.1007/s10479-018-3088-4.
[44] M. Sandri, P. Zuccolotto, and M. Manisera, “Markov switching modelling of
shooting performance variability and teammate interactions in basketball,” J. R.
Stat. Soc. Ser. C Appl. Stat., vol. 69, no. 5, pp. 1337–1356, 2020, doi:
10.1111/rssc.12442.
[45] Ö. Sürer, D. W. Apley, and E. C. Malthouse, “Coefficient tree regression for
generalized linear models,” Stat. Anal. Data Min., vol. 14, no. 5, pp. 407–429,
2021, doi: 10.1002/sam.11534.
[46] B. Cole, A. J. H. Arundale, J. Bytomski, and A. Amendola, Basketball Sports
Medicine and Science. Springer, 2020. doi: 10.1007/978-3-662-61070-1.
[47] J. del Corral, A. Maroto, and A. Gallardo, “Are Former Professional Athletes and
Native Better Coaches? Evidence From Spanish Basketball,” J. Sports Econom.,
vol. 18, no. 7, pp. 698–719, 2017, doi: 10.1177/1527002515595266.
[48] B. Simmons, The Book of Basketball: The NBA According to the Sports Guy. 2010.
[49] C. Perin et al., “State of the Art of Sports Data Visualization,” Comput. Graph.
Forum, Wiley, p. 25, 2018.
[50] F. Thabtah, L. Zhang, and N. Abdelhamid, “NBA Game Result Prediction Using
Feature Analysis and Machine Learning,” Ann. Data Sci., vol. 6, no. 1, pp. 103–
116, 2019, doi: 10.1007/s40745-018-00189-x.
[51] V. C. Pantzalis and C. Tjortjis, “Sports Analytics for Football League Table and
Player Performance Prediction,” 11th Int. Conf. Information, Intell. Syst. Appl.
IISA 2020, 2020, doi: 10.1109/IISA50023.2020.9284352.
[52] K. Apostolou and C. Tjortjis, “Sports Analytics algorithms for performance
prediction,” 10th Int. Conf. Information, Intell. Syst. Appl. IISA 2019, pp. 1–4,
2019, doi: 10.1109/IISA.2019.8900754.
[53] C. Arndt and U. Brefeld, “Predicting the future performance of soccer players,”
Stat. Anal. Data Min. ASA Data Sci. J., vol. 9, no. 5, pp. 373–382, Oct. 2016, doi:
10.1002/sam.11321.
[54] T. H. Davenport and J. Harris, Competing on Analytics: The New Science of
Winning. 2007.
[55] U. Brefeld, J. Davis, R. Goebel, and J. Van Haaren, Machine Learning and Data
Mining in Sports Analytics. 2018. doi: 10.1007/978-3-030-17274-9.
[56] M. Cohen and M. Sloane, “Predictive Modeling and Statistical Analysis in
Sports,” 2015.
[57] M. Kontaki, A. Gounaris, A. N. Papadopoulos, K. Tsichlas, and Y.
Manolopoulos, “Efficient and flexible algorithms for monitoring distance-based
outliers over data streams,” Inf. Syst., vol. 55, pp. 37–53, 2016, doi:
10.1016/j.is.2015.07.006.
[58] L. Martin, Sports Performance Measurement and Analytics: The Science of
Assessing Performance, Predicting Future Outcomes, Interpreting Statistical
Models, and Evaluating the Market Value of Athletes. Pearson Education LTD.,
2016. doi: 10.1007/s11587-007-0017-2.
[59] M. van Bommel and L. Bornn, “Adjusting for scorekeeper bias in NBA box
scores,” Data Min. Knowl. Discov., vol. 31, no. 6, pp. 1622–1642, 2017, doi:
10.1007/s10618-017-0497-y.
[60] R. Silva, Minusha, “Sports Analytics,” 2016. [Online]. Available:
http://eprints.whiterose.ac.uk/91467/
[61] B. C. Alamar, Sports analytics: A guide for coaches, managers, and other
decision makers. Columbia University Press, 2013.
[62] I. Bhandari, E. Colet, J. Parker, Z. Pines, R. Pratap, and K. Ramanujam,
“Advanced Scout: Data Mining and Knowledge Discovery in NBA Data,” Data
Min. Knowl. Discov., vol. 125, pp. 121–125, 1997.
[63] G. Vinue and I. Epifanio, “Archetypoid analysis for sports analytics,” Data Min.
Knowl. Discov., vol. 31, no. 6, pp. 1643–1677, 2017, doi: 10.1007/s10618-017-
0514-1.
[64] T. B. Swartz, “Working in Sports Analytics,” pp. 12–15, 2009, [Online].
Available: https://orinsports.com/wp-content/uploads/2015/12/Tim-Swartz-Working-in-sports-analytics.pdf
[65] B. Marr, Big Data: Using SMART big data, analytics and metrics to make better
decisions and improve performance. John Wiley & Sons, 2015.
[66] H. Arrieta et al., “Relative age effect and performance in the U16, U18 and U20
European Basketball Championships,” J. Sports Sci., vol. 0414,
no. March, 2016, doi: 10.1080/02640414.2015.1122204.
[67] J. Fernández, “From Training to Match Performance: An Exploratory and
Predictive Analysis on F.C. Barcelona GPS Data,” Facultat d’Informàtica de
Barcelona, 2016.
[68] F. Bianchi, T. Facchinetti, and P. Zuccolotto, “Role revolution: Towards a new
meaning of positions in basketball,” Electron. J. Appl. Stat. Anal., vol. 10, no. 03,
pp. 712–734, 2017, doi: 10.1285/i20705948v10n3p712.
[69] A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow.
O’Reilly Media, 2017. doi: 10.3389/fninf.2014.00014.
[70] J. Foreman, Data Smart. Wiley, 2014.
[71] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed.
Morgan Kaufmann, 2011. doi: 10.1016/B978-0-12-381479-1.00001-0.
[72] C. M. Bishop, Machine Learning and Pattern Recognition. Springer US, 2007.
[73] P. Tzirakis and C. Tjortjis, “T3C: improving a decision tree classification
algorithm’s interval splits on continuous attributes,” Adv. Data Anal. Classif.,
vol. 11, no. 2, pp. 353–370, 2017, doi: 10.1007/s11634-016-0246-x.
[74] I. Witten, E. Frank, and M. Hall, Data Mining: Practical Machine Learning Tools
and Techniques, 3rd ed. Elsevier Inc., 2011.
[75] R. Ul Mustafa, M. S. Nawaz, M. I. U. Lali, T. Zia, and W. Mehmood, “Predicting
the Cricket match outcome using crowd opinions on social networks: A
comparative study of machine learning methods,” Malaysian J. Comput. Sci., vol.
30, no. 1, pp. 63–76, 2017, doi: 10.22452/mjcs.vol30no1.5.
[76] T. M. Mitchell, Machine Learning. McGraw-Hill Science, 1997. doi:
10.1016/j.cub.2007.11.035.
[77] S. Raschka, Python Machine Learning: Unlock deeper insights into machine
learning with this vital guide to cutting-edge predictive analytics. 2016.
[78] M. R. Berthold, Journeys to Data Mining. Springer US, 2012. doi: 10.1007/978-
3-642-28047-4.
[79] D. Miljković, L. Gajić, A. Kovačević, and Z. Konjović, “The use of data mining
for basketball matches outcomes prediction,” in SISY 2010 - 8th IEEE
International Symposium on Intelligent Systems and Informatics, 2010, no.
September, pp. 309–312. doi: 10.1109/SISY.2010.5647440.
[80] S. M. Ghafari and C. Tjortjis, “A Survey on Association Rules Mining Using
Heuristics,” WIREs Data Min. Knowl. Discov., vol. 9, no. 4, 2019.
[81] S. Yakhchi, S. M. Ghafari, C. Tjortjis, and M. Fazeli, “ARMICA-Improved: A
New Approach for Association Rule Mining,” in Springer International
Publishing, no. 3, 2017, pp. 296–306. doi: 10.1007/978-3-319-63558-3_25.
[82] L. Dong and C. Tjortjis, “Experiences of Using a Quantitative Approach for
Mining Association Rules,” in Handbook of Research on Emerging Rule-Based
Languages and Technologies, no. May, 2003, pp. 693–700. doi: 10.1007/978-3-
540-45080-1_93.
[83] K. Wang and R. Zemel, “Classifying NBA Offensive Plays Using Neural
Networks,” MIT Sloan Sport. Anal. Conf., pp. 1–9, 2016.
[84] F. Provost and T. Fawcett, Data Science for Business: What You Need to Know
about Data Mining and Data Analytic Thinking. O’Reilly, 2013.
[85] J. Ledolter, Data Mining and Business Analytics with R. Wiley, 2013. doi:
10.1007/978-1-4614-6080-0.
[86] J. Gudmundsson and M. Horton, “Spatio-Temporal Analysis of Team Sports --
A Survey,” 2016, doi: 10.1145/3054132.
[87] C. Tjortjis and J. Keane, “T3: A Classification Algorithm for Data Mining,” in
Lecture Notes in Computer Science (including subseries Lecture Notes in
Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 2412, no.
August 2002, 2002, pp. 50–55. doi: 10.1007/3-540-45675-9_9.
[88] A. Franks, A. Miller, L. Bornn, and K. Goldsberry, “Characterizing the spatial
structure of defensive skill in professional basketball,” Ann. Appl. Stat., vol. 9,
no. 1, pp. 94–121, 2015, doi: 10.1214/14-AOAS799.
[89] L. Lefebure, “Understanding Player Positions in the NBA,” Stanford Univ., pp.
1–5, 2014.
[90] R. Nagarajan and L. Li, “Optimizing NBA player selection strategies based on
salary and statistics analysis,” Proc. - 2017 IEEE 15th Int. Conf. Dependable,
Auton. Secur. Comput. 2017 IEEE 15th Int. Conf. Pervasive Intell. Comput. 2017
IEEE 3rd Int. Conf. Big Data Intell. Compu, vol. 2018-Janua, pp. 1076–1083,
2018, doi: 10.1109/DASC-PICom-DataCom-CyberSciTec.2017.175.
[91] V. Madhavan, “Predicting NBA Game Outcomes with Hidden Markov Models,”
2017.
[92] D. Cervone, L. Bornn, and K. Goldsberry, “NBA Court Realty Intro: the
Basketball Court is a Real Estate Market,” MIT Sloan Sport. Anal. Conf., no. 1,
pp. 1–8, 2016.
[93] A. Franks, A. Miller, L. Bornn, and K. Goldsberry, “Counterpoints: Advanced
Defensive Metrics for NBA Basketball,” MIT Sloan Sport. Anal. Conf., pp. 1–8,
2015.
[94] D. Lutz, “A cluster analysis of NBA players,” 2010.
[95] L. Sha et al., “Interactive sports analytics: An intelligent interface for utilizing
trajectories for interactive sports play retrieval and analytics,” ACM Trans.
Comput. Interact., vol. 25, no. 2, 2018, doi: 10.1145/3185596.
[96] A. Nistala and J. Guttag, “Using Deep Learning to Understand Patterns of Player
Movement in the NBA,” MIT Sloan Sport. Anal. Conf., pp. 1–14, 2019.
[97] E. Papalexakis and K. Pelechrinis, “tHoops: A Multi-Aspect Analytical
Framework for Spatio-Temporal Basketball Data,” 2017. [Online]. Available:
http://arxiv.org/abs/1712.01199
[98] A. Miller, L. Bornn, and K. Goldsberry, “Factorized Point Process Intensities: A
Spatial Analysis of Professional Basketball,” vol. 32, pp. 1–9, 2014.
[99] Y. Kanellopoulos, P. Antonellis, C. Tjortjis, C. Makris, and N. Tsirakis, “K-
attractors: A partitional clustering algorithm for numeric data analysis,” Appl.
Artif. Intell., vol. 25, no. 2, pp. 97–115, 2011, doi:
10.1080/08839514.2011.534590.
[100] L. Csató, “The UEFA Champions League seeding is not strategy-proof since the
2015/16 season,” Ann. Oper. Res., vol. 292, no. 1, pp. 161–169, 2020, doi:
10.1007/s10479-020-03637-1.
[101] J. Brooks, M. Kerr, and J. Guttag, “Using machine learning to draw inferences
from pass location data in soccer,” Stat. Anal. Data Min., vol. 9, no. 5, pp. 338–
349, 2016, doi: 10.1002/sam.11318.
[102] I. Markov, E. Stamatatos, and G. Sidorov, Improving cross-topic authorship
attribution: The role of pre-processing, vol. 10762 LNCS, no. Cic. Springer
International Publishing, 2018. doi: 10.1007/978-3-319-77116-8_21.
[103] A. Kucharski, The Perfect Bet: How Science and Math Are Taking the Luck Out
of Gambling. Basic Books, 2016. [Online]. Available:
http://repositorio.unan.edu.ni/2986/1/5624.pdf
[104] RealGM, “Basketball Real GM,” 2021.
https://basketball.realgm.com/nba/info/rookie_scale/2021 (accessed Jan. 10,
2021).
[105] W. Winston, Mathletics: how gamblers, managers, and sports enthusiasts use
mathematics in baseball, basketball, and football. Princeton University Press,
2009. doi: 796.0151.
[106] A. Motomura, K. V. Roberts, D. M. Leeds, and M. A. Leeds, “Does It Pay to
Build Through the Draft in the National Basketball Association?,” J. Sports
Econom., vol. 17, no. 5, pp. 501–516, 2016, doi: 10.1177/1527002516641169.
[107] PsychGuides, “Male body image and the average athlete,” 2021.
https://www.psychguides.com/interact/male-body-image-and-the-average-athlete/
(accessed Jan. 10, 2021).
[108] N. H. Amin, A. B. Old, L. P. Tabb, R. Garg, N. Toossi, and D. L. Cerynik,
“Performance outcomes after repair of complete achilles tendon ruptures in
National Basketball Association players,” Am. J. Sports Med., vol. 41, no. 8, pp.
1864–1868, 2013, doi: 10.1177/0363546513490659.
[109] V. Sarlis, V. Chatziilias, C. Tjortjis, and D. Mandalidis, “A Data Science
approach analysing the Impact of Injuries on Basketball Player and Team
Performance,” Inf. Syst., vol. 99, p. 16, 2021, doi: 10.1016/j.is.2021.101750.
[110] N. Silver, “CARMELO NBA player projections,” 2015.
[111] W. Feng, C. Y. Lim, T. Maiti, and Z. Zhang, “Spatial regression and estimation
of disease risks: A clustering-based approach,” Stat. Anal. Data Min., vol. 9, no.
6, pp. 417–434, 2016, doi: 10.1002/sam.11314.
[112] L. Pham, S. Anand, and J. Piette, “Evaluating Basketball Player Performance via
Statistical Network Modeling,” MIT Sloan Sport. Anal. Conf., no. June, p. 12,
2011, [Online]. Available: https://www.researchgate.net/publication/267963402
[113] R. P. Schumaker, O. K. Solieman, and H. Chen, “Sports Knowledge Management
and Data Mining,” in Annual Review of Information Science and Technology,
vol. 44, Springer, 2010, pp. 115–158. doi:
10.1002/aris.2010.1440440110.
[114] S. Davis, “LeBron James diet, workouts, treatment cost him $1.5 million in body
care - Business Insider.” https://www.businessinsider.com/how-lebron-james-spends-money-body-care-2018-7
(accessed Dec. 12, 2020).
[115] J. Orchard and J. Hayes, “Using the world wide web to conduct epidemiological
research: an example using the national basketball association,” Int. Sport. J, vol.
2, no. 2, pp. 1–15, 2001.
[116] P. C. Yeh, C. Starkey, S. Lombardo, G. Vitti, and F. D. Kharrazi, “Epidemiology
of isolated meniscal injury and its effect on performance in athletes from the
National Basketball Association,” Am. J. Sports Med., vol. 40, no. 3, pp. 589–
594, 2012, doi: 10.1177/0363546511428601.
[117] P. P. Salehi, A. Heiser, S. J. Torabi, B. Azizzadeh, J. Lee, and Y. H. Lee, “Facial
Fractures and the National Basketball Association: Epidemiology and
Outcomes,” Laryngoscope, vol. 130, no. 12, pp. E824–E832, 2020, doi:
10.1002/lary.28690.
[118] M. M. Herzog et al., “Ankle Sprains in the National Basketball Association,
2013-2014 Through 2016-2017,” Am. J. Sports Med., vol. 47, no. 11, pp. 2651–
2658, 2019, doi: 10.1177/0363546519864678.
[119] NBA.com, “NBA.com,” 2023. https://stats.nba.com (accessed Nov. 01, 2023).
[120] C. D. Mack et al., “The establishment and refinement of the national Basketball
Association player injury and illness database,” J. Athl. Train., vol. 54, no. 5, pp.
466–471, 2019, doi: 10.4085/1062-6050-18-19.
[121] H. Talukder et al., “Preventing in-game injuries for NBA players,”
MIT Sloan Sport. Anal. Conf., vol. 2015, pp. 1–13, 2016.
[122] J. Plunkett, “Global Sports Industry Soars to $1.3 Trillion,” Plunkett Research
Online, 2020. https://www.plunkettresearch.com/global-sports-industry-soars-to-1-3-trillion/
(accessed Jan. 20, 2021).
[123] V. A. Matheson, “Contrary Evidence on the Economic Effect of the Super Bowl
on the Victorious City,” J. Sports Econom., vol. 6, no. 4, pp. 420–428, 2005, doi:
10.1177/1527002504267489.
[124] P. S. Gill and T. B. Swartz, “A characterization of the degree of weak and strong
links in doubles sports,” J. Quant. Anal. Sport., vol. 15, no. 2, pp. 155–162, 2019,
doi: 10.1515/jqas-2018-0080.
[125] R. Beal, T. J. Norman, and S. D. Ramchurn, “Artificial intelligence for team
sports: A survey,” Knowl. Eng. Rev., vol. 34, pp. 1–37, 2019, doi:
10.1017/S0269888919000225.
[126] J. J. Catalfano, “Moneyball to Moreyball: How Analytics Have Shaped the NBA
Today,” 2015. [Online]. Available:
http://fisherpub.sjfc.edu/sport_undergrad/106/
[127] M. Lewis, Moneyball: the art of winning an unfair game. 2004. doi:
10.5860/CHOICE.41-4733.
[128] NBA Basketball Reference, “NBA Basketball Reference,” 2021.
https://www.basketball-reference.com/ (accessed Aug. 20, 2023).
[129] ESPN NBA Stats, “ESPN NBA Stats,” 2023. https://www.espn.com/nba/stats
(accessed Jan. 10, 2023).
[130] A. Cohan, J. Schuster, and J. Fernandez, “A deep learning approach to injury
forecasting in NBA basketball,” J. Sport. Anal., vol. 7, no. 4, pp. 277–289, 2021,
doi: 10.3233/jsa-200529.
[131] M. C. Malamatinos, E. Vrochidou, and G. A. Papakostas, “On Predicting Soccer
Outcomes in the Greek League Using Machine Learning,” Computers, vol. 11,
no. 9, 2022, doi: 10.3390/computers11090133.
[132] A. Cortez, A. Trigo, and N. Loureiro, “Football Match Line-Up Prediction Based
on Physiological Variables: A Machine Learning Approach,” Computers, vol.
11, no. 3, 2022, doi: 10.3390/computers11030040.
[133] G. S. Bullock, T. Ferguson, A. H. Arundale, C. L. Martin, G. S. Collins, and S.
Kluzek, “Return to performance following severe ankle, knee, and hip injuries in
National Basketball Association players,” PNAS Nexus, vol. 1, no. 4, pp. 1–7,
2022, doi: 10.1093/pnasnexus/pgac176.
[134] N. Mateus et al., “Clustering performance in the European Basketball according
to players’ characteristics and contextual variables,” Int. J. Sport. Sci. Coach.,
vol. 15, no. 3, pp. 405–411, 2020, doi: 10.1177/1747954120911308.
[135] J. Nakase, K. Kitaoka, Y. Shima, T. Oshima, G. Sakurai, and H. Tsuchiya, “Risk
factors for noncontact anterior cruciate ligament injury in female high school
basketball and handball players: A prospective 3-year cohort study,” Asia-Pacific
J. Sport. Med. Arthrosc. Rehabil. Technol., vol. 22, pp. 34–38, 2020, doi:
10.1016/j.asmart.2020.06.002.
[136] S. Jauhiainen et al., “New Machine Learning Approach for Detection of Injury
Risk Factors in Young Team Sport Athletes,” Int. J. Sports Med., vol. 42, no. 2,
pp. 175–182, 2021, doi: 10.1055/a-1231-5304.
[137] S. Kaplan, “The Economic Value of Popularity: Evidence from Superstars in the
National Basketball Association,” SSRN Electron. J., p. 50, 2020, doi:
10.2139/ssrn.3543686.
[138] D. L. Marks, C. Vinegoni, J. S. Bredfeldt, and S. A. Boppart, “Methods, systems
and software programs for enhanced sports analytics and applications,” Dec. 06,
2015 doi: 10.1063/1.1829162.
[139] D. B. McKeag, Handbook of Sports Medicine and Science. Boca Raton, FL:
CRC Press, 2003. doi: 10.1201/9781351060073.
[140] T. Krosshaug et al., “Mechanisms of anterior cruciate ligament injury in
basketball: Video analysis of 39 cases,” Am. J. Sports Med., vol. 35, no. 3, pp.
359–367, 2007, doi: 10.1177/0363546506293899.
[141] B. Li and X. Xu, “Application of Artificial Intelligence in Basketball Sport,” J.
Educ. Heal. Sport, vol. 11, no. 7, pp. 54–67, 2021, doi:
10.12775/jehs.2021.11.07.005.
[142] Z. Terner and A. Franks, “Modeling player and team performance in basketball,”
Annu. Rev. Stat. Its Appl., vol. 8, pp. 1–23, 2021, doi: 10.1146/annurev-statistics-
040720-015536.
[143] L. Torres-Ronda, I. Gámez, S. Robertson, and J. Fernández, “Epidemiology and
injury trends in the National Basketball Association: Pre- and per-COVID-19
(2017–2021),” PLoS One, vol. 17, no. 2, pp. 1–19, 2022, doi:
10.1371/journal.pone.0263354.
[144] N. J. Vaudreuil, C. F. van Eck, S. J. Lombardo, and F. D. Kharrazi, “Economic
and Performance Impact of Anterior Cruciate Ligament Injury in National
Basketball Association Players,” Orthop. J. Sport. Med., vol. 9, no. 9, pp. 1–6,
2021, doi: 10.1177/23259671211026617.
[145] Q. Louw, K. Grimmer, and C. Vaughan, “Knee movement patterns of injured and
uninjured adolescent basketball players when landing from a jump: A case-
control study,” BMC Musculoskelet. Disord., vol. 7, pp. 1–7, 2006, doi:
10.1186/1471-2474-7-22.
[146] V. Neilson, S. Ward, P. Hume, G. Lewis, and A. McDaid, “Effects of augmented
feedback on training jump landing tasks for ACL injury prevention: A systematic
review and meta-analysis,” Phys. Ther. Sport, vol. 39, pp. 126–135, 2019, doi:
10.1016/j.ptsp.2019.07.004.
[147] I. O. Afara et al., “Machine Learning Classification of Articular Cartilage
Integrity Using Near Infrared Spectroscopy,” Cell. Mol. Bioeng., vol. 13, no. 3,
pp. 219–228, 2020, doi: 10.1007/s12195-020-00612-5.
[148] M. F. Aljunid and D. H. Manjaiah, Data Management, Analytics and Innovation,
vol. 808. Springer, 2019. doi: 10.1007/978-981-13-1402-5.
[149] M. McClusky, Faster, Higher, Stronger: How Sports Science Is Creating a New
Generation of Superathletes and What We Can Learn from Them. Cambridge
University Press, 2014. [Online]. Available:
https://www.cambridge.org/core/product/identifier/CBO9781107415324A009/type/book_part
[150] V. Sarlis, G. Papageorgiou, and C. Tjortjis, “Sports Analytics and Text Mining
NBA Data to Assess Recovery from Injuries and their Economic Impact,” MDPI
- Comput., 2023.
[151] C. Richter, M. O’Reilly, and E. Delahunt, “Machine learning in sports science:
challenges and opportunities,” Sport. Biomech., vol. 00, no. 00, pp. 1–7, Apr.
2021, doi: 10.1080/14763141.2021.1910334.
[152] S. Kato and S. Yamagiwa, “Predicting Successful Throwing Technique in Judo
from Factors of Kumite Posture Based on a Machine-Learning Approach,”
Computation, vol. 10, no. 10, pp. 1–17, 2022, doi:
10.3390/computation10100175.
[153] M. Frank, D. Drikakis, and V. Charissis, “Machine-learning methods for
computational science and engineering,” Computation, vol. 8, no. 1, pp. 1–35,
2020, doi: 10.3390/computation8010015.
[154] M. Mirmozaffari, A. Alinezhad, and A. Gilanpour, “Data Mining Apriori
Algorithm for Heart Disease Prediction,” Int. J. Comput. Commun. Instrum. Eng.,
vol. 4, no. 1, 2017, doi: 10.15242/ijccie.dir1116010.
[155] J. F. Abulhasan and M. J. Grey, “Anatomy and physiology of knee stability,” J.
Funct. Morphol. Kinesiol., vol. 2, no. 4, 2017, doi: 10.3390/jfmk2040034.
[156] S. Tedesco et al., “Motion sensors-based machine learning approach for the
identification of anterior cruciate ligament gait patterns in on-the-field activities
in rugby players,” Sensors (Switzerland), vol. 20, no. 11, 2020, doi:
10.3390/s20113029.
[157] J. G. Claudino, D. de O. Capanema, T. V. de Souza, J. C. Serrão, A. C. Machado
Pereira, and G. P. Nassis, “Current Approaches to the Use of Artificial
Intelligence for Injury Risk Assessment and Performance Prediction in Team
Sports: a Systematic Review,” Sport. Med. - Open, vol. 5, no. 1, 2019, doi:
10.1186/s40798-019-0202-3.
[158] Y. Lao et al., “Diagnostic accuracy of machine-learning-assisted detection for
anterior cruciate ligament injury based on magnetic resonance imaging: Protocol
for a systematic review and meta-analysis,” Med. (United States), vol. 98, no. 50,
pp. 1–5, 2019, doi: 10.1097/MD.0000000000018324.
[159] E. Dritsas and M. Trigka, “Efficient Data-Driven Machine Learning Models for
Cardiovascular Diseases Risk Prediction,” Sensors, vol. 23, no. 3, p. 1161, Jan.
2023, doi: 10.3390/s23031161.
[160] J. Mateos Conde, M. T. Cabero Morán, and C. Moreno Pascual, “Prospective
epidemiological study of basketball injuries during one competitive season in
professional and amateur Spanish basketball,” Phys. Sportsmed., vol. 50, no. 4,
pp. 349–358, 2022, doi: 10.1080/00913847.2021.1943721.
[161] G. Papageorgiou, V. Sarlis, and C. Tjortjis, “Unsupervised Learning in NBA
Injury Recovery: Advanced Data Mining to Decode Recovery Durations and
Economic Impacts,” Inf., vol. 15, no. 1, 2024, doi: 10.3390/info15010061.
[162] T. Allen and J. E. Goff, “Resources for sports engineering education,” Sport.
Eng., vol. 21, no. 4, pp. 245–253, 2018, doi: 10.1007/s12283-017-0250-1.
[163] T. E. Hewett, G. D. Myer, K. R. Ford, M. V. Paterno, and C. E. Quatman,
“Mechanisms, prediction, and prevention of ACL injuries: Cut risk with three
sharpened and validated tools,” J. Orthop. Res., vol. 34, no. 11, pp. 1843–1855,
2016, doi: 10.1002/jor.23414.
[164] T. H. Trojian, A. Cracco, M. Hall, M. Mascaro, G. Aerni, and R. Ragle,
“Basketball injuries: Caring for a basketball team,” Curr. Sports Med. Rep., vol.
12, no. 5, pp. 321–328, 2013, doi: 10.1097/01.CSMR.0000434055.36042.cd.
[165] M. Khan, S. Ekhtiari, T. Burrus, K. Madden, J. P. Rogowski, and A. Bedi,
“Impact of Knee Injuries on Post-retirement Pain and Quality of Life: A Cross-
Sectional Survey of Professional Basketball Players,” HSS J., vol. 16, pp. 327–
332, 2020, doi: 10.1007/s11420-019-09736-5.
[166] E. J. Petushek, D. Sugimoto, M. Stoolmiller, G. Smith, and G. D. Myer,
“Evidence-Based Best-Practice Guidelines for Preventing Anterior Cruciate
Ligament Injuries in Young Female Athletes: A Systematic Review and Meta-
analysis,” Am. J. Sports Med., vol. 47, no. 7, pp. 1744–1753, 2019, doi:
10.1177/0363546518782460.
[167] N. Caplan and D. F. Kader, “Knee Injury Patterns Among Men and Women in
Collegiate Basketball and Soccer: NCAA Data and Review of Literature,” in
Classic Papers in Orthopaedics, London: Springer London, 2014, pp. 153–155.
doi: 10.1007/978-1-4471-5451-8_37.
[168] D. Sundemo et al., “Generalised joint hypermobility increases ACL injury risk
and is associated with inferior outcome after ACL reconstruction: A systematic
review,” BMJ Open Sport Exerc. Med., vol. 5, no. 1, 2019, doi: 10.1136/bmjsem-
2019-000620.
[169] C. M. Powers, “The influence of abnormal hip mechanics on knee injury: A
biomechanical perspective,” J. Orthop. Sports Phys. Ther., vol. 40, no. 2, pp. 42–
51, 2010, doi: 10.2519/jospt.2010.3337.
[170] G. T. G. Hughes, V. Camomilla, B. Vanwanseele, A. J. Harrison, D. T. P. Fong,
and E. J. Bradshaw, “Novel technology in sports biomechanics: some words of
caution,” Sport. Biomech., vol. 00, no. 00, pp. 1–9, Apr. 2021, doi:
10.1080/14763141.2020.1869453.
[171] C. V. Andreoli, B. C. Chiaramonti, E. Buriel, A. D. C. Pochini, B. Ejnisman, and
M. Cohen, “Epidemiology of sports injuries in basketball: Integrative systematic
review,” BMJ Open Sport Exerc. Med., vol. 4, no. 1, 2018, doi: 10.1136/bmjsem-
2018-000468.
[172] B. S. Kester, O. A. Behery, S. V. Minhas, and W. K. Hsu, “Athletic performance
and career longevity following anterior cruciate ligament reconstruction in the
National Basketball Association,” Knee Surgery, Sport. Traumatol. Arthrosc.,
vol. 25, no. 10, pp. 3031–3037, 2017, doi: 10.1007/s00167-016-4060-y.
[173] F. Vidal-Codina, N. Evans, B. El Fakir, and J. Billingham, “Automatic event
detection in football using tracking data,” Sport. Eng., vol. 25, no. 1, pp. 1–15,
2022, doi: 10.1007/s12283-022-00381-6.
[174] V. Sarlis and C. Tjortjis, “Assessing Economic and Performance Impact of
Injuries, Age and Position on NBA Players Using Data Mining and Sports
Analytics,” p. 25.
[175] E. Alonso Pérez-Chao, A. Lorenzo, A. Scanlan, P. Lisboa, C. Sosa, and M. Á.
Gómez, “Higher Playing Times Accumulated Across Entire Games and Prior to
Intense Passages Reduce the Peak Demands Reached by Elite, Junior, Male
Basketball Players,” Am. J. Mens. Health, vol. 15, no. 5, 2021, doi:
10.1177/15579883211054353.
[176] S. M. Short, C. W. MacDonald, and D. Strack, “Hip and groin injury prevention
in elite athletes and team sport – current challenges and opportunities,” Int. J.
Sports Phys. Ther., vol. 16, no. 1, pp. 270–281, 2021, doi: 10.26603/001c.18705.
[177] T. Chartier, “Embracing Imperfection,” J. Corp. Account. Financ., vol. 27, no. 4,
pp. 67–68, May 2016, doi: 10.1002/jcaf.22162.
[178] P. Vaz de Melo, V. Almeida, A. Loureiro, and C. Faloutsos, “Forecasting in the
NBA and Other Team Sports: Network Effects in Action,” ACM Trans. Knowl.
Discov. Data, vol. 6, no. 3, pp. 1–27, 2012, doi: 10.1145/2362383.2362387.
[179] A. M. Franks, A. D’Amour, D. Cervone, and L. Bornn, “Meta-analytics: Tools
for understanding the statistical properties of sports metrics,” J. Quant. Anal.
Sport., vol. 12, no. 4, pp. 151–165, 2016, doi: 10.1515/jqas-2016-0098.
[180] U.S. Bureau of Labor Statistics, “U.S. Bureau of Labor Statistics,” 2022.
https://www.bls.gov/ (accessed Feb. 01, 2022).
[181] B. J. Jansen, K. Moore, and S. Carman, “Evaluating the performance of
demographic targeting using gender in sponsored search,” Inf. Process. Manag.,
vol. 49, no. 1, pp. 286–302, 2013, doi: 10.1016/j.ipm.2012.06.001.
[182] Y. Yang, “Predicting Regular Season Results of NBA Teams Based on
Regression Analysis of Common Basketball Statistics,” 2015. doi:
10.1145/3132847.3132886.
[183] P. D’Urso, L. De Giovanni, and R. Massari, “Trimmed fuzzy clustering of
financial time series based on dynamic time warping,” Ann. Oper. Res., vol. 299,
no. 1–2, pp. 1379–1395, 2021, doi: 10.1007/s10479-019-03284-1.
[184] A. Zimmermann, “Basketball predictions in the NCAAB and NBA: Similarities
and differences,” Stat. Anal. Data Min., vol. 9, no. 5, pp. 350–364, 2016, doi:
10.1002/sam.11319.
[185] D. Oliver, Basketball on Paper: Rules and Tools for Performance Analysis. 2004.
[186] A. Groll and D. Liebl, “Editorial special issue: Statistics in sports,” AStA Adv.
Stat. Anal., no. 0123456789, 2022, doi: 10.1007/s10182-022-00453-9.
[187] Investopedia, “U.S. Inflation Rate by Year: 1929–2023.”
https://www.investopedia.com/inflation-rate-by-year-7253832 (accessed Sep.
01, 2023).
[188] C. M. Group, “US Inflation Calculator,” 2023.
https://www.usinflationcalculator.com/inflation/current-inflation-rates/
(accessed Sep. 01, 2023).
[189] P. D’Urso, L. De Giovanni, and R. Massari, “Smoothed self-organizing map for
robust clustering,” Inf. Sci. (Ny)., vol. 512, no. xxxx, pp. 381–401, 2020, doi:
10.1016/j.ins.2019.06.038.
[190] P. Downward, A. Dawson, and T. Dejonghe, Sports Economics: Theory,
Evidence and Policy. Elsevier, 2009. [Online]. Available:
www.elsevierdirect.com
[191] K. Pelechrinis, “Linnet: Probabilistic lineup evaluation through network
embedding,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell.
Lect. Notes Bioinformatics), vol. 11053 LNAI, pp. 20–36, 2019, doi:
10.1007/978-3-030-10997-4_2.
[192] J. Sill, “Improved NBA Adjusted + / - Using Regularization and Out-of-Sample
Testing,” in MIT sloan sports analytics conference, 2010, pp. 1–7.
[193] “www.bball-index.com.” https://www.bball-index.com/player-impact-plus-minus/
(accessed Sep. 15, 2019).
[194] Y. Chen, Y. Gong, and X. Li, “Evaluating NBA player performance using
bounded integer data envelopment analysis,” INFOR Inf. Syst. Oper. Res.,
vol. 55, no. 1, pp. 38–51, 2017, doi: 10.1080/03155986.2016.1262581.
[195] “Databall - Advanced Stats in the NBA for Dummies.”
https://nbastatsgeeks.wordpress.com/newinnovative-apbrmetrics/expected-possession-value-epv/
[196] D. Cervone, A. D’Amour, L. Bornn, and K. Goldsberry, “POINTWISE:
Predicting Points and Valuing Decisions in Real Time with NBA Optical
Tracking Data,” MIT Sloan Sport. Anal. Conf., vol. 64, pp. S1081–S1082, 2012,
doi: http://dx.doi.org/10.1002/art.37735.
[197] D. Cervone, A. D’Amour, L. Bornn, and K. Goldsberry, “A Multiresolution
Stochastic Process Model for Predicting Basketball Possession Outcomes,” J.
Am. Stat. Assoc., vol. 111, no. 514, pp. 585–599, 2016, doi:
10.1080/01621459.2016.1141685.
[198] S. J. Miller, “A derivation of the pythagorean won-loss formula in baseball,” vol.
30, pp. 1–13, 2006.
[199] M. Sachanidi, N. Apostolidis, and D. Chatzicharistos, “Passing efficacy of young
basketball players: test or observation?,” Int. J. Perform. Anal. Sport, 2013.
[200] T. Chartier, “Coachable Business Results,” J. Corp. Account. Financ., vol. 27,
no. 1, pp. 83–85, Nov. 2015, doi: 10.1002/jcaf.
[201] M. Goldberg, “Evaluating Lineups and Complementary Play Styles in the NBA,”
2018.
[202] T. Chartier, “Valuing Data,” Wiley Online Library, vol. 28, no. 2, pp. 88–89, Jan.
2017. doi: 10.1002/jcaf.
[203] Y. Li, L. Wang, and F. Li, “A data-driven prediction approach for sports team
performance and its application to National Basketball Association,” Omega -
Elsevier, no. September, p. 102123, 2019, doi: 10.1016/j.omega.2019.102123.
[204] O. Levchenko et al., “BestNeighbor: efficient evaluation of kNN queries on large
time series databases,” Knowl. Inf. Syst., 2020, doi: 10.1007/s10115-020-01518-
4.
[205] A. B. Downey, Think Stats: Probability and Statistics for Programmers, vol. 70,
no. 2. 2011. doi: 10.1017/CBO9781107415324.004.
[206] Statathlon.com, “Statathlon.com,” 2021. https://statathlon.com/ (accessed Sep.
20, 2020).
[207] J. B. Bension, “The Importance of NBA Box Score Statistics and the Value of
Statistical Outbursts,” 2019. [Online]. Available:
http://repository.library.csuci.edu/bitstream/handle/10211.3/214528/Bension,
Jack MSCS Thesis F 19_done.pdf?sequence=1
[208] S. Serrano, Basketball (and Other Things): A Collection of Questions Asked,
Answered, Illustrated. Abrams Image, 2017.

[209] M. Teramoto, C. L. Cross, D. M. Cushman, T. G. Maak, D. J. Petron, and S. E.
Willick, “Game injuries in relation to game schedules in the National Basketball
Association,” J. Sci. Med. Sport, vol. 20, no. 3, pp. 230–235, 2017, doi:
10.1016/j.jsams.2016.08.020.
[210] M. Á. Gómez Ruano, L. Gasperi, and C. Lupo, “Performance analysis of game
dynamics during the 4th game quarter of NBA close games,” Int. J. Perform.
Anal. Sport, vol. 16, no. 1, pp. 249–263, 2016, doi:
10.1080/24748668.2016.11868884.
[211] H. Manner, “Modeling and forecasting the outcomes of NBA basketball games,”
J. Quant. Anal. Sport., vol. 12, no. 1, p. 25, 2016, doi: 10.1515/jqas-2015-0088.
[212] G. Cheng, Z. Zhang, M. N. Kyebambe, and N. Kimbugwe, “Predicting the
outcome of NBA playoffs based on the maximum entropy principle,” Entropy,
vol. 18, no. 12, pp. 1–15, 2016, doi: 10.3390/e18120450.
[213] E. Siegel, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie,
Or Die. Wiley, 2013.
[214] S. Zagelmeyer and P. J. Gollan, “Exploring terra incognita: preliminary
reflections on the impact of the global financial crisis upon human resource
management,” Int. J. Hum. Resour. Manag., vol. 23, no. 16, pp. 3287–3294,
2012, doi: 10.1080/09585192.2012.689158.
[215] C. Wheelan, Naked Statistics: Stripping the Dread from the
Data. W. W. Norton & Company, 2013. doi: 10.1080/09332480.2014.890874.
[216] O. A. Donoghue, A. J. Harrison, N. Coffey, and K. Hayes, “Functional data
analysis of running kinematics in Chronic Achilles tendon injury,” Med. Sci.
Sports Exerc., vol. 40, no. 7, pp. 1323–1335, 2008, doi:
10.1249/MSS.0b013e31816c4807.
[217] H. R. Thornton, J. A. Delaney, G. M. Duthie, and B. J. Dascombe, “Developing
athlete monitoring systems in team sports: Data analysis and visualization,” Int.
J. Sports Physiol. Perform., pp. 698–705, 2019.
[218] J. Christmann, M. Akamphuber, A. L. Müllenbach, and A. Güllich, “Crunch time
in the NBA – The effectiveness of different play types in the endgame of close
matches in professional basketball,” Int. J. Sport. Sci. Coach., vol. 13, no. 6, pp.
1090–1099, 2018, doi: 10.1177/1747954118772485.
[219] V. Sarlis, G. Papageorgiou, and C. Tjortjis, “Injury Patterns and Impact on
Performance in the NBA League Using Sports Analytics,” Computation, vol. 12,
no. 2, p. 36, Feb. 2024, doi: 10.3390/computation12020036.
[220] A. C. Miller and L. Bornn, “Possession Sketches: Mapping NBA Strategies,”
MIT Sloan Sport. Anal. Conf., pp. 1–12, 2017.
[221] C. R. Berry and A. Fowler, “How Much Do Coaches Matter?,” MIT Sloan Sport.
Anal. Conf., no. 2011, pp. 1–25, 2019, [Online]. Available:
http://freakonomics.com/2012/12/21/is-changing-the-coach-really-the-answer/.
[222] A. D’Amour, D. Cervone, L. Bornn, and K. Goldsberry, “Move or Die: How
Ball Movement Creates Open Shots in the NBA,” MIT Sloan Sport. Anal. Conf.,
2015.
[223] J. Mortensen and L. Bornn, “From Markov models to Poisson point processes:
Modeling movement in the NBA,” MIT Sloan Sport. Anal. Conf., pp. 1–10, 2019,
[Online]. Available:
http://www.sloansportsconference.com/wp-content/uploads/2019/02/Markov-Models.pdf
[224] J. Rosen, P. Arcidiacono, and K. Kimbrough,
“Determining NBA Free Agent Salary from Player Performance,” 2016.
[225] E. Whitmoyer, “Measuring Greatness in the NBA,” 2019.
[226] C. Marmarinos, T. Bolatoglou, K. Karteroliotis, and N. Apostolidis, “Structural
validity and reliability of new index for evaluation of high-level basketball
players,” Int. J. Perform. Anal. Sport, vol. 19, no. 4, pp. 624–631, 2019, doi:
10.1080/24748668.2019.1644803.
[227] J. C. Scheide and M. J. Krebs, “Generating Relative Pick Value in the NBA Draft
and Predicting Success from College Basketball,” 2019.
[228] J. Hu, H. Zhang, and J. Qiu, “Prediction of MVP attribution in NBA regular
match based on BP neural network model,” ACM Int. Conf. Proceeding Ser.,
2019, doi: 10.1145/3358331.3358374.
[229] G. Vinué and I. Epifanio, “Forecasting basketball players’ performance using
sparse functional data,” Stat. Anal. Data Min., vol. 12, no. 6, pp. 534–547, 2019,
doi: 10.1002/sam.11436.
[230] C. Wolf, “An Analysis of NBA Teams’ Spending by Position for the Upcoming
Season,” 2019.
https://www.samford.edu/sports-analytics/fans/2019/An-Analysis-of-NBA-Teams-Spending-by-Position-for-the-Upcoming-Season
(accessed Feb. 21, 2021).
[231] E. Johnsson, “Wanna make money in the NBA? Be a Center,” The official blog
of the Harvard Sports Analysis Collective, 2021. (accessed Jan. 15, 2021).
[232] D. Curcic, “The Ultimate Analysis of NBA Salaries [1991-2019],” 2020.
https://runrepeat.com/salary-analysis-in-the-nba-1991-2019 (accessed Feb. 10,
2021).
[233] M. R. Ward and A. D. Harmon, “ESport Superstars,” J. Sports Econom., vol. 20,
no. 8, pp. 987–1013, 2019, doi: 10.1177/1527002519859417.
[234] W. O’Neil, How to Make Money in Stocks: A Winning System in Good Times and
Bad. McGraw-Hill, 2013.
[235] A. Kalén, A. Pérez-Ferreirós, P. B. Costa, and E. Rey, “Effects of age on physical
and technical performance in National Basketball Association (NBA) players,”
Res. Sport. Med., vol. 00, no. 00, pp. 1–12, 2020, doi:
10.1080/15438627.2020.1809411.
[236] S. Cea, G. Durán, M. Guajardo, D. Sauré, J. Siebert, and G. Zamorano, “An
analytics approach to the FIFA ranking procedure and the World Cup final draw,”
Ann. Oper. Res., vol. 286, no. 1–2, pp. 119–146, 2020, doi:
10.1007/s10479-019-03261-8.
[237] F. Sun, C. Yi, W. Li, and Y. Li, “A wearable H-shirt for exercise ECG monitoring
and individual lactate threshold computing,” Comput. Ind., vol. 92–93, pp. 1–11,
2017, doi: 10.1016/j.compind.2017.06.004.
[238] B. T. Foster and M. D. Binns, “Analytics for the Front Office: Valuing
Protections on NBA Draft Picks,” MIT Sloan Sport. Anal. Conf., pp. 1–26, 2019.
[239] N. F. N. Bittencourt, W. H. Meeuwisse, L. D. Mendonça, J. M. Ocarino, and S.
T. Fonseca, “Complex systems approach for sports injuries: moving from risk
factor identification to injury pattern recognition—narrative review and new
concept,” Br. J. Sports Med., pp. 1309–1314, 2016, doi:
10.1136/bjsports-2015-095850.
[240] A. M. Watson, K. Haraldsdottir, K. Biese, B. Stevens, and T. McGuine, “The
Association of COVID-19 Incidence with Sport and Face Mask Use in United
States High School Athletes,” medRxiv, 2021, [Online]. Available:
https://doi.org/10.1101/2021.01.19.21250116
[241] V. Cordes and L. Olfman, “Sports analytics: Predicting athletic performance with
a genetic algorithm,” AMCIS 2016 Surfing IT Innov. Wave - 22nd Am. Conf. Inf.
Syst., no. 2014, pp. 1–10, 2016.

[242] R. Metulini, “Players Movements and Team Shooting Performance: a Data
Mining approach for Basketball,” arXiv, no. 2018, pp. 1–10, 2018.
[243] S. Yano et al., “Tactics-Trend Analysis for Increasing the Possibility of Shooting
in a Basketball Match,” Proc. 2020 14th Int. Conf. Ubiquitous Inf. Manag.
Commun. IMCOM 2020, 2020, doi: 10.1109/IMCOM48794.2020.9001784.
[244] S. Wu and L. Bornn, “Modeling Offensive Player Movement in Professional
Basketball,” Am. Stat., vol. 72, no. 1, pp. 72–79, 2018, doi:
10.1080/00031305.2017.1395365.
[245] I. Papadaki and M. Tsagris, “Estimating NBA players salary share according to
their performance on court: A machine learning approach,” no. October, 2020.
[246] T. Miller, Sports Analytics and Data Science: Winning the Game with Methods
and Models. Pearson Education, Inc., 2016.
[247] R. M. Achen, “Examining the Influence of Facebook Fans, Content, and
Engagement on Business Outcomes in the National Basketball Association,” J.
Soc. Media Organ., vol. 3, no. 1, 2016.
[248] E. Radicchi and M. Mozzachiodi, “Social talent scouting: A new opportunity for
the identification of football players?,” Phys. Cult. Sport. Stud. Res., vol. 70, no.
1, pp. 28–43, 2016, doi: 10.1515/pcssr-2016-0012.
[249] J. A. Martínez and L. Martínez, “A stakeholder assessment of basketball player
evaluation metrics,” J. Hum. Sport Exerc., vol. 6, no. 1, pp. 153–183, 2011, doi:
10.4100/jhse.2011.61.17.
[250] P. Zuccolotto and M. Manisera, Basketball Data Science: With Applications in
R, vol. 53, no. 9. Taylor & Francis Group, 2020. doi:
10.1017/CBO9781107415324.004.
[251] “APBRMetrics.” http://www.apbr.org/metrics/index.php
[252] “Fivethirtyeight.” https://fivethirtyeight.com/ (accessed Aug. 20, 2019).
[253] “Boxscoregeeks.com.” https://www.boxscoregeeks.com (accessed Aug. 20,
2019).
[254] “Pivot Sports.” https://blog.pivotsports.com/ (accessed Aug. 20, 2019).
