Player Availability Rating
November 8, 2018
1 Introduction
The core competency required of a manager is the ability to assess a player's performance. This is a challenging endeavor, as the performance of a player depends on a number of attributes and understandably varies from season to season. The approach adopted by this paper is to create a model that uses only player characteristics, such as height, weight, expected time on ice and expected shooting percentage, to create a projection for a player. Various regression-based models were created in order to estimate the offensive production of a player in the form of

Figure 1: Visual representation of the flow of data for producing the PAR rating
3 Formal Description

The performance of players in the National Hockey League varies from season to season for a variety of reasons, such as overall team performance, player usage, physiological attributes and coaching styles. Each of the 31 teams in the league employs an army of scouts responsible for analyzing players over the course of a season, which can be a daunting task. This paper proposes a tool that general managers can use to evaluate the performance of players during the course of a season based on their predicted and actual offensive performances, the long-term and short-term performances of their teams, and their usage per game. The algorithm used to determine the performance of the players is presented in Algorithm 1.

    Result: PAR (Player Availability Rating)
    PPG_predicted, PPG_actual, PPCG_season, PPCG_recent, X_test, Y_test, X_train, Y_train;
    w_o = 2;
    method = {'linearRegression', 'k-NN', 'NeuralNets', 'decisionTree', 'randomForest'};
    model_minerror = choose the method that produces the lowest average error;
    while items in X_test do
        Use model_minerror (the method with the lowest calculated error) to recalculate PPG_predicted;
        Extract the latest PPG values from NHL.com and assign them to PPG_actual;
        Extract team PPCG values (season) from NHL.com and assign them to PPCG_season;
        Extract team PPCG values (recent) from NHL.com and assign them to PPCG_recent;
        PAR = (PPG_predicted - PPG_actual) / PPCG_season + w_o * (PPG_predicted - PPG_actual) / PPCG_recent;
    end
Algorithm 1: PAR Algorithm

Each of the models was implemented using existing libraries (scikit-learn) and optimized by running a grid search over the parameters.

This insight is invaluable, as the players on the list above are either severely under-performing, playing on teams that are not performing well, or both. These players may be more likely to be surrendered by opposing general managers during trade negotiations and could be considered under-the-radar acquisitions with the potential for very high reward. The formula used to determine this rating is presented below:

$$\mathrm{PAR} = \frac{PPG_{predicted} - PPG_{actual}}{PPCG_{season}} + w_o \cdot \frac{PPG_{predicted} - PPG_{actual}}{PPCG_{recent}}$$

where

    PPG_predicted : predicted points/game for the skater
    PPG_actual    : actual points/game for the skater
    PPCG_season   : percentage of points earned by the team during the current season
    PPCG_recent   : percentage of points earned by the team during the last 10 games
    w_o           : a parameter to control the influence of recent team performance on player availability

The difference between the predicted and actual PPG values for each player is indicative of the player's performance. A negative value corresponds to the player out-performing expectations, whereas a positive value means that the player is under-performing. Because the same difference is divided by the team's points percentages, a given shortfall produces a larger PAR when the team is struggling, and with w_o = 2 the last 10 games carry twice the weight of the full season. The short-term and long-term winning percentages of the player's team can be indicative of the pressure that the opposing general manager might be facing to make a move, either to trade for a player or to trade away an under-performing player. Based on other factors such as health and confidence levels, which are challenging to quantify, the player might simply need a change of scenery, and a trade to a different team might be a winning proposition for all parties involved. Regularly using the proposed algorithm can help managers stay on top of such situations.

The PAR estimate captures the essence of existing ratings that are dependent on probabilistic considerations. However, the formulation presented above is not derived from any of these sources, and it does not use a probabilistic method to make predictions.

The PAR estimate uses a combination of neural nets and an empirical formulation to quantify the availability of an individual player during an NHL season. No such formulation was found in the literature. However, a number of sources have attempted to quantify player and team performance using a probabilistic approach [3], the most popular being TrueSkill [3], a Bayesian rating system that has been applied to other sports such as basketball [2] and football [2].
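As a minimal sketch (not the author's implementation), the PAR formula and a ranking of players by PAR can be written in plain Python. It assumes that PPCG_season and PPCG_recent are expressed as fractions in (0, 1]; the names `Skater`, `par` and `rank_by_par` are hypothetical, and the model that produces PPG_predicted is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

W_O = 2.0  # weight on recent team form; Algorithm 1 sets w_o = 2


@dataclass
class Skater:
    """Hypothetical per-player snapshot; field names are illustrative only."""
    name: str
    ppg_predicted: float  # model-predicted points per game
    ppg_actual: float     # actual points per game so far this season
    ppcg_season: float    # team points percentage this season, assumed in (0, 1]
    ppcg_recent: float    # team points percentage over the last 10 games, assumed in (0, 1]


def par(ppg_predicted: float, ppg_actual: float,
        ppcg_season: float, ppcg_recent: float, w_o: float = W_O) -> float:
    """PAR = gap / PPCG_season + w_o * gap / PPCG_recent, with gap = predicted - actual."""
    if ppcg_season <= 0 or ppcg_recent <= 0:
        raise ValueError("team points percentages must be positive")
    gap = ppg_predicted - ppg_actual  # > 0 means the player is under-performing
    return gap / ppcg_season + w_o * gap / ppcg_recent


def rank_by_par(skaters: List[Skater], w_o: float = W_O) -> List[Tuple[str, float]]:
    """Return (name, PAR) pairs sorted from highest to lowest PAR."""
    scored = [(s.name, par(s.ppg_predicted, s.ppg_actual,
                           s.ppcg_season, s.ppcg_recent, w_o))
              for s in skaters]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For made-up inputs PPG_predicted = 0.80, PPG_actual = 0.50, PPCG_season = 0.50 and PPCG_recent = 0.40, `par(0.80, 0.50, 0.50, 0.40)` gives 0.30/0.50 + 2 × 0.30/0.40 = 0.6 + 1.5 = 2.1 (up to floating-point rounding).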
4 Related Work

The following sources were consulted during the literature review:

i.) Forecasting Success in the National Hockey League using In-Game Statistics and Textual Data [7]: This paper utilizes traditional and advanced statistics for individual players on a team to predict how teams will perform over the course of a season. The core concept of using statistics to determine the cumulative performance of players is similar to the idea presented in this paper. However, PAR deliberately avoids advanced statistics and instead makes use of physical player characteristics and their expected usage over the course of an NHL season.

ii.) Estimating the Value of Major League Baseball Players [4]: This paper attempts to quantify the value of players to determine how much they should be paid. The author proposes a formulation that considers a number of features/factors that might determine player value. PAR considers similar features but only looks at offensive contribution.

iii.) Predicting the Major League Baseball Season [6]: This paper uses neural networks to solve a binary classification problem in the form of wins and losses for baseball teams over the course of a Major League Baseball season. The authors use neural networks along with a large amount of data to make these predictions.

iv.) TrueSkill - A Bayesian skill rating system [3]: This paper uses a probabilistic approach to skill assessment to produce a rating based on the outcome of previously played games, using chess rankings to illustrate the approach.

v.) Knowing what we don't know in NCAA Football ratings: Understanding and using structured uncertainty [2]: This paper applies the TrueSkill method to evaluate team performance in NCAA football games. The focus is on team performance as opposed to player performance.

The papers reviewed above focus on a probabilistic evaluation of performance. After an exhaustive search of the literature, no papers were found that use non-probabilistic machine learning algorithms to produce real-valued outputs to evaluate player contributions (offensive or defensive). An attempt had been made to do so in baseball, but it was limited to the outcome of games based on recent performance [3].

Comparison or Demonstration

In order to demonstrate the effectiveness of neural nets in predicting player performance based on historic data, five models were created using five different regression-based methods, and the performance of the neural network was compared against the other methods. Table 1 summarizes the error values observed (predicted PPG - actual PPG) for these methods. It also includes the predictions made by TSN.ca and NHL.com before the start of the 2017-2018 hockey season and compares all predictions to the baseline (current PPG values) for the top 100 players in the league as determined by their PPG.

    Method/Source        Mean    Median
    Neural Nets          0.211   0.188
    Decision Tree        0.222   0.21
    Random Forest        0.215   0.193
    k-NN Regression      0.234   0.21
    Linear Regression    0.245   0.22
    TSN.ca               0.202   0.173
    NHL.com              0.197   0.167

Table 1: Calculated error values comparing predicted and actual PPG values for the top 100 scorers in the NHL

The training, validation and test datasets were created with scripts that extracted the required data from NHL.com, using an 80-10-10 split. Similarly, additional scripts were created to extract baseline data from NHL.com [5] and TSN.ca [1]. Each feature was z-score normalized before training the model.

Based on the results in Table 1, neural nets provided the lowest error values of the five regression-based methods, while linear regression performed worst, as expected. Figure 2 illustrates the performance of each method in predicting the top 100 players with the highest PPG values.

The top 10 players based on the predicted PPG determined by the neural nets are presented in Table 2, with their current PPG values included as a reference. The difference between their predicted and actual values can be attributed to the current season being only 30% complete; over the course of the season, the actual PPG values are expected to decrease.

The final step of the PAR algorithm is to apply the PAR formula using the predicted
    Player Name          Actual PPG   Predicted PPG
    Nikita Kucherov      1.48         1.18
    Brad Marchand        1.17         1.02
    Claude Giroux        1.07         1.00
    Connor McDavid       1.17         0.99
    Anze Kopitar         1.17         0.99
    Johnny Gaudreau      1.28         0.98
    John Tavares         1.14         0.97
    Evgeny Kuznetsov     1.06         0.96
    Brayden Point        0.85         0.93
    Mark Scheifele       1.21         0.91

Table 2: Comparison of actual and predicted PPG values for the current top 10 offensive contributors in the NHL

    Player Name          PAR
    Ryan Dzingel         2.29
    Mark Stone           2.07
    Cam Fowler           2.03
    Brandon Montour      1.66
    Tyler Myers          1.64
    Brendan Perlini      1.21
    Nick Foligno         1.14
    Tomas Tatar          1.12
    Dion Phaneuf         1.02
    Gabriel Landeskog    0.98

Table 3: Top 10 players with the highest PAR estimates
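The comparison relies on a few generic steps: an 80-10-10 split, z-score normalization, mean and median error against actual PPG, and picking the method with the lowest average error (the model_minerror step of Algorithm 1). The sketch below shows those steps in plain Python under stated assumptions: the error is taken as the absolute difference, normalization statistics come from the training split only, and model training itself (done in the paper with scikit-learn and a grid search) is left out. The function names are hypothetical; in scikit-learn the corresponding pieces would be `train_test_split`, `StandardScaler` and `GridSearchCV`.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple

# Mean errors of the five regression-based methods, copied from Table 1.
TABLE1_MEAN_ERROR: Dict[str, float] = {
    "Neural Nets": 0.211,
    "Decision Tree": 0.222,
    "Random Forest": 0.215,
    "k-NN Regression": 0.234,
    "Linear Regression": 0.245,
}


def split_80_10_10(rows: Sequence, seed: int = 0) -> Tuple[list, list, list]:
    """Shuffle rows, then cut them into train/validation/test sets (80/10/10)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = n * 8 // 10, n // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def zscore_fit(train: List[List[float]]) -> List[Tuple[float, float]]:
    """Per-feature (mean, population std dev), estimated on the training set only."""
    stats = []
    for column in zip(*train):
        mu = statistics.fmean(column)
        sigma = statistics.pstdev(column) or 1.0  # avoid dividing by zero for constant features
        stats.append((mu, sigma))
    return stats


def zscore_apply(rows: List[List[float]],
                 stats: List[Tuple[float, float]]) -> List[List[float]]:
    """Normalize every feature as (x - mean) / std using the training statistics."""
    return [[(x - mu) / sigma for x, (mu, sigma) in zip(row, stats)] for row in rows]


def error_summary(predicted: Sequence[float],
                  actual: Sequence[float]) -> Tuple[float, float]:
    """Mean and median of |predicted PPG - actual PPG| (absolute error is an assumption)."""
    errors = [abs(p - a) for p, a in zip(predicted, actual)]
    return statistics.fmean(errors), statistics.median(errors)


def pick_lowest_error(mean_errors: Dict[str, float]) -> str:
    """The model_minerror step: the method with the lowest average error."""
    return min(mean_errors, key=mean_errors.get)


print(pick_lowest_error(TABLE1_MEAN_ERROR))  # Neural Nets
```

Fitting the normalization on the training split alone keeps the validation and test data from leaking into the scaling, which mirrors how `StandardScaler` is typically fitted before a grid search.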
worth exploring in the future.
Conclusions
The goal of the paper was to assess the viability of using neural nets to predict player performance. The results indicate that neural nets and other regression-based methods can adequately perform this task. The results were compared to actual PPG values as well as to other sources, such as TSN.ca [1] and NHL.com [5], that are considered top resources for player projections. The project was also extended to estimate the Player Availability Rating (PAR), a novel metric that aims to quantify the availability of a player based on the current performance of the player and the performance of their team. The results indicate that neural nets outperform the other regression-based methods and produce predictions comparable to those made by TSN.ca [1] and NHL.com [5] (a mean error of 0.211, versus 0.202 and 0.197 respectively, in Table 1).
The author has also launched a website that
has adopted the algorithm presented in this
paper: gmaiplaybook.com
References
[1] S. Cullen. Statistically speaking: Projected top 300 scorers, 2017.
[2] D. Tarlow, T. Graepel, and T. Minka. Knowing what we don't know in NCAA football ratings: Understanding and using structured uncertainty. 2014.
[3] G. Di Fatta, G. M. Haworth, and K. W. Regan. Skill rating by Bayesian inference. 2009 IEEE Symposium on Computational Intelligence and Data Mining, 2009.
[4] B. Fields. Estimating the value of Major League Baseball players. 2001.
[5] P. Jensen. Fantasy: Top 250 rankings for 2017-18, 2017.
[6] C. W. R. Jia and D. Zeng. Predicting the Major League Baseball season. 2013.
[7] J. Weissbock and D. Inkpen. Combining textual pre-game reports and statistical data for predicting success in the National Hockey League. Advances in Artificial Intelligence, Lecture Notes in Computer Science, pages 251–262, 2014.