Artificial Intelligence in Sports Prediction
Alan McCabe
MAIT Technologies
Belfast, Ireland
Email: [email protected]

Jarrod Trevathan
School of Mathematics, Physics and Information Technology
James Cook University
Email: [email protected]
Abstract— This paper presents an extension of earlier work in the use of artificial intelligence for prediction of sporting outcomes. An expanded model is described, as well as a broadening of the area of application of the original work. The model used is a form of multi-layer perceptron and it is presented with a number of features which attempt to capture the quality of various sporting teams. The system performs well and compares favourably with human tipsters in several environments. A study of less rigid "World Cup" formats appears, along with extensive live testing results in a major international tipping competition.

Fig. 1. A simple neural network topology, with features, weights and a single output unit.

I. INTRODUCTION

This paper extends previous work [8] in exploring the utility of neural networks, in particular multi-layer perceptrons, in predicting the outcome of sporting contests given only basic information. This is a less traditional application area of neural networks and somewhat of a novelty; however, the basic principles of machine learning still apply. Additionally, attaching to predictions an indication of how certain the predictor is, and rewarding such predictions appropriately, are important issues in many fields.

The data used in this work was taken from several different sources and covers four major league sports: the Australian National Rugby League (NRL), the Australian Football League (AFL), Super Rugby (Super 12 and Super 14) and English Premier League football (EPL), from as early as 2002. Each of these leagues has different characteristics, schedules, lengths and team structures (for example, the EPL has a feature in which the bottom three teams are "relegated" at the end of each season and replaced by three new teams).

The data contains noise in that there are obviously details influencing the contest outside of those which are being captured in the feature set. Firstly, there is what is often referred to as the individual "form" of the players; however, it is believed that in most cases the team's overall skill level will transcend poor individual form. Secondly, there is the fact that the team's skill level, or quality, can be affected by the unavailability of "star" players due to injury, suspension or representative duties (such as in the NRL, where players may be called away from their team for state or country duties). Some experiments were performed on trying to account for player availability and these are reported later in the paper. There was a conscious effort to ensure that there was no subjectivity in the feature set.

The paper is organised as follows: Section II gives a background on the neural network engine used to make the predictions; Section III describes the raw data used and the feature extraction process; Section IV details the experiments conducted and the results of the work, including comparisons to "expert" tipsters; future work is presented in Section V and Section VI contains the concluding remarks.

II. MODELLING THE FEATURE SPACE

A. Neural Networks

The main reason for using neural networks (NNs) to model the feature space is the model's ability to learn the relationship between inputs and outputs upon presentation of examples [3, 13]. It is necessary only to provide a set of sample data (also known as training data or a training set) to the network, and the use of learning (or training) algorithms such as back-propagation performs an adjustment of the network to better model the problem domain.

There are many types of neural networks, and one of the most popular models is the multi-layer perceptron (MLP). MLPs associate a weight with each of the input features (see Section III-A for a discussion of the features used in this study) according to that feature's importance in the model (see Figure 1). This type of network topology is eminently suited to the well-defined domain discussed in this paper, because several features exist and a weighting must be associated with each according to its contribution to the solution. These weights can be set to specific initial values (possibly to facilitate an intentional bias) or simply randomly assigned.

The learning algorithm then adjusts the weights to minimize the error between the target output (the desired output provided in the learning examples) and the actual output (the output as calculated by the MLP). There are a number of learning algorithms that can be used to optimize the weights in MLPs, such as back-propagation, conjugate gradient descent and Levenberg-Marquardt [3]. The two learning algorithms used in this work were back-propagation and the conjugate-gradient method, both classical algorithms which are effective, relatively simple and well understood. Some argue that other learning algorithms often perform faster [4], but as this study deals with a small feature set with no requirement for real-time operation, the above-mentioned advantages outweigh any perceived increase in speed.

The back-propagation and conjugate-gradient methods work by iteratively training the network using the presented training data. On each iteration (or epoch), the entire training set is presented to the network, one case at a time. In order to update the weights, a cost function is determined and its derivative (or gradient) with respect to each weight is estimated [12]. Weights are updated following the direction of steepest descent of the cost function. A common cost function, and the one used in this work, is the root mean squared (or RMS) error. During experimentation with the two methods, back-propagation was a little slower to learn than the conjugate-gradient approach, but both methods resulted in almost identical error rates, with back-propagation slightly more accurate. Further discussions of MLPs and the two learning algorithms can be found in most neural network texts, for example [3, 12, 13].
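As a concrete illustration of this training procedure, the following is a minimal sketch of a three-layer MLP (nineteen inputs, ten hidden units, one output) trained by steepest descent on the squared error. The data, weight initialisation, learning rate and epoch count are all invented for illustration; this is not the authors' implementation, only an example of the technique described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup mirroring the topology described later in the paper:
# 19 input features, 10 hidden units, 1 output unit in [0, 1].
# The training data here is synthetic and purely illustrative.
n_features, n_hidden = 19, 10
X = rng.normal(size=(40, n_features))           # 40 example feature sets
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)  # synthetic win/loss targets

W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1)        # hidden-layer activations
    return h, sigmoid(h @ W2)  # network output in [0, 1]

lr = 0.5
for epoch in range(500):           # one epoch = full pass over the training set
    h, out = forward(X)
    err = out - y                  # derivative of squared error w.r.t. output
    # Back-propagate: estimate the cost gradient with respect to each weight.
    d_out = err * out * (1 - out)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # Update following the direction of steepest descent.
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_hid / len(X)

rms = np.sqrt(np.mean((forward(X)[1] - y) ** 2))  # RMS error after training
```

An untrained network outputs roughly 0.5 everywhere (RMS error near 0.5 against 0/1 targets); after repeated epochs the error falls, which is the behaviour the paper relies on when re-training round by round.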
III. INPUT DATA

The raw data for this work's experiments was gathered from several sources and, depending on the league being examined, there were between thirteen and thirty-eight rounds of competition. From this raw data it is possible to determine round-by-round statistics (features) for each team, including their current success rate, recent performance, the points they've scored in the competition to date (to rate their offensive capabilities), the points scored against them (to rate their defensive capabilities) and several other indicative features.

A. Feature Extraction

A conscious effort was made to exclude any subjective features from the raw data. Features obtained were based solely on details such as scoreline, recent performance and position on the "league ladder" relative to other teams. This removes the need for any human judgement in the generation of features and removes any bias (intentional or otherwise) on the part of that human. A set of features was obtained for each team, for a given round of competition, as follows:

Points-for: the total points scored by the team in matches so far this season.

Points-against: the total points scored against the team in matches so far this season, expressed as a negative number.

Overall Performance: the team's performance based on their win/loss record. Two points are awarded for a win (or three points in the case of EPL), one point for a draw and no points for a loss. Performance is then the sum of these values over each round of competition so far.

Home Performance and Away Performance: the cumulative performance value calculated using only home games and only away games respectively. The Overall Performance feature can hide specific details such as home-ground performance; for example, if a team has a 90% success rate at their home ground and a 10% success rate away from home, then an overall success rate of 50% hides some important information when trying to predict a winner.

Performance in Previous Game: the team's performance in their most recent game. In the first round this value is typically taken from the previous season's final game.

Performance in Previous n Games: the average of the performance for the most recent n games. Up to five previous games were considered in the feature set. This is an attempt to gauge recent form and take into account whether the team is on a winning or losing streak.

Team Ranking: the team's position on the league ladder, based on a list of the teams sorted by their Overall Performance value. This feature's use is obvious as, all other things being equal, a team with a high ranking is expected to defeat a team with a low ranking.

Points-for in Previous n Games: the average of the points scored by the team in the most recent n games. Values for n of one to five were used as five separate features.

Points-against in Previous n Games: the average of the points scored against the team (expressed as a negative number) in the most recent n games. Values for n of one to five were used in the feature set.

Location: a value indicating whether the current game is played at the team's home venue or elsewhere. The value 1 is taken for a home game, and 0 for an away game.

Player Availability: not in the basic feature set, this feature was used in further testing reported below. A "star" player is defined as one who is currently involved in their nation's national side (and/or state side in the NRL competition). This feature is then the proportion of star players who are unavailable in a given week.

When appropriate, the feature values were averaged over the number of matches played. This was done so that a meaningful comparison may still be made between two teams which have played a different number of matches.
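To make the feature definitions concrete, the following sketch assembles a per-team feature vector from a list of match results, using the scoring described above (two points for a win, one for a draw, none for a loss). The data layout and helper names are invented for illustration; Team Ranking is omitted because it requires the full league ladder rather than a single team's history.

```python
# Hypothetical match history layout: (points_for, points_against, was_home,
# result) tuples, where result is "W", "D" or "L".
PERF = {"W": 2, "D": 1, "L": 0}  # per-game performance points (EPL uses 3 for a win)

def team_features(history, next_game_at_home, n_max=5):
    feats = {}
    feats["points_for"] = sum(pf for pf, _, _, _ in history)
    feats["points_against"] = -sum(pa for _, pa, _, _ in history)  # negative, per the paper
    feats["overall_performance"] = sum(PERF[r] for _, _, _, r in history)
    feats["home_performance"] = sum(PERF[r] for _, _, home, r in history if home)
    feats["away_performance"] = sum(PERF[r] for _, _, home, r in history if not home)
    # One feature per n in 1..5: averages over the n most recent games.
    for n in range(1, n_max + 1):
        recent = history[-n:]
        feats[f"performance_prev_{n}"] = sum(PERF[r] for *_, r in recent) / len(recent)
        feats[f"points_for_prev_{n}"] = sum(pf for pf, *_ in recent) / len(recent)
        feats[f"points_against_prev_{n}"] = -sum(pa for _, pa, *_ in recent) / len(recent)
    feats["location"] = 1 if next_game_at_home else 0  # 1 = home, 0 = away
    return feats

# Example: three matches -- a home win, an away loss, a home draw.
history = [(24, 10, True, "W"), (8, 30, False, "L"), (16, 16, True, "D")]
f = team_features(history, next_game_at_home=True)
```

Note how the "previous n games" features are averaged over the games actually available, in the spirit of the averaging described above for teams that have played different numbers of matches.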
IV. EXPERIMENTATION

The experiments conducted involved extracting the aforementioned features and then constructing the models. As mentioned in Section II, the choice was made to use a multi-layer perceptron to model the features, and back-propagation (which proved slightly more effective than the conjugate-gradient method) to facilitate learning. Specifically, a three-layer MLP was used with nineteen input units, or twenty when Player Availability is included (one for each feature), ten hidden units and a single output unit. The output unit was normalised to be a value between zero and one inclusive.

The feature set values were calculated for each team for each round of competition. The MLP was trained using all examples from previous rounds, going back to the previous season if necessary, and re-trained after each round. Predictions were made for the current round by using the MLP to calculate an output value for each team based on that team's feature set. An output value close to one for a particular team indicated a high level of confidence that the team was going to win their upcoming match, and an output value closer to zero indicated a lower confidence level.

The output values for the two teams competing in each game were calculated, and the team which had the highest output value (i.e., the highest confidence that the team would be victorious) was taken as the predicted winner (or tip) for that match.
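The tip-selection rule just described reduces to comparing the two network outputs. In the sketch below, the trained network is stubbed out with a hypothetical rating function, and the team names and rating values are invented; only the selection logic itself is being illustrated.

```python
# Sketch of the tip-selection rule: each team's feature set is scored by the
# trained network, and the higher-scoring team becomes the tip. `rate` is a
# stand-in for the real MLP's output (a value in [0, 1]).
def predict_winner(rate, home_team, away_team):
    home_score = rate(home_team)
    away_score = rate(away_team)
    tip = home_team if home_score >= away_score else away_team
    confidence = max(home_score, away_score)  # higher output = higher confidence
    return tip, confidence

# Hypothetical ratings for illustration only.
ratings = {"Broncos": 0.81, "Knights": 0.44}
tip, conf = predict_winner(lambda team: ratings[team], "Broncos", "Knights")
```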
Success rates were then calculated as the proportion of tips for which the predicted winner matched the actual winner.

TABLE I
LIVE TESTING ACCURACY FOR THE FOUR TARGET SPORTING LEAGUES

League         Best     Worst    Average
AFL            68.1%    58.9%    65.1%
NRL            67.2%    52.2%    63.2%
Super Rugby    75.4%    58.0%    67.5%
EPL            58.9%    51.8%    54.6%

A. Results

The results presented here are an augmentation and update of those presented in the original article [8]. Averages and highlights for the four different league competitions are presented, as well as comparisons to human tipsters and the results of two World Cup format tests. In addition, the results of brief experiments on player availability adjustments are presented. All predictions were made available under the name of "McCabe's Artificially Intelligent Tipper", or MAIT, on the official system website [7] several days in advance of the matches. In addition, the predictions were often announced in print media, on radio, or on other websites [1, 15, 16].

It should be noted that it is often difficult to measure the success of a system such as this, as there are few benchmarks for the use of neural networks in this domain. Systems with a similar structure have been used to perform other predictive tasks, such as the use of neural networks to predict prices on the stock market [6, 9, 10, 11, 17]. These systems do not typically perform with outstanding success and are rarely more effective than a naive human investor (largely due to the fact that stock market prices are affected by a very large number of variables, many of which are difficult to quantify).

A small number of other published systems exist which attempt to apply neural networks or other logical methods to the sporting arena, with varying levels of success [2, 14]. The best-case result for a single season reported in purely NN systems was a 58% success rate, which occurred in the Australian National Rugby League. Regular (human) experts typically have success rates of somewhere between 60% and 65% for NRL, AFL and Super Rugby. In the EPL, human experts successfully predict somewhere between 50% and 55% of matches, where the much higher prevalence of draws results in lower overall accuracy. Note that for the purposes of the results presented here, unless a draw was specifically predicted, an actual result of a draw is counted as incorrect.
B. League Competitions

All four of the league competitions mentioned previously have been subject to live testing for at least the last three seasons; the NRL predictions, for example, have been made publicly available since 2002. Table I presents the best and worst whole-season performances, along with the average accuracy for each of the target sports. It should be noted that the NRL accuracy was well down in 2007 (resulting in the worst performance recorded, 11% lower than the previous worst), where further inspection identified the fact that the system had not been applying any home-ground advantage; this will require continued analysis and monitoring in future.

The Super Rugby competition was slightly different in that two new teams were introduced in 2006, which necessitated some small changes in the algorithm. The weights for the NN models for the two new teams were set at an average of all weights for the models for other teams. It was surprising how quickly the models learned the new teams' details, normalising performance within two to three weeks. As can be seen in Figure 2, the typical behaviour of Super Rugby aligns with what is classically expected from artificial intelligence algorithms, with an initial performance roughly equivalent to random assignment of results, steadily improving throughout the remainder of the season. The average performance in Super Rugby has remained quite high at 67.5%.

Fig. 2. Performance over the 2007 Super 14 season. After an early period of adjustment, the accuracy steadily and consistently improved.

English Premier League football (also known as soccer) posed a new problem, with its high prevalence of draws (23% of EPL matches in the last three years have ended in a draw). As a result, an allowance must be made to specifically predict draws, which is done by specifying a tolerance level for the difference between the ratings of the two teams. If the difference falls below this tolerance level, a draw is predicted; otherwise the team with the higher rating is the predicted winner for that contest. Over the last two seasons, 35 draws have been predicted, with 16 of those being correct: a 45.7% success rate, approximately double what would be expected with random assignment.

C. Comparison to Human Tipsters

As mentioned, it is difficult to make an effective comparison given the absence of a large number of other systems doing similar analysis. A more meaningful test of the prediction algorithms is to compare their performance against a large set of human tipsters. To this end, the MAIT system was entered in a major international tipping competition called TopTipper [5] in the 2006-2007 season. This competition hosts thousands of (human) contestants from year to year, of varying skill levels. Figure 3 illustrates the MAIT system's performance versus the human competitors, showing how the relative performance steadily improved throughout the season. By the final week of the competition the system had taken first position, which was a considerable achievement.

Fig. 3. Performance in TopTipper's English Premier League tipping competition. This figure tracks the percentile that the MAIT system lay in (the percentage of other competitors that the MAIT system was ahead of).
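The draw-tolerance rule described for the EPL amounts to a three-way decision on the rating difference. The tolerance value of 0.05 below is invented for illustration; the paper does not report the level actually used.

```python
# Sketch of the EPL draw-tolerance rule: if the two teams' ratings are within
# `tolerance` of each other, predict a draw; otherwise tip the higher-rated
# team. The tolerance of 0.05 is a hypothetical value, not the paper's.
def predict_with_draws(home, away, rating_home, rating_away, tolerance=0.05):
    if abs(rating_home - rating_away) < tolerance:
        return "draw"
    return home if rating_home > rating_away else away

tip_close = predict_with_draws("Arsenal", "Chelsea", 0.62, 0.60)  # near-equal ratings
tip_clear = predict_with_draws("Arsenal", "Chelsea", 0.75, 0.40)  # clear favourite
```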
D. World Cup Format

The extension to "World Cup" environments presented a significant challenge to the system. The major difference between World Cups and more structured league formats is that each of the teams has a very different performance history on which to draw. All teams have played a different number of games, at greatly different intervals and against greatly different opposition.

In these cases it was necessary to considerably expand the number of teams under consideration, so as to have a measure of the "quality" of all teams involved in previous matches, in order to assess the significance of a given team's previous results. At the beginning of the tournament proper, the ratings for all non-active teams were disregarded and the process continued as with a normal league format.

This procedure was followed for both the 2003 and 2007 Rugby World Cup tournaments, where live predictions were again made. In 2003 the Sydney Morning Herald newspaper presented the predictions in direct competition with their own resident human expert. The final result for the MAIT system was 45 correct predictions from the 48 matches (a 93.8% success rate), compared to 42 from the human expert. In the 2007 tournament, one in which several more unusual results presented themselves, the system still performed quite well, recording 40 correct predictions from the 48 matches (83.3%).

E. Player Availability

During the 2006 NRL season, experiments were performed with a simple player availability algorithm. A requirement was imposed that all features be entirely objective. Also included was the definition of a "star" player as one who was involved in the most recent national or state side for which they were eligible. The number of star players unavailable for a given team for a given round was then used as an input feature in the model (divided by seventeen, the total number of players nominated for a single team), and the models were re-trained using data from the 2004 and 2005 seasons. A total of nine differences (compared with models not incorporating the player availability feature) occurred over the course of the 2006 season, with a single extra correct prediction being made when player availability is taken into account.

Despite the single extra hit, an improvement of just over 0.5%, this approach was abandoned in 2007 for two reasons. Firstly, significant effort was required to maintain the list of star players, as well as to perform comparisons with nominated teams on a weekly basis. Secondly, the actual prediction often changed (sometimes repeatedly) in the days leading up to the game based on the injury status of these star players.

V. FUTURE WORK

There are several possibilities for future directions with this work, the most imminent being an extension into further sporting arenas. The major professional American sports seem an obvious choice, and will allow a comparison with a larger set of existing systems. Expansion to different sports also allows for the development of a richer feature set, and the monitoring of the effects of these on the models.

The other focus area for future work in the short term is margin prediction. That is, not only are the models going to be used to predict a winner, but also the margin of victory, which obviously poses a more significant challenge to the system.

VI. CONCLUSIONS

Despite an existing "novelty" value for this work, there is still theoretical interest in the modelling of features in a noisy environment and the use of machine learning techniques to predict probabilistic events. Perhaps the primary attraction of sport in general is that there are so many elements which contribute to the result, and that on any given day either team is capable of winning. This same fact is what makes the prediction process so difficult, and why so much time and money is spent by individuals trying to predict winners in various tipping competitions and gambling outlets.

This paper described an extension to previous work in the generalization and modelling of the behaviour of teams in sporting contests. Results were reported for different sports and various seasons, and compared against human "expert" tipsters. The multi-layer perceptrons used were able to adapt very quickly and perform well despite the limited information and the outside influences not included in the feature set.

REFERENCES

[1] ABC Radio National. http://www.abc.net.au/rn/, 2007.
[2] Baulch, M. Using Machine Learning to Predict the Results of Sporting Matches. Thesis, University of Queensland, 2001.
[3] Bishop, C. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[4] Hassoun, M. Fundamentals of Artificial Neural Networks. Massachusetts Institute of Technology Press, 1995.
[5] Internet Digital Media Australia. TopTipper: Online Tipping Competitions. http://www.toptipper.com/, 2007.
[6] Kalyvas, E. Using Neural Networks and Genetic Algorithms to Predict Stock Market Returns. Masters Thesis, University of Manchester, 2001.
[7] McCabe, A. McCabe's Artificially Intelligent Tipper (MAIT). http://www.mymait.com/, 2007.
[8] McCabe, A. An Artificially Intelligent Sports Tipper. Artificial Intelligence '02, Canberra, 2002.
[9] McCann, P. J. and Kalman, B. L. A Neural Network Model for the Gold Market. http://citeseer.nj.nec.com/308853.html.
[10] McNelis, P. D. Neural Networks in Finance: Gaining Predictive Edge in the Market. Academic Press Advanced Finance Series, 2004.
[11] Op 't Landt, F. W. Stock Price Prediction Using Neural Networks. Masters Thesis, Leiden University, 1997.
[12] Pessoa, L. Multilayer Perceptrons versus Hidden Markov Models: Comparisons and Applications to Image Analysis and Visual Pattern Recognition. Qualifying Examination Report, Georgia Institute of Technology, 1995.
[13] Russell, S. and Norvig, P. Artificial Intelligence - A Modern Approach (2nd Edition). Prentice Hall, 2002.
[14] Swinburne Sports Statistics. http://www.swinburne.edu.au/sport/, 2007.
[15] Sydney Morning Herald. http://www.smh.com.au/, 2007.
[16] Townsville Bulletin. http://www.townsvillebulletin.com.au/, 2007.
[17] Toulson, D. L. and Toulson, S. P. Use of Neural Network Ensembles for Portfolio Selection and Risk Management. NeuroCOLT Technical Report Series, NC-TR-96-046, 1996.