Artificial Intelligence in Sports Prediction
Alan McCabe
MAIT Technologies
Belfast, Ireland
Email: [email protected]

Jarrod Trevathan
School of Mathematics, Physics and Information Technology
James Cook University
Email: [email protected]
Abstract— This paper presents an extension of earlier work in the use of artificial intelligence for prediction of sporting outcomes. An expanded model is described, as well as a broadening of the area of application of the original work. The model used is a form of multi-layer perceptron and it is presented with a number of features which attempt to capture the quality of various sporting teams. The system performs well and compares favourably with human tipsters in several environments. A study of less rigid "World Cup" formats appears, along with extensive live testing results in a major international tipping competition.

Fig. 1. A simple neural network topology, with features, weights and a single output unit.

I. INTRODUCTION

This paper extends previous work [8] in exploring the utility of neural networks, in particular multi-layer perceptrons, in predicting the outcome of sporting contests given only basic information. This is a less traditional application area of neural networks and somewhat of a novelty; however, the basic principles of machine learning still apply. Additionally, attaching to predictions an indication of how certain the predictor is, and rewarding such predictions appropriately, are important issues in many fields.

The data used in this work was taken from several different sources and covers four major league sports: the Australian National Rugby League (NRL), the Australian Football League (AFL), Super Rugby (Super 12 and Super 14) and English Premier League football (EPL), from as early as 2002. Each of these leagues has different characteristics, schedules, lengths and team structures (for example, the EPL has a feature in which the bottom three teams are "relegated" at the end of each season and replaced by three new teams).

The data contains noise in that there are obviously details influencing the contest outside of those which are being captured in the feature set. Firstly, there is what is often referred to as the individual "form" of the players; however, it is believed that in most cases the team's overall skill level will transcend poor individual form. Secondly, there is the fact that the team's skill level, or quality, can be affected by the unavailability of "star" players due to injury, suspension or representative duties (such as in the NRL, where players may be called away from their team for state or country duties). Some experiments were performed on trying to account for player availability and these are reported later in the paper. There was a conscious effort to ensure that there was no subjectivity in the feature set.

The paper is organised as follows: Section II gives a background on the neural network engine used to make the predictions; Section III describes the raw data used and the feature extraction process; Section IV details the experiments conducted and the results of the work, including comparisons to "expert" tipsters; future work is presented in Section V and Section VI contains the concluding remarks.

II. MODELLING THE FEATURE SPACE

A. Neural Networks

The main reason for using neural networks (NNs) to model the feature space is the model's ability to learn the relationship between inputs and outputs upon presentation of examples [3, 13]. It is necessary only to provide a set of sample data (also known as training data or a training set) to the network, and the use of learning (or training) algorithms such as back-propagation performs an adjustment of the network to better model the problem domain.

There are many types of neural networks, and one of the most popular models is the multi-layer perceptron (MLP). MLPs associate a weight with each of the input features (see Section III-A for a discussion of the features used in this study) according to that feature's importance in the model (see Figure 1). This type of network topology is eminently suited to the well-defined domain discussed in this paper, because several features exist and a weighting must be associated with each according to its contribution to the solution. These weights can be set to specific initial values (possibly to facilitate an intentional bias) or simply randomly assigned.

The learning algorithm then adjusts the weights to minimize the error between the target output (the desired output provided in the learning examples) and the actual output (the output as calculated by the MLP). There are a number of learning algorithms that can be used to optimize the weights in MLPs, such as back-propagation, conjugate gradient descent and Levenberg-Marquardt [3]. The two learning algorithms used in this work were back-propagation and the conjugate-gradient method, both classical algorithms which are effective, relatively simple and well understood. Some argue that other learning algorithms often perform faster [4], but as this study deals with a small feature set with no requirement for real-time operation, the above-mentioned advantages outweigh any perceived increase in speed.

The back-propagation and conjugate-gradient methods work by iteratively training the network using the presented training data. On each iteration (or epoch), the entire training set is presented to the network, one case at a time. In order to update the weights, a cost function is determined and its derivative (or gradient) with respect to each weight is estimated [12]. Weights are updated following the direction of steepest descent of the cost function. A common cost function, and the one used in this work, is the root mean squared (or RMS) error. During experimentation with the two methods, back-propagation was a little slower to learn than the conjugate-gradient approach, but both methods resulted in almost identical error rates, with back-propagation slightly more accurate. Further discussions of MLPs and the two learning algorithms can be found in most neural network texts, for example [3, 12, 13].
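As a concrete illustration of this training procedure, the following is a minimal sketch of a three-layer MLP (nineteen inputs, ten hidden units, one output) trained by steepest descent on the squared error. The data, weight initialisation, learning rate and epoch count are all invented for illustration; this is not the authors' implementation, only an example of the technique described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup mirroring the topology described later in the paper:
# 19 input features, 10 hidden units, 1 output unit in [0, 1].
# The training data here is synthetic and purely illustrative.
n_features, n_hidden = 19, 10
X = rng.normal(size=(40, n_features))           # 40 example feature sets
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)  # synthetic win/loss targets

W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = sigmoid(X @ W1)        # hidden-layer activations
    return h, sigmoid(h @ W2)  # network output in [0, 1]

lr = 0.5
for epoch in range(500):           # one epoch = full pass over the training set
    h, out = forward(X)
    err = out - y                  # derivative of squared error w.r.t. output
    # Back-propagate: estimate the cost gradient with respect to each weight.
    d_out = err * out * (1 - out)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # Update following the direction of steepest descent.
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_hid / len(X)

rms = np.sqrt(np.mean((forward(X)[1] - y) ** 2))  # RMS error after training
```

An untrained network outputs roughly 0.5 everywhere (RMS error near 0.5 against 0/1 targets); after repeated epochs the error falls, which is the behaviour the paper relies on when re-training round by round.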
III. INPUT DATA

The raw data for this work's experiments was gathered from several sources and, depending on the league being examined, there were between thirteen and thirty-eight rounds of competition. From this raw data it is possible to determine round-by-round statistics (features) for each team, including their current success rate, recent performance, the points they've scored in the competition to date (to rate their offensive capabilities), the points scored against them (to rate their defensive capabilities) and several other indicative features.

A. Feature Extraction

A conscious effort was made to exclude any subjective features from the raw data. Features obtained were based solely on details such as scoreline, recent performance and position on the "league ladder" relative to other teams. This removes the need for any human judgement in the generation of features and removes any bias (intentional or otherwise) on the part of that human. A set of features was obtained for each team, for a given round of competition, as follows:

Points-for: the total points scored by the team in matches so far this season.

Points-against: the total points scored against the team in matches so far this season, expressed as a negative number.

Overall Performance: the team's performance based on their win/loss record. Two points are awarded for a win (or three points in the case of EPL), one point for a draw and no points for a loss. Performance is then the sum of these values over each round of competition so far.

Home Performance and Away Performance: the cumulative performance value calculated using only home games and only away games respectively. The Overall Performance feature can hide specific details such as home-ground performance; for example, if a team has a 90% success rate at their home ground and a 10% success rate away from home, then an overall success rate of 50% hides some important information when trying to predict a winner.

Performance in Previous Game: the team's performance in their most recent game. In the first round this value is typically taken from the previous season's final game.

Performance in Previous n Games: the average of the performance for the most recent n games. Up to five previous games were considered in the feature set. This is an attempt to gauge recent form and take into account whether the team is on a winning or losing streak.

Team Ranking: the team's position on the league ladder, based on a list of the teams sorted by their Overall Performance value. This feature's use is obvious as, all other things being equal, a team with a high ranking is expected to defeat a team with a low ranking.

Points-for in Previous n Games: the average of the points scored by the team in the most recent n games. Values for n of one to five were used as five separate features.

Points-against in Previous n Games: the average of the points scored against the team (expressed as a negative number) in the most recent n games. Values for n of one to five were used in the feature set.

Location: a value indicating whether the current game is played at the team's home venue or elsewhere. The value 1 is taken for a home game, and 0 for an away game.

Player Availability: not in the basic feature set, this feature was used in further testing reported below. A "star" player is defined as one who is currently involved in their nation's national side (and/or state side in the NRL competition). This feature is then the proportion of star players who are unavailable in a given week.

When appropriate, the feature values were averaged over the number of matches played. This was done so that a meaningful comparison may still be made between two teams which have played a different number of matches.
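To make the feature definitions concrete, the following sketch assembles a per-team feature vector from a list of match results, using the scoring described above (two points for a win, one for a draw, none for a loss). The data layout and helper names are invented for illustration; Team Ranking is omitted because it requires the full league ladder rather than a single team's history.

```python
# Hypothetical match history layout: (points_for, points_against, was_home,
# result) tuples, where result is "W", "D" or "L".
PERF = {"W": 2, "D": 1, "L": 0}  # per-game performance points (EPL uses 3 for a win)

def team_features(history, next_game_at_home, n_max=5):
    feats = {}
    feats["points_for"] = sum(pf for pf, _, _, _ in history)
    feats["points_against"] = -sum(pa for _, pa, _, _ in history)  # negative, per the paper
    feats["overall_performance"] = sum(PERF[r] for _, _, _, r in history)
    feats["home_performance"] = sum(PERF[r] for _, _, home, r in history if home)
    feats["away_performance"] = sum(PERF[r] for _, _, home, r in history if not home)
    # One feature per n in 1..5: averages over the n most recent games.
    for n in range(1, n_max + 1):
        recent = history[-n:]
        feats[f"performance_prev_{n}"] = sum(PERF[r] for *_, r in recent) / len(recent)
        feats[f"points_for_prev_{n}"] = sum(pf for pf, *_ in recent) / len(recent)
        feats[f"points_against_prev_{n}"] = -sum(pa for _, pa, *_ in recent) / len(recent)
    feats["location"] = 1 if next_game_at_home else 0  # 1 = home, 0 = away
    return feats

# Example: three matches -- a home win, an away loss, a home draw.
history = [(24, 10, True, "W"), (8, 30, False, "L"), (16, 16, True, "D")]
f = team_features(history, next_game_at_home=True)
```

Note how the "previous n games" features are averaged over the games actually available, in the spirit of the averaging described above for teams that have played different numbers of matches.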
IV. EXPERIMENTATION

The experiments conducted involved extracting the aforementioned features and then constructing the models. As mentioned in Section II, the choice was made to use a multi-layer perceptron to model the features, and back-propagation (which proved slightly more effective than the conjugate-gradient method) to facilitate learning. Specifically, a three-layer MLP was used with nineteen input units, or twenty when Player Availability is included (one for each feature), ten hidden units and a single output unit. The output unit was normalised to be a value between zero and one inclusive.

The feature set values were calculated for each team for each round of competition. The MLP was trained using all examples from previous rounds, going back to the previous season if necessary, and re-trained after each round. Predictions were made for the current round by using the MLP to calculate an output value for each team based on that team's feature set. An output value close to one for a particular team indicated a high level of confidence that the team was going to win their upcoming match, and an output value closer to zero indicated a lower confidence level.

The output values for the two teams competing in each game were calculated, and the team which had the highest output value (i.e., the highest confidence that the team would be victorious) was taken as the predicted winner (or tip) for that match.
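The tip-selection rule just described reduces to comparing the two network outputs. In the sketch below, the trained network is stubbed out with a hypothetical rating function, and the team names and rating values are invented; only the selection logic itself is being illustrated.

```python
# Sketch of the tip-selection rule: each team's feature set is scored by the
# trained network, and the higher-scoring team becomes the tip. `rate` is a
# stand-in for the real MLP's output (a value in [0, 1]).
def predict_winner(rate, home_team, away_team):
    home_score = rate(home_team)
    away_score = rate(away_team)
    tip = home_team if home_score >= away_score else away_team
    confidence = max(home_score, away_score)  # higher output = higher confidence
    return tip, confidence

# Hypothetical ratings for illustration only.
ratings = {"Broncos": 0.81, "Knights": 0.44}
tip, conf = predict_winner(lambda team: ratings[team], "Broncos", "Knights")
```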
Success rates were then calculated as the proportion of tips for which the predicted winner matched the actual winner.

TABLE I
LIVE TESTING ACCURACY FOR THE FOUR TARGET SPORTING LEAGUES

League         Best     Worst    Average
AFL            68.1%    58.9%    65.1%
NRL            67.2%    52.2%    63.2%
Super Rugby    75.4%    58.0%    67.5%
EPL            58.9%    51.8%    54.6%

A. Results

The results presented here are an augmentation and update of those presented in the original article [8]. Averages and highlights for the four different league competitions are presented, as well as comparisons to human tipsters and the results of two World Cup format tests. In addition, the results of brief experiments on player availability adjustments are presented. All predictions were made available under the name of "McCabe's Artificially Intelligent Tipper", or MAIT, on the official system website [7] several days in advance of the matches. In addition, the predictions were often announced in print media, on radio, or on other websites [1, 15, 16].

It should be noted that it is often difficult to measure the success of a system such as this, as there are few benchmarks for the use of neural networks in this domain. Systems with a similar structure have been used to perform other predictive tasks, such as the use of neural networks to predict prices on the stock market [6, 9, 10, 11, 17]. These systems do not typically perform with outstanding success and are rarely more effective than a naive human investor (largely due to the fact that stock market prices are affected by a very large number of variables, many of which are difficult to quantify).

A small number of other published systems exist which attempt to apply neural networks or other logical methods to the sporting arena, with varying levels of success [2, 14]. The best-case result for a single season reported in purely NN systems was a 58% success rate, which occurred in the Australian National Rugby League. Regular (human) experts typically have success rates of somewhere between 60% and 65% for NRL, AFL and Super Rugby. In the EPL, human experts successfully predict somewhere between 50% and 55% of matches, where the much higher prevalence of draws results in lower overall accuracy. Note that for the purposes of the results presented here, unless a draw was specifically predicted, an actual result of a draw is counted as incorrect.
B. League Competitions

All four of the league competitions mentioned previously have been subject to live testing for at least the last three seasons; the NRL predictions, for example, have been made publicly available since 2002. Table I presents the best and worst whole-season performances, along with the average accuracy for each of the target sports. It should be noted that the NRL accuracy was well down in 2007 (resulting in the worst performance recorded, 11% lower than the previous worst), where further inspection identified the fact that the system had not been applying any home-ground advantage; this will require continued analysis and monitoring in future.

The Super Rugby competition was slightly different in that two new teams were introduced in 2006, which necessitated some small changes in the algorithm. The weights for the NN models for the two new teams were set at an average of all weights for the models for other teams. It was surprising how quickly the models learned the new teams' details, normalising performance within two to three weeks. As can be seen in Figure 2, the typical behaviour of Super Rugby aligns with what is classically expected from artificial intelligence algorithms, with an initial performance roughly equivalent to random assignment of results, steadily improving throughout the remainder of the season. The average performance in Super Rugby has remained quite high at 67.5%.

Fig. 2. Performance over the 2007 Super 14 season. After an early period of adjustment, the accuracy steadily and consistently improved.

English Premier League football (also known as soccer) posed a new problem, with its high prevalence of draws (23% of EPL matches in the last three years have ended in a draw). As a result, an allowance must be made to specifically predict draws, which is done by specifying a tolerance level for the difference between the ratings of the two teams. If the difference falls below this tolerance level, a draw is predicted; otherwise the team with the higher rating is the predicted winner for that contest. Over the last two seasons, 35 draws have been predicted, with 16 of those being correct: a 45.7% success rate, approximately double what would be expected with random assignment.

C. Comparison to Human Tipsters

As mentioned, it is difficult to make an effective comparison given the absence of a large number of other systems doing similar analysis. A more meaningful test of the prediction algorithms is to compare their performance against a large set of human tipsters. To this end, the MAIT system was entered in a major international tipping competition called TopTipper [5] in the 2006-2007 season. This competition hosts thousands of (human) contestants from year to year, of varying skill levels. Figure 3 illustrates the MAIT system's performance versus the human competitors, showing how the relative performance steadily improved throughout the season. By the final week of the competition the system had taken first position, which was a considerable achievement.

Fig. 3. Performance in TopTipper's English Premier League tipping competition. This figure tracks the percentile that the MAIT system lay in (the percentage of other competitors that the MAIT system was ahead of).
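The draw-tolerance rule described for the EPL amounts to a three-way decision on the rating difference. The tolerance value of 0.05 below is invented for illustration; the paper does not report the level actually used.

```python
# Sketch of the EPL draw-tolerance rule: if the two teams' ratings are within
# `tolerance` of each other, predict a draw; otherwise tip the higher-rated
# team. The tolerance of 0.05 is a hypothetical value, not the paper's.
def predict_with_draws(home, away, rating_home, rating_away, tolerance=0.05):
    if abs(rating_home - rating_away) < tolerance:
        return "draw"
    return home if rating_home > rating_away else away

tip_close = predict_with_draws("Arsenal", "Chelsea", 0.62, 0.60)  # near-equal ratings
tip_clear = predict_with_draws("Arsenal", "Chelsea", 0.75, 0.40)  # clear favourite
```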
D. World Cup Format

The extension to "World Cup" environments presented a significant challenge to the system. The major difference between World Cups and more structured league formats is that each of the teams has a very different performance history on which to draw. All teams have played a different number of games, at greatly different intervals and against greatly different opposition.

In these cases it was necessary to considerably expand the number of teams under consideration, so as to have a measure of the "quality" of all teams involved in previous matches, in order to assess the significance of a given team's previous results. At the beginning of the tournament proper, the ratings for all non-active teams were disregarded and the process continued as with a normal league format.

This procedure was followed for both the 2003 and 2007 Rugby World Cup tournaments, where live predictions were again made. In 2003 the Sydney Morning Herald newspaper presented the predictions in direct competition with their own resident human expert. The final result for the MAIT system was 45 correct predictions from the 48 matches (a 93.8% success rate), compared to 42 from the human expert. In the 2007 tournament, one in which several more unusual results presented themselves, the system still performed quite well, recording 40 correct predictions from the 48 matches (83.3%).

E. Player Availability

During the 2006 NRL season, experiments were performed with a simple player availability algorithm. A requirement was imposed that all features be entirely objective. Also included was the definition of a "star" player as one who was involved in the most recent national or state side for which they were eligible. The number of star players unavailable for a given team for a given round was then used as an input feature in the model (divided by seventeen, the total number of players nominated for a single team), and the models were re-trained using data from the 2004 and 2005 seasons. A total of nine differences (compared with models not incorporating the player availability feature) occurred over the course of the 2006 season, with a single extra correct prediction being made when player availability is taken into account.

Despite the single extra hit, an improvement of just over 0.5%, this approach was abandoned in 2007 for two reasons. Firstly, significant effort was required to maintain the list of star players, as well as to perform comparisons with nominated teams on a weekly basis. Secondly, the actual prediction often changed (sometimes repeatedly) in the days leading up to the game based on the injury status of these star players.

V. FUTURE WORK

There are several possibilities for future directions with this work, the most imminent being an extension into further sporting arenas. The major professional American sports seem an obvious choice, and will allow a comparison with a larger set of existing systems. Expansion to different sports also allows for the development of a richer feature set, and the monitoring of the effects of these on the models.

The other focus area for future work in the short term is margin prediction. That is, not only are the models going to be used to predict a winner, but also the margin of victory, which obviously poses a more significant challenge to the system.

VI. CONCLUSIONS

Despite an existing "novelty" value for this work, there is still theoretical interest in the modelling of features in a noisy environment and the use of machine learning techniques to predict probabilistic events. Perhaps the primary attraction of sport in general is that there are so many elements which contribute to the result, and that on any given day either team is capable of winning. This same fact is what makes the prediction process so difficult, and why so much time and money is spent by individuals trying to predict winners in various tipping competitions and gambling outlets.

This paper described an extension to previous work in the generalization and modelling of the behaviour of teams in sporting contests. Results were reported for different sports and various seasons, and compared against human "expert" tipsters. The multi-layer perceptrons used were able to adapt very quickly and perform well despite the limited information and the outside influences not included in the feature set.

REFERENCES

[1] ABC Radio National. http://www.abc.net.au/rn/, 2007.
[2] Baulch, M. Using Machine Learning to Predict the Results of Sporting Matches. Thesis, University of Queensland, 2001.
[3] Bishop, C. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[4] Hassoun, M. Fundamentals of Artificial Neural Networks. Massachusetts Institute of Technology Press, 1995.
[5] Internet Digital Media Australia. TopTipper: Online Tipping Competitions. http://www.toptipper.com/, 2007.
[6] Kalyvas, E. Using Neural Networks and Genetic Algorithms to Predict Stock Market Returns. Masters Thesis, University of Manchester, 2001.
[7] McCabe, A. McCabe's Artificially Intelligent Tipper (MAIT). http://www.mymait.com/, 2007.
[8] McCabe, A. An Artificially Intelligent Sports Tipper. Artificial Intelligence '02, Canberra, 2002.
[9] McCann, P. J. and Kalman, B. L. A Neural Network Model for the Gold Market. http://citeseer.nj.nec.com/308853.html.
[10] McNelis, P. D. Neural Networks in Finance: Gaining Predictive Edge in the Market. Academic Press Advanced Finance Series, 2004.
[11] Op 't Landt, F. W. Stock Price Prediction Using Neural Networks. Masters Thesis, Leiden University, 1997.
[12] Pessoa, L. Multilayer Perceptrons versus Hidden Markov Models: Comparisons and Applications to Image Analysis and Visual Pattern Recognition. Qualifying Examination Report, Georgia Institute of Technology, 1995.
[13] Russell, S. and Norvig, P. Artificial Intelligence - A Modern Approach (2nd Edition). Prentice Hall, 2002.
[14] Swinburne Sports Statistics. http://www.swinburne.edu.au/sport/, 2007.
[15] Sydney Morning Herald. http://www.smh.com.au/, 2007.
[16] Townsville Bulletin. http://www.townsvillebulletin.com.au/, 2007.
[17] Toulson, D. L. and Toulson, S. P. Use of Neural Network Ensembles for Portfolio Selection and Risk Management. NeuroCOLT Technical Report Series, NC-TR-96-046, 1996.