Tennis Predictions

This document presents a study on predicting tennis match winners using soft computing techniques. It develops predictors using a fuzzy inference system, neural network, and strength equation to calculate the chances of each player winning based on their past performance data. The predictors take inputs like titles, career win ratios on different surfaces, recent matches, and ranking. The results from each predictor are then combined through voting to provide the overall prediction. The approach aims to predict matches before they start rather than during play.


Machine Learning Research

2017; 2(3): 86-98


http://www.sciencepublishinggroup.com/j/mlr
doi: 10.11648/j.mlr.20170203.12

Using Soft Computing Techniques for Prediction of Winners in Tennis Matches
Mateus de Araujo Fernandes
Federal Institute of Education, Science and Technology in Sergipe, Aracaju/SE, Brazil

Email address:
[email protected]

To cite this article:


Mateus de Araujo Fernandes. Using Soft Computing Techniques for Prediction of Winners in Tennis Matches. Machine Learning Research.
Vol. 2, No. 3, 2017, pp. 86-98. doi: 10.11648/j.mlr.20170203.12

Received: February 24, 2017; Accepted: March 20, 2017; Published: April 10, 2017

Abstract: The forecast of winners in sports brings valuable information for organizers, media and audience alike, and this is
particularly important in tennis, where the results of a round in a tournament determine which matches will occur in the next
round. With that in mind, this work presents a study of the main factors influencing match predictability and, from this
analysis, proposes a new hybrid approach to calculate the chances of victory of each competitor before the start of a
match. The techniques employed are a Fuzzy Inference System, with its ability to reproduce the knowledge of an expert from
mixed information, a Neural Network, with its capability of extracting features from examples, and a Strength Equation with
optimized weighting factors. These predictors take as inputs data from the players' previous performances, chosen here to
capture their short, medium and long-term performance, as well as their affinity for the different types of surfaces.
The results from these predictors are subsequently combined by a voting system. The results are encouraging, showing
significant gains when compared with the use of the ATP ranking.
Keywords: Artificial Intelligence, Forecast, Soft Computing

1. Introduction
Tennis is one of the most popular sports in the world, especially when considering the universe of individual sports. With an annual tour consisting of approximately 800 tournaments spread over 70 countries [1-2], where the most important of those attract millions of viewers and distribute prizes worth millions, this sport has a large and loyal legion of fans and its top players are some of the most popular and well-paid [3] sportsmen in the world.
With this popularity, tennis moves a continuously growing sum of money through tickets, advertising contracts, sporting goods and even bets, not to mention the prizes offered by the tournaments and the value of the athletes' images for publicity. Parallel to this increase in commercial interest, which permeates not only tennis but professional sports in general, there is an increasingly strong presence of quantitative scientific methodologies applied to their analysis. These methods have become indispensable both for players and coaches, to analyze performance, strategies, weaknesses, strengths [4], and even aspects of physical conditioning and biomechanics [5], and for organizers, investors and media, who are provided with important information for business planning and analysis of economic viability.
In this context, the development of predictors for match outcomes is one line of study, aiming to generate data that can be of interest not only for informational use or as a source of income from betting, but also for planning the tournaments and their coverage. Predicting the most probable matches to occur in the forthcoming rounds and/or their durations can, for example, assist in the allocation of attractive games to the major courts and the best times, allow forecasts of attendance and audience, and even support merchandising actions [6-7].
In the literature, several studies deal with the predictability of results employing the most diverse approaches, including point-by-point analyses during the match and predictions of winners before the start of each match.
The works of Clowes et al. [8] and Klaasen and Magnus [6] are some of those that are based on point-by-point analysis, focusing not only on the forecast before the
beginning of the match but also, and especially, during its course, with simulations based on the probability of the player who is serving winning the next point. Knottenbelt et al. [9] also presented a predictor for matches with analysis after every point, adding, however, information on the performance of the players involved against a common opponent in the past. This is done in order to eliminate the bias that exists in service statistics: stronger players, because they usually advance more often to the final rounds of tournaments, face, on average, stronger opponents.
Clarke and Dyte [10] fitted a logistic regression model to calculate the probability of winning a set based on the difference in ranking points between players. This model was used to forecast match outcomes and to simulate tournaments.
These works rely on the hypothesis that the points or sets played are independent and identically distributed (i.i.d.), meaning that previous results exert no influence on forthcoming results. However, the work of Klaasen and Magnus [11] discusses the validity of this hypothesis, concluding that winning the previous point has a positive influence on winning the current point, and that at pressure points the servers are negatively affected, which seems more plausible.
In the approach proposed by del Corral and Prieto-Rodríguez [12], consisting of a prediction of winners in Grand Slam matches without reference to the scoreboard, the variables analyzed for their influence on match results were the surface type and physical characteristics of the competitors, in addition to the ranking of both players. The work of McHale and Morton [7] performs predictions using a Bradley-Terry model (based on pairwise comparisons) fitted to previous results and to the surface on which the matches were played. Meanwhile, Scheibehenne and Bröder [13] show that it is possible to obtain good prediction rates using only the recognition of players' names by an audience that is not necessarily specialized.
In the present work the approach adopted is also intended to give forecasts of match outcomes before the first ball is tossed, without taking into consideration any events that may occur during the match. For this purpose, three different predictors are proposed: the first employs a Fuzzy Inference System based on memberships and rules that attempt to mimic the knowledge of an expert; the second uses an equation that calculates a "strength" factor for each player at a specific tournament, based on previous performance and optimized weighting factors; and the third uses an Artificial Neural Network, exploring its capabilities of learning and feature extraction from training sets built from a database of matches. To make the best of the power of these techniques, a preliminary study is carried out to provide insight into some quantitative performance factors and their correlation with the belief in who will be the winner of a particular match (with extensions to championships). At the end, the outcomes of these predictors are combined into one by a majority vote system.
The general framework and the dataset acquisition are presented in Section 2.1, followed by the analysis of the factors influencing match predictability in Section 2.2. Section 2.3 then contains the implementation details of the soft computing techniques employed for prediction. Subsequently, the results and their analysis are presented in Section 3 for the match predictors, with comparisons to real results. Finally, Section 4 brings the final discussions and the concluding remarks.

2. Method

2.1. Dataset Acquisition
The development of the quantitative methods proposed relies on a database composed of statistics of 220 active players in the men's professional circuit in the years 2014 and 2015. This data is made available by the Association of Tennis Professionals (ATP) on its official website [1] and consists of:
a) Number of titles accumulated during a player's career;
b) Career Ratio – Fraction of overall matches won throughout a player's career;
c) Grass, Clay and Hard Ratios – Fraction of wins on the different surfaces;
d) Grand Slam Ratio – Fraction of matches won in the four main tournaments, played as best of five sets;
e) Last 10 – Fraction of wins in the ten most recent matches.
These statistics comprise only results from matches played in the main draws of tournaments at ATP and Grand Slam levels, i.e., results in lower-level tournaments (Challengers, Futures, and Qualifiers) are not considered, in order to standardize the difficulty levels of the matches and keep the comparisons fair. The data employed was updated at different moments in order to be consistent with the required forecasts.
With that in hand, the study begins with an evaluation of players' performance data with the purpose of discovering which factors/parameters contribute most to more efficient forecasts. This evaluation is intended to maximize the predictive capabilities of the Soft Computing techniques, while defining the best variables to be used as inputs for the predictors.
An additional database with the results of all matches at these levels played in the most recent seasons was obtained from [14]. This dataset also gives the position and points in the ATP entry ranking for all the players, updated prior to the start of each tournament. During the development of the predictors, whose implementation details are discussed in Section 2.3, the data for the tournaments played in 2014 is used for training and adjustments. The data for the 2015 tournaments is used for testing, being presented only to the final versions of the predictors and allowing a comparative study of their performance in predicting match outcomes.
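As an illustration of the data described in Section 2.1, the player statistics (items a to e) and the season-based split between training and testing could be represented as follows. This is only a sketch: the class names, field names and the `split_by_season` helper are hypothetical and do not come from the paper, which does not describe its data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerStats:
    """One player's statistics, items a) to e) of Section 2.1 (hypothetical field names)."""
    name: str
    titles: int              # a) career titles
    career_ratio: float      # b) fraction of career matches won
    grass_ratio: float       # c) fraction of wins on grass
    clay_ratio: float        # c) fraction of wins on clay
    hard_ratio: float        # c) fraction of wins on hard courts
    grand_slam_ratio: float  # d) fraction of Grand Slam matches won
    last_10: float           # e) fraction of wins in the most recent matches


@dataclass(frozen=True)
class MatchRecord:
    """One played match from the results database (hypothetical layout)."""
    season: int
    tournament: str
    player_1: str
    player_2: str
    winner: str


def split_by_season(matches, train_season=2014, test_season=2015):
    """Return (training matches, test matches), following the 2014/2015 split in the text."""
    train = [m for m in matches if m.season == train_season]
    test = [m for m in matches if m.season == test_season]
    return train, test
```

Keeping the test season completely separate mirrors the statement that the 2015 data is shown only to the final versions of the predictors.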
2.2. Analysis of Influence Factors in the Forecasts

2.2.1. Ranking Influence
The ATP entry ranking [1] classifies professional tennis players based on the points accumulated in the tournaments played over the last 52 weeks, with the purpose of defining the admissions and the draws for the forthcoming tournaments. In this work, its use is proposed as a medium-term performance measure for the players.
With regard to the ranking information used in the predictions, an interesting observation is that, based on statistics of ATP matches, there is a strong tendency for the difficulty a tennis player faces in beating an opponent to increase in ever wider steps as the ranking of the opponent approaches the top. In other words, a victory of the player ranked 120th against the 101st is much more common than a win of the 20th against the leader of the ranking, although the difference in positions is the same. Therefore, the strength relationship that seems to exist is not linked only to the ranking position, which suggests a model involving a non-linear mathematical relationship.
To better understand this trend, the graph in Figure 1 presents curves of ranking points versus ranking position for five different dates between 2012 and 2015 (a period without changes in the criteria for points distribution in tournaments). When these curves are analyzed, the relationship obtained seems similar to the aforementioned trend, with differences in points becoming increasingly larger toward the top of the ranking. That makes sense given the way the ranking was designed, considering that, as the difficulty of opponents tends to grow rapidly, the ranking points awarded to a player for each round advanced in a tournament grow geometrically [1].

Figure 1. Ranking points and position relationship for different dates.
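The shape of the curves in Figure 1 can be explored numerically with the power-law fit that the paper reports as Equation (1). The sketch below uses only the two fitted constants from that equation; the function names are hypothetical, and the fitted curve is used here purely for illustration, since the predictors themselves use each player's actual current points normalized by the leader's points.

```python
def points_from_position(position: int) -> float:
    """Fitted ranking points for a ranking position, per Equation (1); position 1 is the leader."""
    if position < 1:
        raise ValueError("ranking positions start at 1")
    return 18157.0 * position ** -0.779


def normalized_points(points: float, leader_points: float) -> float:
    """Ranking points expressed as a fraction of the leader's points."""
    return points / leader_points


def normalized_gap(better_position: int, worse_position: int) -> float:
    """Difference in normalized fitted points between two ranking positions."""
    leader = points_from_position(1)
    return (normalized_points(points_from_position(better_position), leader)
            - normalized_points(points_from_position(worse_position), leader))


# Same difference in positions (19 places), very different gap in points:
gap_near_top = normalized_gap(1, 20)      # leader versus 20th
gap_lower_down = normalized_gap(101, 120)  # 101st versus 120th
```

Under this fit the gap between the leader and the 20th player is far larger than the gap between the 101st and the 120th, which is the non-linear behavior the text describes.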

Using curves for a larger number of dates, it is possible to model this tendency, as done by Clarke and Dyte [10], relating ranking points and position. For the case studied here, the best fit was found with a power equation whose parameters were adjusted by minimization of squared errors, resulting in (1):

$$\mathrm{Points} = 18157 \cdot \mathrm{Position}^{-0.779} \tag{1}$$

Considering this information, this work proposes, as a way to quantify the ranking dependence in the expected performance of the players, the simple use of their current number of points, normalized relative to the points of the leader.

2.2.2. Long and Short-Term Performance
Based on the obtained dataset, one possible way of quantifying a player's performance through his career is by his victory ratios (overall and on specific surfaces, as made available by the ATP); however, these ratios are not always reliable, mainly due to the difference between the numbers of matches played along the career of each athlete. To illustrate with a real case, the young tennis player Jiri Vesely, at the moment of a specific data collection for this study, had played only three matches on grass in high-level tournaments and had won two of them, resulting in a good ratio of 0.667. However, in practical terms, this value should not be considered more significant than the fraction of 0.656
obtained with 59 victories and 31 defeats by the much more experienced Ivo Karlovic.
For such reasons, forecast models should also consider other factors for a long-term performance measure; to this end, the overall career ratio and the number of titles are taken into consideration here. The latter is a relevant factor in the history of the athlete and, in this work, it is applied considering only absolute numbers, with no weighting of titles by their relevance. The performance factor is obtained simply by normalization, taking as reference the largest number of titles among active players, in this case the number of tournaments won by Roger Federer.
The short-term performance is quantified here as the fraction of matches won among the last 10 played immediately prior to the tournament under analysis. This number is based on matches played in the main draws of ATP and Grand Slam tournaments, without weighting victories by tournament level or by the level of the opponent. These measures, along with ranking information, are expected to portray more accurately the career and the current "momentum" of each player.

2.2.3. Surface Influence
Although in the early days of the sport all tournaments were played on grass courts, tennis now has three classes of surface: hard (which encompasses a variety of synthetic surfaces), clay, and grass itself, currently adopted in a small number of tournaments. Each of these surfaces, considering its influence on game speed, the bounce of the ball and the players' movements on the court, has peculiarities in physical demands, techniques and tactics, requiring great adaptability from the players and often resulting in significant performance differences.
The victory ratio on a specific surface is considered here because of this fact. It is an important factor to aid the forecasts, because different surfaces require different qualities from the athletes. For example, on grass, the fastest surface, players with a good serve and a greater ability to play aggressively, including net approaches and volleys to shorten the points, usually have their best performance. In contrast, on clay, the slowest surface, the best adapted players are usually those with good defensive skills and efficient movement along the baseline, which is correlated with performance in longer rallies. These characteristics are evident when comparing, for example, the styles of play and results on both surfaces of the greatest active champions, Roger Federer and Rafael Nadal: the former has a more offensive style and is the biggest winner of the professional era on grass courts, while the latter, with his efficiency near the baseline, is the greatest champion on clay courts.
The study of Clarke and Dyte [10] compares the preference of players for a certain surface to the home advantage observed in team sports such as football or basketball, given that, in tennis, playing a tournament in a player's home country usually does not bring a significant advantage to his performance, as pointed out by Holder and Nevill [15].
Moreover, the work of Barnett and Pollard [16] analyzed the performance of players on different surfaces, showing that those with better performance on grass courts rarely have clay as their second best surface (and vice versa). Hard courts, such as the DecoTurf used in the US Open and the Plexicushion used in the Australian Open, lie "halfway" between them.
An analysis of the database used in this work leads to a similar conclusion when the correlations between performances on different surfaces are quantified with Pearson's coefficient, also referred to as the product-moment correlation coefficient. This measure represents the strength of a linear relationship between paired data, and it is calculated by Equation (2):

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n} (y_i - \bar{y})^2\right]}} \tag{2}$$

where x and y are the data vectors, containing n values each, and $\bar{x}$ and $\bar{y}$ are their means. Values of r approaching 1 indicate strong linear relationships, while null values show lack of linear relationship between the vectors [17].
For the studied group, the correlation between the vector composed of the fractions of matches won on clay by the 220 players of the dataset and the analogous vector for hard courts was calculated as 0.688, and for the grass-hard pair, 0.719. These values clearly indicate a stronger correlation than that obtained for the grass-clay pair, calculated as 0.528.

2.2.4. Grand Slam Matches
Another variable of interest is the fraction of matches won in Grand Slam tournaments, the class composed of the most traditional and prestigious tournaments in the circuit: Australian Open, Roland Garros, Wimbledon and US Open. These tournaments are the only ones with main draws of 128 players and, for the men, with matches played as best of 5 sets. Accordingly, prize money and ranking points awarded to winners are also more generous. A different behavior is observed in the data in this case, which can be related to mental and physical components, as the matches are longer and are part of the biggest events, which draw more attention from public and media.
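The quantities introduced in Sections 2.2.2 and 2.2.3 are simple enough to sketch directly. The block below is a minimal illustration, not the author's implementation: the function names are hypothetical, and only the formulas (win fraction, title normalization by the largest title count, and Pearson's coefficient of Equation (2)) are taken from the text.

```python
import math


def win_ratio(wins: int, losses: int) -> float:
    """Fraction of matches won, as used for career, surface and last-10 ratios."""
    played = wins + losses
    if played == 0:
        raise ValueError("no matches played")
    return wins / played


def titles_coefficient(titles: int, max_titles: int) -> float:
    """Number of titles normalized by the largest title count among active players."""
    return titles / max_titles


def pearson_r(x, y) -> float:
    """Pearson product-moment correlation coefficient, Equation (2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return s_xy / math.sqrt(s_xx * s_yy)


# The two grass ratios compared in Section 2.2.2:
vesely = round(win_ratio(2, 1), 3)      # 0.667 from only three matches
karlovic = round(win_ratio(59, 31), 3)  # 0.656 from ninety matches
```

The two ratios are almost equal even though one rests on three matches and the other on ninety, which is why the text combines the career ratio with other long-term measures.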
[Figure 2: histogram; x-axis: Intervals (0.1 to 1); y-axis: Frequency (0 to 50); series: Clay Ratio, Grass Ratio, Hard Ratio]

Figure 2. Frequency distribution of victory ratios on the different surfaces for ATP tournaments.

[Figure 3: histogram; x-axis: Intervals (0.1 to 1); y-axis: Frequency (0 to 40); series: Grand Slam Ratio]

Figure 3. Frequency distribution of victory ratios for Grand Slam tournaments.

Evidence of this difference can be seen when comparing the graphs of Figures 2 and 3. The first shows the frequency distributions of the performances (win ratios) of the players in the group examined on the three different classes of surfaces for ATP tournaments, where it can be noticed that they approach symmetric Gaussian distributions with mean 0.5. However, the second plot, with performances in Grand Slam matches, has a slightly different look, with its peak shifted toward lower values.
This shift of the peak can be explained by the tendency for victories of lower-ranked players to become scarcer in these tournaments; in other words, a smaller number of players tends to concentrate the success. This higher rate of favorites confirming their status in Grand Slam tournaments is
also observed in the model proposed by Clarke and Dyte [10], which highlights that the favorite, being more likely to win sets, will be harder to beat in 5-set than in 3-set matches. This observation is consistent with the facts, given that in recent years most of the Grand Slam titles (42 of the 52 tournaments played between 2004 and 2016) were won by just three players: the Swiss Roger Federer, the Spaniard Rafael Nadal and the Serbian Novak Djokovic. By this feat, these players are already recognized as some of the biggest champions in the history of the sport.

2.3. Development of the Predictors
This section presents the theoretical aspects that underlie the proposed predictors, as well as the details of their design and implementation.

2.3.1. Fuzzy Predictor
Since the seminal work on this subject, the article by Zadeh [18], fuzzy logic has been employed in a large variety of problems, with Inference Systems being some of its most prominent applications. Introduced by Mamdani, those systems are governed by the approximate reasoning known as Generalized Modus Ponens, based on linguistic variables and IF-THEN implication rules to generate the typical reasoning of fuzzy systems, using human experience to develop intelligent algorithms capable of dealing with heterogeneous/imprecise data in a variety of applications [19-20]. The solution adopted in this work is based on a zero-order Sugeno inference system [21], where the consequent of each implication rule is a constant.
The fuzzy predictor developed here uses three input variables, each introduced in the form of a difference between the values for the two players involved in a specific match. The first is the difference between the current ranking points, normalized relative to the score of the leader of the ATP entry ranking at that moment, as described in Section 2.2.1. The second variable is the difference between the histories of the players, each quantified as the arithmetic mean of the win ratio (matches at ATP and Grand Slam levels) accrued throughout the career and the coefficient of titles, calculated as described in Section 2.2.2. The third variable is the difference between the career win ratios of the players computed only on the surface of the tournament under consideration. In this predictor, the performances in Grand Slam tournaments and in the last 10 matches were not included in the model, simplifying its design while making its rules more intuitive.
For each match to be predicted, these three inputs are calculated and subsequently fuzzified, being divided into four categories of values (high negative, low negative, low positive, and high positive) by using triangular membership functions defined generically by (3):

$$\mathrm{trimf}(x; a, b, c) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{c - x}{c - b}, & b \le x \le c \\ 0, & c \le x \end{cases} \tag{3}$$

and with the characteristics shown in Figure 4, for calculating the degrees of compatibility that provide a belief in the antecedents of each rule. Triangular membership functions were chosen because of their mathematical simplicity and efficiency, resulting in a reduced computational cost. No improvements in the predictions were observed by replacing the triangular membership functions with others (such as Gaussian), nor by optimizing the number of classes or fine-tuning their parameters.
To generate the set of rules that define the Inference System, human experience is the primary source of information, and fuzzy logic here serves its primary purpose, allowing knowledge that is commonly dealt with in linguistic form to be expressed mathematically. Thus, given the two possible outcomes, victory of Player 1 or victory of Player 2, a rule base is built to analyze the variables. A sample of these rules is shown in Table 1. The AND operators are implemented with the minimum function according to (4):

$$\mu_C(x) = \min\left(\mu_A(x), \mu_B(x)\right) \tag{4}$$

where $\mu_A$ and $\mu_B$ are the chosen membership functions.

Table 1. Excerpt from the rule base of the proposed inference system.

      ∆Ranking             ∆History             ∆Surface              Result
IF    High Positive   AND  High Positive        -               THEN  P1 Wins
IF    Low Positive    AND  Low Negative    AND  Low Positive    THEN  P1 Wins
IF    High Negative   AND  Low Positive    AND  High Positive   THEN  P1 Wins
IF    High Positive   AND  Low Negative    AND  High Negative   THEN  P2 Wins
IF    Low Negative    AND  Low Negative    AND  Low Positive    THEN  P2 Wins
IF    High Negative   AND  High Negative        -               THEN  P2 Wins
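The pieces of the fuzzy predictor (the triangular memberships of Equation (3), the minimum operator of Equation (4), the rule excerpt of Table 1, and the normalization of rule strengths into a belief per player that this section goes on to describe) can be put together in a short sketch. Several parts are assumptions rather than the paper's design: the membership parameters are evenly spaced guesses standing in for Figure 4, only the six rules of the excerpt are used, rule strengths are summed per outcome before normalizing, and all names are hypothetical.

```python
def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function, Equation (3); assumes a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# ASSUMPTION: four evenly spaced triangles over [-1, 1]; the real parameters are those of Figure 4.
THIRD = 1.0 / 3.0
SETS = {
    "HN": (-1.0 - 2 * THIRD, -1.0, -THIRD),  # high negative
    "LN": (-1.0, -THIRD, THIRD),             # low negative
    "LP": (-THIRD, THIRD, 1.0),              # low positive
    "HP": (THIRD, 1.0, 1.0 + 2 * THIRD),     # high positive
}

# Rule excerpt from Table 1: (d_ranking, d_history, d_surface, winner); None marks an unused input.
RULES = [
    ("HP", "HP", None, "P1"),
    ("LP", "LN", "LP", "P1"),
    ("HN", "LP", "HP", "P1"),
    ("HP", "LN", "HN", "P2"),
    ("LN", "LN", "LP", "P2"),
    ("HN", "HN", None, "P2"),
]


def predict(d_ranking: float, d_history: float, d_surface: float) -> dict:
    """Return a belief in [0, 1] for each player winning; the beliefs sum to 1."""
    inputs = (d_ranking, d_history, d_surface)
    strength = {"P1": 0.0, "P2": 0.0}
    for *labels, winner in RULES:
        degrees = [trimf(value, *SETS[label])
                   for value, label in zip(inputs, labels) if label is not None]
        # Fuzzy AND via the minimum, Equation (4).
        # ASSUMPTION: strengths of rules with the same outcome are summed.
        strength[winner] += min(degrees)
    total = strength["P1"] + strength["P2"]
    if total == 0.0:
        return {"P1": 0.5, "P2": 0.5}  # no rule fired: no preference
    return {player: value / total for player, value in strength.items()}


beliefs = predict(0.8, 0.6, 0.3)  # Player 1 clearly ahead in ranking and history
predicted_winner = max(beliefs, key=beliefs.get)
```

With the full rule base and the membership parameters of Figure 4 the numbers would differ; the sketch only shows how the differences between the two players flow through fuzzification, rule firing and normalization.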
[Figure 4: plot of membership functions; x-axis: Variables (-1 to 1); y-axis: Membership; curves: High Negative, Low Negative, Low Positive, High Positive]

Figure 4. Membership functions employed by the Fuzzy predictor.

The method described so far gives the weighting factors for each one of the results. However, the desired outputs here are the beliefs in the membership of the input data in each of the possible outcomes of a match, rather than a single value, as is usual in fuzzy inference systems. This leads to the adoption of the weighting factors themselves as the desired beliefs, after suitable normalization to obtain a percentage for each player. The victory is credited to the player with the higher value. The inference system for this task, as described, is illustrated in Figure 5.

Figure 5. Fuzzy inference system employed as predictor.

2.3.2. Neural Network Predictor
An Artificial Neural Network, in the original paradigm inspired by biological neural networks, consists of a set of processing units (also called neurons or nodes), each designed to provide an output value within a certain range, based on the weighted sum of its inputs and the subsequent application of an "activation function". There are several possible arrangements for the connections between those units to form networks [22], and the most widespread is the Multi Layer Perceptron (MLP), a network with direct (feedforward) signal propagation where the neurons are arranged in sequential layers and the outputs of every neuron in each layer are connected to inputs of the following layer. The "knowledge" of the network is stored in the weights associated with each of these connections, with "learning" carried out by iterative algorithms that adjust these weights based on examples (input-output pairs known a priori).
An MLP network with one intermediate layer and one output layer is able to solve some nonlinear problems and to approximate continuous functions, while the addition of one more intermediate layer enables it to implement any function, linearly separable or not, as demonstrated by Cybenko [23]. The number of nodes in each layer is the main factor responsible for convergence in the training phase and for the precision of the results [22].
Each of the ANN's nodes contains an activation function, responsible for calculating the node's output from the weighted sum of its inputs. The most usual activation functions are those based on sigmoid (S-shaped) functions, due to their balance between linear and non-linear behavior, and also because they are continuous-valued, monotonically increasing functions, differentiable at all points [24]. The sigmoid functions chosen for this case are the one known as logistic, given by (5):

$$f(x) = \frac{1}{1 + e^{-x}} \tag{5}$$

and the hyperbolic tangent, given by (6):

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{6}$$
These functions present the property of compressing the input: large positive values asymptotically approach one, while large negative values are squashed to zero by the logistic function and to minus one by the hyperbolic tangent. Other examples of activation functions can be found in [24].
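The two activation functions of Equations (5) and (6) can be written in a few lines. This is a minimal sketch with hypothetical function names; the branch on the sign of the input in the logistic function is an implementation detail added here to avoid overflow for very negative inputs and is not something the paper discusses.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid, Equation (5); output lies in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids computing exp of a large positive number when x is very negative
    return z / (1.0 + z)


def hyperbolic_tangent(x: float) -> float:
    """Hyperbolic tangent, Equation (6); output lies in (-1, 1)."""
    return math.tanh(x)


def logistic_derivative(y: float) -> float:
    """Derivative of the logistic function written in terms of its output y."""
    return y * (1.0 - y)


def tanh_derivative(y: float) -> float:
    """Derivative of the hyperbolic tangent written in terms of its output y."""
    return 1.0 - y * y
```

Both derivatives depend only on the function's own output, which is convenient for gradient-based training. The two functions are also related by tanh(x) = 2 · logistic(2x) − 1, so they have the same shape and differ only in output range and slope.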

Figure 6. Architecture of the Neural Network proposed for prediction.

In this study, such a network is trained, based on a dataset of matches where the winner is already known as well as data from the history and previous performances of the players, to use its ability of generalization to predict the winner in new matches when given new inputs. The architecture found most appropriate for this problem, determined empirically, is the one depicted in Figure 6 and was implemented in MATLAB.
This ANN is composed of an input layer with four neurons and hyperbolic tangent activation functions, a hidden layer with four neurons and logistic activation functions, and an output layer with two neurons and hyperbolic tangent activation functions. All links between nodes are weighted by a specific weight: the vector Wi for the input layer, Wh for the intermediate (hidden) layer, and Wo for the output layer.
As input variables for the neural network, in the final proposed model, the same three performance measures on which the Fuzzy predictor is based are employed: coefficients for the current score in the ranking, for the player's history (composed of number of titles and career win ratio) and for the ratio of victories on the surface of the tournament under analysis. However, in this model individual values are used for the two players competing against each other in a specific match, resulting in a total of six inputs. There are two output variables, each representing the victory of one of the players in binary form. So, for a victory of the first player, for example, the corresponding output is expected to have unit value and the other output a null value. An excerpt from the training dataset is shown in Table 2. Neural models that also use Grand Slam performance data (totaling eight inputs) were also evaluated. All the variables are normalized to the interval [0, 1].

Table 2. Excerpt from the training dataset for the Neural Network.

Tournament      Match             Score         Rk. 1   Hist. 1   Surf. 1   Rk. 2   Hist. 2   Surf. 2   P1   P2
Wimbledon’14    Murray-Dimitrov   1-6 6-7 2-6   0.374   0.757     0.830     0.208   0.470     0.622     0    1
US Open’14      Cilic-Federer     6-3 6-4 6-4   0.144   0.11      0.670     0.587   0.887     0.829     1    0
Paris’14        Djokovic-Raonic   6-2 6-3       0.817   0.826     0.828     0.279   0.546     0.710     1    0
Shanghai’14     Nadal-Lopez       3-6 6-7       0.600   0.870     0.776     0.124   0.435     0.509     0    1
Toronto’14      Ferrer-Dodig      1-6 6-3 6-3   0.290   0.682     0.637     0.057   0.312     0.483     1    0
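As an illustration only, the network described above and one weight update of the gradient-descent kind used in backpropagation can be sketched in Python with NumPy. This is not the author's MATLAB implementation: the weight shapes (Wi as 4×6, Wh as 4×4, Wo as 2×4), the absence of bias terms, the squared-error loss, the learning rate and the random initialization are all assumptions made for the sketch. The input vector is the first row of Table 2.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Wi, Wh, Wo):
    """Forward phase: 6 inputs -> 4 tanh -> 4 logistic -> 2 tanh outputs."""
    a1 = np.tanh(Wi @ x)
    a2 = logistic(Wh @ a1)
    y = np.tanh(Wo @ a2)
    return a1, a2, y

def train_step(x, target, Wi, Wh, Wo, lr=0.1):
    """One backpropagation update (weights changed in place).

    Returns the squared error measured before the update.
    """
    a1, a2, y = forward(x, Wi, Wh, Wo)
    err = y - target
    # Backward phase: all deltas are computed before any weight changes.
    d_out = err * (1.0 - y ** 2)              # derivative of tanh
    d_hid = (Wo.T @ d_out) * a2 * (1.0 - a2)  # derivative of logistic
    d_in = (Wh.T @ d_hid) * (1.0 - a1 ** 2)   # derivative of tanh
    Wo -= lr * np.outer(d_out, a2)
    Wh -= lr * np.outer(d_hid, a1)
    Wi -= lr * np.outer(d_in, x)
    return float(np.sum(err ** 2))

# Assumed initialization: small random weights, no biases.
rng = np.random.default_rng(0)
Wi = rng.normal(scale=0.5, size=(4, 6))
Wh = rng.normal(scale=0.5, size=(4, 4))
Wo = rng.normal(scale=0.5, size=(2, 4))

# First row of Table 2 (Murray-Dimitrov): six inputs and the target (P1, P2).
x = np.array([0.374, 0.757, 0.830, 0.208, 0.470, 0.622])
target = np.array([0.0, 1.0])

errors = [train_step(x, target, Wi, Wh, Wo) for _ in range(200)]
_, _, y = forward(x, Wi, Wh, Wo)
predicted_winner = int(np.argmax(y)) + 1  # 1 or 2, the larger output wins
```

In a real training run the update would be applied over the whole dataset in random order rather than to a single match, with one of the stopping criteria described in the text.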

The learning of the ANN in this study was based on the most popular of the training algorithms: backpropagation. In this method, the weights of the connections between the network’s nodes are initialized with random values. After that, sets of input values whose correct output is already known are presented to the network (in random order). For each one of these sets, the network output with the current weights is computed, in the so-called “forward phase” of training. The output obtained is then compared to the correct output pattern to allow the calculation of the error between both. This error is propagated through the network in a reverse path (the “backward phase”, which gives the algorithm its name). The product of the error of each output by a constant “learning rate” is subtracted from the connections’ weights of the respective node in the last layer. The error of each node of the previous layers is calculated using the errors of the nodes from the following layer connected to it, weighted by the weights of the connections between them
[22]. The procedure is repeated, with new pairs of input/output vectors being presented to the network, until a stopping criterion is reached: the mean square error becomes smaller than a predetermined limit, a maximum number of iterations is reached, or the error stagnates between iterations. Successful training results in a network ready for forecasting.

2.3.3. Strength Equation
Based on the previously presented analysis of the factors that influence match outcomes, an intuitive way of measuring them comparatively is by an equation in which each of the studied attributes is assigned a weighting factor. This equation, here called the “Strength Equation”, also aims to quantify the strength of each player for a specific tournament, that is, his ability to succeed based on his current form, his history and his performance on that specific surface. Therefore, the same equation may be used to predict match outcomes from a belief calculation based on the comparison between the strengths of any two players.
The proposed equation has the following form (7):

$$ S_n = w_1 \cdot \text{titles} + w_2 \cdot \text{ranking} + w_3 \cdot \text{last 10} + w_4 \cdot \text{career} + w_5 \cdot \text{grand slam} + w_6 \cdot \text{surface} \tag{7} $$

where each of the attributes is obtained as described in Section 2.2 and the vector wi is composed of their respective weighting factors. Figure 7 depicts schematically the process of forecasting the outcome of a match using this equation.

Figure 7. Representation of the forecast procedure using the Strength Equation.

The need to adjust the weights makes this model dependent on a dataset with match results and player information for “training”, as in the neural model. With this set, the adjustment can be performed by a combinatorial optimization process, which here is done by means of the evolutionary algorithm available in Microsoft Excel’s Solver. This tool provides a Genetic Algorithm for which the user defines the input variables, the output, the restrictions and the control parameters. The objective is to maximize the number of correct predictions of winners in that set of matches, in other words, to generate a combination of weights such that the winners’ strength is greater than the losers’ strength in as many cases as possible. For simplicity, the values of the weights were restricted to natural numbers in the [0, 5] interval, with no noticeable performance loss. The control parameters were set as follows: Convergence = 0.0001, Mutation Rate = 0.05, Population Size = 200, and Random Seed = 0. Here, two different equations were optimized with data from tournaments played in 2014: one for best-of-three-set matches (ATP level) and another for best-of-five-set matches (Grand Slam level), with the results shown in Table 3.
From the values in Table 3, the greatest influence of the surface and ranking for three-set matches can be seen, where the Grand Slam ratio was restricted to zero. Curiously, optimization also led the Last 10 weight to zero, even though it is one of the factors of greatest weight in five-set matches, along with the Ranking and Grand Slam ratio. In the latter case, the influences of career ratio and surface ratio were devalued.

Table 3. Strength Equation’s weights adjustment.

Attribute          wi (ATP’s)   wi (Grand Slam)
Titles             1            2
Ranking            4            4
Last 10            0            4
Career Ratio       4            0
Grand Slam Ratio   0            5
Surface Ratio      5            1

2.3.4. Voting System
With the three predictors based on soft computing developed, the following step was to build a system that encompasses their strengths, combining the three outcomes into one. A simple yet efficient way of achieving that is a voting system. Since there is an odd number of predictors, simple Majority Vote was chosen for aggregation, which means that the predicted winner of a match is the one considered the favorite by at least two of the three independent classifiers. Figure 8 depicts this process.
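The Strength Equation, its weight adjustment and the Majority Vote can be sketched together in Python, as an illustration rather than the procedure actually used. The weights are those of Table 3. Everything else is an assumption: the attribute names, the two sample players, the belief formula (player 1's share of the combined strength, since the text does not state how the comparison is turned into a belief), and the exhaustive search, which stands in for the evolutionary algorithm of Excel's Solver and is feasible because six weights in [0, 5] give only 6^6 = 46,656 combinations.

```python
from itertools import product

ATTRS = ("titles", "ranking", "last10", "career", "grand_slam", "surface")

# Weights from Table 3.
W_ATP = dict(zip(ATTRS, (1, 4, 0, 4, 0, 5)))
W_GRAND_SLAM = dict(zip(ATTRS, (2, 4, 4, 0, 5, 1)))

def strength(player, weights):
    """Equation (7): weighted sum of a player's normalized attributes."""
    return sum(weights[a] * player[a] for a in ATTRS)

def belief(s1, s2):
    """Assumed belief: player 1's share of the combined strength."""
    return s1 / (s1 + s2)

def fit_weights(matches, max_weight=5):
    """Exhaustive search over integer weights in [0, max_weight].

    `matches` is a list of (winner, loser) attribute dicts. Returns the
    weights that rank the winner strictly above the loser most often,
    together with that number of correct cases.
    """
    best, best_hits = None, -1
    for combo in product(range(max_weight + 1), repeat=len(ATTRS)):
        w = dict(zip(ATTRS, combo))
        hits = sum(strength(win, w) > strength(lose, w) for win, lose in matches)
        if hits > best_hits:
            best, best_hits = w, hits
    return best, best_hits

def majority_vote(votes):
    """Return the player (1 or 2) picked by most of an odd number of predictors."""
    return 1 if sum(v == 1 for v in votes) > len(votes) / 2 else 2

# Hypothetical players with attributes already normalized to [0, 1].
player_a = dict(zip(ATTRS, (0.8, 0.9, 0.7, 0.8, 0.85, 0.6)))
player_b = dict(zip(ATTRS, (0.2, 0.4, 0.6, 0.5, 0.30, 0.7)))

s_a = strength(player_a, W_ATP)
s_b = strength(player_b, W_ATP)
equation_vote = 1 if s_a > s_b else 2

# Hypothetical votes from the fuzzy and neural predictors (1 and 2).
winner = majority_vote([1, equation_vote, 2])
```

With the ATP weights, the two hypothetical players obtain strengths of 10.6 and 7.3, so the equation votes for player A and the majority goes the same way.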
Figure 8. Representation of the Voting System’s framework.

3. Results
The analysis of the proposed predictors’ performance was based on a database containing all match results from the last few seasons, available at [14]. That database, with the information of 1744 matches played in tournaments of categories ATP 250, ATP 500, Masters 1000 and ATP Finals, added to 508 Grand Slam matches (all of them played in 2014 and involving over 200 different players), constitutes the training dataset for the Neural Network, and is employed to fine-tune the Fuzzy predictor and to optimize the weights in the Strength Equation.
Validation of the predictors and the statistics of their responses were based on a set with data from 1109 matches played in ATP tournaments added to the 381 matches played in the Australian Open, Roland Garros and Wimbledon tournaments during the 2015 season. Again, this group of matches includes more than 200 different players. These are the same datasets employed by the preliminary work presented in [25].

Table 4. Correct Predictions by Ranking and Bookmaker’s Odds – Previous Years.

        ATP's                   Grand Slam
Year    % Correct   % Correct   % Correct   % Correct
        Ranking     Bets        Ranking     Bets
2010    64.72%      67.12%      74.95%      78.59%
2011    66.14%      69.40%      75.00%      78.22%
2012    66.07%      68.92%      74.85%      77.86%
2013    64.00%      66.98%      75.31%      77.97%
2014    66.85%      67.80%      74.07%      75.98%

From the group of matches played in 2014, the survey shows that the percentage of matches won by the best ranked player was close to 67% in ATP tournaments and 74% in the Grand Slam, which is quite consistent with the average observed over previous years, as shown in Table 4. This comparison is important to check that the analyses are not based on atypical data. The same table also shows, for comparative purposes, the percentage of correct predictions based on bookmakers’ odds taken from five of the major sports betting websites, according to numbers also compiled in [14]. These values represent the fraction of matches that were won by the players favored by the odds, which is a good benchmark for a predictor, given that these numbers depict the confidence of bookmakers, who are expected to have some knowledge about players’ (past performance, current form and eventual particularities) and tournaments’ characteristics.
Unpredictable results, which could be considered outliers, were not removed from the training dataset. This decision is intended to let the predictors try to draw a pattern for results that would normally be considered unforeseeable.
As previously mentioned, this work focuses on predicting winners in tennis matches without considering events during their course or even their final score. Thus, for the three proposed predictors the information required for analysis is a set of inputs for each match, as detailed in Section 2.3, and the winner of this match to compare with the predictors’ outputs.
The results of the predictors for the ATP-level matches used as the validation dataset are shown in Table 5, segmented by surface. It is noteworthy that, although the results are presented with this segmentation, there were no specific models per surface; the Fuzzy Inference System is the same in all predictions, the Neural Network is trained with all the matches of ATP tournaments without distinction, and the Strength Equation model also has its coefficients optimized for that class of tournaments. Hit rates are compared to those obtained from mere comparison of rankings at the time of the tournament and also from the bookmakers’ odds, as mentioned above.

Table 5. Performance of Matches’ Prediction – ATP’s.

Surface   Number of   % Correct   % Correct   % Correct   % Correct     % Correct   % Correct
          Matches     Ranking     Bets        Fuzzy       Neural Net.   Equation    Voting Sys.
Hard      474         62.87%      69.62%      65.82%      71.66%        66.88%      67.30%
Clay      483         68.74%      70.73%      70.60%      77.02%        72.26%      72.46%
Grass     152         55.92%      66.77%      62.50%      72.15%        69.08%      69.08%
All       1109        64.47%      69.75%      67.45%      74.06%        69.52%      69.79%

Table 6. Performance of Matches’ Prediction – Grand Slam.

Grand Slam        Number of   % Correct   % Correct   % Correct   % Correct     % Correct   % Correct
                  Matches     Ranking     Bets        Fuzzy       Neural Net.   Equation    Voting Sys.
Australian Open   127         74.80%      78.74%      77.95%      80.71%        85.83%      81.10%
Roland Garros     127         71.65%      78.15%      76.38%      77.56%        81.89%      78.74%
Wimbledon         127         75.59%      75.59%      74.02%      71.65%        72.44%      74.80%
All               381         74.02%      77.49%      76.12%      76.64%        80.05%      78.22%
From these results, it can be seen that the percentage of correct answers obtained by the Fuzzy predictor represents a gain over the prediction by ranking comparison, but is still inferior to that of the bookmakers, which was nearly equaled by the model using the Strength Equation. On the other hand, the Neural Network achieved the greatest accuracy by a very significant margin, which means that it was able to extract relevant features from the training dataset and to quantify them efficiently in the model. The voting system based on the outcomes of the other predictors presented good results, but was inferior to the Neural Network.
The results from the forecasts made for Grand Slam matches are shown in Table 6, presenting the same comparisons. In this case, the Fuzzy predictor had the same modeling and the same rule base previously used for the ATP tournaments, while the Neural Network, although having an identical configuration, was trained using as reference only the set consisting of the 508 Grand Slam matches played in the four tournaments held in 2014. The same applies to the Strength Equation, which had its coefficients optimized using this same dataset as reference. For the neural model, tests were also conducted including a new input: the Grand Slam victory ratio. However, the addition of this variable to the model did not improve the quality of the predictions, and because of this the results presented are from a network with the original configuration, previously presented in Figure 6.
For this class of tournaments, as the percentage of matches where the best-ranked player won is significantly higher, the margins of improvement with the use of the predictors are expected to be smaller, and the numbers in Table 6 show this for most cases. Here, the first two proposed predictors improved on the figures obtained from the forecast by ranking, but both were slightly below the performance of bookmakers. The Strength Equation, in turn, obtained a percentage of correct predictions notably above the others, especially for the first two Grand Slams analyzed, showing that the weights’ adjustment was efficient enough to result in a good model for this problem, after adding input variables representing the previous performance in Slams and in the last 10 matches. Once again, the voting system performed better than two of the predictors, but could not beat the best.
The difficulty in improving on the rates obtained by bettors makes clear the limitations of automatic forecasts, since these can never cover all the quantitative and qualitative aspects that a human predictor (such as those who bet) could take into account. Some examples of these aspects are the influence of the crowd, fatigue generated in previous rounds, momentary changes in physical and emotional conditions, extra motivation or pressure, etc.
Another metric for the quality of predictors is the DeFinetti Measure, capable of quantifying the accuracy of predictions when confronted with the results that actually occurred [26]. The importance of this quantification lies in the fact that the number of correct or incorrect outcomes from a predictor can often mislead its quality evaluation, since it disregards the previously estimated error margins. An example of this problem is illustrated by the different forecasts for the match between Rafael Nadal and Dustin Brown at Wimbledon 2015, surprisingly won by the German, who at that moment occupied the modest 102nd position in the ATP ranking. The Fuzzy predictor calculated his chances of victory with a probability below 0.001, while the Strength Equation indicated 0.186. While both missed the winner (in this case even bookmakers gave Brown a low credibility of 0.141), it is clear that the error of the Fuzzy predictor was more serious.
To obtain this measurement for a series of predictions, the DeFinetti distance must first be calculated for every match by equation (8):

$$ DF = \begin{cases} (p_{w1} - 1)^2 + (p_{w2} - 0)^2 & \text{if player 1 wins the match} \\ (p_{w1} - 0)^2 + (p_{w2} - 1)^2 & \text{if player 2 wins the match} \end{cases} \tag{8} $$

where pw1 and pw2 are the probabilities of victory previously assigned to the players. That distance corresponds geometrically to the squared Euclidean distance between the predicted values and the ones that really occurred, when win and loss probabilities are considered elements of a vector. The DeFinetti Measure of the series of predictions can then be obtained by calculating the arithmetic mean of the DeFinetti distances calculated for each match; the lower this average, the better the predictor [26].
The values obtained by this means for each of the proposed predictors are shown in Table 7, distinguished by the classes of tournaments: ATP’s with best-of-three-set matches and Grand Slam with best-of-five-set matches. By way of comparison, the table also contains values calculated for a simple predictor by ranking where the probabilities of winning of each player were obtained by weighting their ranking points at that moment. The measure for the Voting System was not computed, due to the absence of a numerical outcome.

Table 7. DeFinetti's Measure for the Proposed Predictors.

Level Number of Matches Ranking Predictor Fuzzy Predictor Neural Network Predictor Equation Predictor
ATP 1109 0.428 0.410 0.369 0.424
Grand Slam 381 0.351 0.337 0.342 0.335
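Equation (8) and the averaging step can be illustrated with a short Python sketch applied to the Nadal-Brown example from the text. Two assumptions are made here: each predictor's probabilities for the two players sum to one, and 0.001 is used in place of the Fuzzy predictor's "below 0.001". The function names are hypothetical.

```python
def definetti_distance(p1, p2, winner):
    """Equation (8): squared distance between the forecast and the outcome."""
    o1, o2 = (1.0, 0.0) if winner == 1 else (0.0, 1.0)
    return (p1 - o1) ** 2 + (p2 - o2) ** 2

def definetti_measure(forecasts):
    """Arithmetic mean of the distances over (p1, p2, winner) triples."""
    return sum(definetti_distance(*f) for f in forecasts) / len(forecasts)

# Nadal (player 1) vs. Brown (player 2) at Wimbledon 2015; Brown won.
# Values are the probabilities each source gave to Brown.
brown = {"fuzzy": 0.001, "equation": 0.186, "bookmakers": 0.141}
distances = {
    name: definetti_distance(1.0 - p, p, winner=2) for name, p in brown.items()
}

# Reference point: a forecast of 50% for each player always scores 0.5.
coin_flip = definetti_distance(0.5, 0.5, winner=1)
```

Under these assumptions the Fuzzy forecast scores about 1.996 for this match, against about 1.325 for the Strength Equation and 1.476 for the bookmakers, which quantifies why the Fuzzy error is the most serious although all three missed the winner.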
From the aforementioned results, the first observation to be made is that in all cases the measures are below 0.50, which means that all predictors perform better than a “predictor” that assigns a 50% chance of winning to each of the tennis players in every match. Moreover, it can be seen that the three methods showed better results than the inference by ranking, with a positive highlight for the value obtained by the Neural Network for the ATP’s and the best performance of the Strength Equation for Grand Slams, which is consistent with the percentages of correct outcomes.

4. Conclusion
This paper presented a study on the predictability of winners in tennis matches, starting from an analysis of players’ performance, taking into account their career, their current momentum, and their aptitude on different surfaces. The problem of predicting match outcomes was approached by three different methods: the first an Artificial Neural Network, the second a Fuzzy Inference System and the third a Strength Equation with weighting factors adjusted by optimization. They all rely on classical techniques of Soft Computing, considered relevant for their efficiency and versatility in applications, and the performances obtained (both individually and combined by a Voting System) confirm that. The predictors presented good results, always surpassing the hit rates obtained by simply comparing players’ rankings and in some cases even outperforming the bookmakers, who are in most cases experts. These predictors can also be used to estimate which players have a better chance of succeeding before a given tournament, helping coaches to select teams for competitions like the Davis Cup or the Olympics and even helping the players themselves to build their calendar with the tournaments where they could perform better.
The study presented here, however, is part of a model subject to many imperfections, since it is impossible to quantify dozens of factors that can influence the outcome of matches, such as the momentary emotional state, injuries, support from fans, fitness, possible lack of rhythm or form after an absence from the circuit, adaptations to changes in equipment, etc. However, it can be noted that there is room for improvement in predictions, especially by looking at the Neural Network’s results for the matches played in three sets, or the Strength Equation’s results for the Grand Slam matches, situations where large gains were achieved.
Future work will focus on improvement by using information from new variables, such as head-to-head numbers and the prize money obtained within some specified period of time preceding the tournament under analysis. That is a way of giving more value to the most important victories, as the major tournaments offer more generous prizes and higher monetary values are awarded for victories in the later stages of tournaments. That information, though more difficult to obtain, may allow a better performance in the predictions.

References

[1] ATP. Official site of men’s professional tennis. 2015. Available online at: <http://www.atpworldtour.com>. Last accessed: November 1, 2015.
[2] ITF. International tennis federation. 2015. Available online at: <http://www.itftennis.com>. Last accessed: November 1, 2015.
[3] FORBES. The world's highest-paid athletes. 2015. Available online at: <http://www.forbes.com/athletes/list/>. Last accessed: November 1, 2015.
[4] GONZÁLEZ-DÍAZ, J.; GOSSNER, O.; ROGERS, B. W. Performing best when it matters most: Evidence from professional tennis. Journal of Economic Behavior & Organization, n. 84, p. 767–781, 2012. ISSN 0167-2681.
[5] FERRAUTI, A. et al. Diagnostic of footwork characteristics and running speed demands in tennis on different ground surfaces. Sport Orthopädie Traumatologie, n. 29, p. 172–179, 2013. Available online at: <http://dx.doi.org/10.1016/j.orthtr.2013.07.017>. Last accessed: November 1, 2015.
[6] KLAASSEN, F.; MAGNUS, J. R. Forecasting the winner of a tennis match. European Journal of Operational Research, n. 148, p. 257–267, 2003. ISSN 0377-2217.
[7] MCHALE, I.; MORTON, A. A Bradley-Terry type model for forecasting tennis match results. International Journal of Forecasting, n. 27, p. 619–630, 2011. ISSN 0169-2070.
[8] CLOWES, S.; COHEN, G.; TOMLJANOVIC, L. Dynamic evaluation of conditional probabilities of winning a tennis match. In: AUSTRALIAN CONFERENCE ON MATHEMATICS AND COMPUTERS IN SPORT, 6. Proceedings… Gold Coast, Australia: 6M&CS, 2002. Available online at: <http://hdl.handle.net/10453/6673>. Last accessed: November 1, 2015.
[9] KNOTTENBELT, W. J.; SPANIAS, D.; MADURSKA, A. M. A common-opponent stochastic model for predicting the outcome of professional tennis matches. Computers and Mathematics with Applications, n. 64, p. 3820–3827, 2012. ISSN 0898-1221. Available online at: <http://dx.doi.org/10.1016/j.camwa.2012.03.005>. Last accessed: November 1, 2015.
[10] CLARKE, S. R.; DYTE, D. Using official ratings to simulate major tennis tournaments. International Transactions in Operational Research, n. 7, p. 585–594, 2000. ISSN 1475-3995.
[11] KLAASSEN, F.; MAGNUS, J. Are points in tennis independent and identically distributed? Evidence from a dynamic binary panel data model. Journal of the American Statistical Association, n. 96, p. 500–509, 2001.
[12] DEL CORRAL, J.; PRIETO-RODRIGUEZ, J. Are differences in ranks good predictors for Grand Slam tennis matches? International Journal of Forecasting, n. 26, p. 551–563, 2010. ISSN 0169-2070.
[13] SCHEIBEHENNE, B.; BRODER, A. Predicting Wimbledon 2005 tennis results by mere player name recognition. International Journal of Forecasting, n. 23, p. 415–426, 2007. ISSN 0169-2070.
[14] TENNIS DATA. Tennis results and betting odds data. 2015. Available online at: <http://www.tennis-data.co.uk/alldata.php>. Last accessed: November 1, 2015.
[15] HOLDER, R. L.; NEVILL, A. M. Modelling performance at international tennis and golf tournaments: is there a home advantage? The Statistician, n. 46, p. 551–559, 1997.
[16] BARNETT, T.; POLLARD, G. How the tennis court surface affects player performance and injuries. Medicine and Science in Tennis, n. 12, v. 1, p. 34–37, 2007. ISSN 1567-2352.
[17] WEISSTEIN, E. W. Correlation Coefficient. 2015. Available online at: <http://mathworld.wolfram.com/CorrelationCoefficient.html>. Last accessed: November 1, 2015.
[18] ZADEH, L. Fuzzy Sets. Information and Control, n. 8, p. 338–353, 1965. Available online at: <http://dx.doi.org/10.1016/S0019-9958(65)90241-X>. Last accessed: November 1, 2015.
[19] JANG, J.-S.; SUN, C.-T.; MIZUTANI, E. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Upper Saddle River, NJ, USA: Prentice-Hall, 1997.
[20] FERNANDES, M. A. Classificação de alvos utilizando atributos cinemáticos. Master’s Degree Dissertation, ITA, São José dos Campos, Brazil, 2009.
[21] SUGENO, M. et al. (Ed.). Industrial Applications of Fuzzy Control. New York, NY, USA: Elsevier Science Pub. Co., 1985.
[22] BRAGA, A. P.; CARVALHO, A.; LUDERMIR, T. Redes Neurais Artificiais – Teoria e Aplicações. Rio de Janeiro, RJ, Brazil: LTC, 2000.
[23] CYBENKO, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, Springer Verlag, n. 2, p. 303–314, 1989.
[24] HAYKIN, S. Neural Networks – A Comprehensive Foundation. Upper Saddle River, NJ, USA: Prentice Hall, 1998.
[25] FERNANDES, M. A. Inteligência computacional aplicada à previsão de vencedores em partidas de tênis. Revista Brasileira de Computação Aplicada, v. 8, n. 2, p. 82–98, 2016. ISSN 2176-6649.
[26] ARRUDA, M. L. Poisson, Bayes, Futebol e DeFinetti. Master’s Degree Dissertation, USP, São Paulo, Brazil, 2000.
[27] LIMA, B. N. B. et al. Probabilidades no esporte. TRIM: revista de investigación multidisciplinar, Universidad de Valladolid, n. 5, p. 39–53, 2012. Available online at: <http://uvadoc.uva.es/handle/10324/11665>. Last accessed: November 1, 2015.