Sports-scheduling
Sports-scheduling
https://doi.org/10.1007/s11750-020-00576-9
ORIGINAL PAPER
Guillermo Durán1,2
Abstract
Recent years have witnessed vigorous development in the application of mathemati-
cal and computational techniques to many aspects of the organization and planning
of sports competitions. Known as sports scheduling, this field is one of the subjects
of the present survey, which reviews different problems tackled in the associated
literature, the techniques employed for solving them and the results of their imple-
mentation in real-world cases. Other topics in sports analytics such as player perfor-
mance, result prediction, fantasy games and analysis of rankings are also examined.
Special attention is given to applications in Latin America. The main challenges fac-
ing sports scheduling and other areas of sports analytics are also discussed.
1 Introduction
Sports are not only one of the world’s most popular recreations but also a global
industry with a large economic footprint. The last Super Bowl, the annual champion-
ship game of professional American football (NFL), was a great sporting and artistic
event. Held in Miami in February 2020, it featured a half-time show with Jennifer
Lopez and Shakira, pulling in a worldwide television audience of some 100 mil-
lion viewers for a single event that lasted more than 3 h. The Rio Olympics in 2016
brought together more than 11,000 athletes representing 207 nations, who competed
in 41 disciplines from 28 different sports. The games were followed on television
* Guillermo Durán
[email protected]
1
Instituto de Cálculo y Departamento de Matemática, FCEN-UBA and CONICET, Buenos Aires,
Argentina
2
Departamento de Ingeniería Industrial, FCFM-Universidad de Chile, Santiago, Chile
13
Vol.:(0123456789)
G. Durán
by 5 billion viewers around the world and the cost of staging them was 13 billion
US dollars. Football, the most popular sport in Latin America, has some 200 mil-
lion registered players at all levels in more than 200 countries affiliated with FIFA,
the game’s international governing body. The organization’s most recent World Cup,
hosted by Russia in 2018, took place over an entire month, attracted a television
audience estimated at more than 3.5 billion and required investments of about 14 US
billion dollars, 2 billion more than the previous World Cup held in 2014 in Brazil.
These figures not only underline the importance of sports across the planet but
also suggest a breadth of possibilities for the utilization of quantitative tools to
enhance athlete performance and improve the organization, logistics, economics and
overall functioning of sports activities generally. Such applications of mathematical
tools are known in the literature as sports analytics, and make particularly heavy use
of the techniques of optimization, statistics and data science.
One of the fields within sports analytics that has seen the greatest development
in recent years is the efficient organization and planning of competitions known
as sports scheduling. The present survey reviews the main problems addressed in
the sports scheduling literature, the mathematical and computational techniques
employed in their solution and the results they have generated in actual practice.
Special emphasis is given to applications in Latin America, their implementation
and impact.
The first theoretical results in the field of sports scheduling date back to the 1980s
and the work of Dominique (De Werra 1980, 1981, 1982). In particular, the author
proved certain results on the minimum number of breaks (consecutive matches with
the same home-away status) in round-robin tournaments using concepts borrowed
from graph theory.
Most sports leagues around the world use a round-robin schedule format in which
each team plays every other team a fixed number of times. If they meet each other
just once, the format is a single round robin; if twice, it is a double round robin,
etc. According to Nemhauser and Trick (1998), there are two types of round-robin
schedules: temporally constrained and temporally relaxed. In the former case, the
number of available game slots (known as “rounds”) is equal to the number of
games each team must play, plus any necessary byes for leagues with an odd num-
ber of teams. Such schedules are said to be compact. Many leagues use this setup,
including most professional football leagues in Europe and Latin America. In tem-
porally relaxed schedules, on the other hand, the number of available rounds for
each team is greater than the minimum necessary so each team will have multiple
byes. This type of schedule is used by professional leagues in North America such
as the National Basketball Association (NBA) (Bean and Birge 1980; Bao 2009) and
the National Hockey League (NHL) (Fleurent and Ferland 1993; Costa 1995; Craig
et al. 2009), numerous amateur sports leagues (Knust 2010), and cricket leagues in
Australia, England and New Zealand (Willis and Terrill 1994; Wright 1994, 2005).
Interesting surveys of the sports scheduling literature can be found in Easton et al.
(2004), Kendall et al. (2010) and Rasmussen and Trick (2008), but all of them were
published at least a decade ago so an update covering the material published in the
years since is now in order. Benchmarks of instances from some applications can be
found in Nurmi et al. (2010) and Van Bulck et al. (2020).
13
Sports scheduling and other topics in sports analytics: a survey…
As regards sports analytics problems other than sports scheduling, the range
of possibilities is even greater. A number of these will be reviewed in this sur-
vey, including results prediction, fantasy games, rankings analysis and player per-
formance, the best-known example of the latter being an application to baseball
described in the book entitled Moneyball (Lewis 2003).
The remainder of this paper consists of three sections. Section 2 is devoted to
sports scheduling and surveys problems of optimization and feasibility in the draw-
ing up of sports schedules and efficient referee assignments, with special focus on
Latin-American applications. A number of basic formulations using integer pro-
gramming are also exhibited. Section 3 reviews some results on other topics in
sports analytics, including player performance, results prediction, fantasy games
and sports ranking analysis. Finally, Sect. 4 presents our conclusions and discusses
a number of interesting possibilities for future work in sports scheduling and other
areas of sports analytics.
2 Sports scheduling
This section addresses a number of different sports scheduling problems and devel-
ops, for three cases, their formulation as a mathematical model. The first subsection
introduces the optimization problems addressed in sports scheduling, most of which
are variations on the famous travelling tournament problem (TTP). The second sub-
section is devoted to feasibility problems, for which merely finding a feasible solu-
tion is a complex task given the set of constraints. Our focus will be on applications
to football, which generally fall into this category. The third subsection deals with
problems relating to the efficient assignment of referees in sports leagues, which
are essentially derivations of the travelling umpire problem (TUP) and the referee
assignment problem (RAP). Finally, the fourth subsection presents integer program-
ming models for some of the problems formulated in the three previous subsections.
An excellent test bed for sports scheduling models, algorithms and methodological
tools is the TTP as set out in Easton et al. (2001). Given a set of n teams and the
distances between their home cities (using as a source of the first instances the US-
based Major League Baseball), the TTP consists in scheduling a fictitious compact
double round-robin tournament (each team plays every other team twice, once at
home and once away) with 2(n − 1) rounds in such a way that no team plays fewer
than L or more than U consecutive games either at home or away, no team plays any
other team in two consecutive rounds, and the total distance travelled is minimized.
Also, no team returns home during a sequence of away games known as a road trip.
The TTP is very difficult to solve; indeed, small instances of just 12 teams have
still not been solved to optimality (see the TTP website Trick 2001). This remains
the case even for circular instances in which the team venues form a circle such that
the distance between two consecutive team venues is always equal to 1 (Trick 2001).
13
G. Durán
By contrast, solving such instances of the famous travelling salesman problem (TSP)
is trivial. However, it has also been proved in the case of the TTP that if all distances
between any pair of teams are equal to 1, distance minimization and breaks maximi-
zation are equivalent and solving the instance is, therefore, straightforward (Urrutia
and Ribeiro 2006).
The computational complexity of the TTP has not yet been fully determined apart
from a few special cases of L and U. What is known is that the problem is poly-
nomial for L = 1 and U = 2 (Easton et al. 2003), and NP-hard for L = 1 and U = 3
(Thielen and Westphal 2011) as well as for L = 1 and U = ∞ (Bhattacharyya 2016).
For any other values of L and U, the computational complexity remains an open
question.
Various techniques have been suggested for solving the TTP. A method proposed
in Easton et al. (2003) combines integer programming with constraint programming
(CP) while in Goerigk and Westphal (2012) integer programming is combined with
local search. In Benoist et al. (2001), an algorithm mixes CP with Lagrangian relax-
ation and in Henz (2004) CP is used in conjunction with large neighborhood search.
Different heuristic solution approaches have also been developed, such as simulated
annealing (Anagnostopoulos et al. 2006; Van Hentenryck and Vergados 2006), tabu
search (Cardemil and Durán 2002; Di Gaspero and Schaerf 2007) and genetic algo-
rithms (Choubey 2010). Yet other proposals attempt to solve the TTP using approxi-
mation algorithms (Miyashiro 2008; Yamaguchi et al. 2009; Westphal and Noparlik
2012).
A number of authors have analyzed variants of the original TTP. One that
arises quite naturally is the travelling tournament problem with predefined venues
(TTPPV) in which each pair of teams plays a single match with a predetermined
home-away status (Melo et al. 2006). The problem consists in finding a compact
single round-robin schedule compatible with the preset venue definitions such that
the total distance traveled by the teams is minimized and no team plays more than
three consecutive games either at home or away. Three different formulations of the
TTPPV are presented in Melo et al. (2006), one of which will be considered here in
Sect. 2.4.1. The computational tests detailed in the cited study show that instances
of the problem up to n = 8 can be solved to optimality with exact formulations but
beginning with n = 10, recourse must be had to heuristic methods.
The mirrored version of the problem has been analyzed in two articles. A two-
stage method that can solve to optimality instances of up to eight teams is proposed
in Cheung (2008). A heuristic combining GRASP with an iterated local search
presented in Ribeiro and Urrutia (2007) has obtained very good solutions for large
instances including the Brazilian football league in 2003 which then had 24 teams.
Another variant of the TTP, for application to multiple round robins (i.e., not just a
double), is presented in Hoshino and Kawarabayashi (2011). The authors designed
a method based on Dijkstra’s well-known algorithm to obtain a multi-round sched-
ule. They tested their approach in a case study of the Nippon Professional Baseball
(NPB) in Japan, where 2 leagues of 6 teams play a total of 40 sets of three intra-
league games over a series of 8 single round robins. The resulting schedules reduced
the total travel distance of the NPB’s manually defined 2010 season calendar by
25%.
13
Sports scheduling and other topics in sports analytics: a survey…
13
G. Durán
Unlike the cases discussed in the previous section, where optimization figures prom-
inently because of the travel reduction objective, many sports scheduling problems
have as their main objective the satisfaction of a series of constraints imposed by
a governing body of the league to be scheduled. In such situations, the problem is
essentially one of feasibility, and very often it is precisely the need to satisfy all of
those restrictions that results in great computational complexity. Typically there will
still be an objective function (to minimize breaks, minimize occurrences of multiple
games in a given locality/venue in the same round, etc.) but in the search for a good
solution it will not be the primary concern.
The majority of the world’s football leagues fit this category given that their sea-
son formats are typically round robins with a single match per week, which effec-
tively rules out the possibility of multi-game road trips and, therefore, the need to
minimize travel distance. The prime focus of this subsection will thus be on feasibil-
ity problems, especially as they arise in football and with an emphasis on successful
Latin American applications.
13
Sports scheduling and other topics in sports analytics: a survey…
To the best of our knowledge, there are only two documented real-world cases
of the application of OR scheduling to football in the last century. The first case
was the 1989–1990 season of the Dutch league (Schreuder 1992) in which 18 teams
competed in a single round robin. The underlying model considered various com-
mercial, sporting, and organizational factors. The aim was to satisfy certain condi-
tions by maximizing a weighted sum of soft requirements. The solution method was
heuristic and 80% of the requirements were satisfied.
The other case, this one in Latin America, was the 1995 season of the Argen-
tinean league. A simulated annealing approach was used to assign teams to a canon-
ical schedule, the same predetermined template used by most leagues around the
world where mathematical scheduling techniques are not employed. According to
Eduardo Dubuc [reported in Paenza (2006)], the assignments considered a num-
ber of factors relating to television broadcasting of the games. The approach was
dropped the following year, however.
In the following subsections, we survey various real applications implemented
in Latin America, Europe and international tournaments held over the last 20 years.
The first lasting experience with OR scheduling in football was initiated in Latin
America, where mathematical techniques have been used to generate schedules
for Chile’s professional leagues since 2005. Actual implementation is handled by
the Chilean Professional Football Association (ANFP), which applies integer pro-
gramming-based methods to decide what matches are played in each round of the
season. Various objectives are taken into account such as keeping costs down and
ensuring that the schedules are attractive for fans. The Association has so far defined
more than 60 schedules for the professional-level First, Second and Third Divisions,
the Women’s League and the youth divisions. The direct economic impact of the
improvements obtained through these OR techniques has been estimated at approxi-
mately US dollars 70 million, which includes reductions in television broadcaster
operating costs, growth in pay-television subscriptions, increased ticket revenue, and
lower travel costs for the teams (Alarcón et al. 2017).
As noted earlier, the problems encountered in these applications are essentially
those of feasibility. Since solving the basic model directly in its integer linear for-
mulation is only possible for small instances, these problems are typically addressed
through the use of a three-stage approach, as was proposed in Nemhauser and Trick
(1998). In the first stage, an integer programming model generates home-away pat-
terns (HAP) for the teams. In the second stage, a different feasible HAP is assigned
to each team (recall that in any feasible round-robin schedule, no two teams may
have the same HAP). Finally, in the third stage another integer programming model
is used to verify the remaining constraints reflecting conditions requested by the
Association and the teams themselves. Using this approach, solutions are found for
single round-robin instances of Chile’s First Division in 3–8 h.
In certain cases, however, the problem has proved to be solvable in a single stage
using constraint programming or a branch-and-cut approach. A detailed exposition
13
G. Durán
of these models and their implementation is set out in Durán et al. (2007), Durán
et al. (2012) and Noronha et al. (2007). A simplified version of the basic integer lin-
ear programming formulation is described below in Sect. 2.4.2.
These applications of OR techniques in Chilean professional football have also
had significant non-economic impacts (Alarcón et al. 2017). For example, the incor-
poration of team requirements and various sporting criteria has improved sched-
ule fairness and the transparency of the whole scheduling process, and has also
increased Chilean football fans’ interest in the leagues. The models and methods
used in the scheduling have been disseminated widely, which has helped to promote
OR as an effective tool for addressing practical problems. Our outreach activities on
sports scheduling have involved thousands of high school and university students
in four countries and a more general audience of millions of television viewers and
Internet users.
The Chilean football scheduling project as a whole was a finalist for the 2016
Franz Edelman Award, the most important international operations research com-
petition, organized by the US-based Institute for Operations Research and the Man-
agement Sciences (INFORMS). It was also a finalist for the EURO Excellence in
Practice Award (Bonn, 2009).
Other reported real-world scheduling applications in Latin American football
have been implemented by leagues in Brazil, Honduras, Ecuador and Argentina.
Similar techniques are now being used by DIMAYOR, Colombia’s top-tier division,
in a project which has yet to be written up in the literature.
The OR scheduling of Brazil’s highest division for the 2009 and 2010 seasons
is presented in Ribeiro and Urrutia (2012). The league, organized by the Brazilian
Soccer Confederation (CBF), had 20 teams in both years and the season format was
a mirrored double round robin. The modeling of the problem had two objectives: the
minimization of the number of breaks (a logical condition for schedule fairness) and
the maximization of the number of attractive games between elite clubs that could
be broadcast by the largest television network in the country, which is the league’s
major sponsor. The CBF also imposed a series of hard constraints, many of which
were typical of such formats such as no breaks in the first two and last two matches
of the season, crossed home-away patterns for certain pairs of teams (i.e., when one
plays at home the other plays away) and no scheduling of the most attractive matches
for the last rounds of the season.
The authors modeled the problem using the above-mentioned three-stage solu-
tion approach based on the method proposed in Nemhauser and Trick (1998). For
each of the two objectives, bounds were readily calculated specifying, respectively,
the minimum necessary number of breaks, and the maximum possible number of
attractive games between elite clubs that could be broadcast on television. The prob-
lem was attacked in the third stage (that is, after generating the home-away patterns
and assigning them to the teams) using a classic multi-objective optimization tech-
nique, imposing the minimum number of breaks as a constraint and maximizing the
other (attractive games) objective. If the optimal value was the upper bound, the
solution was said to be at the “ideal point” and is the one retained; otherwise, the
two first stages are repeated. In practice, “ideal point” solutions considered optimal
13
Sports scheduling and other topics in sports analytics: a survey…
in relation to both objectives were obtained for every instance in less than 10 min of
run time.
The proposed methodology was compared with the manual schedules drawn up
by the CBF for the 2005 and 2006 seasons, revealing in both cases that not only
were the manual game calendars far from the optimal solutions on both objectives
but also that certain of the hard constraints were not satisfied.
In the case of Honduras, the OR scheduling of the Central American country’s
professional league for the 2010 season is reported in Fiallos et al. (2010). A number
of conditions were considered and assigned different levels of priority. The objective
function minimized the cost of penalties for constraint violations. The 10 teams in
the league played a mirrored double round robin (i.e., the second half of the season
repeats the first except that the teams’ home-away status in each match is reversed),
followed by a playoff series. The integer linear program modeling the problem was
solved to optimality in a matter of minutes.
The development of the Ecuadorian application began in early 2011, when the
authors of Recalde et al. (2013) made a presentation to the Ecuadorian Football Fed-
eration (FEF by its Spanish initials) demonstrating that the design of feasible sched-
ules using mathematical programming would bring significant benefits unattain-
able with manual scheduling methods. The cited paper presents two methods that
were proposed for solving the problem, one an integer programming model and the
other a heuristic. The resulting schedules got the approval of the FEF and were used
from 2012 to 2018 by the league’s first and second divisions as well as the country’s
youth leagues.
Late in 2018 the Liga Pro de Ecuador, which replaced the FEF as the organizer
of the First and Second Divisions, contacted our group with a request to take over
the mathematical programming of their schedules. Beginning with the 2019 season,
we have designed the two divisions’ match calendars using an ILP model that satis-
fies various conditions. The most important of these include maintaining a fair bal-
ance between the teams both in travel distances and match locations in high- and
low-altitude regions of the country while also satisfying the special requirements of
individual teams. The First Division, currently with 16 teams, plays a mirrored dou-
ble round robin while the 10 Second Division teams play a quadruple round robin.
Each of the two double round robins is formatted on the French system, in which the
first round in the first round robin is the same as the last round of the second round
robin while the rest of the matches in each round robin follow the same sequence.
The third and fourth round robins simply repeat the first two. This French format
has been used in the past by the first divisions of the French and Russian leagues, is
the current arrangement in Luxembourg and the Czech Republic, and as we will see
later, is also employed in the South American Qualifiers for the World Cup.
Also in 2018, OR scheduling techniques were adopted by the Superliga Argen-
tina de Fútbol (SAF), responsible for organizing the season schedules of Argenti-
na’s First Division and its youth leagues. The First Division clubs all have teams in
each of these youth leagues, classed by age as major divisions (Under-20, Under-
18, Under-17) and minor divisions (Under-16, Under-15, Under-14). Regular season
play in the youth circuits typically follows a single round-robin format, the minor
divisions playing the same schedule as the majors but with the home-away status
13
G. Durán
of the matches reversed. This setup can give rise to very significant differences in
travel distances between a club’s major and minor division teams, a frustrating situa-
tion for club officials, coaches and players alike but almost impossible to avoid with
manual season scheduling techniques. Nor are manual methods able to take into
account any number of other criteria that go into the design of a satisfactory match
calendar.
Integer programming models have been implemented that simultaneously sched-
ule these multiple youth divisions while also meeting a series of other desirable con-
ditions. The central one was a better balance between the teams in travel distances,
which was pursued through the application of two alternative solution methods. One
of them was based on regional team clusters, treating the problem as one of feasibil-
ity, while the other was built around an explicit analysis of actual distances between
the teams’ home venues, an approach more closely resembling those described in
the previous section (optimization problems). The solutions generated by these two
methods have been used by the youth leagues to draw up their season schedules
since 2018, and have produced a series of benefits for all stakeholders (Durán et al.
2020).
In 2018–2019, the application of these techniques was extended to the First Divi-
sion of Argentina’s professional football league (Durán et al. 2019c). The season
consists of a single round-robin tournament. This can result in large differences
between teams in travel distances over the course of the tournament, an important
consideration in a league where an away match round trip can total more than 3000
km. In addition, since previous tournaments were scheduled manually, little care
was taken to alternate the venue where a given pair of teams met. This meant that
many such matchups were played at the same venue several seasons in a row. In the
most extreme case, two particular teams were scheduled to play each other at the
same venue 6 years running.
Among its other positive aspects, the model implemented for the Argentinean
league has generated relatively even road trips for neighboring teams and changed
the venues in every case where a pair of teams had played at the same location in
three or more consecutive seasons. The classic three-phase solution proved to be
ineffective in obtaining feasible solutions that were merely feasible, no doubt
because of the large number of teams (26 in 2018–2019 and 24 in 2019–2020).
The incorporation of geographical patterns (considering for each team, in addition
to whether in a given round it plays at home or away, the region where each away
match is played) was decisive in obtaining solution times of no longer than 1 h.
And finally, the scheduling also included the definition of the exact day and kickoff
time of each match within each round, results rarely attempted in football scheduling
applications.
13
Sports scheduling and other topics in sports analytics: a survey…
played a mirrored double round robin. The objective function minimized the cost of
penalizing the violated constraints and various hard or soft constraints representing
attractiveness to fans, fairness, and organizational criteria were imposed. The model
used a heuristic methodology to generate the schedules. In Austria, the same solu-
tion approach was utilized six times between 1997 and 2003 to define the schedules
for the league’s 10 teams playing a non-mirrored quadruple round robin.
A report on the scheduling of Denmark’s professional league premier division
for the 2006–2007 season using mathematical programming is given in Rasmussen
(2008). The 12 Danish clubs played a triple round robin and the schedule require-
ments included geographical and top-team considerations expressed by both hard
and soft constraints. The objective function minimized the penalties imposed for
violating certain of the restrictions. The solution used Benders’ decomposition and
column-generating techniques, providing good solutions with short computation
times.
The top-level Belgian soccer league has been scheduled since the 2007–2008 sea-
son using OR methods (Goossens and Spieksma 2009). The 18 league clubs play
a mirrored double round robin and the central objective is the minimization of the
number of breaks. A variety of other conditions are imposed relating to television
broadcast requirements and what are known as carry-over effects (team i confers
a unit of carry-over effect on team j if i plays the same third club in round k that j
plays in round k + 1). The constraints are classified by priority level and penalties
are levied for violations. The solution approach uses a mixed integer programming
(MIP) model that satisfies the mirroring constraint and minimizes the number of
breaks. Local search procedures are then used to improve the solution by minimiz-
ing the global penalty.
The scheduling of the Norwegian league is discussed in Flatberg et al. (2009).
The 14 teams in the league play a non-mirrored double round robin. The approach is
built around an integer programming model whose objective function incorporates
carry-over effects.
Other cases of the application of optimization techniques to the scheduling of
European football leagues that have not been reported in the literature but have been
brought to the attention of the author include leagues in the Czech Republic and
Poland, both instances dating back a number of years. A current example is the c.
The Spanish and English leagues, two of the world’s most important circuits, do
not use the canonical template, which is easily applied manually, nor have the sched-
uling methods they employ ever been reported in the literature. In Spain, the canoni-
cal schedule was used until some years ago (on this, see Goossens and Spieksma
2012, a highly interesting analysis of the numbers of teams and the schedule and
season formats in Europe’s top-tier leagues). As for England, the techniques utilized
by the firm responsible for the scheduling are described briefly on the webpage of
the Premier League (see https://www.premierleague.com/news/419020).
13
G. Durán
Very little has been reported in the literature on the use of mathematical and compu-
tational techniques to draw up schedules for international football tournaments. To
the best of our knowledge, the first case also originated in Latin America, with the
application of OR scheduling to the South American qualifying stage of the 2018
World Cup.
Every 4 years, the ten national teams in the South American Football Confedera-
tion (CONMEBOL) compete in this qualifying process to secure one of the slots in
the World Cup finals assigned to CONMEBOL members. The format for the quali-
fiers is a double round-robin tournament played over nine double rounds in which
every team plays twice in each one. The entire tournament is spread over 2 years, the
individual double rounds being scheduled several months apart.
The tournament schedule was based on the same mirrored schedule format for
some 20 years until CONMEBOL, faced with growing complaints from its mem-
bers, decided to look into a different arrangement for the 2018 qualifiers. Integer
programming models were used to construct schedules that overcame the main
drawbacks of the previous approach. After exploring numerous different design cri-
teria, a schedule template based on the French format was proposed to the CONME-
BOL members, the same one described here earlier for the Second Division of the
Ecuadorian league.
The main characteristics of the template were that every team plays once at
home and once away in each double round, each team’s double round home-away
sequences are well balanced (i.e., five double rounds start at home and four start
away, or the other way around), and no team is scheduled to play against both of the
strongest teams (Argentina and Brazil) in any given double round. The traditional
mirrored format previously used was unable to satisfy all of these conditions.
This proposal was unanimously approved by the CONMEBOL members and was
used for the qualifiers of the last (2018) World Cup. The design was presented at
an OR congress in Valencia, Spain in 2018, where it was a finalist for the EURO
Excellence in Practice Award. The implementation of the proposed models and their
impact are detailed in Durán et al. (2017). They will be employed again for the 2022
World Cup qualification process, the sole modification being that the restriction bar-
ring teams from playing both Argentina and Brazil in the same double round has
been dropped.
Unlike the scheduling literature on assigning teams to matches, few works have been
dedicated to the efficient assignment of matches to referees. As with the application
of the TTP to match scheduling, the publications that do exist utilize two different
formulations of the problem. In this case, the two are the Travelling Umpire Prob-
lem (TUP) (Trick et al. 2012) and the Referee Assignment Problem (RAP) (Duarte
et al. 2007a). Both attempt to capture the most essential elements of an efficient ref-
eree assignment.
13
Sports scheduling and other topics in sports analytics: a survey…
The TUP is formulated as follows. Given a set of n referees, 2n teams and a com-
pact double round-robin schedule format (every pair of different teams i and j plays
once in team i’s venue and once in team j’s), the number of rounds is 4n − 2. The
problem consists in assigning a referee to each match at a given venue in each round
(the assumed numbers of referees and teams implies all referees are needed in each
round) such that the total cost of referee travel is minimized over the course of the
tournament. There are five constraints in the problem:
Instances of the TUP are available on a website (Toffolo et al. 2015). In ⌊Oliveira
⌋
et al. (2015) it is shown that the TUP is NP-complete for 0 ≤ d1 ≤ n2 and
13
G. Durán
⌊ ⌋
d2 = n
2
− 1, while in Duarte et al. (2007a) it is demonstrated that the RAP is
NP-complete also.
Although both problems capture the general aspects of referee assignment, in
real cases other ad hoc conditions must be added to reflect the particularities of
the application in question. So far, there have been very few real-world appli-
cations of referee assignment using these techniques, but those that have been
reported include the following: cricket in England (Wright 1991), Major League
Baseball in the US (Trick et al. 2012), and in Latin America, football in Chile
(Alarcón et al. 2014) and basketball in Argentina (Durán et al. 2019b). The lat-
ter two cases will be described in more detail below. Other publications found in
the literature on the subject of referee assignment are confined to certain meth-
odological aspects or computational experiments (see, for example, Oliveira et al.
2016; Toffolo et al. 2016; Xue et al. 2015; Duarte et al. 2007b).
It should be noted that unlike match scheduling, where solutions typically relate
to complete seasons or tournaments, referee assignments are produced for short peri-
ods of time. The reason is that situations requiring assignment changes often arise
as a season progresses such as injuries, travel to other tournaments, or conflicts with
certain teams. When this happens, new data must be incorporated into the problem
so that updated solutions can be generated.
For Chilean football, integer linear programming models were developed for ref-
eree assignment at various league levels (Alarcón et al. 2014). A number of criteria
for enhancing the transparency and objectivity of the assignment process were con-
sidered. As well as better balances in the number of matches each referee had to offi-
ciate at, the frequency each was assigned to a given team, and the distances each had
to travel over the course of a season, these enhancements included the generation of
assignments that align referee skill and experience with the level of importance of
the matches they are assigned to.
The problem is solved using two approaches, one traditional and the other a pat-
tern-based approach inspired by the home-away patterns that have been successfully
employed for scheduling the season match calendars of sports leagues around the
world. The two approaches have been implemented for real instances of the prob-
lem, obtaining results that have significantly improved upon the manual assign-
ments. Also, the pattern-based approach has achieved major reductions in execution
times, solving real instances to optimality in a matter of seconds. By contrast, the
traditional formulation takes anywhere from several minutes to an hour or more to
find a solution.
The models initially assign referees for a complete tournament but are rerun for
each round or every two or three rounds, incorporating new and relevant information
as it arises. They were used a decade ago by Chile’s football leagues at both the pro-
fessional and youth levels.
As for Argentinean basketball, a referee assignment application employed by the
league that to some extent resembles the TUP, mainly in that both attempt to mini-
mize referee travel costs, is reported in Durán et al. (2019b). The approach devel-
oped for the purpose solves the problem using an integer linear programming model.
Simultaneously with travel cost minimization, the objective is to satisfy a series of
13
Sports scheduling and other topics in sports analytics: a survey…
other conditions. The problem is broken down into a series of relatively small sub-
problems representing successive periods of the season, the solution of each one
being part of the input to the one following.
Upon its development this approach was tested by applying it to the First Divi-
sion’s 2015–2016 season, the last one before the model was adopted when referees
were still assigned using manual methods. The travel costs simulated by the model
turned out to be 26% lower than those actually incurred under the manual defini-
tions, and unlike the latter, all of the different restrictions that had been requested by
the league authorities were satisfied. The tool has been used by the First and Second
Divisions since the 2016–2017 season.
In what follows, we set out for illustrative purposes the basic integer linear program-
ming (ILP) models for a variant of the travelling tournament problem (the TTPPV),
the Chilean football problem and the Travelling Umpire Problem.
This formulation is modeled with O(n3 ) variables and O(n4 ) constraints as presented
in Melo et al. (2006). Let G be the set of matches represented by ordered pairs of
teams determined by the predefined home-away status of each match. The game
between teams i and j is represented by an ordered pair that is either (i, j) or (j, i),
and is played in the former case at the team i venue and in the latter case at the team
j venue. Thus, for every pair of different teams i and j, either (i, j) ∈ G , or (j, i) ∈ G.
2.4.1.1 Variables The following two families of binary variables are needed to for-
mulate the model:
{
1 if team t plays at home against team j in round k.
ztjk =
0 otherwise.
{
1 if team t travels from the team i venue to the team j venue .
ytij =
0 otherwise.
The z variables are typically found in this type of formulation. The y variables repre-
sent the trip performed by a team between two cities (recall that a journey between
the cities of two different teams is performed at most once by each team).
2.4.1.2 Objective function The objective function minimizes the travel distances
over all of the teams.
n n n
∑ ∑ ∑
mπn
́ dij ⋅ ytij .
t=1 i=1 j=1
13
G. Durán
2.4.1.3 Constraints
2. The variables corresponding to games between teams t and j at the team t venue
take the value of zero if the game is predetermined to occur at the team j venue.
n−1
∑
ztjq = 0, ∀(j, t) ∈ G.
q=1
5. Team t makes a trip from the home city of team i to its own home city if it has an
away game against the latter followed by a home game in the succeeding round.
n
∑
ytit ≥ zit,k−1 + ztjk − 1, t, i = 1, … , n, with t ≠ i, k = 2, … , n − 1.
j=1
j≠t
6. Team t travels from its home city to that of team i to play away against the latter
following a home game in the previous round.
n
∑
ytti ≥ ztj,k−1 + zitk − 1, t, i = 1, … , n, with t ≠ i, k = 2, … , n − 1.
j=1
j≠t
7. Team t travels from its home city to the home city of team i if it plays away
against the latter in the first round.
ytti ≥ zit1 , t, i = 1, … , n with t ≠ i.
13
Sports scheduling and other topics in sports analytics: a survey…
8. Team t returns from the home city of team i to its own home city if it plays away
against team i in the last round.
ytit ≥ zit,n−1 , t, i = 1, … , n with t ≠ i.
9. In any 4 consecutive games team t cannot play more than three away, implying
that the team cannot play more than three consecutive away games.
k+3 n
∑ ∑
zjtq ≤ 3, t, i = 1, … , n k = 1, … , n − 4.
q=k
j=1
j≠t
10. In any four consecutive games team t must play at least one away, implying that
the team cannot play more than three consecutive games at home.
k+3 n
∑ ∑
zjtq ≥ 1, t = 1, … , n k = 1, … , n − 4.
q=k
j=1
j≠t
2.4.2 An ILP model for the Chilean first division football league
Below we present a model for a 20-team league that plays a single round robin (i.e.,
19 rounds). The teams are grouped into four zones of five teams with separate stand-
ings and point totals for each zone. The two top teams in each group at season’s end
participate in a playoff system to determine the champion. The model is a simplified
version of the basic integer linear programming model used by the Chilean football
league. The complete version may be found in Durán et al. (2007).
To represent home and away game sequence constraints (that is, to model the
“breaks”), we define ∀i ∈ I and ∀k = 1, … , 18 the following auxiliary variables,
also binary:
13
G. Durán
{
1 if team i plays at home in rounds k and k + 1.
yik =
0 otherwise.
{
1 if team i plays away in rounds k and k + 1.
wik =
0 otherwise.
For the instance, considered here, the model generates approximately 8000 variables.
2.4.2.2 Objective function The model maximizes an objective function that meas-
ures the concentration of games between teams in the same group (known as “attrac-
tive games”) played towards the final rounds of the tournament. The OF takes the
following form:
∑ ∑ ∑ ∑
́
max k ⋅ xijk ,
1≤k≤19 e i ∈ t(e) j ∈ t(e)
where
t(e) denotes the set of teams in group e, e ∈ E = {1, 2, 3, 4}.
In certain cases other games may be included in the “attractive games” category,
such as the “classic rivalries” or matches between teams fighting to avoid relegation
to the Second Division.
2.4.2.3 Constraints The constraints formulated below are the basic ones required to
define a single round robin plus a few examples of restrictions peculiar to the Chilean
tournament. In the complete version of the model, the constraints number about 3000.
1. Each team plays every other team once over the course of the 19 rounds in the
tournament.
∑[ ]
xijk + xjik = 1 ∀ i, j ∈ I.
k
3. Each team plays at least nine rounds, but not more than ten, at home.
∑∑
9≤ xijk ≤ 10 ∀ i ∈ I.
j k
13
Sports scheduling and other topics in sports analytics: a survey…
8. Let A be a set of rounds such that if a team plays at home (away) in a round
belonging to A, it must play away (at home) in the following round. Usually,
A = {1, 18}. The objective is to balance the teams’ home and away games
between the early and late stages of the tournament.
∑[ ]
xijk + xij(k+1) = 1 ∀ i ∈ I, k ∈ A.
j
9. Each team must play two of its four group opponents at home and the other two
away.
∑ ∑
xijk = 2 ∀e ∈ E, i ∈ t(e).
k j ∈ t(e)
10. When a northern (southern) team plays two consecutive away games, neither of
them will be played in the South (North).
∑
[xijk + xij(k+1) ] ≤ 1 − wjk ∀j ∈ North, k < 19
i∈South
∑
[xijk + xij(k+1) ] ≤ 1 − wjk ∀j ∈ South, k < 19.
i∈North
11. The three strong teams (UC, CC, UCH) play each other in three classic match-
ups, each team playing one of the matches at home.
∑[ ] ∑[ ]
xhik + xjik = xhjk + xijk h = UC, i = CC, j = UCH.
k k
12. No team may play two consecutive games against a strong team.
∑ [ ]
xijk + xjik + xij(k+1) + xji(k+1) ≤ 1 ∀ i ∈ I, k < 19.
j∈Strong Teams
13
G. Durán
Set forth in what follows is the exact formulation of the travelling umpire problem
using an ILP as presented in Trick and Yildiz (2012). This specification has O(n4 )
variables and constraints.
• 𝕊, 𝕋 , 𝕌 : Set{
of rounds, teams and referees, respectively.
j if team 𝐢 plays at home in round 𝐬 against team 𝐣
• OPP(s, i) =
−j if team 𝐢 plays away in round 𝐬 against 𝐣
• dij : kilometers between the home venues of teams 𝐢 and 𝐣
• n1 = ⌊ n −⌋d1 − 1
• n2 = 2n − d2 − 1
• N1 = {0, … , n1 }
• N2 = {0, … , n2 }.
2.4.3.2 Variables The following two families of binary variables are needed to for-
mulate the model:
⎧
⎪ 1 if a match at venue 𝐢, in round 𝐬, is assigned to referee 𝐮.
xisu =⎨
⎪0 otherwise.
⎩
⎧
⎪ 1 if referee 𝐮 officiates at a game at venue 𝐢 in round 𝐬 and then a game at venue 𝐣 in round 𝐬 + 1.
zijsu =⎨
⎪0 otherwise.
⎩
2.4.3.3 Objective function The model minimizes an objective function that meas-
ures the total number of kilometers travelled by the set of referees.
∑∑∑ ∑
mπń dij ⋅ zijsu .
i∈𝕋 j∈𝕋 u∈𝕌 s∈𝕊∶s<|S|
2.4.3.4 Constraints
13
Sports scheduling and other topics in sports analytics: a survey…
∑
xisu = 1, ∀s ∈ 𝕊, u ∈ 𝕌.
i∈𝕋 ∶OPP(s,i)>0
3. Each referee officiates at least once at a match involving each team at home:
∑
xisu ≥ 1, ∀i ∈ 𝕋 , u ∈ 𝕌.
s∈𝕊∶OPP(s,i)>0
4. Each referee officiates no more than once at any given venue every n − d1 rounds:
∑
xi(s+s1 )u ≤ 1, ∀i ∈ 𝕋 , u ∈ 𝕌, s ∈ 𝕊 ∶ s ≤ |S| − n1 .
s1 ∈N1
5. Each ⌊referee
⌋ officiates at a match involving any given team no more than once
every 2n − d2 rounds:
( )
∑ ∑
xi(s+s2 )u + xk(s+s2 )u ≤ 1, ∀i ∈ 𝕋 , u ∈ 𝕌, s ∈ 𝕊 ∶ s ≤ |S| − n2 .
s2 ∈N2 k∈T∶OPP(s+s2 ,k)>0
6. The following constraint is required to define the logical relationship between the
x and z variables, thus allowing the referee trips to be modeled:
xisu + xj(s+1)u − zijsu ≤ 1, ∀i, j ∈ 𝕋 , u ∈ 𝕌, s ∈ 𝕊 ∶ s < |S|.
In this section, we survey a few areas of research in sports analytics and the related
literature. In the area of sports business, Fried and Mumcu (2016) is worthy of note.
Player performance has been the subject of a number of books in recent years. Two
examples of these works are Alamar (2013) and Miller (2015).
Of cases involving a particular sport, the best known is the one described in the
book “Moneyball” (Lewis 2003), mentioned in the Introduction. It is based on the true
story of Billy Beane, general manager of the Major League Baseball team the Oakland
Athletics. Beane used data science to determine the team’s roster and improve its per-
formance, achieving great success with much smaller budgets than the big clubs in the
league. These techniques are now widely practiced in basketball, not just for choosing
players but also for game play decisions (Zuccolotto and Manisera 2020), especially
among NBA teams.
The following subsections focus on results prediction, fantasy games and rankings
analysis, three lines of investigation in sports analytics that make particularly heavy use
of statistical tools and data science.
13
G. Durán
3.1 Result prediction
Sports results prediction has seen steady growth in recent years. The most influential
Internet site for this activity is fivethirtyeight.com, which focuses on politics and
economics as well as sports. Taking its name from the number of electors in the
US electoral college, the site was created by the American writer and statistician
Nate Silver in 2008. Silver had already made his name in the early 2000s with the
development of a statistical system that projects the future performance of baseball
players. Another website, this one featuring predictions for cricket, basketball and
football using artificial intelligence tools, is predict22.com.
For predictions relating exclusively to football, a Bayesian methodology was pro-
posed in Suzuki et al. (2010). Its performance was tested on the results of the 2006
World Cup in Germany. The pioneer website for football-only predicting is 851.cl.
This site was launched for the 2015 Copa América (formerly the South American
Football Championship), was utilized again for the 2016 edition of the tournament,
and has been applied in various seasons of the Chile’s First Division.
The model behind 851.cl is based on a predictive model devised by Dixon and
Coles (1997) that assumes goal-scoring follows a Poisson distribution. It has two
specific coefficients per team denoted “offensive strength” and “defensive strength”,
and two general ones (the same for each team) capturing the home advantage
denoted “extra at-home offensive strength”, and “extra at-home defensive strength”.
In any given match with a home team H and an away team A, goals by H behave
according to a Poisson variable with parameter 𝜆 while goals by A behave accord-
ing to a Poisson variable with parameter 𝜇, where 𝜆 = “offensive strength H”—
“defensive strength A” + “extra at-home offensive strength” and 𝜇 = “offensive
strength A”—“defensive strength H”—“extra at-home defensive strength”.
These “extra” coefficients are not necessarily positive, and for matches played at
a neutral venue they are omitted from the calculations, in which case H and A are
taken simply as the identifiers of the two teams. All the coefficients are estimated by
generalized linear regression (calculated with the R software package) using data on
the teams’ past matches. The probabilities attached to the different possible results
for each match are calculated as the product of the two Poisson variables (i.e., the
probability that H scores m goals and A scores n goals is the probability of the for-
mer multiplied by that of the latter given the corresponding 𝜆 and 𝜇 coefficients for
H and A, respectively). Using these probabilities, future scheduled matches are sim-
ulated to obtain each team’s probability of ending the season at a given place in the
standings or advancing to a certain round in a cup tournament.
A broadly similar model on the website 301060.exactas.uba.ar. incorporates addi-
tional home-away factors specific to each team with a view to achieving more realis-
tic formulations of them than a single general specification used for all teams alike.
This extension of the model was used to predict the outcomes of the 2018 World
Cup, the 2019 Copa America, and the 2018–2019 and 2019–2020 Superliga Argen-
tina (i.e., First Division) seasons. As an indication of this approach’s accuracy, we
note that in the case of the 2018 World Cup, the score in 30 out of the 48 group
stage matches was precisely the one the model found most likely, while the same
was true for 11 out of the 16 matches in the knockout stage.
13
Sports scheduling and other topics in sports analytics: a survey…
3.2 Fantasy games
Building sports results prediction models has a number of aspects in common with
the development of quantitative tools for fantasy sports games. In football, the first
fantasy game was “Fantacalcio” (http://www.fantacalcio.kataweb.it), which was
13
G. Durán
3.3 Rankings
Another area that has generated considerable interest in the literature is sports rank-
ings. There are a number of works that study the predictive power of the football
rankings published by FIFA, which attempt to classify the world’s national sides.
Two examples are McHale and Davies (2007) and Lasek et al. (2013).
In Lasek et al. (2016), the authors propose strategies for teams to improve their
ranking while Cea et al. (2020) presents a discussion of what are considered to be
13
Sports scheduling and other topics in sports analytics: a survey…
deficiencies in the ranking method used by FIFA up until the 2018 World Cup and
makes proposals for its improvement. An alternative predictive model is calibrated
to provide a reference ranking for evaluating the performance of various simple
changes to that procedure. It should be noted, however, that since 2018 FIFA has
used a new ranking procedure whose formula takes its inspiration from the Elo rat-
ing system, which was originally created by the Hungarian physicist Arpad Elo to
rank chess players but has since been applied to various sports. No study of the pre-
dictive abilities of the new procedure has yet been published. The use of the FIFA
rankings and the draw format for the World Cup finals are discussed in Cea et al.
(2020) and Guyon (2015). Certain ideas proposed in the latter work were adopted
for the 2018 World Cup draw.
Similar studies have been done on rankings and performance analyses in profes-
sional tennis. In Radicchi (2011), Radicchi proposed an algorithm for identifying
the best tennis player in history. In an article published the following year (Din-
gle et al. 2012), the authors explored the relationship between official rankings of
professional tennis players and rankings computed using a variant of the PageRank
algorithm, originally generated by the creators of Google to rank pages (see Page
et al. 1998; Brin and Page 1998).
They show that Radicchi’s equations constitute a direct application of the Pag-
eRank algorithm and present up-to-date comparisons of official rankings with Pag-
eRank-based rankings for both the Association of Tennis Professionals (ATP) and
Women’s Tennis Association (WTA) tours. For top-ranked players the two rankings
are broadly in line but for those near the bottom there is wide variation, raising ques-
tions about how well the official ranking mechanism reflects true player ability. For
a 390-day sample of tennis matches, PageRank-based rankings were found to be bet-
ter predictors of match outcome than the official rankings. A later adaptation of the
authors’ ideas presented in Aronson (2015) arrives at similar conclusions.
This survey has reviewed the main problems addressed in the sports scheduling lit-
erature, the mathematical and computational techniques used to solve them and their
practical results. Much effort has been devoted to the study of these problems in
recent years and as we have just seen, many interesting real-world applications have
been implemented across a range of sports in countries around the globe. In particu-
lar, a number of successful applications in football, volleyball and basketball have
been reported over the last 15 years from Latin America, many of which have been
described here above.
There are, however, numerous challenges still to be tackled. One of them is the
parallel scheduling of multiple leagues, as is done with the six interrelated divi-
sions of the Argentinean youth leagues discussed here earlier. Other cases found in
the literature are an amateur softball league in the US (Grabau 2012), a regional
rugby league in New Zealand (Burrows and Tuffley 2015) and an amateur table ten-
nis league in Germany (Schönberger 2017). Multi-league scheduling is approached
from a more methodological standpoint in Davari et al. (2020). The article addresses
13
G. Durán
13
Sports scheduling and other topics in sports analytics: a survey…
double challenge is proposed in Linfati et al. (2019). The formulation of the problem
includes the usual objective functions and restrictions for the two subproblems to
address such considerations as travel distance reduction and minimum referee quali-
fications for certain matches. The proposed method is tested on two Latin American
leagues (the first divisions of Chile’s football and basketball leagues) and the World
Volleyball League, the latter a competition between participating countries’ national
teams, with very promising results.
As regards other areas in sports analytics, this survey reviewed the use of quan-
titative tools for results prediction, fantasy games and ranking analysis in various
sports. Also mentioned was the study of player performance, team improvement,
efficient use of team budgets, etc., with applications primarily to baseball, basketball
and American football. The use of these latter tools in football (soccer) is consider-
ably less common but will no doubt grow significantly in the years to come.
There no doubt exist many other aspects of sports analytics that are amenable
to mathematical techniques. For example, we are currently working on the applica-
tion of statistical techniques to interesting problems that arise in football such as the
moments at which goals are scored in the most important leagues and the generation
of a league inequality index based on the Gini Index used in economics (for more
details on this metric, see, for example, Ceriani and Verme 2012). The inequal-
ity analysis could be used to measure the differences between the teams of a given
league in their financial resources and how these relate to the teams’ respective point
totals in a given tournament.
In conclusion, the application of mathematical and computational techniques to
a variety of problems arising in sports scheduling and other areas of sports analyt-
ics over a range of different sports has been extensively studied and reported in the
literature. However, there remains ample room for broadening of the field to address
new questions and challenges. The coming years thus promise to be very productive.
Acknowledgements The author is grateful to both anonymous referees and one of the editors in chief for
their comments and suggestions, which were of great help in improving the final version of this article.
Thanks are also due to the Asociación Nacional de Fútbol Profesional de Chile (ANFP); the Superliga
Argentina de Fútbol (SAF); the Liga Pro in Ecuador; the Confederación Sudamericana de Fútbol (CON-
MEBOL); the Asociación de Clubes de Básquetbol de la Argentina (AdC); and the Asociación de Clubes
Liga Argentina de Voleibol (ACLAV) for their collaboration on various of the projects discussed in this
survey. The author also wants to thank to Mario Guajardo, Gonzalo Zamorano, Javier Marenco, Pablo
Rey, Facundo Gutiérrez, Iván Monardo, Alejandro Alvarez, Florencia Fernández, Flavia Bonomo, Ken-
neth Rivkin, Manuel Durán, Marcelo Mariosa and Claudia Zelzman for their helpful comments on this
survey. Partial funding for the preparation of this article was provided by the Department of Industrial
Engineering at the University of Chile, the Instituto Sistemas Complejos de Ingeniería (ISCI) in Santiago,
Chile (CONICYT PIA/BASAL AFB180003), and Grants nos. UBACyT 20020170100495BA (UBA,
Argentina) and ANPCyT PICT 2015-2218 (Mincyt, Argentina). Finally, the author owes a special debt
of gratitude to Rafael Epstein and Andrés Weintraub, who taught him the supreme art of applying opera-
tions research to real-world problems.
References
Alamar B (2013) Sports analytics: a guide for coaches, managers, and other decision makers. Columbia
University Press, New York
13
G. Durán
Alarcón F, Durán G, Guajardo M (2014) Referee assignment in the Chilean football league using integer
programming and patterns. Int Trans Oper Res 21(3):415–438
Alarcón F, Durán G, Guajardo M, Miranda J, Muñoz H, Ramírez L, Ramírez M, Sauré D, Siebert M,
Souyris S, Weintraub A, Wolf-Yadlin R, Zamorano G (2017) Operations research transforms sched-
uling of Chilean soccer leagues and South American world cup qualifiers. Interfaces 47(1):52–69
Anagnostopoulos A, Michel L, Van Hentenryck P, Vergados Y (2006) A simulated annealing approach to
the traveling tournament problem. J Sched 9(2):177–193
Aronson A (2015) TenisRank: Un nuevo ranking de jugadores de tenis basado en PageRank (in Spanish).
Master thesis in computer science, University of Buenos Aires
Atan T, Cavdaroglu B (2018) Minimization of rest mismatches in round robin tournaments. Comput Oper
Res 99:78–89
Bao R (2009) time relaxed round robin tournament and the NBA scheduling problem. Ph.D thesis, Cleve-
land State University
Bartsch T, Drexl A, Kröger S (2006) Scheduling the professional soccer leagues of Austria and Germany.
Comput Oper Res 33(7):1907–1937
Bean J, Birge J (1980) Reducing travelling costs and player fatigue in the National Basketball Associa-
tion. Interfaces 10:98–102
Beliën J, Goossens D, Van Reeth D, De Boeck L (2011) Using mixed integer programming to win a
cycling game. INFORMS Trans Educ 11(3):93–99
Benoist T, Laburthe L, Rottembourg B (2001) Lagrange relaxation and constraint programming collabo-
rative schemes for traveling tournament problems. In: Proceedings of the 3rd international work-
shop on the integration of AI and OR techniques (CP-AI-OR), pp 15–26
Bhattacharyya R (2016) Complexity of the unconstrained traveling tournament problem. Oper Res Lett
44(5):649–654
Bonomo F, Cardemil A, Durán G, Marenco J, Saban D (2012) An application of the traveling tournament
problem: the Argentine volleyball league. Interfaces 42(3):245–259
Bonomo F, Durán G, Marenco J (2014) Mathematical programming as a tool for virtual soccer coaches: a
case study of a fantasy sport game. Int Trans Oper Res 21(3):399–414
Brandao F, Pedroso JP (2014) A complete search method for the relaxed traveling tournament problem.
EURO J Comput Optim 2:77–86
Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN
Syst 30(1–7):107–117
Burrows W, Tuffley C (2015) Maximizing common fixtures in a round robin tournament with two divi-
sions. Australas J Comb 63(1):153–169
Cardemil A, Durán G (2002) Un algoritmo tabú search para el traveling tournament problem (in Span-
ish). Revista Ingeniería de Sistemas (Universidad de Chile) 18:95–115
Cea S, Durán G, Guajardo M, Sauré D, Siebert J, Zamorano G (2020) An analytics approach to the FIFA
ranking procedure and the World Cup final draw. Ann Oper Res 286(1):119–146
Ceriani L, Verme P (2012) The origins of the Gini index: extracts from Variabilità e Mutabilità (1912) by
Corrado Gini. J Econ Inequal 10:421–443
Cheung K (2008) Solving mirrored traveling tournament problem benchmark instances with eight teams.
Discret Optim 5(1):138–143
Choubey N (2010) A novel encoding scheme for traveling tournament problem using genetic algorithm.
IJCA Spec Issue Evolut Comput 2:79–82
Cocchi G, Galligari A, Picca NF, Piccialli V, Schoen F, Sciandrone M (2018) Scheduling the Italian
National Volleyball Tournament. Interfaces 48(3):271–284
Costa D (1995) An evolutionary Tabu search algorithm and the NHL scheduling problem. INFOR Inf
Syst Opera Res 33(3):161–178
Craig S, While L, Barone L (2009) Scheduling for the National Hockey League using a multi-objective
evolutionary algorithm. In: Proceedings of the Australasian joint conference on artificial intelli-
gence, pp 381–390
Davari M, Goossens D, Belien J, Lambers R, Spieksma F (2020) The multi-league sports scheduling
problem, or how to schedule thousands of matches. Oper Res Lett 48(2):180–187
De Werra D (1980) Geography, games and graphs. Discret Appl Math 2:327–337
De Werra D (1981) Scheduling in sports. N-Holl Math Stud 11:381–395
De Werra D (1982) Minimizing irregularities in sports schedules using graph theory. Discret Appl Math
4:217–226
13
Sports scheduling and other topics in sports analytics: a survey…
Di Gaspero L, Schaerf A (2007) A composite-neighborhood tabu search approach to the traveling tourna-
ment problem. J Heuristics 13:189–207
Dingle N, Knottenbelt W, Spanias D (2012) On the pageranking of professional tennis players. Lect
Notes Comput Sci 7587:237–247
Dixon M, Coles S (1997) Modeling association football scores and inefficiencies in the football betting
market. Appl Stat 46(2):265–280
Duarte A, Ribeiro CC, Urrutia S (2007) Referee assignment in sports tournaments. Lect Notes Comput
Sci 3867:158–173
Duarte A, Ribeiro CC, Urrutia S (2007) A hybrid ILS heuristic to the referee assignment problem with an
embedded MIP strategy. Lect Notes Comput Sci 4771:82–95
Durán G, Miranda J, Guajardo M, Sauré D, Souyris S, Weintraub A, Wolf R (2007) Scheduling the Chil-
ean soccer league by integer programming. Interfaces 37:539–552
Durán G, Guajardo M, Wolf YR (2012) Operations research techniques for scheduling Chile’s second
division soccer league. Interfaces 42(3):273–285
Durán G, Guajardo M, Sauré D (2017) Scheduling the South American qualifiers to the 2018 FIFA World
Cup by integer programming. Eur J Oper Res 262(3):1109–1115
Durán G, Durán S, Marenco J, Mascialino F, Rey P (2019) Scheduling Argentina’s professional bas-
ketball leagues: a variation on the relaxed travelling tournament problem. Eur J Oper Res
275(3):1126–1138
Durán G, Guajardo M, Gutiérrez F (2019) Efficient referee assignment in Argentina’s professional basket-
ball leagues using operations research methods. (submitted)
Durán G, Guajardo M, López A, Marenco J, Zamorano G (2020) Scheduling multiple sports leagues with
travel distance fairness: an application to Argentinean youth football. Int J Appl Anal (in press)
Durán G, Guajardo M, Zamorano G (2019) Scheduling the Argentina’s football superliga. In: Proceed-
ings of the 30th. European conference on operational research, Dublin, Ireland
Easton K, Nemhauser G, Trick M (2001) The traveling tournament problem: description and benchmarks.
Lect Notes Comput Sci 2239:580–584
Easton K, Nemhauser G, Trick M (2003) Solving the travelling tournament problem: a combined integer
programming and constraint programming approach. Lect Notes Comput Sci 2740:100–109
Easton K, Nemhauser G, Trick M (2004) Sports scheduling. In: Leung J (ed) Handbook of scheduling,
vol 52. CRC Press, Boca Raton, pp 1–52
Fiallos J, Pérez J, Sabillón F, Licona M (2010) Scheduling soccer league of Honduras using integer pro-
gramming. In: Johnson A, Miller J (eds) Proceedings of the (2010) industrial engineering research
conference, Cancún, Mexico
Flatberg T, Nilssen E, Stølevik M (2009) Scheduling the topmost football leagues of Norway. In: 23rd
European conference on operational research, book of abstracts, Bonn, Germany
Fleurent C, Ferland J (1993) Allocating games for the NHL using integer programming. Oper Res
41:649–654
Fried G, Mumcu C (2016) Sport analytics: a data-driven approach to sport business and management.
Routledge, London
Froncek D (2001) Scheduling the Czech national basketball league. Congr Numer 153:5–24
Gale D, Shapley L (1962) College admissions and the stability of marriage. Am Math Mon 69(1):9–14
Goerigk M, Westphal S (2012) A combined local search and integer programming approach to the trave-
ling tournament problem. In: Proceedings of the practice and theory of automated timetabling
(PATAT 2012), pp 29–31
Goossens D, Spieksma F (2009) Scheduling the Belgian soccer league. Interfaces 39(2):109–118
Goossens D, Spieksma F (2012) Soccer schedules in Europe: an overview. J Sched 15:641–651
Grabau M (2012) Softball scheduling as easy as 1-2-3 (strikes you’re out). Interfaces 42(3):310–319
Guajardo M, Jörnsten K (2017) The stable tournament problem: matching sports schedules with prefer-
ences. Oper Res Lett 45(5):461–466
Gupta A (2017) Time series modeling for dream team in fantasy premier league. In: Proceedings of the
international conference on sports engineering ICSE-2017, Jaipur, India
Guyon J (2015) Rethinking the FIFA World Cup final draw. J Quant Anal Sports 11(3):169–182
Henz M (2004) Playing with constraint programming and large neighborhood search for traveling tourna-
ments. In: Proceedings of the 5th international conference on the practice and theory of automated
timetabling (PATAT, 2004), pp 23–32
Hoshino R, Kawarabayashi K (2011) A multi-round generalization of the traveling tournament problem
and its application to Japanese baseball. Eur J Oper Res 215:481–497
13
G. Durán
Kendall G, Knust S, Ribeiro C, Urrutia S (2010) Scheduling in sports: an annotated bibliography. Com-
put Oper Res 37(1):1–19
Knust S (2010) Scheduling non-professional table-tennis leagues. Eur J Oper Res 200(2):358–367
Lasek J, Szlávik Z, Bhulai S (2013) The predictive power of ranking systems in association football. Int J
Appl Pattern Recognit 1(1):27–46
Lasek J, Szlávik Z, Gagolewski M, Bhulai S (2016) How to improve a team’s position in the FIFA rank-
ing? A simulation study. J Appl Stat 43(7):1349–1368
Lewis M (2003) Moneyball: the art of winning an unfair game. Norton & Company, London
Linfati R, Gatica G, Escobar J (2019) A flexible mathematical model for the planning and designing of a
sporting fixture by considering the assignment of referees. Int J Ind Eng Comput 10:281–294
McHale I, Davies S (2007) Statistical analysis of the effectiveness of the FIFA world rankings. In: Albert
J, Koning RH (eds) Statistical thinking in sports. Chapman & Hall CRC, Boca Raton, pp 77–89
Melo R, Urrutia S, Ribeiro C (2006) The traveling tournament problem with predefined venues. J Sched
12(6):607–622
Miller T (2015) Sports analytics and data science: winning the game with methods and models. Pearson
Education, New York
Miyashiro R, Matsui T, Imahori S (2008) An approximation algorithm for the traveling tournament prob-
lem. In: Proceedings of the 7th international conference on the practice and theory of automated
timetabling (PATAT, 2008)
Nemhauser G, Trick M (1998) Scheduling a major college basketball conference. Oper Res 46:1–8
Noronha T, Ribeiro C, Durán G, Souyris S, Weintraub A (2007) A branch-and-cut algorithm for schedul-
ing the highly-constrained Chilean soccer tournament. Lect Notes Comput Sci 3867:174–186
Nurmi K, Goossens D, Bartsch T, Bonomo F, Briskorn D, Durán G, Kyngäs J, Marenco J, Ribeiro C,
Spieksma F, Urrutia S, Wolf R (2010) A framework for a highly constrained sports scheduling
problems. In: Proceedings of the 2010 IAENG international conference on operations research
(ICOR at IMECS), Hong Kong
Oliveira L, Souza C, Yunes T (2015) On the complexity of the traveling umpire problem. Theor Comput
Sci 562:101–111
Oliveira L, Souza C, Yunes T (2016) Lower bounds for large traveling umpire instances. Comput Oper
Res 72(C):147–159
Paenza A (2006) Matemática... Estás Ahí? Episodio 2 (in Spanish). Siglo XXI, Buenos Aires
Page L, Brin S, Motwani R, Winograd T (1998) The pagerank citation ranking: bringing order to the web.
In: Proceedings of the 7th international World Wide Web conference, Brisbane, Australia
Radicchi F (2011) Who is the best player ever? A complex network analysis of the history of professional
tennis. PLoS One 6(2):e17249
Rasmussen R (2008) Scheduling a triple round robin tournament for the best Danish soccer league. Eur J
Oper Res 185(2):795–810
Rasmussen R, Trick M (2008) Round robin scheduling—a survey. Eur J Oper Res 188:617–636
Recalde D, Torres R, Vaca P (2013) Scheduling the professional Ecuadorian football league by integer
programming. Comput Oper Res 40(10):2478–2484
Ribeiro C, Urrutia S (2007) Heuristics for the mirrored traveling tournament problem. Eur J Oper Res
179:775–787
Ribeiro C, Urrutia S (2012) Scheduling the Brazilian soccer tournament: solution approach and practice.
Interfaces 42(3):260–272
Schönberger J (2017) The championship timetabling problem-construction and justification of test cases.
In: Proceedings of MathSport international (2017) conference, Italy, Padua, p 330
Schreuder J (1992) Combinatorial aspects of construction of competition Dutch professional football
leagues. Discret Appl Math 35(3):301–312
Suzuki K, Salasar B, Leite G, Louzada-Neto F (2010) A Bayesian approach for predicting match out-
comes: the 2006 (Association) Football World Cup. J Oper Res Soc 61(10):1530–1539
Thielen C, Westphal S (2011) Complexity of the traveling tournament problem. Theor Comput Sci
412:345–351
Toffolo T, Wauters T, Trick M (2015) An automated benchmark website for the traveling umpire prob-
lem. http://www.gent.cs.kuleuven.be/tup. Accessed 28 Feb 2020
Toffolo T, Wauters T, Van Malderen S, Vanden BG (2016) Branch-and-bound with decomposition-based
lower bounds for the traveling umpire problem. Eur J Oper Res 250(3):737–744
Toffolo T, Christiaens J, Spieksma F, Vanden BG (2019) The sport teams grouping problem. Ann Oper
Res 275:223–243
13
Sports scheduling and other topics in sports analytics: a survey…
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.
13