FATA-Skilloscopy_Bayesian_Modeling_of_Decision_Makers_Skill
6, NOVEMBER 2013
Abstract—This paper proposes and demonstrates Skilloscopy as an approach to the assessment of decision makers. In an increasingly sophisticated, connected, and information-rich world, decision making is becoming both more important and more difficult. At the same time, modeling decision making on computers is becoming more feasible and of interest, partly because the information input to those decisions is increasingly on record. The aims of Skilloscopy are to rate and rank decision makers in a domain relative to each other; these aims do not include an analysis of why a decision is wrong or suboptimal, nor the modeling of the underlying cognitive process of making the decisions. In the proposed method, a decision-maker is characterized by a probability distribution of its competence in choosing among quantifiable alternatives. This probability distribution is derived by classic Bayesian inference from a combination of prior belief and evidence of the decisions. Thus, decision makers’ skills may be better compared, rated, and ranked. The proposed method is applied and evaluated in the game domain of chess. A large set of games by players across a broad range of the World Chess Federation (FIDE) Elo ratings has been used to infer the distribution of players’ rating directly from the moves they play rather than from game outcomes. Demonstration applications address questions frequently asked by the chess community regarding the stability of the Elo rating scale, the comparison of players of different eras and/or leagues, and controversial incidents possibly involving fraud. The method of Skilloscopy may be applied in any decision domain where the value of the decision-options can be quantified.

Index Terms—Bayesian inference, decision making, skill evaluation.

Manuscript received September 15, 2011; revised September 19, 2012; accepted February 6, 2013. Date of publication August 7, 2013; date of current version October 14, 2013. This paper was recommended by Associate Editor E. P. Blasch.
The authors are with the School of Systems Engineering, The University of Reading, Whiteknights, Reading, Berkshire RG6 6AX, U.K. (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMC.2013.2252893
2168-2216 © 2013 IEEE

I. Introduction
A pilot attempts to land in marginal conditions. Multiple agencies work furiously on a major emergency. A student progresses his learning with less than total awareness, motivation, or organization. A golfer performs a 3-D rotation and translation in a powerful yet precise driver swing.
In problems of complex decision making, the combined pressure of events, real time, partial information, problem complexity, and limitations on human (and computer) resources may cause the human component to take suboptimal decisions, short of the utopian agent in a how to manual. To model such systems effectively, it is necessary to model and to measure decision makers’ skill.
The importance of skill evaluation and modeling of decision makers in motor and cognitive activities has been recognized in a broad range of application domains: mineral processing plant [1], surgery [2], [3], air traffic controllers [4], robotic arms [5], senior police officers [6], intelligent tutoring system and adaptive learning platforms [7], sports [8], and games [9].
Bayesian inference has been adopted successfully in many applications for model selection and decision making. Typically, these methods have been applied to determine models that explain the empirical evidence of data representing best solutions to problems or data generated by complex systems.
This paper proposes the application of the classic Bayes’ rule to determine parametric models of decision makers’ skill. Given a history of decisions in a given problem domain, can we generate a model of decision makers’ skill? How close is the decision making to optimal? And more specifically, how can their competence level be rated and ranked? Finally, how can we assess, monitor, and compare decision makers’ skills?
Decision makers exhibit different levels of expertise and competence, which can be represented by a skill level. Probability density functions can encode uncertainties due to the lack of empirical evidence and the intrinsic variability of human factors. In this paper, we combine prior beliefs and empirical evidence of the quality of the decisions made by decision makers using a Bayesian inference process to generate a simple probabilistic model of their skill.
The only assumption underlying Skilloscopy, our proposed approach, is the availability of a utility function, either exact or heuristic, that provides a numerical estimation of the utility of the alternative choices. The utility function is not required to be available at the time of the decision, nor available to the decision maker. It serves as benchmark system of the decisions in their context to support the assessment process. For example, this may be the case in training and monitoring human operators, in games and sports, or in cases, where the utility function is computationally prohibitive for a real time scenario. If a model and a software simulator is available, a brute force approach can provide a utility function by exploring all alternatives and evaluating their effects on the overall goal. If no explicit model of the system is available, the utility function could just be the off-line assessment by an expert.
The proposed method is demonstrated and evaluated in the game domain of chess. Chess has always been a favorite
Authorized licensed use limited to: Imperial College London. Downloaded on September 30,2022 at 09:09:23 UTC from IEEE Xplore. Restrictions apply.
DI FATTA AND HAWORTH: SKILLOSCOPY: BAYESIAN MODELING OF DECISION MAKERS’ SKILL 1291
demonstration domain in the fields of cognitive psychology and artificial intelligence as it is a well-documented familiar large complex model domain many of whose aspects are subject to quantification. An important contribution to the study of human skills from the chess domain is the Elo rating system [10]. It was originally devised for chess and adopted by the United States Chess Federation in 1960 and by the FIDE in 1970. The Elo rating system determines the relative strength (rating) of players by an iterative inference process based on the outcome of the games.
While, in general, it may be difficult to assess the accuracy of models of decision makers in many domains, models of chess players’ skill can be compared with the Elo ratings.
This paper proposes a data mining approach to model decision makers’ skill and, more specifically, its contributions are summarized as follows.
1) It defines a generic domain-independent model of stochastic decision-making agents.
2) It applies a Bayesian inference methodology by means of an efficient adaptive algorithm to generate probabilistic models of decision makers’ skill.
3) It demonstrates the use of the method in the game domain of chess to determine players’ skills from the quality of decisions (moves) rather than from game outcomes.
4) It demonstrates applications of the method to address a number of questions asked in the chess domain.
Although the proposed method is experimentally tested on the game domain of chess, it is not based on any specific model of the decision making process of chess players. How a player chooses a particular move among alternative choices (variants) is not considered or modeled. The method can be easily adopted in rating skilled behavior and general types of expertise in other domains. The method infers skills directly from the innate quality of the decisions and independently of the competitive nature of the activity. For example, this method could be effectively adopted to monitor and evaluate training and education activities.
The rest of this paper is organized as follows. Section II reviews the general approaches in Decision Theory. Section III introduces the general problem of modeling decision makers’ skill and the proposed Bayesian inference method. Section IV discusses related work in skill rating, in general, and in chess, in particular. Section V presents the application of the proposed method to generate chess players’ ratings. An experimental analysis of the method and innovative applications in the domain of chess are presented, respectively, in Section VI and VII. Section VIII reviews related work. Finally, Section IX summarizes the paper and indicates some future research directions.

II. Modeling Decision Making
Decision making under uncertainty and, more generally, Decision Theory, has been studied for a long time [11] with contributions from several academic disciplines including statistics, economics, psychology, philosophy, political, and social science. Decision Theory is concerned with identifying utility values and the uncertainty of decisions. It is typically distinguished into descriptive, prescriptive, and normative. Normative Decision Theory is concerned with identifying optimal decisions as taken by an ideal decision maker, who is rational, fully informed, and able to compute with perfect accuracy. Prescriptive Decision Theory is concerned with what people should and can do. Decision analysis [12] (for a more recent survey, see [13]) is the practical application of Prescriptive Decision Theory, and is aimed at finding tools and methodologies that can support people to make better decisions.
In contrast, Descriptive Decision Theory [14] is concerned with describing what people actually do. Descriptive models have been developed with the aim of capturing the underlying processes that guide human choice behavior under uncertainty. However, the effort in understanding and describing the actual process of and reasons for decision making has not been supported by empirical studies; the development of a Descriptive Decision Theory itself has been questioned as nonachievable [14].
In a long tradition, cognitive psychology and artificial intelligence have also tried to define and explain skilled behavior, such as human expertise, in problem solving and decision making. The game of chess has always been a favorite demonstration domain. Chess players’ thinking [15], [16] has been studied for a long time and two main models have been provided. One mechanism is based on pattern recognition to access a knowledge database. The second approach is a search strategy through the problem space. The relative importance given to knowledge and quantitative search varies in the proposed theories of skilled behavior in chess [17], [18].
This paper takes a descriptive data mining approach to the problem. Rather than building an explicit model of the decision making process, we propose to build a probabilistic model of the decision maker’s skill, without any attempt to understand or to model the underlying decision process.

III. A Model of Decision Makers’ Skill
In problems of a strategic character, human beings exhibit different levels of expertise and competence in making decisions that have a direct or indirect effect on the achievement of an overall goal. In general, we expect that the competence of decision makers is the cumulative result of abilities, training and experience, typically referred to as “skill.”
In general, a model of decision makers’ skill may be defined by a set of parameters, i.e., multidimensional skills. Without the loss of generality, in this paper, we limit our analysis to the simple model of a single parameter c ∈ R. We assume that decision makers can be modeled by a numeric skill level c, which indicates their competence in solving a particular class of problems. The advantage of defining models with a single parameter is that the skill level can be directly used for ranking and rating. However, a multidimensional skill model may be more suitable to capture specific aspects in more complex domains.
Even though the average skill level of a given decision maker is expected to vary smoothly over time, decision
1292 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 43, NO. 6, NOVEMBER 2013
A. Reference Chess Engine and Stochastic Chess Agent
During a chess game, players make decisions according to individual judgement under time pressure. Rational decision making requires a definite set of alternative actions and knowledge of the utilities of the outcomes of each possible action. A player’s skill is a measurement of their ability to make choices as close as possible to the optimal ones. In order to assess a player’s skill level, we ideally need the best move benchmark but this is only available via Endgame tables in the Endgame Zone. In the general case of the whole game, this is clearly not feasible. However, significant advances have been made in the last decades in terms of chess engines’ playing strength [28]. Chess engines can be adopted as domain experts to provide utility values for alternative moves.
Given a chess engine CE, a chess board position analysis results in a list of recommended best moves and their heuristic utility values in pawn units (centipawns). The value associated with a move corresponds to the estimated advantage of the position that the move will lead to after a number of turns. Because of time constraints and the exponential nature of the computational complexity, the analysis of the chess engine can only be performed up to a given maximum depth d (number of plies) and for a limited number $n_{mv}$ of alternative variants. For the purpose of this paper, we consider the reference chess engine $CE(d, n_{mv})$.
In contrast to the ideal Endgame Zone scenario, three main factors introduce an approximation in the evaluation of a chess position in terms of candidate moves and relative values. They are the limited search depth and span (parameters d and $n_{mv}$) and the heuristic nature of the chess engine’s analysis. This is a general problem of sensitivity to heuristic utility functions in Descriptive Decision Theory. The influence of this approximation in our analysis will be the scope of future investigations.
The analysis of a position q is a function $f_{CE}$ that provides a list $V_q = \{(m_i, v_i)\}$ of candidate moves $m_i$ and their estimated utility values $v_i$ ($1 \le i \le n_{mv}$)

$$f_{CE} : q \to V_q.$$

Let $M_q = \{m_i\}$ be the set of candidate moves in $V_q$ and $v_{max} = \max_i \{v_i\}$.
To model human players’ skill we associate a likelihood function L (1) with the reference chess engine CE to create a stochastic chess agent R(c). The stochastic chess agent will not always play the best move; it uses a likelihood function to select one of the alternatives provided by the reference chess engine.
Following the approach in Section III-B the probability of R(c) selecting a move $m_j$ is given by:

$$p(c \mid m_j) = \frac{(v_{max} - v_j + k)^{-c}}{\sum_{m_i \in M_q} (v_{max} - v_i + k)^{-c}}, \quad \text{if } m_j \in M_q \qquad (6)$$

VI. Experimental Analysis
For the experimental analysis, the following resources have been used:
1) publicly available data in Portable Game Notation (PGN) from sources including the Chessbase database [34];
2) Toga II v1.3.1 [35], a publicly available, reputable, and widely used chess engine;
3) the Universal Chess Interface (UCI) protocol [36] for chess engine input/output;
4) Java code implementing UCI and the Bayesian inference method (see Section III-C).
During a preprocessing phase, positions were acquired from the games, ignoring the first 12 moves by each side (assumed to be from the book). Configurations of the chess board were converted to a set of events {e = (p, m)}, where each move m is made in a board position p. Positions were analyzed using the chess engine, which provided the utility values for a finite set of alternative moves.
Each analysis was carried out to depth d = 10 plies. The chess engine was configured to determine and report the top moves ($n_{mv}$ = 10) it found in each position. Both values were chosen as compromises between computing speed and comprehensiveness of the data. Depth 10 is not considered sufficient to outplay the stronger players in our samples, but apparently it suffices to identify most of their inaccuracies.
Finally, the Bayesian inference process has been applied to the preprocessed data to generate the probability distribution of the parameter c.
In all our tests, the arbitrary constant k of formula (1) has been set to 0.1, which corresponds to a thousandth of the value of a pawn.

A. Composite Reference Elo Players
This experiment shows that the proposed Bayesian approach is able to detect different skill levels among players with different Elo ratings.
The decisions of players of different skill levels have been analyzed. We have used all available 3432 games in which both players had Elo ratings within 10 points of some Elo figure, e.g., games of players rated between 2390 and 2410. Games were grouped according to the Elo rating of the players. Each group contains games between players with a similar Elo rating ($ELO_{min} \le ELO(player) \le ELO_{max}$). The number of games and the number of positions (move-events) that have been included in the datasets of composite reference players are given in Table III.
We have applied the Bayesian inference method described in Section III-C to each dataset of Table III independently. The probability distributions of the parameter c are shown in Fig. 2(a). A summary of these distributions is provided in Table III in terms of the mean, standard deviation and 95%
TABLE III
Chess Games Datasets

Player   ELOmin  ELOmax  Period     Games  N       CRmin  CRmax  c̄      σc      σc·N^(1/2)
Elo2100  2090    2110    1994-1998  213    11727   1.047  1.089  1.068  0.0101  1.091
Elo2200  2190    2210    1971-1998  2771   140390  1.098  1.112  1.105  0.0030  1.117
Elo2300  2290    2310    1971-2005  1064   57627   1.147  1.169  1.158  0.0049  1.171
Elo2400  2390    2410    1971-2006  3055   152589  1.210  1.224  1.217  0.0031  1.219
Elo2500  2490    2510    1995-2006  1646   83748   1.256  1.276  1.266  0.0044  1.266
Elo2600  2590    2610    1995-2006  746    37623   1.309  1.337  1.323  0.0068  1.326
Elo2700  2690    2710    1991-2006  121    9279    1.335  1.391  1.364  0.0140  1.345
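The posterior summaries reported in Table III (c̄, σc, and the CR bounds) come from applying the move-selection probability of (6) inside a Bayesian update. The following Python fragment is a minimal sketch of that idea, not the authors' Java implementation. It makes several assumptions that are not taken from the paper: a fixed grid of candidate c values with a uniform prior (the paper refers to an adaptive algorithm, which is not reproduced here), a central 95% interval as the summary region, and events in which the played move is always among the engine's candidates. The function names and the sample data are hypothetical.

```python
import math


def move_probabilities(values, c, k=0.1):
    """Selection probability of each candidate move under R(c), as in (6).

    values: engine utilities v_i of the candidate moves, in centipawns.
    """
    v_max = max(values)
    weights = [(v_max - v + k) ** (-c) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_over_c(events, grid, k=0.1):
    """Grid posterior of c given events (values, index_of_played_move).

    Assumption: uniform prior over the grid, so the log posterior starts
    from the same constant (zero) at every grid point.
    """
    log_post = [0.0] * len(grid)
    for values, played in events:
        for g, c in enumerate(grid):
            log_post[g] += math.log(move_probabilities(values, c, k)[played])
    peak = max(log_post)  # subtract the peak to avoid underflow
    post = [math.exp(lp - peak) for lp in log_post]
    z = sum(post)
    return [p / z for p in post]


def summarize(grid, post, mass=0.95):
    """Mean, standard deviation and central interval of a grid posterior."""
    mean = sum(c * p for c, p in zip(grid, post))
    std = math.sqrt(sum((c - mean) ** 2 * p for c, p in zip(grid, post)))
    tail = (1.0 - mass) / 2.0
    cum, lo, hi, lo_found = 0.0, grid[0], grid[-1], False
    for c, p in zip(grid, post):
        cum += p
        if not lo_found and cum >= tail:
            lo, lo_found = c, True
        if cum >= 1.0 - tail:
            hi = c
            break
    return mean, std, (lo, hi)


if __name__ == "__main__":
    # Hypothetical data: one position type, best move played 15 times,
    # second-best move played 5 times.
    grid = [0.5 + 0.05 * i for i in range(41)]
    events = [([30, 10, -50], 0)] * 15 + [([30, 10, -50], 1)] * 5
    print(summarize(grid, posterior_over_c(events, grid)))
```

With c = 0 the agent chooses uniformly among the candidates, and larger c concentrates probability on the best move, which is why a player who more often picks high-utility moves ends up with a posterior shifted toward larger c.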
B. Convergence Analysis
In order to check the derivation process of the probability distribution of the apparent skill, we have taken snapshots at different iteration steps (i.e., number of positions). Fig. 3 shows the analysis that has been carried out on the 2400 Elo data. The curves in Fig. 3(a) show the evolution of the probability distribution during the refinement of the Bayesian inference process. The expected value c̄ [Fig. 3(b)] quickly converges and the standard deviation [Fig. 3(c)] decreases as the inference process draws on more data. The asymptotic value of the standard deviation is a measure of the intrinsic variability of the skill level.

C. Skill Difference in Players With Similar Elo Ratings

Fig. 2. Posterior probability distributions of the model R(c) for composite reference Elo players. (a) Competence probability distributions. (b) FIDE Elo rating versus inferred apparent competence (c̄±σ): the linear regression model is y = 1949.53 · x + 32.39.
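The linear model quoted for Fig. 2(b) maps an apparent competence c̄ to an Elo-scale figure. As a rough check, the sketch below fits an ordinary least-squares line to the seven (c̄, Elo) pairs of Table III and also applies the reported coefficients directly. Two points are assumptions rather than statements from the paper: that each composite player is represented by its nominal Elo (2100, ..., 2700), and that an unweighted fit is appropriate. The helper names `fit_line` and `c2elo` are illustrative only.

```python
# (c̄, nominal Elo) pairs read from Table III
C_BAR = [1.068, 1.105, 1.158, 1.217, 1.266, 1.323, 1.364]
ELO = [2100, 2200, 2300, 2400, 2500, 2600, 2700]


def fit_line(xs, ys):
    """Unweighted least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def c2elo(c_bar, slope=1949.53, intercept=32.39):
    """Convert apparent competence to an Elo-scale value using the
    coefficients reported for Fig. 2(b)."""
    return slope * c_bar + intercept


if __name__ == "__main__":
    print(fit_line(C_BAR, ELO))  # compare with the reported 1949.53 and 32.39
    print(c2elo(1.217))          # Elo2400 composite player
```

The refitted coefficients should land close to the reported ones; small differences are expected because the c̄ values in Table III are rounded to three decimals.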
TABLE IV
Analysis of the Opponent Players in the Dataset E2400
Fig. 3. Convergence evolution for the dataset 2400 Elo. (a) Probability distribution at different iteration steps. (b) Expected value. (c) Standard deviation.

We have applied the Bayesian inference process to each set of events $S_{r,i}$ and computed c̄ for each of them.
In this case, the apparent skill c̄ measures the quality of moves played by a single player during a single game, with a consequent expected high uncertainty because of the limited amount of data on which the inference is carried out.
We have computed first order statistics of the apparent skill c̄ over the three sets $S_0$, $S_1$ and $S_{1/2}$. In this test, we have used 602 games of the dataset E2400, 313 of which were a win and 289 a draw ($n_0 = n_1 = 313$ and $n_{1/2} = 578$). The average apparent skill $\mu_{\bar{c}}$ over all 1204 $S_{r,i}$, regardless of the result r, is 1.2109. The results over each set $S_0$, $S_1$, and $S_{1/2}$ are shown in Table IV.
In spite of the small number of events in a single game, the Bayesian approach is able to detect a meaningful difference

VII. Applications
In this section, we demonstrate some interesting applications of the proposed method in the domain of chess.
In the first example, the method is used to generate the skill profile of players over several decades, even before the official adoption of the Elo rating system. We compare the profile of a top player with the official FIDE Elo ratings and with the ratings generated by Chessmetrics [37]. FIDE have published players’ Elo ratings every three months since 1970. Chessmetrics has been chosen for comparison because it is an attempt to improve the accuracy of the statistical inference method of the Elo system. It has also been applied to game data before the FIDE adoption of the Elo system. Both Elo and Chessmetrics use only the results of games (paired comparisons) to infer players’ strength.
We have also generated a historical comparison of a few top players’ profiles. This scenario is used to carry out an experimental analysis of the sensitivity to the prior probability.
In the second example, we present a chart that is suitable to visualize the within-game skills of players and their opponents. In particular, we have generated this chart to analyze an aspect of the famous and controversial [38] final stage of the 1948 World Championship.
Finally, in the third example, we show how to use ratings and profiles generated by the proposed method to analyze the performance of players accused of cheating.

A. Profiling Human Skills Over Time
We have selected all the publicly available games of a few top players to generate and compare their skill profiles over many years. The apparent competence c̄ has been converted into an equivalent Elo rating (c2ELO) by using the regression model obtained in Section VI-A and Fig. 2(b).
First, we have generated the skill profile of Viktor Lvovich Korchnoi (1931-) to compare the proposed method to other chess rating methods. Korchnoi is currently the oldest active
Fig. 7. Within-game skill chart: Keres’ and opponent’s competence (c̄) over
single games (World Championship 1948).
Fig. 8. D. P. Singh’s (games from Oct. 2005 to Oct. 2008). (a) Within-game skill chart: Singh’s and opponent’s competence (c̄). (b) Probability density function of the competence before and after allegations.

those four games Keres’ competence c is below 1.1; in all other games it is above 1.1. The chart shows that in those four games Keres clearly performed worse than in any other game of the competition. Whether this was intended or not is, of course, out of the scope of the analysis. Nevertheless, such a tool may be used to support or to reject hypotheses and to motivate further analysis and, eventually, investigations.

C. Alleged Chess Cyborgs
There have been quite a number of players suspected of fraudulently receiving computer advice during play (e.g., [39]).
D.P. Singh’s play came under suspicion in the second half of 2006 [40]. We have analyzed all available games from Oct. 2005 to Oct. 2008 (Fig. 8) over the entire period and over the two periods before and after the allegations.
Fig. 8(a) shows the within-game chart for all games over the entire period. In some games, he played at an exceptionally high skill level.
The two apparent competence profiles before and after the allegations are shown in Fig. 8(b). These profiles are well separated and indicate a drop of skill level after the allegations were made.
These examples suggest that the proposed method could be useful to create real-time skill monitoring applications to

IX. Conclusion
The problem of modeling decision makers’ skill was investigated. The proposed approach, Skilloscopy, was based on the definition of a general stochastic model and a Bayesian inference method. It did not assume any domain-specific model of the decision making process.
The approach was demonstrated in the game domain of chess. The experimental analysis has shown the viability of rating players’ skill by benchmarking against chess engines. The statistical inference was based on the quality of decisions, rather than on paired comparisons (game outcomes) as in previous approaches.
The method was successfully applied to a large set of chess game data and validated with the FIDE Elo ratings. The experimental analysis provided evidence of the accuracy of the method in estimating the skill level of players regardless of the outcome of the games and of the opponent rating.
Further work will address the generalization of the method to multidimensional skills and the influence of approximations of the utility function.
In principle, the proposed method can be effectively adopted in similar domains, where an accurate method to determine utility values is available. It can be used, for example, to analyze in real-time the likely abilities of students and skilled workers in defined-process scenarios.

References
[1] O. Haavisto and A. Remes, “Data-based skill evaluation of human operators in process industry,” in Proc. Int. Conf. Control Automat. Syst., Oct. 2010, pp. 707–712.
[2] M. Aizuddin, N. Oshima, R. Midorikawa, and A. Takanishi, “Development of sensor system for effective evaluation of surgical skill,” in Proc. First IEEE/RAS-EMBS Int. Conf. Biomed. Robotics Biomechatron., Feb. 2006, pp. 678–683.
[3] E. Lorias, M. Minor, S. Ortiz, P. Olivares, and J. Gnecchi, “Computer system for the evaluation of laparoscopic skills,” in Proc. Electron. Robotics Automot. Mech. Conf., Sep.–Oct. 2010, pp. 19–22.
[4] K. Yacef and L. Alem, “Evaluation of learner’s skills in the context of dynamic and complex systems,” in Proc. IEEE Int. Conf. Syst. Man Cybern., vol. 3. Beijing, China, Oct. 1996, pp. 2200–2204.
[5] A. Koivo and D. Repperger, “Skill evaluation of human operators,” in Proc. IEEE Int. Conf. Syst. Man Cybern. Comput. Cybern. Simul., vol. 3. Oct. 1997, pp. 2103–2108.
[6] R. Hartley and G. Varley, “The design and evaluation of simulations for the development of complex decision-making skills,” in Proc. IEEE Int. Conf. Adv. Learn. Technol., 2001, pp. 145–148.
[7] G. M. Goh, C. Quek, and D. Maskell, “Epilist II: Closing the loop in the development of generic cognitive skills,” IEEE Trans. Syst. Man Cybern., A, Syst. Hum., vol. 40, no. 4, pp. 676–685, Jul. 2010.
[8] K. Watanabe and M. Hokari, “Kinematical analysis and measurement of sports form,” IEEE Trans. Syst. Man Cybern., A, Syst. Hum., vol. 36, no. 3, pp. 549–557, May 2006.
[9] R. Herbrich, T. Minka, and T. Graepel, “TrueSkill™: A Bayesian skill rating system,” in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 569–576.
[10] A. Elo, The Rating of Chessplayers, Past and Present. New York, NY, USA: Arco, 1978.
[11] D. North, “A tutorial introduction to decision theory,” IEEE Trans. Syst. Sci. Cybern., vol. 4, no. 3, pp. 200–210, Sep. 1968.
[12] H. Raiffa, Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, MA: Addison-Wesley, republished by McGraw-Hill, 1968.
[13] W. Edwards, R. F. Miles, Jr., and D. von Winterfeldt, Eds., Advances in Decision Analysis From Foundations to Applications. Cambridge, U.K.: Cambridge Univ. Press, 2007.
[14] S. Dillon, “Descriptive decision making: Comparing theory with practice,” in Proc. 33rd Ann. Oper. Res. Soc. New Zealand Conf., Aug. 1998, pp. 99–108.
[15] A. De Groot, Het denken van den schaker. Amsterdam, The Netherlands: Noord Hollandsche, 1946.
[16] A. De Groot, Thought and Choice in Chess, 2nd ed. (Revised translation of De Groot, 1946). Hague, The Netherlands: Mouton Publishers, 1978.
[17] F. Gobet, “Chess players’ thinking revisited,” Swiss J. Psychol., vol. 57, no. 1, pp. 18–32, 1998.
[18] F. Gobet and N. Charness, Expertise in Chess, Chess and Games. Cambridge Handbook on Expertise and Expert Performance. Cambridge, U.K.: Cambridge Univ. Press, 2006.
[19] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd ed. Berlin, Germany: Springer-Verlag, 1985.
[20] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 2nd ed. London, U.K.: Chapman and Hall/CRC, 2003.
[21] P. M. Lee, Bayesian Statistics: An Introduction, 3rd ed. New York, NY, USA: Wiley, 2004.
[22] R. Coulom, “Whole-History Rating: A Bayesian rating system for players of time-varying strength,” in Proc. Conf. Comp. Games, Beijing, China, 2008, pp. 113–124.
[23] T. Minka, “A family of algorithms for approximate Bayesian inference,” Ph.D. dissertation, Massachusetts Ins. Technol., Cambridge, MA, USA, 2001.
[24] J. K. Kruschke, Doing Bayesian Data Analysis: A Tutorial Introduction with R and BUGS. Amsterdam, The Netherlands: Academic/Elsevier, 2011.
[25] R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs–Part I: The method of paired comparisons,” Biometrika, vol. 39, no. 3–4, pp. 324–345, 1952.
[26] M. E. Glickman, “A comprehensive guide to chess ratings,” Am. Chess J., vol. 3, pp. 59–102, 1995.
[27] B. West. (2006). A simple and flexible rating method for predicting success in the NCAA basketball tournament. J. Quantitative Anal. Sports [Online]. 2(3), article 3. Available: http://www.degruyter.com/view/j/jqas.2006.2.3/jqas.2006.2.3.1039/jqas.2006.2.3.1039.xml?format=INT
[28] The Swedish Chess Computer Association. (2013). The SSDF Rating List [Online]. Available: http://ssdf.bosjo.net/list.htm
[29] M. E. Glickman, “Parameter estimation in large dynamic paired comparison experiments,” J. Royal Stat. Soc. Ser. C (Appl. Stat.), vol. 48, no. 4, pp. 377–394, 1999.
[30] J. Beasley, The Mathematics of Games. New York, NY, USA: Dover, 2006.
[31] P. Dangauthier, R. Herbrich, T. Minka, and T. Graepel, “TrueSkill through time: Revisiting the history of chess,” in Proc. NIPS, 2008, pp. 931–938.
[32] M. Guid and I. Bratko, “Computer analysis of world chess champions,” ICGA J., vol. 29, no. 2, pp. 65–73, Nov. 2006.
[33] C. Sullivan. (2008). Comparison of Great Players [Online]. Available: http://www.truechess.com/web/champs.html
[34] ChessBase GMBH. (2013). Chessbase Player Database [Online]. Available: http://www.chessbase.com
[35] T. Gaksch. (2013). Toga II 1.3.1 Chess Engine [Online]. Available: http://www.superchessengine.com/toga ii.htm
[36] S. Meyer-Kahlen. (2013). Definition of the Universal Chess Interface [Online]. Available: http://wbec-ridderkerk.nl/html/UCIProtocol.html
[37] J. Sonas. (2013). Chessmetrics [Online]. Available: http://www.chessmetrics.com
[38] C. C. Moul and J. V. Nye, “Did the Soviets Collude? A statistical analysis of Championship Chess 1940-78,” J. Econ. Behav. Organ., vol. 70, no. 1–2, pp. 10–21, 2009.
[39] F. Friedel, “Cheating in chess,” in Advances in Computer Games 9, H. J. van den Herik and B. Monien, Eds. Maastricht, The Netherlands: IKAT, 2001, pp. 327–346.
[40] Chessbase News. (2007). D.P. Singh: Supreme Talent or Flawed Genius? [Online]. Available: http://www.chessbase.com/newsdetail.asp?newsid=3595
[41] G. McC. Haworth, “Reference fallible endgame play,” ICGA J., vol. 26, no. 2, pp. 81–91, Jun. 2002.
[42] G. McC. Haworth, “Gentlemen, stop your engines!” ICGA J., vol. 30, no. 3, pp. 150–156, Sep. 2007.
[43] G. Di Fatta, G. Haworth, and K. Regan, “Skill rating by Bayesian inference,” in Proc. IEEE Symp. Comput. Intell. Data Mining, Apr. 2009, pp. 89–94.

Giuseppe Di Fatta (M’02) received the “Laurea” degree (M.Eng.) in electronics engineering and the Ph.D. degree in computer science from the University of Palermo, Palermo, Italy, in 1995 and 2002, respectively.
He is an Associate Professor of computer science at the University of Reading, Reading, U.K., where he has been since 2006. In 1999, he was a Research Fellow at the International Computer Science Institute (ICSI), Berkeley, CA, USA. From 2000 to 2004, he was with the High Performance Computing and Networking Institute, National Research Council, Italy. From 2004 to 2006, he was with the University of Konstanz, Konstanz, Germany. He has published over 60 papers in peer-reviewed conferences and journals. His current research interests include data mining, scalable algorithms, distributed and parallel computing, and multidisciplinary applications.
Dr. Di Fatta has organized and chaired international workshops and conferences in data mining, distributed systems, and computer networks.

Guy Haworth received the M.A. degree in mathematics from Oxford University, Oxford, U.K., in 1971, the Diploma degree in computer science from Cambridge University, Cambridge, U.K., in 1969, and researched the issues of performance, parallelism, and integrity of real-time systems at Cambridge until 1972.
He has been a Lecturer at the University of Reading, Reading, U.K., since 2003. In 30 years in the industry, his roles ranged from product development, sales and marketing to customer service and consultancy, mainly for International Computers Limited, Reading, U.K. He has published over 60 papers. His current research interests include the application of systems theory, frameworks, and the soft systems methodology.