FATA-Skilloscopy_Bayesian_Modeling_of_Decision_Makers_Skill

The paper introduces Skilloscopy, a Bayesian modeling approach for assessing decision makers' skills across various domains, particularly in chess. It aims to rate and rank decision makers based on their competence in making decisions rather than analyzing the decision-making process itself. The method utilizes Bayesian inference to derive a probability distribution of a decision maker's skill from prior beliefs and empirical evidence, allowing for effective comparison and evaluation of decision-making abilities.


1290 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS, VOL. 43, NO. 6, NOVEMBER 2013

Skilloscopy: Bayesian Modeling of Decision Makers’ Skill
Giuseppe Di Fatta, Member, IEEE, and Guy Haworth

Abstract—This paper proposes and demonstrates Skilloscopy as an approach to the assessment of decision makers. In an increasingly sophisticated, connected, and information-rich world, decision making is becoming both more important and more difficult. At the same time, modeling decision making on computers is becoming more feasible and of interest, partly because the information input to those decisions is increasingly on record. The aims of Skilloscopy are to rate and rank decision makers in a domain relative to each other; these aims do not include an analysis of why a decision is wrong or suboptimal, nor the modeling of the underlying cognitive process of making the decisions. In the proposed method, a decision maker is characterized by a probability distribution of its competence in choosing among quantifiable alternatives. This probability distribution is derived by classic Bayesian inference from a combination of prior belief and evidence of the decisions. Thus, decision makers’ skills may be better compared, rated, and ranked. The proposed method is applied and evaluated in the game domain of chess. A large set of games by players across a broad range of the World Chess Federation (FIDE) Elo ratings has been used to infer the distribution of players’ rating directly from the moves they play rather than from game outcomes. Demonstration applications address questions frequently asked by the chess community regarding the stability of the Elo rating scale, the comparison of players of different eras and/or leagues, and controversial incidents possibly involving fraud. The method of Skilloscopy may be applied in any decision domain where the value of the decision options can be quantified.

Index Terms—Bayesian inference, decision making, skill evaluation.

Manuscript received September 15, 2011; revised September 19, 2012; accepted February 6, 2013. Date of publication August 7, 2013; date of current version October 14, 2013. This paper was recommended by Associate Editor E. P. Blasch.
The authors are with the School of Systems Engineering, The University of Reading, Whiteknights, Reading, Berkshire RG6 6AX, U.K. (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSMC.2013.2252893
2168-2216 © 2013 IEEE
Authorized licensed use limited to: Imperial College London. Downloaded on September 30, 2022 at 09:09:23 UTC from IEEE Xplore. Restrictions apply.

I. Introduction

A pilot attempts to land in marginal conditions. Multiple agencies work furiously on a major emergency. A student progresses his learning with less than total awareness, motivation, or organization. A golfer performs a 3-D rotation and translation in a powerful yet precise driver swing.

In problems of complex decision making, the combined pressure of events, real time, partial information, problem complexity, and limitations on human (and computer) resources may cause the human component to take suboptimal decisions, short of the utopian agent in a how-to manual. To model such systems effectively, it is necessary to model and to measure decision makers’ skill.

The importance of skill evaluation and modeling of decision makers in motor and cognitive activities has been recognized in a broad range of application domains: mineral processing plant [1], surgery [2], [3], air traffic controllers [4], robotic arms [5], senior police officers [6], intelligent tutoring systems and adaptive learning platforms [7], sports [8], and games [9].

Bayesian inference has been adopted successfully in many applications for model selection and decision making. Typically, these methods have been applied to determine models that explain the empirical evidence of data representing best solutions to problems or data generated by complex systems.

This paper proposes the application of the classic Bayes’ rule to determine parametric models of decision makers’ skill. Given a history of decisions in a given problem domain, can we generate a model of decision makers’ skill? How close is the decision making to optimal? And more specifically, how can their competence level be rated and ranked? Finally, how can we assess, monitor, and compare decision makers’ skills?

Decision makers exhibit different levels of expertise and competence, which can be represented by a skill level. Probability density functions can encode uncertainties due to the lack of empirical evidence and the intrinsic variability of human factors. In this paper, we combine prior beliefs and empirical evidence of the quality of the decisions made by decision makers using a Bayesian inference process to generate a simple probabilistic model of their skill.

The only assumption underlying Skilloscopy, our proposed approach, is the availability of a utility function, either exact or heuristic, that provides a numerical estimation of the utility of the alternative choices. The utility function is not required to be available at the time of the decision, nor available to the decision maker. It serves as a benchmark system of the decisions in their context to support the assessment process. For example, this may be the case in training and monitoring human operators, in games and sports, or in cases where the utility function is computationally prohibitive for a real-time scenario. If a model and a software simulator are available, a brute-force approach can provide a utility function by exploring all alternatives and evaluating their effects on the overall goal. If no explicit model of the system is available, the utility function could just be the off-line assessment by an expert.

The proposed method is demonstrated and evaluated in the game domain of chess. Chess has always been a favorite
demonstration domain in the fields of cognitive psychology and artificial intelligence, as it is a well-documented, familiar, large, and complex model domain, many of whose aspects are subject to quantification. An important contribution to the study of human skills from the chess domain is the Elo rating system [10]. It was originally devised for chess and adopted by the United States Chess Federation in 1960 and by the FIDE in 1970. The Elo rating system determines the relative strength (rating) of players by an iterative inference process based on the outcome of the games.

While, in general, it may be difficult to assess the accuracy of models of decision makers in many domains, models of chess players’ skill can be compared with the Elo ratings.

This paper proposes a data mining approach to model decision makers’ skill and, more specifically, its contributions are summarized as follows.
1) It defines a generic domain-independent model of stochastic decision-making agents.
2) It applies a Bayesian inference methodology by means of an efficient adaptive algorithm to generate probabilistic models of decision makers’ skill.
3) It demonstrates the use of the method in the game domain of chess to determine players’ skills from the quality of decisions (moves) rather than from game outcomes.
4) It demonstrates applications of the method to address a number of questions asked in the chess domain.

Although the proposed method is experimentally tested on the game domain of chess, it is not based on any specific model of the decision making process of chess players. How a player chooses a particular move among alternative choices (variants) is not considered or modeled. The method can be easily adopted in rating skilled behavior and general types of expertise in other domains. The method infers skills directly from the innate quality of the decisions and independently of the competitive nature of the activity. For example, this method could be effectively adopted to monitor and evaluate training and education activities.

The rest of this paper is organized as follows. Section II reviews the general approaches in Decision Theory. Section III introduces the general problem of modeling decision makers’ skill and the proposed Bayesian inference method. Section IV discusses related work in skill rating, in general, and in chess, in particular. Section V presents the application of the proposed method to generate chess players’ ratings. An experimental analysis of the method and innovative applications in the domain of chess are presented, respectively, in Sections VI and VII. Section VIII reviews related work. Finally, Section IX summarizes the paper and indicates some future research directions.

II. Modeling Decision Making

Decision making under uncertainty and, more generally, Decision Theory, has been studied for a long time [11] with contributions from several academic disciplines including statistics, economics, psychology, philosophy, and political and social science. Decision Theory is concerned with identifying utility values and the uncertainty of decisions. It is typically distinguished into descriptive, prescriptive, and normative.

Normative Decision Theory is concerned with identifying optimal decisions as taken by an ideal decision maker, who is rational, fully informed, and able to compute with perfect accuracy. Prescriptive Decision Theory is concerned with what people should and can do. Decision analysis [12] (for a more recent survey, see [13]) is the practical application of Prescriptive Decision Theory, and is aimed at finding tools and methodologies that can support people to make better decisions.

In contrast, Descriptive Decision Theory [14] is concerned with describing what people actually do. Descriptive models have been developed with the aim of capturing the underlying processes that guide human choice behavior under uncertainty. However, the effort in understanding and describing the actual process of and reasons for decision making has not been supported by empirical studies; the development of a Descriptive Decision Theory itself has been questioned as nonachievable [14].

In a long tradition, cognitive psychology and artificial intelligence have also tried to define and explain skilled behavior, such as human expertise, in problem solving and decision making. The game of chess has always been a favorite demonstration domain. Chess players’ thinking [15], [16] has been studied for a long time and two main models have been provided. One mechanism is based on pattern recognition to access a knowledge database. The second approach is a search strategy through the problem space. The relative importance given to knowledge and quantitative search varies in the proposed theories of skilled behavior in chess [17], [18].

This paper takes a descriptive data mining approach to the problem. Rather than building an explicit model of the decision making process, we propose to build a probabilistic model of the decision maker’s skill, without any attempt to understand or to model the underlying decision process.

III. A Model of Decision Makers’ Skill

In problems of a strategic character, human beings exhibit different levels of expertise and competence in making decisions that have a direct or indirect effect on the achievement of an overall goal. In general, we expect that the competence of decision makers is the cumulative result of abilities, training and experience, typically referred to as “skill.”

In general, a model of decision makers’ skill may be defined by a set of parameters, i.e., multidimensional skills. Without loss of generality, in this paper, we limit our analysis to the simple model of a single parameter $c \in \mathbb{R}$. We assume that decision makers can be modeled by a numeric skill level c, which indicates their competence in solving a particular class of problems. The advantage of defining models with a single parameter is that the skill level can be directly used for ranking and rating. However, a multidimensional skill model may be more suitable to capture specific aspects in more complex domains.

Even though the average skill level of a given decision maker is expected to vary smoothly over time, decision
making activities can be appropriately described as stochastic processes, where decisions are generated by an agent with an apparent competence c. We introduce the concept of a stochastic reference agent R(c). The agent R(c), when presented with a problem with a set of alternatives, chooses by means of a stochastic process biased by its competence level.

Skilloscopy imagines that the decisions have been taken by one of a set of these stochastic agents of defined reference behavior and skill. Skilloscopy asks the following question:

    On the assumption that the decisions observed have been made by one of the reference agents available, what is the probability associated with the hypothesis that a particular agent made the decision?

This is a classic inverse problem that, given a set of prior beliefs, can be solved precisely by the process of Bayesian inference.

In this section a general model of decision makers’ skill is introduced and a Bayesian inference method presented. The Bayesian inference of the model of decision makers’ skill is defined by five components:
1) a class of decision making problems;
2) a utility function for the problem class;
3) a reference agent with a decision likelihood function;
4) a prior probability of the competence;
5) empirical/historical data, i.e., a set of problems and associated decisions of a decision maker.

Given a problem class Q, a decision making problem $q \in Q$ is associated with a set of alternatives $A_q = \{a\}$. A utility function u(q, a) assigns numerical values to each feasible alternative a of the problem q.

The model is defined by the likelihood that the alternative a is chosen by a stochastic reference agent with given competence c. The likelihood depends on the particular problem q and, more specifically, on the relative utility of the alternatives $A_q$.

The Bayesian method is adopted to determine the model that best explains the evidence of a given decision maker’s past decisions. The Bayesian inference method requires an initial prior probability, which is used to incorporate any knowledge about the probability of the model before any evidence is considered. Where no prior knowledge is available, a uniform distribution may be used.

Table I summarizes the general notation used in this section.

TABLE I
General Notation

Notation   Description
Q          problem class
q          problem instance
$A_q$      set of alternatives for problem q
a          alternative
c          competence, skill level
u()        utility function
L()        likelihood function
p()        probability function
$P_0$      prior probability constant
R(c)       stochastic reference agent with competence c
S          set of events ⟨q, a⟩

A. Utility Function

The utility function provides a measure of the quality of the decisions, i.e., a benchmark system of the available alternatives.

The skill of a decision maker measures the degree of agreement of their decisions with the benchmark system. Given a decision problem q with a finite set of alternatives $A_q$, the utility function u(q, a) returns their utility values in an arbitrary range and some units. Where context allows, we will consider the problem q and the set of alternatives $A_q$ implicit to simplify the notation. The utility function expresses preferences in outcomes: $u(a_1) > u(a_2)$ if $a_1$ leads or is expected to lead to a better outcome than $a_2$.

The utility function is domain specific and its availability is the only fundamental assumption in the proposed approach. The utility function is necessary to carry out the assessment of decision makers, not to perform decision-making activities. The utility function may be exact or heuristic, relative or absolute, based on a human expert, a system model or a brute-force search within a system simulator. In case the utility function is heuristic, approximations obviously introduce errors and uncertainty. The influence and sensitivity of the method to the approximation of the utility function need to be properly addressed. However, this is beyond the scope of this presentation and the subject of future work.

B. Reference Agent and Decision Likelihood

The stochastic reference agent is a generic synthetic decision maker. Given a set of alternatives and their utility values, the stochastic behavior of the reference agent R(c) is defined by the likelihood that an alternative is chosen, given its competence c.

Agents are not meant to model human decision makers. They are used to define a parametric skill model in adhering to a benchmark system.

The likelihood function L(·) is defined in terms of the utility function u(·). The likelihood function $L(a \mid c)$ provides the likelihood of an alternative a being chosen by the stochastic reference agent R(c).

Stochastic agents R(c) should cover the entire skill range of decision makers ($c \in [0, \infty]$). The agent with zero apparent competence (c = 0) corresponds to making random choices. Greater c values correspond to better competence. The ideal decision maker R(∞) always makes an optimal decision as defined by u(·).

The likelihood of an agent choosing an alternative a is always greater than zero, regardless of its utility value. The likelihood is a monotonically nondecreasing function of the utility ($dL/du \ge 0$) and is convex ($d^2L/du^2 \ge 0$).

The requirements for the function L are summarized as follows:
1) $L(a \mid c)$ is finite and positive for all alternatives $a \in A_q$ and competence parameter c;
2) for c = 0, L(·) is independent of u(·), i.e., all alternatives are equally likely to be chosen (zero competence);
3) as the competence parameter c increases and given $u(a_i) > u(a_j)$, $L(a_i \mid c)/L(a_j \mid c)$ increases;
4) as $c \to \infty$, $L(a_i \mid c)/L(a_j \mid c) \to \infty$, i.e., less attractive options can be made arbitrarily unlikely.

Given the above requirements, we choose to define the likelihood of an alternative a being chosen by a stochastic reference agent with given competence c as

\[ p(a \mid c) \propto L(a \mid c) = \left(u(a^{*}) - u(a) + k\right)^{-c} \tag{1} \]

where $a^{*} = \arg\max_{a_j \in A_q} u(a_j)$ and k is an arbitrarily small constant ($k \in\, ]0, 1]$), which ensures that L(·) is finite. The constant k should be small w.r.t. the typical utility values in the specific domain.

The conditional probability $p(a \mid c)$ of the agent R(c) selecting the alternative a, given the competence parameter c, is simply given by normalizing the likelihood in (1), as follows:

\[ p(a \mid c) = \frac{L(a \mid c)}{\sum_{a_j \in A_q} L(a_j \mid c)}. \tag{2} \]

The likelihood in formula (1) is a function of the competence c and the difference ($\Delta u(a) = u(a^{*}) - u(a)$) between the utility value of the alternative a and the value of the optimal alternative according to the benchmark system. $\Delta u(a)$ is non-negative by definition. Fig. 1 shows an example of the likelihood as a function of the competence c for a set of 8 alternatives, whose corresponding $\Delta u(a)$ values are {0.0, 0.25, 0.5, 0.9, 1.5, 2.0, 3.0, 12.0}. Given a value $\Delta u(a)$, agents with greater competence are more likely to identify better decisions. Given a competence value c > 0, worse decisions are less likely to be chosen than better ones.

Fig. 1. Likelihood function for various $\Delta u$ values (k = 0.1).

The formula (1) is not the only function that can comply with the requirements discussed above. It has been chosen for its simplicity and generality, i.e., it does not depend on a specific application domain. Further investigations may be devoted to the experimental analysis of the effect of different likelihood functions in specific application domains.

In the next section, we describe the Bayesian inference method to determine a probability distribution of the parameter c for decision makers from the evidence of their past decisions.

C. Inference of the Parametric Model

Bayesian inference [19]–[21] is an iterative process in which evidence modifies an initial probability distribution of belief. In each iteration, the initial distribution is the prior probability, whereas the modified belief is the posterior probability.

Let us consider the event ⟨q, a⟩, where a is an alternative chosen for the problem q ($a \in A_q$). In the following, the problem q is considered implicit to simplify the notation. The posterior probability of the parameter c, given the evidence of the choice a, depends on the a priori probability p(c) and the conditional probability of the evidence a given the competence parameter c, as stated by Bayes’ theorem

\[ p(c \mid a) = \frac{p(c) \cdot p(a \mid c)}{\sum_{c} p(c) \cdot p(a \mid c)}. \tag{3} \]

Let us consider a set of N events $S = \{\langle q_i, a_i \rangle\}$, where $i \in [1, N]$. We iteratively apply the Bayesian rule in (3) to the set of events in S, where the a priori probability at step i (i > 1) is the posterior probability at step i − 1

\[ p(c \mid a_i) = \frac{p(c \mid a_{i-1}) \cdot p(a_i \mid c)}{\sum_{c} p(c \mid a_{i-1}) \cdot p(a_i \mid c)} \tag{4} \]

when considering the sequence of events $\langle q_{i-1}, a_{i-1} \rangle$ and $\langle q_i, a_i \rangle$.

In the experimental analysis, we have set the initial a priori probability p(c) to be a know-nothing uniform probability $P_0$, if not otherwise stated.

For a given set of events associated with a decision maker, the inference process produces an a posteriori probability distribution of the model parameter c. The expected value $\bar{c}$ is, by definition, the average apparent competence of the decision maker and can be used to generate a skill rating system. The variance of the probability distribution provides a measure of the uncertainty of the rating, similarly to [9] and [22].

The uncertainty of the competence can be associated with the cumulative effect of several causes, including the uncertainty of prior belief, the intrinsic variability of human factors and the limited amount of empirical evidence. According to the central limit theorem, the standard deviation should be inversely proportional to the square root of the sample size (empirical evidence), i.e., the variance inversely proportional to the sample size. However, for a very large sample size we expect to find a nonzero asymptotic minimum of the variance due to the intrinsic variability specific to each human decision maker.

In general, practical applications of Bayesian inference involve the subjective choice of prior probabilities and are limited by the computational complexity of numerical methods for the integration of the posterior probabilities. In the experimental analysis, we investigate the effect of different prior probability distributions.

1) Adaptive Algorithm: Efficient numerical methods can be used to approximate the posterior distribution arising in Bayesian inference, especially for high-dimensional functions. Methods based on Markov Chain Monte Carlo (MCMC) and Gibbs sampling have been proposed [23] and are available [24]. In our case, we have adopted a simple efficient algorithm, which is briefly discussed here.
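As a baseline for the numerical treatment, equations (1)–(4) can be evaluated on a fixed, uniform grid of competence values. The following Python sketch is illustrative only and is not the authors' implementation: the function names, the grid [0, 5] with step 0.1, and the sequence of chosen alternatives are assumptions made for the example, while the Δu values and k = 0.1 are those quoted for Fig. 1.

```python
K = 0.1  # small constant k of (1); 0.1 is the value quoted for Fig. 1


def likelihood(delta_u, c, k=K):
    """Unnormalized likelihood (1): L(a|c) = (delta_u + k)^(-c)."""
    return (delta_u + k) ** (-c)


def choice_probabilities(delta_us, c, k=K):
    """Normalized probabilities (2) over all alternatives of one problem."""
    weights = [likelihood(d, c, k) for d in delta_us]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_update(prior, grid, delta_us, chosen, k=K):
    """One step of (3)/(4): reweight p(c) by p(a|c) of the chosen alternative."""
    unnorm = [p * choice_probabilities(delta_us, c, k)[chosen]
              for p, c in zip(prior, grid)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def mean_and_variance(posterior, grid):
    """Expected competence and variance of the discrete posterior."""
    mean = sum(p * c for p, c in zip(posterior, grid))
    var = sum(p * (c - mean) ** 2 for p, c in zip(posterior, grid))
    return mean, var


# Delta-u values of the eight alternatives quoted for Fig. 1.
DELTA_US = [0.0, 0.25, 0.5, 0.9, 1.5, 2.0, 3.0, 12.0]

# Assumed fixed grid c in [0, 5] with step 0.1 and a uniform prior.
grid = [i * 0.1 for i in range(51)]
posterior = [1.0 / len(grid)] * len(grid)

# Hypothetical evidence: indices of the alternatives one decision maker chose.
for chosen in [0, 0, 1, 0, 2, 0, 1, 0]:
    posterior = bayes_update(posterior, grid, DELTA_US, chosen)

c_mean, c_var = mean_and_variance(posterior, grid)
print(f"mean competence = {c_mean:.2f}, variance = {c_var:.3f}")
```

For c = 0 every alternative receives probability 1/8, matching requirement 2); for c > 0 the probabilities decrease with Δu. A fixed grid wastes resolution wherever the posterior is negligible, which is the inefficiency an adaptive range is meant to avoid.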


The parameter c is notionally in [0, ∞] but is initialized in practical computations as in $[c_{\min}, c_{\max}]$, where $c_{\min} \ge 0$. We have used an adaptive detection of the range of c for a more efficient computation of the probability distribution. The parameters $c_{\min}$, $c_{\max}$, and $\delta_c$ define a finite set of discrete values of the parameter c

\[ c_i = c_{\min} + i \cdot \delta_c \tag{5} \]

where $0 \le i \le (c_{\max} - c_{\min})/\delta_c$.

The three parameters are adjusted during execution to allow a better resolution of the distribution of c. The iterative process starts from a wide range $[c_{\min}, c_{\max}]$ with a coarse precision ($\delta_c = 0.1$). At each iteration step, the range is narrowed and the precision increased ($\delta_c$ is decreased). This results in a more efficient computation in terms of runtime and memory requirements.

IV. Skill Rating

Many rating systems in games and sports are based on the Bradley–Terry model for paired comparisons [25]. The assumptions of such rating systems are that the strength of a player can be described by a single value (rating) and that expected game results depend only on the difference between the ratings of the two players.

Ratings based on pairwise comparisons perform an indirect inference of the skill level by means of the outcomes of competitive activities involving two or more individuals. Such rating systems are intrinsically relative. Most rating systems fall into this category, including the most prevalent, the Elo system [10].

In comparison, Skilloscopy enables the direct measurement of skill and could be applied, more generally, in noncompetitive domains of complex decision making, where professional standards must be maintained despite the pressures of events, time constraints, partial information, problem complexity, and ability.

A. Skill Rating in Chess

The Elo system [10], [26], perhaps the best known rating system, was originally created for chess and later adapted to other games, video games and sports. For example, the U.S. Table Tennis Association has adopted it to rate players [26]. The National Collegiate Athletic Association (NCAA) uses several rating systems, including Jeff Sagarin’s computer ratings [27]. Sagarin’s ratings are used for NCAA basketball teams and in the calculation of the Bowl Championship Series (BCS) computer ratings in college football. Sagarin’s overall ratings are based on two different ratings, one of which is a modification of the Chess Elo rating system.

Within a pool of players, Elo differences are meaningful, but Elos from different pools of players are not comparable as Elos have no absolute meaning. They are also affected by the Elos being imported (exported) to (from) the pool by players entering (leaving). The Elo scales for human players and for computers [28] are said to have been affected by both deflationary and inflationary forces [29]. It is possible for the same ELO figure to be attained in a different era by a player who in fact, in absolute terms, plays worse (inflation) or better (deflation).

Elo ratings are determined from the results of games and not by the innate quality of the moves played: they, therefore, measure competitive performance rather than underlying skill. There have been criticisms of the Elo approach [30] and improvements [9], [22], [29], [31] have been proposed. However, they are still based on the outcomes of paired comparisons and affected over time by the changing player population. As such, these approaches cannot accurately determine the following:
1) inflationary trends in ratings, i.e., changing the quality of play at a specific Elo figure;
2) the relative skill of players in a specific part of the game, e.g., the opening or endgame;
3) the relative skill of contemporary players in different leagues and/or of different eras;
4) whether a match or game was won by good play or lost by bad play, or
5) whether a player is playing abnormally well and perhaps cheating.

In contrast, a few systems have been proposed to assess, rank and rate absolute chess skill. Guid and Bratko [32] and Sullivan [33] use the error of move decisions to calculate mean error, but they do not use the full move context of the decisions, nor do they use a Bayesian approach.

V. Bayesian Inference of Chess Players’ Rating

The application of the proposed method to the game domain of chess and, in particular, to the problem of chess players’ rating is presented. Chess players, at their turn of play, face decision making problems over the set of legal moves for the given position on the board. Their decisions may often be seen as suboptimal, if compared to a good benchmark system, e.g., the analysis of alternatives provided by a better player.

In the remainder, we show how the general approach of Section III can be specialized for the domain of chess. Table II summarizes some additional notation used in this section.

TABLE II
Additional Notation for Chess Domain

Notation     Description
CE           Chess Engine, a computer programme
W            white
B            black
d            number of plies, a Chess Engine parameter
$n_{mv}$     number of move variants, a Chess Engine parameter
m            Chess move
v            utility values associated to a Chess move
q            a problem, a Chess board position
$M_q$        set of candidate moves in a given Chess position q
$V_q$        set of pairs ⟨m, v⟩, Chess move and its utility value
N            number of board positions (problems)
G            set of Chess games
$S_{0,i}$    set of events associated with a lost game
$S_{1,i}$    set of events associated with a won game
$S_{1/2,i}$  set of events associated with a drawn game
ELO          FIDE Elo rating of a Chess player
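To make the specialization concrete, the sketch below applies (1)–(5) of Section III to chess-style events, each consisting of the utility values v of the candidate moves in a position and the index of the move actually played. It is a minimal illustration under stated assumptions, not the authors' code: the function names, the example values (in pawn units), k = 0.1, the uniform prior, and in particular the refinement rule (keep the grid points whose posterior exceeds a small fraction of the maximum, pad by one step, halve δc) are all hypothetical, since the text only states that the range is narrowed and δc decreased at each step.

```python
import math

K = 0.1  # assumed small constant k of (1), in pawn units


def log_p_move(values, played, c, k=K):
    """log p(m|c) from (1)-(2) for one position.

    values: utility values v of the candidate moves (pawn units).
    played: index of the move actually played.
    """
    best = max(values)
    logs = [-c * math.log(best - v + k) for v in values]
    peak = max(logs)  # log-sum-exp for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logs))
    return logs[played] - log_norm


def posterior_on_grid(events, c_min, c_max, delta_c):
    """Posterior over the grid c_i = c_min + i * delta_c of (5), uniform prior."""
    n = int(round((c_max - c_min) / delta_c))
    grid = [c_min + i * delta_c for i in range(n + 1)]
    log_post = [sum(log_p_move(v, m, c) for v, m in events) for c in grid]
    peak = max(log_post)
    weights = [math.exp(x - peak) for x in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def adaptive_posterior(events, c_min=0.0, c_max=10.0, delta_c=0.1,
                       rounds=3, eps=1e-6):
    """Assumed refinement rule: narrow the range to where the posterior is
    non-negligible, pad by one step, and halve delta_c in each round."""
    for _ in range(rounds):
        grid, post = posterior_on_grid(events, c_min, c_max, delta_c)
        top = max(post)
        kept = [c for c, p in zip(grid, post) if p >= eps * top]
        c_min = max(0.0, kept[0] - delta_c)
        c_max = kept[-1] + delta_c
        delta_c /= 2.0
    return posterior_on_grid(events, c_min, c_max, delta_c)


# Hypothetical events: (candidate move values, index of the played move).
events = [
    ([0.30, 0.25, 0.10, -0.40], 0),
    ([1.10, 0.90, 0.20, 0.00], 1),
    ([0.05, 0.00, -0.15, -1.20], 0),
    ([-0.20, -0.35, -0.50, -2.00], 0),
    ([0.60, 0.55, 0.10, -0.30], 2),
]

grid, post = adaptive_posterior(events)
c_mean = sum(c * p for c, p in zip(grid, post))
print(f"grid points = {len(grid)}, step = {grid[1] - grid[0]:.4f}, "
      f"mean competence = {c_mean:.2f}")
```

Working with log-probabilities avoids underflow when the product in (4) runs over many positions, and recomputing the posterior from all events on each refined grid keeps the sketch short at the cost of repeated work.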


A. Reference Chess Engine and Stochastic Chess Agent
During a chess game, players make decisions according to individual judgement under time pressure. Rational decision making requires a definite set of alternative actions and knowledge of the utilities of the outcomes of each possible action. A player's skill is a measurement of their ability to make choices as close as possible to the optimal ones. In order to assess a player's skill level, we ideally need the best move benchmark, but this is only available via Endgame Tables in the Endgame Zone. In the general case of the whole game, this is clearly not feasible. However, significant advances have been made in the last decades in terms of chess engines' playing strength [28]. Chess engines can be adopted as domain experts to provide utility values for alternative moves.
Given a chess engine CE, a chess board position analysis results in a list of recommended best moves and their heuristic utility values in pawn units (centipawns). The value associated with a move corresponds to the estimated advantage of the position that the move will lead to after a number of turns. Because of time constraints and the exponential nature of the computational complexity, the analysis of the chess engine can only be performed up to a given maximum depth $d$ (number of plies) and for a limited number $n_{mv}$ of alternative variants. For the purpose of this paper, we consider the reference chess engine $CE(d, n_{mv})$.
In contrast to the ideal Endgame Zone scenario, three main factors introduce an approximation in the evaluation of a chess position in terms of candidate moves and relative values. They are the limited search depth and span (parameters $d$ and $n_{mv}$) and the heuristic nature of the chess engine's analysis. This is a general problem of sensitivity to heuristic utility functions in Descriptive Decision Theory. The influence of this approximation in our analysis will be the scope of future investigations.
The analysis of a position $q$ is a function $f_{CE}$ that provides a list $V_q = \{(m_i, v_i)\}$ of candidate moves $m_i$ and their estimated utility values $v_i$ ($1 \le i \le n_{mv}$)

$$f_{CE} : q \rightarrow V_q.$$

Let $M_q = \{m_i\}$ be the set of candidate moves in $V_q$ and $v_{max} = \max_i \{v_i\}$.
To model human players' skill we associate a likelihood function L (1) with the reference chess engine CE to create a stochastic chess agent R(c). The stochastic chess agent will not always play the best move; it uses a likelihood function to select one of the alternatives provided by the reference chess engine.
Following the approach in Section III-B, the probability of R(c) selecting a move $m_j$ is given by:

$$
p(c|m_j) =
\begin{cases}
\dfrac{(v_{max} - v_j + k)^{-c}}{\sum_{m_i \in M_q} (v_{max} - v_i + k)^{-c}}, & \text{if } m_j \in M_q \\
0, & \text{otherwise.}
\end{cases}
\qquad (6)
$$

In this case, we have generalized to feasible moves which may not be considered by the reference chess engine due to the limited number $n_{mv}$ of alternative variants.

VI. Experimental Analysis
For the experimental analysis, the following resources have been used:
1) publicly available data in Portable Game Notation (PGN) from sources including the Chessbase database [34];
2) Toga II v1.3.1 [35], a publicly available, reputable, and widely used chess engine;
3) the Universal Chess Interface (UCI) protocol [36] for chess engine input/output;
4) Java code implementing UCI and the Bayesian inference method (see Section III-C).
During a preprocessing phase, positions were acquired from the games, ignoring the first 12 moves by each side (assumed to be from the book). Configurations of the chess board were converted to a set of events {e = (p, m)}, where each move m is made in a board position p. Positions were analyzed using the chess engine, which provided the utility values for a finite set of alternative moves.
Each analysis was carried out to depth d = 10 plies. The chess engine was configured to determine and report the top moves ($n_{mv}$ = 10) it found in each position. Both values were chosen as compromises between computing speed and comprehensiveness of the data. Depth 10 is not considered sufficient to outplay the stronger players in our samples, but apparently it suffices to identify most of their inaccuracies.
Finally, the Bayesian inference process has been applied to the preprocessed data to generate the probability distribution of the parameter c.
In all our tests, the arbitrary constant k of formula (1) has been set to 0.1 centipawns, which corresponds to a thousandth of the value of a pawn.

A. Composite Reference Elo Players
This experiment shows that the proposed Bayesian approach is able to detect different skill levels among players with different Elo ratings.
The decisions of players of different skill levels have been analyzed. We have used all available 3432 games in which both players had Elo ratings within 10 points of some Elo figure, e.g., games of players rated between 2390 and 2410. Games were grouped according to the Elo rating of the players. Each group contains games between players with a similar Elo rating ($ELO_{min} \le ELO(player) \le ELO_{max}$). The number of games and the number of positions (move-events) that have been included in the datasets of composite reference players are given in Table III.
We have applied the Bayesian inference method described in Section III-C to each dataset of Table III independently. The probability distributions of the parameter c are shown in Fig. 2(a). A summary of these distributions is provided in Table III in terms of the mean, standard deviation and 95% credibility region (CR) for c.
The expected value of c, E(c) = c̄, measures the average quality of moves played in the games. We refer to c̄ as the apparent skill. The standard deviation σc measures the uncertainty of the apparent skill level, caused by the varying


TABLE III
Chess Games Datasets

Player ELOmin ELOmax Period Games N CRmin CRmax c̄ σc σc·N^{1/2}
Elo2100 2090 2110 1994-1998 213 11727 1.047 1.089 1.068 0.0101 1.091
Elo2200 2190 2210 1971-1998 2771 140390 1.098 1.112 1.105 0.0030 1.117
Elo2300 2290 2310 1971-2005 1064 57627 1.147 1.169 1.158 0.0049 1.171
Elo2400 2390 2410 1971-2006 3055 152589 1.210 1.224 1.217 0.0031 1.219
Elo2500 2490 2510 1995-2006 1646 83748 1.256 1.276 1.266 0.0044 1.266
Elo2600 2590 2610 1995-2006 746 37623 1.309 1.337 1.323 0.0068 1.326
Elo2700 2690 2710 1991-2006 121 9279 1.335 1.391 1.364 0.0140 1.345

performance of the players, the finiteness of the data, and the


spread of initial belief.
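The inference behind these summary statistics can be sketched in a few lines. The following is a minimal illustration, not the authors' Java implementation: it assumes that the posterior over c is evaluated on a discrete grid, that the prior is uniform unless one is supplied, and that each event stores the engine's centipawn values for the candidate moves together with the index of the move actually played. The function names, the grid range, and the toy position are hypothetical; only the move-selection rule of (6) and the constant k = 0.1 come from the text.

```python
import math

K = 0.1  # constant k of Eq. (6), in centipawns (value used in the experiments)


def move_probabilities(values, c, k=K):
    """Eq. (6): selection probability of each candidate move for the agent R(c)."""
    v_max = max(values)
    weights = [(v_max - v + k) ** (-c) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_over_c(events, grid, prior=None):
    """Grid posterior over c (assumed discretization).

    events: list of (values, played_index) pairs, one per analyzed position.
    prior:  optional list of prior weights on the grid; uniform if omitted.
    """
    if prior is None:
        prior = [1.0 / len(grid)] * len(grid)
    # Work in log space so that long games do not underflow.
    log_post = [math.log(p) if p > 0 else -math.inf for p in prior]
    for values, played in events:
        for g, c in enumerate(grid):
            log_post[g] += math.log(move_probabilities(values, c)[played])
    peak = max(log_post)
    post = [math.exp(lp - peak) for lp in log_post]
    norm = sum(post)
    return [p / norm for p in post]


def summarize(grid, post, mass=0.95):
    """Mean, standard deviation and central credibility region of the posterior."""
    mean = sum(c * p for c, p in zip(grid, post))
    std = math.sqrt(sum((c - mean) ** 2 * p for c, p in zip(grid, post)))
    tail = (1.0 - mass) / 2.0
    cum, lo, hi, lo_found = 0.0, grid[0], grid[-1], False
    for c, p in zip(grid, post):
        cum += p
        if not lo_found and cum >= tail:
            lo, lo_found = c, True
        if cum >= 1.0 - tail:
            hi = c
            break
    return mean, std, (lo, hi)


# Toy data (hypothetical): one position with three candidate moves.
grid = [i * 0.01 for i in range(301)]      # c in [0, 3], an assumed range
values = [30.0, 10.0, -50.0]               # engine values in centipawns
mixed = [(values, 0)] * 10 + [(values, 1)] * 10
print(summarize(grid, posterior_over_c(mixed, grid)))
```

A player who always picks the engine's first choice pushes the posterior toward the upper end of the grid, so in practice the grid range acts as part of the prior and should be chosen with that in mind.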
As expected, the standard deviation and the width of the
credibility region depend on the amount of input data and
are broadly proportional to 1/√N, where N is the number of
positions. The Elo2100 and Elo2700 datasets contain less data
than the others and so show slightly higher standard deviations.
In the next section, we analyze the effect of the amount of
input data in more detail.
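The 1/√N behavior can be checked directly from Table III. The short sketch below recomputes σc·√N from the published N and σc columns; the variable and function names are illustrative. Because σc is rounded to four decimals in the table, the recomputed products differ slightly from the table's last column.

```python
import math

# (dataset, number of positions N, sigma_c), copied from Table III
TABLE_III = [
    ("Elo2100", 11727, 0.0101),
    ("Elo2200", 140390, 0.0030),
    ("Elo2300", 57627, 0.0049),
    ("Elo2400", 152589, 0.0031),
    ("Elo2500", 83748, 0.0044),
    ("Elo2600", 37623, 0.0068),
    ("Elo2700", 9279, 0.0140),
]


def scaled_sigma(n_positions, sigma_c):
    """sigma_c * sqrt(N): roughly constant if sigma_c shrinks like 1/sqrt(N)."""
    return sigma_c * math.sqrt(n_positions)


for name, n, sigma in TABLE_III:
    print(f"{name}: sigma_c * sqrt(N) = {scaled_sigma(n, sigma):.3f}")
```

Although N varies by more than a factor of 16 across the datasets, the product stays within a narrow band, which is what "broadly proportional" means here.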
In Fig. 2(b), a linear regression fit of the apparent skill of
composite reference Elo players is shown. This shows that the
proposed skill rating system correlates closely with the FIDE
Elo rating system in spite of the fact that they use different
information to infer the rating. The apparent skill is based on
the utility of individual moves, while FIDE Elo is based on
the results of whole games.
The parameters of the linear regression model can be used
to convert the apparent competence c̄ into an equivalent Elo
rating (c2ELO), i.e., c2ELO = 1949.53 · c̄ + 32.39.
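As a worked example of this conversion, the sketch below applies the regression coefficients quoted above to the c̄ values of Table III. The function name `c2elo` is only an illustrative choice.

```python
SLOPE = 1949.53      # regression slope from Fig. 2(b)
INTERCEPT = 32.39    # regression intercept from Fig. 2(b)


def c2elo(c_bar):
    """Convert an apparent skill value into an equivalent Elo rating (c2ELO)."""
    return SLOPE * c_bar + INTERCEPT


# Apparent skill of the composite reference players (Table III)
C_BAR = {2100: 1.068, 2200: 1.105, 2300: 1.158, 2400: 1.217,
         2500: 1.266, 2600: 1.323, 2700: 1.364}

for elo, c_bar in C_BAR.items():
    print(f"Elo{elo}: c2ELO = {c2elo(c_bar):.1f}")
```

For the Elo2400 dataset, c̄ = 1.217 maps to 1949.53 · 1.217 + 32.39 ≈ 2405, close to the nominal rating of the group.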

B. Convergence Analysis
In order to check the derivation process of the probabil-
ity distribution of the apparent skill, we have taken snap-
shots at different iteration steps (i.e., number of positions).
Fig. 3 shows the analysis that has been carried out on the 2400
Elo data. The curves in Fig. 3(a) show the evolution of the
probability distribution during the refinement of the Bayesian
inference process. The expected value c̄ [Fig. 3(b)] quickly
converges and the standard deviation [Fig. 3(c)] decreases as
the inference process draws on more data. The asymptotic
value of the standard deviation is a measure of the intrinsic
variability of the skill level.

Fig. 2. Posterior probability distributions of the model R(c) for composite reference Elo players. (a) Competence probability distributions. (b) FIDE Elo rating versus inferred apparent competence (c̄ ± σ): the linear regression model is y = 1949.53 · x + 32.39.

C. Skill Difference in Players With Similar Elo Ratings
In this section, we present the experimental test aimed at investigating differences of apparent skill in single games between players with a similar Elo rating. Note that such a difference cannot be detected by the Elo system in principle. The Elo rating captures an average performance of a player in terms of game outcomes and not in terms of the quality of the moves.
Given a set of games G among players with similar Elo rating (e.g., E2400), from each game we have extracted two sets of events, one for each player, white (W) and black (B). The events are associated with the outcome of the game (loss = 0, win = 1, draw = 1/2). The events are aggregated by outcome to generate three sets of events $S_0$, $S_1$, and $S_{1/2}$. The set $S_1$ contains sets of move-events that have been made by players who won the game, $S_0$ contains moves made by players who lost, and $S_{1/2}$ those made by players who drew

$$
G \rightarrow
\begin{cases}
S_{0,W} \cup S_{0,B} \rightarrow S_0 = \{S_{0,i}\}, & i = 1, \ldots, n_0 \\
S_{1,W} \cup S_{1,B} \rightarrow S_1 = \{S_{1,i}\}, & i = 1, \ldots, n_1 \\
S_{1/2,W} \cup S_{1/2,B} \rightarrow S_{1/2} = \{S_{1/2,i}\}, & i = 1, \ldots, n_{1/2}.
\end{cases}
$$

Each set $S_{r,i} = \{q, m\}$, where $r \in \{0, 1, \frac{1}{2}\}$, contains the moves of a single player during a single game.


Fig. 3. Convergence evolution for the dataset 2400 Elo. (a) Probability distribution at different iteration steps. (b) Expected value. (c) Standard deviation.

We have applied the Bayesian inference process to each set of events $S_{r,i}$ and computed c̄ for each of them.
In this case, the apparent skill c̄ measures the quality of moves played by a single player during a single game, with a consequent expected high uncertainty because of the limited amount of data on which the inference is carried out.
We have computed first order statistics of the apparent skill c̄ over the three sets $S_0$, $S_1$, and $S_{1/2}$. In this test, we have used 602 games of the dataset E2400, 313 of which were a win and 289 a draw ($n_0 = n_1 = 313$ and $n_{1/2} = 578$). The average apparent skill μc̄ over all 1204 $S_{r,i}$, regardless of the result r, is 1.2109. The results over each set $S_0$, $S_1$, and $S_{1/2}$ are shown in Table IV.

TABLE IV
Analysis of the Opponent Players in the Dataset E2400

set    n_r   μc̄      σc̄
S0     313   1.1493  0.0686
S1     313   1.2302  0.0623
S1/2   578   1.2339  0.0460

In spite of the small number of events in a single game, the Bayesian approach is able to detect a meaningful difference between the two opponents of a game having similar Elo ratings. On average, players who have won the game have a higher apparent skill c̄ than their opponents who have lost.
Players who have drawn have even higher apparent skill. This can be explained considering that in drawn games both opponents have played well with no or irrelevant errors. In draws, the intrinsic quality of the game is in general higher. When a player has reached a significant advantage during the game, they may prefer to play an easy and safe strategy. They can even afford to make small errors provided the outcome is ensured. In this case, there is a lack of motivation to play high-risk tactics, even if optimal.

VII. Applications
In this section, we demonstrate some interesting applications of the proposed method in the domain of chess.
In the first example, the method is used to generate the skill profile of players over several decades, even before the official adoption of the Elo rating system. We compare the profile of a top player with the official FIDE Elo ratings and with the ratings generated by Chessmetrics [37]. FIDE have published players' Elo ratings every three months since 1970. Chessmetrics has been chosen for comparison because it is an attempt to improve the accuracy of the statistical inference method of the Elo system. It has also been applied to game data before the FIDE adoption of the Elo system. Both Elo and Chessmetrics use only the results of games (paired comparisons) to infer players' strength.
We have also generated a historical comparison of a few top players' profiles. This scenario is used to carry out an experimental analysis of the sensitivity to the prior probability.
In the second example, we present a chart that is suitable to visualize the within-game skills of players and their opponents. In particular, we have generated this chart to analyze an aspect of the famous and controversial [38] final stage of the 1948 World Championship.
Finally, in the third example, we show how to use ratings and profiles generated by the proposed method to analyze the performance of players accused of cheating.

A. Profiling Human Skills Over Time
We have selected all the publicly available games of a few top players to generate and compare their skill profiles over many years. The apparent competence c̄ has been converted into an equivalent Elo rating (c2ELO) by using the regression model obtained in Section VI-A and Fig. 2(b).
First, we have generated the skill profile of Viktor Lvovich Korchnoi (1931-) to compare the proposed method to other chess rating methods. Korchnoi is currently the oldest active


Fig. 4. Viktor Lvovich Korchnoi (games from 1950 to 2006).

grandmaster and he is considered one of the strongest players


who never won the World Championship (WC). He played a
candidate final (1975) and two WC finals (1978, 1981). His
longevity near the top of the international chess competitions
and the turbulent political environment in which he played
(he defected from the U.S.S.R in 1976) make the study of his
historical ratings particularly interesting.
The chart in Fig. 4 shows the equivalent Elo rating (c2ELO)
obtained with the proposed method, the actual Elo rating
extracted from game annotations, the average Elo rating per
year, the Chessmetrics rating, and the average Chessmetrics
rating per year. Korchnoi’s c2ELO profile (Fig. 4) shows the
consistency of the proposed method with other methods based
on paired comparisons. Inference based on paired comparisons
may be affected by the frequency of played games, the number
and strength of similar and better opponents and other factors. The rating provided by the proposed Bayesian inference is based on the intrinsic quality of individual moves.
In Fig. 4 two synchronous drops in Korchnoi's Elo and Chessmetrics ratings are evident in the periods 1971–1972 and 1980–1983. They indicate that Korchnoi was not performing as before in terms of game outcomes.
The rating produced by the Bayesian inference is consistent with the other two methods before 1980. In comparison, in the period from 1980 to 1986 the Bayesian method indicates a stable competence. In this period, he may have lost games against better opponents, but he did not show a worse competence in terms of game quality. This may be related to the dominance of other players, e.g., Karpov and Kasparov [see Fig. 6(a)].
In the previous section, the excellent linear regression fit in Fig. 2(b) has shown that the Bayesian inference method correlates well with the FIDE Elo rating system. Nevertheless, the evident inconsistency in Korchnoi's ratings in the period 1980–1986 may suggest that the Bayesian inference method is able to detect the difference between a game lost against a better opponent and a game lost because of a poorer competence than in the past.
For a longitudinal comparison of players, we have selected the players awarded the World Champion title between 1970 and 1990: Bobby Fischer (1943–2008), Anatoly Karpov (1951-) and Garry Kasparov (1963-). We have also included Korchnoi in the comparison. In the analysis for these players, we have used all available games, a total of 9404 games and 221 965 positions.
Fig. 5 reproduces the FIDE Elo ratings and Chessmetrics ratings [37] of the selected players.

Fig. 5. Selected top players' ratings based on paired comparisons. (a) FIDE Elo ratings. (b) Chessmetrics Elo ratings.

Fig. 6 shows a comparison of the four players selected. In this case, we have carried out three different Bayesian inferences varying the initialization policy of the prior probability. Since the analysis covers several decades, we have decided to reset the prior probability at the beginning of each year. In the first year (1950), the uniform probability is always used.
Fig. 6(a) reproduces the profiles when a continuous inference is performed: the prior probability at the beginning of the year is the posterior probability of the previous year.
Fig. 6(b) reproduces the profiles when the prior probability at the beginning of the year is set to $N(\mu_c, 2 \cdot \sigma_c)$, where $\mu_c$ and $\sigma_c$ are the statistics of the posterior probability of the previous year.
Fig. 6(c) reproduces the profiles when the prior probability at the beginning of the year is set to a uniform probability: each year is analyzed independently.
The different initializations of the prior probability correspond to a different weighting of the knowledge w.r.t.


new empirical evidence. At the beginning of each year, the prior knowledge is provided by the inference over the previous year. The three initialization methods correspond to a full propagation, partial propagation, and no propagation of this knowledge.
As expected, when a continuous inference is performed the curves are smooth, as the past knowledge works as a low-pass filter. When prior knowledge is given less or no weight, high-frequency components are not filtered and sudden variations of the skill are shown.

Fig. 6. Historical comparison of top players (c2ELO ratings) over time. Three different initializations of the prior probability at the beginning of each year. (a) Running inference: no prior probability reset. (b) Prior probability reset to $N(\mu_{c,t-1}, 2 \times \sigma_{c,t-1})$. (c) Prior probability reset to uniform distribution.

B. Within-Game Skill Chart
Each chess game provides two lists of moves, one for each player, and is associated with a result. Each list of moves can be used to generate apparent competencies (c̄) for the two opponents during the game. The within-game skill chart can be used to convey information about the performance of the two opponents in each game and about the outcomes of the games.
An example of this type of chart is shown in Fig. 7, which is based on Keres' games at the 1948 World Championship. Each data point represents a game between Keres and his opponent and is shown with a symbol to indicate win, loss, or draw for Keres. The horizontal axis is associated with Keres' average competence during the game; the vertical axis with his opponent's average competence.
For example, the solid square at coordinates (1.8, 0.3) in the chart of Fig. 7 means a victory for Keres: in this game Keres made moves with an average competence c̄ = 1.8 and his opponent with c̄ = 0.3. In this game, Keres won easily against an opponent who performed poorly.

Fig. 7. Within-game skill chart: Keres' and opponent's competence (c̄) over single games (World Championship 1948).

Keres' World Championship performance against Botvinnik in 1948 has long been a matter of speculation, as it is rumored that he was under pressure not to impede the latter's progress to the title. The event was played as a quintuple round robin. Keres' and opponents' competence per game have been computed for the 20 games in which he was involved (Fig. 7). The chart has symbols (cross, square, and circle) to represent the outcomes of games; in this particular chart there are also two additional symbols (circle and filled circle) to identify the games Keres played against Botvinnik. Keres lost the first four games against Botvinnik and won the last only when Botvinnik had already secured the title. The vertical line at about x = 1.1 corresponds to Keres' competence c̄ = 1.1. The line clearly separates the first four games Keres played against Botvinnik from all his other games in the championship. In


those four games Keres' competence c̄ is below 1.1; in all other games it is above 1.1. The chart shows that in those four games Keres clearly performed worse than in any other game of the competition. Whether this was intended or not is, of course, out of the scope of the analysis. Nevertheless, such a tool may be used to support or to reject hypotheses and to motivate further analysis and, eventually, investigations.

C. Alleged Chess Cyborgs
There have been quite a number of players suspected of fraudulently receiving computer advice during play (e.g., [39]). D. P. Singh's play came under suspicion in the second half of 2006 [40]. We have analyzed all available games from Oct. 2005 to Oct. 2008 (Fig. 8) over the entire period and over the two periods before and after the allegations.
Fig. 8(a) shows the within-game chart for all games over the entire period. In some games, he played at an exceptionally high skill level.
The two apparent competence profiles before and after the allegations are shown in Fig. 8(b). These profiles are well separated and indicate a drop of skill level after the allegations were made.
These examples suggest that the proposed method could be useful to create real-time skill monitoring applications to deter clandestine activity and to help focus the Tournament Director's finite forensic resources appropriately during play.

Fig. 8. D. P. Singh (games from Oct. 2005 to Oct. 2008). (a) Within-game skill chart: Singh's and opponent's competence (c̄). (b) Probability density function of the competence before and after allegations.

VIII. Related Work
This paper generalizes and extends in several ways the work in [41]–[43].
The work in [41] introduced the concept of reference agents for the Endgame Zone (EZ), defined as that part of chess for which Endgame Tables (EGTs) have been computed. An EGT provides the value, win/draw/loss, of a position and its depth to some win goal (e.g., to mate) if decisive: chess engines play optimal moves in EZ by using EGTs.
The work in [42] introduced the concept of the reference agent based on a chess engine. The work in [43] presented a Bayesian inference method to generate chess players' ratings and provided the first experimental analysis. The present work extends the analysis of chess players' skill in [43] in several ways, including the following. A small-error linear regression fit shows that the method provides the same discriminative power as indirect statistical inference methods based on paired comparisons such as the Elo rating system. An experimental analysis of sensitivity to prior probabilities is carried out. A number of interesting and novel applications of the method to the game of chess are demonstrated.

IX. Conclusion
The problem of modeling decision makers' skill was investigated. The proposed approach, Skilloscopy, was based on the definition of a general stochastic model and a Bayesian inference method. It did not assume any domain-specific model of the decision making process.
The approach was demonstrated in the game domain of chess. The experimental analysis showed the viability of rating players' skill by benchmarking against chess engines. The statistical inference was based on the quality of decisions, rather than on paired comparisons (game outcomes) as in previous approaches.
The method was successfully applied to a large set of chess game data and validated with the FIDE Elo ratings. The experimental analysis provided evidence of the accuracy of the method in estimating the skill level of players regardless of the outcome of the games and of the opponent rating.
Further work will address the generalization of the method to multidimensional skills and the influence of approximations of the utility function.
In principle, the proposed method can be effectively adopted in similar domains, where an accurate method to determine utility values is available. It can be used, for example, to analyze in real-time the likely abilities of students and skilled workers in defined-process scenarios.

References
[1] O. Haavisto and A. Remes, “Data-based skill evaluation of human operators in process industry,” in Proc. Int. Conf. Control Automat. Syst., Oct. 2010, pp. 707–712.


[2] M. Aizuddin, N. Oshima, R. Midorikawa, and A. Takanishi, “Development of sensor system for effective evaluation of surgical skill,” in Proc. First IEEE/RAS-EMBS Int. Conf. Biomed. Robotics Biomechatron., Feb. 2006, pp. 678–683.
[3] E. Lorias, M. Minor, S. Ortiz, P. Olivares, and J. Gnecchi, “Computer system for the evaluation of laparoscopic skills,” in Proc. Electron. Robotics Automot. Mech. Conf., Sep.–Oct. 2010, pp. 19–22.
[4] K. Yacef and L. Alem, “Evaluation of learner’s skills in the context of dynamic and complex systems,” in Proc. IEEE Int. Conf. Syst. Man Cybern., vol. 3. Beijing, China, Oct. 1996, pp. 2200–2204.
[5] A. Koivo and D. Repperger, “Skill evaluation of human operators,” in Proc. IEEE Int. Conf. Syst. Man Cybern. Comput. Cybern. Simul., vol. 3. Oct. 1997, pp. 2103–2108.
[6] R. Hartley and G. Varley, “The design and evaluation of simulations for the development of complex decision-making skills,” in Proc. IEEE Int. Conf. Adv. Learn. Technol., 2001, pp. 145–148.
[7] G. M. Goh, C. Quek, and D. Maskell, “Epilist II: Closing the loop in the development of generic cognitive skills,” in Proc. IEEE Trans. Syst. Man Cybern., A, Syst. Hum., vol. 40, no. 4. Jul. 2010, pp. 676–685.
[8] K. Watanabe and M. Hokari, “Kinematical analysis and measurement of sports form,” in Proc. IEEE Trans. Syst. Man Cybern., A, Syst. Hum., vol. 36, no. 3. May 2006, pp. 549–557.
[9] R. Herbrich, T. Minka, and T. Graepel, “TrueSkill™: A Bayesian skill rating system,” in Proc. Adv. Neural Inf. Process. Syst., 2007, pp. 569–576.
[10] A. Elo, The Rating of Chessplayers, Past and Present. New York, NY, USA: Arco, 1978.
[11] D. North, “A tutorial introduction to decision theory,” IEEE Trans. Syst. Sci. Cybern., vol. 4, no. 3, pp. 200–210, Sep. 1968.
[12] H. Raiffa, Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, MA: Addison-Wesley, republished by McGraw-Hill, 1968.
[13] W. Edwards, R. F. Miles, Jr., and D. von Winterfeldt, Eds., Advances in Decision Analysis From Foundations to Applications. Cambridge, U.K.: Cambridge Univ. Press, 2007.
[14] S. Dillon, “Descriptive decision making: Comparing theory with practice,” in Proc. 33rd Ann. Oper. Res. Soc. New Zealand Conf., Aug. 1998, pp. 99–108.
[15] A. De Groot, Het denken van den schaker. Amsterdam, The Netherlands: Noord Hollandsche, 1946.
[16] A. De Groot, Thought and Choice in Chess, 2nd ed. (Revised translation of De Groot, 1946). Hague, The Netherlands: Mouton Publishers, 1978.
[17] F. Gobet, “Chess players’ thinking revisited,” Swiss J. Psychol., vol. 57, no. 1, pp. 18–32, 1998.
[18] F. Gobet and N. Charness, Expertise in Chess, Chess and Games. Cambridge Handbook on Expertise and Expert Performance. Cambridge, U.K.: Cambridge Univ. Press, 2006.
[19] J. O. Berger, Statistical Decision Theory and Bayesian Analysis, 2nd ed. Berlin, Germany: Springer-Verlag, 1985.
[20] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis, 2nd ed. London, U.K.: Chapman and Hall/CRC, 2003.
[21] P. M. Lee, Bayesian Statistics: An Introduction, 3rd ed. New York, NY, USA: Wiley, 2004.
[22] R. Coulom, “Whole-History Rating: A Bayesian rating system for players of time-varying strength,” in Proc. Conf. Comp. Games, Beijing, China, 2008, pp. 113–124.
[23] T. Minka, “A family of algorithms for approximate Bayesian inference,” Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, USA, 2001.
[24] J. K. Kruschke, Doing Bayesian Data Analysis: A Tutorial Introduction with R and BUGS. Amsterdam, The Netherlands: Academic/Elsevier, 2011.
[25] R. A. Bradley and M. E. Terry, “Rank analysis of incomplete block designs–Part I: The method of paired comparisons,” Biometrika, vol. 39, no. 3–4, pp. 324–345, 1952.
[26] M. E. Glickman, “A comprehensive guide to chess ratings,” Am. Chess J., vol. 3, pp. 59–102, 1995.
[27] B. West. (2006). A simple and flexible rating method for predicting success in the NCAA basketball tournament. J. Quantitative Anal. Sports [Online]. 2(3), article 3. Available: http://www.degruyter.com/view/j/jqas.2006.2.3/jqas.2006.2.3.1039/jqas.2006.2.3.1039.xml?format=INT
[28] The Swedish Chess Computer Association. (2013). The SSDF Rating List [Online]. Available: http://ssdf.bosjo.net/list.htm
[29] M. E. Glickman, “Parameter estimation in large dynamic paired comparison experiments,” J. Royal Stat. Soc. Ser. C (Appl. Stat.), vol. 48, no. 4, pp. 377–394, 1999.
[30] J. Beasley, The Mathematics of Games. New York, NY, USA: Dover, 2006.
[31] P. Dangauthier, R. Herbrich, T. Minka, and T. Graepel, “TrueSkill through time: Revisiting the history of chess,” in Proc. NIPS, 2008, pp. 931–938.
[32] M. Guid and I. Bratko, “Computer analysis of world chess champions,” ICGA J., vol. 29, no. 2, pp. 65–73, Nov. 2006.
[33] C. Sullivan. (2008). Comparison of Great Players [Online]. Available: http://www.truechess.com/web/champs.html
[34] ChessBase GMBH. (2013). Chessbase Player Database [Online]. Available: http://www.chessbase.com
[35] T. Gaksch. (2013). Toga II 1.3.1 Chess Engine [Online]. Available: http://www.superchessengine.com/toga ii.htm
[36] S. Meyer-Kahlen. (2013). Definition of the Universal Chess Interface [Online]. Available: http://wbec-ridderkerk.nl/html/UCIProtocol.html
[37] J. Sonas. (2013). Chessmetrics [Online]. Available: http://www.chessmetrics.com
[38] C. C. Moul and J. V. Nye, “Did the Soviets Collude? A statistical analysis of Championship Chess 1940-78,” J. Econ. Behav. Organ., vol. 70, no. 1–2, pp. 10–21, 2009.
[39] F. Friedel, “Cheating in chess,” in Advances in Computer Games 9, H. J. van den Herik and B. Monien, Eds. Maastricht, The Netherlands: IKAT, 2001, pp. 327–346.
[40] Chessbase News. (2007). D.P. Singh: Supreme Talent or Flawed Genius? [Online]. Available: http://www.chessbase.com/newsdetail.asp?newsid=3595
[41] G. McC. Haworth, “Reference fallible endgame play,” ICGA J., vol. 26, no. 2, pp. 81–91, Jun. 2002.
[42] G. McC. Haworth, “Gentlemen, stop your engines!” ICGA J., vol. 30, no. 3, pp. 150–156, Sep. 2007.
[43] G. Di Fatta, G. Haworth, and K. Regan, “Skill rating by Bayesian inference,” in Proc. IEEE Symp. Comput. Intell. Data Mining, Apr. 2009, pp. 89–94.

Giuseppe Di Fatta (M’02) received the “Laurea” degree (M.Eng.) in electronics engineering and the Ph.D. degree in computer science from the University of Palermo, Palermo, Italy, in 1995 and 2002, respectively.
He is an Associate Professor of computer science at the University of Reading, Reading, U.K., where he has been since 2006. In 1999, he was a Research Fellow at the International Computer Science Institute (ICSI), Berkeley, CA, USA. From 2000 to 2004, he was with the High Performance Computing and Networking Institute, National Research Council, Italy. From 2004 to 2006, he was with the University of Konstanz, Konstanz, Germany. He has published over 60 papers in peer-reviewed conferences and journals. His current research interests include data mining, scalable algorithms, distributed and parallel computing, and multidisciplinary applications.
Dr. Di Fatta has organized and chaired international workshops and conferences in data mining, distributed systems, and computer networks.

Guy Haworth received the M.A. degree in mathematics from Oxford University, Oxford, U.K., in 1971, the Diploma degree in computer science from Cambridge University, Cambridge, U.K., in 1969, and researched the issues of performance, parallelism, and integrity of real-time systems at Cambridge until 1972.
He has been a Lecturer at the University of Reading, Reading, U.K., since 2003. In 30 years in the industry, his roles ranged from product development, sales and marketing to customer service and consultancy, mainly for International Computers Limited, Reading, U.K. He has published over 60 papers. His current research interests include the application of systems theory, frameworks, and the soft systems methodology.
