Abstract—In this paper, we propose a Monte-Carlo Tree Search (MCTS) fighting game AI capable of dynamic difficulty adjustment while maintaining believable behaviors. This work targets beginner-level and intermediate-level players. In order to improve players' skill while at the same time entertaining them, AIs are needed that can evenly fight against their opponent beginner and intermediate players, and such AIs are called dynamic difficulty adjustment (DDA) AIs. In addition, in order not to impair the players' playing motivation due to the AI's unnatural actions such as intentionally taking damage with no resistance, DDA methods that restrain such unnatural actions are needed. In this paper, for an MCTS-based AI previously proposed by the authors' group, we introduce a new evaluation term on action believability, focusing on the amount of damage dealt to the opponent, into the AI's evaluation function. In addition, we introduce a parameter that dynamically changes its value according to the current game situation in order to balance this new term with the existing one in the evaluation function, which focuses on adjusting the AI's skill to be equal to that of the player. Our results from the conducted experiment using FightingICE, a fighting game platform used in a game AI competition at CIG since 2014, show that the proposed DDA-AI can dynamically adjust its strength to its opponent human players, especially intermediate players, while restraining its unnatural actions throughout the game.

Index Terms—Monte-Carlo tree search, dynamic difficulty adjustment, fighting game AI, believable, FightingICE

I. INTRODUCTION

Fighting games are real-time games in which a character controlled by a human player or a game AI has to defeat their opponent using various attacks and evasion. In this work, an AI is defined as a computer program that controls a character in a game. There are two types of matches in fighting games: Player VS Player (PvP), where two human players fight against each other, and Player VS Computer (PvC), where a human player fights against an AI-controlled character. An AI in PvC usually acts as the opponent for the human player who plays alone, sometimes as a sparring partner. In this work, we focus on PvC and target beginner and intermediate human players in fighting games.

One of the main features of beginner and intermediate players is that they do not fully know the game information such as character operations, available actions, and fighting styles or tactics. They are often defeated by players who fully know the game and by AIs that are too strong for them. This may cause beginner and intermediate players to lose the motivation to play the game while their skill is still improving, and to quit it. To prevent this, AIs are needed that can entertain beginner and intermediate players while such players are still improving their playing skills.

Previously, the authors' group proposed a Monte-Carlo Tree Search (MCTS) fighting game AI called "Entertaining AI" (eAI) [1] whose goal is to entertain human players. This AI can evenly fight against its opponent players by dynamically adjusting its strength according to their playing skill, a mechanism called dynamic difficulty adjustment (DDA). Namely, eAI conducts an action according to the current game situation: when eAI is losing, it conducts a strong action; otherwise, it conducts a weak action. From the experimental results, eAI could entertain its opponent human players by evenly fighting against them. However, we observed that eAI often conducted unnatural actions such as repeating no-hit attacks and repeatedly stepping back even though the distance between the characters was already large. In order not to impair players' playing motivation due to AIs' unnatural actions such as those by eAI mentioned above, DDA methods able to restrain such unnatural actions are needed [2].

In this paper, we propose an MCTS fighting game AI capable of dynamic difficulty adjustment while maintaining believable behaviors. This work targets beginner-level and intermediate-level players. We use eAI as the base AI and introduce a new evaluation term on action believability, focusing on the amount of damage dealt to the opponent, into the AI's evaluation function. In addition, we introduce a parameter that dynamically changes its value according to the current game situation in order to balance this new term with the existing term in the evaluation function. We verify the performance of our proposed DDA-AI by a subjective experiment using FightingICE, a fighting game platform used in a game AI competition at CIG since 2014 [3].

II. GAME FLOW

Chen [4] mentioned the required elements by which players can enjoy playing games and how to design games to satisfy players using game flow (Fig. 1). In Fig. 1, the x-axis represents players' skill at the game and the y-axis represents the game difficulty. This figure indicates that players can enjoy playing the game if their skill and the game difficulty fall in the "FLOW ZONE". That is, adjusting the game difficulty according to players' skill is needed. This can be said not only for game design, but also for game AIs.

Fig. 1. Game flow [4]

Ikeda and Viennot [2] mentioned the elements required for players to enjoy playing games and how to design them in terms of AIs in Go. Using the aforementioned game flow, they argued that AIs are needed that can adjust their strength according to the opponent players' skill so as to evenly fight against them or lose with only a small difference in winning ratio. Fig. 2 shows the game flow applied to fighting game AIs by us with reference to the aforementioned work by Ikeda and Viennot. In Fig. 2, players cannot enjoy playing the game if the opponent AI crushes them (a) or loses with no resistance at all (b). Additionally, performing clearly unnatural actions only to balance the game (c) also impairs players' enjoyment. AIs should evenly fight against their opponents without unnatural actions (d), and finally, AIs might lose to their opponents with only a small difference (e), or win if the opponents make some mistakes (f). That is, DDA-AIs capable of restraining their unnatural actions are needed.

Fig. 2. Game flow in fighting games

III. EXISTING METHODS FOR MCTS-BASED DDA

In this section, we describe two DDA-AIs using MCTS. These AIs are used for comparison with our proposed AI.

A. Entertaining AI

Entertaining AI (eAI) is an MCTS-based DDA-AI proposed by our group [1]. This DDA method combines MCTS, Roulette Selection, and a rule-based method. In this section, we mainly explain MCTS, but we point out here that Roulette Selection, where the frequency of each action actually played by the opponent human player is used in the simulation of his/her actions, is deployed in all of the AIs evaluated in this work. For more details about the other methods, please see Ishihara et al. [5].

Fig. 3 shows an overview of MCTS applied to fighting games. This MCTS is based on an open loop approach [6]. In this figure, the root node holds the current game information, which consists of both characters' Hit-Points (HPs), energies, positions, ongoing actions, and the remaining time of the game. Each node except the root node represents an action. In this MCTS, an action spans from its input to its end, at which the next action becomes executable. An edge simply represents the connection between a parent node and its child node. When a parent node's action has finished, the next action will be one of its child nodes. In summary, the game tree using this MCTS represents the execution order of the AI's actions.

eAI repeats the four steps in Fig. 3 within a time budget of T_max. After the time budget is depleted, eAI selects the most visited direct child node (action) of the root node as the next action. The rest of this subsection explains each step of MCTS.

1) Selection: The child nodes with the highest Upper Confidence Bounds (UCB1) value [7] are selected from the root node until a leaf node is reached. The formula of UCB1 is:

UCB1_i = \bar{X}_i + C \sqrt{\frac{2 \ln N}{N_i}},    (1)

where N_i is the number of times node (action) i was visited, N is the sum of N_i for node i and its sibling nodes, and C is a constant. \bar{X}_i is the average evaluation of node i, represented by the following formula:

\bar{X}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} eval_j,    (2)

where eval_j is the reward value gained in the jth simulation and is defined as:

eval_j = 1 - \tanh\left(\frac{\left|afterHP^{my}_j - afterHP^{opp}_j\right|}{Scale}\right),    (3)

where afterHP^{my}_j and afterHP^{opp}_j stand for the HP of the AI and the opponent after the jth simulation, respectively, and Scale is a constant. As the HP difference between the AI and the opponent after the simulation gets closer to 0, eval_j obtains an evaluation value closer to 1. Thereby, strong actions are highly evaluated when the AI is losing; otherwise, weak actions are.

eAI selects the nodes with the highest UCB1'_i value (using the \bar{X}_i value normalized by formula (4)) from the root node until a leaf node is reached.

\bar{X}'_i = \frac{\bar{X}_i - \bar{X}_{min}}{\bar{X}_{max} - \bar{X}_{min}}    (4)

In this formula, \bar{X}_{max} and \bar{X}_{min} stand for the maximum and minimum \bar{X}_i over all nodes at the same tree depth.

2) Expansion: After a leaf node is reached in the Selection part, if the number of times the leaf node has been explored exceeds a threshold N_max and the depth of the tree is lower than a threshold D_max, all possible child nodes are created at once from the leaf node.

3) Simulation: A simulation is carried out for T_sim seconds, sequentially using all actions in the path from the root node to the current leaf node for the AI, and actions selected by Roulette Selection (see [5]) for the opponent. If the number of actions of the AI or the opponent used in the simulation is less than a given number, five in our previous work, randomly selected actions will be used after all actions of the AI or the opponent have been conducted. The variable eval_j is then calculated using formula (3).

4) Backpropagation: eval_j obtained from the simulation part is backpropagated from the leaf node to the root node. The UCB1 value of each node along the path is updated as well.
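To make the interplay of formulas (1)-(4) concrete, the following Python sketch shows how one selection step of eAI could score and pick a child node. It is only an illustration of the formulas above; the node structure and function names are our own and are not taken from the actual FightingICE implementation.

    import math

    class Node:
        # Minimal MCTS node used only for this illustration.
        def __init__(self, action=None, parent=None):
            self.action = action
            self.parent = parent
            self.children = []
            self.visits = 0        # N_i
            self.total_eval = 0.0  # sum of eval_j over this node's simulations

    def eval_after_simulation(after_hp_my, after_hp_opp, scale=30):
        # Formula (3): the reward is close to 1 when the simulated HP difference is
        # close to 0, so strong actions score high when the AI is losing and weak
        # actions score high when it is winning.
        return 1.0 - math.tanh(abs(after_hp_my - after_hp_opp) / scale)

    def ucb1_prime(node, c=0.42):
        # Formulas (1), (2) and (4): the node's mean evaluation, normalized over its
        # siblings at the same depth, plus the exploration term.
        if node.visits == 0:
            return float("inf")  # always try an unvisited child first
        visited_siblings = [s for s in node.parent.children if s.visits > 0]
        means = [s.total_eval / s.visits for s in visited_siblings]
        x_min, x_max = min(means), max(means)
        x_bar = node.total_eval / node.visits
        x_norm = (x_bar - x_min) / (x_max - x_min) if x_max > x_min else 0.0
        n = sum(s.visits for s in node.parent.children)  # N: visits of node and siblings
        return x_norm + c * math.sqrt(2.0 * math.log(n) / node.visits)

    def select_child(node):
        # Selection step: descend by the highest UCB1' value.
        return max(node.children, key=ucb1_prime)

For example, with Scale = 30, an even simulated outcome gives eval_j = 1, while a 30-HP gap in either direction gives 1 - tanh(1), which is approximately 0.24.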
B. True Proactive Outcome-Sensitive Action Selection

True Proactive Outcome-Sensitive Action Selection (TPOSAS) is one of the MCTS-based DDA-AIs with believability, proposed by Demediuk et al. [8]. TPOSAS also uses the same UCB1 formula (1). However, TPOSAS evaluates nodes using the following formula:

node.score = -\left( |h_s| - I_h \right)^{+},    (5)

where h_s is the HP difference between the AI and the opponent, I_h defines the interval within which all HP differences can be neglected, and (\cdot)^{+} indicates the ramp function, i.e., a function behaving like the identity function for positive numbers and returning 0 for negative numbers.

In this formula, the evaluations of all actions having |h_s| less than I_h will be 0; otherwise, they will be negative. Therefore, all nodes (actions) with |h_s| less than I_h are visited more. Because there exist multiple actions that have the highest evaluation value of zero, unnatural behaviors like repeating the same action can be avoided.

In our experiment, I_h is set to 10% of the maximum player health as in the work by Demediuk et al. [8].
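As a minimal illustration of formula (5), the following sketch computes the TPOSAS node score; the function and argument names are ours and are not taken from the TPOSAS implementation.

    def tposas_node_score(hp_ai, hp_opponent, i_h):
        # Formula (5): h_s is the HP difference between the AI and the opponent.
        # Differences inside [-i_h, i_h] are neglected (score 0); larger gaps are
        # penalized linearly through the ramp function (.)^+.
        h_s = hp_ai - hp_opponent
        return -max(abs(h_s) - i_h, 0.0)

    # Example: with I_h = 40 (10% of the 400 maximum HP used in our experiments),
    # a 25-HP lead scores 0, while a 100-HP lead scores -60.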
C. Problems

As we mentioned in Section I, eAI could entertain its opponent human players by evenly fighting against them. However, we could observe that eAI often conducted unnatural actions such as repeating no-hit attacks and repeatedly stepping back even though the distance between both characters was already large, especially in game situations where the HP difference is around zero. In such situations, the evaluations of actions which give no damage to the opponent and at the same time receive no damage, such as moving actions, will be higher than those of other actions. From this, one can readily see that eAI tends to select such unnatural actions in the above situation.

Demediuk et al. conducted experiments where TPOSAS fought against human players and other AIs that were submitted to the Fighting Game AI Competition (FTGAIC)^1 to verify the method's effectiveness. From these experimental results, TPOSAS could dynamically adjust its strength according to its opponents' skill. However, although the authors mentioned its believability, they did not quantitatively evaluate this factor. Also, they only used the HP difference at the end of the game as the evaluation criterion of DDA, and did not evaluate whether the AI can dynamically adjust its strength throughout the game.

^1 http://www.ice.ci.ritsumei.ac.jp/~ftgaic/index.htm

IV. PROPOSED METHOD

In this section, we define what believability is in fighting games and explain our new DDA method considering fighting-game believable behaviors.
A. Definition of Believable Behaviors in Fighting Games
As mentioned in Section I, the main purpose of fighting games is to defeat the opponent using various attacks and evasion. For that purpose, in this work, believable behaviors are defined as aggressive behaviors aimed at defeating the opponent, such as properly landing attacks on the opponent. Conversely, unnatural behaviors are defined as behaviors contrary to the main purpose mentioned above, such as no-hit attacks (described in Section III-C), although it could be argued that such non-aggressive actions are also performed to a certain extent by some human players to taunt their opponents.
B. Evaluation Function with Believability

The new evaluation function taking believability into account is defined as follows:

eval_j = (1 - \alpha) B_j + \alpha E_j,    (6)

where E_j is the term for difficulty adjustment, defined using the same formula as formula (3). B_j is the term for the AI's aggressiveness (believability), represented by the following formula:

B_j = \tanh\left(\frac{beforeHP^{opp}_j - afterHP^{opp}_j}{Scale}\right),    (7)

where beforeHP^{opp}_j and afterHP^{opp}_j stand for the HP of the opponent before and after the jth simulation, respectively, and Scale is a constant. If the AI gives a high amount of damage to the opponent, B_j will obtain a high evaluation value. Therefore, this term makes the evaluations of aggressive actions aimed at defeating the opponent higher than those of non-aggressive ones.

The coefficient \alpha in formula (6) is dynamically determined by formula (8) based on the current game situation:

\alpha = \frac{\tanh\left(\frac{beforeHP^{my}_j - beforeHP^{opp}_j}{Scale}\right) + 1}{2},    (8)

where beforeHP^{my}_j and beforeHP^{opp}_j stand for the HP of the AI and the opponent, respectively, before the jth simulation, and Scale is a constant. The more the AI is winning against the opponent, the closer \alpha gets to 1. Conversely, the more the AI is losing against the opponent, the closer \alpha gets to 0. Therefore, this coefficient makes it easier for the AI to select actions suitable for difficulty adjustment (E_j) when the AI is winning and to select those increasing its aggressiveness (B_j) when the AI is losing. Also, when the HP difference is zero, which means the AI is evenly fighting against the opponent, \alpha becomes 0.5. In that situation, the AI selects actions that maintain both difficulty and believability.

In summary, the mechanism of our proposed method is to make the AI select actions by considering not only how to adjust its difficulty toward the opponent's skill but also, at all times, how to defeat the opponent.
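To show how formulas (6)-(8) interact, the following Python sketch computes the proposed evaluation for a single simulation. It is a sketch of the formulas only; the function and argument names are ours and are not taken from the actual FightingICE implementation.

    import math

    def bea_eval(before_hp_my, before_hp_opp, after_hp_my, after_hp_opp, scale=30):
        # E_j, formula (3): difficulty adjustment, high when the simulated outcome stays even.
        e_j = 1.0 - math.tanh(abs(after_hp_my - after_hp_opp) / scale)
        # B_j, formula (7): believability (aggressiveness), high when much damage
        # is dealt to the opponent during the simulation.
        b_j = math.tanh((before_hp_opp - after_hp_opp) / scale)
        # alpha, formula (8): near 1 when the AI is winning before the simulation,
        # near 0 when it is losing, exactly 0.5 when the HPs are even.
        alpha = (math.tanh((before_hp_my - before_hp_opp) / scale) + 1.0) / 2.0
        # Formula (6): weight difficulty adjustment when winning, aggressiveness when losing.
        return (1.0 - alpha) * b_j + alpha * e_j

For example, with Scale = 30, an AI leading by 30 HP before the simulation gets \alpha = (\tanh(1) + 1)/2, approximately 0.88, so the difficulty-adjustment term E_j dominates, whereas an AI trailing by 30 HP gets \alpha of approximately 0.12, so the believability term B_j dominates.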
V. EXPERIMENTS

In this section, we describe the experiments conducted to verify the performance of our proposed DDA-AI (Believable Entertaining AI: BEAI).

A. FightingICE

Fig. 4. Screen shot of FightingICE

FightingICE (Fig. 4) is a real-time 2D fighting game platform used in a game AI competition (FTGAIC) at CIG since 2014 [3]. This game has all the main elements of fighting games. In addition, it does not use a ROM emulator, has been developed from scratch, and is publicly available for research purposes (see [10-14] for other recent publications using this platform), so there are no legal issues to be concerned about. In FightingICE, one game consists of three 60-second rounds, and one frame is set to 1/60 seconds. Each AI has to decide and input an action within one frame. Each character's initial HP is set to HP_max, and it decreases when the character is hit. After 60 seconds, or when either character's HP reaches 0, the game proceeds to the next round, and each character's HP is reset to HP_max. The character with the larger remaining HP at the end of the round is the winner. In our experiments, the value of HP_max is set at 400 according to the rule of the Standard Track of FTGAIC.
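The timing and HP rules above can be summarized in a few constants; the names below are illustrative only and are not identifiers from the FightingICE API.

    # Illustrative constants for the FightingICE setup described above.
    ROUNDS_PER_GAME = 3                          # one game consists of three rounds
    ROUND_LENGTH_SECONDS = 60                    # each round lasts 60 seconds
    FRAMES_PER_SECOND = 60                       # one frame is 1/60 seconds
    FRAME_BUDGET_MS = 1000 / FRAMES_PER_SECOND   # about 16.7 ms to decide an action
    HP_MAX = 400                                 # Standard Track rule of FTGAIC

    # The MCTS time budget Tmax = 16.5 ms (Table I) fits within this per-frame budget.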
B. Parameters

The parameters used in our experiments are shown in Table I. These parameters were set empirically through pre-experiments.

TABLE I
PARAMETERS USED IN THE EXPERIMENTS

Notation   Meaning                             Value
C          Balancing parameter                 0.42
Nmax       Threshold of the number of visits   7
Dmax       Threshold of the tree depth         3
Tsim       Simulation length                   60 frames
Tmax       Execution time of MCTS              16.5 ms
Scale      Scaling parameter                   30

C. Methods

We conducted subjective experiments to verify whether BEAI can adjust its strength according to the opponents' skill while maintaining its believability. We used 38 subjects (average age: 23.4 ± 2.2) in our experiments. Before starting our experiments, we conducted an informed consent session about our experiments, and subjects' consent was obtained with their signature on a separate informed consent form. In addition, we used eAI and TPOSAS for comparison. Our experiments were conducted over two days; the first day was used to measure each subject's fighting-game skill (Exp. 1), while the second day was used to have them individually fight against eAI, TPOSAS, and BEAI (Exp. 2). The content of Exp. 1 and Exp. 2 is given below.

TABLE II
CONTENT OF QUESTIONNAIRE

Dimension         Index   Content
Positive Affect   1       I felt it content
                  2       I felt it enjoyable
Challenge         3       I felt it challenged
                  4       I felt it stimulated
Believability     5       The opponent's attack skills were believable
                  6       The opponent's dodging skills were believable

1) Measurement of fighting games' skill (Exp. 1): The procedure of Exp. 1 is as follows:
Name           p-value
Expert         .658
Intermediate   .084
Beginner       .723
C. Positive Affect
Fig. 6 shows the average evaluations of Positive Affect toward gameplay against eAI, TPOSAS, and BEAI in each group. In Fig. 6, the x-axis represents the group names, the y-axis represents the evaluation value (1: Boring to 5: Enjoyable) of Positive Affect, and the error bars represent the standard deviation in each group. We can see that BEAI obtains higher evaluation values than eAI and TPOSAS against Expert and Beginner. However, it obtains a lower evaluation value than eAI against Intermediate. From our analysis, we could observe that BEAI often forced players to fight at close range compared to the other two AIs. Subjects belonging to Intermediate fought against their opponent AIs using various actions and strategies, similar to those players in
Fig. 5. Average AHDTGs against eAI, TPOSAS and BEAI, in each group using various actions and strategies, similar to those players in
TABLE VI
RESULTS OF A FRIEDMAN TEST ON CHALLENGE IN EACH GROUP

Name           p-value
Expert         .187
Intermediate   .024*
Beginner       .840
TABLE VII
RESULTS OF A FRIEDMAN TEST ON BELIEVABILITY IN EACH GROUP

Name           p-value
Expert         .886
Intermediate   .042*
Beginner       .420
R EFERENCES
[1] M. Ishihara, T. Miyazaki, T. Harada, and R. Thawonmas, “Analysis of
Effects of AIs and Interfaces to Players’ Enjoyment in Fighting Games,”
IPSJ Journal, vol. 57, no. 11, pp. 2415-2425, 2016, (in Japanese).
[2] K. Ikeda and S. Viennot, “Production of Various Strategies and Position
Control for Monte-Carlo Go - Entertaining human players,” in Proc.
Computational Intelligence in Games (CIG), 8 pages, 2013.
[3] F. Lu, K. Yamamoto, L. H. Nomura, S. Mizuno, Y. Lee, and R. Thawon-
mas, “Fighting Game Artificial Intelligence Competition Platform,” in
Proc. IEEE 2nd Global Conference on Consumer Electronics (GCCE),
pp. 320-323, 2013.
[4] J. Chen, “Flow in games (and everything else),” Communications of the
ACM, vol. 50, no. 4, pp. 31-34, 2007.
[5] M. Ishihara, T. Miyazaki, C. Y. Chu, T. Harada, and R. Thawonmas,
“Applying and Improving Monte-Carlo Tree Search in a Fighting Game
AI,” in Proc. 13th International Conference on Advances in Computer
Entertainment Technology (ACE 2016), no. 27, 2016.
[6] D. P. Liebana, J. Dieskau, M. Hunermund, S. Mostaghim, and S. Lucas, “Open Loop Search for General Video Game Playing,” in Proc. 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO ’15), pp. 337-344, 2015.
[7] L. Kocsis and C. Szepesvári, “Bandit Based Monte-Carlo Planning,” in
Proc. European Conference on Machine Learning (ECML), pp. 282-293,
2006.
[8] S. Demediuk, M. Tamassia, W. L. Raffe, F. Zambetta, X. Li, and F.
Mueller, “Monte Carlo Tree Search Based Algorithms for Dynamic
Difficulty Adjustment,” in Proc. Computational Intelligence and Games
(CIG), pp. 53-59, 2017.
[9] S. Yoshida, M. Ishihara, T. Miyazaki, Y. Nakagawa, T. Harada, and R.
Thawonmas, “Application of Monte-Carlo Tree Search in a Fighting
Game AI,” in Proc. IEEE 5th Global Conference on Consumer Elec-
tronics (GCCE), pp. 623-624, 2016.
[10] R. Ishii, S. Ito, M. Ishihara, T. Harada and R. Thawonmas, “Monte-Carlo
Tree Search Implementation of Fighting Game AIs Having Personas,” in
Proc. 2018 IEEE Conference on Computational Intelligence and Games
(CIG 2018), 2018.