
Comparison of Machine Learning Bots and Adversarial Search Bots in the game of Yinsh


By group KEN2
Academic year: 2024-2025

Contents

1 Introduction
2 Yinsh Playing Bots
   2.1 The Random Bot
   2.2 The Alpha-Beta Bot
   2.3 Deep Q-Learning Bot
   2.4 The Policy Gradient Bot
3 Experiments
4 Results
5 Discussion
6 Conclusion

Abstract

This paper explores the performance of different AI bots in the game Yinsh. More specifically, the abilities of the adversarial search bot, the machine learning bot, and hybrid approaches are compared to evaluate their strengths and limitations. Yinsh presents challenges requiring both strategic foresight and tactical adaptability. The adversarial search bot uses the mini-max algorithm and alpha-beta pruning for efficient decision making, showing its tactical abilities and computational efficiency. The machine learning bot uses deep Q-learning, utilizing a neural network to approximate Q-values dynamically, which is reinforced by a replay buffer for iterative training. The machine learning bot adapts and learns very efficiently, but its performance is limited by the training data presented and the computational resources. Finally, the hybrid bot combines the mini-max search with move weights learned through a policy gradient.

1 Introduction

The emergence of Artificial Intelligence, also known as AI, in strategic game decision-making has significantly improved the computational modeling of 'human-like' reasoning and problem-solving. Researchers have used methods like adversarial search, machine learning, and hybrid approaches to optimise gameplay, each technique offering distinct advantages and limitations [3]. However, the comparison between the performances and behaviours of these approaches has not been explored, especially in complex board games like Yinsh. Yinsh is a two-player abstract board game that challenges AI agents thanks to its strategic depth and computational complexity. Yinsh embodies both offensive and defensive strategies by making players create a line of five chips while simultaneously managing the position of their rings. This need for adaptability and foresight makes Yinsh an excellent choice for evaluating different artificial intelligence agents. Adversarial search bots utilise minimax and alpha-beta pruning algorithms to efficiently evaluate and anticipate the opponent's moves [7]. Techniques like this thrive in games with well-defined state spaces like Checkers, which have already been computationally solved [2].
However, Yinsh is a game wherein the vast and dynamic state space creates challenges for adversarial search bots due to their reliance on static heuristics [4]. On the other hand, machine learning bots implement approaches like deep reinforcement learning, enhancing their performance iteratively [6]. As seen in games like Go and StarCraft, machine learning bots learn from their gameplay experiences, allowing them to discover innovative strategies and adapt to many different scenarios [3]. However, the computational resources and their dependence on training data can limit their scalability, especially in games like Yinsh, which require real-time strategic adjustments [5]. Hybrid approaches, a relatively new concept in data science, combine the strengths of adversarial search and machine learning by integrating structured search algorithms and adaptive learning mechanisms. Techniques like these promise to fill the gap between static and dynamic strategies by integrating immediate decision-making and long-term planning. Despite their potential, these hybrid methods remain relatively unexplored, especially in the game Yinsh, leaving open questions about their effectiveness in such games.

Previous research has provided valuable insights into AI methods for games. Studies on adversarial search have highlighted its computational efficiency and precision [7]. Moreover, machine learning approaches demonstrate adaptability, uncovering strategies across various game types thanks to reinforcement learning [6]. Hybrid approaches have a high potential for addressing the complexities of Yinsh, considering they have outperformed traditional methods in certain domains [4]. However, no study has comprehensively compared these approaches in this context. This knowledge gap motivates our investigation: How do Machine Learning bots playing Yinsh compare to Adversarial Search bots and to hybrid Machine Learning and Adversarial Search techniques in play performance? Moreover, what strategies do the learning bots arrive at, and what patterns emerge from this learning strategy? By addressing these questions, this study seeks to enhance our understanding of AI's capabilities in strategic games like Yinsh, broadening the computational decision-making field. With their adaptive learning capabilities, we hypothesise that machine learning bots will demonstrate greater innovation and adaptability in strategy development. However, adversarial search bots may excel in immediate tactical decisions due to their computational efficiency. We expect hybrid approaches combining dynamic learning with precise strategic evaluation to achieve the best overall results. This research will analyse simulated matches between these bots, focusing on performance metrics and qualitative aspects. By systematically comparing these methodologies, we aim to advance knowledge in AI significantly and demonstrate the untapped potential of hybrid techniques in complex strategic environments like Yinsh.

2 Yinsh Playing Bots

2.1 The Random Bot

To analyse the performance of Machine Learning and Adversarial Search techniques in playing Yinsh, there must be a baseline against which to compare. For this baseline, we used a bot that plays the game using neither, that is, by choosing at random among the legal Yinsh moves.

The Random Bot is very straightforward, which makes it easy to implement and does not impact test times. Because of its randomness, this bot can disrupt the strategies utilised by the more advanced bots in their play and create more varied game states for the bots to be tested against. Therefore, it serves very well as a baseline. On the other hand, a bot with decision rules, while more realistic as a player agent, would not create nearly as much variety between runs for testing purposes.

Its implementation in the game is likewise simple. The Random Bot simply queries the game engine for all the available moves and picks one at random to play, in both the ring placement and play phases of a game.
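The listing below gives a minimal sketch of this behaviour. The engine interface (legal_moves) and the choose_move signature are assumptions made for illustration and do not appear in the paper; only the pick-a-random-legal-move logic reflects the description above.

import random

class RandomBot:
    # Baseline agent: plays a uniformly random legal move in every phase.

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def choose_move(self, engine, state):
        # The same call covers the ring placement phase and the playing
        # phase, because the engine reports whatever moves are legal now.
        moves = engine.legal_moves(state)
        return self.rng.choice(moves)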

2.2 The Alpha-Beta Bot

The first bot we used in the study is the Alpha-Beta Bot, a strategic AI agent made to navigate the complexities of Yinsh. This bot uses a combination of hard-coded heuristics, the mini-max algorithm, and alpha-beta pruning to find the best move according to an evaluation function.

In the ring placement phase, the Alpha-Beta Bot prioritises taking the so-called choke points at the edges of the board. Then, it tries to have the most rings in one of the zones we split the board into, without giving its opponent more than a two-ring advantage in any of the other ones. This strategy is based on an online discussion of initial ring placement in Yinsh [1].

In the playing phase, the Alpha-Beta Bot performs a mini-max search based on our evaluation function. The evaluation function assigns scores to each move based on the resulting game state. It considers whether there are 3, 4, or 5 chips in a row, ring removals, and wins. This mini-max tree is then pruned using the alpha-beta pruning technique, which improves efficiency by eliminating unnecessary branches in the search tree. This allows the bot to find the best move without expanding too many branches.

To achieve this behaviour, we first implemented the Alpha-Beta Bot to check which of these phases the game is in. If it is in the ring placement phase, it places a ring on the checked choke points if they are free. If these points are not free, it calculates the score for each zone of the board. It first tries to ensure that no zone is more than two rings in favour of the opponent; then, it takes the zone that is most in its favour. In the playing phase, the Alpha-Beta Bot constructs a mini-max tree recursively to reach the best move. It also sorts the moves in each node to cut down on computational time.

Figure 1: UML Decision Diagram for Alpha-Beta Bot

The Alpha-Beta Bot balances between not being overly complicated in its structural design and playing strategically. The mini-max algorithm allows it to respond to its opponent, while the pruning keeps the computational cost to a minimum. Moreover, it does not require upfront computation like the other bots, which will be discussed later in this paper. However, it is also only as good as the evaluation function it is coded with. It cannot learn or improve upon the general Yinsh strategy, which an experienced human player will also know.
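The paper does not list the implementation itself; the following is a minimal sketch of a depth-limited mini-max search with alpha-beta pruning and move ordering, under an assumed engine interface (legal_moves, apply, is_terminal) and with evaluate standing in for the evaluation function described above (rows of 3, 4, or 5 chips, ring removals, and wins).

def alpha_beta(engine, state, depth, alpha, beta, maximising, evaluate):
    # Depth-limited mini-max with alpha-beta pruning.
    # engine, evaluate and the move/state types are placeholders for the
    # actual Yinsh engine; only the search logic itself is shown here.
    if depth == 0 or engine.is_terminal(state):
        return evaluate(state), None

    # Sorting candidate moves by a cheap heuristic lets the pruning cut
    # more branches, as described for the Alpha-Beta Bot.
    moves = sorted(engine.legal_moves(state),
                   key=lambda m: evaluate(engine.apply(state, m)),
                   reverse=maximising)

    best_move = moves[0]
    if maximising:
        value = float("-inf")
        for move in moves:
            child, _ = alpha_beta(engine, engine.apply(state, move),
                                  depth - 1, alpha, beta, False, evaluate)
            if child > value:
                value, best_move = child, move
            alpha = max(alpha, value)
            if alpha >= beta:   # beta cut-off: the opponent avoids this branch
                break
    else:
        value = float("inf")
        for move in moves:
            child, _ = alpha_beta(engine, engine.apply(state, move),
                                  depth - 1, alpha, beta, True, evaluate)
            if child < value:
                value, best_move = child, move
            beta = min(beta, value)
            if beta <= alpha:   # alpha cut-off
                break
    return value, best_move

A top-level call such as alpha_beta(engine, state, depth=3, alpha=float("-inf"), beta=float("inf"), maximising=True, evaluate=evaluate) returns both the score and the move to play; the search depth here is illustrative.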
2.3 Deep Q-Learning Bot

The Deep Q-Learning Bot was developed as a more advanced AI agent, intended to learn strategic gameplay for Yinsh beyond the general strategy used by the Alpha-Beta Bot. The bot uses a neural network to approximate Q-values dynamically. A one-dimensional vector represents the game state, encapsulating the design, and this vector then serves as input to the neural network.

The neural network's output layer is fixed at 100, representing the maximum number of potential actions. However, the number of valid moves varies dynamically in each state. To avoid complications, the bot applies a masking mechanism that filters out invalid actions. This ensures that only Q-values corresponding to valid moves are considered for decision-making. By focusing only on practical actions, the masking mechanism enhances the bot's efficiency and accuracy, which leads to more strategic choices.

This DQN bot uses a replay buffer, a memory system that records gameplay experiences as (state, action, reward, next state) tuples. These experiences capture a diverse range of scenarios encountered during the game. After each game, the replay buffer is sampled randomly, and the neural network is trained on these experiences. This approach reduces correlations in the training data and ensures that the bot learns from successes and failures, making it more efficient.

The bot's training process includes updating the neural network weights by minimising the error between the target and predicted Q-values, which are computed using the Bellman equation. This iterative improvement allows the bot to adapt its strategies over time, aligning its decision-making with long-term rewards. By leveraging the replay buffer and the masking mechanism, the bot can handle the complex and dynamic game of Yinsh, demonstrating its advanced learning and adaptability.
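As a rough illustration of the pieces described in this section, the sketch below shows a masked action selection, a replay buffer, and one training step that regresses the predicted Q-values onto the Bellman target y = r + gamma * max_a' Q(s', a'). The layer sizes, buffer capacity, and use of PyTorch are assumptions made for the sketch; only the 100-action output, the masking mechanism, and the replay-buffer training come from the description above.

import random
from collections import deque

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps the one-dimensional state vector to one Q-value per action slot.
    def __init__(self, state_size, n_actions=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x):
        return self.net(x)

class ReplayBuffer:
    # Stores (state, action, reward, next_state, done) tuples from played games.
    def __init__(self, capacity=50000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

def masked_action(q_values, valid_actions):
    # The masking mechanism: only Q-values of legal moves can be selected.
    mask = torch.full_like(q_values, float("-inf"))
    mask[valid_actions] = 0.0
    return int(torch.argmax(q_values + mask))

def train_step(q_net, optimiser, batch, gamma=0.99):
    # One update: minimise the error between predicted Q-values and the
    # Bellman target r + gamma * max_a' Q(s', a') for non-terminal states.
    states, actions, rewards, next_states, dones = map(
        lambda xs: torch.as_tensor(xs, dtype=torch.float32), zip(*batch))
    q_pred = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = q_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    loss = nn.functional.mse_loss(q_pred, target)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

During training, the same mask can also be applied to the next-state Q-values so that illegal continuations never contribute to the target.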
Figure 2: UML Decision Diagram for Alpha-Beta Bot

2.4 The Policy Gradient Bot

The Policy Gradient Bot is similar to the Alpha-Beta Bot discussed previously, with the addition of a policy gradient machine learning algorithm implemented within the decision-making process. This bot also focuses on the directions of the moves and their weights. Initially, the weights for each move direction are equal. Then, with each game the Policy Gradient Bot plays, the weights of the directions are adjusted and updated. After the bot is trained, the moves are ordered, and the probability of each move leading the bot to a win in a game is based on its training. After this, we run the mini-max search on these reordered moves, giving us more information than a simple mini-max search based on a general evaluation function.

We implemented this behaviour by creating a Reward class. This class handles the training and incrementally adjusts the direction weights, which the bot then uses in its mini-max tree. This mini-max tree is implemented in the same way as in the previous Alpha-Beta Bot.
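The paper states only that the direction weights start equal and are nudged after every game; the concrete update rule in the sketch below is an illustrative guess, and the set of move directions is assumed to be supplied by the game engine.

class Reward:
    # Keeps one weight per move direction; the Policy Gradient Bot uses these
    # weights to order candidate moves before the mini-max search runs.

    def __init__(self, directions, learning_rate=0.05):
        # Initially, the weights for each move direction are equal.
        self.weights = {d: 1.0 for d in directions}
        self.learning_rate = learning_rate

    def update(self, played_directions, won):
        # Reinforce directions used in a won game, discourage them after a loss.
        reward = 1.0 if won else -1.0
        for d in played_directions:
            self.weights[d] += self.learning_rate * reward

    def order_moves(self, moves, direction_of):
        # Most promising directions first, so the pruned mini-max tree keeps
        # and examines them before the rest of the branches.
        return sorted(moves, key=lambda m: self.weights[direction_of(m)],
                      reverse=True)

After training, the bot would call order_moves on the legal moves and then run the same alpha-beta search as the Alpha-Beta Bot over the reordered list.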
3 Experiments

We ran seven win-rate tests to compare the performance of our bots. Each of these win-rate tests consisted of 200 games. We measured the win rate of each bot against itself and against the Random Bot, giving us seven combinations. To conduct these experiments, we implemented a separate interface without any visual elements, printed the results in the terminal, and saved each game's progress as a CSV file.

We chose these specific comparisons because Yinsh does not have a scoring system, so the win rate is the intuitive performance measure. Comparing the bots to the Random Bot as a baseline allowed us to see how well each bot itself played. The tests against themselves, on the other hand, showed us whether there was a significant advantage for a bot in playing first or not.
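A sketch of one such win-rate test follows. The play_game callable and the result labels are assumptions about our non-visual interface; the 200-game count, the terminal output, and the CSV logging follow the description above.

import csv
from collections import Counter

GAMES_PER_TEST = 200

def run_win_rate_test(white_bot, black_bot, play_game, csv_path):
    # Plays one 200-game pairing, logs each result, and prints the win rates.
    tally = Counter()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["game", "result"])
        for game_index in range(GAMES_PER_TEST):
            result = play_game(white_bot, black_bot)  # "white", "black" or "draw"
            tally[result] += 1
            writer.writerow([game_index, result])
    for outcome in ("white", "black", "draw"):
        share = 100.0 * tally[outcome] / GAMES_PER_TEST
        print(f"{outcome}: {share:.1f}%")
    return tally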

4 Results

Figure 3: Pie chart of the win rates of Random Bot vs. Random Bot. The colour red represents White's win rate and is equal to 6%. The colour green represents Black's win rate and is equal to 5%. The colour blue represents the percentage of draws, equal to 85%.

Figure 4: Pie chart of the win rates of Random Bot vs. Alpha-Beta Bot. The colour red represents Alpha-Beta's win rate and is equal to 93%. The colour green represents the percentage of draws, equal to 7%.

Figure 5: Pie chart of the win rates of Alpha-Beta Bot vs. Alpha-Beta Bot. The colour red represents White's win rate and is equal to 50.5%. The colour green represents Black's win rate and is equal to 47.5%. The colour blue represents the percentage of draws, equal to 4%.

Figure 6: Pie chart of the win rates of Deep Q-Learning Bot vs. Random Bot. The colour red represents the Deep Q-Learning Bot's win rate and is equal to 32%. The colour green represents the Random Bot's win rate and is equal to 3.5%. The colour blue represents the percentage of draws, equal to 64.5%.

Figure 7: Pie chart of the win rates of Deep Q-Learning Bot vs. Deep Q-Learning Bot. The colour red represents White's win rate and is equal to 31%. The colour green represents Black's win rate and is equal to 53%. The colour blue represents the percentage of draws, equal to 36%.

5 Discussion

The aim of our study was to compare various Artificial Intelligence strategies and focus on their performance in the game of Yinsh. These strategies included the Random Bot, the Alpha-Beta Bot, and the Deep Q-Learning Bot. When looking at the results of our research, we can confidently say that there are patterns in the gameplay.

For instance, our findings showed that the Alpha-Beta Bot consistently outperformed the Random Bot, winning 93% of the time. In contrast, the Random Bot won no games, and the two came to a draw only 7% of the time. Moreover, the Alpha-Beta Bot came to a draw only 2% of the time against itself and showed very little difference between starting first or second (white or black). That the Alpha-Beta Bot draws so few games shows the effectiveness of adversarial search techniques in open-ended strategy games like Yinsh. This shows the potential of implementing strategic algorithms in complex games, especially games where fast tactical decisions are inevitable.

However, our machine learning techniques, particularly the Deep Q-Learning Bot, did not yield similarly good results. The Deep Q-Learning Bot managed to win 32% of the games, lost 3.5% of them, and drew 64.5% of the time. This limitation in the bot's performance resulted from the Deep Q-Learning Bot not having enough time to obtain a substantial amount of training data. Deep Q-learning requires more time and data, which we unfortunately could not provide, resulting in a performance worse than the Alpha-Beta Bot's.

This leads us to conclude that we may have oversimplified our approach given the tasks. We hope that acknowledging these shortcomings will improve future research on this subject. Our suggestions for future research are, therefore, refining the learning environment, enhancing the training datasets, and exploring other reinforcement learning techniques that are better at handling the complex environment of the game Yinsh.

Even though our research has limitations, it also has several strengths. One notable strength was our tailored approach to the Alpha-Beta Bot in the game of Yinsh, which made it perform well despite Yinsh being a very complex game. Another strength is our use of a policy gradient in the Policy Gradient Bot, which allows us to see how learning strategies can adapt in our game of Yinsh. The findings of this research are interesting yet align with existing literature, and they confirm the strengths of adversarial search techniques, as concluded in previous studies [2], [6]. Interestingly, our machine learning bot did not work as well as the research would suggest. While classic techniques like adversarial search performed reliably and even excelled, our attempt at machine learning did not surpass the classic techniques. However, this is not unexpected, since machine learning requires a lot of training, which we unfortunately lacked in our research. More training would yield better results for the machine learning bot.

6 Conclusion

This study investigated the performance of several artificial intelligence techniques in the strategic board game of Yinsh.

These techniques are Machine Learning, Adversarial Search, and hybrid approaches. By establishing a Random Bot as a baseline, we could analyse the performance of the more advanced techniques in terms of their adaptability and innovation in a strategic environment. Our findings show that while Machine Learning bots show enhanced adaptability, they do not match Adversarial Search bots, which excel in immediate tactical decisions. Hybrid approaches promise to combine the benefits of both.

To conclude this study, we can emphasise the potential of AI capabilities in complex strategic environments if they are improved. The importance of selecting an appropriate AI technique based on specific needs is highlighted by utilising different approaches and demonstrating that these approaches excel in distinct aspects of gameplay. This insight can enhance the future development of AI systems and mechanisms for competitive gameplay in strategic games.

Future research could explore developing and implementing better hybrid techniques and assess their effectiveness for various game scenarios. Investigating the impact of more diverse training datasets for the Machine Learning bot could reveal insights into their development. Exploring other AI methods may also yield insights that will further push the boundaries of AI capabilities in complex games like Yinsh.

References

[1] Various Authors. Initial Ring Placement Strategy. Accessed: 2025-01-21. Mar. 2012. url: https://boardgamegeek.com/thread/30384/initial-ring-placement-strategy.

[2] Ze'ev Ben-Porat and Martin Charles Golumbic. "Learning from experience in board games". In: Advances in Artificial Intelligence (1990), pp. 1-25. doi: 10.1007/978-1-4613-9052-7_1.

[3] Thomas A. Berrueta, Allison Pinosky, and Todd D. Murphey. "Maximum Diffusion Reinforcement Learning". In: Nature Machine Intelligence 6.5 (May 2024), pp. 504-514. doi: 10.1038/s42256-024-00829-3.

[4] R. Cant, J. Churchill, and D. Al-Dabass. "A hybrid artificial intelligence approach with application to games". In: Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN'02 (Cat. No.02CH37290) (2002), pp. 1575-1580. doi: 10.1109/ijcnn.2002.1007752.

[5] Niels Justesen et al. Deep Learning for Video Game Playing. 2019. arXiv: 1708.07902 [cs.AI]. url: https://arxiv.org/abs/1708.07902.

[6] Richard E. Korf. "Multi-player alpha-beta pruning". In: Artificial Intelligence 48.1 (Feb. 1991), pp. 99-111. doi: 10.1016/0004-3702(91)90082-u.

[7] Jonathan Schaeffer et al. "Checkers is solved". In: Science 317.5844 (Sept. 2007), pp. 1518-1522. doi: 10.1126/science.1144079.
