However, Yinsh is a game wherein the vast and dynamic state space creates challenges for adversarial search bots due to their reliance on static heuristics [4].

On the other hand, machine learning bots implement approaches like deep reinforcement learning, improving their performance iteratively [6]. As seen in games like Go and StarCraft, machine learning bots learn from their gameplay experiences, allowing them to discover innovative strategies and adapt to many different scenarios [3]. However, their computational requirements and dependence on training data can limit their scalability, especially in games like Yinsh, which require real-time strategic adjustments [5]. Hybrid approaches, a relatively new concept in data science, combine the strengths of adversarial search and machine learning by integrating structured search algorithms with adaptive learning mechanisms. Such techniques promise to bridge the gap between static and dynamic strategies by combining immediate decision-making with long-term planning. Despite their potential, these hybrid methods remain relatively unexplored, especially in Yinsh, leaving open questions about their effectiveness in such games. Previous research has provided valuable insights into AI methods for games. Studies on adversarial search have highlighted its computational efficiency and precision [7].

Moreover, machine learning approaches demonstrate adaptability, uncovering strategies across various game types thanks to reinforcement learning [6]. Hybrid approaches have high potential for addressing the complexities of Yinsh, considering they have outperformed traditional methods in certain domains [4]. However, no study has comprehensively compared these approaches in this context. This knowledge gap motivates our investigation: How do Machine Learning bots playing Yinsh compare to Adversarial Search bots and to hybrid Machine Learning and Adversarial Search techniques in terms of play performance? Moreover, what strategies do learning bots arrive at, and what patterns emerge from this learning? By addressing these questions, this study seeks to enhance our understanding of AI's capabilities in strategic games like Yinsh, broadening the computational decision-making field. With their adaptive learning capabilities, we hypothesise that machine learning bots will demonstrate greater innovation and adaptability in strategy development. However, adversarial search bots may excel in immediate tactical decisions due to their computational efficiency. We expect hybrid approaches combining dynamic learning with precise strategic evaluation to achieve the best overall results. This research will analyse simulated matches between these bots, focusing on performance metrics and qualitative aspects. By systematically comparing these methodologies, we aim to advance knowledge in AI and demonstrate the untapped potential of hybrid techniques in complex strategic environments like Yinsh.

2 Yinsh Playing Bots

2.1 The Random Bot

To analyse the performance of Machine Learning and Adversarial Search techniques in playing Yinsh, there must be a baseline against which to compare. For this baseline, we used a bot that plays the game using neither technique, instead choosing randomly among the legal Yinsh moves.

The Random Bot is very straightforward, which makes it easy to implement and does not impact test times. Because of its randomness, this bot can disrupt the strategies utilised by the more advanced bots and create more varied game states for the bots to be tested against. Therefore, it serves very well as a baseline. On the other hand, a bot with decision rules, while more realistic as a player agent, would not create nearly as much variety between runs for testing purposes.

Its implementation in the game is likewise simple. The Random Bot queries the game engine for all available moves and picks one at random to play, in both the ring placement and play phases of a game.
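As an illustration, a minimal sketch of this behaviour is shown below. The game-engine interface (legal_moves) is an assumption made for the example and does not correspond one-to-one to our actual implementation.

import random

class RandomBot:
    # Baseline agent: picks a uniformly random legal move in every phase
    # (ring placement and ring movement alike).

    def __init__(self, rng=None):
        # An optional seeded RNG allows test runs to be reproduced.
        self.rng = rng or random.Random()

    def choose_move(self, game_state):
        # The engine is assumed to expose every legal move for the current
        # phase of the position; the bot simply picks one at random.
        moves = game_state.legal_moves()
        return self.rng.choice(moves)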
2.2 The Alpha-Beta Bot

The first bot we used in the study is the Alpha-Beta Bot, a strategic AI agent made to navigate the complexities of Yinsh. This bot uses a combination of hard-coded heuristics, the mini-max algorithm, and alpha-beta pruning to find the best move according to an evaluation function.

In the ring placement phase, the Alpha-Beta Bot prioritises taking the so-called choke points at the edges of the board. Then, it tries to have the most rings in one of the zones we split the board into, without giving its opponent more than a two-ring advantage in any of the other zones. This strategy is based on an online discussion of initial ring placement in Yinsh [1].

Figure 1: UML Decision Diagram for Alpha-Beta Bot

In the playing phase, the Alpha-Beta Bot performs a mini-max search based on our evaluation function. The evaluation function assigns a score to each move based on the resulting game state. It considers whether there are 3, 4, or 5 chips in a row, ring removals, and wins. The mini-max tree is then pruned using the alpha-beta pruning technique, which improves efficiency by eliminating unnecessary branches in the search tree. This allows the bot to find the best move without expanding too many branches.

To achieve this behaviour, we first implemented the Alpha-Beta Bot to check which of these phases the game is in. If it is in the ring placement phase, it places a ring on one of the choke points if they are free. If these points are not free, it calculates the score for each zone of the board: it first tries to ensure no zone is more than two rings in favour of the opponent, and then reinforces the zone that is most in its own favour. In the playing phase, the Alpha-Beta Bot constructs a mini-max tree recursively to find the best move. It also sorts the moves in each node to cut down on computational time.

The Alpha-Beta Bot balances between not being overly complicated in its structural design and playing strategically. The mini-max algorithm allows it to respond to its opponent, while the pruning keeps the computational cost to a minimum. Moreover, it does not require upfront computation like the other bots, which will be discussed later in this paper. However, it is also only as good as the evaluation function it is coded with. It cannot learn or improve upon the general Yinsh strategy, which an experienced human player will also know.
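To make the search concrete, a simplified sketch of mini-max with alpha-beta pruning and move ordering is given below. The state interface (legal_moves, apply_move, count_chains, rings_removed, winner, is_terminal) and the weights in evaluate are illustrative assumptions, not the exact evaluation function used by our bot.

import math

def evaluate(state, player):
    # Illustrative heuristic in the spirit of our evaluation function:
    # reward rows of 3, 4 and 5 chips, removed rings, and wins.
    score = 0
    score += 10 * state.count_chains(player, length=3)
    score += 50 * state.count_chains(player, length=4)
    score += 200 * state.count_chains(player, length=5)
    score += 500 * state.rings_removed(player)
    if state.winner() == player:
        score += 10000
    return score

def alpha_beta(state, depth, alpha, beta, player, maximising):
    if depth == 0 or state.is_terminal():
        return evaluate(state, player), None

    # Move ordering: examine the most promising moves first so that the
    # alpha-beta cut-off prunes the remaining branches earlier.
    moves = sorted(state.legal_moves(),
                   key=lambda m: evaluate(state.apply_move(m), player),
                   reverse=maximising)

    best_value = -math.inf if maximising else math.inf
    best_move = None
    for move in moves:
        value, _ = alpha_beta(state.apply_move(move), depth - 1,
                              alpha, beta, player, not maximising)
        if maximising and value > best_value:
            best_value, best_move = value, move
            alpha = max(alpha, value)
        elif not maximising and value < best_value:
            best_value, best_move = value, move
            beta = min(beta, value)
        if beta <= alpha:
            break  # prune: the opponent will never allow this line
    return best_value, best_move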
2.3 Deep Q-Learning Bot

The Deep Q-Learning Bot was developed as a more advanced AI agent, intended to develop strategic gameplay for Yinsh beyond the general strategy used by the Alpha-Beta Bot. The bot uses a neural network to approximate Q-values dynamically. A one-dimensional vector represents the game state, and this vector serves as input to the neural network.

The neural network's output layer is fixed at 100 units, representing the maximum number of potential actions. However, the number of valid moves varies dynamically in each state. To avoid complications, the bot applies a masking mechanism that filters out invalid actions. This ensures that only Q-values corresponding to valid moves are considered for decision-making. By focusing only on playable actions, the masking mechanism improves the bot's efficiency and accuracy, leading to more strategic choices.
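The sketch below illustrates how such a mask can be applied to the network output when choosing a move. The layer sizes and the valid_action_ids argument (the indices of the currently legal moves) are assumptions made for the example, not our exact architecture.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps the flat state vector to one Q-value per fixed action slot.
    def __init__(self, state_dim, n_actions=100):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_actions),  # output layer fixed at 100 slots
        )

    def forward(self, state):
        return self.layers(state)

def select_action(net, state_vector, valid_action_ids):
    # Greedy action selection: invalid slots are pushed to -inf so that
    # the argmax can only ever land on a legal move.
    q_values = net(torch.as_tensor(state_vector, dtype=torch.float32))
    mask = torch.full_like(q_values, float("-inf"))
    mask[valid_action_ids] = 0.0
    return int(torch.argmax(q_values + mask).item())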
This DQN bot uses a replay buffer, a memory system that records gameplay experiences as (state, action, reward, next state) tuples. These experiences capture a diverse range of scenarios encountered during the game. After each game, the replay buffer is sampled randomly, and the neural network is trained on these samples. This approach reduces correlations in the training data and ensures that the bot learns from both successes and failures, making training more efficient.

The bot's training process updates the neural network weights by minimising the error between the target and predicted Q-values, where the targets are computed using the Bellman equation. This iterative improvement allows the bot to adapt its strategies over time, aligning its decision-making with long-term rewards. By leveraging the replay buffer and the masking mechanism, the bot can handle the complex and dynamic game of Yinsh, demonstrating its capacity for learning and adaptation.
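For reference, the standard one-step DQN target implied by the Bellman equation, together with the loss minimised over a batch sampled from the replay buffer, can be written as follows (a generic formulation; the discount factor \gamma and the target-network parameters \theta^- are standard DQN notation rather than values taken from our implementation):

y_i = r_i + \gamma \max_{a'} Q(s'_i, a'; \theta^-), \qquad L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \big( y_i - Q(s_i, a_i; \theta) \big)^2,

with y_i = r_i when s'_i is a terminal state.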
The moves are then ordered by the probability of leading the bot to a win, as estimated from its training. After this, we run the mini-max search on these ordered moves, giving us more information than a simple mini-max search based on a general evaluation function.

We implemented this behaviour by creating a Reward class. This class handles the training and incrementally adjusts the direction weights, which the bot then uses in its mini-max tree. This mini-max tree is implemented in the same way as in the Alpha-Beta Bot.
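A minimal sketch of this idea, reusing the alpha_beta search from the earlier example, is shown below. The reward_model interface (win_probability) is an assumed, illustrative API and not the exact Reward class from our code.

def order_moves_by_learning(state, moves, reward_model):
    # Sort candidate moves by the trained model's estimated probability of
    # leading to a win, so the search explores promising moves first.
    return sorted(moves,
                  key=lambda m: reward_model.win_probability(state, m),
                  reverse=True)

def hybrid_best_move(state, depth, player, reward_model):
    # Learned ordering first; the same alpha-beta search as before then
    # evaluates each ordered candidate to pick the actual move.
    ordered = order_moves_by_learning(state, state.legal_moves(), reward_model)
    best_value, best_move = float("-inf"), None
    for move in ordered:
        value, _ = alpha_beta(state.apply_move(move), depth - 1,
                              float("-inf"), float("inf"),
                              player, maximising=False)
        if value > best_value:
            best_value, best_move = value, move
    return best_move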
3 Experiments
4 Results
Figure 7: Pie chart of the win rates of Deep Q-Learning Bot vs. Deep Q-Learning Bot. The colour red represents White's win rate and is equal to 31%. The colour green represents Black's win rate and is equal to 53%. The colour blue represents the percentage of draws, equal to 36%.

5 Discussion

The aim of our study was to compare various Artificial Intelligence strategies and focus on their performance in the game of Yinsh. These strategies included Random Bots, Alpha-Beta Bots, and a Deep Q-Learning Bot. Looking at the results of our research, we can confidently say that there are patterns in the gameplay.

For instance, our findings showed that the Alpha-Beta Bot consistently outperformed the Random Bot, winning 93% of the time. In contrast, the Random Bot won no games, and the two came to a draw only 7% of the time. Moreover, the Alpha-Beta Bot came to a draw only 2% of the time and showed very little difference between starting first or second (white or black). The fact that the Alpha-Beta Bot draws so few games shows the effectiveness of adversarial search techniques in open-ended strategy games like Yinsh. This shows the potential of implementing strategic algorithms in complex games, especially games where fast tactical decisions are inevitable.

However, implementing our machine learning techniques, particularly the Deep Q-Learning Bot, did not yield similarly good results. The Deep Q-Learning Bot managed to win 32% of the games, lost 3.5% of them, and drew 64.5% of the time. This limitation in the bot's performance resulted from the Deep Q-Learning Bot not having enough time to obtain a substantial amount of training data. Deep Q-Learning requires more time and data, which we unfortunately could not provide, resulting in a performance worse than the Alpha-Beta Bot's.

This leads us to conclude that we may have oversimplified our approach given the task. We hope that acknowledging these shortcomings will improve future research on this subject. Our suggestions for future research are, therefore, refining the learning environment, enhancing the training datasets, and exploring other reinforcement learning techniques that are better at handling the complex environment of the game Yinsh.

Even though our research has limitations, it also has several strengths. One notable strength was our tailored approach to the Alpha-Beta Bot in the game of Yinsh, which made it perform well despite Yinsh being a very complex game. Another strength is our use of a policy gradient in our Deep Q-Learning Bot, which allows us to see how learning strategies can adapt in our game of Yinsh. The findings of this research are interesting yet align with existing literature, and they confirm the strengths of adversarial search techniques, as concluded in previous studies [2], [6]. Interestingly, our machine learning bot did not work as well as the literature would suggest. While classic techniques like adversarial search performed reliably and even excelled, our attempt at machine learning did not surpass the classic techniques. However, this is not unexpected, since machine learning requires a lot of training, which we unfortunately lacked in our research. More training would yield better results for the machine learning bot.

6 Conclusion

This study investigated the performance of several artificial intelligence techniques in the strategic board
game of Yinsh. These techniques are Machine Learning, Adversarial Search, and Hybrid approaches. By establishing a Random Bot as a baseline, we could analyse the performance of the more advanced techniques in terms of their adaptability and innovation in a strategic environment. Our findings show that while Machine Learning bots show enhanced adaptability, they do not match Adversarial Search bots, which excel in immediate tactical decisions. Hybrid approaches promise a combination that draws on both of these benefits.

To conclude, we can emphasise the potential of AI capabilities in complex strategic environments if they are further improved. The importance of selecting an appropriate AI technique for specific needs is highlighted by utilising different approaches and demonstrating that these approaches excel in distinct aspects of gameplay. This insight can inform the future development of AI systems and mechanisms for competitive gameplay in strategic games.

Future research could explore developing and implementing better hybrid techniques and assess their effectiveness in various game scenarios. Investigating the impact of more diverse training datasets for the Machine Learning bot could reveal insights into their development. Exploring other AI methods may also yield insights that further push the boundaries of AI capabilities in complex games like Yinsh.

References

[1] Various Authors. Initial Ring Placement Strategy. Mar. 2012. Accessed: 2025-01-21. url: https://boardgamegeek.com/thread/30384/initial-ring-placement-strategy.

[2] Ze'ev Ben-Porat and Martin Charles Golumbic. "Learning from experience in board games". In: Advances in Artificial Intelligence (1990), pp. 1–25. doi: 10.1007/978-1-4613-9052-7_1.

[3] Thomas A. Berrueta, Allison Pinosky, and Todd D. Murphey. "Maximum Diffusion Reinforcement Learning". In: Nature Machine Intelligence 6.5 (May 2024), pp. 504–514. doi: 10.1038/s42256-024-00829-3.

[4] R. Cant, J. Churchill, and D. Al-Dabass. "A hybrid artificial intelligence approach with application to games". In: Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN'02 (Cat. No.02CH37290) (2002), pp. 1575–1580. doi: 10.1109/ijcnn.2002.1007752.

[5] Niels Justesen et al. Deep Learning for Video Game Playing. 2019. arXiv: 1708.07902 [cs.AI]. url: https://arxiv.org/abs/1708.07902.

[6] Richard E. Korf. "Multi-player alpha-beta pruning". In: Artificial Intelligence 48.1 (Feb. 1991), pp. 99–111. doi: 10.1016/0004-3702(91)90082-u.

[7] Jonathan Schaeffer et al. "Checkers is solved". In: Science 317.5844 (Sept. 2007), pp. 1518–1522. doi: 10.1126/science.1144079.