Study on Evaluation Function Design of Mahjong
using Supervised Learning
Hokkaido University
Graduate School of Information Science and Technology
Harmonious Systems Engineering Laboratory
Yeqin Zheng
1
Background
• Perfect information games
– 1997 -- Deep blue vs. world champion on chess
– 2007 -- Quackle vs. world champion on scrabble
– 2016 -- AlphaGo vs. world champion on Go
• Monte Carlo tree search theory
• Deep learning method for pre-train network
– AlphaGo Zero vs. AlphaGo on Go
• Deep learning method
• Reinforcement learning
• Imperfect information games
– Uncertainty
– Randomness
– Complex rules
– Difficult for simulation
2
Previous Research's Model
• Naoki Mizukami and Yoshimasa Tsuruoka: Building a Computer
Mahjong Player Based on Monte Carlo Simulation and
Opponent Models. Proceedings of the 2015 IEEE Conference
on Computational Intelligence and Games (CIG 2015), pp. 275-
283, Aug. 2015.
• Monte Carlo tree search to simulate opponents' moves
• Prediction of game states
3
Purpose
• This study applies supervised learning theory and
deep learning methods to an imperfect
information game -- Mahjong.
• Improvements:
– New feature engineering
• Improves the training results of the networks
– A new discard method
• Improves aggressiveness during games
4
Introduction of Mahjong Rule
• Mahjong tiles come in 4 types with 34 distinct tiles;
each tile has 4 copies, 136 tiles in total.
5
Hand:
the tiles a player holds
River:
discarded tiles
Dora tile
Mountain:
invisible tiles
remaining to be drawn
Meld: open hands
Goal of Mahjong
• The goal of mahjong is to arrange a winning hand into a special
format.
• There are two ways to earn points:
6
Tsumo: draw the last tile from the mountain and earn
points from all other players
Ron: take another player's last discarded tile
and earn points from that player
Difficulty & Approach
• Difficulty
– An imperfect information game has far more states than
a perfect information game.
• It is almost impossible to encounter the same game state twice
across games.
– Randomness and uncertainty fill the entire game
process.
• Approach
– Divide the moves during games into several types.
– Use multiple networks and methods to make different
moves in different states.
7
Introduction of Tenhou.net
Tenhou is one of the most popular online mahjong
services in Japan.
• 4,870,311 registered users in total.
• About 5,000 players
online at the same time.
• Our training data all come
from the "houou" table.
8
Tenhou.net → Model: game states
Model → Tenhou.net: decisions
Introduction of Tenhou.net's API 9

Data from Tenhou.net:
| Message | Meaning | Example |
| T/U/V/W (+ ID) | A player draws a tile; T/U/V/W is the player's position, ID is the tile ID from 0 to 135 | T123: the dealer draws a North. V: the player in the west position draws a tile |
| D/E/F/G + ID | A player discards a tile; D/E/F/G is the player's position, ID is the tile ID from 0 to 135 | E123: the player in the south position discards a North |
| Reach who="position" | A player calls riichi | Reach who="2": the player in the west position calls riichi |
| N who="position" m="meld" | A player calls a meld | N who="3" m="34567": the player in the north position calls a meld |
| Agari | A player calls a win, with his hand, the point changes, the waiting tiles, the yaku, and who loses points | |
| Ryuukyoku | The round ends without a winner, with the point changes | |

Data to Tenhou.net:
| Message | Meaning | Example |
| T + ID | Discard the tile with the given tile ID | T123: you discard a North |
| Reach who="0" | Call riichi | |
| N who="0" m="meld" | Call a meld | N who="0" m="34567" |
| Agari | Call a win | |
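As a rough illustration, the message formats above can be turned into event records. The sketch below is our own minimal parser, not part of any official Tenhou client library; the function and field names are ours.

```python
# Minimal sketch of parsing Tenhou.net messages into event dicts.
# Tag letters (T/U/V/W, D/E/F/G, Reach, N, Agari, Ryuukyoku) follow
# the table above; everything else here is an assumption.
import re

DRAW_TAGS = "TUVW"      # draw by player 0..3
DISCARD_TAGS = "DEFG"   # discard by player 0..3

def parse_message(msg: str) -> dict:
    """Turn one raw message like 'T123' or 'N who="3" m="34567"' into an event."""
    m = re.match(r"^([TUVWDEFG])(\d*)$", msg)
    if m:
        tag, tile = m.groups()
        if tag in DRAW_TAGS:
            return {"event": "draw",
                    "player": DRAW_TAGS.index(tag),
                    "tile": int(tile) if tile else None}  # bare tag: hidden draw
        return {"event": "discard",
                "player": DISCARD_TAGS.index(tag),
                "tile": int(tile)}
    if msg.startswith("Reach"):
        who = re.search(r'who="(\d)"', msg)
        return {"event": "riichi", "player": int(who.group(1))}
    if msg.startswith("N "):
        who = re.search(r'who="(\d)"', msg)
        meld = re.search(r'm="(\d+)"', msg)
        return {"event": "meld", "player": int(who.group(1)),
                "meld": meld.group(1)}
    if msg.startswith("Agari"):
        return {"event": "win"}
    if msg.startswith("Ryuukyoku"):
        return {"event": "draw_round"}
    return {"event": "unknown", "raw": msg}
```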
Process of Decision Making 10
1. Player draws a tile.
2. System: win check → Yes: player calls a win (tsumo).
3. System: riichi check → Yes: player calls riichi and discards.
4. Player decides a tile to discard, and discards it.
5. System: win check on the discard → Yes: an opponent calls a win (ron).
6. The next player's turn begins.
Introduction of Related Terminology
• Waiting/Tenpai: one or more players have nearly completed
winning hands and are waiting for the last tile to earn points.
• N-shanten: the hand needs n effective tiles to become a
winning hand and enter the waiting state.
11
Aggressive Move
• Two types of game states
– No one is in waiting (attack route): discard a tile that brings the hand closer to a
winning hand and earns points, which may decrease the shanten number.
– Someone may be in waiting (defense route):
• Aggressive move: the player chooses a tile that may decrease the shanten number but is
unsafe in the current game state; it may cost the player points, because other players
may already be in waiting and waiting for this tile.
• Safe move: discard a tile with less danger of losing points and give up on winning; this
may move the hand away from a winning hand and increase the shanten number.
12
In this case, player D has called riichi
(someone is in waiting):
- Aggressive move: discard a tile to
enter the waiting state, which may
lead to losing points.
- Safe move: discard a tile that player
D has already discarded, which will
increase the shanten number.
Without aggressive moves:
- The model always folds.
- It is difficult to make a winning hand.
Model Details -- Networks 13

Input: a 6*6*107(108) feature map.

Attack route (WR ≀ threshold):
• Discard network (DR): probabilities of the 34 tiles being discarded
→ choose a tile to discard.

Defense/fold route (WR > threshold):
• Waiting-or-not network (WR): opponents' waiting probability.
• Waiting-tile network (WTR): probabilities of the 34 tiles being waited on.
• Discard network (DR): probabilities of the 34 tiles being discarded.
• Lose-point network (LP): probability of the points that may be lost
for each tile in hand.
→ choose a tile to discard.
Model Details -- Networks
• No one is in waiting
– Take the maximum of the discard network's output.
• Someone may be in waiting
– Take the minimum of the lose-point expectation (LPE):
– LPE_i = WR * (DR_i *) WTR_i * LP_i,
where i is a tile ID in the hand. To increase aggressive
moves, the discard network's output DR_i is multiplied into the LPE.
• The threshold for switching modes
– Collect game states in which a player is in
waiting.
– Use the waiting-or-not network to calculate the
probability for these game states.
– Take the average of the outputs, which is 0.245.
14
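A minimal sketch of this selection rule, with NumPy arrays standing in for the network outputs; the function and argument names are ours, and the LPE formula is applied per tile in hand exactly as written above.

```python
import numpy as np

THRESHOLD = 0.245  # average WR output over states with a player in waiting (from the slide)

def choose_discard(wr, dr, wtr, lp, hand_tiles, aggressive=True):
    """Pick a tile kind (0..33) to discard.

    wr: scalar waiting probability from the waiting-or-not network;
    dr, wtr, lp: length-34 outputs of the discard, waiting-tile, and
    lose-point networks; hand_tiles: tile kinds currently in hand."""
    if wr <= THRESHOLD:
        # Attack route: take the maximum of the discard network's output.
        return max(hand_tiles, key=lambda i: dr[i])
    # Defense route: minimize LPE_i = WR * (DR_i *) WTR_i * LP_i.
    # The DR_i factor is the optional term from the slide, multiplied
    # in to increase aggressive moves.
    def lpe(i):
        v = wr * wtr[i] * lp[i]
        return v * dr[i] if aggressive else v
    return min(hand_tiles, key=lpe)
```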
Model Details -- Feature Engineering
• A matrix with strong connections between adjacent nodes
performs better for a convolutional neural network (CNN).
• Model each non-repeating tile as a point in a vector space.
• Map the vector space onto a 6 * 6 matrix base.
15
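The slide does not give the exact tile-to-cell assignment, so the sketch below assumes a simple row-major layout of the 34 tile kinds into the 6 * 6 grid, with the last two cells as zero padding; the function name and layout are ours.

```python
import numpy as np

def tiles_to_matrix(counts):
    """counts: length-34 vector of tile counts (e.g. man 0-8, pin 9-17,
    sou 18-26, honors 27-33). Returns a 6x6 matrix; the final two cells
    are zero padding. The row-major placement is our assumption -- the
    slide only states that a 6*6 matrix base is used."""
    grid = np.zeros(36, dtype=np.float32)
    grid[:34] = counts
    return grid.reshape(6, 6)
```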
Features in Feature Map
Feature map:
• hands, 4 layers
• river, 4 layers
• each turn's moves, 24 * 4 layers
• dora tiles, 1 layer
• invisible tiles, 1 layer
• closed hand, 1 layer
• (discarded tile, 1 layer)
16
The 107-layer feature map does not
include the discarded-tile feature.
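Using the layer counts above (4 + 4 + 96 + 1 + 1 + 1 = 107, plus one optional discard layer for the lose-point network), the map can be assembled as follows; the function and argument names are ours.

```python
import numpy as np

def build_feature_map(hands, river, moves, dora, invisible, close_hand,
                      discard=None):
    """Stack per-feature 6x6 planes into the 6*6*107 (or 108) map.

    hands: (4, 6, 6)   river: (4, 6, 6)   moves: (96, 6, 6)  # 24 turns * 4 players
    dora, invisible, close_hand: (6, 6)
    discard: (6, 6) or None -- only the lose-point network uses this layer."""
    layers = [hands, river, moves,
              dora[None], invisible[None], close_hand[None]]
    if discard is not None:
        layers.append(discard[None])
    fmap = np.concatenate(layers, axis=0)      # (107 or 108, 6, 6)
    return np.transpose(fmap, (1, 2, 0))       # (6, 6, 107/108), as on the slide
```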
Networks Details 17

| Network | Content | Output | Data amount |
| WR (waiting-or-not network) | predicts the probability that other players are waiting | a probability (from 0 to 1) that some other player is in waiting | 300,000 |
| WTR (waiting-tiles network) | predicts the probabilities of the tiles that others may wait for | a list of 34 probabilities measuring how dangerous each tile is | 4,000*34 |
| DR (discard network) | predicts which tile in hand a high-level mahjong player would discard | a list of the 34 tiles' probabilities of being discarded | 100,000*34 |

Training data:
Waiting-or-not network:
• Input: 107-layer feature map
• Output: 1 if someone is in waiting, 0 if no one is in waiting
Waiting-tiles network:
• Input: 107-layer feature map
• Output: 1 for waited tiles, 0 for other tiles
(Example: a hand in waiting, waiting for 1s and 4s.)
Networks Details 18

| Network | Content | Output | Data amount |
| LP (lose-point network) | predicts how many points will be lost if a tile is discarded | a list of 6 probabilities over the number of han in another player's hand if he wins the round | 16,500*6 |

Training data:
Lose-point network:
• Input: 108-layer feature map
• Output: the loss for the discarded tile
Networks Details 19

Input layer: 6*6*107(108) feature map.
Hidden layers (7 convolutional layers in total):

| Number of kernels | Kernel size | Edge padding | Activation |
| 512 | 4*4 | same | relu |
| 512 | 3*3 | same | none |
| 512 | 2*2 | same | relu |
| Dropout |
| 256 | 2*2 | same | none |
| 256 | 3*3 | same | relu |
| Dropout |
| 128 | 3*3 | same | none |
| 128 | 2*2 | same | relu |
| Dropout |

Output layer: fully connected.
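The table above can be written down as a plain layer specification. Stride and dropout rate are not stated on the slide, so stride 1 and a rate of 0.5 below are assumptions, and the names are ours; with "same" padding and stride 1, each layer preserves the 6x6 spatial size.

```python
# The seven hidden convolutional layers from the table, as a spec list.
# Dropout positions follow the table; the 0.5 rate is an assumption.
CONV_LAYERS = [
    dict(filters=512, kernel=4, padding="same", activation="relu"),
    dict(filters=512, kernel=3, padding="same", activation=None),
    dict(filters=512, kernel=2, padding="same", activation="relu", dropout=0.5),
    dict(filters=256, kernel=2, padding="same", activation=None),
    dict(filters=256, kernel=3, padding="same", activation="relu", dropout=0.5),
    dict(filters=128, kernel=3, padding="same", activation=None),
    dict(filters=128, kernel=2, padding="same", activation="relu", dropout=0.5),
]

def output_shape(input_shape=(6, 6, 107)):
    """With 'same' padding and stride 1, every conv keeps the 6x6
    spatial size; only the channel count changes."""
    h, w, c = input_shape
    for layer in CONV_LAYERS:
        c = layer["filters"]
    return (h, w, c)
```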
Final Accuracy of Each Network 20

| Network | Accuracy |
| Waiting-or-not network | 82.7% |
| Waiting-tiles network | 40.2% |
| Lose-point network | 88.7% |
| Discard network | 88.4% |

The waiting-tiles network's
accuracy is only 40.2%
because a prediction counts
as correct only when the tile
with the maximum output is
actually being waited on.
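A sketch of this evaluation criterion as we read it from the note above: a sample counts as correct only when the single highest-scoring tile is truly among the waited tiles, which is a strict standard and explains the low figure. The function name is ours.

```python
import numpy as np

def wtr_accuracy(pred, waited):
    """pred: (N, 34) predicted waiting probabilities;
    waited: (N, 34) 0/1 labels of the actually-waited tiles.
    A sample is correct only if the argmax tile is truly waited on."""
    top = np.argmax(pred, axis=1)                  # highest-scoring tile per sample
    hits = waited[np.arange(len(pred)), top]       # 1 where that tile is waited
    return hits.mean()
```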
Experiment and Result
• Comparison of the three models in our experiment
21

| Model | Game state (defense trigger) | Attack route | Defense route |
| Best choice algorithm (BCA) | an opponent calls riichi or opens a hand with more than three melds | choose the tile that brings the hand closest to a winning hand | choose a tile that the in-waiting player has already discarded |
| BCA's attack mode + deep model for defense | the deep model predicts that someone may be in waiting | choose the tile that brings the hand closest to a winning hand | choose the tile that leads to the least loss |
| Deep model | the deep model predicts that someone may be in waiting | imitate expert players' discards based on the current game state | choose the tile that leads to the least loss |
Experiment and Result
• We performed 60 games with each model on the "Ippan" table,
in which every player can participate.
22

Ippan table (avg. level 1.5):
| Model | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move |
| BCA | 27% | 30% | 25% | 18% | 24% | 11% | 14% |
| BCA + defense model | 17% | 28% | 45% | 10% | 18% | 8% | 0% |
| Deep model | 22% | 27% | 33% | 18% | 20% | 9% | 8% |
| Players' average (Tenhou) | 20% | 23% | 27% | 30% | 20% | 19% | - |

Green: worst performance. Red: best performance.
Experiment and Result
• We performed 100 games with each model on the "Joukyuu"
table.
23

Joukyuu table (avg. level 11.75):
| Model | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move |
| BCA | 19% | 23% | 30% | 28% | 16% | 18% | 12% |
| BCA with deep model | 22% | 28% | 33% | 17% | 17% | 8% | 1% |
| Deep model | 24% | 29% | 27% | 20% | 21% | 11% | 7% |
| Players' average (Tenhou) | 25% | 25% | 25% | 25% | 23% | 15% | 17% |

Green: worst performance. Red: best performance.
Competition Between Each Model 24

We performed 20 games for each pairing in a 1-vs-3 setting.

| 1st/2nd/3rd/4th | vs 1 BCA | vs 1 BCA + defense model | vs 1 Deep model |
| 3 BCA | - | 2/6/9/3 | 3/7/6/4 |
| 3 BCA + defense model | 6/4/5/5 | - | 5/6/5/4 |
| 3 Deep model | 4/5/5/6 | 4/7/7/2 | - |

The result table shows that:
• BCA
• is good at attack
• is easy to defend against
• BCA + defense model
• is great in defense
• makes fewer aggressive moves
• Deep model
• is good in defense
• is balanced between defense and offense
Comparison Between Discard Methods
• The two discard methods showed different performance during the
experiment, so we compare them here.
• It is easier to infer the non-deep-learning AI's state and which
tiles it is waiting for.
• The deep model plays more like a human player than the non-deep-
learning AI in attack, as its top rate and win rate show.
25

| Discard method | Times in waiting | Waiting rate | Waiting prediction | Waiting-tiles prediction |
| BCA | 438 | 53.94% | 91.32% | 57.53% |
| Discard model | 411 | 49.58% | 83.43% | 39.90% |
Conclusion
• The deep model in this study performs well
in Mahjong games.
– A high 2nd-place rate.
– Aggressive moves.
• The new feature engineering performs well.
• When the model predicts that someone is in waiting,
its performance is better than the human players' average.
• It is possible to build a better multi-network model based
on this experiment.
Thank you for listening.
26
Research performance
・Information Processing Society of Japan
1) Yeqin Zheng, Soichiro Yokoyama, Tomohisa Yamashita,
Hidenori Kawamura: Study on Evaluation Function Design of Mahjong
using Supervised Learning, Special Interest Group (SIG) Technical Report, Vol. 194,
Hokkaido (2019)
27
