Study on Evaluation Function Design of Mahjong
using Supervised Learning
Hokkaido University
Graduate School of Information Science and Technology
Harmonious Systems Engineering Laboratory
Yeqin Zheng
1
Background
• Perfect information games
– 1997 -- Deep blue vs. world champion on chess
– 2007 -- Quackle vs. world champion on scrabble
– 2016 -- AlphaGo vs. world champion on Go
• Monte Carlo tree search theory
• Deep learning method for pre-train network
– AlphaGo Zero vs. AlphaGo on Go
• Deep learning method
• Reinforcement learning
• Imperfect information games
– Uncertainty
– Randomness
– Complex rules
– Difficult for simulation
2
Previous Research's Model
• Naoki Mizukami and Yoshimasa Tsuruoka: Building a Computer
Mahjong Player Based on Monte Carlo Simulation and
Opponent Models. Proceedings of the 2015 IEEE Conference
on Computational Intelligence and Games (CIG 2015), pp. 275-
283, Aug. 2015.
• Monte Carlo tree search to simulate opponents' moves
• Prediction of game states
3
Purpose
• This study applies supervised learning theory and
deep learning methods to an imperfect
information game -- Mahjong.
• Improvements:
– New feature engineering
• Improves the training results of the networks
– A new discard method
• Improves aggressiveness during games
4
Introduction of Mahjong Rule
• Mahjong tiles come in 4 types with 34 distinct tiles;
each tile has 4 copies, 136 tiles in total.
5
Hand:
the tiles a player holds
River:
discarded tiles
Dora tile
Mountain:
invisible tiles
remaining to be drawn
Meld: open hands
Goal of Mahjong
• The goal of mahjong is to arrange a winning hand into a special
format.
• There are two ways to earn points:
6
Tsumo: draw the last tile from the mountain and earn
points from all other players
Ron: take another player's last discarded tile
and earn points from that player
Difficulty & Approach
• Difficulty
– An imperfect information game has far more states than
a perfect information game.
• It is almost impossible to encounter the same game state twice
across games.
– Randomness and uncertainty fill the entire game
process.
• Approach
– Divide the moves during games into several types.
– Use multiple networks and methods to make different
moves in different states.
7
Introduction of Tenhou.net
Tenhou is one of the most popular online mahjong
services in Japan.
• 4,870,311 registered users in total.
• About 5,000 players
online at the same time.
• Our training data all come
from the "houou" table.
8
Tenhou.net → Model: game states
Model → Tenhou.net: decisions
Introduction of Tenhou.net's API 9

Data from Tenhou.net:
| Message | Meaning | Example |
| T/U/V/W (+ ID) | A player draws a tile; T/U/V/W is the player's position, ID is the tile ID from 0 to 135 | T123: the dealer draws a North. V: the player in the west position draws a tile |
| D/E/F/G + ID | A player discards a tile; D/E/F/G is the player's position, ID is the tile ID from 0 to 135 | E123: the player in the south position discards a North |
| Reach who="position" | A player calls riichi | Reach who="2": the player in the west position calls riichi |
| N who="position" m="meld" | A player calls a meld | N who="3" m="34567": the player in the north position calls a meld |
| Agari | A player calls a win, with his hand, the point changes, the waiting tiles, the yaku, and who loses points | |
| Ryuukyoku | The round ends without a winner, with the point changes | |

Data to Tenhou.net:
| Message | Meaning | Example |
| T + ID | Discard the tile with the given tile ID | T123: you discard a North |
| Reach who="0" | Call riichi | |
| N who="0" m="meld" | Call a meld | N who="0" m="34567" |
| Agari | Call a win | |
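As a rough illustration, the message formats above can be turned into event records. The sketch below is our own minimal parser, not part of any official Tenhou client library; the function and field names are ours.

```python
# Minimal sketch of parsing Tenhou.net messages into event dicts.
# Tag letters (T/U/V/W, D/E/F/G, Reach, N, Agari, Ryuukyoku) follow
# the table above; everything else here is an assumption.
import re

DRAW_TAGS = "TUVW"      # draw by player 0..3
DISCARD_TAGS = "DEFG"   # discard by player 0..3

def parse_message(msg: str) -> dict:
    """Turn one raw message like 'T123' or 'N who="3" m="34567"' into an event."""
    m = re.match(r"^([TUVWDEFG])(\d*)$", msg)
    if m:
        tag, tile = m.groups()
        if tag in DRAW_TAGS:
            return {"event": "draw",
                    "player": DRAW_TAGS.index(tag),
                    "tile": int(tile) if tile else None}  # bare tag: hidden draw
        return {"event": "discard",
                "player": DISCARD_TAGS.index(tag),
                "tile": int(tile)}
    if msg.startswith("Reach"):
        who = re.search(r'who="(\d)"', msg)
        return {"event": "riichi", "player": int(who.group(1))}
    if msg.startswith("N "):
        who = re.search(r'who="(\d)"', msg)
        meld = re.search(r'm="(\d+)"', msg)
        return {"event": "meld", "player": int(who.group(1)),
                "meld": meld.group(1)}
    if msg.startswith("Agari"):
        return {"event": "win"}
    if msg.startswith("Ryuukyoku"):
        return {"event": "draw_round"}
    return {"event": "unknown", "raw": msg}
```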
Process of Decision Making 10
1. Player draws a tile.
2. System: win check → Yes: player calls a win (tsumo).
3. System: riichi check → Yes: player calls riichi and discards.
4. Player decides a tile to discard, and discards it.
5. System: win check on the discard → Yes: an opponent calls a win (ron).
6. The next player's turn begins.
Introduction of Related Terminology
• Waiting/Tenpai: one or more players have nearly completed
winning hands and are waiting for the last tile to earn points.
• N-shanten: the hand needs n effective tiles to become a
winning hand and enter the waiting state.
11
Aggressive Move
• Two types of game states
– No one is in waiting (attack route): discard a tile that brings the hand closer to a
winning hand and earns points, which may decrease the shanten number.
– Someone may be in waiting (defense route):
• Aggressive move: the player chooses a tile that may decrease the shanten number but is
unsafe in the current game state; it may cost the player points, because other players
may already be in waiting and waiting for this tile.
• Safe move: discard a tile with less danger of losing points and give up on winning; this
may move the hand away from a winning hand and increase the shanten number.
12
In this case, player D has called riichi
(someone is in waiting):
- Aggressive move: discard a tile to
enter the waiting state, which may
lead to losing points.
- Safe move: discard a tile that player
D has already discarded, which will
increase the shanten number.
Without aggressive moves:
- The model always folds.
- It is difficult to make a winning hand.
Model Details -- Networks 13

Input: a 6*6*107(108) feature map.

Attack route (WR ≀ threshold):
• Discard network (DR): probabilities of the 34 tiles being discarded
→ choose a tile to discard.

Defense/fold route (WR > threshold):
• Waiting-or-not network (WR): opponents' waiting probability.
• Waiting-tile network (WTR): probabilities of the 34 tiles being waited on.
• Discard network (DR): probabilities of the 34 tiles being discarded.
• Lose-point network (LP): probability of the points that may be lost
for each tile in hand.
→ choose a tile to discard.
Model Details -- Networks
• No one is in waiting
– Take the maximum of the discard network's output.
• Someone may be in waiting
– Take the minimum of the lose-point expectation (LPE):
– LPE_i = WR * (DR_i *) WTR_i * LP_i,
where i is a tile ID in the hand. To increase aggressive
moves, the discard network's output DR_i is multiplied into the LPE.
• The threshold for switching modes
– Collect game states in which a player is in
waiting.
– Use the waiting-or-not network to calculate the
probability for these game states.
– Take the average of the outputs, which is 0.245.
14
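A minimal sketch of this selection rule, with NumPy arrays standing in for the network outputs; the function and argument names are ours, and the LPE formula is applied per tile in hand exactly as written above.

```python
import numpy as np

THRESHOLD = 0.245  # average WR output over states with a player in waiting (from the slide)

def choose_discard(wr, dr, wtr, lp, hand_tiles, aggressive=True):
    """Pick a tile kind (0..33) to discard.

    wr: scalar waiting probability from the waiting-or-not network;
    dr, wtr, lp: length-34 outputs of the discard, waiting-tile, and
    lose-point networks; hand_tiles: tile kinds currently in hand."""
    if wr <= THRESHOLD:
        # Attack route: take the maximum of the discard network's output.
        return max(hand_tiles, key=lambda i: dr[i])
    # Defense route: minimize LPE_i = WR * (DR_i *) WTR_i * LP_i.
    # The DR_i factor is the optional term from the slide, multiplied
    # in to increase aggressive moves.
    def lpe(i):
        v = wr * wtr[i] * lp[i]
        return v * dr[i] if aggressive else v
    return min(hand_tiles, key=lpe)
```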
Model Details -- Feature Engineering
• A matrix with strong connections between adjacent nodes
performs better for a convolutional neural network (CNN).
• Model each non-repeating tile as a point in a vector space.
• Map the vector space onto a 6 * 6 matrix base.
15
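The slide does not give the exact tile-to-cell assignment, so the sketch below assumes a simple row-major layout of the 34 tile kinds into the 6 * 6 grid, with the last two cells as zero padding; the function name and layout are ours.

```python
import numpy as np

def tiles_to_matrix(counts):
    """counts: length-34 vector of tile counts (e.g. man 0-8, pin 9-17,
    sou 18-26, honors 27-33). Returns a 6x6 matrix; the final two cells
    are zero padding. The row-major placement is our assumption -- the
    slide only states that a 6*6 matrix base is used."""
    grid = np.zeros(36, dtype=np.float32)
    grid[:34] = counts
    return grid.reshape(6, 6)
```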
Features in Feature Map
Feature map:
• hands, 4 layers
• river, 4 layers
• each turn's moves, 24 * 4 layers
• dora tiles, 1 layer
• invisible tiles, 1 layer
• closed hand, 1 layer
• (discarded tile, 1 layer)
16
The 107-layer feature map does not
include the discarded-tile feature.
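Using the layer counts above (4 + 4 + 96 + 1 + 1 + 1 = 107, plus one optional discard layer for the lose-point network), the map can be assembled as follows; the function and argument names are ours.

```python
import numpy as np

def build_feature_map(hands, river, moves, dora, invisible, close_hand,
                      discard=None):
    """Stack per-feature 6x6 planes into the 6*6*107 (or 108) map.

    hands: (4, 6, 6)   river: (4, 6, 6)   moves: (96, 6, 6)  # 24 turns * 4 players
    dora, invisible, close_hand: (6, 6)
    discard: (6, 6) or None -- only the lose-point network uses this layer."""
    layers = [hands, river, moves,
              dora[None], invisible[None], close_hand[None]]
    if discard is not None:
        layers.append(discard[None])
    fmap = np.concatenate(layers, axis=0)      # (107 or 108, 6, 6)
    return np.transpose(fmap, (1, 2, 0))       # (6, 6, 107/108), as on the slide
```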
Networks Details 17

| Network | Content | Output | Data amount |
| WR (waiting-or-not network) | predicts the probability that other players are waiting | a probability (from 0 to 1) that some other player is in waiting | 300,000 |
| WTR (waiting-tiles network) | predicts the probabilities of the tiles that others may wait for | a list of 34 probabilities measuring how dangerous each tile is | 4,000*34 |
| DR (discard network) | predicts which tile in hand a high-level mahjong player would discard | a list of the 34 tiles' probabilities of being discarded | 100,000*34 |

Training data:
Waiting-or-not network:
• Input: 107-layer feature map
• Output: 1 if someone is in waiting, 0 if no one is in waiting
Waiting-tiles network:
• Input: 107-layer feature map
• Output: 1 for waited tiles, 0 for other tiles
(Example: a hand in waiting, waiting for 1s and 4s.)
Networks Details 18

| Network | Content | Output | Data amount |
| LP (lose-point network) | predicts how many points will be lost if a tile is discarded | a list of 6 probabilities over the number of han in another player's hand if he wins the round | 16,500*6 |

Training data:
Lose-point network:
• Input: 108-layer feature map
• Output: the loss for the discarded tile
Networks Details 19

Input layer: 6*6*107(108) feature map.
Hidden layers (7 convolutional layers in total):

| Number of kernels | Kernel size | Edge padding | Activation |
| 512 | 4*4 | same | relu |
| 512 | 3*3 | same | none |
| 512 | 2*2 | same | relu |
| Dropout |
| 256 | 2*2 | same | none |
| 256 | 3*3 | same | relu |
| Dropout |
| 128 | 3*3 | same | none |
| 128 | 2*2 | same | relu |
| Dropout |

Output layer: fully connected.
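The table above can be written down as a plain layer specification. Stride and dropout rate are not stated on the slide, so stride 1 and a rate of 0.5 below are assumptions, and the names are ours; with "same" padding and stride 1, each layer preserves the 6x6 spatial size.

```python
# The seven hidden convolutional layers from the table, as a spec list.
# Dropout positions follow the table; the 0.5 rate is an assumption.
CONV_LAYERS = [
    dict(filters=512, kernel=4, padding="same", activation="relu"),
    dict(filters=512, kernel=3, padding="same", activation=None),
    dict(filters=512, kernel=2, padding="same", activation="relu", dropout=0.5),
    dict(filters=256, kernel=2, padding="same", activation=None),
    dict(filters=256, kernel=3, padding="same", activation="relu", dropout=0.5),
    dict(filters=128, kernel=3, padding="same", activation=None),
    dict(filters=128, kernel=2, padding="same", activation="relu", dropout=0.5),
]

def output_shape(input_shape=(6, 6, 107)):
    """With 'same' padding and stride 1, every conv keeps the 6x6
    spatial size; only the channel count changes."""
    h, w, c = input_shape
    for layer in CONV_LAYERS:
        c = layer["filters"]
    return (h, w, c)
```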
Final Accuracy of Each Network 20

| Network | Accuracy |
| Waiting-or-not network | 82.7% |
| Waiting-tiles network | 40.2% |
| Lose-point network | 88.7% |
| Discard network | 88.4% |

The waiting-tiles network's
accuracy is only 40.2%
because a prediction counts
as correct only when the tile
with the maximum output is
actually being waited on.
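A sketch of this evaluation criterion as we read it from the note above: a sample counts as correct only when the single highest-scoring tile is truly among the waited tiles, which is a strict standard and explains the low figure. The function name is ours.

```python
import numpy as np

def wtr_accuracy(pred, waited):
    """pred: (N, 34) predicted waiting probabilities;
    waited: (N, 34) 0/1 labels of the actually-waited tiles.
    A sample is correct only if the argmax tile is truly waited on."""
    top = np.argmax(pred, axis=1)                  # highest-scoring tile per sample
    hits = waited[np.arange(len(pred)), top]       # 1 where that tile is waited
    return hits.mean()
```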
Experiment and Result
• Comparison of the three models in our experiment
21

| Model | Game state (defense trigger) | Attack route | Defense route |
| Best choice algorithm (BCA) | an opponent calls riichi or opens a hand with more than three melds | choose the tile that brings the hand closest to a winning hand | choose a tile that the in-waiting player has already discarded |
| BCA's attack mode + deep model for defense | the deep model predicts that someone may be in waiting | choose the tile that brings the hand closest to a winning hand | choose the tile that leads to the least loss |
| Deep model | the deep model predicts that someone may be in waiting | imitate expert players' discards based on the current game state | choose the tile that leads to the least loss |
Experiment and Result
• We performed 60 games with each model on the "Ippan" table,
in which every player can participate.
22

Ippan table (avg. level 1.5):
| Model | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move |
| BCA | 27% | 30% | 25% | 18% | 24% | 11% | 14% |
| BCA + defense model | 17% | 28% | 45% | 10% | 18% | 8% | 0% |
| Deep model | 22% | 27% | 33% | 18% | 20% | 9% | 8% |
| Players' average (Tenhou) | 20% | 23% | 27% | 30% | 20% | 19% | - |

Green: worst performance. Red: best performance.
Experiment and Result
• We performed 100 games with each model on the "Joukyuu"
table.
23

Joukyuu table (avg. level 11.75):
| Model | Top | 2nd | 3rd | 4th | Win rate | Feed rate | Aggressive move |
| BCA | 19% | 23% | 30% | 28% | 16% | 18% | 12% |
| BCA with deep model | 22% | 28% | 33% | 17% | 17% | 8% | 1% |
| Deep model | 24% | 29% | 27% | 20% | 21% | 11% | 7% |
| Players' average (Tenhou) | 25% | 25% | 25% | 25% | 23% | 15% | 17% |

Green: worst performance. Red: best performance.
Competition Between Each Model 24

We performed 20 games for each pairing in a 1-vs-3 setting.

| 1st/2nd/3rd/4th | vs 1 BCA | vs 1 BCA + defense model | vs 1 Deep model |
| 3 BCA | - | 2/6/9/3 | 3/7/6/4 |
| 3 BCA + defense model | 6/4/5/5 | - | 5/6/5/4 |
| 3 Deep model | 4/5/5/6 | 4/7/7/2 | - |

The result table shows that:
• BCA
• is good at attack
• is easy to defend against
• BCA + defense model
• is great in defense
• makes fewer aggressive moves
• Deep model
• is good in defense
• is balanced between defense and offense
Comparison Between Discard Methods
• The two discard methods showed different performance during the
experiment, so we compare them here.
• It is easier to infer the non-deep-learning AI's state and which
tiles it is waiting for.
• The deep model plays more like a human player than the non-deep-
learning AI in attack, as its top rate and win rate show.
25

| Discard method | Times in waiting | Waiting rate | Waiting prediction | Waiting-tiles prediction |
| BCA | 438 | 53.94% | 91.32% | 57.53% |
| Discard model | 411 | 49.58% | 83.43% | 39.90% |
Conclusion
• The deep model in this study performs well
in Mahjong games.
– A high 2nd-place rate.
– Aggressive moves.
• The new feature engineering performs well.
• When the model predicts that someone is in waiting,
its performance is better than the human players' average.
• It is possible to build a better multi-network model based
on this experiment.
Thank you for listening.
26
Research performance
・Information Processing Society of Japan
1) Yeqin Zheng, Soichiro Yokoyama, Tomohisa Yamashita,
Hidenori Kawamura: Study on Evaluation Function Design of Mahjong
using Supervised Learning, Special Interest Group (SIG) Technical Report, Vol. 194,
Hokkaido (2019)
27
