
2019 4th International Conference on Information Technology (InCIT), Bangkok, THAILAND

Thai Named Entity Recognition Using Bi-LSTM-CRF with Word and Character Representation
Suphanut Thattinaphanich, Department of Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Bangkok, Thailand ([email protected])
Santitham Prom-on, Department of Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, Bangkok, Thailand ([email protected])

Abstract—Named Entity Recognition (NER) is a handy tool for many natural language processing tasks that identifies and extracts unique entities such as persons, locations, organizations and times. In English and Chinese, NER has been thoroughly researched and can be applied in practical settings. Its development in Thai is still limited because of scarce resources and language difficulties such as the lack of boundary indicators for words, phrases and sentences. In this paper, we present an application of Bi-LSTM-CRF with word/character-level representation to address this problem. Firstly, we prepared the texts by tokenizing each sentence into words. We then prepared word representations and Bi-LSTM character representations. Finally, we built a recurrent neural network combined with a CRF to learn the tag sequence and perform NER. Our model was evaluated on the open-source NER corpus from the Facebook group ThaiNLP and yielded precision, recall, and F1 of 91.79%, 91.51% and 91.65% respectively.

Index Terms—Named Entity Recognition, Recurrent Neural Network, Bi-LSTM, Conditional Random Field, Thai Language

I. INTRODUCTION

Named Entity Recognition (NER), also known as entity extraction, is one of the essential elements of Natural Language Processing (NLP). It is used in information extraction to identify and segment named entities and classify them into several predefined classes. In widely spoken languages such as English and Chinese, many studies have investigated various algorithms for this task. Nevertheless, Thai named entity recognition is limited by the characteristics of the language. Thai has no orthographic information or boundary indicators to separate words, phrases or sentences, which makes it challenging for a machine learning model to learn from hand-engineered features.

Given this limitation, only a few studies exist for the Thai named entity recognition task. In the past decade, the most common methods were machine learning approaches similar to those used for English and other languages. One successful strategy was the Conditional Random Field (CRF) model, which provided excellent outcomes [1][2]. Recently, deep learning has come to supersede conventional methods in multiple fields, including image recognition and natural language processing, due to its ability to generalize from large datasets and the increase in computational power. In this paper, we present a neural network architecture, Bidirectional Long Short-Term Memory with a Conditional Random Field (Bi-LSTM-CRF), implemented with word and character representations for the named entity recognition task. With the nature of the Bi-LSTM, the model can learn information from both the preceding and the following words in a sentence. In addition, the model absorbs information about each word from word and character representations.

II. RELATED WORK

For the Thai language, NER research is still in an early state, and only a few works have addressed the task. Charoenpornsawat et al. (1998) [4] used context words, part of speech (POS) and heuristic rules as features, then applied the Ripper and Winnow algorithms to classify entity names, reaching 92.17% accuracy. Chanlekha et al. (2004) [5] used maximum entropy with heuristic information from rules and word co-occurrences to extract named entities; their experiment yielded an accuracy of approximately 87.7%, covering person names (90.44%), organizations (89.87%) and locations (82.16%). Tirasaroj et al. (2009) [2] applied conditional random fields to Thai NER and studied the tagging patterns and factors affecting the model. The results on the InterBEST 2009 news corpus showed that the pattern had some effect on the system; the patterns compared were BOI (Begin, Inside, Outside) and BOIE (Begin, Inside, Outside, End) for Person, Location and Organization entities. For token-level evaluation, the most accurate pattern was BOI, which achieved an 86.5% F1 score. Saetiew et al. [6] used the likelihood probability of tokenized words to identify person entities, with an 85.15% F1 score on the InterBEST 2009 news corpus. Phatthiyaphaibun [7] improved on Tirasaroj et al. [2] by adding part of speech as a feature to the CRF model, obtaining a slight improvement at an 86.9% F1 score.

For widely studied languages, NER is one of the fundamental research fields in NLP. Much of the research achieves over 90% accuracy with state-of-the-art algorithms from the 2010s, combining machine learning and deep learning with other techniques.




Santos et al. (2014) [8] applied the character-level embedding idea to neural networks and compared word-level and character-level embeddings as features, reaching F1 scores of 82.21% for Spanish and 71.23% for Portuguese. Chiu et al. (2015) [9] presented a Bi-directional LSTM-CNN architecture for named entity recognition; combining the Bi-LSTM-CNN with word embeddings, character embedding features and lexicons, they reached an F1 score of 91.62% on the CoNLL-2003 dataset. Ma and Hovy (2016) [10] constructed a Bi-directional LSTM-CNN-CRF. Their model computed word-level representations using GloVe [11] embeddings and character-level representations using a convolutional neural network; the inputs were fed into a Bi-LSTM and the output vectors were passed through a CRF layer to compute the tags in the last phase. Their best result on the CoNLL-2003 shared task was a 91.21% F1 score. Similar to Ma and Hovy [10], Lample et al. (2016) [12] built a Bi-LSTM-CRF with word and character representations; they, however, used an LSTM to form the character representation. Their result on the CoNLL-2003 test set was a 90.94% F1 score.

From these results, it is clear that deep learning methods should also be applied to this task for Thai. Therefore, we constructed a deep learning architecture including word representation, character representation, Bi-directional Long Short-Term Memory and CRF components to solve the named entity recognition problem.

III. MODEL COMPONENTS

A. Tokenization

Thai is a run-on language which lacks indicators that demarcate word boundaries; all words in a document are written contiguously, so texts need to be segmented first. There are many approaches to separating a sentence into words, including algorithm-based, dictionary-based, machine learning-based and deep learning-based approaches. Our word tokenization method is dictionary-based word segmentation combined with the maximal matching algorithm and Thai Character Clusters, called newmm [13]. Since the NER task may not require highly accurate tokenization (the BOI tags handle boundary issues), we selected newmm because it delivers good accuracy with little processing time.
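A minimal sketch of this preprocessing step, assuming the PyThaiNLP package (which exposes the newmm engine through its word_tokenize function); the example sentence and the segmentation shown in the comment are illustrative:

from pythainlp.tokenize import word_tokenize

# Segment a run-on Thai sentence into words with the dictionary-based
# maximal-matching "newmm" engine before any NER tagging is applied.
sentence = "นายกรัฐมนตรีเดินทางไปเชียงใหม่"
tokens = word_tokenize(sentence, engine="newmm")
print(tokens)   # e.g. ['นายกรัฐมนตรี', 'เดินทาง', 'ไป', 'เชียงใหม่']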
B. Bi-directional LSTM

Recurrent Neural Networks [14] are a powerful and robust type of neural network. The idea of the architecture is to model sequential information, where the output loops back into the network. An RNN can be regarded as a short-term memory because it remembers the output that feeds back to itself, so the network has two significant inputs: the present input and the recent past. Unfortunately, if the relevant information is far away in the sequence, the RNN is unable to learn it; this problem is called long-term dependencies.

Long Short-Term Memory networks [15] are a particular class of RNN whose architecture enables the mechanism to learn long-term dependencies. The cell state allows information to flow from the previous state and is updated every time it passes to the next state. The formulae of an LSTM unit at time t are:

i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)
f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)
\tilde{c}_t = \tanh(W_c h_{t-1} + U_c x_t + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)
h_t = o_t \odot \tanh(c_t)

where \sigma is the element-wise sigmoid function and \odot is the element-wise product. x_t is the input vector (word embedding and character embedding) at time t, and h_t is the hidden state vector at time t, which stores information from the current and past states. U_i, U_f, U_c, U_o are the weight matrices for the input x_t; W_i, W_f, W_c, W_o are the weight matrices for the hidden state h_t; and b_i, b_f, b_c, b_o are the bias vectors.

A Bi-directional LSTM is an ordinary LSTM network with an additional layer that passes the data in the backward direction. This idea is popular in text processing because the model is trained on complete sentences, so the technique provides both past and future information: instead of learning the context only from the preceding words, a bidirectional LSTM lets the model use both the preceding and the following words.
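As a small illustration of how a bidirectional layer supplies both past and future context, the following Keras sketch wraps an LSTM in a Bidirectional layer over an embedded word sequence. The input sizes reuse the values reported in Section IV (250 words per document, a 60,000-word vocabulary, 400-dimensional embeddings); the 64 hidden units are illustrative, not the paper's setting:

from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Bidirectional

MAX_LEN, VOCAB_SIZE, EMB_DIM = 250, 60000, 400

tokens = Input(shape=(MAX_LEN,))                    # word indices for one document
embedded = Embedding(VOCAB_SIZE, EMB_DIM)(tokens)   # word vectors
# One LSTM reads left-to-right, a second reads right-to-left; their hidden
# states are concatenated at every time step, so each position sees both
# the preceding and the following context.
contextual = Bidirectional(LSTM(64, return_sequences=True))(embedded)
encoder = Model(tokens, contextual)
encoder.summary()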
C. Word-Level Representation

Word embeddings are vector representations in a low-dimensional space. These vectors are generated from a large body of text such as Wikipedia, so they contain information that represents the contextual similarities between words. There are several ways to generate word embeddings, such as word2vec [16], GloVe [11] or the embedding layer of a pre-trained language model. For our model, we used the word embedding layer from ULMFit [17]. ULMFit is an effective transfer learning method for building a language model that can be applied to many NLP tasks; it uses AWD-LSTM [18] as the language model. While pre-training, ULMFit predicts the next word, learning the context from the surrounding words and updating the word vectors. Our word embedding is extracted from a general-domain language model trained on a very large text corpus.

D. Character-Level Representation

Character-level representation is a word embedding that contains information from the word's sequence of characters. In general, there are two common ways to generate character-level representations: CNNs and LSTMs. These techniques obtain morphological information from the characters instead of relying on hand-engineered prefix and suffix features. Reimers et al. [19] reported no significant difference between the two for NER, so in this paper we used a Bi-LSTM for this task. Figure 1 shows the Bi-LSTM model that fuses the character representation from the sequence of characters.

Fig. 1. Character-level Representation using Bi-LSTM networks
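A sketch of the character-level encoder of Figure 1, written with Keras, the library named in Section IV. The per-word character length (32) and the 64-dimensional output (32 units per direction) match Section IV; the character-embedding size and the character-vocabulary size used here are assumptions:

from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Bidirectional, TimeDistributed

MAX_LEN, MAX_CHARS = 250, 32   # words per document, characters per word (Section IV)
N_CHARS = 400                  # approximate character vocabulary (about 399 in Section IV)

# Input: for every word position, the indices of its characters.
char_input = Input(shape=(MAX_LEN, MAX_CHARS))
char_emb = TimeDistributed(Embedding(N_CHARS, 32))(char_input)
# A Bi-LSTM is run over the characters of each word independently; its final
# forward/backward states are concatenated into one 64-dimensional vector per word.
char_repr = TimeDistributed(Bidirectional(LSTM(32, recurrent_dropout=0.5)))(char_emb)
char_encoder = Model(char_input, char_repr)
char_encoder.summary()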


E. Conditional Random Field

For sequence labeling tasks, the CRF is one of the best models for predicting a chain of labels by analyzing the relationships between words. For example, in the NER task the label I-PER cannot be followed by the tag I-ORG. Hence, instead of decoding each label independently, we model the label sequence using a conditional random field [1].

Let x = {x_1, x_2, ..., x_n} be an input sequence, where x_i is the input vector of the i-th word, and let y = {y_1, y_2, ..., y_n} be the predicted tag sequence for x. P is the matrix of output scores from the Bi-LSTM network; it has size n × k, where k is the number of unique tags, and P_{i,j} is the score of the j-th tag for the i-th word in the sentence. We define the score as:

S(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}   (1)

where A is a matrix of transition scores, A_{i,j} describes the score of the transition from tag i to tag j, and y_0 and y_{n+1} indicate the start and end tags of the sentence.

A softmax over all possible tag sequences Y_x represents the probability of sequence y:

P(y|x) = \frac{\exp(S(x, y))}{\sum_{y' \in Y_x} \exp(S(x, y'))}   (2)

In the training phase, we use maximum conditional likelihood estimation to maximize the log-probability of the correct tag sequence:

\log P(y|x) = S(x, y) - \log\left(\sum_{y' \in Y_x} \exp(S(x, y'))\right)   (3)

Decoding predicts the label sequence y* with the highest score:

y^* = \arg\max_{y' \in Y_x} S(x, y')   (4)
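A tiny NumPy sketch of equations (1) and (2) for a toy sentence; all scores are made up, and the special start/end transitions of equation (1) are omitted for brevity:

import itertools
import numpy as np

# Toy setting: a 3-word sentence and k = 2 tags (0 = O, 1 = B-PER).
P = np.array([[1.0, 0.2],    # emission scores from the Bi-LSTM, shape (n, k)
              [0.3, 1.5],
              [0.9, 0.4]])
A = np.array([[0.5, -0.1],   # transition scores A[i, j]: tag i -> tag j
              [-0.3, 0.8]])

def score(tags):
    """Equation (1), without the special start/end transitions."""
    emission = sum(P[i, t] for i, t in enumerate(tags))
    transition = sum(A[tags[i], tags[i + 1]] for i in range(len(tags) - 1))
    return emission + transition

y = (0, 1, 0)
print("S(x, y) =", score(y))

# Equation (2): brute-force normalization over every possible tag sequence
# (only feasible for toy sizes; real CRF layers use dynamic programming).
all_scores = [score(seq) for seq in itertools.product(range(2), repeat=3)]
log_Z = np.log(np.sum(np.exp(all_scores)))
print("P(y | x) =", np.exp(score(y) - log_Z))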
F. Bi-LSTM-CRF

After gathering all the components, we developed our neural network model. Figure 2 shows a detailed overview of the architecture. The input of the main Bi-LSTM-CRF model is the character-level representation generated by the character Bi-LSTM, concatenated with the word embedding from ULMFit. We then fed this input into the main Bi-LSTM network. Finally, the output vectors of the Bi-LSTM were fed into the CRF layer to decode the best tag sequence.

Fig. 2. Bi-LSTM-CRF with Word/Character Representation Architecture
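A sketch of how the architecture of Figure 2 could be wired together in Keras. The paper does not name its CRF implementation; the keras-contrib CRF layer used here was a common choice with Keras at the time, and its exact API varies by version. The sequence lengths, embedding sizes, 512-unit main Bi-LSTM, Adam settings and batch size follow Section IV; the tag count, character-embedding size and dropout placement are assumptions:

from keras.models import Model
from keras.layers import (Input, Embedding, LSTM, Bidirectional,
                          TimeDistributed, concatenate)
from keras.optimizers import Adam
from keras_contrib.layers import CRF   # third-party CRF layer; API differs by version

MAX_LEN, MAX_CHARS = 250, 32
VOCAB_SIZE, EMB_DIM = 60000, 400
N_CHARS, N_TAGS = 400, 27            # 13 entity classes x {B, I} + O (assumed count)

# Word branch: pre-trained ULMFit/Thai2Fit vectors would be loaded into this layer.
word_in = Input(shape=(MAX_LEN,))
word_emb = Embedding(VOCAB_SIZE, EMB_DIM)(word_in)

# Character branch: per-word Bi-LSTM producing a 64-dimensional vector per word.
char_in = Input(shape=(MAX_LEN, MAX_CHARS))
char_emb = TimeDistributed(Embedding(N_CHARS, 32))(char_in)
char_repr = TimeDistributed(Bidirectional(LSTM(32, recurrent_dropout=0.5)))(char_emb)

# Main Bi-LSTM over the concatenated word + character representation,
# followed by a CRF layer that decodes the best tag sequence.
x = concatenate([word_emb, char_repr])
x = Bidirectional(LSTM(512, return_sequences=True, recurrent_dropout=0.5))(x)
crf = CRF(N_TAGS)
out = crf(x)

model = Model([word_in, char_in], out)
model.compile(optimizer=Adam(lr=0.001), loss=crf.loss_function, metrics=[crf.accuracy])
model.summary()
# model.fit([X_words, X_chars], y_tags, batch_size=32, epochs=50)  # inputs are hypothetical

In this sketch the word Embedding layer is randomly initialised; as described in Section IV-A, the pre-trained Thai2Fit vectors would instead be loaded into its weights.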
IV. NETWORK TRAINING

In this section, we give the details of the training parameters for our neural network. We used the Keras library to build the neural network model. The computation was performed on a Tesla P100-PCIE-16GB from Google Cloud; training took around 3-4 hours for 50 epochs.

A. Word Embedding

We used the public word embedding from the Thai2Fit project [20]. Thai2Fit implements ULMFit and trains a Thai language model on Thai Wikipedia, covering 130,000 documents. This word embedding contains 60,000 word vectors with 400 dimensions. The word input for our model contained 250 words per document, because the input documents came from many sources and may contain long paragraphs. If an input was not long enough, the leftover positions were assigned as padding, and words that were not in Thai2Fit were assigned as an unknown word.
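A sketch of the padding and unknown-word handling described above. The tiny vocabulary and random vectors below are stand-ins; in the real setup they would come from the published Thai2Fit embedding (60,000 words, 400 dimensions):

import numpy as np

MAX_LEN = 250
# Stand-ins for the Thai2Fit vocabulary and vectors.
vocab = ["ประเทศไทย", "กรุงเทพ", "คน"]
vectors = np.random.randn(len(vocab), 400).astype("float32")

PAD, UNK = 0, 1
word2idx = {w: i + 2 for i, w in enumerate(vocab)}   # reserve 0 = padding, 1 = unknown

def encode(tokens):
    """Map one tokenized document to a fixed-length sequence of word indices."""
    idx = [word2idx.get(t, UNK) for t in tokens[:MAX_LEN]]
    return idx + [PAD] * (MAX_LEN - len(idx))

print(encode(["คน", "ไป", "กรุงเทพ"])[:5])   # -> [4, 1, 3, 0, 0]

# Weights for the Keras Embedding layer: rows 0 and 1 hold the padding and
# unknown vectors, the remaining rows hold the pre-trained vectors.
emb_matrix = np.vstack([np.zeros((2, vectors.shape[1]), dtype="float32"), vectors])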
B. Character Embedding

The character embedding was generated from all characters appearing in the Thai2Fit word embedding, 399 characters in total. We trained the character embedding from scratch, meaning that the character vectors were randomly initialized at the start. Unknown characters, such as Japanese and Chinese characters not included in the lookup table, were mapped to an unknown character. The character input contained 32 characters per word; if a word was not long enough, the leftover positions were assigned as padding. For the recurrent dropout of the character Bi-LSTM layer, Reimers et al. [19] showed that variational dropout (0.5) could yield better results, so we set the dropout to 0.5. The output dimension of the character representation from the Bi-LSTM was 64.

C. Optimization Algorithm

The optimizer for this neural network model was Adam with a batch size of 32 and an initial learning rate of 0.001.


The state size of the main Bi-LSTM was set to 512 units and its dropout to 0.6. We trained for 50 epochs in every experiment and selected only the best result.

V. EXPERIMENT

A. Datasets

For Thai named entity recognition, the available datasets are quite small. The standard dataset for training named entities was BEST2009, whose labels indicate only whether a word is a named entity; it does not identify whether it is a person, location, organization or other entity. Later, Tirasaroj et al. [2] presented a 3-class dataset containing person, organization and location. At present, there is an open-source dataset from the PyThaiNLP project [3] that expands the Tirasaroj dataset through crowdsourcing to gather multiple tags. This dataset includes 13 classes, 6148 sentences and 50593 NER tokens, tokenized by the newmm method with NER tags in the BOI format. For our experiment, we randomly split the dataset with an 80:20 ratio, that is, 4918 training sentences (40545 NER tokens) and 1230 testing sentences (10048 NER tokens). Table I summarizes the dataset and Table II shows the proportion of each NER tag.

Fig. 3. Example of data

TABLE I. DATASET DESCRIPTION

Type        All     Train   Test
Sentence    6148    4918    1230
Word        197704  157467  40237
NER Token   50593   40545   10048

TABLE II. NER TAG DESCRIPTION

Type          Word   Token
Date          1823   5676
Email         11     79
Law           194    40545
Len           125    400
Location      4488   9239
Money         579    2002
Organization  5577   12342
Percent       160    407
Person        3159   14698
Phone         108    391
Time          905    2695
URL           114    1834
ZIP           25     25
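A sketch of the 80:20 split described above, using scikit-learn's train_test_split purely for illustration (the paper does not state which tool performed the split); the corpus loader and the tag names in the comments are hypothetical:

from sklearn.model_selection import train_test_split

# Each sentence is a list of (word, NER tag) pairs in the BOI scheme,
# e.g. [("สมชาย", "B-PERSON"), ("ใจดี", "I-PERSON"), ("ไป", "O")].
sentences = load_pythainlp_ner_corpus()   # hypothetical loader for the PyThaiNLP corpus

train_sents, test_sents = train_test_split(
    sentences, test_size=0.2, random_state=42, shuffle=True)
print(len(train_sents), len(test_sents))   # roughly 4918 / 1230 for 6148 sentences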

B. Main Result

We tested our Bi-LSTM-CRF with two input configurations: word representation and character representation. We constructed four tests to evaluate the effect of applying word and character representations to the Bi-LSTM-CRF model. The first experiment (Model 1) used a randomly initialized word embedding trained from scratch and no character embedding. The second experiment (Model 2) used the pre-trained word embedding from ULMFit without character embedding. The third experiment (Model 3) used a randomly initialized word embedding trained from scratch together with the Bi-LSTM character representation. The fourth experiment (Model 4) used the pre-trained word embedding from ULMFit together with the Bi-LSTM character representation. The evaluation metrics are precision, recall and F1 score. Precision is the number of correctly predicted NE tokens divided by the total number of NE tokens extracted by the system. Recall is the number of correctly predicted NE tokens divided by the total number of manually annotated NE tokens. The F1 score is the harmonic mean of precision and recall:

F_1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}   (5)

As shown in Table III, applying only the pre-trained word embedding improved precision and recall by about 3%, so the F1 score increased by up to 3%. Using only the character representation slightly enhanced precision by 1.7% while recall dropped slightly, so this model improved the F1 score by approximately 0.5%. However, when we combined the two factors, we obtained a large gain of 4.9% in precision and 7.1% in recall, for a significant overall improvement to a 91.65% F1 score.

In Table IV, we compare the performance of our best model with the state-of-the-art NER model for Thai of Phatthiyaphaibun [7], a CRF model using NER and POS tags as features. Our model outperformed the CRF + POS model by a 4.75% F1 score improvement.

TABLE III. F1 SCORE FOR BI-LSTM MODELS

Type          Model 1  Model 2  Model 3  Model 4
Date          93.59    92.93    92.62    95.96
Email         44.44    76.19    0.00     100.00
Law           62.22    71.02    0.50     71.35
Len           83.44    83.24    79.55    89.49
Location      79.55    84.66    79.52    87.73
Money         89.58    94.02    91.15    95.46
Organization  78.39    81.48    79.26    85.20
Percent       94.40    75.68    88.89    93.20
Person        90.80    95.14    91.29    97.01
Phone         79.90    91.60    92.33    98.02
Time          84.92    84.17    86.37    89.73
URL           95.50    98.85    98.33    98.85
ZIP           66.67    85.71    72.73    100.00
Average       85.64    88.85    86.18    91.65

TABLE IV. F1 SCORE COMPARISON WITH CRF + POS

Type          Best Model  CRF + POS
Date          95.96       93.76
Email         100.00      100
Law           71.35       59.24
Len           89.49       93.23
Location      87.73       80.65
Money         95.46       91.69
Organization  85.20       80.69
Percent       93.20       83.84
Person        97.01       91.63
Phone         98.02       95.76
Time          89.73       84.86
URL           98.85       96.57
ZIP           100.00      90.9
Average       91.65       86.9
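A sketch of how entity-level precision, recall and F1 (equation (5)) can be computed for BOI-tagged output. The seqeval package is one common choice for this kind of evaluation; the paper does not state which evaluation script was used, and the tag sequences below are toy examples:

from seqeval.metrics import precision_score, recall_score, f1_score

# Gold and predicted tag sequences for two toy sentences (BOI/IOB scheme).
y_true = [["B-PERSON", "I-PERSON", "O", "B-LOCATION"],
          ["O", "B-ORGANIZATION", "I-ORGANIZATION", "O"]]
y_pred = [["B-PERSON", "I-PERSON", "O", "O"],
          ["O", "B-ORGANIZATION", "I-ORGANIZATION", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
# Entity-level counts: 2 of the 2 predicted entities are correct (precision 1.0),
# 2 of the 3 gold entities are found (recall 0.667), so F1 = 0.8.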
C. Hyperparameter

In this section, we examine how the hyperparameters affect the model. The base for this experiment was the best-performing setup from subsection B, the Bi-LSTM-CRF using both word and character representations. The main factors were the number of LSTM units of the main model, the dropout and the optimizer, as shown in Table V. The results show that the hyperparameters did not affect the model much, as the F1 score varied by only about 0.64%. Variational dropout yielded better results than naive dropout in every experiment, similar to Reimers et al.'s findings [19]. For the number of LSTM units and the optimizer, the F1 score with Adam improved as the number of LSTM units increased up to 512.


Nevertheless, when the number of LSTM units was raised to 1024, the F1 score dropped slightly. On the other hand, the Nadam optimizer gave its best result at 1024 units. In the end, however, 512 units with Adam achieved a better F1 score than 1024 units with Nadam.

TABLE V. HYPERPARAMETER RESULTS

Optimizer  LSTM Units  Dropout  Precision  Recall  F1
Adam       256         0.1      91.53      90.91   91.22
Adam       256         0.5      92.02      90.80   91.41
Adam       512         0.1      91.74      90.40   91.06
Adam       512         0.5      91.79      91.51   91.65
Adam       1024        0.1      91.15      90.86   91.01
Adam       1024        0.5      91.99      90.77   91.38
Nadam      256         0.1      91.65      91.08   91.36
Nadam      256         0.5      90.69      91.72   91.2
Nadam      512         0.1      91.38      91.30   91.34
Nadam      512         0.5      91.93      90.83   91.38
Nadam      1024        0.1      92.28      90.00   91.12
Nadam      1024        0.5      91.91      91.19   91.56

CONCLUSION

This paper presents a Bi-LSTM-CRF with word and character representations to extract named entities from sentences in the PyThaiNLP corpus. The model yields strong results, with precision, recall, and F1 of 91.79%, 91.51%, and 91.65% respectively. In addition, the hyperparameters affect the model only slightly. Further studies would shed more light on the development of Thai NER. Firstly, we need more data from reliable sources, since the PyThaiNLP corpus is crowdsourced and the data may contain errors. Secondly, there are alternative methods for generating word embeddings; we can develop and experiment with other word embeddings to improve the result. Lastly, multiple strategies for the model should be further explored, such as adding additional features like ELMo or BERT embeddings.

ACKNOWLEDGMENT

We would like to thank Mrs. Nutcha Tirasaroj, NECTEC and PyThaiNLP for providing the NER corpus, Mr. Charin Polpanumas for the Thai2Fit word embedding, and the Department of Computer Engineering, King Mongkut's University of Technology Thonburi for their generous support of this work.

REFERENCES

[1] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML, 2001.
[2] N. Tirasaroj and W. Aroonmanakun, "Thai Named Entity Recognition Based on Conditional Random Fields," in International Symposium on Natural Language Processing (SNLP), Thailand, pp. 216-220, 2009.
[3] Python Thailand Group (PyThaiNLP), Thai Named Entity Recognition Corpus [Online]. Available: https://github.com/PyThaiNLP/pythainlp.
[4] P. Charoenpornsawat, B. Kijsirikul, and S. Meknavin, "Feature-based Proper Name Identification in Thai," in Proc. of the National Computer Science and Engineering Conference (NCSEC98), Thailand, 1998.
[5] H. Chanlekha and A. Kawtrakul, "Thai Named Entity Extraction by incorporating Maximum Entropy Model with Simple Heuristic Information," in Natural Language Processing (IJCNLP), China, 2004.
[6] N. Saetiew, T. Achalakul, and S. Prom-on, "Thai Person Name Recognition (PNR) Using Likelihood Probability of Tokenized Words," in 5th International Electrical Engineering Congress, Pattaya, Thailand, 8-10 March 2017.
[7] W. Phatthiyaphaibun, Thai Named Entity Recognitions for PyThaiNLP [Online]. Available: https://github.com/wannaphongcom/thai-ner.
[8] C. Santos and V. Guimaraes, "Boosting Named Entity Recognition with Neural Character Embeddings," in Proceedings of NEWS 2015, The Fifth Named Entities Workshop, p. 25, 2015.
[9] J. P. Chiu and E. Nichols, "Named Entity Recognition with Bidirectional LSTM-CNNs," Transactions of the Association for Computational Linguistics, vol. 4, pp. 357-370, 2016.
[10] X. Ma and E. Hovy, "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
[11] J. Pennington, R. Socher, and C. Manning, "GloVe: Global Vectors for Word Representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[12] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural Architectures for Named Entity Recognition," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.
[13] K. Chaovavanich, Dictionary-based Thai Word Segmentation using maximal matching algorithm and Thai Character Cluster (TCC) (newmm) [Online]. Available: https://github.com/PyThaiNLP/pythainlp.
[14] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, October 1986.
[15] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13), Volume 2, pp. 3111-3119, Nevada, 2013.


[17] J. Howard and S. Ruder, "Universal Language Model Fine-tuning for Text Classification," in ACL, 2018.
[18] S. Merity, N. S. Keskar, and R. Socher, "Regularizing and Optimizing LSTM Language Models," CoRR, vol. abs/1708.02182, 2017.
[19] N. Reimers and I. Gurevych, "Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks," CoRR, vol. abs/1707.06799, 2017.
[20] C. Polpanumas, Thai2Fit [Online]. Available: https://github.com/cstorm125/thai2fit.

