Abstract—Named Entity Recognition (NER) is a handy tool for many natural language processing tasks to identify and extract unique entities such as person, location, organization and time. In English and Chinese, NER has been thoroughly researched and can be applied in practical settings. Its development in Thai is still limited because of scarce resources and language difficulties such as the lack of boundary indicators for words, phrases and sentences. In this paper, we present an application of Bi-LSTM-CRF with word/character-level representation to solve this problem. First, we prepared the texts by tokenizing each sentence into words. We then prepared word representations and Bi-LSTM character representations. Finally, we built a recurrent neural network combined with a CRF to learn the tag sequence and perform NER. Our model was evaluated on the open-source NER corpus from the Facebook group ThaiNLP. The results of our model yielded precision, recall, and F1 at 91.79%, 91.51% and 91.65% respectively.

Index Terms—Named Entity Recognition, Recurrent Neural Network, Bi-LSTM, Conditional Random Field, Thai Language
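As a concrete illustration of the tokenization step mentioned in the abstract, the dictionary-based newmm tokenizer cited as [13] can be invoked through PyThaiNLP. This is a minimal sketch, not the paper's code; the example sentence is ours.

```python
# Minimal sketch: tokenize a Thai sentence into words with PyThaiNLP's
# dictionary-based "newmm" engine [13]. The example sentence is invented.
from pythainlp.tokenize import word_tokenize

sentence = "นายสมชายเดินทางไปกรุงเทพฯ"  # "Mr. Somchai traveled to Bangkok"
tokens = word_tokenize(sentence, engine="newmm")
print(tokens)  # e.g. ['นาย', 'สมชาย', 'เดินทาง', 'ไป', 'กรุงเทพฯ']
```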
I. INTRODUCTION

Named Entity Recognition (NER), also known as entity extraction, is one of the essential elements of Natural Language Processing (NLP) tasks. It is used in information extraction to identify and segment named entities and classify them into several predefined classes. In widely spoken languages such as English and Chinese, many studies have investigated various algorithms for this task. Nevertheless, Thai named entity recognition is limited by the characteristics of the language: Thai has no orthographical information or boundary indicators to separate words, phrases or sentences, which makes it hard for a machine learning model to learn from hand-engineered features.

Given this limitation, only a few studies exist for the Thai named entity recognition task. In the past decade, the most common methods for this task were machine learning approaches similar to those used for English and other languages. One successful strategy was the Conditional Random Field (CRF) model, which provided excellent outcomes [1][2]. Recently, deep learning has come to supersede the conventional methods in multiple fields, including image recognition and natural language processing, due to its ability to generalize knowledge from large datasets and the increase of computation power. In this research paper, we present the neural network architecture Bidirectional Long Short-term Memory with Conditional Random Field (Bi-LSTM-CRF), implemented with word and character representations, for the named entity recognition task. By the nature of the Bi-LSTM, the model can learn information from both the succeeding and the preceding words in a sentence. In addition, the model absorbs the information of each word from its word and character representations.
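The architecture just described can be sketched in code. The snippet below is a minimal PyTorch illustration, not the authors' implementation: it assumes the pytorch-crf package for the CRF layer, and all layer sizes and names are ours.

```python
# Minimal sketch of a Bi-LSTM-CRF tagger with word + character representations.
# Not the authors' code: assumes the pytorch-crf package (pip install pytorch-crf);
# all dimensions are illustrative.
import torch
import torch.nn as nn
from torchcrf import CRF


class BiLSTMCRF(nn.Module):
    def __init__(self, n_words, n_chars, n_tags,
                 word_dim=300, char_dim=25, char_hidden=25, hidden=256):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Character-level Bi-LSTM: builds one vector per word from its characters.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)
        # Main Bi-LSTM reads the concatenated word + character vectors, so each
        # position sees both its left (preceding) and right (succeeding) context.
        self.lstm = nn.LSTM(word_dim + 2 * char_hidden, hidden,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, n_tags)   # per-tag emission scores
        self.crf = CRF(n_tags, batch_first=True)  # sequence-level tag decoding

    def _char_repr(self, chars):
        # chars: (batch, seq_len, word_len) character ids for each word
        b, s, w = chars.shape
        emb = self.char_emb(chars.view(b * s, w))
        _, (h, _) = self.char_lstm(emb)            # final forward/backward states
        return torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)

    def emissions(self, words, chars):
        x = torch.cat([self.word_emb(words), self._char_repr(chars)], dim=-1)
        out, _ = self.lstm(x)
        return self.fc(out)

    def loss(self, words, chars, tags, mask):
        # Negative log-likelihood of the gold tag sequence under the CRF.
        return -self.crf(self.emissions(words, chars), tags, mask=mask)

    def decode(self, words, chars, mask):
        return self.crf.decode(self.emissions(words, chars), mask=mask)
```

In the pre-trained configurations described later, the ULMFit/Thai2Fit word vectors would be loaded with nn.Embedding.from_pretrained in place of the random initialization.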
II. RELATED WORK

For the Thai language, NER research is still in an infantile state, and only a few works have attempted to address the issue. Charoenpornsawat et al. (1998) [4] used context words, part of speech (POS) and heuristic rules to extract the information as features, then applied the Ripper and Winnow algorithms to classify entity names, reaching 92.17% accuracy. Chanlekha et al. (2004) [5] used maximum entropy with heuristic information from rules and word co-occurrences to extract named entities. The experiment yielded an accuracy of approximately 87.7%, including person names (90.44%), organizations (89.87%) and locations (82.16%). Tirasaroj et al. (2009) [2] built a conditional random fields model to extract named entities and studied the tagging patterns and factors affecting the model. The results on InterBEST 2009 news showed that the pattern had some effect on the system. The patterns for the experiment include BOI (Begin, Inside, Outside) and BOIE (Begin, Outside, Inside, End) for Person, Location and Organization entities. For token-level evaluation, the most accurate pattern was BOI, which achieved an 86.5% F1 score. Saetiew et al. [6] used the likelihood probability of tokenized words to identify person entities in texts, with an 85.15% F1 score on the InterBEST 2009 news corpus. Phatthiyaphaibun [7] improved on Tirasaroj et al. [2] by adding part of speech as a feature to the CRF model; the result obtains a slight improvement at an 86.9% F1 score.

For widely studied languages, NER is one of the fundamental research fields in NLP. Much research achieves over 90% accuracy with the state-of-the-art algorithms of the 2010s: machine learning and deep learning algorithms combined with other outstanding techniques. Santos et al. [8], for instance, boosted NER with neural character embeddings.
B. Main Result

We tested our Bi-LSTM-CRF with two configurations: word representation and character representation. We constructed four tests to evaluate the effect of applying word and character representations to the Bi-LSTM-CRF model. The first experiment (Model 1) measured the performance of Bi-LSTM-CRF with word embeddings initialized randomly and trained from scratch, without character embeddings. The second experiment (Model 2) measured the performance of Bi-LSTM-CRF with pre-trained word embeddings from ULMFit, without character embeddings. The third experiment (Model 3) ran Bi-LSTM-CRF with randomly initialized word embeddings trained from scratch, combined with character embeddings, and the last experiment (Model 4) combined the pre-trained word embeddings with character embeddings. Table III reports the per-type F1 scores of the four models, and Table IV compares the best model with the CRF + POS baseline [7].
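The paper does not show its evaluation code, but per-type precision, recall and F1 of the kind reported in Tables III and IV are conventionally computed at the entity level from BIO-style tag sequences. Below is a small sketch using the seqeval library; the tag sequences are invented for illustration.

```python
# Sketch: entity-level precision/recall/F1 from BIO tags with seqeval.
# The example sequences are invented; entity type names follow the corpus.
from seqeval.metrics import classification_report, f1_score

y_true = [["B-PERSON", "I-PERSON", "O", "B-LOCATION"]]
y_pred = [["B-PERSON", "I-PERSON", "O", "B-ORGANIZATION"]]

print(f1_score(y_true, y_pred))               # overall F1
print(classification_report(y_true, y_pred))  # per-type breakdown
```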
TABLE III
F1 SCORE FOR BI-LSTM MODELS

Type          Model 1   Model 2   Model 3   Model 4
Date            93.59     92.93     92.62     95.96
Email           44.44     76.19      0.00    100.00
Law             62.22     71.02      0.50     71.35
Len             83.44     83.24     79.55     89.49
Location        79.55     84.66     79.52     87.73
Money           89.58     94.02     91.15     95.46
Organization    78.39     81.48     79.26     85.20
Percent         94.40     75.68     88.89     93.20
Person          90.80     95.14     91.29     97.01
Phone           79.90     91.60     92.33     98.02
Time            84.92     84.17     86.37     89.73
URL             95.50     98.85     98.33     98.85
ZIP             66.67     85.71     72.73    100.00
Average         85.64     88.85     86.18     91.65
TABLE IV
F1 SCORE FOR THE BEST BI-LSTM MODEL AND THE CRF + POS BASELINE

Type          Best Model   CRF + POS
Date               95.96       93.76
Email             100.00      100.00
Law                71.35       59.24
Len                89.49       93.23
Location           87.73       80.65
Money              95.46       91.69
Organization       85.20       80.69
Percent            93.20       83.84
Person             97.01       91.63
Phone              98.02       95.76
Time               89.73       84.86
URL                98.85       96.57
ZIP               100.00       90.90
Average            91.65       86.90
C. Hyperparameter

In this section, we examined how the hyperparameters affected the model. This experiment used the best-performing setup from subsection B, Bi-LSTM-CRF with word and character representations. The factors varied were the number of LSTM units in the main model, the dropout scheme and the optimizer, as shown in Table V. The results indicate that the hyperparameters did not affect the model much, as the F1 score varied by only about 0.64 percentage points. Variational dropout yielded a better result than naive dropout in every experiment, similar to Reimers et al.'s finding [18]. For the number of LSTM units and the optimizers, the F1 score with Adam improved as the number of LSTM units increased up to 512; when the units were raised to 1024, the F1 score dropped slightly. On the other hand, the Nadam optimizer yielded its best result at 1024 units. In the end, however, 512 units with Adam achieved a better result than 1024 units with Nadam.
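The twelve settings in Table V form a small grid. As a sketch only, the loop below enumerates it with PyTorch's Adam and NAdam optimizers; BiLSTMCRF is the illustrative class from the Introduction sketch, the vocabulary/tag sizes are invented, and train_and_eval is a hypothetical stand-in for the training and evaluation procedure, not a function from the paper.

```python
# Sketch of the hyperparameter grid behind Table V. `BiLSTMCRF` is the
# illustrative model above; sizes are invented; `train_and_eval` is hypothetical.
import itertools
import torch

grid = itertools.product(["adam", "nadam"], [256, 512, 1024], [0.1, 0.5])

for opt_name, units, dropout in grid:
    model = BiLSTMCRF(n_words=30000, n_chars=200, n_tags=27, hidden=units)
    opt_cls = torch.optim.Adam if opt_name == "adam" else torch.optim.NAdam
    optimizer = opt_cls(model.parameters(), lr=1e-3)
    # Dropout would be applied around the embedding and LSTM layers; the paper
    # contrasts naive dropout with variational dropout, which reuses a single
    # mask across all time steps of a sequence.
    f1 = train_and_eval(model, optimizer, dropout)  # hypothetical helper
    print(opt_name, units, dropout, f1)
```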
TABLE V
HYPERPARAMETER RESULT

Optimizer   LSTM Units   Dropout   Precision   Recall      F1
Adam               256       0.1       91.53    90.91   91.22
Adam               256       0.5       92.02    90.80   91.41
Adam               512       0.1       91.74    90.40   91.06
Adam               512       0.5       91.79    91.51   91.65
Adam              1024       0.1       91.15    90.86   91.01
Adam              1024       0.5       91.99    90.77   91.38
Nadam              256       0.1       91.65    91.08   91.36
Nadam              256       0.5       90.69    91.72   91.20
Nadam              512       0.1       91.38    91.30   91.34
Nadam              512       0.5       91.93    90.83   91.38
Nadam             1024       0.1       92.28    90.00   91.12
Nadam             1024       0.5       91.91    91.19   91.56
CONCLUSION

This paper presents Bi-LSTM-CRF with word and character representations to extract named entities from sentences in the PyThaiNLP corpus. The model yields strong results, with precision, recall, and F1 at 91.79%, 91.51%, and 91.65% respectively. In addition, the hyperparameters only slightly affect the model. More studies would shed further light on the development of Thai NER. First, we need more data from reliable sources, since the corpus from PyThaiNLP is crowdsourced and the data may contain methodological errors. Second, there are alternative methods to generate word embeddings; we can develop and experiment with other word embeddings to improve the result. Lastly, multiple modeling strategies should be further explored, such as adding additional features like ELMo or BERT embeddings.

ACKNOWLEDGMENT

We would like to thank Mrs. Nutcha Tirasaroj, NECTEC and PyThaiNLP for providing the NER corpus, Mr. Charin Polpanumas for the Thai2Fit word embedding, and the Department of Computer Engineering, King Mongkut's University of Technology Thonburi for their generous support of this work.

REFERENCES

[1] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML, 2001.
[2] N. Tirasaroj and W. Aroonmanakun, "Thai Named Entity Recognition Based on Conditional Random Fields," in International Symposium on Natural Language Processing (SNLP), Thailand, pp. 216-220, 2009.
[3] Python Thailand Group (PyThaiNLP), Thai Named Entity Recognition Corpus [Online]. Available: https://github.com/PyThaiNLP/pythainlp.
[4] P. Charoenpornsawat, B. Kijsirikul, and S. Meknavin, "Feature-based Proper Name Identification in Thai," in Proc. of the National Computer Science and Engineering Conference (NCSEC98), Thailand, 1998.
[5] H. Chanlekha and A. Kawtrakul, "Thai Named Entity Extraction by Incorporating Maximum Entropy Model with Simple Heuristic Information," in International Joint Conference on Natural Language Processing (IJCNLP), China, 2004.
[6] N. Saetiew, T. Achalakul, and S. Prom-on, "Thai Person Name Recognition (PNR) Using Likelihood Probability of Tokenized Words," in 5th International Electrical Engineering Congress, Pattaya, Thailand, 8-10 March 2017.
[7] W. Phatthiyaphaibun, Thai Named Entity Recognition for PyThaiNLP [Online]. Available: https://github.com/wannaphongcom/thai-ner.
[8] C. Santos and V. Guimaraes, "Boosting Named Entity Recognition with Neural Character Embeddings," in Proceedings of NEWS 2015, The Fifth Named Entities Workshop, p. 25, 2015.
[9] J. P. Chiu and E. Nichols, "Named Entity Recognition with Bidirectional LSTM-CNNs," Transactions of the Association for Computational Linguistics, vol. 4, pp. 357-370, 2016.
[10] X. Ma and E. Hovy, "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016.
[11] J. Pennington, R. Socher, and C. Manning, "GloVe: Global Vectors for Word Representation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
[12] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, "Neural Architectures for Named Entity Recognition," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.
[13] K. Chaovavanich, Dictionary-based Thai Word Segmentation Using Maximal Matching Algorithm and Thai Character Cluster (TCC) (newmm) [Online]. Available: https://github.com/PyThaiNLP/pythainlp.
[14] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, October 1986.
[15] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[16] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed Representations of Words and Phrases and their Compositionality," in Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS'13), vol. 2, pp. 3111-3119, Nevada, 2013.