On The Vietnamese Named Entity Recognition: A Deep Learning Method Approach
Abstract—Named entity recognition (NER) plays an important role in text-based information retrieval. In this paper, we combine a Bidirectional Long Short-Term Memory (Bi-LSTM) network [7], [27] with a Conditional Random Field (CRF) [9] to create a novel deep learning model for the NER problem. Each word input to the model is represented by a Word2Vec-trained vector; the word embedding set was trained on about one million articles from 2018 collected through a Vietnamese news portal (baomoi.com). In addition, we concatenate the Word2Vec [18] vector with a semantic feature vector (Part-Of-Speech (POS) tags and chunk tags) and a hidden syntactic feature vector (extracted by a Bi-LSTM network) to achieve the best result so far for Vietnamese NER. The experiments were conducted on the data set of the VLSP 2016 (Vietnamese Language and Speech Processing 2016 [29]) competition.

Index Terms—Vietnamese, Named Entity Recognition, Long Short-Term Memory, Conditional Random Field, Word Embedding

I. INTRODUCTION

Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify named entity mentions in unstructured text into pre-defined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc. It is a fundamental NLP research problem that has been studied for years, and it is considered one of the most basic and important tasks within larger problems such as information extraction, question answering, entity linking, and machine translation. Recently, many novel ideas have appeared for the NER task, such as Cross-View Training (CVT) [4], a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data; deep contextualized word representations [24]; and contextual string embeddings, a recent type of contextualized word embedding shown to yield state-of-the-art results [1], [2]. These studies have established new state-of-the-art F1 scores on the NER task.

For Vietnamese, the NER systems in VLSP 2016 adopted either conventional feature-based sequence labeling models such as Maximum-Entropy Markov Models (MEMMs) [14], [21] and Conditional Random Fields (CRFs) [11], [22], or neural models such as Recurrent Neural Networks (RNNs) and Bidirectional Long Short-Term Memory (Bi-LSTM) [25]. For the VLSP 2016 data set, the first Vietnamese NER system applied MEMMs with specific features [25]. However, the accuracy achieved did not go far beyond that of classical machine learning methods. Most of the above models depend heavily on specific resources and hand-crafted features, which makes them difficult to apply to new domains and other tasks.

In [19], [20], the authors used word identity, word shapes, part-of-speech tags, and chunking tags as hand-crafted features for a CRF to label entity tags [23]. Over the past few years, many deep learning models have been proposed to overcome these limitations. Some NER models have used LSTM and CRF for entity prediction [8], [12]. In addition, the benefits of combining word- and character-level representations with CNN and CRF are presented in [17], [28].

In this study, we introduce a deep neural network for Vietnamese NER that extracts morphological features automatically through a character-level Bi-LSTM network, combined with POS-tag and chunk-tag features. The model consists of two Bi-LSTM hidden layers and a CRF output layer. For Vietnamese, we use the data set from the 2016 VLSP contest. The results show that our model outperforms the best previous systems for Vietnamese NER [23], with an F1 score of 95.61% on the test set.

The remainder of this paper is structured as follows. Section II reviews related work on named entity recognition. Section III describes the implementation method. Section IV gives experimental results and discussion. Finally, the conclusion is given in Section V.

II. RELATED WORK

The approaches to the NER task can be divided into two routines: (1) statistical learning approaches and (2) deep learning methods.

In the first routine, the authors used traditional labeling models such as CRFs, Hidden Markov Models, Support Vector Machines, and Maximum Entropy models, which depend heavily on hand-crafted features. Sentences are expressed as a set of features such as word, POS tag, and chunk tag, and are then fed into a linear model for labeling. Some examples following this routine are [6], [13], [15], [16]. These models were proven to
work quite well for languages with few existing resources, such as Vietnamese. However, these kinds of NER systems rely heavily on the chosen feature set and on hand-crafted features that are expensive to construct and difficult to reuse [23].
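Linear models of this first routine label a sentence by scoring candidate tag sequences and decoding the highest-scoring one. The sketch below illustrates Viterbi decoding for a linear-chain model; the tag set, token scores, and transition scores are made up for illustration and are not taken from any system cited here.

```python
# Viterbi decoding for a linear-chain sequence labeler: given per-token scores
# for each tag (emissions) and tag-to-tag transition scores, recover the
# single highest-scoring tag sequence. All scores are illustrative.

def viterbi_decode(emissions, transitions):
    """emissions: list of {tag: score} per token; transitions: {(prev, cur): score}."""
    tags = list(emissions[0].keys())
    # best[t][tag] = (score of best path ending in `tag` at token t, previous tag)
    best = [{t: (emissions[0][t], None) for t in tags}]
    for em in emissions[1:]:
        row = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score + transition score.
            prev = max(tags, key=lambda p: best[-1][p][0] + transitions[(p, cur)])
            row[cur] = (best[-1][prev][0] + transitions[(prev, cur)] + em[cur], prev)
        best.append(row)
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    return list(reversed(path))

# Toy 3-token sentence with tags B-PER (begin person), I-PER (inside), O (other).
tags = ["B-PER", "I-PER", "O"]
emissions = [
    {"B-PER": 4.0, "I-PER": 0.0, "O": 1.0},   # "Nguyen"
    {"B-PER": 0.5, "I-PER": 3.0, "O": 1.0},   # "Van"
    {"B-PER": 0.0, "I-PER": 0.5, "O": 3.0},   # "said"
]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I-PER")] = -10.0           # forbid O -> I-PER
transitions[("B-PER", "I-PER")] = 2.0         # encourage B-PER -> I-PER

print(viterbi_decode(emissions, transitions))  # ['B-PER', 'I-PER', 'O']
```

In a CRF the emission and transition scores are learned from the hand-crafted features; the decoding step itself is the same dynamic program shown above.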
For the second routine, the appearance of deep learning models with superior computational performance has improved the accuracy of the NER task. The performance of deep learning models has also been shown to be much better than that of statistics-based methods. In particular, convolutional neural networks (CNNs) [30], recurrent neural networks (RNNs), and LSTM networks are in popular use; syntactic features can be exploited through character embeddings combined with word embeddings [26], [28]. Other information, such as POS tags and chunk tags, is also used to provide more semantic information [3], [20], [25]. The word vectors are combined in different ways and then fed into a Bi-LSTM network with a CRF at the output. For Vietnamese, there are many NER systems using LSTM networks. In [25], the authors introduced a model that uses two Bi-LSTM layers with softmax layers at the output and syntax-specific input vectors; its F1 score is 92.05%. A model using a single Bi-LSTM layer combined with a CRF at the output, achieving an F1 score of 83.25%, was given in [22]. A high-precision model was introduced in [3]: a Bi-LSTM-CRF model whose input vector is extracted from word characters, with an F1 of 94.88%. Most recently, a combination of Bi-LSTM, an attention layer, and a CRF with an F1 score of 95.33% was given in [23].

Fig. 1. Character-level Embedding

LSTM units contain a memory cell that can maintain information in memory for controlled periods of time. A cell in the LSTM network consists of three control gates: the forget gate (determining which information is discarded and which is retained), the update gate (deciding how much of the memorized information is added to the current state), and the output gate (deciding which part of the current cell makes it to the output). At time t, the cell updates are given as follows:
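Assuming the standard LSTM formulation, which the gate description above matches, the updates at time t can be written as follows (σ is the logistic sigmoid, ⊙ is element-wise multiplication, and the weight names W, U, b follow the usual convention):

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(update gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Here x_t is the input vector at time t, c_t the memory cell, and h_t the hidden state emitted to the next layer.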