Transformer-Based Korean Pretrained Language Models
Abstract—With the advent of the Transformer, introduced for translation models in 2017, attention-based architectures began to attract attention. After the emergence of BERT, which strengthened the NLU-oriented encoder part of the Transformer, and of the GPT architecture, which strengthened the NLG-oriented decoder part, various methodologies, datasets, and models for training Pretrained Language Models (PLMs) began to appear. Moreover, in the past three years, various Pretrained Language Models specialized for Korean have been released. In this paper, we numerically and qualitatively compare and analyze the Korean PLMs that have been released to the public.
1 INTRODUCTION
Fig. 2. Three main types of PLM: Encoder-Centric Models (left), trained with the MLM task for language understanding; Decoder-Centric Models (center), trained with the next-token prediction task for language generation; and Seq2Seq Models (right), trained with various objectives and tasks (NMT, summarization, etc.) using next-token prediction for generation grounded in understanding. Detailed objectives and architectures can differ across individual models.
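As a concrete, non-authoritative illustration of these three model types, the sketch below loads one model of each kind with the Hugging Face transformers auto classes. The checkpoint names are assumptions chosen only for illustration and are not prescribed by this paper.

# A minimal sketch: one encoder-centric, one decoder-centric, and one
# seq2seq Korean PLM loaded via the transformers auto classes.
# The checkpoint names below are illustrative assumptions.
from transformers import (
    AutoModelForMaskedLM,    # encoder-centric, MLM objective (BERT-style)
    AutoModelForCausalLM,    # decoder-centric, next-token prediction (GPT-style)
    AutoModelForSeq2SeqLM,   # encoder-decoder, seq2seq objectives (BART/T5-style)
)

encoder = AutoModelForMaskedLM.from_pretrained("beomi/kcbert-base")        # assumed checkpoint
decoder = AutoModelForCausalLM.from_pretrained("skt/kogpt2-base-v2")       # assumed checkpoint
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("gogamza/kobart-base-v2")  # assumed checkpoint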
Fig. 3. Objective of Encoder-Centric Models. The objective is to predict the tokens at the masked positions of the input; it is designed for Natural Language Understanding.

Encoder-centric models focus on extracting the features of language. Tasks such as classification, clustering, and tagging can use this type of model as a PLM.

3.1.1 KoBERT
KoBERT8 is the first Korean pretrained model shared on Hugging Face, released by SKT-Brain. Its configuration is mostly the same as BERT's, but the tokenizer uses SentencePiece9 rather than the WordPiece tokenizer used in BERT. For pretraining, 5 million sentences and 54 million words from the Korean Wikipedia were used.

3.1.2 HanBERT
HanBERT10 is a BERT model trained on about 150 GB of Korean corpus (General Domain: 70 GB, Patent Domain: 75 GB) comprising 700 million sentences. The tokenizer is a private tokenizer called the Moran Tokenizer, and the vocabulary size is 54,000.

3.1.4 KcBERT
KcBERT [25] is a Korean BERT model trained on about 12 GB of Naver politics news comments. The tokenizer uses WordPiece [26] BPE, and the data is preprocessed to handle emojis and special characters.

3.1.5 SoongsilBERT (KcBERT2)
SoongsilBERT14 is a language model pretrained on community data from Soongsil University and the Modu Corpus in addition to the news comments data used in KcBERT. Most of the settings are identical to KcBERT, except that it is based on the RoBERTa model and uses a byte-level BPE tokenizer. SoongsilBERT fits community terminology better; in other words, it does not perform well in non-community domains.

3.1.6 KcELECTRA
KcELECTRA15 is a model trained by collecting additional data (mainly comments) on top of the data used for KcBERT. On the NSMC task, the model currently achieves state-of-the-art results.

8. https://github.com/SKTBrain/KoBERT
9. https://github.com/google/sentencepiece
10. https://github.com/tbai2019/HanBert-54k-N
11. https://github.com/monologg/KoELECTRA
12. https://corpus.korean.go.kr/
13. Large-scale Korean open domain encyclopedia.
14. https://github.com/jason9693/Soongsil-BERT
15. https://github.com/Beomi/KcELECTRA
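To make the masked-token objective of the encoder-centric models above concrete, the following sketch runs a fill-mask query with the transformers pipeline. The checkpoint name is an assumption for illustration; any of the BERT-family models discussed here could be substituted.

# A minimal sketch of the MLM objective: predict the token at the masked position.
# The checkpoint name is an illustrative assumption.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="beomi/kcbert-base")

# "트랜스포머는 자연어 처리 [MASK]의 일종이다." -- "A Transformer is a kind of NLP [MASK]."
for candidate in fill_mask("트랜스포머는 자연어 처리 [MASK]의 일종이다."):
    print(candidate["token_str"], candidate["score"])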
TABLE 1
Results of Single-Sentence Tasks.
TABLE 2
Results of Multiple-Sentence & Agent Tasks.
4.2.1 KorNLI Result
The NLI task classifies the relationship between two sentences as "entailment", "contradiction", or "neutral". The KorNLI dataset has 942,854 examples (pairs) for training, 2,490 examples for evaluation, and 5,010 examples for testing. KoELECTRA achieved state-of-the-art results on this task. However, most Korean PLMs scored lower than XLM, which is not a Korean-centered model.

... Wikipedia articles, 60,407 Q&A pairs for the training set, and 5,774 Q&A pairs for the dev set. On this task, KoBigBird scored the highest (87.08 EM / 94.71 F1). On the other hand, KcBERT and KoBERT did not perform well, scoring even lower than XLM. It seems the sentence length of the corpus used for pretraining is too short for these models to handle long sequences.
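As a rough sketch of the sentence-pair setup behind the KorNLI task, the code below encodes a premise/hypothesis pair and reads off a three-way prediction. The checkpoint name and label order are assumptions; a meaningful prediction requires a classification head actually fine-tuned on KorNLI.

# A minimal sketch of three-way NLI classification over a sentence pair.
# The checkpoint and label order are assumptions; the classification head
# here is untrained unless replaced by a KorNLI fine-tuned model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ckpt = "monologg/koelectra-base-v3-discriminator"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=3)

premise = "저는 어제 영화를 봤습니다."        # "I watched a movie yesterday."
hypothesis = "저는 어제 집에만 있었습니다."    # "I stayed home all day yesterday."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
labels = ["entailment", "neutral", "contradiction"]  # assumed label order
print(labels[logits.argmax(dim=-1).item()])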
REFERENCES

[2] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186.
[3] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training."
[4] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling."
[5] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz et al., "Huggingface's transformers: State-of-the-art natural language processing," arXiv preprint arXiv:1910.03771, 2019.
[6] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., "Language models are few-shot learners," arXiv preprint arXiv:2005.14165, 2020.
[7] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in International Conference on Learning Representations, 2020.
[8] A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, "Zero-shot text-to-image generation," CoRR, vol. abs/2102.12092, 2021. [Online]. Available: https://arxiv.org/abs/2102.12092
[9] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using rnn encoder–decoder for statistical machine translation," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014, pp. 1724–1734.
[10] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104–3112.
[11] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
[12] M.-T. Luong, H. Pham, and C. D. Manning, "Effective approaches to attention-based neural machine translation," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, pp. 1412–1421.
[13] M. Ott, S. Edunov, D. Grangier, and M. Auli, "Scaling neural machine translation," in Proceedings of the Third Conference on Machine Translation: Research Papers, 2018, pp. 1–9.
[14] D. So, Q. Le, and C. Liang, "The evolved transformer," in International Conference on Machine Learning, 2019, pp. 5877–5886.
[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "Roberta: A robustly optimized bert pretraining approach," arXiv preprint arXiv:1907.11692, 2019.
[16] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, "ELECTRA: Pre-training text encoders as discriminators rather than generators," in ICLR, 2020. [Online]. Available: https://openreview.net/pdf?id=r1xMH1BtvB
[17] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
[18] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, "Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension."
[19] J. Ham, Y. J. Choe, K. Park, I. Choi, and H. Soh, "Kornli and korsts: New benchmark datasets for korean natural language understanding," CoRR, vol. abs/2004.03289, 2020. [Online]. Available: https://arxiv.org/abs/2004.03289
[20] J. Moon, W. I. Cho, and J. Lee, "BEEP! Korean corpus of online news comments for toxic speech detection," in Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media. Online: Association for Computational Linguistics, Jul. 2020, pp. 25–31. [Online]. Available: https://www.aclweb.org/anthology/2020.socialnlp-1.4
[21] S. Park, J. Moon, S. Kim, W. I. Cho, J. Han, J. Park, C. Song, J. Kim, Y. Song, T. Oh, J. Lee, J. Oh, S. Lyu, Y. Jeong, I. Lee, S. Seo, D. Lee, H. Kim, M. Lee, S. Jang, S. Do, S. Kim, K. Lim, J. Lee, K. Park, J. Shin, S. Kim, L. Park, A. Oh, J.-W. Ha, and K. Cho, "Klue: Korean language understanding evaluation," 2021.
[22] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, "Glue: A multi-task benchmark and analysis platform for natural language understanding," 2019.
[23] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, "Exploring the limits of transfer learning with a unified text-to-text transformer," 2020.
[24] K. Song, X. Tan, T. Qin, J. Lu, and T.-Y. Liu, "Mass: Masked sequence to sequence pre-training for language generation," in ICML, 2019.
[25] J. Lee, "Kcbert: Korean comments bert," in Proceedings of the 32nd Annual Conference on Human and Cognitive Language Technology, 2020, pp. 437–440.
[26] M. Schuster and K. Nakajima, "Japanese and korean voice search," in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2012, pp. 5149–5152.
[27] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, "Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter," arXiv preprint arXiv:1910.01108, 2019.
[28] J. Park and D. Kim, "Kobigbird: Pretrained bigbird model for korean," Nov. 2021. [Online]. Available: https://doi.org/10.5281/zenodo.5654154
[29] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, "Language models are unsupervised multitask learners," 2019.
[30] B. Kim, H. Kim, S.-W. Lee, G. Lee, D. Kwak, J. D. Hyeon, S. Park, S. Kim, S. Kim, D. Seo et al., "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 3405–3424.
[31] I. Kim, G. Han, J. Ham, and W. Baek, "Kogpt: Kakaobrain korean(hangul) generative pre-trained transformer," https://github.com/kakaobrain/kogpt, 2021.
[32] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, "Unsupervised cross-lingual representation learning at scale," CoRR, vol. abs/1911.02116, 2019. [Online]. Available: http://arxiv.org/abs/1911.02116