
Procesamiento del Lenguaje Natural, issue 63, September 2019, pp. 151-154. Received 31-03-2019, revised 25-04-2019, accepted 17-05-2019. ISSN 1135-5948, DOI 10.26342/2019-63-18.

Unsupervised Neural Machine Translation, a new paradigm solely based on monolingual text
Mikel Artetxe, Gorka Labaka, Eneko Agirre
IXA NLP Group, HiTZ Basque Center for Language Technologies
University of the Basque Country (UPV/EHU)
{mikel.artetxe, gorka.labaka, e.agirre}@ehu.eus

Abstract: This article presents UnsupNMT, a 3-year project of which the first year has already been completed. UnsupNMT proposes a radically different approach to machine translation: unsupervised translation, that is, translation based on monolingual data alone, with no need for bilingual resources. The method is based on deep learning of temporal sequences and uses cutting-edge interlingual word representations in the form of cross-lingual word embeddings. The project is not only a highly innovative proposal, but it also opens a new paradigm in machine translation which branches out to other disciplines, such as transfer learning. Despite the current limitations of unsupervised machine translation, the techniques developed are expected to have great repercussions in areas where machine translation achieves worse results, such as translation between languages which have little contact, e.g. German and Russian.
Keywords: Machine Translation, Deep Learning, Word Embeddings
1 Introduction

Machine translation has been one of the most prominent applications of artificial intelligence since the very beginnings of the field. In addition to its intrinsic interest given the difficulty and completeness of the problem, machine translation has a huge practical interest in our increasingly global world, as it promises to break the language barrier while keeping the cultural heritage and diversity of all the languages spoken in the world.

In very recent times, previous approaches have been superseded by neural machine translation (NMT), which has now become the dominant paradigm in machine translation (Bahdanau, Cho, and Bengio, 2014). As opposed to traditional statistical machine translation (SMT), NMT systems are trained end-to-end, take advantage of continuous representations that greatly alleviate the sparsity problem, and make use of much larger contexts, thus mitigating the locality problem. Thanks to this, NMT has been reported to significantly improve over SMT in both automatic metrics and human evaluation.

Current NMT methods require expensive annotated data, as they fail terribly when the training data is not big enough (Koehn and Knowles, 2017). Unfortunately, the lack of large parallel corpora is a practical problem for the vast majority of language pairs, including low-resource languages (e.g. Basque) as well as many combinations of major languages (e.g. German-Russian). Several authors have recently tried to address this problem using pivoting or triangulation techniques (Chen et al., 2017) as well as semi-supervised approaches (He et al., 2016), but these methods still require a strong cross-lingual signal.
In this project, we introduce unsupervised neural machine translation, a new paradigm where the system learns to translate between two languages without the need for any bilingual dictionaries or translation memories. That is, given large bodies of monolingual text (monolingual corpora), the system is able to extract the patterns that allow it to translate from one language to another.

Our approach would use standard deep learning models for sequence-to-sequence learning. More concretely, we would follow the encoder/decoder architecture, combining a single language-independent encoder that would compose the cross-lingual word embeddings with several language-specific decoders that would decompose this representation back into the appropriate language. The system would be trained in an unsupervised manner following the same principle as denoising autoencoders, and we would explore additional techniques like adversarial training and backtranslation to enhance proper learning (Sennrich, Haddow, and Birch, 2016).
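To make this architecture concrete, the following sketch (illustrative PyTorch code under simplifying assumptions; the class names, the GRU choice and the corruption scheme are ours, not the project's actual implementation) shows a shared encoder over frozen cross-lingual embeddings, a per-language decoder, and one denoising update:

```python
import random
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Single language-independent encoder over frozen cross-lingual embeddings."""
    def __init__(self, cross_lingual_embeddings, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(cross_lingual_embeddings, freeze=True)
        self.rnn = nn.GRU(cross_lingual_embeddings.size(1), hidden_size, batch_first=True)

    def forward(self, token_ids):
        outputs, state = self.rnn(self.embed(token_ids))
        return outputs, state

class LanguageDecoder(nn.Module):
    """One decoder per language, regenerating text from the shared representation."""
    def __init__(self, vocab_size, emb_size=300, hidden_size=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, prev_tokens, state):
        outputs, state = self.rnn(self.embed(prev_tokens), state)
        return self.out(outputs), state

def corrupt(token_ids, drop_prob=0.1, swap_prob=0.5):
    """Denoising-autoencoder corruption: random word drops and local swaps."""
    kept = [t for t in token_ids if random.random() > drop_prob] or list(token_ids)
    for i in range(len(kept) - 1):
        if random.random() < swap_prob:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return kept

def denoising_step(encoder, decoder, sentence):
    """Encode a corrupted sentence and score its reconstruction in the same language."""
    noisy = torch.tensor([corrupt(sentence)])
    clean = torch.tensor([sentence])
    _, state = encoder(noisy)
    logits, _ = decoder(clean[:, :-1], state)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), clean[:, 1:].reshape(-1))
```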
The new techniques will open new research avenues in machine translation. We are in a very good position to check whether this new paradigm allows us to improve the state of the art in MT, especially for less-resourced language pairs and domains. Moreover, the methods developed to train cross-lingual sentence representations will also be useful for cross-lingual transfer learning, as already shown by cross-lingual word representations. Finally, the viability of inducing translation models in a completely unsupervised environment would empirically prove the existence of an inherent connection among all languages, which is of great interest from the point of view of linguistics.

2 Goals

The overall goal of this project is to develop unsupervised learning methods for neural machine translation. In order to maximize the impact of this nascent technology, the project is structured around three goals:

Goal 1: Develop methods to train NMT models in a completely unsupervised manner, relying solely on monolingual corpora. This is the core goal of the project; the remaining goals explore the practical ramifications and impact of unsupervised NMT.

Goal 2: Improve the state of the art in machine translation in different real-world scenarios where we have access to varying degrees of cross-lingual signal. This goal explores whether the new paradigm has practical applications.

Goal 3: Transfer learning. This goal explores whether the new paradigm has practical implications for transferring natural language processing systems from a resource-rich language to a less-resourced one.

3 Technical approach

We propose a radically new approach to unsupervised machine translation based on deep learning, a direction that has proven highly successful in other related areas, including standard supervised machine translation itself through NMT. The core of our approach is to learn to compose cross-lingual word representations in an unsupervised manner, and then use those embeddings to generate a first translation system that can be improved with monolingual techniques such as denoising autoencoders or backtranslation.

Regarding the first step, in order to obtain our cross-lingual word representations we will rely on techniques pioneered by us to build cross-lingual embedding mappings (Artetxe, Labaka, and Agirre, 2017; Artetxe, Labaka, and Agirre, 2018a). Recent work in unsupervised word embedding mapping (Lample et al., 2018a) has managed to obtain results comparable to previous supervised techniques, which we managed to improve (Artetxe, Labaka, and Agirre, 2018b). However, existing methods are based on a geometric interpretation of the embedding space (e.g. minimizing the Euclidean distance or maximizing the cosine similarity), which is unnatural and presumably suboptimal for machine translation. For that reason, we will explore alternative interpretations. At the same time, while existing models are bilingual, we plan to extend them to the multilingual scenario, so we can exploit the relationship among several languages at the same time.
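As an illustration of the mapping step behind these methods, the sketch below (a minimal NumPy version under simplifying assumptions; actual systems add vocabulary cutoffs, re-weighting and batched retrieval, and the fully unsupervised variants start from a weak initial dictionary) solves the orthogonal mapping in closed form and bootstraps the dictionary by self-learning:

```python
import numpy as np

def procrustes(X, Z, pairs):
    """Orthogonal W minimizing ||X[src] @ W - Z[trg]||, solved in closed form via SVD."""
    src, trg = zip(*pairs)
    u, _, vt = np.linalg.svd(X[list(src)].T @ Z[list(trg)])
    return u @ vt

def self_learning(X, Z, seed_pairs, iterations=5):
    """Alternate between solving the mapping and re-inducing the dictionary
    by nearest-neighbor retrieval in the shared embedding space."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # cosine via normalized rows
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    pairs = list(seed_pairs)
    for _ in range(iterations):
        W = procrustes(Xn, Zn, pairs)
        # Pair every source word with its most similar target word.
        pairs = list(enumerate((Xn @ W @ Zn.T).argmax(axis=1)))
    return W
```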
These cross-lingual word embeddings contain enough bilingual information to generate a rudimentary word-by-word translation system, which can then be improved by different techniques: either by directly building a neural translation system that makes use of these pre-trained embeddings and is trained using techniques such as denoising autoencoders or backtranslation, or, taking advantage of the modular architecture of SMT, by using these embeddings to generate a phrase table that can be combined with a language model.
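The following toy sketch (our own illustration, assuming the embedding matrices have already been mapped to a shared space and L2-normalized) shows how such a rudimentary word-by-word translator can work, replacing each source word with its nearest target-language neighbor:

```python
import numpy as np

def word_by_word(sentence, src_index, trg_words, src_emb, trg_emb):
    """src_index: word -> row in src_emb; trg_words: row -> target word.
    Both embedding matrices are assumed mapped to a shared, normalized space."""
    output = []
    for word in sentence.split():
        if word not in src_index:
            output.append(word)  # copy out-of-vocabulary words verbatim
            continue
        similarities = trg_emb @ src_emb[src_index[word]]
        output.append(trg_words[int(similarities.argmax())])
    return " ".join(output)
```

Such output is ungrammatical (no reordering, no fertility), which is precisely why the refinement techniques above are needed.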
In relation to Goal 2, the project will explore extensions of the previous approach to exploit cross-lingual signals of different degrees when available, which would be used to improve the state of the art in machine translation in different practical scenarios. In particular, we plan to:

- If a small parallel corpus is available, use it to fine-tune our model, taking care not to overfit. This type of fine-tuning has already been shown to be effective in NMT for domain adaptation (Chu, Dabre, and Kurohashi, 2017).

- If a comparable corpus is available, use our model to iteratively extract reliable parallel sentences from it and improve the model with these parallel sentences (see the sketch below).
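The mining loop from the second point can be sketched as follows (hypothetical code: `translate` stands for the current translation system, and token overlap is a crude stand-in for a real sentence-similarity measure):

```python
def similarity(a, b):
    """Crude proxy for sentence similarity: token-level Jaccard overlap."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(1, len(ta | tb))

def mine_parallel(translate, src_sentences, trg_sentences, threshold=0.5):
    """One mining round over a comparable corpus: translate each source
    sentence, pair it with the closest target sentence, and keep only
    confident matches as pseudo-parallel data for retraining."""
    pairs = []
    for src in src_sentences:
        hypothesis = translate(src)
        best = max(trg_sentences, key=lambda t: similarity(hypothesis, t))
        if similarity(hypothesis, best) >= threshold:
            pairs.append((src, best))
    return pairs
```

Retraining on the mined pairs and repeating the round lets the extraction and the model improve each other.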
While the main scenario under consideration in this project is that of machine translation, we also plan to explore the application of the developed methods to cross-lingual transfer learning (Goal 3), where a model is trained in one language and used in a different one. This has great practical interest, as it allows the abundant annotated data and resources available for major languages (in particular, English) to be leveraged for less-resourced ones (e.g. Basque, or even Spanish for some tasks like co-reference, sentiment analysis or named-entity recognition). For that purpose, we plan to extend the existing methods based on cross-lingual word embeddings to incorporate the entire encoder learned with our approach, so that the resulting system accounts not only for word-level relations but also for more complex phrase- or sentence-level relations.
4 Current progress

The first attempts that obtained promising results in standard machine translation benchmarks using monolingual corpora only (Artetxe et al., 2018; Lample et al., 2018a) build upon unsupervised cross-lingual embedding mappings, which independently train word embeddings in two languages and learn a linear transformation to map them to a shared space. The resulting cross-lingual embeddings are used to initialize a shared encoder for both languages, and the entire system is trained using a combination of denoising autoencoding, back-translation and, in the case of Lample et al. (2018a), adversarial training.
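Schematically, that training regime can be written as the following loop (an illustrative simplification; `model` is an assumed object exposing `translate()` and `update()`, not an actual API of the cited systems, and `corrupt` is a noise function like the one sketched in the introduction):

```python
def unsupervised_training(model, corrupt, mono_l1, mono_l2, steps=100000):
    """Train from monolingual batches only, alternating two objectives."""
    for _ in range(steps):
        s1, s2 = next(mono_l1), next(mono_l2)
        # 1) Denoising autoencoding: reconstruct each sentence in its own
        #    language from a corrupted copy, through the shared encoder.
        model.update(source=corrupt(s1), target=s1, target_lang="L1")
        model.update(source=corrupt(s2), target=s2, target_lang="L2")
        # 2) Back-translation: translate with the current model, then train
        #    on the synthetic pair in the reverse direction.
        model.update(source=model.translate(s1, to="L2"), target=s1, target_lang="L1")
        model.update(source=model.translate(s2, to="L1"), target=s2, target_lang="L2")
    return model
```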
During the first months of the project we have made progress mainly on our core Goal 1, where we show that the modular architecture of phrase-based SMT is more suitable for this problem. Our work (Artetxe, Labaka, and Agirre, 2018c), concurrent with Lample et al. (2018b), adapted the same principles discussed above to train an unsupervised SMT model, obtaining large improvements over the original unsupervised NMT systems. More concretely, we learn cross-lingual n-gram embeddings from monolingual corpora based on the mapping method discussed earlier, and use them to induce an initial phrase table that is combined with an n-gram language model and a distortion model. This initial system is then refined through iterative back-translation.
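As a simplified illustration of the phrase-table induction step (the scoring in the cited work is more elaborate; this only conveys the idea, and the temperature value is arbitrary), candidate target n-grams can be scored with a softmax over their similarity to the source n-gram in the shared space:

```python
import numpy as np

def phrase_translation_probs(src_vec, trg_matrix, temperature=0.1):
    """src_vec: embedding of one source n-gram; trg_matrix: embeddings of all
    candidate target n-grams (rows L2-normalized). Returns p(trg | src)."""
    sims = trg_matrix @ (src_vec / np.linalg.norm(src_vec))
    scores = np.exp(sims / temperature)
    return scores / scores.sum()  # one column of the induced phrase table
```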
More recently (Artetxe, Labaka, and Agirre, 2019), we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well-founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, these improvements yield large gains over the previous state of the art in unsupervised machine translation. For instance, we obtain 22.5 BLEU points on English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014.
Acknowledgments

UnsupNMT is a project funded by the Spanish Ministry of Economy, Industry and Competitiveness (TIN2017-91692-EXP).

References

Artetxe, M., G. Labaka, and E. Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451-462, Vancouver, Canada, July. Association for Computational Linguistics.

Artetxe, M., G. Labaka, and E. Agirre. 2018a. Generalizing and improving bilingual word embedding mappings with a multi-step framework of linear transformations. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pages 5012-5019.

Artetxe, M., G. Labaka, and E. Agirre. 2018b. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789-798. Association for Computational Linguistics.

Artetxe, M., G. Labaka, and E. Agirre. 2018c. Unsupervised statistical machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3632-3642, Brussels, Belgium, October-November. Association for Computational Linguistics.

Artetxe, M., G. Labaka, and E. Agirre. 2019. An effective approach to unsupervised machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics.

Artetxe, M., G. Labaka, E. Agirre, and K. Cho. 2018. Unsupervised neural machine translation. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), April.

Bahdanau, D., K. Cho, and Y. Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv e-prints, abs/1409.0473, September.

Chen, Y., Y. Liu, Y. Cheng, and V. O. Li. 2017. A teacher-student framework for zero-resource neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1925-1935. Association for Computational Linguistics.

Chu, C., R. Dabre, and S. Kurohashi. 2017. An empirical comparison of domain adaptation methods for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 385-391. Association for Computational Linguistics.

He, D., Y. Xia, T. Qin, L. Wang, N. Yu, T.-Y. Liu, and W.-Y. Ma. 2016. Dual learning for machine translation. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 820-828. Curran Associates, Inc.

Koehn, P. and R. Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, pages 28-39. Association for Computational Linguistics.

Lample, G., A. Conneau, L. Denoyer, and M. Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In Proceedings of the 6th International Conference on Learning Representations (ICLR 2018), April.

Lample, G., M. Ott, A. Conneau, L. Denoyer, and M. Ranzato. 2018b. Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5039-5049, Brussels, Belgium, October-November. Association for Computational Linguistics.

Sennrich, R., B. Haddow, and A. Birch. 2016. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86-96, Berlin, Germany, August. Association for Computational Linguistics.