IndoNLU: Benchmark and resources for evaluating Indonesian natural language understanding

B Wilie, K Vincentio, GI Winata, S Cahyawijaya… - arXiv preprint arXiv …, 2020 - arxiv.org
Although Indonesian is known to be the fourth most frequently used language on the
internet, research progress on this language in natural language processing (NLP) is …

Multilingual is not enough: BERT for Finnish

A Virtanen, J Kanerva, R Ilo, J Luoma… - arXiv preprint arXiv …, 2019 - arxiv.org
Deep learning-based language models pretrained on large unannotated text corpora have
been demonstrated to allow efficient transfer learning for natural language processing, with …

CoNLL 2018 shared task: Multilingual parsing from raw text to universal dependencies

D Zeman, J Hajic, M Popel, M Potthast… - Proceedings of the …, 2018 - aclanthology.org
Every year, the Conference on Computational Natural Language Learning (CoNLL) features
a shared task, in which participants train and test their learning systems on the same data …

Turku neural parser pipeline: An end-to-end system for the CoNLL 2018 shared task

J Kanerva, F Ginter, N Miekka, A Leino… - Proceedings of the …, 2018 - aclanthology.org
In this paper we describe the TurkuNLP entry at the CoNLL 2018 Shared Task on
Multilingual Parsing from Raw Text to Universal Dependencies. Compared to the last year …

An improved neural network model for joint POS tagging and dependency parsing

DQ Nguyen, K Verspoor - arXiv preprint arXiv:1807.03955, 2018 - arxiv.org
We propose a novel neural network model for joint part-of-speech (POS) tagging and
dependency parsing. Our model extends the well-known BIST graph-based dependency …

Polyglot contextual representations improve crosslingual transfer

P Mulcaire, J Kasai, NA Smith - arXiv preprint arXiv:1902.09697, 2019 - arxiv.org
We introduce Rosita, a method to produce multilingual contextual word representations by
training a single language model on text from multiple languages. Our method combines the …

Viable dependency parsing as sequence labeling

M Strzyz, D Vilares, C Gómez-Rodríguez - arXiv preprint arXiv:1902.10505, 2019 - arxiv.org
We recast dependency parsing as a sequence labeling problem, exploring several
encodings of dependency trees as labels. While dependency parsing by means of …

What does neural bring? Analysing improvements in morphosyntactic annotation and lemmatisation of Slovenian, Croatian and Serbian

N Ljubešić, K Dobrovoljc - Proceedings of the 7th workshop on …, 2019 - aclanthology.org
We present experiments on Slovenian, Croatian and Serbian morphosyntactic annotation
and lemmatisation between the former state-of-the-art for these three languages and one of …

Universal Lemmatizer: A sequence-to-sequence model for lemmatizing Universal Dependencies treebanks

J Kanerva, F Ginter, T Salakoski - Natural Language Engineering, 2021 - cambridge.org
In this paper, we present a novel lemmatization method based on a sequence-to-sequence
neural network architecture and morphosyntactic context representation. In the proposed …

A broad-coverage corpus for Finnish named entity recognition

J Luoma, M Oinonen, M Pyykönen… - Proceedings of the …, 2020 - aclanthology.org
We present a new manually annotated corpus for broad-coverage named entity recognition
for Finnish. Building on the original Universal Dependencies Finnish corpus of 754 …