Strong baselines for neural semi-supervised learning under domain shift

S Ruder, B Plank - arXiv preprint arXiv:1804.09530, 2018 - arxiv.org
Novel neural models have been proposed in recent years for learning under domain shift.
Most models, however, only evaluate on a single task, on proprietary datasets, or compare …

MultiLexNorm: A shared task on multilingual lexical normalization

R Van Der Goot, A Ramponi, A Zubiaga… - Seventh Workshop on …, 2021 - pure.itu.dk
Lexical normalization is the task of transforming an utterance into its standardized form. This
task is beneficial for downstream analysis, as it provides a way to harmonize (often …

Enhancing BERT for lexical normalization

B Muller, B Sagot, D Seddah - The 5th workshop on noisy user …, 2019 - inria.hal.science
Language model-based pre-trained representations have become ubiquitous in natural
language processing. They have been shown to significantly improve the performance of …

Graph-based Turkish text normalization and its impact on noisy text processing

S Demir, B Topcu - Engineering Science and Technology, an International …, 2022 - Elsevier
User generated texts on the web are freely-available and lucrative sources of data for
language technology researchers. Unfortunately, these texts are often dominated by …

Dialect text normalization to normative standard Finnish

N Partanen, M Hämäläinen… - Workshop on Noisy …, 2019 - researchportal.helsinki.fi
We compare different LSTMs and transformer models in terms of their effectiveness in
normalizing dialectal Finnish into the normative standard Finnish. As dialect is the common …

MoNoise: Modeling noise using a modular normalization system

R van der Goot, G van Noord - arXiv preprint arXiv:1710.03476, 2017 - arxiv.org
We propose MoNoise: a normalization model focused on generalizability and efficiency, it
aims at being easily reusable and adaptable. Normalization is the task of translating texts …

Rule-based text normalization for Malay social media texts

SNAN Ariffin, S Tiun - International Journal of Advanced …, 2020 - pdfs.semanticscholar.org
Malay social media text is a text written on social media networks like Twitter. Commonly,
this text comprises nonstandard words, filled with dialects, foreign languages, word …

Noise-robust morphological disambiguation for dialectal Arabic

N Zalmout, A Erdmann, N Habash - … of the 2018 Conference of the …, 2018 - aclanthology.org
User-generated text tends to be noisy with many lexical and orthographic inconsistencies,
making natural language processing (NLP) tasks more challenging. The challenging nature …

Lexical normalization for code-switched data and its effect on POS-tagging

R Van Der Goot, Ö Çetinoğlu - arXiv preprint arXiv:2006.01175, 2020 - arxiv.org
Lexical normalization, the translation of non-canonical data to standard language, has
shown to improve the performance of many natural language processing tasks on social …

Annotating Norwegian language varieties on Twitter for part-of-speech

P Mæhlum, A Kåsen, S Touileb, J Barnes - arXiv preprint arXiv …, 2022 - arxiv.org
Norwegian Twitter data poses an interesting challenge for Natural Language Processing
(NLP) tasks. These texts are difficult for models trained on standardized text in one of the two …