Applying the transformer to character-level transduction

S Wu, R Cotterell, M Hulden - arXiv preprint arXiv:2005.10213, 2020 - arxiv.org
The transformer has been shown to outperform recurrent neural network-based sequence-to-
sequence models in various word-level NLP tasks. Yet for character-level transduction tasks …

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Towards realistic practices in low-resource natural language processing: The development set

K Kann, K Cho, SR Bowman - arXiv preprint arXiv:1909.01522, 2019 - arxiv.org
Development sets are impractical to obtain for real low-resource languages, since using all
available data for training is often more effective. However, development sets are widely …

An evaluation of neural machine translation models on historical spelling normalization

G Tang, F Cap, E Pettersson, J Nivre - arXiv preprint arXiv:1806.05210, 2018 - arxiv.org
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …

The Janes project: language resources and tools for Slovene user generated content

D Fišer, N Ljubešić, T Erjavec - Language resources and evaluation, 2020 - Springer
The paper presents the results of the Janes project, which aimed to develop language
resources and tools for Slovene user generated content. The paper first describes the 200 …

Automatic normalisation of early Modern French

R Bawden, J Poinhos, E Kogkitsidou… - Proceedings of the …, 2022 - aclanthology.org
Spelling normalisation is a useful step in the study and analysis of historical language texts,
whether it is manual analysis by experts or automatic analysis using downstream natural …

Learning attention for historical text normalization by learning to pronounce

M Bollmann, J Bingel, A Søgaard - … of the 55th Annual Meeting of …, 2017 - aclanthology.org
Automated processing of historical texts often relies on pre-normalization to modern word
forms. Training encoder-decoder architectures to solve such problems typically requires a lot …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation

Y Scherrer, N Ljubešić - Proceedings of the 13th conference on …, 2016 - academia.edu
The Swiss German dialect corpus ArchiMob poses great challenges for NLP and
corpus linguistic research due to the massive amount of variation found in the transcriptions …

MoNoise: A multi-lingual and easy-to-use lexical normalization tool

R van der Goot - Proceedings of the 57th Annual Meeting of the …, 2019 - aclanthology.org
In this paper, we introduce and demonstrate the online demo as well as the command line
interface of a lexical normalization system (MoNoise) for a variety of languages. We further …