Applying the transformer to character-level transduction
The transformer has been shown to outperform recurrent neural network-based sequence-to-
sequence models in various word-level NLP tasks. Yet for character-level transduction tasks …
A large-scale comparison of historical text normalization systems
M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …
Towards realistic practices in low-resource natural language processing: The development set
Development sets are impractical to obtain for real low-resource languages, since using all
available data for training is often more effective. However, development sets are widely …
An evaluation of neural machine translation models on historical spelling normalization
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …
The Janes project: language resources and tools for Slovene user generated content
The paper presents the results of the Janes project, which aimed to develop language
resources and tools for Slovene user generated content. The paper first describes the 200 …
Automatic normalisation of early Modern French
R Bawden, J Poinhos, E Kogkitsidou… - Proceedings of the …, 2022 - aclanthology.org
Spelling normalisation is a useful step in the study and analysis of historical language texts,
whether it is manual analysis by experts or automatic analysis using downstream natural …
Learning attention for historical text normalization by learning to pronounce
Automated processing of historical texts often relies on pre-normalization to modern word
forms. Training encoder-decoder architectures to solve such problems typically requires a lot …
Normalization of historical texts with neural network models
M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …
Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation
Y Scherrer, N Ljubešić - Proceedings of the 13th conference on …, 2016 - academia.edu
The Swiss German dialect corpus ArchiMob poses great challenges for NLP and
corpus linguistic research due to the massive amount of variation found in the transcriptions …
MoNoise: A multi-lingual and easy-to-use lexical normalization tool
R Van Der Goot - Proceedings of the 57th Annual Meeting of the …, 2019 - aclanthology.org
In this paper, we introduce and demonstrate the online demo as well as the command line
interface of a lexical normalization system (MoNoise) for a variety of languages. We further …