Applying the transformer to character-level transduction

S Wu, R Cotterell, M Hulden - arXiv preprint arXiv:2005.10213, 2020 - arxiv.org
The transformer has been shown to outperform recurrent neural network-based sequence-to-
sequence models in various word-level NLP tasks. Yet for character-level transduction tasks …

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Handling of nonstandard spelling in GRAC

M Shvedova, A Rysin, V Starko - 2021 IEEE 16th International …, 2021 - ieeexplore.ieee.org
GRAC is a large reference corpus of Ukrainian spanning over 200 years. The system of
morphological analysis used to mark up the corpus was originally designed only for modern …

Interactive machine translation for the language modernization and spelling normalization of historical documents

M Domingo, F Casacuberta - Pattern Analysis and Applications, 2023 - Springer
Historical documents are an important part of our cultural heritage. Among other tasks related
to their processing, it is important to modernize their language in order to make them …

Automatic Normalisation of Middle French and its Impact on Productivity

R Rubino, S Coram-Mekkey, J Gerlach… - Proceedings of the …, 2024 - aclanthology.org
This paper presents a study on automatic normalisation of 16th century documents written in
Middle French. These documents present a large variety of wordforms which require …

Semi-supervised contextual historical text normalization

P Makarov, S Clematide - 2020 - zora.uzh.ch
Historical text normalization, the task of mapping historical word forms to their modern
counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; …

Traduction automatique pour la normalisation du français du XVIIe siècle

S Gabay, L Barrault - TALN 2020, 2020 - hal.science
The study of earlier states of the language runs up against a twofold problem: on the one hand,
the distance from present-day spelling, which rules out standard NLP solutions, and …

EtymoLink: A Structured English Etymology Dataset

Y Gao, W Sun - Proceedings of the 5th Workshop on …, 2024 - aclanthology.org
Etymology, and the field of lexicography, is often constrained by unstructured data formats
buried in scholarly articles and dictionaries. This paper presents a methodology and an …

Historical text normalization with delayed rewards

S Flachs, M Bollmann, A Søgaard - … of the 57th Annual Meeting of …, 2019 - aclanthology.org
Training neural sequence-to-sequence models with simple token-level log-likelihood is now
a standard approach to historical text normalization, albeit often outperformed by phrase …

Enriching character-based neural machine translation with modern documents for achieving an orthography consistency in historical documents

M Domingo, F Casacuberta - New Trends in Image Analysis and …, 2019 - Springer
The nature of human language and the lack of a spelling convention make historical
documents hard to handle for natural language processing. Spelling normalization tackles …