Normalization of historical texts with neural network models

S Wu, R Cotterell, M Hulden - arXiv preprint arXiv:2005.10213, 2020 - arxiv.org

The transformer has been shown to outperform recurrent neural network-based sequence-to-
sequence models in various word-level NLP tasks. Yet for character-level transduction tasks …

Cited by 85 Related articles All 8 versions

[PDF] arxiv.org

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org

There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Cited by 94 Related articles All 7 versions

[PDF] academia.edu

Handling of nonstandard spelling in GRAC

M Shvedova, A Rysin, V Starko - 2021 IEEE 16th International …, 2021 - ieeexplore.ieee.org

GRAC is a large reference corpus of Ukrainian spanning over 200 years. The system of
morphological analysis used to mark up the corpus was originally designed only for modern …

Cited by 7 Related articles All 4 versions

[PDF] springer.com

Interactive machine translation for the language modernization and spelling normalization of historical documents

M Domingo, F Casacuberta - Pattern Analysis and Applications, 2023 - Springer

Historical documents are an important part of our cultural heritage. Among other tasks related
to their processing, it is important to modernize their language in order to make them …

Cited by 1 Related articles All 4 versions

[PDF] aclanthology.org

Automatic Normalisation of Middle French and its Impact on Productivity

R Rubino, S Coram-Mekkey, J Gerlach… - Proceedings of the …, 2024 - aclanthology.org

This paper presents a study on automatic normalisation of 16th century documents written in
Middle French. These documents present a large variety of wordforms which require …

Cited by 2 Related articles All 4 versions

[PDF] uzh.ch

Semi-supervised contextual historical text normalization

P Makarov, S Clematide - 2020 - zora.uzh.ch

Historical text normalization, the task of mapping historical word forms to their modern
counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; …

Cited by 8 Related articles All 3 versions

[PDF] hal.science

Traduction automatique pour la normalisation du français du XVIIe siècle

S Gabay, L Barrault - TALN 2020, 2020 - hal.science

The study of earlier states of a language runs into a twofold problem: on the one hand, the distance
from present-day spelling, which rules out standard NLP solutions, and …

Cited by 10 Related articles All 8 versions

[PDF] aclanthology.org

[PDF] EtymoLink: A Structured English Etymology Dataset

Y Gao, W Sun - Proceedings of the 5th Workshop on …, 2024 - aclanthology.org

Etymology, and the field of lexicography, is often constrained by unstructured data formats
buried in scholarly articles and dictionaries. This paper presents a methodology and an …

Historical text normalization with delayed rewards

S Flachs, M Bollmann, A Søgaard - … of the 57th Annual Meeting of …, 2019 - aclanthology.org

Training neural sequence-to-sequence models with simple token-level log-likelihood is now
a standard approach to historical text normalization, albeit often outperformed by phrase …

Cited by 7 Related articles All 3 versions

[PDF] mdomingo.me

Enriching character-based neural machine translation with modern documents for achieving an orthography consistency in historical documents

M Domingo, F Casacuberta - New Trends in Image Analysis and …, 2019 - Springer

The nature of human language and the lack of a spelling convention make historical
documents hard to handle for natural language processing. Spelling normalization tackles …

Cited by 5 Related articles All 4 versions