Applying the transformer to character-level transduction
The transformer has been shown to outperform recurrent neural network-based sequence-to-
sequence models in various word-level NLP tasks. Yet for character-level transduction tasks …
A large-scale comparison of historical text normalization systems
M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …
Handling of nonstandard spelling in GRAC
M Shvedova, A Rysin, V Starko - 2021 IEEE 16th International …, 2021 - ieeexplore.ieee.org
GRAC is a large reference corpus of Ukrainian spanning over 200 years. The system of
morphological analysis used to mark up the corpus was originally designed only for modern …
Interactive machine translation for the language modernization and spelling normalization of historical documents
M Domingo, F Casacuberta - Pattern Analysis and Applications, 2023 - Springer
Historical documents are an important part of our cultural heritage. Among other tasks related
to their processing, it is important to modernize their language in order to make them …
Automatic Normalisation of Middle French and its Impact on Productivity
This paper presents a study on automatic normalisation of 16th century documents written in
Middle French. These documents present a large variety of wordforms which require …
Semi-supervised contextual historical text normalization
P Makarov, S Clematide - 2020 - zora.uzh.ch
Historical text normalization, the task of mapping historical word forms to their modern
counterparts, has recently attracted a lot of interest (Bollmann, 2019; Tang et al., 2018; …
Traduction automatique pour la normalisation du français du XVIIe siècle
S Gabay, L Barrault - TALN 2020, 2020 - hal.science
The study of earlier states of the language runs into a twofold problem: on the one hand, the distance
from present-day spelling, which prevents the use of standard NLP solutions, and …
EtymoLink: A Structured English Etymology Dataset
Y Gao, W Sun - Proceedings of the 5th Workshop on …, 2024 - aclanthology.org
Etymology, and the field of lexicography, is often constrained by unstructured data formats
buried in scholarly articles and dictionaries. This paper presents a methodology and an …
Historical text normalization with delayed rewards
Training neural sequence-to-sequence models with simple token-level log-likelihood is now
a standard approach to historical text normalization, albeit often outperformed by phrase …
Enriching character-based neural machine translation with modern documents for achieving an orthography consistency in historical documents
M Domingo, F Casacuberta - New Trends in Image Analysis and …, 2019 - Springer
The nature of human language and the lack of a spelling convention make historical
documents hard to handle for natural language processing. Spelling normalization tackles …