Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

Deep learning-based morphological taggers and lemmatizers for annotating historical texts

H Schmid - Proceedings of the 3rd international conference on …, 2019 - dl.acm.org
Part-of-speech tagging, morphological tagging, and lemmatization of historical texts pose
special challenges due to the high spelling variability and the lack of large, high-quality …

Text mining the history of medicine

P Thompson, RT Batista-Navarro, G Kontonatsios… - PloS one, 2016 - journals.plos.org
Historical text archives constitute a rich and diverse source of information, which is
becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it …

An evaluation of neural machine translation models on historical spelling normalization

G Tang, F Cap, E Pettersson, J Nivre - arXiv preprint arXiv:1806.05210, 2018 - arxiv.org
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …

A multilingual evaluation of three spelling normalisation methods for historical text

E Pettersson, B Megyesi, J Nivre - Proceedings of the 8th …, 2014 - aclanthology.org
We present a multilingual evaluation of approaches for spelling normalisation of historical
text based on data from five languages: English, German, Hungarian, Icelandic, and …

Spelling normalisation and linguistic analysis of historical text for information extraction

E Pettersson - 2016 - diva-portal.org
Pettersson, E. 2016. Spelling Normalisation and Linguistic Analysis of Historical
Text for Information Extraction. Studia Linguistica Upsaliensia 17. 147 pp. Uppsala: Acta …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

To normalize, or not to normalize: The impact of normalization on part-of-speech tagging

R Van der Goot, B Plank, M Nissim - arXiv preprint arXiv:1707.05116, 2017 - arxiv.org
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical
data? To the best of our knowledge, little is known on the actual impact of normalization in a …

Applying rule-based normalization to different types of historical texts—an evaluation

M Bollmann, F Petran, S Dipper - … 2011, Poznań, Poland, November 25–27 …, 2014 - Springer
This paper deals with normalization of language data from Early New High German. We
describe an unsupervised, rule-based approach which maps historical wordforms to modern …

How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts

K Zupan, N Ljubešić, T Erjavec - Natural Language Engineering, 2019 - cambridge.org
Part-of-speech (PoS) tagging of non-standard language with models developed for standard
language is known to suffer from a significant decrease in accuracy. Two methods are …