Comparison of text preprocessing methods
CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …
Deep learning-based morphological taggers and lemmatizers for annotating historical texts
H Schmid - Proceedings of the 3rd international conference on …, 2019 - dl.acm.org
Part-of-speech tagging, morphological tagging, and lemmatization of historical texts pose
special challenges due to the high spelling variability and the lack of large, high-quality …
Text mining the history of medicine
Historical text archives constitute a rich and diverse source of information, which is
becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it …
An evaluation of neural machine translation models on historical spelling normalization
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …
A multilingual evaluation of three spelling normalisation methods for historical text
We present a multilingual evaluation of approaches for spelling normalisation of historical
text based on data from five languages: English, German, Hungarian, Icelandic, and …
Spelling normalisation and linguistic analysis of historical text for information extraction
E Pettersson - 2016 - diva-portal.org
Abstract Pettersson, E. 2016. Spelling Normalisation and Linguistic Analysis of Historical
Text for Information Extraction. Studia Linguistica Upsaliensia 17. 147 pp. Uppsala: Acta …
Normalization of historical texts with neural network models
M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …
To normalize, or not to normalize: The impact of normalization on part-of-speech tagging
Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical
data? To the best of our knowledge, little is known on the actual impact of normalization in a …
Applying rule-based normalization to different types of historical texts—an evaluation
This paper deals with normalization of language data from Early New High German. We
describe an unsupervised, rule-based approach which maps historical wordforms to modern …
How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts
K Zupan, N Ljubešić, T Erjavec - Natural Language Engineering, 2019 - cambridge.org
Part-of-speech (PoS) tagging of non-standard language with models developed for standard
language is known to suffer from a significant decrease in accuracy. Two methods are …
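The Early New High German entry above describes an unsupervised, rule-based approach that maps historical wordforms to modern ones. The general idea can be sketched in a few lines, assuming a small hand-written list of ordered rewrite rules and a modern lexicon used to accept or reject candidates. Both RULES and LEXICON below are hypothetical, illustrative stand-ins and are not taken from the cited paper, whose actual rules and procedure may differ.

```python
import re

# Hypothetical, illustrative rewrite rules (NOT from the cited paper).
# Each pair maps a historical spelling pattern to its modern form.
RULES = [
    (re.compile(r"^v(?=[^aeiouäöü])"), "u"),  # word-initial v before a consonant: vnd -> und
    (re.compile(r"th"), "t"),                  # thun -> tun
    (re.compile(r"ey"), "ei"),                 # seyn -> sein
]

# Hypothetical modern lexicon used to validate rewritten candidates.
LEXICON = {"und", "tun", "sein", "tier"}


def normalize(word: str) -> str:
    """Apply the rules in order; return the rewrite only if it is a known modern word."""
    candidate = word.lower()
    if candidate in LEXICON:
        return candidate  # already modern spelling
    for pattern, replacement in RULES:
        candidate = pattern.sub(replacement, candidate)
    # Fall back to the original token when no valid modern form is produced.
    return candidate if candidate in LEXICON else word


if __name__ == "__main__":
    for token in ["vnd", "thun", "seyn", "Haus"]:
        print(token, "->", normalize(token))
```

Checking candidates against a lexicon keeps the rules from rewriting tokens they do not cover: "Haus" is left unchanged because no rule yields a word in the (tiny) lexicon.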