A comparison of the UCREL variant detector and modern spell checkers on English historical corpora

M Piotrowski - 2012 - books.google.com

More and more historical texts are becoming available in digital form. Digitization of paper
documents is motivated by the aim of preserving cultural heritage and making it more …

Cited by 320 · Related articles · All 6 versions

[PDF] researchgate.net

Keyness: Words, parts-of-speech and semantic categories in the character-talk of Shakespeare's Romeo and Juliet

J Culpeper - International Journal of Corpus Linguistics, 2009 - jbe-platform.com

This paper explores keywords, key part-of-speech categories and key semantic categories
and their role in text analysis. The first part of the paper addresses a set of issues relating to …

Cited by 381 · Related articles · All 9 versions

[PDF] researchgate.net

[PDF] VARD2: A tool for dealing with spelling variation in historical corpora

A Baron, P Rayson - 2008 - researchgate.net

When applying corpus linguistic techniques to historical corpora, the corpus researcher
should be cautious about the results obtained. Corpus annotation techniques such as part of …

Cited by 258 · Related articles · All 4 versions

[PDF] arxiv.org

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org

There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Cited by 90 · Related articles · All 7 versions

[PDF] academia.edu

[PDF] Word frequency and key word statistics in historical corpus linguistics

A Baron, P Rayson, D Archer - Anglistik: International Journal of …, 2009 - academia.edu

Frequency-sorted word lists have long been part of the standard methodology for exploiting
corpora. Sinclair (1991: 30) noted that "anyone studying a text is likely to need to know how …

Cited by 199 · Related articles · All 5 versions

[BOOK] Contemporary corpus linguistics

P Baker - 2012 - books.google.com

Corpus linguistics uses large electronic databases of language to examine hypotheses
about language use. These can be tested scientifically with computerised analytical tools …

Cited by 169 · Related articles · All 4 versions

[PDF] nowpublishers.com

Temporal information retrieval

N Kanhabua, A Anand - Proceedings of the 39th International ACM …, 2016 - dl.acm.org

The study of temporal dynamics and its impact can be framed within the so-called temporal
IR approaches, which explain how user behavior, document content and scale vary with …

Cited by 106 · Related articles · All 14 versions

[PDF] mmu.ac.uk

Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora

P Rayson, DE Archer, A Baron… - Proceedings of the …, 2007 - e-space.mmu.ac.uk

In this paper we focus on automatic part-of-speech (POS) annotation, in the context of
historical English texts. Techniques that were originally developed for modern English have …

Cited by 120 · Related articles · All 10 versions

[PDF] arxiv.org

An evaluation of neural machine translation models on historical spelling normalization

G Tang, F Cap, E Pettersson, J Nivre - arXiv preprint arXiv:1806.05210, 2018 - arxiv.org

In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …

Cited by 50 · Related articles · All 6 versions

[PDF] liu.se

[PDF] An SMT approach to automatic annotation of historical text

E Pettersson, B Megyesi, J Tiedemann - Proceedings of the Workshop on …, 2013 - ep.liu.se

In this paper we propose an approach to tagging and parsing of historical text, using
character-based SMT methods for translating the historical spelling to a modern spelling …

Cited by 79 · Related articles · All 7 versions