[BOOK][B] Natural language processing for historical texts

M Piotrowski - 2012 - books.google.com
More and more historical texts are becoming available in digital form. Digitization of paper
documents is motivated by the aim of preserving cultural heritage and making it more …

Keyness: Words, parts-of-speech and semantic categories in the character-talk of Shakespeare's Romeo and Juliet

J Culpeper - International Journal of Corpus Linguistics, 2009 - jbe-platform.com
This paper explores keywords, key part-of-speech categories and key semantic categories
and their role in text analysis. The first part of the paper addresses a set of issues relating to …

[PDF][PDF] VARD2: A tool for dealing with spelling variation in historical corpora

A Baron, P Rayson - 2008 - researchgate.net
When applying corpus linguistic techniques to historical corpora, the corpus researcher
should be cautious about the results obtained. Corpus annotation techniques such as part of …

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

[PDF][PDF] Word frequency and key word statistics in historical corpus linguistics

A Baron, P Rayson, D Archer - Anglistik: International Journal of …, 2009 - academia.edu
Frequency-sorted word lists have long been part of the standard methodology for exploiting
corpora. Sinclair (1991: 30) noted that "anyone studying a text is likely to need to know how …

[BOOK][B] Contemporary corpus linguistics

P Baker - 2012 - books.google.com
Corpus linguistics uses large electronic databases of language to examine hypotheses
about language use. These can be tested scientifically with computerised analytical tools …

Temporal information retrieval

N Kanhabua, A Anand - Proceedings of the 39th International ACM …, 2016 - dl.acm.org
The study of temporal dynamics and its impact can be framed within the so-called temporal
IR approaches, which explain how user behavior, document content and scale vary with …

Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora

P Rayson, DE Archer, A Baron… - Proceedings of the …, 2007 - e-space.mmu.ac.uk
In this paper we focus on automatic part-of-speech (POS) annotation, in the context of
historical English texts. Techniques that were originally developed for modern English have …

An evaluation of neural machine translation models on historical spelling normalization

G Tang, F Cap, E Pettersson, J Nivre - arXiv preprint arXiv:1806.05210, 2018 - arxiv.org
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …

[PDF][PDF] An SMT approach to automatic annotation of historical text

E Pettersson, B Megyesi, J Tiedemann - Proceedings of the Workshop on …, 2013 - ep.liu.se
In this paper we propose an approach to tagging and parsing of historical text, using
character-based SMT methods for translating the historical spelling to a modern spelling …