[BOOK][B] Natural language processing for historical texts
M Piotrowski - 2012 - books.google.com
More and more historical texts are becoming available in digital form. Digitization of paper
documents is motivated by the aim of preserving cultural heritage and making it more …
Keyness: Words, parts-of-speech and semantic categories in the character-talk of Shakespeare's Romeo and Juliet
J Culpeper - International Journal of Corpus Linguistics, 2009 - jbe-platform.com
This paper explores keywords, key part-of-speech categories and key semantic categories
and their role in text analysis. The first part of the paper addresses a set of issues relating to …
[PDF][PDF] VARD2: A tool for dealing with spelling variation in historical corpora
When applying corpus linguistic techniques to historical corpora, the corpus researcher
should be cautious about the results obtained. Corpus annotation techniques such as part of …
A large-scale comparison of historical text normalization systems
M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …
[PDF][PDF] Word frequency and key word statistics in historical corpus linguistics
Frequency-sorted word lists have long been part of the standard methodology for exploiting
corpora. Sinclair (1991: 30) noted that "anyone studying a text is likely to need to know how …
[BOOK][B] Contemporary corpus linguistics
P Baker - 2012 - books.google.com
Corpus linguistics uses large electronic databases of language to examine hypotheses
about language use. These can be tested scientifically with computerised analytical tools …
Temporal information retrieval
N Kanhabua, A Anand - Proceedings of the 39th International ACM …, 2016 - dl.acm.org
The study of temporal dynamics and its impact can be framed within the so-called temporal
IR approaches, which explain how user behavior, document content and scale vary with …
Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora
In this paper we focus on automatic part-of-speech (POS) annotation, in the context of
historical English texts. Techniques that were originally developed for modern English have …
An evaluation of neural machine translation models on historical spelling normalization
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …
[PDF][PDF] An SMT approach to automatic annotation of historical text
In this paper we propose an approach to tagging and parsing of historical text, using
character-based SMT methods for translating the historical spelling to a modern spelling …