[BOOK] Corpus linguistics: Method, theory and practice

T McEnery, A Hardie - 2011 - books.google.com
Corpus linguistics is the study of language data on a large scale: the computer-aided
analysis of very extensive collections of transcribed utterances or written texts. This textbook …

Shakespearizing modern language using copy-enriched sequence-to-sequence models

H Jhamtani, V Gangal, E Hovy, E Nyberg - arXiv preprint arXiv:1707.01161, 2017 - arxiv.org
Variations in writing styles are commonly used to adapt the content to a specific context,
audience, or purpose. However, applying stylistic variations is still by and large a manual …

[PDF] VARD2: A tool for dealing with spelling variation in historical corpora

A Baron, P Rayson - 2008 - researchgate.net
When applying corpus linguistic techniques to historical corpora, the corpus researcher
should be cautious about the results obtained. Corpus annotation techniques such as part of …

[PDF] Word frequency and key word statistics in historical corpus linguistics

A Baron, P Rayson, D Archer - Anglistik: International Journal of …, 2009 - academia.edu
Frequency-sorted word lists have long been part of the standard methodology for exploiting
corpora. Sinclair (1991: 30) noted that "anyone studying a text is likely to need to know how …
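For readers unfamiliar with the technique, a frequency-sorted word list can be sketched in a few lines of Python. This is a minimal illustration only, not the method of Baron, Rayson and Archer: the function name `frequency_list` is hypothetical, and it assumes a naive tokenizer (lowercase, runs of ASCII letters and apostrophes) with no spelling normalisation.

```python
import re
from collections import Counter


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs sorted by descending frequency.

    Hypothetical helper: tokenization is a naive assumption (lowercase,
    runs of ASCII letters/apostrophes), with no spelling normalisation.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Highest frequency first; ties broken alphabetically for stable output
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too."
    for word, freq in frequency_list(sample):
        print(word, freq)
```

Because no normalisation is applied, historical variant spellings of the same word would be counted as separate types, which is one reason spelling variation matters when building frequency lists from historical corpora.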

[BOOK] Contemporary corpus linguistics

P Baker - 2012 - books.google.com
Corpus linguistics uses large electronic databases of language to examine hypotheses
about language use. These can be tested scientifically with computerised analytical tools …

[BOOK] Corpus linguistics for online communication: A guide for research

L Collins - 2019 - taylorfrancis.com
Corpus Linguistics for Online Communication provides an instructive and practical guide to
conducting research using methods in corpus linguistics in studies of various forms of online …

Deep learning-based morphological taggers and lemmatizers for annotating historical texts

H Schmid - Proceedings of the 3rd international conference on …, 2019 - dl.acm.org
Part-of-speech tagging, morphological tagging, and lemmatization of historical texts pose
special challenges due to the high spelling variability and the lack of large, high-quality …

[HTML] The electronic corpus of 17th- and 18th-century Polish texts

W Gruszczyński, D Adamiec, R Bronikowska… - Language Resources …, 2022 - Springer
The paper describes the process of building the electronic corpus of 17th- and 18th-century
Polish texts, a relatively large, balanced, structurally and morphologically annotated …

Guidelines for normalising Early Modern English corpora: Decisions and justifications

D Archer, M Kytö, A Baron, P Rayson - ICAME Journal, 2015 - sciendo.com
Corpora of Early Modern English have been collected and released for research for
a number of years. With large scale digitisation activities gathering pace in the last decade …

Creation of an annotated corpus of Old and Middle Hungarian court records and private correspondence

A Novák, K Gugán, M Varga, A Dömötör - Language Resources and …, 2018 - Springer
The paper introduces a novel annotated corpus of Old and Middle Hungarian (16th–18th
centuries), the texts of which were selected in order to approximate the vernacular of the given …