Named entity recognition and classification in historical documents: A survey

M Ehrmann, A Hamdi, EL Pontes, M Romanello… - ACM Computing …, 2023 - dl.acm.org
After decades of massive digitisation, an unprecedented number of historical documents are
available in digital format, along with their machine-readable texts. While this represents a …

Data-Efficient French Language Modeling with CamemBERTa

W Antoun, B Sagot, D Seddah - arXiv preprint arXiv:2306.01497, 2023 - arxiv.org
Recent advances in NLP have significantly improved the performance of language models
on a variety of tasks. While these advances are largely driven by the availability of large …

BERToldo, the Historical BERT for Italian

A Palmero Aprosio, S Menini, S Tonelli - Proceedings of the Second …, 2022 - cris.fbk.eu
Recent works in historical language processing have shown that transformer-based models
can be successfully created using historical corpora, and that using them for analysing and …

Historical Portuguese corpora: a survey

TF Osório, H Lopes Cardoso - Language Resources and Evaluation, 2024 - Springer
This survey aims to thoroughly examine and evaluate the current landscape of electronic
corpora in historical Portuguese. This is achieved through a comprehensive analysis of …

The Moderniſa Project: Orthographic Modernization of Spanish Golden Age Dramas with Language Models

J De la Rosa, Á Cuéllar, J Lehmann - Anuario Lope de Vega Texto …, 2024 - revistes.uab.cat
The growing application of computational methods to Spanish Golden Age literature has
revealed the need to automate the modernization of texts in order to facilitate their …

Domain-Adapting BERT for Attributing Manuscript, Century and Region in Pre-Modern Slavic Texts

P Lendvai, U Reichel, A Jouravel… - Proceedings of the 4th …, 2023 - aclanthology.org
Our study presents a stratified dataset compiled from six different Slavic bodies of text, for
cross-linguistic and diachronic analyses of Slavic Pre-Modern language variants. We …

Uso de IA generativa como herramienta de inducción a la programación en carreras STEAM

CI Pairetti, GL Rodríguez… - Memorias de las …, 2023 - publicaciones.sadio.org.ar
In this work we present the general guidelines of a teaching methodology that
incorporates the use of Generative Artificial Intelligence (IAG) in the teaching of …

Corpus studies of language through time: Introduction to the special issue

T McEnery, G Brookes, I Clarke - International Journal of Corpus …, 2022 - jbe-platform.com
The study of language through time has long been an area where the corpus approach to
the analysis of language has been an important method. The possibility of using other …

A Data-driven Approach to Natural Language Processing for Contemporary and Historical French

PO Suarez - 2022 - theses.hal.science
In recent years, neural methods for Natural Language Processing (NLP) have consistently
and repeatedly improved the state of the art in a wide variety of NLP tasks. One of the main …

A Workflow for HTR-Postprocessing, Labeling and Classifying Diachronic and Regional Variation in Pre-Modern Slavic Texts

P Lendvai, M van Gompel, A Jouravel… - Proceedings of the …, 2024 - aclanthology.org
We describe ongoing work for developing a workflow for the applied use case of classifying
diachronic and regional language variation in Pre-Modern Slavic texts. The data were …