Automatic genre identification: a survey
T Kuzman, N Ljubešić - Language Resources and Evaluation, 2023 - Springer
Automatic genre identification (AGI) is a text classification task focused on genres, i.e., text
categories defined by the author's purpose, common function of the text, and the text's …
Automatic genre identification for robust enrichment of massive text collections: Investigation of classification methods in the era of large language models
Massive text collections are the backbone of large language models, the main ingredient of
the current significant progress in artificial intelligence. However, as these collections are …
Exploring predictive uncertainty and calibration in NLP: A study on the impact of method & data scarcity
We investigate the problem of determining the predictive confidence (or, conversely,
uncertainty) of a neural classifier through the lens of low-resource languages. By training …
Camel Treebank: An open multi-genre Arabic dependency treebank
Abstract We present the Camel Treebank (CAMELTB), a 188K word open-source
dependency treebank of Modern Standard and Classical Arabic. CAMELTB 1.0 includes 13 …
The GINCO training dataset for web genre identification of documents out in the wild
This paper presents a new training dataset for automatic genre identification GINCO, which
is based on 1,125 crawled Slovenian web documents that consist of 650 thousand words …
Are UD treebanks getting more consistent? a report card for English UD
A Zeldes, N Schneider - arXiv preprint arXiv:2302.00636, 2023 - arxiv.org
Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies
project raise the expectation that joint training and dataset comparison is increasingly …
A finite-state morphological analyser for Highland Puebla Nahuatl
This paper describes the development of a free/open-source finite-state
morphological transducer for Highland Puebla Nahuatl, a Uto-Aztecan language spoken in …
Training and evaluation of vector models for Galician
M Garcia - Language Resources and Evaluation, 2024 - Springer
This paper presents a large and systematic assessment of distributional models for Galician.
To this end, we have first trained and evaluated static word embeddings (e.g., word2vec …
On Uncertainty In Natural Language Processing
D Ulmer - arXiv preprint arXiv:2410.03446, 2024 - arxiv.org
The last decade in deep learning has brought on increasingly capable systems that are
deployed on a wide variety of applications. In natural language processing, the field has …
ParlaMint-It: an 18-karat UD treebank of Italian parliamentary speeches
C Alzetta, S Montemagni, M Sartor… - Language Resources and …, 2024 - Springer
Abstract The paper presents ParlaMint-It, a new treebank of Italian parliamentary debates,
linguistically annotated based on the Universal Dependencies (UD) framework. The …