JSI and WüNLP at the DIALECT-COPA Shared Task: In-Context Learning From Just a Few Dialectal Examples Gets You Quite Far

N Ljubešić, T Kuzman, P Rupnik, I Vulić… - Proceedings of the …, 2024 - aclanthology.org
The paper presents the JSI and WüNLP systems submitted to the DIALECT-COPA shared
task on causal commonsense reasoning in dialectal texts. Jointly, we compare LLM-based …

Do LLMs learn a true syntactic universal?

J Hale, M Stanojević - Proceedings of the 2024 Conference on …, 2024 - aclanthology.org
Do large multilingual language models learn language universals? We consider a
candidate universal much-discussed in the linguistics literature, the Final-over-Final …

ParlaMint II: advancing comparable parliamentary corpora across Europe

T Erjavec, M Kopp, N Ljubešić, T Kuzman… - Language Resources …, 2024 - Springer
The paper presents the results of the ParlaMint II project, which comprise comparable
corpora of parliamentary debates of 29 European countries and autonomous regions …

The ParlaSpeech collection of automatically generated speech and text datasets from parliamentary proceedings

N Ljubešić, P Rupnik, D Koržinek - International Conference on Speech …, 2024 - Springer
Recent significant improvements in speech and language technologies come both from
self-supervised approaches over raw language data as well as various types of explicit …

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

N Ljubešić, T Kuzman - arXiv preprint arXiv:2403.12721, 2024 - arxiv.org
This paper presents a collection of highly comparable web corpora of Slovenian, Croatian,
Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, covering thereby the whole …

Slovenian parliamentary corpus siParl

K Meden, T Erjavec, A Pančur - Language Resources and Evaluation, 2024 - Springer
Parliamentary debates represent an essential part of democratic discourse and provide
insights into various socio-demographic and linguistic phenomena - parliamentary corpora …

CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route

N Ljubešić, T Kuzman, IF Petrović, J Parizoska… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces the CLASSLA-Express workshop series as an innovative approach to
disseminating linguistic resources and infrastructure provided by the CLASSLA Knowledge …

Classification of Lyric Poetry Written in Serbian

V Kadić, S Milanović, V Batanović - 2024 32nd …, 2024 - ieeexplore.ieee.org
In terms of natural language processing, Serbian belongs to low-resource languages, with a
small number of available datasets and tools. In this paper, we present a novel poem …

Dependency parser for Bulgarian

A Atanasov - Proceedings of the Sixth International Conference …, 2024 - aclanthology.org
This paper delves into the implementation of a Biaffine Attention Model, a sophisticated
neural network architecture employed for dependency parsing tasks. Proposed by Dozat …

Gos 2: A New Reference Corpus of Spoken Slovenian

D Verdonik, K Dobrovoljc, T Erjavec… - Proceedings of the …, 2024 - aclanthology.org
This paper introduces a new version of the Gos reference corpus of spoken Slovenian,
which was recently extended to more than double the original size (300 hours, 2.4 million …