EstBERT: A pretrained language-specific BERT for Estonian

H Tanvir, C Kittask, S Eiche, K Sirts - arXiv preprint arXiv:2011.04784, 2020 - arxiv.org
This paper presents EstBERT, a large pretrained transformer-based language-specific
BERT model for Estonian. Recent work has evaluated multilingual BERT models on …

Named entity recognition in Estonian 19th century parish court records

S Orasmaa, K Muischnek, K Poska… - Proceedings of the …, 2022 - aclanthology.org
This paper presents a new historical language resource, a corpus of Estonian Parish Court
records from the years 1821-1920, annotated for named entities (NE), and reports on named …

Estonian Named Entity Recognition: New Datasets and Models

K Sirts - Proceedings of the 24th Nordic Conference on …, 2023 - aclanthology.org
This paper presents the annotation process of two Estonian named entity recognition (NER)
datasets, involving the creation of annotation guidelines for labeling eleven different types of …

Enhancing Multilingual Information Extraction Towards Global Linguistic Inclusivity

M Nguyen - 2024 - search.proquest.com
In our interconnected world, the diversity of around 7,000 languages presents challenges
and opportunities for bridging language barriers. Multilingual information extraction …

Sentiment Analysis of Customer Emails Using BERT

KL Langli - 2023 - ntnuopen.ntnu.no
In recent years, language models have become very popular, and they are currently used
to solve various tasks in natural language processing (NLP). Many companies have large …

Estonian Language Understanding: a Case Study on the COPA Task

HA Kuulmets, A Tattar, M Fishel - Baltic Journal of Modern Computing, 2022 - bjmc.lu.lv
The lack of Estonian NLU datasets severely affects advancing Estonian-specific NLP
research. With this paper we aim to relieve the issue by publishing a new Estonian NLU …

Catching lexemes: The case of Estonian noun-based ambiforms

G Paulsen, E Vainik, A Lohk, M Tuulik - eLex 2021 conference: Post …, 2021 - diva-portal.org
The aim of this study is to test a statistic relying on corpus data, the distributional index (D-
index): a statistical benchmark that helps lexicographers judge if a morphological form has …

Czech Language Processing Using Contextualized Representations

P Vysušilová - 2021 - dspace.cuni.cz
With the growing volume of data, especially unstructured text, the importance of natural
language processing is increasing. The state-of-the-art technologies of recent years are neural networks …