AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

A survey of text representation and embedding techniques in NLP

R Patil, S Boit, V Gudivada, J Nandigam - IEEE Access, 2023 - ieeexplore.ieee.org
Natural Language Processing (NLP) is a research field where the language under consideration
is processed to understand its syntactic, semantic, and sentimental aspects. The …

MultiEURLEX – A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

I Chalkidis, M Fergadiotis, I Androutsopoulos - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal
documents. The dataset comprises 65k European Union (EU) laws, officially translated in 23 …

End-to-end transformer-based models in textual-based NLP

A Rahali, MA Akhloufi - AI, 2023 - mdpi.com
Transformer architectures are highly expressive because they use self-attention
mechanisms to encode long-range dependencies in the input sequences. In this paper, we …

Sabiá: Portuguese large language models

R Pires, H Abonizio, TS Almeida… - Brazilian Conference on …, 2023 - Springer
As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all"
model will remain as the main paradigm. For instance, given the vast number of …

Gender bias in masked language models for multiple languages

M Kaneko, A Imankulova, D Bollegala… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked Language Models (MLMs) pre-trained by predicting masked tokens on large
corpora have been used successfully in natural language processing tasks for a variety of …

Advancing neural encoding of Portuguese with Transformer Albertina PT

J Rodrigues, L Gomes, J Silva, A Branco… - EPIA Conference on …, 2023 - Springer
To advance the neural encoding of Portuguese (PT), and a fortiori the technological
preparation of this language for the digital age, we developed a Transformer-based …

Findings of the TSAR-2022 shared task on multilingual lexical simplification

H Saggion, S Štajner, D Ferrés, KC Sheang… - arXiv preprint arXiv …, 2023 - arxiv.org
We report findings of the TSAR-2022 shared task on multilingual lexical simplification,
organized as part of the Workshop on Text Simplification, Accessibility, and Readability …

Findings of the VarDial evaluation campaign 2023

N Aepli, Ç Çöltekin, R van der Goot… - arXiv preprint arXiv …, 2023 - arxiv.org
This report presents the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural …

Lexical simplification benchmarks for English, Portuguese, and Spanish

S Štajner, D Ferrés, M Shardlow, K North… - Frontiers in Artificial …, 2022 - frontiersin.org
Even in highly-developed countries, as many as 15–30% of the population can only
understand texts written using a basic vocabulary. Their understanding of everyday texts is …