AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

A survey of text representation and embedding techniques in NLP

R Patil, S Boit, V Gudivada, J Nandigam - IEEE Access, 2023 - ieeexplore.ieee.org
Natural Language Processing (NLP) is a research field where the language under consideration
is processed to understand its syntactic, semantic, and sentimental aspects. The …

MultiEURLEX – A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer

I Chalkidis, M Fergadiotis, I Androutsopoulos - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce MULTI-EURLEX, a new multilingual dataset for topic classification of legal
documents. The dataset comprises 65k European Union (EU) laws, officially translated in 23 …

End-to-end transformer-based models in textual-based NLP

A Rahali, MA Akhloufi - AI, 2023 - mdpi.com
Transformer architectures are highly expressive because they use self-attention
mechanisms to encode long-range dependencies in the input sequences. In this paper, we …

Sabiá: Portuguese large language models

R Pires, H Abonizio, TS Almeida… - Brazilian Conference on …, 2023 - Springer
As the capabilities of language models continue to advance, it is conceivable that a "one-size-fits-all"
model will remain as the main paradigm. For instance, given the vast number of …

Gender bias in masked language models for multiple languages

M Kaneko, A Imankulova, D Bollegala… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked Language Models (MLMs) pre-trained by predicting masked tokens on large
corpora have been used successfully in natural language processing tasks for a variety of …

Advancing neural encoding of Portuguese with Transformer Albertina PT

J Rodrigues, L Gomes, J Silva, A Branco… - EPIA Conference on …, 2023 - Springer
To advance the neural encoding of Portuguese (PT), and a fortiori the technological
preparation of this language for the digital age, we developed a Transformer-based …

Findings of the TSAR-2022 shared task on multilingual lexical simplification

H Saggion, S Štajner, D Ferrés, KC Sheang… - arXiv preprint arXiv …, 2023 - arxiv.org
We report findings of the TSAR-2022 shared task on multilingual lexical simplification,
organized as part of the Workshop on Text Simplification, Accessibility, and Readability …

Findings of the VarDial evaluation campaign 2023

N Aepli, Ç Çöltekin, R van der Goot… - arXiv preprint arXiv …, 2023 - arxiv.org
This report presents the results of the shared tasks organized as part of the VarDial
Evaluation Campaign 2023. The campaign is part of the tenth workshop on Natural …

Lexical simplification benchmarks for English, Portuguese, and Spanish

S Štajner, D Ferrés, M Shardlow, K North… - Frontiers in Artificial …, 2022 - frontiersin.org
Even in highly-developed countries, as many as 15–30% of the population can only
understand texts written using a basic vocabulary. Their understanding of everyday texts is …