PolyLM: An open source polyglot large language model

X Wei, H Wei, H Lin, T Li, P Zhang, X Ren, M Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and
generate following natural language instructions. However, the development of LLMs has …

Generative language models for paragraph-level question generation

A Ushio, F Alva-Manchego… - arXiv preprint arXiv …, 2022 - arxiv.org
Powerful generative models have led to recent progress in question generation (QG).
However, it is difficult to measure advances in QG research since there are no standardized …

mGPT: Few-Shot Learners Go Multilingual

O Shliazhko, A Fenogenova, M Tikhonova… - Transactions of the …, 2024 - direct.mit.edu
This paper introduces mGPT, a multilingual variant of GPT-3, pretrained on 61 languages
from 25 linguistically diverse language families using Wikipedia and the C4 Corpus. We …

Text embedding inversion security for multilingual language models

Y Chen, H Lent, J Bjerva - … of the 62nd Annual Meeting of the …, 2024 - aclanthology.org
Textual data is often represented as real-numbered embeddings in NLP, particularly with the
popularity of large language models (LLMs) and Embeddings as a Service (EaaS) …

Interpretable long-form legal question answering with retrieval-augmented large language models

A Louis, G van Dijck, G Spanakis - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Many individuals are likely to face a legal dispute at some point in their lives, but their lack of
understanding of how to navigate these complex issues often renders them vulnerable. The …

IndicNLG benchmark: Multilingual datasets for diverse NLG tasks in Indic languages

A Kumar, H Shrotriya, P Sahu, R Dabre… - arXiv preprint arXiv …, 2022 - arxiv.org
Natural Language Generation (NLG) for non-English languages is hampered by the scarcity
of datasets in these languages. In this paper, we present the IndicNLG Benchmark, a …

Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution

T Li, K Murray - arXiv preprint arXiv:2305.17325, 2023 - arxiv.org
Zero-shot cross-lingual transfer is when a multilingual model is trained to perform a task in
one language and then is applied to another language. Although the zero-shot cross-lingual …

SCALE: Scaling up the complexity for advanced language model evaluation

V Rasiah, R Stern, V Matoshi, M Stürmer… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent strides in Large Language Models (LLMs) have saturated many NLP benchmarks
(even professional domain-specific ones), emphasizing the need for novel, more …

Little Red Riding Hood goes around the globe: Crosslingual story planning and generation with large language models

E Razumovskaia, J Maynez, A Louis, M Lapata… - arXiv preprint arXiv …, 2022 - arxiv.org
Previous work has demonstrated the effectiveness of planning for story generation
exclusively in a monolingual setting focusing primarily on English. We consider whether …

LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

T Santosh, C Weiss, M Grabmair - arXiv preprint arXiv:2410.09527, 2024 - arxiv.org
In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress.
However, existing Legal NLP benchmarks only focus on predictive tasks, overlooking …