Evaluating large language models: A comprehensive survey
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …
Evaluating gender bias of pre-trained language models in natural language inference by considering all labels
P Anantaprayoon, M Kaneko, N Okazaki - arXiv preprint arXiv:2309.09697, 2023 - arxiv.org
Discriminatory social biases, including gender biases, have been found in Pre-trained
Language Models (PLMs). In Natural Language Inference (NLI), recent bias evaluation …
DUMB: A benchmark for smart evaluation of Dutch models
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of
datasets for low-, medium- and high-resource tasks. The total set of nine tasks includes four …
From base to conversational: Japanese instruction dataset and tuning large language models
Instruction tuning is essential for large language models (LLMs) to become interactive. While
many instruction tuning datasets exist in English, there is a noticeable lack in other …
A survey of deep learning techniques for machine reading comprehension
Reading comprehension involves the process of reading and understanding textual
information in order to answer questions related to it. It finds practical applications in various …
HAE-RAE Bench: Evaluation of Korean knowledge in language models
Large Language Models (LLMs) trained on massive corpora demonstrate impressive
capabilities in a wide range of tasks. While there are ongoing efforts to adapt these models …
JCoLA: Japanese Corpus of Linguistic Acceptability
Neural language models have exhibited outstanding performance in a range of downstream
tasks. However, there is limited understanding regarding the extent to which these models …
JaColBERT and hard negatives, towards better Japanese-first embeddings for retrieval: Early technical report
B Clavié - arXiv preprint arXiv:2312.16144, 2023 - arxiv.org
Document retrieval in many languages has been largely relying on multi-lingual models,
and leveraging the vast wealth of English training data. In Japanese, the best performing …
Japanese SimCSE technical report
We report the development of Japanese SimCSE, Japanese sentence embedding models
fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese …
Aurora-M: The first open-source multilingual language model red-teamed according to the US Executive Order
Pretrained language models underpin several AI applications, but their high computational
cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to …