Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

Evaluating gender bias of pre-trained language models in natural language inference by considering all labels

P Anantaprayoon, M Kaneko, N Okazaki - arXiv preprint arXiv:2309.09697, 2023 - arxiv.org
Discriminatory social biases, including gender biases, have been found in Pre-trained
Language Models (PLMs). In Natural Language Inference (NLI), recent bias evaluation …

DUMB: A benchmark for smart evaluation of Dutch models

W de Vries, M Wieling, M Nissim - arXiv preprint arXiv:2305.13026, 2023 - arxiv.org
We introduce the Dutch Model Benchmark: DUMB. The benchmark includes a diverse set of
datasets for low-, medium- and high-resource tasks. The total set of nine tasks includes four …

From base to conversational: Japanese instruction dataset and tuning large language models

M Suzuki, M Hirano, H Sakaji - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Instruction tuning is essential for large language models (LLMs) to become interactive. While
many instruction tuning datasets exist in English, there is a noticeable lack in other …

A survey of deep learning techniques for machine reading comprehension

S Kazi, S Khoja, A Daud - Artificial Intelligence Review, 2023 - Springer
Reading comprehension involves the process of reading and understanding textual
information in order to answer questions related to it. It finds practical applications in various …

HAE-RAE Bench: Evaluation of Korean knowledge in language models

G Son, H Lee, S Kim, H Kim, J Lee, JW Yeom… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) trained on massive corpora demonstrate impressive
capabilities in a wide range of tasks. While there are ongoing efforts to adapt these models …

JCoLA: Japanese Corpus of Linguistic Acceptability

T Someya, Y Sugimoto, Y Oseki - arXiv preprint arXiv:2309.12676, 2023 - arxiv.org
Neural language models have exhibited outstanding performance in a range of downstream
tasks. However, there is limited understanding regarding the extent to which these models …

JaColBERT and hard negatives, towards better Japanese-first embeddings for retrieval: Early technical report

B Clavié - arXiv preprint arXiv:2312.16144, 2023 - arxiv.org
Document retrieval in many languages has largely relied on multi-lingual models,
leveraging the vast wealth of English training data. In Japanese, the best performing …

Japanese SimCSE technical report

H Tsukagoshi, R Sasano, K Takeda - arXiv preprint arXiv:2310.19349, 2023 - arxiv.org
We report the development of Japanese SimCSE, Japanese sentence embedding models
fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese …

Aurora-M: The first open source multilingual language model red-teamed according to the US Executive Order

T Nakamura, M Mishra, S Tedeschi, Y Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Pretrained language models underpin several AI applications, but their high computational
cost for training limits accessibility. Initiatives such as BLOOM and StarCoder aim to …