Understanding and Mitigating Language Confusion in LLMs

K Marchisio, WY Ko, A Bérard, T Dehaze… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate a surprising limitation of LLMs: their inability to consistently generate text in a
user's desired language. We create the Language Confusion Benchmark (LCB) to evaluate …
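
The general idea of measuring language confusion can be illustrated with a minimal check that a model's reply is in the language the user requested. This is a sketch only, not the LCB protocol from the paper; it assumes the `langdetect` package and uses a made-up `responses` mapping.

```python
# Minimal sketch (not the LCB protocol): flag replies whose detected
# language differs from the language the user asked for.
# Assumes `pip install langdetect`; the `responses` data below is made up.
from langdetect import detect

responses = {
    # requested language code -> model reply
    "fr": "Voici un résumé du document en trois points.",
    "de": "Here is a summary of the document in three points.",  # confused reply
}

for requested, reply in responses.items():
    detected = detect(reply)          # e.g. "fr", "en", "de"
    confused = detected != requested  # crude reply-level language match
    print(f"requested={requested} detected={detected} confused={confused}")
```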

LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

Y Lu, W Zhu, L Li, Y Qiao, F Yuan - arXiv preprint arXiv:2407.05975, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate remarkable translation capabilities in high-
resource language tasks, yet their performance in low-resource languages is hindered by …

Preference Tuning For Toxicity Mitigation Generalizes Across Languages

X Li, ZX Yong, SH Bach - arXiv preprint arXiv:2406.16235, 2024 - arxiv.org
Detoxifying multilingual Large Language Models (LLMs) has become crucial due to their
increasing global use. In this work, we explore zero-shot cross-lingual generalization of …

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs

J Dang, A Ahmadian, K Marchisio, J Kreutzer… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference optimization techniques have become a standard final stage for training state-of-
the-art large language models (LLMs). However, despite widespread adoption, the vast majority …
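
As background on preference optimization in general (not this paper's specific multilingual recipe), the widely used DPO objective can be written in a few lines of PyTorch. The per-sequence log-probabilities below are placeholder tensors standing in for real model outputs.

```python
# Generic DPO loss sketch (one common preference-optimization objective);
# not the recipe from this paper. Inputs are summed per-sequence log-probs
# of chosen/rejected responses under the policy and a frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Log-ratio of policy vs. reference, for preferred and dispreferred responses.
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    # Maximize the margin between the two log-ratios, scaled by beta.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Placeholder values standing in for real model log-probabilities.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss.item())
```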

How Does Quantization Affect Multilingual LLMs?

K Marchisio, S Dash, H Chen, D Aumiller… - arXiv preprint arXiv …, 2024 - arxiv.org
Quantization techniques are widely used to improve inference speed and deployment of
large language models. While a wide body of work examines the impact of quantized LLMs …
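
To make the trade-off concrete, here is a toy illustration of symmetric int8 weight quantization and the reconstruction error it introduces. This is a generic sketch with a random stand-in weight matrix, not the quantization schemes studied in the paper.

```python
# Toy symmetric int8 weight quantization: map floats to [-127, 127] with a
# single per-tensor scale, then dequantize and measure the error introduced.
# Generic illustration only, not the specific schemes evaluated in the paper.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)   # stand-in weight matrix

scale = np.abs(w).max() / 127.0                  # per-tensor scale factor
w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_dq = w_q.astype(np.float32) * scale            # dequantized weights

print("max abs error:", np.abs(w - w_dq).max())
```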

MINERS: Multilingual Language Models as Semantic Retrievers

GI Winata, R Zhang, DI Adelani - arXiv preprint arXiv:2406.07424, 2024 - arxiv.org
Words have been represented in a high-dimensional vector space that encodes their
semantic similarities, enabling downstream applications such as retrieving synonyms …
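
The retrieval setting described here reduces to nearest-neighbor search in an embedding space. The sketch below uses random vectors as stand-ins for multilingual sentence embeddings and ranks by cosine similarity; it is not the MINERS pipeline itself.

```python
# Generic semantic-retrieval sketch (not the MINERS pipeline): rank corpus
# entries by cosine similarity to a query in a shared embedding space.
# Random vectors stand in for real multilingual sentence embeddings.
import numpy as np

rng = np.random.default_rng(1)
corpus_emb = rng.normal(size=(1000, 384))   # 1000 "documents", 384-dim embeddings
query_emb = rng.normal(size=(384,))

def cosine_topk(query, corpus, k=5):
    corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query)
    scores = corpus_n @ query_n             # cosine similarity per document
    topk = np.argsort(-scores)[:k]
    return topk, scores[topk]

indices, scores = cosine_topk(query_emb, corpus_emb)
print(indices, scores)
```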

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

J Gallifant, S Chen, P Moreira, N Munch, M Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Medical knowledge is context-dependent and requires consistent reasoning across various
natural language expressions of semantically equivalent phrases. This is particularly crucial …
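
The robustness test this entry describes amounts to swapping semantically equivalent surface forms and re-evaluating. A minimal sketch, assuming a small hand-written brand-to-generic drug-name mapping rather than the paper's actual dataset:

```python
# Minimal sketch of the perturbation idea (not the paper's benchmark):
# rewrite a question by swapping brand names for generic drug names, then
# compare model answers on the original vs. rewritten question.
import re

BRAND_TO_GENERIC = {          # tiny hand-written mapping for illustration
    "Tylenol": "acetaminophen",
    "Advil": "ibuprofen",
}

def swap_drug_names(text: str) -> str:
    for brand, generic in BRAND_TO_GENERIC.items():
        text = re.sub(rf"\b{brand}\b", generic, text)
    return text

question = "What is the maximum daily dose of Tylenol for an adult?"
print(swap_drug_names(question))
# -> "What is the maximum daily dose of acetaminophen for an adult?"
```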

WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models

P Gupta, LQ Yau, HH Low, I Lee, HM Lim… - arXiv preprint arXiv …, 2024 - arxiv.org
WalledEval is a comprehensive AI safety testing toolkit designed to evaluate large language
models (LLMs). It accommodates a diverse range of models, including both open-weight …

M2QA: Multi-domain Multilingual Question Answering

L Engländer, H Sterz, C Poth, J Pfeiffer… - arXiv preprint arXiv …, 2024 - arxiv.org
Generalization and robustness to input variation are core desiderata of machine learning
research. Language varies along several axes, most importantly, language instance (e.g., …

BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

J Myung, N Lee, Y Zhou, J Jin, RA Putri… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often lack culture-specific knowledge of daily life, especially
across diverse regions and non-English languages. Existing benchmarks for evaluating …