Foundation and large language models: fundamentals, challenges, opportunities, and social impacts

D Myers, R Mohawesh, VI Chellaboina, AL Sathvik… - Cluster …, 2024 - Springer
Abstract Foundation and Large Language Models (FLLMs) are models that are trained using
a massive amount of data with the intent to perform a variety of downstream tasks. FLLMs …

Self-rewarding language models

W Yuan, RY Pang, K Cho, S Sukhbaatar, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
We posit that to achieve superhuman agents, future models require superhuman feedback
in order to provide an adequate training signal. Current approaches commonly train reward …

Exploring human-like translation strategy with large language models

Z He, T Liang, W Jiao, Z Zhang, Y Yang… - Transactions of the …, 2024 - direct.mit.edu
Large language models (LLMs) have demonstrated impressive capabilities in general
scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses …

xCOMET: Transparent machine translation evaluation through fine-grained error detection

NM Guerreiro, R Rei, D van Stigt, L Coheur… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used learned metrics for machine translation evaluation, such as COMET and
BLEURT, estimate the quality of a translation hypothesis by providing a single sentence …

Are large language model-based evaluators the solution to scaling up multilingual evaluation?

R Hada, V Gumma, A de Wynter, H Diddee… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive performance on Natural
Language Processing (NLP) tasks, such as Question Answering, Summarization, and …

A comprehensive survey on instruction following

R Lou, K Zhang, W Yin - arXiv preprint arXiv:2303.10475, 2023 - arxiv.org
Task semantics can be expressed by a set of input-output examples or a piece of textual
instruction. Conventional machine learning approaches for natural language processing …

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org
We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

GEMBA-MQM: Detecting translation quality error spans with GPT-4

T Kocmi, C Federmann - arXiv preprint arXiv:2310.13988, 2023 - arxiv.org
This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect
translation quality errors, specifically for the quality estimation setting without the need for …

DyVal 2: Dynamic evaluation of large language models by meta probing agents

K Zhu, J Wang, Q Zhao, R Xu, X Xie - arXiv preprint arXiv:2402.14865, 2024 - arxiv.org
Evaluation of large language models (LLMs) has raised great concerns in the community
due to the issue of data contamination. Existing work designed evaluation protocols using …

Scaling up CometKiwi: Unbabel-IST 2023 submission for the quality estimation shared task

R Rei, NM Guerreiro, J Pombal, D van Stigt… - arXiv preprint arXiv …, 2023 - arxiv.org
We present the joint contribution of Unbabel and Instituto Superior Técnico to the WMT
2023 Shared Task on Quality Estimation (QE). Our team participated on all tasks: sentence …