Foundation and large language models: fundamentals, challenges, opportunities, and social impacts

D Myers, R Mohawesh, VI Chellaboina, AL Sathvik… - Cluster …, 2024 - Springer
Abstract Foundation and Large Language Models (FLLMs) are models that are trained using
a massive amount of data with the intent to perform a variety of downstream tasks. FLLMs …

Self-rewarding language models

W Yuan, RY Pang, K Cho, S Sukhbaatar, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
We posit that to achieve superhuman agents, future models require superhuman feedback
in order to provide an adequate training signal. Current approaches commonly train reward …

Exploring human-like translation strategy with large language models

Z He, T Liang, W Jiao, Z Zhang, Y Yang… - Transactions of the …, 2024 - direct.mit.edu
Large language models (LLMs) have demonstrated impressive capabilities in general
scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses …

xCOMET: Transparent machine translation evaluation through fine-grained error detection

NM Guerreiro, R Rei, D van Stigt, L Coheur… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely used learned metrics for machine translation evaluation, such as COMET and
BLEURT, estimate the quality of a translation hypothesis by providing a single sentence …

Are large language model-based evaluators the solution to scaling up multilingual evaluation?

R Hada, V Gumma, A de Wynter, H Diddee… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive performance on Natural
Language Processing (NLP) tasks, such as Question Answering, Summarization, and …

A comprehensive survey on instruction following

R Lou, K Zhang, W Yin - arXiv preprint arXiv:2303.10475, 2023 - arxiv.org
Task semantics can be expressed by a set of input-output examples or a piece of textual
instruction. Conventional machine learning approaches for natural language processing …

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org
We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

GEMBA-MQM: Detecting translation quality error spans with GPT-4

T Kocmi, C Federmann - arXiv preprint arXiv:2310.13988, 2023 - arxiv.org
This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect
translation quality errors, specifically for the quality estimation setting without the need for …

DyVal 2: Dynamic evaluation of large language models by meta probing agents

K Zhu, J Wang, Q Zhao, R Xu, X Xie - arXiv preprint arXiv:2402.14865, 2024 - arxiv.org
Evaluation of large language models (LLMs) has raised great concerns in the community
due to the issue of data contamination. Existing work designed evaluation protocols using …

Scaling up CometKiwi: Unbabel-IST 2023 submission for the quality estimation shared task

R Rei, NM Guerreiro, J Pombal, D van Stigt… - arXiv preprint arXiv …, 2023 - arxiv.org
We present the joint contribution of Unbabel and Instituto Superior Técnico to the WMT
2023 Shared Task on Quality Estimation (QE). Our team participated on all tasks: sentence …