Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

MENLI: Robust evaluation metrics from natural language inference

Y Chen, S Eger - Transactions of the Association for Computational …, 2023 - direct.mit.edu
Recently proposed BERT-based evaluation metrics for text generation perform well on
standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information …

LLMs as narcissistic evaluators: When ego inflates evaluation scores

Y Liu, NS Moosavi, C Lin - arXiv preprint arXiv:2311.09766, 2023 - arxiv.org
Automatic evaluation of generated textual content presents an ongoing challenge within the
field of NLP. Given the impressive capabilities of modern language models (LMs) across …

RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue

Z Shi, W Sun, S Zhang, Z Zhang, P Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating open-domain dialogue systems is challenging for reasons such as the one-to-many
problem, i.e., many appropriate responses other than just the golden response. As of …

CLEME: Debiasing multi-reference evaluation for grammatical error correction

J Ye, Y Li, Q Zhou, Y Li, S Ma, HT Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the performance of Grammatical Error Correction (GEC) systems is a challenging
task due to its subjectivity. Designing an evaluation metric that is as objective as possible is …

Aligning neural machine translation models: Human feedback in training and inference

MM Ramos, P Fernandes, A Farinhas… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a recent technique to improve the
quality of the text generated by a language model, making it closer to what humans would …

Evaluation metrics on text summarization: Comprehensive survey

E Davoodijam, M Alambardar Meybodi - Knowledge and Information …, 2024 - Springer
Automatic text summarization is the process of shortening a large document into a summary
text that preserves the main concepts and key points of the original document. Due to the …

Large language models are inconsistent and biased evaluators

R Stureborg, D Alikaniotis, Y Suhara - arXiv preprint arXiv:2405.01724, 2024 - arxiv.org
The zero-shot capability of Large Language Models (LLMs) has enabled highly flexible,
reference-free metrics for various tasks, making LLM evaluators common tools in NLP …

Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives

DH Hagos, R Battle, DB Rawat - IEEE Transactions on Artificial …, 2024 - ieeexplore.ieee.org
The emergence of Generative Artificial Intelligence (AI) and Large Language Models (LLMs)
has marked a new era of Natural Language Processing (NLP), introducing unprecedented …

ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications

S Takeshita, T Green, I Reinig, K Eckert… - arXiv preprint arXiv …, 2024 - arxiv.org
Extensive efforts in the past have been directed toward the development of summarization
datasets. However, a predominant number of these resources have been (semi) …