Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Grounding and evaluation for large language models: Practical challenges and lessons learned (survey)

K Kenthapadi, M Sameki, A Taly - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
With the ongoing rapid adoption of Artificial Intelligence (AI)-based systems in high-stakes
domains, ensuring the trustworthiness, safety, and observability of these systems has …

State of what art? A call for multi-prompt LLM evaluation

M Mizrahi, G Kaplan, D Malkin, R Dror… - Transactions of the …, 2024 - direct.mit.edu
Recent advances in LLMs have led to an abundance of evaluation benchmarks, which
typically rely on a single instruction template per task. We create a large-scale collection of …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

GLUE-X: Evaluating natural language understanding models from an out-of-distribution generalization perspective

L Yang, S Zhang, L Qin, Y Li, Y Wang, H Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained language models (PLMs) are known to improve the generalization performance
of natural language understanding models by leveraging large amounts of data during the …

Robust recommender system: a survey and future directions

K Zhang, Q Cao, F Sun, Y Wu, S Tao, H Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
With the rapid growth of information, recommender systems have become integral for
providing personalized suggestions and overcoming information overload. However, their …

SemEval-2024 Task 2: Safe biomedical natural language inference for clinical trials

M Jullien, M Valentino, A Freitas - arXiv preprint arXiv:2404.04963, 2024 - arxiv.org
Large Language Models (LLMs) are at the forefront of NLP achievements but fall short in
dealing with shortcut learning, factual inconsistency, and vulnerability to adversarial inputs …