A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Evaluating large language models: A comprehensive survey

Z Guo, R Jin, C Liu, Y Huang, D Shi, L Yu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a broad
spectrum of tasks. They have attracted significant attention and been deployed in numerous …

A survey on large language models: Applications, challenges, limitations, and practical usage

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - techrxiv.org
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, R Qureshi, A Shah, M Irfan, A Zafar… - Authorea …, 2023 - techrxiv.org
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng… - arXiv preprint arXiv …, 2023 - researchgate.net
Large Language Models (LLMs) have demonstrated remarkable capabilities in
important tasks such as natural language understanding, language generation, and …

DyVal: Graph-informed dynamic evaluation of large language models

K Zhu, J Chen, J Wang, NZ Gong, D Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have achieved remarkable performance in various
evaluation benchmarks. However, concerns about their performance are raised on potential …

Benchmarking LLMs via uncertainty quantification

F Ye, M Yang, J Pang, L Wang, DF Wong… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of open-source Large Language Models (LLMs) from various institutions
has highlighted the urgent need for comprehensive evaluation methods. However, current …

Are large language model-based evaluators the solution to scaling up multilingual evaluation?

R Hada, V Gumma, A de Wynter, H Diddee… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive performance on Natural
Language Processing (NLP) tasks, such as Question Answering, Summarization, and …

Towards an understanding of large language models in software engineering tasks

Z Zheng, K Ning, J Chen, Y Wang, W Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have drawn widespread attention and research due to their
astounding performance in tasks such as text generation and reasoning. Derivative …

State of what art? A call for multi-prompt LLM evaluation

M Mizrahi, G Kaplan, D Malkin, R Dror… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in large language models (LLMs) have led to the development of various
evaluation benchmarks. These benchmarks typically rely on a single instruction template for …