A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI

M Abbasian, E Khatibi, I Azimi, D Oniani… - NPJ Digital …, 2024 - nature.com
Generative Artificial Intelligence is set to revolutionize healthcare delivery by
transforming traditional patient care into a more personalized, efficient, and proactive …

M3Exam: A multilingual, multimodal, multilevel benchmark for examining large language models

W Zhang, M Aljunied, C Gao… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite the existence of various benchmarks for evaluating natural language processing
models, we argue that human exams are a more suitable means of evaluating general …

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

LLaMA beyond English: An empirical study on language capability transfer

J Zhao, Z Zhang, L Gao, Q Zhang, T Gui… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent times, substantial advancements have been witnessed in large language models
(LLMs), exemplified by ChatGPT, showcasing remarkable proficiency across a range of …

CMB: A comprehensive medical benchmark in Chinese

X Wang, GH Chen, D Song, Z Zhang, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) provide a possibility to make a great breakthrough in
medicine. The establishment of a standardized medical benchmark becomes a fundamental …

Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lyv, X Wang, Q Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

LawBench: Benchmarking legal knowledge of large language models

Z Fei, X Shen, D Zhu, F Zhou, Z Han, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated strong capabilities in various aspects.
However, when applying them to the highly specialized, safety-critical legal domain, it is …

NPHardEval: Dynamic benchmark on reasoning ability of large language models via complexity classes

L Fan, W Hua, L Li, H Ling, Y Zhang - arXiv preprint arXiv:2312.14890, 2023 - arxiv.org
Complex reasoning ability is one of the most important features of current LLMs, which has
also been leveraged to play an integral role in complex decision-making tasks. Therefore …

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …