Evaluating the performance of large language models on gaokao benchmark

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

Cited by 1794 - Related articles - All 4 versions

[PDF] nature.com

Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI

M Abbasian, E Khatibi, I Azimi, D Oniani… - NPJ Digital …, 2024 - nature.com

Abstract Generative Artificial Intelligence is set to revolutionize healthcare delivery by
transforming traditional patient care into a more personalized, efficient, and proactive …

Cited by 38 - Related articles - All 7 versions

[PDF] neurips.cc

M3exam: A multilingual, multimodal, multilevel benchmark for examining large language models

W Zhang, M Aljunied, C Gao… - Advances in Neural …, 2023 - proceedings.neurips.cc

Despite the existence of various benchmarks for evaluating natural language processing
models, we argue that human exams are a more suitable means of evaluating general …

Cited by 92 - Related articles - All 5 versions

[PDF] arxiv.org

Yi: Open foundation models by 01.AI

A Young, B Chen, C Li, C Huang, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce the Yi model family, a series of language and multimodal models that
demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and …

Cited by 285 - Related articles - All 2 versions

[PDF] arxiv.org

Llama beyond english: An empirical study on language capability transfer

J Zhao, Z Zhang, L Gao, Q Zhang, T Gui… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent times, substantial advancements have been witnessed in large language models
(LLMs), exemplified by ChatGPT, showcasing remarkable proficiency across a range of …

Cited by 47 - Related articles - All 2 versions

[PDF] arxiv.org

Cmb: A comprehensive medical benchmark in chinese

X Wang, GH Chen, D Song, Z Zhang, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) provide a possibility to make a great breakthrough in
medicine. The establishment of a standardized medical benchmark becomes a fundamental …

Cited by 62 - Related articles - All 3 versions

[PDF] arxiv.org

Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lyv, X Wang, Q Yin… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

Cited by 32 - Related articles - All 2 versions

[PDF] arxiv.org

Lawbench: Benchmarking legal knowledge of large language models

Z Fei, X Shen, D Zhu, F Zhou, Z Han, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated strong capabilities in various aspects.
However, when applying them to the highly specialized, safe-critical legal domain, it is …

Cited by 63 - Related articles - All 3 versions

[PDF] arxiv.org

Nphardeval: Dynamic benchmark on reasoning ability of large language models via complexity classes

L Fan, W Hua, L Li, H Ling, Y Zhang - arXiv preprint arXiv:2312.14890, 2023 - arxiv.org

Complex reasoning ability is one of the most important features of current LLMs, which has
also been leveraged to play an integral role in complex decision-making tasks. Therefore …

Cited by 24 - Related articles - All 3 versions

[PDF] arxiv.org

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org

Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …

Cited by 47 - Related articles - All 2 versions