A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

InCharacter: Evaluating personality fidelity in role-playing agents through psychological interviews

X Wang, Y Xiao, J Huang, S Yuan, R Xu… - Proceedings of the …, 2024 - aclanthology.org
Role-playing agents (RPAs), powered by large language models, have emerged as
a flourishing field of applications. However, a key challenge lies in assessing whether RPAs …

Unveiling bias in fairness evaluations of large language models: A critical literature review of music and movie recommendation systems

CK Sah, L Xiaoli, MM Islam - arXiv preprint arXiv:2401.04057, 2024 - arxiv.org
The rise of generative artificial intelligence, particularly Large Language Models (LLMs), has
intensified the imperative to scrutinize fairness alongside accuracy. Recent studies have …

Emotionally numb or empathetic? Evaluating how LLMs feel using EmotionBench

J Huang, MH Lam, EJ Li, S Ren, W Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating Large Language Models' (LLMs) anthropomorphic capabilities has become
increasingly important in contemporary discourse. Utilizing the emotion appraisal theory …

Quantifying AI psychology: A psychometrics benchmark for large language models

Y Li, Y Huang, H Wang, X Zhang, J Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities,
increasingly adopting roles akin to human-like assistants. The broader integration of LLMs …

ChatGPT an ENFJ, Bard an ISTJ: Empirical study on personalities of large language models

J Huang, W Wang, MH Lam, EJ Li, W Jiao… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have made remarkable advancements in the field of
artificial intelligence, significantly reshaping the human-computer interaction. We not only …

Who is ChatGPT? Benchmarking LLMs' Psychological Portrayal Using PsychoBench

J Huang, W Wang, EJ Li, MH Lam, S Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently showcased their remarkable capacities, not
only in natural language processing tasks but also across diverse domains such as clinical …

Exploring the Sensitivity of LLMs' Decision-Making Capabilities: Insights from Prompt Variation and Hyperparameters

M Loya, DA Sinha, R Futrell - arXiv preprint arXiv:2312.17476, 2023 - arxiv.org
The advancement of Large Language Models (LLMs) has led to their widespread use
across a broad spectrum of tasks including decision making. Prior studies have compared …

On the reliability of psychological scales on large language models

J Huang, W Jiao, MH Lam, EJ Li… - Proceedings of The …, 2024 - aclanthology.org
Recent research has focused on examining Large Language Models' (LLMs) characteristics
from a psychological standpoint, acknowledging the necessity of understanding their …

Can a computer outfake a human?

J Phillips, C Robie - Personality and Individual Differences, 2024 - Elsevier
Faking on personality tests continues to be a challenge in hiring practices, and with the
increased accessibility to free, generative AI large language models (LLMs), the difference …