How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

J Huang, EJ Li, MH Lam, T Liang, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Decision-making, a complicated task requiring various types of abilities, presents an
excellent framework for assessing Large Language Models (LLMs). Our research …

Near to Mid-Term Risks and Opportunities of Open Source Generative AI

F Eiras, A Petrov, B Vidgen, CS de Witt, F Pizzati… - arXiv preprint arXiv …, 2024 - arxiv.org
In the next few years, applications of Generative AI are expected to revolutionize a number
of different areas, ranging from science & medicine to education. The potential for these …

clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents

A Beyer, K Chalamalasetti, S Hakimov… - arXiv preprint arXiv …, 2024 - arxiv.org
It has been established in recent work that Large Language Models (LLMs) can be
prompted to "self-play" conversational games that probe certain capabilities (general …

Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

N Herr, F Acero, R Raileanu, M Pérez-Ortiz… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their
strategic decision-making abilities remain largely unexplored. To fully benefit from the …

How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics

N Bhavsar, J Jordan, S Hakimov… - arXiv preprint arXiv …, 2024 - arxiv.org
What makes a good Large Language Model (LLM)? That it performs well on the relevant
benchmarks--which hopefully measure, with some validity, the presence of capabilities that …

Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

M Cheng, H Zhang, J Yang, Q Liu, L Li… - … Proceedings of the …, 2024 - dl.acm.org
Large language model evaluation plays a pivotal role in enhancing model capabilities.
Previously, numerous methods for evaluating large language models have been proposed …

Large Language Models are Bad Game Theoretic Reasoners: Evaluating Performance and Bias in Two-Player Non-Zero-Sum Games

N Herr, F Acero, R Raileanu, M Perez-Ortiz… - ICML 2024 Workshop on … - openreview.net
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their
strategic abilities remain largely unexplored. Game theory provides a good framework for …