How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Decision-making, a complicated task requiring various types of abilities, presents an
excellent framework for assessing Large Language Models (LLMs). Our research …
Near to mid-term risks and opportunities of open source generative AI
In the next few years, applications of Generative AI are expected to revolutionize a number
of different areas, ranging from science & medicine to education. The potential for these …
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
A Beyer, K Chalamalasetti, S Hakimov… - arXiv preprint arXiv …, 2024 - arxiv.org
It has been established in recent work that Large Language Models (LLMs) can be
prompted to "self-play" conversational games that probe certain capabilities (general …
Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their
strategic decision-making abilities remain largely unexplored. To fully benefit from the …
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
What makes a good Large Language Model (LLM)? That it performs well on the relevant
benchmarks--which hopefully measure, with some validity, the presence of capabilities that …
Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform
Large language model evaluation plays a pivotal role in the enhancement of its capacity.
Previously, numerous methods for evaluating large language models have been proposed …
Large Language Models are Bad Game Theoretic Reasoners: Evaluating Performance and Bias in Two-Player Non-Zero-Sum Games
Large Language Models (LLMs) have been increasingly used in real-world settings, yet their
strategic abilities remain largely unexplored. Game theory provides a good framework for …