How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments
Decision-making, a complicated task requiring various types of abilities, presents an
excellent framework for assessing Large Language Models (LLMs). Our research …
Beyond static datasets: A deep interaction approach to LLM evaluation
Large Language Models (LLMs) have made progress in various real-world tasks, which
stimulates requirements for the evaluation of LLMs. Existing LLM evaluation methods are …
A survey on large language model-based game agents
The development of game agents holds a critical role in advancing towards Artificial General
Intelligence (AGI). The progress of LLMs and their multimodal counterparts (MLLMs) offers …
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
This paper presents a comprehensive survey of the current status and opportunities for
Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning …
clembench-2024: A Challenging, Dynamic, Complementary, Multilingual Benchmark and Underlying Flexible Framework for LLMs as Multi-Action Agents
A Beyer, K Chalamalasetti, S Hakimov… - arXiv preprint arXiv …, 2024 - arxiv.org
It has been established in recent work that Large Language Models (LLMs) can be
prompted to "self-play" conversational games that probe certain capabilities (general …
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context
Large language models (LLMs) have demonstrated the potential to mimic human social
intelligence. However, most studies focus on simplistic and static self-report or performance …
BERALL: Towards Generating Retrieval-augmented State-based Interactive Fiction Games
R Chambers, N Tack, E Pearson… - The 4th Wordplay …, 2024 - wordplay-workshop.github.io
Interactive fiction (IF) games are a genre of games where the player interacts with the
fictional world via text-based commands, solving puzzles primarily by exploring the world …
Benchmarking Large Language Model (LLM) Performance for Game Playing via Tic-Tac-Toe
O Topsakal, JB Harper - Electronics, 2024 - mdpi.com
This study investigates the strategic decision-making abilities of large language models
(LLMs) via the game of Tic-Tac-Toe, renowned for its straightforward rules and definitive …
Two Giraffes in a Dirt Field: Using Game Play to Investigate Situation Modelling in Large Multimodal Models
While the situation has improved for text-only models, it again seems to be the case currently
that multimodal (text and image) models develop faster than ways to evaluate them. In this …
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics
N Bhavsar, J Jordan, S Hakimov… - arXiv preprint arXiv …, 2024 - arxiv.org
What makes a good Large Language Model (LLM)? That it performs well on the relevant
benchmarks--which hopefully measure, with some validity, the presence of capabilities that …