Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model

W Zhang, Z Cheng, Y He, M Wang, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …

Mutual theory of mind in human-AI collaboration: An empirical study with LLM-driven AI agents in a real-time shared workspace task

S Zhang, X Wang, W Zhang, Y Chen, L Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Theory of Mind (ToM) significantly impacts human collaboration and communication as a
crucial capability to understand others. When AI agents with ToM capability collaborate with …

Richelieu: Self-evolving LLM-based agents for AI diplomacy

Z Guan, X Kong, F Zhong, Y Wang - arXiv preprint arXiv:2407.06813, 2024 - arxiv.org
Diplomacy is one of the most sophisticated activities in human society, involving complex
interactions among multiple parties that require skills in social reasoning, negotiation, and …

PhysGame: Uncovering physical commonsense violations in gameplay videos

M Cao, H Tang, H Zhao, H Guo, J Liu, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in video-based large language models (Video LLMs) have witnessed
the emergence of diverse capabilities to reason and interpret dynamic visual content …

A survey on large language model-based game agents

S Hu, T Huang, F Ilhan, S Tekin, G Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of game agents holds a critical role in advancing towards Artificial General
Intelligence (AGI). The progress of LLMs and their multimodal counterparts (MLLMs) offers …

Generalist virtual agents: A survey on autonomous agents across digital platforms

M Gao, W Bu, B Miao, Y Wu, Y Li, J Li, S Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce the Generalist Virtual Agent (GVA), an autonomous entity
engineered to function across diverse digital platforms and environments, assisting users by …

LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models

Y Zhang, S Mao, T Ge, X Wang, A de Wynter… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents a comprehensive survey of the current status and opportunities for
Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning …

Understanding World or Predicting Future? A Comprehensive Survey of World Models

J Ding, Y Zhang, Y Shang, Y Zhang, Z Zong… - arXiv preprint arXiv …, 2024 - arxiv.org
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …

FinVision: A Multi-Agent Framework for Stock Market Prediction

S Fatemi, Y Hu - Proceedings of the 5th ACM International Conference …, 2024 - dl.acm.org
Financial trading has been a challenging task, as it requires the integration of vast amounts
of data from various modalities. Traditional deep learning and reinforcement learning …

Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization

P Zhang, H Jin, L Hu, X Li, L Kang, M Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have significantly enhanced the
ability of LLM-based systems to perform complex tasks through natural language processing …