Multimodal self-instruct: Synthetic abstract image and visual reasoning instruction using language model
Although most current large multimodal models (LMMs) can already understand photos of
natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or …
Mutual theory of mind in human-AI collaboration: An empirical study with LLM-driven AI agents in a real-time shared workspace task
Theory of Mind (ToM) significantly impacts human collaboration and communication as a
crucial capability to understand others. When AI agents with ToM capability collaborate with …
Richelieu: Self-evolving LLM-based agents for AI diplomacy
Diplomacy is one of the most sophisticated activities in human society, involving complex
interactions among multiple parties that require skills in social reasoning, negotiation, and …
PhysGame: Uncovering physical commonsense violations in gameplay videos
Recent advancements in video-based large language models (Video LLMs) have witnessed
the emergence of diverse capabilities to reason and interpret dynamic visual content …
A survey on large language model-based game agents
The development of game agents holds a critical role in advancing towards Artificial General
Intelligence (AGI). The progress of LLMs and their multimodal counterparts (MLLMs) offers …
Generalist virtual agents: A survey on autonomous agents across digital platforms
In this paper, we introduce the Generalist Virtual Agent (GVA), an autonomous entity
engineered to function across diverse digital platforms and environments, assisting users by …
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models
This paper presents a comprehensive survey of the current status and opportunities for
Large Language Models (LLMs) in strategic reasoning, a sophisticated form of reasoning …
Understanding World or Predicting Future? A Comprehensive Survey of World Models
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …
FinVision: A Multi-Agent Framework for Stock Market Prediction
Financial trading has been a challenging task, as it requires the integration of vast amounts
of data from various modalities. Traditional deep learning and reinforcement learning …
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
Recent advancements in large language models (LLMs) have significantly enhanced the
ability of LLM-based systems to perform complex tasks through natural language processing …