Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …
PaLM-E: An embodied multimodal language model
Large language models excel at a wide range of complex tasks. However, enabling general
inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We …
CAMEL: Communicative agents for "mind" exploration of large language model society
The rapid advancement of chat-based language models has led to remarkable progress in
complex task-solving. However, their success heavily relies on human input to guide the …
VoxPoser: Composable 3D value maps for robotic manipulation with language models
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …
ReAct: Synergizing reasoning and acting in language models
While large language models (LLMs) have demonstrated impressive capabilities across
tasks in language understanding and interactive decision making, their abilities for …
ProgPrompt: Generating situated robot task plans using large language models
Task planning can require defining myriad domain knowledge about the world in which a
robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to …
Inner monologue: Embodied reasoning through planning with language models
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …
A generalist agent
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …
ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings
Integrating large language models (LLMs) with various tools has led to increased attention
in the field. Existing approaches either involve fine-tuning the LLM, which is both …