Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods

Y Cao, H Zhao, Y Cheng, T Shu, Y Chen… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …

Palm-e: An embodied multimodal language model

D Driess, F Xia, MSM Sajjadi, C Lynch… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models excel at a wide range of complex tasks. However, enabling general
inference in the real world, eg, for robotics problems, raises the challenge of grounding. We …

Camel: Communicative agents for" mind" exploration of large language model society

G Li, H Hammoud, H Itani… - Advances in Neural …, 2023 - proceedings.neurips.cc
The rapid advancement of chat-based language models has led to remarkable progress in
complex task-solving. However, their success heavily relies on human input to guide the …

Voxposer: Composable 3d value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

React: Synergizing reasoning and acting in language models

S Yao, J Zhao, D Yu, N Du, I Shafran… - arXiv preprint arXiv …, 2022 - arxiv.org
While large language models (LLMs) have demonstrated impressive capabilities across
tasks in language understanding and interactive decision making, their abilities for …

Progprompt: Generating situated robot task plans using large language models

I Singh, V Blukis, A Mousavian, A Goyal… - … on Robotics and …, 2023 - ieeexplore.ieee.org
Task planning can require defining myriad domain knowledge about the world in which a
robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to …

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

A generalist agent

S Reed, K Zolna, E Parisotto, SG Colmenarejo… - arXiv preprint arXiv …, 2022 - arxiv.org
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …

Toolkengpt: Augmenting frozen language models with massive tools via tool embeddings

S Hao, T Liu, Z Wang, Z Hu - Advances in neural …, 2023 - proceedings.neurips.cc
Integrating large language models (LLMs) with various tools has led to increased attention
in the field. Existing approaches either involve fine-tuning the LLM, which is both …