Survey on large language model-enhanced reinforcement learning: Concept, taxonomy, and methods

Y Cao, H Zhao, Y Cheng, T Shu, Y Chen… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
With extensive pretrained knowledge and high-level general capabilities, large language
models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

DriveDreamer: Towards real-world-driven world models for autonomous driving

X Wang, Z Zhu, G Huang, X Chen, J Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
World models, especially in autonomous driving, are trending and drawing extensive
attention due to their capacity for comprehending driving environments. The established …

ManiGaussian: Dynamic Gaussian splatting for multi-task robotic manipulation

G Lu, S Zhang, Z Wang, C Liu, J Lu, Y Tang - European Conference on …, 2025 - Springer
Performing language-conditioned robotic manipulation tasks in unstructured environments
is highly demanded for general intelligent robots. Conventional robotic manipulation …

Towards efficient LLM grounding for embodied multi-agent collaboration

Y Zhang, S Yang, C Bai, F Wu, X Li, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is
challenging due to the complexity of the physical world. Especially, LLM planning for multi …

Policy adaptation via language optimization: Decomposing tasks for few-shot imitation

V Myers, BC Zheng, O Mees, S Levine… - arXiv preprint arXiv …, 2024 - arxiv.org
Learned language-conditioned robot policies often struggle to effectively adapt to new real-
world tasks even when pre-trained across a diverse set of instructions. We propose a novel …

QUAR-VLA: Vision-language-action model for quadruped robots

P Ding, H Zhao, W Zhang, W Song, M Zhang… - … on Computer Vision, 2025 - Springer
The important manifestation of robot intelligence is the ability to naturally interact and
autonomously make decisions. Traditional quadruped robot learning typically handles …

WorldDreamer: Towards general world models for video generation via predicting masked tokens

X Wang, Z Zhu, G Huang, B Wang, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
World models play a crucial role in understanding and predicting the dynamics of the world,
which is essential for video generation. However, existing world models are confined to …

Sora as an AGI world model? A complete survey on text-to-video generation

J Cho, FD Puspitasari, S Zheng, J Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video generation marks a significant frontier in the rapidly evolving domain of
generative AI, integrating advancements in text-to-image synthesis, video captioning, and …