A survey on integration of large language models with intelligent robots

Y Kim, D Kim, J Choi, J Park, N Oh, D Park - Intelligent Service Robotics, 2024 - Springer
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …

Open X-Embodiment: Robotic learning datasets and RT-X models

A O'Neill, A Rehman, A Gupta, A Maddukuri… - arXiv preprint arXiv …, 2023 - arxiv.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

VoxPoser: Composable 3D value maps for robotic manipulation with language models

W Huang, C Wang, R Zhang, Y Li, J Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are shown to possess a wealth of actionable knowledge that
can be extracted for robot manipulation in the form of reasoning and planning. Despite the …

RT-2: Vision-language-action models transfer web knowledge to robotic control

B Zitkovich, T Yu, S Xu, P Xu, T Xiao… - … on Robot Learning, 2023 - proceedings.mlr.press
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

A O'Neill, A Rehman, A Maddukuri… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
on efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Octo: An open-source generalist robot policy

OM Team, D Ghosh, H Walke, K Pertsch… - arXiv preprint arXiv …, 2024 - arxiv.org
Large policies pretrained on diverse robot datasets have the potential to transform robotic
learning: instead of training new policies from scratch, such generalist robot policies may be …

Where are we in the search for an artificial visual cortex for embodied intelligence?

A Majumdar, K Yadav, S Arnaud, J Ma… - Advances in …, 2023 - proceedings.neurips.cc
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …

TPTU: Task planning and tool usage of large language model-based AI agents

J Ruan, Y Chen, B Zhang, Z Xu, T Bao… - … Models for Decision …, 2023 - openreview.net
With recent advancements in natural language processing, Large Language Models (LLMs)
have emerged as powerful tools for various real-world applications. Despite their prowess …

MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - First Workshop on Vision …, 2024 - openreview.net
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …