Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

OpenEQA: Embodied question answering in the era of foundation models

A Majumdar, A Ajay, X Zhang, P Putta… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a modern formulation of Embodied Question Answering (EQA) as the task of
understanding an environment well enough to answer questions about it in natural …

Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

AffordanceLLM: Grounding affordance from vision language models

S Qian, W Chen, M Bai, X Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
Affordance grounding refers to the task of finding the area of an object with which one can
interact. It is a fundamental but challenging task as a successful solution requires the …

Large language models as generalizable policies for embodied tasks

A Szot, M Schwarzer, H Agrawal… - The Twelfth …, 2023 - openreview.net
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …

OK-Robot: What really matters in integrating open-knowledge models for robotics

P Liu, Y Orru, C Paxton, NMM Shafiullah… - arXiv preprint arXiv …, 2024 - arxiv.org
Remarkable progress has been made in recent years in the fields of vision, language, and
robotics. We now have vision models capable of recognizing objects based on language …

Prompt a robot to walk with large language models

YJ Wang, B Zhang, J Chen, K Sreenath - arXiv preprint arXiv:2309.09969, 2023 - arxiv.org
Large language models (LLMs) pre-trained on vast internet-scale data have showcased
remarkable capabilities across diverse domains. Recently, there has been escalating …

Habitat Synthetic Scenes Dataset (HSSD-200): An analysis of 3D scene scale and realism tradeoffs for ObjectGoal navigation

M Khanna, Y Mao, H Jiang, S Haresh… - Proceedings of the …, 2024 - openaccess.thecvf.com
We contribute the Habitat Synthetic Scenes Dataset, a dataset of 211 high-quality 3D
scenes, and use it to test navigation agent generalization to realistic 3D environments. Our …

How to prompt your robot: A promptbook for manipulation skills with code as policies

MG Arenas, T Xiao, S Singh, V Jain… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have demonstrated the ability to perform semantic
reasoning and planning, and to write code for robotics tasks. However, most methods rely on pre …