Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

OpenEQA: Embodied question answering in the era of foundation models

A Majumdar, A Ajay, X Zhang, P Putta… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a modern formulation of Embodied Question Answering (EQA) as the task of
understanding an environment well enough to answer questions about it in natural …

Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

AffordanceLLM: Grounding affordance from vision language models

S Qian, W Chen, M Bai, X Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
Affordance grounding refers to the task of finding the area of an object with which one can
interact. It is a fundamental but challenging task as a successful solution requires the …

Large language models as generalizable policies for embodied tasks

A Szot, M Schwarzer, H Agrawal… - The Twelfth …, 2023 - openreview.net
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …

OK-Robot: What really matters in integrating open-knowledge models for robotics

P Liu, Y Orru, C Paxton, NMM Shafiullah… - arXiv preprint arXiv …, 2024 - arxiv.org
Remarkable progress has been made in recent years in the fields of vision, language, and
robotics. We now have vision models capable of recognizing objects based on language …

Prompt a robot to walk with large language models

YJ Wang, B Zhang, J Chen, K Sreenath - arXiv preprint arXiv:2309.09969, 2023 - arxiv.org
Large language models (LLMs) pre-trained on vast internet-scale data have showcased
remarkable capabilities across diverse domains. Recently, there has been escalating …

Habitat Synthetic Scenes Dataset (HSSD-200): An analysis of 3D scene scale and realism tradeoffs for ObjectGoal navigation

M Khanna, Y Mao, H Jiang, S Haresh… - Proceedings of the …, 2024 - openaccess.thecvf.com
We contribute the Habitat Synthetic Scenes Dataset, a dataset of 211 high-quality 3D
scenes, and use it to test navigation agent generalization to realistic 3D environments. Our …

How to prompt your robot: A promptbook for manipulation skills with code as policies

MG Arenas, T Xiao, S Singh, V Jain… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have demonstrated the ability to perform semantic
reasoning and planning, and to write code for robotics tasks. However, most methods rely on pre …