LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Visual language maps for robot navigation

C Huang, O Mees, A Zeng… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Grounding language to the visual observations of a navigating agent can be performed
using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image …
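
As a rough illustration of the "off-the-shelf visual-language model" grounding this paper builds on, the sketch below scores a single camera frame against a few landmark phrases with a pretrained CLIP model (via Hugging Face transformers). The checkpoint name and query phrases are illustrative assumptions, and VLMaps itself goes further by fusing pixel-level VLM features into a spatial map, which this sketch does not attempt.

    # Minimal sketch: ground language queries against one observation with CLIP.
    # Assumptions: checkpoint name and landmark phrases are illustrative only.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    landmarks = ["a sofa", "a kitchen counter", "a doorway"]  # hypothetical queries
    image = Image.open("frame.png")  # one camera frame from the navigating agent

    inputs = processor(text=landmarks, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-to-text similarity logits; softmax gives a distribution over landmarks.
    probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
    for phrase, p in zip(landmarks, probs.tolist()):
        print(f"{phrase}: {p:.3f}")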

ViNT: A foundation model for visual navigation

D Shah, A Sridhar, N Dashora, K Stachowicz… - arXiv preprint arXiv …, 2023 - arxiv.org
General-purpose pre-trained models ("foundation models") have enabled practitioners to
produce generalizable solutions for individual machine learning problems with datasets that …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

Correcting robot plans with natural language feedback

P Sharma, B Sundaralingam, V Blukis, C Paxton… - arXiv preprint arXiv …, 2022 - arxiv.org
When humans design cost or goal specifications for robots, they often produce specifications
that are ambiguous, underspecified, or beyond planners' ability to solve. In these cases …

A persistent spatial semantic representation for high-level natural language instruction execution

V Blukis, C Paxton, D Fox, A Garg… - Conference on Robot …, 2022 - proceedings.mlr.press
Natural language provides an accessible and expressive interface to specify long-term tasks
for robotic agents. However, non-experts are likely to specify such tasks with high-level …

GridMM: Grid memory map for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in a 3D environment by following natural language instructions. To represent the previously …

Habitat 3.0: A co-habitat for humans, avatars and robots

X Puig, E Undersander, A Szot, MD Cote… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in
home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate …