LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Visual language maps for robot navigation

C Huang, O Mees, A Zeng… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Grounding language to the visual observations of a navigating agent can be performed
using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image …
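
As a rough illustration of the "off-the-shelf visual-language model" grounding this paper builds on, the sketch below scores a single camera frame against a few landmark phrases with a pretrained CLIP model (via Hugging Face transformers). The checkpoint name and query phrases are illustrative assumptions, and VLMaps itself goes further by fusing pixel-level VLM features into a spatial map, which this sketch does not attempt.

    # Minimal sketch: ground language queries against one observation with CLIP.
    # Assumptions: checkpoint name and landmark phrases are illustrative only.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    landmarks = ["a sofa", "a kitchen counter", "a doorway"]  # hypothetical queries
    image = Image.open("frame.png")  # one camera frame from the navigating agent

    inputs = processor(text=landmarks, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-to-text similarity logits; softmax gives a distribution over landmarks.
    probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
    for phrase, p in zip(landmarks, probs.tolist()):
        print(f"{phrase}: {p:.3f}")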

ViNT: A foundation model for visual navigation

D Shah, A Sridhar, N Dashora, K Stachowicz… - arXiv preprint arXiv …, 2023 - arxiv.org
General-purpose pre-trained models ("foundation models") have enabled practitioners to
produce generalizable solutions for individual machine learning problems with datasets that …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

Correcting robot plans with natural language feedback

P Sharma, B Sundaralingam, V Blukis, C Paxton… - arXiv preprint arXiv …, 2022 - arxiv.org
When humans design cost or goal specifications for robots, they often produce specifications
that are ambiguous, underspecified, or beyond planners' ability to solve. In these cases …

A persistent spatial semantic representation for high-level natural language instruction execution

V Blukis, C Paxton, D Fox, A Garg… - Conference on Robot …, 2022 - proceedings.mlr.press
Natural language provides an accessible and expressive interface to specify long-term tasks
for robotic agents. However, non-experts are likely to specify such tasks with high-level …

GridMM: Grid memory map for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in a 3D environment by following natural language instructions. To represent the previously …

Habitat 3.0: A co-habitat for humans, avatars and robots

X Puig, E Undersander, A Szot, MD Cote… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in
home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate …