Beyond the nav-graph: Vision-and-language navigation in continuous environments

J Krantz, E Wijmans, A Majumdar, D Batra… - Computer Vision–ECCV …, 2020 - Springer
We develop a language-guided navigation task set in a continuous 3D environment where
agents must execute low-level actions to follow natural language navigation directions. By …

Sim-to-real transfer for vision-and-language navigation

P Anderson, A Shrivastava, J Truong… - … on Robot Learning, 2021 - proceedings.mlr.press
We study the challenging problem of releasing a robot in a previously unseen environment,
and having it follow unconstrained natural language navigation instructions. Recent work on …

SOAT: A scene- and object-aware transformer for vision-and-language navigation

A Moudgil, A Majumdar, H Agrawal… - Advances in Neural …, 2021 - proceedings.neurips.cc
Natural language instructions for visual navigation often use scene descriptions (e.g.,
bedroom) and object references (e.g., green chairs) to provide a breadcrumb trail to a goal …

BabyWalk: Going farther in vision-and-language navigation by taking baby steps

W Zhu, H Hu, J Chen, Z Deng, V Jain, E Ie… - arXiv preprint arXiv …, 2020 - arxiv.org
Learning to follow instructions is of fundamental importance to autonomous agents for vision-
and-language navigation (VLN). In this paper, we study how an agent can navigate long …

Multimodal attention networks for low-level vision-and-language navigation

F Landi, L Baraldi, M Cornia, M Corsini… - Computer vision and …, 2021 - Elsevier
Vision-and-Language Navigation (VLN) is a challenging task in which an agent
needs to follow a language-specified path to reach a target destination. The goal gets even …

Sub-instruction aware vision-and-language navigation

Y Hong, C Rodriguez-Opazo, Q Wu, S Gould - arXiv preprint arXiv …, 2020 - arxiv.org
Vision-and-language navigation requires an agent to navigate through a real 3D
environment following natural language instructions. Despite significant advances, few …

Improving cross-modal alignment in vision language navigation via syntactic information

J Li, H Tan, M Bansal - arXiv preprint arXiv:2104.09580, 2021 - arxiv.org
Vision language navigation is the task that requires an agent to navigate through a 3D
environment based on natural language instructions. One key challenge in this task is to …

Curriculum learning for vision-and-language navigation

J Zhang, J Fan, J Peng - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Vision-and-Language Navigation (VLN) is a task where an agent navigates in an
embodied indoor environment under human instructions. Previous works ignore the …

Learning to stop: A simple yet effective approach to urban vision-language navigation

J Xiang, XE Wang, WY Wang - arXiv preprint arXiv:2009.13112, 2020 - arxiv.org
Vision-and-Language Navigation (VLN) is a natural language grounding task where an
agent learns to follow language instructions and navigate to specified destinations in real …

Find a way forward: a language-guided semantic map navigator

Z Wang, M Li, M Wu, MF Moens… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce the map-language navigation task where an agent executes
natural language instructions and moves to the target position based only on a given 3D …