Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

Looking beyond single images for weakly supervised semantic segmentation learning

W Wang, G Sun, L Van Gool - IEEE Transactions on Pattern …, 2022 - ieeexplore.ieee.org
This article studies the problem of learning weakly supervised semantic segmentation
(WSSS) from image-level supervision only. Current popular solutions leverage object …

Hop: History-and-order aware pre-training for vision-and-language navigation

Y Qiao, Y Qi, Y Hong, Z Yu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Pre-training has been adopted in a few recent works for Vision-and-Language Navigation
(VLN). However, previous pre-training methods for VLN either lack the ability to predict …

Dreamwalker: Mental planning for continuous vision-language navigation

H Wang, W Liang, L Van Gool… - Proceedings of the …, 2023 - openaccess.thecvf.com
VLN-CE is a recently released embodied task, where AI agents need to navigate a freely
traversable environment to reach a distant target location, given language instructions. It …

Structured scene memory for vision-language navigation

H Wang, W Wang, W Liang… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recently, numerous algorithms have been developed to tackle the problem of
vision-language navigation (VLN), i.e., entailing an agent to navigate 3D environments through …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …

Target-driven structured transformer planner for vision-language navigation

Y Zhao, J Chen, C Gao, W Wang, L Yang… - Proceedings of the 30th …, 2022 - dl.acm.org
Vision-language navigation is the task of directing an embodied agent to navigate in 3D
scenes with natural language instructions. For the agent, inferring the long-term navigation …