NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

EnvEdit: Environment editing for vision-and-language navigation

J Li, H Tan, M Bansal - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …

PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation

J Li, M Bansal - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …

ADAPT: Vision-language navigation with modality-aligned action prompts

B Lin, Y Zhu, Z Chen, X Liang, J Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision-Language Navigation (VLN) is a challenging task that requires an embodied
agent to perform action-level modality alignment, i.e., make instruction-asked actions …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …

Target-driven structured transformer planner for vision-language navigation

Y Zhao, J Chen, C Gao, W Wang, L Yang… - Proceedings of the 30th …, 2022 - dl.acm.org
Vision-language navigation is the task of directing an embodied agent to navigate in 3D
scenes with natural language instructions. For the agent, inferring the long-term navigation …