NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

EnvEdit: Environment editing for vision-and-language navigation

J Li, H Tan, M Bansal - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation which understands multi …

PanoGen: Text-conditioned panoramic environment generation for vision-and-language navigation

J Li, M Bansal - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Vision-and-Language Navigation requires the agent to follow language instructions
to navigate through 3D environments. One main challenge in Vision-and-Language …

ADAPT: Vision-language navigation with modality-aligned action prompts

B Lin, Y Zhu, Z Chen, X Liang, J Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision-Language Navigation (VLN) is a challenging task that requires an embodied
agent to perform action-level modality alignment, i.e., make instruction-asked actions …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …

Target-driven structured transformer planner for vision-language navigation

Y Zhao, J Chen, C Gao, W Wang, L Yang… - Proceedings of the 30th …, 2022 - dl.acm.org
Vision-language navigation is the task of directing an embodied agent to navigate in 3D
scenes with natural language instructions. For the agent, inferring the long-term navigation …