CPT: Colorful prompt tuning for pre-trained vision-language models

Y Yao, A Zhang, Z Zhang, Z Liu, TS Chua, M Sun - AI Open, 2024 - Elsevier
Abstract Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …

Transfer learning in robotics: An upcoming breakthrough? A review of promises and challenges

N Jaquier, MC Welle, A Gams, K Yao… - … Journal of Robotics …, 2023 - journals.sagepub.com
Transfer learning is a conceptually enticing paradigm in pursuit of truly intelligent embodied
agents. The core concept—reusing prior knowledge to learn in and from novel situations—is …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

GridMM: Grid memory map for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location
following natural language instructions in 3D environments. To represent the previously …
following the natural language instruction in 3D environments. To represent the previously …

NavGPT-2: Unleashing navigational reasoning capability for large vision-language models

G Zhou, Y Hong, Z Wang, XE Wang, Q Wu - European Conference on …, 2025 - Springer
Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a
burgeoning initiative to harness LLMs for instruction-following robotic navigation. Such a …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract The task of Vision-Language Navigation (VLN) requires an embodied agent to reach
a global goal according to the instruction. Essentially, during navigation, a series of sub …

Improving vision-and-language navigation by generating future-view image semantics

J Li, M Bansal - Proceedings of the IEEE/CVF Conference …, 2023 - openaccess.thecvf.com
Abstract Vision-and-Language Navigation (VLN) is the task that requires an agent to
navigate through the environment based on natural language instructions. At each step, the …

MapGPT: Map-guided prompting with adaptive path planning for vision-and-language navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - Proceedings of the …, 2024 - aclanthology.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-
making and generalization abilities across various tasks. However, existing zero-shot agents …

Learning vision-and-language navigation from YouTube videos

K Lin, P Chen, D Huang, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) requires an embodied agent to navigate in realistic
3D environments using natural language instructions. Existing VLN methods suffer from …