CPT: Colorful prompt tuning for pre-trained vision-language models

Y Yao, A Zhang, Z Zhang, Z Liu, TS Chua, M Sun - AI Open, 2024 - Elsevier
Abstract Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …

Transfer learning in robotics: An upcoming breakthrough? A review of promises and challenges

N Jaquier, MC Welle, A Gams, K Yao… - … Journal of Robotics …, 2023 - journals.sagepub.com
Transfer learning is a conceptually enticing paradigm in pursuit of truly intelligent embodied
agents. The core concept—reusing prior knowledge to learn in and from novel situations—is …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Abstract Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

GridMM: Grid memory map for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location
following natural language instructions in 3D environments. To represent the previously …
following the natural language instruction in 3D environments. To represent the previously …

NavGPT-2: Unleashing navigational reasoning capability for large vision-language models

G Zhou, Y Hong, Z Wang, XE Wang, Q Wu - European Conference on …, 2025 - Springer
Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a
burgeoning initiative to harness LLMs for instruction-following robotic navigation. Such a …

Adaptive zone-aware hierarchical planner for vision-language navigation

C Gao, X Peng, M Yan, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract The task of Vision-Language Navigation (VLN) requires an embodied agent to reach
a global goal according to the instruction. Essentially, during navigation, a series of sub …

Improving vision-and-language navigation by generating future-view image semantics

J Li, M Bansal - Proceedings of the IEEE/CVF Conference …, 2023 - openaccess.thecvf.com
Abstract Vision-and-Language Navigation (VLN) is the task that requires an agent to
navigate through the environment based on natural language instructions. At each step, the …

MapGPT: Map-guided prompting with adaptive path planning for vision-and-language navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - Proceedings of the …, 2024 - aclanthology.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-
making and generalization abilities across various tasks. However, existing zero-shot agents …

Learning vision-and-language navigation from YouTube videos

K Lin, P Chen, D Huang, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) requires an embodied agent to navigate in realistic
3D environments using natural language instructions. Existing VLN methods suffer from …