CPT: Colorful prompt tuning for pre-trained vision-language models
Abstract Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …
Transfer learning in robotics: An upcoming breakthrough? A review of promises and challenges
Transfer learning is a conceptually-enticing paradigm in pursuit of truly intelligent embodied
agents. The core concept—reusing prior knowledge to learn in and from novel situations—is …
Bird's-Eye-View Scene Graph for Vision-Language Navigation
Abstract Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …
Robot learning in the era of foundation models: A survey
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …
GridMM: Grid memory map for vision-and-language navigation
Vision-and-language navigation (VLN) enables the agent to navigate to a remote location
following the natural language instruction in 3D environments. To represent the previously …
NavGPT-2: Unleashing navigational reasoning capability for large vision-language models
Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a
burgeoning initiative to harness LLMs for instruction following robotic navigation. Such a …
Adaptive zone-aware hierarchical planner for vision-language navigation
Abstract The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …
Improving vision-and-language navigation by generating future-view image semantics
Abstract Vision-and-Language Navigation (VLN) is the task that requires an agent to
navigate through the environment based on natural language instructions. At each step, the …
MapGPT: Map-guided prompting with adaptive path planning for vision-and-language navigation
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-
making and generalization abilities across various tasks. However, existing zero-shot agents …
Learning vision-and-language navigation from youtube videos
Vision-and-language navigation (VLN) requires an embodied agent to navigate in realistic
3D environments using natural language instructions. Existing VLN methods suffer from …