Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
Scaling data generation in vision-and-language navigation
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …
Bird's-Eye-View Scene Graph for Vision-Language Navigation
Vision-language navigation (VLN), which entails an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …
Looking beyond single images for weakly supervised semantic segmentation learning
This article studies the problem of learning weakly supervised semantic segmentation
(WSSS) from image-level supervision only. Current popular solutions leverage object …
Hop: History-and-order aware pre-training for vision-and-language navigation
Pre-training has been adopted in a few recent works for Vision-and-Language Navigation
(VLN). However, previous pre-training methods for VLN either lack the ability to predict …
Dreamwalker: Mental planning for continuous vision-language navigation
VLN-CE is a recently released embodied task, where AI agents need to navigate a freely
traversable environment to reach a distant target location, given language instructions. It …
Structured scene memory for vision-language navigation
Recently, numerous algorithms have been developed to tackle the problem of vision-language
navigation (VLN), i.e., entailing an agent to navigate 3D environments through …
Adaptive zone-aware hierarchical planner for vision-language navigation
The task of Vision-Language Navigation (VLN) is for an embodied agent to reach
the global goal according to the instruction. Essentially, during navigation, a series of sub …
Target-driven structured transformer planner for vision-language navigation
Vision-language navigation is the task of directing an embodied agent to navigate in 3D
scenes with natural language instructions. For the agent, inferring the long-term navigation …