LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …
Visual language maps for robot navigation
Grounding language to the visual observations of a navigating agent can be performed
using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image …
ViNT: A foundation model for visual navigation
General-purpose pre-trained models ("foundation models") have enabled practitioners to
produce generalizable solutions for individual machine learning problems with datasets that …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
Scaling data generation in vision-and-language navigation
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …
Correcting robot plans with natural language feedback
When humans design cost or goal specifications for robots, they often produce specifications
that are ambiguous, underspecified, or beyond planners' ability to solve. In these cases …
A persistent spatial semantic representation for high-level natural language instruction execution
Natural language provides an accessible and expressive interface to specify long-term tasks
for robotic agents. However, non-experts are likely to specify such tasks with high-level …
GridMM: Grid memory map for vision-and-language navigation
Vision-and-language navigation (VLN) enables the agent to navigate to a remote location
following the natural language instruction in 3D environments. To represent the previously …
Habitat 3.0: A co-habitat for humans, avatars and robots
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in
home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate …