Training language models to follow instructions with human feedback
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
EnvEdit: Environment editing for vision-and-language navigation
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …
Counterfactual cycle-consistent learning for instruction following and generation in vision-language navigation
Since the rise of vision-language navigation (VLN), great progress has been made in
instruction following--building a follower to navigate environments under the guidance of …
Pathdreamer: A world model for indoor navigation
People navigating in unfamiliar buildings take advantage of myriad visual, spatial and
semantic cues to efficiently achieve their navigation goals. Towards equipping …
Less is more: Generating grounded navigation instructions from landmarks
We study the automatic generation of navigation instructions from 360-degree images
captured on indoor routes. Existing generators suffer from poor visual grounding, causing …
ETPNav: Evolving topological planning for vision-language navigation in continuous environments
Vision-language navigation is a task that requires an agent to follow instructions to navigate
in environments. It becomes increasingly crucial in the field of embodied AI, with potential …
Vision-language navigation: a survey and taxonomy
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …
A new path: Scaling vision-and-language navigation with synthetic instructions and imitation learning
Recent studies in Vision-and-Language Navigation (VLN) train RL agents to execute natural-
language navigation instructions in photorealistic environments, as a step towards robots …