Retrospectives on the Embodied AI Workshop
We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are …
Hierarchical cross-modal agent for robotics vision-and-language navigation
Deep Learning has revolutionized our ability to solve complex problems such as Vision-and-Language Navigation (VLN). This task requires the agent to navigate to a goal purely based …
Interactive learning from activity description
We present a novel interactive learning protocol that enables training request-fulfilling agents by verbally describing their activities. Unlike imitation learning (IL), our protocol …
Semantically-aware spatio-temporal reasoning agent for vision-and-language navigation in continuous environments
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments, which requires an autonomous agent to follow natural …
Iterative vision-and-language navigation
We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time …
Foam: A follower-aware speaker model for vision-and-language navigation
The speaker-follower models have proven to be effective in vision-and-language navigation, where a speaker model is used to synthesize new instructions to augment the training data …
Behavioral analysis of vision-and-language navigation agents
To be successful, Vision-and-Language Navigation (VLN) agents must be able to ground instructions to actions based on their surroundings. In this work, we develop a methodology …
CLEAR: Improving vision-language navigation with cross-lingual, environment-agnostic representations
Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions. In this paper, we aim to solve two key …
Define, evaluate, and improve task-oriented cognitive capabilities for instruction generation models
Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general …
GraphMapper: Efficient visual navigation by scene graph generation
Understanding the geometric relationships between objects in a scene is a core capability in enabling both humans and autonomous agents to navigate in new environments. A sparse …